# On link-informed graph traversal¶

The all-simple-paths algorithm as implemented in `nx.all_simple_paths()`

inside the
`networkx`

(abbreviated below as `nx`

) package version 2.2 uses a
depth-first traversal scheme to find all possible paths from a start node to one or more
end nodes [1].

For example, let nodes `A`

-`F`

represent unitigs in a De Bruijn graph created from sequencing
reads of transcripts:

An all-simple-paths traversal starting at `A`

will return the paths `ABDE`

, `ABDF`

, `ACDE`

,
and `ACDF`

. However, what if the sequenced reads that were used to create this graph only
originated from two paths: `ABDE`

and
`ACDF`

? Can some of these sequencing reads be used to restrict the paths returned by the
all-simple-paths algorithm?

Mccortex provides a data structure called “links” for annotating De Bruijn graphs in Cortex
format. In the example above, links can be used to store information on a read that covers both the
`A`

–> `B`

and `D`

–> `E`

junctions. Cortexpy can use these links to performed a
“link-informed” (that is, a restricted) all-simple-paths traversal.

[1] | Cortexpy currently uses a copy of `nx.all_simple_paths()`
named `_all_simple_paths_graph()` . |

## Link-informed graph traversal in `cortexpy`

¶

### Cortexpy uses `networkx`

algorithms¶

Cortexpy represents Cortex graphs as `nx.DiGraph`

objects [2]. This
allows the easy application of `networkx`

algorithms to Cortex graphs.
`cortexpy`

achieves link-informed traversal by wrapping a Cortex graph in a
`LinkedGraphTraverser`

object, which modifies the behavior of the
`__getitem__()`

method. To understand why this works, let us first take a look at the
all-simple-paths algorithm:

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 | ```
def _all_simple_paths_graph(G, source, targets, cutoff):
"""From networkx.algorithms.simple_paths"""
visited = collections.OrderedDict.fromkeys([source])
stack = [iter(G[source])]
while stack:
children = stack[-1]
child = next(children, None)
if child is None:
stack.pop()
visited.popitem()
elif len(visited) < cutoff:
if child in visited:
continue
if child in targets:
yield list(visited) + [child]
visited[child] = None
if targets - set(visited.keys()): # expand stack until find all targets
stack.append(iter(G[child]))
else:
visited.popitem() # maybe other ways to child
else: # len(visited) == cutoff:
for target in (targets & (set(children) | {child})) - set(visited.keys()):
yield list(visited) + [target]
stack.pop()
visited.popitem()
``` |

The key line here is the highlighted line 18. This is the line that appends an iterator of a node’s
successors to the stack of nodes to visit. The algorithm asks the graph object `G`

for the successor nodes of `child`

by calling the
`__getitem__()`

method of `G`

:

```
G[child]
```

This means that we can restrict the paths returned by `_all_simple_paths_graph()`

by
restricting the successor nodes returned by `G`

.

[2] | The implementation is not perfect and could use some improvement. |

`LinkedGraphTraverser`

restricts simple paths using links¶

The `__getitem__()`

method of
`LinkedGraphTraverser`

restricts the returned
successors using the following rules:

- If no link information exists for the query node, then return all successors.
- Otherwise, return only successors that are consistent with the links encountered on the path from start to query node.
- If the query node is annotated with links, pick up all links.
- For each successor node, only retain links that are consistent with the path taken from the start to this successor node.
- For each successor node, drop links that are no longer relevant to the successor node (i.e. links that have expired)