On link-informed graph traversal¶

The all-simple-paths algorithm as implemented in nx.all_simple_paths() inside the networkx (abbreviated below as nx) package version 2.2 uses a depth-first traversal scheme to find all possible paths from a start node to one or more end nodes [1].

For example, let nodes A-F represent unitigs in a De Bruijn graph created from sequencing reads of transcripts:

graph LR; A-->B A-->C B-->D C-->D D-->E D-->F

An all-simple-paths traversal starting at A will return the paths ABDE, ABDF, ACDE, and ACDF. However, what if the sequenced reads that were used to create this graph only originated from two paths: ABDE and ACDF? Can some of these sequencing reads be used to restrict the paths returned by the all-simple-paths algorithm?

Mccortex provides a data structure called “links” for annotating De Bruijn graphs in Cortex format. In the example above, links can be used to store information on a read that covers both the A –> B and D –> E junctions. Cortexpy can use these links to performed a “link-informed” (that is, a restricted) all-simple-paths traversal.

[1]	Cortexpy currently uses a copy of `nx.all_simple_paths()` named `_all_simple_paths_graph()`.

Link-informed graph traversal in `cortexpy`¶

Cortexpy uses `networkx` algorithms¶

Cortexpy represents Cortex graphs as nx.DiGraph objects [2]. This allows the easy application of networkx algorithms to Cortex graphs. cortexpy achieves link-informed traversal by wrapping a Cortex graph in a LinkedGraphTraverser object, which modifies the behavior of the __getitem__() method. To understand why this works, let us first take a look at the all-simple-paths algorithm:

 def _all_simple_paths_graph(G, source, targets, cutoff):
     """From networkx.algorithms.simple_paths"""
     visited = collections.OrderedDict.fromkeys([source])
     stack = [iter(G[source])]
     while stack:
         children = stack[-1]
         child = next(children, None)
         if child is None:
             stack.pop()
             visited.popitem()
         elif len(visited) < cutoff:
             if child in visited:
                 continue
             if child in targets:
                 yield list(visited) + [child]
             visited[child] = None
             if targets - set(visited.keys()):  # expand stack until find all targets
                 stack.append(iter(G[child]))
             else:
                 visited.popitem()  # maybe other ways to child
         else:  # len(visited) == cutoff:
             for target in (targets & (set(children) | {child})) - set(visited.keys()):
                 yield list(visited) + [target]
             stack.pop()
             visited.popitem()

The key line here is the highlighted line 18. This is the line that appends an iterator of a node’s successors to the stack of nodes to visit. The algorithm asks the graph object G for the successor nodes of child by calling the __getitem__() method of G:

G[child]

This means that we can restrict the paths returned by _all_simple_paths_graph() by restricting the successor nodes returned by G.

[2]	The implementation is not perfect and could use some improvement.

`LinkedGraphTraverser` restricts simple paths using links¶

The __getitem__() method of LinkedGraphTraverser restricts the returned successors using the following rules:

If no link information exists for the query node, then return all successors.
Otherwise, return only successors that are consistent with the links encountered on the path from start to query node.
If the query node is annotated with links, pick up all links.
For each successor node, only retain links that are consistent with the path taken from the start to this successor node.
For each successor node, drop links that are no longer relevant to the successor node (i.e. links that have expired)

On link-informed graph traversal¶

Link-informed graph traversal in cortexpy¶

Cortexpy uses networkx algorithms¶

LinkedGraphTraverser restricts simple paths using links¶

Link-informed graph traversal in `cortexpy`¶

Cortexpy uses `networkx` algorithms¶

`LinkedGraphTraverser` restricts simple paths using links¶