API reference

Random access of Cortex graphs

This module contains classes for inspecting Cortex graphs with random access to their kmers.

class cortexpy.graph.parser.random_access.RandomAccess(graph_handle, kmer_cache_size=None)[source]

Provide fast kmer access to a Cortex graph in O(log n) time, where n is the number of kmers in the graph
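The logarithmic lookup time suggests a binary search over sorted kmer records. The following is a minimal sketch of that idea, not the cortexpy implementation: it assumes the kmer strings are held in sorted order, and the class name SortedKmerIndex and its method index_of are hypothetical.

```python
from bisect import bisect_left


class SortedKmerIndex:
    """Toy stand-in for a random-access parser over sorted kmer strings (hypothetical)."""

    def __init__(self, sorted_kmer_strings):
        # Assumption: the input is already sorted lexicographically.
        self._kmers = list(sorted_kmer_strings)

    def index_of(self, kmer_string):
        # Binary search: O(log n) comparisons for n stored kmers.
        i = bisect_left(self._kmers, kmer_string)
        if i == len(self._kmers) or self._kmers[i] != kmer_string:
            raise KeyError(kmer_string)
        return i

    def __contains__(self, kmer_string):
        try:
            self.index_of(kmer_string)
        except KeyError:
            return False
        return True


index = SortedKmerIndex(["AAA", "AAC", "ACG"])
position = index.index_of("AAC")
```

In a real graph file the index returned by the search would be used to seek to a fixed-size record rather than to index a list.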

__getitem__(lexlo_string)[source]

Return kmer associated with kmer string

No check is performed to make sure that the input string is a lexicographically-lowest kmer string. Use get_kmer_for_string() in order to convert a kmer string to its lexlo form before retrieving it from the cortex object.

__iter__()[source]

Iterate over kmer strings in graph in order stored in graph

get_kmer_for_string(string)[source]

Return the kmer associated with a kmer string, converting the string to its lexlo form (by computing its reverse complement) before retrieving the kmer
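To illustrate why the conversion matters, here is a small sketch in which a plain dict stands in for the graph. The helper names revcomp, lexlo_of, and get_for_string are hypothetical and written out here only so the example is self-contained.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(kmer_string):
    """Reverse complement of a DNA string over the alphabet ACGT."""
    return kmer_string.translate(_COMPLEMENT)[::-1]


def lexlo_of(kmer_string):
    """Lexicographically lowest of a kmer string and its reverse complement."""
    return min(kmer_string, revcomp(kmer_string))


def get_for_string(kmer_store, kmer_string):
    """Look up a kmer by any orientation of its string (hypothetical helper)."""
    return kmer_store[lexlo_of(kmer_string)]


# The store is keyed by lexlo strings only, as a Cortex graph would be.
store = {"AAC": "record for AAC"}
found = get_for_string(store, "GTT")  # 'GTT' is the reverse complement of 'AAC'
```

A direct lookup of store["GTT"] would raise KeyError, which mirrors the caveat on __getitem__ above.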

items()[source]

Iterate over kmer strings and kmers in graph in order stored in graph

values()[source]

Iterate over kmers in cortex graph

Cortex graph headers

This module contains classes for parsing and representing a Cortex file header

class cortexpy.graph.parser.header.Header(version=6, kmer_size=1, kmer_container_size=None, num_colors=None, mean_read_lengths=None, total_sequences=None, sample_names=None, error_rates=None, color_info_blocks=NOTHING)[source]

Cortex header object

This object allows access to header information contained in a cortex file

classmethod from_stream(stream)[source]

Extract a cortex header from a file handle
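As a rough sketch of what reading a header from a stream involves, the code below parses only the first few fields. It assumes a layout of the 6-byte magic word CORTEX followed by four little-endian uint32 values (version, kmer size, kmer container size, number of colors); that layout is an assumption here and should be checked against the Cortex file format documentation. The function name read_header_prefix is hypothetical.

```python
import io
import struct

MAGIC_WORD = b"CORTEX"  # assumed magic word at the start of the file


def read_header_prefix(stream):
    """Parse the assumed leading header fields from a binary stream."""
    magic = stream.read(len(MAGIC_WORD))
    if magic != MAGIC_WORD:
        raise ValueError("stream does not start with the expected magic word")
    # Assumption: four little-endian unsigned 32-bit integers follow.
    version, kmer_size, kmer_container_size, num_colors = struct.unpack(
        "<4I", stream.read(16)
    )
    return {
        "version": version,
        "kmer_size": kmer_size,
        "kmer_container_size": kmer_container_size,
        "num_colors": num_colors,
    }


# Build a fake header prefix in memory and parse it back.
fake = io.BytesIO(MAGIC_WORD + struct.pack("<4I", 6, 31, 1, 2))
prefix = read_header_prefix(fake)
```

The remaining fields listed in the Header constructor (read lengths, sample names, error rates, color info blocks) are not covered by this sketch.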

Cortex kmers

This module provides classes and functions for working with Cortex kmers.

class cortexpy.graph.parser.kmer.Kmer(kmer_data, num_colors, kmer_size, revcomp=None)[source]

Represents a Cortex kmer

This class wraps a kmer data object with attributes and methods for inspecting and manipulating the underlying kmer data object.

increment_color_coverage(color)[source]

Increment the coverage of a color by one

class cortexpy.graph.parser.kmer.StringKmerConverter(kmer_size)[source]

Converts kmer strings to various binary representations

to_uints(kmer_string)[source]

Converts kmer_string to big-endian uint64 array
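One possible reading of this conversion is sketched below. It assumes a 2-bit encoding per base (A=0, C=1, G=2, T=3), right-aligned packing, and most-significant-word-first ordering; none of these details are stated above, so treat them as assumptions. The function name is hypothetical and plain Python integers are used in place of a uint64 array.

```python
_BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}  # assumed encoding


def kmer_string_to_uint64_words(kmer_string):
    """Pack a kmer string into 64-bit words, most significant word first."""
    value = 0
    for base in kmer_string:
        value = (value << 2) | _BASE_TO_BITS[base]
    # Each 64-bit word holds up to 32 bases; round up, with at least one word.
    n_words = max(1, -(-len(kmer_string) // 32))
    mask = (1 << 64) - 1
    little_endian_words = [(value >> (64 * i)) & mask for i in range(n_words)]
    return little_endian_words[::-1]


short_words = kmer_string_to_uint64_words("ACGT")
long_words = kmer_string_to_uint64_words("A" * 32 + "C")
```

A kmer of 33 bases needs two words under this scheme, which matches the idea of a kmer container size larger than one.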

cortexpy.graph.parser.kmer.connect_kmers(first, second, color, identical_kmer_check=True)[source]

Connect two kmers
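In a de Bruijn graph, two kmers can be connected when they overlap by k-1 bases. The sketch below checks that condition on plain strings and returns the base that extends the first kmer into the second. It ignores reverse-complement orientation and per-color edge storage, and the function name connecting_letter is hypothetical.

```python
def connecting_letter(first, second):
    """Return the base extending `first` into `second`, or None if they do not overlap."""
    if len(first) != len(second):
        raise ValueError("kmers must have the same length")
    # The last k-1 bases of `first` must equal the first k-1 bases of `second`.
    if first[1:] == second[:-1]:
        return second[-1]
    return None


letter = connecting_letter("AAC", "ACG")
no_letter = connecting_letter("AAC", "TTT")
```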

cortexpy.graph.parser.kmer.disconnect_kmers(first, second, colors)[source]

Disconnect two kmers

cortexpy.graph.parser.kmer.find_all_neighbors(first, second)[source]

Return kmers and letters to get from first kmer to second

class cortexpy.links.LinkedGraphTraverser(graph, walkers=NOTHING)[source]

Adapter for linked walkers to be able to work with nx.all_simple_paths()

__getitem__(item)[source]

Get the children of item according to the walker object associated with item

Warning: This scheme only works with depth-first search.

Representing Cortex graphs as nx.Graph objects

This module contains classes for representing Cortex graphs as objects that are compatible with networkx algorithms.

todo: Simplify the Graph implementations

class cortexpy.graph.cortex.ConsistentCortexDiGraph(kmer_mapping=NOTHING, graph=NOTHING)[source]

Graph that stores kmer strings that are consistent with each other

class cortexpy.graph.cortex.CortexDiGraph(kmer_mapping=NOTHING, graph=NOTHING)[source]

Stores cortex k-mers and conforms to parts of the interface of nx.MultiDiGraph

add_edge(first, second, *, key)[source]

Note: edges can only be added to existing nodes

nbunch_iter(nbunch=None)[source]

Return an iterator over nodes contained in nbunch that are also in the graph.

This code has been copied from networkx.

The nodes in nbunch are checked for membership in the graph and if not are silently ignored.

Parameters:nbunch (single node, container, or all nodes (default= all nodes)) – The view will only report edges incident to these nodes.
Returns:niter – An iterator over nodes in nbunch that are also in the graph. If nbunch is None, iterate over all nodes in the graph.
Return type:iterator
Raises:NetworkXError – If nbunch is not a node or sequence of nodes. If a node in nbunch is not hashable.

See also

Graph.__iter__()

Notes

When nbunch is an iterator, the returned iterator yields values directly from nbunch, becoming exhausted when nbunch is exhausted.

To test whether nbunch is a single node, one can use “if nbunch in self:”, even after processing with this routine.

If nbunch is not a node or a (possibly empty) sequence/iterator or None, a NetworkXError is raised. Also, if any object in nbunch is not hashable, a NetworkXError is raised.

This method was copied from Networkx version 2.1 and then modified

class cortexpy.graph.cortex.CortexGraphMapping(ra_parser, exclusion_set=NOTHING, new_kmers=NOTHING, n_duplicates=0)[source]

Create a dict-like kmer mapping from a RandomAccess parser (ra_parser)

The exclusion set tracks kmers deleted from the ra_parser. The new_kmers track kmers that have been added to the mapping. Kmers that exist in both new_kmers and ra_parser are considered overwritten. The kmers in new_kmers have precedence.
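The precedence rules above can be sketched with an overlay over a read-only base mapping. This is an illustration under assumptions, not the cortexpy class: a plain dict stands in for the ra_parser and the class name OverlayKmerMapping is hypothetical.

```python
from collections.abc import MutableMapping


class OverlayKmerMapping(MutableMapping):
    """Dict-like view over a base mapping that is never modified (hypothetical)."""

    def __init__(self, base, exclusion_set=None, new_kmers=None):
        self.base = base
        self.exclusion_set = set() if exclusion_set is None else exclusion_set
        self.new_kmers = {} if new_kmers is None else new_kmers

    def __getitem__(self, key):
        # Added or overwritten kmers take precedence over the base mapping.
        if key in self.new_kmers:
            return self.new_kmers[key]
        if key in self.exclusion_set:
            raise KeyError(key)
        return self.base[key]

    def __setitem__(self, key, value):
        self.new_kmers[key] = value

    def __delitem__(self, key):
        if key not in self:
            raise KeyError(key)
        self.new_kmers.pop(key, None)
        if key in self.base:
            # Record the deletion instead of touching the base mapping.
            self.exclusion_set.add(key)

    def __iter__(self):
        yield from self.new_kmers
        for key in self.base:
            if key not in self.new_kmers and key not in self.exclusion_set:
                yield key

    def __len__(self):
        return sum(1 for _ in self)


base = {"AAA": 1, "AAC": 2}
mapping = OverlayKmerMapping(base)
del mapping["AAA"]      # tracked in the exclusion set
mapping["AAC"] = 20     # overwrites the base entry
mapping["CCC"] = 3      # a new kmer
```

Keeping deletions and additions outside the base mapping means the underlying graph file never has to be rewritten.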

connect_kmers(first, second, color, identical_kmer_check=True)[source]

Connect two kmers

disconnect_kmers(first, second, colors)[source]

Disconnect two kmers

cortexpy.graph.cortex.build_cortex_graph(*, sample_names, kmer_size, num_colors, colors, kmer_generator=None, kmer_mapping=None)[source]

Colored de Bruijn graph constructor

cortexpy.graph.cortex.get_canonical_edge(first, second)[source]

Get canonical edge.

Canonical edges are between lexlo kmers and are ordered lexicographically

Return canonical edge, if the first and second nodes were lexlo
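The description above can be read as: convert both kmer strings to lexlo form, then order the pair lexicographically. The sketch below follows that reading on plain strings. The helper names are hypothetical and the actual function may operate on different node types.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def lexlo_of(kmer_string):
    """Lexicographically lowest of a kmer string and its reverse complement."""
    return min(kmer_string, kmer_string.translate(_COMPLEMENT)[::-1])


def canonical_edge_of(first, second):
    """Return the pair of lexlo kmer strings in lexicographic order."""
    return tuple(sorted((lexlo_of(first), lexlo_of(second))))


edge = canonical_edge_of("GTT", "AAA")
same_edge = canonical_edge_of("AAA", "AAC")
```

Both calls produce the same tuple, so either orientation of the edge maps to one canonical key.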

Interacting with graphs

This module contains classes and functions for inspecting, manipulating, and traversing graphs

class cortexpy.graph.interactor.SeedKmerStringIterator(seed_kmer_strings, unseen_lexlo_kmer_strings, seen_lexlo_kmer_strings=NOTHING)[source]

Iterates over seeds and their lexlo representations that exist in the supplied all_kmers:

>>> list(SeedKmerStringIterator.from_all_kmer_strings_and_seeds(['AAC'], ['GTT']))
[('GTT', 'AAC')]

Kmers that are not in the seed list are returned after that:

>>> list(SeedKmerStringIterator.from_all_kmer_strings_and_seeds(['AAA', 'AAC'], ['GTT']))
[('GTT', 'AAC'), ('AAA', 'AAA')]

Seeds that do not exist in the all_kmers are not returned.

>>> list(SeedKmerStringIterator.from_all_kmer_strings_and_seeds([], ['CCC']))
[]

Kmers from the all_kmers list are returned in the order in which they were supplied.

>>> list(SeedKmerStringIterator.from_all_kmer_strings_and_seeds(['AAA', 'AAG', 'AAC'], []))
[('AAA', 'AAA'), ('AAG', 'AAG'), ('AAC', 'AAC')]

cortexpy.graph.interactor.edge_nodes_of(graph)[source]

Find all edge nodes of a graph

Second return value is direction of edge.

cortexpy.graph.interactor.make_copy_of_color_for_kmer_graph(graph, color, include_self_refs=False)[source]

Makes a copy of graph, but only copies over links with key=color. Only copies over nodes that are linked by a link with key=color.
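The filtering rule can be illustrated without a graph library by treating the graph as a list of (source, target, key) edges. This is a sketch under assumptions: the function name copy_color is hypothetical, and include_self_refs is interpreted here as keeping self-loops, which the description above does not confirm.

```python
def copy_color(edges, color, include_self_refs=False):
    """Keep only edges whose key equals `color`, plus the nodes they touch."""
    kept_edges = [
        (source, target, key)
        for source, target, key in edges
        if key == color and (include_self_refs or source != target)
    ]
    # Only nodes that take part in a kept edge are copied over.
    kept_nodes = {node for source, target, _ in kept_edges for node in (source, target)}
    return kept_nodes, kept_edges


edges = [("AAA", "AAC", 0), ("AAC", "ACG", 1), ("AAA", "AAA", 0)]
nodes_0, edges_0 = copy_color(edges, 0)
nodes_0_loops, edges_0_loops = copy_color(edges, 0, include_self_refs=True)
```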

Utility functions

This module contains utility functions that are used inside cortexpy. These functions may also be useful outside of cortexpy.

cortexpy.utils.kmerize_contig(contig, kmer_size)[source]

Return generator of kmers in contig

The returned kmers are not lexicographically lowest.

>>> list(kmerize_contig('ATTT', 3))
['ATT', 'TTT']

cortexpy.utils.kmerize_fasta(fasta, kmer_size)[source]

Return generator to all kmers in fasta
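A minimal sketch of the idea, assuming the FASTA input is an open text handle and that each record is kmerized independently. The real function may accept a different input type; the names read_fasta_records, sliding_kmers, and kmerize_fasta_handle are hypothetical.

```python
import io


def sliding_kmers(contig, kmer_size):
    """Yield every kmer_size-long window of contig, left to right."""
    for start in range(len(contig) - kmer_size + 1):
        yield contig[start:start + kmer_size]


def read_fasta_records(handle):
    """Yield (name, sequence) pairs from FASTA-formatted text lines."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def kmerize_fasta_handle(handle, kmer_size):
    """Yield the kmers of every record; kmers never span two records."""
    for _, sequence in read_fasta_records(handle):
        yield from sliding_kmers(sequence, kmer_size)


fasta_text = io.StringIO(">r1\nATTT\n>r2\nAC\nGT\n")
kmers = list(kmerize_fasta_handle(fasta_text, 3))
```

Sequence lines that are wrapped across several lines are joined before kmerizing, so the second record above is treated as ACGT.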

cortexpy.utils.lexlo[source]

Return the lexicographically lowest of a kmer string and its reverse complement

The reverse complement of a kmer string is generated and the lexicographically-lowest kmer string is returned.

>>> lexlo('AAA')
'AAA'
>>> lexlo('TTT')
'AAA'