Bifrost
Represent a Colored and Compacted de Bruijn graph. More...
Inherits CompactedDBG< DataAccessor< void >, DataStorage< void > >.
Public Member Functions | |
ColoredCDBG (int kmer_length=31, int minimizer_length=-1) | |
Constructor (set up an empty colored cdBG). More... | |
ColoredCDBG (const ColoredCDBG &o) | |
Copy constructor (copy a colored cdBG). More... | |
ColoredCDBG (ColoredCDBG &&o) | |
Move constructor (move a colored cdBG). More... | |
ColoredCDBG & | operator= (const ColoredCDBG &o) |
Copy assignment operator (copy a colored cdBG). More... | |
ColoredCDBG & | operator= (ColoredCDBG &&o) |
Move assignment operator (move a colored cdBG). More... | |
bool | operator== (const ColoredCDBG &o) const |
Equality operator. More... | |
bool | operator!= (const ColoredCDBG &o) const |
Inequality operator. More... | |
ColoredCDBG & | operator+= (const ColoredCDBG &o) |
Addition assignment operator (merge a colored cdBG). More... | |
void | clear () |
Clear the graph: remove unitigs, user data and colors + reset its parameters. | |
bool | buildGraph (const CCDBG_Build_opt &opt) |
Build the Colored and compacted de Bruijn graph (only the unitigs). More... | |
bool | buildColors (const CCDBG_Build_opt &opt) |
Map the colors to the unitigs. More... | |
bool | write (const string &prefix_output_fn, const size_t nb_threads=1, const bool write_index_file=true, const bool compress_output=false, const bool verbose=false) const |
Write a colored and compacted de Bruijn graph to disk. More... | |
bool | read (const string &input_graph_fn, const string &input_colors_fn, const size_t nb_threads=1, const bool verbose=false) |
Read a colored and compacted de Bruijn graph from disk. More... | |
bool | read (const string &input_graph_fn, const string &input_index_fn, const string &input_colors_fn, const size_t nb_threads=1, const bool verbose=false) |
Read a colored and compacted de Bruijn graph from disk using an index file. More... | |
bool | merge (const ColoredCDBG &o, const size_t nb_threads=1, const bool verbose=false) |
Merge a colored and compacted de Bruijn graph. More... | |
bool | merge (ColoredCDBG &&o, const size_t nb_threads=1, const bool verbose=false) |
Merge and clear a colored and compacted de Bruijn graph. More... | |
bool | merge (const vector< ColoredCDBG > &v, const size_t nb_threads=1, const bool verbose=false) |
Merge multiple colored and compacted de Bruijn graphs. More... | |
bool | merge (vector< ColoredCDBG > &&v, const size_t nb_threads=1, const bool verbose=false) |
Merge and clear multiple colored and compacted de Bruijn graphs. More... | |
string | getColorName (const size_t color_id) const |
Get the name of a color. More... | |
vector< string > | getColorNames () const |
Get the names of all colors. More... | |
size_t | getNbColors () const |
Get the number of colors in the graph. More... | |
Public Member Functions inherited from CompactedDBG< DataAccessor< void >, DataStorage< void > > | |
CompactedDBG (const int kmer_length=31, const int minimizer_length=-1) | |
Constructor (set up an empty compacted dBG). More... | |
CompactedDBG (const CompactedDBG< U, G > &o) | |
Copy constructor (copy a compacted de Bruijn graph). More... | |
CompactedDBG (CompactedDBG< U, G > &&o) | |
Move constructor (move a compacted de Bruijn graph). More... | |
virtual | ~CompactedDBG () |
Destructor. | |
CompactedDBG< U, G > & | operator= (const CompactedDBG< U, G > &o) |
Copy assignment operator (copy a compacted de Bruijn graph). More... | |
CompactedDBG< U, G > & | operator= (CompactedDBG< U, G > &&o) |
Move assignment operator (move a compacted de Bruijn graph). More... | |
CompactedDBG< U, G > & | operator+= (const CompactedDBG< U, G > &o) |
Addition assignment operator (merge a compacted de Bruijn graph). More... | |
bool | operator== (const CompactedDBG< U, G > &o) const |
Equality operator. More... | |
bool | operator!= (const CompactedDBG< U, G > &o) const |
Inequality operator. More... | |
void | clear () |
Clear the graph: empty the graph and reset its parameters. | |
bool | build (CDBG_Build_opt &opt) |
Build the Compacted de Bruijn graph. More... | |
bool | simplify (const bool delete_short_isolated_unitigs=true, const bool clip_short_tips=true, const bool verbose=false) |
Simplify the Compacted de Bruijn graph: clip short (< 2k length) tips and/or delete short (< 2k length) isolated unitigs. More... | |
bool | write (const string &output_fn, const size_t nb_threads=1, const bool GFA_output=true, const bool FASTA_output=false, const bool BFG_output=false, const bool write_index_file=true, const bool compressed_output=false, const bool verbose=false) const |
Write the Compacted de Bruijn graph to disk (GFA1 format). More... | |
bool | read (const string &input_graph_fn, const size_t nb_threads=1, const bool verbose=false) |
Load a Compacted de Bruijn graph from disk (GFA1 or FASTA format). More... | |
bool | read (const string &input_graph_fn, const string &input_index_fn, const size_t nb_threads=1, const bool verbose=false) |
Read a Compacted de Bruijn graph from disk (GFA1, FASTA or BFG format) using an index file (BFI format). More... | |
UnitigMap< U, G > | find (const Kmer &km, const bool extremities_only=false) |
Find the unitig containing the queried k-mer in the Compacted de Bruijn graph. More... | |
const_UnitigMap< U, G > | find (const Kmer &km, const bool extremities_only=false) const |
Find the unitig containing the queried k-mer in the Compacted de Bruijn graph. More... | |
UnitigMap< U, G > | findUnitig (const char *s, const size_t pos, const size_t len) |
Find the unitig containing the k-mer starting at a given position in a query sequence and extend the mapping (if the k-mer is found, the function extends the mapping from the k-mer for as long as the query sequence and the unitig match). More... | |
const_UnitigMap< U, G > | findUnitig (const char *s, const size_t pos, const size_t len) const |
Find the unitig containing the k-mer starting at a given position in a query sequence and extend the mapping (if the k-mer is found, the function extends the mapping from the k-mer for as long as the query sequence and the unitig match). More... | |
vector< pair< size_t, UnitigMap< U, G > > > | searchSequence (const string &s, const bool exact, const bool insertion, const bool deletion, const bool substitution, const bool or_exclusive_match=false) |
Performs exact and/or inexact search of the k-mers of a sequence query in the Compacted de Bruijn graph. More... | |
vector< pair< size_t, const_UnitigMap< U, G > > > | searchSequence (const string &s, const bool exact, const bool insertion, const bool deletion, const bool substitution, const bool or_exclusive_match=false) const |
Performs exact and/or inexact search of the k-mers of a sequence query in the Compacted de Bruijn graph. More... | |
bool | add (const string &seq, const bool verbose=false) |
Add a sequence to the Compacted de Bruijn graph. More... | |
bool | remove (const const_UnitigMap< U, G > &um, const bool verbose=false) |
Remove a unitig from the Compacted de Bruijn graph. More... | |
bool | merge (const CompactedDBG &o, const size_t nb_threads=1, const bool verbose=false) |
Merge a compacted de Bruijn graph. More... | |
bool | merge (const vector< CompactedDBG > &v, const size_t nb_threads=1, const bool verbose=false) |
Merge multiple compacted de Bruijn graphs. More... | |
iterator | begin () |
Create an iterator to the first unitig of the Compacted de Bruijn graph (unitigs are NOT sorted lexicographically). More... | |
const_iterator | begin () const |
Create a constant iterator to the first unitig of the Compacted de Bruijn graph (unitigs are NOT sorted lexicographically). More... | |
iterator | end () |
Create an iterator to the "past-the-last" unitig of the Compacted de Bruijn graph (unitigs are NOT sorted lexicographically). More... | |
const_iterator | end () const |
Create a constant iterator to the "past-the-last" unitig of the Compacted de Bruijn graph (unitigs are NOT sorted lexicographically). More... | |
size_t | length () const |
Return the sum of the unitigs length. More... | |
size_t | nbKmers () const |
Return the number of k-mers in the graph. More... | |
bool | isInvalid () const |
Return a boolean indicating if the graph is invalid (wrong input parameters/files, error occurring during a method, etc.). More... | |
int | getK () const |
Return the length of k-mers of the graph. More... | |
int | getG () const |
Return the length of minimizers of the graph. More... | |
size_t | size () const |
Return the number of unitigs in the graph. More... | |
G * | getData () |
Return a pointer to the graph data. More... | |
const G * | getData () const |
Return a constant pointer to the graph data. More... | |
Additional Inherited Members | |
Public Types inherited from CompactedDBG< DataAccessor< void >, DataStorage< void > > | |
typedef unitigIterator< U, G, false > | iterator |
An iterator for the unitigs of the graph. More... | |
typedef unitigIterator< U, G, true > | const_iterator |
A constant iterator for the unitigs of the graph. More... | |
Represent a Colored and Compacted de Bruijn graph.
The class inherits from CompactedDBG which means that all public functions available with CompactedDBG are also available with ColoredCDBG.
If data are to be associated with the unitigs, these data must be wrapped in a class that inherits from the abstract class CCDBG_Data_t, for example a class MyUnitigData that derives from CCDBG_Data_t<MyUnitigData>.
Because CCDBG_Data_t is an abstract class, all the methods of the base class (CCDBG_Data_t) must be implemented in your wrapper (the derived class, MyUnitigData in this example). IMPORTANT: any method you do not implement falls back to a default implementation that has no effect.
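The wrapper pattern described above can be sketched without Bifrost. The block below is a minimal illustration only: DataBaseSketch, its two hooks (clear and merge), SilentData and merge_into are hypothetical stand-ins and do not reproduce the actual interface of CCDBG_Data_t. It shows a derived class that implements the hooks and one that does not, in which case the no-op defaults apply.

```cpp
#include <cassert>
#include <string>

// Stand-in for a CRTP base such as CCDBG_Data_t<T>. This is NOT Bifrost's
// real interface: the hook names and signatures are assumptions of the sketch.
template <typename Derived>
struct DataBaseSketch {
    // Default hooks: they exist so that generic code compiles, but do nothing.
    void clear() {}
    void merge(const Derived& /*src*/) {}
};

// Wrapper that implements both hooks.
struct MyUnitigData : DataBaseSketch<MyUnitigData> {
    std::string label;
    void clear() { label.clear(); }
    void merge(const MyUnitigData& src) { label += src.label; }
};

// Wrapper that implements nothing: the no-op defaults of the base are used.
struct SilentData : DataBaseSketch<SilentData> {
    std::string label;
};

// Generic code, written the way a graph container might call the hooks.
template <typename Data>
void merge_into(Data& dest, const Data& src) {
    dest.merge(src);
}
```

With MyUnitigData, merge_into concatenates the labels; with SilentData, the destination is left unchanged because the default hook has no effect.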
ColoredCDBG< Unitig_data_t >::ColoredCDBG (int kmer_length=31, int minimizer_length=-1)
Constructor (set up an empty colored cdBG).
kmer_length | is the length k of k-mers used in the graph (each unitig is of length at least k). |
minimizer_length | is the length g of minimizers (g < k) used in the graph. |
ColoredCDBG< Unitig_data_t >::ColoredCDBG (const ColoredCDBG< Unitig_data_t > &o)
Copy constructor (copy a colored cdBG).
This function is expensive in terms of time and memory as the content of a colored and compacted de Bruijn graph is copied. After the call to this function, the same graph exists twice in memory.
o | is a constant reference to the colored and compacted de Bruijn graph to copy. |
ColoredCDBG< Unitig_data_t >::ColoredCDBG (ColoredCDBG< Unitig_data_t > &&o)
Move constructor (move a colored cdBG).
The content of o is moved ("transferred") to a new colored and compacted de Bruijn graph. The colored and compacted de Bruijn graph referenced by o will be empty after the call to this constructor.
o | is an rvalue reference to the colored and compacted de Bruijn graph to move. |
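The difference between the copy and move constructors can be illustrated with a stand-in type. GraphSketch below is hypothetical and only owns a vector of strings; it is not the Bifrost class, but it follows the same contract: copying duplicates the content, moving leaves the source empty.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Stand-in for a graph that owns its unitigs (NOT the Bifrost class).
struct GraphSketch {
    std::vector<std::string> unitigs;

    GraphSketch() = default;

    // Copy constructor: the content is duplicated, so it exists twice in memory.
    GraphSketch(const GraphSketch& o) : unitigs(o.unitigs) {}

    // Move constructor: the content is transferred and the source is emptied.
    GraphSketch(GraphSketch&& o) noexcept : unitigs(std::move(o.unitigs)) {
        o.unitigs.clear(); // make the "source is empty" guarantee explicit
    }

    std::size_t size() const { return unitigs.size(); }
};
```

Constructing from an lvalue selects the copy constructor, while constructing from std::move(g) selects the move constructor and leaves g with size 0.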
bool ColoredCDBG< Unitig_data_t >::buildColors (const CCDBG_Build_opt &opt)
Map the colors to the unitigs.
This is done by reading the input files and querying the graph. If a color filename is provided in opt.filename_colors_in, colors are loaded from that file instead.
opt | is a structure whose members are the parameters of this function. See CCDBG_Build_opt. |
bool ColoredCDBG< Unitig_data_t >::buildGraph (const CCDBG_Build_opt &opt)
Build the Colored and compacted de Bruijn graph (only the unitigs).
A call to ColoredCDBG::buildColors is required afterwards to map colors to unitigs.
opt | is a structure whose members are the parameters of this function. See CCDBG_Build_opt. |
string ColoredCDBG< Unitig_data_t >::getColorName (const size_t color_id) const
Get the name of a color.
As colors match the input files, the color names match the input filenames.
vector< string > ColoredCDBG< Unitig_data_t >::getColorNames () const
Get the names of all colors.
As colors match the input files, the color names match the input filenames.
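The relation between colors and input filenames can be pictured as a simple lookup table. The sketch below is a stand-in, not the Bifrost implementation: ColorNamesSketch is a hypothetical class, and it assumes that color IDs are zero-based positions in the list of input files and that an unknown ID yields an empty string. Neither assumption is stated by this page.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Stand-in color table (NOT the Bifrost class): color i is named after input file i.
class ColorNamesSketch {
public:
    explicit ColorNamesSketch(std::vector<std::string> input_files)
        : names_(std::move(input_files)) {}

    // One color per input file.
    std::size_t getNbColors() const { return names_.size(); }

    // All color names, in color ID order.
    const std::vector<std::string>& getColorNames() const { return names_; }

    // Name of one color; an empty string for an unknown ID (assumption of this sketch).
    std::string getColorName(std::size_t color_id) const {
        return color_id < names_.size() ? names_[color_id] : std::string();
    }

private:
    std::vector<std::string> names_;
};
```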
size_t ColoredCDBG< Unitig_data_t >::getNbColors () const [inline]
Get the number of colors in the graph.
bool ColoredCDBG< Unitig_data_t >::merge (ColoredCDBG< Unitig_data_t > &&o, const size_t nb_threads=1, const bool verbose=false)
Merge and clear a colored and compacted de Bruijn graph.
After merging, all unitigs and colors of the input graph have been added to and compacted with the current colored and compacted de Bruijn graph (this). The input graph is cleared before the function returns. If the unitigs of the input graph had data of type "MyUnitigData" associated, they have been added to the current colored and compacted de Bruijn graph using the functions of the class MyUnitigData which are also present in its base class CCDBG_Data_t<MyUnitigData>. Note that if multiple colored and compacted de Bruijn graphs have to be merged, it is more efficient to call ColoredCDBG::merge with a vector of ColoredCDBG as input.
o | is an rvalue reference to the colored and compacted de Bruijn graph to merge. It can be obtained using std::move(). After merging, the graph referenced by o is cleared. |
nb_threads | is an integer indicating how many threads can be used during the merging. |
verbose | is a boolean indicating if information messages must be printed during the execution of the function. |
bool ColoredCDBG< Unitig_data_t >::merge (const ColoredCDBG< Unitig_data_t > &o, const size_t nb_threads=1, const bool verbose=false)
Merge a colored and compacted de Bruijn graph.
After merging, all unitigs and colors of the input graph have been added to and compacted with the current colored and compacted de Bruijn graph (this). If the unitigs of the input graph had data of type "MyUnitigData" associated, they have been added to the current colored and compacted de Bruijn graph using the functions of the class MyUnitigData which are also present in its base class CCDBG_Data_t<MyUnitigData>. Note that if multiple colored and compacted de Bruijn graphs have to be merged, it is more efficient to call ColoredCDBG::merge with a vector of ColoredCDBG as input.
o | is a constant reference to the colored and compacted de Bruijn graph to merge. |
nb_threads | is an integer indicating how many threads can be used during the merging. |
verbose | is a boolean indicating if information messages must be printed during the execution of the function. |
bool ColoredCDBG< Unitig_data_t >::merge (const vector< ColoredCDBG< Unitig_data_t > > &v, const size_t nb_threads=1, const bool verbose=false)
Merge multiple colored and compacted de Bruijn graphs.
After merging, all unitigs and colors of the input colored and compacted de Bruijn graphs have been added to and compacted with the current colored and compacted de Bruijn graph (this). If the unitigs had data of type "MyUnitigData" associated, they have been added to the current colored and compacted de Bruijn graph using the functions of the class MyUnitigData which are also present in its base class CCDBG_Data_t<MyUnitigData>.
v | is a constant reference to a vector of colored and compacted de Bruijn graphs to merge. |
nb_threads | is an integer indicating how many threads can be used during the merging. |
verbose | is a boolean indicating if information messages must be printed during the execution of the function. |
bool ColoredCDBG< Unitig_data_t >::merge (vector< ColoredCDBG< Unitig_data_t > > &&v, const size_t nb_threads=1, const bool verbose=false)
Merge and clear multiple colored and compacted de Bruijn graphs.
After merging, all unitigs and colors of the input colored and compacted de Bruijn graphs have been added to and compacted with the current colored and compacted de Bruijn graph (this). The input graphs are cleared before the function returns. If the input unitigs had data of type "MyUnitigData" associated, they have been added to the current colored and compacted de Bruijn graph using the functions of the class MyUnitigData which are also present in its base class CCDBG_Data_t<MyUnitigData>.
v | is an rvalue reference to a vector of colored and compacted de Bruijn graphs to merge. It can be obtained using std::move(). After merging, the graphs in v are cleared. |
nb_threads | is an integer indicating how many threads can be used during the merging. |
verbose | is a boolean indicating if information messages must be printed during the execution of the function. |
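The contract of the two vector overloads (inputs preserved versus inputs cleared) can be sketched with a stand-in type. KmerSetSketch below is hypothetical: it models a graph as a plain set of k-mer strings and ignores unitig compaction, colors, user data and threading, all of which the real merge handles.

```cpp
#include <cassert>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Stand-in graph (NOT the Bifrost class): just a set of k-mers.
struct KmerSetSketch {
    std::set<std::string> kmers;

    // Copying merge: the input graphs are left untouched.
    bool merge(const std::vector<KmerSetSketch>& v) {
        for (const KmerSetSketch& g : v) {
            kmers.insert(g.kmers.begin(), g.kmers.end());
        }
        return true;
    }

    // Consuming merge: each input graph is cleared once its content is absorbed.
    bool merge(std::vector<KmerSetSketch>&& v) {
        for (KmerSetSketch& g : v) {
            kmers.insert(g.kmers.begin(), g.kmers.end());
            g.kmers.clear();
        }
        return true;
    }
};
```

Passing an lvalue vector selects the copying overload; wrapping the vector in std::move() selects the consuming one, after which every graph in the vector is empty.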
bool ColoredCDBG< Unitig_data_t >::operator!= (const ColoredCDBG< Unitig_data_t > &o) const [inline]
Inequality operator.
ColoredCDBG& ColoredCDBG< Unitig_data_t >::operator+= (const ColoredCDBG< Unitig_data_t > &o)
Addition assignment operator (merge a colored cdBG).
After merging, all unitigs and colors of o have been added to and compacted with the current colored and compacted de Bruijn graph (this). If the unitigs of o had data of type "MyUnitigData" associated, they have been added to the current colored and compacted de Bruijn graph using the functions of the class MyUnitigData which are in base class CCDBG_Data_t<MyUnitigData>. This function is similar to ColoredCDBG::merge except that it uses only one thread while ColoredCDBG::merge can work with multiple threads (number of threads provided as a parameter). Note that if multiple colored and compacted de Bruijn graphs have to be merged, it is more efficient to call ColoredCDBG::merge with a vector of ColoredCDBG as input.
o | is a constant reference to the colored and compacted de Bruijn graph to merge. |
ColoredCDBG& ColoredCDBG< Unitig_data_t >::operator= (ColoredCDBG< Unitig_data_t > &&o)
Move assignment operator (move a colored cdBG).
The content of o is moved ("transferred") to the current colored and compacted de Bruijn graph (this). The colored and compacted de Bruijn graph referenced by o will be empty after the call to this operator.
o | is an rvalue reference to the colored and compacted de Bruijn graph to move. |
ColoredCDBG& ColoredCDBG< Unitig_data_t >::operator= (const ColoredCDBG< Unitig_data_t > &o)
Copy assignment operator (copy a colored cdBG).
This function is expensive in terms of time and memory as the content of a colored and compacted de Bruijn graph is copied. After the call to this function, the same graph exists twice in memory.
o | is a constant reference to the colored and compacted de Bruijn graph to copy. |
bool ColoredCDBG< Unitig_data_t >::operator== (const ColoredCDBG< Unitig_data_t > &o) const
Equality operator.
bool ColoredCDBG< Unitig_data_t >::read (const string &input_graph_fn, const string &input_colors_fn, const size_t nb_threads=1, const bool verbose=false)
Read a colored and compacted de Bruijn graph from disk.
The graph (in GFA, FASTA or BFG format) must have been produced by Bifrost. By default, the function checks whether an index file (BFI format) exists for the input graph and, if so, uses it to load the graph. If no index file is found, loading is much slower than calling read() with an index filename as an input parameter.
input_graph_fn | is a string containing the prefix of the graph filename to read. |
input_colors_fn | is a string containing the prefix of the color filename to read. |
nb_threads | is the number of threads that can be used to read the graph and its colors from disk. |
verbose | is a boolean indicating if information messages are printed during reading (true) or not (false). |
bool ColoredCDBG< Unitig_data_t >::read (const string &input_graph_fn, const string &input_index_fn, const string &input_colors_fn, const size_t nb_threads=1, const bool verbose=false)
Read a colored and compacted de Bruijn graph from disk using an index file.
The graph (in GFA, FASTA or BFG format) must have been produced by Bifrost.
input_graph_fn | is a string containing the prefix of the graph filename to read. |
input_index_fn | is a string containing the prefix of the index filename to read. |
input_colors_fn | is a string containing the prefix of the color filename to read. |
nb_threads | is the number of threads that can be used to read the graph and its colors from disk. |
verbose | is a boolean indicating if information messages are printed during reading (true) or not (false). |
bool ColoredCDBG< Unitig_data_t >::write (const string &prefix_output_fn, const size_t nb_threads=1, const bool write_index_file=true, const bool compress_output=false, const bool verbose=false) const
Write a colored and compacted de Bruijn graph to disk.
prefix_output_fn | is a string which is the prefix of the filename for the two files that are going to be written to disk. Assuming the prefix is "XXX", two files "XXX.gfa" and "XXX.color.bfg" will be written to disk. |
nb_threads | is the number of threads that can be used to write the graph to disk. |
write_index_file | indicates if a graph index file is written to disk. Index files enable faster graph loading. |
compress_output | indicates if the output file is compressed. |
verbose | is a boolean indicating if information messages are printed during writing (true) or not (false). |
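The filename convention described for prefix_output_fn can be written out as a small helper. output_filenames is hypothetical and not part of Bifrost; it only reproduces the "XXX.gfa" and "XXX.color.bfg" case given above and makes no claim about the names used for index files or compressed output.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical helper (not part of Bifrost): derive the two output filenames
// from a prefix, following the "XXX.gfa" / "XXX.color.bfg" naming described above.
// first = graph file, second = color file.
std::pair<std::string, std::string> output_filenames(const std::string& prefix) {
    return std::make_pair(prefix + ".gfa", prefix + ".color.bfg");
}
```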