Bifrost
ColoredCDBG< Unitig_data_t > Class Template Reference

Represent a Colored and Compacted de Bruijn graph. More...

Inherits CompactedDBG< DataAccessor< void >, DataStorage< void > >.

Public Member Functions

 ColoredCDBG (int kmer_length=31, int minimizer_length=-1)
 Constructor (set up an empty colored cdBG). More...
 
 ColoredCDBG (const ColoredCDBG &o)
 Copy constructor (copy a colored cdBG). More...
 
 ColoredCDBG (ColoredCDBG &&o)
 Move constructor (move a colored cdBG). More...
 
ColoredCDBG & operator= (const ColoredCDBG &o)
 Copy assignment operator (copy a colored cdBG). More...
 
ColoredCDBG & operator= (ColoredCDBG &&o)
 Move assignment operator (move a colored cdBG). More...
 
bool operator== (const ColoredCDBG &o) const
 Equality operator. More...
 
bool operator!= (const ColoredCDBG &o) const
 Inequality operator. More...
 
ColoredCDBG & operator+= (const ColoredCDBG &o)
 Addition assignment operator (merge a colored cdBG). More...
 
void clear ()
 Clear the graph: remove unitigs, user data and colors, and reset its parameters.
 
bool buildGraph (const CCDBG_Build_opt &opt)
 Build the Colored and compacted de Bruijn graph (only the unitigs). More...
 
bool buildColors (const CCDBG_Build_opt &opt)
 Map the colors to the unitigs. More...
 
bool write (const string &prefix_output_fn, const size_t nb_threads=1, const bool write_index_file=true, const bool compress_output=false, const bool verbose=false) const
 Write a colored and compacted de Bruijn graph to disk. More...
 
bool read (const string &input_graph_fn, const string &input_colors_fn, const size_t nb_threads=1, const bool verbose=false)
 Read a colored and compacted de Bruijn graph from disk. More...
 
bool read (const string &input_graph_fn, const string &input_index_fn, const string &input_colors_fn, const size_t nb_threads=1, const bool verbose=false)
 Read a colored and compacted de Bruijn graph from disk using an index file. More...
 
bool merge (const ColoredCDBG &o, const size_t nb_threads=1, const bool verbose=false)
 Merge a colored and compacted de Bruijn graph. More...
 
bool merge (ColoredCDBG &&o, const size_t nb_threads=1, const bool verbose=false)
 Merge and clear a colored and compacted de Bruijn graph. More...
 
bool merge (const vector< ColoredCDBG > &v, const size_t nb_threads=1, const bool verbose=false)
 Merge multiple colored and compacted de Bruijn graphs. More...
 
bool merge (vector< ColoredCDBG > &&v, const size_t nb_threads=1, const bool verbose=false)
 Merge and clear multiple colored and compacted de Bruijn graphs. More...
 
string getColorName (const size_t color_id) const
 Get the name of a color. More...
 
vector< string > getColorNames () const
 Get the names of all colors. More...
 
size_t getNbColors () const
 Get the number of colors in the graph. More...
 
- Public Member Functions inherited from CompactedDBG< DataAccessor< void >, DataStorage< void > >
 CompactedDBG (const int kmer_length=31, const int minimizer_length=-1)
 Constructor (set up an empty compacted dBG). More...
 
 CompactedDBG (const CompactedDBG< U, G > &o)
 Copy constructor (copy a compacted de Bruijn graph). More...
 
 CompactedDBG (CompactedDBG< U, G > &&o)
 Move constructor (move a compacted de Bruijn graph). More...
 
virtual ~CompactedDBG ()
 Destructor.
 
CompactedDBG< U, G > & operator= (const CompactedDBG< U, G > &o)
 Copy assignment operator (copy a compacted de Bruijn graph). More...
 
CompactedDBG< U, G > & operator= (CompactedDBG< U, G > &&o)
 Move assignment operator (move a compacted de Bruijn graph). More...
 
CompactedDBG< U, G > & operator+= (const CompactedDBG< U, G > &o)
 Addition assignment operator (merge a compacted de Bruijn graph). More...
 
bool operator== (const CompactedDBG< U, G > &o) const
 Equality operator. More...
 
bool operator!= (const CompactedDBG< U, G > &o) const
 Inequality operator. More...
 
void clear ()
 Clear the graph: empty the graph and reset its parameters.
 
bool build (CDBG_Build_opt &opt)
 Build the Compacted de Bruijn graph. More...
 
bool simplify (const bool delete_short_isolated_unitigs=true, const bool clip_short_tips=true, const bool verbose=false)
 Simplify the Compacted de Bruijn graph: clip short (< 2k length) tips and/or delete short (< 2k length) isolated unitigs. More...
 
bool write (const string &output_fn, const size_t nb_threads=1, const bool GFA_output=true, const bool FASTA_output=false, const bool BFG_output=false, const bool write_index_file=true, const bool compressed_output=false, const bool verbose=false) const
 Write the Compacted de Bruijn graph to disk (GFA1 format). More...
 
bool read (const string &input_graph_fn, const size_t nb_threads=1, const bool verbose=false)
 Load a Compacted de Bruijn graph from disk (GFA1 or FASTA format). More...
 
bool read (const string &input_graph_fn, const string &input_index_fn, const size_t nb_threads=1, const bool verbose=false)
 Read a Compacted de Bruijn graph from disk (GFA1, FASTA or BFG format) using an index file (BFI format). More...
 
UnitigMap< U, G > find (const Kmer &km, const bool extremities_only=false)
 Find the unitig containing the queried k-mer in the Compacted de Bruijn graph. More...
 
const_UnitigMap< U, G > find (const Kmer &km, const bool extremities_only=false) const
 Find the unitig containing the queried k-mer in the Compacted de Bruijn graph. More...
 
UnitigMap< U, G > findUnitig (const char *s, const size_t pos, const size_t len)
 Find the unitig containing the k-mer starting at a given position in a query sequence and extend the mapping (if the k-mer is found, the function extends the mapping from the k-mer as long as the query sequence and the unitig match). More...
 
const_UnitigMap< U, G > findUnitig (const char *s, const size_t pos, const size_t len) const
 Find the unitig containing the k-mer starting at a given position in a query sequence and extend the mapping (if the k-mer is found, the function extends the mapping from the k-mer as long as the query sequence and the unitig match). More...
 
vector< pair< size_t, UnitigMap< U, G > > > searchSequence (const string &s, const bool exact, const bool insertion, const bool deletion, const bool substitution, const bool or_exclusive_match=false)
 Performs exact and/or inexact search of the k-mers of a sequence query in the Compacted de Bruijn graph. More...
 
vector< pair< size_t, const_UnitigMap< U, G > > > searchSequence (const string &s, const bool exact, const bool insertion, const bool deletion, const bool substitution, const bool or_exclusive_match=false) const
 Performs exact and/or inexact search of the k-mers of a sequence query in the Compacted de Bruijn graph. More...
 
bool add (const string &seq, const bool verbose=false)
 Add a sequence to the Compacted de Bruijn graph. More...
 
bool remove (const const_UnitigMap< U, G > &um, const bool verbose=false)
 Remove a unitig from the Compacted de Bruijn graph. More...
 
bool merge (const CompactedDBG &o, const size_t nb_threads=1, const bool verbose=false)
 Merge a compacted de Bruijn graph. More...
 
bool merge (const vector< CompactedDBG > &v, const size_t nb_threads=1, const bool verbose=false)
 Merge multiple compacted de Bruijn graphs. More...
 
iterator begin ()
 Create an iterator to the first unitig of the Compacted de Bruijn graph (unitigs are NOT sorted lexicographically). More...
 
const_iterator begin () const
 Create a constant iterator to the first unitig of the Compacted de Bruijn graph (unitigs are NOT sorted lexicographically). More...
 
iterator end ()
 Create an iterator to the "past-the-last" unitig of the Compacted de Bruijn graph (unitigs are NOT sorted lexicographically). More...
 
const_iterator end () const
 Create a constant iterator to the "past-the-last" unitig of the Compacted de Bruijn graph (unitigs are NOT sorted lexicographically). More...
 
size_t length () const
 Return the sum of the unitig lengths. More...
 
size_t nbKmers () const
 Return the number of k-mers in the graph. More...
 
bool isInvalid () const
 Return a boolean indicating if the graph is invalid (wrong input parameters/files, error occurring during a method, etc.). More...
 
int getK () const
 Return the length of k-mers of the graph. More...
 
int getG () const
 Return the length of minimizers of the graph. More...
 
size_t size () const
 Return the number of unitigs in the graph. More...
 
G * getData ()
 Return a pointer to the graph data. More...
 
const G * getData () const
 Return a constant pointer to the graph data. More...
 

Additional Inherited Members

- Public Types inherited from CompactedDBG< DataAccessor< void >, DataStorage< void > >
typedef unitigIterator< U, G, false > iterator
 An iterator for the unitigs of the graph. More...
 
typedef unitigIterator< U, G, true > const_iterator
 A constant iterator for the unitigs of the graph. More...
 

Detailed Description

template<typename Unitig_data_t = void>
class ColoredCDBG< Unitig_data_t >

Represent a Colored and Compacted de Bruijn graph.

The class inherits from CompactedDBG which means that all public functions available with CompactedDBG are also available with ColoredCDBG.

ColoredCDBG<> ccdbg_1; // No unitig data
ColoredCDBG<void> ccdbg_2; // Equivalent to previous notation
ColoredCDBG<MyUnitigData> ccdbg_3; // An object of type MyUnitigData will be associated with each unitig (along with the colors)

If data are to be associated with the unitigs, these data must be wrapped into a class that inherits from the abstract class CCDBG_Data_t, such as in:

class MyUnitigData : public CCDBG_Data_t<MyUnitigData> { ... };

Because CCDBG_Data_t is an abstract class, all the methods of the base class (CCDBG_Data_t) must be implemented in your wrapper (the derived class, MyUnitigData in this example). IMPORTANT: If you do not implement those methods, default ones that have no effect will be applied.
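The interplay between user-defined methods and no-effect defaults can be illustrated with a minimal, self-contained sketch of the curiously recurring template pattern. It does not use Bifrost: UnitigDataBase, CountingData, PassiveData and the hooks clear() and merge() are hypothetical stand-ins chosen for illustration, and the real CCDBG_Data_t interface may declare different methods and signatures.

```cpp
#include <cassert>

// Hypothetical stand-in for a CRTP base such as CCDBG_Data_t<T>.
// Each hook has a default with no effect; callers go through the
// dispatch helpers, which pick the derived version when one exists.
template <typename Derived>
struct UnitigDataBase {
    void clear() {}                 // default: no effect
    void merge(const Derived&) {}   // default: no effect

    void callClear() { static_cast<Derived*>(this)->clear(); }
    void callMerge(const Derived& o) { static_cast<Derived*>(this)->merge(o); }
};

// Overrides both hooks, so its data follows the graph operations.
struct CountingData : UnitigDataBase<CountingData> {
    int count = 0;
    void clear() { count = 0; }
    void merge(const CountingData& o) { count += o.count; }
};

// Overrides nothing, so the no-effect defaults apply.
struct PassiveData : UnitigDataBase<PassiveData> {
    int count = 0;
};

// Usage sketch: only CountingData reacts to the hooks.
inline void crtpDefaultsExample() {
    CountingData a, b;
    a.count = 2;
    b.count = 3;
    a.callMerge(b);
    assert(a.count == 5);

    PassiveData p, q;
    p.count = 7;
    q.count = 1;
    p.callMerge(q);
    assert(p.count == 7);  // default merge() left p untouched
}
```

In this sketch, forgetting to implement a hook compiles without any diagnostic, which is why the silent defaults deserve attention when writing a data wrapper.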

Constructor & Destructor Documentation

◆ ColoredCDBG() [1/3]

template<typename Unitig_data_t = void>
ColoredCDBG< Unitig_data_t >::ColoredCDBG ( int  kmer_length = 31,
int  minimizer_length = -1 
)

Constructor (set up an empty colored cdBG).

Parameters
kmer_length is the length k of k-mers used in the graph (each unitig is of length at least k).
minimizer_length is the length g of minimizers (g < k) used in the graph.

◆ ColoredCDBG() [2/3]

template<typename Unitig_data_t = void>
ColoredCDBG< Unitig_data_t >::ColoredCDBG ( const ColoredCDBG< Unitig_data_t > &  o)

Copy constructor (copy a colored cdBG).

This function is expensive in terms of time and memory as the content of a colored and compacted de Bruijn graph is copied. After the call to this function, the same graph exists twice in memory.

Parameters
o is a constant reference to the colored and compacted de Bruijn graph to copy.

◆ ColoredCDBG() [3/3]

template<typename Unitig_data_t = void>
ColoredCDBG< Unitig_data_t >::ColoredCDBG ( ColoredCDBG< Unitig_data_t > &&  o)

Move constructor (move a colored cdBG).

The content of o is moved ("transferred") to a new colored and compacted de Bruijn graph. The colored and compacted de Bruijn graph referenced by o will be empty after the call to this constructor.

Parameters
o is an rvalue reference to the colored and compacted de Bruijn graph to move.

Member Function Documentation

◆ buildColors()

template<typename Unitig_data_t = void>
bool ColoredCDBG< Unitig_data_t >::buildColors ( const CCDBG_Build_opt &  opt)

Map the colors to the unitigs.

This is done by reading the input files and querying the graph. If a color filename is provided in opt.filename_colors_in, colors are loaded from that file instead.

Parameters
opt is a structure from which the members are parameters of this function. See CCDBG_Build_opt.
Returns
boolean indicating if the colors have been mapped successfully.

◆ buildGraph()

template<typename Unitig_data_t = void>
bool ColoredCDBG< Unitig_data_t >::buildGraph ( const CCDBG_Build_opt &  opt)

Build the Colored and compacted de Bruijn graph (only the unitigs).

A call to ColoredCDBG::buildColors is required afterwards to map colors to unitigs.

Parameters
opt is a structure from which the members are parameters of this function. See CCDBG_Build_opt.
Returns
boolean indicating if the graph has been built successfully.
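The two-phase construction (unitigs first, colors second) can be wrapped in a small helper that checks both boolean results. The sketch below is a template that relies only on the buildGraph() and buildColors() signatures listed on this page; StubBuildOpt and StubBuildGraph are hypothetical stand-ins so the snippet compiles without Bifrost, and the helper name buildGraphThenColors is an assumption, not part of the library.

```cpp
#include <cassert>

// Run both construction phases in the documented order and
// stop at the first failure.
template <typename Graph, typename BuildOpt>
bool buildGraphThenColors(Graph& graph, const BuildOpt& opt) {
    if (!graph.buildGraph(opt)) return false;  // phase 1: unitigs only
    return graph.buildColors(opt);             // phase 2: map colors to unitigs
}

// Hypothetical stand-ins, NOT Bifrost types: they only record calls.
struct StubBuildOpt {
    bool graph_ok = true;
    bool colors_ok = true;
};

struct StubBuildGraph {
    int graph_calls = 0;
    int color_calls = 0;
    bool buildGraph(const StubBuildOpt& o) { ++graph_calls; return o.graph_ok; }
    bool buildColors(const StubBuildOpt& o) { ++color_calls; return o.colors_ok; }
};

// Usage sketch: colors are never mapped when the graph phase fails.
inline void buildOrderExample() {
    StubBuildGraph g;
    StubBuildOpt failing;
    failing.graph_ok = false;
    assert(!buildGraphThenColors(g, failing));
    assert(g.color_calls == 0);
}
```

With Bifrost available, the same helper would presumably be instantiated with ColoredCDBG<> and CCDBG_Build_opt, since it only depends on the two member functions documented here.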

◆ getColorName()

template<typename Unitig_data_t = void>
string ColoredCDBG< Unitig_data_t >::getColorName ( const size_t  color_id) const

Get the name of a color.

As colors match the input files, the color names match the input filenames.

Returns
a string which is either a color name or an empty string if the color ID is invalid or if the colors have not yet been mapped to the unitigs.

◆ getColorNames()

template<typename Unitig_data_t = void>
vector<string> ColoredCDBG< Unitig_data_t >::getColorNames ( ) const

Get the names of all colors.

As colors match the input files, the color names match the input filenames.

Returns
a vector of strings for which each string is either a color name or an empty string if the color ID is invalid or if the colors have not yet been mapped to the unitigs.

◆ getNbColors()

template<typename Unitig_data_t = void>
size_t ColoredCDBG< Unitig_data_t >::getNbColors ( ) const
inline

Get the number of colors in the graph.

Returns
the number of colors in the graph.
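The color accessors can be combined to enumerate color IDs with their names. The sketch below assumes that valid color IDs run from 0 to getNbColors() - 1, which this page does not state explicitly; listColors and StubColoredGraph are hypothetical names, and the stub only imitates the documented behavior of returning an empty string for an invalid ID.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Collect (color ID, color name) pairs, skipping IDs whose name is empty
// (documented as meaning "invalid ID" or "colors not mapped yet").
template <typename Graph>
std::vector<std::pair<std::size_t, std::string>> listColors(const Graph& graph) {
    std::vector<std::pair<std::size_t, std::string>> out;
    const std::size_t nb_colors = graph.getNbColors();
    for (std::size_t id = 0; id < nb_colors; ++id) {  // assumption: IDs are 0..n-1
        std::string name = graph.getColorName(id);
        if (!name.empty()) out.emplace_back(id, std::move(name));
    }
    return out;
}

// Hypothetical stand-in, NOT a Bifrost type.
struct StubColoredGraph {
    std::vector<std::string> names;
    std::size_t getNbColors() const { return names.size(); }
    std::string getColorName(const std::size_t id) const {
        return id < names.size() ? names[id] : std::string();
    }
};

// Usage sketch: names mirror input filenames, as described in the text.
inline void listColorsExample() {
    StubColoredGraph g;
    g.names = {"sample_a.fasta", "sample_b.fasta"};
    const auto colors = listColors(g);
    assert(colors.size() == 2);
    assert(colors[0].second == "sample_a.fasta");
}
```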

◆ merge() [1/4]

template<typename Unitig_data_t = void>
bool ColoredCDBG< Unitig_data_t >::merge ( ColoredCDBG< Unitig_data_t > &&  o,
const size_t  nb_threads = 1,
const bool  verbose = false 
)

Merge and clear a colored and compacted de Bruijn graph.

After merging, all unitigs and colors of the input graph have been added to and compacted with the current colored and compacted de Bruijn graph (this). The input graph is cleared before the function returns. If the unitigs of the input graph had data of type "MyUnitigData" associated, they have been added to the current colored and compacted de Bruijn graph using the functions of the class MyUnitigData which are also present in its base class CCDBG_Data_t<MyUnitigData>. Note that if multiple colored and compacted de Bruijn graphs have to be merged, it is more efficient to call ColoredCDBG::merge with a vector of ColoredCDBG as input.

Parameters
o is an rvalue reference to the colored and compacted de Bruijn graph to merge. It can be obtained using std::move(). After merging, the graph referenced by o is cleared.
nb_threads is an integer indicating how many threads can be used during the merging.
verbose is a boolean indicating if information messages must be printed during the execution of the function.
Returns
a boolean indicating if the graph has been successfully merged.

◆ merge() [2/4]

template<typename Unitig_data_t = void>
bool ColoredCDBG< Unitig_data_t >::merge ( const ColoredCDBG< Unitig_data_t > &  o,
const size_t  nb_threads = 1,
const bool  verbose = false 
)

Merge a colored and compacted de Bruijn graph.

After merging, all unitigs and colors of the input graph have been added to and compacted with the current colored and compacted de Bruijn graph (this). If the unitigs of the input graph had data of type "MyUnitigData" associated, they have been added to the current colored and compacted de Bruijn graph using the functions of the class MyUnitigData which are also present in its base class CCDBG_Data_t<MyUnitigData>. Note that if multiple colored and compacted de Bruijn graphs have to be merged, it is more efficient to call ColoredCDBG::merge with a vector of ColoredCDBG as input.

Parameters
o is a constant reference to the colored and compacted de Bruijn graph to merge.
nb_threads is an integer indicating how many threads can be used during the merging.
verbose is a boolean indicating if information messages must be printed during the execution of the function.
Returns
a boolean indicating if the graph has been successfully merged.

◆ merge() [3/4]

template<typename Unitig_data_t = void>
bool ColoredCDBG< Unitig_data_t >::merge ( const vector< ColoredCDBG< Unitig_data_t > > &  v,
const size_t  nb_threads = 1,
const bool  verbose = false 
)

Merge multiple colored and compacted de Bruijn graphs.

After merging, all unitigs and colors of the input colored and compacted de Bruijn graphs have been added to and compacted with the current colored and compacted de Bruijn graph (this). If the unitigs had data of type "MyUnitigData" associated, they have been added to the current colored and compacted de Bruijn graph using the functions of the class MyUnitigData which are also present in its base class CCDBG_Data_t<MyUnitigData>.

Parameters
v is a constant reference to a vector of colored and compacted de Bruijn graphs to merge.
nb_threads is an integer indicating how many threads can be used during the merging.
verbose is a boolean indicating if information messages must be printed during the execution of the function.
Returns
a boolean indicating if the graphs have been successfully merged.

◆ merge() [4/4]

template<typename Unitig_data_t = void>
bool ColoredCDBG< Unitig_data_t >::merge ( vector< ColoredCDBG< Unitig_data_t > > &&  v,
const size_t  nb_threads = 1,
const bool  verbose = false 
)

Merge and clear multiple colored and compacted de Bruijn graphs.

After merging, all unitigs and colors of the input colored and compacted de Bruijn graphs have been added to and compacted with the current colored and compacted de Bruijn graph (this). The input graphs are cleared before the function returns. If the input unitigs had data of type "MyUnitigData" associated, they have been added to the current colored and compacted de Bruijn graph using the functions of the class MyUnitigData which are also present in its base class CCDBG_Data_t<MyUnitigData>.

Parameters
v is an rvalue reference to a vector of colored and compacted de Bruijn graphs to merge. It can be obtained using std::move(). After merging, the graphs in v are cleared.
nb_threads is an integer indicating how many threads can be used during the merging.
verbose is a boolean indicating if information messages must be printed during the execution of the function.
Returns
a boolean indicating if the graphs have been successfully merged.
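The recommendation to merge several graphs through a single vector call, handing the inputs over with std::move(), can be sketched as a small template. It relies only on the merge(vector&&, nb_threads, verbose) signature listed on this page; mergeAllAndClear and StubMergeGraph are hypothetical, and the stub crudely imitates merging by summing unitig counts rather than compacting anything.

```cpp
#include <cassert>
#include <cstddef>
#include <utility>
#include <vector>

// Merge all graphs in `parts` into `target` with one call to the
// rvalue-vector overload, so the inputs are cleared by the merge.
template <typename Graph>
bool mergeAllAndClear(Graph& target, std::vector<Graph>& parts,
                      const std::size_t nb_threads = 1) {
    return target.merge(std::move(parts), nb_threads, false);
}

// Hypothetical stand-in, NOT a Bifrost type: "merging" only adds counts.
struct StubMergeGraph {
    std::size_t nb_unitigs = 0;
    void clear() { nb_unitigs = 0; }
    bool merge(std::vector<StubMergeGraph>&& v,
               const std::size_t /*nb_threads*/ = 1,
               const bool /*verbose*/ = false) {
        for (StubMergeGraph& g : v) {
            nb_unitigs += g.nb_unitigs;
            g.clear();  // imitates "the input graphs are cleared"
        }
        return true;
    }
};

// Usage sketch: the inputs are empty once the merge returns.
inline void mergeExample() {
    StubMergeGraph target;
    std::vector<StubMergeGraph> parts(2);
    parts[0].nb_unitigs = 2;
    parts[1].nb_unitigs = 3;
    assert(mergeAllAndClear(target, parts, 4));
    assert(target.nb_unitigs == 5);
    assert(parts[0].nb_unitigs == 0);
}
```

Taking the vector by lvalue reference and moving it inside the helper keeps the call site short; callers should treat the vector contents as consumed afterwards.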

◆ operator!=()

template<typename Unitig_data_t = void>
bool ColoredCDBG< Unitig_data_t >::operator!= ( const ColoredCDBG< Unitig_data_t > &  o) const
inline

Inequality operator.

Returns
a boolean indicating if two colored and compacted de Bruijn graphs have different colored unitigs (does not compare the data associated with the unitigs).

◆ operator+=()

template<typename Unitig_data_t = void>
ColoredCDBG& ColoredCDBG< Unitig_data_t >::operator+= ( const ColoredCDBG< Unitig_data_t > &  o)

Addition assignment operator (merge a colored cdBG).

After merging, all unitigs and colors of o have been added to and compacted with the current colored and compacted de Bruijn graph (this). If the unitigs of o had data of type "MyUnitigData" associated, they have been added to the current colored and compacted de Bruijn graph using the functions of the class MyUnitigData which are in base class CCDBG_Data_t<MyUnitigData>. This function is similar to ColoredCDBG::merge except that it uses only one thread while ColoredCDBG::merge can work with multiple threads (number of threads provided as a parameter). Note that if multiple colored and compacted de Bruijn graphs have to be merged, it is more efficient to call ColoredCDBG::merge with a vector of ColoredCDBG as input.

Parameters
o is a constant reference to the colored and compacted de Bruijn graph to merge.
Returns
a reference to the current colored and compacted de Bruijn graph after merging.

◆ operator=() [1/2]

template<typename Unitig_data_t = void>
ColoredCDBG& ColoredCDBG< Unitig_data_t >::operator= ( ColoredCDBG< Unitig_data_t > &&  o)

Move assignment operator (move a colored cdBG).

The content of o is moved ("transferred") to the current colored and compacted de Bruijn graph (this). The colored and compacted de Bruijn graph referenced by o will be empty after the call to this operator.

Parameters
o is an rvalue reference to the colored and compacted de Bruijn graph to move.
Returns
a reference to the colored and compacted de Bruijn graph which has (and owns) the content of o.

◆ operator=() [2/2]

template<typename Unitig_data_t = void>
ColoredCDBG& ColoredCDBG< Unitig_data_t >::operator= ( const ColoredCDBG< Unitig_data_t > &  o)

Copy assignment operator (copy a colored cdBG).

This function is expensive in terms of time and memory as the content of a colored and compacted de Bruijn graph is copied. After the call to this function, the same graph exists twice in memory.

Parameters
o is a constant reference to the colored and compacted de Bruijn graph to copy.
Returns
a reference to the colored and compacted de Bruijn graph which is the copy.

◆ operator==()

template<typename Unitig_data_t = void>
bool ColoredCDBG< Unitig_data_t >::operator== ( const ColoredCDBG< Unitig_data_t > &  o) const

Equality operator.

Returns
a boolean indicating if two colored and compacted de Bruijn graphs have the same colored unitigs (does not compare the data associated with the unitigs).

◆ read() [1/2]

template<typename Unitig_data_t = void>
bool ColoredCDBG< Unitig_data_t >::read ( const string &  input_graph_fn,
const string &  input_colors_fn,
const size_t  nb_threads = 1,
const bool  verbose = false 
)

Read a colored and compacted de Bruijn graph from disk.

The graph (in GFA, FASTA or BFG format) must have been produced by Bifrost. By default, the function checks whether an index file (BFI format) exists for the input graph and uses it to load the graph. If no index file is found, loading is much slower than calling the read() overload that takes the index filename as a parameter.

Parameters
input_graph_fn is a string which is the prefix of the graph filename to read.
input_colors_fn is a string which is the prefix of the color filename to read.
nb_threads is the number of threads that can be used to read the graph and its colors from disk.
verbose is a boolean indicating if information messages are printed during reading (true) or not (false).
Returns
a boolean indicating if the graph was successfully read.

◆ read() [2/2]

template<typename Unitig_data_t = void>
bool ColoredCDBG< Unitig_data_t >::read ( const string &  input_graph_fn,
const string &  input_index_fn,
const string &  input_colors_fn,
const size_t  nb_threads = 1,
const bool  verbose = false 
)

Read a colored and compacted de Bruijn graph from disk using an index file.

The graph (in GFA, FASTA or BFG format) must have been produced by Bifrost.

Parameters
input_graph_fn is a string which is the prefix of the graph filename to read.
input_index_fn is a string which is the prefix of the index filename to read.
input_colors_fn is a string which is the prefix of the color filename to read.
nb_threads is the number of threads that can be used to read the graph and its colors from disk.
verbose is a boolean indicating if information messages are printed during reading (true) or not (false).
Returns
a boolean indicating if the graph was successfully read.

◆ write()

template<typename Unitig_data_t = void>
bool ColoredCDBG< Unitig_data_t >::write ( const string &  prefix_output_fn,
const size_t  nb_threads = 1,
const bool  write_index_file = true,
const bool  compress_output = false,
const bool  verbose = false 
) const

Write a colored and compacted de Bruijn graph to disk.

Parameters
prefix_output_fn is a string which is the prefix of the filename for the two files that are going to be written to disk. Assuming the prefix is "XXX", two files "XXX.gfa" and "XXX.color.bfg" will be written to disk.
nb_threads is the number of threads that can be used to write the graph to disk.
write_index_file indicates if a graph index file is written to disk. Graph index files enable faster graph loading.
compress_output indicates if the output file is compressed.
verbose is a boolean indicating if information messages are printed during writing (true) or not (false).
Returns
a boolean indicating if the graph was successfully written.
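The naming rule for the two output files can be written down as a tiny helper, for instance to check that both files exist after a call to write(). The sketch only mirrors the "XXX" example given for prefix_output_fn; expectedOutputFiles is a hypothetical name, and the helper deliberately says nothing about index files or about how compress_output might change the names, since this page does not specify that.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Return {graph filename, color filename} for an output prefix,
// following the "XXX" -> "XXX.gfa" / "XXX.color.bfg" example.
inline std::pair<std::string, std::string>
expectedOutputFiles(const std::string& prefix) {
    return {prefix + ".gfa", prefix + ".color.bfg"};
}

// Usage sketch with an arbitrary prefix.
inline void outputFilesExample() {
    const auto files = expectedOutputFiles("my_graph");
    assert(files.first == "my_graph.gfa");
    assert(files.second == "my_graph.color.bfg");
}
```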

The documentation for this class was generated from the following file:
ColoredCDBG.hpp