Bifrost
CompactedDBG< Unitig_data_t, Graph_data_t > Class Template Reference

Represent a Compacted de Bruijn graph.

Public Types

typedef unitigIterator< U, G, false > iterator
 An iterator for the unitigs of the graph.
 
typedef unitigIterator< U, G, true > const_iterator
 A constant iterator for the unitigs of the graph.
 

Public Member Functions

 CompactedDBG (const int kmer_length=31, const int minimizer_length=-1)
 Constructor (set up an empty compacted dBG).
 
 CompactedDBG (const CompactedDBG< U, G > &o)
 Copy constructor (copy a compacted de Bruijn graph).
 
 CompactedDBG (CompactedDBG< U, G > &&o)
 Move constructor (move a compacted de Bruijn graph).
 
virtual ~CompactedDBG ()
 Destructor.
 
CompactedDBG< U, G > & operator= (const CompactedDBG< U, G > &o)
 Copy assignment operator (copy a compacted de Bruijn graph).
 
CompactedDBG< U, G > & operator= (CompactedDBG< U, G > &&o)
 Move assignment operator (move a compacted de Bruijn graph).
 
CompactedDBG< U, G > & operator+= (const CompactedDBG< U, G > &o)
 Addition assignment operator (merge a compacted de Bruijn graph).
 
bool operator== (const CompactedDBG< U, G > &o) const
 Equality operator.
 
bool operator!= (const CompactedDBG< U, G > &o) const
 Inequality operator.
 
void clear ()
 Clear the graph: empty the graph and reset its parameters.
 
bool build (CDBG_Build_opt &opt)
 Build the Compacted de Bruijn graph.
 
bool simplify (const bool delete_short_isolated_unitigs=true, const bool clip_short_tips=true, const bool verbose=false)
 Simplify the Compacted de Bruijn graph: clip short (< 2k length) tips and/or delete short (< 2k length) isolated unitigs.
 
bool write (const string &output_fn, const size_t nb_threads=1, const bool GFA_output=true, const bool FASTA_output=false, const bool BFG_output=false, const bool write_index_file=true, const bool compressed_output=false, const bool verbose=false) const
 Write the Compacted de Bruijn graph to disk (GFA1, FASTA or BFG format).
 
bool read (const string &input_graph_fn, const size_t nb_threads=1, const bool verbose=false)
 Load a Compacted de Bruijn graph from disk (GFA1 or FASTA format).
 
bool read (const string &input_graph_fn, const string &input_index_fn, const size_t nb_threads=1, const bool verbose=false)
 Read a Compacted de Bruijn graph from disk (GFA1, FASTA or BFG format) using an index file (BFI format).
 
UnitigMap< U, G > find (const Kmer &km, const bool extremities_only=false)
 Find the unitig containing the queried k-mer in the Compacted de Bruijn graph.
 
const_UnitigMap< U, G > find (const Kmer &km, const bool extremities_only=false) const
 Find the unitig containing the queried k-mer in the Compacted de Bruijn graph.
 
UnitigMap< U, G > findUnitig (const char *s, const size_t pos, const size_t len)
 Find the unitig containing the k-mer starting at a given position in a query sequence and extend the mapping (if the k-mer is found, the function extends the mapping from the k-mer as long as the query sequence and the unitig match).
 
const_UnitigMap< U, G > findUnitig (const char *s, const size_t pos, const size_t len) const
 Find the unitig containing the k-mer starting at a given position in a query sequence and extend the mapping (if the k-mer is found, the function extends the mapping from the k-mer as long as the query sequence and the unitig match).
 
vector< pair< size_t, UnitigMap< U, G > > > searchSequence (const string &s, const bool exact, const bool insertion, const bool deletion, const bool substitution, const bool or_exclusive_match=false)
 Performs exact and/or inexact search of the k-mers of a sequence query in the Compacted de Bruijn graph.
 
vector< pair< size_t, const_UnitigMap< U, G > > > searchSequence (const string &s, const bool exact, const bool insertion, const bool deletion, const bool substitution, const bool or_exclusive_match=false) const
 Performs exact and/or inexact search of the k-mers of a sequence query in the Compacted de Bruijn graph.
 
bool add (const string &seq, const bool verbose=false)
 Add a sequence to the Compacted de Bruijn graph.
 
bool remove (const const_UnitigMap< U, G > &um, const bool verbose=false)
 Remove a unitig from the Compacted de Bruijn graph.
 
bool merge (const CompactedDBG &o, const size_t nb_threads=1, const bool verbose=false)
 Merge a compacted de Bruijn graph.
 
bool merge (const vector< CompactedDBG > &v, const size_t nb_threads=1, const bool verbose=false)
 Merge multiple compacted de Bruijn graphs.
 
iterator begin ()
 Create an iterator to the first unitig of the Compacted de Bruijn graph (unitigs are NOT sorted lexicographically).
 
const_iterator begin () const
 Create a constant iterator to the first unitig of the Compacted de Bruijn graph (unitigs are NOT sorted lexicographically).
 
iterator end ()
 Create an iterator to the "past-the-last" unitig of the Compacted de Bruijn graph (unitigs are NOT sorted lexicographically).
 
const_iterator end () const
 Create a constant iterator to the "past-the-last" unitig of the Compacted de Bruijn graph (unitigs are NOT sorted lexicographically).
 
size_t length () const
 Return the sum of the unitig lengths.
 
size_t nbKmers () const
 Return the number of k-mers in the graph.
 
bool isInvalid () const
 Return a boolean indicating if the graph is invalid (wrong input parameters/files, error occurring during a method, etc.).
 
int getK () const
 Return the length of k-mers of the graph.
 
int getG () const
 Return the length of minimizers of the graph.
 
size_t size () const
 Return the number of unitigs in the graph.
 
G * getData ()
 Return a pointer to the graph data.
 
const G * getData () const
 Return a constant pointer to the graph data.
 

Detailed Description

template<typename Unitig_data_t = void, typename Graph_data_t = void>
class CompactedDBG< Unitig_data_t, Graph_data_t >

Represent a Compacted de Bruijn graph.

The two template parameters of this class correspond to the type of data to associate with the unitigs of the graph (unitig data) and the type of data to associate with the graph (graph data). If no template parameters are specified or if the types are void, no data are associated with the unitigs or the graph, and no memory is allocated for such data.

CompactedDBG<> cdbg_1; // No unitig data, no graph data
CompactedDBG<void> cdbg_2; // Equivalent to previous notation
CompactedDBG<void, void> cdbg_3; // Equivalent to previous notation
CompactedDBG<MyUnitigData> cdbg_4; // An object of type MyUnitigData will be associated with each unitig, no graph data
CompactedDBG<MyUnitigData, void> cdbg_5; // Equivalent to previous notation
CompactedDBG<void, MyGraphData> cdbg_6; // No unitig data, an object of type MyGraphData will be associated with the graph
CompactedDBG<MyUnitigData, MyGraphData> cdbg_7; // Unitig data of type MyUnitigData for each unitig, graph data of type MyGraphData

If data are to be associated with the unitigs, these data must be wrapped into a class that inherits from the abstract class CDBG_Data_t, such as in:

class MyUnitigData : public CDBG_Data_t<MyUnitigData> { ... };

Because CDBG_Data_t is an abstract class, your wrapper (the derived class, MyUnitigData in this example) is expected to implement all the methods of the base class (CDBG_Data_t). IMPORTANT: any method you do not implement in your class falls back to a default that has no effect.
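The inheritance pattern described above (a base class templated on the derived type, with defaults that do nothing) can be sketched without Bifrost. This is a minimal illustration only: ToyData_t, CountData, EmptyData and the method names clear and describe are hypothetical and are not the actual CDBG_Data_t interface.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a base class such as CDBG_Data_t<T>. The method
// names are illustrative and are NOT Bifrost's actual interface.
template<typename Derived>
class ToyData_t {
public:
    // Defaults with no effect: a derived class that does not provide its own
    // version silently gets these.
    void clear() {}
    std::string describe() const { return std::string(); }

    // The base dispatches to the derived class at compile time.
    void callClear() { static_cast<Derived*>(this)->clear(); }
    std::string callDescribe() const {
        return static_cast<const Derived*>(this)->describe();
    }
};

// Provides its own versions of both methods.
class CountData : public ToyData_t<CountData> {
public:
    int count = 0;
    void clear() { count = 0; }
    std::string describe() const { return "count=" + std::to_string(count); }
};

// Provides nothing: the no-effect defaults apply.
class EmptyData : public ToyData_t<EmptyData> {};

// Small self-check contrasting the two classes.
void checkToyData() {
    CountData c;
    c.count = 5;
    c.callClear();
    assert(c.count == 0);               // derived clear() ran
    EmptyData e;
    e.callClear();                      // default clear(): nothing happens
    assert(e.callDescribe().empty());   // default describe(): empty string
}
```

The point of the sketch is the failure mode: EmptyData compiles and runs, but its data is never updated, so a forgotten method shows up as missing behavior rather than as a compiler error.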

Member Typedef Documentation

◆ const_iterator

template<typename Unitig_data_t = void, typename Graph_data_t = void>
typedef unitigIterator<U, G, true> CompactedDBG< Unitig_data_t, Graph_data_t >::const_iterator

A constant iterator for the unitigs of the graph.

No specific order is assumed.

◆ iterator

template<typename Unitig_data_t = void, typename Graph_data_t = void>
typedef unitigIterator<U, G, false> CompactedDBG< Unitig_data_t, Graph_data_t >::iterator

An iterator for the unitigs of the graph.

No specific order is assumed.

Constructor & Destructor Documentation

◆ CompactedDBG() [1/3]

template<typename Unitig_data_t = void, typename Graph_data_t = void>
CompactedDBG< Unitig_data_t, Graph_data_t >::CompactedDBG ( const int  kmer_length = 31,
const int  minimizer_length = -1 
)

Constructor (set up an empty compacted dBG).

Parameters
kmer_length is the length k of k-mers used in the graph (each unitig is of length at least k).
minimizer_length is the length g of minimizers (g < k) used in the graph.
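To make the relation between k and g concrete, here is a minimal standalone sketch of picking a minimizer of length g inside a k-mer. It assumes plain lexicographic order on the forward strand, which is a teaching convention and not necessarily the ordering Bifrost uses; lexMinimizer is a hypothetical helper.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Return the lexicographically smallest substring of length g of a k-mer.
// Assumption: forward strand only, lexicographic order (illustrative choice).
std::string lexMinimizer(const std::string& kmer, const std::size_t g) {
    assert(g > 0 && g < kmer.size()); // mirrors the g < k requirement
    std::string best = kmer.substr(0, g);
    for (std::size_t i = 1; i + g <= kmer.size(); ++i) {
        // Compare the g bases starting at i against the current best.
        if (kmer.compare(i, g, best) < 0) best = kmer.substr(i, g);
    }
    return best;
}
```

For the 7-mer GATTACA with g = 3, the candidates are GAT, ATT, TTA, TAC and ACA, so the sketch returns ACA.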

◆ CompactedDBG() [2/3]

template<typename Unitig_data_t = void, typename Graph_data_t = void>
CompactedDBG< Unitig_data_t, Graph_data_t >::CompactedDBG ( const CompactedDBG< U, G > &  o)

Copy constructor (copy a compacted de Bruijn graph).

This function is expensive in terms of time and memory as the content of a compacted de Bruijn graph is copied. After the call to this function, the same graph exists twice in memory.

Parameters
o is a constant reference to the compacted de Bruijn graph to copy.

◆ CompactedDBG() [3/3]

template<typename Unitig_data_t = void, typename Graph_data_t = void>
CompactedDBG< Unitig_data_t, Graph_data_t >::CompactedDBG ( CompactedDBG< U, G > &&  o)

Move constructor (move a compacted de Bruijn graph).

The content of o is moved ("transferred") to a new compacted de Bruijn graph. The compacted de Bruijn graph referenced by o will be empty after the call to this constructor.

Parameters
o is an rvalue reference to the compacted de Bruijn graph to move.

Member Function Documentation

◆ add()

template<typename Unitig_data_t = void, typename Graph_data_t = void>
bool CompactedDBG< Unitig_data_t, Graph_data_t >::add ( const string &  seq,
const bool  verbose = false 
)

Add a sequence to the Compacted de Bruijn graph.

Non-{A,C,G,T} characters such as Ns are discarded. The function automatically breaks the sequence into unitig(s). Those unitigs can be stored as the reverse-complement of the input sequence.

Parameters
seq is a string containing the sequence to insert.
verbose is a boolean indicating if information messages must be printed during the function execution.
Returns
a boolean indicating if the sequence was successfully inserted in the graph.
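Because a unitig may be stored as the reverse complement of the inserted sequence, comparisons against stored sequences generally have to be strand-agnostic. The standalone sketch below shows the usual way to do that; the helper names are hypothetical, and taking the lexicographic minimum as the "canonical" form is a common convention assumed here rather than something stated on this page.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Complement of a single base; anything outside {A,C,G,T} maps to 'N'.
char complementBase(const char c) {
    switch (c) {
        case 'A': return 'T';
        case 'C': return 'G';
        case 'G': return 'C';
        case 'T': return 'A';
        default:  return 'N';
    }
}

// Reverse the sequence, then complement every base.
std::string reverseComplement(const std::string& s) {
    std::string rc(s.rbegin(), s.rend());
    std::transform(rc.begin(), rc.end(), rc.begin(), complementBase);
    return rc;
}

// Strand-independent representative: the smaller of a sequence and its
// reverse complement (assumed convention: lexicographic order).
std::string canonical(const std::string& s) {
    assert(!s.empty());
    const std::string rc = reverseComplement(s);
    return std::min(s, rc);
}
```

Two sequences describe the same double-stranded molecule exactly when their canonical forms are equal, for example AAC and GTT.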

◆ begin() [1/2]

template<typename Unitig_data_t = void, typename Graph_data_t = void>
iterator CompactedDBG< Unitig_data_t, Graph_data_t >::begin ( )

Create an iterator to the first unitig of the Compacted de Bruijn graph (unitigs are NOT sorted lexicographically).

Returns
an iterator to the first unitig of the graph.

◆ begin() [2/2]

template<typename Unitig_data_t = void, typename Graph_data_t = void>
const_iterator CompactedDBG< Unitig_data_t, Graph_data_t >::begin ( ) const

Create a constant iterator to the first unitig of the Compacted de Bruijn graph (unitigs are NOT sorted lexicographically).

Returns
a constant iterator to the first unitig of the graph.

◆ build()

template<typename Unitig_data_t = void, typename Graph_data_t = void>
bool CompactedDBG< Unitig_data_t, Graph_data_t >::build ( CDBG_Build_opt &  opt)

Build the Compacted de Bruijn graph.

Parameters
opt is a structure whose members are the parameters of this function. See CDBG_Build_opt.
Returns
boolean indicating if the graph has been built successfully.

◆ end() [1/2]

template<typename Unitig_data_t = void, typename Graph_data_t = void>
iterator CompactedDBG< Unitig_data_t, Graph_data_t >::end ( )

Create an iterator to the "past-the-last" unitig of the Compacted de Bruijn graph (unitigs are NOT sorted lexicographically).

Returns
an iterator to the "past-the-last" unitig of the graph.

◆ end() [2/2]

template<typename Unitig_data_t = void, typename Graph_data_t = void>
const_iterator CompactedDBG< Unitig_data_t, Graph_data_t >::end ( ) const

Create a constant iterator to the "past-the-last" unitig of the Compacted de Bruijn graph (unitigs are NOT sorted lexicographically).

Returns
a constant iterator to the "past-the-last" unitig of the graph.

◆ find() [1/2]

template<typename Unitig_data_t = void, typename Graph_data_t = void>
UnitigMap<U, G> CompactedDBG< Unitig_data_t, Graph_data_t >::find ( const Kmer &  km,
const bool  extremities_only = false 
)

Find the unitig containing the queried k-mer in the Compacted de Bruijn graph.

Parameters
km is the queried k-mer (see Kmer class). It does not need to be a canonical k-mer.
extremities_only is a boolean indicating if the k-mer must be searched only in the unitig heads and tails (extremities_only = true). By default, the k-mer is searched everywhere (extremities_only = false), which is slightly slower than looking only in the unitig heads and tails.
Returns
UnitigMap<U, G> object containing the k-mer mapping information to the unitig containing the queried k-mer (if present). If the queried k-mer is not found, UnitigMap::isEmpty = true (see UnitigMap class).
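The effect of extremities_only can be illustrated with a naive standalone sketch: a linear scan over a toy list of unitigs that either checks every position or only the first and last k-mer of each unitig. ToyMapping and toyFind are hypothetical names, the scan is forward-strand only, and nothing here reflects how the library actually indexes k-mers.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical result type: whether anything matched, and where.
struct ToyMapping {
    bool isEmpty;
    std::size_t unitig; // index of the matching unitig
    std::size_t pos;    // offset of the k-mer inside that unitig
};

// Naive forward-strand lookup of a k-mer in a list of unitig sequences.
ToyMapping toyFind(const std::vector<std::string>& unitigs,
                   const std::string& kmer,
                   const bool extremities_only = false) {
    const std::size_t k = kmer.size();
    assert(k > 0);
    ToyMapping m;
    m.isEmpty = true;
    m.unitig = 0;
    m.pos = 0;
    for (std::size_t u = 0; u < unitigs.size(); ++u) {
        const std::string& seq = unitigs[u];
        if (seq.size() < k) continue;
        const std::size_t last = seq.size() - k; // offset of the tail k-mer
        for (std::size_t p = 0; p <= last; ++p) {
            // With extremities_only, look at the head and tail k-mers only.
            if (extremities_only && p != 0 && p != last) continue;
            if (seq.compare(p, k, kmer) == 0) {
                m.isEmpty = false;
                m.unitig = u;
                m.pos = p;
                return m;
            }
        }
    }
    return m; // not found: isEmpty stays true
}
```

With the unitigs ACGTAC and GGGTT and k = 3, the k-mer CGT is found at offset 1 of the first unitig by the full search but is reported as absent when extremities_only is true, since it is neither a head nor a tail.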

◆ find() [2/2]

template<typename Unitig_data_t = void, typename Graph_data_t = void>
const_UnitigMap<U, G> CompactedDBG< Unitig_data_t, Graph_data_t >::find ( const Kmer &  km,
const bool  extremities_only = false 
) const

Find the unitig containing the queried k-mer in the Compacted de Bruijn graph.

Parameters
km is the queried k-mer (see Kmer class). It does not need to be a canonical k-mer.
extremities_only is a boolean indicating if the k-mer must be searched only in the unitig heads and tails (extremities_only = true). By default, the k-mer is searched everywhere (extremities_only = false), which is slightly slower than looking only in the unitig heads and tails.
Returns
const_UnitigMap<U, G> object containing the k-mer mapping information to the unitig having the queried k-mer (if present). If the k-mer is not found, const_UnitigMap::isEmpty = true (see UnitigMap class).

◆ findUnitig() [1/2]

template<typename Unitig_data_t = void, typename Graph_data_t = void>
UnitigMap<U, G> CompactedDBG< Unitig_data_t, Graph_data_t >::findUnitig ( const char *  s,
const size_t  pos,
const size_t  len 
)

Find the unitig containing the k-mer starting at a given position in a query sequence and extend the mapping (if the k-mer is found, the function extends the mapping from the k-mer as long as the query sequence and the unitig match).

Parameters
s is a pointer to an array of characters containing the sequence to query.
pos is the position of the first k-mer to find in the sequence to query.
len is the length of s.
Returns
UnitigMap<U, G> object containing the mapping information to the unitig having the queried k-mer (if present). If the k-mer is found, the function extends the mapping from the k-mer as long as the query sequence and the unitig match (um.len >= 1).
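The extension step can be sketched on its own: once a k-mer of the query is anchored at some offset of a unitig, the mapping grows base by base while both sequences agree. The helper below is hypothetical, works on the forward strand only, extends to the right only, and assumes that the mapping length is counted in k-mers, which is consistent with the um.len >= 1 remark above.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Anchor the k-mer query[qpos, qpos + k) on unitig[upos, upos + k) and extend
// to the right while the bases agree. Returns the mapping length in k-mers,
// or 0 if the anchor k-mer itself does not match.
std::size_t extendMapping(const std::string& query, const std::size_t qpos,
                          const std::string& unitig, const std::size_t upos,
                          const std::size_t k) {
    assert(k > 0);
    if (qpos + k > query.size() || upos + k > unitig.size()) return 0;
    if (query.compare(qpos, k, unitig, upos, k) != 0) return 0;
    std::size_t matched = k; // number of matching bases so far
    while (qpos + matched < query.size() && upos + matched < unitig.size() &&
           query[qpos + matched] == unitig[upos + matched]) {
        ++matched;
    }
    return matched - k + 1; // m matching bases contain m - k + 1 k-mers
}
```

For the query TTACGTAGG and the unitig CCACGTATT with k = 3, anchoring at offset 2 in both gives five matching bases (ACGTA) before the first mismatch, that is, a mapping of three k-mers.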

◆ findUnitig() [2/2]

template<typename Unitig_data_t = void, typename Graph_data_t = void>
const_UnitigMap<U, G> CompactedDBG< Unitig_data_t, Graph_data_t >::findUnitig ( const char *  s,
const size_t  pos,
const size_t  len 
) const

Find the unitig containing the k-mer starting at a given position in a query sequence and extend the mapping (if the k-mer is found, the function extends the mapping from the k-mer as long as the query sequence and the unitig match).

Parameters
s is a pointer to an array of characters containing the sequence to query.
pos is the position of the first k-mer to find in the sequence to query.
len is the length of s.
Returns
const_UnitigMap<U, G> object containing the mapping information to the unitig having the queried k-mer (if present). If the k-mer is found, the function extends the mapping from the k-mer as long as the query sequence and the unitig match (um.len >= 1).

◆ getData() [1/2]

template<typename Unitig_data_t = void, typename Graph_data_t = void>
G* CompactedDBG< Unitig_data_t, Graph_data_t >::getData ( )
inline

Return a pointer to the graph data.

Pointer is nullptr if type of graph data is void.

Returns
A pointer to the graph data. Pointer is nullptr if type of graph data is void.

◆ getData() [2/2]

template<typename Unitig_data_t = void, typename Graph_data_t = void>
const G* CompactedDBG< Unitig_data_t, Graph_data_t >::getData ( ) const
inline

Return a constant pointer to the graph data.

Pointer is nullptr if type of graph data is void.

Returns
A constant pointer to the graph data. Pointer is nullptr if type of graph data is void.

◆ getG()

template<typename Unitig_data_t = void, typename Graph_data_t = void>
int CompactedDBG< Unitig_data_t, Graph_data_t >::getG ( ) const
inline

Return the length of minimizers of the graph.

Returns
Length of minimizers of the graph.

◆ getK()

template<typename Unitig_data_t = void, typename Graph_data_t = void>
int CompactedDBG< Unitig_data_t, Graph_data_t >::getK ( ) const
inline

Return the length of k-mers of the graph.

Returns
Length of k-mers of the graph.

◆ isInvalid()

template<typename Unitig_data_t = void, typename Graph_data_t = void>
bool CompactedDBG< Unitig_data_t, Graph_data_t >::isInvalid ( ) const
inline

Return a boolean indicating if the graph is invalid (wrong input parameters/files, error occurring during a method, etc.).

Returns
A boolean indicating if the graph is invalid.

◆ length()

template<typename Unitig_data_t = void, typename Graph_data_t = void>
size_t CompactedDBG< Unitig_data_t, Graph_data_t >::length ( ) const

Return the sum of the unitig lengths.

Returns
An integer which corresponds to the sum of the unitig lengths.

◆ merge() [1/2]

template<typename Unitig_data_t = void, typename Graph_data_t = void>
bool CompactedDBG< Unitig_data_t, Graph_data_t >::merge ( const CompactedDBG< Unitig_data_t, Graph_data_t > &  o,
const size_t  nb_threads = 1,
const bool  verbose = false 
)

Merge a compacted de Bruijn graph.

After merging, all unitigs of o have been added to and compacted with the current compacted de Bruijn graph (this). If the unitigs of o had data of type "MyUnitigData" associated, they have been added to the current compacted de Bruijn graph using the functions of the class MyUnitigData which are also present in its base class CDBG_Data_t<MyUnitigData>. Note that if multiple compacted de Bruijn graphs have to be merged, it is more efficient to call CompactedDBG::merge with a vector of CompactedDBG as input.

Parameters
o is a constant reference to the compacted de Bruijn graph to merge.
nb_threads is an integer indicating how many threads can be used during the merging.
verbose is a boolean indicating if information messages must be printed during the execution of the function.
Returns
a boolean indicating if the graph has been successfully merged.

◆ merge() [2/2]

template<typename Unitig_data_t = void, typename Graph_data_t = void>
bool CompactedDBG< Unitig_data_t, Graph_data_t >::merge ( const vector< CompactedDBG< Unitig_data_t, Graph_data_t > > &  v,
const size_t  nb_threads = 1,
const bool  verbose = false 
)

Merge multiple compacted de Bruijn graphs.

After merging, all unitigs of the compacted de Bruijn graphs have been added to and compacted with the current compacted de Bruijn graph (this). If the unitigs had data of type "MyUnitigData" associated, they have been added to the current compacted de Bruijn graph using the functions of the class MyUnitigData which are also present in its base class CDBG_Data_t<MyUnitigData>.

Parameters
v is a constant reference to a vector of compacted de Bruijn graphs to merge.
nb_threads is an integer indicating how many threads can be used during the merging.
verbose is a boolean indicating if information messages must be printed during the execution of the function.
Returns
a boolean indicating if the graphs have been successfully merged.

◆ nbKmers()

template<typename Unitig_data_t = void, typename Graph_data_t = void>
size_t CompactedDBG< Unitig_data_t, Graph_data_t >::nbKmers ( ) const

Return the number of k-mers in the graph.

Returns
An integer which corresponds to the number of k-mers in the graph.
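The quantities reported by length() and nbKmers() are related through k: a sequence of L bases holds L - k + 1 overlapping k-mers. The standalone sketch below computes both over a toy list of unitig sequences; totalLength and totalKmers are hypothetical helpers, not library functions.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Sum of the unitig lengths, in bases.
std::size_t totalLength(const std::vector<std::string>& unitigs) {
    std::size_t sum = 0;
    for (const std::string& u : unitigs) sum += u.size();
    return sum;
}

// Number of k-mers: each unitig of length L contributes L - k + 1 of them.
std::size_t totalKmers(const std::vector<std::string>& unitigs,
                       const std::size_t k) {
    std::size_t sum = 0;
    for (const std::string& u : unitigs) {
        assert(u.size() >= k); // each unitig is at least k bases long
        sum += u.size() - k + 1;
    }
    return sum;
}
```

With the unitigs ACGTAC and GGGTT and k = 3, the total length is 11 bases and the k-mer count is 4 + 3 = 7.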

◆ operator!=()

template<typename Unitig_data_t = void, typename Graph_data_t = void>
bool CompactedDBG< Unitig_data_t, Graph_data_t >::operator!= ( const CompactedDBG< U, G > &  o) const
inline

Inequality operator.

Returns
a boolean indicating if two compacted de Bruijn graphs have different unitigs (does not compare the data associated with the unitigs).

◆ operator+=()

template<typename Unitig_data_t = void, typename Graph_data_t = void>
CompactedDBG<U, G>& CompactedDBG< Unitig_data_t, Graph_data_t >::operator+= ( const CompactedDBG< U, G > &  o)

Addition assignment operator (merge a compacted de Bruijn graph).

After merging, all unitigs of o have been added to and compacted with the current compacted de Bruijn graph (this). If the unitigs of o had data of type "MyUnitigData" associated, they have been added to the current compacted de Bruijn graph using the functions of the class MyUnitigData which are in base class CDBG_Data_t<MyUnitigData>. This function is similar to CompactedDBG::merge except that it uses only one thread while CompactedDBG::merge can work with multiple threads (number of threads provided as a parameter). Note that if multiple compacted de Bruijn graphs have to be merged, it is more efficient to call CompactedDBG::merge with a vector of CompactedDBG as input.

Parameters
o is a constant reference to the compacted de Bruijn graph to merge.
Returns
a reference to the current compacted de Bruijn graph after merging.

◆ operator=() [1/2]

template<typename Unitig_data_t = void, typename Graph_data_t = void>
CompactedDBG<U, G>& CompactedDBG< Unitig_data_t, Graph_data_t >::operator= ( CompactedDBG< U, G > &&  o)

Move assignment operator (move a compacted de Bruijn graph).

The content of o is moved ("transferred") to the compacted de Bruijn graph being assigned to (this). The compacted de Bruijn graph referenced by o will be empty after the call to this operator.

Parameters
o is an rvalue reference to the compacted de Bruijn graph to move.
Returns
a reference to the compacted de Bruijn graph which has (and owns) the content of o.

◆ operator=() [2/2]

template<typename Unitig_data_t = void, typename Graph_data_t = void>
CompactedDBG<U, G>& CompactedDBG< Unitig_data_t, Graph_data_t >::operator= ( const CompactedDBG< U, G > &  o)

Copy assignment operator (copy a compacted de Bruijn graph).

This function is expensive in terms of time and memory as the content of a compacted de Bruijn graph is copied. After the call to this function, the same graph exists twice in memory.

Parameters
o is a constant reference to the compacted de Bruijn graph to copy.
Returns
a reference to the compacted de Bruijn graph which is the copy.

◆ operator==()

template<typename Unitig_data_t = void, typename Graph_data_t = void>
bool CompactedDBG< Unitig_data_t, Graph_data_t >::operator== ( const CompactedDBG< U, G > &  o) const

Equality operator.

Returns
a boolean indicating if two compacted de Bruijn graphs have the same unitigs (does not compare the data associated with the unitigs).

◆ read() [1/2]

template<typename Unitig_data_t = void, typename Graph_data_t = void>
bool CompactedDBG< Unitig_data_t, Graph_data_t >::read ( const string &  input_graph_fn,
const size_t  nb_threads = 1,
const bool  verbose = false 
)

Load a Compacted de Bruijn graph from disk (GFA1 or FASTA format).

This function detects whether an index file (BFI format) with the same prefix as the graph file exists and, if so, uses it to load the graph. Otherwise, loading is slower than with the read() overload that takes an index file. If the input GFA file has not been built by Bifrost or if the input is in FASTA format, it is your responsibility to make sure that the graph is correctly compacted and to correctly set the parameters of the graph (such as the k-mer length) before calling this function.

Parameters
input_graph_fn is a string containing the name of the graph file to read.
nb_threads is a number indicating how many threads can be used to read the graph from disk.
verbose is a boolean indicating if information messages must be printed during the function execution.
Returns
boolean indicating if the graph has been read successfully.

◆ read() [2/2]

template<typename Unitig_data_t = void, typename Graph_data_t = void>
bool CompactedDBG< Unitig_data_t, Graph_data_t >::read ( const string &  input_graph_fn,
const string &  input_index_fn,
const size_t  nb_threads = 1,
const bool  verbose = false 
)

Read a Compacted de Bruijn graph from disk (GFA1, FASTA or BFG format) using an index file (BFI format).

Index files make loading much faster than loading without one (see the other read() overload). If the input GFA file has not been built by Bifrost or if the input is in FASTA format, it is your responsibility to make sure that the graph is correctly compacted and to correctly set the parameters of the graph (k-mer length and minimizer length) before calling this function.

Parameters
input_graph_fn is a string containing the name of the graph file to read.
input_index_fn is a string containing the name of the index file to read.
nb_threads is a number indicating how many threads can be used to read the graph from disk.
verbose is a boolean indicating if information messages must be printed during the function execution.
Returns
boolean indicating if the graph has been read successfully.

◆ remove()

template<typename Unitig_data_t = void, typename Graph_data_t = void>
bool CompactedDBG< Unitig_data_t, Graph_data_t >::remove ( const const_UnitigMap< U, G > &  um,
const bool  verbose = false 
)

Remove a unitig from the Compacted de Bruijn graph.

Parameters
um is a UnitigMap object containing the information of the unitig to remove from the graph.
verbose is a boolean indicating if information messages must be printed during the execution of the function.
Returns
a boolean indicating if the unitig was successfully removed from the graph.

◆ searchSequence() [1/2]

template<typename Unitig_data_t = void, typename Graph_data_t = void>
vector<pair<size_t, UnitigMap<U, G> > > CompactedDBG< Unitig_data_t, Graph_data_t >::searchSequence ( const string &  s,
const bool  exact,
const bool  insertion,
const bool  deletion,
const bool  substitution,
const bool  or_exclusive_match = false 
)

Performs exact and/or inexact search of the k-mers of a sequence query in the Compacted de Bruijn graph.

Parameters
s is a string representing the sequence to be searched (the query).
exact is a boolean indicating if the exact k-mers of string s must be searched.
insertion is a boolean indicating if the inexact k-mers of string s, with one insertion, must be searched.
deletion is a boolean indicating if the inexact k-mers of string s, with one deletion, must be searched.
substitution is a boolean indicating if the inexact k-mers of string s, with one substitution, must be searched.
or_exclusive_match is a boolean indicating to NOT search for the inexact k-mers at any given position in s if the exact corresponding k-mer at that position is found in the graph. This option can substantially reduce the running time.
Returns
a vector of pair<size_t, UnitigMap<U, G>> objects. Each such pair has two elements: the position of the k-mer match in sequence s and the corresponding k-mer match in the graph. Note that no information is given on whether the match is exact or inexact, nor on which edit operation makes the match inexact or at what position the edit operation takes place.
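How the exact, substitution and or_exclusive_match flags interact can be shown with a small standalone sketch over a toy set of k-mers. It covers substitutions only (insertions and deletions are left out for brevity), stores the graph as a plain set of strings, and assumes that or_exclusive_match skips the inexact search whenever the exact k-mer is present; substitutionVariants and toySearch are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <unordered_set>
#include <utility>
#include <vector>

// All sequences that differ from `kmer` by exactly one substituted base
// (three per position when the input is over {A,C,G,T}).
std::vector<std::string> substitutionVariants(const std::string& kmer) {
    std::vector<std::string> out;
    const std::string alphabet = "ACGT";
    for (std::size_t i = 0; i < kmer.size(); ++i) {
        for (const char b : alphabet) {
            if (b == kmer[i]) continue;
            std::string v = kmer;
            v[i] = b;
            out.push_back(v);
        }
    }
    return out;
}

// Report (position in s, matched k-mer) pairs against a toy k-mer set.
std::vector<std::pair<std::size_t, std::string>> toySearch(
        const std::string& s, const std::unordered_set<std::string>& graphKmers,
        const std::size_t k, const bool exact, const bool substitution,
        const bool or_exclusive_match = false) {
    assert(k > 0);
    std::vector<std::pair<std::size_t, std::string>> hits;
    for (std::size_t pos = 0; pos + k <= s.size(); ++pos) {
        const std::string km = s.substr(pos, k);
        const bool exactFound = graphKmers.count(km) != 0;
        if (exact && exactFound) hits.emplace_back(pos, km);
        // Skip the inexact search if disabled, or if the exact k-mer is
        // present and or_exclusive_match asks to stop there.
        if (!substitution || (or_exclusive_match && exactFound)) continue;
        for (const std::string& v : substitutionVariants(km)) {
            if (graphKmers.count(v) != 0) hits.emplace_back(pos, v);
        }
    }
    return hits;
}
```

As in the description of the return value, the pairs carry no indication of whether a hit was exact or inexact. The saving from or_exclusive_match comes from skipping the 3k variant lookups at every position whose exact k-mer is already in the graph.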

◆ searchSequence() [2/2]

template<typename Unitig_data_t = void, typename Graph_data_t = void>
vector<pair<size_t, const_UnitigMap<U, G> > > CompactedDBG< Unitig_data_t, Graph_data_t >::searchSequence ( const string &  s,
const bool  exact,
const bool  insertion,
const bool  deletion,
const bool  substitution,
const bool  or_exclusive_match = false 
) const

Performs exact and/or inexact search of the k-mers of a sequence query in the Compacted de Bruijn graph.

Parameters
s is a string representing the sequence to be searched (the query).
exact is a boolean indicating if the exact k-mers of string s must be searched.
insertion is a boolean indicating if the inexact k-mers of string s, with one insertion, must be searched.
deletion is a boolean indicating if the inexact k-mers of string s, with one deletion, must be searched.
substitution is a boolean indicating if the inexact k-mers of string s, with one substitution, must be searched.
or_exclusive_match is a boolean indicating to NOT search for the inexact k-mers at any given position in s if the exact corresponding k-mer at that position is found in the graph. This option can substantially reduce the running time.
Returns
a vector of pair<size_t, const_UnitigMap<U, G>> objects. Each such pair has two elements: the position of the k-mer match in sequence s and the corresponding k-mer match in the graph. Note that no information is given on whether the match is exact or inexact, nor on which edit operation makes the match inexact or at what position the edit operation takes place.

◆ simplify()

template<typename Unitig_data_t = void, typename Graph_data_t = void>
bool CompactedDBG< Unitig_data_t, Graph_data_t >::simplify ( const bool  delete_short_isolated_unitigs = true,
const bool  clip_short_tips = true,
const bool  verbose = false 
)

Simplify the Compacted de Bruijn graph: clip short (< 2k length) tips and/or delete short (< 2k length) isolated unitigs.

Parameters
delete_short_isolated_unitigs is a boolean indicating whether short isolated unitigs must be removed.
clip_short_tips is a boolean indicating whether short tips must be clipped.
verbose is a boolean indicating if information messages must be printed during the function execution.
Returns
boolean indicating if the graph has been simplified successfully.
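The two criteria used by simplify() can be written down as predicates on a toy unitig record. This is only a sketch of the "shorter than 2k" rule stated above: ToyUnitig is a hypothetical type, and defining a tip as a unitig connected on exactly one side is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical, simplified view of a unitig: its sequence plus the number of
// neighbours on each side. Not a Bifrost type.
struct ToyUnitig {
    std::string seq;
    std::size_t nbPredecessors;
    std::size_t nbSuccessors;
};

// "Short" means strictly fewer than 2k bases.
bool isShort(const ToyUnitig& u, const std::size_t k) {
    assert(k > 0);
    return u.seq.size() < 2 * k;
}

// Short and connected to nothing.
bool isShortIsolated(const ToyUnitig& u, const std::size_t k) {
    return isShort(u, k) && u.nbPredecessors == 0 && u.nbSuccessors == 0;
}

// Short and connected on exactly one side (assumed definition of a tip).
bool isShortTip(const ToyUnitig& u, const std::size_t k) {
    const bool oneSided = (u.nbPredecessors == 0) != (u.nbSuccessors == 0);
    return isShort(u, k) && oneSided;
}
```

With k = 3, a 5-base unitig with no neighbours is a short isolated unitig, the same unitig with a single predecessor is a short tip, and a 7-base unitig is never affected because 7 is not below 2k = 6.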

◆ size()

template<typename Unitig_data_t = void, typename Graph_data_t = void>
size_t CompactedDBG< Unitig_data_t, Graph_data_t >::size ( ) const
inline

Return the number of unitigs in the graph.

Returns
Number of unitigs in the graph.

◆ write()

template<typename Unitig_data_t = void, typename Graph_data_t = void>
bool CompactedDBG< Unitig_data_t, Graph_data_t >::write ( const string &  output_fn,
const size_t  nb_threads = 1,
const bool  GFA_output = true,
const bool  FASTA_output = false,
const bool  BFG_output = false,
const bool  write_index_file = true,
const bool  compressed_output = false,
const bool  verbose = false 
) const

Write the Compacted de Bruijn graph to disk (GFA1, FASTA or BFG format).

Parameters
output_fn is a string containing the name of the file in which the graph will be written.
nb_threads is a number indicating how many threads can be used to write the graph to disk.
GFA_output indicates if the graph will be output in GFA format.
FASTA_output indicates if the graph will be output in FASTA format.
BFG_output indicates if the graph will be output in BFG/BFI format.
write_index_file indicates if an index file is written to disk. Index files enable faster graph loading. This parameter is discarded if BFG format output is selected (index output is required then).
compressed_output indicates if the output file is compressed.
verbose is a boolean indicating if information messages must be printed during the function execution.
Returns
boolean indicating if the graph has been written successfully.

The documentation for this class was generated from the following file:
CompactedDBG.hpp