report
An “HMMReport” object represents the results of a Hmmer search run on a sequence dataset with a hidden Markov model protein profile (see this section).
It provides methods to extract and filter the raw Hmmer output (see generated output files), and then to build the hits relevant for system detection.
For each match that passes the filtering parameters, a “Hit” object (macsypy.HMMReport.Hit) is built.
report API reference
HMMReport
-
class macsypy.report.HMMReport(gene, hmmer_output, cfg)
Handle the results from the HMM search. Extract a synthetic report from the raw Hmmer output, after applying hit filtering. This is an abstract class with two implementations, chosen according to whether the input sequence dataset is “ordered” (“gembase” or “ordered_replicon” db_type) or not (“unordered” db_type).
-
__init__(gene, hmmer_output, cfg)
- Parameters
gene (macsypy.gene.CoreGene object) – the gene corresponding to the profile search reported here
hmmer_output (string) – the path to the raw Hmmer output file
cfg (macsypy.config.Config object) – the configuration object
-
__weakref__
list of weak references to the object (if defined)
-
_build_my_db(hmm_output)
Build the keys of a dictionary object to store sequence identifiers of hits.
- Parameters
hmm_output (string) – the path to the hmmsearch output to parse.
- Returns
a dictionary containing a key for each sequence id of the hits
- Return type
dict
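As a rough illustration of the idea, the sketch below collects the sequence identifiers found in hmmsearch-style text and uses them as dictionary keys. It is not the macsypy implementation: the function name `build_hit_keys` is hypothetical, it reads an iterable of lines rather than a file path, and it assumes that each hit block opens with a line starting with `>>` followed by the sequence identifier, as in HMMER3 text output.

```python
def build_hit_keys(lines):
    """Return a dict with one key per sequence id found in hmmsearch-style lines.

    Hypothetical helper: assumes each hit block starts with '>> <seq_id> ...'.
    """
    db = {}
    for line in lines:
        if line.startswith(">>"):
            fields = line.split()
            if len(fields) > 1:
                # fields[0] is '>>', fields[1] is the sequence identifier.
                db[fields[1]] = None  # placeholder value, filled in a later step
    return db


# Made-up sample lines, for illustration only.
sample = [
    "Domain annotation for each sequence (and alignments):",
    ">> seq_0001  hypothetical protein",
    "   1 !  100.0   0.1   1e-30   2e-30 ...",
    ">> seq_0042  another protein",
]
hit_keys = build_hit_keys(sample)
```

Starting with empty values keeps the two steps separate: one pass to learn which sequences matched, and a later pass to attach information to them.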
-
_fill_my_db(db)
Fill the dictionary with information on the matched sequences.
- Parameters
db (dict) – the dictionary containing all the sequence ids of the hits.
-
abstract _get_replicon_name(hit_id)
This method is used by the extract method and must be implemented by the concrete classes.
- Parameters
hit_id (str) – the id of the current hit, extracted from the hmm output.
- Returns
The name of the replicon
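The pattern described here, an abstract hook that each concrete class fills in, can be sketched with Python's `abc` module. All class names below are hypothetical, and the two naming rules (dropping the last `_`-separated field of the id, or returning one fixed name) are assumptions chosen for illustration, not the rules macsypy applies.

```python
from abc import ABC, abstractmethod


class ReportSketch(ABC):
    """Hypothetical stand-in for an abstract report class."""

    @abstractmethod
    def _get_replicon_name(self, hit_id):
        """Return the replicon name for a hit id (defined by subclasses)."""

    def describe(self, hit_id):
        # Shared code can call the hook without knowing the dataset type.
        return f"{hit_id} -> {self._get_replicon_name(hit_id)}"


class PrefixedIdReport(ReportSketch):
    """Assumption: the replicon name is the id without its last '_' field."""

    def _get_replicon_name(self, hit_id):
        return "_".join(hit_id.split("_")[:-1])


class SingleRepliconReport(ReportSketch):
    """Assumption: every hit belongs to one replicon with a fixed name."""

    def __init__(self, replicon_name):
        self.replicon_name = replicon_name

    def _get_replicon_name(self, hit_id):
        return self.replicon_name


prefixed = PrefixedIdReport().describe("REPL001_00012")
single = SingleRepliconReport("my_replicon").describe("seq_7")
```

Because the hook is declared with `abstractmethod`, trying to instantiate `ReportSketch` directly raises a `TypeError`, so a subclass that forgets to implement it fails early.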
-
_hit_start(line)
- Parameters
line (string) – the line to parse
- Returns
True if it’s the beginning of a new hit in Hmmer raw output files. False otherwise
- Return type
boolean.
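A minimal sketch of such a test, assuming HMMER3 text output in which the block for each matched sequence begins with a line starting with `>>`. The standalone function `hit_start` is a hypothetical equivalent written for illustration, not the method's actual body.

```python
def hit_start(line):
    """Return True if the line opens a new per-sequence hit block."""
    # Assumption: HMMER3 text output, where such lines begin with '>>'.
    return line.startswith(">>")


# Made-up sample lines: a hit header, a domain row, and an empty line.
flags = [hit_start(text) for text in (">> seq_0001  desc", "   1 !  100.0 ...", "")]
```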
-
_parse_hmm_body(hit_id, gene_profile_lg, seq_lg, coverage_threshold, replicon_name, position_hit, i_evalue_sel, b_grp)
Parse the raw Hmmer output to extract the hits, and filter them with the selected threshold criteria (“coverage_profile” and “i_evalue_select” command-line parameters).
- Parameters
hit_id (str) – the sequence identifier
gene_profile_lg (int) – the length of the profile matched
seq_lg (int) – the length of the sequence
coverage_threshold (float) – the minimal coverage of the profile to be reached in the Hmmer alignment for hit selection.
replicon_name (str) – the identifier of the replicon
position_hit (int) – the rank of the sequence matched in the input dataset file
i_evalue_sel (float) – the maximal i-evalue (independent evalue) for hit selection
b_grp (list of list of strings) – the Hmmer output lines to deal with (grouped by hit)
- Returns
a sequence of hits
- Return type
list of macsypy.report.CoreHit objects
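The two filters can be illustrated with a small sketch. It is not the macsypy code: `select_domains` is a hypothetical helper that returns plain dictionaries rather than hit objects, and it assumes the per-domain table layout of HMMER3 text output (16 whitespace-separated fields per row, with the i-Evalue in the sixth field and the profile start and end in the seventh and eighth). Profile coverage is taken here as the aligned profile span divided by the profile length, which is also an assumption.

```python
def select_domains(lines, profile_length, coverage_threshold, i_evalue_sel):
    """Keep domain rows that pass the i-evalue and profile-coverage filters."""
    selected = []
    for line in lines:
        fields = line.split()
        # Assumed layout: domain rows have 16 fields and start with a rank.
        if len(fields) != 16 or not fields[0].isdigit():
            continue
        i_evalue = float(fields[5])
        hmm_from, hmm_to = int(fields[6]), int(fields[7])
        ali_from, ali_to = int(fields[9]), int(fields[10])
        # Fraction of the profile covered by the alignment.
        profile_coverage = (hmm_to - hmm_from + 1) / profile_length
        if i_evalue <= i_evalue_sel and profile_coverage >= coverage_threshold:
            selected.append({
                "i_evalue": i_evalue,
                "profile_coverage": profile_coverage,
                "begin": ali_from,
                "end": ali_to,
            })
    return selected


# Made-up rows: the first passes, the second fails on i-evalue,
# the third fails on profile coverage.
rows = [
    " ---   ------ ----- --------- --------- ------- -------",
    "   1 !  779.2   8.2  1.9e-237  3.9e-234   1  736 []   1  741 [.   1  741 [] 0.99",
    "   2 ?   10.1   0.0       0.5       1.2 300  350 ..  20   70 ..  18   75 .. 0.60",
    "   3 !   50.0   0.1     1e-15     2e-15  10  100 .. 400  490 .. 395  495 .. 0.90",
]
kept = select_domains(rows, profile_length=740,
                      coverage_threshold=0.5, i_evalue_sel=0.001)
```

Both conditions must hold for a row to be kept, so a very significant match that covers only a small part of the profile is still discarded.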
-
_parse_hmm_header(h_grp)
- Parameters
h_grp (sequence of string) – the sequence of strings returned by the groupby function, representing the header of a hit
- Returns
the sequence identifier from a set of lines that corresponds to a single hit
- Return type
string
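To show how a groupby-based split can hand header lines to such a parser, here is a hedged sketch using `itertools.groupby`. The helper names `hit_start` and `parse_header` are hypothetical, and the code assumes that a header line has the form `>> <seq_id> <description>`.

```python
from itertools import groupby


def hit_start(line):
    # Assumption: hit headers begin with '>>' (HMMER3 text output).
    return line.startswith(">>")


def parse_header(h_grp):
    """Return the sequence id from the header lines of one hit."""
    first_line = next(iter(h_grp))
    return first_line.split()[1]


# Made-up sample lines, for illustration only.
lines = [
    ">> seq_0001  first protein",
    "   1 !  100.0 ...",
    "   2 !   80.0 ...",
    ">> seq_0042  second protein",
    "   1 !   60.0 ...",
]

hits = {}
current_id = None
# groupby yields alternating runs of header lines and body lines.
for is_header, group in groupby(lines, key=hit_start):
    if is_header:
        current_id = parse_header(group)
    elif current_id is not None:
        hits[current_id] = list(group)
```

Each group produced by `groupby` is a one-shot iterator, so it has to be consumed (here with `next` or `list`) before the loop advances to the next group.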
-