HMMReport API
An “HMMReport” object represents the results of a Hmmer program search on a dataset with a hidden Markov model protein profile (see this section).
This object has methods to extract and filter the raw Hmmer output (see generated output files), and then to build the Hits relevant for system detection.
For matches that pass the filtering parameters, “Hit” objects (macsypy.report.Hit) are built.
HMMReport API reference
-
class
macsypy.report.
HMMReport
(gene, hmmer_output, cfg)
Handle the results from the HMM search. Extract a synthetic report from the raw Hmmer output after hit filtering has been applied. This is an abstract class. It has two implementations, depending on whether the input sequence dataset is “ordered” (“gembase” or “ordered_replicon” db_type) or not (“unordered” or “unordered_replicon” db_type).
-
__init__
(gene, hmmer_output, cfg)
Parameters:
- gene (macsypy.gene.Gene object) – the gene corresponding to the profile search reported here
- hmmer_output (string) – the path to the raw Hmmer output file
- cfg (macsypy.config.Config object) – the configuration object
-
__metaclass__
alias of ABCMeta
-
__weakref__
list of weak references to the object (if defined)
-
_build_my_db
(hmm_output)
Build the keys of a dictionary object to store sequence identifiers of hits.
Parameters: hmm_output (string) – the path to the hmmsearch output to parse
Returns: a dictionary containing a key for each sequence id of the hits
Return type: dict
-
_fill_my_db
(macsyfinder_idx, db)
Fill the dictionary with information on the matched sequences.
Parameters:
- macsyfinder_idx (string) – the path to the macsyfinder index corresponding to the dataset
- db (dict) – the database containing all sequence ids of the hits
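The two helpers above can be read as a two-step lookup: first register the hit identifiers as keys, then fill in the values from the index. The sketch below illustrates that idea only. The function names, the semicolon-separated `seq_id;length;rank` index layout, and the in-memory line lists are assumptions made for the example, not the macsypy implementation.

```python
def build_db(hit_ids):
    """Register one key per hit sequence id; values are filled in later."""
    return dict.fromkeys(hit_ids)


def fill_db(index_lines, db):
    """Fill db in place, only for the sequence ids it already contains.

    Assumption: each index line looks like 'seq_id;length;rank'.
    """
    for line in index_lines:
        seq_id, length, rank = line.strip().split(";")
        if seq_id in db:
            db[seq_id] = (int(length), int(rank))


# Hypothetical data: two hits, and an index covering three sequences.
db = build_db(["seqA", "seqB"])
fill_db(["seqA;120;1", "seqC;80;2", "seqB;300;3"], db)
```

Registering the keys first means that only sequences that were actually hit are retained while the index is read.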
-
_hit_start
(line)
Parameters: line (string) – the line to parse
Returns: True if the line is the beginning of a new hit in the raw Hmmer output files, False otherwise
Return type: boolean
-
_parse_hmm_body
(hit_id, gene_profile_lg, seq_lg, coverage_treshold, replicon_name, position_hit, i_evalue_sel, b_grp)
Parse the raw Hmmer output to extract the hits, and filter them with the selected thresholds (“coverage_profile” and “i_evalue_select” command-line parameters).
Parameters:
- hit_id (string) – the sequence identifier
- gene_profile_lg (integer) – the length of the profile matched
- seq_lg (integer) – the length of the sequence
- coverage_treshold (float) – the minimal coverage of the profile to be reached in the Hmmer alignment for hit selection
- replicon_name (string) – the identifier of the replicon
- position_hit (integer) – the rank of the sequence matched in the input dataset file
- i_evalue_sel (float) – the maximal i-evalue (independent evalue) for hit selection
- b_grp (list of list of strings) – the Hmmer output lines to deal with (grouped by hit)
Returns: the hits that pass the filters
Return type: list of macsypy.report.Hit objects
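The two selection criteria described above can be sketched as a single predicate. This is an illustration under stated assumptions, not the macsypy code: the function name, the `hmm_from`/`hmm_to` alignment bounds, the coverage formula, and the inclusive comparisons are all hypothetical choices made for the example.

```python
def keep_domain(i_evalue, hmm_from, hmm_to, profile_length,
                i_evalue_sel, coverage_threshold):
    """Return True if a domain match passes both filters.

    Assumption: profile coverage is the fraction of profile positions
    spanned by the alignment, i.e. (hmm_to - hmm_from + 1) / profile_length.
    """
    profile_coverage = (hmm_to - hmm_from + 1) / profile_length
    return i_evalue <= i_evalue_sel and profile_coverage >= coverage_threshold


# Hypothetical values: a 100-position profile, thresholds 0.001 and 0.5.
good = keep_domain(1e-10, 1, 90, 100, 0.001, 0.5)      # low i-evalue, 90% coverage
short = keep_domain(1e-10, 1, 40, 100, 0.001, 0.5)     # only 40% of the profile covered
weak = keep_domain(0.5, 1, 90, 100, 0.001, 0.5)        # i-evalue above the cutoff
```

A match has to satisfy both conditions, so a strong i-evalue does not rescue an alignment that covers too little of the profile.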
-
_parse_hmm_header
(h_grp)
Parameters: h_grp (sequence of string) – the sequence of strings returned by the groupby function, representing the header of a hit
Returns: the sequence identifier from a set of lines that corresponds to a single hit
Return type: string
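The header and body parameters above are described as groups produced by a groupby function keyed on whether a line starts a new hit. A minimal sketch of that pattern with `itertools.groupby` follows. It assumes that a hit header is a line beginning with `>>` followed by the sequence identifier; the sample lines and the helper name `hit_start` are invented for illustration.

```python
from itertools import groupby


def hit_start(line):
    """Assumption: a new hit begins on a line that starts with '>>'."""
    return line.startswith(">>")


# Hypothetical, heavily simplified output lines.
lines = [
    ">> seqA",
    "domain line 1",
    "domain line 2",
    ">> seqB",
    "domain line 3",
]

hits = {}
current_id = None
# groupby yields alternating runs of header lines and body lines.
for is_header, grp in groupby(lines, key=hit_start):
    grp = list(grp)
    if is_header:
        # Take the identifier from the last header line of the run.
        current_id = grp[-1].split()[1]
    elif current_id is not None:
        hits[current_id] = grp
```

Because `groupby` only merges consecutive lines with the same key, each body group can be attributed to the header group that immediately precedes it.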
-
GeneralHMMReport API reference
OrderedHMMReport
GembaseHMMReport
Hit
-
class
macsypy.report.
Hit
(gene, system, hit_id, hit_seq_length, replicon_name, position_hit, i_eval, score, profile_coverage, sequence_coverage, begin_match, end_match)
Handle the hits filtered from the Hmmer search. The hits are instantiated by the HMMReport.extract() method.
-
__cmp__
(other)
Compare two Hits. If the sequence identifiers are the same, compare the scores. Otherwise, compare the sequence identifiers alphabetically.
Parameters: other (macsypy.report.Hit object) – the hit to compare to the current object
Returns: the result of the comparison
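The comparison rule above can be sketched as a standalone function. `__cmp__` is only honored by Python 2, so the example uses `functools.cmp_to_key` to apply the same rule when sorting in Python 3. `MiniHit` is a hypothetical stand-in for the real Hit class, and treating a lower score as "smaller" is an assumption, since the text does not state the direction.

```python
from collections import namedtuple
from functools import cmp_to_key

# Hypothetical stand-in carrying only the two fields the rule needs.
MiniHit = namedtuple("MiniHit", ["id", "score"])


def compare_hits(a, b):
    """Return -1, 0 or 1: compare scores for equal ids, else compare ids."""
    if a.id == b.id:
        return (a.score > b.score) - (a.score < b.score)
    return (a.id > b.id) - (a.id < b.id)


hits = [MiniHit("seqB", 10.0), MiniHit("seqA", 50.0), MiniHit("seqA", 20.0)]
ordered = sorted(hits, key=cmp_to_key(compare_hits))
```

With this rule, hits on the same sequence end up adjacent and ranked by score, which makes it easy to pick the best hit per sequence.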
-
__eq__
(other)
Return True if two hits are fully equivalent, False otherwise.
Parameters: other (macsypy.report.Hit object) – the hit to compare to the current object
Returns: the result of the comparison
Return type: boolean
-
__init__
(gene, system, hit_id, hit_seq_length, replicon_name, position_hit, i_eval, score, profile_coverage, sequence_coverage, begin_match, end_match)
Parameters:
- gene (macsypy.gene.Gene object) – the gene corresponding to this profile
- system (macsypy.system.System object) – the system to which this gene belongs
- hit_id (string) – the identifier of the hit
- hit_seq_length (integer) – the length of the hit sequence
- replicon_name (string) – the name of the replicon
- position_hit (integer) – the rank of the sequence matched in the input dataset file
- i_eval (float) – the best-domain evalue (i-evalue, “independent evalue”)
- score (float) – the score of the hit
- profile_coverage (float) – percentage of the profile that matches the hit sequence
- sequence_coverage (float) – percentage of the hit sequence that matches the profile
- begin_match (integer) – where the hit with the profile starts in the sequence
- end_match (integer) – where the hit with the profile ends in the sequence
-
__str__
()
Print useful information on the Hit: Hmmer statistics and sequence information.
-
__weakref__
list of weak references to the object (if defined)
-