HMMReport API¶

An “HMMReport” object represents the results of a Hmmer search of a sequence dataset with a hidden Markov model protein profile (see this section). It provides methods to extract and filter the raw Hmmer output (see generated output files) and to build the hits relevant for system detection: each match that passes the filtering parameters becomes a “Hit” object (macsypy.report.Hit).

HMMReport API reference¶

class macsypy.report.HMMReport(gene, hmmer_output, cfg)[source]¶

Handle the results of the HMM search. Extract a synthetic report from the raw Hmmer output after filtering the hits. This class is abstract; its concrete implementations differ according to whether the input sequence dataset is “ordered” (“gembase” or “ordered_replicon” db_type) or not (“unordered” or “unordered_replicon” db_type).
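As an illustration of the abstract class and its per-db_type implementations, here is a minimal sketch. The classes and the `make_report` helper are hypothetical stand-ins, not the macsypy code; only the pairing of db_type values with report kinds follows the class descriptions on this page.

```python
from abc import ABC, abstractmethod


class ReportSketch(ABC):
    """Stand-in for the abstract report class (not the real macsypy class)."""

    def __init__(self, gene, hmmer_output, cfg):
        self.gene = gene
        self.hmmer_output = hmmer_output
        self.cfg = cfg

    @abstractmethod
    def extract(self):
        """Each concrete report type supplies its own parsing."""


class GeneralReportSketch(ReportSketch):
    def extract(self):
        return "parsed as an unordered dataset"


class OrderedReportSketch(ReportSketch):
    def extract(self):
        return "parsed as an ordered replicon"


class GembaseReportSketch(ReportSketch):
    def extract(self):
        return "parsed as a gembase dataset"


# Assumed dispatch table: db_type value -> report class.
REPORT_BY_DB_TYPE = {
    "gembase": GembaseReportSketch,
    "ordered_replicon": OrderedReportSketch,
    "unordered": GeneralReportSketch,
    "unordered_replicon": GeneralReportSketch,
}


def make_report(db_type, gene, hmmer_output, cfg):
    """Hypothetical helper: build the report matching the dataset type."""
    try:
        report_cls = REPORT_BY_DB_TYPE[db_type]
    except KeyError:
        raise ValueError(f"unknown db_type: {db_type!r}") from None
    return report_cls(gene, hmmer_output, cfg)
```

Because `extract()` is declared abstract, instantiating `ReportSketch` directly raises `TypeError`; only the concrete subclasses can be built.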

__init__(gene, hmmer_output, cfg)[source]¶

Parameters:

gene (`macsypy.gene.Gene` object) – the gene corresponding to the profile search reported here
hmmer_output (string) – the path to the raw Hmmer output file
cfg (`macsypy.config.Config` object) – the configuration object

__metaclass__¶: alias of ABCMeta

__str__()[source]¶: Print information on filtered hits

__weakref__¶: list of weak references to the object (if defined)

_build_my_db(hmm_output)[source]¶

Build the keys of a dictionary object to store sequence identifiers of hits.

Parameters:	hmm_output (string) – the path to the hmmsearch output to parse.
Returns:	a dictionary containing a key for each sequence id of the hits
Return type:	dict
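A rough sketch of how such a key-only dictionary could be built. It assumes that each hit section of the hmmsearch output opens with a line of the form `>> sequence_id description`, as in HMMER3 text output; the function name and the use of `None` as a placeholder value are illustrative choices, not the macsypy implementation.

```python
def build_hit_id_db(lines):
    """Return a dict holding one key per sequence id found in '>>' lines."""
    db = {}
    for line in lines:
        if line.startswith(">>"):
            fields = line.split()
            if len(fields) > 1:
                # The value is a placeholder, to be filled in a later pass.
                db[fields[1]] = None
    return db
```

Collecting the identifiers first means a second pass over the sequence index only has to keep the records whose identifier is already a key.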

_fill_my_db(macsyfinder_idx, db)[source]¶

Fill the dictionary with information on the matched sequences

Parameters:

macsyfinder_idx (string) – the path to the macsyfinder index corresponding to the dataset
db (dict) – the database containing all sequence ids of the hits

_hit_start(line)[source]¶

Parameters:	line (string) – the line to parse
Returns:	True if it’s the beginning of a new hit in Hmmer raw output files. False otherwise
Return type:	boolean.

_parse_hmm_body(hit_id, gene_profile_lg, seq_lg, coverage_treshold, replicon_name, position_hit, i_evalue_sel, b_grp)[source]¶

Parse the raw Hmmer output to extract the hits, and filter them with the selected threshold criteria (“coverage_profile” and “i_evalue_select” command-line parameters).

Parameters:

hit_id (string) – the sequence identifier
gene_profile_lg (integer) – the length of the profile matched
seq_lg (integer) – the length of the sequence
coverage_treshold (float) – the minimal coverage of the profile to be reached in the Hmmer alignment for hit selection
replicon_name (string) – the identifier of the replicon
position_hit (integer) – the rank of the sequence matched in the input dataset file
i_evalue_sel (float) – the maximal i-evalue (independent evalue) for hit selection
b_grp (list of list of strings) – the Hmmer output lines to deal with (grouped by hit)

Returns:	the selected hits
Return type:	list of `macsypy.report.Hit` objects
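The two selection criteria can be sketched as follows. This is an assumption about how the thresholds combine, not the macsypy code: the profile coverage is taken as the aligned span on the profile divided by the profile length, both comparisons are assumed to be inclusive, and `hmm_from` / `hmm_to` are hypothetical names for the profile coordinates of a domain match.

```python
def select_domain(hmm_from, hmm_to, i_evalue,
                  gene_profile_lg, coverage_threshold, i_evalue_sel):
    """Return (selected, profile_coverage) for one domain match."""
    # Fraction of the profile covered by the alignment (assumed definition).
    profile_coverage = (hmm_to - hmm_from + 1) / gene_profile_lg
    selected = (i_evalue <= i_evalue_sel
                and profile_coverage >= coverage_threshold)
    return selected, profile_coverage
```

For instance, a match spanning profile positions 1 to 80 of a 100-position profile has a coverage of 0.8 and is kept with a threshold of 0.5, provided its i-evalue does not exceed `i_evalue_sel`.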

_parse_hmm_header(h_grp)[source]¶

Parameters:	h_grp (sequence of strings) – the lines returned by the groupby function that form the header of a hit
Returns:	the sequence identifier from a set of lines that corresponds to a single hit
Return type:	string
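The header/body grouping mentioned here can be illustrated with `itertools.groupby`. The sketch below assumes that a hit starts on a line beginning with `>>` followed by the sequence identifier (the HMMER3 text layout); `hit_start` and `split_hits` are illustrative names, not the macsypy methods.

```python
from itertools import groupby


def hit_start(line):
    """True if the line opens a new hit section (assumed '>>' prefix)."""
    return line.startswith(">>")


def split_hits(lines):
    """Yield (hit_id, body_lines) pairs from raw output lines."""
    hit_id = None
    for is_header, group in groupby(lines, key=hit_start):
        group = list(group)
        if is_header:
            # Header group: the sequence id is the second field.
            hit_id = group[-1].split()[1]
        elif hit_id is not None:
            # Body group belonging to the most recent header.
            yield hit_id, group
            hit_id = None
```

`groupby` alternates between header groups and body groups, so each body is paired with the identifier read from the header just before it; lines preceding the first header are ignored.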

best_hit()[source]¶: Return the best hit among multiple hits

extract()[source]¶: Parse the raw Hmmer output file and produce a new synthetic report file by applying a filter on hits. The report contains the selected and sorted hits. This abstract method is implemented in the inherited classes.

save_extract()[source]¶: Write the string representation of the extract report to a file. The name of this file is the concatenation of the gene name and the “res_extract_suffix” value from the config object.
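The file naming rule can be sketched in a few lines. The output directory argument and the suffix value used in the example are assumptions made for illustration; only the "gene name + suffix" concatenation comes from the description above.

```python
import os


def extract_report_path(output_dir, gene_name, res_extract_suffix):
    """Build the report path: gene name immediately followed by the suffix."""
    return os.path.join(output_dir, gene_name + res_extract_suffix)


def save_report(report_text, output_dir, gene_name, res_extract_suffix):
    """Write the report text to its file and return the path used."""
    path = extract_report_path(output_dir, gene_name, res_extract_suffix)
    with open(path, "w") as handle:
        handle.write(report_text)
    return path
```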

GeneralHMMReport API reference¶

class macsypy.report.GeneralHMMReport(gene, hmmer_output, cfg)[source]¶

Handle HMM report. Extract a synthetic report from the raw hmmer output. Dedicated to any type of ‘unordered’ datasets.

extract()[source]¶: Parse the output file of Hmmer obtained from a search in an unordered set of sequences and produce a new synthetic report file.

OrderedHMMReport¶

class macsypy.report.OrderedHMMReport(gene, hmmer_output, cfg)[source]¶

Handle HMM report. Extract a synthetic report from the raw hmmer output. Dedicated to ‘ordered_replicon’ datasets.

extract()[source]¶: Parse the output file of Hmmer obtained from a search in an ordered set of sequences and produce a new synthetic report file.

GembaseHMMReport¶

class macsypy.report.GembaseHMMReport(gene, hmmer_output, cfg)[source]¶

Handle HMM report. Extract a synthetic report from the raw hmmer output. Dedicated to ‘gembase’ format datasets.

extract()[source]¶: Parse the output file of Hmmer obtained from a search in a ‘gembase’ set of sequences and produce a new synthetic report file.

Hit¶

class macsypy.report.Hit(gene, system, hit_id, hit_seq_length, replicon_name, position_hit, i_eval, score, profile_coverage, sequence_coverage, begin_match, end_match)[source]¶

Handle the hits filtered from the Hmmer search. The hits are instantiated by the HMMReport.extract() method.

__cmp__(other)[source]¶

Compare two Hits. If the sequence identifier is the same, compare the scores. Otherwise, compare the sequence identifiers alphabetically.

Parameters:	other (`macsypy.report.Hit` object) – the hit to compare to the current object
Returns:	the result of the comparison
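The comparison rule can be sketched with a plain comparison function. `__cmp__` is a Python 2 protocol; under Python 3 the same ordering can be obtained with `functools.cmp_to_key`. The `HitSketch` tuple is a hypothetical stand-in for the Hit class, and ranking lower scores first is an assumption, since the direction is not stated above.

```python
from collections import namedtuple
from functools import cmp_to_key

# Minimal stand-in carrying only the two fields used by the comparison.
HitSketch = namedtuple("HitSketch", ["id", "score"])


def compare_hits(a, b):
    """Return a negative, zero or positive number, like Python 2's cmp()."""
    if a.id == b.id:
        # Same sequence: order by score (ascending, by assumption).
        return (a.score > b.score) - (a.score < b.score)
    # Different sequences: alphabetical order of the identifiers.
    return (a.id > b.id) - (a.id < b.id)


def sort_hits(hits):
    """Sort hits with the rule above."""
    return sorted(hits, key=cmp_to_key(compare_hits))
```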

__eq__(other)[source]¶

Return True if two hits are totally equivalent, False otherwise.

Parameters:	other (`macsypy.report.Hit` object) – the hit to compare to the current object
Returns:	the result of the comparison
Return type:	boolean

__init__(gene, system, hit_id, hit_seq_length, replicon_name, position_hit, i_eval, score, profile_coverage, sequence_coverage, begin_match, end_match)[source]¶

Parameters:

gene (macsypy.gene.Gene object) – the gene corresponding to this profile
system (macsypy.system.System object) – the system to which this gene belongs
hit_id (string) – the identifier of the hit
hit_seq_length (integer) – the length of the hit sequence
replicon_name (string) – the name of the replicon
position_hit (integer) – the rank of the sequence matched in the input dataset file
i_eval (float) – the best-domain evalue (i-evalue, “independent evalue”)
score (float) – the score of the hit
profile_coverage (float) – percentage of the profile that matches the hit sequence
sequence_coverage (float) – percentage of the hit sequence that matches the profile
begin_match (integer) – where the hit with the profile starts in the sequence
end_match (integer) – where the hit with the profile ends in the sequence

__str__()[source]¶: Print useful information on the Hit: regarding Hmmer statistics, and sequence information

__weakref__¶: list of weak references to the object (if defined)

get_position()[source]¶

Returns:	the position of the hit (rank in the input dataset file)
Return type:	integer

get_syst_inter_gene_max_space()[source]¶

Returns:	the ‘inter_gene_max_space’ parameter defined for the gene of the hit
Return type:	integer
