hit

This module implements the classes related to hits and some functions that perform computations on hit objects.

macsypy.hit.CoreHit

Models an HMM hit on the replicon. There is only one CoreHit per CoreGene.

macsypy.hit.ModelHit

Models a hit and its relation to the Model.

macsypy.hit.AbstractCounterpartHit

Parent class of Loner and MultiSystem. It inherits from ModelHit.

macsypy.hit.Loner

Models a “true” Loner.

macsypy.hit.MultiSystem

Models a hit that can be used in several Systems (of the same model).

macsypy.hit.LonerMultiSystem

Models a hit representing a gene that is both Loner and MultiSystem.

macsypy.hit.HitWeight

The weights applied to a hit to compute its score.

macsypy.hit.get_best_hit_4_func()

Returns the best hit for a given function.

macsypy.hit.sort_model_hits()

Sorts model hits per function.

macsypy.hit.compute_best_MSHit()

Chooses the best one among several multisystem hits.

macsypy.hit.get_best_hits()

If several profiles hit the same gene, returns the best hit.

A Hit is created when hmmsearch finds similarities between a profile and a protein of the input dataset.

Below is the inheritance diagram of the Hit classes.

(arrows point from parent class to child class)

ModelHit -> AbstractCounterpartHit
AbstractCounterpartHit -> Loner
AbstractCounterpartHit -> MultiSystem
Loner -> LonerMultiSystem
MultiSystem -> LonerMultiSystem

CoreHit stands alone in the diagram.
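The inheritance relations of the diagram can be sketched with empty stand-in classes. This is only an illustration of the edges, not the macsypy implementation; in particular the order of the two bases of LonerMultiSystem is an assumption, since the diagram does not state it.

```python
# Empty stand-in classes that only mirror the inheritance edges of the diagram.
# They are NOT the macsypy classes and carry no behaviour.
class ModelHit:
    pass


class AbstractCounterpartHit(ModelHit):
    pass


class Loner(AbstractCounterpartHit):
    pass


class MultiSystem(AbstractCounterpartHit):
    pass


# Assumption: the order of the bases (Loner first) is not given by the diagram.
class LonerMultiSystem(Loner, MultiSystem):
    pass


# Method resolution order of the most derived class, as a list of class names.
mro_names = [cls.__name__ for cls in LonerMultiSystem.__mro__]
print(mro_names)
```

With this layout a LonerMultiSystem instance is also a Loner, a MultiSystem, an AbstractCounterpartHit and a ModelHit, so `isinstance` checks against any of these succeed.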

And a diagram showing the interactions between CoreGene, ModelGene, Model, Hit, Loner, …

../../_images/gene_obj_interaction.svg

The diagram above represents the models, genes and hits generated from the definitions below.

<model name="A" inter_gene_max_space="2">
    <gene name="abc" presence="mandatory"/>
    <gene name="def" presence="accessory"/>
</model>

<model name="B" inter_gene_max_space="5">
    <gene name="def" presence="mandatory">
        <exchangeables>
            <gene name="abc"/>
        </exchangeables>
    </gene>
    <gene name="ghj" presence="accessory"/>
</model>
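To see why one gene can give rise to hits in several models, the two definitions can be read with the standard library XML parser. The sketch below assumes that the exchangeables element is nested inside the "def" gene of model B and that every gene tag is closed; the helper name gene_roles is hypothetical and is not part of macsypy.

```python
import xml.etree.ElementTree as ET

MODEL_A = """<model name="A" inter_gene_max_space="2">
    <gene name="abc" presence="mandatory"/>
    <gene name="def" presence="accessory"/>
</model>"""

# Assumption: <exchangeables> is a child of the "def" gene.
MODEL_B = """<model name="B" inter_gene_max_space="5">
    <gene name="def" presence="mandatory">
        <exchangeables>
            <gene name="abc"/>
        </exchangeables>
    </gene>
    <gene name="ghj" presence="accessory"/>
</model>"""


def gene_roles(*model_xml):
    """Map each gene name to the list of (model name, role) pairs it plays."""
    roles = {}
    for text in model_xml:
        model = ET.fromstring(text)
        model_name = model.get("name")
        for gene in model.findall("gene"):
            roles.setdefault(gene.get("name"), []).append(
                (model_name, gene.get("presence"))
            )
            # Genes listed as exchangeables can stand in for the enclosing gene.
            for ex in gene.findall("exchangeables/gene"):
                roles.setdefault(ex.get("name"), []).append(
                    (model_name, f"exchangeable of {gene.get('name')}")
                )
    return roles


roles = gene_roles(MODEL_A, MODEL_B)
for name, where in roles.items():
    print(name, where)
```

Under these assumptions "abc" and "def" each appear in both models, which is the situation where a single CoreHit is wrapped by one ModelHit per model.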

hit API reference

CoreHit

class macsypy.hit.CoreHit(gene, hit_id, hit_seq_length, replicon_name, position_hit, i_eval, score, profile_coverage, sequence_coverage, begin_match, end_match)[source]

Handles the hits filtered from the Hmmer search. The hits are instantiated by the HMMReport.extract() method. In one run of MacSyFinder, there exists only one CoreHit per gene. These hits are independent of any macsypy.model.Model instance.

__eq__(other)[source]

Return True if two hits are totally equivalent, False otherwise.

Parameters

other (macsypy.hit.CoreHit object) – the hit to compare to the current object

Returns

the result of the comparison

Return type

boolean

__gt__(other)[source]

Compare two Hits. If the sequence identifier is the same, do the comparison on the score. Otherwise, do it on alphabetical comparison of the sequence identifier.

Parameters

other (macsypy.hit.CoreHit object) – the hit to compare to the current object

Returns

True if self is > other, False otherwise

__hash__()[source]

Required for the object to be hashable, so that it can be put in a set or used as a dict key.

__init__(gene, hit_id, hit_seq_length, replicon_name, position_hit, i_eval, score, profile_coverage, sequence_coverage, begin_match, end_match)[source]
Parameters
  • gene (macsypy.gene.CoreGene object) – the gene corresponding to this profile

  • hit_id (str) – the identifier of the hit

  • hit_seq_length (int) – the length of the hit sequence

  • replicon_name (str) – the name of the replicon

  • position_hit (int) – the rank of the sequence matched in the input dataset file

  • i_eval (float) – the best-domain evalue (i-evalue, “independent evalue”)

  • score (float) – the score of the hit

  • profile_coverage (float) – percentage of the profile that matches the hit sequence

  • sequence_coverage (float) – percentage of the hit sequence that matches the profile

  • begin_match (int) – where the hit with the profile starts in the sequence

  • end_match (int) – where the hit with the profile ends in the sequence

__lt__(other)[source]

Compare two Hits. If the sequence identifier is the same, do the comparison on the score. Otherwise, do it on alphabetical comparison of the sequence identifier.

Parameters

other (macsypy.hit.CoreHit object) – the hit to compare to the current object

Returns

True if self is < other, False otherwise
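The ordering rule described for `__gt__` and `__lt__` can be sketched on a toy class. ToyHit and its field names `id` and `score` are assumptions made for the illustration; the sketch only reproduces the documented rule (same identifier: compare scores, otherwise compare identifiers alphabetically) and is not the macsypy code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyHit:
    """Hypothetical stand-in holding only the fields used by the comparison."""
    id: str       # sequence identifier
    score: float  # score of the hit

    def __lt__(self, other):
        if self.id == other.id:
            return self.score < other.score
        return self.id < other.id

    def __gt__(self, other):
        if self.id == other.id:
            return self.score > other.score
        return self.id > other.id


hits = [ToyHit("seq_2", 10.0), ToyHit("seq_1", 50.0), ToyHit("seq_1", 20.0)]
# sorted() relies on __lt__: hits are grouped by identifier, then by score.
ordered = sorted(hits)
print(ordered)
```

Sorting with this rule places all hits of the same sequence next to each other, in ascending order of score.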

__str__()[source]
Returns

Useful information on the CoreHit: regarding Hmmer statistics, and sequence information

Return type

str

__weakref__

list of weak references to the object (if defined)

get_position()[source]
Returns

the position of the hit (rank in the input dataset file)

Return type

integer

ModelHit

class macsypy.hit.ModelHit(hit, gene_ref, gene_status)[source]

Encapsulates a macsypy.hit.CoreHit. This class stores a CoreHit that has been attributed to a putative system. Thus, it also stores:

  • the system,

  • the status of the gene in this system (‘mandatory’, ‘accessory’, …),

  • the gene in the model for which it’s an occurrence.

For one gene, several ModelHit instances can exist, one for each Model containing this gene.
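The relation between one CoreHit and several ModelHits can be illustrated with toy classes. Everything here is hypothetical: the real ModelHit takes a gene_ref object, which the sketch replaces by a plain model name to stay self-contained.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyCoreHit:
    """Hypothetical stand-in for a hit that is independent of any model."""
    id: str
    gene_name: str
    score: float


@dataclass(frozen=True)
class ToyModelHit:
    """Hypothetical wrapper tying a core hit to one model."""
    hit: ToyCoreHit
    model_name: str   # simplification: stands in for the gene_ref argument
    gene_status: str


# A single hit for gene "def" ...
core = ToyCoreHit(id="seq_42", gene_name="def", score=87.5)

# ... wrapped once per model that contains this gene.
model_hits = [
    ToyModelHit(core, model_name="A", gene_status="accessory"),
    ToyModelHit(core, model_name="B", gene_status="mandatory"),
]
for mh in model_hits:
    print(mh.model_name, mh.gene_status, mh.hit.id)
```

Both wrappers share the same underlying hit object; only the model-specific information differs.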

__eq__(other)[source]

Return self==value.

__gt__(other)[source]

Return self>value.

__hash__()[source]

Required for the object to be hashable, so that it can be put in a set or used as a dict key.

__init__(hit, gene_ref, gene_status)[source]
Parameters
__lt__(other)[source]

Return self<value.

__str__()[source]

Return str(self).

__weakref__

list of weak references to the object (if defined)

property hit
Returns

The CoreHit below this ModelHit

Return type

macsypy.hit.CoreHit object

property loner
Returns

True if the hit represents a loner macsypy.gene.ModelGene, False otherwise. A true Loner is a hit representing a gene with the loner attribute and which is not included in a cluster.

  • a hit representing a loner gene but included in a cluster is not a true loner

  • a hit which is not included with other genes in a cluster but does not represent a loner gene is not a true loner (this situation may happen when min_genes_required = 1)

Return type

bool
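The definition of a true loner combines two conditions, which can be written as a small truth table. The function name and its two boolean arguments are hypothetical; the sketch only restates the definition given above.

```python
def is_true_loner(gene_is_loner: bool, in_cluster: bool) -> bool:
    """A true loner: the gene is tagged loner AND the hit is outside any cluster."""
    return gene_is_loner and not in_cluster


# Enumerate the four possible situations.
cases = {
    (gene_is_loner, in_cluster): is_true_loner(gene_is_loner, in_cluster)
    for gene_is_loner in (True, False)
    for in_cluster in (True, False)
}
for (gene_is_loner, in_cluster), result in cases.items():
    print(f"loner gene={gene_is_loner}, in cluster={in_cluster} -> {result}")
```

Only one of the four combinations qualifies: a loner gene whose hit is outside any cluster.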

property multi_model
Returns

True if the hit represents a multi_model macsypy.gene.ModelGene, False otherwise.

Return type

bool

property multi_system
Returns

True if the hit represents a multi_system macsypy.gene.ModelGene, False otherwise.

Return type

bool

AbstractCounterpartHit

class macsypy.hit.AbstractCounterpartHit(hit, gene_ref=None, gene_status=None, counterpart=None)[source]

Abstract class to handle ModelHits that have equivalents, for instance Loner or MultiSystem hits.

__init__(hit, gene_ref=None, gene_status=None, counterpart=None)[source]
Parameters
__str__()[source]

Return str(self).

property counterpart
Returns

The set of hits that can play the same role

property loner
Returns

True if the hit represents a loner macsypy.gene.ModelGene, False otherwise. A true Loner is a hit representing a gene with the loner attribute and which is not included in a cluster.

  • a hit representing a loner gene but included in a cluster is not a true loner

  • a hit which is not included with other genes in a cluster but does not represent a loner gene is not a true loner (this situation may happen when min_genes_required = 1)

Return type

bool

property multi_system
Returns

True if the hit represents a multi_system macsypy.gene.ModelGene, False otherwise.

Return type

bool

Loner

class macsypy.hit.Loner(hit, gene_ref=None, gene_status=None, counterpart=None)[source]

Handles a hit that encodes a gene tagged as loner and that does not cluster with other hits.

__init__(hit, gene_ref=None, gene_status=None, counterpart=None)[source]

A hit that is outside a cluster; the gene_ref is a loner.

Parameters
property loner
Returns

True if the hit represents a loner macsypy.gene.ModelGene, False otherwise. A true Loner is a hit representing a gene with the loner attribute and which is not included in a cluster.

  • a hit representing a loner gene but included in a cluster is not a true loner

  • a hit which is not included with other genes in a cluster but does not represent a loner gene is not a true loner (this situation may happen when min_genes_required = 1)

Return type

bool

MultiSystem

class macsypy.hit.MultiSystem(hit, gene_ref=None, gene_status=None, counterpart=None)[source]

Handles a hit that encodes a gene tagged as multi_system, that is, a hit which can be used in several systems of the same model.

__init__(hit, gene_ref=None, gene_status=None, counterpart=None)[source]

A hit whose gene_ref is tagged as multi_system.

Parameters
property multi_system
Returns

True if the hit represents a multi_system macsypy.gene.ModelGene, False otherwise.

Return type

bool

LonerMultiSystem

class macsypy.hit.LonerMultiSystem(hit, gene_ref=None, gene_status=None, counterpart=None)[source]
Handles a hit that encodes a gene
  • tagged as multi-system

  • and also tagged as loner,

  • where the hit does not cluster with other hits.

__init__(hit, gene_ref=None, gene_status=None, counterpart=None)[source]

A hit that is outside a cluster; the gene_ref is both loner and multi_system.

Parameters

HitWeight

class macsypy.hit.HitWeight(itself: float = 1, exchangeable: float = 0.8, mandatory: float = 1, accessory: float = 0.5, neutral: float = 0, out_of_cluster: float = 0.7)[source]

The weights used to compute the cluster and system scores. See the “MacSyFinder functioning” section of the user documentation for further details. By default:

  • itself = 1

  • exchangeable = 0.8

  • mandatory = 1

  • accessory = 0.5

  • neutral = 0

  • out_of_cluster = 0.7
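As an illustration of how such weights might be combined, here is a minimal sketch with a toy dataclass that carries the same defaults. The combination rule in toy_hit_score (multiplying the status weight by the itself or exchangeable weight, then by out_of_cluster when relevant) is an assumption made for the example; the user documentation remains the reference for the actual scoring. ToyHitWeight and toy_hit_score are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyHitWeight:
    """Hypothetical stand-in carrying the default weights listed above."""
    itself: float = 1
    exchangeable: float = 0.8
    mandatory: float = 1
    accessory: float = 0.5
    neutral: float = 0
    out_of_cluster: float = 0.7


def toy_hit_score(weights, status, is_exchangeable=False, is_out_of_cluster=False):
    """Assumed rule: multiply every weight that applies to the hit."""
    score = getattr(weights, status)  # 'mandatory', 'accessory' or 'neutral'
    score *= weights.exchangeable if is_exchangeable else weights.itself
    if is_out_of_cluster:
        score *= weights.out_of_cluster
    return score


w = ToyHitWeight()
print(toy_hit_score(w, "mandatory"))
print(toy_hit_score(w, "accessory", is_exchangeable=True))
print(toy_hit_score(w, "mandatory", is_out_of_cluster=True))
```

Keeping the weights in one immutable object makes it easy to pass a customised set of weights to every scoring call.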

__weakref__

list of weak references to the object (if defined)

get_best_hit_4_func

macsypy.hit.get_best_hit_4_func(function, hits, key='score')[source]

Select the best Loner among several that encode the same function. The selection is based on one of the following criteria:

  • score

  • i_evalue

  • profile_coverage

Parameters
  • function (str) – the name of the function fulfilled by the hits (all hits must have the same function)

  • hits (sequence of macsypy.hit.ModelHit object) – the hits to filter.

  • key (str) – The criterion used to select the best hit: ‘score’, ‘i_evalue’ or ‘profile_coverage’

Returns

the best hit

Return type

macsypy.hit.ModelHit object
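Selecting a best hit by criterion can be sketched as follows. The sketch leaves out the function argument and works on a toy class; ToyFuncHit, toy_best_hit and the attribute name i_eval are assumptions. It treats a higher score and a higher profile coverage as better and a lower i-evalue as better.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToyFuncHit:
    """Hypothetical stand-in with the three attributes used for the selection."""
    gene_name: str
    score: float
    i_eval: float
    profile_coverage: float


def toy_best_hit(hits, key="score"):
    """Return the best hit of a non-empty sequence according to `key`."""
    if key == "score":
        return max(hits, key=lambda h: h.score)
    if key == "i_evalue":
        # A smaller e-value means a more significant match.
        return min(hits, key=lambda h: h.i_eval)
    if key == "profile_coverage":
        return max(hits, key=lambda h: h.profile_coverage)
    raise ValueError(f"unknown criterion: {key!r}")


candidates = [
    ToyFuncHit("abc", score=30.0, i_eval=1e-5, profile_coverage=0.9),
    ToyFuncHit("def", score=45.0, i_eval=1e-9, profile_coverage=0.7),
]
print(toy_best_hit(candidates, key="score").gene_name)
print(toy_best_hit(candidates, key="profile_coverage").gene_name)
```

The criteria need not agree: in this toy data the hit with the best score is not the one with the best profile coverage.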

sort_model_hits

macsypy.hit.sort_model_hits(model_hits)[source]

Sort macsypy.hit.ModelHit objects per function.

Parameters

model_hits – a sequence of macsypy.hit.ModelHit

Returns

dict {str function name: [model_hit, …] }
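The shape of the returned dictionary can be illustrated by grouping toy hits by function name. ToyGroupedHit and its function attribute are hypothetical; how macsypy derives the function name of a real ModelHit is not shown here.

```python
from collections import defaultdict, namedtuple

# Hypothetical stand-in: a hit identifier plus the name of the function it fulfils.
ToyGroupedHit = namedtuple("ToyGroupedHit", "hit_id function")


def toy_sort_model_hits(model_hits):
    """Group hits in a dict {function name: [hit, ...]}, keeping input order."""
    grouped = defaultdict(list)
    for model_hit in model_hits:
        grouped[model_hit.function].append(model_hit)
    return dict(grouped)


toy_hits = [
    ToyGroupedHit("seq_1", "abc"),
    ToyGroupedHit("seq_7", "def"),
    ToyGroupedHit("seq_9", "abc"),
]
by_function = toy_sort_model_hits(toy_hits)
print(by_function)
```

Each value of the dictionary can then be passed to a selection step that keeps a single hit per function.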

compute_best_MSHit

macsypy.hit.compute_best_MSHit(ms_registry)[source]
Parameters

ms_registry

Returns

get_best_hits

macsypy.hit.get_best_hits(hits, key='score')[source]

If several hits match the same protein, keep only the best match based either on

  • score

  • i_evalue

  • profile_coverage

Parameters
  • hits ([ macsypy.hit.CoreHit object, …]) – the hits to filter, all hits must match the same protein.

  • key (str) – The criterion used to select the best hit: ‘score’, ‘i_evalue’ or ‘profile_coverage’

Returns

the list of the best hits

Return type

[ macsypy.hit.CoreHit object, …]
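A minimal sketch of this kind of filtering, assuming hits are grouped by the identifier of the matched protein and that one hit is kept per protein. ToyProteinHit, CRITERIA and toy_get_best_hits are hypothetical names, and the sketch is not the macsypy implementation.

```python
from collections import namedtuple

# Hypothetical stand-in for a hit: matched protein id, profile (gene) name, statistics.
ToyProteinHit = namedtuple("ToyProteinHit", "id gene_name score i_eval profile_coverage")

# For each criterion: the attribute to read and whether larger values win.
CRITERIA = {
    "score": ("score", True),
    "i_evalue": ("i_eval", False),
    "profile_coverage": ("profile_coverage", True),
}


def toy_get_best_hits(hits, key="score"):
    """Keep one hit per protein id, chosen according to `key`."""
    if key not in CRITERIA:
        raise ValueError(f"unknown criterion: {key!r}")
    attr, larger_is_better = CRITERIA[key]
    best = {}
    for hit in hits:
        current = best.get(hit.id)
        if current is None:
            best[hit.id] = hit
            continue
        new, old = getattr(hit, attr), getattr(current, attr)
        if (new > old) if larger_is_better else (new < old):
            best[hit.id] = hit
    return list(best.values())


protein_hits = [
    ToyProteinHit("p1", "abc", 30.0, 1e-5, 0.9),
    ToyProteinHit("p1", "def", 45.0, 1e-9, 0.7),
    ToyProteinHit("p2", "ghj", 12.0, 1e-3, 0.8),
]
best_by_score = toy_get_best_hits(protein_hits)
best_by_coverage = toy_get_best_hits(protein_hits, key="profile_coverage")
print(best_by_score)
print(best_by_coverage)
```

Storing the direction of each criterion in a table keeps the selection loop identical for all three keys.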