The Pangenome class
- class ppanggolin.pangenome.Pangenome[source]
Bases: object
This class represents your pangenome. It is the basic unit for all analyses and gives access to the different elements of your pangenome, such as organisms, contigs, genes or gene families. It has setter and getter methods for most elements of the pangenome; use them to add new elements, or to retrieve the object that has a specific identifier and manipulate it directly.
- addEdge(gene1, gene2)[source]
Adds an edge between the gene families that the two given genes belong to. Gene objects are expected, and each must already have a family assigned.
- Parameters
gene1 (ppanggolin.genome.Gene) – The first gene
gene2 (ppanggolin.genome.Gene) – The second gene
- Returns
the created Edge
- Return type
ppanggolin.pangenome.Edge
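The behavior described for addEdge can be illustrated with a small, self-contained sketch. The classes below (`SketchFamily`, `SketchGene`, `SketchGraph`) are hypothetical stand-ins and not PPanGGOLiN's implementation. The sketch assumes that an edge is keyed by the unordered pair of families, and that a missing family is reported with a `ValueError`; neither detail is stated in the documentation above.

```python
class SketchFamily:
    """Hypothetical stand-in for a gene family."""

    def __init__(self, name):
        self.name = name


class SketchGene:
    """Hypothetical stand-in for a gene; `family` may still be unset."""

    def __init__(self, gene_id, family=None):
        self.ID = gene_id
        self.family = family


class SketchGraph:
    """Toy container that stores edges between gene families."""

    def __init__(self):
        # frozenset({family1, family2}) -> list of supporting gene pairs
        self._edges = {}

    def add_edge(self, gene1, gene2):
        # Both genes must already have a family assigned.
        if gene1.family is None or gene2.family is None:
            raise ValueError("both genes need a family before adding an edge")
        # Assumption: one edge per unordered pair of families.
        key = frozenset((gene1.family, gene2.family))
        edge = self._edges.setdefault(key, [])
        edge.append((gene1, gene2))
        return edge


fam_a, fam_b = SketchFamily("famA"), SketchFamily("famB")
graph = SketchGraph()
first = graph.add_edge(SketchGene("g1", fam_a), SketchGene("g2", fam_b))
second = graph.add_edge(SketchGene("g3", fam_b), SketchGene("g4", fam_a))
```

In this sketch both calls return the same edge object, because the two gene pairs link the same two families regardless of order.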
- addFile(pangenomeFile)[source]
Links an HDF5 file to the pangenome. Elements are loaded from this file when needed, and anything that is computed is saved to it when ppanggolin.formats.writeBinaries.writePangenome() is called.
- Parameters
pangenomeFile (str) – A string representing the filepath to the HDF5 pangenome file to be either used or created
- addGeneFamily(name)[source]
Gets the ppanggolin.geneFamily.GeneFamily object that has the given name, creating it if it does not exist. Returns the gene family object.
- Parameters
name (str) – The name of the gene family to get if it exists, and to create otherwise.
- addOrganism(newOrg)[source]
Adds an organism to the pangenome. If a ppanggolin.genome.Organism object is provided, it is added, and an error is raised if an organism with the same name already exists. If a str is provided, returns the organism that has this name, creating it if it does not exist.
- Parameters
newOrg (ppanggolin.genome.Organism or str) – Organism to add to the pangenome
- Returns
The created organism
- Return type
ppanggolin.genome.Organism
- Raises
TypeError – if the provided newOrg is neither a str nor a ppanggolin.genome.Organism
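The two input types accepted by addOrganism lead to different behavior, which the following sketch tries to make concrete. `SketchOrganism` and `SketchOrganismStore` are hypothetical stand-ins, not the library's classes. The documentation only says that a duplicate object "will raise an error"; using `KeyError` for that case is an assumption made here.

```python
class SketchOrganism:
    """Hypothetical stand-in for an organism, identified by its name."""

    def __init__(self, name):
        self.name = name


class SketchOrganismStore:
    """Toy container mimicking the documented addOrganism behavior."""

    def __init__(self):
        self._orgs = {}  # name -> SketchOrganism

    def add_organism(self, new_org):
        if isinstance(new_org, SketchOrganism):
            # An object is only accepted if its name is not taken yet.
            # Assumption: the duplicate case raises KeyError.
            if new_org.name in self._orgs:
                raise KeyError(f"organism {new_org.name!r} already exists")
            self._orgs[new_org.name] = new_org
            return new_org
        if isinstance(new_org, str):
            # A name returns the existing organism, or creates a new one.
            org = self._orgs.get(new_org)
            if org is None:
                org = SketchOrganism(new_org)
                self._orgs[new_org] = org
            return org
        raise TypeError("new_org must be a str or a SketchOrganism")


store = SketchOrganismStore()
org = store.add_organism("strainA")
same = store.add_organism("strainA")  # same object, nothing new is created
```

Passing a name is therefore safe to repeat, whereas passing an object whose name is already registered fails.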
- addRegions(regionGroup)[source]
Takes a Region object or an Iterable of Region objects and adds them to the pangenome.
- Parameters
regionGroup (ppanggolin.region.Region or Iterable[ppanggolin.region.Region]) – a region or an Iterable of regions to add to the pangenome
- Raises
TypeError – if regionGroup is neither a Region nor an Iterable[ppanggolin.region.Region]
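Accepting either a single region or an iterable of regions can be handled with a simple type dispatch. The sketch below uses a hypothetical `SketchRegion` class and a plain dict as the store; it is one possible way to implement the documented contract, not the library's code. Checking every element before storing anything is a design choice of this sketch.

```python
from collections.abc import Iterable


class SketchRegion:
    """Hypothetical stand-in for a region (RGP), identified by its name."""

    def __init__(self, name):
        self.name = name


def add_regions(store, region_group):
    """Add one region, or an iterable of regions, to the dict `store`."""
    if isinstance(region_group, SketchRegion):
        candidates = [region_group]
    elif isinstance(region_group, Iterable):
        candidates = list(region_group)
    else:
        raise TypeError("expected a region or an iterable of regions")
    # Validate everything first so a bad element leaves the store untouched.
    for region in candidates:
        if not isinstance(region, SketchRegion):
            raise TypeError("the iterable must contain only regions")
    for region in candidates:
        store[region.name] = region


regions = {}
add_regions(regions, SketchRegion("RGP_1"))
add_regions(regions, (SketchRegion(f"RGP_{i}") for i in (2, 3)))
```

Note that a str is itself iterable, so it reaches the element check and is rejected there.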
- addSpots(spots)[source]
Adds the given iterable of spots to the pangenome.
- Parameters
spots (Iterable[ppanggolin.region.Spot]) – An iterable of spots.
- computeFamilyBitarrays()[source]
Based on the index generated by ppanggolin.pangenome.Pangenome.getIndex(), generates a bitarray for each gene family. If family j is present in the organism with index i, the bit at position i of its bitarray is 1; otherwise it is 0.
- Returns
A dictionary with ppanggolin.genome.Organism as key and int as value.
- Return type
dict[ppanggolin.genome.Organism, int]
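The relationship between the organism index and the per-family bitarrays can be sketched as follows. Plain Python integers stand in for whatever bitarray type the library uses, and organisms and families are represented by strings; the function names `build_index` and `family_bitarrays` are hypothetical.

```python
def build_index(organisms):
    """Assign each organism a distinct integer position."""
    return {org: i for i, org in enumerate(organisms)}


def family_bitarrays(index, family_to_organisms):
    """Return {family: int}; bit i is 1 if the organism with index i has the family."""
    bitarrays = {}
    for family, orgs in family_to_organisms.items():
        bits = 0
        for org in orgs:
            bits |= 1 << index[org]  # set the bit at this organism's position
        bitarrays[family] = bits
    return bitarrays


organisms = ["orgA", "orgB", "orgC"]
presence = {"fam1": {"orgA", "orgC"}, "fam2": {"orgB"}, "fam3": set()}

index = build_index(organisms)
bitarrays = family_bitarrays(index, presence)

# Bitwise AND keeps only the organisms that carry both families.
shared = bitarrays["fam1"] & bitarrays["fam2"]
# Counting the 1 bits gives the number of organisms carrying a family.
fam1_count = bin(bitarrays["fam1"]).count("1")
```

With this encoding, presence/absence comparisons between families reduce to integer bit operations.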
- property edges
Returns all the edges in the pangenome graph
- Returns
list of ppanggolin.pangenome.Edge
- Return type
list
- property geneFamilies
Returns all the gene families in the pangenome
- Returns
list of ppanggolin.geneFamily.GeneFamily
- Return type
list
- property genes
Creates the geneGetter if it does not exist, and returns all the genes of all organisms in the pangenome.
- Returns
list of ppanggolin.genome.Gene
- Return type
list
- getGene(geneID)[source]
Returns the gene that has the given geneID
- Parameters
geneID (any) – The gene ID to look for
- Returns
The gene that has the ID geneID
- Return type
ppanggolin.genome.Gene
- Raises
KeyError – If the geneID is not in the pangenome
- getGeneFamily(name)[source]
Returns the gene family that has the given name
- Parameters
name (any) – The gene family name to look for
- Returns
The gene family that has the given name
- Return type
ppanggolin.geneFamily.GeneFamily
- getIndex()[source]
Creates an index for Organisms (each organism is assigned an integer).
- Returns
A dictionary with ppanggolin.genome.Organism as key and int as value.
- Return type
dict[ppanggolin.genome.Organism, int]
- getOrAddRegion(regionName)[source]
Returns a region with the given regionName. Creates it if it does not exist.
- Parameters
regionName (str) – The name of the region to return
- Returns
The region
- Return type
ppanggolin.region.Region
- get_multigenics(dup_margin)[source]
Returns the multigenic persistent families of the pangenome graph. A family is considered multigenic if it is present in multiple copies in more than a fraction dup_margin of the genomes where it is present.
- Parameters
dup_margin (float) – the ratio of presence in multicopy above which a gene family is considered multigenic
- Returns
a set of gene families considered multigenic
- Return type
set[ppanggolin.geneFamily.GeneFamily]
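The dup_margin criterion can be worked through on toy data. The function below is a hypothetical sketch rather than the library's implementation: families and organisms are plain strings, copy numbers are given as a nested dict, and the comparison is strict ("more than"), following the wording above. How the library treats a ratio exactly equal to dup_margin is not stated here.

```python
def multigenic_families(copies, persistent, dup_margin):
    """
    copies: {family: {organism: number of gene copies in that organism}}
    persistent: names of the families in the persistent partition
    dup_margin: ratio of multicopy presence above which a family is multigenic
    """
    result = set()
    for family in persistent:
        # Only organisms where the family is present count in the ratio.
        present = [n for n in copies[family].values() if n >= 1]
        if not present:
            continue
        duplicated = sum(1 for n in present if n > 1)
        if duplicated / len(present) > dup_margin:
            result.add(family)
    return result


copies = {
    "famA": {"org1": 2, "org2": 2, "org3": 1, "org4": 1},  # 2 of 4 genomes
    "famB": {"org1": 1, "org2": 1, "org3": 3},             # 1 of 3 genomes
    "famC": {"org1": 2, "org2": 3},                        # not persistent
}
persistent = {"famA", "famB"}

at_40_percent = multigenic_families(copies, persistent, 0.4)
at_5_percent = multigenic_families(copies, persistent, 0.05)
```

Here famA is duplicated in half of the genomes that carry it, so it passes a margin of 0.4, while famB only passes the lower margin; famC is never considered because it is not in the persistent set.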
- number_of_geneFamilies()[source]
Returns the number of gene families present in the pangenome
- Returns
the number of gene families
- Return type
int
- number_of_organisms()[source]
Returns the number of organisms present in the pangenome
- Returns
the number of organisms
- Return type
int
- property organisms
Returns all the organisms in the pangenome
- Returns
list of ppanggolin.genome.Organism
- Return type
list
- property regions
Returns all the regions (RGP) in the pangenome
- Returns
list of ppanggolin.region.Region
- Return type
list