The Pangenome class

class ppanggolin.pangenome.Pangenome[source]

Bases: object

This class represents your pangenome. It is the basic unit used by all analyses to access the different elements of the pangenome, such as organisms, contigs, genes or gene families. It has setter and getter methods for most of these elements; use them to add new elements, or to retrieve the object that has a specific identifier and manipulate it directly.

addEdge(gene1, gene2)[source]

Adds an edge between the two gene families that the two given genes belong to. Gene objects are expected, and each of them is expected to have a family assigned.

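Parameters

gene1 (ppanggolin.genome.Gene) – The first gene; it must have a family assigned

gene2 (ppanggolin.genome.Gene) – The second gene; it must have a family assigned

Returns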

the created Edge

Return type

ppanggolin.pangenome.Edge
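Linking two families through their genes can be sketched with stand-in classes. Everything below (Family, Gene, Graph, add_edge) is hypothetical and only mirrors the behaviour described above; it is not the ppanggolin implementation. In particular, reusing an existing edge when the two families are already linked, and raising ValueError for a gene without a family, are assumptions of this sketch.

```python
# Stand-in classes for illustration only; these are NOT the ppanggolin classes.
class Family:
    def __init__(self, name):
        self.name = name


class Gene:
    def __init__(self, gene_id, family=None):
        self.gene_id = gene_id
        self.family = family  # must be set before an edge can be added


class Graph:
    def __init__(self):
        # frozenset of two family names -> list of supporting gene pairs
        self.edges = {}

    def add_edge(self, gene1, gene2):
        if gene1.family is None or gene2.family is None:
            # Assumption: the sketch rejects genes that have no family.
            raise ValueError("both genes need an assigned family")
        key = frozenset((gene1.family.name, gene2.family.name))
        # Assumption: an edge between the same two families is reused.
        self.edges.setdefault(key, []).append((gene1.gene_id, gene2.gene_id))
        return key


graph = Graph()
fam_a, fam_b = Family("famA"), Family("famB")
graph.add_edge(Gene("g1", fam_a), Gene("g2", fam_b))
graph.add_edge(Gene("g3", fam_b), Gene("g4", fam_a))  # same pair of families
edge_count = len(graph.edges)
```

The edge is keyed by the pair of families rather than by the genes, which is why both calls end up on the same edge.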

addFile(pangenomeFile)[source]

Links an HDF5 file to the pangenome. Elements are loaded from this file when needed, and anything that has been computed is saved to it when ppanggolin.formats.writeBinaries.writePangenome() is called.

Parameters

pangenomeFile (str) – A string representing the filepath to the hdf5 pangenome file to be either used or created

addGeneFamily(name)[source]

Returns the ppanggolin.geneFamily.GeneFamily object that has the given name. If it does not exist, it is created first.

Parameters

name (str) – The gene family name to get if it exists, and create otherwise.
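The get-or-create behaviour described above can be sketched as follows. Family, FamilyStore and add_family are hypothetical stand-ins, not ppanggolin names; the point is only that calling the method twice with the same name yields the same object.

```python
# Hypothetical stand-ins illustrating get-or-create; not the ppanggolin API.
class Family:
    def __init__(self, name):
        self.name = name


class FamilyStore:
    def __init__(self):
        self._families = {}  # family name -> Family

    def add_family(self, name):
        fam = self._families.get(name)
        if fam is None:
            # The name is unknown, so create and register a new family.
            fam = Family(name)
            self._families[name] = fam
        return fam


store = FamilyStore()
first = store.add_family("famA")
second = store.add_family("famA")  # returns the existing object
```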

addOrganism(newOrg)[source]

Adds an organism to the pangenome. If a ppanggolin.genome.Organism object is provided, it is added as a new organism, and an error is raised if an organism with the same name already exists. If a str is provided, the organism that has this name is returned, or a new one is created if it does not exist.

Parameters

newOrg (ppanggolin.genome.Organism or str) – Organism to add to the pangenome

Returns

The created organism

Return type

ppanggolin.genome.Organism

Raises

TypeError – if the provided newOrg is neither a str nor a ppanggolin.genome.Organism
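The two accepted input types lead to two different behaviours, sketched below with hypothetical stand-ins (Organism, OrganismStore, add_organism). The description above does not say which error is raised for a duplicate organism object; KeyError is an assumption of this sketch.

```python
# Hypothetical stand-ins; not the ppanggolin classes.
class Organism:
    def __init__(self, name):
        self.name = name


class OrganismStore:
    def __init__(self):
        self._orgs = {}  # organism name -> Organism

    def add_organism(self, new_org):
        if isinstance(new_org, Organism):
            if new_org.name in self._orgs:
                # Assumption: the exact error type is not documented.
                raise KeyError(f"organism {new_org.name!r} already exists")
            self._orgs[new_org.name] = new_org
            return new_org
        if isinstance(new_org, str):
            # A name either finds the existing organism or creates one.
            if new_org not in self._orgs:
                self._orgs[new_org] = Organism(new_org)
            return self._orgs[new_org]
        raise TypeError("expected an Organism or a str")


orgs = OrganismStore()
org_a = orgs.add_organism("orgA")        # created from its name
same_org = orgs.add_organism("orgA")     # found by name, not recreated
org_b = orgs.add_organism(Organism("orgB"))
```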

addRegions(regionGroup)[source]

Takes a Region object or an Iterable of Region objects and adds them to the pangenome.

Parameters

regionGroup (ppanggolin.region.Region or Iterable[ppanggolin.region.Region]) – a region or an Iterable of regions to add to the pangenome

Raises

TypeError – if regionGroup is neither a Region nor an Iterable[ppanggolin.region.Region]
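Accepting either a single object or an iterable of them is a common pattern; a minimal sketch follows. Region and add_regions are hypothetical stand-ins here, and storing regions in a plain dict keyed by name is an assumption made for illustration.

```python
from collections.abc import Iterable


# Hypothetical stand-in; not the ppanggolin Region class.
class Region:
    def __init__(self, name):
        self.name = name


def add_regions(store, region_group):
    """Add one Region, or every Region of an iterable, to the dict `store`."""
    if isinstance(region_group, Region):
        store[region_group.name] = region_group
    elif isinstance(region_group, Iterable):
        for region in region_group:
            store[region.name] = region
    else:
        raise TypeError("expected a Region or an iterable of Region objects")


regions = {}
add_regions(regions, Region("RGP_1"))
add_regions(regions, [Region("RGP_2"), Region("RGP_3")])
```

The single-object check comes first so that a Region is never mistaken for a collection.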

addSpots(spots)[source]

Adds the given iterable of spots to the pangenome.

Parameters

spots (Iterable[ppanggolin.region.Spot]) – An iterable of spots.

computeFamilyBitarrays()[source]

Based on the index generated by ppanggolin.pangenome.Pangenome.getIndex(), generates a bitarray for each gene family. If family j is present in the organism with index i, the bit at position i of its bitarray is 1; otherwise it is 0.

Returns

A dictionary with ppanggolin.genome.Organism as key and int as value.

Return type

dict[ppanggolin.genome.Organism, int]
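The relationship between the organism index and a family's presence bits can be sketched with plain Python integers. The helpers make_index and family_bits are hypothetical, organisms are represented by strings, and counting position i from the least significant bit is an assumption of this sketch; the library may use a dedicated bitarray type and a different layout.

```python
def make_index(organisms):
    """Assign each organism a distinct integer, starting at 0."""
    return {org: i for i, org in enumerate(organisms)}


def family_bits(index, family_orgs):
    """Set bit i for every organism with index i that holds the family."""
    bits = 0
    for org in family_orgs:
        bits |= 1 << index[org]
    return bits


index = make_index(["orgA", "orgB", "orgC"])
bits = family_bits(index, ["orgA", "orgC"])  # family absent from orgB

# Presence test for one organism: shift its bit down and mask it.
present_in_a = bool((bits >> index["orgA"]) & 1)
present_in_b = bool((bits >> index["orgB"]) & 1)
```

With this encoding, families can be compared across all organisms at once using bitwise operations, for example `bits_1 & bits_2` for the organisms that hold both families.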

property edges

Returns all the edges in the pangenome graph.

Returns

list of ppanggolin.pangenome.Edge

Return type

list

property geneFamilies

Returns all the gene families in the pangenome.

Returns

list of ppanggolin.geneFamily.GeneFamily

Return type

list

property genes

Creates the geneGetter if it does not exist, and returns all the genes of all organisms in the pangenome.

Returns

list of ppanggolin.genome.Gene

Return type

list

getGene(geneID)[source]

Returns the gene that has the given geneID.

Parameters

geneID (any) – The gene ID to look for

Returns

the gene that has the ID geneID

Return type

ppanggolin.genome.Gene

Raises

KeyError – If the geneID is not in the pangenome
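A lookup of this kind is typically backed by a dictionary that is built on first use. The sketch below assumes that design; GeneLookup, get_gene and the (organism, gene ID) tuples standing in for gene objects are all hypothetical and do not come from ppanggolin.

```python
# Hypothetical sketch of a lazily built gene lookup; not the ppanggolin code.
class GeneLookup:
    def __init__(self, organisms):
        self._organisms = organisms  # organism name -> list of gene IDs
        self._gene_getter = None     # built on the first lookup

    def _build_getter(self):
        # Tuples of (organism name, gene ID) stand in for gene objects.
        self._gene_getter = {
            gene_id: (org, gene_id)
            for org, gene_ids in self._organisms.items()
            for gene_id in gene_ids
        }

    def get_gene(self, gene_id):
        if self._gene_getter is None:
            self._build_getter()
        return self._gene_getter[gene_id]  # raises KeyError if unknown


lookup = GeneLookup({"orgA": ["g1", "g2"], "orgB": ["g3"]})
found = lookup.get_gene("g3")

try:
    lookup.get_gene("missing")
    missing_raised = False
except KeyError:
    missing_raised = True
```

Callers that cannot guarantee the ID exists should catch KeyError, as in the last lines of the sketch.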

getGeneFamily(name)[source]

Returns the gene family that has the given name.

Parameters

name (any) – The gene family name to look for

Returns

the gene family that has the given name

Return type

ppanggolin.geneFamily.GeneFamily

getIndex()[source]

Creates an index for organisms (each organism is assigned an integer).

Returns

A dictionary with ppanggolin.genome.Organism as key and int as value.

Return type

dict[ppanggolin.genome.Organism, int]

getOrAddRegion(regionName)[source]

Returns a region with the given regionName. Creates it if it does not exist.

Parameters

regionName (str) – The name of the region to return

Returns

The region

Return type

ppanggolin.region.Region

get_multigenics(dup_margin)[source]

Returns the multigenic persistent families of the pangenome graph. A family will be considered multigenic if it is duplicated in more than dup_margin of the genomes where it is present.

Parameters

dup_margin (float) – the ratio of presence in multicopy above which a gene family is considered multigenic

Returns

a set of gene families considered multigenic

Return type

set[ppanggolin.geneFamily.GeneFamily]
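The dup_margin criterion can be illustrated for a single family with a small helper. The function is_multigenic and its input format (a dict of copy numbers per genome) are hypothetical; the strict "greater than" comparison follows the wording above, and the restriction to persistent families is left out of this sketch.

```python
def is_multigenic(copies_per_genome, dup_margin):
    """Return True if the family is duplicated in more than `dup_margin`
    of the genomes in which it is present.

    copies_per_genome maps a genome name to the family's copy number there.
    """
    present = [n for n in copies_per_genome.values() if n > 0]
    if not present:
        return False  # a family present nowhere cannot be multigenic
    duplicated = sum(1 for n in present if n > 1)
    return duplicated / len(present) > dup_margin


# Present in three genomes, in multiple copies in two of them (ratio 2/3).
counts = {"orgA": 2, "orgB": 1, "orgC": 3, "orgD": 0}
low_margin = is_multigenic(counts, 0.05)
high_margin = is_multigenic(counts, 0.7)
```

Genomes where the family is absent (orgD here) do not enter the ratio, since the criterion is defined over the genomes where the family is present.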

number_of_geneFamilies()[source]

Returns the number of gene families present in the pangenome

Returns

the number of gene families

Return type

int

number_of_organisms()[source]

Returns the number of organisms present in the pangenome

Returns

the number of organisms

Return type

int

property organisms

Returns all the organisms in the pangenome.

Returns

list of ppanggolin.genome.Organism

Return type

list

property regions

Returns all the regions (RGP) in the pangenome.

Returns

list of ppanggolin.region.Region

Return type

list