Developer doc

PPanGGOLiN is both a command-line tool and a Python library for comparative genomics. It aims to provide cutting-edge methods for large-scale comparative analysis and stores all computed results in a compact format so that they can be reused at will. This part of the documentation is for people who want to use PPanGGOLiN as a Python library, maintain the package, or modify it.

If you are looking for the command-line tool documentation of PPanGGOLiN, check the GitHub wiki instead.

Subpackages

There is a ppanggolin subpackage for each specific step of the analysis. Each subpackage is associated with one or more subcommands.

Submodules

Submodules include all the basic classes of PPanGGOLiN that are used by the subpackages.

ppanggolin.genome module

class ppanggolin.genome.Contig(name, is_circular=False)[source]

Bases: object

addGene(gene)[source]
addRNA(gene)[source]
property genes
class ppanggolin.genome.Feature(ID)[source]

Bases: object

add_dna(dna)[source]
fill_annotations(start, stop, strand, geneType='', name='', product='', local_identifier='', position=None, genetic_code=11)[source]
fill_parents(organism, contig)[source]
class ppanggolin.genome.Gene(ID)[source]

Bases: Feature

add_protein(protein)[source]
fill_annotations(start, stop, strand, geneType='', name='', product='', local_identifier='', position=None, genetic_code=11)[source]
class ppanggolin.genome.Organism(name)[source]

Bases: object

property contigs
property families

Returns the gene families present in the organism.

property genes
getOrAddContig(key, is_circular=False)[source]
number_of_genes()[source]
class ppanggolin.genome.RNA(ID)[source]

Bases: Feature

ppanggolin.main module

ppanggolin.main.checkInputFiles(anno=None, pangenome=None, fasta=None)[source]

Checks whether the provided input files exist and are in the proper format.
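The existence half of that check can be illustrated with a short stand-alone sketch. It does not import ppanggolin; the name check_input_files and the error messages are hypothetical, and format validation is deliberately left out because this page does not describe it.

```python
import os


def check_input_files(anno=None, pangenome=None, fasta=None):
    """Raise FileNotFoundError for any provided path that is not a file.

    Hypothetical stand-in: only existence is checked here, not file format.
    """
    provided = {"anno": anno, "pangenome": pangenome, "fasta": fasta}
    for label, path in provided.items():
        # Arguments left at None are treated as "not provided" and skipped.
        if path is not None and not os.path.isfile(path):
            raise FileNotFoundError(f"{label} file not found: {path}")
```

Calling check_input_files() with no arguments passes silently, which mirrors the all-optional signature documented above.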

ppanggolin.main.checkLog(name)[source]
ppanggolin.main.checkTsvSanity(tsv)[source]
ppanggolin.main.cmdLine()[source]
ppanggolin.main.main()[source]

ppanggolin.region module

class ppanggolin.region.Region(ID)[source]

Bases: object

append(value)[source]
property contig
property families
getBorderingGenes(n, multigenics)[source]
getRNAs()[source]
property isContigBorder
property isWholeContig

Indicates whether the region spans an entire contig.

property organism
property start
property startGene
property stop
property stopGene
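One way to picture the isWholeContig test is with gene positions alone. The sketch below is stand-alone and makes two assumptions that this page does not state: gene positions are 0-based indices along the contig, and a region covers the whole contig when it reaches both the first and the last position. The function name is hypothetical.

```python
def is_whole_contig(region_positions, contig_gene_count):
    """Return True if the region's genes reach both ends of the contig.

    region_positions: 0-based positions of the region's genes (assumption).
    contig_gene_count: total number of genes on the contig.
    """
    first = min(region_positions)
    last = max(region_positions)
    return first == 0 and last == contig_gene_count - 1
```

For a contig with ten genes, a region covering positions 0 to 9 counts as the whole contig, while one covering positions 1 to 9 does not.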
class ppanggolin.region.Spot(ID)[source]

Bases: object

addRegion(region)[source]
addRegions(regions)[source]

Adds the region(s) contained in an Iterable to the spot; all of them have the same bordering persistent genes, provided with ‘borders’.

borders(set_size, multigenics)[source]

Extracts all the borders of all RGPs belonging to the spot.

countUniqContent()[source]

Returns a counter with a representative RGP as key and the number of RGPs with identical gene family content as value.

countUniqOrderedSet()[source]

Returns a counter with a representative RGP as key and the number of RGPs with identical synteny as value.

getUniqContent()[source]

Returns an Iterable of all the unique RGPs (in terms of gene family content) in the spot.

getUniqOrderedSet()[source]

Returns an Iterable of all the unique syntenies in the spot.
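The idea behind counting RGPs by a representative can be sketched without ppanggolin. In this stand-alone example, RGPs are plain names mapped to lists of gene family names; the function name, the input shape, and the choice of the first RGP seen as the representative are all assumptions made for illustration, not the library's implementation.

```python
from collections import Counter


def count_unique_content(rgp_families):
    """Count RGPs that share the same set of gene families.

    rgp_families: dict mapping an RGP name to an iterable of family names.
    Returns a Counter keyed by the first RGP seen with each family set.
    """
    representatives = {}  # frozenset of families -> representative RGP name
    counts = Counter()
    for name, families in rgp_families.items():
        key = frozenset(families)  # order and duplicates are ignored
        representative = representatives.setdefault(key, name)
        counts[representative] += 1
    return counts


example = {
    "rgp1": ["famA", "famB"],
    "rgp2": ["famB", "famA"],  # same content as rgp1, different order
    "rgp3": ["famA", "famC"],
}
unique_counts = count_unique_content(example)
```

Here unique_counts maps rgp1 to 2 and rgp3 to 1. A synteny-based variant could key on tuple(families) instead of frozenset(families), so that gene order matters; whether a reversed order should count as identical is a design choice this page does not settle.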

ppanggolin.utils module

ppanggolin.utils.get_num_lines(file)[source]
ppanggolin.utils.is_compressed(file_or_file_path)[source]

Checks whether a given file or file path is compressed.
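A common way to make such a check is to look at the first bytes of the file rather than its extension. The sketch below is stand-alone, handles gzip only, and accepts only a path; the name looks_gzipped is hypothetical, and the formats that ppanggolin.utils.is_compressed recognizes are not listed on this page.

```python
GZIP_MAGIC = b"\x1f\x8b"  # the two bytes that start every gzip stream


def looks_gzipped(path):
    """Return True if the file at `path` starts with the gzip magic number."""
    with open(path, "rb") as handle:
        return handle.read(2) == GZIP_MAGIC
```

Checking the magic number means that a gzip file without a .gz extension is still detected, and a plain file that happens to be named .gz is not.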

ppanggolin.utils.jaccard_similarities(mat, jaccard_similarity_th)[source]
ppanggolin.utils.mkFilename(basename, output, force)[source]

Returns a usable filename for a ppanggolin output file, or crashes.
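The basename, output and force parameters suggest a pattern in which the output directory is created on demand and an existing file is only reused when force is set. The stand-alone sketch below follows that pattern as an assumption; the function name and the choice of FileExistsError are hypothetical and may differ from what mkFilename does.

```python
from pathlib import Path


def make_output_filename(basename, output_dir, force=False):
    """Return a path inside `output_dir`, refusing to overwrite unless forced."""
    directory = Path(output_dir)
    # Create the output directory (and any parents) if it is missing.
    directory.mkdir(parents=True, exist_ok=True)
    filename = directory / basename
    if filename.exists() and not force:
        raise FileExistsError(
            f"{filename} already exists; pass force=True to overwrite it"
        )
    return filename
```

Raising an exception instead of exiting keeps the decision about how to report the problem with the caller.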

ppanggolin.utils.mkOutdir(output, force)[source]
ppanggolin.utils.read_compressed_or_not(file_or_file_path)[source]

Reads a file object or file path, decompressing it if need be. Returns a read-only TextIO object.
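As an illustration of that behaviour, the stand-alone sketch below opens either a path or a binary file object as read-only text and decompresses gzip input transparently. It is not the library's code: the function name is hypothetical, only gzip is handled, and file objects are assumed to be seekable and opened in binary mode.

```python
import gzip
import io
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # the two bytes that start every gzip stream


def open_text_maybe_gzip(source, encoding="utf-8"):
    """Return a read-only text stream for a path or a binary file object."""
    if isinstance(source, (str, Path)):
        # Probe the first two bytes, then reopen in the appropriate mode.
        with open(source, "rb") as probe:
            compressed = probe.read(2) == GZIP_MAGIC
        if compressed:
            return gzip.open(source, "rt", encoding=encoding)
        return open(source, "r", encoding=encoding)
    # Otherwise assume a seekable binary file object (assumption).
    head = source.read(2)
    source.seek(0)
    if head == GZIP_MAGIC:
        return gzip.open(source, "rt", encoding=encoding)
    return io.TextIOWrapper(source, encoding=encoding)
```

The caller receives a text stream in both cases, so downstream parsing code does not need to know whether the input was compressed.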

ppanggolin.utils.restricted_float(x)[source]
ppanggolin.utils.write_compressed_or_not(file_path, compress)[source]

Returns a file-like object, compressed or not.
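The writing side can be sketched in the same spirit. This stand-alone example returns a text-mode handle that is gzip-compressed when requested; the function name and the choice to append ".gz" to the path when compressing are assumptions for illustration.

```python
import gzip


def open_for_writing(file_path, compress=False):
    """Return a writable text handle, gzip-compressed if `compress` is true."""
    if compress:
        # Assumption: compressed output gets a ".gz" suffix.
        return gzip.open(file_path + ".gz", "wt")
    return open(file_path, "w")
```

Both return values work as context managers, so a caller can write "with open_for_writing(path, compress) as out:" and use out.write() without caring which kind of handle it received.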

Module contents