The formats package
This package is used by the other packages to read the pangenome from, and write it to, an HDF5 file.
Submodules
ppanggolin.formats.readBinaries module
- ppanggolin.formats.readBinaries.checkPangenomeInfo(pangenome, needAnnotations=False, needFamilies=False, needGraph=False, needPartitions=False, needRGP=False, needSpots=False)
Determines what must be read from the file depending on which elements are needed, and checks against pangenome.status that each required element has been computed.
- ppanggolin.formats.readBinaries.getGeneSequencesFromFile(pangenome, fileObj, list_CDS=None)
Writes the CDS sequences of the Pangenome object to a tmpFile object, optionally filtered by a list of CDS (list_CDS). The sequences are loaded from a .h5 pangenome file.
- ppanggolin.formats.readBinaries.getNumberOfOrganisms(pangenome)
Standalone function that returns the number of organisms in a pangenome.
- ppanggolin.formats.readBinaries.getStatus(pangenome, pangenomeFile)
Checks which elements are already present in the file.
- ppanggolin.formats.readBinaries.readOrganism(pangenome, orgName, contigDict, circularContigs, link=False)
- ppanggolin.formats.readBinaries.readPangenome(pangenome, annotation=False, geneFamilies=False, graph=False, rgp=False, spots=False)
Reads the requested parts of a previously written pangenome, according to what is recorded in the ‘status’ field of the HDF5 file.
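checkPangenomeInfo and readPangenome both decide what to load by comparing the requested elements with the pangenome's status. The following is a minimal, hypothetical sketch of that decision, not ppanggolin's code: the status keys, the status values "Computed", "Loaded" and "inFile", and the names plan_reads and REQUIREMENTS are all assumptions made for illustration.

```python
# Hypothetical mapping from a requested element to the status key that
# tracks it; the real keys used by ppanggolin may differ.
REQUIREMENTS = {
    "annotations": "genomesAnnotated",
    "families": "genesClustered",
    "graph": "neighborsGraph",
    "partitions": "partitionned",
    "rgp": "predictedRGP",
    "spots": "spots",
}


def plan_reads(status: dict[str, str], needs: dict[str, bool]) -> dict[str, bool]:
    """Decide, for each requested element, whether it must be read from the file.

    Assumed status values: "Computed"/"Loaded" (already in memory),
    "inFile" (stored in the .h5 file), anything else (not available).
    """
    to_read = {}
    for element, needed in needs.items():
        key = REQUIREMENTS[element]
        state = status.get(key, "No")
        if not needed or state in ("Computed", "Loaded"):
            to_read[element] = False  # not needed, or already in memory
        elif state == "inFile":
            to_read[element] = True  # needed and available on disk
        else:
            raise ValueError(f"{element} is required but has not been computed "
                             f"(status[{key!r}] = {state!r})")
    return to_read


if __name__ == "__main__":
    status = {"genomesAnnotated": "inFile", "genesClustered": "Loaded", "neighborsGraph": "No"}
    print(plan_reads(status, {"annotations": True, "families": True}))
    # {'annotations': True, 'families': False}
```

Raising early, before anything is read, means a missing prerequisite is reported once instead of surfacing as a half-loaded pangenome.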
ppanggolin.formats.writeBinaries module
- ppanggolin.formats.writeBinaries.ErasePangenome(pangenome, graph=False, geneFamilies=False, partition=False, rgp=False, spots=False)
Erases tables from a pangenome .h5 file.
- ppanggolin.formats.writeBinaries.geneDesc(orgLen, contigLen, IDLen, typeLen, nameLen, productLen, maxLocalId)
- ppanggolin.formats.writeBinaries.updateGeneFragments(pangenome, h5f)
Updates the annotation table with the fragmentation information from the defrag pipeline.
- ppanggolin.formats.writeBinaries.writeAnnotations(pangenome, h5f)
Writes all of the pangenome’s annotations.
- ppanggolin.formats.writeBinaries.writeGeneFamInfo(pangenome, h5f, force)
Writes a table containing the protein sequences of each family.
- ppanggolin.formats.writeBinaries.writeGeneFamilies(pangenome, h5f, force)
Writes all of the pangenome’s gene families.
- ppanggolin.formats.writeBinaries.writeInfo(pangenome, h5f)
Writes information and counts that can later be retrieved with the ‘info’ submodule.
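geneDesc takes maximum lengths (orgLen, contigLen, IDLen, and so on) as arguments, which suggests that the annotation table stores strings in fixed-width columns whose sizes must be known before the table is created. Below is a small standard-library sketch of that idea applied to a hypothetical gene-to-family table; the {family: [gene IDs]} input, the two-column layout and the names family_rows and column_widths are assumptions, not the layout written by writeGeneFamilies.

```python
def family_rows(families: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Flatten {family_name: [gene_id, ...]} into (gene_id, family_name) rows."""
    return [(gene, fam) for fam, genes in families.items() for gene in genes]


def column_widths(rows: list[tuple[str, str]]) -> dict[str, int]:
    """Return the byte width each fixed-size string column would need."""
    widths = {"gene": 1, "family": 1}  # keep a minimum width of 1 even for an empty table
    for gene, fam in rows:
        # Measure encoded bytes rather than characters: fixed-width
        # byte-string columns are sized in bytes.
        widths["gene"] = max(widths["gene"], len(gene.encode("utf-8")))
        widths["family"] = max(widths["family"], len(fam.encode("utf-8")))
    return widths


if __name__ == "__main__":
    families = {"fam_A": ["gene_1", "gene_22"], "fam_B": ["gene_3"]}
    rows = family_rows(families)
    print(rows)                 # [('gene_1', 'fam_A'), ('gene_22', 'fam_A'), ('gene_3', 'fam_B')]
    print(column_widths(rows))  # {'gene': 7, 'family': 5}
```

Computing the widths in a first pass over the data, then creating the table, avoids truncating long identifiers at write time.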
ppanggolin.formats.writeFlat module
- ppanggolin.formats.writeFlat.writeFlatFiles(pangenome, output, cpu=1, soft_core=0.95, dup_margin=0.05, csv=False, genePA=False, gexf=False, light_gexf=False, projection=False, stats=False, json=False, partitions=False, regions=False, families_tsv=False, all_genes=False, all_prot_families=False, all_gene_families=False, spots=False, borders=False, compress=False)
- ppanggolin.formats.writeFlat.writeGeneSequencesFromAnnotations(pangenome, fileObj, verbose=False)
Writes the CDS sequences of the Pangenome object to a tmpFile object. The sequences are taken from previously computed or loaded annotations.
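writeGeneSequencesFromAnnotations and getGeneSequencesFromFile both write CDS sequences to a file object, the latter optionally restricted by list_CDS. The sketch below shows that step under stated assumptions: the output is FASTA wrapped at 60 characters, the input is an in-memory {CDS ID: sequence} dict, write_cds_fasta is a hypothetical helper, and io.StringIO stands in for the temporary file.

```python
import io
from typing import Iterable, Optional, TextIO


def write_cds_fasta(cds: dict[str, str], file_obj: TextIO,
                    keep: Optional[Iterable[str]] = None, width: int = 60) -> int:
    """Write CDS sequences to file_obj in FASTA format (assumed format).

    If keep is given, only the CDS IDs it contains are written.
    Returns the number of records written.
    """
    wanted = None if keep is None else set(keep)  # set for fast membership tests
    written = 0
    for cds_id, seq in cds.items():
        if wanted is not None and cds_id not in wanted:
            continue
        file_obj.write(f">{cds_id}\n")
        # Wrap the sequence over lines of at most `width` characters.
        for start in range(0, len(seq), width):
            file_obj.write(seq[start:start + width] + "\n")
        written += 1
    return written


if __name__ == "__main__":
    buf = io.StringIO()  # stands in for the temporary file
    n = write_cds_fasta({"g1": "ATGAAATAG", "g2": "ATGCCCTGA"}, buf, keep=["g2"])
    print(n)                     # 1
    print(repr(buf.getvalue()))  # '>g2\nATGCCCTGA\n'
```

Accepting any file-like object keeps the writer independent of where the sequences end up, whether a temporary file, a real file or an in-memory buffer.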