The formats package

The other packages use this package to read the pangenome from, and write it to, the HDF5 file.

Submodules

ppanggolin.formats.readBinaries module

ppanggolin.formats.readBinaries.checkPangenomeInfo(pangenome, needAnnotations=False, needFamilies=False, needGraph=False, needPartitions=False, needRGP=False, needSpots=False)[source]: Determines what has to be read from the file based on the need* flags, and checks against pangenome.status that the required elements have been computed.
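The check described above can be pictured with a small, self-contained sketch. This is not PPanGGOLiN's implementation: the function name plan_reads, the element names and the state strings ("No", "inFile", "Loaded") are all assumptions chosen for illustration. It only shows the general idea of comparing what a caller needs with what a status mapping says is available.

```python
def plan_reads(status, needs):
    """Return the elements that would have to be loaded from the file.

    `status` maps an element name to a state string and `needs` maps an
    element name to a bool. The state strings are illustrative assumptions.
    """
    to_read = []
    for element, needed in needs.items():
        if not needed:
            continue
        state = status.get(element, "No")
        if state == "No":
            # A required element that was never computed cannot be read.
            raise ValueError(f"'{element}' is required but has not been computed")
        if state == "inFile":
            # Present on disk but not yet in memory: schedule a read.
            to_read.append(element)
        # Any other state is treated as "already in memory": nothing to do.
    return to_read


status = {"annotations": "inFile", "families": "Loaded", "graph": "No"}
plan = plan_reads(status, {"annotations": True, "families": True, "graph": False})
print(plan)
```

In this sketch, asking for an element whose state is "No" fails early instead of producing a partially loaded object.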

ppanggolin.formats.readBinaries.getGeneSequencesFromFile(pangenome, fileObj, list_CDS=None)[source]: Writes the CDS sequences of the Pangenome object to a tmpFile object, optionally filtered by a list of CDS. The sequences are loaded from a .h5 pangenome file.
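As a rough illustration of the optional filtering, the sketch below writes sequences to a file-like object and keeps only the identifiers in an optional list. It uses an in-memory list of (identifier, sequence) pairs instead of a .h5 file, and both the helper name write_sequences and the FASTA-style output are assumptions, not the package's actual behaviour.

```python
import io


def write_sequences(sequences, file_obj, wanted=None):
    """Write (identifier, sequence) pairs to `file_obj` as FASTA-style records.

    When `wanted` is None every record is written; otherwise only the
    identifiers listed in `wanted` are kept. Returns the number written.
    """
    keep = None if wanted is None else set(wanted)
    written = 0
    for gene_id, seq in sequences:
        if keep is None or gene_id in keep:
            file_obj.write(f">{gene_id}\n{seq}\n")
            written += 1
    return written


records = [("cds1", "ATGAAATAA"), ("cds2", "ATGCCCTAA"), ("cds3", "ATGGGGTAA")]

everything = io.StringIO()
n_all = write_sequences(records, everything)

filtered = io.StringIO()
n_filtered = write_sequences(records, filtered, wanted=["cds2"])
print(filtered.getvalue())
```

Converting the list to a set keeps the membership test cheap when the list of CDS is long.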

ppanggolin.formats.readBinaries.getNumberOfOrganisms(pangenome)[source]: Standalone function to get the number of organisms in a pangenome.

ppanggolin.formats.readBinaries.getStatus(pangenome, pangenomeFile)[source]: Checks which elements are already present in the file.

ppanggolin.formats.readBinaries.launchReadOrganism(args)[source]

ppanggolin.formats.readBinaries.readAnnotation(pangenome, h5f, filename)[source]

ppanggolin.formats.readBinaries.readGeneFamilies(pangenome, h5f)[source]

ppanggolin.formats.readBinaries.readGeneFamiliesInfo(pangenome, h5f)[source]

ppanggolin.formats.readBinaries.readGraph(pangenome, h5f)[source]

ppanggolin.formats.readBinaries.readInfo(h5f)[source]

ppanggolin.formats.readBinaries.readOrganism(pangenome, orgName, contigDict, circularContigs, link=False)[source]

ppanggolin.formats.readBinaries.readPangenome(pangenome, annotation=False, geneFamilies=False, graph=False, rgp=False, spots=False)[source]: Reads a previously written pangenome, loading the parts that are requested, according to what the ‘status’ field of the HDF5 file reports as filled in.

ppanggolin.formats.readBinaries.readParameters(h5f)[source]

ppanggolin.formats.readBinaries.readRGP(pangenome, h5f)[source]

ppanggolin.formats.readBinaries.readSpots(pangenome, h5f)[source]

ppanggolin.formats.readBinaries.read_chunks(table, column=None, chunk=10000)[source]: Reads the entire provided table (or a single column, if specified) chunk by chunk to limit RAM usage.
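The chunk-by-chunk idea can be sketched without any HDF5 dependency. Here a plain list of dictionaries stands in for the table, and the generator read_in_chunks is a hypothetical name; the real function operates on an HDF5 table and may differ in its details.

```python
def read_in_chunks(table, column=None, chunk=10000):
    """Yield every row of `table` (or one field per row) in slices of `chunk` rows."""
    for start in range(0, len(table), chunk):
        # Only this slice is materialised at a time.
        for row in table[start:start + chunk]:
            yield row if column is None else row[column]


# A list of dicts stands in for a table with two columns.
table = [{"gene": f"g{i}", "length": 3 * i} for i in range(25)]

all_rows = list(read_in_chunks(table, chunk=10))
lengths = list(read_in_chunks(table, column="length", chunk=10))
print(len(all_rows), lengths[:3])
```

With a disk-backed table, the slice is the only part held in memory at once, which is where the RAM saving comes from; with an in-memory list, as here, the sketch only demonstrates the iteration pattern.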

ppanggolin.formats.writeBinaries module

ppanggolin.formats.writeBinaries.ErasePangenome(pangenome, graph=False, geneFamilies=False, partition=False, rgp=False, spots=False)[source]: Erases tables from a pangenome .h5 file.

ppanggolin.formats.writeBinaries.RGPDesc(maxRGPLen, maxGeneLen)[source]

ppanggolin.formats.writeBinaries.gene2famDesc(geneFamNameLen, geneIDLen)[source]

ppanggolin.formats.writeBinaries.geneDesc(orgLen, contigLen, IDLen, typeLen, nameLen, productLen, maxLocalId)[source]

ppanggolin.formats.writeBinaries.geneFamDesc(maxNameLen, maxSequenceLength, maxPartLen)[source]

ppanggolin.formats.writeBinaries.geneSequenceDesc(geneIDLen, geneSeqLen, geneTypeLen)[source]

ppanggolin.formats.writeBinaries.getGene2famLen(pangenome)[source]

ppanggolin.formats.writeBinaries.getGeneFamLen(pangenome)[source]

ppanggolin.formats.writeBinaries.getGeneIDLen(pangenome)[source]

ppanggolin.formats.writeBinaries.getGeneSequencesLen(pangenome)[source]

ppanggolin.formats.writeBinaries.getMaxLenAnnotations(pangenome)[source]

ppanggolin.formats.writeBinaries.getRGPLen(pangenome)[source]

ppanggolin.formats.writeBinaries.getSpotDesc(pangenome)[source]

ppanggolin.formats.writeBinaries.graphDesc(maxGeneIDLen)[source]

ppanggolin.formats.writeBinaries.spotDesc(maxRGPLen)[source]

ppanggolin.formats.writeBinaries.updateGeneFamPartition(pangenome, h5f)[source]

ppanggolin.formats.writeBinaries.updateGeneFragments(pangenome, h5f)[source]: Updates the annotation table with the fragmentation information from the defrag pipeline.

ppanggolin.formats.writeBinaries.writeAnnotations(pangenome, h5f)[source]: Writes all of the pangenome’s annotations.

ppanggolin.formats.writeBinaries.writeGeneFamInfo(pangenome, h5f, force)[source]: Writes a table containing the protein sequences of each family.

ppanggolin.formats.writeBinaries.writeGeneFamilies(pangenome, h5f, force)[source]: Writes all of the pangenome’s gene families.

ppanggolin.formats.writeBinaries.writeGeneSequences(pangenome, h5f)[source]

ppanggolin.formats.writeBinaries.writeGraph(pangenome, h5f, force)[source]

ppanggolin.formats.writeBinaries.writeInfo(pangenome, h5f)[source]: Writes information and numbers that the ‘info’ submodule can later retrieve.

ppanggolin.formats.writeBinaries.writePangenome(pangenome, filename, force)[source]: Writes or updates a pangenome file. pangenome is the corresponding pangenome object, filename is the h5 file, and the status indicates what has been modified.

ppanggolin.formats.writeBinaries.writeRGP(pangenome, h5f, force)[source]

ppanggolin.formats.writeBinaries.writeSpots(pangenome, h5f, force)[source]

ppanggolin.formats.writeBinaries.writeStatus(pangenome, h5f)[source]

ppanggolin.formats.writeFlat module

ppanggolin.formats.writeFlat.launch(args)[source]

ppanggolin.formats.writeFlat.spot2rgp(spots, output, compress)[source]

ppanggolin.formats.writeFlat.summarize_spots(spots, output, compress)[source]

ppanggolin.formats.writeFlat.writeBorders(output, dup_margin, compress)[source]

ppanggolin.formats.writeFlat.writeFastaGenFam(output, compress=False)[source]

ppanggolin.formats.writeFlat.writeFastaProtFam(output, compress=False)[source]

ppanggolin.formats.writeFlat.writeFlatFiles(pangenome, output, cpu=1, soft_core=0.95, dup_margin=0.05, csv=False, genePA=False, gexf=False, light_gexf=False, projection=False, stats=False, json=False, partitions=False, regions=False, families_tsv=False, all_genes=False, all_prot_families=False, all_gene_families=False, spots=False, borders=False, compress=False)[source]

ppanggolin.formats.writeFlat.writeFlatSubparser(subparser)[source]

ppanggolin.formats.writeFlat.writeGEXF(output, light=True, soft_core=0.95, compress=False)[source]

ppanggolin.formats.writeFlat.writeGEXFedges(gexf, light)[source]

ppanggolin.formats.writeFlat.writeGEXFend(gexf)[source]

ppanggolin.formats.writeFlat.writeGEXFheader(gexf, light)[source]

ppanggolin.formats.writeFlat.writeGEXFnodes(gexf, light, soft_core=0.95)[source]

ppanggolin.formats.writeFlat.writeGeneFamiliesTSV(output, compress=False)[source]

ppanggolin.formats.writeFlat.writeGenePresenceAbsence(output, compress=False)[source]

ppanggolin.formats.writeFlat.writeGeneSequences(output, compress=False)[source]

ppanggolin.formats.writeFlat.writeGeneSequencesFromAnnotations(pangenome, fileObj, verbose=False)[source]: Writes the CDS sequences of the Pangenome object to a tmpFile object. The sequences are loaded from previously computed or loaded annotations.

ppanggolin.formats.writeFlat.writeJSON(output, compress)[source]

ppanggolin.formats.writeFlat.writeJSONGeneFam(geneFam, json)[source]

ppanggolin.formats.writeFlat.writeJSONedge(edge, json)[source]

ppanggolin.formats.writeFlat.writeJSONedges(json)[source]

ppanggolin.formats.writeFlat.writeJSONheader(json)[source]

ppanggolin.formats.writeFlat.writeJSONnodes(json)[source]

ppanggolin.formats.writeFlat.writeMatrix(sep, ext, output, compress=False, geneNames=False)[source]

ppanggolin.formats.writeFlat.writeOrgFile(org, output, compress=False)[source]

ppanggolin.formats.writeFlat.writeParts(output, soft_core, compress=False)[source]

ppanggolin.formats.writeFlat.writeProjections(output, compress=False)[source]

ppanggolin.formats.writeFlat.writeRegions(output, compress=False)[source]

ppanggolin.formats.writeFlat.writeSpots(output, compress)[source]

ppanggolin.formats.writeFlat.writeStats(output, soft_core, dup_margin, compress=False)[source]