The cluster package
This package builds gene families, or reads gene families from user input. It mainly uses MMseqs2 for the computation.
Submodules
ppanggolin.cluster.cluster module
- ppanggolin.cluster.cluster.checkPangenomeForClustering(pangenome, tmpFile, force)
Checks the pangenome statuses and writes the gene sequences to the provided tmpFile, whether they are stored in the .h5 file or currently in memory.
- ppanggolin.cluster.cluster.checkPangenomeFormerClustering(pangenome, force)
Checks the pangenome status and .h5 files for former clusterings; deletes them if allowed, or raises an error otherwise.
- ppanggolin.cluster.cluster.clustering(pangenome, tmpdir, cpu, defrag=True, code='11', coverage=0.8, identity=0.8, mode='1', force=False)
- ppanggolin.cluster.cluster.firstClustering(sequences, tmpdir, cpu, code, coverage, identity, mode)
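The source does not describe how these parameters reach MMseqs2. As a rough illustration only, the sketch below builds an `mmseqs cluster` command line from values named like the parameters above. The helper `build_mmseqs_cluster_cmd`, the database names, and the mapping of `identity`, `coverage`, `mode` and `cpu` to the MMseqs2 options `--min-seq-id`, `-c`, `--cluster-mode` and `--threads` are assumptions of this sketch, not the package's confirmed behaviour. The `code` parameter is left out.

```python
def build_mmseqs_cluster_cmd(seq_db, clu_db, tmpdir, cpu, coverage, identity, mode):
    """Return an argv list for `mmseqs cluster` (hypothetical helper).

    The parameter-to-flag mapping is an assumption made for illustration.
    """
    return [
        "mmseqs", "cluster", seq_db, clu_db, tmpdir,
        "--threads", str(cpu),          # number of CPU threads
        "--min-seq-id", str(identity),  # minimum sequence identity
        "-c", str(coverage),            # minimum alignment coverage
        "--cluster-mode", str(mode),    # MMseqs2 clustering mode
    ]


# Placeholder database and directory names; nothing is executed here.
cmd = build_mmseqs_cluster_cmd(
    "seqdb", "cludb", "tmp", cpu=4, coverage=0.8, identity=0.8, mode="1"
)
print(" ".join(cmd))
```

Building the command as a list rather than a single string keeps it ready for `subprocess.run` without shell quoting concerns.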
- ppanggolin.cluster.cluster.inferSingletons(pangenome)
Creates a new family for each gene with no associated family.
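The idea of inferring singletons can be sketched with plain Python containers instead of a pangenome object. The function `infer_singletons`, its arguments, and the choice to name each new family after its gene are assumptions of this sketch, not the package's actual implementation.

```python
def infer_singletons(gene_ids, gene_to_family):
    """Give every gene without a family its own single-member family.

    `gene_to_family` is updated in place; the number of families created
    is returned. Naming the family after the gene is an assumption.
    """
    created = 0
    for gene_id in gene_ids:
        if gene_id not in gene_to_family:
            gene_to_family[gene_id] = gene_id
            created += 1
    return created


genes = ["g1", "g2", "g3"]
families = {"g1": "famA"}  # only g1 has been clustered so far
n_new = infer_singletons(genes, families)
print(n_new, families)
```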
- ppanggolin.cluster.cluster.mkLocal2Gene(pangenome)
Creates a dictionary that stores local identifiers, provided they exist and are all unique.
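A minimal sketch of this "all unique and all present" check follows. It works on `(gene_id, local_id)` pairs rather than a pangenome object; the function name `make_local_to_gene` and the choice to return an empty dictionary when the identifiers are unusable are assumptions of this sketch.

```python
def make_local_to_gene(genes):
    """Map each local identifier to its gene identifier.

    `genes` is an iterable of (gene_id, local_id) pairs. If any local
    identifier is missing or duplicated, an empty dict is returned
    (an assumption about how failure is signalled).
    """
    local_to_gene = {}
    for gene_id, local_id in genes:
        if not local_id or local_id in local_to_gene:
            return {}
        local_to_gene[local_id] = gene_id
    return local_to_gene


unique = make_local_to_gene([("g1", "locA"), ("g2", "locB")])
duplicated = make_local_to_gene([("g1", "locA"), ("g2", "locA")])
missing = make_local_to_gene([("g1", "locA"), ("g2", None)])
print(unique, duplicated, missing)
```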
- ppanggolin.cluster.cluster.readClustering(pangenome, families_tsv_file, infer_singletons=False, force=False)
Creates the pangenome, the gene families and the genes with an associated gene family. Reads a families TSV file from MMseqs2 output and adds the gene families and the genes to the pangenome.
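As an illustration of reading such a file, the sketch below groups genes by family from tab-separated text. It assumes a two-column layout (family identifier, then gene identifier), which the source does not spell out; the function `read_families_tsv` and the sample identifiers are hypothetical.

```python
import csv
import io
from collections import defaultdict


def read_families_tsv(handle):
    """Group gene identifiers by family identifier.

    Assumes each line holds a family identifier and a gene identifier
    separated by a tab; any extra columns are ignored.
    """
    families = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or malformed lines
        family_id, gene_id = row[0], row[1]
        families[family_id].append(gene_id)
    return dict(families)


# In-memory stand-in for a families TSV file.
example = io.StringIO("famA\tgene1\nfamA\tgene2\nfamB\tgene3\n")
families = read_families_tsv(example)
print(families)
```

Genes that never appear in the file would have no family here, which is the situation the `infer_singletons` option is described as handling.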