The cluster package

This package builds gene families, or reads gene families from user input. It mainly uses MMseqs2 for the computation.
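As a rough illustration of how clustering parameters might be handed to MMseqs2, the sketch below assembles an `mmseqs cluster` command line as a Python list. The helper name `build_mmseqs_cluster_cmd` is hypothetical, and the mapping of `coverage`, `identity` and `mode` onto the MMseqs2 options `-c`, `--min-seq-id` and `--cluster-mode` is an assumption, not a statement of what this package does internally. The default values are those listed for `clustering` on this page.

```python
def build_mmseqs_cluster_cmd(seq_db, clu_db, tmpdir, cpu,
                             coverage=0.8, identity=0.8, mode="1"):
    """Assemble an `mmseqs cluster` command line (hypothetical helper).

    seq_db is an existing MMseqs2 sequence database (e.g. made with
    `mmseqs createdb`), clu_db is the output cluster database.
    """
    return [
        "mmseqs", "cluster", seq_db, clu_db, tmpdir,
        "--min-seq-id", str(identity),  # assumed mapping of `identity`
        "-c", str(coverage),            # assumed mapping of `coverage`
        "--threads", str(cpu),
        "--cluster-mode", str(mode),    # assumed mapping of `mode`
    ]


cmd = build_mmseqs_cluster_cmd("seqdb", "cludb", "tmp", cpu=4)
# Running it requires MMseqs2 on the PATH, for instance:
# subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting issues if it is later passed to `subprocess.run`.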

Submodules

ppanggolin.cluster.cluster module

ppanggolin.cluster.cluster.alignRep(faaFile, tmpdir, cpu, coverage, identity)
ppanggolin.cluster.cluster.checkPangenomeForClustering(pangenome, tmpFile, force)

Checks the pangenome status and writes the gene sequences to the provided tmpFile, whether they are stored in the .h5 file or currently held in memory.

ppanggolin.cluster.cluster.checkPangenomeFormerClustering(pangenome, force)

Checks the pangenome status and .h5 files for former clusterings, then deletes them if allowed or raises an error.

ppanggolin.cluster.cluster.clusterSubparser(subparser)
ppanggolin.cluster.cluster.clustering(pangenome, tmpdir, cpu, defrag=True, code='11', coverage=0.8, identity=0.8, mode='1', force=False)
ppanggolin.cluster.cluster.firstClustering(sequences, tmpdir, cpu, code, coverage, identity, mode)
ppanggolin.cluster.cluster.inferSingletons(pangenome)

Creates a new family for each gene with no associated family.
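The idea can be sketched without the PPanGGOLiN classes. The snippet below is a minimal illustration, assuming genes are plain string identifiers and the gene-to-family assignment is a dictionary. The helper `infer_singletons_sketch` and the choice to name each new family after its only gene are hypothetical and not taken from the library.

```python
def infer_singletons_sketch(gene_ids, gene2family):
    """Return a copy of gene2family in which every unassigned gene
    gets a family of its own (hypothetical helper, plain dicts only)."""
    completed = dict(gene2family)
    for gene_id in gene_ids:
        if gene_id not in completed:
            # Assumption: the singleton family is named after its gene.
            completed[gene_id] = gene_id
    return completed


all_genes = ["g1", "g2", "g3"]
known_families = {"g1": "famA", "g2": "famA"}
with_singletons = infer_singletons_sketch(all_genes, known_families)
```

Here `g3` has no family in `known_families`, so it ends up alone in a new family, while the existing assignments are left untouched.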

ppanggolin.cluster.cluster.launch(args)

Launches the clustering step.

ppanggolin.cluster.cluster.mkLocal2Gene(pangenome)

Creates a dictionary that stores local identifiers, provided that they exist and are all unique.
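A minimal sketch of this uniqueness check, assuming genes are given as `(gene_id, local_identifier)` pairs rather than pangenome objects. The helper name `make_local_to_gene_sketch` is hypothetical, and returning an empty dictionary when an identifier is missing or duplicated is an assumption about the behaviour, not a description of the library's code.

```python
def make_local_to_gene_sketch(genes):
    """Map local identifiers to gene IDs, or return {} if unusable.

    genes: iterable of (gene_id, local_identifier) pairs; the local
    identifier may be None or "" when it does not exist.
    """
    local_to_gene = {}
    for gene_id, local_id in genes:
        if not local_id or local_id in local_to_gene:
            # Missing or duplicated identifier: give up on the mapping.
            return {}
        local_to_gene[local_id] = gene_id
    return local_to_gene


unique = make_local_to_gene_sketch([("G1", "locA"), ("G2", "locB")])
duplicated = make_local_to_gene_sketch([("G1", "locA"), ("G2", "locA")])
```

With unique identifiers the full mapping is returned. As soon as one identifier repeats, the mapping is ambiguous and is discarded as a whole.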

ppanggolin.cluster.cluster.readClustering(pangenome, families_tsv_file, infer_singletons=False, force=False)

Creates the pangenome, the gene families and the genes with an associated gene family. Reads a families TSV file from MMseqs2 output and adds the gene families and the genes to the pangenome.
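To make the input concrete, here is a hedged sketch of parsing such a file into a family-to-genes mapping. It assumes a two-column, tab-separated layout with the family identifier first and the gene identifier second; that layout and the helper `read_families_tsv_sketch` are assumptions for illustration and do not reproduce the library's reader.

```python
import csv
import io
from collections import defaultdict


def read_families_tsv_sketch(handle):
    """Group gene IDs by family ID from a tab-separated file handle
    (hypothetical helper; assumed columns: family, gene)."""
    families = defaultdict(list)
    for row in csv.reader(handle, delimiter="\t"):
        if len(row) < 2:
            continue  # skip blank or incomplete lines
        family_id, gene_id = row[0], row[1]
        families[family_id].append(gene_id)
    return dict(families)


# In-memory stand-in for a families TSV file.
example_tsv = io.StringIO("famA\tg1\nfamA\tg2\nfamB\tg3\n")
parsed = read_families_tsv_sketch(example_tsv)
```

Genes that never appear in the file would remain without a family, which is the situation the `infer_singletons` option is meant to address.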

ppanggolin.cluster.cluster.read_faa(faFileName)
ppanggolin.cluster.cluster.read_fam2seq(pangenome, fam2seq)
ppanggolin.cluster.cluster.read_gene2fam(pangenome, gene2fam)
ppanggolin.cluster.cluster.read_tsv(tsvfileName)
ppanggolin.cluster.cluster.refineClustering(tsv, alnFile, fam2seq)