The annotate package

This package either annotates genomes or reads existing annotations from gbff or gff files.

It depends on the following subpackage:

formats, to write the pangenome to the HDF-5 file.

It depends on the following modules:

pangenome
genome
utils

Submodules

ppanggolin.annotate.annotate module

ppanggolin.annotate.annotate.annotatePangenome(pangenome, fastaList, tmpdir, cpu, translation_table='11', kingdom='bacteria', norna=False, overlap=True)

ppanggolin.annotate.annotate.create_gene(org, contig, geneCounter, rnaCounter, ID, dbxref, start, stop, strand, gene_type, position=None, gene_name='', product='', genetic_code=11, protein_id='')

ppanggolin.annotate.annotate.detect_filetype(filename): Detects whether the given file is gff3, gbk/gbff or unknown. If the type is unknown, an error is raised.
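As an illustration of this kind of detection, here is a minimal sketch that guesses the format from the first line of a file. It is not the package's implementation: the helper name `sniff_annotation_format` and the returned labels are hypothetical, and the check relies only on the general conventions that GenBank flat files begin with a `LOCUS` line and GFF3 files begin with a `##gff-version 3` pragma.

```python
def sniff_annotation_format(first_line: str) -> str:
    """Guess an annotation format from a file's first line (hypothetical helper)."""
    # GenBank flat files (gbk/gbff) open each record with a LOCUS line.
    if first_line.startswith("LOCUS"):
        return "gbff"
    # GFF3 files start with a version pragma.
    if first_line.startswith("##gff-version 3"):
        return "gff"
    # Anything else is treated as unknown, which is reported as an error.
    raise ValueError("unrecognized annotation format")
```

A caller would typically read only the first line of the file and pass it to the helper, so that large files are not loaded just to identify them.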

ppanggolin.annotate.annotate.getGeneSequencesFromFastas(pangenome, fasta_file)

ppanggolin.annotate.annotate.launch(args)

ppanggolin.annotate.annotate.launchAnnotateOrganism(pack)

ppanggolin.annotate.annotate.launchReadAnno(args)

ppanggolin.annotate.annotate.readAnnoFile(organism_name, filename, circular_contigs, getSeq, pseudo)

ppanggolin.annotate.annotate.readAnnotations(pangenome, organisms_file, cpu, getSeq=True, pseudo=False)

ppanggolin.annotate.annotate.read_org_gbff(organism, gbff_file_path, circular_contigs, getSeq, pseudo=False): Reads a gbff file and fills Organism, Contig and Gene objects based on the information contained in this file.

ppanggolin.annotate.annotate.read_org_gff(organism, gff_file_path, circular_contigs, getSeq, pseudo=False)

ppanggolin.annotate.annotate.syntaSubparser(subparser)

ppanggolin.annotate.synta module

ppanggolin.annotate.synta.annotate_organism(orgName, fileName, circular_contigs, code, kingdom, norna, tmpdir, overlap): Annotates a single organism.

ppanggolin.annotate.synta.get_dna_sequence(contigSeq, gene)

ppanggolin.annotate.synta.launch_aragorn(fnaFile, org): Launches Aragorn to annotate tRNAs. Takes an fna file name and a locus tag used to give an ID to the genes found. Returns the annotated genes as a list of gene objects.

ppanggolin.annotate.synta.launch_infernal(fnaFile, org, kingdom, tmpdir): Launches Infernal in hmmer-only mode to annotate rRNAs. Takes an fna file name and a locus tag used to give an ID to the genes found. Returns the annotated genes as a list of gene objects.

ppanggolin.annotate.synta.launch_prodigal(fnaFile, org, code): Launches Prodigal to annotate CDS. Takes an fna file name and a locus tag used to give an ID to the genes found. Returns the annotated genes as a list of gene objects.

ppanggolin.annotate.synta.overlap_filter(allGenes, contigs, overlap): Removes the CDS that overlap with RNA genes.
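The filtering idea can be sketched on plain intervals. This is an assumption-laden illustration rather than the package's code: the `Feature` tuple and the function `drop_cds_overlapping_rna` are hypothetical, coordinates are assumed to be inclusive and on a single contig, and the `overlap` option and per-contig grouping of the real function are not modelled.

```python
from typing import List, NamedTuple


class Feature(NamedTuple):
    """Hypothetical stand-in for a gene object."""
    kind: str   # "CDS" or "RNA"
    start: int  # inclusive
    stop: int   # inclusive


def drop_cds_overlapping_rna(features: List[Feature]) -> List[Feature]:
    """Keep every RNA, and keep only the CDS that overlap no RNA."""
    rnas = [f for f in features if f.kind == "RNA"]
    kept = []
    for f in features:
        # Two inclusive intervals overlap when each starts before the other ends.
        if f.kind == "CDS" and any(
            f.start <= r.stop and r.start <= f.stop for r in rnas
        ):
            continue
        kept.append(f)
    return kept
```

The pairwise comparison is quadratic in the worst case; for large contigs, sorting features by start position would allow a single linear sweep instead.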

ppanggolin.annotate.synta.read_fasta(org, fnaFile): Reads an fna file (or stream, or string) and stores it in a dictionary with contigs as keys and sequences as values.
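The resulting data structure can be illustrated with a small FASTA parser. This is only a sketch: the function `parse_fasta` is hypothetical, it uses contig names (the first word of each header) as keys where the package presumably uses contig objects tied to `org`, and it accepts any iterable of lines.

```python
from typing import Dict, Iterable, List, Optional


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Map each contig name to its full sequence (hypothetical helper)."""
    contigs: Dict[str, str] = {}
    name: Optional[str] = None
    chunks: List[str] = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if name is not None:
                contigs[name] = "".join(chunks)
            header = line[1:].split()
            name = header[0] if header else ""
            chunks = []
        elif name is not None:
            # Sequence lines are accumulated until the next header.
            chunks.append(line)
    if name is not None:
        contigs[name] = "".join(chunks)
    return contigs
```

An open file handle can be passed directly, and a string can be passed as `text.splitlines()`.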

ppanggolin.annotate.synta.reverse_complement(seq): Returns the reverse complement of the given DNA sequence.
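For readers unfamiliar with the operation, here is a minimal sketch of reverse complementation. The name `reverse_complement_sketch` is hypothetical, and the handling of `N` and lowercase bases is an assumption, not a statement about the package's behaviour.

```python
# Translation table mapping each base to its complement (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement_sketch(seq: str) -> str:
    """Complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]
```

Characters outside the table are left unchanged by `str.translate`, so ambiguous IUPAC codes other than `N` would pass through uncomplemented in this sketch.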

ppanggolin.annotate.synta.syntaxic_annotation(org, fastaFile, norna, kingdom, code, tmpdir)

Runs the different programs used for the syntactic annotation.

Takes the file-like object containing the uncompressed fasta sequences to annotate, the number of CPUs that can be used, whether to annotate RNA or not, and the locus tag used to give gene IDs.

ppanggolin.annotate.synta.write_tmp_fasta(contigs, tmpdir): Writes a temporary fna-formatted file and returns the file-like object. This covers the case where the given file is compressed: a temporary uncompressed file is written for the annotation tools to read from. The file is deleted when close() is called.
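The delete-on-close behaviour described here matches what Python's `tempfile.NamedTemporaryFile` provides by default. The sketch below shows that pattern under stated assumptions: the function name `write_tmp_fasta_sketch` is hypothetical, `contigs` is assumed to map contig names to sequence strings, and the 60-character line width is an arbitrary choice.

```python
import tempfile


def write_tmp_fasta_sketch(contigs: dict, tmpdir: str):
    """Write contigs to a temporary fna file that is removed on close()."""
    # delete=True is the default: the file disappears when it is closed.
    tmp = tempfile.NamedTemporaryFile(mode="w", dir=tmpdir, suffix=".fna")
    for name, seq in contigs.items():
        tmp.write(f">{name}\n")
        # Wrap the sequence at 60 characters per line (assumed width).
        for i in range(0, len(seq), 60):
            tmp.write(seq[i:i + 60] + "\n")
    # Flush so external tools reading tmp.name see the full content.
    tmp.flush()
    return tmp
```

The caller passes `tmp.name` to the annotation tools and calls `tmp.close()` once they have finished, which removes the file.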