The annotate package
This package either annotates genomes from FASTA files or reads existing annotations from gbff or gff files.
It depends on the following subpackage:
formats, to write the pangenome to the HDF-5 file.
It depends on the following modules:
pangenome
genome
utils
Submodules
ppanggolin.annotate.annotate module
- ppanggolin.annotate.annotate.annotatePangenome(pangenome, fastaList, tmpdir, cpu, translation_table='11', kingdom='bacteria', norna=False, overlap=True)
- ppanggolin.annotate.annotate.create_gene(org, contig, geneCounter, rnaCounter, ID, dbxref, start, stop, strand, gene_type, position=None, gene_name='', product='', genetic_code=11, protein_id='')
- ppanggolin.annotate.annotate.detect_filetype(filename)
Detects whether the given file is gff3, gbk/gbff or unknown. If the type is unknown, an error is raised.
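The sketch below illustrates one way such a detection could work. It is not the package's implementation: the function name `sketch_detect_filetype`, the returned labels and the `ValueError` are assumptions made for this example. It relies only on two format conventions: GenBank flat files begin with a `LOCUS` line, and GFF3 files begin with a `##gff-version 3` directive.

```python
def sketch_detect_filetype(filename):
    """Hypothetical sketch: classify an annotation file from its first non-empty line."""
    first_line = ""
    with open(filename, "r") as handle:
        for line in handle:
            if line.strip():
                first_line = line
                break
    # GenBank flat files (gbk/gbff) start each record with a LOCUS line.
    if first_line.startswith("LOCUS "):
        return "gbff"
    # GFF3 files start with a version directive.
    if first_line.startswith("##gff-version 3"):
        return "gff"
    # Assumption: an unknown format is reported with ValueError.
    raise ValueError(f"Unrecognized annotation file format: {filename}")
```

A caller would then dispatch to the matching reader depending on the returned label.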
- ppanggolin.annotate.annotate.readAnnoFile(organism_name, filename, circular_contigs, getSeq, pseudo)
- ppanggolin.annotate.annotate.readAnnotations(pangenome, organisms_file, cpu, getSeq=True, pseudo=False)
- ppanggolin.annotate.annotate.read_org_gbff(organism, gbff_file_path, circular_contigs, getSeq, pseudo=False)
Reads a gbff file and fills the Organism, Contig and Gene objects based on the information contained in this file.
ppanggolin.annotate.synta module
- ppanggolin.annotate.synta.annotate_organism(orgName, fileName, circular_contigs, code, kingdom, norna, tmpdir, overlap)
Annotates a single organism.
- ppanggolin.annotate.synta.launch_aragorn(fnaFile, org)
Launches Aragorn to annotate tRNAs. Takes an fna file name and a locus tag used to give an ID to the genes found. Returns the annotated genes as a list of gene objects.
- ppanggolin.annotate.synta.launch_infernal(fnaFile, org, kingdom, tmpdir)
Launches Infernal in hmmer-only mode to annotate rRNAs. Takes an fna file name and a locus tag used to give an ID to the genes found. Returns the annotated genes as a list of gene objects.
- ppanggolin.annotate.synta.launch_prodigal(fnaFile, org, code)
Launches Prodigal to annotate CDS. Takes an fna file name and a locus tag used to give an ID to the genes found. Returns the annotated genes as a list of gene objects.
- ppanggolin.annotate.synta.overlap_filter(allGenes, contigs, overlap)
Removes the CDS that overlap with RNA genes.
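As an illustration of the idea only, the following sketch drops every CDS whose coordinates intersect an RNA feature on the same contig. The `Feature` type, the function name `drop_cds_overlapping_rna` and the use of inclusive 1-based coordinates are assumptions for this example; the package works on its own gene objects and its behaviour may differ in detail.

```python
from typing import List, NamedTuple


class Feature(NamedTuple):
    """Hypothetical minimal feature: inclusive coordinates and a type label."""
    start: int
    stop: int
    kind: str  # e.g. "CDS", "tRNA", "rRNA"


def drop_cds_overlapping_rna(features: List[Feature]) -> List[Feature]:
    """Return the features of one contig without the CDS that overlap an RNA."""
    rnas = [f for f in features if f.kind != "CDS"]
    kept = []
    for feat in features:
        # Two inclusive intervals overlap when each starts before the other ends.
        overlaps_rna = any(
            feat.start <= rna.stop and rna.start <= feat.stop for rna in rnas
        )
        if feat.kind == "CDS" and overlaps_rna:
            continue
        kept.append(feat)
    return kept
```

RNA features are always kept in this sketch; only CDS are candidates for removal.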
- ppanggolin.annotate.synta.read_fasta(org, fnaFile)
Reads an fna file (or stream, or string) and stores it in a dictionary with contigs as keys and sequences as values.
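A minimal sketch of this kind of parsing is shown below. It is an illustration, not the package's code: the name `sketch_read_fasta` is hypothetical, contig names are plain strings rather than the package's contig objects, and the input is assumed to be any iterable of lines (an open file, or an `io.StringIO` wrapping a string).

```python
def sketch_read_fasta(lines):
    """Hypothetical sketch: map each contig name to its full sequence."""
    contigs = {}
    name = None
    chunks = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                contigs[name] = "".join(chunks)
            header = line[1:].split()
            # Assumption: the contig name is the first word of the header.
            name = header[0] if header else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        contigs[name] = "".join(chunks)
    return contigs
```

Sequence lines are collected in a list and joined once per contig, which avoids repeated string concatenation on long contigs.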
- ppanggolin.annotate.synta.reverse_complement(seq)
Returns the reverse complement of the given DNA sequence.
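For readers unfamiliar with the operation, here is a short sketch. The name `sketch_reverse_complement` is hypothetical, and the handling of `N` and lowercase letters is an assumption of this example rather than documented behaviour of the package function.

```python
# Translation table mapping each base to its complement (N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sketch_reverse_complement(seq):
    """Hypothetical sketch: complement every base, then reverse the string."""
    return seq.translate(_COMPLEMENT)[::-1]
```

Characters outside the table are left unchanged by `str.translate`.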
- ppanggolin.annotate.synta.syntaxic_annotation(org, fastaFile, norna, kingdom, code, tmpdir)
Runs the different software tools for the syntactic annotation.
Takes the file-like object containing the uncompressed FASTA sequences to annotate, the number of CPUs that can be used, whether to annotate RNA or not, and the locus tag used to give gene IDs.
- ppanggolin.annotate.synta.write_tmp_fasta(contigs, tmpdir)
Writes a temporary fna-formatted file and returns the file-like object. This is used when the given file is compressed: a temporary uncompressed file is written for the annotation tools to read from. The file is deleted when close() is called.
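The delete-on-close behaviour described above matches what Python's `tempfile.NamedTemporaryFile` provides by default. The sketch below shows that pattern under stated assumptions: the name `sketch_write_tmp_fasta` is hypothetical, `contigs` is taken to be a plain dictionary of name to sequence, and the 60-character line width is an arbitrary choice for the example.

```python
import tempfile


def sketch_write_tmp_fasta(contigs, tmpdir):
    """Hypothetical sketch: write contigs to a temporary fna file and return it."""
    # delete=True is the default: the file is removed when close() is called.
    tmp = tempfile.NamedTemporaryFile(mode="w", dir=tmpdir, suffix=".fna")
    for name, seq in contigs.items():
        tmp.write(f">{name}\n")
        # Wrap the sequence at 60 characters per line (arbitrary width).
        for i in range(0, len(seq), 60):
            tmp.write(seq[i:i + 60] + "\n")
    # Flush so that external tools reading tmp.name see the full content.
    tmp.flush()
    return tmp
```

The caller keeps the returned object open while the annotation tools read `tmp.name`, then calls `close()` to remove the file.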