Configuration API
Options to run MacSyFinder can be specified in a Configuration file. The API described below handles all configuration options for MacSyFinder. The Config object provides some default values, and performs some validations of the values.
Config API reference
-
class macsypy.config.Config(cfg_file='', sequence_db=None, db_type=None, replicon_topology=None, topology_file=None, inter_gene_max_space=None, min_mandatory_genes_required=None, min_genes_required=None, max_nb_genes=None, multi_loci=None, hmmer_exe=None, index_db_exe=None, e_value_res=None, i_evalue_sel=None, coverage_profile=None, def_dir=None, res_search_dir=None, res_search_suffix=None, profile_dir=None, profile_suffix=None, res_extract_suffix=None, out_dir=None, log_level=None, log_file=None, worker_nb=None, config_file=None, previous_run=None, build_indexes=None)
Parse configuration files and handle the configuration according to the following file location precedence, from lowest to highest priority: /etc/macsyfinder/macsyfinder.conf < ~/.macsyfinder/macsyfinder.conf < .macsyfinder.conf
If a configuration file is given on the command line, this file is used. Finally, the arguments passed on the command line have the highest priority.
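The precedence rule above can be illustrated with the standard-library configparser module, in which files read later override files read earlier. This is only a minimal sketch of the layering idea, not the code used by Config: the function name load_layered, the "base" section name and the option values are hypothetical.

```python
import configparser
import os
import tempfile


def load_layered(paths, cli_overrides=None, section="base"):
    """Merge config files in increasing priority, then apply command-line values.

    `section` is a hypothetical section name used only for this sketch.
    """
    parser = configparser.ConfigParser()
    # read() skips files that do not exist; later files override earlier ones.
    parser.read(paths)
    options = dict(parser[section]) if parser.has_section(section) else {}
    # Command-line values win over every file; None means "not given".
    for key, value in (cli_overrides or {}).items():
        if value is not None:
            options[key] = str(value)
    return options


def demo():
    """Build two throwaway files and merge them with command-line overrides."""
    with tempfile.TemporaryDirectory() as tmp:
        system_wide = os.path.join(tmp, "system.conf")
        local = os.path.join(tmp, "local.conf")
        with open(system_wide, "w") as fh:
            fh.write("[base]\nworker_nb = 1\nlog_level = 30\n")
        with open(local, "w") as fh:
            fh.write("[base]\nworker_nb = 4\n")
        return load_layered([system_wide, local],
                            {"log_level": 10, "out_dir": None})
```

In demo(), worker_nb comes from the higher-priority file, log_level from the command-line value, and out_dir stays unset because no value was supplied for it.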
-
__init__(cfg_file='', sequence_db=None, db_type=None, replicon_topology=None, topology_file=None, inter_gene_max_space=None, min_mandatory_genes_required=None, min_genes_required=None, max_nb_genes=None, multi_loci=None, hmmer_exe=None, index_db_exe=None, e_value_res=None, i_evalue_sel=None, coverage_profile=None, def_dir=None, res_search_dir=None, res_search_suffix=None, profile_dir=None, profile_suffix=None, res_extract_suffix=None, out_dir=None, log_level=None, log_file=None, worker_nb=None, config_file=None, previous_run=None, build_indexes=None)
Parameters: - cfg_file (string) – the path to the MacSyFinder configuration file to use
- previous_run (string) – the path to the results directory of a previous run
- sequence_db (string) – the path to the sequence input dataset (fasta format)
- db_type (string) – the type of dataset to deal with. “unordered_replicon” corresponds to a non-assembled genome, “unordered” to a metagenomic dataset, “ordered_replicon” to an assembled genome, and “gembase” to a set of replicons where sequence identifiers follow the convention “>RepliconName_SequenceID”.
- replicon_topology (string) – the topology (‘linear’ or ‘circular’) of the replicons. This option is meaningful only if the db_type is ‘ordered_replicon’ or ‘gembase’
- topology_file (string) – a tabular file mapping replicon names to their topology (e.g. “RepliconA linear”)
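- inter_gene_max_space (list of list of 2 elements [[ string system, integer space] , ..]) – for each listed system, the maximum number of components with no match allowed between two genes with a match to consider them contiguous
- min_mandatory_genes_required (list of list of 2 elements [[ string system, integer ] , ..]) – for each listed system, the mandatory genes quorum to assess the system presence
- min_genes_required (list of list of 2 elements [[ string system, integer ] , ..]) – for each listed system, the genes (mandatory+accessory) quorum to assess the system presence
- max_nb_genes (list of list of 2 elements [[ string system, integer ] , ..]) – for each listed system, the maximum number of genes to assess the system presence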
- multi_loci (string) –
- hmmer_exe (string) – the Hmmer “hmmsearch” executable
- index_db_exe (string) – the indexer executable (“makeblastdb” or “formatdb”)
- e_value_res (float) – maximal e-value for hits to be reported during Hmmer search
- i_evalue_sel (float) – maximal independent e-value for Hmmer hits to be selected for system detection
- coverage_profile (float) – minimal profile coverage required in the hit alignment to allow the hit selection for system detection
- def_dir (string) – the path to the directory containing systems definition files (.xml)
- res_search_dir (string) – the path to the directory where to store MacSyFinder search results directories.
- out_dir (string) – the directory where the results are written. By default the directory is named macsyfinder-{date}; this option overrides that behavior. If out_dir is set, the directory is created; if it already exists, it must be empty. If both out_dir and res_search_dir are set, res_search_dir is ignored.
- res_search_suffix (string) – the suffix to give to Hmmer raw output files
- res_extract_suffix (string) – the suffix to give to filtered hits output files
- profile_dir (string) – path to the profiles directory
- profile_suffix (string) – the suffix of profile files. For each ‘Gene’ element, the corresponding profile is searched in the ‘profile_dir’, in a file whose name is the Gene name followed by the profile suffix.
- log_level (int) – the level of log output
- log_file (string) – the path to the directory to write MacSyFinder log files
- worker_nb (int) – maximal number of processes to be used in parallel (0 means use all available cores)
- build_indexes (boolean) – build the indexes from the sequence dataset in fasta format
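The out_dir rule described in the parameter list (default name macsyfinder-{date}, creation on demand, and the requirement that an existing directory be empty) can be sketched as follows. This is an illustration under stated assumptions rather than the library's code: the helper name prepare_out_dir, the parent argument and the exact date format are hypothetical.

```python
import os
import time


def prepare_out_dir(out_dir=None, parent="."):
    """Return a usable results directory, creating it when needed."""
    if out_dir is None:
        # Assumption: the exact date format is chosen for this sketch only.
        stamp = time.strftime("%Y%m%d_%H-%M-%S")
        out_dir = os.path.join(parent, "macsyfinder-" + stamp)
    if os.path.exists(out_dir):
        if not os.path.isdir(out_dir):
            raise ValueError("%s is not a directory" % out_dir)
        # An existing directory is accepted only if it is empty.
        if os.listdir(out_dir):
            raise ValueError("%s already exists and is not empty" % out_dir)
    else:
        os.makedirs(out_dir)
    return out_dir
```

Refusing a non-empty directory avoids mixing the files of two different runs.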
-
__weakref__
list of weak references to the object (if defined)
-
_validate(cmde_line_opt, cmde_line_values)
Get all configuration values and check their validity. Create the working directory.
Parameters: - cmde_line_opt (dict, all values are cast in string) – the options from the command line
- cmde_line_values (dict, values are not cast) – the options from the command line
Returns: all the options for this execution
Return type: dictionary
-
build_indexes
Returns: True if the indexes must be rebuilt, False otherwise
Return type: boolean
-
coverage_profile
Returns: the coverage threshold used to select a hit for systems detection and for the Hmmer report (filtered hits)
Return type: float
-
db_type
Returns: the type of the input sequence dataset. The allowed values are ‘unordered_replicon’, ‘ordered_replicon’, ‘gembase’ and ‘unordered’.
Return type: string
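For the ‘gembase’ type, sequence identifiers follow the convention “>RepliconName_SequenceID”. The sketch below shows one way to split such an identifier. It assumes that the last underscore separates the two parts, which this reference does not state explicitly; the function name and the example identifier are hypothetical.

```python
def split_gembase_id(header):
    """Split a FASTA header of the form '>RepliconName_SequenceID'."""
    parts = header.lstrip(">").split()
    if not parts:
        raise ValueError("empty sequence header")
    # Keep only the identifier, dropping any free-text description after it.
    seq_id = parts[0]
    # Assumption: the last underscore separates replicon name and sequence ID.
    replicon, sep, sequence = seq_id.rpartition("_")
    if not sep or not replicon or not sequence:
        raise ValueError("%r does not look like RepliconName_SequenceID" % seq_id)
    return replicon, sequence
```

Grouping sequences by the first element of the returned pair yields one set of sequences per replicon.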
-
def_dir
Returns: the path to the directory where the definitions of secretion systems (.xml files) are stored
Return type: string
-
e_value_res
Returns: the e-value threshold used by Hmmer to report hits in the Hmmer raw output file
Return type: float
-
hmmer_dir
Returns: the name of the directory where the Hmmer results are stored
Return type: string
-
hmmer_exe
Returns: the name of the binary to execute for homology search from HMM protein profiles (Hmmer)
Return type: string
-
i_evalue_sel
Returns: the i_evalue threshold used to select a hit for systems detection and for the Hmmer report (filtered hits)
Return type: float
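The i_evalue_sel and coverage_profile thresholds act together as a filter on Hmmer hits: a hit is kept when its independent e-value is low enough and its profile coverage is high enough. A minimal sketch of that filter follows. The Hit tuple, its field names and the threshold values in the usage note are hypothetical, and treating both bounds as inclusive is an assumption.

```python
from collections import namedtuple

# Hypothetical, simplified representation of a Hmmer hit.
Hit = namedtuple("Hit", ["gene", "i_evalue", "profile_coverage"])


def select_hits(hits, i_evalue_sel, coverage_profile):
    """Keep hits passing both the i-evalue and the profile coverage thresholds."""
    return [
        hit for hit in hits
        # Assumption: both bounds are inclusive.
        if hit.i_evalue <= i_evalue_sel
        and hit.profile_coverage >= coverage_profile
    ]
```

For example, with i_evalue_sel=0.001 and coverage_profile=0.5 (values chosen for illustration), a hit with an i-evalue of 1e-20 and a coverage of 0.9 is kept, while a hit with the same i-evalue but a coverage of 0.3 is dropped.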
-
index_db_exe
Returns: the name of the binary to index the input sequences dataset for Hmmer
Return type: string
-
inter_gene_max_space(system)
Parameters: system (string) – the name of a system
Returns: the maximum number of components with no match allowed between two genes with a match to consider them contiguous (at the system level)
Return type: integer
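To make the contiguity rule concrete, the sketch below groups hit positions into clusters in which two neighbouring hits are separated by at most inter_gene_max_space components without a match. It is an illustration of the rule as worded here, not the detection code itself: positions are assumed to be integer ranks along a linear replicon, circular topology is ignored, and the function name is hypothetical.

```python
def group_contiguous(positions, inter_gene_max_space):
    """Group hit positions whose neighbours are close enough to be contiguous."""
    clusters = []
    for pos in sorted(positions):
        # Number of components lying strictly between this hit and the last one.
        if clusters and pos - clusters[-1][-1] - 1 <= inter_gene_max_space:
            clusters[-1].append(pos)
        else:
            clusters.append([pos])
    return clusters
```

With hits at positions 1, 2, 5, 12 and 13, a maximum space of 2 gives two clusters (1, 2, 5 and 12, 13), whereas a maximum space of 1 isolates position 5 in its own cluster.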
-
max_nb_genes(system)
Parameters: system (string) – the name of a system
Returns: the maximum number of genes to assess the system presence
Return type: integer
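Several options are passed to the constructor as a list of [system, integer] pairs and then queried per system through accessors such as this one. The sketch below shows one possible way to implement such a lookup. The helper name and the system names in the usage note are hypothetical, and returning a default for systems with no entry is an assumption, since this reference does not say what the accessors return in that case.

```python
def per_system_value(pairs, system, default=None):
    """Look up the integer attached to `system` in a list of [system, value] pairs."""
    # Values may arrive as strings (e.g. from a configuration file), so cast them.
    overrides = {name: int(value) for name, value in (pairs or [])}
    # Assumption: systems without an entry fall back to `default`.
    return overrides.get(system, default)
```

For example, with the pairs [["T2SS", 5], ["Flagellum", "12"]], the lookup gives 5 for "T2SS", 12 for "Flagellum" and the default for any other system name.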
-
min_genes_required(system)
Parameters: system (string) – the name of a system
Returns: the genes (mandatory+accessory) quorum to assess the system presence
Return type: integer
-
min_mandatory_genes_required(system)
Parameters: system (string) – the name of a system
Returns: the mandatory genes quorum to assess the system presence
Return type: integer
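The two quorums can be read as a pair of conditions that must both hold: enough mandatory genes, and enough genes overall (mandatory plus accessory). A minimal sketch of that reading, with a hypothetical function name and without claiming that it mirrors the actual detection code:

```python
def quorum_reached(nb_mandatory, nb_accessory,
                   min_mandatory_genes_required, min_genes_required):
    """Return True when both the mandatory and the overall quorum are met."""
    enough_mandatory = nb_mandatory >= min_mandatory_genes_required
    enough_genes = nb_mandatory + nb_accessory >= min_genes_required
    return enough_mandatory and enough_genes
```

With quorums of 2 mandatory genes and 4 genes in total, 3 mandatory plus 1 accessory gene pass, while 1 mandatory plus 5 accessory genes fail on the mandatory quorum.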
-
multi_loci(system)
Parameters: system (string) – the name of a system
Returns: True if the system is allowed to be encoded at several loci, False otherwise
Return type: boolean
-
previous_run
Returns: the path to the previous run directory to use (to recover Hmmer raw output)
Return type: string
-
profile_dir
Returns: the path to the directory containing the HMM protein profiles that correspond to each Gene
Return type: string
-
profile_suffix
Returns: the suffix for profile files
Return type: string
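As described for the profile_suffix parameter, the profile of a Gene is looked up in profile_dir under a name built from the Gene name and the suffix. A short sketch of that path construction follows; the function name, the gene name and the ".hmm" suffix in the usage note are hypothetical examples.

```python
import os


def profile_path(profile_dir, gene_name, profile_suffix):
    """Build the expected path of the profile file for one gene."""
    return os.path.join(profile_dir, gene_name + profile_suffix)
```

For instance, profile_path("profiles", "gspD", ".hmm") points to the file gspD.hmm inside the profiles directory.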
-
replicon_topology
Returns: the topology of the replicons. Two values are supported: ‘linear’ (default) and ‘circular’. Only relevant when db_type is ‘ordered_replicon’ or ‘gembase’
Return type: string
-
res_extract_suffix
Returns: the suffix of extract files (tabulated files after HMM output parsing and filtering of hits)
Return type: string
-
res_search_dir
Returns: the path to the directory to store results of MacSyFinder runs
Return type: string
-
res_search_suffix
Returns: the suffix for Hmmer raw output files
Return type: string
-
sequence_db
Returns: the path to the input sequence dataset (in fasta format)
Return type: string
-
topology_file
Returns: the path to the file of replicons topology.
Return type: string
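The topology file maps replicon names to a topology. The sketch below parses such a mapping, assuming two whitespace-separated columns as in the “RepliconA linear” example given for the topology_file parameter. The real file syntax may differ (other separators, for instance), so this is an illustration only; the function name and the skipping of blank lines and "#" comments are assumptions.

```python
def parse_topology(lines):
    """Parse '<replicon> <topology>' lines into a dict."""
    topologies = {}
    for line_nb, line in enumerate(lines, 1):
        line = line.strip()
        # Assumption: blank lines and '#' comments are ignored.
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        if len(fields) != 2 or fields[1].lower() not in ("linear", "circular"):
            raise ValueError(
                "line %d: expected '<replicon> <linear|circular>'" % line_nb)
        topologies[fields[0]] = fields[1].lower()
    return topologies
```

The function accepts any iterable of lines, so an open file object can be passed directly.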
-
worker_nb
Returns: the maximum number of parallel jobs
Return type: int
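According to the worker_nb parameter description, a value of 0 means that all available cores are used. One way to turn the configured value into an effective number of jobs is sketched below. The function name is hypothetical, and rejecting negative values is an assumption not stated in this reference.

```python
import multiprocessing


def effective_worker_nb(worker_nb):
    """Translate the configured worker_nb into a concrete number of jobs."""
    if worker_nb < 0:
        # Assumption: negative values are treated as invalid.
        raise ValueError("worker_nb must be >= 0")
    if worker_nb == 0:
        # 0 means "use every core available on this machine".
        return multiprocessing.cpu_count()
    return worker_nb
```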
-
working_dir
Returns: the path to the working directory to use for this run
Return type: string
-