Configuration API

Options to run MacSyFinder can be specified in a configuration file. The API described below handles all configuration options for MacSyFinder. The Config object provides default values for some options and validates the values it receives.

Config API reference

class macsypy.config.Config(cfg_file='', sequence_db=None, db_type=None, replicon_topology=None, topology_file=None, inter_gene_max_space=None, min_mandatory_genes_required=None, min_genes_required=None, max_nb_genes=None, multi_loci=None, hmmer_exe=None, index_db_exe=None, e_value_res=None, i_evalue_sel=None, coverage_profile=None, def_dir=None, res_search_dir=None, res_search_suffix=None, profile_dir=None, profile_suffix=None, res_extract_suffix=None, out_dir=None, log_level=None, log_file=None, worker_nb=None, config_file=None, previous_run=None, build_indexes=None)[source]

Parse configuration files and handle the configuration according to the following file location precedence, from lowest to highest priority: /etc/macsyfinder/macsyfinder.conf < ~/.macsyfinder/macsyfinder.conf < .macsyfinder.conf

If a configuration file is given on the command-line, this file will be used. Ultimately, the arguments passed on the command-line have the highest priority.
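
The precedence rule above can be illustrated with the standard library alone. This is a minimal sketch, not the Config implementation: the helper `load_layered_config`, the section name `base`, and the temporary file names are assumptions made for the example.

```python
import configparser
import os
import tempfile


def load_layered_config(paths, cli_overrides=None, section="base"):
    """Merge INI files in order (later files win), then apply command-line values."""
    parser = configparser.ConfigParser()
    # ConfigParser.read() skips files that do not exist and lets later files
    # override options already set by earlier ones.
    parser.read(paths)
    merged = dict(parser[section]) if parser.has_section(section) else {}
    # Command-line values have the highest priority; None means "not given".
    for key, value in (cli_overrides or {}).items():
        if value is not None:
            merged[key] = str(value)
    return merged


def demo():
    """Build two throwaway files and merge them with one command-line value."""
    with tempfile.TemporaryDirectory() as tmp:
        system_wide = os.path.join(tmp, "system.conf")
        per_user = os.path.join(tmp, "user.conf")
        missing = os.path.join(tmp, "does_not_exist.conf")
        with open(system_wide, "w") as fh:
            fh.write("[base]\ndb_type = unordered\nworker_nb = 1\n")
        with open(per_user, "w") as fh:
            fh.write("[base]\ndb_type = gembase\n")
        return load_layered_config(
            [system_wide, per_user, missing],
            cli_overrides={"worker_nb": 4, "db_type": None},
        )
```

In this sketch the per-user file overrides `db_type` from the system-wide file, and the command-line value overrides `worker_nb` from both.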

__init__(cfg_file='', sequence_db=None, db_type=None, replicon_topology=None, topology_file=None, inter_gene_max_space=None, min_mandatory_genes_required=None, min_genes_required=None, max_nb_genes=None, multi_loci=None, hmmer_exe=None, index_db_exe=None, e_value_res=None, i_evalue_sel=None, coverage_profile=None, def_dir=None, res_search_dir=None, res_search_suffix=None, profile_dir=None, profile_suffix=None, res_extract_suffix=None, out_dir=None, log_level=None, log_file=None, worker_nb=None, config_file=None, previous_run=None, build_indexes=None)[source]
Parameters:
  • cfg_file (string) – the path to the MacSyFinder configuration file to use
  • previous_run (string) – the path to the results directory of a previous run
  • sequence_db (string) – the path to the sequence input dataset (fasta format)
  • db_type (string) – the type of dataset to deal with. “unordered_replicon” corresponds to a non-assembled genome, “unordered” to a metagenomic dataset, “ordered_replicon” to an assembled genome, and “gembase” to a set of replicons where sequence identifiers follow the convention “>RepliconName_SequenceID”.
  • replicon_topology (string) – the topology (‘linear’ or ‘circular’) of the replicons. This option is meaningful only if the db_type is ‘ordered_replicon’ or ‘gembase’
  • topology_file (string) – a tabular file of mapping between replicon names and the corresponding topology (e.g. “RepliconA linear”)
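  • inter_gene_max_space (list of list of 2 elements [[ string system, integer space] , ..]) – for each system, the maximum number of components with no match allowed between two genes with a match to consider them contiguous
  • min_mandatory_genes_required (list of list of 2 elements [[ string system, integer ] , ..]) – for each system, the mandatory genes quorum to assess the system presence
  • min_genes_required (list of list of 2 elements [[ string system, integer ] , ..]) – for each system, the genes (mandatory+accessory) quorum to assess the system presence
  • max_nb_genes (list of list of 2 elements [[ string system, integer ] , ..]) – for each system, the maximum number of genes to assess the system presence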
  • multi_loci (string) –
  • hmmer_exe (string) – the Hmmer “hmmsearch” executable
  • index_db_exe (string) – the indexer executable (“makeblastdb” or “formatdb”)
  • e_value_res (float) – maximal e-value for hits to be reported during Hmmer search
  • i_evalue_sel (float) – maximal independent e-value for Hmmer hits to be selected for system detection
  • coverage_profile (float) – minimal profile coverage required in the hit alignment to allow the hit selection for system detection
  • def_dir (string) – the path to the directory containing systems definition files (.xml)
  • res_search_dir (string) – the path to the directory in which MacSyFinder search results directories are stored.
  • out_dir (string) – the directory where the results are written. By default the directory is named macsyfinder-{date}; this option overrides that behavior. If out_dir is set, the directory is created; if it already exists, it must be empty. If both out_dir and res_search_dir are set, res_search_dir is ignored.
  • res_search_suffix (string) – the suffix to give to Hmmer raw output files
  • res_extract_suffix (string) – the suffix to give to filtered hits output files
  • profile_dir (string) – path to the profiles directory
  • profile_suffix (string) – the suffix of profile files. For each ‘Gene’ element, the corresponding profile is searched in the ‘profile_dir’, in a file whose name is the Gene name + the profile suffix.
  • log_level (int) – the level of log output
  • log_file (string) – the path to the directory to write MacSyFinder log files
  • worker_nb (int) – maximal number of processes to be used in parallel (multi-thread run; 0 uses all available cores)
  • build_indexes (boolean) – build the indexes from the sequence dataset in fasta format
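
The per-system options above (inter_gene_max_space, min_mandatory_genes_required, min_genes_required, max_nb_genes) are passed as a list of two-element lists. The sketch below only illustrates that shape and one way to turn it into a per-system lookup; the helper `per_system_lookup` and the system names are hypothetical and are not part of macsypy.

```python
def per_system_lookup(pairs):
    """Turn [[system, value], ...] into a {system: int(value)} mapping."""
    lookup = {}
    for system, value in pairs:
        # Values may arrive as strings (e.g. from a configuration file),
        # so they are cast to int here.
        lookup[system] = int(value)
    return lookup


# Illustrative values only; the system names are placeholders.
inter_gene_max_space = [["T2SS", 5], ["Flagellum", "12"]]
spacing = per_system_lookup(inter_gene_max_space)
```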
__weakref__

list of weak references to the object (if defined)

_validate(cmde_line_opt, cmde_line_values)[source]

Get all configuration values and check their validity. Create the working directory.

Parameters:
  • cmde_line_opt (dict, all values are cast in string) – the options from the command line
  • cmde_line_values (dict, values are not cast) – the options from the command line
Returns:

all the options for this execution

Return type:

dictionary
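
As an illustration of the kind of check described here, the sketch below validates two options against the values documented on this page. It is not the actual `_validate` code: the function `check_options` and its return convention (a list of error messages) are assumptions for the example.

```python
ALLOWED_DB_TYPES = {"unordered_replicon", "ordered_replicon", "gembase", "unordered"}
ALLOWED_TOPOLOGIES = {"linear", "circular"}


def check_options(options):
    """Return a list of error messages; an empty list means the options are valid."""
    errors = []
    db_type = options.get("db_type")
    if db_type not in ALLOWED_DB_TYPES:
        errors.append(
            f"db_type must be one of {sorted(ALLOWED_DB_TYPES)}, got {db_type!r}"
        )
    topology = options.get("replicon_topology")
    # The topology is optional in this sketch, so None is accepted.
    if topology is not None and topology not in ALLOWED_TOPOLOGIES:
        errors.append(
            f"replicon_topology must be 'linear' or 'circular', got {topology!r}"
        )
    return errors
```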

build_indexes
Returns:True if the indexes must be rebuilt, False otherwise
Return type:boolean
coverage_profile
Returns:the coverage threshold used to select a hit for systems detection and for the Hmmer report (filtered hits)
Return type:float
db_type
Returns:the type of the input sequence data set. The allowed values are: ‘unordered_replicon’, ‘ordered_replicon’, ‘gembase’, ‘unordered’
Return type:string
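
For the ‘gembase’ type, sequence identifiers follow the “>RepliconName_SequenceID” convention. A minimal sketch of grouping identifiers by replicon is shown below. It assumes that the replicon name is everything before the last underscore; the helper `group_by_replicon` and the identifiers are made up for the example.

```python
from collections import defaultdict


def group_by_replicon(fasta_headers):
    """Group sequence IDs by replicon name, given fasta header lines."""
    groups = defaultdict(list)
    for header in fasta_headers:
        # Keep only the identifier: drop the leading ">" and any description.
        identifier = header.lstrip(">").split()[0]
        # Assumption: the replicon name is everything before the last "_".
        replicon, _, sequence_id = identifier.rpartition("_")
        groups[replicon].append(sequence_id)
    return dict(groups)


headers = [">ESCO001_00010", ">ESCO001_00020 hypothetical protein", ">PSAE001_00010"]
replicons = group_by_replicon(headers)
```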
def_dir
Returns:the path to the directory where the definitions of secretion systems (.xml files) are stored
Return type:string
e_value_res
Returns:The e_value threshold used by Hmmer to report hits in the Hmmer raw output file
Return type:float
hmmer_dir
Returns:the name of the directory where the hmmer results are stored
Return type:string
hmmer_exe
Returns:the name of the binary to execute for homology search from HMM protein profiles (Hmmer)
Return type:string
i_evalue_sel
Returns:the i_evalue threshold used to select a hit for systems detection and for the Hmmer report (filtered hits)
Return type:float
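
The two selection thresholds (i_evalue_sel and coverage_profile) can be pictured as a simple filter over hits. The sketch below is an assumption-laden illustration: the `Hit` tuple, its field names, the inclusive comparisons, and the threshold values are chosen for the example and are not taken from macsypy.

```python
from collections import namedtuple

# Hypothetical hit record: gene name, independent e-value, profile coverage.
Hit = namedtuple("Hit", "gene i_evalue profile_coverage")


def select_hits(hits, i_evalue_sel, coverage_profile):
    """Keep hits whose i-evalue is low enough and whose coverage is high enough."""
    return [
        hit
        for hit in hits
        if hit.i_evalue <= i_evalue_sel and hit.profile_coverage >= coverage_profile
    ]


hits = [
    Hit("gspD", 1e-10, 0.9),  # passes both thresholds
    Hit("gspE", 0.5, 0.9),    # i-evalue too high
    Hit("gspF", 1e-10, 0.2),  # coverage too low
]
selected = select_hits(hits, i_evalue_sel=0.001, coverage_profile=0.5)
```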
index_db_exe
Returns:the name of the binary to index the input sequences dataset for Hmmer
Return type:string
inter_gene_max_space(system)[source]
Parameters:system (string) – the name of a system
Returns:the maximum number of components with no match allowed between two genes with a match to consider them contiguous (at the system level)
Return type:integer
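
One way to read this threshold: two matched genes at positions p1 < p2 are contiguous when the number of components between them, p2 - p1 - 1, does not exceed inter_gene_max_space. The sketch below applies that reading to a linear replicon; the function `cluster_positions` and the use of integer gene ranks as positions are assumptions for the example.

```python
def cluster_positions(positions, inter_gene_max_space):
    """Group matched gene positions into runs of contiguous genes."""
    clusters = []
    for position in sorted(positions):
        # Number of components lying strictly between this gene and the
        # last gene of the current cluster.
        if clusters and position - clusters[-1][-1] - 1 <= inter_gene_max_space:
            clusters[-1].append(position)
        else:
            clusters.append([position])
    return clusters
```

With matched genes at positions 10, 12, 13, 20, and 21 and a maximum space of 2, this yields two clusters, because six components separate positions 13 and 20.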
max_nb_genes(system)[source]
Parameters:system (string) – the name of a system
Returns:the maximum number of genes to assess the system presence
Return type:integer
min_genes_required(system)[source]
Parameters:system (string) – the name of a system
Returns:the genes (mandatory+accessory) quorum to assess the system presence
Return type:integer
min_mandatory_genes_required(system)[source]
Parameters:system (string) – the name of a system
Returns:the mandatory genes quorum to assess the system presence
Return type:integer
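
The two quorum values can be combined as in the sketch below, which assumes that both conditions must hold and that the comparisons are inclusive. The function `quorum_reached` and its argument names are hypothetical.

```python
def quorum_reached(nb_mandatory, nb_accessory,
                   min_mandatory_genes_required, min_genes_required):
    """Check the mandatory quorum and the overall (mandatory+accessory) quorum."""
    enough_mandatory = nb_mandatory >= min_mandatory_genes_required
    enough_genes = nb_mandatory + nb_accessory >= min_genes_required
    return enough_mandatory and enough_genes
```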
multi_loci(system)[source]
Parameters:system (string) – the name of a system
Returns:True if the system is allowed to be encoded at multiple loci, False otherwise
Return type:boolean
previous_run
Returns:the path to the previous run directory to use (to recover Hmmer raw output)
Return type:string
profile_dir
Returns:the path to the directory containing the HMM protein profiles that correspond to Gene objects
Return type:string
profile_suffix
Returns:the suffix for profile files
Return type:string
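
As described for the profile_suffix parameter, a profile file name is the Gene name followed by the profile suffix, looked up in profile_dir. A minimal sketch follows; the helper `profile_path`, the gene name, and the suffix are illustrative assumptions.

```python
import os


def profile_path(profile_dir, gene_name, profile_suffix):
    """Build the expected path of the HMM profile for a gene."""
    return os.path.join(profile_dir, gene_name + profile_suffix)


# Illustrative values only.
example = profile_path("profiles", "T2SS_gspD", ".hmm")
```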
replicon_topology
Returns:the topology of the replicons. Two values are supported: ‘linear’ (default) and ‘circular’. Only relevant for ‘ordered_replicon’ or ‘gembase’ datasets
Return type:string
res_extract_suffix
Returns:the suffix of extract files (tabulated files after HMM output parsing and filtering of hits)
Return type:string
res_search_dir

Returns:the path to the directory to store results of MacSyFinder runs
Return type:string

res_search_suffix
Returns:the suffix for Hmmer raw output files
Return type:string
save(dir_path)[source]

Save the configuration used for this run to a file in INI format.
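
Writing options to an INI file can be sketched with `configparser`. This is not the body of `save`: the helper `save_options`, the file name `macsyfinder.conf`, the section name `base`, and the choice to skip unset options are assumptions made for the example.

```python
import configparser
import os
import tempfile


def save_options(options, dir_path, file_name="macsyfinder.conf", section="base"):
    """Write the options that are set (not None) to an INI file in dir_path."""
    parser = configparser.ConfigParser()
    parser[section] = {
        key: str(value) for key, value in options.items() if value is not None
    }
    target = os.path.join(dir_path, file_name)
    with open(target, "w") as fh:
        parser.write(fh)
    return target


def demo_round_trip():
    """Save a few options, then read them back from the written file."""
    with tempfile.TemporaryDirectory() as tmp:
        path = save_options(
            {"db_type": "gembase", "worker_nb": 2, "log_file": None}, tmp
        )
        reloaded = configparser.ConfigParser()
        reloaded.read(path)
        return dict(reloaded["base"])
```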

sequence_db
Returns:the path to the input sequence dataset (in fasta format)
Return type:string
topology_file
Returns:the path to the file of replicons topology.
Return type:string
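
The topology file maps replicon names to a topology, as in the “RepliconA linear” example given for the topology_file parameter. The sketch below assumes whitespace-separated columns, as in that example; the separator actually expected by MacSyFinder may differ, and `parse_topology` is a hypothetical helper.

```python
def parse_topology(lines):
    """Map each replicon name to its topology ('linear' or 'circular')."""
    topologies = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        # Assumption: two whitespace-separated columns per line.
        replicon, topology = line.split()
        if topology not in ("linear", "circular"):
            raise ValueError(f"unknown topology {topology!r} for {replicon!r}")
        topologies[replicon] = topology
    return topologies
```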
worker_nb
Returns:the maximum number of parallel jobs
Return type:int
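
The worker_nb parameter treats 0 as "use all available cores". A minimal sketch of that convention is shown below; `resolve_worker_nb` is a hypothetical helper, and the fallback to 1 when the core count cannot be determined is an assumption.

```python
import os


def resolve_worker_nb(worker_nb):
    """Translate the worker_nb option into an actual number of workers."""
    if worker_nb < 0:
        raise ValueError("worker_nb must be zero or a positive integer")
    if worker_nb == 0:
        # os.cpu_count() may return None, hence the fallback to 1.
        return os.cpu_count() or 1
    return worker_nb
```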
working_dir
Returns:the path to the working directory to use for this run
Return type:string
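
The out_dir rules described in the parameter list (default name macsyfinder-{date}, creation on demand, an existing directory must be empty) can be sketched as follows. The helper `prepare_out_dir`, its `parent` argument, and the timestamp format are assumptions; the source does not specify the format of {date}.

```python
import os
import tempfile
from datetime import datetime


def prepare_out_dir(out_dir=None, parent="."):
    """Create the results directory, or accept an existing empty one."""
    if out_dir is None:
        # Assumption: the exact timestamp format is not given in the source.
        stamp = datetime.now().strftime("%Y%m%d_%H-%M-%S")
        out_dir = os.path.join(parent, "macsyfinder-" + stamp)
    if os.path.isdir(out_dir):
        if os.listdir(out_dir):
            raise ValueError(f"{out_dir!r} already exists and is not empty")
    else:
        os.makedirs(out_dir)
    return out_dir
```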