Output format¶
MacSyFinder provides different types of output files. At each run, MacSyFinder creates a new folder, whose name is based on a fixed prefix and a random suffix, for instance “macsyfinder-20130128_08-57-46”. MacSyFinder output files are stored in this run-specific folder.
- There are three types of output files:
The main output files for the systems’ search. They differ with the search mode (ordered or unordered).
The HMMER output files (search of each systems’ components), located in the hmmer_results folder.
The internal configuration and log files.
Note
Each tabular output file contains a header line describing each column in the output.
Output files for the “ordered replicon(s)” search modes¶
These output files are provided when MacSyFinder search proceeds on a set of proteins that are deemed to follow the order of their genes on replicons. This corresponds to the two search modes gembase and ordered_replicon.
Systems detection results¶
Different types of output files are provided, human-readable files “.txt”, and tabulated files “.tsv”. For the latter, headers are provided with the content of the lines in the file.
best_solution.tsv - This file contains the best solution found by MacSyFinder in terms of systems detected, under the form of a per-component, tabulated report file. A solution consists in a set of compatible systems (no components’ overlap allowed). If multiple solutions showed a maximal score, a ranking is established.
To see potential other best solutions (in case several obtained the same highest score), see file all_best_solutions.tsv.
To see all possible, candidate systems without further processing, see files all_systems.txt and all_systems.tsv.
The best_solution.tsv file is the most similar to former V1 file macsyfinder.report.
best_solution_loners.tsv and best_solution_multisystems.tsv report hits which have been identified as loners or multi-systems which means that the corresponding gene is tagged as a ‘loner’ or ‘multi-system’ in the model definition and the hit is lot located in a cluster.
best_solution_summary.tsv is a summary of the best_solution.tsv file, containing the number of systems detected in each replicon analysed.
all_systems.txt - This file describes the search process of all possible candidate systems given the definitions in systems’ models - without processing of the potential overlaps between candidate systems. This set of possible candidate systems are also given under the form of a tabulated file in all_systems.tsv.
rejected_candidates.tsv and rejected_candidates.txt - This file lists candidate clusters (or a combination of clusters) components that were rejected by MacSyFinder during the search process, and were thus not assigned to a candidate system. This set of clusters are also given under the form of tabulated file rejected_candidates.tsv.
all_best_solutions.tsv - This file contains all possible best solutions under the form of a per-component, tabulated report file. To retrieve a single best solution as proposed by MacSyFinder, see file best_solution.tsv.
all_systems.tsv - This file contains all possible candidate systems given the definitions - without processing of the potential overlaps between candidate systems, under the form of a per-component, tabulated report file. It corresponds to the tabulated version of the all_systems.txt file.
all_systems.txt¶
The file starts with some comments:
the version of MacSyFinder used
the name of model package and version used
the command line used to produce this file
Then for each replicon, the systems detected are listed along with their description:
system_id - the unique identifier of a system
model - the model assigned to this system
replicon - the name of the replicon harbouring the system
clusters - the clusters composition of this system
each clusters is a list of tuple
each tuple is composed of:
the name of the matching gene(s) in the replicon
the name of the corresponding gene profile(s)
the position of the corresponding sequence(s) along the replicon
occurrence - the average number of occurrences of each components of the system (as a potential proxy to estimate whether there’s the genetic potential for multiple systems in one)
wholeness - the percentage of the model’s components that were found in this system
loci nb - the number of different loci constituting this system
score - the score of the system. See here for more details
systems components - the number of occurrences of each model components in parenthesis the name of the matching profile in square brackets the name of other putative systems that would involve this gene
Here is an example of the all_systems.txt file:
# macsyfinder 20200217.dev
# models: TFF-SF_final-0.1
# macsyfinder --sequence-db DATA_TEST/sequences.prt --db-type=gembase --models-dir data/models/ --models TFF-SF_final all -w 4
# Systems found:
system id = VICH001.B.00001.C001_MSH_1
model = TFF-SF_final/MSH
replicon = VICH001.B.00001.C001
clusters = [('VICH001.B.00001.C001_00406', 'MSH_mshI', 366), ('VICH001.B.00001.C001_00407', 'MSH_mshJ', 367), ('VICH001.B.00001.C001_00408', 'MSH_mshK', 368), ('VICH001.B.00001.C001_00409', '
MSH_mshL', 369), ('VICH001.B.00001.C001_00410', 'MSH_mshM', 370), ('VICH001.B.00001.C001_00411', 'MSH_mshN', 371), ('VICH001.B.00001.C001_00412', 'MSH_mshE', 372), ('VICH001.B.00001.C001_0041
3', 'MSH_mshG', 373), ('VICH001.B.00001.C001_00414', 'MSH_mshF', 374), ('VICH001.B.00001.C001_00415', 'MSH_mshB', 375), ('VICH001.B.00001.C001_00416', 'MSH_mshA', 376), ('VICH001.B.00001.C001
_00417', 'MSH_mshC', 377), ('VICH001.B.00001.C001_00418', 'MSH_mshD', 378), ('VICH001.B.00001.C001_00419', 'MSH_mshO', 379), ('VICH001.B.00001.C001_00420', 'MSH_mshP', 380), ('VICH001.B.00001
.C001_00421', 'MSH_mshQ', 381)]
occ = 1
wholeness = 0.941
loci nb = 1
score = 10.500
mandatory genes:
- MSH_mshA: 1 (MSH_mshA)
- MSH_mshE: 1 (MSH_mshE)
- MSH_mshG: 1 (MSH_mshG)
- MSH_mshL: 1 (MSH_mshL)
- MSH_mshM: 1 (MSH_mshM)
accessory genes:
- MSH_mshB: 1 (MSH_mshB)
- MSH_mshC: 1 (MSH_mshC)
- MSH_mshD: 1 (MSH_mshD)
- MSH_mshF: 1 (MSH_mshF)
- MSH_mshI: 1 (MSH_mshI)
- MSH_mshI2: 0 ()
- MSH_mshJ: 1 (MSH_mshJ)
- MSH_mshK: 1 (MSH_mshK)
- MSH_mshN: 1 (MSH_mshN)
- MSH_mshO: 1 (MSH_mshO)
- MSH_mshQ: 1 (MSH_mshQ)
- MSH_mshP: 1 (MSH_mshP)
neutral genes:
============================================================
system id = VICH001.B.00001.C001_T4P_14
model = TFF-SF_final/T4P
replicon = VICH001.B.00001.C001
clusters = [('VICH001.B.00001.C001_00476', 'T4P_pilT', 427), ('VICH001.B.00001.C001_00477', 'T4P_pilU', 428)], [('VICH001.B.00001.C001_00847', 'T4P_pilO', 778), ('VICH001.B.00001.C001_00850',
'T4P_pilE', 781), ('VICH001.B.00001.C001_00851', 'T4P_fimT', 782), ('VICH001.B.00001.C001_00852', 'T4P_pilW', 783), ('VICH001.B.00001.C001_00853', 'T4P_pilX', 784), ('VICH001.B.00001.C001_00
854', 'T4P_pilV', 785)], [('VICH001.B.00001.C001_02305', 'T4P_pilA', 2202), ('VICH001.B.00001.C001_02306', 'T4P_pilB', 2203), ('VICH001.B.00001.C001_02307', 'T4P_pilC', 2204), ('VICH001.B.000
01.C001_02308', 'T4P_pilD', 2205)], [('VICH001.B.00001.C001_02502', 'MSH_mshM', 2391), ('VICH001.B.00001.C001_02505', 'T4P_pilQ', 2394), ('VICH001.B.00001.C001_02506', 'T4P_pilP', 2395), ('VI
CH001.B.00001.C001_02507', 'T4P_pilO', 2396), ('VICH001.B.00001.C001_02508', 'T4P_pilN', 2397), ('VICH001.B.00001.C001_02509', 'T4P_pilM', 2398)]
occ = 1
wholeness = 0.944
loci nb = 4
score = 12.000
mandatory genes:
- T4P_pilE: 1 (T4P_pilE)
- T4P_pilB: 1 (T4P_pilB)
- T4P_pilC: 1 (T4P_pilC)
- T4P_pilO: 2 (T4P_pilO, T4P_pilO)
- T4P_pilQ: 1 (T4P_pilQ)
- T4P_pilN: 1 (T4P_pilN)
- T4P_pilT: 1 (T4P_pilT)
- T4P_pilD: 1 (T4P_pilD [VICH001.B.00001.C001_T2SS_4])
accessory genes:
- T4P_pilA: 1 (T4P_pilA)
- T4P_pilV: 1 (T4P_pilV)
- T4P_pilY: 0 ()
- T4P_pilW: 1 (T4P_pilW)
- T4P_pilX: 1 (T4P_pilX)
- T4P_fimT: 1 (T4P_fimT)
- T4P_pilM: 1 (T4P_pilM)
- T4P_pilP: 1 (T4P_pilP)
- T4P_pilU: 1 (T4P_pilU)
- MSH_mshM: 1 (MSH_mshM)
neutral genes:
all_systems.tsv¶
This corresponds to the tabulated version of the systems listed in all_systems.txt. Each line corresponds to a “hit” that has been assigned to a detected system. It includes:
replicon - the name of the replicon it belongs to
hit_id - the unique identifier of the hit
gene_name - the name of the component identified by the hit
hit_pos - the position of the sequence in the replicon
model_fqn - the model fully-qualified name
sys_id - the unique identifier attributed to the detected system
sys_loci - the number of loci
locus_num - the number of the locus where is located this gene. Loners gene have a negative locus_num
sys_wholeness - the wholeness of the system
sys_score - the system score
sys_occ - the estimated number of system occurrences that could be potentially “filled” with this system’s occurrence, based on the average number of each component found. A proxy for the genetic potential ton encode several systems from the set of components found in this one occurrence.
hit_gene_ref - the gene in the model whose this hit plays the role of
hit_status - the status of the component in the assigned system’s definition
hit_seq_len - the length of the protein sequence matched by this hit
hit_i_eval - Hmmer statistics, the independent-evalue
hit_score - Hmmer score
hit_profile_cov - the percentage of the profile covered by the alignment with the sequence
hit_seq_cov - the percentage of the sequence covered by the alignment with the profile
hit_begin_match - the position in the sequence where the profile match begins
hit_end_match - the position in the sequence where the profile match ends
counterpart - the hit id of some other hit which are equivalent. Only loners and multi-systems hits have counterparts
used_in - whether the hit could be used in another system’s occurrence
This file can be easily parsed using the Python pandas library.
import pandas as pd
systems = pd.read_csv("path/to/systems.tsv", sep='\t', comment='#')
Note
Each system reported is separated from the others with a blank line to ease human reading. These lines are ignored during the parsing with pandas.
# macsyfinder 20220121.dev
# models : functional-0.0b2
# /home/bneron/Projects/GEM/MacSyFinder/MacSyFinder/py39/bin/macsyfinder --db-type=gembase --models-dir=tests/data//models/ --models TFF-SF Archaeal-T4P ComM MSH T2SS T4bP T4P Tad --relative-path --sequence-db tests/data/base/gembase.fasta -w 12
# Systems found:
replicon hit_id gene_name hit_pos model_fqn sys_id sys_loci locus_num sys_wholeness sys_score sys_occ hit_gene_ref hit_status hit_seq_len hit_i_eval hit_score hit_profile_cov hit_seq_cov hit_begin_match hit_end_match counterpart used_in
GCF_000005845 GCF_000005845_000970 T4P_pilC 97 TFF-SF/T4P GCF_000005845_T4P_14 3 1 0.556 7.260 1 T4P_pilC mandatory 400 2.2e-105 353.100 0.991 0.830 62 393
GCF_000005845 GCF_000005845_000980 T4P_pilB 98 TFF-SF/T4P GCF_000005845_T4P_14 3 1 0.556 7.260 1 T4P_pilB mandatory 461 8.9e-152 506.100 0.948 0.850 62 453
GCF_000005845 GCF_000005845_000990 T4P_pilA 99 TFF-SF/T4P GCF_000005845_T4P_14 3 1 0.556 7.260 1 T4P_pilA accessory 146 1.1e-19 71.200 0.859 0.473 5 73
GCF_000005845 GCF_000005845_025680 T4P_pilW 2568 TFF-SF/T4P GCF_000005845_T4P_14 3 2 0.556 7.260 1 T4P_pilW accessory 187 3.3e-08 34.500 0.625 0.401 6 80
GCF_000005845 GCF_000005845_025690 T4P_fimT 2569 TFF-SF/T4P GCF_000005845_T4P_14 3 2 0.556 7.260 1 T4P_fimT accessory 156 2.5e-06 28.500 0.939 0.397 5 66
GCF_000005845 GCF_000005845_030590 T4P_pilQ 3059 TFF-SF/T4P GCF_000005845_T4P_14 3 3 0.556 7.260 1 T4P_pilQ mandatory 412 5.9e-51 173.100 0.919 0.408 244 411
GCF_000005845 GCF_000005845_030620 T4P_pilN 3062 TFF-SF/T4P GCF_000005845_T4P_14 3 3 0.556 7.260 1 T4P_pilN mandatory 179 3.8e-09 37.500 0.986 0.765 5 141
GCF_000005845 GCF_000005845_030630 T4P_pilM 3063 TFF-SF/T4P GCF_000005845_T4P_14 3 3 0.556 7.260 1 T4P_pilM accessory 259 1.1e-09 39.300 0.988 0.598 8 162
GCF_000005845 GCF_000005845_026740 T4P_pilT 2674 TFF-SF/T4P GCF_000005845_T4P_14 3 -1 0.556 7.260 1 T4P_pilT mandatory 326 1.1e-117 393.600 0.944 0.979 3 321
GCF_000005845 GCF_000005845_026930 T2SS_gspO 2693 TFF-SF/T4P GCF_000005845_T4P_14 3 -2 0.556 7.260 1 T4P_pilD mandatory 269 1.3e-87 294.000 1.000 0.859 30 260 GCF_000005845_030080 GCF_000005845_T2SS_2
Note
If a loner component is not clustered with other genes, it will not be considered as part of a locus. Thus, its locus number will be a negative value (numbered from -1) and will not be counted in the variable sys_loci (number of loci for a system). See above lines for more details.
GCF_000005845 GCF_000005845_026740 T4P_pilT 2674 TFF-SF/T4P GCF_000005845_T4P_25 3 -1 0.556 7.800
GCF_000005845 GCF_000005845_026930 T2SS_gspO 2693 TFF-SF/T4P GCF_000005845_T4P_25 3 -2 0.556 7.800
best_solution.tsv and all_best_solutions.tsv¶
Since MacSyFinder 2.0, a combinatorial exploration of solutions using sets of systems found is performed. We call best solution, the combination of systems offering the highest score.
The best_solution.tsv and all_best_solutions.tsv files have the same structure as the file all_systems.tsv, except that there is an extra column sol_id which is a solution identifier added to the file all_best_solutions.tsv. The systems that have the same “sol_id” belong to a same solution.
As the files have the same structure as all_systems.tsv, they can also be parsed with pandas as shown above.
For the description of the fields of best_solution.tsv, see above those of the all_systems.tsv file.
For the all_best_solutions.tsv, each line corresponds to a “hit” that has been assigned to a detected system. It includes:
sol_id - the name of the solution it is part of (only in all_best_solutions.tsv files)
replicon - the name of the replicon it belongs to
hit_id - the unique identifier of the hit
gene_name - the name of the component identified by the hit
hit_pos - the position of the sequence in the replicon
model_fqn - the model fully-qualified name
sys_id - the unique identifier attributed to the detected system
sys_loci - the number of loci
locus_num - the number of the locus where is located this gene. Loners gene have negative locus_num
sys_wholeness - the wholeness of the system
sys_score - the system score
sys_occ - the estimated number of system occurrences that could be potentially “filled” with this system’s occurrence, based on the average number of each component found. A proxy for the genetic potential ton encode several systems from the set of components found in this one occurrence.
hit_gene_ref - the gene in the model whose this hit plays the role of
hit_status - the status of the component in the assigned system’s definition
hit_seq_len - the length of the protein sequence matched by this hit
hit_i_eval - Hmmer statistics, the independent-evalue
hit_score - Hmmer score
hit_profile_cov - the percentage of the profile covered by the alignment with the sequence
hit_seq_cov - the percentage of the sequence covered by the alignment with the profile
hit_begin_match - the position in the sequence where the profile match begins
hit_end_match - the position in the sequence where the profile match ends
counterpart - the hit id of some other hit which are equivalent. Only loners and multi-systems hits have counterparts
used_in - whether the hit could be used in another system’s occurrence
Note
Each system reported is separated from the others with a blank line to ease human reading. These lines are ignored during the parsing with pandas.
Example of best_solution.tsv files
# macsyfinder 20220121.dev
# models : functional-0.0b2
# /home/bneron/Projects/GEM/MacSyFinder/MacSyFinder/py39/bin/macsyfinder --db-type=gembase --models-dir=tests/data//models/ --models TFF-SF Archaeal-T4P ComM MSH T2SS T4bP T4P Tad --relative-path --sequence-db tests/data/base/gembase.fasta -w 12
# Systems found:
replicon hit_id gene_name hit_pos model_fqn sys_id sys_loci locus_num sys_wholeness sys_score sys_occ hit_gene_ref hit_status hit_seq_len hit_i_eval hit_score hit_profile_cov hit_seq_cov hit_begin_match hit_end_match counterpart used_in
GCF_000005845 GCF_000005845_000970 T4P_pilC 97 TFF-SF/T4P GCF_000005845_T4P_9 1 1 0.278 3.760 1 T4P_pilC mandatory 400 2.2e-105 353.100 0.991 0.830 62 393
GCF_000005845 GCF_000005845_000980 T4P_pilB 98 TFF-SF/T4P GCF_000005845_T4P_9 1 1 0.278 3.760 1 T4P_pilB mandatory 461 8.9e-152 506.100 0.948 0.850 62 453
GCF_000005845 GCF_000005845_000990 T4P_pilA 99 TFF-SF/T4P GCF_000005845_T4P_9 1 1 0.278 3.760 1 T4P_pilA accessory 146 1.1e-19 71.200 0.859 0.473 5 73
GCF_000005845 GCF_000005845_026740 T4P_pilT 2674 TFF-SF/T4P GCF_000005845_T4P_9 1 -1 0.278 3.760 1 T4P_pilT mandatory 326 1.1e-117 393.600 0.944 0.979 3 321
GCF_000005845 GCF_000005845_026930 T2SS_gspO 2693 TFF-SF/T4P GCF_000005845_T4P_9 1 -2 0.278 3.760 1 T4P_pilD mandatory 269 1.3e-87 294.000 1.000 0.859 30 260 GCF_000005845_030080 GCF_000005845_T2SS_2
GCF_000005845 GCF_000005845_025680 T4P_pilW 2568 TFF-SF/T4P GCF_000005845_T4P_13 2 1 0.389 4.760 1 T4P_pilW accessory 187 3.3e-08 34.500 0.625 0.401 6 80
GCF_000005845 GCF_000005845_025690 T4P_fimT 2569 TFF-SF/T4P GCF_000005845_T4P_13 2 1 0.389 4.760 1 T4P_fimT accessory 156 2.5e-06 28.500 0.939 0.397 5 66
GCF_000005845 GCF_000005845_030590 T4P_pilQ 3059 TFF-SF/T4P GCF_000005845_T4P_13 2 2 0.389 4.760 1 T4P_pilQ mandatory 412 5.9e-51 173.100 0.919 0.408 244 411
GCF_000005845 GCF_000005845_030620 T4P_pilN 3062 TFF-SF/T4P GCF_000005845_T4P_13 2 2 0.389 4.760 1 T4P_pilN mandatory 179 3.8e-09 37.500 0.986 0.765 5 141
Example of all_best_solutions.tsv files
# macsyfinder 20220121.dev
# models : functional-0.0b2
# /home/bneron/Projects/GEM/MacSyFinder/MacSyFinder/py39/bin/macsyfinder --db-type=gembase --models-dir=tests/data//models/ --models TFF-SF Archaeal-T4P ComM MSH T2SS T4bP T4P Tad --relative-path --sequence-db tests/data/base/gembase.fasta -w 12
# Systems found:
sol_id replicon hit_id gene_name hit_pos model_fqn sys_id sys_loci locus_num sys_wholeness sys_score sys_occ hit_gene_ref hit_status hit_seq_len hit_i_eval hit_score hit_profile_cov hit_seq_cov hit_begin_match hit_end_match counterpart used_in
1 GCF_000005845 GCF_000005845_000970 T4P_pilC 97 TFF-SF/T4P GCF_000005845_T4P_9 1 1 0.278 3.760 1 T4P_pilC mandatory 400 2.2e-105 353.100 0.991 0.830 62 393
1 GCF_000005845 GCF_000005845_000980 T4P_pilB 98 TFF-SF/T4P GCF_000005845_T4P_9 1 1 0.278 3.760 1 T4P_pilB mandatory 461 8.9e-152 506.100 0.948 0.850 62 453
1 GCF_000005845 GCF_000005845_000990 T4P_pilA 99 TFF-SF/T4P GCF_000005845_T4P_9 1 1 0.278 3.760 1 T4P_pilA accessory 146 1.1e-19 71.200 0.859 0.473 5 73
1 GCF_000005845 GCF_000005845_026740 T4P_pilT 2674 TFF-SF/T4P GCF_000005845_T4P_9 1 -1 0.278 3.760 1 T4P_pilT mandatory 326 1.1e-117 393.600 0.944 0.979 3 321
1 GCF_000005845 GCF_000005845_026930 T2SS_gspO 2693 TFF-SF/T4P GCF_000005845_T4P_9 1 -2 0.278 3.760 1 T4P_pilD mandatory 269 1.3e-87 294.000 1.000 0.859 30 260 GCF_000005845_030080 GCF_000005845_T2SS_2
1 GCF_000005845 GCF_000005845_025680 T4P_pilW 2568 TFF-SF/T4P GCF_000005845_T4P_13 2 1 0.389 4.760 1 T4P_pilW accessory 187 3.3e-08 34.500 0.625 0.401 6 80
1 GCF_000005845 GCF_000005845_025690 T4P_fimT 2569 TFF-SF/T4P GCF_000005845_T4P_13 2 1 0.389 4.760 1 T4P_fimT accessory 156 2.5e-06 28.500 0.939 0.397 5 66
1 GCF_000005845 GCF_000005845_030590 T4P_pilQ 3059 TFF-SF/T4P GCF_000005845_T4P_13 2 2 0.389 4.760 1 T4P_pilQ mandatory 412 5.9e-51 173.100 0.919 0.408 244 411
1 GCF_000005845 GCF_000005845_030620 T4P_pilN 3062 TFF-SF/T4P GCF_000005845_T4P_13 2 2 0.389 4.760 1 T4P_pilN mandatory 179 3.8e-09 37.500 0.986 0.765 5 141
1 GCF_000005845 GCF_000005845_030630 T4P_pilM 3063 TFF-SF/T4P GCF_000005845_T4P_13 2 2 0.389 4.760 1 T4P_pilM accessory 259 1.1e-09 39.300 0.988 0.598 8 162
1 GCF_000005845 GCF_000005845_026740 T4P_pilT 2674 TFF-SF/T4P GCF_000005845_T4P_13 2 -1 0.389 4.760 1 T4P_pilT mandatory 326 1.1e-117 393.600 0.944 0.979 3 321
1 GCF_000005845 GCF_000005845_026930 T2SS_gspO 2693 TFF-SF/T4P GCF_000005845_T4P_13 2 -2 0.389 4.760 1 T4P_pilD mandatory 269 1.3e-87 294.000 1.000 0.859 30 260 GCF_000005845_030080 GCF_000005845_T2SS_2
1 GCF_000005845 GCF_000005845_029970 T2SS_gspC 2997 TFF-SF/T2SS GCF_000005845_T2SS_1 1 1 0.857 9.000 1 T2SS_gspC mandatory 271 2.3e-19 70.400 0.897 0.358 47 143
1 GCF_000005845 GCF_000005845_030050 T2SS_gspK 3005 TFF-SF/T2SS GCF_000005845_T2SS_1 1 1 0.857 9.000 1 T2SS_gspK accessory 327 1e-16 61.500 1.000 0.180 6 64
1 GCF_000005845 GCF_000005845_030060 T2SS_gspL 3006 TFF-SF/T2SS GCF_000005845_T2SS_1 1 1 0.857 9.000 1 T2SS_gspL accessory 387 1.5e-37 129.300 1.000 0.351 6 141
1 GCF_000005845 GCF_000005845_030070 T2SS_gspM 3007 TFF-SF/T2SS GCF_000005845_T2SS_1 1 1 0.857 9.000 1 T2SS_gspM accessory 153 2.8e-29 102.900 0.985 0.804 13 135
1 GCF_000005845 GCF_000005845_030080 T2SS_gspO 3008 TFF-SF/T2SS GCF_000005845_T2SS_1 1 1 0.857 9.000 1 T2SS_gspO mandatory 225 4e-65 220.400 0.978 0.840 26 214
# WARNING Loner: there is only 1 occurrence(s) of loner 'T4P_pilT' and 2 potential systems [GCF_000005845_T4P_9, GCF_000005845_T4P_13]
2 GCF_000005845 GCF_000005845_000970 T4P_pilC 97 TFF-SF/T4P GCF_000005845_T4P_11 2 1 0.389 4.760 1 T4P_pilC mandatory 400 2.2e-105 353.100 0.991 0.830 62 393
Note
If a loner component is not clustered with other genes, it will not be considered as part of a locus. Thus, its locus number will be a negative value (numbered from -1) and will not be counted in the variable sys_loci (number of loci for a system). See above lines for more details.
Note
If several systems from same model use a loner (same gene) msf check that there is at least one occurrence of this hit for each system. If there are fewer hits than systems occurrence a warning is displayed in best_solution.tsv or all_best_solution.tsv as comment. So the file can be parsed with pandas without problem.
1 GCF_000005845 GCF_000005845_030080 T2SS_gspO 3008 TFF-SF/T2SS GCF_000005845_T2SS_1 1 1 0.857 9.000 1 T2SS_gspO mandatory 225 4e-65 220.400 0.978 0.840 26 214
# WARNING Loner: there is only 1 occurrence(s) of loner 'T4P_pilT' and 2 potential systems [GCF_000005845_T4P_9, GCF_000005845_T4P_13]
2 GCF_000005845 GCF_000005845_000970 T4P_pilC 97 TFF-SF/T4P GCF_000005845_T4P_11 2 1 0.389 4.760 1 T4P_pilC mandatory 400 2.2e-105 353.100 0.991 0.830 62 393
Note
In case multiple solutions have the exact same score, a sorting is performed among the best solutions, and the solution ranked 1st is reported in the best_solution.tsv and best_solution.txt files. The ranking is performed as follow:
by the number of systems’ components (hits) constituting the solution (most components first)
by the number of systems (most systems in first)
by the average of systems’ wholeness
by hits position. This criterion is mostly introduced to produce reproducible results between two runs.
best_solution_summary.tsv¶
This file is a concise view of which systems have been found in your replicons and how many per replicon. It is based on best_solution.tsv. The first two lines are comments that indicate the version of MacSyFinder and the command line used to generate the results. Then a table represented by tabulated text to separate columns, with the searched models in columns and the replicons scanned for the models in row.
# macsyfinder 20220121.dev
# models : functional-0.0b2
# /home/bneron/Projects/GEM/MacSyFinder/MacSyFinder/py39/bin/macsyfinder --db-type=gembase --models-dir=tests/data//models/ --models TFF-SF Archaeal-T4P ComM MSH T2SS T4bP T4P Tad --relative-path --sequence-db tests/data/base/gembase.fasta -w 12
replicon TFF-SF/MSH TFF-SF/T2SS TFF-SF/T4P TFF-SF/T4bP TFF-SF/Tad TFF-SF/Archaeal-T4P TFF-SF/ComM
GCF_000005845 0 1 2 0 0 0 0
GCF_000006725 0 1 2 0 0 0 0
GCF_000006745 1 1 2 1 0 0 0
GCF_000006765 0 3 1 0 1 0 0
GCF_000006845 0 0 1 0 0 0 0
GCF_000006905 0 1 0 0 1 0 0
GCF_000006925 0 0 1 0 0 0 0
GCF_000006945 0 0 2 0 0 0 0
as a tsv file it can be parsed easily using pandas:
import pandas as pd
solution = pd.read_csv('path to best_solution_summary.tsv', sep='\t', comment='#', index_col=0)
Note
If you want to do the same operation but based on the all_best_solutions.tsv file, you can do it with the few lines of pandas below:
import pandas as pd all_best_sol = '<macsyfinder_results_dir>/all_best_solutions.tsv' # read data from best_solution file data = pd.read_csv(all_best_sol, sep='\t', comment='#') # remove useless columns selection = data[['sol_id', 'replicon', 'sys_id', 'model_fqn']] # keep only one row per replicon, sys_id dropped = selection.drop_duplicates(subset=['sol_id', 'replicon', 'sys_id']) # count for each replicon which models have been detected and their occurrences summary = pd.crosstab(index=[dropped.sol_id, dropped.replicon], columns=dropped['model_fqn'])
if you are not fluent in pandas, we provide you a tiny script msf_summary.py based on few lines above to do the job
Then you can run the script
python msf_summary.py <path_to_all_best_solutions.tsv>
below an example of summary of all_best_solutions.tsv
sol_id replicon TFF-SF/MSH TFF-SF/T2SS TFF-SF/T4P TFF-SF/T4bP TFF-SF/Tad
1 GCF_000005845 0 1 1 0 0
2 GCF_000006725 0 1 1 0 0
3 GCF_000006725 0 1 1 0 0
4 GCF_000006745 1 1 2 1 0
5 GCF_000006745 1 1 2 1 0
6 GCF_000006745 1 1 1 1 0
7 GCF_000006765 0 3 1 0 1
8 GCF_000006845 0 0 1 0 0
9 GCF_000006905 0 1 0 0 1
10 GCF_000006925 0 0 1 0 0
11 GCF_000006945 0 0 1 0 0
best_solution_loners.tsv¶
This file give an overview of all hits identified as Loner in the best_solution
# macsyfinder 20220121.dev # models : functional-0.0b2 # /home/bneron/Projects/GEM/MacSyFinder/MacSyFinder/py39/bin/macsyfinder --db-type=gembase --models-dir=tests/data//models/ --models TFF-SF Archaeal-T4P ComM MSH T2SS T4bP T4P Tad --relative-path --sequence-db tests/data/base/gembase.fasta -w 12 # Loners found: replicon model_fqn function gene_name hit_id hit_pos hit_status hit_seq_len hit_i_eval hit_score hit_profile_cov hit_seq_cov hit_begin_match hit_end_match GCF_000005845 TFF-SF/T4P T4P_pilT T4P_pilT GCF_000005845_026740 2674 mandatory 326 1.100e-117 393.600 0.944 0.979 3 321 GCF_000005845 TFF-SF/T4P T4P_pilD T2SS_gspO GCF_000005845_026930 2693 mandatory 269 1.300e-87 294.000 1.000 0.859 30 260 GCF_000005845 TFF-SF/T4P T4P_pilD T2SS_gspO GCF_000005845_030080 3008 mandatory 225 4.000e-65 220.400 0.978 0.840 26 214 GCF_000006725 TFF-SF/T4P T4P_pilT T4P_pilT GCF_000006725_000270 4269 mandatory 344 1.800e-172 573.700 0.994 0.985 2 340 GCF_000006725 TFF-SF/T4P T4P_pilA T4P_pilA GCF_000006725_003680 4610 accessory 187 9.000e-10 39.500 0.667 0.278 6 57 GCF_000006725 TFF-SF/T2SS T2SS_gspO T4P_pilD GCF_000006725_014570 5699 mandatory 287 7.400e-77 258.600 1.000 0.836 28 267 GCF_000006725 TFF-SF/T2SS T2SS_gspE T2SS_gspE GCF_000006725_018700 6112 mandatory 566 1.800e-171 571.000 0.936 0.701 165 561 GCF_000006725 TFF-SF/T4P T4P_pilA T4P_pilA GCF_000006725_022640 6506 accessory 178 2.000e-10 41.600 0.603 0.264 5 51 GCF_000006745 TFF-SF/T2SS T2SS_gspO T4P_pilD GCF_000006745_021980 8766 mandatory 291 3.100e-88 295.800 1.000 0.832 28 269 GCF_000006765 TFF-SF/T2SS T2SS_gspO T4P_pilD GCF_000006765_044730 14545 mandatory 290 1.100e-88 297.200 1.000 0.828 31 270 GCF_000006925 TFF-SF/T4P T4P_pilT T4P_pilT GCF_000006925_026070 23874 mandatory 341 6.600e-118 394.300 0.950 0.941 18 338 GCF_000006945 TFF-SF/T4P T4P_pilT T4P_pilT GCF_000006945_030160 28596 mandatory 326 3.400e-113 378.800 0.933 0.966 3 317 GCF_000006945 TFF-SF/T4P T4P_pilD T2SS_gspO GCF_000006945_033450 28925 mandatory 155 2.900e-35 122.700 0.588 0.871 9 143
best_solution_multisystems.tsv¶
This file give an overview of all hits identified as multi-systems in the best_solution
# macsyfinder 20220121.dev # models : functional-0.0b2 # /home/bneron/Projects/GEM/MacSyFinder/MacSyFinder/py39/bin/macsyfinder --db-type ordered_replicon --replicon-topology linear --models-dir tests/data/models/ -m functional T12SS-multisystem --relative-path --sequence-db tests/data/base/test_13.fasta -w 15 # Multisystems found: replicon model_fqn function gene_name hit_id hit_pos hit_status hit_seq_len hit_i_eval hit_score hit_profile_cov hit_seq_cov hit_begin_match hit_end_match UserReplicon functional/T12SS-multisystem T1SS_omf T1SS_omf VICH001.B.00001.C001_01360 20 mandatory 484 3.200e-28 90.000 0.985 0.820 80 476 UserReplicon functional/T12SS-multisystem T1SS_omf T1SS_omf VICH001.B.00001.C001_01506 35 mandatory 419 9.100e-35 111.500 0.998 0.912 25 406
rejected_candidates.txt¶
This file records all clusters or cluster combinations (if the “multi_loci” search mode is on) which have been discarded and the reason why they were not selected as systems.
The header is composed of the MacSyFinder version and the command line used followed by the description of the cluster(s). The list of the hits composing the cluster is presented at the end of the cluster or clusters’ combination, followed by the reason why it has been discarded.
Note
This file is in human readable format. If you need to parse the information about rejected candidates, use the tsv formatted file rejected_candidates.tsv
# macsyfinder 20200511.dev
# models : TFF-SF-0.1b
# macsyfinder --sequence-db data/base/GCF_000006745.fasta --models TFF-SF all --models-dir data/models/ --db-type gembase -w 4
# Rejected candidates:
Cluster:
- model: T4P
- hits: (GCF_000005845_025680, T4P_pilW, 2568), (GCF_000005845_025690, T4P_fimT, 2569)
Cluster:
- model: T4P
- hits: (GCF_000005845_026930, T2SS_gspO, 2693)
Cluster:
- model: T4P
- hits: (GCF_000005845_030080, T2SS_gspO, 3008)
This candidate has been rejected because:
The quorum of mandatory genes required (4) is not reached: 1
The quorum of genes required (5) is not reached: 3
============================================================
Cluster:
- model: Archaeal-T4P
- hits: (GCF_000005845_019260, Archaeal-T4P_arCOG00589, 1926), (GCF_000005845_019310, Archaeal-T4P_arCOG02900, 1931)
This candidate has been rejected because:
The quorum of mandatory genes required (3) is not reached: 0
The quorum of genes required (3) is not reached: 2
============================================================
rejected_candidates.tsv¶
This file contains same information as rejected_candidates.txt but in tsv format, so it’s more convenient to parse it. for instance with python and pandas library.:
import pandas as pd
pd.read_csv("path/to/rejected_candidates.tsv", sep-'\t', comment='#')
As other file the first lines are comments and provides informations to indicate how this file has been produced.
the macsyfinder version
the model package and version used
the command line used
then the following information separated by ‘tabulation’ character ‘t’
candidate_id - An unique identifier of the candidate (for this run)
replicon - The name of the replicon
model_fqn - The model fully-qualified name
cluster_id - An unique identifier for the cluster constituting the candidate
hit_id - The identifier of the hit (as indicate in hmmer output)
hit_pos - The position of the sequence in the replicon
gene_name - The name of the component identified by the hit
function - The name of the gene for which it it fulfill the function.
reasons - The reasons why this cluster has been discarded. ther can be several reasons, in this case each reason are separated by ‘/’.
Note
A rejected candidate can be constituted of
clusters (can have several clusters if the model is multi loci),
loners
Example of rejected_candidates.tsv
# macsyfinder 20220805.dev
# models : TFF-SF-None
# /home/bneron/Projects/GEM/MacSyFinder/MacSyFinder/py39/bin/macsyfinder --sequence-db data/base/GCF_000006745.fasta --models TFF-SF all --models-dir data/models/ --db-type gembase -w 15
# Rejected candidates found:
candidate_id replicon model_fqn cluster_id hit_id hit_pos gene_name function reasons
GCF_000006745_Archaeal-T4P_1 GCF_000006745 TFF-SF/Archaeal-T4P c3 GCF_000006745_018740 1874 Archaeal-T4P_arCOG00589 Archaeal-T4P_arCOG00589 The quorum of mandatory genes required (3) is not reached: 0/The quorum of genes required (3) is not reached: 1
GCF_000006745_Archaeal-T4P_1 GCF_000006745 TFF-SF/Archaeal-T4P c3 GCF_000006745_018800 1880 Archaeal-T4P_arCOG00589 Archaeal-T4P_arCOG00589 The quorum of mandatory genes required (3) is not reached: 0/The quorum of genes required (3) is not reached: 1
GCF_000006745_Archaeal-T4P_2 GCF_000006745 TFF-SF/Archaeal-T4P c4 GCF_000006745_026670 2667 Archaeal-T4P_arCOG02900 Archaeal-T4P_arCOG02900 The quorum of mandatory genes required (3) is not reached: 0/The quorum of genes required (3) is not reached: 1
GCF_000006745_Archaeal-T4P_2 GCF_000006745 TFF-SF/Archaeal-T4P c4 GCF_000006745_026680 2668 Archaeal-T4P_arCOG02900 Archaeal-T4P_arCOG02900 The quorum of mandatory genes required (3) is not reached: 0/The quorum of genes required (3) is not reached: 1
GCF_000006745_ComM_4 GCF_000006745 TFF-SF/ComM c11 GCF_000006745_017080 1708 ComM_comEC ComM_comEC The quorum of mandatory genes required (4) is not reached: 1/The quorum of genes required (4) is not reached: 1
GCF_000006745_ComM_5 GCF_000006745 TFF-SF/ComM c12 GCF_000006745_032430 3243 ComM_comEB ComM_comEB The quorum of mandatory genes required (4) is not reached: 1/The quorum of genes required (4) is not reached: 2
GCF_000006745_ComM_5 GCF_000006745 TFF-SF/ComM c13 GCF_000006745_017080 1708 ComM_comEC ComM_comEC The quorum of mandatory genes required (4) is not reached: 1/The quorum of genes required (4) is not reached: 2
GCF_000006745_ComM_3 GCF_000006745 TFF-SF/ComM c10 GCF_000006745_032430 3243 ComM_comEB ComM_comEB The quorum of mandatory genes required (4) is not reached: 0/The quorum of genes required (4) is not reached: 1
GCF_000006745_MSH_6 GCF_000006745 TFF-SF/MSH c18 GCF_000006745_004600 460 MSH_mshA MSH_mshA The quorum of mandatory genes required (3) is not reached: 1/The quorum of genes required (4) is not reached: 1
GCF_000006745_T2SS_7 GCF_000006745 TFF-SF/T2SS c25 GCF_000006745_021980 2198 T4P_pilD T2SS_gspO The quorum of mandatory genes required (4) is not reached: 1/The quorum of genes required (6) is not reached: 1
GCF_000006745_T4P_8 GCF_000006745 TFF-SF/T4P c30 GCF_000006745_004240 424 T4P_pilT T4P_pilT The quorum of mandatory genes required (4) is not reached: 1/The quorum of genes required (5) is not reached: 2
GCF_000006745_T4P_8 GCF_000006745 TFF-SF/T4P c30 GCF_000006745_004250 425 T4P_pilU T4P_pilU The quorum of mandatory genes required (4) is not reached: 1/The quorum of genes required (5) is not reached: 2
GCF_000006745_T4P_12 GCF_000006745 TFF-SF/T4P c34 GCF_000006745_004240 424 T4P_pilT T4P_pilT The quorum of mandatory genes required (4) is not reached: 2
GCF_000006745_T4P_12 GCF_000006745 TFF-SF/T4P c34 GCF_000006745_004250 425 T4P_pilU T4P_pilU The quorum of mandatory genes required (4) is not reached: 2
GCF_000006745_T4P_12 GCF_000006745 TFF-SF/T4P c35 GCF_000006745_007820 782 T4P_pilE T4P_pilE The quorum of mandatory genes required (4) is not reached: 2
GCF_000006745_T4P_12 GCF_000006745 TFF-SF/T4P c35 GCF_000006745_007830 783 T4P_fimT T4P_fimT The quorum of mandatory genes required (4) is not reached: 2
GCF_000006745_T4P_12 GCF_000006745 TFF-SF/T4P c35 GCF_000006745_007840 784 T4P_pilW T4P_pilW The quorum of mandatory genes required (4) is not reached: 2
GCF_000006745_T4P_12 GCF_000006745 TFF-SF/T4P c35 GCF_000006745_007850 785 T4P_pilX T4P_pilX The quorum of mandatory genes required (4) is not reached: 2
GCF_000006745_T4P_12 GCF_000006745 TFF-SF/T4P c35 GCF_000006745_007860 786 T4P_pilV T4P_pilV The quorum of mandatory genes required (4) is not reached: 2
Note
If a timeout is set to limit the time spent in best solution resolution. This timeout is applied per replicon. If the best solution resolution reach the timeout for a replicon, a WARNING is raised in macysfinder.log The warning is also report in the following files:
for instance
# macsyfinder 20230113.dev
# models : TXSScan-1.1.1
# macsyfinder --sequence-db tests/data/base/gembase.fasta --db-type gembase --models TXSScan -w 15 --timeout 1s
#
# WARNING: The replicon 'GCF_000006765' has been SKIPPED. Cannot be solved before timeout.
#
replicon hit_id ...
Output files for the “unordered replicon” search mode¶
Systems detection results¶
As for ordered replicons, several output files are provided.
all_systems.txt - This file contains the description of candidate systems found.
all_systems.tsv - The same information as in all_systems.txt but in the tabulated tsv format.
uncomplete_systems.txt - This file contains occurrences for systems that did not complete models’ definitions and that were therefore not kept as candidate systems.
Note
In this unordered search mode, there is no notion of order or distance of the components along the replicon. The clustering step is skipped by MacSyFinder, and it is therefore “only” checked for each type of system being searched whether there is the genetic potential to fulfil its model definition.
all_systems.txt¶
This file contains potential systems for unordered replicon in human readable format.
In this file, for each component of each searched system’s model, we report the number of hits found. For the description of the fields, see above.
Warning
In this mode the forbidden genes are reported here to the user. As we do not know if they co-localize (cluster) with the other genes they could be present in the replicon, yet far away - or very close on the contrary - to the potential system.
# macsyfinder 20201028.dev
# models : TFF-SF-0.1b
# macsyfinder --sequence-db tests/data/base/one_replicon.fasta --db-type unordered --models-dir tests/data/models -m TFF-SF T4P_single_locus
# Systems found:
This replicon contains genetic materials needed for system TFF-SF/T4P_single_locus
system id = Unordered_T4P_single_locus_1
model = TFF-SF/T4P_single_locus
replicon = Unordered
hits = [('GCF_000006845_000250', 'T4P_pilY', 25), ('GCF_000006845_000700', 'T4P_pilY', 70), ('GCF_000006845_001030', 'T4P_pilQ', 103), ('GCF_000006845_001040', 'T4P_pilP', 104), ('GCF_000006845_001050', 'T4P_pilO', 105), ('GCF_000006845_001060', 'T4P_pilN', 106), ('GCF_000006845_001070', 'T4P_pilM', 107), ('GCF_000006845_003200', 'T4P_pilU', 320), ('GCF_000006845_004190', 'T4P_fimT', 419), ('GCF_000006845_004200', 'T4P_pilV', 420), ('GCF_000006845_004210', 'T4P_pilW', 421), ('GCF_000006845_004220', 'T4P_pilX', 422), ('GCF_000006845_004230', 'T4P_pilA', 423), ('GCF_000006845_010160', 'T4P_pilA', 1016), ('GCF_000006845_012440', 'T4P_pilA', 1244), ('GCF_000006845_014270', 'T4P_pilC', 1427), ('GCF_000006845_014280', 'T4P_pilD', 1428), ('GCF_000006845_014310', 'T4P_pilB', 1431), ('GCF_000006845_016430', 'T4P_pilT', 1643), ('GCF_000006845_016440', 'T4P_pilU', 1644)]
wholeness = 0.889
mandatory genes:
- T4P_pilE: 0 ()
- T4P_pilB: 1 (T4P_pilB)
- T4P_pilC: 1 (T4P_pilC)
- T4P_pilO: 1 (T4P_pilO)
- T4P_pilQ: 1 (T4P_pilQ)
- T4P_pilN: 1 (T4P_pilN)
- T4P_pilT: 1 (T4P_pilT)
- T4P_pilD: 1 (T4P_pilD)
accessory genes:
- T4P_pilA: 3 (T4P_pilA, T4P_pilA, T4P_pilA)
- T4P_pilV: 1 (T4P_pilV)
- T4P_pilY: 2 (T4P_pilY, T4P_pilY)
- T4P_pilW: 1 (T4P_pilW)
- T4P_pilX: 1 (T4P_pilX)
- T4P_fimT: 1 (T4P_fimT)
- T4P_pilM: 1 (T4P_pilM)
- T4P_pilP: 1 (T4P_pilP)
- T4P_pilU: 2 (T4P_pilU, T4P_pilU)
- MSH_mshM: 0 ()
neutral genes:
forbidden genes:
Use ordered replicon to have better prediction.
all_systems.tsv¶
This file contains the same information as in all_systems.txt but in tsv format. For the description of the fields, see above.
Note
This file can be easily parsed with pandas:
import pandas as pd
pot_systems = pd.read_csv('all_systems.tsv', sep='\t', comment='#')
# macsyfinder 20201028.dev
# models : TFF-SF-0.1b
# macsyfinder --sequence-db tests/data/base/one_replicon.fasta --db-type unordered --models-dir tests/data/models -m TFF-SF T4P_single_locus
# Likely Systems found:
replicon hit_id gene_name hit_pos model_fqn sys_id sys_wholeness hit_gene_ref hit_status hit_seq_len hit_i_eval hit_score hit_profile_cov hit_seq_cov hit_begin_match hit_end_match used_in
Unordered GCF_000006845_014310 T4P_pilB 1431 TFF-SF/T4P_single_locus Unordered_T4P_single_locus_1 0.889 T4P_pilB mandatory 558 3.8e-178 589.000 0.964 0.731 146 553
Unordered GCF_000006845_014270 T4P_pilC 1427 TFF-SF/T4P_single_locus Unordered_T4P_single_locus_1 0.889 T4P_pilC mandatory 410 1.9e-131 434.800 0.997 0.817 72 406
Unordered GCF_000006845_014280 T4P_pilD 1428 TFF-SF/T4P_single_locus Unordered_T4P_single_locus_1 0.889 T4P_pilD mandatory 286 2.8e-82 272.300 1.000 0.829 28 264
Unordered GCF_000006845_001060 T4P_pilN 106 TFF-SF/T4P_single_locus Unordered_T4P_single_locus_1 0.889 T4P_pilN mandatory 199 2.3e-33 112.200 0.986 0.714 7 148
Unordered GCF_000006845_001050 T4P_pilO 105 TFF-SF/T4P_single_locus Unordered_T4P_single_locus_1 0.889 T4P_pilO mandatory 215 2.9e-37 124.800 0.980 0.693 23 171
Unordered GCF_000006845_001030 T4P_pilQ 103 TFF-SF/T4P_single_locus Unordered_T4P_single_locus_1 0.889 T4P_pilQ mandatory 723 1.9e-62 206.600 0.935 0.238 548 719
Unordered GCF_000006845_016430 T4P_pilT 1643 TFF-SF/T4P_single_locus Unordered_T4P_single_locus_1 0.889 T4P_pilT mandatory 347 6.9e-167 551.400 0.997 0.983 2 342
Unordered GCF_000006845_004190 T4P_fimT 419 TFF-SF/T4P_single_locus Unordered_T4P_single_locus_1 0.889 T4P_fimT accessory 221 2.7e-23 78.900 0.985 0.294 7 71
Unordered GCF_000006845_004230 T4P_pilA 423 TFF-SF/T4P_single_locus Unordered_T4P_single_locus_1 0.889 T4P_pilA accessory 162 8.6e-20 67.800 0.744 0.389 9 71
Unordered GCF_000006845_010160 T4P_pilA 1016 TFF-SF/T4P_single_locus Unordered_T4P_single_locus_1 0.889 T4P_pilA accessory 149 1.3e-15 54.300 0.821 0.430 5 68
Unordered GCF_000006845_012440 T4P_pilA 1244 TFF-SF/T4P_single_locus Unordered_T4P_single_locus_1 0.889 T4P_pilA accessory 129 1.5e-19 67.000 0.859 0.519 6 72
Unordered GCF_000006845_001070 T4P_pilM 107 TFF-SF/T4P_single_locus Unordered_T4P_single_locus_1 0.889 T4P_pilM accessory 371 3.3e-43 144.300 0.988 0.429 30 188
Unordered GCF_000006845_001040 T4P_pilP 104 TFF-SF/T4P_single_locus Unordered_T4P_single_locus_1 0.889 T4P_pilP accessory 181 2.7e-34 115.600 1.000 0.735 13 145
Unordered GCF_000006845_003200 T4P_pilU 320 TFF-SF/T4P_single_locus Unordered_T4P_single_locus_1 0.889 T4P_pilU accessory 376 2.2e-170 562.600 0.985 0.896 16 352
Unordered GCF_000006845_016440 T4P_pilU 1644 TFF-SF/T4P_single_locus Unordered_T4P_single_locus_1 0.889 T4P_pilU accessory 408 1.5e-127 421.800 0.994 0.833 40 379
Unordered GCF_000006845_004200 T4P_pilV 420 TFF-SF/T4P_single_locus Unordered_T4P_single_locus_1 0.889 T4P_pilV accessory 203 9.6e-16 54.600 1.000 0.276 14 69
Unordered GCF_000006845_004210 T4P_pilW 421 TFF-SF/T4P_single_locus Unordered_T4P_single_locus_1 0.889 T4P_pilW accessory 326 1.7e-10 38.000 0.517 0.190 17 78
Unordered GCF_000006845_004220 T4P_pilX 422 TFF-SF/T4P_single_locus Unordered_T4P_single_locus_1 0.889 T4P_pilX accessory 203 2.8e-18 62.600 0.983 0.286 17 74
Unordered GCF_000006845_000250 T4P_pilY 25 TFF-SF/T4P_single_locus Unordered_T4P_single_locus_1 0.889 T4P_pilY accessory 1006 2.2e-57 191.700 0.728 0.389 463 853
Unordered GCF_000006845_000700 T4P_pilY 70 TFF-SF/T4P_single_locus Unordered_T4P_single_locus_1 0.889 T4P_pilY accessory 1047 1.9e-57 191.900 0.721 0.362 516 894
uncomplete_systems.txt¶
This file is created when a search is performed in the unordered replicon mode. This file list models that probably do not have not full systems in the replicon(s). For each model, the reason why it is not fulfilled is reported, followed by the model description and the components found.
# macsyfinder 20201113.dev
# models : TFF-SF-0.1b
# macsyfinder --sequence-db tests/data/base/one_replicon.fasta --db-type unordered --models-dir tests/data/models -m TFF-SF all
# Unlikely Systems found:
This replicon probably not contains a system TFF-SF/T2SS:
The quorum of mandatory genes required (4) is not reached: 1
The quorum of genes required (6) is not reached: 2
system id = Unordered_T2SS_3
model = TFF-SF/T2SS
replicon = Unordered
hits = [('GCF_000006845_002600', 'Tad_tadD', 260), ('GCF_000006845_014280', 'T4P_pilD', 1428), ('GCF_000006845_016430', 'T4P_pilT', 1643)]
wholeness = 0.143
mandatory genes:
- T2SS_gspD: 0 ()
- T2SS_gspE: 0 ()
- T2SS_gspF: 0 ()
- T2SS_gspG: 0 ()
- T2SS_gspC: 0 ()
- T2SS_gspO: 1 (T4P_pilD)
accessory genes:
- T2SS_gspM: 0 ()
- T2SS_gspH: 0 ()
- T2SS_gspI: 0 ()
- T2SS_gspJ: 0 ()
- T2SS_gspK: 0 ()
- T2SS_gspN: 0 ()
- T2SS_gspL: 0 ()
- Tad_tadD: 1 (Tad_tadD)
neutral genes:
forbidden genes:
- T4P_pilT: 1 (T4P_pilT)
Use ordered replicon to have better prediction.
============================================================
Hmmer results’ output files¶
Raw Hmmer outputs are provided, as long with processed tabular outputs that include hits filtered as specified by the user. For instance, the Hmmer search for SctC homologs with the corresponding profile will result in the creation of two output files: “sctC.search_hmm.out” for the raw HMMER output file and “sctC.res_hmm_extract” for the output file after processing/filtering of the HMMER results by MacSyFinder.
The processed output file “sctC.res_hmm_extract” recalls on the first lines the parameters used for hits filtering and relevant information on the matches, as in this example:
# gene: sctC extract from /Users/bob/macsyfinder_results/
macsyfinder-20130128_08-57-46/sctC.search_hmm.out hmm output
# profile length= 544
# i_evalue threshold= 0.001000
# coverage threshold= 0.500000
# hit_id replicon_name position_hit hit_sequence_length gene_name gene_system i_eval score
profile_coverage sequence_coverage begin end
PSAE001c01_006940 PSAE001c01 3450 803 sctC T3SS 1.1e-41 141.6
0.588235 0.419676 395 731
PSAE001c01_018920 PSAE001c01 4634 776 sctC T3SS 9.2e-48 161.7
0.976103 0.724227 35 596
PSAE001c01_031420 PSAE001c01 5870 658 sctC T3SS 2.7e-52 176.7
0.963235 0.844985 49 604
PSAE001c01_051090 PSAE001c01 7801 714 sctC T3SS 1.9e-46 157.4
0.571691 0.463585 374 704
Logs and configuration files¶
Three specific output files are systematically built, whatever the search mode, to store information on MacSyFinder’s execution:
macsyfinder.conf - contains the configuration information of the run. It is useful to recover all the parameters used for the run.
macsyfinder.log - the log file, contains raw information on the run. Please send it to us with any bug report.