11. Config File

Nico Louwen edited this page Feb 11, 2025 · 1 revision

BiG-SCAPE 2 uses a config.yml file to store and adjust parameters intended for advanced use.


Profiler

PROFILER_UPDATE_INTERVAL: 0.5

Update interval in seconds when profiler functionality is active.

Input

MERGED_CAND_CLUSTER_TYPE:
- Chemical_hybrid
- Interleaved

List of cand_cluster types whose subrecords will be merged. See more information here.

MIN_BGC_LENGTH: 0
MAX_BGC_LENGTH: 500000

Minimum and maximum BGC lengths, in base pairs, for a BGC to be included in the analysis.
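As a rough illustration of how such a length filter behaves, here is a minimal Python sketch. The function name within_length_bounds and the choice of inclusive bounds are assumptions made for this example, not BiG-SCAPE's actual code.

```python
MIN_BGC_LENGTH = 0        # value shown in config.yml
MAX_BGC_LENGTH = 500_000  # value shown in config.yml


def within_length_bounds(length_bp: int,
                         min_len: int = MIN_BGC_LENGTH,
                         max_len: int = MAX_BGC_LENGTH) -> bool:
    """Return True if a BGC of length_bp base pairs passes the filter.

    Assumption: both bounds are inclusive.
    """
    return min_len <= length_bp <= max_len
```

With the values above, a 45,000 bp record would pass and a 600,000 bp record would be excluded.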

CDS_OVERLAP_CUTOFF: 0.1

Overlap fraction (as a decimal) at which two CDS in a .gbk file are considered to overlap. Of two overlapping CDS, the longest is kept.
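The following Python sketch shows one way such a cutoff could be applied. It is hedged in several places: the helper names overlap_fraction and pick_cds are hypothetical, the overlap is divided by the length of the shorter CDS, and the comparison is strict (greater than the cutoff). None of these details are confirmed by this page.

```python
CDS_OVERLAP_CUTOFF = 0.1  # value shown in config.yml


def overlap_fraction(a: tuple, b: tuple) -> float:
    """Fraction of the shorter interval covered by the overlap.

    Intervals are (start, end) with end exclusive. Using the shorter
    interval as the denominator is an assumption of this sketch.
    """
    overlap = min(a[1], b[1]) - max(a[0], b[0])
    if overlap <= 0:
        return 0.0
    shorter = min(a[1] - a[0], b[1] - b[0])
    return overlap / shorter


def pick_cds(a: tuple, b: tuple, cutoff: float = CDS_OVERLAP_CUTOFF) -> list:
    """Keep both CDS unless they overlap by more than cutoff; then keep the longer."""
    if overlap_fraction(a, b) <= cutoff:
        return [a, b]
    return [max(a, b, key=lambda cds: cds[1] - cds[0])]
```

For example, pick_cds((0, 100), (90, 150)) keeps only (0, 100), because the 10 bp overlap covers about 17% of the shorter CDS.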

DOMAIN_OVERLAP_CUTOFF: 0.1

Overlap fraction (as a decimal) at which two domains in a CDS are considered to overlap. The domain with the best score is kept. See more information here.

LCS

REGION_MIN_LCS_LEN: 0.1

Minimum length for accepting a Longest Common Subcluster (LCS) when comparing region records or cand_cluster records, expressed as a fraction of the included domains. The threshold must be satisfied in at least one of the compared records. See more information here.

PROTO_MIN_LCS_LEN: 0

Minimum length for accepting a Longest Common Subcluster (LCS) when comparing protocluster records or protocore records, expressed as a fraction of the included domains. The threshold must be satisfied in at least one of the compared records. See more information here.
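The "at least one of the compared records" condition for REGION_MIN_LCS_LEN and PROTO_MIN_LCS_LEN can be sketched in Python as follows. The function name lcs_long_enough and its arguments are hypothetical, and the inclusive comparison is an assumption.

```python
REGION_MIN_LCS_LEN = 0.1  # value shown in config.yml
PROTO_MIN_LCS_LEN = 0.0   # value shown in config.yml


def lcs_long_enough(lcs_len: int, n_domains_a: int, n_domains_b: int,
                    min_fraction: float) -> bool:
    """True if the LCS covers at least min_fraction of the included
    domains in at least one of the two compared records."""
    fractions = [
        lcs_len / n for n in (n_domains_a, n_domains_b) if n > 0
    ]
    return any(f >= min_fraction for f in fractions)
```

Under these assumptions, an LCS of 3 domains between records of 20 and 100 domains passes a 0.1 threshold, because 3/20 = 0.15 even though 3/100 = 0.03.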

Extend

REGION_MIN_EXTEND_LEN: 0.3

Minimum length for accepting an extended LCS slice when comparing region records or cand_cluster records with no biosynthetic domains in the extended slice, expressed as a fraction of the included domains. The threshold must be satisfied in at least one of the compared records. See more information here.

REGION_MIN_EXTEND_LEN_BIO: 0.2

Minimum length for accepting an extended LCS slice when comparing region records or cand_cluster records with at least one biosynthetic domain in the extended slice, expressed as a fraction of the included domains. The threshold must be satisfied in at least one of the compared records. See more information here.

PROTO_MIN_EXTEND_LEN: 0.2

Minimum length for accepting an extended LCS slice when comparing protocluster records or protocore records with at least one biosynthetic domain in the extended slice, expressed as a fraction of the included domains. The threshold must be satisfied in at least one of the compared records. See more information here.

NO_MIN_CLASSES: Terpene

List of product classes that do not require a minimum length. In practice, this means that an LCS and/or extended slice of at least 1 domain will be accepted, as long as that domain is a core biosynthetic domain. See more information here.

EXTEND_MATCH_SCORE: 5
EXTEND_MISMATCH_SCORE: -3
EXTEND_GAP_SCORE: -2

Integer scores used in the LCS-extend algorithm for a match, a mismatch and a gap, respectively. See more information here.
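To give a feel for how the three scores interact, the Python sketch below accumulates them over a sequence of events and reports the highest-scoring prefix. This is only an illustration of a generic score-based extension; the function best_extension, the event labels, and the "stop at the running maximum" rule are assumptions, not a description of BiG-SCAPE's LCS-extend implementation.

```python
EXTEND_MATCH_SCORE = 5      # value shown in config.yml
EXTEND_MISMATCH_SCORE = -3  # value shown in config.yml
EXTEND_GAP_SCORE = -2       # value shown in config.yml

SCORES = {
    "match": EXTEND_MATCH_SCORE,
    "mismatch": EXTEND_MISMATCH_SCORE,
    "gap": EXTEND_GAP_SCORE,
}


def best_extension(events: list) -> tuple:
    """Return (n_events, score) for the highest-scoring prefix of events.

    Hypothetical rule: extend as long as doing so eventually raises the
    running score above its previous maximum.
    """
    best_len, best_score, running = 0, 0, 0
    for i, event in enumerate(events, start=1):
        running += SCORES[event]
        if running > best_score:
            best_len, best_score = i, running
    return best_len, best_score
```

For the events match, gap, match, mismatch, mismatch the running score is 5, 3, 8, 5, 2, so this sketch would stop after the third event with a score of 8.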

EXTEND_MAX_MATCH_PERC: 0.1

Maximum distance at which a matching domain is accepted as an actual match during LCS-extend, expressed as a fraction of the total number of domains in the compared record. See more information here.

GCF Calling

PREFERENCE: 0.0

Internal parameter of the Affinity Propagation clustering algorithm that governs the number of families created. A higher preference results in more families, and a lower preference in fewer. See more information here.

DENSITY: 0.85
DENSE_PREFERENCE: -5.0

DENSITY is the density threshold at or above which a connected component is treated as dense. DENSE_PREFERENCE is the Affinity Propagation preference used on such dense connected components.
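A minimal Python sketch of how the two values might be combined with PREFERENCE is shown below. It assumes the standard definition of density for an undirected simple graph (edges present divided by edges possible); this page does not state how BiG-SCAPE computes density, and the function names are hypothetical.

```python
PREFERENCE = 0.0         # value shown in config.yml
DENSITY = 0.85           # value shown in config.yml
DENSE_PREFERENCE = -5.0  # value shown in config.yml


def component_density(n_nodes: int, n_edges: int) -> float:
    """Edges present divided by edges possible (undirected, no self-loops).

    Assumption: components with fewer than two nodes get density 0.0.
    """
    if n_nodes < 2:
        return 0.0
    return n_edges / (n_nodes * (n_nodes - 1) / 2)


def choose_preference(n_nodes: int, n_edges: int) -> float:
    """Pick the Affinity Propagation preference for one connected component."""
    if component_density(n_nodes, n_edges) >= DENSITY:
        return DENSE_PREFERENCE
    return PREFERENCE
```

Because a lower preference yields fewer families, using DENSE_PREFERENCE on a dense component would, under these assumptions, discourage splitting it into many families.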

GCF Tree

TOP_FREQS: 3

The number of most frequently occurring common domains (among those present in the exemplar BGC record) used to generate GCF trees. See more information here.
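The selection of the most frequent domains can be sketched in Python with collections.Counter. The function top_domains, the input layout (one list of domain names per record), and counting each domain at most once per record are all assumptions made for illustration.

```python
from collections import Counter

TOP_FREQS = 3  # value shown in config.yml


def top_domains(exemplar: list, members: list, top_n: int = TOP_FREQS) -> list:
    """Return the top_n domains of the exemplar that occur in the most records.

    exemplar: domain names in the exemplar BGC record.
    members:  one list of domain names per record in the family.
    """
    exemplar_set = set(exemplar)
    counts = Counter(
        domain
        for record in members
        for domain in set(record)  # count each domain once per record
        if domain in exemplar_set
    )
    return [domain for domain, _ in counts.most_common(top_n)]
```

Domains that are absent from the exemplar are ignored here, however often they occur in the other records.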

Anchor Domains

Domains that are given a higher weight in the DSS index of the distance calculation (see config.yml for the actual list).

Legacy antiSMASH classes

List and grouping of antiSMASH classes that are used in the --classify legacy mode, and for which --legacy-weights have been optimized. These have been updated up to antiSMASH version 7.0 and will not be maintained further (see config.yml for the actual list).