06. BiG SCAPE Classes

Nico Louwen edited this page Feb 11, 2025 · 1 revision

BiG-SCAPE 2 can run the analysis in a ‘mix’ bin, which contains all input and reference BGCs, and in several bins based on antiSMASH class or category. All-vs-all comparisons are performed only within a bin, never across bins, which can reduce run time and the computational resources required. The downside of skipping the ‘mix’ bin is that BGCs that may be related but were placed in different classes/categories (due to truncation, misassemblies, etc.) will not be compared to one another.
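To give a feel for the savings from binning, the sketch below counts all-vs-all pairs for a single 'mix' bin and for per-category bins. The dataset (100 BGCs split over three categories) and the helper name `comparisons_per_bin` are made up for illustration; this is plain arithmetic, not BiG-SCAPE code.

```python
from collections import Counter
from math import comb


def comparisons_per_bin(categories):
    """Return the number of within-bin pairs for each category bin."""
    sizes = Counter(categories)
    return {name: comb(size, 2) for name, size in sizes.items()}


# Hypothetical dataset: one category label per BGC record.
categories = ["PKS"] * 40 + ["NRPS"] * 35 + ["terpene"] * 25

# One 'mix' bin: every record is compared with every other record.
mix_pairs = comb(len(categories), 2)

# Category bins: comparisons happen only inside each bin.
binned = comparisons_per_bin(categories)
binned_pairs = sum(binned.values())

print(f"mix bin: {mix_pairs} pairs")
print(f"category bins: {binned_pairs} pairs ({binned})")
```

In this toy case the category bins need 1675 pairwise comparisons instead of 4950, at the cost of never comparing, say, a PKS record with an NRPS record.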

Note: By default, BiG-SCAPE 2 runs only the category-based bins. To change this behavior, toggle --mix, and use --classify to change the classification behavior. Mix and classify-based bins can be used simultaneously.

Legacy classify

BiG-SCAPE 1 defined eight classes into which clusters were separated, based on the product annotation from antiSMASH; see the description of this behavior here. To use these predefined BiG-SCAPE 1 groups (PKS1, PKSOther, NRPS, NRPS-PKS-hybrid, RiPP, Saccharide, Terpene, Others), use --classify legacy. This classify mode also applies the BiG-SCAPE 1 --legacy-weights to the distance calculations: these are class-specific weights assigned to each component of the distance score. This feature is available for backwards compatibility with antiSMASH versions up to 7.0. With higher antiSMASH versions, use it at your own risk, as BGC classes may have changed. Any antiSMASH class that this legacy mode does not recognize is grouped in 'Others'.
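The fallback to 'Others' can be pictured as a dictionary lookup with a default. This is a minimal sketch: the three product-to-class entries are an illustrative assumption, not the actual BiG-SCAPE 1 lookup table, and `legacy_class` is a hypothetical helper name.

```python
# The eight legacy class names listed above.
LEGACY_CLASSES = (
    "PKS1", "PKSOther", "NRPS", "NRPS-PKS-hybrid",
    "RiPP", "Saccharide", "Terpene", "Others",
)

# Illustrative subset only (assumption): the real mapping covers many more
# antiSMASH product types and may differ in detail.
ILLUSTRATIVE_PRODUCT_TO_LEGACY = {
    "T1PKS": "PKS1",
    "NRPS": "NRPS",
    "terpene": "Terpene",
}


def legacy_class(product: str) -> str:
    """Map an antiSMASH product to a legacy class, defaulting to 'Others'."""
    return ILLUSTRATIVE_PRODUCT_TO_LEGACY.get(product, "Others")


print(legacy_class("T1PKS"))
print(legacy_class("a-product-added-in-a-newer-antiSMASH"))
```

The second call shows why newer antiSMASH output is risky with this mode: a product type the table does not know about silently ends up in 'Others'.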

Classify based on antiSMASH class/category

BiG-SCAPE 2 allows classification based on antiSMASH classes (e.g. T1PKS) or antiSMASH categories (a broader grouping of product types, e.g. PKS) to run analyses on class/category-based bins. This option can be combined with --legacy-weights if the BGC .gbks were produced by antiSMASH version 6 or higher. For older antiSMASH versions, either use --classify legacy or do not select --legacy-weights, in which case the weighted distance calculations use the generic 'mix' weights.

Default: --classify category

Note: Consider classification choices carefully, especially when a dataset includes GBKs that were processed with several antiSMASH versions. For example, since antiSMASH categories were only introduced in antiSMASH 6, mixing output from antiSMASH versions 5 and 6 with --classify category will create a Categoryless bin for all GBKs produced by antiSMASH 5, and category-based bins for all GBKs produced by antiSMASH 6.
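The effect of mixing antiSMASH versions can be sketched as follows. The record names and the `bin_by_category` helper are hypothetical; a category of `None` stands in for a GBK that carries no category annotation, as would be the case for antiSMASH 5 output.

```python
from collections import defaultdict


def bin_by_category(records):
    """Group (name, category) pairs into bins; records without a category
    go to a 'Categoryless' bin."""
    bins = defaultdict(list)
    for name, category in records:
        key = category if category is not None else "Categoryless"
        bins[key].append(name)
    return dict(bins)


# Hypothetical mixed dataset: antiSMASH 5 records have no category.
records = [
    ("as5_region1", None),
    ("as6_region1", "PKS"),
    ("as6_region2", "NRPS"),
    ("as5_region2", None),
]

bins = bin_by_category(records)
for bin_name, members in sorted(bins.items()):
    print(bin_name, members)
```

Note that the two antiSMASH 5 records are compared only with each other, even if one of them is in fact a PKS cluster closely related to `as6_region1`.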

Hybrids

When using class-based bins, BiG-SCAPE 2’s default is to create a hybrid class/category bin, e.g. a terpene-nrps BGC would be added to a terpene.NRPS bin/network. Toggle --hybrids-off to deactivate this behavior and instead add this terpene-nrps BGC record to both the terpene and the NRPS bins/networks.
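The two behaviors can be contrasted in a few lines. This is a sketch only: `assign_bins` is a hypothetical helper, and joining the class names with a dot in the given order is an assumption based on the terpene.NRPS example above.

```python
def assign_bins(classes, hybrids_off=False):
    """Return the bin names a record with the given classes is placed in."""
    if hybrids_off:
        # One entry per class: the record joins every matching bin.
        return list(classes)
    # Default: a single (possibly hybrid) bin named after all classes.
    return [".".join(classes)]


record_classes = ["terpene", "NRPS"]

default_bins = assign_bins(record_classes)
split_bins = assign_bins(record_classes, hybrids_off=True)

print(default_bins)  # ['terpene.NRPS']
print(split_bins)    # ['terpene', 'NRPS']
```

With hybrids off, the same record takes part in the comparisons of two bins, so it can appear in two networks.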

Classify in the context of the BiG-SCAPE Query workflow

By default, BiG-SCAPE compares the query BGC record against all supplied reference BGC records, regardless of antiSMASH product class/category. Alternatively, select 'class' or 'category' to run the analysis on one class-specific bin, in which case the query record is compared only with reference BGC records of the same class/category. This option can be combined with --legacy-weights for .gbks produced by antiSMASH version 6 or higher. For older antiSMASH versions, or if --legacy-weights is not selected, BiG-SCAPE uses the generic 'mix' weights: {JC: 0.2, AI: 0.05, DSS: 0.75, Anchor boost: 2.0}. Hybrid handling is not applicable to bigscape query.
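As a rough illustration of how the 'mix' weights enter a distance, the sketch below combines three similarity indices into a single value. It assumes the distance is one minus the weighted sum of the Jaccard (JC), adjacency (AI) and domain sequence similarity (DSS) indices, each in the range 0 to 1. The anchor boost is assumed to act inside the DSS calculation and is not modelled here. `weighted_distance` is a hypothetical helper, not a BiG-SCAPE function.

```python
# Generic 'mix' weights quoted above.
MIX_WEIGHTS = {"JC": 0.2, "AI": 0.05, "DSS": 0.75}
ANCHOR_BOOST = 2.0  # assumed to be used when computing DSS; unused in this sketch


def weighted_distance(jc, ai, dss, weights=MIX_WEIGHTS):
    """Combine three similarity indices (each in [0, 1]) into a distance."""
    similarity = (
        weights["JC"] * jc
        + weights["AI"] * ai
        + weights["DSS"] * dss
    )
    return 1.0 - similarity


# Hypothetical index values for one query/reference pair.
distance = weighted_distance(jc=0.5, ai=0.8, dss=0.6)
print(f"distance: {distance:.2f}")
```

With these weights DSS dominates the result: here it contributes 0.45 of the 0.59 total similarity, giving a distance of about 0.41.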