06. BiG SCAPE Classes
BiG-SCAPE 2 can run the analysis in a ‘mix’ bin, which contains all input and reference BGCs, as well as run the analysis in several antiSMASH class/category based bins. All-vs-all comparisons are only performed within a bin and not across bins, which can reduce run times and computational resources required. The downside of not running the analysis in a ‘mix’ bin is that BGCs that may be related but were placed in different classes/categories (due to truncation, misassemblies, etc) will not get compared to one another.
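To give a feel for why binning reduces the workload, the following minimal sketch counts all-vs-all pairs for a toy dataset. It is not BiG-SCAPE code: the function name count_pairs and the example labels are hypothetical, and it assumes each BGC lands in exactly one bin.

```python
from collections import Counter
from math import comb


def count_pairs(bin_labels, mix=False):
    """Count all-vs-all pairwise comparisons (hypothetical helper).

    bin_labels: one bin label per BGC record.
    mix=True treats every record as a member of a single 'mix' bin.
    """
    if mix:
        return comb(len(bin_labels), 2)
    # Comparisons only happen within a bin, never across bins.
    sizes = Counter(bin_labels)
    return sum(comb(n, 2) for n in sizes.values())


# Toy dataset (assumed): 10 BGCs spread over three category bins.
labels = ["PKS"] * 4 + ["NRPS"] * 3 + ["terpene"] * 3

print(count_pairs(labels, mix=True))   # 45 pairs in one 'mix' bin
print(count_pairs(labels, mix=False))  # 6 + 3 + 3 = 12 pairs across bins
```

The gap widens quickly with dataset size, since the number of pairs grows quadratically with the number of records in a bin.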
Note: BiG-SCAPE 2 will run only the category-based bins by default. To change this behavior, toggle --mix, and use --classify to change classification behavior. Mix AND classify-based bins can be used simultaneously.
BiG-SCAPE 1 defined eight classes into which clusters are separated, based on the product annotation from antiSMASH (see the description of this behavior here). To use these BiG-SCAPE 1 predefined groups (PKS1, PKSOther, NRPS, NRPS-PKS-hybrid, RiPP, Saccharide, Terpene, Others), use --classify legacy. This classify mode will also make use of the BiG-SCAPE 1 --legacy-weights for distance calculations: these are specific weights given to each of the distance score components, assigned according to the BGC class grouping. This feature is available for backwards compatibility with antiSMASH versions up to 7.0. For higher antiSMASH versions, use at your own risk, as BGC classes may have changed. All antiSMASH classes that this legacy mode does not recognize will be grouped in 'Others'.
BiG-SCAPE 2 allows classification based on antiSMASH classes (e.g. T1PKS) or antiSMASH categories (a broader grouping of product types, e.g. PKS) to run analyses on class/category-based bins. This can be used in combination with --legacy-weights if the BGC .gbks have been produced by antiSMASH version 6 or higher. For older antiSMASH versions, either use --classify legacy or do not select --legacy-weights, thereby performing the weighted distance calculations based on the generic 'mix' weights.
Default: --classify category
Note: Consider classification choices carefully, especially when running datasets that include GBKs processed with several antiSMASH versions. For example, since antiSMASH categories were only introduced in antiSMASH 6, mixing output from antiSMASH versions 5 and 6 with --classify category will create a Categoryless bin for all GBKs output by antiSMASH 5, and category-based bins for all GBKs output by antiSMASH 6.
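The mixed-version situation in the note above can be pictured with a small sketch. This is an illustration only, not BiG-SCAPE's implementation: the function bin_by_category and the record names are hypothetical, and a missing category is represented here as None.

```python
from collections import defaultdict


def bin_by_category(records):
    """Group records into category bins (hypothetical helper).

    records: iterable of (record_name, category) tuples, where category is
    None when the GBK carries no category annotation (assumed here to be
    the case for antiSMASH 5 output).
    """
    bins = defaultdict(list)
    for name, category in records:
        # Records without a category fall back to the 'Categoryless' bin.
        bins[category or "Categoryless"].append(name)
    return dict(bins)


# Assumed toy input: two antiSMASH 6 records and two antiSMASH 5 records.
records = [
    ("as6_cluster_a", "PKS"),
    ("as6_cluster_b", "NRPS"),
    ("as5_cluster_c", None),
    ("as5_cluster_d", None),
]

bins = bin_by_category(records)
print(sorted(bins))  # ['Categoryless', 'NRPS', 'PKS']
```

In this toy case the two antiSMASH 5 records are only compared with each other, never with the categorized antiSMASH 6 records, which is the effect the note warns about.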
When using class-based bins, BiG-SCAPE 2's default is to create a hybrid class/category bin, e.g. a terpene-nrps BGC would be added to a terpene.NRPS bin/network. Toggle --hybrids-off to deactivate this behavior and add this terpene-nrps BGC record to both the terpene and NRPS bins/networks.
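As a rough illustration of the two behaviors, the sketch below assigns a record to bins with and without hybrid handling. The function assign_bins and its hybrids_off parameter are hypothetical names that mirror the flag described above; the way BiG-SCAPE actually builds bin names internally is not shown here and may differ.

```python
def assign_bins(classes, hybrids_off=False):
    """Return the bin names a record would be placed in (hypothetical helper).

    classes: the classes/categories annotated on one BGC record,
             e.g. ["terpene", "NRPS"] for a terpene-nrps hybrid.
    """
    if hybrids_off or len(classes) == 1:
        # One bin per class: a hybrid record appears in each of them.
        return list(classes)
    # Default: a single hybrid bin named after all classes, dot-joined
    # (assumed naming, following the terpene.NRPS example in the text).
    return [".".join(classes)]


print(assign_bins(["terpene", "NRPS"]))                    # ['terpene.NRPS']
print(assign_bins(["terpene", "NRPS"], hybrids_off=True))  # ['terpene', 'NRPS']
```

With hybrid handling turned off, the same record is compared in two separate bins, which increases the number of comparisons but lets the hybrid connect to non-hybrid BGCs of either class.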
By default, BiG-SCAPE will compare the query BGC record against any other supplied reference BGC records, regardless of antiSMASH product class/category. Instead, select 'class' or 'category' to run analyses on one class-specific bin, in which case only reference BGC records with the same class/category as the query record will be compared. This can be used in combination with --legacy-weights for .gbks produced by antiSMASH version 6 or higher. For older antiSMASH versions, or if --legacy-weights is not selected, BiG-SCAPE will use the generic 'mix' weights: {JC: 0.2, AI: 0.05, DSS: 0.75, Anchor boost: 2.0}. Hybrid handling is not applicable to bigscape query.
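To show how the generic 'mix' weights might enter a distance calculation, here is a minimal sketch. It assumes the combined distance is one minus the weighted sum of the three component scores, each in the range 0 to 1; that formula, the function name combined_distance, and the example scores are assumptions for illustration and are not taken from this page. The anchor boost is listed only for completeness and is not applied in this sketch.

```python
# Generic 'mix' weights as quoted in the text above.
MIX_WEIGHTS = {"JC": 0.2, "AI": 0.05, "DSS": 0.75}
ANCHOR_BOOST = 2.0  # quoted in the text; not used in this simplified sketch


def combined_distance(jc, ai, dss, weights=MIX_WEIGHTS):
    """Combine component similarity scores into one distance.

    Assumption: distance = 1 - (JC*w_JC + AI*w_AI + DSS*w_DSS),
    with each component score between 0 (dissimilar) and 1 (identical).
    """
    similarity = (
        weights["JC"] * jc
        + weights["AI"] * ai
        + weights["DSS"] * dss
    )
    return 1.0 - similarity


# Assumed example scores for one query/reference pair.
print(round(combined_distance(jc=0.5, ai=0.4, dss=0.8), 2))  # 0.28
```

Because DSS carries a weight of 0.75 here, it dominates the result under these assumptions; class-specific legacy weights would shift that balance per bin.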