Milot Mirdita edited this page Jan 10, 2023 · 18 revisions

Foldseek User Guide


Summary

System requirements

Foldseek runs on modern UNIX operating systems and is tested on Linux and macOS. Additionally, we are providing a preview version for Windows.

Foldseek takes advantage of multi-core systems through OpenMP and uses the SIMD capabilities of the system. Optimal performance requires a system supporting the AVX2 instruction set; however, SSE4.1 and very old systems with only SSE2 are also supported. Foldseek also supports the PPC64LE and ARM64 processor architectures, which require support for the AltiVec or NEON SIMD instruction sets, respectively.

To check if Foldseek supports your system execute the following commands, depending on your operating system:

Check system requirements under Linux

[ $(uname -m) = "x86_64" ] && echo "64bit: Yes" || echo "64bit: No"
grep -q avx2 /proc/cpuinfo && echo "AVX2: Yes" || echo "AVX2: No"
grep -q sse4_1 /proc/cpuinfo && echo "SSE4.1: Yes" || echo "SSE4.1: No"
# for very old systems which support neither SSE4.1 nor AVX2
grep -q sse2 /proc/cpuinfo && echo "SSE2: Yes" || echo "SSE2: No"
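
The individual checks above can be folded into one helper that reports the best supported SIMD level. This is a minimal sketch: `best_simd_level` is a hypothetical helper name, not a Foldseek command, and it only inspects the x86 flag names used in the checks above.

```shell
# Hypothetical helper (not part of Foldseek): print the best SIMD level
# found in a CPU flags string: avx2, sse41, sse2 or none.
best_simd_level() {
    flags="$1"
    if printf '%s\n' "$flags" | grep -qw avx2; then
        echo "avx2"
    elif printf '%s\n' "$flags" | grep -qw sse4_1; then
        echo "sse41"
    elif printf '%s\n' "$flags" | grep -qw sse2; then
        echo "sse2"
    else
        echo "none"
    fi
}

# Usage on Linux: pass the first flags line of /proc/cpuinfo
if [ -r /proc/cpuinfo ]; then
    best_simd_level "$(grep -m1 '^flags' /proc/cpuinfo)"
fi
```

The result tells you which of the checks succeeded first, in order of preference (AVX2, then SSE4.1, then SSE2).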

Check system requirements under macOS

[ $(uname -m) = "x86_64" ] && echo "64bit: Yes" || echo "64bit: No"
sysctl machdep.cpu.leaf7_features | grep -q AVX2 && echo "AVX2: Yes" || echo "AVX2: No"
sysctl machdep.cpu.features | grep -q SSE4.1 && echo "SSE4.1: Yes" || echo "SSE4.1: No"

Installation

Foldseek can be installed on Linux or macOS by:

(1) downloading a statically compiled version. For a Linux computer that supports AVX2 use:

wget https://mmseqs.com/foldseek/foldseek-linux-avx2.tar.gz
tar xvzf foldseek-linux-avx2.tar.gz
export PATH=$(pwd)/foldseek/bin/:$PATH

Linux with SSE4.1

wget https://mmseqs.com/foldseek/foldseek-linux-sse41.tar.gz
tar xvzf foldseek-linux-sse41.tar.gz
export PATH=$(pwd)/foldseek/bin/:$PATH

macOS build (universal binary with SSE4.1/AVX2/M1 NEON)

wget https://mmseqs.com/foldseek/foldseek-osx-universal.tar.gz
tar xvzf foldseek-osx-universal.tar.gz
export PATH=$(pwd)/foldseek/bin/:$PATH

(2) using bioconda

conda install -c conda-forge -c bioconda foldseek

(3) compiling from source (see below).

Compile from source under Linux

Compiling Foldseek from source has the advantage that it will be optimized for the specific system, which should improve its performance. To compile Foldseek, you need git, g++ (4.9 or higher) and cmake (2.8.12 or higher). After compilation, the foldseek binary will be located in build/bin/.

git clone https://github.com/steineggerlab/foldseek.git
cd foldseek
mkdir build
cd build
cmake -DCMAKE_BUILD_TYPE=RELEASE -DCMAKE_INSTALL_PREFIX=. ..
make
make install 
export PATH=$(pwd)/bin/:$PATH
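
Before starting the build, it can help to confirm that the required tools are on the PATH. The sketch below uses a hypothetical helper, `missing_tools`, which is not part of Foldseek. It only checks that the tools are present and does not verify the minimum versions listed above.

```shell
# Hypothetical helper (not part of Foldseek): print every tool from the
# argument list that cannot be found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Check the tools needed for the build steps above
missing=$(missing_tools git g++ cmake make)
if [ -n "$missing" ]; then
    echo "Missing build tools:" $missing
else
    echo "All build tools found"
fi
```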

See the Customizing compilation through CMake section if you compile Foldseek on a different system than the one where it will eventually run.

Compile from source under macOS

Compiling under Clang

To compile Foldseek with (Apple-)Clang you need to install either Xcode or the Command Line Tools. You also need libomp. We recommend installing it using Homebrew:

brew install cmake libomp zlib bzip2

CMake currently does not correctly identify paths to libomp. Use the script in util/build_osx.sh to compile Foldseek. The resulting binary will be placed in OUTPUT_DIR/mmseqs.

./util/build_osx.sh PATH_TO_FOLDSEEK_REPO OUTPUT_DIR

Compiling using GCC

Please install the following packages with Homebrew:

brew install cmake gcc@11 zlib bzip2

Use the following cmake call:

CC="gcc-11" CXX="g++-11" cmake -DCMAKE_BUILD_TYPE=RELEASE -DCMAKE_INSTALL_PREFIX=. ..

Customizing compilation through CMake

Most of the MMseqs2 CMake options also apply to Foldseek. Refer to MMseqs2's user guide for details.

Enable Google Cloud Storage support for createdb

Install the google-cloud-cpp package from vcpkg:

git clone https://github.com/microsoft/vcpkg.git
./vcpkg/bootstrap-vcpkg.sh
./vcpkg/vcpkg install google-cloud-cpp

Foldseek can now be compiled with GCS support:

cd path-to-foldseek
mkdir build && cd build
cmake -DHAVE_GCS=1 -DCMAKE_TOOLCHAIN_FILE=[path to vcpkg]/scripts/buildsystems/vcpkg.cmake ..
make -j $(nproc --all)
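
The `nproc` command used above is typically not available on macOS. A minimal sketch of a portable core count follows, assuming either `nproc` or `sysctl -n hw.ncpu` is present. `cpu_count` is a hypothetical helper name, not part of Foldseek.

```shell
# Hypothetical helper (not part of Foldseek): print the number of CPU cores,
# trying nproc first (Linux) and then sysctl (macOS), with 1 as a last resort.
cpu_count() {
    nproc --all 2>/dev/null || sysctl -n hw.ncpu 2>/dev/null || echo 1
}

echo "Detected $(cpu_count) cores"
# Example use in place of the make call above:
#   make -j "$(cpu_count)"
```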

Frequently Asked Questions

What is the hit probability?

To give users an easy-to-interpret measure of how significant a hit is, Foldseek estimates, from the structural bit score, the probability that each hit is a true positive (i.e. query and target are homologous). The probability is the fraction of TP hits (same superfamily) among all TP and FP (not same fold) hits found, on average, at that score. To compute it, we estimate the bit score distributions of TP and FP hits. Both score distributions were fitted on SCOPe40. For example, Foldseek finds around the same number of FP and TP hits with a score of 51 in SCOPe40, so the resulting probability for a hit with score 51 is 50%.
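
The ratio described above can be illustrated with a small calculation. This is only a sketch of the TP / (TP + FP) fraction at a single score. Foldseek itself derives the probability from fitted score distributions. `hit_probability` is a hypothetical helper and the counts are made-up illustration values.

```shell
# Hypothetical helper (not part of Foldseek): fraction of true positives
# among all hits at one score, given TP and FP counts.
hit_probability() {
    awk -v tp="$1" -v fp="$2" 'BEGIN { printf "%.2f\n", tp / (tp + fp) }'
}

# Equal TP and FP counts (made-up numbers), as in the score-51 example
hit_probability 100 100   # prints 0.50
```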

Relationship between score and probability in Foldseek