VectorFFT

VectorFFT is a vectorized, pure C FFT library optimized for x86 processors (AVX-512, AVX2, SSE2) with zero external dependencies. It implements mixed-radix algorithms for common sizes and Bluestein's method for arbitrary lengths, with OpenMP multi-threading for large transforms.

Designed for both digital signal processing and financial applications requiring high-performance spectral analysis. VectorFFT combines the speed of hand-optimized SIMD code with the simplicity of standalone C, making it ideal for embedded systems, real-time applications, and projects where FFTW/MKL are overkill.

Features

Mixed-Radix DIT for lengths factorable into small primes (2, 3, 4, 5, 7, 8, 11, 13, …)
- *Radix-2, 3, 4, 5, 7, 8, 11, 13, 16, 32 now with fully tested AVX2 + SSE2 + scalar codepaths
  - Vector core processes 4 points per iteration
  - SSE2 tails handle 1–2 leftover samples
  - Scalar tail covers 0..3 leftovers or non-SIMD builds
  - Explicit prefetching of outputs and twiddles for cache-friendliness
  - FMADD/FMSUB macros used to leverage FMA when available
  - Error handling for insufficient scratch or missing precomputed twiddles
Bluestein’s Algorithm for arbitrary (prime) lengths via chirp z-transform
AVX2 + FMA SIMD acceleration
- Common butterfly math factored into helpers (e.g. cmul_soa_avx, rot90_soa_avx)
- Works with unaligned data (_mm256_loadu_pd, _mm256_storeu_pd)
- Prefetching added to hide latency in deeper radices
SSE2 acceleration
- Two-lane fallback for tails in radices 3, 5, 11
- Safe unaligned loads/stores (_mm_loadu_pd / _mm_storeu_pd)
Robust scratch-buffer model
- No per-call malloc — all recursion stages and local twiddles use a preallocated scratch buffer
- Layout partitioned: sub-FFT outputs + (optional) stage twiddles
- Prints an error and aborts stage if alignment or sizing is invalid
Precomputed or on-the-fly twiddles
- Radices 2, 3, 4, 5, 7, 8, 11, 13 supported
- If no precompute is provided, stage-local twiddles are generated in the scratch tail
Real ↔ Complex FFTs via half-complex storage
Minimal footprint (~25 KiB code, no giant tables)
CMake build system with optional GoogleTest harness
Comprehensive test suite
- Round-trip forward/backward FFTs
- Radix-3 and 5 vector/scalar tails
- Radix-11 special-case coverage (k=0,1 SSE2 lanes)
- Error conditions (invalid N, missing scratch, bad alignment)
- MSE checks against reference FFTW within 1e-10

Comparison to FFTW

Criterion	HighSpeedFFT	FFTW
Algorithm support	Mixed-radix + Bluestein	Cooley–Tuk + Rader + Bluestein
Vectorization	Hand-tuned AVX2/FMA + SSE2	Auto-SIMD (SSE/AVX)
Dependencies	`libm` only	`pthread`, `libsimd`, …
Memory layout	contiguous real/imag buffers	plan-specific allocators
Performance (bench)	≈FFTW on radix-friendly N, within 5–10%	highly tuned wide radix
Footprint	≈25 KiB code	≈300 KiB code + data

Benchmarks on Intel Xeon Gold 6226R (single thread):

Mixed-radix (2^12–2^20): within 5–10% of FFTW

Bluestein on large primes: ~30% slower (extra convolution)

Usage

#include "highspeedFFT.h"

// Complex FFT
fft_data *in  = malloc(N*sizeof*in),
         *out = malloc(N*sizeof*out);
fft_object fwd = fft_init(N, +1);
fft_exec(fwd, in, out);
fft_object inv = fft_init(N, -1);
fft_exec(inv, out, in);  // normalize by 1/N yourself

// Real FFT
fft_real_object r2c = fft_real_init(N, +1);
fft_data *hc = malloc((N/2+1)*sizeof*hc);
fft_r2c_exec(r2c, real_in, hc);
fft_real_object c2r = fft_real_init(N, -1);
fft_c2r_exec(c2r, hc, real_out);  // normalize by 1/N

Build

mkdir build && cd build
cmake ..                    # optionally: -DENABLE_GTEST=ON
make                        # builds static lib + test harness
./test_mixedRadixFFT        # standalone tests
# or, with GoogleTest:
cmake -DENABLE_GTEST=ON ..
make test

CMake Options

ENABLE_GTEST (OFF/ON): GoogleTest vs. standalone harness
SIMD flags auto-detected:
- GCC/Clang: -O3 -mavx2 -mfma -msse2
- MSVC: /O2 /arch:AVX2 /fp:fast

Name		Name	Last commit message	Last commit date
Latest commit History 287 Commits
.vscode		.vscode
docs		docs
src		src
test		test
.gitattributes		.gitattributes
.gitignore		.gitignore
CMakeLists.txt		CMakeLists.txt
LICENSE		LICENSE
README.md		README.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Repository files navigation

VectorFFT

VectorFFT is a vectorized, pure C FFT library optimized for x86 processors (AVX-512, AVX2, SSE2) with zero external dependencies. It implements mixed-radix algorithms for common sizes and Bluestein's method for arbitrary lengths, with OpenMP multi-threading for large transforms.

Features

Comparison to FFTW

Usage

Build

CMake Options

About

Uh oh!

Releases

Packages

Languages

License

Tugbars/VectorFFT

Folders and files

Latest commit

History

Repository files navigation

VectorFFT

VectorFFT is a vectorized, pure C FFT library optimized for x86 processors (AVX-512, AVX2, SSE2) with zero external dependencies. It implements mixed-radix algorithms for common sizes and Bluestein's method for arbitrary lengths, with OpenMP multi-threading for large transforms.

Features

Comparison to FFTW

Usage

Build

CMake Options

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages