Releases · MikaelSlevinsky/FastTransforms

25 Nov 16:12

MikaelSlevinsky

v0.4.0

b60c2fb

Spherical isometries, triangular block-banded eigensolvers, generalized Zernike and Dunkl-Xu OPs on the disk

This release:

adds support for spherical isometries (rotations and reflections and general orthogonal coordinate transformations of spherical harmonic expansions), thanks in part to @ikaroruan.
includes an implementation of the factored alternating direction implicit method for diagonal matrix equations with low-rank right-hand sides, thanks in part to @BrockKlippenstein.
adds methods for the complete elliptic integrals K(k) and E(k) and Jacobian elliptic functions sn(x,k), cn(x,k), and dn(x,k).
converts associated classical orthogonal polynomial connection problems from two-parameter triangular banded eigenvalue problems to quadratic triangular banded generalized eigenvalue problems (QEPs).
adds fast divide-and-conquer methods for block 2x2 triangular-banded generalized eigenvalue problems (for linearizations of the QEPs above).
generalizes Zernike polynomials to a real orthonormal basis for L²(D², r^(2α+1) (1-r^2)^β dr dθ), a breaking change to the API.
adds Dunkl-Xu polynomials on the disk, a real orthonormal basis for L²(D², (1-x^2-y^2)^β dx dy), updating the holomorphic.c example to show the duality of the approaches.

05 Aug 14:29

MikaelSlevinsky

v0.3.4

6d28165

Support AArch64 linux

The SIMD required here is predominantly double precision, so this invokes NEON on armv8-a, which is always available. This isn't applicable for armv7, though that distinction could be made later.

Also adds missing fallbacks for AVX and AVX_FMA spin-weighted spherical harmonic computational kernels.

29 Jul 04:56

MikaelSlevinsky

v0.3.3

670ca16

Cleanup Code

Allow code to be compiled by clang, make quadmath optional, sequester x86-specific code, fix miscellaneous warnings generated by clang.

15 May 02:53

MikaelSlevinsky

v0.3.2

b155651

Fix C -- Julia BigFloat interoperability

Julia BigFloat does not have the same size as mpfr_t.
So we give all Julia-owned data its own address, and dereference to retrieve the number.

05 May 01:21

MikaelSlevinsky

v0.3.1

202accd

AVX (FMA) support for spin-weighted spherical harmonics

Fix tetrahedral drivers to use AVX (not AVX512F).
Make docs more precise.
Docs use an independent job on Travis.

26 Apr 17:41

MikaelSlevinsky

v0.3.0

dd9de00

Spin-weighted spherical harmonics

New features in this release:

Support for spin-weighted spherical harmonic transforms. They are orthonormalized in L^2, with the complex Fourier series as longitudinal basis and with complex coefficients.
Three new functions (each in float and double variants): Horner's rule, Clenshaw's algorithm for Chebyshev series, and Clenshaw's algorithm for orthogonal polynomial series.
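
As a reminder of what Horner's rule and Clenshaw's algorithm compute, here is a minimal scalar sketch. The function names and signatures are hypothetical and are not the library's API, which may differ in argument order, vectorization, and types. The first function evaluates a monomial series, the second a Chebyshev series of the first kind.

```c
#include <assert.h>

/* Hypothetical helper: Horner's rule for p(x) = sum_{k=0}^{n-1} c[k] * x^k. */
static double horner(const double *c, int n, double x) {
    assert(n > 0);
    double r = c[n - 1];
    for (int k = n - 2; k >= 0; k--)
        r = r * x + c[k];
    return r;
}

/* Hypothetical helper: Clenshaw's algorithm for
 * f(x) = sum_{k=0}^{n-1} c[k] * T_k(x), using the Chebyshev recurrence
 * T_{k+1}(x) = 2x T_k(x) - T_{k-1}(x) run backwards. */
static double clenshaw_chebyshev(const double *c, int n, double x) {
    assert(n > 0);
    double bk1 = 0.0, bk2 = 0.0; /* b_{k+1} and b_{k+2} */
    for (int k = n - 1; k >= 1; k--) {
        double bk = c[k] + 2.0 * x * bk1 - bk2;
        bk2 = bk1;
        bk1 = bk;
    }
    return c[0] + x * bk1 - bk2;
}
```

With coefficients {1, 2, 3}, the monomial series is 1 + 2x + 3x^2 and the Chebyshev series is 1 + 2x + 3(2x^2 - 1), so the two functions give different values for the same input array.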

Improvements in this release:

The code is now designed with a cross-compiler in mind. For performance-critical tasks, SIMD is hidden from the user interface and is instead dispatched at runtime based on CPU ID. This allows a cross-compiler to include functions with more advanced SIMD than the host computer supports, while a runtime check ensures that only the best supported SIMD level is dispatched (closes #12 and closes #41).
The computational kernels for the spherical/triangular/disk harmonics are refactored to not only use the correct types of registers, but also help the compiler maximize throughput. This relies on a property of Givens rotations that two adjacent rotations commute if they do not act on the same rows. This property allows one to re-order the Givens rotations to increase the ratio of computation to memory loads/stores. The computational kernels and execute drivers are largely generated by a macro, which means the code may already be prepared for AVX-1024 when the instruction sets are available in GCC. Part of this is the introduction of the ft_simd struct to store a bit-field of a variety of SIMD extensions.
The API for the computational kernels now includes transformation from orders m1 to m2 (rather than 0/1 to m), and includes a stride parameter in the data.
The real-to-real FFTW routines now use fftw_execute_dft_r2c and fftw_execute_dft_c2r instead of FFTW_R2HC and FFTW_HC2R-type real-to-real transforms to avoid a global transpose of the data.
The performance benchmark timings were not scaling as O(n^3) because one needs to call a function a few times, typically at least twice, before peak performance is realized. These are now updated and the macro FT_TIME helps to bring this support system-wide.
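
The commutation property of Givens rotations used in the kernel refactoring above can be checked on a small example. This is an illustrative sketch only: `apply_givens` and `rotations_commute` are hypothetical names, and the code operates on a plain 4-vector rather than on the library's data layout.

```c
#include <assert.h>
#include <math.h>

/* Apply a Givens rotation with cosine c and sine s to entries i and j of x. */
static void apply_givens(double *x, int i, int j, double c, double s) {
    assert(i != j);
    double xi = x[i], xj = x[j];
    x[i] = c * xi + s * xj;
    x[j] = c * xj - s * xi;
}

/* Returns 1 if rotating rows (i1, j1) then (i2, j2) gives the same vector
 * as rotating (i2, j2) then (i1, j1), and 0 otherwise. */
static int rotations_commute(int i1, int j1, int i2, int j2) {
    const double c1 = 0.6, s1 = 0.8; /* c1^2 + s1^2 = 1 */
    const double c2 = 0.8, s2 = 0.6; /* c2^2 + s2^2 = 1 */
    double a[4] = {1.0, 2.0, 3.0, 4.0};
    double b[4] = {1.0, 2.0, 3.0, 4.0};
    apply_givens(a, i1, j1, c1, s1);
    apply_givens(a, i2, j2, c2, s2);
    apply_givens(b, i2, j2, c2, s2);
    apply_givens(b, i1, j1, c1, s1);
    for (int k = 0; k < 4; k++)
        if (fabs(a[k] - b[k]) > 1e-12)
            return 0;
    return 1;
}
```

Rotations on rows (0, 1) and (2, 3) touch disjoint entries, so either order gives the same result, whereas rotations on rows (0, 1) and (1, 2) share row 1 and do not commute in general. The freedom to swap the first kind is what permits a re-ordering that keeps more data in registers between loads and stores.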

New examples in this release:

spinweighted.c is a basic tutorial on how to use spin-weighted spherical harmonic transforms.

Releases no longer trigger the attachment of binaries, as compilation with -march=native may fail on a host computer.

14 Jan 21:45

MikaelSlevinsky

v0.2.13

a015672

Ofast to O3

-Ofast enables -ffast-math, which ignores strict IEEE compliance. This affects the global state for denormal numbers through DAZ (denormals-are-zero) and FTZ (flush-to-zero).
With -fno-protect-parens as well, -Ofast overly aggressively optimizes flops in definitions such as

static FLT X(diff)(FLT x, FLT y) {return x - y;}

resulting in the failure of test_tdc for the past while. It should be possible to reactivate this test now.
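
To illustrate why the evaluation order of such differences matters, here is a sketch of a familiar pattern that depends on it, Dekker's fast two-sum. It is an assumption-laden illustration and not code from the library; the name `fast_two_sum` is hypothetical. Under strict IEEE arithmetic the function recovers the rounding error of a floating-point addition exactly. Under -ffast-math, a compiler is permitted to treat (a + b) - a as algebraically equal to b, which would reduce the error term to zero.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: Dekker's fast two-sum, valid when |a| >= |b|.
 * Returns s = fl(a + b) and stores in *err the rounding error, so that
 * a + b == s + *err holds exactly under IEEE 754 round-to-nearest. */
static double fast_two_sum(double a, double b, double *err) {
    assert(fabs(a) >= fabs(b));
    double s = a + b;
    double z = s - a; /* algebraically b, but computed with rounding */
    *err = b - z;     /* algebraically 0, numerically the lost low bits */
    return s;
}
```

For example, with a = 1.0 and b = 1e-17 the sum rounds to 1.0, and the error term recovers the 1e-17 that the addition discarded, provided the code is compiled without value-changing floating-point optimizations.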

24 Nov 02:26

MikaelSlevinsky

v0.2.12

00fc862

Update CFLAGS

Only add -march=native and friends by default. If CFLAGS is already defined, these flags are not added.

21 Nov 04:37

MikaelSlevinsky

v0.2.11

a3c6365

Expand platform support

If target is not defined, gcc -dumpversion is used. This helps in a cross-compiler environment where OS may not be defined.

-march=native, -mtune=native, and -mvzeroupper are only available on x86.

20 Nov 22:33

MikaelSlevinsky

v0.2.10

3de585e

More versatile make options

Make improvements (#40)

* introduce FT_LIBBLAS

with a default backup ==blas

check if linking to gmp or mpir is necessary

* position of libmath

* FT_PREFIX seems simpler than FT_USE_PREDEFINED_LIBRARIES

similarly for FT_LIBBLAS => FT_BLAS
