
Releases: MikaelSlevinsky/FastTransforms

Spherical isometries, triangular block-banded eigensolvers, generalized Zernike and Dunkl-Xu OPs on the disk

25 Nov 16:12
b60c2fb

This release:

  • adds support for spherical isometries (rotations, reflections, and general orthogonal coordinate transformations of spherical harmonic expansions), thanks in part to @ikaroruan.
  • includes an implementation of the factored alternating direction implicit method for diagonal matrix equations with low-rank right-hand sides, thanks in part to @BrockKlippenstein.
  • adds methods for the complete elliptic integrals K(k) and E(k) and Jacobian elliptic functions sn(x,k), cn(x,k), and dn(x,k).
  • converts associated classical orthogonal polynomial connection problems from two-parameter triangular-banded eigenvalue problems to quadratic triangular-banded generalized eigenvalue problems (QEPs).
  • adds fast divide and conquer methods for block 2x2 triangular-banded generalized eigenvalue problems (for linearizations of the QEPs above).
  • generalizes Zernike polynomials to a real orthonormal basis for L^2(D^2, r^{2α+1}(1-r^2)^β dr dθ), a breaking change to the API.
  • adds Dunkl-Xu polynomials on the disk, a real orthonormal basis for L^2(D^2, (1-x^2-y^2)^β dx dy), and updates the holomorphic.c example to show the duality of the two approaches.
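To make the ADI idea concrete, here is a minimal sketch of the classical, unfactored ADI iteration, assuming the diagonal matrix equation has the Sylvester form A X - X B = F with diagonal A and B (the notes do not state the exact form). The factored variant mentioned above instead iterates on low-rank factors of X and never forms it densely; this sketch stores X in full for clarity. The function `adi_diag_sylvester` and its shift arguments are hypothetical.

```c
#include <stddef.h>

/* Classical (unfactored) ADI sketch for the Sylvester equation
 *     A X - X B = F,   A = diag(a) is m x m,   B = diag(b) is n x n,
 * with X and F stored column-major as m x n arrays. Since A and B are
 * diagonal, both half-steps act entry by entry. alpha[s] and beta[s] are
 * caller-supplied shifts; the name and conventions are hypothetical.
 * Assumes a[i] != beta[s] and b[j] != alpha[s] for all i, j, s. */
static void adi_diag_sylvester(size_t m, size_t n,
                               const double *a, const double *b,
                               const double *F, double *X, size_t nsteps,
                               const double *alpha, const double *beta) {
    for (size_t k = 0; k < m * n; k++)
        X[k] = 0.0;                      /* initial guess X_0 = 0 */
    for (size_t s = 0; s < nsteps; s++) {
        for (size_t j = 0; j < n; j++) {
            for (size_t i = 0; i < m; i++) {
                double x = X[i + j * m];
                double f = F[i + j * m];
                /* half-step 1: (A - beta I) X' = X (B - beta I) + F */
                x = (x * (b[j] - beta[s]) + f) / (a[i] - beta[s]);
                /* half-step 2: X'' (B - alpha I) = (A - alpha I) X' - F */
                x = ((a[i] - alpha[s]) * x - f) / (b[j] - alpha[s]);
                X[i + j * m] = x;
            }
        }
    }
}
```

Each step multiplies the error in entry (i, j) by (a_i - α)(b_j - β) / ((a_i - β)(b_j - α)), so shifts α near the spectrum of A and β near the spectrum of B give fast convergence when the two spectra are well separated. In the diagonal case the exact answer F_ij / (a_i - b_j) is of course available directly; ADI pays off when the right-hand side is low rank and only factors of X are kept.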
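For readers unfamiliar with linearizing a QEP, one standard choice is the first companion form, which turns (λ^2 M + λ C + K) x = 0 into a block 2x2 generalized eigenvalue problem. The notes do not say which linearization the library uses, so treat this as a generic illustration with generic matrices M, C, K.

```latex
(\lambda^2 M + \lambda C + K)\,x = 0
\quad\Longleftrightarrow\quad
\begin{pmatrix} 0 & I \\ -K & -C \end{pmatrix}
\begin{pmatrix} x \\ \lambda x \end{pmatrix}
= \lambda
\begin{pmatrix} I & 0 \\ 0 & M \end{pmatrix}
\begin{pmatrix} x \\ \lambda x \end{pmatrix}
```

The first block row is the identity λx = λx, and the second reads -Kx - λCx = λ^2 Mx, which is the original QEP. If M, C, and K are triangular banded, every block of this pencil is triangular banded as well, which is the block structure the divide-and-conquer methods above target.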

Support AArch64 linux

05 Aug 14:29
6d28165

The SIMD required here is predominantly double precision, so this invokes NEON on armv8-a, which is always available. This does not apply to armv7, though that distinction could be made later.

Also adds missing fallbacks for AVX and AVX_FMA spin-weighted spherical harmonic computational kernels.

Cleanup Code

29 Jul 04:56

Allow code to be compiled by clang, make quadmath optional, sequester x86-specific code, fix miscellaneous warnings generated by clang.

Fix C -- Julia BigFloat interoperability

15 May 02:53

Julia's BigFloat does not have the same size as MPFR's mpfr_t.
So all Julia-owned data is given its own address, which is dereferenced to retrieve the number.

AVX (FMA) support for spin-weighted spherical harmonics

05 May 01:21
  • fix tetrahedral drivers to use AVX (not AVX-512F)
  • Make docs more precise
  • Docs use an independent job on Travis

Spin-weighted spherical harmonics

26 Apr 17:41

New features in this release:

  • Support for spin-weighted spherical harmonic transforms. They are orthonormalized in L^2, with the complex Fourier series as longitudinal basis and with complex coefficients.
  • Three new functions, each available in float and double precision: Horner's rule, Clenshaw's algorithm for Chebyshev series, and Clenshaw's algorithm for general orthogonal polynomial series.
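For reference, the two recurrences named above fit in a few lines of C. These are generic textbook versions in double precision with hypothetical names (`horner`, `chebyshev_clenshaw`); the library's actual function names and argument conventions (strides, batches of evaluation points) may differ.

```c
#include <stddef.h>

/* Horner's rule: f(x) = sum_{k=0}^{n-1} c[k] x^k. */
static double horner(size_t n, const double *c, double x) {
    double f = 0.0;
    for (size_t k = n; k-- > 0; )
        f = f * x + c[k];
    return f;
}

/* Clenshaw's algorithm: f(x) = sum_{k=0}^{n-1} c[k] T_k(x), using
 *   b_k = c_k + 2x b_{k+1} - b_{k+2},   f = c_0 + x b_1 - b_2. */
static double chebyshev_clenshaw(size_t n, const double *c, double x) {
    if (n == 0)
        return 0.0;
    double bk1 = 0.0, bk2 = 0.0;  /* b_{k+1} and b_{k+2} */
    for (size_t k = n - 1; k >= 1; k--) {
        double bk = c[k] + 2.0 * x * bk1 - bk2;
        bk2 = bk1;
        bk1 = bk;
    }
    return c[0] + x * bk1 - bk2;
}
```

For an orthogonal polynomial family with three-term recurrence p_{k+1}(x) = (A_k x + B_k) p_k(x) - C_k p_{k-1}(x), the same backward recurrence applies with 2x replaced by A_k x + B_k and the coefficient of b_{k+2} replaced by C_{k+1}; the Chebyshev case above is the special choice A_k = 2, B_k = 0, C_k = 1 (with T_1 = x).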

Improvements in this release:

  • The code is now designed with a cross-compiler in mind. For performance-critical tasks, SIMD is hidden from the user interface and is instead dispatched based on the CPU ID. This allows a cross-compiler to include functions with more advanced SIMD than the host computer supports, while a runtime check ensures that only the best available SIMD level is dispatched (closes #12 and closes #41).
  • The computational kernels for the spherical/triangular/disk harmonics are refactored to not only use the correct types of registers, but also help the compiler maximize throughput. This relies on a property of Givens rotations that two adjacent rotations commute if they do not act on the same rows. This property allows one to re-order the Givens rotations to increase the ratio of computation to memory loads/stores. The computational kernels and execute drivers are largely generated by a macro, which means the code may already be prepared for AVX-1024 when the instruction sets are available in GCC. Part of this is the introduction of the ft_simd struct to store a bit-field of a variety of SIMD extensions.
  • The API for the computational kernels now includes transformation from orders m1 to m2 (rather than 0/1 to m), and includes a stride parameter in the data.
  • The real-to-real FFTW routines now use fftw_execute_dft_r2c and fftw_execute_dft_c2r instead of FFTW_R2HC and FFTW_HC2R-type real-to-real transforms to avoid a global transpose of the data.
  • The performance benchmark timings were not scaling as O(n^3) because a function must be called a few times, typically at least twice, before peak performance is realized. These timings are now updated, and the macro FT_TIME helps bring this support system-wide.
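Runtime CPU-ID dispatch in the spirit described above can be sketched as follows. The bit-field struct is a hypothetical stand-in for ft_simd (its real fields are not listed in these notes), the kernels are placeholders, and detection uses the GCC/Clang x86 builtins `__builtin_cpu_init` and `__builtin_cpu_supports`; on other targets every flag stays zero and the portable kernel is chosen.

```c
/* Hypothetical bit-field of SIMD extensions, standing in for ft_simd. */
typedef struct {
    unsigned sse2    : 1;
    unsigned avx     : 1;
    unsigned fma     : 1;
    unsigned avx512f : 1;
} simd_flags;

/* Query the running CPU (GCC/Clang builtins, x86 only). */
static simd_flags detect_simd(void) {
    simd_flags f = {0, 0, 0, 0};
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();
    f.sse2    = __builtin_cpu_supports("sse2") ? 1u : 0u;
    f.avx     = __builtin_cpu_supports("avx") ? 1u : 0u;
    f.fma     = __builtin_cpu_supports("fma") ? 1u : 0u;
    f.avx512f = __builtin_cpu_supports("avx512f") ? 1u : 0u;
#endif
    return f;
}

typedef double (*sum_kernel)(const double *x, int n);

/* Portable fallback kernel. */
static double sum_default(const double *x, int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Placeholder for an AVX kernel; plain C here so the sketch builds anywhere. */
static double sum_avx(const double *x, int n) {
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Pick the best kernel that the running CPU actually supports. */
static sum_kernel select_sum(simd_flags f) {
    return f.avx ? sum_avx : sum_default;
}
```

In a real build the AVX kernel would be compiled with AVX enabled (for instance in a separate translation unit), so a cross-compiler can ship it even if the build host lacks AVX, while the runtime check keeps it from executing on a CPU without it.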

New examples in this release:

  • spinweighted.c is a basic tutorial on how to use spin-weighted spherical harmonic transforms.

Releases no longer include attached binaries, since binaries compiled with -march=native may fail to run on a different host computer.

Ofast to O3

14 Jan 21:45
a015672

-Ofast enables -ffast-math, which ignores strict IEEE compliance. This affects the global state for denormal numbers through DAZ (denormals-are-zero) and FTZ (flush-to-zero).
Combined with -fno-protect-parens, -Ofast optimizes floating-point operations too aggressively in definitions such as

static FLT X(diff)(FLT x, FLT y) {return x - y;}

which has caused test_tdc to fail for some time. It should now be possible to reactivate this test.
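The denormal issue can be illustrated with a double-precision analogue of X(diff); the name `diff`, the helper `gradual_underflow_ok`, and the use of `double` in place of the FLT template type are illustrative only. Under IEEE 754 gradual underflow, x != y guarantees x - y != 0, because the gap between two nearby normal numbers is representable as a subnormal. When FTZ flushes that subnormal result to zero, the guarantee is lost.

```c
#include <float.h>

/* Double-precision analogue of the templated X(diff) above. */
static double diff(double x, double y) { return x - y; }

/* Returns 1 if gradual underflow is honoured at the bottom of the normal
 * range: 1.5*DBL_MIN - DBL_MIN = 0.5*DBL_MIN is subnormal and nonzero under
 * IEEE 754, but becomes 0.0 if flush-to-zero (FTZ) is active. */
static int gradual_underflow_ok(void) {
    volatile double x = 1.5 * DBL_MIN, y = DBL_MIN;
    double d = diff(x, y);
    return x != y && d != 0.0 && d < DBL_MIN;
}
```

Built with plain -O3, this should return 1; a program linked with -Ofast (which, with GCC on x86, sets FTZ and DAZ at startup) may return 0, and any algorithm that relies on x - y != 0 whenever x != y can then misbehave.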

Update CFLAGS

24 Nov 02:26

Add -march=native and related flags only as defaults: if CFLAGS is already defined, these flags are not added.

Expand platform support

21 Nov 04:37

If target is not defined, gcc -dumpversion is used. This helps in a cross-compiler environment where OS may not be defined.

-march=native, -mtune=native, and -mvzeroupper are only available on x86.

More versatile make options

20 Nov 22:33
3de585e
Make improvements (#40)

* introduce FT_LIBBLAS

with blas as the default fallback

check if linking to gmp or mpir is necessary

* position of libmath

* FT_PREFIX seems simpler than FT_USE_PREDEFINED_LIBRARIES

similarly for FT_LIBBLAS => FT_BLAS