Fast homogeneous and rotation matrix multiplications #1845
Draft: SamFlt wants to merge 26 commits into lagadic:master from SamFlt:fast_homogeneous_proj
711fe77 SIMD homogeneous matrix multiplication (SamFlt)
196660c implement first version of simd for rotation matrix (SamFlt)
729110a AVX implem for rotation matmul (SamFlt)
e230a72 Fix intrinsics usage, performance improvement (SamFlt)
1ff0b57 Simd version of the rbt dense depth, AVX matmul version for 3xN input… (SamFlt)
690c718 Export SIMD intrinsics utils in a separate header (SamFlt)
4c88d39 Fix test, improve vpRBDenseDepth (SamFlt)
77adf9d Remove debug prints (SamFlt)
1f3c7c8 Move initVVS to cpp file, resize matrix there (SamFlt)
cc729b6 Add ENABLE_NATIVE_ARCH option for gcc (SamFlt)
a05c329 Remove reference to MBT tukey estimator, disable prints (SamFlt)
d847bfb Fix SSE3 flag check (SamFlt)
c58271a Merge branch 'master' into fast_homogeneous_proj (fspindle)
c4bc02a Update copyright headers (fspindle)
49e8c83 Fix warning unused variable (fspindle)
ae5966b Remove useless empty lines (fspindle)
3f3bac2 Fix warning variable set but not used (fspindle)
adc26ab Remove vpSIMD namespace from doxygen doc (fspindle)
c6ff19a Fix bug when input vector is not transposed and AVX or SSE2 not avail… (fspindle)
51e89a3 Fix bug when input vector is not transposed and AVX or SSE3 not avail… (fspindle)
09fc8cd Cleanup tests to help debugging (fspindle)
f9f59e2 Fix _mm_hadd_pd() usage that requires SSE3 on pixi windows CI (fspindle)
06b044c Merge branch 'master' into fast_homogeneous_proj (fspindle)
7502eac Make vpSIMDUtils.h private to not expose SIMD code to the user (fspindle)
1f60797 Remove to make code more explicit (fspindle)
671b883 Make test independent from SIMD instruction set (fspindle)
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,231 @@ | ||
| /* | ||
| * ViSP, open source Visual Servoing Platform software. | ||
| * Copyright (C) 2005 - 2025 by Inria. All rights reserved. | ||
| * | ||
| * This software is free software; you can redistribute it and/or modify | ||
| * it under the terms of the GNU General Public License as published by | ||
| * the Free Software Foundation; either version 2 of the License, or | ||
| * (at your option) any later version. | ||
| * See the file LICENSE.txt at the root directory of this source | ||
| * distribution for additional information about the GNU GPL. | ||
| * | ||
| * For using ViSP with software that can not be combined with the GNU | ||
| * GPL, please contact Inria about acquiring a ViSP Professional | ||
| * Edition License. | ||
| * | ||
| * See https://visp.inria.fr for more information. | ||
| * | ||
| * This software was developed at: | ||
| * Inria Rennes - Bretagne Atlantique | ||
| * Campus Universitaire de Beaulieu | ||
| * 35042 Rennes Cedex | ||
| * France | ||
| * | ||
| * If you have questions regarding the use of this file, please contact | ||
| * Inria at visp@inria.fr | ||
| * | ||
| * This file is provided AS IS with NO WARRANTY OF ANY KIND, INCLUDING THE | ||
| * WARRANTY OF DESIGN, MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. | ||
| * | ||
| * Description: | ||
| * SIMD utilities. | ||
| */ | ||
|
|
||
| /*! | ||
| \file vpSIMDUtils.h | ||
| \brief Header that defines and includes useful SIMD routines and macros | ||
| */ | ||
|
|
||
| #ifndef VP_SIMD_UTILS_H | ||
| #define VP_SIMD_UTILS_H | ||
| #include <visp3/core/vpConfig.h> | ||
|
|
||
| #ifndef DOXYGEN_SHOULD_SKIP_THIS | ||
|
|
||
| #if defined __SSE2__ || defined _M_X64 || (defined _M_IX86_FP && _M_IX86_FP >= 2) | ||
| #include <emmintrin.h> | ||
| #include <immintrin.h> | ||
| #include <smmintrin.h> | ||
|
|
||
| #define VISP_HAVE_SSE2 1 | ||
| #endif | ||
|
|
||
| #if defined __AVX2__ | ||
| #define VISP_HAVE_AVX2 1 | ||
| #endif | ||
|
|
||
| #if defined __AVX__ | ||
| #define VISP_HAVE_AVX 1 | ||
| #endif | ||
|
|
||
| // https://stackoverflow.com/a/40765925 | ||
| #if !defined(__FMA__) && defined(__AVX2__) | ||
| #define __FMA__ 1 | ||
| #endif | ||
|
|
||
|
|
||
| #if defined(__FMA__) | ||
| #define VISP_HAVE_FMA | ||
| #endif | ||
|
|
||
| #if defined _WIN32 && defined(_M_ARM64) | ||
| #define _ARM64_DISTINCT_NEON_TYPES | ||
| #include <Intrin.h> | ||
| #include <arm_neon.h> | ||
| #define VISP_HAVE_NEON 1 | ||
| #elif (defined(__ARM_NEON__) || defined (__ARM_NEON)) && defined(__aarch64__) | ||
| #include <arm_neon.h> | ||
| #define VISP_HAVE_NEON 1 | ||
| #else | ||
| #define VISP_HAVE_NEON 0 | ||
| #endif | ||
|
|
||
| #if VISP_HAVE_SSE2 && USE_SIMD_CODE | ||
| #define USE_SSE 1 | ||
| #else | ||
| #define USE_SSE 0 | ||
| #endif | ||
|
|
||
| #if VISP_HAVE_NEON && USE_SIMD_CODE | ||
| #define USE_NEON 1 | ||
| #else | ||
| #define USE_NEON 0 | ||
| #endif | ||
|
|
||
| namespace vpSIMD | ||
| { | ||
| #if defined(VISP_HAVE_AVX2) | ||
| using Register = __m512d; | ||
|
|
||
| inline constexpr int numLanes = 8; | ||
| inline const Register add(const Register a, const Register b) | ||
| { | ||
| return _mm512_add_pd(a, b); | ||
| } | ||
|
|
||
| inline Register sub(const Register a, const Register b) | ||
| { | ||
| return _mm512_sub_pd(a, b); | ||
| } | ||
|
|
||
| inline Register mul(const Register a, const Register b) | ||
| { | ||
| return _mm512_mul_pd(a, b); | ||
| } | ||
|
|
||
| inline Register fma(const Register a, const Register b, const Register c) | ||
| { | ||
| #if defined(VISP_HAVE_FMA) | ||
| return _mm512_fmadd_pd(a, b, c); | ||
| #else | ||
| return add(mul(a, b), c); | ||
| #endif | ||
| } | ||
|
|
||
| inline Register loadu(const double *const data) | ||
| { | ||
| return _mm512_loadu_pd(data); | ||
| } | ||
|
|
||
| inline Register set1(double v) | ||
| { | ||
| return _mm512_set1_pd(v); | ||
| } | ||
|
|
||
| inline void storeu(double *data, const Register a) | ||
| { | ||
| _mm512_storeu_pd(data, a); | ||
| } | ||
|
|
||
| #elif defined(VISP_HAVE_AVX) | ||
| using Register = __m256d; | ||
| inline const int numLanes = 4; | ||
|
|
||
| inline Register add(const Register a, const Register b) | ||
| { | ||
| return _mm256_add_pd(a, b); | ||
| } | ||
|
|
||
| inline Register sub(const Register a, const Register b) | ||
| { | ||
| return _mm256_sub_pd(a, b); | ||
| } | ||
|
|
||
| inline Register mul(const Register a, const Register b) | ||
| { | ||
| return _mm256_mul_pd(a, b); | ||
| } | ||
|
|
||
| inline Register fma(const Register a, const Register b, const Register c) | ||
| { | ||
| #if defined(VISP_HAVE_FMA) | ||
| return _mm256_fmadd_pd(a, b, c); | ||
| #else | ||
| return add(mul(a, b), c); | ||
| #endif | ||
| } | ||
|
|
||
| inline Register loadu(const double *const data) | ||
| { | ||
| return _mm256_loadu_pd(data); | ||
| } | ||
|
|
||
| inline Register set1(double v) | ||
| { | ||
| return _mm256_set1_pd(v); | ||
| } | ||
|
|
||
| inline void storeu(double *data, const Register a) | ||
| { | ||
| _mm256_storeu_pd(data, a); | ||
| } | ||
|
|
||
| #elif VISP_HAVE_SSE2 | ||
| using Register = __m128d; | ||
| inline const int numLanes = 2; | ||
|
|
||
| inline Register add(const Register a, const Register b) | ||
| { | ||
| return _mm_add_pd(a, b); | ||
| } | ||
|
|
||
| inline Register sub(const Register a, const Register b) | ||
| { | ||
| return _mm_sub_pd(a, b); | ||
| } | ||
|
|
||
| inline Register mul(const Register a, const Register b) | ||
| { | ||
| return _mm_mul_pd(a, b); | ||
| } | ||
|
|
||
| inline Register fma(const Register a, const Register b, const Register c) | ||
| { | ||
| #if defined(VISP_HAVE_FMA) | ||
| return _mm_fmadd_pd(a, b, c); | ||
| #else | ||
| return add(mul(a, b), c); | ||
| #endif | ||
| } | ||
|
|
||
| inline Register loadu(const double *const data) | ||
| { | ||
| return _mm_loadu_pd(data); | ||
| } | ||
|
|
||
| inline Register set1(double v) | ||
| { | ||
| return _mm_set1_pd(v); | ||
| } | ||
|
|
||
| inline void storeu(double *data, const Register a) | ||
| { | ||
| _mm_storeu_pd(data, a); | ||
| } | ||
|
|
||
| #endif | ||
|
|
||
| } | ||
|
|
||
| #endif // DOXYGEN_SHOULD_SKIP_THIS | ||
| #endif // VP_SIMD_UTILS_H | ||
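For readers following the header above, here is a minimal sketch of how a width-agnostic wrapper of this kind is typically consumed: a vectorised main loop stepping by `numLanes`, followed by a scalar tail. It is not the PR's code. The namespace `simdSketch` and the function `axpy` are hypothetical names chosen for illustration. The sketch keys the 512-bit path on `__AVX512F__`, because `__m512d` and the `_mm512_*` intrinsics belong to AVX-512F rather than AVX2, and it adds a scalar fallback (an assumption, not in the header) so that it compiles on non-x86 targets.

```cpp
#include <cstddef>

// Pick the widest register the compiler was told it may use.
// The first test is __AVX512F__, not __AVX2__: AVX2 registers are 256-bit.
#if defined(__AVX512F__)
#include <immintrin.h>
namespace simdSketch  // hypothetical namespace, not ViSP's vpSIMD
{
using Register = __m512d;
constexpr std::size_t numLanes = 8;
inline Register loadu(const double *p) { return _mm512_loadu_pd(p); }
inline Register set1(double v) { return _mm512_set1_pd(v); }
inline Register fma(Register a, Register b, Register c) { return _mm512_fmadd_pd(a, b, c); }
inline void storeu(double *p, Register a) { _mm512_storeu_pd(p, a); }
}
#elif defined(__AVX__)
#include <immintrin.h>
namespace simdSketch
{
using Register = __m256d;
constexpr std::size_t numLanes = 4;
inline Register loadu(const double *p) { return _mm256_loadu_pd(p); }
inline Register set1(double v) { return _mm256_set1_pd(v); }
inline Register fma(Register a, Register b, Register c)
{
#if defined(__FMA__)
  return _mm256_fmadd_pd(a, b, c);
#else
  return _mm256_add_pd(_mm256_mul_pd(a, b), c);
#endif
}
inline void storeu(double *p, Register a) { _mm256_storeu_pd(p, a); }
}
#elif defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h>
namespace simdSketch
{
using Register = __m128d;
constexpr std::size_t numLanes = 2;
inline Register loadu(const double *p) { return _mm_loadu_pd(p); }
inline Register set1(double v) { return _mm_set1_pd(v); }
inline Register fma(Register a, Register b, Register c) { return _mm_add_pd(_mm_mul_pd(a, b), c); }
inline void storeu(double *p, Register a) { _mm_storeu_pd(p, a); }
}
#else
namespace simdSketch  // scalar fallback: one "lane" per register
{
struct Register { double v; };
constexpr std::size_t numLanes = 1;
inline Register loadu(const double *p) { Register r; r.v = *p; return r; }
inline Register set1(double v) { Register r; r.v = v; return r; }
inline Register fma(Register a, Register b, Register c) { Register r; r.v = a.v * b.v + c.v; return r; }
inline void storeu(double *p, Register a) { *p = a.v; }
}
#endif

// y[i] += alpha * x[i] for i in [0, n)
inline void axpy(double alpha, const double *x, double *y, std::size_t n)
{
  const simdSketch::Register va = simdSketch::set1(alpha);
  std::size_t i = 0;
  // Vectorised part: process numLanes doubles per iteration.
  for (; i + simdSketch::numLanes <= n; i += simdSketch::numLanes) {
    const simdSketch::Register vx = simdSketch::loadu(x + i);
    const simdSketch::Register vy = simdSketch::loadu(y + i);
    simdSketch::storeu(y + i, simdSketch::fma(va, vx, vy));
  }
  // Scalar tail: the remaining n % numLanes elements.
  for (; i < n; ++i) {
    y[i] += alpha * x[i];
  }
}
```

The caller never names a register width, so the same loop compiles for every branch; only the selection macros decide which intrinsics are emitted.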
Two comments.
1. AVX2 != 512-bit registers:
   - `cat /proc/cpuinfo` to see which AVX-512 variants your CPU has
2. I would definitely not expose SIMD code to the user:
   - in a `.h`, with `march=native` and running ViSP on another computer, there is some chance of a SIGILL crash on an old CPU
   - a `.so` can have all the more advanced instruction sets
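One common way to address the SIGILL concern raised in the second comment is to keep the SIMD kernels inside the compiled library and choose between them at run time. The sketch below illustrates that idea only and is not what the PR implements. The names `scale`, `selectScale`, `scaleAvx`, `scaleScalar` and `SKETCH_X86_DISPATCH` are hypothetical. It assumes GCC or Clang on x86, where `__builtin_cpu_supports` and the `target` function attribute are available as compiler extensions; other compilers, for example MSVC with its `__cpuid` intrinsic, would need a different detection path, and here they simply fall back to the scalar loop.

```cpp
#include <cstddef>

#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#define SKETCH_X86_DISPATCH 1
#include <immintrin.h>
#else
#define SKETCH_X86_DISPATCH 0
#endif

// Portable baseline: runs on any CPU.
static void scaleScalar(double alpha, double *data, std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i) {
    data[i] *= alpha;
  }
}

#if SKETCH_X86_DISPATCH
// Compiled with AVX enabled for this function only, so the rest of the
// translation unit does not need -mavx or -march=native.
__attribute__((target("avx")))
static void scaleAvx(double alpha, double *data, std::size_t n)
{
  const __m256d va = _mm256_set1_pd(alpha);
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    _mm256_storeu_pd(data + i, _mm256_mul_pd(va, _mm256_loadu_pd(data + i)));
  }
  for (; i < n; ++i) {
    data[i] *= alpha;
  }
}
#endif

using ScaleFn = void (*)(double, double *, std::size_t);

// Query the CPU and return the best kernel it can execute.
static ScaleFn selectScale()
{
#if SKETCH_X86_DISPATCH
  __builtin_cpu_init();
  if (__builtin_cpu_supports("avx")) {
    return scaleAvx;
  }
#endif
  return scaleScalar;
}

// Public entry point: the kernel is chosen once, on first use.
void scale(double alpha, double *data, std::size_t n)
{
  static const ScaleFn fn = selectScale();
  fn(alpha, data, n);
}
```

With this layout a public header only needs to declare `scale`; no intrinsic type or instruction-set macro appears in user code, and the AVX kernel is called only on CPUs that report support for it.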