Distributed QEM Simplification is a high-performance C++ implementation for 3D mesh simplification based on Quadric Error Metrics (QEM). It provides multiple implementations to handle massive meshes using both Shared Memory parallelism and Distributed Memory clusters.
- Parallel QEM: High-speed edge collapse algorithm using OpenMP and FastFlow.
- Spatial Partitioning: Supports both Uniform Grid and Octree decomposition for workload balancing.
- Distributed Computing: Scalable MPI implementations (Hybrid and Full MPI) for processing meshes across multiple nodes.
- Modern C++: Built with C++23 features for efficiency and robust memory management.
- Topology Management: Powered by OpenMesh for reliable half-edge data structures.
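As background, in the standard Garland-Heckbert formulation (the first reference below), every vertex accumulates a quadric from the planes of its incident faces, and the cost of collapsing an edge $(v_1, v_2)$ into a point $\bar{\mathbf{v}}$ is the summed quadric error:

$$
K_p = \mathbf{p}\,\mathbf{p}^{\top},\qquad
Q_v = \sum_{p \in \mathrm{planes}(v)} K_p,\qquad
\Delta(\bar{\mathbf{v}}) = \bar{\mathbf{v}}^{\top}\,(Q_{v_1} + Q_{v_2})\,\bar{\mathbf{v}},
$$

where $\mathbf{p} = (a, b, c, d)^{\top}$ encodes a face plane $ax + by + cz + d = 0$ (with $a^2 + b^2 + c^2 = 1$) and $\bar{\mathbf{v}}$ is the homogeneous position of the merged vertex, usually chosen to minimize $\Delta$. Collapses are applied greedily in order of increasing cost until the target face count is reached.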
- Surface Simplification Using Quadric Error Metrics
- Scalable Algorithms for Distributed-Memory Adaptive Mesh Refinement
- Mesh Simplification in Parallel
- Distributed Processing of Mesh Simplification
- External Memory Management and Simplification of Huge Meshes
- cxxopts for CLI parsing
- Eigen for math computations
- OpenMesh for mesh topology management
- cpp-utils for generic utilities
- FastFlow for parallel pattern testing
- CMake as the build system
- C++: >= C++23
- OpenMPI: >= 5.0
- OpenMP: >= 6.1
- CMake: >= 3.20
- Compiler: GCC 13+ / Clang 16+ / MSVC with C++23 support
```bash
git clone https://github.com/bigmat18/distributed-qem-simplification.git
cd distributed-qem-simplification
git submodule update --init --recursive

./SETUP.sh <type>
./BUILD.sh <type>
```

The `<type>` parameter can be `debug`, `release`, or `reldeb`, which enable different optimization levels.
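For example, `./SETUP.sh release` followed by `./BUILD.sh release` configures and builds the optimized variant (assuming, as shown above, that the scripts take the build type as their only argument).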
The project generates different executables based on the parallelization strategy:
Shared Memory (OMP/FastFlow):
```bash
# Uniform Grid OpenMP
./build/examples/02-omp-uniform input.obj -n <target_faces_num>

# Octree OpenMP
./build/examples/03-omp-octree input.obj -n <target_faces_num>

# FastFlow Uniform Grid
./build/examples/04-ff-uniform input.obj -n <target_faces_num>
```

Main Options:
- `-i, --filename`: Input filename list
- `-t, --threads`: Number of threads (default: -1, uses all available)
- `-p, --partitions`: Starting number of partitions (default: 16)
- `-n, --target`: Target number of faces
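To give a feel for how the shared-memory variants are organized, the sketch below shows one plausible way to simplify the spatial partitions in parallel with OpenMP. It is illustrative only: `Partition`, `simplify_partition`, and `simplify_shared_memory` are hypothetical names, not the project's actual API.

```cpp
#include <omp.h>
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical type for illustration only: a Partition would own the faces that
// fall into one uniform-grid cell (or octree leaf) of the spatial decomposition.
struct Partition {
    std::size_t num_faces = 0;
    // ... cell-local mesh data (vertices, half-edges, per-vertex quadrics) ...
};

// Placeholder for the per-cell QEM pass: collapse edges in order of increasing
// quadric error until the cell reaches its face budget.
void simplify_partition(Partition& p, std::size_t target_faces) {
    p.num_faces = std::min(p.num_faces, target_faces);
}

// Sketch of the shared-memory strategy: cells are simplified independently by
// OpenMP threads; edges crossing partition boundaries need a separate pass.
void simplify_shared_memory(std::vector<Partition>& partitions,
                            std::size_t total_target, int num_threads) {
    if (num_threads > 0) omp_set_num_threads(num_threads);   // -1 => all cores
    const std::size_t per_cell =
        total_target / std::max<std::size_t>(std::size_t{1}, partitions.size());

    #pragma omp parallel for schedule(dynamic)
    for (std::size_t i = 0; i < partitions.size(); ++i)
        simplify_partition(partitions[i], per_cell);
}

int main() {
    std::vector<Partition> cells(16, Partition{3511});        // 16 starting partitions
    simplify_shared_memory(cells, 1000, /*num_threads=*/-1);  // target: 1,000 faces
}
```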
Distributed Memory (MPI):
Use the provided bash script to launch MPI executions. The script safely wraps `mpirun`, handling hardware thread detection, CPU binding, and OpenMP thread mapping for the master and worker processes.
```bash
./RUN.sh <type> <exe> -nw 4 -wt 2 -mt 4 -i data_folder/ -p 16 -t 10
```

Script Usage: `./RUN.sh <type> <exe> [options]`
- `<type>`: `debug`, `release`, or `reldeb`
- `<exe>`: Target executable (e.g., `06-mpi-omp`, `07-mpi`)
- `-nw`: Number of MPI worker processes
- `-wt`: Number of OpenMP threads per worker
- `-mt`: Number of OpenMP threads for the master
- `-p`: Number of partitions
- `-t`: Target percentage
- `-pr`: Enable performance profiling via `perf record`
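As a rough sketch of the hybrid architecture (illustrative only; the project's actual rank layout, message formats, and option handling may differ), a hybrid MPI+OpenMP program typically requests a threaded MPI environment and gives the master and the workers different OpenMP team sizes, mirroring the `-mt` and `-wt` options:

```cpp
#include <mpi.h>
#include <omp.h>
#include <cstdio>

// Minimal hybrid MPI+OpenMP skeleton: rank 0 acts as the master (partitioning and
// coordination), all other ranks act as workers that would simplify their
// partitions with a local team of OpenMP threads.
int main(int argc, char** argv) {
    int provided = 0;
    // FUNNELED: only the thread that initialized MPI makes MPI calls, which is
    // enough when OpenMP is used purely for node-local computation.
    MPI_Init_thread(&argc, &argv, MPI_THREAD_FUNNELED, &provided);

    int rank = 0, size = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    const int master_threads = 4;  // would come from -mt
    const int worker_threads = 2;  // would come from -wt
    omp_set_num_threads(rank == 0 ? master_threads : worker_threads);

    #pragma omp parallel
    {
        #pragma omp single
        std::printf("rank %d/%d running with %d OpenMP threads\n",
                    rank, size, omp_get_num_threads());
    }

    // ... master: read mesh, build partitions, scatter work, gather results ...
    // ... workers: receive partitions, run local QEM simplification, send back ...

    MPI_Finalize();
    return 0;
}
```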
- Figure: Uniform Grid Reduction, Stanford Bunny from 56,172 faces to 1,000.
- Figure: Octree Reduction, Stanford Bunny from 56,172 faces to 1,000 with wireframe visualization.
The performance of the various parallel implementations (Shared Memory and Distributed Memory) was evaluated against a sequential baseline and MeshLab's decimation algorithm. The efficiency of these parallel solutions is highly correlated with the input model's complexity. Small meshes, such as the Bunny (68K faces), show limited improvement due to thread management and communication overhead. Conversely, massive models like Lucy (28M faces) achieve up to a 10x speedup in distributed environments.
Figures: benchmark models used for the evaluation: Bunny (68K triangles), Armadillo (300K triangles), Dragon (7.5M triangles), and Lucy (28M triangles).
Strong scaling analysis highlights the operational limits of both architectures. Shared memory approaches scale well initially but saturate near 32 cores due to Amdahl's Law, synchronization overhead, and memory bandwidth limits. Distributed implementations (Full MPI and Hybrid MPI+OMP) process massive datasets efficiently but face severe communication bottlenecks and load imbalance beyond 128 cores, primarily driven by I/O constraints.
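For reference, the shared-memory saturation follows directly from Amdahl's law: if a fraction $f$ of the work parallelizes, the speedup on $p$ cores is bounded by

$$
S(p) \;=\; \frac{1}{(1 - f) + f/p} \;\le\; \frac{1}{1 - f},
$$

so even with $f = 0.95$ the speedup can never exceed 20x, and synchronization and memory-bandwidth costs push the practical ceiling lower still.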
- Figure: Shared Memory Strong Scaling, execution time variation up to 32 physical cores.
- Figure: Distributed Memory Scalability, comparison of Hybrid MPI+OMP and Full MPI architectures.
Detailed Documentation: For a comprehensive explanation of the mathematical models, spatial partitioning strategies (Uniform Grid vs. Octree), and in-depth performance analysis, please refer to the full project report:
📄 Relation_SPM.pdf