This library aims to provide a simple, very high-performance way to copy strided data with a 2D layout between GPU devices, and between CPU and GPU memory spaces.
It is implemented in C++ and SYCL, and provides a C++ API.
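To make the layout terminology concrete, here is an illustrative sketch (not part of the library's API) of what a 2D strided copy computes when source and destination share the same layout; the parameter names mirror the data_layout fields (offset, length, count, stride) used in the usage example below:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Reference semantics only: copy `count` fragments (rows) of `length` bytes each,
// with consecutive fragments starting `stride` bytes apart. All values are in bytes.
void strided_copy_reference(std::byte* dst, const std::byte* src, std::int64_t offset,
    std::int64_t length, std::int64_t count, std::int64_t stride) {
	for(std::int64_t i = 0; i < count; ++i) {
		// fragment i starts at offset + i * stride in both buffers
		std::memcpy(dst + offset + i * stride, src + offset + i * stride, length);
	}
}
```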
- A SYCL implementation; the library has been tested with SimSYCL (in CI), DPC++, and AdaptiveCpp
- CMake 3.23.5 or later
- [optional] CUDA Toolkit for CUDA-backend-specific interop features
This library uses CMake as its build system.
To build, run the following commands, adjusted for your environment:
mkdir build
cd build
cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Release -DCMAKE_PREFIX_PATH=/path/to/sycl
ninja
The CMake script will report which SYCL implementation it has found and is using.
COPYLIB_USE_MIMALLOC
: Use mimalloc as the memory allocator. Default: ON

COPYLIB_USE_FMT
: Use the fmt library instead of relying on C++20 std::format. Default: ON
Both dependencies are fetched automatically if their use is enabled.
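For example, to configure a build with both optional dependencies disabled:

```sh
cmake .. -G Ninja -DCMAKE_BUILD_TYPE=Release -DCOPYLIB_USE_MIMALLOC=OFF -DCOPYLIB_USE_FMT=OFF
```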
To use the library for copy operations, first create an instance of an executor.
Then, copy operations can be specified, turned into optimized parallel copy sets, and executed.
Here is a complete example which manually specifies all parameters to serve as a reference:
#include <copylib/copylib.hpp>
using namespace copylib;
// === 1. Initialization
const int64_t buffer_size = 128 * 1024 * 1024; // 128 MiB for staging buffers
const int64_t queues_per_device = 2; // number of in-order queues per device for asynchronicity
executor exec(buffer_size, 2, queues_per_device); // create an executor
utils::print(exec.get_info()); // [optional] print information about the execution environment
// === 2. Specifying a copy operation
// provide these pointers as appropriate to source and target memory
intptr_t src_ptr = 0x1; // pointer to the source
intptr_t dst_ptr = 0x2; // pointer to the destination
// all the following values are in bytes
const int64_t offset = 0; // offset of the data to copy from the start of the buffer
const int64_t length = 1024; // length of one fragment (row) of data to copy
const int64_t count = 1024; // number of fragments (rows) to copy
const int64_t stride = 1024*1024; // stride between the start of two consecutive fragments (rows)
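// with these values, fragment i starts at byte i * 1 MiB: 1024 fragments of
// 1 KiB each, i.e. 1 MiB of payload spread across a ~1 GiB address range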
const data_layout source_layout{src_ptr, offset, length, count, stride}; // source data layout
const data_layout target_layout{dst_ptr, source_layout}; // target data layout, same structure as the source
// copy from device 0 to device 1
const copy_spec spec{device_id::d0, source_layout, device_id::d1, target_layout};
COPYLIB_ENSURE(is_valid(spec), "Invalid copy spec: {}", spec); // [optional] check if the copy spec is valid
// === 3. Manifesting the copy operations into an optimized parallel copy set
const copy_type type = copy_type::staged; // perform linearization
const copy_properties props = copy_properties::use_kernel; // use a kernel for linearization, generally faster
const d2d_implementation d2d = d2d_implementation::host_staging_at_source; // use manual host staging
const int64_t chunk_size = 1024*1024; // generate 1 MiB chunks
const strategy strat(type, props, d2d, chunk_size); // create a strategy
const auto copy_set = manifest_strategy(spec, strat, basic_staging_provider{}); // manifest the copy set
COPYLIB_ENSURE(is_valid(copy_set), "Invalid copy set: {}", copy_set); // [optional] validate the copy set
// === 4. Executing the copy set
execute_copy(exec, copy_set);
Some benchmarks and utilities are provided:
utils/info
: Print information about the execution environment and its features

benchmarks/manifest
: Micro-benchmark measuring strategy manifesting performance

benchmarks/intra_device
: Benchmark for intra-device linearization performance

benchmarks/chunk_parallel
: Benchmark for optimized device-to-device copy performance

benchmarks/full_set
: Perform a very large run of various benchmarks to characterize platform performance
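Assuming the executables end up in matching subdirectories of the build tree (adjust paths for your environment), they can be run directly after building:

```sh
./utils/info           # inspect the execution environment first
./benchmarks/manifest  # then run e.g. the manifesting micro-benchmark
```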