Get the latest 0.2.x release from the [releases](https://github.com/allisterb/jemalloc.NET/releases) page.
jemalloc.NET is a .NET API over the [jemalloc](http://jemalloc.net/) native memory allocator and provides .NET applications with efficient data structures backed by native memory for large-scale in-memory computation scenarios. jemalloc is "a general purpose malloc(3) implementation that emphasizes fragmentation avoidance and scalable concurrency support" that is [widely used](https://github.com/jemalloc/jemalloc/wiki/Background#adoption) in the industry, particularly in applications that must [scale and utilize](http://highscalability.com/blog/2015/3/17/in-memory-computing-at-aerospike-scale-when-to-choose-and-ho.html) large amounts of memory. In addition to its fragmentation and concurrency optimizations, jemalloc provides an array of developer options for debugging, monitoring, and tuning allocations, making it a great choice for developing memory-intensive applications.
The jemalloc.NET project provides:
* A low-level .NET API over the native jemalloc API functions like `je_malloc`, `je_calloc`, `je_free`, `je_mallctl`, and others.
* A safety-focused high-level .NET API providing data structures like arrays backed by native memory allocated using jemalloc together with management features like reference counting and acceleration features like SIMD vectorized operations via the `Vector<T>` .NET type.
* A benchmark CLI program, `jembench`, which uses the excellent [BenchmarkDotNet](http://benchmarkdotnet.org/index.htm) library for easy and accurate benchmarking of operations on native data structures vs. managed objects using different parameters.
The high-level .NET API makes use of newly introduced C# and .NET features and classes like ref returns, `Span<T>`, `Vector<T>`, and the `Unsafe` class from the `System.Runtime.CompilerServices.Unsafe` library for working effectively with pointers to managed and unmanaged memory.
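As a rough illustration of how these features fit together (this is not jemalloc.NET-specific code; `Marshal.AllocHGlobal` stands in here for a jemalloc allocation), a `Span<T>` can be constructed directly over unmanaged memory and then used like a normal, bounds-checked array:

```csharp
using System;
using System.Runtime.InteropServices;

class SpanOverNativeMemory
{
    static unsafe void Main()
    {
        const int length = 1000;
        // Allocate unmanaged memory (stand-in for a je_malloc call made by jemalloc.NET).
        IntPtr p = Marshal.AllocHGlobal(length * sizeof(int));
        try
        {
            // Wrap the raw pointer in a Span<int>: no copying, and reads/writes are bounds checked.
            var span = new Span<int>((void*)p, length);
            for (int i = 0; i < span.Length; i++)
                span[i] = i;
            Console.WriteLine(span[999]); // 999
        }
        finally
        {
            Marshal.FreeHGlobal(p);
        }
    }
}
```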
Data structures provided by the high-level API are more efficient than managed .NET arrays and objects at the scale of millions of elements, and memory allocation is much more resistant to fragmentation, while still providing necessary safety features like array bounds checking. Large .NET arrays must be allocated on the Large Object Heap and are not relocatable, which leads to fragmentation and lower performance. For example, in a `jembench` benchmark on my laptop, simply filling an array is more or less the same across different kinds of memory and scales linearly with the size of the array, but *allocating* and filling a `UInt64[]` managed array of size 10000000 or 100000000 is more than 2x slower than using an equivalent native array provided by jemalloc.NET.
An `Int32[]` of maximum size can be allocated and filled in 3.2 s. This array consumes 8.6 GB on the managed heap. But a jemalloc.NET `HugeArray<Int32>` of nearly double the size, at 4.2 billion elements, can be allocated in only 4 s and consumes no memory on the managed heap. The only limit on the size of a `HugeArray<T>` is the available system memory.
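For context, a hedged sketch of what allocating such an array might look like; the `HugeArray<T>` constructor and 64-bit indexer shown here are assumptions about the API rather than documented signatures:

```csharp
// Sketch only: the constructor and long-based indexer are assumed for illustration.
var huge = new HugeArray<int>(4_200_000_000L);  // ~4.2 billion elements, backed by jemalloc native memory
huge[4_000_000_000L] = 123;                     // indexes past Int32.MaxValue need a 64-bit index
Console.WriteLine(huge[4_000_000_000L]);        // 123
```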
Perhaps the killer feature of the [recently introduced](https://blogs.msdn.microsoft.com/dotnet/2017/11/15/welcome-to-c-7-2-and-span/) `Span<T>` class in .NET is its ability to efficiently zero-copy reinterpret numeric data structures (`Int32`, `Int64`, and their siblings) into other structures like the `Vector<T>` SIMD-enabled data types introduced in 2016. `Vector<T>` types are special in that the .NET RyuJIT compiler can compile operations on Vectors to use SIMD instructions like SSE, SSE2, and AVX for parallelizing operations on data on a single CPU core.
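A minimal sketch of this reinterpretation using only standard .NET APIs (nothing jemalloc.NET-specific): a `Span<ulong>` over an array is cast, without copying, to a `Span<Vector<ulong>>` so that each loop iteration works on several elements at once:

```csharp
using System;
using System.Numerics;
using System.Runtime.InteropServices;

class VectorReinterpret
{
    static void Main()
    {
        ulong[] data = new ulong[1024];
        for (int i = 0; i < data.Length; i++) data[i] = (ulong)i;

        // Zero-copy reinterpretation: the same memory viewed as SIMD-sized vectors.
        Span<Vector<ulong>> vectors = MemoryMarshal.Cast<ulong, Vector<ulong>>(data);
        var factor = new Vector<ulong>(2);
        for (int i = 0; i < vectors.Length; i++)
            vectors[i] *= factor; // RyuJIT can emit SSE/AVX instructions for this where supported

        Console.WriteLine(data[3]); // 6
    }
}
```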
Using the SIMD-enabled `SafeBuffer<T>.VectorMultiply(n)` method provided by the jemalloc.NET API yields a more than 12x speedup for a simple in-place multiplication of a `UInt64[]` array of 10 million elements, compared to the unoptimized linear approach, allowing the operation to complete in 60 ms.
For a huge array with 4.1 billion `UInt16` values, it takes 12 seconds to do a SIMD-enabled multiplication operation on all the elements of the array. This is still 3x the performance of doing the same non-vectorized operation on a managed array of half the size.
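A hedged sketch of what using the vectorized helper might look like; `VectorMultiply` is the method named above, but the constructor and the `Fill` helper shown here are assumptions about the API, not documented signatures:

```csharp
// Sketch only: the SafeBuffer<T>(length) constructor and Fill helper are assumed for illustration.
var buffer = new SafeBuffer<ulong>(10_000_000); // 10 million elements in jemalloc native memory
buffer.Fill(7UL);                               // hypothetical helper to initialize every element
buffer.VectorMultiply(4UL);                     // SIMD-accelerated in-place multiply described above
```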
Inside a .NET application, jemalloc.NET native arrays and data structures can be straightforwardly accessed by native libraries without the need to make additional copies or allocations. The goal of the jemalloc.NET project is to make accessible to .NET the kind of big-data in-memory numeric, scientific and other computing that typically would require coding in a low-level language like C/C++ or assembly.
## How it works
Memory allocations are made using the jemalloc C functions like `je_malloc`, `je_calloc`, etc., and returned as `IntPtr`. Memory alignment is handled by jemalloc, and memory allocations (called extents) are aligned to a multiple of the jemalloc page size, e.g. 4096. Inside extents, elements are aligned according to the .NET struct alignment specified using the `StructLayout` and other attributes. In the future, jemalloc's ability to manually align each extent (e.g. using the `je_posix_memalign` function) may be exposed to alleviate potential [performance problems](http://adrianchadd.blogspot.se/2015/03/cache-line-aliasing-2-or-what-happens.html) with the default jemalloc page alignment.
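For illustration, the kind of P/Invoke declarations the low-level API is built on might look like the following; this is a sketch that assumes the native library name `jemallocd`, and the actual jemalloc.NET bindings may differ in naming and calling details:

```csharp
using System;
using System.Runtime.InteropServices;

static class Je
{
    // C: void *je_malloc(size_t size);
    [DllImport("jemallocd", CallingConvention = CallingConvention.Cdecl)]
    internal static extern IntPtr je_malloc(UIntPtr size);

    // C: void je_free(void *ptr);
    [DllImport("jemallocd", CallingConvention = CallingConvention.Cdecl)]
    internal static extern void je_free(IntPtr ptr);
}

// Usage sketch: allocate 4096 bytes from jemalloc, then return them.
// IntPtr p = Je.je_malloc((UIntPtr)4096);
// ...
// Je.je_free(p);
```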
Each allocation pointer is tracked in a thread-safe allocations ledger together with a reference count. Each valid jemalloc.NET data structure has a pointer in the allocations ledger. There are other allocation ledgers that track details of different data structures, like `FixedBufferAllocations`.
Any attempt to read or write data structure memory locations is guarded by checking that the data structure owns a valid pointer to the memory location. Data structures are provided with reference counting methods to manage their lifetimes.
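A purely conceptual sketch of the ledger and reference-counting idea described above; this is not the library's actual implementation, just an illustration of the bookkeeping involved:

```csharp
using System;
using System.Collections.Concurrent;

// Conceptual only: jemalloc.NET's real ledger is internal to the library.
class AllocationLedger
{
    private readonly ConcurrentDictionary<IntPtr, int> _refCounts =
        new ConcurrentDictionary<IntPtr, int>();

    public void Register(IntPtr ptr) => _refCounts[ptr] = 1;        // a new allocation starts with one reference

    public bool IsValid(IntPtr ptr) => _refCounts.ContainsKey(ptr); // guard consulted before any read or write

    public void Retain(IntPtr ptr) => _refCounts.AddOrUpdate(ptr, 1, (_, c) => c + 1);

    public bool Release(IntPtr ptr)
    {
        int remaining = _refCounts.AddOrUpdate(ptr, 0, (_, c) => c - 1);
        if (remaining <= 0) _refCounts.TryRemove(ptr, out _);       // last reference gone: memory can be freed
        return remaining <= 0;
    }
}
```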
The primitive .NET types are always correctly aligned for SIMD vectorized operations, and operations like `SafeArray<T>.VectorMultiply()` can be significantly faster than the non-optimized variants.
## Installation
### Requirements
Grab the latest release from the [releases](https://github.com/allisterb/jemalloc.NET/releases) page.
Note that if you use jemalloc.NET in your own projects, you must put the native `jemallocd.dll` library somewhere it can be located by the .NET runtime. You can create a post-build step to copy it to the output folder of your project or put it somewhere on your %PATH%.
## Usage
### API
Currently there are 4 implemented data structures:
1. `FixedBuffer<T>` is a fixed-length array of data backed by memory on the unmanaged heap that is implemented as a .NET `struct`. The underlying data type of a `FixedBuffer<T>` must be a [primitive](https://msdn.microsoft.com/en-us/library/system.type.isprimitive.aspx) type: Boolean, Byte, SByte, Int16, UInt16, Int32, UInt32, Int64, UInt64, IntPtr, UIntPtr, Char, Double, or Single. Primitive types have a maximum width of 64 bits and, assuming correct memory alignment, can be read and updated atomically on machines with a 64-bit word size. This eliminates the possibility of 'struct tearing', or struct fields being read or written inconsistently by multiple concurrent operations, as writes are only done directly to the underlying memory location for the data element. All properties and fields of `FixedBuffer<T>` are read-only, with only the index operator `[]` returning a `ref T` to the underlying data element. This allows data elements to be writable while preserving array copy-on-read semantics, since it is illegal to set a ref variable to a property accessor in C#, e.g.:
```csharp
FixedBuffer<int> b = ...;
int num = b[15];       // works
b[15] = 6;             // works
ref int r = ref b[15]; // compiler error
```
Although `FixedBuffer<T>` is a .NET value type, only the metadata about the data structure is copied from one variable assignment to another. The actual data resides at only a single location in memory. Each data access to a `T` element in a `FixedBuffer<T>` reads and writes the same memory location.
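A small sketch of what this means in practice; the `FixedBuffer<int>(length)` constructor shown here is an assumption about the API used only for illustration:

```csharp
// Sketch only: the constructor is assumed for illustration.
FixedBuffer<int> a = new FixedBuffer<int>(100);
FixedBuffer<int> b = a;   // copies only the struct's metadata (pointer, length), not the 100 elements
a[0] = 42;
Console.WriteLine(b[0]);  // prints 42: both variables address the same native memory
```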
You can create user-defined structures that contain `FixedBuffer<T>`, e.g.:
```csharp
public struct Employee : IEquatable<Employee>
{
    // ...
    public DateTime? DOB { get; set; }
    public decimal Balance { get; set; }
    public FixedBuffer<float> BonusPayments { get; set; }
    public FixedBuffer<byte> Photo { get; set; }
}
```
Arrays of these kinds of user-defined structures can also be stored in native memory.
2. `FixedUtf8Buffer` is a fixed-length immutable string type using UTF-8 encoding that is backed by memory on the unmanaged heap and is implemented as a .NET `struct`. You can create user-defined structures that contain `FixedUtf8Buffer` fields.
3. `SafeBuffer<T>` is a fixed-length array of data backed by memory on the unmanaged heap that inherits from the .NET `SafeHandle` class and is implemented as a .NET `class`. `SafeArray<T>` can contain user-defined structures like the `Employee` struct shown above (see the sketch after this list).
4. `HugeArray<T>` is a very large array backed by native memory; unlike a managed .NET array, it is not limited to roughly 2 billion elements, and the only limit on its size is the available system memory.

For example, a `FixedBuffer<int>` of one million elements is created with `b = new FixedBuffer<int>(1000000)`.
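As an illustration of item 3, a hedged sketch of storing the user-defined `Employee` struct from above in a `SafeArray<T>`; the constructor, indexer, and `Release` call shown here are assumptions about the API rather than documented signatures:

```csharp
// Sketch only: constructor, indexer, and Release are assumed for illustration.
var employees = new SafeArray<Employee>(1_000_000);  // one million Employee structs in native memory
employees[0] = new Employee { Balance = 100m };
decimal balance = employees[0].Balance;
employees.Release();                                 // hypothetical: drop the reference when done
```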
### jembench CLI
Examples:
* `jembench hugearray -l -u --math --cold-start -t 3 4096000000`: Benchmark math operations on `HugeArray<UInt64>` arrays of size 4096000000 without benchmark warmup and using only 3 iterations of the target methods. Benchmarks on huge arrays can be lengthy, so you should carefully choose the benchmark parameters that determine how long the benchmark runs.
## Building from source
Currently, build instructions are only provided for Visual Studio 2017 on Windows, but instructions for building on Linux will also be provided. jemalloc.NET is a 64-bit library only.
### Requirements
Cygwin tools aren't actually used for compiling jemalloc but for generating the files needed for the build.
2. Open an x64 Native Tools Command Prompt for VS 2017 and temporarily add `Cygwin\bin` to the PATH, e.g. `set PATH=%PATH%;C:\cygwin\bin`. Switch to the `jemalloc` subdirectory in your jemalloc.NET solution dir and run `sh -c "CC=cl ./autogen.sh"`. This will generate some files in the `jemalloc` subdirectory and only needs to be done once.
4. From a Visual Studio 2017 Developer Command Prompt run `build.cmd`. Alternatively, you can load the solution in Visual Studio and build the entire solution using the "Benchmark" solution configuration.
5. The solution should build without errors.
6. Run `jembench` from the solution folder to see the project version and help.