1 change: 1 addition & 0 deletions .gitignore
Expand Up @@ -6,3 +6,4 @@ cmake-build-*
/blosc/config.h
/doc/doxygen/xml/
/doc/xml
.vs/
20 changes: 13 additions & 7 deletions ANNOUNCE.md
@@ -1,23 +1,29 @@
# Announcing C-Blosc2 2.14.4
# Announcing C-Blosc2 2.17.1
A fast, compressed and persistent binary data store library for C.

## What is new?

This is a patch release where the `SOVERSION` was bumped since the API changed.
Several fixes affecting uninitialized memory access and others:

For more info, please see the release notes in:
* Fix uninitialized memory access in newly added unshuffle12_sse2 and unshuffle12_avx2 functions
* Fix unaligned access in _sw32 and sw32_
* Fix DWORD being printed as %s in sprintf call
* Fix warning on unused variable (since this variable was only being used in the linux branch)
* `splitmode` variable was uninitialized if goto was triggered

https://github.com/Blosc/c-blosc2/blob/main/RELEASE_NOTES.md
See PR #658. Many thanks to @EmilDohne for this nice job.

Also, there is blog post introducing the most relevant changes in Blosc2:
For more info, see the release notes in:

https://www.blosc.org/posts/blosc2-ready-general-review/
https://github.com/Blosc/c-blosc2/blob/main/RELEASE_NOTES.md

## What is it?

Blosc2 is a high performance data container optimized for binary data.
It builds on the shoulders of Blosc, the high performance meta-compressor
(https://github.com/Blosc/c-blosc).
(https://github.com/Blosc/c-blosc). Blosc2 is the next generation of Blosc,
an award-winning (https://www.blosc.org/posts/prize-push-Blosc2) library
that has been around for more than a decade.

Blosc2 expands the capabilities of Blosc by providing a higher-level
container that is able to store many chunks in it (hence the super-block name).
Expand Down
9 changes: 9 additions & 0 deletions CMakeLists.txt
Expand Up @@ -35,6 +35,8 @@
# do not include support for the Zlib library
# DEACTIVATE_ZSTD: default OFF
# do not include support for the Zstd library
# WITH_ZLIB_OPTIM: default ON
# set WITH_OPTIM when building Zlib library, setting OFF is useful for wasm32 targets
# PREFER_EXTERNAL_LZ4: default OFF
# when found, use the installed LZ4 libs instead of included
# sources
Expand Down Expand Up @@ -131,6 +133,8 @@ option(PREFER_EXTERNAL_ZLIB
"Find and use external ZLIB library instead of included sources." OFF)
option(PREFER_EXTERNAL_ZSTD
"Find and use external ZSTD library instead of included sources." OFF)
option(WITH_ZLIB_OPTIM
"Set WITH_OPTIM for ZLIB, turning off is useful for compiling for wasm32 targets." ON)

set(CMAKE_MODULE_PATH "${PROJECT_SOURCE_DIR}/cmake")

Expand Down Expand Up @@ -191,6 +195,11 @@ if(NOT DEACTIVATE_ZLIB)
endif()

if(NOT (ZLIB_NG_FOUND OR ZLIB_FOUND))

if (NOT WITH_ZLIB_OPTIM)
set(WITH_OPTIM FALSE)
set(WITH_RUNTIME_CPU_DETECTION FALSE)
endif()
message(STATUS "Using ZLIB-NG internal sources for ZLIB support.")
set(HAVE_ZLIB_NG TRUE)
add_definitions(-DZLIB_COMPAT)
Expand Down
10 changes: 4 additions & 6 deletions DEVELOPING-GUIDE.rst
@@ -1,14 +1,12 @@
Some conventions used in C-Blosc2
=================================

* Use C99 designated initialization whenever possible (specially in examples).
* Use C99 designated initialization only in examples. Libraries should use C89 initialization, which is more portable, especially with C++ (designated initialization is supported in C++ only since C++20).

* Use _new and _free for memory allocating constructors and destructors and _init and _destroy for non-memory allocating constructors and destructors.

* Lines must not exceed 120 characters. If a line is too long, it must be broken into several lines.

Naming things
-------------
* Conditional bodies must always use braces, even if they are one-liners. The only exception is when the conditional and its body fit on a single line:

Naming is one of the most time-consuming tasks, but critical for communicating effectively. Here it is a preliminary list of names that I am not comfortable with:

* We are currently calling `filters` to a data transformation function that essentially produces the same amount of data, but with bytes shuffled or transformed in different ways. Perhaps `transformers` would be a better name?
if (condition) whatever();
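
The initialization and brace conventions above can be illustrated with a minimal sketch. The `demo_params` struct and the function names are hypothetical, chosen only for illustration; they are not part of the C-Blosc2 API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical parameter struct, for illustration only (not a C-Blosc2 type). */
typedef struct {
  int32_t typesize;
  int16_t nthreads;
  uint8_t clevel;
} demo_params;

/* C99 designated initialization: readable in examples, but valid C++ only since C++20. */
demo_params demo_params_c99(void) {
  demo_params p = {.typesize = 4, .nthreads = 2, .clevel = 5};
  return p;
}

/* C89-style initialization: zero the struct, then assign each field explicitly. */
demo_params demo_params_c89(void) {
  demo_params p = {0};
  p.typesize = 4;
  p.nthreads = 2;
  p.clevel = 5;
  return p;
}

/* Conditional bodies use braces, except for the single-line form. */
int demo_params_equal(const demo_params *a, const demo_params *b) {
  assert(a != NULL && b != NULL);
  if (a->typesize != b->typesize) {
    return 0;
  }
  if (a->nthreads != b->nthreads) return 0;
  return a->clevel == b->clevel;
}
```

Both constructors produce the same value; the C89 form is the one that also compiles as pre-C++20 C++, which is why the guide reserves the designated form for examples.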
13 changes: 11 additions & 2 deletions README.rst
Expand Up @@ -37,10 +37,11 @@ C-Blosc2 is the new major version of `C-Blosc <https://github.com/Blosc/c-blosc>

See a 3 minutes `introductory video to Blosc2 <https://www.youtube.com/watch?v=ER12R7FXosk>`_.


Blosc2 NDim: an N-Dimensional store
===================================

One of the latest and more exciting additions in C-Blosc2 is the `Blosc2 NDim layer <https://www.blosc.org/c-blosc2/reference/b2nd.html>`_ (or b2nd for short), allowing to create *and* read n-dimensional datasets in an extremely efficient way thanks to a n-dim 2-level partitioning, that allows to slice and dice arbitrary large and compressed data in a more fine-grained way:
One of the latest and most exciting additions to C-Blosc2 is the `Blosc2 NDim layer <https://www.blosc.org/c-blosc2/reference/b2nd.html>`_ (or B2ND for short), which allows creating *and* reading n-dimensional datasets in an extremely efficient way thanks to an n-dim 2-level partitioning that makes it possible to slice and dice arbitrarily large and compressed data in a more fine-grained way:

.. image:: https://github.com/Blosc/c-blosc2/blob/main/images/b2nd-2level-parts.png?raw=true
:width: 75%
Expand All @@ -66,7 +67,7 @@ New features in C-Blosc2

* **64-bit containers:** the first-class container in C-Blosc2 is the `super-chunk` or, for brevity, `schunk`, that is made by smaller chunks which are essentially C-Blosc1 32-bit containers. The super-chunk can be backed or not by another container which is called a `frame` (see later).

* **NDim containers (b2nd):** allow to store n-dimensional data that can efficiently read datasets in slices that can be n-dimensional too. To achieve this, a n-dimensional 2-level partitioning has been implemented. This capabilities were formerly part of `Caterva <https://github.com/Blosc/caterva>`_, and now it is included in C-Blosc2 for convenience. Caterva is now deprecated.
* **NDim containers (B2ND):** allow storing n-dimensional data and efficiently reading it in slices that can be n-dimensional too. To achieve this, an n-dimensional 2-level partitioning has been implemented. These capabilities were formerly part of `Caterva <https://github.com/Blosc/caterva>`_ and are now included in C-Blosc2 for convenience. Caterva is now deprecated.

* **More filters:** besides `shuffle` and `bitshuffle` already present in C-Blosc1, C-Blosc2 already implements:

Expand Down Expand Up @@ -119,6 +120,14 @@ More info about the `improved capabilities of C-Blosc2 can be found in this talk
The C-Blosc2 API and format have been frozen, which guarantees that your programs will continue to work with future versions of the library, and that future releases will be able to read persistent storage generated by previous releases (as of 2.0.0).


Open format
===========

The Blosc2 format is open and `fully documented <https://github.com/Blosc/c-blosc2/blob/main/README_FORMAT.rst>`_.

The format specs are defined in less than 1000 lines of text, so they should be easy to read and understand. In our opinion, this is very important for the long-term success of the library, as it allows for third-party implementations of the format, and also for the users to understand what is going on under the hood.


Python wrapper
==============

Expand Down
10 changes: 10 additions & 0 deletions README_B2ND_FORMAT.rst
@@ -0,0 +1,10 @@
B2ND Format
===========

The B2ND format is meant for storing multidimensional datasets defined by a shape and a data type.
Both the shape and the data type follow the NumPy conventions.

It is just a `B2ND metalayer <https://github.com/Blosc/c-blosc2/blob/main/README_B2ND_METALAYER.rst>`_
on top of a Blosc2 `CFrame <https://github.com/Blosc/c-blosc2/blob/main/README_CFRAME_FORMAT.rst>`_
(for contiguous storage) or `SFrame <https://github.com/Blosc/c-blosc2/blob/main/README_SFRAME_FORMAT.rst>`_
(for sparse storage).
13 changes: 8 additions & 5 deletions README_B2ND_METALAYER.rst
@@ -1,9 +1,12 @@
b2nd metalayer
++++++++++++++
B2ND Metalayer Format
=====================

b2nd format is specified as a metalayer on top of a Blosc2 container for storing
multidimensional information. Specifically, this metalayer is named 'b2nd'
and follows this format::
This is a `metalayer <https://www.blosc.org/posts/blosc-metalayers/>`_ on top of a Blosc2
`CFrame <https://github.com/Blosc/c-blosc2/blob/main/README_CFRAME_FORMAT.rst>`_ or
`SFrame <https://github.com/Blosc/c-blosc2/blob/main/README_SFRAME_FORMAT.rst>`_
that is meant for storing multidimensional information.

Specifically, this metalayer is named 'b2nd' and follows this format::

|-0-|-1-|-2-|-3-|~~~~~~~~~~~~~~~~|---|~~~~~~~~~~~~~~~~|---|~~~~~~~~~~~~~~~~|
| 9X| v | nd| 9X| shape | 9X| chunkshape | 9X| blockshape |
Expand Down
7 changes: 4 additions & 3 deletions README_CFRAME_FORMAT.rst
@@ -1,8 +1,9 @@
Blosc2 Contiguous Frame Format
==============================

Blosc (as of version 2.0.0) has a contiguous frame format (cframe for short) that allows for the storage of
different Blosc data chunks contiguously, either in-memory or on-disk.
Blosc (as of version 2.0.0) has a Contiguous Frame format (CFrame for short) that allows for the storage of
different `Blosc data chunks <https://github.com/Blosc/c-blosc2/blob/main/README_CHUNK_FORMAT.rst>`_ contiguously,
either in-memory or on-disk.

The frame is composed of a header, a chunks section, and a trailer::

Expand All @@ -20,7 +21,7 @@ Header
------

The header contains information needed to decompress the Blosc chunks contained in the frame. It is encoded using
`msgpack <https://msgpack.org>`_ and the format is as follows::
msgpack and the format is as follows::

|-0-|-1-|-2-|-3-|-4-|-5-|-6-|-7-|-8-|-9-|-A-|-B-|-C-|-D-|-E-|-F-|-10|-11|-12|-13|-14|-15|-16|-17|
| 9X| aX| "b2frame\0" | d2| header_size | cf| frame_size |
Expand Down
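
As a rough illustration of the first 24 bytes of the header layout shown above, the following sketch extracts `header_size` and `frame_size` from a buffer. It assumes the offsets read off the diagram (magic string at 0x02, `d2` marker at 0x0A, `cf` marker at 0x0F) and relies on msgpack storing its `int 32` (`0xd2`) and `uint 64` (`0xcf`) payloads in big-endian order. The function and constant names are hypothetical, not part of the C-Blosc2 API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Offsets read off the header diagram; all names here are hypothetical. */
enum {
  CFRAME_MAGIC_OFFSET = 0x02,       /* "b2frame\0", 8 bytes */
  CFRAME_HEADER_SIZE_MARKER = 0x0A, /* msgpack int 32 marker (0xd2) */
  CFRAME_HEADER_SIZE_OFFSET = 0x0B, /* 4-byte big-endian payload */
  CFRAME_FRAME_SIZE_MARKER = 0x0F,  /* msgpack uint 64 marker (0xcf) */
  CFRAME_FRAME_SIZE_OFFSET = 0x10,  /* 8-byte big-endian payload */
  CFRAME_PREFIX_LEN = 0x18
};

/* Read a big-endian unsigned integer of nbytes (at most 8) bytes. */
uint64_t read_be(const uint8_t *p, size_t nbytes) {
  uint64_t v = 0;
  assert(p != NULL && nbytes <= 8);
  for (size_t i = 0; i < nbytes; i++) {
    v = (v << 8) | p[i];
  }
  return v;
}

/* Returns 0 on success, -1 if the buffer is too short or the markers do not match. */
int cframe_peek_sizes(const uint8_t *buf, size_t len,
                      int32_t *header_size, uint64_t *frame_size) {
  if (buf == NULL || header_size == NULL || frame_size == NULL || len < CFRAME_PREFIX_LEN) {
    return -1;
  }
  if (memcmp(buf + CFRAME_MAGIC_OFFSET, "b2frame\0", 8) != 0) {
    return -1;
  }
  if (buf[CFRAME_HEADER_SIZE_MARKER] != 0xd2 || buf[CFRAME_FRAME_SIZE_MARKER] != 0xcf) {
    return -1;
  }
  *header_size = (int32_t)(uint32_t)read_be(buf + CFRAME_HEADER_SIZE_OFFSET, 4);
  *frame_size = read_be(buf + CFRAME_FRAME_SIZE_OFFSET, 8);
  return 0;
}
```

The sketch deliberately stops at byte 0x17, since the rest of the header is not shown in this hunk.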
6 changes: 3 additions & 3 deletions README_CHUNK_FORMAT.rst
@@ -1,5 +1,5 @@
Blosc Chunk Format
==================
Blosc/Blosc2 Chunk Format
=========================

A regular chunk is composed of a header and a blocks section::

Expand All @@ -8,7 +8,7 @@ A regular chunk is composed of a header and a blocks section::
+---------+--------+

Also, there are the so-called lazy chunks that do not have the actual compressed data,
but only metainformation about how to read it. Lazy chunks typically appear when reading
but only meta-information about how to read it. Lazy chunks typically appear when reading
data from persistent media. A lazy chunk has the header and bstarts sections in place, plus
an additional trailer that allows reading the data blocks::

Expand Down
8 changes: 8 additions & 0 deletions README_EXTENSION_FILENAMES.rst
@@ -0,0 +1,8 @@
Extensions for Blosc2 Filenames
===============================

Blosc2 has some recommendations for different file extensions for different purposes. Here is a list of the currently supported ones:

- `.b2frame` (but also `.b2f` or `.b2`) (Blosc2 Frame): this is the main extension for storing `Blosc2 Contiguous Frames <https://github.com/Blosc/c-blosc2/blob/main/README_CFRAME_FORMAT.rst>`_.

- `.b2nd` (Blosc2 N-Dim): this is just a contiguous frame file with `a metalayer for storing n-dimensional information <https://github.com/Blosc/c-blosc2/blob/main/README_B2ND_METALAYER.rst>`_ like shape, chunkshape, blockshape and dtype.
25 changes: 25 additions & 0 deletions README_FORMAT.rst
@@ -0,0 +1,25 @@
Blosc2 Format
=============

The Blosc2 format is a specification for storing compressed data in a way that is simple to read and parse,
and that allows for fast random access to the compressed data. The format is designed to be used with
the Blosc2 library, but it is not tied to it, and can be used independently. Emphasis has been put on
simplicity and robustness, so that the format can be used in a wide range of applications.

See a diagram of a Contiguous Frame (aka CFrame), the most important part of the Blosc2 format below:

.. image:: blosc2-cframe.png
:width: 25%
:alt: Blosc2 CFrame format diagram

And here, the list of the different parts of the format, from the highest level to the lowest:

- `B2ND format <https://github.com/Blosc/c-blosc2/blob/main/README_B2ND_FORMAT.rst>`_
- `B2ND metalayer <https://github.com/Blosc/c-blosc2/blob/main/README_B2ND_METALAYER.rst>`_
- `SFrame format <https://github.com/Blosc/c-blosc2/blob/main/README_SFRAME_FORMAT.rst>`_
- `CFrame format <https://github.com/Blosc/c-blosc2/blob/main/README_CFRAME_FORMAT.rst>`_
- `Chunk format <https://github.com/Blosc/c-blosc2/blob/main/README_CHUNK_FORMAT.rst>`_

Finally, the recommended extension file names for the different parts of the format:

- `Blosc2 extension file names <https://github.com/Blosc/c-blosc2/blob/main/README_EXTENSION_FILENAMES.rst>`_
10 changes: 4 additions & 6 deletions README_SFRAME_FORMAT.rst
@@ -1,11 +1,11 @@
Blosc2 Sparse Frame Format
==========================

Blosc (as of version 2.0.0) has a sparse frame (sframe for short) format that allows for non-contiguous storage of Blosc data chunks on disk.
Blosc (as of version 2.0.0) has a Sparse Frame (SFrame for short) format that allows for non-contiguous storage of `Blosc2 data chunks <https://github.com/Blosc/c-blosc2/blob/main/README_CHUNK_FORMAT.rst>`_ on disk.

When creating an sparse frame one must denote the `storage.contiguous` as false and provide a name (which represents a directory, but in the future it could be an arbitrary URL) in `storage.urlpath` for the sframe to be stored. It is recommended to name the directory with the `.b2frame` (or `.b2f` for short) extension.
When creating a sparse frame, one must set the `contiguous` flag in the `storage` struct to false and provide a name (which represents a directory, but in the future it could be an arbitrary URL) in `storage.urlpath` for the sframe to be stored. It is recommended to name the directory with the `.b2frame` (or `.b2f` for short) extension.

An sframe is made up of a frame index file and the chunks stored in the same directory on-disk. The frame file follows the format described in the `contiguous frame format <README_CFRAME_FORMAT.rst>`_ document, with the difference that the frame's chunks section is made up of multiple files (one per chunk). The frame index file name is always `chunks.b2frame`, and it also contains the metadata for the sframe.
An SFrame is made up of a frame index file and the chunks stored in the same directory on-disk. The frame index file follows the format described in the `contiguous frame format <https://github.com/Blosc/c-blosc2/blob/main/README_CFRAME_FORMAT.rst>`_ document, with the difference that the frame's chunks section is made up of multiple files (one per chunk). The frame index file name is always `chunks.b2frame`, and it also contains the metadata for the sframe.

Chunks
------
Expand All @@ -14,7 +14,7 @@ The chunks are stored in the directory as binary files. Each chunk file name wil

00000000.chunk, 00000001.chunk, ··· , 0000000E.chunk, 0000000F.chunk

Each chunk follows the format described in the `chunk format <README_CHUNK_FORMAT.rst>`_ document.
Each chunk follows the format described in the `chunk format <https://github.com/Blosc/c-blosc2/blob/main/README_CHUNK_FORMAT.rst>`_ document.

*Note:* The real order of the chunks is in the index chunk and may not follow the order of the names. This can occur when doing an insertion or a reorder. For more information see the **Examples** section below.
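
Judging only from the sample names above (eight uppercase hexadecimal digits followed by `.chunk`), a chunk file name could be built as in the sketch below. The function name and the 32-bit identifier type are assumptions made for illustration; this is not the library's own code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* 8 hex digits + ".chunk" + terminating NUL. */
#define SFRAME_CHUNK_NAME_LEN 15

/* Hypothetical helper: format the file name for a chunk identifier.
 * Returns 0 on success, -1 if the output buffer is too small. */
int sframe_chunk_filename(uint32_t chunk_id, char *out, size_t out_len) {
  assert(out != NULL);
  int n = snprintf(out, out_len, "%08X.chunk", (unsigned int)chunk_id);
  if (n < 0 || (size_t)n >= out_len) {
    return -1;
  }
  return 0;
}
```

Because the logical order lives in the index chunk, the identifier passed here would be the value stored at a given position of the index, not the position itself.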

Expand Down Expand Up @@ -63,7 +63,6 @@ When doing an insertion in the nth position, in the same position of the index c

Note that neither the file names nor their contents change, so when accessing the 2nd chunk the `00000004.chunk` file will be read.


Reorder example
^^^^^^^^^^^^^^^
As in the insertion case, when doing a reorder the chunk names and their contents do not change, but the content of the index chunk does. When reordering the chunks, a new order list is passed and the index chunk is changed according to that list. Continuing with the first example of this section, the content of the index chunk is shown before and after reordering::
Expand All @@ -73,4 +72,3 @@ As in the insertion case, when doing a reorder the chunks names and their conten
Possible index New index
chunk content: [0, 1, 2, 3] chunk content: [3, 1, 0, 2]
New order list: [3, 1, 0, 2]
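
The reorder example above can be sketched as follows, under the assumption that position `i` of the new index takes the entry found at position `order[i]` of the old index. That reading is consistent with the example (`[0, 1, 2, 3]` becomes `[3, 1, 0, 2]`), but the function below is a hypothetical illustration, not the library's implementation.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: apply an order list to an in-memory copy of the index.
 * Returns 0 on success, -1 on allocation failure or if `order` is not a
 * permutation of 0..nchunks-1 (in which case `index` is left untouched). */
int index_reorder(int64_t *index, const int64_t *order, size_t nchunks) {
  assert(index != NULL && order != NULL);
  if (nchunks == 0) {
    return 0;
  }
  int64_t *old = malloc(nchunks * sizeof *old);
  unsigned char *seen = calloc(nchunks, 1);
  int rc = -1;
  if (old != NULL && seen != NULL) {
    rc = 0;
    memcpy(old, index, nchunks * sizeof *old);
    /* Validate the order list before touching the index. */
    for (size_t i = 0; i < nchunks && rc == 0; i++) {
      if (order[i] < 0 || (uint64_t)order[i] >= nchunks || seen[order[i]]) {
        rc = -1;
      } else {
        seen[order[i]] = 1;
      }
    }
    /* Only the index changes; chunk files keep their names and contents. */
    for (size_t i = 0; i < nchunks && rc == 0; i++) {
      index[i] = old[order[i]];
    }
  }
  free(old);
  free(seen);
  return rc;
}
```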

96 changes: 96 additions & 0 deletions RELEASE_NOTES.md
@@ -1,6 +1,102 @@
Release notes for C-Blosc2
==========================

Changes from 2.17.0 to 2.17.1
=============================

Several fixes affecting uninitialized memory access and others:

* Fix uninitialized memory access in newly added unshuffle12_sse2 and unshuffle12_avx2 functions
* Fix unaligned access in _sw32 and sw32_
* Fix DWORD being printed as %s in sprintf call
* Fix warning on unused variable (since this variable was only being used in the linux branch)
* `splitmode` variable was uninitialized if goto was triggered

See PR #658. Many thanks to @EmilDohne for this nice job.
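
The notes do not say how the unaligned access in `_sw32` and `sw32_` was fixed. As a general illustration only, one common way to read or write a 32-bit little-endian value at an arbitrary address without an unaligned access is to go through `memcpy` and assemble the bytes explicitly. The helper names below are hypothetical and are not the library's functions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: load a little-endian int32 from a possibly unaligned address. */
int32_t load_le32(const void *src) {
  uint8_t b[4];
  assert(src != NULL);
  memcpy(b, src, sizeof b); /* byte-wise copy, so no alignment requirement on src */
  uint32_t v = (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
               ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
  return (int32_t)v;
}

/* Hypothetical helper: store an int32 as little-endian at a possibly unaligned address. */
void store_le32(void *dst, int32_t value) {
  uint32_t v = (uint32_t)value;
  uint8_t b[4];
  assert(dst != NULL);
  b[0] = (uint8_t)(v & 0xff);
  b[1] = (uint8_t)((v >> 8) & 0xff);
  b[2] = (uint8_t)((v >> 16) & 0xff);
  b[3] = (uint8_t)((v >> 24) & 0xff);
  memcpy(dst, b, sizeof b);
}
```

Casting the pointer to `int32_t *` and dereferencing it instead would be undefined behavior when the address is not suitably aligned, which is the class of problem the release note refers to.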


Changes from 2.16.0 to 2.17.0
=============================

* New b2nd_copy_buffer2() function for copying buffers with typesizes
larger than 255. The previous b2nd_copy_buffer() function is now
deprecated and will be removed in a future release.

* Support repeated values larger than 8-bit, also for n-dim arrays.
This is useful for compressing arrays with large runs of repeated
values, like in the case of images with large areas of the same color.

* Fix a leak in the pthreads emulation for Windows. Fixes #647.
Thanks to @jocbeh for the report and fix (#655).

* Update zstd to 1.5.7. Thanks to Tom Birch.

* Add BLOSC2_MAXTYPESIZE constant.

### Deprecated Functions

- `int b2nd_copy_buffer(...)` is deprecated and will be removed in
a future release. Please use `b2nd_copy_buffer2(...)` instead.


Changes from 2.15.2 to 2.16.0
=============================

* Use _fseeki64/_ftelli64/_stat64 on Windows for large file (>2 GB) support.
Thanks to Abhi Jaiantilal (@ajaiantilal) for the report and help.
* Add 12-byte unshuffle for avx2. Thanks to Tom Birch (@froody).
* Add 12-byte sse2 unshuffle implementation. Thanks to Tom Birch (@froody).
* Better description of the Blosc2 format as a whole.


Changes from 2.15.1 to 2.15.2
=============================

* Support wasm32 by disabling ZLIB WITH_OPTIM option. Thanks to Miles Granger.

* Avoid rip-relative addressing for OSX x86_64. Thanks to Miles Granger.

* Added support for nvcc (NVidia Cuda Compiler) in CMake. Thanks to @dqwu.

* Fix public include directories for blosc2 targets. Thanks to Dmitry Mikushin.

* Fix undefined behavior (UB) in shuffle and unshuffle by marking _dst non-const. Thanks to Emil Dohne.


Changes from 2.15.0 to 2.15.1
=============================

* Do not pass `-m` flags when compiling `shuffle.c`. This prevents the
compiler from incidentally optimizing the code called independently
of the runtime CPU check to these instruction sets, effectively
causing `SIGILL` on other CPUs. Fixes #621. Thanks to @t20100 and @mgorny.

* Internal LZ4 sources bumped to 1.10.0.

* Allow direct loading of plugins by name, without relying on
the presence of python. Thanks to @boxerab.

* Add `b2nd_nans` method (PR #624). Thanks to @waynegm.


Changes from 2.14.4 to 2.15.0
=============================

* Removed some duplicated functions. See https://github.com/Blosc/c-blosc2/issues/503.

* Added a new io mode to memory map files. This forced a change in the `io_cb` read API.
See https://github.com/Blosc/c-blosc2/blob/main/tests/test_mmap.c to see an example on
how to use it.

* Updated the `SOVERSION` to 4 due to the API change in `io_cb` read.

* Added functions to get cparams, dparams, storage and io defaults respectively.

* Internal zstd sources updated to 1.5.6.

* Fixed a bug when setting a slice using prefilters.


Changes from 2.14.3 to 2.14.4
=============================

Expand Down
5 changes: 5 additions & 0 deletions RELEASING.rst
Expand Up @@ -9,6 +9,11 @@ Preliminaries

- Check that *VERSION* symbols in include/blosc2.h contains the correct info.

- If API/ABI changes, please increase the minor number (e.g. 2.15 -> 2.16) *and*
bump the SOVERSION in blosc/CMakeLists.txt.
When in doubt on when SOVERSION should change, see these nice guidelines:
https://github.com/conda-forge/c-blosc2-feedstock/issues/62#issuecomment-2049675391

- Commit the changes with::

$ git commit -a -m "Getting ready for release X.Y.Z"
Expand Down