Home
Welcome to the file-compression-exercise wiki! This exercise project contains two compression algorithms. Their implementations are strongly influenced by the work of other developers credited in the Resources section of the Implementation page.
```
file-compression-exercise/
│
├── assets/
│ ├── 2023-12-06 project UML.drawio
│ └── ...
│
├── core/
│ ├── __init__.py
│ └── compression.py
│
├── data/
│ ├── __init__.py
│ └── file_handler.py
│
├── frontend/
│ ├── __init__.py
│ ├── application.py
│ └── rich_output.py
│
├── sandbox/
│ ├── click_cli.py
│ └── ...
│
├── test/
│ ├── example_files/
│ ├── binary_tree_test.py
│ ├── file_handler.py
│ ├── huffman_test.py
│ └── lzw_test.py
│
├── varc.py
│
├── .gitignore
├── LICENSE
└── README.md
```

LZW compression, named after its creators Lempel, Ziv, and Welch, is a general-purpose lossless compression algorithm. It makes only a single pass over the input data. From the data it encounters it builds a codebook that, in the best case, maps long sequences of original data to a single code. No codebook needs to be stored for decoding: the decoder is also a single pass over the encoded data, and it rebuilds the codebook while reading it. The codebook does not only grow during encoding or decoding; it can also be restarted once it reaches a certain size, which keeps the codes adapted to the data currently being processed.
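This is not the project's implementation in core/compression.py; the following is only a minimal sketch of the idea in plain Python, and it leaves out the codebook-restart behaviour mentioned above.

```python
def lzw_encode(data: bytes) -> list[int]:
    """Single pass over the input; the codebook grows as sequences repeat."""
    # Start with every single byte as an initial codebook entry.
    codebook = {bytes([i]): i for i in range(256)}
    next_code = 256
    current = b""
    output: list[int] = []
    for byte in data:
        candidate = current + bytes([byte])
        if candidate in codebook:
            current = candidate                 # keep extending the matched sequence
        else:
            output.append(codebook[current])    # emit code for the longest known sequence
            codebook[candidate] = next_code     # learn the new, longer sequence
            next_code += 1
            current = bytes([byte])
    if current:
        output.append(codebook[current])
    return output


def lzw_decode(codes: list[int]) -> bytes:
    """Single pass over the codes; the codebook is rebuilt while decoding."""
    if not codes:
        return b""
    codebook = {i: bytes([i]) for i in range(256)}
    next_code = 256
    previous = codebook[codes[0]]
    output = bytearray(previous)
    for code in codes[1:]:
        if code in codebook:
            entry = codebook[code]
        else:                                   # code emitted before the decoder learned it
            entry = previous + previous[:1]
        output += entry
        codebook[next_code] = previous + entry[:1]  # learn the same entry the encoder did
        next_code += 1
        previous = entry
    return bytes(output)


if __name__ == "__main__":
    sample = b"TOBEORNOTTOBEORTOBEORNOT"
    assert lzw_decode(lzw_encode(sample)) == sample
```

Note how repeated sequences such as `TOBEOR` end up covered by a single code after the codebook has grown.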
The Huffman compression algorithm was developed by David Huffman. It is a lossless compression algorithm that needs two passes over the data for encoding. The first pass counts the occurrences of identical data points. Based on these counts a binary tree is built, and from it a binary code is assigned to each data point: more frequently occurring data gets a shorter code. No shorter code is ever the beginning of a longer code, which is what makes unambiguous decoding possible later. On the second pass the original data is replaced by the calculated codes, short codes for common data and longer codes for rare data. Decoding requires the codebook that maps codes back to data. Bits are read in sequence, and whenever they match a code in the codebook, the corresponding data point is emitted.
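Again, this is not the project's implementation in core/compression.py; the sketch below only illustrates the two-pass idea, using Python strings of "0"/"1" instead of packed bits.

```python
import heapq
from collections import Counter
from itertools import count


def build_huffman_codes(data: bytes) -> dict[int, str]:
    """First pass: count occurrences, build the tree, derive prefix-free codes."""
    freq = Counter(data)
    tie = count()  # tie-breaker so heapq never has to compare tree nodes directly
    # Heap entries: (frequency, tie-breaker, node); a node is a leaf byte or a (left, right) pair.
    heap = [(f, next(tie), byte) for byte, f in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        heapq.heappush(heap, (f1 + f2, next(tie), (left, right)))

    codes: dict[int, str] = {}

    def assign(node, prefix=""):
        if isinstance(node, tuple):
            assign(node[0], prefix + "0")
            assign(node[1], prefix + "1")
        else:
            codes[node] = prefix or "0"  # input with a single distinct byte still needs one bit

    if heap:
        assign(heap[0][2])
    return codes


def huffman_encode(data: bytes, codes: dict[int, str]) -> str:
    """Second pass: replace each data point with its code."""
    return "".join(codes[b] for b in data)


def huffman_decode(bits: str, codes: dict[int, str]) -> bytes:
    """Read bits in sequence; whenever they match a code, emit the corresponding byte."""
    reverse = {code: byte for byte, code in codes.items()}
    out, current = bytearray(), ""
    for bit in bits:
        current += bit
        if current in reverse:
            out.append(reverse[current])
            current = ""
    return bytes(out)


if __name__ == "__main__":
    sample = b"abracadabra"
    codes = build_huffman_codes(sample)
    assert huffman_decode(huffman_encode(sample, codes), codes) == sample
```

Because the codes are prefix-free, the decoder can consume the bit stream greedily: the first codebook match is always the right one.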