My own Unicode compression algorithm

Overview

Zee Code

ZCode is a custom compression algorithm I originally developed for a competition held for the Spring 2019 Datastructures and Algorithms course of Dr. Mahdi Safarnejad-Boroujeni at Sharif University of Technology, at which I became first-place. The code is pretty slow and has a lot of room for optimization, but it is pretty readable. It can be an excellent educational resource for whoever is starting on compression algorithms.

The algorithm is a cocktail of classical compression algorithms mixed and served for Unicode documents. It hinges around the LZW algorithm to create a finite size symbol dictionary; the results are then byte-coded into variable-length custom symbols, which I call zee codes! Finally, the symbol table is truncated accordingly, and the compressed document is encoded into a byte stream.

Huffman trees highly inspire zee codes, but because in normal texts, symbols are usually much more uniformly distributed than the original geometrical (or exponential) distribution assumption for effective Huffman coding, the gains of using variable-sized byte-codes both from an implementation and performance perspective outweighed bit Huffman encodings. Results may vary, but my tests showed a steady ~4-5x compression ratio on Farsi texts, which is pretty nice!

Installation

ZCode is available on pip, and only requires a 3.6 or higher python installation beforehand.

pip install -U zcode

Usage

You can run the algorithm for any utf-8 encoded file using the zcode command. It will automatically decompress files ending with a .zee extensions and compress others into .zee files, but you can always override the default behavior by providing optional arguments like:

zcode INPUTFILE [--output OUTPUT_FILE --action compress/decompress --symbol-size SYMBOL_SIZE --code-size CODE_SIZE]

The symbol-size argument controls the algorithms' buffer size for processing symbols (in bytes). It is automatically set depending on your input file size but you can change it as you wish. code-size controls the maximum length for coded bytes while encoding symbols (this equals to 2 by default and needs to be provided to the algorithm upon decompression).

LICENSE

MIT LICENSE, see vahidzee/zcode/LICENSE

You might also like...
Search algorithm implementations meant for teaching

Search-py A collection of search algorithms for teaching and experimenting. Non-adversarial Search There’s a heavy separation of concerns which leads

8-puzzle-solver with UCS, ILS, IDA* algorithm
8-puzzle-solver with UCS, ILS, IDA* algorithm

Eight Puzzle 8-puzzle-solver with UCS, ILS, IDA* algorithm pre-usage requirements python3 python3-pip virtualenv prepare enviroment virtualenv -p pyth

A Python Package for Portfolio Optimization using the Critical Line Algorithm

A Python Package for Portfolio Optimization using the Critical Line Algorithm

🧬 Training the car to do self-parking using a genetic algorithm
🧬 Training the car to do self-parking using a genetic algorithm

🧬 Training the car to do self-parking using a genetic algorithm

A raw implementation of the nearest insertion algorithm to resolve TSP problems in a TXT format.

TSP-Nearest-Insertion A raw implementation of the nearest insertion algorithm to resolve TSP problems in a TXT format. Instructions Load a txt file wi

A GUI visualization of QuickSort algorithm
A GUI visualization of QuickSort algorithm

QQuickSort A simple GUI visualization of QuickSort algorithm. It only uses PySide6, it does not have any other external dependency. How to run Install

A python implementation of the Basic Photometric Stereo Algorithm
A python implementation of the Basic Photometric Stereo Algorithm

Photometric-Stereo A python implementation of the Basic Photometric Stereo Algorithm Result Usage run Photometric_Stereo.py Code Tree |data #原始数据,tga格

Python package to monitor the power consumption of any algorithm

CarbonAI This project aims at creating a python package that allows you to monitor the power consumption of any python function. Documentation The com

marching rectangles algorithm in python with clean code.

Marching Rectangles marching rectangles algorithm in python with clean code. Tools Python 3 EasyDraw Creators Mohammad Dori Run the Code Installation

Releases(v0.0.1)
Owner
Vahid Zehtab
Applied & Theoretical Deep Learning Researcher currently studying my last semester of Compute Engineering Bachelors program at Sharif University of Technology
Vahid Zehtab
Python Sorted Container Types: Sorted List, Sorted Dict, and Sorted Set

Python Sorted Containers Sorted Containers is an Apache2 licensed sorted collections library, written in pure-Python, and fast as C-extensions. Python

Grant Jenks 2.8k Jan 04, 2023
Silver Trading Algorithm

Silver Trading Algorithm This project was done in the context of the Applied Algorithm Trading Course (FINM 35910) at the University of Chicago. Motiv

Laurent Lanteigne 1 Jan 29, 2022
Implementation for Evolution of Strategies for Cooperation

Moraliser Implementation for Evolution of Strategies for Cooperation Dependencies You will need a python3 (= 3.8) environment to run the code. Before

1 Dec 21, 2021
A custom prime algorithm, implementation, and performance code & review

Colander A custom prime algorithm, implementation, and performance code & review Pseudocode Algorithm 1. given a number of primes to find, the followi

Finn Lancaster 3 Dec 17, 2021
Supplementary Data for Evolving Reinforcement Learning Algorithms

evolvingrl Supplementary Data for Evolving Reinforcement Learning Algorithms This dataset contains 1000 loss graphs from two experiments: 500 unique g

John Co-Reyes 42 Sep 21, 2022
marching rectangles algorithm in python with clean code.

Marching Rectangles marching rectangles algorithm in python with clean code. Tools Python 3 EasyDraw Creators Mohammad Dori Run the Code Installation

Mohammad Dori 3 Jul 15, 2022
Official implementation of "Path Planning using Neural A* Search" (ICML-21)

Path Planning using Neural A* Search (ICML 2021) This is a repository for the following paper: Ryo Yonetani*, Tatsunori Taniai*, Mohammadamin Barekata

OMRON SINIC X 82 Jan 07, 2023
Sign data using symmetric-key algorithm encryption.

Sign data using symmetric-key algorithm encryption. Validate signed data and identify possible validation errors. Uses sha-(1, 224, 256, 385 and 512)/hmac for signature encryption. Custom hash algori

Artur Barseghyan 39 Jun 10, 2022
A priority of preferences for teacher assignment problem

Genetic-Algorithm-for-Assignment-Problem A priority of preferences for teacher assignment problem Keywords k-partition; clustering; education 4.0 Abst

hades 2 Oct 31, 2022
PathPlanning - Common used path planning algorithms with animations.

Overview This repository implements some common path planning algorithms used in robotics, including Search-based algorithms and Sampling-based algori

Huiming Zhou 5.1k Jan 08, 2023
Algorithms implemented in Python

Python Algorithms Library Laurent Luce Description The purpose of this library is to help you with common algorithms like: A* path finding. String Mat

Laurent Luce 264 Dec 06, 2022
Code for generating alloy / disordered structures through the special quasirandom structure (SQS) algorithm

Code for generating alloy / disordered structures through the special quasirandom structure (SQS) algorithm

Bruno Focassio 1 Nov 10, 2021
Genetic Algorithm for Robby Robot based on Complexity a Guided Tour by Melanie Mitchell

Robby Robot Genetic Algorithm A Genetic Algorithm based Robby the Robot in Chapter 9 of Melanie Mitchell's book Complexity: A Guided Tour Description

Matthew 2 Dec 01, 2022
This is a demo for AAD algorithm.

Asynchronous-Anisotropic-Diffusion-Algorithm This is a demo for AAD algorithm. The subroutine of the anisotropic diffusion algorithm is modified from

3 Mar 21, 2022
An NUS timetable generator which uses a genetic algorithm to optimise timetables to suit the needs of NUS students.

A timetable optimiser for NUS which uses an evolutionary algorithm to "breed" a timetable suited to your needs.

Nicholas Lee 3 Jan 09, 2022
N Queen Problem using Genetic Algorithm

The N Queen is the problem of placing N chess queens on an N×N chessboard so that no two queens attack each other.

Mahdi Hassanzadeh 2 Nov 11, 2022
A collection of design patterns/idioms in Python

python-patterns A collection of design patterns and idioms in Python. Current Patterns Creational Patterns: Pattern Description abstract_factory use a

Sakis Kasampalis 36.2k Jan 05, 2023
frePPLe - open source supply chain planning

frePPLe Open source supply chain planning FrePPLe is an easy-to-use and easy-to-implement open source advanced planning and scheduling tool for manufa

frePPLe 385 Jan 06, 2023
A genetic algorithm written in Python for educational purposes.

Genea: A Genetic Algorithm in Python Genea is a Genetic Algorithm written in Python, for educational purposes. I started writing it for fun, while lea

Dom De Felice 20 Jul 06, 2022
Algoritmos de busca:

Algoritmos-de-Buscas Algoritmos de busca: Abaixo está a interface da aplicação: Ao selecionar o tipo de busca e o caminho, então será realizado o cálc

Elielson Barbosa 5 Oct 04, 2021