My own Unicode compression algorithm

Overview

Zee Code

ZCode is a custom compression algorithm I originally developed for a competition held for the Spring 2019 Datastructures and Algorithms course of Dr. Mahdi Safarnejad-Boroujeni at Sharif University of Technology, at which I became first-place. The code is pretty slow and has a lot of room for optimization, but it is pretty readable. It can be an excellent educational resource for whoever is starting on compression algorithms.

The algorithm is a cocktail of classical compression algorithms mixed and served for Unicode documents. It hinges around the LZW algorithm to create a finite size symbol dictionary; the results are then byte-coded into variable-length custom symbols, which I call zee codes! Finally, the symbol table is truncated accordingly, and the compressed document is encoded into a byte stream.

Huffman trees highly inspire zee codes, but because in normal texts, symbols are usually much more uniformly distributed than the original geometrical (or exponential) distribution assumption for effective Huffman coding, the gains of using variable-sized byte-codes both from an implementation and performance perspective outweighed bit Huffman encodings. Results may vary, but my tests showed a steady ~4-5x compression ratio on Farsi texts, which is pretty nice!

Installation

ZCode is available on pip, and only requires a 3.6 or higher python installation beforehand.

pip install -U zcode

Usage

You can run the algorithm for any utf-8 encoded file using the zcode command. It will automatically decompress files ending with a .zee extensions and compress others into .zee files, but you can always override the default behavior by providing optional arguments like:

zcode INPUTFILE [--output OUTPUT_FILE --action compress/decompress --symbol-size SYMBOL_SIZE --code-size CODE_SIZE]

The symbol-size argument controls the algorithms' buffer size for processing symbols (in bytes). It is automatically set depending on your input file size but you can change it as you wish. code-size controls the maximum length for coded bytes while encoding symbols (this equals to 2 by default and needs to be provided to the algorithm upon decompression).

LICENSE

MIT LICENSE, see vahidzee/zcode/LICENSE

You might also like...
Search algorithm implementations meant for teaching

Search-py A collection of search algorithms for teaching and experimenting. Non-adversarial Search There’s a heavy separation of concerns which leads

8-puzzle-solver with UCS, ILS, IDA* algorithm
8-puzzle-solver with UCS, ILS, IDA* algorithm

Eight Puzzle 8-puzzle-solver with UCS, ILS, IDA* algorithm pre-usage requirements python3 python3-pip virtualenv prepare enviroment virtualenv -p pyth

A Python Package for Portfolio Optimization using the Critical Line Algorithm

A Python Package for Portfolio Optimization using the Critical Line Algorithm

🧬 Training the car to do self-parking using a genetic algorithm
🧬 Training the car to do self-parking using a genetic algorithm

🧬 Training the car to do self-parking using a genetic algorithm

A raw implementation of the nearest insertion algorithm to resolve TSP problems in a TXT format.

TSP-Nearest-Insertion A raw implementation of the nearest insertion algorithm to resolve TSP problems in a TXT format. Instructions Load a txt file wi

A GUI visualization of QuickSort algorithm
A GUI visualization of QuickSort algorithm

QQuickSort A simple GUI visualization of QuickSort algorithm. It only uses PySide6, it does not have any other external dependency. How to run Install

A python implementation of the Basic Photometric Stereo Algorithm
A python implementation of the Basic Photometric Stereo Algorithm

Photometric-Stereo A python implementation of the Basic Photometric Stereo Algorithm Result Usage run Photometric_Stereo.py Code Tree |data #原始数据,tga格

Python package to monitor the power consumption of any algorithm

CarbonAI This project aims at creating a python package that allows you to monitor the power consumption of any python function. Documentation The com

marching rectangles algorithm in python with clean code.

Marching Rectangles marching rectangles algorithm in python with clean code. Tools Python 3 EasyDraw Creators Mohammad Dori Run the Code Installation

Releases(v0.0.1)
Owner
Vahid Zehtab
Applied & Theoretical Deep Learning Researcher currently studying my last semester of Compute Engineering Bachelors program at Sharif University of Technology
Vahid Zehtab
A library for benchmarking, developing and deploying deep learning anomaly detection algorithms

A library for benchmarking, developing and deploying deep learning anomaly detection algorithms Key Features • Getting Started • Docs • License Introd

OpenVINO Toolkit 1.5k Jan 04, 2023
An NUS timetable generator which uses a genetic algorithm to optimise timetables to suit the needs of NUS students.

A timetable optimiser for NUS which uses an evolutionary algorithm to "breed" a timetable suited to your needs.

Nicholas Lee 3 Jan 09, 2022
Implemented page rank program

Page Rank Implemented page rank program based on fact that a website is more important if it is linked to by other important websites using recursive

Vaibhaw 6 Aug 24, 2022
Resilient Adaptive Parallel sImulator for griD (rapid)

Rapid is an open-source software library that implements a novel “parallel-in-time” (Parareal) algorithm and semi-analytical solutions for co-simulation of integrated transmission and distribution sy

Richard Lincoln 7 Sep 07, 2022
Path finding algorithm visualizer with python

path-finding-algorithm-visualizer ~ click on the grid to place the starting block and then click elsewhere to add the end block ~ click again to place

izumi 1 Oct 31, 2021
Implementation of Apriori algorithms via Python

Installing run bellow command for installing all packages pip install -r requirements.txt Data Put csv data under this directory "infrastructure/data

Mahdi Rezaei 0 Jul 25, 2022
Evol is clear dsl for composable evolutionary algorithms that optimised for joy.

Evol is clear dsl for composable evolutionary algorithms that optimised for joy. Installation We currently support python3.6 and python3.7 and you can

GoDataDriven 178 Dec 27, 2022
Rover. Finding the shortest pass by Dijkstra’s shortest path algorithm

rover Rover. Finding the shortest path by Dijkstra’s shortest path algorithm Задача Вы — инженер, проектирующий роверы-беспилотники. Вам надо спроекти

1 Nov 11, 2021
Infomap is a network clustering algorithm based on the Map equation.

Infomap Infomap is a network clustering algorithm based on the Map equation. For detailed documentation, see mapequation.org/infomap. For a list of re

347 Dec 23, 2022
Wordle-solver - A program that solves a Wordle using a simple algorithm

Wordle Solver A program that solves a Wordle using a simple algorithm. To see it

Luc Bouchard 3 Feb 13, 2022
🧬 Performant Evolutionary Algorithms For Python with Ray support

🧬 Performant Evolutionary Algorithms For Python with Ray support

Nathan 49 Oct 20, 2022
The DarkRift2 networking framework written in Python 3

DarkRiftPy is Darkrift2 written in Python 3. The implementation is fully compatible with the original version. So you can write a client side on Python that connects to a Darkrift2 server written in

Anton Dobryakov 6 May 23, 2022
A simple library for implementing common design patterns.

PyPattyrn from pypattyrn.creational.singleton import Singleton class DummyClass(object, metaclass=Singleton): # DummyClass is now a Singleton!

1.7k Jan 01, 2023
A priority of preferences for teacher assignment problem

Genetic-Algorithm-for-Assignment-Problem A priority of preferences for teacher assignment problem Keywords k-partition; clustering; education 4.0 Abst

hades 2 Oct 31, 2022
Slight modification to one of the Facebook Salina examples, to test the A2C algorithm on financial series.

Facebook Salina - Gym_AnyTrading Slight modification of Facebook Salina Reinforcement Learning - A2C GPU example for financial series. The gym FOREX d

Francesco Bardozzo 5 Mar 14, 2022
Path tracing obj - (taichi course final project) a path tracing renderer that can import and render obj files

Path tracing obj - (taichi course final project) a path tracing renderer that can import and render obj files

5 Sep 10, 2022
N Queen Problem using Genetic Algorithm

The N Queen is the problem of placing N chess queens on an N×N chessboard so that no two queens attack each other.

Mahdi Hassanzadeh 2 Nov 11, 2022
This project is an implementation of a simple K-means algorithm

Simple-Kmeans-Clustering-Algorithm Abstract K-means is a centroid-based algorithm, or a distance-based algorithm, where we calculate the distances to

Saman Khamesian 7 Aug 09, 2022
Parameterising Simulated Annealing for the Travelling Salesman Problem

Parameterising Simulated Annealing for the Travelling Salesman Problem Abstract The Travelling Salesman Problem is a well known NP-Hard problem. Given

Gary Sun 55 Jun 15, 2022
Multiple Imputation with Random Forests in Python

miceforest: Fast, Memory Efficient Imputation with lightgbm Fast, memory efficient Multiple Imputation by Chained Equations (MICE) with lightgbm. The

Samuel Wilson 202 Dec 31, 2022