Measure file similarity in a many-to-many fashion

Last update: Feb 02, 2022

Overview

Mesi

Mesi is a tool for measuring the similarity of long-form documents, such as Python source code or technical writing, in a many-to-many fashion: every file in a collection is compared against every other file. The output can help determine which files in the collection are most similar to each other.
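
To make "many-to-many" concrete: each file is compared with every other file exactly once, so n files produce n(n-1)/2 distances. The sketch below accepts any distance callable. `pairwise_distances` and `naive_distance` are hypothetical names chosen for illustration, not part of Mesi's API, and `naive_distance` is a deliberately crude stand-in for a real edit distance.

```python
from itertools import combinations
from typing import Callable, Dict, Mapping, Tuple


def pairwise_distances(
    documents: Mapping[str, str],
    distance: Callable[[str, str], int],
) -> Dict[Tuple[str, str], int]:
    """Compare every document with every other document exactly once."""
    return {
        (name_a, name_b): distance(text_a, text_b)
        for (name_a, text_a), (name_b, text_b) in combinations(documents.items(), 2)
    }


def naive_distance(a: str, b: str) -> int:
    """Hypothetical stand-in metric: mismatched positions plus the length difference."""
    return sum(x != y for x, y in zip(a, b)) + abs(len(a) - len(b))


docs = {"a.py": "print(1)", "b.py": "print(2)", "c.py": "pass"}
print(pairwise_distances(docs, naive_distance))
# {('a.py', 'b.py'): 1, ('a.py', 'c.py'): 7, ('b.py', 'c.py'): 7}
```

Three files yield three comparisons; ten files would yield forty-five, which is why the work grows quadratically with the size of the collection.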

Installation

Python 3.9+ and pipx are recommended, although Python 3.6+ and/or pip will also work.

pipx install mesi

If you'd like to test out Mesi before installing it, use the remote execution feature of pipx, which will temporarily download Mesi and run it in an isolated virtual environment.

pipx run mesi --help

Usage

For a directory structure that looks like:

lab-one
├── StudentOne
│   ├── pyproject.toml
│   ├── deliverables
│   │   └── python_program.py
│   └── README.md
├── StudentTwo
│   ├── pyproject.toml
│   ├── deliverables
│   │   └── python_program.py
│   └── README.md
│

where similarity should be measured between each student's deliverables/python_program.py file, run the command:

mesi lab-one/*/deliverables/python_program.py
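
In a POSIX shell the unquoted `*` is expanded before Mesi starts, so Mesi receives one path per student. When scripting around Mesi, a rough Python approximation of that expansion is sketched below. `find_deliverables` is a hypothetical helper, the pattern mirrors the example layout above, and pathlib's matching is not identical to a shell's (for instance, it does not skip hidden directories the way most shells do).

```python
from pathlib import Path
from typing import List


def find_deliverables(
    root: str, pattern: str = "*/deliverables/python_program.py"
) -> List[Path]:
    """Return the files matching root/pattern, sorted like a typical shell expansion."""
    return sorted(Path(root).glob(pattern))


if __name__ == "__main__":
    # Prints one path per student directory, e.g. lab-one/StudentOne/deliverables/python_program.py
    for path in find_deliverables("lab-one"):
        print(path)
```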

A lower distance in the produced table equates to a higher degree of similarity.
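
Because a lower distance means a closer match, sorting pairs by ascending distance puts the most similar pair first. Raw edit distances also grow with file length, so the sketch additionally shows one common normalization, 1 - distance / max(len(a), len(b)), which maps a Levenshtein distance into a 0-to-1 similarity. Both helpers are hypothetical, the distances are made-up values for illustration, and Mesi's own table may present results differently.

```python
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def rank_by_similarity(distances: Dict[Pair, int]) -> List[Tuple[Pair, int]]:
    """Sort pairs so the smallest distance (the most similar pair) comes first."""
    return sorted(distances.items(), key=lambda item: item[1])


def normalized_similarity(distance: int, len_a: int, len_b: int) -> float:
    """Map a Levenshtein distance into [0, 1], where 1.0 means identical text."""
    longest = max(len_a, len_b)
    return 1.0 if longest == 0 else 1.0 - distance / longest


# Illustrative, made-up distances between three submissions.
example = {
    ("StudentOne", "StudentTwo"): 12,
    ("StudentOne", "StudentThree"): 240,
    ("StudentTwo", "StudentThree"): 231,
}
for pair, dist in rank_by_similarity(example):
    print(pair, dist)
```

Normalizing makes pairs of short and long files comparable, at the cost of hiding how many edits actually separate them.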

See the help menu (mesi --help) for additional options and configuration.

Algorithms

There are many algorithms to choose from when comparing string similarity! Mesi supports all of the algorithms provided by TextDistance. In general, levenshtein is never a bad choice, which is why it is the default.
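
Levenshtein distance is the minimum number of single-character insertions, deletions, and substitutions needed to turn one string into another. The classic two-row dynamic-programming version below is for illustration only; Mesi itself relies on polyleven or TextDistance rather than code like this.

```python
def levenshtein(a: str, b: str) -> int:
    """Minimum number of single-character edits that turn `a` into `b`."""
    if len(a) < len(b):
        a, b = b, a  # the distance is symmetric, so keep the rows short
    previous = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into "" takes i deletions
        for j, char_b in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,                       # delete char_a
                current[j - 1] + 1,                    # insert char_b
                previous[j - 1] + (char_a != char_b),  # substitute (free if equal)
            ))
        previous = current
    return previous[-1]


print(levenshtein("kitten", "sitting"))  # 3
```

Only two rows of the table are kept, so memory grows with the shorter string while time grows with the product of both lengths.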

Bugs/Requests

Please use the GitHub issue tracker to submit bugs or request new features, options, or algorithms.

Dependencies

Mesi relies on two primary dependencies for calculating text similarity: polyleven and TextDistance. Polyleven is used by default, since its single-purpose implementation of Levenshtein distance is usually faster. If a different edit distance algorithm is requested, TextDistance's implementation of that algorithm is used instead.
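
The selection rule described above can be sketched as a small dispatcher. This is an assumption about how such a choice might look, not Mesi's actual code: `choose_distance` is hypothetical, and it uses only `polyleven.levenshtein(a, b)` and the `.distance(a, b)` method of TextDistance's algorithm objects (for example `textdistance.hamming`). If polyleven is not installed, it falls back to TextDistance's own levenshtein.

```python
import importlib
from typing import Callable


def choose_distance(algorithm: str = "levenshtein") -> Callable[[str, str], float]:
    """Pick polyleven for plain Levenshtein, otherwise a TextDistance algorithm."""
    if algorithm == "levenshtein":
        try:
            from polyleven import levenshtein  # fast, Levenshtein-only implementation

            return levenshtein
        except ImportError:
            pass  # fall through to TextDistance's levenshtein
    textdistance = importlib.import_module("textdistance")
    algo = getattr(textdistance, algorithm, None)
    if algo is None or not hasattr(algo, "distance"):
        raise ValueError(f"unknown algorithm: {algorithm!r}")
    return algo.distance
```

For example, `choose_distance("hamming")("test", "text")` would return 1 when TextDistance is installed.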

License

Distributed under the terms of the GPL v3 license, Mesi is free and open source software.

Releases (v1.1.0)

v1.1.0 (Dec 8, 2021)

This release adds an --average option, which prints the average distance computed against the given files after the individual comparison distances. It also adds a --distribution option, which is not yet implemented and is reserved for future use.
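
The note does not spell out exactly how the average is taken. One plausible reading, sketched here purely as an assumption, is each file's mean distance to every other file it was compared with. `average_distance_per_file` is a hypothetical helper that works on a pair-to-distance mapping; Mesi's actual --average output may be computed differently.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Tuple


def average_distance_per_file(distances: Dict[Tuple[str, str], float]) -> Dict[str, float]:
    """Average each file's distances to all the other files it was compared with."""
    per_file: Dict[str, List[float]] = defaultdict(list)
    for (name_a, name_b), dist in distances.items():
        per_file[name_a].append(dist)  # each pair counts toward both files
        per_file[name_b].append(dist)
    return {name: mean(values) for name, values in per_file.items()}


print(average_distance_per_file({("a", "b"): 1, ("a", "c"): 7, ("b", "c"): 7}))
```

Under this reading, a file whose average is far below the others' would stand out as unusually close to the rest of the collection.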

v1.0.2 (Oct 26, 2021)

This release adds more documentation and a --version option, and removes the bright white table styling used in previous versions, which was hard to read on light-themed terminals.

Full Changelog: https://github.com/Michionlion/mesi/compare/v1.0.1...v1.0.2

v1.0.1 (Oct 26, 2021)

This release fixes a bug where, in some situations, distinct parts of compared file names were not displayed. It also adds a new option, --table-format, which configures the format of the output table using tabulate's table formats.
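
As an illustration of what "distinct parts of compared file names" can mean for the lab-one layout, the sketch below trims the path components shared by every compared path at the start and at the end, leaving labels such as StudentOne. This is an assumed approach for illustration only; `distinct_parts` is hypothetical and not taken from Mesi's source.

```python
from pathlib import PurePath
from typing import List, Sequence


def distinct_parts(paths: Sequence[str]) -> List[str]:
    """Drop the leading and trailing path components shared by every path."""
    split = [PurePath(p).parts for p in paths]
    shortest = min(len(parts) for parts in split)
    prefix = 0  # number of identical leading components
    while prefix < shortest and len({parts[prefix] for parts in split}) == 1:
        prefix += 1
    suffix = 0  # number of identical trailing components, never overlapping the prefix
    while suffix < shortest - prefix and len({parts[-1 - suffix] for parts in split}) == 1:
        suffix += 1
    labels = []
    for parts in split:
        kept = parts[prefix:len(parts) - suffix] or parts  # fall back to the full path
        labels.append(str(PurePath(*kept)))
    return labels


print(distinct_parts([
    "lab-one/StudentOne/deliverables/python_program.py",
    "lab-one/StudentTwo/deliverables/python_program.py",
]))  # ['StudentOne', 'StudentTwo']
```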

Full Changelog: https://github.com/Michionlion/mesi/compare/v1.0.0...v1.0.1

v1.0.0 (Oct 25, 2021)

Mesi is a tool to check the similarity between many different files; this is its first release, with support for many basic features and lots of text similarity/distance algorithms. You can even try out Mesi without downloading it!

pipx run mesi --help

If you don't have pipx installed, you may need to install it, but that's something you should do anyway. 😄

Owner

GatorEducator

Software tools developed at Allegheny College for computer science courses
