Measure file similarity in a many-to-many fashion

Overview

Mesi

Lint and Test codecov PyPI PyPI - Downloads License


Mesi is a tool to measure the similarity in a many-to-many fashion of long-form documents like Python source code or technical writing. The output can be useful in determining which of a collection of files are the most similar to each other.

Installation

Python 3.9+ and pipx are recommended, although Python 3.6+ and/or pip will also work.

pipx install mesi

If you'd like to test out Mesi before installing it, use the remote execution feature of pipx, which will temporarily download Mesi and run it in an isolated virtual environment.

pipx run mesi --help

Usage

For a directory structure that looks like:

lab-one
├── StudentOne
│   ├── pyproject.toml
│   ├── deliverables
│   │   └── python_program.py
│   └── README.md
├── StudentTwo
│   ├── pyproject.toml
│   ├── deliverables
│   │   └── python_program.py
│   └── README.md
│

where similarity should be measured between each student's deliverables/python_program.py file, run the command:

mesi lab-one/*/deliverables/python_program.py

A lower distance in the produced table equates to a higher degree of similarity.

See the help menu (mesi --help) for additional options and configuration.

Algorithms

There are many algorithms to choose from when comparing string similarity! Mesi implements all the algorithms provided by TextDistance. In general levenshtein is never a bad choice, which is why it is the default.

Bugs/Requests

Please use the GitHub issue tracker to submit bugs or request new features, options, or algorithms.

Dependencies

Mesi uses two primary dependencies for text similarity calculation: polyleven, and TextDistance. Polyleven is the default, as its singular implementation of Levenshtein distance can be faster in most situations. However, if a different edit distance algorithm is requested, TextDistance's implementations will be used.

License

Distributed under the terms of the GPL v3 license, mesi is free and open source software.

You might also like...
File-manager - A basic file manager, written in Python

File Manager A basic file manager, written in Python. Installation Install Pytho

Two scripts help you to convert csv file to md file by template

Two scripts help you to convert csv file to md file by template. One help you generate multiple md files with different filenames from the first colume of csv file. Another can generate one md file with several blocks.

A simple Python code that takes input from a csv file and makes it into a vcf file.

Contacts-Maker A simple Python code that takes input from a csv file and makes it into a vcf file. Imagine a college or a large community where each y

This program can help you to move and rename many files at once
This program can help you to move and rename many files at once

This program can help you to rename and save many files in a folder in seconds, but don't give the same name to files, it can delete both files.

Object-oriented file system path manipulation

path (aka path pie, formerly path.py) implements path objects as first-class entities, allowing common operations on files to be invoked on those path

An object-oriented approach to Python file/directory operations.

Unipath An object-oriented approach to file/directory operations Version: 1.1 Home page: https://github.com/mikeorr/Unipath Docs: https://github.com/m

File support for asyncio

aiofiles: file support for asyncio aiofiles is an Apache2 licensed library, written in Python, for handling local disk files in asyncio applications.

Object-oriented file system path manipulation

path (aka path pie, formerly path.py) implements path objects as first-class entities, allowing common operations on files to be invoked on those path

A platform independent file lock for Python
A platform independent file lock for Python

py-filelock This package contains a single module, which implements a platform independent file lock in Python, which provides a simple way of inter-p

Releases(v1.1.0)
  • v1.1.0(Dec 8, 2021)

    This release adds an --average option, which prints the average distance computed against the given files after the individual comparison distances. An unimplemented --distribution option, intended for use in the future, was also added.

    Source code(tar.gz)
    Source code(zip)
  • v1.0.2(Oct 26, 2021)

    This release adds more documentation, a --version option, and removes the bright white table styling from previous versions that were hard to read on light-themed terminals.

    Full Changelog: https://github.com/Michionlion/mesi/compare/v1.0.1...v1.0.2

    Source code(tar.gz)
    Source code(zip)
  • v1.0.1(Oct 26, 2021)

    This release fixes a bug where in some situations, some distinct parts of compared file names were not displayed. Additionally, a new option, --table-format, was added. This allows configuration of the output table format, using tabulate's table formats.

    Full Changelog: https://github.com/Michionlion/mesi/compare/v1.0.0...v1.0.1

    Source code(tar.gz)
    Source code(zip)
  • v1.0.0(Oct 25, 2021)

    Mesi is a tool to check the similarity between many different files; this is its first release, with support for many basic features and lots of text similarity/distance algorithms. You can even try out Mesi without downloading it!

    pipx run mesi --help
    

    If you don't have pipx installed, you may need to install it, but that's something you should do anyways. :smile:

    Source code(tar.gz)
    Source code(zip)
Owner
GatorEducator
Software tools developed at Allegheny College for computer science courses
GatorEducator
Dragon Age: Origins toolset to extract/build .erf files, patch language-specific .dlg files, and view the contents of files in the ERF or GFF format

DAOTools This is a set of tools for Dragon Age: Origins modding. It can patch the text lines of .dlg files, extract and build an .erf file, and view t

8 Dec 06, 2022
Python function to stream unzip all the files in a ZIP archive: without loading the entire ZIP file or any of its files into memory at once

Python function to stream unzip all the files in a ZIP archive: without loading the entire ZIP file or any of its files into memory at once

Department for International Trade 206 Jan 02, 2023
A simple file module for creating, editing and saving files.

A simple file module for creating, editing and saving files.

1 Nov 25, 2021
Instant Fuzzy File Search for Alfred

List all the files inside a folder using fd, and instantly fuzzy-search through all of them using fzf, all from inside Alfred with a single keyword: fzf.

Mr. Pennyworth 37 Nov 30, 2022
A simple file sharing tool written in python

Share it A simple file sharing tool written in python Installation If you are using Windows os you can directly Run .exe file -- download If you are

Sachit Yadav 7 Dec 16, 2022
PaddingZip - a tool that you can craft a zip file that contains the padding characters between the file content.

PaddingZip - a tool that you can craft a zip file that contains the padding characters between the file content.

phithon 53 Nov 07, 2022
Here is some Python code that allows you to read in SVG files and approximate their paths using a Fourier series.

Here is some Python code that allows you to read in SVG files and approximate their paths using a Fourier series. The Fourier series can be animated and visualized, the function can be output as a tw

Alexander 12 Jan 01, 2023
A simple Python code that takes input from a csv file and makes it into a vcf file.

Contacts-Maker A simple Python code that takes input from a csv file and makes it into a vcf file. Imagine a college or a large community where each y

1 Feb 13, 2022
A Certificate renaming tool made for IEEE CS SBC, SJCE.

PDF Batch Renamer Made for IEEE CS SBC, SJCE How to use? Before using the python script, ensure that pytesseract, pdf2image, opencv and other supporti

Ashwin Kumar U 2 Nov 14, 2021
An object-oriented approach to Python file/directory operations.

Unipath An object-oriented approach to file/directory operations Version: 1.1 Home page: https://github.com/mikeorr/Unipath Docs: https://github.com/m

Mike Orr 506 Dec 29, 2022
The best way to convert files on your computer, be it .pdf to .png, .pdf to .docx, .png to .ico, or anything you can imagine.

The best way to convert files on your computer, be it .pdf to .png, .pdf to .docx, .png to .ico, or anything you can imagine.

JareBear 2 Nov 20, 2021
MHS2 Save file editing tools. Transfers save files between players, switch and pc version, encrypts and decrypts.

SaveTools MHS2 Save file editing tools. Transfers save files between players, switch and pc version, encrypts and decrypts. Credits Written by Asteris

31 Nov 17, 2022
Python interface for reading and appending tar files

Python interface for reading and appending tar files, while keeping a fast index for finding and reading files in the archive. This interface has been

Lawrence Livermore National Laboratory 1 Nov 12, 2021
Copy only text-like files from the folder

copy-only-text-like-files-from-folder-python copy only text-like files from the folder This project is for those who want to copy only source code or

1 May 17, 2022
BREP : Binary Search in plaintext and gzip files

BREP : Binary Search in plaintext and gzip files Search large files in O(log n) time using binary search. We support plaintext and Gzipped files. Benc

Arnaud de Saint Meloir 5 Dec 24, 2021
File support for asyncio

aiofiles: file support for asyncio aiofiles is an Apache2 licensed library, written in Python, for handling local disk files in asyncio applications.

Tin Tvrtković 2.1k Jan 01, 2023
Simple Python File Manager

This script lets you automatically relocate files based on their extensions. Very useful from the downloads folder !

Aimé Risson 22 Dec 27, 2022
organize - The file management automation tool

organize - The file management automation tool

Thomas Feldmann 1.5k Jan 01, 2023
OneDriveExplorer - A command line and GUI based application for reconstructing the folder strucure of OneDrive from the UserCid.dat file

OneDriveExplorer - A command line and GUI based application for reconstructing the folder strucure of OneDrive from the UserCid.dat file

Brian Maloney 100 Dec 13, 2022
Pti-file-format - Reverse engineering the Polyend Tracker instrument file format

pti-file-format Reverse engineering the Polyend Tracker instrument file format.

Jaap Roes 14 Dec 30, 2022