Markup is an online annotation tool that can be used to transform unstructured documents into structured formats for NLP and ML tasks, such as named-entity recognition. Markup learns as you annotate in order to predict and suggest complex annotations. Markup also provides integrated access to existing and custom ontologies, enabling the prediction and suggestion of ontology mappings based on the text you're annotating.

Last update: Dec 18, 2022

Overview

What is Markup?

Usage

A full-featured version of Markup is available both via the website and as a local installation.

Online

The online version of Markup can be found here.

Local Server

Docker

Run docker run -d -p 8000:8000 samueldobbie/markup and visit http://localhost:8000.

Manual Installation

Clone or download the repository.
Run python setup.py using 64-bit Python 3.
Visit http://localhost:8000.

For further sessions, the local server can be started directly by running python manage.py runserver localhost:8000.
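
After starting the local server (via Docker or manage.py), it can take a moment before http://localhost:8000 responds. The following is a minimal sketch of a readiness check and is not part of Markup itself; the function name wait_for_server and its parameters are hypothetical, and it relies only on the Python standard library.

```python
import time
import urllib.error
import urllib.request


def wait_for_server(url="http://localhost:8000", timeout=30.0, interval=1.0,
                    opener=urllib.request.urlopen):
    """Poll `url` until the server answers or `timeout` seconds elapse.

    Hypothetical helper: returns True as soon as any HTTP response is
    received, and False if nothing answers before the deadline.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful response means the server is up.
            with opener(url, timeout=interval):
                return True
        except urllib.error.HTTPError:
            # The server answered, even if with an error status.
            return True
        except (urllib.error.URLError, OSError):
            # Connection refused or timed out: not ready yet.
            pass
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

Calling wait_for_server() with no arguments polls the address used above for up to 30 seconds. The opener parameter exists only so the check can be exercised without a running server.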

Documentation

Documentation to help with setting up and using Markup can be found here.

Features

Ability to navigate between and annotate multiple documents in a single session.
Predictive annotation suggestions (incl. attributes) using underlying active learning and sequence-to-sequence models.
Integrated access to pre-loaded and user-defined ontologies, enabling predictive mappings and direct querying.
Built-in configuration file creator.
Built-in synthetic data generator and custom model trainer (local version only due to high computational expense).
Dynamic attribute display.
Any number of overlaying annotations, enabling the capture of complex data.
Full-featured tool available via local installation and website.
Dark mode.

Future Plans

Add user accounts.
Add ability for users to join a team and share ontologies, documents, guidelines, annotations, etc.
Accessible version for colour-blind users.
Add ability to perform text and image classification.
Add ability to annotate images.

Known Bugs / Issues

Annotations may be offset when annotating across newlines in CRLF (Windows) text documents. The offset is purely visual; the exported indices will be correct.
When using the website version of Markup, certain features may freeze while annotations are being predicted.
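
To illustrate why CRLF documents are prone to offsets like the one described above: a position counted on text whose line endings have been normalized to LF drifts by one character for every preceding line break, relative to the raw CRLF text. The sketch below is not Markup's code, and it does not assume which convention Markup uses internally; to_raw_offset is a hypothetical helper that can be used to spot-check a set of indices against a document.

```python
def to_raw_offset(raw_text, normalized_offset):
    """Map an offset in LF-normalized text to the matching offset in raw text.

    Hypothetical helper: `normalized_offset` is a character index into
    raw_text.replace("\r\n", "\n").
    """
    normalized_length = len(raw_text.replace("\r\n", "\n"))
    if not 0 <= normalized_offset <= normalized_length:
        raise ValueError("offset is outside the normalized text")

    raw_index = 0
    seen = 0
    while seen < normalized_offset:
        # A CRLF pair is one character after normalization but two in the raw text.
        if raw_text.startswith("\r\n", raw_index):
            raw_index += 2
        else:
            raw_index += 1
        seen += 1
    return raw_index


# Made-up sample text with Windows line endings.
raw = "Patient has asthma.\r\nPrescribed salbutamol.\r\n"
normalized = raw.replace("\r\n", "\n")

start = normalized.index("salbutamol")
raw_start = to_raw_offset(raw, start)

# The raw offset is larger by the number of CRLF pairs before the span.
print(start, raw_start, raw[raw_start:raw_start + len("salbutamol")])
```

When reading a document from disk for such a check, open it with newline="" so that Python does not translate CRLF to LF on read.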

Owner

Samuel Dobbie
