Markup is an online annotation tool that can be used to transform unstructured documents into structured formats for NLP and ML tasks, such as named-entity recognition. Markup learns as you annotate in order to predict and suggest complex annotations. Markup also provides integrated access to existing and custom ontologies, enabling the prediction and suggestion of ontology mappings based on the text you're annotating.

Overview

What is Markup?

Markup is an online annotation tool that can be used to transform unstructured documents into structured formats for NLP and ML tasks, such as named-entity recognition. Markup learns as you annotate in order to predict and suggest complex annotations. Markup also provides integrated access to existing and custom ontologies, enabling the prediction and suggestion of ontology mappings based on the text you're annotating.

Usage

A full-feature version of Markup is available both via website and local installation.

Online

The online version of Markup can be found here.

Local Server

Docker

Run docker run -d -p 8000:8000 samueldobbie/markup and visit http://localhost:8000.

Manual Installation

  1. Clone or download the repository.

  2. Run python setup.py using 64-bit Python3.

  3. Visit http://localhost:8000.

For futher sessions, the local server can be started directly by running python manage.py runserver localhost:8000.

Documentation

Documentation to help with setting up and using Markup can be found here.

Features

  • Ability to navigate between and annotate multiple documents in a single session.
  • Predictive annotation suggestions (incl. attributes) using underlying active learning and sequence-to-sequence models.
  • Integrated access to pre-loaded and user-defined ontologies, enabling predictive mappings and direct querying.
  • Built-in configuration file creator.
  • Built-in synthetic data generator and custom model trainer (local version only due to high computational expense).
  • Dynamic attribute display.
  • Any number of overlaying annotations, enabling the capture of complex data.
  • Full-feature tool available via local installation and website.
  • Dark mode.

Future Plans

  • Add user accounts.
  • Add ability for users to join a team and share ontologies, documents, guidelines, annotations, etc.
  • Accessible version for colour-blind users.
  • Add ability to perform text and image classification.
  • Add ability to annotate images.

Known Bugs / Issues

  • Annotations may be offset when annotating across newlines in CRLF (Windows) text documents. The offset is purely visual; the exported indicies will be correct.
  • When using the website version of Markup, certain features may freeze while annotations are being predicted.
知乎评论区词云分析

zhihu-comment-wordcloud 知乎评论区词云分析 起源于:如何看待知乎问题“男生真的很不能接受彩礼吗?”的一个回答下评论数超8万条,创单个回答下评论数新记录? 项目代码说明 2.download_comment.py 下载全量评论 2.word_cloud_by_dt 生成词云 2

李国宝 10 Sep 26, 2022
A slugifier that works in unicode

Unicode Slugify Unicode Slugify is a slugifier that generates unicode slugs. It was originally used in the Firefox Add-ons web site to generate slugs

Mozilla 315 Nov 21, 2022
AnnIE - Annotation Platform, tool for open information extraction annotations using text files.

AnnIE - Annotation Platform, tool for open information extraction annotations using text files.

Niklas 29 Dec 20, 2022
Extract knowledge from raw text

Extract knowledge from raw text This repository is a nearly copy-paste of "From Text to Knowledge: The Information Extraction Pipeline" with some cosm

Raphael Sourty 10 Dec 03, 2022
Paranoid text spacing in Python

pangu.py Paranoid text spacing for good readability, to automatically insert whitespace between CJK (Chinese, Japanese, Korean) and half-width charact

Vinta Chen 194 Nov 19, 2022
"Complexity" of Flags of the countries of the world

"Complexity" of Flags of the countries of the world Flags (png) from: https://flagcdn.com/w2560.zip https://flagpedia.net/download/images run: chmod +

Alexander Lelchuk 1 Feb 10, 2022
CowExcept - Spice up those exceptions with cowexcept!

CowExcept - Spice up those exceptions with cowexcept!

James Ansley 41 Jun 30, 2022
Shows twitch pay for any streamer from Twitch leaked CSV files.

twitch_leak_csv_reader Shows twitch pay for any streamer from Twitch leaked CSV files. Requirements: You need python3 (you can install python 3 from o

5 Nov 11, 2022
A collection of pre-commit hooks for handling text files.

texthooks A collection of pre-commit hooks for handling text files. In particular, hooks for handling unicode characters which may be undesirable in a

Stephen Rosen 5 Oct 28, 2022
Python flexible slugify function

awesome-slugify Python flexible slugify function PyPi: https://pypi.python.org/pypi/awesome-slugify Github: https://github.com/dimka665/awesome-slugif

Dmitry Voronin 471 Dec 20, 2022
Hamming code generation, error detection & correction.

Hamming code generation, error detection & correction.

Farhan Bin Amin 2 Jun 30, 2022
Parse Any Text With Python

ParseAnyText A small package to parse strings. What is the work of it? Well It's a module to creates parser that helps to parse a text easily with les

Sayam Goswami 1 Jan 11, 2022
🚩 A simple and clean python banner generator - Banners

🚩 A simple and clean python banner generator - Banners

Kumar Vicku 12 Oct 09, 2022
🍋 A Python package to process food

Pyfood is a simple Python package to process food, in different languages. Pyfood's ambition is to be the go-to library to deal with food, recipes, on

Local Seasonal 8 Apr 04, 2022
Fuzzy String Matching in Python

FuzzyWuzzy Fuzzy string matching like a boss. It uses Levenshtein Distance to calculate the differences between sequences in a simple-to-use package.

SeatGeek 8.8k Jan 08, 2023
Athens: a great tool for taking notes and organising knowldge

AthensSyncer Athens is a great tool for taking notes and organising knowldge. But it is a bummer that you cannot use it accross multiple devices. Well

6 Dec 14, 2022
Um simulador de caixa registradora com database usando arquivos .txt

🛒 Caixa Registradora V2 ❓ - Como usar? Execute o caixa-registradora.py, nele vai ter um menu interativo, você pode cadastrar diversos produtos em um

Gabriel 0 Sep 25, 2022
A generator library for concise, unambiguous and URL-safe UUIDs.

Description shortuuid is a simple python library that generates concise, unambiguous, URL-safe UUIDs. Often, one needs to use non-sequential IDs in pl

Stavros Korokithakis 1.8k Dec 31, 2022
box is a text-based visual programming language inspired by Unreal Engine Blueprint function graphs.

Box is a text-based visual programming language inspired by Unreal Engine blueprint function graphs. $ cat factorial.box ┌─ƒ(Factorial)───┐

Pranav 104 Dec 24, 2022
split Word file by chapter

split Word file by chapter we use the mircosoft word api to code this tool api url:https://docs.microsoft.com/zh-cn/dotnet/api/ if this tool is good f

wisdom under lemon trees 5 Nov 06, 2021