Convert Tagtog annotations into a CSV dataset

Last update: Dec 28, 2021

Overview

tagtog_relation_extraction

Converts a Tagtog annotation export into a CSV dataset.

How to Use

On Tagtog

1. Go to Project > Downloads
2. Download all documents using the download button on that page

On Local

1. Place folders and files according to the structure specified below:

tagtog_relation_extraction
├──main.py
├──util.py
├──.gitignore
├──README.md
├──requirements.txt
└──Your_download_file_Name
   ├──annotations-legend.json
   ├──ann.json
   │  └──master
   │     └──pool/
   ├──plain.html
   │  └──pool/
   ├──guidelines.md
   └──README.md
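
Before running the conversion, it can help to confirm that the downloaded folder matches the tree above. The following is a minimal sketch, not part of this repository: the helper name `missing_entries` and the `EXPECTED` list are hypothetical, and the entries are simply copied from the structure shown above.

```python
from pathlib import Path

# Entries copied from the directory tree above (assumed to be what the script needs).
EXPECTED = [
    "annotations-legend.json",
    "ann.json/master/pool",
    "plain.html/pool",
]


def missing_entries(export_dir):
    """Return the expected entries that do not exist under export_dir."""
    root = Path(export_dir)
    return [rel for rel in EXPECTED if not (root / rel).exists()]
```

Calling `missing_entries("Your_download_file_Name")` from the repository root returns an empty list when every expected entry is present.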

2. Install the required packages

tqdm==4.62.3
pandas==1.1.5
beautifulsoup4==4.10.0

$ pip install -r $ROOT/tagtog_relation_extraction/requirements.txt

3. Run main.py, passing the name of the downloaded folder

$ python main.py --path Your_download_file_Name

Result

1. Dataset file (dataset.csv)

A CSV file whose rows follow the KLUE dataset format.
Example:

sentence: 가장 가능성이 높은 새 대안은 플랑크 상수를 통해 질량을 정의하는 방안이다.질량의 단위는 킬로그램 외에도 여러가지가 있는데, 그중 대표적인 단위가 바로 원자질량단위이다
sub_tag: {'word': '원자질량단위', 'start_idx': 85, 'end_idx': 90, 'type': 'POH'}
obj_tag: {'word': '플랑크 상수', 'start_idx': 17, 'end_idx': 22, 'type': 'POH'}
label: POH:no_relation
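
A row like the one above can be read back and sanity-checked with the standard library. This is a sketch under assumptions drawn only from the example: the column names are `sentence`, `sub_tag`, `obj_tag` and `label`; the tag cells are stored as Python dict literals; and `end_idx` is inclusive (in the example, characters 17 through 22 of the sentence spell the six-character word `플랑크 상수`). The function names are hypothetical and do not come from main.py or util.py.

```python
import ast
import csv


def parse_tag(cell):
    """Turn a tag cell such as "{'word': ..., 'start_idx': 17, ...}" into a dict."""
    return ast.literal_eval(cell)


def span_matches(sentence, tag):
    """Check that tag['word'] occupies sentence[start_idx .. end_idx], end inclusive."""
    return sentence[tag["start_idx"]:tag["end_idx"] + 1] == tag["word"]


def load_rows(csv_path):
    """Yield rows of dataset.csv with sub_tag and obj_tag parsed into dicts."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            row["sub_tag"] = parse_tag(row["sub_tag"])
            row["obj_tag"] = parse_tag(row["obj_tag"])
            yield row
```

Running `span_matches(row["sentence"], row["sub_tag"])` over every row is a quick way to spot entity offsets that drifted during conversion.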

2. File for checking answers (answer_check.csv)

A CSV file designed for checking entity tags and labels.
Example:

sentence: 가장 가능성이 높은 새 대안은 플랑크 상수를 통해 질량을 정의하는 방안이다.질량의 단위는 킬로그램 외에도 여러가지가 있는데, 그중 대표적인 단위가 바로 원자질량단위이다
sub_tag: POH
obj_tag: POH
label: POH:no_relation

Restrictions

Entity labels in the Tagtog project must follow this form:

SUBJ-{ENT_TYPE}-{RELATION_NAME}
OBJ-{ENT_TYPE}-{RELATION_NAME}

If your labels use a different form, you may need to modify util.py.
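
To make the expected label form concrete, here is a small sketch of how such a label could be split into its parts. It is an illustration only and not the code in util.py: the function name is hypothetical, and it assumes that `ENT_TYPE` contains no hyphen while `RELATION_NAME` may.

```python
def parse_entity_label(label):
    """Split 'SUBJ-{ENT_TYPE}-{RELATION_NAME}' into (role, ent_type, relation)."""
    parts = label.split("-", 2)  # at most 3 parts, so hyphens in the relation survive
    if len(parts) != 3 or parts[0] not in ("SUBJ", "OBJ"):
        raise ValueError(f"unexpected entity label: {label!r}")
    role, ent_type, relation = parts
    return role, ent_type, relation
```

For a hypothetical label such as `SUBJ-POH-no_relation`, this yields the role `SUBJ`, the entity type `POH` and the relation `no_relation`, which correspond to the `POH:no_relation` label in the example output.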

Owner

hyeong
