For making Tagtog annotation into csv dataset

Overview

tagtog_relation_extraction

  • for making Tagtog annotation into csv dataset

How to Use

On Tagtog

1. Go to Project > Downloads
2. Download all documents, using the button below Image

On Local

1. Place folders and files according to the structure specified below:

tagtog_relation_extraction
├──main.py
├──util.py
├──.gitignore
├──README.md
├──requirements.txt
└──Your_download_file_Name
   ├──annotations-legend.json
   ├──ann.json
   |  └──master
   |     └──pool/
   ├──plain.html
   |  └──pool/
   ├──guidelines.md
   └──README.md

2. Install other required packages

  • tqdm==4.62.3
  • pandas==1.1.5
  • beautifulsoup4==4.10.0
$ pip install -r $ROOT/tagtog_relation_extraction/requirements.txt

3. Run

$ python main.py --path Your_download_file_Name

Result

1. Dataset file (dataset.csv)

sentence: 가장 가능성이 높은 새 대안은 플랑크 상수를 통해 질량을 정의하는 방안이다.질량의 단위는 킬로그램 외에도 여러가지가 있는데, 그중 대표적인 단위가 바로 원자질량단위이다
sub_tag: {'word': '원자질량단위', 'start_idx': 85, 'end_idx': 90, 'type': 'POH'}
obj_tag: {'word': '플랑크 상수', 'start_idx': 17, 'end_idx': 22, 'type': 'POH'}
label: POH:no_relation'

2. File for checking answers (answer_check.csv)

  • csv file desgined for checking entity taggings and labels
  • example:
sentence: 가장 가능성이 높은 새 대안은 
   
    를 통해 질량을 정의하는 방안이다.질량의 단위는 킬로그램 외에도 여러가지가 있는데, 그중 대표적인 단위가 바로 
    
     이다	
sub_tag: POH
obj_tag: POH
label: POH:no_relation

    
   

Restrictions

  • Entity labels should follow the following form
SUBJ-{ENT_TYPE}-{RELATION_NAME}
OBJ-{ENT_TYPE}-{RELATION_NAME}
  • If this is not the case you might need some revision on the util.py file
Owner
hyeong
Data Analyst / AI Engineer CV:https://bit.ly/2YMgTXd
hyeong
Validation and inference over LinkML instance data using souffle

Translates LinkML schemas into Datalog programs and executes them using Souffle, enabling advanced validation and inference over instance data

Linked data Modeling Language 7 Aug 07, 2022
PyTorch implementation for NCL (Neighborhood-enrighed Contrastive Learning)

NCL (Neighborhood-enrighed Contrastive Learning) This is the official PyTorch implementation for the paper: Zihan Lin*, Changxin Tian*, Yupeng Hou* Wa

RUCAIBox 73 Jan 03, 2023
GWpy is a collaboration-driven Python package providing tools for studying data from ground-based gravitational-wave detectors

GWpy is a collaboration-driven Python package providing tools for studying data from ground-based gravitational-wave detectors. GWpy provides a user-f

GWpy 342 Jan 07, 2023
Hatchet is a Python-based library that allows Pandas dataframes to be indexed by structured tree and graph data.

Hatchet Hatchet is a Python-based library that allows Pandas dataframes to be indexed by structured tree and graph data. It is intended for analyzing

Lawrence Livermore National Laboratory 14 Aug 19, 2022
A Python module for clustering creators of social media content into networks

sm_content_clustering A Python module for clustering creators of social media content into networks. Currently supports identifying potential networks

72 Dec 30, 2022
Produces a summary CSV report of an Amber Electric customer's energy consumption and cost data.

Amber Electric Usage Summary This is a command line tool that produces a summary CSV report of an Amber Electric customer's energy consumption and cos

Graham Lea 12 May 26, 2022
Tablexplore is an application for data analysis and plotting built in Python using the PySide2/Qt toolkit.

Tablexplore is an application for data analysis and plotting built in Python using the PySide2/Qt toolkit.

Damien Farrell 81 Dec 26, 2022
Demonstrate the breadth and depth of your data science skills by earning all of the Databricks Data Scientist credentials

Data Scientist Learning Plan Demonstrate the breadth and depth of your data science skills by earning all of the Databricks Data Scientist credentials

Trung-Duy Nguyen 27 Nov 01, 2022
OpenDrift is a software for modeling the trajectories and fate of objects or substances drifting in the ocean, or even in the atmosphere.

opendrift OpenDrift is a software for modeling the trajectories and fate of objects or substances drifting in the ocean, or even in the atmosphere. Do

OpenDrift 167 Dec 13, 2022
PyStan, a Python interface to Stan, a platform for statistical modeling. Documentation: https://pystan.readthedocs.io

PyStan PyStan is a Python interface to Stan, a package for Bayesian inference. Stan® is a state-of-the-art platform for statistical modeling and high-

Stan 229 Dec 29, 2022
Data collection, enhancement, and metrics calculation.

l3_data_collection Data collection, enhancement, and metrics calculation. Summary Repository containing code for QuantDAO's JDT data collection task.

Ruiwyn 3 Dec 23, 2022
Stock Analysis dashboard Using Streamlit and Python

StDashApp Stock Analysis Dashboard Using Streamlit and Python If you found the content useful and want to support my work, you can buy me a coffee! Th

StreamAlpha 27 Dec 09, 2022
X-news - Pipeline data use scrapy, kafka, spark streaming, spark ML and elasticsearch, Kibana

X-news - Pipeline data use scrapy, kafka, spark streaming, spark ML and elasticsearch, Kibana

Nguyễn Quang Huy 5 Sep 28, 2022
Larch: Applications and Python Library for Data Analysis of X-ray Absorption Spectroscopy (XAS, XANES, XAFS, EXAFS), X-ray Fluorescence (XRF) Spectroscopy and Imaging

Larch: Data Analysis Tools for X-ray Spectroscopy and More Documentation: http://xraypy.github.io/xraylarch Code: http://github.com/xraypy/xraylarch L

xraypy 95 Dec 13, 2022
Repository created with LinkedIn profile analysis project done

EN/en Repository created with LinkedIn profile analysis project done. The datase

Mayara Canaver 4 Aug 06, 2022
PyClustering is a Python, C++ data mining library.

pyclustering is a Python, C++ data mining library (clustering algorithm, oscillatory networks, neural networks). The library provides Python and C++ implementations (C++ pyclustering library) of each

Andrei Novikov 1k Jan 05, 2023
Visions provides an extensible suite of tools to support common data analysis operations

Visions And these visions of data types, they kept us up past the dawn. Visions provides an extensible suite of tools to support common data analysis

168 Dec 28, 2022
Python Practicum - prepare for your Data Science interview or get a refresher.

Python-Practicum Python Practicum - prepare for your Data Science interview or get a refresher. Data Data visualization using data on births from the

Jovan Trajceski 1 Jul 27, 2021
Datashader is a data rasterization pipeline for automating the process of creating meaningful representations of large amounts of data.

Datashader is a data rasterization pipeline for automating the process of creating meaningful representations of large amounts of data.

HoloViz 2.9k Jan 06, 2023
WaveFake: A Data Set to Facilitate Audio DeepFake Detection

WaveFake: A Data Set to Facilitate Audio DeepFake Detection This is the code repository for our NeurIPS 2021 (Track on Datasets and Benchmarks) paper

Chair for Sys­tems Se­cu­ri­ty 27 Dec 22, 2022