Informal Persian Universal Dependency Treebank (iPerUDT)

The Informal Persian Universal Dependency Treebank (iPerUDT) is an open-source collection of colloquial, informal texts from Persian blogs, consisting of 3,000 sentences and 54,904 tokens. The corpus is annotated in CoNLL-U format following the Universal Dependencies scheme (Nivre et al., 2020).

The treebank uses the following coarse-grained Universal Dependencies part-of-speech tags (UPOS) and fine-grained, language-specific part-of-speech tags (XPOS).

UPOS	XPOS	Description
ADJ	ADJ	Adjective
ADJ	ADJ_CMPR	Comparative adjective
ADJ	ADJ_SUP	Superlative adjective
ADV	ADV	Adverb
ADV	ADV_I	Adverb of interrogation
ADV	ADV_LOC	Adverb of location
ADV	ADV_NEG	Adverb of negation
ADV	ADV_TIME	Adverb of time
ADP	P	Preposition
AUX	V_AUX	Auxiliary/copula verb
CCONJ	CON	Coordinating conjunction
DET	DET	Determiner
INTJ	INTJ	Interjection
NOUN	N_PL	Plural noun
NOUN	N_SING	Singular noun
NUM	NUM	Numeral
PART	PART	Differential object marker, focus marker, negative particle, question particle
PRON	PRO	Pronoun
PROPN	PROPN	Proper noun (persons, locations, months, organizations, geopolitical entities)
PUNCT	DELM	Punctuation/delimiter
SCONJ	CON	Subordinating conjunction
VERB	V_IMP	Imperative verb
VERB	V_PA	Past tense verb
VERB	V_PP	Past participle
VERB	V_PRS	Present tense verb
VERB	V_SUB	Subjunctive verb
X	FW	Foreign word
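
As an illustration of how the tag inventory above can be used, the following minimal sketch reads CoNLL-U text with plain Python, counts (UPOS, XPOS) pairs, and reports pairs that do not appear in the table. The function names and the `TAG_SAMPLE` sentence are hypothetical: the sample uses placeholder word forms and is not taken from the treebank.

```python
from collections import Counter

# UPOS -> permitted XPOS tags, copied from the table above.
TAGSET = {
    "ADJ": {"ADJ", "ADJ_CMPR", "ADJ_SUP"},
    "ADV": {"ADV", "ADV_I", "ADV_LOC", "ADV_NEG", "ADV_TIME"},
    "ADP": {"P"},
    "AUX": {"V_AUX"},
    "CCONJ": {"CON"},
    "DET": {"DET"},
    "INTJ": {"INTJ"},
    "NOUN": {"N_PL", "N_SING"},
    "NUM": {"NUM"},
    "PART": {"PART"},
    "PRON": {"PRO"},
    "PROPN": {"PROPN"},
    "PUNCT": {"DELM"},
    "SCONJ": {"CON"},
    "VERB": {"V_IMP", "V_PA", "V_PP", "V_PRS", "V_SUB"},
    "X": {"FW"},
}


def iter_tokens(conllu_text):
    """Yield the 10 tab-separated fields of every word line.

    Comments, blank lines, multiword-token ranges (e.g. "1-2") and
    empty nodes (e.g. "1.1") are skipped.
    """
    for line in conllu_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        fields = line.split("\t")
        if len(fields) != 10 or not fields[0].isdigit():
            continue
        yield fields


def tag_pair_counts(conllu_text):
    """Count (UPOS, XPOS) pairs; UPOS is column 4 and XPOS is column 5."""
    return Counter((f[3], f[4]) for f in iter_tokens(conllu_text))


def invalid_pairs(conllu_text):
    """Return the sorted (UPOS, XPOS) pairs that are not listed in TAGSET."""
    counts = tag_pair_counts(conllu_text)
    return sorted({(u, x) for (u, x) in counts if x not in TAGSET.get(u, set())})


# Hypothetical sentence with placeholder forms (not treebank data).
TAG_SAMPLE = "\n".join([
    "# sent_id = placeholder-1",
    "1\ttok1\t_\tPRON\tPRO\t_\t3\tnsubj\t_\t_",
    "2\ttok2\t_\tNOUN\tN_SING\t_\t3\tobj\t_\t_",
    "3\ttok3\t_\tVERB\tV_PA\t_\t0\troot\t_\t_",
    "4\t.\t_\tPUNCT\tDELM\t_\t3\tpunct\t_\t_",
    "",
])

if __name__ == "__main__":
    print(tag_pair_counts(TAG_SAMPLE))
    print(invalid_pairs(TAG_SAMPLE))
```

To run the same check on a real file, read its contents into a string and pass it to `invalid_pairs`. An empty result means every pair in the file is covered by the table.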

We used the Universal Dependencies annotation scheme, which represents the syntax of a sentence as a dependency structure defined by the relations between heads and their dependents. The syntactic annotation consists of 42 dependency relations: 32 universal relations and 10 language-specific relations (marked by *).

Dependency relation	Description
acl	Clausal modifier of noun
acl:relcl^*	Relative clause modifier
advcl	Adverbial clause modifier
advmod	Adverbial modifier
amod	Adjectival modifier
appos	Appositional modifier
aux	Auxiliary
aux:pass	Passive auxiliary
case	Accusative marker/case marking
cc	Coordination
cc:preconj^*	Preconjunction
ccomp	Clausal complement
compound	Compound
compound:lvc^*	Nominal/adjectival NVE in complex predicates
compound:prt^*	Particle NVE in complex predicates
compound:redup^*	Reduplicative words
compound:svc^*	Serial verb constructions
conj	Conjunct
cop	Copula
det	Determiner
det:predet^*	Predeterminer
discourse	Discourse element
discourse:top/foc^*	Topic/focus marker
dislocated	Dislocated elements
fixed	Fixed multiword expressions
flat	Flat multiword expressions
goeswith	Goes with for poorly-edited words
nmod	Nominal modifier
nmod:poss^*	Possessive/genitive modifier
nsubj	Nominal subject
nsubj:pass	Passive nominal subject
nummod	Numeric modifier
mark	Complementizer/marker
obj	Object
obl	Oblique
obl:arg^*	Oblique core argument
orphan	Ellipsis constructions
parataxis	Parataxis
punct	Punctuation
root	Root
vocative	Vocative
xcomp	Open clausal complement
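
The head-dependent structure described above can be inspected with a few lines of plain Python. The sketch below is illustrative only: it groups tokens by their head, checks that a sentence has exactly one root, and counts relation labels both in full and by their base type (the part before the colon). It assumes every word line carries an integer HEAD value. The function names and `DEP_SAMPLE` are hypothetical, and the sample uses placeholder word forms rather than treebank data.

```python
from collections import Counter, defaultdict


def read_sentences(conllu_text):
    """Yield sentences as lists of (id, form, head, deprel) tuples."""
    sentence = []
    for line in conllu_text.splitlines():
        if not line.strip():
            # A blank line closes the current sentence.
            if sentence:
                yield sentence
                sentence = []
            continue
        if line.startswith("#"):
            continue
        fields = line.split("\t")
        # Skip multiword-token ranges ("1-2") and empty nodes ("1.1").
        if len(fields) != 10 or not fields[0].isdigit():
            continue
        sentence.append((int(fields[0]), fields[1], int(fields[6]), fields[7]))
    if sentence:
        yield sentence


def dependents_by_head(sentence):
    """Map each head id (0 is the artificial root) to its dependents."""
    tree = defaultdict(list)
    for tok_id, _form, head, deprel in sentence:
        tree[head].append((tok_id, deprel))
    return dict(tree)


def has_single_root(sentence):
    """True if exactly one token has head 0 and it is labelled 'root'."""
    roots = [tok for tok in sentence if tok[2] == 0]
    return len(roots) == 1 and roots[0][3] == "root"


def relation_counts(sentences):
    """Count full labels (e.g. nmod:poss) and base labels (e.g. nmod)."""
    full, base = Counter(), Counter()
    for sentence in sentences:
        for _id, _form, _head, deprel in sentence:
            full[deprel] += 1
            base[deprel.split(":", 1)[0]] += 1
    return full, base


# Hypothetical sentence with placeholder forms (not treebank data).
DEP_SAMPLE = "\n".join([
    "# sent_id = placeholder-2",
    "1\ttok1\t_\tNOUN\tN_SING\t_\t4\tnsubj\t_\t_",
    "2\ttok2\t_\tPRON\tPRO\t_\t1\tnmod:poss\t_\t_",
    "3\ttok3\t_\tNOUN\tN_SING\t_\t4\tobj\t_\t_",
    "4\ttok4\t_\tVERB\tV_PA\t_\t0\troot\t_\t_",
    "5\t.\t_\tPUNCT\tDELM\t_\t4\tpunct\t_\t_",
    "",
])

if __name__ == "__main__":
    sentences = list(read_sentences(DEP_SAMPLE))
    print(dependents_by_head(sentences[0]))
    print(has_single_root(sentences[0]))
    print(relation_counts(sentences))
```

Counting by base label makes it easy to compare the language-specific subtypes with their universal parents, for example how many `nmod` dependents are annotated as `nmod:poss`.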

References

Nivre, Joakim, Marie-Catherine de Marneffe, Filip Ginter, Jan Hajič, Christopher D. Manning, Sampo Pyysalo, Sebastian Schuster, Francis M. Tyers, and Dan Zeman. (2020). Universal dependencies v2: An evergrowing multilingual treebank collection. In Proceedings of the 12th Conference on Language Resources and Evaluation (LREC), 4027–4036.

Owner

Roya Kabiri