DziriBERT: a Pre-trained Language Model for the Algerian Dialect

Last update: Jan 07, 2023

Related tags

Overview

DziriBERT

DziriBERT is the first Transformer-based Language Model that has been pre-trained specifically for the Algerian Dialect. It handles Algerian text contents written using both Arabic and Latin characters. It sets new state of the art results on Algerian text classification datasets, even if it has been pre-trained on much less data (~1 million tweets).

The model is publicly available at: https://huggingface.co/alger-ia/dziribert.

For more information, please visit our paper: https://arxiv.org/pdf/2109.12346.pdf

Evaluation

The Twifil dataset was used to compare DziriBERT with current multilingual, standard Arabic and dialectal Arabic models:

Model	Sentiment acc.	Emotion acc.
bert-base-multilingual-cased	73.6 %	59.4 %
aubmindlab/bert-base-arabert	72.1 %	61.2 %
CAMeL-Lab/bert-base-arabic-camelbert-mix	77.1 %	65.7 %
qarib/bert-base-qarib	77.7 %	67.6 %
UBC-NLP/MARBERT	80.1 %	68.4 %
alger-ia/dziribert	80.3 %	69.3 %

In order to reproduce these results, please install the following requirements:

pip install -r requirements.txt

Then, run the following evaluation script:

python3 evaluate_model.py

These results have been obtained on a Tesla K80 GPU.

Pretrained DziriBERT

DziriBERT has been uploaded to the HuggingFace hub in order to facilitate its use: https://huggingface.co/alger-ia/dziribert.

It can be easily downloaded and loaded using the transformers library:

from transformers import BertTokenizer, BertForMaskedLM

tokenizer = BertTokenizer.from_pretrained("alger-ia/dziribert")
model = BertForMaskedLM.from_pretrained("alger-ia/dziribert")

How to cite

@article{dziribert,
  title={DziriBERT: a Pre-trained Language Model for the Algerian Dialect},
  author={Abdaoui, Amine and Berrimi, Mohamed and Oussalah, Mourad and Moussaoui, Abdelouahab},
  journal={arXiv preprint arXiv:2109.12346},
  year={2021}
}

Contact

Please contact [email protected] for any question, feedback or request.

DziriBERT: a Pre-trained Language Model for the Algerian Dialect

Related tags

Overview

DziriBERT

Evaluation

Pretrained DziriBERT

How to cite

Contact

Owner

This repository contains various models targetting multimodal representation learning, multimodal fusion for downstream tasks such as multimodal sentiment analysis.

A PyTorch implementation of EventProp [https://arxiv.org/abs/2009.08378], a method to train Spiking Neural Networks

Official repository of the paper "A Variational Approximation for Analyzing the Dynamics of Panel Data". Mixed Effect Neural ODE. UAI 2021.

Learning to Estimate Hidden Motions with Global Motion Aggregation

Job-Recommend-Competition - Vectorwise Interpretable Attentions for Multimodal Tabular Data

Catbird is an open source paraphrase generation toolkit based on PyTorch.

Official Pytorch implementation of "Learning to Estimate Robust 3D Human Mesh from In-the-Wild Crowded Scenes", CVPR 2022

PyTorch implementation of GLOM

Object detection GUI based on PaddleDetection

Learning from Synthetic Data with Fine-grained Attributes for Person Re-Identification

A Domain-Agnostic Benchmark for Self-Supervised Learning

PySOT - SenseTime Research platform for single object tracking, implementing algorithms like SiamRPN and SiamMask.

An implementation of based on pytorch and mmcv

Image super-resolution (SR) is a fast-moving field with novel architectures attracting the spotlight

Release of the ConditionalQA dataset

Contains supplementary materials for reproduce results in HMC divergence time estimation manuscript

Interactive web apps created using geemap and streamlit

Code that accompanies the paper Semi-supervised Deep Kernel Learning: Regression with Unlabeled Data by Minimizing Predictive Variance

OpenFace – a state-of-the art tool intended for facial landmark detection, head pose estimation, facial action unit recognition, and eye-gaze estimation.

Lightweight plotting to the terminal. 4x resolution via Unicode.