Develop open-source Python Arabic NLP libraries that the Arab world will easily use in all Natural Language Processing applications

Last update: Oct 22, 2022

Related tags

Text Data & NLP Yarub_library

Overview

Yarub_library

# The Problem

Arabic is one of the most widely spoken and used languages. Known as the language of the Dad (لغة الضاد), it is distinguished by its rich stock of words and forms. It is also phonetically distinctive: it includes all the sounds found in the other Semitic languages. It is flexible as well, accommodating all derived and synonymous terms and offering a fitting expression for every occasion.

We recognized the importance of Arabic and its standing among the peoples of the Middle East and the world, and we aim to include Arabic among the languages that are easy to use in artificial intelligence and natural language processing applications.

In this Omdena project, our goal was to develop open-source Python Arabic NLP libraries that the Arab world can easily use in Natural Language Processing applications such as morphological analysis, named entity recognition, sentiment analysis, word embedding, dialect identification, part-of-speech tagging, and so on, along with the training dataset. This article contains code that may be useful whatever your level of experience. For beginners, it is a good starting point for data collection using web scraping, with links to the official documentation pages of every library mentioned.
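Most of the tasks listed above start from cleaned, normalized text. The following is a minimal sketch of a common Arabic normalization step, using only the Python standard library. It is not the Yarub_library API: the function names `normalize_arabic` and `tokenize` and the specific normalization choices (stripping diacritics and tatweel, unifying alef variants) are assumptions made for illustration.

```python
import re
import unicodedata

# Assumed normalization choices (illustrative, not taken from Yarub_library):
# Arabic diacritics (tashkeel) U+064B..U+0652 plus superscript alef U+0670.
_DIACRITICS = re.compile(r"[\u064B-\u0652\u0670]")
# Tatweel (kashida), a purely typographic elongation character.
_TATWEEL = "\u0640"
# Alef with madda / hamza above / hamza below, mapped to bare alef U+0627.
_ALEF_VARIANTS = re.compile(r"[\u0622\u0623\u0625]")


def normalize_arabic(text: str) -> str:
    """Return text with diacritics and tatweel removed and alef forms unified."""
    text = unicodedata.normalize("NFC", text)   # compose combining sequences
    text = _DIACRITICS.sub("", text)            # drop short-vowel marks
    text = text.replace(_TATWEEL, "")           # drop elongation characters
    text = _ALEF_VARIANTS.sub("\u0627", text)   # unify alef variants
    return re.sub(r"\s+", " ", text).strip()    # collapse whitespace


def tokenize(text: str) -> list[str]:
    """Whitespace tokenization on the normalized text (hypothetical helper)."""
    return normalize_arabic(text).split()


if __name__ == "__main__":
    print(tokenize("اللُّغَة العَرَبِيَّة جـمـيلة"))
```

Whether to unify alef variants or strip diacritics depends on the downstream task. Morphological analysis, for example, may need the diacritics that sentiment analysis can safely discard.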

Owner

BADER ALABDAN
