Text language identification using Wikipedia data

The aim of this project is to provide high-quality language detection for all of the web's languages, using Wikipedia as a proxy for them. Currently, 156 languages that have their own Wikipedia are supported.

Usage

The main function is text-langs, which returns two values:

an alist of language-probability pairs (languages are represented by their ISO 639-1 codes)
a vector of tokens with their inferred languages

WILD> (text-langs "це тест")
((:UK . 0.5000003) (:RU . 0.4999998))
#(<це - UK:1.00> <тест - RU:1.00>)

Running as a service

Installation

Install SBCL
Get Quicklisp
Clone this project with Git
$ cd wiki-lang-detect; sbcl --load run.lisp

Running in Docker

docker build -t wiki-lang-detect:latest .
docker run -it -p 5000:5000 wiki-lang-detect:latest

curl -X POST -H "Content-Type: application/json" -d '{"text": "Несе Галя"}' http://localhost:5000/detect | jq '.'

Alternatively, you can use a prebuilt Docker image that is maintained outside of this repository:

docker run -it -p 5000:5000 chaliy/wiki-lang-detect:latest

API

See the Swagger definition.
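For callers that prefer a script over curl, the request to the /detect endpoint can be sketched in Python using only the standard library. This is a minimal sketch, not part of the project: the helper names build_request and detect are hypothetical, the URL assumes the service is listening on localhost:5000 as in the Docker example, and the response is returned as decoded JSON without assuming anything about its fields.

```python
import json
import urllib.request

# Assumption: the service is reachable here, as in the Docker example.
DETECT_URL = "http://localhost:5000/detect"


def build_request(text, url=DETECT_URL):
    """Build a POST request whose body is JSON of the form {"text": ...}.

    Hypothetical helper: it only mirrors the body sent by the curl example.
    """
    body = json.dumps({"text": text}, ensure_ascii=False).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def detect(text, url=DETECT_URL):
    """Send the text to the service and return the decoded JSON response.

    The response schema is not assumed here, so the value is returned as-is.
    """
    with urllib.request.urlopen(build_request(text, url)) as response:
        return json.loads(response.read().decode("utf-8"))
```

With the service running, detect("Несе Галя") would return whatever JSON the endpoint produces. Building the body with json.dumps avoids quoting mistakes when the text itself contains quotes or non-ASCII characters.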

Owner

Vsevolod Dyomkin
