Final Project for the Intel AI Readiness Boot Camp NLP (Jan)

Overview

NLP Boot Camp (Jan) Synopsis

Full Name:

Prameya Mohanty

Name of your School:

Delhi Public School, Rourkela

Class:

VIII

Title of the Project:

iTransect – A Language Detector cum Translator

Project Domain:

Natural Language Processing

Summary:

This application is an AI- and NLP-enabled language detector cum translator. It first detects the language of the text entered by the user, and then translates the text into the user's desired language. The app can recognize and translate text in over 15 languages.

Context:

We frequently face problems while reading articles found through Google or browsing websites that are not in English or in our mother tongue. Many people in rural areas also do not understand any language except their mother tongue. With a translator, they too can translate such text and read it.

My idea for solving this problem is to create a translator that converts text into a language we can understand. But another problem arises: we first need to recognize which language the original text is written in, and we often fail to do so. For this reason, my application simply takes the text as input, recognizes its language, and then translates it into our desired language.

I turned my idea into a solution by applying Natural Language Processing to the user's text: first to recognize the language used in the text, and then to translate it into the user's desired language.

How does it work:

I have used the MultinomialNB model from the scikit-learn library. The multinomial Naive Bayes classifier is suitable for classification with discrete features (e.g., word counts for text classification). The multinomial distribution normally requires integer feature counts; in practice, however, fractional counts such as tf-idf may also work.
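To make the idea of classifying text from word counts concrete, here is a minimal sketch of a language detector built with scikit-learn. It is not the project's actual training code: the six toy sentences, the use of CountVectorizer with its default word tokenizer, and the name build_language_detector are all assumptions made for illustration. The real application trains on a much larger dataset.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy training data (hypothetical, NOT the project's dataset).
TRAIN_TEXTS = [
    "the cat is on the table",
    "where is the book",
    "le chat est sur la table",
    "où est le livre",
    "el gato está en la mesa",
    "dónde está el libro",
]
TRAIN_LABELS = ["English", "English", "French", "French", "Spanish", "Spanish"]


def build_language_detector():
    """Fit a word-count + multinomial Naive Bayes pipeline on the toy data."""
    # CountVectorizer turns each sentence into integer word counts,
    # which is the kind of discrete feature MultinomialNB expects.
    model = make_pipeline(CountVectorizer(), MultinomialNB())
    model.fit(TRAIN_TEXTS, TRAIN_LABELS)
    return model


detector = build_language_detector()

if __name__ == "__main__":
    for sentence in ["the book is on the table", "el libro está en la mesa"]:
        print(sentence, "->", detector.predict([sentence])[0])
```

Swapping CountVectorizer for TfidfVectorizer would give the fractional tf-idf features mentioned above. Which of the two works better on a given dataset is something to measure, not assume.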

My application includes a large dataset of text samples in over 15 languages. The MultinomialNB model from scikit-learn is trained on this dataset, which lets it predict the language of any text we give it. I then use the GoogleTrans API to translate the text into the user's desired language.

At run time, the application takes some text as input from the user. It detects the language of the text with the trained MultinomialNB model, and then uses the GoogleTrans API to translate the text into the user's desired language.
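The detect-then-translate flow described above can be sketched as a small function that takes the detector and the translator as parameters. This is only an illustration under stated assumptions: detect_and_translate, TranslationResult, fake_detect, fake_translate and the phrasebook are hypothetical names and stand-ins, not code from the repository. In the real application the detector would be the trained MultinomialNB model, and the translator would be a wrapper around the GoogleTrans API. Skipping translation when the text is already in the target language is also an assumption of this sketch.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class TranslationResult:
    source_language: str
    target_language: str
    original_text: str
    translated_text: str


def detect_and_translate(
    text: str,
    target_language: str,
    detect_language: Callable[[str], str],
    translate_text: Callable[[str, str], str],
) -> TranslationResult:
    """Detect the language of `text`, then translate it to `target_language`."""
    text = text.strip()
    if not text:
        raise ValueError("text must not be empty")
    source_language = detect_language(text)
    if source_language == target_language:
        # Assumption: nothing to translate if the text is already in the target language.
        translated = text
    else:
        translated = translate_text(text, target_language)
    return TranslationResult(source_language, target_language, text, translated)


# Stand-ins so the sketch runs offline (hypothetical, for demonstration only).
def fake_detect(text: str) -> str:
    return "Spanish" if "hola" in text.lower() else "English"


PHRASEBOOK = {("hola mundo", "English"): "Hello world"}


def fake_translate(text: str, target_language: str) -> str:
    # Falls back to the unchanged text when the phrase is unknown.
    return PHRASEBOOK.get((text.lower(), target_language), text)


if __name__ == "__main__":
    print(detect_and_translate("Hola mundo", "English", fake_detect, fake_translate))
```

Passing the two steps in as callables keeps the flow testable without network access, since the translation step normally depends on an online service.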

The future scope of my model is to enlarge the dataset by adding more languages, so that it can recognize more languages and make more accurate predictions. This would also help the application reach a broader audience.

Instructions for Usage:

  1. Prerequisite: To use this application, you need Python installed on your system. Installing Git is recommended but not compulsory.

  2. Clone Repo: If you have Git installed on your system, run the command given here. Otherwise, click the Code button on the repository page and then click the Download ZIP button.

       git clone https://github.com/The-Coding-Hub/iTransect.git

  3. Install Requirements: Next, install the application's requirements using pip and the requirements.txt file. Run the following command in the console from inside the project folder.

       pip install -r ./requirements.txt

  4. Start App: Now you are all set to use this application. You just need to execute the command given below to start the Python Flask development server on your localhost.

  5. Enjoy App: Open the link shown in your console and enjoy the application!

Video Link:

https://youtu.be/QsJQ1lxI2Lw

Code Folder Link:

https://github.com/The-Coding-Hub/iTransect

Owner
TheCodingHub
Student at Delhi Public School, Rourkela, Odisha. Programming is my favorite sport. YouTube Channel: TheCodingHub