Code for the paper titled "Prabhupadavani: A Code-mixed Speech Translation Data for 25 languages"

Last update: Dec 01, 2022

Related tags

Deep Learning CMST

Overview

Prabhupadavani: A Code-mixed Speech Translation Data for 25 languages

Code for the paper titled "Prabhupadavani: A Code-mixed Speech Translation Data for 25 languages"

File organization

Preprocessing : contains all files used to preprocess the data (Python 3.6)
Data : contains data required to run this code
Statistics : contains all files that contains statistics of the dataset

Dataset

file name	discription
train/test/dev.csv	This is the dataset for code-mixed Speech Translation.
chopped_audios	This contains all the audios, transcription and translation.

Statistics of Corpora contained

Languages	#types	#tokens	Types per line	Tokens per line	Avg. token length
English[100%]	40,324	601889	10.58	11.27	4.92
French (France)	50510	645651	11.38	12.09	5.08
German[100%]	50748	584575	10.44	10.95	5.57
Gujarati[100%]	41959	584989	10.37	10.95	4.46
Hindi[100%]	29744	716800	12.36	13.42	3.74
Hungarian[100%]	84872	506608	9.13	9.49	5.89
Indonesian[100%]	39365	653374	11.54	12.23	6.14
Italian[100%]	52372	512061	9.23	9.59	5.37
Latvian[100%]	70040	477106	8.69	8.93	5.72
Lithuanian[100%]	75222	491558	8.92	9.2	6.04
Nepali[100%]	52630	570268	10.03	10.68	4.88
Persian (Farsi)[100%]	51722	598096	10.61	11.2	4.1
Polish[100%]	71662	494263	8.99	9.25	5.86
Portuguese (Brazil)[100%]	50087	608432	10.8	11.39	5.12
Russian[100%]	72162	490908	8.96	9.19	5.79
Slovak[100%]	73789	520465	9.39	9.75	5.37
Slovenian[100%]	68619	516649	9.35	9.67	5.3
Spanish[100%]	49806	608868	10.75	11.4	5.07
Swedish[100%]	48233	581751	10.31	10.89	5
Tamil[100%]	84183	460678	8.37	8.63	7.65
Telugu[100%]	72006	464665	8.34	8.7	6.56
Turkish[100%]	78957	453521	8.27	8.49	6.35
Bulgarian[100%]	60712	564150	10.1	10.56	5.24
Croatian[100%]	73075	531326	9.58	9.95	5.28
Danish[100%]	50170	587253	10.4	11	4.98
Dutch[100%]	42716	595464	10.52	11.15	5.05

Code-mixing

All languages in Code-mixing

Language	Total Words	Unique Words	Percentage
English	500136	6312	83.6
Bengali	46933	3907	7.84
Sanskrit	51246	7202	8.56
Total	598315	17421	100

Types of Code-mixing

	English-Sanskrit	Sanskrit-English	English-Bengali	Bengali-English
Inter-Sentential	2356	2366	339	339
Intra-Sentential	2338	851	124	0

Code for the paper titled "Prabhupadavani: A Code-mixed Speech Translation Data for 25 languages"

Related tags

Overview

Prabhupadavani: A Code-mixed Speech Translation Data for 25 languages

File organization

Dataset

Statistics of Corpora contained

Code-mixing

All languages in Code-mixing

Types of Code-mixing

Owner

Ayush Daksh

Official implementation of the ICCV 2021 paper "Joint Inductive and Transductive Learning for Video Object Segmentation"

Jarvis Project is a basic virtual assistant that uses TensorFlow for learning.

Get 2D point positions (e.g., facial landmarks) projected on 3D mesh

One-Shot Neural Ensemble Architecture Search by Diversity-Guided Search Space Shrinking

Elastic weight consolidation technique for incremental learning.

Various operations like path tracking, counting, etc by using yolov5

a general-purpose Transformer based vision backbone

code for paper"A High-precision Semantic Segmentation Method Combining Adversarial Learning and Attention Mechanism"

dataset for ECCV 2020 "Motion Capture from Internet Videos"

Extreme Lightwegith Portrait Segmentation

Moment-DETR code and QVHighlights dataset

Memoized coduals - Shows that it is possible to implement reverse mode autodiff using a variation on the dual numbers called the codual numbers

Simple and Distributed Machine Learning

PyTorch implementation of Barlow Twins.

Ppq - A powerful offline neural network quantization tool with custimized IR

A PyTorch implementation of "ANEMONE: Graph Anomaly Detection with Multi-Scale Contrastive Learning", CIKM-21

This is the source code for the experiments related to the paper Unsupervised Audio Source Separation Using Differentiable Parametric Source Models

Food recognition model using convolutional neural network & computer vision

A little Python application to auto tag your photos with the power of machine learning.

🛰️ List of earth observation companies and job sites