Common Voice Dataset explorer

Last update: Nov 16, 2022

Overview

Common Voice Dataset Explorer

The Common Voice dataset is made by Mozilla.

Made during the Hugging Face fine-tuning week.

Usage

pip install -r requirements.txt

streamlit run common_voice.py

Details

Made using Streamlit.
Uses https://github.com/PablocFonseca/streamlit-aggrid for interactivity, because Streamlit plots can't be clicked yet.

I put this together as quickly as I could, so it is not perfect.

Open a PR or issue~
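The README does not say which statistics the explorer computes. As a hypothetical sketch of the kind of aggregation that could feed its tables, the snippet below counts clips per metadata column in a Common Voice-style TSV using only the Python standard library. The column names (`gender`, `age`, `up_votes`, …) and the `validated.tsv` file name are assumptions based on the per-split TSV files in Common Voice releases and may differ between versions; `summarize_clips` and `SAMPLE_TSV` are illustrative names, not part of `common_voice.py`.

```python
import csv
import io
from collections import Counter
from typing import Dict, TextIO

# Tiny hypothetical sample in the layout assumed for Common Voice TSV files
# (tab-separated, header row first). Real releases may add or rename columns.
SAMPLE_TSV = (
    "client_id\tpath\tsentence\tup_votes\tdown_votes\tage\tgender\n"
    "a1\tclip1.mp3\tHello there.\t2\t0\ttwenties\tmale\n"
    "b2\tclip2.mp3\tGood morning.\t3\t1\t\tfemale\n"
    "c3\tclip3.mp3\tSee you.\t2\t0\ttwenties\t\n"
)


def summarize_clips(tsv_file: TextIO, column: str) -> Dict[str, int]:
    """Count clips per value of `column`; empty or missing cells become 'unknown'.

    Hypothetical helper for illustration, not taken from common_voice.py.
    """
    # QUOTE_NONE: sentences can contain quote characters, so they must not
    # be interpreted as CSV quoting.
    reader = csv.DictReader(tsv_file, delimiter="\t", quoting=csv.QUOTE_NONE)
    counts: Counter = Counter()
    for row in reader:
        # Missing columns yield None from .get(); blank cells yield "".
        value = (row.get(column) or "").strip() or "unknown"
        counts[value] += 1
    return dict(counts)


if __name__ == "__main__":
    # With a real download (assumed file name), you would instead pass
    # open("validated.tsv", encoding="utf-8", newline="").
    for column in ("gender", "age"):
        print(column, summarize_clips(io.StringIO(SAMPLE_TSV), column))
```

A dictionary like this could then be handed to a table widget such as the streamlit-aggrid grid mentioned above; that wiring is left out here because it depends on the installed Streamlit and streamlit-aggrid versions.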

Owner

Ceyda Cinarel

AI researcher & engineer~ ♥ all things NLP 🤖 generative models ★ like trying out new libraries & tools ♥ Python

GSoC'2021 | TensorFlow implementation of Wav2Vec2

73 Nov 28, 2022

Speech to text streamlit app

Speech to text Streamlit-app! 👄 This speech to text recognition is powered by t

9 Jan 01, 2023

justCTF [*] 2020 challenges sources

justCTF [*] 2020 This repo contains sources for justCTF [*] 2020 challenges hosted by justCatTheFish. TLDR: Run a challenge with ./run.sh (requires Do

25 Dec 27, 2022

Learning the meanings behind words is a key element of NLP. This project concentrates on the disambiguation of preposition senses: we train a BERT-based transformer model and surpass the state of the art.

New State-of-the-Art in Preposition Sense Disambiguation Supervisor: Prof. Dr. Alexander Mehler Alexander Henlein Institutions: Goethe University TTLa

4 Apr 06, 2022

Perform sentiment analysis and keyword extraction on Craigslist listings

craiglist-helper synopsis Perform sentiment analysis and keyword extraction on Craigslist listings Background I love Craigslist. I've found most of my

1 Nov 08, 2021

Using NLP techniques such as named entity recognition, sentiment analysis, topic modeling, and text classification in Python to predict drug sentiment and ratings from user reviews.

This file contains the following documents submitted for Baruch CIS9665, group 9, fall 2021. 1. Dataset: drug_reviews.csv 2. python codes for text classi

2 Jan 04, 2023

Code for the paper "VisualBERT: A Simple and Performant Baseline for Vision and Language"

This repository contains code for the following two papers: VisualBERT: A Simple and Performant Baseline for Vision and Language (arxiv) with a short

464 Jan 04, 2023

ACL22 paper: Imputing Out-of-Vocabulary Embeddings with LOVE Makes Language Models Robust with Little Cost

Imputing Out-of-Vocabulary Embeddings with LOVE Makes Language Models Robust with Little Cost LOVE was accepted by the ACL22 main conference as a long pape

32 Jan 03, 2023

Indobenchmark is a collection of Natural Language Understanding (IndoNLU) and Natural Language Generation (IndoNLG) resources

Indobenchmark Toolkit Indobenchmark is a collection of Natural Language Understanding (IndoNLU) and Natural Language Generation (IndoNLG) resources fo

11 Aug 26, 2022

The following links briefly explain the idea of semantic search and how retrieve-and-rerank search mechanisms work

Main Idea The following links briefly explain the idea of semantic search and how retrieve-and-rerank search mechanisms work Semantic Search Re

2 Jan 28, 2022

Contains the code and data for our #ICSE2022 paper titled "CodeFill: Multi-token Code Completion by Jointly Learning from Structure and Naming Sequences"

CodeFill This repository contains the code for our paper titled "CodeFill: Multi-token Code Completion by Jointly Learning from Structure and Namin

11 Oct 31, 2022

A crowdsourced dataset of dialogues grounded in social contexts involving utilization of commonsense.

CodeBERT: A Pre-Trained Model for Programming and Natural Languages.

MRC approach for Aspect-based Sentiment Analysis (ABSA)

Anuvada: Interpretable Models for NLP using PyTorch

Synonyms and antonyms of Russian words

Mycroft Core, the Mycroft Artificial Intelligence platform.

PyABSA - Open & Efficient Framework for Aspect-based Sentiment Analysis

🧪 Cutting-edge experimental spaCy components and features

Classical Chinese poetry generation based on pytorch_rnn