This repo contains simple-to-use pretrained or training-free models for speaker diarization.

Last update: Jan 20, 2022

PyDiar

Supported Models

Binary Key Speaker Modeling

Based on pyBK by Jose Patino, which implements the diarization system from "The EURECOM submission to the first DIHARD Challenge" by Jose Patino, Héctor Delgado, and Nicholas Evans.

If you have any other models you would like to see added, please open an issue.

Usage

This library seeks to provide a very basic interface. To use the Binary Key model on a file, do something like this:

import numpy as np
from pydiar.models import BinaryKeyDiarizationModel, Segment
from pydiar.util.misc import optimize_segments
from pydub import AudioSegment

INPUT_FILE = "test.wav"

# Load the audio and convert it to mono at the sample rate passed to the model
sample_rate = 32000
audio = AudioSegment.from_wav(INPUT_FILE)
audio = audio.set_frame_rate(sample_rate)
audio = audio.set_channels(1)

# Run diarization on the raw samples, then post-process the resulting segments
diarization_model = BinaryKeyDiarizationModel()
segments = diarization_model.diarize(
    sample_rate, np.array(audio.get_array_of_samples())
)
optimized_segments = optimize_segments(segments)

Now optimized_segments contains a list of segments, each with its start, length, and speaker ID.
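As a minimal sketch of how such a list might be consumed, the following totals the speaking time per speaker. It uses a stand-in `Seg` dataclass with `start`, `length`, and `speaker_id` fields. Those attribute names are an assumption based on the description above, so check pydiar's own `Segment` class before relying on them. The sample values are made up for illustration.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Seg:
    """Hypothetical stand-in for pydiar's Segment (field names are assumed)."""
    start: float      # segment start, in seconds
    length: float     # segment duration, in seconds
    speaker_id: int   # label assigned by the diarization model


def talk_time_per_speaker(segments):
    """Sum the segment lengths for each speaker id."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg.speaker_id] += seg.length
    return dict(totals)


# Made-up segments, for illustration only
example = [Seg(0.0, 2.5, 0), Seg(2.5, 1.0, 1), Seg(3.5, 4.0, 0)]
print(talk_time_per_speaker(example))
```

If the real segments expose the same three attributes, `optimized_segments` could be passed to `talk_time_per_speaker` directly.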

Example

A simple script that reads an audio file, diarizes it, and transcribes it into the WebVTT format can be found in examples/generate_webvtt.py. To use it, download a Vosk model from https://alphacephei.com/vosk/models and then run the script:

poetry install
poetry run python -m examples.generate_webvtt -i PATH/TO/INPUT.wav -m PATH/TO/VOSK_MODEL
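To give a rough idea of what WebVTT output with speaker labels looks like, here is a small sketch that formats cues by hand. It is not the code from examples/generate_webvtt.py. The `(start, length, speaker_id, text)` tuple layout and both function names are hypothetical, and the text would in practice come from a speech recognizer such as Vosk.

```python
def vtt_timestamp(seconds):
    """Format a time in seconds as a WebVTT timestamp (HH:MM:SS.mmm)."""
    total_ms = int(round(seconds * 1000))
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}.{ms:03d}"


def to_webvtt(cues):
    """Build a WebVTT document from (start, length, speaker_id, text) tuples."""
    lines = ["WEBVTT", ""]  # mandatory header followed by a blank line
    for start, length, speaker_id, text in cues:
        lines.append(f"{vtt_timestamp(start)} --> {vtt_timestamp(start + length)}")
        # A WebVTT voice tag marks who is speaking in this cue
        lines.append(f"<v Speaker {speaker_id}>{text}")
        lines.append("")  # a blank line ends the cue
    return "\n".join(lines)


# Made-up cues, for illustration only
print(to_webvtt([(0.0, 2.5, 0, "Hello there."), (2.5, 1.0, 1, "Hi!")]))
```

Each cue consists of a timing line and a payload line. The voice tag `<v ...>` is the standard WebVTT way to attach a speaker name to the text.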

