Simplified diarization pipeline using some pretrained models - audio file to diarized segments in a few lines of code

Last update: Dec 30, 2022

Overview

simple_diarizer

Simplified diarization pipeline using some pretrained models.

Made to be as simple as possible to go from an input audio file to diarized segments.

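import soundfile as sf
import matplotlib.pyplot as plt

from simple_diarizer.diarizer import Diarizer
from simple_diarizer.utils import combined_waveplot

# Placeholder inputs (assumptions for illustration; replace with your own values)
WAV_FILE = "audio.wav"   # path to the input audio file
NUM_SPEAKERS = 2         # number of speakers expected in the recording

diar = Diarizer(
    embed_model='xvec',  # 'xvec' and 'ecapa' supported
    cluster_method='sc'  # 'ahc' and 'sc' supported
)

# Run diarization and get the list of speaker segments
segments = diar.diarize(WAV_FILE, num_speakers=NUM_SPEAKERS)

# Plot the waveform with the diarized segments overlaid
signal, fs = sf.read(WAV_FILE)
combined_waveplot(signal, fs, segments)
plt.show()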
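Once segments are available, they can be summarized without any plotting. The sketch below assumes each segment is a dict with 'start', 'end' (in seconds) and 'label' keys; that layout is an assumption about the output of diar.diarize, and the example data is made up for illustration.

```python
from collections import defaultdict


def talk_time_per_speaker(segments):
    """Sum segment durations (in seconds) for each speaker label."""
    totals = defaultdict(float)
    for seg in segments:
        totals[seg["label"]] += seg["end"] - seg["start"]
    return dict(totals)


# Hypothetical segments, shaped like the assumed output of diar.diarize()
example_segments = [
    {"start": 0.0, "end": 2.5, "label": 0},
    {"start": 2.5, "end": 4.0, "label": 1},
    {"start": 4.0, "end": 7.0, "label": 0},
]

print(talk_time_per_speaker(example_segments))
```

If the real segments use different key names, only the three lookups inside the loop need to change.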

Source Video

"Some Quick Advice from Barack Obama!"

Pre-trained Models

The following pretrained models are used:

Voice Activity Detection (VAD)
- Silero VAD
Deep speaker embedding extraction
- SpeechBrain
  - X-Vector
  - ECAPA-TDNN
(Optional/Experimental) Speech-to-text
- ESPnet Model Zoo
  - English ASR model

Demo

The demo at the link above tries to diarize any input YouTube URL. It also uses YouTube's autogenerated transcriptions to produce a speaker-labelled transcription.

Hopefully this can be of use as a free basic tool to produce a diarized transcript of a video/audio of interest.

Other References

Spectral clustering methods lifted from https://github.com/wq2012/SpectralCluster

Planned Features

Comments

WIP - Make an installable package

Description:

Include requirements.txt.
Add setup.* files to build a package.
Create a folder simple_diarizer to store source code.
Create GitHub Workflow to publish the package.

How to test:

Run the command pip install .
Outside the project folder, start python and run from simple_diarizer import diarizer

Notes:

Cannot use Python 3.10.x yet

Source code to test:

from simple_diarizer.utils import (convert_wavfile, download_youtube_wav)

from simple_diarizer.diarizer import Diarizer
import tempfile

YOUTUBE_ID = "HyKmkLEtQbs"

with tempfile.TemporaryDirectory() as outdir:
    yt_file = download_youtube_wav(YOUTUBE_ID, outdir)

    wav_file = convert_wavfile(yt_file, f"{outdir}/{YOUTUBE_ID}_converted.wav")

    print(f"wav file: {wav_file}")

    diar = Diarizer(
        embed_model='ecapa', # supported types: ['xvec', 'ecapa']
        cluster_method='sc', # supported types: ['ahc', 'sc']
        window=1.5, # size of window to extract embeddings (in seconds)
        period=0.75 # hop of window (in seconds)
    )

    NUM_SPEAKERS = 2

    segments = diar.diarize(wav_file, 
                            num_speakers=NUM_SPEAKERS,
                            outfile=f"{outdir}/{YOUTUBE_ID}.rttm")

    print(segments)
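The outfile argument above writes an .rttm file. As a rough sketch of how such a file could be read back, the parser below assumes the common RTTM layout (SPEAKER, file ID, channel, onset, duration, two placeholder fields, speaker name, ...). Whether simple_diarizer writes exactly these fields is an assumption, and the sample text is invented.

```python
def parse_rttm(text):
    """Parse RTTM text into a list of segment dicts (times in seconds)."""
    segments = []
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and anything that is not a SPEAKER record
        if len(fields) < 8 or fields[0] != "SPEAKER":
            continue
        onset = float(fields[3])
        duration = float(fields[4])
        segments.append(
            {
                "file": fields[1],
                "start": onset,
                "end": onset + duration,
                "speaker": fields[7],
            }
        )
    return segments


# Hypothetical RTTM content for illustration
sample_rttm = """SPEAKER demo 1 0.00 2.50 <NA> <NA> spk0 <NA> <NA>
SPEAKER demo 1 2.50 1.50 <NA> <NA> spk1 <NA> <NA>
"""

for seg in parse_rttm(sample_rttm):
    print(seg)
```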

opened by johnidm 16

"[Errno 30] Read-only file system: 'pretrained_models'"

I am using macOS and I am getting the error "[Errno 30] Read-only file system: 'pretrained_models'". From what I can tell, the pretrained models are fetched if you do not already have them.

However, the save location is the root directory, which is read-only. I believe the target directory is "./pretrained_model_checkpoints".

Is there another location that can be used?

PythonKit/Python.swift:706: Fatal error: 'try!' expression unexpectedly raised an error: Python exception: [Errno 30] Read-only file system: 'pretrained_models'
Traceback:
  File "/Users/wedwards/Documents/Development/A_PythonKit_Test/A_PythonKit_Test/Simple Diarizer.py", line 42, in
    diar = Diarizer(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/simple_diarizer/diarizer.py", line 48, in __init__
    self.embed_model = EncoderClassifier.from_hparams(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/speechbrain/pretrained/interfaces.py", line 342, in from_hparams
    hparams_local_path = fetch(
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/speechbrain/pretrained/fetching.py", line 86, in fetch
    savedir.mkdir(parents=True, exist_ok=True)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/pathlib.py", line 1179, in mkdir
    self.parent.mkdir(parents=True, exist_ok=True)
  File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/pathlib.py", line 1175, in mkdir
    self._accessor.mkdir(self, mode)
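Since 'pretrained_models' is a relative path, it is resolved against the current working directory, which appears to be the read-only root here. One possible workaround, sketched below and not a fix confirmed by the maintainer, is to switch to a writable directory before constructing the Diarizer. The cache_dir location and the working_directory helper are hypothetical names chosen for illustration.

```python
import contextlib
import os
import tempfile
from pathlib import Path


@contextlib.contextmanager
def working_directory(path):
    """Temporarily switch the current working directory to `path`."""
    path = Path(path)
    path.mkdir(parents=True, exist_ok=True)
    previous = Path.cwd()
    os.chdir(path)
    try:
        yield path
    finally:
        # Always restore the original directory, even on errors
        os.chdir(previous)


# Hypothetical writable location; any user-writable directory would do
cache_dir = Path(tempfile.gettempdir()) / "simple_diarizer_cache"

with working_directory(cache_dir):
    # The Diarizer would be constructed here, for example:
    #   diar = Diarizer(embed_model='xvec', cluster_method='sc')
    # so that the relative 'pretrained_models' folder lands under cache_dir.
    inside = Path.cwd()

print(f"models would be saved under: {inside}")
```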

opened by MrEdwards007 5
Latest Python and packages

The current release prevents use of Python 3.10 and requires specific versions of Beautiful Soup and PyTube.

I've forked the repo to overcome these version limitations and it's working for me. I haven't made a pull request, however, as your repo doesn't have tests and I don't know whether there is a use case which would be broken by my changes.

Can you please remove these version limitations if they're not needed?

Thanks for the repo - it's effective and much easier to use than SpeechBrain.

opened by andrewmackie 3
takes 1 positional argument but 2 were given

When running a demo on Google Colab, I get the following error. Any idea how to resolve this?

File "/root/anaconda3/envs/simple/lib/python3.8/site-packages/speechbrain/pretrained/fetching.py", line 116, in fetch
    fetched_file = huggingface_hub.cached_download(url, use_auth_token)
TypeError: cached_download() takes 1 positional argument but 2 were given
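Errors like this usually point to a mismatch between the installed speechbrain and huggingface_hub versions, so a first diagnostic step is to see what is actually installed. The snippet below is a small sketch using only the standard library (Python 3.8+); the package names passed in are the distribution names as assumed here.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package: version string, or None if it is not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


report = installed_versions(["speechbrain", "huggingface-hub", "simple-diarizer"])
for name, version in report.items():
    print(f"{name}: {version or 'not installed'}")
```

Including this output in a bug report makes it easier to tell whether upgrading speechbrain is likely to help.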

opened by SanaullahOfficial 2

AttributeError when running Diarizer in simple_diarizer.diarizer

Hi there!

When running the following code in Python 3.7 on a fresh conda environment in Ubuntu 22.04

from simple_diarizer.diarizer import Diarizer

diar = Diarizer(
    embed_model='xvec',  # 'xvec' and 'ecapa' supported
    cluster_method='sc'  # 'ahc' and 'sc' supported
)

I get the following error:

<ipython-input-3-286690ce0195> in <module>
      1 diar = Diarizer(
      2                     embed_model='xvec', # 'xvec' and 'ecapa' suported
----> 3                     cluster_method='sc' # 'ahc' and 'sc' supported
      4                 )

~/anaconda3/envs/test/lib/python3.7/site-packages/simple_diarizer/diarizer.py in __init__(self, embed_model, cluster_method, window, period)
     44             self.embed_model = EncoderClassifier.from_hparams(source="speechbrain/spkrec-xvect-voxceleb",
     45                                                               savedir="pretrained_models/spkrec-xvect-voxceleb",
---> 46                                                               run_opts=self.run_opts)
     47         if embed_model == 'ecapa':
     48             self.embed_model = EncoderClassifier.from_hparams(source="speechbrain/spkrec-ecapa-voxceleb",

~/anaconda3/envs/test/lib/python3.7/site-packages/speechbrain/pretrained/interfaces.py in from_hparams(cls, source, hparams_file, pymodule_file, overrides, savedir, use_auth_token, **kwargs)
    349         # Load the modules:
    350         with open(hparams_local_path) as fin:
--> 351             hparams = load_hyperpyyaml(fin, overrides)
    352 
    353         # Pretraining:

~/anaconda3/envs/test/lib/python3.7/site-packages/hyperpyyaml/core.py in load_hyperpyyaml(yaml_stream, overrides, overrides_must_match)
    187 
    188     # Remove items that start with "__"
--> 189     removal_keys = [k for k in hparams.keys() if k.startswith("__")]
    190     for key in removal_keys:
    191         del hparams[key]

AttributeError: 'str' object has no attribute 'keys'

opened by masonhargrave 2

Make project installable
Hi @cvqluu, this project is amazing, thanks for sharing.

I have some experience in packaging Python projects.

What do you think about me taking on these items from your to-do list?

Add to PyPI (make pip installable)

requirements.txt

If you authorize me, I will start on this now and submit pull requests for your review and approval.
opened by johnidm 1
Added ipython dependency
Tested on local machine using:

pip install --user git+https://github.com/cvqluu/[email protected]

Fix for https://github.com/cvqluu/simple_diarizer/issues/12
opened by cvqluu 0
Bump ipython from 7.30.1 to 7.31.1
Bumps ipython from 7.30.1 to 7.31.1.

Commits

e321e76 release 7.31.1

67ca2b3 Merge pull request from GHSA-pq7m-3gw7-gq5x

2794330 back to dev

be343e7 release 7.31.0

0fcf2c4 Merge pull request #13428 from meeseeksmachine/auto-backport-of-pr-13427-on-7.x

b8db9b1 Backport PR #13427: wn 731

7f253dc Merge pull request #13412 from bnavigator/backport-inspect

4f26796 fix xxlimited_35 import name

77ca4a6 don't run nose-based iptest on py310, only pytest

533e509 back to decorator skip


dependencies
opened by dependabot[bot] 0

Undeclared IPython dependency

The current package (0.0.12 on PyPI) cannot run without IPython, but IPython is missing from requirements.txt.

Steps to reproduce (outside of a Jupyter notebook):

pip install simple-diarizer

# index.py
from simple_diarizer.diarizer import Diarizer

Output:

File "[redacted]\index.py", line 1, in <module>
    from simple_diarizer.diarizer import Diarizer
File "[redacted]\lib\site-packages\simple_diarizer\diarizer.py", line 13, in <module>
    from .utils import check_wav_16khz_mono, convert_wavfile
File "[redacted]\lib\site-packages\simple_diarizer\utils.py", line 8, in <module>
    from IPython.display import Audio, display
ModuleNotFoundError: No module named 'IPython'
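An alternative to declaring the dependency would be to treat IPython as optional at import time. The following is only a sketch of that pattern, not the project's actual code: the HAVE_IPYTHON flag and the play_audio helper are hypothetical names.

```python
# Guarded import: the module still loads when IPython is absent
try:
    from IPython.display import Audio, display
    HAVE_IPYTHON = True
except ImportError:
    Audio = None
    display = None
    HAVE_IPYTHON = False


def play_audio(samples, rate):
    """Show an audio widget if IPython is available.

    Returns True when a widget was displayed, False otherwise.
    """
    if not HAVE_IPYTHON:
        print("IPython is not installed; skipping audio playback.")
        return False
    display(Audio(samples, rate=rate))
    return True
```

With this approach, the core diarization import works in a plain script, and only the notebook-specific helpers degrade.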

opened by DavidRalph 1

waveplot_perspeaker causes argument out of range error

While running through your code example, testing the workflow on a different audio file produced the following output:

C:\Users\xxx\Miniconda3\envs\simple_diarizer_env\lib\site-packages\IPython\lib\display.py:187: RuntimeWarning: invalid value encountered in divide
  scaled = data / normalization_factor * 32767
---------------------------------------------------------------------------
error                                     Traceback (most recent call last)
Cell In [18], line 1
----> 1 waveplot_perspeaker(signal, fs, segments)

File ~\Miniconda3\envs\simple_diarizer_env\lib\site-packages\simple_diarizer\utils.py:166, in waveplot_perspeaker(signal, fs, segments)
    164 if "words" in seg:
    165     pprint(seg["words"])
--> 166 display(Audio(speech, rate=fs))
    167 print("=" * 40 + "\n")

File ~\Miniconda3\envs\simple_diarizer_env\lib\site-packages\IPython\lib\display.py:130, in Audio.__init__(self, data, filename, url, embed, rate, autoplay, normalize, element_id)
    128 if rate is None:
    129     raise ValueError("rate must be specified when data is a numpy array or list of audio samples.")
--> 130 self.data = Audio._make_wav(data, rate, normalize)

File ~\Miniconda3\envs\simple_diarizer_env\lib\site-packages\IPython\lib\display.py:162, in Audio._make_wav(data, rate, normalize)
    160 waveobj.setsampwidth(2)
    161 waveobj.setcomptype('NONE','NONE')
--> 162 waveobj.writeframes(scaled)
    163 val = fp.getvalue()
    164 waveobj.close()

File ~\Miniconda3\envs\simple_diarizer_env\lib\wave.py:437, in Wave_write.writeframes(self, data)
    436 def writeframes(self, data):
--> 437     self.writeframesraw(data)
    438     if self._datalength != self._datawritten:
    439         self._patchheader()

File ~\Miniconda3\envs\simple_diarizer_env\lib\wave.py:426, in Wave_write.writeframesraw(self, data)
    424 if not isinstance(data, (bytes, bytearray)):
    425     data = memoryview(data).cast('B')
--> 426 self._ensure_header_written(len(data))
    427 nframes = len(data) // (self._sampwidth * self._nchannels)
    428 if self._convert:

File ~\Miniconda3\envs\simple_diarizer_env\lib\wave.py:467, in Wave_write._ensure_header_written(self, datasize)
    465 if not self._framerate:
    466     raise Error('sampling rate not specified')
--> 467 self._write_header(datasize)

File ~\Miniconda3\envs\simple_diarizer_env\lib\wave.py:479, in Wave_write._write_header(self, initlength)
    477 except (AttributeError, OSError):
    478     self._form_length_pos = None
--> 479 self._file.write(struct.pack('<L4s4sLHHLLHH4s',
    480     36 + self._datalength, b'WAVE', b'fmt ', 16,
    481     WAVE_FORMAT_PCM, self._nchannels, self._framerate,
    482     self._nchannels * self._framerate * self._sampwidth,
    483     self._nchannels * self._sampwidth,
    484     self._sampwidth * 8, b'data'))
    485 if self._form_length_pos is not None:
    486     self._data_length_pos = self._file.tell()

error: argument out of range

Any ideas what the issue could be? It works fine on other audio files, and everything up to this point seems to run without error.
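The RuntimeWarning at the top of the output hints that the normalization factor was zero or not a number, which could happen if a segment is empty or completely silent. That is a guess from the log, not a confirmed diagnosis. Under that assumption, a check like the hypothetical is_playable helper below could be run on each mono segment before handing it to Audio.

```python
import math


def is_playable(samples):
    """Return True if a mono segment has finite, non-silent samples."""
    if len(samples) == 0:
        return False  # nothing to play
    if not all(math.isfinite(s) for s in samples):
        return False  # NaN or inf would break normalization
    # An all-zero segment has peak 0, so dividing by the peak fails
    return max(abs(s) for s in samples) > 0


# Made-up segments for illustration
print(is_playable([0.0, 0.5, -0.2]))
print(is_playable([0.0, 0.0, 0.0]))
print(is_playable([]))
```

Segments that fail the check can simply be skipped, which also makes it easy to log which part of the audio triggers the problem.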

opened by dcruiz01 1

Releases (latest: v0.0.13)

v0.0.13 (Dec 12, 2022)

v0.0.12 (Dec 8, 2022)
Setting extra_info to True will now return an additional dict containing cluster labels.

v0.0.11 (Nov 9, 2022)
Removed YouTube-related dependencies, keeping the repository slim. There are no longer YouTube helper functions, but the core functionality should now work for Python >=3.7.

v0.0.10 (Aug 30, 2022)
Allowed for a newer version of speechbrain, which should have fixed the issues with pulling from huggingface_hub.

v0.0.9 (Jan 10, 2022)
v0.0.8 (Jan 10, 2022)
v0.0.7 (Jan 10, 2022)
v0.0.6 (Jan 10, 2022)
v0.0.5 (Jan 10, 2022)
v0.0.4 (Jan 10, 2022)
v0.0.3 (Jan 10, 2022)
v0.0.2 (Jan 10, 2022)
v0.0.1 (Jan 10, 2022)

Owner

Chau

PhD student at the University of Edinburgh, CSTR
