OneShot Learning-based hotword detection.

Last update: Dec 25, 2022

Overview

EfficientWord-Net

Hotword detection based on one-shot learning

Home assistants need special phrases called hotwords to activate (e.g. "OK Google").

EfficientWord-Net is a hotword detection engine based on one-shot learning, inspired by FaceNet's Siamese network architecture. It works much like face recognition: it needs only a few samples of your own custom hotword to get going. No extra training or huge datasets are required. This lets developers add custom hotwords to their programs without effort or extra charges. Just like Google Assistant's hotword detector, the engine performs best when 3-4 hotword samples are collected directly from the user. This repository is the official implementation of EfficientWord-Net as a Python library, from the authors.
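To illustrate the one-shot idea described above, here is a minimal sketch of comparing a query embedding against a handful of reference embeddings. The use of cosine similarity, the function names, and the toy 3-dimensional vectors are all assumptions made for illustration; the library's actual model and scoring are not shown here.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def best_reference_score(query, references):
    """Highest similarity between the query and any reference embedding."""
    return max(cosine_similarity(query, ref) for ref in references)

# Toy "embeddings" (hypothetical values); real ones would come from the network.
references = [[1.0, 0.0, 0.0], [0.9, 0.1, 0.0]]
same_word = [1.0, 0.05, 0.0]
other_word = [0.0, 1.0, 0.0]

score_same = best_reference_score(same_word, references)
score_other = best_reference_score(other_word, references)
```

In this sketch an utterance of the same word scores close to 1 against the references, while an unrelated one scores much lower, so a few reference samples are enough to decide without retraining.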

The library is written purely in Python and uses Google's TFLite implementation for faster real-time inference.

Demo of EfficientWord-Net in Pi

EfficientWord-Net.mp4

Access preprint

The research paper is currently under review at IEEE. A preprint is available, and the training code will be made public once the paper is published.

Python Version Requirements

This library works with Python versions 3.6 to 3.9.

Dependencies Installation

Before running the pip installation command for the library, a few dependencies need to be installed manually.

PyAudio (depends on PortAudio)
TFLite (TensorFlow Lite binaries)
Librosa (binaries might not be available for certain systems)

Mac OS M* and Raspberry Pi users might have to compile these dependencies.

The tflite package cannot be listed in requirements.txt, so it is installed automatically when the package is initialized on the system.

The librosa package is not required for inference-only use; it is installed automatically when generate_reference is called.

Package Installation

Run the following pip command

pip install EfficientWord-Net

Then import the library with:

import eff_word_net

Demo

After installing the packages, you can run the demo script built into the library (ensure you have a working mic).

Access the documentation at: https://ant-brain.github.io/EfficientWord-Net/

Command to run demo

python -m eff_word_net.engine

Generating Custom Wakewords

For any new hotword, the library needs information about that hotword. It reads this from a file called {wakeword}_ref.json. For example, for the wakeword 'alexa', the library needs a file called alexa_ref.json.

These files can be generated with the following procedure:

Collect 4 to 10 distinct-sounding pronunciations of the wakeword and put them in a separate folder that contains nothing else.

Finally, run this command. It asks for the input folder's location (containing the audio files) and the output folder (where the _ref.json file will be stored).

python -m eff_word_net.generate_reference
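Before running the command, it may help to confirm that the sample folder matches the description above. The following is a hypothetical helper, not part of the library; the accepted file extensions are an assumption.

```python
import os

# Assumed audio extensions; adjust to whatever formats you actually record.
AUDIO_EXTENSIONS = {".wav", ".mp3"}

def check_sample_folder(folder, min_count=4, max_count=10):
    """Return (ok, audio_files, other_files) for a wakeword sample folder."""
    names = sorted(os.listdir(folder))
    audio = [n for n in names
             if os.path.splitext(n)[1].lower() in AUDIO_EXTENSIONS]
    other = [n for n in names if n not in audio]
    # The folder should hold 4 to 10 samples and nothing else.
    ok = min_count <= len(audio) <= max_count and not other
    return ok, audio, other
```

Calling check_sample_folder("path/to/samples") returns ok as False when there are too few or too many samples, or when the folder contains unrelated files.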

The path of the generated reference file needs to be passed to the HotwordDetector instance.

HotwordDetector(
        hotword="hello",
        reference_file="/full/path/name/of/hello_ref.json",
        activation_count=3  # 2 by default
)

For a few wakewords such as Mycroft, Google, Firefox, Alexa, Mobile, and Siri, the library ships predefined embeddings in its installation directory. The path is available in the following variable:

from eff_word_net import samples_loc

Try your first single hotword detection script

import os
from eff_word_net.streams import SimpleMicStream
from eff_word_net.engine import HotwordDetector
from eff_word_net import samples_loc

mycroft_hw = HotwordDetector(
        hotword="Mycroft",
        reference_file = os.path.join(samples_loc,"mycroft_ref.json"),
        activation_count=3
    )

mic_stream = SimpleMicStream()
mic_stream.start_stream()

print("Say Mycroft ")
while True :
    frame = mic_stream.getFrame()
    result = mycroft_hw.checkFrame(frame)
    if(result):
        print("Wakeword uttered")

Detecting Multiple Hotwords from audio streams

The library provides a computation-friendly way to detect multiple hotwords from a given stream, instead of running checkFrame() for each wakeword individually.

import os
from eff_word_net.streams import SimpleMicStream
from eff_word_net.engine import HotwordDetector, MultiHotwordDetector
from eff_word_net import samples_loc
print(samples_loc)

alexa_hw = HotwordDetector(
        hotword="Alexa",
        reference_file = os.path.join(samples_loc,"alexa_ref.json"),
    )

siri_hw = HotwordDetector(
        hotword="Siri",
        reference_file = os.path.join(samples_loc,"siri_ref.json"),
    )

mycroft_hw = HotwordDetector(
        hotword="mycroft",
        reference_file = os.path.join(samples_loc,"mycroft_ref.json"),
        activation_count=3
    )

multi_hw_engine = MultiHotwordDetector(
        detector_collection = [
            alexa_hw,
            siri_hw,
            mycroft_hw,
        ],
    )

mic_stream = SimpleMicStream()
mic_stream.start_stream()

print("Say Mycroft / Alexa / Siri")

while True :
    frame = mic_stream.getFrame()
    result = multi_hw_engine.findBestMatch(frame)
    if(None not in result):
        print(result[0],f",Confidence {result[1]:0.4f}")
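The loop above treats the result as a pair that contains None when nothing matched. As a rough illustration of that pattern, here is a sketch of choosing the best match from per-hotword scores. The function name, the dictionary inputs, and the thresholding rule are assumptions for illustration and do not describe the internals of MultiHotwordDetector.

```python
def pick_best_match(scores, thresholds):
    """Return (name, score) of the highest score at or above its threshold,
    or (None, None) when no hotword qualifies."""
    best_name, best_score = None, None
    for name, score in scores.items():
        if score < thresholds[name]:
            continue  # below this hotword's threshold, ignore it
        if best_score is None or score > best_score:
            best_name, best_score = name, score
    return best_name, best_score

# Hypothetical scores for a single frame.
thresholds = {"alexa": 0.995, "siri": 0.995, "mycroft": 0.995}
match = pick_best_match(
    {"alexa": 0.97, "siri": 0.999, "mycroft": 0.996}, thresholds
)
no_match = pick_best_match(
    {"alexa": 0.90, "siri": 0.91, "mycroft": 0.92}, thresholds
)
```

Returning (None, None) for "no match" is what makes a check like `None not in result` work in the calling loop.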

Access the library documentation here: https://ant-brain.github.io/EfficientWord-Net/

About `activation_count` in `HotwordDetector`

Documentation explaining the activation_count parameter of HotwordDetector in detail is in the making. For now, note that 3 is advisable for long hotwords and 2 for shorter ones. If the detector gives multiple triggers for a single utterance, try increasing activation_count. When experimenting, begin with smaller values. The default value is 2.
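Until that documentation exists, one plausible reading of activation_count is "the number of consecutive frames that must score above the threshold before a trigger fires". The sketch below implements that reading only; the class name and logic are assumptions and may differ from what HotwordDetector actually does. The defaults mirror the threshold of 0.995 and activation_count of 2 seen in the engine's constructor.

```python
class ActivationCounter:
    """Fires once when `activation_count` consecutive scores reach the threshold.

    Hypothetical illustration, not the library's implementation.
    """

    def __init__(self, threshold=0.995, activation_count=2):
        self.threshold = threshold
        self.activation_count = activation_count
        self._streak = 0  # consecutive frames at or above the threshold

    def update(self, score):
        if score >= self.threshold:
            self._streak += 1
        else:
            self._streak = 0
        # Trigger only at the moment the streak reaches the required length.
        return self._streak == self.activation_count

counter = ActivationCounter(threshold=0.995, activation_count=3)
frame_scores = [0.999, 0.999, 0.90, 0.999, 0.999, 0.999, 0.999]
triggers = [counter.update(s) for s in frame_scores]
```

Under this reading, a larger activation_count demands a longer run of matching frames, which suits longer hotwords.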

FAQ:

Hotword performance is bad: if you run into an issue like this, feel free to ask about it in discussions.

CONTRIBUTION:

If you have ideas to make the project better, feel free to ping us in discussions.
The current logmelcalc.tflite graph can convert only 1 audio frame to a log mel spectrogram at a time. It would be a great help if TensorFlow gurus out there could help us with this.

TODO:

Add an audio file handler in streams. PRs are welcome.
Remove the librosa requirement to encourage generating reference files directly on edge devices.
Add more detailed documentation explaining the sliding window concept.
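As a stopgap for the last item, here is a small sketch of what a sliding window over an audio buffer can look like. The 16000 Hz rate and the 1/8 second hop match values that appear in stream code for this library; the 1-second window length and the helper name are assumptions for illustration.

```python
def sliding_windows(samples, window_size, hop_size):
    """Yield overlapping windows of `window_size` samples,
    advancing by `hop_size` samples each step."""
    for start in range(0, len(samples) - window_size + 1, hop_size):
        yield samples[start:start + window_size]

RATE = 16000          # samples per second
WINDOW = RATE         # assumed window length: 1 second
HOP = int(RATE / 8)   # slide by 1/8 second (2000 samples)

audio = list(range(2 * RATE))  # 2 seconds of dummy samples
windows = list(sliding_windows(audio, WINDOW, HOP))
```

Each new window reuses most of the previous one, so a hotword that straddles a window boundary still appears whole in one of the neighbouring windows.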

SUPPORT US:

Our hotword detector's performance is notably lower than Porcupine's. We have thought about better NN architectures for the engine and hope to outperform Porcupine. This has been our undergrad project, so your support and encouragement will motivate us to keep developing the engine. If you loved this project, recommend it to your peers, give us a 🌟 on GitHub and a clap 👏 on Medium.

LICENSE: Apache License 2.0

Comments

Threshold value in engine.py not working?

hello,

first of all, thank you for this great library!

I managed to make it work on my M1 MacBook Air and am trying out my personal hotword detection, but the threshold value does not seem to work in my environment.

In engine.py:

    def __init__(
            self,
            hotword:str,
            reference_file:str,
            threshold:float=0.995,
            activation_count=2,
            continuous=True,
            verbose = False):

And this is my script:

import os
from eff_word_net.streams import SimpleMicStream
from eff_word_net.engine import HotwordDetector
from eff_word_net import samples_loc

hotword_hw = HotwordDetector(
        hotword="hotword",
        reference_file = "hotword_ref.json",
        activation_count=3
    )

mic_stream = SimpleMicStream()
mic_stream.start_stream()

print("Say hotword ")
while True :
    frame = mic_stream.getFrame()
    result = hotword_hw.checkFrame(frame)
    if(result):
        print("Wakeword uttered")
        print(hotword_hw.getMatchScoreFrame(frame))

When I run this script, checkFrame returns True even when getMatchScoreFrame returns a value under the threshold, like:

Wakeword uttered
0.9371609374494279
Wakeword uttered
0.9164050520717595
Wakeword uttered
0.9082509350226378
...

Could you please take a look at this?

Thank you!

opened by dominickchen 10

Hotword detection triggers the moment any sound is being played, even with the default models

So I've been trying to make a custom hotword. But after seeing it trigger all the time, the moment any kind of sound is being recorded, I decided to use a default one, like "brightness", "mobile", "google", etc.

They all trigger immediately. I am using the default values for the HotwordDetector, by the way. Any clue why? It seemed to work great in your video presentation.

Not using a cheap ass microphone by the way.

opened by TrackLab 9
circuit diagram

Good evening! Can you send me the circuit diagram for connecting the Raspberry Pi to the breadboard and lighting the LED? Your experiment is so interesting that I want to reproduce it.
documentation

opened by preachwebsite 5
Invalid sample rate

ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.bcm2835_headpho.pcm.front.0:CARD=0' ALSA lib conf.c:4743:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory ALSA lib conf.c:5231:(snd_config_expand) Evaluate error: No such file or directory ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM front ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.bcm2835_headpho.pcm.surround51.0:CARD=0' ALSA lib conf.c:4743:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory ALSA lib conf.c:5231:(snd_config_expand) Evaluate error: No such file or directory ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM surround21 ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.bcm2835_headpho.pcm.surround51.0:CARD=0' ALSA lib conf.c:4743:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory ALSA lib conf.c:5231:(snd_config_expand) Evaluate error: No such file or directory ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM surround21 ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.bcm2835_headpho.pcm.surround40.0:CARD=0' ALSA lib conf.c:4743:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory ALSA lib conf.c:5231:(snd_config_expand) Evaluate error: No such file or directory ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM surround40 ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.bcm2835_headpho.pcm.surround51.0:CARD=0' ALSA lib conf.c:4743:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory ALSA lib conf.c:5231:(snd_config_expand) Evaluate error: No 
such file or directory ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM surround41 ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.bcm2835_headpho.pcm.surround51.0:CARD=0' ALSA lib conf.c:4743:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory ALSA lib conf.c:5231:(snd_config_expand) Evaluate error: No such file or directory ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM surround50 ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.bcm2835_headpho.pcm.surround51.0:CARD=0' ALSA lib conf.c:4743:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory ALSA lib conf.c:5231:(snd_config_expand) Evaluate error: No such file or directory ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM surround51 ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.bcm2835_headpho.pcm.surround71.0:CARD=0' ALSA lib conf.c:4743:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory ALSA lib conf.c:5231:(snd_config_expand) Evaluate error: No such file or directory ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM surround71 ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.bcm2835_headpho.pcm.iec958.0:CARD=0,AES0=4,AES1=130,AES2=0,AES3=2' ALSA lib conf.c:4743:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory ALSA lib conf.c:5231:(snd_config_expand) Evaluate error: No such file or directory ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM iec958 ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.bcm2835_headpho.pcm.iec958.0:CARD=0,AES0=4,AES1=130,AES2=0,AES3=2' ALSA lib conf.c:4743:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory ALSA lib conf.c:5231:(snd_config_expand) Evaluate error: No such file or directory ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) 
Unknown PCM spdif ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.bcm2835_headpho.pcm.iec958.0:CARD=0,AES0=4,AES1=130,AES2=0,AES3=2' ALSA lib conf.c:4743:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory ALSA lib conf.c:5231:(snd_config_expand) Evaluate error: No such file or directory ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM spdif ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.hdmi ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.hdmi ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.modem ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.modem ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_rate_lavrate.so (libasound_module_rate_lavrate.so: libasound_module_rate_lavrate.so: cannot open shared object file: No such file or directory) ALSA lib pcm_rate.c:1468:(snd_pcm_rate_open) Cannot find rate converter ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_rate_samplerate.so (libasound_module_rate_samplerate.so: libasound_module_rate_samplerate.so: cannot open shared object file: No such file or directory) ALSA lib pcm_rate.c:1468:(snd_pcm_rate_open) Cannot find rate converter ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_rate_speexrate.so (libasound_module_rate_speexrate.so: libasound_module_rate_speexrate.so: cannot open shared object file: No such file or directory) ALSA lib pcm_rate.c:1468:(snd_pcm_rate_open) Cannot find rate converter ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_jack.so (libasound_module_pcm_jack.so: libasound_module_pcm_jack.so: cannot open shared object file: No such file or 
directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_jack.so (libasound_module_pcm_jack.so: libasound_module_pcm_jack.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_oss.so (libasound_module_pcm_oss.so: libasound_module_pcm_oss.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_oss.so (libasound_module_pcm_oss.so: libasound_module_pcm_oss.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_pulse.so (libasound_module_pcm_pulse.so: libasound_module_pcm_pulse.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_pulse.so (libasound_module_pcm_pulse.so: libasound_module_pcm_pulse.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_a52.so (libasound_module_pcm_a52.so: libasound_module_pcm_a52.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_a52.so (libasound_module_pcm_a52.so: libasound_module_pcm_a52.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_upmix.so (libasound_module_pcm_upmix.so: libasound_module_pcm_upmix.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_upmix.so (libasound_module_pcm_upmix.so: libasound_module_pcm_upmix.so: cannot open shared object file: No such file or directory) 
ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_vdownmix.so (libasound_module_pcm_vdownmix.so: libasound_module_pcm_vdownmix.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_vdownmix.so (libasound_module_pcm_vdownmix.so: libasound_module_pcm_vdownmix.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_usb_stream.so (libasound_module_pcm_usb_stream.so: libasound_module_pcm_usb_stream.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_usb_stream.so (libasound_module_pcm_usb_stream.so: libasound_module_pcm_usb_stream.so: cannot open shared object file: No such file or directory) Expression 'paInvalidSampleRate' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2043 Expression 'PaAlsaStreamComponent_InitialConfigure( &self->capture, inParams, self->primeBuffers, hwParamsCapture, &realSr )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2713 Expression 'PaAlsaStream_Configure( stream, inputParameters, outputParameters, sampleRate, framesPerBuffer, &inputLatency, &outputLatency, &hostBufferSizeMode )' failed in 'src/hostapi/alsa/pa_linux_alsa.c', line: 2837 Traceback (most recent call last): File "/home/pi/Documents/test.py", line 11, in mic_stream = SimpleMicStream() File "/home/pi/Documents/eff_word_net/streams.py", line 71, in init mic_stream=p.open( File "/home/pi/Downloads/yes/envs/eff/lib/python3.8/site-packages/pyaudio.py", line 750, in open stream = Stream(self, *args, **kwargs) File "/home/pi/Downloads/yes/envs/eff/lib/python3.8/site-packages/pyaudio.py", line 441, in init self._stream = pa.open(**arguments) OSError: [Errno -9997] Invalid sample rate

Good evening! I have encountered this problem. How can I solve it? Looking forward to your reply.
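An "Invalid sample rate" error from PortAudio can mean that the default input device does not support the rate the stream asks for, though that is a guess rather than a confirmed diagnosis for this log. A first step could be listing the input devices and their default sample rates. The helper below is a hypothetical sketch; it relies only on PyAudio's get_device_count and get_device_info_by_index, and takes the PyAudio instance as an argument.

```python
def list_input_devices(pa):
    """Return (index, name, default_sample_rate) for every device
    that reports at least one input channel."""
    devices = []
    for i in range(pa.get_device_count()):
        info = pa.get_device_info_by_index(i)
        if info.get("maxInputChannels", 0) > 0:
            devices.append((i, info["name"], int(info["defaultSampleRate"])))
    return devices

# Usage with PyAudio installed (shown as comments, not executed here):
#   import pyaudio
#   pa = pyaudio.PyAudio()
#   for index, name, rate in list_input_devices(pa):
#       print(index, name, rate)
#   pa.terminate()
```

If the list is empty, no microphone is visible to PortAudio at all; if the microphone's default rate differs from what the stream requests, resampling or a different device may be needed.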
raspberrypi

opened by preachwebsite 3
Could you help me?

I can't set up its runtime environment. Could you help by remote control? I have TeamViewer, a remote control program. The ID is 621 081 831. Other remote control tools are fine too. I look forward to your help.

opened by preachwebsite 3
raising precision of custom wakeword

I'm curious whether the precision of custom wakeword improves if you provide more sound files, e.g. 50 files from different people? or is that meaningless?

We want to use a custom wakeword for a public interaction system, and want it to recognize voice input from a wide range of people (young&old, male&female, etc).

Thank you for letting me know.

opened by dominickchen 3
Invalid input device (no default output device)

ALSA lib conf.c:3723:(snd_config_hooks_call) Cannot open shared library libasound_module_conf_pulse.so (libasound_module_conf_pulse.so: libasound_module_conf_pulse.so: cannot open shared object file: No such file or directory) ALSA lib control.c:1379:(snd_ctl_open_noupdate) Invalid CTL hw:0 ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.bcm2835_headpho.pcm.front.0:CARD=0' ALSA lib conf.c:4743:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory ALSA lib conf.c:5231:(snd_config_expand) Evaluate error: No such file or directory ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM front ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.rear ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.center_lfe ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.side ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround21 ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround21 ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround40 ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround41 ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround50 ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround51 ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.surround71 ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.bcm2835_headpho.pcm.iec958.0:CARD=0,AES0=4,AES1=130,AES2=0,AES3=2' ALSA lib conf.c:4743:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory ALSA lib conf.c:5231:(snd_config_expand) Evaluate error: No such file or directory ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM iec958 ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.bcm2835_headpho.pcm.iec958.0:CARD=0,AES0=4,AES1=130,AES2=0,AES3=2' ALSA lib conf.c:4743:(_snd_config_evaluate) 
function snd_func_refer returned error: No such file or directory ALSA lib conf.c:5231:(snd_config_expand) Evaluate error: No such file or directory ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM spdif ALSA lib confmisc.c:1281:(snd_func_refer) Unable to find definition 'cards.bcm2835_headpho.pcm.iec958.0:CARD=0,AES0=4,AES1=130,AES2=0,AES3=2' ALSA lib conf.c:4743:(_snd_config_evaluate) function snd_func_refer returned error: No such file or directory ALSA lib conf.c:5231:(snd_config_expand) Evaluate error: No such file or directory ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM spdif ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.hdmi ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.hdmi ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.modem ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.modem ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline ALSA lib pcm.c:2660:(snd_pcm_open_noupdate) Unknown PCM cards.pcm.phoneline ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_rate_lavrate.so (libasound_module_rate_lavrate.so: libasound_module_rate_lavrate.so: cannot open shared object file: No such file or directory) ALSA lib pcm_rate.c:1468:(snd_pcm_rate_open) Cannot find rate converter ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_rate_samplerate.so (libasound_module_rate_samplerate.so: libasound_module_rate_samplerate.so: cannot open shared object file: No such file or directory) ALSA lib pcm_rate.c:1468:(snd_pcm_rate_open) Cannot find rate converter ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_rate_speexrate.so (libasound_module_rate_speexrate.so: libasound_module_rate_speexrate.so: cannot open shared object file: No such file or directory) ALSA lib pcm_rate.c:1468:(snd_pcm_rate_open) Cannot find rate converter ALSA lib 
dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_jack.so (libasound_module_pcm_jack.so: libasound_module_pcm_jack.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_jack.so (libasound_module_pcm_jack.so: libasound_module_pcm_jack.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_oss.so (libasound_module_pcm_oss.so: libasound_module_pcm_oss.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_oss.so (libasound_module_pcm_oss.so: libasound_module_pcm_oss.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_pulse.so (libasound_module_pcm_pulse.so: libasound_module_pcm_pulse.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_pulse.so (libasound_module_pcm_pulse.so: libasound_module_pcm_pulse.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_a52.so (libasound_module_pcm_a52.so: libasound_module_pcm_a52.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_a52.so (libasound_module_pcm_a52.so: libasound_module_pcm_a52.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_upmix.so (libasound_module_pcm_upmix.so: libasound_module_pcm_upmix.so: cannot open shared object file: No such file or directory) ALSA lib 
dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_upmix.so (libasound_module_pcm_upmix.so: libasound_module_pcm_upmix.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_vdownmix.so (libasound_module_pcm_vdownmix.so: libasound_module_pcm_vdownmix.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_vdownmix.so (libasound_module_pcm_vdownmix.so: libasound_module_pcm_vdownmix.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_usb_stream.so (libasound_module_pcm_usb_stream.so: libasound_module_pcm_usb_stream.so: cannot open shared object file: No such file or directory) ALSA lib dlmisc.c:340:(snd_dlobj_cache_get0) Cannot open shared library libasound_module_pcm_usb_stream.so (libasound_module_pcm_usb_stream.so: libasound_module_pcm_usb_stream.so: cannot open shared object file: No such file or directory) Traceback (most recent call last): File "/home/pi/Documents/test.py", line 11, in mic_stream = SimpleMicStream() File "/home/pi/Documents/eff_word_net/streams.py", line 71, in init mic_stream=p.open( File "/home/pi/Downloads/yes/envs/eff/lib/python3.8/site-packages/pyaudio.py", line 750, in open stream = Stream(self, *args, **kwargs) File "/home/pi/Downloads/yes/envs/eff/lib/python3.8/site-packages/pyaudio.py", line 441, in init self._stream = pa.open(**arguments) OSError: [Errno -9996] Invalid input device (no default output device)

I have encountered this problem. How can I solve it?

opened by preachwebsite 2

Works fine with the reference file but not with a custom recorded file

Hello, I am working on Google Colab, so I don't have access to a mic. The workaround is to use an mp3 or wav file. To do that, I added this class:

from streams import CustomAudioStream
from pydub import AudioSegment

import numpy as np
import wave

RATE = 16000
index = 0

class SimpleFileStream(CustomAudioStream) :

    def open_stream(self, src, mp3):
        if mp3:
          dst = "Data/sample.wav"
          # convert mp3 to wav              
          sound = AudioSegment.from_mp3(src).set_frame_rate(16000)
          sound.export(dst, format="wav")
          self.wf = wave.open(dst, 'rb')
        else:
          print("Not an mp3")
          self.wf = wave.open(src, 'rb')
          self.wf.rewind()
        print("Get params of wav file " + str(self.wf.getparams()))

    def close_stream(self):
        self.wf.close()

    def get_next_frame(self):
        global index
        print("Index ", index)
        index = index + self.CHUNK
        return np.frombuffer(self.wf.readframes(self.CHUNK),dtype=np.int16)

    """
    Implements stream with sliding window, 
    implemented by inheriting CustomAudioStream
    """
    def __init__(self,sliding_window_secs:float=1/8):
        self.CHUNK = int(sliding_window_secs*RATE)

        CustomAudioStream.__init__(
            self,
            open_stream = self.open_stream,
            close_stream = self.close_stream,
            get_next_frame = self.get_next_frame,
        )

It seems to work if I use a reference file from GitHub. But if I record a custom file using Audacity, it does not detect the wakeword.

If I change the threshold to 0.7 and the activation count to 2, it works better, but that increases the chance of false positives.

Is it mandatory to have a custom reference file for each user?

Best regards Sebastien

opened by warichet 2
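In the file stream from the post above, the non-mp3 branch opens the WAV file as-is, while get_next_frame reads samples as 16-bit integers and sizes chunks for 16000 Hz. A recording exported with a different rate, channel count, or bit depth would therefore be read incorrectly. Whether that explains the missed detections is not confirmed; the hypothetical helper below only checks a file against those assumptions, using the standard wave module.

```python
import wave

def check_wav_format(path, expected_rate=16000):
    """Return a list of mismatches against mono, 16-bit PCM at `expected_rate`.
    An empty list means the file matches."""
    problems = []
    with wave.open(path, "rb") as wf:
        if wf.getframerate() != expected_rate:
            problems.append(
                f"sample rate is {wf.getframerate()} Hz, expected {expected_rate}"
            )
        if wf.getnchannels() != 1:
            problems.append(f"{wf.getnchannels()} channels, expected 1")
        if wf.getsampwidth() != 2:
            problems.append(
                f"{wf.getsampwidth() * 8}-bit samples, expected 16-bit"
            )
    return problems
```

For example, a stereo 44100 Hz export would come back with two reported problems, which suggests re-exporting or resampling before feeding the file to the stream.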

Bump numpy from 1.20.0 to 1.22.0
Bumps numpy from 1.20.0 to 1.22.0.

Release notes

Sourced from numpy's releases.

v1.22.0

NumPy 1.22.0 Release Notes

NumPy 1.22.0 is a big release featuring the work of 153 contributors spread over 609 pull requests. There have been many improvements, highlights are:

Annotations of the main namespace are essentially complete. Upstream is a moving target, so there will likely be further improvements, but the major work is done. This is probably the most user visible enhancement in this release.

A preliminary version of the proposed Array-API is provided. This is a step in creating a standard collection of functions that can be used across application such as CuPy and JAX.

NumPy now has a DLPack backend. DLPack provides a common interchange format for array (tensor) data.

New methods for quantile, percentile, and related functions. The new methods provide a complete set of the methods commonly found in the literature.

A new configurable allocator for use by downstream projects.

These are in addition to the ongoing work to provide SIMD support for commonly used functions, improvements to F2PY, and better documentation.

The Python versions supported in this release are 3.8-3.10, Python 3.7 has been dropped. Note that 32 bit wheels are only provided for Python 3.8 and 3.9 on Windows, all other wheels are 64 bits on account of Ubuntu, Fedora, and other Linux distributions dropping 32 bit support. All 64 bit wheels are also linked with 64 bit integer OpenBLAS, which should fix the occasional problems encountered by folks using truly huge arrays.

Expired deprecations

Deprecated numeric style dtype strings have been removed

Using the strings "Bytes0", "Datetime64", "Str0", "Uint32", and "Uint64" as a dtype will now raise a TypeError.

(gh-19539)

Expired deprecations for loads, ndfromtxt, and mafromtxt in npyio

numpy.loads was deprecated in v1.15, with the recommendation that users use pickle.loads instead. ndfromtxt and mafromtxt were both deprecated in v1.17 - users should use numpy.genfromtxt instead with the appropriate value for the usemask parameter.

(gh-19615)

... (truncated)

Commits

4adc87d Merge pull request #20685 from charris/prepare-for-1.22.0-release

fd66547 REL: Prepare for the NumPy 1.22.0 release.

125304b wip

c283859 Merge pull request #20682 from charris/backport-20416

5399c03 Merge pull request #20681 from charris/backport-20954

f9c45f8 Merge pull request #20680 from charris/backport-20663

794b36f Update armccompiler.py

d93b14e Update test_public_api.py

7662c07 Update __init__.py

311ab52 Update armccompiler.py

Additional commits viewable in compare view

dependencies
opened by dependabot[bot] 0
Discussion : Hotword's accuracy too low

If you are playing around with your own custom hotwords and some of them happen to not work well, feel free to use the discussion thread: https://github.com/Ant-Brain/EfficientWord-Net/discussions/4

opened by TheSeriousProgrammer 0
Hotword matches without any utterance

Hi, first of all, thanks for making this library, it's fantastic!! I understand that it is at a very early stage, so it will have some issues and will get better over time. This time I was trying the given hotword detector example and attached speech recognition to run after the hotword triggers, but the behaviour is quite messy. To demonstrate this I am including this gif.

Problem 1: I call speech recognition right after there is a match. As soon as speech recognition ends, the detector again reports the hotword as uttered, with a confidence value, and starts listening again, even though no hotword was uttered.

Problem 2: In some situations it also matches on a small click or a sound from the desk.

Is there any fix, at least for Problem 1? I can see that Problem 2 could be caused by weak training, depending on the hotword.

opened by OnlinePage 2

OSError: [Errno -9981] Input overflowed

I've installed the python library with

pip install EfficientWord-Net

onto a Raspberry Pi 2 with a recent Raspbian Lite.

However, if I run the demo with python -m eff_word_net.engine, I get the following error and nothing works:

Say Mycroft / Alexa / Siri
Traceback (most recent call last):
  File "/usr/lib/python3.9/runpy.py", line 197, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/usr/lib/python3.9/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/max/.local/lib/python3.9/site-packages/eff_word_net/engine.py", line 333, in <module>
    frame = mic_stream.getFrame()
  File "/home/max/.local/lib/python3.9/site-packages/eff_word_net/streams.py", line 49, in getFrame
    new_frame = self._get_next_frame()
  File "/home/max/.local/lib/python3.9/site-packages/eff_word_net/streams.py", line 85, in <lambda>
    np.frombuffer(mic_stream.read(CHUNK),dtype=np.int16)
  File "/usr/lib/python3/dist-packages/pyaudio.py", line 608, in read
    return pa.read_stream(self._stream, num_frames, exception_on_overflow)
OSError: [Errno -9981] Input overflowed

any idea?
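One workaround often tried for this PyAudio error is to pass exception_on_overflow=False when reading, the parameter visible in the pyaudio.py frame of the traceback above. The sketch below is only an illustration: FakeStream and read_frame are hypothetical names standing in for a real PyAudio input stream and for the library's own frame reader, so that the example runs without a microphone. Whether the library exposes a way to set this flag is not confirmed here.

```python
import array


class FakeStream:
    """Hypothetical stand-in for a PyAudio input stream.

    It reports an overflow on every second read, unless the caller
    asks for overflows to be ignored.
    """

    def __init__(self):
        self.calls = 0

    def read(self, num_frames, exception_on_overflow=True):
        self.calls += 1
        if exception_on_overflow and self.calls % 2 == 0:
            raise OSError(-9981, "Input overflowed")
        return bytes(2 * num_frames)  # silent 16-bit mono audio


def read_frame(stream, chunk):
    """Read one chunk as 16-bit samples, tolerating input overflow."""
    raw = stream.read(chunk, exception_on_overflow=False)
    samples = array.array("h")  # signed 16-bit integers
    samples.frombytes(raw)
    return samples


stream = FakeStream()
frames = [read_frame(stream, 1024) for _ in range(4)]
print([len(frame) for frame in frames])  # [1024, 1024, 1024, 1024]
```

Ignoring the overflow means that samples the slow device could not consume in time are silently dropped, so detection quality on a Raspberry Pi 2 may still suffer.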

opened by mKenfenheuer 1

complex hotwords support #Current Model Limitations Discussion

Hi, thanks for your helpful research. I wonder whether the current model can handle complex hot words like "Hey Siri", or only a single word like "Siri"?

My second question is about hot words that take more than 1 s to pronounce, like "Hey XXXX". Does your model support changing the recording time?

Did you try using cosine_similarity instead of Euclidean distance at inference time?

Thanks.
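For context on the last question, here is a minimal sketch comparing the two measures on a pair of made-up embedding vectors. It assumes the embeddings are L2-normalized, which is not confirmed for this model. Under that assumption the squared Euclidean distance equals 2 - 2 * cosine similarity, so both measures rank matches in the same order and only the threshold changes.

```python
import math


def l2_normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Made-up vectors standing in for a reference and a live embedding.
reference = l2_normalize([1.0, 2.0, 3.0])
live = l2_normalize([2.0, 1.0, 3.5])

cos = cosine_similarity(reference, live)
dist = euclidean_distance(reference, live)

# For unit-length vectors: dist**2 == 2 - 2 * cos
print(math.isclose(dist ** 2, 2 - 2 * cos, abs_tol=1e-12))  # True
```

If the embeddings are not normalized, the two measures can disagree, because Euclidean distance also reacts to differences in vector length.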
enhancement

opened by amoazeni75 7
Problem with Dependencies #Docker Support

Hello I left a comment on Reddit saying I would give it a go, and you said if I had a problem to log it here, so here I am, with a problem 😊

I seem to get stuck with pip3 install librosa. I get this error:

Failed building wheel for llvmlite
Running setup.py clean for llvmlite
Failed to build llvmlite

I can push on and get EfficientWord installed and working; if I say Alexa it says Yup I hear ya

The problem is then when I try to create my own wake word. I run this command … python3 -m eff_word_net.generate_reference

[email protected]:~ $ python3 -m eff_word_net.generate_reference
Paste Path of folder Containing audio files:/home/pi/wakewords
Paste Path of location to save *_ref.json :/home/pi/wakewords
Enter Wakeword Name :bender
Traceback (most recent call last):
  File "/usr/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/pi/.local/lib/python3.7/site-packages/eff_word_net/generate_reference.py", line 80, in <module>
    input("Enter Wakeword Name :")
  File "/home/pi/.local/lib/python3.7/site-packages/eff_word_net/generate_reference.py", line 47, in generate_reference_file
    x, = librosa.load(audio_file,sr=16000)
AttributeError: module 'librosa' has no attribute 'load'

My Problem is with librosa, I am not able to install it. I tried everything I could google but it will never install

How did you get around this problem ?
enhancement good first issue raspberrypi wake_word_generation

opened by Balro76 3

Releases(stable)

stable(Feb 19, 2022)

The engine uses a sliding-window approach to listen for hotwords, which results in a single utterance producing multiple triggers. To minimize this, some complicated logic was used, resulting in an unnecessarily complex API. In this release we shift to a simpler approach, relaxation_time (the minimum time required between any 2 triggers; triggers that arrive earlier are dismissed), resulting in a simpler programmer API.

However, these large updates are breaking changes : (
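The relaxation_time idea described in this release can be sketched roughly as follows. This is only an illustration of the general technique, not the library's actual code: the class name RelaxedTrigger, its accept method, and the timestamps are all hypothetical.

```python
class RelaxedTrigger:
    """Hypothetical gate that accepts a trigger only if at least
    relaxation_time seconds have passed since the last accepted one."""

    def __init__(self, relaxation_time):
        self.relaxation_time = relaxation_time
        self._last_trigger = None

    def accept(self, timestamp):
        too_soon = (
            self._last_trigger is not None
            and timestamp - self._last_trigger < self.relaxation_time
        )
        if too_soon:
            return False  # dismissed: arrived within the relaxation window
        self._last_trigger = timestamp
        return True


# Made-up detection times (seconds): a sliding window fires three times
# on the first utterance and twice on a second, later utterance.
detections = [0.00, 0.25, 0.50, 3.00, 3.25]

gate = RelaxedTrigger(relaxation_time=0.8)
accepted = [t for t in detections if gate.accept(t)]
print(accepted)  # [0.0, 3.0]
```

Each utterance ends up producing a single accepted trigger, as long as the chosen window is longer than the spread of detections within one utterance and shorter than the gap between utterances.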
Source code(tar.gz)
Source code(zip)
EfficientWord_Net-0.2.2-py3-none-any.whl(1.65 MB)
v0.1.1-beta(Jan 6, 2022)

Improved false positive reduction, and changes to reduce multiple triggers per utterance of the hotword to one

TODO: Need to update documentation accordingly
Source code(tar.gz)
Source code(zip)
EfficientWord_Net-0.1.1-py3-none-any.whl(1.65 MB)

Owner

ANT-BRaiN

Small is the new big.

GitHub Repository https://ant-brain.github.io/EfficientWord-Net/

UAV-Networks-Routing is a Python simulator for experimenting routing algorithms and mac protocols on unmanned aerial vehicle networks.

UAV-Networks Simulator - Autonomous Networking - A.A. 20/21 UAV-Networks-Routing is a Python simulator for experimenting routing algorithms and mac pr

0 Nov 13, 2021

DGCNN - Dynamic Graph CNN for Learning on Point Clouds

DGCNN is the author's re-implementation of Dynamic Graph CNN, which achieves state-of-the-art performance on point-cloud-related high-level tasks including category classification, semantic segmentat

1.3k Dec 26, 2022

《Truly shift-invariant convolutional neural networks》(2021)

Truly shift-invariant convolutional neural networks [Paper] Authors: Anadi Chaman and Ivan Dokmanić Convolutional neural networks were always assumed

46 Dec 19, 2022

Code and real data for the paper "Counterfactual Temporal Point Processes", available at arXiv.

counterfactual-tpp This is a repository containing code and real data for the paper Counterfactual Temporal Point Processes. Pre-requisites This code

11 Dec 09, 2022

Ranger deep learning optimizer rewrite to use newest components

Ranger21 - integrating the latest deep learning components into a single optimizer Ranger deep learning optimizer rewrite to use newest components Ran

266 Dec 28, 2022

Clairvoyance: a Unified, End-to-End AutoML Pipeline for Medical Time Series

Clairvoyance: A Pipeline Toolkit for Medical Time Series Authors: van der Schaar Lab This repository contains implementations of Clairvoyance: A Pipel

$van_der_Schaar \LAB$ 89 Dec 07, 2022

Repository for the paper titled: "When is BERT Multilingual? Isolating Crucial Ingredients for Cross-lingual Transfer"

When is BERT Multilingual? Isolating Crucial Ingredients for Cross-lingual Transfer This repository contains code for our paper titled "When is BERT M

9 Dec 23, 2022

Algorithmic encoding of protected characteristics and its implications on disparities across subgroups

Algorithmic encoding of protected characteristics and its implications on disparities across subgroups This repository contains the code for the paper

15 Oct 24, 2022

Securetar - A streaming wrapper around python tarfile and allow secure handling files and support encryption

Secure Tar Secure Tarfile library It's a streaming wrapper around python tarfile

2 Dec 09, 2022

Semi-Supervised Signed Clustering Graph Neural Network (and Implementation of Some Spectral Methods)

SSSNET SSSNET: Semi-Supervised Signed Network Clustering For details, please read our paper. Environment Setup Overview The project has been tested on

9 Nov 24, 2022

OpenABC-D: A Large-Scale Dataset For Machine Learning Guided Integrated Circuit Synthesis

OpenABC-D: A Large-Scale Dataset For Machine Learning Guided Integrated Circuit Synthesis Overview OpenABC-D is a large-scale labeled dataset generate

31 Nov 22, 2022

Official PyTorch implementation of the ICRA 2021 paper: Adversarial Differentiable Data Augmentation for Autonomous Systems.

Adversarial Differentiable Data Augmentation This repository provides the official PyTorch implementation of the ICRA 2021 paper: Adversarial Differen

3 Oct 15, 2022

Semi-Supervised Learning, Object Detection, ICCV2021

End-to-End Semi-Supervised Object Detection with Soft Teacher By Mengde Xu*, Zheng Zhang*, Han Hu, Jianfeng Wang, Lijuan Wang, Fangyun Wei, Xiang Bai,

789 Dec 27, 2022

C3d-pytorch - Pytorch porting of C3D network, with Sports1M weights

C3D for pytorch This is a pytorch porting of the network presented in the paper Learning Spatiotemporal Features with 3D Convolutional Networks How to

311 Jan 06, 2023

Machine learning for NeuroImaging in Python

nilearn Nilearn enables approachable and versatile analyses of brain volumes. It provides statistical and machine-learning tools, with instructive doc

919 Dec 25, 2022

[TIP 2020] Multi-Temporal Scene Classification and Scene Change Detection with Correlation based Fusion

Multi-Temporal Scene Classification and Scene Change Detection with Correlation based Fusion Code for Multi-Temporal Scene Classification and Scene Ch

33 Dec 12, 2022

Scikit-learn compatible estimation of general graphical models

skggm : Gaussian graphical models using the scikit-learn API In the last decade, learning networks that encode conditional independence relationships

213 Jan 02, 2023

In-Place Activated BatchNorm for Memory-Optimized Training of DNNs

In-Place Activated BatchNorm In-Place Activated BatchNorm for Memory-Optimized Training of DNNs In-Place Activated BatchNorm (InPlace-ABN) is a novel

1.3k Dec 29, 2022

Torchserve server using a YoloV5 model running on docker with GPU and static batch inference to perform production ready inference.

Yolov5 running on TorchServe (GPU compatible) ! This is a dockerfile to run TorchServe for Yolo v5 object detection model. (TorchServe (PyTorch librar

82 Nov 29, 2022

Trax — Deep Learning with Clear Code and Speed

Trax — Deep Learning with Clear Code and Speed Trax is an end-to-end library for deep learning that focuses on clear code and speed. It is actively us

7.3k Dec 26, 2022