On-device speech-to-intent engine powered by deep learning

Overview

Rhino

Made in Vancouver, Canada by Picovoice

Rhino is Picovoice's Speech-to-Intent engine. It directly infers intent from spoken commands within a given context of interest, in real-time. For example, given a spoken command:

Can I have a small double-shot espresso?

Rhino infers that the user wants to order a beverage and emits the following inference result:

{
  "isUnderstood": "true",
  "intent": "orderBeverage",
  "slots": {
    "beverage": "espresso",
    "size": "small",
    "numberOfShots": "2"
  }
}
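
As a rough sketch of how an application might consume a result like the one above, the following Python snippet parses the JSON and builds a one-line order summary. The `summarize` helper, the `RESULT_JSON` constant, and the fallback values are hypothetical and only serve the illustration; they are not part of the Rhino SDK.

```python
import json

# The inference result shown above, as it might arrive in serialized form.
RESULT_JSON = """
{
  "isUnderstood": "true",
  "intent": "orderBeverage",
  "slots": {"beverage": "espresso", "size": "small", "numberOfShots": "2"}
}
"""


def summarize(result):
    """Build a one-line order summary, or return None if nothing was understood."""
    # The example encodes isUnderstood as the string "true"; a real boolean
    # is accepted as well so that either representation works.
    understood = result.get("isUnderstood") in (True, "true")
    if not understood or result.get("intent") != "orderBeverage":
        return None
    slots = result.get("slots", {})
    # Slot values are strings, so numeric slots need explicit conversion.
    # The fallbacks ("1", "regular") are assumptions made for this sketch.
    shots = int(slots.get("numberOfShots", "1"))
    return f"{slots.get('size', 'regular')} {slots['beverage']} x{shots} shot(s)"


print(summarize(json.loads(RESULT_JSON)))
```

Slots that the speaker did not mention are simply absent, which is why the sketch reads them with `get` rather than indexing directly.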

Rhino is:

  • powered by deep neural networks trained in real-world environments.
  • compact and computationally-efficient. It is perfect for IoT.
  • cross-platform: Raspberry Pi, BeagleBone, Android, iOS, Linux (x86_64), Mac (x86_64), Windows (x86_64), and web browsers are supported. Additionally, enterprise customers have access to the ARM Cortex-M SDK.
  • self-service. Developers can train custom models using Picovoice Console.

License & Terms

The Rhino SDK is free and licensed under Apache 2.0, including the pre-trained models available within the repository. Picovoice Console offers two types of subscriptions: Personal and Enterprise. Personal accounts can train custom speech-to-intent models, subject to limitations and strictly for non-commercial purposes. Personal accounts empower researchers, hobbyists, and tinkerers to experiment. Enterprise accounts can unlock all capabilities of Picovoice Console, are permitted for use in commercial settings, and have a path to graduate to commercial distribution*.

Use Cases

Rhino is the right choice if the domain of voice interactions is specific (limited).

  • If you want to create voice experiences similar to Alexa or Google, see the Picovoice platform.
  • If you need to recognize a few static (always listening) voice commands, see Porcupine.

Try It Out

Rhino in Action

Language Support

  • English, German, French and Spanish.
  • Support for additional languages is available for commercial customers on a case-by-case basis.

Performance

A comparison between the accuracy of Rhino and major cloud-based alternatives is provided here. Below is the summary of the benchmark:

Terminology

Rhino infers the user's intent from spoken commands within a domain of interest. We refer to such a specialized domain as a Context. A context can be thought of as a set of voice commands, each mapped to an intent:

turnLightOff:
  - Turn off the lights in the office
  - Turn off all lights
setLightColor:
  - Set the kitchen lights to blue

In the examples above, each voice command is called an Expression. Expressions are what we expect the user to utter to interact with our voice application.

Consider the expression:

Turn off the lights in the office

What we require from Rhino is:

  1. To infer the intent (turnLightOff)
  2. To record the specific details from the utterance, in this case the location (office)

We can capture these details using slots by updating the expression:

turnLightOff:
  - Turn off the lights in the $location:lightLocation.

$location:lightLocation means that we expect a variable of type location to occur and we want to capture its value in a variable named lightLocation. We call such a variable a Slot. Slots give us the ability to capture details of the spoken commands. Each slot type is defined as a set of phrases. For example:

location:
  - "attic"
  - "balcony"
  - "basement"
  - "bathroom"
  - "bedroom"
  - "entrance"
  - "kitchen"
  - "living room"
  - ...

You can create custom contexts using the Picovoice Console.

To learn the complete expression syntax of Rhino, see the Speech-to-Intent Syntax Cheat Sheet.
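
To make the relationship between intents, expressions, and slots concrete, here is a toy Python sketch that matches text against expressions written in the `$type:name` notation. It is only an illustration of the terminology: Rhino infers intent directly from audio, real contexts are trained in Picovoice Console, and every name below (`SLOT_TYPES`, `EXPRESSIONS`, `compile_expression`, `infer`) is hypothetical.

```python
import re

# Hypothetical in-memory stand-in for a context. Real contexts are files
# produced by Picovoice Console, not Python dictionaries.
SLOT_TYPES = {"location": ["attic", "kitchen", "living room", "office"]}
EXPRESSIONS = {
    "turnLightOff": [
        "turn off all lights",
        "turn off the lights in the $location:lightLocation",
    ],
}

# Matches a slot reference such as $location:lightLocation.
SLOT_PATTERN = re.compile(r"\$(\w+):(\w+)")


def compile_expression(expression):
    """Turn an expression into a regex with one named group per slot."""
    parts = []
    position = 0
    for match in SLOT_PATTERN.finditer(expression):
        # Literal text before the slot is matched verbatim.
        parts.append(re.escape(expression[position:match.start()]))
        slot_type, slot_name = match.groups()
        phrases = "|".join(re.escape(p) for p in SLOT_TYPES[slot_type])
        parts.append(f"(?P<{slot_name}>{phrases})")
        position = match.end()
    parts.append(re.escape(expression[position:]))
    return re.compile("".join(parts))


def infer(text):
    """Return (intent, slots) for the first matching expression, else (None, {})."""
    normalized = text.lower().strip().rstrip(".")
    for intent, expressions in EXPRESSIONS.items():
        for expression in expressions:
            match = compile_expression(expression).fullmatch(normalized)
            if match:
                return intent, match.groupdict()
    return None, {}


print(infer("Turn off the lights in the office."))
```

The slot type (`location`) decides which phrases are accepted, while the slot name (`lightLocation`) is the key under which the captured value is reported.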

Demos

If using SSH, clone the repository with:

git clone --recurse-submodules git@github.com:Picovoice/rhino.git

If using HTTPS, clone the repository with:

git clone --recurse-submodules https://github.com/Picovoice/rhino.git

Python Demos

Install the demo package:

sudo pip3 install pvrhinodemo

With a working microphone connected to your device run the following in the terminal:

rhino_demo_mic --context_path ${CONTEXT_PATH}

Replace ${CONTEXT_PATH} with either a context file created using Picovoice Console or one within the repository.

For more information about Python demos, go to demo/python.

.NET Demos

The Rhino .NET demo is a command-line application that lets you choose between running Rhino on an audio file or on real-time microphone input.

Make sure there is a working microphone connected to your device. From demo/dotnet/RhinoDemo run the following in the terminal:

dotnet run -c MicDemo.Release -- --context_path ${CONTEXT_FILE_PATH}

Replace ${CONTEXT_FILE_PATH} with either a context file created using Picovoice Console or one within the repository.

For more information about .NET demos, go to demo/dotnet.

Java Demos

The Rhino Java demo is a command-line application that lets you choose between running Rhino on an audio file or on real-time microphone input.

To try the real-time demo, make sure there is a working microphone connected to your device. Then invoke the following commands from the terminal:

cd demo/java
./gradlew build
cd build/libs
java -jar rhino-mic-demo.jar -c ${CONTEXT_FILE_PATH}

Replace ${CONTEXT_FILE_PATH} with either a context file created using Picovoice Console or one within the repository.

For more information about Java demos go to demo/java.

Go Demos

The demo requires cgo, which on Windows may mean that you need to install a gcc compiler like Mingw to build it properly.

From demo/go run the following command from the terminal to build and run the mic demo:

go run micdemo/rhino_mic_demo.go -context_path ${CONTEXT_FILE_PATH}

Replace ${CONTEXT_FILE_PATH} with either a context file created using Picovoice Console or one within the repository.

For more information about Go demos go to demo/go.

Unity Demos

To run the Rhino Unity demo, import the Rhino Unity package into your project, open the RhinoDemo scene and hit play. To run on other platforms or in the player, go to File > Build Settings, choose your platform and hit the Build and Run button.

To browse the demo source go to demo/unity.

Flutter Demos

To run the Rhino demo on Android or iOS with Flutter, you must have the Flutter SDK installed on your system. Once installed, you can run flutter doctor to determine any other missing requirements for your relevant platform. Once your environment has been set up, launch a simulator or connect an Android/iOS device.

Before launching the app, use the copy_assets.sh script to copy the Rhino demo context file into the demo project. (NOTE: on Windows, Git Bash or another bash shell is required, or you will have to manually copy the context into the project.)

Run the following command from demo/flutter to build and deploy the demo to your device:

flutter run

The demo uses a smart lighting context, which can understand commands such as:

Turn off the lights.

or

Set the lights in the living room to purple.

React Native Demos

To run the React Native Rhino demo app you will first need to set up your React Native environment. For this, please refer to React Native's documentation. Once your environment has been set up, navigate to demo/react-native to run the following commands:

For Android:

yarn android-install    # sets up environment
yarn android-run        # builds and deploys to Android

For iOS:

yarn ios-install        # sets up environment
yarn ios-run            # builds and deploys to iOS

Both demos use a smart lighting context, which can understand commands such as:

Turn off the lights.

or

Set the lights in the living room to purple.

Android Demos

Using Android Studio, open demo/android/Activity as an Android project and then run the application. After pressing the start button you can issue commands such as:

Turn off the lights.

or:

Set the lights in the living room to purple.

For more information about the Android demo and the complete list of available expressions, go to demo/android.

iOS Demos

Before building the demo app, run the following from this directory to install the Rhino-iOS Cocoapod:

pod install

Then, using Xcode, open the generated RhinoDemo.xcworkspace and run the application. After pressing the start button you can issue commands such as:

Turn off the lights.

or:

Set the lights in the living room to purple.

For more information about the iOS demo and the complete list of available expressions, go to demo/ios.

Web Demos

Vanilla JavaScript and HTML

From demo/web run the following in the terminal:

yarn
yarn start

(or)

npm install
npm run start

Open http://localhost:5000 in your browser to try the demo.

Angular Demos

From demo/angular run the following in the terminal:

yarn
yarn start

(or)

npm install
npm run start

Open http://localhost:4200 in your browser to try the demo.

React Demos

From demo/react run the following in the terminal:

yarn
yarn start

(or)

npm install
npm run start

Open http://localhost:3000 in your browser to try the demo.

Vue Demos

From demo/vue run the following in the terminal:

yarn
yarn serve

(or)

npm install
npm run serve

Open http://localhost:8080 in your browser to try the demo.

NodeJS Demos

Install the demo package:

yarn global add @picovoice/rhino-node-demo

With a working microphone connected to your device, run the following in the terminal:

rhn-mic-demo --context_path ${CONTEXT_FILE_PATH}

Replace ${CONTEXT_FILE_PATH} with either a context file created using Picovoice Console or one within the repository.

For more information about NodeJS demos go to demo/nodejs.

Rust Demos

This demo opens an audio stream from a microphone and performs inference on spoken commands. From demo/rust/micdemo run the following:

cargo run --release -- --context_path ${CONTEXT_FILE_PATH}

Replace ${CONTEXT_FILE_PATH} with either a context file created using Picovoice Console or one within the repository.

For more information about Rust demos go to demo/rust.

C Demos

The C demo requires CMake version 3.4 or higher.

The Microphone demo requires miniaudio for accessing microphone audio data.

Windows requires MinGW to build the demo.

Microphone Demo

At the root of the repository, build with:

cmake -S demo/c/. -B demo/c/build && cmake --build demo/c/build --target rhino_demo_mic

Linux (x86_64), macOS (x86_64), Raspberry Pi, BeagleBone, and Jetson

List input audio devices with:

./demo/c/build/rhino_demo_mic --show_audio_devices

Run the demo using:

./demo/c/build/rhino_demo_mic ${LIBRARY_PATH} lib/common/rhino_params.pv \
resources/contexts/${PLATFORM}/smart_lighting_${PLATFORM}.rhn ${AUDIO_DEVICE_INDEX}

Replace ${LIBRARY_PATH} with path to appropriate library available under lib, ${PLATFORM} with the name of the platform you are running on (linux, raspberry-pi, mac, beaglebone, or jetson), and ${AUDIO_DEVICE_INDEX} with the index of your audio device.

Windows

List input audio devices with:

.\demo\c\build\rhino_demo_mic.exe --show_audio_devices

Run the demo using:

.\demo\c\build\rhino_demo_mic.exe lib/windows/amd64/libpv_rhino.dll lib/common/rhino_params.pv resources/contexts/windows/smart_lighting_windows.rhn ${AUDIO_DEVICE_INDEX}

Replace ${AUDIO_DEVICE_INDEX} with the index of your audio device.

The demo opens an audio stream and infers your intent from spoken commands in the context of a smart lighting system. For example, you can say:

"Turn on the lights in the bedroom."

File Demo

At the root of the repository, build with:

cmake -S demo/c/. -B demo/c/build && cmake --build demo/c/build --target rhino_demo_file

Linux (x86_64), macOS (x86_64), Raspberry Pi, BeagleBone, and Jetson

Run the demo using:

./demo/c/build/rhino_demo_file ${LIBRARY_PATH} lib/common/rhino_params.pv \
resources/contexts/${PLATFORM}/coffee_maker_${PLATFORM}.rhn resources/audio_samples/test_within_context.wav 

Replace ${LIBRARY_PATH} with path to appropriate library available under lib, ${PLATFORM} with the name of the platform you are running on (linux, raspberry-pi, mac, beaglebone, or jetson).

Windows

Run the demo using:

.\demo\c\build\rhino_demo_file.exe lib/windows/amd64/libpv_rhino.dll lib/common/rhino_params.pv resources/contexts/windows/coffee_maker_windows.rhn resources/audio_samples/test_within_context.wav

The demo opens up the WAV file and infers the intent in the context of a coffee maker system.

For more information about C demos go to demo/c.
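
The SDK sections below note that the engine accepts 16-bit linearly-encoded, single-channel PCM, so a file passed to a file demo has to be in that format. As a minimal sketch (in Python, using only the standard library), a WAV file can be checked and decoded before it is handed to an engine. The helper names `read_pcm16_mono` and `make_test_wav` are hypothetical, and the 16000 Hz rate is just an example value; the required rate should be queried from the engine.

```python
import io
import struct
import wave


def read_pcm16_mono(wav_file):
    """Return (sample_rate, samples) from a 16-bit mono WAV; raise ValueError otherwise."""
    with wave.open(wav_file, "rb") as reader:
        if reader.getnchannels() != 1:
            raise ValueError("expected single-channel audio")
        if reader.getsampwidth() != 2:
            raise ValueError("expected 16-bit samples")
        raw = reader.readframes(reader.getnframes())
        # WAV stores 16-bit samples as little-endian signed integers.
        samples = list(struct.unpack(f"<{len(raw) // 2}h", raw))
        return reader.getframerate(), samples


def make_test_wav(samples, sample_rate=16000):
    """Build a small in-memory WAV so the sketch runs without a file on disk."""
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(1)
        writer.setsampwidth(2)
        writer.setframerate(sample_rate)
        writer.writeframes(struct.pack(f"<{len(samples)}h", *samples))
    buffer.seek(0)
    return buffer


rate, samples = read_pcm16_mono(make_test_wav([0, 1000, -1000]))
print(rate, samples)
```

`read_pcm16_mono` also accepts a path, so it could be pointed at a file such as resources/audio_samples/test_within_context.wav.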

SDKs

Python

Install the Python SDK:

pip3 install pvrhino

The SDK exposes a factory method to create instances of the engine:

import pvrhino

handle = pvrhino.create(context_path='/absolute/path/to/context')

Where context_path is the absolute path to the Speech-to-Intent context created either using Picovoice Console or one of the default contexts available on Rhino's GitHub repository.

When initialized, the required sample rate can be obtained using handle.sample_rate. The expected frame length (number of audio samples in an input array) is provided by handle.frame_length. The object can be used to infer intent from spoken commands as below:

import pvrhino

handle = pvrhino.create(context_path='/absolute/path/to/context')

def get_next_audio_frame():
    pass

while True:
    is_finalized = handle.process(get_next_audio_frame())

    if is_finalized:
        inference = handle.get_inference()
        if not inference.is_understood:
            # add code to handle unsupported commands
            pass
        else:
            intent = inference.intent
            slots = inference.slots
            # add code to take action based on inferred intent and slot values

Finally, when done be sure to explicitly release the resources using handle.delete().
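
One possible way to combine framing and cleanup is sketched below: a buffer of samples is cut into frames of exactly `frame_length` samples, and `delete()` is called in a `finally` block so resources are released even if processing raises. `split_into_frames`, `run_inference`, and `FakeEngine` are hypothetical names; `FakeEngine` only mimics the attributes used above so the sketch runs without the SDK installed.

```python
def split_into_frames(pcm, frame_length):
    """Yield consecutive frames of exactly frame_length samples; drop the remainder."""
    for start in range(0, len(pcm) - frame_length + 1, frame_length):
        yield pcm[start:start + frame_length]


def run_inference(handle, pcm):
    """Feed pcm to handle frame by frame; return the inference, or None if not finalized."""
    try:
        for frame in split_into_frames(pcm, handle.frame_length):
            if handle.process(frame):
                return handle.get_inference()
        return None
    finally:
        # Release resources whether or not an inference was produced.
        handle.delete()


class FakeEngine:
    """Hypothetical stand-in for an engine instance, used only to run the sketch."""

    frame_length = 4

    def __init__(self):
        self.frames_seen = 0
        self.deleted = False

    def process(self, frame):
        self.frames_seen += 1
        return self.frames_seen == 2  # pretend the inference finalizes on frame 2

    def get_inference(self):
        return "inference"

    def delete(self):
        self.deleted = True


engine = FakeEngine()
print(run_inference(engine, list(range(10))), engine.deleted)
```

With a real engine, `handle` would come from `pvrhino.create(...)` and `pcm` would hold audio sampled at `handle.sample_rate`.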

.NET

Install the .NET SDK using NuGet or the dotnet CLI:

dotnet add package Rhino

The SDK exposes a factory method to create instances of the engine as below:

using Pv;

Rhino handle = Rhino.Create(contextPath:"/absolute/path/to/context");

When initialized, the valid sample rate is given by handle.SampleRate. The expected frame length (number of audio samples in an input array) is handle.FrameLength. The engine accepts 16-bit linearly-encoded PCM and operates on single-channel audio.

short[] GetNextAudioFrame()
{
    // .. get audioFrame
    return audioFrame;
}

while(true)
{
    bool isFinalized = handle.Process(GetNextAudioFrame());
    if(isFinalized)
    {
        Inference inference = handle.GetInference();
        if(inference.IsUnderstood)
        {
            string intent = inference.Intent;
            Dictionary<string, string> slots = inference.Slots;
            // .. code to take action based on inferred intent and slot values
        }
        else
        {
            // .. code to handle unsupported commands
        }
    }
}

Rhino will have its resources freed by the garbage collector, but to have resources freed immediately after use, wrap it in a using statement:

using(Rhino handle = Rhino.Create(contextPath:"/absolute/path/to/context"))
{
    // .. Rhino usage here
}

Java

The Rhino Java binding is available from the Maven Central Repository at ai.picovoice:rhino-java:${version}.

The SDK exposes a Builder that allows you to create an instance of the engine:

import ai.picovoice.rhino.*;

try{
    Rhino handle = new Rhino.Builder()
                    .setContextPath("/absolute/path/to/context")
                    .build();
} catch (RhinoException e) { }

When initialized, the valid sample rate is given by handle.getSampleRate(). The expected frame length (number of audio samples in an input array) is handle.getFrameLength(). The engine accepts 16-bit linearly-encoded PCM and operates on single-channel audio.

short[] getNextAudioFrame(){
    // .. get audioFrame
    return audioFrame;
}

while(true) {
    boolean isFinalized = handle.process(getNextAudioFrame());
    if(isFinalized){
        RhinoInference inference = handle.getInference();
        if(inference.getIsUnderstood()){
            String intent = inference.getIntent();
            Map<String, String> slots = inference.getSlots();
            // .. code to take action based on inferred intent and slot values
        } else {
            // .. code to handle unsupported commands
        }
    }
}

Once you are done with Rhino, ensure you release its resources explicitly:

handle.delete();

Go

To install the Rhino Go module to your project, use the command:

go get github.com/Picovoice/rhino/binding/go

To create an instance of the engine with default parameters, pass a path to a Rhino context file (.rhn) to the NewRhino function and then make a call to .Init().

import . "github.com/Picovoice/rhino/binding/go"

rhino := NewRhino("/path/to/context/file.rhn")
err := rhino.Init()
if err != nil {
    // handle error
}

Once initialized, you can start passing in frames of audio for processing. The engine accepts 16-bit linearly-encoded PCM and operates on single-channel audio. The sample rate that is required by the engine is given by SampleRate and number of samples per frame is FrameLength.

To feed audio into Rhino, use the Process function in your capture loop. You must have called Init() before calling Process.

func getNextFrameAudio() []int16{
    // get audio frame
}

for {
    isFinalized, err := rhino.Process(getNextFrameAudio())
    if err != nil {
        // handle processing error
    }
    if isFinalized {
        inference, err := rhino.GetInference()
        if err != nil {
            // handle inference error
        }
        if inference.IsUnderstood {
            intent := inference.Intent
            slots := inference.Slots
            // add code to take action based on inferred intent and slot values
        } else {
            // add code to handle unsupported commands
        }
    }
}

When done, resources must be released explicitly:

rhino.Delete()

Unity

Import the Rhino Unity Package into your Unity project.

The SDK provides two APIs:

High-Level API

RhinoManager provides a high-level API that takes care of audio recording. This class is the quickest way to get started.

The static constructor RhinoManager.Create creates an instance of RhinoManager using the provided context file.

using Pv.Unity;

try
{
    RhinoManager _rhinoManager = RhinoManager.Create(
                                    "/path/to/context/file.rhn",
                                    (inference) => {});
}
catch (Exception ex)
{
    // handle rhino init error
}

Once you have instantiated a RhinoManager, you can start audio capture and intent inference by calling:

_rhinoManager.Process();

Audio capture stops and Rhino resets once an inference result is returned via the inference callback. When you wish to resume, call .Process() again.

Once the app is done with using an instance of RhinoManager, you can explicitly release the audio resources and the resources allocated to Rhino:

_rhinoManager.Delete();

There is no need to deal with audio capture to enable intent inference with RhinoManager. This is because it uses our unity-voice-processor Unity package to capture frames of audio and automatically pass it to the inference engine.

Low-Level API

Rhino provides low-level access to the inference engine for those who want to incorporate speech-to-intent into an already existing audio processing pipeline.

To create an instance of Rhino, use the .Create static constructor and a context file.

using Pv.Unity;

try
{
    Rhino _rhino = Rhino.Create("path/to/context/file.rhn");
}
catch (Exception ex)
{
    // handle rhino init error
}

To feed Rhino your audio, pass frames of audio to its Process function until it has made an inference.

short[] GetNextAudioFrame()
{
    // .. get audioFrame
    return audioFrame;
}

try
{
    bool isFinalized = _rhino.Process(GetNextAudioFrame());
    if(isFinalized)
    {
        Inference inference = _rhino.GetInference();
        if(inference.IsUnderstood)
        {
            string intent = inference.Intent;
            Dictionary<string, string> slots = inference.Slots;
            // .. code to take action based on inferred intent and slot values
        }
        else
        {
            // .. code to handle unsupported commands
        }
    }
}
catch (Exception ex)
{
    Debug.LogError(ex.ToString());
}

For process to work correctly, the audio data must be in the audio format required by Picovoice.

Rhino implements the IDisposable interface, so you can use Rhino in a using block. If you don't use a using block, resources will be released by the garbage collector automatically or you can explicitly release the resources like so:

_rhino.Dispose();

Flutter

Add the Rhino Flutter plugin to your pubspec.yaml.

dependencies:
  rhino: ^<version>

The SDK provides two APIs:

High-Level API

RhinoManager provides a high-level API that takes care of audio recording. This class is the quickest way to get started.

The constructor RhinoManager.create will create an instance of the RhinoManager using a context file that you pass to it.

import 'package:rhino/rhino_manager.dart';
import 'package:rhino/rhino_error.dart';

void createRhinoManager() async {
    try{
        _rhinoManager = await RhinoManager.create(
            "/path/to/context/file.rhn",
            _inferenceCallback);
    } on PvError catch (err) {
        // handle rhino init error
    }
}

The inferenceCallback parameter is a function that you want to execute when Rhino makes an inference. The function should accept a map that represents the inference result.

void _inferenceCallback(Map<String, dynamic> inference){
    if(inference['isUnderstood']){
        String intent = inference['intent'];
        Map<String, String> slots = inference['slots'];
        // add code to take action based on inferred intent and slot values
    }
    else{
        // add code to handle unsupported commands
    }
}

Once you have instantiated a RhinoManager, you can start audio capture and intent inference using the .process() function. Audio capture stops and Rhino resets once an inference result is returned via the inference callback.

try{
    await _rhinoManager.process();
} on PvAudioException catch (ex) { }

Once your app is done with using RhinoManager, be sure you explicitly release the resources allocated for it:

_rhinoManager.delete();

Our flutter_voice_processor Flutter plugin captures the frames of audio and automatically passes them to the speech-to-intent engine.

Low-Level API

Rhino provides low-level access to the inference engine for those who want to incorporate speech-to-intent into an already existing audio processing pipeline.

Rhino is created by passing a context file to its static constructor create:

import 'package:rhino/rhino.dart';
import 'package:rhino/rhino_error.dart';

void createRhino() async {
    try{
        _rhino = await Rhino.create('/path/to/context/file.rhn');
    } on PvError catch (err) {
        // handle rhino init error
    }
}

To deliver audio to the engine, you must send audio frames to its process function. Each call to process will return a Map object that will contain the following items:

  • isFinalized - whether Rhino has made an inference
  • isUnderstood - if isFinalized, whether Rhino understood what it heard based on the context
  • intent - if isUnderstood, the name of the intent that was inferred
  • slots - if isUnderstood, dictionary of slot keys and values that were inferred

List<int> buffer = getAudioFrame();

try {
    Map<String, dynamic> inference = _rhino.process(buffer);
    if(inference['isFinalized']){
        if(inference['isUnderstood']){
            String intent = inference['intent'];
            Map<String, String> slots = inference['slots'];
            // add code to take action based on inferred intent and slot values
        }
    }
} on PvError catch (error) {
    // handle error
}

// once you are done
this._rhino.delete();

React Native

Install @picovoice/react-native-voice-processor and @picovoice/rhino-react-native. The SDK provides two APIs:

High-Level API

RhinoManager provides a high-level API that takes care of audio recording. This class is the quickest way to get started.

The constructor RhinoManager.create will create an instance of a RhinoManager using a context file that you pass to it.

async createRhinoManager(){
    try{
        this._rhinoManager = await RhinoManager.create(
            '/path/to/context/file.rhn',
            inferenceCallback);
    } catch (err) {
        // handle error
    }
}

Once you have instantiated a RhinoManager, you can start/stop audio capture and intent inference by calling .process(). Upon receiving an inference callback, audio capture will stop automatically and Rhino will reset. To restart it you must call .process() again.

let didStart = await this._rhinoManager.process();

When you are done using Rhino, you must explicitly release its resources:

this._rhinoManager.delete();

@picovoice/react-native-voice-processor handles audio capture and RhinoManager passes frames to the inference engine for you.

Low-Level API

Rhino provides low-level access to the inference engine for those who want to incorporate speech-to-intent into an already existing audio processing pipeline.

Rhino is created by passing a context file to its static constructor create:

async createRhino(){
    try{
        this._rhino = await Rhino.create('/path/to/context/file.rhn');
    } catch (err) {
        // handle error
    }
}

To deliver audio to the engine, you must pass it audio frames using the process function. The JSON result that is returned from process will have up to four fields:

  • isFinalized - whether Rhino has made an inference
  • isUnderstood - if isFinalized, whether Rhino understood what it heard based on the context
  • intent - if isUnderstood, the name of the intent that was inferred
  • slots - if isUnderstood, dictionary of slot keys and values that were inferred

let buffer = getAudioFrame();
try {
    let result = await this._rhino.process(buffer);
    // use result
    // ..
} catch (e) {
    // handle error
}

// once you are done
this._rhino.delete();

Android

To include the package in your Android project, ensure you have included mavenCentral() in your top-level build.gradle file and then add the following to your app's build.gradle:

dependencies {    
    implementation 'ai.picovoice:rhino-android:1.6.0'
}

There are two possibilities for integrating Rhino into an Android application: the High-level API and the Low-level API.

High-Level API

RhinoManager provides a high-level API for integrating Rhino into Android applications. It manages all activities related to creating an input audio stream, feeding it into Rhino, and invoking a user-provided inference callback.

try {
    RhinoManager rhinoManager = new RhinoManager.Builder()
                        .setContextPath("/path/to/context/file.rhn")
                        .setModelPath("/path/to/model/file.pv")
                        .setSensitivity(0.35f)                        
                        .build(appContext, new RhinoManagerCallback() {
                            @Override
                            public void invoke(RhinoInference inference) {
                                if (inference.getIsUnderstood()) {
                                    final String intent = inference.getIntent();
                                    final Map<String, String> slots = inference.getSlots();
                                    // add code to take action based on inferred intent and slot values
                                }
                                else {
                                    // add code to handle unsupported commands
                                }
                            }
                        });
} catch (RhinoException e) { }

The appContext parameter is the Android application context - this is used to extract Rhino resources from the APK. Sensitivity is the parameter that enables developers to trade miss rate for false alarm. It is a floating point number within [0, 1]. A higher sensitivity reduces miss rate at cost of increased false alarm rate.
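
The trade-off can be pictured with a toy example (in Python, purely illustrative). It assumes, for the sake of the sketch only, that each utterance receives a confidence score and that a command is accepted when its score reaches `1 - sensitivity`; Rhino's internal scoring is not described here, and `error_rates` and the score lists are hypothetical.

```python
def error_rates(sensitivity, command_scores, other_scores):
    """Return (miss_rate, false_alarm_rate) for a toy score-threshold detector."""
    if not 0.0 <= sensitivity <= 1.0:
        raise ValueError("sensitivity must be within [0, 1]")
    # Assumption for this sketch: higher sensitivity lowers the acceptance threshold.
    threshold = 1.0 - sensitivity
    misses = sum(score < threshold for score in command_scores)
    false_alarms = sum(score >= threshold for score in other_scores)
    return misses / len(command_scores), false_alarms / len(other_scores)


# Made-up scores for in-context commands and for out-of-context audio.
commands = [0.9, 0.6, 0.3]
others = [0.1, 0.3, 0.5]

for sensitivity in (0.25, 0.75):
    print(sensitivity, error_rates(sensitivity, commands, others))
```

With these made-up scores, raising the sensitivity from 0.25 to 0.75 removes every miss but lets two of the three out-of-context utterances through.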

When initialized, input audio can be processed using rhinoManager.process(). When done, be sure to release the resources using rhinoManager.delete().

Low-Level API

Rhino provides a binding for Android using JNI. It can be initialized using:

import ai.picovoice.rhino.*;

try {    
    Rhino handle = new Rhino.Builder()
                        .setContextPath("/path/to/context/file.rhn")                        
                        .build(appContext);
} catch (RhinoException e) { }

Once initialized, handle can be used for intent inference:

private short[] getNextAudioFrame();

while (!handle.process(getNextAudioFrame()));

final RhinoInference inference = handle.getInference();
if (inference.getIsUnderstood()) {
    // logic to perform an action given the intent object.
} else {
    // logic for handling out of context or unrecognized command
}

Finally, prior to exiting the application be sure to release resources acquired:

handle.delete();

iOS

The Rhino iOS binding is available via Cocoapods. To import it into your iOS project, add the following line to your Podfile and run pod install:

pod 'Rhino-iOS'

There are two approaches for integrating Rhino into an iOS application: The high-level API and the low-level API.

High-Level API

RhinoManager provides a high-level API for integrating Rhino into iOS applications. It manages all activities related to creating an input audio stream, feeding it to the engine, and invoking a user-provided inference callback.

do {
    let manager = try RhinoManager(
        contextPath: "/path/to/context/file.rhn", 
        modelPath: "/path/to/model/file.pv",
        sensitivity: 0.35,
        onInferenceCallback: { inference in
                if inference.isUnderstood {
                    let intent:String = inference.intent
                    let slots:Dictionary<String,String> = inference.slots
                    // use inference results
                }
            })
} catch { }

Sensitivity is the parameter that enables developers to trade miss rate for false alarm. It is a floating point number within [0, 1]. A higher sensitivity reduces miss rate at cost of increased false alarm rate.

When initialized, input audio can be processed using manager.process(). When done, be sure to release the resources using manager.delete().

Low-Level API

Rhino provides low-level access to the Speech-to-Intent engine for those who want to incorporate intent inference into an already existing audio processing pipeline.

import Rhino

do {
    let handle = try Rhino(contextPath: "/path/to/context/file.rhn")
} catch { }

Once initialized, handle can be used for intent inference:

func getNextAudioFrame() -> [Int16] {
    // .. get audioFrame
    return audioFrame
}

while true {
    do {
        let isFinalized = try handle.process(getNextAudioFrame())
        if isFinalized {
            let inference = try handle.getInference()
            if inference.isUnderstood {
                let intent:String = inference.intent
                let slots:Dictionary<String, String> = inference.slots
                // add code to take action based on inferred intent and slot values
            }
        }
    } catch { }
}

Finally, prior to exiting the application be sure to release resources acquired:

handle.delete()

Web

Rhino is available on modern web browsers (i.e. not Internet Explorer) via WebAssembly. Microphone audio is handled via the Web Audio API and is abstracted by the WebVoiceProcessor, which also handles downsampling to the correct format. Rhino is provided pre-packaged as a Web Worker.

Each spoken language is available as a dedicated npm package (e.g. @picovoice/rhino-web-en-worker). These packages can be used with the @picovoice/web-voice-processor. They can also be used with the Angular, React, and Vue bindings, which abstract and hide the web worker communication details.

Vanilla JavaScript and HTML (CDN Script Tag)

<!DOCTYPE html>
<html lang="en">
  <head>
    <script src="https://unpkg.com/@picovoice/rhino-web-en-worker/dist/iife/index.js"></script>
    <script src="https://unpkg.com/@picovoice/web-voice-processor/dist/iife/index.js"></script>
    <script type="application/javascript">
      const RHINO_CONTEXT_BASE64 = /* Base64 representation of .rhn file  */;

      async function startRhino() {
        console.log("Rhino is loading. Please wait...");
        window.rhinoWorker = await RhinoWebEnWorker.RhinoWorkerFactory.create(
          {
            context: {
              base64: RHINO_CONTEXT_BASE64,
              sensitivity: 0.5,
            },
            start: false,
          }
        );

        console.log("Rhino worker ready!");

        window.rhinoWorker.onmessage = (msg) => {
          if (msg.data.command === "rhn-inference") {
            console.log("Inference detected: " + JSON.stringify(msg.data.inference));
            window.rhinoWorker.postMessage({ command: "pause" });
            document.getElementById("push-to-talk").disabled = false;
            console.log("Rhino is paused. Press the 'Push to Talk' button to speak again.")
          }
        };

        console.log(
          "WebVoiceProcessor initializing. Microphone permissions requested ..."
        );

        try {
          let webVp = await WebVoiceProcessor.WebVoiceProcessor.init({
            engines: [window.rhinoWorker],
          });
          console.log(
            "WebVoiceProcessor ready! Press the 'Push to Talk' button to talk."
          );
        } catch (e) {
          console.log("WebVoiceProcessor failed to initialize: " + e);
        }
      }

      document.addEventListener("DOMContentLoaded", function () {
        startRhino();
        document.getElementById("push-to-talk").onclick = function (event) {
          console.log("Rhino is listening for your commands ...");
          this.disabled = true;
          window.rhinoWorker.postMessage({ command: "resume" });
        };
      });
    </script>
  </head>
  <body>
    <button id="push-to-talk">Push to Talk</button>
  </body>
</html>
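Once an rhn-inference message arrives, an application typically maps the intent and its slots to an action. The following is a minimal sketch of one way to do that, assuming the inference object carries the isUnderstood (taken to be a boolean here), intent, and slots fields shown in the overview example. The handler table and the function name handleInference are hypothetical and not part of the Rhino packages.

```javascript
// Hypothetical handlers keyed by intent name. "orderBeverage" is the intent
// from the overview example; the handler body is made up for illustration.
const intentHandlers = {
  orderBeverage: (slots) =>
    `Ordering a ${slots.size} ${slots.beverage} (${slots.numberOfShots} shots)`,
};

// Maps an inference object to the result of the matching handler.
function handleInference(inference, handlers) {
  if (!inference || !inference.isUnderstood) {
    return "Command not understood";
  }
  // Only look at the table's own keys so names like "toString" do not match.
  const known = Object.prototype.hasOwnProperty.call(handlers, inference.intent);
  if (!known) {
    return `No handler registered for intent "${inference.intent}"`;
  }
  // Slots may be absent for intents without parameters.
  return handlers[inference.intent](inference.slots || {});
}

console.log(
  handleInference(
    {
      isUnderstood: true,
      intent: "orderBeverage",
      slots: { beverage: "espresso", size: "small", numberOfShots: "2" },
    },
    intentHandlers
  )
);
```

A function like this could be called from the onmessage callback with msg.data.inference in place of the console.log call.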

Vanilla JavaScript and HTML (ES Modules)

yarn add @picovoice/rhino-web-en-worker @picovoice/web-voice-processor

(or)

npm install @picovoice/rhino-web-en-worker @picovoice/web-voice-processor

import { WebVoiceProcessor } from "@picovoice/web-voice-processor";
import { RhinoWorkerFactory } from "@picovoice/rhino-web-en-worker";

const RHN_CONTEXT_BASE64 = /* Base64 representation of a .rhn context */

async function startRhino() {
  // Create a Rhino Worker (English language) to listen for
  // commands in the specified context
  const rhinoWorker = await RhinoWorkerFactory.create(
    { context: { base64: RHN_CONTEXT_BASE64 } }
  );

  // The worker will send a message with data.command = "rhn-inference" upon concluding
  // Here we tell it to log it to the console
  rhinoWorker.onmessage = (msg) => {
    switch (msg.data.command) {
      case 'rhn-inference':
        // Log the event
        console.log("Rhino inference: " + JSON.stringify(msg.data.inference));
        // Pause Rhino processing until the push-to-talk button is pressed again
        rhinoWorker.postMessage({command: "pause"})
        break;
      default:
        break;
    }
  };

  // Start up the web voice processor. It will request microphone permission
  // It downsamples the audio to voice recognition standard format (16-bit 16kHz linear PCM, single-channel)
  // The incoming microphone audio frames will then be forwarded to the Rhino Worker
  // n.b. This promise will reject if the user refuses permission! Make sure you handle that possibility.
  const webVp = await WebVoiceProcessor.init({
    engines: [rhinoWorker],
    start: true,
  });

  // Rhino is push-to-talk. We need to tell it that we
  // are starting a voice interaction:
  function pushToTalk() {
    rhinoWorker.postMessage({command: "resume"})
  }
}
startRhino()
 
...
 
// Finished with Rhino? Release the WebVoiceProcessor and the worker.
if (done) {
  webVp.release()
  rhinoWorker.postMessage({command: "release"})
}
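The example above notes that the WebVoiceProcessor.init promise rejects if the user refuses microphone permission. One way to handle that is to wrap the call so a rejection becomes an ordinary result the UI can act on. The sketch below shows the pattern with stand-in initializers; the name initVoiceProcessor and the stubs are hypothetical, and the real call would be passed in as something like () => WebVoiceProcessor.init({ engines: [rhinoWorker] }).

```javascript
// Runs an async initializer and converts a rejection into a result object,
// so the caller can show a message instead of hitting an unhandled rejection.
async function initVoiceProcessor(initFn) {
  try {
    const processor = await initFn();
    return { ok: true, processor, error: null };
  } catch (e) {
    return { ok: false, processor: null, error: String(e) };
  }
}

// Stand-in initializers (hypothetical stubs, not the real library).
const granted = () => Promise.resolve({ name: "stub-processor" });
const denied = () => Promise.reject(new Error("Permission denied"));

initVoiceProcessor(granted).then((r) => console.log("granted:", r.ok));
initVoiceProcessor(denied).then((r) => console.log("denied:", r.ok, r.error));
```

Returning a result object rather than rethrowing keeps the push-to-talk button logic simple: it can stay disabled whenever ok is false.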

Angular

yarn add @picovoice/rhino-web-angular @picovoice/rhino-web-en-worker

(or)

npm install @picovoice/rhino-web-angular @picovoice/rhino-web-en-worker

async ngOnInit() {
  const rhinoFactoryEn = (await import('@picovoice/rhino-web-en-worker')).RhinoWorkerFactory
  // Initialize Rhino Service
  try {
    await this.rhinoService.init(rhinoFactoryEn, {context: { base64: RHN_CONTEXT_BASE64 }})
    console.log("Rhino is now loaded. Press the Push-to-Talk button to activate.")
  }
  catch (error) {
    console.error(error)
  }
}
 
ngOnDestroy() {
  this.rhinoDetection.unsubscribe()
  this.rhinoService.release()
}
 
public pushToTalk() {
  this.rhinoService.pushToTalk();
}

React

yarn add @picovoice/rhino-web-react @picovoice/rhino-web-en-worker

(or)

npm install @picovoice/rhino-web-react @picovoice/rhino-web-en-worker

import React, { useState } from 'react';
import { RhinoWorkerFactory } from '@picovoice/rhino-web-en-worker';
import { useRhino } from '@picovoice/rhino-web-react';
 
const RHINO_CONTEXT_BASE64 = /* Base64 representation of an English language .rhn file, omitted for brevity */
 
function VoiceWidget(props) {
  const [latestInference, setLatestInference] = useState(null)
 
  const inferenceEventHandler = (rhinoInference) => {
    console.log(`Rhino inferred: ${JSON.stringify(rhinoInference)}`);
    setLatestInference(rhinoInference)
  };
 
  const {
    isLoaded,
    isListening,
    isError,
    isTalking,
    errorMessage,
    start,
    resume,
    pause,
    pushToTalk,
  } = useRhino(
    // Pass in the factory to build Rhino workers. This needs to match the context language below
    RhinoWorkerFactory,
    // Initialize Rhino (in a paused state).
    // Immediately start processing microphone audio,
    // Although Rhino itself will not start listening until the Push to Talk button is pressed.
    {
      context: { base64: RHINO_CONTEXT_BASE64 },
      start: true,
    },
    inferenceEventHandler
  );
 
  return (
    <div className="voice-widget">
      <button onClick={() => pushToTalk()} disabled={isTalking || isError || !isLoaded}>
        Push to Talk
      </button>
      <p>{JSON.stringify(latestInference)}</p>
    </div>
  );
}

Vue

yarn add @picovoice/rhino-web-vue @picovoice/rhino-web-en-worker

(or)

npm install @picovoice/rhino-web-vue @picovoice/rhino-web-en-worker

<template>
  <div class="voice-widget">
    <Rhino
      ref="rhino"
      v-bind:rhinoFactoryArgs="{
        context: {
          base64: '...', <!-- Base64 representation of a trained Rhino context; i.e. a `.rhn` file, omitted for brevity -->
        },
      }"
      v-bind:rhinoFactory="factory"
      v-on:rhn-error="rhnErrorFn"
      v-on:rhn-inference="rhnInferenceFn"
      v-on:rhn-init="rhnInitFn"
      v-on:rhn-ready="rhnReadyFn"
    />
  </div>
</template>
 
<script>
import Rhino from "@picovoice/rhino-web-vue";
import { RhinoWorkerFactory as RhinoWorkerFactoryEn } from "@picovoice/rhino-web-en-worker";
 
export default {
  name: "VoiceWidget",
  components: {
    Rhino,
  },
  data: function () {
    return {
      inference: null,
      isError: false,
      isLoaded: false,
      isListening: false,
      isTalking: false,
      factory: RhinoWorkerFactoryEn,
    };
  },
  methods: {
    pushToTalk: function () {
      if (this.$refs.rhino.pushToTalk()) {
        this.isTalking = true;
      }
    },
    rhnInitFn: function () {
      this.isError = false;
    },
    rhnReadyFn: function () {
      this.isLoaded = true;
      this.isListening = true;
    },
    rhnInferenceFn: function (inference) {
      this.inference = inference;
      console.log("Rhino inference: " + JSON.stringify(inference))
      this.isTalking = false;
    },
    rhnErrorFn: function (error) {
      this.isError = true;
      this.errorMessage = error.toString();
    },
  },
};
</script>

NodeJS

Install the NodeJS SDK:

yarn add @picovoice/rhino-node

Create instances of the Rhino class by specifying the path to the context file:

const Rhino = require("@picovoice/rhino-node");

let handle = new Rhino("/path/to/context/file.rhn");

When instantiated, handle can process audio via its .process method:

let getNextAudioFrame = function() {
    ...
};

let isFinalized = false;
while (!isFinalized) {
  isFinalized = handle.process(getNextAudioFrame());
  if (isFinalized) {
    let inference = handle.getInference();
    // Insert inference event callback
  }
}
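When the audio is already in memory (for example, decoded from a file) rather than arriving frame by frame, the loop above amounts to slicing the buffer into fixed-size frames. The sketch below illustrates that with a stand-in engine. The names processBuffer and fakeEngine are hypothetical, and frameLength is passed in as a plain number here; consult the SDK for how the real frame length is obtained.

```javascript
// Feeds a PCM buffer to an engine one frame at a time.
// `engine` is any object with process(frame) -> boolean and getInference().
// Trailing samples that do not fill a whole frame are dropped in this sketch.
function processBuffer(engine, pcm, frameLength) {
  for (let start = 0; start + frameLength <= pcm.length; start += frameLength) {
    const frame = pcm.subarray(start, start + frameLength);
    if (engine.process(frame)) {
      return engine.getInference();
    }
  }
  return null; // not finalized yet; more audio is needed
}

// Stand-in engine (hypothetical) that finalizes after three frames.
const fakeEngine = {
  framesSeen: 0,
  process(frame) {
    this.framesSeen += 1;
    return this.framesSeen === 3;
  },
  getInference() {
    return { isUnderstood: false };
  },
};

const result = processBuffer(fakeEngine, new Int16Array(512 * 4 + 100), 512);
console.log(fakeEngine.framesSeen, result);
```

Stopping at the first finalized frame mirrors the while loop above: once an inference is available, the remaining audio belongs to the next interaction.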

When done, be sure to release the resources acquired by Rhino using release():

handle.release();

Rust

First you will need Rust and Cargo installed on your system.

To add the Rhino library to your app, add pv_rhino to your app's Cargo.toml manifest:

[dependencies]
pv_rhino = "*"

To create an instance of the engine, first create a RhinoBuilder instance with the configuration parameters for the Speech-to-Intent engine, then call .init():

use rhino::{Rhino, RhinoBuilder};

let rhino: Rhino = RhinoBuilder::new("/path/to/context/file.rhn").init().expect("Unable to create Rhino");

To feed audio into Rhino, use the process function in your capture loop:

fn next_audio_frame() -> Vec<i16> {
    // get audio frame
}

loop {
    if let Ok(is_finalized) = rhino.process(&next_audio_frame()) {
        if is_finalized {
            if let Ok(inference) = rhino.get_inference() {
                if inference.is_understood {
                    let intent = inference.intent.unwrap();
                    let slots = inference.slots;
                    // add code to take action based on inferred intent and slot values
                } else {
                    // add code to handle unsupported commands
                }
            }
        }
    }
}

C

Rhino is implemented in ANSI C and therefore can be directly linked to C applications. The pv_rhino.h header file contains relevant information. An instance of the Rhino object can be constructed as follows:

const char *model_path = ... // Available at lib/common/rhino_params.pv
const char *context_path = ... // absolute path to context file for the domain of interest
const float sensitivity = 0.5f;

pv_rhino_t *handle = NULL;
const pv_status_t status = pv_rhino_init(model_path, context_path, sensitivity, &handle);
if (status != PV_STATUS_SUCCESS) {
    // add error handling code
}

Now the handle can be used to infer intent from an incoming audio stream. Rhino accepts single channel, 16-bit PCM audio. The sample rate can be retrieved using pv_sample_rate(). Finally, Rhino accepts input audio in consecutive chunks (frames); the length of each frame can be retrieved using pv_rhino_frame_length().

extern const int16_t *get_next_audio_frame(void);

while (true) {
    const int16_t *pcm = get_next_audio_frame();

    bool is_finalized = false;
    pv_status_t status = pv_rhino_process(handle, pcm, &is_finalized);
    if (status != PV_STATUS_SUCCESS) {
        // add error handling code
    }

    if (is_finalized) {
        bool is_understood = false;
        status = pv_rhino_is_understood(handle, &is_understood);
        if (status != PV_STATUS_SUCCESS) {
            // add error handling code
        }

        if (is_understood) {
            const char *intent = NULL;
            int32_t num_slots = 0;
            const char **slots = NULL;
            const char **values = NULL;
            status = pv_rhino_get_intent(handle, &intent, &num_slots, &slots, &values);
            if (status != PV_STATUS_SUCCESS) {
                // add error handling code
            }

            // add code to take action based on inferred intent and slot values

            pv_rhino_free_slots_and_values(handle, slots, values);
        } else {
            // add code to handle unsupported commands
        }

        pv_rhino_reset(handle);
    }
}

When done, remember to release the resources acquired.

pv_rhino_delete(handle);

Releases

v1.6.0 December 2nd, 2020

  • Added support for React Native.
  • Added support for Java.
  • Added support for .NET.
  • Added support for NodeJS.

v1.5.0 June 4th, 2020

  • Accuracy improvements.

v1.4.0 April 13th, 2020

  • Accuracy improvements.
  • Built-in slots.

v1.3.0 February 13th, 2020

  • Accuracy improvements.
  • Runtime optimizations.
  • Added support for Raspberry Pi 4
  • Added support for JavaScript.
  • Added support for iOS.
  • Updated documentation.

v1.2.0 April 26, 2019

  • Accuracy improvements.
  • Runtime optimizations.

v1.1.0 December 23rd, 2018

  • Accuracy improvements.
  • Open-sourced Raspberry Pi build.

v1.0.0 November 2nd, 2018

  • Initial Release

FAQ

You can find the FAQ here.

Comments
  • output confidence for inference

    Hi,

    I am developing multilanguage intent detection using Rhino. I provide the same audio sequence to Rhino models of different languages. Unfortunately, some phrases produce double detections across languages. For instance, German "Wasser auf" ("Water on" in English) is sometimes detected by the English Rhino model as "Water off" (opposite meaning) together with the correct detection by the German model. Playing with the Sensitivity parameter hasn't resolved the problem. Is it possible to compare the detections? Do you have something like a "detection probability" value that estimates the quality of a detection? I program in C. Can such a value be accessed via the rhino pointer?

    I will appreciate very much any help that you can provide.

    Maybe you can extend libpv_rhino.so library with the function that can extract such value.

    Thank you!

    Best regards, Alex.

    enhancement 
    opened by shilovav3 11
  • Rhino Issue: Using non-EN models, builtins return "plain text" instead of parsed output

    Hey!

    Expected behaviour

    Using the library with the french model, and using this expression [change, ajuste, modifie, règle] (la luminosité de) la lumière (du, de) $room:room à $pv.Percent:pct, the inference returns the percentage in "plain text":

    Inference(is_understood=True, intent='lum', slots={'room': 'salon', 'pct': 'cinquante pour cent'})
    

    Actual behaviour

    According to the doc (https://picovoice.ai/docs/tips/syntax-cheat-sheet/), I expect the inference to be:

    Inference(is_understood=True, intent='lum', slots={'room': 'salon', 'pct': '50%'})
    

    Steps to reproduce the behaviour

    Create any expression in french, using a pv.Percent slot type.

    I am using the Python client, but the web console behaves in the same way.

    Thanks!

    bug 
    opened by tsileo 10
  • Built-in slot types newly implemented in non-English languages give "textual" outputs (related to the ticket #252)

    Hello,

    I've tested the new implemented feature for non-English language (especially French language) but found that outputted values are provided in text strings and not in numbers strings 😢 contrary to the english implementation. For example: pv.TwoDigitInteger gives the string "vingt cinq" instead of the string "25". This is not convenient and complicates the application implementation. Could you please replace the text with numbers as done with English implementation?

    Thank you 😊

    enhancement 
    opened by safsoun 9
  • Optional slots?

    Is your feature request related to a problem? Please describe. I'm super impressed by your product. Really amazing offering, and great experience.

    My question: is it possible to make slot optional for matching? I.e. allow the expression to be matched even if slot was not matched? My example

    [open, extend, slide out] (the, a) $zone:zone (hydraulic, hydraulics, slide out, slide outs, room, section, slide, slides, cylinder, cylinders, areas, areas) (completely, entirely, fully, to the max, hundred percent, to the limit, as much as possible)

    So I have only one meaningful $zone slot here, but also I have a set of words that I don't care that much for, but they can be in the command. Because i have a bunch of intents and commands reusing those sets, i'd love to turn them into slots for reuse, while basically ignoring them later. The problem is that I'd like the command to work even if they are not matched.,

    Describe the solution you'd like ($zone:zone) type syntax for optional slots

    Describe alternatives you've considered I can use multiple separate commands with or without slots, but there's combinatorial complexity to support them all. I can also keep listing them as is which makes commands hard to parse visually

    enhancement 
    opened by Inviz 8
  • slot names cannot accept numerals

    Hi there, I have been using the Picovoice console for some time now and I really like it. But since two days I get an error when I want to rebuild new context from the console. The context was created entirely in the console (I didn't change the .yaml file and uploaded it to the console). After pressing the microphone icon, the context is saved and rebuilt but then the process stops in the “WA” phase and the message “Error retrieving context” appears.

    I am using Firefox but got the same with Chrome. Also I found that I cannot use a digit anymore to indicate different slot phrases like room1 and room2 in new intents while in already existing intents the digits still work. Could this be related to the error mentioned above?

    Cheers! Jeroen.

    enhancement 
    opened by JPCDekker 7
  • handle non-ASCII chars when returning inference results.

    Hello,

    After testing French language, the detection is working fine but the returned string from slots is "tricky" when the word contains symbols such as 'é', 'è' etc.. For example, for the word "éteindre", the returned slot string is "éteindre" which is not user-friendly! I don't know how do you manage this, I propose to replace any letter of a specific symbol by its basic Latin letter, for example: 'é', 'è' -> e à -> a etc ..

    PS: I tried to put the text "eteindre" instead of "éteindre" in the rhino console but the dictionary rejected the first one :-(

    bug 
    opened by safsoun 6
  • Rhino C code on Rpi unable to parse the wav file

    Hello Alireza, I am experimenting with Rhino and tried it on RPi-3. It takes input from the test_within_context.wav file from audio_samples and returns the detected intent. I recorded few more audios in same format and expected rhino to understand the intents. But it could just give slots, slots_value output for one of the audio files. The rest of three audio wav files are not understood by Rhino despite being recorded in the same environment as the one being understood. What might be the error?

    Comparing it with Snips: Snips is able to understand my voice commands spoken in the same environment. So, it makes me believe that my voice commands and noise should not be an issue. Please guide.

    Thanks!

    opened by AniketJangam 6
  • Rhino Issue: Use of 2 different language SDKs on the same device show up as multiple devices

    I was initially evaluating Rhino on Python before switching to Rust as pvrecorder was causing issues when I created an exe using pyinstaller. The picovoice console currently shows that I've used my access key on 2 devices.

    Expected behaviour

    Picovoice console should have shown that the access key was used on only one device.

    Actual behaviour

    image

    Steps to reproduce the behaviour

    I can't try to reproduce this issue, as I would hit the device limit. But I started off with the Python example and a few days later I used the Rust example.

    bug 
    opened by tsgowtham 5
  • Possible to have a generic slot?

    Hi, I'm very interested in using this package for a Flutter app I'm working on.

    I'm wondering if it is possible to have a slot that works somewhat like a wildcard. For example, a "food" slot is naturally difficult to make as listing all foods in a slot isn't really feasible.

    I tried to use the built-in $pv.Alphabetic slot but that seems to pick up individual letters instead of whole words.

    Some examples to demonstrate exactly what I'd like to do:

    How many calories are in $pv.TwoDigitInteger:quantity servings of $food:food? What pairs well with $food:food_a and $food:food_b?

    opened by AlexHartford 5
  • How to design the model?

    Or more specifically: what trade-offs are there to consider?

    Hi there! As a minimal examples to illustrate my questions:

    1. Are two intents "lightsOn" (expression: "turn lights on") and "lightsOff" (expression: "turn lights off") cheaper in terms of performance than one intent "switchLight" with expression "turn lights $state:state" with slot "state" having the elements "on" and "off"?

    2. How about the equivalent, but less intuitive option of a single intent "switchLight" with expressions "$dummy:on lights on" and "$dummy:off lights of" with the slot "dummy" having just the one element "turn"? This is admittedly a bad example, but I think the general idea to just have an expression put a dummy value into a specifically named slot could come in handy sometimes - unless it's always better to create a separate intent for some reason...

    3. Is it helpful to define sort of sub-slots (e.g. have a slot with all the days and a separate one with just the workdays) and use the more specific one where the other options are not valid? Or just put the general slot and filter the invalid results later, in your application, to avoid cluttering the model?

    4. Do I put everything into a single model or does it make sense to have multiple smaller models and just let Rhino listen for the one that is expected/allowed in the current situation? If neither performance nor colliding expressions are an issue, a single model might be easier to have, but its a bit hard to manage in the console because you cannot re-order elements (at least not as far as I have seen).

    And while I'm here, a question regarding licensing: what do you mean by # Voice Interactions (per month): 1000 on the pricing page? And can I at least switch between devices as I am allowed just one? Or would it even be acceptable to run the software on multiple computers as long as they are all my machines, located in different rooms of my home? (might be easier than having to send the data from all microphones to a single instance)

    opened by xbrtll 5
  • Rhino Documentation Issue

    What is the URL of the doc?

    https://github.com/Picovoice/rhino

    What's the nature of the issue? (e.g. steps do not work, typos/grammar/spelling, etc., out of date)

    The note here about self service "self-service. Developers can train custom models using Picovoice Console " paired with the 30 day expiration shown in the console is unclear to me what is possible in the free model. Is the free model only available to be downloaded in that 30 day window or does it actually expire in use every 30 days and need to be retrained and installed? I don't seem to see any other options to create custom models outside of the console at this time so I am not sure if that is a possibility either.

    opened by SuperJonotron 5