GuideDog is an AI/ML-based mobile app designed to assist the lives of the visually impaired, 100% voice-controlled

Last update: Nov 24, 2021

Guidedog

Authors: Kyuhee Jo, Steven Gunarso, Jacky Wang, Raghav Sharma

GuideDog is an AI/ML-based mobile app designed to assist people who are visually impaired. It is 100% voice-controlled; as the name suggests, you can think of it as a "speaking guide dog." It offers three key features, all based on the scene captured by your phone's camera:

Reads text upon command
Describes the scene around you upon command
Warns you if there is an obstacle in front of you

Check out this demo video to learn more about our app!

Android App

UI/UX
- Simple and responsive
- Voice-assistant architecture for the target audience
Libraries / APIs
- Google Cloud Speech-to-Text and Text-to-Speech
- Android SDK, AndroidX
- ML Kit Object Detection and Tracking API
- TensorFlow Lite MobileNet image classification model

Backend

Flask API
- Image Captioning
- Optical Character Recognition
Deployment
- Google App Engine
- Fast central API with separate endpoints
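
As a rough illustration of the backend layout listed above, the sketch below shows one Flask app exposing a separate endpoint for each task. The route names (`/caption`, `/ocr`), the `image` form field, and the helper functions `generate_caption` and `extract_text` are all hypothetical placeholders, not the project's actual API; the real model and OCR calls would replace the stubs.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)


def generate_caption(image_bytes: bytes) -> str:
    """Hypothetical stand-in for the image captioning model."""
    return f"placeholder caption for a {len(image_bytes)}-byte image"


def extract_text(image_bytes: bytes) -> str:
    """Hypothetical stand-in for the OCR step."""
    return f"placeholder text for a {len(image_bytes)}-byte image"


def read_uploaded_image():
    """Return the uploaded image bytes, or None if no 'image' file was sent."""
    upload = request.files.get("image")
    return upload.read() if upload is not None else None


@app.route("/caption", methods=["POST"])
def caption():
    # Assumed endpoint: describe the scene in the uploaded image.
    image_bytes = read_uploaded_image()
    if image_bytes is None:
        return jsonify(error="missing 'image' file field"), 400
    return jsonify(caption=generate_caption(image_bytes))


@app.route("/ocr", methods=["POST"])
def ocr():
    # Assumed endpoint: read the text in the uploaded image.
    image_bytes = read_uploaded_image()
    if image_bytes is None:
        return jsonify(error="missing 'image' file field"), 400
    return jsonify(text=extract_text(image_bytes))
```

Under these assumptions, the Android client would send the captured frame as a multipart upload to whichever endpoint matches the spoken command, and a WSGI server on the deployment platform would serve `app`.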

Image Captioning

We used TensorFlow to build and train an image captioning model on MS-COCO 2014, based on the paper Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. The model uses a standard convolutional network as an encoder to extract features from images (we use Inception V3) and feeds those features into an attention-based decoder that generates sentences. While the paper used an LSTM as the decoder, we use a simpler RNN instead.
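
To make the attention step concrete, here is a minimal sketch of soft attention in its additive form, one common way to implement the mechanism the paper describes. It is not the project's training code: it uses plain Python lists instead of TensorFlow tensors so it runs without dependencies, and the weight names (`w_feat`, `w_hid`, `v`) and the tiny dimensions are illustrative assumptions.

```python
import math


def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def additive_attention(features, hidden, w_feat, w_hid, v):
    """Compute a context vector for one decoding step.

    features: L encoder feature vectors, each of size D (one per image region)
    hidden:   current decoder state, size H
    w_feat:   A x D projection, w_hid: A x H projection, v: scoring vector of size A
    Returns (context, weights), where weights sum to 1 over the L regions.
    """
    projected_hidden = matvec(w_hid, hidden)
    scores = []
    for feature in features:
        projected_feature = matvec(w_feat, feature)
        energy = [math.tanh(f + h)
                  for f, h in zip(projected_feature, projected_hidden)]
        scores.append(sum(vi * ei for vi, ei in zip(v, energy)))
    weights = softmax(scores)
    dim = len(features[0])
    # The context vector is the attention-weighted average of the region features.
    context = [sum(w * feat[d] for w, feat in zip(weights, features))
               for d in range(dim)]
    return context, weights


# Toy example: 3 image regions with 2-dimensional features (illustrative values).
features = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
hidden = [0.5, -0.5]
w_feat = [[0.2, -0.1], [0.4, 0.3]]
w_hid = [[0.1, 0.0], [0.0, 0.1]]
v = [1.0, -1.0]
context, weights = additive_attention(features, hidden, w_feat, w_hid, v)
```

At each decoding step the decoder would combine `context` with the embedding of the previously generated word before updating its state, so the weights show which image regions influenced the next word.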

Get more insights: Devpost

Owner

Kyuhee Jo
