Image Captioning on google cloud platform based on iot

Overview

Design Details

We provide a decentralized solution that is migrated completely on the cloud for the purpose of this problem. Fig 1 diagrammatically displays the architecture that we have used in order to build our application. The model can be broadly divided into four major components namely, non- compute IoT devices, rekognition compute nodes, polly compute nodes and a single DApp node. Devices interact with one another in order to ask for services and provide results over a MQTT publish subscribe model. Each of these components and the communication between them has been described in greater detail below:

Non-Compute IoT Devices: The devices shown on the left-hand side (in a vertical queue) are the non-compute IoT devices. As per our problem statement these devices should have picture capturing and audio playing capabilities. As the name suggests this device does not have compute capabilities and relies on the network of compute nodes in order to get the final audio file for the image it publishes on the network. The non-compute IoT devices have been simulated on the Google Cloud Platform.

Rekognition Compute Nodes: This node has also been simulated using the Google Cloud Platform IoT Core service. The blue nodes that have been placed vertically on the left-hand side represent the rekognition compute nodes in Fig 1. As the name suggests this node has compute capabilities. The non-compute node sends the images to this node and this node is tasked with running the image through the object detection model and returns the labels identified to the caller IoT device. This node receives the image over the MQTT queue and then stores it on the cloud. We have used AWS S3 to meet our storage needs. AWS also offers the Image Rekognition service which has a ML model pre-trained for object detection and this node uses this service on the image stored in the S3 bucket so as to get the labels in the image. The decision of which node serves which request has been made by a three-way handshake between the non-compute node and this node, such that the first node that provides an acknowledgement to the non-compute device is tasked with serving the request. Details of this communication have been presented later in the part where the MQTT communication is discussed.

Polly Compute Nodes: This node has also been simulated using the Google Cloud Platform IoT Core service. As the name suggests this node has compute capabilities. The blue nodes that have been placed horizontally at the bottom represent the polly compute nodes in Fig 1. The non-compute node sends the labels that it has received from the Rekognition Compute node to this node and this node is tasked with running the labels through the text-to-speech-model and an audio file that reads out the labels is sent to the caller IoT device. This node receives the image over the MQTT queue and then stores it on the cloud. We have used AWS S3 to meet our storage needs. AWS also offers the Image Rekognition service which has a ML model pre-trained for object detection and this node uses this service on the image stored in the S3 bucket so as to get the labels in the image. These labels are again returned using the MQTT queue. The decision of which node serves which request has been made by a three-way handshake between the non-compute node and this node, such that the first node that provides an acknowledgement to the non-compute device is tasked with serving the request. Details of this communication have been presented later in the part where the MQTT communication is discussed.

DApp Node: Our architecture currently supports a single DApp Node. The single IoT device present in the rightmost coroner of Fig 1 is the DApp node in the design diagram. This node is used for authentication before giving the device access to the MQTT central queue. For our project, we have used the Ropsten test network where each device has a wallet with address and private keys. The device gets access to the central queue as soon as a payment of 0.02 ether has been made via the “pay()” function in the smart contract pushed to the Ropsten test network. This is reflected in the Metamask wallet and can be seen on Etherscan as well. The user interface of the same can be seen in the figure below.

MQTT Publish/Subscribe: The devices interact between themselves using the MQTT Pub/Sub model. GCP provides a MQTT bridge for registries and we made use of that. Every device in the network (except DApp) is subscribed to the central topic (‘projects/project2-277316/topics/my-topic’ in our case). The compute nodes read messages from this queue and act on them only if it is a request that can be handled by it. The communication from the compute node to the non-compute device is more secure however, since the compute nodes publish the message directly to the device’s config which is private to the device. The direction of communications between devices and nodes has been shown in Fig1 through arrows. As can be seen only the non-compute nodes publish information to the central topic, while the compute nodes only read from this topic. The interaction from the compute node to the non-compute device is done to the device’s config directly. An additional benefit of using a central queue across all services is that this makes the architecture very portable and easy to modify in case a new service has to be added at a later development stage.

Components

Decentralized (P2P) component: In order to ensure that our design has decentralized peers and compute capabilities we have tweaked the way we publish and acknowledge requests from the MQTT queue. Fig2 explains how the P2P status has been ensured.

Detailed Implementation P2P architecture diagram

IoT component: As explained above, all the devices and nodes have been simulated as IoT devices on GCP, hence covering this requirement.

Machine Learning component: Both Rekognition and Polly use ML models in order to detect objects and convert text to speech, hence covering this requirement

Multiple Clouds: While all our devices and nodes are in GCP, the ML support for the architecture has been taken from AWS Polly and Rekognition services, hence covering this breadth

DApp component: The DApp server which is a part of the IoT core in GCP, utilizes the Web3 python library to interact with the smart contract pushed on the Ropsten Network with the help of its address and ABI.

How to run

1. Google Cloud Setup

  1. In the Cloud Console, on the project selector page, select or create a Cloud project.
  2. Make sure that billing is enabled for your Google Cloud project.
  3. Enable the Cloud IoT Core and Cloud Pub/Sub APIs. Enable APIs

2. Setup your local environment

  1. Install and initialize the Cloud SDK
  2. Clone this repository.
  3. Change current directory to this repo folder
  4. Download Google's CA root certificate into the same directory. You can optionally set the location of the certificate with the --ca_certs flag.

3. Create a new Registry

  1. Go to Google Cloud Console. If a project is not already created, create a new project.

  2. Go to the Google Core IoT module in your project.

  3. Click Create registry.

  • Name: my-registry
  • Region: us-central1
  1. In the Default telemetry topic dropdown list, select Create a topic.
  • In the Create a topic dialog, enter my-device-events in the Name field.
  • Click Create in the Create a topic dialog
  • The Device state topic and Certificate value fields are optional, so leave them blank.
  1. Click Create on the Cloud IoT Core page.

4. Create a device private key pair

  1. On your local machine, open up a terminal window and run the following command :
openssl req -x509 -newkey rsa:2048 -keyout rsa_private.pem -nodes \
    -out rsa_cert.pem -subj "/CN=unused"
  1. This will generate two files "rsa_private.pem" and "rsa_cert.pem"

5. Create a new device

  1. On the Registries page, select my-registry.

  2. Select the Devices tab and click Create a device.

  3. Enter my-device for the Device ID.

  4. Select Allow for Device communication.

  5. Add the public key information to the Authentication fields.

  • Copy the contents of rsa_cert.pem from the step above to the clipboard. Make sure to include the lines that say -----BEGIN CERTIFICATE----- and -----END CERTIFICATE-----.
  • Select RS256_X509 for the Public key format.
  • Paste the public key in the Public key value box.
  • Click Add to associate the RS256_X509 key with the device.
  1. The Device metadata field is optional; leave it blank.

  2. Click Create.

6. Run the following command to create subscription

gcloud pubsub subscriptions create \
    projects/PROJECT_ID/subscriptions/my-subscription \
    --topic=projects/PROJECT_ID/topics/my-device-events

7. Create an IAM User.

  1. Use Cloud Console to create a service account:
  2. Click Select, then select a project to use for the service account.
  3. Click Create service account.
  • Name the account e2e-example and click Create.
  • Select the role Project > Editor and click Continue.
  • Click Create key.
  • Under Key type, select JSON and click Create.
  • Save this key to the same directory as the example Python files, and rename it service_account.json.

8. Start the dapp server using the below command. Double check the parameters

python3 dappserver.py
    `--project_id=<id of the google cloud project you made above>`
    `--registry_id=<registry id>`
    `--device_id=<The dapp server is also running on an iot device, so create a device on GCP and mention the  device id here>`
    `--private_key_file=<path to the private key for the iot device mentioned above>`
    `--algorithm=RS256`
    `--ca_cert=<path to the ca_certificate>`
    `--service_account_json=<path to the json file for the IAM User>`

9. Start the rekognition nodes using the following command. Multiple nodes can be added to the network, just ensure that each node is linked to a different device id and a different pubsub subscription. Make sure that AWS credentials have been configured. Add aws_access_key_id and aws_secret_access_key into aws/credentials file.

python reknode.py
    `--project_id=<id of the google cloud project you made above>`
    `--registry_id=<registry id>`
    `--device_id=<The rekognition server is also running on an iot device, so create a device on GCP and mention the device id here>`
    `--private_key_file=<path to the private key for the iot device mentioned above>`
    `--algorithm=RS256` 
    `--ca_cert=<path to the ca_certificate>`
    `--pubsub_subscription=<id of the subscription that was created for this device>`
    `--service_account_json=<path to the json file for the IAM User>`

10. Start the polly nodes using the following command. Multiple nodes can be added to the network, just ensure that each node is linked to a different device id and a different pubsub subscription. Add the AWS credentials in the code to be able to use polly.

python pollyNode.py
    `--project_id=<id of the google cloud project you made above>`
    `--registry_id=<registry id>`
    `--device_id=<The rekognition server is also running on an iot device, so create a device on GCP and mention the device id here>`
    `--private_key_file=<path to the private key for the iot device mentioned above>`
    `--algorithm=RS256`
    `--ca_cert=<path to the ca_certificate>`
    `--pubsub_subscription=<id of the subscription that was created for this device>`
    `--service_account_json=<path to the json file for the IAM User>`

11. Use the below code to send some example images via MQTT to the server. You should see the output objects detected printed out on the console. Multiple nodes can be added to the network, just ensure that each node is linked to a different device id and a different pubsub subscription

python iotdevice.py
    `--project_id=<id of the google cloud project you made above>`
    `--registry_id=<registry id>`
    `--device_id=<The rekognition server is also running on an iot device, so create a device on GCP and mention  the device id here>`
    `--private_key_file=<path to the private key for the iot device mentioned above>`
    `--algorithm=RS256`
    `--ca_cert=<path to the ca_certificate>`
    `--pubsub_subscription=<id of the subscription that was created for this device>`
    `--images_path=<path to the folder where the images are stored>`
    `--dapp_key=<the metamask key that is required for etherium payment>`
    `--dapp_addr=<the metamask address that is required for etherium payment>`
    `--dapp_id=<id of the dapp core machine that was started in step 8>`
    `--service_account_json=<path to the json file for the IAM User>`
Owner
Shweta_kumawat
AI software @ Computer programmer, Deep Learning, Computer Vision Researcher and Developer. Implement AI products and machine learning solutions
Shweta_kumawat
Multi-Output Gaussian Process Toolkit

Multi-Output Gaussian Process Toolkit Paper - API Documentation - Tutorials & Examples The Multi-Output Gaussian Process Toolkit is a Python toolkit f

GAMES 113 Nov 25, 2022
Freecodecamp Scientific Computing with Python Certification; Solution for Challenge 2: Time Calculator

Assignment Write a function named add_time that takes in two required parameters and one optional parameter: a start time in the 12-hour clock format

Hellen Namulinda 0 Feb 26, 2022
Politecnico of Turin Thesis: "Implementation and Evaluation of an Educational Chatbot based on NLP Techniques"

THESIS_CAIRONE_FIORENTINO Politecnico of Turin Thesis: "Implementation and Evaluation of an Educational Chatbot based on NLP Techniques" GENERATE TOKE

cairone_fiorentino97 1 Dec 10, 2021
Gauge equivariant mesh cnn

Geometric Mesh CNN The code in this repository is an implementation of the Gauge Equivariant Mesh CNN introduced in the paper Gauge Equivariant Mesh C

50 Dec 18, 2022
A Python Package for Portfolio Optimization using the Critical Line Algorithm

PyCLA A Python Package for Portfolio Optimization using the Critical Line Algorithm Getting started To use PyCLA, clone the repo and install the requi

19 Oct 11, 2022
code for "Feature Importance-aware Transferable Adversarial Attacks"

Feature Importance-aware Attack(FIA) This repository contains the code for the paper: Feature Importance-aware Transferable Adversarial Attacks (ICCV

Hengchang Guo 44 Nov 24, 2022
An implementation of the paper "A Neural Algorithm of Artistic Style"

A Neural Algorithm of Artistic Style implementation - Neural Style Transfer This is an implementation of the research paper "A Neural Algorithm of Art

Srijarko Roy 27 Sep 20, 2022
Unsupervised phone and word segmentation using dynamic programming on self-supervised VQ features.

Unsupervised Phone and Word Segmentation using Vector-Quantized Neural Networks Overview Unsupervised phone and word segmentation on speech data is pe

Herman Kamper 13 Dec 11, 2022
PyTorch implementation of Asymmetric Siamese (https://arxiv.org/abs/2204.00613)

Asym-Siam: On the Importance of Asymmetry for Siamese Representation Learning This is a PyTorch implementation of the Asym-Siam paper, CVPR 2022: @inp

Meta Research 89 Dec 18, 2022
Code to reproduce the experiments from our NeurIPS 2021 paper " The Limitations of Large Width in Neural Networks: A Deep Gaussian Process Perspective"

Code To run: python runner.py new --save SAVE_NAME --data PATH_TO_DATA_DIR --dataset DATASET --model model_name [options] --n 1000 - train - t

Geoff Pleiss 5 Dec 12, 2022
Code in conjunction with the publication 'Contrastive Representation Learning for Hand Shape Estimation'

HanCo Dataset & Contrastive Representation Learning for Hand Shape Estimation Code in conjunction with the publication: Contrastive Representation Lea

Computer Vision Group, Albert-Ludwigs-Universität Freiburg 38 Dec 13, 2022
source code of Adversarial Feedback Loop Paper

Adversarial Feedback Loop [ArXiv] [project page] Official repository of Adversarial Feedback Loop paper Firas Shama, Roey Mechrez, Alon Shoshan, Lihi

17 Jul 20, 2022
Face Detection & Age Gender & Expression & Recognition

Face Detection & Age Gender & Expression & Recognition

Sajjad Ayobi 188 Dec 28, 2022
The official repository for "Score Transformer: Generating Musical Scores from Note-level Representation" (MMAsia '21)

Score Transformer This is the official repository for "Score Transformer": Score Transformer: Generating Musical Scores from Note-level Representation

22 Dec 22, 2022
Measures input lag without dedicated hardware, performing motion detection on recorded or live video

What is InputLagTimer? This tool can measure input lag by analyzing a video where both the game controller and the game screen can be seen on a webcam

Bruno Gonzalez 4 Aug 18, 2022
PyTorch Implementation for AAAI'21 "Do Response Selection Models Really Know What's Next? Utterance Manipulation Strategies for Multi-turn Response Selection"

UMS for Multi-turn Response Selection Implements the model described in the following paper Do Response Selection Models Really Know What's Next? Utte

Taesun Whang 47 Nov 22, 2022
Code for the paper "Ordered Neurons: Integrating Tree Structures into Recurrent Neural Networks"

ON-LSTM This repository contains the code used for word-level language model and unsupervised parsing experiments in Ordered Neurons: Integrating Tree

Yikang Shen 572 Nov 21, 2022
Multi-Scale Aligned Distillation for Low-Resolution Detection (CVPR2021)

MSAD Multi-Scale Aligned Distillation for Low-Resolution Detection Lu Qi*, Jason Kuen*, Jiuxiang Gu, Zhe Lin, Yi Wang, Yukang Chen, Yanwei Li, Jiaya J

DV Lab 115 Dec 23, 2022
Python 3 module to print out long strings of text with intervals of time inbetween

Python-Fastprint Python 3 module to print out long strings of text with intervals of time inbetween Install: pip install fastprint Sync Usage: from fa

Kainoa Kanter 2 Jun 27, 2022
Lightweight, Portable, Flexible Distributed/Mobile Deep Learning with Dynamic, Mutation-aware Dataflow Dep Scheduler; for Python, R, Julia, Scala, Go, Javascript and more

Apache MXNet (incubating) for Deep Learning Apache MXNet is a deep learning framework designed for both efficiency and flexibility. It allows you to m

The Apache Software Foundation 20.2k Jan 08, 2023