A proof-of-concept jupyter extension which converts english queries into relevant python code

Overview

Text2Code for Jupyter notebook

A proof-of-concept jupyter extension which converts english queries into relevant python code.

Blog post with more details:

Data analysis made easy: Text2Code for Jupyter notebook

Demo Video:

Text2Code for Jupyter notebook

Supported Operating Systems:

  • Ubuntu
  • macOS

Installation

NOTE: We have renamed the plugin from mopp to jupyter-text2code. Uninstall mopp before installing new jupyter-text2code version.

pip uninstall mopp

CPU-only install:

For Mac and other Ubuntu installations not having a nvidia GPU, we need to explicitly set an environment variable at time of install.

export JUPYTER_TEXT2CODE_MODE="cpu"

GPU install dependencies:

sudo apt-get install libopenblas-dev libomp-dev

Installation commands:

git clone https://github.com/deepklarity/jupyter-text2code.git
cd jupyter-text2code
pip install .
jupyter nbextension enable jupyter-text2code/main

Uninstallation:

pip uninstall jupyter-text2code

Usage Instructions:

  • Start Jupyter notebook server by running the following command: jupyter notebook
  • If you don't see Nbextensions tab in Jupyter notebook run the following command:jupyter contrib nbextension install --user
  • You can open the sample notebooks/ctds.ipynb notebook for testing
  • If installation happened successfully, then for the first time, Universal Sentence Encoder model will be downloaded from tensorflow_hub
  • Click on the Terminal Icon which appears on the menu (to activate the extension)
  • Type "help" to see a list of currently supported commands in the repo
  • Watch Demo video for some examples

Docker containers for jupyter-text2code

We have published CPU and GPU images to docker hub with all dependencies pre-installed.

Visit https://hub.docker.com/r/deepklarity/jupyter-text2code/ to download the images and usage instructions.
CPU image size: 1.51 GB
GPU image size: 2.56 GB

Model training:

Generate training data:

From a list of templates present at jupyter_text2code/jupyter_text2code_serverextension/data/ner_templates.csv, generate training data by running the following command:

cd scripts && python generate_training_data.py

This command will generate data for intent matching and NER(Named Entity Recognition).

Create intent index faiss

Use the generated data to create a intent-matcher using faiss.

cd scripts && python create_intent_index.py

Train NER model

cd scripts && python train_spacy_ner.py

Steps to add more intents:

  • Add more templates in ner_templates with a new intent_id
  • Generate training data. Modify generate_training_data.py if different generation techniques are needed or if introducing a new entity.
  • Train intent index
  • Train NER model
  • modify jupyter_text2code/jupyter_text2code_serverextension/__init__.py with new intent's condition and add actual code for the intent
  • Reinstall plugin by running: pip install .

TODO:

  • Publish Docker image
  • Refactor code and make it mode modular, remove duplicate code, etc
  • Add support for Windows
  • Add support for more commands
  • Improve intent detection and NER
  • Explore sentence Paraphrasing to generate higher-quality training data
  • Gather real-world variable names, library names as opposed to randomly generating them
  • Try NER with a transformer-based model
  • With enough data, train a language model to directly do English->code like GPT-3 does, instead of having separate stages in the pipeline
  • Create a survey to collect linguistic data
  • Add Speech2Code support

Authored By:

Owner
DeepKlarity
DeepKlarity
WIP: extracting Geometry utilities from datacube-core

odc.geo This is still work in progress. This repository contains geometry related code extracted from Open Datacube. For details and motivation see OD

Open Data Cube 34 Jan 09, 2023
Global topography (referenced to sea-level) in a 10 arcminute resolution grid

Earth - Topography grid at 10 arc-minute resolution Global 10 arc-minute resolution grids of topography (ETOPO1 ice-surface) referenced to mean sea-le

Fatiando a Terra Datasets 1 Jan 20, 2022
iNaturalist observations along hiking trails

This tool reads the route of a hike and generates a table of iNaturalist observations along the trails. It also shows the observations and the route of the hike on a map. Moreover, it saves waypoints

7 Nov 11, 2022
Logging the position of the car on an sdcard

audi-mmi-3g-gps-logging Logging the position of the car on an sdcard, startup script origin not clear to me, logging setup and time change is what I d

2 May 31, 2022
A set of utility functions for working with GeoJSON annotations in Kaibu

kaibu-utils A set of utility functions for working with Kaibu. Create a new repository Create a new repository and select imjoy-team/imjoy-python-temp

ImJoy Team 0 Dec 12, 2021
glTF to 3d Tiles Converter. Convert glTF model to Glb, b3dm or 3d tiles format.

gltf-to-3d-tiles glTF to 3d Tiles Converter. Convert glTF model to Glb, b3dm or 3d tiles format. Usage λ python main.py --help Usage: main.py [OPTION

58 Dec 27, 2022
3D extension built off of shapely to make working with geospatial/trajectory data easier in python.

PyGeoShape 3D extension to shapely and pyproj to make working with geospatial/trajectory data easier in python. Getting Started Installation pip The e

Marc Brittain 5 Dec 27, 2022
PyTorch implementation of ''Background Activation Suppression for Weakly Supervised Object Localization''.

Background Activation Suppression for Weakly Supervised Object Localization PyTorch implementation of ''Background Activation Suppression for Weakly S

34 Dec 27, 2022
Introduction to Geospatial Analysis in Python

Introduction to Geospatial Analysis in Python This repository is in support of a talk on geospatial data. Data To recreate all of the examples, the da

Dillon Gardner 6 Oct 19, 2022
Example of animated maps in matplotlib + geopandas using entire time series of congressional district maps from UCLA archive. rendered, interactive version below

Example of animated maps in matplotlib + geopandas using entire time series of congressional district maps from UCLA archive. rendered, interactive version below

Apoorva Lal 5 May 18, 2022
Satellite imagery for dummies.

felicette Satellite imagery for dummies. What can you do with this tool? TL;DR: Generate JPEG earth imagery from coordinates/location name with public

Shivashis Padhi 1.8k Jan 03, 2023
A ninja python package that unifies the Google Earth Engine ecosystem.

A Python package that unifies the Google Earth Engine ecosystem. EarthEngine.jl | rgee | rgee+ | eemont GitHub: https://github.com/r-earthengine/ee_ex

47 Dec 27, 2022
Download and process satellite imagery in Python using Sentinel Hub services.

Description The sentinelhub Python package allows users to make OGC (WMS and WCS) web requests to download and process satellite images within your Py

Sentinel Hub 659 Dec 23, 2022
A public data repository for datasets created from TransLink GTFS data.

TransLink Spatial Data What: TransLink is the statutory public transit authority for the Metro Vancouver region. This GitHub repository is a collectio

Henry Tang 3 Jan 14, 2022
Asynchronous Client for the worlds fastest in-memory geo-database Tile38

This is an asynchonous Python client for Tile38 that allows for fast and easy interaction with the worlds fastest in-memory geodatabase Tile38.

Ben 53 Dec 29, 2022
Open Data Cube analyses continental scale Earth Observation data through time

Open Data Cube Core Overview The Open Data Cube Core provides an integrated gridded data analysis environment for decades of analysis ready earth obse

Open Data Cube 410 Dec 13, 2022
Pure Python NetCDF file reader and writer

Pyncf Pure Python NetCDF file reading and writing. Introduction Inspired by the pyshp library, which provides simple pythonic and dependency free data

Karim Bahgat 14 Sep 30, 2022
Python 台灣行政區地圖 (2021)

Python 台灣行政區地圖 (2021) 以 python 讀取政府開放平台的 ShapeFile 地圖資訊。歡迎引用或是協作 另有縣市資訊、村里資訊與各種行政地圖資訊 例如: 直轄市、縣市界線(TWD97經緯度) 鄉鎮市區界線(TWD97經緯度) | 政府資料開放平臺: https://data

WeselyOng 12 Sep 27, 2022
Water Detect Algorithm

WaterDetect Synopsis WaterDetect is an end-to-end algorithm to generate open water cover mask, specially conceived for L2A Sentinel 2 imagery from MAJ

142 Dec 30, 2022
Computer Vision in Python

Mahotas Python Computer Vision Library Mahotas is a library of fast computer vision algorithms (all implemented in C++ for speed) operating over numpy

Luis Pedro Coelho 792 Dec 20, 2022