A proof-of-concept jupyter extension which converts english queries into relevant python code

Overview

Text2Code for Jupyter notebook

A proof-of-concept jupyter extension which converts english queries into relevant python code.

Blog post with more details:

Data analysis made easy: Text2Code for Jupyter notebook

Demo Video:

Text2Code for Jupyter notebook

Supported Operating Systems:

  • Ubuntu
  • macOS

Installation

NOTE: We have renamed the plugin from mopp to jupyter-text2code. Uninstall mopp before installing new jupyter-text2code version.

pip uninstall mopp

CPU-only install:

For Mac and other Ubuntu installations not having a nvidia GPU, we need to explicitly set an environment variable at time of install.

export JUPYTER_TEXT2CODE_MODE="cpu"

GPU install dependencies:

sudo apt-get install libopenblas-dev libomp-dev

Installation commands:

git clone https://github.com/deepklarity/jupyter-text2code.git
cd jupyter-text2code
pip install .
jupyter nbextension enable jupyter-text2code/main

Uninstallation:

pip uninstall jupyter-text2code

Usage Instructions:

  • Start Jupyter notebook server by running the following command: jupyter notebook
  • If you don't see Nbextensions tab in Jupyter notebook run the following command:jupyter contrib nbextension install --user
  • You can open the sample notebooks/ctds.ipynb notebook for testing
  • If installation happened successfully, then for the first time, Universal Sentence Encoder model will be downloaded from tensorflow_hub
  • Click on the Terminal Icon which appears on the menu (to activate the extension)
  • Type "help" to see a list of currently supported commands in the repo
  • Watch Demo video for some examples

Docker containers for jupyter-text2code

We have published CPU and GPU images to docker hub with all dependencies pre-installed.

Visit https://hub.docker.com/r/deepklarity/jupyter-text2code/ to download the images and usage instructions.
CPU image size: 1.51 GB
GPU image size: 2.56 GB

Model training:

Generate training data:

From a list of templates present at jupyter_text2code/jupyter_text2code_serverextension/data/ner_templates.csv, generate training data by running the following command:

cd scripts && python generate_training_data.py

This command will generate data for intent matching and NER(Named Entity Recognition).

Create intent index faiss

Use the generated data to create a intent-matcher using faiss.

cd scripts && python create_intent_index.py

Train NER model

cd scripts && python train_spacy_ner.py

Steps to add more intents:

  • Add more templates in ner_templates with a new intent_id
  • Generate training data. Modify generate_training_data.py if different generation techniques are needed or if introducing a new entity.
  • Train intent index
  • Train NER model
  • modify jupyter_text2code/jupyter_text2code_serverextension/__init__.py with new intent's condition and add actual code for the intent
  • Reinstall plugin by running: pip install .

TODO:

  • Publish Docker image
  • Refactor code and make it mode modular, remove duplicate code, etc
  • Add support for Windows
  • Add support for more commands
  • Improve intent detection and NER
  • Explore sentence Paraphrasing to generate higher-quality training data
  • Gather real-world variable names, library names as opposed to randomly generating them
  • Try NER with a transformer-based model
  • With enough data, train a language model to directly do English->code like GPT-3 does, instead of having separate stages in the pipeline
  • Create a survey to collect linguistic data
  • Add Speech2Code support

Authored By:

Owner
DeepKlarity
DeepKlarity
Pure Python NetCDF file reader and writer

Pyncf Pure Python NetCDF file reading and writing. Introduction Inspired by the pyshp library, which provides simple pythonic and dependency free data

Karim Bahgat 14 Sep 30, 2022
Advanced raster and geometry manipulations

buzzard In a nutshell, the buzzard library provides powerful abstractions to manipulate together images and geometries that come from different kind o

Earthcube Lab 30 Jun 20, 2022
Get-countries-info - A python code that fetches data of any country

Country-info A python code getting countries information including country's map

CODE 2 Feb 21, 2022
A public data repository for datasets created from TransLink GTFS data.

TransLink Spatial Data What: TransLink is the statutory public transit authority for the Metro Vancouver region. This GitHub repository is a collectio

Henry Tang 3 Jan 14, 2022
Processing and interpolating spatial data with a twist of machine learning

Documentation | Documentation (dev version) | Contact | Part of the Fatiando a Terra project About Verde is a Python library for processing spatial da

Fatiando a Terra 468 Dec 20, 2022
Extract GoPro highlights and GPMF data.

Python script that parses the gpmd stream for GOPRO moov track (MP4) and extract the GPS info into a GPX (and kml) file.

Chris Auron 2 May 13, 2022
Geographic add-ons for Django REST Framework. Maintained by the OpenWISP Project.

Geographic add-ons for Django REST Framework. Maintained by the OpenWISP Project.

OpenWISP 982 Jan 06, 2023
A simple reverse geocoder that resolves a location to a country

Reverse Geocoder This repository holds a small web service that performs reverse geocoding to determine whether a user specified location is within th

4 Dec 25, 2021
Stitch image tiles into larger composite TIFs

untiler Utility to take a directory of {z}/{x}/{y}.(jpg|png) tiles, and stitch into a scenetiff (tif w/ exact merc tile bounds). Future versions will

Mapbox 38 Dec 16, 2022
ArcGIS Python Toolbox for WhiteboxTools

WhiteboxTools-ArcGIS ArcGIS Python Toolbox for WhiteboxTools. This repository is related to the ArcGIS Python Toolbox for WhiteboxTools, which is an A

Qiusheng Wu 190 Dec 30, 2022
WebGL2 powered geospatial visualization layers

deck.gl | Website WebGL2-powered, highly performant large-scale data visualization deck.gl is designed to simplify high-performance, WebGL-based visua

Vis.gl 10.5k Jan 08, 2023
Replace MSFS2020's bing map to google map

English verison here 中文 免责声明 本教程提到的方法仅用于研究和学习用途。我不对使用、拓展该教程及方法所造成的任何法律责任和损失负责。 背景 微软模拟飞行2020的地景使用了Bing的卫星地图,然而卫星地图比较老旧,很多地区都是几年前的图设置直接是没有的。这种现象在全球不同地区

hesicong 272 Dec 24, 2022
Interactive Maps with Geopandas

Create Interactive maps 🗺️ with your geodataframe Geopatra extends geopandas for interactive mapping and attempts to wrap the goodness of amazing map

sangarshanan 46 Aug 16, 2022
Wraps GEOS geometry functions in numpy ufuncs.

PyGEOS PyGEOS is a C/Python library with vectorized geometry functions. The geometry operations are done in the open-source geometry library GEOS. PyG

362 Dec 23, 2022
Simple, concise geographical visualization in Python

Geographic visualizations for HoloViews. Build Status Coverage Latest dev release Latest release Docs What is it? GeoViews is a Python library that ma

HoloViz 445 Jan 02, 2023
pure-Python (Numpy optional) 3D coordinate conversions for geospace ecef enu eci

Python 3-D coordinate conversions Pure Python (no prerequistes beyond Python itself) 3-D geographic coordinate conversions and geodesy. API similar to

Geospace code 292 Dec 29, 2022
GetOSM is an OpenStreetMap tile downloader written in Python that is agnostic of GUI frameworks.

GetOSM GetOSM is an OpenStreetMap tile downloader written in Python that is agnostic of GUI frameworks. It is used with tkinter by ProjPicker. Require

Huidae Cho 3 May 20, 2022
Create Siege configuration files from Cloud Optimized GeoTIFF.

cogeo-siege Documentation: Source Code: https://github.com/developmentseed/cogeo-siege Description Create siege configuration files from Cloud Optimiz

Development Seed 3 Dec 01, 2022
Water Detect Algorithm

WaterDetect Synopsis WaterDetect is an end-to-end algorithm to generate open water cover mask, specially conceived for L2A Sentinel 2 imagery from MAJ

142 Dec 30, 2022
Histogram matching plugin for rasterio

rio-hist Histogram matching plugin for rasterio. Provides a CLI and python module for adjusting colors based on histogram matching in a variety of col

Mapbox 75 Sep 23, 2022