Galois is an auto code completer for code editors (or any text editor) based on OpenAI GPT-2.

Overview

Galois Autocompleter

Galois Autocompleter

Version Twitter: iedmrc

An autocompleter for code editors based on OpenAI GPT-2.

🏠 Homepage

Galois is an auto code completer for code editors (or any text editor) based on OpenAI GPT-2. It is trained (finetuned) on a curated list of approximately 45K Python (~470MB) files gathered from the Github. Currently, it just works properly on Python but not bad at other languages (thanks to GPT-2's power).

This repository now contains the very first release of the Galois Project. With this project, I aim to create a Deep Learning Based Autocompleter such that anyone can run it on their own computer easily. Thus, coding will be more easier and fun!

Galois demo GIF

Installation

With Docker

Either clone the repository and build the image from docker file or directly run the following command:

docker run --rm -dit -p 3030:3030 iedmrc/galois-autocompleter:latest-gpu

P.S: CPU image is not available on the Docker Hub at the moment so if you want to run it on CPU rather than GPU, clone the repository and build the image as follows:

docker build --build-arg TENSORFLOW_VERSION=1.14.0-py3 -t iedmrc/galois-autocompleter:latest .

Without Docker

Clone the repository:

git clone https://github.com/iedmrc/galois-autocompleter

Download the latest model from releases and uncompress it into the directory:

curl -SL https://github.com/iedmrc/galois-autocompleter/releases/latest/download/model.tar.xz | tar -xJC ./galois-autocompleter

Install dependencies:

pip3 install -r requirements.txt

P.S.: Be sure that you have tensorflow version >= 1.13

Run the autocompleter:

python3 main.py

Usage

Currently, there are no extensions for code editors. You can use it through HTTP. When you run the main.py, it will serve an HTTP (flask) server. Then you can easily make a POST request to the http://localhost:3030/ with the some JSON body like the following:

{text: "your python code goes here"}

An example curl command:

curl -X POST \
  http://localhost:3030/autocomplete \
  -H 'Content-Type: application/json' \
  -d '{"text":"import os\nimport sys\n# Count lines of codes in the given directory, separated by file extension.\ndef main(directory):\n  line_count = {}\n  for filename in os.listdir(directory):\n    _, ext = os.path.splitext(filename)\n    if ext not"}'

Check out the gist here for a docker-compose file.

Finetuning The Model

Even you can finetune (re-train over) the model with/for your code files. Just follow the Max Woolf's gpt-2-simple or Neil Shepperd's gpt-2 repositories with 345M version. But don't forget to replace checkpoint (model) with the one in this repository.

You can train it on the Google Colaboratory for free. But if you need a production-grade (i.e. more accurate) one then you may need to train it for more longer time. In my case, it took ~48 hours on a P100 GPU.

Planned Works

  • Train the model to predict in most common programming languages.
  • Create extensions for most common code editors to use galois as an autocompleter.
  • Create a new, more lightweight but powerful model such that anyone can run it in their computer easily.

Contribution

Contributions are welcome. Feel free to create an issue or a pull request.

Author

👤 Ibrahim Ethem DEMIRCI

Twitter: @iedmrc | Github: @iedmrc | Patreon: @iedmrc

Ibrahim's open-source projects are supported by his Patreon. If you found this project helpful, any monetary contributions to the Patreon are appreciated and will be put to good creative use.

License

It is licensed under MIT License as found in the LICENSE file.

Disclaimer

This repo has no affiliation or relationship with OpenAI.

You might also like...
Text editor on python tkinter to convert english text to other languages with the help of ployglot.
Text editor on python tkinter to convert english text to other languages with the help of ployglot.

Transliterator Text Editor This is a simple transliteration program which is used to convert english word to phonetically matching word in another lan

This repository serves as a place to document a toy attempt on how to create a generative text model in Catalan, based on GPT-2

GPT-2 Catalan playground and scripts to train a GPT-2 model either from scrath or from another pretrained model.

An easy to use, user-friendly and efficient code for extracting OpenAI CLIP (Global/Grid) features from image and text respectively.

Extracting OpenAI CLIP (Global/Grid) Features from Image and Text This repo aims at providing an easy to use and efficient code for extracting image &

Shirt Bot is a discord bot which uses GPT-3 to generate text
Shirt Bot is a discord bot which uses GPT-3 to generate text

SHIRT BOT · Shirt Bot is a discord bot which uses GPT-3 to generate text. Made by Cyclcrclicly#3420 (474183744685604865) on Discord. Support Server EX

Neural text generators like the GPT models promise a general-purpose means of manipulating texts.

Boolean Prompting for Neural Text Generators Neural text generators like the GPT models promise a general-purpose means of manipulating texts. These m

A simple command line tool for text to image generation, using OpenAI's CLIP and a BigGAN
A simple command line tool for text to image generation, using OpenAI's CLIP and a BigGAN

artificial intelligence cosmic love and attention fire in the sky a pyramid made of ice a lonely house in the woods marriage in the mountains lantern

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch
Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch

Implementation / replication of DALL-E, OpenAI's Text to Image Transformer, in Pytorch

OpenAI CLIP text encoders for multiple languages!
OpenAI CLIP text encoders for multiple languages!

Multilingual-CLIP OpenAI CLIP text encoders for any language Colab Notebook · Pre-trained Models · Report Bug Overview OpenAI recently released the pa

An implementation of model parallel GPT-3-like models on GPUs, based on the DeepSpeed library. Designed to be able to train models in the hundreds of billions of parameters or larger.

GPT-NeoX An implementation of model parallel GPT-3-like models on GPUs, based on the DeepSpeed library. Designed to be able to train models in the hun

Comments
  • train code complete from zero

    train code complete from zero

    Hi, I try to train code complete from zero, below is two settings, and the performance is bad: 1. train from utf-8 encode, the default BPE encoding, 1 batch_size, 3 GPU card, 500K iteration 2. train from ascii encode, 1 batch_size, 3 GPU card, 500K iteration and method 2 is just map ascii from 1 to N, the size is much less, and the performance of the two settings ard bad. do you train by finetune the release model of gpt2? can you share your training settings?

    opened by yuandaxing 2
  • About training ..

    About training ..

    When you specify the training directory to the model, did you extract all .py files and delete the other types or the model will parse all the projects directories that you cloned from Github one by one ?

    opened by dimwael 1
  • Text size limit for the Galois Autocompleter API

    Text size limit for the Galois Autocompleter API

    Hi!

    While testing Galois, I discovered something that seems to be a limit at the size of the text that you can send to the Autocompleter API without getting a 500 error. I could not find the exactly number, but around 2180 characters of code it starts crashing.

    He are the error logs I got:

    Traceback (most recent call last): File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1356, in _do_call return fn(*args) File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1341, in _run_fn options, feed_dict, fetch_list, target_list, run_metadata) File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun run_metadata) tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[0,0] = 1024 is not in [0, 1024) [[{{node sample_sequence/while/model/GatherV2_1}}]]

    bug 
    opened by GabrielTamujo 1
  • Finetuning the Galois' model

    Finetuning the Galois' model

    Hi, @iedmrc I'm finetuning the Galois' model with the gpt-2-simple command aiming at featuring it with our team programming standards. (Well, actually, we hope so!) I'm running the finetune with "steps=-1" (what's, endless run). I'd like to hear from you when should I stop the process. This is the last 4 lines of the current history of the process:

    [310 | 23899.61] loss=0.09 avg=0.36 [320 | 24645.14] loss=0.06 avg=0.35 [330 | 25398.12] loss=0.09 avg=0.34 [340 | 26155.55] loss=0.05 avg=0.33

    Best regard!

    question 
    opened by DenisAraujo68 1
Releases(v0.1.0)
Owner
Galois Autocompleter
Auto code completer for code editors (or any text editor) based on OpenAI GPT-2.
Galois Autocompleter
Speech Recognition for Uyghur using Speech transformer

Speech Recognition for Uyghur using Speech transformer Training: this model using CTC loss and Cross Entropy loss for training. Download pretrained mo

Uyghur 11 Nov 17, 2022
Sapiens is a human antibody language model based on BERT.

Sapiens: Human antibody language model ____ _ / ___| __ _ _ __ (_) ___ _ __ ___ \___ \ / _` | '_ \| |/ _ \ '

Merck Sharp & Dohme Corp. a subsidiary of Merck & Co., Inc. 13 Nov 20, 2022
Integrating the Best of TF into PyTorch, for Machine Learning, Natural Language Processing, and Text Generation. This is part of the CASL project: http://casl-project.ai/

Texar-PyTorch is a toolkit aiming to support a broad set of machine learning, especially natural language processing and text generation tasks. Texar

ASYML 726 Dec 30, 2022
The official repository of the ISBI 2022 KNIGHT Challenge

KNIGHT The official repository holding the data for the ISBI 2022 KNIGHT Challenge About The KNIGHT Challenge asks teams to develop models to classify

Nicholas Heller 4 Jan 22, 2022
An open-source NLP library: fast text cleaning and preprocessing.

An open-source NLP library: fast text cleaning and preprocessing

Iaroslav 21 Mar 18, 2022
:P Some basic stuff I'm gonna use for my upcoming Agile Software Development and Devops

reverse-image-search-py bash script.sh img_name.jpg Requirements pip install requests pip install pyshorteners Dry run [ Sudhanva M 3 Dec 18, 2021

Code associated with the "Data Augmentation using Pre-trained Transformer Models" paper

Data Augmentation using Pre-trained Transformer Models Code associated with the Data Augmentation using Pre-trained Transformer Models paper Code cont

44 Dec 31, 2022
Input english text, then translate it between languages n times using the Deep Translator Python Library.

mass-translator About Input english text, then translate it between languages n times using the Deep Translator Python Library. How to Use Install dep

2 Mar 04, 2022
EdiTTS: Score-based Editing for Controllable Text-to-Speech

Official implementation of EdiTTS: Score-based Editing for Controllable Text-to-Speech

Neosapience 99 Jan 02, 2023
Collection of useful (to me) python scripts for interacting with napari

Napari scripts A collection of napari related tools in various state of disrepair/functionality. Browse_LIF_widget.py This module can be imported, for

5 Aug 15, 2022
In this project, we aim to achieve the task of predicting emojis from tweets. We aim to investigate the relationship between words and emojis.

Making Emojis More Predictable by Karan Abrol, Karanjot Singh and Pritish Wadhwa, Natural Language Processing (CSE546) under the guidance of Dr. Shad

Karanjot Singh 2 Jan 17, 2022
Jarvis is a simple Chatbot with a GUI capable of chatting and retrieving information and daily news from the internet for it's user.

J.A.R.V.I.S Kindly consider starring this repository if you like the program :-) What/Who is J.A.R.V.I.S? J.A.R.V.I.S is an chatbot written that is bu

Epicalable 50 Dec 31, 2022
Blue Brain text mining toolbox for semantic search and structured information extraction

Blue Brain Search Source Code DOI Data & Models DOI Documentation Latest Release Python Versions License Build Status Static Typing Code Style Securit

The Blue Brain Project 29 Dec 01, 2022
Deal or No Deal? End-to-End Learning for Negotiation Dialogues

Introduction This is a PyTorch implementation of the following research papers: (1) Hierarchical Text Generation and Planning for Strategic Dialogue (

Facebook Research 1.4k Dec 29, 2022
English loanwords in the world's languages

Wiktionary as CLDF Content cldf1 and cldf2 contain cldf-conform data sets with a total of 2 377 756 entries about the vocabulary of all 1403 languages

Viktor Martinović 3 Jan 14, 2022
This project aims to conduct a text information retrieval and text mining on medical research publication regarding Covid19 - treatments and vaccinations.

Project: Text Analysis - This project aims to conduct a text information retrieval and text mining on medical research publication regarding Covid19 -

1 Mar 14, 2022
⛵️The official PyTorch implementation for "BERT-of-Theseus: Compressing BERT by Progressive Module Replacing" (EMNLP 2020).

BERT-of-Theseus Code for paper "BERT-of-Theseus: Compressing BERT by Progressive Module Replacing". BERT-of-Theseus is a new compressed BERT by progre

Kevin Canwen Xu 284 Nov 25, 2022
Code for "Parallel Instance Query Network for Named Entity Recognition", accepted at ACL 2022.

README Code for Two-stage Identifier: "Parallel Instance Query Network for Named Entity Recognition", accepted at ACL 2022. For details of the model a

Yongliang Shen 45 Nov 29, 2022
NeuTex: Neural Texture Mapping for Volumetric Neural Rendering

NeuTex: Neural Texture Mapping for Volumetric Neural Rendering Paper: https://arxiv.org/abs/2103.00762 Running Run on the provided DTU scene cd run ba

Fanbo Xiang 68 Jan 06, 2023
Kestrel Threat Hunting Language

Kestrel Threat Hunting Language What is Kestrel? Why we need it? How to hunt with XDR support? What is the science behind it? You can find all the ans

Open Cybersecurity Alliance 201 Dec 16, 2022