Galois is an auto code completer for code editors (or any text editor) based on OpenAI GPT-2.

Last update: Sep 23, 2022

Overview

Galois Autocompleter

An autocompleter for code editors based on OpenAI GPT-2.

🏠 Homepage

Galois is an auto code completer for code editors (or any text editor) based on OpenAI GPT-2. It is fine-tuned on a curated set of approximately 45K Python files (~470 MB) gathered from GitHub. Currently it works properly only on Python, but it does reasonably well on other languages too (thanks to GPT-2's power).

This repository contains the very first release of the Galois Project. With this project, I aim to create a deep-learning-based autocompleter that anyone can easily run on their own computer, making coding easier and more fun!

Installation

With Docker

Either clone the repository and build the image from the Dockerfile, or directly run the following command:

docker run --rm -dit -p 3030:3030 iedmrc/galois-autocompleter:latest-gpu

P.S.: A CPU image is not available on Docker Hub at the moment, so if you want to run it on CPU rather than GPU, clone the repository and build the image as follows:

docker build --build-arg TENSORFLOW_VERSION=1.14.0-py3 -t iedmrc/galois-autocompleter:latest .

Without Docker

Clone the repository:

git clone https://github.com/iedmrc/galois-autocompleter

Download the latest model from releases and uncompress it into the directory:

curl -SL https://github.com/iedmrc/galois-autocompleter/releases/latest/download/model.tar.xz | tar -xJC ./galois-autocompleter

Install dependencies (run this from inside the galois-autocompleter directory, where requirements.txt lives):

pip3 install -r requirements.txt

P.S.: Make sure you have TensorFlow version >= 1.13 installed.

Run the autocompleter:

python3 main.py

Usage

Currently, there are no extensions for code editors, but you can use Galois over HTTP. Running main.py starts an HTTP (Flask) server. You can then make a POST request to http://localhost:3030/autocomplete with a JSON body like the following:

{"text": "your python code goes here"}

An example curl command:

curl -X POST \
  http://localhost:3030/autocomplete \
  -H 'Content-Type: application/json' \
  -d '{"text":"import os\nimport sys\n# Count lines of codes in the given directory, separated by file extension.\ndef main(directory):\n  line_count = {}\n  for filename in os.listdir(directory):\n    _, ext = os.path.splitext(filename)\n    if ext not"}'
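
The same request can be sketched in Python using only the standard library. The endpoint and the {"text": ...} body come from the curl example above; the function names are hypothetical, and because the response format is not documented here, the sketch simply returns the raw response body as text.

```python
import json
import urllib.request

# Endpoint taken from the curl example; change host/port if you mapped them differently.
GALOIS_URL = "http://localhost:3030/autocomplete"


def build_request(code, url=GALOIS_URL):
    """Build a POST request whose JSON body has the form {"text": <code>}."""
    body = json.dumps({"text": code}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def autocomplete(code, url=GALOIS_URL, timeout=30.0):
    """Send code to a running Galois server and return the raw response body."""
    with urllib.request.urlopen(build_request(code, url), timeout=timeout) as resp:
        return resp.read().decode("utf-8")


# Usage (requires the server to be running):
# print(autocomplete("import os\nimport sys\ndef main(directory):\n"))
```

Using json.dumps takes care of escaping newlines and quotes in the code, which is easy to get wrong when writing the body by hand.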

Check out the gist here for a docker-compose file.

Finetuning The Model

You can also fine-tune (re-train) the model on your own code files. Just follow Max Woolf's gpt-2-simple or Neil Shepperd's gpt-2 repository with the 345M version, but don't forget to replace the checkpoint (model) with the one from this repository.

You can train it on Google Colaboratory for free. If you need a production-grade (i.e. more accurate) model, you may need to train it for longer. In my case, it took ~48 hours on a P100 GPU.
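
Before fine-tuning, your code files usually need to be gathered into a training corpus. This README does not describe how the original data was prepared, so the following is only a minimal sketch under stated assumptions: it collects every .py file under a directory into a single text file and separates files with GPT-2's end-of-text token. The function name, the separator choice and the single-file layout are all hypothetical.

```python
from pathlib import Path

# Assumption: GPT-2's end-of-text token is used to mark file boundaries.
SEPARATOR = "<|endoftext|>"


def build_corpus(source_dir, output_file):
    """Concatenate every readable, non-empty .py file under source_dir.

    Returns the number of files written to output_file.
    """
    count = 0
    with open(output_file, "w", encoding="utf-8") as out:
        # sorted() gives a deterministic order and finishes the directory
        # walk before anything is written.
        for path in sorted(Path(source_dir).rglob("*.py")):
            try:
                text = path.read_text(encoding="utf-8")
            except (UnicodeDecodeError, OSError):
                continue  # skip files that are unreadable or not valid UTF-8
            if not text.strip():
                continue  # skip empty files
            out.write(text.rstrip("\n") + "\n" + SEPARATOR + "\n")
            count += 1
    return count
```

Files of other types are ignored rather than deleted, so the cloned repositories stay untouched.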

Planned Works

Train the model to predict in the most common programming languages.
Create extensions for the most common code editors to use Galois as an autocompleter.
Create a new, more lightweight but powerful model that anyone can easily run on their own computer.

Contribution

Contributions are welcome. Feel free to create an issue or a pull request.

Author

👤 Ibrahim Ethem DEMIRCI

Twitter: @iedmrc | Github: @iedmrc | Patreon: @iedmrc

Ibrahim's open-source projects are supported by his Patreon. If you found this project helpful, any monetary contributions to the Patreon are appreciated and will be put to good creative use.

License

It is licensed under the MIT License, as found in the LICENSE file.

Disclaimer

This repo has no affiliation or relationship with OpenAI.


Comments

train code complete from zero

Hi, I tried to train code completion from scratch. Below are the two settings I used; the performance of both is bad:
1. Train with UTF-8 encoding, the default BPE encoding, batch size 1, 3 GPU cards, 500K iterations.
2. Train with ASCII encoding, batch size 1, 3 GPU cards, 500K iterations. This method just maps ASCII characters to 1..N, so the vocabulary size is much smaller.
Did you train by fine-tuning the released GPT-2 model? Can you share your training settings?

opened by yuandaxing 2
About training ..

When you pointed the model at the training directory, did you extract all the .py files and delete the other file types, or does the model parse all the project directories that you cloned from GitHub one by one?

opened by dimwael 1
Text size limit for the Galois Autocompleter API

Hi!

While testing Galois, I discovered what seems to be a limit on the size of the text that you can send to the Autocompleter API without getting a 500 error. I could not find the exact number, but at around 2180 characters of code it starts crashing.

Here are the error logs I got:

Traceback (most recent call last):
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1356, in _do_call
    return fn(*args)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1341, in _run_fn
    options, feed_dict, fetch_list, target_list, run_metadata)
  File "/usr/local/lib/python3.6/dist-packages/tensorflow/python/client/session.py", line 1429, in _call_tf_sessionrun
    run_metadata)
tensorflow.python.framework.errors_impl.InvalidArgumentError: indices[0,0] = 1024 is not in [0, 1024)
  [[{{node sample_sequence/while/model/GatherV2_1}}]]
bug

opened by GabrielTamujo 1
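
The index 1024 in the error above matches GPT-2's context window of 1024 tokens, so a likely cause is that the prompt plus the generated tokens no longer fit. One possible client-side workaround, sketched below, is to send only the tail of the file. The real limit is counted in tokens, not characters, so the character budget here is a heuristic assumption chosen to sit below the ~2180 characters reported in this issue; the function name is hypothetical.

```python
# Assumption: a conservative character budget below the ~2180 characters
# at which the issue reports failures. Tune it for your own inputs.
MAX_CHARS = 2000


def truncate_context(code, max_chars=MAX_CHARS):
    """Keep only the tail of the code, starting on a line boundary when possible."""
    if len(code) <= max_chars:
        return code
    tail = code[-max_chars:]
    newline = tail.find("\n")
    # Drop the partial first line so the prompt starts with a whole line.
    if newline != -1 and newline + 1 < len(tail):
        return tail[newline + 1:]
    return tail
```

Keeping the end of the file rather than the beginning preserves the code closest to the cursor, which is what a completion depends on most.
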
Finetuning the Galois' model

Hi @iedmrc, I'm fine-tuning the Galois model with gpt-2-simple, aiming to teach it our team's programming standards. (Well, actually, we hope so!) I'm running the fine-tuning with "steps=-1" (that is, an endless run). I'd like to hear from you when I should stop the process. These are the last 4 lines of the current training log:

[310 | 23899.61] loss=0.09 avg=0.36
[320 | 24645.14] loss=0.06 avg=0.35
[330 | 25398.12] loss=0.09 avg=0.34
[340 | 26155.55] loss=0.05 avg=0.33

Best regards!
question

opened by DenisAraujo68 1

Releases(v0.1.0)

v0.1.0(Aug 12, 2019)

Here is the first model of Galois. Download and unarchive it in the root directory of galois-autocompleter.
Source code(tar.gz)
Source code(zip)
model.tar.xz(1245.74 MB)

Owner

Galois Autocompleter

Auto code completer for code editors (or any text editor) based on OpenAI GPT-2.

GitHub Repository https://usegalois.com
