Rootski - Full codebase for rootski.io (without the data)

Overview

breakdown-svg

๐Ÿ“ฃ Welcome to the Rootski codebase!

This is the codebase for the application running at rootski.io.

๐Ÿ—’ Note: You can find information and training on the architecture, ticket board, development practices, and how to contribute on our knowledge base.

Rootski is a full-stack application for studying the Russian language by learning roots.

Rootski uses an A.I. algorithm called a "transformer" to break Russian words into roots. Rootski enriches the word breakdowns with data such as definitions, grammar information, related words, and examples and then displays this information to users for them to study.

How is the Rootski project run? (Hint, get involved here ๐Ÿ˜ƒ )

Rootski is developed by volunteers!

We use Rootski as a platform to learn and mentor anyone with an interest in frontend/backend development, developing data science models, data engineering, MLOps, DevOps, UX, and running a business. Although the code is open-source, the license for reuse and redistribution is tightly restricted.

The premise for building Rootski "in the open" is this: possibly the best ways to learn to write production-ready, high quality software is to

  1. explore other high-quality software that is already written
  2. develop an application meant to support a large number of users
  3. work with experienced mentors

For better or worse, it's hard to find code for large software systems built to be hosted in the cloud and used by a large number of customers. This is because virtually all apps that fit this description... are proprietary ๐Ÿคฃ . That makes (1) hard.

(2) can be inaccessible due to the amount of time it takes to write well-written software systems without a team (or mentorship). If you're only interested in a sub-part of engineering, or if you are a beginner, it can be infeasible to build an entire production system on your own. Think of this as working on a personal project... with a bunch of other fun people working on it with you.

Contributors

Onboarded and contributed features :D

  • Eric Riddoch - Been working on Rootski for 3 years and counting!
  • Ryan Gardner - Helping with all of the legal/business aspects and dabbling in development

Friends

Completed a lot of the Rootski onboarding and chat with us in our Slack workspace about miscellanious code questions, careers, advice, etc.

  • Isaac Robbins - Learning and building experience in MLOps and DevOps!
  • Colin Varney - Full-stack python guy. Is working his first full-time software job!
  • Fazleem Baig - MLOps guy. Quite experienced with Python and learning about AWS. Working for an AI startup in Canada.
  • Ayse (Aysha) Arslan - Learning about all things MLOps. Working her first MLE/MLOps job!
  • Sebastian Sanchez - Learning about frontend development.
  • Yashwanth (Yash) Kumar - Finishing up the Georgia Tech online masters in CS.






The Technical Stuff

How to deploy an entire Rootski environment from scratch

Going through this, you'll notice that there are several one-time, manual steps. This is common even for teams with a heavily automated infrastructure-as-code workflow, particularly when it comes to the creation of users and storing of credentials.

Once these steps are complete, all subsequent interactions with our Rootski infrastructure can be done using our infrastructure as code and other automation tools.

1. Create an AWS account and user

  1. Create an IAM user with programmatic access
  2. Install the AWS CLI
  3. Run aws configure --profile rootski and copy the credentials from step (1). Set the region to us-west-2.

๐Ÿ—’ Note: this IAM user will need sufficient permissions to create and access the infrastructure that will be discussed below. This includes creating several types of infrastructure using CloudFormation.

2. Create an SSH key pair

  1. In the AWS console, go to EC2 and create an SSH key pair named rootski.
  2. Download the key pair.
  3. Save the key pair somewhere you won't forget. If the pair isn't already named, I like to rename them and store them at ~/.ssh/rootski/rootski.id_rsa (private key) and ~/.ssh/rootski/rootski.id_rsa.pub (public key).
  4. Create a new GitHub account for a "Machine User". Copy/paste the contents of rootski.id_rsa.pub into any boxes you have to to make this work :D this "machine user" is now authorized to clone the rootski repository!

3. Create several parameters in AWS SSM Parameter Store

Parameter Description
/rootski/ssh/private_key The contents of the private key needed to clone the rootski repository.
/rootski/prod/database_config A stringified JSON object with database connection information (see below)
{
    "postgres_user": "rootski-db-user",
    "postgres_password": "rootski-db-pass",
    "postgres_host": "database.rootski.io",
    "postgres_port": "5432",
    "postgres_db": "rootski-db-database-name"
}

4. Purchase a domain name that happens to be rootski.io

You know, the domain name rootski.io is hard coded in a few places throughout the Rootski infrastructure. It felt wasteful to parameterize this everywhere since... it's unlikely that we will ever change our domain name.

If we ever have a need for this, we can revisit it :D

5. Create an ACM TLS certificate verified with the DNS challenge for *.rootski.io

You'll need to do this in the AWS console. This certificate will allow us to access rootski.io and all of its subdomains over HTTPS. You'll need the ARN of this certificate for a later step.

4. Create the rootski infrastructure

Before running these commands, copy/paste the ARN of the *.rootski.io ACM certificate into the appropriate place in infrastructure/iac/cloudformation/front-end/static-website.yml.

# create the S3 bucket and Route53 hosted zone for hosting the React application as a static site
...

# create the AWS Cognito user pool
...

# create the AWS Lightsail instance with the backend database (simultaneously deploys the database)
...

# deploy the API Gateway and Lambda function
...

5. Deploy the frontend site

make deploy-frontend

DONE!

Owner
Eric
In modern Applied Mathematics, we specialize in algorithms. I'm a data scientist with a strong background in algorithm design and software development.
Eric
Japanese NLP Library

Japanese NLP Library Back to Home Contents 1 Requirements 1.1 Links 1.2 Install 1.3 History 2 Libraries and Modules 2.1 Tokenize jTokenize.py 2.2 Cabo

Pulkit Kathuria 144 Dec 27, 2022
A script that automatically creates a branch name using google translation api and jira api

About google translation api์™€ jira api์„ ์‚ฌ์šฉํ•˜์—ฌ ์ž๋™์œผ๋กœ ๋ธŒ๋žœ์น˜ ์ด๋ฆ„์„ ๋งŒ๋“ค์–ด์ฃผ๋Š” ์Šคํฌ๋ฆฝํŠธ Setup ํ™˜๊ฒฝ๋ณ€์ˆ˜์— ๋‹ค์Œ 3๊ฐ€์ง€๋ฅผ ๋“ฑ๋กํ•ด์•ผ ํ•œ๋‹ค. JIRA_USER : JIRA email (ex: hyunwook.kim 2 Dec 20, 2021

A crowdsourced dataset of dialogues grounded in social contexts involving utilization of commonsense.

A crowdsourced dataset of dialogues grounded in social contexts involving utilization of commonsense.

Alexa 62 Dec 20, 2022
Japanese Long-Unit-Word Tokenizer with RemBertTokenizerFast of Transformers

Japanese-LUW-Tokenizer Japanese Long-Unit-Word (ๅ›ฝ่ชž็ ”้•ทๅ˜ไฝ) Tokenizer for Transformers based on ้’็ฉบๆ–‡ๅบซ Basic Usage from transformers import RemBertToken

Koichi Yasuoka 3 Dec 22, 2021
This codebase facilitates fast experimentation of differentially private training of Hugging Face transformers.

private-transformers This codebase facilitates fast experimentation of differentially private training of Hugging Face transformers. What is this? Why

Xuechen Li 73 Dec 28, 2022
Multilingual finetuning of Machine Translation model on low-resource languages. Project for Deep Natural Language Processing course.

Low-resource-Machine-Translation This repository contains the code for the project relative to the course Deep Natural Language Processing. The goal o

Andrea Cavallo 3 Jun 22, 2022
Mirco Ravanelli 2.3k Dec 27, 2022
Generating new names based on trends in data using GPT2 (Transformer network)

MLOpsNameGenerator Overall Goal The goal of the project is to develop a model that is capable of creating Pokรฉmon names based on its description, usin

Gustav Lang Moesmand 2 Jan 10, 2022
GVT is a generic translation tool for parts of text on the PC screen with Text to Speak functionality.

GVT is a generic translation tool for parts of text on the PC screen with Text to Speech functionality. I wanted to create it because the existing tools that I experimented with did not satisfy me in

Nuked 1 Aug 21, 2022
Smart discord chatbot integrated with Dialogflow

academic-NLP-chatbot Smart discord chatbot integrated with Dialogflow to interact with students naturally and manage different classes in a school. De

Tom Huynh 5 Oct 24, 2022
Python package for Turkish Language.

PyTurkce Python package for Turkish Language. Documentation: https://pyturkce.readthedocs.io. Installation pip install pyturkce Usage from pyturkce im

Mert Cobanov 14 Oct 09, 2022
Original implementation of the pooling method introduced in "Speaker embeddings by modeling channel-wise correlations"

Speaker-Embeddings-Correlation-Pooling This is the original implementation of the pooling method introduced in "Speaker embeddings by modeling channel

Themos Stafylakis 10 Apr 30, 2022
ELECTRA: Pre-training Text Encoders as Discriminators Rather Than Generators

ELECTRA Introduction ELECTRA is a method for self-supervised language representation learning. It can be used to pre-train transformer networks using

Google Research 2.1k Dec 28, 2022
topic modeling on unstructured data in Space news articles retrieved from the Guardian (UK) newspaper using API

NLP Space News Topic Modeling Photos by nasa.gov (1, 2, 3, 4, 5) and extremetech.com Table of Contents Project Idea Data acquisition Primary data sour

edesz 1 Jan 03, 2022
A python gui program to generate reddit text to speech videos from the id of any post.

Reddit text to speech generator A python gui program to generate reddit text to speech videos from the id of any post. Current functionality Generate

Aadvik 17 Dec 19, 2022
Python functions for summarizing and improving voice dictation input.

Helpmespeak Help me speak uses Python functions for summarizing and improving voice dictation input. Get started with OpenAI gpt-3 OpenAI is a amazing

Margarita Humanitarian Foundation 6 Dec 17, 2022
Spooky Skelly For Python

_____ _ _____ _ _ _ | __| ___ ___ ___ | |_ _ _ | __|| |_ ___ | || | _ _ |__ || . || . || . || '

Kur0R1uka 1 Dec 23, 2021
Journalism AI โ€“ Quotes extraction for modular journalism

Quote extraction for modular journalism (JournalismAI collab 2021)

Journalism AI collab 2021 207 Dec 25, 2022
Stack based programming language that compiles to x86_64 assembly or can alternatively be interpreted in Python

lang lang is a simple stack based programming language written in Python. It can

Christoffer Aakre 1 May 30, 2022
This is a Prototype of an Ai ChatBot "Tea and Coffee Supplier" using python.

Ai-ChatBot-Python A chatbot is an intelligent system which can hold a conversation with a human using natural language in real time. Due to the rise o

1 Oct 30, 2021