Rootski - Full codebase for rootski.io (without the data)

Overview

breakdown-svg

📣 Welcome to the Rootski codebase!

This is the codebase for the application running at rootski.io.

🗒 Note: You can find information and training on the architecture, ticket board, development practices, and how to contribute on our knowledge base.

Rootski is a full-stack application for studying the Russian language by learning roots.

Rootski uses an A.I. algorithm called a "transformer" to break Russian words into roots. Rootski enriches the word breakdowns with data such as definitions, grammar information, related words, and examples and then displays this information to users for them to study.

How is the Rootski project run? (Hint, get involved here 😃 )

Rootski is developed by volunteers!

We use Rootski as a platform to learn and mentor anyone with an interest in frontend/backend development, developing data science models, data engineering, MLOps, DevOps, UX, and running a business. Although the code is open-source, the license for reuse and redistribution is tightly restricted.

The premise for building Rootski "in the open" is this: possibly the best ways to learn to write production-ready, high quality software is to

  1. explore other high-quality software that is already written
  2. develop an application meant to support a large number of users
  3. work with experienced mentors

For better or worse, it's hard to find code for large software systems built to be hosted in the cloud and used by a large number of customers. This is because virtually all apps that fit this description... are proprietary 🤣 . That makes (1) hard.

(2) can be inaccessible due to the amount of time it takes to write well-written software systems without a team (or mentorship). If you're only interested in a sub-part of engineering, or if you are a beginner, it can be infeasible to build an entire production system on your own. Think of this as working on a personal project... with a bunch of other fun people working on it with you.

Contributors

Onboarded and contributed features :D

  • Eric Riddoch - Been working on Rootski for 3 years and counting!
  • Ryan Gardner - Helping with all of the legal/business aspects and dabbling in development

Friends

Completed a lot of the Rootski onboarding and chat with us in our Slack workspace about miscellanious code questions, careers, advice, etc.

  • Isaac Robbins - Learning and building experience in MLOps and DevOps!
  • Colin Varney - Full-stack python guy. Is working his first full-time software job!
  • Fazleem Baig - MLOps guy. Quite experienced with Python and learning about AWS. Working for an AI startup in Canada.
  • Ayse (Aysha) Arslan - Learning about all things MLOps. Working her first MLE/MLOps job!
  • Sebastian Sanchez - Learning about frontend development.
  • Yashwanth (Yash) Kumar - Finishing up the Georgia Tech online masters in CS.






The Technical Stuff

How to deploy an entire Rootski environment from scratch

Going through this, you'll notice that there are several one-time, manual steps. This is common even for teams with a heavily automated infrastructure-as-code workflow, particularly when it comes to the creation of users and storing of credentials.

Once these steps are complete, all subsequent interactions with our Rootski infrastructure can be done using our infrastructure as code and other automation tools.

1. Create an AWS account and user

  1. Create an IAM user with programmatic access
  2. Install the AWS CLI
  3. Run aws configure --profile rootski and copy the credentials from step (1). Set the region to us-west-2.

🗒 Note: this IAM user will need sufficient permissions to create and access the infrastructure that will be discussed below. This includes creating several types of infrastructure using CloudFormation.

2. Create an SSH key pair

  1. In the AWS console, go to EC2 and create an SSH key pair named rootski.
  2. Download the key pair.
  3. Save the key pair somewhere you won't forget. If the pair isn't already named, I like to rename them and store them at ~/.ssh/rootski/rootski.id_rsa (private key) and ~/.ssh/rootski/rootski.id_rsa.pub (public key).
  4. Create a new GitHub account for a "Machine User". Copy/paste the contents of rootski.id_rsa.pub into any boxes you have to to make this work :D this "machine user" is now authorized to clone the rootski repository!

3. Create several parameters in AWS SSM Parameter Store

Parameter Description
/rootski/ssh/private_key The contents of the private key needed to clone the rootski repository.
/rootski/prod/database_config A stringified JSON object with database connection information (see below)
{
    "postgres_user": "rootski-db-user",
    "postgres_password": "rootski-db-pass",
    "postgres_host": "database.rootski.io",
    "postgres_port": "5432",
    "postgres_db": "rootski-db-database-name"
}

4. Purchase a domain name that happens to be rootski.io

You know, the domain name rootski.io is hard coded in a few places throughout the Rootski infrastructure. It felt wasteful to parameterize this everywhere since... it's unlikely that we will ever change our domain name.

If we ever have a need for this, we can revisit it :D

5. Create an ACM TLS certificate verified with the DNS challenge for *.rootski.io

You'll need to do this in the AWS console. This certificate will allow us to access rootski.io and all of its subdomains over HTTPS. You'll need the ARN of this certificate for a later step.

4. Create the rootski infrastructure

Before running these commands, copy/paste the ARN of the *.rootski.io ACM certificate into the appropriate place in infrastructure/iac/cloudformation/front-end/static-website.yml.

# create the S3 bucket and Route53 hosted zone for hosting the React application as a static site
...

# create the AWS Cognito user pool
...

# create the AWS Lightsail instance with the backend database (simultaneously deploys the database)
...

# deploy the API Gateway and Lambda function
...

5. Deploy the frontend site

make deploy-frontend

DONE!

Owner
Eric
In modern Applied Mathematics, we specialize in algorithms. I'm a data scientist with a strong background in algorithm design and software development.
Eric
Compute distance between sequences. 30+ algorithms, pure python implementation, common interface, optional external libs usage.

TextDistance TextDistance -- python library for comparing distance between two or more sequences by many algorithms. Features: 30+ algorithms Pure pyt

Life4 3k Jan 06, 2023
Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context

Transformer-XL: Attentive Language Models Beyond a Fixed-Length Context This repository contains the code in both PyTorch and TensorFlow for our paper

Zhilin Yang 3.3k Dec 28, 2022
构建一个多源(公众号、RSS)、干净、个性化的阅读环境

2C 构建一个多源(公众号、RSS)、干净、个性化的阅读环境 作为一名微信公众号的重度用户,公众号一直被我设为汲取知识的地方。随着使用程度的增加,相信大家或多或少会有一个比较头疼的问题——广告问题。 假设你关注的公众号有十来个,若一个公众号两周接一次广告,理论上你会面临二十多次广告,实际上会更多,运

howie.hu 678 Dec 28, 2022
This code is the implementation of Text Emotion Recognition (TER) with linguistic features

APSIPA-TER This code is the implementation of Text Emotion Recognition (TER) with linguistic features. The network model is BERT with a pretrained mod

kenro515 1 Feb 08, 2022
🤗 Transformers: State-of-the-art Machine Learning for Pytorch, TensorFlow, and JAX.

English | 简体中文 | 繁體中文 | 한국어 State-of-the-art Machine Learning for JAX, PyTorch and TensorFlow 🤗 Transformers provides thousands of pretrained models

Hugging Face 77.1k Dec 31, 2022
Malware-Related Sentence Classification

Malware-Related Sentence Classification This repo contains the code for the ICTAI 2021 paper "Enrichment of Features for Malware-Related Sentence Clas

Chau Nguyen 1 Mar 26, 2022
Training and evaluation codes for the BertGen paper (ACL-IJCNLP 2021)

BERTGEN This repository is the implementation of the paper "BERTGEN: Multi-task Generation through BERT" (https://arxiv.org/abs/2106.03484). The codeb

<a href=[email protected]"> 9 Oct 26, 2022
HiFi DeepVariant + WhatsHap workflowHiFi DeepVariant + WhatsHap workflow

HiFi DeepVariant + WhatsHap workflow Workflow steps align HiFi reads to reference with pbmm2 call small variants with DeepVariant, using two-pass meth

William Rowell 2 May 14, 2022
Code for the paper "A Simple but Tough-to-Beat Baseline for Sentence Embeddings".

Code for the paper "A Simple but Tough-to-Beat Baseline for Sentence Embeddings".

1.1k Dec 27, 2022
Library for Russian imprecise rhymes generation

TOM RHYMER Library for Russian imprecise rhymes generation. Quick Start Generate rhymes by any given rhyme scheme (aabb, abab, aaccbb, etc ...): from

Alexey Karnachev 6 Oct 18, 2022
💬 Open source machine learning framework to automate text- and voice-based conversations: NLU, dialogue management, connect to Slack, Facebook, and more - Create chatbots and voice assistants

Rasa Open Source Rasa is an open source machine learning framework to automate text-and voice-based conversations. With Rasa, you can build contextual

Rasa 15.3k Dec 30, 2022
Bu Chatbot, Konya Bilim Merkezi Yen için tasarlanmış olan bir projedir.

chatbot Bu Chatbot, Konya Bilim Merkezi Yeni Ufuklar Sergisi için 2021 Yılında tasarlanmış olan bir projedir. Chatbot Python ortamında yazılmıştır. Sö

Emre Özkul 1 Feb 23, 2022
Spooky Skelly For Python

_____ _ _____ _ _ _ | __| ___ ___ ___ | |_ _ _ | __|| |_ ___ | || | _ _ |__ || . || . || . || '

Kur0R1uka 1 Dec 23, 2021
Grover is a model for Neural Fake News -- both generation and detectio

Grover is a model for Neural Fake News -- both generation and detection. However, it probably can also be used for other generation tasks.

Rowan Zellers 856 Dec 24, 2022
Generate a cool README/About me page for your Github Profile

Github Profile README/ About Me Generator 💯 This webapp lets you build a cool README for your profile. A few inputs + ~15 mins = Your Github Profile

Rahul Banerjee 179 Jan 07, 2023
PyKaldi is a Python scripting layer for the Kaldi speech recognition toolkit.

PyKaldi is a Python scripting layer for the Kaldi speech recognition toolkit. It provides easy-to-use, low-overhead, first-class Python wrappers for t

922 Dec 31, 2022
WIT (Wikipedia-based Image Text) Dataset is a large multimodal multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages.

WIT (Wikipedia-based Image Text) Dataset is a large multimodal multilingual dataset comprising 37M+ image-text sets with 11M+ unique images across 100+ languages.

Google Research Datasets 740 Dec 24, 2022
Experiments in converting wikidata to ftm

FollowTheMoney / Wikidata mappings This repo will contain tools for converting Wikidata entities into FtM schema. Prefixes: https://www.mediawiki.org/

Friedrich Lindenberg 2 Nov 12, 2021
Using Bert as the backbone model for lime, designed for NLP task explanation (sentence pair text classification task)

Lime Comparing deep contextualized model for sentences highlighting task. In addition, take the classic explanation model "LIME" with bert-base model

JHJu 2 Jan 18, 2022
Creating an LSTM model to generate music

Music-Generation Creating an LSTM model to generate music music-generator Used to create basic sin wave sounds music-ai Contains the functions to conv

Jerin Joseph 2 Dec 02, 2021