growth-data

Tools and data for measuring the popularity & growth of various programming languages.

Install the dependencies

$ pip install -r requirements.txt

Example queries

Number of (non-fork) repositories

sqlite> .mode column
sqlite> SELECT
    ds,
    github_search_q AS q,
    MAX(github_search_total_count) AS num_repos
  FROM github_search
  GROUP BY 1, 2
  ORDER BY 3;
ds          q                                  num_repos
----------  ---------------------------------  ---------
2021-12-22  language:tla and fork:false        64       
2021-12-22  language:lean and fork:false       75       
2021-12-22  language:idris and fork:false      140      
2021-12-22  language:agda and fork:false       192      
2021-12-22  language:ada and fork:false        438      
2021-12-22  language:coq and fork:false        509      
2021-12-22  language:erlang and fork:false     2260     
2021-12-22  language:ocaml and fork:false      2278     
2021-12-22  language:fortran and fork:false    3196     
2021-12-22  language:verilog and fork:false    3882     
2021-12-22  language:assembly and fork:false   8654     
2021-12-22  language:haskell and fork:false    10052    
2021-12-22  language:terraform and fork:false  10254    
2021-12-22  language:rust and fork:false       21906    
2021-12-22  language:go and fork:false         67601    
2021-12-22  language:r and fork:false          114942   
2021-12-22  language:c and fork:false          174439   
2021-12-22  language:c++ and fork:false        270351   
2021-12-22  language:python and fork:false     762729   
2021-12-22  language:java and fork:false       943381   
sqlite> 
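To run the same query from a script instead of the interactive shell, Python's standard `sqlite3` module is enough. This is a minimal sketch that assumes only that the `github_search` table has the columns the query above already references. The function name `num_repos` is illustrative and not part of this repository.

```python
import sqlite3

# The "Number of (non-fork) repositories" query from the sqlite session above.
# For each snapshot date (ds) and search query (q) it keeps the largest
# github_search_total_count recorded, then sorts by that count, smallest first.
NUM_REPOS_SQL = """
SELECT
    ds,
    github_search_q AS q,
    MAX(github_search_total_count) AS num_repos
  FROM github_search
  GROUP BY 1, 2
  ORDER BY 3
"""


def num_repos(conn: sqlite3.Connection) -> list[tuple[str, str, int]]:
    """Return (ds, q, num_repos) rows from an already-open connection."""
    return conn.execute(NUM_REPOS_SQL).fetchall()
```

For example, `num_repos(sqlite3.connect("growth.db"))` would return rows shaped like the table above. `growth.db` is a hypothetical file name, because this README does not say where the collected data is stored.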

Stats about the average (non-fork) repository

sqlite> .mode column
sqlite> SELECT
    github_search.ds AS ds,
    github_search_q AS q,
    COUNT(*) AS repos,
    SUM(github_repo_has_issues) AS repos_with_issues,
    SUM(github_repo_has_wiki) AS repos_with_wiki,
    SUM(github_repo_has_pages) AS repos_with_pages,
    SUM(github_repo_license_name != '') AS repos_with_license,
    SUM(github_repo_size) AS sum_repo_size,
    SUM(github_repo_stargazers_count) AS sum_stars,
    AVG(github_repo_stargazers_count) AS avg_stars,
    AVG(github_repo_forks_count) AS avg_forks,
    AVG(github_repo_size) AS avg_size,
    AVG(github_repo_open_issues_count) AS avg_open_issues
  FROM github_search INNER JOIN github_search_repo
  ON github_search.obj_id = github_search_obj_id
  GROUP BY 1, 2
  ORDER BY 3;
ds          q                              repos  repos_with_issues  repos_with_wiki  repos_with_pages  repos_with_license  sum_repo_size  sum_stars  avg_stars         avg_forks         avg_size          avg_open_issues  
----------  -----------------------------  -----  -----------------  ---------------  ----------------  ------------------  -------------  ---------  ----------------  ----------------  ----------------  -----------------
2021-12-22  language:tla and fork:false    64     63                 61               1                 23                  1393879        1937       30.265625         2.34375           21779.359375      0.359375         
2021-12-22  language:lean and fork:false   75     73                 72               5                 22                  1119783        1475       19.6666666666667  1.85333333333333  14930.44          1.61333333333333 
2021-12-22  language:idris and fork:false  140    139                136              4                 63                  108818         1242       8.87142857142857  0.85              777.271428571429  0.728571428571429
2021-12-22  language:agda and fork:false   192    188                187              9                 51                  394233         1725       8.984375          0.90625           2053.296875       0.291666666666667
2021-12-22  language:ada and fork:false    438    421                406              12                155                 2387761        2210       5.04566210045662  1.13926940639269  5451.50913242009  1.09360730593607 
2021-12-22  language:coq and fork:false    509    502                493              42                204                 2894476        4304       8.45579567779961  1.50098231827112  5686.59332023576  0.846758349705305
sqlite>
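The raw sums are easier to compare across languages when expressed as a share of each language's repository count. As a sketch (the function name `license_share` is hypothetical), the share of repositories reporting a license can be computed from the same join. It assumes the column names used in the query above.

```python
import sqlite3

# Same join as the query above, keeping only the columns needed for the ratio.
LICENSE_SHARE_SQL = """
SELECT
    github_search.ds AS ds,
    github_search_q AS q,
    COUNT(*) AS repos,
    SUM(github_repo_license_name != '') AS repos_with_license
  FROM github_search INNER JOIN github_search_repo
  ON github_search.obj_id = github_search_obj_id
  GROUP BY 1, 2
  ORDER BY 3
"""


def license_share(conn: sqlite3.Connection) -> dict[tuple[str, str], float]:
    """Map (ds, q) to the fraction of joined repositories that name a license."""
    shares = {}
    for ds, q, repos, with_license in conn.execute(LICENSE_SHARE_SQL):
        # A NULL license name makes the != comparison NULL, which SUM() skips;
        # if every row in a group is NULL, SUM() itself returns NULL (None here).
        shares[(ds, q)] = (with_license or 0) / repos
    return shares
```

With the counts shown above, `language:tla and fork:false` would come out at 23 / 64 ≈ 0.36.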

Stats about the average recently updated (non-fork) repository, i.e. one updated on or after 2021-01-01

sqlite> .mode column
sqlite> SELECT
    github_search.ds AS ds,
    github_search_q AS q,
    COUNT(*) AS repos,
    SUM(github_repo_has_issues) AS repos_with_issues,
    SUM(github_repo_has_wiki) AS repos_with_wiki,
    SUM(github_repo_has_pages) AS repos_with_pages,
    SUM(github_repo_license_name != '') AS repos_with_license,
    SUM(github_repo_size) AS sum_repo_size,
    SUM(github_repo_stargazers_count) AS sum_stars,
    AVG(github_repo_stargazers_count) AS avg_stars,
    AVG(github_repo_forks_count) AS avg_forks,
    AVG(github_repo_size) AS avg_size,
    AVG(github_repo_open_issues_count) AS avg_open_issues
  FROM github_search INNER JOIN github_search_repo
  ON github_search.obj_id = github_search_obj_id
  WHERE github_repo_updated_at >= '2021-01-01T00:00:00Z'
  GROUP BY 1, 2
  ORDER BY 3;
ds          q                              repos  repos_with_issues  repos_with_wiki  repos_with_pages  repos_with_license  sum_repo_size  sum_stars  avg_stars         avg_forks         avg_size          avg_open_issues  
----------  -----------------------------  -----  -----------------  ---------------  ----------------  ------------------  -------------  ---------  ----------------  ----------------  ----------------  -----------------
2021-12-22  language:tla and fork:false    33     32                 30               1                 18                  1322462        1921       58.2121212121212  4.39393939393939  40074.6060606061  0.636363636363636
2021-12-22  language:idris and fork:false  44     44                 43               3                 23                  33576          1052       23.9090909090909  2.22727272727273  763.090909090909  1.61363636363636 
2021-12-22  language:lean and fork:false   46     44                 43               3                 14                  1116533        1442       31.3478260869565  2.93478260869565  24272.4565217391  2.58695652173913 
2021-12-22  language:agda and fork:false   77     74                 75               8                 24                  310115         1520       19.7402597402597  1.93506493506494  4027.46753246753  0.376623376623377
2021-12-22  language:ada and fork:false    168    165                148              10                82                  1615474        2065       12.2916666666667  2.67261904761905  9615.91666666667  2.80357142857143 
2021-12-22  language:coq and fork:false    211    206                201              32                113                 1962100        4018       19.042654028436   3.22748815165877  9299.05213270142  1.89099526066351 
sqlite> 
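The `WHERE github_repo_updated_at >= '2021-01-01T00:00:00Z'` filter compares timestamps as strings. This works because ISO-8601 UTC strings in one fixed-width format sort lexicographically in time order. A minimal sketch that makes the cutoff a bound parameter is below. It assumes every `github_repo_updated_at` value uses that same format. It also trims the column list for brevity, and `recent_repo_stats` is a hypothetical name.

```python
import sqlite3

# Reduced version of the "recently updated" query above, with the cutoff
# passed as a ? placeholder instead of being pasted into the SQL text.
RECENT_SQL = """
SELECT
    github_search.ds AS ds,
    github_search_q AS q,
    COUNT(*) AS repos,
    AVG(github_repo_stargazers_count) AS avg_stars
  FROM github_search INNER JOIN github_search_repo
  ON github_search.obj_id = github_search_obj_id
  -- String comparison doubles as a time comparison for same-format
  -- ISO-8601 UTC timestamps such as '2021-01-01T00:00:00Z'.
  WHERE github_repo_updated_at >= ?
  GROUP BY 1, 2
  ORDER BY 3
"""


def recent_repo_stats(
    conn: sqlite3.Connection, cutoff: str = "2021-01-01T00:00:00Z"
) -> list[tuple]:
    """Return (ds, q, repos, avg_stars) for repositories updated at or after cutoff."""
    return conn.execute(RECENT_SQL, (cutoff,)).fetchall()
```

Changing the cutoff (for example to `"2021-06-01T00:00:00Z"`) then needs no edits to the SQL itself.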