Projeto: Machine Learning: Linguagens de Programacao 2004-2001

Overview

Projeto: Machine Learning: Linguagens de Programacao 2004-2001

Projeto de Data Science e Machine Learning de análise de linguagens de programação de 2004 a 2021 obtidos a partir do seguinte dataset do Kaggle.

Tecnologias

  • Python 3
  • Jupyter Notebook
  • Pandas
  • NumPy
  • Matplotlib
  • Seaborn
  • Scikit-Learn
  • Requests
  • REST API Call (Github API)

Algoritmos

  • Regressão Linear

Inicialmente, serão visualizados dados de séries temporais e regressão linear.

Tratamento de dados

df = pd.read_csv('Most Popular Programming Languages from 2004 to 2021 V4.csv')

def createDataFrameFor(df, colunas, colunaAtual):
    return pd.DataFrame(
        {
            'Date': df.Date,
            'Timestamp': map(lambda i : datetime.strptime(df["Date"][i], '%B %Y'), range(len(df.Date))),
            'Language': colunas[colunaAtual],
            'Value': df[df.columns[colunaAtual]]
        }
    )

colunas = df.columns

dados_tratados = createDataFrameFor(df, colunas, 1)

for coluna in range(1, len(colunas)):
    dados_tratados = pd.concat([dados_tratados, createDataFrameFor(df, colunas, coluna)])

dados_tratados.reset_index(drop=True, inplace=True)

dados_tratados['UnixTime'] = list(map(lambda i: (pd.to_datetime([dados_tratados['Timestamp'][i]]).astype(int) / 10**9)[0], range(len(dados_tratados['Date']))))

Visualização dos dados

df_java = dados_tratados[dados_tratados['Language'] == 'Java']
sns.regplot(x="UnixTime", y="Value", data= df_java)
plt.gcf().set_size_inches(16, 6)
plt.ylabel('% de uso da linguagem Java')
plt.xlabel('Anos em Unix Time de 2004 a 2021')
plt.show()

X = df_java.UnixTime.values.reshape(-1, 1)
y = df_java.Value.values.reshape(-1, 1)

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.33, random_state=42)

reg = linear_model.LinearRegression()
reg.fit(X_train, y_train)
y_pred = reg.predict(X_test)

plt.scatter(X_test, y_test,  color='blue')
plt.plot(X_test, y_pred, color='red', linewidth=3)

plt.gcf().set_size_inches(16, 6)

plt.ylabel('% de uso da linguagem Java')
plt.xlabel('Anos em Unix Time de 2004 a 2021')

plt.show()

Plots gerados

  • Regressão Linear com Seaborn (SNS):

Seaborn

  • Regressão Linear com Scikit-Learn (LinearRegression) e Matplotlib:

Matplotlib ScikitLearn

Plots de evolução das linguagens Java, Javascript, Python, C#, PHP, Delphi, Dart e Cobol nos últimos 17 anos

dados = df

dados['Date'] = pd.to_datetime(dados['Date'])
dados.set_index('Date', inplace = True) 

fig, axes = plt.subplots(nrows=4, ncols=2)

dados['Java'].plot(ax=axes[0,0], title = "Análise da linguagem Java nos últimos 17 anos")
dados['JavaScript'].plot(ax=axes[1,0], title = "Análise da linguagem JavaScript nos últimos 17 anos")
dados['Python'].plot(ax=axes[0,1], title = "Análise da linguagem Python nos últimos 17 anos")
dados['C#'].plot(ax=axes[1,1], title = "Análise da linguagem C# nos últimos 17 anos")
dados['PHP'].plot(ax=axes[2,0], title = "Análise da linguagem PHP nos últimos 17 anos")
dados['Delphi'].plot(ax=axes[2,1], title = "Análise da linguagem Delphi nos últimos 17 anos")
dados['Dart'].plot(ax=axes[3,0], title = "Análise da linguagem Dart nos últimos 17 anos")
dados['Cobol'].plot(ax=axes[3,1], title = "Análise da linguagem Cobol nos últimos 17 anos")

plt.gcf().set_size_inches(16, 22)

plt.show()

17 anos

Autor

  • Victor Hugo Negrisoli
  • Desenvolvedor Back-End Sênior | Analista de Dados
Owner
Victor Hugo Negrisoli
Cientista da Computação (UniFil), cursando pós em Data Science (PUC-MG), experiência em Java, Javascript e Python. Atuo como Desenvolvedor Back-End na Ilegra
Victor Hugo Negrisoli
pymc-learn: Practical Probabilistic Machine Learning in Python

pymc-learn: Practical Probabilistic Machine Learning in Python Contents: Github repo What is pymc-learn? Quick Install Quick Start Index What is pymc-

pymc-learn 196 Dec 07, 2022
Pandas Machine Learning and Quant Finance Library Collection

Pandas Machine Learning and Quant Finance Library Collection

148 Dec 07, 2022
Tools for mathematical optimization region

Tools for mathematical optimization region

林景 15 Nov 30, 2022
An implementation of Relaxed Linear Adversarial Concept Erasure (RLACE)

Background This repository contains an implementation of Relaxed Linear Adversarial Concept Erasure (RLACE). Given a dataset X of dense representation

Shauli Ravfogel 4 Apr 13, 2022
We have a dataset of user performances. The project is to develop a machine learning model that will predict the salaries of baseball players.

Salary-Prediction-with-Machine-Learning 1. Business Problem Can a machine learning project be implemented to estimate the salaries of baseball players

Ayşe Nur Türkaslan 9 Oct 14, 2022
Given the names and grades for each student in a class N of students, store them in a nested list and print the name(s) of any student(s) having the second lowest grade.

Hackerank-Nested-List Given the names and grades for each student in a class N of students, store them in a nested list and print the name(s) of any s

Sangeeth Mathew John 2 Dec 14, 2021
Anomaly Detection and Correlation library

luminol Overview Luminol is a light weight python library for time series data analysis. The two major functionalities it supports are anomaly detecti

LinkedIn 1.1k Jan 01, 2023
Toolss - Automatic installer of hacking tools (ONLY FOR TERMUKS!)

Tools Автоматический установщик хакерских утилит (ТОЛЬКО ДЛЯ ТЕРМУКС!) Оригиналь

14 Jan 05, 2023
Python module for data science and machine learning users.

dsnk-distributions package dsnk distribution is a Python module for data science and machine learning that was created with the goal of reducing calcu

Emmanuel ASIFIWE 1 Nov 23, 2021
Cool Python features for machine learning that I used to be too afraid to use. Will be updated as I have more time / learn more.

python-is-cool A gentle guide to the Python features that I didn't know existed or was too afraid to use. This will be updated as I learn more and bec

Chip Huyen 3.3k Jan 05, 2023
A collection of neat and practical data science and machine learning projects

Data Science A collection of neat and practical data science and machine learning projects Explore the docs » Report Bug · Request Feature Table of Co

Will Fong 2 Dec 10, 2021
Learning --> Numpy January 2022 - winter'22

Numerical-Python Numpy NumPy is a library for the Python programming language, adding support for large, multi-dimensional arrays and matrices, along

Shahzaneer Ahmed 0 Mar 12, 2022
BioPy is a collection (in-progress) of biologically-inspired algorithms written in Python

BioPy is a collection (in-progress) of biologically-inspired algorithms written in Python. Some of the algorithms included are mor

Jared M. Smith 40 Aug 26, 2022
Python-based implementations of algorithms for learning on imbalanced data.

ND DIAL: Imbalanced Algorithms Minimalist Python-based implementations of algorithms for imbalanced learning. Includes deep and representational learn

DIAL | Notre Dame 220 Dec 13, 2022
A machine learning model for Covid case prediction

CovidcasePrediction A machine learning model for Covid case prediction Problem Statement Using regression algorithms we can able to track the active c

VijayAadhithya2019rit 1 Feb 02, 2022
100 Days of Machine and Deep Learning Code

💯 Days of Machine Learning and Deep Learning Code MACHINE LEARNING TOPICS COVERED - FROM SCRATCH Linear Regression Logistic Regression K Means Cluste

Tanishq Gautam 66 Nov 02, 2022
Price Prediction model is used to develop an LSTM model to predict the future market price of Bitcoin and Ethereum.

Price Prediction model is used to develop an LSTM model to predict the future market price of Bitcoin and Ethereum.

2 Jun 14, 2022
Machine Learning Model to predict the payment date of an invoice when it gets created in the system.

Payment-Date-Prediction Machine Learning Model to predict the payment date of an invoice when it gets created in the system.

15 Sep 09, 2022
Compare MLOps Platforms. Breakdowns of SageMaker, VertexAI, AzureML, Dataiku, Databricks, h2o, kubeflow, mlflow...

Compare MLOps Platforms. Breakdowns of SageMaker, VertexAI, AzureML, Dataiku, Databricks, h2o, kubeflow, mlflow...

Thoughtworks 318 Jan 02, 2023
Customers Segmentation with RFM Scores and K-means

Customer Segmentation with RFM Scores and K-means RFM Segmentation table: K-Means Clustering: Business Problem Rule-based customer segmentation machin

5 Aug 10, 2022