Data Pipelines, 2021.

Last update: May 19, 2022

Overview

This repo illustrates a simple process for automating data transformation and modeling through a pipeline built with Luigi.

Main stack

Python 3.7+
Streamlit
Scikit-learn
Pandas
Luigi

Idea

The full process is described in an interactive app, which you can find in the script app.py. See the section on how to run the scripts for details on launching the app.

Setup

Create a virtual environment (I recommend using conda):
```
conda create --name data-pipes python=3.7
```
Activate the virtual environment:
```
conda activate data-pipes
```
Install requirements:
```
pip install -r requirements.txt
```

Run the scripts

Interactive app

To run the interactive app, run the Streamlit command with the virtual environment activated:

(data-pipes) streamlit run app.py

This starts a local server at http://localhost:8501.

Data pipeline

To run a specific task, say TareaX defined in the script tareas.py, run the command:

PYTHONPATH=. luigi --module tareas TareaX --local-scheduler

You can extend the code and add any tasks you like!

Owner

Rodolfo Ferro
