DLFlow is a deep learning framework.

Overview

contributions license python

DLFlow - A Deep Learning WorkFlow

DLFlow概述

DLFlow是一套深度学习pipeline,它结合了Spark的大规模特征处理能力和Tensorflow模型构建能力。利用DLFlow可以快速处理原始特征、训练模型并进行大规模分布式预测,十分适合离线环境下的生产任务。利用DLFlow,用户只需专注于模型开发,而无需关心原始特征处理、pipeline构建、生产部署等工作。

功能支持

配置驱动: DLFlow通过配置驱动,修改配置可以快速更换特征、模型超参数、任务流程等等,极大提高工作效率。

模块化结构: 任务和模型以插件形式存在,便于使用和开发,用户可以可以轻地将自定义任务和模型注册到框架内使用。

任务自组织: 通过内置的Workflow框架,根据任务的产出标记自动解决任务依赖,轻松构建深度学习pipeline。

最佳实践: 融入滴滴用户画像团队深度学习离线任务的最佳实践,有效应对离线生产中的多种问题。将Tensorflow和Spark进行合理结合,更适合离线深度学习任务。

快速开始

环境准备

首先请确保环境中已经安装和配置Hadoop和Spark,并设置好了基本的环境变量。

  • Tensorflow访问HDFS

为了能够使用让Tensorflow访问HDFS,需要确保如下环境变量生效:

# 确保libjvm.so被添加到LD_LIBRARY_PATH
export LD_LIBRARY_PATH=${LD_LIBRARY_PATH}:${JAVA_HOME}/jre/lib/amd64/server

# 确保hadoop jars被添加到CLASSPATH
export CLASSPATH=${CLASSPATH}:$(hadoop classpath --glob)

关于Tensorflow访问HDFS更多内容请参见 TensorFlow on Hadoop

  • Spark读写TFReocrds
# Clone tensorflow/ecosystem项目
git clone https://github.com/tensorflow/ecosystem.git

cd ecosystem/spark/spark-tensorflow-connector/

# 构建spark-tensorflow-connector
mvn versions:set -DnewVersion=1.14.0
mvn clean install

项目构建后生成 target/spark-tensorflow-connector_2.11-1.14.0.jar,后续需要确保该jar被添加到 spark.jars 中。 关于Spark读写TFRecoreds更多内容请参见 spark-tensorflow-connector

安装

通过pip安装:

pip install dlflow

通过源代码安装:

git clone  https://github.com/didi/dlflow.git
cd dlflow
python setup.py install

使用

  • 配置文件

运行配置可参考 conf 目录中的配置。 关于配置详情请参考 配置说明

  • 以模块运行
python -m dlflow.main --config <CONFIGURATION FILE>.conf
  • 以脚本运行

确保python环境的 bin 目录已经被添加到环境变量 PATH

export PATH=$PATH:/usr/local/python/bin

之后通过如下命令运行

dlflow --config .conf

更详细的使用参见 使用说明

预定义任务

预定义任务 描述
Merge 特征融合任务,请参见 特征融合
Encode 解析原始特征,对特征进行编码和预处理,生成能够直接输入模型的特征
Train 模型训练任务
Evaluate 模型评估任务
Predict 模型预测任务,使用Spark进行分布式预测,具备处理大规模数据能力

手册目录

技术方案

DLFlow整体架构

整体架构

DLFLow pipeline

Pipeline

Contributing

欢迎使用并参与到本项目的建设中,详细内容请参见 Contribution Guide

License

DLFlow 基于Apache-2.0协议进行分发和使用,更多信息参见 LICENSE

Owner
DiDi
滴滴出行
DiDi
Can we visualize a large scientific data set with a surrogate model? We're building a GAN for the Earth's Mantle Convection data set to see if we can!

EarthGAN - Earth Mantle Surrogate Modeling Can a surrogate model of the Earth’s Mantle Convection data set be built such that it can be readily run in

Tim 0 Dec 09, 2021
Official implementation for (Refine Myself by Teaching Myself : Feature Refinement via Self-Knowledge Distillation, CVPR-2021)

FRSKD Official implementation for Refine Myself by Teaching Myself : Feature Refinement via Self-Knowledge Distillation (CVPR-2021) Requirements Pytho

75 Dec 28, 2022
TensorFlow-based implementation of "ICNet for Real-Time Semantic Segmentation on High-Resolution Images".

ICNet_tensorflow This repo provides a TensorFlow-based implementation of paper "ICNet for Real-Time Semantic Segmentation on High-Resolution Images,"

HsuanKung Yang 406 Nov 27, 2022
High-performance moving least squares material point method (MLS-MPM) solver.

High-Performance MLS-MPM Solver with Cutting and Coupling (CPIC) (MIT License) A Moving Least Squares Material Point Method with Displacement Disconti

Yuanming Hu 2.2k Dec 31, 2022
Creating a custom CNN hypertunned architeture for the Fashion MNIST dataset with Python, Keras and Tensorflow.

custom-cnn-fashion-mnist Creating a custom CNN hypertunned architeture for the Fashion MNIST dataset with Python, Keras and Tensorflow. The following

Danielle Almeida 1 Mar 05, 2022
Code for A Volumetric Transformer for Accurate 3D Tumor Segmentation

VT-UNet This repo contains the supported pytorch code and configuration files to reproduce 3D medical image segmentaion results of VT-UNet. Environmen

Himashi Amanda Peiris 114 Dec 20, 2022
Collection of generative models, e.g. GAN, VAE in Pytorch and Tensorflow.

Generative Models Collection of generative models, e.g. GAN, VAE in Pytorch and Tensorflow. Also present here are RBM and Helmholtz Machine. Note: Gen

Agustinus Kristiadi 7k Jan 02, 2023
Pretraining Representations For Data-Efficient Reinforcement Learning

Pretraining Representations For Data-Efficient Reinforcement Learning Max Schwarzer, Nitarshan Rajkumar, Michael Noukhovitch, Ankesh Anand, Laurent Ch

Mila 40 Dec 11, 2022
Unsupervised Attributed Multiplex Network Embedding (AAAI 2020)

Unsupervised Attributed Multiplex Network Embedding (DMGI) Overview Nodes in a multiplex network are connected by multiple types of relations. However

Chanyoung Park 114 Dec 06, 2022
An efficient and effective learning to rank algorithm by mining information across ranking candidates. This repository contains the tensorflow implementation of SERank model. The code is developed based on TF-Ranking.

SERank An efficient and effective learning to rank algorithm by mining information across ranking candidates. This repository contains the tensorflow

Zhihu 44 Oct 20, 2022
Dataset and Source code of paper 'Enhancing Keyphrase Extraction from Academic Articles with their Reference Information'.

Enhancing Keyphrase Extraction from Academic Articles with their Reference Information Overview Dataset and code for paper "Enhancing Keyphrase Extrac

15 Nov 24, 2022
Uncertainty Estimation via Response Scaling for Pseudo-mask Noise Mitigation in Weakly-supervised Semantic Segmentation

Uncertainty Estimation via Response Scaling for Pseudo-mask Noise Mitigation in Weakly-supervised Semantic Segmentation Introduction This is a PyTorch

XMed-Lab 30 Sep 23, 2022
EdiBERT, a generative model for image editing

EdiBERT, a generative model for image editing EdiBERT is a generative model based on a bi-directional transformer, suited for image manipulation. The

16 Dec 07, 2022
Trading and Backtesting environment for training reinforcement learning agent or simple rule base algo.

TradingGym TradingGym is a toolkit for training and backtesting the reinforcement learning algorithms. This was inspired by OpenAI Gym and imitated th

Yvictor 1.1k Jan 02, 2023
NExT-QA: Next Phase of Question-Answering to Explaining Temporal Actions (CVPR2021)

NExT-QA We reproduce some SOTA VideoQA methods to provide benchmark results for our NExT-QA dataset accepted to CVPR2021 (with 1 'Strong Accept' and 2

Junbin Xiao 50 Nov 24, 2022
Official PyTorch implementation of "Synthesis of Screentone Patterns of Manga Characters"

Manga Character Screentone Synthesis Official PyTorch implementation of "Synthesis of Screentone Patterns of Manga Characters" presented in IEEE ISM 2

Tsubota 2 Nov 20, 2021
A PyTorch Implementation of Gated Graph Sequence Neural Networks (GGNN)

A PyTorch Implementation of GGNN This is a PyTorch implementation of the Gated Graph Sequence Neural Networks (GGNN) as described in the paper Gated G

Ching-Yao Chuang 427 Dec 13, 2022
Official implementations of PSENet, PAN and PAN++.

News (2021/11/03) Paddle implementation of PAN, see Paddle-PANet. Thanks @simplify23. (2021/04/08) PSENet and PAN are included in MMOCR. Introduction

395 Dec 14, 2022
Viewmaker Networks: Learning Views for Unsupervised Representation Learning

Viewmaker Networks: Learning Views for Unsupervised Representation Learning Alex Tamkin, Mike Wu, and Noah Goodman Paper link: https://arxiv.org/abs/2

Alex Tamkin 31 Dec 01, 2022
Code for the paper "VisualBERT: A Simple and Performant Baseline for Vision and Language"

This repository contains code for the following two papers: VisualBERT: A Simple and Performant Baseline for Vision and Language (arxiv) with a short

Natural Language Processing @UCLA 463 Dec 09, 2022