AI Flow is an open source framework that bridges big data and artificial intelligence.

Overview

Flink AI Flow

Introduction

Flink AI Flow is an open source framework that bridges big data and artificial intelligence. It manages the entire machine learning project lifecycle as a unified workflow, including feature engineering, model training, model evaluation, model serving, model inference, monitoring, etc. Throughout the workflow, Flink is used as the general-purpose computing engine.

In addition to orchestrating groups of batch jobs, Flink AI Flow also supports workflows that contain streaming jobs by leveraging an event-based scheduler (an enhanced version of Airflow). This capability is particularly useful for complex real-time machine learning systems, as well as other real-time workflows in general.
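
For illustration only, here is a minimal sketch of what defining such a workflow can look like in Python. It follows the style of the SDK snippets quoted in the comments further down this page; the import path and exact API are assumptions and may differ between versions, so refer to the Quick Start and API Documentation for the authoritative usage.

    # Minimal sketch only: the import below is an assumption; consult the
    # Quick Start / API Documentation for the exact module layout of your version.
    from ai_flow import Workflow, PythonOperator  # assumed import path


    def preprocess():
        # Placeholder batch feature-engineering step.
        print("preprocessing features")


    def train():
        # Placeholder model-training step.
        print("training the model")


    # Declare a workflow with two Python tasks; how they are scheduled (batch or
    # event-driven) is decided by the AI Flow scheduler once the workflow is submitted.
    with Workflow(name='demo_workflow') as workflow:
        preprocess_task = PythonOperator(name='preprocess', python_callable=preprocess)
        train_task = PythonOperator(name='train', python_callable=train)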

Features

You can use Flink AI Flow to do the following:

  1. Define the machine learning workflow, including batch/stream jobs.

  2. Manage metadata (generated by the machine learning workflow) of datasets, models, artifacts, metrics, jobs, etc.

  3. Schedule and run the machine learning workflow.

  4. Publish and subscribe to events.

To support online machine learning scenarios, a notification service and an event-based scheduler are introduced. Flink AI Flow's current components are:

  1. SDK: It defines how to build a machine learning workflow and includes the API of Flink AI Flow.

  2. Notification Service: It provides event listening and notification functions; a short publishing sketch follows this list.

  3. Meta Service: It stores the metadata of the machine learning workflow.

  4. Event-Based Scheduler: It is a scheduler that triggers jobs based on the events it receives.
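
As a small example of the publish/subscribe capability, the snippet below sends a custom event through the Notification Service client. The class names and arguments mirror the test case quoted in the comments further down this page; the server address is a placeholder, and module paths may differ between versions.

    from notification_service.embedded_notification_client import EmbeddedNotificationClient
    from notification_service.event import Event, EventKey

    # Connect to a running notification server (the address is a placeholder).
    client = EmbeddedNotificationClient(server_uri="localhost:50052",
                                        namespace="default",
                                        sender="demo_sender")

    # Publish a custom event; workflows interested in it can react to it
    # through the event-based scheduler.
    event = Event(event_key=EventKey("model_trained"), message="model v1 is ready")
    client.send_event(event)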

Documentation

QuickStart

You can use Flink AI Flow by following the guidelines in the Quick Start. You can also take a look at our Tutorial to learn how to write your own workflow.

API

Please refer to the API Documentation to find the APIs supported by Flink AI Flow.

Design

If you are interested in the design principles of Flink AI Flow, please see the Design Documentation for more details.

Examples

You can refer to the examples of Flink AI Flow to better understand how to write a workflow. Please see the Examples directory.

Reporting bugs

If you encounter any issues, please open an issue on GitHub. We also encourage you to provide a fix through a GitHub pull request.

Contributing

We happily welcome contributions to Flink AI Flow. Please see our contribution guide for details.

Contact Us

For more information, you can join the Flink AI Flow Users Group on DingTalk to contact us. The DingTalk group number is 35876083.

You can also join the group by scanning the QR code below:

Comments
  • [Notification] Support CLI tool for notification server

    [Notification] Support CLI tool for notification server

    What is the purpose of the change

    Support CLI tool for notification server

    Brief change log

    • Support CLI tool for notification server

    Verifying this change

    This change added tests.

    opened by jiangxin369 2
  • [Notification] Add notification cli server command

    [Notification] Add notification cli server command

    What is the purpose of the change

    Add notification cli server command #170

    Brief change log

    • Add notification cli server command

    Verifying this change

    This change added tests.

    opened by Sxnan 2
  • [Airflow]Fix remote log handler bug

    [Airflow]Fix remote log handler bug

    What is the purpose of the change

    Fix #172

    Brief change log

    • Change the log_relative_path in remote log handlers like S3, GCS, and Azure when they init the context.
    • Add get_provider_info.py to pass Airflow test cases.
    • Modify old test cases in providers.

    Verifying this change

    This change modifies the old test cases to fit the new remote log handlers.

    opened by aqua7regia 2
  • [Airflow] The remote logging supports the current log mechanism

    [Airflow] The remote logging supports the current log mechanism

    FileTaskHandler, a Python log handler that writes and reads task instance logs, supports the current log mechanism, which uses seq_num and try_number to name the log file. Remote log handlers like S3TaskHandler should support this mechanism as well.

    bug Airflow 
    opened by SteNicholas 2
  • [hotfix] fix typo of spark operator

    [hotfix] fix typo of spark operator


    opened by jiangxin369 1
  • Daily uploading packages to PyPI

    Daily uploading packages to PyPI


    opened by jiangxin369 1
  • Add Tutorial Example

    Add Tutorial Example

    Brief change log

    • Add tutorial example and docs
    • Each workflow execution should have its own working directory to avoid file overriding

    Verifying this change

    This change is a trivial rework / code cleanup without any test coverage.

    opened by jiangxin369 1
  • Add documentations

    Add documentations


    opened by jiangxin369 1
  • Refactor Notification Event

    Refactor Notification Event

    What is the purpose of the change

    Refactor Notification Event

    Verifying this change

    This change is a trivial rework / code cleanup without any test coverage.

    opened by jiangxin369 1
  • Fix unittests about existed namespace

    Fix unittests about existed namespace

    What is the purpose of the change

    Fix unittests about existed namespace

    Verifying this change

    This change is a trivial rework / code cleanup without any test coverage.

    opened by jiangxin369 1
  • Add docs about concepts

    Add docs about concepts

    What is the purpose of the change

    Add docs about concepts

    Verifying this change

    This change is a trivial rework / code cleanup without any test coverage.

    opened by jiangxin369 1
  • Unifying TaskStatusEvent and TaskStatusChangedEvent

    Unifying TaskStatusEvent and TaskStatusChangedEvent


    opened by jiangxin369 0
  • Workflow Execution Status Incorrect

    Workflow Execution Status Incorrect

    Describe the bug

    Given task1 action_on_task_status(task2, success): when task1 fails, task2 will not be scheduled, but the status of the workflow execution is still running. I think the status should be failed.


    opened by jiangxin369 0
  • AIFlow support python3.6

    AIFlow support python3.6

    Describe the feature

    AIFlow support python3.6

    Additional context

    When running AIFlow with Python 3.6, the notification server blocks with the following logs:

    [2022-07-25 14:07:07,682 - server.py:68 [MainThread] - INFO: Notification server started.
    [2022-07-25 14:07:18,780 - server.py:189 [Thread-1] - ERROR: Lock is not acquired.
    Traceback (most recent call last):
      File "/root/venv_for_aiflow/lib64/python3.6/site-packages/notification_service/server.py", line 185, in _call_behavior_async
        return await behavior(argument, context), True
      File "/usr/lib64/python3.6/asyncio/coroutines.py", line 225, in coro
        res = yield from await_meth()
      File "/root/venv_for_aiflow/lib64/python3.6/site-packages/notification_service/service.py", line 221, in _list_all_events
        pass
      File "/usr/lib64/python3.6/asyncio/coroutines.py", line 212, in coro
        res = func(*args, **kw)
      File "/usr/lib64/python3.6/asyncio/locks.py", line 86, in __aexit__
        self.release()
      File "/usr/lib64/python3.6/asyncio/locks.py", line 207, in release
        raise RuntimeError('Lock is not acquired.')
    RuntimeError: Lock is not acquired.
    
    opened by jiangxin369 0
  • action_on_event_received only process events sent by current workflow execution

    action_on_event_received only process events sent by current workflow execution

    Describe the feature

    Currently, events are broadcast: once the expected event is sent, all workflow executions of this workflow are triggered.

    Describe the solution you'd like

    As the scheduler dispatcher inspects the context of each event to figure out whether it contains the workflow execution id, we can inject the runtime context of each task execution into the user-defined event to make sure the event only affects the current workflow execution.

    We need the following changes:

    • Add a global variable _CURRENT_TASK_CONTEXT in context.py; it is used to store the runtime context of each task execution.
    • To read and write _CURRENT_TASK_CONTEXT, add get_runtime_task_context and set_runtime_task_context functions.
    • Add a public API called wrap_execution_info_to_context to inject the runtime info into the context of the event before sending it.
    _CURRENT_TASK_CONTEXT: TaskExecutionContext = None
    
    
    def set_runtime_task_context(context: TaskExecutionContext):
        global _CURRENT_TASK_CONTEXT
        _CURRENT_TASK_CONTEXT = context
    
    
    def get_runtime_task_context():
        return _CURRENT_TASK_CONTEXT
    
    
    def wrap_execution_info_to_context(event: Event):
        """
        The event whose context is wrapped with workflow execution info will only be
        processed by the specific workflow execution.
        """
        pass
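
    For illustration, one possible body for wrap_execution_info_to_context is sketched below. It assumes that the event exposes a mutable context field and that TaskExecutionContext carries the workflow execution id; both attribute names are assumptions, not part of this proposal.

    def wrap_execution_info_to_context(event: Event):
        # Hypothetical sketch: `event.context` and
        # `TaskExecutionContext.workflow_execution_id` are assumed attribute names.
        task_context = get_runtime_task_context()
        if task_context is not None:
            # Tag the event so that the scheduler dispatcher routes it only to
            # the current workflow execution.
            event.context = str(task_context.workflow_execution_id)
        return event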
    

    How to use it?

    def func():
        notification_client = get_notification_client()
        event = Event(event_key=EVENT_KEY, message='This is a custom message.')
        
        # wrap event with context
        wrap_execution_info_to_context(event)
        # send event
        notification_client.send_event(event)
    
    with Workflow(name='workflow') as w1:
        task = PythonOperator(name='task', python_callable=func)
    


    opened by jiangxin369 0
  • [NotificationService] Cannot use one EmbeddedNotificationClient instance in multiple threads

    [NotificationService] Cannot use one EmbeddedNotificationClient instance in multiple threads

    Describe the bug

    Cannot use one EmbeddedNotificationClient instance in multiple threads.


    To Reproduce

        def test_create_client_multiple_threads(self):
            import threading
            from notification_service.embedded_notification_client import EmbeddedNotificationClient
            from notification_service.event import Event, EventKey
    
            client = EmbeddedNotificationClient(server_uri="localhost:50052", namespace='default', sender='sender')
    
            def send_event():
                event = Event(event_key=EventKey('key1'), message='a')
                client.send_event(event)
                print(client.sequence_number)
            threads = []
            for i in range(100):
                thread = threading.Thread(target=send_event)
                threads.append(thread)
            for t in threads:
                t.start()
            for t in threads:
                t.join()
    

    Expected behavior

    100 events sent.

    Actual behavior

    Fewer than 100 events are sent.


    Additional context
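
    A possible workaround sketch (not part of the original report): serialize calls to the shared client with a lock, reusing only the classes already shown in the reproduction above.

    import threading

    from notification_service.embedded_notification_client import EmbeddedNotificationClient
    from notification_service.event import Event, EventKey

    client = EmbeddedNotificationClient(server_uri="localhost:50052", namespace='default', sender='sender')
    client_lock = threading.Lock()

    def send_event_safely(event: Event):
        # Only one thread talks to the shared client at a time.
        with client_lock:
            client.send_event(event)

    # Inside each worker thread:
    # send_event_safely(Event(event_key=EventKey('key1'), message='a'))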

    opened by jiangxin369 0
Releases (release-0.3.1)
  • release-0.3.1 (Feb 22, 2022)

    Features

    1. Flink job plugin supports multiple Flink versions #236
    2. Support stopping/resuming job scheduling #241
    3. Support log view of job executions in the frontend #251
    4. Improve documentation

    Bug Fixes

    1. Airflow operator context is not set correctly #250
    2. Failed to trigger workflow execution #260
    3. Fix updating the deployed model version multiple times #269
    4. Fix register_model_version sending wrong event type bug #290

    You are welcome to use this release and give us feedback on AIFlow.

    Source code(tar.gz)
    Source code(zip)
  • release-0.3.0 (Dec 16, 2021)

    Features

    AIFlow

    1. Introduces a command-line interface to help with operations.
    2. Supports database version migration.
    3. Supports workflow development in Jupyter Notebook #140

    Notification Service

    1. Introduces a command-line interface to help with operations.
    2. Supports database version migration.

    Bug Fixes

    1. The remote logging supports the current log mechanism #172
    2. Unittest failure caused by init_ai_flow_context #185
    3. AIFlow Webserver cannot sort model versions by version #207

    You are welcome to use this release and give us feedback on AIFlow.

    Source code(tar.gz)
    Source code(zip)
    examples.tar.gz(32.95 MB)
  • release-0.2.2 (Nov 19, 2021)

    Features

    AIFlow

    1. Add the documentation of AIFlow. #117

    Airflow

    1. Allow LocalExecutor to run with SQLite. #41
    2. Airflow webserver supports the Airflow and Notification databases. #49

    Notification Service

    1. Introduces the countEvents interface. #11

    Bug Fixes

    1. Fix OSS blob manager concurrent download. #3
    2. Deep-copy tasks in the Celery executor to avoid a race condition. #12
    3. Fix periodic workflows that cannot run. #77
    4. Fix HDFSBlobManager failing to download an existing file. #87
    5. Remove the uncompleted API action_on_dataset_event. #104
    6. The workflow directory is set incorrectly in the AIFlow runtime. #111

    You are welcome to use this release and give us feedback on AIFlow.

    Source code(tar.gz)
    Source code(zip)
    examples.tar.gz(32.93 MB)
  • release-0.2.0 (Feb 9, 2022)

    Features

    AIFlow

    1. Add workflow execution on event and ContextExtractor API #476
    2. Add task execution RESTful API #478
    3. AIFlow adds WorkflowEventManager to listen for and handle events #492
    4. Support starting a new workflow execution with context #479
    5. Introduce the workflow frontend of the AIFlow UI #509
    6. Add FlinkSqlProcessor #527
    7. Support job execution labels #529
    8. Frontend supports metadata UI #533
    9. Notification service supports idempotence #553
    10. Add read-only job plugin #555

    Airflow

    1. Support Celery executor on the event-based scheduler #482
    2. Add AirFlowScheduler with the Airflow RESTful API #486

    Bug Fixes

    1. Make AI Flow able to use Notification Service with HA enabled #510
    2. Duplicated entry when creating a dagrun #570
    3. EventBaseScheduler catches and prints exceptions #586
    4. Scheduler should find schedulable tasks once a dagrun finishes #587
    5. EventBaseScheduler would trigger tasks multiple times incorrectly #598
    Source code(tar.gz)
    Source code(zip)
    ai-flow-examples.tar.gz(32.93 MB)
  • release-0.1.0 (Feb 9, 2022)

Owner
A neutral organization to host ecosystem projects for Apache Flink