This repository is home to the Optimus data transformation plugins for various data processing needs.

Overview

Transformers

test workflow build workflow

Optimus's transformation plugins are implementations of Task and Hook interfaces that allows execution of arbitrary jobs in optimus.

To install plugins via homebrew

brew tap odpf/taps
brew install optimus-plugins-odpf

To install plugins via shell

curl -sL ${PLUGIN_RELEASE_URL} | tar xvz
chmod +x optimus-*
mv optimus-* /usr/bin/
Comments
  • Fix: fix ignoreupstream helper for big query view

    Fix: fix ignoreupstream helper for big query view

    Hello, Currently, for any query, we try to find the dependancy and ignoredependancy with FindDependenciesWithRegex and then we again pull the Refereced table with big query dry run.

    If query contains view which is marked with /* @ignoreupstream */ helper, then ignoredependancy will contain the view name but not the table referenced by view.

    The change here is to revise ignoredependancy list with table referenced by view.

    I kept the loop execution in sequential manner, please let me know if should add concurrency here

    enhancement 
    opened by SumitAgrawal03071989 2
  • @ignoreupstream ineffective on big query view

    @ignoreupstream ineffective on big query view

    We have a query referencing to table as well as view. select * from proj.dataset.table t1 left join proj.dataset.view v1 on t1.date = v1.date and t1.id = v1.id

    • now if we apply @ignoreupstream helper on table proj.dataset.table then it correctly ignores to create upstream dependancy for this table.
    • But if we apply @ignoreupstream helper on view proj.dataset.view ( note the view query refers to 2 more tables ) then it does not ignore view or table referenced by view.
    opened by SumitAgrawal03071989 2
  • feat : migrate plugins for the inti-container changes in optimus

    feat : migrate plugins for the inti-container changes in optimus

    As per Optimus PR, the executor boot process is standardised and maintained at optimus. Plugin devs need no longer have to wrap the executor image. closes odpf/optimus#405

    opened by smarch-int 1
  • monthly job didn't run for the last day of month

    monthly job didn't run for the last day of month

    Hi team,

    I have a bq2bq job with window configuration

      window:
        size: 720h
        offset: -48h
        truncate_to: M
    

    I expect to have transformation for date 01 to last day of the month, e.g on April, I expect got transformation from date 01 - 30. but currently only got transformation from date 01 - 29

    [2022-06-13 15:08:12,323] {pod_launcher.py:149} INFO - [2022-06-13 15:08:12] INFO:bumblebee.transformation: create transformation for partition: 2022-04-26 00:00:00+00:00
    [2022-06-13 15:08:12,323] {pod_launcher.py:149} INFO - [2022-06-13 15:08:12] INFO:bumblebee.transformation: create transformation for partition: 2022-04-27 00:00:00+00:00
    [2022-06-13 15:08:12,323] {pod_launcher.py:149} INFO - [2022-06-13 15:08:12] INFO:bumblebee.transformation: create transformation for partition: 2022-04-28 00:00:00+00:00
    [2022-06-13 15:08:12,323] {pod_launcher.py:149} INFO - [2022-06-13 15:08:12] INFO:bumblebee.transformation: create transformation for partition: 2022-04-29 00:00:00+00:00
    [2022-06-13 15:08:12,324] {pod_launcher.py:149} INFO - [2022-06-13 15:08:12] INFO:bumblebee.transformation: start transformation job
    [2022-06-13 15:08:12,324] {pod_launcher.py:149} INFO - [2022-06-13 15:08:12] INFO:bumblebee.transformation: sql transformation query:
    

    after checking, I suspect the logic may related to this line, where the last day generated by windows class not included as the transformation partition.

    https://github.com/odpf/transformers/blob/ea1de4f0de3d17d9be7ccefb1e2f3beab1a685f1/task/bq2bq/executor/bumblebee/transformation.py#L393

    please kindly check it, and release the fix. thank you

    opened by novanxyz 1
  • feat: add support for secret env vars

    feat: add support for secret env vars

    With this we are adding support for using secrets in macros, we do not want to print the env vars in the logs, so exporting them as a separate file from optimus.

    Plugins can export this extra file to get env vars.

    opened by sbchaos 1
  • feat : remove wrapper image and use bq2bq executor image in plugin

    feat : remove wrapper image and use bq2bq executor image in plugin

    As per https://github.com/odpf/optimus/pull/425, the executor boot process is standardised and maintained at optimus. Plugin devs need no longer have to wrap the executor image. closes https://github.com/odpf/optimus/issues/405

    opened by smarch-int 0
  • Generate Dependencies is using the dry run apis which is bound to fail with macros

    Generate Dependencies is using the dry run apis which is bound to fail with macros

    The most intuitive way is to parse the query and hit the metadata apis instead of going through the dry run which should be definitly costly then the metadata fetch apis.

    enhancement performance 
    opened by sravankorumilli 0
  • BQ2BQ Replace load dispostion doesn't handle aggregations

    BQ2BQ Replace load dispostion doesn't handle aggregations

    Options

    1. Add an extra option in Replace load dispostition to take input from users to replace a specific or range of partitions using literals / all , dstart, dend. Default is all : all represents splitting of query to multiple partitions from dstart to dend.
    2. Use a new Load Disposition, to replace to a single destination partition which is window start
    bug 
    opened by sravankorumilli 0
Releases(v0.2.1)
Owner
Open Data Platform
Next-gen collaborative, domain-driven and distributed data platform
Open Data Platform
PyTorch implementation of the NIPS-17 paper "Poincaré Embeddings for Learning Hierarchical Representations"

Poincaré Embeddings for Learning Hierarchical Representations PyTorch implementation of Poincaré Embeddings for Learning Hierarchical Representations

Facebook Research 1.6k Dec 29, 2022
This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm, corresponding to the paper Fully Supervised Speaker Diarization.

UIS-RNN Overview This is the library for the Unbounded Interleaved-State Recurrent Neural Network (UIS-RNN) algorithm. UIS-RNN solves the problem of s

Google 1.4k Dec 28, 2022
Natural Language Processing for Adverse Drug Reaction (ADR) Detection

Natural Language Processing for Adverse Drug Reaction (ADR) Detection This repo contains code from a project to identify ADRs in discharge summaries a

Medicines Optimisation Service - Austin Health 21 Aug 05, 2022
Yet Another Neural Machine Translation Toolkit

YANMTT YANMTT is short for Yet Another Neural Machine Translation Toolkit. For a backstory how I ended up creating this toolkit scroll to the bottom o

Raj Dabre 121 Jan 05, 2023
SGMC: Spectral Graph Matrix Completion

SGMC: Spectral Graph Matrix Completion Code for AAAI21 paper "Scalable and Explainable 1-Bit Matrix Completion via Graph Signal Learning". Data Format

Chao Chen 8 Dec 12, 2022
Repository for Graph2Pix: A Graph-Based Image to Image Translation Framework

Graph2Pix: A Graph-Based Image to Image Translation Framework Installation Install the dependencies in env.yml $ conda env create -f env.yml $ conda a

18 Nov 17, 2022
Just Another Telegram Ai Chat Bot Written In Python With Pyrogram.

OkaeriChatBot Just another Telegram AI chat bot written in Python using Pyrogram. Requirements Python 3.7 or higher.

Wahyusaputra 2 Dec 23, 2021
An open collection of annotated voices in Japanese language

声庭 (Koniwa): オープンな日本語音声とアノテーションのコレクション Koniwa (声庭): An open collection of annotated voices in Japanese language 概要 Koniwa(声庭)は利用・修正・再配布が自由でオープンな音声とアノテ

Koniwa project 32 Dec 14, 2022
Takes a string and puts it through different languages in Google Translate a requested amount of times, returning nonsense.

PythonTextObfuscator Takes a string and puts it through different languages in Google Translate a requested amount of times, returning nonsense. Requi

2 Aug 29, 2022
Honor's thesis project analyzing whether the GPT-2 model can more effectively generate free-verse or structured poetry.

gpt2-poetry The following code is for my senior honor's thesis project, under the guidance of Dr. Keith Holyoak at the University of California, Los A

Ashley Kim 2 Jan 09, 2022
Pytorch code for ICRA'21 paper: "Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation"

Hierarchical Cross-Modal Agent for Robotics Vision-and-Language Navigation This repository is the pytorch implementation of our paper: Hierarchical Cr

44 Jan 06, 2023
Code for "Generative adversarial networks for reconstructing natural images from brain activity".

Reconstruct handwritten characters from brains using GANs Example code for the paper "Generative adversarial networks for reconstructing natural image

K. Seeliger 2 May 17, 2022
Healthsea is a spaCy pipeline for analyzing user reviews of supplementary products for their effects on health.

Welcome to Healthsea ✨ Create better access to health with spaCy. Healthsea is a pipeline for analyzing user reviews to supplement products by extract

Explosion 75 Dec 19, 2022
Python bindings to the dutch NLP tool Frog (pos tagger, lemmatiser, NER tagger, morphological analysis, shallow parser, dependency parser)

Frog for Python This is a Python binding to the Natural Language Processing suite Frog. Frog is intended for Dutch and performs part-of-speech tagging

Maarten van Gompel 46 Dec 14, 2022
ReCoin - Restoring our environment and businesses in parallel

Shashank Ojha, Sabrina Button, Abdellah Ghassel, Joshua Gonzales "Reduce Reuse R

sabrina button 1 Mar 14, 2022
Smart discord chatbot integrated with Dialogflow to manage different classrooms and assist in teaching!

smart-school-chatbot Smart discord chatbot integrated with Dialogflow to interact with students naturally and manage different classes in a school. De

Tom Huynh 5 Oct 24, 2022
Treemap visualisation of Maya scene files

Ever wondered which nodes are responsible for that 600 mb+ Maya scene file? Features Fast, resizable UI Parsing at 50 mb/sec Dependency-free, single-f

Marcus Ottosson 76 Nov 12, 2022
Unofficial PyTorch implementation of Google AI's VoiceFilter system

VoiceFilter Note from Seung-won (2020.10.25) Hi everyone! It's Seung-won from MINDs Lab, Inc. It's been a long time since I've released this open-sour

MINDs Lab 881 Jan 03, 2023
Machine Psychology: Python Generated Art

Machine Psychology: Python Generated Art A limited collection of 64 algorithmically generated artwork. Each unique piece is then given a title by the

Pixegami Team 67 Dec 13, 2022
Use the state-of-the-art m2m100 to translate large data on CPU/GPU/TPU. Super Easy!

Easy-Translate is a script for translating large text files in your machine using the M2M100 models from Facebook/Meta AI. We also privide a script fo

Iker García-Ferrero 41 Dec 15, 2022