This repository is home to the Optimus data transformation plugins for various data processing needs.

Overview

Transformers


Optimus's transformation plugins are implementations of the Task and Hook interfaces that allow execution of arbitrary jobs in Optimus.

To install plugins via Homebrew:

brew tap odpf/taps
brew install optimus-plugins-odpf

To install plugins from the shell:

curl -sL ${PLUGIN_RELEASE_URL} | tar xvz
chmod +x optimus-*
mv optimus-* /usr/bin/
Comments
  • Fix: fix ignoreupstream helper for BigQuery view

    Hello, currently, for any query we find the dependencies and ignored dependencies with FindDependenciesWithRegex, and then we pull the referenced tables again with a BigQuery dry run.

    If a query contains a view that is marked with the /* @ignoreupstream */ helper, the ignored-dependency list will contain the view name but not the tables referenced by the view.

    This change revises the ignored-dependency list to include the tables referenced by the view.

    I kept the loop sequential; please let me know if I should add concurrency here.
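    A minimal Python sketch of the idea in this PR (not the plugin's own code): expand the ignored set with the tables each ignored view references, then filter the dry-run references against it. `view_refs` is a hypothetical lookup standing in for however the plugin resolves a view's referenced tables, and the example assumes the dry run reports the view's two underlying tables.

```python
from typing import Dict, Iterable, List, Set


def revise_ignored(ignored: Iterable[str], view_refs: Dict[str, Set[str]]) -> Set[str]:
    """Expand the ignored-upstream names with the tables that ignored views reference.

    `view_refs` is a hypothetical lookup from a view name to the tables its
    query references; how the plugin obtains it is not shown here.
    """
    revised: Set[str] = set()
    pending = list(ignored)
    while pending:  # sequential, like the loop described in the PR
        name = pending.pop()
        if name in revised:
            continue
        revised.add(name)
        # If this name is a view, queue its referenced tables as ignored too;
        # views built on other views are followed transitively.
        pending.extend(view_refs.get(name, set()))
    return revised


def filter_dependencies(deps: Iterable[str], ignored: Set[str]) -> List[str]:
    """Drop every dependency that appears in the ignored set."""
    return sorted(d for d in deps if d not in ignored)


# Example (assumed): the dry run reports the table plus the view's two base tables.
view_refs = {"proj.dataset.view": {"proj.dataset.a", "proj.dataset.b"}}
ignored = revise_ignored({"proj.dataset.view"}, view_refs)
dry_run_refs = {"proj.dataset.table", "proj.dataset.a", "proj.dataset.b"}
print(filter_dependencies(dry_run_refs, ignored))  # ['proj.dataset.table']
```

    Following views transitively is a design choice of this sketch; if nested views should not be expanded, the loop can be replaced by a single lookup per ignored name.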

    enhancement 
    opened by SumitAgrawal03071989 2
  • @ignoreupstream ineffective on BigQuery view

    We have a query that references both a table and a view:

    select * from proj.dataset.table t1 left join proj.dataset.view v1 on t1.date = v1.date and t1.id = v1.id

    • If we apply the @ignoreupstream helper to the table proj.dataset.table, it correctly skips creating an upstream dependency for that table.
    • But if we apply the @ignoreupstream helper to the view proj.dataset.view (note that the view's query refers to two more tables), it ignores neither the view nor the tables the view references.
    opened by SumitAgrawal03071989 2
  • feat: migrate plugins for the init-container changes in Optimus

    As per an Optimus PR, the executor boot process is now standardised and maintained in Optimus, so plugin developers no longer need to wrap the executor image. Closes odpf/optimus#405.

    opened by smarch-int 1
  • Monthly job didn't run for the last day of the month

    Hi team,

    I have a bq2bq job with the following window configuration:

      window:
        size: 720h
        offset: -48h
        truncate_to: M
    

    I expect transformations for every date from the 1st to the last day of the month. For example, for April I expect transformations for dates 01 to 30, but currently I only get transformations for dates 01 to 29.

    [2022-06-13 15:08:12,323] {pod_launcher.py:149} INFO - [2022-06-13 15:08:12] INFO:bumblebee.transformation: create transformation for partition: 2022-04-26 00:00:00+00:00
    [2022-06-13 15:08:12,323] {pod_launcher.py:149} INFO - [2022-06-13 15:08:12] INFO:bumblebee.transformation: create transformation for partition: 2022-04-27 00:00:00+00:00
    [2022-06-13 15:08:12,323] {pod_launcher.py:149} INFO - [2022-06-13 15:08:12] INFO:bumblebee.transformation: create transformation for partition: 2022-04-28 00:00:00+00:00
    [2022-06-13 15:08:12,323] {pod_launcher.py:149} INFO - [2022-06-13 15:08:12] INFO:bumblebee.transformation: create transformation for partition: 2022-04-29 00:00:00+00:00
    [2022-06-13 15:08:12,324] {pod_launcher.py:149} INFO - [2022-06-13 15:08:12] INFO:bumblebee.transformation: start transformation job
    [2022-06-13 15:08:12,324] {pod_launcher.py:149} INFO - [2022-06-13 15:08:12] INFO:bumblebee.transformation: sql transformation query:
    

    After checking, I suspect the issue is related to this line, where the last day generated by the window class is not included as a transformation partition:

    https://github.com/odpf/transformers/blob/ea1de4f0de3d17d9be7ccefb1e2f3beab1a685f1/task/bq2bq/executor/bumblebee/transformation.py#L393

    Please check it and release a fix. Thank you.
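    The reporter suspects an off-by-one at the end of the window. The Python sketch below illustrates that hypothesis only: it assumes the window resolves to a start of 2022-04-01 and a last day of 2022-04-30 (hypothetical values; it does not model how Optimus or bumblebee derive boundaries from size, offset and truncate_to). `daily_partitions` is an illustrative helper, not bumblebee's code. Treating the last day as an exclusive bound yields exactly the 01 to 29 range seen in the logs.

```python
from datetime import datetime, timedelta
from typing import List


def daily_partitions(start: datetime, end: datetime, inclusive_end: bool) -> List[datetime]:
    """Return one timestamp per day from start up to end."""
    days = []
    current = start
    # The only difference between the two modes is whether `end` itself is kept.
    while (current <= end) if inclusive_end else (current < end):
        days.append(current)
        current += timedelta(days=1)
    return days


# Hypothetical April boundaries; 720h in the config is 30 days of 24h.
start = datetime(2022, 4, 1)
last_day = datetime(2022, 4, 30)

exclusive = daily_partitions(start, last_day, inclusive_end=False)
inclusive = daily_partitions(start, last_day, inclusive_end=True)
print(exclusive[-1].date(), len(exclusive))  # 2022-04-29 29
print(inclusive[-1].date(), len(inclusive))  # 2022-04-30 30
```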

    opened by novanxyz 1
  • feat: add support for secret env vars

    This adds support for using secrets in macros. Since we do not want to print these env vars in the logs, Optimus exports them as a separate file.

    Plugins can export the variables in this extra file to get the env vars.
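    The name, location and format of the secrets file are not stated here. Assuming a plain KEY=VALUE file (optionally prefixed with `export`), a plugin could load it roughly as in this Python sketch, logging only variable names and never values. `load_env_file` and the demo file contents are hypothetical.

```python
import os
import tempfile
from pathlib import Path
from typing import List


def load_env_file(path: str) -> List[str]:
    """Load KEY=VALUE lines from `path` into os.environ.

    Returns only the variable names, so callers can log which secrets were
    loaded without printing their values. Quotes around values are not stripped.
    """
    names = []
    for raw in Path(path).read_text().splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blank lines, comments and malformed lines
        key, value = line.split("=", 1)  # values may themselves contain "="
        key = key.strip()
        if key.startswith("export "):
            key = key[len("export "):].strip()
        os.environ[key] = value
        names.append(key)
    return names


# Demo with a throwaway file; the variable names are made up.
with tempfile.NamedTemporaryFile("w", suffix=".env", delete=False) as f:
    f.write("# secrets\nDB_PASSWORD=s3cr3t\nexport API_TOKEN=a=b\n")
    demo_path = f.name
loaded = load_env_file(demo_path)
print(loaded)  # ['DB_PASSWORD', 'API_TOKEN']
os.remove(demo_path)
```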

    opened by sbchaos 1
  • feat: remove wrapper image and use bq2bq executor image in plugin

    As per https://github.com/odpf/optimus/pull/425, the executor boot process is now standardised and maintained in Optimus, so plugin developers no longer need to wrap the executor image. Closes https://github.com/odpf/optimus/issues/405.

    opened by smarch-int 0
  • Generate Dependencies uses the dry-run APIs, which are bound to fail with macros

    The most intuitive approach is to parse the query and call the metadata APIs instead of going through the dry run, which should be considerably more costly than the metadata fetch APIs.
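    A naive Python sketch of the parsing half of that proposal, using a regular expression as a stand-in for a real SQL parser (an assumption: it ignores comments, CTE names and subqueries). Because the query is never sent to BigQuery, macros elsewhere in the SQL do not make it fail. The extracted names could then be resolved through BigQuery's table metadata APIs; that lookup is not shown. `TABLE_REF` and `referenced_tables` are hypothetical names.

```python
import re
from typing import List

# A fully qualified name after FROM or JOIN, optionally wrapped in backticks.
TABLE_REF = re.compile(
    r"\b(?:FROM|JOIN)\s+`?([\w-]+)\.(\w+)\.(\w+)`?",
    re.IGNORECASE,
)


def referenced_tables(query: str) -> List[str]:
    """Return unique project.dataset.table references in first-seen order."""
    seen: List[str] = []
    for project, dataset, table in TABLE_REF.findall(query):
        name = f"{project}.{dataset}.{table}"
        if name not in seen:
            seen.append(name)
    return seen


query = """
select * from `proj.dataset.table` t1
left join proj.dataset.view v1 on t1.date = v1.date
"""
print(referenced_tables(query))  # ['proj.dataset.table', 'proj.dataset.view']
```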

    enhancement performance 
    opened by sravankorumilli 0
  • BQ2BQ Replace load disposition doesn't handle aggregations

    Options

    1. Add an extra option to the Replace load disposition that takes input from users to replace a specific partition or a range of partitions, using literals or all, dstart, dend. The default is all, which splits the query into multiple partitions from dstart to dend.
    2. Use a new load disposition that replaces a single destination partition: the window start.
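    To make the two options concrete, here is a Python sketch of how target partitions could be chosen under each. The mode names, daily granularity and exclusive dend are assumptions for illustration, not existing bq2bq configuration.

```python
from datetime import date, timedelta
from typing import List, Optional, Sequence


def partitions_to_replace(dstart: date, dend: date, mode: str = "all",
                          literals: Optional[Sequence[date]] = None) -> List[date]:
    """Choose destination partitions for a hypothetical REPLACE setting.

    mode="all":          one daily partition per day in [dstart, dend) (option 1 default)
    mode="literals":     exactly the partitions the user listed        (option 1)
    mode="window_start": a single partition at dstart                  (option 2)
    """
    if mode == "all":
        return [dstart + timedelta(days=i) for i in range((dend - dstart).days)]
    if mode == "literals":
        return sorted(literals or [])
    if mode == "window_start":
        return [dstart]
    raise ValueError(f"unknown mode: {mode!r}")


# A 3-day window: "all" splits per day, while an aggregate over the whole
# window would be written once, to the window-start partition.
daily = partitions_to_replace(date(2022, 5, 1), date(2022, 5, 4))
single = partitions_to_replace(date(2022, 5, 1), date(2022, 5, 4), mode="window_start")
print(daily)   # [datetime.date(2022, 5, 1), datetime.date(2022, 5, 2), datetime.date(2022, 5, 3)]
print(single)  # [datetime.date(2022, 5, 1)]
```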
    bug 
    opened by sravankorumilli 0
Releases (v0.2.1)
Owner
Open Data Platform
Next-gen collaborative, domain-driven and distributed data platform
ttslearn: Library for Pythonで学ぶ音声合成 (Text-to-speech with Python)

ttslearn: Library for Pythonで学ぶ音声合成 (Text-to-speech with Python) 日本語は以下に続きます (Japanese follows) English: This book is written in Japanese and primaril

Ryuichi Yamamoto 189 Dec 29, 2022
Kerberoast with ACL abuse capabilities

targetedKerberoast targetedKerberoast is a Python script that can, like many others (e.g. GetUserSPNs.py), print "kerberoast" hashes for user accounts

Shutdown 213 Dec 22, 2022
Higher quality textures for the Metal Gear Solid series.

Metal Gear Solid: HD Textures Higher quality textures for the Metal Gear Solid series. The goal is to maximize the quality of assets that the engine w

Samantha 6 Dec 06, 2022
ReCoin - Restoring our environment and businesses in parallel

Shashank Ojha, Sabrina Button, Abdellah Ghassel, Joshua Gonzales "Reduce Reuse R

sabrina button 1 Mar 14, 2022
Code repository for "It's About Time: Analog clock Reading in the Wild"

it's about time Code repository for "It's About Time: Analog clock Reading in the Wild" Packages required: pytorch (used 1.9, any reasonable version s

52 Nov 10, 2022
CLIPfa: Connecting Farsi Text and Images

CLIPfa: Connecting Farsi Text and Images OpenAI released the paper Learning Transferable Visual Models From Natural Language Supervision in which they

Sajjad Ayoubi 66 Dec 14, 2022
Command Line Text-To-Speech using Google TTS

cli-tts Thanks to gTTS by @pndurette! This is an interactive command line text-to-speech tool using Google TTS. Just type text and the voice will be p

ReekyStive 3 Nov 11, 2022
GraphNLI: A Graph-based Natural Language Inference Model for Polarity Prediction in Online Debates

GraphNLI: A Graph-based Natural Language Inference Model for Polarity Prediction in Online Debates Vibhor Agarwal, Sagar Joglekar, Anthony P. Young an

Vibhor Agarwal 2 Jun 30, 2022
Nateve compiler developed with python.

Adam Adam is a Nateve Programming Language compiler developed using Python. Nateve Nateve is a new general domain programming language open source ins

Nateve 7 Jan 15, 2022
UniSpeech - Large Scale Self-Supervised Learning for Speech

UniSpeech The family of UniSpeech: WavLM (arXiv): WavLM: Large-Scale Self-Supervised Pre-training for Full Stack Speech Processing UniSpeech (ICML 202

Microsoft 281 Dec 15, 2022
Indobenchmark are collections of Natural Language Understanding (IndoNLU) and Natural Language Generation (IndoNLG)

Indobenchmark Toolkit Indobenchmark are collections of Natural Language Understanding (IndoNLU) and Natural Language Generation (IndoNLG) resources fo

Samuel Cahyawijaya 11 Aug 26, 2022
Improving Elasticsearch for semantic search using Sentence Embeddings

Improving Elasticsearch for semantic search using Sentence Embeddings. In this post, I will use the pretrained model SimCS

Vo Van Phuc 18 Nov 25, 2022
An assignment from my grad-level data mining course demonstrating some experience with NLP/neural networks/Pytorch

NLP-Pytorch-Assignment An assignment from my grad-level data mining course (before I started personal projects) demonstrating some experience with NLP

David Thorne 0 Feb 06, 2022
Compute distance between sequences. 30+ algorithms, pure python implementation, common interface, optional external libs usage.

TextDistance TextDistance -- python library for comparing distance between two or more sequences by many algorithms. Features: 30+ algorithms Pure pyt

Life4 3k Jan 06, 2023
Phrase-BERT: Improved Phrase Embeddings from BERT with an Application to Corpus Exploration

Phrase-BERT: Improved Phrase Embeddings from BERT with an Application to Corpus Exploration This is the official repository for the EMNLP 2021 long pa

70 Dec 11, 2022
An attempt to preserve the Marathi language. An English to Marathi dictionary. A lightweight and ad-free English to Marathi thesaurus.

For English, scroll down. Marathi Words: I started this open-source project to save the Marathi language. In my opinion, our language is slowly and, without anyone noticing,

मुक्त स्त्रोत 20 Oct 11, 2022
Torchrecipes provides a set of reproducible, reusable, ready-to-run RECIPES for training different types of models, across multiple domains, on PyTorch Lightning.

Recipes are a standard, well supported set of blueprints for machine learning engineers to rapidly train models using the latest research techniques without significant engineering overhead. Specifica

Meta Research 193 Dec 28, 2022
This repository contains the codes for LipGAN. LipGAN was published as a part of the paper titled "Towards Automatic Face-to-Face Translation".

LipGAN Generate realistic talking faces for any human speech and face identity. [Paper] | [Project Page] | [Demonstration Video] Important Update: A n

Rudrabha Mukhopadhyay 438 Dec 31, 2022
Quick insights from Zoom meeting transcripts using Graph + NLP

Transcript Analysis - Graph + NLP This program extracts insights from Zoom Meeting Transcripts (.vtt) using TigerGraph and NLTK. In order to run this

Advit Deepak 7 Sep 17, 2022
Yuqing Xie 2 Feb 17, 2022