Extracts data from the database for a graph-node and stores it in parquet files

Overview

subgraph-extractor

Extracts data from the database for a graph-node and stores it in parquet files

Installation

For developing, it's recommended to use conda to create an environment.

Create one with python 3.9

conda create --name subgraph-extractor python=3.9

Now activate it

conda activate subgraph-extractor

Install the dev packages (note there is no space after the .)

pip install -e .[dev]

Use

Now you can use the main entrypoint, see help for more details

subgraph_extractor --help

Creating a config files

The easiest way to start is to use the interactive subgraph config generator.

Start by launching the subgraph config generator with the location you want to write the config file to.

subgraph_config_generator --config-location subgraph_config.yaml

It will default to using a local graph-node with default username & password (postgresql://graph-node:[email protected]:5432/graph-node) If you are connecting to something else you need to specify the database connection string with --database-string.

You will then be asked to select:

  • The relevant subgraph
  • From the subgraph, which tables to extract (multi-select)
  • For each table, which column to partition on (this is typically the block number or timestamp)
  • Any numeric columns that require mapping to another type * see note below

Numeric column mappings

Uint256 is a common data type in contracts but rare in most data processing tools. The graph node creates a Postgres Numeric column for any field marked as a BigInt as it is capable of accurately storing uint256s (a common data type in solidity).

However, many downstream tools cannot handle these as numbers.

By default, these columns will be exported as bytes - a lossless representation but one that is not as usable for sums, averages, etc. This is fine for some data, such as addresses or where the field is used to pack data (e.g. the tokenIds for decentraland).

For other use cases, the data must be converted to another type. In the config file, you can specify numeric columns that need to be mapped to another type:

column_mappings:
  my_original_column_name:
    my_new_column_name:
      type: uint64

However, if the conversion does not work (e.g. the number is too large), the extraction will stop with an error. This is fine for cases where you know the range (e.g. timestamp or block number). For other cases you can specify a maximum value, default and a column to store whether the row was at most the maximum value:

column_mappings:
  my_original_column_name:
    my_new_column_name:
      type: uint64
      max_value: 18446744073709551615
      default: 0
      validity_column: new_new_column_name_valid

If the number is over 18446744073709551615, there will be a 0 stored in the column my_new_column_name and FALSE stored in new_new_column_name_valid.

If your numbers are too large but can be safely lowered for your usecase (e.g. converting from wei to gwei) you can provide a downscale value:

column_mappings:
  transfer_fee_wei:
    transfer_fee_gwei:
      downscale: 1000000000
      type: uint64
      max_value: 18446744073709551615
      default: 0
      validity_column: transfer_fee_gwei_valid

This will perform an integer division (divide and floor) the original value. WARNING this is a lossy conversion.

You may have as many mappings for a single column as you want, and the original will always be present as bytes.

The following numeric types are allowed:

  • int8, int16, int32, int64
  • uint8, uint16, uint32, uint64
  • float32, float64
  • Numeric38 (this is a numeric/Decimal column with 38 digits of precision)

Contributing

Please format everything with black and isort

black . && isort --profile=black .
Owner
Cardstack
Experience Web 3.0.
Cardstack
Attack classification models with transferability, black-box attack; unrestricted adversarial attacks on imagenet

Attack classification models with transferability, black-box attack; unrestricted adversarial attacks on imagenet, CVPR2021 安全AI挑战者计划第六期:ImageNet无限制对抗攻击 决赛第四名(team name: Advers)

51 Dec 01, 2022
Temporal Knowledge Graph Reasoning Triggered by Memories

MTDM Temporal Knowledge Graph Reasoning Triggered by Memories To alleviate the time dependence, we propose a memory-triggered decision-making (MTDM) n

4 Sep 25, 2022
Message Passing on Cell Complexes

CW Networks This repository contains the code used for the papers Weisfeiler and Lehman Go Cellular: CW Networks (Under review) and Weisfeiler and Leh

Twitter Research 108 Jan 05, 2023
A 1.3B text-to-image generation model trained on 14 million image-text pairs

minDALL-E on Conceptual Captions minDALL-E, named after minGPT, is a 1.3B text-to-image generation model trained on 14 million image-text pairs for no

Kakao Brain 604 Dec 14, 2022
DynaTune: Dynamic Tensor Program Optimization in Deep Neural Network Compilation

DynaTune: Dynamic Tensor Program Optimization in Deep Neural Network Compilation This repository is the implementation of DynaTune paper. This folder

4 Nov 02, 2022
This is the official code of L2G, Unrolling and Recurrent Unrolling in Learning to Learn Graph Topologies.

Learning to Learn Graph Topologies This is the official code of L2G, Unrolling and Recurrent Unrolling in Learning to Learn Graph Topologies. Requirem

Stacy X PU 16 Dec 09, 2022
Open-source Monocular Python HawkEye for Tennis

Tennis Tracking 🎾 Objectives Track the ball Detect court lines Detect the players To track the ball we used TrackNet - deep learning network for trac

ArtLabs 188 Jan 08, 2023
Official code for MPG2: Multi-attribute Pizza Generator: Cross-domain Attribute Control with Conditional StyleGAN

This is the official code for Multi-attribute Pizza Generator (MPG2): Cross-domain Attribute Control with Conditional StyleGAN. Paper Demo Setup Envir

Fangda Han 5 Sep 01, 2022
Official implementation for (Show, Attend and Distill: Knowledge Distillation via Attention-based Feature Matching, AAAI-2021)

Show, Attend and Distill: Knowledge Distillation via Attention-based Feature Matching Official pytorch implementation of "Show, Attend and Distill: Kn

Clova AI Research 80 Dec 16, 2022
An open-source Kazakh named entity recognition dataset (KazNERD), annotation guidelines, and baseline NER models.

Kazakh Named Entity Recognition This repository contains an open-source Kazakh named entity recognition dataset (KazNERD), named entity annotation gui

ISSAI 9 Dec 23, 2022
Semi-supervised Semantic Segmentation with Directional Context-aware Consistency (CVPR 2021)

Semi-supervised Semantic Segmentation with Directional Context-aware Consistency (CAC) Xin Lai*, Zhuotao Tian*, Li Jiang, Shu Liu, Hengshuang Zhao, Li

DV Lab 137 Dec 14, 2022
git《FSCE: Few-Shot Object Detection via Contrastive Proposal Encoding》(CVPR 2021) GitHub: [fig8]

FSCE: Few-Shot Object Detection via Contrastive Proposal Encoding (CVPR 2021) This repo contains the implementation of our state-of-the-art fewshot ob

233 Dec 29, 2022
Interactive web apps created using geemap and streamlit

geemap-apps Introduction This repo demostrates how to build a multi-page Earth Engine App using streamlit and geemap. You can deploy the app on variou

Qiusheng Wu 27 Dec 23, 2022
Automatic Video Captioning Evaluation Metric --- EMScore

Automatic Video Captioning Evaluation Metric --- EMScore Overview For an illustration, EMScore can be computed as: Installation modify the encode_text

Yaya Shi 17 Nov 28, 2022
Autotype on websites that have copy-paste disabled like Moodle, HackerEarth contest etc.

Autotype A quick and small python script that helps you autotype on websites that have copy paste disabled like Moodle, HackerEarth contests etc as it

Tushar 32 Nov 03, 2022
Collection of tasks for fast prototyping, baselining, finetuning and solving problems with deep learning.

Collection of tasks for fast prototyping, baselining, finetuning and solving problems with deep learning Installation

Pytorch Lightning 1.6k Jan 08, 2023
Hitters Linear Regression - Hitters Linear Regression With Python

Hitters_Linear_Regression Kullanacağımız veri seti Carnegie Mellon Üniversitesi'

AyseBuyukcelik 2 Jan 26, 2022
Machine Learning Platform for Kubernetes

Reproduce, Automate, Scale your data science. Welcome to Polyaxon, a platform for building, training, and monitoring large scale deep learning applica

polyaxon 3.2k Dec 23, 2022
Range Image-based LiDAR Localization for Autonomous Vehicles Using Mesh Maps

Range Image-based 3D LiDAR Localization This repo contains the code for our ICRA2021 paper: Range Image-based LiDAR Localization for Autonomous Vehicl

Photogrammetry & Robotics Bonn 208 Dec 15, 2022
Sarus implementation of classical ML models. The models are implemented using the Keras API of tensorflow 2. Vizualization are implemented and can be seen in tensorboard.

Sarus published models Sarus implementation of classical ML models. The models are implemented using the Keras API of tensorflow 2. Vizualization are

Sarus Technologies 39 Aug 19, 2022