Block fingerprinting for the beacon chain, for client identification & client diversity metrics

Overview

blockprint

This is a repository for discussion and development of tools for Ethereum block fingerprinting.

The primary aim is to measure beacon chain client diversity using on-chain data, as described in this tweet:

https://twitter.com/sproulM_/status/1440512518242197516

The latest estimate using the improved k-NN classifier for slots 2048001 to 2164916 is:

Getting Started

The raw data for block fingerprinting needs to be sourced from Lighthouse's block_rewards API.

This is a new API that is currently only available on the block-rewards-api branch, i.e. this pull request: https://github.com/sigp/lighthouse/pull/2628

Lighthouse can be built from source by following the instructions here.

VirtualEnv

All Python commands should be run from a virtualenv with the dependencies from requirements.txt installed.

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

k-NN Classifier

The best classifier implemented so far is a k-nearest neighbours classifier in knn_classifier.py.

It requires a directory of structured training data to run, and can be used either via a small API server or in batch mode.
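
To illustrate the general idea of k-nearest-neighbours classification, here is a minimal pure-Python sketch. It is not the code in knn_classifier.py: the feature vectors, the labels `client_a` and `client_b`, and the function `knn_predict` are hypothetical and chosen only for illustration. Each query is assigned the labels of its k closest training points, and the vote shares are returned as a probability-like map.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Return {label: vote share} among the k training points nearest to query.

    train is a list of (feature_vector, label) pairs. The features used here
    are placeholders, not the features the real classifier extracts.
    """
    # Sort training points by Euclidean distance to the query and keep k.
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return {label: count / len(nearest) for label, count in votes.items()}


# Hypothetical two-dimensional training data for two made-up labels.
TRAIN = [
    ((0.0, 0.0), "client_a"),
    ((0.1, 0.0), "client_a"),
    ((1.0, 1.0), "client_b"),
    ((0.9, 1.0), "client_b"),
    ((1.0, 0.9), "client_b"),
]

if __name__ == "__main__":
    print(knn_predict(TRAIN, (0.05, 0.0), k=3))
```

With k=3 the query above has two `client_a` neighbours and one `client_b` neighbour, so the vote is split two to one in favour of `client_a`.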

You can download a large (886M) training data set here.

To run in batch mode against a directory of JSON batches (individual files downloaded from Lighthouse), use this command:

./knn_classifier.py training_data_proc data_to_classify

Expected output is:

classifier score: 0.9886800869904645
classifying rewards from file slot_2048001_to_2050048.json
total blocks processed: 2032
Lighthouse,0.2072
Nimbus or Prysm,0.002
Nimbus or Teku,0.0025
Prysm,0.6339
Prysm or Teku,0.0241
Teku,0.1304
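
If you want to post-process this output, a small parser along the following lines may help. This is a sketch that assumes the summary lines always have the form `label,value` as in the sample above, and that the log lines before them contain no comma; `parse_summary` and `SAMPLE` are names introduced here, not part of the repository.

```python
SAMPLE = """\
classifier score: 0.9886800869904645
classifying rewards from file slot_2048001_to_2050048.json
total blocks processed: 2032
Lighthouse,0.2072
Nimbus or Prysm,0.002
Nimbus or Teku,0.0025
Prysm,0.6339
Prysm or Teku,0.0241
Teku,0.1304
"""


def parse_summary(output):
    """Collect 'label,value' lines into a dict, skipping other log lines."""
    shares = {}
    for line in output.splitlines():
        # Split on the last comma so the label itself may contain spaces.
        label, sep, value = line.rpartition(",")
        if not sep:
            continue  # no comma: a log line, not a summary line
        try:
            shares[label.strip()] = float(value)
        except ValueError:
            continue  # comma present but the value is not a number
    return shares


if __name__ == "__main__":
    for label, share in sorted(parse_summary(SAMPLE).items()):
        print(f"{label}: {share}")
```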

Training the Classifier

The classifier is trained from a directory of reward batches. You can fetch batches with the load_blocks.py script by providing a start slot, end slot and output directory:

./load_blocks.py 2048001 2048032 testdata

The directory testdata now contains 1 or more files of the form slot_X_to_Y.json downloaded from Lighthouse.
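
The slot range of each batch can be recovered from its file name. The sketch below assumes only the slot_X_to_Y.json naming shown above; `parse_batch_name` and `list_batches` are hypothetical helpers, not functions from load_blocks.py.

```python
import re
from pathlib import Path

# Matches names such as slot_2048001_to_2050048.json.
BATCH_RE = re.compile(r"^slot_(\d+)_to_(\d+)\.json$")


def parse_batch_name(name):
    """Return (start_slot, end_slot) for a batch file name, or None."""
    match = BATCH_RE.match(name)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def list_batches(directory):
    """Return (start_slot, end_slot, path) tuples sorted by slot range."""
    found = []
    for path in Path(directory).iterdir():
        slots = parse_batch_name(path.name)
        if slots is not None:
            found.append((slots[0], slots[1], path))
    return sorted(found, key=lambda item: item[:2])


if __name__ == "__main__":
    for start, end, path in list_batches("testdata"):
        print(start, end, path)
```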

To train the classifier on this data, use the prepare_training_data.py script:

./prepare_training_data.py testdata testdata_proc

This will read files from testdata and write the graffiti-classified training data to testdata_proc, which is structured as directories of single block reward files for each client.

$ tree testdata_proc
testdata_proc
├── Lighthouse
│   ├── 0x03ae60212c73bc2d09dd3a7269f042782ab0c7a64e8202c316cbcaf62f42b942.json
│   └── 0x5e0872a64ea6165e87bc7e698795cb3928484e01ffdb49ebaa5b95e20bdb392c.json
├── Nimbus
│   └── 0x0a90585b2a2572305db37ef332cb3cbb768eba08ad1396f82b795876359fc8fb.json
├── Prysm
│   └── 0x0a16c9a66800bd65d997db19669439281764d541ca89c15a4a10fc1782d94b1c.json
└── Teku
    ├── 0x09d60a130334aa3b9b669bf588396a007e9192de002ce66f55e5a28309b9d0d3.json
    ├── 0x421a91ebdb650671e552ce3491928d8f78e04c7c9cb75e885df90e1593ca54d6.json
    └── 0x7fedb0da9699c93ce66966555c6719e1159ae7b3220c7053a08c8f50e2f3f56f.json

You can then use this directory as the first argument to ./knn_classifier.py.
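
Before training, it can be useful to check how many blocks were labelled for each client. The following sketch assumes only the layout shown in the tree above (one sub-directory per client containing .json files); `count_training_blocks` is a hypothetical helper.

```python
from collections import Counter
from pathlib import Path


def count_training_blocks(root):
    """Count the .json files in each client sub-directory of root."""
    counts = Counter()
    for client_dir in Path(root).iterdir():
        if client_dir.is_dir():
            counts[client_dir.name] = sum(1 for _ in client_dir.glob("*.json"))
    return counts


if __name__ == "__main__":
    for client, count in sorted(count_training_blocks("testdata_proc").items()):
        print(f"{client}: {count}")
```

A heavily skewed count may be worth knowing about before interpreting the classifier score.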

Classifier API

With pre-processed training data installed in ./training_data_proc, you can host a classification API server like this:

gunicorn --reload api_server --timeout 1800

It will take a few minutes to start up while it loads all of the training data into memory.

Initialising classifier, this could take a moment...
Start-up complete, classifier score is 0.9886800869904645

Once it has started up, you can make POST requests to the /classify endpoint containing a single JSON-encoded block reward. There is an example input file in examples.

curl -s -X POST -H "Content-Type: application/json" --data @examples/single_teku_block.json "http://localhost:8000/classify"
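
The same request can be made from Python using only the standard library. This is a sketch that assumes the server is listening on localhost:8000 as in the curl example; `build_classify_request` and `classify` are names introduced here, and the 30 second timeout is an arbitrary choice.

```python
import json
import urllib.request

CLASSIFY_URL = "http://localhost:8000/classify"


def build_classify_request(block_reward, url=CLASSIFY_URL):
    """Build a POST request carrying one JSON-encoded block reward."""
    body = json.dumps(block_reward).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def classify(block_reward, url=CLASSIFY_URL, timeout=30):
    """Send the block reward to the server and return the decoded response."""
    request = build_classify_request(block_reward, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    with open("examples/single_teku_block.json") as f:
        print(classify(json.load(f)))
```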

The response is of the following form:

{
  "block_root": "0x421a91ebdb650671e552ce3491928d8f78e04c7c9cb75e885df90e1593ca54d6",
  "best_guess_single": "Teku",
  "best_guess_multi": "Teku",
  "probability_map": {
    "Lighthouse": 0.0,
    "Nimbus": 0.0,
    "Prysm": 0.0,
    "Teku": 1.0
  }
}
  • best_guess_single is the single client that the classifier deemed most likely to have proposed this block.
  • best_guess_multi is a string naming 1 or 2 clients. If the classifier is more than 95% sure of a single client, the multi guess is the same as best_guess_single. Otherwise it is a string of the form "Lighthouse or Teku", with the 2 clients in lexicographic order. 3-client splits are never returned.
  • probability_map is a map from each known client label to the probability that the given block was proposed by that client.
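
The relationship between these three fields can be sketched as follows. This is an illustration of the rules described above, not the server's actual code: `best_guesses` is a hypothetical function, and picking the second most probable client as the other half of a two-client guess is an assumption.

```python
def best_guesses(probability_map, threshold=0.95):
    """Return (best_guess_single, best_guess_multi) for a probability map."""
    # Rank by descending probability, breaking ties by client name.
    ranked = sorted(probability_map.items(), key=lambda kv: (-kv[1], kv[0]))
    single = ranked[0][0]
    if ranked[0][1] > threshold or len(ranked) < 2:
        return single, single
    # Assumption: the runner-up is the second client in a two-client guess.
    pair = sorted([ranked[0][0], ranked[1][0]])
    return single, " or ".join(pair)


EXAMPLE = {"Lighthouse": 0.0, "Nimbus": 0.0, "Prysm": 0.0, "Teku": 1.0}

if __name__ == "__main__":
    print(best_guesses(EXAMPLE))
```

For the example response above, both guesses come out as "Teku", because the probability for Teku exceeds the 95% threshold.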

TODO

  • Improve the classification algorithm using better stats or machine learning (done, k-NN).
  • Decide on data representations and APIs for presenting data to a frontend (done).
  • Implement a web backend for the above API (done).
  • Polish and improve all of the above.
Owner
Sigma Prime