Block fingerprinting for the beacon chain, for client identification & client diversity metrics

Overview

blockprint

This is a repository for discussion and development of tools for Ethereum block fingerprinting.

The primary aim is to measure beacon chain client diversity using on-chain data, as described in this tweet:

https://twitter.com/sproulM_/status/1440512518242197516

The latest estimate using the improved k-NN classifier for slots 2048001 to 2164916 is:

Getting Started

The raw data for block fingerprinting needs to be sourced from Lighthouse's block_rewards API.

This is a new API that is currently only available on the block-rewards-api branch, i.e. this pull request: https://github.com/sigp/lighthouse/pull/2628

Lighthouse can be built from source by following the instructions here.

VirtualEnv

All Python commands should be run from a virtualenv with the dependencies from requirements.txt installed.

python3 -m venv venv
source venv/bin/activate
pip install -r requirements.txt

k-NN Classifier

The best classifier implemented so far is a k-nearest neighbours classifier in knn_classifier.py.

It requires a directory of structered training data to run, and can be used either via a small API server, or in batch mode.

You can download a large (886M) training data set here.

To run in batch mode against a directory of JSON batches (individual files downloaded from LH), use this command:

./knn_classifier.py training_data_proc data_to_classify

Expected output is:

classifier score: 0.9886800869904645
classifying rewards from file slot_2048001_to_2050048.json
total blocks processed: 2032
Lighthouse,0.2072
Nimbus or Prysm,0.002
Nimbus or Teku,0.0025
Prysm,0.6339
Prysm or Teku,0.0241
Teku,0.1304

Training the Classifier

The classifier is trained from a directory of reward batches. You can fetch batches with the load_blocks.py script by providing a start slot, end slot and output directory:

./load_blocks.py 2048001 2048032 testdata

The directory testdata now contains 1 or more files of the form slot_X_to_Y.json downloaded from Lighthouse.

To train the classifier on this data, use the prepare_training_data.py script:

./prepare_training_data.py testdata testdata_proc

This will read files from testdata and write the graffiti-classified training data to testdata_proc, which is structured as directories of single block reward files for each client.

$ tree testdata_proc
testdata_proc
├── Lighthouse
│   ├── 0x03ae60212c73bc2d09dd3a7269f042782ab0c7a64e8202c316cbcaf62f42b942.json
│   └── 0x5e0872a64ea6165e87bc7e698795cb3928484e01ffdb49ebaa5b95e20bdb392c.json
├── Nimbus
│   └── 0x0a90585b2a2572305db37ef332cb3cbb768eba08ad1396f82b795876359fc8fb.json
├── Prysm
│   └── 0x0a16c9a66800bd65d997db19669439281764d541ca89c15a4a10fc1782d94b1c.json
└── Teku
    ├── 0x09d60a130334aa3b9b669bf588396a007e9192de002ce66f55e5a28309b9d0d3.json
    ├── 0x421a91ebdb650671e552ce3491928d8f78e04c7c9cb75e885df90e1593ca54d6.json
    └── 0x7fedb0da9699c93ce66966555c6719e1159ae7b3220c7053a08c8f50e2f3f56f.json

You can then use this directory as the first argument to ./knn_classifier.py.

Classifier API

With pre-processed training data installed in ./training_data_proc, you can host a classification API server like this:

gunicorn --reload api_server --timeout 1800

It will take a few minutes to start-up while it loads all of the training data into memory.

Initialising classifier, this could take a moment...
Start-up complete, classifier score is 0.9886800869904645

Once it has started up, you can make POST requests to the /classify endpoint containing a single JSON-encoded block reward. There is an example input file in examples.

curl -s -X POST -H "Content-Type: application/json" --data @examples/single_teku_block.json "http://localhost:8000/classify"

The response is of the following form:

{
  "block_root": "0x421a91ebdb650671e552ce3491928d8f78e04c7c9cb75e885df90e1593ca54d6",
  "best_guess_single": "Teku",
  "best_guess_multi": "Teku",
  "probability_map": {
    "Lighthouse": 0.0,
    "Nimbus": 0.0,
    "Prysm": 0.0,
    "Teku": 1.0
  }
}
  • best_guess_single is the single client that the classifier deemed most likely to have proposed this block.
  • best_guess_multi is a list of 1-2 client guesses. If the classifier is more than 95% sure of a single client then the multi guess will be the same as best_guess_single. Otherwise it will be a string of the form "Lighthouse or Teku" with 2 clients in lexicographic order. 3 client splits are never returned.
  • probability_map is a map from each known client label to the probability that the given block was proposed by that client.

TODO

  • Improve the classification algorithm using better stats or machine learning (done, k-NN).
  • Decide on data representations and APIs for presenting data to a frontend (done).
  • Implement a web backend for the above API (done).
  • Polish and improve all of the above.
Owner
Sigma Prime
Blockchain & Information Security Services
Sigma Prime
Comprehensive Python Cheatsheet

Comprehensive Python Cheatsheet

Jure Šorn 31.3k Dec 30, 2022
Solutions for the Advent of Code 2021 event.

About 📋 This repository holds all of the solution code for the Advent of Code 2021 event. All solutions are done in Python 3.9.9 and done in non-real

robert yin 0 Mar 21, 2022
YourX: URL Clusterer With Python

YourX | URL Clusterer Screenshots Instructions for running Install requirements

ARPSyndicate 1 Mar 11, 2022
Let's pretend you want to create a AWS Lambda project called "sns-processor".

Usage Let's pretend you want to create a AWS Lambda project called "sns-processor". Rather than using lambda and then editing the results to include y

1 Dec 31, 2021
Prints values and types during compilation!

Compile-Time Printer Compile-Time Printer prints values and types at compile-time in C++. Teaser test.cpp compile-time-printer

43 Dec 26, 2022
OTP-Bomber - An otp from MPL ID app, which can be spammed

OTP-Bomber An otp from MPL ID app, which can be spammed Note: Only available on

5 Oct 29, 2022
The Python agent for Apache SkyWalking

SkyWalking Python Agent SkyWalking-Python: The Python Agent for Apache SkyWalking, which provides the native tracing abilities for Python project. Sky

The Apache Software Foundation 149 Dec 12, 2022
This project intends to take the user's CEP (brazilian adress code) and return the local in which the CEP is placed.

This project aims to simply return the CEP's (the brazilian resident adress code) User of the application. The project uses a request and passes on to

Daniel Soares Saldanha 4 Nov 17, 2021
API for SpeechAnalytics integration with FreePBX/Asterisk

freepbx_speechanalytics_api API for SpeechAnalytics integration with FreePBX/Asterisk Скопировать файл settings.py.sample в settings.py и отредактиров

Iqtek, LLC 3 Nov 03, 2022
Repository to store sample python programs for python learning

py Repository to store sample Python programs. This repository is meant for beginners to assist them in their learning of Python. The repository cover

codebasics 5.8k Dec 30, 2022
A deployer and package manager for OceanBase open-source software.

OceanBase Deploy OceanBase Deploy (简称 OBD)是 OceanBase 开源软件的安装部署工具。OBD 同时也是包管理器,可以用来管理 OceanBase 所有的开源软件。本文介绍如何安装 OBD、使用 OBD 和 OBD 的命令。 安装 OBD 您可以使用以下方

OceanBase 59 Dec 27, 2022
A python module for DeSo

DeSo.py A python package for DeSo. Developed by ItsAditya Run pip install deso to install the module! Examples of How To Use DeSo.py Getting $DeSo pri

ItsAditya 0 Jun 30, 2022
Ontario-Covid-Screening - An automated Covid-19 School Screening Tool for Ontario

Ontario-Covid19-Screening An automated Covid-19 School Screening Tool for Ontari

Rayan K 0 Feb 20, 2022
A Klipper plugin for accurate Z homing

Stable Z Homing for Klipper A Klipper plugin for accurate Z homing This plugin provides a new G-code command, STABLE_Z_HOME, which homes Z repeatedly

Matthew Lloyd 24 Dec 28, 2022
One-stop-shop for docs and test coverage of dbt projects.

dbt-coverage One-stop-shop for docs and test coverage of dbt projects. Why do I need something like this? dbt-coverage is to dbt what coverage.py and

Slido 106 Dec 27, 2022
[draft] tools for schnetpack

schnetkit some tooling for schnetpack EXPERIMENTAL/IN DEVELOPMENT DO NOT USE This is an early draft of some infrastructure built around schnetpack. In

Marcel 1 Nov 08, 2021
VacationCycleLogicBackEnd - Vacation Cycle Logic BackEnd With Python

Vacation Cycle Logic BackEnd Getting Started Existing virtualenv If your project

Mohamed Gamal 0 Jan 03, 2022
MinimalGearDisplay, Assetto Corsa app

MinimalGearDisplay MinimalGearDisplay, Assetto Corsa app. Just displays the current gear you are on. Download and Install To use this app, clone or do

1 Jan 10, 2022
A (hopefully) considerably copious collection of classical cipher crackers

ClassicalCipherCracker A (hopefully) considerably copious collection of classical cipher crackers Written in Python3 (and run with PyPy) TODOs Write a

Stanley Zhong 2 Feb 22, 2022
Material de apoio da oficina de SAST apresentada pelo CAIS no Webinar de 28/05/21.

CAIS-CAIS Conjunto de Aplicações Intencionamente Sem-Vergonha do CAIS Material didático do Webinar "EP1. Oficina - Práticas de análise estática de cód

Fausto Filho 14 Jul 25, 2022