Collie is for uncovering RDMA NIC performance anomalies


Collie


Overview

Prerequisites

  • Two hosts with RDMA NICs.

    • Connecting both hosts to the same switch is recommended, since Collie currently does not take network (fabric) effects into consideration. However, Collie should work as long as the two hosts are connected and RDMA communication is enabled.
  • Set up passwordless SSH login (e.g., SSH public/private key login).

    • Collie currently uses passwordless SSH login to run traffic_engine on different hosts.
  • Google gflags and glog libraries installed.

    • Collie uses glog for logging and gflags for command-line flag processing.
  • Collie should support all types of RDMA NICs and drivers that follow the IB verbs specification, but so far we have only tested it with Mellanox and Broadcom RNICs.

Quick Start

Environment Setup

  • Install prerequisites.
apt-get install -y libgflags-dev libgoogle-glog-dev
  • Set up passwordless SSH login.
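Because Collie launches traffic_engine on both hosts over SSH, a password prompt would stall a run. The following Python sketch (not part of Collie) checks that key-based login works before you start; the helper names `ssh_command` and `passwordless_ok` are hypothetical. It relies only on OpenSSH's standard `BatchMode=yes` option, which makes `ssh` exit with an error instead of prompting.

```python
import subprocess
from typing import List


def ssh_command(username: str, host: str, remote_cmd: str) -> List[str]:
    """Build an ssh argv that fails instead of prompting for a password.

    BatchMode=yes tells OpenSSH never to ask for a password or passphrase,
    so a host without working key-based login makes ssh exit non-zero.
    """
    return [
        "ssh",
        "-o", "BatchMode=yes",
        "-o", "ConnectTimeout=5",
        f"{username}@{host}",
        remote_cmd,
    ]


def passwordless_ok(username: str, host: str) -> bool:
    """Return True if `ssh user@host true` succeeds without any prompt."""
    try:
        result = subprocess.run(
            ssh_command(username, host, "true"),
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=15,
        )
    except (OSError, subprocess.TimeoutExpired):
        # The ssh binary is missing, or the host did not answer in time.
        return False
    return result.returncode == 0
```

Once keys are in place, `passwordless_ok` should return `True` for both the client IP and the server IP you plan to put in `iplist`.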

Build Traffic Engine

  • Build the traffic engine without GPU and CUDA:
cd traffic_engine && make -j8
  • Or build the traffic engine with GPUDirect RDMA support:
cd traffic_engine && GDR=1 make -j8

NOTICE: GDR is supported only on Tesla or Quadro GPUs, according to the GPUDirect RDMA documentation.

Please refer to traffic_engine/README for more details.

How to Run: Arguments and Examples

Collie uses a JSON configuration file to set the parameters for a given RDMA subsystem.

  • Configuration Example: see ./example.json

    • username -- Collie uses SSH to run engines on different hosts, so it needs the username for login.
    • iplist -- the client IP and the server IP, given in a list.
    • logpath -- the logging path for Collie. Users can find detailed results of the anomalies and the scripts to reproduce them here.
    • engine -- the path to the traffic engine.
    • iters -- the maximum number of tests Collie will run.
    • bars -- the user's expected performance.
      • tx_pfc_bar -- TX (sent) PFC pause duration in us per second.
      • rx_pfc_bar -- RX (received) PFC pause duration in us per second.
      • bps_bar -- bits per second of the entire NIC.
      • pps_bar -- packets per second of the entire NIC.
  • Quick Run Example

python3 search/collie.py --config ./example.json
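The sketch below shows how the configuration fields above fit together, assuming the layout described in the list (a two-entry iplist and a nested bars object). The values in `EXAMPLE_CONFIG`, the helpers `load_config`, `check_config`, `rates` and `violations`, and the comparison rule (flag a sample when PFC pause time exceeds its bar, or when bps/pps falls below its bar) are illustrative assumptions rather than Collie's own code; see ./example.json and search/collie.py for the real behavior. The rate conversion assumes the Linux sysfs counter `port_xmit_data`, which counts transmitted data in units of 4 bytes.

```python
import json
from dataclasses import dataclass
from typing import List

# Illustrative values only; the real file is ./example.json.
EXAMPLE_CONFIG = {
    "username": "alice",                         # hypothetical SSH user
    "iplist": ["192.168.0.1", "192.168.0.2"],    # [client IP, server IP]
    "logpath": "/tmp/collie_logs",
    "engine": "/path/to/traffic_engine",
    "iters": 100,
    "bars": {
        "tx_pfc_bar": 1000,   # us of TX PFC pause per second
        "rx_pfc_bar": 1000,   # us of RX PFC pause per second
        "bps_bar": 90e9,      # bits per second, whole NIC
        "pps_bar": 10e6,      # packets per second, whole NIC
    },
}

REQUIRED_KEYS = ("username", "iplist", "logpath", "engine", "iters", "bars")
BAR_KEYS = ("tx_pfc_bar", "rx_pfc_bar", "bps_bar", "pps_bar")


def check_config(config: dict) -> None:
    """Raise ValueError if a documented key is missing (assumed schema)."""
    missing = [k for k in REQUIRED_KEYS if k not in config]
    missing += [f"bars.{k}" for k in BAR_KEYS if k not in config.get("bars", {})]
    if missing:
        raise ValueError(f"missing config keys: {missing}")
    if len(config["iplist"]) != 2:
        raise ValueError("iplist should hold [client_ip, server_ip]")


def load_config(path: str) -> dict:
    """Read and validate a Collie-style JSON config file."""
    with open(path) as f:
        config = json.load(f)
    check_config(config)
    return config


@dataclass
class Sample:
    """Counter deltas measured over `interval_s` seconds."""
    xmit_data_words: int   # delta of sysfs port_xmit_data (4-byte units)
    xmit_packets: int      # delta of sysfs port_xmit_packets
    tx_pfc_us: float       # TX PFC pause time in the interval, in us
    rx_pfc_us: float       # RX PFC pause time in the interval, in us
    interval_s: float


def rates(s: Sample) -> dict:
    """Normalize deltas to per-second values in the same units as the bars."""
    return {
        "bps": s.xmit_data_words * 4 * 8 / s.interval_s,
        "pps": s.xmit_packets / s.interval_s,
        "tx_pfc": s.tx_pfc_us / s.interval_s,
        "rx_pfc": s.rx_pfc_us / s.interval_s,
    }


def violations(config: dict, s: Sample) -> List[str]:
    """List the bars a sample misses: PFC above its bar, throughput below."""
    bars, r = config["bars"], rates(s)
    out = []
    if r["tx_pfc"] > bars["tx_pfc_bar"]:
        out.append("tx_pfc_bar")
    if r["rx_pfc"] > bars["rx_pfc_bar"]:
        out.append("rx_pfc_bar")
    if r["bps"] < bars["bps_bar"]:
        out.append("bps_bar")
    if r["pps"] < bars["pps_bar"]:
        out.append("pps_bar")
    return out
```

Under these assumptions, a one-second sample that moved 2.5e9 four-byte words (80 Gbit/s) with 5000 us of RX pause would miss both `rx_pfc_bar` and `bps_bar`.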

Content

Collie consists of two components: the traffic engine and the search algorithms (the monitor is included as part of the search algorithms).

  • Traffic Engine (./traffic_engine)

    The traffic engine is an independent component implemented in C/C++. Users can use the engine to generate flexible traffic with different patterns. See ./traffic_engine/README for more details and examples of complex traffic patterns. We recommend using this tool to reproduce the anomalies (see the Appendix of our NSDI paper).

  • Search Algorithms (./search)

    Our simulated-annealing (SA) based algorithm and the minimal feature set (MFS) algorithm are implemented as Python scripts.

    • space.py -- the search space. Space defines the search space (upper/lower bounds and granularity of each parameter). Each Point has several Traffics (e.g., one A->B and one B->A). Each Traffic has two Endhosts, one server and one client, as well as many other attributes that describe the traffic (e.g., QP type).
    • engine.py -- given a Point, runs collie_engine to set up the traffic described by that Point. If users need to set up traffic in a different way (rather than via SSH), please modify the Engine class.
    • anneal.py -- implements the simulated-annealing based algorithm and the minimal feature set algorithm. If users need to modify the temperature or mutation logic, please modify this file.
    • logger.py -- helper functions for logging results and reproduction scripts.
    • bone.py -- monitors performance counters and collects statistics using the vendor's tools.
    • hardware.py -- monitors diagnostic counters and collects statistics using the vendor's tools. (Unfortunately, diagnostic counter tools such as NeoHost are currently neither publicly available nor open source, so for NDA reasons we only provide the performance-counter-based code.)
    • collie.py -- reads user parameters and calls SA to perform the search.
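To make the search loop concrete, here is a generic simulated-annealing sketch in Python. It is not anneal.py: the flat `Space` (parameter name mapped to its allowed values), the one-parameter `mutate`, the geometric cooling schedule and the defaults `t0`/`alpha` are assumptions for illustration, and the MFS step is not shown. In Collie, the cost of a Point would come from running the traffic engine on it and reading counters through the monitor; here `cost` is any caller-supplied function.

```python
import math
import random
from typing import Callable, Dict, List, Tuple

# Hypothetical flat search space: parameter name -> allowed values,
# a stand-in for the bounds/granularity that space.py defines.
Space = Dict[str, List]
Point = Dict[str, object]


def random_point(space: Space, rng: random.Random) -> Point:
    """Pick one allowed value for every parameter."""
    return {name: rng.choice(values) for name, values in space.items()}


def mutate(point: Point, space: Space, rng: random.Random) -> Point:
    """Change one randomly chosen parameter to a different allowed value."""
    new = dict(point)
    name = rng.choice(list(space))
    choices = [v for v in space[name] if v != point[name]]
    if choices:
        new[name] = rng.choice(choices)
    return new


def anneal(space: Space, cost: Callable[[Point], float], iters: int,
           t0: float = 1.0, alpha: float = 0.95,
           seed: int = 0) -> Tuple[Point, float]:
    """Minimize `cost` by simulated annealing; return (best_point, best_cost)."""
    rng = random.Random(seed)
    cur = random_point(space, rng)
    cur_cost = cost(cur)
    best, best_cost = cur, cur_cost
    t = t0
    for _ in range(iters):
        cand = mutate(cur, space, rng)
        cand_cost = cost(cand)
        delta = cand_cost - cur_cost
        # Always accept improvements; accept a worse point with prob e^(-delta/T).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            cur, cur_cost = cand, cand_cost
            if cur_cost < best_cost:
                best, best_cost = cur, cur_cost
        t = max(t * alpha, 1e-9)  # geometric cooling, kept strictly positive
    return best, best_cost
```

Accepting worse Points with probability e^(-delta/T) lets the search escape local minima while the temperature is high; as T cools, the loop behaves more and more like greedy hill climbing.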

Copyright

Collie is provided under the MIT license. See LICENSE for more details.

Owner
Bytedance Inc.