Collie is for uncovering RDMA NIC performance anomalies

Related tags

MiscellaneousCollie
Overview

Collie

Collie is for uncovering RDMA NIC performance anomalies.

Overview

Prerequisite

  • Two hosts with RDMA NICs.

    • Connected to the same switch is recommended since Collie currently does not take network(fabric) effect into consideration. But Collie should work once two hosts are connected and RDMA communication enabled.
  • Set up passwordless SSH login (e.g., ssh public/private keys login).

    • Collie currently uses passwordless SSH login to run traffic_engine on different hosts.
  • Google gflags and glog library installed.

    • Collie uses glog for logging and gflags for commandline flags processing.
  • Collie should supports all types of RDMA NICs and drivers that follow IB verbs specification, but currently we've only tested with Mellanox and Broadcom RNICs.

Quick Start

Environment Setup

  • Install prerequisites.
apt-get install -y libgflags-dev libgoogle-glog-dev
  • Setup passwordless SSH login.

Build Traffic Engine

  • Build the traffic engine without GPU and CUDA:
cd traffic_engine && make -j8
  • OR buidl the traffic engine that supports GPU Direct RDMA:
cd traffic_engine && GDR=1 make -j8

NOTICE: GDR is supported only for Tesla or Quadro GPUs according to GPUDirect RDMA.

Please refer to traffic_engine/README for more details.

How to Run: Arguments and Examples

Collie uses JSON configuration file to set parameters for a given RDMA subsystem.

  • Configuration Example: see ./example.json

    • username -- Collie uses SSH to run engines on different hosts, so it needs the username for login.
    • iplist -- the client IP and the server IP, given in a list.
    • logpath -- the logging path for Collie. Users can get detailed results of anomalies and the reproduce scripts for Collie here.
    • engine -- the path for traffic engine.
    • iters -- at most iters tests that Collie would run.
    • bars -- user's expected performance.
      • tx_pfc_bar -- TX (sent) PFC pause duration in us per second.
      • rx_pfc_bar -- RX (received) PFC pause duration in us per second.
      • bps_bar -- bits per second of the entire NIC.
      • pps_bar -- packets per second of the entire NIC.
  • Quick Run Example

python3 search/collie.py --config  ./example.json

Content

Collie consists of two components, the traffic engine and the search algorithms (the monitor is included as a part of search algorithm).

  • Traffic Engine (./traffic_engine)

    Traffic engine is an independent part that implemented in C/C++. Users can use the engine to generate flexible traffic of different patterns. See ./traffic_engine/README for more details and examples of complex traffic patterns.It is recommended to reproduce the anomalies (see Appendix of our NSDI paper) with the tool.

  • Search Algorithms (./search)

    Our simulated-annealing (SA) based algorithm and minimal feature set (MFS) are implemented in python scripts.

    • space.py -- the search space. Space defines the search space (upper/lower bounds, granularity for each parameter). Each Point has several Traffics (e.g., one A->B and one B->A). Each Traffic has two Endhost, one server and one client, as well as many other attributes that describe this traffic (e.g., QP type).
    • engine.py -- given a point, running collie_engine to set up the corresponding traffic described in the Point. If users need to set up traffics in different ways (rather than SSH), please modify the Engine class.
    • anneal.py -- the simulated-annealing based algorithm and minimal feature set algorithm are implemented here. If users need to modify the temperature and mutation logics, please modify here.
    • logger.py -- logging assistant functions for logging results and reproduce scripts.
    • bone.py -- monitor performance counters and collect statistic results based on vendor's tools.
    • hardware.py -- monitor diagnostic counters and collect statistic results based on vendor's tools. (Unfortunately currently diagnostic counters tools like NeoHost is not publicly available and open-sourced, so we only provide performance counter based code for NDA reasons.)
    • collie.py -- read user parameters and call SA to search.

Copyright

Collie is provided under the MIT license. See LICENSE for more details.

Owner
Bytedance Inc.
Bytedance Inc.
A simple chatbot that I made for school project

Chatbot: Python A simple chatbot that I made for school Project. Tho this chatbot is dumb sometimes, but it's not too bad lol. Check it Out! FAQ How t

Prashant 2 Nov 13, 2021
Package to provide translation methods for pyramid, and means to reload translations without stopping the application

Package to provide translation methods for pyramid, and means to reload translations without stopping the application

Grzegorz Śliwiński 4 Nov 20, 2022
A project to work with databases in 4 worksheets, insert, update, select, delete using Python and MySqI

A project to work with databases in 4 worksheets, insert, update, select, delete using Python and MySqI As a small project for school or college hope it is useful

Sina Org 1 Jan 11, 2022
This is a menu driven Railway Reservation Project which is mainly based on the python-mysql connectivity.

Online-Railway-Reservation-System This is a menu driven Railway Reservation Project which is mainly based on the python-mysql connectivity. The projec

Ananya Gupta 1 Jan 09, 2022
Usando Multi Player Perceptron e Regressão Logistica para classificação de SPAM

Relatório dos procedimentos executados e resultados obtidos. Objetivos Treinar um modelo para classificação de SPAM usando o dataset train_data. Class

André Mediote 1 Feb 02, 2022
Framework To Ease Operating with Quantum Computers

QType Framework To Ease Operating with Quantum Computers Concept # define an array of 15 cubits:

Antonio Párraga Navarro 2 Jun 06, 2022
Mannaggia is a python application to praise or more likely to curse the saints

Mannaggia-py 👼 Remember Mannaggia? This is a Python remake of it, with new features. mannaggia is a python application to praise or more likely to cu

Christian Visintin 9 Aug 12, 2022
🦋 hundun is a python library for the exploration of chaos.

hundun hundun is a python library for the exploration of chaos. Please note that this library is in beta phase. Example Import the package's equation

kosh 7 Nov 07, 2022
📽 Streamlit application powered by a PyScaffold project setup

streamlit-demo Streamlit application powered by a PyScaffold project setup. Work in progress: The idea of this repo is to demonstrate how to package a

PyScaffold 2 Oct 10, 2022
A collection of convenient parsers for Advent of Code problems.

Advent of Code Parsers A collection of convenient Python parsers for Advent of Code problems. Installation pip install aocp Quickstart You can import

Miguel Blanco Marcos 3 Dec 13, 2021
🔩 Like builtins, but boltons. 250+ constructs, recipes, and snippets which extend (and rely on nothing but) the Python standard library. Nothing like Michael Bolton.

Boltons boltons should be builtins. Boltons is a set of over 230 BSD-licensed, pure-Python utilities in the same spirit as — and yet conspicuously mis

Mahmoud Hashemi 6k Jan 06, 2023
Create N Share is a No Code solution which gives users the ability to create any type of feature rich survey forms with ease.

create n share Note : The Project Scaffold will be pushed soon. Create N Share is a No Code solution which gives users the ability to create any type

Chiraag Kakar 11 Dec 03, 2022
A python program for rick rolling people.

Rickware A python program for rick rolling people. (And annoying them too) What is rick roll? Read this wikipedia article - Rickrolling About program

2 Jan 18, 2022
Open source stenotype engine

Plover Bringing stenography to everyone. Homepage Releases Wiki Blog Google Group Discord Chat About Installation Getting help Contributing Donations

Open Steno Project 2k Jan 09, 2023
To effectively detect the faulty wafers

wafer_fault_detection Aim of the project: In electronics, a wafer (also called a slice or substrate) is a thin slice of semiconductor, such as crystal

Arun Singh Babal 1 Nov 06, 2021
E5自动续期

AutoApi v6.3 (2021-2-18) ———— E5自动续期 AutoApi系列: AutoApi(v1.0) 、 AutoApiSecret(v2.0) 、 AutoApiSR(v3.0) 、 AutoApiS(v4.0) 、 AutoApiP(v5.0) 说明 E5自动续期程序,但是

34 Feb 20, 2021
importlib_resources is a backport of Python standard library importlib.resources module for older Pythons.

importlib_resources is a backport of Python standard library importlib.resources module for older Pythons. The key goal of this module is to replace p

Python 36 Dec 13, 2022
A Python3 script to decode an encoded VBScript file, often seen with a .vbe file extension

vbe-decoder.py Decode one or multiple encoded VBScript files, often seen with a .vbe file extension. Usage usage: vbe-decoder.py [-h] [-o output] file

John Hammond 147 Nov 15, 2022
Simple Denial of Service Program yang di bikin menggunakan bahasa pemograman Python,

Peringatan Tujuan kami share code Indo-DoS hanya untuk bertujuan edukasi / pembelajaran! Dilarang memperjual belikan source ini / memperjual-belikan s

SonLyte 8 Nov 07, 2021
Python project setup, updater, and launcher

Launcher Python project setup, updater, and launcher Purpose: Increase project productivity and provide features easily. Once installed as a git submo

DAAV, LLC 1 Jan 07, 2022