Collie is for uncovering RDMA NIC performance anomalies

Related tags

MiscellaneousCollie
Overview

Collie

Collie is for uncovering RDMA NIC performance anomalies.

Overview

Prerequisite

  • Two hosts with RDMA NICs.

    • Connected to the same switch is recommended since Collie currently does not take network(fabric) effect into consideration. But Collie should work once two hosts are connected and RDMA communication enabled.
  • Set up passwordless SSH login (e.g., ssh public/private keys login).

    • Collie currently uses passwordless SSH login to run traffic_engine on different hosts.
  • Google gflags and glog library installed.

    • Collie uses glog for logging and gflags for commandline flags processing.
  • Collie should supports all types of RDMA NICs and drivers that follow IB verbs specification, but currently we've only tested with Mellanox and Broadcom RNICs.

Quick Start

Environment Setup

  • Install prerequisites.
apt-get install -y libgflags-dev libgoogle-glog-dev
  • Setup passwordless SSH login.

Build Traffic Engine

  • Build the traffic engine without GPU and CUDA:
cd traffic_engine && make -j8
  • OR buidl the traffic engine that supports GPU Direct RDMA:
cd traffic_engine && GDR=1 make -j8

NOTICE: GDR is supported only for Tesla or Quadro GPUs according to GPUDirect RDMA.

Please refer to traffic_engine/README for more details.

How to Run: Arguments and Examples

Collie uses JSON configuration file to set parameters for a given RDMA subsystem.

  • Configuration Example: see ./example.json

    • username -- Collie uses SSH to run engines on different hosts, so it needs the username for login.
    • iplist -- the client IP and the server IP, given in a list.
    • logpath -- the logging path for Collie. Users can get detailed results of anomalies and the reproduce scripts for Collie here.
    • engine -- the path for traffic engine.
    • iters -- at most iters tests that Collie would run.
    • bars -- user's expected performance.
      • tx_pfc_bar -- TX (sent) PFC pause duration in us per second.
      • rx_pfc_bar -- RX (received) PFC pause duration in us per second.
      • bps_bar -- bits per second of the entire NIC.
      • pps_bar -- packets per second of the entire NIC.
  • Quick Run Example

python3 search/collie.py --config  ./example.json

Content

Collie consists of two components, the traffic engine and the search algorithms (the monitor is included as a part of search algorithm).

  • Traffic Engine (./traffic_engine)

    Traffic engine is an independent part that implemented in C/C++. Users can use the engine to generate flexible traffic of different patterns. See ./traffic_engine/README for more details and examples of complex traffic patterns.It is recommended to reproduce the anomalies (see Appendix of our NSDI paper) with the tool.

  • Search Algorithms (./search)

    Our simulated-annealing (SA) based algorithm and minimal feature set (MFS) are implemented in python scripts.

    • space.py -- the search space. Space defines the search space (upper/lower bounds, granularity for each parameter). Each Point has several Traffics (e.g., one A->B and one B->A). Each Traffic has two Endhost, one server and one client, as well as many other attributes that describe this traffic (e.g., QP type).
    • engine.py -- given a point, running collie_engine to set up the corresponding traffic described in the Point. If users need to set up traffics in different ways (rather than SSH), please modify the Engine class.
    • anneal.py -- the simulated-annealing based algorithm and minimal feature set algorithm are implemented here. If users need to modify the temperature and mutation logics, please modify here.
    • logger.py -- logging assistant functions for logging results and reproduce scripts.
    • bone.py -- monitor performance counters and collect statistic results based on vendor's tools.
    • hardware.py -- monitor diagnostic counters and collect statistic results based on vendor's tools. (Unfortunately currently diagnostic counters tools like NeoHost is not publicly available and open-sourced, so we only provide performance counter based code for NDA reasons.)
    • collie.py -- read user parameters and call SA to search.

Copyright

Collie is provided under the MIT license. See LICENSE for more details.

Owner
Bytedance Inc.
Bytedance Inc.
🔵Open many google dorks in a fasted way

Dorkinho 🔵 The author is not responsible for misuse of the tool, use it in good practices like Pentest and CTF OSINT challenges. Dorkinho is a script

SidHawks 2 May 02, 2022
Python script for diving image data to train test and val

dataset-division-to-train-val-test-python python script for dividing image data to train test and val If you have an image dataset in the following st

Muhammad Zeeshan 1 Nov 14, 2022
Flexible constructor to create dynamic list of heterogeneous properties for some kind of entity

Flexible constructor to create dynamic list of heterogeneous properties for some kind of entity. This set of helpers useful to create properties like contacts or attributes for describe car/computer/

Django Stars 24 Jul 21, 2022
Manjaro CN Repository

Manjaro CN Repository Automatically built packages based on archlinuxcn/repo and manjarocn/docker. Install Add manjarocn to /etc/pacman.conf: Please m

Manjaro CN 28 Jun 26, 2022
A student information management system in Python

Student-information-management-system 本项目是一个学生信息管理系统,这个项目是用Python语言实现的,也实现了图形化界面的显示,同时也实现了管理员端,学生端两个登陆入口,同时底层使用的是Redis做的数据持久化。 This project is a stude

liuyunfei 7 Nov 15, 2022
qecsim is a Python 3 package for simulating quantum error correction using stabilizer codes.

qecsim qecsim is a Python 3 package for simulating quantum error correction using stabilizer codes.

44 Dec 20, 2022
Tenda D151 & D301 - Unauthenticated configuration download

Exploit Title: Tenda D151 & D301 - Unauthenticated configuration download (login included)

Ayoub 3 Jul 14, 2022
Write complicated anonymous functions other than lambdas in Python.

lambdex allows you to write multi-line anonymous function expression (called a lambdex) in an idiomatic manner.

Xie Jingyi 71 May 19, 2022
Tools for dos (denial-of-service) website / web server

DoS Attack Tools Tools for dos (denial-of-service) website / web server di buat olah NurvySec How to install on debian / ubuntu $ apt update $ apt ins

nurvy 1 Feb 10, 2022
NBT-Project: This is a APP for building NBT's

NBT-Project This is an APP for building NBT's When using this you select a box on kit maker You input the name and enchant in there related boxes Then

1 Jan 21, 2022
Exercise to teach a newcomer to the CLSP grid to set up their environment and run jobs

Exercise to teach a newcomer to the CLSP grid to set up their environment and run jobs

Alexandra 2 May 18, 2022
API Rate Limit Decorator

ratelimit APIs are a very common way to interact with web services. As the need to consume data grows, so does the number of API calls necessary to re

Tomas Basham 574 Dec 26, 2022
p5 is a Python package based on the core ideas of Processing.

p5 p5 is a Python library that provides high level drawing functionality to help you quickly create simulations and interactive art using Python. It c

p5py 645 Jan 04, 2023
京东自动入会获取京豆

京东入会领京豆 要求 有一定的电脑知识 or 有耐心爱折腾 需要Chrome(推荐)、Edge(Chromium)、Firefox 操作系统需是Mac(本人没在m1上测试)、Linux(在deepin上测试过)、Windows 安装方法 脚本采用Selenium遍历京东入会有礼界面,由于遍历了200

Vanke Anton 500 Dec 22, 2022
Some usefull scripts for the Nastran's 145 solution (Flutter Analysis) using the pyNastran package.

nastran-aero-flutter This project is intended to analyse the Supersonic Panel Flutter using the NASTRAN software. The project uses the pyNastran and t

zuckberj 11 Nov 16, 2022
CBLang is a programming language aiming to fix most of my problems with Python

CBLang A bad programming language made in Python. CBLang is a programming language aiming to fix most of my problems with Python (this means that you

Chadderbox 43 Dec 22, 2022
Write a program that works out whether if a given year is a leap year

Leap Year 💪 This is a Difficult Challenge 💪 Instructions Write a program that works out whether if a given year is a leap year. A normal year has 36

Rodrigo Santos 0 Jun 22, 2022
Hacktoberfest2021 🥳- Contribute Any Pattern In Any Language😎 Every PR will be accepted Pls contribute

✨ Hacktober Fest 2021 ✨ 🙂 All Contributors are requested to star this repo and follow me for a successful merge of pull request. 🙂 👉 Add any patter

Md. Almas Ali 103 Jan 07, 2023
An application to see if your Ethereum staking validator(s) are members of the current or next post-Altair sync committees.

eth_sync_committee.py Since the Altair upgrade, 512 validators are randomly chosen every 256 epochs (~27 hours) to form a sync committee. Validators i

4 Oct 27, 2022
XAC HID Gamepad implementation for CircuitPython 7 or above.

CircuitPython_XAC_Gamepad Setup process Install CircuitPython 7 or above in your board. Add the init.py file under \lib\adafruit_hid directory of CIRC

5 Dec 19, 2022