Collie is for uncovering RDMA NIC performance anomalies

Related tags

MiscellaneousCollie
Overview

Collie

Collie is for uncovering RDMA NIC performance anomalies.

Overview

Prerequisite

  • Two hosts with RDMA NICs.

    • Connected to the same switch is recommended since Collie currently does not take network(fabric) effect into consideration. But Collie should work once two hosts are connected and RDMA communication enabled.
  • Set up passwordless SSH login (e.g., ssh public/private keys login).

    • Collie currently uses passwordless SSH login to run traffic_engine on different hosts.
  • Google gflags and glog library installed.

    • Collie uses glog for logging and gflags for commandline flags processing.
  • Collie should supports all types of RDMA NICs and drivers that follow IB verbs specification, but currently we've only tested with Mellanox and Broadcom RNICs.

Quick Start

Environment Setup

  • Install prerequisites.
apt-get install -y libgflags-dev libgoogle-glog-dev
  • Setup passwordless SSH login.

Build Traffic Engine

  • Build the traffic engine without GPU and CUDA:
cd traffic_engine && make -j8
  • OR buidl the traffic engine that supports GPU Direct RDMA:
cd traffic_engine && GDR=1 make -j8

NOTICE: GDR is supported only for Tesla or Quadro GPUs according to GPUDirect RDMA.

Please refer to traffic_engine/README for more details.

How to Run: Arguments and Examples

Collie uses JSON configuration file to set parameters for a given RDMA subsystem.

  • Configuration Example: see ./example.json

    • username -- Collie uses SSH to run engines on different hosts, so it needs the username for login.
    • iplist -- the client IP and the server IP, given in a list.
    • logpath -- the logging path for Collie. Users can get detailed results of anomalies and the reproduce scripts for Collie here.
    • engine -- the path for traffic engine.
    • iters -- at most iters tests that Collie would run.
    • bars -- user's expected performance.
      • tx_pfc_bar -- TX (sent) PFC pause duration in us per second.
      • rx_pfc_bar -- RX (received) PFC pause duration in us per second.
      • bps_bar -- bits per second of the entire NIC.
      • pps_bar -- packets per second of the entire NIC.
  • Quick Run Example

python3 search/collie.py --config  ./example.json

Content

Collie consists of two components, the traffic engine and the search algorithms (the monitor is included as a part of search algorithm).

  • Traffic Engine (./traffic_engine)

    Traffic engine is an independent part that implemented in C/C++. Users can use the engine to generate flexible traffic of different patterns. See ./traffic_engine/README for more details and examples of complex traffic patterns.It is recommended to reproduce the anomalies (see Appendix of our NSDI paper) with the tool.

  • Search Algorithms (./search)

    Our simulated-annealing (SA) based algorithm and minimal feature set (MFS) are implemented in python scripts.

    • space.py -- the search space. Space defines the search space (upper/lower bounds, granularity for each parameter). Each Point has several Traffics (e.g., one A->B and one B->A). Each Traffic has two Endhost, one server and one client, as well as many other attributes that describe this traffic (e.g., QP type).
    • engine.py -- given a point, running collie_engine to set up the corresponding traffic described in the Point. If users need to set up traffics in different ways (rather than SSH), please modify the Engine class.
    • anneal.py -- the simulated-annealing based algorithm and minimal feature set algorithm are implemented here. If users need to modify the temperature and mutation logics, please modify here.
    • logger.py -- logging assistant functions for logging results and reproduce scripts.
    • bone.py -- monitor performance counters and collect statistic results based on vendor's tools.
    • hardware.py -- monitor diagnostic counters and collect statistic results based on vendor's tools. (Unfortunately currently diagnostic counters tools like NeoHost is not publicly available and open-sourced, so we only provide performance counter based code for NDA reasons.)
    • collie.py -- read user parameters and call SA to search.

Copyright

Collie is provided under the MIT license. See LICENSE for more details.

Owner
Bytedance Inc.
Bytedance Inc.
Fetch data from an excel file and create HTML file

excel-to-html Problem Statement! - Fetch data from excel file and create html file Excel.xlsx file contain the information.in multiple rows that is ne

Vivek Kashyap 1 Oct 25, 2021
Python version of RocketLeague-Dropshot-Calculated-shot

Python version of RocketLeague-Dropshot-Calculated-shot. This is just to demo around and a tool I used to develop the actual plugin.

JareBear 1 Jan 14, 2022
SHF TEST BACKEND

➰ SHF TEST BACKEND ➿ 🐙 Goals Dada una matriz de números enteros. Obtenga el elemento máximo en la matriz que produce la suma más pequeña al agregar t

Wilmer Rodríguez S 1 Dec 19, 2021
BMI-Calculator: Program to Calculate Body Mass Index (BMI)

The Body Mass Index (BMI) or Quetelet index is a value derived from the mass (weight) and height of an individual, male or female.

PyLaboratory 0 Feb 07, 2022
Async timeit - Async version of python's timeit

Async Timeit Replica of default python timeit module with small changes to allow

Raghava G Dhanya 3 Apr 13, 2022
A 3D Slicer Extension to view data from the flywheel heirarchy

flywheel-connect A 3D Slicer Extension to view, select, and download images from a Flywheel instance to 3D Slicer and storing Slicer outputs back to F

4 Nov 05, 2022
Opendrop - An open Apple AirDrop implementation written in Python

OpenDrop: an Open Source AirDrop Implementation OpenDrop is a command-line tool that allows sharing files between devices directly over Wi-Fi. Its uni

Secure Mobile Networking Lab 7.5k Jan 03, 2023
A simple tool made in Python language

Simple tool Uma simples ferramenta feita 100% em linguagem Python 💻 Requisitos: Python3 instalado em seu dispositivo Clonagem e acesso 📳 git clone h

josh washington 4 Dec 07, 2021
frida-based ceserver. iOS analysis is possible with Cheat Engine.

frida-ceserver frida-based ceserver. iOS analysis is possible with Cheat Engine. Original by Dark Byte. Usage Install frida on iOS. python main.py Cyd

KenjiroIchise 89 Jan 08, 2023
Urban Big Data Centre Housing Sensor Project

Housing Sensor Project The Urban Big Data Centre is conducting a study of indoor environmental data in Scottish houses. We are using Raspberry Pi devi

Jeremy Singer 2 Dec 13, 2021
Includes Chapters for Python Crash Course session.

python-crash-course Includes Chapters for Python Crash Course session. What will you learn: Python Essentials Creating Server Writing REST API Writing

Vineet Rao 3 Feb 17, 2021
OpenSea NFT API App using Python and Streamlit

opensea-nft-api-tutorial OpenSea NFT API App using Python and Streamlit Tutorial Video Walkthrough https://www.youtube.com/watch?v=49SupvcFC1M Instruc

64 Oct 28, 2022
A project for Perotti's MGIS350 for incorporating Flask

MGIS350_5 This is our project for Perotti's MGIS350 for incorporating Flask... RIT Dev Biz Apps Web Project A web-based Inventory system for company o

1 Nov 07, 2021
Automation of VASP DFT workflows with ASE - application scripts

This repo contains a library that aims at automatizing some Density Functional Theory (DFT) workflows in VASP by using the ASE toolkit.

Frank Niessen 5 Sep 06, 2022
Runs macOS on linux with qemu.

mac-on-linux-with-qemu Runs macOS on linux with qemu. Pre-requisites qemu-system-x86_64 dmg2img pulseaudio python[click] Usage After cloning the repos

Arindam Das 177 Dec 26, 2022
Reverse the infix string. Note that while reversing the string you must interchange left and right parentheses

Reverse the infix string. Note that while reversing the string you must interchange left and right parentheses. Obtain the postfix expression of the infix expression Step 1.Reverse the postfix expres

Sazzad Hossen 1 Jan 04, 2022
School helper, helps you at your pyllabus's.

pyllabus, helps you at your syllabus's... WARNING: It won't run without config.py! You should add config.py yourself, it will include your APIKEY. e.g

Ahmet Efe AKYAZI 6 Aug 07, 2022
redun aims to be a more expressive and efficient workflow framework

redun yet another redundant workflow engine redun aims to be a more expressive and efficient workflow framework, built on top of the popular Python pr

insitro 372 Jan 04, 2023
Anki cards generator for Leetcode

Leetcode Anki card generator Summary By running this script you'll be able to generate Anki cards with all the leetcode problems. I personally use it

Pavel Safronov 150 Dec 25, 2022
bamboo-engine 是一个通用的流程引擎,他可以解析,执行,调度由用户创建的流程任务,并提供了如暂停,撤销,跳过,强制失败,重试和重入等等灵活的控制能力和并行、子流程等进阶特性,并可通过水平扩展来进一步提升任务的并发处理能力。

bamboo-engine 是一个通用的流程引擎,他可以解析,执行,调度由用户创建的流程任务,并提供了如暂停,撤销,跳过,强制失败,重试和重入等等灵活的控制能力和并行、子流程等进阶特性,并可通过水平扩展来进一步提升任务的并发处理能力。 整体设计 Quick start 1. 安装依赖 2. 项目初始

腾讯蓝鲸 96 Dec 15, 2022