Ankou: Guiding Grey-box Fuzzing towards Combinatorial Difference

Ankou

Ankou is a source-based grey-box fuzzer. It aims to use a richer fitness function by going beyond simple branch coverage and considering the combination of branches exercised during program execution. The details of the technique can be found in our paper "Ankou: Guiding Grey-box Fuzzing towards Combinatorial Difference", published at ICSE 2020.
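
To make the distinction concrete, here is a small illustrative sketch. It is not Ankou's implementation (see the paper for the actual fitness function). It assumes each execution is summarised as the set of branch IDs it exercised, and the helper names `new_branches` and `is_new_combination` are hypothetical. The third execution covers no new branch, yet it exercises a combination of branches that no earlier execution did.

```python
# Illustrative only: contrasts plain branch coverage with branch combinations.
# Branch IDs and helper names are made up for this example.

def new_branches(seen_branches, execution):
    """Branches in `execution` that no earlier execution has covered."""
    return set(execution) - seen_branches


def is_new_combination(seen_combinations, execution):
    """True if this exact set of branches has not been observed before."""
    return frozenset(execution) not in seen_combinations


executions = [{"b1", "b2"}, {"b3"}, {"b1", "b3"}]

seen_branches = set()
seen_combinations = set()
report = []
for execution in executions:
    fresh = new_branches(seen_branches, execution)
    novel = is_new_combination(seen_combinations, execution)
    report.append((sorted(fresh), novel))
    seen_branches |= execution
    seen_combinations.add(frozenset(execution))

for fresh, novel in report:
    print("new branches:", fresh, "| new combination:", novel)
```

A fuzzer guided only by branch coverage would discard the third input, while one that also looks at combinations has a reason to keep it.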

Dependencies

Go

Ankou is written solely in Go, so Go must be installed. Be sure to set the GOPATH environment variable, for example to the ~/go directory.

AFL

Ankou relies on AFL instrumentation: fuzzed targets need to be compiled with afl-gcc or afl-clang. To install AFL:

wget http://lcamtuf.coredump.cx/afl/releases/afl-latest.tgz
tar xf afl-latest.tgz
cd afl-2.52b
make
# The last command is optional, but if you don't install the AFL compiler you'll
# need to provide its absolute path in the configure step below.
sudo make install

GDB

Triaging requires gdb, and ASLR needs to be disabled:

echo 0 | sudo tee /proc/sys/kernel/randomize_va_space

Note that when using Docker containers, this needs to be run on the host.
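
Before triaging, it can help to confirm that the setting took effect. The following is a minimal sketch, assuming a Linux system that exposes /proc/sys/kernel/randomize_va_space; the helper names `aslr_disabled` and `check_aslr` are hypothetical and not part of Ankou.

```python
# Sketch: report whether ASLR is disabled, based on the kernel's
# randomize_va_space setting (0 means off, any other value means on).
from pathlib import Path

ASLR_SYSCTL = Path("/proc/sys/kernel/randomize_va_space")


def aslr_disabled(raw_value):
    """Interpret the file content; only 0 means ASLR is off."""
    return int(raw_value.strip()) == 0


def check_aslr(path=ASLR_SYSCTL):
    """Return True or False, or None when the file cannot be read."""
    try:
        return aslr_disabled(path.read_text())
    except OSError:
        return None


print("ASLR disabled:", check_aslr())
```

Inside a container the file reflects the host kernel's setting, so the check can be run there even though the setting itself must be changed on the host.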

Installation

Once Go and AFL are installed, you can get Ankou with:

go get github.com/SoftSec-KAIST/Ankou   # Clone Ankou and its dependencies
go build github.com/SoftSec-KAIST/Ankou # Compile Ankou
Note: If getting Ankou from another location, this needs to be done manually:
mkdir -p $GOPATH/src/github.com/SoftSec-KAIST
cd $GOPATH/src/github.com/SoftSec-KAIST
git clone REPO  # By default REPO is https://github.com/SoftSec-KAIST/Ankou
cd Ankou
go get .    # Get dependencies
go build .  # Compile

Usage

Now we are ready to fuzz. We first need to compile the target we want to fuzz with afl-gcc or afl-clang. Let's take the classical starting example for fuzzing, binutils:

wget https://mirror.ibcp.fr/pub/gnu/binutils/binutils-2.33.1.tar.xz
tar xf binutils-2.33.1.tar.xz
cd binutils-2.33.1
CC=afl-gcc CXX=afl-g++ ./configure --prefix=`pwd`/install
make -j
make install

Now we are ready to run Ankou:

cd install/bin
mkdir seeds; cp elfedit seeds/ # Put anything in the seeds folder.
go run github.com/SoftSec-KAIST/Ankou -app ./readelf -args "-a @@" -i seeds -o out
# Or use the binary we compiled above:
/path/to/Ankou -app ./readelf -args "-a @@" -i seeds -o out
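
When launching several campaigns from a script, the same command line can be assembled programmatically. This is a sketch under stated assumptions: `ankou_command` is a hypothetical helper, and it only uses flags that appear in this README (`-app`, `-args`, `-i`, `-o`, plus `-threads` and `-dur` from the evaluation example).

```python
# Sketch: assemble an Ankou command line from Python.
# `ankou_command` is a hypothetical helper, not part of Ankou.
import shlex


def ankou_command(ankou, app, args, seeds, out, threads=None, dur=None):
    """Build the argument list; optional flags are added only when given."""
    cmd = [ankou, "-app", app, "-args", args, "-i", seeds, "-o", out]
    if threads is not None:
        cmd += ["-threads", str(threads)]
    if dur is not None:
        cmd += ["-dur", dur]
    return cmd


cmd = ankou_command("/path/to/Ankou", "./readelf", "-a @@", "seeds", "out")
print(shlex.join(cmd))  # shell-quoted form, for logging or copy-paste
# To launch the campaign: subprocess.run(cmd, check=True)
```

Passing the list directly to `subprocess.run` avoids shell quoting problems with the `-args` value, which contains a space.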

Evaluation Reproduction

Once Ankou is installed, in order to reproduce the Ankou evaluation:

  1. Compile the 24 packages mentioned in the paper at the same version or commit using afl-gcc. The sources of all packages, at the versions used in the Ankou evaluation, can be found at https://github.com/SoftSec-KAIST/Ankou-Benchmark. Additionally, this repository includes the seeds used to initialize the evaluation fuzzing campaigns.
  2. Run the produced subjects with the commands found in benchmark/configuration.json. benchmark/rq1_rq3.json only contains the 24 subjects used for Research Questions 1 and 3 of the paper.
  3. Analyze the Ankou output directory for results. Crashes are listed in $OUTPUT_DIR/crashes-* and found seeds in $OUTPUT_DIR/seeds-*. Statistics of the fuzzing campaign can be found in the CSV files of the $OUTPUT_DIR/status* directory. The edge_n value of receiver.csv represents the branch coverage, and the execN column of seed_manager.csv represents the total number of test cases executed so far. Divide it by the time column to obtain the throughput.

There are too many programs in our benchmark, so we will use only one package in this example: cflow.

  1. Compilation.
git clone https://github.com/SoftSec-KAIST/Ankou-Benchmark
cd Ankou-Benchmark
tar xf seeds.tar.xz
cd sources
tar xf cflow-1.6.tar.xz
cd cflow-1.6
CC=afl-gcc CXX=afl-g++ ./configure --prefix=`pwd`/build
make -j
make install
cd ../../..
  2. Preparation of the fuzzing campaign.
mkdir fuzzrun
cp Ankou-Benchmark/sources/cflow-1.6/build/bin/cflow fuzzrun
cp -r Ankou-Benchmark/seeds/cflow fuzzrun/seeds
  3. Run the campaign. The command below starts a 24-hour fuzzing campaign. The '-dur' option can be adjusted, or Ankou can be interrupted earlier. With this version of cflow and these seeds, a crash should be found in less than an hour.
cd fuzzrun
go run github.com/SoftSec-KAIST/Ankou -app cflow -args "-o /dev/null @@" \
    -i seeds -threads 1 -o cflow_out -dur 24h
  4. Results analysis.
cd cflow_out/status_*
# Print the final branch coverage:
python -c "print(open('receiver.csv').readlines()[-1].split(',')[0])"
# Print the overall throughput:
python -c "last = open('seed_manager.csv').readlines()[-1].split(','); print(float(last[5])/int(last[6]))"
# Print effectiveness of the dynamic PCA (see RQ2):
python -c "last = open('receiver.csv').readlines()[-1].split(','); print('{}%'.format(100-100*float(last[2])/float(last[1])))"
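
For scripted analysis, the same statistics can be read by column name instead of by position. This is a sketch that assumes the CSV files start with a header row naming the `edge_n`, `execN` and `time` columns described above; that layout is an assumption, so if your files have no header, stay with the positional one-liners. The helper names and the sample data are made up.

```python
# Sketch: read Ankou status CSVs by column name.
# Assumes a header row with `edge_n`, `execN` and `time` (not verified here).
import csv
import io


def last_row(csv_text):
    """Return the final data row of a headered CSV as a dict."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    if not rows:
        raise ValueError("no data rows")
    return rows[-1]


def final_coverage(receiver_csv_text):
    """Branch coverage reported on the last line of receiver.csv."""
    return int(last_row(receiver_csv_text)["edge_n"])


def throughput(seed_manager_csv_text):
    """Executions per unit of the time column, from the last line."""
    row = last_row(seed_manager_csv_text)
    return float(row["execN"]) / float(row["time"])


# Made-up sample data, only to show the calls.
receiver_sample = "edge_n,other\n10,1\n42,3\n"
seed_manager_sample = "execN,time\n100,10\n900,60\n"
print(final_coverage(receiver_sample))
print(throughput(seed_manager_sample))
```

With real output, pass the file contents instead of the samples, for example `final_coverage(open('receiver.csv').read())`.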

Safe Stack Hash Triaging

Once the environment is set up, the scripts work in two steps:

  1. Run the binary on the crashing input to produce a core file. Running ulimit -c unlimited beforehand ensures that the core is dumped.
  2. Use the scripts in the triage folder of this repository:
cd $GOPATH/src/github.com/SoftSec-KAIST/Ankou/triage
gdb -x triage.py -x triage.gdb -batch -c /path/to/core /path/to/binary
cat hash.txt # The stack hashes are found in this text file.
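
Step 1 can also be scripted. The sketch below is an assumption-laden helper, not part of the triage scripts: `build_argv`, `allow_core_dumps` and `run_for_core` are hypothetical names, the `@@` placeholder is assumed to stand for the input file as in the fuzzing commands above, and `crashes/input` is a made-up path. It relies on Python's Unix-only `resource` module.

```python
# Sketch: re-run a target on a crashing input with core dumps enabled.
# Helper names and the example path are hypothetical.
import resource
import subprocess


def build_argv(binary, args, input_path):
    """Replace the @@ placeholder with the crashing input path.

    Uses a plain whitespace split, so quoted arguments are not supported.
    """
    return [binary] + [input_path if a == "@@" else a for a in args.split()]


def allow_core_dumps():
    """Raise the soft core-size limit to the hard limit.

    This matches `ulimit -c unlimited` only when the hard limit is unlimited.
    """
    _, hard = resource.getrlimit(resource.RLIMIT_CORE)
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))


def run_for_core(binary, args, input_path):
    """Run the target; a negative return code means it died from a signal."""
    argv = build_argv(binary, args, input_path)
    # preexec_fn runs in the child just before exec, so the limit applies to it.
    return subprocess.run(argv, preexec_fn=allow_core_dumps).returncode


print(build_argv("./cflow", "-o /dev/null @@", "crashes/input"))
```

Where the core file ends up depends on the system's kernel.core_pattern setting, so check it if no core appears in the working directory.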