Fuzzing the Kernel Using Unicornafl and AFL++

Overview

Unicorefuzz

Build Status code-style: black

Fuzzing the Kernel using UnicornAFL and AFL++. For details, skim through the WOOT paper or watch this talk at CCCamp19.

Is it any good?

yes.

AFL Screenshot

Unicorefuzz Setup

  • Install python2 & python3 (ucf uses python3, however qemu/unicorn needs python2 to build)
  • Run ./setup.sh, preferrably inside a Virtualenv (else python deps will be installed using --user). During install, afl++ and uDdbg as well as python deps will be pulled and installed.
  • Enjoy ucf

Upgrading

When upgrading from an early version of ucf:

  • Unicorefuzz will notify you of config changes and new options automatically.
  • Alternatively, run ucf spec to output a commented config.py spec-like element.
  • probe_wrapper.py is now ucf attach.
  • harness.py is now named ucf emu.
  • The song remains the same.

Debug Kernel Setup (Skip this if you know how this works)

  • Create a qemu-img and install your preferred OS on there through qemu
  • An easy way to get a working userspace up and running in QEMU is to follow the steps described by syzkaller, namely create-image.sh
  • For kernel customization you might want to clone your preferred kernel version and compile it on the host. This way you can also compile your own kernel modules (e.g. example_module).
  • In order to find out the address of a loaded module in the guest OS you can use cat /proc/modules to find out the base address of the module location. Use this as the offset for the function where you want to break. If you specify MODULE and BREAK_OFFSET in the config.py, it should use ./get_mod_addr.sh to start it automated.
  • You can compile the kernel with debug info. When you have compiled the linux kernel you can start gdb from the kernel folder with gdb vmlinux. After having loaded other modules you can use the lx-symbols command in gdb to load the symbols for the other modules (make sure the .ko files of the modules are in your kernel folder). This way you can just use something like break function_to_break to set breakpoints for the required functions.
  • In order to compile a custom kernel for Arch, download the current Arch kernel and set the .config to the Arch default. Then set DEBUG_KERNEL=y, DEBUG_INFO=y, GDB_SCRIPTS=y (for convenience), KASAN=y, KASAN_EXTRA=y. For convenience, we added a working example_config that can be place to the linux dir.
  • To only get necessary kernel modules boot the current system and execute lsmod > mylsmod and copy the mylsmod file to your host system into the linux kernel folder that you downloaded. Then you can use make LSMOD=mylsmod localmodconfig to only make the kernel modules that are actually needed by the guest system. Then you can compile the kernel like normal with make. Then mount the guest file system to /mnt and use make modules_install INSTALL_MOD_PATH=/mnt. At last you have to create a new initramfs, which apparently has to be done on the guest system. Here use mkinitcpio -k <folder in /lib/modules/...> -g <where to put initramfs>. Then you just need to copy that back to the host and let qemu know where your kernel and the initramfs are located.
  • Setting breakpoints anywhere else is possible. For this, set BREAKADDR in the config.py instead.
  • For fancy debugging, ucf uses uDdbg
  • Before fuzzing, run sudo ./setaflops.sh to initialize your system for fuzzing.

Run

  • ensure a target gdbserver is reachable, for example via ./startvm.sh
  • adapt config.py:
    • provide the target's gdbserver network address in the config to the probe wrapper
    • provide the target's target function to the probe wrapper and harness
    • make the harness put AFL's input to the desired memory location by adopting the place_input func config.py
    • add all EXITs
  • start ucf attach, it will (try to) connect to gdb.
  • make the target execute the target function (by using it inside the vm)
  • after the breakpoint was hit, run ucf fuzz. Make sure afl++ is in the PATH. (Use ./resumeafl.sh to resume using the same input folder)

Putting afl's input to the correct location must be coded invididually for most targets. However with modern binary analysis frameworks like IDA or Ghidra it's possible to find the desired location's address.

The following place_input method places at the data section of sk_buff in key_extract:

    # read input into param xyz here:
    rdx = uc.reg_read(UC_X86_REG_RDX)
    utils.map_page(uc, rdx) # ensure sk_buf is mapped
    bufferPtr = struct.unpack("<Q",uc.mem_read(rdx + 0xd8, 8))[0]
    utils.map_page(uc, bufferPtr) # ensure the buffer is mapped
    uc.mem_write(rdx, input) # insert afl input
    uc.mem_write(rdx + 0xc4, b"\xdc\x05") # fix tail

QEMUing the Kernel

A few general pointers. When using ./startvm.sh, the VM can be debugged via gdb. Use

$gdb
>file ./linux/vmlinux
>target remote :1234

This dynamic method makes it rather easy to find out breakpoints and that can then be fed to config.py. On top, startvm.sh will forward port 22 (ssh) to 8022 - you can use it to ssh into the VM. This makes it easier to interact with it.

Debugging

You can step through the code, starting at the breakpoint, with any given input. The fancy debugging makes use of uDdbg. To do so, run ucf emu -d $inputfile. Possible inputs to the harness (the thing wrapping afl-unicorn) that help debugging:

-d flag loads the target inside the unicorn debugger (uDdbg) -t flag enables the afl-unicorn tracer. It prints every emulated instruction, as well as displays memory accesses.

Gotchas

A few things to consider.

FS_BASE and GS_BASE

Unicorn did not offer a way to directly set model specific registers directly. The forked unicornafl version of AFL++ finally supports it. Most ugly code of earlier versions was scrapped.

Improve Fuzzing Speed

Right now, the Unicorefuzz ucf attach harness might need to be manually restarted after an amount of pages has been allocated. Allocated pages should propagate back to the forkserver parent automatically but might still get reloaded from disk for each iteration.

IO/Printthings

It's generally a good idea to nop out kprintf or kernel printing functionality if possible, when the program is loaded into the emulator.

Troubleshooting

If you got trouble running unicorefuzz, follow these rulse, worst case feel free to reach out to us, for example to @domenuk on twitter. For some notes on debugging and developing ucf and afl-unicorn further, read DEVELOPMENT.md

Just won't start

Run the harness without afl (ucf emu -t ./sometestcase). Make sure you are not in a virtualenv or in the correct one. If this works but it still crashes in AFL, set AFL_DEBUG_CHILD_OUTPUT=1 to see some harness output while fuzzing.

All testcases time out

Make sure ucf attach is running, in the same folder, and breakpoint has been triggered.

Owner
Security in Telecommunications
The Computer Security Group at Berlin University of Technology
Security in Telecommunications
[NeurIPS 2021 Spotlight] Aligning Pretraining for Detection via Object-Level Contrastive Learning

SoCo [NeurIPS 2021 Spotlight] Aligning Pretraining for Detection via Object-Level Contrastive Learning By Fangyun Wei*, Yue Gao*, Zhirong Wu, Han Hu,

Yue Gao 139 Dec 14, 2022
🤗 Transformers: State-of-the-art Natural Language Processing for Pytorch, TensorFlow, and JAX.

English | 简体中文 | 繁體中文 | 한국어 State-of-the-art Natural Language Processing for Jax, PyTorch and TensorFlow 🤗 Transformers provides thousands of pretrai

Hugging Face 77.4k Jan 05, 2023
A Factor Model for Persistence in Investment Manager Performance

Factor-Model-Manager-Performance A Factor Model for Persistence in Investment Manager Performance I apply methods and processes similar to those used

Omid Arhami 1 Dec 01, 2021
[CVPR2021] Domain Consensus Clustering for Universal Domain Adaptation

[CVPR2021] Domain Consensus Clustering for Universal Domain Adaptation [Paper] Prerequisites To install requirements: pip install -r requirements.txt

Guangrui Li 84 Dec 26, 2022
Official page of Patchwork (RA-L'21 w/ IROS'21)

Patchwork Official page of "Patchwork: Concentric Zone-based Region-wise Ground Segmentation with Ground Likelihood Estimation Using a 3D LiDAR Sensor

Hyungtae Lim 254 Jan 05, 2023
[ICCV'21] NEAT: Neural Attention Fields for End-to-End Autonomous Driving

NEAT: Neural Attention Fields for End-to-End Autonomous Driving Paper | Supplementary | Video | Poster | Blog This repository is for the ICCV 2021 pap

254 Jan 02, 2023
Survival analysis in Python

What is survival analysis and why should I learn it? Survival analysis was originally developed and applied heavily by the actuarial and medical commu

Cameron Davidson-Pilon 2k Jan 08, 2023
Official PyTorch implementation of "Edge Rewiring Goes Neural: Boosting Network Resilience via Policy Gradient".

Edge Rewiring Goes Neural: Boosting Network Resilience via Policy Gradient This repository is the official PyTorch implementation of "Edge Rewiring Go

Shanchao Yang 4 Dec 12, 2022
LogAvgExp - Pytorch Implementation of LogAvgExp

LogAvgExp - Pytorch Implementation of LogAvgExp for Pytorch Install $ pip instal

Phil Wang 31 Oct 14, 2022
On Size-Oriented Long-Tailed Graph Classification of Graph Neural Networks

On Size-Oriented Long-Tailed Graph Classification of Graph Neural Networks We provide the code (in PyTorch) and datasets for our paper "On Size-Orient

Zemin Liu 4 Jun 18, 2022
使用yolov5训练自己数据集(详细过程)并通过flask部署

使用yolov5训练自己的数据集(详细过程)并通过flask部署 依赖库 torch torchvision numpy opencv-python lxml tqdm flask pillow tensorboard matplotlib pycocotools Windows,请使用 pycoc

HB.com 19 Dec 28, 2022
The Power of Scale for Parameter-Efficient Prompt Tuning

The Power of Scale for Parameter-Efficient Prompt Tuning Implementation of soft embeddings from https://arxiv.org/abs/2104.08691v1 using Pytorch and H

Kip Parker 208 Dec 30, 2022
This is the official implement of paper "ActionCLIP: A New Paradigm for Action Recognition"

This is an official pytorch implementation of ActionCLIP: A New Paradigm for Video Action Recognition [arXiv] Overview Content Prerequisites Data Prep

268 Jan 09, 2023
Contra is a lightweight, production ready Tensorflow alternative for solving time series prediction challenges with AI

Contra AI Engine A lightweight, production ready Tensorflow alternative developed by Styvio styvio.com » How to Use · Report Bug · Request Feature Tab

styvio 14 May 25, 2022
Code for Learning Manifold Patch-Based Representations of Man-Made Shapes, in ICLR 2021.

LearningPatches | Webpage | Paper | Video Learning Manifold Patch-Based Representations of Man-Made Shapes Dmitriy Smirnov, Mikhail Bessmeltsev, Justi

Dima Smirnov 22 Nov 14, 2022
Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models.

WECHSEL Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models. arXiv: https://arx

Institute of Computational Perception 45 Dec 29, 2022
Code release to accompany paper "Geometry-Aware Gradient Algorithms for Neural Architecture Search."

Geometry-Aware Gradient Algorithms for Neural Architecture Search This repository contains the code required to run the experiments for the DARTS sear

18 May 27, 2022
This repository contains the code for the CVPR 2020 paper "Differentiable Volumetric Rendering: Learning Implicit 3D Representations without 3D Supervision"

Differentiable Volumetric Rendering Paper | Supplementary | Spotlight Video | Blog Entry | Presentation | Interactive Slides | Project Page This repos

697 Jan 06, 2023
The challenge for Quantum Coalition Hackathon 2021

Qchack 2021 Google Challenge This is a challenge for the brave 2021 qchack.io participants. Instructions Hello, intrepid qchacker, welcome to the G|o

quantumlib 18 May 04, 2022
An implementation of Video Frame Interpolation via Adaptive Separable Convolution using PyTorch

This work has now been superseded by: https://github.com/sniklaus/revisiting-sepconv sepconv-slomo This is a reference implementation of Video Frame I

Simon Niklaus 984 Dec 16, 2022