Fuzzing the Kernel Using Unicornafl and AFL++

Overview

Unicorefuzz

Build Status code-style: black

Fuzzing the Kernel using UnicornAFL and AFL++. For details, skim through the WOOT paper or watch this talk at CCCamp19.

Is it any good?

yes.

AFL Screenshot

Unicorefuzz Setup

  • Install python2 & python3 (ucf uses python3, however qemu/unicorn needs python2 to build)
  • Run ./setup.sh, preferrably inside a Virtualenv (else python deps will be installed using --user). During install, afl++ and uDdbg as well as python deps will be pulled and installed.
  • Enjoy ucf

Upgrading

When upgrading from an early version of ucf:

  • Unicorefuzz will notify you of config changes and new options automatically.
  • Alternatively, run ucf spec to output a commented config.py spec-like element.
  • probe_wrapper.py is now ucf attach.
  • harness.py is now named ucf emu.
  • The song remains the same.

Debug Kernel Setup (Skip this if you know how this works)

  • Create a qemu-img and install your preferred OS on there through qemu
  • An easy way to get a working userspace up and running in QEMU is to follow the steps described by syzkaller, namely create-image.sh
  • For kernel customization you might want to clone your preferred kernel version and compile it on the host. This way you can also compile your own kernel modules (e.g. example_module).
  • In order to find out the address of a loaded module in the guest OS you can use cat /proc/modules to find out the base address of the module location. Use this as the offset for the function where you want to break. If you specify MODULE and BREAK_OFFSET in the config.py, it should use ./get_mod_addr.sh to start it automated.
  • You can compile the kernel with debug info. When you have compiled the linux kernel you can start gdb from the kernel folder with gdb vmlinux. After having loaded other modules you can use the lx-symbols command in gdb to load the symbols for the other modules (make sure the .ko files of the modules are in your kernel folder). This way you can just use something like break function_to_break to set breakpoints for the required functions.
  • In order to compile a custom kernel for Arch, download the current Arch kernel and set the .config to the Arch default. Then set DEBUG_KERNEL=y, DEBUG_INFO=y, GDB_SCRIPTS=y (for convenience), KASAN=y, KASAN_EXTRA=y. For convenience, we added a working example_config that can be place to the linux dir.
  • To only get necessary kernel modules boot the current system and execute lsmod > mylsmod and copy the mylsmod file to your host system into the linux kernel folder that you downloaded. Then you can use make LSMOD=mylsmod localmodconfig to only make the kernel modules that are actually needed by the guest system. Then you can compile the kernel like normal with make. Then mount the guest file system to /mnt and use make modules_install INSTALL_MOD_PATH=/mnt. At last you have to create a new initramfs, which apparently has to be done on the guest system. Here use mkinitcpio -k <folder in /lib/modules/...> -g <where to put initramfs>. Then you just need to copy that back to the host and let qemu know where your kernel and the initramfs are located.
  • Setting breakpoints anywhere else is possible. For this, set BREAKADDR in the config.py instead.
  • For fancy debugging, ucf uses uDdbg
  • Before fuzzing, run sudo ./setaflops.sh to initialize your system for fuzzing.

Run

  • ensure a target gdbserver is reachable, for example via ./startvm.sh
  • adapt config.py:
    • provide the target's gdbserver network address in the config to the probe wrapper
    • provide the target's target function to the probe wrapper and harness
    • make the harness put AFL's input to the desired memory location by adopting the place_input func config.py
    • add all EXITs
  • start ucf attach, it will (try to) connect to gdb.
  • make the target execute the target function (by using it inside the vm)
  • after the breakpoint was hit, run ucf fuzz. Make sure afl++ is in the PATH. (Use ./resumeafl.sh to resume using the same input folder)

Putting afl's input to the correct location must be coded invididually for most targets. However with modern binary analysis frameworks like IDA or Ghidra it's possible to find the desired location's address.

The following place_input method places at the data section of sk_buff in key_extract:

    # read input into param xyz here:
    rdx = uc.reg_read(UC_X86_REG_RDX)
    utils.map_page(uc, rdx) # ensure sk_buf is mapped
    bufferPtr = struct.unpack("<Q",uc.mem_read(rdx + 0xd8, 8))[0]
    utils.map_page(uc, bufferPtr) # ensure the buffer is mapped
    uc.mem_write(rdx, input) # insert afl input
    uc.mem_write(rdx + 0xc4, b"\xdc\x05") # fix tail

QEMUing the Kernel

A few general pointers. When using ./startvm.sh, the VM can be debugged via gdb. Use

$gdb
>file ./linux/vmlinux
>target remote :1234

This dynamic method makes it rather easy to find out breakpoints and that can then be fed to config.py. On top, startvm.sh will forward port 22 (ssh) to 8022 - you can use it to ssh into the VM. This makes it easier to interact with it.

Debugging

You can step through the code, starting at the breakpoint, with any given input. The fancy debugging makes use of uDdbg. To do so, run ucf emu -d $inputfile. Possible inputs to the harness (the thing wrapping afl-unicorn) that help debugging:

-d flag loads the target inside the unicorn debugger (uDdbg) -t flag enables the afl-unicorn tracer. It prints every emulated instruction, as well as displays memory accesses.

Gotchas

A few things to consider.

FS_BASE and GS_BASE

Unicorn did not offer a way to directly set model specific registers directly. The forked unicornafl version of AFL++ finally supports it. Most ugly code of earlier versions was scrapped.

Improve Fuzzing Speed

Right now, the Unicorefuzz ucf attach harness might need to be manually restarted after an amount of pages has been allocated. Allocated pages should propagate back to the forkserver parent automatically but might still get reloaded from disk for each iteration.

IO/Printthings

It's generally a good idea to nop out kprintf or kernel printing functionality if possible, when the program is loaded into the emulator.

Troubleshooting

If you got trouble running unicorefuzz, follow these rulse, worst case feel free to reach out to us, for example to @domenuk on twitter. For some notes on debugging and developing ucf and afl-unicorn further, read DEVELOPMENT.md

Just won't start

Run the harness without afl (ucf emu -t ./sometestcase). Make sure you are not in a virtualenv or in the correct one. If this works but it still crashes in AFL, set AFL_DEBUG_CHILD_OUTPUT=1 to see some harness output while fuzzing.

All testcases time out

Make sure ucf attach is running, in the same folder, and breakpoint has been triggered.

Owner
Security in Telecommunications
The Computer Security Group at Berlin University of Technology
Security in Telecommunications
Source code for paper: Knowledge Inheritance for Pre-trained Language Models

Knowledge-Inheritance Source code paper: Knowledge Inheritance for Pre-trained Language Models (preprint). The trained model parameters (in Fairseq fo

THUNLP 31 Nov 19, 2022
Tensor-based approaches for fMRI classification

tensor-fmri Using tensor-based approaches to classify fMRI data from StarPLUS. Citation If you use any code in this repository, please cite the follow

4 Sep 07, 2022
PyTorch implementation of neural style randomization for data augmentation

README Augment training images for deep neural networks by randomizing their visual style, as described in our paper: https://arxiv.org/abs/1809.05375

84 Nov 23, 2022
Predicting path with preference based on user demonstration using Maximum Entropy Deep Inverse Reinforcement Learning in a continuous environment

Preference-Planning-Deep-IRL Introduction Check my portfolio post Dependencies Gym stable-baselines3 PyTorch Usage Take Demonstration python3 record.

Tianyu Li 9 Oct 26, 2022
Facebook Research 605 Jan 02, 2023
Grad2Task: Improved Few-shot Text Classification Using Gradients for Task Representation

Grad2Task: Improved Few-shot Text Classification Using Gradients for Task Representation Prerequisites This repo is built upon a local copy of transfo

Jixuan Wang 10 Sep 28, 2022
Event sourced bank - A wide-and-shallow example using the Python event sourcing library

Event Sourced Bank A "wide but shallow" example of using the Python event sourci

3 Mar 09, 2022
How to Learn a Domain Adaptive Event Simulator? ACM MM, 2021

LETGAN How to Learn a Domain Adaptive Event Simulator? ACM MM 2021 Running Environment: pytorch=1.4, 1 NVIDIA-1080TI. More details can be found in pap

CVTEAM 4 Sep 20, 2022
2021 Artificial Intelligence Diabetes Datathon

A.I.D.D. 2021 2021 Artificial Intelligence Diabetes Datathon A.I.D.D. 2021은 ‘2021 인공지능 학습용 데이터 구축사업’을 통해 만들어진 학습용 데이터를 활용하여 당뇨병을 효과적으로 예측할 수 있는가에 대한 A

2 Dec 27, 2021
PartImageNet is a large, high-quality dataset with part segmentation annotations

PartImageNet: A Large, High-Quality Dataset of Parts We will release our dataset and scripts soon after cleaning and approval. Introduction PartImageN

Ju He 77 Nov 30, 2022
Code for the ICCV'21 paper "Context-aware Scene Graph Generation with Seq2Seq Transformers"

ICCV'21 Context-aware Scene Graph Generation with Seq2Seq Transformers Authors: Yichao Lu*, Himanshu Rai*, Cheng Chang*, Boris Knyazev†, Guangwei Yu,

Layer6 Labs 37 Dec 18, 2022
Implementation of the Remixer Block from the Remixer paper, in Pytorch

Remixer - Pytorch Implementation of the Remixer Block from the Remixer paper, in Pytorch. It claims that substituting the feedforwards in transformers

Phil Wang 35 Aug 23, 2022
DA2Lite is an automated model compression toolkit for PyTorch.

DA2Lite (Deep Architecture to Lite) is a toolkit to compress and accelerate deep network models. ⭐ Star us on GitHub — it helps!! Frameworks & Librari

Sinhan Kang 7 Mar 22, 2022
[NeurIPS 2021] Introspective Distillation for Robust Question Answering

Introspective Distillation (IntroD) This repository is the Pytorch implementation of our paper "Introspective Distillation for Robust Question Answeri

Yulei Niu 13 Jul 26, 2022
An implementation of EWC with PyTorch

EWC.pytorch An implementation of Elastic Weight Consolidation (EWC), proposed in James Kirkpatrick et al. Overcoming catastrophic forgetting in neural

Ryuichiro Hataya 166 Dec 22, 2022
a pytorch implementation of auto-punctuation learned character by character

Learning Auto-Punctuation by Reading Engadget Articles Link to Other of my work 🌟 Deep Learning Notes: A collection of my notes going from basic mult

Ge Yang 137 Nov 09, 2022
Live training loss plot in Jupyter Notebook for Keras, PyTorch and others

livelossplot Don't train deep learning models blindfolded! Be impatient and look at each epoch of your training! (RECENT CHANGES, EXAMPLES IN COLAB, A

Piotr Migdał 1.2k Jan 08, 2023
【steal piano】GitHub偷情分析工具!

【steal piano】GitHub偷情分析工具! 你是否有这样的困扰,有一天你的仓库被很多人加了star,但是你却不知道这些人都是从哪来的? 别担心,GitHub偷情分析工具帮你轻松解决问题! 原理 GitHub偷情分析工具透过分析star的时间以及他们之间的follow关系,可以推测出每个st

黄巍 442 Dec 21, 2022
A annotation of yolov5-5.0

代码版本:0714 commit #4000 $ git clone https://github.com/ultralytics/yolov5 $ cd yolov5 $ git checkout 720aaa65c8873c0d87df09e3c1c14f3581d4ea61 这个代码只是注释版

Laughing 229 Dec 17, 2022
LEDNet: A Lightweight Encoder-Decoder Network for Real-time Semantic Segmentation

LEDNet: A Lightweight Encoder-Decoder Network for Real-time Semantic Segmentation Table of Contents: Introduction Project Structure Installation Datas

Yu Wang 492 Dec 02, 2022