A tool that automatically creates fuzzing harnesses based on a library

Overview

AutoHarness

Created by Akshat Parikh

What is this tool?

AutoHarness is a tool that automatically generates fuzzing harnesses for you. This idea stems from a concurrent problem in fuzzing codebases today: large codebases have thousands of functions and pieces of code that can be embedded fairly deep into the library. It is very hard or sometimes even impossible for smart fuzzers to reach that codepath. Even for large fuzzing projects such as oss-fuzz, there are still parts of the codebase that are not covered in fuzzing. Hence, this program tries to alleviate this problem in some capacity as well as provide a tool that security researchers can use to initially test a code base. This program only supports code bases which are coded in C and C++.

Setup/Demonstration

This program utilizes llvm and clang for libfuzzer, Codeql for finding functions, and python for the general program. This program was tested on Ubuntu 20.04 with llvm 12 and python 3. Here is the initial setup.

sudo apt-get update;
sudo apt-get install python3 python3-pip llvm-12* clang-12 git;
pip3 install pandas lief subprocess os argparse ast;

Follow the installation procedure for Codeql on https://github.com/github/codeql. Make sure to install the CLI tools and the libraries. For my testing, I have stored both the tools and libraries under one folder. Finally, clone this repository or download a release. Here is the program's output after running on nginx with the multiple argument mode set. This is the command I used.

python3 harness.py -L /home/akshat/nginx-1.21.0/objs/ -C /home/akshat/codeql-h/ -M 1 -O /home/akshat/autoharness/ -D nginx -G 1 -Y 1 -F "-I /home/akshat/nginx-1.21.0/objs -I /home/akshat/nginx-1.21.0/src/core -I /home/akshat/nginx-1.21.0/src/event -I /home/akshat/nginx-1.21.0/src/http -I /home/akshat/nginx-1.21.0/src/mail -I /home/akshat/nginx-1.21.0/src/misc -I /home/akshat/nginx-1.21.0/src/os -I /home/akshat/nginx-1.21.0/src/stream -I /home/akshat/nginx-1.21.0/src/os/unix" -X ngx_config.h,ngx_core.h

Results: image It is definitely possible to raise the success by further debugging the compilation and adding more header files and more. Note the nginx project does not have any shared objects after compiling. However, this program does have a feature that can convert PIE executables into shared libraries.

Planned Features (in order of progress)

  1. Struct Fuzzing

The current way implemented in the program to fuzz functions with multiple arguments is by using fuzzing data provider. There are some improvements to make in this integration; however, I believe I can incorporate this feature with data structures. A problem which I come across when coding this is with codeql and nested structs. It becomes especially hard without writing multiple queries which vary for every function. In short, this feature needs more work. I was also thinking about a simple solution using protobufs.

  1. Implementation Based Harness Creation

Using codeql, it is possible to use to generate a control flow graph that maps how the parameters in a function are initialized. Using that information, we can create a better harness. Another way is to look for implementations for the function that exist in the library and use that information to make an educated guess on an implementation of the function as a harness. The problems I currently have with this are generating the control flow graphs with codeql.

  1. Parallelized fuzzing/False Positive Detection

I can create a simple program that runs all the harnesses and picks up on any of the common false positives using ASAN. Also, I can create a new interface that runs all the harnesses at once and displays their statistics.

Contribution/Bugs

If you find any bugs with this program, please create an issue. I will try to come up with a fix. Also, if you have any ideas on any new features or how to implement performance upgrades or the current planned features, please create a pull request or an issue with the tag (contribution).

PSA

This tool generates some false positives. Please first analyze the crashes and see if it is valid bug or if it is just an implementation bug. Also, you can enable the debug mode if some functions are not compiling. This will help you understand if there are some header files that you are missing or any linkage issues. If the project you are working on does not have shared libraries but an executable, make sure to compile the executable in PIE form so that this program can convert it into a shared library.

References

  1. https://lief.quarkslab.com/doc/latest/tutorials/08_elf_bin2lib.html
Comments
  • Fuzzers never compile

    Fuzzers never compile

    Hello, I made it to the final step in your README:

    python3 harness.py -L /root/nginx/ -C /root/codeql/ -M 1 -O /root/autoharness/ -D nginx -G 1 -Y 1 -F "-I /root/nginx/objs -I /root/nginx/src/core -I /root/nginx/src/event -I /root/nginx/src/http -I /root/nginx/src/mail -I /root/nginx/src/misc -I /root/nginx/src/os -I /root/nginx/src/stream -I /root/nginx/src/os/unix" -X ngx_config.h,ngx_core.h

    and it appears to be doing something:

    Compiling query plan for /root/codeql/multiargfunc.ql.
    [1/1 comp 19.4s] Compiled /root/codeql/multiargfunc.ql.
    Starting evaluation of codeql-cpp/multiargfunc.ql.
    [1/1 eval 4.9s] Evaluation done; writing results to /root/autoharness/multiarg.bqrs.
    Shutting down query evaluator.
    

    however, it never builds any fuzzers, this is all that appears on screen:

                      f             g                                                  t
    0             _Exit          void                                              [int]
    1        __asprintf           int    [const char *__restrict__, char **__restrict__]
    2    __asprintf_chk           int  [int, const char *__restrict__, char **__restr...
    3        __bswap_16    __uint16_t                                       [__uint16_t]
    4        __bswap_32    __uint32_t                                       [__uint32_t]
    ..              ...           ...                                                ...
    812         waitpid       __pid_t                              [int, __pid_t, int *]
    813        wcstombs        size_t  [size_t, char *__restrict__, const wchar_t *__...
    814          wctomb           int                                  [char *, wchar_t]
    815           write       ssize_t                        [int, size_t, const void *]
    816          zError  const char *                                              [int]
    
    [817 rows x 3 columns]****
    

    Am using Ubuntu clang version 12.0.0-3ubuntu1~21.04.2 and have installed all of the requirements as mentioned in the README. Any tips would be much appreciated.

    opened by geeknik 11
  • Could not resolve type ...

    Could not resolve type ...

    Hi, The following command failed to generate harness.

    python3 harness.py -L /local/codeql-home/nginx-1.21.1/objs/ -C /local/codeql-home/codeql/ -M 1 -O /local/nginx/autoharness/ -D nginx -G 1 -Y 1 -F "-I /local/codeql-home/nginx-1.21.1/objs -I /local/codeql-home/nginx-1.21.1/src/core -I /local/codeql-home/nginx-1.21.1/src/event -I /local/codeql-home/nginx-1.21.1/src/http -I /local/codeql-home/nginx-1.21.1/src/mail -I /local/codeql-home/nginx-1.21.1/src/misc -I /local/codeql-home/nginx-1.21.1/src/os -I /local/codeql-home/nginx-1.21.1/src/stream -I /local/codeql-home/nginx-1.21.1/src/os/unix" -X ngx_config.h,ngx_core.h
    

    Error messages:

    Compiling query plan for /local/codeql-home/codeql/multiargfunc.ql.
    ERROR: Could not resolve module cpp. There should probably be a qlpack.yml file declaring dependencies in /local/codeql-home/codeql or an enclosing directory. (/local/codeql-home/codeql/multiargfunc.ql:1,8-11)
    ERROR: Could not resolve type Type (/local/codeql-home/codeql/multiargfunc.ql:3,1-5)
    ERROR: Could not resolve type Parameter (/local/codeql-home/codeql/multiargfunc.ql:3,30-39)
    ERROR: Could not resolve type PointerType (/local/codeql-home/codeql/multiargfunc.ql:6,40-51)
    ERROR: Could not resolve type Type (/local/codeql-home/codeql/multiargfunc.ql:9,1-5)
    ERROR: Could not resolve type Parameter (/local/codeql-home/codeql/multiargfunc.ql:9,27-36)
    ERROR: Could not resolve type PointerType (/local/codeql-home/codeql/multiargfunc.ql:10,65-76)
    ERROR: Could not resolve type Function (/local/codeql-home/codeql/multiargfunc.ql:13,6-14)
    ERROR: Could not resolve type Type (/local/codeql-home/codeql/multiargfunc.ql:13,18-22)
    ERROR: Could not resolve type Parameter (/local/codeql-home/codeql/multiargfunc.ql:14,18-27)
    ERROR: Could not resolve type Struct (/local/codeql-home/codeql/multiargfunc.ql:14,91-97)
    ERROR: 'result' cannot be used in this context (/local/codeql-home/codeql/multiargfunc.ql:4,3-9)
    ERROR: 'result' cannot be used in this context (/local/codeql-home/codeql/multiargfunc.ql:6,3-9)
    ERROR: 'result' cannot be used in this context (/local/codeql-home/codeql/multiargfunc.ql:10,3-9)
    ERROR: 'result' cannot be used in this context (/local/codeql-home/codeql/multiargfunc.ql:10,47-53)
    
    opened by JerryWang304 10
  • Master

    Master

    Lots of changes, some that I can remember:

    • Factoring out code to create bash commands into command_builder.py.
    • Change from readelf to nm to only harness dynamic, defined, exported function symbols.
    • Ignore functions with void * or array arguments. Arrays may be supported later.
    • Get parameters in the right order.
    • Handle const.

    Note: only tested using mode=1 and detect=1 on libsodium, with other settings you'll likely still get lots of errors but let's fix those by creating and resolving issues.

    opened by Jelle-Nauta 0
  • Support array parameters

    Support array parameters

    Functions with array parameters are currently ignored because they would lead to code like:

    auto data1= provider.ConsumeIntegral<char[16]>();
    

    Proposal: handle array-parameters as a special case, e.g. generating code like:

    char data1[16];
    for (size_t idx = 0; idx < 16; ++idx) {
        data1[idx] = provider.ConsumeIntegral<char>();
    }
    

    There may be a more elegant way to do this, but I don't see it at the moment.

    opened by Jelle-Nauta 0
  • Create test suite

    Create test suite

    Currently there are many different combinations of settings (mode, detect) and library-properties (exported or not, defined or not, function or other symbols, etc.) and many of these probably lead to errors, e.g. through uncompilable harness code.

    Proposal: create a minimal set of libraries with a comprehensive set of symbols and properties, and a test workflow to verify that autoharness works - or find bugs that can then be addressed.

    opened by Jelle-Nauta 0
Releases(1.0)
  • 1.0(Jul 10, 2021)

    Initial Release of AutoHarness -added executable to shared object functionality -added automatic header detection or function reconstruction -added automatic fuzzing harness creation for one argument and multiple arguments

    Source code(tar.gz)
    Source code(zip)
We want to check several batch of web URLs (1~100 K) and find the phishing website/URL among them.

We want to check several batch of web URLs (1~100 K) and find the phishing website/URL among them. This module is designed to do the URL/web attestation by using the API from NUS-Phishperida-Project.

3 Dec 28, 2022
Build Xmas cards with user inputs

Automatically build Xmas cards with user inputs

Anand 9 Jan 25, 2022
This is a menu driven Railway Reservation Project which is mainly based on the python-mysql connectivity.

Online-Railway-Reservation-System This is a menu driven Railway Reservation Project which is mainly based on the python-mysql connectivity. The projec

Ananya Gupta 1 Jan 09, 2022
A scuffed remake of Kahoot... Made by Y9 and Y10 SHSB

A scuffed remake of Kahoot... Made by Y9 and Y10 SHSB

Tobiloba Kujore 3 Oct 28, 2022
Awesome Cheatsheet

Awesome Cheatsheet List of useful cheatsheets Inspired by @sindresorhus awesome and improved by these amazing contributors. If you see a link here is

detailyang 6.5k Jan 07, 2023
Read and write life sciences file formats

Python-bioformats is a Python wrapper for Bio-Formats, a standalone Java library for reading and writing life sciences image file formats. Bio-Formats

CellProfiler 106 Dec 19, 2022
A(Sync) Interface for internal Audible API written in pure Python.

Audible Audible is a Python low-level interface to communicate with the non-publicly Audible API. It enables Python developers to create there own Aud

mkb79 192 Jan 03, 2023
This repository contains various tools useful for offensive operations (reversing, etc) regarding the PE (Portable Executable) format

PE-Tools This repository contains various tools useful for offensive operations (reversing, etc) regarding the PE (Portable Executable) format Install

stark0de 4 Oct 13, 2022
Project repository of Apache Airflow, deployed on Docker in Amazon EC2 via GitLab.

Airflow on Docker in EC2 + GitLab's CI/CD Personal project for simple data pipeline using Airflow. Airflow will be installed inside Docker container,

Ammar Chalifah 13 Nov 29, 2022
Python module used to generate random facts

Randfacts is a python library that generates random facts. You can use randfacts.get_fact() to return a random fun fact. Disclaimer: Facts are not gua

Tabulate 14 Dec 14, 2022
Group P-11's submission for the University of Waterloo's 2021 Engineering Competition (Programming section).

P-11-WEC2021 Group P-11's submission for the University of Waterloo's 2021 Engineering Competition (Programming section). Part I Compute typing time f

TRISTAN PARRY 1 May 14, 2022
Cash in on Expressed Barcode Tags (EBTs) from NGS Sequencing Data with Python

Cash in on Expressed Barcode Tags (EBTs) from NGS Sequencing Data with Python Cashier is a tool developed by Russell Durrett for the analysis and extr

3 Sep 11, 2022
WordPress-style shortcodes for Python

Python Shortcodes WordPress-style shortcodes for Python Create and use WordPress-style shortcodes in your Python based app. Example # static output de

Bob 1 Dec 22, 2021
Flask html response minifier

Flask-HTMLmin Minify flask text/html mime type responses. Just add MINIFY_HTML = True to your deployment config to minify HTML and text responses of y

Hamid Feizabadi 85 Dec 07, 2022
API moment - LussovAPI

LussovAPI TL;DR: py API container, pip install -r requirements.txt, example, main configuration Long version: Install Dependancies Download file requi

William Pedersen 1 Nov 30, 2021
Class and mathematical functions for quaternion numbers.

Quaternions Class and mathematical functions for quaternion numbers. Installation Python This is a Python 3 module. If you don't have Python installed

3 Nov 08, 2022
This is a database of 180.000+ symbols containing Equities, ETFs, Funds, Indices, Futures, Options, Currencies, Cryptocurrencies and Money Markets.

Finance Database As a private investor, the sheer amount of information that can be found on the internet is rather daunting.

Jeroen Bouma 1.4k Dec 31, 2022
Repository for 2021 Computer Vision Class @ Chulalongkorn University

2110443 - Computer Vision (2021/2) Computer Vision @ Chulalongkorn University Anaconda Download Link https://www.anaconda.com/download/ Miniconda and

Chula PIC Lab 5 Jul 19, 2022
This is a fork of the BakeTool with some improvements that I did to have better workflow.

blender-bake-tool This is a fork of the BakeTool with some improvements that I did to have better workflow. 99.99% of work was done by BakeTool team.

Acvarium 3 Oct 04, 2022
Developed a website to analyze and generate report of students based on the curriculum that represents student’s academic performance.

Developed a website to analyze and generate report of students based on the curriculum that represents student’s academic performance. We have developed the system such that, it will automatically pa

VIJETA CHAVHAN 3 Nov 08, 2022