Baseline is a cross-platform library and command-line utility that creates file-oriented baselines of your systems.

Overview

Baselining, on steroids!

Baseline is a cross-platform library and command-line utility that creates file-oriented baselines of your systems.

The project aims to offer an open-source alternative to the famous NSRL or HashSets and allows you to generate baselines from your own systems. Plus, it is cross-platform, so you can use it the same way whether on a Windows or a GNU/Linux system.

Currently available extractors:

  • fs: Extracts filesystem-related metadata.
  • hash: Computes several hashes from the entry's data (e.g. MD5, SHA-1, ssdeep).
  • pe: Extracts detailed information from Portable Executable (PE) files.

Table of contents

Installation

Installing from PyPI

Baseline is currently not available on PyPI. The main reason is that the name is currently taken by another project.

Installing from source

Since Baseline uses Poetry as its packaging toolkit of choice, so installing it from source is as simple as:

git clone httpe://github.com/sk4la/baseline.git
cd baseline

python3 -m pip install poetry
python3 -m poetry install

Precompiled binaries

Precompiled binaries are available in the Releases section.

Docker

Baseline is also available as a Docker image.

To pull the latest image from Docker Hub:

docker pull sk4la/baseline

See the official Docker documentation for details on how to install and use it.

Usage

The help menu for the baseline command-line utility:

Usage: baseline 
   
     COMMAND [ARGS]...

  Command-line utility that creates file-oriented baselines.

Options:
  --ensure-administrator          Ensure that the current user has administrative privileges (i.e.
                                  is `root` or equivalent on GNU/Linux systems).
  --log-file 
    
                    Set the log file path (e.g. 'baseline.log').  [default:
                                  /home/sk4la/baseline/20211031110323.25756fb1b706.log]
  --logging-configuration 
     
        Set the logging configuration using a custom file. The file must
                                  adhere to the official specification. See
                                  https://docs.python.org/3/library/logging.config.html#logging-
                                  config-dictschema for more details.
  --monochrome                    Disable console output coloring. This can be useful when piping
                                  the output to a log file.
  -v, --verbose                   Increase the logging verbosity. Supports up to 4 occurrences of
                                  the same option (e.g. -vvvv).  [0<=x<=4]
  --version                       Show the version and exit.
  --help                          Show this message and exit.

Commands:
  new     Creates a new filesystem-based baseline.
  schema  Show the JSON representation of the actual schema.

     
    
   

The help menu for the baseline new subcommand:

Usage: baseline new 
    
    
     ...

  Creates a new filesystem-based baseline.

Options:
  --comment 
     
                       Add an arbitrary comment to the generated output file.
  --exclude-directory 
      
       ...
                                  Exclude a specific directory from the baseline. Can be specified
                                  multiple times (e.g. `--exclude-directory /dev --exclude-
                                  directory /proc`).
  --exclude-extractor [hash|pe|fs]
                                  Exclude extractors. Can be specified multiple times (e.g.
                                  `--exclude-extractor hash --exclude-extractor pe`).
  --max-size 
       
         Set the maximum file size (in bytes) to inspect. [default: 5000000; x>=1] -o, --output-file 
        
          Set the output file path (e.g. 'baseline.ndjson'). [default: /workspaces/baseline/20211102200452.58bd60a3b16a.ndjson] --output-file-encoding [utf-8|utf-16le] Set the output file encoding. Only applies when writing to an actual file. [default: utf-8] -f, --output-format [ndjson] Set the output format. [default: ndjson] --partition-size 
         
           Set the partition size (i.e. number of entries per process). [default: 200; x>=1] --processes 
          
            Set the number of parallel processes. [default: 2; x>=1] --recursive / --non-recursive Whether to walk the filesystem recursively. When set, the program will only inspect the files and directories specifies on the first level of any included path. For example, if '/mnt/image' is specified as an included path, then only the directory '/mnt/image' itself and its direct children will be inspected. [default: recursive] --remap 
           
            ... Artificially remap included paths (e.g. '/mnt/image:/'). Can be specified multiple times (e.g. `--remap /mnt/image:/ --remap /dev/null:/dev/void`). --report / --no-report Whether to show a final report at the end. [default: report] --skip-compression Whether to skip on-the-fly compression of the resulting file. --skip-directories Whether to skip directories. --skip-empty Whether to skip empty entries. --help Show this message and exit. 
           
          
         
        
       
      
     
    
   

The help menu for the baseline schema subcommand:

Usage: baseline schema 
   
    

  Show the JSON representation of the actual schema.

Options:
  --compact                       Render compact JSON instead of the default idented version.
  --output-file 
    
                 Set the output file path (e.g. 'schema.json').
  --output-file-encoding [utf-8|utf-16le]
                                  Set the output file encoding. Only applies when writing to an
                                  actual file.  [default: utf-8]
  --help                          Show this message and exit.

    
   

Creating a baseline of a live system

Creating a baseline of a live system is as simple as:

baseline new

When using Baseline from a removable device, you may want to exclude its path (for example /mnt/usb) from the generated baseline:

baseline --ensure-administrator new --exclude-directory /mnt/usb

See the Usage section for a complete list of options and arguments.

Creating a baseline from a mounted image

When creating a baseline of a mounted image, you may want the baseline to represent the files as if they were read from the actual system, not the mounted image.

For example, if your image is currently mounted on /mnt/IMG-001, you can then execute the following command to remap all entries read from this path to /:

baseline new --remap /mnt/IMG-001:/ /mnt/IMG-001

You can think of this as a chroot jail.

Displaying the schema

Baseline uses a fixed schema for rendering the information. This schema is enforced using the Pydantic package and produces a heavily-typed output that can later be ingested as-is.

To print the standardized JSON schema:

baseline schema

To dump a compact version of the JSON schema to schema.min.json:

baseline schema --compact --output-file schema.min.json

The JSON schema produced by Pydantic is compatible with the specifications from JSON Schema Core, JSON Schema Validation and OpenAPI Data Types. See the official Pydantic documentation for more details.

Advanced usage

Building binaries

Baseline currently supports the following packaging systems:

Although precompiled binaries are available in the Releases section, you should always build your own binaries.

To produce a binary using PyInstaller:

make pyinstaller-linux

To produce a binary using Nuitka:

make nuitka-linux

As Nuitka is a Python compiler by itself and does not rely on the standard CPython interpreter, you should be aware that there may be bugs and/or issues unrelated to Baseline itself.

Building the Docker image

To build the official Docker image:

make docker

Additional instructions can be added to the Dockerfile in order to customize the image.

The official Docker image is available at https://hub.docker.com/r/sk4la/baseline. You can use the FROM docker.io/sk4la/baseline:latest instruction in your own Dockerfile to derive your own image.

API

Using Baseline from Python is possible using the Baseline class:

from baseline.core import Baseline

with Baseline() as baseline:
    for record in baseline.compute(*[
        "/mnt/IMG-001",
        "/mnt/IMG-002",
    ]):
        print(record.json(exclude_none=True))

See the actual code for a more thorough example.

The Baseline class emits logging messages to the baseline logger, to which you can subscribe to if you wish. The command-line utility displays these messages to the console by default.

Contribute

Baseline is a work in progress, everyone is welcome to contribute! 👐

Writing a new extractor

In order for new extractors to be able to enrich the generated records, the global schema first needs to be updated. To do this, you must create a sublass of Pydantic's BaseModel in schema.py that references the fields that will eventually be filled by the extractor. This class will then be referenced in the schema's root Record class.

In this example, we want to extract the first 50 lines of any *.txt file. Here we arbitrarily decide that the extracted text will be stored in the content attribute and that the extractor's key will be text:

class Text(pydantic.BaseModel):
    content: str

class Record(pydantic.BaseModel):
    ...
    text: typing.Optional[Text]

We can then start to write the actual code. All extractors must inherit from the base Extractor class:

None: with self.entry.open() as stream: setattr( record, self.KEY, schema.Text( content=stream.read(50), ), ) ">
from baseline.models import Extractor
from baseline.schema import Text

class Text(Extractor):
    """Extracts the first line of any `.txt` file."""

    EXTENSION_FILTERS = (
      r"\.txt$",
    )
    KEY = "text"

    def run(self: object, record: schema.Record) -> None:
        with self.entry.open() as stream:
            setattr(
                record,
                self.KEY,
                schema.Text(
                    content=stream.read(50),
                ),
            )

The extractor's KEY class variable must correspond to the one that was specified in the schema's root Record class (text in this example).

Support

In case you encounter a problem or want to suggest a new feature, please submit a ticket.

License

Baseline is licensed under the GNU General Public License (GPL) version 3.

You might also like...
A Python module and command line utility for working with web archive data using the WACZ format specification

py-wacz The py-wacz repository contains a Python module and command line utility for working with web archive data using the WACZ format specification

A Python module and command-line utility for converting .ANS format ANSI art to HTML

ansipants A Python module and command-line utility for converting .ANS format ANSI art to HTML. Installation pip install ansipants Command-line usage

A command line utility to export Google Keep notes to markdown.

Keep-Exporter A command line utility to export Google Keep notes to markdown files with metadata stored as a frontmatter header. Supports exporting: S

A command line utility for tracking a stock market portfolio. Primarily featuring high resolution braille graphs.
A command line utility for tracking a stock market portfolio. Primarily featuring high resolution braille graphs.

A command line stock market / portfolio tracker originally insipred by Ericm's Stonks program, featuring unicode for incredibly high detailed graphs even in a terminal.

📦 A command line utility to put text in a box.
📦 A command line utility to put text in a box.

boxie A command line utility to put text in a box. Installation pip install boxie If you are on Linux you may need to use sudo to access this globally

Tiny command-line utility for mapping broken keys to other positions.

brokenkey Tiny command-line utility for mapping broken keys to other positions. Installation Clone this repository using git: git clone https://github

This is a CLI utility that allows you to view RedFlagDeals.com on the command line.
This is a CLI utility that allows you to view RedFlagDeals.com on the command line.

RFD Description Motivation Installation Usage View Hot Deals View and Sort Hot Deals Search Advanced View Posts Shell Completion bash zsh Description

img-proof (IPA) provides a command line utility to test images in the Public Cloud

overview img-proof (IPA) provides a command line utility to test images in the Public Cloud (AWS, Azure, GCE, etc.). With img-proof you can now test c

A Python command-line utility for validating that the outputs of a given Declarative Form Azure Portal UI JSON template map to the input parameters of a given ARM Deployment Template JSON template

A Python command-line utility for validating that the outputs of a given Declarative Form Azure Portal UI JSON template map to the input parameters of a given ARM Deployment Template JSON template

Comments
  • executable variable is not defined in extractors/executables

    executable variable is not defined in extractors/executables

    self.executable is not defined in baseline/extractors/executables.py, so in the __del__ method, it crashes.

    $ baseline new /etc      
    Exception ignored in: <function PortableExecutable.__del__ at 0x7fd1142c98b0>
    Traceback (most recent call last):
      File "/home/git/baseline/env/lib/python3.9/site-packages/baseline/extractors/executables.py", line 80, in __del__
        if self.executable:
    AttributeError: 'PortableExecutable' object has no attribute 'executable'
    All done! 💪
    
    Entries Processed : 1592
    Output Format     : ndjson
    Output File       : /home/git/baseline/20211106191044.heaven.ndjson.xz
    Size              : 1.2 MB
    Total Time        : 0:00:01.993753
    
    opened by jurelou 0
Releases(0.1.0)
Owner
Nelson
Random bits of code
Nelson
Simple command line tool to train and deploy your machine learning models with AWS SageMaker

metamaker Simple command line tool to train and deploy your machine learning models with AWS SageMaker Features metamaker enables you to: Build a dock

Yasuhiro Yamaguchi 5 Jan 09, 2022
Neovim integration for Google Keep, built using gkeepapi

Gkeep.nvim Neovim integration for Google Keep, built using gkeepapi Requirements Neovim 0.5 Python 3.6+ A patched font (optional. Used for icons) Tabl

Steven Arcangeli 143 Jan 02, 2023
Double Pendulum visualised with fetching system information in Python.

Show off your terminal, in style. A nice relaxing double pendulum simulation using ASCII, able to simulate multiple pendulums at once, and provide tra

Nekurone 62 Dec 14, 2022
Analyzing the most strategic words to guess on Wordle, based on letter frequency distributions

wordle-analysis Evaluating different heuristics to determine the most effective solving strategy and building an AI-powered assistant tool to help you

Sejal Dua 9 Feb 27, 2022
An ZFS administration tool inspired on Midnight commander

ZC - ZFS Commander An ZFS administration tool inspired on Midnight commander Work in Progress Description ZFS Commander is a simple front-end for the

34 Dec 07, 2022
Postgres CLI with autocompletion and syntax highlighting

A REPL for Postgres This is a postgres client that does auto-completion and syntax highlighting. Home Page: http://pgcli.com MySQL Equivalent: http://

dbcli 10.8k Jan 02, 2023
:computer: tmux session manager. built on libtmux

tmuxp, tmux session manager. built on libtmux. We need help! tmuxp is a trusted session manager for tmux. If you could lend your time to helping answe

python utilities for tmux 3.6k Jan 01, 2023
iTerm2 Shell integration for Xonsh shell.

iTerm2 Shell Integration iTerm2 Shell integration for Xonsh shell. Installation To install use pip: xpip install xontrib-iterm2 # or: xpip install -U

Noorhteen Raja NJ 6 Dec 29, 2022
jrnl is a simple journal application for the command line.

jrnl To get help, submit an issue on Github. jrnl is a simple journal application for the command line. You can use it to easily create, search, and v

jrnl 5.7k Dec 31, 2022
A simple note taker CLI program written in python

note-taker A simple note taker program written in python This allows you to snip your todo's, notes, and your tasks easily without extra charges Requi

marcusz 4 Nov 02, 2021
Un module simple pour demander l'accord de l'utilisateur dans une CLI.

Demande de confirmation utilisateur pour CLI Présentation ask_lib est un module pour le langage Python proposant une seule fonction; ask(). Le but pri

CallMePixelMan 7 May 09, 2022
vimBrain is a brainfuck-based vim-inspired esoteric programming language.

vimBrain vimBrain is a brainfuck-based vim-inspired esoteric programming language. vimBrainPy Currently, the only interpreter available is written in

SalahDin Ahmed 3 May 08, 2022
A terminal spreadsheet multitool for discovering and arranging data

VisiData v2.6.1 A terminal interface for exploring and arranging tabular data. VisiData supports tsv, csv, sqlite, json, xlsx (Excel), hdf5, and many

Saul Pwanson 6.2k Jan 04, 2023
A Python3 rewrite of my original PwnedConsole project from almost a decade ago

PwnedConsoleX A CLI shell for performing queries against the HaveIBeenPwned? API to gather breach information for user-supplied email addresses. | wri

1 Jul 23, 2022
A next-generation CLI and TUI that aims to be your personal assistant for everything competitive programming related. 🚀

Competitive Programming Tool Kit The Competitive Programming Tool Kit (cptk for short), is a command line and terminal user interface (CLI and TUI) th

Alon 4 May 21, 2022
A command line application to analyse reports from TBC Warcraft Logs.

README A command line application to analyse reports from TBC Warcraft Logs. The application was written and tested with Python 3.9. Features Dumps an

2 Dec 17, 2021
Stream comments, submissions from subreddits and users across reddit right in your terminal

reddit_from_terminal stream comments, submissions from subreddits and users across reddit right in your terminal Alert! : Can't watch media contents(p

Pritam Dhara 2 Dec 30, 2021
QueraToCSV is a simple python CLI project to convert the Quera results file into CSV files.

Quera is an Iranian Learning management system (LMS) that has an online judge for programming languages. Some Iranian universities use it to automate the evaluation of programming assignments.

Amirmahdi Namjoo 16 Nov 11, 2022
A Command Line Calculator With Python

CalculadoraPY Usando no Termux apt install python3 apt install git pip3 install termcolor git clone https://github.com/kayke981/CalculadoraPY.git

kayake 5 Jan 30, 2022
dcargs is a tool for generating portable, reusable, and strongly typed CLI interfaces from dataclass definitions.

dcargs is a tool for generating portable, reusable, and strongly typed CLI interfaces from dataclass definitions.

Brent Yi 119 Jan 09, 2023