A Python module and command line utility for working with web archive data using the WACZ format specification

Overview

py-wacz

The py-wacz repository contains a Python module and command line utility for working with web archive data using the WACZ format specification. Web Archive Collection Zipped (WACZ) allows web archives to be shared and distributed by providing a predictable way of packaging up web archive data and metadata as a ZIP file. The wacz command line utility supports converting any WARC files into WACZ files, and optionally generating full-text search indices of pages.

Install

Use pip to install the module and a command line utility:

pip install wacz

Once installed you can use the wacz command line utility to create and validate WACZ files.

Create

To create a WACZ package you can point wacz at a WARC file and tell it where to write the WACZ with the -o option:

wacz create -o myfile.wacz 
   

   

The resulting myfile.wacz should be loadable via ReplayWeb.page.

wacz accepts the following options for customizing how the WACZ file is assembled.

-f --file

Explicitly declare the file being passed to the create function.

wacz create -f tests/fixtures/example-collection.warc

-o --output

Explicitly declare the name of the wacz being created

wacz create tests/fixtures/example-collection.warc -o mywacz.wacz

-t --text

Generates pages.jsonl page index with a full-text index, must be run in conjunction with --detect-pages. Will have no effect if run alone

wacz create tests/fixtures/example-collection.warc -t

--detect-pages

Generates pages.jsonl page index without a full-text index

wacz create tests/fixtures/example-collection.warc --detect-pages

-p --pages

Overrides the pages index generation with the passed jsonl pages.

wacz create tests/fixtures/example-collection.warc -p passed_pages.jsonl

-t --text

You can add a full text index by including the --text tag

wacz create tests/fixtures/example-collection.warc -p passed_pages.jsonl --text

--ts

Overrides the ts metadata value in the datapackage.json file

wacz create tests/fixtures/example-collection.warc --ts TIMESTAMP

--url

Overrides the url metadata value in the datapackage.json file

wacz create tests/fixtures/example-collection.warc --url URL

--title

Overrides the titles metadata value in the datapackage.json file

wacz create tests/fixtures/example-collection.warc --title TITLE

--desc

Overrides the desc metadata value in the datapackage.json file

wacz create tests/fixtures/example-collection.warc --desc DESC

--hash-type

Allows the user to specify the hash type used: (sha256 or md5):

wacz create tests/fixtures/example-collection.warc --hash-type md5

Validate

You can also validate an existing WACZ file by running:

wacz validate myfile.wacz

-f --file

Explicitly declare the file being passed to the validate function.

wacz validate -f tests/fixtures/example-collection.warc

Testing

If you are developing wacz you can run the unit tests with pytest:

pytest tests
Owner
Webrecorder
Webrecorder Project
Webrecorder
EODAG is a command line tool and a plugin-oriented Python framework for searching, aggregating results and downloading remote sensed images while offering a unified API for data access regardless of the data provider

EODAG (Earth Observation Data Access Gateway) is a command line tool and a plugin-oriented Python framework for searching, aggregating results and downloading remote sensed images while offering a un

CS GROUP 205 Jan 03, 2023
doq (python docstring generator) extension for coc.nvim

coc-pydocstring doq (python docstring generator) extension for coc.nvim Install CocInstall: :CocInstall coc-pydocstring vim-plug: Plug 'yaegassy/coc-p

yaegassy 27 Jan 04, 2023
Module for converting 2D Python lists to fancy ASCII tables. Table2Ascii lets you display pretty tables in the terminal and on Discord.

table2ascii Module for converting 2D Python lists to a fancy ASCII/Unicode tables table2ascii 📥 Installation 🧑‍💻 Usage Convert lists to ASCII table

Jonah Lawrence 40 Jan 03, 2023
Zero-config CLI for TypeScript package development

Despite all the recent hype, setting up a new TypeScript (x React) library can be tough. Between Rollup, Jest, tsconfig, Yarn resolutions, ESLint, and

Jared Palmer 10.5k Jan 08, 2023
liquidctl – liquid cooler control Cross-platform tool and drivers for liquid coolers and other devices

Cross-platform CLI and Python drivers for AIO liquid coolers and other devices

1.7k Jan 08, 2023
A terminal utility to sort image files based on their characteristics.

About A terminal utility to sort image files based on their characteristics. Motivation This program was developed after I've realized that I had too

José Ferreira 1 Dec 10, 2022
A simple CLI productivity tool to quickly display the syntax of a desired piece of code

Iforgor Iforgor is a customisable and easy to use command line tool to manage code samples. It's a good way to quickly get your hand on syntax you don

Solaris 21 Jan 03, 2023
A CLI messenger for the Signum community.

A CLI messenger for the Signum community. Built for people who like using terminal for their work and want to communicate with other users in the Signum community.

Jush 5 Mar 18, 2022
A communist shell written in Python

kash A communist shell written in Python It doesn't support escapes, quotes, comment lines, |, &&, , or similar yet. If you need help, get it from

Çınar Yılmaz 1 Dec 10, 2021
CLI tool to develop StarkNet projects written in Cairo

OpenZeppelin Nile ⛵ Navigate your StarkNet projects written in Cairo. Getting started Create a folder for your project and cd into it: mkdir myproject

OpenZeppelin 305 Dec 30, 2022
3DigitDev 29 Jan 17, 2022
Official AIdea command line tool

AIdea CLI Official AIdea command line tool for https://aidea-web.tw. Installation Make sure you have installed both Python 3 and pip package manager.

AIdea 5 Dec 15, 2021
dsub is a command-line tool that makes it easy to submit and run batch scripts in the cloud.

Open-source command-line tool to run batch computing tasks and workflows on backend services such as Google Cloud.

Data Biosphere 233 Jan 01, 2023
Command-line program for organizing and managing ebook collections

Command-line program for organizing and managing ebook collections. It is a Python port from the original shell scripts ebook-tools

Raul 14 Nov 12, 2022
Notion-cli-list-manager - A simple command-line tool for managing Notion databases

A simple command-line tool for managing Notion List databases. ✨

Giacomo Salici 75 Dec 04, 2022
Terminal Colored Text for Python

Terminal Colored Text for Python

R3CKhi-**75 3 Sep 10, 2022
Python library & console tool for controlling Xiaomi smart appliances

python-miio This library (and its accompanying cli tool) can be used to interface with devices using Xiaomi's miIO and MIoT protocols. Getting started

Teemu R. 2.4k Jan 02, 2023
Message commands extension for discord-py-interactions

interactions-message-commands Message commands extension for discord-py-interactions README IS NOT FINISHED YET BUT IT IS A GOOD START Installation pi

2 Aug 04, 2022
A simple file transfer tools, similar to rz / sz but compatible with tmux (control mode), which works with iTerm2 and has a nice progress bar

trzsz A simple file transfer tools, similar to rz/sz but compatible with tmux (control mode), which works with iTerm2 and has a nice progress bar. Why

561 Jan 05, 2023
img-proof (IPA) provides a command line utility to test images in the Public Cloud

overview img-proof (IPA) provides a command line utility to test images in the Public Cloud (AWS, Azure, GCE, etc.). With img-proof you can now test c

13 Jan 07, 2022