An example of repository data as bundles


Bundles

This repository is just an example of how we can host Git bundles in a way that supports fetching data from precomputed bundles without the origin server needing to manage those bundles.

This repository is mirrored as an Azure Static Web Site at https://nice-ocean-0f3ec7d10.azurestaticapps.net.

This repository contains a set of bundles corresponding to the data of the git/git repository in its master branch at different timepoints throughout October 2021.

Proposal for fetching bundles

Git clients can fetch a "table of contents" from some predetermined URL, such as https://nice-ocean-0f3ec7d10.azurestaticapps.net/bundles.json hosted by this repository.

This URL stores a JSON list with objects containing a few known members:

  • uri (required): the URI of the bundle being referenced.
  • timestamp: the timestamp of the bundle at this URI. Clients compare it against the last timestamp they recorded to decide which bundles are new (see "Fetching").
  • requires: if this bundle is not closed under reachability (and so might contain a thin pack), the uri of the "previous" bundle that contains the objects this one depends on. (This assumes that the bundles can be ordered linearly.)
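
As a rough illustration of the format above, here is a minimal Python sketch that parses a table of contents and orders its entries. The example URIs and timestamps are hypothetical and are not the real contents of bundles.json. Treating timestamp as an integer number of seconds is an assumption based on the bundle.latestTimestamp value shown later, and parse_toc is a name invented for this sketch.

```python
import json

# Hypothetical table of contents; the URIs and timestamps are made up for
# illustration and are not the real contents of bundles.json.
EXAMPLE_TOC = """
[
  {"uri": "https://bundles.example.com/2021-10-04.bundle",
   "timestamp": 1633305600,
   "requires": "https://bundles.example.com/2021-10-01.bundle"},
  {"uri": "https://bundles.example.com/2021-10-01.bundle",
   "timestamp": 1633046400}
]
"""


def parse_toc(text):
    """Parse a table of contents and return its entries ordered by timestamp."""
    entries = json.loads(text)
    if not isinstance(entries, list):
        raise ValueError("table of contents must be a JSON list")
    for entry in entries:
        # "uri" is the only required member; "timestamp" and "requires"
        # are optional.
        if not isinstance(entry, dict) or "uri" not in entry:
            raise ValueError("every entry needs a 'uri' member")
    # Assumption: entries without a timestamp are treated as oldest (0).
    return sorted(entries, key=lambda e: e.get("timestamp", 0))


toc = parse_toc(EXAMPLE_TOC)
for entry in toc:
    print(entry["uri"], entry.get("requires", "(self-contained)"))
```

Sorting by timestamp gives the linear order that the requires member assumes, so a client can apply bundles from oldest to newest.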

Cloning

The clone.sh script shows how we can create a new repository using these bundles. After initializing a new repository, we use fetch.py to download all of the bundles in the JSON list. We then add the origin remote and fetch the remaining data from that remote.

[email protected]:/_git$ GIT_TRACE2_PERF=/_git/trace2.txt /_git/bundles/clone.sh https://github.com/git/git git-bundle-test
Initialized empty Git repository in /_git/git-bundle-test/.git/
Downloading https://nice-ocean-0f3ec7d10.azurestaticapps.net/2021-10-01.bundle to .git/bundles/0.bundle
Downloading https://nice-ocean-0f3ec7d10.azurestaticapps.net/2021-10-4.bundle to .git/bundles/1.bundle
Downloading https://nice-ocean-0f3ec7d10.azurestaticapps.net/2021-10-7.bundle to .git/bundles/2.bundle
Downloading https://nice-ocean-0f3ec7d10.azurestaticapps.net/2021-10-12.bundle to .git/bundles/3.bundle
Downloading https://nice-ocean-0f3ec7d10.azurestaticapps.net/2021-10-13.bundle to .git/bundles/4.bundle
Downloading https://nice-ocean-0f3ec7d10.azurestaticapps.net/2021-10-14.bundle to .git/bundles/5.bundle
Downloading https://nice-ocean-0f3ec7d10.azurestaticapps.net/2021-10-15.bundle to .git/bundles/6.bundle
Downloading https://nice-ocean-0f3ec7d10.azurestaticapps.net/2021-10-19.bundle to .git/bundles/7.bundle
Downloading https://nice-ocean-0f3ec7d10.azurestaticapps.net/2021-10-26.bundle to .git/bundles/8.bundle
Note: switching to 'FETCH_HEAD'.

You are in 'detached HEAD' state. You can look around, make experimental
changes and commit them, and you can discard any commits you make in this
state without impacting any branches by switching back to a branch.

If you want to create a new branch to retain commits you create, you may
do so (now or later) by using -c with the switch command. Example:

  git switch -c <new-branch-name>

Or undo this operation with:

  git switch -

Turn off this advice by setting config variable advice.detachedHead to false

HEAD is now at af6d1d602a Git 2.33.1

The trace2 logs for this run are available as trace2.txt, so you can see how small the git fetch origin portion of clone.sh is.

[email protected]:/_git$ cd git-bundle-test/
[email protected]:/_git/git-bundle-test$ git branch -v
* (HEAD detached at FETCH_HEAD) af6d1d602a Git 2.33.1
  refs/bundles/2021-10-01       cefe983a32 The ninth batch
  refs/bundles/2021-10-12       2a97289ad8 Twelfth batch
  refs/bundles/2021-10-13       2bd2f258f4 Sync with Git 2.33.1
  refs/bundles/2021-10-14       9875c51553 Merge branch 'ja/doc-status-types-and-copies'
  refs/bundles/2021-10-15       f443b226ca Thirteenth batch
  refs/bundles/2021-10-19       9d530dc002 The fourteenth batch
  refs/bundles/2021-10-26       e9e5ba39a7 The fifteenth batch
  refs/bundles/2021-10-4        0785eb7698 The tenth batch
  refs/bundles/2021-10-7        106298f7f9 The eleventh batch

[email protected]:/_git/git-bundle-test$ ls .git/objects/pack/
[email protected]:/_git/git-bundle-test$ ls -al .git/objects/pack/
total 241064
drwxrwxr-x 2 stolee stolee      4096 Oct 28 11:52 .
drwxrwxr-x 4 stolee stolee      4096 Oct 28 11:52 ..
-rw-rw-r-- 1 stolee stolee   8877836 Oct 28 11:52 multi-pack-index
-r--r--r-- 1 stolee stolee     18152 Oct 28 11:52 pack-0de3636531b9ce15eae60de09224e8a62d9d0a4c.idx
-r--r--r-- 1 stolee stolee   1515581 Oct 28 11:52 pack-0de3636531b9ce15eae60de09224e8a62d9d0a4c.pack
-r--r--r-- 1 stolee stolee      9612 Oct 28 11:52 pack-1938b2e1527f7167687ee27e18951aac9a0baed1.idx
-r--r--r-- 1 stolee stolee    849728 Oct 28 11:52 pack-1938b2e1527f7167687ee27e18951aac9a0baed1.pack
-r--r--r-- 1 stolee stolee   8514836 Oct 28 11:52 pack-3174045eb5b62a6749b1daf60c0acfe8fda0facc.idx
-r--r--r-- 1 stolee stolee 100176426 Oct 28 11:52 pack-3174045eb5b62a6749b1daf60c0acfe8fda0facc.pack
-r--r--r-- 1 stolee stolee    298880 Oct 28 11:52 pack-43362f7e98023f4698ac7c3ace1f739616212d34.idx
-r--r--r-- 1 stolee stolee  11376553 Oct 28 11:52 pack-43362f7e98023f4698ac7c3ace1f739616212d34.pack
-r--r--r-- 1 stolee stolee     10928 Oct 28 11:52 pack-67d22f7b765041b551444e1c21c5950b3e9392d8.idx
-r--r--r-- 1 stolee stolee   1231140 Oct 28 11:52 pack-67d22f7b765041b551444e1c21c5950b3e9392d8.pack
-r--r--r-- 1 stolee stolee     27756 Oct 28 11:52 pack-6ab2c38b678cf338a9fa0cf2faf65653ef00f1cb.idx
-r--r--r-- 1 stolee stolee   1942093 Oct 28 11:52 pack-6ab2c38b678cf338a9fa0cf2faf65653ef00f1cb.pack
-r--r--r-- 1 stolee stolee      9780 Oct 28 11:52 pack-8271f33d606a5ab8804c97a1135f441a1c2ca361.idx
-r--r--r-- 1 stolee stolee    517529 Oct 28 11:52 pack-8271f33d606a5ab8804c97a1135f441a1c2ca361.pack
-r--r--r-- 1 stolee stolee     15324 Oct 28 11:52 pack-937b1699b65fd2cacbd9bc119b09fb05fd1a685c.idx
-r--r--r-- 1 stolee stolee   1166484 Oct 28 11:52 pack-937b1699b65fd2cacbd9bc119b09fb05fd1a685c.pack
-r--r--r-- 1 stolee stolee     14428 Oct 28 11:52 pack-98e8a35d1a2ad91a56b29b5b3e60182ca7dcbdaa.idx
-r--r--r-- 1 stolee stolee   1082390 Oct 28 11:52 pack-98e8a35d1a2ad91a56b29b5b3e60182ca7dcbdaa.pack
-r--r--r-- 1 stolee stolee   8499240 Oct 28 11:52 pack-b805e409cb3ed85b98e4c58697e33e1027f367a7.idx
-r--r--r-- 1 stolee stolee 100595382 Oct 28 11:52 pack-b805e409cb3ed85b98e4c58697e33e1027f367a7.pack
-r--r--r-- 1 stolee stolee      1324 Oct 28 11:52 pack-f58f8c9ebfd3fdfa41a79f6558bc5122019778d7.idx
-r--r--r-- 1 stolee stolee     37462 Oct 28 11:52 pack-f58f8c9ebfd3fdfa41a79f6558bc5122019778d7.pack
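
The cloning flow described in this section can be sketched as a list of Git commands. This is not the actual clone.sh or fetch.py. It is a hedged Python sketch that only builds the commands, assuming each bundle has already been downloaded to .git/bundles/<n>.bundle as in the output above. The function name plan_clone and the refspec are assumptions made for illustration.

```python
def plan_clone(bundle_uris, origin_url, target_dir):
    """Return the git commands (as argv lists) for a bundle-first clone.

    Nothing is executed here; a caller could pass each list to
    subprocess.run(cmd, check=True) after downloading the bundles.
    """
    commands = [["git", "init", target_dir]]
    for index, _uri in enumerate(bundle_uris):
        # Assumption: bundle number <index> was saved at this path, which is
        # relative to target_dir because of "git -C".
        path = f".git/bundles/{index}.bundle"
        # A bundle file can be used as the <repository> argument of git fetch.
        # The refspec is a guess at how refs/heads/ could be stripped.
        commands.append(["git", "-C", target_dir, "fetch", path,
                         "+refs/heads/refs/bundles/*:refs/bundles/*"])
    # Only the objects missing from the bundles should need to come from origin.
    commands.append(["git", "-C", target_dir, "remote", "add", "origin", origin_url])
    commands.append(["git", "-C", target_dir, "fetch", "origin"])
    commands.append(["git", "-C", target_dir, "switch", "--detach", "FETCH_HEAD"])
    return commands


plan = plan_clone(
    ["https://bundles.example.com/0.bundle", "https://bundles.example.com/1.bundle"],
    "https://github.com/git/git",
    "git-bundle-test",
)
for cmd in plan:
    print(" ".join(cmd))
```

Keeping the plan separate from execution makes it easy to inspect the order of operations: bundles first, origin last.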

Fetching

As we download and store the bundles from the list of URIs, we update the bundle.latestTimestamp config value. This allows us to reexamine the table of contents and only download the bundles that are newer than that timestamp.

(If the timestamps have changed so that our previously downloaded bundles are no longer in the list, we could hopefully use the requires members to download earlier bundles until the missing objects are filled in. This is not implemented in fetch.py.)
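
Since fetch.py does not implement this fallback, the following is only a sketch of one way it might work. It walks the requires links backwards from a wanted bundle until it reaches a bundle the client already has, or one with no requirement. The function name and the have_uris set are hypothetical. A real client would more likely need to check whether a bundle's prerequisite commits exist locally than to compare URIs.

```python
def bundles_to_download(entries, wanted_uri, have_uris):
    """Return the URIs to download, oldest first, to be able to use wanted_uri.

    entries:    table-of-contents objects with "uri" and optional "requires".
    have_uris:  set of URIs the client already unpacked (a simplification).
    """
    by_uri = {entry["uri"]: entry for entry in entries}
    chain = []
    seen = set()
    uri = wanted_uri
    while uri is not None and uri not in have_uris:
        if uri in seen:
            raise ValueError(f"cycle in 'requires' chain at {uri}")
        if uri not in by_uri:
            raise KeyError(f"{uri} is not in the table of contents")
        seen.add(uri)
        chain.append(uri)
        # A missing "requires" member means the bundle is self-contained.
        uri = by_uri[uri].get("requires")
    chain.reverse()  # apply the oldest bundle first
    return chain


example = [
    {"uri": "a.bundle"},
    {"uri": "b.bundle", "requires": "a.bundle"},
    {"uri": "c.bundle", "requires": "b.bundle"},
]
print(bundles_to_download(example, "c.bundle", have_uris={"a.bundle"}))
```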

Here is a test of the idea by manually modifying bundle.latestTimestamp:

[email protected]:/_git/git-bundle-test$ git config --replace-all bundle.latestTimestamp 1634072372
[email protected]:/_git/git-bundle-test$ git config --local --list
core.repositoryformatversion=0
core.filemode=true
core.bare=false
core.logallrefupdates=true
bundle.latesttimestamp=1634072372
remote.origin.url=https://github.com/git/git
remote.origin.fetch=+refs/heads/*:refs/remotes/origin/*
[email protected]:/_git/git-bundle-test$ /_git/bundles/fetch.py
Downloading https://nice-ocean-0f3ec7d10.azurestaticapps.net/2021-10-14.bundle to .git/bundles/0.bundle
Downloading https://nice-ocean-0f3ec7d10.azurestaticapps.net/2021-10-15.bundle to .git/bundles/1.bundle
Downloading https://nice-ocean-0f3ec7d10.azurestaticapps.net/2021-10-19.bundle to .git/bundles/2.bundle
Downloading https://nice-ocean-0f3ec7d10.azurestaticapps.net/2021-10-26.bundle to .git/bundles/3.bundle
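
The timestamp comparison in this section can be sketched in a few lines of Python. This is not the code in fetch.py. The function name and the sample entries are hypothetical, and the strict "newer than" comparison is an assumption based on the description above.

```python
def select_new_bundles(entries, latest_timestamp):
    """Return (bundles newer than latest_timestamp, new latest timestamp)."""
    newer = [e for e in entries if e.get("timestamp", 0) > latest_timestamp]
    newer.sort(key=lambda e: e.get("timestamp", 0))
    # The returned value is what a client would store back into
    # bundle.latestTimestamp after downloading the selected bundles.
    new_latest = newer[-1]["timestamp"] if newer else latest_timestamp
    return newer, new_latest


# Hypothetical entries; these are not the real contents of bundles.json.
sample = [
    {"uri": "old.bundle", "timestamp": 100},
    {"uri": "newest.bundle", "timestamp": 300},
    {"uri": "newer.bundle", "timestamp": 200},
]
selected, latest = select_new_bundles(sample, latest_timestamp=100)
print([e["uri"] for e in selected], latest)
```

If nothing is newer, the stored timestamp is returned unchanged, so running the check repeatedly is harmless.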

Benefits over server-declared URIs

  1. The organization of the bundles is completely separate from the origin server. The bundle server can reorganize as needed without communicating with the origin server.

  2. The bundle server can be completely independent of the origin. If a company wants to create a local bundle cache, then users can point to it through client-side configuration instead of needing to communicate through the origin server.

  3. We can extend the server capabilities to advertise a number of bundle caches, and let the client pick their favorite one. This can present ways to optimize for network latency before committing to a download.
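
For the third point, a client-side choice based on latency might look like the sketch below. Nothing like this exists in the repository. The function names are hypothetical, and timing a single HTTP request is just one possible way to estimate latency.

```python
import time
import urllib.request


def measure_latency(url, timeout=5.0):
    """Time one small HTTP request to url, in seconds."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as response:
        response.read(1)
    return time.monotonic() - start


def pick_fastest(urls, probe=measure_latency):
    """Return the URL with the lowest measured latency, skipping failures."""
    timings = {}
    for url in urls:
        try:
            timings[url] = probe(url)
        except OSError:
            # urllib's URLError and socket timeouts are OSError subclasses.
            continue
    if not timings:
        raise RuntimeError("no bundle cache was reachable")
    return min(timings, key=timings.get)
```

The probe parameter lets a caller substitute a different measurement, for example a fixed table of timings in a test.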

Things not covered in this proposal

  • We don't have a way to authenticate to the bundles. The table of contents and the bundles themselves could be under some form of authentication that is not covered here. We would want to extend the standard to handle auth appropriately, probably through a credential helper.

  • We don't consider encrypted bundles. It is likely possible to extend the table of contents with information about each bundle being encrypted with some public key, allowing future clients to understand that option and do the right thing. Extensions like this are obviously possible with the JSON format (as opposed to a custom format that might cause accidental restrictions).

Things specific to this implementation

  • The bundles attempt to store refs as refs/bundles/<X>, but somehow the bundles end up putting the refs as refs/heads/refs/bundles/<X>. To avoid polluting refs/remotes/ or other refspaces, the refs/heads/ is stripped out in these cases. The ref space could be very flexible, depending on how the bundle organizer designs it.

  • The first bundle is big: it includes all data in master from around 30 days ago. The rest pick up daily updates (if master moved in that time). This layout could shift over time, and I would expect the bundle maintenance to merge the oldest two bundles after generating a new, "latest" bundle.

  • These bundles only care about master, but they could be a full snapshot of refs/heads/. They could also contain all of the tags, if we wanted. (Tags would not want to be hidden away in another ref namespace, I think.)

  • Here, I am using a static web page to serve the data, but it could be a fancy web service with a real REST API. Specifically, it might be nice to add a GET parameter to the table of contents that allows us to specify a filter, such as https://{uri}/bundles?filter=blob:none. Alternatively, we could list the filter as part of the JSON objects and let the client decide without special modification to the URL.

  • Note: Bundles require modification to allow object filters, but that would be valuable for allowing these bundles to work at huge scale.

  • The bundle table of contents could be served from a CDN, but it could also live on a GHES replica or some other tiny service. It could even be hosted as a route on github.com and backed by a near-the-edge microservice.

  • Notice that I don't include any details about "how does the client discover the table of contents?" This is currently vague, but we could add things to the Git protocol to advertise the table's location. I think separating the table itself out of the origin Git server is helpful because we might want multiple, geodistributed locations. The GVFS Cache Servers do this: the origin advertises the possible cache server URLs and then the cache servers manage their own lists of precomputed packs. The client can decide which of those locations is best for them, for example by using a ping to test latency and choosing the closest one. The specific way that Git advertises this could look a lot like the gvfs/config endpoint, which carries more data than just the cache servers. We could create a "config" endpoint for clones that advertises these tables, but also advertises things like "you should use --filter=blob:none here" or other advanced recommendations.
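
The ref renaming mentioned in the first item of this list can be illustrated with a small Python helper. This is only a sketch of the string manipulation; the function name is hypothetical and it is not taken from fetch.py.

```python
def normalize_bundle_ref(ref):
    """Strip a leading refs/heads/ from refs that were meant for refs/bundles/."""
    prefix = "refs/heads/"
    if ref.startswith(prefix + "refs/bundles/"):
        return ref[len(prefix):]
    # Ordinary branches and other refs are left untouched.
    return ref


print(normalize_bundle_ref("refs/heads/refs/bundles/2021-10-01"))
print(normalize_bundle_ref("refs/heads/master"))
```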

Owner
Derrick Stolee
I used to be a mathematician in computational graph theory. These days I spend most of my time contributing to Git.