ZipFly is a zip archive generator based on zipfile.py

Overview

Build Status GitHub release (latest by date) Downloads

Buzon - ZipFly

ZipFly is a zip archive generator based on zipfile.py. It was created by Buzon.io to generate very large ZIP archives for immediate sending out to clients, or for writing large ZIP archives without memory inflation.

Requirements

Python 3.6+

Install

pip3 install zipfly

Basic usage, compress on-the-fly during writes

Using this library will save you from having to write the Zip to disk. Some data will be buffered by the zipfile deflater, but memory inflation is going to be very constrained. Data will be written to destination by default at regular 32KB intervals.

ZipFly defaults attributes:

  • paths: [ ]
  • mode: (write) w
  • chunksize: (bytes) 32768
  • compression: Stored
  • allowZip64: True
  • compresslevel: None
  • storesize: (bytes) 0
  • encode: utf-8

paths list of dictionaries:

.
fs Should be the path to a file on the filesystem
n (Optional) Is the name which it will have within the archive
(by default, this will be the same as fs)

    import zipfly

    paths = [
        {
            'fs': '/path/to/large/file'
        },
    ]

    zfly = zipfly.ZipFly(paths = paths)

    generator = zfly.generator()
    print (generator)
    # 
   


    with open("large.zip", "wb") as f:
        for i in generator:
            f.write(i)

Examples

Streaming multiple files in a zip with Django or Flask Send forth large files to clients with the most popular frameworks

Create paths Easy way to create the array paths from a parent folder.

Predict the size of the zip file before creating it Use the BufferPredictionSize to compute the correct size of the resulting archive before creating it.

Streaming a large file Efficient way to read a single very large binary file in python

Set a comment Your own comment in the zip file

Maintainer

Santiago Debus (@santiagodebus.com)

License

This library was created by Buzon.io and is released under the MIT. Copyright 2021 Cardallot, Inc.

Comments
  • Cannot include empty folders

    Cannot include empty folders

    Python's zipfile module supports writing empty folders to zip archive.

    when trying with zipfly i get:

    File "/.../zipfly.py", line 211, in generator
        with open( path[self.filesystem], 'rb' ) as e:
    IsADirectoryError: [Errno 21] Is a directory: '/path/to/dir'
    

    Is there an undocumented way for writing empty folders?

    opened by rnixx 2
  • Is compression possible?

    Is compression possible?

    Hi,

    I see that only uncompressed ZIP files are can be created with zipfly. What is the reason for that? I don't see any compression level specific code in ZipFly.generator, except file size estimation (which doesn't work for ZIP64 achives).

    So, is compression just an unimplemented feature? Or are there any aspects which don't allow using compression for streaming made this way?

    I understand that compression is often useless in streaming scenarios but it's not true in our case. We're looking for a replacement for zipstream-new which seems to be dead. ZipFly looks nice for us...

    opened by dezhin 2
  • ASGI breaks functionability in HttpStreamingResponse

    ASGI breaks functionability in HttpStreamingResponse

    Running Django 4.1 in async ASGI mode (channels & websockets needed)

    Zipfly functionability breaks as it tries to compile the entire zip to memory before sending zip to client, filling up memory and causing a crash

    opened by T-101 1
  • Release master branch to pypi

    Release master branch to pypi

    It looks like the last version on PyPI has been published 2 years ago and there have been many changes in the code since.

    It would be helpful if you could release the latest changes.

    PS since the project is already using GH action for testing, it may be helpful to set up GH actions for release as well.

    https://packaging.python.org/en/latest/guides/publishing-package-distribution-releases-using-github-actions-ci-cd-workflows/

    https://github.com/marketplace/actions/pypi-publish

    opened by 1oglop1 1
  • Stream into stream

    Stream into stream

    Hi, thank you for the inspirational code.

    I'm wondering if it's possible to adapt your code and make it compatible with S3 storage.

    I use django-storages and boto3 client returns a streaming body (already open fp) and all metadata.

    I need a zipfile (or any other archive) to be created from the set of files during the GET request (not great but it has to be).

    So to save the memory I have 2 options.

    1. Use zipfly as is and download files into a temporary location (and remove it after the operation)
    2. Or better solution that doesn't require intermediate storage so I could pipe the content of streaming body into zipfly and return that as a streaming response.

    streaming body (many of them) -> zipfly(zip file) -> streaming response

    Do you think that option two is possible? If so, could you please point out what pieces of code I should focus on to adapt zipfly?

    Thank you

    opened by 1oglop1 1
  • Add functionality to provide file streams and zip them on the fly

    Add functionality to provide file streams and zip them on the fly

    It is a common use case to store large files remotely and not next to the code. See e.g. for django https://github.com/jschneier/django-storages. If many (large) files must be shipped synchronuosly as a zip file, it saves memory and storage to pass them through the web worker as a stream without saving anything to disk. To archive that, file buffers as input may be supported.

    I would appreciate if we could add this functionality. This pull request contains a tested initial attempt. Feel free to improve!

    This pull request is based on https://github.com/BuzonIO/zipfly/pull/75 and https://github.com/BuzonIO/zipfly/pull/76

    opened by beckedorf 0
  • Using bytes instead of actual files

    Using bytes instead of actual files

    Hi, thanks for the great work. I have some bytes generators and I want to make a zip on the fly out of them. Is there an option to do this? Sorry for my poor python knowledge. Thanks :)

    opened by TaToTanWeb 2
Owner
Buzon
Buzon.io is a Cardallot Inc. service
Buzon
Organize the files into the relevant sub-folders

This program can be used to organize files in a directory by their file extension. And move duplicate files to a duplicates folder.

Thushara Thiwanka 2 Dec 15, 2021
Yadl - it is a simple library for working with both dotenv files and environment variables.

Yadl Yadl - it is a simple library for working with both dotenv files and environment variables. Features Validation of whitespaces. Validation of num

Ivan Kapranov 3 Oct 19, 2021
Annotate your Python requirements.txt file with summaries of each package.

Summarize Requirements šŸ šŸ“œ Annotate your Python requirements.txt file with a short summary of each package. This tool: takes a Python requirements.t

Zeke Sikelianos 8 Apr 22, 2022
Swiss army knife for Apple's .tbd file manipulation

Description Inspired by tbdswizzler, this simple python tool for manipulating Apple's .tbd format. Installation python3 -m pip install --user -U pytbd

10 Aug 31, 2022
Better directory iterator and faster os.walk(), now in the Python 3.5 stdlib

scandir, a better directory iterator and faster os.walk() scandir() is a directory iteration function like os.listdir(), except that instead of return

Ben Hoyt 506 Dec 29, 2022
Maltego transforms to pivot between PE files based on their VirusTotal codeblocks

VirusTotal Codeblocks Maltego Transforms Introduction These Maltego transforms allow you to pivot between different PE files based on codeblocks they

Ariel Jungheit 18 Feb 03, 2022
Singer is an open source standard for moving data between databases, web APIs, files, queues, and just about anything else you can think of.

Singer is an open source standard for moving data between databases, web APIs, files, queues, and just about anything else you can think of. Th

Singer 1.1k Jan 05, 2023
Instant Fuzzy File Search for Alfred

List all the files inside a folder using fd, and instantly fuzzy-search through all of them using fzf, all from inside Alfred with a single keyword: fzf.

Mr. Pennyworth 37 Nov 30, 2022
Import Python modules from any file system path

pathimp Import Python modules from any file system path. Installation pip3 install pathimp Usage import pathimp

Danijar Hafner 2 Nov 29, 2021
Convert All TXT Files To One File.

AllToOne Convert All TXT Files To One File. Hi šŸ‘‹ , I'm Alireza A Python Developer Boy šŸ”­ I’m currently working on my C# projects 🌱 I’m currently Lea

4 Jun 07, 2022
Powerful Python library for atomic file writes.

Powerful Python library for atomic file writes.

Markus Unterwaditzer 313 Oct 19, 2022
Automatically generates a TypeQL script for doing entity and relationship insertions from a .csv file, so you don't have to mess with writing TypeQL.

Automatically generates a TypeQL script for doing entity and relationship insertions from a .csv file, so you don't have to mess with writing TypeQL.

3 Feb 09, 2022
Python code snippets for extracting PDB codes from .fasta files

Python_snippets_for_bioinformatics Python code snippets for extracting PDB codes from .fasta files If you have a single .fasta file for all protein se

Sofi-Mukhtar 3 Feb 09, 2022
Python script for converting figma produced SVG files into C++ JUCE framework source code

AutoJucer Python script for converting figma produced SVG files into C++ JUCE framework source code Watch the tutorial here! Getting Started Make some

SuperConductor 1 Nov 26, 2021
A tiny Python library for writing multi-channel TIFF stacks.

xtiff A tiny Python library for writing multi-channel TIFF stacks. The aim of this library is to provide an easy way to write multi-channel image stac

23 Dec 27, 2022
A wrapper for DVD file structure and ISO files.

vs-parsedvd DVDs were an error. A wrapper for DVD file structure and ISO files. You can find me in the IEW Discord server

7 Nov 17, 2022
A tool written in python to generate basic repo files from github

A tool written in python to generate basic repo files from github

Riley 7 Dec 02, 2021
A Certificate renaming tool made for IEEE CS SBC, SJCE.

PDF Batch Renamer Made for IEEE CS SBC, SJCE How to use? Before using the python script, ensure that pytesseract, pdf2image, opencv and other supporti

Ashwin Kumar U 2 Nov 14, 2021
Two scripts help you to convert csv file to md file by template

Two scripts help you to convert csv file to md file by template. One help you generate multiple md files with different filenames from the first colume of csv file. Another can generate one md file w

2 Oct 15, 2022
A python module to parse text files with contains secret variables.

A python module to parse text files with contains secret variables.

0 Dec 05, 2022