Storing, versioning, and downloading files from S3 made as easy as using open() in Python. Caching included.

Overview

open(LARGE)

Storing, versioning, and downloading files from S3 made as easy as using open() in Python. Caching included.

Motivation

Oftentimes, especially when working with data-heavy applications, large files can proliferate in a repository. Version controlling them is an obvious next step, however, GitHub's git LFS implementation doesn't support the deletion of large files, making it easy for them to eat-up the LFS quota and explode the size of your repos.

Solution

pip install open-large

Simple example

from open_large import LargeFile

LargeFile.configure_credentials({
    "aws_region_name": "your_region_like_eu-west-2",
    "aws_access_key_id": "YOUR_ACCESS_KEY_ID",
    "aws_secret_access_key": "YOUR_VERY_SECRET_ACCESS_KEY",
    "large_files_bucket_name": "create_a_bucket_and_put_its_name_here",
})

# Creates a new version and deletes the older version leaving the 3 most recently used intact
with LargeFile("test.txt", "w", keep_last_n=3) as f:
    for i in range(100000):
        f.write('test\n')

# By default the latest version is returned
# but an optional `version` keyword argument can be provided as well
with LargeFile("test.txt", "r") as f:
    print(f.readlines()[0])

Automatically creates a file, writes to it, uploads it to S3, and then queries the most recent version of it. In this case, the latest version is already in the local cache, no download is required.

More details

LargeFile behaves like an opened file (in the background it is a temp file after all). Binary reading and writing is supported along with the different keywords open() accepts.

The local cache can be configured with these properties:

LargeFile.cache_path = Path('.cache')
LargeFile.max_cache_size = "30 GB"

I only need a path

In case you only need a path to the "remote" file, this pattern can be applied:

path_to_model = LargeFile("folder-of-my-bert-model", version=31).get()

This will first download the file/folder into your local cache folder. Then, it returns a Path object to the local version. Which can be turned into a string with str(path_to_model).

The same approach works for uploads:

LargeFile("folder-of-my-bert-model").push('path_to_local/folder_or_file')

This way, both regular files and folders can be handled. The uploaded file is called folder-of-my-bert-model, the local name is ignored.

Lastly, all version of the remote object can be deleted by calling LargeFile("my-file").delete(). It will still reside in your local cache afterwards, its deletion will happen next time your local cache has to be pruned.

Command-line example

The package can be used as a module from the command-line to give you more flexibility.

Setup

Create an .ini file (or use ~/.aws/credentials). It may look like this:

[DEFAULT]
aws_region_name = your_region_like_eu-west-2
aws_access_key_id = YOUR_ACCESS_KEY_ID
aws_secret_access_key = YOUR_VERY_SECRET_ACCESS_KEY
large_files_bucket_name = my_large_files

Just like in example secrets.

Print the expected options

python3 -m open_large --help

Upload some files

python3 -m open_large --secrets secrets.ini --push my_first_file.json folder/my_second_file my_folder

Only the filename is used as the S3 name, the rest of the path is ignored.

Download some files to the local cache

This can be useful when building a Docker image for example. This way, the files can already reside inside the container and need not be downloaded later.

python3 -m open_large --secrets ~/.aws/credentials --cache my_first_file.json:3 my_second_file my_folder:0

Versions may be specified by using :-s.

Delete remote files

python3 -m open_large --secrets ~/.aws/credentials --delete my_first_file.json
music downloader written in python. (Uses jiosaavn API)

music downloader written in python. (Uses jiosaavn API)

Rohn Chatterjee 35 Jul 20, 2022
Will load an SRC page, logged in with Firefox's cookies imported, and delete all comments from every run

SRCCommentsAutoDeleter Will load an SRC page, logged in with a support browser's cookies, and delete all comments from every run Config is all done in

3 Oct 29, 2021
Simple Python script to download images and videos from public subreddits without using Reddit's API 😎

Subreddit Media Downloader Download images and videos from any public subreddit without using Reddit's API Made with ❤ by Nico 💬 About: This script a

Nico 106 Jan 07, 2023
YouTube-Video-Downloader - Download Youtube Videos for free.

YouTube-Video-Downloader Download Youtube Videos for free. Installing Dependencies:- Windows pip install pytube Mac/Linux pip3 install pytube Clonin

Xception Inc. 1 Jan 01, 2022
Tool To download Amazon 4k SDR HDR 1080, CDM IS Not Included

WV-AMZN-4K-RIPPER Tool To download Amazon 4k SDR HDR 1080, CDM IS Not Included For CDM You can Mail :- Denis Trunov 179 Dec 17, 2022

Can automatically download mods from a Curseforge modpack

Curseforge-Modpack-Downloader A Python script which automatically downloads mods from a Curseforge modpack. Installing Dependencies ⚠ Make sure you ha

Rayr 1 Sep 20, 2022
Python based Telegram bot. Search and download YouTube video or audio.

Python-Telegram-Youtube-Media-Bot Python based Telegram bot. Search and download YouTube video or audio. Just change settings.py and start TelegramBot

Ahmet Bohur 2 Oct 02, 2022
A downloader for the ISIS service of TU Berlin

isis_dl A downloading utility for the ISIS tool of TU-Berlin. Version 0.4 Features Downloads all Material from all courses of your ISIS page. Efficien

1 Nov 06, 2021
A standalone pytube wrapper for downloading individual videos from YouTube.

pytube-runner This is a Python CLI script for downloading individual videos from YouTube. The pytube project is the core of this runner, so naturally

Shiva 2 Jun 21, 2022
Downloads photos you saved from a specific profile.

instagram-download-saved A python script that downloads photos you saved from a specific profile. The only dependency is instaloader, an open-source p

Aviv 1 Dec 19, 2021
Music, Album and Playlist downloader for JioSaavn

jiosaavn-dl Music, Album and Playlist downloader for JioSaavn Features Downloads tracks, albums and playlists in maximum available quality (320kbps AA

bunny 19 Dec 12, 2022
Download Web-10K data by querying Bing Image Search

gpv2-web10k This repository contains the script to download images from the Web-10K dataset. The script takes in a list of queries, queries Bing Image

AI2 8 Sep 06, 2022
Ripurei is a free-to-use osu! replay downloader, that can be configured to download from any osu! server.

Ripurei Ripurei is a fully functional osu! replay downloader, fully capable of downloading from almost any osu! server. Functionality Timeline ✔️ Able

Thomas 0 Feb 11, 2022
YT-Spammer-Purge - Allows you easily scan for and delete scam comments using several methods

YouTube Spammer Purge What Is This? - Allows you to filter and search for spamme

4.3k Dec 31, 2022
A YouTube downloader which allows you to choose which video you want

Youtube Video Downloader Download multiple videos in one go! How to Use 1.First type the video you want to download 2.On clicking the Search button yo

2 Dec 17, 2021
Youtube-downloader-using-Python - Youtube downloader using Python

Youtube-downloader-using-Python Hii guys !! Fancy to see here Welcome! built by

Lakshmi Deepak 2 Jun 09, 2022
Youtube-music - Youtube music with python

youtube-music fzf on https://github.com/junegunn/fzf python3 ytb.py [no/yes] yes

direskyfer 0 Feb 03, 2022
FireDM is a python open source (Internet Download Manager) with multi-connections, high speed engine, it downloads general files and videos from youtube and tons of other streaming websites .

python open source (Internet Download Manager) with multi-connections, high speed engine, based on python, LibCurl, and youtube_dl https://github.com/firedm/FireDM

1.6k Apr 12, 2022
Download YouTube videos that are available in the given playlist

Youtube-Playlist-Downloader Download YouTube videos that are available in the given playlist Project assets: music downloaded music folder. (will be g

Sultan Aljaberi 1 Dec 22, 2021
the best video downloader for terminals (currently only compatible with Linux and Windows)

the best video downloader for terminals (currently only compatible with Linux and Windows)

Amaral 2 Oct 14, 2021