Python script to download the celebA-HQ dataset from Google Drive

Last update: Dec 21, 2022

Overview

download-celebA-HQ

Python script to download and create the celebA-HQ dataset.

WARNING from the author: I believe this script has been broken for a few months (I have not tried it in a while), and I am really sorry about that. If you fix it, please share your solution in a PR so that everyone can benefit from it.

To get the celebA-HQ dataset, you need to a) download the celebA dataset (download_celebA.py), b) download some extra files (download_celebA_HQ.py), and c) process them to obtain the HQ images (make_HQ_images.py).

The final dataset is 89 GB, but you will need somewhat more free storage than that to run the scripts.
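As a rough pre-flight check, you can first test whether the target directory's filesystem has room for the dataset plus a margin. The snippet is not part of the repository, and the helper name `has_enough_space` is illustrative. The 20 GB margin is a hypothetical value, because this README only says "somewhat more" space is needed.

```python
import shutil

# Size of the final dataset stated in this README. "89G" is read as
# 89 * 1024**3 bytes, which is the larger (safer) interpretation.
FINAL_DATASET_GB = 89
# Hypothetical margin for extra space the scripts need while running;
# the real requirement is not documented, so adjust as needed.
EXTRA_MARGIN_GB = 20


def has_enough_space(target_dir, required_gb=FINAL_DATASET_GB + EXTRA_MARGIN_GB):
    """Return True if target_dir's filesystem has at least required_gb free."""
    free_bytes = shutil.disk_usage(target_dir).free
    return free_bytes >= required_gb * 1024**3
```

For example, calling `has_enough_space("./")` before starting the downloads tells you whether the current directory is a reasonable target.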

Usage

1. Clone the repository

git clone https://github.com/nperraud/download-celebA-HQ.git
cd download-celebA-HQ

2. Install the necessary packages (Conda is recommended because specific package versions are required)

Install Miniconda: https://conda.io/miniconda.html
Create a new environment

conda create -n celebaHQ python=3
source activate celebaHQ

Install the packages

conda install jpeg=8d tqdm requests pillow==3.1.1 urllib3 numpy cryptography scipy
pip install opencv-python==3.4.0.12 cryptography==2.1.4

Install 7-Zip (on Ubuntu)

sudo apt-get install p7zip-full
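After the packages and 7-Zip are installed, a quick sanity check can confirm that everything is importable and that the `7z` command is on the PATH. This is an illustrative sketch, not part of the repository. The module list is inferred from the conda and pip commands above: `cv2` is the import name of opencv-python, and `PIL` is the import name of pillow. The check does not verify the pinned versions.

```python
import importlib
import shutil

# Import names for the packages installed by the conda/pip commands above.
REQUIRED_MODULES = ["numpy", "scipy", "PIL", "cv2", "tqdm",
                    "requests", "urllib3", "cryptography"]


def missing_modules(names=REQUIRED_MODULES):
    """Return the module names from `names` that cannot be imported."""
    missing = []
    for name in names:
        try:
            importlib.import_module(name)
        except ImportError:  # also covers ModuleNotFoundError
            missing.append(name)
    return missing


def has_7zip():
    """Return True if the 7z executable (from p7zip-full) is on the PATH."""
    return shutil.which("7z") is not None
```

Inside the activated celebaHQ environment, `missing_modules()` should return an empty list, and `has_7zip()` should return True once p7zip-full is installed.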

3. Run the scripts

python download_celebA.py ./
python download_celebA_HQ.py ./
python make_HQ_images.py ./

where ./ is the directory where you wish the data to be saved.
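The three scripts must run in order, and a failure in one makes the next one pointless. You can therefore chain them from a small Python wrapper that stops at the first failing step. This is a hypothetical helper, not a file in the repository, and the names `build_commands` and `run_all` are illustrative. It assumes you run it from the repository root inside the activated conda environment.

```python
import subprocess
import sys

# The three scripts from this repository, in the order the README runs them.
STEPS = ["download_celebA.py", "download_celebA_HQ.py", "make_HQ_images.py"]


def build_commands(data_dir, python=sys.executable):
    """Return one command line per step, all pointing at data_dir."""
    return [[python, script, data_dir] for script in STEPS]


def run_all(data_dir="./"):
    """Run each step in turn, stopping at the first one that fails."""
    for cmd in build_commands(data_dir):
        print("Running:", " ".join(cmd))
        # check=True raises subprocess.CalledProcessError on a non-zero exit,
        # so a broken download is not silently fed into the next step.
        subprocess.run(cmd, check=True)
```

For example, `run_all("./")` matches the three commands above. Using `sys.executable` ensures the scripts run with the same interpreter as the wrapper, which is the one from the conda environment.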

Go watch a movie: these scripts take a few hours to run, depending on your internet connection and CPU power. The final HQ images will be saved as .npy files in the ./celebA-HQ folder.
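This README does not document the shape or dtype stored in each .npy file, so the sketch below makes no assumption about either. It iterates over the folder and loads one file at a time with `numpy.load`, which avoids holding all 89 GB in memory. `iter_hq_images` is a hypothetical helper name.

```python
from pathlib import Path

import numpy as np


def iter_hq_images(folder="./celebA-HQ"):
    """Yield (path, array) for every .npy file in folder, in sorted order."""
    for path in sorted(Path(folder).glob("*.npy")):
        # Each file is loaded as-is; inspect array.shape and array.dtype
        # before assuming a particular image layout.
        yield path, np.load(path)
```

For example, `next(iter_hq_images())` returns the first file's path and array, which you can inspect before writing any further processing.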

Windows

The scripts may work on Windows, though I have not tested this personally.

Step 2 (installing the packages) becomes:

Install Miniconda (https://conda.io/miniconda.html) or Anaconda
Create a new environment

conda create -n celebaHQ python=3
conda activate celebaHQ

Install the packages

conda install -c anaconda jpeg=8d tqdm requests pillow==3.1.1 urllib3 numpy cryptography scipy

Install 7-Zip

The rest should be unchanged.

Docker

If you have Docker installed, skip the previous installation steps and run the following command from the root directory of this project:

docker build -t celeba . && docker run -it -v $(pwd):/data celeba

By default, this creates the dataset in the current directory. To put it elsewhere, replace $(pwd) with the absolute path to the desired output directory.

Outliers

The dataset seems to contain a few outliers. A list of problematic images is stored in bad_images.txt. Please report any other outliers you find.
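Here is a minimal sketch for excluding these outliers. It assumes bad_images.txt lists one image identifier per line, and that each identifier matches either a file name or a file stem in ./celebA-HQ. That format is an assumption, so check the file before relying on it. `load_bad_images` and `filter_outliers` are hypothetical helpers.

```python
from pathlib import Path


def load_bad_images(list_file="bad_images.txt"):
    """Read the outlier list into a set, assuming one identifier per line."""
    lines = Path(list_file).read_text().splitlines()
    # Ignore blank lines and surrounding whitespace.
    return {line.strip() for line in lines if line.strip()}


def filter_outliers(image_paths, bad_names):
    """Keep only paths whose file name or stem is not in bad_names."""
    return [p for p in map(Path, image_paths)
            if p.name not in bad_names and p.stem not in bad_names]
```

Matching on both the name and the stem means the filter still works whether or not the list includes the .npy extension.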

Remark

These scripts are likely to break somewhere, but if they run to completion, you should obtain the correct dataset.

Sources

This code is inspired by these files

Citing the dataset

You probably want to cite the paper "Progressive Growing of GANs for Improved Quality, Stability, and Variation" that was submitted to ICLR 2018 by Tero Karras (NVIDIA), Timo Aila (NVIDIA), Samuli Laine (NVIDIA), Jaakko Lehtinen (NVIDIA and Aalto University).
