Web Downloader With Python

Overview

Web Downloader

Introduction

This module will provide API to download the webpage components : html file, image file, css fil, javascript file, href link file based on the input url (the url must start with 'http' or 'https' ).

To prosses multiple URLs at the same time, The user can list all the url he wants to download in the file "urllist.txt" as shown below:

# Add the URL you want to download line by line(The url must start with 'http' or 'https' ):
# example: https://www.google.com
https://www.google.com
https://www.carousell.sg/
https://www.google.com/search?q=github&sxsrf=AOaemvJh3t5_h8H85AE8Ajbb1IMnBrRISA%3A1636698503535&source=hp&ei=hwmOYY6mHdGkqtsPq8S9sAY&iflsig=ALs-wAMAAAAAYY4Xl7GLWS16_xc2Q9XrG0p3q277DpkL&oq=&gs_lcp=Cgdnd3Mtd2l6EAEYADIHCCMQ6gIQJzIHCCMQ6gIQJzIHCCMQ6gIQJzIHCCMQ6gIQJzIHCCMQ6gIQJzIHCCMQ6gIQJzINCC4QxwEQowIQ6gIQJzIHCCMQ6gIQJzIHCCMQ6gIQJzIHCCMQ6gIQJ1AAWABgjgdoAXAAeACAAQCIAQCSAQCYAQCwAQo&sclient=gws-wiz
https://stackoverflow.com/questions/66022042/how-to-let-kubernetes-pod-run-a-local-script/66025424

Program Setup

Development Environment : python 3.7.4
Additional Lib/Software Need
  1. beautifulsoup4 4.10.0

    install:

    pip install beautifulsoup4
    

    Lib link: https://pypi.org/project/beautifulsoup4/

Hardware Needed : None
Program File List

version: v0.1

Program File Execution Env Description
webDownload.py python 3 Main executable program use the API.
urllist.txt url record list.

Program Usage

Module API Usage
  1. Downloader init:
soup = urlDownloader(imgFlg=True, linkFlg=True, scriptFlg=True)
  • imgFlg: Set to "True" to download all the "" tag files.
  • linkFlg: Set to "True" to download all the html section, image, icon, css file imported by ""
  • scriptFlg: set to "True" to download all the js file.
  1. Call API method savePage to scape url and save the data in a folder

    soup.savePage('
         
          ', '
          
           ')
    
    # Exampe:
    soup.savePage('https://www.google.com', 'www_google_com')
    
          
         
Program Execution
  1. Copy the url you want to check in the url record file "urllist.txt"

  2. Cd to the program folder and run program execution cmd:

    python webDownload.py
    
  3. Check the result:

    For example, if you copy the url "https://www.carousell.sg/" as the first url you want to check into the file "urllist.txt" file, all the html files, image file and js files will be under folder "1_www.carousell.sg_files"

    • The main web page will be saved as: "1_www.carousell.sg_files/1_www.carousell.sg.html"
    • The image used in the page will be saved in folder: "1_www.carousell.sg_files/img"
    • The html/imge/css import by href will be saved in folder: "1_www.carousell.sg_files/link"
    • The js file used by the page will be saved in fodler: "1_www.carousell.sg_files/script"

Problem and Solution

Problem[0]: Files download got slight different

Why there is a slight different between the files which download by using the program and the files which downlaod I use some-webBrowser's "page save as " for the same URL such as www.google.com

OS Platform : n.a

Error Message: n.a

Type: n.a

Solution:

This is normal situation, the logic of web scrape and browser display are different: if you type www.google.ccom if different people's browser, you can see the page shown on different browser are also different. This is because the browser cache, token in the local storage , cookie will make influence of the "GET" request. So when different people type in the google URL in their browser, they can see their own Gmail Icon shows on the right top corner. If you remove all the cache, token in the local storage , cookie of your browser and try "page save as ", the file downloaded by "page save as " should be same as the program.

Problem[2]: Some download Image are empty

OS Platform : n.a

Error Message: n.a

Type: n.a

Solution:

If a web use third party's storage to save the image and the net-storage need to authorization before download, our program download request will be reject and got 'null' when download the file. Then the saved image will be empty.


Last edit by LiuYuancheng([email protected]) at 13/11/2021

Storing, versioning, and downloading files from S3 made as easy as using open() in Python. Caching included.

open(LARGE) Storing, versioning, and downloading files from S3 made as easy as using open() in Python. Caching included. Motivation Oftentimes, especi

András Schmelczer 2 Jan 30, 2022
TikTok downloader video without watermark from Telegram bot

⬇️ How to download video from Tik Tok via telegram bot? Send a link to the video from tik tok to our telegram bot and it will send you a video without

1 Mar 04, 2022
This project is helps to download contents from Streamtape by utilizing the API

It scrapes Streamtape api and download contents from the site.

Debiprasad Das 5 Dec 28, 2022
Open Source application for downloading and playing music.

Musifre Greetings For HackHeist(Wartex) Judges: Synopsis, Promotion Video & Product Functioning Video are present in Documentation Folder. A Star woul

Yash Dhingra 9 Mar 22, 2022
Download YOUR files, documents from vk.

vk-documents-downloader Кароч эта симпл херня качает все ВАШИ документы с вк. Или я еблан, но в гх и тмб гугле я подобного не нашел. py main.py Login:

4 Jun 10, 2022
A user-friendly GUI for the ZSpotify music downloader.

ZSpotifyGUI A user-friendly desktop app for ZSpotify music downloader for Windows, MacOs, and Linux Discord Server - Matrix Server - Gitea Mirror - Ma

94 Dec 17, 2022
squid-dl is a massively parallel yt-dlp-based YouTube downloader.

squid-dl squid-dl is a massively parallel yt-dlp-based YouTube downloader. Installation Run the setup.py, which will install squid-dl and its two depe

tuxlovesyou 51 Jan 05, 2023
YT-Spammer-Purge - Allows you easily scan for and delete scam comments using several methods

YouTube Spammer Purge What Is This? - Allows you to filter and search for spamme

4.3k Dec 31, 2022
A python program to download one or multiple videos from YouTube.

YouTube-Video-Downloader A python program to download one or multiple videos from YouTube. Quick Start guide First Clone The Project git clone https:/

Imira Randeniya 1 Sep 11, 2022
Download minecraft head or skin, allows TLauncher accounts

Minecraft-skin-downloader Download minecraft head or skin, allows TLauncher accounts by BoBkiNN_ Contact: https://vk.com/bobkinnvk Requirements: Modul

3 Apr 03, 2022
Download Web-10K data by querying Bing Image Search

gpv2-web10k This repository contains the script to download images from the Web-10K dataset. The script takes in a list of queries, queries Bing Image

AI2 8 Sep 06, 2022
A small distributed download manager to help bypass device-specific bandwidth limitations.

Distributed Download Manager A small distributed download manager to help bypass device-specific bandwidth limitations. Architecture The download mana

Anand Balaji 3 Sep 23, 2022
Fully Automated YouTube Channel ▶️with Added Extra Features.

Fully Automated Youtube Channel ▒█▀▀█ █▀▀█ ▀▀█▀▀ ▀▀█▀▀ █░░█ █▀▀▄ █▀▀ █▀▀█ ▒█▀▀▄ █░░█ ░░█░░ ░▒█░░ █░░█ █▀▀▄ █▀▀ █▄▄▀ ▒█▄▄█ ▀▀▀▀ ░░▀░░ ░▒█░░ ░▀▀▀ ▀▀▀░

sam-sepiol 249 Jan 02, 2023
DYA ( Ditch YouTube API ) is a package created to power the user with YouTube Data API functionality without any API Key

Ditch YouTubeAPI (BETA) DYA ( Ditch YouTube API ) is a package created to power the user with YouTube Data API functionality without any API Key Detai

Sougata Jana 23 Dec 22, 2022
Used Insta Loader to download high quality images from instagram account

Insta Dp Downloader Project Description: In this project, I have used "Insta Loader" to download high quality images from instagram account. You only

Hassan Shahzad 3 Oct 31, 2022
Youtube video downloader and info extractor for python.

tube_dl Tube_dl is a Simple Youtube video downloader for Python. A Modular approach to bypass and download Youtube Videos and Playlist from Youtube us

Shekhar Chander 16 Jul 09, 2022
VK sticker downloader with python

VK Sticker Downloader This repository is used to automate download file from VK Sticker How to use Execute the file ./downloader.py Writedown full url

Hartawan Bahari M. 1 Dec 29, 2021
Downloads yiffer.xyz comics as images

yiffer-dl Downloads comics as images from yiffer.xyz.

Maxim 2 Mar 20, 2022
Downloads separate (specified) file to a randomly generated folder in /TEMP then executes it.

PyTemp-1 A Python3 file downloader. What you do with this code / project / idea is non of my buisness or concern, and this was made for **educational*

NightTab 1 Aug 03, 2022
A collection of modules I have created to programmatically search for/download imagery from live cam feeds across the state of California.

A collection of modules that I have created to programmatically search for/download imagery from all publicly available live cam feeds across the state of California. In no way am I affiliated with a

Chad Groom 5 Nov 21, 2022