Generate a repository with mirror links for the DriveDroid app

Overview

DriveDroid Repository Generator

Generate a repository for the app that allows booting a PC using ISO files stored on your Android phone.

Also check out the official scraper written in JavaScript.


Try Already Built Repo

Add one of the following links to the image repositories in the DriveDroid app:

https://dd.hexed.pw

or

https://raw.githubusercontent.com/flameshikari/ddrg/master/repo/repo.json

Requirements

Python 3.6+ with the packages listed in requirements.txt.

I recommend creating a venv and installing the packages there.
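For example, on Linux or macOS:

python3 -m venv .venv
source .venv/bin/activate
pip install -r requirements.txt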

Usage

python ./src/main.py [-i dir] [-o dir] [-g]

-i dir where dir is a directory containing distro scrapers (default: ./src/distros).

-o dir where dir is a directory where the built repo will be saved (default: ./build).

-g generates a webpage presenting the contents of repo.json.

The -h option is available as well.
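For example, to build the repo from the default scrapers directory into ./build and also generate the webpage:

python ./src/main.py -o ./build -g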

How to Make a Scraper

Create a folder in ./src/distros with the following structure:

distro_name
├── info.toml
├── logo.png
└── scraper.py

If distro_name starts with an underscore (e.g. _disabled), it will be skipped.

Let's take a look at each file.

info.toml

info.toml contains the distro name and a link to the official website. Arch Linux info.toml example:

name = "Arch Linux" # name of distro
url  = "https://example.com" # official site

If info.toml is missing or its values are not provided, fallback values will be used. For Arch Linux the fallback values would be:

name = "arch" # distro folder name as value, also used in url
url  = "https://distrowatch.com/table.php?distribution=arch"

logo.png

logo.png should be 128×128 px with a transparent background. Arch Linux logo.png example:


[Arch Linux logo]


If logo.png is missing, the fallback logo will be used:


[DriveDroid logo]


scraper.py

A scraper can be written however you like, as long as it returns the desired values.

It must return a list of tuples; each tuple contains iso_url, iso_arch, iso_size and iso_version, in that order.

The Arch Linux scraper returns the following values:

[
  (
    'https://mirror.yandex.ru/archlinux/iso/2021.05.01/archlinux-2021.05.01-x86_64.iso',
    'x86_64',
    792014848,
    '2021.05.01'
  ),
  (
    'https://mirror.yandex.ru/archlinux/iso/2021.06.01/archlinux-2021.06.01-x86_64.iso',
    'x86_64',
    811937792,
    '2021.06.01'
  ),
  (
    'https://mirror.yandex.ru/archlinux/iso/2021.07.01/archlinux-2021.07.01-x86_64.iso',
    'x86_64',
    817180672,
    '2021.07.01'
  ),
  (
    'https://mirror.yandex.ru/archlinux/iso/archboot/2020.07/archlinux-2020.07-1-archboot-network.iso',
    'x86_64',
    516947968,
    '2020.07'
  ),
  (
    'https://mirror.yandex.ru/archlinux/iso/archboot/2020.07/archlinux-2020.07-1-archboot.iso',
    'x86_64',
    1280491520,
    '2020.07'
  )
]

A scraper includes from public import * at the top, which imports the following into its namespace:

  • bs (short for BeautifulSoup)
  • json
  • re
  • requests

It also provides these functions:

  • get_afh_url(iso_url) — returns a direct download link for a file hosted on AndroidFileHost (see the sketch after this list);
    iso_url must look like this: https://androidfilehost.com/?fid=8889791610682936459
  • get_iso_arch(iso_url) — returns the processor architecture of the ISO at iso_url
  • get_iso_size(iso_url) — returns the file size of iso_url in bytes
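For instance, a scraper for a distro hosted on AndroidFileHost might combine these helpers as follows (a minimal sketch; the fid is the sample value from above, not a real image):

# hypothetical AndroidFileHost snippet inside a scraper's init()
iso_url = get_afh_url("https://androidfilehost.com/?fid=8889791610682936459")
iso_arch = get_iso_arch(iso_url)  # e.g. "x86_64"
iso_size = get_iso_size(iso_url)  # size in bytes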

Arch Linux scraper.py example:

from public import *  # noqa


def init():
    array = []
    base_urls = [
        "https://mirror.yandex.ru/archlinux/iso/latest",
        "https://mirror.yandex.ru/archlinux/iso/archboot/latest"
    ]

    for base_url in base_urls:
        # parse the mirror's directory listing
        html = bs(requests.get(base_url).text, "html.parser")

        # collect every link that points to an .iso file
        for filename in html.find_all("a", {"href": re.compile(r"^.*\.iso$")}):
            iso_url = f"{base_url}/{filename['href']}"
            iso_arch = get_iso_arch(iso_url)
            iso_size = get_iso_size(iso_url)
            # extract the version (e.g. 2021.05.01) from the filename
            iso_version = re.search(r"-(\d+\.\d+(\.\d+)?)", iso_url).group(1)

            array.append((iso_url, iso_arch, iso_size, iso_version))

    return array
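Judging by the example, main.py calls each scraper's init() function, so keep that entry point name. A minimal static scraper (a sketch with a hypothetical, hard-coded URL and version) could be as small as:

from public import *  # noqa


def init():
    # hypothetical hard-coded entry; arch and size are resolved
    # by the provided helpers at build time
    iso_url = "https://example.com/isos/distro-1.0-x86_64.iso"
    return [(iso_url, get_iso_arch(iso_url), get_iso_size(iso_url), "1.0")]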

Misc

Here's a snippet for nginx if you decide to self-host the repository together with the website and want DriveDroid to fetch repo.json by hostname alone. Place it in the server section of your config:

location = / {
  if ($http_user_agent ~* 'okhttp') {
    rewrite ^/(.*)$ /repo.json break;
  }
}
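With this in place, requests to the bare hostname whose User-Agent contains okhttp (the HTTP client DriveDroid uses) are internally rewritten to /repo.json, while regular browsers still receive the website.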

Roadmap

  • Option to generate a webpage
  • Add a mechanism to retry scraping if a network error occurs
  • Option to select mirrors (currently the generator mainly uses mirrors based in Russia)
  • Package this project perhaps
  • Probably make the code better

License

MIT License

Copyright © 2021 flameshikari
