A web scraper for nomadlist.com, made to avoid website restrictions.

Gypsylist

gypsylist.py is a web scraper for nomadlist.com, made to avoid website restrictions.

nomadlist.com is a website with a lot of information for digital nomads, helping location-independent remote workers find the best places to live and work. Unfortunately, most of this content is restricted unless you are a member of the site.

This script doesn't cover all the information retrievable from the website; it's just an entry point for evaluating the site without signing up.

Installation

Before using gypsylist, install its requirements:

pip3 install -r requirements.txt

Additionally, since selenium is a dependency, you also have to set up the browser driver. To install it, see https://www.selenium.dev/documentation/webdriver/getting_started/install_drivers/.

Now you should be ready to run the script.

Usage

To use gypsylist, first browse nomadlist.com and apply the filters you need for your research. Then copy the URL path (everything after the domain, query string included) from your browser's address bar.
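If you copied the whole address instead of just the path, the part gypsylist needs can be cut out with the standard library. This is only a minimal sketch: `path_argument` is a hypothetical helper, not part of gypsylist, and it assumes the `--path` value is the URL path without its leading slash plus the query string, as in the example command in this README.

```python
from urllib.parse import urlsplit


def path_argument(url: str) -> str:
    """Return the part of a URL after the domain, query string included.

    Hypothetical helper for illustration; gypsylist itself does not ship it.
    """
    parts = urlsplit(url)
    # Drop the leading slash so the value matches the README's --path example.
    path = parts.path.lstrip("/")
    return f"{path}?{parts.query}" if parts.query else path


full_url = (
    "https://nomadlist.com/safe-places-for-remote-workers-to-live"
    "?sort=cost_for_nomad_in_usd&order=asc"
)
print(path_argument(full_url))
```

The printed value is what goes between the quotes after `--path`.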

Pass that path to gypsylist to scrape it:

./gypsylist.py --path "safe-places-for-remote-workers-to-live?sort=cost_for_nomad_in_usd&order=asc" --emoji

The expected result looks like this:

#1
🏙️  city: Lisbon
🌎 country: Portugal
⭐️ overall: 4/5
💵 cost: 4/5
📡 internet: 5/5
😀 fun: 5/5
👮 safety: 4/5

...

#440
🏙️  city: Zurich
🌎 country: Switzerland
⭐️ overall: 3/5
💵 cost: 1/5
📡 internet: 5/5
😀 fun: 4/5
👮 safety: 4/5

#441
🏙️  city: Leiden
🌎 country: Netherlands
⭐️ overall: 3/5
💵 cost: 1/5
📡 internet: 5/5
😀 fun: 4/5
👮 safety: 4/5

#442
🏙️  city: Honolulu, Hawaii
🌎 country: United States
⭐️ overall: 4/5
💵 cost: 1/5
📡 internet: 5/5
😀 fun: 5/5
👮 safety: 4/5

#443
🏙️  city: Lake Tahoe, CA
🌎 country: United States
⭐️ overall: 3/5
💵 cost: 1/5
📡 internet: 5/5
😀 fun: 4/5
👮 safety: 4/5

(Always remember --emoji). Have fun!
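If you want to process the results further, the text output can be turned back into structured records. The following is only a sketch: `parse_records` is a hypothetical helper, not part of gypsylist, and it assumes the output keeps the layout shown above (a `#N` rank line followed by `label: value` lines, with or without an emoji prefix).

```python
import re


def parse_records(text: str) -> list:
    """Parse gypsylist-style text output into a list of dicts.

    Assumes each record starts with a '#N' line followed by 'label: value' lines.
    """
    records = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        rank = re.fullmatch(r"#(\d+)", line)
        if rank:
            # A rank line opens a new record.
            current = {"rank": int(rank.group(1))}
            records.append(current)
        elif current is not None and ":" in line:
            label, value = line.split(":", 1)
            words = label.split()
            if not words:
                continue
            # The key is the last word before the colon, which drops any emoji prefix.
            current[words[-1]] = value.strip()
    return records


sample = """#1
🏙️  city: Lisbon
🌎 country: Portugal
⭐️ overall: 4/5
💵 cost: 4/5

#440
🏙️  city: Zurich
🌎 country: Switzerland
"""

for record in parse_records(sample):
    print(record["rank"], record["city"], record["country"])
```

Lines that do not match either pattern, such as blank lines, are skipped, so the parser tolerates the spacing between records.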

Known Issues

This is not what you would call well-written code (sorry, Gods of programming). As a result, there are several code smells and bugs that have not been reviewed, because of the short time I dedicated to writing the script.

  • With the --headless / -H parameter, which runs the browser in headless mode, you will retrieve only the first page of content from the website.
Owner
Alessio Greggi
Computer Scientist graduated at the University of Rome, Tor Vergata. Currently working as Linux Engineer. CTF Player during free time.