Gypsylist

gypsylist.py is a web scraper for nomadlist.com, made to avoid website restrictions.

nomadlist.com is a website with a lot of information for digital nomads, helping location-independent remote workers find the best places to live and work. Unfortunately, most of this content is restricted unless you are a member of the website.

This script doesn't cover all of the information available on the website; it is just an entry point for evaluating the site without signing up.

Installation

Before using gypsylist, install its requirements:

pip3 install -r requirements.txt

Because gypsylist depends on selenium, you also have to set up the browser driver. For installation instructions, see https://www.selenium.dev/documentation/webdriver/getting_started/install_drivers/.

Now you should be ready to run the script.
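If you want to confirm that a browser driver is reachable before running the script, a quick check like the one below may help. It is only a sketch and not part of gypsylist: the `find_drivers` helper is hypothetical, and it assumes the driver executable uses one of the common names (chromedriver, geckodriver, msedgedriver) and is on your PATH.

```python
import shutil

# Common WebDriver executable names (an assumption: which one you need
# depends on the browser you use with selenium).
DRIVER_NAMES = ("chromedriver", "geckodriver", "msedgedriver")


def find_drivers(names=DRIVER_NAMES):
    """Map each driver name to its full path on PATH, or None if not found."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in find_drivers().items():
        print(f"{name}: {path or 'not found on PATH'}")
```

A driver reported as "not found on PATH" is not necessarily a problem, as long as the driver for the browser you actually use is found.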

Usage

To use gypsylist, first browse the nomadlist.com website and apply the filters you need for your search. Then copy the URL path (everything after the domain name, including the query string) from your browser's address bar.

Pass it to gypsylist with --path:

./gypsylist.py --path "safe-places-for-remote-workers-to-live?sort=cost_for_nomad_in_usd&order=asc" --emoji

The expected output looks like this:

#1
🏙️  city: Lisbon
🌎 country: Portugal
⭐️ overall: 4/5
💵 cost: 4/5
📡 internet: 5/5
😀 fun: 5/5
👮 safety: 4/5

...

#440
🏙️  city: Zurich
🌎 country: Switzerland
⭐️ overall: 3/5
💵 cost: 1/5
📡 internet: 5/5
😀 fun: 4/5
👮 safety: 4/5

#441
🏙️  city: Leiden
🌎 country: Netherlands
⭐️ overall: 3/5
💵 cost: 1/5
📡 internet: 5/5
😀 fun: 4/5
👮 safety: 4/5

#442
🏙️  city: Honolulu, Hawaii
🌎 country: United States
⭐️ overall: 4/5
💵 cost: 1/5
📡 internet: 5/5
😀 fun: 5/5
👮 safety: 4/5

#443
🏙️  city: Lake Tahoe, CA
🌎 country: United States
⭐️ overall: 3/5
💵 cost: 1/5
📡 internet: 5/5
😀 fun: 4/5
👮 safety: 4/5

(Always remember --emoji). Have fun!
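If you prefer to copy the full URL from the address bar, the value for --path can be derived from it. The snippet below is a minimal sketch using only the Python standard library; the `url_to_path` helper is hypothetical and not part of gypsylist, and it assumes --path expects the path without a leading slash followed by the query string, as in the example above.

```python
from urllib.parse import urlsplit


def url_to_path(url):
    """Return the path and query string of a URL, without the leading slash."""
    parts = urlsplit(url)
    path = parts.path.lstrip("/")
    # Re-attach the query string only when the URL has one.
    return f"{path}?{parts.query}" if parts.query else path


if __name__ == "__main__":
    full_url = (
        "https://nomadlist.com/safe-places-for-remote-workers-to-live"
        "?sort=cost_for_nomad_in_usd&order=asc"
    )
    print(url_to_path(full_url))
    # prints: safe-places-for-remote-workers-to-live?sort=cost_for_nomad_in_usd&order=asc
```

Remember to quote the resulting value on the command line, since the & character is otherwise interpreted by the shell.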

Known Issues

This is not what you would call "well-written code" (sorry, Gods of programming). Because I spent little time writing the script, it has several code smells and bugs that have not been reviewed.

When the --headless / -H parameter is used to run the browser in headless mode, only the first page of results from the website is retrieved.

Owner

Alessio Greggi
