Make Selenium work on Github Actions

Overview

Make Selenium work on Github Actions

Scraping with BeautifulSoup on GitHub Actions is easy-peasy. But what about Selenium?? After you jump through some hoops it's just as simple.

How to use this

I think you can just fork this repository and get to work on scraper.py.

Otherwise, just steal scraper.py and .github/workflows/scrape.yml and you'll be good to go.

Saving CSV files

If you want every scrape to update a CSV or something like this, you'll need to edit your scrape.yml to commit to the repo after the scraper is done running.

Take a look at the bottom of autoscraper-history to see how to do that. It should be one more step, something like this:

      - name: Commit and push if content changed
        run: |-
          git config user.name "Automated"
          git config user.email "[email protected]"
          git add -A
          timestamp=$(date -u)
          git commit -m "Latest data: ${timestamp}" || exit 0
          git push

I'm pretty sure I lifted that code 100% from Simmon Willison.

Scheduling

Right now the yaml is set to only scrape when you click the "Run workflow" button in the Actions tab, but you can add something like

  schedule:
    - cron: '0 * * * *'

To make it run the first minute of every hour.

How this all works

To make Selenium + GitHub Actions work, this repo does a few magic things. In a normal world, you start Chrome like this:

from selenium import webdriver

driver = webdriver.Chrome()

But we.... do things a little differently. Let me walk you through the changes!

webdriver-manager to manage the webdriver

You can add a special action to set up Chromedriver but I feel it's honestly easier to use webdriver-manager. It magically picks out the right version of chromedriver for you.

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(ChromeDriverManager().install())

It works on your local machine, too!

Chromium, not Chrome

When you're running GitHub Actions, it's probably on a nice little Ubuntu Linux machine. In those situations, you install software using apt. Since you can't install Chrome with apt, you'll install Chromium instead, the open-source version of Chrome. Works the same, just opens a little differently.

As a result, our chromedriver install code changed a little more:

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from webdriver_manager.utils import ChromeType

driver_path = ChromeDriverManager(chrome_type=ChromeType.CHROMIUM).install()
driver = webdriver.Chrome(driver_path)

This also means we need to add a line that does apt-get install -y chromium-browser to our scrape.yml)

Headless Chromium

Since we don't have a monitor plugged into GitHub Actions, we can't actually see what's going on in the browser. In the olden days you had to construct some odd technical fake screen, but these days you just run in headless mode!

from selenium import webdriver 
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(options=chrome_options)

Other options and more management of things

To combine the webdriver-manager driver path and the headless Chrome option, there are a lot of hoops to jump through. During the research process a lot of extra chrome options poked up, so I thought hey, let's just add all those, too.

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from webdriver_manager.utils import ChromeType
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

chrome_service = Service(ChromeDriverManager(chrome_type=ChromeType.CHROMIUM).install())

chrome_options = Options()
options = [
    "--headless",
    "--disable-gpu",
    "--window-size=1920,1200",
    "--ignore-certificate-errors",
    "--disable-extensions",
    "--no-sandbox",
    "--disable-dev-shm-usage"
]
for option in options:
    chrome_options.add_argument(option)

driver = webdriver.Chrome(service=chrome_service, options=chrome_options)

The biggest change here beyond all those options is the Service thing. Apparently just giving it the path to chromdriver isn't good enough? Who knows, I just do what works.

Owner
Jonathan Soma
baby data journo wrangler @ledeprogram + @littlecolumns, cat wrangler @cat-republic
Jonathan Soma
:game_die: Pytest plugin to randomly order tests and control random.seed

pytest-randomly Pytest plugin to randomly order tests and control random.seed. Features All of these features are on by default but can be disabled wi

pytest-dev 471 Dec 30, 2022
Silky smooth profiling for Django

Silk Silk is a live profiling and inspection tool for the Django framework. Silk intercepts and stores HTTP requests and database queries before prese

Jazzband 3.7k Jan 04, 2023
create custom test databases that are populated with fake data

About Generate fake but valid data filled databases for test purposes using most popular patterns(AFAIK). Current support is sqlite, mysql, postgresql

Emir Ozer 2.2k Jan 04, 2023
Scalable user load testing tool written in Python

Locust Locust is an easy to use, scriptable and scalable performance testing tool. You define the behaviour of your users in regular Python code, inst

Locust.io 20.4k Jan 04, 2023
Python wrapper of Android uiautomator test tool.

uiautomator This module is a Python wrapper of Android uiautomator testing framework. It works on Android 4.1+ (API Level 16~30) simply with Android d

xiaocong 1.9k Dec 30, 2022
✅ Python web automation and testing. 🚀 Fast, easy, reliable. 💠

Build fast, reliable, end-to-end tests. SeleniumBase is a Python framework for web automation, end-to-end testing, and more. Tests are run with "pytes

SeleniumBase 3k Jan 04, 2023
Python Rest Testing

pyresttest Table of Contents What Is It? Status Installation Sample Test Examples Installation How Do I Use It? Running A Simple Test Using JSON Valid

Sam Van Oort 1.1k Dec 28, 2022
FaceBot is a script to automatically create a facebook account using the selenium and chromedriver modules.

FaceBot is a script to automatically create a facebook account using the selenium and chromedriver modules. That way, we don't need to input full name, email and password and date of birth. All will

Fadjrir Herlambang 2 Jun 17, 2022
PacketPy is an open-source solution for stress testing network devices using different testing methods

PacketPy About PacketPy is an open-source solution for stress testing network devices using different testing methods. Currently, there are only two c

4 Sep 22, 2022
Python scripts for a generic performance testing infrastructure using Locust.

TODOs Reference to published paper or online version of it loadtest_plotter.py: Cleanup and reading data from files ARS_simulation.py: Cleanup, docume

Juri Tomak 3 Dec 15, 2022
It helps to use fixtures in pytest.mark.parametrize

pytest-lazy-fixture Use your fixtures in @pytest.mark.parametrize. Installation pip install pytest-lazy-fixture Usage import pytest @pytest.fixture(p

Marsel Zaripov 299 Dec 24, 2022
WIP SAT benchmarking tooling, written with only my personal use in mind.

SAT Benchmarking Some early work in progress tooling for running benchmarks and keeping track of the results when working on SAT solvers and related t

Jannis Harder 1 Dec 26, 2021
Fills out the container extension form automatically. (Specific to IIT Ropar)

automated_container_extension Fills out the container extension form automatically. (Specific to IIT Ropar) Download the chrome driver from the websit

Abhishek Singh Sambyal 1 Dec 24, 2021
Language-agnostic HTTP API Testing Tool

Dredd — HTTP API Testing Framework Dredd is a language-agnostic command-line tool for validating API description document against backend implementati

Apiary 4k Jan 05, 2023
Instagram unfollowing bot. If this script is executed that specific accounts following will be reduced

Instagram-Unfollower-Bot Instagram unfollowing bot. If this script is executed that specific accounts following will be reduced.

Biswarup Bhattacharjee 1 Dec 24, 2021
An Instagram bot that can mass text users, receive and read a text, and store it somewhere with user details.

Instagram Bot 🤖 July 14, 2021 Overview 👍 A multifunctionality automated instagram bot that can mass text users, receive and read a message and store

Abhilash Datta 14 Dec 06, 2022
Python Testing Crawler 🐍 🩺 🕷️ A crawler for automated functional testing of a web application

Python Testing Crawler 🐍 🩺 🕷️ A crawler for automated functional testing of a web application Crawling a server-side-rendered web application is a

70 Aug 07, 2022
Fi - A simple Python 3.9+ command-line application for managing Fidelity portfolios

fi fi is a simple Python 3.9+ command-line application for managing Fidelity por

Darik Harter 2 Feb 26, 2022
Automação de Processos (obtenção de informações com o Selenium), atualização de Planilha e Envio de E-mail.

Automação de Processo: Código para acompanhar o valor de algumas ações na B3. O código entra no Google Drive, puxa os valores das ações (pré estabelec

Hemili Beatriz 1 Jan 08, 2022
reCaptchaBypasser For Bypass Any reCaptcha For Selenium Python

reCaptchaBypasser ' Usage : from selenium import webdriver from reCaptchaBypasser import reCaptchaScraper import time driver = webdriver.chrome(execu

Dr.Linux 8 Dec 17, 2022