Make Selenium work on Github Actions

Overview

Make Selenium work on Github Actions

Scraping with BeautifulSoup on GitHub Actions is easy-peasy. But what about Selenium?? After you jump through some hoops it's just as simple.

How to use this

I think you can just fork this repository and get to work on scraper.py.

Otherwise, just steal scraper.py and .github/workflows/scrape.yml and you'll be good to go.

Saving CSV files

If you want every scrape to update a CSV or something like this, you'll need to edit your scrape.yml to commit to the repo after the scraper is done running.

Take a look at the bottom of autoscraper-history to see how to do that. It should be one more step, something like this:

      - name: Commit and push if content changed
        run: |-
          git config user.name "Automated"
          git config user.email "[email protected]"
          git add -A
          timestamp=$(date -u)
          git commit -m "Latest data: ${timestamp}" || exit 0
          git push

I'm pretty sure I lifted that code 100% from Simmon Willison.

Scheduling

Right now the yaml is set to only scrape when you click the "Run workflow" button in the Actions tab, but you can add something like

  schedule:
    - cron: '0 * * * *'

To make it run the first minute of every hour.

How this all works

To make Selenium + GitHub Actions work, this repo does a few magic things. In a normal world, you start Chrome like this:

from selenium import webdriver

driver = webdriver.Chrome()

But we.... do things a little differently. Let me walk you through the changes!

webdriver-manager to manage the webdriver

You can add a special action to set up Chromedriver but I feel it's honestly easier to use webdriver-manager. It magically picks out the right version of chromedriver for you.

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(ChromeDriverManager().install())

It works on your local machine, too!

Chromium, not Chrome

When you're running GitHub Actions, it's probably on a nice little Ubuntu Linux machine. In those situations, you install software using apt. Since you can't install Chrome with apt, you'll install Chromium instead, the open-source version of Chrome. Works the same, just opens a little differently.

As a result, our chromedriver install code changed a little more:

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from webdriver_manager.utils import ChromeType

driver_path = ChromeDriverManager(chrome_type=ChromeType.CHROMIUM).install()
driver = webdriver.Chrome(driver_path)

This also means we need to add a line that does apt-get install -y chromium-browser to our scrape.yml)

Headless Chromium

Since we don't have a monitor plugged into GitHub Actions, we can't actually see what's going on in the browser. In the olden days you had to construct some odd technical fake screen, but these days you just run in headless mode!

from selenium import webdriver 
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(options=chrome_options)

Other options and more management of things

To combine the webdriver-manager driver path and the headless Chrome option, there are a lot of hoops to jump through. During the research process a lot of extra chrome options poked up, so I thought hey, let's just add all those, too.

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from webdriver_manager.utils import ChromeType
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

chrome_service = Service(ChromeDriverManager(chrome_type=ChromeType.CHROMIUM).install())

chrome_options = Options()
options = [
    "--headless",
    "--disable-gpu",
    "--window-size=1920,1200",
    "--ignore-certificate-errors",
    "--disable-extensions",
    "--no-sandbox",
    "--disable-dev-shm-usage"
]
for option in options:
    chrome_options.add_argument(option)

driver = webdriver.Chrome(service=chrome_service, options=chrome_options)

The biggest change here beyond all those options is the Service thing. Apparently just giving it the path to chromdriver isn't good enough? Who knows, I just do what works.

Owner
Jonathan Soma
baby data journo wrangler @ledeprogram + @littlecolumns, cat wrangler @cat-republic
Jonathan Soma
A rewrite of Python's builtin doctest module (with pytest plugin integration) but without all the weirdness

The xdoctest package is a re-write of Python's builtin doctest module. It replaces the old regex-based parser with a new abstract-syntax-tree based pa

Jon Crall 174 Dec 16, 2022
Python Projects - Few Python projects with Testing using Pytest

Python_Projects Few Python projects : Fast_API_Docker_PyTest- Just a simple auto

Tal Mogendorff 1 Jan 22, 2022
Fi - A simple Python 3.9+ command-line application for managing Fidelity portfolios

fi fi is a simple Python 3.9+ command-line application for managing Fidelity por

Darik Harter 2 Feb 26, 2022
API mocking with Python.

apyr apyr (all lowercase) is a simple & easy to use mock API server. It's great for front-end development when your API is not ready, or when you are

Umut Seven 55 Nov 25, 2022
模仿 USTC CAS 的程序,用于开发校内网站应用的本地调试。

ustc-cas-mock 模仿 USTC CAS 的程序,用于开发校内网站应用阶段调试。 请勿在生产环境部署! 只测试了最常用的三个 CAS route: /login /serviceValidate(验证 CAS ticket) /logout 没有测试过 proxy ticket。(因为我

taoky 4 Jan 27, 2022
automate the procedure of 403 response code bypass

403bypasser automate the procedure of 403 response code bypass Description i notice a lot of #bugbountytips describe how to bypass 403 response code s

smackerdodi2 40 Dec 16, 2022
Simple assertion library for unit testing in python with a fluent API

assertpy Simple assertions library for unit testing in Python with a nice fluent API. Supports both Python 2 and 3. Usage Just import the assert_that

19 Sep 10, 2022
Python tools for penetration testing

pyTools_PT python tools for penetration testing Please don't use these tool for illegal purposes. These tools is meant for penetration testing for leg

Gourab 1 Dec 01, 2021
Argument matchers for unittest.mock

callee Argument matchers for unittest.mock More robust tests Python's mocking library (or its backport for Python 3.3) is simple, reliable, and easy

Karol Kuczmarski 77 Nov 03, 2022
Python Moonlight (Machine Learning) Practice

PyML Python Moonlight (Machine Learning) Practice Contents Design Documentation Prerequisites Checklist Dev Setup Testing Run Prerequisites Python 3 P

Dockerian Seattle 2 Dec 25, 2022
Avocado is a set of tools and libraries to help with automated testing.

Welcome to Avocado Avocado is a set of tools and libraries to help with automated testing. One can call it a test framework with benefits. Native test

Ana Guerrero Lopez 1 Nov 19, 2021
Test scripts etc. for experimental rollup testing

rollup node experiments Test scripts etc. for experimental rollup testing. untested, work in progress python -m venv venv source venv/bin/activate #

Diederik Loerakker 14 Jan 25, 2022
pytest plugin providing a function to check if pytest is running.

pytest-is-running pytest plugin providing a function to check if pytest is running. Installation Install with: python -m pip install pytest-is-running

Adam Johnson 21 Nov 01, 2022
Python program that uses pynput to simulate key presses. Probably only works on Windows.

AutoKey Python program that uses pynput to simulate key presses. Probably only works on Windows. Can be used for pretty much whatever you want except

2 Oct 28, 2022
Travel through time in your tests.

time-machine Travel through time in your tests. A quick example: import datetime as dt

Adam Johnson 373 Dec 27, 2022
Asyncio http mocking. Similar to the responses library used for 'requests'

aresponses an asyncio testing server for mocking external services Features Fast mocks using actual network connections allows mocking some types of n

93 Nov 16, 2022
nose is nicer testing for python

On some platforms, brp-compress zips man pages without distutils knowing about it. This results in an error when building an rpm for nose. The rpm bui

1.4k Dec 12, 2022
Getting the most out of your hobby servo

ServoProject by Adam Bäckström Getting the most out of your hobby servo Theory The control system of a regular hobby servo looks something like this:

209 Dec 20, 2022
Just a small test with lists in cython

Test for lists in cython Algorithm create a list of 10^4 lists each with 10^4 floats values (namely: 0.1) - 2 nested for iterate each list and compute

Federico Simonetta 32 Jul 23, 2022
A folder automation made using Watch-dog, it only works in linux for now but I assume, it will be adaptable to mac and PC as well

folder-automation A folder automation made using Watch-dog, it only works in linux for now but I assume, it will be adaptable to mac and PC as well Th

Parag Jyoti Paul 31 May 28, 2021