Make Selenium work on Github Actions

Overview

Make Selenium work on Github Actions

Scraping with BeautifulSoup on GitHub Actions is easy-peasy. But what about Selenium?? After you jump through some hoops it's just as simple.

How to use this

I think you can just fork this repository and get to work on scraper.py.

Otherwise, just steal scraper.py and .github/workflows/scrape.yml and you'll be good to go.

Saving CSV files

If you want every scrape to update a CSV or something like this, you'll need to edit your scrape.yml to commit to the repo after the scraper is done running.

Take a look at the bottom of autoscraper-history to see how to do that. It should be one more step, something like this:

      - name: Commit and push if content changed
        run: |-
          git config user.name "Automated"
          git config user.email "[email protected]"
          git add -A
          timestamp=$(date -u)
          git commit -m "Latest data: ${timestamp}" || exit 0
          git push

I'm pretty sure I lifted that code 100% from Simmon Willison.

Scheduling

Right now the yaml is set to only scrape when you click the "Run workflow" button in the Actions tab, but you can add something like

  schedule:
    - cron: '0 * * * *'

To make it run the first minute of every hour.

How this all works

To make Selenium + GitHub Actions work, this repo does a few magic things. In a normal world, you start Chrome like this:

from selenium import webdriver

driver = webdriver.Chrome()

But we.... do things a little differently. Let me walk you through the changes!

webdriver-manager to manage the webdriver

You can add a special action to set up Chromedriver but I feel it's honestly easier to use webdriver-manager. It magically picks out the right version of chromedriver for you.

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager

driver = webdriver.Chrome(ChromeDriverManager().install())

It works on your local machine, too!

Chromium, not Chrome

When you're running GitHub Actions, it's probably on a nice little Ubuntu Linux machine. In those situations, you install software using apt. Since you can't install Chrome with apt, you'll install Chromium instead, the open-source version of Chrome. Works the same, just opens a little differently.

As a result, our chromedriver install code changed a little more:

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from webdriver_manager.utils import ChromeType

driver_path = ChromeDriverManager(chrome_type=ChromeType.CHROMIUM).install()
driver = webdriver.Chrome(driver_path)

This also means we need to add a line that does apt-get install -y chromium-browser to our scrape.yml)

Headless Chromium

Since we don't have a monitor plugged into GitHub Actions, we can't actually see what's going on in the browser. In the olden days you had to construct some odd technical fake screen, but these days you just run in headless mode!

from selenium import webdriver 
from selenium.webdriver.chrome.options import Options

chrome_options = Options()
chrome_options.add_argument("--headless")
driver = webdriver.Chrome(options=chrome_options)

Other options and more management of things

To combine the webdriver-manager driver path and the headless Chrome option, there are a lot of hoops to jump through. During the research process a lot of extra chrome options poked up, so I thought hey, let's just add all those, too.

from selenium import webdriver
from webdriver_manager.chrome import ChromeDriverManager
from webdriver_manager.utils import ChromeType
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

chrome_service = Service(ChromeDriverManager(chrome_type=ChromeType.CHROMIUM).install())

chrome_options = Options()
options = [
    "--headless",
    "--disable-gpu",
    "--window-size=1920,1200",
    "--ignore-certificate-errors",
    "--disable-extensions",
    "--no-sandbox",
    "--disable-dev-shm-usage"
]
for option in options:
    chrome_options.add_argument(option)

driver = webdriver.Chrome(service=chrome_service, options=chrome_options)

The biggest change here beyond all those options is the Service thing. Apparently just giving it the path to chromdriver isn't good enough? Who knows, I just do what works.

Owner
Jonathan Soma
baby data journo wrangler @ledeprogram + @littlecolumns, cat wrangler @cat-republic
Jonathan Soma
bulk upload files to libgen.lc (Selenium script)

LibgenBulkUpload bulk upload files to http://libgen.lc/librarian.php (Selenium script) Usage ./upload.py to_upload uploaded rejects So title and autho

8 Jul 07, 2022
How to Create a YouTube Bot that Increases Views using Python Programming Language

YouTube-Bot-in-Python-Selenium How to Create a YouTube Bot that Increases Views using Python Programming Language. The app is for educational purpose

Edna 14 Jan 03, 2023
WEB PENETRATION TESTING TOOL 💥

N-WEB ADVANCE WEB PENETRATION TESTING TOOL Features 🎭 Admin Panel Finder Admin Scanner Dork Generator Advance Dork Finder Extract Links No Redirect H

56 Dec 23, 2022
Fidelipy - Semi-automated trading on fidelity.com

fidelipy fidelipy is a simple Python 3.7+ library for semi-automated trading on fidelity.com. The scope is limited to the Trade Stocks/ETFs simplified

Darik Harter 8 May 10, 2022
A simple script to login into twitter using Selenium in python.

Quick Talk A simple script to login into twitter using Selenium in python. I was looking for a way to login into twitter using Selenium in python. Sin

Lzy-slh 4 Nov 20, 2022
Redis fixtures and fixture factories for Pytest.

Redis fixtures and fixture factories for Pytest.This is a pytest plugin, that enables you to test your code that relies on a running Redis database. It allows you to specify additional fixtures for R

Clearcode 86 Dec 23, 2022
A single module to link Python ecosystem to the Web

A single module to link Python ecosystem to the Web. Have a quick look at the Gallery first to get convinced ! FAQ For any questions, please use Stack

66 Dec 21, 2022
Show, Edit and Tell: A Framework for Editing Image Captions, CVPR 2020

Show, Edit and Tell: A Framework for Editing Image Captions | arXiv This contains the source code for Show, Edit and Tell: A Framework for Editing Ima

Fawaz Sammani 76 Nov 25, 2022
Compiles python selenium script to be a Window's executable

Problem Statement Setting up a Python project can be frustrating for non-developers. From downloading the right version of python, setting up virtual

Jerry Ng 8 Jan 09, 2023
Lightweight, scriptable browser as a service with an HTTP API

Splash - A javascript rendering service Splash is a javascript rendering service with an HTTP API. It's a lightweight browser with an HTTP API, implem

Scrapinghub 3.8k Jan 03, 2023
A grab-bag of nifty pytest plugins

A goody-bag of nifty plugins for pytest OS Build Coverage Plugin Description Supported OS pytest-server-fixtures Extensible server-running framework w

Man Group 492 Jan 03, 2023
Show surprise when tests are passing

pytest-pikachu pytest-pikachu prints ascii art of Surprised Pikachu when all tests pass. Installation $ pip install pytest-pikachu Usage Pass the --p

Charlie Hornsby 13 Apr 15, 2022
An improbable web debugger through WebSockets

wdb - Web Debugger Description wdb is a full featured web debugger based on a client-server architecture. The wdb server which is responsible of manag

Kozea 1.6k Dec 09, 2022
A pytest plugin, that enables you to test your code that relies on a running PostgreSQL Database

This is a pytest plugin, that enables you to test your code that relies on a running PostgreSQL Database. It allows you to specify fixtures for PostgreSQL process and client.

Clearcode 252 Dec 21, 2022
pytest plugin for distributed testing and loop-on-failures testing modes.

xdist: pytest distributed testing plugin The pytest-xdist plugin extends pytest with some unique test execution modes: test run parallelization: if yo

pytest-dev 1.1k Dec 30, 2022
This project is used to send a screenshot by email of your MyUMons schedule using Selenium python lib (headless mode)

MyUMonsSchedule Use MyUMonsSchedule python script to send a screenshot by email (Gmail) of your MyUMons schedule. If you use it on Windows, take care

Pierre-Louis D'Agostino 6 May 12, 2022
Ab testing - The using AB test to test of difference of conversion rate

Facebook recently introduced a new type of offer that is an alternative to the current type of bidding called maximum bidding he introduced average bidding.

5 Nov 21, 2022
A Library for Working with Sauce Labs

Robotframework - Sauce Labs Plugin This is a plugin for the SeleniumLibrary to help with using Sauce Labs. This library is a plugin extension of the S

joshin4colours 6 Oct 12, 2021
🐍 Material for PyData Global 2021 Presentation: Effective Testing for Machine Learning Projects

Effective Testing for Machine Learning Projects Code for PyData Global 2021 Presentation by @edublancas. Slides available here. The project is develop

Eduardo Blancas 73 Nov 06, 2022
Using openpyxl in Python, performed following task

Python-Automation-with-openpyxl Using openpyxl in Python, performed following tasks on an Excel Sheet containing Product Suppliers along with their pr

1 Apr 06, 2022