Integration layer between Requests and Selenium for automation of web actions.

Overview

Requestium

Requestium is a Python library that merges the power of Requests, Selenium, and Parsel into a single integrated tool for automating web actions.

The library was created for web automation scripts that rely mostly on Requests but can seamlessly switch to Selenium for the JavaScript-heavy parts of a website, while maintaining the session.

Requestium adds independent improvements to both Requests and Selenium, and every new feature is lazily evaluated, so it's useful even when writing scripts that use only Requests or only Selenium.

Features

  • Enables switching between a Requests Session and a Selenium webdriver while maintaining the current web session.
  • Integrates Parsel's parser into the library, making xpath, css, and regex much cleaner to write.
  • Improves Selenium's handling of dynamically loading elements.
  • Makes cookie handling more flexible in Selenium.
  • Makes clicking elements in Selenium more reliable.
  • Supports Chrome and PhantomJS.

Installation

pip install requestium

If you plan to use the Selenium part of Requestium, you should then download your preferred Selenium webdriver: Chromedriver or PhantomJS.

Usage

First create a session as you would with Requests, and optionally add arguments for the webdriver if you plan to use one.

from requestium import Session, Keys

s = Session(webdriver_path='./chromedriver',
            browser='chrome',
            default_timeout=15,
            webdriver_options={'arguments': ['headless']})

You don't need to parse the response; it is done automatically when calling xpath, css or re.

title = s.get('http://samplesite.com').xpath('//title/text()').extract_first(default='Default Title')

Regular expressions require less boilerplate than with Python's standard re module.

response = s.get('http://samplesite.com/sample_path')

# Extracts the first match
identifier = response.re_first(r'ID_\d\w\d', default='ID_1A1')

# Extracts all matches as a list
users = response.re(r'user_\d\d\d')
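
For comparison, getting the same "first match or default" behaviour from the standard re module takes a small helper. The sketch below is illustrative only: re_first_stdlib is a hypothetical name, the HTML string is made up, and this is not how Requestium or Parsel implement re_first.

```python
import re


def re_first_stdlib(text, pattern, default=None):
    """Return the first match of pattern in text, or default if there is none.

    Hypothetical helper, not part of Requestium or Parsel.
    """
    match = re.search(pattern, text)
    if match is None:
        return default
    # Return the first capture group when the pattern has one, else the whole match
    return match.group(1) if match.groups() else match.group(0)


# Made-up page content standing in for a response body
html = '<li>user_001</li><li>user_042</li><p>ID_7B3</p>'

identifier = re_first_stdlib(html, r'ID_\d\w\d', default='ID_1A1')
missing = re_first_stdlib(html, r'token_\d+', default='no token')
users = re.findall(r'user_\d\d\d', html)
```

With Requestium the match-or-default handling is folded into the re_first call shown above, so no helper of this kind is needed.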

The Session object is just a regular Requests session object, so you can use all of its methods.

s.post('http://www.samplesite.com/sample', data={'field1': 'data1'})
s.proxies.update({'http': 'http://10.11.4.254:3128', 'https': 'https://10.11.4.252:3128'})

And you can switch to using the Selenium webdriver to run any JavaScript code.

s.transfer_session_cookies_to_driver()  # You can maintain the session if needed
s.driver.get('http://www.samplesite.com/sample/process')

The driver object is a Selenium webdriver object, so you can use any of the normal Selenium methods plus the new methods added by Requestium.

s.driver.find_element_by_xpath("//input[@class='user_name']").send_keys('James Bond', Keys.ENTER)

# New method which waits for element to load instead of failing, useful for single page web apps
s.driver.ensure_element_by_xpath("//div[@attribute='button']").click()

Requestium also adds xpath, css, and re methods to the Selenium driver object.

if s.driver.re(r'ID_\d\w\d some_pattern'):
    print('Found it!')

And finally you can switch back to using Requests.

s.transfer_driver_cookies_to_session()
s.post('http://www.samplesite.com/sample2', data={'key1': 'value1'})

Selenium workarounds

Requestium adds several 'ensure' methods to the driver object, as Selenium is known to be very finicky about selecting elements and cookie handling.

Wait for element

The ensure_element_by_ methods wait for the element to be loaded in the browser and return it as soon as it loads. They are named after Selenium's find_element_by_ methods (which immediately raise an exception if they can't find the element).

Requestium can wait for an element to be in any of the following states:

  • present (default)
  • clickable
  • visible
  • invisible (useful for things like waiting for loading... gifs to disappear)

These methods are very useful for single page web apps where the site is dynamically changing its elements. We usually end up completely replacing our find_element_by_ calls with ensure_element_by_ calls as they are more flexible.
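
The difference between failing immediately and waiting can be illustrated without a browser. The following is a minimal sketch of the polling idea, not Requestium's implementation; wait_until, FakePage, and their parameters are hypothetical names made up for this example.

```python
import time


def wait_until(condition, timeout=5.0, poll_interval=0.05):
    """Call condition() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once timeout seconds have passed.
    """
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError('condition not met within {} s'.format(timeout))
        time.sleep(poll_interval)


class FakePage:
    """Stand-in for a page whose element only appears after a few lookups."""

    def __init__(self, ready_after):
        self.lookups = 0
        self.ready_after = ready_after

    def find(self):
        self.lookups += 1
        return 'element' if self.lookups >= self.ready_after else None


page = FakePage(ready_after=3)
element = wait_until(page.find, timeout=2.0)
```

A wait for the "invisible" state is the same loop with the condition inverted: keep polling until the lookup stops finding the element.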

Elements you get using these methods have the new ensure_click method which makes the click less prone to failure. This helps with getting through a lot of the problems with Selenium clicking.

s.driver.ensure_element_by_xpath("//li[@class='b1']", state='clickable', timeout=5).ensure_click()

# === We also added these methods, named in accordance with Selenium's API design ===
# ensure_element_by_id
# ensure_element_by_name
# ensure_element_by_link_text
# ensure_element_by_partial_link_text
# ensure_element_by_tag_name
# ensure_element_by_class_name
# ensure_element_by_css_selector

Add cookie

The ensure_add_cookie method makes adding cookies much more robust. Selenium needs the browser to be at the cookie's domain before it can add the cookie, and this method offers several workarounds for that. If the browser is not at the cookie's domain, it GETs the domain before adding the cookie. It also allows you to override the domain before adding the cookie, which avoids making this GET. The domain can be overridden to '', which sets the cookie's domain to whatever domain the driver is currently at.

If it can't add the cookie, it tries to add it with a less restrictive domain (e.g. home.site.com -> site.com) before failing.

cookie = {"domain": "www.site.com",
          "secure": False,
          "value": "sd2451dgd13",
          "expiry": 1516824855.759154,
          "path": "/",
          "httpOnly": True,
          "name": "sessionid"}
s.driver.ensure_add_cookie(cookie, override_domain='')
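
The fallback to a less restrictive domain can be pictured as trying progressively shorter domains. This is a sketch of the idea under simplifying assumptions, not Requestium's actual code: fallback_domains is a hypothetical helper, and it stops at two labels without consulting the public suffix list, so it would mishandle domains such as site.co.uk.

```python
def fallback_domains(domain):
    """List domain followed by progressively less restrictive parent domains.

    Hypothetical helper: stops once only two labels remain (e.g. site.com).
    """
    labels = domain.lstrip('.').split('.')
    candidates = []
    while True:
        candidates.append('.'.join(labels))
        if len(labels) <= 2:
            break
        # Drop the leftmost label to get the parent domain
        labels = labels[1:]
    return candidates


candidates = fallback_domains('home.site.com')
```

Each candidate would be tried in order, and the cookie is reported as failed only after the last one is rejected.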

Considerations

New features are lazily evaluated, meaning:

  • The Selenium webdriver process is only started if you call the driver object. So if you don't need to use the webdriver, you could use the library with no overhead. Very useful if you just want to use the library for its integration with Parsel.
  • Parsing of the responses is only done if you call the xpath, css, or re methods of the response. So again there is no overhead if you don't need to use this feature.
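
Lazy evaluation of this kind is commonly implemented with a property that builds the expensive object on first access. The sketch below shows the pattern in plain Python (3.8 or later); LazySession and FakeDriver are hypothetical stand-ins, not Requestium's classes.

```python
from functools import cached_property


class FakeDriver:
    """Stand-in for an expensive resource such as a browser process."""

    started = 0  # counts how many drivers have been created

    def __init__(self):
        FakeDriver.started += 1


class LazySession:
    """Creates its driver only when the driver attribute is first read."""

    @cached_property
    def driver(self):
        return FakeDriver()


s = LazySession()
started_before = FakeDriver.started  # no driver has been created yet
first = s.driver                     # first access creates the driver
second = s.driver                    # later accesses reuse the same object
```

A script that never touches the driver attribute never pays the cost of creating it.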

A byproduct of this is that the Selenium webdriver can be used simply as a tool to ease the development of regular Requests code. You can start writing your script using just the Requests session and, at the last step of the script (the one you are currently working on), transfer the session to the Chrome webdriver. A Chrome process then starts on your machine and acts as a real-time "visor" for the last step of your code. You can see what state your session is currently in, inspect it with Chrome's excellent inspection tools, and decide what the next step for your session object should be. This is very useful for trying code in an IPython interpreter and seeing how the site reacts in real time.

When transfer_driver_cookies_to_session is called, Requestium automatically updates your Requests session user-agent to match that of the browser used in Selenium. This doesn't happen when running Requests without having switched from a Selenium session first though. So if you just want to run Requests but want it to use your browser's user agent instead of the default one (which sites love to block), just run:

s.copy_user_agent_from_driver()

Take into account that doing this will launch a browser process.

Note: The Selenium Chrome webdriver doesn't support automatic transfer of proxies from the Session to the Webdriver at the moment. The PhantomJS driver does though.

Comparison with Requests + Selenium + lxml

Below is a silly working example of a script that runs on Reddit. We'll then show how it compares to the same script written with Requests + Selenium + lxml instead of Requestium.

Using Requestium

from requestium import Session, Keys

# If you want requestium to type your username in the browser for you, write it in here:
reddit_user_name = ''

s = Session('./chromedriver', browser='chrome', default_timeout=15)
s.driver.get('http://reddit.com')
s.driver.find_element_by_xpath("//a[@href='https://www.reddit.com/login']").click()

print('Waiting for elements to load...')
s.driver.ensure_element_by_class_name("desktop-onboarding-sign-up__form-toggler",
                                      state='visible').click()

if reddit_user_name:
    s.driver.ensure_element_by_id('user_login').send_keys(reddit_user_name)
    s.driver.ensure_element_by_id('passwd_login').send_keys(Keys.BACKSPACE)
print('Please log-in in the chrome browser')

s.driver.ensure_element_by_class_name("desktop-onboarding__title", timeout=60, state='invisible')
print('Thanks!')

if not reddit_user_name:
    reddit_user_name = s.driver.xpath("//span[@class='user']//text()").extract_first()

if reddit_user_name:
    s.transfer_driver_cookies_to_session()
    response = s.get("https://www.reddit.com/user/{}/".format(reddit_user_name))
    cmnt_karma = response.xpath("//span[@class='karma comment-karma']//text()").extract_first()
    reddit_golds_given = response.re_first(r"(\d+) gildings given out")
    print("Comment karma: {}".format(cmnt_karma))
    print("Reddit golds given: {}".format(reddit_golds_given))
else:
    print("Couldn't get user name")

Using Requests + Selenium + lxml

import re
from lxml import etree
from requests import Session
from selenium import webdriver
from selenium.common.exceptions import TimeoutException
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

# If you want the script to type your username in the browser for you, write it in here:
reddit_user_name = ''

driver = webdriver.Chrome('./chromedriver')
driver.get('http://reddit.com')
driver.find_element_by_xpath("//a[@href='https://www.reddit.com/login']").click()

print('Waiting for elements to load...')
WebDriverWait(driver, 15).until(
    EC.visibility_of_element_located((By.CLASS_NAME, "desktop-onboarding-sign-up__form-toggler"))
).click()

if reddit_user_name:
    WebDriverWait(driver, 15).until(
        EC.presence_of_element_located((By.ID, 'user_login'))
    ).send_keys(reddit_user_name)
    driver.find_element_by_id('passwd_login').send_keys(Keys.BACKSPACE)
print('Please log-in in the chrome browser')

try:
    WebDriverWait(driver, 3).until(
        EC.presence_of_element_located((By.CLASS_NAME, "desktop-onboarding__title"))
    )
except TimeoutException:
    pass
WebDriverWait(driver, 60).until(
    EC.invisibility_of_element_located((By.CLASS_NAME, "desktop-onboarding__title"))
)
print('Thanks!')

if not reddit_user_name:
    tree = etree.HTML(driver.page_source)
    try:
        reddit_user_name = tree.xpath("//span[@class='user']//text()")[0]
    except IndexError:
        reddit_user_name = None

if reddit_user_name:
    s = Session()
    # Reddit will think we are a bot if we have the wrong user agent
    selenium_user_agent = driver.execute_script("return navigator.userAgent;")
    s.headers.update({"user-agent": selenium_user_agent})
    for cookie in driver.get_cookies():
        s.cookies.set(cookie['name'], cookie['value'], domain=cookie['domain'])
    response = s.get("https://www.reddit.com/user/{}/".format(reddit_user_name))
    try:
        cmnt_karma = etree.HTML(response.content).xpath(
            "//span[@class='karma comment-karma']//text()")[0]
    except IndexError:
        cmnt_karma = None
    match = re.search(r"(\d+) gildings given out", response.text)
    if match:
        reddit_golds_given = match.group(1)
    else:
        reddit_golds_given = None
    print("Comment karma: {}".format(cmnt_karma))
    print("Reddit golds given: {}".format(reddit_golds_given))
else:
    print("Couldn't get user name")

Similar Projects

This project intends to be a drop-in replacement for Requests' Session object, with added functionality. If what you need is instead a drop-in replacement for a Selenium webdriver that also has some of Requests' functionality, Selenium-Requests does just that.

License

Copyright © 2018, Tryolabs. Released under the BSD 3-Clause.
