Python tutorial for implementing Oxylabs' Residential Proxies with AIOHTTP

Overview

Integrating Oxylabs' Residential Proxies with AIOHTTP

Requirements for the Integration

For the integration to work, you'll need the aiohttp library, Python 3.6 or higher, and Residential Proxies.
If you don't have aiohttp installed, you can install it with pip:

pip install aiohttp

You can get Residential Proxies here: https://oxylabs.io/products/residential-proxy-pool

Proxy Authentication

There are two ways to authenticate proxies with aiohttp.
The first is to pass the credentials separately from the proxy URL using aiohttp.BasicAuth:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"


async def fetch():
    async with aiohttp.ClientSession() as session:
        proxy_auth = aiohttp.BasicAuth(USER, PASSWORD)
        async with session.get(
            "http://ip.oxylabs.io",
            proxy=f"http://{END_POINT}",
            proxy_auth=proxy_auth,
        ) as resp:
            print(await resp.text())


asyncio.run(fetch())

The second is to embed the authentication credentials in the proxy URL itself:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"


async def fetch():
    async with aiohttp.ClientSession() as session:
        async with session.get(
            "http://ip.oxylabs.io",
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            print(await resp.text())


asyncio.run(fetch())

To use your own proxies, replace the user and pass values with your Oxylabs account credentials.
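
In a real project, it's best not to hard-code credentials. Here's a minimal sketch that reads them from environment variables instead (the OXYLABS_USER and OXYLABS_PASSWORD names are our own assumption, not an Oxylabs convention):

import os

# Hypothetical variable names - use whatever convention your project follows.
USER = os.environ.get("OXYLABS_USER", "user")
PASSWORD = os.environ.get("OXYLABS_PASSWORD", "pass")
END_POINT = "pr.oxylabs.io:7777"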

Testing Proxies

To see if the proxy is working, try visiting https://ip.oxylabs.io. If everything is set up correctly, it will return the IP address of the proxy you're currently using.
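
For a quick programmatic check, here's a minimal sketch (reusing the placeholder credentials from above) that compares your direct IP with the one returned through the proxy - the two should differ:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"


async def check_ip():
    async with aiohttp.ClientSession() as session:
        # Request without the proxy: returns your own IP address.
        async with session.get("http://ip.oxylabs.io") as resp:
            print("Your IP: ", (await resp.text()).strip())
        # Request through the proxy: returns the proxy's IP address.
        async with session.get(
            "http://ip.oxylabs.io",
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            print("Proxy IP:", (await resp.text()).strip())


asyncio.run(check_ip())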

Sample Project: Extracting Data From Multiple Pages

To better understand how residential proxies can be utilized for asynchronous data extraction, we wrote a sample project that scrapes product listing data and saves the output to a CSV file. Proxy rotation lets us send multiple requests at once without worrying about CAPTCHAs or IP blocks, which makes the web scraping process fast and efficient – you can extract data from thousands of products in a matter of seconds!

import asyncio
import time
import sys
import os

import aiohttp
import pandas as pd
from bs4 import BeautifulSoup

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

# Generate a list of URLs to scrape.
url_list = [
    f"https://books.toscrape.com/catalogue/category/books_1/page-{page_num}.html"
    for page_num in range(1, 51)
]


async def parse_data(text, results_list):
    soup = BeautifulSoup(text, "lxml")
    for product_data in soup.select("ol.row > li > article.product_pod"):
        data = {
            "title": product_data.select_one("h3 > a")["title"],
            # Strip the leading "../.." from the relative link.
            "url": product_data.select_one("h3 > a").get("href")[5:],
            "product_price": product_data.select_one("p.price_color").text,
            # The rating is stored as the second CSS class, e.g. "Three".
            "stars": product_data.select_one("p")["class"][1],
        }
        results_list.append(data)  # Fill results_list by reference.
        print(f"Extracted data for a book: {data['title']}")


async def fetch(session, sem, url, results_list):
    async with sem:
        async with session.get(
            url,
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as response:
            await parse_data(await response.text(), results_list)


async def create_jobs(results_list):
    sem = asyncio.Semaphore(4)
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(
            *[fetch(session, sem, url, results_list) for url in url_list]
        )


if __name__ == "__main__":
    results = []
    start = time.perf_counter()

    # A different EventLoopPolicy must be loaded on Windows with Python 3.8+.
    # This helps to avoid the "Event loop is closed" error.
    if sys.platform.startswith("win") and sys.version_info >= (3, 8):
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

    try:
        asyncio.run(create_jobs(results))
    except Exception as e:
        print(e)
        print("We broke, but there might still be some results")

    print(
        f"\nTotal of {len(results)} products from {len(url_list)} pages "
        f"gathered in {time.perf_counter() - start:.2f} seconds.",
    )
    df = pd.DataFrame(results)
    df["url"] = df["url"].map(
        lambda x: "".join(["https://books.toscrape.com/catalogue", x])
    )
    filename = "scraped-books.csv"
    df.to_csv(filename, encoding="utf-8-sig", index=False)
    print(f"\nExtracted data can be found at {os.path.join(os.getcwd(), filename)}")

If you want to test the project's script yourself, you'll need to install a few additional packages. To do that, download the requirements.txt file and use pip:

pip install -r requirements.txt
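
If you don't have the file at hand, a requirements.txt matching the imports used in the script above would look roughly like this (package names only, with versions left unpinned as an assumption):

aiohttp
pandas
beautifulsoup4
lxml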

If you're having any trouble integrating proxies with aiohttp and this guide didn't help you, feel free to contact Oxylabs customer support at [email protected].
