Python tutorial for implementing Oxylabs' Residential Proxies with AIOHTTP

Overview

Integrating Oxylabs' Residential Proxies with AIOHTTP

Requirements for the Integration

For the integration to work, you'll need the aiohttp library, Python 3.6 or higher, and Residential Proxies.
If you don't have the aiohttp library installed yet, you can install it using pip:

pip install aiohttp

You can get Residential Proxies here: https://oxylabs.io/products/residential-proxy-pool

Proxy Authentication

There are two ways to authenticate proxies with aiohttp.
The first is to pass the credentials together with the proxy URL using aiohttp.BasicAuth:

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

async def fetch():
    async with aiohttp.ClientSession() as session:
        proxy_auth = aiohttp.BasicAuth(USER, PASSWORD)
        async with session.get(
            "http://ip.oxylabs.io",
            proxy=f"http://{END_POINT}",
            proxy_auth=proxy_auth,
        ) as resp:
            print(await resp.text())

The second is to pass the authentication credentials in the proxy URL itself:

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

async def fetch():
    async with aiohttp.ClientSession() as session:
        async with session.get(
            "http://ip.oxylabs.io",
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            print(await resp.text())
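
In both examples, fetch is a coroutine, so it won't run until it's scheduled on an event loop. A minimal way to execute it (standard asyncio usage, nothing Oxylabs-specific):

import asyncio

# Python 3.7+: create an event loop, run the coroutine, and clean up.
asyncio.run(fetch())

# On Python 3.6, where asyncio.run() isn't available yet:
# loop = asyncio.get_event_loop()
# loop.run_until_complete(fetch())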

To use your own proxies, replace the user and pass values with your Oxylabs account credentials.
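
Also, avoid committing real credentials to source control: for anything beyond a quick test, it's safer to read them from the environment. A minimal sketch; the OXYLABS_USERNAME and OXYLABS_PASSWORD variable names are our own choice, not an Oxylabs convention:

import os

# Read the proxy credentials from environment variables instead of
# hardcoding them in the script (variable names are illustrative).
USER = os.environ["OXYLABS_USERNAME"]
PASSWORD = os.environ["OXYLABS_PASSWORD"]
END_POINT = "pr.oxylabs.io:7777"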

Testing Proxies

To check that the proxy is working, visit https://ip.oxylabs.io. If everything is set up correctly, the page will return the IP address of the proxy you're currently using.
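
The same check can be scripted. A minimal self-contained sketch using the credentials from above; it prints the exit IP that ip.oxylabs.io sees:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

async def check_ip():
    async with aiohttp.ClientSession() as session:
        # The response body is the IP address the target site sees.
        async with session.get(
            "http://ip.oxylabs.io",
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            print(await resp.text())

asyncio.run(check_ip())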

Sample Project: Extracting Data From Multiple Pages

To better understand how residential proxies can be used for asynchronous data extraction, we wrote a sample project that scrapes product listing data and saves the output to a CSV file. Proxy rotation lets us send multiple requests at once without worrying about CAPTCHAs or IP blocks, which makes the scraping process fast and efficient: you can extract data from thousands of products in a matter of seconds.

import asyncio
import time
import sys
import os

import aiohttp
import pandas as pd
from bs4 import BeautifulSoup

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

# Generate a list of URLs to scrape.
url_list = [
    f"https://books.toscrape.com/catalogue/category/books_1/page-{page_num}.html"
    for page_num in range(1, 51)
]


async def parse_data(text, results_list):
    soup = BeautifulSoup(text, "lxml")
    for product_data in soup.select("ol.row > li > article.product_pod"):
        data = {
            "title": product_data.select_one("h3 > a")["title"],
            # Strip the leading "../.." from the relative link; the base
            # URL is prepended after all pages are scraped.
            "url": product_data.select_one("h3 > a").get("href")[5:],
            "product_price": product_data.select_one("p.price_color").text,
            # The star rating is stored as the second CSS class,
            # e.g. "star-rating Three".
            "stars": product_data.select_one("p")["class"][1],
        }
        results_list.append(data)  # Fill results_list by reference.
        print(f"Extracted data for a book: {data['title']}")


async def fetch(session, sem, url, results_list):
    async with sem:
        async with session.get(
            url,
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as response:
            await parse_data(await response.text(), results_list)


async def create_jobs(results_list):
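    # Limit the number of simultaneous requests to four.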
    sem = asyncio.Semaphore(4)
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(
            *[fetch(session, sem, url, results_list) for url in url_list]
        )


if __name__ == "__main__":
    results = []
    start = time.perf_counter()

    # On Windows, Python 3.8+ defaults to the ProactorEventLoop, which can
    # raise a spurious "Event loop is closed" error at shutdown; switch to
    # the selector policy to avoid it.
    if sys.platform.startswith("win") and sys.version_info >= (3, 8):
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

    try:
        asyncio.run(create_jobs(results))
    except Exception as e:
        print(e)
        print("We broke, but there might still be some results")

    print(
        f"\nTotal of {len(results)} products from {len(url_list)} pages "
        f"gathered in {time.perf_counter() - start:.2f} seconds.",
    )
    df = pd.DataFrame(results)
    df["url"] = df["url"].map(
        lambda x: "".join(["https://books.toscrape.com/catalogue", x])
    )
    filename = "scraped-books.csv"
    df.to_csv(filename, encoding="utf-8-sig", index=False)
    print(f"\nExtracted data can be found at {os.path.join(os.getcwd(), filename)}")

If you want to test the project's script yourself, you'll need to install some additional packages. To do that, download the requirements.txt file and use pip:

pip install -r requirements.txt
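
If the file isn't at hand, the script's imports point to the needed packages; a minimal equivalent requirements.txt, reconstructed from the imports rather than copied from the project, would be:

aiohttp
beautifulsoup4
lxml
pandas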

If you run into any trouble integrating proxies with aiohttp and this guide didn't help, feel free to contact Oxylabs customer support at [email protected].
