Python tutorial for implementing Oxylabs' Residential Proxies with AIOHTTP

Overview

Integrating Oxylabs' Residential Proxies with AIOHTTP

Requirements for the Integration

For the integration to work, you'll need the aiohttp library, Python 3.6 or higher, and Oxylabs Residential Proxies.
If you don't have the aiohttp library installed yet, you can install it with pip:

pip install aiohttp

You can get Residential Proxies here: https://oxylabs.io/products/residential-proxy-pool

Proxy Authentication

There are two ways to authenticate proxies with aiohttp.
The first is to pass the credentials separately from the proxy URL using aiohttp.BasicAuth:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"


async def fetch():
    async with aiohttp.ClientSession() as session:
        proxy_auth = aiohttp.BasicAuth(USER, PASSWORD)
        async with session.get(
            "http://ip.oxylabs.io",
            proxy=f"http://{END_POINT}",
            proxy_auth=proxy_auth,
        ) as resp:
            print(await resp.text())


asyncio.run(fetch())

The second is to pass the authentication credentials in the proxy URL itself:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"


async def fetch():
    async with aiohttp.ClientSession() as session:
        async with session.get(
            "http://ip.oxylabs.io",
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            print(await resp.text())


asyncio.run(fetch())

To use your own proxies, replace the user and pass values with your Oxylabs account credentials.

Testing Proxies

To check whether the proxy is working, try visiting https://ip.oxylabs.io. If everything is set up correctly, it will return the IP address of the proxy you're currently using.
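
For a quick programmatic check, here is a minimal sketch that reuses the placeholder credentials from the examples above (replace them with your own). It requests http://ip.oxylabs.io once directly and once through the proxy; the two printed addresses should differ if the proxy is in use.

import asyncio

import aiohttp

USER = "user"          # your Oxylabs username
PASSWORD = "pass"      # your Oxylabs password
END_POINT = "pr.oxylabs.io:7777"


async def check_ip():
    async with aiohttp.ClientSession() as session:
        # Direct request - prints your own IP address.
        async with session.get("http://ip.oxylabs.io") as resp:
            print("Without proxy:", (await resp.text()).strip())
        # Proxied request - should print a different, proxy IP address.
        async with session.get(
            "http://ip.oxylabs.io",
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            print("With proxy:   ", (await resp.text()).strip())


asyncio.run(check_ip())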

Sample Project: Extracting Data From Multiple Pages

To better understand how Residential Proxies can be used for asynchronous data extraction, we wrote a sample project that scrapes product listing data and saves the output to a CSV file. Proxy rotation allows us to send multiple requests at once while greatly reducing the risk of CAPTCHAs or IP blocks, which makes the web scraping process fast and efficient: data from dozens of pages can be gathered in a matter of seconds.

li > article.product_pod"): data = { "title": product_data.select_one("h3 > a")["title"], "url": product_data.select_one("h3 > a").get("href")[5:], "product_price": product_data.select_one("p.price_color").text, "stars": product_data.select_one("p")["class"][1], } results_list.append(data) # Fill results_list by reference. print(f"Extracted data for a book: {data['title']}") async def fetch(session, sem, url, results_list): async with sem: async with session.get( url, proxy=f"http://{USER}:{PASSWORD}@{END_POINT}", ) as response: await parse_data(await response.text(), results_list) async def create_jobs(results_list): sem = asyncio.Semaphore(4) async with aiohttp.ClientSession() as session: await asyncio.gather( *[fetch(session, sem, url, results_list) for url in url_list] ) if __name__ == "__main__": results = [] start = time.perf_counter() # Different EventLoopPolicy must be loaded if you're using Windows OS. # This helps to avoid "Event Loop is closed" error. if sys.platform.startswith("win") and sys.version_info.minor >= 8: asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy()) try: asyncio.run(create_jobs(results)) except Exception as e: print(e) print("We broke, but there might still be some results") print( f"\nTotal of {len(results)} products from {len(url_list)} pages " f"gathered in {time.perf_counter() - start:.2f} seconds.", ) df = pd.DataFrame(results) df["url"] = df["url"].map( lambda x: "".join(["https://books.toscrape.com/catalogue", x]) ) filename = "scraped-books.csv" df.to_csv(filename, encoding="utf-8-sig", index=False) print(f"\nExtracted data can be found at {os.path.join(os.getcwd(), filename)}") ">
import asyncio
import time
import sys
import os

import aiohttp
import pandas as pd
from bs4 import BeautifulSoup

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

# Generate a list of URLs to scrape.
url_list = [
    f"https://books.toscrape.com/catalogue/category/books_1/page-{page_num}.html"
    for page_num in range(1, 51)
]


async def parse_data(text, results_list):
    soup = BeautifulSoup(text, "lxml")
    for product_data in soup.select("ol.row > li > article.product_pod"):
        data = {
            "title": product_data.select_one("h3 > a")["title"],
            "url": product_data.select_one("h3 > a").get("href")[5:],
            "product_price": product_data.select_one("p.price_color").text,
            "stars": product_data.select_one("p")["class"][1],
        }
        results_list.append(data)  # Fill results_list by reference.
        print(f"Extracted data for a book: {data['title']}")


async def fetch(session, sem, url, results_list):
    async with sem:
        async with session.get(
            url,
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as response:
            await parse_data(await response.text(), results_list)


async def create_jobs(results_list):
    sem = asyncio.Semaphore(4)
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(
            *[fetch(session, sem, url, results_list) for url in url_list]
        )


if __name__ == "__main__":
    results = []
    start = time.perf_counter()

    # On Windows with Python 3.8+, a different event loop policy must be used
    # to avoid the "Event loop is closed" error.
    if sys.platform.startswith("win") and sys.version_info >= (3, 8):
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

    try:
        asyncio.run(create_jobs(results))
    except Exception as e:
        print(e)
        print("We broke, but there might still be some results")

    print(
        f"\nTotal of {len(results)} products from {len(url_list)} pages "
        f"gathered in {time.perf_counter() - start:.2f} seconds.",
    )
    df = pd.DataFrame(results)
    df["url"] = df["url"].map(
        lambda x: "".join(["https://books.toscrape.com/catalogue", x])
    )
    filename = "scraped-books.csv"
    df.to_csv(filename, encoding="utf-8-sig", index=False)
    print(f"\nExtracted data can be found at {os.path.join(os.getcwd(), filename)}")

If you want to test the project's script yourself, you'll need to install some additional packages. To do that, simply download the requirements.txt file and use pip:

pip install -r requirements.txt
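
If you don't have the requirements.txt file at hand, the third-party packages imported by the script above are aiohttp, beautifulsoup4, lxml, and pandas, so a minimal, unpinned requirements.txt would look like this (the file shipped with the project may pin specific versions):

aiohttp
beautifulsoup4
lxml
pandas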

If you're having any trouble integrating proxies with aiohttp and this guide didn't help, feel free to contact Oxylabs customer support at [email protected].
