Python tutorial for integrating Oxylabs' Residential Proxies with AIOHTTP

Overview

This tutorial covers integrating Oxylabs' Residential Proxies with aiohttp, an asynchronous HTTP client/server framework for Python.

Requirements for the Integration

For the integration to work, you'll need the aiohttp library, Python 3.6 or higher, and Residential Proxies.
If you don't have the aiohttp library installed, you can add it with pip:

pip install aiohttp

You can get Residential Proxies here: https://oxylabs.io/products/residential-proxy-pool

Proxy Authentication

There are two ways to authenticate proxies with aiohttp.
The first is to pass the credentials separately from the proxy URL using aiohttp.BasicAuth:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

async def fetch():
    async with aiohttp.ClientSession() as session:
        proxy_auth = aiohttp.BasicAuth(USER, PASSWORD)
        async with session.get(
                "http://ip.oxylabs.io",
                proxy=f"http://{END_POINT}",
                proxy_auth=proxy_auth,
        ) as resp:
            print(await resp.text())

asyncio.run(fetch())

The second is to pass the authentication credentials directly in the proxy URL:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

async def fetch():
    async with aiohttp.ClientSession() as session:
        async with session.get(
                "http://ip.oxylabs.io",
                proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            print(await resp.text())

asyncio.run(fetch())

To use your own proxies, replace the user and pass values with your Oxylabs account credentials.

Testing Proxies

To see if the proxy is working, try visiting https://ip.oxylabs.io. If everything is working correctly, it will return the IP address of the proxy you're currently using.
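
As a quick programmatic check, here's a minimal sketch that makes the same request twice, once without the proxy and once through it, using the USER, PASSWORD, and END_POINT placeholders from the snippets above; the two printed addresses should differ if the proxy is working.

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

async def check_ip():
    async with aiohttp.ClientSession() as session:
        # Without a proxy, the endpoint returns your own IP address.
        async with session.get("http://ip.oxylabs.io") as resp:
            print("Your IP: ", (await resp.text()).strip())
        # Routed through the proxy, it returns the proxy's exit IP instead.
        async with session.get(
                "http://ip.oxylabs.io",
                proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            print("Proxy IP:", (await resp.text()).strip())

asyncio.run(check_ip())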

Sample Project: Extracting Data From Multiple Pages

To better understand how residential proxies can be used for asynchronous data extraction, we wrote a sample project that scrapes product listing data and saves the output to a CSV file. Proxy rotation lets us send many requests at once with a much lower risk of hitting CAPTCHAs or IP blocks, which makes the scraping process fast and efficient: the script below gathers data on 1,000 books from 50 listing pages in a matter of seconds.

import asyncio
import time
import sys
import os

import aiohttp
import pandas as pd
from bs4 import BeautifulSoup

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

# Generate a list of URLs to scrape.
url_list = [
    f"https://books.toscrape.com/catalogue/category/books_1/page-{page_num}.html"
    for page_num in range(1, 51)
]


async def parse_data(text, results_list):
    soup = BeautifulSoup(text, "lxml")
    for product_data in soup.select("ol.row > li > article.product_pod"):
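        # The product href is relative to the listing page (e.g. "../../<book>/index.html"),
        # so slicing off the leading "../.." leaves a path that gets prefixed with the
        # catalogue URL later on. The star rating lives in the second CSS class of the
        # rating paragraph (e.g. "star-rating Three").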
        data = {
            "title": product_data.select_one("h3 > a")["title"],
            "url": product_data.select_one("h3 > a").get("href")[5:],
            "product_price": product_data.select_one("p.price_color").text,
            "stars": product_data.select_one("p")["class"][1],
        }
        results_list.append(data)  # Fill results_list by reference.
        print(f"Extracted data for a book: {data['title']}")


async def fetch(session, sem, url, results_list):
    async with sem:
        async with session.get(
            url,
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as response:
            await parse_data(await response.text(), results_list)


async def create_jobs(results_list):
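    # Allow at most four requests to be in flight at the same time.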
    sem = asyncio.Semaphore(4)
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(
            *[fetch(session, sem, url, results_list) for url in url_list]
        )


if __name__ == "__main__":
    results = []
    start = time.perf_counter()

    # On Windows with Python 3.8+, the selector event loop policy must be used
    # to avoid the "Event loop is closed" error on shutdown.
    if sys.platform.startswith("win") and sys.version_info.minor >= 8:
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

    try:
        asyncio.run(create_jobs(results))
    except Exception as e:
        print(e)
        print("We broke, but there might still be some results")

    print(
        f"\nTotal of {len(results)} products from {len(url_list)} pages "
        f"gathered in {time.perf_counter() - start:.2f} seconds.",
    )
    df = pd.DataFrame(results)
    df["url"] = df["url"].map(
        lambda x: "".join(["https://books.toscrape.com/catalogue", x])
    )
    filename = "scraped-books.csv"
    df.to_csv(filename, encoding="utf-8-sig", index=False)
    print(f"\nExtracted data can be found at {os.path.join(os.getcwd(), filename)}")

If you want to test the project's script yourself, you'll need to install a few additional packages. To do that, simply download the requirements.txt file and use pip:

pip install -r requirements.txt
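
Alternatively, since the script above only imports aiohttp, BeautifulSoup (installed as beautifulsoup4, with the lxml parser), and pandas, you can install those packages directly:

pip install aiohttp beautifulsoup4 lxml pandas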

If you're having any trouble integrating proxies with aiohttp and this guide didn't help you, feel free to contact Oxylabs customer support at [email protected].
