Python tutorial for implementing Oxylabs' Residential Proxies with AIOHTTP

Overview

Integrating Oxylabs' Residential Proxies with AIOHTTP

Requirements for the Integration

For the integration to work, you'll need the aiohttp library, Python 3.6 or higher, and Residential Proxies.
If you don't have the aiohttp library installed, you can add it with pip:

pip install aiohttp

You can get Residential Proxies here: https://oxylabs.io/products/residential-proxy-pool

Proxy Authentication

There are two ways to authenticate proxies with aiohttp.
The first is to pass the credentials separately from the proxy URL using aiohttp.BasicAuth:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"


async def fetch():
    async with aiohttp.ClientSession() as session:
        # Keep the credentials separate from the proxy URL.
        proxy_auth = aiohttp.BasicAuth(USER, PASSWORD)
        async with session.get(
            "http://ip.oxylabs.io",
            proxy=f"http://{END_POINT}",
            proxy_auth=proxy_auth,
        ) as resp:
            print(await resp.text())


asyncio.run(fetch())

The second is to embed the authentication credentials directly in the proxy URL:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"


async def fetch():
    async with aiohttp.ClientSession() as session:
        async with session.get(
            "http://ip.oxylabs.io",
            # Credentials are embedded in the proxy URL itself.
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            print(await resp.text())


asyncio.run(fetch())

To use your own proxies, replace the user and pass values with your Oxylabs account credentials.
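
If you'd rather not hard-code credentials in the script, one common pattern is to read them from environment variables. A minimal sketch, assuming hypothetical variable names (OXYLABS_USER, OXYLABS_PASSWORD, and OXYLABS_ENDPOINT are not part of the original guide):

import os

# Hypothetical pattern, not part of the original guide: read credentials
# from environment variables, falling back to the placeholder values.
USER = os.environ.get("OXYLABS_USER", "user")
PASSWORD = os.environ.get("OXYLABS_PASSWORD", "pass")
END_POINT = os.environ.get("OXYLABS_ENDPOINT", "pr.oxylabs.io:7777")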

Testing Proxies

To see whether the proxy is working, try visiting https://ip.oxylabs.io. If everything is set up correctly, it will return the IP address of the proxy you're currently using.
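
As a quick sanity check from Python, here's a minimal sketch (reusing the placeholder credentials from the examples above) that requests ip.oxylabs.io once directly and once through the proxy, so you can confirm that the two addresses differ:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"


async def check():
    async with aiohttp.ClientSession() as session:
        # The first request goes out directly, without the proxy.
        async with session.get("http://ip.oxylabs.io") as resp:
            print("Your IP: ", (await resp.text()).strip())
        # The second request is routed through the residential proxy.
        async with session.get(
            "http://ip.oxylabs.io",
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            print("Proxy IP:", (await resp.text()).strip())


asyncio.run(check())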

Sample Project: Extracting Data From Multiple Pages

To better understand how residential proxies can be utilized for asynchronous data extraction, we wrote a sample project that scrapes product listing data and saves the output to a CSV file. Proxy rotation lets us send multiple requests at once with a much lower risk of hitting CAPTCHAs or getting blocked. This makes the web scraping process fast and efficient: you can extract data from thousands of products in a matter of seconds.

import asyncio
import time
import sys
import os

import aiohttp
import pandas as pd
from bs4 import BeautifulSoup

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

# Generate a list of URLs to scrape.
url_list = [
    f"https://books.toscrape.com/catalogue/category/books_1/page-{page_num}.html"
    for page_num in range(1, 51)
]


async def parse_data(text, results_list):
    soup = BeautifulSoup(text, "lxml")
    for product_data in soup.select("ol.row > li > article.product_pod"):
        data = {
            "title": product_data.select_one("h3 > a")["title"],
            # Trim the leading "../.." from the relative href.
            "url": product_data.select_one("h3 > a").get("href")[5:],
            "product_price": product_data.select_one("p.price_color").text,
            # The star rating is stored as a CSS class, e.g. "star-rating Three".
            "stars": product_data.select_one("p")["class"][1],
        }
        results_list.append(data)  # Fill results_list by reference.
        print(f"Extracted data for a book: {data['title']}")


async def fetch(session, sem, url, results_list):
    async with sem:
        async with session.get(
            url,
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as response:
            await parse_data(await response.text(), results_list)


async def create_jobs(results_list):
    sem = asyncio.Semaphore(4)
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(
            *[fetch(session, sem, url, results_list) for url in url_list]
        )


if __name__ == "__main__":
    results = []
    start = time.perf_counter()

    # A different event loop policy must be set on Windows with
    # Python 3.8+ to avoid the "Event loop is closed" error.
    if sys.platform.startswith("win") and sys.version_info >= (3, 8):
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

    try:
        asyncio.run(create_jobs(results))
    except Exception as e:
        print(e)
        print("We broke, but there might still be some results")

    print(
        f"\nTotal of {len(results)} products from {len(url_list)} pages "
        f"gathered in {time.perf_counter() - start:.2f} seconds.",
    )
    df = pd.DataFrame(results)
    df["url"] = df["url"].map(
        lambda x: "".join(["https://books.toscrape.com/catalogue", x])
    )
    filename = "scraped-books.csv"
    df.to_csv(filename, encoding="utf-8-sig", index=False)
    print(f"\nExtracted data can be found at {os.path.join(os.getcwd(), filename)}")

If you want to test the project's script yourself, you'll need to install a few additional packages. To do that, download the requirements.txt file and use pip:

pip install -r requirements.txt
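
The requirements.txt file isn't reproduced here, but based on the imports in the script above, the run needs at least the following packages (exact versions are an assumption on our part):

pip install aiohttp pandas beautifulsoup4 lxml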

If you're having any trouble integrating proxies with aiohttp and this guide didn't help, feel free to contact Oxylabs customer support at [email protected].
