Python tutorial for integrating Oxylabs' Residential Proxies with AIOHTTP

Overview

Integrating Oxylabs' Residential Proxies with AIOHTTP

Requirements for the Integration

For the integration to work, you'll need Python 3.6 or higher, the aiohttp library, and Oxylabs Residential Proxies.
If you don't have the aiohttp library yet, you can install it with the pip command:

pip install aiohttp

You can get Residential Proxies here: https://oxylabs.io/products/residential-proxy-pool

Proxy Authentication

There are two ways to authenticate proxies with aiohttp.
The first is to pass the credentials together with the proxy URL using aiohttp.BasicAuth:

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"
 
async def fetch():
    async with aiohttp.ClientSession() as session:
        proxy_auth = aiohttp.BasicAuth(USER, PASSWORD)
        async with session.get(
                "http://ip.oxylabs.io",
                proxy=f"http://{END_POINT}",
                proxy_auth=proxy_auth,
        ) as resp:
            print(await resp.text())

The second is to pass the authentication credentials directly in the proxy URL:

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

async def fetch():
    async with aiohttp.ClientSession() as session:
        async with session.get(
                "http://ip.oxylabs.io", 
                proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp: 
            print(await resp.text())

To use your own proxies, replace the user and pass values with your Oxylabs account credentials.
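Note that both snippets only define the fetch coroutine. To actually execute it, hand it to the event loop, for example:

import asyncio

asyncio.run(fetch())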

Testing Proxies

To check whether the proxy is working, try visiting https://ip.oxylabs.io. If everything is set up correctly, the page will return the IP address of the proxy you're currently using.
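You can also automate this check. The sketch below (reusing the same placeholder USER, PASSWORD, and END_POINT values as above; check_ip is just an illustrative name) requests ip.oxylabs.io once directly and once through the proxy, so printing two different addresses confirms the proxy is in use:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"


async def check_ip():
    async with aiohttp.ClientSession() as session:
        # The first request goes out directly and shows your real IP address.
        async with session.get("http://ip.oxylabs.io") as resp:
            print("Without proxy:", (await resp.text()).strip())
        # The second request is routed through the proxy endpoint.
        async with session.get(
            "http://ip.oxylabs.io",
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            print("With proxy:", (await resp.text()).strip())


asyncio.run(check_ip())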

Sample Project: Extracting Data From Multiple Pages

To better understand how Residential Proxies can be used for asynchronous data extraction, we wrote a sample project that scrapes product listing data and saves the output to a CSV file. Because the proxy rotates the IP address for each request, we can send many requests at once with a much lower risk of hitting CAPTCHAs or getting blocked. This makes the web scraping process fast and efficient, letting you extract data from thousands of products in a matter of seconds.

import asyncio
import time
import sys
import os

import aiohttp
import pandas as pd
from bs4 import BeautifulSoup

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

# Generate a list of URLs to scrape.
url_list = [
    f"https://books.toscrape.com/catalogue/category/books_1/page-{page_num}.html"
    for page_num in range(1, 51)
]


async def parse_data(text, results_list):
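    # Parse one listing page with BeautifulSoup and collect a dict per book.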
    soup = BeautifulSoup(text, "lxml")
    for product_data in soup.select("ol.row > li > article.product_pod"):
        data = {
            "title": product_data.select_one("h3 > a")["title"],
            "url": product_data.select_one("h3 > a").get("href")[5:],
            "product_price": product_data.select_one("p.price_color").text,
            "stars": product_data.select_one("p")["class"][1],
        }
        results_list.append(data)  # Fill results_list by reference.
        print(f"Extracted data for a book: {data['title']}")


async def fetch(session, sem, url, results_list):
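    # The semaphore caps how many downloads may run at the same time.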
    async with sem:
        async with session.get(
            url,
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as response:
            await parse_data(await response.text(), results_list)


async def create_jobs(results_list):
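    # Allow at most four requests to be in flight at any moment.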
    sem = asyncio.Semaphore(4)
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(
            *[fetch(session, sem, url, results_list) for url in url_list]
        )


if __name__ == "__main__":
    results = []
    start = time.perf_counter()

    # A different EventLoopPolicy must be set on Windows with Python 3.8+.
    # This helps to avoid the "Event loop is closed" error.
    if sys.platform.startswith("win") and sys.version_info >= (3, 8):
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

    try:
        asyncio.run(create_jobs(results))
    except Exception as e:
        print(e)
        print("We broke, but there might still be some results")

    print(
        f"\nTotal of {len(results)} products from {len(url_list)} pages "
        f"gathered in {time.perf_counter() - start:.2f} seconds.",
    )
    df = pd.DataFrame(results)
    df["url"] = df["url"].map(
        lambda x: "".join(["https://books.toscrape.com/catalogue", x])
    )
    filename = "scraped-books.csv"
    df.to_csv(filename, encoding="utf-8-sig", index=False)
    print(f"\nExtracted data can be found at {os.path.join(os.getcwd(), filename)}")

If you want to test the project's script yourself, you'll need to install a few additional packages. To do that, simply download the requirements.txt file and use the pip command:

pip install -r requirements.txt
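If you don't have the requirements.txt file at hand, a minimal equivalent can be inferred from the script's imports (these package names are our assumption, not the file's verbatim contents):

aiohttp
beautifulsoup4
lxml
pandas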

If you're having any trouble integrating proxies with aiohttp and this guide didn't help you, feel free to contact Oxylabs customer support at [email protected].
