Python tutorial for implementing Oxylabs' Residential Proxies with AIOHTTP

Overview

Integrating Oxylabs' Residential Proxies with AIOHTTP

Requirements for the Integration

For the integration to work, you'll need the aiohttp library, Python 3.6 or higher, and Residential Proxies.
If you don't have the aiohttp library installed, you can install it with the pip command:

pip install aiohttp

You can get Residential Proxies here: https://oxylabs.io/products/residential-proxy-pool

Proxy Authentication

There are two ways to authenticate proxies with aiohttp.
The first way is to pass the credentials alongside the proxy URL using aiohttp.BasicAuth:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"


async def fetch():
    async with aiohttp.ClientSession() as session:
        proxy_auth = aiohttp.BasicAuth(USER, PASSWORD)
        async with session.get(
                "http://ip.oxylabs.io",
                proxy=f"http://{END_POINT}",
                proxy_auth=proxy_auth,
        ) as resp:
            print(await resp.text())


asyncio.run(fetch())

The second way is to pass the authentication credentials directly in the proxy URL:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"


async def fetch():
    async with aiohttp.ClientSession() as session:
        async with session.get(
                "http://ip.oxylabs.io",
                proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            print(await resp.text())


asyncio.run(fetch())

To use your own proxies, replace the user and pass values with your Oxylabs account credentials.

Testing Proxies

To check whether the proxy is working, visit https://ip.oxylabs.io. If everything is set up correctly, the page will return the IP address of the proxy you're currently using rather than your own.
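You can also run the same check from Python. Below is a minimal sketch, reusing the placeholder credentials from the snippets above, that prints your own IP next to the one seen through the proxy so you can confirm they differ:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"


async def check_proxy():
    async with aiohttp.ClientSession() as session:
        # Direct request: the endpoint echoes back your own IP address.
        async with session.get("http://ip.oxylabs.io") as resp:
            print("Your IP:   ", (await resp.text()).strip())

        # Proxied request: the echoed IP should belong to the proxy instead.
        async with session.get(
                "http://ip.oxylabs.io",
                proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            print("Proxied IP:", (await resp.text()).strip())


asyncio.run(check_proxy())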

Sample Project: Extracting Data From Multiple Pages

To better understand how Residential Proxies can be used for asynchronous data extraction, we wrote a sample project that scrapes product listing data and saves the output to a CSV file. Proxy rotation lets us send multiple requests at once without worrying about CAPTCHAs or getting blocked. This makes the web scraping process fast and efficient: you can extract data from thousands of products in a matter of seconds.

import asyncio
import time
import sys
import os

import aiohttp
import pandas as pd
from bs4 import BeautifulSoup

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

# Generate a list of URLs to scrape.
url_list = [
    f"https://books.toscrape.com/catalogue/category/books_1/page-{page_num}.html"
    for page_num in range(1, 51)
]


async def parse_data(text, results_list):
    soup = BeautifulSoup(text, "lxml")
    for product_data in soup.select("ol.row > li > article.product_pod"):
        data = {
            "title": product_data.select_one("h3 > a")["title"],
            "url": product_data.select_one("h3 > a").get("href")[5:],
            "product_price": product_data.select_one("p.price_color").text,
            "stars": product_data.select_one("p")["class"][1],
        }
        results_list.append(data)  # Fill results_list by reference.
        print(f"Extracted data for a book: {data['title']}")


async def fetch(session, sem, url, results_list):
    async with sem:
        async with session.get(
            url,
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as response:
            await parse_data(await response.text(), results_list)


async def create_jobs(results_list):
    # Limit the script to 4 simultaneous requests.
    sem = asyncio.Semaphore(4)
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(
            *[fetch(session, sem, url, results_list) for url in url_list]
        )


if __name__ == "__main__":
    results = []
    start = time.perf_counter()

    # A different event loop policy must be set on Windows with Python 3.8+.
    # This helps to avoid the "Event loop is closed" error.
    if sys.platform.startswith("win") and sys.version_info >= (3, 8):
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

    try:
        asyncio.run(create_jobs(results))
    except Exception as e:
        print(e)
        print("We broke, but there might still be some results")

    print(
        f"\nTotal of {len(results)} products from {len(url_list)} pages "
        f"gathered in {time.perf_counter() - start:.2f} seconds.",
    )
    df = pd.DataFrame(results)
    df["url"] = df["url"].map(
        lambda x: "".join(["https://books.toscrape.com/catalogue", x])
    )
    filename = "scraped-books.csv"
    df.to_csv(filename, encoding="utf-8-sig", index=False)
    print(f"\nExtracted data can be found at {os.path.join(os.getcwd(), filename)}")

If you want to test the project's script yourself, you'll need to install a few additional packages. To do that, download the requirements.txt file and run the pip command:

pip install -r requirements.txt
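If you don't have the requirements.txt file at hand, the imports in the script above imply a list along these lines (package names only, no pinned versions; treat this as an assumption rather than the exact file contents):

aiohttp
beautifulsoup4
lxml
pandas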

If you're having any trouble integrating proxies with aiohttp and this guide didn't help, feel free to contact Oxylabs customer support at [email protected].
