Python tutorial for integrating Oxylabs' Residential Proxies with AIOHTTP

Overview

Integrating Oxylabs' Residential Proxies with AIOHTTP

Requirements for the Integration

For the integration to work, you'll need Python 3.6 or higher, the aiohttp library, and Oxylabs Residential Proxies.
If you don't have the aiohttp library yet, you can install it with pip:

pip install aiohttp

You can get Residential Proxies here: https://oxylabs.io/products/residential-proxy-pool

Proxy Authentication

There are two ways to authenticate proxies with aiohttp.
The first is to pass the credentials separately from the proxy URL using aiohttp.BasicAuth:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"


async def fetch():
    async with aiohttp.ClientSession() as session:
        proxy_auth = aiohttp.BasicAuth(USER, PASSWORD)
        async with session.get(
            "http://ip.oxylabs.io",
            proxy=f"http://{END_POINT}",
            proxy_auth=proxy_auth,
        ) as resp:
            print(await resp.text())


asyncio.run(fetch())

The second is to pass the authentication credentials directly in the proxy URL:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"


async def fetch():
    async with aiohttp.ClientSession() as session:
        async with session.get(
            "http://ip.oxylabs.io",
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            print(await resp.text())


asyncio.run(fetch())

To use your own proxies, replace the user and pass values with your Oxylabs account credentials.

Testing Proxies

To check whether the proxy is working, try visiting https://ip.oxylabs.io. If everything is set up correctly, the page will return the IP address of the proxy you're currently using.
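You can also run the same check from code. Below is a minimal sketch based on the fetch() examples above (the check_proxy name is ours, not part of the original project); it requests ip.oxylabs.io once directly and once through the proxy, so seeing two different addresses confirms that your traffic is being routed through the proxy:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"


async def check_proxy():
    async with aiohttp.ClientSession() as session:
        # Fetch the page directly to get your own IP address.
        async with session.get("http://ip.oxylabs.io") as resp:
            own_ip = (await resp.text()).strip()
        # Fetch the same page through the proxy.
        async with session.get(
            "http://ip.oxylabs.io",
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            proxy_ip = (await resp.text()).strip()
    print(f"Your IP: {own_ip}")
    print(f"Proxy IP: {proxy_ip}")


asyncio.run(check_proxy())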

Sample Project: Extracting Data From Multiple Pages

To better understand how residential proxies can be used for asynchronous data extraction, we wrote a sample project that scrapes product listing data and saves the output to a CSV file. Proxy rotation allows us to send multiple requests at once with little risk of hitting CAPTCHAs or getting blocked. This makes the web scraping process extremely fast and efficient – you can extract data from thousands of products in a matter of seconds!

import asyncio
import time
import sys
import os

import aiohttp
import pandas as pd
from bs4 import BeautifulSoup

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

# Generate a list of URLs to scrape.
url_list = [
    f"https://books.toscrape.com/catalogue/category/books_1/page-{page_num}.html"
    for page_num in range(1, 51)
]


async def parse_data(text, results_list):
    soup = BeautifulSoup(text, "lxml")
    for product_data in soup.select("ol.row > li > article.product_pod"):
        data = {
            "title": product_data.select_one("h3 > a")["title"],
            "url": product_data.select_one("h3 > a").get("href")[5:],
            "product_price": product_data.select_one("p.price_color").text,
            "stars": product_data.select_one("p")["class"][1],
        }
        results_list.append(data)  # Fill results_list by reference.
        print(f"Extracted data for a book: {data['title']}")


async def fetch(session, sem, url, results_list):
    async with sem:
        async with session.get(
            url,
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as response:
            await parse_data(await response.text(), results_list)


async def create_jobs(results_list):
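    # Limit concurrency to four simultaneous requests to avoid overloading the site.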
    sem = asyncio.Semaphore(4)
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(
            *[fetch(session, sem, url, results_list) for url in url_list]
        )


if __name__ == "__main__":
    results = []
    start = time.perf_counter()

    # On Windows with Python 3.8+, a different event loop policy must be used.
    # This helps to avoid the "Event loop is closed" error on shutdown.
    if sys.platform.startswith("win") and sys.version_info >= (3, 8):
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

    try:
        asyncio.run(create_jobs(results))
    except Exception as e:
        print(e)
        print("We broke, but there might still be some results")

    print(
        f"\nTotal of {len(results)} products from {len(url_list)} pages "
        f"gathered in {time.perf_counter() - start:.2f} seconds.",
    )
    df = pd.DataFrame(results)
    df["url"] = df["url"].map(
        lambda x: "".join(["https://books.toscrape.com/catalogue", x])
    )
    filename = "scraped-books.csv"
    df.to_csv(filename, encoding="utf-8-sig", index=False)
    print(f"\nExtracted data can be found at {os.path.join(os.getcwd(), filename)}")

If you want to test the project's script yourself, you'll need to install a few additional packages. To do that, simply download the requirements.txt file and use pip:

pip install -r requirements.txt
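If you don't have the requirements.txt file at hand, a minimal equivalent covering the packages the script imports would look like this (our reconstruction from the imports above, not the original file):

aiohttp
beautifulsoup4
lxml
pandas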

If you're having trouble integrating proxies with aiohttp and this guide didn't help, feel free to contact Oxylabs customer support at [email protected].
