Python tutorial for integrating Oxylabs' Residential Proxies with AIOHTTP

Overview

Integrating Oxylabs' Residential Proxies with AIOHTTP

Requirements for the Integration

For the integration to work, you'll need Python 3.6 or higher, the aiohttp library, and Oxylabs Residential Proxies.
If you don't have the aiohttp library yet, you can install it with pip:

pip install aiohttp

You can get Residential Proxies here: https://oxylabs.io/products/residential-proxy-pool

Proxy Authentication

There are two ways to authenticate proxies with aiohttp.
The first is to pass the credentials alongside the proxy URL using aiohttp.BasicAuth:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"


async def fetch():
    async with aiohttp.ClientSession() as session:
        proxy_auth = aiohttp.BasicAuth(USER, PASSWORD)
        async with session.get(
            "http://ip.oxylabs.io",
            proxy=f"http://{END_POINT}",
            proxy_auth=proxy_auth,
        ) as resp:
            print(await resp.text())


asyncio.run(fetch())

The second is to pass the authentication credentials directly in the proxy URL:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"


async def fetch():
    async with aiohttp.ClientSession() as session:
        async with session.get(
            "http://ip.oxylabs.io",
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            print(await resp.text())


asyncio.run(fetch())

To use your own proxies, replace the user and pass values with your Oxylabs account credentials.
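One caveat with the second approach: because the credentials become part of a URL, any reserved characters in them (for example "@" or "/") must be percent-encoded, or the proxy URL will be parsed incorrectly. A minimal sketch using the standard library (the password shown is a made-up example):

from urllib.parse import quote

USER = "user"
PASSWORD = "p@ss/word"  # hypothetical password containing reserved characters

# safe="" makes quote() encode every reserved character, including "/".
proxy_url = f"http://{quote(USER, safe='')}:{quote(PASSWORD, safe='')}@pr.oxylabs.io:7777"
print(proxy_url)  # http://user:p%40ss%2Fword@pr.oxylabs.io:7777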

Testing Proxies

To check that the proxy is working, try visiting https://ip.oxylabs.io. If everything is set up correctly, the page will return the IP address of the proxy you're currently using.
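You can also run the same check from Python by requesting the endpoint twice, once directly and once through the proxy, and comparing the two addresses. A minimal sketch, reusing the placeholders from above:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"


async def check():
    async with aiohttp.ClientSession() as session:
        # Your own IP address, fetched without the proxy.
        async with session.get("http://ip.oxylabs.io") as resp:
            print("Direct: ", await resp.text())
        # The proxy's IP address; it should differ from the one above.
        async with session.get(
            "http://ip.oxylabs.io",
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            print("Proxied:", await resp.text())


asyncio.run(check())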

Sample Project: Extracting Data From Multiple Pages

To better understand how residential proxies can be used for asynchronous data extraction, we wrote a sample project that scrapes product listing data and saves the output to a CSV file. Because the proxy rotates IP addresses, we can send many requests at once without worrying about CAPTCHAs or IP blocks, which makes the scraping process fast and efficient: the script below gathers data for a thousand books in a matter of seconds.

import asyncio
import time
import sys
import os

import aiohttp
import pandas as pd
from bs4 import BeautifulSoup

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

# Generate a list of URLs to scrape.
url_list = [
    f"https://books.toscrape.com/catalogue/category/books_1/page-{page_num}.html"
    for page_num in range(1, 51)
]


async def parse_data(text, results_list):
    soup = BeautifulSoup(text, "lxml")
    for product_data in soup.select("ol.row > li > article.product_pod"):
        data = {
            "title": product_data.select_one("h3 > a")["title"],
            # Trim the relative "../.." prefix from the href;
            # the absolute base URL is prepended before saving.
            "url": product_data.select_one("h3 > a").get("href")[5:],
            "product_price": product_data.select_one("p.price_color").text,
            # The rating <p> has classes like "star-rating Three";
            # the second class name is the star count.
            "stars": product_data.select_one("p")["class"][1],
        }
        results_list.append(data)  # Fill results_list by reference.
        print(f"Extracted data for a book: {data['title']}")


async def fetch(session, sem, url, results_list):
    async with sem:
        async with session.get(
            url,
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as response:
            await parse_data(await response.text(), results_list)


async def create_jobs(results_list):
    # Allow at most four requests in flight at any one time.
    sem = asyncio.Semaphore(4)
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(
            *[fetch(session, sem, url, results_list) for url in url_list]
        )


if __name__ == "__main__":
    results = []
    start = time.perf_counter()

    # On Windows with Python 3.8+, the default Proactor event loop can raise
    # "Event loop is closed" on exit, so switch to the selector policy.
    if sys.platform.startswith("win") and sys.version_info >= (3, 8):
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

    try:
        asyncio.run(create_jobs(results))
    except Exception as e:
        print(e)
        print("We broke, but there might still be some results")

    print(
        f"\nTotal of {len(results)} products from {len(url_list)} pages "
        f"gathered in {time.perf_counter() - start:.2f} seconds.",
    )
    df = pd.DataFrame(results)
    df["url"] = df["url"].map(
        lambda x: "".join(["https://books.toscrape.com/catalogue", x])
    )
    filename = "scraped-books.csv"
    df.to_csv(filename, encoding="utf-8-sig", index=False)
    print(f"\nExtracted data can be found at {os.path.join(os.getcwd(), filename)}")
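Note that asyncio.gather() propagates the first exception it hits, so a single failed request ends the whole batch, and the try/except above merely saves whatever was collected up to that point. If you would rather skip failed pages and keep scraping, one option is to catch errors inside fetch() itself. A sketch of that variant (the error handling here is our own addition, not part of the original project):

async def fetch(session, sem, url, results_list):
    async with sem:
        try:
            async with session.get(
                url,
                proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
            ) as response:
                await parse_data(await response.text(), results_list)
        except aiohttp.ClientError as e:
            # Log the failure and move on so the remaining pages still run.
            print(f"Request for {url} failed: {e}")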

If you want to test the project's script yourself, you'll need to install a few additional packages. To do that, simply download the requirements.txt file and run pip:

pip install -r requirements.txt
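The contents of requirements.txt aren't reproduced here, but judging by the script's imports it needs at least the following packages (pin versions as you see fit):

aiohttp
beautifulsoup4
lxml
pandas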

If you're having any trouble integrating proxies with aiohttp and this guide didn't help, feel free to contact Oxylabs customer support at [email protected].
