Python tutorial for implementing Oxylabs' Residential Proxies with AIOHTTP

Overview

This guide covers integrating Oxylabs' Residential Proxies with aiohttp, an asynchronous HTTP client/server framework for Python.

Requirements for the Integration

For the integration to work, you'll need the aiohttp library, Python 3.7 or higher (the examples below use asyncio.run, which was added in 3.7), and Residential Proxies.
If you don't have the aiohttp library yet, you can install it with pip:

pip install aiohttp

You can get Residential Proxies here: https://oxylabs.io/products/residential-proxy-pool

Proxy Authentication

There are two ways to authenticate proxies with aiohttp.
The first is to pass the credentials alongside the proxy URL using aiohttp.BasicAuth:

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"
 
async def fetch():
    async with aiohttp.ClientSession() as session:
        proxy_auth = aiohttp.BasicAuth(USER, PASSWORD)
        async with session.get(
                "http://ip.oxylabs.io",
                proxy=f"http://{END_POINT}",
                proxy_auth=proxy_auth,
        ) as resp:
            print(await resp.text())

The second is to pass the authentication credentials in the proxy URL itself:

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

async def fetch():
    async with aiohttp.ClientSession() as session:
        async with session.get(
                "http://ip.oxylabs.io",
                proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            print(await resp.text())
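
Both examples only define the fetch coroutine; to actually send the request, run it on an event loop:

import asyncio

asyncio.run(fetch())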

To use your own proxies, replace the user and pass values with your Oxylabs account credentials.
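
Rather than hardcoding credentials, you may prefer to read them from environment variables. A minimal sketch (OXYLABS_USER and OXYLABS_PASSWORD are hypothetical variable names of our choosing, not something Oxylabs requires):

import os

# Read the Oxylabs credentials from the environment instead of the source code.
USER = os.environ["OXYLABS_USER"]
PASSWORD = os.environ["OXYLABS_PASSWORD"]
END_POINT = "pr.oxylabs.io:7777"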

Testing Proxies

To check that the proxy is working, try visiting https://ip.oxylabs.io. If everything is set up correctly, the page will return the IP address of the proxy you're currently using rather than your own.
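
You can also run the same check from Python. A short sketch, reusing the credentials from the examples above:

import asyncio
import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

async def check_ip():
    async with aiohttp.ClientSession() as session:
        async with session.get(
                "http://ip.oxylabs.io",
                proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            # Should print the proxy's IP address, not your own.
            print(await resp.text())

asyncio.run(check_ip())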

Sample Project: Extracting Data From Multiple Pages

To better understand how Residential Proxies can be used for asynchronous data extraction, we wrote a sample project that scrapes product listing data and saves the output to a CSV file. Proxy rotation lets us send multiple requests at once with a much lower risk of hitting CAPTCHAs or getting blocked, which makes web scraping fast and efficient – you can extract data from thousands of products in a matter of seconds!

import asyncio
import time
import sys
import os

import aiohttp
import pandas as pd
from bs4 import BeautifulSoup

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

# Generate a list of URLs to scrape.
url_list = [
    f"https://books.toscrape.com/catalogue/category/books_1/page-{page_num}.html"
    for page_num in range(1, 51)
]


async def parse_data(text, results_list):
    soup = BeautifulSoup(text, "lxml")
    for product_data in soup.select("ol.row > li > article.product_pod"):
        data = {
            "title": product_data.select_one("h3 > a")["title"],
            "url": product_data.select_one("h3 > a").get("href")[5:],
            "product_price": product_data.select_one("p.price_color").text,
            "stars": product_data.select_one("p")["class"][1],
        }
        results_list.append(data)  # Fill results_list by reference.
        print(f"Extracted data for a book: {data['title']}")


async def fetch(session, sem, url, results_list):
    async with sem:
        async with session.get(
            url,
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as response:
            await parse_data(await response.text(), results_list)


async def create_jobs(results_list):
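    # Cap concurrency at four in-flight requests at a time.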
    sem = asyncio.Semaphore(4)
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(
            *[fetch(session, sem, url, results_list) for url in url_list]
        )


if __name__ == "__main__":
    results = []
    start = time.perf_counter()

    # A different event loop policy must be set on Windows with Python 3.8+.
    # This avoids an "Event loop is closed" error on exit.
    if sys.platform.startswith("win") and sys.version_info >= (3, 8):
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

    try:
        asyncio.run(create_jobs(results))
    except Exception as e:
        print(e)
        print("We broke, but there might still be some results")

    print(
        f"\nTotal of {len(results)} products from {len(url_list)} pages "
        f"gathered in {time.perf_counter() - start:.2f} seconds.",
    )
    df = pd.DataFrame(results)
    df["url"] = df["url"].map(
        lambda x: "".join(["https://books.toscrape.com/catalogue", x])
    )
    filename = "scraped-books.csv"
    df.to_csv(filename, encoding="utf-8-sig", index=False)
    print(f"\nExtracted data can be found at {os.path.join(os.getcwd(), filename)}")

If you want to test the project's script yourself, you'll need to install a few additional packages. To do that, download the requirements.txt file and run pip:

pip install -r requirements.txt
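
The file itself isn't reproduced here, but judging from the script's imports, it should list at least the following packages (exact version pins are up to you):

aiohttp
pandas
beautifulsoup4
lxml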

If you're having any trouble integrating proxies with aiohttp and this guide didn't help, feel free to contact Oxylabs customer support at [email protected].
