Python tutorial for implementing Oxylabs' Residential Proxies with AIOHTTP

Overview

Integrating Oxylabs' Residential Proxies with AIOHTTP

Requirements for the Integration

For the integration to work, you'll need Python 3.6 or higher, the aiohttp library, and Residential Proxies.
If you don't have the aiohttp library yet, you can install it with pip:

pip install aiohttp

You can get Residential Proxies here: https://oxylabs.io/products/residential-proxy-pool

Proxy Authentication

There are two ways to authenticate proxies with aiohttp.
The first is to pass the credentials alongside the proxy URL using aiohttp.BasicAuth:

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"
 
async def fetch():
    async with aiohttp.ClientSession() as session:
        proxy_auth = aiohttp.BasicAuth(USER, PASSWORD)
        async with session.get(
                "http://ip.oxylabs.io",
                proxy=f"http://{END_POINT}",
                proxy_auth=proxy_auth,
        ) as resp:
            print(await resp.text())

The second way is to pass the authentication credentials directly in the proxy URL:

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

async def fetch():
    async with aiohttp.ClientSession() as session:
        async with session.get(
                "http://ip.oxylabs.io", 
                proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp: 
            print(await resp.text())
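
Note that both examples only define the fetch() coroutine. To actually run it, hand it to the event loop, for example with asyncio.run (a minimal sketch):

import asyncio

asyncio.run(fetch())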

To use your own proxies, replace the user and pass values with your Oxylabs account credentials.

Testing Proxies

To check whether the proxy is working, try visiting https://ip.oxylabs.io. If everything is set up correctly, it will return the IP address of the proxy you're currently using.
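
For a quick sanity check from Python, a minimal sketch like the one below (reusing the same placeholder USER, PASSWORD, and END_POINT values as above) prints the exit IP seen through the proxy:

import asyncio

import aiohttp

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"


async def check_ip():
    async with aiohttp.ClientSession() as session:
        async with session.get(
                "https://ip.oxylabs.io",
                proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as resp:
            # With a working proxy, this prints the proxy's exit IP, not your own.
            print(await resp.text())


asyncio.run(check_ip())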

Sample Project: Extracting Data From Multiple Pages

To better understand how Residential Proxies can be used for asynchronous data extraction, we wrote a sample project that scrapes product listing data and saves the output to a CSV file. Proxy rotation lets us send multiple requests at once with little risk, meaning we don't need to worry about CAPTCHAs or getting blocked. This makes web scraping fast and efficient: you can extract data from thousands of products in a matter of seconds.

import asyncio
import time
import sys
import os

import aiohttp
import pandas as pd
from bs4 import BeautifulSoup

USER = "user"
PASSWORD = "pass"
END_POINT = "pr.oxylabs.io:7777"

# Generate a list of URLs to scrape.
url_list = [
    f"https://books.toscrape.com/catalogue/category/books_1/page-{page_num}.html"
    for page_num in range(1, 51)
]


async def parse_data(text, results_list):
    soup = BeautifulSoup(text, "lxml")
    for product_data in soup.select("ol.row > li > article.product_pod"):
        data = {
            "title": product_data.select_one("h3 > a")["title"],
            "url": product_data.select_one("h3 > a").get("href")[5:],
            "product_price": product_data.select_one("p.price_color").text,
            "stars": product_data.select_one("p")["class"][1],
        }
        results_list.append(data)  # Fill results_list by reference.
        print(f"Extracted data for a book: {data['title']}")


async def fetch(session, sem, url, results_list):
    async with sem:
        async with session.get(
            url,
            proxy=f"http://{USER}:{PASSWORD}@{END_POINT}",
        ) as response:
            await parse_data(await response.text(), results_list)


async def create_jobs(results_list):
    sem = asyncio.Semaphore(4)
    async with aiohttp.ClientSession() as session:
        await asyncio.gather(
            *[fetch(session, sem, url, results_list) for url in url_list]
        )


if __name__ == "__main__":
    results = []
    start = time.perf_counter()

    # A different event loop policy must be set when running on Windows with Python 3.8+.
    # This helps to avoid the "Event loop is closed" error.
    if sys.platform.startswith("win") and sys.version_info >= (3, 8):
        asyncio.set_event_loop_policy(asyncio.WindowsSelectorEventLoopPolicy())

    try:
        asyncio.run(create_jobs(results))
    except Exception as e:
        print(e)
        print("We broke, but there might still be some results")

    print(
        f"\nTotal of {len(results)} products from {len(url_list)} pages "
        f"gathered in {time.perf_counter() - start:.2f} seconds.",
    )
    df = pd.DataFrame(results)
    df["url"] = df["url"].map(
        lambda x: "".join(["https://books.toscrape.com/catalogue", x])
    )
    filename = "scraped-books.csv"
    df.to_csv(filename, encoding="utf-8-sig", index=False)
    print(f"\nExtracted data can be found at {os.path.join(os.getcwd(), filename)}")

If you want to test the project's script yourself, you'll need to install a few additional packages. To do that, simply download the requirements.txt file and run pip:

pip install -r requirements.txt
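
The requirements.txt file isn't reproduced here, but based on the imports above it should contain at least the following packages:

aiohttp
beautifulsoup4
lxml
pandas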

If you're having any trouble integrating proxies with aiohttp and this guide didn't help you, feel free to contact Oxylabs customer support at [email protected].
