CloudProxy hides your scraper's IP behind the cloud




About The Project

The purpose of CloudProxy is to hide your scraper's IP behind the cloud. It lets you spin up a pool of proxies on popular cloud providers with just an API token. No configuration is needed.

CloudProxy exposes an API with the IPs and credentials of the provisioned proxies.

Providers supported:


  • Google Cloud
  • Azure
  • Scaleway
  • Vultr

Inspired by

This project was inspired by Scrapoxy, though that project no longer seems actively maintained.

The primary advantage of CloudProxy over Scrapoxy is that CloudProxy only requires an API token from a cloud provider. CloudProxy automatically deploys and configures the proxy on the cloud instances without the user needing to preconfigure or copy an image.

Please always scrape nicely, respectfully and do not slam servers.

Getting Started

To get a local copy up and running follow these simple steps.


All you need is:

  • Docker


Environment variables:


USERNAME - set the username for the forward proxy.

PASSWORD - set the password for the forward proxy.


AGE_LIMIT - set the age limit for your forward proxies in seconds. Once the age limit is reached, the proxy is replaced. A value of 0 disables the feature. Default value: 0.

See the individual provider pages, listed in the Providers supported section above, for the environment variables each provider requires.
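
To make the AGE_LIMIT rule concrete, here is a minimal Python sketch of the behaviour described above. The function name `is_expired` and its parameters are hypothetical and are not taken from the CloudProxy source; the sketch only illustrates that a proxy is due for replacement once its age reaches the limit, and that a limit of 0 disables the check.

```python
from datetime import datetime, timedelta, timezone

def is_expired(created_at, age_limit_seconds, now=None):
    """Return True if a proxy created at `created_at` is due for replacement.

    Hypothetical helper: an age limit of 0 disables the feature.
    """
    if age_limit_seconds <= 0:
        return False
    now = now or datetime.now(timezone.utc)
    # The proxy is replaced once its age reaches the configured limit.
    return (now - created_at) >= timedelta(seconds=age_limit_seconds)
```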

Docker (recommended)

For example:

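    docker run -e USERNAME='xxx' -e PASSWORD='xxx' -e DIGITALOCEAN_ENABLED=True \
        -e DIGITALOCEAN_ACCESS_TOKEN='xxx' -it -p 8000:8000 laffin/cloudproxy:latest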

It is recommended to use a Docker image pinned to a version tag, e.g. laffin/cloudproxy:0.3.0-beta; see releases for the latest version.


CloudProxy exposes an API on localhost:8000. Your application can use the API below to retrieve the proxy URLs, including credentials, for the deployed proxy servers, and then route its requests through them.

The logic to cycle through IPs for proxying will need to be in your application, for example:

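import random
import requests

# Return a random proxy URL (with credentials) from the CloudProxy API
def random_proxy():
    ips = requests.get("http://localhost:8000").json()
    return random.choice(ips["ips"])

# Pick a proxy for each scheme, then pass the mapping to requests.
# The target URL is empty here; replace it with the page you want to fetch.
proxies = {"http": random_proxy(), "https": random_proxy()}
my_request = requests.get("", proxies=proxies)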
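
If you would rather cycle through the proxies in order than pick one at random, round-robin rotation is one option. The sketch below is an illustration and not part of CloudProxy: `ProxyRotator` is a hypothetical helper that takes the list returned under the `ips` key and hands out proxies in turn.

```python
import itertools

class ProxyRotator:
    """Hand out proxy URLs in round-robin order (hypothetical helper)."""

    def __init__(self, proxy_urls):
        if not proxy_urls:
            raise ValueError("proxy list is empty")
        # cycle() repeats the list endlessly in its original order.
        self._cycle = itertools.cycle(list(proxy_urls))

    def next_proxies(self):
        """Return a mapping suitable for requests.get(..., proxies=...)."""
        proxy = next(self._cycle)
        return {"http": proxy, "https": proxy}
```

In an application you would build the rotator from `requests.get("http://localhost:8000").json()["ips"]` and rebuild it periodically, since CloudProxy replaces proxies over time.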

CloudProxy UI


You can manage CloudProxy via an API and a UI. You can access the UI at http://localhost:8000/ui.

You can scale up and down your proxies and remove them for each provider via the UI.

CloudProxy API

List available proxy servers



curl -X 'GET' 'http://localhost:8000/' -H 'accept: application/json'


{"ips":["http://username:password:", "http://username:password:"]}

List random proxy server


GET /random

curl -X 'GET' 'http://localhost:8000/random' -H 'accept: application/json'



Remove proxy server


DELETE /destroy

curl -X 'DELETE' 'http://localhost:8000/destroy?ip_address=' -H 'accept: application/json'


["Proxy to be destroyed"]
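
The same call can be prepared from Python. The sketch below only builds the request URL, so that the query-string encoding is explicit; `BASE_URL` and `destroy_url` are names chosen for this example, and any IP address you pass in is your own proxy's address.

```python
from urllib.parse import urlencode

BASE_URL = "http://localhost:8000"  # assumed default from the Docker example

def destroy_url(ip_address, base_url=BASE_URL):
    """Build the DELETE /destroy URL for one proxy IP (hypothetical helper)."""
    return f"{base_url}/destroy?{urlencode({'ip_address': ip_address})}"
```

Passing the result to `requests.delete(...)` would then send the request.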

Get provider


GET /providers/digitalocean

curl -X 'GET' 'http://localhost:8000/providers/digitalocean' -H 'accept: application/json'


    {
      "ips": [...],
      "scaling": {
        "min_scaling": 2,
        "max_scaling": 2
      }
    }

Update provider


PATCH /providers/digitalocean

curl -X 'PATCH' 'http://localhost:8000/providers/digitalocean?min_scaling=5&max_scaling=5' -H 'accept: application/json'


    {
      "ips": [...],
      "scaling": {
        "min_scaling": 5,
        "max_scaling": 5
      }
    }
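
For scripted scaling changes, the PATCH URL can be assembled in Python. This is a sketch under assumptions: `scaling_url` is a hypothetical helper, and the sanity check on the two values is this example's own choice rather than documented CloudProxy behaviour.

```python
from urllib.parse import urlencode

def scaling_url(provider, min_scaling, max_scaling, base_url="http://localhost:8000"):
    """Build the PATCH URL that sets a provider's scaling (hypothetical helper)."""
    # Assumption of this sketch: reject obviously inconsistent values early.
    if min_scaling < 0 or max_scaling < min_scaling:
        raise ValueError("expected 0 <= min_scaling <= max_scaling")
    query = urlencode({"min_scaling": min_scaling, "max_scaling": max_scaling})
    return f"{base_url}/providers/{provider}?{query}"
```

The resulting URL can be passed to `requests.patch(...)`.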

CloudProxy runs on a 30-second schedule. On each run it checks whether the minimum scaling has been met and, if not, deploys the required number of proxies. New proxies appear in the IP list once they are deployed and ready to be used.
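
The check described above amounts to a small reconciliation step. The following is a minimal sketch of that idea, not CloudProxy's actual scheduler code; `proxies_to_deploy` is a hypothetical name.

```python
def proxies_to_deploy(current_count, min_scaling):
    """Return how many proxies must be deployed to reach the minimum scaling."""
    # Nothing is deployed when the minimum is already met or exceeded.
    return max(0, min_scaling - current_count)
```

For example, with 2 proxies running and a minimum scaling of 5, the next scheduled run would need to deploy 3 more.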


The project is at an early alpha stage with limited features. In the future, more providers will be supported, autoscaling will be implemented, and a richer API will allow blacklisting and recycling of proxies.

See the open issues for a list of proposed features (and known issues).


This method of scraping via cloud providers has limitations. Many websites have anti-bot protections and blacklists in place, which can limit the effectiveness of CloudProxy. Many websites block datacenter IPs, and IPs may already be tarnished due to IP recycling. Rotating the CloudProxy proxies regularly may improve results. The best solution for scraping is a proxy service providing residential IPs, which are less likely to be blocked but are much more expensive. CloudProxy is a much cheaper alternative for scraping sites that neither block datacenter IPs nor have advanced anti-bot protection. This is a point frequently made when people share this project, which is why I am including it in the README.


Contributions are what make the open source community such an amazing place to learn, inspire, and create. Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request


Distributed under the MIT License. See LICENSE for more information.


Christian Laffin - @christianlaffin - [email protected]

Project Link:


  • Maybe add feature for multiple digital ocean accounts?

    I'm loving this, it's perfect for what I need. Maybe a feature to consider in the future is to be able to use multiple digital ocean accounts as there's a limit of 10 for new users?

    Thanks for releasing this!

    opened by sblfc 11
  • Digitalocean Droplet created but not showing in the API

    I launched the container as per the documentation, but I'm still not seeing anything in the UI or the API. I can see that it created the droplets and keeps removing and adding new droplets every few minutes.

    What am I missing?

    export DIGITALOCEAN_SIZE="s-1vcpu-512mb-10gb"
    export DIGITALOCEAN_REGION="fra1"
    export AGE_LIMIT="1200"
    export USERNAME="XXX"
    export PASSWORD='XXXX'
    docker run -e USERNAME=$USERNAME \
        -e AGE_LIMIT=$AGE_LIMIT \
        -it -p 8000:8000 laffin/cloudproxy:latest
    opened by mrahmadt 5
  • Allowed IP as alternative auth

    I'm working with another project that is also called CloudProxy. It is used to pass the Cloudflare challenge, which it does by driving Chrome with Puppeteer. It starts a Chrome browser with the proxy passed as a parameter, but that parameter does not accept a username and password. This was my motivation to fork this project and implement an alternative to username and password authentication. I used the allowed_ip parameter from tinyproxy.


    • Alternative authentication using ALLOWED_IP
    • Boolean environment variables are now parsed as real booleans instead of strings. Instead of ENABLE_AWS=True you should now write ENABLE_AWS=true, and the code now uses (if ENABLE_AWS:) instead of (if ENABLE_AWS='True')
    • To check whether a proxy is working, we now load a page through the proxy instead of requesting the proxy URL directly. For some reason the proxy server responds with 403 Forbidden when visited directly (for example when it is using allowed_ip), even though it works as a proxy. Increased the timeout to 6 to let the proxy do its job.
    • Added optional parameter AWS_KEY_NAME to pass through a key pair for logging in to the EC2 instance.
    • Added optional parameter PROXY_STEALTH=true (default false) to put the proxy in stealth mode, so that it does not send proxy headers on its requests to sites.
    • Update and docs/

    Since this is the first time I have done something with Python and Docker, it is probably not really optimized and I guess there is room for improvement. I tested it on DigitalOcean, Hetzner and AWS and it works flawlessly at the moment.

    I published a docker image: bubexel/cloudprroxy:latest

    Thank you for your work on it!


    opened by serk7 5
  • Unable to authenticate through DigitalOcean

    Expected Behavior

    Running the command:

    docker run -e USERNAME='xxx' -e PASSWORD='xx' -e DIGITALOCEAN_ENABLED=True -e DIGITALOCEAN_ACCESS_TOKEN='xxx' -it -p 8000:8000 laffin/cloudproxy:latest

    Username & Password being alphanumeric. Token validated by using:

    doctl auth init -t "xxx"

    I get the following error:

    File "/usr/local/lib/python3.8/site-packages/digitalocean/", line 233, in get_data raise DataReadError(msg) │ └ 'Unable to authenticate you' └ <class 'digitalocean.DataReadError'>

    I think my bug is identical to George Roscoe's. I've never had an issue running this before. I ran this a few weeks ago and it worked completely fine

    opened by HazzaWaltham123 4
  • Requests to AWS starts throwing [Errno 113] No route to host

    I've run into an issue that I can't seem to pinpoint so I'm not sure if it's due to CloudProxy (TinyProxy).

    I've set up CloudProxy to run in Docker with 15 AWS Spot instances. Then I've written a Python Flask script that fetches the IPs from CloudProxy once every minute, accepts a URL (GET request), and returns the HTML page fetched through one of these AWS proxies. The reason I'm doing it this way is that my original application, which uses the HTML data, doesn't allow me to set the user agent, so I need to go through a proxy that allows this.

    This is the fetch line in the Flask application (proxy):

    proxies = {"http": proxy, "https": proxy}
    resp = requests.get(url, headers=headers, proxies=proxies, timeout=5, allow_redirects=True, stream=True)

    It can run fine for hours until suddenly all my AWS instances start dying. I went through the CloudProxy code and identified that the restarts were due to the ALIVE checks failing. So I disabled that code and also added some exception handling in my own application. That stopped the instances from dying, but it did not fix the original issue.

    It turned out that the code line above (requests.get) suddenly starts throwing the following error:

    HTTPConnectionPool(host='X.X.X.X', port=8899): Max retries exceeded with url: (Caused by ProxyError('Cannot connect to proxy.', NewConnectionError('<urllib3.connection.HTTPConnection object at 0x7f39cd7d7550>: Failed to establish a new connection: [Errno 113] No route to host')))

    I've masked the IP and url for privacy reasons.

    So basically, my scripts run for hours with 1-2 requests every second until all the requests suddenly start throwing the exception above, to the point that it bogs down my entire WiFi. The internet on all of my computers almost stops responding. The only solution is to stop the requests, give it a few minutes and then resume as if nothing had happened.

    After fixing the Spot instances dying, my second idea was that there's some kind of TCP limit in AWS. So I upgraded my instances from Nano to Micro with no apparent improvement. I considered it being a Docker issue but I only fetch ALIVE IPs once every minute so I can't see how that would be limited in any way. I don't see it being a TinyProxy limit since my 1-2 requests are spread out over 15 different AWS instances.

    Do you know if there is any AWS limit I'm hitting or have you experienced anything similar with CloudProxy?

    opened by secretobserve 3
  • Add Google Cloud provider

    This Pull Request adds Google Cloud provider support for cloudproxy and solves #35.

    Known issues:

    • Proxy removal from cloudproxy-ui does not work when "Remove" button is clicked. (At least didn't work for me)

    Future work:

    • Given that this code is based on the AWS code, it will be probably a good idea at some point in the future to refactor all the providers' code to reduce the amount of duplicated code and logic.
    opened by dusancz 3
  • trouble authenticating proxy / documentation of authentication for AWS

    Expected Behavior

    I followed the docs here and here. I created environment variables, all alphanumeric, for USERNAME and PASSWORD. I created an IAM role as instructed, and I can see the EC2 instances. When using the toy example these are correctly filled in (i.e., instead of being changeme:[email protected] it is user:[email protected]).

    Actual Behavior

    When connecting to the proxy I get the error message: The administrator of this proxy has not configured it to service requests from you.

    This is almost certainly from my misunderstanding the docs (as I haven't worked with AWS before). Are we meant to set the username and password somewhere in AWS too? I also tried creating a password for the IAM user and using that, but that isn't allowed to be alphanumeric. I'd also be happy to write some documentation for entry/beginners like myself once I get it up and running

    opened by EthanTheMathmo 2
  • Can't change the default zone of GCP Proxies

    docker run -e USERNAME='xxx' -e PASSWORD='xxx' -e GCP_ENABLED=True -e GCP_PROJECT='xxx' -e GCP_SIZE='e2-micro' -e GCP_ZONE='asia-northeast3-a' -e GCP_SERVICE_ACCOUNT_KEY='xxx' -it -p 8000:8000 laffin/cloudproxy:latest

    I tried this command, but it always creates instances in the default US zone and I can't switch to a different zone. Can you fix this?

    Thank you

    opened by LuongPhuHoa 2
  • Cancel spot requests when associated instances are terminated

    When deleting proxies filled by one-time spot requests, the instances are terminated, but the spot request itself is not cancelled, leaving the door open to being filled again in the future when not associated with cloudproxy. This PR deletes the spot request if it exists in the delete_proxy() function. Related to #42, but unclear if applicable to persistent spot requests.

    opened by henryzxu 2
  • Ghost proxies when destroying using spot in AWS

    Expected Behavior

    The proxies are destroyed and remain gone.

    Actual Behavior

    The proxies are destroyed, then some unknown time later are restarted but without the cloudproxy tag since they are started by something other than cloudproxy. They are fully functional proxies though, just missing the tag.

    Steps to Reproduce the Problem

    1. Start cloudproxy
    2. Increase servers to 30, wait.
    3. Decrease servers to 5, wait.


    • Version: 0.5.2


    My guess is that when you destroy the instances you also have to remove the spot request somehow, but I don't quite understand why.

    opened by xanrag 2
  • Add SSL config for HTTPS

    I think this will partly solve

    The cert.pem and key.pem can be generated with mkcert

    Tbh I have no idea what I'm doing, any suggestions would be much appreciated. I tested this locally and I have the docker image running on https now.

    Edit: I suppose mkcert is only good for https in local environments. My goal is to deploy Cloudproxy to an AWS ec2 instance and make it accessible via HTTPS, and ensure communication between the cloudproxy server and proxy servers are done via HTTPS as well.

    opened by jcohenho 1
  • Multiple regions & Historical reporting

    Thank you very much for this great script. It is simple and can be a replacement for Scrapoxy.

    Is it possible to define multiple regions? For example, I want to have 3-5 regions with DigitalOcean, and CloudProxy would randomly create VMs in them.

    My second question: is there any log file or report that I can use to check how many VMs have been created, their duration, and the period? That way I can compare my hosting cost, say on a weekly basis, and decide which cloud provider is better for me.

    opened by mrahmadt 1
  • Support multiple client applications sharing single proxy cloud

    One thing I've always missed in Scrapoxy is ability to support multiple clients. Would be great to see it implemented here.

    In Scrapoxy you could set (min, required, max) scaling and it works well as long as there is just one client application trying to use the proxy cloud. But as soon as you want to share same cloud between multiple applications, you run into a problem that they conflict with each other. E.g. when one application has finished crawling, it can't just downscale the cloud as it's still being used by another application etc.

    Ideally that requires a centralized logic that manages requests from multiple client applications. It would need to track the most recently requested scaling for each client, and combine them. A very simple logic could be to just take max of all min/required/max parameters across clients and use that as the scaling. That way, the cloud would only downscale when the last client sends the downscale request. You can imagine logic becoming more complex though, e.g. when one client asks to destroy an instance that the other client still uses etc.

    As an extra feature, it should ideally handle stale clients - if a client has not communicated with it for a while, it should disregard its requirements, to avoid leaving dangling instances when client unexpectedly disappears.

    opened by nirvana-msu 0
  • Not all environment vars passed to container being used

    Reported: passing -e DIGITALOCEAN_MIN_SCALING=0 -e DIGITALOCEAN_MAX_SCALING=0 has no effect; it always starts at 2.

    Originally posted by @sblfc in

    opened by claffin 0
  • What providers should be added next?

    At the moment CloudProxy supports AWS and DigitalOcean, which is enough for my own personal use case. I'm keen to hear if there is interest in other providers being supported, please share here and I will prioritise. Otherwise, new features will be prioritised for now.

    opened by claffin 6
  • v0.6.5-beta(Sep 27, 2022)

  • v0.6.4-beta(Jul 5, 2022)

  • v0.6.3-beta(Feb 13, 2022)

  • v0.6.1-beta(Jul 19, 2021)

  • v0.6.0-beta(Jul 4, 2021)

  • v0.5.2-beta(Jul 1, 2021)

  • v0.5.1-beta(Jul 1, 2021)

  • v0.5.0-beta(Jun 28, 2021)

  • v0.4.0-beta(Jun 15, 2021)

    • #24 Bugfix AWS delete only checking the first instance
    • Change retries to 1 and set a timeout of 10s on fetch_ip
    • Rewrote the check_alive function to be much simpler, the fetch_ip check was not viable at 20+ proxies. It took too long.
    • Updated ip_list not to return IP:s slated for destruction
    • Added option to restart AWS proxies, much faster than destroy/create and fetches a new IP. Not supported for DO.
    • Opened up port 22 on proxies for debugging, future enhancement is to only allow web control. (Use the EC2_INSTANCE_CONNECT filter for the service parameter to get the IP address ranges in the EC2 Instance Connect subset. )
    • Enhanced status messages a bit in the check_alive for AWS
    • Moved check_delete/stop to before provision so it removes and then immediately provisions a new one instead of waiting 20s for the next tick
    • Changed the proxy software to tinyproxy directly on the image instead of using docker. Much faster deployment and less CPU intensive so should work better with t2.nano
    • Updated the settings checks to compare true/false as a string since it seems to be what it is getting, earlier a value of False in the config would read as true.
    • Updated environ get to match the doc (ie SCALING instead of SCALE)
    • Added botocore to requirements.txt, which seemed to be missing.

    @xanrag thank you for all these fixes.

  • v0.3.3-beta(May 10, 2021)

  • v0.3.2-beta(May 7, 2021)

  • v0.3.1-beta(May 6, 2021)

  • v0.3.0-beta(Apr 28, 2021)

  • v0.2.2-beta(Apr 27, 2021)

    • Updated error handling
    • Added retry to check alive
    • Added CORS and delete_queue now set
    • Schedule with providers every 20 seconds now and removed auth from IP
    • Fixed failing tests
  • v0.2.1-beta(Apr 26, 2021)

  • v0.2.0-beta(Apr 22, 2021)

  • v0.1.1-alpha(Apr 19, 2021)

  • v0.1.0-alpha(Apr 19, 2021)

Christian Laffin