This Python script can enumerate all URLs present in robots.txt files, and test whether they can be accessed or not.

Overview

Robots.txt tester

With this script, you can enumerate all URLs present in robots.txt files, and test whether you can access them or not.

example

Setup

Clone the repository and install the dependencies :

git clone https://github.com/p0dalirius/robotstester
cd robotstester
python3 setup.py install

Usage

robotstester -u http://www.example.com/

You can find here a complete list of options :

[~] Robots.txt tester, v1.2.0

usage: robotstester.py [-h] (-u URL | -f URLSFILE) [-v] [-q] [-k] [-L] [-t THREADS] [-p] [-j JSONFILE] [-x PROXY] [-b COOKIES]

This Python script can enumerate all URLs present in robots.txt files, and test whether they can be accessed or not.

optional arguments:
  -h, --help            show this help message and exit
  -u URL, --url URL     URL to the robots.txt to test e.g. https://example.com:port/path
  -f URLSFILE, --urlsfile URLSFILE
                        List of robots.txt urls to test
  -v, --verbose         verbosity level (-v for verbose, -vv for debug)
  -q, --quiet           Show no information at all
  -k, --insecure        Allow insecure server connections when using SSL (default: False)
  -L, --location        Follow redirects (default: False)
  -t THREADS, --threads THREADS
                        Number of threads (default: 5)
  -p, --parsable        Parsable output
  -j JSONFILE, --jsonfile JSONFILE
                        Save results to specified JSON file.
  -x PROXY, --proxy PROXY
                        Specify a proxy to use for requests (e.g., http://localhost:8080)
  -b COOKIES, --cookies COOKIES
                        Specify cookies to use in requests. (e.g., --cookies "cookie1=blah;cookie2=blah")

Contributing

Pull requests are welcome. Feel free to open an issue if you want to add other features.

You might also like...
Import modules and files straight from URLs.

Import Python code from modules straight from the internet.

A python script made for personal use to monitor for sports card restocks on target.com since they are sold out often

TargetProductMonitor A python script made for personal use to monitor for sports card resocks on target.com since they are sold out often. When a rest

My sister is a GR of her class. She had to mark attendance of students from screenshots of teams meeting on an excel sheet. I resolved her problem by reading names from screenshots using PyTesseract and marking them present on the excel using Pandas in Python. It took me 1hr to write the code and it is saving half an hour everyday.
Manipulation OpenAI Gym environments to simulate robots at the STARS lab

liegroups Python implementation of SO2, SE2, SO3, and SE3 matrix Lie groups using numpy or PyTorch. [Documentation] Installation To install, cd into t

Serverless demo showing users how they can capture (and obfuscate) their Lambda payloads in Datadog APM
Serverless demo showing users how they can capture (and obfuscate) their Lambda payloads in Datadog APM

Serverless-capture-lambda-payload-demo Serverless demo showing users how they can capture (and obfuscate) their Lambda payloads in Datadog APM This wi

A one place destination to check whatever is trending on the top social and news websites at present.
A one place destination to check whatever is trending on the top social and news websites at present.

UpTrend A one place destination to check whatever is trending on the top social and news websites at present. Explore the docs » View Demo · Report Bu

 Python requirements.txt Guesser
Python requirements.txt Guesser

Python-Requirements-Guesser ⚠️ This is alpha quality software. Work in progress Attempt to guess requirements.txt modules versions based on Git histor

Birthday program - A program that lookups a birthday txt file and compares to the current date to check for birthdays
Birthday program - A program that lookups a birthday txt file and compares to the current date to check for birthdays

Birthday Program This is a program that lookups a birthday txt file and compares

Write a program that works out whether if a given year is a leap year
Write a program that works out whether if a given year is a leap year

Leap Year 💪 This is a Difficult Challenge 💪 Instructions Write a program that works out whether if a given year is a leap year. A normal year has 36

Comments
  • [Feature]  Add waybackmachine capability

    [Feature] Add waybackmachine capability

    In the past few days I've been experiencing using waybackmachine to enumerate robots.txt endpoints.

    Sometimes robots.txt gets removed and sometimes the removed content can be juicy. Thus the ideia of searching every WBM to look for old robots entries.

    I've implemented a quick and basic script to do a PoC, but I feel like this repo has the power to bring it to the next level since a lot of good features are already done.

    https://gist.github.com/felipecaon/035ad1718c3cae681d2afb03c699795f

    The gist works by getting all the robots.txt entries from WBM, parsing and sending to stdout. The script does not remove dps, just do a basic word removal.

    If I have the time I may be able to open a PR. But if someone wants to takes it further, I would love to see that. The core waybackmachine endpoints to be used are on my gist file.

    opened by felipecaon 0
Releases(1.2)
  • 1.2(Jul 7, 2021)

    Added --parsable option :cat2:

    usage: robotstester.py [-h] (-u URL | -f URLSFILE) [-v] [-q] [-k] [-L] [-t THREADS] [-p] [-j JSONFILE] [-x PROXY] [-b COOKIES]
    
    This Python script can enumerate all URLs present in robots.txt files, and test whether they can be accessed or not.
    
    optional arguments:
      -h, --help            show this help message and exit
      -u URL, --url URL     URL to the robots.txt to test e.g. https://example.com:port/path
      -f URLSFILE, --urlsfile URLSFILE
                            List of robots.txt urls to test
      -v, --verbose         verbosity level (-v for verbose, -vv for debug)
      -q, --quiet           Show no information at all
      -k, --insecure        Allow insecure server connections when using SSL (default: False)
      -L, --location        Follow redirects (default: False)
      -t THREADS, --threads THREADS
                            Number of threads (default: 5)
      -p, --parsable        Parsable output
      -j JSONFILE, --jsonfile JSONFILE
                            Save results to specified JSON file.
      -x PROXY, --proxy PROXY
                            Specify a proxy to use for requests (e.g., http://localhost:8080)
      -b COOKIES, --cookies COOKIES
                            Specify cookies to use in requests. (e.g., --cookies "cookie1=blah;cookie2=blah")
    
    Source code(tar.gz)
    Source code(zip)
  • 1.0(Jul 5, 2021)

    [~] Robots.txt tester, v1.0
    
    usage: robotstester.py [-h] [-u URL | -f URLSFILE] [-v] [-q] [-k] [-L] [-t THREADS] [-j JSONFILE] [-x PROXY] [-b COOKIES]
    
    This Python script can enumerate all URLs present in robots.txt files, and test whether they can be accessed or not.
    
    optional arguments:
      -h, --help            show this help message and exit
      -u URL, --url URL     URL to the robots.txt to test e.g. https://example.com:port/path
      -f URLSFILE, --urlsfile URLSFILE
                            List of robots.txt urls to test
      -v, --verbose         verbosity level (-v for verbose, -vv for debug)
      -q, --quiet           Show no information at all
      -k, --insecure        Allow insecure server connections when using SSL (default: False)
      -L, --location        Follow redirects (default: False)
      -t THREADS, --threads THREADS
                            Number of threads (default: 5)
      -j JSONFILE, --jsonfile JSONFILE
                            Save results to specified JSON file.
      -x PROXY, --proxy PROXY
                            Specify a proxy to use for requests (e.g., http://localhost:8080)
      -b COOKIES, --cookies COOKIES
                            Specify cookies to use in requests. (e.g., --cookies "cookie1=blah;cookie2=blah")
    
    Source code(tar.gz)
    Source code(zip)
Owner
Podalirius
Hacker of everything
Podalirius
Tucan Discord Token Generator - Remastered

TucanGEN-SRC Tucan Discord Token Generator - Remastered Tucan source made better by me. -- idk if it works anymore Includes: hCaptcha Bypass Automatic

Vast 8 Nov 04, 2022
A collection of python exercises to help your learning path!

How to use Step 1: run this command git clone https://github.com/TechPenguineer/Python-Exercises.git Step 2: Run this command cd Python-Exercises You

Tech Penguin 5 Aug 05, 2021
Birthday program - A program that lookups a birthday txt file and compares to the current date to check for birthdays

Birthday Program This is a program that lookups a birthday txt file and compares

Daquiver 4 Feb 02, 2022
Waydroid is a container-based approach to boot a full Android system on a regular GNU/Linux system like Ubuntu.

Waydroid is a container-based approach to boot a full Android system on a regular GNU/Linux system like Ubuntu.

WayDroid 4.7k Jan 08, 2023
Tool to audit and fix Python project requirements.

Requirement Auditor Utility to revise and updated python requirement files.

Luis Carlos Berrocal 1 Nov 07, 2021
Basic infrastructure for writing scripts in Python

Base Script Python is an excellent language that makes writing scripts very straightforward. Over the course of writing many scripts, we realized that

Deep Compute, LLC 9 Jan 07, 2023
Automation in socks label validation

This is a project for socks card label validation where the socks card is validated comparing with the correct socks card whose coordinates are stored in the database. When the test socks card is com

1 Jan 19, 2022
Inacap - Programa para pasar las notas de inacap a una hoja de cálculo rápidamente.

Inacap Programa en python para obtener varios datos académicos desde inacap y subirlos directamente a una hoja de cálculo. Cómo funciona Primero que n

Gabriel Barrientos 0 Jul 28, 2022
Print 'text color' and 'text format' on Term with Python

term-printer Print 'text color' and 'text format' on Term with Python ※ It may not work depending on the OS and shell used. PIP $ pip install term-pri

ななといつ 10 Nov 12, 2022
Find Transposon Element insertions using long reads (nanopore), by alignment directly. (minimap2)

find_te_ins find_te_ins is designed to find Transposon Element (TE) insertions using long reads (nanopore), by alignment directly. (minimap2) Install

Ming Wang 1 Feb 09, 2022
Convert text with ANSI color codes to HTML or to LaTeX.

Convert text with ANSI color codes to HTML or to LaTeX.

PyContribs 326 Dec 28, 2022
Program Input Data Mahasiswa Oop

PROGRAM INPUT NILAI MAHASISWA MENGGUNAKAN OOP PENGERTIAN OOP object-oriented-programing/OOP adalah paradigma pemrograman berdasarkan konsep "objek", y

Maulana Reza Badrudin 1 Jan 05, 2022
A dead-simple service that notifies you when something goes down.

Totmannschalter Totmannschalter (German for dead man's switch) is a simple service that notifies you when it has not received any message from a servi

1 Dec 20, 2021
A Snakemake workflow for standardised sc/snRNAseq analysis

single_snake_sequencing - sc/snRNAseq Snakemake Workflow A Snakemake workflow for standardised sc/snRNAseq analysis. Every single cell analysis is sli

IMS Bio2Core Facility 1 Nov 02, 2021
TimeWizard - A script that generates every single Time Wizard EDOPRO lflist possible

EDOPRO F&L list generator This project is just a script that generates every sin

Diamond Dude 2 Sep 28, 2022
Pseudometa's dotfiles

pseudometa's dotfiles Table of Contents Why this repository? How this Repository works Special Explanations Got an idea for an improvement? Contact Wh

pseudometa 23 Dec 27, 2022
NORETURN is an esoteric programming language, based around the idea of not going back

NORETURN NORETURN is an esoteric programming language, based around the idea of not going back Concept Program coded in noreturn runs over one array,

1 Dec 15, 2021
Pykeeb - A small Python script that prints out currently connected keyboards

pykeeb 🐍 ⌨️ A small Python script that detects and prints out currently connect

Jordan Duabe 1 May 08, 2022
Stack BOF Protection Bypass Techniques

Stack Buffer Overflow - Protection Bypass Techniques

ommadawn46 18 Dec 28, 2022
Providing a working, flexible, easier and faster installer than the one officially provided by Arch Linux

Purpose The purpose is to bring more people to Arch Linux by providing a working, flexible, easier and faster installer than the one officially provid

André Luís 0 Nov 09, 2022