Web Crawlers for Data Labelling of Malicious Domain Detection & IP Reputation Evaluation

Last update: Nov 05, 2021

Overview

This repository provides two web crawlers. The first labels domain names using the McAfee API (https://www.trustedsource.org/sources/index.pl), and the second labels IP reputation using the TALOS API (https://talosintelligence.com/).

Requirements

BeautifulSoup

Usage

Descriptions of the demonstration code are as follows.

To label the categories of a set of domains, put the domain list in 'data/domain_list.txt' and run 'demo_domain_label.py'. The program labels (1) the category (e.g., Malicious Sites- Parked Domain) and (2) the risk level (e.g., High Risk) of each domain using the McAfee API, and saves the results in 'res/domain_labels.txt'. If the program keeps printing '-Retry-', stop it and wait for a while. When you start it again, it automatically skips the domains that are already labeled and continues with the remaining ones.
To label the reputation of a set of IP addresses, put the IP list in 'data/IP_list.txt' and run 'demo_IP_label.py'. The program labels (1) the email reputation and (2) the web reputation of each IP address, each at one of three levels (Poor, Neutral, or Good), and saves the results in 'res/IP_labels.txt'. If the program keeps printing 'None', stop it and wait for a while. When you start it again, it automatically skips the IPs that are already labeled and continues with the remaining ones.
An example domain name list (with 21,820 effective second-level domains) and an example IP list (with 67,751 IP addresses) are given in 'data/examples/example_domain_list.txt' and 'data/examples/example_IP_list.txt', respectively. The corresponding labeled results are saved in 'res/examples/example_domain_labels.txt' and 'res/examples/example_IP_labels.txt', respectively.
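
The stop-and-resume behaviour described above can be pictured with the following minimal sketch. It is not the code in 'demo_domain_label.py' or 'demo_IP_label.py'. The function names (load_done, label_all), the lookup callable, the max_failures threshold, and the tab-separated result layout are all assumptions made for illustration. The idea is simply to read the results file first, skip anything already in it, append new labels as they arrive, and give up after several consecutive failed lookups so the run can be restarted later.

```python
import os
import tempfile


def load_done(result_path):
    """Return the set of items that already have a line in the results file."""
    done = set()
    if os.path.exists(result_path):
        with open(result_path, encoding="utf-8") as f:
            for line in f:
                line = line.rstrip("\n")
                if line:
                    # Assumed layout: item first, then tab-separated labels.
                    done.add(line.split("\t")[0])
    return done


def label_all(items, result_path, lookup, max_failures=3):
    """Label items not yet in result_path; stop after repeated lookup failures.

    `lookup` is a hypothetical callable that returns a tuple of labels, or
    None when the remote service refuses the query.
    Returns the number of newly labeled items.
    """
    done = load_done(result_path)
    failures = 0
    written = 0
    with open(result_path, "a", encoding="utf-8") as out:
        for item in items:
            if item in done:
                continue  # labeled in an earlier run
            labels = lookup(item)
            if labels is None:
                failures += 1
                if failures >= max_failures:
                    break  # probably throttled; rerun later to resume
                continue
            failures = 0
            out.write("\t".join((item,) + tuple(labels)) + "\n")
            out.flush()  # keep progress on disk if the run is interrupted
            done.add(item)
            written += 1
    return written


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "domain_labels.txt")
    domains = ["a.example", "b.example", "c.example"]
    # Stand-in lookup: pretends the service fails for b.example.
    first = label_all(domains, path,
                      lambda d: None if d == "b.example" else ("Category", "Risk"))
    # Second run: only the item missing from the results file is queried.
    second = label_all(domains, path, lambda d: ("Category", "Risk"))
    print(first, second)
```

Appending and flushing after every item means an interrupted run loses at most the lookup in progress, which is what makes restarting cheap.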

If you have questions regarding this repository, you can contact the author via [[email protected]].
