a small library for extracting rich content from urls

Last update: Dec 27, 2022

Overview

A small library for extracting rich content from urls.

what does it do?

micawber supplies a few methods for retrieving rich metadata about a variety of links, such as links to YouTube videos. It also provides functions for parsing blocks of text and HTML and replacing links to videos with rich embedded content.

examples

here is a quick example:

import micawber

# load up rules for some default providers, such as youtube and flickr
providers = micawber.bootstrap_basic()

providers.request('http://www.youtube.com/watch?v=54XHDUOHuzU')

# returns the following dictionary:
{
    'author_name': 'pascalbrax',
    'author_url': u'http://www.youtube.com/user/pascalbrax',
    'height': 344,
    'html': u'<iframe width="459" height="344" src="http://www.youtube.com/embed/54XHDUOHuzU?fs=1&feature=oembed" frameborder="0" allowfullscreen></iframe>',
    'provider_name': 'YouTube',
    'provider_url': 'http://www.youtube.com/',
    'title': 'Future Crew - Second Reality demo - HD',
    'type': u'video',
    'thumbnail_height': 360,
    'thumbnail_url': u'http://i2.ytimg.com/vi/54XHDUOHuzU/hqdefault.jpg',
    'thumbnail_width': 480,
    'url': 'http://www.youtube.com/watch?v=54XHDUOHuzU',
    'width': 459,
    'version': '1.0',
}

providers.parse_text('this is a test:\nhttp://www.youtube.com/watch?v=54XHDUOHuzU')

# returns the following string:
# this is a test:
# <iframe width="459" height="344" src="http://www.youtube.com/embed/54XHDUOHuzU?fs=1&feature=oembed" frameborder="0" allowfullscreen></iframe>

providers.parse_html('<p>http://www.youtube.com/watch?v=54XHDUOHuzU</p>')

# returns the following html:
# <p><iframe width="459" height="344" src="http://www.youtube.com/embed/54XHDUOHuzU?fs=1&amp;feature=oembed" frameborder="0" allowfullscreen="allowfullscreen"></iframe></p>
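
The dictionary returned by providers.request can also be consumed directly when an application wants to build its own markup. The following is a minimal sketch, not part of micawber: render_embed is a hypothetical helper, and the sample dictionary is a trimmed copy of the response shown above, so the snippet runs without network access.

```python
from html import escape


def render_embed(data):
    """Return embed markup for an oEmbed-style response dictionary.

    Hypothetical helper (not part of micawber). It prefers the provider's
    'html' field, falls back to a linked thumbnail, then to a plain link.
    """
    if data.get('html'):
        # Provider-supplied markup is returned unchanged.
        return data['html']

    raw_url = data.get('url', '')
    url = escape(raw_url, quote=True)
    title = escape(data.get('title') or raw_url, quote=True)

    thumb = data.get('thumbnail_url')
    if thumb:
        return '<a href="%s"><img src="%s" alt="%s"></a>' % (
            url, escape(thumb, quote=True), title)
    return '<a href="%s">%s</a>' % (url, title)


# Trimmed copy of the response dictionary from the example above.
sample = {
    'html': '<iframe width="459" height="344" src="http://www.youtube.com/embed/54XHDUOHuzU?fs=1&feature=oembed" frameborder="0" allowfullscreen></iframe>',
    'thumbnail_url': 'http://i2.ytimg.com/vi/54XHDUOHuzU/hqdefault.jpg',
    'title': 'Future Crew - Second Reality demo - HD',
    'url': 'http://www.youtube.com/watch?v=54XHDUOHuzU',
}

with_html = render_embed(sample)
without_html = render_embed({k: v for k, v in sample.items() if k != 'html'})
```

Escaping the fallback values is a deliberate choice in this sketch: titles and URLs come from a remote provider, so they are treated as untrusted text, while the provider's own 'html' field is passed through as-is.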

Owner

Charles Leifer
