a small library for extracting rich content from urls

Last update: Dec 27, 2022

Overview

A small library for extracting rich content from urls.

what does it do?

micawber supplies a few methods for retrieving rich metadata about a variety of links, such as links to YouTube videos. It also provides functions for parsing blocks of text and HTML and replacing links to videos with rich embedded content.

examples

here is a quick example:

import micawber

# load up rules for some default providers, such as youtube and flickr
providers = micawber.bootstrap_basic()

providers.request('http://www.youtube.com/watch?v=54XHDUOHuzU')

# returns the following dictionary:
{
    'author_name': 'pascalbrax',
    'author_url': u'http://www.youtube.com/user/pascalbrax',
    'height': 344,
    'html': u'<iframe width="459" height="344" src="http://www.youtube.com/embed/54XHDUOHuzU?fs=1&feature=oembed" frameborder="0" allowfullscreen></iframe>',
    'provider_name': 'YouTube',
    'provider_url': 'http://www.youtube.com/',
    'title': 'Future Crew - Second Reality demo - HD',
    'type': u'video',
    'thumbnail_height': 360,
    'thumbnail_url': u'http://i2.ytimg.com/vi/54XHDUOHuzU/hqdefault.jpg',
    'thumbnail_width': 480,
    'url': 'http://www.youtube.com/watch?v=54XHDUOHuzU',
    'width': 459,
    'version': '1.0',
}

providers.parse_text('this is a test:\nhttp://www.youtube.com/watch?v=54XHDUOHuzU')

# returns the following string:
this is a test:
<iframe width="459" height="344" src="http://www.youtube.com/embed/54XHDUOHuzU?fs=1&feature=oembed" frameborder="0" allowfullscreen></iframe>

providers.parse_html('<p>http://www.youtube.com/watch?v=54XHDUOHuzU</p>')

# returns the following html:
<p><iframe width="459" height="344" src="http://www.youtube.com/embed/54XHDUOHuzU?fs=1&amp;feature=oembed" frameborder="0" allowfullscreen="allowfullscreen"></iframe></p>
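
To make the text-replacement step above more concrete, here is a minimal sketch of the general idea in plain Python. It is not micawber's implementation: `render`, `replace_standalone_urls` and the `lookup` callable are hypothetical names introduced for illustration, and the sketch assumes that only URLs standing alone on a line are replaced. `lookup` is assumed to return a dictionary shaped like the one shown above, or `None` when no provider matches.

```python
import re
from html import escape

# Assumption: a URL is only replaced when it is the sole content of a line.
URL_LINE = re.compile(r'^\s*(https?://\S+)\s*$')


def render(data, url):
    """Choose markup for one oEmbed-style response dictionary (hypothetical helper)."""
    if data.get('html'):
        # 'video' and 'rich' responses carry ready-made embed markup.
        return data['html']
    if data.get('type') == 'photo' and data.get('url'):
        # 'photo' responses carry an image URL instead of markup.
        return '<img src="%s" alt="%s">' % (
            escape(data['url']), escape(data.get('title', '')))
    # Nothing embeddable: fall back to an ordinary link.
    return '<a href="%s">%s</a>' % (escape(url), escape(data.get('title', url)))


def replace_standalone_urls(text, lookup):
    """Replace each line that is only a URL with markup from lookup(url).

    Lines without a standalone URL, and URLs for which lookup returns
    nothing, are passed through unchanged.
    """
    out = []
    for line in text.splitlines():
        match = URL_LINE.match(line)
        data = lookup(match.group(1)) if match else None
        out.append(render(data, match.group(1)) if data else line)
    return '\n'.join(out)


# Stand-in for a real provider lookup, so the sketch runs without network access.
fake_responses = {
    'http://www.youtube.com/watch?v=54XHDUOHuzU': {
        'type': 'video',
        'html': '<iframe></iframe>',
    },
}

print(replace_standalone_urls(
    'this is a test:\nhttp://www.youtube.com/watch?v=54XHDUOHuzU',
    fake_responses.get))
```

In real use, `lookup` could be a small wrapper around `providers.request` that returns `None` when the request fails. The library's own `parse_text` and `parse_html` remain the simpler choice when their default behaviour is what you want.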

Owner

Charles Leifer
