1 Repositories
Libextract is a statistics-enabled data extraction library, written in Python, that works on HTML and XML documents.