The Internet Archive Research Assistant - Daily search Internet Archive for new items matching your keywords

Last update: Dec 25, 2022

Overview

tiara - The Internet Archive Research Assistant

by Kay Savetz, May 2021.

Searches Internet Archive using its full text search for new items matching the keywords you specify. Run this script once a day via crontab for daily updates about new items relevant to your ongoing research subjects. It keeps track of the items it has already found, so it alerts you only to new-to-you items. The script writes its findings to an HTML file, and optionally emails that file to you via SendGrid or your system mail (e.g. Sendmail or Postfix).
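The "only new-to-you items" behaviour can be sketched roughly as follows. This is a minimal illustration, not the script's actual code: the function names, the JSON file used to remember identifiers, and the shape of each result (a dict with an `identifier` and an optional `title`) are all assumptions made for the example.

```python
import html
import json
from pathlib import Path


def load_seen(path):
    """Return the set of item identifiers recorded on earlier runs."""
    p = Path(path)
    if not p.exists():
        return set()
    return set(json.loads(p.read_text(encoding="utf-8")))


def save_seen(path, seen):
    """Persist the identifiers so the next run can skip them."""
    Path(path).write_text(json.dumps(sorted(seen)), encoding="utf-8")


def filter_new(results, seen):
    """Keep only results not in `seen`, and add them to `seen`."""
    fresh = []
    for item in results:
        if item["identifier"] not in seen:
            seen.add(item["identifier"])
            fresh.append(item)
    return fresh


def render_html(term, items):
    """Build an HTML fragment listing the new items for one search term."""
    rows = "\n".join(
        '<li><a href="https://archive.org/details/{0}">{1}</a></li>'.format(
            html.escape(item["identifier"], quote=True),
            html.escape(item.get("title", item["identifier"])),
        )
        for item in items
    )
    return "<h2>{}</h2>\n<ul>\n{}\n</ul>".format(html.escape(term), rows)
```

A daily run would call `load_seen`, pass each term's search results through `filter_new`, write the `render_html` output to the report file, and finish with `save_seen`.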

Put your keywords in searchlist.txt, one search term per line. Very general terms (like "dogs") provide too many daily hits to be useful. More specific phrases work better.
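Reading the keyword file might look something like this. It is a sketch only: the helper name `load_search_terms` is hypothetical, and skipping blank lines and repeated terms is an assumption rather than documented behaviour of the script.

```python
from pathlib import Path


def load_search_terms(path="searchlist.txt"):
    """Read one search term per line, ignoring blank lines and repeats."""
    terms = []
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        term = line.strip()
        if term and term not in terms:
            terms.append(term)
    return terms
```

With a file containing `Pliny the Elder` on one line and `Atari 800 disk drive` on the next, the function returns those two phrases in file order.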

Dependency: Internet Archive command line tool (install with pip install internetarchive). The script also requires read-write access to the directory it lives in.
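Both requirements can be checked before the first scheduled run. The following is a hedged sketch: the `preflight` helper is hypothetical and not part of the script. It relies only on the fact that the internetarchive package installs a command named `ia`.

```python
import os
import shutil


def preflight(script_dir):
    """Return a list of human-readable problems; an empty list means ready."""
    problems = []
    # The internetarchive package provides the `ia` command line tool.
    if shutil.which("ia") is None:
        problems.append(
            "the 'ia' command was not found on PATH "
            "(install with: pip install internetarchive)"
        )
    # The script keeps its state and output next to itself.
    if not os.access(script_dir, os.R_OK | os.W_OK):
        problems.append("no read-write access to {}".format(script_dir))
    return problems
```

Calling `preflight(os.path.dirname(os.path.abspath(__file__)))` from the script's directory and printing the result is one way to catch a missing install or a permissions problem before cron does.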

Issue: Internet Archive cannot generate thumbnails for all items. In these cases, you may see a broken image icon.

Issue: Internet Archive's full text search doesn't seem to allow exact phrase matching. So a search for "Pliny The Elder" may turn up items mentioning Pliny The Younger, or with "Pliny" on one page and "elder" on another.
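To make the phrase-matching issue concrete, the difference between "all the words appear somewhere" and "the words appear together" can be shown with a small local check. This is an illustration only and not something the script does: the helper name is hypothetical, and it assumes you already have an item's text in hand from some other source.

```python
import re


def contains_exact_phrase(text, phrase):
    """True if the words of `phrase` occur consecutively in `text`.

    Matching is case-insensitive and tolerates any whitespace
    (including line breaks) between the words.
    """
    words = [re.escape(word) for word in phrase.split()]
    if not words:
        return False
    pattern = r"\b" + r"\s+".join(words) + r"\b"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None
```

A text that mentions "Pliny the Younger" in one place and an "elder" elsewhere contains every search word but fails this check, which is the kind of hit the issue describes.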

If you find this tool useful, please donate to Internet Archive.

🛸 Use pretrained transformers like BERT, XLNet and GPT-2 in spaCy

A modular framework for vision & language multimodal research from Facebook AI Research (FAIR)

Natural Language Processing for Adverse Drug Reaction (ADR) Detection

Need: Image Search With Python

[AAAI 21] Curriculum Labeling: Revisiting Pseudo-Labeling for Semi-Supervised Learning

CDLA: A Chinese document layout analysis (CDLA) dataset

Levenshtein and Hamming distance computation

A collection of GNN-based fake news detection models.

An assignment from my grad-level data mining course demonstrating some experience with NLP/neural networks/Pytorch

Chinese Pre-Trained Language Models (CPM-LM) Version-I

Extract room types, doors, neighbouring rooms, room corners and bounding boxes, and generate a graph from the rplan dataset

Disfl-QA: A Benchmark Dataset for Understanding Disfluencies in Question Answering

DeBERTa: Decoding-enhanced BERT with Disentangled Attention

GCRC: A Gaokao Chinese Reading Comprehension dataset for interpretable Evaluation

News-domain question answering system (graduation project, first semester of 2021)

This project uses unsupervised machine learning to identify correlations between daily inoculation rates in the USA and Twitter sentiment regarding COVID-19.

Sequence-to-Sequence Framework in PyTorch

Precision Medicine Knowledge Graph (PrimeKG)

GSoC'2021 | TensorFlow implementation of Wav2Vec2

Use Google's BERT for named entity recognition (CoNLL-2003 as the dataset).