Time Masking for Temporal Language Models

This repository provides a reference implementation of the paper:

Time Masking for Temporal Language Models
Guy D. Rosin, Ido Guy, and Kira Radinsky
Accepted to WSDM 2022
Preprint: https://arxiv.org/abs/2110.06366

Abstract:

Our world is constantly evolving, and so is the content on the web. Consequently, our languages, often said to mirror the world, are dynamic in nature. However, most current contextual language models are static and cannot adapt to changes over time.
In this work, we propose a temporal contextual language model called TempoBERT, which uses time as an additional context of texts. Our technique is based on modifying texts with temporal information and performing time masking - specific masking for the supplementary time information.
We leverage our approach for the tasks of semantic change detection and sentence time prediction, experimenting on diverse datasets in terms of time, size, genre, and language. Our extensive evaluation shows that both tasks benefit from exploiting time masking.

Prerequisites

Python 3.8
Install requirements using pip install -r requirements.txt
Obtain datasets for training and evaluation:
- For semantic change detection: LiverpoolFC dataset or the SemEval-2020 Task 1 datasets.
- For sentence time prediction: our NYT dataset can be found under datasets.

Usage

Train TempoBERT using train_tempobert.py. This script is similar to Hugging Face's language modeling training script and introduces two new arguments: time_embedding_type, which should be set to "prepend_token", and time_mlm_probability, which is optional and can be used to set a custom probability for time masking.
Evaluate TempoBERT using semantic_change_detection.py for semantic change detection and sentence_time_prediction.py for sentence time prediction.
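
As a rough illustration of what the "prepend_token" setting implies, the sketch below prefixes each raw text with a token naming its time period before tokenization. The token format (for example <2017>) and the helper name prepend_time_token are assumptions made for this example only; the repository's tokenizer may build and insert the time token differently.

```python
from typing import List


def prepend_time_token(texts: List[str], times: List[str]) -> List[str]:
    """Prefix each text with a time token such as "<2017>".

    The "<...>" format is a hypothetical choice for this sketch.
    """
    if len(texts) != len(times):
        raise ValueError("texts and times must have the same length")
    return [f"<{time}> {text}" for text, time in zip(texts, times)]


examples = prepend_time_token(
    ["The team won the derby.", "Mobile phones became common."],
    ["2017", "1999"],
)
print(examples[0])
```

Keeping the time information as an ordinary token means the model can attend to it like any other word, and it can later be masked and predicted in the same way.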

Pointers

The modification to the input texts is performed in tokenization_utils_fast.py, in TempoPreTrainedTokenizerFast._batch_encode_plus().
Time masking is performed in temporal_data_collator.py.
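
The idea behind time masking can be sketched in a few lines of plain Python. This is a simplified illustration, not the code in temporal_data_collator.py: it works on string tokens rather than token ids, it always substitutes [MASK] (standard BERT masking also keeps or randomly replaces some selected tokens), and the function name mask_tokens, the None label for unmasked positions, and the fallback to the regular masking rate when no time rate is given are all assumptions of this example.

```python
import random
from typing import List, Optional, Set, Tuple

MASK = "[MASK]"


def mask_tokens(
    tokens: List[str],
    time_tokens: Set[str],
    mlm_probability: float = 0.15,
    time_mlm_probability: Optional[float] = None,
    rng: Optional[random.Random] = None,
) -> Tuple[List[str], List[Optional[str]]]:
    """Mask regular tokens and time tokens with separate probabilities.

    Returns the masked sequence and per-position labels; a label of None
    means the position is not a prediction target (assumed convention).
    """
    rng = rng or random.Random()
    # Assumption: without a custom rate, time tokens use the regular rate.
    time_p = mlm_probability if time_mlm_probability is None else time_mlm_probability

    masked: List[str] = []
    labels: List[Optional[str]] = []
    for token in tokens:
        p = time_p if token in time_tokens else mlm_probability
        if rng.random() < p:
            masked.append(MASK)
            labels.append(token)  # the model must recover the original token
        else:
            masked.append(token)
            labels.append(None)
    return masked, labels


masked, labels = mask_tokens(
    ["<2017>", "the", "team", "won"],
    time_tokens={"<2017>"},
    mlm_probability=0.0,       # never mask regular tokens in this demo
    time_mlm_probability=1.0,  # always mask the time token
)
print(masked)
print(labels)
```

With the time token always masked and everything else left intact, the model has to predict the time period from the text alone, which is the setup that sentence time prediction relies on.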

Owner

Guy Rosin
