Time Masking for Temporal Language Models

This repository provides a reference implementation of the paper:

Time Masking for Temporal Language Models
Guy D. Rosin, Ido Guy, and Kira Radinsky
Accepted to WSDM 2022
Preprint: https://arxiv.org/abs/2110.06366

Abstract:

Our world is constantly evolving, and so is the content on the web. Consequently, our languages, often said to mirror the world, are dynamic in nature. However, most current contextual language models are static and cannot adapt to changes over time.
In this work, we propose a temporal contextual language model called TempoBERT, which uses time as an additional context of texts. Our technique is based on modifying texts with temporal information and performing time masking - specific masking for the supplementary time information.
We leverage our approach for the tasks of semantic change detection and sentence time prediction, experimenting on diverse datasets in terms of time, size, genre, and language. Our extensive evaluation shows that both tasks benefit from exploiting time masking.

Prerequisites

Python 3.8
Install requirements using pip install -r requirements.txt
Obtain datasets for training and evaluation:
- For semantic change detection: LiverpoolFC dataset or the SemEval-2020 Task 1 datasets.
- For sentence time prediction: our NYT dataset can be found under datasets.

Usage

Train TempoBERT using train_tempobert.py. This script is similar to Hugging Face's language modeling training script and introduces two new arguments: time_embedding_type, which should be set to "prepend_token", and time_mlm_probability, which is optional and can be used to set a custom probability for time masking.
Evaluate TempoBERT using semantic_change_detection.py for semantic change detection and sentence_time_prediction.py for sentence time prediction.
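
As a rough sketch of how the two new arguments might be passed, the Python snippet below assembles a training command. It assumes the script accepts arguments as --name value command-line flags, as Hugging Face's training scripts do; this README does not confirm that. The helper build_train_command, the extra_args placeholder, and the value 0.3 are hypothetical and used only for illustration.

```python
import shlex


def build_train_command(time_mlm_probability=None, extra_args=()):
    """Assemble a command line for train_tempobert.py (hypothetical helper).

    Assumes arguments are passed as `--name value` flags; this is an
    assumption, not something stated in this README.
    """
    cmd = ["python", "train_tempobert.py",
           "--time_embedding_type", "prepend_token"]
    # time_mlm_probability is optional, so it is only added when given.
    if time_mlm_probability is not None:
        cmd += ["--time_mlm_probability", str(time_mlm_probability)]
    # Placeholder for any other arguments the script needs (model, data, output).
    cmd += list(extra_args)
    return cmd


cmd = build_train_command(time_mlm_probability=0.3)  # 0.3 is an arbitrary value
print(shlex.join(cmd))
```

The resulting list can be handed to subprocess.run once the remaining arguments for your model and data have been added through extra_args.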

Pointers

The modification to the input texts is performed in tokenization_utils_fast.py, in TempoPreTrainedTokenizerFast._batch_encode_plus().
Time masking is performed in temporal_data_collator.py.
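
The two steps can be illustrated with a minimal, dependency-free sketch. This is not the code in tokenization_utils_fast.py or temporal_data_collator.py: the function names, the "<2021>" time token format, and the whitespace tokenization are assumptions made for illustration, and the sketch leaves out the regular masking of non-time tokens that a full language modeling collator would also apply.

```python
import random

MASK_TOKEN = "[MASK]"  # BERT-style mask token


def prepend_time_token(text, time):
    """Prefix a text with a time token such as "<2021>" (format is an assumption)."""
    return f"<{time}> {text}"


def mask_time_tokens(tokens, time_tokens, time_mlm_probability, rng):
    """Mask each time token independently with the given probability.

    Returns the masked tokens and per-position labels: the original token
    where a mask was applied, otherwise None.
    """
    masked, labels = [], []
    for token in tokens:
        if token in time_tokens and rng.random() < time_mlm_probability:
            masked.append(MASK_TOKEN)
            labels.append(token)
        else:
            masked.append(token)
            labels.append(None)
    return masked, labels


# Hypothetical vocabulary of time tokens and a toy whitespace tokenization.
time_tokens = {"<2020>", "<2021>"}
tokens = prepend_time_token("the stadium was empty", 2021).split()

# With probability 1.0 the time token is always masked.
masked, labels = mask_time_tokens(tokens, time_tokens, 1.0, random.Random(0))
print(masked)
print(labels)
```

In this sketch the model would be trained to recover the label at the masked position, which is how the time token becomes an additional prediction target alongside the text.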
