Lex Rosetta: Transfer of Predictive Models Across Languages, Jurisdictions, and Legal Domains

Overview

This is the accompanying repository for the ICAIL 2021 paper entitled "Lex Rosetta: Transfer of Predictive Models Across Languages, Jurisdictions, and Legal Domains". All the data and code used in the experiments reported in the paper can be found here.

Data

The data set consists of 807 adjudicatory decisions from 7 different countries (6 languages) annotated in terms of the following type system:

  • Out of Scope - Parts outside of the main document body (e.g., metadata, editorial content, dissents, end notes, appendices).
  • Heading - Typically an incomplete sentence or marker starting a section (e.g., “Discussion,” “Analysis,” “II.”).
  • Background - The part where the court describes procedural history, relevant facts, or the parties’ claims.
  • Analysis - The section containing the court's reasoning, the issues, and the application of law to the facts of the case.
  • Introductory Summary - A brief summary of the case at the beginning of the decision.
  • Outcome - A few sentences stating how the case was decided (i.e., the overall outcome of the case).
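The six types can be captured in a small enumeration. The sketch below is illustrative only: the class, the helper, and the exact label strings are assumptions rather than code from this repository, so check the label values used in the annotation CSV files before relying on them.

```python
from enum import Enum


class SegmentType(Enum):
    """The six segment types of the type system (label strings are assumed)."""
    OUT_OF_SCOPE = "Out of Scope"
    HEADING = "Heading"
    BACKGROUND = "Background"
    ANALYSIS = "Analysis"
    INTRODUCTORY_SUMMARY = "Introductory Summary"
    OUTCOME = "Outcome"


# The two types that the cleaning step (dataset_clean.py) removes from the texts.
NON_SUBSTANTIVE = frozenset({SegmentType.OUT_OF_SCOPE, SegmentType.HEADING})


def parse_label(raw: str) -> SegmentType:
    """Map a raw label string to a SegmentType, ignoring case and surrounding whitespace."""
    normalized = raw.strip().lower()
    for member in SegmentType:
        if member.value.lower() == normalized:
            return member
    raise ValueError(f"unknown segment type: {raw!r}")


print(parse_label(" outcome "))  # SegmentType.OUTCOME
print(len(SegmentType))          # 6
```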

The country-specific subsets:

  • Canada - A random selection of cases from multiple provinces, retrieved from www.canlii.org. The selection is not limited to any specific topic or court.
  • Czech Republic - A random selection of cases from the Constitutional Court (30), the Supreme Court (40), and the Supreme Administrative Court (30). The temporal distribution of the cases was taken into account.
  • France - A selection of cases decided by the Cour de cassation between 2011 and 2019. The cases were selected by stratified sampling based on the year of publication of the decision.
  • Germany - A stratified sample from the federal jurisprudence database spanning all federal courts (civil, criminal, labor, finance, patent, social, constitutional, and administrative).
  • Italy - The top 100 criminal court cases stored between 2015 and 2020 that mention “stalking” and are keyed to Article 612 bis of the Criminal Code.
  • Poland - A stratified sample from trial-level, appellate, and administrative courts, the Supreme Court, and the Constitutional Tribunal. The cases mention “democratic country ruled by law.”
  • U.S.A. I - Federal district court decisions in employment law mentioning “motion for summary judgment,” “employee,” and “independent contractor.”
  • U.S.A. II - Administrative decisions from the U.S. Department of Labor. The top 100 rulings in reverse chronological order, starting in October 2020, were selected.

For more detailed information, please refer to the original paper.

How to Use

ICAIL 2021 Data

The data used in the ICAIL 2021 experiments can be found in the following paths:

data/Country-Language-*/annotator-*-ICAIL2021.csv
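The directory names encode the country, the language, and a subset index (e.g., Canada-EN-1, Czech_Republic-CZ-1). A minimal sketch for locating the ICAIL 2021 files and splitting their paths into those parts follows; the helper names, and the assumption that the last two hyphen-separated fields are always the language and the index, are ours rather than the repository's.

```python
import re
from pathlib import Path, PurePosixPath
from typing import List, NamedTuple, Optional


class AnnotationFile(NamedTuple):
    country: str
    language: str
    subset: str
    annotator: str


# Hypothetical patterns for data/<Country>-<Language>-<n>/annotator-<m>-ICAIL2021.csv;
# the country may contain underscores (Czech_Republic) but is assumed to have no hyphens
# that would be confused with the last two fields.
_SUBSET_RE = re.compile(r"^(?P<country>.+)-(?P<language>[^-]+)-(?P<subset>[^-]+)$")
_FILE_RE = re.compile(r"^annotator-(?P<annotator>.+)-ICAIL2021\.csv$")


def parse_icail_path(path: str) -> Optional[AnnotationFile]:
    """Split an ICAIL 2021 annotation path into its parts, or return None if it does not fit."""
    p = PurePosixPath(path)
    subset_match = _SUBSET_RE.match(p.parent.name)
    file_match = _FILE_RE.match(p.name)
    if subset_match is None or file_match is None:
        return None
    return AnnotationFile(
        country=subset_match["country"],
        language=subset_match["language"],
        subset=subset_match["subset"],
        annotator=file_match["annotator"],
    )


def find_icail_files(data_dir: str = "data") -> List[AnnotationFile]:
    """Parse every annotator-*-ICAIL2021.csv file found one level below data_dir."""
    found = []
    for path in sorted(Path(data_dir).glob("*/annotator-*-ICAIL2021.csv")):
        parsed = parse_icail_path(path.as_posix())
        if parsed is not None:
            found.append(parsed)
    return found
```

Files such as annotator-1-clean.csv do not match the ICAIL 2021 pattern and are skipped.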

Note that the Canadian subset could not be included in this repository due to concerns about personal information protection in Canada. However, it can be obtained upon request at [email protected]. Once you obtain the data, create the data/Canada-EN-1 directory and place all the files there.
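Setting up the Canadian subset amounts to creating one directory and copying the received files into it. A sketch, assuming the files you received sit together in one local folder (the function name and signature are hypothetical, not part of the repository):

```python
import shutil
from pathlib import Path
from typing import List


def install_canada_subset(downloaded_dir: str, data_dir: str = "data") -> List[Path]:
    """Copy everything in downloaded_dir into <data_dir>/Canada-EN-1 and return the copied paths."""
    target = Path(data_dir) / "Canada-EN-1"
    target.mkdir(parents=True, exist_ok=True)  # create the directory if it is missing
    copied = []
    for item in sorted(Path(downloaded_dir).iterdir()):
        destination = target / item.name
        if item.is_dir():
            # Sub-directories (e.g., a folder of texts) are copied recursively.
            shutil.copytree(item, destination, dirs_exist_ok=True)
        else:
            shutil.copy2(item, destination)
        copied.append(destination)
    return copied
```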

If you would like to experiment with different preprocessing techniques, the original texts are placed in the following paths:

data/Country-Language-*/texts

You can find the annotations corresponding to these texts here:

data/Country-Language-*/annotator-*.csv

The texts cleaned of the Out of Scope and Heading segments (via dataset_clean.py) are placed in the following paths:

data/Country-Language-*/texts-clean-annotator-*
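Because the cleaning step deletes whole segments, every annotation that follows a removed segment has to be shifted. The sketch below illustrates that bookkeeping on character offsets. It assumes annotations are non-overlapping (start, end, label) spans and keeps any text not covered by a span, which may differ from what dataset_clean.py actually does; the Span type and clean_document function are hypothetical.

```python
from typing import List, NamedTuple, Tuple

# Types removed by the cleaning step, per this README.
DROPPED_TYPES = frozenset({"Out of Scope", "Heading"})


class Span(NamedTuple):
    start: int  # character offset, inclusive
    end: int    # character offset, exclusive
    label: str


def clean_document(text: str, spans: List[Span]) -> Tuple[str, List[Span]]:
    """Drop Out of Scope and Heading spans and re-index the remaining spans."""
    pieces: List[str] = []
    kept: List[Span] = []
    cursor = 0    # first original offset not yet copied into `pieces`
    removed = 0   # characters deleted so far, i.e. how far later spans shift left
    prev_end = 0
    for span in sorted(spans):
        if span.start < prev_end:
            raise ValueError(f"overlapping spans at offset {span.start}")
        prev_end = span.end
        if span.label in DROPPED_TYPES:
            pieces.append(text[cursor:span.start])  # keep text preceding the dropped span
            removed += span.end - span.start
            cursor = span.end
        else:
            kept.append(Span(span.start - removed, span.end - removed, span.label))
    pieces.append(text[cursor:])
    return "".join(pieces), kept


text = "II.\nThe court finds for the plaintiff."
cleaned, new_spans = clean_document(text, [Span(0, 3, "Heading"), Span(4, 38, "Outcome")])
print(repr(cleaned))  # '\nThe court finds for the plaintiff.'
print(new_spans)      # [Span(start=1, end=35, label='Outcome')]
```

This is also why the cleaned texts exist once per annotator: different annotators may mark different spans as Out of Scope or Heading.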

Note that this processing depends on the annotations. Hence, if a document was annotated by multiple annotators, there are several versions of it at this stage. The annotations corresponding to the cleaned texts are here:

data/Country-Language-*/annotator-*-clean.csv
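With one annotations file, one cleaned-text directory, and one cleaned-annotations file per annotator, it can help to group them by annotator ID. A minimal sketch, assuming annotator IDs contain no hyphens (the helper and its dictionary keys are hypothetical):

```python
import re
from pathlib import Path
from typing import Dict

# Hypothetical file-name patterns for one data/Country-Language-* directory.
PATTERNS = {
    "annotations": re.compile(r"^annotator-([^-]+)\.csv$"),
    "clean_texts": re.compile(r"^texts-clean-annotator-([^-]+)$"),
    "clean_annotations": re.compile(r"^annotator-([^-]+)-clean\.csv$"),
    "icail2021": re.compile(r"^annotator-([^-]+)-ICAIL2021\.csv$"),
}


def annotator_versions(subset_dir: str) -> Dict[str, Dict[str, Path]]:
    """Group the per-annotator files of one subset directory by annotator ID."""
    versions: Dict[str, Dict[str, Path]] = {}
    for entry in sorted(Path(subset_dir).iterdir()):
        for kind, pattern in PATTERNS.items():
            match = pattern.match(entry.name)
            if match:
                versions.setdefault(match.group(1), {})[kind] = entry
                break  # each entry belongs to at most one kind
    return versions
```

Entries shared by all annotators, such as the texts directory, match none of the patterns and are left out.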

The dataset_ICAIL2021.py script contains the processing code that was applied to the cleaned texts and annotations to generate the ICAIL 2021 dataset (see above). Note that the script skips the Czech Republic subset by default, because this subset requires an external resource for sentence segmentation (czech-pdt-ud-X.X-XXXXXX.udpipe). To include it:

  • Obtain the file at https://universaldependencies.org/ and place it into the data directory.
  • Remove the Czech_Republic-CZ-1 string from the EXCLUDED tuple in dataset_ICAIL2021.py.
  • Replace the data/czech-pdt-ud-2.5-191206.udpipe string in utils.py with the path to the file that you have downloaded.

After these changes, the script will also process the Czech Republic part of the dataset.
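Before removing Czech_Republic-CZ-1 from EXCLUDED, it may be worth checking that the model file is actually in place. The helper below is a hypothetical pre-flight check, not part of the repository; it assumes the downloaded file keeps the czech-pdt-ud-*.udpipe naming pattern. The path it returns is what would replace data/czech-pdt-ud-2.5-191206.udpipe in utils.py.

```python
from pathlib import Path


def find_czech_udpipe_model(data_dir: str = "data") -> Path:
    """Return the single czech-pdt-ud-*.udpipe file in data_dir, or raise a helpful error."""
    matches = sorted(Path(data_dir).glob("czech-pdt-ud-*.udpipe"))
    if not matches:
        raise FileNotFoundError(f"no czech-pdt-ud-*.udpipe model found in {data_dir!r}")
    if len(matches) > 1:
        # Version strings do not sort reliably as text, so refuse to guess.
        names = ", ".join(m.name for m in matches)
        raise RuntimeError(f"several UDPipe models found ({names}); keep only one")
    return matches[0]
```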

Dataset Statistics

To replicate the inter-annotator agreement analysis performed in the ICAIL 2021 paper you can use the ia_agreement.ipynb notebook.
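This README does not say which agreement measure the notebook computes. For orientation only, here is Cohen's kappa over two annotators' per-sentence labels, one common choice for this kind of analysis; treat it as a generic sketch rather than the computation used in the paper.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(labels_a: Sequence[str], labels_b: Sequence[str]) -> float:
    """Cohen's kappa for two annotators labelling the same sentences."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("need two non-empty label sequences of equal length")
    n = len(labels_a)
    # Observed agreement: share of sentences given the same label.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's label distribution.
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout; kappa is undefined,
        # and we return 1.0 by convention since agreement is perfect.
        return 1.0
    return (observed - expected) / (1.0 - expected)


a = ["Background", "Background", "Analysis", "Analysis"]
b = ["Background", "Analysis", "Analysis", "Analysis"]
print(cohens_kappa(a, b))  # 0.5
```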

To generate the dataset statistics reported in the ICAIL 2021 paper you can use the dataset_statistics.ipynb notebook.
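A typical statistic is the share of each type within each subset. A sketch, assuming you have already extracted one (subset, label) pair per annotated sentence (this input format and the function name are hypothetical):

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def label_distribution(rows: Iterable[Tuple[str, str]]) -> Dict[str, Dict[str, float]]:
    """Turn (subset, label) pairs into per-subset label proportions."""
    per_subset: Dict[str, Counter] = {}
    for subset, label in rows:
        per_subset.setdefault(subset, Counter())[label] += 1
    result: Dict[str, Dict[str, float]] = {}
    for subset, counter in per_subset.items():
        total = sum(counter.values())
        # most_common() orders labels from the most to the least frequent.
        result[subset] = {label: count / total for label, count in counter.most_common()}
    return result


rows = [("A", "Analysis"), ("A", "Analysis"), ("A", "Outcome"), ("B", "Background")]
print(label_distribution(rows)["B"])  # {'Background': 1.0}
```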

Experiments

The ICAIL2021_experiments.ipynb notebook contains the code necessary to run the experiments presented in the paper. This includes the code to embed the sentences of the cases into a multilingual vector representation, the definition of the Gated Recurrent Unit (GRU) model, and the code to train and evaluate the model in the different experiments described in the paper. It also contains the code to create the visualizations presented in the discussion section of the paper.
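This README does not spell out the model beyond "Gated Recurrent Unit", so the following is only a rough illustration of the idea: an untrained, bidirectional GRU forward pass in plain NumPy that assigns each sentence embedding of a document a probability over the six types. The 512-dimensional embeddings, the 64 hidden units, the bidirectionality, and all names are assumptions; the notebook's actual model, embedding method, and training code may differ.

```python
import numpy as np

NUM_TYPES = 6  # Out of Scope, Heading, Background, Analysis, Introductory Summary, Outcome


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


class GRU:
    """Single-direction GRU (Cho et al., 2014 formulation) with random, untrained weights."""

    def __init__(self, input_dim, hidden_dim, rng):
        scale = 1.0 / np.sqrt(hidden_dim)
        # One input matrix, recurrent matrix, and bias per gate: update (z), reset (r), candidate (n).
        self.W = {g: rng.uniform(-scale, scale, (hidden_dim, input_dim)) for g in "zrn"}
        self.U = {g: rng.uniform(-scale, scale, (hidden_dim, hidden_dim)) for g in "zrn"}
        self.b = {g: np.zeros(hidden_dim) for g in "zrn"}
        self.hidden_dim = hidden_dim

    def run(self, xs):
        """xs: (T, input_dim) sentence embeddings -> (T, hidden_dim) hidden states."""
        h = np.zeros(self.hidden_dim)
        states = []
        for x in xs:
            z = sigmoid(self.W["z"] @ x + self.U["z"] @ h + self.b["z"])
            r = sigmoid(self.W["r"] @ x + self.U["r"] @ h + self.b["r"])
            n = np.tanh(self.W["n"] @ x + self.U["n"] @ (r * h) + self.b["n"])
            h = z * h + (1.0 - z) * n  # interpolate between old state and candidate
            states.append(h)
        return np.stack(states)


class BiGRUTagger:
    """Assign each sentence of a document a probability distribution over the six types."""

    def __init__(self, input_dim, hidden_dim=64, seed=0):
        rng = np.random.default_rng(seed)
        self.fwd_gru = GRU(input_dim, hidden_dim, rng)
        self.bwd_gru = GRU(input_dim, hidden_dim, rng)
        self.W_out = rng.uniform(-0.1, 0.1, (NUM_TYPES, 2 * hidden_dim))
        self.b_out = np.zeros(NUM_TYPES)

    def predict_proba(self, sentence_embeddings):
        fwd = self.fwd_gru.run(sentence_embeddings)
        bwd = self.bwd_gru.run(sentence_embeddings[::-1])[::-1]  # re-align to sentence order
        h = np.concatenate([fwd, bwd], axis=1)
        logits = h @ self.W_out.T + self.b_out
        logits -= logits.max(axis=1, keepdims=True)  # numerical stability for softmax
        probs = np.exp(logits)
        return probs / probs.sum(axis=1, keepdims=True)


doc = np.random.default_rng(1).normal(size=(12, 512))  # 12 sentences, assumed 512-dim embeddings
probs = BiGRUTagger(input_dim=512).predict_proba(doc)
print(probs.shape)  # (12, 6)
```

Since the weights are random, the probabilities only demonstrate the shapes involved; in practice the model would be trained on the annotated sentences.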

The notebook can be run in two different ways:

Attribution

We kindly ask you to cite the following paper:

@inproceedings{savelka2021,
    title={Lex Rosetta: Transfer of Predictive Models Across Languages, Jurisdictions, and Legal Domains},
    author={Jaromir Savelka and Hannes Westermann and Karim Benyekhlef and Charlotte S. Alexander and Jayla C. Grant and David Restrepo Amariles and Rajaa El Hamdani and S\'{e}bastien Mee\`{u}s and Aurore Troussel and Micha{\l} Araszkiewicz and Kevin D. Ashley and Alexandra Ashley and Karl Branting and Mattia Falduti and Matthias Grabmair and Jakub Hara\v{s}ta and Tereza Novotn\'{a} and Elizabeth Tippett and Shiwanni Johnson},
    year={2021},
    booktitle={Proceedings of the 18th International Conference on Artificial Intelligence and Law},
    publisher={Association for Computing Machinery},
    doi={10.1145/3462757.3466149}
}

Jaromir Savelka, Hannes Westermann, Karim Benyekhlef, Charlotte S. Alexander, Jayla C. Grant, David Restrepo Amariles, Rajaa El Hamdani, Sébastien Meeùs, Aurore Troussel, Michał Araszkiewicz, Kevin D. Ashley, Alexandra Ashley, Karl Branting, Mattia Falduti, Matthias Grabmair, Jakub Harašta, Tereza Novotná, Elizabeth Tippett, and Shiwanni Johnson. 2021. Lex Rosetta: Transfer of Predictive Models Across Languages, Jurisdictions, and Legal Domains. In Eighteenth International Conference for Artificial Intelligence and Law (ICAIL’21), June 21–25, 2021, São Paulo, Brazil. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3462757.3466149
