Lex Rosetta: Transfer of Predictive Models Across Languages, Jurisdictions, and Legal Domains

Overview

This repository accompanies the ICAIL 2021 paper "Lex Rosetta: Transfer of Predictive Models Across Languages, Jurisdictions, and Legal Domains". It contains all the data and code used in the experiments reported in the paper.

Data

The data set consists of 807 adjudicatory decisions from 7 different countries (6 languages) annotated in terms of the following type system:

  • Out of Scope - Parts outside of the main document body (e.g., metadata, editorial content, dissents, end notes, appendices).
  • Heading - Typically an incomplete sentence or marker starting a section (e.g., “Discussion,” “Analysis,” “II.”).
  • Background - The part where the court describes procedural history, relevant facts, or the parties’ claims.
  • Analysis - The section containing reasoning of the court, issues, and application of law to the facts of the case.
  • Introductory Summary - A brief summary of the case at the beginning of the decision.
  • Outcome - A few sentences stating how the case was decided (i.e., the overall outcome of the case).

The country-specific subsets:

  • Canada - Random selection of cases retrieved from www.canlii.org from multiple provinces. The selection is not limited to any specific topic or court.
  • Czech Republic - A random selection of cases from the Constitutional Court (30), the Supreme Court (40), and the Supreme Administrative Court (30). Temporal distribution was taken into account.
  • France - A selection of cases decided by Cour de cassation between 2011 and 2019. A stratified sampling based on the year of publication of the decision was used to select the cases.
  • Germany - A stratified sample from the federal jurisprudence database spanning all federal courts (civil, criminal, labor, finance, patent, social, constitutional, and administrative).
  • Italy - The top 100 cases of the criminal courts stored between 2015 and 2020 mentioning “stalking” and keyed to Article 612 bis of the Criminal Code.
  • Poland - A stratified sample from trial-level, appellate, and administrative courts, the Supreme Court, and the Constitutional Tribunal. The cases mention “democratic country ruled by law.”
  • U.S.A. I - Federal district court decisions in employment law mentioning “motion for summary judgment,” “employee,” and “independent contractor.”
  • U.S.A. II - Administrative decisions from the U.S. Department of Labor. The top 100 rulings in reverse chronological order, starting in October 2020, were selected.

For more detailed information, please refer to the original paper.

How to Use

ICAIL 2021 Data

The data used in the ICAIL 2021 experiments can be found in the following paths:

data/Country-Language-*/annotator-*-ICAIL2021.csv
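
As a minimal sketch, the files matching this pattern could be collected with Python's standard library. The helper name find_icail_files and its root argument are hypothetical and not part of the repository; only the glob pattern is taken from the paths above.

```python
from pathlib import Path


def find_icail_files(root="data"):
    """Return the ICAIL 2021 annotation CSVs grouped by subset directory name.

    Only the glob pattern comes from the README; the function name and the
    grouping into a dict are illustrative assumptions.
    """
    grouped = {}
    # Subset directories are named Country-Language-N, e.g. Canada-EN-1.
    for path in sorted(Path(root).glob("*-*-*/annotator-*-ICAIL2021.csv")):
        grouped.setdefault(path.parent.name, []).append(path)
    return grouped


if __name__ == "__main__":
    for subset, files in find_icail_files().items():
        print(subset, [f.name for f in files])
```

The column layout of the CSV files is not assumed here; inspect a file before parsing it.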

Note that the Canadian subset could not be included in this repository due to concerns about personal information protection in Canada. However, it can be obtained upon request at [email protected]. Once you obtain the data, create the data/Canada-EN-1 directory and place all the files there.

If you would like to experiment with different preprocessing techniques, the original texts are placed in the following paths:

data/Country-Language-*/texts

You can find the annotations corresponding to these texts here:

data/Country-Language-*/annotator-*.csv

The texts cleaned of the Out of Scope and Heading segments (via dataset_clean.py) are placed in the following paths:

data/Country-Language-*/texts-clean-annotator-*

Note that the processing depends on annotations. Hence, there are several versions of documents at this stage if there were multiple annotators. The annotations corresponding to the cleaned texts are here:

data/Country-Language-*/annotator-*-clean.csv
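
The dependence on annotations can be illustrated with a small sketch. It assumes, hypothetically, that each annotation is a (start, end, label) character span over the original text; the actual CSV layout and the logic in dataset_clean.py may differ. Removing the Out of Scope and Heading spans shortens the text, so the offsets of the remaining annotations have to be shifted. This is why each annotator yields a separate cleaned version.

```python
REMOVED_TYPES = {"Out of Scope", "Heading"}


def clean_document(text, annotations):
    """Drop Out of Scope / Heading spans and re-base the other annotations.

    `annotations` is a list of non-overlapping (start, end, label) tuples
    with an exclusive end. This layout is an assumption for illustration.
    """
    pieces = []
    kept = []
    cursor = 0   # position in the original text up to which we have copied
    removed = 0  # number of characters dropped so far
    for start, end, label in sorted(annotations):
        if label in REMOVED_TYPES:
            # Copy everything before the removed span, then skip over it.
            pieces.append(text[cursor:start])
            cursor = end
            removed += end - start
        else:
            # Shift the surviving span left by the amount removed before it.
            kept.append((start - removed, end - removed, label))
    pieces.append(text[cursor:])
    return "".join(pieces), kept
```

Two annotators who mark different Heading spans in the same document would produce different cleaned texts and different shifted offsets from this function.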

dataset_ICAIL2021.py contains the processing code that was applied to the cleaned texts and annotations to generate the ICAIL 2021 dataset (see above). Note that the code skips the Czech Republic subset by default because this subset requires an external resource for sentence segmentation (czech-pdt-ud-X.X-XXXXXX.udpipe). To include it, first obtain the file from https://universaldependencies.org/ and place it in the data directory. Then remove the Czech_Republic-CZ-1 string from the EXCLUDED tuple in dataset_ICAIL2021.py. Finally, replace the data/czech-pdt-ud-2.5-191206.udpipe string in utils.py with the path of the file you downloaded. After these changes, the code will also process the Czech Republic part of the dataset.
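
Before editing EXCLUDED, it can help to check that the model file is actually in place. The sketch below looks in the data directory for a file matching the naming scheme mentioned above. The function name find_udpipe_model is hypothetical and not part of the repository.

```python
from pathlib import Path


def find_udpipe_model(data_dir="data"):
    """Return the path of the Czech UDPipe model in `data_dir`, or None.

    Matches the czech-pdt-ud-X.X-XXXXXX.udpipe naming scheme. If several
    versions are present, the lexicographically last one is returned.
    """
    matches = sorted(Path(data_dir).glob("czech-pdt-ud-*.udpipe"))
    return matches[-1] if matches else None


if __name__ == "__main__":
    model = find_udpipe_model()
    if model is None:
        print("No Czech UDPipe model found; keep Czech_Republic-CZ-1 in EXCLUDED.")
    else:
        print(f"Found {model}; use this path in utils.py.")
```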

Dataset Statistics

To replicate the inter-annotator agreement analysis performed in the ICAIL 2021 paper, you can use the ia_agreement.ipynb notebook.

To generate the dataset statistics reported in the ICAIL 2021 paper, you can use the dataset_statistics.ipynb notebook.

Experiments

The file ICAIL2021_experiments.ipynb contains the code necessary to run the experiments presented in the paper. This includes the code to embed the sentences of the cases into a multilingual vector representation, the definition of the Gated Recurrent Unit model, and the code to train and evaluate it across the different experiments described in the paper. It also contains the code to create the visualizations presented in the discussion section of the paper.

The notebook can be run in two different ways:

Attribution

We kindly ask you to cite the following paper:

@inproceedings{savelka2021,
    title={Lex Rosetta: Transfer of Predictive Models Across Languages, Jurisdictions, and Legal Domains},
    author={Jaromir Savelka and Hannes Westermann and Karim Benyekhlef and Charlotte S. Alexander and Jayla C. Grant and David Restrepo Amariles and Rajaa El Hamdani and S\'{e}bastien Mee\`{u}s and Aurore Troussel and Micha\l\ Araszkiewicz and Kevin D. Ashley and Alexandra Ashley and Karl Branting and Mattia Falduti and Matthias Grabmair and Jakub Hara\v{s}ta and Tereza Novotn\'a and Elizabeth Tippett and Shiwanni Johnson},
    year={2021},
    booktitle={Proceedings of the 18th International Conference on Artificial Intelligence and Law},
    publisher={Association for Computing Machinery},
    doi={10.1145/3462757.3466149}
}

Jaromir Savelka, Hannes Westermann, Karim Benyekhlef, Charlotte S. Alexander, Jayla C. Grant, David Restrepo Amariles, Rajaa El Hamdani, Sébastien Meeùs, Aurore Troussel, Michał Araszkiewicz, Kevin D. Ashley, Alexandra Ashley, Karl Branting, Mattia Falduti, Matthias Grabmair, Jakub Harašta, Tereza Novotná, Elizabeth Tippett, and Shiwanni Johnson. 2021. Lex Rosetta: Transfer of Predictive Models Across Languages, Jurisdictions, and Legal Domains. In Eighteenth International Conference for Artificial Intelligence and Law (ICAIL’21), June 21–25, 2021, São Paulo, Brazil. ACM, New York, NY, USA, 10 pages. https://doi.org/10.1145/3462757.3466149
