Automate the case review on legal case documents and find the most critical cases using network analysis

Last update: Dec 28, 2022

Overview

Automation on Legal Court Cases Review

This project automates the review of legal case documents and uses network analysis to identify the most critical cases.

Short write-up

Affiliation: Institute for Social and Economic Research and Policy, Columbia University

Project Information:

Keywords: Automation, PDF parse, String Extraction, Network Analysis

Software:

Python: pdfminer, LexNLP, nltk, sklearn
R: igraph

Scope:

Parse court documents and extract citations from the raw text.
Build the citation network and identify important cases in it.
Extract the judge's opinion text and meta information, including opinion author, court, and decision.
Train a model to predict the court decision from the opinion text.

Pilot Study on 159 Legal Court Documents (in `pilot_159` folder)

1. Process PDF documents using `Python`

IPython Notebook	Description
`1.Extraction by LexNLP.ipynb`	Extract meta information using the `LexNLP` package.
`2.Layer Analysis on Sigle File. ipynb`	Use `pdfminer` to extract the raw text and the paragraph segmentation of the PDF document.
`3.Patent Position by Layer.ipynb`	Identify the position of patent number in extracted layers from PDF.
`4.Opinion and Author by Layer.ipynb`	Extract opinion text, author, decisions from the layers list.
`5.Wrap up to Meta Data.ipynb`	Store extracted meta data to `.json` or `.csv`
`6.Visualize citation frequency.ipynb`	Bar plot of the citation frequencies
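
As an illustration of the layer-based steps above, the following is a minimal sketch of locating patent numbers in a list of extracted text layers. It is not the notebook's code: the function name `find_patent_positions`, the sample layers, and the assumed citation shape ("Patent No. 7,123,456") are all hypothetical.

```python
import re

# Assumed shape of a patent reference, e.g. "U.S. Patent No. 7,123,456".
# Real court documents may use other forms; adjust the pattern as needed.
PATENT_RE = re.compile(r"Patent\s+No\.\s*(\d{1,2},?\d{3},?\d{3})")


def find_patent_positions(layers):
    """Return (layer_index, patent_number) for every match in a list of text layers."""
    hits = []
    for index, text in enumerate(layers):
        for match in PATENT_RE.finditer(text):
            # Normalize "7,123,456" to "7123456".
            hits.append((index, match.group(1).replace(",", "")))
    return hits


# Hypothetical layers, standing in for the paragraphs extracted from one PDF.
layers = [
    "UNITED STATES COURT OF APPEALS",
    "This appeal concerns U.S. Patent No. 7,123,456.",
    "Opinion for the court filed by Circuit Judge DOE.",
]
print(find_patent_positions(layers))
```

Returning the layer index alongside the number keeps the position information that the later opinion and author extraction steps could rely on.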

2. Data: Parse PDF documents via `Python`

These datasets are NOT included in this public repository because of intellectual property and privacy concerns.

File	Description
`pdf2text159.json`	A dictionary of three lists: `file_name`, `raw_text`, `layers`.
`cite_edge159.csv`	Edge list of the citation network
`cite_node159.csv`	Meta information of each case: `case_number`, `court`, `dates`
`reference_extract.csv`	Cited cases in a list for every case (untidy format, for analysis)
`citation159.csv`	File-citation pairs (tidy format, for calculation)
`regulation159.csv`	File-regulation pairs (tidy format, for calculation)
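
The difference between the untidy list format and the tidy pair format in the table above can be sketched as follows. This is an illustrative example only: the column names `file_name` and `citation`, the sample file names, and the helper `to_tidy_pairs` are assumptions, not the repository's actual schema.

```python
import csv
import io


def to_tidy_pairs(references):
    """Flatten {file_name: [cited_case, ...]} into (file_name, cited_case) rows."""
    pairs = []
    for file_name, cited_cases in references.items():
        for cited in cited_cases:
            pairs.append((file_name, cited))
    return pairs


# Hypothetical input: one list of cited cases per file (the untidy shape).
references = {
    "case_a.pdf": ["100 F.3d 200", "300 U.S. 400"],
    "case_b.pdf": ["100 F.3d 200"],
}
pairs = to_tidy_pairs(references)

# Write the tidy pairs as CSV text; a real run would write to a file instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["file_name", "citation"])
writer.writerows(pairs)
print(buffer.getvalue())
```

One row per file-citation pair is the shape that makes counting and joining straightforward in later steps.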

3. Analyze and Visualize using `R`

File	Description
`Calculate Citation Frequency.Rmd`	Analyze `reference_extract.csv`
`Citation Network.Rmd`	Analyze `cite_edge159`
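
The project performs this analysis in R with `igraph`. For readers who prefer Python, here is a rough sketch of one simple way to rank cases from an edge list, using citation counts (in-degree). The README does not state which importance measure the R code uses, so in-degree is an assumption, and the edge data below is made up.

```python
from collections import Counter


def rank_by_citations(edges):
    """Count how often each case is cited (its in-degree) and sort descending."""
    in_degree = Counter(cited for _citing, cited in edges)
    return in_degree.most_common()


# Hypothetical edge list of (citing_case, cited_case) pairs.
edges = [("A", "C"), ("B", "C"), ("A", "D"), ("D", "C")]
ranking = rank_by_citations(edges)
print(ranking)
```

In-degree only measures how often a case is cited directly; other centrality measures would weight the network structure differently.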

4. Visualization Chart Sample

Citation Frequency

Citation Network

Network Visualization and Predictive Modeling on 854 Legal Court Cases (in `Extraction_Modelling` folder)

1. Extract opinion and meta information from raw text data

`.ipynb` notebook	Description
`Full Dataset Merge.ipynb`	Merge the 854 cases dataset
`Edge and Node List.ipynb`	Create edge and node list
`Full Extractions.ipynb`	Extract author, judge panel, opinion text
`Clean Opinion Text.ipynb`	Remove references and special characters in opinion text
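
The cleaning step in the last row might look roughly like the sketch below. It is not the notebook's implementation: the citation pattern (reporter citations such as "550 U.S. 398"), the choice to drop digits and punctuation, and the lowercasing are all assumptions made for illustration.

```python
import re

# Assumed shape of a reporter citation: volume, reporter abbreviation, page,
# e.g. "550 U.S. 398" or "123 F.3d 456".
CITATION_RE = re.compile(r"\b\d+\s+(?:[A-Z][\w.]*\s+)+\d+\b")


def clean_opinion(text):
    """Remove citation-like spans and special characters, then normalize spacing."""
    text = CITATION_RE.sub(" ", text)           # drop reporter citations
    text = re.sub(r"[^A-Za-z\s]", " ", text)    # keep letters and whitespace only
    return " ".join(text.split()).lower()       # collapse whitespace, lowercase


sample = "The court relied on KSR, 550 U.S. 398, and 123 F.3d 456 (Fed. Cir. 1997)!"
print(clean_opinion(sample))
```

Removing citations before stripping punctuation matters here, because the pattern depends on the digits and periods that the second substitution deletes.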

2. Datasets

These datasets are NOT included in this public repository because of intellectual property and privacy concerns.

Dataset	Description
`amy_cases.json`	Large dictionary {file name: raw text} for the 854 cases, from Lilian's PDF parsing
`full_name_text.json`	The key-value pairs of `amy_cases.json` converted to two lists: `file_name`, `raw_text`
`cite_edge.csv`	Edge list of citations
`cite_node.csv`	Node list containing `case_code`, `case_name`, `court_from`, `court_type`
`extraction854.csv`	Full extractions, including `case_code`, `case_name`, `court_from`, `court_type`, `result`, `author`, `judge_panel`
`decision_text.json`	JSON file including `author`, `decision` (result of the case), `opinion` (opinion text), `cleaned_text` (cleaned opinion text)
`cleaned_text.csv`	CSV file containing all the cleaned text
`predict_data.csv`	Cleaned dataset for the NLP model that predicts the court decision
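
The conversion from a {file name: raw text} dictionary to two parallel lists, as described for `full_name_text.json`, can be sketched like this. The exact layout of the output file is an assumption (a dictionary holding the two lists), and the sample data is invented.

```python
import json


def split_keys_values(cases):
    """Turn {file_name: raw_text} into two parallel lists under named keys."""
    return {
        "file_name": list(cases.keys()),
        "raw_text": list(cases.values()),
    }


# Hypothetical stand-in for the contents of the large case dictionary.
cases = {
    "case_001.pdf": "Opinion text ...",
    "case_002.pdf": "Another opinion ...",
}
converted = split_keys_values(cases)
print(json.dumps(converted, indent=2))
```

Because Python dictionaries preserve insertion order, the two lists stay aligned: the text at index `i` belongs to the file name at index `i`.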

3. Visualization using R

R Markdown file	Description
`Full Network Graph.Rmd`	draw the full citation network
`Citation Betwwen Nodes.Rmd`	draw citation between all the available cases
`Clean Data For Predictive Modelling.rmd`	clean text data for predictive modeling

Interactive Graph

Play with Interactive Graph

Full Citation Network (all cases and cited cases)

Citation Between Available Cases

4. Predictive Modeling using Python

`.ipynb` notebook	Description
`NLP Predictive Modeling.ipynb`	Try different preprocessing steps and build a logistic regression model to predict the court decision.

Visualization of the bi-grams (word pairs) with the strongest coefficients

Owner

Yi Yin
