This is an NLP-based project that extracts the effective date of a contract from its text file.

Last update: Jan 26, 2022

Overview

Date-Extraction-from-Contracts

Problem statement

The goal is to identify the effective date of each contract from the contract's text data. Dates can appear in any format, e.g. 01/01/2022, 1st Jan, 2022, 1st January, 2022, 01 Jan 2022, etc.

Libraries Used

NumPy
TensorFlow
Keras
NLTK
scikit-learn
Matplotlib
pandas

Approach

Data preprocessing

A custom function was developed to preprocess the text data, because conventional libraries are not designed for preprocessing dates in a text corpus. NLTK was used for the required tokenization and vectorization instead of the TensorFlow or Keras text preprocessors. Preprocessing includes data cleaning (removing improper data labeling or file naming), stopword removal, punctuation removal that preserves punctuation inside a date such as '/', inserting spaces to separate dates from words (in some cases the numbers in a date were joined to the preceding word), tokenization, and vectorization. A plain word-based vectorization was used, as TF-IDF would not have made much difference for date extraction.
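The steps above can be sketched roughly as follows. This is a minimal illustration, not the project's actual function: it uses only the standard library and a small hard-coded stopword set in place of NLTK so that it runs without extra downloads, and the names `preprocess`, `build_vocab`, and `vectorize` are hypothetical.

```python
import re

# Assumption: a tiny stand-in for NLTK's English stopword list.
STOPWORDS = {"the", "of", "and", "a", "an", "is", "as", "by", "to", "this", "in", "on"}


def preprocess(text):
    """Clean one contract string and return its tokens, keeping dates intact."""
    text = text.lower()
    # Separate a word fused to a following number: "of01/01/2022" -> "of 01/01/2022".
    text = re.sub(r"(?<=[a-z])(?=\d)", " ", text)
    # Drop punctuation, but keep '/' for now because it may belong to a date.
    text = re.sub(r"[^a-z0-9/\s]", " ", text)
    # Remove any '/' that is not sitting between two digits.
    text = re.sub(r"(?<!\d)/|/(?!\d)", " ", text)
    return [tok for tok in text.split() if tok not in STOPWORDS]


def build_vocab(token_lists):
    """Map each distinct token to an integer id; 0 is reserved for padding."""
    vocab = {}
    for tokens in token_lists:
        for tok in tokens:
            if tok not in vocab:
                vocab[tok] = len(vocab) + 1
    return vocab


def vectorize(tokens, vocab, max_len):
    """Plain word-index vectorization, truncated or zero-padded to max_len."""
    ids = [vocab.get(tok, 0) for tok in tokens][:max_len]  # unknown words also map to 0
    return ids + [0] * (max_len - len(ids))


tokens = preprocess("This Agreement is effective as of01/01/2022, by and between the parties.")
vocab = build_vocab([tokens])
vector = vectorize(tokens, vocab, max_len=8)
```

In this sketch the date token `01/01/2022` survives as a single token even though all other punctuation is stripped, which is the property the custom preprocessing is meant to guarantee.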

Preprocessed data before vectorization:

Model Building

The model is an RNN with a bidirectional LSTM layer. Its input is the preprocessed data, and it has 3 outputs that predict the day, month, and year respectively.
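A model of this shape could look roughly like the Keras sketch below. The vocabulary size, sequence length, layer widths, and the choice of three single-unit regression heads are all assumptions made for illustration; the source only states that there is one bidirectional LSTM layer and three outputs.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 5000  # assumption: number of distinct word ids, including padding id 0
MAX_LEN = 200      # assumption: padded length of each vectorized contract


def build_model(vocab_size=VOCAB_SIZE, max_len=MAX_LEN):
    """One bidirectional LSTM layer feeding three heads: day, month, year."""
    inputs = layers.Input(shape=(max_len,), dtype="int32")
    x = layers.Embedding(input_dim=vocab_size, output_dim=64)(inputs)
    x = layers.Bidirectional(layers.LSTM(64))(x)
    # Assumption: each head regresses a single number.
    day = layers.Dense(1, name="day")(x)
    month = layers.Dense(1, name="month")(x)
    year = layers.Dense(1, name="year")(x)
    return tf.keras.Model(inputs=inputs, outputs=[day, month, year])


model = build_model()
# Two dummy, all-padding sequences, just to check the output shapes.
dummy_batch = np.zeros((2, MAX_LEN), dtype="int32")
predictions = model.predict(dummy_batch, verbose=0)
```

Treating day and month as classification targets (softmax heads) would be an equally plausible reading of "3 output values"; the regression heads here are only the simpler of the two options.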

The model was trained with a decaying learning rate starting at 0.001, for 80 epochs with a batch size of 8.
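The source does not say which decay schedule was used. As one possible illustration, the sketch below assumes a per-epoch exponential decay; the decay rate of 0.96 and the training-set size are hypothetical values, while the initial rate, epoch count, and batch size come from the text.

```python
import math

INITIAL_LR = 0.001       # from the text
EPOCHS = 80              # from the text
BATCH_SIZE = 8           # from the text
DECAY_RATE = 0.96        # assumption: not stated in the source
NUM_TRAIN_SAMPLES = 400  # assumption: hypothetical dataset size


def decayed_lr(epoch, initial_lr=INITIAL_LR, decay_rate=DECAY_RATE):
    """Exponential decay: the rate shrinks by a constant factor every epoch."""
    return initial_lr * decay_rate ** epoch


# Number of gradient updates per epoch at this batch size.
steps_per_epoch = math.ceil(NUM_TRAIN_SAMPLES / BATCH_SIZE)

# Learning rate used in each of the 80 epochs.
lr_per_epoch = [decayed_lr(epoch) for epoch in range(EPOCHS)]
```

With Keras, a function like `decayed_lr` could be wrapped for the `tf.keras.callbacks.LearningRateScheduler` callback, which calls its schedule with the epoch index and the current learning rate.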

Model Architecture:

Results

The model performed quite well for a baseline that extracts dates using just a single bidirectional LSTM layer. The prediction file is attached for reference.

Owner

Sambhav Garg
