Quantifiers-and-Negations-in-RE-Documents

This project was part of my work for a seminar at the Technical University of Munich (TUM) during my bachelor studies in 2019. The python project can be used to find quantifiers and negations in documents. It searches for problematic findings. Problematic findings are i.e. sentences that use specific combinations of quantifiers and negations that are ambiguous. This means there are multiple valid interpretations of the sentence. It can extract those and report them.

Motivation:

You want to avoid ambiguous sentences as they can cause problems that are hard to find and possibly hard to fix. This is especially the case for technical specifications and similar use cases. In this project we compare two different approaches to finding ambiguous sentences:

String based search
NLP based search

We want to find out if the computational overhead of using NLP gives better results than standard string based search methods.

Features:

Detect quantifiers and negations in .xml or .txt documents
Search either by a string based search or by NLP based search (using Stanfords CoreNLP library [1])
Extract possibly ambiguous sentences
Compare string search results with NLP search results

Prerequisites:

Java 8 or higher
Python 3.6 or higher as project interpreter
Stanford Corenlp library: https://stanfordnlp.github.io/CoreNLP/download.html
Environment variable "CORENLP_HOME" set to where the CoreNLP library is stored

References:

[1] Christopher D.Manning, MihaiSurdeanu, JohnBauer, JennyFinkel, StevenJ.Bethard, and David McClosky. The Stanford CoreNLP natural language processing toolkit. In Association for Computational Linguistics (ACL) System Demonstrations, pages 55–60, 2014.

Quantifiers and Negations in RE Documents

Related tags

Overview

Quantifiers-and-Negations-in-RE-Documents

Owner

Nicolas Ruscher

In this project, we compared Spanish BERT and Multilingual BERT in the Sentiment Analysis task.

Pipelines de datos, 2021.

Multi-Task Pre-Training for Plug-and-Play Task-Oriented Dialogue System

pytorch-kaldi is a project for developing state-of-the-art DNN/RNN hybrid speech recognition systems. The DNN part is managed by pytorch, while feature extraction, label computation, and decoding are performed with the kaldi toolkit.

Statistics and Mathematics for Machine Learning, Deep Learning , Deep NLP

Various Algorithms for Short Text Mining

The code for the Subformer, from the EMNLP 2021 Findings paper: "Subformer: Exploring Weight Sharing for Parameter Efficiency in Generative Transformers", by Machel Reid, Edison Marrese-Taylor, and Yutaka Matsuo

NewsMTSC: (Multi-)Target-dependent Sentiment Classification in News Articles

Transcribing audio files using Hugging Face's implementation of Wav2Vec2 + "chain-linking" NLP tasks to combine speech-to-text with downstream tasks like translation and summarisation.

Code for lyric-section-to-comment generation based on huggingface transformers.

HuggingSound: A toolkit for speech-related tasks based on HuggingFace's tools

Chinese segmentation library

Understanding the Difficulty of Training Transformers

This project uses unsupervised machine learning to identify correlations between daily inoculation rates in the USA and twitter sentiment in regards to COVID-19.

Official Stanford NLP Python Library for Many Human Languages

SimpleChinese2 集成了许多基本的中文NLP功能，使基于 Python 的中文文字处理和信息提取变得简单方便。

chaii - hindi & tamil question answering

Ελληνικά νέα (Python script) / Greek News Feed (Python script)

This repo stores the codes for topic modeling on palliative care journals.

Code for our paper "Transfer Learning for Sequence Generation: from Single-source to Multi-source" in ACL 2021.