Natural Language Processing - Sommer Semester 2022

Overview

Natural Language Processing (DIS25a/NLP)

This course can be taken for the Bachelor Programm Data and Information Science (DIS25a) or the Master Program Digital Sciences (NLP).

After easter all sessions are hosted at TH Köln, Claudiusstraße 1. The sessions will be held life. Slides will be usually available a night before the actual lecture. We try to record all lectures and tutorials for later referal (not sure how this works out with the sessions at Claudiusstraße).

Schedule for Summer Semester 2022

(L) Lectures; (T) Tutorials; (P) Project

The first lectures and tutorial were recorded and are available online. The password is the same as for the Zoom sessions.

Date Slot 13:30h Slot 15:15h DIS25a (DIS B.Sc.) NLP (DS M.Sc.)
1.4.2022 Introduction and Overview (L) Basic Text Processing (L) x x
8.4.2022 Basic NLP Pipeline: NLTK (T) (solution) Common Toolkit: Spacy (T) (solution) x x
15.4.2022 no lecture
22.4.2022 WordNet (L) Vector Semantics (L) x x
29.4.2022 WordNet, GermaNet (T) (solution) Vector Semantics (T) (solution) x x
6.5.2022 Information Extraction (L) Sentiment Analysis (L) x x
13.5.2022 no lecture
20.5.2022 Language Models and Ethics in NLP (L) Group assignment (P) x x
27.5.2022 Group work (P) Group work (P) x
3.6.2022 Data Programming for IE (L) Group work (P) / Oral Exam Master x x
10.6.2022 Guest Lecture: Dimitar Dimitrov(L) Group work (P) x
17.6.2022 Group work (P) Group work (P) x
24.6.2022 Student talks - Project presentation (P) Student talks - Project presentation (P) x
31.8.2022 Submission of term papers x

Bachelor: Group Assignments

In the group assignments a group of four students has to work on a bias-related topic with a specific focus and on one of three datasets. In the group work phases starting on 20.5.2022 we will be available during the lecture time to help and advise.

In the presentations on 24.6.2022 you are expected to present a concept regarding your specific topic and dataset. Please decribe the motivation, the dataset, your methods and NLP pipeline, a working prototype and some first insights and results.

The feedback gathered during the presentation should be used to write a final term paper on your specific topic and work. Please read the guidelines for the term paper.

Datasets

Choose one of the following datasets to work on:

Topics

Choose one of the following topcis:

Gender Bias

Gender bias is a group bias in which different genders are represented differently in terms of an aspect in a given (set of) document(s) than expected. Aspects for which there can be a bias range from quantitative measures (e.g., how many documents have male/female authors) to more complex NLP measures (e.g., different sentiments in texts about male/female politicians or topical bias, different distributions of topics in texts geared towards male/female readers).

Exaples for papers that investigate gender bias:

Ethnic Bias

Like gender bias, ethnic or racial bias describes bias towards groups of people belonging to an ethnical (or religious) group. Ethnic bias includes harmful stereotypes and less blatant but still dangerous aspects like topical bias. Detecting ethnic bias is not only important because it may lead to even more severe instances of racism, and it is an infringement of the constitutional right to equal treatment.

Exaples for papers that investigate ethnic bias:

Non-Neutral Speech

Non-neutral language consists of many aspects of language that is subjective, opinionated, or otherwise implies valuation. This includes toxicity, ranging from forms of hate speech such as racism, incivility, profane, offensive and aggressive language to over-positive praises. Non-neutral language is especially problematic when it appears in types of documents that claim to be neutral, such as wikipedia or (public) news. A related concept is framing bias, defined as the use of subjective words or phrases linked with a particular opinion.

Exaples for papers that investigate non-neutral language:

Stance Detection

Stance is a concept that describes an opinion on a subject, most often in a political context. The goal of stance detection is to detect the stances of users/authors towards these subjects. Often, the subjects are known due to context (for example, abortion, weapon laws and gay marriage in political texts) or they have to be determined using approaches like entity recognition. A related concept is that of target-dependent or aspect-based sentiment analysis, in which the opinions on aspects (targets) are detected.

Exaples for papers that investigate stance detection:

Owner
Classrooms of IR Group at Technische Hochschule Köln
Classrooms of IR Group at Technische Hochschule Köln
Tinyman exploit finder - Tinyman exploit finder for python

tinyman_exploit_finder There was a big tinyman exploit. You can read about it he

fish.exe 9 Dec 27, 2022
A Burp Pro extension that adds log4shell checks to Burp Scanner

scan4log4shell A Burp Pro extension that adds log4shell checks to Burp Scanner, written by Daniel Crowley of IBM X-Force Red. Installation To install

X-Force Red 26 Mar 15, 2022
A compact version of EDI-Vetter, which uses the TLS output to quickly vet transit signals.

A compact version of EDI-Vetter, which uses the TLS output to quickly vet transit signals. All your favorite hits in a simplified format.

Jon Zink 2 Aug 03, 2022
An Advanced Local Network IP Scanner, made in python of course!

██╗██████╗    ██████╗ █████╗ █████╗ ███╗ ██╗███╗ ██╗███████╗██████╗ ██║██╔══██╗  ██╔════╝██╔══██╗██╔══██╗████╗ ██║████╗ ██║██╔════╝██╔══██

Polsulpicien 2 Dec 18, 2021
Infection Monkey - An automated pentest tool

Infection Monkey Data center Security Testing Tool Welcome to the Infection Monkey! The Infection Monkey is an open source security tool for testing a

Guardicore Ltd. 6k Jan 09, 2023
An forensics tool to help aid in the investigation of spoofed emails based off the email headers.

A forensic tool to make analysis of email headers easy to aid in the quick discovery of the attacker. Table of Contents About mailMeta Installation Us

Syed Modassir Ali 59 Nov 26, 2022
domato but as a website

ROFL-FUZZER Ths is Domato, a DOM Fuzzer from Google, but hosted as an website It generates a instance of a newtab on the template given by the user ,

Swapnadeep Som 18 Nov 22, 2021
macOS Initial Access Payload Generator

Mystikal macOS Initial Access Payload Generator Related Blog Post: https://posts.specterops.io/introducing-mystikal-4fbd2f7ae520 Usage: Install Xcode

Leo Pitt 206 Dec 31, 2022
labsecurity is a framework and its use is for ethical hacking and computer security

labsecurity labsecurity is a framework and its use is for ethical hacking and computer security. Warning This tool is only for educational purpose. If

Dylan Meca 16 Dec 08, 2022
A Burp extension adding a passive scan check to flag parameters whose name or value may indicate a possible insertion point for SSRF or LFI.

BurpParamFlagger A Burp extension adding a passive scan check to flag parameters whose name or value may indicate a possible insertion point for SSRF

Allyson O'Malley 118 Nov 07, 2022
Multi-Process Vulnerability Tool

Multi-Process Vulnerability Tool

Baris Dincer 1 Dec 22, 2021
Polkit - Local Privilege Escalation (CVE-2021-3560)

CVE-2021-3560 Polkit - Local Privilege Escalation Original discovery by kevin_backhouse from GitHub Security Lab References https://github.blog/2021-0

Salman Asad 1 Nov 12, 2021
Script checks provided domains for log4j vulnerability

log4j Script checks provided domains for log4j vulnerability. A token is created with canarytokens.org and passed as header at request for a single do

Matthias Nehls 2 Dec 12, 2021
Some Attacks of Exchange SSRF ProxyLogon&ProxyShell

Some Attacks of Exchange SSRF This project is heavily replicated in ProxyShell, NtlmRelayToEWS https://mp.weixin.qq.com/s/GFcEKA48bPWsezNdVcrWag Get 1

Jumbo 129 Dec 30, 2022
Threat Intel Platform for T-POTs

GreedyBear The project goal is to extract data of the attacks detected by a TPOT or a cluster of them and to generate some feeds that can be used to p

The Honeynet Project 72 Jan 01, 2023
A tool for making python source difficult to read.

obscurepy Description A tool for obscuring, or making python source code difficult to read. Table of Contents Installation Limitations Usage Disclaime

Andrew Christiansen 10 Jul 31, 2022
The Multi-Tool Web Vulnerability Scanner.

🟥 RapidScan v1.2 - The Multi-Tool Web Vulnerability Scanner RapidScan has been ported to Python3 i.e. v1.2. The Python2.7 codebase is available on v1

skavngr 1.3k Dec 31, 2022
Lazarus analysis tools and research report

Lazarus Research This repository publishes analysis reports and analysis tools for Operation Dream Job and Operation JTrack for Lazarus. Tools Python

JPCERT Coordination Center 50 Sep 13, 2022
Source code for "A Two-Stream AMR-enhanced Model for Document-level Event Argument Extraction" @ NAACL 2022

TSAR Source code for NAACL 2022 paper: A Two-Stream AMR-enhanced Model for Document-level Event Argument Extraction. 🔥 Introduction We focus on extra

21 Sep 24, 2022