Natural Language Processing - Sommer Semester 2022

Overview

Natural Language Processing (DIS25a/NLP)

This course can be taken for the Bachelor Programm Data and Information Science (DIS25a) or the Master Program Digital Sciences (NLP).

After easter all sessions are hosted at TH Köln, Claudiusstraße 1. The sessions will be held life. Slides will be usually available a night before the actual lecture. We try to record all lectures and tutorials for later referal (not sure how this works out with the sessions at Claudiusstraße).

Schedule for Summer Semester 2022

(L) Lectures; (T) Tutorials; (P) Project

The first lectures and tutorial were recorded and are available online. The password is the same as for the Zoom sessions.

Date Slot 13:30h Slot 15:15h DIS25a (DIS B.Sc.) NLP (DS M.Sc.)
1.4.2022 Introduction and Overview (L) Basic Text Processing (L) x x
8.4.2022 Basic NLP Pipeline: NLTK (T) (solution) Common Toolkit: Spacy (T) (solution) x x
15.4.2022 no lecture
22.4.2022 WordNet (L) Vector Semantics (L) x x
29.4.2022 WordNet, GermaNet (T) (solution) Vector Semantics (T) (solution) x x
6.5.2022 Information Extraction (L) Sentiment Analysis (L) x x
13.5.2022 no lecture
20.5.2022 Language Models and Ethics in NLP (L) Group assignment (P) x x
27.5.2022 Group work (P) Group work (P) x
3.6.2022 Data Programming for IE (L) Group work (P) / Oral Exam Master x x
10.6.2022 Guest Lecture: Dimitar Dimitrov(L) Group work (P) x
17.6.2022 Group work (P) Group work (P) x
24.6.2022 Student talks - Project presentation (P) Student talks - Project presentation (P) x
31.8.2022 Submission of term papers x

Bachelor: Group Assignments

In the group assignments a group of four students has to work on a bias-related topic with a specific focus and on one of three datasets. In the group work phases starting on 20.5.2022 we will be available during the lecture time to help and advise.

In the presentations on 24.6.2022 you are expected to present a concept regarding your specific topic and dataset. Please decribe the motivation, the dataset, your methods and NLP pipeline, a working prototype and some first insights and results.

The feedback gathered during the presentation should be used to write a final term paper on your specific topic and work. Please read the guidelines for the term paper.

Datasets

Choose one of the following datasets to work on:

Topics

Choose one of the following topcis:

Gender Bias

Gender bias is a group bias in which different genders are represented differently in terms of an aspect in a given (set of) document(s) than expected. Aspects for which there can be a bias range from quantitative measures (e.g., how many documents have male/female authors) to more complex NLP measures (e.g., different sentiments in texts about male/female politicians or topical bias, different distributions of topics in texts geared towards male/female readers).

Exaples for papers that investigate gender bias:

Ethnic Bias

Like gender bias, ethnic or racial bias describes bias towards groups of people belonging to an ethnical (or religious) group. Ethnic bias includes harmful stereotypes and less blatant but still dangerous aspects like topical bias. Detecting ethnic bias is not only important because it may lead to even more severe instances of racism, and it is an infringement of the constitutional right to equal treatment.

Exaples for papers that investigate ethnic bias:

Non-Neutral Speech

Non-neutral language consists of many aspects of language that is subjective, opinionated, or otherwise implies valuation. This includes toxicity, ranging from forms of hate speech such as racism, incivility, profane, offensive and aggressive language to over-positive praises. Non-neutral language is especially problematic when it appears in types of documents that claim to be neutral, such as wikipedia or (public) news. A related concept is framing bias, defined as the use of subjective words or phrases linked with a particular opinion.

Exaples for papers that investigate non-neutral language:

Stance Detection

Stance is a concept that describes an opinion on a subject, most often in a political context. The goal of stance detection is to detect the stances of users/authors towards these subjects. Often, the subjects are known due to context (for example, abortion, weapon laws and gay marriage in political texts) or they have to be determined using approaches like entity recognition. A related concept is that of target-dependent or aspect-based sentiment analysis, in which the opinions on aspects (targets) are detected.

Exaples for papers that investigate stance detection:

Owner
Classrooms of IR Group at Technische Hochschule Köln
Classrooms of IR Group at Technische Hochschule Köln
RapiDAST provides a framework for continuous, proactive and fully automated dynamic scanning against web apps/API.

RapiDAST RapiDAST provides a framework for continuous, proactive and fully automated dynamic scanning against web apps/API. Its core engine is OWASP Z

Red Hat Product Security 17 Nov 11, 2022
This is a Crypto asset tracker that I built to aid my personal journey in cryptocurrencies.

Wallet Tracker This is a Crypto asset tracker that I built to aid my personal journey in cryptocurrencies. build docker build -t wallet-tracker . run

2 Mar 21, 2022
log4j-tools: CVE-2021-44228 poses a serious threat to a wide range of Java-based applications

log4j-tools Quick links Click to find: Inclusions of log4j2 in compiled code Calls to log4j2 in compiled code Calls to log4j2 in source code Overview

JFrog Ltd. 171 Dec 25, 2022
ProxyShell POC Exploit : Exchange Server RCE (ACL Bypass + EoP + Arbitrary File Write)

ProxyShell Install git clone https://github.com/ktecv2000/ProxyShell cd ProxyShell virtualenv -p $(which python3) venv source venv/bin/activate pip3 i

Poming huang 312 Dec 09, 2022
A forensic collection tool written in Python.

CHIRP A forensic collection tool written in Python. Watch the video overview 📝 Table of Contents 📝 Table of Contents 🧐 About 🏁 Getting Started Pre

Cybersecurity and Infrastructure Security Agency 1k Dec 09, 2022
An automated header extensive scanner for detecting log4j RCE CVE-2021-44228

log4j An automated header extensive scanner for detecting log4j RCE CVE-2021-44228 Usage $ python3 log4j.py -l urls.txt --dns-log REPLACE_THIS.dnslog.

2 Dec 16, 2021
A windows post exploitation tool that contains a lot of features for information gathering and more.

Crowbar - A windows post exploitation tool Status - ✔️ This project is now considered finished. Any updates from now on will most likely be new script

29 Nov 20, 2022
Pre-Auth Blind NoSQL Injection leading to Remote Code Execution in Rocket Chat 3.12.1

CVE-2021-22911 Pre-Auth Blind NoSQL Injection leading to Remote Code Execution in Rocket Chat 3.12.1 The getPasswordPolicy method is vulnerable to NoS

Enox 47 Nov 09, 2022
This repository is one of a few malware collections on the GitHub.

This repository is one of a few malware collections on the GitHub.

Andrew 1.7k Dec 28, 2022
spring-cloud-gateway-rce CVE-2022-22947

Spring Cloud Gateway Actuator API SpEL表达式注入命令执行(CVE-2022-22947) 1.installation pip3 install -r requirements.txt 2.Usage $ python3 spring-cloud-gateway

k3rwin 10 Sep 28, 2022
labsecurity is a framework and its use is for ethical hacking and computer security

labsecurity labsecurity is a framework and its use is for ethical hacking and computer security. Warning This tool is only for educational purpose. If

Dylan Meca 16 Dec 08, 2022
NoSecerets is a python script that is designed to crack hashes extremely fast. Faster even than Hashcat

NoSecerets NoSecerets is a python script that is designed to crack hashes extremely fast. Faster even than Hashcat How does it work? Instead of taking

DosentTrust GithubDatabase 9 Jul 04, 2022
Threat research and reporting from IronNet's Threat Research Teams

IronNet Threat Research 🕵️ Overview This repository contains IronNet's Threat Research. Research & Reporting 📝 Project Description Cobalt Strike Res

36 Dec 02, 2022
DNSpooq - dnsmasq cache poisoning (CVE-2020-25686, CVE-2020-25684, CVE-2020-25685)

dnspooq DNSpooq PoC - dnsmasq cache poisoning (CVE-2020-25686, CVE-2020-25684, CVE-2020-25685) For educational purposes only Requirements Docker compo

Teppei Fukuda 80 Nov 28, 2022
Password list generator for password spraying - prebaked with goodies

Generates permutations of Months, Seasons, Years, Sports Teams (NFL, NBA, MLB, NHL), Sports Scores, "Password", and even Iterable Keyspaces of a specified size.

Casey Erdmann 65 Dec 22, 2022
NS-LOOKUP - A python script for scanning website for getting ip address of a website

NS-LOOKUP A python script for scanning website for getting ip address of a websi

Spider Anongreyhat 5 Aug 02, 2022
ADExplorerSnapshot.py is an AD Explorer snapshot ingestor for BloodHound.

ADExplorerSnapshot.py ADExplorerSnapshot.py is an AD Explorer snapshot ingestor for BloodHound. AD Explorer allows you to connect to a DC and browse L

576 Dec 23, 2022
Auto Tor Ip Changer

AutoTor Auto Tor Ip Changer for Linux! git clone https://github.com/Arest7/AutoTor cd AutoTor pip install -r requirements.txt python3 AutoTor.py follo

Ken Ryuguji 3 Jan 23, 2022
Laravel RCE (CVE-2021-3129)

CVE-2021-3129 - Laravel RCE About The script has been made for exploiting the Laravel RCE (CVE-2021-3129) vulnerability. This script allows you to wri

Joshua van der Poll 21 Dec 27, 2022