Natural Language Processing - Sommer Semester 2022

Overview

Natural Language Processing (DIS25a/NLP)

This course can be taken for the Bachelor Programm Data and Information Science (DIS25a) or the Master Program Digital Sciences (NLP).

After easter all sessions are hosted at TH Köln, Claudiusstraße 1. The sessions will be held life. Slides will be usually available a night before the actual lecture. We try to record all lectures and tutorials for later referal (not sure how this works out with the sessions at Claudiusstraße).

Schedule for Summer Semester 2022

(L) Lectures; (T) Tutorials; (P) Project

The first lectures and tutorial were recorded and are available online. The password is the same as for the Zoom sessions.

Date Slot 13:30h Slot 15:15h DIS25a (DIS B.Sc.) NLP (DS M.Sc.)
1.4.2022 Introduction and Overview (L) Basic Text Processing (L) x x
8.4.2022 Basic NLP Pipeline: NLTK (T) (solution) Common Toolkit: Spacy (T) (solution) x x
15.4.2022 no lecture
22.4.2022 WordNet (L) Vector Semantics (L) x x
29.4.2022 WordNet, GermaNet (T) (solution) Vector Semantics (T) (solution) x x
6.5.2022 Information Extraction (L) Sentiment Analysis (L) x x
13.5.2022 no lecture
20.5.2022 Language Models and Ethics in NLP (L) Group assignment (P) x x
27.5.2022 Group work (P) Group work (P) x
3.6.2022 Data Programming for IE (L) Group work (P) / Oral Exam Master x x
10.6.2022 Guest Lecture: Dimitar Dimitrov(L) Group work (P) x
17.6.2022 Group work (P) Group work (P) x
24.6.2022 Student talks - Project presentation (P) Student talks - Project presentation (P) x
31.8.2022 Submission of term papers x

Bachelor: Group Assignments

In the group assignments a group of four students has to work on a bias-related topic with a specific focus and on one of three datasets. In the group work phases starting on 20.5.2022 we will be available during the lecture time to help and advise.

In the presentations on 24.6.2022 you are expected to present a concept regarding your specific topic and dataset. Please decribe the motivation, the dataset, your methods and NLP pipeline, a working prototype and some first insights and results.

The feedback gathered during the presentation should be used to write a final term paper on your specific topic and work. Please read the guidelines for the term paper.

Datasets

Choose one of the following datasets to work on:

Topics

Choose one of the following topcis:

Gender Bias

Gender bias is a group bias in which different genders are represented differently in terms of an aspect in a given (set of) document(s) than expected. Aspects for which there can be a bias range from quantitative measures (e.g., how many documents have male/female authors) to more complex NLP measures (e.g., different sentiments in texts about male/female politicians or topical bias, different distributions of topics in texts geared towards male/female readers).

Exaples for papers that investigate gender bias:

Ethnic Bias

Like gender bias, ethnic or racial bias describes bias towards groups of people belonging to an ethnical (or religious) group. Ethnic bias includes harmful stereotypes and less blatant but still dangerous aspects like topical bias. Detecting ethnic bias is not only important because it may lead to even more severe instances of racism, and it is an infringement of the constitutional right to equal treatment.

Exaples for papers that investigate ethnic bias:

Non-Neutral Speech

Non-neutral language consists of many aspects of language that is subjective, opinionated, or otherwise implies valuation. This includes toxicity, ranging from forms of hate speech such as racism, incivility, profane, offensive and aggressive language to over-positive praises. Non-neutral language is especially problematic when it appears in types of documents that claim to be neutral, such as wikipedia or (public) news. A related concept is framing bias, defined as the use of subjective words or phrases linked with a particular opinion.

Exaples for papers that investigate non-neutral language:

Stance Detection

Stance is a concept that describes an opinion on a subject, most often in a political context. The goal of stance detection is to detect the stances of users/authors towards these subjects. Often, the subjects are known due to context (for example, abortion, weapon laws and gay marriage in political texts) or they have to be determined using approaches like entity recognition. A related concept is that of target-dependent or aspect-based sentiment analysis, in which the opinions on aspects (targets) are detected.

Exaples for papers that investigate stance detection:

Owner
Classrooms of IR Group at Technische Hochschule Köln
Classrooms of IR Group at Technische Hochschule Köln
nuclei scanner for proxyshell ( CVE-2021-34473 )

Proxyshell-Scanner nuclei scanner for Proxyshell RCE (CVE-2021-34423,CVE-2021-34473,CVE-2021-31207) discovered by orange tsai in Pwn2Own, which affect

PikaChu 29 Dec 16, 2022
Exploiting CVE-2021-42278 and CVE-2021-42287 to impersonate DA from standard domain user

About Exploiting CVE-2021-42278 and CVE-2021-42287 to impersonate DA from standard domain user Changed from sam-the-admin. Usage SAM THE ADMIN CVE-202

Evi1cg 500 Jan 06, 2023
Generate your own NFTs and their metadata based on your desired probabilities.

Generate your own NFTs and their metadata based on your desired probabilities. Use your own art assets too! Perfect for use with Candy Machine.

hex 7 Sep 16, 2022
A gui application used for network reconnaissance while pentesting

netrecon A gui application used for network reconnaissance while pentesting

Krisna Pranav 4 Sep 03, 2022
A Python application to predict what is cooking

ez-cuisine-classifier A Python application to predict what is cooking Environment Python 3.9 Windows 10 Install python -m venv venv .\venv\Scripts\act

Zeheng Li 1 Jun 21, 2022
A Tool to find subdomains from hackerone reports.

Hactivity A Tool to find subdomains from Hackerone reports of a given company or a search term (xss, ssrf, etc). It can also print out URL and Title o

Stinger 15 Jul 24, 2022
Proof of Concept Exploit for vCenter CVE-2021-21972

CVE-2021-21972 Proof of Concept Exploit for vCenter CVE-2021-21972

Horizon 3 AI Inc 210 Dec 31, 2022
Python tool for dumping flash via uboot reliably

Reliable Uboot Flash Dumper is a Python tool for dumping flash via uboot reliably. If you've ever had to dump flash via uboot and a serial connection and became frustrated about doing it several time

SecurityJon 25 May 10, 2022
A blind SQL injection script that uses binary search aka bisection method to dump datas from database.

Blind SQL Injection I wrote this script to solve PortSwigger Web Security Academy's particular Blind SQL injection with conditional responses lab. Bec

Şefik Efe 2 Oct 29, 2022
This project is for finding a solution to use Security Onion Elastic data with Jupyter Notebooks.

This project is for finding a solution to use Security Onion Elastic data with Jupyter Notebooks. The goal is to successfully use this notebook project below with Security Onion for beacon detection

4 Jun 08, 2022
Buff A simple BOF library I wrote under an hour to help me automate with BOF attack

What is Buff? A simple BOF library I wrote under an hour to help me automate with BOF attack. It comes with fuzzer and a generic method to generate ex

0x00 3 Nov 21, 2022
Exploit for CVE-2017-17562 vulnerability, that allows RCE on GoAhead (< v3.6.5) if the CGI is enabled and a CGI program is dynamically linked.

GoAhead RCE Exploit Exploit for CVE-2017-17562 vulnerability, that allows RCE on GoAhead ( v3.6.5) if the CGI is enabled and a CGI program is dynamic

Francisco Spínola 2 Dec 12, 2021
Jolokia Exploitation Toolkit (JET) helps exploitation of exposed jolokia endpoints.

jolokia-exploitation-toolkit Jolokia Exploitation Toolkit (JET) helps exploitation of exposed jolokia endpoints. Core concept Jolokia is a protocol br

Laluka 194 Jan 01, 2023
Universal Radio Hacker: Investigate Wireless Protocols Like A Boss

The Universal Radio Hacker (URH) is a complete suite for wireless protocol investigation with native support for many common Software Defined Radios.

Dr. Johannes Pohl 9k Jan 03, 2023
C++ fully undetected shellcode launcher

charlotte c++ fully undetected shellcode launcher ;) releasing this to celebrate the birth of my newborn description 13/05/2021: c++ shellcode launche

894 Dec 25, 2022
HatSploit native powerful payload generation and shellcode injection tool that provides support for common platforms and architectures.

HatVenom HatSploit native powerful payload generation and shellcode injection tool that provides support for common platforms and architectures. Featu

EntySec 100 Dec 23, 2022
A python implementation of the windows 95 product key check.

Windows 95 Product Key Check Info: This is a python implementation of the windows 95 product key check. This was just a bit of fun and a massive 5 hou

11 Aug 07, 2022
A script to extract SNESticle from Fight Night Round 2

fn22snesticle.py A script for producing a SNESticle ISO from a Fight Night Round 2 ISO and any SNES ROM. Background Fight Night Round 2 is a boxing ga

Johannes Holmberg 57 Nov 22, 2022
An advanced multi-threaded, multi-client python reverse shell for hacking linux systems

PwnLnX An advanced multi-threaded, multi-client python reverse shell for hacking linux systems. There's still more work to do so feel free to help out

0xTRAW 212 Dec 24, 2022
On-demand scanning for container registries

Lacework registry scanner Install & configure Lacework CLI Integrate a Container Registry Go to Lacework Resources Containers Container Image In

Will Robinson 1 Dec 14, 2021