Natural Language Processing - Summer Semester 2022

Overview

Natural Language Processing (DIS25a/NLP)

This course can be taken as part of the Bachelor program Data and Information Science (DIS25a) or the Master program Digital Sciences (NLP).

After Easter, all sessions are hosted at TH Köln, Claudiusstraße 1. The sessions will be held live. Slides will usually be available the night before the lecture. We try to record all lectures and tutorials for later reference (it is not yet clear how well this will work for the sessions at Claudiusstraße).

Schedule for Summer Semester 2022

(L) Lectures; (T) Tutorials; (P) Project

The first lectures and tutorial were recorded and are available online. The password is the same as for the Zoom sessions.

Date | Slot 13:30h | Slot 15:15h | DIS25a (DIS B.Sc.) / NLP (DS M.Sc.)
1.4.2022 | Introduction and Overview (L) | Basic Text Processing (L) | x x
8.4.2022 | Basic NLP Pipeline: NLTK (T) (solution) | Common Toolkit: Spacy (T) (solution) | x x
15.4.2022 | no lecture | |
22.4.2022 | WordNet (L) | Vector Semantics (L) | x x
29.4.2022 | WordNet, GermaNet (T) (solution) | Vector Semantics (T) (solution) | x x
6.5.2022 | Information Extraction (L) | Sentiment Analysis (L) | x x
13.5.2022 | no lecture | |
20.5.2022 | Language Models and Ethics in NLP (L) | Group assignment (P) | x x
27.5.2022 | Group work (P) | Group work (P) | x
3.6.2022 | Data Programming for IE (L) | Group work (P) / Oral Exam Master | x x
10.6.2022 | Guest Lecture: Dimitar Dimitrov (L) | Group work (P) | x
17.6.2022 | Group work (P) | Group work (P) | x
24.6.2022 | Student talks - Project presentation (P) | Student talks - Project presentation (P) | x
31.8.2022 | Submission of term papers | | x

Bachelor: Group Assignments

In the group assignments, groups of four students work on a bias-related topic with a specific focus, using one of three datasets. During the group work phases starting on 20.5.2022, we will be available during lecture time to help and advise.

In the presentations on 24.6.2022, you are expected to present a concept for your specific topic and dataset. Please describe the motivation, the dataset, your methods and NLP pipeline, a working prototype, and some first insights and results.

The feedback gathered during the presentation should be used to write a final term paper on your specific topic and work. Please read the guidelines for the term paper.

Datasets

Choose one of the following datasets to work on:

Topics

Choose one of the following topics:

Gender Bias

Gender bias is a group bias in which different genders are represented differently than expected with respect to some aspect of a given document or set of documents. Such aspects range from quantitative measures (e.g., how many documents have male/female authors) to more complex NLP measures (e.g., different sentiments in texts about male/female politicians, or topical bias, i.e., different distributions of topics in texts geared towards male/female readers).

Examples of papers that investigate gender bias:
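
The simplest quantitative measure mentioned above, the share of documents per author gender compared with an expected share, can be sketched as follows. This is only a minimal illustration: the function name `representation_gap`, the gender labels, and the expected shares are assumptions made for this example and are not part of the course material or the datasets.

```python
from collections import Counter


def representation_gap(author_genders, expected_share):
    """Return observed share minus expected share for each gender label.

    author_genders: one label per document (hypothetical labels, e.g. "f"/"m").
    expected_share: dict mapping each label to the share we would expect.
    """
    counts = Counter(author_genders)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no documents given")
    # A positive gap means the group is over-represented, a negative gap
    # means it is under-represented relative to the expectation.
    return {
        gender: counts.get(gender, 0) / total - share
        for gender, share in expected_share.items()
    }


if __name__ == "__main__":
    # Toy data: four documents, one of them with a female author.
    print(representation_gap(["f", "m", "m", "m"], {"f": 0.5, "m": 0.5}))
```

What counts as the "expected" share is a modelling decision that has to be justified for each dataset; the 50/50 split here is only a placeholder.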

Ethnic Bias

Like gender bias, ethnic or racial bias describes bias towards groups of people belonging to an ethnic (or religious) group. Ethnic bias includes harmful stereotypes as well as less blatant but still dangerous aspects such as topical bias. Detecting ethnic bias is important not only because it may lead to even more severe instances of racism, but also because it is an infringement of the constitutional right to equal treatment.

Examples of papers that investigate ethnic bias:

Non-Neutral Speech

Non-neutral language covers many aspects of language that are subjective, opinionated, or otherwise imply valuation. This includes toxicity, ranging from forms of hate speech such as racism, incivility, and profane, offensive, or aggressive language, to overly positive praise. Non-neutral language is especially problematic when it appears in types of documents that claim to be neutral, such as Wikipedia or (public) news. A related concept is framing bias, defined as the use of subjective words or phrases linked with a particular opinion.

Examples of papers that investigate non-neutral language:
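
One very simple way to make "subjective words or phrases" operational is a lexicon lookup. The sketch below is a toy baseline under stated assumptions: the word list `SUBJECTIVE_WORDS` and the function `subjective_terms` are invented for illustration, and a real project would use an established subjectivity lexicon or a trained classifier instead.

```python
import re

# Hypothetical mini-lexicon of subjective terms, for illustration only.
SUBJECTIVE_WORDS = {"brilliant", "notorious", "disastrous", "so-called"}


def subjective_terms(text, lexicon=SUBJECTIVE_WORDS):
    """Return the lexicon terms found in the text, in order of appearance."""
    # Lowercase the text and keep hyphenated words such as "so-called" whole.
    tokens = re.findall(r"[a-z]+(?:-[a-z]+)*", text.lower())
    return [token for token in tokens if token in lexicon]


if __name__ == "__main__":
    sentence = "The notorious senator gave a brilliant speech."
    print(subjective_terms(sentence))
```

A lookup like this ignores context (negation, quotation, irony), so it is best seen as a first filter for finding candidate sentences rather than as a bias detector.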

Stance Detection

Stance is a concept that describes an opinion on a subject, most often in a political context. The goal of stance detection is to detect the stances of users/authors towards these subjects. The subjects are either known from context (for example, abortion, weapon laws, and gay marriage in political texts) or have to be determined using approaches like entity recognition. A related concept is target-dependent or aspect-based sentiment analysis, in which the opinions on aspects (targets) are detected.

Examples of papers that investigate stance detection:
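
For the case where the subject is already known from context, a deliberately naive cue-word baseline might look like the sketch below. Everything in it is an assumption made for illustration: the cue lists `PRO_CUES` and `CON_CUES`, the label names, and the function `detect_stance` are hypothetical and do not come from the course or from any library.

```python
import re

# Hypothetical cue words signalling agreement or disagreement.
PRO_CUES = {"support", "favor", "endorse"}
CON_CUES = {"oppose", "against", "reject"}


def detect_stance(text, target):
    """Classify the stance of a text towards a known target phrase.

    Returns "none" if the target is not mentioned, otherwise "favor",
    "against", or "neutral", depending on which cue words dominate.
    """
    lowered = text.lower()
    if target.lower() not in lowered:
        return "none"
    tokens = re.findall(r"[a-z]+", lowered)
    score = sum(t in PRO_CUES for t in tokens) - sum(t in CON_CUES for t in tokens)
    if score > 0:
        return "favor"
    if score < 0:
        return "against"
    return "neutral"


if __name__ == "__main__":
    print(detect_stance("I strongly oppose stricter weapon laws.", "weapon laws"))
    print(detect_stance("We support gay marriage.", "gay marriage"))
```

The first example also shows the main weakness of this baseline: the author opposes stricter laws, which a cue-word count cannot distinguish from opposing the laws themselves. This is one reason why stance detection is usually treated as a supervised learning problem.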

Owner
Classrooms of IR Group at Technische Hochschule Köln