Natural Language Processing - Sommer Semester 2022

Overview

Natural Language Processing (DIS25a/NLP)

This course can be taken for the Bachelor Programm Data and Information Science (DIS25a) or the Master Program Digital Sciences (NLP).

After easter all sessions are hosted at TH Köln, Claudiusstraße 1. The sessions will be held life. Slides will be usually available a night before the actual lecture. We try to record all lectures and tutorials for later referal (not sure how this works out with the sessions at Claudiusstraße).

Schedule for Summer Semester 2022

(L) Lectures; (T) Tutorials; (P) Project

The first lectures and tutorial were recorded and are available online. The password is the same as for the Zoom sessions.

Date Slot 13:30h Slot 15:15h DIS25a (DIS B.Sc.) NLP (DS M.Sc.)
1.4.2022 Introduction and Overview (L) Basic Text Processing (L) x x
8.4.2022 Basic NLP Pipeline: NLTK (T) (solution) Common Toolkit: Spacy (T) (solution) x x
15.4.2022 no lecture
22.4.2022 WordNet (L) Vector Semantics (L) x x
29.4.2022 WordNet, GermaNet (T) (solution) Vector Semantics (T) (solution) x x
6.5.2022 Information Extraction (L) Sentiment Analysis (L) x x
13.5.2022 no lecture
20.5.2022 Language Models and Ethics in NLP (L) Group assignment (P) x x
27.5.2022 Group work (P) Group work (P) x
3.6.2022 Data Programming for IE (L) Group work (P) / Oral Exam Master x x
10.6.2022 Guest Lecture: Dimitar Dimitrov(L) Group work (P) x
17.6.2022 Group work (P) Group work (P) x
24.6.2022 Student talks - Project presentation (P) Student talks - Project presentation (P) x
31.8.2022 Submission of term papers x

Bachelor: Group Assignments

In the group assignments a group of four students has to work on a bias-related topic with a specific focus and on one of three datasets. In the group work phases starting on 20.5.2022 we will be available during the lecture time to help and advise.

In the presentations on 24.6.2022 you are expected to present a concept regarding your specific topic and dataset. Please decribe the motivation, the dataset, your methods and NLP pipeline, a working prototype and some first insights and results.

The feedback gathered during the presentation should be used to write a final term paper on your specific topic and work. Please read the guidelines for the term paper.

Datasets

Choose one of the following datasets to work on:

Topics

Choose one of the following topcis:

Gender Bias

Gender bias is a group bias in which different genders are represented differently in terms of an aspect in a given (set of) document(s) than expected. Aspects for which there can be a bias range from quantitative measures (e.g., how many documents have male/female authors) to more complex NLP measures (e.g., different sentiments in texts about male/female politicians or topical bias, different distributions of topics in texts geared towards male/female readers).

Exaples for papers that investigate gender bias:

Ethnic Bias

Like gender bias, ethnic or racial bias describes bias towards groups of people belonging to an ethnical (or religious) group. Ethnic bias includes harmful stereotypes and less blatant but still dangerous aspects like topical bias. Detecting ethnic bias is not only important because it may lead to even more severe instances of racism, and it is an infringement of the constitutional right to equal treatment.

Exaples for papers that investigate ethnic bias:

Non-Neutral Speech

Non-neutral language consists of many aspects of language that is subjective, opinionated, or otherwise implies valuation. This includes toxicity, ranging from forms of hate speech such as racism, incivility, profane, offensive and aggressive language to over-positive praises. Non-neutral language is especially problematic when it appears in types of documents that claim to be neutral, such as wikipedia or (public) news. A related concept is framing bias, defined as the use of subjective words or phrases linked with a particular opinion.

Exaples for papers that investigate non-neutral language:

Stance Detection

Stance is a concept that describes an opinion on a subject, most often in a political context. The goal of stance detection is to detect the stances of users/authors towards these subjects. Often, the subjects are known due to context (for example, abortion, weapon laws and gay marriage in political texts) or they have to be determined using approaches like entity recognition. A related concept is that of target-dependent or aspect-based sentiment analysis, in which the opinions on aspects (targets) are detected.

Exaples for papers that investigate stance detection:

Owner
Classrooms of IR Group at Technische Hochschule Köln
Classrooms of IR Group at Technische Hochschule Köln
Python Password Generator

This is a console-based version of a password generator written with Python. The program generates a password based on numbers of letters, numbers, and symbols specified by the user. This is a simple

p.katekomol 1 Jan 24, 2022
Time Discretization-Invariant Safe Action Repetition for Policy Gradient Methods

Time Discretization-Invariant Safe Action Repetition for Policy Gradient Methods This repository is the official implementation of Seohong Park, Jaeky

Seohong Park 6 Aug 02, 2022
A collection of over 5.1 million sub-domains and assets belonging to public bug bounty programs, compiled into a repo, for performing bulk operations.

📂 Public Bug Bounty Targets Data By BugBountyResources A collection of over 5.1M sub-domains and assets belonging to bug bounty targets, all put in a

Bug Bounty Resources 87 Dec 13, 2022
A collection of write-ups and solutions for Cyber FastTrack Spring 2021.

IMPORTANT: Please contact us before you use any styling or content shown here! Cyber FastTrack Spring 2021 / National Cyber Scholarship Competition -

Alice 48 Aug 28, 2022
A signature parser for hikari's command handler tanjun.

tanchi A signature parser for hikari's command handler tanjun. Finally be able to define your commands without those bloody decorator chains! Example

sadru 11 Nov 17, 2022
Find existing email addresses by nickname using API/SMTP checking methods without user notification. Please, don't hesitate to improve cat's job! 🐱🔎 📬

mailcat The only cat who can find existing email addresses by nickname. Usage First install requirements: pip3 install -r requirements.txt Then just

282 Dec 30, 2022
This is a simple tool to create ZIP payloads using a provided wordlist for the symlink attack (present in some file upload vulnerabilities)

zip-symlink-payload-creator This is a simple tool to create ZIP payloads using a provided wordlist for the symlink attack (present in some file upload

stark0de 6 Aug 18, 2022
Official repository for Pyew.

pyew Pyew is a (command line) python tool to analyse malware. It does have support for hexadecimal viewing, disassembly (Intel 16, 32 and 64 bits), PE

Joxean 362 Nov 28, 2022
Apache Solr SSRF(CVE-2021-27905)

Solr-SSRF Apache Solr SSRF #Use [-] Apache Solr SSRF漏洞 (CVE-2021-27905) [-] Options: -h or --help : 方法说明 -u or --url

Henry4E36 70 Nov 09, 2022
Ingest GreyNoise.io malicious feed for CVE-2021-44228 and apply null routes

log4j-nullroute Quick script to ingest IP feed from greynoise.io for log4j (CVE-2021-44228) and null route bad addresses. Works w/Cisco IOS-XE and Ari

Ryan 5 Sep 12, 2022
A quick script to spot the usage of Unicode Bidi (bidirectional) characters that could lead to an Invisible Backdoor

Invisible Backdoor Detector is a little Python script that allows you to spot and remove Bidi characters that could lead to an invisible backdoor. If you don't know what that is you should check the

SecSI 28 Dec 29, 2022
Mert Güvençli 142 Jan 05, 2023
🔐 A simple command-line password manager.

PassVault What Is It? It is a command-line password manager, for educational purposes, that stores localy, in AES encryption, your sensitives datas in

5 Aug 15, 2022
A passive-recon tool that parses through found assets and interacts with the Hackerone API

Hackerone Passive Recon Tool A passive-recon tool that parses through found assets and interacts with the Hackerone API. Setup Simply run setup.sh to

elbee 4 Jan 13, 2022
Reverse engineered Parler API

Parler's unofficial API with all endpoints present in their iOS app as of 08/12/2020. For the most part undocumented, but the error responses are alre

393 Nov 26, 2022
Security-TXT is a python package for retrieving, parsing and manipulating security.txt files.

Security-TXT is a python package for retrieving, parsing and manipulating security.txt files.

Frank 3 Feb 07, 2022
A proof-of-concept exploit for Log4j RCE Unauthenticated (CVE-2021-44228)

CVE-2021-44228 – Log4j RCE Unauthenticated About This is a proof-of-concept exploit for Log4j RCE Unauthenticated (CVE-2021-44228). This vulnerability

Pedro Havay 20 Nov 11, 2022
Volunteer & Campaign Management System

Cleansweep Requirements A Linux (or Mac OS X) node with the following software installed. Ubuntu 14.04 is preferred. PostgreSQL 9.3 database server Py

Aam Aadmi Party 39 May 24, 2022
Scan all java processes on your host to check weather it's affected by log4j2 remote code execution

Log4j2 Vulnerability Local Scanner (CVE-2021-45046) Log4j 漏洞本地检测脚本,扫描主机上所有java进程,检测是否引入了有漏洞的log4j-core jar包,是否可能遭到远程代码执行攻击(CVE-2021-45046)。上传扫描报告到指定的服

86 Dec 09, 2022
Buff A simple BOF library I wrote under an hour to help me automate with BOF attack

What is Buff? A simple BOF library I wrote under an hour to help me automate with BOF attack. It comes with fuzzer and a generic method to generate ex

0x00 3 Nov 21, 2022