Natural Language Processing - Sommer Semester 2022

Overview

Natural Language Processing (DIS25a/NLP)

This course can be taken for the Bachelor Programm Data and Information Science (DIS25a) or the Master Program Digital Sciences (NLP).

After easter all sessions are hosted at TH Köln, Claudiusstraße 1. The sessions will be held life. Slides will be usually available a night before the actual lecture. We try to record all lectures and tutorials for later referal (not sure how this works out with the sessions at Claudiusstraße).

Schedule for Summer Semester 2022

(L) Lectures; (T) Tutorials; (P) Project

The first lectures and tutorial were recorded and are available online. The password is the same as for the Zoom sessions.

Date Slot 13:30h Slot 15:15h DIS25a (DIS B.Sc.) NLP (DS M.Sc.)
1.4.2022 Introduction and Overview (L) Basic Text Processing (L) x x
8.4.2022 Basic NLP Pipeline: NLTK (T) (solution) Common Toolkit: Spacy (T) (solution) x x
15.4.2022 no lecture
22.4.2022 WordNet (L) Vector Semantics (L) x x
29.4.2022 WordNet, GermaNet (T) (solution) Vector Semantics (T) (solution) x x
6.5.2022 Information Extraction (L) Sentiment Analysis (L) x x
13.5.2022 no lecture
20.5.2022 Language Models and Ethics in NLP (L) Group assignment (P) x x
27.5.2022 Group work (P) Group work (P) x
3.6.2022 Data Programming for IE (L) Group work (P) / Oral Exam Master x x
10.6.2022 Guest Lecture: Dimitar Dimitrov(L) Group work (P) x
17.6.2022 Group work (P) Group work (P) x
24.6.2022 Student talks - Project presentation (P) Student talks - Project presentation (P) x
31.8.2022 Submission of term papers x

Bachelor: Group Assignments

In the group assignments a group of four students has to work on a bias-related topic with a specific focus and on one of three datasets. In the group work phases starting on 20.5.2022 we will be available during the lecture time to help and advise.

In the presentations on 24.6.2022 you are expected to present a concept regarding your specific topic and dataset. Please decribe the motivation, the dataset, your methods and NLP pipeline, a working prototype and some first insights and results.

The feedback gathered during the presentation should be used to write a final term paper on your specific topic and work. Please read the guidelines for the term paper.

Datasets

Choose one of the following datasets to work on:

Topics

Choose one of the following topcis:

Gender Bias

Gender bias is a group bias in which different genders are represented differently in terms of an aspect in a given (set of) document(s) than expected. Aspects for which there can be a bias range from quantitative measures (e.g., how many documents have male/female authors) to more complex NLP measures (e.g., different sentiments in texts about male/female politicians or topical bias, different distributions of topics in texts geared towards male/female readers).

Exaples for papers that investigate gender bias:

Ethnic Bias

Like gender bias, ethnic or racial bias describes bias towards groups of people belonging to an ethnical (or religious) group. Ethnic bias includes harmful stereotypes and less blatant but still dangerous aspects like topical bias. Detecting ethnic bias is not only important because it may lead to even more severe instances of racism, and it is an infringement of the constitutional right to equal treatment.

Exaples for papers that investigate ethnic bias:

Non-Neutral Speech

Non-neutral language consists of many aspects of language that is subjective, opinionated, or otherwise implies valuation. This includes toxicity, ranging from forms of hate speech such as racism, incivility, profane, offensive and aggressive language to over-positive praises. Non-neutral language is especially problematic when it appears in types of documents that claim to be neutral, such as wikipedia or (public) news. A related concept is framing bias, defined as the use of subjective words or phrases linked with a particular opinion.

Exaples for papers that investigate non-neutral language:

Stance Detection

Stance is a concept that describes an opinion on a subject, most often in a political context. The goal of stance detection is to detect the stances of users/authors towards these subjects. Often, the subjects are known due to context (for example, abortion, weapon laws and gay marriage in political texts) or they have to be determined using approaches like entity recognition. A related concept is that of target-dependent or aspect-based sentiment analysis, in which the opinions on aspects (targets) are detected.

Exaples for papers that investigate stance detection:

Owner
Classrooms of IR Group at Technische Hochschule Köln
Classrooms of IR Group at Technische Hochschule Köln
SSH Tool For OSINT and then Cracking.

sshmap SSH Tool For OSINT and then Cracking. Linux Systems Only Usage: Scanner Syntax: scanner start/stop/status - Sarts/stops/sho

Miss Bliss 5 Apr 04, 2022
Python tool for enumerating directories and for fuzzing

Python tool for enumerating directories and for fuzzing

Gourab Roy 5 Feb 21, 2022
Open-source jailbreaking tool for many iOS devices

Open-source jailbreaking tool for many iOS devices *Read disclaimer before using this software. checkm8 permanent unpatchable bootrom exploit for hund

6.7k Jan 05, 2023
Wordlist attacks on Bitwarden data.json files

BitwardenDecryptBrute This is a slightly modified version of BitwardenDecrypt. In addition to the decryption this version can do wordlist attacks for

42 Nov 09, 2022
Example for the NFT 3D Collectibles using Blender Scripting (Python).

NFT Collectibles using Blender Python What is this? This project is to demonstrate for generating NFT Collectible Avatar-Styled images. For details, p

hideckies 48 Nov 26, 2022
Python lib to automate basic QFT calculations like Wick-contractions.

QFTools Python lib to automate basic QFT calculations like Wick-contractions. Features Wick contractions for real scalar fields Wick contractions for

2 Aug 21, 2022
A Tool to find subdomains from hackerone reports.

Hactivity A Tool to find subdomains from Hackerone reports of a given company or a search term (xss, ssrf, etc). It can also print out URL and Title o

Stinger 15 Jul 24, 2022
Python directory buster, multiple threads, gobuster-like CLI, web server brute-forcer, URL replace pattern feature.

pybuster v1.1 pybuster is a tool that is used to brute-force URLs of web servers. Features Directory busting (URI) URL replace patterns (put PYBUSTER

Glaukio 1 Jan 05, 2022
A python tool capable of creating HUGE wordlists. Has the ability to add custom words for concatenation in any way you see fit.

A python tool capable of creating HUGE wordlists. Has the ability to add custom words for concatenation in any way you see fit.

Codex 9 Oct 05, 2022
client attack remotely , this script was written for educational purposes only

client attack remotely , this script was written for educational purposes only, do not use against to any victim, which you do not have permission for it

9 Jun 05, 2022
NEW FACEBOOK CLONER WITH NEW PASSWORD, TERMUX FB CLONE, FB CLONING COMMAND. M

NEW FACEBOOK CLONER WITH NEW PASSWORD, TERMUX FB CLONE, FB CLONING COMMAND. M

Mr. Error 81 Jan 08, 2023
On the 11/11/21 the apache 2.4.49-2.4.50 remote command execution POC has been published online and this is a loader so that you can mass exploit servers using this.

ApacheRCE ApacheRCE is a small little python script that will allow you to input the apache version 2.4.49-2.4.50 and then input a list of ip addresse

3 Dec 04, 2022
This is a simple tool to create ZIP payloads using a provided wordlist for the symlink attack (present in some file upload vulnerabilities)

zip-symlink-payload-creator This is a simple tool to create ZIP payloads using a provided wordlist for the symlink attack (present in some file upload

stark0de 6 Aug 18, 2022
Yet another web fuzzer

yafuzz Yet another web fuzzer Usage This script can run in two modes of operation. Supplying a wordlist -W argument will initiate a multithreaded fuzz

FooBallZ 5 Feb 02, 2022
The Easiest Way To Gallery Hacking

The easiest way to HACK A GALLARY, Get every part of your friends' gallery ( 100% Working ) | Tool By John Kener 🇱🇰

John Kener 34 Nov 30, 2022
A secure password generator written in python

gruvbox-factory 🏭 "The main focus when developing gruvbox is to keep colors easily distinguishable, contrast enough and still pleasant for the eyes"

Paulo Pacitti 430 Dec 27, 2022
RCE Exploit for Gitlab < 13.9.4

GitLab-Wiki-RCE RCE Exploit for Gitlab 13.9.4 RCE via unsafe inline Kramdown options when rendering certain Wiki pages Allows any user with push acc

Enox 52 Nov 09, 2022
Python Library For Ethical Hacker

Python Library For Ethical Hacker

11 Nov 03, 2022
CVE-2021-26855 SSRF Exchange Server

CVE-2021-26855 Brute Force EMail Exchange Server Timeline: Monday, March 8, 2021: Update Dumping content...(I'm not done, can u guy help me done this

lulz 117 Nov 28, 2022
RedDrop is a quick and easy web server for capturing and processing encoded and encrypted payloads and tar archives.

RedDrop Exfil Server Check out the accompanying MaverisLabs Blog Post Here! RedDrop Exfil Server is a Python Flask Web Server for Penetration Testers,

53 Nov 01, 2022