Natural Language Processing - Sommer Semester 2022

Overview

Natural Language Processing (DIS25a/NLP)

This course can be taken for the Bachelor Programm Data and Information Science (DIS25a) or the Master Program Digital Sciences (NLP).

After easter all sessions are hosted at TH Köln, Claudiusstraße 1. The sessions will be held life. Slides will be usually available a night before the actual lecture. We try to record all lectures and tutorials for later referal (not sure how this works out with the sessions at Claudiusstraße).

Schedule for Summer Semester 2022

(L) Lectures; (T) Tutorials; (P) Project

The first lectures and tutorial were recorded and are available online. The password is the same as for the Zoom sessions.

Date Slot 13:30h Slot 15:15h DIS25a (DIS B.Sc.) NLP (DS M.Sc.)
1.4.2022 Introduction and Overview (L) Basic Text Processing (L) x x
8.4.2022 Basic NLP Pipeline: NLTK (T) (solution) Common Toolkit: Spacy (T) (solution) x x
15.4.2022 no lecture
22.4.2022 WordNet (L) Vector Semantics (L) x x
29.4.2022 WordNet, GermaNet (T) (solution) Vector Semantics (T) (solution) x x
6.5.2022 Information Extraction (L) Sentiment Analysis (L) x x
13.5.2022 no lecture
20.5.2022 Language Models and Ethics in NLP (L) Group assignment (P) x x
27.5.2022 Group work (P) Group work (P) x
3.6.2022 Data Programming for IE (L) Group work (P) / Oral Exam Master x x
10.6.2022 Guest Lecture: Dimitar Dimitrov(L) Group work (P) x
17.6.2022 Group work (P) Group work (P) x
24.6.2022 Student talks - Project presentation (P) Student talks - Project presentation (P) x
31.8.2022 Submission of term papers x

Bachelor: Group Assignments

In the group assignments a group of four students has to work on a bias-related topic with a specific focus and on one of three datasets. In the group work phases starting on 20.5.2022 we will be available during the lecture time to help and advise.

In the presentations on 24.6.2022 you are expected to present a concept regarding your specific topic and dataset. Please decribe the motivation, the dataset, your methods and NLP pipeline, a working prototype and some first insights and results.

The feedback gathered during the presentation should be used to write a final term paper on your specific topic and work. Please read the guidelines for the term paper.

Datasets

Choose one of the following datasets to work on:

Topics

Choose one of the following topcis:

Gender Bias

Gender bias is a group bias in which different genders are represented differently in terms of an aspect in a given (set of) document(s) than expected. Aspects for which there can be a bias range from quantitative measures (e.g., how many documents have male/female authors) to more complex NLP measures (e.g., different sentiments in texts about male/female politicians or topical bias, different distributions of topics in texts geared towards male/female readers).

Exaples for papers that investigate gender bias:

Ethnic Bias

Like gender bias, ethnic or racial bias describes bias towards groups of people belonging to an ethnical (or religious) group. Ethnic bias includes harmful stereotypes and less blatant but still dangerous aspects like topical bias. Detecting ethnic bias is not only important because it may lead to even more severe instances of racism, and it is an infringement of the constitutional right to equal treatment.

Exaples for papers that investigate ethnic bias:

Non-Neutral Speech

Non-neutral language consists of many aspects of language that is subjective, opinionated, or otherwise implies valuation. This includes toxicity, ranging from forms of hate speech such as racism, incivility, profane, offensive and aggressive language to over-positive praises. Non-neutral language is especially problematic when it appears in types of documents that claim to be neutral, such as wikipedia or (public) news. A related concept is framing bias, defined as the use of subjective words or phrases linked with a particular opinion.

Exaples for papers that investigate non-neutral language:

Stance Detection

Stance is a concept that describes an opinion on a subject, most often in a political context. The goal of stance detection is to detect the stances of users/authors towards these subjects. Often, the subjects are known due to context (for example, abortion, weapon laws and gay marriage in political texts) or they have to be determined using approaches like entity recognition. A related concept is that of target-dependent or aspect-based sentiment analysis, in which the opinions on aspects (targets) are detected.

Exaples for papers that investigate stance detection:

Owner
Classrooms of IR Group at Technische Hochschule Köln
Classrooms of IR Group at Technische Hochschule Köln
This is an advanced backdoor, created with Python

Backdoor This is a Backdoor, created with Python 3. Types of Commands: Downloading / Uploading files. Launching / Deleting / Reading file's content. S

swagkarna 28 Oct 28, 2022
A simple subdomain scanner in python

Subdomain-Scanner A simple subdomain scanner in python ✨ Features scans subdomains of a domain thats it! 💁‍♀️ How to use first download the scanner.p

Portgas D Ace 2 Jan 07, 2022
Arbitrium is a cross-platform, fully undetectable remote access trojan, to control Android, Windows and Linux and doesn't require any firewall exceptions or port forwarding rules

About: Arbitrium is a cross-platform is a remote access trojan (RAT), Fully UnDetectable (FUD), It allows you to control Android, Windows and Linux an

Ayoub 861 Feb 18, 2021
🐝 ℹ️ Honeybee extension for export to IES-VE gem file format

honeybee-ies Honeybee extension for export a HBJSON file to IES-VE GEM file format Installation pip install honeybee-ies QuickStart import pathlib fro

Ladybug Tools 4 Jul 12, 2022
A toolkit for web reconnaissance, it's fast and easy to use.

A toolkit for web reconnaissance, it's fast and easy to use. File Structure httpsuite/ main.py init.py db/ db.py init.py subdomains_db directories_db

whoami security 22 Jul 22, 2022
This repository detects a system vulnerable to CVE-2022-21907 and protects against this vulnerability if desired

This repository detects a system vulnerable to CVE-2022-21907 and protects against this vulnerability if desired

26 Dec 26, 2022
Abusing Microsoft 365 OAuth Authorization Flow for Phishing Attack

O365DevicePhish Microsoft365_devicePhish Abusing Microsoft 365 OAuth Authorization Flow for Phishing Attack This is a simple proof-of-concept script t

Trewis [work] Scotch 4 Sep 23, 2022
A Python 3 script that uploads a tasks.pickle file that enables RCE in MotionEye

MotionEye/MotionEyeOS Authenticated RCE A Python 3 script that uploads a tasks.pickle file that enables RCE in MotionEye. You need administrator crede

Matt 1 Apr 18, 2022
This is a js front-end encryption blasting account and password tools

Author:0xAXSDD By Gamma安全实验室 version:1.0 explain:这是一款用户绕过前端js加密进行密码爆破的工具,你无需在意js加密的细节,只需要输入你想要爆破url,以及username输入框的classname,password输入框的clas

75 Nov 25, 2022
Monty Hall Problem simulation written in Python.

Monty Hall Problem Simulation monty_hall_sim is a brute-force method of determining the optimal strategy for the Monty Hall Problem. Usage Set boolean

Xavier D 1 Aug 29, 2022
exchange-ssrf-rce

Usage python3 .\exchange-exp.py -------------------------------------------------------------------------------- |

Jen 76 Nov 09, 2022
Windows Virus who destroy some impotants files on C:\windows\system32\

psychic-robot Windows Virus who destroy some importants files on C:\windows\system32\ Signatures of psychic-robot.PY (python file) : Bkav Pro : ASP.We

H-Tech-Dev36 1 Jan 06, 2022
Pre-Auth Blind NoSQL Injection leading to Remote Code Execution in Rocket Chat 3.12.1

CVE-2021-22911 Pre-Auth Blind NoSQL Injection leading to Remote Code Execution in Rocket Chat 3.12.1 The getPasswordPolicy method is vulnerable to NoS

Enox 47 Nov 09, 2022
"Video Moment Retrieval from Text Queries via Single Frame Annotation" in SIGIR 2022.

ViGA: Video moment retrieval via Glance Annotation This is the official repository of the paper "Video Moment Retrieval from Text Queries via Single F

Ran Cui 38 Dec 31, 2022
BOF-Roaster is an automated buffer overflow exploit machine which is begin written with Python 3.

BOF-Roaster is an automated buffer overflow exploit machine which is begin written with Python 3. On first release it was able to successfully break many of the most well-known buffer overflow exampl

Kaan Caglan 5 Nov 23, 2021
RedlineSpam - Python tool to spam Redline Infostealer panels with legit looking data

RedlineSpam Python tool to spam Redline Infostealer panels with legit looking da

4 Jan 27, 2022
A burp-suite plugin that extract all parameter names from in-scope requests

ParamsExtractor A burp-suite plugin that extract all parameters name from in-scope requests. You can run the plugin while you are working on the targe

29 Nov 09, 2022
Backdoor is a term that refers to the access of the software or hardware of a computer system without being detected.

This program is an non-object oriented opensource, hidden and undetectable backdoor/reverse shell/RAT for Windows made in Python 3 which contains many features such as multi-client support and cross-

35 Apr 17, 2022
com_media allowed paths that are not intended for image uploads to RCE

CVE-2021-23132 com_media allowed paths that are not intended for image uploads to RCE. CVE-2020-24597 Directory traversal in com_media to RCE Two CVEs

KIEN HOANG 67 Nov 09, 2022