Natural Language Processing - Sommer Semester 2022

Overview

Natural Language Processing (DIS25a/NLP)

This course can be taken for the Bachelor Programm Data and Information Science (DIS25a) or the Master Program Digital Sciences (NLP).

After easter all sessions are hosted at TH Köln, Claudiusstraße 1. The sessions will be held life. Slides will be usually available a night before the actual lecture. We try to record all lectures and tutorials for later referal (not sure how this works out with the sessions at Claudiusstraße).

Schedule for Summer Semester 2022

(L) Lectures; (T) Tutorials; (P) Project

The first lectures and tutorial were recorded and are available online. The password is the same as for the Zoom sessions.

Date Slot 13:30h Slot 15:15h DIS25a (DIS B.Sc.) NLP (DS M.Sc.)
1.4.2022 Introduction and Overview (L) Basic Text Processing (L) x x
8.4.2022 Basic NLP Pipeline: NLTK (T) (solution) Common Toolkit: Spacy (T) (solution) x x
15.4.2022 no lecture
22.4.2022 WordNet (L) Vector Semantics (L) x x
29.4.2022 WordNet, GermaNet (T) (solution) Vector Semantics (T) (solution) x x
6.5.2022 Information Extraction (L) Sentiment Analysis (L) x x
13.5.2022 no lecture
20.5.2022 Language Models and Ethics in NLP (L) Group assignment (P) x x
27.5.2022 Group work (P) Group work (P) x
3.6.2022 Data Programming for IE (L) Group work (P) / Oral Exam Master x x
10.6.2022 Guest Lecture: Dimitar Dimitrov(L) Group work (P) x
17.6.2022 Group work (P) Group work (P) x
24.6.2022 Student talks - Project presentation (P) Student talks - Project presentation (P) x
31.8.2022 Submission of term papers x

Bachelor: Group Assignments

In the group assignments a group of four students has to work on a bias-related topic with a specific focus and on one of three datasets. In the group work phases starting on 20.5.2022 we will be available during the lecture time to help and advise.

In the presentations on 24.6.2022 you are expected to present a concept regarding your specific topic and dataset. Please decribe the motivation, the dataset, your methods and NLP pipeline, a working prototype and some first insights and results.

The feedback gathered during the presentation should be used to write a final term paper on your specific topic and work. Please read the guidelines for the term paper.

Datasets

Choose one of the following datasets to work on:

Topics

Choose one of the following topcis:

Gender Bias

Gender bias is a group bias in which different genders are represented differently in terms of an aspect in a given (set of) document(s) than expected. Aspects for which there can be a bias range from quantitative measures (e.g., how many documents have male/female authors) to more complex NLP measures (e.g., different sentiments in texts about male/female politicians or topical bias, different distributions of topics in texts geared towards male/female readers).

Exaples for papers that investigate gender bias:

Ethnic Bias

Like gender bias, ethnic or racial bias describes bias towards groups of people belonging to an ethnical (or religious) group. Ethnic bias includes harmful stereotypes and less blatant but still dangerous aspects like topical bias. Detecting ethnic bias is not only important because it may lead to even more severe instances of racism, and it is an infringement of the constitutional right to equal treatment.

Exaples for papers that investigate ethnic bias:

Non-Neutral Speech

Non-neutral language consists of many aspects of language that is subjective, opinionated, or otherwise implies valuation. This includes toxicity, ranging from forms of hate speech such as racism, incivility, profane, offensive and aggressive language to over-positive praises. Non-neutral language is especially problematic when it appears in types of documents that claim to be neutral, such as wikipedia or (public) news. A related concept is framing bias, defined as the use of subjective words or phrases linked with a particular opinion.

Exaples for papers that investigate non-neutral language:

Stance Detection

Stance is a concept that describes an opinion on a subject, most often in a political context. The goal of stance detection is to detect the stances of users/authors towards these subjects. Often, the subjects are known due to context (for example, abortion, weapon laws and gay marriage in political texts) or they have to be determined using approaches like entity recognition. A related concept is that of target-dependent or aspect-based sentiment analysis, in which the opinions on aspects (targets) are detected.

Exaples for papers that investigate stance detection:

Owner
Classrooms of IR Group at Technische Hochschule Köln
Classrooms of IR Group at Technische Hochschule Köln
Osint-Tool - Information collection tool in python

Osint-Tool Herramienta para la recolección de información Pronto más opciones In

3 Apr 09, 2022
Sudo Baron Samedit Exploit

CVE-2021-3156 (Sudo Baron Samedit) This repository is CVE-2021-3156 exploit targeting Linux x64. For writeup, please visit https://datafarm-cybersecur

Worawit Wang 559 Jan 03, 2023
An advanced multi-threaded, multi-client python reverse shell for hacking linux systems

PwnLnX An advanced multi-threaded, multi-client python reverse shell for hacking linux systems. There's still more work to do so feel free to help out

0xTRAW 212 Dec 24, 2022
FBGen is simple facebook user based wordlist generator using Username/ID and cookie.

FBGen is simple facebook user based wordlist generator using Username/ID and cookie.

2 Jul 20, 2022
ONT Analysis Toolkit (OAT)

A toolkit for monitoring ONT MinION sequencing, followed by data analysis, for viral genomes amplified with tiled amplicon sequencing.

6 Jun 14, 2022
Analyse a forensic target (such as a directory) to find and report files found and not found from CIRCL hashlookup public service

Analyse a forensic target (such as a directory) to find and report files found and not found from CIRCL hashlookup public service. This tool can help a digital forensic investigator to know the conte

hashlookup 96 Dec 20, 2022
GitLab CE/EE Preauth RCE using ExifTool

CVE-2021-22205 GitLab CE/EE Preauth RCE using ExifTool This project is for learning only, if someone's rights have been violated, please contact me to

3ND 164 Dec 10, 2022
wsvuls - website vulnerability scanner detect issues [ outdated server software and insecure HTTP headers.]

WSVuls Website vulnerability scanner detect issues [ outdated server software and insecure HTTP headers.] What's WSVuls? WSVuls is a simple and powerf

Anouar Ben Saad 47 Sep 22, 2022
Generate malicious files using recently published bidi-attack (CVE-2021-42574)

CVE-2021-42574 - Code generator Generate malicious files using recently published bidi-attack vulnerability, which was discovered in Unicode Specifica

js-on 7 Nov 09, 2022
proxyshell payload generate

Py Permutative Encoding https://docs.microsoft.com/en-us/openspecs/office_file_formats/ms-pst/5faf4800-645d-49d1-9457-2ac40eb467bd Generate proxyshell

Evi1cg 63 Nov 15, 2022
A hashtag check python module

A hashtag check python module

Fayas Noushad 3 Aug 10, 2022
A Feature Rich Modular Malware Configuration Extraction Utility for MalDuck

Malware Configuration Extractor A Malware Configuration Extraction Tool and Modules for MalDuck This project is FREE as in FREE 🍺 , use it commercial

c3rb3ru5 103 Dec 18, 2022
Script checks provided domains for log4j vulnerability

log4j Script checks provided domains for log4j vulnerability. A token is created with canarytokens.org and passed as header at request for a single do

Matthias Nehls 2 Dec 12, 2021
A Python & JavaScript Obfuscator made in Python 3.

Python Code Obfuscator A script that converts code into full on random numerical expressions. Simple Scripts: Python Mode... Input: Function that deco

Karim 3 Mar 24, 2022
IDA loader for Apple's iBoot, SecureROM and AVPBooter

IDA iBoot Loader IDA loader for Apple's iBoot, SecureROM and AVPBooter Installation Copy iboot-loader.py to the loaders folder in IDA directory. Credi

matteyeux 74 Dec 23, 2022
Raphael is a vulnerability scanning tool based on Python3.

Raphael Raphael是一款基于Python3开发的插件式漏洞扫描工具。 Raphael is a vulnerability scanning too

b4zinga 5 Mar 21, 2022
Writeups for wtf-CTF hosted by Manipal Information Security Team as part of Techweek2021- INCOGNITO

wtf-CTF_Writeups Table of Contents Table of Contents Crypto Misc Reverse Pwn Web Crypto wtf_Bot Author: Madjelly Join the discord server!You know how

6 Jun 07, 2021
使用golang重写开源工具wafw00f

GO-WAFW00F 介绍 WAFW00F是一款优秀的web应用防火墙识别开源工具:https://github.com/EnableSecurity/wafw00f 使用Golang重写的原因:Python环境配置不便利,Golang打包生成可执行文件直接运行 目前还在开发阶段,规则解析存在小问题

80 Dec 30, 2021
Proof of concept for CVE-2021-24086, a NULL dereference in tcpip.sys triggered remotely.

CVE-2021-24086 This is a proof of concept for CVE-2021-24086 ("Windows TCP/IP Denial of Service Vulnerability "), a NULL dereference in tcpip.sys patc

Axel Souchet 220 Dec 14, 2022
an impacket-dependent script exploiting CVE-2019-1040

dcpwn an impacket-dependent script exploiting CVE-2019-1040, with code partly borrowed from those security researchers that I'd like to say thanks to.

QAX A-Team 71 Nov 30, 2022