Natural Language Processing - Sommer Semester 2022

Overview

Natural Language Processing (DIS25a/NLP)

This course can be taken for the Bachelor Programm Data and Information Science (DIS25a) or the Master Program Digital Sciences (NLP).

After easter all sessions are hosted at TH Köln, Claudiusstraße 1. The sessions will be held life. Slides will be usually available a night before the actual lecture. We try to record all lectures and tutorials for later referal (not sure how this works out with the sessions at Claudiusstraße).

Schedule for Summer Semester 2022

(L) Lectures; (T) Tutorials; (P) Project

The first lectures and tutorial were recorded and are available online. The password is the same as for the Zoom sessions.

Date Slot 13:30h Slot 15:15h DIS25a (DIS B.Sc.) NLP (DS M.Sc.)
1.4.2022 Introduction and Overview (L) Basic Text Processing (L) x x
8.4.2022 Basic NLP Pipeline: NLTK (T) (solution) Common Toolkit: Spacy (T) (solution) x x
15.4.2022 no lecture
22.4.2022 WordNet (L) Vector Semantics (L) x x
29.4.2022 WordNet, GermaNet (T) (solution) Vector Semantics (T) (solution) x x
6.5.2022 Information Extraction (L) Sentiment Analysis (L) x x
13.5.2022 no lecture
20.5.2022 Language Models and Ethics in NLP (L) Group assignment (P) x x
27.5.2022 Group work (P) Group work (P) x
3.6.2022 Data Programming for IE (L) Group work (P) / Oral Exam Master x x
10.6.2022 Guest Lecture: Dimitar Dimitrov(L) Group work (P) x
17.6.2022 Group work (P) Group work (P) x
24.6.2022 Student talks - Project presentation (P) Student talks - Project presentation (P) x
31.8.2022 Submission of term papers x

Bachelor: Group Assignments

In the group assignments a group of four students has to work on a bias-related topic with a specific focus and on one of three datasets. In the group work phases starting on 20.5.2022 we will be available during the lecture time to help and advise.

In the presentations on 24.6.2022 you are expected to present a concept regarding your specific topic and dataset. Please decribe the motivation, the dataset, your methods and NLP pipeline, a working prototype and some first insights and results.

The feedback gathered during the presentation should be used to write a final term paper on your specific topic and work. Please read the guidelines for the term paper.

Datasets

Choose one of the following datasets to work on:

Topics

Choose one of the following topcis:

Gender Bias

Gender bias is a group bias in which different genders are represented differently in terms of an aspect in a given (set of) document(s) than expected. Aspects for which there can be a bias range from quantitative measures (e.g., how many documents have male/female authors) to more complex NLP measures (e.g., different sentiments in texts about male/female politicians or topical bias, different distributions of topics in texts geared towards male/female readers).

Exaples for papers that investigate gender bias:

Ethnic Bias

Like gender bias, ethnic or racial bias describes bias towards groups of people belonging to an ethnical (or religious) group. Ethnic bias includes harmful stereotypes and less blatant but still dangerous aspects like topical bias. Detecting ethnic bias is not only important because it may lead to even more severe instances of racism, and it is an infringement of the constitutional right to equal treatment.

Exaples for papers that investigate ethnic bias:

Non-Neutral Speech

Non-neutral language consists of many aspects of language that is subjective, opinionated, or otherwise implies valuation. This includes toxicity, ranging from forms of hate speech such as racism, incivility, profane, offensive and aggressive language to over-positive praises. Non-neutral language is especially problematic when it appears in types of documents that claim to be neutral, such as wikipedia or (public) news. A related concept is framing bias, defined as the use of subjective words or phrases linked with a particular opinion.

Exaples for papers that investigate non-neutral language:

Stance Detection

Stance is a concept that describes an opinion on a subject, most often in a political context. The goal of stance detection is to detect the stances of users/authors towards these subjects. Often, the subjects are known due to context (for example, abortion, weapon laws and gay marriage in political texts) or they have to be determined using approaches like entity recognition. A related concept is that of target-dependent or aspect-based sentiment analysis, in which the opinions on aspects (targets) are detected.

Exaples for papers that investigate stance detection:

Owner
Classrooms of IR Group at Technische Hochschule Köln
Classrooms of IR Group at Technische Hochschule Köln
NEW FACEBOOK CLONER WITH NEW PASSWORD, TERMUX FB CLONE, FB CLONING COMMAND. M

NEW FACEBOOK CLONER WITH NEW PASSWORD, TERMUX FB CLONE, FB CLONING COMMAND. M

Mr. Error 81 Jan 08, 2023
An automated header extensive scanner for detecting log4j RCE CVE-2021-44228

log4j An automated header extensive scanner for detecting log4j RCE CVE-2021-44228 Usage $ python3 log4j.py -l urls.txt --dns-log REPLACE_THIS.dnslog.

2 Dec 16, 2021
Simulating Log4j Remote Code Execution (RCE) vulnerability in a flask web server using python's logging library with custom formatter that simulates lookup substitution by executing remote exploit code.

py4jshell Simulating Log4j Remote Code Execution (RCE) CVE-2021-44228 vulnerability in a flask web server using python's logging library with custom f

Narasimha Prasanna HN 86 Aug 21, 2022
A script to search, scrape and scan for Apache Log4j CVE-2021-44228 affected files using Google dorks

Log4j dork scanner This is an auto script to search, scrape and scan for Apache Log4j CVE-2021-44228 affected files using Google dorks. Installation:

Jagar 5 Dec 27, 2022
A Python Scanner for log4j

log4j-Scanner scanner for log4j cat web-urls.txt | python3 log4j.py ID.burpcollaborator.net web-urls.txt http://127.0.0.1:8080 https://www.google.c

Ihebski 5 Jun 26, 2022
A tool combined with the advantages of masscan and nmap

A tool combined with the advantages of masscan and nmap

59 Dec 24, 2022
Malware for Discord, designed to steal passwords, tokens, and inject discord folders for long-term use.

Vital What is Vital? Vital is malware primarily used to collect and extract information from the Discord desktop client. While it has other features (

HellSec 59 Dec 01, 2022
Phoenix Framework is an environment for writing, testing and using exploit code.

Phoenix Framework is an environment for writing, testing and using exploit code. 🖼 Screenshots 🎪 Community PwnWiki Forums 🔑 Licen

42 Aug 09, 2022
Raphael is a vulnerability scanning tool based on Python3.

Raphael Raphael是一款基于Python3开发的插件式漏洞扫描工具。 Raphael is a vulnerability scanning too

b4zinga 5 Mar 21, 2022
A Python Tool that uses Shodan API's to perform quick recon for vulnerabilities

Shodan Quick Recon A Python Tool that uses Shodan API's to perform quick recon for vulnerabilities Configuration You must edit the python code, and in

Black Hat Ethical Hacking 5 Aug 09, 2022
macOS Initial Access Payload Generator

Mystikal macOS Initial Access Payload Generator Related Blog Post: https://posts.specterops.io/introducing-mystikal-4fbd2f7ae520 Usage: Install Xcode

Leo Pitt 206 Dec 31, 2022
Web-eyes - OSINT tools for website research

WEB-EYES V1.0 web-eyes: OSINT tools for website research, 14 research methods ar

8 Nov 10, 2022
A bitcoin private keys brute-forcing tool. Educational purpose only.

BitForce A bitcoin private keys brute-forcing tool. If you have an average computer, his will take decades to find a private key with balance. Run Mak

Gilad Leef 2 Dec 20, 2022
🐎🖥《赛马娘》(ウマ娘: Pretty Derby)辅助脚本

auto-derby 自动化养马 育成结果 Nurturing result 功能 支持客户端 DMM (前台) 实验性 安卓 ADB 连接(后台)开发基于 1080x1920 分辨率 团队赛 (Team race) 有胜利确定奖励时吃帕菲 日常赛 (Daily race) PvP 活动赛 (Cha

NateScarlet 376 Jan 01, 2023
python写的一款免杀工具(shellcode加载器)BypassAV,国内杀软全过(windows denfend)

python写的一款免杀工具(shellcode加载器)BypassAV,国内杀软全过(windows denfend)

1frame 266 Jan 02, 2023
CC CAMERA HACKING TOOL

CAM-HACK CC CAMERA HACKING TOOL Installation On Termux $ apt update

Aryan 10 Sep 25, 2022
An auxiliary tool for iot vulnerability hunter

firmeye - IoT固件漏洞挖掘工具 firmeye 是一个 IDA 插件,基于敏感函数参数回溯来辅助漏洞挖掘。我们知道,在固件漏洞挖掘中,从敏感/危险函数出发,寻找其参数来源,是一种很有效的漏洞挖掘方法,但程序中调用敏感函数的地方非常多,人工分析耗时费力,通过该插件,可以帮助排除大部分的安全

Firmy Yang 171 Nov 28, 2022
This is tools hacking for scan vuln in port web, happy using

Xnuvers007 PortInjection this is tools hacking for scan vuln in port web, happy using view/show python 3.9 solo coder (tangerang) 19 y/o installation

XnuxersXploitXen 6 Dec 24, 2022
Automated tool to exploit basic buffer overflow remotely and locally & x32 and x64

Automated tool to exploit basic buffer overflow (remotely or locally) & (x32 or x64)

5 Oct 09, 2022
SubFind - Subdomain Finder Tools

SubFind (Subdomain Finder Tools) Info Tools Result Of Subdomain Command In Termi

LangMurpY 2 Jan 25, 2022