Natural Language Processing - Summer Semester 2022

Overview

Natural Language Processing (DIS25a/NLP)

This course can be taken as part of the Bachelor program Data and Information Science (DIS25a) or the Master program Digital Sciences (NLP).

After Easter, all sessions are hosted at TH Köln, Claudiusstraße 1. The sessions will be held live. Slides will usually be available the night before the lecture. We try to record all lectures and tutorials for later reference (we are not yet sure how well this will work for the sessions at Claudiusstraße).

Schedule for Summer Semester 2022

(L) Lectures; (T) Tutorials; (P) Project

The first lectures and tutorial were recorded and are available online. The password is the same as for the Zoom sessions.

| Date | Slot 13:30h | Slot 15:15h | DIS25a (DIS B.Sc.) / NLP (DS M.Sc.) |
| --- | --- | --- | --- |
| 1.4.2022 | Introduction and Overview (L) | Basic Text Processing (L) | x x |
| 8.4.2022 | Basic NLP Pipeline: NLTK (T) (solution) | Common Toolkit: spaCy (T) (solution) | x x |
| 15.4.2022 | no lecture | | |
| 22.4.2022 | WordNet (L) | Vector Semantics (L) | x x |
| 29.4.2022 | WordNet, GermaNet (T) (solution) | Vector Semantics (T) (solution) | x x |
| 6.5.2022 | Information Extraction (L) | Sentiment Analysis (L) | x x |
| 13.5.2022 | no lecture | | |
| 20.5.2022 | Language Models and Ethics in NLP (L) | Group assignment (P) | x x |
| 27.5.2022 | Group work (P) | Group work (P) | x |
| 3.6.2022 | Data Programming for IE (L) | Group work (P) / Oral Exam Master | x x |
| 10.6.2022 | Guest Lecture: Dimitar Dimitrov (L) | Group work (P) | x |
| 17.6.2022 | Group work (P) | Group work (P) | x |
| 24.6.2022 | Student talks - Project presentation (P) | Student talks - Project presentation (P) | x |
| 31.8.2022 | Submission of term papers | | x |

Bachelor: Group Assignments

In the group assignments, groups of four students work on a bias-related topic with a specific focus, using one of three datasets. During the group work phases starting on 20.5.2022, we will be available during the lecture time to help and advise.

In the presentations on 24.6.2022 you are expected to present a concept for your specific topic and dataset. Please describe the motivation, the dataset, your methods and NLP pipeline, a working prototype, and some first insights and results.

The feedback gathered during the presentation should be used to write a final term paper on your specific topic and work. Please read the guidelines for the term paper.

Datasets

Choose one of the following datasets to work on:

Topics

Choose one of the following topics:

Gender Bias

Gender bias is a group bias in which different genders are represented differently in terms of an aspect in a given (set of) document(s) than expected. Aspects for which there can be a bias range from quantitative measures (e.g., how many documents have male/female authors) to more complex NLP measures (e.g., different sentiments in texts about male/female politicians or topical bias, different distributions of topics in texts geared towards male/female readers).
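As a rough illustration of the quantitative end of this range, the following sketch counts gendered terms across a set of documents and reports the share of female mentions. The word lists and the function names `gender_term_counts` and `female_share` are assumptions made for this example, not part of the course material; a real project would need a curated lexicon and a proper tokenizer (for instance from NLTK or spaCy).

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny word lists (assumption for illustration only).
MALE_TERMS = {"he", "him", "his", "man", "men"}
FEMALE_TERMS = {"she", "her", "hers", "woman", "women"}


def gender_term_counts(text):
    """Count male and female terms in one document using a naive regex tokenizer."""
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter()
    for token in tokens:
        if token in MALE_TERMS:
            counts["male"] += 1
        elif token in FEMALE_TERMS:
            counts["female"] += 1
    return counts


def female_share(documents):
    """Return the fraction of gendered mentions that are female, or None if there are none."""
    total = Counter()
    for doc in documents:
        total.update(gender_term_counts(doc))
    mentions = total["male"] + total["female"]
    return total["female"] / mentions if mentions else None


docs = ["He said his plan works.", "She agreed with him."]
print(female_share(docs))
```

Comparing such a share against an expected value (for example, parity, or the share in a reference corpus) is one simple way to make "represented differently than expected" measurable.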

Examples of papers that investigate gender bias:

Ethnic Bias

Like gender bias, ethnic or racial bias describes bias towards groups of people belonging to an ethnic (or religious) group. Ethnic bias includes harmful stereotypes as well as less blatant but still dangerous aspects like topical bias. Detecting ethnic bias is important not only because it may lead to even more severe instances of racism, but also because it is an infringement of the constitutional right to equal treatment.

Examples of papers that investigate ethnic bias:

Non-Neutral Speech

Non-neutral language covers many aspects of language that are subjective, opinionated, or otherwise imply valuation. This includes toxicity, ranging from forms of hate speech such as racism, incivility, and profane, offensive, or aggressive language, to overly positive praise. Non-neutral language is especially problematic when it appears in types of documents that claim to be neutral, such as Wikipedia or (public) news. A related concept is framing bias, defined as the use of subjective words or phrases linked with a particular opinion.

Examples of papers that investigate non-neutral language:
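A very simple baseline for spotting subjective wording is a lexicon lookup. The sketch below flags sentences that contain words from a small cue list; the lexicon `SUBJECTIVE_CUES` and the function `flag_non_neutral` are hypothetical names chosen for this example, and a real system would rely on a published subjectivity or toxicity lexicon or a trained classifier.

```python
import re

# Hypothetical mini-lexicon of subjective cue words (assumption for illustration only).
SUBJECTIVE_CUES = {"terrible", "brilliant", "disgraceful", "amazing", "obviously"}


def flag_non_neutral(sentences):
    """Return (sentence, matched cue words) pairs for sentences containing cue words."""
    flagged = []
    for sentence in sentences:
        tokens = set(re.findall(r"[a-z]+", sentence.lower()))
        hits = sorted(tokens & SUBJECTIVE_CUES)
        if hits:
            flagged.append((sentence, hits))
    return flagged


sentences = [
    "The city has one million inhabitants.",
    "The mayor made a brilliant and obviously correct decision.",
]
for sentence, hits in flag_non_neutral(sentences):
    print(hits, "->", sentence)
```

A lookup like this ignores context (negation, quotation, irony), so it is best treated as a starting point for inspecting a dataset rather than as a detector.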

Stance Detection

Stance is a concept that describes an opinion on a subject, most often in a political context. The goal of stance detection is to detect the stances of users or authors towards these subjects. Often the subjects are either known from context (for example, abortion, weapon laws, and gay marriage in political texts) or have to be determined using approaches like entity recognition. A related concept is target-dependent or aspect-based sentiment analysis, in which the opinions on aspects (targets) are detected.

Examples of papers that investigate stance detection:
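To make the idea of known targets concrete, here is a minimal rule-based sketch: for each sentence that mentions a target, it counts supporting and opposing cue words and derives a label. The target list reuses the examples above; the cue word sets and the function `detect_stance` are assumptions for illustration, and this crude heuristic is not a substitute for the trained models used in the literature.

```python
import re

# Targets taken from the examples in the text; cue words are hypothetical.
TARGETS = ["abortion", "weapon laws", "gay marriage"]
PRO_CUES = {"support", "favor", "defend"}
CON_CUES = {"oppose", "reject", "ban"}


def detect_stance(text):
    """Map each mentioned target to 'favor', 'against', or 'neutral'."""
    scores = {}
    for sentence in re.split(r"[.!?]", text):
        lowered = sentence.lower()
        tokens = re.findall(r"[a-z]+", lowered)
        # Positive score for supporting cues, negative for opposing cues.
        score = sum(t in PRO_CUES for t in tokens) - sum(t in CON_CUES for t in tokens)
        for target in TARGETS:
            if target in lowered:
                scores[target] = scores.get(target, 0) + score
    labels = {}
    for target, score in scores.items():
        labels[target] = "favor" if score > 0 else "against" if score < 0 else "neutral"
    return labels


print(detect_stance("I support gay marriage. We oppose weapon laws and reject them."))
```

Because the score is attached to every target in the same sentence, this sketch cannot separate mixed opinions within one sentence; that limitation is exactly what target-dependent sentiment analysis tries to address.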

Owner
Classrooms of IR Group at Technische Hochschule Köln