Natural Language Processing - Sommer Semester 2022

Overview

Natural Language Processing (DIS25a/NLP)

This course can be taken for the Bachelor Programm Data and Information Science (DIS25a) or the Master Program Digital Sciences (NLP).

After easter all sessions are hosted at TH Köln, Claudiusstraße 1. The sessions will be held life. Slides will be usually available a night before the actual lecture. We try to record all lectures and tutorials for later referal (not sure how this works out with the sessions at Claudiusstraße).

Schedule for Summer Semester 2022

(L) Lectures; (T) Tutorials; (P) Project

The first lectures and tutorial were recorded and are available online. The password is the same as for the Zoom sessions.

Date Slot 13:30h Slot 15:15h DIS25a (DIS B.Sc.) NLP (DS M.Sc.)
1.4.2022 Introduction and Overview (L) Basic Text Processing (L) x x
8.4.2022 Basic NLP Pipeline: NLTK (T) (solution) Common Toolkit: Spacy (T) (solution) x x
15.4.2022 no lecture
22.4.2022 WordNet (L) Vector Semantics (L) x x
29.4.2022 WordNet, GermaNet (T) (solution) Vector Semantics (T) (solution) x x
6.5.2022 Information Extraction (L) Sentiment Analysis (L) x x
13.5.2022 no lecture
20.5.2022 Language Models and Ethics in NLP (L) Group assignment (P) x x
27.5.2022 Group work (P) Group work (P) x
3.6.2022 Data Programming for IE (L) Group work (P) / Oral Exam Master x x
10.6.2022 Guest Lecture: Dimitar Dimitrov(L) Group work (P) x
17.6.2022 Group work (P) Group work (P) x
24.6.2022 Student talks - Project presentation (P) Student talks - Project presentation (P) x
31.8.2022 Submission of term papers x

Bachelor: Group Assignments

In the group assignments a group of four students has to work on a bias-related topic with a specific focus and on one of three datasets. In the group work phases starting on 20.5.2022 we will be available during the lecture time to help and advise.

In the presentations on 24.6.2022 you are expected to present a concept regarding your specific topic and dataset. Please decribe the motivation, the dataset, your methods and NLP pipeline, a working prototype and some first insights and results.

The feedback gathered during the presentation should be used to write a final term paper on your specific topic and work. Please read the guidelines for the term paper.

Datasets

Choose one of the following datasets to work on:

Topics

Choose one of the following topcis:

Gender Bias

Gender bias is a group bias in which different genders are represented differently in terms of an aspect in a given (set of) document(s) than expected. Aspects for which there can be a bias range from quantitative measures (e.g., how many documents have male/female authors) to more complex NLP measures (e.g., different sentiments in texts about male/female politicians or topical bias, different distributions of topics in texts geared towards male/female readers).

Exaples for papers that investigate gender bias:

Ethnic Bias

Like gender bias, ethnic or racial bias describes bias towards groups of people belonging to an ethnical (or religious) group. Ethnic bias includes harmful stereotypes and less blatant but still dangerous aspects like topical bias. Detecting ethnic bias is not only important because it may lead to even more severe instances of racism, and it is an infringement of the constitutional right to equal treatment.

Exaples for papers that investigate ethnic bias:

Non-Neutral Speech

Non-neutral language consists of many aspects of language that is subjective, opinionated, or otherwise implies valuation. This includes toxicity, ranging from forms of hate speech such as racism, incivility, profane, offensive and aggressive language to over-positive praises. Non-neutral language is especially problematic when it appears in types of documents that claim to be neutral, such as wikipedia or (public) news. A related concept is framing bias, defined as the use of subjective words or phrases linked with a particular opinion.

Exaples for papers that investigate non-neutral language:

Stance Detection

Stance is a concept that describes an opinion on a subject, most often in a political context. The goal of stance detection is to detect the stances of users/authors towards these subjects. Often, the subjects are known due to context (for example, abortion, weapon laws and gay marriage in political texts) or they have to be determined using approaches like entity recognition. A related concept is that of target-dependent or aspect-based sentiment analysis, in which the opinions on aspects (targets) are detected.

Exaples for papers that investigate stance detection:

Owner
Classrooms of IR Group at Technische Hochschule Köln
Classrooms of IR Group at Technische Hochschule Köln
Cve-2021-22005-exp

cve-2021-22005-exp 0x01 漏洞简介 2021年9月21日,VMware发布安全公告,公开披露了vCenter Server中的19个安全漏洞,这些漏洞的CVSSv3评分范围为4.3-9.8。 其中,最为严重的漏洞为vCenter Server 中的任意文件上传漏洞(CVE-20

Jing Ling 146 Dec 31, 2022
SonicWall SMA-100 Unauth RCE Exploit (CVE-2021-20038)

Bad Blood Bad Blood is an exploit for CVE-2021-20038, a stack-based buffer overflow in the httpd binary of SMA-100 series systems using firmware versi

Jake Baines 80 Dec 29, 2022
Malware for Discord, designed to steal passwords, tokens, and inject discord folders for long-term use.

Vital What is Vital? Vital is malware primarily used to collect and extract information from the Discord desktop client. While it has other features (

HellSec 59 Dec 01, 2022
LaxrFar Python Obfuscator

LaxrFar Python Obfuscator Usage First do the things from "Upload to Webserver" o

LaxrFar 5 Jul 19, 2022
Using python 3 and Flask an MVC system where the AES 128 CBC and Trivium algorithms

This project was developed using python 3 and Flask, it is an MVC system where the AES 128 CBC and Trivium algorithms can be tested through a communication between the computer and a device such as a

Brandon Israel Camacho Reyes 1 Dec 26, 2021
An intranet tool for easily intranet pentesting

IntarKnife v1.0 a tool can be used in intarnet for easily pentesting moudle hash spray U can use this tool to spray hash on a webshell IntraKnife.exe

4 Nov 24, 2021
PreviewGram is for users that wants get a more private experience with the Telegram's Channel.

PreviewGram is for users that wants get a more private experience with the Telegram's Channel.

1 Sep 25, 2022
Python library to remotely extract credentials on a set of hosts.

Python library to remotely extract credentials on a set of hosts.

Pixis 1.5k Dec 31, 2022
SpiderFoot automates OSINT collection so that you can focus on analysis.

SpiderFoot is an open source intelligence (OSINT) automation tool. It integrates with just about every data source available and utilises a range of m

Steve Micallef 9k Jan 08, 2023
These are Simple python scripts to test/scan your network

Disclaimer This tool is for Educational purpose only. We do not promote or encourage any illegal activities. Summary These are Simple python scripts t

Varun Jagtap 5 Oct 08, 2022
A script to search, scrape and scan for Apache Log4j CVE-2021-44228 affected files using Google dorks

Log4j dork scanner This is an auto script to search, scrape and scan for Apache Log4j CVE-2021-44228 affected files using Google dorks. Installation:

Jagar 5 Dec 27, 2022
Laravel RCE (CVE-2021-3129)

CVE-2021-3129 - Laravel RCE About The script has been made for exploiting the Laravel RCE (CVE-2021-3129) vulnerability. This script allows you to wri

Joshua van der Poll 21 Dec 27, 2022
Just another script for automatize boolean-based blind SQL injections.

SQL Blind Injection Tool A script for automatize boolean-based blind SQL injections. Works with SQLite at least, supports using cookies. It uses bitwi

RIM 51 Dec 15, 2022
Password-Manager - This app can generate ,save , find and delete passwords.

Password-Manager This app can generate ,save , find and delete passwords. In the StartUp() Function , there are three buttons to choose from : Generat

1 Jan 01, 2022
IPscan - This Script is Framework To automate IP process large scope For Bug Hunting

IPscan This Script is Framework To automate IP process large scope For Bug Hunti

0xd2rdir 8 Mar 12, 2022
This exploit allows to connect to the remote RemoteMouse 3.008 service to virtually press arbitrary keys and execute code on the machine.

RemoteMouse-3.008-Exploit The RemoteMouse application is a program for remotely controlling a computer from a phone or tablet. This exploit allows to

Podalirius 25 Dec 04, 2022
Safe Policy Optimization with Local Features

Safe Policy Optimization with Local Feature (SPO-LF) This is the source-code for implementing the algorithms in the paper "Safe Policy Optimization wi

Akifumi Wachi 6 Jun 05, 2022
SSLyze is a fast and powerful SSL/TLS scanning tool and Python library.

SSLyze SSLyze is a fast and powerful SSL/TLS scanning tool and Python library. SSLyze can analyze the SSL/TLS configuration of a server by connecting

Alban Diquet 2.8k Jan 03, 2023
The RDT protocol (RDT3.0,GBN,SR) implementation and performance evaluation code using socket

소켓을 이용한 RDT protocols (RDT3.0,GBN,SR) 구현 및 성능 평가 코드 입니다. 코드를 실행할때 리시버를 먼저 실행하세요. 성능 평가 코드는 패킷 전송 과정을 제외하고 시간당 전송률을 출력합니다. RDT3.0 GBN SR(버그 발견으로 구현중 입니

kimtaeyong98 0 Dec 20, 2021
A collection of write-ups and solutions for Cyber FastTrack Spring 2021.

IMPORTANT: Please contact us before you use any styling or content shown here! Cyber FastTrack Spring 2021 / National Cyber Scholarship Competition -

Alice 48 Aug 28, 2022