Natural Language Processing - Sommer Semester 2022

Overview

Natural Language Processing (DIS25a/NLP)

This course can be taken for the Bachelor Programm Data and Information Science (DIS25a) or the Master Program Digital Sciences (NLP).

After easter all sessions are hosted at TH Köln, Claudiusstraße 1. The sessions will be held life. Slides will be usually available a night before the actual lecture. We try to record all lectures and tutorials for later referal (not sure how this works out with the sessions at Claudiusstraße).

Schedule for Summer Semester 2022

(L) Lectures; (T) Tutorials; (P) Project

The first lectures and tutorial were recorded and are available online. The password is the same as for the Zoom sessions.

Date Slot 13:30h Slot 15:15h DIS25a (DIS B.Sc.) NLP (DS M.Sc.)
1.4.2022 Introduction and Overview (L) Basic Text Processing (L) x x
8.4.2022 Basic NLP Pipeline: NLTK (T) (solution) Common Toolkit: Spacy (T) (solution) x x
15.4.2022 no lecture
22.4.2022 WordNet (L) Vector Semantics (L) x x
29.4.2022 WordNet, GermaNet (T) (solution) Vector Semantics (T) (solution) x x
6.5.2022 Information Extraction (L) Sentiment Analysis (L) x x
13.5.2022 no lecture
20.5.2022 Language Models and Ethics in NLP (L) Group assignment (P) x x
27.5.2022 Group work (P) Group work (P) x
3.6.2022 Data Programming for IE (L) Group work (P) / Oral Exam Master x x
10.6.2022 Guest Lecture: Dimitar Dimitrov(L) Group work (P) x
17.6.2022 Group work (P) Group work (P) x
24.6.2022 Student talks - Project presentation (P) Student talks - Project presentation (P) x
31.8.2022 Submission of term papers x

Bachelor: Group Assignments

In the group assignments a group of four students has to work on a bias-related topic with a specific focus and on one of three datasets. In the group work phases starting on 20.5.2022 we will be available during the lecture time to help and advise.

In the presentations on 24.6.2022 you are expected to present a concept regarding your specific topic and dataset. Please decribe the motivation, the dataset, your methods and NLP pipeline, a working prototype and some first insights and results.

The feedback gathered during the presentation should be used to write a final term paper on your specific topic and work. Please read the guidelines for the term paper.

Datasets

Choose one of the following datasets to work on:

Topics

Choose one of the following topcis:

Gender Bias

Gender bias is a group bias in which different genders are represented differently in terms of an aspect in a given (set of) document(s) than expected. Aspects for which there can be a bias range from quantitative measures (e.g., how many documents have male/female authors) to more complex NLP measures (e.g., different sentiments in texts about male/female politicians or topical bias, different distributions of topics in texts geared towards male/female readers).

Exaples for papers that investigate gender bias:

Ethnic Bias

Like gender bias, ethnic or racial bias describes bias towards groups of people belonging to an ethnical (or religious) group. Ethnic bias includes harmful stereotypes and less blatant but still dangerous aspects like topical bias. Detecting ethnic bias is not only important because it may lead to even more severe instances of racism, and it is an infringement of the constitutional right to equal treatment.

Exaples for papers that investigate ethnic bias:

Non-Neutral Speech

Non-neutral language consists of many aspects of language that is subjective, opinionated, or otherwise implies valuation. This includes toxicity, ranging from forms of hate speech such as racism, incivility, profane, offensive and aggressive language to over-positive praises. Non-neutral language is especially problematic when it appears in types of documents that claim to be neutral, such as wikipedia or (public) news. A related concept is framing bias, defined as the use of subjective words or phrases linked with a particular opinion.

Exaples for papers that investigate non-neutral language:

Stance Detection

Stance is a concept that describes an opinion on a subject, most often in a political context. The goal of stance detection is to detect the stances of users/authors towards these subjects. Often, the subjects are known due to context (for example, abortion, weapon laws and gay marriage in political texts) or they have to be determined using approaches like entity recognition. A related concept is that of target-dependent or aspect-based sentiment analysis, in which the opinions on aspects (targets) are detected.

Exaples for papers that investigate stance detection:

Owner
Classrooms of IR Group at Technische Hochschule Köln
Classrooms of IR Group at Technische Hochschule Köln
Provides script to download and format public IP lists related to the Log4j exploit.

Provides script to download and format public IP lists related to the Log4j exploit. Current format includes: plain list, Cisco ASA Network Group.

Gianluca Ulivi 1 Jan 02, 2022
Hack computer in the form of RAR files from all types of clients, even Linux

Program Features 📌 Hide malware 📌 Vulnerability software vulnerabilities RAR 📌 Creating malware 📌 Access client files 📌 Client Hacking 📌 Link Do

hack4lx 5 Nov 25, 2022
Js File Scanner This is Js File Scanner

Js File Scanner This is Js File Scanner . Which are scan in js file and find juicy information Toke,Password Etc.

122 Dec 12, 2022
Selamat Datang DiTools Crack-Old, Crack Old Adalah Sebuah Crack Tanpa Login Dan Crack Menggunakan Akun Facebook Tua/Old.

Selamat Datang DiTools Crack-Old, Crack Old Adalah Sebuah Crack Tanpa Login Dan Crack Menggunakan Akun Facebook Tua/Old. ([Welcome to Crack-Old Tools, Old Crack Is A Crack Without Login And Crack Usi

Risky [ Zero Tow ] 7 Dec 25, 2022
Collection Of Discord Hacking Tools / Fun Stuff / Exploits That Is Completely Made Using Python.

Venom Collection Of Discord Hacking Tools / Fun Stuff / Exploits That Is Completely Made Using Python. Report Bug · Request Feature Contributing Well,

PndaBoi 25 Dec 06, 2022
A web-app helping to create strong passwords that are easy to remember.

This is a simple Web-App that demonstrates a method of creating strong passwords that are still easy to remember. It also provides time estimates how long it would take an attacker to crack a passwor

2 Jun 04, 2021
Phishing-Crack tools to punish friends

Phishing-Crack Phishing Tool Version 1.0.0 Created By temirovazat A Phishing Tool With PHP and Python3 Features Fake Instagram Phishing Page Fake Face

3 Oct 04, 2022
WhPhisher: a Phishing tool With Python

WhPhisher Herramienta para hacer phishing con muchos métodos de túneling -----Como Instalarlo------- pkg install python3 pkg install git git clone htt

WhBeatZ 80 Jan 02, 2023
Vulmap 是一款 web 漏洞扫描和验证工具, 可对 webapps 进行漏洞扫描, 并且具备漏洞利用功能

Vulmap 是一款 web 漏洞扫描和验证工具, 可对 webapps 进行漏洞扫描, 并且具备漏洞利用功能

之乎者也 2.8k Dec 29, 2022
Phoenix Framework is an environment for writing, testing and using exploit code.

Phoenix Framework is an environment for writing, testing and using exploit code. 🖼 Screenshots 🎪 Community PwnWiki Forums 🔑 Licen

42 Aug 09, 2022
Proof of concept GnuCash Webinterface

Proof of Concept GnuCash Webinterface This may one day be a something truly great. Milestones [ ] Browse accounts and view transactions [ ] Record sim

Josh 14 Dec 28, 2022
A python package with tools to read and postprocess the output of the channel DNS-solver (davecats/channel), as well as its associated postprocessing tools.

Python tools for davecats/channel A python package with tools to read and postprocess the output of the channel dns solver, as well as its associated

Andrea Andreolli 1 Dec 13, 2021
Grafana-POC(CVE-2021-43798)

Grafana-Poc 此工具请勿用于违法用途。 一、使用方法:python3 grafana_hole.py 在domain.txt中填入ip:port 二、漏洞影响范围 影响版本: Grafana 8.0.0 - 8.3.0 安全版本: Grafana 8.3.1, 8.2.7, 8.1.8,

8 Jan 03, 2023
"Video Moment Retrieval from Text Queries via Single Frame Annotation" in SIGIR 2022.

ViGA: Video moment retrieval via Glance Annotation This is the official repository of the paper "Video Moment Retrieval from Text Queries via Single F

Ran Cui 38 Dec 31, 2022
DomainMonitor is a web project that has a RESTful API to get a domain's subdomains and whois data.

DomainMonitor is a web project that has a RESTful API to get a domain's subdomains and whois data.

2 Feb 05, 2022
Show apps recorded storage files by jailbreak

0x101 Show registered storage files of apps by jailbreak Legal disclaimer: Usage of insTof for attacking targets without prior mutual consent is illeg

0x 4 Oct 24, 2022
Mass Shortlink Bypass Merupakan Tools Yang Akan Bypass Shortlink Ke Tujuan Asli, Dibuat Dengan Python 3

Shortlink-Bypass Mass Shortlink Bypass Merupakan Tools Yang Akan Bypass Shortlink Ke Tujuan Asli, Dibuat Dengan Python 3 Support Shortlink tii.ai/tei.

Wan Naz ID 6 Oct 24, 2022
Script to calculate Active Directory Kerberos keys (AES256 and AES128) for an account, using its plaintext password

Script to calculate Active Directory Kerberos keys (AES256 and AES128) for an account, using its plaintext password

Matt Creel 27 Dec 20, 2022
A fully automated, accurate, and extensive scanner for finding vulnerable log4j hosts

log4j-scan A fully automated, accurate, and extensive scanner for finding vulnerable log4j hosts Features Support for lists of URLs. Fuzzing for more

Duc Linh Nguyen 4 Aug 08, 2022
集成crawlergo、xray、dirsearch、nmap等工具的src漏洞挖掘工具,使用docker封装运行;

tools下有几个工具,所以项目文件比较大,如果下载总是中断的话建议拆开下载各个项目然后直接拷贝dockefile和recon.py即可 0x01 hscan介绍 hscan是什么 hscan是一款旨在使用一条命令替代渗透前的多条扫描命令,通过集成crawlergo扫描和xray扫描、dirsear

102 Jan 04, 2023