Data and code to support "Applied Natural Language Processing" (INFO 256, Fall 2021, UC Berkeley)

Last update: Dec 06, 2022

anlp21

Course materials for "Applied Natural Language Processing" (INFO 256, Fall 2021, UC Berkeley) Syllabus: http://people.ischool.berkeley.edu/~dbamman/info256.html

Notebook	Description
1.words/EvaluateTokenizationForSentiment	Evaluate the impact of tokenization choices on sentiment classification
1.words/ExploreTokenization	Explore different methods for tokenizing texts (whitespace, NLTK, spaCy, regex)
1.words/TokenizePrintedBooks	Design a better tokenizer for printed books
1.words/Text_Complexity	Implement type-token ratio and Flesch-Kincaid Grade Level scores for text
2.compare/ChiSquare, Mann-Whitney Tests	Explore two tests for finding distinctive terms
2.compare/Log-odds ratio with priors	Implement the log-odds ratio with an informative (and uninformative) Dirichlet prior
3.dictionaries/DictionaryTimeSeries	Plot sentiment over time using human-defined dictionaries
3.dictionaries/Empath	Explore using Empath dictionaries to characterize texts
4.embeddings/DistributionalSimilarity	Explore the distributional hypothesis to build high-dimensional, sparse representations for words
4.embeddings/WordEmbeddings	Explore word embeddings using Gensim
4.embeddings/Semaxis	Implement SemAxis for scoring terms along a user-defined axis (e.g., positive-negative, concrete-abstract, hot-cold)
4.embeddings/BERT	Explore the basics of token representations in BERT and use them to find a token's nearest neighbors
4.embeddings/SequenceEmbeddings	Use sequence embeddings to find the TV episode summaries most similar to a short description
5.eda/WordSenseClustering	Infer distinct word senses using KMeans clustering over BERT representations
5.eda/Haiku KMeans	Explore text representation in clustering by trying to group haiku and non-haiku poems into two distinct clusters
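The Text_Complexity notebook computes type-token ratio and Flesch-Kincaid Grade Level. As a rough illustration of those two measures (not the notebook's own code), here is a minimal sketch in plain Python. The Flesch-Kincaid formula is the standard one (0.39 x words per sentence + 11.8 x syllables per word - 15.59); the regex sentence splitter and the vowel-group syllable counter are simplifying assumptions, and the function names are hypothetical.

```python
import re

def type_token_ratio(tokens):
    """Number of distinct word types divided by the number of tokens."""
    if not tokens:
        return 0.0
    return len(set(tokens)) / len(tokens)

def count_syllables(word):
    """Rough heuristic (an assumption): one syllable per run of vowels, at least one."""
    return max(len(re.findall(r"[aeiouy]+", word.lower())), 1)

def flesch_kincaid_grade(text):
    """Standard Flesch-Kincaid Grade Level with naive sentence and word splitting."""
    # Assumption: sentences end in ., ! or ?; words are runs of ASCII letters.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z]+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return 0.39 * len(words) / len(sentences) + 11.8 * syllables / len(words) - 15.59
```

Very short, monosyllabic text can yield a negative grade level. That is a property of the formula itself, not a bug in the sketch.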
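One of the two tests in "ChiSquare, Mann-Whitney Tests" scores how distinctive a term is for one corpus relative to another. The following is a minimal sketch of the chi-square part, under the assumption that each term is tested with a 2x2 contingency table (this term vs. all other tokens, corpus A vs. corpus B). The function name `chi_square_term` is hypothetical. The statistic is the Pearson sum of (observed - expected)^2 / expected, with no continuity correction.

```python
def chi_square_term(count_a, count_b, total_a, total_b):
    """Pearson chi-square for one term in corpus A vs. corpus B.

    count_a, count_b: occurrences of the term in each corpus.
    total_a, total_b: total token counts of each corpus.
    """
    # Rows: [this term, all other tokens]; columns: [corpus A, corpus B].
    observed = [
        [count_a, count_b],
        [total_a - count_a, total_b - count_b],
    ]
    row_sums = [sum(row) for row in observed]
    col_sums = [observed[0][j] + observed[1][j] for j in range(2)]
    n = sum(row_sums)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            # Expected count under independence of term and corpus.
            expected = row_sums[i] * col_sums[j] / n
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2
```

For comparison, `scipy.stats.chi2_contingency` computes the same statistic from the 2x2 table, but it applies Yates' continuity correction to 2x2 tables unless you pass `correction=False`.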
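The log-odds ratio with a Dirichlet prior (as described by Monroe, Colaresi, and Quinn) compares the smoothed log-odds of word w in corpus i and corpus j, then divides by an approximate standard deviation to get a z-score: delta_w = log((y_i + a_w) / (n_i + a_0 - y_i - a_w)) - log((y_j + a_w) / (n_j + a_0 - y_j - a_w)), with variance approximately 1/(y_i + a_w) + 1/(y_j + a_w). The sketch below is illustrative and is not the notebook's code. The two prior helpers are hypothetical choices. The informative prior here simply reuses the combined counts of both corpora as pseudo-counts, where a separate background corpus is also common. The uninformative prior gives every word the same small pseudo-count.

```python
import math
from collections import Counter

def log_odds_dirichlet(counts_i, counts_j, prior):
    """z-scored log-odds of each word in corpus i vs. corpus j.

    counts_i, counts_j: Counter mapping word -> count in each corpus.
    prior: dict mapping word -> Dirichlet pseudo-count alpha_w.
    """
    n_i = sum(counts_i.values())
    n_j = sum(counts_j.values())
    alpha_0 = sum(prior.values())
    scores = {}
    for w, a_w in prior.items():
        y_i = counts_i.get(w, 0)
        y_j = counts_j.get(w, 0)
        log_odds_i = math.log((y_i + a_w) / (n_i + alpha_0 - y_i - a_w))
        log_odds_j = math.log((y_j + a_w) / (n_j + alpha_0 - y_j - a_w))
        delta = log_odds_i - log_odds_j
        variance = 1.0 / (y_i + a_w) + 1.0 / (y_j + a_w)
        scores[w] = delta / math.sqrt(variance)
    return scores

def informative_prior(counts_i, counts_j):
    """Hypothetical informative prior: combined counts of both corpora."""
    return dict(counts_i + counts_j)

def uninformative_prior(vocab, alpha=0.01):
    """Hypothetical uninformative prior: the same small pseudo-count for every word."""
    return {w: alpha for w in vocab}
```

Positive scores mark words distinctive of corpus i, and negative scores mark words distinctive of corpus j. Sorting `scores` by value gives a ranked list for each side.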
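The DistributionalSimilarity notebook builds high-dimensional, sparse word representations from the contexts a word appears in. As a minimal sketch of that idea (assuming raw co-occurrence counts within a symmetric window, with no weighting such as PPMI), each word can be represented as a sparse `Counter` of its context words and compared with cosine similarity. The function names and the window size of 2 are illustrative assumptions.

```python
import math
from collections import Counter, defaultdict

def context_vectors(sentences, window=2):
    """Map each word to a sparse Counter of words seen within `window` positions."""
    vectors = defaultdict(Counter)
    for tokens in sentences:
        for i, target in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[target][tokens[j]] += 1
    return vectors

def sparse_cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * v[ctx] for ctx, count in u.items() if ctx in v)
    norm_u = math.sqrt(sum(c * c for c in u.values()))
    norm_v = math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

In this sketch, words that share contexts ("cat" and "dog" both occurring near "the" and "drinks") end up with high cosine similarity. That is the distributional hypothesis in its simplest form.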
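SemAxis scores a term by projecting it onto an axis defined by two sets of seed words. The sketch below is a simplified version that is not taken from the notebook. The axis is the mean vector of the positive seeds minus the mean vector of the negative seeds, and a word's score is its cosine similarity with that axis. The original method can also expand each pole with nearest neighbors, which is omitted here. The toy two-dimensional `embeddings` dict is fabricated for illustration. In practice the vectors would come from a trained model, such as a Gensim `KeyedVectors` object.

```python
import math

def mean_vector(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    dim = len(vectors[0])
    return [sum(v[k] for v in vectors) / len(vectors) for k in range(dim)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def semaxis_score(word, pos_seeds, neg_seeds, embeddings):
    """Cosine of `word` with the axis (mean of positive pole - mean of negative pole)."""
    pos = mean_vector([embeddings[w] for w in pos_seeds])
    neg = mean_vector([embeddings[w] for w in neg_seeds])
    axis = [p - n for p, n in zip(pos, neg)]
    return cosine(embeddings[word], axis)

# Hypothetical toy embeddings, for illustration only.
embeddings = {
    "good": [1.0, 0.1], "great": [0.9, 0.2],
    "bad": [-1.0, 0.1], "awful": [-0.9, 0.0],
    "fine": [0.5, 0.5], "terrible": [-0.6, 0.3],
}
```

Scores fall in [-1, 1]. Values near 1 lean toward the positive pole, and values near -1 lean toward the negative pole.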

Owner

David Bamman