Proposed n-stage Latent Dirichlet Allocation method - A Novel Approach for LDA

Overview

n-stage Latent Dirichlet Allocation (n-LDA)

Proposed n-LDA & A Novel Approach for classical LDA

Latent Dirichlet Allocation (LDA) is a generative probabilistic topic model for a given text collection. Topics have a probability distribution over words and text documents over topics. Each subject has a probability distribution over the fixed word corpus [1]. The method exemplifies a mix of these topics for each document. Then, a model is produced by sampling words from this mixture [2].

The coherence value, which is the topic modeling criterion, is used to determine the number of K topic in the system. The coherence value calculates the closeness of words to each other. The topic value of the highest one among the calculated consistency values is chosen as the topic number of the system [3].

After modeling the system with classical LDA, an LDA-based n-stage method is proposed to increase the success of the model. The value of n in the method may vary according to the size of the dataset. With the method, it is aimed to delete the words in the corpus that negatively affect the success. Thus, with the increase in the weight values of the words in the topics formed with the remaining words, the class labels of the topics can be determined more easily [4].

image

The steps of the method are shown in above Figure. In order to reduce the number of words in the dictionary, the threshold value for each topic is calculated. The threshold value is obtained by dividing the sum of the weights of all the words to the word count in the relevant topic. Words with a weight less than the specified threshold value are deleted from the topics and a new dictionary is created for the model. Finally, the system is re-modeled using the LDA algorithm with the new dictionary. These steps can be repeated n times [4].

This method was applied for Turkish and English language. n-stage LDA method was better than classic LDA according to related studies.

Related papers & articles for n-stage LDA

!!! Please citation first paper:

@inproceedings{guven2019comparison,
  title={Comparison of Topic Modeling Methods for Type Detection of Turkish News},
  author={G{\"u}ven, Zekeriya Anil and Diri, Banu and {\c{C}}akalo{\u{g}}lu, Tolgahan},
  booktitle={2019 4th International Conference on Computer Science and Engineering (UBMK)},
  pages={150--154},
  year={2019},
  organization={IEEE}
  doi={10.1109/UBMK.2019.8907050}
}

1-Guven, Z. A., Diri, B., & Cakaloglu, T. (2018, October). Classification of New Titles by Two Stage Latent Dirichlet Allocation. In 2018 Innovations in Intelligent Systems and Applications Conference (ASYU) (pp. 1-5). Ieee.

2-Guven, Z. A., Diri, B., & Cakaloglu, T. (2021). Evaluation of Non-Negative Matrix Factorization and n-stage Latent Dirichlet Allocation for Emotion Analysis in Turkish Tweets. arXiv preprint arXiv:2110.00418.

3-Güven, Z. A., Diri, B., & Çakaloğlu, T. (2020). Comparison of n-stage Latent Dirichlet Allocation versus other topic modeling methods for emotion analysis. Journal of the Faculty of Engineering and Architecture of Gazi University, 35(4), 2135-2146.

4-Güven, Z. A., Diri, B., & Çakaloğlu, T. (2018, April). Classification of TurkishTweet emotions by n-stage Latent Dirichlet Allocation. In 2018 Electric Electronics, Computer Science, Biomedical Engineerings' Meeting (EBBT) (pp. 1-4). IEEE.

5-Güven, Z. A., Diri, B., & Çakaloğlu, T. (2019, September). Comparison of Topic Modeling Methods for Type Detection of Turkish News. In 2019 4th International Conference on Computer Science and Engineering (UBMK) (pp. 150-154). IEEE.

6-GÜVEN, Z. A., Banu, D. İ. R. İ., & ÇAKALOĞLU, T. (2019). Emotion Detection with n-stage Latent Dirichlet Allocation for Turkish Tweets. Academic Platform Journal of Engineering and Science, 7(3), 467-472.

7-Güven, Z. A., Diri, B., & Çakaloğlu, T. Comparison Method for Emotion Detection of Twitter Users. In 2019 Innovations in Intelligent Systems and Applications Conference (ASYU) (pp. 1-5). IEEE.

References

[1] David M. Blei, Andrew Y. Ng, and Michael I. Jordan. Latent Dirichlet allocation.Journal of Machine LearningResearch, 2003. ISSN 15324435. doi:10.1016/b978-0-12-411519-4.00006-9.

[2] Yong Chen, Hui Zhang, Rui Liu, Zhiwen Ye, and Jianying Lin.Experimental explorations on short texttopic mining between LDA and NMF based Schemes.Knowledge-Based Systems, 2019. ISSN 09507051.doi:10.1016/j.knosys.2018.08.011.

[3] Zekeriya Anil Güven, Banu Diri, and Tolgahan Çakaloˇglu. Classification of New Titles by Two Stage Latent DirichletAllocation. InProceedings - 2018 Innovations in Intelligent Systems and Applications Conference, ASYU 2018, 2018.ISBN 9781538677865. doi:10.1109/ASYU.2018.8554027.

[4] Guven, Zekeriya Anil, Banu Diri, and Tolgahan Cakaloglu. "Evaluation of Non-Negative Matrix Factorization and n-stage Latent Dirichlet Allocation for Emotion Analysis in Turkish Tweets." arXiv preprint arXiv:2110.00418 (2021).

Owner
Anıl Güven
Anıl Güven
Predicting lncRNA–protein interactions based on graph autoencoders and collaborative training

Predicting lncRNA–protein interactions based on graph autoencoders and collaborative training Code for our paper "Predicting lncRNA–protein interactio

zhanglabNKU 1 Nov 29, 2022
Julia and Matlab codes to simulated all problems in El-Hachem, McCue and Simpson (2021)

Substrate_Mediated_Invasion Julia and Matlab codes to simulated all problems in El-Hachem, McCue and Simpson (2021) 2DSolver.jl reproduces the simulat

Matthew Simpson 0 Nov 09, 2021
Semantically Contrastive Learning for Low-light Image Enhancement

Semantically Contrastive Learning for Low-light Image Enhancement Here, we propose an effective semantically contrastive learning paradigm for Low-lig

48 Dec 16, 2022
Blender add-on: Add to Cameras menu: View → Camera, View → Add Camera, Camera → View, Previous Camera, Next Camera

Blender add-on: Camera additions In 3D view, it adds these actions to the View|Cameras menu: View → Camera : set the current camera to the 3D view Vie

German Bauer 11 Feb 08, 2022
This YoloV5 based model is fit to detect people and different types of land vehicles, and displaying their density on a fitted map, according to their coordinates and detected labels.

This YoloV5 based model is fit to detect people and different types of land vehicles, and displaying their density on a fitted map, according to their

Liron Bdolah 8 May 22, 2022
This repository contains a Ruby API for utilizing TensorFlow.

tensorflow.rb Description This repository contains a Ruby API for utilizing TensorFlow. Linux CPU Linux GPU PIP Mac OS CPU Not Configured Not Configur

somatic labs 825 Dec 26, 2022
ClevrTex: A Texture-Rich Benchmark for Unsupervised Multi-Object Segmentation

ClevrTex This repository contains dataset generation code for ClevrTex benchmark from paper: ClevrTex: A Texture-Rich Benchmark for Unsupervised Multi

Laurynas Karazija 26 Dec 21, 2022
Deep Learning and Logical Reasoning from Data and Knowledge

Logic Tensor Networks (LTN) Logic Tensor Network (LTN) is a neurosymbolic framework that supports querying, learning and reasoning with both rich data

171 Dec 29, 2022
[SIGGRAPH Asia 2019] Artistic Glyph Image Synthesis via One-Stage Few-Shot Learning

AGIS-Net Introduction This is the official PyTorch implementation of the Artistic Glyph Image Synthesis via One-Stage Few-Shot Learning. paper | suppl

Yue Gao 102 Jan 02, 2023
A collection of resources on GAN Inversion.

This repo is a collection of resources on GAN inversion, as a supplement for our survey

Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models.

WECHSEL Code for WECHSEL: Effective initialization of subword embeddings for cross-lingual transfer of monolingual language models. arXiv: https://arx

Institute of Computational Perception 45 Dec 29, 2022
Large Scale Multi-Illuminant (LSMI) Dataset for Developing White Balance Algorithm under Mixed Illumination

Large Scale Multi-Illuminant (LSMI) Dataset for Developing White Balance Algorithm under Mixed Illumination (ICCV 2021) Dataset License This work is l

DongYoung Kim 33 Jan 04, 2023
Repositório criado para abrigar os notebooks com a listas de exercícios propostos pelo professor Gustavo Guanabara do canal Curso em Vídeo do YouTube durante o Curso de Python 3

Curso em Vídeo - Exercícios de Python 3 Sobre o repositório Este repositório contém os notebooks com a listas de exercícios propostos pelo professor G

João Pedro Pereira 9 Oct 15, 2022
This repository provides code for "On Interaction Between Augmentations and Corruptions in Natural Corruption Robustness".

On Interaction Between Augmentations and Corruptions in Natural Corruption Robustness This repository provides the code for the paper On Interaction B

Meta Research 33 Dec 08, 2022
Robustness via Cross-Domain Ensembles

Robustness via Cross-Domain Ensembles [ICCV 2021, Oral] This repository contains tools for training and evaluating: Pretrained models Demo code Traini

Visual Intelligence & Learning Lab, Swiss Federal Institute of Technology (EPFL) 27 Dec 23, 2022
Breast cancer is been classified into benign tumour and malignant tumour.

Breast cancer is been classified into benign tumour and malignant tumour. Logistic regression is applied in this model.

1 Feb 04, 2022
Imagededup - 😎 Finding duplicate images made easy

imagededup is a python package that simplifies the task of finding exact and near duplicates in an image collection.

idealo 4.3k Jan 07, 2023
Code for: Imagine by Reasoning: A Reasoning-Based Implicit Semantic Data Augmentation for Long-Tailed Classification

Imagine by Reasoning: A Reasoning-Based Implicit Semantic Data Augmentation for Long-Tailed Classification Prerequisite PyTorch = 1.2.0 Python3 torch

16 Dec 14, 2022
An end-to-end machine learning library to directly optimize AUC loss

LibAUC An end-to-end machine learning library for AUC optimization. Why LibAUC? Deep AUC Maximization (DAM) is a paradigm for learning a deep neural n

Andrew 75 Dec 12, 2022
Training a Resilient Q-Network against Observational Interference, Causal Inference Q-Networks

Obs-Causal-Q-Network AAAI 2022 - Training a Resilient Q-Network against Observational Interference Preprint | Slides | Colab Demo | Environment Setup

23 Nov 21, 2022