Exploring dimension-reduced embeddings

Last update: Nov 29, 2022

Overview

sleepwalk

Exploring dimension-reduced embeddings

This is the code repository. The Sleepwalk web page is at https://anders-biostat.github.io/sleepwalk/.

License and disclaimer

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.

Comments

Error running sleepwalk: cannot open the connection
Dear sleepwalk developers, thanks a lot for providing such a nice method. I could install the package, but I get the following error when I try to run it:

> sleepwalk([email protected][email protected], [email protected][email protected])
Estimating 'maxdist' for feature matrix 1
Server has been stopped.
Server has been stopped.
Error in app$openPage(useViewer, browser) : Timeout waiting for websocket.
In addition: Warning messages:
1: In file(con, "r") : cannot open file 'sleepwalk_canvas.html': No such file or directory
2: In func(req) : File '/favicon.ico' is not found

I know this is probably not a sleepwalk specific error, but I couldn't find a solution for this. Any hints/help on how to fix this issue?

Also, I have a question about the output. Besides using the interactive mode to manually inspect cells that might be "misplaced" in the reduced-dimension space, I would like to systematically find the cells that don't quite fit the clusters they were originally assigned to. In other words, how would you suggest using sleepwalk to refine my clustering? I suspect that many of my cells were wrongly assigned to their clusters. I am using the Seurat package for dimension reduction and clustering.

Thank you very much, Gustavo
opened by gufranca 2
Error: 'browser' must be a non-empty character string
Hello,

After calling the sleepwalk function on a Seurat object, I got this error:

> sleepwalk( as.matrix([email protected][email protected]), as.matrix([email protected][email protected]) )
Estimating 'maxdist' for feature matrix 1
Error in browseURL(str_c("http://localhost:", port, "/", pageobj$startPage), : 'browser' must be a non-empty character string

I have loaded the stringr library (containing the function str_c()), and I cannot find the file originating this error. Can I ask if someone had this problem at some point?

Thank you
opened by PedroRaposo 2
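
A possible angle on the error above, offered as a sketch rather than a confirmed fix: the message is raised by base R's utils::browseURL(), which takes its default browser from getOption("browser") and rejects anything that is neither a function nor a single non-empty string. The helper name browser_is_usable and the "firefox" value below are assumptions made for illustration only.

```r
# Hypothetical helper: TRUE if `x` is something utils::browseURL() accepts as
# its `browser` argument (a function, or one non-empty character string).
browser_is_usable <- function(x) {
  is.function(x) ||
    (is.character(x) && length(x) == 1 && !is.na(x) && nzchar(x))
}

# If the session has no usable default browser, set one before calling sleepwalk().
if (!browser_is_usable(getOption("browser"))) {
  # "firefox" is a placeholder; use a browser command that exists on your system.
  options(browser = "firefox")
}
```

Whether this resolves the reported case depends on why the option was empty in that session, which the report does not say.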
slw_on_selection error when sleepwalk is not attached

Running sleepwalk without attaching the package (i.e., NOT specifying library(sleepwalk)) like this works fine:

sleepwalk::sleepwalk(se[email protected][email protected], t([email protected][[email protected],]))

But the moment you select cells with your mouse, it crashes (the browser tab closes) and R gives this error:

Error in slw_on_selection(selPoints, 1) : could not find function "slw_on_selection"

Loading the package using library(sleepwalk) solves the issue, but it'd be nice if it weren't necessary.

opened by FelixTheStudent 0
doc for comparison

The example on the web page for comparing two embeddings still uses the old version, where both distances are used concurrently. We also need to change the explanation below it to say that the same cell always has the same colour in all embeddings.

opened by simon-anders 0
Suggestion: Link embeddings from transposed table

Let's say I have a matrix with individuals (e.g. cells) as rows and features as columns, and I run a UMAP on both the original matrix and its transpose. It would then be natural to look at the UMAP of individuals with the default usage (the distances to other individuals), but it would also be interesting to see the features for that individual (and vice versa).

Is it clear what I mean?

opened by StaffanBetner 2

Releases(v0.3.2)

v0.3.2(Sep 17, 2021)
As of v0.5.0, jrc uses the setLimits function for all security restrictions. This update fixes the dependency problem caused by that change.

v0.3.1(Sep 30, 2020)
Fixed a broken path to the start page that was caused by a jrc update.

v.0.3.0(Feb 27, 2020)
The new argument metric allows using angular distance (metric = "cosine") as an alternative to the default Euclidean distance (metric = "euclid").
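
To illustrate how the two metrics can differ, here is a minimal base R sketch on a toy feature matrix. It does not reproduce the package's internal code: the release note does not state the exact formula behind metric = "cosine", so the sketch assumes one common definition (1 minus cosine similarity). The function names euclid_dist and cosine_dist are hypothetical.

```r
# Toy feature matrix: 3 points (rows) x 2 features (columns).
features <- rbind(a = c(1, 0), b = c(2, 0), c = c(0, 1))

# Euclidean distance from row `i` to every row of `m`.
euclid_dist <- function(m, i) {
  sqrt(rowSums(sweep(m, 2, m[i, ])^2))
}

# Cosine distance from row `i` to every row of `m`, assumed here to be
# 1 - cosine similarity; it ignores vector length and looks only at direction.
cosine_dist <- function(m, i) {
  norms <- sqrt(rowSums(m^2))
  1 - as.vector(m %*% m[i, ]) / (norms * norms[i])
}

euclid_dist(features, "a")  # b is 1 away from a, c is sqrt(2) away
cosine_dist(features, "a")  # b points the same way as a (0), c is orthogonal (1)
```

Points a and b differ only in magnitude, so they are separated under the Euclidean metric but coincide under the cosine one. This is why a direction-based metric can be preferable when overall magnitude is not informative.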

If compare = "distances", it is no longer required to provide several embeddings. If only one embedding is given, it will be used for all the distances.

v0.2.1(Oct 2, 2019)
Changes due to an update of the jrc package.

Indices of selected points are no longer stored in a variable and can be accessed only via the callback function. Thus, no changes are made to the global environment unless the user makes them explicitly.

Added the possibility to pass arguments to jrc::openPage (such as the port number or the browser in which to open the app).
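
The two changes above can be sketched together as follows. This is an illustration under assumptions, not the package's documented example: the toy data, the callback name remember_selection, port 8080 and the "firefox" browser are all made up, and the callback is assumed to receive the selected indices and the embedding index, in line with the slw_on_selection(selPoints, 1) call quoted in the issue above.

```r
# Toy data: 200 points with 5 features, and a 2-D "embedding" taken from PCA.
set.seed(1)
features <- matrix(rnorm(200 * 5), ncol = 5)
embedding <- prcomp(features)$x[, 1:2]

# Hypothetical callback: keep the latest selection in a private environment
# instead of writing to the global environment.
selection <- new.env()
remember_selection <- function(points, embedding_index) {
  selection$points <- points
  selection$embedding_index <- embedding_index
}

# Only launch the app in an interactive session with sleepwalk installed.
if (interactive() && requireNamespace("sleepwalk", quietly = TRUE)) {
  library(sleepwalk)
  sleepwalk(embedding, features,
            on_selection = remember_selection,
            port = 8080,          # forwarded to jrc::openPage; assumed to be free
            browser = "firefox")  # forwarded to jrc::openPage; assumed installed
}
```

After selecting points in the app, selection$points would hold their indices.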

v0.2.0(Sep 27, 2019)
HTML Canvas is now used to plot the embedding. This makes Sleepwalk faster and allows more points to be displayed simultaneously.

A new parameter, mode = c("canvas", "svg"), allows the user to go back to the old SVG-based version of the Sleepwalk app.

Bug in slw_snapshot is fixed. The function no longer returns a list of identical plots, when used with several different embeddings.


Owner

S. Anders's research group at ZMBH

GitHub Repository https://anders-biostat.github.io/sleepwalk/

