Dimensionality reduction in very large datasets using Siamese Networks

Overview


ivis

Implementation of the ivis algorithm as described in the paper Structure-preserving visualisation of high dimensional single-cell datasets. Ivis is designed to reduce dimensionality of very large datasets using a siamese neural network trained on triplets. Both unsupervised and supervised modes are supported.
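
To make the idea of training on triplets concrete, the sketch below computes a standard margin-based triplet loss for one (anchor, positive, negative) triplet in plain Python. The function names, the Euclidean distance, and the margin value are assumptions chosen for illustration; the loss variants that ivis actually ships may differ.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_loss(anchor, positive, negative, margin=1.0):
    """Standard margin-based triplet loss for a single triplet.

    The loss is zero once the negative is at least `margin` farther
    from the anchor than the positive is.
    """
    d_pos = euclidean(anchor, positive)
    d_neg = euclidean(anchor, negative)
    return max(d_pos - d_neg + margin, 0.0)


# Embeddings of an anchor, one of its nearest neighbours, and a distant point.
anchor = [0.0, 0.0]
positive = [0.0, 1.0]   # distance 1 from the anchor
negative = [3.0, 4.0]   # distance 5 from the anchor

well_separated = triplet_loss(anchor, positive, negative)   # max(1 - 5 + 1, 0)
violating = triplet_loss(anchor, negative, positive)        # max(5 - 1 + 1, 0)
print(well_separated, violating)
```

A network trained to minimise this quantity over many triplets is pushed to place neighbours close together and non-neighbours far apart in the embedding space.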

ivis 10M data points

Installation

Ivis runs on top of TensorFlow. To install the latest ivis release from PyPI with the CPU TensorFlow package, run:

# TensorFlow 2 packages require a pip version >19.0.
pip install --upgrade pip
pip install ivis[cpu]

If you have CUDA installed and want ivis to use the tensorflow-gpu package, run:

pip install ivis[gpu]

The development version can be installed directly from GitHub:

git clone https://github.com/beringresearch/ivis
cd ivis
pip install -e '.[cpu]'

The following optional dependencies are needed if using the visualization callbacks while training the Ivis model:

  • matplotlib
  • seaborn

Upgrading

The ivis Python package is updated frequently! To upgrade, run:

pip install ivis --upgrade

Features

  • Scalable: ivis is fast and easily extends to millions of observations and thousands of features.
  • Versatile: numpy arrays, sparse matrices, and hdf5 files are supported out of the box. Additionally, both categorical and continuous features are handled well, making it easy to apply ivis to heterogeneous problems including clustering and anomaly detection.
  • Accurate: ivis excels at preserving both local and global features of a dataset. Often, ivis performs better at preserving global structure of the data than t-SNE, making it easy to visualise and interpret high-dimensional datasets.
  • Generalisable: ivis supports addition of new data points to original embeddings via a transform method, making it easy to incorporate ivis into standard sklearn Pipelines.

And many more! See ivis readme for latest additions and examples.

Examples

from ivis import Ivis
from sklearn.preprocessing import MinMaxScaler
from sklearn import datasets

iris = datasets.load_iris()
X = iris.data
X_scaled = MinMaxScaler().fit_transform(X)

model = Ivis(embedding_dims=2, k=15)

embeddings = model.fit_transform(X_scaled)

Copyright 2021 Bering Limited

Comments
  • Bug with index.build(ntrees)


    Hello,

    I'm trying to run the ivis examples (both the simple iris one and the mnist one), and I keep getting this error whenever model fitting is called (running this on Debian). Any thoughts?

    In [7]: embeddings = ivis.fit_transform(mnist.data)
    
    Error truncating file: Invalid argument
    ---------------------------------------------------------------------------
    Exception                                 Traceback (most recent call last)
    <ipython-input-7-d5f1692c2b85> in <module>
    ----> 1 embeddings = ivis.fit_transform(mnist.data)
    
    /opt/conda/envs/ivisumap/lib/python3.7/site-packages/ivis/ivis.py in fit_transform(self, X, Y, shuffle_mode)
        289         """
        290
    --> 291         self.fit(X, Y, shuffle_mode)
        292         return self.transform(X)
        293
    
    /opt/conda/envs/ivisumap/lib/python3.7/site-packages/ivis/ivis.py in fit(self, X, Y, shuffle_mode)
        269         """
        270
    --> 271         self._fit(X, Y, shuffle_mode)
        272         return self
        273
    
    /opt/conda/envs/ivisumap/lib/python3.7/site-packages/ivis/ivis.py in _fit(self, X, Y, shuffle_mode)
        146                 print('Building KNN index')
        147             build_annoy_index(X, self.annoy_index_path,
    --> 148                               ntrees=self.ntrees, verbose=self.verbose)
        149
        150         datagen = generator_from_index(X, Y,
    
    /opt/conda/envs/ivisumap/lib/python3.7/site-packages/ivis/data/knn.py in build_annoy_index(X, path, ntrees, verbose)
         28
         29     # Build n trees
    ---> 30     index.build(ntrees)
         31     if platform.system() == 'Windows':
         32         index.save(path)
    
    Exception: Invalid argument
    
    opened by sadatnfs 15
  • Windows compatibility?


    Really excited to compare Ivis to UMAP on a project I am currently working on.

    The server I have access to is a Windows 10 machine, with a Python 3.7 Anaconda environment.

    Following the install instructions and trying to run the MNIST example, I am seeing the following error: TypeError: can't pickle annoy.Annoy objects

    enhancement help wanted 
    opened by paul-harambee 13
  • Ivis seems to provoke errors when composing a sklearn.pipeline.Pipeline passed to sklearn.model_selection.GridSearchCV and executed in parallel


    The problem

    I noticed that when Ivis is a step in a sklearn.pipeline.Pipeline that is passed to sklearn.model_selection.GridSearchCV to fine-tune hyper-parameters across all estimators/transformers, and GridSearchCV has n_jobs=-1 (i.e., when executions within GridSearchCV are parallel), errors are thrown. This does not happen when n_jobs=1 (i.e., when the executions within GridSearchCV are sequential).

    Since Pipeline globally regulates the n_jobs parameter, thus not supporting the parallelization of only specific steps, this problem forces the global use of n_jobs=1, which noticeably slows down the fine-tuning process by underusing the computational power of the setup in which the script is being executed (even in parts where n_jobs=-1 would work).

    Environment

    A virtual environment was created specifically for this repository, wherein all modules described in requirements.txt were installed. My setup runs an up-to-date version of Windows 10 (no WSL).

    Runtime

    python=3.8.4
    

    Relevant modules

    ivis=2.0.3
    tensorflow=2.5.0
    

    Minimal reproducible example

    Code

    if __name__ == "__main__":
        import tempfile
        import ivis
    
        from sklearn import datasets, ensemble, model_selection, pipeline, preprocessing
        from os import environ
    
        environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
    
        X, y = datasets.load_iris(return_X_y=True)
    
        pipeline_with_ivis = pipeline.Pipeline([
            ("normalize", preprocessing.MinMaxScaler()),
            ("project", ivis.Ivis()),
            ("classify", ensemble.RandomForestClassifier()),
        ], memory=tempfile.mkdtemp())
    
        parameter_grid = {
            "project__k": (15,),
            "project__verbose": (True,),
    
            "classify__random_state": (2021,)
        }
    
        grid_search = model_selection.GridSearchCV(pipeline_with_ivis, parameter_grid, scoring="accuracy", cv=10, n_jobs=-1,
                                                   return_train_score=True, verbose=3).fit(X, y)
    

    Error

    <REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py:615: FitFailedWarning: Estimator fit failed. The score on this train-test partition for these parameters will be set to nan. Details: 
    Traceback (most recent call last):
      File "<REPOSITORY_ROOT>\ivis\data\neighbour_retrieval\knn.py", line 212, in extract_knn
        process.start()
      File "C:\Python38\lib\multiprocessing\process.py", line 121, in start
        self._popen = self._Popen(self)
      File "C:\Python38\lib\multiprocessing\context.py", line 224, in _Popen
        return _default_context.get_context().Process._Popen(process_obj)
      File "<REPOSITORY_ROOT>\venv\lib\site-packages\joblib\externals\loky\backend\process.py", line 39, in _Popen
        return Popen(process_obj)
      File "<REPOSITORY_ROOT>\venv\lib\site-packages\joblib\externals\loky\backend\popen_loky_win32.py", line 70, in __init__
        child_env.update(process_obj.env)
    AttributeError: 'KnnWorker' object has no attribute 'env'
    
    During handling of the above exception, another exception occurred:
    Traceback (most recent call last):
      File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_validation.py", line 598, in _fit_and_score
        estimator.fit(X_train, y_train, **fit_params)
      File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 341, in fit
        Xt = self._fit(X, y, **fit_params_steps)
      File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 303, in _fit
        X, fitted_transformer = fit_transform_one_cached(
      File "<REPOSITORY_ROOT>\venv\lib\site-packages\joblib\memory.py", line 591, in __call__
        return self._cached_call(args, kwargs)[0]
      File "<REPOSITORY_ROOT>\venv\lib\site-packages\joblib\memory.py", line 534, in _cached_call
        out, metadata = self.call(*args, **kwargs)
      File "<REPOSITORY_ROOT>\venv\lib\site-packages\joblib\memory.py", line 761, in call
        output = self.func(*args, **kwargs)
      File "<REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\pipeline.py", line 754, in _fit_transform_one
        res = transformer.fit_transform(X, y, **fit_params)
      File "<REPOSITORY_ROOT>\ivis\ivis.py", line 350, in fit_transform
        self.fit(X, Y, shuffle_mode)
      File "<REPOSITORY_ROOT>\ivis\ivis.py", line 328, in fit
        self._fit(X, Y, shuffle_mode)
      File "<REPOSITORY_ROOT>\ivis\ivis.py", line 190, in _fit
        self.neighbour_matrix = AnnoyKnnMatrix.build(X, path=self.annoy_index_path,
      File "<REPOSITORY_ROOT>\ivis\data\neighbour_retrieval\knn.py", line 63, in build
        return cls(index, X.shape, path, k, search_k, precompute, include_distances, verbose)
      File "<REPOSITORY_ROOT>\ivis\data\neighbour_retrieval\knn.py", line 48, in __init__
        self.precomputed_neighbours = self.get_neighbour_indices()
      File "<REPOSITORY_ROOT>\ivis\data\neighbour_retrieval\knn.py", line 96, in get_neighbour_indices
        return extract_knn(
      File "<REPOSITORY_ROOT>\ivis\data\neighbour_retrieval\knn.py", line 236, in extract_knn
        process.terminate()
      File "C:\Python38\lib\multiprocessing\process.py", line 133, in terminate
        self._popen.terminate()
    AttributeError: 'NoneType' object has no attribute 'terminate'
      warnings.warn("Estimator fit failed. The score on this train-test"
    
    [...]
    
    <REPOSITORY_ROOT>\venv\lib\site-packages\sklearn\model_selection\_search.py:922: UserWarning: One or more of the test scores are non-finite: [nan]
      warnings.warn(
    

    Discussion

    By coding and playing with the example above, I came to understand that, since sklearn uses joblib and ivis uses multiprocessing, these modules might not be playing well with each other for some reason.

    I would rule out the idea that nested estimators/transformers with parallel routines are the problem: estimators like sklearn.ensemble.RandomForestClassifier can be set to have n_jobs=-1 without problem within the Pipeline passed to GridSearchCV.

    I am particularly affected by this issue because I want to employ ivis in projects that involve hyper-parameter fine-tuning using cross-validation via GridSearchCV with concurrent executions. I attempted to diagnose the problem, but to no avail, which is why I bring this issue to your attention.

    Observation: another part of this problem is a design choice that does not adhere to the sklearn API guidelines; I propose and detail a solution in #95. That design choice does not cause the aforementioned error, but might cause other errors that could affect the same use scenario (Pipeline in GridSearchCV running in parallel).

    opened by imatheussm 10
  • attempt to apply non-function


    I want to install ivis in R, but I get the error shown in the title. My computer runs Windows, so I installed conda before running the code. Can anyone help me solve this problem? Thank you!

    library(reticulate)
    devtools::install_github("beringresearch/ivis/R-package")
    library(ivis)
    model <- ivis(k = 3)
    Error in ivis_object$Ivis(embedding_dims = embedding_dims, k = k, distance = distance, : attempt to apply non-function

    opened by Feifei0511 9
  • Issue installing and running ivis R package in RStudio


    Hello,

    For JOSS review.

    The installation instructions fail when run in the RStudio environment:

    > devtools::install_github("beringresearch/ivis/R-package", force=TRUE)
    Downloading GitHub repo beringresearch/[email protected]
    ✔  checking for file ‘/private/var/folders/cp/8rn2cs_x79zcbp_yb75ychg80000gq/T/Rtmpud6pnU/remotesbe4d59017fdb/beringresearch-ivis-bbccdb7/R-package/DESCRIPTION’ ...
    ─  preparing ‘ivis’:
    ✔  checking DESCRIPTION meta-information ...
    ─  checking for LF line-endings in source and make files and shell scripts
    ─  checking for empty or unneeded directories
    ─  building ‘ivis_1.1.3.tar.gz’
       
    * installing *source* package ‘ivis’ ...
    ** using staged installation
    ** R
    ** byte-compile and prepare package for lazy loading
    ** help
    *** installing help indices
    ** building package indices
    ** testing if installed package can be loaded from temporary location
    Error: package or namespace load failed for ‘ivis’:
     .onLoad failed in loadNamespace() for 'ivis', details:
      call: path.expand(path)
      error: invalid 'path' argument
    Error: loading failed
    Execution halted
    ERROR: loading failed
    * removing ‘/Library/Frameworks/R.framework/Versions/3.6/Resources/library/ivis’
    * restoring previous ‘/Library/Frameworks/R.framework/Versions/3.6/Resources/library/ivis’
    Error: Failed to install 'ivis' from GitHub:
      (converted from warning) installation of package ‘/var/folders/cp/8rn2cs_x79zcbp_yb75ychg80000gq/T//Rtmpud6pnU/filebe4d71713083/ivis_1.1.3.tar.gz’ had non-zero exit status
    

    However, it does work fine when run in the console (Darwin Kernel Version 18.6.0: Thu Apr 25 23:16:27 PDT 2019; root:xnu-4903.261.4~2/RELEASE_X86_64 x86_64):

    > devtools::install_github("beringresearch/ivis/R-package", force=TRUE)
    Downloading GitHub repo beringresearch/[email protected]
       checking for file ‘/private/var/folders/cp/8rn2cs_x79zcbp_yb75ychg80000gq/T/Rtmpvj2CT3/remotesc3827327cfb8/beringresearch-ivis-bbccdb7/R-package/DESCRIPTION’✔  checking for file ‘/private/var/folders/cp/8rn2cs_x79zcbp_yb75ychg80000gq/T/Rtmpvj2CT3/remotesc3827327cfb8/beringresearch-ivis-bbccdb7/R-package/DESCRIPTION’
    ─  preparing ‘ivis’:
    ✔  checking DESCRIPTION meta-information ...
    ─  checking for LF line-endings in source and make files and shell scripts
    ─  checking for empty or unneeded directories
    ─  building ‘ivis_1.1.3.tar.gz’
       
    * installing *source* package ‘ivis’ ...
    ** using staged installation
    ** R
    ** byte-compile and prepare package for lazy loading
    ** help
    *** installing help indices
    ** building package indices
    ** testing if installed package can be loaded from temporary location
    ** testing if installed package can be loaded from final location
    ** testing if installed package keeps a record of temporary installation path
    * DONE (ivis)
    

    Moreover, the ivis package (installed from the terminal) can be loaded from an R console in a terminal, but throws the following error when loaded in RStudio

    > library(ivis)
    Error: package or namespace load failed for ‘ivis’:
     .onLoad failed in loadNamespace() for 'ivis', details:
      call: path.expand(path)
      error: invalid 'path' argument
    

    This is most likely due to conda not being on the PATH in RStudio:

    # RStudio
    > system("echo $PATH")
    /usr/bin:/bin:/usr/sbin:/sbin:/usr/local/bin:/usr/local/ncbi/igblast/bin:/Library/TeX/texbin:/opt/X11/bin:/opt/local/bin
    # Console
    > system("echo $PATH")
    /Users/kevin/miniconda3/bin:/Users/kevin/miniconda3/condabin:/usr/local/opt/[email protected]/bin:/Users/kevin/bin:/usr/local/bin:/usr/bin:/bin:/usr/sbin:/sbin:/usr/local/ncbi/igblast/bin:/Library/TeX/texbin:/opt/X11/bin
    

    Is there a recommended way to set up an environment to run ivis in RStudio, or are users only expected to run it from a terminal R console?

    Thanks!

    opened by kevinrue 6
  • Enable registration or passing of a custom triplet loss function


    In Python, Ivis.__init__ accepts a distance: str keyword argument, which sets from a dictionary a predefined triplet loss function for that distance metric. Currently, one of the ways to provide a custom distance function is to monkeypatch the ivis.nn.losses.get_loss_functions. Other ways to accomplish the same are even messier from the perspectives of usage and implementation.

    The nature of dimensionality reduction, especially when dealing with one-hot-encoded categorical features, sometimes requires custom ways to calculate loss. Under the hood, ivis has the ability to enable custom loss functions, but any such offerings need to be implemented in a clean and API-idiomatic manner.


    A custom distance function requires its own triplet loss implementation. Ivis.__init__ could support an additional keyword argument (e.g. triplet_loss: Callable[..., ...] = ...) for users to be able to pass their own.

    Alternatively, it could simply be passed inside the existing distance kwarg, with its signature changing to distance: Union[str, Callable[..., ...]].

    Another way would be to make the losses dictionary built by ivis.nn.losses.get_loss_functions a module-level loss function registrar.

    Additionally, docs and examples need to be updated on how to correctly implement a custom loss function. With all currently available distance metrics, the triplet loss implementation follows a very similar pattern, and should not be too daunting to attempt to implement.
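
    The options above can be sketched in plain Python. The registry below is hypothetical: register_loss, resolve_loss, and the toy loss are illustrative names, not part of the ivis API, and real ivis losses operate on tensors rather than floats.

```python
from typing import Callable, Dict, Union

# Hypothetical module-level registrar mapping a distance name to a loss.
LossFn = Callable[[float, float], float]
_LOSSES: Dict[str, LossFn] = {}


def register_loss(name: str) -> Callable[[LossFn], LossFn]:
    """Decorator that stores a loss function under `name`."""
    def decorator(fn: LossFn) -> LossFn:
        _LOSSES[name] = fn
        return fn
    return decorator


def resolve_loss(distance: Union[str, LossFn]) -> LossFn:
    """Accept either a registered name or a user-supplied callable."""
    if callable(distance):
        return distance
    try:
        return _LOSSES[distance]
    except KeyError:
        raise ValueError(f"Unknown distance: {distance!r}") from None


@register_loss("margin")
def margin_loss(d_pos: float, d_neg: float) -> float:
    """Toy triplet loss on precomputed anchor-positive/anchor-negative distances."""
    return max(d_pos - d_neg + 1.0, 0.0)


by_name = resolve_loss("margin")
custom = resolve_loss(lambda d_pos, d_neg: max(d_pos - d_neg, 0.0))
print(by_name(2.0, 1.0), custom(2.0, 1.0))
```

    A single resolver of this kind would cover both proposals: existing string names keep working, while a callable passed through the same argument is used as-is.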

    opened by mihajenko 5
  • Add a vignette to the R package


    Hello,

    For JOSS review.

    Is your feature request related to a problem? Please describe.

    The R package lacks documentation of an application to a real-life dataset.

    Describe the solution you'd like

    Please add a vignette in the R package demonstrating at least an example application to a single-cell dataset. Basically, the equivalent of the scanpy workflow here.

    A convenient way to use the pbmc3k dataset for demonstration purposes is the Bioconductor TENxPBMCData package.

    Suggested code:

    library(TENxPBMCData)
    tenx_pbmc3k <- TENxPBMCData(dataset = "pbmc3k")
    

    Ideally, consider using the vignette (or a separate one) to also give an introduction to the functionality of the R package. It is not necessary to duplicate information already described in the documentation of the Python package (DRY principle); you may simply include a link to the main page.

    Describe alternatives you've considered

    A working example of an R workflow could also be included in the documentation of the Python package, although this is probably unnecessarily difficult to maintain. Ideally, that example would be run and tested for every new release of the Python and R source code.

    Additional context

    Once you have an R vignette written, you should also consider using pkgdown to automatically create a GitHub website including the full package documentation.

    opened by kevinrue 5
  • Extremely slow extraction of KNN neighbours on 100k samples


    I'm using ivis[cpu] on a dataset of about 100k samples with around 200k sparse features. My training dataset is stored in an h5 file and I use the following code to fit and transform the dataset:

    import h5py
    import pandas as pd
    from ivis import Ivis
    
    # `filename` and `meta_df` are defined elsewhere
    with h5py.File(filename, 'r') as f:
        X = f['data']
        Y = pd.Categorical(meta_df["label"]).codes
        model = Ivis(epochs=5, k=15)
        model.fit(X, Y, shuffle_mode='batch')  # Shuffle batches when using h5 files
    
        embeddings = model.transform(X)
    

    However, it takes so long:

    Building KNN index
    100%|██████████| 105942/105942 [55:07<00:00, 32.03it/s]
    Extracting KNN neighbours
      0%|          | 262/105942 [7:16:38<2935:20:19, 99.99s/it]
    

    2935 hours!! Am I missing something, or is this expected? Should I switch to GPU?

    By the way, I'm using a Google Colab system with 8 CPU cores, 50 GB RAM, and an SSD disk.
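
    As a sanity check on the progress bar, the estimate follows directly from the reported rate. The short calculation below uses only the numbers in the log above (105942 rows, 262 done, 99.99 s per row); the variable names are illustrative.

```python
total_rows = 105_942        # rows in the dataset, from the progress bar
rows_done = 262             # rows processed after 7:16:38
seconds_per_row = 99.99     # rate reported by the progress bar

remaining_seconds = (total_rows - rows_done) * seconds_per_row
remaining_hours = remaining_seconds / 3600
print(f"{remaining_hours:.0f} hours, about {remaining_hours / 24:.0f} days")
```

    At that rate the remaining rows would take roughly 2935 hours, or about 122 days, so the figure is a faithful extrapolation of the current speed rather than a display glitch.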

    opened by adavoudi 4
  • How to get stable results?


    Hello Folks,

    thank you for all the work on this lib. I have a question about reproducibility: Is there a way to set a random seed or random state and get stable results?

    I'm trying to achieve this with:

    import random
    import numpy
    random.seed(42)
    numpy.random.seed(42)
    

    I'm aware that these are not thread-safe, so this may be why the results are not reproducible. Anyway, is there any way to enforce this?
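
    A common starting point is to seed every random number generator in one place. The helper below is a sketch, not an ivis feature: seed_everything is a hypothetical name, and NumPy and TensorFlow are only seeded if those packages are importable. It does not by itself guarantee identical embeddings, since multi-threaded execution and other sources of randomness (for example, construction of the nearest-neighbour index) may still introduce variation.

```python
import os
import random


def seed_everything(seed: int) -> None:
    """Seed the RNGs that are available in this environment."""
    # Only affects hash randomisation in Python processes started afterwards.
    os.environ["PYTHONHASHSEED"] = str(seed)
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import tensorflow as tf
        tf.random.set_seed(seed)
    except ImportError:
        pass  # TensorFlow not installed; nothing to seed


seed_everything(42)
first = [random.random() for _ in range(3)]
seed_everything(42)
second = [random.random() for _ in range(3)]
print(first == second)
```

    Calling the helper once, before the model is constructed, keeps all seeding in a single place and makes it easy to vary the seed between runs.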

    opened by rsarai 4
  • model_save: optimizer is not compatible with pickle


    When attempting to use save_model after fitting a supervised Ivis instance, I get an error. It looks like some part of the optimizer cannot be pickled.

    Replicate:

    import ivis
    i = ivis.Ivis(embedding_dims=10, n_epochs_without_progress=5)
    i.fit(X, y)
    i.save_model("model.ivis")
    
    Traceback (most recent call last):
      File "src/ivis_persist.py", line 69, in <module>
        ivises[output].save_model(f"models/{output}.ivis")
      File "/Users/pbaumgartner/anaconda3/envs/env/lib/python3.7/site-packages/ivis/ivis.py", line 404, in save_model
        pkl.dump(self.model_.optimizer, f)
    AttributeError: Can't pickle local object 'make_gradient_clipnorm_fn.<locals>.<lambda>'
    

    System Info: Running ivis==2.0.0 on macOS with python 3.7.
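
    The error message points at a lambda defined inside another function, which the standard pickle module cannot serialise because it stores functions by their importable name. The sketch below reproduces that behaviour with stand-ins; make_scaler and picklable_scaler are hypothetical and unrelated to the TensorFlow internals named in the traceback.

```python
import functools
import operator
import pickle


def make_scaler(factor):
    """Return a lambda defined locally, like the one in the error message."""
    return lambda value: value * factor


try:
    pickle.dumps(make_scaler(2.0))
    local_lambda_pickles = True
except (pickle.PicklingError, AttributeError):
    # A local lambda has no importable name, so pickling it fails.
    local_lambda_pickles = False

# The same behaviour built from importable pieces can be pickled.
picklable_scaler = functools.partial(operator.mul, 2.0)
restored = pickle.loads(pickle.dumps(picklable_scaler))
print(local_lambda_pickles, restored(3.0))
```

    This is why the failure appears only at save time: the lambda works during training and is rejected only when the optimizer object is handed to pickle.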

    bug 
    opened by pmbaumgartner 4
  • R pkg fit() call finishes but subprocess doesn't terminate


    This model consistently feels like a magic trick, thanks for contributing!

    Bug

    I'm running the ivis R package (v1.7.1) (more system details below). I can get model$fit() and model$transform() working just fine and producing substantive results. However, when the R process finishes and returns the fitted model, I'm seeing continued sky-high system usage. The R process calling ivis is definitely completed and back to a command prompt, but in htop I can see the RStudio GUI process (parent of the rsession process) occupying at least 2 full cores. Some process further down is not stopping when the R process gets the returned value. (Restarting the R session does kill it.)

    I don't understand enough of the ivis-through-reticulate toolchain to provide more helpful diagnostics in this first report, but happy to run experiments and document further.

    Environment

    • ivis R package(v1.7.1), installed from Github (beringresearch/[email protected]) 14 Apr 2020
    • reticulate (v1.15), 2020-04-02 CRAN (R 3.6.2)
    • R 3.6.2 on MacOS 10.14.6 (18G4032)
    platform       x86_64-apple-darwin15.6.0   
    arch           x86_64                      
    os             darwin15.6.0                
    system         x86_64, darwin15.6.0        
    status                                     
    major          3                           
    minor          6.2                         
    year           2019                        
    month          12                          
    day            12                          
    svn rev        77560                       
    language       R                           
    version.string R version 3.6.2 (2019-12-12)
    nickname       Dark and Stormy Night  
    
    opened by sheffe 4
  • InternalError: Graph execution error:


    Hello, I want to use ivis to do the analysis for my scRNA-seq data.

    Here is my code:

    def getReduction(X):
        #X = PCA(n_components=4, copy=True, random_state=1).fit_transform(X)
        from ivis import Ivis
        model = Ivis(embedding_dims=4, k=15)
        X = model.fit_transform(X)
        print(X.shape)
        return X
    

    but I got some errors:

    ---------------------------------------------------------------------------
    InternalError                             Traceback (most recent call last)
    Input In [9], in <cell line: 1>()
    ----> 1 multi_train_x = getReduction(train_x)
    
    Input In [8], in getReduction(X)
          3 from ivis import Ivis
          4 model = Ivis(embedding_dims=6, k=15)
    ----> 5 X = model.fit_transform(X)
          6 print(X.shape)
          7 return X
    
    File /opt/conda/lib/python3.8/site-packages/ivis/ivis.py:368, in Ivis.fit_transform(self, X, Y, shuffle_mode)
        349 def fit_transform(self, X, Y=None, shuffle_mode=True):
        350     """Fit to data then transform
        351 
        352     Parameters
       (...)
        365         Embedding of the data in low-dimensional space.
        366     """
    --> 368     self.fit(X, Y, shuffle_mode)
        369     return self.transform(X)
    
    File /opt/conda/lib/python3.8/site-packages/ivis/ivis.py:346, in Ivis.fit(self, X, Y, shuffle_mode)
        328 def fit(self, X, Y=None, shuffle_mode=True):
        329     """Fit an ivis model.
        330 
        331     Parameters
       (...)
        343         Returns estimator instance.
        344     """
    --> 346     self._fit(X, Y, shuffle_mode)
        347     return self
    
    File /opt/conda/lib/python3.8/site-packages/ivis/ivis.py:318, in Ivis._fit(self, X, Y, shuffle_mode)
        315 if self.verbose > 0:
        316     print('Training neural network')
    --> 318 hist = self.model_.fit(
        319     datagen,
        320     epochs=self.epochs,
        321     callbacks=self.callbacks_ + [EarlyStopping(monitor='loss',
        322                                                patience=self.n_epochs_without_progress)],
        323     shuffle=shuffle_mode,
        324     steps_per_epoch=int(np.ceil(X.shape[0] / self.batch_size)),
        325     verbose=self.verbose)
        326 self.loss_history_ += hist.history['loss']
    
    File /opt/conda/lib/python3.8/site-packages/keras/utils/traceback_utils.py:67, in filter_traceback.<locals>.error_handler(*args, **kwargs)
         65 except Exception as e:  # pylint: disable=broad-except
         66   filtered_tb = _process_traceback_frames(e.__traceback__)
    ---> 67   raise e.with_traceback(filtered_tb) from None
         68 finally:
         69   del filtered_tb
    
    File /opt/conda/lib/python3.8/site-packages/tensorflow/python/eager/execute.py:54, in quick_execute(op_name, num_outputs, inputs, attrs, ctx, name)
         52 try:
         53   ctx.ensure_initialized()
    ---> 54   tensors = pywrap_tfe.TFE_Py_Execute(ctx._handle, device_name, op_name,
         55                                       inputs, attrs, num_outputs)
         56 except core._NotOkStatusException as e:
         57   if name is not None:
    
    InternalError: Graph execution error:
    
    Detected at node 'model_1/model/dense/MatMul' defined at (most recent call last):
        File "/opt/conda/lib/python3.8/runpy.py", line 194, in _run_module_as_main
          return _run_code(code, main_globals, None,
        File "/opt/conda/lib/python3.8/runpy.py", line 87, in _run_code
          exec(code, run_globals)
        File "/opt/conda/lib/python3.8/site-packages/ipykernel_launcher.py", line 17, in <module>
          app.launch_new_instance()
        File "/opt/conda/lib/python3.8/site-packages/traitlets/config/application.py", line 846, in launch_instance
          app.start()
        File "/opt/conda/lib/python3.8/site-packages/ipykernel/kernelapp.py", line 712, in start
          self.io_loop.start()
        File "/opt/conda/lib/python3.8/site-packages/tornado/platform/asyncio.py", line 199, in start
          self.asyncio_loop.run_forever()
        File "/opt/conda/lib/python3.8/asyncio/base_events.py", line 570, in run_forever
          self._run_once()
        File "/opt/conda/lib/python3.8/asyncio/base_events.py", line 1859, in _run_once
          handle._run()
        File "/opt/conda/lib/python3.8/asyncio/events.py", line 81, in _run
          self._context.run(self._callback, *self._args)
        File "/opt/conda/lib/python3.8/site-packages/ipykernel/kernelbase.py", line 504, in dispatch_queue
          await self.process_one()
        File "/opt/conda/lib/python3.8/site-packages/ipykernel/kernelbase.py", line 493, in process_one
          await dispatch(*args)
        File "/opt/conda/lib/python3.8/site-packages/ipykernel/kernelbase.py", line 400, in dispatch_shell
          await result
        File "/opt/conda/lib/python3.8/site-packages/ipykernel/kernelbase.py", line 724, in execute_request
          reply_content = await reply_content
        File "/opt/conda/lib/python3.8/site-packages/ipykernel/ipkernel.py", line 383, in do_execute
          res = shell.run_cell(
        File "/opt/conda/lib/python3.8/site-packages/ipykernel/zmqshell.py", line 528, in run_cell
          return super().run_cell(*args, **kwargs)
        File "/opt/conda/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 2880, in run_cell
          result = self._run_cell(
        File "/opt/conda/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 2935, in _run_cell
          return runner(coro)
        File "/opt/conda/lib/python3.8/site-packages/IPython/core/async_helpers.py", line 129, in _pseudo_sync_runner
          coro.send(None)
        File "/opt/conda/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3134, in run_cell_async
          has_raised = await self.run_ast_nodes(code_ast.body, cell_name,
        File "/opt/conda/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3337, in run_ast_nodes
          if await self.run_code(code, result, async_=asy):
        File "/opt/conda/lib/python3.8/site-packages/IPython/core/interactiveshell.py", line 3397, in run_code
          exec(code_obj, self.user_global_ns, self.user_ns)
        File "/tmp/ipykernel_1917/2291785529.py", line 1, in <cell line: 1>
          multi_train_x = getReduction(train_x)
        File "/tmp/ipykernel_1917/2290316524.py", line 5, in getReduction
          X = model.fit_transform(X)
        File "/opt/conda/lib/python3.8/site-packages/ivis/ivis.py", line 368, in fit_transform
          self.fit(X, Y, shuffle_mode)
        File "/opt/conda/lib/python3.8/site-packages/ivis/ivis.py", line 346, in fit
          self._fit(X, Y, shuffle_mode)
        File "/opt/conda/lib/python3.8/site-packages/ivis/ivis.py", line 318, in _fit
          hist = self.model_.fit(
        File "/opt/conda/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
          return fn(*args, **kwargs)
        File "/opt/conda/lib/python3.8/site-packages/keras/engine/training.py", line 1409, in fit
          tmp_logs = self.train_function(iterator)
        File "/opt/conda/lib/python3.8/site-packages/keras/engine/training.py", line 1051, in train_function
          return step_function(self, iterator)
        File "/opt/conda/lib/python3.8/site-packages/keras/engine/training.py", line 1040, in step_function
          outputs = model.distribute_strategy.run(run_step, args=(data,))
        File "/opt/conda/lib/python3.8/site-packages/keras/engine/training.py", line 1030, in run_step
          outputs = model.train_step(data)
        File "/opt/conda/lib/python3.8/site-packages/keras/engine/training.py", line 889, in train_step
          y_pred = self(x, training=True)
        File "/opt/conda/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
          return fn(*args, **kwargs)
        File "/opt/conda/lib/python3.8/site-packages/keras/engine/training.py", line 490, in __call__
          return super().__call__(*args, **kwargs)
        File "/opt/conda/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
          return fn(*args, **kwargs)
        File "/opt/conda/lib/python3.8/site-packages/keras/engine/base_layer.py", line 1014, in __call__
          outputs = call_fn(inputs, *args, **kwargs)
        File "/opt/conda/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 92, in error_handler
          return fn(*args, **kwargs)
        File "/opt/conda/lib/python3.8/site-packages/keras/engine/functional.py", line 458, in call
          return self._run_internal_graph(
        File "/opt/conda/lib/python3.8/site-packages/keras/engine/functional.py", line 596, in _run_internal_graph
          outputs = node.layer(*args, **kwargs)
        File "/opt/conda/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
          return fn(*args, **kwargs)
        File "/opt/conda/lib/python3.8/site-packages/keras/engine/training.py", line 490, in __call__
          return super().__call__(*args, **kwargs)
        File "/opt/conda/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
          return fn(*args, **kwargs)
        File "/opt/conda/lib/python3.8/site-packages/keras/engine/base_layer.py", line 1014, in __call__
          outputs = call_fn(inputs, *args, **kwargs)
        File "/opt/conda/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 92, in error_handler
          return fn(*args, **kwargs)
        File "/opt/conda/lib/python3.8/site-packages/keras/engine/functional.py", line 458, in call
          return self._run_internal_graph(
        File "/opt/conda/lib/python3.8/site-packages/keras/engine/functional.py", line 596, in _run_internal_graph
          outputs = node.layer(*args, **kwargs)
        File "/opt/conda/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 64, in error_handler
          return fn(*args, **kwargs)
        File "/opt/conda/lib/python3.8/site-packages/keras/engine/base_layer.py", line 1014, in __call__
          outputs = call_fn(inputs, *args, **kwargs)
        File "/opt/conda/lib/python3.8/site-packages/keras/utils/traceback_utils.py", line 92, in error_handler
          return fn(*args, **kwargs)
        File "/opt/conda/lib/python3.8/site-packages/keras/layers/core/dense.py", line 221, in call
          outputs = tf.matmul(a=inputs, b=self.kernel)
    Node: 'model_1/model/dense/MatMul'
    Attempting to perform BLAS operation using StreamExecutor without BLAS support
    	 [[{{node model_1/model/dense/MatMul}}]] [Op:__inference_train_function_1703]
    
    

    Thanks !!!

    opened by bitcometz 3
  • Add conda-forge package

    In addition to the pypi package, please add a conda-forge package (https://conda-forge.org).

    I can give support if needed.

    You can easily create a boilerplate conda recipe with grayskull (starting from the pypi package): https://github.com/conda-incubator/grayskull (note: the "annoy" package is called "python-annoy" in conda-forge).

    opened by candalfigomoro 0
  • Distance-weighted random sampling of non-neighbor negatives

    Not a fully-baked feature request, just a directional hunch. I've found the conclusions from the paper "Sampling Matters in Deep Embedding Learning" pretty intuitive: (1) the method for choosing negative samples is critical to the overall embedding, maybe more than the specific loss function, and (2) a distance-weighted sampling of negatives had some nice properties during training and gave better results than uniform random sampling or oversampling hard cases.

    I'm brand-new to Annoy and not confident about the implementation details or performance changes here, but I suspect that the prebuilt index could be used for both positive and negative sampling. An example: the current approach draws random negatives in sequence and chooses the first index not in a neighbor list. A distance-weighted approach for choosing a negative for each triplet might work like this:

    • Draw a random set of candidate negatives
    • Drop any candidate negatives already in the neighbor list
    • Choose from the remaining set of candidates with probabilities proportional to f(dist(i, j)), where f(dist) could be just 1/dist, 1/sqrt(dist), etc.

    Annoy gives us the dist(i, j) without much of a performance hit. Weighted choice of the candidate negatives puts a (tunable) thumb on the scale for triplets that contain closer/harder-negative matches.

    This idea probably does increase some hyperparameter selection headaches. I think the impactful choices here are the size of the initial set of candidate negatives and (especially) f(dist).
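
The three steps above can be sketched as follows. This is a minimal NumPy illustration, not ivis code: `sample_negative`, `n_candidates`, and `weight` are hypothetical names, and distances are computed by brute force here, whereas a real implementation would presumably read them from the prebuilt Annoy index. The sketch assumes the selection probability is proportional to `weight(dist)`, with `weight = 1/dist` by default so that closer (harder) candidates are more likely to be picked.

```python
import numpy as np


def sample_negative(X, anchor, neighbors, n_candidates=32,
                    weight=lambda d: 1.0 / d, rng=None):
    """Pick one negative for `anchor`, favouring closer non-neighbours.

    Hypothetical helper: names and defaults are illustrative assumptions.
    """
    rng = np.random.default_rng() if rng is None else rng

    # Step 1: draw a random set of candidate negatives.
    size = min(n_candidates, len(X))
    candidates = rng.choice(len(X), size=size, replace=False)

    # Step 2: drop the anchor itself and anything in its neighbour list.
    excluded = np.append(np.asarray(neighbors, dtype=int), anchor)
    candidates = candidates[~np.isin(candidates, excluded)]
    if candidates.size == 0:
        raise ValueError("no valid negative among the drawn candidates")

    # Step 3: weighted choice, probability proportional to weight(dist).
    dists = np.linalg.norm(X[candidates] - X[anchor], axis=1)
    weights = weight(np.maximum(dists, 1e-12))  # guard against dist == 0
    probs = weights / weights.sum()
    return int(rng.choice(candidates, p=probs))
```

The two knobs mentioned above map directly onto `n_candidates` (size of the initial candidate set) and `weight` (the distance weighting); passing `weight=lambda d: np.ones_like(d)` falls back to uniform sampling over the surviving candidates.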

    opened by sheffe 2
  • Custom generator for training on out-of-memory datasets

    In https://bering-ivis.readthedocs.io/en/latest/oom_datasets.html, for out-of-memory datasets, you say to train on h5 files that exist on disk.

    In my case, I can't use h5 files, but I could use a custom generator which yields numpy array batched data.

    Is there a way to provide batched data through a custom generator function? Something like Keras' fit_generator.

    Thank you
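
One possible direction, sketched under assumptions: rather than a one-pass generator, wrap the batch source in an object that supports `len()` and integer indexing, so rows can be fetched on demand and revisited. `ChunkedRows` and `load_chunk` are hypothetical names, and whether ivis accepts such an object as input (and which extra attributes it expects) is not confirmed here and should be checked against the ivis documentation.

```python
from bisect import bisect_right
from collections.abc import Sequence

import numpy as np


class ChunkedRows(Sequence):
    """Row-indexable view over chunks that are loaded on demand.

    Hypothetical helper: `load_chunk(i)` must return the i-th chunk as a
    2-D numpy array with `chunk_sizes[i]` rows. Integer indexing only.
    """

    def __init__(self, chunk_sizes, load_chunk):
        self.load_chunk = load_chunk
        # offsets[i] is the global index of the first row in chunk i.
        self.offsets = np.concatenate([[0], np.cumsum(chunk_sizes)])
        self._cache = (None, None)  # (chunk id, chunk array)

    def __len__(self):
        return int(self.offsets[-1])

    def __getitem__(self, idx):
        if idx < 0:
            idx += len(self)
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        chunk_id = bisect_right(self.offsets, idx) - 1
        if self._cache[0] != chunk_id:
            # Keep only the most recently used chunk in memory.
            self._cache = (chunk_id, self.load_chunk(chunk_id))
        return self._cache[1][idx - self.offsets[chunk_id]]

    @property
    def shape(self):
        return (len(self),) + self[0].shape
```

Only one chunk is held in memory at a time, so random access across chunks reloads often; a larger cache or sorted access would be the obvious refinement.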

    opened by candalfigomoro 5
Releases(2.08)
  • 2.07(Mar 10, 2022)

    • Added ability to save/load ivis models that have not been trained. This also fixes an issue when using GridSearchCV in conjunction with ivis
    • Bugfix for triplet generator when used in conjunction with a dataset exposing the custom get_triplet_data method
  • 2.06(Oct 17, 2021)

    New features:

    • ivis models are now serializable via pickle/dill/joblib. Thanks to @imatheussm for his contributions toward this.
    • The save_model method now accepts an optional "save_format" argument. Setting it to "tfs" will export ivis models in the TensorFlow SavedModel format, which integrates well with other TensorFlow libraries.
  • 2.0.5rc1(Jun 4, 2021)

    • KNN retrieval made more efficient by switching from multi-processing to multi-threading. Memory savings depend on OS and core count.
    • Fixed issue where saved ivis models would attempt to load the index at the path they were saved with - this can't be relied on when the index is temporary and deleted after use.
    • Fixed issue where Annoy Index metric parameter was not passed to an index that was loaded from disk.
    • A few other things changed, including better error handling, cleaner code, and allowing for saving AnnoyKnnMatrix via pickle
  • 2.0.5(Jul 13, 2021)

    Highlights:

    • Improved training speed for numpy array inputs thanks to a faster triplet generator.
    • Batched retrieval capabilities that make ivis much faster when training on out-of-memory data that is retrieved in parallel.
    • Improved performance when using Ivis with the precompute=False option by using multi-threading when retrieving batches of KNN on demand.
    • Added deprecation notices for minor upcoming changes to API for consistency and adherence to sklearn API.
  • 2.0.3(May 26, 2021)

  • 2.02(Apr 15, 2021)

  • 2.0.1(Jan 6, 2021)

  • 2.0.0(Dec 8, 2020)

    Major ivis release!

    Version 2.0 features:

    • Unsupervised, semi-supervised, and fully supervised dimensionality reduction
    • Support for arbitrary datasets:
      • N-dimensional arrays
      • Image files on disk
      • Custom data connectors
    • In- and out-of-memory data ingestion
    • Resumable training
    • Arbitrary neural network backbones
    • Customizable neighbour retrieval
    • Callbacks and Tensorboard integration
  • 1.8.4(Nov 2, 2020)

  • 1.8.3(Oct 28, 2020)

  • 1.8.2(Oct 28, 2020)

  • 1.8.1(Jun 11, 2020)

  • 1.8.0(May 13, 2020)

    • Introducing neighbour_matrix parameter for provision of arbitrary KNNs.
    • Transition to tf.Datasets, improving memory efficiency and overall stability
  • 1.7.0(Jan 7, 2020)

  • 1.6.0(Oct 29, 2019)

    Major features:

    • Support for semi-supervised dimensionality reduction
    • Switch from using fit_generator to fit for training the Keras model
    • Address eager execution issues with TF 2.0
    • User-configurable on-disk-building of Annoy index.
    • Tidy handling of interrupted multi-thread processes

    Minor features:

    • Tests for semi-supervised DR
    • Improved input validation
    • Better hyperparameter validation
    • Slight changes to default hyperparameters
    • Bug fixes

  • 1.5.3(Oct 3, 2019)

    • Control eager execution
    • R package updates and improvements
    • Save ivis object with a custom model
    • Bug squashes and performance improvements
  • 1.5.0(Oct 1, 2019)

  • 1.4.1(Sep 5, 2019)

  • 1.4.0(Aug 19, 2019)

    A number of major additions:

    • Support for both classification- and regression-type supervision
    • Access to all Keras losses for supervised dimensionality reduction
    • Bug fixes and performance improvements
  • 1.3.0(Aug 6, 2019)

    This release introduces a number of new features into ivis:

    • Windows support
    • Code changes to support ivis on Python2
    • R package received a major facelift - with big thanks to JOSS reviewers
    • Added cosine distance metric in triplet loss function
    • Minor bug fixes and performance improvements
  • 1.2.4(Aug 5, 2019)

  • 1.2.3-joss(Aug 5, 2019)

  • 1.2.3(Jul 4, 2019)

  • 1.2.2(Jul 2, 2019)

  • 1.2.1(Jul 2, 2019)

  • 1.2.0(Jul 2, 2019)

    Supervised mode added to ivis. Additional features:

    • Add classification_weight parameter to allow users to tune balance between classification vs. triplet loss.
    • Add Ivis callbacks module for ivis-specific callbacks such as checkpointing during training. Ivis object code changed to deal with provided callbacks.
    • Tensorboard callbacks
    • Sparse matrix support in supervised mode
  • 1.1.5(Jun 25, 2019)

    Significant improvement in processing speed for both the precompute=True and precompute=False options, achieved by using a Keras Sequence generator. Addresses #21.

  • 1.1.4(Jun 20, 2019)
