MiniSom is a minimalistic implementation of the Self Organizing Maps

Overview

MiniSom

Self Organizing Maps

MiniSom is a minimalistic, NumPy-based implementation of Self Organizing Maps (SOM). A SOM is a type of Artificial Neural Network able to convert complex, nonlinear statistical relationships between high-dimensional data items into simple geometric relationships on a low-dimensional display. MiniSom is designed to allow researchers to easily build on top of it and to give students the ability to quickly grasp its details.

Updates about MiniSom are posted on Twitter.

Installation

Just use pip:

pip install minisom

or download MiniSom to a directory of your choice and use the setup script:

git clone https://github.com/JustGlowing/minisom.git
cd minisom
python setup.py install

How to use it

To use MiniSom you need your data organized as a NumPy matrix where each row corresponds to an observation, or as a list of lists like the following:

data = [[ 0.80,  0.55,  0.22,  0.03],
        [ 0.82,  0.50,  0.23,  0.03],
        [ 0.80,  0.54,  0.22,  0.03],
        [ 0.80,  0.53,  0.26,  0.03],
        [ 0.79,  0.56,  0.22,  0.03],
        [ 0.75,  0.60,  0.25,  0.03],
        [ 0.77,  0.59,  0.22,  0.03]]      

Then you can train MiniSom just as follows:

from minisom import MiniSom    
som = MiniSom(6, 6, 4, sigma=0.3, learning_rate=0.5) # initialization of 6x6 SOM
som.train(data, 100) # trains the SOM with 100 iterations

You can obtain the position of the winning neuron on the map for a given sample as follows:

som.winner(data[0])
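
To make clear what the returned position means, here is a minimal pure-Python sketch of a best-matching-unit search: it scans a grid of weight vectors and returns the (row, column) coordinates of the one closest to the sample in Euclidean distance. The helper name `find_bmu` and the tiny 2x2 weight grid are hypothetical and only for illustration; this is not MiniSom's own implementation, which works on NumPy arrays.

```python
import math


def find_bmu(weights, sample):
    """Return (i, j) of the grid cell whose weight vector is closest to sample."""
    best_pos, best_dist = None, math.inf
    for i, row in enumerate(weights):
        for j, w in enumerate(row):
            dist = math.dist(w, sample)  # Euclidean distance
            if dist < best_dist:
                best_pos, best_dist = (i, j), dist
    return best_pos


# hypothetical 2x2 map with 4-dimensional weight vectors
weights = [
    [[0.0, 0.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]],
    [[0.8, 0.5, 0.2, 0.0], [0.1, 0.9, 0.9, 0.1]],
]
bmu = find_bmu(weights, [0.80, 0.55, 0.22, 0.03])
print(bmu)
```

The result is a pair of grid coordinates, which is the same kind of value the text above describes for the winning neuron.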

For an overview of all the features implemented in minisom you can browse the following examples: https://github.com/JustGlowing/minisom/tree/master/examples

Export a SOM and load it again

A model can be saved using pickle as follows:

import pickle
som = MiniSom(7, 7, 4)

# ...train the som here

# saving the som in the file som.p
with open('som.p', 'wb') as outfile:
    pickle.dump(som, outfile)

and can be loaded as follows:

with open('som.p', 'rb') as infile:
    som = pickle.load(infile)

Note that if a lambda function is used to define the decay factor, MiniSom will no longer be picklable.
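
The reason is a general property of pickle rather than anything specific to MiniSom: functions are pickled by name, and a lambda has no importable name. The sketch below illustrates this with the standard library only. The function `inverse_decay` and its formula are hypothetical stand-ins for a decay function, and the snippet assumes it is run as a script or imported as a module; how such a function is passed to MiniSom is not shown here.

```python
import pickle


def inverse_decay(learning_rate, t, max_iter):
    """Hypothetical decay function defined at module level."""
    return learning_rate / (1 + t / (max_iter / 2))


# the same (hypothetical) formula written as a lambda
lambda_decay = lambda learning_rate, t, max_iter: learning_rate / (1 + t / (max_iter / 2))

# A named, module-level function round-trips through pickle.
restored = pickle.loads(pickle.dumps(inverse_decay))

# A lambda does not: pickle cannot look it up by name.
try:
    pickle.dumps(lambda_decay)
    lambda_pickled = True
except (pickle.PicklingError, AttributeError):
    lambda_pickled = False

print(restored(0.5, 0, 100), lambda_pickled)
```

If a custom decay is needed and the model must be saved, defining it with `def` at module level is one way to keep the object picklable.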

Explore parameters

You can use this dashboard to explore the effect of the parameters on a sample dataset: https://share.streamlit.io/justglowing/minisom/dashboard/dashboard.py

Examples

Here are some of the charts that the examples show how to generate:

Seeds map, Class assignment
Handwritten digits mapping, Hexagonal topology
Color quantization, Outlier detection

Other tutorials

How to cite MiniSom

@misc{vettigliminisom,
  title={MiniSom: minimalistic and NumPy-based implementation of the Self Organizing Map},
  author={Giuseppe Vettigli},
  year={2018},
  url={https://github.com/JustGlowing/minisom/},
}

Who uses MiniSom?

Guidelines to contribute

  1. In the description of your Pull Request, explain clearly what it implements or fixes and what you changed. If possible, give an example in the description of the PR. If the PR is about a code speedup, report a reproducible example and quantify the speedup.
  2. Give your pull request a helpful title that summarises what your contribution does.
  3. Write unit tests for your code and make sure the existing tests are up to date. pytest can be used for this:
pytest minisom.py
  4. Make sure that there are no stylistic issues using pycodestyle:
pycodestyle minisom.py
  5. Make sure your code is properly commented and documented. Each public method needs to be documented like the existing ones.
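
For the speedup guideline, a reproducible measurement could look like the following sketch, which uses only the standard library. The two functions are hypothetical stand-ins for an "old" and a "new" implementation and are not taken from MiniSom; the point is the shape of the report (fixed seed, equivalence check, timings, ratio), not the particular numbers, and the candidate is not guaranteed to be faster.

```python
import random
import timeit


def sq_dist_loop(a, b):
    """Baseline (hypothetical): explicit index loop."""
    total = 0.0
    for i in range(len(a)):
        diff = a[i] - b[i]
        total += diff * diff
    return total


def sq_dist_zip(a, b):
    """Candidate (hypothetical): generator expression over zipped pairs."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


random.seed(0)  # fixed seed so the benchmark input is reproducible
a = [random.random() for _ in range(1000)]
b = [random.random() for _ in range(1000)]

# check that both versions agree before timing them
assert abs(sq_dist_loop(a, b) - sq_dist_zip(a, b)) < 1e-9

# take the best of several repeats to reduce noise
t_old = min(timeit.repeat(lambda: sq_dist_loop(a, b), number=200, repeat=5))
t_new = min(timeit.repeat(lambda: sq_dist_zip(a, b), number=200, repeat=5))
print(f"baseline: {t_old:.4f}s  candidate: {t_new:.4f}s  ratio: {t_old / t_new:.2f}x")
```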
Comments
  • Introducing possibility to train the SOM so that learning_rate and sigma are constant during one epoch.

    This pull request introduces the possibility to train the SOM so that learning_rate and sigma are only being decreased after each epoch. During one epoch the SOM is updated once per given input vector (=len(data) times) with constant learning_rate and sigma. This should lead to a greater independence between the order of the input vectors and the resulting SOM.

    In order to use this feature, one only has to use train_epochs() instead of train().

    learning_rate and sigma could (should?) technically be updated only once every epoch, but in order to change as little code as possible those parameters are still updated every time update() gets called (but with constant parameters during one epoch). This could be 'optimised' if desired.

    opened by jriege555 22
  • Fixed topographic_error() and quantization_error()

    Problems:

    • The previous topographic_error() method is incorrect. bmu_1 and bmu_2 are not the coordinates of the best two matching units.
    • The previous topographic_error() and quantization_error() methods use explicit for-loops, which are very slow.

    Fixes:

    • Fixed incorrect implementation of topographic_error() method.
    • Replaced the topographic_error() and quantization_error() methods with vectorized implementations.
    opened by wei-zhang-thz 17
  • quantization error (theoretical question)

    I have a question about the interpretability of the quantization error.

    How can we know that the SOM is reliable? Does the quantization error need to be lower than a certain value?

    For example, in my case, I have a quantization error of 7.0, which is quite high in comparison to the example given in the documentation. Does that mean my SOM is not reliable?

    question 
    opened by lachhebo 13
  • Do you know why nodes change completely when I reran the same setup with varying number of iterations?

    Hey :-)

    First of all, thank you for providing this tool, it seems very handy! I am using SOM with geopotential height anomalies over a given region as input variables to cluster meteorological circulation patterns (ca. 2000 observations). What is really strange is that the SOM nodes differ completely when I rerun the same setup with more iterations (e.g. doubling from 10000 to 20000). It produces nodes not only in a different order, but also nodes that have no analogue in the new SOM... Is there anything I am doing wrong?

    Thank you very much - below some details about the setup

    The example I am using most often is sigma=1 (Gaussian), lr=0.5, SOM sizes between 2x4 to 4x5. The problem occurs no matter the initialization (pca or random) and no matter the training (single, batch, random). My code is basically only:

    # SOM
    som = MiniSom(som_m, som_n, ndims, sigma=sigma, learning_rate=lr, neighborhood_function='gaussian')
    som.pca_weights_init(somarr)
    som.train_batch(somarr, 10000, verbose=True)

    ...

    # plot
    for m in range(som_m):
        for n in range(som_n):
            ax...
            pltarr = som.get_weights()[m, n, :].reshape((nsomlat, nsomlon))
            p = ax.contourf(somlons, somlats, pltarr, cmap='seismic', transform=ccrs.PlateCarree())

    question 
    opened by michel039 12
  • Vectorized the _activate function

    Great library, but I noticed that the training code for your SOMs is not vectorized. You use the fast_norm function a lot, which may be faster than linalg.norm for 1D arrays, but iterating over every spot in the SOM is a lot slower than just calling linalg.norm.

    This pull request replaces fast_norm with linalg.norm in 2 places where I saw iteration over the whole SOM. Some simple testing with a 100x100 SOM showed ~40x speedup on my laptop.

    After making the changes, the unit tests failed, which I believe is caused by incorrectly setting up the testing weights as a 2D array rather than a 3D array. So I changed that too, and now the unit tests pass. I also did a few rough tests of my own, and the results of self.winner(x) and the training seem to be the same as before.

    opened by AustinT 11
  • Time Series

    Hello! I am trying to use my time series data for the example uploaded, but I encounter this error when initializing pca. Also, the second image is the error that I encounter when I use random initialization.

    image image

    opened by jaybhiesantos 10
  • How to cluster images?

    I would like to know how to cluster images. Instead of reading a CSV, I want to read all images from disk and cluster them using SOM.

    Can you please share some examples?

    opened by balavenkatesh3322 10
  • Example: Hexagonal Topology bokeh

    Summary

    This branch addresses https://github.com/JustGlowing/minisom/issues/86 by adding an interactive bokeh version of the existing matplotlib plot to the examples/HexagonalTopology.ipynb notebook.

    The purpose of adding interactivity was so that further exploration could be conducted on the plot to see where the original data points are mapped to in the SOM space.

    Check

    • [x] This branch adds value to the main repository, so it is worthwhile to include.
    • [x] The bokeh plot is equivalent to the matplotlib plot.
    • [ ] The code is error free and works on your machine.
    • [x] The logic of showing data points in the hover tooltip is sound.

    Note

    This "closes #86".

    opened by avisionh 10
  • speed up in update method

    Hi! Thanks for sharing the library! I noticed that if you change the loop in the update method with an einsum operation you can speed up the training by some amount. Hope you find it useful. Christos

    opened by Sourmpis 10
  • Add topographic error calculation for hexagonal grid

    This PR adds the functionality for Topographic Error calculation, computed by finding the first-best-matching and second-best-matching neurons in the hexagonal grid.

    Screenshot 2022-04-12 005139

    The topographic error calculation is based on the above equation, which considers if the first-best-matching and second-best-matching neurons are neighbors in the SOM grid.

    opened by TharindaDilshan 9
  • new visualizations

    Hi, I have implemented a number of visualizations in the BasicUsage file. Additionally, I made some minor changes (mainly typo fixes) in some other files. As this is my first use of GitHub, I do not know how to separate the two topics and make two pull requests... I hope this works out!

    opened by bijae 9
  • Topographic error wrong for hexagonal topography with rectangular grid

    Hi,

    I am trying to get the topographic error from a SOM with 11x7 neurons, hexagonal topography.

    When I do, I get this error:

         21     return (-1, -1)
         22 y = som._weights.shape[1]
    ---> 23 coords = som.convert_map_to_euclidean((index % y, int(index/y)))
         24 return coords
    
    File ~/.local/lib/python3.8/site-packages/minisom.py:243, in MiniSom.convert_map_to_euclidean(self, xy)
        237 def convert_map_to_euclidean(self, xy):
        238     """Converts map coordinates into euclidean coordinates
        239     that reflects the chosen topology.
        240 
        241     Only useful if the topology chosen is not rectangular.
        242     """
    --> 243     return self._xx.T[xy], self._yy.T[xy]
    
    IndexError: index 8 is out of bounds for axis 1 with size 7
    

    I don't think this line of code makes sense:

    coords = som.convert_map_to_euclidean((index % y, int(index/y)))

    Shouldn't the parameters be inverted, e.g.:

    coords = som.convert_map_to_euclidean((int(index/y), index % y))

    Anyway, thanks for the amazing work!

    bug 
    opened by mbarison 6
  • Matching Matlab hyperparameters

    Hi there! Thank you for this great work!

    I switched from the Matlab version of SOM to Python. However, I found the results were quite different: I could get a perfect 100% in Matlab but only 19% F1-score here.

    The only thing I changed from the default settings in Matlab is using a 10*10 map. These are the settings I used with MiniSom:

    som = MiniSom(10, 10, 4096, sigma=1.5, learning_rate=0.7, activation_distance='euclidean', neighborhood_function='gaussian', topology='hexagonal', random_seed=10)

    Any suggestions so I could maybe recreate the result from Matlab?

    Thank you in advance!

    question 
    opened by AmousQiu 3
  • Is there a way to obtain a distance of each point to its BMU?

    Hi, first and foremost thank you for your great work, which makes it possible to use the SOM algorithm in such a convenient way. I wanted to ask if it is possible to obtain a list of the distances between each point and its Best Matching Unit (node) on a trained SOM grid. I have read the documentation and seen the different attributes of the SOM object, but it appears to me that none of them returns the (Euclidean) distance to the BMU. Thanks in advance for your support!

    question 
    opened by JMiklaszewski 1
  • Is there an option to obtain the BMU value directly?

    Hi there,

    I am trying to use BMU values as a metric to classify my data. The features are seismic attributes. Your function "distance_from_weights" was my first guess, but it does not export the BMUs directly. We have to manipulate it to remove the second BMU.

    np.argsort(distance_from_weights(data), axis=1)[:, :2] -----> np.argsort(distance_from_weights(data), axis=1)[:, :1]

    Would you mind building that function?

    question 
    opened by akol67 1
  • Wrong value in topographic error function?

    So a topographic error occurs when the two BMUs of a sample are not adjacent. Shouldn't t then be 1? If the BMUs are two hops apart in a corner, their Euclidean distance is sqrt(2) = 1.4142. So with distance > 1.42 this doesn't count as an error. Or am I missing something?

    question 
    opened by SandroMartens 0
  • Example spatio-temporal climate data

    This pull request adds a notebook with a SOM example on climate data, which is usually 3D (time, lat, lon).

    I've been looking a lot into SOM examples, and it's hard to find examples on climate data...so I hope this notebook can help future users (and also me, if you find something wrong on the use).

    For the example, I've used the tutorial dataset from Xarray.

    opened by carocamargo 2
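
Several of the questions above concern the distance between a sample and its best matching unit (BMU). As a minimal pure-Python sketch of the idea, the following computes, for each sample, the Euclidean distance to its closest weight vector, and then averages those distances, which is the usual definition of the quantization error. The helper name `bmu_distances` and the toy values are hypothetical, and this is not MiniSom's API; with a trained MiniSom the weight vectors would come from the map itself.

```python
import math


def bmu_distances(weights, data):
    """For each sample, return the Euclidean distance to its closest weight vector.

    `weights` is a flat list of weight vectors; the grid position of each
    neuron does not matter for this computation.
    """
    return [min(math.dist(w, x) for w in weights) for x in data]


# hypothetical toy values: three neurons, three samples, two features
weights = [[0.0, 0.0], [1.0, 1.0], [0.0, 1.0]]
data = [[0.0, 0.0], [1.0, 0.0], [0.0, 0.5]]

dists = bmu_distances(weights, data)
quantization_error = sum(dists) / len(dists)
print(dists, quantization_error)
```

Because the value depends on the scale and dimensionality of the data, a quantization error is mainly meaningful when compared across maps trained on the same dataset.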
Releases(2.3.0)
Owner
Giuseppe Vettigli
Data Scientist, teaching fellow, Python enthusiast, fearless visionarist, lateral thinker.