Intel® Nervana™ reference deep learning framework committed to best performance on all hardware

Last update: Dec 20, 2022

Overview

DISCONTINUATION OF PROJECT. This project will no longer be maintained by Intel. Intel will not provide or guarantee development of or support for this project, including but not limited to, maintenance, bug fixes, new releases or updates. Patches to this project are no longer accepted by Intel. If you have an ongoing need to use this project, are interested in independently developing it, or would like to maintain patches for the community, please create your own fork of the project.

neon

neon is Intel's reference deep learning framework committed to best performance on all hardware, designed for ease of use and extensibility.

Tutorials and iPython notebooks to get users started with using neon for deep learning.
Support for commonly used layers: convolution, RNN, LSTM, GRU, BatchNorm, and more.
Model Zoo contains pre-trained weights and example scripts for state-of-the-art models, including VGG, reinforcement learning, Deep Residual Networks, image captioning, sentiment analysis, and more.
Swappable hardware backends: write code once and then deploy on CPUs, GPUs, or Nervana hardware

For fast iteration and model exploration, neon has the fastest performance among deep learning libraries (2x the speed of cuDNNv4; see benchmarks).

2.5s/macrobatch (3072 images) on AlexNet on Titan X (Full run on 1 GPU ~ 26 hrs)
Training VGG with 16-bit floating point on 1 Titan X takes ~10 days (original paper: 4 GPUs for 2-3 weeks)
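
The AlexNet figure above implies the following throughput. This is plain arithmetic on the quoted numbers, not an additional measurement, and the 26-hour figure is itself approximate:

```latex
\frac{3072\ \text{images}}{2.5\ \text{s}} = 1228.8\ \text{images/s},
\qquad
\frac{26 \times 3600\ \text{s}}{2.5\ \text{s/macrobatch}} = 37{,}440\ \text{macrobatches}
\approx 1.15 \times 10^{8}\ \text{images processed}.
```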

We use neon internally at Intel Nervana to solve our customers' problems across many domains. We are hiring across several roles. Apply here!

See the new features in our latest release. We want to highlight that neon v2.0.0+ has been optimized for much better performance on CPUs by enabling Intel Math Kernel Library (MKL). The DNN (Deep Neural Networks) component of MKL that is used by neon is provided free of charge and downloaded automatically as part of the neon installation.

Quick Install

Local install and dependencies

On a Mac OSX or Linux machine, enter the following to download and install neon (conda users see the guide), and use it to train your first multi-layer perceptron. To force a python2 or python3 install, replace make below with either make python2 or make python3.

    git clone https://github.com/NervanaSystems/neon.git
    cd neon
    make
    . .venv/bin/activate

After neon v2.2.0, the master branch of neon will be updated weekly with work-in-progress toward the next release. Check out a release tag (e.g., "git checkout v2.2.0") for a stable release, or simply check out the "latest" release tag to get the latest stable release (i.e., "git checkout latest").

Install via pypi

Starting with version 2.4.0, pip install is re-enabled, and neon can be installed using the package name nervananeon.

    pip install nervananeon

Note that aeon needs to be installed separately. The latest release, v2.6.0, uses aeon v1.3.0.

Warning

Between neon v2.1.0 and v2.2.0, the aeon manifest file format changed. When updating from neon < v2.2.0, manifests have to be recreated using the ingest scripts (in the examples folder) or updated using this script.

Use a script to run an example

    python examples/mnist_mlp.py

Selecting a backend engine from the command line

If a compatible GPU is found on the system, the gpu backend is selected by default, so the above command is equivalent to:

    python examples/mnist_mlp.py -b gpu

As of neon v2.1.0, when no GPU is available, the optimized CPU (MKL) backend is selected by default, so the above command is equivalent to:

    python examples/mnist_mlp.py -b mkl

If you are interested in comparing the default mkl backend with the non-optimized CPU backend, use the following command:

    python examples/mnist_mlp.py -b cpu
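
The default selection order described above (gpu if available, otherwise mkl as of v2.1.0, otherwise cpu, with an explicit -b always winning) can be summarized in a few lines. This is a minimal sketch: `choose_backend` and `make_backend` are hypothetical helpers, not part of neon, and `make_backend` assumes neon's `gen_backend` factory accepts `backend` and `batch_size` keyword arguments, as in neon's own examples.

```python
def choose_backend(gpu_available, mkl_available, requested=None):
    """Return the backend name implied by the README's default rules.

    requested: an explicit choice (like the value passed with -b); it always wins.
    """
    if requested is not None:
        return requested
    if gpu_available:
        return "gpu"
    # As of neon v2.1.0, MKL is the default CPU backend when no GPU is found.
    if mkl_available:
        return "mkl"
    return "cpu"


def make_backend(gpu_available, mkl_available, batch_size=128):
    # Imported lazily so choose_backend() can be used without neon installed.
    from neon.backends import gen_backend
    return gen_backend(backend=choose_backend(gpu_available, mkl_available),
                       batch_size=batch_size)
```

Keeping the decision in a pure function makes it easy to log or test which backend a script will request before any hardware is touched.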

Use a yaml file to run an example

Alternatively, a yaml file may be used to run an example.

    neon examples/mnist_mlp.yaml

To select a specific backend in a yaml file, add or modify a line that contains backend: mkl to enable the mkl backend, or backend: cpu to enable the cpu backend. The gpu backend is selected by default if a GPU is available.
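
If you switch backends often, that edit can be scripted. A minimal sketch, assuming the example yaml keeps `backend` as a flat top-level key at column 0; `set_yaml_backend` is a hypothetical helper, not part of neon. Editing the text (rather than loading and re-dumping the yaml) keeps comments and key order intact.

```python
import re

# Matches a top-level "backend: ..." line; MULTILINE lets ^/$ match per line.
_BACKEND_LINE = re.compile(r"^backend:.*$", re.MULTILINE)


def set_yaml_backend(yaml_text, backend):
    """Return yaml_text with a top-level `backend: <name>` line added or replaced."""
    if backend not in ("gpu", "mkl", "cpu"):
        raise ValueError("unknown backend: %r" % backend)
    line = "backend: %s" % backend
    if _BACKEND_LINE.search(yaml_text):
        # Replace only the first occurrence, leaving the rest of the file alone.
        return _BACKEND_LINE.sub(line, yaml_text, count=1)
    # No backend key yet: append one, making sure the file ends with a newline.
    if yaml_text and not yaml_text.endswith("\n"):
        yaml_text += "\n"
    return yaml_text + line + "\n"
```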

Recommended Settings for neon with MKL on Intel Architectures

The Intel Math Kernel Library takes advantage of the parallelization and vectorization capabilities of Intel Xeon and Xeon Phi systems. When hyperthreading is enabled on the system, we recommend one of the following KMP_AFFINITY settings to make sure parallel threads are mapped 1:1 to the available physical cores.

    export OMP_NUM_THREADS=<Number of Physical Cores>
    export KMP_AFFINITY=compact,1,0,granularity=fine

or

    export OMP_NUM_THREADS=<Number of Physical Cores>
    export KMP_AFFINITY=verbose,granularity=fine,proclist=[0-<Number of Physical Cores>],explicit

For more information about KMP_AFFINITY, please check here. We encourage users to experiment and establish their own best-performing settings.
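
The `<Number of Physical Cores>` placeholder can also be filled in from Python on Linux. A minimal sketch, assuming `/proc/cpuinfo` exposes `physical id` and `core id` fields (some VMs and non-x86 kernels omit them); the helper names are hypothetical. It applies the first (compact) setting above. OpenMP runtimes generally read these variables once at startup, so they must be set before neon is imported.

```python
import os


def count_physical_cores(cpuinfo_text):
    """Count distinct (physical id, core id) pairs in /proc/cpuinfo text.

    Hyperthreads share a core id, so each physical core is counted once.
    Returns None when the fields are missing.
    """
    cores = set()
    physical_id = core_id = None
    for line in cpuinfo_text.splitlines() + [""]:
        if not line.strip():
            # A blank line ends one processor block.
            if core_id is not None:
                cores.add((physical_id, core_id))
            physical_id = core_id = None
            continue
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "physical id":
            physical_id = value.strip()
        elif key == "core id":
            core_id = value.strip()
    return len(cores) or None


def mkl_thread_env(num_physical_cores):
    """Environment settings matching the first (compact) recommendation above."""
    return {
        "OMP_NUM_THREADS": str(num_physical_cores),
        "KMP_AFFINITY": "compact,1,0,granularity=fine",
    }


def apply_mkl_thread_env(cpuinfo_path="/proc/cpuinfo"):
    # Call this before importing neon so the OpenMP runtime sees the values.
    with open(cpuinfo_path) as f:
        n = count_physical_cores(f.read())
    if n is None:
        n = os.cpu_count()  # fallback: logical CPUs, may include hyperthreads
    os.environ.update(mkl_thread_env(n))
    return n
```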

Documentation

The complete documentation for neon is available here. Some useful starting points are:

Tutorials for neon
Overview of the neon workflow
API documentation
Resources for neon and deep learning

Support

For any bugs or feature requests please:

Search the open and closed issues list to see if we're already working on what you have uncovered.
Check that your issue/request hasn't already been addressed in our Frequently Asked Questions (FAQ) or neon-users Google group.
File a new issue or submit a new pull request if you have some code you'd like to contribute

For other questions and discussions please post a message to the neon-users Google group

License

We are releasing neon under an open source Apache 2.0 License. We welcome you to contact us with your use cases.

Comments

MKL backend performance regression with some topologies

Hello! I use neon to train a model on three backends: CPU, MKL and GPU. All the settings are the same when running with these backends. I got very similar costs from CPU and GPU, while the cost from the MKL backend was usually higher than the other two, sometimes even nan. Does anybody have an idea why that happens?

The CPU is an Intel i7; the GPU is an Nvidia GTX 1050; the code is running on Ubuntu 16.04. Here is the printed output of the code...

Use cpu as backend.

DISPLAY:neon:-------------------------------------------------------------------------------------
DISPLAY:neon:|    Func     |    Mean     |   Median    |     Min     |     Max     |    Units    |
DISPLAY:neon:-------------------------------------------------------------------------------------
DISPLAY:neon:| fprop       |  456.74     |  452.61     |  439.07     |  501.7      |    msec     |
DISPLAY:neon:| bprop       |  819.21     |  796.45     |  772.53     |  979.8      |    msec     |
DISPLAY:neon:| iteration   |  1276       |  1250       |  1213.5     |  1457       |    msec     |
DISPLAY:neon:-------------------------------------------------------------------------------------

Epoch 0   [Train |████████████████████|  246/246  batches, 3.51 cost, 303.30s] [CrossEntropyMulti Loss 0.00, 0.00s]
Epoch 1   [Train |████████████████████|  245/245  batches, 3.49 cost, 301.14s] [CrossEntropyMulti Loss 0.00, 0.00s]
Epoch 2   [Train |████████████████████|  245/245  batches, 3.47 cost, 301.43s] [CrossEntropyMulti Loss 0.00, 0.00s]
Epoch 3   [Train |████████████████████|  245/245  batches, 3.46 cost, 302.56s] [CrossEntropyMulti Loss 0.00, 0.00s]
Epoch 4   [Train |████████████████████|  245/245  batches, 3.44 cost, 302.91s] [CrossEntropyMulti Loss 0.00, 0.00s]
Neon training finishes in 1646.99 seconds.
Misclassification error = 91.2%. Finished in 26.86 seconds.
Top 3 Misclassification error = 78.1%. Finished in 27.36 seconds.
Top 5 Misclassification error = 65.7%. Finished in 27.36 seconds.
Misclassification error = 91.7% on test set. Finished in 43.54 seconds.
Top 3 Misclassification error = 79.8% on test set. Finished in 43.60 seconds.
Top 5 Misclassification error = 67.3% on test set. Finished in 43.76 seconds.


Use mkl as backend.

DISPLAY:neon:-------------------------------------------------------------------------------------
DISPLAY:neon:|    Func     |    Mean     |   Median    |     Min     |     Max     |    Units    |
DISPLAY:neon:-------------------------------------------------------------------------------------
DISPLAY:neon:| fprop       |  119.82     |  120.03     |  111.14     |  130.82     |    msec     |
DISPLAY:neon:| bprop       |  157.51     |  156.32     |  151.81     |  165.86     |    msec     |
DISPLAY:neon:| iteration   |  277.33     |  280.49     |  264.03     |  285.16     |    msec     |
DISPLAY:neon:-------------------------------------------------------------------------------------

Epoch 0   [Train |████████████████████|  246/246  batches, 48.12 cost, 70.76s] [CrossEntropyMulti Loss 0.00, 0.00s]
Epoch 1   [Train |████████████████████|  245/245  batches, 47.54 cost, 73.94s] [CrossEntropyMulti Loss 0.00, 0.00s]
Epoch 2   [Train |████████████████████|  245/245  batches, 48.52 cost, 77.99s] [CrossEntropyMulti Loss 0.00, 0.00s]
Epoch 3   [Train |████████████████████|  245/245  batches, 48.09 cost, 74.04s] [CrossEntropyMulti Loss 0.00, 0.00s]
Epoch 4   [Train |████████████████████|  245/245  batches, 48.20 cost, 79.86s] [CrossEntropyMulti Loss 0.00, 0.00s]
Neon training finishes in 422.74 seconds.
Misclassification error = 94.6%. Finished in 9.29 seconds.
Top 3 Misclassification error = 90.1%. Finished in 9.56 seconds.
Top 5 Misclassification error = 85.6%. Finished in 9.78 seconds.
Misclassification error = 94.5% on test set. Finished in 15.48 seconds.
Top 3 Misclassification error = 90.0% on test set. Finished in 15.47 seconds.
Top 5 Misclassification error = 85.5% on test set. Finished in 14.99 seconds.


Use gpu as backend.

DISPLAY:neon:-------------------------------------------------------------------------------------
DISPLAY:neon:|    Func     |    Mean     |   Median    |     Min     |     Max     |    Units    |
DISPLAY:neon:-------------------------------------------------------------------------------------
DISPLAY:neon:| fprop       |  6.1057     |  6.0366     |  5.8992     |  6.3699     |    msec     |
DISPLAY:neon:| bprop       |  10.76      |  10.753     |  9.9809     |  11.841     |    msec     |
DISPLAY:neon:| iteration   |  16.865     |  16.783     |  15.88      |  18.185     |    msec     |
DISPLAY:neon:-------------------------------------------------------------------------------------

Epoch 0   [Train |████████████████████|  246/246  batches, 3.51 cost, 3.98s] [CrossEntropyMulti Loss 0.00, 0.00s]
Epoch 1   [Train |████████████████████|  245/245  batches, 3.48 cost, 3.97s] [CrossEntropyMulti Loss 0.00, 0.00s]
Epoch 2   [Train |████████████████████|  245/245  batches, 3.47 cost, 3.98s] [CrossEntropyMulti Loss 0.00, 0.00s]
Epoch 3   [Train |████████████████████|  245/245  batches, 3.46 cost, 3.98s] [CrossEntropyMulti Loss 0.00, 0.00s]
Epoch 4   [Train |████████████████████|  245/245  batches, 3.44 cost, 3.98s] [CrossEntropyMulti Loss 0.00, 0.00s]
Neon training finishes in 21.84 seconds.
Misclassification error = 91.2%. Finished in 0.38 seconds.
Top 3 Misclassification error = 78.0%. Finished in 0.38 seconds.
Top 5 Misclassification error = 65.6%. Finished in 0.38 seconds.
Misclassification error = 91.6% on test set. Finished in 0.60 seconds.
Top 3 Misclassification error = 79.8% on test set. Finished in 0.60 seconds.
Top 5 Misclassification error = 67.4% on test set. Finished in 0.60 seconds.

opened by moderato 24

Prediction drops to 0 after certain number of epochs

I'm using Neon for my deep Q-learning code - https://github.com/tambetm/simple_dqn. Recently I noticed an issue: the prediction of my network drops to 0 after a certain number of epochs. This can be seen in the Q-value graph:

Normally it would look like this: This plot was produced using Neon commit hash 7a56fa9645a51e97c05f2e5afbbd1df7057ae832 from October 30th. My code is exactly the same.

The most plausible explanation would be that the weights are truncated to 0 at some point. Because my code hasn't changed, I suspect something in the Neon code related to saving and loading weights repeatedly. In my code I need to clone a model, and the simplest (and most compatible) way of doing that is to just save and load the model. I do this ~45 times before the network prediction drops (it doesn't always drop at the same moment).

Any ideas what change could have resulted in such a behavior and how to debug it?
bug

opened by tambetm 24
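
One way to test the save/load hypothesis in isolation is to round-trip the same weights many times and record, per round, the largest deviation from the originals and the fraction of exact zeros. A minimal sketch: neon's own serialization is not reproduced here, and `np.savez`/`np.load` stand in for it; in practice you would swap `roundtrip` for the model's actual save and load calls and compare the arrays it exposes. The helper names are hypothetical.

```python
import os
import tempfile

import numpy as np


def roundtrip(weights, path):
    """Save and reload a dict of arrays once, mimicking a save/load clone."""
    np.savez(path, **weights)
    with np.load(path) as data:
        return {k: data[k] for k in data.files}


def check_repeated_roundtrips(weights, n_rounds=45):
    """Round-trip weights n_rounds times and report drift and zeroed entries.

    Returns (round, max_abs_diff, zero_fraction) tuples, so the first round
    where values change (or collapse to 0) is easy to spot.
    """
    report = []
    current = {k: v.copy() for k, v in weights.items()}
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "clone.npz")
        for i in range(1, n_rounds + 1):
            current = roundtrip(current, path)
            diff = max(float(np.max(np.abs(current[k] - weights[k]))) for k in weights)
            total = sum(v.size for v in current.values())
            zeros = sum(int(np.count_nonzero(v == 0)) for v in current.values())
            report.append((i, diff, zeros / total))
    return report
```
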
Running mnist-small.yaml example after setup - getting error

Hi, I just installed neon on Ubuntu 14 with Python 3.4 and ran the following command from ~/neon:

    neon examples/mlp/mnist-small.yaml

and I am getting an error message:

    Traceback (most recent call last):
      File "/home/nir/anaconda3/bin/neon", line 240, in
        experiment, result, status = main()
      File "/home/nir/anaconda3/bin/neon", line 126, in main
        experiment = deserialize(args.yaml_file)
      File "/home/nir/anaconda3/lib/python3.4/site-packages/neon/util/persist.py", line 183, in deserialize
        if not isinstance(load_path, file):
    NameError: name 'file' is not defined

I checked the directory, and this file exists. I appreciate your assistance, thanks. N

opened by nirre1401 18
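
The traceback points at `isinstance(load_path, file)`: `file` is a Python 2 builtin that no longer exists in Python 3, so the check itself raises `NameError` regardless of whether the yaml file is present. A minimal sketch of a version-neutral check; `FILE_TYPES` and `is_open_file` are hypothetical names, not neon's actual fix.

```python
import io

try:
    # Python 2: open() returns the builtin `file` type.
    FILE_TYPES = (file, io.IOBase)  # noqa: F821
except NameError:
    # Python 3: `file` is gone; open() returns io.IOBase subclasses.
    FILE_TYPES = (io.IOBase,)


def is_open_file(obj):
    """Return True for an already-open file object, False for e.g. a path string."""
    return isinstance(obj, FILE_TYPES)
```
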
Updated Docker images

I've updated my Docker builds for version 1.0 - one for the cpu backend, and a new one for the gpu backend. The GPU images referenced in the 0.9 docs are still available, but with a note about deprecation.

I've tested the cpu version with neon examples/mnist_mlp.yaml and python examples/mnist_mlp.py, and it appears fine. However, the gpu version builds the cpu version because of https://github.com/NervanaSystems/neon/issues/83. When building the code to check for the GPU capabilities, please keep https://github.com/NervanaSystems/neon/issues/19 in mind.

opened by Kaixhin 17

Enable gpu error

I installed CUDA and set up the environment:

nvidia-smi 
Wed Dec  6 13:53:24 2017       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 384.81                 Driver Version: 384.81                    |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  Off  | 00000000:2F:00.0 Off |                    0 |
| N/A   33C    P0    31W / 250W |  15553MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+
|   1  Tesla P100-PCIE...  Off  | 00000000:86:00.0 Off |                    0 |
| N/A   36C    P0    31W / 250W |  15479MiB / 16276MiB |      0%      Default |
+-------------------------------+----------------------+----------------------+

pip install nervananeon

env | grep PATH
LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/lib:/usr/lib/python2.7/site-packages/mklml_lnx_2018.0.1.20171007/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/.singularity.d/libs
PATH=/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin

python  mnist.py -b gpu
python : error: argument -b/--backend: invalid choice: 'gpu' (choose from 'cpu', 'mkl')

I don't understand why the CPU and GPU installs use the same package. With TensorFlow, the GPU version is installed as a separate package:

pip install tensorflow-gpu

opened by yangyang-zhang 12

'magic' is not very descriptive :-D
You have this cool function for dividing integers using a bitshift, first multiplying by another number so you're not limited to dividing by powers of 2, as described in https://gmplib.org/~tege/divcnst-pldi94.pdf

At the moment, this function is called 'magic', but I'm not sure that's very descriptive. I've renamed it to get_div_mul_shift in my own branch: https://github.com/hughperkins/winogradCl/blob/api/winogradcl/util/math_helper.py#L33

def get_div_mul_shift_32(nmax, d)
opened by hughperkins 11
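
For reference, a (multiplier, shift) pair of this kind can be computed with the unsigned magic-number procedure from Warren's Hacker's Delight ("magicgu"). A minimal sketch, not claimed to be neon's `magic` code; `div_mul_shift` is a hypothetical name. A GPU kernel would additionally need `m` to fit in its register width.

```python
def div_mul_shift(nmax, d):
    """Find (m, p) such that n // d == (n * m) >> p for every 0 <= n <= nmax.

    Takes the smallest shift p that satisfies the rounding-error bound,
    then rounds 2**p / d up to get the multiplier m.
    """
    if d < 1 or nmax < 0:
        raise ValueError("need d >= 1 and nmax >= 0")
    if nmax < d:
        # Every quotient in range is 0, so multiplying by 0 is exact.
        return 0, 0
    # nc: largest n <= nmax with n % d == d - 1 (the worst case for rounding).
    nc = ((nmax + 1) // d) * d - 1
    nbits = nmax.bit_length()
    for p in range(0, 2 * nbits + 1):
        if 2 ** p > nc * (d - 1 - (2 ** p - 1) % d):
            m = (2 ** p + d - 1 - (2 ** p - 1) % d) // d
            return m, p
    raise ValueError("no multiplier/shift found for nmax=%d, d=%d" % (nmax, d))
```

For example, `div_mul_shift(255, 3)` gives `(171, 9)`, i.e. `n // 3 == (n * 171) >> 9` for every 8-bit `n`.
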
Race condition in softmax-like expressions?

There appears to be a race condition in certain expressions, such as the denominator of softmax with axis=1 (unlike in neon.transforms.activation):

https://gist.github.com/oleg-trott/30b802902fd8c63ce002

I tried this on several GTX 980 cards and see it on all of them. The errors are rare, about 0.1-0.2%. However, when they happen, they are usually dramatic.

I also see these errors on GTX 750, but they are about 100 times less frequent.

opened by oleg-trott 11
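
Rare, intermittent errors like these are easiest to quantify by comparing each GPU result against a float64 CPU reference over many trials. A minimal NumPy sketch, assuming the GPU output has been copied to the host as a NumPy array (e.g. with the tensor's `get()` method); the helper names are hypothetical and the tolerances are arbitrary choices.

```python
import numpy as np


def softmax_denominator(x, axis=1):
    """Reference sum(exp(x - max)) along `axis`, computed in float64 on the CPU."""
    x = np.asarray(x, dtype=np.float64)
    shifted = x - x.max(axis=axis, keepdims=True)
    return np.exp(shifted).sum(axis=axis)


def mismatch_rate(candidate, reference, rtol=1e-3, atol=1e-6):
    """Fraction of entries where a backend result disagrees with the reference."""
    bad = ~np.isclose(candidate, reference, rtol=rtol, atol=atol)
    return float(bad.mean())
```

Running the GPU expression and this reference on the same random inputs many times, and logging `mismatch_rate` per run, gives a direct estimate of the 0.1-0.2% figure per card.
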
Create custom layer

How do I start to implement a new layer? I want to implement a few more exotic pooling functions, so basically I want to change the function that takes the max / average and maybe even the gradient for that layer. It would be great to get a step by step recommendation on how to do such things.

opened by Sowasa 11

make neon error

$make HAS_GPU=true
Building MKL Engine...
which: no icc in (/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin)
using GNU compiler...
basename: missing operand
Try 'basename --help' for more information.
mkl root: /opt/neon/
make[1]: Entering directory `/opt/neon/neon/backends/mklEngine'
make[1]: Leaving directory `/opt/neon/neon/backends/mklEngine'
make[1]: Entering directory `/opt/neon/neon/backends/mklEngine'
Building mklEngine.so...
gcc src/conv.c src/pooling.c src/relu.c src/batchNorm.c src/concat.c src/softmax.c src/MKLDNN.h -shared -o mklEngine.so -std=c99 -O3 -I/opt/neon//include -L/opt/neon//lib -Wl,-rpath=/opt/neon//lib -fopenmp -lmklml_gnu -fPIC -march=native -g -liomp5
In file included from src/conv.c:15:0:
src/MKLDNN.h:23:21: fatal error: mkl_dnn.h: No such file or directory
 #include <mkl_dnn.h>
                     ^
compilation terminated.
In file included from src/pooling.c:15:0:
src/MKLDNN.h:23:21: fatal error: mkl_dnn.h: No such file or directory
 #include <mkl_dnn.h>
                     ^
compilation terminated.
In file included from src/relu.c:15:0:
src/MKLDNN.h:23:21: fatal error: mkl_dnn.h: No such file or directory
 #include <mkl_dnn.h>
                     ^

$ bash prepare_mkl.sh

Checking MKLML dependencies...
Downloading required MKLML version mklml_lnx_2018.0.1.20171007 ...
MKLML dependencies installed: MKLROOT=/opt/neon/
basename: missing operand
Try 'basename --help' for more information.
/opt/neon/ 1

I need help, thanks.

opened by yangyang-zhang 10

Assertion Error loading weights to hidden layers

Hi guys,

An assertion error is thrown when I run this line of code: layer.load_weights(params)

I looked at the source code, and it looks like I might be missing an argument called 'self' in the function. I'm not sure what is meant by self, and if I missed the documentation for it, I apologize. I know I do not need the load_states argument since it defaults to True.

Full code here, similar to the VGG example:

    param_layers = [l for l in model.layers.layers]
    param_dict_list = trained_vgg['model']['config']['layers']
    for layer, params in zip(param_layers, param_dict_list):
        if(layer.name == 'class_layer'):
            break
        print(params)
        layer.load_weights(params)

opened by pantherso48 10
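
`self` is not something the caller passes: in Python it is the instance a method is called on, and `layer.load_weights(params)` binds it automatically. A missing argument would raise `TypeError`, not `AssertionError`, so the assertion most likely comes from an `assert` inside the method (for example, on the contents of `params`). The toy class below is hypothetical and only illustrates method binding; it is not neon's layer class.

```python
class Layer:
    """Hypothetical stand-in for a framework layer; not neon's Layer class."""

    def __init__(self, name):
        self.name = name
        self.params = None

    def load_weights(self, pdict, load_states=True):
        # `self` is the instance the method was called on; Python fills it in
        # automatically for `layer.load_weights(pdict)`.
        self.params = pdict["params"]
        if load_states:
            self.states = pdict.get("states", [])


layer = Layer("conv1")
layer.load_weights({"params": [1, 2]})         # self is `layer`
Layer.load_weights(layer, {"params": [3, 4]})  # same call, self passed explicitly
```
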
Ubuntu 16.04: unsupported GNU version! gcc versions later than 4.9 are not supported!

I have gcc4.9 installed. In Torch, I can select gcc4.9 by doing:

    export CC=gcc-4.9
    export CXX=g++-4.9

It is unclear how to do this for Neon.
opened by hughperkins 10
No module named neon.util.compat

Hi All,

I am getting an error when I run a script that contains the following line: from neon.util.compat import range, StringIO. The error says there is no module named neon.util.compat. How do I fix this error?

Any help will be appreciated.

opened by chandratejatiriveedhi 0
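
The error means the installed neon version does not provide `neon.util.compat`. A minimal workaround, assuming Python 3, is to fall back to the standard library when the import fails: the builtin `range` is already lazy there, so only `StringIO` needs a new source.

```python
try:
    from neon.util.compat import range, StringIO  # only in neon versions that ship it
except ImportError:
    # Python 3: the builtin range already behaves like the compat version,
    # and StringIO lives in the io module.
    from io import StringIO
```
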
About installation

Hi! My name is Keval Pandya. I have an issue with neon: how can I install neon for Python on Windows? My Python version is 3.8.5. I am learning deep learning from Intel, and the course uses the neon framework, so please help me as soon as possible so I can learn. Please provide a solution for installing it on Windows.

opened by keval2232 1
docs: fix simple typo, sclae -> scale

There is a small typo in examples/ssd/mboxloss.py.

Should read scale rather than sclae.

Semi-automated pull request generated by https://github.com/timgates42/meticulous/blob/master/docs/NOTE.md

opened by timgates42 0
IndexError: index 200 is out of bounds for axis 0 with size 2

I have been running the following code:

    df = pd.DataFrame(adata)
    df.head()

    xq = adata['batch size'].values.reshape(-1,1)
    yq = adata['Price'].values.reshape(-1,1)

    mod = smf.quantreg('yq ~ xq', df)
    res = mod.fit(q=.5)
    print(res.summary())

    quantiles = [.05, .25, .50, .75, .95]
    def fit_model(q):
        res = mod.fit(q=q)
        return [q, res.params['Intercept'], res.params[xq]] + res.conf_int().ix[xq].tolist()
    models = [fit_model(xq) for xq in quantiles]

However, I am getting the error IndexError: index 200 is out of bounds for axis 0 with size 2

opened by krpwn 0
IndexError: index 5000 is out of bounds for axis 0 with size 5000
Hi, I am working on my capstone project with my own dataset. When I tried to clean the dataset by removing the outliers, I got this error. The code is below.

    # Removing Outliers
    # Tukey Method

    # import required libraries
    from collections import Counter

    # Outlier detection
    def detect_outliers(df, n, features):
        outlier_indices = []
        # iterate over features(columns)
        for col in features:
            # 1st quartile (25%)
            Q1 = np.percentile(df[col], 25)
            # 3rd quartile (75%)
            Q3 = np.percentile(df[col], 75)
            # Interquartile range (IQR)
            IQR = Q3 - Q1
            # outlier step
            outlier_step = 1.5 * IQR
            # Determine a list of indices of outliers for feature col
            outlier_list_col = df[(df[col] < Q1 - outlier_step) | (df[col] > Q3 + outlier_step)].index
            # append the found outlier indices for col to the list of outlier indices
            outlier_indices.extend(outlier_list_col)
        # select observations containing more than 2 outliers
        outlier_indices = Counter(outlier_indices)
        multiple_outliers = list(k for k, v in outlier_indices.items() if v > n)
        return multiple_outliers

    # List of Outliers
    Outliers_to_drop = detect_outliers(data1.drop('Class', axis=1), 0, list(data1.drop('Class', axis=1)))
    data1.drop('Class', axis=1).loc[Outliers_to_drop]

    # Create New Dataset without Outliers
    good_data = data1.drop(data1.index[Outliers_to_drop]).reset_index(drop = True)
    good_data.info()

    IndexError                                Traceback (most recent call last)
    in
          1 #Create New Dataset without Outliers
    ----> 2 good_data = data1.drop(data1.index[Outliers_to_drop]).reset_index(drop = True)
          3 good_data.info()

    ~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in getitem(self, key)
       4289
       4290         key = com.values_from_object(key)
    -> 4291         result = getitem(key)
       4292         if not is_scalar(result):
       4293             return promote(result)

    IndexError: index 5000 is out of bounds for axis 0 with size 5000

Can anyone help me fix this and code it properly?
opened by venkidevictor 0
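
One likely cause: `detect_outliers` returns index labels (taken from `df[...].index`), but `data1.index[Outliers_to_drop]` treats them as positions. If `data1`'s index is not exactly 0..len-1 (for example after rows were dropped earlier), a label such as 5000 is out of bounds for a 5000-row frame. A minimal sketch that drops by label instead, assuming pandas and NumPy; `detect_outliers` mirrors the post and `drop_outliers` is a hypothetical helper.

```python
from collections import Counter

import numpy as np
import pandas as pd


def detect_outliers(df, n, features):
    """Return index labels of rows flagged as Tukey outliers in more than n features."""
    outlier_indices = []
    for col in features:
        q1 = np.percentile(df[col], 25)
        q3 = np.percentile(df[col], 75)
        step = 1.5 * (q3 - q1)
        mask = (df[col] < q1 - step) | (df[col] > q3 + step)
        outlier_indices.extend(df.index[mask.to_numpy()])  # labels, not positions
    counts = Counter(outlier_indices)
    return [k for k, v in counts.items() if v > n]


def drop_outliers(data, labels):
    # drop() matches index labels, so this works even when the index is not 0..len-1.
    return data.drop(index=labels).reset_index(drop=True)
```

With this, the last step becomes `good_data = drop_outliers(data1, Outliers_to_drop)`.
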
pip install failed, posix-ipc using sys/time.h failed

pip install nervananeon failed.

posix_ipc_module.c(37): fatal error C1083: Cannot open include file: "sys/time.h": No such file or directory

My CPU is x86-64.

What can I do?

opened by silkyrose 2

Releases(v2.6.0)

v2.6.0(Jan 5, 2018)
Further optimized MKL backend performance for SSD inference

Updated MKLML to version 20171227

Enabled neon install with MKLML on Mac OSX

v2.5.0(Dec 21, 2017)
Optimized SSD MKL backend performance (~3X boost version over version)

Bumped aeon version to v1.3.0

Fixed inference performance issue of MKL batchnorm

Fixed batch prediction issue for gpu backend

Enabled subset_pct for MNIST_DCGAN example

Updated "make clean" to clean up mkl artifacts

Added dockerfile for IA mkl

v2.4.0(Nov 27, 2017)
Enabled pip install through pypi

Updated MKLML to version 20171007 with performance improvement of ~3X for mnist datalayer/nondatalayer and ~1.6X for DCGAN/WGAN datalayer

Updated resnet model to optimize performance with MKLML 20171007

Updated Alexnet weight file and fixed bug for deep dream

Fixed faster-rcnn inference model loading issue

Added data_loading time measurement and enabled GAN networks benchmarking

Updated to Aeon version 1.2.0

Enabled neon build with mklEngine on Windows systems

v2.3.0(Oct 27, 2017)
Optimized DeepSpeech2 MKL backend performance (~7X improvement over the CPU backend)

Fused convolution and bias layer which significantly boosted AlexNet and VGG performance on Intel architectures with MKL backend

Made SSD and Faster-RCNN use VGG weight files in new format

Fixed use of reset_cells hyperparameter

Fixed MKL backend bug for GAN and Faster-RCNN models

v2.2.0(Sep 27, 2017)
Update MKLML version 20170908 that fixes a bug related to data conversions

Add SSD example for bounding box object detection that works for both GPU and MKL backend

Add DeepSpeech2 MKL backend optimization that features ~3X improvement

Update aeon to 1.0.0 including new version of manifest (doc/source/loading_data.rst#aeon-dataloader)

Add CHWD Support for Batch Normalization in mkl backend

Modify ResNet-50 model's last layer to match the original ResNet-50 model paper

Enable Seq2Seq testing and benchmarking

v2.1.0(Aug 2, 2017)
Set MKL backend (-b mkl) as the default CPU backend on Linux (use -b cpu to specify original CPU backend)

Update MKLML version 20170720 (AVX512 code paths enabled by default and conversion optimizations)

Simplify ResNet example

Makefiles now check for virtualenv and pkg-config (NervanaSystems/neon#383)

Fix Deep Speech2 model on MKL backend

Fix MKL installation for "make sysinstall"

v2.0.0(Jun 28, 2017)
Added support for MKL backend (-b mkl) on Linux, which boosts neon CPU performance significantly

Added WGAN model examples for LSUN and MNIST data

Enabled WGAN and DCGAN model examples for Python3

Added fix (using file locking) to prevent race conditions running multiple jobs on the same machine with multiple GPUs

Added functionality to display some information about hardware, OS and model used

Updated appdirs to 1.4.3 to be compatible on Centos 7.3 for appliance

v1.9.0(May 4, 2017)
Add support for 3D deconvolution

Generative Adversarial Networks (GAN) implementation, and MNIST DCGAN example, following Goodfellow 2014 (http://arXiv.org/abs/1406.2661)

Implement Wasserstein GAN cost function and make associated API changes for GAN models

Add a new benchmarking script with per-layer timings

Add weight clipping for GDM, RMSProp, Adagrad, Adadelta and Adam optimizers

Make multicost an explicit choice in mnist_branch.py example

Enable NMS kernels to work with normalized boxes and offset

Fix missing links in api.rst [#366]

Fix docstring for --datatype option to neon [#367]

Fix perl shebang in maxas.py and allow for build with numpy 1.12 [#356]

Replace os.path.join for Windows interoperability [#351]

Update aeon to 0.2.7 to fix a seg fault on termination

v1.8.2(Feb 24, 2017)
Make the whale calls example stable and shuffle dataset before splitting into subsets

Reduce default depth in cifar_msra example to 2

Fix the formatting of the conv layer description

Fix documentation error in the video-c3d example

Support greyscale videos

v1.8.1(Jan 18, 2017)
Bug fix: Add dilation to object dict and assign defaults to dil_w = dil_h = 1 [#335, #336]

Bug fix: Prevent GPU backend from ignoring non-zero slope in Rectlinclip and change default slope to 0

Bug fix: Nesterov momentum was updating velocities incorrectly

v1.8.0(Dec 28, 2016)
Skip Thought Vectors (http://arxiv.org/abs/1506.06726) example

Dilated convolution support

Nesterov Accelerated Gradient option to SGD optimizer

MultiMetric class to allow wrapping Metric classes

Support for serializing and deserializing encoder-decoder models

Allow specifying the number of time steps to evaluate during beam search

A new community-contributed Docker image

Improved error messages when a tensor is created with an invalid shape or reshaped to an incompatible size

Fix bugs in MultiCost support

Documentation fixes [#331]

v1.7.0(Nov 21, 2016)
Update Data Loader to aeon https://github.com/NervanaSystems/aeon for flexible, multi-threaded data loading and transformations

Add Neural Machine Translation model

Remove Fast RCNN model (use Faster RCNN model instead)

Remove music_genres example

Fix super blocking for small N with 1D conv

Fix update-direct conv kernel for small N

Add gradient clipping to Adam optimizer

Documentation updates and bug fixes

v1.6.0(Sep 21, 2016)
Faster RCNN model

Sequence to Sequence container and char_rae recurrent autoencoder model

Reshape Layer that reshapes the input [#221]

Pip requirements in requirements.txt updated to latest versions [#289]

Remove deprecated data loaders and update docs

Use NEON_DATA_CACHE_DIR envvar as archive dir to store DataLoader ingested data

Eliminate type conversion for FP16 for CUDA compute capability >= 5.2

Use GEMV kernels for batch size 1

Alter delta buffers for nesting of merge-broadcast layers

Support for ncloud real-time logging

Add fast_style Makefile target

Fix Python 3 builds on Ubuntu 16.04

Run setup.py for sysinstall to generate version.py [#282]

Fix broken link in mnist docs

Fix conv/deconv tests for CPU execution and fix i32 data type

Fix for average pooling with batch size 1

Change default scale_min to allow random cropping if omitted

Fix yaml loading

Fix bug with image resize during ingest

Update references to the ModelZoo and neon examples to their new locations

v1.5.4(Jul 15, 2016)
Python2/Python3 compatibility [#191]

Support for Pascal GPUs

Persistent RNN kernels [#262]

Implement Binarized Neural Networks from http://arxiv.org/pdf/1602.02830v3.pdf (added in v1.5.4)

Dataloader enhancements (audio loader with examples)

HDF5 file data iterator

Convolution kernel improvements

API documentation improvements [#234, #244, #263]

Cache directory cleanup

Reorganization of all unit tests

Bug fixes [#182, #183, #231, #241, #252, #253, #257, #259, #267, #268]

v1.5.3(Jul 7, 2016)
Python2/Python3 compatibility [#191]

Support for Pascal GPUs

Persistent RNN kernels [#262]

Dataloader enhancements (audio loader with examples)

HDF5 file data iterator

Convolution kernel improvements

API documentation improvements [#234, #244, #263]

Cache directory cleanup

Reorganization of all unit tests

Bug fixes [#182, #183, #231, #241, #252, #253, #257, #259, #267]

v1.5.2(Jul 7, 2016)
Python2/Python3 compatibility [#191]

Support for Pascal GPUs

Persistent RNN kernels [#262]

Dataloader enhancements (audio loader with examples)

HDF5 file data iterator

Convolution kernel improvements

API documentation improvements [#234, #244, #263]

Cache directory cleanup

Reorganization of all unit tests

Bug fixes [#182, #183, #231, #241, #252, #253, #257, #259]

v1.5.1(Jun 30, 2016)
Python2/Python3 compatibility [#191]

Support for Pascal GPUs

Persistent RNN kernels [#262]

Dataloader enhancements (audio loader with examples)

HDF5 file data iterator

Convolution kernel improvements

API documentation improvements [#234, #244, #263]

Cache directory cleanup

Reorganization of all unit tests

Bug fixes [#182, #183, #231, #241, #252, #253, #257, #259]

v1.4.0(Apr 29, 2016)
VGG16-based Fast R-CNN model using winograd kernels

new, backward compatible, generic data loader

C3D video loader model trained on UCF101 dataset

Deep Dream example

make conv layer printout more informative [#222]

fix some examples to use new arg override capability

improve performance for relu for small N

better support for arbitrary batch norm layer placement

documentation updates [#210, #213, #236]

v1.3.0(Mar 4, 2016)
winograd kernels and associated autotuning routines

benchmarking scripts

deprecation of deterministic argument for backend constructor

improve batch norm stability with fp16 backend

allow strided support for dimshuffle kernel

speed up zero momentum gradient descent
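The winograd kernels introduced in v1.3.0 (and used by the v1.4.0 Fast R-CNN model) rely on minimal filtering: computing several convolution outputs with fewer multiplications than the direct method. As an illustration only, here is the 1D building block F(2,3), which produces two outputs of a 3-tap filter with 4 multiplications instead of 6. neon's GPU kernels work on 2D tiles, and the helper names below are hypothetical.

```python
def winograd_f2_3(d, g):
    """Two outputs of a 3-tap correlation over a 4-element input tile, using 4 multiplies."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter transform: depends only on g, so it can be precomputed once per filter.
    gp = (g0 + g1 + g2) / 2
    gm = (g0 - g1 + g2) / 2
    # Four element-wise products (the direct method needs six).
    m1 = (d0 - d2) * g0
    m2 = (d1 + d2) * gp
    m3 = (d2 - d1) * gm
    m4 = (d1 - d3) * g2
    # Output transform.
    return [m1 + m2 + m3, m2 - m3 - m4]

def direct_f2_3(d, g):
    """Reference: y[i] = sum_k d[i + k] * g[k] for i in {0, 1}."""
    return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]

# Usage: both lists are [4.5, 6.0].
d = [1.0, 2.0, 3.0, 4.0]
g = [0.5, -1.0, 2.0]
print(winograd_f2_3(d, g), direct_f2_3(d, g))
```

Because the filter transform is shared by every input tile, its cost is amortized, and the savings come from the reduced number of per-tile multiplications.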

v1.2.2(Feb 25, 2016)
benchmarking enhancements

fast dimshuffle, transpose, other kernel speedups and refactoring

batch norm states fix, deterministic updates

example fixes for fast rcnn and conv_autoencoder

image decoding rescaling method fix

deserialization fixes for RNNs, refactoring

caffe compatibility fixes

documentation updates

v1.2.1(Feb 5, 2016)
New MergeSum, Colornoise layers

support for aspect_ratio scaling augmentation

updated IMDB sentiment analysis example

generic CSV batchwriter

various build and deserialization bugfixes, doc updates

v1.2.0(Jan 31, 2016)
Kepler GPU kernel support [#80]

new dataloader format, updated docs [#115, #170]

new serialization format

FastRCNN implementation, ROI pooling support [#135]

deep residual nets implementation and example

expanded model zoo

Ticker dataset and copy, repeat copy tasks

autodiff transpose support [#173]

numerous bug fixes and documentation updates.
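ROI pooling, added in v1.2.0 for the Fast R-CNN implementation, maps each region of interest to a fixed-size grid by splitting it into bins and taking the maximum inside each bin. Below is a minimal NumPy sketch, assuming a single-channel 2D feature map and integer ROI corners already expressed in feature-map coordinates and lying inside the map. The function name is hypothetical, and this is not neon's kernel.

```python
import math
import numpy as np

def roi_max_pool(feature_map, roi, pooled_h, pooled_w):
    """Max-pool one ROI (x1, y1, x2, y2, inclusive) of a 2D map into a pooled_h x pooled_w grid."""
    height, width = feature_map.shape
    x1, y1, x2, y2 = roi
    bin_h = (y2 - y1 + 1) / pooled_h
    bin_w = (x2 - x1 + 1) / pooled_w
    out = np.empty((pooled_h, pooled_w), dtype=feature_map.dtype)
    for i in range(pooled_h):
        # floor/ceil bin edges guarantee each bin covers at least one row.
        h0 = y1 + math.floor(i * bin_h)
        h1 = min(y1 + math.ceil((i + 1) * bin_h), height)
        for j in range(pooled_w):
            w0 = x1 + math.floor(j * bin_w)
            w1 = min(x1 + math.ceil((j + 1) * bin_w), width)
            out[i, j] = feature_map[h0:h1, w0:w1].max()
    return out

# Usage: pool the whole 4x4 map into a 2x2 grid.
fm = np.arange(16).reshape(4, 4)
print(roi_max_pool(fm, (0, 0, 3, 3), 2, 2))
```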

v1.1.5(Jan 14, 2016)
CUDA kernels for lookuptable layer (up to 4x speedup)

support for deterministic Conv layer updates

LRN layer support

custom dataset walkthrough utilizing bAbI data

reduced number of threads in deep reduction EW kernels [#171]

additional (de)serialization routines [#106]

CPU tensor slicing fix

corrections for PrecisionRecall, MultiLabelStats [#148]

explicitly specify python2.7 for virtualenv [#155]

default to SM50 when no working GPU found [#186]

Add alpha to ELU activation [#164]

deconv callback fix [#162]

various documentation updates [#151, #152]
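The alpha added to the ELU activation in v1.1.5 sets the value that the function saturates to for large negative inputs: ELU(x) = x for x > 0 and alpha * (exp(x) - 1) otherwise. Below is a minimal NumPy sketch of the function and its derivative. It is written independently of neon's activation classes, and the names are hypothetical.

```python
import numpy as np

def elu(x, alpha=1.0):
    """ELU: x for x > 0, alpha * (exp(x) - 1) otherwise."""
    # Clamp to <= 0 before expm1 so the unused positive branch cannot overflow.
    return np.where(x > 0, x, alpha * np.expm1(np.minimum(x, 0)))

def elu_grad(x, alpha=1.0):
    """Derivative: 1 for x > 0, alpha * exp(x) otherwise."""
    return np.where(x > 0, 1.0, alpha * np.exp(np.minimum(x, 0)))

# Usage: with alpha=0.5, very negative inputs approach -0.5.
x = np.array([2.0, 0.0, -1.0, -50.0])
print(elu(x, alpha=0.5), elu_grad(x, alpha=0.5))
```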

v1.1.4(Dec 15, 2015)
Add support for bidirectional RNNs and LSTMs

added ELU, leaky ReLU activations

significantly faster GPU kernel builds (using PTX instead of CUDA-C)

data shuffling enhancements, removal of old data loader code.

caffe conv, pool, dropout layer matching and compatibility flags

add scheduling support for RMSProp

callback enhancements, additional unit tests

documentation auditing, added links to introductory video tutorials
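Scheduling support for RMSProp (v1.1.4) means the optimizer's learning rate can change over the course of training instead of staying fixed. The sketch below pairs the standard RMSProp update with a simple step schedule that multiplies the rate by a constant factor at chosen epochs. It assumes NumPy arrays. The function names, the milestone-list form of the schedule, and the default hyperparameters are illustrative and are not taken from neon's API.

```python
import numpy as np

def step_schedule(base_lr, epoch, step_epochs, change):
    """Multiply base_lr by `change` once for every milestone in step_epochs already reached."""
    n_steps = sum(1 for e in step_epochs if epoch >= e)
    return base_lr * change ** n_steps

def rmsprop_update(param, grad, state, lr, decay_rate=0.95, epsilon=1e-6):
    """One RMSProp step, in place; `state` holds the running mean of squared gradients."""
    state[:] = decay_rate * state + (1.0 - decay_rate) * grad * grad
    param -= lr * grad / (np.sqrt(state) + epsilon)
    return param, state

# Usage: the rate drops by 10x at epochs 2 and 4.
param, state = np.array([1.0, -1.0]), np.zeros(2)
for epoch in range(6):
    lr = step_schedule(0.01, epoch, step_epochs=[2, 4], change=0.1)
    param, state = rmsprop_update(param, np.sign(param), state, lr)
```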

v1.1.3(Dec 1, 2015)
deconvolution and weight histogram visualization examples and documentation

CPU convolution and pooling layer speedups (~2x faster)

bAbI question and answer interactive demo, dataset support.

various ImageLoader enhancements.

interactive usage improvements (shortcut Callback import, multiple Callbacks init, doc updates, single item batch size support)

set default verbosity level to warning

CIFAR10 example normalization updates

CUDA detection enhancements [#132]

only parse batch_writer arguments when used as a script, allow undefined global_mean [#137, #140]

v1.1.2(Nov 18, 2015)
completely re-written C++ multithreaded dataloader

new weight initialization options for recurrent layers

Added deconvolution visualization support (guided backprop)

new bAbI question answering example network

Improved performance of cifar10_allcnn, word_lstm examples

new CUDA-C max and avg pooling kernels

Additional bugfixes and documentation updates

v1.1.1(Nov 6, 2015)
Callback initialization bug fix [#127]

IMDB LSTM example bug fix [#130]

Added cuda-convnet2 style binary dropout variant

Added benchmark function to model (separate fprop, bprop, update timings)

Replace h_buffer references with outputs for recurrent layers

Multi-cost output buffer bugfix for inference [#131]

New timeseries prediction and generation example

Change Callback initialization to re-support named arguments. Separate out these arguments in argparser. [#128]

v1.1.0(Oct 30, 2015)
Sentiment analysis support (LSTM lookupTable based), new IMDB example

Support for merge and branch layer stacks via LayerContainers

Sequential, Tree, MergeBroadcast, MergeMultiStream

Support for freezing layer stacks

Adagrad optimizer support

new GPU kernels for fast compounding batch norm, conv and pooling engine updates, new kernel build system and flags.

Modifications for Caffe support

conv, pooling, P/Q updates, dropout layer normalization more in line with the Caffe approach. NOTE: this breaks backwards compatibility with some strided conv/pool related models serialized using older versions of neon as the output sizes may now be different. See the FAQ for more info.

serialization enhancements to make caffe model import/export easier

use per-channel mean subtraction instead of single global. NOTE: this breaks backwards compatibility with ImgMaster saved datasets prior to this revision. To correct, please use the included update_dataset_cache.py script in the util directory.

Default training cost display during progress bar is now calculated on a rolling window basis rather than from the beginning of each epoch

Separate Layer configuration and initialization steps

YAML based alexnet example

Callback enhancements.

now pass args instead of having to spell out callbacks in each example

Changed validation callback to loss callback, validation_frequency now evaluation_frequency

Generic metric callback.

Various bug fixes

non-contiguous array get for GPUTensors

1D slicing returns 2D matrices

bin/neon serialization fixes for RNNs

3D conv fixes for fprop, bprop

batch norm inference fix

bias layer size fix

Documentation updates and improvements
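Adagrad, added in v1.1.0, scales each parameter's step by the inverse square root of the sum of that parameter's past squared gradients, so parameters that keep receiving large gradients take progressively smaller steps. Below is a minimal NumPy sketch of one update. The function name and default values are hypothetical, and this is not neon's Adagrad class.

```python
import numpy as np

def adagrad_update(param, grad, accum, lr=0.01, epsilon=1e-6):
    """One Adagrad step, in place; `accum` is the running sum of squared gradients."""
    accum += grad * grad
    param -= lr * grad / (np.sqrt(accum) + epsilon)
    return param, accum

# Usage: with a constant gradient of 1, the k-th step has size about lr / sqrt(k).
param, accum = np.zeros(1), np.zeros(1)
for _ in range(4):
    param, accum = adagrad_update(param, np.ones(1), accum, lr=0.1)
```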

v1.0.0(Oct 30, 2015)
Ensure root logging handler setup [#82]

C++ utility for CUDA compatibility checking [#83]

Add predict function to models [#86]

Fix bug in learning rate schedule impacting deserialization

Speed up batch norm computation

Average gradients in OpTree, fix tests

Use inference mode for fprop during validation

Add top-k misclassification metric

Simplify maxas install, make vis requirements optional, doc updates.
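The top-k misclassification metric added in v1.0.0 counts an example as correct when its true label is among the k highest-scoring classes. Below is a minimal NumPy sketch, assuming an examples-by-classes score matrix and integer labels. The function name is hypothetical, and this is not neon's metric class.

```python
import numpy as np

def top_k_misclassification(probs, labels, k=5):
    """Fraction of examples whose true label is not among their k highest scores.

    probs: (num_examples, num_classes) scores; labels: (num_examples,) integer class ids.
    """
    top_k = np.argsort(probs, axis=1)[:, -k:]          # indices of the k largest scores per row
    hits = np.any(top_k == labels[:, None], axis=1)    # True where the label is in the top k
    return 1.0 - hits.mean()

# Usage: every example misses at k=1, and two of three still miss at k=2.
probs = np.array([[0.1, 0.6, 0.3], [0.5, 0.2, 0.3], [0.2, 0.3, 0.5]])
labels = np.array([2, 1, 0])
print(top_k_misclassification(probs, labels, k=1), top_k_misclassification(probs, labels, k=2))
```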

v0.9.0(Jul 20, 2015)

This release implements support for multi-GPU processing using "one weird trick" parallelization (data parallel for local layers, model parallel for fully-connected layers) and cleans up previously existing MPI-based parallel code.

Multi-GPU is only supported on newer Maxwell-based cards using the NervanaGPU backend.

Older Kepler-based cards using the cudanet backend are no longer supported (some models and datasets will still work, but others may raise a DeprecationWarning). Users of these cards are encouraged to remain on the 0.8.2 release until we back-port NervanaGPU to support Kepler cards.

Owner

Nervana

Intel® Nervana™ - Artificial Intelligence Products Group

GitHub Repository http://neon.nervanasys.com/docs/latest
