Intel® Nervana™ reference deep learning framework committed to best performance on all hardware

Overview

DISCONTINUATION OF PROJECT. This project will no longer be maintained by Intel. Intel will not provide or guarantee development of or support for this project, including but not limited to, maintenance, bug fixes, new releases or updates. Patches to this project are no longer accepted by Intel. If you have an ongoing need to use this project, are interested in independently developing it, or would like to maintain patches for the community, please create your own fork of the project.

neon

neon is Intel's reference deep learning framework, committed to the best performance on all hardware and designed for ease of use and extensibility.

For fast iteration and model exploration, neon has the fastest performance among deep learning libraries (2x the speed of cuDNNv4; see benchmarks).

  • 2.5 s/macrobatch (3072 images) on AlexNet on a Titan X (full run on 1 GPU: ~26 hrs)
  • Training VGG with 16-bit floating point on 1 Titan X takes ~10 days (original paper: 4 GPUs for 2-3 weeks)
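The figures above can be sanity-checked with a little arithmetic. The sketch below derives AlexNet throughput, the implied number of passes over the training data, and GPU-days for the VGG comparison. It assumes an epoch is the ILSVRC-2012 training set of 1,281,167 images (the README does not say this) and treats "~26 hrs" and "2-3 weeks" as exact inputs, so the results are rough; the function names are illustrative only.

```python
# Back-of-the-envelope check of the AlexNet and VGG timing claims.
# Assumption (not stated in the README): one epoch = 1,281,167 ILSVRC-2012 images.
IMAGENET_TRAIN_IMAGES = 1_281_167


def alexnet_stats(sec_per_macrobatch=2.5, images_per_macrobatch=3072, run_hours=26):
    """Return (images per second, macrobatches in the run, approximate epochs)."""
    images_per_sec = images_per_macrobatch / sec_per_macrobatch
    macrobatches = run_hours * 3600 / sec_per_macrobatch
    epochs = macrobatches * images_per_macrobatch / IMAGENET_TRAIN_IMAGES
    return images_per_sec, macrobatches, epochs


def gpu_days(num_gpus, days):
    """Total GPU-days consumed by a training run."""
    return num_gpus * days


ips, mb, ep = alexnet_stats()
print(f"AlexNet: {ips:.1f} images/s, {mb:.0f} macrobatches, ~{ep:.0f} epochs")
# VGG: 1 Titan X for ~10 days vs. 4 GPUs for 2-3 weeks (14-21 days).
print(f"VGG: {gpu_days(1, 10)} GPU-days vs. {gpu_days(4, 14)}-{gpu_days(4, 21)} GPU-days")
```

With these inputs the sketch reports about 1,229 images/s and roughly 90 epochs for the 26-hour AlexNet run, and 10 GPU-days for VGG versus 56-84 GPU-days in the original setup. The comparison ignores differences between GPU models, so it is not a like-for-like speedup.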

We use neon internally at Intel Nervana to solve our customers' problems across many domains. We are hiring across several roles. Apply here!

See the new features in our latest release. We want to highlight that neon v2.0.0+ has been optimized for much better performance on CPUs by enabling the Intel Math Kernel Library (MKL). The DNN (Deep Neural Networks) component of MKL that is used by neon is provided free of charge and downloaded automatically as part of the neon installation.

Quick Install

On a macOS or Linux machine, enter the following to download and install neon (conda users, see the guide), then use it to train your first multi-layer perceptron. To force a Python 2 or Python 3 install, replace make below with either make python2 or make python3.

    git clone https://github.com/NervanaSystems/neon.git
    cd neon
    make
    . .venv/bin/activate

After neon v2.2.0, the master branch of neon is updated weekly with work in progress toward the next release. For a stable release, check out a release tag (e.g., "git checkout v2.2.0"), or simply check out the "latest" tag to get the most recent stable release ("git checkout latest").

As of version 2.4.0, pip install is re-enabled; neon can be installed using the package name nervananeon:

    pip install nervananeon

Note that aeon must be installed separately. The latest release, v2.6.0, uses aeon v1.3.0.

Warning

Between neon v2.1.0 and v2.2.0, the aeon manifest file format changed. When updating from neon < v2.2.0, manifests have to be recreated using the ingest scripts (in the examples folder) or updated using this script.
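The upgrade rule above is a version-boundary check: manifests need attention only when the previous install predates v2.2.0 and the new one is v2.2.0 or later. A minimal sketch, assuming plain "X.Y.Z" version strings (optionally prefixed with "v"); `parse_version` and `manifests_need_rebuild` are hypothetical helpers, not part of neon or aeon.

```python
# Hypothetical helper: decide whether aeon manifests must be recreated after
# upgrading neon, based on the manifest format change introduced in v2.2.0.
MANIFEST_FORMAT_CHANGE = (2, 2, 0)


def parse_version(text):
    """Turn 'v2.1.0' or '2.1.0' into a comparable tuple such as (2, 1, 0)."""
    return tuple(int(part) for part in text.lstrip("v").split("."))


def manifests_need_rebuild(old_version, new_version):
    """True if the upgrade crosses the v2.2.0 manifest format boundary."""
    old, new = parse_version(old_version), parse_version(new_version)
    return old < MANIFEST_FORMAT_CHANGE <= new


print(manifests_need_rebuild("v2.1.0", "v2.2.0"))  # True
print(manifests_need_rebuild("v2.2.0", "v2.6.0"))  # False
```

Comparing tuples rather than raw strings avoids ordering mistakes such as "2.10.0" sorting before "2.2.0".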

Use a script to run an example

    python examples/mnist_mlp.py 

Selecting a backend engine from the command line

The gpu backend is selected by default, so if a compatible GPU is found on the system, the above command is equivalent to:

    python examples/mnist_mlp.py -b gpu

When no GPU is available, the optimized CPU (MKL) backend has been selected by default since neon v2.1.0, which means the above command is then equivalent to:

    python examples/mnist_mlp.py -b mkl

If you are interested in comparing the default mkl backend with the non-optimized CPU backend, use the following command:

    python examples/mnist_mlp.py -b cpu
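The defaulting rule described above (gpu when a compatible GPU is found, otherwise mkl as of v2.1.0, with cpu available on request) can be modeled with a small argparse option. This is an illustrative sketch, not neon's actual argument parser; in particular, offering gpu as a choice only when a GPU is detected is an assumption, chosen because it shows how an error such as `invalid choice: 'gpu' (choose from 'cpu', 'mkl')` can arise on a build without GPU support.

```python
import argparse


def build_parser(gpu_available):
    """Sketch of a -b/--backend option whose default follows the documented rule."""
    # Assumption: 'gpu' is only offered when a compatible GPU is detected.
    choices = ["cpu", "mkl"] + (["gpu"] if gpu_available else [])
    # gpu is preferred when present; mkl is the CPU default since neon v2.1.0.
    default = "gpu" if gpu_available else "mkl"
    parser = argparse.ArgumentParser(description="backend selection sketch")
    parser.add_argument("-b", "--backend", choices=choices, default=default)
    return parser


print(build_parser(gpu_available=True).parse_args([]).backend)   # gpu
print(build_parser(gpu_available=False).parse_args([]).backend)  # mkl
print(build_parser(gpu_available=False).parse_args(["-b", "cpu"]).backend)  # cpu
```

Passing `-b gpu` to the parser built with `gpu_available=False` makes argparse print an "invalid choice" error and exit.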

Use a yaml file to run an example

Alternatively, a yaml file may be used to run an example.

    neon examples/mnist_mlp.yaml

To select a specific backend in a yaml file, add or modify a line containing backend: mkl to enable the mkl backend, or backend: cpu to enable the cpu backend. The gpu backend is selected by default if a GPU is available.

Recommended Settings for neon with MKL on Intel Architectures

The Intel Math Kernel Library takes advantage of the parallelization and vectorization capabilities of Intel Xeon and Xeon Phi systems. When hyperthreading is enabled on the system, we recommend the following KMP_AFFINITY setting to make sure parallel threads are mapped 1:1 to the available physical cores.

    export OMP_NUM_THREADS=<Number of Physical Cores>
    export KMP_AFFINITY=compact,1,0,granularity=fine  

or

    export OMP_NUM_THREADS=<Number of Physical Cores>
    export KMP_AFFINITY=verbose,granularity=fine,proclist=[0-<Number of Physical Cores - 1>],explicit

For more information about KMP_AFFINITY, please check here. We encourage users to experiment and establish their own best-performing settings.
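On Linux, the physical core count needed for OMP_NUM_THREADS can be derived from /proc/cpuinfo by counting distinct (physical id, core id) pairs, since hyperthreaded logical CPUs on the same core share both ids. A minimal sketch, assuming the usual x86 Linux /proc/cpuinfo layout; `physical_core_count` and `mkl_thread_settings` are hypothetical helper names, and the printed settings are the first recommendation above.

```python
import os


def physical_core_count(cpuinfo_text):
    """Count distinct (physical id, core id) pairs in /proc/cpuinfo-style text."""
    cores = set()
    physical_id = core_id = None
    # Append a blank line so the final processor record is flushed too.
    for line in cpuinfo_text.splitlines() + [""]:
        if not line.strip():
            # A blank line ends one logical-processor record.
            if core_id is not None:
                cores.add((physical_id, core_id))
            physical_id = core_id = None
            continue
        key, _, value = line.partition(":")
        key = key.strip()
        if key == "physical id":
            physical_id = value.strip()
        elif key == "core id":
            core_id = value.strip()
    return len(cores)


def mkl_thread_settings(n_physical_cores):
    """Return the environment variables recommended above for MKL runs."""
    return {
        "OMP_NUM_THREADS": str(n_physical_cores),
        "KMP_AFFINITY": "compact,1,0,granularity=fine",
    }


if __name__ == "__main__" and os.path.exists("/proc/cpuinfo"):
    with open("/proc/cpuinfo") as f:
        # Fall back to the logical CPU count if core ids are not reported.
        n = physical_core_count(f.read()) or os.cpu_count() or 1
    for name, value in mkl_thread_settings(n).items():
        print(f"export {name}={value}")
```

Run as a script on Linux, it prints export lines in the same form as the commands above.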

Documentation

The complete documentation for neon is available here. Some useful starting points are:

Support

For any bugs or feature requests please:

  1. Search the open and closed issues list to see if we're already working on what you have uncovered.
  2. Check that your issue/request hasn't already been addressed in our Frequently Asked Questions (FAQ) or neon-users Google group.
  3. File a new issue or submit a new pull request if you have some code you'd like to contribute.

For other questions and discussions, please post a message to the neon-users Google group.

License

We are releasing neon under an open source Apache 2.0 License. We welcome you to contact us with your use cases.

Comments
  • MKL backend performance regression with some topologies

    Hello! I use neon to train a model on three backends: CPU, MKL and GPU, with identical settings for each. The CPU and GPU backends give very similar costs, while the cost from the MKL backend is usually higher than both, and sometimes even nan. Does anybody have an idea why that happens?

    The CPU is an Intel i7, the GPU is an Nvidia GTX 1050, and the code runs on Ubuntu 16.04. Here is the printed output of the code:

    Use cpu as backend.
    
    DISPLAY:neon:-------------------------------------------------------------------------------------
    DISPLAY:neon:|    Func     |    Mean     |   Median    |     Min     |     Max     |    Units    |
    DISPLAY:neon:-------------------------------------------------------------------------------------
    DISPLAY:neon:| fprop       |  456.74     |  452.61     |  439.07     |  501.7      |    msec     |
    DISPLAY:neon:| bprop       |  819.21     |  796.45     |  772.53     |  979.8      |    msec     |
    DISPLAY:neon:| iteration   |  1276       |  1250       |  1213.5     |  1457       |    msec     |
    DISPLAY:neon:-------------------------------------------------------------------------------------
    
    Epoch 0   [Train |████████████████████|  246/246  batches, 3.51 cost, 303.30s] [CrossEntropyMulti Loss 0.00, 0.00s]
    Epoch 1   [Train |████████████████████|  245/245  batches, 3.49 cost, 301.14s] [CrossEntropyMulti Loss 0.00, 0.00s]
    Epoch 2   [Train |████████████████████|  245/245  batches, 3.47 cost, 301.43s] [CrossEntropyMulti Loss 0.00, 0.00s]
    Epoch 3   [Train |████████████████████|  245/245  batches, 3.46 cost, 302.56s] [CrossEntropyMulti Loss 0.00, 0.00s]
    Epoch 4   [Train |████████████████████|  245/245  batches, 3.44 cost, 302.91s] [CrossEntropyMulti Loss 0.00, 0.00s]
    Neon training finishes in 1646.99 seconds.
    Misclassification error = 91.2%. Finished in 26.86 seconds.
    Top 3 Misclassification error = 78.1%. Finished in 27.36 seconds.
    Top 5 Misclassification error = 65.7%. Finished in 27.36 seconds.
    Misclassification error = 91.7% on test set. Finished in 43.54 seconds.
    Top 3 Misclassification error = 79.8% on test set. Finished in 43.60 seconds.
    Top 5 Misclassification error = 67.3% on test set. Finished in 43.76 seconds.
    
    
    Use mkl as backend.
    
    DISPLAY:neon:-------------------------------------------------------------------------------------
    DISPLAY:neon:|    Func     |    Mean     |   Median    |     Min     |     Max     |    Units    |
    DISPLAY:neon:-------------------------------------------------------------------------------------
    DISPLAY:neon:| fprop       |  119.82     |  120.03     |  111.14     |  130.82     |    msec     |
    DISPLAY:neon:| bprop       |  157.51     |  156.32     |  151.81     |  165.86     |    msec     |
    DISPLAY:neon:| iteration   |  277.33     |  280.49     |  264.03     |  285.16     |    msec     |
    DISPLAY:neon:-------------------------------------------------------------------------------------
    
    Epoch 0   [Train |████████████████████|  246/246  batches, 48.12 cost, 70.76s] [CrossEntropyMulti Loss 0.00, 0.00s]
    Epoch 1   [Train |████████████████████|  245/245  batches, 47.54 cost, 73.94s] [CrossEntropyMulti Loss 0.00, 0.00s]
    Epoch 2   [Train |████████████████████|  245/245  batches, 48.52 cost, 77.99s] [CrossEntropyMulti Loss 0.00, 0.00s]
    Epoch 3   [Train |████████████████████|  245/245  batches, 48.09 cost, 74.04s] [CrossEntropyMulti Loss 0.00, 0.00s]
    Epoch 4   [Train |████████████████████|  245/245  batches, 48.20 cost, 79.86s] [CrossEntropyMulti Loss 0.00, 0.00s]
    Neon training finishes in 422.74 seconds.
    Misclassification error = 94.6%. Finished in 9.29 seconds.
    Top 3 Misclassification error = 90.1%. Finished in 9.56 seconds.
    Top 5 Misclassification error = 85.6%. Finished in 9.78 seconds.
    Misclassification error = 94.5% on test set. Finished in 15.48 seconds.
    Top 3 Misclassification error = 90.0% on test set. Finished in 15.47 seconds.
    Top 5 Misclassification error = 85.5% on test set. Finished in 14.99 seconds.
    
    
    Use gpu as backend.
    
    DISPLAY:neon:-------------------------------------------------------------------------------------
    DISPLAY:neon:|    Func     |    Mean     |   Median    |     Min     |     Max     |    Units    |
    DISPLAY:neon:-------------------------------------------------------------------------------------
    DISPLAY:neon:| fprop       |  6.1057     |  6.0366     |  5.8992     |  6.3699     |    msec     |
    DISPLAY:neon:| bprop       |  10.76      |  10.753     |  9.9809     |  11.841     |    msec     |
    DISPLAY:neon:| iteration   |  16.865     |  16.783     |  15.88      |  18.185     |    msec     |
    DISPLAY:neon:-------------------------------------------------------------------------------------
    
    Epoch 0   [Train |████████████████████|  246/246  batches, 3.51 cost, 3.98s] [CrossEntropyMulti Loss 0.00, 0.00s]
    Epoch 1   [Train |████████████████████|  245/245  batches, 3.48 cost, 3.97s] [CrossEntropyMulti Loss 0.00, 0.00s]
    Epoch 2   [Train |████████████████████|  245/245  batches, 3.47 cost, 3.98s] [CrossEntropyMulti Loss 0.00, 0.00s]
    Epoch 3   [Train |████████████████████|  245/245  batches, 3.46 cost, 3.98s] [CrossEntropyMulti Loss 0.00, 0.00s]
    Epoch 4   [Train |████████████████████|  245/245  batches, 3.44 cost, 3.98s] [CrossEntropyMulti Loss 0.00, 0.00s]
    Neon training finishes in 21.84 seconds.
    Misclassification error = 91.2%. Finished in 0.38 seconds.
    Top 3 Misclassification error = 78.0%. Finished in 0.38 seconds.
    Top 5 Misclassification error = 65.6%. Finished in 0.38 seconds.
    Misclassification error = 91.6% on test set. Finished in 0.60 seconds.
    Top 3 Misclassification error = 79.8% on test set. Finished in 0.60 seconds.
    Top 5 Misclassification error = 67.4% on test set. Finished in 0.60 seconds.
    
    opened by moderato 24
  • Prediction drops to 0 after certain number of epochs

    I'm using neon for my deep Q-learning code (https://github.com/tambetm/simple_dqn). Recently I noticed that the prediction of my network drops to 0 after a certain number of epochs. This can be seen in the Q-value graph: [figure: breakout_neon_latest_meanq]

    Normally it would look like this: [figure: breakout_lives_meanq]. This plot was produced using neon commit hash 7a56fa9645a51e97c05f2e5afbbd1df7057ae832 from October 30th. My code is exactly the same.

    The most plausible explanation would be that the weights are truncated to 0 at some point. Because my code hasn't changed, I suspect something in the neon code related to repeatedly saving and loading weights. In my code I need to clone a model, and the simplest (and most compatible) way to do that is to save and load the model. I do this ~45 times before the network's prediction drops (it doesn't always drop at the same moment).

    Any ideas what change could have resulted in such a behavior and how to debug it?

    bug 
    opened by tambetm 24
  • Running mnist-small.yaml example after setup - getting error

    Hi, I just installed neon on Ubuntu 14 with Python 3.4 and ran the following command:

    ~/neon$ neon examples/mlp/mnist-small.yaml

    I get this error message:

    Traceback (most recent call last):
      File "/home/nir/anaconda3/bin/neon", line 240, in <module>
        experiment, result, status = main()
      File "/home/nir/anaconda3/bin/neon", line 126, in main
        experiment = deserialize(args.yaml_file)
      File "/home/nir/anaconda3/lib/python3.4/site-packages/neon/util/persist.py", line 183, in deserialize
        if not isinstance(load_path, file):
    NameError: name 'file' is not defined

    I checked the directory, and this file exists. I'd appreciate your assistance. Thanks, N

    opened by nirre1401 18
  • Updated Docker images

    I've updated my Docker builds for version 1.0: one for the cpu backend, and a new one for the gpu backend. The GPU images referenced in the 0.9 docs are still available, but with a deprecation note.

    I've tested the cpu version with neon examples/mnist_mlp.yaml and python examples/mnist_mlp.py, and it appears fine. However, the gpu version builds the cpu version because of https://github.com/NervanaSystems/neon/issues/83. When building the code to check for the GPU capabilities, please keep https://github.com/NervanaSystems/neon/issues/19 in mind.

    opened by Kaixhin 17
  • Enable gpu error

    I installed CUDA and set up the environment:

    nvidia-smi 
    Wed Dec  6 13:53:24 2017       
    +-----------------------------------------------------------------------------+
    | NVIDIA-SMI 384.81                 Driver Version: 384.81                    |
    |-------------------------------+----------------------+----------------------+
    | GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
    | Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
    |===============================+======================+======================|
    |   0  Tesla P100-PCIE...  Off  | 00000000:2F:00.0 Off |                    0 |
    | N/A   33C    P0    31W / 250W |  15553MiB / 16276MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    |   1  Tesla P100-PCIE...  Off  | 00000000:86:00.0 Off |                    0 |
    | N/A   36C    P0    31W / 250W |  15479MiB / 16276MiB |      0%      Default |
    +-------------------------------+----------------------+----------------------+
    
    pip install nervananeon
    
    env | grep PATH
    LD_LIBRARY_PATH=/usr/local/cuda/lib64:/usr/local/cuda/lib:/usr/lib/python2.7/site-packages/mklml_lnx_2018.0.1.20171007/lib:/usr/local/nvidia/lib:/usr/local/nvidia/lib64:/.singularity.d/libs
    PATH=/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
    
    python  mnist.py -b gpu
    python : error: argument -b/--backend: invalid choice: 'gpu' (choose from 'cpu', 'mkl')
    

    I don't understand why the CPU and GPU installs use the same package. With TensorFlow, a separate GPU package is installed:

    pip install tensorflow-gpu
    
    opened by yangyang-zhang 12
  • 'magic' is not very descriptive :-D

    You have this cool function for dividing integers by first multiplying by another number and then bitshifting, so you're not limited to dividing by powers of 2, as described in https://gmplib.org/~tege/divcnst-pldi94.pdf

    At the moment this function is called 'magic', but I'm not sure that's very descriptive. I've renamed it to get_div_mul_shift in my own branch: https://github.com/hughperkins/winogradCl/blob/api/winogradcl/util/math_helper.py#L33

    def get_div_mul_shift_32(nmax, d)
    
    opened by hughperkins 11
  • Race condition in softmax-like expressions?

    There appears to be a race condition in certain expressions, such as the denominator of softmax with axis=1 (unlike in neon.transforms.activation):

    https://gist.github.com/oleg-trott/30b802902fd8c63ce002

    I tried this on several GTX 980 cards and see it on all of them. The errors are rare, about 0.1-0.2%. However, when they happen, they are usually dramatic.

    I also see these errors on GTX 750, but they are about 100 times less frequent.

    opened by oleg-trott 11
  • Create custom layer

    How do I start implementing a new layer? I want to implement a few more exotic pooling functions, so basically I want to change the function that takes the max/average, and maybe even the gradient for that layer. A step-by-step recommendation on how to do such things would be great.

    opened by Sowasa 11
  • make neon error

    $make HAS_GPU=true
    Building MKL Engine...
    which: no icc in (/usr/local/nvidia/bin:/usr/local/cuda/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin)
    using GNU compiler...
    basename: missing operand
    Try 'basename --help' for more information.
    mkl root: /opt/neon/
    make[1]: Entering directory `/opt/neon/neon/backends/mklEngine'
    make[1]: Leaving directory `/opt/neon/neon/backends/mklEngine'
    make[1]: Entering directory `/opt/neon/neon/backends/mklEngine'
    Building mklEngine.so...
    gcc src/conv.c src/pooling.c src/relu.c src/batchNorm.c src/concat.c src/softmax.c src/MKLDNN.h -shared -o mklEngine.so -std=c99 -O3 -I/opt/neon//include -L/opt/neon//lib -Wl,-rpath=/opt/neon//lib -fopenmp -lmklml_gnu -fPIC -march=native -g -liomp5
    In file included from src/conv.c:15:0:
    src/MKLDNN.h:23:21: fatal error: mkl_dnn.h: No such file or directory
     #include <mkl_dnn.h>
                         ^
    compilation terminated.
    In file included from src/pooling.c:15:0:
    src/MKLDNN.h:23:21: fatal error: mkl_dnn.h: No such file or directory
     #include <mkl_dnn.h>
                         ^
    compilation terminated.
    In file included from src/relu.c:15:0:
    src/MKLDNN.h:23:21: fatal error: mkl_dnn.h: No such file or directory
     #include <mkl_dnn.h>
                         ^
    
    $ bash prepare_mkl.sh
    
    Checking MKLML dependencies...
    Downloading required MKLML version mklml_lnx_2018.0.1.20171007 ...
    MKLML dependencies installed: MKLROOT=/opt/neon/
    basename: missing operand
    Try 'basename --help' for more information.
    /opt/neon/ 1
    

    I need help, thanks.

    opened by yangyang-zhang 10
  • Assertion Error loading weights to hidden layers

    Hi guys,

    An assertion error is thrown when I run this line of code: layer.load_weights(params)

    I looked at the source code, and it looks like I might be missing an argument to the function called 'self'. I'm not sure what is meant by self, and if I missed the documentation for it, I apologize. I know I don't need the load_states argument since it defaults to True.

    Full code here, similar to the VGG example:

    param_layers = [l for l in model.layers.layers]
    param_dict_list = trained_vgg['model']['config']['layers']
    for layer, params in zip(param_layers, param_dict_list):
        if(layer.name == 'class_layer'):
            break
        print(params)
        layer.load_weights(params)

    opened by pantherso48 10
  • Ubuntu 16.04: `unsupported GNU version! gcc versions later than 4.9 are not supported!`

    Ubuntu 16.04: unsupported GNU version! gcc versions later than 4.9 are not supported!

    I have gcc 4.9 installed. In Torch, I can select gcc 4.9 with:

    export CC=gcc-4.9
    export CXX=g++-4.9
    

    It's unclear how to do this for neon.

    opened by hughperkins 10
  • No module named neon.util.compat

    Hi All,

    I am getting this error when I run a script that contains the following line: from neon.util.compat import range, StringIO. The error says there is no module named neon.util.compat. How do I fix this?

    Any help will be appreciated.

    opened by chandratejatiriveedhi 0
  • About installation

    Hi! My name is Keval Pandya. I have a question about installing neon: how can I install neon for Python on Windows? My Python version is 3.8.5. I am learning deep learning from Intel's material, which uses the neon framework, so please help me as soon as possible so I can keep learning. Please tell me how to install it on Windows.

    opened by keval2232 1
  • docs: fix simple typo, sclae -> scale

    There is a small typo in examples/ssd/mboxloss.py.

    Should read scale rather than sclae.

    Semi-automated pull request generated by https://github.com/timgates42/meticulous/blob/master/docs/NOTE.md

    opened by timgates42 0
  • IndexError: index 200 is out of bounds for axis 0 with size 2

    I have been running the following code:

    df=pd.DataFrame(adata)
    df.head()

    xq=adata['batch size'].values.reshape(-1,1)
    yq=adata['Price'].values.reshape(-1,1)

    mod = smf.quantreg('yq ~ xq', df)
    res = mod.fit(q=.5)
    print(res.summary())

    quantiles = [.05, .25, .50, .75, .95]
    def fit_model(q):
        res = mod.fit(q=q)
        return [q, res.params['Intercept'], res.params[xq]] + res.conf_int().ix[xq].tolist()
    models = [fit_model(xq) for xq in quantiles]

    However, I am getting the error IndexError: index 200 is out of bounds for axis 0 with size 2.

    opened by krpwn 0
  • IndexError: index 5000 is out of bounds for axis 0 with size 5000

    Hi, I am working on my capstone project with my own dataset. When I tried to clean the dataset by removing outliers, I got this error. My code is below.

    # Removing Outliers
    # Tukey Method

    # import required libraries
    from collections import Counter

    # Outlier detection
    def detect_outliers(df,n,features):
        outlier_indices = []

        # iterate over features(columns)
        for col in features:
            # 1st quartile (25%)
            Q1 = np.percentile(df[col], 25)
            # 3rd quartile (75%)
            Q3 = np.percentile(df[col],75)
            # Interquartile range (IQR)
            IQR = Q3 - Q1

            # outlier step
            outlier_step = 1.5 * IQR

            # Determine a list of indices of outliers for feature col
            outlier_list_col = df[(df[col] < Q1 - outlier_step) | (df[col] > Q3 + outlier_step )].index

            # append the found outlier indices for col to the list of outlier indices
            outlier_indices.extend(outlier_list_col)

        # select observations containing more than 2 outliers
        outlier_indices = Counter(outlier_indices)
        multiple_outliers = list( k for k, v in outlier_indices.items() if v > n )

        return multiple_outliers

    # List of Outliers
    Outliers_to_drop = detect_outliers(data1.drop('Class',axis=1),0,list(data1.drop('Class',axis=1)))
    data1.drop('Class',axis=1).loc[Outliers_to_drop]

    # Create New Dataset without Outliers
    good_data = data1.drop(data1.index[Outliers_to_drop]).reset_index(drop = True)
    good_data.info()

    IndexError                                Traceback (most recent call last)
    ... in <module>
          1 #Create New Dataset without Outliers
    ----> 2 good_data = data1.drop(data1.index[Outliers_to_drop]).reset_index(drop = True)
          3 good_data.info()

    ~\Anaconda3\lib\site-packages\pandas\core\indexes\base.py in __getitem__(self, key)
       4289
       4290         key = com.values_from_object(key)
    -> 4291         result = getitem(key)
       4292         if not is_scalar(result):
       4293             return promote(result)

    IndexError: index 5000 is out of bounds for axis 0 with size 5000

    Can anyone help me fix this and code it properly?

    opened by venkidevictor 0
  • pip install failed, posix-ipc using sys/time.h failed

    pip install nervananeon failed:

    posix_ipc_module.c(37): fatal error C1083: Cannot open include file: 'sys/time.h': No such file or directory

    My CPU is x86-64.

    What can I do?

    opened by silkyrose 2
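The "'magic' is not very descriptive" issue above concerns replacing integer division by a constant d with a multiply and a right shift, floor(n / d) == (n * m) >> p, exact for every 0 <= n <= nmax once m and p are chosen correctly. A minimal Python sketch of that idea, assuming the unsigned `magicgu` search from Hacker's Delight (which builds on the Granlund-Montgomery paper linked in the issue); it is not neon's source, and the name `get_div_mul_shift` is borrowed from the issue's proposed rename.

```python
def get_div_mul_shift(nmax, d):
    """Return (m, p) with (n * m) >> p == n // d for every 0 <= n <= nmax.

    Follows the unsigned 'magicgu' search from Hacker's Delight: try shifts p
    in increasing order and take the first whose multiplier m = ceil(2**p / d)
    keeps the rounding error small enough for all n up to nmax.
    """
    if d < 1 or nmax < 0:
        raise ValueError("need d >= 1 and nmax >= 0")
    if d > nmax + 1:
        return 0, 0  # every n in range is smaller than d, so n // d == 0
    # Largest n <= nmax with n % d == d - 1, where rounding error matters most.
    nc = (nmax + 1) // d * d - 1
    nbits = nmax.bit_length()
    for p in range(2 * nbits + 1):
        two_p = 1 << p
        excess = d - 1 - (two_p - 1) % d  # equals m * d - 2**p
        if two_p > nc * excess:
            m = (two_p + excess) // d  # ceil(2**p / d)
            return m, p
    raise ValueError("no multiplier/shift found")


# Usage: precompute once, then each division is a multiply and a shift.
m, p = get_div_mul_shift(nmax=1000, d=7)
assert all((n * m) >> p == n // 7 for n in range(1001))
```

In a GPU kernel, m and p would typically be computed once on the host and passed in as parameters; a 32-bit variant would also need n * m to fit the kernel's integer width, which this sketch does not check.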
Releases(v2.6.0)
  • v2.6.0(Jan 5, 2018)

  • v2.5.0(Dec 21, 2017)

    • Optimized SSD MKL backend performance (~3X boost over the previous version)
    • Bumped aeon version to v1.3.0
    • Fixed inference performance issue of MKL batchnorm
    • Fixed batch prediction issue for gpu backend
    • Enabled subset_pct for MNIST_DCGAN example
    • Updated "make clean" to clean up mkl artifacts
    • Added dockerfile for IA mkl
  • v2.4.0(Nov 27, 2017)

    • Enabled pip install through pypi
    • Updated MKLML to version 20171007 with performance improvement of ~3X for mnist datalayer/nondatalayer and ~1.6X for DCGAN/WGAN datalayer
    • Updated resnet model to optimize performance with MKLML 20171007
    • Updated Alexnet weight file and fixed bug for deep dream
    • Fixed faster-rcnn inference model loading issue
    • Added data_loading time measurement and enabled GAN networks benchmarking
    • Updated to Aeon version 1.2.0
    • Enabled neon build with mklEngine on Windows systems
  • v2.3.0(Oct 27, 2017)

    • Optimized DeepSpeech2 MKL backend performance (~7X improvement over the CPU backend)
    • Fused convolution and bias layer which significantly boosted AlexNet and VGG performance on Intel architectures with MKL backend
    • Made SSD and Faster-RCNN use VGG weight files in new format
    • Fixed use of reset_cells hyperparameter
    • Fixed MKL backend bug for GAN and Faster-RCNN models
  • v2.2.0(Sep 27, 2017)

    • Update MKLML version 20170908 that fixes a bug related to data conversions
    • Add SSD example for bounding box object detection that works for both GPU and MKL backend
    • Add DeepSpeech2 MKL backend optimization that features ~3X improvement
    • Update aeon to 1.0.0 including new version of manifest (doc/source/loading_data.rst#aeon-dataloader)
    • Add CHWD Support for Batch Normalization in mkl backend
    • Modify ResNet-50 model's last layer to match the original ResNet-50 model paper
    • Enable Seq2Seq testing and benchmarking
  • v2.1.0(Aug 2, 2017)

    • Set MKL backend (-b mkl) as the default CPU backend on Linux (use -b cpu to specify original CPU backend)
    • Update MKLML version 20170720 (AVX512 code paths enabled by default and conversion optimizations)
    • Simplify ResNet example
    • Makefiles now check for virtualenv and pkg-config (NervanaSystems/neon#383)
    • Fix Deep Speech2 model on MKL backend
    • Fix MKL installation for "make sysinstall"
  • v2.0.0(Jun 28, 2017)

    • Added support for MKL backend (-b mkl) on Linux, which boosts neon CPU performance significantly
    • Added WGAN model examples for LSUN and MNIST data
    • Enabled WGAN and DCGAN model examples for Python3
    • Added fix (using file locking) to prevent race conditions running multiple jobs on the same machine with multiple GPUs
    • Added functionality to display some information about hardware, OS and model used
    • Updated appdirs to 1.4.3 to be compatible with CentOS 7.3 for appliance
  • v1.9.0(May 4, 2017)

    • Add support for 3D deconvolution
    • Generative Adversarial Networks (GAN) implementation, and MNIST DCGAN example, following Goodfellow et al. 2014 (http://arXiv.org/abs/1406.2661)
    • Implement Wasserstein GAN cost function and make associated API changes for GAN models
    • Add a new benchmarking script with per-layer timings
    • Add weight clipping for GDM, RMSProp, Adagrad, Adadelta and Adam optimizers
    • Make multicost an explicit choice in mnist_branch.py example
    • Enable NMS kernels to work with normalized boxes and offset
    • Fix missing links in api.rst [#366]
    • Fix docstring for --datatype option to neon [#367]
    • Fix perl shebang in maxas.py and allow for build with numpy 1.12 [#356]
    • Replace os.path.join for Windows interoperability [#351]
    • Update aeon to 0.2.7 to fix a seg fault on termination
  • v1.8.2(Feb 24, 2017)

    • Make the whale calls example stable and shuffle dataset before splitting into subsets
    • Reduce default depth in cifar_msra example to 2
    • Fix the formatting of the conv layer description
    • Fix documentation error in the video-c3d example
    • Support greyscale videos
  • v1.8.1(Jan 18, 2017)

    • Bug fix: Add dilation to object dict and assign defaults to dil_w = dil_h = 1 [#335, #336]
    • Bug fix: Prevent GPU backend from ignoring non-zero slope in Rectlinclip and change default slope to 0
    • Bug fix: Nesterov momentum was updating velocities incorrectly
  • v1.8.0(Dec 28, 2016)

    • Skip Thought Vectors (http://arxiv.org/abs/1506.06726) example
    • Dilated convolution support
    • Nesterov Accelerated Gradient option to SGD optimizer
    • MultiMetric class to allow wrapping Metric classes
    • Support for serializing and deserializing encoder-decoder models
    • Allow specifying the number of time steps to evaluate during beam search
    • A new community-contributed Docker image
    • Improved error messages when a tensor is created with an invalid shape or reshaped to an incompatible size
    • Fix bugs in MultiCost support
    • Documentation fixes [#331]
  • v1.7.0(Nov 21, 2016)

    • Update Data Loader to aeon https://github.com/NervanaSystems/aeon for flexible, multi-threaded data loading and transformations
    • Add Neural Machine Translation model
    • Remove Fast RCNN model (use Faster RCNN model instead)
    • Remove music_genres example
    • Fix super blocking for small N with 1D conv
    • Fix update-direct conv kernel for small N
    • Add gradient clipping to Adam optimizer
    • Documentation updates and bug fixes
  • v1.6.0(Sep 21, 2016)

    • Faster RCNN model
    • Sequence to Sequence container and char_rae recurrent autoencoder model
    • Reshape Layer that reshapes the input [#221]
    • Pip requirements in requirements.txt updated to latest versions [#289]
    • Remove deprecated data loaders and update docs
    • Use NEON_DATA_CACHE_DIR envvar as archive dir to store DataLoader ingested data
    • Eliminate type conversion for FP16 for CUDA compute capability >= 5.2
    • Use GEMV kernels for batch size 1
    • Alter delta buffers for nesting of merge-broadcast layers
    • Support for ncloud real-time logging
    • Add fast_style Makefile target
    • Fix Python 3 builds on Ubuntu 16.04
    • Run setup.py for sysinstall to generate version.py [#282]
    • Fix broken link in mnist docs
    • Fix conv/deconv tests for CPU execution and fix i32 data type
    • Fix for average pooling with batch size 1
    • Change default scale_min to allow random cropping if omitted
    • Fix yaml loading
    • Fix bug with image resize during ingest
    • Update references to the ModelZoo and neon examples to their new locations
  • v1.5.4(Jul 15, 2016)

    • Python2/Python3 compatibility [#191]
    • Support for Pascal GPUs
    • Persistent RNN kernels [#262]
    • Implement Binarized Neural Networks from http://arxiv.org/pdf/1602.02830v3.pdf (added in v1.5.4)
    • Dataloader enhancements (audio loader with examples)
    • HDF5 file data iterator
    • Convolution kernel improvements
    • API documentation improvements [#234, #244, #263]
    • Cache directory cleanup
    • Reorganization of all unit tests
    • Bug fixes [#182, #183, #231, #241, #252, #253, #257, #259, #267, #268]
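    v1.5.4 adds Binarized Neural Networks (arXiv 1602.02830). As a rough illustration of core ideas from that paper, and not of neon's implementation, the sketch below assumes deterministic sign binarization (zero maps to +1), a straight-through estimator that passes gradients only where |x| <= 1, and the XNOR/popcount identity that lets a dot product of ±1 vectors be computed with bit operations. All function names are hypothetical.

```python
def binarize(x):
    """Deterministic binarization: +1.0 where x >= 0, else -1.0."""
    return [1.0 if xi >= 0 else -1.0 for xi in x]


def binarize_grad(x, grad_out):
    """Straight-through estimator: pass the upstream gradient where |x| <= 1, zero it elsewhere."""
    return [g if abs(xi) <= 1.0 else 0.0 for xi, g in zip(x, grad_out)]


def to_bits(xb):
    """Pack a +1/-1 vector into an int: bit i is set when element i is +1."""
    bits = 0
    for i, val in enumerate(xb):
        if val > 0:
            bits |= 1 << i
    return bits


def xnor_popcount_dot(a_bits, b_bits, n):
    """Dot product of two packed +1/-1 vectors of length n.

    Matching signs contribute +1 and mismatches -1, so dot = n - 2 * popcount(a XOR b).
    """
    mismatches = bin((a_bits ^ b_bits) & ((1 << n) - 1)).count("1")
    return n - 2 * mismatches


w = binarize([0.4, -1.3, 0.0, 0.7])   # [1.0, -1.0, 1.0, 1.0]
x = binarize([0.9, 0.2, -0.5, 2.0])   # [1.0, 1.0, -1.0, 1.0]
print(xnor_popcount_dot(to_bits(w), to_bits(x), len(w)))  # 0, same as sum(a * b for a, b in zip(w, x))
```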
  • v1.5.3(Jul 7, 2016)

    • Python2/Python3 compatibility [#191]
    • Support for Pascal GPUs
    • Persistent RNN kernels [#262]
    • Dataloader enhancements (audio loader with examples)
    • HDF5 file data iterator
    • Convolution kernel improvements
    • API documentation improvements [#234, #244, #263]
    • Cache directory cleanup
    • Reorganization of all unit tests
    • Bug fixes [#182, #183, #231, #241, #252, #253, #257, #259, #267]
  • v1.5.2(Jul 7, 2016)

    • Python2/Python3 compatibility [#191]
    • Support for Pascal GPUs
    • Persistent RNN kernels [#262]
    • Dataloader enhancements (audio loader with examples)
    • HDF5 file data iterator
    • Convolution kernel improvements
    • API documentation improvements [#234, #244, #263]
    • Cache directory cleanup
    • Reorganization of all unit tests
    • Bug fixes [#182, #183, #231, #241, #252, #253, #257, #259]
  • v1.5.1(Jun 30, 2016)

    • Python2/Python3 compatibility [#191]
    • Support for Pascal GPUs
    • Persistent RNN kernels [#262]
    • Dataloader enhancements (audio loader with examples)
    • HDF5 file data iterator
    • Convolution kernel improvements
    • API documentation improvements [#234, #244, #263]
    • Cache directory cleanup
    • Reorganization of all unit tests
    • Bug fixes [#182, #183, #231, #241, #252, #253, #257, #259]
  • v1.4.0(Apr 29, 2016)

    • VGG16 based Fast R-CNN model using winograd kernels
    • new, backward compatible, generic data loader
    • C3D video loader model trained on UCF101 dataset
    • Deep Dream example
    • make conv layer printout more informative [#222]
    • fix some examples to use new arg override capability
    • improve performance for relu for small N
    • better support for arbitrary batch norm layer placement
    • documentation updates [#210, #213, #236]
  • v1.3.0(Mar 4, 2016)

    • winograd kernels and associated autotuning routines
    • benchmarking scripts
    • deprecation of deterministic argument for backend constructor
    • improve batch norm stability with fp16 backend
    • allow strided support for dimshuffle kernel
    • speed up zero momentum gradient descent
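    The winograd kernels introduced in v1.3.0 build on Winograd's minimal filtering algorithms, which trade multiplications for cheap additions. The sketch below shows only the smallest 1D case, F(2,3), which produces two outputs of a 3-tap filter with 4 multiplications instead of 6. 2D variants such as F(2x2, 3x3) nest the same 1D construction. This illustrates the arithmetic only and is not neon's GPU kernel code; the function names are hypothetical.

```python
def direct_correlate(d, g):
    """Valid 1D correlation as used by conv layers: y[i] = sum_k d[i + k] * g[k]."""
    n_out = len(d) - len(g) + 1
    return [sum(d[i + k] * g[k] for k in range(len(g))) for i in range(n_out)]


def winograd_f23(d, g):
    """Winograd minimal filtering F(2,3): 2 outputs of a 3-tap filter from 4 inputs."""
    d0, d1, d2, d3 = d
    g0, g1, g2 = g
    # Filter transform: can be computed once per filter and reused for every tile.
    u0 = g0
    u1 = (g0 + g1 + g2) / 2
    u2 = (g0 - g1 + g2) / 2
    u3 = g2
    # The only 4 multiplications that involve the input data.
    m1 = (d0 - d2) * u0
    m2 = (d1 + d2) * u1
    m3 = (d2 - d1) * u2
    m4 = (d1 - d3) * u3
    # Output transform: additions only.
    return [m1 + m2 + m3, m2 - m3 - m4]


d = [1.0, 2.0, 3.0, 4.0]
g = [0.5, -1.0, 2.0]
print(direct_correlate(d, g))  # [4.5, 6.0]
print(winograd_f23(d, g))      # [4.5, 6.0]
```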
  • v1.2.2(Feb 25, 2016)

    • benchmarking enhancements
    • fast dimshuffle, transpose, other kernel speedups and refactoring
    • batch norm states fix, deterministic updates
    • example fixes for fast rcnn and conv_autoencoder
    • image decoding rescaling method fix
    • deserialization fixes for RNNs, refactoring
    • caffe compatibility fixes
    • documentation updates
  • v1.2.1(Feb 5, 2016)

  • v1.2.0(Jan 31, 2016)

    • kepler GPU kernel support [#80]
    • new dataloader format, updated docs [#115, #170]
    • new serialization format
    • FastRCNN implementation, ROI pooling support [#135]
    • deep residual nets implementation and example
    • expanded model zoo
    • Ticker dataset and copy, repeat copy tasks
    • autodiff transpose support [#173]
    • numerous bug fixes and documentation updates.
  • v1.1.5(Jan 14, 2016)

    • CUDA kernels for lookuptable layer (up to 4x speedup)
    • support for deterministic Conv layer updates
    • LRN layer support
    • custom dataset walkthrough utilizing bAbI data
    • reduced number of threads in deep reduction EW kernels [#171]
    • additional (de)serialization routines [#106]
    • CPU tensor slicing fix
    • corrections for PrecisionRecall, MultiLabelStats [#148]
    • explicitly specify python2.7 for virtualenv [#155]
    • default to SM50 when no working GPU found [#186]
    • Add alpha to ELU activation [#164]
    • deconv callback fix [#162]
    • various documentation updates [#151, #152]
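    v1.1.5 adds an `alpha` parameter to the ELU activation (introduced in v1.1.4). Assuming the standard ELU definition, f(x) = x for x > 0 and alpha * (exp(x) - 1) otherwise, alpha sets the value the activation saturates to (-alpha) for large negative inputs. Below is a minimal scalar sketch with its derivative; the function names and the default alpha=1.0 are assumptions, not necessarily neon's defaults.

```python
import math


def elu(x, alpha=1.0):
    """ELU: identity for x > 0, alpha * (exp(x) - 1) otherwise (saturates at -alpha)."""
    return x if x > 0 else alpha * math.expm1(x)


def elu_grad(x, alpha=1.0):
    """Derivative of elu: 1 for x > 0, alpha * exp(x) otherwise (equal to elu(x) + alpha there)."""
    return 1.0 if x > 0 else alpha * math.exp(x)
```

    Using `math.expm1` instead of `math.exp(x) - 1` keeps precision for inputs close to zero.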
  • v1.1.4(Dec 15, 2015)

    • Add support for bidirectional RNNs and LSTMs
    • added ELU, leaky ReLU activations
    • significantly faster GPU kernel builds (using ptx instead of cuda-c)
    • data shuffling enhancements, removal of old data loader code.
    • caffe conv, pool, dropout layer matching and compatibility flags
    • add scheduling support for RMSProp
    • callback enhancements, additional unit tests
    • documentation auditing, added links to introductory video tutorials
  • v1.1.3(Dec 1, 2015)

    • deconvolution and weight histogram visualization examples and documentation
    • CPU convolution and pooling layer speedups (~2x faster)
    • bAbI question and answer interactive demo, dataset support.
    • various ImageLoader enhancements.
    • interactive usage improvements (shortcut Callback import, multiple Callbacks init, doc updates, single item batch size support)
    • set default verbosity level to warning
    • CIFAR10 example normalization updates
    • CUDA detection enhancements [#132]
    • only parse batch_writer arguments when used as a script, allow undefined global_mean [#137, #140]
  • v1.1.2(Nov 18, 2015)

    • completely re-written C++ multithreaded dataloader
    • new weight initialization options for recurrent layers
    • Added deconvolution visualization support (guided backprop)
    • new bAbI question answering example network
    • Improved performance of cifar10_allcnn, word_lstm examples
    • new CUDA-C max and avg pooling kernels
    • Additional bugfixes and documentation updates
  • v1.1.1(Nov 6, 2015)

    • Callback initialization bug fix [#127]
    • IMDB LSTM example bug fix [#130]
    • Added cuda-convnet2 style binary dropout variant
    • Added benchmark function to model (separate fprop, bprop, update timings)
    • Remove h_buffer references in favor of outputs for recurrent layers
    • Multi-cost output buffer bugfix for inference [#131]
    • New timeseries prediction and generation example
    • Change Callback initialization to re-support named arguments. Separate out these arguments in argparser. [#128]
  • v1.1.0(Oct 30, 2015)

    • Sentiment analysis support (LSTM lookupTable based), new IMDB example
    • Support for merge and branch layer stacks via LayerContainers
      • Sequential, Tree, MergeBroadcast, MergeMultiStream
    • Support for freezing layer stacks
    • Adagrad optimizer support
    • new GPU kernels for fast compounding batch norm, conv and pooling engine updates, new kernel build system and flags.
    • Modifications for Caffe support
      • conv, pooling, P/Q updates, dropout layer normalization more in line with the Caffe approach. NOTE: this breaks backward compatibility with some strided conv/pool models serialized using older versions of neon, as the output sizes may now be different. See the FAQ for more info.
      • serialization enhancements to make caffe model import/export easier
      • use per-channel mean subtraction instead of single global. NOTE: this breaks backwards compatibility with ImgMaster saved datasets prior to this revision. To correct, please use the included update_dataset_cache.py script in the util directory.
    • Default training cost display during progress bar is now calculated on a rolling window basis rather than from the beginning of each epoch
    • Separate Layer configuration and initialization steps
    • YAML based alexnet example
    • Callback enhancements.
      • now pass args instead of having to spell out callbacks in each example
      • Changed validation callback to loss callback, validation_frequency now evaluation_frequency
      • Generic metric callback.
    • Various bug fixes
      • non-contiguous array get for GPUTensors
      • 1D slicing returns 2D matrices
      • bin/neon serialization fixes for RNNs
      • 3D conv fixes for fprop, bprop
      • batch norm inference fix
      • bias layer size fix
    • Documentation updates and improvements
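    v1.1.0 switches from a single global mean to per-channel mean subtraction. Below is a minimal sketch of the difference. It assumes each image is stored as a list of channels and each channel as a flat list of pixel values. The helper names are hypothetical, and this is not the ImgMaster code.

```python
def channel_means(images):
    """Per-channel mean over all images and pixels; images[i][c] is a flat list of pixels."""
    n_channels = len(images[0])
    sums = [0.0] * n_channels
    counts = [0] * n_channels
    for img in images:
        for c, pixels in enumerate(img):
            sums[c] += sum(pixels)
            counts[c] += len(pixels)
    return [s / n for s, n in zip(sums, counts)]


def global_mean(images):
    """Single mean over every pixel of every channel (the pre-v1.1.0 style)."""
    total = sum(sum(pixels) for img in images for pixels in img)
    count = sum(len(pixels) for img in images for pixels in img)
    return total / count


def subtract_channel_means(img, means):
    """Center one image channel by channel."""
    return [[p - m for p in pixels] for pixels, m in zip(img, means)]


images = [[[10.0, 20.0], [100.0, 100.0]],   # image 0: channel 0, channel 1
          [[30.0, 40.0], [200.0, 200.0]]]   # image 1
print(channel_means(images))  # [25.0, 150.0]
print(global_mean(images))    # 87.5
```

    With the single global mean (87.5 here), channel 0 would end up entirely negative and channel 1 mostly positive, whereas per-channel means center each channel around zero independently.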
  • v1.0.0(Oct 30, 2015)

    • Ensure root logging handler setup [#82]
    • C++ utility for CUDA compatibility checking [#83]
    • Add predict function to models [#86]
    • Fix bug in learning rate schedule impacting deserialization
    • Speed up batch norm computation
    • Average gradients in OpTree, fix tests
    • Use inference mode for fprop during validation
    • Add top-k misclassification metric
    • Simplify maxas install, make vis requirements optional, doc updates.
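    v1.0.0 adds a top-k misclassification metric. Assuming the usual definition, an example counts as misclassified when its true label is not among the k highest-scoring classes. A minimal sketch follows; the function name and argument layout are hypothetical, not neon's metric class.

```python
def top_k_misclassification(scores, labels, k):
    """Fraction of examples whose true label is not among the k highest-scoring classes.

    scores: one list of per-class scores per example; labels: integer class indices.
    """
    misses = 0
    for row, label in zip(scores, labels):
        # Class indices ordered from highest to lowest score, truncated to k.
        top_k = sorted(range(len(row)), key=lambda c: row[c], reverse=True)[:k]
        if label not in top_k:
            misses += 1
    return misses / len(labels)


scores = [[0.1, 0.7, 0.2], [0.5, 0.3, 0.2], [0.2, 0.3, 0.5]]
labels = [2, 0, 0]
print(top_k_misclassification(scores, labels, 3))  # 0.0, every label is within the top 3
```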
  • v0.9.0(Jul 20, 2015)

    This release implements support for multi-GPU processing using "weird trick" parallelization (data parallel for local layers, model parallel for fully-connected layers) and cleans up the previously existing MPI-based parallel code.

    Multi-GPU is only supported on newer Maxwell-based cards using the NervanaGPU backend.

    Older, Kepler-based cards using the cudanet backend are no longer supported (some models and datasets will still work, but others may raise DeprecationWarnings). Users of these cards are encouraged to remain on the 0.8.2 release until we back-port NervanaGPU to support Kepler cards.

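    The "weird trick" scheme in v0.9.0 is named after Krizhevsky's 2014 paper "One weird trick for parallelizing convolutional neural networks". Local layers have small weights, so each GPU keeps a full copy of them and processes its own slice of the batch. Fully-connected layers have large weights, so they are split by output unit and every GPU sees the whole batch. The sketch below simulates two workers serially in plain Python to show that this partitioning reproduces the single-device forward pass. It assumes a batch size and FC output width divisible by the worker count, uses a matrix multiply plus ReLU as a stand-in for a local layer, and performs no real inter-GPU communication. All names are hypothetical.

```python
def matmul(a, b):
    """Plain list-of-lists matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def relu(m):
    return [[max(0.0, v) for v in row] for row in m]


def split_rows(m, parts):
    """Split a batch (rows) evenly across workers."""
    size = len(m) // parts
    return [m[i * size:(i + 1) * size] for i in range(parts)]


def split_cols(m, parts):
    """Split a weight matrix's output columns evenly across workers."""
    size = len(m[0]) // parts
    return [[row[i * size:(i + 1) * size] for row in m] for i in range(parts)]


def weird_trick_forward(x, w_local, w_fc, workers=2):
    assert len(x) % workers == 0 and len(w_fc[0]) % workers == 0
    # Data-parallel stage: every worker holds a full copy of w_local and its own batch slice.
    local_out = []
    for x_part in split_rows(x, workers):
        local_out.extend(relu(matmul(x_part, w_local)))
    # Model-parallel stage: activations for the whole batch are shared with every worker,
    # and each worker computes only its own slice of the fully-connected outputs.
    slices = [matmul(local_out, w_part) for w_part in split_cols(w_fc, workers)]
    # Stitch the per-worker output columns back together row by row.
    return [[v for s in slices for v in s[r]] for r in range(len(local_out))]


def reference_forward(x, w_local, w_fc):
    """Single-device forward pass for comparison."""
    return matmul(relu(matmul(x, w_local)), w_fc)


x = [[1.0, -2.0], [0.5, 3.0], [-1.0, 1.0], [2.0, 0.0]]   # batch of 4, 2 features
w_local = [[1.0, 0.5], [-0.5, 1.0]]                      # small weights, replicated
w_fc = [[1.0, 2.0, 0.0, -1.0], [0.5, 0.0, 1.0, 3.0]]     # large in practice, split by column
print(weird_trick_forward(x, w_local, w_fc) == reference_forward(x, w_local, w_fc))  # True
```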
Owner: Nervana (Intel® Nervana™ - Artificial Intelligence Products Group)