🍏 Make Thinc faster on macOS by calling into Apple's native Accelerate library

Last update: Nov 26, 2022

thinc-apple-ops

Make spaCy and Thinc up to 8× faster on macOS by calling into Apple's native libraries.

⏳ Install

Make sure you have Xcode installed and then install with pip:

pip install thinc-apple-ops

🏫 Motivation

Matrix multiplication is one of the primary operations in machine learning. Since matrix multiplication is computationally expensive, using a fast matrix multiplication implementation can speed up training and prediction significantly.

Most linear algebra libraries provide matrix multiplication in the form of the standardized BLAS gemm functions. The work behind the scenes is done by a set of matrix multiplication kernels that are meticulously tuned for specific architectures. Matrix multiplication kernels use architecture-specific SIMD instructions for data-level parallelism and can take factors such as cache sizes and instruction latency into account. Thinc uses the BLIS linear algebra library, which provides optimized matrix multiplication kernels for most x86_64 and some ARM CPUs.
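To make the role of gemm concrete, here is a minimal pure-Python sketch of what a gemm call computes. It is a naive reference and not how BLIS or Accelerate implement it. The function name gemm_reference and the list-of-lists layout are assumptions made for illustration; the trans1 and trans2 flags mirror the keyword arguments visible in the traceback further down this page.

```python
from typing import List

Matrix = List[List[float]]


def transpose(m: Matrix) -> Matrix:
    """Return the transpose of a rectangular list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def gemm_reference(a: Matrix, b: Matrix, trans1: bool = False, trans2: bool = False) -> Matrix:
    """Naive O(n^3) product op(a) @ op(b); hypothetical helper, no tuning at all."""
    if trans1:
        a = transpose(a)
    if trans2:
        b = transpose(b)
    if a and b and len(a[0]) != len(b):
        raise ValueError("inner dimensions do not match")
    n_cols = len(b[0]) if b else 0
    # Each output cell is a dot product of a row of a with a column of b.
    return [
        [sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(n_cols)]
        for i in range(len(a))
    ]


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
print(gemm_reference(a, b))               # [[19.0, 22.0], [43.0, 50.0]]
print(gemm_reference(a, b, trans2=True))  # [[17.0, 23.0], [39.0, 53.0]]
```

A tuned kernel produces the same numbers (up to floating-point rounding) but blocks the loops for cache and uses SIMD or dedicated matrix hardware, which is where the speed difference comes from.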

Recent Apple Silicon CPUs, such as the M-series used in Macs, differ from traditional x86_64 and ARM CPUs in that they have one or more separate matrix co-processors called AMX. Since AMX is not well-documented, it is unclear how many AMX units Apple M CPUs have. It is certain that the (single) performance cluster of the M1 has an AMX unit, and there is empirical evidence that each of the two performance clusters of the M1 Pro/Max has an AMX unit.

Even though AMX units use a set of undocumented instructions, the units can be used through Apple's Accelerate linear algebra library. Since Accelerate implements the BLAS interface, it can be used as a replacement for the BLIS library that is used by Thinc. This is where the thinc-apple-ops package comes in. thinc-apple-ops extends the default Thinc ops so that the gemm matrix multiplication from Accelerate is used in place of the BLIS implementation of gemm. As a result, matrix multiplication in Thinc is performed on the fast AMX unit(s).
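A quick way to see which ops implementation is active is to ask Thinc for its current ops. The sketch below is an illustration under assumptions: describe_backend is a hypothetical helper, and the code only relies on get_current_ops from thinc.api plus the standard library. On an Apple Silicon Mac with thinc-apple-ops installed one would expect the class name AppleOps, though whether it is picked up automatically depends on the installed Thinc version.

```python
import importlib.util
import platform


def describe_backend() -> str:
    """Report the active Thinc ops class; degrades gracefully without Thinc."""
    has_thinc = importlib.util.find_spec("thinc") is not None
    has_apple_ops = importlib.util.find_spec("thinc_apple_ops") is not None
    if not has_thinc:
        return "thinc is not installed"
    # Imported lazily so that the script still runs when Thinc is missing.
    from thinc.api import get_current_ops

    ops_name = type(get_current_ops()).__name__
    return (
        f"{platform.system()} {platform.machine()}: ops={ops_name}, "
        f"thinc-apple-ops installed={has_apple_ops}"
    )


print(describe_backend())
```

If the reported class is not AppleOps even though the package is installed, the version constraints discussed in the issues and release notes below are a reasonable first thing to check.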

⏱ Benchmarks

Using thinc-apple-ops leads to large speedups in prediction and training on Apple Silicon Macs, as shown by the benchmarks below.

Prediction

This first benchmark compares prediction speed of the de_core_news_lg spaCy model between the M1 with and without thinc-apple-ops. Results for an Intel Mac Mini and AMD Ryzen 5900X are also provided for comparison. Results are in words per second. In this prediction benchmark, using thinc-apple-ops makes prediction on the M1 about 4.3 times faster.

CPU	BLIS	thinc-apple-ops	Package power (Watt)
Mac Mini (M1)	6492	27676	5
MacBook Air Core i5 2020	9790	10983	9
AMD Ryzen 5900X	22568	N/A	52

Training

In the second benchmark, we compare the training speed of the de_core_news_lg spaCy model (without NER). The results are in training iterations per second. Using thinc-apple-ops speeds up training on the M1 by a factor of 3.0.

CPU	BLIS	thinc-apple-ops	Package power (Watt)
Mac Mini M1 2020	3.34	10.07	5
MacBook Air Core i5 2020	3.10	3.27	10
AMD Ryzen 5900X	6.53	N/A	53
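
The speedup factors quoted in the text follow directly from the two tables. The short script below recomputes them and, as an extra derived figure that the original does not state, the prediction throughput per watt. All numbers are copied from the tables; the helper names are made up for this sketch.

```python
def speedup(blis: float, apple_ops: float) -> float:
    """Ratio of thinc-apple-ops throughput to BLIS throughput."""
    return apple_ops / blis


def per_watt(throughput: float, watts: float) -> float:
    """Throughput divided by package power."""
    return throughput / watts


# Prediction (words per second) and training (iterations per second).
m1_prediction = speedup(6492, 27676)
i5_prediction = speedup(9790, 10983)
m1_training = speedup(3.34, 10.07)
i5_training = speedup(3.10, 3.27)

print(round(m1_prediction, 2))  # 4.26
print(round(i5_prediction, 2))  # 1.12
print(round(m1_training, 2))    # 3.01
print(round(i5_training, 2))    # 1.05

# Prediction throughput per watt: M1 with thinc-apple-ops vs. Ryzen 5900X with BLIS.
print(round(per_watt(27676, 5), 1))   # 5535.2
print(round(per_watt(22568, 52), 1))  # 434.0
```

The Intel machine gains far less because it has no AMX unit for Accelerate to dispatch to, which matches the motivation given above.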

Comments

Pass through Accelerate sgemm/saxpy in Ops.cblas

This can be used by e.g. the parser in spaCy 3.4 to use Accelerate's implementations.

I am not sure how to handle this dependency-wise, since this requires Thinc 8.1, but we still want people to be able to use thinc-apple-ops with Thinc 8.0.x and spaCy < 3.4. Do we need another minor release that sets thinc < 8.1.0?

opened by danieldk 5

IndexError: Out of bounds on buffer access (axis 1)

Hi I tried to use this awesome package and I am getting this error. Not sure what it means, maybe you guys could help me?

I should mention that my data is quite big and I am also using some swap space. Could this be the reason for this error?

[2021-09-28 21:09:01,238] [INFO] Set up nlp object from config
[2021-09-28 21:09:01,500] [INFO] Pipeline: ['tok2vec', 'ner', 'sentencizer', 'entity_linker']
[2021-09-28 21:09:01,505] [INFO] Created vocabulary
[2021-09-28 21:09:01,505] [INFO] Finished initializing nlp object
Traceback (most recent call last):
  File "/Users/joozty/Documents/kolurbo/venv/bin/spacy", line 8, in <module>
    sys.exit(setup_cli())
  File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/spacy/cli/_util.py", line 69, in setup_cli
    command(prog_name=COMMAND)
  File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/click/core.py", line 1137, in __call__
    return self.main(*args, **kwargs)
  File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/click/core.py", line 1062, in main
    rv = self.invoke(ctx)
  File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/click/core.py", line 1668, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
  File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/click/core.py", line 763, in invoke
    return __callback(*args, **kwargs)
  File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/typer/main.py", line 500, in wrapper
    return callback(**use_params)  # type: ignore
  File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/spacy/cli/train.py", line 60, in train_cli
    nlp = init_nlp(config, use_gpu=use_gpu)
  File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/spacy/training/initialize.py", line 84, in init_nlp
    nlp.initialize(lambda: train_corpus(nlp), sgd=optimizer)
  File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/spacy/language.py", line 1272, in initialize
    proc.initialize(get_examples, nlp=self, **p_settings)
  File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/spacy/pipeline/tok2vec.py", line 216, in initialize
    self.model.initialize(X=doc_sample)
  File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc/model.py", line 299, in initialize
    self.init(self, X=X, Y=Y)
  File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc/layers/chain.py", line 86, in init
    layer.initialize(X=curr_input, Y=Y)
  File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc/model.py", line 299, in initialize
    self.init(self, X=X, Y=Y)
  File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc/layers/chain.py", line 90, in init
    curr_input = layer.predict(curr_input)
  File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc/model.py", line 315, in predict
    return self._func(self, X, is_train=False)[0]
  File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc/layers/concatenate.py", line 44, in forward
    Ys, callbacks = zip(*[layer(X, is_train=is_train) for layer in model.layers])
  File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc/layers/concatenate.py", line 44, in <listcomp>
    Ys, callbacks = zip(*[layer(X, is_train=is_train) for layer in model.layers])
  File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc/model.py", line 291, in __call__
    return self._func(self, X, is_train=is_train)
  File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/spacy/ml/staticvectors.py", line 46, in forward
    vectors_data = model.ops.gemm(model.ops.as_contig(V[rows]), W, trans2=True)
  File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc_apple_ops/ops.py", line 25, in gemm
    C = blas.gemm(x, y, trans1=trans1, trans2=trans2)
  File "thinc_apple_ops/blas.pyx", line 37, in thinc_apple_ops.blas.gemm
  File "thinc_apple_ops/blas.pyx", line 53, in thinc_apple_ops.blas.gemm
IndexError: Out of bounds on buffer access (axis 1)

Info about spaCy

spaCy version: 3.1.3
Platform: macOS-11.6-arm64-arm-64bit
Python version: 3.9.7
Pipelines: en_core_web_sm (3.1.0), en_core_web_md (3.1.0)

opened by Joozty 2

Can't compile thinc on Macbook Air M1

Hello, I find myself unable to compile this otherwise magnificent tool! Please help, if you can!

I am on MacOS 12.1, Kernel Version 21.2.0, and have installed the latest Python (3.10.2)

Here is the error message I get after trying to install with pip (apparently it can't find the Accelerate Libraries, especially Accelerate.h Header ...):

ERROR: Command errored out with exit status 1:
command: /Library/Frameworks/Python.framework/Versions/3.10/bin/python3.10 /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py build_wheel /var/folders/n7/t2plqm6n2jq4khmj0bckswg40000gq/T/tmp0bhlw2sh
cwd: /private/var/folders/n7/t2plqm6n2jq4khmj0bckswg40000gq/T/pip-install-wgga78t9/thinc-apple-ops_f5b38888c7a149cd9f99fd524c2bd340
Complete output (34 lines):
running bdist_wheel
running build
running build_py
creating build
creating build/lib.macosx-10.9-universal2-3.10
creating build/lib.macosx-10.9-universal2-3.10/thinc_apple_ops
copying thinc_apple_ops/__init__.py -> build/lib.macosx-10.9-universal2-3.10/thinc_apple_ops
copying thinc_apple_ops/ops.py -> build/lib.macosx-10.9-universal2-3.10/thinc_apple_ops
creating build/lib.macosx-10.9-universal2-3.10/thinc_apple_ops/tests
copying thinc_apple_ops/tests/__init__.py -> build/lib.macosx-10.9-universal2-3.10/thinc_apple_ops/tests
copying thinc_apple_ops/tests/test_gemm.py -> build/lib.macosx-10.9-universal2-3.10/thinc_apple_ops/tests
running egg_info
warning: no files found matching '.pxd' under directory 'thinc_apple_ops'
warning: no files found matching '.txt' under directory 'thinc_apple_ops'
writing manifest file 'thinc_apple_ops.egg-info/SOURCES.txt'
copying thinc_apple_ops/blas.pyx -> build/lib.macosx-10.9-universal2-3.10/thinc_apple_ops
copying thinc_apple_ops/py.typed -> build/lib.macosx-10.9-universal2-3.10/thinc_apple_ops
running build_ext
creating build/temp.macosx-10.9-universal2-3.10
creating build/temp.macosx-10.9-universal2-3.10/thinc_apple_ops
clang -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -arch arm64 -arch x86_64 -g -I/private/var/folders/n7/t2plqm6n2jq4khmj0bckswg40000gq/T/pip-build-env-b0flamc2/overlay/lib/python3.10/site-packages/numpy/core/include -I/Library/Frameworks/Python.framework/Versions/3.10/include/python3.10 -c thinc_apple_ops/blas.c -o build/temp.macosx-10.9-universal2-3.10/thinc_apple_ops/blas.o
In file included from thinc_apple_ops/blas.c:706:
In file included from /private/var/folders/n7/t2plqm6n2jq4khmj0bckswg40000gq/T/pip-build-env-b0flamc2/overlay/lib/python3.10/site-packages/numpy/core/include/numpy/arrayobject.h:5:
In file included from /private/var/folders/n7/t2plqm6n2jq4khmj0bckswg40000gq/T/pip-build-env-b0flamc2/overlay/lib/python3.10/site-packages/numpy/core/include/numpy/ndarrayobject.h:12:
In file included from /private/var/folders/n7/t2plqm6n2jq4khmj0bckswg40000gq/T/pip-build-env-b0flamc2/overlay/lib/python3.10/site-packages/numpy/core/include/numpy/ndarraytypes.h:1960:
/private/var/folders/n7/t2plqm6n2jq4khmj0bckswg40000gq/T/pip-build-env-b0flamc2/overlay/lib/python3.10/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:17:2: warning: "Using deprecated NumPy API, disable it with " "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings]
#warning "Using deprecated NumPy API, disable it with "
^
thinc_apple_ops/blas.c:714:10: fatal error: 'Accelerate/Accelerate.h' file not found
#include "Accelerate/Accelerate.h"
^~~~~~~~~~~~~~~~~~~~~~~~~
thinc_apple_ops/blas.c:714:10: note: did not find header 'Accelerate.h' in framework 'Accelerate' (loaded from '/System/Library/Frameworks')
1 warning and 1 error generated.
error: command '/Library/Developer/CommandLineTools/usr/bin/clang' failed with exit code 1

ERROR: Failed building wheel for thinc-apple-ops
Failed to build thinc-apple-ops
ERROR: Could not build wheels for thinc-apple-ops, which is required to install pyproject.toml-based projects

------------------------------------------ END---------------------------------------------------------------------------

Any help would be greatly appreciated, thanks!
duplicate

opened by amal1us 1
AppleOps.gemm: write in-place when `output` is given

NumpyOps.gemm (with BLIS) writes the result of matrix multiplication in-place when the output argument is given. This changes AppleOps.gemm to do the same, avoiding allocation of a temporary.
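
The difference between allocating a result and writing in place can be sketched in a few lines of pure Python. This is an illustration of the idea only, not the Cython code in the package: gemm_into and its out parameter are hypothetical names, and nested lists stand in for the float arrays used by Thinc.

```python
def gemm_into(a, b, out=None):
    """Multiply a @ b; fill `out` in place when the caller provides it."""
    n_rows, n_inner, n_cols = len(a), len(b), len(b[0])
    if out is None:
        # No buffer supplied: allocate a fresh result.
        out = [[0.0] * n_cols for _ in range(n_rows)]
    elif len(out) != n_rows or any(len(row) != n_cols for row in out):
        raise ValueError("output has the wrong shape")
    for i in range(n_rows):
        for j in range(n_cols):
            out[i][j] = sum(a[i][k] * b[k][j] for k in range(n_inner))
    return out


a = [[1.0, 2.0], [3.0, 4.0]]
b = [[5.0, 6.0], [7.0, 8.0]]
buf = [[0.0, 0.0], [0.0, 0.0]]
res = gemm_into(a, b, out=buf)
print(res is buf)  # True
print(buf)         # [[19.0, 22.0], [43.0, 50.0]]
```

Returning the caller's own buffer avoids the temporary allocation and the copy that would otherwise follow it.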
enhancement

opened by danieldk 0
Change thinc upper bound to <8.1.0

thinc-apple-ops will require thinc >= 8.1.0 in the future for the CBLAS passthrough functionality. As discussed in #15, we should first do another minor thinc-apple-ops release specifically for thinc <8.1.0.

Also bump the version to v0.0.7 to prepare for the release.

opened by danieldk 0
Fix 0-size arrays

Our bit of Cython code uses memory buffers, which apparently have a bounds-check when the size is 0 when acquiring the pointer. In contrast, in other bits of code we often acquire the buffer by casting the array.data pointer, which has no such bounds check. This led to IndexError being raised when zero shapes were passed through.
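
One general way to keep zero-sized inputs away from low-level buffer code is to work out the output shape first and skip the BLAS call when any dimension is zero. The sketch below shows that guard in pure Python. It is an assumption about how such a check could look, not the fix that was made in the package, and plan_gemm is a hypothetical helper.

```python
from typing import Tuple

Shape = Tuple[int, int]


def plan_gemm(a_shape: Shape, b_shape: Shape,
              trans1: bool = False, trans2: bool = False) -> Tuple[Shape, bool]:
    """Return (output shape, whether a BLAS call is needed) for op(a) @ op(b)."""
    m, k_a = (a_shape[1], a_shape[0]) if trans1 else a_shape
    k_b, n = (b_shape[1], b_shape[0]) if trans2 else b_shape
    if k_a != k_b:
        raise ValueError(f"Shape mismatch for gemm: ({m}, {k_a}) and ({k_b}, {n})")
    # With any zero dimension the result is empty or all zeros,
    # so no pointer into the input buffers is needed.
    call_blas = m > 0 and n > 0 and k_a > 0
    return (m, n), call_blas


print(plan_gemm((2, 3), (3, 4)))                     # ((2, 4), True)
print(plan_gemm((0, 300), (96, 300), trans2=True))   # ((0, 96), False)
```

When the second element is False, a caller can return a zero-filled array of the given shape directly.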

opened by honnibal 0
Require thinc with ops registry

Technically it doesn't require a currently unreleased version of thinc to run, but if people install it into an existing venv, then it's better to require the version of thinc to be upgraded so that it's detected and used.

opened by adrianeboyd 0

Releases(v0.1.3)

v0.1.3(Dec 16, 2022)

Relax Thinc upper bound to <9.1.0 to support current Thinc 9.0.0 development builds.

v0.1.2(Oct 17, 2022)

Updates and binary wheels for Python 3.11.

v0.1.1(Sep 27, 2022)

🔴 Bug fixes

Fix issue #27: Add numpy build constraints for PyPI wheels.

v0.0.8(Sep 27, 2022)

🔴 Bug fixes

Fix issue #27: Add numpy build constraints for PyPI wheels.

v0.1.0(Jul 19, 2022)

✨ New features and improvements

Pass through Accelerate's saxpy/sgemm in AppleOps.cblas (#15, #21).

Write in-place in AppleOps.gemm when the output argument is given (#19).

🔴 Bug fixes

Fix issue #17: avoid cyclic imports in Thinc.

v0.0.7(May 27, 2022)

Restrict Thinc to v8.0.x in preparation for Thinc v8.1.

v0.0.6(May 18, 2022)

Fix issue #12: Check shape in AppleOps.gemm.

Owner

Explosion

A software company specializing in developer tools for Artificial Intelligence and Natural Language Processing

GitHub Repository https://github.com/explosion/thinc
