🍏 Make Thinc faster on macOS by calling into Apple's native Accelerate library

Overview

thinc-apple-ops

Make spaCy and Thinc up to 8 × faster on macOS by calling into Apple's native libraries.

Install

Make sure you have Xcode installed and then install with pip:

pip install thinc-apple-ops

🏫 Motivation

Matrix multiplication is one of the primary operations in machine learning. Since matrix multiplication is computationally expensive, using a fast matrix multiplication implementation can speed up training and prediction significantly.

Most linear algebra libraries provide matrix multiplication in the form of the standardized BLAS gemm functions. The work behind scences is done by a set of matrix multiplication kernels that are meticulously tuned for specific architectures. Matrix multiplication kernels use architecture-specific SIMD instructions for data-level parallism and can take factors such as cache sizes and intstruction latency into account. Thinc uses the BLIS linear algebra library, which provides optimized matrix multiplication kernels for most x86_64 and some ARM CPUs.

Recent Apple Silicon CPUs, such as the M-series used in Macs, differ from traditional x86_64 and ARM CPUs in that they have a separate matrix co-processor(s) called AMX. Since AMX is not well-documented, it is unclear how many AMX units Apple M CPUs have. It is certain that the (single) performance cluster of the M1 has an AMX unit and there is empirical evidence that both performance clusters of the M1 Pro/Max have an AMX unit.

Even though AMX units use a set of undocumented instructions, the units can be used through Apple's Accelerate linear algebra library. Since Accelerate implements the BLAS interface, it can be used as a replacement of the BLIS library that is used by Thinc. This is where the thinc-apple-ops package comes in. thinc-apple-ops extends the default Thinc ops, so that gemm matrix multiplication from Accelerate is used in place of the BLIS implementation of gemm. As a result, matrix multiplication in Thinc is performed on the fast AMX unit(s).

Benchmarks

Using thinc-apple-ops leads to large speedups in prediction and training on Apple Silicon Macs, as shown by the benchmarks below.

Prediction

This first benchark compares prediction speed of the de_core_news_lg spaCy model between the M1 with and without thinc-apple-ops. Results for an Intel Mac Mini and AMD Ryzen 5900X are also provided for comparison. Results are in words per second. In this prediction benchmark, using thinc-apple-ops improves performance by 4.3 times.

CPU BLIS thinc-apple-ops Package power (Watt)
Mac Mini (M1) 6492 27676 5
MacBook Air Core i5 2020 9790 10983 9
AMD Ryzen 5900X 22568 N/A 52

Training

In the second benchmark, we compare the training speed of the de_core_news_lg spaCy model (without NER). The results are in training iterations per second. Using thinc-apple-ops improves training time by 3.0 times.

CPU BLIS thinc-apple-ops Package power (Watt)
Mac Mini M1 2020 3.34 10.07 5
MacBook Air Core i5 2020 3.10 3.27 10
AMD Ryzen 5900X 6.53 N/A 53
Comments
  • Pass through Accelerate sgemm/saxpy in Ops.cblas

    Pass through Accelerate sgemm/saxpy in Ops.cblas

    This can be used by e.g. the parser in spaCy 3.4 to use Accelerate's implementations.

    I am not sure how to handle this dependency-wise, since this requires Thinc 8.1, but we still want to people to be able to use thinc-apple-ops with Thinc 8.0.x and spaCy < 3.4. Do we need another minor release that sets thinc < 8.1.0?

    opened by danieldk 5
  • IndexError: Out of bounds on buffer access (axis 1)

    IndexError: Out of bounds on buffer access (axis 1)

    Hi I tried to use this awesome package and I am getting this error. Not sure what it means, maybe you guys could help me?

    I should mention that my data is quite big and I am also using some SWAP space. Could this be the reason of this error?

    [2021-09-28 21:09:01,238] [INFO] Set up nlp object from config
    [2021-09-28 21:09:01,500] [INFO] Pipeline: ['tok2vec', 'ner', 'sentencizer', 'entity_linker']
    [2021-09-28 21:09:01,505] [INFO] Created vocabulary
    [2021-09-28 21:09:01,505] [INFO] Finished initializing nlp object
    Traceback (most recent call last):
      File "/Users/joozty/Documents/kolurbo/venv/bin/spacy", line 8, in <module>
        sys.exit(setup_cli())
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/spacy/cli/_util.py", line 69, in setup_cli
        command(prog_name=COMMAND)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/click/core.py", line 1137, in __call__
        return self.main(*args, **kwargs)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/click/core.py", line 1062, in main
        rv = self.invoke(ctx)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/click/core.py", line 1668, in invoke
        return _process_result(sub_ctx.command.invoke(sub_ctx))
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/click/core.py", line 1404, in invoke
        return ctx.invoke(self.callback, **ctx.params)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/click/core.py", line 763, in invoke
        return __callback(*args, **kwargs)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/typer/main.py", line 500, in wrapper
        return callback(**use_params)  # type: ignore
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/spacy/cli/train.py", line 60, in train_cli
        nlp = init_nlp(config, use_gpu=use_gpu)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/spacy/training/initialize.py", line 84, in init_nlp
        nlp.initialize(lambda: train_corpus(nlp), sgd=optimizer)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/spacy/language.py", line 1272, in initialize
        proc.initialize(get_examples, nlp=self, **p_settings)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/spacy/pipeline/tok2vec.py", line 216, in initialize
        self.model.initialize(X=doc_sample)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc/model.py", line 299, in initialize
        self.init(self, X=X, Y=Y)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc/layers/chain.py", line 86, in init
        layer.initialize(X=curr_input, Y=Y)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc/model.py", line 299, in initialize
        self.init(self, X=X, Y=Y)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc/layers/chain.py", line 90, in init
        curr_input = layer.predict(curr_input)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc/model.py", line 315, in predict
        return self._func(self, X, is_train=False)[0]
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc/layers/concatenate.py", line 44, in forward
        Ys, callbacks = zip(*[layer(X, is_train=is_train) for layer in model.layers])
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc/layers/concatenate.py", line 44, in <listcomp>
        Ys, callbacks = zip(*[layer(X, is_train=is_train) for layer in model.layers])
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc/model.py", line 291, in __call__
        return self._func(self, X, is_train=is_train)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/spacy/ml/staticvectors.py", line 46, in forward
        vectors_data = model.ops.gemm(model.ops.as_contig(V[rows]), W, trans2=True)
      File "/Users/joozty/Documents/kolurbo/venv/lib/python3.9/site-packages/thinc_apple_ops/ops.py", line 25, in gemm
        C = blas.gemm(x, y, trans1=trans1, trans2=trans2)
      File "thinc_apple_ops/blas.pyx", line 37, in thinc_apple_ops.blas.gemm
      File "thinc_apple_ops/blas.pyx", line 53, in thinc_apple_ops.blas.gemm
    IndexError: Out of bounds on buffer access (axis 1)
    

    Info about spaCy

    • spaCy version: 3.1.3
    • Platform: macOS-11.6-arm64-arm-64bit
    • Python version: 3.9.7
    • Pipelines: en_core_web_sm (3.1.0), en_core_web_md (3.1.0)
    opened by Joozty 2
  • Can't compile thinc on Macbook Air M1

    Can't compile thinc on Macbook Air M1

    Hello, I find myself unable to compile this otherwise magnificent tool! Please help, if you can!

    I am on MacOS 12.1, Kernel Version 21.2.0, and have installed the latest Python (3.10.2)

    Here is the error message I get after trying to install with pip (apparently it can't find the Accelerate Libraries, especially Accelerate.h Header ...):

    ERROR: Command errored out with exit status 1: command: /Library/Frameworks/Python.framework/Versions/3.10/bin/python3.10 /Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/pip/_vendor/pep517/in_process/_in_process.py build_wheel /var/folders/n7/t2plqm6n2jq4khmj0bckswg40000gq/T/tmp0bhlw2sh cwd: /private/var/folders/n7/t2plqm6n2jq4khmj0bckswg40000gq/T/pip-install-wgga78t9/thinc-apple-ops_f5b38888c7a149cd9f99fd524c2bd340 Complete output (34 lines): running bdist_wheel running build running build_py creating build creating build/lib.macosx-10.9-universal2-3.10 creating build/lib.macosx-10.9-universal2-3.10/thinc_apple_ops copying thinc_apple_ops/init.py -> build/lib.macosx-10.9-universal2-3.10/thinc_apple_ops copying thinc_apple_ops/ops.py -> build/lib.macosx-10.9-universal2-3.10/thinc_apple_ops creating build/lib.macosx-10.9-universal2-3.10/thinc_apple_ops/tests copying thinc_apple_ops/tests/init.py -> build/lib.macosx-10.9-universal2-3.10/thinc_apple_ops/tests copying thinc_apple_ops/tests/test_gemm.py -> build/lib.macosx-10.9-universal2-3.10/thinc_apple_ops/tests running egg_info warning: no files found matching '.pxd' under directory 'thinc_apple_ops' warning: no files found matching '.txt' under directory 'thinc_apple_ops' writing manifest file 'thinc_apple_ops.egg-info/SOURCES.txt' copying thinc_apple_ops/blas.pyx -> build/lib.macosx-10.9-universal2-3.10/thinc_apple_ops copying thinc_apple_ops/py.typed -> build/lib.macosx-10.9-universal2-3.10/thinc_apple_ops running build_ext creating build/temp.macosx-10.9-universal2-3.10 creating build/temp.macosx-10.9-universal2-3.10/thinc_apple_ops clang -Wno-unused-result -Wsign-compare -Wunreachable-code -fno-common -dynamic -DNDEBUG -g -fwrapv -O3 -Wall -arch arm64 -arch x86_64 -g -I/private/var/folders/n7/t2plqm6n2jq4khmj0bckswg40000gq/T/pip-build-env-b0flamc2/overlay/lib/python3.10/site-packages/numpy/core/include -I/Library/Frameworks/Python.framework/Versions/3.10/include/python3.10 -c thinc_apple_ops/blas.c -o build/temp.macosx-10.9-universal2-3.10/thinc_apple_ops/blas.o In file included from thinc_apple_ops/blas.c:706: In file included from /private/var/folders/n7/t2plqm6n2jq4khmj0bckswg40000gq/T/pip-build-env-b0flamc2/overlay/lib/python3.10/site-packages/numpy/core/include/numpy/arrayobject.h:5: In file included from /private/var/folders/n7/t2plqm6n2jq4khmj0bckswg40000gq/T/pip-build-env-b0flamc2/overlay/lib/python3.10/site-packages/numpy/core/include/numpy/ndarrayobject.h:12: In file included from /private/var/folders/n7/t2plqm6n2jq4khmj0bckswg40000gq/T/pip-build-env-b0flamc2/overlay/lib/python3.10/site-packages/numpy/core/include/numpy/ndarraytypes.h:1960: /private/var/folders/n7/t2plqm6n2jq4khmj0bckswg40000gq/T/pip-build-env-b0flamc2/overlay/lib/python3.10/site-packages/numpy/core/include/numpy/npy_1_7_deprecated_api.h:17:2: warning: "Using deprecated NumPy API, disable it with " "#define NPY_NO_DEPRECATED_API NPY_1_7_API_VERSION" [-W#warnings] #warning "Using deprecated NumPy API, disable it with "
    ^ thinc_apple_ops/blas.c:714:10: fatal error: 'Accelerate/Accelerate.h' file not found #include "Accelerate/Accelerate.h" ^~~~~~~~~~~~~~~~~~~~~~~~~ thinc_apple_ops/blas.c:714:10: note: did not find header 'Accelerate.h' in framework 'Accelerate' (loaded from '/System/Library/Frameworks') 1 warning and 1 error generated. error: command '/Library/Developer/CommandLineTools/usr/bin/clang' failed with exit code 1

    ERROR: Failed building wheel for thinc-apple-ops Failed to build thinc-apple-ops ERROR: Could not build wheels for thinc-apple-ops, which is required to install pyproject.toml-based projects

    ------------------------------------------ END---------------------------------------------------------------------------

    Any help would be greatly appreciated, thanks!

    duplicate 
    opened by amal1us 1
  • AppleOps.gemm: write in-place when `output` is given

    AppleOps.gemm: write in-place when `output` is given

    NumpyOps.gemm (with BLIS) writes the result of matrix multiplication in-place when the output argument is given. This changes AppleOps.gemm to do the same, avoiding allocation of a temporary.

    enhancement 
    opened by danieldk 0
  • Change thinc upper bound to <8.1.0

    Change thinc upper bound to <8.1.0

    thinc-apple-ops will require thinc >= 8.1.0 in the future for the CBLAS passthrough functionality. As discussed in #15, we should first do another minor thinc-apple-ops release specifically for thinc <8.1.0.

    Also bump the version to v0.0.7 to prepare for the release.

    opened by danieldk 0
  • Fix 0-size arrays

    Fix 0-size arrays

    Our bit of Cython code uses memory buffers, which apparently have a bounds-check when the size is 0 when acquiring the pointer. In contrast, in other bits of code we often acquire the buffer by casting the array.data pointer, which has no such bounds check. This led to IndexError being raised when zero shapes were passed through.

    opened by honnibal 0
  • Require thinc with ops registry

    Require thinc with ops registry

    Technically it doesn't require a currently unreleased version of thinc to run, but if people install it into an existing venv, then it's better to require the version of thinc to upgraded so that it's detected and used.

    opened by adrianeboyd 0
Releases(v0.1.3)
Owner
Explosion
A software company specializing in developer tools for Artificial Intelligence and Natural Language Processing
Explosion
Tiny demo site for exploring SameSite=Lax

samesite-lax-demo Background on my blog: Exploring the SameSite cookie attribute for preventing CSRF This repo holds some tools for exploring the impl

Simon Willison 6 Nov 10, 2021
The third home of the bare Programming Language (1st there's my heart, the forest came second and then there's Github :)

The third home of the bare Programming Language (1st there's my heart, the forest came second and then there's Github :)

Garren Souza 7 Dec 24, 2022
Consulta cpf fds

Consulta-cpf Consulta cpf fds Instalação: apt-get update -y

Moleey 1 Nov 24, 2021
Make your Discord Account Online 24/7!

Online-Forever Make your Discord Account Online 24/7! A Code written in Python that helps you to keep your account 24/7. The main.py is the main file.

SealedSaucer 0 Mar 16, 2022
Basic code and description for GoBigger challenge 2021.

GoBigger Challenge 2021 en / 中文 Challenge Description 2021.11.13 We are holding a competition —— Go-Bigger: Multi-Agent Decision Intelligence Challeng

OpenDILab 183 Dec 29, 2022
A simple hash system.

PBH-Hash-System A simple hash system. Usage You could use it like this: from pbh import pbh print(pbh("Hey", True)) Output: 2feae2471698cfcdcbd6b98ca

Karim 3 Mar 24, 2022
OpenTable Reservation Maker For Python

OpenTable-Reservation-Maker The code that corresponds with this blog post on writing a script to make reservations for me on opentable Getting started

JonLuca De Caro 36 Nov 10, 2022
Python library for parsing Godot scene files

Godot Parser This is a python library for parsing Godot scene (.tscn) and resource (.tres) files. It's intended to make it easier to automate certain

Steven Arcangeli 30 Jan 04, 2023
How did Covid affect businesses?

NYC_Business_Analysis How did Covid affect businesses? COVID's effect on NYC businesses We all know that businesses in NYC have been affected by COVID

AK 1 Jan 15, 2022
Chemical equation balancer

Chemical equation balancer Balance your chemical equations with ease! Installation $ git clone

Marijan Smetko 4 Nov 26, 2022
Functional collections extension functions for Python

pyfuncol pyfuncol Installation Usage API Documentation Compatibility Contributing License A Python functional collections library. It extends collecti

Andrea Veneziano 32 Nov 16, 2022
Structural basis for solubility in protein expression systems

Structural basis for solubility in protein expression systems Large-scale protein production for biotechnology and biopharmaceutical applications rely

ProteinQure 16 Aug 18, 2022
Edorado93 - Unraveling a Rockstar! -- Too much? Fine, Unraveling a humble programmer then?

Hi, I'm Sachin Malhotra ( ⛄ 💻 🎃 🍺 ) Let me set the records straight. Roger Federer is the GOAT and I will not hear otherwise! Now that we have that

Sachin Malhotra 7 Dec 25, 2022
A Blender addon to enable reloading linked libraries from UI.

library_reload_linked_libraries A Blender addon to enable reloading linked libraries from UI.

3 Nov 27, 2022
bamboo-engine 是一个通用的流程引擎,他可以解析,执行,调度由用户创建的流程任务,并提供了如暂停,撤销,跳过,强制失败,重试和重入等等灵活的控制能力和并行、子流程等进阶特性,并可通过水平扩展来进一步提升任务的并发处理能力。

bamboo-engine 是一个通用的流程引擎,他可以解析,执行,调度由用户创建的流程任务,并提供了如暂停,撤销,跳过,强制失败,重试和重入等等灵活的控制能力和并行、子流程等进阶特性,并可通过水平扩展来进一步提升任务的并发处理能力。 整体设计 Quick start 1. 安装依赖 2. 项目初始

腾讯蓝鲸 96 Dec 15, 2022
Student Result Management System Project in tkinter created based on python, tkinter, and SQLITE3 Database

Student-Result-Management-System This Student Result Management System Project in tkinter created based on python, tkinter, and SQLITE3 Database. The

Ravi Chauhan 2 Aug 03, 2022
Open source book about making Python packages.

Python packages Tomas Beuzen & Tiffany Timbers Python packages are a core element of the Python programming language and are how you create organized,

Python Packages 169 Jan 06, 2023
J MBF - Assalamualaikum Mamang...

★ VISITOR ★ ★ INFORMATION ★ Script Ini DiBuat Oleh YayanXD Script Ini Akan DiPerjual Belikan Tanggal 9 Januari 2022 Jika Mau Beli Script Silahkan Hub

Risky [ Zero Tow ] 5 Apr 08, 2022
NeoInterface - Neo4j made easy for Python programmers!

Neointerface - Neo4j made easy for Python programmers! A Python interface to use the Neo4j graph database, and simplify its use. class NeoInterface: C

15 Dec 15, 2022
Push Prometheus metrics to VictoriaMetrics or other exporters

Push metrics from your periodic long-running jobs to existing Prometheus/VictoriaMetrics monitoring system.

olegm 14 Nov 04, 2022