⚡️Optimizing einsum functions in NumPy, Tensorflow, Dask, and more with contraction order optimization.

Overview

Optimized Einsum

Tests codecov Anaconda-Server Badge PyPI PyPIStats Documentation Status DOI

Optimized Einsum: A tensor contraction order optimizer

Optimized einsum can significantly reduce the overall execution time of einsum-like expressions (e.g., np.einsum, dask.array.einsum, pytorch.einsum, tensorflow.einsum, ) by optimizing the expression's contraction order and dispatching many operations to canonical BLAS, cuBLAS, or other specialized routines.

Optimized einsum is agnostic to the backend and can handle NumPy, Dask, PyTorch, Tensorflow, CuPy, Sparse, Theano, JAX, and Autograd arrays as well as potentially any library which conforms to a standard API. See the documentation for more information.

Example usage

The opt_einsum.contract function can often act as a drop-in replacement for einsum functions without further changes to the code while providing superior performance. Here, a tensor contraction is performed with and without optimization:

import numpy as np
from opt_einsum import contract

N = 10
C = np.random.rand(N, N)
I = np.random.rand(N, N, N, N)

%timeit np.einsum('pi,qj,ijkl,rk,sl->pqrs', C, C, I, C, C)
1 loops, best of 3: 934 ms per loop

%timeit contract('pi,qj,ijkl,rk,sl->pqrs', C, C, I, C, C)
1000 loops, best of 3: 324 us per loop

In this particular example, we see a ~3000x performance improvement which is not uncommon when compared against unoptimized contractions. See the backend examples for more information on using other backends.

Features

The algorithms found in this repository often power the einsum optimizations in many of the above projects. For example, the optimization of np.einsum has been passed upstream and most of the same features that can be found in this repository can be enabled with np.einsum(..., optimize=True). However, this repository often has more up to date algorithms for complex contractions.

The following capabilities are enabled by opt_einsum:

Please see the documentation for more features!

Installation

opt_einsum can either be installed via pip install opt_einsum or from conda conda install opt_einsum -c conda-forge. See the installation documentation for further methods.

Citation

If this code has benefited your research, please support us by citing:

Daniel G. A. Smith and Johnnie Gray, opt_einsum - A Python package for optimizing contraction order for einsum-like expressions. Journal of Open Source Software, 2018, 3(26), 753

DOI: https://doi.org/10.21105/joss.00753

Contributing

All contributions, bug reports, bug fixes, documentation improvements, enhancements, and ideas are welcome.

A detailed overview on how to contribute can be found in the contributing guide.

Comments
  • added dynamic programming path

    added dynamic programming path

    Description

    In the field of tensor network states (matrix product states, projected entangled pair states, ...) the most common approach to optimise tensor network contractions is based on dynamic programming [Phys. Rev. E 90, 033315 (2014)], which usually provides an optimal contraction path within a reasonable amount of time (even for a large number of tensors). This pull request provides such an implementation.

    Questions

    • [ ] Is there a simple way to run the benchmarks of the "Performance Comparison" section of the documentation for the dynamic programming optimiser to compare with the other approaches?
    opened by mrader1248 45
  • [WIP] recursive, depth-first optimizer

    [WIP] recursive, depth-first optimizer

    Use recursive, depth-first optimizer for optimal & possibly 'greedy2'

    I had a play around with making the opt_einsum optimal path finder a bit more efficient. The basic idea is:

    1. Use a recursive search to simplify the logic and find paths 'depth-first', i.e. follow a path all the way to its termination before searching for the next one.
    2. Allowed by this, terminate searches as soon as the path cost increases above the best found so far, sieving many known-bad paths

    The results are pretty promising, current optimal:

    >> eq, shapes = oe.helpers.rand_equation(8, 3, seed=42)
    >> views = [np.random.rand(*s) for s in shapes]
    >> %time path = oe.contract_path(eq, *views, optimize='optimal')[1]
    CPU times: user 25.3 s, sys: 716 ms, total: 26 s
    Wall time: 26 s
    >>> path.opt_cost
    130316
    

    and with this recursive strategy:

    >> %time path = oe.contract_path(eq, *views, optimize='roptimal')[1]
    CPU times: user 107 ms, sys: 324 µs, total: 107 ms
    Wall time: 108 ms
    >>> path.opt_cost
    130316
    

    The fundamental scaling is still factorial, but it should allow optimal-paths to be realistically found for expressions of up to 10 tensors. Possibly other heuristics can be used to sieve paths eagerly as well.

    One additional option this opens up is to recurse only into the n best possible contractions (current greedy/eager being n=1). Thus you could have a 'greedy2', 'greedy3' etc, which scale like 2**n, 3**n which while exponential (modulo sieving), is significantly more manageable than factorial scaling. These might use custom cost functions #64 and random shaking as well #63.

    Anyway, thought I'd share this WIP initial implementation. My suggestion might be to have this approach eventually replace optimal, with 'greedy', 'greedy2'... using it also.

    cc @fritzo --- took some inspiration from your 'eager' path if you have any input!

    TODO:

    • [x] apply some caching?
    • [ ] keep track of keep indices rather than computing each time?
    • [x] handle memory_limit case properly? (i.e. contract all remaining tensors at once)
    enhancement 
    opened by jcmgray 34
  • Use best of many random greedy paths

    Use best of many random greedy paths

    In quantum computing simulations particularly, finding the optimal contraction sequence has a huge effect on what is computable - the amount of memory and time required is exponential in how 'bad' the path found is. The greedy approach is generally worse than the optimal approach but it is cheap to compute (and even cheaper with #60!).

    If we introduce a random greedy algorithm by tweaking the cost objective function, we get a variety of different paths with varying costs and we can just pick the best one. I had a little play with contracting a 49 qubit depth 20 circuit (661 tensors overall) and randomly adjusting each cost by +- 5%.

    The following are 'contraction complexities' (maximum rank of tensor produced - 2^this is the memory required which is also a rough proxy for computation time) on the left and frequency on the right from 100 runs.

    [(25, 1),
     (27, 1),
     (28, 3),
     (29, 6),
     (30, 10),
     (31, 16),
     (32, 6),  # <- this is what the current greedy finds
     (33, 9),
     (34, 21),
     (35, 8),
     (36, 8),
     (37, 1),
     (38, 4),
     (39, 2),
     (40, 1),
     (42, 3)]
    

    So the best contraction found vs the current contraction found requires ~128x less memory and computation, and turns the simulation from 'super'-computable to 'laptop'-computable.

    enhancement 
    opened by jcmgray 19
  • Implement shared_intermediates context manager

    Implement shared_intermediates context manager

    Resolves #9

    Description

    This implements a shared_intermediates context manager, within which multiple calls to contract will share intermediate computations. This is very useful e.g. in message passing algorithms. The first draft of this PR is copied directly from Pyro, where we are using sharing for probabilistic inference in graphical models.

    The implementation uses a special internal backend opt_einsum.backends.shared. When a sharing context is opened

    with shared_intermediates():
        contract(..., backend='foo')
        contract(..., backend='foo')
    

    a special handle_sharing() context is activated inside the contract function, temporarily switching to the shared backend, and setting the original backend foo in a global shared._CURRENT_BACKEND variable. The sharing backend then performs all the original foo operations and also memoizes them.

    One design choice is to memoize by id() rather than by value. This makes sense from a computational perspective (equality comparison is expensive), but requires a bit more care by users.

    A second design choice is to expose the memoization cache to users. This makes it easy to share intermediates by passing the cache around, as is necessary when it is difficult to perform all computation in a context:

    with shared_intermediates() as cache:
        contract(...)
    ...pass control elsewhere...
    with shared_intermediates(cache):  # <-- reuse previous cache
        contract(...)
    del cache  # <-- intermediates will be garbage collected
    

    Todos

    • [x] Cache type conversions e.g. numpy->torch->numpy so args can be compared by id.
    • [x] fix failing tests
    • [x] add tests that pass around the cache
    • [x] add tests where we tests the normal ordering:
      with shared_intermeidates() as cache:
          a = np.random.rand(4, 4)
          b = np.random.rand(4, 4)
          contract("ab,bc->ac", a, b)
      
      assert get_cache_key("zc,az->ac", b, a) in cache
      ...
      
    • [x] test against the Pyro rm-einsum-shared branch
    • [x] test nested sharing
    • [x] write docs
    • [x] move _alpha_canonicalize() to parser.py
    • [x] test caching with constant expressions

    Questions

    • [x] Could a maintainer suggest where the handle_sharing() context should be moved? I've attempted to insert it into ContractExpression.__call__(), but I'm a bit lost.

    Status

    • [x] Ready to go
    enhancement 
    opened by fritzo 16
  • Re-use intermediates across various contractions (Common Subexpression Elimination)

    Re-use intermediates across various contractions (Common Subexpression Elimination)

    Suppose you want to compute two contractions of the same tensors, e.g. contraction strings

    ['ab,dca,eb,cde', 'ab,cda,eb,cde']

    The (globally) optimal way to do this would be to first perform the contractions over indices a,b and e, and then perform the remaining contractions over c and d for the two sets of contractions. The current opt_einsum implementation does not allow for such a global optimization of contraction order and re-use of common intermediates.

    I'm still organizing my thoughts on this, and all input would be most welcome. On a side note, I believe implementing such a more general optimization strategy will also fix #7 as a by-product.

    Optimization logic

    A relevant publication suggesting a concrete algorithm for this optimization problem is

    Hartono et al, Identifying Cost-Effective Common Subexpressions to Reduce Operation Count in Tensor Contraction Evaluations

    I do not know to what extent the current code can be re-used in this more general setup, but the single-term optimization should be under control with the current implementation.

    Interface

    Such a multi_contract function could be called with a list of tuples (contractionString, [tensors, ..]), and would return a list of results with the same length as the input list.

    Internally, one would have to find out which tensors are actually identical between the various contractions, and then use the above contraction logic. Ideally this information should be deduced automatically and not rely on user input being in the correct form. In the same spirit, dummy indices should be allowed to have arbitrary names, i.e. they should not have to match across different contractions to be correctly identified as a common subexpression. This may require transforming the input into a 'canonical' form first to make sure that common subexpressions are captured correctly.

    In contrast to the setup outlined in Hartono et al, contraction strings should maybe remain in their current 'simple' form and not be generalized to allow for numerical expressions like sums of tensors etc. Such a behavior can be implemented a posteriori with the interface described here by computing the sum of the resulting tensors, e.g.

    contract('ab,ab + ab,ba', N,M) --> 'sum'(multi_contract( [('ab,ab', N,M), ('ab,ba', N,M)] ))

    Thus, restricting contraction strings to be of the form currently used does not cause loss of generality, the only downside being that it might lead to a slightly increased memory-footprint as the function would return several arrays instead of one.

    Other thoughts?

    enhancement 
    opened by ebatz 15
  • Support more symbols for subscripts (#67)

    Support more symbols for subscripts (#67)

    • Add more symbol support based on hash

    • Refactor the converting to subscripts routine a little bit

    • Update tests

    Description

    Make opt_einsum support anything hashable as subscripts, such as:

    contract(arr1, ['left edge', 'bond'], arr2, ['bond', 'right edge'])
    

    See #67

    Questions

    • [x] I've tried to add the new feature to the document, but it seems the document does not have a chapter describing the inputs. Maybe I've left out something or maybe this should be solved in a different PR?

    Status

    • [x] Ready to go
    enhancement 
    opened by liwt31 14
  • non-numpy (e.g. GPU) backend support

    non-numpy (e.g. GPU) backend support

    This adds non-numpy backend support to opt_einsum in two flavors:

    General duck-typed support

    For arbitrary ndarray libraries, i.e. if the arrays supplied have got a .shape attribute and opt_einsum can find {backend}.tensordot etc then it should work, returning whatever type the library uses. So for example

    contract("abc,bcd,def->aef", *dask_arrays, backend='dask')
    

    will return a dask.Array without any explicit logic. I've added tests for dask and also sparse.

    Edit: I should add that extra functionality like taking diagonals etc is still reliant on a einsum implementation.

    Special conversion support

    To support GPU operations on numpy arrays, here, on first call the arrays are converted to the correct placeholder, an expression generated using these, and then this expression can be called with normal numpy arrays, returning numpy arrays also. E.g.:

    expr = contract_expression("abc,bcd,def->aef", *shapes)
    expr(*numpy_arrays, backend='theano')
    

    Explicit tested support here is for tensorflow, theano and cupy - GPU libraries where it makes sense to convert to and from.

    Adding more backends

    contract.py doesn't have any explicit reference to the various backends, thats all contained in backends.py now. So it should be very easy to add new backends. E.g. if tblis were to add tensordot, it would fall under the duck-typing category above and I think no changes would actually be needed at all.

    Obviously any recommendations welcome! Getting the right backend function and all that is cached so I don't think this has added any more than a few 100ns overhead.

    opened by jcmgray 14
  • `Optimal` path not optimal?

    `Optimal` path not optimal?

    First, thanks for the great package!

    Was running optimal vs. auto on a 10 element random expression, and while I know optimal is not recommended for larger networks, I found it odd that it gives worse scaling and with a larger intermediate size. Some of the other threads mention memory as a factor in the optimization, but the auto solution seems better on memory as well? All the tensors are 3x3. Is this intentional behaviour?

    auto path:

    ([(2, 3), (4, 8), (0, 4), (0, 4), (4, 5), (0, 2), (2, 3), (1, 2), (0, 1)],
       Complete contraction:  db,cc,fe,fe,aa,ff,fe,cb,ea,ac->d
              Naive scaling:  6
          Optimized scaling:  3
           Naive FLOP count:  7.290e+3
       Optimized FLOP count:  2.700e+2
        Theoretical speedup:  27.000
       Largest intermediate:  9.000e+0 elements
     --------------------------------------------------------------------------------
     scaling        BLAS                current                             remaining
     --------------------------------------------------------------------------------
        2              0              fe,fe->fe         db,cc,aa,ff,fe,cb,ea,ac,fe->d
        2              0              fe,fe->fe            db,cc,aa,ff,cb,ea,ac,fe->d
        3           GEMM              cb,db->cd               cc,aa,ff,ea,ac,fe,cd->d
        2              0              ac,cc->ac                  aa,ff,ea,fe,cd,ac->d
        3           GEMM              ac,cd->ad                     aa,ff,ea,fe,ad->d
        2              0              ea,aa->ea                        ff,fe,ad,ea->d
        3           GEMM              ea,ad->ed                           ff,fe,ed->d
        3           GEMM              ed,fe->df                              ff,df->d
    

    optimal path:

    ([(1, 7), (0, 8), (0, 2), (0, 1), (0, 5), (0, 3), (0, 3), (0, 2), (0, 1)],
       Complete contraction:  db,cc,fe,fe,aa,ff,fe,cb,ea,ac->d
              Naive scaling:  6
          Optimized scaling:  4
           Naive FLOP count:  7.290e+3
       Optimized FLOP count:  5.130e+2
        Theoretical speedup:  14.211
       Largest intermediate:  2.700e+1 elements
     --------------------------------------------------------------------------------
     scaling        BLAS                current                             remaining
     --------------------------------------------------------------------------------
        2              0              cb,cc->cb         db,fe,fe,aa,ff,fe,ea,ac,cb->d
        3           GEMM              cb,db->cd            fe,fe,aa,ff,fe,ea,ac,cd->d
        3              0             aa,fe->afe              fe,ff,fe,ea,ac,cd,afe->d
        2              0              ff,fe->fe                 fe,ea,ac,cd,afe,fe->d
        2              0              fe,fe->fe                    ea,ac,cd,afe,fe->d
        3              0            afe,ea->afe                       ac,cd,fe,afe->d
        4           GEMM            afe,ac->fec                          cd,fe,fec->d
        4           GEMM            fec,cd->fed                             fe,fed->d
        3           GEMM              fed,fe->d                                  d->d) 
    
    bug 
    opened by Bonnevie 13
  • [WIP] allow constant tensors in expressions

    [WIP] allow constant tensors in expressions

    This allows constant tensors to be supplied to contract_expression, which will then build as many constant intermediaries as it can. Still a bit of a work in progress but the core functionality is working so thought I'd share. Resolves #7.

    Example

    Basic setup:

    >>> from opt_einsum import contract_expression
    >>> from numpy.random import rand
    
    >>> equation = 'ij,jk,kl,lm,mn->ni'
    >>> const_ops = rand(2, 2), rand(2, 2), rand(2, 2)
    >>> ops = (3, 2), *const_ops, (2, 3)  # non-const ops are just shapes
    
    >>> expr = contract_expression(equation, *ops, constants=[1, 2, 3])
    >>> expr
    <ContractExpression('ij,[jk,kl,lm],mn->ni', constants=[1, 2, 3])>
    
    >>> print(expr)
    <ContractExpression('ij,[jk,kl,lm],mn->ni', constants=[1, 2, 3])>
      1.  'jm,ij->mi' [GEMM]
      2.  'mi,mn->ni' [GEMM]
    

    Now we can call it with just two arrays:

    >>> expr(rand(3, 2), rand(2, 3))
    array([[0.10982582, 0.06334146, 0.09379349],
           [0.09159049, 0.05282374, 0.07821902],
           [0.15775775, 0.09097925, 0.13471551]])
    
    >>> expr(rand(3, 2), rand(2, 3))
    array([[0.19207156, 0.21354127, 0.04811443],
           [0.01633236, 0.01815862, 0.004091  ],
           [0.06079124, 0.06758219, 0.0152304 ]])
    

    Notes

    • Currently this just performs as many contractions in the contraction_list as possible, stopping when it reaches a non-constant array. This means that some potential constant contractions might still be missed even though they don't depend on anything (e.g. 'a,a,b,b' with constants=[0, 1] might still choose to contract b,b first). One solution would be to somehow reshuffle the contraction_list but this gets a bit messy.
    • How do we feel about the repr?

    TODO

    • [x] ~~Sort out interaction with backend (need to not cache expression on first constants pass, and deal with various conversion points (e.g. keep constants on GPU)~~
    • [x] Document
    • [ ] Add edge case tests? Additional errors to help user?
    opened by jcmgray 13
  • shared_intermediates and contraction trees

    shared_intermediates and contraction trees

    The shared_intermediates example in the docs is to calculate multiple marginals, which is a use case of interest to me. Calculating marginals is equivalent to calculating environment tensors, so I'm interested in using the approach from https://arxiv.org/abs/1310.8023 which basically guarantees that calculating all marginals has the same cost as 3x the cost of calculating 1 marginal by using extensive memoization. This is ensured by making each contraction path a member of the same contraction tree family, with each tree node at worst corresponding to 3 different contractions that can be appropriately memoized. This function is called multienv() in various tensor network packages.

    Here is a rough but working prototype of a ContractionTree class that can generate the various contraction paths by calls to get_environment_path.

    Contraction tree prototype
    class ContractionTree:
        """Interface for finding contraction paths for each possible environment tensor with high cache reuse.
    
        Based on "Improving the efficiency of variational tensor network algorithms" by Evenbly and Pfeifer. Returned
        environment paths stay in same contraction tree family.
        """
    
        def __init__(self, tensors, optimize='greedy'):
            operands = chain.from_iterable([(tensor.tensor, tensor.axis_names) for tensor in tensors])
            self.partition_path, _ = contract_path(*operands, (), optimize=optimize)
            self.contraction_tree = nx.DiGraph()
            for index, tensor in enumerate(tensors):
                self.contraction_tree.add_node(tensor, index=index)
            self.tensors = tensors
            self.cache = HitDict()
    
            active_nodes = [tensor for tensor in tensors]
            for path_step in self.partition_path:
                children = []
                for index in sorted(path_step, reverse=True):
                    children.append(active_nodes.pop(index))
                active_nodes.append(frozenset(children))
                for child in children:
                    self.contraction_tree.add_edge(child, active_nodes[-1])
    
        def get_environment_path(self, marginal_tensor):
            """Returns opt_einsum path for computing the environment tensor of `marginal_tensor`"""
            environment_tree = self.contraction_tree.copy()
    
            def redirect_edges(node, child, visited):
                """recursive function that redirects all edges towards initial node passed"""
                visited = visited + [node]
                out_edges = list(environment_tree.out_edges(node))
                for start, end in out_edges:
                    if child is None or end != child:
                        environment_tree.remove_edge(start, end)
                        environment_tree.add_edge(end, start)
                    if end not in visited:
                        redirect_edges(end, node, visited)
                in_edges = list(environment_tree.in_edges(node))
                for start, end in in_edges:
                    if start not in visited:
                        redirect_edges(start, node, visited)
    
            #  redirect and remove redundant nodes
            redirect_edges(marginal_tensor, None, [marginal_tensor])
            nodes = list(environment_tree.nodes)
            for node in nodes:
                in_edges = environment_tree.in_edges(node)
                out_edges = environment_tree.out_edges(node)
                if len(out_edges) and len(in_edges) == 1:
                    origin = list(in_edges)[0][0]
                    for out_edge in out_edges:
                        environment_tree.add_edge(origin, out_edge[1])
                    environment_tree.remove_node(node)
    
            #  maintain dict mapping tensor -> index position in list of tensors
            positions = {tensor: index for index, tensor in enumerate(self.tensors)}
            positions = {tensor: position - (positions[marginal_tensor] < position)
                         for tensor, position in positions.items()}
            del positions[marginal_tensor] #  assume the query tensor is removed from input string
    
            #  rebuild path from reoriented contraction tree
            path = []
            active = len(self.tensors) - 1
            for node in nx.topological_sort(environment_tree):
                children = [start for start, end in environment_tree.in_edges(node)]
                if len(children) > 1:
                    path.append(tuple(positions[tensor] for tensor in children))
                    positions = {tensor: position - sum(positions[removed_tensor] < position for removed_tensor in children)
                                 for tensor, position in positions.items() if tensor not in children}
                    active = active - len(children) + 1
                    positions[node] = active - 1
            return path
    

    My issue is

    1. using the HitDict class proposed for counting cache hits it seems that using these paths gives me 0 cache hits?? I am using jax, but tried with numpy as well.
    2. Is shared_intermediates capable of reusing high-order intermediates as found in the contraction tree?

    I am calling it using interleaved input (I am using a wrapper of tensor with an axis_names attribute):

        def get_environment(self, tensors, environment_of, optimize='auto', **kwargs):
            environment_tensors = [tensor for tensor in tensors if tensor != environment_of]
            operands = chain.from_iterable([(tensor.tensor, tensor.axis_names) for tensor in environment_tensors])
            return oe.contract(*operands, environment_of.axis_names, optimize=optimize, **kwargs)
        
        def marginals(all_tensors, marginal_tensors):
            tree = ContractionTree(all_tensors)
            marginals = []
            for marginal_tensor in marginal_tensors:
                path = tree.get_environment_path(marginal_tensor)
                with shared_intermediates(tree.cache):
                    marginals.append(get_environment(all_tensors, marginal_tensor, optimize=path))
        return marginals, tree.cache
                
    

    I realize this might be outside the scope of opt_einsum, but it also seems like this would be a valuable extension of shared_intermediates - or it might already be supported via the current caching mechanism and there's just a bug in my implementation :)

    opened by Bonnevie 12
  • v3.0 Planning

    v3.0 Planning

    We are planning a major version bump for the following reasons:

    • Transition to be backend agnostic (NumPy is no longer the principle backend).
      • See #81 which has the (small) potential to break current implementations.

    TODO Items:

    • [ ] Refactoring in preparation for NEP18 (#46) should be considered.
    • [ ] Consider adding a Ricci calculus parser contract('A_ij B_jk A_kl', {"A": a, "B": b}).
    • [x] Run through the backends to check where we can refactor.
    • [x] Update docs and README to reflect that we are NumPy agnostic.
    • [ ] Consider parsers that use metrics besides raw FLOPS (actually trying out several contraction paths for example).

    Are there other items where we might need to do a few (small) compatibility breaks that would be good for future sustainability?

    release 
    opened by dgasmith 12
  • Contigugous loss after opt-einsum that may affect the efficiency of subsequent operations

    Contigugous loss after opt-einsum that may affect the efficiency of subsequent operations

    After a tensor contraction, the resulting tensor from numpy.einsum is C-contiguous, and in opt_einsum, such properties may be lost. If there is a further tensor addition, the non-contiguous array may be slower. Is there any solution to let the resulting tensor from opt_einsum be still C-contiguous? If not, is there any option to specify the resulting tensor to be C-contiguous, while letting the searching of the path within C-contiguous constrain?

    For example, the following code

    import numpy as np
    import opt_einsum as oe
    import time
    
    
    na = 5
    nb = 5
    nc = 8
    nd = 8
    ne = 2
    nf = 1
    ng = 8
    nh = 8
    ni = 8
    nj = 8
    
    dim_a = (na,nb,nc,nd,ne,nf)
    dim_b = (ne,nf,ng,nh,ni,nj)
    dim_c = (na,nb,nc,nd,ng,nh,ni,nj)
    n_times = 20
    
    contraction = 'abcdef,efghij -> abcdghij'
    a = np.random.random(dim_a)
    b = np.random.random(dim_b)
    c = np.einsum(contraction, a, b)
    d = oe.contract(contraction, a, b)
    
    sum_array = np.zeros(dim_c)
    print(a.flags)
    print(b.flags)
    print(c.flags)
    print(d.flags)
    
    print('list')
    list_a = []
    list_b = []
    list_c = []
    for n in range(n_times):
        a = np.random.random(dim_a)
        b = np.random.random(dim_b)
        list_a.append(a)
        list_b.append(b)
    
    
    print('np')
    
    start = time.time()
    for n in range(n_times):
        list_c.append( np.einsum(contraction, list_a[n], list_b[n]) )
    print(time.time() - start) 
    
    sum_array = np.zeros(dim_c)
    start = time.time()
    for n in range(n_times):
        np.add(sum_array, list_c[n], out = sum_array)
    print(time.time() - start) 
    
    
    print('oe')
    opt_path = oe.contract_expression(contraction, a.shape, b.shape, optimize = "optimal") 
    
    list_a = []
    list_b = []
    list_c = []
    for n in range(n_times):
        a = np.random.random(dim_a)
        b = np.random.random(dim_b)
        list_a.append(a)
        list_b.append(b)
    
    
    start = time.time()
    for n in range(n_times):
       list_c.append(opt_path(list_a[n], list_b[n]))
    print(time.time() - start) 
    
    #start = time.time()
    #for n in range(n_times):
    #    list_c[n] = np.ascontiguousarray(list_c[n])
    #print(time.time() - start) 
    
    
    
    sum_array = np.zeros(dim_c)
    start = time.time()
    for n in range(n_times):
      #  list_c[n] = np.ascontiguousarray(list_c[n])
        np.add(sum_array, list_c[n], out = sum_array)
    print(time.time() - start) 
    

    The result is

      C_CONTIGUOUS : True
      F_CONTIGUOUS : False
      OWNDATA : True
      WRITEABLE : True
      ALIGNED : True
      WRITEBACKIFCOPY : False
    
      C_CONTIGUOUS : True
      F_CONTIGUOUS : False
      OWNDATA : True
      WRITEABLE : True
      ALIGNED : True
      WRITEBACKIFCOPY : False
    
      C_CONTIGUOUS : True
      F_CONTIGUOUS : False
      OWNDATA : True
      WRITEABLE : True
      ALIGNED : True
      WRITEBACKIFCOPY : False
    
      C_CONTIGUOUS : False
      F_CONTIGUOUS : False
      OWNDATA : False
      WRITEABLE : True
      ALIGNED : True
      WRITEBACKIFCOPY : False
    
    list
    np
    1.9423425197601318
    0.32601356506347656
    oe
    1.6317214965820312
    1.2006006240844727
    

    which the np.add part is much slower in opt_einsum than einsum. If I use np.ascontiguousarray to let the resulting array be C-contiguous, this step itself is slow.

    opened by Geositta2000 7
  • [WIP] add a preprocessor

    [WIP] add a preprocessor

    Description

    This adds a potential preprocessor that takes an equation and converts it into a minimal form with the following simplifications:

    • All indices which only appear on a single input (and not the output) are summed over.
    • All indices which appear multiple times on the same term are traced.
    • All scalars are multiplied into the smallest other term
    • All terms with the same indices are multiplied (hamadard product / elementwise) into a single term ('de-duplication').

    along with a function, that transforms input arrays into the inputs for the new simplified eq. It is a function so that it could be part of a contract 'expression'.

    This would address (#114, #99, #112, #167, #189, ...).

    Example from the docstring:

    eq = ',ab,abee,,cd,cd,dd->ac'
    arrays = helpers.build_views(eq)
    new_eq, simplifier = make_simplifier(eq)
    new_eq
    # 'ab,cd,d->ac'
    sarrays = simplifier(arrays)
    oe.contract(new_eq, *sarrays)
    # array([[0.65245661, 0.14684493, 0.42543411, 0.32350895],
    #        [0.38357005, 0.08632807, 0.25010672, 0.19018636]])
    

    Todos

    Notable points that this PR has either accomplished or will accomplish.

    • [ ] when and how should this be called?
    • [ ] return a partial path rather than a 'simplifier'?
    • [ ] should shape information (size_dict) be used and propagated for use with a contraction expression (and so that the smallest tensor to multiply scalars into can be most accurately chosen)?
    • [ ] other simplifications are possible (e.g. hadamard deduplication up to transpose, performing all non-rank increasing contractions)...
    • [x] ~currently the parsing is a single sweep, should it iteratively sweep til no more changes? E.g. currently examples like ab,ab-> transform to ab-> but if processed again could transform to ->~

    Status

    • [ ] Ready to go
    opened by jcmgray 1
  • Disable torch.backends.opt_einsum to avoid duplicate work

    Disable torch.backends.opt_einsum to avoid duplicate work

    Description

    Recently, torch.einsum has improved to automatically optimize for multi-contractions if the opt_einsum library is installed. This way, torch users can reap benefits easily. However, this change may inadvertently cause regressions for those who use opt_einsum.contract with torch tensors. While these use cases can be improved by just changing opt_einsum.contract calls to torch.einsum, it is not right to cause silent regressions.

    Consider the following example:

    import torch
    import opt_einsum
    
    A = torch.rand(4, 5)
    B = torch.rand(5, 6)
    C = torch.rand(6, 7) 
    D = opt_einsum.contract('ij,jk,kl->il', A, B, C)
    

    Looking through the code, opt_einsum.contract will figure out an ideal path + return that path in the form of contraction tuples (commonly pairs). Then, for each of the contraction tuples, opt_einsum may call BACK into the torch backend and call torch.einsum.

    Since the contractions are commonly pairs, calling back into torch.einsum will do what it used to--just directly do the contraction.

    HOWEVER. There are cases where the contractions are not necessarily pairs (@dgasmith had mentioned that hadamard products are faster when chained vs staggered in pairs) and in this case, torch.einsum will do unnecessary work in recomputing an ideal path, which is a regression from the previous state.

    This PR simply asks "hey, does the opt_einsum backend exist in torch?" If so, this means torch.einsum will do the unnecessary work in recomputing an ideal path, so let's just turn it off.

    Todos

    Notable points that this PR has either accomplished or will accomplish.

    • [x] Add the code to disable torch's opt_einsum optimization
    • [ ] Test cases?

    Questions

    • [ ] Unsure how to best test this/what use case would ensure that, without this code, the path would be computed more than once, and after this code, the path would only be computed at the top level. We could just test that torch.backends.opt_einsum.enabled is globally False, I guess.

    Status

    • [ ] Ready to go
    opened by janeyx99 8
  • Remove NumPy as a hard dependency

    Remove NumPy as a hard dependency

    Description

    This PR removes NumPy from opt_einsum as a runtime dependency. The change has little effect on the opt_einsum code base but significantly effects the testing infrastructure. Several testing strategies are currently employed to help understand the most effective one. Currently, these are:

    • Move all NumPy functions to a testing.py module with pytest.skip
    • Various pytest.skip functions wrapped in decorators
    • Local NumPy imports.

    Feedback welcome!

    Todos

    Notable points that this PR has either accomplished or will accomplish.

    • [x] Remove unused custom tensordot code
    • [ ] Remove import numpy from possibly_convert_to_numpy
    • [ ] Fix ssa_to_linear with a proper implementation

    Questions

    • [ ] Question1

    Status

    • [ ] Ready to go
    opened by dgasmith 3
  • Thoughts on making the numpy dependency optional?

    Thoughts on making the numpy dependency optional?

    Proposal

    Make the numpy dependency optional, if possible.

    Why?

    Minimizing dependencies is a general goal as it allows a bigger audience to reap the benefits of this library. More specifically, some of us are interested in making opt_einsum a hard dependency for torch, but we would like to keep numpy unrequired. (If you're curious why torch does not have a hard dependency on numpy, see https://github.com/pytorch/pytorch/issues/60081, tl;dr being the last comment.)

    A hard dependency would mean all torch users would get the benefits of opt_einsum right away without thinking too hard/needing to manually install opt_einsum themselves.

    Alternatives

    We could also have torch vendor in opt_einsum, but that is increased complexity/maintenance + we would like to automatically subscribe to improvements in opt_einsum!

    opened by janeyx99 4
  • When I use opt_einsum optimizes torch.einum, the running time after optimization increases.

    When I use opt_einsum optimizes torch.einum, the running time after optimization increases.

    import numpy as np
    import time
    import torch
    from opt_einsum import contract
    
    dim = 4
    
    x = torch.randn(6 ,4 ,4, 4)
    w1 = torch.randn(1,4,dim)
    w2 = torch.randn(dim,4,dim)
    w3 = torch.randn(dim,4,dim)
    w4 = torch.randn(dim,8,dim)
    w5 = torch.randn(dim,8,dim)
    w6 = torch.randn(dim,4,1)
    
    def naive(x, w1, w2, w3, w4, w5, w6):
        return torch.einsum('bkxy,ikj,jxm,myf,fpl,lqz,zri -> bpqr', x, w1, w2, w3, w4, w5, w6)
    
    def optimized(x, w1, w2, w3, w4, w5, w6):
        return contract('bkxy,ikj,jxm,myf,fpl,lqz,zri -> bpqr', x, w1, w2, w3, w4, w5, w6)
    
    

    The respective running time:

    naive
    0.0005145072937011719
    
    optimized
    0.876018762588501
    

    I want to know what caused this.Thanks!

    opened by edwin-zft 5
Releases(v3.3.0)
  • v3.3.0(Jul 19, 2020)

    Adds a object backend for optimized contractions on arbitrary Python objects.

    New Features

    • (#145) Adds a object based backend so that contract(backend='object') can be used on arbitrary objects such as SymPy symbols.

    Enhancements

    • (#140) Better error messages when the requested contract backend cannot be found.
    • (#141) Adds a check with RandomOptimizers to ensure the objects are not accidentally reused for different contractions.
    • (#149) Limits the remaining category for the contract_path output to only show up to 20 tensors to prevent issues with the quadratically scaling memory requirements and the number of print lines for large contractions.
    Source code(tar.gz)
    Source code(zip)
  • v3.2.1(Apr 15, 2020)

    Bug Fixes

    • (#131) Fixes an einstein subscript error message to point to the correct value.
    • (#135) Lazily loads JAX similar to other backends.
    Source code(tar.gz)
    Source code(zip)
  • v3.2.0(Mar 3, 2020)

    Small fixes for the dp path and support for a new mars backend.

    New Features

    • (#109) Adds mars backend support.

    Enhancements

    • (#110) New auto-hq and 'random-greedy-128' paths.
    • (#119) Fixes several edge cases in the dp path.

    Bug fixes

    • (#127) Fixes an issue where Python 3.6 features are required while Python 3.5 is opt_einsum's stated minimum version.
    Source code(tar.gz)
    Source code(zip)
  • v3.1.0(Sep 30, 2019)

  • v3.0.1(Aug 22, 2019)

    Alters setup.py to correctly state that opt_einsum requires Python 3.5+. This will now correctly effect PyPI and other pip mirror downloads, v3.0.0 will be removed from PyPI to prevent further issues.

    Source code(tar.gz)
    Source code(zip)
  • v3.0.0(Aug 12, 2019)

    This release moves opt_einsum to be backend agnostic while adding support additional backends such as Jax and Autograd. Support for Python 2.7 has been dropped and Python 3.5 will become the new minimum version, a Python deprecation policy equivalent to NumPy's has been adopted.

    New Features

    • (#78) A new random-optimizer has been implemented which uses Boltzmann weighting to explore alternative near-minimum paths using greedy-like schemes. This provides a fairly large path performance enhancements with a linear path time overhead.
    • (#78) A new PathOptimizer class has been implemented to provide a framework for building new optimizers. An example is that now custom cost functions can now be provided in the greedy formalism for building custom optimizers without a large amount of additional code.
    • (#81) The backend="auto" keyword has been implemented for contract allowing automatic detection of the correct backend to use based off provided tensors in the contraction.
    • (#88) Autograd and Jax support have been implemented.
    • (#96) Deprecates Python 2 functionality and devops improvements.

    Enhancements

    • (#84) The contract_path function can now accept shape tuples rather than full tensors.
    • (#84) The contract_path automated path algorithm decision technology has been refactored to a standalone function.
    Source code(tar.gz)
    Source code(zip)
  • v2.3.2(Dec 9, 2018)

  • v2.3.1(Dec 1, 2018)

  • v2.3.0(Dec 1, 2018)

    This release primarily focuses on expanding the suite of available path technologies to provide better optimization characistics for 4-20 tensors while decreasing the time to find paths for 50-200+ tensors. See Path Overview for more information.

    New Features:

    • (#60) A new greedy implementation has been added which is up to two orders of magnitude faster for 200 tensors.
    • (#73) Adds a new branch path that uses greedy ideas to prune the optimal exploration space to provide a better path than greedy at sub optimal cost.
    • (#73) Adds a new auto keyword to the opt_einsum.contract path option. This keyword automatically chooses the best path technology that takes under 1ms to execute.

    Enhancements:

    • (#61) The opt_einsum.contract path keyword has been changed to optimize to more closely match NumPy. path will be deprecated in the future.
    • (#61) The opt_einsum.contract_path now returns a opt_einsum.contract.PathInfo object that can be queried for the scaling, flops, and intermediates of the path. The print representation of this object is identical to before.
    • (#61) The default memory_limit is now unlimited by default based on community feedback.
    • (#66) The Torch backend will now use tensordot when using a version of Torch which includes this functionality.
    • (#68) Indices can now be any hashable object when provided in the "Interleaved Input" syntax.
    • (#74) Allows the default transpose operation to be overridden to take advantage of more advanced tensor transpose libraries.
    • (#73) The optimal path is now significantly faster.

    Bug fixes:

    • (#72) Fixes the "Interleaved Input" syntax and adds documentation.
    Source code(tar.gz)
    Source code(zip)
  • v2.2.0(Aug 29, 2018)

    New features:

    • (#48) Intermediates can now be shared between contractions, see here for more details.
    • (#53) Intermediate caching is thread safe.

    Enhancements:

    • (#48) Expressions are now mapped to non-unicode index set so that unicode input is support for all backends.
    • (#58) Adds tensorflow and theano with shared intermediates.

    Bug fixes:

    • (#41) PyTorch indices are mapped back to a small a-z subset valid for PyTorch's einsum implementation.
    Source code(tar.gz)
    Source code(zip)
  • v2.1.3(Aug 23, 2018)

  • v2.1.2(Aug 15, 2018)

  • v2.1.1(Aug 14, 2018)

  • v2.1.0(Aug 14, 2018)

    opt_einsum continues to improve its support for additional backends beyond NumPy with PyTorch.

    We have also published the opt_einsum package in the Journal of Open Source Software. If you use this package in your work, please consider citing us!

    New features:

    • PyTorch backend support
    • Tensorflow eager-mode execution backend support

    Enhancements:

    • Intermediate tensordot-like expressions are now ordered to avoid transposes.
    • CI now uses conda backend to better support GPU and tensor libraries.
    • Now accepts arbitrary unicode indices rather than a subset.
    • New auto path option which switches between optimal and greedy at four tensors.

    Bug fixes:

    • Fixed issue where broadcast indices were incorrectly locked out of tensordot-like evaluations even after their dimension was broadcast.
    Source code(tar.gz)
    Source code(zip)
  • v2.0.1(Jun 28, 2018)

    opt_einsum is a powerful tensor contraction order optimizer for NumPy and related ecosystems.

    New Features

    • Allows unlimited Unicode indices.
    • Adds a Journal of Open-Source Software paper.
    • Minor documentation improvements.
    Source code(tar.gz)
    Source code(zip)
  • v2.0.0(May 17, 2018)

    opt_einsum is a powerful tensor contraction order optimizer for NumPy and related ecosystems.

    New Features

    • Expressions can be precompiled so that the expression optimization need not happen multiple times.
    • The greedy order optimization algorithm has been tuned to be able to handle hundreds of tensors in several seconds.
    • Input indices can now be unicode so that expressions can have many thousands of indices.
    • GPU and distributed computing backends have been added such as Dask, TensorFlow, CUPy, Theano, and Sparse.

    Bug Fixes

    • A error effecting cases where opt_einsum mistook broadcasting operations for matrix multiply has been fixed.
    • Most error messages are now more expressive.
    Source code(tar.gz)
    Source code(zip)
  • 1.0(Oct 15, 2016)

    Official 1.0 release.

    Einsum is a very powerful function for contracting tensors of arbitrary dimension and index. However, it is only optimized to contract two terms at a time resulting in non-optimal scaling for contractions with many terms. Opt_einsum aims to fix this by optimizing the contraction order which can lead to arbitrarily large speed ups at the cost of additional intermediate tensors.

    Opt_einsum is also implemented into the np.einsum function as of NumPy v1.12.

    Source code(tar.gz)
    Source code(zip)
  • v0.2.0(Jul 30, 2016)

    A large step towards to a full 1.0 release. BLAS usage is now automatically applied to all operations. Future releases will be more careful with regard to views and needless data copying.

    Source code(tar.gz)
    Source code(zip)
  • v0.1.1(Mar 23, 2016)

Owner
Daniel Smith
Twitter: @dga_smith
Daniel Smith
Mae segmentation - Reproduction of semantic segmentation using masked autoencoder (mae)

ADE20k Semantic segmentation with MAE Getting started Install the mmsegmentation

97 Dec 17, 2022
Implementation for our ICCV2021 paper: Internal Video Inpainting by Implicit Long-range Propagation

Implicit Internal Video Inpainting Implementation for our ICCV2021 paper: Internal Video Inpainting by Implicit Long-range Propagation paper | project

202 Dec 30, 2022
Pytorch implementation of our paper LIMUSE: LIGHTWEIGHT MULTI-MODAL SPEAKER EXTRACTION.

LiMuSE Overview Pytorch implementation of our paper LIMUSE: LIGHTWEIGHT MULTI-MODAL SPEAKER EXTRACTION. LiMuSE explores group communication on a multi

Auditory Model and Cognitive Computing Lab 17 Oct 26, 2022
Code for AA-RMVSNet: Adaptive Aggregation Recurrent Multi-view Stereo Network (ICCV 2021).

AA-RMVSNet Code for AA-RMVSNet: Adaptive Aggregation Recurrent Multi-view Stereo Network (ICCV 2021) in PyTorch. paper link: arXiv | CVF Change Log Ju

Qingtian Zhu 97 Dec 30, 2022
ColBERT: Contextualized Late Interaction over BERT (SIGIR'20)

Update: if you're looking for ColBERTv2 code, you can find it alongside a new simpler API, in the branch new_api. ColBERT ColBERT is a fast and accura

Stanford Future Data Systems 637 Jan 08, 2023
RoBERTa Marathi Language model trained from scratch during huggingface 🤗 x flax community week

RoBERTa base model for Marathi Language (मराठी भाषा) Pretrained model on Marathi language using a masked language modeling (MLM) objective. RoBERTa wa

Nipun Sadvilkar 23 Oct 19, 2022
Pytorch Implementation of Continual Learning With Filter Atom Swapping (ICLR'22 Spolight) Paper

Continual Learning With Filter Atom Swapping Pytorch Implementation of Continual Learning With Filter Atom Swapping (ICLR'22 Spolight) Paper If find t

11 Aug 29, 2022
RAANet: Range-Aware Attention Network for LiDAR-based 3D Object Detection with Auxiliary Density Level Estimation

RAANet: Range-Aware Attention Network for LiDAR-based 3D Object Detection with Auxiliary Density Level Estimation Anonymous submission Abstract 3D obj

30 Sep 16, 2022
Syllabic Quantity Patterns as Rhythmic Features for Latin Authorship Attribution

Syllabic Quantity Patterns as Rhythmic Features for Latin Authorship Attribution Abstract Within the Latin (and ancient Greek) production, it is well

4 Dec 03, 2022
Code for the paper "VisualBERT: A Simple and Performant Baseline for Vision and Language"

This repository contains code for the following two papers: VisualBERT: A Simple and Performant Baseline for Vision and Language (arxiv) with a short

Natural Language Processing @UCLA 463 Dec 09, 2022
ANEA: Automated (Named) Entity Annotation for German Domain-Specific Texts

ANEA The goal of Automatic (Named) Entity Annotation is to create a small annotated dataset for NER extracted from German domain-specific texts. Insta

Anastasia Zhukova 2 Oct 07, 2022
Concept drift monitoring for HA model servers.

{Fast, Correct, Simple} - pick three Easily compare training and production ML data & model distributions Goals Boxkite is an instrumentation library

98 Dec 15, 2022
Scalable, event-driven, deep-learning-friendly backtesting library

...Minimizing the mean square error on future experience. - Richard S. Sutton BTGym Scalable event-driven RL-friendly backtesting library. Build on

Andrew 922 Dec 27, 2022
Code and project page for ICCV 2021 paper "DisUnknown: Distilling Unknown Factors for Disentanglement Learning"

DisUnknown: Distilling Unknown Factors for Disentanglement Learning See introduction on our project page Requirements PyTorch = 1.8.0 torch.linalg.ei

Sitao Xiang 24 May 16, 2022
(CVPR2021) Kaleido-BERT: Vision-Language Pre-training on Fashion Domain

Kaleido-BERT: Vision-Language Pre-training on Fashion Domain Mingchen Zhuge*, Dehong Gao*, Deng-Ping Fan#, Linbo Jin, Ben Chen, Haoming Zhou, Minghui

248 Dec 04, 2022
Python implementation of a live deep learning based age/gender/expression recognizer

TUT live age estimator Python implementation of a live deep learning based age/gender/smile/celebrity twin recognizer. All components use convolutiona

Heikki Huttunen 80 Nov 21, 2022
Unofficial implementation (replicates paper results!) of MINER: Multiscale Implicit Neural Representations in pytorch-lightning

MINER_pl Unofficial implementation of MINER: Multiscale Implicit Neural Representations in pytorch-lightning. 📖 Ref readings Laplacian pyramid explan

AI葵 51 Nov 28, 2022
GPU-accelerated Image Processing library using OpenCL

pyclesperanto pyclesperanto is a python package for clEsperanto - a multi-language framework for GPU-accelerated image processing. clEsperanto uses Op

17 Dec 25, 2022
Activity tragle - Google is tracking everything, we just look at it

activity_tragle Google is tracking everything, we just look at it here. You need

BERNARD Guillaume 1 Feb 15, 2022
Boundary IoU API (Beta version)

Boundary IoU API (Beta version) Bowen Cheng, Ross Girshick, Piotr Dollár, Alexander C. Berg, Alexander Kirillov [arXiv] [Project] [BibTeX] This API is

Bowen Cheng 177 Dec 29, 2022