A functional standard library for Python.

Last update: Jan 03, 2023

Overview

Toolz

A set of utility functions for iterators, functions, and dictionaries.

See the PyToolz documentation at https://toolz.readthedocs.io

LICENSE

New BSD. See License File.

Install

toolz is on the Python Package Index (PyPI):

pip install toolz

Structure and Heritage

toolz is implemented in three parts:

itertoolz, for operations on iterables. Examples: groupby, unique, interpose.

functoolz, for higher-order functions. Examples: memoize, curry, compose.

dicttoolz, for operations on dictionaries. Examples: assoc, update_in, merge.

These functions come from the legacy of functional languages for list processing. They interoperate well to accomplish common complex tasks.

Read our API Documentation for more details.

Example

This builds a standard wordcount function from pieces within toolz:

>>> def stem(word):
...     """ Stem word to primitive form """
...     return word.lower().rstrip(",.!:;'-\"").lstrip("'\"")

>>> from toolz import compose, frequencies
>>> from toolz.curried import map
>>> wordcount = compose(frequencies, map(stem), str.split)

>>> sentence = "This cat jumped over this other cat!"
>>> wordcount(sentence)
{'this': 2, 'cat': 2, 'jumped': 1, 'over': 1, 'other': 1}

Dependencies

toolz supports Python 3.5+ with a common codebase. It is pure Python and requires no dependencies beyond the standard library.

It is, in short, a lightweight dependency.

CyToolz

The toolz project has been reimplemented in Cython. The cytoolz project is a drop-in replacement for the Pure Python implementation. See CyToolz GitHub Page for more details.

Contributions Welcome

toolz aims to be a repository for utility functions, particularly those that come from the functional programming and list processing traditions. We welcome contributions that fall within this scope.

We also try to keep the API small to keep toolz manageable. The ideal contribution is significantly different from existing functions and has precedent in a few other functional systems.

Please take a look at our issue page for contribution ideas.

Community

See our mailing list. We're friendly.

Comments

Cython implementation of toolz

What do you think about having a Cython implementation of toolz that can be used as a regular C extension in CPython, or be cimport-ed by other Cython code?

I've been messing around with Cython lately, and I became curious how much performance could be gained by implementing toolz in Cython. I am almost finished with a first-pass implementation (it goes quickly when one doesn't try to fine-tune everything), and just have half of itertoolz left to do.

Performance increases of x2-x4 are common. Some perform even better (like x10), and a few are virtually the same. There is also less overhead when calling functions defined in Cython, which at times can be significant regardless of how things scale.

However, performance when called from Python isn't the only consideration. A common strategy used by the scientific, mobile, and game communities to increase performance of their applications is to convert Python code that is frequently run to Cython. Developing in Cython also tends to be very imperative. A Cython version of toolz will allow fast implementations to be used in other Cython code (via cimport) while facilitating a more functional style of programming.

Looking ahead, cython.parallel exposes OpenMP at a low level, which should allow for more efficient parallel processing.

Thoughts? Any ideas for a name? I am thinking coolz, because ctoolz and cytoolz sound like they are utilities for C or Cython code. I can push what I currently have to a repo once it has a name. Should this be part of pytoolz?

opened by eriknw 72
Join
Here is a semi-streaming Join function, analogous to SQL Join.

Join two sequences on common attributes

This is a semi-streaming operation. The LEFT sequence is fully evaluated and placed into memory. The RIGHT side is evaluated lazily and so can be arbitrarily large.

The following example joins quantities of sold fruit to the name of the quantity.

>>> names = [(1, 'one'), (2, 'two'), (3, 'three')]
>>> fruit = [('apple', 1), ('banana', 2), ('coconut', 2), ('orange', 1)]
>>> result = join(first, second, names, fruit, apply=lambda x, y: x + y)
>>> for row in result:
...     print(row)
(1, 'one', 'apple', 1)
(2, 'two', 'banana', 2)
(2, 'two', 'coconut', 2)
(1, 'one', 'orange', 1)
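
The semi-streaming behaviour described above can be sketched with the standard library alone. This is an illustration under assumptions, not the code from this pull request: the name semi_streaming_join and its argument order are hypothetical, chosen to mirror the example.

```python
from collections import defaultdict
from operator import itemgetter


def semi_streaming_join(leftkey, rightkey, leftseq, rightseq,
                        apply=lambda x, y: (x, y)):
    """Inner join: index the left side in memory, stream the right side."""
    index = defaultdict(list)
    for left in leftseq:
        # The whole LEFT sequence is read before any row is produced.
        index[leftkey(left)].append(left)
    for right in rightseq:
        # RIGHT is consumed one item at a time, so it may be arbitrarily large.
        for left in index.get(rightkey(right), ()):
            yield apply(left, right)


names = [(1, 'one'), (2, 'two'), (3, 'three')]
fruit = [('apple', 1), ('banana', 2), ('coconut', 2), ('orange', 1)]
rows = list(semi_streaming_join(itemgetter(0), itemgetter(1), names, fruit,
                                apply=lambda x, y: x + y))
```

Because only the left side is indexed, the rows come out in the order of the right-hand sequence.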
opened by mrocklin 46
Logical Operators

I found some logical predicate functions useful recently, so I added them to toolz to complement...complement. Does it seem reasonable? The implementation and testing here aren't necessarily the final result, just a proof-of-concept.

Also, I wasn't sure whether to call the new functions the imaginary verbs conjunct and disjunct (they're only nouns and adjectives in my dictionary) as per standard functional style or the clearer but longer conjunction and disjunction. Went with the latter for now.
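
As a rough idea of what such predicate combinators could look like, here is a minimal sketch. It is not the code from this pull request; the bodies below are an assumption about the intended behaviour, using the conjunction and disjunction names mentioned above.

```python
def conjunction(*predicates):
    """Return a predicate that is true only when every given predicate is true."""
    def combined(*args, **kwargs):
        return all(p(*args, **kwargs) for p in predicates)
    return combined


def disjunction(*predicates):
    """Return a predicate that is true when at least one given predicate is true."""
    def combined(*args, **kwargs):
        return any(p(*args, **kwargs) for p in predicates)
    return combined


def is_even(n):
    return n % 2 == 0


def is_positive(n):
    return n > 0


even_and_positive = conjunction(is_even, is_positive)
even_or_positive = disjunction(is_even, is_positive)
```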

opened by karansag 27
Add `toolz.sandbox.EqualityHashKey`

This builds upon the discussion and feedback from #166, which also has a faster (but harder to understand) implementation.

EqualityHashKey creates a hash key that uses equality comparisons between items, which may be used to create hash keys for otherwise unhashable types. The trade-offs for using this are discussed in the docstring. Additional usage cases would qualify as compelling reasons to promote EqualityHashKey out of the sandbox (imho).

@asmeurer, do you have any suggestions or additional cases where this would be applicable?

opened by eriknw 25

Keyword-only args breaks toolz.curry

Hey guys! I would expect the following behavior from toolz.curry:

>>> @toolz.curry
... def kwonly_sum(a, *, b=10):
...     return a + b
>>> b_is_five = kwonly_sum(b=5)  # actually raises an exception here
>>> b_is_five(5)
10  # what I want

The exception gives a suggested solution:

TypeError                                 Traceback (most recent call last)
/home/mtartre/.conda/envs/std/lib/python3.4/site-packages/toolz/functoolz.py in __call__(self, *args, **kwargs)
    218         try:
--> 219             return self._partial(*args, **kwargs)
    220         except TypeError:

TypeError: kwonly_sum() missing 1 required positional argument: 'a'

During handling of the above exception, another exception occurred:
ValueError                                Traceback (most recent call last)
<ipython-input-31-fd1daf64ecb7> in <module>()
----> 1 kwonly_sum(b=5)(5)

/home/mtartre/.conda/envs/std/lib/python3.4/site-packages/toolz/functoolz.py in __call__(self, *args, **kwargs)
    220         except TypeError:
    221             # If there was a genuine TypeError
--> 222             required_args = _num_required_args(self.func)
    223             if (required_args is not None and
    224                     len(args) + len(self.args) >= required_args):

/home/mtartre/.conda/envs/std/lib/python3.4/site-packages/toolz/functoolz.py in _num_required_args(func)
    116         return known_numargs[func]
    117     try:
--> 118         spec = inspect.getargspec(func)
    119         if spec.varargs:
    120             return None

/home/mtartre/.conda/envs/std/lib/python3.4/inspect.py in getargspec(func)
    934         getfullargspec(func)
    935     if kwonlyargs or ann:
--> 936         raise ValueError("Function has keyword-only arguments or annotations"
    937                          ", use getfullargspec() API which can support them")
    938     return ArgSpec(args, varargs, varkw, defaults)

ValueError: Function has keyword-only arguments or annotations, use getfullargspec() API which can support them

The issue is exactly the same in cytoolz. Apologies that I don't have the fix in a pull request; my firm requires legal approval for that.
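
The traceback recommends inspect.getfullargspec(). A small sketch of what it reports for the function above; how toolz should consume this information is not shown in the report, and required_positional is a hypothetical name used only for illustration.

```python
import inspect


def kwonly_sum(a, *, b=10):
    return a + b


# getfullargspec accepts keyword-only arguments, unlike the older getargspec.
spec = inspect.getfullargspec(kwonly_sum)

# Positional parameters without defaults must be supplied before calling.
required_positional = len(spec.args) - len(spec.defaults or ())
```

Here spec.args holds the positional names, while the keyword-only name and its default are reported separately in spec.kwonlyargs and spec.kwonlydefaults.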

opened by quantology 22

Add support for OrderedDicts

I like the Dicttoolz package, but for many of my use cases I need the deterministic behaviour of OrderedDict. Adapting Dicttoolz to return OrderedDict if all of its inputs are one is relatively straightforward. Is this something you would consider merging?
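
One way to read this proposal is sketched below. merge_preserving_type is a hypothetical helper, not a toolz function, and it assumes the rule stated above: return an OrderedDict only when every input is one.

```python
from collections import OrderedDict


def merge_preserving_type(*dicts):
    """Merge left to right; return an OrderedDict only if every input is one."""
    if dicts and all(isinstance(d, OrderedDict) for d in dicts):
        result = OrderedDict()
    else:
        result = {}
    for d in dicts:
        result.update(d)
    return result


ordered = merge_preserving_type(OrderedDict([('b', 1), ('a', 2)]),
                                OrderedDict([('c', 3)]))
plain = merge_preserving_type(OrderedDict([('b', 1)]), {'a': 2})
```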

opened by bartvm 22
Tracing

Woah!

I was curious what it would be like to trace the input and output of toolz functions and user-defined functions. As a proof-of-concept, I created this branch:

https://github.com/eriknw/toolz/tree/trace_with_q

Simply do from toolz.traced import * and voilà! In another terminal, watch the output in real time via tail -f /tmp/toolz.

To trace a user function use trace as a decorator or function.

The results are astounding. I would paste example traces here, but I think you guys have got to try this out yourself.

q was copied from https://github.com/zestyping/q and was slightly modified to output to "/tmp/toolz" instead of "/tmp/q".

As I said above, this was meant as a proof-of-concept. It raises the question, though, of whether such functionality should be added to toolz, how it should behave, etc. Tracing can be very handy for debugging and as an educational tool for new users.

If you encounter any bugs in the above branch, please post here.

Thoughts and reactions?

opened by eriknw 22
ENH: Adds excepts
The idea is to use exception-based API functions alongside your normal functional code.

For example:

map(itemgetter('key'), seq) -> map(excepts(itemgetter('key'), KeyError), seq)

This helps us get around the fact that I cannot put an except clause in a lambda. I have found this to be very useful in my own code.

Most of this code is for fresh __name__ and __doc__ attributes.
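
A minimal sketch of the idea, assuming the argument order used in the example above (function first, exception second). The name excepts_sketch and the default parameter are hypothetical, and the __name__ and __doc__ handling mentioned above is left out.

```python
from operator import itemgetter


def excepts_sketch(func, exc, default=None):
    """Wrap func so that raising exc returns default instead of propagating."""
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except exc:
            return default
    return wrapper


seq = [{'key': 1}, {'other': 2}, {'key': 3}]
values = list(map(excepts_sketch(itemgetter('key'), KeyError), seq))
```

Any exception other than exc still propagates, so unrelated bugs are not silenced.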
opened by llllllllll 21
Faster groupby!
Issue #178 impressed upon me just how costly attribute resolution can be. In this case, groupby was made faster by avoiding resolving the attribute list.append.

This implementation is also more memory efficient than the current version, which uses a defaultdict that gets cast to a dict. While casting a defaultdict d to a dict as dict(d) is fast, it is still a copy.

Honorable mention goes to the following implementation:

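import collections

def groupby_alt(func, seq):
    # Each missing key gets the bound append method of a fresh list
    d = collections.defaultdict(lambda: [].append)
    for item in seq:
        d[func(item)](item)
    # Recover each list from its bound method via __self__
    rv = {}
    for k, v in d.items():
        rv[k] = v.__self__
    return rv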

This alternative implementation can at times be very impressive. You should play with it!
opened by eriknw 20
Smarter wrapper behavior in functoolz.curry and functoolz.memoize

Using update_wrapper and wraps would be preferable; however, they both cause errors: pickling issues in curry, and attribute errors when memoizing a partial in Python 2. Still, more than just __name__ and __doc__ should be transferred: __module__ if present, and __qualname__ and __annotations__ in Python 3. Updating with func.__dict__ isn't possible in curry (it is the source of the pickling problems), but should be done in memoize.

opened by justanr 19

Remove positional arg "func" from curry.__init__

It conflicted with kwargs['func'].

example.py:

from toolz import curry
@curry
def foo(x, y, func=int, bar=str):
    return str(func(x*y))
foo(bar=float)(4.2, 3.8, func=round)
foo(func=int)(4.2, 3.8, bar=str)

The last line would throw TypeError: __init__() got multiple values for keyword argument 'func' because curry.__init__(self, func, *args, **kwargs) names its first positional argument "func". This effectively prevented creating a curry object with such a kwarg. I didn't find other functions with the same problem.

> /home/digenis/src/toolz/toolz/example.py(1)<module>()
-> from toolz import curry
(pdb) c
Traceback (most recent call last):
 ...
  File "/home/digenis/src/toolz/toolz/functoolz.py", line 224, in __call__
    return curry(self._partial, *args, **kwargs)
TypeError: __init__() got multiple values for keyword argument 'func'
...
> /home/digenis/src/toolz/toolz/functoolz.py(224)__call__()
-> return curry(self._partial, *args, **kwargs)
(pdb) self._partial  # __init__ will receive this as the positional argument "func"
<functools.partial object at 0x7f16874ee0a8>
(pdb) pp args
()
(pdb) pp kwargs  # but it will also receive a kwarg named "func"
{'func': <type 'int'>}


Post mortem debugger finished. The example.py will be restarted
...

I abbreviated some line ranges with "..."
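
The collision can be reproduced without toolz. The two classes below are hypothetical stand-ins for curry, written only to contrast a named first parameter with one taken from *args; they are a sketch of the general technique, not the patch itself.

```python
from functools import partial


class NamedFirstArg:
    """Mimics the signature described above: the first parameter is named func."""
    def __init__(self, func, *args, **kwargs):
        self._partial = partial(func, *args, **kwargs)


class PositionalFirstArg:
    """Takes the wrapped callable from *args so no keyword can collide with it."""
    def __init__(self, *args, **kwargs):
        if not args:
            raise TypeError("expected a callable as the first positional argument")
        func, args = args[0], args[1:]
        self._partial = partial(func, *args, **kwargs)


def foo(x, y, func=int, bar=str):
    return str(func(x * y))


try:
    NamedFirstArg(foo, func=round)   # func is bound twice: positionally and by keyword
    collided = False
except TypeError:
    collided = True

fixed = PositionalFirstArg(foo, func=round)
result = fixed._partial(4.2, 3.8)
```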

opened by Digenis 19

Bump pypa/gh-action-pypi-publish from 1.5.0 to 1.6.4
Bumps pypa/gh-action-pypi-publish from 1.5.0 to 1.6.4.

Release notes

Sourced from pypa/gh-action-pypi-publish's releases.

v1.6.4

oh, boi! again?

This is the last one tonight, promise! It fixes this embarrassing bug that was actually caught by the CI but got overlooked due to the lack of sleep. TL;DR GH passed $HOME from the external env into the container and that tricked the Python's site module to think that the home directory is elsewhere, adding non-existent paths to the env vars. See #115.

Full Diff: https://github.com/pypa/gh-action-pypi-publish/compare/v1.6.3...v1.6.4

v1.6.3

Another Release!? Why?

In pypa/gh-action-pypi-publish#112, it was discovered that passing a $PATH variable even breaks the shebang. So this version adds more safeguards to make sure it keeps working with a fully broken $PATH.

Full Diff: https://github.com/pypa/gh-action-pypi-publish/compare/v1.6.2...v1.6.3

v1.6.2

What's Fixed

Made the $PATH and $PYTHONPATH environment variables resilient to broken values passed from the host runner environment, which previously allowed the users to accidentally break the container's internal runtime as reported in pypa/gh-action-pypi-publish#112

Internal Maintenance Improvements

Added a devpi-based smoke-test GitHub Actions CI/CD workflow by @sesdaile-varmour in pypa/gh-action-pypi-publish#111

New Contributors

@sesdaile-varmour made their first contribution in pypa/gh-action-pypi-publish#111

Full Diff: https://github.com/pypa/gh-action-pypi-publish/compare/v1.6.1...v1.6.2

v1.6.1

What's happened?!

There was a sneaky bug in v1.6.0 which caused Twine to be outside the import path in the Python runtime. It is fixed in v1.6.1 by updating $PYTHONPATH to point to a correct location of the user-global site-packages/ directory.

Full Diff: https://github.com/pypa/gh-action-pypi-publish/compare/v1.6.0...v1.6.1

v1.6.0

Anything's changed?

The only update is that the Python runtime has been upgraded from 3.9 to 3.11. There are no functional changes in this release.

Full Changelog: https://github.com/pypa/gh-action-pypi-publish/compare/v1.5.2...v1.6.0

v1.5.2

What's Improved

Implemented the Twine transitive dependency tree pinning using pip-tools-generated constraint files. See pypa/gh-action-pypi-publish#107 and pypa/gh-action-pypi-publish#101 for details.

Full Diff: https://github.com/pypa/gh-action-pypi-publish/compare/v1.5.1...v1.5.2

v1.5.1

What's Changed

... (truncated)

Commits

c7f29f7 🐛 Override $HOME in the container with /root

644926c 🧪 Always run smoke testing in debug mode

e71a4a4 Add support for verbose bash execusion w/ $DEBUG

e56e821 🐛 Make id always available in twine-upload

c879b84 🐛 Use full path to bash in shebang

57e7d53 🐛Ensure the default $PATH value is pre-loaded

ce291dc 🎨🐛Fix the branch @ pre-commit.ci badge links

102d8ab 🐛 Rehardcode devpi port for GHA srv container

3a9eaef 🐛Use different ports in/out of GHA containers

a01fa74 🐛 Use localhost @ GHA outside the containers

Additional commits viewable in compare view

dependencies
opened by dependabot[bot] 0

The "collect" decorator

Hi! I often encounter the same pattern in my code:

def to_power(array, powers):
    result = []
    for entry in array:
        power = powers.get(entry)
        if power is not None and power >= 0:
            result.append(entry ** power)

    return result


def reverse_mapping(expensive_func, values):
    result = {}
    for entry in values:
        value = expensive_func(entry)
        if value is not None:
            result[value] = entry

    return result

The examples are somewhat simplistic, but you get the idea:

create a container
iterate over some data, do some branching, etc., and fill the container
return the container

I came up with several decorators that reduce this to:

@collect
def to_power(array, powers):
    for entry in array:
        power = powers.get(entry)
        if power is not None and power >= 0:
            yield entry ** power


@composed(dict)
def reverse_mapping(expensive_func, values):
    for entry in values:
        value = expensive_func(entry)
        if value is not None:
            yield value, entry

composed(func) simply applies func to the result of the decorated function, which effectively gathers the generator. And collect is just a shorter version of composed(list).
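
The author's implementation is not shown in this issue. The following is one possible sketch of composed and collect that matches the description above; the use of functools.wraps is an assumption.

```python
from functools import wraps


def composed(func):
    """Decorator factory: apply func to the decorated function's return value."""
    def decorator(generator_func):
        @wraps(generator_func)
        def wrapper(*args, **kwargs):
            return func(generator_func(*args, **kwargs))
        return wrapper
    return decorator


# collect gathers a generator into a list
collect = composed(list)


@collect
def to_power(array, powers):
    for entry in array:
        power = powers.get(entry)
        if power is not None and power >= 0:
            yield entry ** power


@composed(dict)
def reverse_mapping(expensive_func, values):
    for entry in values:
        value = expensive_func(entry)
        if value is not None:
            yield value, entry
```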

I can create a PR with my implementation, if you are interested in adding it to toolz.

opened by maxme1 0

Idea: Compose class should be iterable
It would be really nice to be able to iterate all the funcs in compose, without having to combine the first and funcs properties. It would let you immediately use Compose objects as iterables in the itertoolz functions.

For an example, consider the simple logging strategy outlined in my gist here: https://gist.github.com/ZeroBomb/8ac470b1d4b02c11f2873c5d4e0512a1

As written, I need to define this somewhat extraneous function

def get_funcs(composition):
    return (composition.first,) + composition.funcs

in order to map over those functions and re-compose:

@curry
def interleave_map(func, items):
    # [1,2,3] -> [func(1), 1, func(2), 2, func(3), 3]
    return interleave([map(func, items), items])

# define a debug function that interleaves logging funcs inbetween each func in an existing composition
debug = compose_left(get_funcs, interleave_map(passthru_log), star(compose_left))

If the Compose class were iterable, I could completely eliminate the get_funcs function and comfortably feed the Compose object directly into interleave:

def debug(composition):
    return compose_left(*interleave_map(passthru_log, composition))
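
To make the request concrete, here is a sketch of a composition class with the first and funcs attributes mentioned above plus an __iter__ method. IterableCompose is a hypothetical stand-in written for this example, not the toolz Compose class.

```python
class IterableCompose:
    """Right-to-left composition that can also be iterated in call order."""

    def __init__(self, *funcs):
        if not funcs:
            raise TypeError("at least one function is required")
        ordered = tuple(reversed(funcs))
        self.first = ordered[0]    # called first, may take any arguments
        self.funcs = ordered[1:]   # each receives the previous result

    def __call__(self, *args, **kwargs):
        result = self.first(*args, **kwargs)
        for func in self.funcs:
            result = func(result)
        return result

    def __iter__(self):
        # Yield functions in the order they are applied.
        yield self.first
        yield from self.funcs


def inc(x):
    return x + 1


def double(x):
    return x * 2


composition = IterableCompose(double, inc)   # double(inc(x))
call_order = list(composition)
```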
opened by ZeroBomb 1
Setup pyright type-checking
I have added:

A basic config file for pyright

A CI job to run pyright

Comments to ignore errors that pyright detects in existing code

This is to type-check any type hints that are added to toolz, as suggested in #496. These can be added incrementally.
opened by LincolnPuzey 0

Use "yield from" in merge_sorted

Convert these loops:

for item in seq:
    yield item

To the more modern and slightly more efficient:

yield from seq

A quick benchmark (a is a list of 30 sorted lists of 60 random integers)

# old
In [8]: %timeit list(merge_sorted(*a))
815 µs ± 31.3 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)

# new
In [7]: %timeit list(merge_sorted(*a))
766 µs ± 26.2 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
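
The two forms produce the same items. A minimal, self-contained sketch (not the merge_sorted code itself; both function names are hypothetical):

```python
def chain_with_loop(*seqs):
    """Old style: re-yield every item with an explicit inner loop."""
    for seq in seqs:
        for item in seq:
            yield item


def chain_with_yield_from(*seqs):
    """New style: delegate to each iterable with yield from."""
    for seq in seqs:
        yield from seq


data = [[1, 2], (3,), range(4, 6)]
old = list(chain_with_loop(*data))
new = list(chain_with_yield_from(*data))
```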

opened by groutr 0

get_in function: add example of giving a string to the keys argument
It is currently tempting to test get_in like this:

get_in('x', {'x':5}) # returns 5

and conclude that this will also work:

get_in('test', {'test':5}) # actually returns None

It does not work, because 'test' is treated as ['t','e','s','t']. In complex dictionaries, you may actually get a value, like

get_in('xy', {'x': {'y': 5}} ) # returns 5

The documentation should probably call this out explicitly, if this is the intended behavior, perhaps by giving the 'xy' example above.
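
The behaviour follows from treating keys as a sequence of lookups. The sketch below reproduces it with the standard library; get_in_sketch is a hypothetical stand-in and is only assumed to behave like get_in for these cases.

```python
import operator
from functools import reduce


def get_in_sketch(keys, coll, default=None):
    """Follow keys one by one into nested collections; return default on a miss."""
    try:
        # A string is iterated character by character, like any other sequence.
        return reduce(operator.getitem, keys, coll)
    except (KeyError, IndexError, TypeError):
        return default


single_char = get_in_sketch('x', {'x': 5})            # looks up 'x'
whole_word = get_in_sketch('test', {'test': 5})       # looks up 't', which is missing
nested = get_in_sketch('xy', {'x': {'y': 5}})         # looks up 'x', then 'y'
wrapped = get_in_sketch(['test'], {'test': 5})        # one key, wrapped in a list
```

Wrapping a string key in a list, as in the last line, avoids the character-by-character lookup.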

I, for one, wouldn't mind an implementation where get_in('test', {'test':5}) returns 5, but I wouldn't go so far as to say that is the right approach. I'm imagining it would facilitate doing something like this:

juxt(*map(curry(get_in), ['str1', ['str2', 'str3'], 'etc']))
opened by KevinXOM 2

Releases(0.12.0)

0.12.0(Jul 10, 2022)
Add apply (#411)

Support newer Python versions--up to Python 3.11-alpha (#525, #527, #533)

Improve warning when using toolz.compatibility (#485)

Improve documentation (#507, #524, #526, #530)

Improve performance of merge_with (#532)

Improve import times (#534)

Auto-upload new releases to PyPI (#536, #537)

0.11.2(Nov 6, 2021)
Support Python 3.10

0.11.1(Sep 24, 2020)
Importing toolz no longer warns (by importing .compatibility)

0.11.0(Sep 23, 2020)
Drop Python 2.7 support!

Give deprecation warning on using toolz.compatibility

Some doc fixes

First time using auto-deployment. Fingers crossed!

Next release will probably be 1.0.0 :)
