Extended pickling support for Python objects

Overview

cloudpickle

cloudpickle makes it possible to serialize Python constructs not supported by the default pickle module from the Python standard library.

cloudpickle is especially useful for cluster computing where Python code is shipped over the network to execute on remote hosts, possibly close to the data.

Among other things, cloudpickle supports pickling for lambda functions along with functions and classes defined interactively in the __main__ module (for instance in a script, a shell or a Jupyter notebook).
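To see the limitation this addresses, here is a minimal sketch that uses only the standard library's pickle and shows it refusing to serialize a lambda. The helper name try_standard_pickle is hypothetical, and the exact exception type may vary between Python versions, so both PicklingError and AttributeError are caught.

```python
import pickle

squared = lambda x: x ** 2


def try_standard_pickle(obj):
    """Return the pickled bytes, or None if the standard pickle module refuses."""
    try:
        return pickle.dumps(obj)
    except (pickle.PicklingError, AttributeError):
        # Functions are pickled by reference (module name + qualified name),
        # and "<lambda>" cannot be looked up by that name at load time.
        return None


lambda_payload = try_standard_pickle(squared)   # refused by plain pickle
builtin_payload = try_standard_pickle(len)      # importable by name, so accepted
```

With cloudpickle.dumps in place of pickle.dumps, the lambda is serialized by value instead, as the examples further down show.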

Cloudpickle can only be used to send objects between the exact same version of Python.

Using cloudpickle for long-term object storage is not supported and strongly discouraged.

Security notice: one should only load pickle data from trusted sources as otherwise pickle.load can lead to arbitrary code execution resulting in a critical security vulnerability.
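The following sketch illustrates why loading untrusted pickle data is dangerous. It uses only the standard library; the Payload class is hypothetical, and the call it smuggles in (eval on a harmless arithmetic expression) stands in for whatever an attacker would choose.

```python
import pickle


class Payload:
    """An object whose pickle stream asks the loader to call a function."""

    def __reduce__(self):
        # pickle records "call eval('6 * 7')" in the stream. A hostile stream
        # could just as well name os.system or any other importable callable.
        return (eval, ("6 * 7",))


untrusted_bytes = pickle.dumps(Payload())

# Loading runs the recorded call; no Payload instance is rebuilt at all.
result = pickle.loads(untrusted_bytes)
```

The same applies to streams produced by cloudpickle, since they are read back with pickle.load or pickle.loads.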

Installation

The latest release of cloudpickle is available from PyPI:

pip install cloudpickle

Examples

Pickling a lambda expression:

>>> import cloudpickle
>>> squared = lambda x: x ** 2
>>> pickled_lambda = cloudpickle.dumps(squared)

>>> import pickle
>>> new_squared = pickle.loads(pickled_lambda)
>>> new_squared(2)
4

Pickling a function interactively defined in a Python shell session (in the __main__ module):

>>> CONSTANT = 42
>>> def my_function(data: int) -> int:
...     return data + CONSTANT
...
>>> pickled_function = cloudpickle.dumps(my_function)
>>> depickled_function = pickle.loads(pickled_function)
>>> depickled_function
<function __main__.my_function(data:int) -> int>
>>> depickled_function(43)
85

Running the tests

  • With tox, to run the tests for all the supported versions of Python and PyPy:

    pip install tox
    tox
    

    or alternatively for a specific environment:

    tox -e py37
    
  • With py.test to only run the tests for your current version of Python:

    pip install -r dev-requirements.txt
    PYTHONPATH='.:tests' py.test
    

History

cloudpickle was initially developed by picloud.com and shipped as part of the client SDK.

A copy of cloudpickle.py was included as part of PySpark, the Python interface to Apache Spark. Davies Liu, Josh Rosen, Thom Neale and other Apache Spark developers improved it significantly, most notably to add support for PyPy and Python 3.

The aim of the cloudpickle project is to make that work available to a wider audience outside of the Spark ecosystem and to make it easier to improve it further notably with the help of a dedicated non-regression test suite.

Comments
  • Add ability to register modules to be deeply serialized

    This PR is based on the work done by @kinghuang in PR391, but takes on the feedback provided by @ogrisel and adds testing.

    Fixes #206

    Issue Summary

    To summarise the issue, in many cases cloudpickle is used to send code for remote execution. This is the case in dask, prefect, mlflow and many other libraries. For local functions, this works perfectly fine. But for any non-local function or class, cloudpickle assumes that external modules and packages are available at the location of deserialization. This may not be the case, or the version of the package available at the endpoint may be different.
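As a rough illustration of the "by reference" behaviour described above, this sketch uses the standard library's pickle on an importable function (json.dumps is chosen arbitrarily). The stream holds only the module and function names, so the receiving side must be able to import the same module.

```python
import json
import pickle

payload = pickle.dumps(json.dumps)

# The stream stores names only; the function body is not included.
contains_module_name = b"json" in payload
contains_function_name = b"dumps" in payload
payload_size = len(payload)

# Loading resolves the names again through an import on the loading side.
resolved = pickle.loads(payload)
```

If the module is missing or differs at the destination, the load either fails or silently resolves to different code, which is the gap the PR aims to close.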

    This PR adds the option to register modules for deep serialization by providing a register_deep_serialization function which takes either a name or a module. This is the original register_dynamic_module by @kinghuang.

    import cloudpickle
    from tests import external
    
    cloudpickle.register_deep_serialization("tests.external")  # string name works
    cloudpickle.register_deep_serialization(external)          # You can pass the module itself
    cloudpickle.register_deep_serialization("tests")           # or the parent string/module
    
    output = cloudpickle.dumps(external.an_external_function)
    

    Original dumps:

    b'\x80\x05\x95+\x00\x00\x00\x00\x00\x00\x00\x8c\x0etests.external\x94\x8c\x14an_external_function\x94\x93\x94.'
    

    dumps after registering tests.external for deep serialization:

    b'\x80\x05\x95<\x02\x00\x00\x00\x00\x00\x00\x8c\x17cloudpickle.cloudpickle\x94\x8c\r_builtin_type\x94\x93\x94\x8c\nLambdaType\x94\x85\x94R\x94(h\x02\x8c\x08CodeType\x94\x85\x94R\x94(K\x00K\x00K\x00K\x00K\x01KCC\x04d\x01S\x00\x94N\x8c\x11this is something\x94\x86\x94))\x8c5/Users/samreay/Projects/cloudpickle/tests/external.py\x94\x8c\x14an_external_function\x94K\x04C\x02\x00\x01\x94))t\x94R\x94}\x94(\x8c\x0b__package__\x94\x8c\x05tests\x94\x8c\x08__name__\x94\x8c\x0etests.external\x94\x8c\x08__file__\x94\x8c5/Users/samreay/Projects/cloudpickle/tests/external.py\x94uNNNt\x94R\x94\x8c\x1ccloudpickle.cloudpickle_fast\x94\x8c\x12_function_setstate\x94\x93\x94h\x19}\x94}\x94(h\x14h\r\x8c\x0c__qualname__\x94h\r\x8c\x0f__annotations__\x94}\x94\x8c\x0e__kwdefaults__\x94N\x8c\x0c__defaults__\x94N\x8c\n__module__\x94h\x15\x8c\x07__doc__\x94N\x8c\x0b__closure__\x94N\x8c\x17_cloudpickle_submodules\x94]\x94\x8c\x0b__globals__\x94}\x94u\x86\x94\x86R0.'
    

    Modules can be unregistered via unregister_deep_serialization.

    Tests

    One of the example tests with an explicit use case is shown above. On top of this, tests have been added to _lookup_module_and_qualname using the _cloudpickle_testpkg package, and also to the new _is_explicitly_serialized_module function.

    opened by Samreay 58
  • ENH: derive from C-pickler for fast serialization

    Summary:

    This PR proposes a new CloudPickler class that inherits from the C _pickle.Pickler instead of the Python pickle._Pickler, allowing 10x+ speedups for the serialization of large builtin objects such as dicts and lists.
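To get a feel for the gap between the two base classes, the sketch below times the standard library's C-accelerated pickle.Pickler against the pure-Python pickle._Pickler on a large dict. It does not involve cloudpickle; the data shape and repeat count are arbitrary assumptions, and the measured ratio will depend on the machine and Python build.

```python
import io
import pickle
import timeit

# An arbitrary large builtin container.
data = {i: [i, str(i), float(i)] for i in range(20_000)}


def dump_with(pickler_class):
    """Serialize `data` with the given Pickler class and return the bytes."""
    buffer = io.BytesIO()
    pickler_class(buffer, protocol=pickle.HIGHEST_PROTOCOL).dump(data)
    return buffer.getvalue()


c_seconds = timeit.timeit(lambda: dump_with(pickle.Pickler), number=3)
py_seconds = timeit.timeit(lambda: dump_with(pickle._Pickler), number=3)
print(f"C pickler: {c_seconds:.3f}s, pure-Python pickler: {py_seconds:.3f}s")
```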

    Disclaimer: a new start

    Moving from the Python to the C Pickler requires a fair amount of changes. For this reason, instead of simply adapting the current code to respect the new constraints, I started again from scratch. This allows a new, clean API and structure that will hopefully be easier for everyone to understand.

    I made a lot of comments (sometimes overly verbose) to ease the review process of this PR. Eventually, I hope the information they contain can be transferred to proper project documentation.

    Implementation:

    Changes to python

    As opposed to the Python pickler, the C pickler exposes neither the save_* family of functions nor low-level instructions such as write. These methods can be neither patched nor called, and the only customization option we had initially was the dispatch table, which is consulted for all types BUT a few special cases, including classes and functions, the two principal use cases of cloudpickle.

    As this makes it simply impossible to modify pickling behavior for such types, we patched the C pickler for it to allow a user defined reduction callback for functions and classes. This idea was suggested by @pitrou.
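The reduction callback described here is available as Pickler.reducer_override in Python 3.8 and later. The sketch below shows the general shape with the standard library only; the ByValuePickler class and the _pickle_by_value marker are hypothetical and much simpler than what cloudpickle does.

```python
import io
import pickle


class ByValuePickler(pickle.Pickler):
    """Hypothetical pickler that serializes opted-in classes by value."""

    def reducer_override(self, obj):
        # Called for every object, classes and functions included, before
        # the dispatch table is consulted.
        if isinstance(obj, type) and getattr(obj, "_pickle_by_value", False):
            attributes = {
                name: value
                for name, value in vars(obj).items()
                if not (name.startswith("__") and name.endswith("__"))
            }
            # Rebuild at load time with type(name, bases, attributes).
            return type, (obj.__name__, obj.__bases__, attributes)
        # Fall back to the default behaviour for everything else.
        return NotImplemented


class Config:
    _pickle_by_value = True  # hypothetical opt-in marker
    retries = 3


buffer = io.BytesIO()
ByValuePickler(buffer).dump(Config)
payload = buffer.getvalue()

del Config  # the class can no longer be found by name
restored = pickle.loads(payload)
```

Because the class definition travels inside the stream, the plain pickle.loads call succeeds even though the name Config no longer exists.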

    The direct consequence is that functions and classes now have to follow the save_reduce-load_build pickling/depickling process. Unfortunately, this API is not well suited for custom builtin-type saving: in particular, the state-setting part of load_build (the function that reconstructs an object from a reduce value) assumes all attributes of an object are writeable, which is not the case for C types (especially function.__globals__ and function.__closure__).

    For this reason, we also changed the API of save_reduce to accept a custom state_setter, which is called at unpickling time.
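In Python 3.8 and later, the state setter is the optional sixth item of the tuple returned by __reduce__. The following sketch uses a hypothetical read-only Celsius class (not a function object) to show a case where the default state-setting step could not assign attributes, while a custom setter can.

```python
import pickle


class Celsius:
    """Value object whose attribute is read-only once constructed."""

    __slots__ = ("_degrees",)

    def __init__(self, degrees):
        object.__setattr__(self, "_degrees", degrees)

    def __setattr__(self, name, value):
        raise AttributeError("Celsius instances are read-only")

    @property
    def degrees(self):
        return self._degrees

    def __reduce__(self):
        state = {"_degrees": self._degrees}
        # (callable, args, state, list items, dict items, state_setter)
        return (_new_celsius, (), state, None, None, _set_celsius_state)


def _new_celsius():
    """Create an empty instance without running __init__."""
    return Celsius.__new__(Celsius)


def _set_celsius_state(obj, state):
    # Called at unpickling time as state_setter(obj, state).
    for name, value in state.items():
        object.__setattr__(obj, name, value)


original = Celsius(21.5)
restored = pickle.loads(pickle.dumps(original))
```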

    You can view the total changes in this diff

    Individual PRs to CPython:

    • https://github.com/python/cpython/pull/12499 (reducer_override)
    • https://github.com/python/cpython/pull/12588 (state_setter in save_reduce)

    Changes to cloudpickle

    Functions and classes are the two main types affected by this PR. The main challenge was to make the saving process fit into the save_reduce API.

    Outside of these types, the actual reduction process remains intact.

    However, now that any customization must return a tuple, I decided to adopt a new, hopefully clearer, naming style for functions. You will see for yourselves.

    How to build this version locally

    Until the final release of Python 3.8, you need to build Python from upstream's master branch:

    git clone git@github.com:python/cpython.git
    cd cpython
    ./configure
    make
    

    To be able to use external modules you need a virtual environment, using for example the venv module:

    ./python -m venv /path/to/local/virtualenv
    

    Clone and install cloudpickle and its dependencies

    git clone git@github.com:cloudpipe/cloudpickle.git /path/to/cloudpickle
    cd /path/to/cloudpickle
    git fetch origin pull/ID/head:fast-cloudpickle
    git checkout fast-cloudpickle
    /path/to/local/virtualenv/bin/python -mpip install -rdev-requirements.txt
    /path/to/local/virtualenv/bin/python -mpip install .
    

    Finally, run the tests:

    /path/to/local/virtualenv/bin/python -mpytest tests/
    

    Benchmarks:

    • Benchmarks of a "concrete", end-to-end use-case using loky can be found here. To run the benchmarks, you also need the master version of loky.
    opened by pierreglaser 41
  • Pickling of generic annotations/types in 3.5+

    This PR adds support for pickling annotations on 3.5+, and fixes some problems with generic annotations on 3.7+.

    TODO

    • [x] Backport for 3.5
    • [x] Test that fails with TypeError: type() doesn't support MRO entry resolution; use types.new_class() if not using types.new_class for reconstructing classes
    • [x] Remove typing_extensions dependency
    • [x] Prefix privates with _
    • [x] Add test for pickle_depickle'ing annotated functions/classes

    Details

    The types.new_class change (in _make_skeleton_class) is because of a TypeError: type() doesn't support MRO entry resolution; use types.new_class() error on 3.7+, similar to this issue. Also see https://github.com/python/cpython/pull/6319.

    I'm not sure if there are any downsides to TypeVars being __reduce__'d now. Previously, they were only supported as globals (so always imported, I think).

    The functions try_decompose_generic and get_bases are brittle the way they are written, because they check for attributes. There might be a better way.

    Tests

    Passing, and added some new ones.

    ci downstream ci ray ci joblib ci distributed ci python-nightly ci loky 
    opened by valtron 39
  • deduplicate cloudpickle reducers.

    closes #284 related to #364

    About backward compatibility:

    • this PR removes make_skel_func and fill_function, i.e. the functions cloudpickle previously used to reconstruct functions, as they were equivalent to the new function_setstate/function_new (modulo some Python 3.8 compatibility). These functions are important to reconstruct pickles created by previous cloudpickle versions. A simple fix is to keep them inside cloudpickle.py for a few releases and add a FutureWarning inside them saying that an attempt is being made to read an old pickle, and that reading such pickles will break in 2 releases.
    • By removing the previous Python < 3.8 CloudPickler class this PR also removes semi-public functions (all the CloudPickler.save_*). These functions are not necessary to read old pickles, but they could be used inside third-party code. To address this, we could keep exposing the previous CloudPickler for the next few releases, but make cloudpickle.dump(s) use the new CloudPickler. This way, we can add a FutureWarning into the previous CloudPickler.__init__, while cloudpickle.dump(s) remains silent.

    Also, the module names don't make much sense now. In the future we should rename cloudpickle_fast.py to cloudpickle.py, and merge it with the previous cloudpickle.py.

    @jakirkham if you want to give #364 another shot, but rebasing on this PR first, I suspect its implementation should be much easier :)

    ci downstream 
    opened by pierreglaser 25
  • Add ability to pickle dynamically create modules

    The old logic treated all modules the same, which would fail when unpickling. save_module now detects whether the module has been dynamically created by following the chain of imports. Note that imp.find_module doesn't work with submodules (example: sckit.tree), so we actually have to split the module name and iterate over each piece.

    Dynamic modules are saved as dictionaries and reconstituted by dynamic_subimport function. While working on the test cases I discovered NotImplemented and Ellipsis also don't work properly (they are introduced into the test dynamic module by exec). I've also addressed that.
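The idea of saving a dynamic module as a dictionary and rebuilding it can be sketched with the standard library alone. This is not the PR's code: the helper names are hypothetical, and importlib.util.find_spec is swapped in for imp.find_module (the imp module was removed in Python 3.12).

```python
import importlib.util
import pickle
import types


def is_dynamic_module(module):
    """Guess whether `module` was created at runtime rather than imported."""
    try:
        return importlib.util.find_spec(module.__name__) is None
    except (ImportError, ValueError):
        return True


def module_to_state(module):
    """Snapshot the module namespace, minus the injected __builtins__."""
    return {k: v for k, v in vars(module).items() if k != "__builtins__"}


def rebuild_module(name, state):
    """Recreate a module object from a saved namespace dictionary."""
    module = types.ModuleType(name)
    module.__dict__.update(state)
    return module


# Build a dynamic module the way the PR's tests are described: via exec.
dynamic = types.ModuleType("dynamic_example_module")
exec("x = 1\nflag = NotImplemented\nrest = Ellipsis", dynamic.__dict__)

payload = pickle.dumps((dynamic.__name__, module_to_state(dynamic)))
restored = rebuild_module(*pickle.loads(payload))
```

The sketch only handles plain values in the module namespace; functions defined inside the dynamic module would still need cloudpickle's by-value function support.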

    opened by rodrigofarnhamsc 21
  • Optionally use pickle5 (Redux)

    Fixes https://github.com/cloudpipe/cloudpickle/issues/179

    Thanks to @pierreglaser's work in PR ( https://github.com/cloudpipe/cloudpickle/pull/368 ), this is a rebased/simplified version of PR ( https://github.com/cloudpipe/cloudpickle/pull/364 ). Otherwise it is the same, in that it tries to use pickle5 on older Python versions to support out-of-band buffers.
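For readers unfamiliar with out-of-band buffers, here is a minimal sketch using the protocol 5 support built into the standard library's pickle from Python 3.8 (the pickle5 backport mentioned above offers the same interface on older versions). The buffer size is an arbitrary assumption.

```python
import pickle

data = bytearray(b"x" * 1_000_000)

out_of_band = []
payload = pickle.dumps(
    pickle.PickleBuffer(data),
    protocol=5,
    # Returning a false value (list.append returns None) keeps the buffer
    # out of the pickle stream; it is handed to the callback instead.
    buffer_callback=out_of_band.append,
)

# The stream is tiny; the large buffer must be supplied separately on load.
restored = pickle.loads(payload, buffers=out_of_band)
```

A transport layer can then send the small stream and the raw buffers separately, avoiding an extra copy of the large data.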

    ci downstream 
    opened by jakirkham 20
  • Implement dynamic class provenance tracking to fix isinstance semantics and add support for dynamically defined enums

    This is a fix for #244 (and #101) to add support for dynamically defined Enum subclasses.

    Properly adding support for dynamic Enums required fixing the isinstance semantics more broadly, as initially requested in #195.

    The proposed solution involves tracking the provenance of pickled dynamic class definitions with a pair of weakref.WeakKeyDictionary / weakref.WeakValueDictionary protected by a threading.Lock.
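A simplified sketch of such a tracking registry, using only the standard library, might look as follows. The function and variable names are hypothetical and the real implementation may differ; the point is that unpickling the same class definition twice yields one class object, so isinstance keeps working.

```python
import threading
import uuid
import weakref

# Weak references let tracked classes be garbage collected normally.
_CLASS_TO_ID = weakref.WeakKeyDictionary()
_ID_TO_CLASS = weakref.WeakValueDictionary()
_LOCK = threading.Lock()


def get_or_create_tracker_id(cls):
    """Pickling side: return a stable identifier for a dynamic class."""
    with _LOCK:
        tracker_id = _CLASS_TO_ID.get(cls)
        if tracker_id is None:
            tracker_id = uuid.uuid4().hex
            _CLASS_TO_ID[cls] = tracker_id
            _ID_TO_CLASS[tracker_id] = cls
        return tracker_id


def lookup_or_register(tracker_id, cls):
    """Unpickling side: reuse a known class for this id, else register cls."""
    with _LOCK:
        existing = _ID_TO_CLASS.setdefault(tracker_id, cls)
        _CLASS_TO_ID[existing] = tracker_id
        return existing


Color = type("Color", (), {})
color_id = get_or_create_tracker_id(Color)

# Simulate a second reconstruction of the same definition.
Rebuilt = type("Color", (), {})
resolved = lookup_or_register(color_id, Rebuilt)
```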

    enhancement 
    opened by ogrisel 19
  • Making cloudpickle produce "consistent/deterministic" results.

    This question arose in the following context. I have multiple Python processes, and some classes are defined in each process. Sometimes class definitions are shipped from one process to another (using cloudpickle). Sometimes classes may be shipped multiple times or in multiple ways to a given process, and I'd like to deduplicate them based on the output of cloudpickle (that is, if cloudpickle.dumps(class1) == cloudpickle.dumps(class2), then the classes are the "same" and I can throw away one of them). This works, but there are way too many false negatives (that is, two classes really are the same (in some sense), but cloudpickle.dumps gives different results on the two classes).

    Here's one example that sort of illustrates the issue (although there are a number of ways this kind of thing can arise).

    Suppose I do the following.

    import cloudpickle
    
    class Foo1(object):
        def __init__(self):
            pass
    
    serialized1 = cloudpickle.dumps(Foo1)
    Foo2 = cloudpickle.loads(serialized1)
    serialized2 = cloudpickle.dumps(Foo2)
    
    assert serialized1 == serialized2  # This assertion fails.
    

    I'd love for this kind of assertion to succeed. Does anyone know if this is achievable or what the main obstacles are?

    Interestingly, if I iterate this a third time,

    Foo3 = cloudpickle.loads(serialized2)
    serialized3 = cloudpickle.dumps(Foo3)
    
    assert serialized2 == serialized3  # This succeeds.
    

    then the assert succeeds, so maybe it suffices to use cloudpickle.dumps(cloudpickle.loads(cloudpickle.dumps(cls))) to deduplicate classes (though this seems kind of insane, and I haven't tested this extensively). Would you expect this to work?

    One thing that may be related/revealing is the outputs I get if I do something similar in an IPython interpreter (instead of a regular Python interpreter).

    First copy and paste this block into IPython.

    import cloudpickle
    
    class Foo(object):
        def __init__(self):
            pass
    
    serialized1 = cloudpickle.dumps(Foo)
    

    Then copy and paste this block into IPython.

    class Foo(object):
        def __init__(self):
            pass
    
    serialized2 = cloudpickle.dumps(Foo)
    

    Comparing serialized1 and serialized2 next to each other, they are

    serialized1  # b'\x80\x02ccloudpickle.cloudpickle\n_rehydrate_skeleton_class\nq\x00(ccloudpickle.cloudpickle\n_builtin_type\nq\x01X\t\x00\x00\x00ClassTypeq\x02\x85q\x03Rq\x04X\x03\x00\x00\x00Fooq\x05c__builtin__\nobject\nq\x06\x85q\x07}q\x08X\x07\x00\x00\x00__doc__q\tNs\x87q\nRq\x0b}q\x0c(X\n\x00\x00\x00__module__q\rX\x08\x00\x00\x00__main__q\x0eX\x08\x00\x00\x00__init__q\x0fccloudpickle.cloudpickle\n_fill_function\nq\x10(ccloudpickle.cloudpickle\n_make_skel_func\nq\x11h\x01X\x08\x00\x00\x00CodeTypeq\x12\x85q\x13Rq\x14(K\x01K\x00K\x01K\x01KCc_codecs\nencode\nq\x15X\x04\x00\x00\x00d\x00S\x00q\x16X\x06\x00\x00\x00latin1q\x17\x86q\x18Rq\x19N\x85q\x1a)X\x04\x00\x00\x00selfq\x1b\x85q\x1cX\x1e\x00\x00\x00<ipython-input-1-d9b5c81388ae>q\x1dh\x0fK\x04h\x15X\x02\x00\x00\x00\x00\x01q\x1eh\x17\x86q\x1fRq ))tq!Rq"J\xff\xff\xff\xff}q#\x87q$Rq%}q&N}q\'NtRutR.'
    serialized2  # b'\x80\x02ccloudpickle.cloudpickle\n_rehydrate_skeleton_class\nq\x00(ccloudpickle.cloudpickle\n_builtin_type\nq\x01X\t\x00\x00\x00ClassTypeq\x02\x85q\x03Rq\x04X\x03\x00\x00\x00Fooq\x05c__builtin__\nobject\nq\x06\x85q\x07}q\x08X\x07\x00\x00\x00__doc__q\tNs\x87q\nRq\x0b}q\x0c(X\n\x00\x00\x00__module__q\rX\x08\x00\x00\x00__main__q\x0eX\x08\x00\x00\x00__init__q\x0fccloudpickle.cloudpickle\n_fill_function\nq\x10(ccloudpickle.cloudpickle\n_make_skel_func\nq\x11h\x01X\x08\x00\x00\x00CodeTypeq\x12\x85q\x13Rq\x14(K\x01K\x00K\x01K\x01KCc_codecs\nencode\nq\x15X\x04\x00\x00\x00d\x00S\x00q\x16X\x06\x00\x00\x00latin1q\x17\x86q\x18Rq\x19N\x85q\x1a)X\x04\x00\x00\x00selfq\x1b\x85q\x1cX\x1e\x00\x00\x00<ipython-input-2-a08e1f07615d>q\x1dh\x0fK\x02h\x15X\x02\x00\x00\x00\x00\x01q\x1eh\x17\x86q\x1fRq ))tq!Rq"J\xff\xff\xff\xff}q#\x87q$Rq%}q&N}q\'NtRutR.'
    

    They seem to be the same everywhere except that the first includes the string <ipython-input-1-d9b5c81388ae>q\x1dh\x0fK\x04h and the second includes the string <ipython-input-2-a08e1f07615d>q\x1dh\x0fK\x02h. Any idea where these strings come from or if it is possible to remove them?
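One plausible reading is that the differing strings are the co_filename of the compiled code object (IPython compiles each input cell under its own pseudo-filename) and that the byte following K is the co_firstlineno of __init__. The sketch below reproduces the effect with the standard library only; the filenames and the define_function helper are made up for illustration.

```python
def define_function(source, filename):
    """Compile `source` under the given pseudo-filename and return f."""
    namespace = {}
    exec(compile(source, filename, "exec"), namespace)
    return namespace["f"]


SOURCE = "def f():\n    return 1\n"

# Same definition, but compiled from a different "cell" and line offset.
first = define_function(SOURCE, "<cell-1>")
second = define_function("\n\n" + SOURCE, "<cell-2>")

same_bytecode = first.__code__.co_code == second.__code__.co_code
filenames = (first.__code__.co_filename, second.__code__.co_filename)
first_lines = (first.__code__.co_firstlineno, second.__code__.co_firstlineno)
```

Since a function pickled by value carries its code object, these two fields alone are enough to make otherwise identical definitions serialize differently.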

    cc @Wapaul1 @mehrdadn

    opened by robertnishihara 19
  • NumPy arrays serialize more slowly with cloudpickle than pickle

    I would expect pickle and cloudpickle to behave pretty much identically here. Sadly cloudpickle serializes much more slowly.

    In [1]: import numpy as np
    
    In [2]: data = np.random.randint(0, 255, dtype='u1', size=100000000)
    
    In [3]: import cloudpickle, pickle
    
    In [4]: %time len(pickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL))
    CPU times: user 50.9 ms, sys: 135 ms, total: 186 ms
    Wall time: 185 ms
    Out[4]: 100000161
    
    In [5]: %time len(cloudpickle.dumps(data, protocol=pickle.HIGHEST_PROTOCOL))
    CPU times: user 125 ms, sys: 280 ms, total: 404 ms
    Wall time: 405 ms
    Out[5]: 100000161
    
    opened by mrocklin 19
  • Remove non-standard __transient__ support

    The __transient__ dunder attribute is not the standard way to prevent attributes from being pickled. Instead, the standard approach is to use the __getstate__ and __setstate__ magic methods.
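For reference, the standard approach looks roughly like this. The Connection class and its _lock attribute are hypothetical; the lock stands in for any resource that cannot or should not be pickled.

```python
import pickle
import threading


class Connection:
    def __init__(self, host):
        self.host = host
        self._lock = threading.Lock()  # locks cannot be pickled

    def __getstate__(self):
        # Copy the instance dict and drop the attribute to exclude.
        state = self.__dict__.copy()
        del state["_lock"]
        return state

    def __setstate__(self, state):
        # Restore the saved attributes and recreate the excluded one.
        self.__dict__.update(state)
        self._lock = threading.Lock()


restored = pickle.loads(pickle.dumps(Connection("example.org")))
```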

    Considering that:

    • This is an old fix that was implemented maybe for unsupported Python versions (i.e.: Python 2.6).
    • Nobody knows what it is doing there exactly or why it was added.
    • Having this special non-standard case implemented makes the code more complex and may result in unexpected behavior (see #108).
    • This code was not even covered by tests, so removing it should increase code coverage and make the module more robust.

    I propose to remove any support for the non-standard approach.

    As mentioned in #108, some other projects might be using this attribute. But when looking at those projects:

    • Most simply have copies of the cloudpickle.py file (hence the match when searching for __transient__).
    • Others simply seem to be using __transient__ but without depending on cloudpickle as an external module dependency.
    • Most seem to be not very relevant (i.e.: fewer than 2 stars).

    I think that even though this change may break others' code, it is an unlikely scenario. Anyway, if that were the case, I think they should be fixing their code rather than making cloudpickle carry ugly fixes. Also, they can always choose to use an older cloudpickle version from PyPI.

    Fixes #108.

    opened by Peque 18
  • Fix cloudpickle incompatibilities on early Python 3.5 versions

    Closes #360. cloudpickle 1.4.0 is not compatible with early Python 3.5 versions. This should fix it.

    Note that I did not set up any CI for Python 3.5.[0-2], I simply tested it on my machine using fresh conda envs.

    @vedran If you have some time, could you tell me if this branch fixes the problems that made you create #360?

    I would be tempted to release a bugfix version by tonight since this bug completely breaks cloudpickle on Python 3.5.

    opened by pierreglaser 17
  • Fix NamedTuple issues on Python 3.9

    This PR fixes issue #460. Two changes were required. First, if __module__ was present in obj.__dict__, we need to pass it along to type_kwargs. See error message below.

    cls = <class 'typing.NamedTupleMeta'>, typename = 'MyTuple', bases = (<class 'typing.NamedTuple'>,), ns = {'__orig_bases__': (<function NamedTuple at 0x7fc0780f9310>,), '__slots__': ()}
    
        def __new__(cls, typename, bases, ns):
            assert bases[0] is _NamedTuple
            types = ns.get('__annotations__', {})
            default_names = []
            for field_name in types:
                if field_name in ns:
                    default_names.append(field_name)
                elif default_names:
                    raise TypeError(f"Non-default namedtuple field {field_name} "
                                    f"cannot follow default field"
                                    f"{'s' if len(default_names) > 1 else ''} "
                                    f"{', '.join(default_names)}")
            nm_tpl = _make_nmtuple(typename, types.items(),
                                   defaults=[ns[n] for n in default_names],
    >                              module=ns['__module__'])
    E       KeyError: '__module__'
    

    Second, if we pass __slots__ and __module__ to type_kwargs then we get the following error:

    cls = <class 'typing.NamedTupleMeta'>, typename = 'MyTuple', bases = (<class 'typing.NamedTuple'>,)
    ns = {'__module__': 'tests.cloudpickle_test', '__orig_bases__': (<function NamedTuple at 0x7fa300131310>,), '__slots__': ()}
    
        def __new__(cls, typename, bases, ns):
            assert bases[0] is _NamedTuple
            types = ns.get('__annotations__', {})
            default_names = []
            for field_name in types:
                if field_name in ns:
                    default_names.append(field_name)
                elif default_names:
                    raise TypeError(f"Non-default namedtuple field {field_name} "
                                    f"cannot follow default field"
                                    f"{'s' if len(default_names) > 1 else ''} "
                                    f"{', '.join(default_names)}")
            nm_tpl = _make_nmtuple(typename, types.items(),
                                   defaults=[ns[n] for n in default_names],
                                   module=ns['__module__'])
            # update from user namespace without overriding special namedtuple attributes
            for key in ns:
                if key in _prohibited:
    >               raise AttributeError("Cannot overwrite NamedTuple attribute " + key)
    E               AttributeError: Cannot overwrite NamedTuple attribute __slots__
    
    /Users/ryanc/opt/anaconda3/lib/python3.9/typing.py:1884: AttributeError
    

    To resolve this, I deleted the lines passing __slots__ to type_kwargs. Our unit test test_instance_with_slots still passes with this change. The deleted __slots__ lines were written 4 years ago and are possibly no longer useful. If there is reason to believe removing it could cause a regression, we should at least add a unit test that properly tests the functionality provided by these lines.

    I've run all unit tests locally with Python 3.6, 3.7, 3.8, 3.9, and 3.10 and verified non-regression. The new NamedTuple test fails on develop with Python 3.9 and 3.10 but passes on this branch. I'm happy to iterate here if there are changes needed.

    opened by RyanClark2k 1
  • 2.2.0: pytest (7.2) is failing in two units

    I'm packaging your module as an rpm package so I'm using the typical PEP517 based build, install and test cycle used on building packages from non-root account.

    • python3 -sBm build -w --no-isolation
    • because build is called with --no-isolation, only locally installed modules are used during all steps
    • install .whl file in </install/prefix>
    • run pytest with PYTHONPATH pointing to sitearch and sitelib inside </install/prefix>

    It looks like the cloudpickle test suite is failing with pytest 7.2. Here is the pytest output:

    + PYTHONPATH=/home/tkloczko/rpmbuild/BUILDROOT/python-cloudpickle-2.2.0-4.fc35.x86_64/usr/lib64/python3.8/site-packages:/home/tkloczko/rpmbuild/BUILDROOT/python-cloudpickle-2.2.0-4.fc35.x86_64/usr/lib/python3.8/site-packages
    + /usr/bin/pytest -ra
    =========================================================================== test session starts ============================================================================
    platform linux -- Python 3.8.15, pytest-7.2.0, pluggy-1.0.0
    rootdir: /home/tkloczko/rpmbuild/BUILD/cloudpickle-2.2.0, configfile: tox.ini
    collected 256 items
    
    tests/cloudpickle_file_test.py .......
    tests/cloudpickle_test.py ...................................F.....................................................................................................................F......................................................s.................................
    tests/test_backward_compat.py .......
    
    ================================================================================= FAILURES =================================================================================
    ________________________________________________________________ CloudPickleTest.test_dynamic_pytest_module ________________________________________________________________
    
    self = <tests.cloudpickle_test.CloudPickleTest testMethod=test_dynamic_pytest_module>
    
        def test_dynamic_pytest_module(self):
            # Test case for pull request https://github.com/cloudpipe/cloudpickle/pull/116
            import py
    
            def f():
                s = py.builtin.set([1])
                return s.pop()
    
            # some setup is required to allow pytest apimodules to be correctly
            # serializable.
            from cloudpickle import CloudPickler
            from cloudpickle import cloudpickle_fast as cp_fast
    >       CloudPickler.dispatch_table[type(py.builtin)] = cp_fast._module_reduce
    E       AttributeError: module 'py' has no attribute 'builtin'
    
    tests/cloudpickle_test.py:1482: AttributeError
    ___________________________________________________________ Protocol2CloudPickleTest.test_dynamic_pytest_module ____________________________________________________________
    
    self = <tests.cloudpickle_test.Protocol2CloudPickleTest testMethod=test_dynamic_pytest_module>
    
        def test_dynamic_pytest_module(self):
            # Test case for pull request https://github.com/cloudpipe/cloudpickle/pull/116
            import py
    
            def f():
                s = py.builtin.set([1])
                return s.pop()
    
            # some setup is required to allow pytest apimodules to be correctly
            # serializable.
            from cloudpickle import CloudPickler
            from cloudpickle import cloudpickle_fast as cp_fast
    >       CloudPickler.dispatch_table[type(py.builtin)] = cp_fast._module_reduce
    E       AttributeError: module 'py' has no attribute 'builtin'
    
    tests/cloudpickle_test.py:1482: AttributeError
    ========================================================================= short test summary info ==========================================================================
    SKIPPED [1] tests/cloudpickle_test.py:2261: Need Pickle Protocol 5 or later
    FAILED tests/cloudpickle_test.py::CloudPickleTest::test_dynamic_pytest_module - AttributeError: module 'py' has no attribute 'builtin'
    FAILED tests/cloudpickle_test.py::Protocol2CloudPickleTest::test_dynamic_pytest_module - AttributeError: module 'py' has no attribute 'builtin'
    ================================================================ 2 failed, 253 passed, 1 skipped in 14.34s =================================================================
    

    Here is the list of installed modules in the build env:

    Package           Version
    ----------------- --------------
    appdirs           1.4.4
    asn1crypto        1.5.1
    attrs             22.1.0
    bcrypt            3.2.2
    Brlapi            0.8.3
    build             0.9.0
    cffi              1.15.1
    contourpy         1.0.6
    cryptography      38.0.1
    cssselect         1.1.0
    cycler            0.11.0
    distro            1.8.0
    dnspython         2.2.1
    exceptiongroup    1.0.0
    extras            1.0.0
    fixtures          4.0.0
    fonttools         4.38.0
    gpg               1.17.1-unknown
    iniconfig         1.1.1
    kiwisolver        1.4.4
    libcomps          0.1.19
    louis             3.23.0
    lxml              4.9.1
    matplotlib        3.6.2
    mock              4.0.3
    numpy             1.23.1
    olefile           0.46
    packaging         21.3
    pbr               5.9.0
    pep517            0.13.0
    Pillow            9.3.0
    pip               22.3.1
    pluggy            1.0.0
    ply               3.11
    psutil            5.9.2
    pyasn1            0.4.8
    pyasn1-modules    0.2.8
    pycparser         2.21
    PyGObject         3.42.2
    pyparsing         3.0.9
    pytest            7.2.0
    python-dateutil   2.8.2
    PyYAML            6.0
    rpm               4.17.0
    scour             0.38.2
    setuptools        65.6.3
    six               1.16.0
    testtools         2.5.0
    tomli             2.0.1
    tornado           6.2
    tpm2-pkcs11-tools 1.33.7
    tpm2-pytss        1.1.0
    typing_extensions 4.4.0
    wheel             0.38.4
    
    opened by kloczek 1
  • Exception line numbering is wrong in Python 3.10.8

    Hi 👋

    Behaviour in 3.8:

    Python 3.8.9 (default, Apr 13 2022, 08:48:06) 
    Type "help", "copyright", "credits" or "license" for more information.
    >>> def add(x, y):
    ...     if x == 2:
    ...         raise Exception(f'Kapput: problem with x={x} and y={y}')
    ...     else:
    ...         return x + y
    ... 
    >>> add(2, 2)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 3, in add
    Exception: Kapput: problem with x=2 and y=2
    >>> import cloudpickle
    >>> cloudpickle.loads(cloudpickle.dumps(add))(2, 2)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 3, in add
    Exception: Kapput: problem with x=2 and y=2
    

    Behaviour in 3.10:

    Python 3.10.8 (main, Oct 13 2022, 09:48:40) [Clang 14.0.0 (clang-1400.0.29.102)] on darwin
    Type "help", "copyright", "credits" or "license" for more information.
    >>> def add(x, y):
    ...     if x == 2:
    ...         raise Exception(f'Kapput: problem with x={x} and y={y}')
    ...     else:
    ...         return x + y
    ... 
    >>> import cloudpickle
    >>> add(2, 2)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 3, in add
    Exception: Kapput: problem with x=2 and y=2
    >>> cloudpickle.loads(cloudpickle.dumps(add))(2, 2)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 5, in add
    Exception: Kapput: problem with x=2 and y=2
    

    Note the difference in the second-to-last line of each traceback: in 3.10, the function round-tripped through cloudpickle reports line 5 instead of line 3, so the line number is wrong.

    This works fine with plain Python pickle.
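The comparison above can also be made without raising an exception, by looking at the line table stored in the function's code object. The following is a minimal sketch using only the standard library; the helper names `line_starts`, `keeps_line_table` and `rebuild` are hypothetical and not part of cloudpickle, and `rebuild` is only a stand-in round trip that copies the code object.

```python
import dis
import types


def line_starts(func):
    """Return the (bytecode offset, source line) pairs recorded for func."""
    return list(dis.findlinestarts(func.__code__))


def keeps_line_table(func, roundtrip):
    """Return True if roundtrip(func) records the same line table as func."""
    return line_starts(func) == line_starts(roundtrip(func))


def rebuild(func):
    # Stand-in round trip (an assumption for illustration): copy the code
    # object and wrap the copy in a new function object.
    return types.FunctionType(func.__code__.replace(), func.__globals__, func.__name__)


def add(x, y):
    if x == 2:
        raise Exception(f"Kapput: problem with x={x} and y={y}")
    else:
        return x + y


# The stand-in copy keeps the line table intact.
print(keeps_line_table(add, rebuild))
```

To check the behaviour reported here, `roundtrip` could instead be `lambda f: cloudpickle.loads(cloudpickle.dumps(f))`, the same call used in the sessions above. That variant is not run in this sketch; based on the tracebacks, it would presumably return False on Python 3.10.8.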

    opened by henrifroese 0
  • cloudpickle cannot pickle '_jpype._JField' objects

    So, I've been working on a project which involves implementing reinforcement learning in a server-client app. The server is written in Java and the client is in Python, which is why I use JPype to import some server classes.

    After importing the necessary packages and creating the environment using PettingZoo, the next step is to create the model and train it using Stable-Baselines3. The problem is that Supersuit needs to pickle and unpickle the environment, and because the environment contains many Java objects, an error is thrown: TypeError: cannot pickle '_jpype._JField' object.

    The standard pickle module does not support JField objects, but the JPype library provides jpype.pickle (JPickler and JUnpickler), which does. I tried modifying cloudpickle_fast.py to fall back to JPickler, but I then run into a problem in cloudpickle.loads().

    Here is what I modified in cloudpickle_fast.py:

    from jpype.pickle import JPickler, JUnpickler
    
        def dump(self, obj):
            try:
                return Pickler.dump(self, obj)
            except RuntimeError as e:
                if "recursion" in e.args[0]:
                    msg = (
                        "Could not pickle object as excessively deep recursion "
                        "required."
                    )
                    raise pickle.PicklingError(msg) from e
                else:
                    raise
            except TypeError as e:
                return JPickler.dump(self, obj)
    

    And here is the full stacktrace I get:

    ---------------------------------------------------------------------------
    UnpicklingError                           Traceback (most recent call last)
    Input In [11], in <cell line: 6>()
          1 env = MARL_Env_Parallel(4)
          5 env = ss.pettingzoo_env_to_vec_env_v1(env)
    ----> 6 env = ss.concat_vec_envs_v1(env, 1, num_cpus=1, base_class='stable_baselines3')
    
    File ~\anaconda3\envs\gym\lib\site-packages\supersuit\vector\vector_constructors.py:61, in concat_vec_envs_v1(vec_env, num_vec_envs, num_cpus, base_class)
         59 def concat_vec_envs_v1(vec_env, num_vec_envs, num_cpus=0, base_class="gymnasium"):
         60     num_cpus = min(num_cpus, num_vec_envs)
    ---> 61     vec_env = MakeCPUAsyncConstructor(num_cpus)(*vec_env_args(vec_env, num_vec_envs))
         63     if base_class == "gymnasium":
         64         return vec_env
    
    File ~\anaconda3\envs\gym\lib\site-packages\supersuit\vector\concat_vec_env.py:22, in ConcatVecEnv.__init__(self, vec_env_fns, obs_space, act_space)
         21 def __init__(self, vec_env_fns, obs_space=None, act_space=None):
    ---> 22     self.vec_envs = vec_envs = [vec_env_fn() for vec_env_fn in vec_env_fns]
         23     for i in range(len(vec_envs)):
         24         if not hasattr(vec_envs[i], "num_envs"):
    
    File ~\anaconda3\envs\gym\lib\site-packages\supersuit\vector\concat_vec_env.py:22, in <listcomp>(.0)
         21 def __init__(self, vec_env_fns, obs_space=None, act_space=None):
    ---> 22     self.vec_envs = vec_envs = [vec_env_fn() for vec_env_fn in vec_env_fns]
         23     for i in range(len(vec_envs)):
         24         if not hasattr(vec_envs[i], "num_envs"):
    
    File ~\anaconda3\envs\gym\lib\site-packages\supersuit\vector\vector_constructors.py:11, in vec_env_args.<locals>.env_fn()
         10 def env_fn():
    ---> 11     env_copy = cloudpickle.loads(cloudpickle.dumps(env))
         12     return env_copy
    
    UnpicklingError: Memo value not found at index 3
    

    I don't have a lot of experience with Pickle, so any advice would be welcome, thanks.
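One general approach that does not require patching cloudpickle_fast.py is to keep the unpicklable handles out of the pickled state with `__getstate__` and recreate them in `__setstate__`. The following is a minimal sketch using only the standard library, not a confirmed fix for this report: `JavaBackedEnv` and `_connect` are hypothetical names, and a `threading.Lock` stands in for the Java object because it is also rejected by pickle. It only applies if the Java objects can be recreated on the receiving side.

```python
import pickle
import threading


class JavaBackedEnv:
    """Hypothetical environment holding a handle that pickle cannot serialize."""

    def __init__(self, n_agents):
        self.n_agents = n_agents
        self._handle = self._connect()

    def _connect(self):
        # Stand-in for obtaining a Java object through JPype (assumption):
        # a lock is used only because pickle refuses to serialize it.
        return threading.Lock()

    def __getstate__(self):
        # Copy the instance state and drop the unpicklable handle.
        state = self.__dict__.copy()
        del state["_handle"]
        return state

    def __setstate__(self, state):
        # Restore the plain attributes, then recreate the handle locally.
        self.__dict__.update(state)
        self._handle = self._connect()


env = JavaBackedEnv(4)
env_copy = pickle.loads(pickle.dumps(env))
print(env_copy.n_agents)
```

Because `__getstate__` and `__setstate__` are part of the pickle protocol, the same class should also round-trip through `cloudpickle.dumps` and `cloudpickle.loads`, although that is not exercised here.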

    opened by framepixel 0
  • pytest no longer bundles py

    py is more or less deprecated, and pytest now bundles only a subset of it. It would be best to stop depending on it. If that is not possible, the dependency on py should be declared explicitly so that the original package is installed.

    https://github.com/cloudpipe/cloudpickle/blob/f5472e1a2eb4235e61b632b58367dede93dfb30c/tests/cloudpickle_test.py#L1472

    opened by frenzymadness 2
Releases(v2.0.0)
  • v0.5.3(May 14, 2018)

    Installation

    pip install cloudpickle
    

    Changes Since v0.5.2

    • Fixed a crash in Python 2 when serializing non-hashable instancemethods of built-in types (issue #144).

    • itertools objects can also be pickled (PR #156).

    • logging.RootLogger can also be pickled (PR #160).

  • v0.4.4(May 14, 2018)

  • v0.4.3(Feb 13, 2018)

    Installation

    pip install cloudpickle
    

    Changes Since v0.4.2

    • Fixed a regression: AttributeError when loading pickles that hold a reference to a dynamically defined class from the __main__ module (issue #131).
    • Fixed a crash in Python 2 when serializing non-hashable instancemethods of built-in types (issue #144).
  • v0.4.1(Oct 26, 2017)

    Installation

    pip install cloudpickle
    

    Changes Since v0.4.0

    • Fixed a crash when pickling dynamic classes whose __dict__ attribute was defined as a property. Most notably, this affected dynamic namedtuples in Python 2. (https://github.com/cloudpipe/cloudpickle/pull/113)
    • Cloudpickle now preserves the __module__ attribute of functions (https://github.com/cloudpipe/cloudpickle/pull/118/).
    • Fixed a crash when pickling modules that don't have a __package__ attribute (https://github.com/cloudpipe/cloudpickle/pull/116).
  • v0.4.0(Aug 9, 2017)

    Get it while it's briny with

    pip install cloudpickle
    

    Ch-ch-ch-changes

    • Fix functions with empty cells (https://github.com/cloudpipe/cloudpickle/pull/91)
    • Allow pickling Logger objects (https://github.com/cloudpipe/cloudpickle/pull/96)
    • Fix crash when pickling dynamic class cycles (https://github.com/cloudpipe/cloudpickle/pull/102)
    • Support WeakSets and ABCMeta instances (https://github.com/cloudpipe/cloudpickle/pull/104)
    • Ignore "None" modules added to sys.modules (https://github.com/cloudpipe/cloudpickle/pull/107)
    • Remove non-standard __transient__ support (https://github.com/cloudpipe/cloudpickle/pull/110)
    • Catch exception from pickle.whichmodule() (https://github.com/cloudpipe/cloudpickle/pull/112)
  • v0.3.1(May 31, 2017)

    Get it while it's hot with

    pip install cloudpickle
    

    Changes since v0.2.2

    • Import submodules accessed by pickled functions (https://github.com/cloudpipe/cloudpickle/pull/80)
    • Support recursive functions inside closures (https://github.com/cloudpipe/cloudpickle/pull/89, https://github.com/cloudpipe/cloudpickle/pull/90)
    • Fix ResourceWarnings and DeprecationWarnings (https://github.com/cloudpipe/cloudpickle/pull/88)
    • Assume modules with __file__ attribute are not dynamic (https://github.com/cloudpipe/cloudpickle/pull/85)
  • v0.3.0(May 30, 2017)

    Get it while it's hot with

    pip install cloudpickle
    

    Changes

    • Import submodules accessed by pickled functions (https://github.com/cloudpipe/cloudpickle/pull/80)
    • Support recursive functions inside closures (https://github.com/cloudpipe/cloudpickle/pull/89, https://github.com/cloudpipe/cloudpickle/pull/90)
    • Fix ResourceWarnings and DeprecationWarnings (https://github.com/cloudpipe/cloudpickle/pull/88)
    • Assume modules with __file__ attribute are not dynamic (https://github.com/cloudpipe/cloudpickle/pull/85)
  • v0.1.1(Sep 5, 2015)

    cloudpickle bug fix release v0.1.1

    • fixed save_classmethod (#41)
    • now allows users to import cloudpickle to dump and load pickled data (#37)
    • no more pickling of closed files, was broken on Python 3 (#32)
    • more tests!
  • 0.1.0(Apr 16, 2015)
