cuDF - GPU DataFrame Library

Overview

 cuDF - GPU DataFrames

NOTE: For the latest stable README.md ensure you are on the main branch.

Built on the Apache Arrow columnar memory format, cuDF is a GPU DataFrame library for loading, joining, aggregating, filtering, and otherwise manipulating data.

cuDF provides a pandas-like API that will be familiar to data engineers & data scientists, so they can use it to easily accelerate their workflows without going into the details of CUDA programming.

For example, the following snippet downloads a CSV, then uses the GPU to parse it into rows and columns and run calculations:

import cudf
import requests
from io import StringIO

url = "https://github.com/plotly/datasets/raw/master/tips.csv"
content = requests.get(url).content.decode('utf-8')

tips_df = cudf.read_csv(StringIO(content))
tips_df['tip_percentage'] = tips_df['tip'] / tips_df['total_bill'] * 100

# display average tip by dining party size
print(tips_df.groupby('size').tip_percentage.mean())

Output:

size
1    21.729201548727808
2    16.571919173482897
3    15.215685473711837
4    14.594900639351332
5    14.149548965142023
6    15.622920072028379
Name: tip_percentage, dtype: float64

For additional examples, browse our complete API documentation, or check out our more detailed notebooks.

Quick Start

Please see the Demo Docker Repository, choosing a tag based on the NVIDIA CUDA version you’re running. This provides a ready-to-run Docker container with example notebooks and data, showcasing how you can use cuDF.

Installation

CUDA/GPU requirements

  • CUDA 10.1+
  • NVIDIA driver 418.39+
  • Pascal architecture or better (Compute Capability >=6.0)
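
As a rough illustration of how the minimums above compare against a given system, here is a minimal sketch. The helper names (`parse_version`, `meets_requirements`) are hypothetical and not part of cuDF, and the version strings are supplied by hand rather than queried from the driver.

```python
# Minimum versions taken from the requirements list above.
MIN_CUDA = (10, 1)
MIN_DRIVER = (418, 39)
MIN_COMPUTE_CAPABILITY = (6, 0)


def parse_version(text):
    """Turn a dotted version string such as '418.39' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


def meets_requirements(cuda, driver, compute_capability):
    """Return True if all three versions are at or above the minimums."""
    return (
        parse_version(cuda) >= MIN_CUDA
        and parse_version(driver) >= MIN_DRIVER
        and parse_version(compute_capability) >= MIN_COMPUTE_CAPABILITY
    )


print(meets_requirements("10.2", "440.33.01", "7.0"))
print(meets_requirements("10.0", "440.33.01", "7.0"))
```

Tuples of integers are compared element by element, which avoids the pitfalls of comparing version strings lexically.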

Conda

cuDF can be installed with conda (miniconda, or the full Anaconda distribution) from the rapidsai channel:

For cudf version == 0.18 :

# for CUDA 10.1
conda install -c rapidsai -c nvidia -c numba -c conda-forge \
    cudf=0.18 python=3.7 cudatoolkit=10.1

# or, for CUDA 10.2
conda install -c rapidsai -c nvidia -c numba -c conda-forge \
    cudf=0.18 python=3.7 cudatoolkit=10.2

For the nightly version of cudf :

# for CUDA 10.1
conda install -c rapidsai-nightly -c nvidia -c numba -c conda-forge \
    cudf python=3.7 cudatoolkit=10.1

# or, for CUDA 10.2
conda install -c rapidsai-nightly -c nvidia -c numba -c conda-forge \
    cudf python=3.7 cudatoolkit=10.2

Note: cuDF is supported only on Linux, and with Python versions 3.7 and later.

See the Get RAPIDS version picker for more OS and version info.

Build/Install from Source

See build instructions.

Contributing

Please see our guide for contributing to cuDF.

Contact

Find out more details on the RAPIDS site

Open GPU Data Science

The RAPIDS suite of open source software libraries aims to enable execution of end-to-end data science and analytics pipelines entirely on GPUs. It relies on NVIDIA® CUDA® primitives for low-level compute optimization, but exposes that GPU parallelism and high-bandwidth memory speed through user-friendly Python interfaces.

Apache Arrow on GPU

The GPU version of Apache Arrow is a common API that enables efficient interchange of tabular data between processes running on the GPU. End-to-end computation on the GPU avoids unnecessary copying and converting of data off the GPU, reducing compute time and cost for high-performance analytics common in artificial intelligence workloads. As the name implies, cuDF uses the Apache Arrow columnar data format on the GPU. Currently, a subset of the features in Apache Arrow is supported.

Comments
  • Make a plan for sort_values/set_index

    It would be nice to be able to use the set_index method to sort the dataframe by a particular column.

    There are currently two implementations for this, one in dask.dataframe and one in dask-cudf which uses a Batcher sorting network. While most dask-cudf code has been removed in favor of the dask.dataframe implementations, this sorting code has remained, mostly because I don't understand it fully and don't know if there was a reason for this particular implementation.

    Why was this implementation chosen? Was this discussed somewhere? Alternatively @sklam, do you have any information here?

    cc @kkraus14 @randerzander

    cuDF (Python) dask dask-cudf 
    opened by mrocklin 80
  • [WIP] Update cudf.to_parquet to use new GPU accelerated Parquet writer

    Update cudf.to_parquet to use the new GPU accelerated Parquet writer. This includes creating the appropriate C++ interface in io_writers and io_functions, along with modifications to the parquet pyx and pxd files.

    This closes #3574

    libcudf cuDF (Python) 5 - Ready to Merge cuIO Cython 
    opened by jdye64 66
  • [QST] cuDF performance with gridsearchcv

    In a conversation with @kkraus14 about cudf usage with cuml+gridsearch we looked at cudf performance. Attached is a profile plot of running gridsearch+cuml+cudf.

    [Screenshot: dask profile plot of the gridsearch+cuml+cudf run]

    Folks can download the full dask profile here: https://gist.github.com/quasiben/1da49c5aa6e61d979dd42ce6c50e79b3

    In the image above you can see that the computation is spending ~80% of its time in the iloc call. My initial thought was that iloc/_prepare_series_for_add could be improved. I believe @kkraus14 suggested we look at host_to_device transfers to see if we can build the requisite indices in cuda/cupy/numba_cuda instead of numpy (this is required during splitting/kfold calls).

    question Performance 
    opened by quasiben 59
  • [DISCUSSION] libcudf column abstraction redesign

    Creating and interfacing with the gdf_column C struct is rife with problems. We need better abstraction(s) to make libcudf a safer and more pleasant library to use and develop.

    In lieu of adding support for any new column types, the initial goal of a cudf::column design is to ease the current pain points in creating and interfacing with the column data structure in libcudf.

    Goals:

    • Identify pain points with existing gdf_column structure
    • Derive requirements for an abstraction or set of abstractions to ease those pain points
    • Define an API design that satisfies the requirements
    • Provide a working implementation of the design

    Non-Goals

    • Derive requirements to support new column types, e.g., variable width elements, compressed columns, etc.
    • Support delayed materialization or lazy evaluation

    Note that a “Non-Goal” is not something that we want to expressly forbid in our redesign, but rather something that is not the focus of the current effort. Whenever possible, we can make design decisions that will enable these “Non-Goals” sometime in the future, so long as those decisions do not compromise the timely accomplishment of the above “Goals”.

    Process

    1. Gather pain points

    • Those who wish to participate should list 3-5 pain points (in priority order) that they would like to solve with the column redesign.
      • Note that choosing to participate implies a commitment to putting in the effort to derive requirements and provide feedback on designs, i.e., if you want something to change, you’re expected to put in the work to make it happen.
    • Pain points should be submitted by responding to this issue.
    • @jrhemstad will take responsibility for gathering pain points and distilling/organizing based on functional area.
    • Proposed Deadline: 0.7 release

    2. Derive requirements

    • Distill pain points into satisfiable requirements
    • @jrhemstad will take responsibility for providing an initial draft of requirements from pain points and distributing for feedback.
    • Stakeholders will provide feedback on requirements and iterate until consensus is reached on initial requirements
    • Proposed Deadline: 0.8 Release

    3. Design Mock Up

    • Create draft interface of class(es) that attempt to satisfy requirements.
    • APIs should be fully Doxymented.
    • Code does not need to function nor compile
    • Design should be submitted via a PR to cuDF
    • TBD will take responsibility for providing an initial interface design
    • Stakeholders will provide feedback and iterate until consensus is reached on design
    • Proposed Deadline: 0.8 Release

    4. Implementation

    • Implement the agreed upon interface
    • Should provide Google Test unit tests
    • Implementation/testing will likely expose necessary design changes
    • Implementation should be submitted as a PR to cuDF
    • TBD will take responsibility for implementing/testing the design
    • Stakeholders will review implementation PR until consensus is reached
    • Proposed Deadline: 0.8 Release

    5. Initial Refactor

    • Two candidate libcudf features shall be chosen for refactoring to use the new cudf::column abstraction
    • Two developers (TBD) will take responsibility for refactoring the features (one each) to use the newly designed abstraction(s) and submitting a cuDF PR for review. At least one of the developers shall be different from the developer who designed and implemented the column abstraction.
    • Any required design changes exposed in refactoring shall be discussed in the PR
    • Stakeholders will review refactored feature until consensus is reached
    • TBD will be responsible for creating/amending a style guide with lessons learned and best practices for refactoring a feature using gdf_column to the new abstraction(s)
    • Proposed Deadline: 0.9 Release

    6. Full Refactor

    • Remaining libcudf features will be refactored one at a time to use the new column abstraction(s)
    • The style guide mentioned above will be distributed to all libcudf developers to provide guidance in this refactoring effort
    • This will be an ongoing process that likely will not be fully complete for several releases
    feature request help wanted proposal libcudf 
    opened by jrhemstad 55
  • [FEA] Make cudf::size_type 64-bit

    Is your feature request related to a problem? Please describe.

    cudf::size_type is currently an int32_t, which limits column size to two billion elements (MAX_INT). Moreover, it limits child column size to the same. This causes problems, for example, for string columns, where there may be fewer than 2B strings, but the character data to represent them could easily exceed 2B characters.

    A 32-bit size was originally chosen to ensure compatibility with Apache Arrow, which dictates that Arrow arrays have a 32-bit size, and that larger arrays are made by chunking into individual Arrays.
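
To make the limit concrete, the arithmetic can be sketched as follows. The row count and average string length are hypothetical numbers chosen only for illustration, and `fits_in_size_type` is an illustrative helper, not a libcudf function.

```python
# Largest value representable by a signed 32-bit integer.
INT32_MAX = 2**31 - 1


def fits_in_size_type(count, limit=INT32_MAX):
    """Return True if `count` elements can be indexed with a 32-bit size."""
    return 0 <= count <= limit


# Hypothetical string column: well under 2B rows, 8 characters per row.
num_strings = 500_000_000
avg_chars_per_string = 8
total_chars = num_strings * avg_chars_per_string

print(fits_in_size_type(num_strings))  # the parent column fits
print(fits_in_size_type(total_chars))  # the child character column does not
```

In this sketch the number of strings is far below the limit, yet the child column holding the characters is almost twice as large as a 32-bit size can describe.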

    Describe the solution you'd like

    • Change size_type to be an int64_t.

    • Handle compatibility with Arrow by creating arrow chunked arrays in the libcudf to_arrow interface (not yet created), and combine arrow chunked arrays in the libcudf from_arrow interface. This can be dealt with when we create these APIs.
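
One way to picture the chunking idea in the second bullet is to split a large row range into pieces that each fit a 32-bit size. This is only a sketch of the boundary arithmetic: `chunk_bounds` is a hypothetical helper and does not reflect the actual to_arrow/from_arrow interfaces, which the issue notes were not yet created.

```python
INT32_MAX = 2**31 - 1


def chunk_bounds(total_rows, max_chunk=INT32_MAX):
    """Return half-open (start, stop) row ranges, each at most max_chunk long."""
    if total_rows < 0 or max_chunk <= 0:
        raise ValueError("total_rows must be >= 0 and max_chunk must be > 0")
    return [
        (start, min(start + max_chunk, total_rows))
        for start in range(0, total_rows, max_chunk)
    ]


# A 5-billion-row column would be exported as three chunks.
for start, stop in chunk_bounds(5_000_000_000):
    print(start, stop, stop - start)
```

Combining chunks on import would be the reverse operation: concatenating the ranges back into a single 64-bit-sized column.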

    Describe alternatives you've considered

    Chunked columns. This would be very challenging -- supporting chunked columns in every algorithm would result in complex distributed algorithms and implementations, where libcudf currently aims to be communication agnostic / ignorant. In other words, a higher level library handles distributed algorithms.

    Additional context

    A potential downside: @felipeblazing called us brave for considering supporting chunked columns. If we implement this feature request, perhaps he will not consider us quite so brave. :(

    feature request wontfix libcudf 
    opened by harrism 52
  • [Discussion] Requirements for schema/column names

    There have been a number of requests related to adding column names, either to the columns themselves and/or to tables and their views.

    libcudf internals don't use column names, so we need requirements to be driven by users that will make use of the names (cuIO/Spark/cuDF).

    For those who need column names, please discuss what you would like to see for column names.

    CC @kkraus14 @revans2 @jlowe @j-ieong @shwina

    feature request cuDF (Python) cuIO helps: Spark 
    opened by jrhemstad 50
  • [BUG] nan_as_null parameter affects output of sort_values.

    Describe the bug

    The nan_as_null parameter affects the output of sort_values.

    Steps/Code to reproduce bug

    In [22]: df = cudf.DataFrame({'a': cudf.Series([np.nan, 1.0, np.nan, 2.0, np.nan, 0.0], nan_as_null=True)})
    
    In [23]: print(df.sort_values(by='a'))
         a
    5  0.0
    1  1.0
    3  2.0
    0
    2
    4
    In [19]: df = cudf.DataFrame({'a': cudf.Series([np.nan, 1.0, np.nan, 2.0, np.nan, 0.0], nan_as_null=False)})
    
    In [20]: print(df.sort_values(by='a'))
         a
    0  nan
    1  1.0
    2  nan
    3  2.0
    4  nan
    5  0.0
    

    Similar issues occur with other methods that use libcudf APIs, e.g. drop_duplicates (which uses sorting):

    df = cudf.DataFrame({'a': cudf.Series([1.0, np.nan, 0, np.nan, 0, 1], nan_as_null=False)})
    
    In [10]: print(df)
         a
    0  1.0
    1  nan
    2  0.0
    3  nan
    4  0.0
    5  1.0
    
    In [11]: print(df.drop_duplicates())
         a
    0  1.0
    1  nan
    2  0.0
    3  nan
    4  0.0
    5  1.0
    

    Expected behavior

    For sorting and drop_duplicates, NaN values should be considered equal.
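
The expected drop_duplicates behavior can be sketched in plain Python, independent of cuDF. The helper `drop_duplicates_nan_equal` is hypothetical; it only illustrates the idea of mapping every NaN to one shared key so that NaNs compare equal during de-duplication.

```python
import math

# Shared key that stands in for every NaN value (NaN != NaN under IEEE 754).
_NAN_KEY = object()


def drop_duplicates_nan_equal(values):
    """Keep the first occurrence of each value, treating all NaNs as equal."""
    seen = set()
    result = []
    for value in values:
        is_nan = isinstance(value, float) and math.isnan(value)
        key = _NAN_KEY if is_nan else value
        if key not in seen:
            seen.add(key)
            result.append(value)
    return result


data = [1.0, math.nan, 0, math.nan, 0, 1]
print(drop_duplicates_nan_equal(data))
```

With the same input as the reproducer above, only the first 1.0, the first NaN, and the first 0 are kept.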

    Environment overview (please complete the following information)

    • Environment location: Bare-metal
    • Method of cuDF install: from source
    bug libcudf helps: Spark 
    opened by karthikeyann 48
  • [DISCUSSION] Behavior for NaN comparisons in libcudf

    Recent issues (https://github.com/rapidsai/cudf/issues/4753 https://github.com/rapidsai/cudf/issues/4752) have called into question how libcudf handles NaN floating point values. We've only ever addressed this issue on an ad hoc basis as opposed to having a larger conversation about the issue.

    C++

    C++ follows the IEEE 754 standard for floating point values, which for comparisons with NaN has the following behavior:

    | Comparison | NaN ≥ x | NaN ≤ x | NaN > x | NaN < x | NaN = x | NaN ≠ x |
    |------------|--------------|--------------|--------------|--------------|--------------|-------------|
    | Result | Always False | Always False | Always False | Always False | Always False | Always True |

    https://en.wikipedia.org/wiki/NaN
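
The IEEE 754 results in the table can be checked with plain Python floats, which follow the same comparison rules. This is a small illustrative sketch and does not involve libcudf; `nan_comparisons` is a name made up for the example.

```python
import math


def nan_comparisons(x):
    """Evaluate the six comparisons from the table with NaN on the left."""
    nan = math.nan
    return {
        "NaN >= x": nan >= x,
        "NaN <= x": nan <= x,
        "NaN > x": nan > x,
        "NaN < x": nan < x,
        "NaN == x": nan == x,
        "NaN != x": nan != x,
    }


# The outcome is the same for an ordinary value, infinity, and NaN itself.
for x in (1.0, math.inf, math.nan):
    print(x, nan_comparisons(x))
```

Only the inequality test evaluates to True, regardless of the right-hand operand.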

    Spark

    Spark is non-conforming with the IEEE 754 standard:

    | Comparison | NaN ≥ x | NaN ≤ x | NaN > x | NaN < x | NaN = x | NaN ≠ x |
    |------------|-------------|-----------------------|-------------|--------------|------------------|----------------------|
    | Result | Always True | False unless x is NaN | Always True | Always False | True only if x is NaN | True unless x is NaN |

    See https://spark.apache.org/docs/latest/sql-reference.html#nan-semantics

    Python/Pandas

    Python is a bit of a grey area because prior to 1.0, Pandas did not have the concept of "null" values and used NaNs in their stead.

    In most regards, Python does respect IEEE 754. For example, see how numpy conforms with the expected IEEE754 behavior in binary ops https://github.com/rapidsai/cudf/issues/4752#issuecomment-606649251 (where Spark does not).

    However, there are some cases where Pandas is non-conforming due to the pseudo-null behavior. For example, in sort_values there is a na_position argument to control where NaN values are placed. This requires specializing the libcudf comparator used for sorting to special case floating point values and deviate from the IEEE 754 behavior of NaN < x == false and NaN > x == false. See https://github.com/rapidsai/cudf/issues/2191 and https://github.com/rapidsai/cudf/issues/3226 where this was done previously.
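
The kind of special-casing that na_position requires can be sketched with a sort key that never compares NaN against ordinary values. This is a plain-Python illustration under the assumption of a float-only input; `sort_with_na_position` is a hypothetical helper and is not how the libcudf comparator is implemented.

```python
import math


def sort_with_na_position(values, na_position="last"):
    """Sort floats, placing every NaN first or last instead of comparing it."""
    if na_position not in ("first", "last"):
        raise ValueError("na_position must be 'first' or 'last'")
    nan_rank = 1 if na_position == "last" else -1

    def key(value):
        # NaNs get their own rank and a dummy payload, so the IEEE 754
        # comparisons NaN < x and NaN > x are never evaluated.
        if math.isnan(value):
            return (nan_rank, 0.0)
        return (0, value)

    return sorted(values, key=key)


data = [math.nan, 1.0, math.nan, 2.0, math.nan, 0.0]
print(sort_with_na_position(data, "last"))
print(sort_with_na_position(data, "first"))
```

Sorting the raw floats without such a key gives an order that depends on where the NaNs happen to sit, because every ordering comparison against NaN is False.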

    That said, I believe Python's requirements could be satisfied by always converting NaN values to nulls, but @shwina @kkraus14 will need to confirm. Prior to Pandas 1.0, it wasn't possible to have both NaN and NULL values in a floating point column. We should see what the expected behavior of NaNs vs. nulls will be in 1.0.

    Discussion

    We need to have a conversation and make decisions on what libcudf will and will not do with respect to NaN behavior.

    My stance is that libcudf should adhere to IEEE 754. Spark's semantics redefine a core concept of the C++ language/IEEE standard and satisfying those semantics would require extremely invasive changes that negatively impact both performance and code maintainability.

    Even worse, because Spark differs from C++/Pandas, we need to provide separate code paths for all comparison based operations: a "Spark" path, and a "C++/Pandas" path. This further increases code bloat and maintenance costs.

    Furthermore, for consistency, I think we should roll back the non-conformant changes introduced for comparators in https://github.com/rapidsai/cudf/issues/3226.

    In conclusion, we already have special logic for handling NULLs everywhere in libcudf. Users should leverage that logic by converting NaNs to NULLs. I understand that vanilla Spark treats NaNs and NULLs independently, but I believe trying to imitate that behavior in libcudf comes at too high a cost.

    libcudf cuDF (Python) helps: Spark 
    opened by jrhemstad 45
  • [DOC] [BUG] Building from source fails as deps are not fetched

    Describe the bug

    Building v0.9.0 from source fails as some dependencies are missing or not fetched.

    Steps/Code to reproduce bug

    • git clone and checkout v0.9.0.
    • update submodules
    • bash build.sh libcudf
    -- RMM: RMM_LIBRARY set to RMM_LIBRARY-NOTFOUND
    -- RMM: RMM_INCLUDE set to RMM_INCLUDE-NOTFOUND
    -- DLPACK: DLPACK_INCLUDE set to DLPACK_INCLUDE-NOTFOUND
    -- NVSTRINGS: NVSTRINGS_INCLUDE set to NVSTRINGS_INCLUDE-NOTFOUND
    -- NVSTRINGS: NVSTRINGS_LIBRARY set to NVSTRINGS_LIBRARY-NOTFOUND
    -- NVSTRINGS: NVCATEGORY_LIBRARY set to NVCATEGORY_LIBRARY-NOTFOUND
    -- NVSTRINGS: NVTEXT_LIBRARY set to NVTEXT_LIBRARY-NOTFOUND
    

    Expected behavior

    Build succeeds without missing deps.

    Environment overview (please complete the following information)

    • Environment location: Centos 7, avx512
    • Method of cuDF install: source

    Additional context

    • The documentation does not state dlpack, rmm or nvstrings as dependencies.
    • According to the RMM README

    RMM currently must be built from source. This happens automatically in a submodule when you build or install cuDF or RAPIDS containers.

    Users should then expect these deps to be pulled automatically.

    bug doc 
    opened by ccoulombe 42
  • [FEA] CUDA versions between PyTorch and RAPIDS

    Hi Developers,

    Thanks for the great tools you have made. Our group would like to use cudf for deep learning. However, PyTorch currently supports only CUDA 10.2 and CUDA 11.1, while the RAPIDS nightly builds support CUDA 11.0 and 11.2. This is a pain for users (mostly scientists), who then need to compile either PyTorch or RAPIDS from source. Would it be possible for RAPIDS to support CUDA 11.1 so that users can install it from conda?

    I noticed that #8224 just removed the CUDA 11.1 related files.

    Thanks! Richard

    feature request conda 
    opened by yueyericardo 41
  • Use cuFile for Parquet IO when available

    Adds optional cuFile integration:

    • cufile.h is included in the build when available.
    • libcufile.so is loaded at runtime if LIBCUDF_CUFILE_POLICY environment variable is set to "ALWAYS" or "GDS".
    • cuFile compatibility mode is set through the same policy variable - "ALWAYS" means on, "GDS" means off.
    • cuFile is currently only used on Parquet R/W and in CSV writer.
    • device_read/write API can be used with file datasource/data_sink.
    • Added CUDA stream to device_read.
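
The policy values in the list above can be summarised in a short sketch. The environment variable name and the "ALWAYS" and "GDS" values come from the PR description; the helper `cufile_behaviour` is hypothetical, and the assumption that the variable must be set before the library is loaded is mine rather than something the PR states.

```python
import os


def cufile_behaviour(policy):
    """Summarise what the PR description says each policy value does."""
    if policy == "ALWAYS":
        # libcufile.so is loaded and cuFile compatibility mode is on.
        return {"load_libcufile": True, "compatibility_mode": True}
    if policy == "GDS":
        # libcufile.so is loaded and cuFile compatibility mode is off.
        return {"load_libcufile": True, "compatibility_mode": False}
    # Any other value: the description does not list it as loading cuFile.
    return {"load_libcufile": False, "compatibility_mode": None}


# Assumption: set the variable before the library is imported and used.
os.environ["LIBCUDF_CUFILE_POLICY"] = "GDS"
print(cufile_behaviour(os.environ["LIBCUDF_CUFILE_POLICY"]))
```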
    libcudf CMake cuIO Performance improvement non-breaking 
    opened by vuule 41
  • [QST] CPU memory spike during cudf dataframe conversion

    Hi all, I have a dataframe that is ~19K rows, ~11.4 MB (profiled using df.info(memory_usage = "deep")). We are currently running into CPU out-of-memory issues, so we are profiling our memory usage with this sample dataset. As you can see in the attached screenshot, memory usage jumps from 840MiB to 4148MiB during the type conversion of df. The image below shows the dataframe memory usage after conversion.

    My question is: Why is there a jump in memory usage when converting a dataframe from pandas to cudf? Furthermore, this memory is not released afterwards, so usage keeps increasing from this point in the following processing steps.

    [Two screenshots: memory profile during the conversion, and dataframe memory usage after the conversion]

    question ? - Needs Triage 
    opened by lvxhnat 0
  • nvcc fatal   : Unsupported gpu architecture 'compute_NATIVE'

    When I run ./build.sh on branch 22.12, the error printed in the log is: nvcc fatal : Unsupported gpu architecture 'compute_NATIVE'. How can I deal with this error? Thank you.

    question ? - Needs Triage 
    opened by newjavaer 0
  • JSON data page in user guide

    Description

    Adding a walkthrough to the User Guide that extracts data from common JSON structures.

    Checklist

    • [ ] I am familiar with the Contributing Guidelines.
    • [ ] New or existing tests cover these changes.
    • [ ] The documentation is up to date with these changes.
    opened by GregoryKimball 1
  • Support "values" orient (array of arrays) in Nested JSON reader

    Description

    The legacy GPU JSON reader can read "values" orient data in a JSON string (JSON lines only). With this PR, the Nested JSON reader can also read "values" orient data, for both JSON lines and non-line JSON strings.

    Examples:

    import cudf
    json="[[1, 2, 3], [4, 5], [7, 8, 9]]"
    cudf.read_json(json, engine="cudf_experimental")
       0  1     2
    0  1  2     3
    1  4  5  <NA>
    2  7  8     9
    json="[1, 2, 3]\n [4, 5, null]\n [7, 8, [9]]"
    cudf.read_json(json, engine="cudf_experimental", lines=True)
       0  1     2
    0  1  2  None
    1  4  5  None
    2  7  8   [9]
    

    Note that pandas accepts "values" data even when the orient="records" argument is passed, and parses it as "values". Similar support is added here, so passing "values" data with orient="records" will still work.
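
To show what the "values" orient (an array of arrays) means independently of the GPU reader, here is a minimal sketch using only the standard json module. `values_orient_to_rows` is a hypothetical helper; it pads short rows with None, loosely mirroring the missing entry in the first example above, and is not the Nested JSON reader's implementation.

```python
import json


def values_orient_to_rows(text, lines=False):
    """Parse 'values' orient JSON into equal-width rows padded with None."""
    if lines:
        # JSON lines: one array per line; surrounding whitespace is ignored.
        rows = [json.loads(line) for line in text.splitlines() if line.strip()]
    else:
        rows = json.loads(text)
    width = max((len(row) for row in rows), default=0)
    return [row + [None] * (width - len(row)) for row in rows]


print(values_orient_to_rows("[[1, 2, 3], [4, 5], [7, 8, 9]]"))
print(values_orient_to_rows("[1, 2, 3]\n [4, 5, null]\n [7, 8, [9]]", lines=True))
```

Each inner array becomes one row, and the position within the array becomes the column label (0, 1, 2), which is why the outputs above have integer column names.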

    Checklist

    • [x] I am familiar with the Contributing Guidelines.
    • [ ] New or existing tests cover these changes.
    • [ ] The documentation is up to date with these changes.
    feature request 2 - In Progress libcudf cuDF (Python) cuIO non-breaking 
    opened by karthikeyann 0
  • [FEA] category dtype support in parquet reader

    Is your feature request related to a problem? Please describe.

    Writing code with import cudf as pd.

    Describe the solution you'd like

    Same behavior as import pandas as pd.

    In [1]: import cudf as pd
    
    In [2]: pd.__version__
    Out[2]: '22.12.01'
    
    In [3]: df = pd.DataFrame({'a': ['one','two','three'] * 10})
    
    In [4]: df.info()
    <class 'cudf.core.dataframe.DataFrame'>
    RangeIndex: 30 entries, 0 to 29
    Data columns (total 1 columns):
     #   Column  Non-Null Count  Dtype
    ---  ------  --------------  -----
     0   a       30 non-null     object
    dtypes: object(1)
    memory usage: 234.0+ bytes
    
    In [5]: df.a = df.astype('category')
    
    In [6]: df.info()
    <class 'cudf.core.dataframe.DataFrame'>
    RangeIndex: 30 entries, 0 to 29
    Data columns (total 1 columns):
     #   Column  Non-Null Count  Dtype
    ---  ------  --------------  -----
     0   a       30 non-null     category
    dtypes: category(1)
    memory usage: 57.0 bytes
    
    In [7]: %ls df.parquet
    ls: cannot access 'df.parquet': No such file or directory
    
    In [8]: df.to_pandas().to_parquet('df.parquet')
    
    In [9]: %ls df.parquet
    df.parquet
    
    In [10]: pd.read_parquet('df.parquet').info()
    <class 'cudf.core.dataframe.DataFrame'>
    RangeIndex: 30 entries, 0 to 29
    Data columns (total 1 columns):
     #   Column  Non-Null Count  Dtype
    ---  ------  --------------  -----
     0   a       30 non-null     object
    dtypes: object(1)
    memory usage: 234.0+ bytes
    
    In [11]: import pandas
    
    In [12]: pandas.read_parquet('df.parquet').info()
    <class 'pandas.core.frame.DataFrame'>
    RangeIndex: 30 entries, 0 to 29
    Data columns (total 1 columns):
     #   Column  Non-Null Count  Dtype   
    ---  ------  --------------  -----   
     0   a       30 non-null     category
    dtypes: category(1)
    memory usage: 290.0 bytes
    
    In [13]: pd.DataFrame(pandas.read_parquet('df.parquet')).info()
    <class 'cudf.core.dataframe.DataFrame'>
    RangeIndex: 30 entries, 0 to 29
    Data columns (total 1 columns):
     #   Column  Non-Null Count  Dtype
    ---  ------  --------------  -----
     0   a       30 non-null     category
    dtypes: category(1)
    memory usage: 57.0 bytes
    

    the parquet reader turns the column into dtype=object

    In [10]: pd.read_parquet('df.parquet').info()
    <class 'cudf.core.dataframe.DataFrame'>
    RangeIndex: 30 entries, 0 to 29
    Data columns (total 1 columns):
     #   Column  Non-Null Count  Dtype
    ---  ------  --------------  -----
     0   a       30 non-null     object
    dtypes: object(1)
    memory usage: 234.0+ bytes
    
    feature request ? - Needs Triage 
    opened by mattf 0
  • [FEA] category dtype support in parquet writer

    Is your feature request related to a problem? Please describe.

    Writing code with import cudf as pd.

    Describe the solution you'd like

    Same behavior as import pandas as pd.

    In [1]: import cudf as pd
    
    In [2]: pd.__version__
    Out[2]: '22.12.01'
    
    In [3]: df = pd.DataFrame({'a': ['one','two','three'] * 10})
    
    In [4]: df.info()
    <class 'cudf.core.dataframe.DataFrame'>
    RangeIndex: 30 entries, 0 to 29
    Data columns (total 1 columns):
     #   Column  Non-Null Count  Dtype
    ---  ------  --------------  -----
     0   a       30 non-null     object
    dtypes: object(1)
    memory usage: 234.0+ bytes
    
    In [5]: df.a = df.astype('category')
    
    In [6]: df.info()
    <class 'cudf.core.dataframe.DataFrame'>
    RangeIndex: 30 entries, 0 to 29
    Data columns (total 1 columns):
     #   Column  Non-Null Count  Dtype
    ---  ------  --------------  -----
     0   a       30 non-null     category
    dtypes: category(1)
    memory usage: 57.0 bytes
    
    In [7]: df.to_parquet('df.parquet')
    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    Cell In[7], line 1
    ----> 1 df.to_parquet('df.parquet')
    
    File .../lib/python3.8/site-packages/cudf/core/dataframe.py:6287, in DataFrame.to_parquet(self, path, engine, compression, index, partition_cols, partition_file_name, partition_offsets, statistics, metadata_file_path, int96_timestamps, row_group_size_bytes, row_group_size_rows, max_page_size_bytes, max_page_size_rows, storage_options, return_metadata, *args, **kwargs)
       6284 """{docstring}"""
       6285 from cudf.io import parquet
    -> 6287 return parquet.to_parquet(
       6288     self,
       6289     path=path,
       6290     engine=engine,
       6291     compression=compression,
       6292     index=index,
       6293     partition_cols=partition_cols,
       6294     partition_file_name=partition_file_name,
       6295     partition_offsets=partition_offsets,
       6296     statistics=statistics,
       6297     metadata_file_path=metadata_file_path,
       6298     int96_timestamps=int96_timestamps,
       6299     row_group_size_bytes=row_group_size_bytes,
       6300     row_group_size_rows=row_group_size_rows,
       6301     max_page_size_bytes=max_page_size_bytes,
       6302     max_page_size_rows=max_page_size_rows,
       6303     storage_options=storage_options,
       6304     return_metadata=return_metadata,
       6305     *args,
       6306     **kwargs,
       6307 )
    
    File .../lib/python3.8/contextlib.py:75, in ContextDecorator.__call__.<locals>.inner(*args, **kwds)
         72 @wraps(func)
         73 def inner(*args, **kwds):
         74     with self._recreate_cm():
    ---> 75         return func(*args, **kwds)
    
    File .../lib/python3.8/site-packages/cudf/io/parquet.py:700, in to_parquet(df, path, engine, compression, index, partition_cols, partition_file_name, partition_offsets, statistics, metadata_file_path, int96_timestamps, row_group_size_bytes, row_group_size_rows, max_page_size_bytes, max_page_size_rows, storage_options, return_metadata, *args, **kwargs)
        698     if partition_cols is None or col not in partition_cols:
        699         if df[col].dtype.name == "category":
    --> 700             raise ValueError(
        701                 "'category' column dtypes are currently not "
        702                 + "supported by the gpu accelerated parquet writer"
        703             )
        705 if partition_cols:
        706     if metadata_file_path is not None:
    
    ValueError: 'category' column dtypes are currently not supported by the gpu accelerated parquet writer
    
    In [8]: df.to_pandas().to_parquet('df.parquet')
    
    In [9]: %ls df.parquet
    df.parquet
    
    feature request ? - Needs Triage 
    opened by mattf 0
Releases(v22.12.01)
  • v22.12.01(Dec 8, 2022)

    🚨 Breaking Changes

    • Add JNI for substring without 'end' parameter. (#12113) @firestarman
    • Refactor purge_nonempty_nulls (#12111) @ttnghia
    • Create an int8 column in read_csv when all elements are missing (#12110) @vuule
    • Throw an error when libcudf is built without cuFile and LIBCUDF_CUFILE_POLICY is set to "ALWAYS" (#12080) @vuule
    • Fix type promotion edge cases in numerical binops (#12074) @wence-
    • Reduce/Remove reliance on **kwargs and *args in IO readers & writers (#12025) @galipremsagar
    • Rollback of DeviceBufferLike (#12009) @madsbk
    • Remove unused managed_allocator (#12005) @vyasr
    • Pass column names to write_csv instead of table_metadata pointer (#11972) @vuule
    • Accept const refs instead of const unique_ptr refs in reduce and scan APIs. (#11960) @vyasr
    • Default to equal NaNs in make_merge_sets_aggregation. (#11952) @bdice
    • Remove validation that requires introspection (#11938) @vyasr
    • Trim quotes for non-string values in nested json parsing (#11898) @karthikeyann
    • Add tests ensuring that cudf's default stream is always used (#11875) @vyasr
    • Support nested types as groupby keys in libcudf (#11792) @PointKernel
    • Default to equal NaNs in make_collect_set_aggregation. (#11621) @bdice
    • Removing int8 column option from parquet byte_array writing (#11539) @hyperbolic2346
    • part1: Simplify BaseIndex to an abstract class (#10389) @skirui-source

    🐛 Bug Fixes

    • strings_udf: use libcudf caching of character tables (#12343) @wence-
    • Fix include line for IO Cython modules (#12250) @vyasr
    • Make dask pinning looser (#12231) @vyasr
    • Workaround for CUB segmented-sort bug with boolean keys (#12217) @davidwendt
    • Fix from_dict backend dispatch to match upstream dask (#12203) @galipremsagar
    • Merge branch-22.10 into branch-22.12 (#12198) @davidwendt
    • Fix compression in ORC writer (#12194) @vuule
    • Don't use CMake 3.25.0 as it has a show stopping FindCUDAToolkit bug (#12188) @robertmaynard
    • Fix data corruption when reading ORC files with empty stripes (#12160) @vuule
    • Fix decimal binary operations (#12142) @galipremsagar
    • Ensure dlpack include is provided to cudf interop lib (#12139) @robertmaynard
    • Safely allocate udf_string pointers in strings_udf (#12138) @brandon-b-miller
    • Fix/disable jitify lto (#12122) @robertmaynard
    • Fix conditional_full_join benchmark (#12121) @GregoryKimball
    • Fix regex working-memory-size refactor error (#12119) @davidwendt
    • Add in negative size checks for columns (#12118) @revans2
    • Add JNI for substring without 'end' parameter. (#12113) @firestarman
    • Fix reading of CSV files with blank second row (#12098) @vuule
    • Fix an error in IO with GzipFile type (#12085) @galipremsagar
    • Workaround groupby aggregate thrust::copy_if overflow (#12079) @davidwendt
    • Fix alignment of compressed blocks in ORC writer (#12077) @vuule
    • Fix singleton-range __setitem__ edge case (#12075) @wence-
    • Fix type promotion edge cases in numerical binops (#12074) @wence-
    • Force using old fmt in nvbench. (#12067) @vyasr
    • Fixes List offset bug in Nested JSON reader (#12060) @karthikeyann
    • Allow falling back to shim_60.ptx by default in strings_udf (#12056) @brandon-b-miller
    • Force black exclusions for pre-commit. (#12036) @bdice
    • Add memory_usage & items implementation for Struct column & dtype (#12033) @galipremsagar
    • Reduce/Remove reliance on **kwargs and *args in IO readers & writers (#12025) @galipremsagar
    • Fixes bug in csv_reader_options construction in cython (#12021) @karthikeyann
    • Fix issues when both usecols and names options are used in read_csv (#12018) @vuule
    • Port thrust's pinned_allocator to cudf, since Thrust 1.17 removes the type (#12004) @robertmaynard
    • Revert "Replace most of preprocessor usage in nvcomp adapter with constexpr" (#11999) @vuule
    • Fix bug where df.loc resulting in single row could give wrong index (#11998) @eriknw
    • Switch to DISABLE_DEPRECATION_WARNINGS to match other RAPIDS projects (#11989) @robertmaynard
    • Fix maximum page size estimate in Parquet writer (#11962) @vuule
    • Fix local offset handling in bgzip reader (#11918) @upsj
    • Fix an issue reading struct-of-list types in Parquet. (#11910) @nvdbaranec
    • Fix memcheck error in TypeInference.Timestamp gtest (#11905) @davidwendt
    • Fix type casting in Series.setitem (#11904) @wence-
    • Fix memcheck error in get_dremel_data (#11903) @davidwendt
    • Fixes Unsupported column type error due to empty list columns in Nested JSON reader (#11897) @karthikeyann
    • Fix segmented-sort to ignore indices outside the offsets (#11888) @davidwendt
    • Fix cudf::stable_sorted_order for NaN and -NaN in FLOAT64 columns (#11874) @davidwendt
    • Fix writing of Parquet files with many fragments (#11869) @etseidl
    • Fix RangeIndex unary operators. (#11868) @vyasr
    • JNI Avoid NPE for reading host binary data (#11865) @revans2
    • Fix decimal benchmark input data generation (#11863) @karthikeyann
    • Fix pre-commit copyright check (#11860) @galipremsagar
    • Fix Parquet support for seconds and milliseconds duration types (#11854) @vuule
    • Ensure better compiler cache results between cudf cal-ver branches (#11835) @robertmaynard
    • Fix make_column_from_scalar for all-null strings column (#11807) @davidwendt
    • Tell jitify_preprocess where to search for libnvrtc (#11787) @robertmaynard
    • add V2 page header support to parquet reader (#11778) @etseidl
    • Parquet reader: bug fix for a num_rows/skip_rows corner case, w/optimization for nested preprocessing (#11752) @nvdbaranec
    • Determine if Arrow has S3 support at runtime in unit test. (#11560) @bdice

    📖 Documentation

    • Use rapidsai CODE_OF_CONDUCT.md (#12166) @bdice
    • Add symlinks to notebooks. (#12128) @bdice
    • Add truncate API to python doc pages (#12109) @galipremsagar
    • Update Numba docs links. (#12107) @bdice
    • Remove "Multi-GPU with Dask-cuDF" notebook. (#12095) @bdice
    • Fix link to c++ developer guide from CONTRIBUTING.md (#12084) @brandon-b-miller
    • Add pivot_table and crosstab to docs. (#12014) @bdice
    • Fix doxygen text for cudf::dictionary::encode (#11991) @davidwendt
    • Replace default_stream_value with get_default_stream in docs. (#11985) @vyasr
    • Add dtype docs pages and docstrings for cudf specific dtypes (#11974) @galipremsagar
    • Update Unit Testing in libcudf guidelines to code tests outside the cudf::test namespace (#11959) @davidwendt
    • Rename libcudf++ to libcudf. (#11953) @bdice
    • Fix documentation referring to removed as_gpu_matrix method. (#11937) @bdice
    • Remove "experimental" warning for struct columns in ORC reader and writer (#11880) @vuule
    • Initial draft of policies and guidelines for libcudf usage. (#11853) @vyasr
    • Add clear indication of non-GPU accelerated parameters in read_json docstring (#11825) @GregoryKimball
    • Add developer docs for writing tests (#11199) @vyasr

    🚀 New Features

    • Adds an EventHandler to Java MemoryBuffer to be invoked on close (#12125) @abellina
    • Support + in strings_udf (#12117) @brandon-b-miller
    • Support upper and lower in strings_udf (#12099) @brandon-b-miller
    • Add wheel builds (#12096) @vyasr
    • Allow setting malloc heap size in string udfs (#12094) @brandon-b-miller
    • Support strip, lstrip, and rstrip in strings_udf (#12091) @brandon-b-miller
    • Mark nvcomp zstd compression stable (#12059) @jbrennan333
    • Add debug-only onAllocated/onDeallocated to RmmEventHandler (#12054) @abellina
    • Enable building against the libarrow contained in pyarrow (#12034) @vyasr
    • Add strings like jni and native method (#12032) @cindyyuanjiang
    • Cleanup common parsing code in JSON, CSV reader (#12022) @karthikeyann
    • byte_range support for JSON Lines format (#12017) @karthikeyann
    • Minor cleanup of root CMakeLists.txt for better organization (#11988) @robertmaynard
    • Add inplace arithmetic operators to MaskedType (#11987) @brandon-b-miller
    • Implement JNI for chunked Parquet reader (#11961) @ttnghia
    • Add method argument to DataFrame.quantile (#11957) @rjzamora
    • Add gpu memory watermark apis to JNI (#11950) @abellina
    • Adds retryCount to RmmEventHandler.onAllocFailure (#11940) @abellina
    • Enable returning string data from UDFs used through apply (#11933) @brandon-b-miller
    • Switch over to rapids-cmake patches for thrust (#11921) @robertmaynard
    • Add strings udf C++ classes and functions for phase II (#11912) @davidwendt
    • Trim quotes for non-string values in nested json parsing (#11898) @karthikeyann
    • Enable CEC for strings_udf (#11884) @brandon-b-miller
    • ArrowIPCTableWriter writes an empty batch in the case of an empty table. (#11883) @firestarman
    • Implement chunked Parquet reader (#11867) @ttnghia
    • Add read_orc_metadata to libcudf (#11815) @vuule
    • Support nested types as groupby keys in libcudf (#11792) @PointKernel
    • Adding feature Truncate to DataFrame and Series (#11435) @VamsiTallam95

    🛠️ Improvements

    • Reduce number of tests marked spilling (#12197) @madsbk
    • Pin dask and distributed for release (#12165) @galipremsagar
    • Don't rely on GNU find in headers_test.sh (#12164) @wence-
    • Update cp.clip call (#12148) @quasiben
    • Enable automatic column projection in groupby().agg (#12124) @rjzamora
    • Refactor purge_nonempty_nulls (#12111) @ttnghia
    • Create an int8 column in read_csv when all elements are missing (#12110) @vuule
    • Spilling to host memory (#12106) @madsbk
    • First pass of pd.read_orc changes in tests (#12103) @galipremsagar
    • Expose engine argument in dask_cudf.read_json (#12101) @rjzamora
    • Remove CUDA 10 compatibility code. (#12088) @bdice
    • Move and update dask nightly install in CI (#12082) @galipremsagar
    • Throw an error when libcudf is built without cuFile and LIBCUDF_CUFILE_POLICY is set to "ALWAYS" (#12080) @vuule
    • Remove macros that inspect the contents of exceptions (#12076) @vyasr
    • Fix ingest_raw_data performance issue in Nested JSON reader due to RVO (#12070) @karthikeyann
    • Remove overflow error during decimal binops (#12063) @galipremsagar
    • Change cudf::detail::tdigest to cudf::tdigest::detail (#12050) @davidwendt
    • Fix quantile gtests coded in namespace cudf::test (#12049) @davidwendt
    • Add support for DataFrame.from_dict, DataFrame.to_dict and Series.to_dict (#12048) @galipremsagar
    • Refactor Parquet reader (#12046) @ttnghia
    • Forward merge 22.10 into 22.12 (#12045) @vyasr
    • Standardize newlines at ends of files. (#12042) @bdice
    • Trim trailing whitespace from all files. (#12041) @bdice
    • Use nosync policy in gather and scatter implementations. (#12038) @bdice
    • Remove smart quotes from all docstrings. (#12035) @bdice
    • Update cuda-python dependency to 11.7.1 (#12030) @galipremsagar
    • Add cython-lint to pre-commit checks. (#12020) @bdice
    • Use pragma once (#12019) @bdice
    • New GHA to add issues/prs to project board (#12016) @jarmak-nv
    • Add DataFrame.pivot_table. (#12015) @bdice
    • Rollback of DeviceBufferLike (#12009) @madsbk
    • Remove default parameters for nvtext::detail functions (#12007) @davidwendt
    • Remove default parameters for cudf::dictionary::detail functions (#12006) @davidwendt
    • Remove unused managed_allocator (#12005) @vyasr
    • Remove default parameters for cudf::strings::detail functions (#12003) @davidwendt
    • Remove unnecessary code from dask-cudf _Frame (#12001) @rjzamora
    • Ignore python docs build artifacts (#12000) @galipremsagar
    • Use rapids-cmake for google benchmark. (#11997) @vyasr
    • Leverage rapids_cython for more automated RPATH handling (#11996) @vyasr
    • Remove stale labeler (#11995) @raydouglass
    • Move protobuf compilation to CMake (#11986) @vyasr
    • Replace most of preprocessor usage in nvcomp adapter with constexpr (#11980) @vuule
    • Add missing noexcepts to column_in_metadata methods (#11973) @vyasr
    • Pass column names to write_csv instead of table_metadata pointer (#11972) @vuule
    • Accelerate libcudf segmented sort with CUB segmented sort (#11969) @davidwendt
    • Feature/remove default streams (#11967) @vyasr
    • Add pool memory resource to libcudf basic example (#11966) @davidwendt
    • Fix some libcudf calls to cudf::detail::gather (#11963) @davidwendt
    • Accept const refs instead of const unique_ptr refs in reduce and scan APIs. (#11960) @vyasr
    • Add deprecation warning for set_allocator. (#11958) @vyasr
    • Fix lists and structs gtests coded in namespace cudf::test (#11956) @davidwendt
    • Add full page indexes to Parquet writer benchmarks (#11955) @etseidl
    • Use gather-based strings factory in cudf::strings::strip (#11954) @davidwendt
    • Default to equal NaNs in make_merge_sets_aggregation. (#11952) @bdice
    • Add strip_delimiters option to read_text (#11946) @upsj
    • Refactor multibyte_split output_builder (#11945) @upsj
    • Remove validation that requires introspection (#11938) @vyasr
    • Add .str.find_multiple API (#11928) @galipremsagar
    • Add regex_program class for use with all regex APIs (#11927) @davidwendt
    • Enable backend dispatching for Dask-DataFrame creation (#11920) @rjzamora
    • Performance improvement in JSON Tree traversal (#11919) @karthikeyann
    • Fix some gtests incorrectly coded in namespace cudf::test (part I) (#11917) @davidwendt
    • Refactor pad/zfill functions for reuse with strings udf (#11914) @davidwendt
    • Add nanosecond & microsecond to DatetimeProperties (#11911) @galipremsagar
    • Pin mimesis version in setup.py. (#11906) @bdice
    • Error on ListColumn or any new unsupported column in cudf.Index (#11902) @galipremsagar
    • Add thrust output iterator fix (1805) to thrust.patch (#11900) @davidwendt
    • Relax codecov threshold diff (#11899) @galipremsagar
    • Use public APIs in STREAM_COMPACTION_NVBENCH (#11892) @GregoryKimball
    • Add coverage for string UDF tests. (#11891) @vyasr
    • Provide data_chunk_source wrapper for datasource (#11886) @upsj
    • Handle multibyte_split byte_range out-of-bounds offsets on host (#11885) @upsj
    • Add tests ensuring that cudf's default stream is always used (#11875) @vyasr
    • Change expect_strings_empty into expect_column_empty libcudf test utility (#11873) @davidwendt
    • Add ngroup (#11871) @shwina
    • Reduce memory usage in nested JSON parser - tree generation (#11864) @karthikeyann
    • Unpin dask and distributed for development (#11859) @galipremsagar
    • Remove unused includes for table/row_operators (#11857) @GregoryKimball
    • Use conda-forge's pyorc (#11855) @jakirkham
    • Add libcudf strings examples (#11849) @davidwendt
    • Remove cudf_io namespace alias (#11827) @vuule
    • Test/remove thrust vector usage (#11813) @vyasr
    • Add BGZIP reader to python read_text (#11802) @upsj
    • Merge branch-22.10 into branch-22.12 (#11801) @davidwendt
    • Fix compile warning from CUDF_FUNC_RANGE in a member function (#11798) @davidwendt
    • Update cudf JNI version to 22.12.0-SNAPSHOT (#11764) @pxLi
    • Update flake8 to 5.0.4 and use flake8-force to check Cython. (#11736) @bdice
    • Add BGZIP multibyte_split benchmark (#11723) @upsj
    • Bifurcate Dependency Lists (#11674) @bdice
    • Default to equal NaNs in make_collect_set_aggregation. (#11621) @bdice
    • Conform "bench_isin" to match generator column names (#11549) @GregoryKimball
    • Removing int8 column option from parquet byte_array writing (#11539) @hyperbolic2346
    • Add checks for HLG layers in dask-cudf groupby tests (#10853) @charlesbluca
    • part1: Simplify BaseIndex to an abstract class (#10389) @skirui-source
    • Make all nvcc warnings into errors (#8916) @trxcllnt
  • v22.12.00(Dec 8, 2022)

    🚨 Breaking Changes

    • Add JNI for substring without 'end' parameter. (#12113) @firestarman
    • Refactor purge_nonempty_nulls (#12111) @ttnghia
    • Create an int8 column in read_csv when all elements are missing (#12110) @vuule
    • Throw an error when libcudf is built without cuFile and LIBCUDF_CUFILE_POLICY is set to "ALWAYS" (#12080) @vuule
    • Fix type promotion edge cases in numerical binops (#12074) @wence-
    • Reduce/Remove reliance on **kwargs and *args in IO readers & writers (#12025) @galipremsagar
    • Rollback of DeviceBufferLike (#12009) @madsbk
    • Remove unused managed_allocator (#12005) @vyasr
    • Pass column names to write_csv instead of table_metadata pointer (#11972) @vuule
    • Accept const refs instead of const unique_ptr refs in reduce and scan APIs. (#11960) @vyasr
    • Default to equal NaNs in make_merge_sets_aggregation. (#11952) @bdice
    • Remove validation that requires introspection (#11938) @vyasr
    • Trim quotes for non-string values in nested json parsing (#11898) @karthikeyann
    • Add tests ensuring that cudf's default stream is always used (#11875) @vyasr
    • Support nested types as groupby keys in libcudf (#11792) @PointKernel
    • Default to equal NaNs in make_collect_set_aggregation. (#11621) @bdice
    • Removing int8 column option from parquet byte_array writing (#11539) @hyperbolic2346
    • part1: Simplify BaseIndex to an abstract class (#10389) @skirui-source

    🐛 Bug Fixes

    • Fix include line for IO Cython modules (#12250) @vyasr
    • Make dask pinning looser (#12231) @vyasr
    • Workaround for CUB segmented-sort bug with boolean keys (#12217) @davidwendt
    • Fix from_dict backend dispatch to match upstream dask (#12203) @galipremsagar
    • Merge branch-22.10 into branch-22.12 (#12198) @davidwendt
    • Fix compression in ORC writer (#12194) @vuule
    • Don't use CMake 3.25.0 as it has a show stopping FindCUDAToolkit bug (#12188) @robertmaynard
    • Fix data corruption when reading ORC files with empty stripes (#12160) @vuule
    • Fix decimal binary operations (#12142) @galipremsagar
    • Ensure dlpack include is provided to cudf interop lib (#12139) @robertmaynard
    • Safely allocate udf_string pointers in strings_udf (#12138) @brandon-b-miller
    • Fix/disable jitify lto (#12122) @robertmaynard
    • Fix conditional_full_join benchmark (#12121) @GregoryKimball
    • Fix regex working-memory-size refactor error (#12119) @davidwendt
    • Add in negative size checks for columns (#12118) @revans2
    • Add JNI for substring without 'end' parameter. (#12113) @firestarman
    • Fix reading of CSV files with blank second row (#12098) @vuule
    • Fix an error in IO with GzipFile type (#12085) @galipremsagar
    • Workaround groupby aggregate thrust::copy_if overflow (#12079) @davidwendt
    • Fix alignment of compressed blocks in ORC writer (#12077) @vuule
    • Fix singleton-range __setitem__ edge case (#12075) @wence-
    • Fix type promotion edge cases in numerical binops (#12074) @wence-
    • Force using old fmt in nvbench. (#12067) @vyasr
    • Fixes List offset bug in Nested JSON reader (#12060) @karthikeyann
    • Allow falling back to shim_60.ptx by default in strings_udf (#12056) @brandon-b-miller
    • Force black exclusions for pre-commit. (#12036) @bdice
    • Add memory_usage & items implementation for Struct column & dtype (#12033) @galipremsagar
    • Reduce/Remove reliance on **kwargs and *args in IO readers & writers (#12025) @galipremsagar
    • Fixes bug in csv_reader_options construction in cython (#12021) @karthikeyann
    • Fix issues when both usecols and names options are used in read_csv (#12018) @vuule
    • Port thrust's pinned_allocator to cudf, since Thrust 1.17 removes the type (#12004) @robertmaynard
    • Revert "Replace most of preprocessor usage in nvcomp adapter with constexpr" (#11999) @vuule
    • Fix bug where df.loc resulting in single row could give wrong index (#11998) @eriknw
    • Switch to DISABLE_DEPRECATION_WARNINGS to match other RAPIDS projects (#11989) @robertmaynard
    • Fix maximum page size estimate in Parquet writer (#11962) @vuule
    • Fix local offset handling in bgzip reader (#11918) @upsj
    • Fix an issue reading struct-of-list types in Parquet. (#11910) @nvdbaranec
    • Fix memcheck error in TypeInference.Timestamp gtest (#11905) @davidwendt
    • Fix type casting in Series.setitem (#11904) @wence-
    • Fix memcheck error in get_dremel_data (#11903) @davidwendt
    • Fixes Unsupported column type error due to empty list columns in Nested JSON reader (#11897) @karthikeyann
    • Fix segmented-sort to ignore indices outside the offsets (#11888) @davidwendt
    • Fix cudf::stable_sorted_order for NaN and -NaN in FLOAT64 columns (#11874) @davidwendt
    • Fix writing of Parquet files with many fragments (#11869) @etseidl
    • Fix RangeIndex unary operators. (#11868) @vyasr
    • JNI Avoid NPE for reading host binary data (#11865) @revans2
    • Fix decimal benchmark input data generation (#11863) @karthikeyann
    • Fix pre-commit copyright check (#11860) @galipremsagar
    • Fix Parquet support for seconds and milliseconds duration types (#11854) @vuule
    • Ensure better compiler cache results between cudf cal-ver branches (#11835) @robertmaynard
    • Fix make_column_from_scalar for all-null strings column (#11807) @davidwendt
    • Tell jitify_preprocess where to search for libnvrtc (#11787) @robertmaynard
    • add V2 page header support to parquet reader (#11778) @etseidl
    • Parquet reader: bug fix for a num_rows/skip_rows corner case, w/optimization for nested preprocessing (#11752) @nvdbaranec
    • Determine if Arrow has S3 support at runtime in unit test. (#11560) @bdice

    📖 Documentation

    • Use rapidsai CODE_OF_CONDUCT.md (#12166) @bdice
    • Add symlinks to notebooks. (#12128) @bdice
    • Add truncate API to python doc pages (#12109) @galipremsagar
    • Update Numba docs links. (#12107) @bdice
    • Remove "Multi-GPU with Dask-cuDF" notebook. (#12095) @bdice
    • Fix link to c++ developer guide from CONTRIBUTING.md (#12084) @brandon-b-miller
    • Add pivot_table and crosstab to docs. (#12014) @bdice
    • Fix doxygen text for cudf::dictionary::encode (#11991) @davidwendt
    • Replace default_stream_value with get_default_stream in docs. (#11985) @vyasr
    • Add dtype docs pages and docstrings for cudf specific dtypes (#11974) @galipremsagar
    • Update Unit Testing in libcudf guidelines to code tests outside the cudf::test namespace (#11959) @davidwendt
    • Rename libcudf++ to libcudf. (#11953) @bdice
    • Fix documentation referring to removed as_gpu_matrix method. (#11937) @bdice
    • Remove "experimental" warning for struct columns in ORC reader and writer (#11880) @vuule
    • Initial draft of policies and guidelines for libcudf usage. (#11853) @vyasr
    • Add clear indication of non-GPU accelerated parameters in read_json docstring (#11825) @GregoryKimball
    • Add developer docs for writing tests (#11199) @vyasr

    🚀 New Features

    • Adds an EventHandler to Java MemoryBuffer to be invoked on close (#12125) @abellina
    • Support + in strings_udf (#12117) @brandon-b-miller
    • Support upper and lower in strings_udf (#12099) @brandon-b-miller
    • Add wheel builds (#12096) @vyasr
    • Allow setting malloc heap size in string udfs (#12094) @brandon-b-miller
    • Support strip, lstrip, and rstrip in strings_udf (#12091) @brandon-b-miller
    • Mark nvcomp zstd compression stable (#12059) @jbrennan333
    • Add debug-only onAllocated/onDeallocated to RmmEventHandler (#12054) @abellina
    • Enable building against the libarrow contained in pyarrow (#12034) @vyasr
    • Add strings like jni and native method (#12032) @cindyyuanjiang
    • Cleanup common parsing code in JSON, CSV reader (#12022) @karthikeyann
    • byte_range support for JSON Lines format (#12017) @karthikeyann
    • Minor cleanup of root CMakeLists.txt for better organization (#11988) @robertmaynard
    • Add inplace arithmetic operators to MaskedType (#11987) @brandon-b-miller
    • Implement JNI for chunked Parquet reader (#11961) @ttnghia
    • Add method argument to DataFrame.quantile (#11957) @rjzamora
    • Add gpu memory watermark apis to JNI (#11950) @abellina
    • Adds retryCount to RmmEventHandler.onAllocFailure (#11940) @abellina
    • Enable returning string data from UDFs used through apply (#11933) @brandon-b-miller
    • Switch over to rapids-cmake patches for thrust (#11921) @robertmaynard
    • Add strings udf C++ classes and functions for phase II (#11912) @davidwendt
    • Trim quotes for non-string values in nested json parsing (#11898) @karthikeyann
    • Enable CEC for strings_udf (#11884) @brandon-b-miller
    • ArrowIPCTableWriter writes an empty batch in the case of an empty table. (#11883) @firestarman
    • Implement chunked Parquet reader (#11867) @ttnghia
    • Add read_orc_metadata to libcudf (#11815) @vuule
    • Support nested types as groupby keys in libcudf (#11792) @PointKernel
    • Adding feature Truncate to DataFrame and Series (#11435) @VamsiTallam95

    🛠️ Improvements

    • Reduce number of tests marked spilling (#12197) @madsbk
    • Pin dask and distributed for release (#12165) @galipremsagar
    • Don't rely on GNU find in headers_test.sh (#12164) @wence-
    • Update cp.clip call (#12148) @quasiben
    • Enable automatic column projection in groupby().agg (#12124) @rjzamora
    • Refactor purge_nonempty_nulls (#12111) @ttnghia
    • Create an int8 column in read_csv when all elements are missing (#12110) @vuule
    • Spilling to host memory (#12106) @madsbk
    • First pass of pd.read_orc changes in tests (#12103) @galipremsagar
    • Expose engine argument in dask_cudf.read_json (#12101) @rjzamora
    • Remove CUDA 10 compatibility code. (#12088) @bdice
    • Move and update dask nightly install in CI (#12082) @galipremsagar
    • Throw an error when libcudf is built without cuFile and LIBCUDF_CUFILE_POLICY is set to "ALWAYS" (#12080) @vuule
    • Remove macros that inspect the contents of exceptions (#12076) @vyasr
    • Fix ingest_raw_data performance issue in Nested JSON reader due to RVO (#12070) @karthikeyann
    • Remove overflow error during decimal binops (#12063) @galipremsagar
    • Change cudf::detail::tdigest to cudf::tdigest::detail (#12050) @davidwendt
    • Fix quantile gtests coded in namespace cudf::test (#12049) @davidwendt
    • Add support for DataFrame.from_dict, DataFrame.to_dict and Series.to_dict (#12048) @galipremsagar
    • Refactor Parquet reader (#12046) @ttnghia
    • Forward merge 22.10 into 22.12 (#12045) @vyasr
    • Standardize newlines at ends of files. (#12042) @bdice
    • Trim trailing whitespace from all files. (#12041) @bdice
    • Use nosync policy in gather and scatter implementations. (#12038) @bdice
    • Remove smart quotes from all docstrings. (#12035) @bdice
    • Update cuda-python dependency to 11.7.1 (#12030) @galipremsagar
    • Add cython-lint to pre-commit checks. (#12020) @bdice
    • Use pragma once (#12019) @bdice
    • New GHA to add issues/prs to project board (#12016) @jarmak-nv
    • Add DataFrame.pivot_table. (#12015) @bdice
    • Rollback of DeviceBufferLike (#12009) @madsbk
    • Remove default parameters for nvtext::detail functions (#12007) @davidwendt
    • Remove default parameters for cudf::dictionary::detail functions (#12006) @davidwendt
    • Remove unused managed_allocator (#12005) @vyasr
    • Remove default parameters for cudf::strings::detail functions (#12003) @davidwendt
    • Remove unnecessary code from dask-cudf _Frame (#12001) @rjzamora
    • Ignore python docs build artifacts (#12000) @galipremsagar
    • Use rapids-cmake for google benchmark. (#11997) @vyasr
    • Leverage rapids_cython for more automated RPATH handling (#11996) @vyasr
    • Remove stale labeler (#11995) @raydouglass
    • Move protobuf compilation to CMake (#11986) @vyasr
    • Replace most of preprocessor usage in nvcomp adapter with constexpr (#11980) @vuule
    • Add missing noexcepts to column_in_metadata methods (#11973) @vyasr
    • Pass column names to write_csv instead of table_metadata pointer (#11972) @vuule
    • Accelerate libcudf segmented sort with CUB segmented sort (#11969) @davidwendt
    • Feature/remove default streams (#11967) @vyasr
    • Add pool memory resource to libcudf basic example (#11966) @davidwendt
    • Fix some libcudf calls to cudf::detail::gather (#11963) @davidwendt
    • Accept const refs instead of const unique_ptr refs in reduce and scan APIs. (#11960) @vyasr
    • Add deprecation warning for set_allocator. (#11958) @vyasr
    • Fix lists and structs gtests coded in namespace cudf::test (#11956) @davidwendt
    • Add full page indexes to Parquet writer benchmarks (#11955) @etseidl
    • Use gather-based strings factory in cudf::strings::strip (#11954) @davidwendt
    • Default to equal NaNs in make_merge_sets_aggregation. (#11952) @bdice
    • Add strip_delimiters option to read_text (#11946) @upsj
    • Refactor multibyte_split output_builder (#11945) @upsj
    • Remove validation that requires introspection (#11938) @vyasr
    • Add .str.find_multiple API (#11928) @galipremsagar
    • Add regex_program class for use with all regex APIs (#11927) @davidwendt
    • Enable backend dispatching for Dask-DataFrame creation (#11920) @rjzamora
    • Performance improvement in JSON Tree traversal (#11919) @karthikeyann
    • Fix some gtests incorrectly coded in namespace cudf::test (part I) (#11917) @davidwendt
    • Refactor pad/zfill functions for reuse with strings udf (#11914) @davidwendt
    • Add nanosecond & microsecond to DatetimeProperties (#11911) @galipremsagar
    • Pin mimesis version in setup.py. (#11906) @bdice
    • Error on ListColumn or any new unsupported column in cudf.Index (#11902) @galipremsagar
    • Add thrust output iterator fix (1805) to thrust.patch (#11900) @davidwendt
    • Relax codecov threshold diff (#11899) @galipremsagar
    • Use public APIs in STREAM_COMPACTION_NVBENCH (#11892) @GregoryKimball
    • Add coverage for string UDF tests. (#11891) @vyasr
    • Provide data_chunk_source wrapper for datasource (#11886) @upsj
    • Handle multibyte_split byte_range out-of-bounds offsets on host (#11885) @upsj
    • Add tests ensuring that cudf's default stream is always used (#11875) @vyasr
    • Change expect_strings_empty into expect_column_empty libcudf test utility (#11873) @davidwendt
    • Add ngroup (#11871) @shwina
    • Reduce memory usage in nested JSON parser - tree generation (#11864) @karthikeyann
    • Unpin dask and distributed for development (#11859) @galipremsagar
    • Remove unused includes for table/row_operators (#11857) @GregoryKimball
    • Use conda-forge's pyorc (#11855) @jakirkham
    • Add libcudf strings examples (#11849) @davidwendt
    • Remove cudf_io namespace alias (#11827) @vuule
    • Test/remove thrust vector usage (#11813) @vyasr
    • Add BGZIP reader to python read_text (#11802) @upsj
    • Merge branch-22.10 into branch-22.12 (#11801) @davidwendt
    • Fix compile warning from CUDF_FUNC_RANGE in a member function (#11798) @davidwendt
    • Update cudf JNI version to 22.12.0-SNAPSHOT (#11764) @pxLi
    • Update flake8 to 5.0.4 and use flake8-force to check Cython. (#11736) @bdice
    • Add BGZIP multibyte_split benchmark (#11723) @upsj
    • Bifurcate Dependency Lists (#11674) @bdice
    • Default to equal NaNs in make_collect_set_aggregation. (#11621) @bdice
    • Conform "bench_isin" to match generator column names (#11549) @GregoryKimball
    • Removing int8 column option from parquet byte_array writing (#11539) @hyperbolic2346
    • Add checks for HLG layers in dask-cudf groupby tests (#10853) @charlesbluca
    • part1: Simplify BaseIndex to an abstract class (#10389) @skirui-source
    • Make all nvcc warnings into errors (#8916) @trxcllnt
  • v23.02.00a(Nov 18, 2022)

    🚨 Breaking Changes

    • Add trailing comma support for nested JSON reader (#12448) @karthikeyann
    • Upgrade to arrow-10.0.1 (#12327) @galipremsagar
    • Fail loudly to avoid data corruption with unsupported input in read_orc (#12325) @vuule
    • CSV, JSON reader to infer integer column with nulls as int64 instead of float64 (#12309) @karthikeyann
    • Remove deprecated code for 23.02 (#12281) @vyasr
    • Null element for parsing error in numeric types in JSON, CSV reader (#12272) @karthikeyann
    • Purge non-empty nulls for superimpose_nulls and push_down_nulls (#12239) @ttnghia
    • Rename cudf::structs::detail::superimpose_parent_nulls APIs (#12230) @ttnghia
    • Remove JIT type names, refactor id_to_type. (#12158) @bdice
    • Floor division uses integer division for integral arguments (#12131) @wence-
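
The floor-division entry above (#12131) can be illustrated with a small example. The sketch below is plain Python rather than cuDF code, and it assumes the usual motivation for such a change: routing integer floor division through floating-point division loses precision once operands exceed 2**53. Both helper names are hypothetical.

```python
import math

def floordiv_via_float(a: int, b: int) -> int:
    # Hypothetical float-based approach: true-divide in float64, then floor.
    return math.floor(a / b)

def floordiv_integer(a: int, b: int) -> int:
    # Integer floor division: exact for integral arguments.
    return a // b

big = 2**53 + 1  # not exactly representable as a float64
print(floordiv_via_float(big, 1) == big)  # False
print(floordiv_integer(big, 1) == big)    # True
```

For small operands the two approaches agree, so the difference only shows up for large integer values.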

    🐛 Bug Fixes

    • Enable metadata transfer for complex types in transpose (#12491) @galipremsagar
    • Fix missing metadata transfer in concat for ListColumn (#12487) @galipremsagar
    • Fix compile issue with arrow 10 (#12465) @ttnghia
    • Fix xfail incompatibilities (#12423) @vyasr
    • Fix bug in Parquet column index encoding (#12404) @etseidl
    • When building Arrow shared look for a shared OpenSSL (#12396) @robertmaynard
    • Fix get_json_object to return empty column on empty input (#12384) @davidwendt
    • Pin arrow 9 in testing dependencies to prevent conda solve issues (#12377) @vyasr
    • Fix reductions any/all return value for empty input (#12374) @davidwendt
    • Fix debug compile errors in parquet.hpp (#12372) @davidwendt
    • Purge non-empty nulls in cudf::make_lists_column (#12370) @ttnghia
    • Use correct memory resource in io::make_column (#12364) @vyasr
    • Add code to detect possible malformed page data in parquet files. (#12360) @nvdbaranec
    • Fail loudly to avoid data corruption with unsupported input in read_orc (#12325) @vuule
    • Fix NumericPairIteratorTest for float values (#12306) @davidwendt
    • Fixes memory allocation in nested JSON tokenizer (#12300) @elstehle
    • Fix regex \A and \Z to strictly match string begin/end (#12282) @davidwendt
    • Fix compile issue in json_chunked_reader.cpp (#12280) @ttnghia
    • Change reductions any/all to return valid values for empty input (#12279) @davidwendt
    • Only exclude join keys that are indices from key columns (#12271) @wence-
    • Fix spill to device limit (#12252) @madsbk
    • Correct behaviour of sort in concat for singleton concatenations (#12247) @wence-
    • Purge non-empty nulls for superimpose_nulls and push_down_nulls (#12239) @ttnghia
    • Patch CUB DeviceSegmentedSort and remove workaround (#12234) @davidwendt
    • Fix memory leak in udf_string::assign(&&) function (#12206) @davidwendt
    • Workaround thrust-copy-if limit in json get_tree_representation (#12190) @davidwendt
    • Fix page size calculation in Parquet writer (#12182) @etseidl
    • Add cudf::detail::sizes_to_offsets_iterator to allow checking overflow in offsets (#12180) @davidwendt
    • Workaround thrust-copy-if limit in wordpiece-tokenizer (#12168) @davidwendt
    • Floor division uses integer division for integral arguments (#12131) @wence-

    📖 Documentation

    • Link unsupported iteration API docstrings (#12482) @galipremsagar
    • strings_udf doc update (#12469) @brandon-b-miller
    • Update cudf_assert docs with correct NDEBUG behavior (#12464) @robertmaynard
    • Update pre-commit hooks guide (#12395) @bdice
    • Update test docs to not use detail comparison utilities (#12332) @PointKernel
    • Fix doxygen description for regex_program::compute_working_memory_size (#12329) @davidwendt
    • Add eval to docs. (#12322) @vyasr
    • Turn on xfail_strict=true (#12244) @wence-
    • Update 10 minutes to cuDF (#12114) @wence-

    🚀 New Features

    • one_hot_encode to use experimental row comparators (#12478) @divyegala
    • Refactor thrust_copy_if into cudf::detail::copy_if_safe (#12455) @ttnghia
    • Add trailing comma support for nested JSON reader (#12448) @karthikeyann
    • Extract tokenize_json.hpp detail header from src/io/json/nested_json.hpp (#12432) @ttnghia
    • JNI bindings to write CSV (#12425) @mythrocks
    • Implement lists::reverse (#12336) @ttnghia
    • Use device_read in experimental read_json (#12314) @vuule
    • Implement JNI for strings::reverse (#12283) @ttnghia
    • Null element for parsing error in numeric types in JSON, CSV reader (#12272) @karthikeyann
    • Add cudf::strings::like function with multiple patterns (#12269) @davidwendt
    • Add environment variable to control host memory allocation in hostdevice_vector (#12251) @vuule
    • Add cudf::strings::reverse function (#12227) @davidwendt
    • Support replace in strings_udf (#12207) @brandon-b-miller
    • Add support to read binary encoded decimals in parquet (#12205) @PointKernel
    • Support regex EOL where the string ends with a new-line character (#12181) @davidwendt

    🛠️ Improvements

    • Stop using pandas._testing (#12492) @vyasr
    • Rework nvtext::generate_character_ngrams to use make_strings_children (#12480) @davidwendt
    • Raise warnings as errors in the test suite (#12468) @vyasr
    • Remove int32 hard-coding in python (#12467) @galipremsagar
    • Use cudaMemcpyDefault. (#12466) @bdice
    • Update workflows for nightly tests (#12462) @ajschmidt8
    • JNI build image default as cuda11.8 (#12441) @pxLi
    • Re-enable Recently Updated Check (#12435) @ajschmidt8
    • Rework remaining cudf::strings::from_xyz functions to use make_strings_children (#12434) @vuule
    • Remove arguments for checking exception messages in Python (#12424) @vyasr
    • Clean up cuco usage (#12421) @PointKernel
    • Fix warnings in remaining modules (#12406) @vyasr
    • Update ops-bot.yaml (#12402) @ajschmidt8
    • Rework cudf::strings::integers_to_ipv4 to use make_strings_children utility (#12401) @davidwendt
    • Expose the RMM pool size in JNI (#12390) @revans2
    • Fix COPYING_TEST: gtests coded in namespace cudf::test (#12387) @davidwendt
    • Rework cudf::strings::url_encode to use make_strings_children utility (#12385) @davidwendt
    • Fix warnings in test_datetime.py (#12381) @vyasr
    • Fix warnings in dataframe.py (#12369) @vyasr
    • Update conda recipes. (#12368) @bdice
    • Use gpu-latest-1 runner tag (#12366) @bdice
    • Rework cudf::strings::from_booleans to use make_strings_children (#12365) @vuule
    • Fix warnings in test modules up to test_dataframe.py (#12355) @vyasr
    • Accelerate stable-segmented-sort with CUB segmented sort (#12347) @davidwendt
    • Enable max compression ratio small block optimization for ZSTD (#12338) @vuule
    • Fix warnings in test_monotonic.py (#12334) @vyasr
    • Improve JSON column creation performance (list offsets) (#12330) @karthikeyann
    • Upgrade to arrow-10.0.1 (#12327) @galipremsagar
    • Fix warnings in test_orc.py (#12326) @vyasr
    • Fix warnings in test_groupby.py (#12324) @vyasr
    • Fix test_notebooks.sh (#12323) @ajschmidt8
    • Fix transform gtests coded in namespace cudf::test (#12321) @davidwendt
    • Fix check_style.sh script (#12320) @ajschmidt8
    • Rework cudf::strings::from_timestamps to use make_strings_children (#12317) @davidwendt
    • Fix warnings in test_index.py (#12313) @vyasr
    • Fix warnings in test_multiindex.py (#12310) @vyasr
    • CSV, JSON reader to infer integer column with nulls as int64 instead of float64 (#12309) @karthikeyann
    • Fix warnings in test_indexing.py (#12305) @vyasr
    • Fix warnings in test_joining.py (#12304) @vyasr
    • Unpin dask and distributed for development (#12302) @galipremsagar
    • Re-enable sccache for Jenkins builds (#12297) @ajschmidt8
    • Define needs for pr-builder workflow. (#12296) @bdice
    • Forward merge 22.12 into 23.02 (#12294) @vyasr
    • Fix warnings in test_stats.py (#12293) @vyasr
    • Fix table gtests coded in namespace cudf::test (#12292) @davidwendt
    • Change cython for regex calls to use cudf::strings::regex_program (#12289) @davidwendt
    • Improved error reporting when reading multiple JSON files (#12285) @vuule
    • Deprecate Frame.sum_of_squares (#12284) @vyasr
    • Remove deprecated code for 23.02 (#12281) @vyasr
    • Clean up handling of max_page_size_bytes in Parquet writer (#12277) @etseidl
    • Fix replace gtests coded in namespace cudf::test (#12270) @davidwendt
    • Add pandas nullable type support in Index.to_pandas (#12268) @galipremsagar
    • Rework nvtext::detokenize to use indexalator for row indices (#12267) @davidwendt
    • Fix reduction gtests coded in namespace cudf::test (#12257) @davidwendt
    • Remove default parameters from cudf::detail::sort function declarations (#12254) @davidwendt
    • Add duplicated support for Series, DataFrame and Index (#12246) @galipremsagar
    • Replace column/table test utilities with macros (#12242) @PointKernel
    • Rework cudf::strings::pad and zfill to use make_strings_children (#12238) @davidwendt
    • Fix sort gtests coded in namespace cudf::test (#12237) @davidwendt
    • Wrapping concat and file writes in @acquire_spill_lock() (#12232) @madsbk
    • Rename cudf::structs::detail::superimpose_parent_nulls APIs (#12230) @ttnghia
    • Cover parsing to decimal types in read_json tests (#12229) @vuule
    • Spill Statistics (#12223) @madsbk
    • Use CUDF_JNI_ENABLE_PROFILING to conditionally enable profiling support. (#12221) @bdice
    • Clean up of test_spilling.py (#12220) @madsbk
    • Simplify repetitive boolean logic (#12218) @vuule
    • Add Series.hasnans and Index.hasnans (#12214) @galipremsagar
    • Add cudf::strings::udf::replace function (#12210) @davidwendt
    • Adds in new java APIs for appending byte arrays to host columnar data (#12208) @revans2
    • Remove Python dependencies from Java CI. (#12193) @bdice
    • Fix null order in sort-based groupby and improve groupby tests (#12191) @divyegala
    • Move strings children functions from cudf/strings/detail/utilities.cuh to new header (#12185) @davidwendt
    • Clean up existing JNI scalar to column code (#12173) @revans2
    • Remove JIT type names, refactor id_to_type. (#12158) @bdice
    • Update JNI version to 23.02.0-SNAPSHOT (#12129) @pxLi
    • Minor refactor of cpp/src/io/parquet/page_data.cu (#12126) @etseidl
    • Add codespell as a linter (#12097) @benfred
    • Enable specifying exceptions in error macros (#12078) @vyasr
    • Move _label_encoding from Series to Column (#12040) @shwina
    • Add GitHub Actions Workflows (#12002) @ajschmidt8
    • Consolidate dask-cudf groupby_agg calls in one place (#10835) @charlesbluca
  • v22.10.01(Nov 3, 2022)

    🚨 Breaking Changes

    • Disable Zstandard decompression on nvCOMP 2.4 and Pascal GPUs (#11856) @vuule
    • Disable nvCOMP DEFLATE integration (#11811) @vuule
    • Fix return type of Index.isna & Index.notna (#11769) @galipremsagar
    • Remove kwargs in read_csv & to_csv (#11762) @galipremsagar
    • Fix cudf::partition* APIs that do not return offsets for empty output table (#11709) @ttnghia
    • Fix regex negated classes to not automatically include new-lines (#11644) @davidwendt
    • Update zfill to match Python output (#11634) @davidwendt
    • Upgrade pandas to 1.5 (#11617) @galipremsagar
    • Change default value of ordered to False in CategoricalDtype (#11604) @galipremsagar
    • Move cudf::strings::findall_record to cudf::strings::findall (#11575) @davidwendt
    • Adding optional parquet reader schema (#11524) @hyperbolic2346
    • Deprecate skiprows and num_rows in read_orc (#11522) @galipremsagar
    • Remove support for skip_rows / num_rows options in the parquet reader. (#11503) @nvdbaranec
    • Drop support for skiprows and num_rows in cudf.read_parquet (#11480) @galipremsagar
    • Disable Arrow S3 support by default. (#11470) @bdice
    • Convert thrust::optional usages to std::optional (#11455) @robertmaynard
    • Remove unused is_struct trait. (#11450) @bdice
    • Refactor the Buffer class (#11447) @madsbk
    • Return empty dataframe when reading an ORC file using empty columns option (#11446) @vuule
    • Refactor pad_side and strip_type enums into side_type enum (#11438) @davidwendt
    • Remove HASH_SERIAL_MURMUR3 / serial32BitMurmurHash3 (#11383) @bdice
    • Use the new JSON parser when the experimental reader is selected (#11364) @vuule
    • Remove deprecated Series.applymap. (#11031) @bdice
    • Remove deprecated expand parameter from str.findall. (#11030) @bdice
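For the zfill entry in the list above (#11634), the Python behavior being matched is presumably that of the built-in `str.zfill`; that reading is an assumption based on the PR title alone. The sketch below uses only standard Python, not cuDF, and shows the detail most likely to differ from naive left-padding: a leading sign stays in front of the inserted zeros.

```python
# Built-in str.zfill pads with zeros on the left up to the requested width.
samples = ["42", "-42", "+7", "abc", "12345"]

padded = [s.zfill(5) for s in samples]

# Naive left-padding, for contrast: zeros go in front of the sign.
naive = [s.rjust(5, "0") for s in samples]

for original, z, n in zip(samples, padded, naive):
    print(f"{original!r:>8} zfill -> {z!r:>8}   rjust -> {n!r:>8}")
```

Strings that are already at least as long as the requested width are returned unchanged by both methods.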

    🐛 Bug Fixes

    • Update cuda-python dependency to 11.7.1 (#11994) @shwina
    • Fixes bug in temporary decompression space estimation before calling nvcomp (#11879) @abellina
    • Handle ptx file paths during strings_udf import (#11862) @galipremsagar
    • Disable Zstandard decompression on nvCOMP 2.4 and Pascal GPUs (#11856) @vuule
    • Reset strings_udf CEC and solve several related issues (#11846) @brandon-b-miller
    • Fix bug in new shuffle-based groupby implementation (#11836) @rjzamora
    • Fix is_valid checks in Scalar._binaryop (#11818) @wence-
    • Fix operator NotImplemented issue with numpy (#11816) @galipremsagar
    • Disable nvCOMP DEFLATE integration (#11811) @vuule
    • Build strings_udf package with other python packages in nightlies (#11808) @brandon-b-miller
    • Revert problematic shuffle=explicit-comms changes (#11803) @rjzamora
    • Fix regex out-of-bounds write in strided rows logic (#11797) @davidwendt
    • Build cudf locally before building strings_udf conda packages in CI (#11785) @brandon-b-miller
    • Fix an issue in cudf::row_bit_count involving structs and lists at multiple levels. (#11779) @nvdbaranec
    • Fix return type of Index.isna & Index.notna (#11769) @galipremsagar
    • Fix issue with set-item in case of list and struct types (#11760) @galipremsagar
    • Ensure all libcudf APIs run on cudf's default stream (#11759) @vyasr
    • Resolve dask_cudf failures caused by upstream groupby changes (#11755) @rjzamora
    • Fix ORC string sum statistics (#11740) @vuule
    • Add strings_udf package for python 3.9 (#11730) @brandon-b-miller
    • Ensure that all tests launch kernels on cudf's default stream (#11726) @vyasr
    • Don't assume stream is a compile-time constant expression (#11725) @vyasr
    • Fix get_thrust.cmake format at patch command (#11715) @davidwendt
    • Fix cudf::partition* APIs that do not return offsets for empty output table (#11709) @ttnghia
    • Fix cudf::lists::sort_lists for NaN and Infinity values (#11703) @davidwendt
    • Modify ORC reader timestamp parsing to match the apache reader behavior (#11699) @vuule
    • Fix DataFrame.from_arrow to preserve type metadata (#11698) @galipremsagar
    • Fix compile error due to missing header (#11697) @ttnghia
    • Default to Snappy compression in to_orc when using cuDF or Dask (#11690) @vuule
    • Fix an issue related to MultiIndex when group_keys=True (#11689) @galipremsagar
    • Transfer correct dtype to exploded column (#11687) @wence-
    • Ignore protobuf generated files in mypy checks (#11685) @galipremsagar
    • Maintain the index name after .loc (#11677) @shwina
    • Fix issue with extracting nested column data & dtype preservation (#11671) @galipremsagar
    • Ensure that all cudf tests and benchmarks are conda env aware (#11666) @robertmaynard
    • Update to Thrust 1.17.2 to fix cub ODR issues (#11665) @robertmaynard
    • Fix multi-file remote datasource bug (#11655) @rjzamora
    • Fix invalid regex quantifier check to not include alternation (#11654) @davidwendt
    • Fix bug in device_write(): it uses an incorrect size (#11651) @madsbk
    • Fix overflows in benchmarks (#11649) @elstehle
    • Fix regex negated classes to not automatically include new-lines (#11644) @davidwendt
    • Fix compile error in benchmark nested_json.cpp (#11637) @davidwendt
    • Update zfill to match Python output (#11634) @davidwendt
    • Removed converted type for INT32 and INT64 since they do not convert (#11627) @hyperbolic2346
    • Fix host scalars construction of nested types (#11612) @galipremsagar
    • Fix compile warning in nested_json_gpu.cu (#11607) @davidwendt
    • Change default value of ordered to False in CategoricalDtype (#11604) @galipremsagar
    • Preserve order if necessary when deduping categoricals internally (#11597) @brandon-b-miller
    • Add is_timestamp test for leap second (60) (#11594) @davidwendt
    • Fix an issue with to_arrow when column name type is not a string (#11590) @galipremsagar
    • Fix exception in segmented-reduce benchmark (#11588) @davidwendt
    • Fix encode/decode of negative timestamps in ORC reader/writer (#11586) @vuule
    • Correct distribution data type in quantiles benchmark (#11584) @vuule
    • Fix multibyte_split benchmark for host buffers (#11583) @upsj
    • xfail custreamz display test for now (#11567) @shwina
    • Fix JNI for TableWithMeta to use schema_info instead of column_names (#11566) @jlowe
    • Reduce code duplication for dask & distributed nightly/stable installs (#11565) @galipremsagar
    • Fix groupby failures in dask_cudf CI (#11561) @rjzamora
    • Fix for pivot: error when 'values' is a multicharacter string (#11538) @shaswat-indian
    • find_package(cudf) + arrow9 usable with cudf build directory (#11535) @robertmaynard
    • Fixing crash when writing binary nested data in parquet (#11526) @hyperbolic2346
    • Fix for: error when assigning a value to an empty series (#11523) @shaswat-indian
    • Fix invalid results from conditional-left-anti-join in debug build (#11517) @davidwendt
    • Fix cmake error after upgrading to Arrow 9 (#11513) @ttnghia
    • Fix reverse binary operators acting on a host value and cudf.Scalar (#11512) @bdice
    • Update parquet fuzz tests to drop support for skiprows & num_rows (#11505) @galipremsagar
    • Use rapids-cmake 22.10 best practice for RAPIDS.cmake location (#11493) @robertmaynard
    • Handle some zero-sized corner cases in dlpack interop (#11449) @wence-
    • Return empty dataframe when reading an ORC file using empty columns option (#11446) @vuule
    • libcudf c++ example updated to CPM version 0.35.3 (#11417) @robertmaynard
    • Fix regex quantifier check to include capture groups (#11373) @davidwendt
    • Fix read_text when byte_range is aligned with field (#11371) @upsj
    • Fix to_timestamps truncated subsecond calculation (#11367) @davidwendt
    • column: calculate null_count before release()ing the cudf::column (#11365) @wence-

    📖 Documentation

    • Update guide-to-udfs notebook (#11861) @brandon-b-miller
    • Update docstring for cudf.read_text (#11799) @GregoryKimball
    • Add doc section for list & struct handling (#11770) @galipremsagar
    • Document that minimum required CMake version is now 3.23.1 (#11751) @robertmaynard
    • Update libcudf documentation build command in DOCUMENTATION.md (#11735) @davidwendt
    • Add docs for use of string data to DataFrame.apply and Series.apply and update guide to UDFs notebook (#11733) @brandon-b-miller
    • Enable more Pydocstyle rules (#11582) @bdice
    • Remove unused cpp/img folder (#11554) @davidwendt
    • Publish C++ developer docs (#11475) @vyasr
    • Fix a misalignment in cudf.get_dummies docstring (#11443) @galipremsagar
    • Update contributing doc to include links to the developer guides (#11390) @davidwendt
    • Fix table_view_base doxygen format (#11340) @davidwendt
    • Create main developer guide for Python (#11235) @vyasr
    • Add developer documentation for benchmarking (#11122) @vyasr
    • cuDF error handling document (#7917) @isVoid

    🚀 New Features

    • Add hasNull statistic reading ability to ORC (#11747) @devavret
    • Add istitle to string UDFs (#11738) @brandon-b-miller
    • JSON Column creation in GPU (#11714) @karthikeyann
    • Adds option to take explicit nested schema for nested JSON reader (#11682) @elstehle
    • Add BGZIP data_chunk_reader (#11652) @upsj
    • Support DECIMAL order-by for RANGE window functions (#11645) @mythrocks
    • changing version of cmake to 3.23.3 (#11619) @hyperbolic2346
    • Generate unique keys table in java JNI contiguousSplitGroups (#11614) @res-life
    • Generic type casting to support the new nested JSON reader (#11613) @elstehle
    • JSON tree traversal (#11610) @karthikeyann
    • Add casting operators to masked UDFs (#11578) @brandon-b-miller
    • Adds type inference and type conversion for leaf-columns to the nested JSON parser (#11574) @elstehle
    • Add strings 'like' function (#11558) @davidwendt
    • Handle hyphen as literal for regex cclass when incomplete range (#11557) @davidwendt
    • Enable ZSTD compression in ORC and Parquet writers (#11551) @vuule
    • Adds support for json lines format to the nested JSON reader (#11534) @elstehle
    • Adding optional parquet reader schema (#11524) @hyperbolic2346
    • Adds GPU implementation of JSON-token-stream to JSON-tree (#11518) @karthikeyann
    • Add gdb pretty-printers for simple types (#11499) @upsj
    • Add create_random_column function to the data generator (#11490) @vuule
    • Add fluent API builder to data_profile (#11479) @vuule
    • Adds Nested Json benchmark (#11466) @karthikeyann
    • Convert thrust::optional usages to std::optional (#11455) @robertmaynard
    • Python API for the future experimental JSON reader (#11426) @vuule
    • Return schema info from JSON reader (#11419) @vuule
    • Add regex ASCII flag support for matching builtin character classes (#11404) @davidwendt
    • Truncate parquet column indexes (#11403) @etseidl
    • Adds the end-to-end JSON parser implementation (#11388) @elstehle
    • Use the new JSON parser when the experimental reader is selected (#11364) @vuule
    • Add placeholder for the experimental JSON reader (#11334) @vuule
    • Add read-only functions on string dtypes to DataFrame.apply and Series.apply (#11319) @brandon-b-miller
    • Added 'crosstab' and 'pivot_table' features (#11314) @shaswat-indian
    • Quickly error out when trying to build with unsupported nvcc versions (#11297) @robertmaynard
    • Adds JSON tokenizer (#11264) @elstehle
    • List lexicographic comparator (#11129) @devavret
    • Add generic type inference for cuIO (#11121) @PointKernel
    • Fully support nested types in cudf::contains (#10656) @ttnghia
    • Support nested types in lists::contains (#10548) @ttnghia

    🛠️ Improvements

    • Pin dask and distributed for release (#11822) @galipremsagar
    • Add examples for Nested JSON reader (#11814) @GregoryKimball
    • Support shuffle-based groupby aggregations in dask_cudf (#11800) @rjzamora
    • Update strings udf version updater script (#11772) @galipremsagar
    • Remove kwargs in read_csv & to_csv (#11762) @galipremsagar
    • Pass dtype param to avoid pd.Series warnings (#11761) @galipremsagar
    • Enable schema_element & keep_quotes support in json reader (#11746) @galipremsagar
    • Add ability to construct ListColumn when size is None (#11745) @galipremsagar
    • Reduces memory requirements in JSON parser and adds bytes/s and peak memory usage to benchmarks (#11732) @elstehle
    • Add missing copyright headers. (#11712) @bdice
    • Fix copyright check issues in pre-commit (#11711) @bdice
    • Include decimal in supported types for range window order-by columns (#11710) @mythrocks
    • Disable very large column gtest for contiguous-split (#11706) @davidwendt
    • Drop split_out=None test from groupby.agg (#11704) @wence-
    • Use CubinLinker for CUDA Minor Version Compatibility (#11701) @gmarkall
    • Add regex capture-group parameter to auto convert to non-capture groups (#11695) @davidwendt
    • Add a __dataframe__ method to the protocol dataframe object (#11692) @rgommers
    • Special-case multibyte_split for single-byte delimiter (#11681) @upsj
    • Remove isort exclusions (#11680) @bdice
    • Refactor CSV reader benchmarks with nvbench (#11678) @PointKernel
    • Check conda recipe headers with pre-commit (#11669) @bdice
    • Remove redundant style check for clang-format. (#11668) @bdice
    • Add support for group_keys in groupby (#11659) @galipremsagar
    • Fix pandoc pinning. (#11658) @bdice
    • Revert removal of skip_rows / num_rows options from the Parquet reader. (#11657) @nvdbaranec
    • Update git metadata (#11647) @bdice
    • Call set_null_count on a returning column if null-count is known (#11646) @davidwendt
    • Fix some libcudf detail calls not passing the stream variable (#11642) @davidwendt
    • Update to mypy 0.971 (#11640) @wence-
    • Refactor strings strip functor to details header (#11635) @davidwendt
    • Fix incorrect nullCount in get_json_object (#11633) @trxcllnt
    • Simplify hostdevice_vector (#11631) @upsj
    • Refactor parquet writer benchmarks with nvbench (#11623) @PointKernel
    • Rework contains_scalar to check nulls at runtime (#11622) @davidwendt
    • Fix incorrect memory resource used in rolling temp columns (#11618) @mythrocks
    • Upgrade pandas to 1.5 (#11617) @galipremsagar
    • Move type-dispatcher calls from traits.hpp to traits.cpp (#11616) @davidwendt
    • Refactor parquet reader benchmarks with nvbench (#11611) @PointKernel
    • Forward-merge branch-22.08 to branch-22.10 (#11608) @bdice
    • Use stream in Java API. (#11601) @bdice
    • Refactors of public/detail APIs, CUDF_FUNC_RANGE, stream handling. (#11600) @bdice
    • Improve ORC writer benchmark with nvbench (#11598) @PointKernel
    • Tune multibyte_split kernel (#11587) @upsj
    • Move split_utils.cuh to strings/detail (#11585) @davidwendt
    • Fix warnings due to compiler regression with if constexpr (#11581) @ttnghia
    • Add full 24-bit dictionary support to Parquet writer (#11580) @etseidl
    • Expose "explicit-comms" option in shuffle-based dask_cudf functions (#11576) @rjzamora
    • Move cudf::strings::findall_record to cudf::strings::findall (#11575) @davidwendt
    • Refactor dask_cudf groupby to use apply_concat_apply (#11571) @rjzamora
    • Add ability to write list(struct) columns as map type in orc writer (#11568) @galipremsagar
    • Add byte_range to multibyte_split benchmark + NVBench refactor (#11562) @upsj
    • JNI support for writing binary columns in parquet (#11556) @revans2
    • Support additional dictionary bit widths in Parquet writer (#11547) @etseidl
    • Refactor string/numeric conversion utilities (#11545) @davidwendt
    • Removing unnecessary asserts in parquet tests (#11544) @hyperbolic2346
    • Clean up ORC reader benchmarks with NVBench (#11543) @PointKernel
    • Reuse MurmurHash3_32 in Parquet page data. (#11528) @bdice
    • Add hexadecimal value separators (#11527) @bdice
    • Deprecate skiprows and num_rows in read_orc (#11522) @galipremsagar
    • Struct support for NULL_EQUALS binary operation (#11520) @rwlee
    • Bump hadoop-common from 3.2.3 to 3.2.4 in /java (#11516) @dependabot[bot]
    • Fix Feather test warning. (#11511) @bdice
    • copy_range ballot_syncs to have no execution dependency (#11508) @robertmaynard
    • Upgrade to arrow-9.x (#11507) @galipremsagar
    • Remove support for skip_rows / num_rows options in the parquet reader. (#11503) @nvdbaranec
    • Single-pass multibyte_split (#11500) @upsj
    • Sanitize percentile_approx() output for empty input (#11498) @SrikarVanavasam
    • Unpin dask and distributed for development (#11492) @galipremsagar
    • Move SparkMurmurHash3_32 functor. (#11489) @bdice
    • Refactor group_nunique.cu to use nullate::DYNAMIC for reduce-by-key functor (#11482) @davidwendt
    • Drop support for skiprows and num_rows in cudf.read_parquet (#11480) @galipremsagar
    • Add reduction distinct_count benchmark (#11473) @ttnghia
    • Add groupby nunique aggregation benchmark (#11472) @ttnghia
    • Disable Arrow S3 support by default. (#11470) @bdice
    • Add groupby max aggregation benchmark (#11464) @ttnghia
    • Extract Dremel encoding code from Parquet (#11461) @vyasr
    • Add missing Thrust #includes. (#11457) @bdice
    • Make CMake hooks verbose (#11456) @vyasr
    • Control Parquet page size through Python API (#11454) @etseidl
    • Add control of Parquet column index creation to python (#11453) @etseidl
    • Remove unused is_struct trait. (#11450) @bdice
    • Refactor the Buffer class (#11447) @madsbk
    • Refactor pad_side and strip_type enums into side_type enum (#11438) @davidwendt
    • Update to Thrust 1.17.0 (#11437) @bdice
    • Add in JNI for parsing JSON data and getting the metadata back too. (#11431) @revans2
    • Convert byte_array_view to use std::byte (#11424) @hyperbolic2346
    • Deprecate unflatten_nested_columns (#11421) @SrikarVanavasam
    • Remove HASH_SERIAL_MURMUR3 / serial32BitMurmurHash3 (#11383) @bdice
    • Add Spark list hashing Java tests (#11379) @bdice
    • Move cmake to the build section. (#11376) @vyasr
    • Remove use of CUDA driver API calls from libcudf (#11370) @shwina
    • Add column constructor from device_uvector&& (#11356) @SrikarVanavasam
    • Remove unused custreamz thirdparty directory (#11343) @vyasr
    • Update jni version to 22.10.0-SNAPSHOT (#11338) @pxLi
    • Enable using upstream jitify2 (#11287) @shwina
    • Cache cudf.Scalar (#11246) @shwina
    • Remove deprecated Series.applymap. (#11031) @bdice
    • Remove deprecated expand parameter from str.findall. (#11030) @bdice
  • v22.10.00(Oct 12, 2022)

    🚨 Breaking Changes

    • Disable Zstandard decompression on nvCOMP 2.4 and Pascal GPUs (#11856) @vuule
    • Disable nvCOMP DEFLATE integration (#11811) @vuule
    • Fix return type of Index.isna & Index.notna (#11769) @galipremsagar
    • Remove kwargs in read_csv & to_csv (#11762) @galipremsagar
    • Fix cudf::partition* APIs that do not return offsets for empty output table (#11709) @ttnghia
    • Fix regex negated classes to not automatically include new-lines (#11644) @davidwendt
    • Update zfill to match Python output (#11634) @davidwendt
    • Upgrade pandas to 1.5 (#11617) @galipremsagar
    • Change default value of ordered to False in CategoricalDtype (#11604) @galipremsagar
    • Move cudf::strings::findall_record to cudf::strings::findall (#11575) @davidwendt
    • Adding optional parquet reader schema (#11524) @hyperbolic2346
    • Deprecate skiprows and num_rows in read_orc (#11522) @galipremsagar
    • Remove support for skip_rows / num_rows options in the parquet reader. (#11503) @nvdbaranec
    • Drop support for skiprows and num_rows in cudf.read_parquet (#11480) @galipremsagar
    • Disable Arrow S3 support by default. (#11470) @bdice
    • Convert thrust::optional usages to std::optional (#11455) @robertmaynard
    • Remove unused is_struct trait. (#11450) @bdice
    • Refactor the Buffer class (#11447) @madsbk
    • Return empty dataframe when reading an ORC file using empty columns option (#11446) @vuule
    • Refactor pad_side and strip_type enums into side_type enum (#11438) @davidwendt
    • Remove HASH_SERIAL_MURMUR3 / serial32BitMurmurHash3 (#11383) @bdice
    • Use the new JSON parser when the experimental reader is selected (#11364) @vuule
    • Remove deprecated Series.applymap. (#11031) @bdice
    • Remove deprecated expand parameter from str.findall. (#11030) @bdice

    🐛 Bug Fixes

    • Fixes bug in temporary decompression space estimation before calling nvcomp (#11879) @abellina
    • Handle ptx file paths during strings_udf import (#11862) @galipremsagar
    • Disable Zstandard decompression on nvCOMP 2.4 and Pascal GPUs (#11856) @vuule
    • Reset strings_udf CEC and solve several related issues (#11846) @brandon-b-miller
    • Fix bug in new shuffle-based groupby implementation (#11836) @rjzamora
    • Fix is_valid checks in Scalar._binaryop (#11818) @wence-
    • Fix operator NotImplemented issue with numpy (#11816) @galipremsagar
    • Disable nvCOMP DEFLATE integration (#11811) @vuule
    • Build strings_udf package with other python packages in nightlies (#11808) @brandon-b-miller
    • Revert problematic shuffle=explicit-comms changes (#11803) @rjzamora
    • Fix regex out-of-bounds write in strided rows logic (#11797) @davidwendt
    • Build cudf locally before building strings_udf conda packages in CI (#11785) @brandon-b-miller
    • Fix an issue in cudf::row_bit_count involving structs and lists at multiple levels. (#11779) @nvdbaranec
    • Fix return type of Index.isna & Index.notna (#11769) @galipremsagar
    • Fix issue with set-item in case of list and struct types (#11760) @galipremsagar
    • Ensure all libcudf APIs run on cudf's default stream (#11759) @vyasr
    • Resolve dask_cudf failures caused by upstream groupby changes (#11755) @rjzamora
    • Fix ORC string sum statistics (#11740) @vuule
    • Add strings_udf package for python 3.9 (#11730) @brandon-b-miller
    • Ensure that all tests launch kernels on cudf's default stream (#11726) @vyasr
    • Don't assume stream is a compile-time constant expression (#11725) @vyasr
    • Fix get_thrust.cmake format at patch command (#11715) @davidwendt
    • Fix cudf::partition* APIs that do not return offsets for empty output table (#11709) @ttnghia
    • Fix cudf::lists::sort_lists for NaN and Infinity values (#11703) @davidwendt
    • Modify ORC reader timestamp parsing to match the apache reader behavior (#11699) @vuule
    • Fix DataFrame.from_arrow to preserve type metadata (#11698) @galipremsagar
    • Fix compile error due to missing header (#11697) @ttnghia
    • Default to Snappy compression in to_orc when using cuDF or Dask (#11690) @vuule
    • Fix an issue related to MultiIndex when group_keys=True (#11689) @galipremsagar
    • Transfer correct dtype to exploded column (#11687) @wence-
    • Ignore protobuf generated files in mypy checks (#11685) @galipremsagar
    • Maintain the index name after .loc (#11677) @shwina
    • Fix issue with extracting nested column data & dtype preservation (#11671) @galipremsagar
    • Ensure that all cudf tests and benchmarks are conda env aware (#11666) @robertmaynard
    • Update to Thrust 1.17.2 to fix cub ODR issues (#11665) @robertmaynard
    • Fix multi-file remote datasource bug (#11655) @rjzamora
    • Fix invalid regex quantifier check to not include alternation (#11654) @davidwendt
    • Fix bug in device_write(): it uses an incorrect size (#11651) @madsbk
    • Fix overflows in benchmarks (#11649) @elstehle
    • Fix regex negated classes to not automatically include new-lines (#11644) @davidwendt
    • Fix compile error in benchmark nested_json.cpp (#11637) @davidwendt
    • Update zfill to match Python output (#11634) @davidwendt
    • Removed converted type for INT32 and INT64 since they do not convert (#11627) @hyperbolic2346
    • Fix host scalars construction of nested types (#11612) @galipremsagar
    • Fix compile warning in nested_json_gpu.cu (#11607) @davidwendt
    • Change default value of ordered to False in CategoricalDtype (#11604) @galipremsagar
    • Preserve order if necessary when deduping categoricals internally (#11597) @brandon-b-miller
    • Add is_timestamp test for leap second (60) (#11594) @davidwendt
    • Fix an issue with to_arrow when column name type is not a string (#11590) @galipremsagar
    • Fix exception in segmented-reduce benchmark (#11588) @davidwendt
    • Fix encode/decode of negative timestamps in ORC reader/writer (#11586) @vuule
    • Correct distribution data type in quantiles benchmark (#11584) @vuule
    • Fix multibyte_split benchmark for host buffers (#11583) @upsj
    • xfail custreamz display test for now (#11567) @shwina
    • Fix JNI for TableWithMeta to use schema_info instead of column_names (#11566) @jlowe
    • Reduce code duplication for dask & distributed nightly/stable installs (#11565) @galipremsagar
    • Fix groupby failures in dask_cudf CI (#11561) @rjzamora
    • Fix for pivot: error when 'values' is a multicharacter string (#11538) @shaswat-indian
    • find_package(cudf) + arrow9 usable with cudf build directory (#11535) @robertmaynard
    • Fixing crash when writing binary nested data in parquet (#11526) @hyperbolic2346
    • Fix for: error when assigning a value to an empty series (#11523) @shaswat-indian
    • Fix invalid results from conditional-left-anti-join in debug build (#11517) @davidwendt
    • Fix cmake error after upgrading to Arrow 9 (#11513) @ttnghia
    • Fix reverse binary operators acting on a host value and cudf.Scalar (#11512) @bdice
    • Update parquet fuzz tests to drop support for skiprows & num_rows (#11505) @galipremsagar
    • Use rapids-cmake 22.10 best practice for RAPIDS.cmake location (#11493) @robertmaynard
    • Handle some zero-sized corner cases in dlpack interop (#11449) @wence-
    • Return empty dataframe when reading an ORC file using empty columns option (#11446) @vuule
    • libcudf c++ example updated to CPM version 0.35.3 (#11417) @robertmaynard
    • Fix regex quantifier check to include capture groups (#11373) @davidwendt
    • Fix read_text when byte_range is aligned with field (#11371) @upsj
    • Fix to_timestamps truncated subsecond calculation (#11367) @davidwendt
    • column: calculate null_count before release()ing the cudf::column (#11365) @wence-
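
As a reference point for the zfill fix (#11634) listed above: Python's built-in str.zfill pads on the left with zeros, keeps a leading sign ahead of the padding, and leaves strings that are already wide enough unchanged. The sketch below uses only plain Python strings, not cuDF, and the sample values are made up for illustration.

```python
# Reference behaviour of Python's str.zfill (illustrative values only).
samples = ["42", "-42", "+7", "abc", "12345"]

# Pad every sample to a width of 5 characters.
padded = [s.zfill(5) for s in samples]

for original, result in zip(samples, padded):
    print(f"{original!r:>8} -> {result!r}")
```

The sign stays in front of the inserted zeros, which is the detail a naive left-pad gets wrong.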

    📖 Documentation

    • Update guide-to-udfs notebook (#11861) @brandon-b-miller
    • Update docstring for cudf.read_text (#11799) @GregoryKimball
    • Add doc section for list & struct handling (#11770) @galipremsagar
    • Document that minimum required CMake version is now 3.23.1 (#11751) @robertmaynard
    • Update libcudf documentation build command in DOCUMENTATION.md (#11735) @davidwendt
    • Add docs for use of string data to DataFrame.apply and Series.apply and update guide to UDFs notebook (#11733) @brandon-b-miller
    • Enable more Pydocstyle rules (#11582) @bdice
    • Remove unused cpp/img folder (#11554) @davidwendt
    • Publish C++ developer docs (#11475) @vyasr
    • Fix a misalignment in cudf.get_dummies docstring (#11443) @galipremsagar
    • Update contributing doc to include links to the developer guides (#11390) @davidwendt
    • Fix table_view_base doxygen format (#11340) @davidwendt
    • Create main developer guide for Python (#11235) @vyasr
    • Add developer documentation for benchmarking (#11122) @vyasr
    • cuDF error handling document (#7917) @isVoid

    🚀 New Features

    • Add hasNull statistic reading ability to ORC (#11747) @devavret
    • Add istitle to string UDFs (#11738) @brandon-b-miller
    • JSON Column creation in GPU (#11714) @karthikeyann
    • Adds option to take explicit nested schema for nested JSON reader (#11682) @elstehle
    • Add BGZIP data_chunk_reader (#11652) @upsj
    • Support DECIMAL order-by for RANGE window functions (#11645) @mythrocks
    • changing version of cmake to 3.23.3 (#11619) @hyperbolic2346
    • Generate unique keys table in java JNI contiguousSplitGroups (#11614) @res-life
    • Generic type casting to support the new nested JSON reader (#11613) @elstehle
    • JSON tree traversal (#11610) @karthikeyann
    • Add casting operators to masked UDFs (#11578) @brandon-b-miller
    • Adds type inference and type conversion for leaf-columns to the nested JSON parser (#11574) @elstehle
    • Add strings 'like' function (#11558) @davidwendt
    • Handle hyphen as literal for regex cclass when incomplete range (#11557) @davidwendt
    • Enable ZSTD compression in ORC and Parquet writers (#11551) @vuule
    • Adds support for json lines format to the nested JSON reader (#11534) @elstehle
    • Adding optional parquet reader schema (#11524) @hyperbolic2346
    • Adds GPU implementation of JSON-token-stream to JSON-tree (#11518) @karthikeyann
    • Add gdb pretty-printers for simple types (#11499) @upsj
    • Add create_random_column function to the data generator (#11490) @vuule
    • Add fluent API builder to data_profile (#11479) @vuule
    • Adds Nested Json benchmark (#11466) @karthikeyann
    • Convert thrust::optional usages to std::optional (#11455) @robertmaynard
    • Python API for the future experimental JSON reader (#11426) @vuule
    • Return schema info from JSON reader (#11419) @vuule
    • Add regex ASCII flag support for matching builtin character classes (#11404) @davidwendt
    • Truncate parquet column indexes (#11403) @etseidl
    • Adds the end-to-end JSON parser implementation (#11388) @elstehle
    • Use the new JSON parser when the experimental reader is selected (#11364) @vuule
    • Add placeholder for the experimental JSON reader (#11334) @vuule
    • Add read-only functions on string dtypes to DataFrame.apply and Series.apply (#11319) @brandon-b-miller
    • Added 'crosstab' and 'pivot_table' features (#11314) @shaswat-indian
    • Quickly error out when trying to build with unsupported nvcc versions (#11297) @robertmaynard
    • Adds JSON tokenizer (#11264) @elstehle
    • List lexicographic comparator (#11129) @devavret
    • Add generic type inference for cuIO (#11121) @PointKernel
    • Fully support nested types in cudf::contains (#10656) @ttnghia
    • Support nested types in lists::contains (#10548) @ttnghia
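
The strings 'like' function (#11558) listed above refers to SQL-style LIKE matching, where % stands for any run of characters and _ for exactly one character. The following is a minimal sketch of those semantics in plain Python, assuming an optional single-character escape; the helper names like_to_regex and like are hypothetical and this is not libcudf's implementation.

```python
import re

def like_to_regex(pattern, escape=""):
    """Translate a SQL LIKE pattern into a compiled regular expression."""
    parts = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        # An escape character makes the next character a literal.
        if escape and ch == escape and i + 1 < len(pattern):
            parts.append(re.escape(pattern[i + 1]))
            i += 2
            continue
        if ch == "%":
            parts.append(".*")   # any run of characters, including none
        elif ch == "_":
            parts.append(".")    # exactly one character
        else:
            parts.append(re.escape(ch))
        i += 1
    # DOTALL lets the wildcards match newline characters too.
    return re.compile("".join(parts), re.DOTALL)

def like(value, pattern, escape=""):
    """Return True if the whole of `value` matches the LIKE pattern."""
    return like_to_regex(pattern, escape).fullmatch(value) is not None

print(like("cudf", "cu%"))                    # prefix match
print(like("100%", "100\\%", escape="\\"))    # escaped literal percent sign
```

Matching is anchored to the whole string, so a pattern without wildcards behaves like an equality test.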

    🛠️ Improvements

    • Pin dask and distributed for release (#11822) @galipremsagar
    • Add examples for Nested JSON reader (#11814) @GregoryKimball
    • Support shuffle-based groupby aggregations in dask_cudf (#11800) @rjzamora
    • Update strings udf version updater script (#11772) @galipremsagar
    • Remove kwargs in read_csv & to_csv (#11762) @galipremsagar
    • Pass dtype param to avoid pd.Series warnings (#11761) @galipremsagar
    • Enable schema_element & keep_quotes support in json reader (#11746) @galipremsagar
    • Add ability to construct ListColumn when size is None (#11745) @galipremsagar
    • Reduces memory requirements in JSON parser and adds bytes/s and peak memory usage to benchmarks (#11732) @elstehle
    • Add missing copyright headers. (#11712) @bdice
    • Fix copyright check issues in pre-commit (#11711) @bdice
    • Include decimal in supported types for range window order-by columns (#11710) @mythrocks
    • Disable very large column gtest for contiguous-split (#11706) @davidwendt
    • Drop split_out=None test from groupby.agg (#11704) @wence-
    • Use CubinLinker for CUDA Minor Version Compatibility (#11701) @gmarkall
    • Add regex capture-group parameter to auto convert to non-capture groups (#11695) @davidwendt
    • Add a __dataframe__ method to the protocol dataframe object (#11692) @rgommers
    • Special-case multibyte_split for single-byte delimiter (#11681) @upsj
    • Remove isort exclusions (#11680) @bdice
    • Refactor CSV reader benchmarks with nvbench (#11678) @PointKernel
    • Check conda recipe headers with pre-commit (#11669) @bdice
    • Remove redundant style check for clang-format. (#11668) @bdice
    • Add support for group_keys in groupby (#11659) @galipremsagar
    • Fix pandoc pinning. (#11658) @bdice
    • Revert removal of skip_rows / num_rows options from the Parquet reader. (#11657) @nvdbaranec
    • Update git metadata (#11647) @bdice
    • Call set_null_count on a returning column if null-count is known (#11646) @davidwendt
    • Fix some libcudf detail calls not passing the stream variable (#11642) @davidwendt
    • Update to mypy 0.971 (#11640) @wence-
    • Refactor strings strip functor to details header (#11635) @davidwendt
    • Fix incorrect nullCount in get_json_object (#11633) @trxcllnt
    • Simplify hostdevice_vector (#11631) @upsj
    • Refactor parquet writer benchmarks with nvbench (#11623) @PointKernel
    • Rework contains_scalar to check nulls at runtime (#11622) @davidwendt
    • Fix incorrect memory resource used in rolling temp columns (#11618) @mythrocks
    • Upgrade pandas to 1.5 (#11617) @galipremsagar
    • Move type-dispatcher calls from traits.hpp to traits.cpp (#11616) @davidwendt
    • Refactor parquet reader benchmarks with nvbench (#11611) @PointKernel
    • Forward-merge branch-22.08 to branch-22.10 (#11608) @bdice
    • Use stream in Java API. (#11601) @bdice
    • Refactors of public/detail APIs, CUDF_FUNC_RANGE, stream handling. (#11600) @bdice
    • Improve ORC writer benchmark with nvbench (#11598) @PointKernel
    • Tune multibyte_split kernel (#11587) @upsj
    • Move split_utils.cuh to strings/detail (#11585) @davidwendt
    • Fix warnings due to compiler regression with if constexpr (#11581) @ttnghia
    • Add full 24-bit dictionary support to Parquet writer (#11580) @etseidl
    • Expose "explicit-comms" option in shuffle-based dask_cudf functions (#11576) @rjzamora
    • Move cudf::strings::findall_record to cudf::strings::findall (#11575) @davidwendt
    • Refactor dask_cudf groupby to use apply_concat_apply (#11571) @rjzamora
    • Add ability to write list(struct) columns as map type in orc writer (#11568) @galipremsagar
    • Add byte_range to multibyte_split benchmark + NVBench refactor (#11562) @upsj
    • JNI support for writing binary columns in parquet (#11556) @revans2
    • Support additional dictionary bit widths in Parquet writer (#11547) @etseidl
    • Refactor string/numeric conversion utilities (#11545) @davidwendt
    • Removing unnecessary asserts in parquet tests (#11544) @hyperbolic2346
    • Clean up ORC reader benchmarks with NVBench (#11543) @PointKernel
    • Reuse MurmurHash3_32 in Parquet page data. (#11528) @bdice
    • Add hexadecimal value separators (#11527) @bdice
    • Deprecate skiprows and num_rows in read_orc (#11522) @galipremsagar
    • Struct support for NULL_EQUALS binary operation (#11520) @rwlee
    • Bump hadoop-common from 3.2.3 to 3.2.4 in /java (#11516) @dependabot[bot]
    • Fix Feather test warning. (#11511) @bdice
    • copy_range ballot_syncs to have no execution dependency (#11508) @robertmaynard
    • Upgrade to arrow-9.x (#11507) @galipremsagar
    • Remove support for skip_rows / num_rows options in the parquet reader. (#11503) @nvdbaranec
    • Single-pass multibyte_split (#11500) @upsj
    • Sanitize percentile_approx() output for empty input (#11498) @SrikarVanavasam
    • Unpin dask and distributed for development (#11492) @galipremsagar
    • Move SparkMurmurHash3_32 functor. (#11489) @bdice
    • Refactor group_nunique.cu to use nullate::DYNAMIC for reduce-by-key functor (#11482) @davidwendt
    • Drop support for skiprows and num_rows in cudf.read_parquet (#11480) @galipremsagar
    • Add reduction distinct_count benchmark (#11473) @ttnghia
    • Add groupby nunique aggregation benchmark (#11472) @ttnghia
    • Disable Arrow S3 support by default. (#11470) @bdice
    • Add groupby max aggregation benchmark (#11464) @ttnghia
    • Extract Dremel encoding code from Parquet (#11461) @vyasr
    • Add missing Thrust #includes. (#11457) @bdice
    • Make CMake hooks verbose (#11456) @vyasr
    • Control Parquet page size through Python API (#11454) @etseidl
    • Add control of Parquet column index creation to python (#11453) @etseidl
    • Remove unused is_struct trait. (#11450) @bdice
    • Refactor the Buffer class (#11447) @madsbk
    • Refactor pad_side and strip_type enums into side_type enum (#11438) @davidwendt
    • Update to Thrust 1.17.0 (#11437) @bdice
    • Add in JNI for parsing JSON data and getting the metadata back too. (#11431) @revans2
    • Convert byte_array_view to use std::byte (#11424) @hyperbolic2346
    • Deprecate unflatten_nested_columns (#11421) @SrikarVanavasam
    • Remove HASH_SERIAL_MURMUR3 / serial32BitMurmurHash3 (#11383) @bdice
    • Add Spark list hashing Java tests (#11379) @bdice
    • Move cmake to the build section. (#11376) @vyasr
    • Remove use of CUDA driver API calls from libcudf (#11370) @shwina
    • Add column constructor from device_uvector&& (#11356) @SrikarVanavasam
    • Remove unused custreamz thirdparty directory (#11343) @vyasr
    • Update jni version to 22.10.0-SNAPSHOT (#11338) @pxLi
    • Enable using upstream jitify2 (#11287) @shwina
    • Cache cudf.Scalar (#11246) @shwina
    • Remove deprecated Series.applymap. (#11031) @bdice
    • Remove deprecated expand parameter from str.findall. (#11030) @bdice
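
Several entries above concern dictionary bit widths in the Parquet writer (#11547, #11580). As background, a dictionary-encoded page stores indices into the dictionary, and the number of bits needed per index grows with the number of dictionary entries. The sketch below computes that minimum width in plain Python; the function name is hypothetical and the code is not taken from cuDF's writer.

```python
def dictionary_index_bit_width(num_entries):
    """Minimum number of bits needed to store any index into a dictionary."""
    if num_entries < 0:
        raise ValueError("num_entries must be non-negative")
    if num_entries == 0:
        return 0
    # The largest index is num_entries - 1; bit_length() gives its width.
    return (num_entries - 1).bit_length()

for n in (1, 2, 256, 257, 2**24):
    print(n, dictionary_index_bit_width(n))
```

Under this definition a dictionary with exactly 2**24 entries still fits in 24-bit indices, and one more entry would require a 25th bit.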
  • v22.08.01 (Sep 29, 2022)

    🚨 Breaking Changes

    • Pin numpy to <1.23 (#11824) @galipremsagar
    • Remove legacy join APIs (#11274) @vyasr
    • Remove lists::drop_list_duplicates (#11236) @ttnghia
    • Remove Index.replace API (#11131) @vyasr
    • Remove deprecated Index methods from Frame (#11073) @vyasr
    • Remove public API of cudf.merge_sorted. (#11032) @bdice
    • Drop python 3.7 in code-base (#11029) @galipremsagar
    • Return empty dataframe when reading a Parquet file using empty columns option (#11018) @vuule
    • Remove Arrow CUDA IPC code (#10995) @shwina
    • Buffer: make .ptr read-only (#10872) @madsbk

    🐛 Bug Fixes

    • Fix out-of-bound access in cudf::detail::label_segments (#11497) @ttnghia
    • Fix distributed error related to loop_in_thread (#11428) @galipremsagar
    • Fix atomic operations on NaN values (#11420) @ttnghia
    • Relax arrow pinning to just 8.x and remove cuda build dependency from cudf recipe (#11412) @kkraus14
    • Revert "Allow CuPy 11" (#11409) @jakirkham
    • Fix moto timeouts (#11369) @galipremsagar
    • Set +/-infinity as the identity values for floating-point numbers in device operators min and max (#11357) @ttnghia
    • Fix memory_usage() for ListSeries (#11355) @thomcom
    • Fix constructing Column from column_view with expired mask (#11354) @shwina
    • Handle parquet corner case: Columns with more rows than are in the row group. (#11353) @nvdbaranec
    • Fix DatetimeIndex & TimedeltaIndex constructors (#11342) @galipremsagar
    • Fix unsigned-compare compile warning in IntPow binops (#11339) @davidwendt
    • Fix performance issue and add a new code path to cudf::detail::contains (#11330) @ttnghia
    • Pin pytorch to temporarily unblock from libcupti errors (#11289) @galipremsagar
    • Workaround for nvcomp zstd overwriting blocks for orc due to underestimate of sizes (#11288) @jbrennan333
    • Fix inconsistency when hashing two tables in cudf::detail::contains (#11284) @ttnghia
    • Fix issue related to numpy array and category dtype (#11282) @galipremsagar
    • Add NotImplementedError when on is specified in DataFrame.join. (#11275) @vyasr
    • Fix invalid allocate_like() and empty_like() tests. (#11268) @nvdbaranec
    • Return DataFrame when concatenating along axis 1 (#11263) @isVoid
    • Fix compile error due to missing header (#11257) @ttnghia
    • Fix a memory aliasing/crash issue in scatter for lists. (#11254) @nvdbaranec
    • Fix tests/rolling/empty_input_test (#11238) @ttnghia
    • Fix const qualifier when using host_span<bitmask_type const*> (#11220) @ttnghia
    • Avoid using nvcompBatchedDeflateDecompressGetTempSizeEx in cuIO (#11213) @vuule
    • Generate benchmark data with correct run length regardless of cardinality (#11205) @vuule
    • Fix cumulative count index behavior (#11188) @brandon-b-miller
    • Fix assertion in dask_cudf test_struct_explode (#11170) @rjzamora
    • Provides a method for the user to remove the hook and re-register the hook in a custom shutdown hook manager (#11161) @res-life
    • Fix compatibility issues with pandas 1.4.3 (#11152) @vyasr
    • Ensure cuco export set is installed in cmake build (#11147) @jlowe
    • Avoid redundant deepcopy in cudf.from_pandas (#11142) @galipremsagar
    • Fix compile error due to missing header (#11126) @ttnghia
    • Fix __cuda_array_interface__ failures (#11113) @galipremsagar
    • Support octal and hex within regex character class pattern (#11112) @davidwendt
    • Fix split_re matching logic for word boundaries (#11106) @davidwendt
    • Handle multiple files metadata in read_parquet (#11105) @galipremsagar
    • Fix index alignment for Series objects with repeated index (#11103) @shwina
    • FindcuFile now searches in the current CUDA Toolkit location (#11101) @robertmaynard
    • Fix regex word boundary logic to include underline (#11099) @davidwendt
    • Exclude CudaFatalTest when selecting all Java tests (#11083) @jlowe
    • Fix duplicate cudatoolkit pinning issue (#11070) @galipremsagar
    • Maintain the input index in the result of a groupby-transform (#11068) @shwina
    • Fix bug with row count comparison for expect_columns_equivalent(). (#11059) @nvdbaranec
    • Fix BPE uninitialized size value for null and empty input strings (#11054) @davidwendt
    • Include missing header for usage of get_current_device_resource() (#11047) @AtlantaPepsi
    • Fix warn_unused_result error in parquet test (#11026) @karthikeyann
    • Return empty dataframe when reading a Parquet file using empty columns option (#11018) @vuule
    • Fix small error in page row count limiting (#10991) @etseidl
    • Fix a row index entry error in ORC writer issue (#10989) @vuule
    • Fix grouped covariance to require both values to be convertible to double. (#10891) @bdice
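
The min/max identity fix (#11357) listed above rests on a general property of reductions: the starting value must be an identity, meaning that combining it with any input returns that input. For floating-point min that identity is +infinity, and for max it is -infinity. The sketch below illustrates the idea in plain Python rather than device code; the finite sentinel used for contrast is an assumption chosen for the example, not a statement about what libcudf did before the fix.

```python
import math
import sys
from functools import reduce

def reduce_min(values, identity):
    """Fold `values` with min, starting from `identity`."""
    return reduce(min, values, identity)

# Every input is +infinity, so the true minimum is +infinity.
data = [math.inf, math.inf]

# A finite sentinel is not an identity: it leaks into the result.
with_finite_sentinel = reduce_min(data, sys.float_info.max)

# +infinity is a true identity for min.
with_infinity = reduce_min(data, math.inf)

print(with_finite_sentinel == sys.float_info.max)
print(math.isinf(with_infinity))
```

The same reasoning applies to max with -infinity as the starting value.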

    📖 Documentation

    • Defer loading of custom.js (#11465) @galipremsagar
    • Fix issues with day & night modes in python docs (#11400) @galipremsagar
    • Update missing data handling APIs in docs (#11345) @galipremsagar
    • Add lists filtering APIs to doxygen group. (#11336) @bdice
    • Remove unused import in README sample (#11318) @vyasr
    • Note null behavior in where docs (#11276) @brandon-b-miller
    • Update docstring for spans in get_row_data_range (#11271) @vyasr
    • Update nvCOMP integration table (#11231) @vuule
    • Add dev docs for documentation writing (#11217) @vyasr
    • Documentation fix for concatenate (#11187) @dagardner-nv
    • Fix unresolved links in markdown (#11173) @karthikeyann
    • Fix cudf version in README.md install commands (#11164) @jvanstraten
    • Switch language from None to "en" in docs build (#11133) @galipremsagar
    • Remove docs mentioning scalar_view since no such class exists. (#11132) @bdice
    • Add docstring entry for DataFrame.value_counts (#11039) @galipremsagar
    • Add docs to rolling var, std, count. (#11035) @bdice
    • Fix docs for Numba UDFs. (#11020) @bdice
    • Replace column comparison utilities functions with macros (#11007) @karthikeyann
    • Fix Doxygen warnings in multiple headers files (#11003) @karthikeyann
    • Fix doxygen warnings in utilities/ headers (#10974) @karthikeyann
    • Fix Doxygen warnings in table header files (#10964) @karthikeyann
    • Fix Doxygen warnings in column header files (#10963) @karthikeyann
    • Fix Doxygen warnings in strings / header files (#10937) @karthikeyann
    • Generate Doxygen Tag File for Libcudf (#10932) @isVoid
    • Fix doxygen warnings in structs, lists headers (#10923) @karthikeyann
    • Fix doxygen warnings in fixed_point.hpp (#10922) @karthikeyann
    • Fix doxygen warnings in ast/, rolling, tdigest/, wrappers/, dictionary/ headers (#10921) @karthikeyann
    • fix doxygen warnings in cudf/io/types.hpp, other header files (#10913) @karthikeyann
    • fix doxygen warnings in cudf/io/ avro, csv, json, orc, parquet header files (#10912) @karthikeyann
    • Fix doxygen warnings in cudf/*.hpp (#10896) @karthikeyann
    • Add missing documentation in aggregation.hpp (#10887) @karthikeyann
    • Revise PR template. (#10774) @bdice

    🚀 New Features

    • Change cmake to allow controlling Arrow version via cmake variable (#11429) @kkraus14
    • Adding support for list<int8> columns to be written as byte arrays in parquet (#11328) @hyperbolic2346
    • Adding byte array view structure (#11322) @hyperbolic2346
    • Adding byte_array statistics (#11303) @hyperbolic2346
    • Add column indexes to Parquet writer (#11302) @etseidl
    • Provide an Option for Default Integer and Floating Bitwidth (#11272) @isVoid
    • FST benchmark (#11243) @karthikeyann
    • Adds the Finite-State Transducer algorithm (#11242) @elstehle
    • Refactor collect_set to use cudf::distinct and cudf::lists::distinct (#11228) @ttnghia
    • Treat zstd as stable in nvcomp releases 2.3.2 and later (#11226) @jbrennan333
    • Add 24 bit dictionary support to Parquet writer (#11216) @devavret
    • Enable positive group indices for extractAllRecord on JNI (#11215) @anthony-chang
    • JNI bindings for NTH_ELEMENT window aggregation (#11201) @mythrocks
    • Add JNI bindings for extractAllRecord (#11196) @anthony-chang
    • Add cudf.options (#11193) @isVoid
    • Add thrift support for parquet column and offset indexes (#11178) @etseidl
    • Adding binary read/write as options for parquet (#11160) @hyperbolic2346
    • Support nth_element for window functions (#11158) @mythrocks
    • Implement lists::distinct and cudf::detail::stable_distinct (#11149) @ttnghia
    • Implement Groupby pct_change (#11144) @skirui-source
    • Add JNI for set operations (#11143) @ttnghia
    • Remove deprecated PER_THREAD_DEFAULT_STREAM (#11134) @jbrennan333
    • Added a Java method to check the existence of a list of keys in a map (#11128) @razajafri
    • Feature/python benchmarking (#11125) @vyasr
    • Support nan_equality in cudf::distinct (#11118) @ttnghia
    • Added JNI for getMapValueForKeys (#11104) @razajafri
    • Refactor semi_anti_join (#11100) @ttnghia
    • Replace remaining instances of rmm::cuda_stream_default with cudf::default_stream_value (#11082) @jbrennan333
    • Adds the Logical Stack algorithm (#11078) @elstehle
    • Add doxygen-check pre-commit hook (#11076) @karthikeyann
    • Use new nvCOMP API to optimize the decompression temp memory size (#11064) @vuule
    • Add Doxygen CI check (#11057) @karthikeyann
    • Support duplicate_keep_option in cudf::distinct (#11052) @ttnghia
    • Support set operations (#11043) @ttnghia
    • Support for ZLIB compression in ORC writer (#11036) @vuule
    • Adding feature swaplevels (#11027) @VamsiTallam95
    • Use nvCOMP for ZLIB decompression in ORC reader (#11024) @vuule
    • Function for bfill, ffill #9591 (#11022) @Sreekiran096
    • Generate group offsets from element labels (#11017) @ttnghia
    • Feature axes (#10979) @VamsiTallam95
    • Generate group labels from offsets (#10945) @ttnghia
    • Add missing cuIO benchmark coverage for duration types (#10933) @vuule
    • Dask-cuDF cumulative groupby ops (#10889) @brandon-b-miller
    • Reindex Improvements (#10815) @brandon-b-miller
    • Implement value_counts for DataFrame (#10813) @martinfalisse
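
For the groupby pct_change feature (#11144) listed above, the usual pandas-style definition is the fractional change between each value and the previous value in the same group, with no result for the first row of each group. The sketch below shows that arithmetic in plain Python with a period of one; the function name and data are hypothetical, previous values are assumed to be non-zero, and this is not cuDF's implementation.

```python
import math

def groupby_pct_change(keys, values):
    """Per-group fractional change relative to the previous row of the group."""
    last_seen = {}   # most recent value seen for each group key
    out = []
    for key, value in zip(keys, values):
        if key in last_seen:
            prev = last_seen[key]
            out.append((value - prev) / prev)
        else:
            # First row of a group has nothing to compare against.
            out.append(math.nan)
        last_seen[key] = value
    return out

keys = ["a", "a", "b", "a", "b"]
values = [10.0, 15.0, 4.0, 12.0, 5.0]
result = groupby_pct_change(keys, values)
print(result)
```

Rows keep their original order, so the output lines up with the input even when groups are interleaved.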

    🛠️ Improvements

    • Pin numpy to <1.23 (#11824) @galipremsagar
    • Make Index Join Tests on Default Precisions Deterministic (#11451) @isVoid
    • Pin dask & distributed for release (#11433) @galipremsagar
    • Use documented header template for doxygen (#11430) @galipremsagar
    • Relax arrow version in dev env (#11418) @galipremsagar
    • Added Java bindings for Parquet options for binary read (#11410) @razajafri
    • Allow CuPy 11 (#11393) @jakirkham
    • Improve multibyte_split performance (#11347) @cwharris
    • Switch death test to use explicit trap. (#11326) @vyasr
    • Add --output-on-failure to ctest args. (#11321) @vyasr
    • Consolidate remaining DataFrame/Series APIs (#11315) @vyasr
    • Add JNI support for the join_strings API (#11309) @revans2
    • Add cupy version to setup.py install_requires (#11306) @vyasr
    • removing some unused code (#11305) @hyperbolic2346
    • Add test of wildcard selection (#11300) @vyasr
    • Update parquet reader to take stream parameter (#11294) @PointKernel
    • Spark list hashing (#11292) @bdice
    • Remove legacy join APIs (#11274) @vyasr
    • Fix cudf recipes syntax (#11273) @ajschmidt8
    • Fix cudf recipe (#11267) @ajschmidt8
    • Cleanup config files (#11266) @vyasr
    • Run mypy on all packages (#11265) @vyasr
    • Update to isort 5.10.1. (#11262) @vyasr
    • Consolidate flake8 and pydocstyle configuration (#11260) @vyasr
    • Remove redundant black config specifications. (#11258) @vyasr
    • Ensure DeprecationWarnings are not introduced via pre-commit (#11255) @wence-
    • Optimization to gpu::PreprocessColumnData in parquet reader. (#11252) @nvdbaranec
    • Move rolling impl details to detail/ directory. (#11250) @mythrocks
    • Remove lists::drop_list_duplicates (#11236) @ttnghia
    • Use cudf::lists::distinct in Python binding (#11234) @ttnghia
    • Use cudf::lists::distinct in Java binding (#11233) @ttnghia
    • Use cudf::distinct in Java binding (#11232) @ttnghia
    • Pin dask-cuda in dev environment (#11229) @galipremsagar
    • Remove cruft in map_lookup (#11221) @mythrocks
    • Deprecate skiprows & num_rows in parquet reader (#11218) @galipremsagar
    • Remove Frame._index (#11210) @vyasr
    • Improve performance for cudf::contains when searching for a scalar (#11202) @ttnghia
    • Document why Development component is needed for CMake. (#11200) @vyasr
    • cleanup unused code in rolling_test.hpp (#11195) @karthikeyann
    • Standardize join internals around DataFrame (#11184) @vyasr
    • Move character case table declarations from src to detail (#11183) @davidwendt
    • Remove usage of Frame in StringMethods (#11181) @vyasr
    • Expose get_json_object_options to Python (#11180) @SrikarVanavasam
    • Fix decimal128 stats in parquet writer (#11179) @etseidl
    • Modify CheckPageRows in parquet_test to use datasources (#11177) @etseidl
    • Pin max version of cuda-python to 11.7.0 (#11174) @Ethyling
    • Refactor and optimize Frame.where (#11168) @vyasr
    • Add npos const static member to cudf::string_view (#11166) @davidwendt
    • Move _drop_rows_by_label from Frame to IndexedFrame (#11157) @vyasr
    • Clean up _copy_type_metadata (#11156) @vyasr
    • Add nvcc conda package in dev environment (#11154) @galipremsagar
    • Struct binary comparison op functionality for spark rapids (#11153) @rwlee
    • Refactor inline conditionals. (#11151) @bdice
    • Refactor Spark hashing tests (#11145) @bdice
    • Add new _from_data_like_self factory (#11140) @vyasr
    • Update get_cucollections to use rapids-cmake (#11139) @vyasr
    • Remove unnecessary extra function for libcudacxx detection (#11138) @vyasr
    • Allow initial value for cudf::reduce and cudf::segmented_reduce. (#11137) @SrikarVanavasam
    • Remove Index.replace API (#11131) @vyasr
    • Move char-type table function declarations from src to detail (#11127) @davidwendt
    • Clean up repo root (#11124) @bdice
    • Improve print formatting of strings containing newline characters. (#11108) @nvdbaranec
    • Fix cudf::string_view::find() to return pos for empty string argument (#11107) @davidwendt
    • Forward-merge branch-22.06 to branch-22.08 (#11086) @bdice
    • Take iterators by value in clamp.cu. (#11084) @bdice
    • Performance improvements for row to column conversions (#11075) @hyperbolic2346
    • Remove deprecated Index methods from Frame (#11073) @vyasr
    • Use per-page max compressed size estimate for compression (#11066) @devavret
    • column to row refactor for performance (#11063) @hyperbolic2346
    • Include skbuild directory into build.sh clean operation (#11060) @galipremsagar
    • Unpin dask & distributed for development (#11058) @galipremsagar
    • Add support for Series.between (#11051) @galipremsagar
    • Fix groupby include (#11046) @bwyogatama
    • Regex cleanup internal reclass and reclass_device classes (#11045) @davidwendt
    • Remove public API of cudf.merge_sorted. (#11032) @bdice
    • Drop python 3.7 in code-base (#11029) @galipremsagar
    • Addition & integration of the integer power operator (#11025) @AtlantaPepsi
    • Refactor lists::contains (#11019) @ttnghia
    • Change build.sh to find C++ library by default and avoid shadowing CMAKE_ARGS (#11013) @vyasr
    • Clean up parquet unit test (#11005) @PointKernel
    • Add missing #pragma once to header files (#11004) @karthikeyann
    • Cleanup iterator.cuh and add fixed point support for scalar_optional_accessor (#10999) @ttnghia
    • Refactor cudf::contains (#10997) @ttnghia
    • Remove Arrow CUDA IPC code (#10995) @shwina
    • Change file extension for groupby benchmark (#10985) @ttnghia
    • Sort recipe include checks. (#10984) @bdice
    • Update cuCollections for thrust upgrade (#10983) @PointKernel
    • Expose row-group size options in cudf ParquetWriter (#10980) @rjzamora
    • Cleanup cudf::strings::detail::regex_parser class source (#10975) @davidwendt
    • Handle missing fields as nulls in get_json_object() (#10970) @SrikarVanavasam
    • Fix license families to match all-caps expected by conda-verify. (#10931) @bdice
    • Include <optional> for GCC 11 compatibility. (#10927) @bdice
    • Enable builds with scikit-build (#10919) @vyasr
    • Improve distinct by using cuco::static_map::retrieve_all (#10916) @PointKernel
    • update cudfjni to 22.08.0-SNAPSHOT (#10910) @pxLi
    • Improve the capture of fatal cuda error (#10884) @sperlingxx
    • Cleanup regex compiler operators and operands source (#10879) @davidwendt
    • Buffer: make .ptr read-only (#10872) @madsbk
    • Configurable NaN handling in device_row_comparators (#10870) @rwlee
    • Register cudf.core.groupby.Grouper objects to dask grouper_dispatch (#10838) @brandon-b-miller
    • Upgrade to arrow-8 (#10816) @galipremsagar
    • Remove getattr method in RangeIndex class (#10538) @skirui-source
    • Adding bins to value counts (#8247) @marlenezw
  • v22.08.00 (Aug 17, 2022)

    🚨 Breaking Changes

    • Remove legacy join APIs (#11274) @vyasr
    • Remove lists::drop_list_duplicates (#11236) @ttnghia
    • Remove Index.replace API (#11131) @vyasr
    • Remove deprecated Index methods from Frame (#11073) @vyasr
    • Remove public API of cudf.merge_sorted. (#11032) @bdice
    • Drop python 3.7 in code-base (#11029) @galipremsagar
    • Return empty dataframe when reading a Parquet file using empty columns option (#11018) @vuule
    • Remove Arrow CUDA IPC code (#10995) @shwina
    • Buffer: make .ptr read-only (#10872) @madsbk

    🐛 Bug Fixes

    • Fix distributed error related to loop_in_thread (#11428) @galipremsagar
    • Relax arrow pinning to just 8.x and remove cuda build dependency from cudf recipe (#11412) @kkraus14
    • Revert "Allow CuPy 11" (#11409) @jakirkham
    • Fix moto timeouts (#11369) @galipremsagar
    • Set +/-infinity as the identity values for floating-point numbers in device operators min and max (#11357) @ttnghia
    • Fix memory_usage() for ListSeries (#11355) @thomcom
    • Fix constructing Column from column_view with expired mask (#11354) @shwina
    • Handle parquet corner case: Columns with more rows than are in the row group. (#11353) @nvdbaranec
    • Fix DatetimeIndex & TimedeltaIndex constructors (#11342) @galipremsagar
    • Fix unsigned-compare compile warning in IntPow binops (#11339) @davidwendt
    • Fix performance issue and add a new code path to cudf::detail::contains (#11330) @ttnghia
    • Pin pytorch to temporarily unblock from libcupti errors (#11289) @galipremsagar
    • Workaround for nvcomp zstd overwriting blocks for orc due to underestimate of sizes (#11288) @jbrennan333
    • Fix inconsistency when hashing two tables in cudf::detail::contains (#11284) @ttnghia
    • Fix issue related to numpy array and category dtype (#11282) @galipremsagar
    • Add NotImplementedError when on is specified in DataFrame.join. (#11275) @vyasr
    • Fix invalid allocate_like() and empty_like() tests. (#11268) @nvdbaranec
    • Returns DataFrame When Concating Along Axis 1 (#11263) @isVoid
    • Fix compile error due to missing header (#11257) @ttnghia
    • Fix a memory aliasing/crash issue in scatter for lists. (#11254) @nvdbaranec
    • Fix tests/rolling/empty_input_test (#11238) @ttnghia
    • Fix const qualifier when using host_span<bitmask_type const*> (#11220) @ttnghia
    • Avoid using nvcompBatchedDeflateDecompressGetTempSizeEx in cuIO (#11213) @vuule
    • Generate benchmark data with correct run length regardless of cardinality (#11205) @vuule
    • Fix cumulative count index behavior (#11188) @brandon-b-miller
    • Fix assertion in dask_cudf test_struct_explode (#11170) @rjzamora
    • Provides a method for the user to remove the hook and re-register the hook in a custom shutdown hook manager (#11161) @res-life
    • Fix compatibility issues with pandas 1.4.3 (#11152) @vyasr
    • Ensure cuco export set is installed in cmake build (#11147) @jlowe
    • Avoid redundant deepcopy in cudf.from_pandas (#11142) @galipremsagar
    • Fix compile error due to missing header (#11126) @ttnghia
    • Fix __cuda_array_interface__ failures (#11113) @galipremsagar
    • Support octal and hex within regex character class pattern (#11112) @davidwendt
    • Fix split_re matching logic for word boundaries (#11106) @davidwendt
    • Handle multiple files metadata in read_parquet (#11105) @galipremsagar
    • Fix index alignment for Series objects with repeated index (#11103) @shwina
    • FindcuFile now searches in the current CUDA Toolkit location (#11101) @robertmaynard
    • Fix regex word boundary logic to include underline (#11099) @davidwendt
    • Exclude CudaFatalTest when selecting all Java tests (#11083) @jlowe
    • Fix duplicate cudatoolkit pinning issue (#11070) @galipremsagar
    • Maintain the input index in the result of a groupby-transform (#11068) @shwina
    • Fix bug with row count comparison for expect_columns_equivalent(). (#11059) @nvdbaranec
    • Fix BPE uninitialized size value for null and empty input strings (#11054) @davidwendt
    • Include missing header for usage of get_current_device_resource() (#11047) @AtlantaPepsi
    • Fix warn_unused_result error in parquet test (#11026) @karthikeyann
    • Return empty dataframe when reading a Parquet file using empty columns option (#11018) @vuule
    • Fix small error in page row count limiting (#10991) @etseidl
    • Fix a row index entry error in ORC writer issue (#10989) @vuule
    • Fix grouped covariance to require both values to be convertible to double. (#10891) @bdice
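
The min/max identity fix listed above (#11357) can be illustrated with a small host-side sketch. This is plain Python, not libcudf code: `reduce_min` and `reduce_max` are hypothetical helpers, and the finite sentinel is used only as a contrast, not as a claim about what the earlier device operators did. The point is that an identity value must never change the result, and only +infinity (for min) and -infinity (for max) satisfy that when the data itself can contain infinities.

```python
import math
import sys
from functools import reduce


def reduce_min(values, identity):
    """Fold `min` over values, starting from the given identity."""
    return reduce(min, values, identity)


def reduce_max(values, identity):
    """Fold `max` over values, starting from the given identity."""
    return reduce(max, values, identity)


# With +inf as the identity, a column holding only +inf reduces correctly.
correct_min = reduce_min([math.inf, math.inf], math.inf)

# With a finite sentinel, the sentinel leaks into the result instead.
wrong_min = reduce_min([math.inf, math.inf], sys.float_info.max)

# The mirror case for max uses -inf as the identity.
correct_max = reduce_max([-math.inf], -math.inf)

print(correct_min, wrong_min, correct_max)
```

Here `wrong_min` comes out as the largest finite float even though every input is infinite, which is the kind of error a true identity value avoids.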

    📖 Documentation

    • Fix issues with day & night modes in python docs (#11400) @galipremsagar
    • Update missing data handling APIs in docs (#11345) @galipremsagar
    • Add lists filtering APIs to doxygen group. (#11336) @bdice
    • Remove unused import in README sample (#11318) @vyasr
    • Note null behavior in where docs (#11276) @brandon-b-miller
    • Update docstring for spans in get_row_data_range (#11271) @vyasr
    • Update nvCOMP integration table (#11231) @vuule
    • Add dev docs for documentation writing (#11217) @vyasr
    • Documentation fix for concatenate (#11187) @dagardner-nv
    • Fix unresolved links in markdown (#11173) @karthikeyann
    • Fix cudf version in README.md install commands (#11164) @jvanstraten
    • Switch language from None to "en" in docs build (#11133) @galipremsagar
    • Remove docs mentioning scalar_view since no such class exists. (#11132) @bdice
    • Add docstring entry for DataFrame.value_counts (#11039) @galipremsagar
    • Add docs to rolling var, std, count. (#11035) @bdice
    • Fix docs for Numba UDFs. (#11020) @bdice
    • Replace column comparison utilities functions with macros (#11007) @karthikeyann
    • Fix Doxygen warnings in multiple headers files (#11003) @karthikeyann
    • Fix doxygen warnings in utilities/ headers (#10974) @karthikeyann
    • Fix Doxygen warnings in table header files (#10964) @karthikeyann
    • Fix Doxygen warnings in column header files (#10963) @karthikeyann
    • Fix Doxygen warnings in strings / header files (#10937) @karthikeyann
    • Generate Doxygen Tag File for Libcudf (#10932) @isVoid
    • Fix doxygen warnings in structs, lists headers (#10923) @karthikeyann
    • Fix doxygen warnings in fixed_point.hpp (#10922) @karthikeyann
    • Fix doxygen warnings in ast/, rolling, tdigest/, wrappers/, dictionary/ headers (#10921) @karthikeyann
    • fix doxygen warnings in cudf/io/types.hpp, other header files (#10913) @karthikeyann
    • fix doxygen warnings in cudf/io/ avro, csv, json, orc, parquet header files (#10912) @karthikeyann
    • Fix doxygen warnings in cudf/*.hpp (#10896) @karthikeyann
    • Add missing documentation in aggregation.hpp (#10887) @karthikeyann
    • Revise PR template. (#10774) @bdice

    🚀 New Features

    • Change cmake to allow controlling Arrow version via cmake variable (#11429) @kkraus14
    • Adding support for list<int8> columns to be written as byte arrays in parquet (#11328) @hyperbolic2346
    • Adding byte array view structure (#11322) @hyperbolic2346
    • Adding byte_array statistics (#11303) @hyperbolic2346
    • Add column indexes to Parquet writer (#11302) @etseidl
    • Provide an Option for Default Integer and Floating Bitwidth (#11272) @isVoid
    • FST benchmark (#11243) @karthikeyann
    • Adds the Finite-State Transducer algorithm (#11242) @elstehle
    • Refactor collect_set to use cudf::distinct and cudf::lists::distinct (#11228) @ttnghia
    • Treat zstd as stable in nvcomp releases 2.3.2 and later (#11226) @jbrennan333
    • Add 24 bit dictionary support to Parquet writer (#11216) @devavret
    • Enable positive group indices for extractAllRecord on JNI (#11215) @anthony-chang
    • JNI bindings for NTH_ELEMENT window aggregation (#11201) @mythrocks
    • Add JNI bindings for extractAllRecord (#11196) @anthony-chang
    • Add cudf.options (#11193) @isVoid
    • Add thrift support for parquet column and offset indexes (#11178) @etseidl
    • Adding binary read/write as options for parquet (#11160) @hyperbolic2346
    • Support nth_element for window functions (#11158) @mythrocks
    • Implement lists::distinct and cudf::detail::stable_distinct (#11149) @ttnghia
    • Implement Groupby pct_change (#11144) @skirui-source
    • Add JNI for set operations (#11143) @ttnghia
    • Remove deprecated PER_THREAD_DEFAULT_STREAM (#11134) @jbrennan333
    • Added a Java method to check the existence of a list of keys in a map (#11128) @razajafri
    • Feature/python benchmarking (#11125) @vyasr
    • Support nan_equality in cudf::distinct (#11118) @ttnghia
    • Added JNI for getMapValueForKeys (#11104) @razajafri
    • Refactor semi_anti_join (#11100) @ttnghia
    • Replace remaining instances of rmm::cuda_stream_default with cudf::default_stream_value (#11082) @jbrennan333
    • Adds the Logical Stack algorithm (#11078) @elstehle
    • Add doxygen-check pre-commit hook (#11076) @karthikeyann
    • Use new nvCOMP API to optimize the decompression temp memory size (#11064) @vuule
    • Add Doxygen CI check (#11057) @karthikeyann
    • Support duplicate_keep_option in cudf::distinct (#11052) @ttnghia
    • Support set operations (#11043) @ttnghia
    • Support for ZLIB compression in ORC writer (#11036) @vuule
    • Adding feature swaplevels (#11027) @VamsiTallam95
    • Use nvCOMP for ZLIB decompression in ORC reader (#11024) @vuule
    • Function for bfill, ffill #9591 (#11022) @Sreekiran096
    • Generate group offsets from element labels (#11017) @ttnghia
    • Feature axes (#10979) @VamsiTallam95
    • Generate group labels from offsets (#10945) @ttnghia
    • Add missing cuIO benchmark coverage for duration types (#10933) @vuule
    • Dask-cuDF cumulative groupby ops (#10889) @brandon-b-miller
    • Reindex Improvements (#10815) @brandon-b-miller
    • Implement value_counts for DataFrame (#10813) @martinfalisse
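
Two entries above (#11017 and #10945) convert between group offsets and per-element group labels. A minimal sketch of that relationship in plain Python follows. The function names and signatures are hypothetical and are not the libcudf API; the sketch assumes labels are sorted and that empty groups are allowed.

```python
from bisect import bisect_left


def labels_from_offsets(offsets):
    """Expand offsets [o0, o1, ..., on] into one group label per element."""
    labels = []
    for group, (start, end) in enumerate(zip(offsets, offsets[1:])):
        # Group `group` owns the half-open element range [start, end).
        labels.extend([group] * (end - start))
    return labels


def offsets_from_labels(labels, num_groups):
    """Recover offsets from sorted labels; empty groups get equal offsets."""
    # The first position whose label is >= group is where that group starts.
    return [bisect_left(labels, group) for group in range(num_groups + 1)]


offsets = [0, 2, 3, 3, 6]              # four groups; group 2 is empty
labels = labels_from_offsets(offsets)  # one label per element
round_trip = offsets_from_labels(labels, num_groups=4)
print(labels, round_trip)
```

The number of groups has to be passed explicitly in the reverse direction, because a trailing empty group leaves no trace in the labels.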

    🛠️ Improvements

    • Pin dask & distributed for release (#11433) @galipremsagar
    • Use documented header template for doxygen (#11430) @galipremsagar
    • Relax arrow version in dev env (#11418) @galipremsagar
    • Allow CuPy 11 (#11393) @jakirkham
    • Improve multibyte_split performance (#11347) @cwharris
    • Switch death test to use explicit trap. (#11326) @vyasr
    • Add --output-on-failure to ctest args. (#11321) @vyasr
    • Consolidate remaining DataFrame/Series APIs (#11315) @vyasr
    • Add JNI support for the join_strings API (#11309) @revans2
    • Add cupy version to setup.py install_requires (#11306) @vyasr
    • removing some unused code (#11305) @hyperbolic2346
    • Add test of wildcard selection (#11300) @vyasr
    • Update parquet reader to take stream parameter (#11294) @PointKernel
    • Spark list hashing (#11292) @bdice
    • Remove legacy join APIs (#11274) @vyasr
    • Fix cudf recipes syntax (#11273) @ajschmidt8
    • Fix cudf recipe (#11267) @ajschmidt8
    • Cleanup config files (#11266) @vyasr
    • Run mypy on all packages (#11265) @vyasr
    • Update to isort 5.10.1. (#11262) @vyasr
    • Consolidate flake8 and pydocstyle configuration (#11260) @vyasr
    • Remove redundant black config specifications. (#11258) @vyasr
    • Ensure DeprecationWarnings are not introduced via pre-commit (#11255) @wence-
    • Optimization to gpu::PreprocessColumnData in parquet reader. (#11252) @nvdbaranec
    • Move rolling impl details to detail/ directory. (#11250) @mythrocks
    • Remove lists::drop_list_duplicates (#11236) @ttnghia
    • Use cudf::lists::distinct in Python binding (#11234) @ttnghia
    • Use cudf::lists::distinct in Java binding (#11233) @ttnghia
    • Use cudf::distinct in Java binding (#11232) @ttnghia
    • Pin dask-cuda in dev environment (#11229) @galipremsagar
    • Remove cruft in map_lookup (#11221) @mythrocks
    • Deprecate skiprows & num_rows in parquet reader (#11218) @galipremsagar
    • Remove Frame._index (#11210) @vyasr
    • Improve performance for cudf::contains when searching for a scalar (#11202) @ttnghia
    • Document why Development component is needed for CMake. (#11200) @vyasr
    • cleanup unused code in rolling_test.hpp (#11195) @karthikeyann
    • Standardize join internals around DataFrame (#11184) @vyasr
    • Move character case table declarations from src to detail (#11183) @davidwendt
    • Remove usage of Frame in StringMethods (#11181) @vyasr
    • Expose get_json_object_options to Python (#11180) @SrikarVanavasam
    • Fix decimal128 stats in parquet writer (#11179) @etseidl
    • Modify CheckPageRows in parquet_test to use datasources (#11177) @etseidl
    • Pin max version of cuda-python to 11.7.0 (#11174) @Ethyling
    • Refactor and optimize Frame.where (#11168) @vyasr
    • Add npos const static member to cudf::string_view (#11166) @davidwendt
    • Move _drop_rows_by_label from Frame to IndexedFrame (#11157) @vyasr
    • Clean up _copy_type_metadata (#11156) @vyasr
    • Add nvcc conda package in dev environment (#11154) @galipremsagar
    • Struct binary comparison op functionality for spark rapids (#11153) @rwlee
    • Refactor inline conditionals. (#11151) @bdice
    • Refactor Spark hashing tests (#11145) @bdice
    • Add new _from_data_like_self factory (#11140) @vyasr
    • Update get_cucollections to use rapids-cmake (#11139) @vyasr
    • Remove unnecessary extra function for libcudacxx detection (#11138) @vyasr
    • Allow initial value for cudf::reduce and cudf::segmented_reduce. (#11137) @SrikarVanavasam
    • Remove Index.replace API (#11131) @vyasr
    • Move char-type table function declarations from src to detail (#11127) @davidwendt
    • Clean up repo root (#11124) @bdice
    • Improve print formatting of strings containing newline characters. (#11108) @nvdbaranec
    • Fix cudf::string_view::find() to return pos for empty string argument (#11107) @davidwendt
    • Forward-merge branch-22.06 to branch-22.08 (#11086) @bdice
    • Take iterators by value in clamp.cu. (#11084) @bdice
    • Performance improvements for row to column conversions (#11075) @hyperbolic2346
    • Remove deprecated Index methods from Frame (#11073) @vyasr
    • Use per-page max compressed size estimate for compression (#11066) @devavret
    • column to row refactor for performance (#11063) @hyperbolic2346
    • Include skbuild directory into build.sh clean operation (#11060) @galipremsagar
    • Unpin dask & distributed for development (#11058) @galipremsagar
    • Add support for Series.between (#11051) @galipremsagar
    • Fix groupby include (#11046) @bwyogatama
    • Regex cleanup internal reclass and reclass_device classes (#11045) @davidwendt
    • Remove public API of cudf.merge_sorted. (#11032) @bdice
    • Drop python 3.7 in code-base (#11029) @galipremsagar
    • Addition & integration of the integer power operator (#11025) @AtlantaPepsi
    • Refactor lists::contains (#11019) @ttnghia
    • Change build.sh to find C++ library by default and avoid shadowing CMAKE_ARGS (#11013) @vyasr
    • Clean up parquet unit test (#11005) @PointKernel
    • Add missing #pragma once to header files (#11004) @karthikeyann
    • Cleanup iterator.cuh and add fixed point support for scalar_optional_accessor (#10999) @ttnghia
    • Refactor cudf::contains (#10997) @ttnghia
    • Remove Arrow CUDA IPC code (#10995) @shwina
    • Change file extension for groupby benchmark (#10985) @ttnghia
    • Sort recipe include checks. (#10984) @bdice
    • Update cuCollections for thrust upgrade (#10983) @PointKernel
    • Expose row-group size options in cudf ParquetWriter (#10980) @rjzamora
    • Cleanup cudf::strings::detail::regex_parser class source (#10975) @davidwendt
    • Handle missing fields as nulls in get_json_object() (#10970) @SrikarVanavasam
    • Fix license families to match all-caps expected by conda-verify. (#10931) @bdice
    • Include <optional> for GCC 11 compatibility. (#10927) @bdice
    • Enable builds with scikit-build (#10919) @vyasr
    • Improve distinct by using cuco::static_map::retrieve_all (#10916) @PointKernel
    • update cudfjni to 22.08.0-SNAPSHOT (#10910) @pxLi
    • Improve the capture of fatal cuda error (#10884) @sperlingxx
    • Cleanup regex compiler operators and operands source (#10879) @davidwendt
    • Buffer: make .ptr read-only (#10872) @madsbk
    • Configurable NaN handling in device_row_comparators (#10870) @rwlee
    • Register cudf.core.groupby.Grouper objects to dask grouper_dispatch (#10838) @brandon-b-miller
    • Upgrade to arrow-8 (#10816) @galipremsagar
    • Remove getattr method in RangeIndex class (#10538) @skirui-source
    • Adding bins to value counts (#8247) @marlenezw
  • v22.10.00a (Nov 3, 2022)

    🚨 Breaking Changes

    • Disable Zstandard decompression on nvCOMP 2.4 and Pascal GPUs (#11856) @vuule
    • Disable nvCOMP DEFLATE integration (#11811) @vuule
    • Fix return type of Index.isna & Index.notna (#11769) @galipremsagar
    • Remove kwargs in read_csv & to_csv (#11762) @galipremsagar
    • Fix cudf::partition* APIs that do not return offsets for empty output table (#11709) @ttnghia
    • Fix regex negated classes to not automatically include new-lines (#11644) @davidwendt
    • Update zfill to match Python output (#11634) @davidwendt
    • Upgrade pandas to 1.5 (#11617) @galipremsagar
    • Change default value of ordered to False in CategoricalDtype (#11604) @galipremsagar
    • Move cudf::strings::findall_record to cudf::strings::findall (#11575) @davidwendt
    • Adding optional parquet reader schema (#11524) @hyperbolic2346
    • Deprecate skiprows and num_rows in read_orc (#11522) @galipremsagar
    • Remove support for skip_rows / num_rows options in the parquet reader. (#11503) @nvdbaranec
    • Drop support for skiprows and num_rows in cudf.read_parquet (#11480) @galipremsagar
    • Disable Arrow S3 support by default. (#11470) @bdice
    • Convert thrust::optional usages to std::optional (#11455) @robertmaynard
    • Remove unused is_struct trait. (#11450) @bdice
    • Refactor the Buffer class (#11447) @madsbk
    • Return empty dataframe when reading an ORC file using empty columns option (#11446) @vuule
    • Refactor pad_side and strip_type enums into side_type enum (#11438) @davidwendt
    • Remove HASH_SERIAL_MURMUR3 / serial32BitMurmurHash3 (#11383) @bdice
    • Use the new JSON parser when the experimental reader is selected (#11364) @vuule
    • Remove deprecated Series.applymap. (#11031) @bdice
    • Remove deprecated expand parameter from str.findall. (#11030) @bdice
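
The zfill entry above (#11634) says the output was changed to match Python. For reference, the sketch below shows what Python's built-in `str.zfill` does. It uses only the standard library; that cuDF's string zfill produces the same results after this change is an assumption based on the entry title, not something verified here.

```python
# Python's str.zfill pads with zeros on the left up to the requested width.
# A leading sign stays in front of the padding, and strings that are
# already wide enough are returned unchanged.
samples = ["5", "-5", "+5", "abc", "12345", ""]
padded = [s.zfill(4) for s in samples]

for original, result in zip(samples, padded):
    print(repr(original), "->", repr(result))
```

The sign handling is the case most likely to differ between implementations: "-5" becomes "-005", not "00-5".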

    🐛 Bug Fixes

    • Force using old fmt in nvbench. (#12064) @vyasr
    • Update cuda-python dependency to 11.7.1 (#11994) @shwina
    • Fixes bug in temporary decompression space estimation before calling nvcomp (#11879) @abellina
    • Handle ptx file paths during strings_udf import (#11862) @galipremsagar
    • Disable Zstandard decompression on nvCOMP 2.4 and Pascal GPUs (#11856) @vuule
    • Reset strings_udf CEC and solve several related issues (#11846) @brandon-b-miller
    • Fix bug in new shuffle-based groupby implementation (#11836) @rjzamora
    • Fix is_valid checks in Scalar._binaryop (#11818) @wence-
    • Fix operator NotImplemented issue with numpy (#11816) @galipremsagar
    • Disable nvCOMP DEFLATE integration (#11811) @vuule
    • Build strings_udf package with other python packages in nightlies (#11808) @brandon-b-miller
    • Revert problematic shuffle=explicit-comms changes (#11803) @rjzamora
    • Fix regex out-of-bounds write in strided rows logic (#11797) @davidwendt
    • Build cudf locally before building strings_udf conda packages in CI (#11785) @brandon-b-miller
    • Fix an issue in cudf::row_bit_count involving structs and lists at multiple levels. (#11779) @nvdbaranec
    • Fix return type of Index.isna & Index.notna (#11769) @galipremsagar
    • Fix issue with set-item in case of list and struct types (#11760) @galipremsagar
    • Ensure all libcudf APIs run on cudf's default stream (#11759) @vyasr
    • Resolve dask_cudf failures caused by upstream groupby changes (#11755) @rjzamora
    • Fix ORC string sum statistics (#11740) @vuule
    • Add strings_udf package for python 3.9 (#11730) @brandon-b-miller
    • Ensure that all tests launch kernels on cudf's default stream (#11726) @vyasr
    • Don't assume stream is a compile-time constant expression (#11725) @vyasr
    • Fix get_thrust.cmake format at patch command (#11715) @davidwendt
    • Fix cudf::partition* APIs that do not return offsets for empty output table (#11709) @ttnghia
    • Fix cudf::lists::sort_lists for NaN and Infinity values (#11703) @davidwendt
    • Modify ORC reader timestamp parsing to match the apache reader behavior (#11699) @vuule
    • Fix DataFrame.from_arrow to preserve type metadata (#11698) @galipremsagar
    • Fix compile error due to missing header (#11697) @ttnghia
    • Default to Snappy compression in to_orc when using cuDF or Dask (#11690) @vuule
    • Fix an issue related to MultiIndex when group_keys=True (#11689) @galipremsagar
    • Transfer correct dtype to exploded column (#11687) @wence-
    • Ignore protobuf generated files in mypy checks (#11685) @galipremsagar
    • Maintain the index name after .loc (#11677) @shwina
    • Fix issue with extracting nested column data & dtype preservation (#11671) @galipremsagar
    • Ensure that all cudf tests and benchmarks are conda env aware (#11666) @robertmaynard
    • Update to Thrust 1.17.2 to fix cub ODR issues (#11665) @robertmaynard
    • Fix multi-file remote datasource bug (#11655) @rjzamora
    • Fix invalid regex quantifier check to not include alternation (#11654) @davidwendt
    • Fix bug in device_write(): it uses an incorrect size (#11651) @madsbk
    • fixes overflows in benchmarks (#11649) @elstehle
    • Fix regex negated classes to not automatically include new-lines (#11644) @davidwendt
    • Fix compile error in benchmark nested_json.cpp (#11637) @davidwendt
    • Update zfill to match Python output (#11634) @davidwendt
    • Removed converted type for INT32 and INT64 since they do not convert (#11627) @hyperbolic2346
    • Fix host scalars construction of nested types (#11612) @galipremsagar
    • Fix compile warning in nested_json_gpu.cu (#11607) @davidwendt
    • Change default value of ordered to False in CategoricalDtype (#11604) @galipremsagar
    • Preserve order if necessary when deduping categoricals internally (#11597) @brandon-b-miller
    • Add is_timestamp test for leap second (60) (#11594) @davidwendt
    • Fix an issue with to_arrow when column name type is not a string (#11590) @galipremsagar
    • Fix exception in segmented-reduce benchmark (#11588) @davidwendt
    • Fix encode/decode of negative timestamps in ORC reader/writer (#11586) @vuule
    • Correct distribution data type in quantiles benchmark (#11584) @vuule
    • Fix multibyte_split benchmark for host buffers (#11583) @upsj
    • xfail custreamz display test for now (#11567) @shwina
    • Fix JNI for TableWithMeta to use schema_info instead of column_names (#11566) @jlowe
    • Reduce code duplication for dask & distributed nightly/stable installs (#11565) @galipremsagar
    • Fix groupby failures in dask_cudf CI (#11561) @rjzamora
    • Fix for pivot: error when 'values' is a multicharacter string (#11538) @shaswat-indian
    • find_package(cudf) + arrow9 usable with cudf build directory (#11535) @robertmaynard
    • Fixing crash when writing binary nested data in parquet (#11526) @hyperbolic2346
    • Fix for: error when assigning a value to an empty series (#11523) @shaswat-indian
    • Fix invalid results from conditional-left-anti-join in debug build (#11517) @davidwendt
    • Fix cmake error after upgrading to Arrow 9 (#11513) @ttnghia
    • Fix reverse binary operators acting on a host value and cudf.Scalar (#11512) @bdice
    • Update parquet fuzz tests to drop support for skiprows & num_rows (#11505) @galipremsagar
    • Use rapids-cmake 22.10 best practice for RAPIDS.cmake location (#11493) @robertmaynard
    • Handle some zero-sized corner cases in dlpack interop (#11449) @wence-
    • Return empty dataframe when reading an ORC file using empty columns option (#11446) @vuule
    • libcudf c++ example updated to CPM version 0.35.3 (#11417) @robertmaynard
    • Fix regex quantifier check to include capture groups (#11373) @davidwendt
    • Fix read_text when byte_range is aligned with field (#11371) @upsj
    • Fix to_timestamps truncated subsecond calculation (#11367) @davidwendt
    • column: calculate null_count before release()ing the cudf::column (#11365) @wence-

    📖 Documentation

    • Update guide-to-udfs notebook (#11861) @brandon-b-miller
    • Update docstring for cudf.read_text (#11799) @GregoryKimball
    • Add doc section for list & struct handling (#11770) @galipremsagar
    • Document that minimum required CMake version is now 3.23.1 (#11751) @robertmaynard
    • Update libcudf documentation build command in DOCUMENTATION.md (#11735) @davidwendt
    • Add docs for use of string data to DataFrame.apply and Series.apply and update guide to UDFs notebook (#11733) @brandon-b-miller
    • Enable more Pydocstyle rules (#11582) @bdice
    • Remove unused cpp/img folder (#11554) @davidwendt
    • Publish C++ developer docs (#11475) @vyasr
    • Fix a misalignment in cudf.get_dummies docstring (#11443) @galipremsagar
    • Update contributing doc to include links to the developer guides (#11390) @davidwendt
    • Fix table_view_base doxygen format (#11340) @davidwendt
    • Create main developer guide for Python (#11235) @vyasr
    • Add developer documentation for benchmarking (#11122) @vyasr
    • cuDF error handling document (#7917) @isVoid

    🚀 New Features

    • Add hasNull statistic reading ability to ORC (#11747) @devavret
    • Add istitle to string UDFs (#11738) @brandon-b-miller
    • JSON Column creation in GPU (#11714) @karthikeyann
    • Adds option to take explicit nested schema for nested JSON reader (#11682) @elstehle
    • Add BGZIP data_chunk_reader (#11652) @upsj
    • Support DECIMAL order-by for RANGE window functions (#11645) @mythrocks
    • changing version of cmake to 3.23.3 (#11619) @hyperbolic2346
    • Generate unique keys table in java JNI contiguousSplitGroups (#11614) @res-life
    • Generic type casting to support the new nested JSON reader (#11613) @elstehle
    • JSON tree traversal (#11610) @karthikeyann
    • Add casting operators to masked UDFs (#11578) @brandon-b-miller
    • Adds type inference and type conversion for leaf-columns to the nested JSON parser (#11574) @elstehle
    • Add strings 'like' function (#11558) @davidwendt
    • Handle hyphen as literal for regex cclass when incomplete range (#11557) @davidwendt
    • Enable ZSTD compression in ORC and Parquet writers (#11551) @vuule
    • Adds support for json lines format to the nested JSON reader (#11534) @elstehle
    • Adding optional parquet reader schema (#11524) @hyperbolic2346
    • Adds GPU implementation of JSON-token-stream to JSON-tree (#11518) @karthikeyann
    • Add gdb pretty-printers for simple types (#11499) @upsj
    • Add create_random_column function to the data generator (#11490) @vuule
    • Add fluent API builder to data_profile (#11479) @vuule
    • Adds Nested Json benchmark (#11466) @karthikeyann
    • Convert thrust::optional usages to std::optional (#11455) @robertmaynard
    • Python API for the future experimental JSON reader (#11426) @vuule
    • Return schema info from JSON reader (#11419) @vuule
    • Add regex ASCII flag support for matching builtin character classes (#11404) @davidwendt
    • Truncate parquet column indexes (#11403) @etseidl
    • Adds the end-to-end JSON parser implementation (#11388) @elstehle
    • Use the new JSON parser when the experimental reader is selected (#11364) @vuule
    • Add placeholder for the experimental JSON reader (#11334) @vuule
    • Add read-only functions on string dtypes to DataFrame.apply and Series.apply (#11319) @brandon-b-miller
    • Added 'crosstab' and 'pivot_table' features (#11314) @shaswat-indian
    • Quickly error out when trying to build with unsupported nvcc versions (#11297) @robertmaynard
    • Adds JSON tokenizer (#11264) @elstehle
    • List lexicographic comparator (#11129) @devavret
    • Add generic type inference for cuIO (#11121) @PointKernel
    • Fully support nested types in cudf::contains (#10656) @ttnghia
    • Support nested types in lists::contains (#10548) @ttnghia
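
The regex entry above about treating a hyphen as a literal in an incomplete character-class range (#11557) describes behavior common to many regex engines. The sketch below shows the equivalent behavior in Python's standard `re` module. It assumes the cuDF change follows the same convention, which the entry title suggests but which is not verified here.

```python
import re

text = "a-b-c"

# A complete range: matches the letters a through c, but not the hyphen.
full_range = re.findall(r"[a-c]", text)

# An incomplete range: the trailing hyphen has no upper bound,
# so it is taken as a literal '-' character.
trailing_hyphen = re.findall(r"[a-]", text)

# A leading hyphen is likewise a literal.
leading_hyphen = re.findall(r"[-a]", text)

print(full_range, trailing_hyphen, leading_hyphen)
```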

    🛠️ Improvements

    • Pin dask and distributed for release (#11822) @galipremsagar
    • Add examples for Nested JSON reader (#11814) @GregoryKimball
    • Support shuffle-based groupby aggregations in dask_cudf (#11800) @rjzamora
    • Update strings udf version updater script (#11772) @galipremsagar
    • Remove kwargs in read_csv & to_csv (#11762) @galipremsagar
    • Pass dtype param to avoid pd.Series warnings (#11761) @galipremsagar
    • Enable schema_element & keep_quotes support in json reader (#11746) @galipremsagar
    • Add ability to construct ListColumn when size is None (#11745) @galipremsagar
    • Reduces memory requirements in JSON parser and adds bytes/s and peak memory usage to benchmarks (#11732) @elstehle
    • Add missing copyright headers. (#11712) @bdice
    • Fix copyright check issues in pre-commit (#11711) @bdice
    • Include decimal in supported types for range window order-by columns (#11710) @mythrocks
    • Disable very large column gtest for contiguous-split (#11706) @davidwendt
    • Drop split_out=None test from groupby.agg (#11704) @wence-
    • Use CubinLinker for CUDA Minor Version Compatibility (#11701) @gmarkall
    • Add regex capture-group parameter to auto convert to non-capture groups (#11695) @davidwendt
    • Add a __dataframe__ method to the protocol dataframe object (#11692) @rgommers
    • Special-case multibyte_split for single-byte delimiter (#11681) @upsj
    • Remove isort exclusions (#11680) @bdice
    • Refactor CSV reader benchmarks with nvbench (#11678) @PointKernel
    • Check conda recipe headers with pre-commit (#11669) @bdice
    • Remove redundant style check for clang-format. (#11668) @bdice
    • Add support for group_keys in groupby (#11659) @galipremsagar
    • Fix pandoc pinning. (#11658) @bdice
    • Revert removal of skip_rows / num_rows options from the Parquet reader. (#11657) @nvdbaranec
    • Update git metadata (#11647) @bdice
    • Call set_null_count on a returning column if null-count is known (#11646) @davidwendt
    • Fix some libcudf detail calls not passing the stream variable (#11642) @davidwendt
    • Update to mypy 0.971 (#11640) @wence-
    • Refactor strings strip functor to details header (#11635) @davidwendt
    • Fix incorrect nullCount in get_json_object (#11633) @trxcllnt
    • Simplify hostdevice_vector (#11631) @upsj
    • Refactor parquet writer benchmarks with nvbench (#11623) @PointKernel
    • Rework contains_scalar to check nulls at runtime (#11622) @davidwendt
    • Fix incorrect memory resource used in rolling temp columns (#11618) @mythrocks
    • Upgrade pandas to 1.5 (#11617) @galipremsagar
    • Move type-dispatcher calls from traits.hpp to traits.cpp (#11616) @davidwendt
    • Refactor parquet reader benchmarks with nvbench (#11611) @PointKernel
    • Forward-merge branch-22.08 to branch-22.10 (#11608) @bdice
    • Use stream in Java API. (#11601) @bdice
    • Refactors of public/detail APIs, CUDF_FUNC_RANGE, stream handling. (#11600) @bdice
    • Improve ORC writer benchmark with nvbench (#11598) @PointKernel
    • Tune multibyte_split kernel (#11587) @upsj
    • Move split_utils.cuh to strings/detail (#11585) @davidwendt
    • Fix warnings due to compiler regression with if constexpr (#11581) @ttnghia
    • Add full 24-bit dictionary support to Parquet writer (#11580) @etseidl
    • Expose "explicit-comms" option in shuffle-based dask_cudf functions (#11576) @rjzamora
    • Move cudf::strings::findall_record to cudf::strings::findall (#11575) @davidwendt
    • Refactor dask_cudf groupby to use apply_concat_apply (#11571) @rjzamora
    • Add ability to write list(struct) columns as map type in orc writer (#11568) @galipremsagar
    • Add byte_range to multibyte_split benchmark + NVBench refactor (#11562) @upsj
    • JNI support for writing binary columns in parquet (#11556) @revans2
    • Support additional dictionary bit widths in Parquet writer (#11547) @etseidl
    • Refactor string/numeric conversion utilities (#11545) @davidwendt
    • Removing unnecessary asserts in parquet tests (#11544) @hyperbolic2346
    • Clean up ORC reader benchmarks with NVBench (#11543) @PointKernel
    • Reuse MurmurHash3_32 in Parquet page data. (#11528) @bdice
    • Add hexadecimal value separators (#11527) @bdice
    • Deprecate skiprows and num_rows in read_orc (#11522) @galipremsagar
    • Struct support for NULL_EQUALS binary operation (#11520) @rwlee
    • Bump hadoop-common from 3.2.3 to 3.2.4 in /java (#11516) @dependabot[bot]
    • Fix Feather test warning. (#11511) @bdice
    • copy_range ballot_syncs to have no execution dependency (#11508) @robertmaynard
    • Upgrade to arrow-9.x (#11507) @galipremsagar
    • Remove support for skip_rows / num_rows options in the parquet reader. (#11503) @nvdbaranec
    • Single-pass multibyte_split (#11500) @upsj
    • Sanitize percentile_approx() output for empty input (#11498) @SrikarVanavasam
    • Unpin dask and distributed for development (#11492) @galipremsagar
    • Move SparkMurmurHash3_32 functor. (#11489) @bdice
    • Refactor group_nunique.cu to use nullate::DYNAMIC for reduce-by-key functor (#11482) @davidwendt
    • Drop support for skiprows and num_rows in cudf.read_parquet (#11480) @galipremsagar
    • Add reduction distinct_count benchmark (#11473) @ttnghia
    • Add groupby nunique aggregation benchmark (#11472) @ttnghia
    • Disable Arrow S3 support by default. (#11470) @bdice
    • Add groupby max aggregation benchmark (#11464) @ttnghia
    • Extract Dremel encoding code from Parquet (#11461) @vyasr
    • Add missing Thrust #includes. (#11457) @bdice
    • Make CMake hooks verbose (#11456) @vyasr
    • Control Parquet page size through Python API (#11454) @etseidl
    • Add control of Parquet column index creation to python (#11453) @etseidl
    • Remove unused is_struct trait. (#11450) @bdice
    • Refactor the Buffer class (#11447) @madsbk
    • Refactor pad_side and strip_type enums into side_type enum (#11438) @davidwendt
    • Update to Thrust 1.17.0 (#11437) @bdice
    • Add in JNI for parsing JSON data and getting the metadata back too. (#11431) @revans2
    • Convert byte_array_view to use std::byte (#11424) @hyperbolic2346
    • Deprecate unflatten_nested_columns (#11421) @SrikarVanavasam
    • Remove HASH_SERIAL_MURMUR3 / serial32BitMurmurHash3 (#11383) @bdice
    • Add Spark list hashing Java tests (#11379) @bdice
    • Move cmake to the build section. (#11376) @vyasr
    • Remove use of CUDA driver API calls from libcudf (#11370) @shwina
    • Add column constructor from device_uvector&& (#11356) @SrikarVanavasam
    • Remove unused custreamz thirdparty directory (#11343) @vyasr
    • Update jni version to 22.10.0-SNAPSHOT (#11338) @pxLi
    • Enable using upstream jitify2 (#11287) @shwina
    • Cache cudf.Scalar (#11246) @shwina
    • Remove deprecated Series.applymap. (#11031) @bdice
    • Remove deprecated expand parameter from str.findall. (#11030) @bdice
  • v22.06.01 (Jul 6, 2022)

  • v22.06.00 (Jun 7, 2022)

    🚨 Breaking Changes

    • Enable Zstandard decompression only when all nvcomp integrations are enabled (#10944) @vuule
    • Rename sliced_child to get_sliced_child. (#10885) @bdice
    • Add parameters to control page size in Parquet writer (#10882) @etseidl
    • Make cudf::test::expect_columns_equal() fail when comparing unsanitary lists. (#10880) @nvdbaranec
    • Cleanup regex compiler fixed quantifiers source (#10843) @davidwendt
    • Refactor cudf::contains, renaming and switching parameters role (#10802) @ttnghia
    • Generic serialization of all column types (#10784) @wence-
    • Return per-file metadata from readers (#10782) @vuule
    • HostColumnVectorCore#isNull should return true for out-of-range rows (#10779) @gerashegalov
    • Update groupby::hash to use new row operators for keys (#10770) @PointKernel
    • update mangle_dupe_cols behavior in csv reader to match pandas 1.4.0 behavior (#10749) @karthikeyann
    • Rename CUDA_TRY macro to CUDF_CUDA_TRY, rename CHECK_CUDA macro to CUDF_CHECK_CUDA. (#10589) @bdice
    • Upgrade cudf to support pandas 1.4.x versions (#10584) @galipremsagar
    • Move binop methods from Frame to IndexedFrame and standardize the docstring (#10576) @vyasr
    • Add default= kwarg to .list.get() accessor method (#10547) @shwina
    • Remove deprecated decimal_cols_as_float in the ORC reader (#10515) @vuule
    • Support nvComp 2.3 if local, otherwise use nvcomp 2.2 (#10513) @robertmaynard
    • Fix findall_record to return empty list for no matches (#10491) @davidwendt
    • Namespace/Docstring Fixes for Reduction (#10471) @isVoid
    • Additional refactoring of hash functions (#10462) @bdice
    • Fix default value of str.split expand parameter. (#10457) @bdice
    • Remove deprecated code. (#10450) @vyasr

    🐛 Bug Fixes

    • Fix single column MultiIndex issue in sort_index (#10957) @galipremsagar
    • Make SerializedTableHeader(numRows) public (#10949) @gerashegalov
    • Fix gcc_linux version pinning in dev environment (#10943) @galipremsagar
    • Fix an issue with reading raw string in cudf.read_json (#10924) @galipremsagar
    • Make cudf::test::expect_columns_equal() fail when comparing unsanitary lists. (#10880) @nvdbaranec
    • Fix segmented_reduce on empty column with non-empty offsets (#10876) @davidwendt
    • Fix dask-cudf groupby handling when grouping by all columns (#10866) @charlesbluca
    • Fix a bug in distinct: using nested nulls logic (#10848) @PointKernel
    • Fix constness / references in weak ordering operator() signatures. (#10846) @bdice
    • Suppress sizeof-array-div warnings in thrust found by gcc-11 (#10840) @robertmaynard
    • Add handling for string by-columns in dask-cudf groupby (#10830) @charlesbluca
    • Fix compile warning in search.cu (#10827) @davidwendt
    • Fix element access const correctness in hostdevice_vector (#10804) @vuule
    • Update cuco git tag (#10788) @PointKernel
    • HostColumnVectorCore#isNull should return true for out-of-range rows (#10779) @gerashegalov
    • Fixing deprecation warnings in test_orc.py (#10772) @hyperbolic2346
    • Enable writing to s3 storage in chunked parquet writer (#10769) @galipremsagar
    • Fix construction of nested structs with EMPTY child (#10761) @shwina
    • Fix replace error when regex has only zero match quantifiers (#10760) @davidwendt
    • Fix an issue with one_level_list schemas in parquet reader. (#10750) @nvdbaranec
    • update mangle_dupe_cols behavior in csv reader to match pandas 1.4.0 behavior (#10749) @karthikeyann
    • Fix cupy function in notebook (#10737) @ajschmidt8
    • Fix fillna to retain columns when it is MultiIndex (#10729) @galipremsagar
    • Fix scatter for all-empty-string column case (#10724) @davidwendt
    • Retain series name in Series.apply (#10716) @brandon-b-miller
    • Correct build dir cudf-config dependency issues for static builds (#10704) @robertmaynard
    • Fix list of testing requirements in setup.py. (#10678) @bdice
    • Fix rounding to zero error in stod on very small float numbers (#10672) @davidwendt
    • cuco isn't a cudf dependency when we are built shared (#10662) @robertmaynard
    • Fix to_timestamps to support Z for %z format specifier (#10617) @davidwendt
    • Verify compression type in Parquet reader (#10610) @vuule
    • Fix struct row comparator's exception on empty structs (#10604) @sperlingxx
    • Fix strings strip() to accept only str Scalar for to_strip parameter (#10597) @davidwendt
    • Fix has_atomic_support check in can_use_hash_groupby() (#10588) @jbrennan333
    • Revert Thrust 1.16 to Thrust 1.15 (#10586) @bdice
    • Fix missing RMM_STATIC_CUDART define when compiling JNI with static CUDA runtime (#10585) @jlowe
    • pin more cmake versions (#10570) @robertmaynard
    • Re-enable Build Metrics Report (#10562) @davidwendt
    • Remove statically linked CUDA runtime check in Java build (#10532) @jlowe
    • Fix temp data cleanup in test_text.py (#10524) @brandon-b-miller
    • Update pre-commit to run black 22.3.0 (#10523) @vyasr
    • Remove deprecated decimal_cols_as_float in the ORC reader (#10515) @vuule
    • Fix findall_record to return empty list for no matches (#10491) @davidwendt
    • Allow users to specify data types for a subset of columns in read_csv (#10484) @vuule
    • Fix default value of str.split expand parameter. (#10457) @bdice
    • Improve coverage of dask-cudf's groupby aggregation, add tests for dropna support (#10449) @charlesbluca
    • Allow string aggs for dask_cudf.CudfDataFrameGroupBy.aggregate (#10222) @charlesbluca
    • Fix in-place updates with loc or iloc when the LHS has more than one column (#9918) @skirui-source

    📖 Documentation

    • Clarify append deprecation notice. (#10930) @bdice
    • Use full name of GPUDirect Storage SDK in docs (#10904) @vuule
    • Update Dask + Pandas to Dask + cuDF path (#10897) @miguelusque
    • Add missing documentation in cudf/types.hpp (#10895) @karthikeyann
    • Add strong index iterator docs. (#10888) @bdice
    • spell check fixes (#10865) @karthikeyann
    • Add missing documentation in scalar/ headers (#10861) @karthikeyann
    • Remove typo in ngram documentation (#10859) @miguelusque
    • fix doxygen warnings (#10842) @karthikeyann
    • Add a library_design.md file documenting the core Python data structures and their relationship (#10817) @vyasr
    • Add NumPy to intersphinx references. (#10809) @bdice
    • Add a section to the docs that compares cuDF with Pandas (#10796) @shwina
    • Mention 2 cpp-reviewer requirement in pull request template (#10768) @davidwendt
    • Enable pydocstyle for all packages. (#10759) @bdice
    • Enable pydocstyle rules involving quotes (#10748) @vyasr
    • Revise 10 minutes notebook. (#10738) @bdice
    • Reorganize cuDF Python docs (#10691) @shwina
    • Fix sphinx/jupyter heading issue in UDF notebook (#10690) @brandon-b-miller
    • Migrated user guide notebooks to MyST-NB and added sphinx extension (#10685) @mmccarty
    • add data generation to benchmark documentation (#10677) @karthikeyann
    • Fix some docs build warnings (#10674) @galipremsagar
    • Update UDF notebook in User Guide. (#10668) @bdice
    • Improve User Guide docs (#10663) @bdice
    • Fix some docstrings formatting (#10660) @galipremsagar
    • Remove implementation details from apply docstrings (#10651) @brandon-b-miller
    • Revise CONTRIBUTING.md (#10644) @bdice
    • Add missing APIs to documentation. (#10643) @bdice
    • Use cudf.read_json as documented API name. (#10640) @bdice
    • Fix docstring section headings. (#10639) @bdice
    • Document cudf.read_text and cudf.read_avro. (#10638) @bdice
    • Fix typo in docstring for json_reader_options (#10627) @dagardner-nv
    • Update guide to UDFs with notes about Series.applymap deprecation and related changes (#10607) @brandon-b-miller
    • Fix doxygen Modules page for cudf::lists::sequences (#10561) @davidwendt
    • Add Replace Backreferences section to Regex Features page (#10560) @davidwendt
    • Introduce deprecation policy to developer guide. (#10252) @vyasr

    🚀 New Features

    • Enable Zstandard decompression only when all nvcomp integrations are enabled (#10944) @vuule
    • Handle nested types in cudf::concatenate_rows() (#10890) @nvdbaranec
    • Strong index types for equality comparator (#10883) @ttnghia
    • Add parameters to control page size in Parquet writer (#10882) @etseidl
    • Support for Zstandard decompression in ORC reader (#10873) @vuule
    • Use pre-built nvcomp 2.3 binaries by default (#10851) @robertmaynard
    • Support for Zstandard decompression in Parquet reader (#10847) @vuule
    • Add JNI support for apply_boolean_mask (#10812) @res-life
    • Segmented Min/Max for Fixed Point Types (#10794) @isVoid
    • Return per-file metadata from readers (#10782) @vuule
    • Segmented apply_boolean_mask for LIST columns (#10773) @mythrocks
    • Update groupby::hash to use new row operators for keys (#10770) @PointKernel
    • Support purging non-empty null elements from LIST/STRING columns (#10701) @mythrocks
    • Add detail::hash_join (#10695) @PointKernel
    • Persist string statistics data across multiple calls to orc chunked write (#10694) @hyperbolic2346
    • Add .list.astype() to cast list leaves to specified dtype (#10693) @shwina
    • JNI: Add generateListOffsets API (#10683) @sperlingxx
    • Support args in groupby apply (#10682) @brandon-b-miller
    • Enable segmented_gather in Java package (#10669) @sperlingxx
    • Add row hasher with nested column support (#10641) @devavret
    • Add support for numeric_only in DataFrame._reduce (#10629) @martinfalisse
    • First step toward statistics in ORC files with chunked writes (#10567) @hyperbolic2346
    • Add support for struct columns to the random table generator (#10566) @vuule
    • Enable passing a sequence for the index argument to .list.get() (#10564) @shwina
    • Add python bindings for cudf::lists::index_of (#10549) @ChrisJar
    • Add default= kwarg to .list.get() accessor method (#10547) @shwina
    • Add cudf.DataFrame.applymap (#10542) @brandon-b-miller
    • Support nvComp 2.3 if local, otherwise use nvcomp 2.2 (#10513) @robertmaynard
    • Add column field ID control in parquet writer (#10504) @PointKernel
    • Deprecate Series.applymap (#10497) @brandon-b-miller
    • Add option to drop cache in cuIO benchmarks (#10488) @vuule
    • move benchmark input generation in device in reduction nvbench (#10486) @karthikeyann
    • Support Segmented Min/Max Reduction on String Type (#10447) @isVoid
    • List element Equality comparator (#10289) @devavret
    • Implement all methods of groupby rank aggregation in libcudf, python (#9569) @karthikeyann
    • Implement DataFrame.eval using libcudf ASTs (#8022) @vyasr

    🛠️ Improvements

    • Use conda compilers in env file (#10915) @galipremsagar
    • Remove C style artifacts in cuIO (#10886) @vuule
    • Rename sliced_child to get_sliced_child. (#10885) @bdice
    • Replace defaulted stream value for libcudf APIs that use NVCOMP (#10877) @jbrennan333
    • Add more unit tests for cudf::distinct for nested types with sliced input (#10860) @ttnghia
    • Changing list_view.cuh to list_view.hpp (#10854) @ttnghia
    • More error checking in from_dlpack (#10850) @wence-
    • Cleanup regex compiler fixed quantifiers source (#10843) @davidwendt
    • Adds the JNI call for Cuda.deviceSynchronize (#10839) @abellina
    • Add missing cuda-python dependency to cudf (#10833) @bdice
    • Change std::string parameters in cudf::strings APIs to std::string_view (#10832) @davidwendt
    • Split up search.cu to improve compile time (#10831) @davidwendt
    • Add tests for null scalar binaryops (#10828) @brandon-b-miller
    • Cleanup regex compile optimize functions (#10825) @davidwendt
    • Use ThreadedMotoServer instead of subprocess in spinning up s3 server (#10822) @galipremsagar
    • Import NA from missing rather than using cudf.NA everywhere (#10821) @brandon-b-miller
    • Refactor regex builtin character-class identifiers (#10814) @davidwendt
    • Change pattern parameter for regex APIs from std::string to std::string_view (#10810) @davidwendt
    • Make the JNI API to get list offsets as a view public. (#10807) @revans2
    • Add cudf JNI docker build github action (#10806) @pxLi
    • Removed mr parameter from inplace bitmask operations (#10805) @AtlantaPepsi
    • Refactor cudf::contains, renaming and switching parameters role (#10802) @ttnghia
    • Handle closed property in IntervalDtype.from_pandas (#10798) @wence-
    • Return weak orderings from device_row_comparator. (#10793) @rwlee
    • Rework Scalar imports (#10791) @brandon-b-miller
    • Enable ccache for cudfjni build in Docker (#10790) @gerashegalov
    • Generic serialization of all column types (#10784) @wence-
    • simplifying skiprows test in test_orc.py (#10783) @hyperbolic2346
    • Use column_views instead of column_device_views in binary operations. (#10780) @bdice
    • Add struct utility functions. (#10776) @bdice
    • Add multiple rows to subword tokenizer benchmark (#10767) @davidwendt
    • Refactor host decompression in ORC reader (#10764) @vuule
    • Flush output streams before creating a process to drop caches (#10762) @vuule
    • Refactor binaryop/compiled/util.cpp (#10756) @bdice
    • Use warp per string for long strings in cudf::strings::contains() (#10739) @davidwendt
    • Use generator expressions in any/all functions. (#10736) @bdice
    • Use canonical "magic methods" (replace x.__repr__() with repr(x)). (#10735) @bdice
    • Improve use of isinstance. (#10734) @bdice
    • Rename tests from multiIndex to multiindex. (#10732) @bdice
    • Two-table comparators with strong index types (#10730) @bdice
    • Replace std::make_pair with std::pair (C++17 CTAD) (#10727) @karthikeyann
    • Use structured bindings instead of std::tie (#10726) @karthikeyann
    • Missing f prefix on f-strings fix (#10721) @code-review-doctor
    • Add max_file_size parameter to chunked parquet dataset writer (#10718) @galipremsagar
    • Deprecate merge_sorted, change dask cudf usage to internal method (#10713) @isVoid
    • Prepare dask_cudf test_parquet.py for upcoming API changes (#10709) @rjzamora
    • Remove or simplify various utility functions (#10705) @vyasr
    • Allow building arrow with parquet and not python (#10702) @revans2
    • Partial cuIO GPU decompression refactor (#10699) @vuule
    • Cython API refactor: merge.pyx (#10698) @isVoid
    • Fix random string data length to become variable (#10697) @galipremsagar
    • Add bindings for index_of with column search key (#10696) @ChrisJar
    • Deprecate index merging (#10689) @vyasr
    • Remove cudf::strings::string namespace (#10684) @davidwendt
    • Standardize imports. (#10680) @bdice
    • Standardize usage of collections.abc. (#10679) @bdice
    • Cython API Refactor: transpose.pyx, sort.pyx (#10675) @isVoid
    • Add device_memory_resource parameter to create_string_vector_from_column (#10673) @davidwendt
    • Split up mixed-join kernels source files (#10671) @davidwendt
    • Use std::filesystem for temporary directory location and deletion (#10664) @vuule
    • cleanup benchmark includes (#10661) @karthikeyann
    • Use upstream clang-format pre-commit hook. (#10659) @bdice
    • Clean up C++ includes to use <> instead of "". (#10658) @bdice
    • Handle RuntimeError thrown by CUDA Python in validate_setup (#10653) @shwina
    • Rework JNI CMake to leverage rapids_find_package (#10649) @jlowe
    • Use conda to build python packages during GPU tests (#10648) @Ethyling
    • Deprecate various functions that don't need to be defined for Index. (#10647) @vyasr
    • Update pinning to allow newer CMake versions. (#10646) @vyasr
    • Bump hadoop-common from 3.1.4 to 3.2.3 in /java (#10645) @dependabot[bot]
    • Remove concurrent_unordered_multimap. (#10642) @bdice
    • Improve parquet dictionary encoding (#10635) @PointKernel
    • Improve cudf::cuda_error (#10630) @sperlingxx
    • Add support for null and non-numeric types in Series.diff and DataFrame.diff (#10625) @Matt711
    • Branch 22.06 merge 22.04 (#10624) @vyasr
    • Unpin dask & distributed for development (#10623) @galipremsagar
    • Slightly improve accuracy of stod in to_floats (#10622) @davidwendt
    • Allow libcudfjni to be built as a static library (#10619) @jlowe
    • Change stack-based regex state data to use global memory (#10600) @davidwendt
    • Resolve Forward merging of branch-22.04 into branch-22.06 (#10598) @galipremsagar
    • KvikIO as an alternative GDS backend (#10593) @madsbk
    • Rename CUDA_TRY macro to CUDF_CUDA_TRY, rename CHECK_CUDA macro to CUDF_CHECK_CUDA. (#10589) @bdice
    • Upgrade cudf to support pandas 1.4.x versions (#10584) @galipremsagar
    • Refactor binary ops for timedelta and datetime columns (#10581) @vyasr
    • Refactor cudf::strings::count_re API to use count_matches utility (#10580) @davidwendt
    • Update Programming Language :: Python Versions to 3.8 & 3.9 (#10579) @madsbk
    • Automate Java cudf jar build with statically linked dependencies (#10578) @gerashegalov
    • Add patch for thrust-cub 1.16 to fix sort compile times (#10577) @davidwendt
    • Move binop methods from Frame to IndexedFrame and standardize the docstring (#10576) @vyasr
    • Cleanup libcudf strings regex classes (#10573) @davidwendt
    • Simplify preprocessing of arguments for DataFrame binops (#10563) @vyasr
    • Reduce kernel calls to build strings findall results (#10559) @davidwendt
    • Forward-merge branch-22.04 to branch-22.06 (#10557) @bdice
    • Update strings contains benchmark to measure varying match rates (#10555) @davidwendt
    • JNI: throw CUDA errors more specifically (#10551) @sperlingxx
    • Enable building static libs (#10545) @trxcllnt
    • Remove pip requirements files. (#10543) @bdice
    • Remove Click pinnings that are unnecessary after upgrading black. (#10541) @vyasr
    • Refactor memory_usage to improve performance (#10537) @galipremsagar
    • Adjust the valid range of group index for replace_with_backrefs (#10530) @sperlingxx
    • add accidentally removed comment. (#10526) @vyasr
    • Update conda environment. (#10525) @vyasr
    • Remove ColumnBase.__getitem__ (#10516) @vyasr
    • Optimize left_semi_join by materializing the gather mask (#10511) @cheinger
    • Define proper binary operation APIs for columns (#10509) @vyasr
    • Upgrade arrow-cpp & pyarrow to 7.0.0 (#10503) @galipremsagar
    • Update to Thrust 1.16 (#10489) @bdice
    • Namespace/Docstring Fixes for Reduction (#10471) @isVoid
    • Update cudfjni 22.06.0-SNAPSHOT (#10467) @pxLi
    • Use Lists of Columns for Various Files (#10463) @isVoid
    • Additional refactoring of hash functions (#10462) @bdice
    • Fix Series.str.findall behavior for expand=False. (#10459) @bdice
    • Remove deprecated code. (#10450) @vyasr
    • Update cmake-format version. (#10440) @vyasr
    • Consolidate C++ conda recipes and add libcudf-tests package (#10326) @ajschmidt8
    • Use conda compilers (#10275) @Ethyling
    • Add row bitmask as a detail::hash_join member (#10248) @PointKernel
  • v22.04.00 (Apr 6, 2022)

    🚨 Breaking Changes

    • Drop unsupported method argument from nunique and distinct_count. (#10411) @bdice
    • Refactor stream compaction APIs (#10370) @PointKernel
    • Add scan_aggregation and reduce_aggregation derived types. (#10357) @nvdbaranec
    • Avoid decimal type narrowing for decimal binops (#10299) @galipremsagar
    • Rewrites sample API (#10262) @isVoid
    • Remove probe-time null equality parameters in cudf::hash_join (#10260) @PointKernel
    • Enable proper Index round-tripping in orc reader and writer (#10170) @galipremsagar
    • Add JNI for strings::split_re and strings::split_record_re (#10139) @ttnghia
    • Change cudf::strings::find_multiple to return a lists column (#10134) @davidwendt
    • Remove the option to completely disable decimal128 columns in the ORC reader (#10127) @vuule
    • Remove deprecated code (#10124) @vyasr
    • Update gpu_utils.py to reflect current CUDA support. (#10113) @bdice
    • Optimize compaction operations (#10030) @PointKernel
    • Remove deprecated method Series.set_index. (#9945) @bdice
    • Add cudf::strings::findall_record API (#9911) @davidwendt
    • Upgrade arrow & pyarrow to 6.0.1 (#9686) @galipremsagar

    🐛 Bug Fixes

    • Fix an issue with tdigest merge aggregations. (#10506) @nvdbaranec
    • Batch of fixes for index overflows in grid stride loops. (#10448) @nvdbaranec
    • Update dask_cudf imports to be compatible with latest dask (#10442) @rlratzel
    • Fix for integer overflow in contiguous-split (#10437) @jbrennan333
    • Fix has_null predicate for drop_list_duplicates on nested structs (#10436) @sperlingxx
    • Fix empty reduce with List output and non-List input (#10435) @sperlingxx
    • Fix list and struct meta generation issue in dask-cudf (#10434) @galipremsagar
    • Fix error in cudf.to_numeric when a bool input is passed (#10431) @galipremsagar
    • Support cupy array in quantile input (#10429) @galipremsagar
    • Fix benchmarks to work with new aggregation types (#10428) @davidwendt
    • Fix cudf::shift to handle offset greater than column size (#10414) @davidwendt
    • Fix lifespan of the temporary directory that holds cuFile configuration file (#10403) @vuule
    • Fix error thrown in compiled-binaryop benchmark (#10398) @davidwendt
    • Limiting async allocator using alignment of 512 (#10395) @rongou
    • Include <optional> in multibyte split. (#10385) @bdice
    • Fix issue with column and scalar re-assignment (#10377) @galipremsagar
    • Fix floating point data generation in benchmarks (#10372) @vuule
    • Avoid overflow in fused_concatenate_kernel output_index (#10344) @abellina
    • Remove is_relationally_comparable for table device views (#10342) @davidwendt
    • Fix debug compile error in device_span to column_view conversion (#10331) @davidwendt
    • Add Pascal support to JCUDF transcode (row_conversion) (#10329) @mythrocks
    • Fix std::bad_alloc exception due to JIT reserving a huge buffer (#10317) @ttnghia
    • Fixes up the overflowed fixed-point round on nullable column (#10316) @sperlingxx
    • Fix DataFrame slicing issues for empty cases (#10310) @brandon-b-miller
    • Fix documentation issues (#10307) @ajschmidt8
    • Allow Java bindings to use default decimal precisions when writing columns (#10276) @sperlingxx
    • Fix incorrect slicing of GDS read/write calls (#10274) @vuule
    • Fix out-of-memory error in compiled-binaryop benchmark (#10269) @davidwendt
    • Add tests of reflected ufuncs and fix behavior of logical reflected ufuncs (#10261) @vyasr
    • Remove probe-time null equality parameters in cudf::hash_join (#10260) @PointKernel
    • Fix out-of-memory error in UrlDecode benchmark (#10258) @davidwendt
    • Fix groupby reductions that perform operations on source type instead of target type (#10250) @ttnghia
    • Fix small leak in explode (#10245) @revans2
    • Yet another small JNI memory leak (#10238) @revans2
    • Fix regex octal parsing to limit to 3 characters (#10233) @davidwendt
    • Fix string to decimal128 conversion handling large exponents (#10231) @davidwendt
    • Fix JNI leak on copy to device (#10229) @revans2
    • Fix the data generator element size for decimal types (#10225) @vuule
    • Fix decimal metadata in parquet writer (#10224) @galipremsagar
    • Fix strings handling of hex in regex pattern (#10220) @davidwendt
    • Fix docs builds (#10216) @ajschmidt8
    • Fix a leftover _has_nulls change from Nullate (#10211) @devavret
    • Fix bitmask of the output for JNI of lists::drop_list_duplicates (#10210) @ttnghia
    • Fix compile error in binaryop/compiled/util.cpp (#10209) @ttnghia
    • Skip ORC and Parquet readers' benchmark cases that are not currently supported (#10194) @vuule
    • Fix JNI leak of a cudf::column_view native class. (#10171) @revans2
    • Enable proper Index round-tripping in orc reader and writer (#10170) @galipremsagar
    • Convert Column Name to String Before Using Struct Column Factory (#10156) @isVoid
    • Preserve the correct ListDtype while creating an identical empty column (#10151) @galipremsagar
    • benchmark fixture - static object pointer fix (#10145) @karthikeyann
    • Fix UDF Caching (#10133) @brandon-b-miller
    • Raise duplicate column error in DataFrame.rename (#10120) @galipremsagar
    • Fix flaky memory usage test by guaranteeing array size. (#10114) @vyasr
    • Encode values from python callback for C++ (#10103) @jdye64
    • Add check for regex instructions causing an infinite-loop (#10095) @davidwendt
    • Remove metadata singleton from nvtext normalizer (#10090) @davidwendt
    • Column equality testing fixes (#10011) @brandon-b-miller
    • Pin libcudf runtime dependency for cudf / libcudf-kafka nightlies (#9847) @charlesbluca

    📖 Documentation

    • Fix documentation for DataFrame.corr and Series.corr. (#10493) @bdice
    • Add cut to API docs (#10479) @shwina
    • Remove documentation for methods removed in #10124. (#10366) @bdice
    • Fix documentation issues (#10306) @ajschmidt8
    • Fix fixed_point binary operation documentation (#10198) @codereport
    • Remove cleaned up methods from docs (#10189) @galipremsagar
    • Update developer guide to recommend no default stream parameter. (#10136) @bdice
    • Update benchmarking guide to use NVBench. (#10093) @bdice

    🚀 New Features

    • Add StringIO support to read_text (#10465) @cwharris
    • Add support for tdigest and merge_tdigest aggregations through cudf::reduce (#10433) @nvdbaranec
    • JNI support for Collect Ops in Reduction (#10427) @sperlingxx
    • Enable read_text with dask_cudf using byte_range (#10407) @ChrisJar
    • Add cudf::stable_sort_by_key (#10387) @PointKernel
    • Implement maps_column_view abstraction over LIST<STRUCT<K,V>> (#10380) @mythrocks
    • Support Java bindings for Avro reader (#10373) @HaoYang670
    • Refactor stream compaction APIs (#10370) @PointKernel
    • Support collect aggregations in reduction (#10353) @sperlingxx
    • Refactor __array_ufunc__ for Index and unify across all classes (#10346) @vyasr
    • Add JNI for extract_list_element with index column (#10341) @firestarman
    • Support min and max operations for structs in rolling window (#10332) @ttnghia
    • Add device create_sequence_table for benchmarks (#10300) @karthikeyann
    • Enable numpy ufuncs for DataFrame (#10287) @vyasr
    • move input generation for json benchmark to device (#10281) @karthikeyann
    • move input generation for type dispatcher benchmark to device (#10280) @karthikeyann
    • move input generation for copy benchmark to device (#10279) @karthikeyann
    • generate url decode benchmark input in device (#10278) @karthikeyann
    • device input generation in join bench (#10277) @karthikeyann
    • Add nvtext::byte_pair_encoding API (#10270) @davidwendt
    • Prevent internal usage of expensive APIs (#10263) @vyasr
    • Column to JCUDF row for tables with strings (#10235) @hyperbolic2346
    • Support percent_rank() aggregation (#10227) @mythrocks
    • Refactor Series.__array_ufunc__ (#10217) @vyasr
    • Reduce pytest runtime (#10203) @brandon-b-miller
    • Add regex flags parameter to python cudf strings split (#10185) @davidwendt
    • Support for MOD, PMOD and PYMOD for decimal32/64/128 (#10179) @codereport
    • Adding string row size iterator for row to column and column to row conversion (#10157) @hyperbolic2346
    • Add file size counter to cuIO benchmarks (#10154) @vuule
    • byte_range support for multibyte_split/read_text (#10150) @cwharris
    • Add JNI for strings::split_re and strings::split_record_re (#10139) @ttnghia
    • Add maxSplit parameter to Java binding for strings::split (#10137) @ttnghia
    • Add libcudf strings split API that accepts regex pattern (#10128) @davidwendt
    • generate benchmark input in device (#10109) @karthikeyann
    • Avoid nan_as_null op if nan_count is 0 (#10082) @galipremsagar
    • Add Dataframe and Index nunique (#10077) @martinfalisse
    • Support nanosecond timestamps in parquet (#10063) @PointKernel
    • Java bindings for mixed semi and anti joins (#10040) @jlowe
    • Implement mixed equality/conditional semi/anti joins (#10037) @vyasr
    • Optimize compaction operations (#10030) @PointKernel
    • Support args= in Series.apply (#9982) @brandon-b-miller
    • Add cudf::strings::findall_record API (#9911) @davidwendt
    • Add covariance for sort groupby (python) (#9889) @mayankanand007
    • Implement DataFrame diff() (#9817) @skirui-source
    • Implement DataFrame pct_change (#9805) @skirui-source
    • Support segmented reductions and null mask reductions (#9621) @isVoid
    • Add 'spearman' correlation method for dataframe.corr and series.corr (#7141) @dominicshanshan

    🛠️ Improvements

    • Add scipy skip for a test (#10502) @galipremsagar
    • Temporarily disable new ops-bot functionality (#10496) @ajschmidt8
    • Include <cstddef> to fix compilation of parquet reader on GCC 11. (#10483) @bdice
    • Pin dask and distributed (#10481) @galipremsagar
    • MD5 refactoring. (#10445) @bdice
    • Remove or split up Frame methods that use the index (#10439) @vyasr
    • Centralization of tdigest aggregation code. (#10422) @nvdbaranec
    • Simplify column binary operations (#10421) @vyasr
    • Add .github/ops-bot.yaml config file (#10420) @ajschmidt8
    • Use list of columns for methods in Groupby.pyx (#10419) @isVoid
    • Remove warnings in test_timedelta.py (#10418) @galipremsagar
    • Fix some warnings in test_parquet.py (#10416) @galipremsagar
    • JNI support for segmented reduce (#10413) @revans2
    • Clean up null mask after purging null entries (#10412) @sperlingxx
    • Drop unsupported method argument from nunique and distinct_count. (#10411) @bdice
    • Use str instead of builtins.str. (#10410) @bdice
    • Fix warnings in test_rolling (#10405) @bdice
    • Enable codecov github-check in CI (#10404) @galipremsagar
    • Fix warnings in test_cuda_apply, test_numerical, test_pickling, test_unaops. (#10402) @bdice
    • Set column names in _from_columns_like_self factory (#10400) @isVoid
    • Refactor nvtx annotations in cudf & dask-cudf (#10396) @galipremsagar
    • Consolidate .cov and .corr for sort groupby (#10386) @skirui-source
    • Consolidate some Frame APIs (#10381) @vyasr
    • Refactor hash functions and hash_combine (#10379) @bdice
    • Add nvtx annotations for Series and Index (#10374) @galipremsagar
    • Refactor filling.repeat API (#10371) @isVoid
    • Move standalone UTF8 functions from string_view.hpp to utf8.hpp (#10369) @davidwendt
    • Remove doc for deprecated function one_hot_encoding (#10367) @isVoid
    • Refactor __array_function__ (#10364) @vyasr
    • Fix warnings in test_csv.py. (#10362) @bdice
    • Implement a mixin for binops (#10360) @vyasr
    • Refactor cython interface: copying.pyx (#10359) @isVoid
    • Implement a mixin for scans (#10358) @vyasr
    • Add scan_aggregation and reduce_aggregation derived types. (#10357) @nvdbaranec
    • Add cleanup of python artifacts (#10355) @galipremsagar
    • Fix warnings in test_categorical.py. (#10354) @bdice
    • Create a dispatcher for invoking regex kernel functions (#10349) @davidwendt
    • Fix codecov in CI (#10347) @galipremsagar
    • Enable caching for memory_usage calculation in Column (#10345) @galipremsagar
    • C++17 cleanup: traits replace std::enable_if<>::type with std::enable_if_t (#10343) @karthikeyann
    • JNI: Support appending DECIMAL128 into ColumnBuilder in terms of byte array (#10338) @sperlingxx
    • multibyte_split test improvements (#10328) @vuule
    • Fix warnings in test_binops.py. (#10327) @bdice
    • Fix warnings from pandas in test_array_ufunc.py. (#10324) @bdice
    • Update upload script (#10321) @ajschmidt8
    • Move hash type declarations to hashing.hpp (#10320) @davidwendt
    • C++17 cleanup: traits replace ::value with _v (#10319) @karthikeyann
    • Remove internal columns usage (#10315) @vyasr
    • Remove extraneous build.sh parameter (#10313) @ajschmidt8
    • Add const qualifier to MurmurHash3_32::hash_combine (#10311) @davidwendt
    • Remove TODO in libcudf_kafka recipe (#10309) @ajschmidt8
    • Add conversions between column_view and device_span<T const>. (#10302) @bdice
    • Avoid decimal type narrowing for decimal binops (#10299) @galipremsagar
    • Deprecate DataFrame.iteritems and introduce .items (#10298) @galipremsagar
    • Explicitly request CMake use gnu++17 over c++17 (#10297) @robertmaynard
    • Add copyright check as pre-commit hook. (#10290) @vyasr
    • DataFrame insert and creation optimizations (#10285) @galipremsagar
    • Improve hash join detail functions (#10273) @PointKernel
    • Replace custom cached_property implementation with functools (#10272) @shwina
    • Rewrites sample API (#10262) @isVoid
    • Bump hadoop-common from 3.1.0 to 3.1.4 in /java (#10259) @dependabot[bot]
    • Remove redundant copies across the code base (#10257) @galipremsagar
    • Add more nvtx annotations (#10256) @galipremsagar
    • Add copyright check in cudf (#10253) @galipremsagar
    • Remove redundant copies in fillna to improve performance (#10241) @galipremsagar
    • Remove std::numeric_limits specializations for timestamp & durations (#10239) @codereport
    • Optimize DataFrame creation across code-base (#10236) @galipremsagar
    • Change pytest distribution algorithm and increase parallelism in CI (#10232) @galipremsagar
    • Add environment variables for I/O thread pool and slice sizes (#10218) @vuule
    • Add regex flags to strings findall functions (#10208) @davidwendt
    • Update dask-cudf parquet tests to reflect upstream bugfixes to _metadata (#10206) @charlesbluca
    • Remove unnecessary nunique function in Series. (#10205) @martinfalisse
    • Refactor DataFrame tests. (#10204) @bdice
    • Rewrites column.__setitem__, Use boolean_mask_scatter (#10202) @isVoid
    • Java utilities to aid in accelerating aggregations on 128-bit types (#10201) @jlowe
    • Fix docstrings alignment in Frame methods (#10199) @galipremsagar
    • Fix cuco pair issue in hash join (#10195) @PointKernel
    • Replace dask groupby .index usages with .by (#10193) @galipremsagar
    • Add regex flags to strings extract function (#10192) @davidwendt
    • Forward-merge branch-22.02 to branch-22.04 (#10191) @bdice
    • Add CMake install rule for tests (#10190) @ajschmidt8
    • Unpin dask & distributed (#10182) @galipremsagar
    • Add comments to explain test validation (#10176) @galipremsagar
    • Reduce warnings in pytest output (#10168) @bdice
    • Some consolidation of indexed frame methods (#10167) @vyasr
    • Refactor isin implementations (#10165) @vyasr
    • Faster struct row comparator (#10164) @devavret
    • Refactor groupby::get_groups. (#10161) @bdice
    • Deprecate decimal_cols_as_float in ORC reader (C++ layer) (#10152) @vuule
    • Replace ccache with sccache (#10146) @ajschmidt8
    • Murmur3 hash kernel cleanup (#10143) @rwlee
    • Deprecate decimal_cols_as_float in ORC reader (#10142) @galipremsagar
    • Run pyupgrade 2.31.0. (#10141) @bdice
    • Remove drop_nan from internal IndexedFrame._drop_na_rows. (#10140) @bdice
    • Change cudf::strings::find_multiple to return a lists column (#10134) @davidwendt
    • Update cmake-format script for branch 22.04. (#10132) @bdice
    • Accept r-value references in convert_table_for_return(): (#10131) @mythrocks
    • Remove the option to completely disable decimal128 columns in the ORC reader (#10127) @vuule
    • Remove deprecated code (#10124) @vyasr
    • Update gpu_utils.py to reflect current CUDA support. (#10113) @bdice
    • Remove benchmarks suffix (#10112) @bdice
    • Update cudf java binding version to 22.04.0-SNAPSHOT (#10084) @pxLi
    • Remove unnecessary docker files. (#10069) @vyasr
    • Limit benchmark iterations using environment variable (#10060) @karthikeyann
    • Add timing chart for libcudf build metrics report page (#10038) @davidwendt
    • JNI: Rewrite growBuffersAndRows to accelerate the HostColumnBuilder (#10025) @sperlingxx
    • Reduce redundant code in CUDF JNI (#10019) @mythrocks
    • Make snappy decompress check more efficient (#9995) @cheinger
    • Remove deprecated method Series.set_index. (#9945) @bdice
    • Implement a mixin for reductions (#9925) @vyasr
    • JNI: Push back decimal utils from spark-rapids (#9907) @sperlingxx
    • Add assert_column_memory_* (#9882) @isVoid
    • Add CUDF_UNREACHABLE macro. (#9727) @bdice
    • Upgrade arrow & pyarrow to 6.0.1 (#9686) @galipremsagar
  • v22.02.00(Feb 2, 2022)

    🚨 Breaking Changes

    • ORC writer API changes for granular statistics (#10058) @mythrocks
    • decimal128 Support for to/from_arrow (#9986) @codereport
    • Remove deprecated method one_hot_encoding (#9977) @isVoid
    • Remove str.subword_tokenize (#9968) @VibhuJawa
    • Remove deprecated method parameter from merge and join. (#9944) @bdice
    • Remove deprecated method DataFrame.hash_columns. (#9943) @bdice
    • Remove deprecated method Series.hash_encode. (#9942) @bdice
    • Refactoring ceil/round/floor code for datetime64 types (#9926) @mayankanand007
    • Introduce nan_as_null parameter for cudf.Index (#9893) @galipremsagar
    • Add regex_flags parameter to strings replace_re functions (#9878) @davidwendt
    • Break tie for top categorical columns in Series.describe (#9867) @isVoid
    • Add partitioning support in parquet writer (#9810) @devavret
    • Move drop_duplicates, drop_na, _gather, take to IndexedFrame and create their _base_index counterparts (#9807) @isVoid
    • Raise temporary error for decimal128 types in parquet reader (#9804) @galipremsagar
    • Change default dtype of all nulls column from float to object (#9803) @galipremsagar
    • Remove unused masked udf cython/c++ code (#9792) @brandon-b-miller
    • Pick smallest decimal type with required precision in ORC reader (#9775) @vuule
    • Add decimal128 support to Parquet reader and writer (#9765) @vuule
    • Refactor TableTest assertion methods to a separate utility class (#9762) @jlowe
    • Use cuFile direct device reads/writes by default in cuIO (#9722) @vuule
    • Match pandas scalar result types in reductions (#9717) @brandon-b-miller
    • Add parameters to control row group size in Parquet writer (#9677) @vuule
    • Refactor bit counting APIs, introduce valid/null count functions, and split host/device side code for segmented counts. (#9588) @bdice
    • Add support for decimal128 in cudf python (#9533) @galipremsagar
    • Implement lists::index_of() to find positions in list rows (#9510) @mythrocks
    • Rewriting row/column conversions for Spark <-> cudf data conversions (#8444) @hyperbolic2346

    🐛 Bug Fixes

    • Add check for negative stripe index in ORC reader (#10074) @vuule
    • Update Java tests to expect DECIMAL128 from Arrow (#10073) @jlowe
    • Avoid index materialization when DataFrame is created with un-named Series objects (#10071) @galipremsagar
    • fix gcc 11 compilation errors (#10067) @rongou
    • Fix columns ordering issue in parquet reader (#10066) @galipremsagar
    • Fix dataframe setitem with ndarray types (#10056) @galipremsagar
    • Remove implicit copy due to conversion from cudf::size_type and size_t (#10045) @robertmaynard
    • Include <optional> in headers that use std::optional (#10044) @robertmaynard
    • Fix repr and concat of StructColumn (#10042) @galipremsagar
    • Include row group level stats when writing ORC files (#10041) @vuule
    • build.sh respects the --build_metrics and --incl_cache_stats flags (#10035) @robertmaynard
    • Fix memory leaks in JNI native code. (#10029) @mythrocks
    • Update JNI to use new arena mr constructor (#10027) @rongou
    • Fix null check when comparing structs in arg_min operation of reduction/groupby (#10026) @ttnghia
    • Wrap CI script shell variables in quotes to fix local testing. (#10018) @bdice
    • cudftestutil no longer propagates compiler flags to external users (#10017) @robertmaynard
    • Remove CUDA_DEVICE_CALLABLE macro usage (#10015) @hyperbolic2346
    • Add missing list filling header in meta.yaml (#10007) @devavret
    • Fix conda recipes for custreamz & cudf_kafka (#10003) @ajschmidt8
    • Fix matching regex word-boundary (\b) in strings replace (#9997) @davidwendt
    • Fix null check when comparing structs in min and max reduction/groupby operations (#9994) @ttnghia
    • Fix octal pattern matching in regex string (#9993) @davidwendt
    • decimal128 Support for to/from_arrow (#9986) @codereport
    • Fix groupby shift/diff/fill after selecting from a GroupBy (#9984) @shwina
    • Fix the overflow problem of decimal rescale (#9966) @sperlingxx
    • Use default value for decimal precision in parquet writer when not specified (#9963) @devavret
    • Fix cudf java build error. (#9958) @firestarman
    • Use gpuci_mamba_retry to install local artifacts. (#9951) @bdice
    • Fix regression HostColumnVectorCore requiring native libs (#9948) @jlowe
    • Rename aggregate_metadata in writer to fix name collision (#9938) @devavret
    • Fixed issue with percentile_approx where output tdigests could have uninitialized data at the end. (#9931) @nvdbaranec
    • Resolve racecheck errors in ORC kernels (#9916) @vuule
    • Fix the java build after parquet partitioning support (#9908) @revans2
    • Fix compilation of benchmark for parquet writer. (#9905) @bdice
    • Fix a memcheck error in ORC writer (#9896) @vuule
    • Introduce nan_as_null parameter for cudf.Index (#9893) @galipremsagar
    • Fix fallback to sort aggregation for grouping only hash aggregate (#9891) @abellina
    • Add zlib to cudfjni link when using static libcudf library dependency (#9890) @jlowe
    • TimedeltaIndex constructor raises an AttributeError. (#9884) @skirui-source
    • Fix cudf.Scalar string datetime construction (#9875) @brandon-b-miller
    • Load libcufile.so with RTLD_NODELETE flag (#9872) @vuule
    • Break tie for top categorical columns in Series.describe (#9867) @isVoid
    • Fix null handling for structs min and arg_min in groupby, groupby scan, reduction, and inclusive_scan (#9864) @ttnghia
    • Add one-level list encoding support in parquet reader (#9848) @PointKernel
    • Fix an out-of-bounds read in validity copying in contiguous_split. (#9842) @nvdbaranec
    • Fix join of MultiIndex to Index with one column and overlapping name. (#9830) @vyasr
    • Fix caching in Series.applymap (#9821) @brandon-b-miller
    • Enforce boolean ascending for dask-cudf sort_values (#9814) @charlesbluca
    • Fix ORC writer crash with empty input columns (#9808) @vuule
    • Change default dtype of all nulls column from float to object (#9803) @galipremsagar
    • Load native dependencies when Java ColumnView is loaded (#9800) @jlowe
    • Fix dtype-argument bug in dask_cudf read_csv (#9796) @rjzamora
    • Fix overflow for min calculation in strings::from_timestamps (#9793) @revans2
    • Fix memory error due to lambda return type deduction limitation (#9778) @karthikeyann
    • Revert regex $/EOL end-of-string new-line special case handling (#9774) @davidwendt
    • Fix missing streams (#9767) @karthikeyann
    • Fix make_empty_scalar_like on list_type (#9759) @sperlingxx
    • Update cmake and conda to 22.02 (#9746) @devavret
    • Fix out-of-bounds memory write in decimal128-to-string conversion (#9740) @davidwendt
    • Match pandas scalar result types in reductions (#9717) @brandon-b-miller
    • Fix regex non-multiline EOL/$ matching strings ending with a new-line (#9715) @davidwendt
    • Fixed build by adding more checks for int8, int16 (#9707) @razajafri
    • Fix null handling when boolean dtype is passed (#9691) @galipremsagar
    • Fix stream usage in segmented_gather() (#9679) @mythrocks

    📖 Documentation

    • Update decimal dtypes related docs entries (#10072) @galipremsagar
    • Fix regex doc describing hexadecimal escape characters (#10009) @davidwendt
    • Fix cudf compilation instructions. (#9956) @esoha-nvidia
    • Fix see also links for IO APIs (#9895) @galipremsagar
    • Fix build instructions for libcudf doxygen (#9837) @davidwendt
    • Fix some doxygen warnings and add missing documentation (#9770) @karthikeyann
    • update cuda version in local build (#9736) @karthikeyann
    • Fix doxygen for enum types in libcudf (#9724) @davidwendt
    • Spell check fixes (#9682) @karthikeyann
    • Fix links in C++ Developer Guide. (#9675) @bdice

    🚀 New Features

    • Remove libcudacxx patch needed for nvcc 11.4 (#10057) @robertmaynard
    • Allow CuPy 10 (#10048) @jakirkham
    • Add in support for NULL_LOGICAL_AND and NULL_LOGICAL_OR binops (#10016) @revans2
    • Add groupby.transform (only support for aggregations) (#10005) @shwina
    • Add partitioning support to Parquet chunked writer (#10000) @devavret
    • Add jni for sequences (#9972) @wbo4958
    • Java bindings for mixed left, inner, and full joins (#9941) @jlowe
    • Java bindings for JSON reader support (#9940) @wbo4958
    • Enable transpose for string columns in cudf python (#9937) @galipremsagar
    • Support structs for cudf::contains with column/scalar input (#9929) @ttnghia
    • Implement mixed equality/conditional joins (#9917) @vyasr
    • Add cudf::strings::extract_all API (#9909) @davidwendt
    • Implement JNI for cudf::scatter APIs (#9903) @ttnghia
    • JNI: Function to copy and set validity from bool column. (#9901) @mythrocks
    • Add dictionary support to cudf::copy_if_else (#9887) @davidwendt
    • add run_benchmarks target for running benchmarks with json output (#9879) @karthikeyann
    • Add regex_flags parameter to strings replace_re functions (#9878) @davidwendt
    • Add add_suffix and add_prefix for DataFrames and Series (#9846) @mayankanand007
    • Add JNI for cudf::drop_duplicates (#9841) @ttnghia
    • Implement per-list sequence (#9839) @ttnghia
    • adding series.transpose (#9835) @mayankanand007
    • Adding support for Series.autocorr (#9833) @mayankanand007
    • Support round operation on datetime64 datatypes (#9820) @mayankanand007
    • Add partitioning support in parquet writer (#9810) @devavret
    • Raise temporary error for decimal128 types in parquet reader (#9804) @galipremsagar
    • Add decimal128 support to Parquet reader and writer (#9765) @vuule
    • Optimize groupby::scan (#9754) @PointKernel
    • Add sample JNI API (#9728) @res-life
    • Support min and max in inclusive scan for structs (#9725) @ttnghia
    • Add first and last method to IndexedFrame (#9710) @isVoid
    • Support min and max reduction for structs (#9697) @ttnghia
    • Add parameters to control row group size in Parquet writer (#9677) @vuule
    • Run compute-sanitizer in nightly build (#9641) @karthikeyann
    • Implement Series.datetime.floor (#9571) @skirui-source
    • ceil/floor for DatetimeIndex (#9554) @mayankanand007
    • Add support for decimal128 in cudf python (#9533) @galipremsagar
    • Implement lists::index_of() to find positions in list rows (#9510) @mythrocks
    • custreamz oauth callback for kafka (librdkafka) (#9486) @jdye64
    • Add Pearson correlation for sort groupby (python) (#9166) @skirui-source
    • Interchange dataframe protocol (#9071) @iskode
    • Rewriting row/column conversions for Spark <-> cudf data conversions (#8444) @hyperbolic2346

    🛠️ Improvements

    • Prepare upload scripts for Python 3.7 removal (#10092) @Ethyling
    • Simplify custreamz and cudf_kafka recipes files (#10065) @Ethyling
    • ORC writer API changes for granular statistics (#10058) @mythrocks
    • Remove python constraints in custreamz and cudf_kafka recipes (#10052) @Ethyling
    • Unpin dask and distributed in CI (#10028) @galipremsagar
    • Add _from_column_like_self factory (#10022) @isVoid
    • Replace custom CUDA bindings previously provided by RMM with official CUDA Python bindings (#10008) @shwina
    • Use cuda::std::is_arithmetic in cudf::is_numeric trait. (#9996) @bdice
    • Clean up CUDA stream use in cuIO (#9991) @vuule
    • Use addressed-ordered first fit for the pinned memory pool (#9989) @rongou
    • Add strings tests to transpose_test.cpp (#9985) @davidwendt
    • Use gpuci_mamba_retry on Java CI. (#9983) @bdice
    • Remove deprecated method one_hot_encoding (#9977) @isVoid
    • Minor cleanup of unused Python functions (#9974) @vyasr
    • Use new efficient partitioned parquet writing in cuDF (#9971) @devavret
    • Remove str.subword_tokenize (#9968) @VibhuJawa
    • Forward-merge branch-21.12 to branch-22.02 (#9947) @bdice
    • Remove deprecated method parameter from merge and join. (#9944) @bdice
    • Remove deprecated method DataFrame.hash_columns. (#9943) @bdice
    • Remove deprecated method Series.hash_encode. (#9942) @bdice
    • use ninja in java ci build (#9933) @rongou
    • Add build-time publish step to cpu build script (#9927) @davidwendt
    • Refactoring ceil/round/floor code for datetime64 types (#9926) @mayankanand007
    • Remove various unused functions (#9922) @vyasr
    • Raise in query if dtype is not supported (#9921) @brandon-b-miller
    • Add missing imports tests (#9920) @Ethyling
    • Spark Decimal128 hashing (#9919) @rwlee
    • Replace thrust/std::get with structured bindings (#9915) @codereport
    • Upgrade thrust version to 1.15 (#9912) @robertmaynard
    • Remove conda envs for CUDA 11.0 and 11.2. (#9910) @bdice
    • Return count of set bits from inplace_bitmask_and. (#9904) @bdice
    • Use dynamic nullate for join hasher and equality comparator (#9902) @davidwendt
    • Update ucx-py version on release using rvc (#9897) @Ethyling
    • Remove IncludeCategories from .clang-format (#9876) @codereport
    • Support statically linking CUDA runtime for Java bindings (#9873) @jlowe
    • Add clang-tidy to libcudf (#9860) @codereport
    • Remove deprecated methods from Java Table class (#9853) @jlowe
    • Add test for map column metadata handling in ORC writer (#9852) @vuule
    • Use pandas to_offset to parse frequency string in date_range (#9843) @isVoid
    • add templated benchmark with fixture (#9838) @karthikeyann
    • Use list of column inputs for apply_boolean_mask (#9832) @isVoid
    • Added a few more tests for Decimal to String cast (#9818) @razajafri
    • Run doctests. (#9815) @bdice
    • Avoid overflow for fixed_point round (#9809) @sperlingxx
    • Move drop_duplicates, drop_na, _gather, take to IndexedFrame and create their _base_index counterparts (#9807) @isVoid
    • Use vector factories for host-device copies. (#9806) @bdice
    • Refactor host device macros (#9797) @vyasr
    • Remove unused masked udf cython/c++ code (#9792) @brandon-b-miller
    • Allow custom sort functions for dask-cudf sort_values (#9789) @charlesbluca
    • Improve build time of libcudf iterator tests (#9788) @davidwendt
    • Copy Java native dependencies directly into classpath (#9787) @jlowe
    • Add decimal types to cuIO benchmarks (#9776) @vuule
    • Pick smallest decimal type with required precision in ORC reader (#9775) @vuule
    • Avoid overflow for fixed_point cudf::cast and performance optimization (#9772) @codereport
    • Use CTAD with Thrust function objects (#9768) @codereport
    • Refactor TableTest assertion methods to a separate utility class (#9762) @jlowe
    • Use Java classloader to find test resources (#9760) @jlowe
    • Allow cast decimal128 to string and add tests (#9756) @razajafri
    • Load balance optimization for contiguous_split (#9755) @nvdbaranec
    • Consolidate and improve reset_index (#9750) @isVoid
    • Update to UCX-Py 0.24 (#9748) @pentschev
    • Skip cufile tests in JNI build script (#9744) @pxLi
    • Enable string to decimal 128 cast (#9742) @razajafri
    • Use stop instead of stop_. (#9735) @bdice
    • Forward-merge branch-21.12 to branch-22.02 (#9730) @bdice
    • Improve cmake format script (#9723) @vyasr
    • Use cuFile direct device reads/writes by default in cuIO (#9722) @vuule
    • Add directory-partitioned data support to cudf.read_parquet (#9720) @rjzamora
    • Use stream allocator adaptor for hash join table (#9704) @PointKernel
    • Update check for inf/nan strings in libcudf float conversion to ignore case (#9694) @davidwendt
    • Update cudf JNI to 22.02.0-SNAPSHOT (#9681) @pxLi
    • Replace cudf's concurrent_ordered_map with cuco::static_map in semi/anti joins (#9666) @vyasr
    • Some improvements to parse_decimal function and bindings for is_fixed_point (#9658) @razajafri
    • Add utility to format ninja-log build times (#9631) @davidwendt
    • Allow runtime has_nulls parameter for row operators (#9623) @davidwendt
    • Use fsspec.parquet for improved read_parquet performance from remote storage (#9589) @rjzamora
    • Refactor bit counting APIs, introduce valid/null count functions, and split host/device side code for segmented counts. (#9588) @bdice
    • Use List of Columns as Input for drop_nulls, gather and drop_duplicates (#9558) @isVoid
    • Simplify merge internals and reduce overhead (#9516) @vyasr
    • Add struct generation support in datagenerator & fuzz tests (#9180) @galipremsagar
    • Simplify write_csv by removing unnecessary writer/impl classes (#9089) @cwharris
  • v21.12.02(Dec 16, 2021)

  • v21.12.01(Dec 9, 2021)

  • v21.12.00(Dec 3, 2021)

    🚨 Breaking Changes

    • Update bitmask_and and bitmask_or to return a pair of resulting mask and count of unset bits (#9616) @PointKernel
    • Remove sizeof and standardize on memory_usage (#9544) @vyasr
    • Add support for single-line regex anchors ^/$ in contains_re (#9482) @davidwendt
    • Refactor sorting APIs (#9464) @vyasr
    • Update Java nvcomp JNI bindings to nvcomp 2.x API (#9384) @jbrennan333
    • Support Python UDFs written in terms of rows (#9343) @brandon-b-miller
    • JNI: Support nested types in ORC writer (#9334) @firestarman
    • Optionally nullify out-of-bounds indices in segmented_gather(). (#9318) @mythrocks
    • Refactor cuIO timestamp processing with cuda::std::chrono (#9278) @PointKernel
    • Various internal MultiIndex improvements (#9243) @vyasr

    🐛 Bug Fixes

    • Fix read_parquet bug for bytes input (#9669) @rjzamora
    • Use _gather internal for sort_* (#9668) @isVoid
    • Fix behavior of equals for non-DataFrame Frames and add tests. (#9653) @vyasr
    • Don't recompute output size if it is already available (#9649) @abellina
    • Fix read_parquet bug for extended dtypes from remote storage (#9638) @rjzamora
    • add const when getting data from a JNI data wrapper (#9637) @wjxiz1992
    • Fix debrotli issue on CUDA 11.5 (#9632) @vuule
    • Use std::size_t when computing join output size (#9626) @jlowe
    • Fix usecols parameter handling in dask_cudf.read_csv (#9618) @galipremsagar
    • Add support for string 'nan', 'inf' & '-inf' values while type-casting to float (#9613) @galipremsagar
    • Avoid passing NativeFileDatasource to pyarrow in read_parquet (#9608) @rjzamora
    • Fix test failure with cuda 11.5 in row_bit_count tests. (#9581) @nvdbaranec
    • Correct _LIBCUDACXX_CUDACC_VER value computation (#9579) @robertmaynard
    • Increase max RLE stream size estimate to avoid potential overflows (#9568) @vuule
    • Fix edge case in tdigest scalar generation for groups containing all nulls. (#9551) @nvdbaranec
    • Fix pytests failing in cuda-11.5 environment (#9547) @galipremsagar
    • compile libnvcomp with PTDS if requested (#9540) @jbrennan333
    • Fix segmented_gather() for null LIST rows (#9537) @mythrocks
    • Deprecate DataFrame.label_encoding, use private _label_encoding method internally. (#9535) @bdice
    • Fix several test and benchmark issues related to bitmask allocations. (#9521) @nvdbaranec
    • Fix for inserting duplicates in groupby result cache (#9508) @karthikeyann
    • Fix mismatched types error in clip() when using non int64 numeric types (#9498) @davidwendt
    • Match conda pinnings for style checks (revert part of #9412, #9433). (#9490) @bdice
    • Make sure all dask-cudf supported aggs are handled in _tree_node_agg (#9487) @charlesbluca
    • Resolve hash_columns FutureWarning in dask_cudf (#9481) @pentschev
    • Add fixed point to AllTypes in libcudf unit tests (#9472) @karthikeyann
    • Fix regex handling of embedded null characters (#9470) @davidwendt
    • Fix memcheck error in copy-if-else (#9467) @davidwendt
    • Fix bug in dask_cudf.read_parquet for index=False (#9453) @rjzamora
    • Preserve the decimal scale when creating a default scalar (#9449) @revans2
    • Push down parent nulls when flattening nested columns. (#9443) @mythrocks
    • Fix memcheck error in gtest SegmentedGatherTest/GatherSliced (#9442) @davidwendt
    • Revert "Fix quantile division / partition handling for dask-cudf sort… (#9438) @charlesbluca
    • Allow int-like objects for the decimals argument in round (#9428) @shwina
    • Fix stream compaction's drop_duplicates API to use stable sort (#9417) @ttnghia
    • Skip Comparing Uniform Window Results in Var/std Tests (#9416) @isVoid
    • Fix StructColumn.to_pandas type handling issues (#9388) @galipremsagar
    • Correct issues in the build dir cudf-config.cmake (#9386) @robertmaynard
    • Fix Java table partition test to account for non-deterministic ordering (#9385) @jlowe
    • Fix timestamp truncation/overflow bugs in orc/parquet (#9382) @PointKernel
    • Fix the crash in stats code (#9368) @devavret
    • Make Series.hash_encode results reproducible. (#9366) @bdice
    • Fix libcudf compile warnings on debug 11.4 build (#9360) @davidwendt
    • Fail gracefully when compiling python UDFs that attempt to access columns with unsupported dtypes (#9359) @brandon-b-miller
    • Set pass_filenames: false in mypy pre-commit configuration. (#9349) @bdice
    • Fix cudf_assert in cudf::io::orc::gpu::gpuDecodeOrcColumnData (#9348) @davidwendt
    • Fix memcheck error in groupby-tdigest get_scalar_minmax (#9339) @davidwendt
    • Optimizations for cudf.concat when axis=1 (#9333) @galipremsagar
    • Use f-string in join helper warning message. (#9325) @bdice
    • Avoid casting to list or struct dtypes in dask_cudf.read_parquet (#9314) @rjzamora
    • Fix null count in statistics for parquet (#9303) @devavret
    • Potential overflow of decimal32 when casting to int64_t (#9287) @codereport
    • Fix quantile division / partition handling for dask-cudf sort on null dataframes (#9259) @charlesbluca
    • Updating cudf version also updates rapids cmake branch (#9249) @robertmaynard
    • Implement one_hot_encoding in libcudf and bind to python (#9229) @isVoid
    • BUG FIX: CSV Writer ignores the header parameter when no metadata is provided (#8740) @skirui-source

    📖 Documentation

    • Update Documentation to use TYPED_TEST_SUITE (#9654) @codereport
    • Add dedicated page for StringHandling in python docs (#9624) @galipremsagar
    • Update docstring of DataFrame.merge (#9572) @galipremsagar
    • Use raw strings to avoid SyntaxErrors in parsed docstrings. (#9526) @bdice
    • Add example to docstrings in rolling.apply (#9522) @isVoid
    • Update help message to escape quotes in ./build.sh --cmake-args. (#9494) @bdice
    • Improve Python docstring formatting. (#9493) @bdice
    • Update table of I/O supported types (#9476) @vuule
    • Document invalid regex patterns as undefined behavior (#9473) @davidwendt
    • Miscellaneous documentation fixes to cudf (#9471) @galipremsagar
    • Fix many documentation errors in libcudf. (#9355) @karthikeyann
    • Fixing SubwordTokenizer docs issue (#9354) @mayankanand007
    • Improved deprecation warnings. (#9347) @bdice
    • doc reorder mr, stream to stream, mr (#9308) @karthikeyann
    • Deprecate method parameters to DataFrame.join, DataFrame.merge. (#9291) @bdice
    • Added deprecation warning for .label_encoding() (#9289) @mayankanand007

    🚀 New Features

    • Enable Series.divide and DataFrame.divide (#9630) @vyasr
    • Update bitmask_and and bitmask_or to return a pair of resulting mask and count of unset bits (#9616) @PointKernel
    • Add handling of mixed numeric types in to_dlpack (#9585) @galipremsagar
    • Support re.Pattern object for pat arg in str.replace (#9573) @davidwendt
    • Add JNI for lists::drop_list_duplicates with keys-values input column (#9553) @ttnghia
    • Support structs column in min, max, argmin and argmax groupby aggregate() and scan() (#9545) @ttnghia
    • Move libcudacxx to use rapids_cpm and use newer versions (#9539) @robertmaynard
    • Add scan min/max support for chrono types to libcudf reduction-scan (not groupby scan) (#9518) @davidwendt
    • Support args= in apply (#9514) @brandon-b-miller
    • Add groupby scan min/max support for strings values (#9502) @davidwendt
    • Add list output option to character_ngrams() function (#9499) @davidwendt
    • More granular column selection in ORC reader (#9496) @vuule
    • Add min_periods and ddof to groupby covariance and correlation aggregations (#9492) @karthikeyann
    • Implement Series.datetime.floor (#9488) @skirui-source
    • Enable linting of CMake files using pre-commit (#9484) @vyasr
    • Add support for single-line regex anchors ^/$ in contains_re (#9482) @davidwendt
    • Augment order_by to Accept a List of null_precedence (#9455) @isVoid
    • Add format API for list column of strings (#9454) @davidwendt
    • Enable Datetime/Timedelta dtypes in Masked UDFs (#9451) @brandon-b-miller
    • Add cudf python groupby.diff (#9446) @karthikeyann
    • Implement lists::stable_sort_lists for stable sorting of elements within each row of lists column (#9425) @ttnghia
    • add ctest memcheck using cuda-sanitizer (#9414) @karthikeyann
    • Support Unary Operations in Masked UDF (#9409) @isVoid
    • Move several Series functions to Frame (#9394) @isVoid
    • MD5 Python hash API (#9390) @bdice
    • Add cudf strings is_title API (#9380) @davidwendt
    • Enable casting to int64, uint64, and double in AST code. (#9379) @vyasr
    • Add support for writing ORC with map columns (#9369) @vuule
    • extract_list_elements() with column_view indices (#9367) @mythrocks
    • Reimplement lists::drop_list_duplicates for keys-values lists columns (#9345) @ttnghia
    • Support Python UDFs written in terms of rows (#9343) @brandon-b-miller
    • JNI: Support nested types in ORC writer (#9334) @firestarman
    • Optionally nullify out-of-bounds indices in segmented_gather(). (#9318) @mythrocks
    • Add shallow hash function and shallow equality comparison for column_view (#9312) @karthikeyann
    • Add CudaMemoryBuffer for cudaMalloc memory using RMM cuda_memory_resource (#9311) @rongou
    • Add parameters to control row index stride and stripe size in ORC writer (#9310) @vuule
    • Add na_position param to dask-cudf sort_values (#9264) @charlesbluca
    • Add ascending parameter for dask-cudf sort_values (#9250) @charlesbluca
    • New array conversion methods (#9236) @vyasr
    • Series apply method backed by masked UDFs (#9217) @brandon-b-miller
    • Grouping by frequency and resampling (#9178) @shwina
    • Pure-python masked UDFs (#9174) @brandon-b-miller
    • Add Covariance, Pearson correlation for sort groupby (libcudf) (#9154) @karthikeyann
    • Add calendrical_month_sequence in c++ and date_range in python (#8886) @shwina
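
To illustrate the `re.Pattern` entry above (#9573) in plain Python: a compiled pattern carries the expression and its flags in a single object. The sketch below uses only the standard library on a list of strings. It does not show how cuDF consumes the object in `str.replace`, and the sample data is made up.

```python
import re

# A compiled pattern bundles the regular expression with its flags.
pattern = re.compile(r"\s+")

# Hypothetical sample data; collapse each whitespace run to one space.
names = ["GPU   DataFrame", "column\tview"]
cleaned = [pattern.sub(" ", name) for name in names]
print(cleaned)
```

Passing the compiled object around, rather than the raw string, keeps the pattern and its flags together at the call site.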

    🛠️ Improvements

    • Followup to PR 9088 comments (#9659) @cwharris
    • Update cuCollections to version that supports installed libcudacxx (#9633) @robertmaynard
    • Add 11.5 dev.yml to cudf (#9617) @galipremsagar
    • Add xfail for parquet reader 11.5 issue (#9612) @galipremsagar
    • remove deprecated Rmm.initialize method (#9607) @rongou
    • Use HostColumnVectorCore for child columns in JCudfSerialization.unpackHostColumnVectors (#9596) @sperlingxx
    • Set RMM pool to a fixed size in JNI (#9583) @rongou
    • Use nvCOMP for Snappy compression/decompression (#9582) @vuule
    • Build CUDA version agnostic packages for dask-cudf (#9578) @Ethyling
    • Fixed tests warning: "TYPED_TEST_CASE is deprecated, please use TYPED_TEST_SUITE" (#9574) @ttnghia
    • Enable CMake format in CI and fix style (#9570) @vyasr
    • Add NVTX Start/End Ranges to JNI (#9563) @abellina
    • Add librdkafka and python-confluent-kafka to dev conda environments s… (#9562) @jdye64
    • Add offsets_begin/end() to strings_column_view (#9559) @davidwendt
    • remove alignment options for RMM jni (#9550) @rongou
    • Add axis parameter passthrough to DataFrame and Series take for pandas API compatibility (#9549) @dantegd
    • Remove sizeof and standardize on memory_usage (#9544) @vyasr
    • Adds cudaProfilerStart/cudaProfilerStop in JNI api (#9543) @abellina
    • Generalize comparison binary operations (#9542) @vyasr
    • Expose APIs to wrap CUDA or RMM allocations with a Java device buffer instance (#9538) @jlowe
    • Add scan sum support for duration types to libcudf (#9536) @davidwendt
    • Force inlining to improve AST performance (#9530) @vyasr
    • Generalize some more indexed frame methods (#9529) @vyasr
    • Add Java bindings for rolling window stddev aggregation (#9527) @razajafri
    • catch rmm::out_of_memory exceptions in jni (#9525) @rongou
    • Add an overload of make_empty_column with type_id parameter (#9524) @ttnghia
    • Accelerate conditional inner joins with larger right tables (#9523) @vyasr
    • Initial pass of generalizing decimal support in cudf python layer (#9517) @galipremsagar
    • Cleanup for flattening nested columns (#9509) @rwlee
    • Enable running tests using RMM arena and async memory resources (#9506) @rongou
    • Remove dependency on six. (#9495) @bdice
    • Cleanup some libcudf strings gtests (#9489) @davidwendt
    • Rename strings/array_tests.cu to strings/array_tests.cpp (#9480) @davidwendt
    • Refactor sorting APIs (#9464) @vyasr
    • Implement DataFrame.hash_values, deprecate DataFrame.hash_columns. (#9458) @bdice
    • Deprecate Series.hash_encode. (#9457) @bdice
    • Update conda recipes for Enhanced Compatibility effort (#9456) @ajschmidt8
    • Small clean up to simplify column selection code in ORC reader (#9444) @vuule
    • add missing stream to scalar.is_valid() wherever stream is available (#9436) @karthikeyann
    • Add deprecation warnings to one_hot_encoding and implement get_dummies with Cython API (#9435) @isVoid
    • Update pre-commit hook URLs. (#9433) @bdice
    • Remove pyarrow import in dask_cudf.io.parquet (#9429) @charlesbluca
    • Miscellaneous improvements for UDFs (#9422) @isVoid
    • Use pre-commit for CI (#9412) @vyasr
    • Update to UCX-Py 0.23 (#9407) @pentschev
    • Expose OutOfBoundsPolicy in JNI for Table.gather (#9406) @abellina
    • Improvements to tdigest aggregation code. (#9403) @nvdbaranec
    • Add Java API to deserialize a table to host columns (#9402) @jlowe
    • Frame copy to use class instead of type() (#9397) @madsbk
    • Change all DeprecationWarnings to FutureWarning. (#9392) @bdice
    • Update Java nvcomp JNI bindings to nvcomp 2.x API (#9384) @jbrennan333
    • Add IndexedFrame class and move SingleColumnFrame to a separate module (#9378) @vyasr
    • Support Arrow NativeFile and PythonFile for remote ORC storage (#9377) @rjzamora
    • Use Arrow PythonFile for remote CSV storage (#9376) @rjzamora
    • Add multi-threaded writing to GDS writes (#9372) @devavret
    • Miscellaneous column cleanup (#9370) @vyasr
    • Use single kernel to extract all groups in cudf::strings::extract (#9358) @davidwendt
    • Consolidate binary ops into Frame (#9357) @isVoid
    • Move rank scan implementations from scan_inclusive.cu to rank_scan.cu (#9351) @davidwendt
    • Remove usage of deprecated thrust::host_space_tag. (#9350) @bdice
    • Use Default Memory Resource for Temporaries in reduction.cpp (#9344) @isVoid
    • Fix Cython compilation warnings. (#9327) @bdice
    • Fix some unused variable warnings in libcudf (#9326) @davidwendt
    • Use optional-iterator for copy-if-else kernel (#9324) @davidwendt
    • Remove Table class (#9315) @vyasr
    • Unpin dask and distributed in CI (#9307) @galipremsagar
    • Add optional-iterator support to indexalator (#9306) @davidwendt
    • Consolidate more methods in Frame (#9305) @vyasr
    • Add Arrow-NativeFile and PythonFile support to read_parquet and read_csv in cudf (#9304) @rjzamora
    • Pin mypy in .pre-commit-config.yaml to match conda environment pinning. (#9300) @bdice
    • Use gather.hpp when gather-map exists in device memory (#9299) @davidwendt
    • Fix Automerger for Branch-21.12 from branch-21.10 (#9285) @galipremsagar
    • Refactor cuIO timestamp processing with cuda::std::chrono (#9278) @PointKernel
    • Change strings copy_if_else to use optional-iterator instead of pair-iterator (#9266) @davidwendt
    • Update cudf java bindings to 21.12.0-SNAPSHOT (#9248) @pxLi
    • Various internal MultiIndex improvements (#9243) @vyasr
    • Add detail interface for split and slice(table_view), refactor both functions with host_span (#9226) @isVoid
    • Refactor MD5 implementation. (#9212) @bdice
    • Update groupby result_cache to allow sharing intermediate results based on column_view instead of requests. (#9195) @karthikeyann
    • Use nvcomp's snappy decompressor in avro reader (#9181) @devavret
    • Add isocalendar API support (#9169) @marlenezw
    • Simplify read_json by removing unnecessary reader/impl classes (#9088) @cwharris
    • Simplify read_csv by removing unnecessary reader/impl classes (#9041) @cwharris
    • Refactor hash join with cuCollections multimap (#8934) @PointKernel
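
The switch from DeprecationWarning to FutureWarning listed above (#9392) matters because Python's default filters hide DeprecationWarning unless it is triggered by code in `__main__`, whereas FutureWarning is shown to end users by default. A minimal standard-library sketch, using a hypothetical function name rather than a real cuDF API:

```python
import warnings


def old_method():
    """Hypothetical deprecated API, not a real cuDF function."""
    warnings.warn(
        "old_method is deprecated and will be removed in a future release.",
        FutureWarning,
        stacklevel=2,
    )
    return 42


# Record warnings so the category can be inspected programmatically.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    result = old_method()

categories = [w.category for w in caught]
print(categories)
```

Callers who want to silence or escalate these messages can filter on the `FutureWarning` category with `warnings.filterwarnings`.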
  • v21.10.01 (Oct 12, 2021)

  • v21.10.00 (Oct 6, 2021)

    🚨 Breaking Changes

    • Remove Cython APIs for table view generation (#9199) @vyasr
    • Upgrade pandas version in cudf (#9147) @galipremsagar
    • Make AST operators nullable (#9096) @vyasr
    • Remove the option to pass data types as strings to read_csv and read_json (#9079) @vuule
    • Update JNI java CSV APIs to not use deprecated API (#9066) @revans2
    • Support additional format specifiers in from_timestamps (#9047) @davidwendt
    • Expose expression base class publicly and simplify public AST API (#9045) @vyasr
    • Add support for struct type in ORC writer (#9025) @vuule
    • Remove aliases of various api.types APIs from utils.dtypes. (#9011) @vyasr
    • Java bindings for conditional join output sizes (#9002) @jlowe
    • Move compute_column API out of ast namespace (#8957) @vyasr
    • cudf.dtype function (#8949) @shwina
    • Refactor Frame reductions (#8944) @vyasr
    • Add nested column selection to parquet reader (#8933) @devavret
    • JNI Aggregation Type Changes (#8919) @revans2
    • Add groupby_aggregation and groupby_scan_aggregation classes and force their usage. (#8906) @nvdbaranec
    • Expand CSV and JSON reader APIs to accept dtypes as a vector or map of data_type objects (#8856) @vuule
    • Change cudf docs theme to pydata theme (#8746) @galipremsagar
    • Enable compiled binary ops in libcudf, python and java (#8741) @karthikeyann
    • Make groupby transform-like op order match original data order (#8720) @isVoid

    🐛 Bug Fixes

    • fixed_point cudf::groupby for mean aggregation (#9296) @codereport
    • Fix interleave_columns when the input strings lists column has an empty child column (#9292) @ttnghia
    • Update nvcomp to include fixes for installation of headers (#9276) @devavret
    • Fix Java column leak in testParquetWriteMap (#9271) @jlowe
    • Fix call to thrust::reduce_by_key in argmin/argmax libcudf groupby (#9263) @davidwendt
    • Fix crash on empty input to getMapValue (#9262) @hyperbolic2346
    • Fix duplicate names issue in MultiIndex.deserialize (#9258) @galipremsagar
    • Dataframe.sort_index optimizations (#9238) @galipremsagar
    • Temporarily disabling problematic test in parquet writer (#9230) @devavret
    • Explicitly disable groupby on unsupported key types. (#9227) @mythrocks
    • Fix gather for sliced input structs column (#9218) @ttnghia
    • Fix JNI code for left semi and anti joins (#9207) @jlowe
    • Only install thrust when using a non 'system' version (#9206) @robertmaynard
    • Remove zlib from libcudf public CMake dependencies (#9204) @robertmaynard
    • Fix out-of-bounds memory read in orc gpuEncodeOrcColumnData (#9196) @davidwendt
    • Fix gather() for STRUCT inputs with no nulls in members. (#9194) @mythrocks
    • get_cucollections properly uses rapids_cpm_find (#9189) @robertmaynard
    • rapids-export correctly reference build code block and doc strings (#9186) @robertmaynard
    • Fix logic while parsing the sum statistic for numerical orc columns (#9183) @ayushdg
    • Add handling for nulls in dask_cudf.sorting.quantile_divisions (#9171) @charlesbluca
    • Approximate overflow detection in ORC statistics (#9163) @vuule
    • Use decimal precision metadata when reading from parquet files (#9162) @shwina
    • Fix variable name in Java build script (#9161) @jlowe
    • Import rapids-cmake modules using the correct cmake variable. (#9149) @robertmaynard
    • Fix conditional joins with empty left table (#9146) @vyasr
    • Fix joining on indexes with duplicate level names (#9137) @shwina
    • Fixes missing child column name in dtype while reading ORC file. (#9134) @rgsl888prabhu
    • Apply type metadata after column is slice-copied (#9131) @isVoid
    • Fix a bug: inner_join_size return zero if build table is empty (#9128) @PointKernel
    • Fix multi hive-partition parquet reading in dask-cudf (#9122) @rjzamora
    • Support null literals in expressions (#9117) @vyasr
    • Fix cudf::hash_join output size for struct joins (#9107) @jlowe
    • Import fix (#9104) @shwina
    • Fix cudf::strings::is_fixed_point checking of overflow for decimal32 (#9093) @davidwendt
    • Fix branch_stack calculation in row_bit_count() (#9076) @mythrocks
    • Fetch rapids-cmake to work around cuCollection cmake issue (#9075) @jlowe
    • Fix compilation errors in groupby benchmarks. (#9072) @nvdbaranec
    • Preserve float16 upscaling (#9069) @galipremsagar
    • Fix memcheck read error in libcudf contiguous_split (#9067) @davidwendt
    • Add support for reading ORC file with no row group index (#9060) @rgsl888prabhu
    • Various multiindex related fixes (#9036) @shwina
    • Avoid rebuilding cython in build.sh (#9034) @brandon-b-miller
    • Add support for percentile dispatch in dask_cudf (#9031) @galipremsagar
    • cudf resolve nvcc 11.0 compiler crashes during codegen (#9028) @robertmaynard
    • Fetch correct grouping keys agg of dask groupby (#9022) @galipremsagar
    • Allow where() to work with a Series and other=cudf.NA (#9019) @sarahyurick
    • Use correct index when returning Series from GroupBy.apply() (#9016) @charlesbluca
    • Fix Dataframe indexer setitem when array is passed (#9006) @galipremsagar
    • Fix ORC reading of files with struct columns that have null values (#9005) @vuule
    • Ensure JNI native libraries load when CompiledExpression loads (#8997) @jlowe
    • Fix memory read error in get_dremel_data in page_enc.cu (#8995) @davidwendt
    • Fix memory write error in get_list_child_to_list_row_mapping utility (#8994) @davidwendt
    • Fix debug compile error for csv_test.cpp (#8981) @davidwendt
    • Fix memory read/write error in concatenate_lists_ignore_null (#8978) @davidwendt
    • Fix concatenation of cudf.RangeIndex (#8970) @galipremsagar
    • Java conditional joins should not require matching column counts (#8955) @jlowe
    • Fix concatenate empty structs (#8947) @sperlingxx
    • Fix cuda-memcheck errors for some libcudf functions (#8941) @davidwendt
    • Apply series name to result of SeriesGroupby.apply() (#8939) @charlesbluca
    • cdef packed_columns as cppclass instead of struct (#8936) @charlesbluca
    • Inserting a cudf.NA into a DataFrame (#8923) @sarahyurick
    • Support casting with Pandas dtype aliases (#8920) @sarahyurick
    • Allow sort_values to accept same kind values as Pandas (#8912) @sarahyurick
    • Enable casting to pandas nullable dtypes (#8889) @brandon-b-miller
    • Fix libcudf memory errors (#8884) @karthikeyann
    • Throw KeyError when accessing field from struct with nonexistent key (#8880) @NV-jpt
    • replace auto with auto& ref for cast<&> (#8866) @karthikeyann
    • Add missing include<optional> in binops (#8864) @karthikeyann
    • Fix select_dtypes to work when non-class dtypes present in dataframe (#8849) @sarahyurick
    • Re-enable JSON tests (#8843) @vuule
    • Support header with embedded delimiter in csv writer (#8798) @davidwendt

    📖 Documentation

    • Add IO docs page in cudf documentation (#9145) @galipremsagar
    • use correct namespace in cuio code examples (#9037) @cwharris
    • Restructuring Contributing doc (#9026) @iskode
    • Update stable version in readme (#9008) @galipremsagar
    • Add spans and more include guidelines to libcudf developer guide (#8931) @harrism
    • Update Java build instructions to mention Arrow S3 and Docker (#8867) @jlowe
    • List GDS-enabled formats in the docs (#8805) @vuule
    • Change cudf docs theme to pydata theme (#8746) @galipremsagar

    🚀 New Features

    • Revert "Add shallow hash function and shallow equality comparison for column_view (#9185)" (#9283) @karthikeyann
    • Align DataFrame.apply signature with pandas (#9275) @brandon-b-miller
    • Add struct type support for drop_list_duplicates (#9202) @ttnghia
    • support CUDA async memory resource in JNI (#9201) @rongou
    • Add shallow hash function and shallow equality comparison for column_view (#9185) @karthikeyann
    • Superimpose null masks for STRUCT columns. (#9144) @mythrocks
    • Implemented bindings for ceil timestamp operation (#9141) @shaneding
    • Adding MAP type support for ORC Reader (#9132) @rgsl888prabhu
    • Implement interleave_columns for lists with arbitrary nested type (#9130) @ttnghia
    • Add python bindings to fixed-size window and groupby rolling.var, rolling.std (#9097) @isVoid
    • Make AST operators nullable (#9096) @vyasr
    • Java bindings for approx_percentile (#9094) @andygrove
    • Add dseries.struct.explode (#9086) @isVoid
    • Add support for BaseIndexer in Rolling APIs (#9085) @galipremsagar
    • Remove the option to pass data types as strings to read_csv and read_json (#9079) @vuule
    • Add handling for nested dicts in dask-cudf groupby (#9054) @charlesbluca
    • Added Series.dt.is_quarter_start and Series.dt.is_quarter_end (#9046) @TravisHester
    • Support nested types for nth_element reduction (#9043) @sperlingxx
    • Update sort groupby to use non-atomic operation (#9035) @karthikeyann
    • Add support for struct type in ORC writer (#9025) @vuule
    • Implement interleave_columns for structs columns (#9012) @ttnghia
    • Add groupby first and last aggregations (#9004) @shwina
    • Add DecimalBaseColumn and move as_decimal_column (#9001) @isVoid
    • Python/Cython bindings for multibyte_split (#8998) @jdye64
    • Support scalar months in add_calendrical_months, extends API to INT32 support (#8991) @isVoid
    • Added Series.dt.is_month_end (#8989) @TravisHester
    • Support for using tdigests to compute approximate percentiles. (#8983) @nvdbaranec
    • Support "unflatten" of columns flattened via flatten_nested_columns() (#8956) @mythrocks
    • Implement timestamp ceil (#8942) @shaneding
    • Add nested column selection to parquet reader (#8933) @devavret
    • Expose conditional join size calculation (#8928) @vyasr
    • Support Nulls in Timeseries Generator (#8925) @isVoid
    • Avoid index equality check in _CPackedColumns.from_py_table() (#8917) @charlesbluca
    • Add dot product binary op (#8909) @charlesbluca
    • Expose days_in_month function in libcudf and add python bindings (#8892) @isVoid
    • Series string repeat (#8882) @sarahyurick
    • Python binding for quarters (#8862) @shaneding
    • Expand CSV and JSON reader APIs to accept dtypes as a vector or map of data_type objects (#8856) @vuule
    • Add Java bindings for AST transform (#8846) @jlowe
    • Series datetime is_month_start (#8844) @sarahyurick
    • Support bracket syntax for cudf::strings::replace_with_backrefs group index values (#8841) @davidwendt
    • Support VARIANCE and STD aggregation in rolling op (#8809) @isVoid
    • Add quarters to libcudf datetime (#8779) @shaneding
    • Linear Interpolation of nans via cupy (#8767) @brandon-b-miller
    • Enable compiled binary ops in libcudf, python and java (#8741) @karthikeyann
    • Make groupby transform-like op order match original data order (#8720) @isVoid
    • multibyte_split (#8702) @cwharris
    • Implement JNI for strings::repeat_strings that repeats each string a different number of times (#8572) @ttnghia

    🛠️ Improvements

    • Pin max dask and distributed versions to 2021.09.1 (#9286) @galipremsagar
    • Optimized fsspec data transfer for remote file-systems (#9265) @rjzamora
    • Skip dask-cudf tests on arm64 (#9252) @Ethyling
    • Use nvcomp's snappy compressor in ORC writer (#9242) @devavret
    • Only run imports tests on x86_64 (#9241) @Ethyling
    • Remove unnecessary call to device_uvector::release() (#9237) @harrism
    • Use nvcomp's snappy decompression in ORC reader (#9235) @devavret
    • Add grouped_rolling test with STRUCT groupby keys. (#9228) @mythrocks
    • Optimize cudf.concat for axis=0 (#9222) @galipremsagar
    • Fix some libcudf calls not passing the stream parameter (#9220) @davidwendt
    • Add min and max bounds for random dataframe generator numeric types (#9211) @galipremsagar
    • Improve performance of expression evaluation (#9210) @vyasr
    • Misc optimizations in cudf (#9203) @galipremsagar
    • Remove Cython APIs for table view generation (#9199) @vyasr
    • Add JNI support for drop_list_duplicates (#9198) @revans2
    • Update pandas versions in conda recipes and requirements.txt files (#9197) @galipremsagar
    • Minor C++17 cleanup of groupby.cu: structured bindings, more concise lambda, etc (#9193) @codereport
    • Be explicit about bitwidth difference between cudf boolean and arrow boolean (#9192) @isVoid
    • Remove _source_index from MultiIndex (#9191) @vyasr
    • Fix typo in the name of cudf-testing-targets.cmake (#9190) @trxcllnt
    • Add support for single-digits in cudf::to_timestamps (#9173) @davidwendt
    • Fix cufilejni build include path (#9168) @pxLi
    • dask_cudf dispatch registering cleanup (#9160) @galipremsagar
    • Remove unneeded stream/mr from a cudf::make_strings_column (#9148) @davidwendt
    • Upgrade pandas version in cudf (#9147) @galipremsagar
    • make data chunk reader return unique_ptr (#9129) @cwharris
    • Add backend for percentile_lookup dispatch (#9118) @galipremsagar
    • Refactor implementation of column setitem (#9110) @vyasr
    • Fix compile warnings found using nvcc 11.4 (#9101) @davidwendt
    • Update to UCX-Py 0.22 (#9099) @pentschev
    • Simplify read_avro by removing unnecessary writer/impl classes (#9090) @cwharris
    • Allowing %f in format to return nanoseconds (#9081) @marlenezw
    • Java bindings for cudf::hash_join (#9080) @jlowe
    • Remove stale code in ColumnBase._fill (#9078) @isVoid
    • Add support for get_group in GroupBy (#9070) @galipremsagar
    • Remove remaining "support" methods from DataFrame (#9068) @vyasr
    • Update JNI java CSV APIs to not use deprecated API (#9066) @revans2
    • Added method to remove null_masks if the column has no nulls (#9061) @razajafri
    • Consolidate Several Series and Dataframe Methods (#9059) @isVoid
    • Remove usage of string based set_dtypes for csv & json readers (#9049) @galipremsagar
    • Remove some debug print statements from gtests (#9048) @davidwendt
    • Support additional format specifiers in from_timestamps (#9047) @davidwendt
    • Expose expression base class publicly and simplify public AST API (#9045) @vyasr
    • move filepath and mmap logic out of json/csv up to functions.cpp (#9040) @cwharris
    • Refactor Index hierarchy (#9039) @vyasr
    • cudf now leverages rapids-cmake to reduce CMake boilerplate (#9030) @robertmaynard
    • Add support for STRUCT input to groupby (#9024) @mythrocks
    • Refactor Frame scans (#9021) @vyasr
    • Remove duplicate set_categories code (#9018) @isVoid
    • Map support for ParquetWriter (#9013) @razajafri
    • Remove aliases of various api.types APIs from utils.dtypes. (#9011) @vyasr
    • Java bindings for conditional join output sizes (#9002) @jlowe
    • Remove _copy_construct factory (#8999) @vyasr
    • ENH Allow arbitrary CMake config options in build.sh (#8996) @dillon-cullinan
    • A small optimization for JNI copy column view to column vector (#8985) @revans2
    • Fix nvcc warnings in ORC writer (#8975) @devavret
    • Support nested structs in rank and dense rank (#8962) @rwlee
    • Move compute_column API out of ast namespace (#8957) @vyasr
    • Series datetime is_year_end and is_year_start (#8954) @marlenezw
    • Make Java AstNode public (#8953) @jlowe
    • Replace allocate with device_uvector for subword_tokenize internal tables (#8952) @davidwendt
    • cudf.dtype function (#8949) @shwina
    • Refactor Frame reductions (#8944) @vyasr
    • Add deprecation warning for Series.set_mask API (#8943) @galipremsagar
    • Move AST evaluator into a separate header (#8930) @vyasr
    • JNI Aggregation Type Changes (#8919) @revans2
    • Move template parameter to function parameter in cudf::detail::left_semi_anti_join (#8914) @davidwendt
    • Upgrade arrow & pyarrow to 5.0.0 (#8908) @galipremsagar
    • Add groupby_aggregation and groupby_scan_aggregation classes and force their usage. (#8906) @nvdbaranec
    • Move structs_column_tests.cu to .cpp. (#8902) @mythrocks
    • Add stream and memory-resource parameters to struct-scalar copy ctor (#8901) @davidwendt
    • Combine linearizer and ast_plan (#8900) @vyasr
    • Add Java bindings for conditional join gather maps (#8888) @jlowe
    • Remove max version pin for dask & distributed on development branch (#8881) @galipremsagar
    • fix cufilejni build w/ c++17 (#8877) @pxLi
    • Add struct accessor to dask-cudf (#8874) @NV-jpt
    • Migrate dask-cudf CudfEngine to leverage ArrowDatasetEngine (#8871) @rjzamora
    • Add JNI for extract_quarter, add_calendrical_months, and is_leap_year (#8863) @revans2
    • Change cudf::scalar copy and move constructors to protected (#8857) @davidwendt
    • Replace is_same<>::value with is_same_v<> (#8852) @codereport
    • Add min pytorch version to importorskip in pytest (#8851) @galipremsagar
    • Java bindings for regex replace (#8847) @jlowe
    • Remove make strings children with null mask (#8830) @davidwendt
    • Refactor conditional joins (#8815) @vyasr
    • Small cleanup (unused headers / commented code removals) (#8799) @codereport
    • ENH Replace gpuci_conda_retry with gpuci_mamba_retry (#8770) @dillon-cullinan
    • Update cudf java bindings to 21.10.0-SNAPSHOT (#8765) @pxLi
    • Refactor and improve join benchmarks with nvbench (#8734) @PointKernel
    • Refactor Python factories and remove usage of Table for libcudf output handling (#8687) @vyasr
    • Optimize URL Decoding (#8622) @gaohao95
    • Parquet writer dictionary encoding refactor (#8476) @devavret
    • Use nvcomp's snappy decompression in parquet reader (#8252) @devavret
    • Use nvcomp's snappy compressor in parquet writer (#8229) @devavret
  • v21.08.03 (Sep 16, 2021)

  • v21.08.02 (Aug 6, 2021)

  • v21.08.01 (Aug 6, 2021)

  • v21.08.00 (Aug 4, 2021)

    🚨 Breaking Changes

    • Fix a crash in pack() when being handed tables with no columns. (#8697) @nvdbaranec
    • Remove unused cudf::strings::create_offsets (#8663) @davidwendt
    • Add delimiter parameter to cudf::strings::capitalize() (#8620) @davidwendt
    • Change default datetime index resolution to ns to match pandas (#8611) @vyasr
    • Add sequence_type parameter to cudf::strings::title function (#8602) @davidwendt
    • Add strings::repeat_strings API that can repeat each string a different number of times (#8561) @ttnghia
    • String-to-boolean conversion is different from Pandas (#8549) @skirui-source
    • Add accurate hash join size functions (#8453) @PointKernel
    • Expose a Decimal32Dtype in cuDF Python (#8438) @skirui-source
    • Update dask make_meta changes to be compatible with dask upstream (#8426) @galipremsagar
    • Adapt cudf::scalar classes to changes in rmm::device_scalar (#8411) @harrism
    • Remove special Index class from the general index class hierarchy (#8309) @vyasr
    • Add first-class dtype utilities (#8308) @vyasr
    • ORC - Support reading multiple orc files/buffers in a single operation (#8142) @jdye64
    • Upgrade arrow to 4.0.1 (#7495) @galipremsagar

    🐛 Bug Fixes

    • Fix contains check in string column (#8834) @galipremsagar
    • Remove unused variable from row_bit_count_test. (#8829) @mythrocks
    • Fixes issue with null struct columns in ORC reader (#8819) @rgsl888prabhu
    • Set CMake vars for python/parquet support in libarrow builds (#8808) @vyasr
    • Handle empty child columns in row_bit_count() (#8791) @mythrocks
    • Revert "Remove cudf unneeded build time requirement of the cuda driver" (#8784) @robertmaynard
    • Fix isort error in utils.pyx (#8771) @charlesbluca
    • Handle sliced struct/list columns properly in concatenate() bounds checking. (#8760) @nvdbaranec
    • Fix issues with _CPackedColumns.serialize() handling of host and device data (#8759) @charlesbluca
    • Fix issues with MultiIndex in dropna, stack & reset_index (#8753) @galipremsagar
    • Write pandas extension types to parquet file metadata (#8749) @devavret
    • Fix where to handle DataFrame & Series input combination (#8747) @galipremsagar
    • Fix replace to handle null values correctly (#8744) @galipremsagar
    • Handle sliced structs properly in pack/contiguous_split. (#8739) @nvdbaranec
    • Fix issue in slice() where columns with a positive offset were computing null counts incorrectly. (#8738) @nvdbaranec
    • Fix cudf.Series constructor to handle list of sequences (#8735) @galipremsagar
    • Fix min/max sorted groupby aggregation on string column with nulls (argmin, argmax sentinel value missing on nulls) (#8731) @karthikeyann
    • Fix orc reader assert on create data_type in debug (#8706) @davidwendt
    • Fix min/max inclusive cudf::scan for strings column (#8705) @davidwendt
    • JNI: Fix driver version assertion logic in testGetCudaRuntimeInfo (#8701) @sperlingxx
    • Adding fix for skip_rows and crash in orc reader (#8700) @rgsl888prabhu
    • Bug fix: replace_nulls_policy functor not returning correct indices for gathermap (#8699) @isVoid
    • Fix a crash in pack() when being handed tables with no columns. (#8697) @nvdbaranec
    • Add post-processing steps to dask_cudf.groupby.CudfSeriesGroupby.aggregate (#8694) @charlesbluca
    • JNI build no longer looks for Arrow in conda environment (#8686) @jlowe
    • Handle arbitrarily different data in null list column rows when checking for equivalency. (#8666) @nvdbaranec
    • Add ConfigureNVBench to avoid concurrent main() entry points (#8662) @PointKernel
    • Pin *arrow to use *cuda in run (#8651) @jakirkham
    • Add proper support for tolerances in testing methods. (#8649) @vyasr
    • Support multi-char case conversion in capitalize function (#8647) @davidwendt
    • Fix repeated mangled names in read_csv with duplicate column names (#8645) @karthikeyann
    • Temporarily disable libcudf example build tests (#8642) @isVoid
    • Use conda-sourced cudf artifacts for libcudf example in CI (#8638) @isVoid
    • Ensure dev environment uses Arrow GPU packages (#8637) @charlesbluca
    • Fix bug where columns were only initialized once when both columns and index are specified in the DataFrame constructor (#8628) @isVoid
    • Propagate **kwargs through to as_*_column methods (#8618) @shwina
    • Fix orc_reader_benchmark.cpp compile error (#8609) @davidwendt
    • Fix missed renumbering of Aggregation values (#8600) @revans2
    • Update cmake to 3.20.5 in the Java Docker image (#8593) @NvTimLiu
    • Fix bug in replace_with_backrefs when group has greedy quantifier (#8575) @davidwendt
    • Apply metadata to keys before returning in Frame._encode (#8560) @charlesbluca
    • Fix for strings containing special JSON characters in get_json_object(). (#8556) @nvdbaranec
    • Fix debug compile error in gather_struct_tests.cpp (#8554) @davidwendt
    • String-to-boolean conversion is different from Pandas (#8549) @skirui-source
    • Fix __repr__ output with display.max_rows is None (#8547) @galipremsagar
    • Fix size passed to column constructors in _with_type_metadata (#8539) @shwina
    • Properly retrieve last column when -1 is specified for column index (#8529) @isVoid
    • Fix importing apply from dask (#8517) @galipremsagar
    • Fix offset of the string dictionary length stream (#8515) @vuule
    • Fix double counting of selected columns in CSV reader (#8508) @ochan1
    • Incorrect map size in scatter_to_gather corrupts struct columns (#8507) @gerashegalov
    • replace_nulls properly propagates memory resource to gather calls (#8500) @robertmaynard
    • Disallow groupby aggs for StructColumns (#8499) @charlesbluca
    • Fixes out-of-bounds access for small files in unzip (#8498) @elstehle
    • Adding support for writing empty dataframe (#8490) @shaneding
    • Fix exclusive scan when including nulls and improve testing (#8478) @harrism
    • Add workaround for crash in libcudf debug build using output_indexalator in thrust::lower_bound (#8432) @davidwendt
    • Install only the same Thrust files that Thrust itself installs (#8420) @robertmaynard
    • Add nightly version for ucx-py in ci script (#8419) @galipremsagar
    • Fix null_equality config of rolling_collect_set (#8415) @sperlingxx
    • CollectSetAggregation: implement RollingAggregation interface (#8406) @sperlingxx
    • Handle pre-sliced nested columns in contiguous_split. (#8391) @nvdbaranec
    • Fix bitmask_tests.cpp host accessing device memory (#8370) @davidwendt
    • Fix concurrent_unordered_map to prevent accessing padding bits in pair_type (#8348) @davidwendt
    • BUG FIX: Raise appropriate strings error when concatenating strings column (#8290) @skirui-source
    • Make gpuCI and pre-commit style configurations consistent (#8215) @charlesbluca
    • Add collect list to dask-cudf groupby aggregations (#8045) @charlesbluca

    📖 Documentation

    • Update Python UDFs notebook (#8810) @brandon-b-miller
    • Fix dask.dataframe API docs links after reorg (#8772) @jsignell
    • Fix instructions for running cuDF/dask-cuDF tests in CONTRIBUTING.md (#8724) @shwina
    • Translate Markdown documentation to rST and remove recommonmark (#8698) @vyasr
    • Fixed spelling mistakes in libcudf documentation (#8664) @karthikeyann
    • Custom Sphinx Extension: PandasCompat (#8643) @isVoid
    • Fix README.md (#8535) @ajschmidt8
    • Change namespace contains_nulls to struct (#8523) @davidwendt
    • Add info about NVTX ranges to dev guide (#8461) @jrhemstad
    • Fixed documentation bug in groupby agg method (#8325) @ahmet-uyar

    🚀 New Features

    • Fix concatenating structs (#8811) @shaneding
    • Implement JNI for groupby aggregations M2 and MERGE_M2 (#8763) @ttnghia
    • Bump isort to 5.6.4 and remove isort overrides made for 5.0.7 (#8755) @charlesbluca
    • Implement __setitem__ for StructColumn (#8737) @shaneding
    • Add is_leap_year to DateTimeProperties and DatetimeIndex (#8736) @isVoid
    • Add struct.explode() method (#8729) @shwina
    • Add DataFrame.to_struct() method to convert a DataFrame to a struct Series (#8728) @shwina
    • Add support for list type in ORC writer (#8723) @vuule
    • Fix slicing from struct columns and accessing struct columns (#8719) @shaneding
    • Add datetime::is_leap_year (#8711) @isVoid
    • Accessing struct columns from dask_cudf (#8675) @shaneding
    • Added pct_change to Series (#8650) @TravisHester
    • Add strings support to cudf::shift function (#8648) @davidwendt
    • Support Scatter struct_scalar (#8630) @isVoid
    • Struct scalar from host dictionary (#8629) @shaneding
    • Add dayofyear and day_of_year to Series, DatetimeColumn, and DatetimeIndex (#8626) @beckernick
    • JNI support for capitalize (#8624) @firestarman
    • Add delimiter parameter to cudf::strings::capitalize() (#8620) @davidwendt
    • Add NVBench in CMake (#8619) @PointKernel
    • Change default datetime index resolution to ns to match pandas (#8611) @vyasr
    • ListColumn __setitem__ (#8606) @brandon-b-miller
    • Implement groupby aggregations M2 and MERGE_M2 (#8605) @ttnghia
    • Add sequence_type parameter to cudf::strings::title function (#8602) @davidwendt
    • Adding support for list and struct type in ORC Reader (#8599) @rgsl888prabhu
    • Benchmark for strings::repeat_strings APIs (#8589) @ttnghia
    • Nested scalar support for copy if else (#8588) @gerashegalov
    • User specified decimal columns to float64 (#8587) @jdye64
    • Add get_element for struct column (#8578) @isVoid
    • Python changes for adding __getitem__ for struct (#8577) @shaneding
    • Add strings::repeat_strings API that can repeat each string a different number of times (#8561) @ttnghia
    • Refactor tests/iterator_utilities.hpp functions (#8540) @ttnghia
    • Support MERGE_LISTS and MERGE_SETS in Java package (#8516) @sperlingxx
    • Decimal support csv reader (#8511) @elstehle
    • Add column type tests (#8505) @isVoid
    • Warn when downscaling decimal columns (#8492) @ChrisJar
    • Add JNI for strings::repeat_strings (#8491) @ttnghia
    • Add Index.get_loc for Numerical, String Index support (#8489) @isVoid
    • Expose half_up rounding in cuDF (#8477) @shwina
    • Java APIs to fetch CUDA runtime info (#8465) @sperlingxx
    • Add str.edit_distance_matrix (#8463) @isVoid
    • Support constructing cudf.Scalar objects from host side lists (#8459) @brandon-b-miller
    • Add accurate hash join size functions (#8453) @PointKernel
    • Add cudf::strings::integer_to_hex convert API (#8450) @davidwendt
    • Create objects from iterables that contain cudf.NA (#8442) @brandon-b-miller
    • JNI bindings for sort_lists (#8439) @sperlingxx
    • Expose a Decimal32Dtype in cuDF Python (#8438) @skirui-source
    • Replace all_null() and all_valid() by iterator_all_nulls() and iterator_no_null() in tests (#8437) @ttnghia
    • Implement groupby MERGE_LISTS and MERGE_SETS aggregates (#8436) @ttnghia
    • Add public libcudf match_dictionaries API (#8429) @davidwendt
    • Add move constructors for string_scalar and struct_scalar (#8428) @ttnghia
    • Implement strings::repeat_strings (#8423) @ttnghia
    • STRUCT column support for cudf::merge. (#8422) @nvdbaranec
    • Implement reverse in libcudf (#8410) @shaneding
    • Support multiple input files/buffers for read_json (#8403) @jdye64
    • Improve test coverage for struct search (#8396) @ttnghia
    • Add groupby.fillna (#8362) @isVoid
    • Enable AST-based joining (#8214) @vyasr
    • Generalized null support in user defined functions (#8213) @brandon-b-miller
    • Add compiled binary operation (#8192) @karthikeyann
    • Implement .describe() for DataFrameGroupBy (#8179) @skirui-source
    • ORC - Support reading multiple orc files/buffers in a single operation (#8142) @jdye64
    • Add Python bindings for lists::concatenate_list_elements and expose them as .list.concat() (#8006) @shwina
    • Use Arrow URI FileSystem backed instance to retrieve remote files (#7709) @jdye64
    • Example to build custom application and link to libcudf (#7671) @isVoid
    • Upgrade arrow to 4.0.1 (#7495) @galipremsagar

    🛠️ Improvements

    • Provide a better error message when CUDA::cuda_driver not found (#8794) @robertmaynard
    • Remove anonymous namespace from null_mask.cuh (#8786) @nvdbaranec
    • Allow cudf to be built without libcuda.so existing (#8751) @robertmaynard
    • Pin mimesis to <4.1 (#8745) @galipremsagar
    • Update conda environment name for CI (#8692) @ajschmidt8
    • Remove flatbuffers dependency (#8671) @Ethyling
    • Add options to build Arrow with Python and Parquet support (#8670) @trxcllnt
    • Remove unused cudf::strings::create_offsets (#8663) @davidwendt
    • Update GDS lib version to 1.0.0 (#8654) @pxLi
    • Support for groupby/scan rank and dense_rank aggregations (#8652) @rwlee
    • Fix usage of deprecated arrow ipc API (#8632) @revans2
    • Use absolute imports in cudf (#8631) @galipremsagar
    • ENH Add Java CI build script (#8627) @dillon-cullinan
    • Add DeprecationWarning to ser.str.subword_tokenize (#8603) @VibhuJawa
    • Rewrite binary operations for improved performance and additional type support (#8598) @vyasr
    • Fix mypy errors surfacing because of numpy-1.21.0 (#8595) @galipremsagar
    • Remove unneeded includes from cudf::string_view headers (#8594) @davidwendt
    • Use cmake 3.20.1 as it is now required by rmm (#8586) @robertmaynard
    • Remove device debug symbols from cmake CUDF_CUDA_FLAGS (#8584) @davidwendt
    • Dask-CuDF: use default Dask Dataframe optimizer (#8581) @madsbk
    • Remove checking if an unsigned value is less than zero (#8579) @robertmaynard
    • Remove strings_count parameter from cudf::strings::detail::create_chars_child_column (#8576) @davidwendt
    • Make cudf.api.types imports consistent (#8571) @galipremsagar
    • Modernize libcudf basic example CMakeFile; updates CI build tests (#8568) @isVoid
    • Rename concatenate_tests.cu to .cpp (#8555) @davidwendt
    • enable window lead/lag test on struct (#8548) @wbo4958
    • Add Java methods to split and write column views (#8546) @razajafri
    • Small cleanup (#8534) @codereport
    • Unpin dask version in CI (#8533) @galipremsagar
    • Added optional flag for building Arrow with S3 filesystem support (#8531) @jdye64
    • Minor clean up of various internal column and frame utilities (#8528) @vyasr
    • Rename some copying_test source files .cu to .cpp (#8527) @davidwendt
    • Correct the last warnings and issues when using newer cuda versions (#8525) @robertmaynard
    • Correct unused parameter warnings in transform and unary ops (#8521) @robertmaynard
    • Correct unused parameter warnings in string algorithms (#8509) @robertmaynard
    • Add in JNI APIs for scan, replace_nulls, group_by.scan, and group_by.replace_nulls (#8503) @revans2
    • Fix 21.08 forward-merge conflicts (#8502) @ajschmidt8
    • Fix Cython formatting command in Contributing.md. (#8496) @marlenezw
    • Bug/correct unused parameters in reshape and text (#8495) @robertmaynard
    • Correct unused parameter warnings in partitioning and stream compact (#8494) @robertmaynard
    • Correct unused parameter warnings in labelling and list algorithms (#8493) @robertmaynard
    • Refactor index construction (#8485) @vyasr
    • Correct unused parameter warnings in replace algorithms (#8483) @robertmaynard
    • Correct unused parameter warnings in reduction algorithms (#8481) @robertmaynard
    • Correct unused parameter warnings in io algorithms (#8480) @robertmaynard
    • Correct unused parameter warnings in interop algorithms (#8479) @robertmaynard
    • Correct unused parameter warnings in filling algorithms (#8468) @robertmaynard
    • Correct unused parameter warnings in groupby (#8467) @robertmaynard
    • use libcu++ time_point as timestamp (#8466) @karthikeyann
    • Modify reprog_device::extract to return groups in a single pass (#8460) @davidwendt
    • Update minimum Dask requirement to 2021.6.0 (#8458) @pentschev
    • Fix failures when performing binary operations on DataFrames with empty columns (#8452) @ChrisJar
    • Fix conflicts in 8447 (#8448) @ajschmidt8
    • Add serialization methods for List and StructDtype (#8441) @charlesbluca
    • Replace make_empty_strings_column with make_empty_column (#8435) @davidwendt
    • JNI bindings for get_element (#8433) @revans2
    • Update dask make_meta changes to be compatible with dask upstream (#8426) @galipremsagar
    • Unpin dask version on CI (#8425) @galipremsagar
    • Add benchmark for strings/fixed_point convert APIs (#8417) @davidwendt
    • Adapt cudf::scalar classes to changes in rmm::device_scalar (#8411) @harrism
    • Add benchmark for strings/integers convert APIs (#8402) @davidwendt
    • Enable multi-file partitioning in dask_cudf.read_parquet (#8393) @rjzamora
    • Correct unused parameter warnings in rolling algorithms (#8390) @robertmaynard
    • Correct unused parameters in column round and search (#8389) @robertmaynard
    • Add functionality to apply Dtype metadata to ColumnBase (#8373) @charlesbluca
    • Refactor setting stack size in regex code (#8358) @davidwendt
    • Update Java bindings to 21.08-SNAPSHOT (#8344) @pxLi
    • Replace remaining uses of device_vector (#8343) @harrism
    • Statically link libnvcomp into libcudfjni (#8334) @jlowe
    • Resolve auto merge conflicts for Branch 21.08 from branch 21.06 (#8329) @galipremsagar
    • Minor code refactor for sorted_order (#8326) @wbo4958
    • Remove special Index class from the general index class hierarchy (#8309) @vyasr
    • Add first-class dtype utilities (#8308) @vyasr
    • Add option to link Java bindings with Arrow dynamically (#8307) @jlowe
    • Refactor ColumnMethods and its subclasses to remove column argument and require parent argument (#8306) @shwina
    • Refactor scatter for list columns (#8255) @isVoid
    • Expose pack/unpack API to Python (#8153) @charlesbluca
    • Adding cudf.cut method (#8002) @marlenezw
    • Optimize string gather performance for large strings (#7980) @gaohao95
    • Add peak memory usage tracking to cuIO benchmarks (#7770) @devavret
    • Updating Clang Version to 11.0.0 (#6695) @codereport
  • v21.06.01(Jun 17, 2021)

  • v21.06.00(Jun 9, 2021)

    🚨 Breaking Changes

    • Add support for make_meta_obj dispatch in dask-cudf (#8342) @galipremsagar
    • Add separator-on-null parameter to strings concatenate APIs (#8282) @davidwendt
    • Introduce a common parent class for NumericalColumn and DecimalColumn (#8278) @vyasr
    • Update ORC statistics API to use C++17 standard library (#8241) @vuule
    • Preserve column hierarchy when getting NULL row from LIST column (#8206) @isVoid
    • Groupby.shift c++ API refactor and python binding (#8131) @isVoid

    🐛 Bug Fixes

    • Fix struct flattening to add a validity column only when the input column has null element (#8374) @ttnghia
    • Compilation fix: Remove redefinition for std::is_same_v() (#8369) @mythrocks
    • Add backward compatibility for dask-cudf to work with other versions of dask (#8368) @galipremsagar
    • Handle empty results with nested types in copy_if_else (#8359) @nvdbaranec
    • Handle nested column types properly for empty parquet files. (#8350) @nvdbaranec
    • Raise error when unsupported arguments are passed to dask_cudf.DataFrame.sort_values (#8349) @galipremsagar
    • Raise NotImplementedError for axis=1 in rank (#8347) @galipremsagar
    • Add support for make_meta_obj dispatch in dask-cudf (#8342) @galipremsagar
    • Update Java string concatenate test for single column (#8330) @tgravescs
    • Use empty_like in scatter (#8314) @revans2
    • Fix concatenate_lists_ignore_null on rows of all_nulls (#8312) @sperlingxx
    • Add separator-on-null parameter to strings concatenate APIs (#8282) @davidwendt
    • COLLECT_LIST support returning empty output columns. (#8279) @mythrocks
    • Update io util to convert path like object to string (#8275) @ayushdg
    • Fix result column types for empty inputs to rolling window (#8274) @mythrocks
    • Actually test equality in assert_groupby_results_equal (#8272) @shwina
    • CMake always explicitly specify a source files extension (#8270) @robertmaynard
    • Fix struct binary search and struct flattening (#8268) @ttnghia
    • Revert "patch thrust to fix intmax num elements limitation in scan_by_key" (#8263) @cwharris
    • upgrade dlpack to 0.5 (#8262) @cwharris
    • Fixes CSV-reader type inference for thousands separator and decimal point (#8261) @elstehle
    • Fix incorrect assertion in Java concat (#8258) @sperlingxx
    • Copy nested types upon construction (#8244) @isVoid
    • Preserve column hierarchy when getting NULL row from LIST column (#8206) @isVoid
    • Clip decimal binary op precision at max precision (#8194) @ChrisJar

    📖 Documentation

    • Add docstring for dask_cudf.read_csv (#8355) @galipremsagar
    • Fix cudf release version in readme (#8331) @galipremsagar
    • Fix structs column description in dev docs (#8318) @isVoid
    • Update readme with correct CUDA versions (#8315) @raydouglass
    • Add description of the cuIO GDS integration (#8293) @vuule
    • Remove unused parameter from copy_partition kernel documentation (#8283) @robertmaynard

    🚀 New Features

    • Add support for merging between categorical data (#8332) @galipremsagar
    • Java: Support struct scalar (#8327) @sperlingxx
    • added _is_homogeneous property (#8299) @shaneding
    • Added decimal writing for CSV writer (#8296) @kaatish
    • Java: Support creating a scalar from utf8 string (#8294) @firestarman
    • Add Java API for Concatenate strings with separator (#8289) @tgravescs
    • strings::join_list_elements options for empty list inputs (#8285) @ttnghia
    • Return python lists for getitem calls to list type series (#8265) @brandon-b-miller
    • add unit tests for lead/lag on list for row window (#8259) @wbo4958
    • Create a String column from UTF8 String byte arrays (#8257) @firestarman
    • Support scattering list_scalar (#8256) @isVoid
    • Implement lists::concatenate_list_elements (#8231) @ttnghia
    • Support for struct scalars. (#8220) @nvdbaranec
    • Add support for decimal types in ORC writer (#8198) @vuule
    • Support create lists column from a list_scalar (#8185) @isVoid
    • Groupby.shift c++ API refactor and python binding (#8131) @isVoid
    • Add groupby::replace_nulls(replace_policy) api (#7118) @isVoid

    🛠️ Improvements

    • Support Dask + Distributed 2021.05.1 (#8392) @jakirkham
    • Add aliases for string methods (#8353) @shwina
    • Update environment variable used to determine cuda_version (#8321) @ajschmidt8
    • JNI: Refactor the code of making column from scalar (#8310) @firestarman
    • Update CHANGELOG.md links for calver (#8303) @ajschmidt8
    • Merge branch-0.19 into branch-21.06 (#8302) @ajschmidt8
    • use address and length for GDS reads/writes (#8301) @rongou
    • Update cudfjni version to 21.06.0 (#8292) @pxLi
    • Update docs build script (#8284) @ajschmidt8
    • Make device_buffer streams explicit and enforce move construction (#8280) @harrism
    • Introduce a common parent class for NumericalColumn and DecimalColumn (#8278) @vyasr
    • Do not add nulls to the hash table when null_equality::NOT_EQUAL is passed to left_semi_join and left_anti_join (#8277) @nvdbaranec
    • Enable implicit casting when concatenating mixed types (#8276) @ChrisJar
    • Fix CMake FindPackage rmm, pin dev envs' dlpack to v0.3 (#8271) @trxcllnt
    • Update cudfjni version to 21.06 (#8267) @pxLi
    • support RMM aligned resource adapter in JNI (#8266) @rongou
    • Pass compiler environment variables to conda python build (#8260) @Ethyling
    • Remove abc inheritance from Serializable (#8254) @vyasr
    • Move more methods into SingleColumnFrame (#8253) @vyasr
    • Update ORC statistics API to use C++17 standard library (#8241) @vuule
    • Correct unused parameter warnings in dictionary algorithms (#8239) @robertmaynard
    • Correct unused parameters in the copying algorithms (#8232) @robertmaynard
    • IO statistics cleanup (#8191) @kaatish
    • Refactor of rolling_window implementation. (#8158) @nvdbaranec
    • Add a flag for allowing single quotes in JSON strings. (#8144) @nvdbaranec
    • Column refactoring 2 (#8130) @vyasr
    • support space in workspace (#7956) @jolorunyomi
    • Support collect_set on rolling window (#7881) @sperlingxx
  • v0.19.2(Apr 28, 2021)

    🚨 Breaking Changes

    • Allow hash_partition to take a seed value (#7771) @magnatelee
    • Allow merging index column with data column using keyword "on" (#7736) @skirui-source
    • Change JNI API to avoid loading native dependencies when creating sort order classes. (#7729) @revans2
    • Replace device_vector with device_uvector in null_mask (#7715) @harrism
    • Don't identify decimals as strings. (#7710) @vyasr
    • Fix Java Parquet write after writer API changes (#7655) @revans2
    • Convert cudf::concatenate APIs to use spans and device_uvector (#7621) @harrism
    • Update missing docstring examples in python public APIs (#7546) @galipremsagar
    • Remove unneeded step parameter from strings::detail::copy_slice (#7525) @davidwendt
    • Rename ARROW_STATIC_LIB because it conflicts with one in FindArrow.cmake (#7518) @trxcllnt
    • Match Pandas logic for comparing two objects with nulls (#7490) @brandon-b-miller
    • Add struct support to parquet writer (#7461) @devavret
    • Join APIs that return gathermaps (#7454) @shwina
    • fixed_point + cudf::binary_operation API Changes (#7435) @codereport
    • Fix BUG: Exception when PYTHONOPTIMIZE=2 (#7434) @skirui-source
    • Change nvtext::load_vocabulary_file to return a unique ptr (#7424) @davidwendt
    • Refactor strings column factories (#7397) @harrism
    • Use CMAKE_CUDA_ARCHITECTURES (#7391) @robertmaynard
    • Upgrade pandas to 1.2 (#7375) @galipremsagar
    • Rename logical_cast to bit_cast and allow additional conversions (#7373) @ttnghia
    • Rework libcudf CMakeLists.txt to export targets for CPM (#7107) @trxcllnt

    🐛 Bug Fixes

    • unsnap: busy wait a number of cycles (#8073) @vuule
    • Fix returned column type when extracting from an empty list column (#8031) @jlowe
    • Don't reindex a new value on setitem if the original dataframe was empty (#8026) @vyasr
    • Fix a NameError in meta dispatch API (#7996) @galipremsagar
    • Reindex in DataFrame.__setitem__ (#7957) @galipremsagar
    • jitify direct-to-cubin compilation and caching. (#7919) @cwharris
    • Use dynamic cudart for nvcomp in java build (#7896) @abellina
    • fix "incompatible redefinition" warnings (#7894) @cwharris
    • cudf consistently specifies the cuda runtime (#7887) @robertmaynard
    • disable verbose output for jitify_preprocess (#7886) @cwharris
    • CMake jit_preprocess_files function only runs when needed (#7872) @robertmaynard
    • Push DeviceScalar construction into cython for list.contains (#7864) @brandon-b-miller
    • cudf now sets an install rpath of $ORIGIN (#7863) @robertmaynard
    • Don't install Thrust examples, tests, docs, and python files (#7811) @robertmaynard
    • Sort by index in groupby tests more consistently (#7802) @shwina
    • Revert "Update conda recipes pinning of repo dependencies (#7743)" (#7793) @raydouglass
    • Add decimal column handling in copy_type_metadata (#7788) @shwina
    • Add column names validation in parquet writer (#7786) @galipremsagar
    • Fix Java explode outer unit tests (#7782) @jlowe
    • Fix compiler warning about non-POD types passed through ellipsis (#7781) @jrhemstad
    • User resource fix for replace_nulls (#7769) @magnatelee
    • Fix type dispatch for columnar replace_nulls (#7768) @jlowe
    • Add ignore_order parameter to dask-cudf concat dispatch (#7765) @galipremsagar
    • Fix slicing and arrow representations of decimal columns (#7755) @vyasr
    • Fixing issue with explode_outer position not nulling position entries of null rows (#7754) @hyperbolic2346
    • Implement scatter for struct columns (#7752) @ttnghia
    • Fix data corruption in string columns (#7746) @galipremsagar
    • Fix string length in stripe dictionary building (#7744) @kaatish
    • Update conda recipes pinning of repo dependencies (#7743) @mike-wendt
    • Enable dask dispatch to cuDF's is_categorical_dtype for cuDF objects (#7740) @brandon-b-miller
    • Fix dictionary size computation in ORC writer (#7737) @vuule
    • Fix cudf::cast overflow for decimal64 to int32_t or smaller in certain cases (#7733) @codereport
    • Change JNI API to avoid loading native dependencies when creating sort order classes. (#7729) @revans2
    • Disable column_view data accessors for unsupported types (#7725) @jrhemstad
    • Materialize RangeIndex when index=True in parquet writer (#7711) @galipremsagar
    • Don't identify decimals as strings. (#7710) @vyasr
    • Fix return type of DataFrame.argsort (#7706) @galipremsagar
    • Fix/correct cudf installed package requirements (#7688) @robertmaynard
    • Fix SparkMurmurHash3_32 hash inconsistencies with Apache Spark (#7672) @jlowe
    • Fix ORC reader issue with reading empty string columns (#7656) @rgsl888prabhu
    • Fix Java Parquet write after writer API changes (#7655) @revans2
    • Fixing empty null lists throwing explode_outer for a loop. (#7649) @hyperbolic2346
    • Fix internal compiler error during JNI Docker build (#7645) @jlowe
    • Fix Debug build break with device_uvectors in grouped_rolling.cu (#7633) @mythrocks
    • Parquet reader: Fix issue when using skip_rows on non-nested columns containing nulls (#7627) @nvdbaranec
    • Fix ORC reader for empty DataFrame/Table (#7624) @rgsl888prabhu
    • Fix specifying GPU architecture in JNI build (#7612) @jlowe
    • Fix ORC writer OOM issue (#7605) @vuule
    • Fix 0.18 --> 0.19 automerge (#7589) @kkraus14
    • Fix ORC issue with incorrect timestamp nanosecond values (#7581) @vuule
    • Fix missing Dask imports (#7580) @kkraus14
    • CMAKE_CUDA_ARCHITECTURES doesn't change when build-system invokes cmake (#7579) @robertmaynard
    • Another fix for offsets_end() iterator in lists_column_view (#7575) @ttnghia
    • Fix ORC writer output corruption with string columns (#7565) @vuule
    • Fix cudf::lists::sort_lists failing for sliced column (#7564) @ttnghia
    • FIX Fix Anaconda upload args (#7558) @dillon-cullinan
    • Fix index mismatch issue in equality related APIs (#7555) @galipremsagar
    • FIX Revert gpuci_conda_retry on conda file output locations (#7552) @dillon-cullinan
    • Fix offset_end iterator for lists_column_view, which was not correctl… (#7551) @ttnghia
    • Fix no such file dlpack.h error when build libcudf (#7549) @chenrui17
    • Update missing docstring examples in python public APIs (#7546) @galipremsagar
    • Decimal32 Build Fix (#7544) @razajafri
    • FIX Retry conda output location (#7540) @dillon-cullinan
    • fix missing renames of dask git branches from master to main (#7535) @kkraus14
    • Remove detail from device_span (#7533) @rwlee
    • Change dask and distributed branch to main (#7532) @dantegd
    • Update JNI build to use CUDF_USE_ARROW_STATIC (#7526) @jlowe
    • Make sure rmm::rmm CMake target is visible to cudf users (#7524) @robertmaynard
    • Fix contiguous_split not properly handling output partitions > 2 GB. (#7515) @nvdbaranec
    • Change jit launch to safe_launch (#7510) @devavret
    • Fix comparison between Datetime/Timedelta columns and NULL scalars (#7504) @brandon-b-miller
    • Fix off-by-one error in char-parallel string scalar replace (#7502) @jlowe
    • Fix JNI deprecation of all, put it on the wrong version before (#7501) @revans2
    • Fix Series/Dataframe Mixed Arithmetic (#7491) @brandon-b-miller
    • Fix JNI build after removal of libcudf sub-libraries (#7486) @jlowe
    • Correctly compile benchmarks (#7485) @robertmaynard
    • Fix bool column corruption with ORC Reader (#7483) @rgsl888prabhu
    • Fix __repr__ for categorical dtype (#7476) @galipremsagar
    • Java cleaner synchronization (#7474) @abellina
    • Fix java float/double parsing tests (#7473) @revans2
    • Pass stream and user resource to make_default_constructed_scalar (#7469) @magnatelee
    • Improve stability of dask_cudf.DataFrame.var and dask_cudf.DataFrame.std (#7453) @rjzamora
    • Missing device_storage_dispatch change affecting cudf::gather (#7449) @codereport
    • fix cuFile JNI compile errors (#7445) @rongou
    • Support Series.__setitem__ with key to a new row (#7443) @isVoid
    • Fix BUG: Exception when PYTHONOPTIMIZE=2 (#7434) @skirui-source
    • Make inclusive scan safe for cases with leading nulls (#7432) @magnatelee
    • Fix typo in list_device_view::pair_rep_end() (#7423) @mythrocks
    • Fix string to double conversion and row equivalent comparison (#7410) @ttnghia
    • Fix thrust failure when transferring data from device_vector to host_vector with vectors of size 1 (#7382) @ttnghia
    • Fix std::exception catch-by-reference gcc9 compile error (#7380) @davidwendt
    • Fix skiprows issue with ORC Reader (#7359) @rgsl888prabhu
    • fix Arrow CMake file (#7358) @rongou
    • Fix lists::contains() for NaN and Decimals (#7349) @mythrocks
    • Handle cupy array in Dataframe.__setitem__ (#7340) @galipremsagar
    • Fix invalid-device-fn error in cudf::strings::replace_re with multiple regex's (#7336) @davidwendt
    • FIX Add codecov upload block to gpu script (#6860) @dillon-cullinan

    📖 Documentation

    • Fix join API doxygen (#7890) @shwina
    • Add Resources to README. (#7697) @bdice
    • Add isin examples in Docstring (#7479) @galipremsagar
    • Resolving unlinked type shorthands in cudf doc (#7416) @isVoid
    • Fix typo in regex.md doc page (#7363) @davidwendt
    • Fix incorrect strings_column_view::chars_size documentation (#7360) @jlowe

    🚀 New Features

    • Enable basic reductions for decimal columns (#7776) @ChrisJar
    • Enable join on decimal columns (#7764) @ChrisJar
    • Allow merging index column with data column using keyword "on" (#7736) @skirui-source
    • Implement DecimalColumn + Scalar and add cudf.Scalars of Decimal64Dtype (#7732) @brandon-b-miller
    • Add support for unique groupby aggregation (#7726) @shwina
    • Expose libcudf's label_bins function to cudf (#7724) @vyasr
    • Adding support for equi-join on struct (#7720) @hyperbolic2346
    • Add decimal column comparison operations (#7716) @isVoid
    • Implement scan operations for decimal columns (#7707) @ChrisJar
    • Enable typecasting between decimal and int (#7691) @ChrisJar
    • Enable decimal support in parquet writer (#7673) @devavret
    • Adds list.unique API (#7664) @isVoid
    • Fix NaN handling in drop_list_duplicates (#7662) @ttnghia
    • Add lists.sort_values API (#7657) @isVoid
    • Add is_integer API that can check for the validity of a string-to-integer conversion (#7642) @ttnghia
    • Adds explode API (#7607) @isVoid
    • Adds list.take, python binding for cudf::lists::segmented_gather (#7591) @isVoid
    • Implement cudf::label_bins() (#7554) @vyasr
    • Add Python bindings for lists::contains (#7547) @skirui-source
    • cudf::row_bit_count() support. (#7534) @nvdbaranec
    • Implement drop_list_duplicates (#7528) @ttnghia
    • Add Python bindings for lists::extract_lists_element (#7505) @skirui-source
    • Add explode_outer and explode_outer_position (#7499) @hyperbolic2346
    • Match Pandas logic for comparing two objects with nulls (#7490) @brandon-b-miller
    • Add struct support to parquet writer (#7461) @devavret
    • Enable type conversion from float to decimal type (#7450) @ChrisJar
    • Add cython for converting strings/fixed-point functions (#7429) @davidwendt
    • Add struct column support to cudf::sort and cudf::sorted_order (#7422) @karthikeyann
    • Implement groupby collect_set (#7420) @ttnghia
    • Merge branch-0.18 into branch-0.19 (#7411) @raydouglass
    • Refactor strings column factories (#7397) @harrism
    • Add groupby scan operations (sort groupby) (#7387) @karthikeyann
    • Add cudf::explode_position (#7376) @hyperbolic2346
    • Add string conversion to/from decimal values libcudf APIs (#7364) @davidwendt
    • Add groupby SUM_OF_SQUARES support (#7362) @karthikeyann
    • Add Series.drop api (#7304) @isVoid
    • get_json_object() implementation (#7286) @nvdbaranec
    • Python API for ListMethods.len() (#7283) @isVoid
    • Support null_policy::EXCLUDE for COLLECT rolling aggregation (#7264) @mythrocks
    • Add support for special tokens in nvtext::subword_tokenizer (#7254) @davidwendt
    • Fix inplace update of data and add Series.update (#7201) @galipremsagar
    • Implement cudf::group_by (hash) for decimal32 and decimal64 (#7190) @codereport
    • Adding support to specify "level" parameter for Dataframe.rename (#7135) @skirui-source

    🛠️ Improvements

    • fix GDS include path for version 0.95 (#7877) @rongou
    • Update dask + distributed to 2021.4.0 (#7858) @jakirkham
    • Add ability to extract include dirs from CUDF_HOME (#7848) @galipremsagar
    • Add USE_GDS as an option in build script (#7833) @pxLi
    • add an allocate method with stream in java DeviceMemoryBuffer (#7826) @rongou
    • Constrain dask and distributed versions to 2021.3.1 (#7825) @shwina
    • Revert dask versioning of concat dispatch (#7823) @galipremsagar
    • add copy methods in Java memory buffer (#7791) @rongou
    • Update README and CONTRIBUTING for 0.19 (#7778) @robertmaynard
    • Allow hash_partition to take a seed value (#7771) @magnatelee
    • Turn on NVTX by default in java build (#7761) @tgravescs
    • Add Java bindings to join gather map APIs (#7751) @jlowe
    • Add replacements column support for Java replaceNulls (#7750) @jlowe
    • Add Java bindings for row_bit_count (#7749) @jlowe
    • Remove unused JVM array creation (#7748) @jlowe
    • Added JNI support for new is_integer (#7739) @revans2
    • Create and promote library aliases in libcudf installations (#7734) @trxcllnt
    • Support groupby operations for decimal dtypes (#7731) @vyasr
    • Memory map the input file only when GDS compatibility mode is not used (#7717) @vuule
    • Replace device_vector with device_uvector in null_mask (#7715) @harrism
    • Struct hashing support for SerialMurmur3 and SparkMurmur3 (#7714) @jlowe
    • Add gbenchmark for nvtext replace-tokens function (#7708) @davidwendt
    • Use stream in groupby calls (#7705) @karthikeyann
    • Update codeowners file (#7701) @ajschmidt8
    • Cleanup groupby to use host_span, device_span, device_uvector (#7698) @karthikeyann
    • Add gbenchmark for nvtext ngrams functions (#7693) @davidwendt
    • Misc Python/Cython optimizations (#7686) @shwina
    • Add gbenchmark for nvtext tokenize functions (#7684) @davidwendt
    • Add column_device_view to orc writer (#7676) @kaatish
    • cudf_kafka now uses cuDF CMake export targets (CPM) (#7674) @robertmaynard
    • Add gbenchmark for nvtext normalize functions (#7668) @davidwendt
    • Resolve unnecessary import of thrust/optional.hpp in types.hpp (#7667) @vyasr
    • Feature/optimize accessor copy (#7660) @vyasr
    • Fix find_package(cudf) (#7658) @trxcllnt
    • Work-around for gcc7 compile error on Centos7 (#7652) @davidwendt
    • Add in JNI support for count_elements (#7651) @revans2
    • Fix issues with building cudf in a non-conda environment (#7647) @galipremsagar
    • Refactor ConfigureCUDA to not conditionally insert compiler flags (#7643) @robertmaynard
    • Add gbenchmark for converting strings to/from timestamps (#7641) @davidwendt
    • Handle constructing a cudf.Scalar from a cudf.Scalar (#7639) @shwina
    • Add in JNI support for table partition (#7637) @revans2
    • Add explicit fixed_point merge test (#7635) @codereport
    • Add JNI support for IDENTITY hash partitioning (#7626) @revans2
    • Java support on explode_outer (#7625) @sperlingxx
    • Java support of casting string from/to decimal (#7623) @sperlingxx
    • Convert cudf::concatenate APIs to use spans and device_uvector (#7621) @harrism
    • Add gbenchmark for cudf::strings::translate function (#7617) @davidwendt
    • Use file(COPY ) over file(INSTALL ) so cmake output is reduced (#7616) @robertmaynard
    • Use rmm::device_uvector in place of rmm::device_vector for ORC reader/writer and cudf::io::column_buffer (#7614) @vuule
    • Refactor Java host-side buffer concatenation to expose separate steps (#7610) @jlowe
    • Add gbenchmarks for string substrings functions (#7603) @davidwendt
    • Refactor string conversion check (#7599) @ttnghia
    • JNI: Pass names of children struct columns to native Arrow IPC writer (#7598) @firestarman
    • Revert "ENH Fix stale GHA and prevent duplicates " (#7595) @mike-wendt
    • ENH Fix stale GHA and prevent duplicates (#7594) @mike-wendt
    • Fix auto-detecting GPU architectures (#7593) @trxcllnt
    • Reduce cudf library size (#7583) @robertmaynard
    • Optimize cudf::make_strings_column for long strings (#7576) @davidwendt
    • Always build and export the cudf::cudftestutil target (#7574) @trxcllnt
    • Eliminate literal parameters to uvector::set_element_async and device_scalar::set_value (#7563) @harrism
    • Add gbenchmark for strings::concatenate (#7560) @davidwendt
    • Update Changelog Link (#7550) @ajschmidt8
    • Add gbenchmarks for strings replace regex functions (#7541) @davidwendt
    • Add __repr__ for Column and ColumnAccessor (#7531) @shwina
    • Support Decimal DIV changes in cudf (#7527) @razajafri
    • Remove unneeded step parameter from strings::detail::copy_slice (#7525) @davidwendt
    • Use device_uvector, device_span in sort groupby (#7523) @karthikeyann
    • Add gbenchmarks for strings extract function (#7522) @davidwendt
    • Rename ARROW_STATIC_LIB because it conflicts with one in FindArrow.cmake (#7518) @trxcllnt
    • Reduce compile time/size for scan.cu (#7516) @davidwendt
    • Change device_vector to device_uvector in nvtext source files (#7512) @davidwendt
    • Removed unneeded includes from traits.hpp (#7509) @davidwendt
    • FIX Remove random build directory generation for ccache (#7508) @dillon-cullinan
    • xfail failing pytest in pandas 1.2.3 (#7507) @galipremsagar
    • JNI bit cast (#7493) @revans2
    • Combine rolling window function tests (#7480) @mythrocks
    • Prepare Changelog for Automation (#7477) @ajschmidt8
    • Java support for explode position (#7471) @sperlingxx
    • Update 0.18 changelog entry (#7463) @ajschmidt8
    • JNI: Support skipping nulls for collect aggregation (#7457) @firestarman
    • Join APIs that return gathermaps (#7454) @shwina
    • Remove dependence on managed memory for multimap test (#7451) @jrhemstad
    • Use cuFile for Parquet IO when available (#7444) @vuule
    • Statistics cleanup (#7439) @kaatish
    • Add gbenchmarks for strings filter functions (#7438) @davidwendt
    • fixed_point + cudf::binary_operation API Changes (#7435) @codereport
    • Improve string gather performance (#7433) @jlowe
    • Don't use user resource for a temporary allocation in sort_by_key (#7431) @magnatelee
    • Detail APIs for datetime functions (#7430) @magnatelee
    • Replace thrust::max_element with thrust::reduce in strings findall_re (#7428) @davidwendt
    • Add gbenchmark for strings split/split_record functions (#7427) @davidwendt
    • Update JNI build to use CMAKE_CUDA_ARCHITECTURES (#7425) @jlowe
    • Change nvtext::load_vocabulary_file to return a unique ptr (#7424) @davidwendt
    • Simplify type dispatch with device_storage_dispatch (#7419) @codereport
    • Java support for casting of nested child columns (#7417) @razajafri
    • Improve scalar string replace performance for long strings (#7415) @jlowe
    • Remove unneeded temporary device vector for strings scatter specialization (#7409) @davidwendt
    • bitmask_or implementation with bitmask refactor (#7406) @rwlee
    • Add other cudf::strings::replace functions to current strings replace gbenchmark (#7403) @davidwendt
    • Clean up included headers in device_operators.cuh (#7401) @codereport
    • Move nullable index iterator to indexalator factory (#7399) @davidwendt
    • ENH Pass ccache variables to conda recipe & use Ninja in CI (#7398) @Ethyling
    • upgrade maven-antrun-plugin to support maven parallel builds (#7393) @rongou
    • Add gbenchmark for strings find/contains functions (#7392) @davidwendt
    • Use CMAKE_CUDA_ARCHITECTURES (#7391) @robertmaynard
    • Refactor libcudf strings::replace to use make_strings_children utility (#7384) @davidwendt
    • Added in JNI support for out of core sort algorithm (#7381) @revans2
    • Upgrade pandas to 1.2 (#7375) @galipremsagar
    • Rename logical_cast to bit_cast and allow additional conversions (#7373) @ttnghia
    • jitify 2 support (#7372) @cwharris
    • compile_udf: Cache PTX for similar functions (#7371) @gmarkall
    • Add string scalar replace benchmark (#7369) @jlowe
    • Add gbenchmark for strings contains_re/count_re functions (#7366) @davidwendt
    • Update orc reader and writer fuzz tests (#7357) @galipremsagar
    • Improve url_decode performance for long strings (#7353) @jlowe
    • cudf::ast Small Refactorings (#7352) @codereport
    • Remove std::cout and print in the scatter test function EmptyListsOfNullableStrings. (#7342) @ttnghia
    • Use cudf::detail::make_counting_transform_iterator (#7338) @codereport
    • Change block size parameter from a global to a template param. (#7333) @nvdbaranec
    • Partial clean up of ORC writer (#7324) @vuule
    • Add gbenchmark for cudf::strings::to_lower (#7316) @davidwendt
    • Update Java bindings version to 0.19-SNAPSHOT (#7307) @pxLi
    • Move cudf::test::make_counting_transform_iterator to cudf/detail/iterator.cuh (#7306) @codereport
    • Use string literals in fixed_point release_asserts (#7303) @codereport
    • Fix merge conflicts for #7295 (#7297) @ajschmidt8
    • Add UTF-8 chars to create_random_column<string_view> benchmark utility (#7292) @davidwendt
    • Abstracting block reduce and block scan from cuIO kernels with cub apis (#7278) @rgsl888prabhu
    • Build.sh use cmake --build to drive build system invocation (#7270) @robertmaynard
    • Refactor dictionary support for reductions any/all (#7242) @davidwendt
    • Replace stream.value() with stream for stream_view args (#7236) @karthikeyann
    • Interval index and interval_range (#7182) @marlenezw
    • avro reader integration tests (#7156) @cwharris
    • Rework libcudf CMakeLists.txt to export targets for CPM (#7107) @trxcllnt
    • Adding Interval Dtype (#6984) @marlenezw
    • Cleaning up for loops with make_(counting_)transform_iterator (#6546) @codereport
  • v0.19.1 (Apr 22, 2021)

    🚨 Breaking Changes

    • Allow hash_partition to take a seed value (#7771) @magnatelee
    • Allow merging index column with data column using keyword "on" (#7736) @skirui-source
    • Change JNI API to avoid loading native dependencies when creating sort order classes. (#7729) @revans2
    • Replace device_vector with device_uvector in null_mask (#7715) @harrism
    • Don't identify decimals as strings. (#7710) @vyasr
    • Fix Java Parquet write after writer API changes (#7655) @revans2
    • Convert cudf::concatenate APIs to use spans and device_uvector (#7621) @harrism
    • Update missing docstring examples in python public APIs (#7546) @galipremsagar
    • Remove unneeded step parameter from strings::detail::copy_slice (#7525) @davidwendt
    • Rename ARROW_STATIC_LIB because it conflicts with one in FindArrow.cmake (#7518) @trxcllnt
    • Match Pandas logic for comparing two objects with nulls (#7490) @brandon-b-miller
    • Add struct support to parquet writer (#7461) @devavret
    • Join APIs that return gathermaps (#7454) @shwina
    • fixed_point + cudf::binary_operation API Changes (#7435) @codereport
    • Fix BUG: Exception when PYTHONOPTIMIZE=2 (#7434) @skirui-source
    • Change nvtext::load_vocabulary_file to return a unique ptr (#7424) @davidwendt
    • Refactor strings column factories (#7397) @harrism
    • Use CMAKE_CUDA_ARCHITECTURES (#7391) @robertmaynard
    • Upgrade pandas to 1.2 (#7375) @galipremsagar
    • Rename logical_cast to bit_cast and allow additional conversions (#7373) @ttnghia
    • Rework libcudf CMakeLists.txt to export targets for CPM (#7107) @trxcllnt

    🐛 Bug Fixes

    • Fix returned column type when extracting from an empty list column (#8031) @jlowe
    • Don't reindex a new value on setitem if the original dataframe was empty (#8026) @vyasr
    • Fix a NameError in meta dispatch API (#7996) @galipremsagar
    • Reindex in DataFrame.__setitem__ (#7957) @galipremsagar
    • jitify direct-to-cubin compilation and caching. (#7919) @cwharris
    • Use dynamic cudart for nvcomp in java build (#7896) @abellina
    • fix "incompatible redefinition" warnings (#7894) @cwharris
    • cudf consistently specifies the cuda runtime (#7887) @robertmaynard
    • disable verbose output for jitify_preprocess (#7886) @cwharris
    • CMake jit_preprocess_files function only runs when needed (#7872) @robertmaynard
    • Push DeviceScalar construction into cython for list.contains (#7864) @brandon-b-miller
    • cudf now sets an install rpath of $ORIGIN (#7863) @robertmaynard
    • Don't install Thrust examples, tests, docs, and python files (#7811) @robertmaynard
    • Sort by index in groupby tests more consistently (#7802) @shwina
    • Revert "Update conda recipes pinning of repo dependencies (#7743)" (#7793) @raydouglass
    • Add decimal column handling in copy_type_metadata (#7788) @shwina
    • Add column names validation in parquet writer (#7786) @galipremsagar
    • Fix Java explode outer unit tests (#7782) @jlowe
    • Fix compiler warning about non-POD types passed through ellipsis (#7781) @jrhemstad
    • User resource fix for replace_nulls (#7769) @magnatelee
    • Fix type dispatch for columnar replace_nulls (#7768) @jlowe
    • Add ignore_order parameter to dask-cudf concat dispatch (#7765) @galipremsagar
    • Fix slicing and arrow representations of decimal columns (#7755) @vyasr
    • Fixing issue with explode_outer position not nulling position entries of null rows (#7754) @hyperbolic2346
    • Implement scatter for struct columns (#7752) @ttnghia
    • Fix data corruption in string columns (#7746) @galipremsagar
    • Fix string length in stripe dictionary building (#7744) @kaatish
    • Update conda recipes pinning of repo dependencies (#7743) @mike-wendt
    • Enable dask dispatch to cuDF's is_categorical_dtype for cuDF objects (#7740) @brandon-b-miller
    • Fix dictionary size computation in ORC writer (#7737) @vuule
    • Fix cudf::cast overflow for decimal64 to int32_t or smaller in certain cases (#7733) @codereport
    • Change JNI API to avoid loading native dependencies when creating sort order classes. (#7729) @revans2
    • Disable column_view data accessors for unsupported types (#7725) @jrhemstad
    • Materialize RangeIndex when index=True in parquet writer (#7711) @galipremsagar
    • Don't identify decimals as strings. (#7710) @vyasr
    • Fix return type of DataFrame.argsort (#7706) @galipremsagar
    • Fix/correct cudf installed package requirements (#7688) @robertmaynard
    • Fix SparkMurmurHash3_32 hash inconsistencies with Apache Spark (#7672) @jlowe
    • Fix ORC reader issue with reading empty string columns (#7656) @rgsl888prabhu
    • Fix Java Parquet write after writer API changes (#7655) @revans2
    • Fixing empty null lists throwing explode_outer for a loop. (#7649) @hyperbolic2346
    • Fix internal compiler error during JNI Docker build (#7645) @jlowe
    • Fix Debug build break with device_uvectors in grouped_rolling.cu (#7633) @mythrocks
    • Parquet reader: Fix issue when using skip_rows on non-nested columns containing nulls (#7627) @nvdbaranec
    • Fix ORC reader for empty DataFrame/Table (#7624) @rgsl888prabhu
    • Fix specifying GPU architecture in JNI build (#7612) @jlowe
    • Fix ORC writer OOM issue (#7605) @vuule
    • Fix 0.18 --> 0.19 automerge (#7589) @kkraus14
    • Fix ORC issue with incorrect timestamp nanosecond values (#7581) @vuule
    • Fix missing Dask imports (#7580) @kkraus14
    • CMAKE_CUDA_ARCHITECTURES doesn't change when build-system invokes cmake (#7579) @robertmaynard
    • Another fix for offsets_end() iterator in lists_column_view (#7575) @ttnghia
    • Fix ORC writer output corruption with string columns (#7565) @vuule
    • Fix cudf::lists::sort_lists failing for sliced column (#7564) @ttnghia
    • FIX Fix Anaconda upload args (#7558) @dillon-cullinan
    • Fix index mismatch issue in equality related APIs (#7555) @galipremsagar
    • FIX Revert gpuci_conda_retry on conda file output locations (#7552) @dillon-cullinan
    • Fix offset_end iterator for lists_column_view, which was not correctl… (#7551) @ttnghia
    • Fix no such file dlpack.h error when build libcudf (#7549) @chenrui17
    • Update missing docstring examples in python public APIs (#7546) @galipremsagar
    • Decimal32 Build Fix (#7544) @razajafri
    • FIX Retry conda output location (#7540) @dillon-cullinan
    • fix missing renames of dask git branches from master to main (#7535) @kkraus14
    • Remove detail from device_span (#7533) @rwlee
    • Change dask and distributed branch to main (#7532) @dantegd
    • Update JNI build to use CUDF_USE_ARROW_STATIC (#7526) @jlowe
    • Make sure rmm::rmm CMake target is visible to cudf users (#7524) @robertmaynard
    • Fix contiguous_split not properly handling output partitions > 2 GB. (#7515) @nvdbaranec
    • Change jit launch to safe_launch (#7510) @devavret
    • Fix comparison between Datetime/Timedelta columns and NULL scalars (#7504) @brandon-b-miller
    • Fix off-by-one error in char-parallel string scalar replace (#7502) @jlowe
    • Fix JNI deprecation of all, put it on the wrong version before (#7501) @revans2
    • Fix Series/Dataframe Mixed Arithmetic (#7491) @brandon-b-miller
    • Fix JNI build after removal of libcudf sub-libraries (#7486) @jlowe
    • Correctly compile benchmarks (#7485) @robertmaynard
    • Fix bool column corruption with ORC Reader (#7483) @rgsl888prabhu
    • Fix __repr__ for categorical dtype (#7476) @galipremsagar
    • Java cleaner synchronization (#7474) @abellina
    • Fix java float/double parsing tests (#7473) @revans2
    • Pass stream and user resource to make_default_constructed_scalar (#7469) @magnatelee
    • Improve stability of dask_cudf.DataFrame.var and dask_cudf.DataFrame.std (#7453) @rjzamora
    • Missing device_storage_dispatch change affecting cudf::gather (#7449) @codereport
    • fix cuFile JNI compile errors (#7445) @rongou
    • Support Series.__setitem__ with key to a new row (#7443) @isVoid
    • Fix BUG: Exception when PYTHONOPTIMIZE=2 (#7434) @skirui-source
    • Make inclusive scan safe for cases with leading nulls (#7432) @magnatelee
    • Fix typo in list_device_view::pair_rep_end() (#7423) @mythrocks
    • Fix string to double conversion and row equivalent comparison (#7410) @ttnghia
    • Fix thrust failure when transferring data from device_vector to host_vector with vectors of size 1 (#7382) @ttnghia
    • Fix std::exception catch-by-reference gcc9 compile error (#7380) @davidwendt
    • Fix skiprows issue with ORC Reader (#7359) @rgsl888prabhu
    • fix Arrow CMake file (#7358) @rongou
    • Fix lists::contains() for NaN and Decimals (#7349) @mythrocks
    • Handle cupy array in DataFrame.__setitem__ (#7340) @galipremsagar
    • Fix invalid-device-fn error in cudf::strings::replace_re with multiple regex's (#7336) @davidwendt
    • FIX Add codecov upload block to gpu script (#6860) @dillon-cullinan

    📖 Documentation

    • Fix join API doxygen (#7890) @shwina
    • Add Resources to README. (#7697) @bdice
    • Add isin examples in Docstring (#7479) @galipremsagar
    • Resolving unlinked type shorthands in cudf doc (#7416) @isVoid
    • Fix typo in regex.md doc page (#7363) @davidwendt
    • Fix incorrect strings_column_view::chars_size documentation (#7360) @jlowe

    🚀 New Features

    • Enable basic reductions for decimal columns (#7776) @ChrisJar
    • Enable join on decimal columns (#7764) @ChrisJar
    • Allow merging index column with data column using keyword "on" (#7736) @skirui-source
    • Implement DecimalColumn + Scalar and add cudf.Scalars of Decimal64Dtype (#7732) @brandon-b-miller
    • Add support for unique groupby aggregation (#7726) @shwina
    • Expose libcudf's label_bins function to cudf (#7724) @vyasr
    • Adding support for equi-join on struct (#7720) @hyperbolic2346
    • Add decimal column comparison operations (#7716) @isVoid
    • Implement scan operations for decimal columns (#7707) @ChrisJar
    • Enable typecasting between decimal and int (#7691) @ChrisJar
    • Enable decimal support in parquet writer (#7673) @devavret
    • Adds list.unique API (#7664) @isVoid
    • Fix NaN handling in drop_list_duplicates (#7662) @ttnghia
    • Add lists.sort_values API (#7657) @isVoid
    • Add is_integer API that can check for the validity of a string-to-integer conversion (#7642) @ttnghia
    • Adds explode API (#7607) @isVoid
    • Adds list.take, python binding for cudf::lists::segmented_gather (#7591) @isVoid
    • Implement cudf::label_bins() (#7554) @vyasr
    • Add Python bindings for lists::contains (#7547) @skirui-source
    • cudf::row_bit_count() support. (#7534) @nvdbaranec
    • Implement drop_list_duplicates (#7528) @ttnghia
    • Add Python bindings for lists::extract_lists_element (#7505) @skirui-source
    • Add explode_outer and explode_outer_position (#7499) @hyperbolic2346
    • Match Pandas logic for comparing two objects with nulls (#7490) @brandon-b-miller
    • Add struct support to parquet writer (#7461) @devavret
    • Enable type conversion from float to decimal type (#7450) @ChrisJar
    • Add cython for converting strings/fixed-point functions (#7429) @davidwendt
    • Add struct column support to cudf::sort and cudf::sorted_order (#7422) @karthikeyann
    • Implement groupby collect_set (#7420) @ttnghia
    • Merge branch-0.18 into branch-0.19 (#7411) @raydouglass
    • Refactor strings column factories (#7397) @harrism
    • Add groupby scan operations (sort groupby) (#7387) @karthikeyann
    • Add cudf::explode_position (#7376) @hyperbolic2346
    • Add string conversion to/from decimal values libcudf APIs (#7364) @davidwendt
    • Add groupby SUM_OF_SQUARES support (#7362) @karthikeyann
    • Add Series.drop api (#7304) @isVoid
    • get_json_object() implementation (#7286) @nvdbaranec
    • Python API for ListMethods.len() (#7283) @isVoid
    • Support null_policy::EXCLUDE for COLLECT rolling aggregation (#7264) @mythrocks
    • Add support for special tokens in nvtext::subword_tokenizer (#7254) @davidwendt
    • Fix inplace update of data and add Series.update (#7201) @galipremsagar
    • Implement cudf::group_by (hash) for decimal32 and decimal64 (#7190) @codereport
    • Adding support to specify "level" parameter for DataFrame.rename (#7135) @skirui-source

    🛠️ Improvements

    • fix GDS include path for version 0.95 (#7877) @rongou
    • Update dask + distributed to 2021.4.0 (#7858) @jakirkham
    • Add ability to extract include dirs from CUDF_HOME (#7848) @galipremsagar
    • Add USE_GDS as an option in build script (#7833) @pxLi
    • add an allocate method with stream in java DeviceMemoryBuffer (#7826) @rongou
    • Constrain dask and distributed versions to 2021.3.1 (#7825) @shwina
    • Revert dask versioning of concat dispatch (#7823) @galipremsagar
    • add copy methods in Java memory buffer (#7791) @rongou
    • Update README and CONTRIBUTING for 0.19 (#7778) @robertmaynard
    • Allow hash_partition to take a seed value (#7771) @magnatelee
    • Turn on NVTX by default in java build (#7761) @tgravescs
    • Add Java bindings to join gather map APIs (#7751) @jlowe
    • Add replacements column support for Java replaceNulls (#7750) @jlowe
    • Add Java bindings for row_bit_count (#7749) @jlowe
    • Remove unused JVM array creation (#7748) @jlowe
    • Added JNI support for new is_integer (#7739) @revans2
    • Create and promote library aliases in libcudf installations (#7734) @trxcllnt
    • Support groupby operations for decimal dtypes (#7731) @vyasr
    • Memory map the input file only when GDS compatibility mode is not used (#7717) @vuule
    • Replace device_vector with device_uvector in null_mask (#7715) @harrism
    • Struct hashing support for SerialMurmur3 and SparkMurmur3 (#7714) @jlowe
    • Add gbenchmark for nvtext replace-tokens function (#7708) @davidwendt
    • Use stream in groupby calls (#7705) @karthikeyann
    • Update codeowners file (#7701) @ajschmidt8
    • Cleanup groupby to use host_span, device_span, device_uvector (#7698) @karthikeyann
    • Add gbenchmark for nvtext ngrams functions (#7693) @davidwendt
    • Misc Python/Cython optimizations (#7686) @shwina
    • Add gbenchmark for nvtext tokenize functions (#7684) @davidwendt
    • Add column_device_view to orc writer (#7676) @kaatish
    • cudf_kafka now uses cuDF CMake export targets (CPM) (#7674) @robertmaynard
    • Add gbenchmark for nvtext normalize functions (#7668) @davidwendt
    • Resolve unnecessary import of thrust/optional.hpp in types.hpp (#7667) @vyasr
    • Feature/optimize accessor copy (#7660) @vyasr
    • Fix find_package(cudf) (#7658) @trxcllnt
    • Work-around for gcc7 compile error on Centos7 (#7652) @davidwendt
    • Add in JNI support for count_elements (#7651) @revans2
    • Fix issues with building cudf in a non-conda environment (#7647) @galipremsagar
    • Refactor ConfigureCUDA to not conditionally insert compiler flags (#7643) @robertmaynard
    • Add gbenchmark for converting strings to/from timestamps (#7641) @davidwendt
    • Handle constructing a cudf.Scalar from a cudf.Scalar (#7639) @shwina
    • Add in JNI support for table partition (#7637) @revans2
    • Add explicit fixed_point merge test (#7635) @codereport
    • Add JNI support for IDENTITY hash partitioning (#7626) @revans2
    • Java support on explode_outer (#7625) @sperlingxx
    • Java support of casting string from/to decimal (#7623) @sperlingxx
    • Convert cudf::concatenate APIs to use spans and device_uvector (#7621) @harrism
    • Add gbenchmark for cudf::strings::translate function (#7617) @davidwendt
    • Use file(COPY ) over file(INSTALL ) so cmake output is reduced (#7616) @robertmaynard
    • Use rmm::device_uvector in place of rmm::device_vector for ORC reader/writer and cudf::io::column_buffer (#7614) @vuule
    • Refactor Java host-side buffer concatenation to expose separate steps (#7610) @jlowe
    • Add gbenchmarks for string substrings functions (#7603) @davidwendt
    • Refactor string conversion check (#7599) @ttnghia
    • JNI: Pass names of children struct columns to native Arrow IPC writer (#7598) @firestarman
    • Revert "ENH Fix stale GHA and prevent duplicates " (#7595) @mike-wendt
    • ENH Fix stale GHA and prevent duplicates (#7594) @mike-wendt
    • Fix auto-detecting GPU architectures (#7593) @trxcllnt
    • Reduce cudf library size (#7583) @robertmaynard
    • Optimize cudf::make_strings_column for long strings (#7576) @davidwendt
    • Always build and export the cudf::cudftestutil target (#7574) @trxcllnt
    • Eliminate literal parameters to uvector::set_element_async and device_scalar::set_value (#7563) @harrism
    • Add gbenchmark for strings::concatenate (#7560) @davidwendt
    • Update Changelog Link (#7550) @ajschmidt8
    • Add gbenchmarks for strings replace regex functions (#7541) @davidwendt
    • Add __repr__ for Column and ColumnAccessor (#7531) @shwina
    • Support Decimal DIV changes in cudf (#7527) @razajafri
    • Remove unneeded step parameter from strings::detail::copy_slice (#7525) @davidwendt
    • Use device_uvector, device_span in sort groupby (#7523) @karthikeyann
    • Add gbenchmarks for strings extract function (#7522) @davidwendt
    • Rename ARROW_STATIC_LIB because it conflicts with one in FindArrow.cmake (#7518) @trxcllnt
    • Reduce compile time/size for scan.cu (#7516) @davidwendt
    • Change device_vector to device_uvector in nvtext source files (#7512) @davidwendt
    • Removed unneeded includes from traits.hpp (#7509) @davidwendt
    • FIX Remove random build directory generation for ccache (#7508) @dillon-cullinan
    • xfail failing pytest in pandas 1.2.3 (#7507) @galipremsagar
    • JNI bit cast (#7493) @revans2
    • Combine rolling window function tests (#7480) @mythrocks
    • Prepare Changelog for Automation (#7477) @ajschmidt8
    • Java support for explode position (#7471) @sperlingxx
    • Update 0.18 changelog entry (#7463) @ajschmidt8
    • JNI: Support skipping nulls for collect aggregation (#7457) @firestarman
    • Join APIs that return gathermaps (#7454) @shwina
    • Remove dependence on managed memory for multimap test (#7451) @jrhemstad
    • Use cuFile for Parquet IO when available (#7444) @vuule
    • Statistics cleanup (#7439) @kaatish
    • Add gbenchmarks for strings filter functions (#7438) @davidwendt
    • fixed_point + cudf::binary_operation API Changes (#7435) @codereport
    • Improve string gather performance (#7433) @jlowe
    • Don't use user resource for a temporary allocation in sort_by_key (#7431) @magnatelee
    • Detail APIs for datetime functions (#7430) @magnatelee
    • Replace thrust::max_element with thrust::reduce in strings findall_re (#7428) @davidwendt
    • Add gbenchmark for strings split/split_record functions (#7427) @davidwendt
    • Update JNI build to use CMAKE_CUDA_ARCHITECTURES (#7425) @jlowe
    • Change nvtext::load_vocabulary_file to return a unique ptr (#7424) @davidwendt
    • Simplify type dispatch with device_storage_dispatch (#7419) @codereport
    • Java support for casting of nested child columns (#7417) @razajafri
    • Improve scalar string replace performance for long strings (#7415) @jlowe
    • Remove unneeded temporary device vector for strings scatter specialization (#7409) @davidwendt
    • bitmask_or implementation with bitmask refactor (#7406) @rwlee
    • Add other cudf::strings::replace functions to current strings replace gbenchmark (#7403) @davidwendt
    • Clean up included headers in device_operators.cuh (#7401) @codereport
    • Move nullable index iterator to indexalator factory (#7399) @davidwendt
    • ENH Pass ccache variables to conda recipe & use Ninja in CI (#7398) @Ethyling
    • upgrade maven-antrun-plugin to support maven parallel builds (#7393) @rongou
    • Add gbenchmark for strings find/contains functions (#7392) @davidwendt
    • Use CMAKE_CUDA_ARCHITECTURES (#7391) @robertmaynard
    • Refactor libcudf strings::replace to use make_strings_children utility (#7384) @davidwendt
    • Added in JNI support for out of core sort algorithm (#7381) @revans2
    • Upgrade pandas to 1.2 (#7375) @galipremsagar
    • Rename logical_cast to bit_cast and allow additional conversions (#7373) @ttnghia
    • jitify 2 support (#7372) @cwharris
    • compile_udf: Cache PTX for similar functions (#7371) @gmarkall
    • Add string scalar replace benchmark (#7369) @jlowe
    • Add gbenchmark for strings contains_re/count_re functions (#7366) @davidwendt
    • Update orc reader and writer fuzz tests (#7357) @galipremsagar
    • Improve url_decode performance for long strings (#7353) @jlowe
    • cudf::ast Small Refactorings (#7352) @codereport
    • Remove std::cout and print in the scatter test function EmptyListsOfNullableStrings. (#7342) @ttnghia
    • Use cudf::detail::make_counting_transform_iterator (#7338) @codereport
    • Change block size parameter from a global to a template param. (#7333) @nvdbaranec
    • Partial clean up of ORC writer (#7324) @vuule
    • Add gbenchmark for cudf::strings::to_lower (#7316) @davidwendt
    • Update Java bindings version to 0.19-SNAPSHOT (#7307) @pxLi
    • Move cudf::test::make_counting_transform_iterator to cudf/detail/iterator.cuh (#7306) @codereport
    • Use string literals in fixed_point release_asserts (#7303) @codereport
    • Fix merge conflicts for #7295 (#7297) @ajschmidt8
    • Add UTF-8 chars to create_random_column<string_view> benchmark utility (#7292) @davidwendt
    • Abstracting block reduce and block scan from cuIO kernels with cub apis (#7278) @rgsl888prabhu
    • Build.sh use cmake --build to drive build system invocation (#7270) @robertmaynard
    • Refactor dictionary support for reductions any/all (#7242) @davidwendt
    • Replace stream.value() with stream for stream_view args (#7236) @karthikeyann
    • Interval index and interval_range (#7182) @marlenezw
    • avro reader integration tests (#7156) @cwharris
    • Rework libcudf CMakeLists.txt to export targets for CPM (#7107) @trxcllnt
    • Adding Interval Dtype (#6984) @marlenezw
    • Cleaning up for loops with make_(counting_)transform_iterator (#6546) @codereport
  • v0.19.0(Apr 21, 2021)

    🚨 Breaking Changes

    • Allow hash_partition to take a seed value (#7771) @magnatelee
    • Allow merging index column with data column using keyword "on" (#7736) @skirui-source
    • Change JNI API to avoid loading native dependencies when creating sort order classes. (#7729) @revans2
    • Replace device_vector with device_uvector in null_mask (#7715) @harrism
    • Don't identify decimals as strings. (#7710) @vyasr
    • Fix Java Parquet write after writer API changes (#7655) @revans2
    • Convert cudf::concatenate APIs to use spans and device_uvector (#7621) @harrism
    • Update missing docstring examples in python public APIs (#7546) @galipremsagar
    • Remove unneeded step parameter from strings::detail::copy_slice (#7525) @davidwendt
    • Rename ARROW_STATIC_LIB because it conflicts with one in FindArrow.cmake (#7518) @trxcllnt
    • Match Pandas logic for comparing two objects with nulls (#7490) @brandon-b-miller
    • Add struct support to parquet writer (#7461) @devavret
    • Join APIs that return gathermaps (#7454) @shwina
    • fixed_point + cudf::binary_operation API Changes (#7435) @codereport
    • Fix BUG: Exception when PYTHONOPTIMIZE=2 (#7434) @skirui-source
    • Change nvtext::load_vocabulary_file to return a unique ptr (#7424) @davidwendt
    • Refactor strings column factories (#7397) @harrism
    • Use CMAKE_CUDA_ARCHITECTURES (#7391) @robertmaynard
    • Upgrade pandas to 1.2 (#7375) @galipremsagar
    • Rename logical_cast to bit_cast and allow additional conversions (#7373) @ttnghia
    • Rework libcudf CMakeLists.txt to export targets for CPM (#7107) @trxcllnt

    🐛 Bug Fixes

    • Fix a NameError in meta dispatch API (#7996) @galipremsagar
    • Reindex in DataFrame.__setitem__ (#7957) @galipremsagar
    • jitify direct-to-cubin compilation and caching. (#7919) @cwharris
    • Use dynamic cudart for nvcomp in java build (#7896) @abellina
    • fix "incompatible redefinition" warnings (#7894) @cwharris
    • cudf consistently specifies the cuda runtime (#7887) @robertmaynard
    • disable verbose output for jitify_preprocess (#7886) @cwharris
    • CMake jit_preprocess_files function only runs when needed (#7872) @robertmaynard
    • Push DeviceScalar construction into cython for list.contains (#7864) @brandon-b-miller
    • cudf now sets an install rpath of $ORIGIN (#7863) @robertmaynard
    • Don't install Thrust examples, tests, docs, and python files (#7811) @robertmaynard
    • Sort by index in groupby tests more consistently (#7802) @shwina
    • Revert "Update conda recipes pinning of repo dependencies (#7743)" (#7793) @raydouglass
    • Add decimal column handling in copy_type_metadata (#7788) @shwina
    • Add column names validation in parquet writer (#7786) @galipremsagar
    • Fix Java explode outer unit tests (#7782) @jlowe
    • Fix compiler warning about non-POD types passed through ellipsis (#7781) @jrhemstad
    • User resource fix for replace_nulls (#7769) @magnatelee
    • Fix type dispatch for columnar replace_nulls (#7768) @jlowe
    • Add ignore_order parameter to dask-cudf concat dispatch (#7765) @galipremsagar
    • Fix slicing and arrow representations of decimal columns (#7755) @vyasr
    • Fixing issue with explode_outer position not nulling position entries of null rows (#7754) @hyperbolic2346
    • Implement scatter for struct columns (#7752) @ttnghia
    • Fix data corruption in string columns (#7746) @galipremsagar
    • Fix string length in stripe dictionary building (#7744) @kaatish
    • Update conda recipes pinning of repo dependencies (#7743) @mike-wendt
    • Enable dask dispatch to cuDF's is_categorical_dtype for cuDF objects (#7740) @brandon-b-miller
    • Fix dictionary size computation in ORC writer (#7737) @vuule
    • Fix cudf::cast overflow for decimal64 to int32_t or smaller in certain cases (#7733) @codereport
    • Change JNI API to avoid loading native dependencies when creating sort order classes. (#7729) @revans2
    • Disable column_view data accessors for unsupported types (#7725) @jrhemstad
    • Materialize RangeIndex when index=True in parquet writer (#7711) @galipremsagar
    • Don't identify decimals as strings. (#7710) @vyasr
    • Fix return type of DataFrame.argsort (#7706) @galipremsagar
    • Fix/correct cudf installed package requirements (#7688) @robertmaynard
    • Fix SparkMurmurHash3_32 hash inconsistencies with Apache Spark (#7672) @jlowe
    • Fix ORC reader issue with reading empty string columns (#7656) @rgsl888prabhu
    • Fix Java Parquet write after writer API changes (#7655) @revans2
    • Fixing empty null lists throwing explode_outer for a loop. (#7649) @hyperbolic2346
    • Fix internal compiler error during JNI Docker build (#7645) @jlowe
    • Fix Debug build break with device_uvectors in grouped_rolling.cu (#7633) @mythrocks
    • Parquet reader: Fix issue when using skip_rows on non-nested columns containing nulls (#7627) @nvdbaranec
    • Fix ORC reader for empty DataFrame/Table (#7624) @rgsl888prabhu
    • Fix specifying GPU architecture in JNI build (#7612) @jlowe
    • Fix ORC writer OOM issue (#7605) @vuule
    • Fix 0.18 --> 0.19 automerge (#7589) @kkraus14
    • Fix ORC issue with incorrect timestamp nanosecond values (#7581) @vuule
    • Fix missing Dask imports (#7580) @kkraus14
    • CMAKE_CUDA_ARCHITECTURES doesn't change when build-system invokes cmake (#7579) @robertmaynard
    • Another fix for offsets_end() iterator in lists_column_view (#7575) @ttnghia
    • Fix ORC writer output corruption with string columns (#7565) @vuule
    • Fix cudf::lists::sort_lists failing for sliced column (#7564) @ttnghia
    • FIX Fix Anaconda upload args (#7558) @dillon-cullinan
    • Fix index mismatch issue in equality related APIs (#7555) @galipremsagar
    • FIX Revert gpuci_conda_retry on conda file output locations (#7552) @dillon-cullinan
    • Fix offset_end iterator for lists_column_view, which was not correctl… (#7551) @ttnghia
    • Fix no such file dlpack.h error when build libcudf (#7549) @chenrui17
    • Update missing docstring examples in python public APIs (#7546) @galipremsagar
    • Decimal32 Build Fix (#7544) @razajafri
    • FIX Retry conda output location (#7540) @dillon-cullinan
    • fix missing renames of dask git branches from master to main (#7535) @kkraus14
    • Remove detail from device_span (#7533) @rwlee
    • Change dask and distributed branch to main (#7532) @dantegd
    • Update JNI build to use CUDF_USE_ARROW_STATIC (#7526) @jlowe
    • Make sure rmm::rmm CMake target is visible to cudf users (#7524) @robertmaynard
    • Fix contiguous_split not properly handling output partitions > 2 GB. (#7515) @nvdbaranec
    • Change jit launch to safe_launch (#7510) @devavret
    • Fix comparison between Datetime/Timedelta columns and NULL scalars (#7504) @brandon-b-miller
    • Fix off-by-one error in char-parallel string scalar replace (#7502) @jlowe
    • Fix JNI deprecation of all, put it on the wrong version before (#7501) @revans2
    • Fix Series/Dataframe Mixed Arithmetic (#7491) @brandon-b-miller
    • Fix JNI build after removal of libcudf sub-libraries (#7486) @jlowe
    • Correctly compile benchmarks (#7485) @robertmaynard
    • Fix bool column corruption with ORC Reader (#7483) @rgsl888prabhu
    • Fix __repr__ for categorical dtype (#7476) @galipremsagar
    • Java cleaner synchronization (#7474) @abellina
    • Fix java float/double parsing tests (#7473) @revans2
    • Pass stream and user resource to make_default_constructed_scalar (#7469) @magnatelee
    • Improve stability of dask_cudf.DataFrame.var and dask_cudf.DataFrame.std (#7453) @rjzamora
    • Missing device_storage_dispatch change affecting cudf::gather (#7449) @codereport
    • fix cuFile JNI compile errors (#7445) @rongou
    • Support Series.__setitem__ with key to a new row (#7443) @isVoid
    • Fix BUG: Exception when PYTHONOPTIMIZE=2 (#7434) @skirui-source
    • Make inclusive scan safe for cases with leading nulls (#7432) @magnatelee
    • Fix typo in list_device_view::pair_rep_end() (#7423) @mythrocks
    • Fix string to double conversion and row equivalent comparison (#7410) @ttnghia
    • Fix thrust failure when transferring data from device_vector to host_vector with vectors of size 1 (#7382) @ttnghia
    • Fix std::exception catch-by-reference gcc9 compile error (#7380) @davidwendt
    • Fix skiprows issue with ORC Reader (#7359) @rgsl888prabhu
    • fix Arrow CMake file (#7358) @rongou
    • Fix lists::contains() for NaN and Decimals (#7349) @mythrocks
    • Handle cupy array in DataFrame.__setitem__ (#7340) @galipremsagar
    • Fix invalid-device-fn error in cudf::strings::replace_re with multiple regex's (#7336) @davidwendt
    • FIX Add codecov upload block to gpu script (#6860) @dillon-cullinan

    📖 Documentation

    • Fix join API doxygen (#7890) @shwina
    • Add Resources to README. (#7697) @bdice
    • Add isin examples in Docstring (#7479) @galipremsagar
    • Resolving unlinked type shorthands in cudf doc (#7416) @isVoid
    • Fix typo in regex.md doc page (#7363) @davidwendt
    • Fix incorrect strings_column_view::chars_size documentation (#7360) @jlowe

    🚀 New Features

    • Enable basic reductions for decimal columns (#7776) @ChrisJar
    • Enable join on decimal columns (#7764) @ChrisJar
    • Allow merging index column with data column using keyword "on" (#7736) @skirui-source
    • Implement DecimalColumn + Scalar and add cudf.Scalars of Decimal64Dtype (#7732) @brandon-b-miller
    • Add support for unique groupby aggregation (#7726) @shwina
    • Expose libcudf's label_bins function to cudf (#7724) @vyasr
    • Adding support for equi-join on struct (#7720) @hyperbolic2346
    • Add decimal column comparison operations (#7716) @isVoid
    • Implement scan operations for decimal columns (#7707) @ChrisJar
    • Enable typecasting between decimal and int (#7691) @ChrisJar
    • Enable decimal support in parquet writer (#7673) @devavret
    • Adds list.unique API (#7664) @isVoid
    • Fix NaN handling in drop_list_duplicates (#7662) @ttnghia
    • Add lists.sort_values API (#7657) @isVoid
    • Add is_integer API that can check for the validity of a string-to-integer conversion (#7642) @ttnghia
    • Adds explode API (#7607) @isVoid
    • Adds list.take, python binding for cudf::lists::segmented_gather (#7591) @isVoid
    • Implement cudf::label_bins() (#7554) @vyasr
    • Add Python bindings for lists::contains (#7547) @skirui-source
    • cudf::row_bit_count() support. (#7534) @nvdbaranec
    • Implement drop_list_duplicates (#7528) @ttnghia
    • Add Python bindings for lists::extract_lists_element (#7505) @skirui-source
    • Add explode_outer and explode_outer_position (#7499) @hyperbolic2346
    • Match Pandas logic for comparing two objects with nulls (#7490) @brandon-b-miller
    • Add struct support to parquet writer (#7461) @devavret
    • Enable type conversion from float to decimal type (#7450) @ChrisJar
    • Add cython for converting strings/fixed-point functions (#7429) @davidwendt
    • Add struct column support to cudf::sort and cudf::sorted_order (#7422) @karthikeyann
    • Implement groupby collect_set (#7420) @ttnghia
    • Merge branch-0.18 into branch-0.19 (#7411) @raydouglass
    • Refactor strings column factories (#7397) @harrism
    • Add groupby scan operations (sort groupby) (#7387) @karthikeyann
    • Add cudf::explode_position (#7376) @hyperbolic2346
    • Add string conversion to/from decimal values libcudf APIs (#7364) @davidwendt
    • Add groupby SUM_OF_SQUARES support (#7362) @karthikeyann
    • Add Series.drop api (#7304) @isVoid
    • get_json_object() implementation (#7286) @nvdbaranec
    • Python API for ListMethods.len() (#7283) @isVoid
    • Support null_policy::EXCLUDE for COLLECT rolling aggregation (#7264) @mythrocks
    • Add support for special tokens in nvtext::subword_tokenizer (#7254) @davidwendt
    • Fix inplace update of data and add Series.update (#7201) @galipremsagar
    • Implement cudf::group_by (hash) for decimal32 and decimal64 (#7190) @codereport
    • Adding support to specify "level" parameter for DataFrame.rename (#7135) @skirui-source

    🛠️ Improvements

    • fix GDS include path for version 0.95 (#7877) @rongou
    • Update dask + distributed to 2021.4.0 (#7858) @jakirkham
    • Add ability to extract include dirs from CUDF_HOME (#7848) @galipremsagar
    • Add USE_GDS as an option in build script (#7833) @pxLi
    • add an allocate method with stream in java DeviceMemoryBuffer (#7826) @rongou
    • Constrain dask and distributed versions to 2021.3.1 (#7825) @shwina
    • Revert dask versioning of concat dispatch (#7823) @galipremsagar
    • add copy methods in Java memory buffer (#7791) @rongou
    • Update README and CONTRIBUTING for 0.19 (#7778) @robertmaynard
    • Allow hash_partition to take a seed value (#7771) @magnatelee
    • Turn on NVTX by default in java build (#7761) @tgravescs
    • Add Java bindings to join gather map APIs (#7751) @jlowe
    • Add replacements column support for Java replaceNulls (#7750) @jlowe
    • Add Java bindings for row_bit_count (#7749) @jlowe
    • Remove unused JVM array creation (#7748) @jlowe
    • Added JNI support for new is_integer (#7739) @revans2
    • Create and promote library aliases in libcudf installations (#7734) @trxcllnt
    • Support groupby operations for decimal dtypes (#7731) @vyasr
    • Memory map the input file only when GDS compatibility mode is not used (#7717) @vuule
    • Replace device_vector with device_uvector in null_mask (#7715) @harrism
    • Struct hashing support for SerialMurmur3 and SparkMurmur3 (#7714) @jlowe
    • Add gbenchmark for nvtext replace-tokens function (#7708) @davidwendt
    • Use stream in groupby calls (#7705) @karthikeyann
    • Update codeowners file (#7701) @ajschmidt8
    • Cleanup groupby to use host_span, device_span, device_uvector (#7698) @karthikeyann
    • Add gbenchmark for nvtext ngrams functions (#7693) @davidwendt
    • Misc Python/Cython optimizations (#7686) @shwina
    • Add gbenchmark for nvtext tokenize functions (#7684) @davidwendt
    • Add column_device_view to orc writer (#7676) @kaatish
    • cudf_kafka now uses cuDF CMake export targets (CPM) (#7674) @robertmaynard
    • Add gbenchmark for nvtext normalize functions (#7668) @davidwendt
    • Resolve unnecessary import of thrust/optional.hpp in types.hpp (#7667) @vyasr
    • Feature/optimize accessor copy (#7660) @vyasr
    • Fix find_package(cudf) (#7658) @trxcllnt
    • Work-around for gcc7 compile error on Centos7 (#7652) @davidwendt
    • Add in JNI support for count_elements (#7651) @revans2
    • Fix issues with building cudf in a non-conda environment (#7647) @galipremsagar
    • Refactor ConfigureCUDA to not conditionally insert compiler flags (#7643) @robertmaynard
    • Add gbenchmark for converting strings to/from timestamps (#7641) @davidwendt
    • Handle constructing a cudf.Scalar from a cudf.Scalar (#7639) @shwina
    • Add in JNI support for table partition (#7637) @revans2
    • Add explicit fixed_point merge test (#7635) @codereport
    • Add JNI support for IDENTITY hash partitioning (#7626) @revans2
    • Java support on explode_outer (#7625) @sperlingxx
    • Java support of casting string from/to decimal (#7623) @sperlingxx
    • Convert cudf::concatenate APIs to use spans and device_uvector (#7621) @harrism
    • Add gbenchmark for cudf::strings::translate function (#7617) @davidwendt
    • Use file(COPY ) over file(INSTALL ) so cmake output is reduced (#7616) @robertmaynard
    • Use rmm::device_uvector in place of rmm::device_vector for ORC reader/writer and cudf::io::column_buffer (#7614) @vuule
    • Refactor Java host-side buffer concatenation to expose separate steps (#7610) @jlowe
    • Add gbenchmarks for string substrings functions (#7603) @davidwendt
    • Refactor string conversion check (#7599) @ttnghia
    • JNI: Pass names of children struct columns to native Arrow IPC writer (#7598) @firestarman
    • Revert "ENH Fix stale GHA and prevent duplicates " (#7595) @mike-wendt
    • ENH Fix stale GHA and prevent duplicates (#7594) @mike-wendt
    • Fix auto-detecting GPU architectures (#7593) @trxcllnt
    • Reduce cudf library size (#7583) @robertmaynard
    • Optimize cudf::make_strings_column for long strings (#7576) @davidwendt
    • Always build and export the cudf::cudftestutil target (#7574) @trxcllnt
    • Eliminate literal parameters to uvector::set_element_async and device_scalar::set_value (#7563) @harrism
    • Add gbenchmark for strings::concatenate (#7560) @davidwendt
    • Update Changelog Link (#7550) @ajschmidt8
    • Add gbenchmarks for strings replace regex functions (#7541) @davidwendt
    • Add __repr__ for Column and ColumnAccessor (#7531) @shwina
    • Support Decimal DIV changes in cudf (#7527) @razajafri
    • Remove unneeded step parameter from strings::detail::copy_slice (#7525) @davidwendt
    • Use device_uvector, device_span in sort groupby (#7523) @karthikeyann
    • Add gbenchmarks for strings extract function (#7522) @davidwendt
    • Rename ARROW_STATIC_LIB because it conflicts with one in FindArrow.cmake (#7518) @trxcllnt
    • Reduce compile time/size for scan.cu (#7516) @davidwendt
    • Change device_vector to device_uvector in nvtext source files (#7512) @davidwendt
    • Removed unneeded includes from traits.hpp (#7509) @davidwendt
    • FIX Remove random build directory generation for ccache (#7508) @dillon-cullinan
    • xfail failing pytest in pandas 1.2.3 (#7507) @galipremsagar
    • JNI bit cast (#7493) @revans2
    • Combine rolling window function tests (#7480) @mythrocks
    • Prepare Changelog for Automation (#7477) @ajschmidt8
    • Java support for explode position (#7471) @sperlingxx
    • Update 0.18 changelog entry (#7463) @ajschmidt8
    • JNI: Support skipping nulls for collect aggregation (#7457) @firestarman
    • Join APIs that return gathermaps (#7454) @shwina
    • Remove dependence on managed memory for multimap test (#7451) @jrhemstad
    • Use cuFile for Parquet IO when available (#7444) @vuule
    • Statistics cleanup (#7439) @kaatish
    • Add gbenchmarks for strings filter functions (#7438) @davidwendt
    • fixed_point + cudf::binary_operation API Changes (#7435) @codereport
    • Improve string gather performance (#7433) @jlowe
    • Don't use user resource for a temporary allocation in sort_by_key (#7431) @magnatelee
    • Detail APIs for datetime functions (#7430) @magnatelee
    • Replace thrust::max_element with thrust::reduce in strings findall_re (#7428) @davidwendt
    • Add gbenchmark for strings split/split_record functions (#7427) @davidwendt
    • Update JNI build to use CMAKE_CUDA_ARCHITECTURES (#7425) @jlowe
    • Change nvtext::load_vocabulary_file to return a unique ptr (#7424) @davidwendt
    • Simplify type dispatch with device_storage_dispatch (#7419) @codereport
    • Java support for casting of nested child columns (#7417) @razajafri
    • Improve scalar string replace performance for long strings (#7415) @jlowe
    • Remove unneeded temporary device vector for strings scatter specialization (#7409) @davidwendt
    • bitmask_or implementation with bitmask refactor (#7406) @rwlee
    • Add other cudf::strings::replace functions to current strings replace gbenchmark (#7403) @davidwendt
    • Clean up included headers in device_operators.cuh (#7401) @codereport
    • Move nullable index iterator to indexalator factory (#7399) @davidwendt
    • ENH Pass ccache variables to conda recipe & use Ninja in CI (#7398) @Ethyling
    • upgrade maven-antrun-plugin to support maven parallel builds (#7393) @rongou
    • Add gbenchmark for strings find/contains functions (#7392) @davidwendt
    • Use CMAKE_CUDA_ARCHITECTURES (#7391) @robertmaynard
    • Refactor libcudf strings::replace to use make_strings_children utility (#7384) @davidwendt
    • Added in JNI support for out of core sort algorithm (#7381) @revans2
    • Upgrade pandas to 1.2 (#7375) @galipremsagar
    • Rename logical_cast to bit_cast and allow additional conversions (#7373) @ttnghia
    • jitify 2 support (#7372) @cwharris
    • compile_udf: Cache PTX for similar functions (#7371) @gmarkall
    • Add string scalar replace benchmark (#7369) @jlowe
    • Add gbenchmark for strings contains_re/count_re functions (#7366) @davidwendt
    • Update orc reader and writer fuzz tests (#7357) @galipremsagar
    • Improve url_decode performance for long strings (#7353) @jlowe
    • cudf::ast Small Refactorings (#7352) @codereport
    • Remove std::cout and print in the scatter test function EmptyListsOfNullableStrings. (#7342) @ttnghia
    • Use cudf::detail::make_counting_transform_iterator (#7338) @codereport
    • Change block size parameter from a global to a template param. (#7333) @nvdbaranec
    • Partial clean up of ORC writer (#7324) @vuule
    • Add gbenchmark for cudf::strings::to_lower (#7316) @davidwendt
    • Update Java bindings version to 0.19-SNAPSHOT (#7307) @pxLi
    • Move cudf::test::make_counting_transform_iterator to cudf/detail/iterator.cuh (#7306) @codereport
    • Use string literals in fixed_point release_asserts (#7303) @codereport
    • Fix merge conflicts for #7295 (#7297) @ajschmidt8
    • Add UTF-8 chars to create_random_column<string_view> benchmark utility (#7292) @davidwendt
    • Abstracting block reduce and block scan from cuIO kernels with cub apis (#7278) @rgsl888prabhu
    • Build.sh use cmake --build to drive build system invocation (#7270) @robertmaynard
    • Refactor dictionary support for reductions any/all (#7242) @davidwendt
    • Replace stream.value() with stream for stream_view args (#7236) @karthikeyann
    • Interval index and interval_range (#7182) @marlenezw
    • avro reader integration tests (#7156) @cwharris
    • Rework libcudf CMakeLists.txt to export targets for CPM (#7107) @trxcllnt
    • Adding Interval Dtype (#6984) @marlenezw
    • Cleaning up for loops with make_(counting_)transform_iterator (#6546) @codereport
  • v0.18.1(Mar 15, 2021)

  • v0.18.0(Feb 24, 2021)

    Breaking Changes 🚨

    • Default groupby to sort=False (#7180) @isVoid
    • Add libcudf API for parsing of ORC statistics (#7136) @vuule
    • Replace ORC writer api with class (#7099) @rgsl888prabhu
    • Pack/unpack functionality to convert tables to and from a serialized format. (#7096) @nvdbaranec
    • Replace parquet writer api with class (#7058) @rgsl888prabhu
    • Add days check to cudf::is_timestamp using cuda::std::chrono classes (#7028) @davidwendt
    • Fix default parameter values of write_csv and write_parquet (#6967) @vuule
    • Align Series.groupby API to match Pandas (#6964) @kkraus14
    • Share factorize implementation with Index and cudf module (#6885) @brandon-b-miller

    Bug Fixes 🐛

    • Remove incorrect std::move call on return variable (#7319) @davidwendt
    • Fix failing CI ORC test (#7313) @vuule
    • Disallow constructing frames from a ColumnAccessor (#7298) @shwina
    • fix java cuFile tests (#7296) @rongou
    • Fix style issues related to NumPy (#7279) @shwina
    • Fix bug when iloc slice terminates at before-the-zero position (#7277) @isVoid
    • Fix copying dtype metadata after calling libcudf functions (#7271) @shwina
    • Move lists utility function definition out of header (#7266) @mythrocks
    • Throw if bool column would cause incorrect result when writing to ORC (#7261) @vuule
    • Use uvector in replace_nulls; Fix sort_helper::grouped_value doc (#7256) @isVoid
    • Remove floating point types from cudf::sort fast-path (#7250) @davidwendt
    • Disallow picking output columns from nested columns. (#7248) @devavret
    • Fix loc for Series with a MultiIndex (#7243) @shwina
    • Fix Arrow column test leaks (#7241) @tgravescs
    • Fix test column vector leak (#7238) @kuhushukla
    • Fix some bugs in java scalar support for decimal (#7237) @revans2
    • Improve assert_eq handling of scalar (#7220) @isVoid
    • Fix missing null_count() comparison in test framework and related failures (#7219) @nvdbaranec
    • Remove floating point types from radix sort fast-path (#7215) @davidwendt
    • Fixing parquet benchmarks (#7214) @rgsl888prabhu
    • Handle various parameter combinations in replace API (#7207) @galipremsagar
    • Export mock aws credentials for s3 tests (#7176) @ayushdg
    • Add MultiIndex.rename API (#7172) @isVoid
    • Fix importing list & struct types in from_arrow (#7162) @galipremsagar
    • Fixing parquet precision writing failing if scale is equal to precision (#7146) @hyperbolic2346
    • Update s3 tests to use moto_server (#7144) @ayushdg
    • Fix JIT cache multi-process test flakiness in slow drives (#7142) @devavret
    • Fix compilation errors in libcudf (#7138) @galipremsagar
    • Fix compilation failure caused by -Wall addition. (#7134) @codereport
    • Add informative error message for sep in CSV writer (#7095) @galipremsagar
    • Add JIT cache per compute capability (#7090) @devavret
    • Implement __hash__ method for ListDtype (#7081) @galipremsagar
    • Only upload packages that were built (#7077) @raydouglass
    • Fix comparisons between Series and cudf.NA (#7072) @brandon-b-miller
    • Handle nan values correctly in Series.one_hot_encoding (#7059) @galipremsagar
    • Add unstack() support for non-multiindexed dataframes (#7054) @isVoid
    • Fix read_orc for decimal type (#7034) @rgsl888prabhu
    • Fix backward compatibility of loading a 0.16 pkl file (#7033) @galipremsagar
    • Decimal casts in JNI became a NOOP (#7032) @revans2
    • Restore usual instance/subclass checking to cudf.DateOffset (#7029) @shwina
    • Add days check to cudf::is_timestamp using cuda::std::chrono classes (#7028) @davidwendt
    • Fix to_csv delimiter handling of timestamp format (#7023) @davidwendt
    • Pin librdkafka to gcc 7 compatible version (#7021) @raydouglass
    • Fix fillna & dropna to also consider np.nan as a missing value (#7019) @galipremsagar
    • Fix round operator's HALF_EVEN computation for negative integers (#7014) @nartal1
    • Skip Thrust sort patch if already applied (#7009) @harrism
    • Fix cudf::hash_partition for decimal32 and decimal64 (#7006) @codereport
    • Fix Thrust unroll patch command (#7002) @harrism
    • Fix loc behaviour when key of incorrect type is used (#6993) @shwina
    • Fix int to datetime conversion in csv_read (#6991) @kaatish
    • fix excluding cufile tests by default (#6988) @rongou
    • Fix java cufile tests when cufile is not installed (#6987) @revans2
    • Make cudf::round for fixed_point when scale = -decimal_places a no-op (#6975) @codereport
    • Fix type comparison for java (#6970) @revans2
    • Fix default parameter values of write_csv and write_parquet (#6967) @vuule
    • Align Series.groupby API to match Pandas (#6964) @kkraus14
    • Fix timestamp parsing in ORC reader for timezones without transitions (#6959) @vuule
    • Fix typo in numerical.py (#6957) @rgsl888prabhu
    • fixed_point_value double-shifts in fixed_point construction (#6950) @codereport
    • fix libcu++ include path for jni (#6948) @rongou
    • Fix groupby agg/apply behaviour when no key columns are provided (#6945) @shwina
    • Avoid inserting null elements into join hash table when nulls are treated as unequal (#6943) @hyperbolic2346
    • Fix cudf::merge gtest for dictionary columns (#6942) @davidwendt
    • Pass numeric scalars of the same dtype through numeric binops (#6938) @brandon-b-miller
    • Fix N/A detection for empty fields in CSV reader (#6922) @vuule
    • Fix rmm_mode=managed parameter for gtests (#6912) @davidwendt
    • Fix nullmask offset handling in parquet and orc writer (#6889) @kaatish
    • Correct the sampling range when sampling with replacement (#6884) @ChrisJar
    • Handle nested string columns with no children in contiguous_split. (#6864) @nvdbaranec
    • Fix columns & index handling in dataframe constructor (#6838) @galipremsagar

    Documentation 📖

    • Update readme (#7318) @shwina
    • Fix typo in cudf.core.column.string.extract docs (#7253) @adelevie
    • Update doxyfile project number (#7161) @davidwendt
    • Update 10 minutes to cuDF and CuPy with new APIs (#7158) @ChrisJar
    • Cross link RMM & libcudf Doxygen docs (#7149) @ajschmidt8
    • Add documentation for support dtypes in all IO formats (#7139) @galipremsagar
    • Add groupby docs (#7100) @shwina
    • Update cudf python docstrings with new null representation (<NA>) (#7050) @galipremsagar
    • Make Doxygen comments formatting consistent (#7041) @vuule
    • Add docs for working with missing data (#7010) @galipremsagar
    • Remove warning in from_dlpack and to_dlpack methods (#7001) @miguelusque
    • libcudf Developer Guide (#6977) @harrism
    • Add JNI wrapper for the cuFile API (GDS) (#6940) @rongou

    New Features 🚀

    • Support numeric_only field for rank() (#7213) @isVoid
    • Add support for cudf::binary_operation TRUE_DIV for decimal32 and decimal64 (#7198) @codereport
    • Implement COLLECT rolling window aggregation (#7189) @mythrocks
    • Add support for array-like inputs in cudf.get_dummies (#7181) @galipremsagar
    • Default groupby to sort=False (#7180) @isVoid
    • Add libcudf lists column count_elements API (#7173) @davidwendt
    • Implement cudf::group_by (sort) for decimal32 and decimal64 (#7169) @codereport
    • Add encoding and compression argument to CSV writer (#7168) @VibhuJawa
    • cudf::rolling_window SUM support for decimal32 and decimal64 (#7147) @codereport
    • Adding support for explode to cuDF (#7140) @hyperbolic2346
    • Add libcudf API for parsing of ORC statistics (#7136) @vuule
    • update GDS/cuFile location for 0.9 release (#7131) @rongou
    • Add Segmented sort (#7122) @karthikeyann
    • Add cudf::binary_operation NULL_MIN, NULL_MAX & NULL_EQUALS for decimal32 and decimal64 (#7119) @codereport
    • Add scale and value methods to fixed_point (#7109) @codereport
    • Replace ORC writer api with class (#7099) @rgsl888prabhu
    • Pack/unpack functionality to convert tables to and from a serialized format. (#7096) @nvdbaranec
    • Improve digitize API (#7071) @isVoid
    • Add List types support in data generator (#7064) @galipremsagar
    • cudf::scan support for decimal32 and decimal64 (#7063) @codereport
    • cudf::rolling ROW_NUMBER support for decimal32 and decimal64 (#7061) @codereport
    • Replace parquet writer api with class (#7058) @rgsl888prabhu
    • Support contains() on lists of primitives (#7039) @mythrocks
    • Implement cudf::rolling for decimal32 and decimal64 (#7037) @codereport
    • Add ffill and bfill to string columns (#7036) @isVoid
    • Enable round in cudf for DataFrame and Series (#7022) @ChrisJar
    • Extend replace_nulls_policy to string and dictionary type (#7004) @isVoid
    • Add segmented_gather(list_column, gather_list) (#7003) @karthikeyann
    • Add method field to fillna for fixed width columns (#6998) @isVoid
    • Manual merge of branch 0.17 into branch 0.18 (#6995) @shwina
    • Implement cudf::reduce for decimal32 and decimal64 (part 2) (#6980) @codereport
    • Add Ufunc alias look up for appropriate numpy ufunc dispatching (#6973) @VibhuJawa
    • Add pytest-xdist to dev environment.yml (#6958) @galipremsagar
    • Add Index.set_names api (#6929) @galipremsagar
    • Add replace_null API with replace_policy parameter, fixed_width column support (#6907) @isVoid
    • Share factorize implementation with Index and cudf module (#6885) @brandon-b-miller
    • Implement update() function (#6883) @skirui-source
    • Add groupby idxmin, idxmax aggregation (#6856) @karthikeyann
    • Implement cudf::reduce for decimal32 and decimal64 (part 1) (#6814) @codereport
    • Implement cudf.DateOffset for months (#6775) @brandon-b-miller
    • Add Python DecimalColumn (#6715) @shwina
    • Add dictionary support to libcudf groupby functions (#6585) @davidwendt

    Improvements 🛠️

    • Update stale GHA with exemptions & new labels (#7395) @mike-wendt
    • Add GHA to mark issues/prs as stale/rotten (#7388) @Ethyling
    • Unpin from numpy < 1.20 (#7335) @shwina
    • Prepare Changelog for Automation (#7309) @galipremsagar
    • Prepare Changelog for Automation (#7272) @ajschmidt8
    • Add JNI support for converting Arrow buffers to CUDF ColumnVectors (#7222) @tgravescs
    • Add coverage for skiprows and num_rows in parquet reader fuzz testing (#7216) @galipremsagar
    • Define and implement more behavior for merging on categorical variables (#7209) @brandon-b-miller
    • Add CudfSeriesGroupBy to optimize dask_cudf groupby-mean (#7194) @rjzamora
    • Add dictionary column support to rolling_window (#7186) @davidwendt
    • Modify the semantics of end pointers in cuIO to match standard library (#7179) @vuule
    • Adding unit tests for fixed_point with extremely large scales (#7178) @codereport
    • Fast path single column sort (#7167) @davidwendt
    • Fix -Werror=sign-compare errors in device code (#7164) @trxcllnt
    • Refactor cudf::string_view host and device code (#7159) @davidwendt
    • Enable logic for GPU auto-detection in cudfjni (#7155) @gerashegalov
    • Java bindings for Fixed-point type support for Parquet (#7153) @razajafri
    • Add Java interface for the new API 'explode' (#7151) @firestarman
    • Replace offsets with iterators in cuIO utilities and CSV parser (#7150) @vuule
    • Add gbenchmarks for reduction aggregations any() and all() (#7129) @davidwendt
    • Update JNI for contiguous_split packed results (#7127) @jlowe
    • Add JNI and Java bindings for list_contains (#7125) @kuhushukla
    • Add Java unit tests for window aggregate 'collect' (#7121) @firestarman
    • verify window operations on decimal with java tests (#7120) @sperlingxx
    • Adds in JNI support for creating a list column from existing columns (#7112) @revans2
    • Build libcudf with -Wall (#7105) @trxcllnt
    • Add column_device_view pointers to EncColumnDesc (#7097) @kaatish
    • Add pyorc to dev environment (#7085) @galipremsagar
    • JNI support for creating struct column from existing columns and fixed bug in struct with no children (#7084) @revans2
    • Fastpath single strings column in cudf::sort (#7075) @davidwendt
    • Upgrade nvcomp to 1.2.1 (#7069) @rongou
    • Refactor ORC ProtobufReader to make it more extendable (#7055) @vuule
    • Add Java tests for decimal casts (#7051) @sperlingxx
    • Auto-label PRs based on their content (#7044) @jolorunyomi
    • Create sort gbenchmark for strings column (#7040) @davidwendt
    • Refactor io memory fetches to use hostdevice_vector methods (#7035) @ChrisJar
    • Spark Murmur3 hash functionality (#7024) @rwlee
    • Fix libcudf strings logic where size_type is used to access INT32 column data (#7020) @davidwendt
    • Adding decimal writing support to parquet (#7017) @hyperbolic2346
    • Add compression="infer" as default for dask_cudf.read_csv (#7013) @rjzamora
    • Correct ORC docstring; other minor cuIO improvements (#7012) @vuule
    • Reduce number of hostdevice_vector allocations in parquet reader (#7005) @devavret
    • Check output size overflow on strings gather (#6997) @davidwendt
    • Improve representation of MultiIndex (#6992) @galipremsagar
    • Disable some pragma unroll statements in thrust sort.h (#6982) @davidwendt
    • Minor cudf::round internal refactoring (#6976) @codereport
    • Add Java bindings for URL conversion (#6972) @jlowe
    • Enable strict_decimal_types in parquet reading (#6969) @sperlingxx
    • Add in basic support to JNI for logical_cast (#6954) @revans2
    • Remove duplicate file array_tests.cpp (#6953) @karthikeyann
    • Add null mask fixed_point_column_wrapper constructors (#6951) @codereport
    • Update Java bindings version to 0.18-SNAPSHOT (#6949) @jlowe
    • Use simplified rmm::exec_policy (#6939) @harrism
    • Add null count test for apply_boolean_mask (#6903) @harrism
    • Implement DataFrame.quantile for datetime and timedelta data types (#6902) @ChrisJar
    • Remove **kwargs from string/categorical methods (#6750) @shwina
    • Refactor rolling.cu to reduce compile time (#6512) @mythrocks
    • Add static type checking via Mypy (#6381) @shwina
    • Update to official libcu++ on Github (#6275) @trxcllnt
  • v0.17.0 (Dec 10, 2020)

  • v0.18.0a (Mar 12, 2021)

    🚨 Breaking Changes

    • Default groupby to sort=False (#7180) @isVoid
    • Add libcudf API for parsing of ORC statistics (#7136) @vuule
    • Replace ORC writer api with class (#7099) @rgsl888prabhu
    • Pack/unpack functionality to convert tables to and from a serialized format. (#7096) @nvdbaranec
    • Replace parquet writer api with class (#7058) @rgsl888prabhu
    • Add days check to cudf::is_timestamp using cuda::std::chrono classes (#7028) @davidwendt
    • Fix default parameter values of write_csv and write_parquet (#6967) @vuule
    • Align Series.groupby API to match Pandas (#6964) @kkraus14
    • Share factorize implementation with Index and cudf module (#6885) @brandon-b-miller
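
    Because groupby no longer sorts group keys by default (#7180), code that depends on key order may want to request sorting explicitly. The snippet below is a minimal sketch, not taken from the release notes: the column names and data are made up, and the pandas fallback is an assumption added only so the example can run on a machine without a GPU.

```python
try:
    import cudf as xdf  # GPU DataFrame library; needs a CUDA-capable device
except ImportError:
    import pandas as xdf  # assumption: CPU fallback so the sketch still runs

# Hypothetical data; keys are deliberately out of order.
df = xdf.DataFrame({"key": ["b", "a", "b", "a"], "val": [1, 2, 3, 4]})

# Ask for sorted group keys explicitly instead of relying on the default.
totals = df.groupby("key", sort=True)["val"].sum()

# cuDF objects expose to_pandas(); a pandas Series is used as-is.
if hasattr(totals, "to_pandas"):
    totals = totals.to_pandas()

sorted_keys = list(totals.index)
sums = [int(v) for v in totals.values]
print(sorted_keys, sums)
```

    Passing sort=True keeps the result order independent of the library default, at the cost of the extra sort.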

    🐛 Bug Fixes

    • Fix null-bounds calculation for ranged window queries (#7568) @mythrocks
    • Remove incorrect std::move call on return variable (#7319) @davidwendt
    • Fix failing CI ORC test (#7313) @vuule
    • Disallow constructing frames from a ColumnAccessor (#7298) @shwina
    • fix java cuFile tests (#7296) @rongou
    • Fix style issues related to NumPy (#7279) @shwina
    • Fix bug when iloc slice terminates at before-the-zero position (#7277) @isVoid
    • Fix copying dtype metadata after calling libcudf functions (#7271) @shwina
    • Move lists utility function definition out of header (#7266) @mythrocks
    • Throw if bool column would cause incorrect result when writing to ORC (#7261) @vuule
    • Use uvector in replace_nulls; Fix sort_helper::grouped_value doc (#7256) @isVoid
    • Remove floating point types from cudf::sort fast-path (#7250) @davidwendt
    • Disallow picking output columns from nested columns. (#7248) @devavret
    • Fix loc for Series with a MultiIndex (#7243) @shwina
    • Fix Arrow column test leaks (#7241) @tgravescs
    • Fix test column vector leak (#7238) @kuhushukla
    • Fix some bugs in java scalar support for decimal (#7237) @revans2
    • Improve assert_eq handling of scalar (#7220) @isVoid
    • Fix missing null_count() comparison in test framework and related failures (#7219) @nvdbaranec
    • Remove floating point types from radix sort fast-path (#7215) @davidwendt
    • Fixing parquet benchmarks (#7214) @rgsl888prabhu
    • Handle various parameter combinations in replace API (#7207) @galipremsagar
    • Export mock aws credentials for s3 tests (#7176) @ayushdg
    • Add MultiIndex.rename API (#7172) @isVoid
    • Fix importing list & struct types in from_arrow (#7162) @galipremsagar
    • Fixing parquet precision writing failing if scale is equal to precision (#7146) @hyperbolic2346
    • Update s3 tests to use moto_server (#7144) @ayushdg
    • Fix JIT cache multi-process test flakiness in slow drives (#7142) @devavret
    • Fix compilation errors in libcudf (#7138) @galipremsagar
    • Fix compilation failure caused by -Wall addition. (#7134) @codereport
    • Add informative error message for sep in CSV writer (#7095) @galipremsagar
    • Add JIT cache per compute capability (#7090) @devavret
    • Implement __hash__ method for ListDtype (#7081) @galipremsagar
    • Only upload packages that were built (#7077) @raydouglass
    • Fix comparisons between Series and cudf.NA (#7072) @brandon-b-miller
    • Handle nan values correctly in Series.one_hot_encoding (#7059) @galipremsagar
    • Add unstack() support for non-multiindexed dataframes (#7054) @isVoid
    • Fix read_orc for decimal type (#7034) @rgsl888prabhu
    • Fix backward compatibility of loading a 0.16 pkl file (#7033) @galipremsagar
    • Decimal casts in JNI became a NOOP (#7032) @revans2
    • Restore usual instance/subclass checking to cudf.DateOffset (#7029) @shwina
    • Add days check to cudf::is_timestamp using cuda::std::chrono classes (#7028) @davidwendt
    • Fix to_csv delimiter handling of timestamp format (#7023) @davidwendt
    • Pin librdkafka to gcc 7 compatible version (#7021) @raydouglass
    • Fix fillna & dropna to also consider np.nan as a missing value (#7019) @galipremsagar
    • Fix round operator's HALF_EVEN computation for negative integers (#7014) @nartal1
    • Skip Thrust sort patch if already applied (#7009) @harrism
    • Fix cudf::hash_partition for decimal32 and decimal64 (#7006) @codereport
    • Fix Thrust unroll patch command (#7002) @harrism
    • Fix loc behaviour when key of incorrect type is used (#6993) @shwina
    • Fix int to datetime conversion in csv_read (#6991) @kaatish
    • fix excluding cufile tests by default (#6988) @rongou
    • Fix java cufile tests when cufile is not installed (#6987) @revans2
    • Make cudf::round for fixed_point when scale = -decimal_places a no-op (#6975) @codereport
    • Fix type comparison for java (#6970) @revans2
    • Fix default parameter values of write_csv and write_parquet (#6967) @vuule
    • Align Series.groupby API to match Pandas (#6964) @kkraus14
    • Fix timestamp parsing in ORC reader for timezones without transitions (#6959) @vuule
    • Fix typo in numerical.py (#6957) @rgsl888prabhu
    • fixed_point_value double-shifts in fixed_point construction (#6950) @codereport
    • fix libcu++ include path for jni (#6948) @rongou
    • Fix groupby agg/apply behaviour when no key columns are provided (#6945) @shwina
    • Avoid inserting null elements into join hash table when nulls are treated as unequal (#6943) @hyperbolic2346
    • Fix cudf::merge gtest for dictionary columns (#6942) @davidwendt
    • Pass numeric scalars of the same dtype through numeric binops (#6938) @brandon-b-miller
    • Fix N/A detection for empty fields in CSV reader (#6922) @vuule
    • Fix rmm_mode=managed parameter for gtests (#6912) @davidwendt
    • Fix nullmask offset handling in parquet and orc writer (#6889) @kaatish
    • Correct the sampling range when sampling with replacement (#6884) @ChrisJar
    • Handle nested string columns with no children in contiguous_split. (#6864) @nvdbaranec
    • Fix columns & index handling in dataframe constructor (#6838) @galipremsagar

    📖 Documentation

    • Update readme (#7318) @shwina
    • Fix typo in cudf.core.column.string.extract docs (#7253) @adelevie
    • Update doxyfile project number (#7161) @davidwendt
    • Update 10 minutes to cuDF and CuPy with new APIs (#7158) @ChrisJar
    • Cross link RMM & libcudf Doxygen docs (#7149) @ajschmidt8
    • Add documentation for support dtypes in all IO formats (#7139) @galipremsagar
    • Add groupby docs (#7100) @shwina
    • Update cudf python docstrings with new null representation (<NA>) (#7050) @galipremsagar
    • Make Doxygen comments formatting consistent (#7041) @vuule
    • Add docs for working with missing data (#7010) @galipremsagar
    • Remove warning in from_dlpack and to_dlpack methods (#7001) @miguelusque
    • libcudf Developer Guide (#6977) @harrism
    • Add JNI wrapper for the cuFile API (GDS) (#6940) @rongou

    🚀 New Features

    • Support numeric_only field for rank() (#7213) @isVoid
    • Add support for cudf::binary_operation TRUE_DIV for decimal32 and decimal64 (#7198) @codereport
    • Implement COLLECT rolling window aggregation (#7189) @mythrocks
    • Add support for array-like inputs in cudf.get_dummies (#7181) @galipremsagar
    • Default groupby to sort=False (#7180) @isVoid
    • Add libcudf lists column count_elements API (#7173) @davidwendt
    • Implement cudf::group_by (sort) for decimal32 and decimal64 (#7169) @codereport
    • Add encoding and compression argument to CSV writer (#7168) @VibhuJawa
    • cudf::rolling_window SUM support for decimal32 and decimal64 (#7147) @codereport
    • Adding support for explode to cuDF (#7140) @hyperbolic2346
    • Add libcudf API for parsing of ORC statistics (#7136) @vuule
    • update GDS/cuFile location for 0.9 release (#7131) @rongou
    • Add Segmented sort (#7122) @karthikeyann
    • Add cudf::binary_operation NULL_MIN, NULL_MAX & NULL_EQUALS for decimal32 and decimal64 (#7119) @codereport
    • Add scale and value methods to fixed_point (#7109) @codereport
    • Replace ORC writer api with class (#7099) @rgsl888prabhu
    • Pack/unpack functionality to convert tables to and from a serialized format. (#7096) @nvdbaranec
    • Improve digitize API (#7071) @isVoid
    • Add List types support in data generator (#7064) @galipremsagar
    • cudf::scan support for decimal32 and decimal64 (#7063) @codereport
    • cudf::rolling ROW_NUMBER support for decimal32 and decimal64 (#7061) @codereport
    • Replace parquet writer api with class (#7058) @rgsl888prabhu
    • Support contains() on lists of primitives (#7039) @mythrocks
    • Implement cudf::rolling for decimal32 and decimal64 (#7037) @codereport
    • Add ffill and bfill to string columns (#7036) @isVoid
    • Enable round in cudf for DataFrame and Series (#7022) @ChrisJar
    • Extend replace_nulls_policy to string and dictionary type (#7004) @isVoid
    • Add segmented_gather(list_column, gather_list) (#7003) @karthikeyann
    • Add method field to fillna for fixed width columns (#6998) @isVoid
    • Manual merge of branch 0.17 into branch 0.18 (#6995) @shwina
    • Implement cudf::reduce for decimal32 and decimal64 (part 2) (#6980) @codereport
    • Add Ufunc alias look up for appropriate numpy ufunc dispatching (#6973) @VibhuJawa
    • Add pytest-xdist to dev environment.yml (#6958) @galipremsagar
    • Add Index.set_names api (#6929) @galipremsagar
    • Add replace_null API with replace_policy parameter, fixed_width column support (#6907) @isVoid
    • Share factorize implementation with Index and cudf module (#6885) @brandon-b-miller
    • Implement update() function (#6883) @skirui-source
    • Add groupby idxmin, idxmax aggregation (#6856) @karthikeyann
    • Implement cudf::reduce for decimal32 and decimal64 (part 1) (#6814) @codereport
    • Implement cudf.DateOffset for months (#6775) @brandon-b-miller
    • Add Python DecimalColumn (#6715) @shwina
    • Add dictionary support to libcudf groupby functions (#6585) @davidwendt
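
    As an illustration of one of the Python-facing features listed above, round for DataFrame and Series (#7022), here is a small hedged sketch. The values are invented, the call follows the pandas-style `Series.round(decimals)` signature, and the pandas fallback is again an assumption so the snippet runs without a GPU.

```python
try:
    import cudf as xdf  # GPU DataFrame library; needs a CUDA-capable device
except ImportError:
    import pandas as xdf  # assumption: CPU fallback so the sketch still runs

# Hypothetical values chosen to avoid exact .5 ties after scaling.
s = xdf.Series([1.234, 2.718, -0.456])

# Round every element to two decimal places.
rounded = s.round(2)

# cuDF objects expose to_pandas(); a pandas Series is used as-is.
if hasattr(rounded, "to_pandas"):
    rounded = rounded.to_pandas()

values = [float(v) for v in rounded.values]
print(values)
```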

    🛠️ Improvements

    • Update stale GHA with exemptions & new labels (#7395) @mike-wendt
    • Add GHA to mark issues/prs as stale/rotten (#7388) @Ethyling
    • Unpin from numpy < 1.20 (#7335) @shwina
    • Prepare Changelog for Automation (#7309) @galipremsagar
    • Prepare Changelog for Automation (#7272) @ajschmidt8
    • Add JNI support for converting Arrow buffers to CUDF ColumnVectors (#7222) @tgravescs
    • Add coverage for skiprows and num_rows in parquet reader fuzz testing (#7216) @galipremsagar
    • Define and implement more behavior for merging on categorical variables (#7209) @brandon-b-miller
    • Add CudfSeriesGroupBy to optimize dask_cudf groupby-mean (#7194) @rjzamora
    • Add dictionary column support to rolling_window (#7186) @davidwendt
    • Modify the semantics of end pointers in cuIO to match standard library (#7179) @vuule
    • Adding unit tests for fixed_point with extremely large scales (#7178) @codereport
    • Fast path single column sort (#7167) @davidwendt
    • Fix -Werror=sign-compare errors in device code (#7164) @trxcllnt
    • Refactor cudf::string_view host and device code (#7159) @davidwendt
    • Enable logic for GPU auto-detection in cudfjni (#7155) @gerashegalov
    • Java bindings for Fixed-point type support for Parquet (#7153) @razajafri
    • Add Java interface for the new API 'explode' (#7151) @firestarman
    • Replace offsets with iterators in cuIO utilities and CSV parser (#7150) @vuule
    • Add gbenchmarks for reduction aggregations any() and all() (#7129) @davidwendt
    • Update JNI for contiguous_split packed results (#7127) @jlowe
    • Add JNI and Java bindings for list_contains (#7125) @kuhushukla
    • Add Java unit tests for window aggregate 'collect' (#7121) @firestarman
    • verify window operations on decimal with java tests (#7120) @sperlingxx
    • Adds in JNI support for creating a list column from existing columns (#7112) @revans2
    • Build libcudf with -Wall (#7105) @trxcllnt
    • Add column_device_view pointers to EncColumnDesc (#7097) @kaatish
    • Add pyorc to dev environment (#7085) @galipremsagar
    • JNI support for creating struct column from existing columns and fixed bug in struct with no children (#7084) @revans2
    • Fastpath single strings column in cudf::sort (#7075) @davidwendt
    • Upgrade nvcomp to 1.2.1 (#7069) @rongou
    • Refactor ORC ProtobufReader to make it more extendable (#7055) @vuule
    • Add Java tests for decimal casts (#7051) @sperlingxx
    • Auto-label PRs based on their content (#7044) @jolorunyomi
    • Create sort gbenchmark for strings column (#7040) @davidwendt
    • Refactor io memory fetches to use hostdevice_vector methods (#7035) @ChrisJar
    • Spark Murmur3 hash functionality (#7024) @rwlee
    • Fix libcudf strings logic where size_type is used to access INT32 column data (#7020) @davidwendt
    • Adding decimal writing support to parquet (#7017) @hyperbolic2346
    • Add compression="infer" as default for dask_cudf.read_csv (#7013) @rjzamora
    • Correct ORC docstring; other minor cuIO improvements (#7012) @vuule
    • Reduce number of hostdevice_vector allocations in parquet reader (#7005) @devavret
    • Check output size overflow on strings gather (#6997) @davidwendt
    • Improve representation of MultiIndex (#6992) @galipremsagar
    • Disable some pragma unroll statements in thrust sort.h (#6982) @davidwendt
    • Minor cudf::round internal refactoring (#6976) @codereport
    • Add Java bindings for URL conversion (#6972) @jlowe
    • Enable strict_decimal_types in parquet reading (#6969) @sperlingxx
    • Add in basic support to JNI for logical_cast (#6954) @revans2
    • Remove duplicate file array_tests.cpp (#6953) @karthikeyann
    • Add null mask fixed_point_column_wrapper constructors (#6951) @codereport
    • Update Java bindings version to 0.18-SNAPSHOT (#6949) @jlowe
    • Use simplified rmm::exec_policy (#6939) @harrism
    • Add null count test for apply_boolean_mask (#6903) @harrism
    • Implement DataFrame.quantile for datetime and timedelta data types (#6902) @ChrisJar
    • Remove **kwargs from string/categorical methods (#6750) @shwina
    • Refactor rolling.cu to reduce compile time (#6512) @mythrocks
    • Add static type checking via Mypy (#6381) @shwina
    • Update to official libcu++ on Github (#6275) @trxcllnt