Spatiotemporal resampling methods for mlr3

Overview

mlr3spatiotempcv

Package website: release | dev

This package extends the mlr3 package framework with spatiotemporal resampling and visualization methods.

If you prefer the tidymodels ecosystem, have a look at the {spatialsample} package for spatial sampling methods.

Installation

CRAN version

install.packages("mlr3spatiotempcv")

Development version

remotes::install_github("mlr-org/mlr3spatiotempcv")

# R Universe Repo
install.packages("mlr3spatiotempcv", repos = c(mlrorg = "https://mlr-org.r-universe.dev", CRAN = "https://cloud.r-project.org"))

Get Started

See the "Get Started" vignette for a quick introduction.

For more detailed information, including a usage example, see the "Spatiotemporal Analysis" chapter in the mlr3book.

The article "Spatiotemporal Visualization" shows how 3D subplot grids can be created.

Citation

To cite the package in publications, use the output of citation("mlr3spatiotempcv").

Other spatiotemporal resampling packages

This list does not claim to be comprehensive.

| Name | Language | Resources |
|---|---|---|
| blockCV | R | Paper, CRAN |
| CAST | R | Paper, CRAN |
| ENMeval | R | Paper, CRAN |
| spatialsample | R | CRAN |
| sperrorest | R | Paper, CRAN |
| Pyspatialml | Python | GitHub |
| spacv | Python | GitHub |

FAQ

Which resampling method should I use?
There is no single best resampling method. The choice depends on the characteristics of your dataset and on what your model is meant to predict. The resampling scheme should reflect the final purpose of the model; this concept is called "target-oriented" resampling. For example, if the model was trained on multiple forest plots and its purpose is to predict something on unknown forest stands, the resampling should hold out entire plots rather than individual observations.
Are there more resampling methods than the one {mlr3spatiotempcv} offers?
{mlr3spatiotempcv} aims to offer all resampling methods that exist in R, but it may not cover every one. If you are missing a method, feel free to open an issue.
How can I use the "blocking" concept of the old {mlr}?
This concept is now supported via the "column roles" concept available in {mlr3} [Task](https://mlr3.mlr-org.com/reference/Task.html) objects. See [this documentation](https://mlr3.mlr-org.com/reference/Resampling.html#grouping-blocking) for more information.
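As a rough illustration of the column-role approach, the sketch below assigns the `group` role to a hypothetical `plot_id` column so that all observations from one forest plot stay in the same fold. The data, the column names, and the task id are invented for the example; `as_task_regr()`, `$set_col_roles()`, and `rsmp("cv")` are {mlr3} functions.

```r
library(mlr3)

set.seed(42)
# Hypothetical data: 30 observations from 6 forest plots (assumption)
dat = data.frame(
  plot_id = factor(rep(paste0("plot_", 1:6), each = 5)),
  x1 = rnorm(30),
  y = rnorm(30)
)

task = as_task_regr(dat, target = "y", id = "forest")

# Give plot_id the "group" role: it is no longer a feature, and
# observations sharing a value are kept together during resampling
task$set_col_roles("plot_id", roles = "group")

cv = rsmp("cv", folds = 3)
cv$instantiate(task)

# Plots that end up in the test set of each fold
test_plots = lapply(seq_len(cv$iters), function(i) {
  unique(as.character(dat$plot_id[cv$test_set(i)]))
})
print(test_plots)
```

Each plot should appear in the test set of exactly one fold, which is the behaviour the old "blocking" argument was typically used for.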
For the methods that offer buffering, how can an appropriate value be chosen?
There is no easy answer to this question. Buffering train and test sets reduces the similarity between the two. The degree of this reduction depends on the dataset itself, and there is no general approach for choosing an appropriate buffer size. Some studies used the distance at which the autocorrelation levels off. Such a buffer distance often removes quite a lot of observations and needs to be calculated first.
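To get a feeling for how many observations a given buffer removes, a base R sketch such as the following may help. It is not the package's implementation: the coordinates are simulated, assumed to be in a projected CRS with metres as units, and `buffered_train()` is a hypothetical helper.

```r
set.seed(1)

# Hypothetical coordinates in a projected CRS, units in metres (assumption)
coords = cbind(x = runif(200, 0, 1000), y = runif(200, 0, 1000))
dist_mat = as.matrix(dist(coords))

# Training set for one held-out observation: every point farther away
# than `buffer`; the test point itself (distance 0) is always excluded
buffered_train = function(test_id, buffer) {
  which(dist_mat[test_id, ] > buffer)
}

# Training set size for observation 1 under increasing buffer distances
buffers = c(0, 100, 250, 500)
train_sizes = sapply(buffers, function(b) length(buffered_train(1, b)))
print(data.frame(buffer = buffers, n_train = train_sizes))
```

Running this for a few candidate distances shows how quickly the training set shrinks, which is worth checking before settling on a buffer derived from the autocorrelation range.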
Comments
  • Support resampling method based on predefined spatiotemporal groups

    Just as CAST::CreateSpacetimeFolds() does.

    I am not sure if this approach can work with all currently implemented spatial sampling methods. Even if not, we should support exactly this way of creating resamplings since some people already asked me exactly for this. @HannaMeyer Is there a dedicated name for your method? If not, do you want to make a proposal? :) You can have a look at the current names of the other methods in the README.

    It seems that @jannes-m has added temporal extension support for spcv-coords already. Let's have a look how this works in detail.

    opened by pat-s 15
  • Support presence-background option in "Spatial Buffer CV"

    If the target has a binary outcome, a presence-background approach (see blockCV::buffering) would be possible. Target needs to be transformed to 0/1 before sampling.

    opened by be-marc 11
  • CRS-related warning in autoplot seems incoherent

    autoplot throws the following warning when some of the resampling methods are applied to the ecuador task (see application of methods "spcv_disc" and "spcv_block" in the manuscript):

    CRS not set, transforming to WGS84 (EPSG: 4326).

    This doesn't seem to make sense, as a transformation to WGS84 is not possible when the source CRS is unknown. As far as I can see, the ecuador dataset and task contain only UTM coordinates, not even lat/lon, so it is also not possible to just assume that WGS84 is present or to guess which UTM zone is applicable. Just a minor issue, but potentially confusing...

    Priority: Low 
    opened by alexanderbrenning 9
  • Instantiate spcv_coords for AutoTuner

    Dear mlr3 team,

    first of all, thanks for your efforts in developing this extension package, it is very much appreciated.

    I am trying to apply spatial CV using "spcv_coords" to an AutoTuner in order to retrieve nested resampling following the process described in the mlr3 book

            RT.at_sp <- AutoTuner$new(
              learner = reg.tree,
              resampling = spatial_CV, 
              measure = opt.mse,
              search_space = param_set_RT,
              terminator = trm.evals,
              tuner = tnr.GridSearch)
    

    However, I end up with the error message:

            "Error: Resampling 'spcv_coords' may not be instantiated". 
    

    The same error message remains, even if I try to instantiate the task manually beforehand using the command

            spatial_CV$instantiate(sp_task)
    

    as described in 2.5.2.

    As I am not an expert: am I doing something wrong, or is spatial CV not yet implemented for use with AutoTuner?

    Thank you very much! BR, Jürgen

    Type: Question 
    opened by jue-d 9
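The wording of the error suggests that the resampling handed to the AutoTuner must not be instantiated, since the AutoTuner instantiates it on each training set itself. A minimal sketch of nested resampling under that assumption follows. The learner, search space, and tuning budget are placeholders chosen for illustration and are not taken from the question.

```r
library(mlr3)
library(mlr3tuning)
library(mlr3spatiotempcv)
library(paradox)

task = tsk("ecuador")

at = AutoTuner$new(
  learner = lrn("classif.rpart"),
  # inner resampling: created but NOT instantiated
  resampling = rsmp("spcv_coords", folds = 3),
  measure = msr("classif.ce"),
  search_space = ps(cp = p_dbl(lower = 0.001, upper = 0.1)),
  terminator = trm("evals", n_evals = 4),
  tuner = tnr("grid_search", resolution = 4)
)

# outer resampling; resample() takes care of instantiating it
outer = rsmp("spcv_coords", folds = 3)
rr = resample(task, at, outer)
score = rr$aggregate(msr("classif.ce"))
print(score)
```

All arguments are passed by name because the argument order of `AutoTuner$new()` has changed between {mlr3tuning} versions.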
  • Planar versus great circle distance

    When using a spatial object with an unprojected CRS (i.e. lat/lon), does mlr3spatiotempcv use great circle distances (on the ellipsoid) or Euclidean distances based on lat/lon values? Is this handled consistently across resampling tools, e.g. buffering and clustering? This should be clarified in the paper, but perhaps it should be turned into a feature request...

    Priority: Low Type: Question 
    opened by alexanderbrenning 8
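Why the distinction raised in this question matters can be seen in a small base R comparison. The sketch below uses the haversine formula on a sphere with an assumed mean Earth radius of 6371 km (a simplification of the ellipsoidal case) and compares it with a naive Euclidean distance on the raw degree values. Both helper functions are hypothetical and say nothing about what the package does internally.

```r
# Great-circle distance in km on a sphere (haversine formula);
# the mean Earth radius of 6371 km is an assumption of this sketch
haversine_km = function(lon1, lat1, lon2, lat2, radius = 6371) {
  to_rad = pi / 180
  dlat = (lat2 - lat1) * to_rad
  dlon = (lon2 - lon1) * to_rad
  a = sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius * asin(sqrt(a))
}

# Naive Euclidean distance computed directly on lon/lat values (degrees)
euclid_deg = function(lon1, lat1, lon2, lat2) {
  sqrt((lon2 - lon1)^2 + (lat2 - lat1)^2)
}

# One degree of longitude at the equator and at 60 degrees north
gc_equator = haversine_km(0, 0, 1, 0)
gc_north = haversine_km(0, 60, 1, 60)
eu_equator = euclid_deg(0, 0, 1, 0)
eu_north = euclid_deg(0, 60, 1, 60)

print(c(gc_equator = gc_equator, gc_north = gc_north))
print(c(eu_equator = eu_equator, eu_north = eu_north))
```

The great-circle distance for one degree of longitude is roughly 111 km at the equator and about half of that at 60 degrees north, while the Euclidean distance in degrees is 1 in both cases. Buffers or clusters computed on raw lat/lon values would therefore be distorted at higher latitudes.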
  • Visualization

    #6

    plot.ResamplingSpCVBlock, plot.ResamplingSpCVEnv, plot.ResamplingSpCVKmeans are the same. We could create a super class to just have one plotting function. plot.ResamplingSpCVBuffer is different because it is a leave-one-out cross-validation, which cannot be visualized in the same way.

    • [x] Single fold plot

    • [x] Single train-test plot

    • [x] Multi train-test plot

    • [x] Unify redundant code

    • [x] Update documentation

    • [x] add tests

    Examples are in inst/mlr3spatiotemporal_test.R at the end of the file.

    opened by be-marc 8
  • Checkerboard pattern with spcv_block?

    Dear mlr3spatiotempcv team,

    First, many thanks for your hard work on this excellent resource.

    I am having issues producing a checkerboard sampling pattern using spcv_block. Instead of getting a checkerboard spatial partitioning, I always get something that looks more like a random sampling pattern. I have been successful creating a checkerboard pattern using the blockCV functions directly.

    Here is a reproducible example that fails to produce a checkerboard sampling pattern:

    library(blockCV)
    library(mlr3)
    library(mlr3spatiotempcv)
    
    x <- runif(5000, -80.5, -75)
    y <- runif(5000, 39.7, 42)
    
    data <- data.frame(spp="test", 
                       label=factor(round(runif(length(x), 0, 1))),
                       x=x,
                       y=y)
    
    testTask <- TaskClassifST$new(id = "test", 
                                  backend = data, 
                                  target = "label",
                                  positive="1",
                                  extra_args = list(coordinate_names=c("x", "y"),
                                                    crs="EPSG: 4326"))
    
    blockSamp <- rsmp("spcv_block",
                      folds=2,
                      range=50000,
                      selection="checkerboard")
    blockSamp$instantiate(testTask)
    autoplot(blockSamp, testTask)
    

    Priority: High Status: In Progress Type: Bug 
    opened by fitzLab-AL 7
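For reference, the pattern expected in this report can be sketched in a few lines of base R. This is not the blockCV implementation; `checkerboard_fold()` is a hypothetical helper, and the simulated coordinates are assumed to be projected, with the block size given in the same units.

```r
# Assign points to two folds in a checkerboard pattern.
# `block_size` must be in the same units as the coordinates.
checkerboard_fold = function(x, y, block_size) {
  col_id = floor(x / block_size)
  row_id = floor(y / block_size)
  # adjacent blocks get alternating fold numbers 1 and 2
  (col_id + row_id) %% 2 + 1
}

set.seed(1)
# Hypothetical projected coordinates in metres (assumption)
pts = data.frame(x = runif(500, 0, 400000), y = runif(500, 0, 200000))
pts$fold = checkerboard_fold(pts$x, pts$y, block_size = 50000)
print(table(pts$fold))
```

Because the pattern only alternates between two classes, it yields exactly two folds, which is in line with the assertion on folds > 2 for selection = "checkerboard" listed in the release notes.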
  • Code Review

    • Be reasonable with dependencies. E.g., we do not need stringr for str_detect, just use grepl() instead.

    • Some examples are in DONTRUN. Can we put them in if (requireNamespace(...)) blocks instead?

    • ResamplingSpCVBuffer looks similar to LOO. If this is the case, the instance should be stored more efficiently.

    • autoplot tests are not working for me or are way too slow for unit tests (I'm stuck in there with 100% CPU)

    • blockCV::spatialBlock() seems to call print(). It is convention (maybe even a CRAN policy?) to use message(), which can be suppressed. This should be reported upstream.

    • blockCV is terribly slow (and has a stupid long dependency chain). Is there an alternative?

    • Suggested packages should be explicitly attached with require_namespaces.

    opened by mllg 7
  • `.$folds()` of all Repeated* classes returns wrong fold number

    library(mlr3spatiotempcv)
    library(mlr3)
    rsp <- rsmp("repeated_spcv_coords", folds = 3, repeats = 5)
    rsp$instantiate(tsk("ecuador"))
    
    # should return 3
    rsp$folds(6)
    #> [1] 1
    

    Created on 2021-04-17 by the reprex package (v2.0.0)

    This is because in https://github.com/mlr-org/mlr3spatiotempcv/blob/b9ded4ac098655dc00c300b48426bd6d4cd0a97a/R/ResamplingRepeatedSpCVCoords.R#L54 %% is used whereas it should be %/%.

    But there is more to it - I think the method should look like

        folds = function(iters) {
          iters = assert_integerish(iters, any.missing = FALSE, coerce = TRUE)
          n_folds = ((self$iters - 1L) %/% as.integer(self$param_set$values$repeats)) + 1L
    
          if (all(iters <= n_folds)) {
            return(iters)
          } else {
            # modify all entries which are > n_folds
            iters[which(iters > n_folds)] = iters[which(iters > n_folds)] - n_folds
            return(iters)
          }
        }
    
    Priority: High Status: Accepted Type: Bug 
    opened by pat-s 6
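The mapping this report expects (iteration 6 of a 3-fold, 5-times repeated resampling is fold 3) can also be written with plain modular arithmetic. The sketch below is standalone base R, not the package code. It assumes iterations are numbered consecutively, repeat by repeat, and both helper names are hypothetical.

```r
# Map consecutive iteration numbers to fold and repeat indices, assuming
# iterations are ordered repeat by repeat (1..n_folds belong to repeat 1)
iter_to_fold = function(iters, n_folds) {
  ((as.integer(iters) - 1L) %% n_folds) + 1L
}
iter_to_repeat = function(iters, n_folds) {
  ((as.integer(iters) - 1L) %/% n_folds) + 1L
}

# 3 folds x 5 repeats = 15 iterations
iters = 1:15
mapping = data.frame(
  iter = iters,
  fold = iter_to_fold(iters, 3L),
  rep = iter_to_repeat(iters, 3L)
)
print(mapping)
```

Using the number of folds as the modulus works for every repeat, whereas subtracting the number of folds once only covers iterations from the second repeat.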
  • Expand Table listing all resampling methods

    • [x] add some use cases for each method
    • [x] list more implementations for each method (eventually also in other languages?)

    Also consider to move methods operating in the feature space into a distinct group, e.g.:

    • spatial
    • spatiotemporal
    • feature space
    opened by pat-s 6
  • New `Task*ST` API, consolidate `autoplot()`

    • arguments crs, coordinate_names and coords_as_features are now passed directly in the constructor instead of list extra_args
    • added argument label
    • improved as_task_* converters
    • Column role coordinates was renamed to coordinate to cope with the singular naming of column roles
    • Task printer only returns the first 10 coordinate rows
    • Consolidated autoplot() code internally
    • Improved CLUTO test setup

    fixes #116

    opened by pat-s 5
  • Please remove dependencies on **rgdal**, **rgeos**, and/or **maptools**

    This package depends on (depends, imports or suggests) raster and one or more of the retiring packages rgdal, rgeos or maptools (https://r-spatial.org/r/2022/04/12/evolution.html, https://r-spatial.org/r/2022/12/14/evolution2.html). Since raster 3.6.3, all use of external FOSS library functionality has been transferred to terra, making the retiring packages very likely redundant. It would help greatly if you could remove dependencies on the retiring packages as soon as possible.

    opened by rsbivand 0
  • `as_task_*_st` and friends could allow setting column roles directly

    We could support this via the ellipsis. Otherwise, setting the respective column roles could easily be forgotten. On the other hand, the behaviour would then differ from as_task_*() in mlr3, as such custom conversions are not supported there.

    @be-marc what do you think?

    Example:

    data("cookfarm_sample", package = "mlr3spatiotempcv")
    
    # data.frame
    as_task_regr_st(cookfarm_sample, target = "PHIHOX",
      coords_as_features = FALSE,
      crs = 26911,
      coordinate_names = c("x", "y"),
      column_role_space = "foo",
      column_role_time = "time"
    )
    
    Priority: Low Status: Review Needed Type: Optimization 
    opened by pat-s 2
  • Longterm play of Task*ST and DataBackends

    With the addition of spatial DataBackends (DataBackendVector and DataBackendRaster) from {mlr3spatial} multiple combinations of Tasks and Backends are possible:

    • Task + spatial backends
    • Task*ST + non-spatial backend
    • Task*ST + spatial backend

    Moving forward and to simplify both usage and development, we should pick one combination as the "recommended" one and potentially issue warnings for others.

    cc @be-marc

    Priority: Medium Status: In Progress Type: Optimization 
    opened by pat-s 1
  • New SpCV method Zalazar et al.

    Unfortunately the GH repo leads to a 404. Contacted the author, he wants to fix it.

    https://www.sciencedirect.com/science/article/abs/pii/S0920410521015023

    opened by pat-s 0
  • Temporal CV

    I currently have a task with a column that is a date. As the task is basically to predict values in the future, a cross-validation strategy that takes this into account would be required, similar to RollingWindowCV. As this is a very common use case, we should perhaps think about implementing this.

    • This is implemented in mlr3forecasting, but for forecasting tasks instead of regular Classif|Regr Tasks.
    • Where should such a method live? mlr3spatiotempcv ?
    • How would we go about implementing this?
    Priority: Medium Status: Accepted Type: Enhancement 
    opened by pfistfl 13
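One way such a time-aware split could look is a forward-chaining scheme in which every fold trains on everything before a cutoff and tests on the block that follows. The base R sketch below is only an illustration of that idea. It is not taken from mlr3forecasting or any other package, and `rolling_origin_folds()` is a hypothetical helper.

```r
# Forward-chaining (rolling-origin) folds: train on all observations up to
# a cutoff, test on the block that follows. Helper name is hypothetical.
rolling_origin_folds = function(dates, n_folds) {
  ord = order(dates)
  # split the time-ordered row ids into n_folds + 1 consecutive blocks
  block = cut(seq_along(ord), breaks = n_folds + 1, labels = FALSE)
  lapply(seq_len(n_folds), function(k) {
    list(train = ord[block <= k], test = ord[block == k + 1])
  })
}

set.seed(1)
# Hypothetical date column: 120 unique days in random row order
dates = sample(seq(as.Date("2020-01-01"), by = "day", length.out = 120))
folds = rolling_origin_folds(dates, n_folds = 3)

# Within each fold, every training date precedes every test date
ordered_ok = sapply(folds, function(f) max(dates[f$train]) < min(dates[f$test]))
print(ordered_ok)
```

The training window grows with each fold here. A fixed-width rolling window would instead drop the oldest block as the cutoff moves forward.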
Releases(v2.0.3)
  • v2.0.3(Nov 19, 2022)

  • v2.0.2(Aug 9, 2022)

    • Add error message when trying to create a TaskClassifST or TaskRegrST from an sf object
    • Synchronize TaskClassifST or TaskRegrST with {mlr3spatial}
    • Add support for mlr_reflections changes in {mlr3} > 0.13.4
    • Adjust "Getting Started" vignette to recent API changes
    • autoplot.ResamplingSptCVCstf(): Add missing support for argument axis_label_fontsize for x and y axes
  • v2.0.1(Jun 23, 2022)

    Bugfixes

    • autoplot.ResamplingSptCVCstf: when multiple folds are requested, the subplots are now returned again (before, the return was empty)
    • autoplot.ResamplingSptCVCstf: the legend item for the "omitted" observations now displays the correct color and label again
  • v2.0.0(Jun 15, 2022)

    Breaking

    • Rename task cookfarm to cookfarm_mlr3. This was done to distinguish the cookfarm task implementation in {mlr3} better from the original cookfarm dataset. cookfarm_mlr3 also now comes with all rows of the upstream cookfarm task and not with a random subset as before.
    • Rewrite mlr_resamplings_sptcv_cstf implementation. The method will produce different fold results compared to {mlr3spatiotempcv} <= 1.0.1. This is because of a change/fix in the sampling behavior: before, an (unwanted) stratified sampling was done on time and space variables. While this matched the upstream implementation in {CAST}, it did not match the theoretical underpinning described in the literature.

    Features

    • Add support for DataBackendRaster (@be-marc, #191).
    • mlr_resamplings_sptcv_cstf: a log message returns the column roles from the Task which are used for partitioning
    • The help pages for all methods now describe the methods manually rather than importing the upstream documentation of the respective method.
    • Task*ST classes now print column roles space and time (if set) (#198)
    • autoplot() gains plot_time_var argument for 3D visualizations of mlr_resamplings_sptcv_cstf resamplings with only 'space' used for partitioning (#197)
    • Vignette updates

    Bugfixes

    • All {mlr3spatiotempcv} methods now comply with the {mlr3} man file declaration logic.

    Misc

    • Escape all examples and tests for non-installed packages.
    • The cookfarm_mlr3 task now sets column roles "space" and "time" for variables SOURCEID and Date, respectively.
    • Harden CLUTO tests (#182)
    • Large update for the "spatiotemporal" section in the mlr3book
  • v1.0.1(Mar 3, 2022)

    • Fixed an issue which caused coordinates to appear in the feature set when a data.frame was supplied (#166, @be-marc)
    • Add autoplot() support for "groups" column role in rsmp("cv")
  • v1.0.0(Aug 19, 2021)

    Breaking

    • autoplot(): removed argument crs. The CRS is now inferred from the supplied Task. Setting a different CRS than the task might lead to spurious issues and the initial idea of changing the CRS for plotting to have proper axes labeling does not apply (anymore) (#144)

    Features

    • Added autoplot() support for ResamplingCustomCV (#140)

    Bug fixes

    • "spcv_block": Assert error if folds > 2 when selection = "checkerboard" (#150)
    • Fixed row duplication when creating TaskRegrST tasks from sf objects (#152)

    Miscellaneous

    • Upgrade tests to {vdiffr} 1.0.0
    • Add {rgdal} to Suggests and require it in "spcv_block" since it is required in {blockCV} >= 2.1.4 and {sf} >= 1.0
  • v0.4.1(Jun 24, 2021)

    • Upgrade tests to {vdiffr} 1.0.0
    • Add {rgdal} to Suggests and require it in "spcv_block" since it is required in {blockCV} >= 2.1.4 and {sf} >= 1.0
  • v0.4.0(Jun 3, 2021)

    Features

    • Support clustering coords only for "sptcv_cluto"
    • Add as_task_* S3 generics: as_task_classif_st.data.frame(), as_task_classif_st.DataBackend(), as_task_classif_st.sf(), as_task_regr_st.data.frame(), as_task_regr_st.DataBackend(), as_task_regr_st.sf(), as_task_classif.TaskClassifST(), as_task_regr.TaskRegrST() (#99)
    • Add "spcv_tiles" and "repeated_spcv_tiles" (#121)
    • Add "spcv_disc" (#115)

    Bug Fixes

    • Fixed train set issues for sptcv_cstf() with space and time var (#135)
    • Fixed $folds() active binding returning wrong fold number (#120)
    • Add missing man IDs (#122)

    Misc

    • Add example 2D spatial plots to spatiotemp-viz vignette
    • Add {caret} to Suggests
    • "Cstf" methods: remove arguments in favor of param set to align with other methods (#122)
    • Inherit documentation from upstream functions (#117)
    • Vignette: Update and categorize table listing all implemented methods
  • v0.3.0(Apr 13, 2021)

    New Features

    • autoplot.ResamplingSptCVCstf(): add 2D plotting method (#106)
    • autoplot.ResamplingSptCVCstf(): add arguments show_omitted and static_image (#100)
    • autoplot() (all methods): allow adjusting point size via ... (#98)

    Maintenance

    • Remove {GSIF} package due to CRAN archival and host the cookfarm dataset standalone
    • Use Cstf method for spatiotemporal viz vignette
    • Fix help page content of ResamplingRepeatedSptCVCstf (beforehand the Cluto method was referenced accidentally)
    • Fix segfault in autoplot.ResamplingSpcvBlock example when rendering pkgdown site (unclear why this happens when show_labels = TRUE)
    • Update autoplot() examples and related documentation
    • Remove duplicate resources in Tasks "see also" fields
    • Skip a test on Solaris and macOS 3.6
    • Optimize "Spatiotemporal Visualization" vignette
  • v0.2.1(Mar 20, 2021)

    • Add support for rasterLayer argument in blockCV::spatialBlock() (#94)
    • Ensure that blockCV::spatialBlock() actually returns the same result when invoked via {mlr3spatiotempcv} (#93). Among other issues, blockCV::spatialBlock(selection = "checkerboard") was ignored.
    • Get coordinate names from {sf} objects dynamically. Previously, some functions would have errored if the coordinates were not named "x" and "y".
  • v0.2.0(Mar 8, 2021)

  • v0.1.1(Jan 6, 2021)

  • v0.1.0(Dec 26, 2020)

  • v0.0.0.9006(Oct 27, 2020)
