Spatiotemporal resampling methods for mlr3

Overview

mlr3spatiotempcv

Package website: release | dev

This package extends the mlr3 package framework with spatiotemporal resampling and visualization methods.

If you prefer the tidymodels ecosystem, have a look at the {spatialsample} package for spatial sampling methods.

Installation

CRAN version

install.packages("mlr3spatiotempcv")

Development version

remotes::install_github("mlr-org/mlr3spatiotempcv")

# R Universe Repo
install.packages('mlr3spatiotempcv', repos = c(mlrorg = 'https://mlr-org.r-universe.dev'))

Get Started

See the "Get Started" vignette for a quick introduction.

For more detailed information, including a usage example, see the "Spatiotemporal Analysis" chapter in the mlr3book.

The article "Spatiotemporal Visualization" shows how 3D subplot grids can be created.
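
As a rough orientation before reading the vignette, the following minimal sketch runs a spatial cross-validation on the `ecuador` example task. The featureless learner, the fold and repeat counts, and the classification error measure are choices made for this illustration only and are not taken from the vignette.

```r
library(mlr3)
library(mlr3spatiotempcv)

# Example task shipped with the package
task = tsk("ecuador")

# Repeated spatial CV based on the coordinates; 4 folds and 2 repeats are arbitrary here
resampling = rsmp("repeated_spcv_coords", folds = 4, repeats = 2)

# Featureless baseline learner, chosen only to avoid extra dependencies
learner = lrn("classif.featureless")

rr = resample(task, learner, resampling)
rr$aggregate(msr("classif.ce"))
```

Swapping in a real learner and a different `rsmp()` key follows the same pattern.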

Citation

To cite the package in publications, use the output of citation("mlr3spatiotempcv").

Other spatiotemporal resampling packages

This list does not claim to be comprehensive.

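| Name          | Language | Resources   |
|---------------|----------|-------------|
| blockCV       | R        | Paper, CRAN |
| CAST          | R        | Paper, CRAN |
| ENMeval       | R        | Paper, CRAN |
| spatialsample | R        | CRAN        |
| sperrorest    | R        | Paper, CRAN |
| Pyspatialml   | Python   | GitHub      |
| spacv         | Python   | GitHub      |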

FAQ

Which resampling method should I use?
There is no single best resampling method. The choice depends on the characteristics of your dataset and on what your model is meant to predict. The resampling scheme should reflect the final purpose of the model; this concept is called "target-oriented" resampling. For example, if the model was trained on multiple forest plots and its purpose is to predict something on unknown forest stands, the resampling structure should reflect this.
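
The forest plot example can be sketched in base R as follows. The data, the column names `plot_id` and `height`, and the leave-one-plot-out scheme are assumptions made for illustration; this is not how the package builds its folds.

```r
set.seed(42)

# Hypothetical data: 12 trees measured on 4 forest plots
trees = data.frame(
  plot_id = rep(c("A", "B", "C", "D"), each = 3),
  height  = round(runif(12, 10, 30), 1)
)

# One fold per plot: the test set is a whole plot, the training set is the rest
plots = unique(trees$plot_id)
folds = lapply(plots, function(p) {
  list(
    train = which(trees$plot_id != p),
    test  = which(trees$plot_id == p)
  )
})
names(folds) = plots

# Check that no plot appears on both sides of any split
leak = vapply(folds, function(f) {
  length(intersect(trees$plot_id[f$train], trees$plot_id[f$test])) > 0
}, logical(1))
any(leak)  # FALSE
```

Because every test set is a plot the model has never seen, the estimated error refers to prediction on unknown plots rather than on additional trees from known plots.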
Are there more resampling methods than the ones {mlr3spatiotempcv} offers?
{mlr3spatiotempcv} aims to offer all resampling methods that exist in R, but this does not mean that it already covers every method. If you are missing one, feel free to open an issue.
How can I use the "blocking" concept of the old {mlr}?
This concept is now supported via the "column roles" concept available in {mlr3} [Task](https://mlr3.mlr-org.com/reference/Task.html) objects. See [this documentation](https://mlr3.mlr-org.com/reference/Resampling.html#grouping-blocking) for more information.
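
A minimal sketch of the column role approach, assuming the `penguins` example task from {mlr3} and treating its `island` column as the grouping variable (both are illustrative choices, not part of the linked documentation):

```r
library(mlr3)

task = tsk("penguins")

# Assign the "group" role: rows of the same island stay together in a fold
task$set_col_roles("island", roles = "group")

resampling = rsmp("cv", folds = 3)
resampling$instantiate(task)

# Compare the groups in the train and test set of the first fold
groups = task$groups
train_groups = unique(as.character(groups$group[groups$row_id %in% resampling$train_set(1)]))
test_groups  = unique(as.character(groups$group[groups$row_id %in% resampling$test_set(1)]))
intersect(train_groups, test_groups)
```

If the grouping is respected, the intersection is empty, meaning no island contributes rows to both sides of the split.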
For the methods that offer buffering, how can an appropriate value be chosen?
There is no easy answer to this question. Buffering train and test sets reduces the similarity between the two. The degree of this reduction depends on the dataset itself, and there is no general approach for choosing an appropriate buffer size. Some studies used the distance at which the autocorrelation levels off. This buffer distance often removes quite a lot of observations and needs to be calculated first.
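
To get a feeling for how many observations a given buffer removes, a base R sketch such as the following can help. The helper `buffered_train()`, the simulated coordinates, and the candidate buffer distances are hypothetical and only illustrate the trade-off; planar (Euclidean) distances are assumed.

```r
# Hypothetical helper: training indices for one test point, dropping every
# observation within `buffer` (same unit as the coordinates)
buffered_train = function(coords, test_idx, buffer) {
  d = as.matrix(dist(coords))[test_idx, ]
  which(d > buffer)
}

set.seed(1)
coords = cbind(x = runif(200, 0, 1000), y = runif(200, 0, 1000))

# Training set size for test point 1 under increasing buffer distances
buffers = c(0, 100, 250, 500)
sizes = vapply(buffers, function(b) {
  length(buffered_train(coords, test_idx = 1, buffer = b))
}, integer(1))
data.frame(buffer = buffers, n_train = sizes)
```

The training set shrinks as the buffer grows, so a buffer derived from the autocorrelation range should be checked against the number of observations that remain.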
Comments
  • Support resampling method based on predefined spatiotemporal groups

    Just as CAST::CreateSpacetimeFolds() does.

    I am not sure if this approach can work with all currently implemented spatial sampling methods. Even if not, we should support exactly this way of creating resamplings since some people already asked me exactly for this. @HannaMeyer Is there a dedicated name for your method? If not, do you want to make a proposal? :) You can have a look at the current names of the other methods in the README.

    It seems that @jannes-m has added temporal extension support for spcv-coords already. Let's have a look how this works in detail.

    opened by pat-s 15
  • Support presence-background option in "Spatial Buffer CV"

    If the target has a binary outcome, a presence-background approach (see blockCV::buffering) would be possible. Target needs to be transformed to 0/1 before sampling.

    opened by be-marc 11
  • CRS-related warning in autoplot seems incoherent

    autoplot throws the following warning when some of the resampling methods are applied to the ecuador task (see application of methods "spcv_disc" and "spcv_block" in the manuscript):

    CRS not set, transforming to WGS84 (EPSG: 4326).

    This doesn't seem to make sense, as a transformation to WGS84 is not possible when the source CRS is unknown. As far as I can see, the ecuador dataset and task contain only UTM coordinates, not even lat/lon, so it is also not possible to just assume that WGS84 is present or to guess which UTM zone is applicable. Just a minor issue, but potentially confusing...

    Priority: Low 
    opened by alexanderbrenning 9
  • Instantiate spcv_coords for AutoTuner

    Dear mlr3 team,

    first of all, thanks for your efforts in developing this extension package, it is very much appreciated.

    I am trying to apply spatial CV using "spcv_coords" to an AutoTuner in order to retrieve nested resampling following the process described in the mlr3 book

            RT.at_sp <- AutoTuner$new(
              learner = reg.tree,
              resampling = spatial_CV, 
              measure = opt.mse,
              search_space = param_set_RT,
              terminator = trm.evals,
              tuner = tnr.GridSearch)
    

    However, I end up with the error message:

            "Error: Resampling 'spcv_coords' may not be instantiated". 
    

    The same error message remains, even if I try to instantiate the task manually beforehand using the command

            spatial_CV$instantiate(sp_task)
    

    as described in 2.5.2.

    As I am not an expert, am I doing something wrong, or is spatial CV not yet implemented for use with AutoTuner?

    Thank you very much! BR, Jürgen

    Type: Question 
    opened by jue-d 9
  • Planar versus great circle distance

    When using a spatial object with an unprojected CRS (i.e. lat/lon), does mlr3spatiotempcv use great circle distances (on the ellipsoid) or Euclidean distances based on lat/lon values? Is this handled consistently across resampling tools, e.g. buffering and clustering? This should be clarified in the paper, but perhaps it should be turned into a feature request...

    Priority: Low Type: Question 
    opened by alexanderbrenning 8
  • Visualization

    #6

    plot.ResamplingSpCVBlock, plot.ResamplingSpCVEnv, plot.ResamplingSpCVKmeans are the same. We could create a super class to just have one plotting function. plot.ResamplingSpCVBuffer is different because it is a leave-one-out cross-validation, which cannot be visualized in the same way.

    • [x] Single fold plot

    • [x] Single train-test plot

    • [x] Multi train-test plot

    • [x] Unify redundant code

    • [x] Update documentation

    • [x] add tests

    Examples are in inst/mlr3spatiotemporal_test.R at the end of the file.

    opened by be-marc 8
  • Checkerboard pattern with spcv_block?

    Dear mlr3spatiotempcv team,

    First, many thanks for your hard work on this excellent resource.

    I am having issues producing a checkerboard sampling pattern using spcv_block. Instead of getting a checkerboard spatial partitioning, I always get something that looks more like a random sampling pattern. I have been successful creating a checkerboard pattern using the blockCV functions directly.

    Here is a reproducible example that fails to produce a checkerboard sampling pattern:

    library(blockCV)
    library(mlr3)
    library(mlr3spatiotempcv)
    
    x <- runif(5000, -80.5, -75)
    y <- runif(5000, 39.7, 42)
    
    data <- data.frame(spp="test", 
                       label=factor(round(runif(length(x), 0, 1))),
                       x=x,
                       y=y)
    
    testTask <- TaskClassifST$new(id = "test", 
                                  backend = data, 
                                  target = "label",
                                  positive="1",
                                  extra_args = list(coordinate_names=c("x", "y"),
                                                    crs="EPSG: 4326"))
    
    blockSamp <- rsmp("spcv_block",
                      folds=2,
                      range=50000,
                      selection="checkerboard")
    blockSamp$instantiate(testTask)
    autoplot(blockSamp, testTask)
    

    Priority: High Status: In Progress Type: Bug 
    opened by fitzLab-AL 7
  • Code Review

    • Be reasonable with dependencies. E.g., we do not need stringr for str_detect, just use grepl() instead.

    • Some examples are in DONTRUN. Can we put them in if (requireNamespace(...)) blocks instead?

    • ResamplingSpCVBuffer looks similar to LOO. If this is the case, the instance should be stored more efficiently.

    • autoplot tests are not working for me or are waaay too slow for unit tests (I'm stuck in there with 100% CPU)

    • blockCV::spatialBlock() seems to call print(). It is convention (maybe even a CRAN policy?) to use message(), which can be suppressed. This should be reported upstream.

    • blockCV is terribly slow (and has a stupid long dependency chain). Is there an alternative?

    • Suggested packages should be explicitly attached with require_namespaces.

    opened by mllg 7
  • `.$folds()` of all Repeated* classes returns wrong fold number

    library(mlr3spatiotempcv)
    library(mlr3)
    rsp <- rsmp("repeated_spcv_coords", folds = 3, repeats = 5)
    rsp$instantiate(tsk("ecuador"))
    
    # should return 3
    rsp$folds(6)
    #> [1] 1
    

    Created on 2021-04-17 by the reprex package (v2.0.0)

    This is because in https://github.com/mlr-org/mlr3spatiotempcv/blob/b9ded4ac098655dc00c300b48426bd6d4cd0a97a/R/ResamplingRepeatedSpCVCoords.R#L54 %% is used whereas it should be %/%.

    But there is more to it - I think the method should look like

        folds = function(iters) {
          iters = assert_integerish(iters, any.missing = FALSE, coerce = TRUE)
          n_folds = ((self$iters - 1L) %/% as.integer(self$param_set$values$repeats)) + 1L
    
          if (all(iters <= n_folds)) {
            return(iters)
          } else {
            # modify all entries which are > n_folds
            iters[which(iters > n_folds)] = iters[which(iters > n_folds)] - n_folds
            return(iters)
          }
        }
    
    Priority: High Status: Accepted Type: Bug 
    opened by pat-s 6
  • Expand Table listing all resampling methods

    • [x] add some use cases for each method
    • [x] list more implementations for each method (eventually also in other languages?)

    Also consider to move methods operating in the feature space into a distinct group, e.g.:

    • spatial
    • spatiotemporal
    • feature space
    opened by pat-s 6
  • New `Task*ST` API, consolidate `autoplot()`

    • arguments crs, coordinate_names and coords_as_features are now passed directly in the constructor instead of list extra_args
    • added argument label
    • improved as_task_* converters
    • Column role coordinates was renamed to coordinate to cope with the singular naming of column roles
    • Task printer only returns the first 10 coordinate rows
    • Consolidated autoplot() code internally
    • Improved CLUTO test setup

    fixes #116

    opened by pat-s 5
  • Please remove dependencies on **rgdal**, **rgeos**, and/or **maptools**

    This package depends on (depends, imports or suggests) raster and one or more of the retiring packages rgdal, rgeos or maptools (https://r-spatial.org/r/2022/04/12/evolution.html, https://r-spatial.org/r/2022/12/14/evolution2.html). Since raster 3.6.3, all use of external FOSS library functionality has been transferred to terra, making the retiring packages very likely redundant. It would help greatly if you could remove dependencies on the retiring packages as soon as possible.

    opened by rsbivand 0
  • `as_task_*_st` and friends could allow setting column roles directly

    We could support this via the ellipsis. Otherwise, setting the respective column roles could easily be forgotten. On the other hand, the behaviour would then differ from as_task_*() in mlr3, as such custom conversions would not be supported there.

    @be-marc what do you think?

    Example:

    data("cookfarm_sample", package = "mlr3spatiotempcv")
    
    # data.frame
    as_task_regr_st(cookfarm_sample, target = "PHIHOX",
      coords_as_features = FALSE,
      crs = 26911,
      coordinate_names = c("x", "y"),
      column_role_space = "foo",
      column_role_time = "time"
    )
    
    Priority: Low Status: Review Needed Type: Optimization 
    opened by pat-s 2
  • Longterm play of Task*ST and DataBackends

    With the addition of spatial DataBackends (DataBackendVector and DataBackendRaster) from {mlr3spatial} multiple combinations of Tasks and Backends are possible:

    • Task + spatial backends
    • Task*ST + non-spatial backend
    • Task*ST + spatial backend

    Moving forward and to simplify both usage and development, we should pick one combination as the "recommended" one and potentially issue warnings for others.

    cc @be-marc

    Priority: Medium Status: In Progress Type: Optimization 
    opened by pat-s 1
  • New SpCV method Zalazar et al.

    Unfortunately the GH repo leads to a 404. Contacted the author, he wants to fix it.

    https://www.sciencedirect.com/science/article/abs/pii/S0920410521015023

    opened by pat-s 0
  • Temporal CV

    I currently have a task with a column that is a date. As the task is basically to predict values in the future, a cross-validation strategy that takes this into account is required, similar to RollingWindowCV. As this is a very common use case, we should perhaps think about implementing this.

    • This is implemented in mlr3forecasting, but for forecasting tasks instead of regular Classif|Regr Tasks.
    • Where should such a method live? mlr3spatiotempcv?
    • How would we go about implementing this?
    Priority: Medium Status: Accepted Type: Enhancement 
    opened by pfistfl 13
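
The rolling-window idea raised in the last item can be sketched in base R. The helper `rolling_splits()` and its arguments `window`, `horizon`, and `fixed` are hypothetical names for this illustration; they do not mirror the API of mlr3forecasting or of this package, and the sketch assumes at least `window + horizon` observations.

```r
# Hypothetical helper: rolling-origin splits over rows ordered by date.
# `window` is the number of training rows, `horizon` the number of test rows.
rolling_splits = function(dates, window, horizon, fixed = TRUE) {
  ord = order(dates)
  n = length(ord)
  ends = seq(window, n - horizon, by = horizon)
  lapply(ends, function(end) {
    # Fixed window slides forward; otherwise the training set expands from row 1
    first = if (fixed) end - window + 1 else 1
    list(
      train = ord[first:end],
      test  = ord[(end + 1):(end + horizon)]
    )
  })
}

dates = as.Date("2021-01-01") + 0:19
splits = rolling_splits(dates, window = 10, horizon = 5)

# Every training date precedes every test date in each split
ok = vapply(splits, function(s) max(dates[s$train]) < min(dates[s$test]), logical(1))
all(ok)  # TRUE
```

With 20 daily observations this yields two splits, each predicting the five days that follow its training window.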
Releases(v2.0.3)
  • v2.0.3(Nov 19, 2022)

  • v2.0.2(Aug 9, 2022)

    • Add error message when trying to create a TaskClassifST or TaskRegrST from an sf object
    • Synchronize TaskClassifST or TaskRegrST with {mlr3spatial}
    • Add support for mlr_reflections changes in {mlr3} > 0.13.4
    • Adjust "Getting Started" vignette to recent API changes
    • autoplot.ResamplingSptCVCstf(): Add missing support for argument axis_label_fontsize for x and y axes
  • v2.0.1(Jun 23, 2022)

    Bugfixes

    • autoplot.ResamplingSptCVCstf: when multiple folds are requested, the subplots are now returned again (before, the return was empty)
    • autoplot.ResamplingSptCVCstf: the legend item for the "omitted" observations now displays the correct color and label again
  • v2.0.0(Jun 15, 2022)

    Breaking

    • Rename task cookfarm to cookfarm_mlr3. This was done to distinguish the cookfarm task implementation in {mlr3} better from the original cookfarm dataset. cookfarm_mlr3 also now comes with all rows of the upstream cookfarm task and not with a random subset as before.
    • Rewrite mlr_resamplings_sptcv_cstf implementation. The method will produce different fold results compared to {mlr3spatiotempcv} <= 1.0.1. This is because of a change/fix in the sampling behavior: before, an (unwanted) stratified sampling was done on time and space variables. While this matched the upstream implementation in {CAST}, it did not match the theoretical underpinning described in the literature.

    Features

    • Add support for DataBackendRaster (@be-marc, #191).
    • mlr_resamplings_sptcv_cstf: a log message returns the column roles from the Task which are used for partitioning
    • The help pages for all methods now describe the methods manually rather than importing the upstream documentation of the respective method.
    • Task*ST classes now print column roles space and time (if set) (#198)
    • autoplot() gains plot_time_var argument for 3D visualizations of mlr_resamplings_sptcv_cstf resamplings with only 'space' used for partitioning (#197)
    • Vignette updates

    Bugfixes

    • All {mlr3spatiotempcv} methods now comply with the {mlr3} man file declaration logic.

    Misc

    • Escape all examples and tests for non-installed packages.
    • The cookfarm_mlr3 task now sets column roles "space" and "time" for variables SOURCEID and Date, respectively.
    • Harden CLUTO tests (#182)
    • Large update for the "spatiotemporal" section in the mlr3book
  • v1.0.1(Mar 3, 2022)

    • Fixed an issue which caused coordinates to appear in the feature set when a data.frame was supplied (#166, @be-marc)
    • Add autoplot() support for "groups" column role in rsmp("cv")
  • v1.0.0(Aug 19, 2021)

    Breaking

    • autoplot(): removed argument crs. The CRS is now inferred from the supplied Task. Setting a different CRS than the task might lead to spurious issues and the initial idea of changing the CRS for plotting to have proper axes labeling does not apply (anymore) (#144)

    Features

    • Added autoplot() support for ResamplingCustomCV (#140)

    Bug fixes

    • "spcv_block": Assert error if folds > 2 when selection = "checkerboard" (#150)
    • Fixed row duplication when creating TaskRegrST tasks from sf objects (#152)

    Miscellaneous

    • Upgrade tests to {vdiffr} 1.0.0
    • Add {rgdal} to Suggests and require it in "spcv_block" since it is required in {blockCV} >= 2.1.4 and {sf} >= 1.0
  • v0.4.1(Jun 24, 2021)

    • Upgrade tests to {vdiffr} 1.0.0
    • Add {rgdal} to Suggests and require it in "spcv_block" since it is required in {blockCV} >= 2.1.4 and {sf} >= 1.0
  • v0.4.0(Jun 3, 2021)

    Features

    • Support clustering coords only for "sptcv_cluto"
    • Add as_task_* S3 generics: as_task_classif_st.data.frame(), as_task_classif_st.DataBackend(), as_task_classif_st.sf(), as_task_regr_st.data.frame(), as_task_regr_st.DataBackend(), as_task_regr_st.sf(), as_task_classif.TaskClassifST(), as_task_regr.TaskRegrST() (#99)
    • Add "spcv_tiles" and "repeated_spcv_tiles" (#121)
    • Add "spcv_disc" (#115)

    Bug Fixes

    • Fixed train set issues for sptcv_cstf() with space and time var (#135)
    • Fixed $folds() active binding returning wrong fold number (#120)
    • Add missing man IDs (#122)

    Misc

    • Add example 2D spatial plots to spatiotemp-viz vignette
    • Add {caret} to Suggests
    • "Cstf" methods: remove arguments in favor of param set to align with other methods (#122)
    • Inherit documentation from upstream functions (#117)
    • Vignette: Update and categorize table listing all implemented methods
  • v0.3.0(Apr 13, 2021)

    New Features

    • autoplot.ResamplingSptCVCstf(): add 2D plotting method (#106)
    • autoplot.ResamplingSptCVCstf(): add arguments show_omitted and static_image (#100)
    • autoplot() (all methods): allow adjusting point size via ... (#98)

    Maintenance

    • Remove {GSIF} package due to CRAN archival and host the cookfarm dataset standalone
    • Use Cstf method for spatiotemporal viz vignette
    • Fix help page content of ResamplingRepeatedSptCVCstf (beforehand the Cluto method was referenced accidentally)
    • Fix segfault in autoplot.ResamplingSpcvBlock example when rendering pkgdown site (unclear why this happens when show_labels = TRUE)
    • Update autoplot() examples and related documentation
    • Remove duplicate resources in Tasks "see also" fields
    • Skip a test on Solaris and macOS 3.6
    • Optimize "Spatiotemporal Visualization" vignette
  • v0.2.1(Mar 20, 2021)

    • Add support for rasterLayer argument in blockCV::spatialBlock() (#94)
    • Ensure that blockCV::spatialBlock() actually returns the same result when invoked via {mlr3spatiotempcv} (#93). Among other issues, blockCV::spatialBlock(selection = "checkerboard") was ignored.
    • Get coordinates names from {sf} objects dynamically. Before some functions would have errored if the coordinate names were not named "x" and "y".
  • v0.2.0(Mar 8, 2021)

  • v0.1.1(Jan 6, 2021)

  • v0.1.0(Dec 26, 2020)

  • v0.0.0.9006(Oct 27, 2020)
