Aggregating gridded data (xarray) to polygons

xagg

Binder

A package to aggregate gridded data in xarray to polygons in geopandas using area-weighting from the relative area overlaps between pixels and polygons. Check out the binder link above for a sample code run!

Installation

The easiest way to install xagg is using pip. Beware though - xagg is still a work in progress; I suggest you install it to a virtual environment first (using e.g. venv, or just creating a separate environment in conda for projects using xagg).

pip install xagg

Intro

Science often happens on grids: gridded weather products, interpolated pollution data, nighttime lights, and remote sensing all approximate the continuous real world for reasons of data resolution, processing time, or ease of calculation.

However, living things don't live on grids, and rarely play, act, or observe data on grids either. Instead, humans tend to work on the county, state, township, okrug, or city level; birds tend to fly along complex migratory corridors; and rain- and watersheds follow valleys and mountains.

So, whenever we need to work with both gridded and geographic data products, we need ways of getting them to match up. We may be interested, for example, in the average temperature over a county, or the average rainfall rate over a watershed.

Enter xagg.

xagg provides an easy-to-use (2 lines!), standardized way of aggregating raster data to polygons. All you need is some gridded data in an xarray Dataset or DataArray and some polygon data in a geopandas GeoDataFrame. Both of these are easy to use for the purposes of xagg - for example, all you need to use a shapefile is to open it:

import xarray as xr
import geopandas as gpd
 
# Gridded data file (netcdf/climate data)
ds = xr.open_dataset('file.nc')

# Shapefile
gdf = gpd.read_file('file.shp')

xagg will then figure out the geographic grid (lat/lon) in ds, create polygons for each pixel, and then generate intersects between every polygon in the shapefile and every pixel. For each polygon in the shapefile, the relative area of each covering pixel is calculated - so, for example, if a polygon (say, a US county) is the size and shape of a grid pixel, but is split halfway between two pixels, the weight for each pixel will be 0.5, and the value of the gridded variables on that polygon will just be the average of both [TO-DO: add visual example of this].
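The half-and-half case described above can be checked by hand. The following is a minimal sketch using plain axis-aligned boxes in degree space; it ignores the curvature of the Earth and does not use xagg itself, and the names `overlap_area`, `pixels`, and `polygon` are made up for illustration.

```python
def overlap_area(a, b):
    """Area of the intersection of two boxes given as (xmin, ymin, xmax, ymax)."""
    width = min(a[2], b[2]) - max(a[0], b[0])
    height = min(a[3], b[3]) - max(a[1], b[1])
    # Negative width or height means the boxes do not overlap
    return max(width, 0.0) * max(height, 0.0)

# Two adjacent 1x1 pixels and a pixel-sized polygon straddling their shared edge
pixels = {"left": (0.0, 0.0, 1.0, 1.0), "right": (1.0, 0.0, 2.0, 1.0)}
polygon = (0.5, 0.0, 1.5, 1.0)
pixel_values = {"left": 10.0, "right": 20.0}

# Relative overlap area of each pixel with the polygon
areas = {name: overlap_area(box, polygon) for name, box in pixels.items()}
total_area = sum(areas.values())
weights = {name: area / total_area for name, area in areas.items()}

# Area-weighted value of the gridded variable on the polygon
aggregated_value = sum(weights[name] * pixel_values[name] for name in pixels)
print(weights)           # {'left': 0.5, 'right': 0.5}
print(aggregated_value)  # 15.0
```

Each pixel gets a weight of 0.5, so the polygon value is the plain average of the two pixel values.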

The two lines mentioned before?

import xagg as xa

# Get overlap between pixels and polygons
weightmap = xa.pixel_overlaps(ds,gdf)

# Aggregate data in [ds] onto polygons
aggregated = xa.aggregate(ds,weightmap)

# aggregated can now be converted into an xarray dataset (using aggregated.to_dataset()), 
# or a geopandas geodataframe (using aggregated.to_dataframe()), or directly exported 
# to netcdf, csv, or shp files using aggregated.to_csv()/.to_netcdf()/.to_shp()

You will often need to weight your data by more than just its relative area overlap with a polygon (for example, do you want to give more weight to pixels with more population?). xagg has built-in support for adding an additional weight grid (another xarray DataArray) through xagg.pixel_overlaps().
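To see what an extra weight grid changes, here is a small numerical sketch. It assumes the extra weight is simply multiplied with the area weight before normalizing; that is an illustrative assumption, not a description of xagg's internals, and `weighted_mean` is a hypothetical helper.

```python
def weighted_mean(values, area_weights, extra_weights=None):
    """Mean of `values`, weighted by area overlap times an optional extra weight."""
    if extra_weights is None:
        extra_weights = [1.0] * len(values)
    # Assumption: the two weights are combined by multiplication
    combined = [a * w for a, w in zip(area_weights, extra_weights)]
    total = sum(combined)
    if total == 0:
        return float("nan")
    return sum(c * v for c, v in zip(combined, values)) / total

temperature = [10.0, 20.0]    # values of the two pixels covering a polygon
overlap = [0.5, 0.5]          # relative area overlap of each pixel
population = [100.0, 300.0]   # extra weight grid sampled on the same pixels

area_only = weighted_mean(temperature, overlap)
area_and_population = weighted_mean(temperature, overlap, population)
print(area_only)            # 15.0
print(area_and_population)  # 17.5
```

The more populated pixel pulls the aggregated value toward its own temperature, which is the effect you want when the question is about what people experience rather than what the land experiences.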

Finally, xagg allows for direct exporting of the aggregated data in several commonly used data formats (please open issues if you'd like support for something else!):

  • netcdf
  • csv for STATA, R
  • shp for QGIS, further spatial processing

Best of all, xagg is flexible. Multiple variables in your dataset? xagg will aggregate them all, as long as they have at least lat/lon dimensions. Fields in your shapefile that you'd like to keep? xagg keeps all fields (for example FIPS codes from county datasets) all the way through the final export. Weird dimension names? xagg is trained to recognize all versions of "lat", "Latitude", "Y", "nav_lat", "Latitude_1"... etc. that the author has run into over the years of working with climate data; and this list is easily expandable as a keyword argument if needed.
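The idea behind recognizing dimension names can be sketched as a simple alias lookup. This is not xagg's actual code: the alias sets, the function name `standardize_dim_names`, and its `extra_lat`/`extra_lon` parameters are all hypothetical and only illustrate the approach.

```python
# Hypothetical alias lists; the real list in xagg may differ
LAT_ALIASES = {"lat", "latitude", "y", "nav_lat", "latitude_1"}
LON_ALIASES = {"lon", "longitude", "x", "nav_lon", "longitude_1"}

def standardize_dim_names(dim_names, extra_lat=(), extra_lon=()):
    """Return a {old_name: new_name} mapping onto the names 'lat' and 'lon'."""
    lat_names = LAT_ALIASES | {name.lower() for name in extra_lat}
    lon_names = LON_ALIASES | {name.lower() for name in extra_lon}
    renames = {}
    for name in dim_names:
        key = name.lower()  # match case-insensitively
        if key in lat_names:
            renames[name] = "lat"
        elif key in lon_names:
            renames[name] = "lon"
    return renames

renames = standardize_dim_names(["time", "Latitude", "X"])
print(renames)  # {'Latitude': 'lat', 'X': 'lon'}

# Unknown names can be added by the caller
custom = standardize_dim_names(["rlat", "rlon"], extra_lat=["rlat"], extra_lon=["rlon"])
print(custom)   # {'rlat': 'lat', 'rlon': 'lon'}
```

A mapping like this could then be handed to xarray's `Dataset.rename()` to bring a dataset onto a common naming convention.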

Please contribute - let me know what works and what doesn't, whether you think this is useful, and if so - please share!

Use Cases

Climate econometrics

Many climate econometrics studies use societal data (mortality, crop yields, etc.) at a political or administrative level (for example, counties) but climate and weather data on grids. Oftentimes, further weighting by population or agricultural density is needed.

Area-weighting of pixels onto polygons ensures that aggregating weather and climate data onto polygons occurs in a robust way. Consider a (somewhat contrived) example: an administrative region lies in relatively flat lowlands, but a pixel that slightly overlaps the polygon primarily covers a wholly different climate (mountainous, desert, etc.). Using a simple mask would weight that pixel the same as every other pixel, though its information is not necessarily relevant to the climate of the region. Population-weighting may not always be sufficient either; consider Los Angeles, which has multiple significantly different climates, all with high population densities.

xagg allows simple population- and area-weighted averaging, in addition to export functions that turn the aggregated data into output easily used in STATA or R for further calculations.

Left to do

  • Testing, bug fixes, stability checks, etc.
  • Share widely! I hope this will be helpful to a wide group of natural and social scientists who have to work with both gridded and polygon data!

Comments

  • Speedup for large grids - mod gdf_pixels in create_raster_polgons

    In create_raster_polygons, the for loop that assigns individual polygons to gdf_pixels essentially renders xagg unusable for larger high res grids because it is so slow. Here I propose eliminating the for loop and replacing it with a lambda apply. Big improvement for large grids!

    opened by kerriegeil 10
  • dot product implementation

    Starting this pull request. This is code that implements a dot-product approach for doing the aggregation. See #2

    This works for my application but I have not run the tests on this yet.

    opened by jsadler2 9
  • work for one geometry?

    I ran into IndexError: single positional indexer is out-of-bounds (Traceback below)

    I have a dataset with one variable over CONUS and I'm trying to weight to one geom e.g. a county.

    I'll try to make a reproducible example

    ---------------------------------------------------------------------------
    IndexError                                Traceback (most recent call last)
    <ipython-input-83-5cd8fd54cbfc> in <module>
          1 weightmap = xa.pixel_overlaps(ds, gdf, subset_bbox=True)
    ----> 2 aggregated = xa.aggregate(ds, weightmap)
    
    /opt/userenvs/ray.bell/main/lib/python3.9/site-packages/xagg/core.py in aggregate(ds, wm)
        434                 #   the grid have just nan values for this variable
        435                 # in both cases; the "aggregated variable" is just a vector of nans.
    --> 436                 if not np.isnan(wm.agg.iloc[poly_idx,:].pix_idxs).all():
        437                     # Get the dimensions of the variable that aren't "loc" (location)
        438                     other_dims = [k for k in np.atleast_1d(ds[var].dims) if k != 'loc']
    
    /opt/userenvs/ray.bell/main/lib/python3.9/site-packages/pandas/core/indexing.py in __getitem__(self, key)
        887                     # AttributeError for IntervalTree get_value
        888                     return self.obj._get_value(*key, takeable=self._takeable)
    --> 889             return self._getitem_tuple(key)
        890         else:
        891             # we by definition only have the 0th axis
    
    /opt/userenvs/ray.bell/main/lib/python3.9/site-packages/pandas/core/indexing.py in _getitem_tuple(self, tup)
       1448     def _getitem_tuple(self, tup: Tuple):
       1449 
    -> 1450         self._has_valid_tuple(tup)
       1451         with suppress(IndexingError):
       1452             return self._getitem_lowerdim(tup)
    
    /opt/userenvs/ray.bell/main/lib/python3.9/site-packages/pandas/core/indexing.py in _has_valid_tuple(self, key)
        721         for i, k in enumerate(key):
        722             try:
    --> 723                 self._validate_key(k, i)
        724             except ValueError as err:
        725                 raise ValueError(
    
    /opt/userenvs/ray.bell/main/lib/python3.9/site-packages/pandas/core/indexing.py in _validate_key(self, key, axis)
       1356             return
       1357         elif is_integer(key):
    -> 1358             self._validate_integer(key, axis)
       1359         elif isinstance(key, tuple):
       1360             # a tuple should already have been caught by this point
    
    /opt/userenvs/ray.bell/main/lib/python3.9/site-packages/pandas/core/indexing.py in _validate_integer(self, key, axis)
       1442         len_axis = len(self.obj._get_axis(axis))
       1443         if key >= len_axis or key < -len_axis:
    -> 1444             raise IndexError("single positional indexer is out-of-bounds")
       1445 
       1446     # -------------------------------------------------------------------
    
    IndexError: single positional indexer is out-of-bounds
    
    opened by raybellwaves 5
  • dot product implementation for overlaps breaks xagg for high res grids

    I'm finding that the implementation of the dot product for computing weighted averages in core.py/aggregate eats up way too much memory for high res grids. It's the wm.overlap_da that requires way too much memory. I am unable to allocate enough memory to make it through core.py/aggregate for many of the datasets I'm processing on an HPC system. I had no issue with the previous aggregate function before commit 4c5cc6503efde05153181e15bc5f7fe6bb92bd07. Looks like the dot product method is a lot cleaner in the code, but is there another benefit?
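A rough way to see why a dot-product formulation can run out of memory: if the weights are stored as a dense polygons-by-pixels array (an assumption here; the issue does not spell out the layout of wm.overlap_da), its size grows with the product of both counts, whereas a per-polygon list of overlapping pixels only grows with the number of actual overlaps. The sketch below uses hypothetical helper names and made-up grid sizes.

```python
def dense_weight_matrix_bytes(n_polygons, n_pixels, bytes_per_value=8):
    """Memory needed for a dense (polygons x pixels) float64 weight array."""
    return n_polygons * n_pixels * bytes_per_value

def aggregate_sparse(overlaps, pixel_values):
    """Weighted mean per polygon from {polygon_id: [(pixel_index, weight), ...]}."""
    result = {}
    for poly_id, pairs in overlaps.items():
        total = sum(weight for _, weight in pairs)
        if total == 0:
            result[poly_id] = float("nan")
        else:
            result[poly_id] = sum(w * pixel_values[i] for i, w in pairs) / total
    return result

# Illustrative sizes only: 3000 polygons on a 1800 x 3600 grid
dense_bytes = dense_weight_matrix_bytes(3000, 1800 * 3600)
print(dense_bytes)  # 155520000000 bytes, roughly 155 GB

# The sparse form only stores the pixels that actually touch each polygon
overlaps = {"a": [(0, 0.5), (1, 0.5)], "b": [(2, 1.0)]}
sparse_result = aggregate_sparse(overlaps, [10.0, 20.0, 30.0])
print(sparse_result)  # {'a': 15.0, 'b': 30.0}
```

Both forms give the same weighted means; the trade-off is between compact code and memory footprint.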

    opened by kerriegeil 3
  • work with xr.DataArray's

    In providing an xr.DataArray to xa.pixel_overlaps(da, gdf) you get the Traceback below.

    Couple of ideas for fixes:

    • In the code, convert it to an xr.Dataset
    • Use .dims instead of .keys()

    AttributeError                            Traceback (most recent call last)
    <ipython-input-74-f5cd39618cec> in <module>
    ----> 1 weightmap = xa.pixel_overlaps(da, gdf)
    
    /opt/userenvs/ray.bell/main/lib/python3.9/site-packages/xagg/wrappers.py in pixel_overlaps(ds, gdf_in, weights, weights_target, subset_bbox)
         58     print('creating polygons for each pixel...')
         59     if subset_bbox:
    ---> 60         pix_agg = create_raster_polygons(ds,subset_bbox=gdf_in,weights=weights)
         61     else:
         62         pix_agg = create_raster_polygons(ds,subset_bbox=None,weights=weights)
    
    /opt/userenvs/ray.bell/main/lib/python3.9/site-packages/xagg/core.py in create_raster_polygons(ds, mask, subset_bbox, weights, weights_target)
        148     # Standardize inputs
        149     ds = fix_ds(ds)
    --> 150     ds = get_bnds(ds)
        151     #breakpoint()
        152     # Subset by shapefile bounding box, if desired
    
    /opt/userenvs/ray.bell/main/lib/python3.9/site-packages/xagg/aux.py in get_bnds(ds, edges, wrap_around_thresh)
        196         # to [0,360], but it's not tested yet.
        197 
    --> 198     if ('lat' not in ds.keys()) | ('lon' not in ds.keys()):
        199         raise KeyError('"lat"/"lon" not found in [ds]. Make sure the '+
        200                        'geographic dimensions follow this naming convention.')
    
    /opt/userenvs/ray.bell/main/lib/python3.9/site-packages/xarray/core/common.py in __getattr__(self, name)
        237                 with suppress(KeyError):
        238                     return source[name]
    --> 239         raise AttributeError(
        240             "{!r} object has no attribute {!r}".format(type(self).__name__, name)
        241         )
    
    AttributeError: 'DataArray' object has no attribute 'keys'
    
    opened by raybellwaves 2
  • fix export to dataset issue, insert export tests

    .to_dataset() was not working due to too many layers of lists in the agg.agg geodataframe. This issue has been fixed by replacing an index with np.squeeze() instead. The broader problem may be that there are too many unnecessary layers of lists in the agg.agg geodataframe, which should be simplified in the next round of backend cleanup.

    Furthermore, there are now tests for .to_dataset() and .to_dataframe()

    opened by ks905383 1
  • speed improvement for high res grids in create_raster_polygons

    Hi there, I'm a first timer when it comes to contributing to someone else's repo so please let me know if I need to fix/change anything. I've got a handful of small changes that greatly impact the speed of xagg when using high resolution grids. Planning to submit one at a time when I have the time to spend on it. It may take me a while...

    This first one removes the hard coded 0.1 degree buffer for selecting a subset bounding box in create_raster_polygons. For high res grids this will select a much larger area than desired. The solution I propose is to change the 0.1 degree buffer to twice the max grid spacing.
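The proposed buffer rule can be sketched in a few lines. This is not the code from the pull request; `bbox_buffer` and its `factor` parameter are hypothetical names, and the function assumes 1-D lists of cell-center coordinates with at least two entries each.

```python
def bbox_buffer(lats, lons, factor=2.0):
    """Buffer (in degrees) equal to `factor` times the largest grid spacing."""
    lat_spacing = max(abs(b - a) for a, b in zip(lats, lats[1:]))
    lon_spacing = max(abs(b - a) for a, b in zip(lons, lons[1:]))
    return factor * max(lat_spacing, lon_spacing)

# Coarse 2.5 degree grid: a fixed 0.1 degree buffer could miss neighbouring pixels
coarse = bbox_buffer([30.0, 32.5, 35.0], [-100.0, -97.5, -95.0])
print(coarse)  # 5.0

# Fine 0.01 degree grid: the buffer shrinks to about 0.02 degrees,
# much less than the fixed 0.1 degrees
fine = bbox_buffer([40.00, 40.01, 40.02], [-75.00, -74.99, -74.98])
```

Scaling the buffer with the grid spacing keeps the subset just large enough to include the pixels bordering the bounding box, whatever the resolution.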

    opened by kerriegeil 1
  • Rename aux for windows

    As aux is a protected filename on Windows I could not install the package, and not even clone the repo without renaming the file first. This is a fix inspired by NeuralEnsemble/PyNN#678.

    opened by Hugovdberg 1
  • use aggregated.to_dataset().to_dataframe() within aggregated.to_dataframe()

    When dealing with time data aggregated.to_dataframe() will return columns as data_var0, data_var1.

    xarray has a method to convert to a dataframe, http://xarray.pydata.org/en/stable/generated/xarray.DataArray.to_dataframe.html, which moves coords to a MultiIndex.

    You would just have to add in the geometry and crs from the incoming geopandas to make it a geopandas dataframe.

    opened by raybellwaves 1
  • return geometry key in aggregated.to_dataframe()

    Calling aggregated.to_dataframe() drops the geometry column that is in the original geopandas.GeoDataFrame.

    It would be nice if it was returned to be used for things such as visualization.

    Code:

    import geopandas as gpd
    import hvplot.pandas  # registers the .hvplot accessor on DataFrames
    import pandas as pd
    import pooch
    import xagg as xa
    import xarray as xr
    
    # Load in example data and shapefiles
    ds = xr.tutorial.open_dataset("air_temperature").isel(time=0)
    file = pooch.retrieve(
        "https://pubs.usgs.gov/of/2006/1187/basemaps/continents/continents.zip", None
    )
    continents = gpd.read_file("zip://" + file)
    continents
    
    wm = xa.pixel_overlaps(ds, continents)
    aggregated = xa.aggregate(ds, wm)
    aggregated.to_dataframe()
    
    pd.merge(aggregated.to_dataframe(), continents, on="CONTINENT").hvplot(c="air")
    
    opened by raybellwaves 1
  • fix index error if input gdf has own index [issue #8]

    xa.get_pixel_overlaps() creates a poly_idx column in the gdf that takes as its value the index of the input gdf. However, if there is a pre-existing index, this can lead to bad behavior, since poly_idx is used as an .iloc indexer in the gdf. This update instead makes poly_idx np.arange(0,len(gdf)), which will avoid this indexing issue (and hopefully not cause any more? I figured there would've been a reason I used the existing index if not a new one... fingers crossed).

    opened by ks905383 1
  • silence functions

    Hi, thank you a lot for the great package.

    I was wondering if it is possible to add an argument to the functions (pixel_overlaps and aggregate) to silence them if we want? I am doing aggregations for many geometries and sometimes it becomes too crowded, especially if I try to print other things along while the functions are executed.

    Thanks !

    opened by khalilT 0
  • Mistaken use of ds.var() in `core.py`?

    In core.py, there are a few loops of the form: for var in ds.var():.

    This tries to compute a variance across all dimensions, for each variable. Is that the intention? I think you just mean for var in ds:.

    Note that if any variables are of a type for which var cannot be computed (e.g., timedelta64[ns]) then aggregate fails.

    opened by jrising 3
  • Odd errors from using pixel_overlaps with a weights option

    This issue is sort of three issues that I encountered while trying to solve a problem. Fixes to any of these would work for me.

    I'm trying to use xagg with some fairly large files including a weights file, and I was getting an error during the regridding process:

    >>> weightmap = xa.pixel_overlaps(ds_tas, gdf_regions, weights=ds_pop.Population, subset_bbox=False)
    creating polygons for each pixel...
    lat/lon bounds not found in dataset; they will be created.
    regridding weights to data grid...
    Create weight file: bilinear_1800x3600_1080x2160.nc
    zsh: illegal hardware instruction  python
    

    (at which point, python crashes)

    I decided to do the regridding myself and save the result. Here are what the data file (ds_tas) and weights file (ds_pop) look like:

    >>> ds_tas
    <xarray.Dataset>
    Dimensions:      (band: 12, x: 2160, y: 1080)
    Coordinates:
      * band         (band) int64 1 2 3 4 5 6 7 8 9 10 11 12
      * x            (x) float64 -179.9 -179.8 -179.6 -179.4 ... 179.6 179.7 179.9
      * y            (y) float64 89.92 89.75 89.58 89.42 ... -89.58 -89.75 -89.92
        spatial_ref  int64 ...
    Data variables:
        band_data    (band, y, x) float32 ...
    
    >>> ds_pop
    <xarray.Dataset>
    Dimensions:     (longitude: 2160, latitude: 1080)
    Coordinates:
      * longitude   (longitude) float64 -179.9 -179.8 -179.6 ... 179.6 179.8 179.9
      * latitude    (latitude) float64 89.92 89.75 89.58 ... -89.58 -89.75 -89.92
    Data variables:
        crs         int32 ...
        Population  (latitude, longitude) float32 ...
    Attributes:
        Conventions:  CF-1.4
        created_by:   R, packages ncdf4 and raster (version 3.4-13)
        date:         2022-02-05 22:14:16
    

    The dimensions line up exactly. But xagg still wanted to regrid my weights file. My guess is that this is because the dimensions are labeled differently (and so an np.allclose fails because taking a difference between the coordinates results in a 2-D matrix).

    So I relabeled my coordinates and dimensions. This results in a new error:

    >>> weightmap = xa.pixel_overlaps(ds_tas, gdf_regions, weights=ds_pop.Population, subset_bbox=False)
    creating polygons for each pixel...
    lat/lon bounds not found in dataset; they will be created.
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/Users/admin/opt/anaconda3/envs/ccenv2/lib/python3.7/site-packages/xagg/wrappers.py", line 50, in pixel_overlaps
        pix_agg = create_raster_polygons(ds,subset_bbox=None,weights=weights)
      File "/Users/admin/opt/anaconda3/envs/ccenv2/lib/python3.7/site-packages/xagg/core.py", line 127, in create_raster_polygons
        ds = get_bnds(ds)
      File "/Users/admin/opt/anaconda3/envs/ccenv2/lib/python3.7/site-packages/xagg/aux.py", line 190, in get_bnds
        bnds_tmp[1:,:] = xr.concat([ds[var]-0.5*ds[var].diff(var),
      File "/Users/admin/opt/anaconda3/envs/ccenv2/lib/python3.7/site-packages/xarray/core/_typed_ops.py", line 209, in __sub__
        return self._binary_op(other, operator.sub)
      File "/Users/admin/opt/anaconda3/envs/ccenv2/lib/python3.7/site-packages/xarray/core/dataarray.py", line 3081, in _binary_op
        self, other = align(self, other, join=align_type, copy=False)
      File "/Users/admin/opt/anaconda3/envs/ccenv2/lib/python3.7/site-packages/xarray/core/alignment.py", line 349, in align
        f"arguments without labels along dimension {dim!r} cannot be "
    ValueError: arguments without labels along dimension 'lat' cannot be aligned because they have different dimension sizes: {1080, 1079}
    

    To be clear, neither of my datasets has a dimension of size 1079.
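One plausible reading of the 1079 is the `.diff()` call visible in the get_bnds line of the traceback: a first difference along a dimension of length n has n-1 elements, so a 1080-point axis yields 1079 differences, which then cannot be combined with the 1080 original values unless the two can be aligned by label. The plain-Python sketch below shows the same off-by-one and one way to build n+1 cell edges from n centers; `cell_bounds` is a hypothetical helper, not xagg's implementation.

```python
def cell_bounds(centers):
    """Return len(centers) + 1 cell edges for a 1-D list of cell centers."""
    if len(centers) < 2:
        raise ValueError("need at least two cell centers to infer spacing")
    # Midpoints between neighbours: always one fewer than the number of centers
    interior = [(a + b) / 2 for a, b in zip(centers, centers[1:])]
    # Extrapolate half a cell outward at both ends
    first = centers[0] - (centers[1] - centers[0]) / 2
    last = centers[-1] + (centers[-1] - centers[-2]) / 2
    return [first] + interior + [last]

centers = [89.5, 88.5, 87.5, 86.5]
diffs = [b - a for a, b in zip(centers, centers[1:])]
print(len(centers), len(diffs))  # 4 3

bounds = cell_bounds(centers)
print(bounds)  # [90.0, 89.0, 88.0, 87.0, 86.0]
```

Handling the two outer edges explicitly avoids ever combining arrays of length n and n-1 element by element.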

    opened by jrising 11
  • weightmap (pixel_overlaps) warnings and errors

    Hi, thanks for the helpful package.

    On a Windows machine, I'm using the package successfully on ERA5 reanalysis data although I do get a user warning when calling pixel_overlaps. It occurs after the output "calculating overlaps between pixels and output polygons...". The warning is:

    "/home/kgeil/miniconda3/envs/xagg/lib/python3.9/site-packages/xagg/core.py:308: UserWarning: keep_geom_type=True in overlay resulted in 1 dropped geometries of different geometry types than df1 has. Set keep_geom_type=False to retain all geometries overlaps = gpd.overlay(gdf_in.to_crs(epsg_set),'

    When I try generating a weight map with the exact same shapefile but on AVHRR NDVI data instead I get a full error at the same-ish location:

    "ValueError: GeoDataFrame does not support setting the geometry column where the column name is shared by multiple columns."

    It looks like something is going wrong in get_pixel_overlaps around line 323 overlap_info...

    I've tried rewriting the NDVI netcdf to match the ERA5 file as closely as possible (same coord and dim names, etc.), and both files are epsg:4326.

    Any ideas how to get past this error?

    opened by kerriegeil 3
  • Trying to install outside target directory

    I get this error when trying to install with pip on Windows. I have tried installing from PyPI, GitHub, and a zip file, with the same error in each instance. I've tried both a base Python install using a virtual environment and conda.

    ERROR: The zip file (C:\Users\profile\Downloads\xagg-main.zip) has a file (C:\Users\khafen\AppData\Local\Temp\3\pip-req-build-cok0yin6\xagg/aux.py) trying to install outside target directory (C:\Users\profile\AppData\Local\Temp\3\pip-req-build-cok0yin6)

    opened by konradhafen 2
  • add to_geodataframe

    Closes https://github.com/ks905383/xagg/issues/17

    Open to feedback here.

    I believe https://github.com/ks905383/xagg/blob/main/xagg/classes.py#L62 should say geopandas.GeoDataFrame

    but I was thinking to_dataframe could return a pandas DataFrame (no geometry and no crs), and to_geodataframe could return the geometry and crs.

    opened by raybellwaves 2
Releases(v0.3.0.2)
  • v0.3.0.2(Apr 10, 2022)

    Bug fixes

    • .to_dataset() functions again
    • .read_wm() is now loaded by default

    What's Changed

    • fix export to dataset issue, insert export tests by @ks905383 in https://github.com/ks905383/xagg/pull/35
    • add read_wm() to init by @ks905383 in https://github.com/ks905383/xagg/pull/36

    Full Changelog: https://github.com/ks905383/xagg/compare/v0.3.0.1...v0.3.0.2

  • v0.3.0.1(Apr 7, 2022)

    Fixes dependency error in setup.py that was preventing publication of v0.3* on conda-forge.

    Full Changelog: https://github.com/ks905383/xagg/compare/v0.3.0...v0.3.0.1

  • v0.3.0(Apr 2, 2022)

    Performance upgrades

    Various performance upgrades, particularly for working with high resolution grids.

    In create_raster_polygons:

    • replacing the for-loop assigning pixels to polygons with a lambda apply
    • creating flexible buffer for subsetting to bounding box, replacing the hardcoded 0.1 degrees used previously with twice the max grid spacing

    In aggregate:

    • an optional replacement of the aggregating calculation with a dot-product implementation (impl='dot_product' in pixel_overlaps() and aggregate()), which may improve performance in certain situations

    Expanded functionality

    Weightmaps can now be saved using wm.to_file() and loaded using xagg.core.read_wm(), and no longer have to be regenerated with each code run.

    Bug fixes

    Various bug fixes

    What's Changed

    • speed improvement for high res grids in create_raster_polygons by @kerriegeil in https://github.com/ks905383/xagg/pull/29
    • dot product implementation by @jsadler2 in https://github.com/ks905383/xagg/pull/4
    • Speedup for large grids - mod gdf_pixels in create_raster_polgons by @kerriegeil in https://github.com/ks905383/xagg/pull/30
    • implement making dot product optional, restoring default agg behavior by @ks905383 in https://github.com/ks905383/xagg/pull/32
    • Implement a way to save weightmaps (output from pixel_overlaps) by @ks905383 in https://github.com/ks905383/xagg/pull/33

    New Contributors

    • @kerriegeil made their first contribution in https://github.com/ks905383/xagg/pull/29
    • @jsadler2 made their first contribution in https://github.com/ks905383/xagg/pull/4

    Full Changelog: https://github.com/ks905383/xagg/compare/v0.2.6...v0.3.0

  • v0.2.6(Jan 26, 2022)

    Bug fixes:

    • #11 pixel_overlaps no longer changes the gdf_in outside of the function

    Functionality tweaks

    • added agg.to_geodataframe(), similar to agg.to_dataframe(), but keeping the geometries from the original shapefile
    • adapted xarray's ds.to_dataframe() in agg.to_dataframe(), which has better functionality
    • .csvs now export long instead of wide, using the output from ds.to_dataframe() above
  • v0.2.5(Jul 24, 2021)

  • v0.2.4(May 14, 2021)

Owner
Kevin Schwarzwald
Researching climate variability + impacts by profession, urban expansion by studies, and transit/land use policy by interest. Moonlight as rock violinist.