PyNHD is part of the HyRiver software stack, which is designed to aid in watershed analysis through web services.

Overview

Package          Description
PyNHD            Navigate and subset NHDPlus (MR and HR) using web services
Py3DEP           Access topographic data through National Map's 3DEP web service
PyGeoHydro       Access NWIS, NID, HCDN 2009, NLCD, and SSEBop databases
PyDaymet         Access Daymet for daily climate data both single pixel and gridded
AsyncRetriever   High-level API for asynchronous requests with persistent caching
PyGeoOGC         Send queries to any ArcGIS RESTful-, WMS-, and WFS-based services
PyGeoUtils       Convert responses from PyGeoOGC's supported web services to datasets

PyNHD: Navigate and subset NHDPlus database

Features

This package provides access to WaterData, the National Map's NHDPlus HR, NLDI, and PyGeoAPI web services. These web services can be used to navigate and extract vector data from the NHDPlus V2 database (both medium- and high-resolution), such as catchments, HUC8, HUC12, GagesII, flowlines, and water bodies. Moreover, PyNHD gives access to an item on ScienceBase called Select Attributes for NHDPlus Version 2.1 Reach Catchments and Modified Network Routed Upstream Watersheds for the Conterminous United States. This item provides over 30 catchment-scale attributes based on NHDPlus ComIDs. These attributes are available in three categories:

  1. Local (local): For individual reach catchments,
  2. Total (upstream_acc): For network-accumulated values using total cumulative drainage area,
  3. Divergence (div_routing): For network-accumulated values using divergence routing.

Moreover, the PyGeoAPI service provides four functionalities:

  1. flow_trace: Trace flow from a starting point to up/downstream direction.
  2. split_catchment: Split the local catchment of a point of interest at the point's location.
  3. elevation_profile: Extract elevation profile along a flow path between two points.
  4. cross_section: Extract cross-section at a point of interest along a flow line.

A list of the ScienceBase attributes for each characteristic type (local, upstream_acc, and div_routing) can be accessed using the nhdplus_attrs function.

Similarly, PyNHD uses this item on Hydroshare to get ComID-linked NHDPlus Value Added Attributes. This dataset includes slope and roughness, among other attributes, for all the flowlines. You can use the nhdplus_vaa function to get this dataset.

Additionally, PyNHD offers some extra utilities for processing the flowlines:

  • prepare_nhdplus: For cleaning up the dataframe by, for example, removing tiny networks, adding a to_comid column, and finding terminal flowlines if they don't exist.
  • topoogical_sort: For sorting the river network topologically which is useful for routing and flow accumulation.
  • vector_accumulation: For computing flow accumulation in a river network. This function is generic and any routing method can be plugged in.

These utilities are developed based on an R package called nhdplusTools.
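
To make the idea of topological sorting concrete, here is a minimal sketch on a toy river network that uses only Python's standard library. The to_comid dictionary and its IDs are made up for illustration, and this is not how PyNHD implements its own utility; it only shows the ordering guarantee that routing relies on, namely that every reach appears after all of its upstream reaches.

```python
from graphlib import TopologicalSorter

# Hypothetical toy network: each key drains into its value; None marks the outlet.
to_comid = {1: 3, 2: 3, 3: 5, 4: 5, 5: None}

# TopologicalSorter expects node -> predecessors, i.e. the reaches that must
# be processed first (the upstream neighbours of each reach).
upstream = {comid: set() for comid in to_comid}
for comid, downstream in to_comid.items():
    if downstream is not None:
        upstream[downstream].add(comid)

# Headwaters come first and the outlet comes last.
order = list(TopologicalSorter(upstream).static_order())
```

With this ordering, a single pass over the reaches is enough to accumulate any quantity downstream, because the inflow of a reach is always known by the time the reach is visited.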

You can find some example notebooks here.

Please note that this project is in early development stages. While the provided functionalities should be stable, changes in APIs are possible in new releases. We would appreciate it if you give this project a try and provide feedback. Contributions are most welcome.

Moreover, requests for additional functionalities can be submitted via the issue tracker.

Installation

You can install PyNHD using pip after installing libgdal on your system (for example, on Ubuntu run sudo apt install libgdal-dev). Moreover, PyNHD has an optional dependency, requests-cache, for persistent caching. We highly recommend installing this package, as it can significantly speed up sending and receiving queries. You don't have to change anything in your code, since PyNHD looks for requests-cache under the hood and, if it is available, automatically uses persistent caching:

$ pip install pynhd

Alternatively, PyNHD can be installed from the conda-forge repository using Conda:

$ conda install -c conda-forge pynhd
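
The optional-dependency behavior described above (use requests-cache only if it is installed) follows a common Python pattern. The sketch below illustrates that pattern with the standard library; the has_optional helper and the cache_backend variable are hypothetical names for illustration, not PyNHD's actual code.

```python
import importlib.util


def has_optional(module_name: str) -> bool:
    """Return True if a top-level module is installed, without importing it."""
    return importlib.util.find_spec(module_name) is not None


# Hypothetical switch: choose a caching strategy based on what is installed.
cache_backend = "persistent" if has_optional("requests_cache") else "none"
```

Because the check happens at runtime, user code stays the same whether or not the optional package is present.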

Quick start

Let's explore the capabilities of NLDI. First, we import the classes used in this section:

from pynhd import NLDI, WaterData, NHDPlusHR, PyGeoAPI
import pynhd as nhd

First, let’s get the watershed geometry of the contributing basin of a USGS station using NLDI:

nldi = NLDI()
station_id = "01031500"

basin = nldi.get_basins(station_id)

The navigate_byid method can be used to navigate NHDPlus both upstream and downstream of any point in the database. Let’s get the ComIDs and flowlines of the tributaries and the main river channel upstream of the station.

flw_main = nldi.navigate_byid(
    fsource="nwissite",
    fid=f"USGS-{station_id}",
    navigation="upstreamMain",
    source="flowlines",
    distance=1000,
)

flw_trib = nldi.navigate_byid(
    fsource="nwissite",
    fid=f"USGS-{station_id}",
    navigation="upstreamTributaries",
    source="flowlines",
    distance=1000,
)

We can get other USGS stations upstream (or downstream) of the station and even set a distance limit (in km):

st_all = nldi.navigate_byid(
    fsource="nwissite",
    fid=f"USGS-{station_id}",
    navigation="upstreamTributaries",
    source="nwissite",
    distance=1000,
)

st_d20 = nldi.navigate_byid(
    fsource="nwissite",
    fid=f"USGS-{station_id}",
    navigation="upstreamTributaries",
    source="nwissite",
    distance=20,
)
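
To illustrate what a distance limit means for upstream navigation, here is a small sketch on a made-up network. The upstream_of dictionary, its reach names, and its lengths are hypothetical, and the rule used here (keep only reaches that lie entirely within the limit) is an assumption for illustration; the source does not describe how the NLDI service treats partially covered reaches.

```python
from collections import deque

# Hypothetical toy network: reach -> list of (upstream reach, its length in km).
upstream_of = {
    "outlet": [("a", 8.0), ("b", 15.0)],
    "a": [("c", 10.0)],
    "b": [],
    "c": [("d", 30.0)],
    "d": [],
}


def navigate_upstream(start, distance):
    """Collect reaches that lie entirely within `distance` km upstream of `start`."""
    found = []
    queue = deque([(start, 0.0)])
    while queue:
        reach, travelled = queue.popleft()
        for parent, length in upstream_of[reach]:
            if travelled + length <= distance:
                found.append(parent)
                queue.append((parent, travelled + length))
    return found


near = navigate_upstream("outlet", 20.0)
everything = navigate_upstream("outlet", 1000.0)
```

A large limit such as 1000 km effectively returns the whole upstream network, which is why the earlier calls use distance=1000.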

Now, let’s get the HUC12 pour points:

pp = nldi.navigate_byid(
    fsource="nwissite",
    fid=f"USGS-{station_id}",
    navigation="upstreamTributaries",
    source="huc12pp",
    distance=1000,
)
https://raw.githubusercontent.com/cheginit/HyRiver-examples/main/notebooks/_static/nhdplus_navigation.png

Also, we can get the slope data for each river segment from the NHDPlus VAA database:

import geopandas as gpd
import numpy as np
import pandas as pd

vaa = nhd.nhdplus_vaa("input_data/nhdplus_vaa.parquet")

flw_trib["comid"] = pd.to_numeric(flw_trib.nhdplus_comid)
slope = gpd.GeoDataFrame(
    pd.merge(flw_trib, vaa[["comid", "slope"]], left_on="comid", right_on="comid"),
    crs=flw_trib.crs,
)
# Blank out the rows whose slope is negative.
slope[slope.slope < 0] = np.nan

Now, let's explore the PyGeoAPI capabilities:

pygeoapi = PyGeoAPI()

trace = pygeoapi.flow_trace(
    (1774209.63, 856381.68), crs="ESRI:102003", raindrop=False, direction="none"
)

split = pygeoapi.split_catchment((-73.82705, 43.29139), crs="epsg:4326", upstream=False)

profile = pygeoapi.elevation_profile(
    [(-103.801086, 40.26772), (-103.80097, 40.270568)], numpts=101, dem_res=1, crs="epsg:4326"
)

section = pygeoapi.cross_section((-103.80119, 40.2684), width=1000.0, numpts=101, crs="epsg:4326")
https://raw.githubusercontent.com/cheginit/HyRiver-examples/main/notebooks/_static/split_catchment.png
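
As a rough picture of what the width and numpts arguments of a cross-section request describe, the sketch below samples evenly spaced points along a line that is centered on a point and perpendicular to the flow direction. It assumes planar coordinates in meters and a known flow direction; both are simplifications made here, and the function is a hypothetical illustration rather than the PyGeoAPI service's algorithm.

```python
import math


def cross_section(point, flow_direction, width, numpts):
    """Sample `numpts` evenly spaced points on a line of length `width`
    centered on `point` and perpendicular to `flow_direction`."""
    dx, dy = flow_direction
    norm = math.hypot(dx, dy)
    # Unit vector perpendicular to the flow direction.
    px, py = -dy / norm, dx / norm
    x0, y0 = point
    offsets = [-width / 2.0 + width * i / (numpts - 1) for i in range(numpts)]
    return [(x0 + px * s, y0 + py * s) for s in offsets]


# Flow heading due east, so the section runs north-south.
section_pts = cross_section((0.0, 0.0), (1.0, 0.0), width=1000.0, numpts=101)
```

Elevations would then be sampled from a DEM at each of these points to build the profile across the channel.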

Next, we retrieve the medium- and high-resolution flowlines within the bounding box of our watershed and compare them. Since several web services offer access to the NHDPlus database, NHDPlusHR has an argument for selecting a service and another for automatically switching between services.

mr = WaterData("nhdflowline_network")
nhdp_mr = mr.bybox(basin.geometry[0].bounds)

hr = NHDPlusHR("networknhdflowline", service="hydro", auto_switch=True)
nhdp_hr = hr.bygeom(basin.geometry[0].bounds)
https://raw.githubusercontent.com/cheginit/HyRiver-examples/main/notebooks/_static/hr_mr.png

Moreover, WaterData can find features within a given radius (in meters) of a point:

eck4 = "+proj=eck4 +lon_0=0 +x_0=0 +y_0=0 +datum=WGS84 +units=m +no_defs"
coords = (-5727797.427596455, 5584066.49330473)
rad = 5e3
flw_rad = mr.bydistance(coords, rad, loc_crs=eck4)
flw_rad = flw_rad.to_crs(eck4)
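
The radius is given in meters, which is why the example works in a projected CRS rather than in longitude and latitude. The sketch below shows the underlying idea of a radius query on made-up planar coordinates; the features dictionary and the within_radius helper are hypothetical and only illustrate the distance test, not the web service's spatial filter.

```python
import math

# Hypothetical feature locations in a projected CRS with meter units.
features = {
    "near": (1000.0, 2000.0),
    "mid": (3000.0, 3900.0),
    "far": (9000.0, 9000.0),
}


def within_radius(center, radius, points):
    """Return the names of points no farther than `radius` from `center`."""
    cx, cy = center
    return [
        name
        for name, (x, y) in points.items()
        if math.hypot(x - cx, y - cy) <= radius
    ]


hits = within_radius((0.0, 0.0), 5e3, features)
```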

Instead of getting all features within a radius of the coordinate, we can snap to the closest flowline using NLDI:

comid_closest = nldi.comid_byloc(coords, eck4)
flw_closest = mr.byid("comid", comid_closest.comid.values[0])
https://raw.githubusercontent.com/cheginit/HyRiver-examples/main/notebooks/_static/nhdplus_radius.png
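
Snapping a coordinate to the closest flowline boils down to a point-to-polyline distance comparison. The following sketch shows that geometric idea in planar coordinates with made-up flowline IDs; it is an illustration under those assumptions and does not reproduce how NLDI resolves a location to a ComID.

```python
import math


def point_segment_distance(p, a, b):
    """Distance from point p to the line segment a-b (planar coordinates)."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the segment, clamped to its ends.
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / seg_len2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def closest_line(p, lines):
    """Return the ID of the polyline closest to p."""
    return min(
        lines,
        key=lambda lid: min(
            point_segment_distance(p, a, b)
            for a, b in zip(lines[lid], lines[lid][1:])
        ),
    )


# Hypothetical flowlines keyed by made-up IDs.
flowlines = {
    101: [(0.0, 0.0), (10.0, 0.0)],
    102: [(0.0, 5.0), (10.0, 5.0), (10.0, 20.0)],
}
nearest = closest_line((4.0, 2.0), flowlines)
```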

Since NHDPlus HR is still at the pre-release stage, let's use the MR flowlines to demonstrate vector-based accumulation. Based on a topologically sorted river network, pynhd.vector_accumulation computes flow accumulation in the network. It returns a dataframe, sorted from upstream to downstream, that shows the accumulated flow at each node.

PyNHD has a utility called prepare_nhdplus that identifies this upstream-to-downstream relationship, among other things such as fixing some common issues with NHDPlus flowlines. But first we need to get all the NHDPlus attributes for each ComID, since NLDI only provides the flowlines’ geometries and ComIDs, which is useful for navigating the vector river network data. To get the NHDPlus database we use WaterData. Let’s use the nhdflowline_network layer to get the required info.

wd = WaterData("nhdflowline_network")

comids = flw_trib.nhdplus_comid.to_list()
nhdp_trib = wd.byid("comid", comids)
flw = nhd.prepare_nhdplus(nhdp_trib, 0, 0, purge_non_dendritic=False)

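To demonstrate the use of routing, let's retrieve one of the available NHDPlus attributes for our ComIDs and accumulate it through the network. The list of available attributes can be obtained with the nhdplus_attrs function.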

char = "CAT_RECHG"
area = "areasqkm"

local = nldi.getcharacteristic_byid(comids, "local", char_ids=char)
flw = flw.merge(local[char], left_on="comid", right_index=True)


def runoff_acc(qin, q, a):
    return qin + q * a


flw_r = flw[["comid", "tocomid", char, area]]
runoff = nhd.vector_accumulation(flw_r, runoff_acc, char, [char, area])


def area_acc(ain, a):
    return ain + a


flw_a = flw[["comid", "tocomid", area]]
areasqkm = nhd.vector_accumulation(flw_a, area_acc, area, [area])

runoff /= areasqkm
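
The two accumulation functions above, followed by the final division, amount to an upstream area-weighted average of the local attribute. The sketch below works through that arithmetic on a three-reach toy network using only the standard library. The network, the attribute values, and the accumulate helper are all made up for illustration; this is not the implementation of vector_accumulation.

```python
from graphlib import TopologicalSorter

# Hypothetical toy network and attributes (IDs and values are made up).
to_comid = {1: 3, 2: 3, 3: None}
local_value = {1: 10.0, 2: 20.0, 3: 30.0}
catchment_area = {1: 1.0, 2: 2.0, 3: 3.0}


def accumulate(func, *local_attrs):
    """Route `func(inflow, *local values)` from headwaters to the outlet."""
    upstream = {c: {u for u, d in to_comid.items() if d == c} for c in to_comid}
    acc = {}
    for comid in TopologicalSorter(upstream).static_order():
        inflow = sum(acc[u] for u in upstream[comid])
        acc[comid] = func(inflow, *(attr[comid] for attr in local_attrs))
    return acc


# Same routing functions as runoff_acc and area_acc in the example above.
weighted_sum = accumulate(lambda qin, q, a: qin + q * a, local_value, catchment_area)
total_area = accumulate(lambda ain, a: ain + a, catchment_area)
weighted_mean = {c: weighted_sum[c] / total_area[c] for c in to_comid}
```

At the outlet (reach 3), the weighted sum is 10·1 + 20·2 + 30·3 = 140 over a total area of 6, giving an area-weighted mean of about 23.3.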

Since these are catchment-scale characteristics, let’s get the catchments, then add the accumulated characteristic as a new column and plot the results.

wd = WaterData("catchmentsp")
catchments = wd.byid("featureid", comids)

c_local = catchments.merge(local, left_on="featureid", right_index=True)
c_acc = catchments.merge(runoff, left_on="featureid", right_index=True)
https://raw.githubusercontent.com/cheginit/HyRiver-examples/main/notebooks/_static/flow_accumulation.png

More examples can be found here.

Contributing

Contributions are very welcome. Please read the CONTRIBUTING.rst file for instructions.

Comments
  • Pynhd: GeoConnex NameError

    I'm using PythonWin with Python 3.10 on a Windows PC. I was trying out the quick start code for the pynhd package. When I reach the line gcx = GeoConnex("gages"), I get a NameError: name GeoConnex is not defined.

    I expected my code to assign the value GeoConnex("gages") to the variable gcx.

    import pynhd
    gcx = GeoConnex("gages")
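
A likely explanation, assuming the installed pynhd version exports GeoConnex: import pynhd binds only the name pynhd, so the class has to be referenced as pynhd.GeoConnex or imported explicitly with from pynhd import GeoConnex. The same behavior can be reproduced with a standard-library module, used here as a stand-in so the sketch runs without pynhd:

```python
import collections

try:
    counts = Counter("gages")  # the bare name was never imported
except NameError as err:
    message = str(err)

# Either qualify the name with its module ...
counts = collections.Counter("gages")

# ... or import the name explicitly.
from collections import Counter

counts_again = Counter("gages")
```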

    opened by nluft2 9
  • Example notebook import errors

    I am trying to use the example Jupyter Notebook for PyNHD.

    After commenting out the import of a colormap that isn't present:

    image

    If I change the import of pynhd to pynhd.pynhd, I get the following message:

    image

    opened by jjstarn-usgs 9
  • WBD watershed access using pynhd.WaterData vs pygeoogc.ArcGISRESTful

    Currently we can get WBD data using either pynhd.WaterData or pygeoogc.ArcGISRESTful().restful.wbd. The former accesses the USGS "waterlabs" GeoServer (which presumably is more experimental and less publicly supported) and the latter apparently accesses the official USGS TNM source. The former provides access to HUC08 and HUC12 layers, while the latter apparently only provides access to a HUC12 layer -- at least that's what I see in your pygeoogc example.

    Which one would you recommend to a new user of the hydrodata ecosystem? I think that choice is somewhat confusing to users unfamiliar with web services. Personally, I like having the option to access a HUC08 layer directly, but at the same time the projection inconsistencies in the waterlabs Geoserver layers are not user friendly.

    For WaterHackWeek I'm using pynhd.WaterData exclusively. FYI, here's the current state of my WHW tutorial notebook that uses the hydrodata suite to query for HUC layers. It's almost done, but it'll continue to change for the next couple of days.

    opened by emiliom 9
  • Bad Gateway for WaterData() URL

    Thanks for this library! I am on Windows 10 with Python 3.8.10.

    I was running through the example contained in the pynhd readme file for USGS station ID 01482100. I get an error when I run:

    wd_cat = WaterData("catchmentsp")

    catchments = wd_cat.byid("featureid", comids)

    It seems like the URL provided by WaterData() is dead. Do you know if this is a temporary problem, or a permanent change? I get the following error:

    ---------------------------------------------------------------------------
    ClientResponseError                       Traceback (most recent call last)
    ~\Anaconda3\envs\pangeo\lib\site-packages\async_retriever\async_retriever.py in _retrieve(uid, url, session, read_type, s_kwds, r_kwds)
         59 try:
    ---> 60 response.raise_for_status()
         61 resp = await getattr(response, read_type)(**r_kwds)

    ~\Anaconda3\envs\pangeo\lib\site-packages\aiohttp\client_reqrep.py in raise_for_status(self)
        999 self.release()
    -> 1000 raise ClientResponseError(
       1001 self.request_info,

    ClientResponseError: 502, message='Bad Gateway', url=URL('https://labs.waterdata.usgs.gov/geoserver/wmadata/ows')

    The above exception was the direct cause of the following exception:

    ServiceError                              Traceback (most recent call last)
    in
    ----> 1 catchments = wd_cat.byid("featureid", comids)

    ~\Anaconda3\envs\pangeo\lib\site-packages\pynhd\pynhd.py in byid(self, featurename, featureids)
        233 def byid(self, featurename: str, featureids: Union[List[str], str]) -> gpd.GeoDataFrame:
        234 """Get features based on IDs."""
    --> 235 resp = self.wfs.getfeature_byid(featurename, featureids)
        236 return self._to_geodf(resp)
        237

    ~\Anaconda3\envs\pangeo\lib\site-packages\pygeoogc\pygeoogc.py in getfeature_byid(self, featurename, featureids)
        502
        503 if len(featureids) > 200:
    --> 504 return self.getfeature_byfilter(f"{featurename} IN ({fid_list})", method="POST")
        505
        506 return self.getfeature_byfilter(f"{featurename} IN ({fid_list})")

    ~\Anaconda3\envs\pangeo\lib\site-packages\pygeoogc\pygeoogc.py in getfeature_byfilter(self, cql_filter, method)
        550 elif method == "POST":
        551 headers = {"content-type": "application/x-www-form-urlencoded"}
    --> 552 resp = ar.retrieve(
        553 [self.url], self.read_method, [{"data": payload, "headers": headers}], "POST"
        554 )

    ~\Anaconda3\envs\pangeo\lib\site-packages\async_retriever\async_retriever.py in retrieve(urls, read, request_kwds, request_method, max_workers, cache_name, family)
        190 )
        191
    --> 192 return [r for _, r in sorted(tlz.concat(results))]
        193
        194

    ~\Anaconda3\envs\pangeo\lib\site-packages\async_retriever\async_retriever.py in (.0)
        184 chunked_reqs = tlz.partition_all(max_workers, inp.url_kwds)
        185 results = (
    --> 186 loop.run_until_complete(
        187 async_session(c, inp.read, inp.r_kwds, inp.request_method, inp.cache_name, inp.family),
        188 )

    ~\Anaconda3\envs\pangeo\lib\site-packages\nest_asyncio.py in run_until_complete(self, future)
         68 raise RuntimeError(
         69 'Event loop stopped before Future completed.')
    ---> 70 return f.result()
         71
         72 def _run_once(self):

    ~\Anaconda3\envs\pangeo\lib\asyncio\futures.py in result(self)
        176 self.__log_traceback = False
        177 if self._exception is not None:
    --> 178 raise self._exception
        179 return self._result
        180

    ~\Anaconda3\envs\pangeo\lib\asyncio\tasks.py in __step(failed resolving arguments)
        278 # We use the send method directly, because coroutines
        279 # don't have __iter__ and __next__ methods.
    --> 280 result = coro.send(None)
        281 else:
        282 result = coro.throw(exc)

    ~\Anaconda3\envs\pangeo\lib\site-packages\async_retriever\async_retriever.py in async_session(url_kwds, read, r_kwds, request_method, cache_name, family)
        115 request_func = getattr(session, request_method.lower())
        116 tasks = (_retrieve(uid, u, request_func, read, kwds, r_kwds) for uid, u, kwds in url_kwds)
    --> 117 return await asyncio.gather(*tasks)
        118
        119

    ~\Anaconda3\envs\pangeo\lib\asyncio\tasks.py in __wakeup(self, future)
        347 def __wakeup(self, future):
        348 try:
    --> 349 future.result()
        350 except BaseException as exc:
        351 # This may also be a cancellation.

    ~\Anaconda3\envs\pangeo\lib\asyncio\tasks.py in __step(failed resolving arguments)
        278 # We use the send method directly, because coroutines
        279 # don't have __iter__ and __next__ methods.
    --> 280 result = coro.send(None)
        281 else:
        282 result = coro.throw(exc)

    ~\Anaconda3\envs\pangeo\lib\site-packages\async_retriever\async_retriever.py in _retrieve(uid, url, session, read_type, s_kwds, r_kwds)
         61 resp = await getattr(response, read_type)(**r_kwds)
         62 except (ClientResponseError, ContentTypeError) as ex:
    ---> 63 raise ServiceError(await response.text()) from ex
         64 else:
         65 return uid, resp

    ServiceError:

    502 Bad Gateway

    502 Bad Gateway

    bug
    opened by sjordan29 7
  • InvalidInputValue Error Using NLDI navigate_byid()

    What happened: Following an example notebook - dam_impact.ipynb fails at the cell (17) using navigate_byid(). All previous cells executed successfully and in sequence. Returns the following error -

    ---------------------------------------------------------------------------
    InvalidInputValue                         Traceback (most recent call last)
    c:\Users\keonm\Documents\GitHub\HyRiver-examples\Geospatial Hydrologic Data Using Web Services.ipynb Cell 20' in <cell line: 3>()
          3 for agency, fid in sites[["agency_cd", "site_no"]].itertuples(index=False, name=None):
          4     try:
    ----> 5         flw_up[fid] = nldi.navigate_byid(
          6             fsource="nwissite",
          7             fid=f"{agency}-{fid}",
          8             navigation="upstreamTributaries",
          9             source="flowlines",
         10             distance=10)
         11     except ZeroMatched:
         12         noflw.append(fid)

    File ~\miniconda3\envs\pygeo-hyriver\lib\site-packages\pynhd\pynhd.py:945, in NLDI.navigate_byid(self, fsource, fid, navigation, source, distance, trim_start)
        942     raise ZeroMatched
        944 if navigation not in valid_navigations.keys():
    --> 945     raise InvalidInputValue("navigation", list(valid_navigations.keys()))
        947 url = valid_navigations[navigation]
        949 r_json = self._get_url(url)
    
    InvalidInputValue: Given navigation is invalid. Valid options are:
    description
    type
    

    What you expected to happen: Cell executes + calls NLDI web service using function

    Environment: Created conda environment using repo's .yml. Installed Jupyter Lab & running in VSCode.

    opened by kdmonroe 5
  • Invalid projection issue while importing "get_basins()" method from "pynhd".

    What happened: image image

    What you expected to happen:

    Minimal Complete Verifiable Example:

    # Put your MCVE code here
    

    Anything else we need to know?: I am using Python 3.8, for which I get errors about failing to load DLL files. Do I need to use an earlier version of Python?

    Environment:

    Output of pygeohydro.show_versions()
    
    opened by rezaulwre 5
  • Error while using pynhd.

    What happened: I am trying to use pynhd but it gives the error: "ImportError: DLL load failed while importing lib: The specified module could not be found." How to solve this?

    pynhd.show_versions()

    ImportError                               Traceback (most recent call last)
    in
    ----> 1 pynhd.show_versions()

    ~\anaconda3\lib\site-packages\pynhd\print_versions.py in show_versions(file)
        168 for (modname, ver_f) in deps:
        169 try:
    --> 170 mod = _get_mod(modname)
        171 except ModuleNotFoundError:
        172 deps_blob.append((modname, None))

    ~\anaconda3\lib\site-packages\pynhd\print_versions.py in _get_mod(modname)
         94 return sys.modules[modname]
         95 try:
    ---> 96 return importlib.import_module(modname)
         97 except ModuleNotFoundError:
         98 return importlib.import_module(modname.replace("-", ""))

    ~\anaconda3\lib\importlib\__init__.py in import_module(name, package)
        125 break
        126 level += 1
    --> 127 return _bootstrap._gcd_import(name[level:], package, level)
        128
        129

    ~\anaconda3\lib\importlib\_bootstrap.py in _gcd_import(name, package, level)

    ~\anaconda3\lib\importlib\_bootstrap.py in _find_and_load(name, import_)

    ~\anaconda3\lib\importlib\_bootstrap.py in _find_and_load_unlocked(name, import_)

    ~\anaconda3\lib\importlib\_bootstrap.py in _load_unlocked(spec)

    ~\anaconda3\lib\importlib\_bootstrap_external.py in exec_module(self, module)

    ~\anaconda3\lib\importlib\_bootstrap.py in _call_with_frames_removed(f, *args, **kwds)

    ~\anaconda3\lib\site-packages\pygeos\__init__.py in
         32 # end delvewheel patch
         33
    ---> 34 from .lib import GEOSException # NOQA
         35 from .lib import Geometry # NOQA
         36 from .lib import geos_version, geos_version_string # NOQA

    ImportError: DLL load failed while importing lib: The specified module could not be found.

    What you expected to happen:

    Minimal Complete Verifiable Example:

    # Put your MCVE code here
    

    Anything else we need to know?:

    Environment: I am using Python 3.7.12 with anaconda.

    Output of pynhd.show_versions()
    
    
    opened by rezaulwre 4
  • flow_trace() only returns one upstream river reach

    I expected the flow_trace() function to return all of the upstream river reaches, but it is only returning one. I think this is because the NLDI API changed, and now requires distance as a parameter. https://waterdata.usgs.gov/blog/nldi_update/#distance-is-now-a-required-query-parameter

    import pynhd
    pygeoapi = pynhd.PyGeoAPI()
    lng, lat = -73.82705, 43.29139
    trace = pygeoapi.flow_trace((lng, lat),  crs="epsg:4326", direction="up")
    print(len(trace))
    

    returns 1 (expected this watershed to contain dozens or even hundreds of river reaches).

    opened by mheberger 3
  • Add support for StreamStat

    Is your feature request related to a problem? Please describe. Add support for StreamStat following the suggestion in cheginit/pygeohydro#38

    Describe the solution you'd like

    NLDI and StreamStats are working together to revise the NLDI delineation tools so they will delineate from a click point not just from the catchment. The data processing steps and quality assurance work, as well as the underlying data in StreamStats, typically mean that delineations from StreamStats will be more accurate than from the NHDPlus datasets being queried in NLDI. For example, South Carolina data is based on lidar data, we're currently working on 3-meter lidar data in Nebraska. Thus, depending on the use, you may want to include the option of using StreamStats as well as NLDI.

    Describe alternatives you've considered We need to figure out a way to implement StreamStat that can either complement NLDI and/or work with a similar API to NLDI.

    Additional context @USGSPMM3, I went through the documentation and it seems that it's designed with a specific workflow in mind. I was wondering if you can provide a common example. Also, can you explain the importance of rcode? I don't understand the reason behind rcode being mandatory when you can provide lon and lat.

    enhancement 
    opened by cheginit 3
  • Error when using nhd.byids("COMID", main.index.tolist()) (River Elevation and Cross-Section example)

    What happened: I'm just trying to run the example notebook "River Elevation and Cross-Section"

    I got an error at cell 13 when trying to query the flowlines by COMID

    image

    I get a JSONDecodeError

    image

    Minimal Complete Verifiable Example: This simple code produces the same error:

    nhd = NHD("flowline_mr")
    main_nhd = nhd.byids('COMID',['1722317'])
    

    Environment:

    Output of pynhd.show_versions()
    INSTALLED VERSIONS
    ------------------
    commit: None
    python: 3.9.13 | packaged by conda-forge | (main, May 27 2022, 16:50:36) [MSC v.1929 64 bit (AMD64)]
    python-bits: 64
    OS: Windows
    OS-release: 10
    machine: AMD64
    processor: Intel64 Family 6 Model 141 Stepping 1, GenuineIntel
    byteorder: little
    LC_ALL: None
    LANG: None
    LOCALE: English_United States.1252
    libhdf5: 1.12.1
    libnetcdf: 4.8.1
    
    aiodns: 3.0.0
    aiohttp: 3.8.1
    aiohttp-client-cache: 0.7.1
    aiosqlite: 0.17.0
    async-retriever: 0.3.3
    bottleneck: 1.3.4
    brotli: installed
    cchardet: 2.1.7
    click: 8.1.3
    cytoolz: 0.11.2
    dask: 2022.6.1
    defusedxml: 0.7.1
    folium: 0.12.1.post1
    geopandas: 0.11.0
    lxml: 4.9.0
    matplotlib: 3.4.3
    netCDF4: 1.5.8
    networkx: 2.8.4
    numpy: 1.23.0
    owslib: 0.25.0
    pandas: 1.4.3
    py3dep: 0.0
    pyarrow: 6.0.0
    pydantic: 1.9.1
    pydaymet: 0.13.2
    pygeohydro: 0.13.2
    pygeoogc: 0.13.2
    pygeos: 0.12.0
    pygeoutils: 0.13.2
    pynhd: 0.13.2
    pyproj: 3.3.1
    pytest: None
    pytest-cov: None
    rasterio: 1.2.10
    requests: 2.28.0
    requests-cache: 0.9.4
    richdem: None
    rioxarray: 0.11.1
    scipy: 1.8.1
    shapely: 1.8.2
    tables: None
    ujson: 5.3.0
    urllib3: 1.26.9
    xarray: 2022.3.0
    xdist: None
    yaml: 6.0
    
    opened by LucRSquared 2
  • BOT: [skip ci] Bump styfle/cancel-workflow-action from 0.9.1 to 0.10.0

    Bumps styfle/cancel-workflow-action from 0.9.1 to 0.10.0.

    Release notes

    Sourced from styfle/cancel-workflow-action's releases.

    0.10.0

    Changes

    • Feat(all):support for considering all workflows with one term: #165
    • Chore: rebuild: 74a81dc1a9321342ebc12fa8670cc91600c8c494
    • Chore: update main.yml: #78
    • Bump @​vercel/ncc from 0.28.6 to 0.29.1: #106
    • Bump @​vercel/ncc from 0.29.1 to 0.29.2: #109
    • Bump @​vercel/ncc from 0.29.2 to 0.30.0: #112
    • Bump husky from 7.0.1 to 7.0.2: #110
    • Bump prettier from 2.3.2 to 2.4.0: #116
    • Bump @​vercel/ncc from 0.30.0 to 0.31.1: #115
    • Bump typescript from 4.3.5 to 4.4.3: #114
    • Bump prettier from 2.4.0 to 2.4.1: #117
    • Bump @​actions/github from 4.0.0 to 5.0.0: #89
    • Bump @​actions/core from 1.3.0 to 1.6.0: #118
    • Bump typescript from 4.4.3 to 4.4.4: #119
    • Bump husky from 7.0.2 to 7.0.4: #120
    • Bump typescript from 4.4.4 to 4.5.2: #124
    • Bump @​vercel/ncc from 0.31.1 to 0.32.0: #123
    • Bump prettier from 2.4.1 to 2.5.0: #125
    • Bump prettier from 2.5.0 to 2.5.1: #126
    • Bump @​vercel/ncc from 0.32.0 to 0.33.0: #127
    • Bump typescript from 4.5.2 to 4.5.3: #128
    • Bump @​vercel/ncc from 0.33.0 to 0.33.1: #130
    • Bump typescript from 4.5.3 to 4.5.4: #129
    • Bump typescript from 4.5.4 to 4.5.5: #131
    • Bump node-fetch from 2.6.5 to 2.6.7: #132
    • Bump @​vercel/ncc from 0.33.1 to 0.33.3: #138
    • Bump actions/setup-node from 2 to 3.0.0: #140
    • Bump actions/checkout from 2 to 3: #141
    • Bump typescript from 4.5.5 to 4.6.2: #142
    • Bump prettier from 2.5.1 to 2.6.0: #143
    • Bump prettier from 2.6.0 to 2.6.1: #145
    • Bump actions/setup-node from 3.0.0 to 3.1.0: #146
    • Bump typescript from 4.6.2 to 4.6.3: #144
    • Bump prettier from 2.6.1 to 2.6.2: #147
    • Bump @​actions/github from 5.0.0 to 5.0.1: #148
    • Bump actions/setup-node from 3.1.0 to 3.1.1: #149
    • Bump @​vercel/ncc from 0.33.3 to 0.33.4: #151
    • Bump @​actions/core from 1.6.0 to 1.7.0: #153
    • Bump typescript from 4.6.3 to 4.6.4: #154
    • Bump husky from 7.0.4 to 8.0.1: #155
    • Bump @​actions/core from 1.7.0 to 1.8.0: #156
    • Bump actions/setup-node from 3.1.1 to 3.2.0: #159
    • Bump @​actions/github from 5.0.1 to 5.0.3: #157
    • Bump @​actions/core from 1.8.0 to 1.8.2: #158
    • Bump typescript from 4.6.4 to 4.7.2: #160
    • Bump @​vercel/ncc from 0.33.4 to 0.34.0: #161
    • Bump typescript from 4.7.2 to 4.7.3: #163

    ... (truncated)

    dependencies 
    opened by dependabot[bot] 2
Releases(v0.13.8)
  • v0.13.8(Dec 9, 2022)

    Release Notes

    New Features

    • Add a new function, called nhdplus_attrs_s3, for accessing the recently released NHDPlus derived attributes on a USGS S3 bucket. The attributes are provided in parquet files, so getting them is faster than with nhdplus_attrs. Also, you can request multiple attributes at once, whereas nhdplus_attrs requires a separate request for each attribute. This function will replace nhdplus_attrs in a future release, as soon as all data that are available on the ScienceBase version are also accessible from the S3 bucket.
    • Add two new functions called mainstem_huc12_nx and enhd_flowlines_nx. These functions generate a networkx directed graph object of NHD HUC12 water boundaries and flowlines, respectively. They also return a dictionary mapping of COMID and HUC12 to the corresponding networkx node. Additionally, a topologically sorted list of COMIDs/HUC12s is returned. The generated data are useful for doing US-scale network analysis and flow accumulation on the NHD network. The NHD graph has about 2.7 million edges and the mainstem HUC12 graph has about 80K edges.
    • Add a new function for getting the entire NHDPlus dataset for CONUS (Lower 48), called nhdplus_l48. The entire NHDPlus dataset is downloaded from here. This 7.3 GB file will take a while to download, depending on your internet connection. The first time you run this function, the file will be downloaded and stored in the ./cache directory. Subsequent calls will use the cached file. Moreover, there are two additional dependencies for using this function: pyogrio and py7zr. These dependencies can be installed using pip install pyogrio py7zr or conda install -c conda-forge pyogrio py7zr.
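To illustrate why a topologically sorted list of IDs is useful for flow accumulation, here is a minimal sketch on a toy network. It is not PyNHD's implementation: the flowline IDs and runoff values are made up, and the standard-library graphlib module stands in for networkx so that the example has no dependencies.

```python
from graphlib import TopologicalSorter

# Toy network (hypothetical IDs): each key is a flowline and its value is the
# flowline immediately downstream; None marks the outlet.
to_downstream = {1: 3, 2: 3, 3: 5, 4: 5, 5: None}
local_runoff = {1: 1.0, 2: 2.0, 3: 0.5, 4: 4.0, 5: 1.5}

# TopologicalSorter expects {node: predecessors}; the predecessors of a
# flowline are its direct upstream neighbors.
upstream = {node: set() for node in to_downstream}
for node, down in to_downstream.items():
    if down is not None:
        upstream[down].add(node)

# In this order, every flowline appears after all of its upstream flowlines.
order = list(TopologicalSorter(upstream).static_order())

# One pass in topological order is enough to accumulate flow downstream.
accumulated = {}
for node in order:
    accumulated[node] = local_runoff[node] + sum(accumulated[u] for u in upstream[node])

print(accumulated[5])
```

Because each node is visited only after its upstream neighbors, the accumulation needs a single pass, which is what makes this approach practical on graphs with millions of edges.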

    Internal Changes

    • Refactor vector_accumulation for significant performance improvements.
    • Modify the codebase based on Refurb suggestions.
  • v0.13.7(Nov 4, 2022)

    Release Notes

    New Features

    • Add a new function called epa_nhd_catchments to access one of the EPA's HMS endpoints called WSCatchment. You can use this function to access 414 catchment-scale characteristics for all the NHDPlus catchments including 16-day average curve number. More information on the curve number dataset can be found at its project page here.

    Bug Fixes

    • Fix a bug in NHDTools where, due to recent changes in pandas exception handling, NHDTools failed to convert columns with NaN values to integer type. pandas now raises IntCastingNaNError instead of TypeError when the astype method is used on such a column.

    Internal Changes

    • Use pyupgrade package to update the type hinting annotations to Python 3.10 style.
  • v0.13.6(Aug 30, 2022)

  • v0.13.5(Aug 29, 2022)

    Release Notes

    Breaking Changes

    • Append "Error" to all exception classes for conforming to PEP-8 naming conventions.

    Internal Changes

    • Bump the minimum versions of pygeoogc and pygeoutils to 0.13.5 and that of async-retriever to 0.3.5.

    Bug Fixes

    • Fix an issue in the nhdplus_vaa and enhd_attrs functions where the cache folder was not created if it did not already exist, resulting in an error.
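The general pattern behind this kind of fix can be sketched as follows. This is only an illustration using the standard library; the helper name cached_file and the file name are hypothetical and are not part of PyNHD.

```python
import tempfile
from pathlib import Path


def cached_file(cache_dir: Path, name: str, payload: bytes) -> Path:
    """Write payload to cache_dir/name once and return the path (hypothetical helper)."""
    # Create the folder, including missing parents, before writing to it.
    # exist_ok=True makes repeated calls harmless.
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / name
    if not target.exists():
        target.write_bytes(payload)
    return target


with tempfile.TemporaryDirectory() as tmp:
    # The nested cache folder does not exist yet; the helper creates it.
    path = cached_file(Path(tmp) / "cache" / "nested", "vaa.bin", b"demo")
    created = path.exists()
    content = path.read_bytes()
```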
  • v0.13.3(Jul 31, 2022)

    Release Notes

    Internal Changes

    • Use the new async_retriever.stream_write function to download files in nhdplus_vaa and enhd_attrs functions. This is more memory efficient.
    • Change the list of not-found items returned by NLDI.comid_byloc and NLDI.feature_byloc from a list of strings to a list of coordinate tuples, so that the not-found coordinates have the same type as the inputs.
    • Fix an issue with NLDI that was caused by the recent changes in the NLDI web service's error handling. The NLDI web service now returns more descriptive error messages in a json format instead of returning the usual status errors.
    • Slice the ENHD dataframe in NHDTools.clean_flowlines before updating the flowline dataframe to reduce the required memory for the update operation.
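As a rough illustration of why streaming a download to disk is more memory efficient, the sketch below copies a file-like object in fixed-size chunks instead of reading it into memory first. It does not reproduce async_retriever.stream_write; the function name stream_to_file and the in-memory payload are assumptions made for the example.

```python
import io
import shutil
import tempfile
from pathlib import Path


def stream_to_file(source, target: Path, chunk_size: int = 64 * 1024) -> int:
    """Copy a binary file-like object to disk chunk by chunk; return the file size."""
    with target.open("wb") as out:
        # Only one chunk of at most chunk_size bytes is held in memory at a time.
        shutil.copyfileobj(source, out, chunk_size)
    return target.stat().st_size


with tempfile.TemporaryDirectory() as tmp:
    payload = io.BytesIO(b"x" * 200_000)  # stands in for a streamed response body
    written = stream_to_file(payload, Path(tmp) / "demo.bin")
```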
  • v0.13.2(Jun 14, 2022)

    Release Notes

    Breaking Changes

    • Set the minimum supported version of Python to 3.8 since many of the dependencies such as xarray, pandas, rioxarray have dropped support for Python 3.7.

    Internal Changes

    • Use micromamba for running tests and use nox for linting in CI.
  • v0.13.1(Jun 12, 2022)

    Release Notes

    New Features

    • Add support for all the GeoConnex web service endpoints. There are two ways to use it. For a single query, you can use the geoconnex function and for multiple queries, it's more efficient to use the GeoConnex class.
    • Add support for passing any of the supported NLDI feature sources to the get_basins method of the NLDI class. The default is nwissite to retain backward compatibility.

    Bug Fixes

    • Set the type of "ReachCode" column to str instead of int in pygeoapi and nhdplus_vaa functions.
  • v0.13.0(Apr 4, 2022)

    Release Notes

    New Features

    • Add two new functions called flowline_resample and network_resample for resampling a flowline or network of flowlines based on a given spacing. This is useful for smoothing jagged flowlines similar to those in the NHDPlus database.
    • Add support for the new NLDI endpoint called "hydrolocation". The NLDI class now has two methods for getting features by coordinates: feature_byloc and comid_byloc. The feature_byloc method returns the flowline that is associated with the closest NHDPlus feature to the given coordinates. The comid_byloc method returns a point on the closest downstream flowline to the given coordinates.
    • Add a new function called pygeoapi for calling the API in batch mode. This function accepts the input coordinates as a geopandas.GeoDataFrame. It is more performant than calling its counterpart, the PyGeoAPI class, multiple times. It's recommended to switch to this new batch function instead of the PyGeoAPI class. Users just need to prepare an input data frame that has all the required service parameters as columns.
    • Add a new step to prepare_nhdplus to convert MultiLineString to LineString.
    • Add support for the simplified flag of NLDI's get_basins function. The default value is True to retain the old behavior.
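The idea of resampling a flowline at a given spacing can be sketched in plain Python. This is a simplified planar version written for illustration, not the algorithm used by flowline_resample; the function name resample_polyline is hypothetical, and coordinates are assumed to be in a projected system where distances are meaningful.

```python
import math


def resample_polyline(points, spacing):
    """Return points placed every `spacing` units along a polyline, plus its end point."""
    if spacing <= 0:
        raise ValueError("spacing must be positive")
    resampled = [points[0]]
    carried = 0.0  # distance travelled since the last emitted point
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        seg_len = math.hypot(x1 - x0, y1 - y0)
        if seg_len == 0:
            continue  # skip repeated vertices
        # Distance along this segment at which the next point falls.
        offset = spacing - carried
        while offset <= seg_len:
            t = offset / seg_len
            resampled.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            offset += spacing
        carried = seg_len - (offset - spacing)
    # Always keep the original end point so the line is not shortened.
    if resampled[-1] != points[-1]:
        resampled.append(points[-1])
    return resampled


line = [(0, 0), (2, 0), (2, 2)]
print(resample_polyline(line, 1))
```

The original vertices are not preserved unless a sample happens to land on them, which is why evenly spaced resampling tends to smooth out jagged geometry.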

    Breaking Changes

    • Remove caching-related arguments from all functions since now they can be set globally via three environment variables:

      • HYRIVER_CACHE_NAME: Path to the caching SQLite database.
      • HYRIVER_CACHE_EXPIRE: Expiration time for cached requests in seconds.
      • HYRIVER_CACHE_DISABLE: Disable reading/writing from/to the cache file.

      You can do this like so:

    import os
    
    os.environ["HYRIVER_CACHE_NAME"] = "path/to/file.sqlite"
    os.environ["HYRIVER_CACHE_EXPIRE"] = "3600"
    os.environ["HYRIVER_CACHE_DISABLE"] = "true"
    
  • v0.12.2(Feb 4, 2022)

    Release Notes

    New Features

    • Add a new class called NHD for accessing the latest National Hydrography Dataset. More info regarding this data can be found here.
    • Add two new functions for getting cross-sections along a single flowline via flowline_xsection or throughout a network of flowlines via network_xsection. You can specify spacing and width parameters to control their location. For more information and examples, please consult the documentation.
    • Add a new property to AGRBase called service_info to include some useful info about the service including feature_types which can be handy for converting numeric values of types to their string equivalent.
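To show how the spacing and width parameters of a cross-section relate to the geometry, the following sketch computes one cross-section perpendicular to a straight segment. It is a simplified planar illustration, not the code behind flowline_xsection; the function cross_section and its arguments are hypothetical.

```python
import math


def cross_section(p0, p1, distance, width):
    """Return the end points of a line of length `width` that crosses
    segment p0 -> p1 at right angles, `distance` units from p0."""
    (x0, y0), (x1, y1) = p0, p1
    seg_len = math.hypot(x1 - x0, y1 - y0)
    if seg_len == 0:
        raise ValueError("segment has zero length")
    ux, uy = (x1 - x0) / seg_len, (y1 - y0) / seg_len  # unit tangent
    cx, cy = x0 + distance * ux, y0 + distance * uy  # centre on the segment
    nx, ny = -uy, ux  # unit normal: the tangent rotated by 90 degrees
    half = width / 2
    return (cx - half * nx, cy - half * ny), (cx + half * nx, cy + half * ny)


# A cross-section 2 units wide, 4 units along a segment that runs due east.
print(cross_section((0, 0), (10, 0), 4, 2))
```

Repeating this at regular distances along each segment gives cross-sections at a fixed spacing.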

    Internal Changes

    • Use the new PyGeoAPI API.
    • Refactor prepare_nhdplus for improving the performance and robustness of determining tocomid within a network of NHD flowlines.
    • Add empty geometries that NLDI.getbasins returns to the list of not found IDs. This is because the NLDI service does not include non-network flowlines and instead returns an empty geometry for these flowlines (issue #48).
  • v0.12.1(Dec 31, 2021)

    Release Notes

    Internal Changes

    • Use the three new ar.retrieve_* functions instead of the old ar.retrieve function to improve type hinting and to make the API more consistent.
    • Revert to the original PyGeoAPI base URL.
  • v0.12.0(Dec 28, 2021)

    Release Notes

    Breaking Changes

    • Rewrite ScienceBase to make it generally usable for working with other ScienceBase items. A new function has been added for staging the Additional NHDPlus attributes items called stage_nhdplus_attrs.
    • Refactor AGRBase to remove unnecessary functions and make it more general.
    • Update the PyGeoAPI class to conform to the new pygeoapi API. This web service is undergoing changes at the time of this release, so its API is not stable and might not work as expected. As soon as the web service is stable, a new version will be released.

    New Features

    • In WaterData.byid show a warning if there are any missing feature IDs that are requested but are not available in the dataset.
    • For all by* methods of WaterData throw a ZeroMatched exception if no features are found.
    • Add expire_after and disable_caching arguments to all functions that use async_retriever. Set the default request caching expiration time to never expire. You can use disable_caching if you don't want to use the cached responses. Please refer to the documentation of the functions for more details.

    Internal Changes

    • Refactor prepare_nhdplus to reduce code complexity by grouping all the NHDPlus tools as a private class.
    • Modify AGRBase to reflect the latest API changes in the pygeoogc.ArcGISRESTful class.
    • Refactor prepare_nhdplus by creating a private class that includes all the previously used private functions. This will make the code more readable and easier to maintain.
    • Add all the missing types so mypy --strict passes.
  • v0.11.4(Nov 12, 2021)

    Release Notes

    New Features

    • Add a new argument to NLDI.get_basins called split_catchment which if set to True will split the basin geometry at the watershed outlet.

    Internal Changes

    • Catch service errors in PyGeoAPI and show useful error messages.
    • Use importlib-metadata for getting the version instead of pkg_resources to decrease import time as discussed in this issue.
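A minimal sketch of this approach is shown below. It uses importlib.metadata from the standard library (Python 3.8 and later), of which the importlib-metadata package is a backport. The helper name package_version and the fallback value are assumptions for the example, not PyNHD's code.

```python
from importlib.metadata import PackageNotFoundError, version


def package_version(name: str, default: str = "999") -> str:
    """Return the installed version of a distribution, or `default` if it is not installed."""
    try:
        # Reads the version from the installed package metadata without
        # importing pkg_resources, which is slow to import.
        return version(name)
    except PackageNotFoundError:
        return default


print(package_version("a-distribution-that-should-not-exist-xyz"))
```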
  • v0.11.3(Sep 11, 2021)

    Release Notes

    Internal Changes

    • More robust handling of inputs and outputs of NLDI's methods.
    • Use an alternative download link for NHDPlus VAA file on Hydroshare.
    • Restructure the code base to reduce the complexity of the pynhd.py file by dividing it into three files: pynhd, which has all the classes that provide access to the supported web services; core, which includes the base classes; and nhdplus_derived, which has functions for getting databases that provide additional attributes for the NHDPlus database.
  • v0.11.2(Aug 27, 2021)

  • v0.11.1(Jul 31, 2021)

    Release Notes

    New Features

    • Add a function for getting all NHD Fcodes as a dataframe, called nhd_fcode.
    • Improve prepare_nhdplus function by removing all coastlines and better detection of the terminal point in a network.

    Internal Changes

    • Migrate to using AsyncRetriever for handling communications with web services.
    • Catch ConnectionError separately in NLDI and raise a ServiceError instead, so the user knows that data cannot be returned because the server is out of service, not because no features matched (ZeroMatched).
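The distinction between the two failure modes can be sketched as follows. The exception classes here are local stand-ins that only borrow the names used in these notes, and the query function with its fetch argument is hypothetical. None of this is PyNHD's actual code.

```python
class ServiceError(Exception):
    """Stand-in: the remote service could not be reached or failed."""


class ZeroMatched(Exception):
    """Stand-in: the service answered, but no feature matched the query."""


def query(fetch, feature_id):
    """Call `fetch` and translate its failures into two distinct errors."""
    try:
        features = fetch(feature_id)
    except ConnectionError as ex:
        # The server is unreachable: report that, not an empty result.
        raise ServiceError(f"Service unavailable: {ex}") from ex
    if not features:
        raise ZeroMatched(f"No feature found for {feature_id!r}")
    return features


# A fetcher that succeeds, standing in for a web service call.
print(query(lambda fid: [fid], "01031500"))
```

Keeping the two errors separate lets callers retry on a service outage while treating an empty result as a definitive answer.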
  • v0.11.0(Jun 19, 2021)

    Release Notes

    New Features

    • Add nhdplus_vaa to access NHDPlus Value Added Attributes for all its flowlines.
    • To see a list of available layers in NHDPlus HR, you can instantiate its class without passing any argument, like so: NHDPlusHR().

    Breaking Changes

    • Drop support for Python 3.6 since many of the dependencies such as xarray and pandas have done so.

    Internal Changes

    • Use persistent caching for all requests which can help speed up network responses significantly.
    • Improve documentation and testing.
  • v0.10.1(Mar 27, 2021)

  • v0.10.0(Mar 6, 2021)

  • v0.9.0(Feb 17, 2021)

  • v0.2.0(Dec 7, 2020)

  • v0.1.3(Aug 18, 2020)

  • v0.1.2(Aug 12, 2020)

  • v0.1.1(Aug 4, 2020)

  • v0.1.0(Jul 24, 2020)

Owner
Taher Chegini
Ph.D. candidate, Civil and Environmental Engineering