Intake is a lightweight package for finding, investigating, loading and disseminating data.

Last update: Jan 01, 2023

Overview

Intake: A general interface for loading data

Intake is a lightweight set of tools for loading and sharing data in data science projects. Intake helps you:

- Load data from a variety of formats (see the current list of known plugins) into containers you already know, like Pandas dataframes, Python lists, NumPy arrays, and more.
- Convert boilerplate data loading code into reusable Intake plugins.
- Describe data sets in catalog files for easy reuse and sharing between projects and with others.
- Share catalog information (and data sets) over the network with the Intake server.
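
As a rough illustration of the catalog idea above, the sketch below writes a tiny CSV file and a catalog file that describes it, then reads it back through Intake. It assumes the Intake 0.x catalog format and the built-in csv driver (which needs the dataframe dependencies); the source name demo, the file names and both helper functions are hypothetical and not part of Intake.

```python
from pathlib import Path

# Hypothetical catalog with a single CSV source called "demo".
# CATALOG_DIR is filled in by Intake with the directory of the catalog file.
CATALOG_YAML = """\
sources:
  demo:
    description: Small example table (illustrative only)
    driver: csv
    args:
      urlpath: "{{ CATALOG_DIR }}/demo.csv"
"""


def write_demo_catalog(directory):
    """Write demo.csv and catalog.yml into `directory`; return the catalog path."""
    directory = Path(directory)
    (directory / "demo.csv").write_text("name,value\na,1\nb,2\n")
    catalog_path = directory / "catalog.yml"
    catalog_path.write_text(CATALOG_YAML)
    return catalog_path


def load_demo(catalog_path):
    """Open the catalog with Intake and read the "demo" source.

    Assumes Intake and its dataframe dependencies are installed.
    """
    import intake  # imported lazily so the rest of the sketch runs without it

    catalog = intake.open_catalog(str(catalog_path))
    return catalog["demo"].read()
```

Calling write_demo_catalog on a scratch directory and passing the result to load_demo should give back the two-row table as a Pandas dataframe. The same catalog.yml can then be shared with another project, which is the reuse the list above describes.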

Documentation is available at Read the Docs.

The status of Intake and related packages is available on the Status Dashboard.

Weekly news about this repo and other related projects can be found on the wiki.

Install

Recommended method using conda:

conda install -c conda-forge intake

You can also install using pip, in which case you can choose how many of the optional dependencies to install. The simplest option has the fewest requirements:

pip install intake

You can also add the optional extras [server], [plot] and [dataframe], or include everything with:

pip install intake[complete]

Note that you may well need specific drivers and other plugins, which usually have additional dependencies of their own.
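
Since drivers and plugins bring their own dependencies, it can help to check what is importable before loading data. The snippet below is a minimal sketch using only the Python standard library; missing_modules is a hypothetical helper, not part of Intake, and the module names you pass to it are up to you.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names from `names` that cannot be found."""
    missing = []
    for name in names:
        try:
            found = find_spec(name) is not None
        except (ImportError, ValueError):
            # find_spec can raise for broken or half-initialised modules;
            # treat those as missing for the purpose of this check.
            found = False
        if not found:
            missing.append(name)
    return missing
```

For example, missing_modules(["intake", "pandas"]) returns an empty list when both modules are installed in the current environment.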

Development

- Create a development Python environment with the required dependencies, ideally with conda. The requirements can be found in the yml files in the scripts/ci/ directory of this repo.
  - e.g. conda env create -f scripts/ci/environment-py38.yml and then conda activate test_env
- Install intake using pip install -e .[complete]
- Use pytest to run tests.
- Create a fork on GitHub to be able to submit PRs.
- We respect, but do not enforce, PEP 8 standards; all new code should be covered by tests.
