Python package for analyzing behavioral data for Brain Observatory: Visual Behavior

Overview

Allen Institute Visual Behavior Analysis package

This repository contains code for analyzing behavioral data from the Allen Brain Observatory: Visual Behavior 2P Project.

This code is an important part of the internal Allen Institute code base and we are actively using and maintaining it. Issues are encouraged, but because this tool is so central to our mission, pull requests might not be accepted if they conflict with our existing plans.

Quickstart

Before installing, it's recommended to set up a new Python environment:

For example, using Conda:

conda create -n visual_behavior_analysis python=3.7

Then activate the environment:

conda activate visual_behavior_analysis

Then install with pip (Allen Institute internal users only):

pip install git+https://github.com/AllenInstitute/visual_behavior_analysis.git

Installation

This package is designed to be installed using standard Python packaging tools. For example,

python setup.py install

If you are using pip to manage packages and versions (recommended), you can also install using pip:

pip install ./

If you plan to contribute to the development of the package, we recommend installing in "editable" mode:

pip install -e ./

This ensures that Python uses the current, active files in the folder (even while switching between branches).

To ensure that the newly created environment is visible in Jupyter:

Activate the environment:

conda activate visual_behavior_analysis

Install ipykernel:

pip install ipykernel

Register the environment with Jupyter:

python -m ipykernel install --user --name visual_behavior_analysis

Use

First, load a Foraging2 output file:

import pandas as pd
data = pd.read_pickle(PATH_TO_FORAGING2_OUTPUT_PKL)

Then, we create the "core" data structure: a dictionary with licks, rewards, trials, running, visual stimuli, and metadata.

from visual_behavior.translator.foraging2 import data_to_change_detection_core

core_data = data_to_change_detection_core(data)
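
You can quickly confirm which pieces were extracted by listing the dictionary keys:

print(sorted(core_data.keys()))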

Finally, we create an "extended" dataframe for use in generating trial-level plots and analysis.

from visual_behavior.translator.core import create_extended_dataframe

extended_trials = create_extended_dataframe(
    trials=core_data['trials'],
    metadata=core_data['metadata'],
    licks=core_data['licks'],
    time=core_data['time'],
)
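
A quick way to sanity-check the result (a minimal sketch; the exact columns depend on your task version):

print(extended_trials.shape)
print(extended_trials.columns.tolist())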

Testing

Before committing and/or submitting a pull request, it is a good idea to run the tests.

Tests are currently run against Python 3.6.12 and 3.7.7 on GitHub using CircleCI. You can replicate those tests locally as follows:

Creating test virtual environments

cd {your local VBA directory}
conda create -n VBA_test_36 python=3.6.12
conda activate VBA_test_36
pip install .[DEV]

Then deactivate VBA_test_36 and create the 3.7 virtual environment:

conda deactivate
conda create -n VBA_test_37 python=3.7.7
conda activate VBA_test_37
pip install .[DEV]

Basic testing (external users): Baseline tests are those that can be run from outside the Allen Institute and do not require access to internal databases such as LIMS. The -m "not onprem" argument skips all tests that can only be run on internal Allen Institute servers (these are marked as onprem). To run the baseline tests, do the following:

cd {your local VBA directory}
conda activate VBA_test_36
pytest -m "not onprem" 

On Premises Testing + Basic testing (internal Allen Institute users): Some tests can only be run on premises (at the Allen Institute) because they must access our internal databases such as LIMS. Internal Allen Institute users can call pytest without the onprem marker argument, which runs ALL tests. To run everything, do the following:

cd {your local VBA directory}
conda activate VBA_test_36
pytest 

Linting / Circle CI Testing (all users):

CircleCI also checks that all files meet PEP 8 style requirements using the Flake8 module, a process referred to as 'linting'. Linting can be performed locally before committing using Flake8 as follows:

flake8 {FILE_TO_CHECK}
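
For example, to lint the whole package directory and get a summary count of violations (standard flake8 options):

flake8 visual_behavior --count --statistics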

Running a subset of tests: You can run a subset of tests by doing the following:

All tests in a subdirectory:

cd {subfolder of VBA that contains the tests you'd like to run}
conda activate VBA_test_36
pytest {add -m "not onprem" as necessary}

All tests in a single .py file:

cd {subfolder of VBA that contains the file with the tests you'd like to run}
conda activate VBA_test_36
pytest fileWithTests.py  {add -m "not onprem" as necessary}
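
A single test can also be selected by name with pytest's standard -k option, for example:

pytest fileWithTests.py -k "name_of_test" -m "not onprem"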

Contributing

Pull requests are welcome.

  1. Fork the repo
  2. Create a feature branch
  3. Commit your changes
  4. Create a pull request
  5. Tag @dougollerenshaw, @matchings to review

Contributors:

Additional Links

Comments
  • [WIP] adds Foraging2 support

    this PR is a major refactor of this repository which implements the following...

    • formalizes a "core" set of behavioral data items that can be loaded from a Change Detection task & fixes #29 & #34
    • adds support for loading these core data structures from both legacy stimulus_code pkl files and Foraging2 output files, fixes #24
    • refactors much of the repo to isolate functions based on the type of data they manipulate and the types of manipulations they perform
    • removes functions that are not critical to data transformations, daily summary plots, or mtrain analysis to visual_behavior_research (see https://github.com/AllenInstitute/visual_behavior_research/pull/2)

    Outstanding items

    These must be resolved before merging into master

    • [x] @mochic808 noted multiple pieces of data that need to be loaded from Foraging2 output files that do not appear to be present and/or are not computable from existing fields. these currently get filled with nulls, but will need to be populated with real data once Foraging2 meets our needs
    • [ ] build out remaining data columns with new foraging2 data
    • [x] the legacy loading needs to be updated to fully conform to the core data structure
    • [x] bumping the version of this repo to 0.2.0

    cc @dougollerenshaw @matchings @mochic808 @nicain @ryval

    opened by neuromusic 23
  • Major Bug: multiple segmentation directories

    https://github.com/AllenInstitute/visual_behavior_analysis/blob/8d6766218de9ec89281a15060dfac263e2d001f9/visual_behavior/ophys/io/convert_level_1_to_level_2.py#L135

    This line will select the "first" directory, but there could be multiple of these.

    bug 
    opened by nicain 20
  • cannot assign timestamps to all encoder values

    from visual_behavior.translator.core import create_extended_dataframe
    from visual_behavior.schemas.extended_trials import ExtendedTrialSchema
    from visual_behavior.translator.foraging2 import data_to_change_detection_core
    import pandas as pd
    
    foraging_file_name = "/allen/programs/braintv/production/neuralcoding/prod0/specimen_651725156/behavior_session_703485615/180530092658_363894_81c53274-e9c7-4b94-b51d-78c76c494e9d.pkl"
    
    
    data = pd.read_pickle(foraging_file_name)
    assert data['platform_info']['camstim'] == '0.3.2'
    core_data = data_to_change_detection_core(data)
    df = create_extended_dataframe(trials=core_data['trials'],metadata=core_data['metadata'],licks=core_data['licks'],time=core_data['time'],)
    

    Error message:

    Traceback (most recent call last):
      File "/home/nicholasc/projects/mtrain_api/scripts/debug.py", line 13, in <module>
        core_data = data_to_change_detection_core(data)
      File "/home/nicholasc/projects/visual_behavior_analysis/visual_behavior/translator/foraging2/__init__.py", line 45, in data_to_change_detection_core
        "running": data_to_running(data),
      File "/home/nicholasc/projects/visual_behavior_analysis/visual_behavior/translator/foraging2/__init__.py", line 218, in data_to_running
        speed_df = get_running_speed(data)[["speed (cm/s)", "time"]]  # yeah...it's dumb i kno...
      File "/home/nicholasc/projects/visual_behavior_analysis/visual_behavior/translator/foraging2/extract.py", line 713, in get_running_speed
        raise ValueError("dx and time must be the same length")
    ValueError: dx and time must be the same length
    
    bug foraging2 mtrain_upload 
    opened by nicain 18
  • catch frequency on stage0 autorewards?

    What is the catch frequency being set to in stage0 autorewards? I don't see any reference to catch frequency in params, but it is in the top level of core_data['metadata'] (=0.125). Interrogating the trial, it doesn't appear that there are any catch trials (which is correct behavior), so it would appear that this parameter isn't being applied.

    question 
    opened by dougollerenshaw 14
  • Can't read any files from foraging2 commit 0a4a96a

    A new batch of foraging2 files started showing up this evening with commit hash '0a4a96a'. Visual_behavior can't open any of them.

    Minimum code to replicate error:

    import pandas as pd
    from visual_behavior.translator.foraging2 import data_to_change_detection_core
    
    datapath= r'/users/dougo/dropbox/sampledata/stage_4/doc_images_0a4a96a_ObstinateDoCMouse.pkl'
    
    data=pd.read_pickle(datapath)
    
    core_data=data_to_change_detection_core(data)
    

    Traceback:

    ---------------------------------------------------------------------------
    ValueError                                Traceback (most recent call last)
    <ipython-input-1-796296e37cd2> in <module>()
          6 data=pd.read_pickle(datapath)
          7 
    ----> 8 core_data=data_to_change_detection_core(data)
    
    /Users/dougo/Dropbox/PythonCode/visual_behavior/visual_behavior/translator/foraging2/__init__.pyc in data_to_change_detection_core(data)
         35         "licks": data_to_licks(data),
         36         "trials": data_to_trials(data),
    ---> 37         "running": data_to_running(data),
         38         "rewards": data_to_rewards(data),
         39         "visual_stimuli": None,  # not yet implemented
    
    /Users/dougo/Dropbox/PythonCode/visual_behavior/visual_behavior/translator/foraging2/__init__.pyc in data_to_running(data)
        197     - the index of each time is the frame number
        198     """
    --> 199     speed_df = get_running_speed(data)[["speed (cm/s)", "time"]]  # yeah...it's dumb i kno...
        200 
        201     n_frames = len(speed_df)
    
    /Users/dougo/Dropbox/PythonCode/visual_behavior/visual_behavior/translator/foraging2/extract.pyc in get_running_speed(exp_data, smooth, time)
        715 
        716     if len(time) != len(dx):
    --> 717         raise ValueError("dx and time must be the same length")
        718 
        719     speed = calc_deriv(dx, time)
    
    ValueError: dx and time must be the same length
    
    bug foraging2 
    opened by dougollerenshaw 14
  • multiple pickle files.

    @matchings @wbwakeman @NileGraddis: when there are multiple pickle files, which one should we pick? I see all sorts of names for the pkl files: xxx.pkl, xxx_stimulus.pkl, xxx_session.pkl

    Currently the convert code is using the first one:
    pkl_file = [file for file in os.listdir(pkl_dir) if file.startswith(expt_date)][0]

    However, this fails for a session like session_id 790910226

    where the valid pkl file name is "xxx_stim.pkl", but the convert code picks the "xxx_session.pkl".

    What's your recommendation? ... should we just do a "try and except" loop and pick the one that works?
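
    A sketch of that try/except approach (illustrative only, not the actual convert code; pkl_dir and expt_date are assumed to be defined as in the line quoted above):

    import os
    import pandas as pd
    from visual_behavior.translator.foraging2 import data_to_change_detection_core

    candidates = [f for f in os.listdir(pkl_dir) if f.startswith(expt_date) and f.endswith('.pkl')]
    core_data = None
    for fname in candidates:
        try:
            data = pd.read_pickle(os.path.join(pkl_dir, fname))
            core_data = data_to_change_detection_core(data)
            break  # keep the first pickle that translates successfully
        except Exception:
            continue  # skip pickles that are not the behavior output
    if core_data is None:
        raise IOError("no loadable behavior pkl found in {}".format(pkl_dir))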

    Thanks.

    opened by farznaj 13
  • Remove dependency on computer list in `devices`

    Currently, visual behavior relies on a hard-coded dictionary linking computer name to 'Rig ID'. The dictionary lives in 'devices': https://github.com/AllenInstitute/visual_behavior_analysis/blob/master/visual_behavior/devices.py

    MPE is maintaining a list of computers and rig IDs in a network location. We should use that list instead. I'll follow up with a link to the MPE location in a comment.

    good first issue task 
    opened by dougollerenshaw 13
  • KeyError: 'auto_reward_vol'

    I'm getting key errors when trying to process 2P6 pkl files using this package. There is no auto_reward_vol in core_data['metadata']. There is a rewardvol key, however. Is there something missing in the foraging translator?
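
    A hedged interim workaround until the translator is sorted out (a sketch; it assumes core_data['metadata'] is a plain dict and that rewardvol holds the equivalent value):

    auto_reward_vol = core_data['metadata'].get(
        'auto_reward_vol',
        core_data['metadata'].get('rewardvol'),
    )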

    I suspect commit 7061e37 brought this to light.

    bug 
    opened by ryval 12
  • change time distribution needs fixed

    Two validation functions are currently failing due to issues with the change distribution. We need to find a way to deal with these. The most likely solution is to switch from drawing change times on a continuous exponential distribution to drawing on a discrete distribution based on the expected number of stimulus flashes in the stimulus window.
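
    A rough sketch of what drawing change times on a discrete flash-based distribution could look like (the 0.75 s flash period and the exponential time constant are assumptions for illustration; the 8.25 s window comes from the failure example below):

    import numpy as np

    flash_period = 0.75                      # seconds between flash onsets (assumed)
    stimulus_window = 8.25                   # seconds, from the validation failure below
    flash_times = np.arange(0, stimulus_window, flash_period)

    weights = np.exp(-flash_times / 2.0)     # exponential weighting; time constant is a placeholder
    weights /= weights.sum()

    change_time = np.random.choice(flash_times, p=weights)  # always lands on a flash inside the window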

    Failing validation functions:

    Function: validate_max_change_time
    Reason for failure: If the change falls on the last flash, it can fall slightly outside of the stimulus window (in the example below, the max change time is 8.256 seconds and the stimulus window is 8.250 seconds).

    Function: validate_monotonically_decreasing_number_of_change_times
    Reason for failure: The mapping from the continuous distribution to the discrete flashes distorts the exponential function. See the example below.

    Histogram of change times from \allen\aibs\mpe\Software\data\behavior\validation\stage_4\doc_images_9364d72_PerfectDoCMouse.pkl

    foraging2 
    opened by dougollerenshaw 12
  • foraging2 translator is missing change times

    @neuromusic @mochic: No catch trials are being identified when loading Foraging2 data with the master branch. When I revert to the 'fix/load_licks' branch and reload the same PKL file, the problem resolves, so it would appear not to be an issue with the underlying Foraging2 data.

    Minimum code to replicate error (on master branch):

    from visual_behavior.translator.foraging2 import data_to_change_detection_core
    from visual_behavior.change_detection.trials.extended import create_extended_dataframe
    from visual_behavior.visualization.extended_trials.daily import make_daily_figure
    import pandas as pd
    
    datapath=r"\\allen\programs\braintv\workgroups\neuralcoding\Behavior\Data\M347745\output\180430100756650000.pkl"
    
    data=pd.read_pickle(datapath)
    
    core_data=data_to_change_detection_core(data)
    
    trials = create_extended_dataframe(
        trials=core_data['trials'], 
        metadata=core_data['metadata'], 
        licks=core_data['licks'], 
        time=core_data['time'],
    )
    
    assert len(trials[trials['trial_type']=='catch'])>0
    
    bug 
    opened by dougollerenshaw 12
  • Lick time/number mismatch between core_data['trials'] and extended dataframe

    There seem to be some extra licks showing up in the extended dataframe. These extra licks trigger False from a validation function designed to ensure that any pre-change licks lead to aborted trials.

    datapath = r'//allen/aibs/mpe/Software/data/behavior/validation/stage_1\doc_gratings_8910798_StupidDoCMouse.pkl'

    bug 
    opened by dougollerenshaw 11
  • Circular import problem

    import visual_behavior.plotting as vbp
    

    Throws an AttributeError: "module 'visual_behavior' has no attribute 'utilities'".

    This is because visual_behavior.utilities imports visual_behavior.visualization.behavior, which in turn imports visual_behavior.utilities.

    opened by alexpiet 0
  • 8 behavior session NWBs are missing from the platform_paper_cache and do not download properly from AWS

    There are 8 behavior sessions in the platform paper experiments table that do not have NWB files in the platform paper cache in the directory below, and they do not download from AWS when attempting to load the dataset object. Attempting to load them produces the error below, which indicates that the files are truncated; in fact, they simply don't exist.

    These sessions will not be included in any platform paper analysis until the issue is resolved.

    behavior_session_ids = [1002520823, 1002956042, 1003249011, 814545306, 815045874, 818007489, 818825644, 875471358]

    platform paper cache dir = \allen\programs\braintv\workgroups\nc-ophys\visual_behavior\platform_paper_cache\2.12.4\visual-behavior-ophys-1.0.1\behavior_sessions

    This probably also requires an SDK GitHub issue, but I am logging it here first for record-keeping purposes and visibility.

    opened by matchings 0
  • update dependencies

    Starting from a fresh conda install, I get errors associated with h5py, pytables, and umap when I try to run VBA loading functions, so I believe they need to be added as dependencies.
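
    A hedged interim workaround is to install the missing packages by hand (PyPI names assumed: pytables is published as "tables" and umap as "umap-learn"):

    pip install h5py tables umap-learn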

    opened by matchings 1
  • calls to visual_behavior.database only work with specific pymongo versions

    Calls I made to visual_behavior.database only work on pymongo==3.12.3 or below (pymongo>=4.0 does not work).

    MongoClient.database_names() was removed in the migration to pymongo 4, so list_database_names() must be used instead.

    As shown here: https://pymongo.readthedocs.io/en/stable/migrate-to-pymongo4.html#mongoclient-database-names-is-removed
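
    A minimal sketch of a version-tolerant call (connection details are placeholders):

    from pymongo import MongoClient

    client = MongoClient("localhost", 27017)  # placeholder host/port
    if hasattr(client, "list_database_names"):
        names = client.list_database_names()  # pymongo >= 3.6, including 4.x
    else:
        names = client.database_names()       # removed in pymongo 4.0
    print(names)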

    opened by saakethmm 1