A set of functions and analysis classes for solvation structure analysis

Overview

SolvationAnalysis

Powered by NumFOCUS Powered by MDAnalysis GitHub Actions Status TravisCI codecov docs


The macroscopic behavior of a liquid is determined by its microscopic structure. For ionic systems, like batteries and many enzymes, the solvation environment surrounding ions is especially important. By studying the solvation of interesting materials, scientists can better understand, engineer, and design new technologies. The aim of this project is to implement a robust and cohesive set of methods for solvation analysis that would be widely useful in both biomolecular and battery electrolyte simulations. The core of the solvation module will be a set of functions for easily working with ionic solvation shells. Building from that core functionality, the module will implement several analysis methods for analyzing ion pairing, ion speciation, residence times, and shell association and dissociation.

Main development by @orioncohen, with mentorship from @richardjgowers, @IAlibay, and @hmacdope.

Acknowledgements

Project based on the Computational Molecular Science Python Cookiecutter version 1.5.

Comments
  • analysis functions

    analysis functions

    Description

    This PR will establish the analysis functions for the solvation module. Linked to issues #11, #21, and #28.

    Todos

    Notable points that this PR has either accomplished or will accomplish.

    • [x] create ion speciation class
    • [x] create coordination number class
    • [x] create solvation god class
    • [x] update documentation
    • [x] finalize data structure
    • [x] complete implementation
    • [x] finalize in-line documentation

    Status

    • [x] Ready to go
    core high-priority 
    opened by orionarcher 34
  • Parse RDF's and find minima

    Parse RDF's and find minima

    Description

    Provide a brief description of the PR's purpose here.

    Todos

    • [x] add test data
    • [x] functionality to interpolate rdfs
    • [x] functionality to find minima
    • [x] refine minima finding
    • [x] solidify unit testing

    Status

    • [x] Ready to go
    core testing high-priority 
    opened by orionarcher 20
  • Basic testing with Pytest

    Basic testing with Pytest

    Description

    Set up basic testing for current implemented functions.

    Todos

    Notable points that this PR has either accomplished or will accomplish.

    • [x] get_radial_shell testing
    • [x] get_closest_n_mol testing
    • [x] slightly rework get_closest_n_mol signature

    Status

    • [x] Ready to go
    core testing 
    opened by orionarcher 17
  • tutorials

    tutorials

    Description

    Create a convenient tutorial for users to learn solvation_analysis

    Todos

    Notable points that this PR has either accomplished or will accomplish.

    • [x] jupyter tutorial for users
    • [x] set up jupyter edit requests

    Status

    • [x] Ready to go
    documentation high-priority 
    opened by orionarcher 12
  • Decide hierarchy of analysis classes/methods

    Decide hierarchy of analysis classes/methods

    Some of our proposed functionality follows on directly from other functionality and some does not.

    This means we need to discuss how to compartmentalise functionality into classes and methods many of which will subclass analysis base.

    I thought I would make this issue as an open discussion area for how we want to compartmentalise /hierarchicalize our functionality. I think the best way to proceed is if @orioncohen can lay out how he sees it and then @richardjgowers @IAlibay and any @MDAnalysis/gsoc-mentors can weigh in. We may also need to meet to discuss this. :)

    question core high-priority 
    opened by hmacdope 11
  • add visualization tutorial

    add visualization tutorial

    Description

    Since nglview is causing issues and I don't want to get bogged down before the release, I wanted to put this in a separate thread.

    Todos

    Notable points that this PR has either accomplished or will accomplish.

    • [x] tutorial introduction
    • [x] fix nglview issues
    • [x] finish tutorial

    Questions

    • [x] Why is nglview showing too many bonds on molecules?

    Status

    • [x] Ready to go
    bug documentation enhancement 
    opened by orionarcher 10
  • Initial functions for solvation selection + documentation + testing

    Initial functions for solvation selection + documentation + testing

    Description

    Implements basic functionality for solvation selection. Currently implements get_atom_group, get_n_shells, get_closest_n_mol, and get_radial_shell. Addresses issue #5

    Todos

    Notable points that this PR has either accomplished or will accomplish.

    • [x] implement basic functionality for solvation selection
    • [x] documentation for functions
    • [x] unit tests for basic functionality

    Questions

    • [x] what doc style should be used?

    Status

    • [x] ready to go
    opened by orionarcher 10
  • Support for multi atom solutes

    Support for multi atom solutes

    Description

    This PR was split off from PR #70 to better separate quality-of-life changes from API changes. See that PR for some history and discussion.

    This is intended to be a major PR to handle multi-atom solutes. It relates to issues #47, #66, and #58.

    I would like to propose the following outline of new functionality. In this description, I will focus on the outward facing API. I'll use the somewhat trivial case of water as an example.

    1. Solution will be renamed to Solute. All references to solute in the current documentation will be renamed to solvated atom or solvated atoms. I think this better captures what the Solute class really is, especially as we expand to multi-atom Solutes.

    2. The default initializer for Solute will take only a single atom per residue. It will not support multiple identical atoms on a residue. This will be handled by the more general case. As a result, instantiating a Solute for a single atom remains the same.

    water = u.select_atoms(...)
    water_O = u.select_atoms(...)
    water_O_solute = Solute(water_O, {"water": water})
    
    1. Additional initializers will be added to instantiate a Solute, these will support multi-atom solutes.

    The first will allow the user to stitch together multiple solutes to create a new solute.

    water_H1 = u.select_atoms(...)
    water_H2 = u.select_atoms(...)
    solute_O = Solute(water_O, {"water": water})
    solute_H1 = Solute(water_H1, {"water": water})
    solute_H2 = Solute(water_H2, {"water": water})
    
    multi_atom_solute = Solute.from_solutes([solute_O, solute_H1, solute_H2])  # maybe this should be a dict?
    

    The second will allow users to simply instantiate a solute from an entire residue (or part of a residue). There may be technical challenges here so this behavior is not guaranteed.

    multi_atom_solute = Solute.from_residue(water, {"water": water})
    
    1. To support this, the solvation_data dataframe will have two additional columns added, a "residue" column and a "solute_name" column. All analysis classes will be refactored to operate on the "residue" column rather than the "solvated_atom" column. This will make no difference for single-atom solutes but will allow the analysis classes to generalize easily. I'm not completely sure the "solute_name" column is necessary, but it would be convenient to have.

    2. When a multi-atom solute is created all of the solvation_data dataframes from each constituent single-atom solute will be merged together. The "residue" column will group together solvated atoms on the same residue such that the analysis classes can operate on the whole solute. The API for accessing the residence classes will be identical.

    multi_atom_solute.coordination_number["water"]   # valid property
    
    1. We will retain all of the single atom Solutes as a property of the multi-atom Solute. This would amount to a rough doubling of the memory footprint, but it would make follow up analysis easier. I'm a bit torn here and there may be a better way.
    >>> print(water.atoms)  # what should this be called?
    >>> [solute_O, solute_H1, solute_H2]  # maybe this should be a dict?
    

    For a single atom solute the atoms list would still be present but the data within would be identical to the solvation_data of the solute itself. Single atom solutes are now just a special case of multi-atom solutes.

    water_O_solute.atoms[0].solvation_data = water_O_solute.solvation_data
    

    I'm sure there are many things I am not considering that will come up later, but as a start, I think this plan will allow the package to be generalized with maximum code reuse. I'd love feedback or suggestions on any aspect of the outline above.

    Todos

    Notable points that this PR has either accomplished or will accomplish.

    • [x] make solutions composable so that multi-atom solutes can be constructed systematically
    • [x] put guardrails in place to prevent misuse
    • [ ] bare minimum rewrite of the documentation

    Status

    • [ ] Ready to go
    enhancement core 
    opened by orionarcher 8
  • Residence times and solute-solvent network calculations

    Residence times and solute-solvent network calculations

    Description

    This PR adds an analysis module to calculate residence times and an analysis module to calculate solute-solvent networks.

    Todos

    Notable points that this PR has either accomplished or will accomplish.

    • [x] Residence module
    • [x] Change exponential fit in residence time calculations to 1/e decay point
    • [x] Networking module
    • [x] Residence testing
    • [x] Networking testing
    • [x] Residence documentation
    • [x] Networking documentation
    • [x] Add diluent_composition analysis to Pairing class
    • [x] Add module selection kwarg to Solution
    • [x] Add from_solution class method to all analysis classes
    • [x] Residence and Networking tutorial
    • [x] add citations for Networking and Residence codes
    • [x] improve documentation for Residence time
      • [x] specify caveats and difference between both implementations

    Status

    The PR is nearly finished, the main outstanding issues are improved documentation, citations and tutorials.

    enhancement 
    opened by orionarcher 8
  • Formalize Roadmap in Projects tab

    Formalize Roadmap in Projects tab

    We should decide on a model for a release schedule, Do we wish to match the MDA core idea and do quarterly (@IAlibay is that right?) releases? Or do we just want to do with major functionality change, ie at your discretion @orioncohen?

    If people want updated functionality and bugfixes, they can always clone the current main branch and install with pip install -e .

    Raising issue as food for thought.

    release 
    opened by hmacdope 8
  • Enhancements to support composable Solutions and self-solvating solutes

    Enhancements to support composable Solutions and self-solvating solutes

    UPDATE:

    This PR now implements a number of quality of life changes and solves issue #31. The proposed multi-atom solute changes will be implemented in another PR. See PR #72 for the new changes!

    Todos

    Notable points that this PR has either accomplished or will accomplish.

    • [x] add new testing data with a multi-atom solute
    • [x] remove foolish internal numbering scheme of solvated_atoms
    • [x] allow solutes to also act as solvents
    • [x] replace all column name strings with variables stored in a column_names.py file

    Status

    • [x] Ready to go

    Description

    This is intended to be a major PR to handle multi-atom solutes. It relates to issues #47, #31, #66, and #58.

    I would like to propose the following outline of new functionality. In this description, I will focus on the outward facing API. I'll use the somewhat trivial case of water as an example.

    1. Solution will be renamed to Solute. All references to solute in the current documentation will be renamed to solvated atom or solvated atoms. I think this better captures what the Solute class really is, especially as we expand to multi-atom Solutes.

    2. The default initializer for Solute will take only a single atom per residue. It will not support multiple identical atoms on a residue. This will be handled by the more general case. As a result, instantiating a Solute for a single atom remains the same. (note that I have already fixed the case with self-solvation identified in issue #31)

    water = u.select_atoms(...)
    water_O = u.select_atoms(...)
    water_O_solute = Solute(water_O, {"water": water})
    
    1. Additional initializers will be added to instantiate a Solute, these will support multi-atom solutes.

    The first will allow the user to stitch together multiple solutes to create a new solute.

    water_H1 = u.select_atoms(...)
    water_H2 = u.select_atoms(...)
    solute_O = Solute(water_O, {"water": water})
    solute_H1 = Solute(water_H1, {"water": water})
    solute_H2 = Solute(water_H2, {"water": water})
    
    multi_atom_solute = Solute.from_solutes([solute_O, solute_H1, solute_H2])  # maybe this should be a dict?
    

    The second will allow users to simply instantiate a solute from an entire residue (or part of a residue). There may be technical challenges here so this behavior is not guaranteed.

    multi_atom_solute = Solute.from_residue(water, {"water": water})
    
    1. To support this, the solvation_data dataframe will have two additional columns added, a "residue" column and a "solute_name" column. All analysis classes will be refactored to operate on the "residue" column rather than the "solvated_atom" column. This will make no difference for single-atom solutes but will allow the analysis classes to generalize easily. I'm not completely sure the "solute_name" column is necessary, but it would be convenient to have.

    2. When a multi-atom solute is created all of the solvation_data dataframes from each constituent single-atom solute will be merged together. The "residue" column will group together solvated atoms on the same residue such that the analysis classes can operate on the whole solute. The API for accessing the residence classes will be identical.

    multi_atom_solute.coordination_number["water"]   # valid property
    
    1. We will retain all of the single atom Solutes as a property of the multi-atom Solute. This would amount to a rough doubling of the memory footprint, but it would make follow up analysis easier. I'm a bit torn here and there may be a better way.
    >>> print(water.atoms)  # what should this be called?
    >>> [solute_O, solute_H1, solute_H2]  # maybe this should be a dict?
    

    For a single atom solute the atoms list would still be present but the data within would be identical to the solvation_data of the solute itself. Single atom solutes are now just a special case of multi-atom solutes.

    water_O_solute.atoms[0].solvation_data = water_O_solute.solvation_data
    

    I'm sure there are many things I am not considering that will come up later, but as a start, I think this plan will allow the package to be generalized with maximum code reuse. I'd love feedback or suggestions on any aspect of the outline above.

    Todos

    Notable points that this PR has either accomplished or will accomplish.

    • [x] add new testing data with a multi-atom solute
    • [x] remove foolish internal numbering scheme of solvated_atoms
    • [x] allow solutes to also act as solvents
    • [x] replace all column name strings with variables stored in a column_names.py file
    • [ ] make solutions composable so that multi-atom solutes can be constructed systematically
    • [ ] put guardrails in place to prevent misuse

    Status

    • [ ] Ready to go
    enhancement core high-priority 
    opened by orionarcher 7
  • Concatenation Issue for Residence Time Calculation

    Concatenation Issue for Residence Time Calculation

    I am using Residence.from_solute(solute) to calculate the residence time between trimers' N and water' O. It always reports the same concatenation issue which seems to be caused by the calculation of auto-covariance (Fig 1). However, when I calculate the residence time between anion and water, there isn't this problem. I add the solvation_data of anion solution (Fig 2) and trimer solution for your reference (Fig 3).

    Screen Shot 2022-12-05 at 15 58 01 Screen Shot 2022-12-05 at 15 58 21Fig 1 Screen Shot 2022-12-05 at 15 53 21 Fig 2

    Screen Shot 2022-12-05 at 15 52 53Fig 3

    opened by SophiaRuan 0
  • Styling and consistency of documentation could be improved

    Styling and consistency of documentation could be improved

    This is split off from PR #78 to address two points the consistency and formatting of the documentation.

    Todos:

    • [ ] closely read over all documentation for errors
    • [ ] make sure all syntax and code styling is consistent
    • [ ] enhance aesthetic styling of documentation
    documentation 
    opened by orionarcher 0
  • Work towards MDAKit integration

    Work towards MDAKit integration

    Now that MDAKits are live to roll we should work towards registering solvation-analysis as an MDAKit!

    See the blog post for more info.

    @orionarcher I would be interested to know if you would prefer to just go for it or wait for 0.2.0 (and some conda packages).

    AFAIK solvation-analysis already meets all the requirements listed in the white paper.

    opened by hmacdope 2
  • Solvation plots

    Solvation plots

    Description

    Provide a brief description of the PR's purpose here.

    Todos

    Notable points that this PR has either accomplished or will accomplish.

    • [ ] TODO 1

    Questions

    • [ ] Question1

    Status

    • [ ] Ready to go
    opened by laurlee 1
  • Create a `save_data` method for solution

    Create a `save_data` method for solution

    This method should dump the core solvation statistics to python dict. It will not contain enough information to reconstitute the Solution.

    Brought up in issue #52.

    opened by orionarcher 0
Releases(v0.1.4)
  • v0.1.4(Jun 28, 2022)

  • v0.1.3(Mar 17, 2022)

  • v0.1.2-beta(Sep 23, 2021)

    New functionality:

    • co-occurrence matrix calculation and plotting
    • identify types of coordinating atoms
    • find percentage of free solvent

    Bug fixes:

    • all AtomGroup.ids and ResidueGroup.resids changed to AtomGroup.ix and ResidueGroup.resindices
    Source code(tar.gz)
    Source code(zip)
  • v0.1.2(Sep 23, 2021)

    New functionality:

    • co-occurrence matrix calculation and plotting
    • identify types of coordinating atoms
    • find percentage of free solvent

    Bug fixes:

    • all AtomGroup.ids and ResidueGroup.resids changed to AtomGroup.ix and ResidueGroup.resindices
    Source code(tar.gz)
    Source code(zip)
  • v0.1.1-alpha(Aug 19, 2021)

  • v0.1.1(Aug 19, 2021)

Owner
MDAnalysis
MDAnalysis is an object-oriented Python library to analyze molecular dynamics trajectories.
MDAnalysis
Pypeln is a simple yet powerful Python library for creating concurrent data pipelines.

Pypeln Pypeln (pronounced as "pypeline") is a simple yet powerful Python library for creating concurrent data pipelines. Main Features Simple: Pypeln

Cristian Garcia 1.4k Dec 31, 2022
This is a repo documenting the best practices in PySpark.

Spark-Syntax This is a public repo documenting all of the "best practices" of writing PySpark code from what I have learnt from working with PySpark f

Eric Xiao 447 Dec 25, 2022
First and foremost, we want dbt documentation to retain a DRY principle. Every time we repeat ourselves, we waste our time. Second, we want to understand column level lineage and automate impact analysis.

dbt-osmosis First and foremost, we want dbt documentation to retain a DRY principle. Every time we repeat ourselves, we waste our time. Second, we wan

Alexander Butler 150 Jan 06, 2023
Kennedy Institute of Rheumatology University of Oxford Project November 2019

TradingBot6M Kennedy Institute of Rheumatology University of Oxford Project November 2019 Run Change api.txt to binance api key: https://www.binance.c

Kannan SAR 2 Nov 16, 2021
Python package for analyzing sensor-collected human motion data

Python package for analyzing sensor-collected human motion data

Simon Ho 71 Nov 05, 2022
Python ELT Studio, an application for building ELT (and ETL) data flows.

The Python Extract, Load, Transform Studio is an application for performing ELT (and ETL) tasks. Under the hood the application consists of a two parts.

Schlerp 55 Nov 18, 2022
Repository created with LinkedIn profile analysis project done

EN/en Repository created with LinkedIn profile analysis project done. The datase

Mayara Canaver 4 Aug 06, 2022
TextDescriptives - A Python library for calculating a large variety of statistics from text

A Python library for calculating a large variety of statistics from text(s) using spaCy v.3 pipeline components and extensions. TextDescriptives can be used to calculate several descriptive statistic

150 Dec 30, 2022
BAyesian Model-Building Interface (Bambi) in Python.

Bambi BAyesian Model-Building Interface in Python Overview Bambi is a high-level Bayesian model-building interface written in Python. It's built on to

861 Dec 29, 2022
My solution to the book A Collection of Data Science Take-Home Challenges

DS-Take-Home Solution to the book "A Collection of Data Science Take-Home Challenges". Note: Please don't contact me for the dataset. This repository

Jifu Zhao 1.5k Jan 03, 2023
DaDRA (day-druh) is a Python library for Data-Driven Reachability Analysis.

DaDRA (day-druh) is a Python library for Data-Driven Reachability Analysis. The main goal of the package is to accelerate the process of computing estimates of forward reachable sets for nonlinear dy

2 Nov 08, 2021
Galvanalyser is a system for automatically storing data generated by battery cycling machines in a database

Galvanalyser is a system for automatically storing data generated by battery cycling machines in a database, using a set of "harvesters", whose job it

Battery Intelligence Lab 20 Sep 28, 2022
Approximate Nearest Neighbor Search for Sparse Data in Python!

Approximate Nearest Neighbor Search for Sparse Data in Python! This library is well suited to finding nearest neighbors in sparse, high dimensional spaces (like text documents).

Meta Research 906 Jan 01, 2023
MidTerm Project for the Data Analysis FT Bootcamp, Adam Tycner and Florent ZAHOUI

MidTerm Project for the Data Analysis FT Bootcamp, Adam Tycner and Florent ZAHOUI Hallo

Florent Zahoui 1 Feb 07, 2022
PLStream: A Framework for Fast Polarity Labelling of Massive Data Streams

PLStream: A Framework for Fast Polarity Labelling of Massive Data Streams Motivation When dataset freshness is critical, the annotating of high speed

4 Aug 02, 2022
Data science/Analysis Health Care Portfolio

Health-Care-DS-Projects Data Science/Analysis Health Care Portfolio Consists Of 3 Projects: Mexico Covid-19 project, analyze the patient medical histo

Mohamed Abd El-Mohsen 1 Feb 13, 2022
Hangar is version control for tensor data. Commit, branch, merge, revert, and collaborate in the data-defined software era.

Overview docs tests package Hangar is version control for tensor data. Commit, branch, merge, revert, and collaborate in the data-defined software era

Tensorwerk 193 Nov 29, 2022
The repo for mlbtradetrees.com. Analyze any trade in baseball history!

The repo for mlbtradetrees.com. Analyze any trade in baseball history!

7 Nov 20, 2022
2019 Data Science Bowl

Kaggle-2019-Data-Science-Bowl-Solution - Here i present my solution to kaggle 2019 data science bowl and how i improved it to win a silver medal in that competition.

Deepak Nandwani 1 Jan 01, 2022
Implementation in Python of the reliability measures such as Omega.

reliabiliPy Summary Simple implementation in Python of the [reliability](https://en.wikipedia.org/wiki/Reliability_(statistics) measures for surveys:

Rafael Valero Fernández 2 Apr 27, 2022