The aim is to contain multiple models for materials discovery under a common interface

Overview

Aviary

License: MIT GitHub Repo Size GitHub last commit Tests pre-commit.ci status

The aviary contains:

  • Open Roost In Colab  -  roost,
  • Open Wren In Colab  -  wren,
  • cgcnn.

The aim is to contain multiple models for materials discovery under a common interface

Environment Setup

To use aviary you need to create an environment with the correct dependencies. The easiest way to get up and running is to use anaconda. A cudatoolkit=11.1 environment file is provided environment-gpu-cu111.yml allowing a working environment to be created with:

conda env create -f environment-gpu-cu111.yml

If you are not using cudatoolkit=11.1 or do not have access to a GPU this setup will not work for you. If so please check the following pages PyTorch, PyTorch-Scatter for how to install the core packages and then install the remaining requirements as detailed in requirements.txt.

The code was developed and tested on Linux Mint 19.1 Tessa. The code should work with other Operating Systems but it has not been tested for such use.

Aviary Setup

Once you have set up an environment with the correct dependencies you can install aviary using the following commands from the top of the directory:

conda activate aviary
python setup.py sdist
pip install -e .

This will install the library in an editable state allowing for advanced users to make changes as desired.

Example Use

To test the input files generation and cleaning/canonicalization please run:

python examples/inputs/poscar2df.py

This script will load and parse a subset of raw POSCAR files from the TAATA dataset and produce the datasets/examples/examples.csv file used for the next example. The raw files have been selected to ensure that the subset contains all the correct endpoints for the 5 elemental species in the Hf-N-Ti-Zr-Zn chemical system. All the models used share can be run on the input file produced by this example code. To test each of the three models provided please run:

python examples/roost-example.py --train --evaluate --data-path examples/inputs/examples.csv --targets E_f --tasks regression --losses L1 --robust --epoch 10
python examples/wren-example.py --train --evaluate --data-path examples/inputs/examples.csv --targets E_f --tasks regression --losses L1 --robust --epoch 10
python examples/cgcnn-example.py --train --evaluate --data-path examples/inputs/examples.csv --targets E_f --tasks regression --losses L1 --robust --epoch 10

Please note that for speed/demonstration purposes this example runs on only ~68 materials for 10 epochs - running all these examples should take < 30s. These examples do not have sufficient data or training to make accurate predictions, however, the same scripts have been used for all experiments conducted.

Cite This Work

If you use this code please cite the relevant work:

Predicting materials properties without crystal structure: Deep representation learning from stoichiometry. [Paper] [arXiv]

@article{goodall2020predicting,
  title={Predicting materials properties without crystal structure: Deep representation learning from stoichiometry},
  author={Goodall, Rhys EA and Lee, Alpha A},
  journal={Nature Communications},
  volume={11},
  number={1},
  pages={1--9},
  year={2020},
  publisher={Nature Publishing Group}
}

Rapid Discovery of Novel Materials by Coordinate-free Coarse Graining. [arXiv]

@article{goodall2021rapid,
  title={Rapid Discovery of Novel Materials by Coordinate-free Coarse Graining},
  author={Goodall, Rhys EA and Parackal, Abhijith S and Faber, Felix A and Armiento, Rickard and Lee, Alpha A},
  journal={arXiv preprint arXiv:2106.11132},
  year={2021}
}

Crystal Graph Convolutional Neural Networks for an Accurate and Interpretable Prediction of Material Properties. [Paper] [arXiv]

@article{xie2018crystal,
  title={Crystal graph convolutional neural networks for an accurate and interpretable prediction of material properties},
  author={Xie, Tian and Grossman, Jeffrey C},
  journal={Physical review letters},
  volume={120},
  number={14},
  pages={145301},
  year={2018},
  publisher={APS}
}

Disclaimer

This research code is provided as-is. We have checked for potential bugs and believe that the code is being shared in a bug-free state. As this is an archive version we will not be able to amend the code to fix bugs/edge-cases found at a later date. However, this code will likely continue to be developed at the location described in the metadata.

Comments
  • Wren: Why does averaging of augmented Wyckoff positions happen inside the NN, after message passing?

    Wren: Why does averaging of augmented Wyckoff positions happen inside the NN, after message passing?

    https://www.science.org/doi/epdf/10.1126/sciadv.abn4117

    The categorization of Wyckoff positions depends on a choice of origin (50). Hence, there is not a unique mapping between the crystal structure and the Wyckoff representation. To ensure that the model is invariant to the choice of origin, we perform on-the-fly augmentation of Wyckoff positions with respect to this choice of origin (see Fig. 6). The augmented representations are averaged at the end of the message passing stage to provide a single representation of equivalent Wyckoff representations to the output network. By pooling at this point, we ensure that the model is invariant and that its training is not biased toward materials for which many equivalent Wyckoff representations exist.

    Probably a noob question here. I think I understand that it needs to happen at some point, but why does it need to happen after message passing? Why not implement this at the very beginning (i.e. in the input data representation)? Not so much doubtful of the choice as I am interested in the mechanics behind this choice. A topic that's come up in another context for me.

    question 
    opened by sgbaird 11
  • Add models that are equivalent to Roost

    Add models that are equivalent to Roost

    CrabNet and AtomSets-v0 are both equivalent to roost in that they are weighted set regression architectures. If aviary is to develop into a DeepChem for inorganic materials property prediction it might be nice to add implementations of these models.

    enhancement help wanted 
    opened by CompRhys 11
  • How to predict on new materials with saved pytorch file

    How to predict on new materials with saved pytorch file

    I used roost-example.py and saved the trained model in a pytorch file (e.g., roost.pt). I have tried to load this file and predict as follows:

    targets=["E_f"]
    tasks=["regression"]
    task_dict = dict(zip(targets, tasks))
    df = pd.read_csv('candidate_compositions.csv')
    X = CompositionData(df, elem_embedding = "matscholar200", task_dict = task_dict)
    
    model = torch.load('models/roost.pt')
    y_pred = model.predict(X)
    

    and I get the following output:

    Traceback (most recent call last):
      File "~/opt/anaconda3/envs/aviary/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3361, in get_loc
        return self._engine.get_loc(casted_key)
      File "pandas/_libs/index.pyx", line 76, in pandas._libs.index.IndexEngine.get_loc
      File "pandas/_libs/index.pyx", line 108, in pandas._libs.index.IndexEngine.get_loc
      File "pandas/_libs/hashtable_class_helper.pxi", line 5198, in pandas._libs.hashtable.PyObjectHashTable.get_item
      File "pandas/_libs/hashtable_class_helper.pxi", line 5206, in pandas._libs.hashtable.PyObjectHashTable.get_item
    KeyError: 'E_f'
    
    The above exception was the direct cause of the following exception:
    
    Traceback (most recent call last):
      File "roost-predict.py", line 12, in <module>
        y_pred = model.predict(X)
      File "~/opt/anaconda3/envs/aviary/lib/python3.7/site-packages/torch/autograd/grad_mode.py", line 28, in decorate_context
        return func(*args, **kwargs)
      File "~/opt/anaconda3/envs/aviary/lib/python3.7/site-packages/aviary/core.py", line 357, in predict
        data_loader, disable=True if not verbose else None
      File "~/opt/anaconda3/envs/aviary/lib/python3.7/site-packages/tqdm/std.py", line 1173, in __iter__
        for obj in iterable:
      File "~/opt/anaconda3/envs/aviary/lib/python3.7/site-packages/aviary/roost/data.py", line 126, in __getitem__
        targets.append(Tensor([row[target]]))
      File "~/opt/anaconda3/envs/aviary/lib/python3.7/site-packages/pandas/core/series.py", line 942, in __getitem__
        return self._get_value(key)
      File "~/opt/anaconda3/envs/aviary/lib/python3.7/site-packages/pandas/core/series.py", line 1051, in _get_value
        loc = self.index.get_loc(label)
      File "~/opt/anaconda3/envs/aviary/lib/python3.7/site-packages/pandas/core/indexes/base.py", line 3363, in get_loc
        raise KeyError(key) from err
    KeyError: 'E_f'
    

    Is it possible to add an example script to perform a prediction from a saved model?

    Thank you

    opened by sarah-allec 10
  • separate `fit` and `predict`

    separate `fit` and `predict`

    Thanks for the patience with all the posts.

    It seems that the train and test data is passed in all at once. Ideally, I'd like to use RooSt in an sklearn-esque "instantiate, fit, and predict" style; it's not urgent, timescale is about a month. Since I'm not familiar with the underlying code, thought I would ask before diving in. Any thoughts/suggestions on this?

    opened by sgbaird 7
  • Git Surgery Plan

    Git Surgery Plan

    In developing this code at several times I've been sloppy about committing large files to the git history. If we would like others to commit we would also like it to show a more accurate representation of their contribution in terms of relative LOC. Consequently we're going to carry out some git surgery before out first official release.

    The following is useful to identify large files in the git history:

    git rev-list --objects --all |   git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)' |   sed -n 's/^blob //p' |   sort --numeric-sort --key=2 |   cut -c 1-12,41- |   $(command -v gnumfmt || echo numfmt) --field=2 --to=iec-i --suffix=B --padding=7 --round=nearest
    

    The following are some of the proposed clean-up commands.

    git filter-branch --force --index-filter "git rm -r --cached --ignore-unmatch data/" --prune-empty --tag-name-filter cat -- --all
    git filter-branch --force --index-filter "git rm -r --cached --ignore-unmatch *.pth.tar" --prune-empty --tag-name-filter cat -- --all
    git filter-branch --force --index-filter "git rm -r --cached --ignore-unmatch notebooks/" --prune-empty --tag-name-filter cat -- --all
    git filter-branch --force --index-filter "git rm -r --cached --ignore-unmatch examples/colab/" --prune-empty --tag-name-filter cat -- --all
    git filter-branch --force --index-filter "git rm -r --cached --ignore-unmatch results/" --prune-empty --tag-name-filter cat -- --all
    git filter-branch --force --index-filter "git rm -r --cached --ignore-unmatch examples/plots/" --prune-empty --tag-name-filter cat -- --all
    

    Colab example notebooks will be re-added but ensuring that their output is cleaned.

    code quality 
    opened by CompRhys 6
  • Instructions for use with custom datasets

    Instructions for use with custom datasets

    Hi @CompRhys, curious if you could give some tips on using Roost with a custom dataset. In my case, I have the chemical formulas as a list of str and the target properties, already separate by train+val vs. test datasets. I'm looking through the Colab notebook getting things set up.

    opened by sgbaird 5
  • TypeError: 'NoneType' object is not iterable

    TypeError: 'NoneType' object is not iterable

    I installed aviary using conda based on the instructions. However, when I run the command python examples/inputs/poscar2df.py, I met the following error:

    Traceback (most recent call last):
      File "examples/inputs/poscar2df.py", line 7, in <module>
        from pymatgen.core import Composition, Structure
      File "/(home path)/.conda/envs/aviary/lib/python3.7/site-packages/pymatgen/core/__init__.py", line 62, in <module>
        SETTINGS = _load_pmg_settings()
      File "/(home path)/.conda/envs/aviary/lib/python3.7/site-packages/pymatgen/core/__init__.py", line 52, in _load_pmg_settings
        d.update(d_yml)
    TypeError: 'NoneType' object is not iterable
    

    Any idea on how to solve this?

    invalid 
    opened by PinwenGuan 4
  • Roost Colab default Cuda version issue

    Roost Colab default Cuda version issue

    Tried running the Roost example Colab and got an error that seems it's probably related to Colab now using CUDA 11.2.

    OSError: libcudart.so.10.2: cannot open shared object file: No such file or directory
    
    stack trace
    OSError                                   Traceback (most recent call last)
    [<ipython-input-10-fd45f7ae93a3>](https://z3go6q25tqk-496ff2e9c6d22116-0-colab.googleusercontent.com/outputframe.html?vrz=colab-20220217-060102-RC00_429270882#) in <module>()
          1 from aviary.roost.data import CompositionData, collate_batch as roost_cb
    ----> 2 from aviary.roost.model import Roost
          3 
          4 torch.manual_seed(0)  # ensure reproducible results
          5 
    
    4 frames
    [/usr/local/lib/python3.7/dist-packages/aviary/roost/model.py](https://z3go6q25tqk-496ff2e9c6d22116-0-colab.googleusercontent.com/outputframe.html?vrz=colab-20220217-060102-RC00_429270882#) in <module>()
          4 
          5 from aviary.core import BaseModelClass
    ----> 6 from aviary.segments import (
          7     MessageLayer,
          8     ResidualNetwork,
    
    [/usr/local/lib/python3.7/dist-packages/aviary/segments.py](https://z3go6q25tqk-496ff2e9c6d22116-0-colab.googleusercontent.com/outputframe.html?vrz=colab-20220217-060102-RC00_429270882#) in <module>()
          1 import torch
          2 import torch.nn as nn
    ----> 3 from torch_scatter import scatter_add, scatter_max, scatter_mean
          4 
          5 
    
    [/usr/local/lib/python3.7/dist-packages/torch_scatter/__init__.py](https://z3go6q25tqk-496ff2e9c6d22116-0-colab.googleusercontent.com/outputframe.html?vrz=colab-20220217-060102-RC00_429270882#) in <module>()
         14     spec = cuda_spec or cpu_spec
         15     if spec is not None:
    ---> 16         torch.ops.load_library(spec.origin)
         17     elif os.getenv('BUILD_DOCS', '0') != '1':  # pragma: no cover
         18         raise ImportError(f"Could not find module '{library}_cpu' in "
    
    [/usr/local/lib/python3.7/dist-packages/torch/_ops.py](https://z3go6q25tqk-496ff2e9c6d22116-0-colab.googleusercontent.com/outputframe.html?vrz=colab-20220217-060102-RC00_429270882#) in load_library(self, path)
        108             # static (global) initialization code in order to register custom
        109             # operators with the JIT.
    --> 110             ctypes.CDLL(path)
        111         self.loaded_libraries.add(path)
        112 
    
    [/usr/lib/python3.7/ctypes/__init__.py](https://z3go6q25tqk-496ff2e9c6d22116-0-colab.googleusercontent.com/outputframe.html?vrz=colab-20220217-060102-RC00_429270882#) in __init__(self, name, mode, handle, use_errno, use_last_error)
        362 
        363         if handle is None:
    --> 364             self._handle = _dlopen(self._name, mode)
        365         else:
        366             self._handle = handle
    
    OSError: libcudart.so.10.2: cannot open shared object file: No such file or directory
    
    opened by jdagdelen 4
  • Type hints

    Type hints

    Lays the ground work for #29 and closes #30.

    These changes are all py37 compatible (unless I made a mistake). @CompRhys You may want to try this branch on Colab just to be sure.

    code quality types 
    opened by janosh 3
  • Suggested parameters for a

    Suggested parameters for a "performance" submission to matbench

    Curious if you have any suggestions on a general set of parameters that you would use for submission to matbench. For example, number of epochs. Right now, I've been using the defaults from the Colab notebook (just for the matbench_expt_gap task).

    opened by sgbaird 3
  • Better model.__repr__()

    Better model.__repr__()

    model.__repr__() now includes trainable params and epoch count. Moved from Wren + Roost having identical implementations to SSOT on BaseModelClass so CGCNN now has custom __repr__ too.

    Also confines coverage reporting in CI to package files (i.e. exclude test files).

    opened by janosh 3
  • Refactor `aviary/utils.py`

    Refactor `aviary/utils.py`

    aviary/utils.py is definitely in need of an overhaul. Was quite hard to type it in #31 and flake8 complained about surpassing max-complexity, both of which are bad signs for API design.

    code quality 
    opened by janosh 1
Releases(v0.0.4)
  • v0.0.4(Jul 1, 2022)

  • v0.0.3(Apr 20, 2022)

    This is a tag of the code used to generate results shown in science advances.

    After this tag in order to make the LOC more realistic git surgery was performed. This release is therefore also serves as a backup of the code before the clean-up commands were carried out.

    Source code(tar.gz)
    Source code(zip)
Owner
Rhys Goodall
PhD Student at the University of Cambridge working on the application of Machine Learning to Materials Discovery.
Rhys Goodall
A demo titiler for Sentinel 2 Digital Twin dataset

This is a DEMO custom api built on top of TiTiler to create Web Map Tiles from the Digital Twin Sentinel-2 COG created by Sinergise

Development Seed 26 May 21, 2022
A simple notebook to stream torrent files directly to Google Drive using Google Colab.

Colab-Torrent-to-Drive Originally by FKLC, this is a simple notebook to stream torrent files directly to Google Drive using Google Colab. You can eith

1 Jan 11, 2022
A feishu bot daily push arxiv latest articles.

arxiv-feishu-bot We develop A simple feishu bot script daily pushes arxiv latest articles. His effect is as follows: Of course, you can also use other

huchi 6 Apr 06, 2022
Utility for converting IP Fabric webhooks into a Teams format

IP Fabric Webhook Integration for Microsoft Teams and/or Slack Setup IP Fabric Setup Go to Settings Webhooks Add webhook Provide a name URL will b

Community Fabric 1 Jan 26, 2022
Automatically mass follows tons of NameMC profiles.

Automatically mass follows tons of NameMC profiles. (Creates REAL traffic to your profile)

Jam 3 Jun 29, 2022
📢 Video Chat Stream Telegram Bot. Can ⏳ Stream Live Videos, Radios, YouTube Videos & Telegram Video Files On Your Video Chat Of Channels & Groups !

Telegram Video Chat Bot (Beta) 📢 Video Chat Stream Telegram Bot 🤖 Can Stream Live Videos, Radios, YouTube Videos & Telegram Video Files On Your Vide

brut✘⁶⁹ // ユスフ 15 Dec 24, 2022
An Open-Source Discord bot created to provide basic functionality which should be in every discord guild. We use this same bot with additional configurations for our guilds.

A Discord bot completely written to be taken from the source and built according to your own custom needs. This bot supports some core features and is

Tesseract Coding 14 Jan 11, 2022
Bot per la chat live del corso di sistemi operativi UniBO

cravattaBot TL;DR: Ho fatto un bot telegram per la chat del corso di sistemi. Indice Installazione e prerequisiti Prerequisiti Installazione Setup Con

Alessandro Frau 3 May 21, 2022
A drop-in vanilla discord.py cog to add slash command support with little to no code modifications

discord.py /slash cog A drop-in vanilla discord.py cog that acts as a translation layer to add slash command support with little to no code modificati

marshall 3 Jun 01, 2022
Discord nuke bot with python

Discord-nuke-bot 🇷🇺 🇷🇺 🇷🇺 🇷🇺 🇷🇺 TODO: Добавить команду: Удаления всех ролей Спама каналами Спама во все каналы @everyone Удаления всего aka

Nikita Maykov 10 Oct 14, 2022
My telegram bot to download Instagram Profiles

Instagram Profile Get for Telegram My telegram bot to download Instagram Profiles First you have to get a telegrm bot api key from @BotFather Then you

Ali Yoonesi 2 Sep 22, 2022
A simple Python API wrapper for Cloudflare Stream's API.

python-cloudflare-stream A basic Python API wrapper for working with Cloudflare Stream. Arbington.com started off using Cloudflare Stream. We used the

Arbington 3 Sep 08, 2022
数字货币动态趋势网格,随着行情变动。目前实盘月化10%。目前支持币安,未来上线火币、OKEX。

数字货币动态趋势网格,随着行情变动。目前实盘月化10%。目前支持币安,未来上线火币、OKEX。

幸福村的码农 98 Dec 27, 2022
A small bot to interact with the reddit API. Get top viewers and update the sidebar widget.

LiveStream_Reddit_Bot Get top twitch and facebook stream viewers for a game and update the sidebar widget and old reddit sidebar to show your communit

Tristan Wise 1 Nov 21, 2021
Python client for the Socrata Open Data API

sodapy sodapy is a python client for the Socrata Open Data API. Installation You can install with pip install sodapy. If you want to install from sour

Cristina 368 Dec 09, 2022
An async-ready Python wrapper around FerrisChat's API.

FerrisWheel An async-ready Python wrapper around FerrisChat's API. Installation Instructions Linux: $ python3.9 -m pip install -U ferriswheel Python 3

FerrisChat 8 Feb 08, 2022
A custom rom post bot for Telegram.

Rom Poster Bot A simple Post Bot written in Python using pyTelegramBotAPI to post rom updates to telegram whenever you need. Made by lazy peep for laz

Prajwal 6 Nov 03, 2022
Fastest Tiktok Username checker on site.

Tiktok Username Checker Fastest Tiktok Username checker on site

sql 3 Jun 19, 2021
A Telegram bot to send messages in Telegram groups or Channels using bots anonymously.

Group-chatting-bot A bot to send messeges to group using bot telegram bot ❤️ Support Made with Python3

Pyrogramers 16 Nov 06, 2022
Projeto de estudantes do primeiro período do CIn - UFPE voltado para a criação de um sistema interativo no fechamento da disciplina IF669 - Introdução a Programação.

Projeto Game: Dona da Lua Alunos: Beatriz Férre Clara Kenderessy Matheus Silva Rafael Baltar Roseane Oliveira Samuel Marsaro Sinopse O Cebolinha apron

Maria Clara Kenderessy 5 Dec 20, 2021