An SE(3)-invariant autoencoder for generating the periodic structure of materials

Related tags

Deep Learningcdvae
Overview

Crystal Diffusion Variational AutoEncoder

This software implementes Crystal Diffusion Variational AutoEncoder (CDVAE), which generates the periodic structure of materials.

It has several main functionalities:

  • Generate novel, stable materials by learning from a dataset containing existing material structures.
  • Generate materials by optimizing a specific property in the latent space, i.e. inverse design.

[Paper] [Datasets]

Table of Contents

Installation

The easiest way to install prerequisites is via conda.

Pre-install step

Install conda-merge:

pip install conda-merge

Check that you can invoke conda-merge by running conda-merge -h.

GPU machines

Run the following command to install the environment:

conda-merge env.common.yml env.gpu.yml > env.yml
conda env create -f env.yml

Activate the conda environment with conda activate cdvae.

Install this package with pip install -e ..

CPU-only machines

conda-merge env.common.yml env.cpu.yml > env.yml
conda env create -f env.yml
conda activate cdvae
pip install -e .

Setting up environment variables

Make a copy of the .env.template file and rename it to .env. Modify the following environment variables in .env.

  • PROJECT_ROOT: path to the folder that contains this repo
  • HYDRA_JOBS: path to a folder to store hydra outputs
  • WABDB: path to a folder to store wabdb outputs

Datasets

All datasets are directly available on data/ with train/valication/test splits. You don't need to download them again. If you use these datasets, please consider to cite the original papers from which we curate these datasets.

Find more about these datasets by going to our Datasets page.

Training CDVAE

Training without a property predictor

To train a CDVAE, run the following command:

python cdvae/run.py data=perov expname=perov

To use other datasets, use data=carbon and data=mp_20 instead. CDVAE uses hydra to configure hyperparameters, and users can modify them with the command line or configure files in conf/ folder.

After training, model checkpoints can be found in $HYDRA_JOBS/singlerun/YYYY-MM-DD/expname.

Training with a property predictor

Users can also additionally train an MLP property predictor on the latent space, which is needed for the property optimization task:

python cdvae/run.py data=perov expname=perov model.predict_property=True

The name of the predicted propery is defined in data.prop, as in conf/data/perov.yaml for Perov-5.

Generating materials

To generate materials, run the following command:

python scripts/evaluate.py --model_path MODEL_PATH --tasks recon gen opt

MODEL_PATH will be the path to the trained model. Users can choose one or several of the 3 tasks:

  • recon: reconstruction, reconstructs all materials in the test data. Outputs can be found in eval_recon.ptl
  • gen: generate new material structures by sampling from the latent space. Outputs can be found in eval_gen.pt.
  • opt: generate new material strucutre by minimizing the trained property in the latent space (requires model.predict_property=True). Outputs can be found in eval_opt.pt.

eval_recon.pt, eval_gen.pt, eval_opt.pt are pytorch pickles files containing multiple tensors that describes the structures of M materials batched together. Each material can have different number of atoms, and we assume there are in total N atoms. num_evals denote the number of Langevin dynamics we perform for each material.

  • frac_coords: fractional coordinates of each atom, shape (num_evals, N, 3)
  • atom_types: atomic number of each atom, shape (num_evals, N)
  • lengths: the lengths of the lattice, shape (num_evals, M, 3)
  • angles: the angles of the lattice, shape (num_evals, M, 3)
  • num_atoms: the number of atoms in each material, shape (num_evals, M)

Evaluating model

To compute evaluation metrics, run the following command:

python scripts/compute_metrics.py --root_path MODEL_PATH --tasks recon gen opt

MODEL_PATH will be the path to the trained model. All evaluation metrics will be saved in eval_metrics.json.

Authors and acknowledgements

The software is primary written by Tian Xie, with signficant contributions from Xiang Fu.

The GNN codebase and many utility functions are adapted from the ocp-models by the Open Catalyst Project. Especially, the GNN implementations of DimeNet++ and GemNet are used.

The main structure of the codebase is built from NN Template.

For the datasets, Perov-5 is curated from Perovksite water-splitting, Carbon-24 is curated from AIRSS data for carbon at 10GPa, MP-20 is curated from Materials Project.

Citation

Please consider citing the following paper if you find our code & data useful.

@article{xie2021crystal,
  title={Crystal Diffusion Variational Autoencoder for Periodic Material Generation},
  author={Xie, Tian and Fu, Xiang and Ganea, Octavian-Eugen and Barzilay, Regina and Jaakkola, Tommi},
  journal={arXiv preprint arXiv:2110.06197},
  year={2021}
}

Contact

Please leave an issue or reach out to Tian Xie (txie AT csail DOT mit DOT edu) if you have any questions.

Owner
Tian Xie
Postdoc at MIT CSAIL. Machine learning algorithms for materials, drugs, and beyond.
Tian Xie
Code for the paper "Implicit Representations of Meaning in Neural Language Models"

Implicit Representations of Meaning in Neural Language Models Preliminaries Create and set up a conda environment as follows: conda create -n state-pr

Belinda Li 39 Nov 03, 2022
Face Mask Detector by live camera using tensorflow-keras, openCV and Python

Face Mask Detector 😷 by Live Camera Detecting masked or unmasked faces by live camera with percentange of mask occupation About Project: This an Arti

Karan Shingde 2 Apr 04, 2022
Mmrotate - OpenMMLab Rotated Object Detection Benchmark

OpenMMLab website HOT OpenMMLab platform TRY IT OUT 📘 Documentation | 🛠️ Insta

OpenMMLab 1.2k Jan 04, 2023
Solver for Large-Scale Rank-One Semidefinite Relaxations

STRIDE: spectrahedral proximal gradient descent along vertices A Solver for Large-Scale Rank-One Semidefinite Relaxations About STRIDE is designed for

48 Dec 20, 2022
Fast Neural Representations for Direct Volume Rendering

Fast Neural Representations for Direct Volume Rendering Sebastian Weiss, Philipp Hermüller, Rüdiger Westermann This repository contains the code and s

Sebastian Weiss 20 Dec 03, 2022
Unified MultiWOZ evaluation scripts for the context-to-response task.

MultiWOZ Context-to-Response Evaluation Standardized and easy to use Inform, Success, BLEU ~ See the paper ~ Easy-to-use scripts for standardized eval

Tomáš Nekvinda 38 Dec 13, 2022
This repository is for our EMNLP 2021 paper "Automated Generation of Accurate & Fluent Medical X-ray Reports"

Introduction: X-Ray Report Generation This repository is for our EMNLP 2021 paper "Automated Generation of Accurate & Fluent Medical X-ray Reports". O

no name 36 Dec 16, 2022
Official code of paper: MovingFashion: a Benchmark for the Video-to-Shop Challenge

SEAM Match-RCNN Official code of MovingFashion: a Benchmark for the Video-to-Shop Challenge paper Installation Requirements: Pytorch 1.5.1 or more rec

HumaticsLAB 31 Oct 10, 2022
Image morphing without reference points by applying warp maps and optimizing over them.

Differentiable Morphing Image morphing without reference points by applying warp maps and optimizing over them. Differentiable Morphing is machine lea

Alex K 380 Dec 19, 2022
Pytorch implementation of "Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech"

GradTTS Unofficial Pytorch implementation of "Grad-TTS: A Diffusion Probabilistic Model for Text-to-Speech" (arxiv) About this repo This is an unoffic

HeyangXue1997 103 Dec 23, 2022
Churn-Prediction-Project - In this project, a churn prediction model is developed for a private bank as a term project for Data Mining class.

Churn-Prediction-Project In this project, a churn prediction model is developed for a private bank as a term project for Data Mining class. Project in

1 Jan 03, 2022
ML course - EPFL Machine Learning Course, Fall 2021

EPFL Machine Learning Course CS-433 Machine Learning Course, Fall 2021 Repository for all lecture notes, labs and projects - resources, code templates

EPFL Machine Learning and Optimization Laboratory 1k Jan 04, 2023
MLP-Numpy - A simple modular implementation of Multi Layer Perceptron in pure Numpy.

MLP-Numpy A simple modular implementation of Multi Layer Perceptron in pure Numpy. I used the Iris dataset from scikit-learn library for the experimen

Soroush Omranpour 1 Jan 01, 2022
Minimal implementation of PAWS (https://arxiv.org/abs/2104.13963) in TensorFlow.

PAWS-TF 🐾 Implementation of Semi-Supervised Learning of Visual Features by Non-Parametrically Predicting View Assignments with Support Samples (PAWS)

Sayak Paul 43 Jan 08, 2023
Fashion Recommender System With Python

Fashion-Recommender-System Thr growing e-commerce industry presents us with a la

Omkar Gawade 2 Feb 02, 2022
Toolbox to analyze temporal context invariance of deep neural networks

PyTCI A toolbox that estimates the integration window of a sensory response using the "Temporal Context Invariance" paradigm (TCI). The TCI method Int

4 Oct 23, 2022
Subnet Replacement Attack: Towards Practical Deployment-Stage Backdoor Attack on Deep Neural Networks

Subnet Replacement Attack: Towards Practical Deployment-Stage Backdoor Attack on Deep Neural Networks Official implementation of paper Towards Practic

Xiangyu Qi 8 Dec 30, 2022
Realtime YOLO Monster Detection With Non Maximum Supression

Realtime-YOLO-Monster-Detection-With-Non-Maximum-Supression Table of Contents In

5 Oct 07, 2022
Tutorials, assignments, and competitions for MIT Deep Learning related courses.

MIT Deep Learning This repository is a collection of tutorials for MIT Deep Learning courses. More added as courses progress. Tutorial: Deep Learning

Lex Fridman 9.5k Jan 07, 2023