Supporting code for "Autoregressive neural-network wavefunctions for ab initio quantum chemistry".


naqs-for-quantum-chemistry

MIT License


This repository contains the codebase developed for the paper "Autoregressive neural-network wavefunctions for ab initio quantum chemistry".


(a) Architecture of a neural autoregressive quantum state (NAQS) (b) Energy surface of N2

TL;DR

Certain parts of the notebooks relating to generating molecular data are currently not working due to updates to the underlying OpenFermion and Psi4 packages (I'll fix it!). However, the experimental results of NAQS can still be reproduced, as we also provide pre-generated data in this repository.

If you don't care for now, and you just want to see it running, here are two links to notebooks that will set up and run on Colab. Just note that Colab will not have enough memory to run experiments on the largest molecules we considered.

  • run_naqs.ipynb: Run individual experiments or batches of experiments, including those to recreate published results.

  • generate_molecular_data_and_baselines.ipynb:

    1. Create the [molecule].hdf5 and [molecule]_qubit_hamiltonian.pkl files required (these are provided for molecules used in the paper in the molecules directory).
    2. Solve these molecules with various canonical QC methods using Psi4.

Overview

Quantum chemistry with neural networks

A grand challenge of ab initio quantum chemistry (QC) is to solve the many-body Schrödinger equation describing the interaction of heavy nuclei and orbiting electrons. Unfortunately, this is an extremely (read, NP) hard problem, and so a significant amount of research effort has been, and continues to be, directed towards numerical methods in QC. Typically, these methods work by optimising the wavefunction in a basis set of "Slater determinants". (In practice these are anti-symmetrised tensor products of single-electron orbitals, but for our purposes let's not worry about the details.) In general, the number of Slater determinants - and so the complexity of optimisation - grows exponentially with the system size, but recently machine learning (ML) has emerged as a possible tool with which to tackle this seemingly intractable scaling issue.
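To get a feel for the scaling, here is a minimal sketch that counts the Slater determinants for a fixed number of spin-up and spin-down electrons. The function name `num_determinants` is purely illustrative and is not part of this codebase. It assumes each spin species is distributed independently over the same set of spatial orbitals.

```python
from math import comb


def num_determinants(n_spatial_orbitals: int, n_alpha: int, n_beta: int) -> int:
    """Count determinants with n_alpha spin-up and n_beta spin-down electrons.

    Each spin species chooses its occupied orbitals independently, so the
    total is a product of two binomial coefficients.
    """
    return comb(n_spatial_orbitals, n_alpha) * comb(n_spatial_orbitals, n_beta)


# Half-filled toy systems of increasing size: the count grows exponentially.
for n in (2, 4, 8, 16):
    print(n, num_determinants(n, n // 2, n // 2))
```

Even at 16 spatial orbitals the count is already in the hundreds of millions, which is why explicitly enumerating the basis quickly becomes impractical.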

Translation/disclaimer: we can use ML and it has displayed some promising properties, but right now the SOTA results still belong to the established numerical methods (e.g. coupled-cluster) in practical settings.

Project summary

We follow the approach proposed by Choo et al. to map the exponentially complex system of interacting fermions to an equivalent (and still exponentially large) system of interacting qubits (see their or our paper for details). The advantage is that we can then apply neural network quantum states (NNQS), originally developed for condensed matter physics (CMP) (with distinguishable interacting particles), to electronic structure calculations (with indistinguishable electrons and fermionic anti-symmetries).
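As a rough illustration of what a fermion-to-qubit mapping looks like, the sketch below builds annihilation operators under the Jordan-Wigner encoding and checks that they obey the fermionic anticommutation relations. This README does not say which encoding is used, so Jordan-Wigner is an assumption here (it is one standard choice). The helper names are hypothetical and this is not the code used in this repository, which relies on OpenFermion.

```python
from functools import reduce

import numpy as np

I2 = np.eye(2)
Z = np.array([[1.0, 0.0], [0.0, -1.0]])
# Single-qubit lowering operator |0><1|: maps "occupied" (1) to "empty" (0).
LOWER = np.array([[0.0, 1.0], [0.0, 0.0]])


def kron_all(mats):
    """Kronecker product of a list of matrices, left to right."""
    return reduce(np.kron, mats)


def jw_annihilation(j: int, n: int) -> np.ndarray:
    """Jordan-Wigner image of the annihilation operator a_j on n qubits.

    The string of Z operators on qubits 0..j-1 supplies the fermionic sign.
    """
    return kron_all([Z] * j + [LOWER] + [I2] * (n - j - 1))


def anticommutator(x, y):
    return x @ y + y @ x


n = 3
a = [jw_annihilation(j, n) for j in range(n)]

# Check {a_i, a_j} = 0 and {a_i, a_j^dagger} = delta_ij.
# The matrices are real, so the transpose is the Hermitian conjugate.
for i in range(n):
    for j in range(n):
        assert np.allclose(anticommutator(a[i], a[j]), 0.0)
        assert np.allclose(anticommutator(a[i], a[j].T), np.eye(2**n) * (i == j))
```

The qubit operators are plain 2^n x 2^n matrices, so the mapped problem is still exponentially large. The mapping changes the representation, not the difficulty.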

This project proposes that simply applying techniques from CMP to QC will inevitably fail to take advantage of our significant a priori knowledge of molecular systems. Moreover, the stochastic optimisation of NNQS relies on repeatedly sampling the wavefunction, which can be prohibitively expensive. This project is a sandbox for trialling different NNQS, in particular an ansatz based on autoregressive neural networks that we present in the paper. The major benefits of our approach are that it:

  1. allows for highly efficient sampling, especially of the highly asymmetric wavefunctions typically found in QC,
  2. allows for physical priors - such as conservation of electron number, overall spin and possible symmetries - to be embedded into the network without sacrificing expressibility.
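The two points above can be illustrated with a toy sketch of autoregressive sampling under an electron-number constraint. This is not the network from the paper. `toy_conditional` is a hypothetical stand-in for a learned model and `sample_configuration` is an illustrative name. The idea shown is that each qubit is sampled conditioned on the previous ones, and outcomes that would make the target electron number unreachable are masked out before normalising.

```python
import numpy as np


def toy_conditional(prefix, n_qubits):
    """Hypothetical stand-in for a neural network.

    Returns unnormalised, strictly positive scores for the next qubit being
    0 (empty) or 1 (occupied), favouring occupation of low-index orbitals.
    """
    k = len(prefix)
    return np.array([1.0 + k, float(n_qubits - k)])


def sample_configuration(n_qubits, n_electrons, rng):
    """Draw one bitstring with exactly n_electrons ones, one qubit at a time.

    Returns the bitstring and its exact log-probability under the model.
    """
    bits, log_prob = [], 0.0
    for k in range(n_qubits):
        used = sum(bits)
        remaining_sites = n_qubits - k
        scores = toy_conditional(bits, n_qubits)
        # Mask outcomes that would violate electron-number conservation.
        if used == n_electrons:                    # all electrons already placed
            scores[1] = 0.0
        if n_electrons - used == remaining_sites:  # every remaining site must be filled
            scores[0] = 0.0
        probs = scores / scores.sum()
        b = int(rng.choice(2, p=probs))
        bits.append(b)
        log_prob += np.log(probs[b])
    return bits, log_prob


rng = np.random.default_rng(0)
for _ in range(3):
    bits, log_prob = sample_configuration(n_qubits=8, n_electrons=4, rng=rng)
    print(bits, float(log_prob))
```

Every sample is drawn directly with a known normalised probability, so no Markov chain is needed, and unphysical configurations are never generated.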

Getting started

In this repo

notebooks

  • run_naqs.ipynb: Run individual experiments or batches of experiments, including those to recreate published results.

  • generate_molecular_data_and_baselines.ipynb:

    1. Create the [molecule].hdf5 and [molecule]_qubit_hamiltonian.pkl files required (these are provided for molecules used in the paper in the molecules directory).
    2. Solve these molecules with various canonical QC methods using Psi4.

experiments

Experimental scripts, including those to reproduce published results, for NAQS and Psi4.

molecules

The molecular data required to reproduce published results.

src / src_cpp

Python and Cython source code for the main codebase and fast calculations, respectively.

Running experiments

Further details are provided in the run_naqs.ipynb notebook. However, the published experiments can be run using the provided batch scripts.

>>> experiments/bash/naqs/batch_train.sh 0 LiH

Here, 0 is the GPU number to use (if one is available, otherwise the CPU will be used by default) and LiH can be replaced by any folder in the molecules directory. Similarly, the experimental ablations can be run using the corresponding bash scripts.

>>> experiments/bash/naqs/batch_train_no_amp_sym.sh 0 LiH
>>> experiments/bash/naqs/batch_train_no_mask.sh 0 LiH
>>> experiments/bash/naqs/batch_train_full_mask.sh 0 LiH
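
If you want to queue the main run and all three ablations for several molecules, a small helper along these lines may be convenient. This is only a sketch. `build_commands` is a hypothetical name and not part of this repository. It uses the script paths listed above and assumes it is run from the repository root.

```python
from pathlib import PurePosixPath

SCRIPT_DIR = PurePosixPath("experiments/bash/naqs")
SCRIPTS = [
    "batch_train.sh",
    "batch_train_no_amp_sym.sh",
    "batch_train_no_mask.sh",
    "batch_train_full_mask.sh",
]


def build_commands(molecules, gpu=0, scripts=SCRIPTS):
    """Return one [script, gpu, molecule] argument list per run."""
    return [
        [str(SCRIPT_DIR / script), str(gpu), molecule]
        for molecule in molecules
        for script in scripts
    ]


for cmd in build_commands(["LiH"]):
    print(" ".join(cmd))
    # To actually launch a run, pass cmd to subprocess.run(cmd, check=True).
```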

Requirements

The underlying neural networks require PyTorch. The molecular systems are typically handled by OpenFermion, with the backend calculations and baselines requiring Psi4. Note that this code expects OpenFermion 0.11.0 and will need refactoring to work with newer versions. All other required packages (numpy, matplotlib, seaborn if you want pretty plots, etc.) are standard. To be concrete, the linked Colab notebooks provide an environment in which the code can be run.

Reference

If you find this project or the associated paper useful, it can be cited as below.

@article{barrett2021autoregressive,
  title={Autoregressive neural-network wavefunctions for ab initio quantum chemistry},
  author={Barrett, Thomas D and Malyshev, Aleksei and Lvovsky, AI},
  journal={arXiv preprint arXiv:2109.12606},
  year={2021}
}
Releases(v1.0.0)
Owner
Tom Barrett
Research Scientist @ InstaDeep, formerly postdoctoral researcher @ Oxford. RL, GNN's, quantum physics, optical computing and the intersection thereof.