Dual Adaptive Sampling for Machine Learning Interatomic potential.

Related tags

Machine Learningdas
Overview

DAS

Dual Adaptive Sampling for Machine Learning Interatomic potential.

How to cite

If you use this code in your research, please cite this using: Hongliang Yang, Yifan Zhu, Erting Dong, Yabei Wu, Jiong Yang, and Wenqing Zhang. Dual adaptive sampling and machine learning interatomic potentials for modeling materials with chemical bond hierarchy. Phys. Rev. B 104, 094310 (2021).

Install

Install pymtp

You should first install the python interface for mtp: https://github.com/hlyang1992/pymtp

Install das

You can download the code by

git clone https://github.com/hlyang1992/das
cd das
cp -r <path-to-mlip-2>/untrained_mtps/*.mtp das/utils/untrained_mtps

Then remove the redundant settings from each mtp file. Only the following settings can be retained for each mtp file:

radial_funcs_count = 
alpha_moments_count = 
alpha_index_basic_count = 
alpha_index_basic = 
alpha_index_times_count = 
alpha_index_times = 
alpha_scalar_moments = 
alpha_moment_mapping =

Install das by

cd <path-to-das>
pip install -r requirements.txt
pip install .

Usage

das  config_dir  job_name

Configuration

The configuration directory config_dir must contain the configuration file conf.yaml, which controls all sampling processes. The conf.yaml file should look like the following:

"global_settings":

"machine_settings":

"selector_settings": {} 

"labeler_settings":

"trainer_settings":

"sampler_settings":

"init_conf_setting":

"iter_params_template":

"iter_params":
  • global_settings:
"global_settings":
  # The elements in the system, the order of the elements does not matter, the program automatically numbers the 
  # atomic types according to their atomic number from smallest to largest.
  "unique_elements": [ "Co", "Sb" ]
  # path to VASP Pseudopotential Database, see detail at https://wiki.fysik.dtu.dk/ase/ase/calculators/vasp.html#vasp
  "vasp_pp_path": "path_to_directory" 
  • machine_settings:

All time-consuming computational tasks such as sampling, labeling, and training can be dispatched to designated machines via ssh. Currently only LSF is supported and migration to other job management systems is very easy.

"machine_settings":
  "machine_1":
    # The supported machine types are now: `machine_lsf`, `machine_shell`
    "machine_type": "machine_lsf"
    "host": "ip address"
    "user": "username"
    "password": "password"
    # Exclude these nodes when submitting tasks.
    "bad_nodes": [ ] # #BSUB -R "hname!={{node}}"
    "port": 22
    # number of cores for each task
    "n_cores": 40 # #BSUB -n {{ncores}}
    "n_tasks": 40 # The maximum number of tasks to run simultaneously.
    "q_name": "short" # #BSUB -q {{q_name}}
    "env_source_file": "env.sh" # env.sh is in the config_dir
    "run_dir": "path-to-run-directory-in-target"
    "extra_params":
      "vasp_cmd": "mpiexec.hydra -machinefile $LSB_DJOB_HOSTFILE -np $NP vasp"
      "lmp_cmd": "mpiexec.hydra -machinefile $LSB_DJOB_HOSTFILE -np $NP lmp_mlp"
      "mlip_cmd": "mpiexec.hydra -machinefile $LSB_DJOB_HOSTFILE -np $NP mlp train"
      "python_cmd": "absolute path to python path"
  "machine_2":
    # setting for machchine_2
    "machine_type": "machine_lsf"
    # ...

You should prepare a file to set the environment variables. The program will source this file to set the environment variables after connecting to the machine via ssh. For technical reasons please see: The remote shell environment doesn’t match interactive shells

  • sampler_settings
"scale_1":
  "kind": "scale_box"
  "scale_factors": [0.998, 0.9985, 0.999]
"scale_2":
  "kind": "scale_box"
  "scale_factors": [[0.998, 0.9985, 0.999, 0.997], # a
                    [1.002, 1.003, 1.004, 1.005],  # b
                    [0.997, 0.995, 0.999, 0.996]] # c
"nvt_0": 
  "kind": "lmp_model_sampler"
  "max_number_confs": 5
  "min_number_confs": 0
  "machine": "machine_1"
  "lmp_vars":
    "temp": [ 100, 150 ]
    "steps": [ 10000 ]
    "nevery": [ 20 ]
    "prev_steps": [ 0 ]
 
"npt_0": 
  "kind": "lmp_model_sampler"
  "max_number_confs": 5
  "min_number_confs": 0
  "machine": "machine_2"
  "lmp_vars":
    "temp": [ 100, 150 ]
    "steps": [ 10000 ]
    "nevery": [ 20 ]
    "press": [100, 200] # bar
    "prev_steps": [ 0 ]
  • "labeler_settings"

We use ase to generate input files (INCAR, POTCAR, KPOINTS) for VASP calculation. Please see detail at Ase vasp calculator

"labeler_settings":
  "vasp":
    "kind": "vasp"
    "machine": "ty_label"
    "vasp_parms":
      "xc": "pbe"
      "prec": "A"
      # other setting for vasp calculations
  • "trainer_settings"
"trainer_settings":
  "train_5_model":
    "kind": "mtp_trainer"
    "machine": "ty_train" 
    "model_index": 18 
    "min_dist": 1.39 
    "max_dist": 5.0
    "n_models": 5 
    "train_from_prev_model": true 
  • init_conf_setting:
"init_conf_setting":
  "-1": [ "init_MD.cfg" ]
  "-2": [ "init_1.vasp" ]
  "-3": [ "init_2.vasp" ]
  • iter_params_template:
"iter_params_template":
  "0":
    "init_conf": [ -1 ]
    "sampler": [ ]
    "selector": [ ]
    "labeler": [ ]
    "trainer": [ "train_5_model" ]
  "10":
    "init_conf": [ -2 ]
    "sampler": [ "scale_0", "nvt_0" ]
    "selector": [ ]
    "labeler": [ "vasp" ]
    "trainer": [ "train_5_model" ]
  "20":
    "init_conf": [ -3 ]
    "sampler": [ "npt_0"]
    "selector": [ ]
    "labeler": [ "vasp" ]
    "trainer": [ "train_5_model" ]
  "30":
    "init_conf": [ -2,-3 ]
    "sampler": [ "npt_0"]
    "selector": [ ]
    "labeler": [ "vasp" ]
    "trainer": [ "train_5_model" ]
  • iter_params:
"iter_params":
  [
    [ "0" ],
    # If the last one is LOOP, repeat all the previous ones until convergence.
    ["10", "LOOP"], 
    ["30", "LOOP"],
    ["10", "10"]  
    ["20"],
  ]
Model Agnostic Confidence Estimator (MACEST) - A Python library for calibrating Machine Learning models' confidence scores

Model Agnostic Confidence Estimator (MACEST) - A Python library for calibrating Machine Learning models' confidence scores

Oracle 95 Dec 28, 2022
A game theoretic approach to explain the output of any machine learning model.

SHAP (SHapley Additive exPlanations) is a game theoretic approach to explain the output of any machine learning model. It connects optimal credit allo

Scott Lundberg 18.2k Jan 02, 2023
Simple and flexible ML workflow engine.

This is a simple and flexible ML workflow engine. It helps to orchestrate events across a set of microservices and create executable flow to handle requests. Engine is designed to be configurable wit

Katana ML 295 Jan 06, 2023
Empyrial is a Python-based open-source quantitative investment library dedicated to financial institutions and retail investors

By Investors, For Investors. Want to read this in Chinese? Click here Empyrial is a Python-based open-source quantitative investment library dedicated

Santosh 640 Dec 31, 2022
A scikit-learn based module for multi-label et. al. classification

scikit-multilearn scikit-multilearn is a Python module capable of performing multi-label learning tasks. It is built on-top of various scientific Pyth

802 Jan 01, 2023
Merlion: A Machine Learning Framework for Time Series Intelligence

Merlion is a Python library for time series intelligence. It provides an end-to-end machine learning framework that includes loading and transforming data, building and training models, post-processi

Salesforce 2.8k Jan 05, 2023
Retrieve annotated intron sequences and classify them as minor (U12-type) or major (U2-type)

(intron I nterrogator and C lassifier) intronIC is a program that can be used to classify intron sequences as minor (U12-type) or major (U2-type), usi

Graham Larue 4 Jul 26, 2022
This jupyter notebook project was completed by me and my friend using the dataset from Kaggle

ARM This jupyter notebook project was completed by me and my friend using the dataset from Kaggle. The world Happiness 2017, which ranks 155 countries

1 Jan 23, 2022
Time-series momentum for momentum investing strategy

Time-series-momentum Time-series momentum strategy. You can use the data_analysis.py file to find out the best trigger and window for a given asset an

Victor Caldeira 3 Jun 18, 2022
A Python library for detecting patterns and anomalies in massive datasets using the Matrix Profile

matrixprofile-ts matrixprofile-ts is a Python 2 and 3 library for evaluating time series data using the Matrix Profile algorithms developed by the Keo

Target 696 Dec 26, 2022
Adversarial Framework for (non-) Parametric Image Stylisation Mosaics

Fully Adversarial Mosaics (FAMOS) Pytorch implementation of the paper "Copy the Old or Paint Anew? An Adversarial Framework for (non-) Parametric Imag

Zalando Research 120 Dec 24, 2022
Python library which makes it possible to dynamically mask/anonymize data using JSON string or python dict rules in a PySpark environment.

pyspark-anonymizer Python library which makes it possible to dynamically mask/anonymize data using JSON string or python dict rules in a PySpark envir

6 Jun 30, 2022
A linear equation solver using gaussian elimination. Implemented for fun and learning/teaching.

A linear equation solver using gaussian elimination. Implemented for fun and learning/teaching. The solver will solve equations of the type: A can be

Sanjeet N. Dasharath 3 Feb 15, 2022
The project's goal is to show a real world application of image segmentation using k means algorithm

The project's goal is to show a real world application of image segmentation using k means algorithm

2 Jan 22, 2022
Fit interpretable models. Explain blackbox machine learning.

InterpretML - Alpha Release In the beginning machines learned in darkness, and data scientists struggled in the void to explain them. Let there be lig

InterpretML 5.2k Jan 09, 2023
A Python library for choreographing your machine learning research.

A Python library for choreographing your machine learning research.

AI2 270 Jan 06, 2023
A machine learning project that predicts the price of used cars in the UK

Car Price Prediction Image Credit: AA Cars Project Overview Scraped 3000 used cars data from AA Cars website using Python and BeautifulSoup. Cleaned t

Victor Umunna 7 Oct 13, 2022
Forecasting prices using Facebook/Meta's Prophet model

CryptoForecasting using Machine and Deep learning (Part 1) CryptoForecasting using Machine Learning The main aspect of predicting the stock-related da

1 Nov 27, 2021
A machine learning model for Covid case prediction

CovidcasePrediction A machine learning model for Covid case prediction Problem Statement Using regression algorithms we can able to track the active c

VijayAadhithya2019rit 1 Feb 02, 2022
Esse é o meu primeiro repo tratando de fim a fim, uma pipeline de dados abertos do governo brasileiro relacionado a compras de contrato e cronogramas anuais com spark, em pyspark e SQL!

Olá! Esse é o meu primeiro repo tratando de fim a fim, uma pipeline de dados abertos do governo brasileiro relacionado a compras de contrato e cronogr

Henrique de Paula 10 Apr 04, 2022