Fitting thermodynamic models with pycalphad

Overview

ESPEI

ESPEI, or Extensible Self-optimizing Phase Equilibria Infrastructure, is a tool for thermodynamic database development within the CALPHAD method. It uses pycalphad for calculating Gibbs free energies of thermodynamic models.

Read the documentation at espei.org.

Installation

Anaconda (recommended)

ESPEI does not require any special compiler, but several dependencies do. Therefore it is suggested to install ESPEI from conda-forge.

conda install -c conda-forge espei

What is ESPEI?

  1. ESPEI parameterizes CALPHAD models with enthalpy, entropy, and heat capacity data using the corrected Akaike Information Criterion (AICc). This parameter generation step augments the CALPHAD modeler by providing tools for data-driven model selection, rather than relying on a modeler's intuition alone.
  2. ESPEI optimizes CALPHAD model parameters to thermochemical and phase boundary data and quantifies the uncertainty of the model parameters using Markov Chain Monte Carlo (MCMC). This is similar to the PARROT module of Thermo-Calc, but goes further by adjusting all parameters simultaneously and evaluating parameter uncertainty.
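The model selection idea in step 1 can be illustrated with a small, self-contained sketch. It is not ESPEI's implementation: the data, the two candidate models, and the helper names (`aicc`, `fit_constant`, `fit_linear`) are assumptions made for this example, and the least-squares form of the AICc is used.

```python
import math

def aicc(rss, n, k):
    """Corrected Akaike Information Criterion for a least-squares fit.

    rss: residual sum of squares, n: number of data points,
    k: number of fitted parameters. Requires n > k + 1 and rss > 0.
    """
    aic = n * math.log(rss / n) + 2 * k
    return aic + (2 * k * (k + 1)) / (n - k - 1)

def fit_constant(xs, ys):
    """Fit y = a and return the residual sum of squares."""
    a = sum(ys) / len(ys)
    return sum((y - a) ** 2 for y in ys)

def fit_linear(xs, ys):
    """Fit y = a + b*x by ordinary least squares and return the RSS."""
    n = len(xs)
    x_mean, y_mean = sum(xs) / n, sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = y_mean - b * x_mean
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))

# Hypothetical data: a property that varies roughly linearly with temperature.
temperatures = [300.0, 400.0, 500.0, 600.0, 700.0, 800.0]
values = [1605.0, 1795.0, 2005.0, 2195.0, 2405.0, 2595.0]

n = len(values)
scores = {
    "constant": aicc(fit_constant(temperatures, values), n, k=1),
    "linear": aicc(fit_linear(temperatures, values), n, k=2),
}
# The candidate with the lowest AICc is preferred.
best_model = min(scores, key=scores.get)
print(best_model, scores)
```

The extra parameter of the linear model is penalized by the AICc, but here the reduction in the residual sum of squares outweighs the penalty, so the linear candidate is selected.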

Details on the implementation of ESPEI can be found in the publication: B. Bocklund et al., MRS Communications 9(2) (2019) 1–10. doi:10.1557/mrc.2019.59.

What can ESPEI do?

ESPEI can be used to generate model parameters for CALPHAD models of the Gibbs energy that follow the temperature-dependent polynomial by Dinsdale (CALPHAD 15(4) 1991 317-425) within the compound energy formalism (CEF) for endmembers and Redlich-Kister-Muggianu excess mixing parameters in unary, binary and ternary systems.
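For a binary solution, the Redlich-Kister excess Gibbs energy has the form G_ex = x_A * x_B * sum_v L_v * (x_A - x_B)^v. The sketch below evaluates this expression with temperature-dependent interaction parameters L_v = a_v + b_v * T. The function name and the parameter values are hypothetical and are not part of ESPEI or pycalphad.

```python
def redlich_kister_excess(x_b, temperature, coefficients):
    """Binary Redlich-Kister excess Gibbs energy in J/mol.

    coefficients is a list of (a, b) pairs, one per order v, where the
    interaction parameter of order v is L_v = a + b*T.
    """
    x_a = 1.0 - x_b
    total = 0.0
    for order, (a, b) in enumerate(coefficients):
        l_v = a + b * temperature
        total += l_v * (x_a - x_b) ** order
    return x_a * x_b * total

# Hypothetical parameters: L0 = -20000 + 5*T, L1 = 3000
params = [(-20000.0, 5.0), (3000.0, 0.0)]
for x_b in (0.25, 0.5, 0.75):
    print(x_b, redlich_kister_excess(x_b, 1000.0, params))
```

At the equimolar composition only the L0 term contributes, because every higher-order term carries a factor of (x_A - x_B).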

All thermodynamic quantities are computed by pycalphad. The MCMC-based Bayesian parameter estimation can optimize data for any model that is supported by pycalphad, including models beyond the endmember Gibbs energies and Redlich-Kister-Muggianu excess terms, such as parameters in the ionic liquid model, magnetic, or two-state models. Performing Bayesian parameter estimation for arbitrary multicomponent thermodynamic data is supported.

Goals

  1. Offer a free and open-source tool for users to develop multicomponent databases with quantified uncertainty
  2. Enable development of CALPHAD-type models for Gibbs energy, thermodynamic or kinetic properties
  3. Provide a platform to build and apply novel model selection, optimization, and uncertainty quantification methods

The implementation for ESPEI involves first performing parameter generation by calculating parameters in thermodynamic models that are linearly described by non-equilibrium thermochemical data. Then Markov Chain Monte Carlo (MCMC) is used to optimize the candidate models from the parameter generation to phase boundary data.
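The MCMC step can be illustrated with a deliberately small example. The sketch below uses a plain random-walk Metropolis sampler, not the ensemble sampler from emcee that ESPEI uses, to estimate one parameter and its uncertainty. The data, the Gaussian likelihood, and all function names are assumptions for illustration.

```python
import math
import random

def log_probability(param, observations, sigma):
    """Gaussian log-likelihood (up to a constant) of the model y = param."""
    return -0.5 * sum(((y - param) / sigma) ** 2 for y in observations)

def metropolis(log_prob, start, n_steps, step_size, rng):
    """Random-walk Metropolis sampler returning the chain of visited values."""
    chain = [start]
    current_lp = log_prob(start)
    for _ in range(n_steps):
        proposal = chain[-1] + rng.gauss(0.0, step_size)
        proposal_lp = log_prob(proposal)
        # Accept with probability min(1, p(proposal) / p(current)).
        # 1.0 - rng.random() lies in (0, 1], so the logarithm is defined.
        if math.log(1.0 - rng.random()) < proposal_lp - current_lp:
            chain.append(proposal)
            current_lp = proposal_lp
        else:
            chain.append(chain[-1])
    return chain

rng = random.Random(0)
observations = [2.8, 3.1, 3.0, 2.9, 3.2]  # hypothetical measurements
chain = metropolis(
    lambda p: log_probability(p, observations, sigma=0.2),
    start=0.0, n_steps=5000, step_size=0.2, rng=rng,
)

samples = chain[1000:]  # discard burn-in
mean = sum(samples) / len(samples)
std = math.sqrt(sum((s - mean) ** 2 for s in samples) / len(samples))
print(f"parameter = {mean:.3f} +/- {std:.3f}")
```

The spread of the retained samples is what provides the uncertainty estimate: the chain yields a distribution of parameter values rather than a single best fit.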

Cu-Mg phase diagram

Cu-Mg phase diagram from a database created with and optimized by ESPEI. See the Cu-Mg Example.

History

The ESPEI package is based on a fork of pycalphad-fitting. The name and idea of ESPEI are originally based on Shang, Wang, and Liu, ESPEI: Extensible, Self-optimizing Phase Equilibrium Infrastructure for Magnesium Alloys, Magnes. Technol. 2010, 617-622 (2010).

Implementation details for ESPEI have been described in the following publications:

Getting Help

For help on installing and using ESPEI, please join the PhasesResearchLab/ESPEI Gitter room.

Bugs and software issues should be reported on GitHub.

License

ESPEI is MIT licensed.

The MIT License (MIT)

Copyright (c) 2015-2018 Richard Otis
Copyright (c) 2017-2018 Brandon Bocklund
Copyright (c) 2018-2019 Materials Genome Foundation

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in all
copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE
SOFTWARE.

Citing ESPEI

If you use ESPEI for work presented in a publication, we ask that you cite the following publication:

B. Bocklund, R. Otis, A. Egorov, A. Obaied, I. Roslyakova, Z.-K. Liu, ESPEI for efficient thermodynamic database development, modification, and uncertainty quantification: application to Cu–Mg, MRS Commun. (2019) 1–10. doi:10.1557/mrc.2019.59.
@article{Bocklund2019ESPEI,
         archivePrefix = {arXiv},
         arxivId = {1902.01269},
         author = {Bocklund, Brandon and Otis, Richard and Egorov, Aleksei and Obaied, Abdulmonem and Roslyakova, Irina and Liu, Zi-Kui},
         doi = {10.1557/mrc.2019.59},
         eprint = {1902.01269},
         issn = {2159-6859},
         journal = {MRS Communications},
         month = {jun},
         pages = {1--10},
         title = {{ESPEI for efficient thermodynamic database development, modification, and uncertainty quantification: application to Cu–Mg}},
         year = {2019}
}
Comments
  • Compute metastable/unstable single phase driving forces in ZPF error

    Thanks to Tobias Spitaler for suggesting this and to @richardotis for brainstorming this solution concept.

    This PR introduces two new functions in ZPF error, _solve_sitefracs_composition and _sample_solution_constitution. Their purpose is to facilitate computing metastable or unstable single phase driving forces when a phase has a miscibility gap. This should improve the convergence for any phase that has a stable or metastable miscibility gap.

    Rationale

    ESPEI currently computes the "single-phase hyperplane" at a vertex by performing an equilibrium calculation at a black point and then subtracting that from the target hyperplane energy at that composition. As illustrated in the figure Tobias constructed (below), this is problematic for phases with a miscibility gap because a "single-phase" equilibrium calculation in pycalphad will always compute the global minimum energy and give two composition sets.

    driving-force-Spitaler

    What ESPEI should do is what Tobias illustrates by the orange x and the green driving force line. This solution ensures that minimizing the driving force will force the Gibbs energy curve to match the energy of the black points on the multi-phase target hyperplane.

    Historically, we didn't implement this because one would like to use equilibrium to minimize the internal degrees of freedom, but pycalphad always computes the global minimum energy, so it was not possible to do via equilibrium. More recently, ESPEI introduced the idea of approximate_equilibrium, which uses starting_point to more quickly determine a minimum energy solution from a discrete point sampling grid. The approximate_equilibrium method we use still has the same problem as pycalphad's equilibrium because starting_point will still give the global minimum solution for the discrete sampling.

    Solution

    In an ideal world, pycalphad would be able to turn off global minimization (which automatically introduces new composition sets) and allow a condition to be set for the composition of a phase, i.e. X(BCC,B). In practice, being able to turn off global minimization and provide a valid starting point for only one composition set that has a global composition condition would simulate a phase composition condition. Unfortunately, neither turning off global minimization nor phase composition conditions are currently implemented, so we need a workaround.

    The two functions introduced here consider each single phase composition at a tie-vertex and construct a point grid that only contains points which satisfy the prescribed overall composition (and the internal phase constraints). This grid can be used in either approximate or exact equilibrium modes to find the lowest energy starting point, which is then passed to equilibrium together with the constrained point grid, so the global minimization step has no new composition sets to introduce (i.e. it cannot detect a miscibility gap).

    For performance, we pre-compute the grid of points for every phase composition in the ZPF datasets and re-use them to compute the grid, starting point and equilibrium at every parameter iteration (note that this would be invalid if a parameter changes the number of moles, like varying coordination number in the MQMQA).
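The idea of a composition-constrained point grid can be sketched for a hypothetical two-sublattice phase (A,B)1(A,B)1, where the overall mole fraction of B is the average of the site fractions of B on the two sublattices. This only illustrates the constraint and is not the implementation of _solve_sitefracs_composition or _sample_solution_constitution; the function name and the phase are assumptions.

```python
def constrained_site_fraction_grid(x_b, n_points=11):
    """Site-fraction points (y1_B, y2_B) for a (A,B)1(A,B)1 phase that all
    satisfy the overall composition x_B = (y1_B + y2_B) / 2."""
    # y2_B = 2*x_B - y1_B must stay in [0, 1], which bounds y1_B.
    lower = max(0.0, 2.0 * x_b - 1.0)
    upper = min(1.0, 2.0 * x_b)
    points = []
    for i in range(n_points):
        y1_b = lower + (upper - lower) * i / (n_points - 1)
        y2_b = 2.0 * x_b - y1_b
        points.append((y1_b, y2_b))
    return points

grid = constrained_site_fraction_grid(0.3)
for y1_b, y2_b in grid:
    print(f"y1_B = {y1_b:.2f}, y2_B = {y2_b:.2f}, x_B = {(y1_b + y2_b) / 2:.2f}")
```

Because every point in such a grid already has the prescribed overall composition, a search restricted to the grid can only vary the internal degrees of freedom of the phase.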

    To summarize the impact:

    1. This method will be entirely backwards compatible for phases without a miscibility gap.
    2. For cases where a miscibility gap is present in the parameters, but a single phase is prescribed, there will be a driving force to eliminate the miscibility gap, so the single phase compositions are more meaningful too. This is significant because you can prescribe single phase regions in ZPF datasets and it will enforce that no miscibility gap occurs, which is not true today.
    3. For phase compositions inside a miscibility gap, the Gibbs energy curve will match the multi-phase global minimum hyperplane at the phase compositions (at convergence).
    opened by bocklund 20
  • ERROR occurred using the new development version

    Dear Administrator, some tests failed when I ran pytest after installing the new development version (2021/4/21, Beijing time). In addition, errors occurred when I ran some example cases that ran successfully with earlier versions. errorlog.txt pytestfail.txt condalist.txt

    opened by duxiaoxian 12
  • Error releasing un-acquired lock in dask

    I was using distributed (1.18.0) when this error occurred. I have since changed to distributed (1.16.3).

      File "/Applications/anaconda/envs/my_pycalphad/bin/espei", line 11, in <module>
        sys.exit(main())
      File "/Applications/anaconda/envs/my_pycalphad/lib/python3.6/site-packages/espei/run_espei.py", line 135, in main
        mcmc_steps=args.mcmc_steps, save_interval=args.save_interval)
      File "/Applications/anaconda/envs/my_pycalphad/lib/python3.6/site-packages/espei/paramselect.py", line 754, in fit
        for i, result in enumerate(sampler.sample(walkers, iterations=mcmc_steps)):
      File "/Applications/anaconda/envs/my_pycalphad/lib/python3.6/site-packages/emcee/ensemble.py", line 259, in sample
        lnprob[S0])
      File "/Applications/anaconda/envs/my_pycalphad/lib/python3.6/site-packages/emcee/ensemble.py", line 332, in _propose_stretch
        newlnprob, blob = self._get_lnprob(q)
      File "/Applications/anaconda/envs/my_pycalphad/lib/python3.6/site-packages/emcee/ensemble.py", line 382, in _get_lnprob
        results = list(M(self.lnprobfn, [p[i] for i in range(len(p))]))
      File "/Applications/anaconda/envs/my_pycalphad/lib/python3.6/site-packages/espei/utils.py", line 39, in map
        result = [x.result() for x in result]
      File "/Applications/anaconda/envs/my_pycalphad/lib/python3.6/site-packages/espei/utils.py", line 39, in <listcomp>
        result = [x.result() for x in result]
      File "/Applications/anaconda/envs/my_pycalphad/lib/python3.6/site-packages/distributed/client.py", line 155, in result
        six.reraise(*result)
      File "/Applications/anaconda/envs/my_pycalphad/lib/python3.6/site-packages/six.py", line 685, in reraise
        raise value.with_traceback(tb)
      File "/Applications/anaconda/envs/my_pycalphad/lib/python3.6/site-packages/distributed/protocol/pickle.py", line 59, in loads
        return pickle.loads(x)
    RuntimeError: cannot release un-acquired lock
    bug 
    opened by ghost 10
  • dask workers can sometimes die without warning

    I haven't been able to reproduce it consistently, but dask workers sometimes die with the dask scheduler.

    To debug this, I turned on debugging output by scheduler = LocalCluster(n_workers=cores, threads_per_worker=1, processes=True, silence_logs=verbosity[output_settings['verbosity']]).

    I am still waiting for that job to have workers die to see the output, but for now as iterations in emcee complete the results are processed in Python (it is known that this is happening because of the progress bar output). During this time, the LocalCluster debugging gives output

    distributed.core - WARNING - Event loop was unresponsive for 1.69s.  This is often caused by long-running GIL-holding functions or moving large chunks of data. This can cause timeouts and instability.
    

    Usually I get two similar messages in a row.

    As another possibility, the most recent time I was able to reproduce this was when I had two instances of ESPEI running at the same time. I wouldn't think that the different client instances would interact, but maybe it should be investigated.

    opened by bocklund 6
  • Issues reproducing Cu-Mg example

    I had several issues running the Cu-Mg example from the ESPEI website. I installed ESPEI using the conda command, and took the Cu-Mg data directory from the ESPEI-datasets repository.

    I first tried reproducing the diagram from the section titled First-principles phase diagram. The code ran successfully, but the returned phase diagram didn't match the example well: diagram_dft

    I then tried reproducing the results in the MCMC optimization section. I wasn't able to successfully perform the MCMC optimization. The code returned numerous errors over the course of several minutes and eventually hung with no further output.

    This file contains the full python output when I ran the optimization: espei_mcmc_error.txt

    Here is my python version and installed packages/versions: python_info.txt

    opened by npaulson 6
  • The latest version of espei = 0.7.2 gives an error when plotting

    I recently started using the latest version of espei = 0.7.2 and I always get an error, whereas espei = 0.6 worked fine. image

    My current computer can't use espei = 0.6 any more, so I don't know which version to use or what went wrong. I always get MPI errors when I use espei = 0.6. image

    AG_CU_1214.zip

    opened by duxiaoxian 5
  • Run ESPEI via input files, rather than command line arguments

    A first draft and feedback was written in this gist

    The current iteration is:

    Header area.
    Include any metadata above the `---`.
    ---
    # core run settings
    run_type: full # choose full | dft | mcmc
    phase_models: input.json
    datasets: input-datasets # path to datasets. Defaults to current directory.
    scheduler: dask # can be dask | MPIPool
    
    # control output
    verbosity: 0 # integer verbosity level 0 | 1 | 2, where 2 is most verbose.
    output_tdb: out.tdb
    tracefile: chain.npy # name of the file containing the mcmc chain array
    probfile: lnprob.npy # name of the file containing the mcmc ln probability array
    
    # the following only take effect for full or mcmc runs
    mcmc:
      mcmc_steps: 2000
      mcmc_save_interval: 100
    
      # the following take effect for only mcmc runs
      input_tdb: null # TDB file used to start the mcmc run
      restart_chain: null # restart the mcmc fitting from a previous calculation
    

    This issue will focus on the development of a first generation input file structure and spec, and also as a place to brainstorm options that should be user-facing.

    opened by bocklund 5
  • Limit the degrees of freedom for non-active phases in MCMC to prevent them from diverging?

    Phases that do not have phase equilibria data should have their parameters fixed before the MCMC run.

    A particular phase in an ESPEI run can have single phase DFT data and no phase equilibria. This means that the parameters that were calculated in the single phase fitting have no effect on the error function that is used in the MCMC run.

    When parameters have no effect on the error function, they diverge when used in emcee because the ensemble sampler scales them up to infinity in an attempt to force that parameter to affect the error function.

    bug enhancement 
    opened by bocklund 5
  • Error when running Cu-Mg example

    Hello, I am trying to run ESPEI for the first time.

    I created a conda env and installed ESPEI using conda. I downloaded json and yaml files as well as the contents of the Cu-Mg folder in ESPEI-datasets, renamed it to input-data. After running espei --input espei-in.yaml, I get the errors below. Could you please let me know if I am doing anything wrong?

    Thanks!

    Traceback (most recent call last):
      File "/Users/latmarat/miniforge3/envs/espenv/bin/espei", line 10, in <module>
        sys.exit(main())
      File "/Users/latmarat/miniforge3/envs/espenv/lib/python3.10/site-packages/espei/espei_script.py", line 307, in main
        run_espei(input_settings)
      File "/Users/latmarat/miniforge3/envs/espenv/lib/python3.10/site-packages/espei/espei_script.py", line 177, in run_espei
        dbf = generate_parameters(phase_models, datasets, refdata, excess_model,
      File "/Users/latmarat/miniforge3/envs/espenv/lib/python3.10/site-packages/espei/paramselect.py", line 517, in generate_parameters
        aliases = extract_aliases(phase_models)
      File "/Users/latmarat/miniforge3/envs/espenv/lib/python3.10/site-packages/espei/utils.py", line 370, in extract_aliases
        aliases = {phase_name: phase_name for phase_name in phase_models["phases"].keys()}
    AttributeError: 'list' object has no attribute 'keys'
    
    opened by latmarat 4
  • AttributeError: 'NoneType' object has no attribute 'values'

    Dear Administrator, an 'AttributeError' occurred when I ran 'espei --input espei-in-2.yaml' using the latest development version of ESPEI. Would you mind helping me check my dataset? Thanks. errorprint-log.txt verbosity-log.txt CO-CU-20201104.zip

    f:\users\zhang\pycalphad\pycalphad\codegen\callables.py:97: UserWarning: State variables in build_callables are not {N, P, T}, but {T, P}. This can lead to incorrectly calculated values if the state variables used to call the generated functions do not match the state variables used to create them. State variables can be added with the additional_statevars argument.
      "additional_statevars argument.".format(state_variables))
    Traceback (most recent call last):
      File "F:\Users\zhang\Anaconda32020\envs\espei2020test\Scripts\espei-script.py", line 33, in <module>
        sys.exit(load_entry_point('espei', 'console_scripts', 'espei')())
      File "f:\users\zhang\espei\espei\espei_script.py", line 311, in main
        run_espei(input_settings)
      File "f:\users\zhang\espei\espei\espei_script.py", line 260, in run_espei
        approximate_equilibrium=approximate_equilibrium,
      File "f:\users\zhang\espei\espei\optimizers\opt_base.py", line 36, in fit
        node = self.fit(symbols, datasets, *args, **kwargs)
      File "f:\users\zhang\espei\espei\optimizers\opt_mcmc.py", line 238, in fit
        self.predict(initial_guess, **ctx)
      File "f:\users\zhang\espei\espei\optimizers\opt_mcmc.py", line 289, in predict
        multi_phase_error = calculate_zpf_error(parameters=np.array(params), **zpf_kwargs)
      File "f:\users\zhang\espei\espei\error_functions\zpf_error.py", line 315, in calculate_zpf_error
        target_hyperplane = estimate_hyperplane(phase_region, parameters, approximate_equilibrium=approximate_equilibrium)
      File "f:\users\zhang\espei\espei\error_functions\zpf_error.py", line 186, in estimate_hyperplane
        grid = calculate(dbf, species, phases, str_statevar_dict, models, phase_records, pdens=500, fake_points=True)
      File "f:\users\zhang\espei\espei\shadow_functions.py", line 55, in calculate
        largest_energy=float(1e10), fake_points=fp)
      File "f:\users\zhang\pycalphad\pycalphad\core\calculate.py", line 190, in _compute_phase_values
        param_symbols, parameter_array = extract_parameters(parameters)
      File "f:\users\zhang\pycalphad\pycalphad\core\utils.py", line 361, in extract_parameters
        parameter_array_lengths = set(np.atleast_1d(val).size for val in parameters.values())
    AttributeError: 'NoneType' object has no attribute 'values'

    opened by duxiaoxian 4
  • Migrate pycalphad refdata to ESPEI

    Tracking from https://github.com/pycalphad/pycalphad/issues/120

    Assume that SGTE91Stable is correct per https://github.com/pycalphad/pycalphad/issues/120. Then we must

    • [x] Remove the metastable phases not present in the SGTE91 original paper
    • [ ] Check that remaining phases have correct descriptions
    opened by bocklund 4
  • MCMC Initialized chains should include initial point

    During the initialization of the chains for the MCMC optimizer, a Gaussian distribution about an initial point is taken. https://github.com/PhasesResearchLab/ESPEI/blob/7c797191d4c3178fe4a22275bbaee9c2977786ad/espei/optimizers/opt_mcmc.py#L98

    I would suggest including the initial point in that set of initial chains. If everything is set up correctly, this won't matter, but for cases where the standard deviation is too high while the initial guess is quite good, the current behavior will lead to a lot of bad starting points. Modifying the initial set to include the initial guess point should ensure that at least this state (or acceptable permutations of it) will survive the MCMC run. What do you think?
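A minimal sketch of the suggestion, using only the standard library rather than ESPEI's actual initialization code: walkers are drawn from a Gaussian about the initial point, and the first walker is then replaced by the initial point itself. The function name, the relative standard deviation argument, and the example values are hypothetical.

```python
import random

def initialize_walkers(initial_point, n_walkers, relative_std, rng):
    """Draw walkers from a Gaussian about initial_point, keeping the
    unperturbed initial point as the first walker."""
    walkers = [
        [rng.gauss(p, abs(p) * relative_std) for p in initial_point]
        for _ in range(n_walkers)
    ]
    walkers[0] = list(initial_point)  # guarantee the initial guess is present
    return walkers

rng = random.Random(42)
initial_point = [-20000.0, 5.0, 3000.0]  # hypothetical parameter values
walkers = initialize_walkers(initial_point, n_walkers=6, relative_std=0.1, rng=rng)
for walker in walkers:
    print(walker)
```

With this change, a good initial guess is always among the starting states, even when the chosen standard deviation scatters the other walkers far from it.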

    opened by toastedcrumpets 0
  • formatted_parameter broken by SymEngine

    Switching the symbolic backend to SymEngine broke espei.utils.formatted_parameter. Here's a test to validate (run from the tests directory for the testing_data module to be importable).

    # espei/tests/test_utils.py
    
    from pycalphad import Database
    from espei.utils import formatted_parameter, database_symbols_to_fit
    from .testing_data import CU_MG_TDB
    def test_cu_mg_parameters_can_be_formatted_to_strings():
        """Formatting parameters should work for common variables parameters"""
        dbf = Database(CU_MG_TDB)
        for sym in database_symbols_to_fit(dbf):
            assert isinstance(formatted_parameter(dbf, sym), str), f"Formatted parameter for symbol {sym} (value = {dbf.symbols[sym]}) in database not a string"
    

    Running this gives an error:

    Traceback (most recent call last):
      File "/Users/bocklund1/src/calphad/espei/tests/dummy.py", line 11, in <module>
        test_cu_mg_parameters_can_be_formatted_to_strings()
      File "/Users/bocklund1/src/calphad/espei/tests/dummy.py", line 9, in test_cu_mg_parameters_can_be_formatted_to_strings
        assert isinstance(formatted_parameter(dbf, sym), str), f"Formatted parameter for symbol {sym} (value = {dbf.symbols[sym]}) in database not a string"
      File "/Users/bocklund1/src/calphad/espei/espei/utils.py", line 295, in formatted_parameter
        term = parameter_term(result['parameter'], symbol)
      File "/Users/bocklund1/src/calphad/espei/espei/utils.py", line 218, in parameter_term
        coeff, root = term_coeff.as_coeff_mul(symbol)
    AttributeError: 'symengine.lib.symengine_wrapper.Symbol' object has no attribute 'as_coeff_mul'
    

    I think the breakage might be because espei.utils.parameter_term isn't correctly picking up the first condition, since for the case of symbol being a symengine.lib.symengine_wrapper.Symbol, I think expression == symbol should evaluate to true, but evidently (via the traceback) it is evaluating to false.

    opened by bocklund 0
  • Memory leak when running MCMC in parallel

    Due to a known memory leak when instantiating subclasses of Symbol objects from SymEngine, one of our upstream dependencies (see https://github.com/symengine/symengine.py/issues/379), running ESPEI with parallelization will cause memory to grow in each worker.

    Only running in parallel will trigger significant memory growth, because running in parallel uses the pickle library to serialize and deserialize symbol objects and create new objects that can't be freed. When running without parallelization (mcmc.scheduler: null), new symbols are not created.

    Until https://github.com/symengine/symengine.py/issues/379 is fixed, some mitigation strategies to avoid running out of memory are:

    • Run ESPEI without parallelization by setting scheduler: null
    • (Under consideration to implement): when parallelization is active, use an option to restart the workers every N iterations.
    • (Under consideration to implement): remove Model objects from the keyword arguments of ESPEI's likelihood functions. Model objects contribute a lot of symbol instances in the form of v.SiteFraction objects. We should be able to get away with only using PhaseRecord objects, but there are a few places that use Model.constituents to infer the sublattice model and internal degrees of freedom, and these would need to be rewritten.
    opened by bocklund 1
  • Unable to use activity data in binary Fe-C with Graphite as reference state

    Hi,

    We are currently trying to use activity data for Fe-C. Lobo1976 measured the activity of C in alpha-iron relative to Graphite as the standard state, but we get erroneous results. (Lobo, Joseph A., and Gordon H. Geiger. "Thermodynamics and solubility of carbon in ferrite and ferritic Fe-Mo alloys." Metallurgical Transactions A 7.8 (1976): 1347-1357.)

    I have added the input file below. With this input file, we get chemical potential difference: [nan] (verbosity 3 output). Is the input file correct or are we missing something? I have had a look at the value of ref_result within the activity_error.py and this does give only nan results for the specified reference state. Graphite only has C as a component. An equilibrium calculation of Graphite specifying x.V('C') gives an error as Number of dependent components different from one. Can this cause an error here as well? Used versions: espei: 0.8.6 and pycalphad 0.9.2. I have added a zip-file with the TDB file and espei input files which reproduces this behaviour.

    Thank you for your help, Tobias

    {
            "components": ["FE", "C", "VA"],
            "phases": ["BCC_A2", "GRAPHITE"],
            "weight": 1000,
            "reference_state": {
                    "phases": ["GRAPHITE"],
                    "conditions": {
                            "P": 101325,
                            "T": 1056.15,
                            "X_C": 1
    
                    }
            },
            "conditions": {
                    "P": 101325,
                    "T": 1056.15,
                    "X_C": [0.00013017]
            },
            "output": "ACR_C",
            "values": [[[0.087]]
                    ],
            "reference": "Lobo1976_1056K",
            "meta_data": {
                    "DOI": "10.1007/BF02658820",
                    "literature reference": "Thermodynamics and Solubility of Carbon in Ferrite and Ferritic Fe-Mo Alloys",
                    "table/figure": "table 1",
                    "measured data": "C-activity in Alpha-Iron",
                    "experimental details": "not available",
                    "weight": "default"
            }
    }
    

    minimal_example.zip

    opened by tobiasspt 1
  • ENH: Allow multiple datasets directories to be specified in YAML input

    Sometimes it is useful to load datasets from different filesystem locations, for example if one folder contains hand-curated data and another contains automatically generated data.

    In code, it would be pretty simple to handle this. Instead of

    from espei.datasets import load_datasets, recursive_glob
    directory = '/path/to/directory/'
    load_datasets(recursive_glob(directory))
    

    we could do

    from itertools import chain
    from espei.datasets import load_datasets, recursive_glob
    directories = ['/path/to/directory_1/', '/path/to/directory_2/']
    load_datasets(chain(*map(recursive_glob, directories)))
    
    opened by bocklund 1
Releases(0.8.9)
Owner
Phases Research Lab
Research group led by Dr. Zi-Kui Liu at The Pennsylvania State University.