Thermodynamics Surrogate Modeling

Surrogate Modeling of Thermodynamic Calculations

This project had two aims. First, although an individual thermodynamic calculation is quick to complete, running many of them carries a substantial time cost. The first aim was therefore to reduce that cost by using a surrogate model that is cheap to evaluate. The second aim was to remove the obstacle of license requirements. Thermodynamic database licenses are commonly available only in limited numbers, significantly fewer than would be required to run these calculations on a supercomputer. More advanced materials design approaches need both the information in the thermodynamic databases and supercomputing resources, so it is necessary to construct surrogates.
At the end of this process, the aim was to have a surrogate model that is cheap to query and accurate enough to replace the actual thermodynamic calculations.

With a relatively coarse sampling of the design space (6 samples per dimension), the fit of the surrogate model was not sufficient. The main discrepancy between the surrogate model and the Thermo-Calc results was at higher volume fractions. However, most of the error also appears to be associated with points at high temperatures.

When the sampling was increased to 10 samples per dimension, the results were significantly better, with much less error at larger volume fractions. This sampling scheme was therefore chosen for constructing the final surrogate model.
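To make the cost of the finer sampling concrete, the following is a minimal sketch that assumes the "samples per dimension" describe a full-factorial grid. The three-dimensional design space, its bounds, and the function name `full_factorial_grid` are hypothetical and chosen only for illustration; the actual design space of the project is not specified here.

```python
import itertools

import numpy as np


def full_factorial_grid(bounds, samples_per_dim):
    """Return an (n_points, n_dims) array of evenly spaced grid points.

    bounds is a list of (low, high) pairs, one pair per input dimension.
    """
    axes = [np.linspace(low, high, samples_per_dim) for low, high in bounds]
    return np.array(list(itertools.product(*axes)))


# Hypothetical design space: two element fractions and a temperature in K.
bounds = [(0.0, 0.1), (0.0, 0.2), (800.0, 1400.0)]

coarse = full_factorial_grid(bounds, 6)   # 6 samples per dimension
fine = full_factorial_grid(bounds, 10)    # 10 samples per dimension

print(coarse.shape, fine.shape)
```

On such a grid the number of training calculations grows as the number of samples per dimension raised to the number of dimensions, so moving from 6 to 10 samples becomes increasingly expensive as dimensions are added.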

To obtain an estimate of the parameter error of the CALPHAD model, a distribution was applied to the inputs of the surrogate model. For these results a normal distribution was used, with its standard deviation linked to the typical concentration range of each element in the alloy. The distributions were sampled for each nominal composition, and the mean and standard deviation of the outputs were calculated. These, together with the surrogate model standard deviation, were used to define the combined error.
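The sampling procedure described above can be sketched as a simple Monte Carlo propagation. This is an illustration under stated assumptions, not the project's implementation: `toy_surrogate` is a hypothetical stand-in for the trained surrogate, the nominal composition and standard deviations are invented, and combining the two error sources in quadrature is an assumption, since the text does not state how the combined error was formed.

```python
import numpy as np


def toy_surrogate(X):
    """Hypothetical stand-in for a trained surrogate.

    Returns a predicted mean and a predictive standard deviation per row.
    """
    mean = 0.5 * X[:, 0] + 2.0 * X[:, 1] ** 2
    std = np.full(X.shape[0], 0.01)
    return mean, std


def propagate_normal(predict, nominal, sigma, n_samples=5000, seed=0):
    """Sample normally distributed inputs around a nominal composition."""
    rng = np.random.default_rng(seed)
    X = rng.normal(loc=nominal, scale=sigma, size=(n_samples, len(nominal)))
    mean, model_std = predict(X)
    output_mean = mean.mean()
    input_std = mean.std(ddof=1)  # spread caused by the input distribution
    # Assumption: combine input-driven and surrogate variance in quadrature.
    combined = np.sqrt(input_std ** 2 + np.mean(model_std ** 2))
    return output_mean, input_std, combined


# Hypothetical nominal composition and per-element standard deviations.
nominal = np.array([0.05, 0.10])
sigma = np.array([0.005, 0.010])

mean_out, input_std, combined = propagate_normal(toy_surrogate, nominal, sigma)
print(mean_out, input_std, combined)
```

With this choice the combined error is never smaller than either contribution on its own.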

In addition to the normal distribution, a uniform distribution on the input variables was also used, with its limits defined by the concentration range of each element. The combined error in this case is slightly smaller than the error obtained with the normal distribution, partly because the normal distribution can produce values outside the range of the uniform distribution.
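The effect of the normal distribution's tails can be illustrated with a short comparison. In this sketch the concentration range is hypothetical, and setting the standard deviation to half the width of the range is an assumption made only for the example; the text says the standard deviation was linked to the concentration range without giving the exact relation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical concentration range for one element (mass fraction).
low, high = 0.02, 0.06
center = 0.5 * (low + high)
sigma = 0.5 * (high - low)  # assumption: range corresponds to +/- 1 sigma

n = 100_000
normal_samples = rng.normal(center, sigma, n)
uniform_samples = rng.uniform(low, high, n)

# Fraction of normal samples that fall outside the uniform limits.
frac_outside = np.mean((normal_samples < low) | (normal_samples > high))

print(frac_outside, normal_samples.std(), uniform_samples.std())
```

Under this assumption a noticeable share of the normal samples lies outside the limits and the normal inputs have the larger spread, which is consistent with a larger propagated error. A different link between the standard deviation and the range would change the size of the effect.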

Journal Publications

1. Couperthwaite, Richard, Douglas Allaire, and Raymundo Arroyave. “Utilizing Gaussian Processes to Fit High Dimension Thermodynamic Data That Includes Estimated Variability.” Computational Materials Science 188, 110133, 2021. DOI: doi.org/10.1016/j.commatsci.2020.110133

It is sometimes a mistake to climb; it is always a mistake never even to make the attempt.

If you do not climb, you will not fall.

This is true.

But is it that bad to fail, that hard to fall?
-Neil Gaiman, Fables & Reflections (Sandman #6)
