Placeholder linkdump on machine learning in condensed matter physics and materials science, a rapidly growing field. See also learnable coarse-graining and machine learning in physical sciences.
This collection lists online and offline resources covering the physical, chemical, mechanical and other properties of materials.
Its main motivation is to help students and enthusiasts find the data they need to practice machine learning techniques. It is also expected to assist researchers in the materials informatics field.
The first three sections below list publicly accessible databases and dataset-sharing platforms, along with books and handbooks that include materials data.
The last section adds a couple of toy datasets shared by researchers for teaching machine learning techniques in materials science.
Please help me sort out all these projects.
The pycalphad software package is a free and open-source Python library for designing thermodynamic models, calculating phase diagrams and investigating phase equilibria using the CALPHAD method. It provides routines for reading thermodynamic databases and solving the multi-component, multi-phase Gibbs energy minimization problem. The pycalphad software project advances the state of thermodynamic modeling by providing a flexible yet powerful interface for manipulating CALPHAD data and models. The key feature of the software is that the thermodynamic models of individual phases and their associated databases can be programmatically manipulated and overridden at run-time without modifying any internal solver or calculation code. Because the models are internally decoupled from the equilibrium solver and the models themselves are represented symbolically, pycalphad is an ideal tool for CALPHAD database development and model prototyping.
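To make the "Gibbs energy minimization problem" concrete, here is a minimal sketch in plain Python. It does not use pycalphad's API. It assumes a hypothetical binary regular-solution model with a single interaction parameter `omega`; the function names and the numbers in the usage lines are illustrative only. For this symmetric model the common tangent is horizontal, so the two coexisting compositions are the stationary points of the mixing Gibbs energy on either side of x = 0.5.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol K)


def gibbs_mixing(x, temperature, omega):
    """Molar Gibbs energy of mixing for a binary regular solution (J/mol)."""
    ideal = R * temperature * (x * math.log(x) + (1 - x) * math.log(1 - x))
    excess = omega * x * (1 - x)
    return ideal + excess


def d_gibbs_dx(x, temperature, omega):
    """Derivative of gibbs_mixing with respect to composition x."""
    return R * temperature * math.log(x / (1 - x)) + omega * (1 - 2 * x)


def binodal(temperature, omega, tol=1e-12):
    """Return the two coexisting compositions (x_low, x_high).

    Above the critical temperature omega / (2 R) the solution is a single
    phase, and (0.5, 0.5) is returned to signal that there is no gap.
    """
    if omega <= 2 * R * temperature:
        return 0.5, 0.5
    # On (0, 0.5) the derivative goes from negative to positive exactly once,
    # at the energy minimum, so bisection on its sign is enough.
    lo, hi = 1e-12, 0.5 - 1e-6
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if d_gibbs_dx(mid, temperature, omega) < 0:
            lo = mid
        else:
            hi = mid
    x_low = 0.5 * (lo + hi)
    return x_low, 1 - x_low  # the model is symmetric about x = 0.5


# Illustrative numbers only: omega = 20 kJ/mol gives a critical point near 1203 K.
x_low, x_high = binodal(temperature=1000.0, omega=20000.0)
single_phase = binodal(temperature=1300.0, omega=20000.0)
```

A real CALPHAD calculation handles many components and phases at once, with model parameters read from a thermodynamic database; this sketch only shows the shape of the problem for one phase and one parameter.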
MatGL (Materials Graph Library) is a graph deep learning library for materials science. Mathematical graphs are a natural representation for a collection of atoms. Graph deep learning models have been shown to consistently deliver exceptional performance as surrogate models for the prediction of materials properties.
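As a rough illustration of why graphs suit collections of atoms, the sketch below builds one in plain Python: atoms become nodes and any pair closer than a cutoff radius becomes an edge labelled with its distance. This is not MatGL's API. The function name `build_atom_graph`, the cutoff value and the approximate water geometry are assumptions made for the example, and periodic boundary conditions, which a crystal graph would need, are ignored.

```python
import math
from itertools import combinations


def build_atom_graph(species, coords, cutoff):
    """Return (nodes, edges) for a finite set of atoms.

    nodes: list of element symbols, one per atom.
    edges: list of (i, j, distance) for every pair with distance <= cutoff.
    """
    nodes = list(species)
    edges = []
    for i, j in combinations(range(len(coords)), 2):
        distance = math.dist(coords[i], coords[j])
        if distance <= cutoff:
            edges.append((i, j, distance))
    return nodes, edges


# Approximate water molecule, coordinates in angstroms (illustrative values).
species = ["O", "H", "H"]
coords = [(0.0, 0.0, 0.0), (0.9572, 0.0, 0.0), (-0.2400, 0.9266, 0.0)]

nodes, edges = build_atom_graph(species, coords, cutoff=1.2)
bonded_pairs = [(i, j) for i, j, _ in edges]
```

With a 1.2 angstrom cutoff only the two O-H pairs are connected, since the H-H distance is about 1.5 angstroms. A graph neural network would then pass messages along these edges, typically using the distances as edge features.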
materialsproject/pymatgen: Python Materials Genomics (pymatgen) is a robust materials analysis code that defines classes for structures and molecules with support for many electronic structure codes. It powers the Materials Project. (Ong et al. 2013)
maml (MAterials Machine Learning) is a Python package that aims to provide useful high-level interfaces that make ML for materials science as easy as possible.
The goal of maml is not to duplicate functionality already available in other packages. maml relies on well-established packages such as scikit-learn and tensorflow for implementations of ML algorithms, as well as other materials science packages such as pymatgen and matminer for crystal/molecule manipulation and feature generation.
Harnessing the power of supercomputing and state-of-the-art methods, the Materials Project provides open web-based access to computed information on known and predicted materials as well as powerful analysis tools to inspire and design novel materials.
DeePMD-kit is a package written in Python/C++, designed to minimize the effort required to build deep learning-based models of interatomic potential energy and force fields and to perform molecular dynamics (MD). This offers new hope for addressing the accuracy-versus-efficiency dilemma in molecular simulations. Applications of DeePMD-kit span from finite molecules to extended systems and from metallic systems to chemically bonded systems.
The aim of aviary is to contain multiple models for materials discovery under a common interface. Over time we hope to add more models, with a particular focus on coordinate-free deep learning models.
The goal of ChemGymRL is to simulate enough complexity of real-world chemistry experiments to allow meaningful exploration of algorithms for learning policies to control bench-specific agents, while keeping it simple enough that episodes can be rapidly generated during the RL algorithm development process. The environment supports the training of RL agents by associating positive and negative rewards based on the procedure and outcomes of actions taken by the agents. The aim is for ChemGymRL to help bridge the gap between autonomous laboratories and digital chemistry. This will have impacts for producing new materials, chemicals, and drugs. It will also require many technologies, including search, feedback and control, optimization, and artificial intelligence algorithms that can deal with the unique challenges of material design. This simulation environment encapsulates some of those challenges while maintaining as much realism as possible, and extensibility to allow open-source improvement of the simulations going forward. The framework raises interesting computational and modeling challenges for the Reinforcement Learning paradigm that are not always all present in other frameworks, such as costs of observation, observations at various levels of detail, and hierarchical planning.
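The reward structure and the "costs of observation" idea can be sketched with a toy environment in plain Python. This is not ChemGymRL code or its API: the class `ToyBenchEnv`, its actions and all of its parameters are hypothetical. The agent doses a vessel blindly, must pay a small penalty to look at the current amount, and receives a positive or negative reward depending on how close it is to the target when it stops.

```python
class ToyBenchEnv:
    """Hypothetical bench: dose a vessel until it holds a target amount."""

    ACTIONS = ("add", "observe", "stop")

    def __init__(self, target=0.5, tolerance=0.05, dose=0.1,
                 observe_cost=0.05, max_steps=20):
        self.target = target
        self.tolerance = tolerance
        self.dose = dose
        self.observe_cost = observe_cost
        self.max_steps = max_steps
        self.reset()

    def reset(self):
        self.amount = 0.0
        self.steps = 0
        return None  # nothing is observed until the agent pays to look

    def step(self, action):
        """Apply one action and return (observation, reward, done)."""
        if action not in self.ACTIONS:
            raise ValueError(f"unknown action: {action!r}")
        self.steps += 1
        observation, reward, done = None, 0.0, False
        if action == "add":
            self.amount += self.dose
        elif action == "observe":
            observation = self.amount
            reward = -self.observe_cost  # looking is informative but not free
        else:  # "stop": the outcome is judged against the target
            done = True
            on_target = abs(self.amount - self.target) <= self.tolerance
            reward = 1.0 if on_target else -1.0
        if not done and self.steps >= self.max_steps:
            done = True
            reward -= 1.0  # running out of time counts as a failed episode
        return observation, reward, done


# A fixed, hand-written policy: five doses, one paid observation, then stop.
env = ToyBenchEnv()
env.reset()
total_reward = 0.0
for action in ["add"] * 5 + ["observe", "stop"]:
    observation, reward, done = env.step(action)
    total_reward += reward
```

An RL algorithm would replace the hand-written action list with a learned policy, and would have to weigh the observation penalty against the risk of stopping at the wrong amount.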
IntelLabs/matsciml: Open MatSci ML Toolkit is a single framework for prototyping and scaling out deep learning models for materials discovery, built on top of OpenCatalyst, PyTorch Lightning, and the Deep Graph Library. (Miret et al. 2022)
The Open Catalyst Project is a collaborative research effort between Fundamental AI Research (FAIR) at Meta AI and Carnegie Mellon University’s (CMU) Department of Chemical Engineering. The aim is to use AI to model and discover new catalysts for use in renewable energy storage to help in addressing climate change…
To enable the broader research community to participate in this important project, we have released the Open Catalyst 2020 (OC20) and 2022 (OC22) datasets for training ML models. Together, these datasets contain 1.3 million molecular relaxations with results from over 260 million DFT calculations. In addition to the data, baseline models and code are open-sourced on our GitHub page. View the leaderboard to see the latest results and to submit your own to the evaluation server! Join the discussion forum to talk with the community and ask any questions.