Materials informatics
Machine learning in condensed matter physics, chemistry and materials science
August 1, 2023 — October 29, 2024
A placeholder linkdump on machine learning in condensed matter physics and materials science, a rapidly growing field. See also learnable coarse-graining and machine learning in physical sciences.
1 Master lists
-
This collection lists online and offline resources on the physical, chemical, mechanical and other properties of materials.
The main motivation of this collection is to help students and enthusiasts who need data to practise machine learning techniques. It is also expected to assist researchers in the materials informatics field.
In the first three sections below, you can find publicly accessible databases and dataset-sharing platforms, along with information on books and handbooks that include materials data.
Additionally, in the last section, there are a couple of toy datasets shared by researchers for teaching machine learning techniques in materials science.
The cited-by list for Butler et al. (2018)
2 Please help me sort out all these projects
lukasturcani/stk: A Python library which allows construction and manipulation of complex molecules, as well as automatic molecular design and the creation of molecular databases. / documentation
pycalphad/pycalphad: CALPHAD tools for designing thermodynamic models, calculating phase diagrams and investigating phase equilibria. / pycalphad documentation (Otis and Liu 2017)
The pycalphad software package is a free and open-source Python library for designing thermodynamic models, calculating phase diagrams and investigating phase equilibria using the CALPHAD method. It provides routines for reading thermodynamic databases and solving the multi-component, multi-phase Gibbs energy minimisation problem. The pycalphad software project advances the state of thermodynamic modelling by providing a flexible yet powerful interface for manipulating CALPHAD data and models. The key feature of the software is that the thermodynamic models of individual phases and their associated databases can be programmatically manipulated and overridden at run-time without modifying any internal solver or calculation code. Because the models are internally decoupled from the equilibrium solver and the models themselves are represented symbolically, pycalphad is an ideal tool for CALPHAD database development and model prototyping.
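To make the idea of Gibbs energy minimisation concrete, here is a minimal standard-library sketch. It does not use pycalphad or its API. It assumes a single binary regular-solution phase with a hypothetical interaction parameter `omega` and finds the composition with the lowest mixing energy by brute-force grid search. For this symmetric model the two minima coincide with the common-tangent compositions that bound the miscibility gap; a real CALPHAD solver handles many components and phases under mass-balance constraints.

```python
import math

R = 8.314462618  # molar gas constant, J/(mol K)


def gibbs_mixing(x, temperature, omega):
    """Molar Gibbs energy of mixing of a binary regular solution (J/mol).

    x is the mole fraction of component B; omega is the (assumed)
    regular-solution interaction parameter in J/mol.
    """
    ideal = R * temperature * (x * math.log(x) + (1 - x) * math.log(1 - x))
    excess = omega * x * (1 - x)
    return ideal + excess


def minimise_on_grid(temperature, omega, n=9999):
    """Return the grid composition in (0, 1) with the lowest mixing energy."""
    grid = [(i + 1) / (n + 1) for i in range(n)]
    return min(grid, key=lambda x: gibbs_mixing(x, temperature, omega))


T = 1000.0
# Ideal solution (omega = 0): a single minimum at the equimolar composition.
print(minimise_on_grid(T, omega=0.0))
# omega > 2RT: the equimolar mixture is unstable and the minimum moves off x = 0.5.
print(minimise_on_grid(T, omega=3 * R * T))
```

Grid search is chosen here only because it is easy to read; it would not scale beyond one or two composition variables.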
pycroscopy/pycroscopy: Scientific analysis of nanoscale materials imaging data / pycroscopy documentation (Somnath et al. 2019)
materialsvirtuallab/matgl: Graph deep learning library for materials / Home | MatGL
MatGL (Materials Graph Library) is a graph deep learning library for materials science. Mathematical graphs are a natural representation for a collection of atoms. Graph deep learning models have been shown to consistently deliver exceptional performance as surrogate models for the prediction of materials properties.
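The claim that graphs are a natural representation for a collection of atoms can be sketched in a few lines. The following is not MatGL code: `build_graph` is a hypothetical helper, and the distance cutoff and the approximate water geometry are assumptions chosen for illustration. Atoms become nodes, and any pair closer than the cutoff becomes an edge labelled with its distance. Periodic boundary conditions, which matter for crystals, are ignored.

```python
import math
from itertools import combinations


def build_graph(symbols, positions, cutoff):
    """Return (nodes, edges) linking atoms no farther apart than `cutoff`.

    Each edge is a tuple (i, j, distance) with i < j. Units follow `positions`.
    """
    nodes = list(symbols)
    edges = []
    for i, j in combinations(range(len(positions)), 2):
        distance = math.dist(positions[i], positions[j])
        if distance <= cutoff:
            edges.append((i, j, distance))
    return nodes, edges


# Water molecule, coordinates in angstroms (approximate geometry).
symbols = ["O", "H", "H"]
positions = [(0.000, 0.000, 0.000), (0.757, 0.586, 0.000), (-0.757, 0.586, 0.000)]

nodes, edges = build_graph(symbols, positions, cutoff=1.2)
for i, j, d in edges:
    print(f"{nodes[i]}{i} - {nodes[j]}{j}: {d:.3f} angstrom")
```

With this cutoff the two O-H pairs are connected and the H-H pair is not, so the graph matches the usual bonding picture. Graph neural networks then pass messages along such edges.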
hackingmaterials/matminer: Data mining for materials science / matminer (Materials Data Mining) documentation (Ward et al. 2018)
DeepChem / deepchem/deepchem: Democratising Deep-Learning for Drug Discovery, Quantum Chemistry, Materials Science and Biology (Ramsundar et al. 2019)
materialsproject/pymatgen: Python Materials Genomics (pymatgen) is a robust materials analysis code that defines classes for structures and molecules with support for many electronic structure codes. It powers the Materials Project. (Ong et al. 2013)
materialsvirtuallab/maml: Python for Materials Machine Learning, Materials Descriptors, Machine Learning Force Fields, Deep Learning, etc. / maml documentation
maml (MAterials Machine Learning) is a Python package that aims to provide useful high-level interfaces that make ML for materials science as easy as possible.
The goal of maml is not to duplicate functionality already available in other packages. maml relies on well-established packages such as scikit-learn and tensorflow for implementations of ML algorithms, as well as other materials science packages such as pymatgen and matminer for crystal/molecule manipulation and feature generation.
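As a rough picture of what feature generation from a composition means, here is a standard-library sketch. It does not call maml, matminer or pymatgen. The helpers `parse_formula` and `featurize` and the small `ATOMIC_MASS` table are hypothetical names introduced for illustration; real featurizers cover the whole periodic table and many more descriptors.

```python
import re

# Standard atomic masses (u) for a handful of elements; extend as needed.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "O": 15.999, "Ti": 47.867, "Fe": 55.845}


def parse_formula(formula):
    """Parse a simple formula such as 'Fe2O3' into {element: amount}.

    Parentheses, charges and hydrates are not handled in this sketch.
    """
    counts = {}
    for element, amount in re.findall(r"([A-Z][a-z]?)(\d+(?:\.\d+)?)?", formula):
        counts[element] = counts.get(element, 0.0) + (float(amount) if amount else 1.0)
    return counts


def featurize(formula):
    """Return atomic fractions and the fraction-weighted mean atomic mass."""
    counts = parse_formula(formula)
    total = sum(counts.values())
    fractions = {element: n / total for element, n in counts.items()}
    mean_mass = sum(frac * ATOMIC_MASS[element] for element, frac in fractions.items())
    return fractions, mean_mass


fractions, mean_mass = featurize("Fe2O3")
print(fractions, round(mean_mass, 3))
```

The resulting fixed-length numbers are what a scikit-learn style regressor would consume as one row of its input matrix.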
Materials Project
Harnessing the power of supercomputing and state-of-the-art methods, the Materials Project provides open web-based access to computed information on known and predicted materials as well as powerful analysis tools to inspire and design novel materials.
DeePMD-kit
DeePMD-kit is a package written in Python/C++, designed to minimize the effort required to build deep learning-based models of interatomic potential energy and force fields and to perform molecular dynamics (MD). This brings new hope for addressing the accuracy-versus-efficiency dilemma in molecular simulations. Applications of DeePMD-kit range from finite molecules to extended systems and from metallic systems to chemically bonded systems.
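To show where an interatomic potential sits inside a molecular dynamics loop, here is a small standard-library sketch. It is not DeePMD-kit code. A classical Lennard-Jones pair potential in reduced units stands in for the learned model, and the function names are hypothetical. In a machine-learned force field, `lj_energy_forces` would be replaced by a neural network that returns the same two quantities.

```python
import math


def lj_energy_forces(positions, epsilon=1.0, sigma=1.0):
    """Total Lennard-Jones energy and per-atom forces (reduced units)."""
    n = len(positions)
    energy = 0.0
    forces = [[0.0, 0.0, 0.0] for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            rij = [positions[i][k] - positions[j][k] for k in range(3)]
            r = math.sqrt(sum(c * c for c in rij))
            sr6 = (sigma / r) ** 6
            energy += 4.0 * epsilon * (sr6 * sr6 - sr6)
            # -dU/dr divided by r, so it can scale the displacement vector.
            f_over_r = 24.0 * epsilon * (2.0 * sr6 * sr6 - sr6) / (r * r)
            for k in range(3):
                forces[i][k] += f_over_r * rij[k]
                forces[j][k] -= f_over_r * rij[k]
    return energy, forces


def velocity_verlet_step(positions, velocities, dt, mass=1.0):
    """Advance positions and velocities by one velocity Verlet time step."""
    _, forces = lj_energy_forces(positions)
    half_v = [[v[k] + 0.5 * dt * f[k] / mass for k in range(3)]
              for v, f in zip(velocities, forces)]
    new_positions = [[p[k] + dt * hv[k] for k in range(3)]
                     for p, hv in zip(positions, half_v)]
    _, new_forces = lj_energy_forces(new_positions)
    new_velocities = [[hv[k] + 0.5 * dt * f[k] / mass for k in range(3)]
                      for hv, f in zip(half_v, new_forces)]
    return new_positions, new_velocities


# Two atoms released from rest, slightly stretched beyond the energy minimum.
positions = [[0.0, 0.0, 0.0], [1.2, 0.0, 0.0]]
velocities = [[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
for _ in range(1000):
    positions, velocities = velocity_verlet_step(positions, velocities, dt=0.001)

potential, _ = lj_energy_forces(positions)
kinetic = 0.5 * sum(c * c for v in velocities for c in v)  # unit mass
print(potential + kinetic)  # should stay close to the initial potential energy
```

The integrator only ever asks the potential for energies and forces, which is why a learned model can be swapped in without changing the dynamics code.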
sparks-baird/mat_discover: A materials discovery algorithm geared towards exploring high-performance candidates in new chemical spaces. / mat_discover documentation (Baird, Diep, and Sparks 2022)
aviary
The aim of aviary is to contain multiple models for materials discovery under a common interface. Over time we hope to add more models, with a particular focus on coordinate-free deep learning models.
chemgymrl.com / ChemGymRL: Mark Crowley | Automated Materials Design and Discovery Using Reinforcement Learning
The goal of ChemGymRL is to simulate enough complexity of real-world chemistry experiments to allow meaningful exploration of algorithms for learning policies to control bench-specific agents, while keeping it simple enough that episodes can be rapidly generated during the RL algorithm development process. The environment supports the training of RL agents by associating positive and negative rewards based on the procedure and outcomes of actions taken by the agents. The aim is for ChemGymRL to help bridge the gap between autonomous laboratories and digital chemistry. This will have impacts for producing new materials, chemicals, and drugs. It will also require many technologies, including search, feedback and control, optimisation, and artificial intelligence algorithms that can deal with the unique challenges of material design. This simulation environment encapsulates some of those challenges while maintaining as much realism as possible, and extensibility to allow open-source improvement of the simulations going forward. The framework raises interesting computational and modelling challenges for the Reinforcement Learning paradigm that are not always present in other frameworks, such as costs of observation, observations at various levels of detail, and hierarchical planning.
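The reward structure described above, with rewards tied to both the procedure and the outcome, can be sketched with a toy environment. This is not ChemGymRL's API. `ToyTitrationEnv`, its doses, its reward values and the `scripted_policy` baseline are all hypothetical, and the `reset`/`step` interface simply follows the common RL convention.

```python
class ToyTitrationEnv:
    """Hypothetical bench: add titrant in discrete doses to hit a target volume."""

    ACTIONS = (0.1, 1.0, None)  # 0: small dose (mL), 1: large dose (mL), 2: stop

    def __init__(self, target=7.3, tolerance=0.25, max_steps=50):
        self.target = target
        self.tolerance = tolerance
        self.max_steps = max_steps
        self.reset()

    def reset(self):
        """Start a new episode and return the initial observation."""
        self.added = 0.0
        self.steps = 0
        return self.added

    def step(self, action):
        """Apply an action index; return (observation, reward, done)."""
        dose = self.ACTIONS[action]
        self.steps += 1
        if dose is None or self.steps >= self.max_steps:
            # Outcome-based reward, granted once when the episode ends.
            # (A dose requested on the final allowed step is not added.)
            hit = abs(self.added - self.target) <= self.tolerance
            return self.added, (10.0 if hit else -10.0), True
        self.added += dose
        # Procedure-based penalty: every dose costs a little bench time.
        return self.added, -0.1, False


def scripted_policy(env, observation):
    """Hand-written baseline: large doses first, then small ones, then stop."""
    if observation + 1.0 <= env.target:
        return 1
    if observation + 0.1 <= env.target:
        return 0
    return 2


env = ToyTitrationEnv()
observation, total_reward, done = env.reset(), 0.0, False
while not done:
    observation, reward, done = env.step(scripted_policy(env, observation))
    total_reward += reward
print(observation, total_reward)
```

An RL agent would replace `scripted_policy` and would have to discover the same trade-off between the per-dose cost and the terminal reward.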
IntelLabs/matsciml: Open MatSci ML Toolkit is a single framework for prototyping and scaling out deep learning models for materials discovery, built on top of OpenCatalyst, PyTorch Lightning, and the Deep Graph Library. (Miret et al. 2022)
Open Catalyst Project (Chanussot et al. 2021; Tran et al. 2023; Zitnick et al. 2020)
The Open Catalyst Project is a collaborative research effort between Fundamental AI Research (FAIR) at Meta AI and Carnegie Mellon University’s (CMU) Department of Chemical Engineering. The aim is to use AI to model and discover new catalysts for use in renewable energy storage to help in addressing climate change…
To enable the broader research community to participate in this important project, we have released the Open Catalyst 2020 (OC20) and 2022 (OC22) datasets for training ML models. These datasets altogether contain 1.3 million molecular relaxations with results from over 260 million DFT calculations. In addition to the data, baseline models and code are open-sourced on our GitHub page. View the leaderboard to see the latest results and to submit your own to the evaluation server! Join the discussion forum to talk with the community and ask any questions.
C2DB etc (Gjerding et al. 2021; Haastrup et al. 2018)
The Computational 2D Materials Database (C2DB) is a highly curated open database organising a wealth of computed properties for more than 4000 atomically thin two-dimensional (2D) materials.