rMD17
A dataset comprising one molecular dynamics (MD) trajectory for each of 10 molecules. The original structures are taken from Chmiela et al.; the energy and force labels were recalculated by Christensen and von Lilienfeld using “the PBE/def2-SVP level of theory [with] very tight SCF convergence and [a] very dense DFT integration grid”. The trajectories are stored one after another, and the structures within each trajectory are in chronological order.
>>> from load_atoms import load_dataset
>>> load_dataset("rMD17")
rMD17:
    structures: 999,988
    atoms: 15,599,712
    species:
        H: 44.23%
        C: 43.59%
        O: 8.97%
        N: 3.21%
    properties:
        per atom: (forces)
        per structure: (energy, name)
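Because the trajectories are stored one after another, each molecule should occupy a single contiguous run of indices in the loaded dataset. The sketch below shows one way to recover those runs from the per-structure `name` values. `contiguous_runs` is a hypothetical helper, not part of load_atoms, and it assumes only that structures sharing a name are adjacent.

```python
from typing import Dict, Iterable, Tuple


def contiguous_runs(names: Iterable[str]) -> Dict[str, Tuple[int, int]]:
    """Map each name to the half-open index range [start, stop) it occupies.

    Assumes all structures sharing a name are adjacent, as the dataset
    description states; raises ValueError if a name reappears later.
    """
    runs: Dict[str, Tuple[int, int]] = {}
    previous = None
    for index, name in enumerate(names):
        if name != previous:
            # A new run starts here; the name must not have been seen before.
            if name in runs:
                raise ValueError(f"{name!r} is not contiguous")
            runs[name] = (index, index + 1)
            previous = name
        else:
            # Same run as the previous structure: extend its stop index.
            start, _ = runs[name]
            runs[name] = (start, index + 1)
    return runs


# Toy stand-in for the `name` values of a loaded dataset.
toy_names = ["aspirin"] * 3 + ["benzene"] * 2
print(contiguous_runs(toy_names))
```

With the real dataset, the names could be gathered with something like `[s.info["name"] for s in dataset]`. Assuming the dataset object supports slicing, `dataset[start:stop]` would then select one trajectory in chronological order.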
Citation
If you use this dataset in your work, please cite the following:
@article{Christensen-20-10,
    title = {
        On the Role of Gradients for Machine
        Learning of Molecular Energies and Forces
    },
    author = {Christensen, Anders S. and von Lilienfeld, O. Anatole},
    year = {2020},
    journal = {Machine Learning: Science and Technology},
    volume = {1},
    number = {4},
    pages = {045018},
    doi = {10.1088/2632-2153/abba6f},
}
@article{Chmiela-17-05,
    title = {
        Machine Learning of Accurate Energy-Conserving Molecular Force Fields
    },
    author = {
        Chmiela, Stefan and Tkatchenko, Alexandre
        and Sauceda, Huziel E. and Poltavsky, Igor
        and Sch{\"u}tt, Kristof T. and M{\"u}ller, Klaus-Robert
    },
    year = {2017},
    journal = {Science Advances},
    volume = {3},
    number = {5},
    pages = {e1603015},
    doi = {10.1126/sciadv.1603015},
}
Properties
Per-atom:
Property | Units | Type | Description |
---|---|---|---|
forces | eV/Å | | forces |
Per-structure:
Property | Units | Type | Description |
---|---|---|---|
energy | eV | | energy |
name | str | | name of the molecule |
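Following ASE conventions, per-structure properties live in `Atoms.info` and per-atom properties in `Atoms.arrays`, which is also where the importer script for this dataset writes them. The following is a minimal sketch of reading them back. `summarise` is a hypothetical helper, and the toy object merely stands in for an `ase.Atoms` instance so that the snippet runs without downloading anything.

```python
import math
from types import SimpleNamespace


def summarise(structure):
    """Collect the documented rMD17 properties from an Atoms-like object.

    Only `.info` (per-structure) and `.arrays` (per-atom) are accessed.
    """
    forces = structure.arrays["forces"]  # one (fx, fy, fz) row per atom, eV/Å
    # Euclidean norm of the force on each atom.
    norms = [math.sqrt(sum(float(c) ** 2 for c in row)) for row in forces]
    return {
        "name": structure.info["name"],
        "energy_eV": float(structure.info["energy"]),
        "max_force_eV_per_A": max(norms),
    }


# Toy stand-in for an ase.Atoms object; the values are made up.
toy = SimpleNamespace(
    info={"name": "ethanol", "energy": -1.5},
    arrays={"forces": [[3.0, 4.0, 0.0], [0.0, 0.0, 1.0]]},
)
print(summarise(toy))
```

The same function should work unchanged on a structure taken from the loaded dataset, since it relies only on the `info` and `arrays` mappings.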
Miscellaneous information
rMD17 is imported as an InMemoryAtomsDataset:
Importer script for rMD17
from __future__ import annotations

from pathlib import Path
from typing import Iterator

import numpy as np
from ase import Atoms
from ase.units import eV, kcal, mol

from load_atoms.database.backend import BaseImporter, unzip_file
from load_atoms.database.internet import FileDownload
from load_atoms.progress import Progress


class Importer(BaseImporter):
    @classmethod
    def files_to_download(cls) -> list[FileDownload]:
        return [
            FileDownload(
                url="https://figshare.com/ndownloader/files/23950376",
                expected_hash="cddeea2ec2c4",
                local_name="rmd17.tar.bz2",
            )
        ]

    @classmethod
    def get_structures(
        cls, tmp_dir: Path, progress: Progress
    ) -> Iterator[Atoms]:
        # Unzip the file
        contents_path = (
            unzip_file(tmp_dir / "rmd17.tar.bz2", progress) / "rmd17/npz_data"
        )

        # Process each npz archive
        structure_names = "aspirin benzene ethanol malonaldehyde naphthalene paracetamol salicylic toluene uracil azobenzene".split()  # noqa: E501
        assert len(structure_names) == 10

        for structure_name in structure_names:
            archive_path = contents_path / f"rmd17_{structure_name}.npz"
            archive = np.load(archive_path)

            Z = archive["nuclear_charges"]
            coords = archive["coords"]
            energy = archive["energies"]
            forces = archive["forces"]

            for idx in np.argsort(archive["old_indices"]):
                structure = Atoms(numbers=Z, positions=coords[idx])
                structure.info["name"] = structure_name
                structure.info["energy"] = energy[idx] / (kcal / mol) * eV
                structure.arrays["forces"] = forces[idx] / (kcal / mol) * eV
                yield structure
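The loop over `np.argsort(archive["old_indices"])` is the step that yields each trajectory in chronological order: it visits the stored frames in ascending order of `old_indices`. The sketch below reproduces that step on a toy array. It assumes, as the importer appears to, that `old_indices` records each frame's position in the original trajectory; the array values are invented.

```python
import numpy as np

# Toy archive: three frames stored out of order.
old_indices = np.array([20, 5, 11])      # assumed position in the original trajectory
energies = np.array([-3.0, -1.0, -2.0])  # one label per stored frame

# argsort returns the storage indices that sort old_indices ascending.
order = np.argsort(old_indices)

# Visiting frames in this order recovers the chronological sequence.
chronological_energies = energies[order]
print(order.tolist(), chronological_energies.tolist())
```

Indexing every per-frame array with the same `order` keeps coordinates, energies, and forces aligned, which is what the importer achieves by using `idx` for all three.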
DatabaseEntry for rMD17
name: rMD17
year: 2020
description: |
    A dataset composed of a single MD trajectory for each of 10 molecules.
    Original structures are taken from Chmiela et al., with energy and force
    labels recalculated by Christensen and von Lilienfeld using "the PBE/def2-SVP
    level of theory [with] very tight SCF convergence and [a] very dense DFT
    integration grid". The MD trajectories are presented one at a time, with
    structures within each trajectory in chronological order.
category: Benchmarks
minimum_load_atoms_version: 0.3
per_structure_properties:
    energy:
        desc: energy
        units: eV
    name:
        desc: name of the molecule
        units: str
per_atom_properties:
    forces:
        desc: forces
        units: eV/Å
representative_structure: 0
citation: |
    @article{Christensen-20-10,
        title = {
            On the Role of Gradients for Machine
            Learning of Molecular Energies and Forces
        },
        author = {Christensen, Anders S. and von Lilienfeld, O. Anatole},
        year = {2020},
        journal = {Machine Learning: Science and Technology},
        volume = {1},
        number = {4},
        pages = {045018},
        doi = {10.1088/2632-2153/abba6f},
    }
    @article{Chmiela-17-05,
        title = {
            Machine Learning of Accurate Energy-Conserving Molecular Force Fields
        },
        author = {
            Chmiela, Stefan and Tkatchenko, Alexandre
            and Sauceda, Huziel E. and Poltavsky, Igor
            and Sch{\"u}tt, Kristof T. and M{\"u}ller, Klaus-Robert
        },
        year = {2017},
        journal = {Science Advances},
        volume = {3},
        number = {5},
        pages = {e1603015},
        doi = {10.1126/sciadv.1603015},
    }
license: CC0
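The `minimum_load_atoms_version` field implies a version comparison before the entry is used. How load_atoms performs that check is not shown on this page; the following is only a sketch of one way to do it, with `parse_version` and `meets_minimum` as hypothetical helpers that compare dotted numeric versions.

```python
import re
from typing import Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn '0.3' or '0.11.2' into a tuple of ints, ignoring any suffix."""
    match = re.match(r"\d+(?:\.\d+)*", version)
    if match is None:
        raise ValueError(f"unrecognised version string: {version!r}")
    return tuple(int(part) for part in match.group().split("."))


def meets_minimum(installed: str, minimum: str) -> bool:
    """True if `installed` is at least `minimum`, compared numerically."""
    a, b = parse_version(installed), parse_version(minimum)
    # Pad the shorter tuple with zeros so that '0.3' equals '0.3.0'.
    width = max(len(a), len(b))
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


print(meets_minimum("0.3.9", "0.3"), meets_minimum("0.2.14", "0.3"))
```

Comparing integer tuples rather than strings avoids ranking `0.11` below `0.3`. The installed version could be looked up with `importlib.metadata.version("load-atoms")`, assuming that is the name under which the package is installed.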