C-SYNTH-23M
The complete “synthetic” dataset of carbon structures from “Synthetic Data Enable Experiments in Atomistic Machine Learning”. This dataset comprises 546 uncorrelated MD trajectories, each containing 200 atoms, driven by the C-GAP-17 interatomic potential and sampled every 1 ps. The structures cover a wide range of densities, temperatures and degrees of order and disorder.
>>> from load_atoms import load_dataset
>>> load_dataset("C-SYNTH-23M")
C-SYNTH-23M:
    structures: 115,206
    atoms: 23,041,200
    species:
        C: 100.00%
    properties:
        per atom: (forces, local_energies)
        per structure: (anneal_T, density, energy, run_id, time)
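The headline counts in this summary are mutually consistent, which a short calculation makes explicit. The sketch below uses only the figures quoted in this section; the variable names are illustrative and not part of load_atoms, and the per-trajectory count assumes that all trajectories have the same length.

```python
# Figures quoted in the dataset summary.
n_trajectories = 546
atoms_per_structure = 200
n_structures = 115_206
n_atoms = 23_041_200

# Every structure contains the same number of atoms, so the totals must agree.
assert n_structures * atoms_per_structure == n_atoms

# Snapshots per trajectory, assuming all trajectories are equally long.
snapshots_per_trajectory, remainder = divmod(n_structures, n_trajectories)
print(snapshots_per_trajectory, remainder)  # 211 0
```

The zero remainder is what suggests an equal number of snapshots per trajectory; the dataset description itself does not state this figure.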
License
This dataset is licensed under the MIT license.
Citation
If you use this dataset in your work, please cite the following:
@article{Gardner-23-03,
    title = {
        Synthetic Data Enable Experiments in Atomistic Machine Learning
    },
    author = {
        Gardner, John L. A. and Beaulieu, Zo{\'e} Faure
        and Deringer, Volker L.
    },
    year = {2023},
    journal = {Digital Discovery},
    doi = {10.1039/D2DD00137C},
}
Properties
Per-atom:
Property | Units | Type | Description |
---|---|---|---|
forces | eV/Å | | force vectors (C-GAP-17) |
local_energies | eV | | local energies (C-GAP-17) |
Per-structure:
Property | Units | Type | Description |
---|---|---|---|
energy | eV | | total energy of the structure (C-GAP-17) |
anneal_T | K | | annealing temperature |
density | g cm\({}^{-3}\) | | density of the structure |
run_id | | | unique identifier for the trajectory |
time | ps | | timestep of the structure in the trajectory |
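The `density` property relates the atom count to the cell volume. As a rough illustration (not taken from the dataset's own tooling), the sketch below converts between cell volume in Å³ and mass density in g cm⁻³ for a pure-carbon cell. The function names and the 1000 Å³ example volume are hypothetical; the constants are the standard atomic weight of carbon and the atomic mass unit.

```python
CARBON_MASS_U = 12.011          # standard atomic weight of carbon, in u
U_IN_GRAMS = 1.66053906660e-24  # one atomic mass unit, in grams
A3_IN_CM3 = 1e-24               # one cubic angstrom, in cubic centimetres


def carbon_density(n_atoms: int, volume_A3: float) -> float:
    """Mass density (g cm^-3) of n_atoms carbon atoms in volume_A3 (A^3)."""
    mass_g = n_atoms * CARBON_MASS_U * U_IN_GRAMS
    return mass_g / (volume_A3 * A3_IN_CM3)


def cell_volume(n_atoms: int, density: float) -> float:
    """Inverse of carbon_density: cell volume (A^3) for a target density."""
    return n_atoms * CARBON_MASS_U * U_IN_GRAMS / (density * A3_IN_CM3)


# 200 carbon atoms (as in each structure) in a hypothetical 1000 A^3 cell.
rho = carbon_density(200, 1000.0)
print(round(rho, 2))  # 3.99
```

Because every structure contains 200 atoms, the density and the cell volume carry the same information, so either can be recovered from the other with helpers like these.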
Miscellaneous information
C-SYNTH-23M is imported as an LmdbAtomsDataset:
Importer script for C-SYNTH-23M
from __future__ import annotations

from pathlib import Path
from typing import Iterator

import ase.io
from ase import Atoms
from load_atoms.database.backend import BaseImporter, rename, unzip_file
from load_atoms.database.internet import FileDownload
from load_atoms.progress import Progress


class Importer(BaseImporter):
    @classmethod
    def files_to_download(cls) -> list[FileDownload]:
        return [
            FileDownload(
                url="https://zenodo.org/records/7704087/files/jla-gardner/carbon-data-v1.0.zip",
                expected_hash="b43fc702ef6d",
            )
        ]

    @classmethod
    def get_structures(
        cls, tmp_dir: Path, progress: Progress
    ) -> Iterator[Atoms]:
        # Unzip the file
        contents_path = unzip_file(tmp_dir / "carbon-data-v1.0.zip", progress)
        extxyz_files = sorted(contents_path.glob("**/*.extxyz"))
        task = progress.new_task(
            f"Processing {len(extxyz_files)} .extxyz files",
            total=len(extxyz_files),
        )

        # iterate through all .extxyz files
        for file_path in extxyz_files:
            structures = ase.io.read(file_path, index=":")
            assert isinstance(structures, list)
            for structure in structures:
                yield process_structure(structure)
            task.update(advance=1)


def process_structure(structure: Atoms) -> Atoms:
    structure = rename(
        structure,
        {
            "gap17_forces": "forces",
            "gap17_energy": "local_energies",
        },
    )
    structure.info["energy"] = structure.arrays["local_energies"].sum()
    return structure
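The final step of `process_structure` defines the per-structure `energy` as the sum of the per-atom `local_energies`. A minimal stand-in for that relationship, using plain dictionaries in place of `Atoms.arrays` and `Atoms.info` so that it runs without ASE, might look like the following. The helper name `rename_and_sum` and all numerical values are made up for illustration.

```python
from __future__ import annotations

import math


def rename_and_sum(arrays: dict, mapping: dict) -> tuple[dict, dict]:
    """Rename per-atom keys, then derive a total energy from local energies.

    `arrays` stands in for Atoms.arrays and the returned `info` dict stands
    in for Atoms.info. This mimics, but is not, the importer's real code path.
    """
    renamed = {mapping.get(key, key): value for key, value in arrays.items()}
    info = {"energy": math.fsum(renamed["local_energies"])}
    return renamed, info


# Made-up values for a three-atom toy structure.
raw = {
    "gap17_forces": [[0.0, 0.1, -0.1], [0.2, 0.0, 0.0], [-0.2, -0.1, 0.1]],
    "gap17_energy": [-7.5, -7.25, -7.0],
}
arrays, info = rename_and_sum(
    raw, {"gap17_forces": "forces", "gap17_energy": "local_energies"}
)
print(sorted(arrays), info["energy"])  # ['forces', 'local_energies'] -21.75
```

One consequence of this definition is that `energy` can always be recomputed from `local_energies`, which gives a cheap consistency check after loading.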
DatabaseEntry for C-SYNTH-23M
name: C-SYNTH-23M
year: 2022
description: |
    The complete "synthetic" dataset of carbon structures from `Synthetic Data Enable Experiments in Atomistic Machine Learning <https://doi.org/10.1039/D2DD00137C>`_.
    This dataset comprises 546 uncorrelated MD trajectories, each containing 200 atoms, driven by the `C-GAP-17 <https://doi.org/10.1103/PhysRevB.95.094203>`_ interatomic potential,
    and sampled every 1ps. The structures cover a wide range of densities, temperatures and degrees of dis/order.
category: Synthetic Data
license: MIT
minimum_load_atoms_version: 0.2
format: lmdb
citation: |
    @article{Gardner-23-03,
        title = {
            Synthetic Data Enable Experiments in Atomistic Machine Learning
        },
        author = {
            Gardner, John L. A. and Beaulieu, Zo{\'e} Faure
            and Deringer, Volker L.
        },
        year = {2023},
        journal = {Digital Discovery},
        doi = {10.1039/D2DD00137C},
    }
representative_structure: 199
per_atom_properties:
    forces:
        desc: force vectors (C-GAP-17)
        units: eV/Å
    local_energies:
        desc: local energies (C-GAP-17)
        units: eV
per_structure_properties:
    energy:
        desc: total energy of the structure (C-GAP-17)
        units: eV
    anneal_T:
        desc: annealing temperature
        units: K
    density:
        desc: density of the structure
        units: g cm\ :math:`{}^{-3}`
    run_id:
        desc: unique identifier for the trajectory
    time:
        desc: timestep of the structure in the trajectory
        units: ps
# TODO: remove after Dec 2024
# backwards compatibility: unused as of 0.3.0
files:
    - url: https://zenodo.org/records/7704087/files/jla-gardner/carbon-data-v1.0.zip
      hash: b43fc702ef6d
processing:
    - UnZip
    - ForEachFile:
          pattern: "**/*.extxyz"
          steps:
              - ReadASE
              - Rename:
                    gap17_forces: forces
                    gap17_energy: local_energies
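
Both the importer script and the legacy `processing` steps select files with the recursive pattern `**/*.extxyz`. The self-contained sketch below builds a throwaway directory tree with invented file names to show that this pattern matches `.extxyz` files at any depth, and that wrapping the result in `sorted` (as the importer does) gives a deterministic processing order.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Hypothetical layout: these file names are invented for the example.
    for relative in [
        "b/run-2.extxyz",
        "a/run-1.extxyz",
        "a/deep/run-3.extxyz",
        "README.md",
    ]:
        path = root / relative
        path.parent.mkdir(parents=True, exist_ok=True)
        path.touch()

    # Same pattern as the importer: recurse into every subdirectory.
    found = sorted(root.glob("**/*.extxyz"))
    relative_names = [p.relative_to(root).as_posix() for p in found]

print(relative_names)
```

Non-matching files such as `README.md` are skipped, and the nested file under `a/deep/` is picked up without any explicit directory walk.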