ANI-1ccx
>>> from load_atoms import load_dataset
>>> load_dataset("ANI-1ccx")
ANI-1ccx:
    structures: 489,571
    atoms: 6,763,288
    species:
        H: 45.52%
        C: 29.58%
        N: 15.30%
        O: 9.60%
    properties:
        per atom: (dft_forces)
        per structure: (1x_idx, cc_energy, dft_dipole, dft_energy)
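The species percentages in the summary add up to 100%, which suggests they are each element's share of all atoms in the dataset. A minimal sketch of that calculation follows, assuming this interpretation; the function name `species_fractions` and the two toy molecules are hypothetical and not part of load_atoms.

```python
from collections import Counter


def species_fractions(structures):
    """Return {symbol: percentage of all atoms} over a list of symbol lists."""
    counts = Counter(symbol for symbols in structures for symbol in symbols)
    total = sum(counts.values())
    return {symbol: 100 * n / total for symbol, n in counts.items()}


# two made-up molecules (methane and water), not taken from the dataset
toy = [["C", "H", "H", "H", "H"], ["O", "H", "H"]]
fractions = species_fractions(toy)
print(fractions)  # {'C': 12.5, 'H': 75.0, 'O': 12.5}
```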
License
This dataset is licensed under the CC0 license.
Citation
If you use this dataset in your work, please cite the following:
@article{Smith-19-07,
    title = {
        Approaching Coupled Cluster Accuracy with a
        General-Purpose Neural Network Potential
        through Transfer Learning
    },
    author = {
        Smith, Justin S. and Nebgen, Benjamin T. and Zubatyuk, Roman
        and Lubbers, Nicholas and Devereux, Christian and Barros, Kipton
        and Tretiak, Sergei and Isayev, Olexandr and Roitberg, Adrian E.
    },
    year = {2019},
    journal = {Nature Communications},
    volume = {10},
    number = {1},
    pages = {2903},
    doi = {10.1038/s41467-019-10827-4},
}
@article{Smith-20-05,
    title = {
        The ANI-1ccx and ANI-1x Data Sets, Coupled-Cluster
        and Density Functional Theory Properties for Molecules
    },
    author = {
        Smith, Justin S. and Zubatyuk, Roman and Nebgen, Benjamin and
        Lubbers, Nicholas and Barros, Kipton and Roitberg, Adrian E. and
        Isayev, Olexandr and Tretiak, Sergei
    },
    year = {2020},
    journal = {Scientific Data},
    volume = {7},
    number = {1},
    pages = {134},
    doi = {10.1038/s41597-020-0473-z},
}
@article{Smith-18-05,
    title = {
        Less Is More: Sampling Chemical Space with Active Learning
    },
    author = {
        Smith, Justin S. and Nebgen, Ben and Lubbers, Nicholas and
        Isayev, Olexandr and Roitberg, Adrian E.
    },
    year = {2018},
    journal = {The Journal of Chemical Physics},
    volume = {148},
    number = {24},
    doi = {10.1063/1.5023802},
}
Properties
Per-atom:
Property | Units | Type | Description |
---|---|---|---|
dft_forces | eV/Å | | force vectors (as labelled with \(\omega\)B97x/6-31G(d)) |
Per-structure:
Property | Units | Type | Description |
---|---|---|---|
cc_energy | eV | | energy of the structure (as labelled with CCSD(T)/CBS) |
dft_energy | eV | | energy of the structure (as labelled with \(\omega\)B97x/6-31G(d)) |
dft_dipole | e Å | | dipole moment of the structure (as labelled with \(\omega\)B97x/6-31G(d)) |
1x_idx | | | index of the structure in the ANI-1x dataset |
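Because every structure carries both a CCSD(T)/CBS and a DFT energy, one natural quantity to look at is the per-structure difference between the two levels of theory. The sketch below assumes each structure exposes its per-structure properties through an `info` mapping, as `ase.Atoms` objects do; the helper name `cc_corrections` and the toy values are hypothetical, and `SimpleNamespace` objects stand in for real structures so the example runs without downloading the dataset.

```python
from types import SimpleNamespace


def cc_corrections(structures):
    """Return cc_energy - dft_energy (in eV) for each structure."""
    return [s.info["cc_energy"] - s.info["dft_energy"] for s in structures]


# stand-ins for ase.Atoms objects, with made-up energies in eV
toy = [
    SimpleNamespace(info={"cc_energy": -10.5, "dft_energy": -10.0, "1x_idx": 3}),
    SimpleNamespace(info={"cc_energy": -20.0, "dft_energy": -20.25, "1x_idx": 7}),
]
corrections = cc_corrections(toy)
print(corrections)  # [-0.5, 0.25]
```

With the real dataset, the same function should accept any iterable of structures whose `info` contains the `cc_energy` and `dft_energy` keys listed above.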
Miscellaneous information
ANI-1ccx is imported as an LmdbAtomsDataset:
Importer script for ANI-1ccx
from __future__ import annotations

from pathlib import Path
from typing import Iterator

import h5py
import numpy as np
from ase import Atoms
from load_atoms.database.backend import BaseImporter, FileDownload
from load_atoms.progress import Progress

Ha_to_eV = 27.2114079527


class Importer(BaseImporter):
    @classmethod
    def permanent_download_dirname(cls) -> str | None:
        return "ANI"  # ensure same as ANI-1x to avoid re-downloading

    @classmethod
    def files_to_download(cls) -> list[FileDownload]:
        return [
            FileDownload(
                url="https://springernature.figshare.com/ndownloader/files/18112775",
                expected_hash="fe0ba06198ee",
                local_name="ani1x-release.h5",
            )
        ]

    @classmethod
    def get_structures(
        cls, tmp_dir: Path, progress: Progress
    ) -> Iterator[Atoms]:
        with h5py.File(tmp_dir / "ani1x-release.h5", "r") as f:
            n_structures = sum(
                (~np.isnan(data["ccsd(t)_cbs.energy"][()])).sum()
                for data in f.values()
            )
            task = progress.new_task(
                "Processing 500k structures",
                total=n_structures,
            )

            ani1x_idx = -1
            # iterate over each chemical formula in the dataset:
            for data in f.values():
                Zs = data["atomic_numbers"]
                coords = data["coordinates"][()]
                cc_energy = data["ccsd(t)_cbs.energy"][()]
                dft_energy = data["wb97x_dz.energy"][()]
                dft_forces = data["wb97x_dz.forces"][()]
                dft_dipole = data["wb97x_dz.dipole"][()]

                for i in range(len(cc_energy)):
                    ani1x_idx += 1
                    if np.isnan(cc_energy[i]):
                        continue

                    structure = Atoms(
                        positions=coords[i],
                        numbers=Zs,
                    )
                    # see: https://www.nature.com/articles/s41597-020-0473-z/tables/2
                    # energy is in hartree, convert to eV
                    structure.info["dft_energy"] = dft_energy[i] * Ha_to_eV
                    structure.info["cc_energy"] = cc_energy[i] * Ha_to_eV
                    # units of e * angstrom
                    structure.info["dft_dipole"] = dft_dipole[i]
                    structure.info["1x_idx"] = ani1x_idx
                    # forces are in hartree/angstrom, convert to eV/angstrom
                    structure.arrays["dft_forces"] = dft_forces[i] * Ha_to_eV

                    task.update(advance=1)
                    yield structure
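The importer keeps only conformers that have a CCSD(T)/CBS energy, while `1x_idx` keeps counting over every ANI-1x conformer, so the stored indices have gaps. The following pure-Python sketch mirrors that filtering logic on toy data; `select_labelled` and `toy_groups` are hypothetical names, and nested lists stand in for the HDF5 groups that the real script reads with h5py.

```python
import math

HA_TO_EV = 27.2114079527  # same conversion factor as the importer script


def select_labelled(cc_energies_by_formula):
    """Yield (ani1x_idx, cc_energy in eV) for each conformer with a CC label.

    `cc_energies_by_formula` holds one list of energies (in hartree) per
    chemical formula, with NaN where no coupled-cluster label exists.
    """
    ani1x_idx = -1
    for energies in cc_energies_by_formula:
        for energy in energies:
            # the counter advances for every conformer, labelled or not
            ani1x_idx += 1
            if math.isnan(energy):
                continue
            yield ani1x_idx, energy * HA_TO_EV


nan = float("nan")
toy_groups = [[-1.0, nan, -2.0], [nan, -3.0]]  # made-up values, in hartree
selected = list(select_labelled(toy_groups))
kept_indices = [idx for idx, _ in selected]
print(kept_indices)  # [0, 2, 4]
```

Incrementing the counter before the NaN check is what lets each kept structure point back to its position in the full ANI-1x ordering.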
DatabaseEntry for ANI-1ccx
name: ANI-1ccx
year: 2019
category: Benchmarks
license: CC0
minimum_load_atoms_version: 0.3
format: lmdb
description: |
    The ANI-1ccx dataset comprises an "optimally spanning" subset of the :doc:`/datasets/ANI-1x` dataset,
    with each structure being re-labelled with the total structure energy using the
    "gold standard" CCSD(T)/CBS level of theory. Internally, files are downloaded from
    `FigShare <https://springernature.figshare.com/collections/The_ANI-1ccx_and_ANI-1x_data_sets_coupled-cluster_and_density_functional_theory_properties_for_molecules/4712477>`__.
citation: |
    @article{Smith-19-07,
        title = {
            Approaching Coupled Cluster Accuracy with a
            General-Purpose Neural Network Potential
            through Transfer Learning
        },
        author = {
            Smith, Justin S. and Nebgen, Benjamin T. and Zubatyuk, Roman
            and Lubbers, Nicholas and Devereux, Christian and Barros, Kipton
            and Tretiak, Sergei and Isayev, Olexandr and Roitberg, Adrian E.
        },
        year = {2019},
        journal = {Nature Communications},
        volume = {10},
        number = {1},
        pages = {2903},
        doi = {10.1038/s41467-019-10827-4},
    }
    @article{Smith-20-05,
        title = {
            The ANI-1ccx and ANI-1x Data Sets, Coupled-Cluster
            and Density Functional Theory Properties for Molecules
        },
        author = {
            Smith, Justin S. and Zubatyuk, Roman and Nebgen, Benjamin and
            Lubbers, Nicholas and Barros, Kipton and Roitberg, Adrian E. and
            Isayev, Olexandr and Tretiak, Sergei
        },
        year = {2020},
        journal = {Scientific Data},
        volume = {7},
        number = {1},
        pages = {134},
        doi = {10.1038/s41597-020-0473-z},
    }
    @article{Smith-18-05,
        title = {
            Less Is More: Sampling Chemical Space with Active Learning
        },
        author = {
            Smith, Justin S. and Nebgen, Ben and Lubbers, Nicholas and
            Isayev, Olexandr and Roitberg, Adrian E.
        },
        year = {2018},
        journal = {The Journal of Chemical Physics},
        volume = {148},
        number = {24},
        doi = {10.1063/1.5023802},
    }
per_atom_properties:
    dft_forces:
        desc: force vectors (as labelled with :math:`\omega`\ B97x/6-31G(d))
        units: eV/Å
per_structure_properties:
    cc_energy:
        desc: energy of the structure (as labelled with CCSD(T)/CBS)
        units: eV
    dft_energy:
        desc: energy of the structure (as labelled with :math:`\omega`\ B97x/6-31G(d))
        units: eV
    dft_dipole:
        desc: dipole moment of the structure (as labelled with :math:`\omega`\ B97x/6-31G(d))
        units: e Å
    1x_idx:
        desc: index of the structure in the :doc:`/datasets/ANI-1x` dataset
representative_structure: 413_000