load-atoms

Important

This project is under active development. Until version 1.0.0 is released, breaking changes to the API may occur.

load-atoms is a Python package for downloading, inspecting and manipulating datasets of atomic structures.

Quickstart

Install with pip install load-atoms, then use load_dataset() to download an AtomsDataset (see the project documentation for the full list of available datasets):

>>> from load_atoms import load_dataset
>>> dataset = load_dataset("QM9")
╭───────────────────────────────── QM9 ─────────────────────────────────╮
│                                                                       │
│   Downloading dsgdb9nsd.xyz.tar.bz2 ━━━━━━━━━━━━━━━━━━━━ 100% 00:09   │
│   Extracting dsgdb9nsd.xyz.tar.bz2  ━━━━━━━━━━━━━━━━━━━━ 100% 00:18   │
│   Processing files                  ━━━━━━━━━━━━━━━━━━━━ 100% 00:19   │
│   Caching to disk                   ━━━━━━━━━━━━━━━━━━━━ 100% 00:02   │
│                                                                       │
│            The QM9 dataset is covered by the CC0 license.             │
│        Please cite the QM9 dataset if you use it in your work.        │
│          For more information about the QM9 dataset, visit:           │
│                            load-atoms/QM9                             │
╰───────────────────────────────────────────────────────────────────────╯

An AtomsDataset is a thin wrapper around a list of ase.Atoms objects:

>>> dataset[0]
Atoms(symbols='CH4', pbc=False, partial_charges=...)

view() provides interactive visualization of atomic structures:

>>> from load_atoms import view
>>> view(dataset[23_810], show_bonds=True)

We provide several dataset-level operations:

>>> small_structures = dataset.filter_by(lambda atoms: len(atoms) < 10)
>>> dataset.info["energy"]
array([-10.5498, -6.9933,  ...,  -5.7742, -6.3021])
>>> trainset, testset = dataset.random_split([0.9, 0.1], seed=42)
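
To make the semantics of these operations concrete without downloading anything, here is a minimal, self-contained sketch of a list-backed dataset. ToyDataset and everything inside it are hypothetical stand-ins written for illustration; this is not how load-atoms implements AtomsDataset, and the real filter_by() and random_split() may behave differently in their details (for example in how split sizes are rounded).

```python
import random


class ToyDataset:
    """Hypothetical stand-in for a list-backed dataset (not load-atoms code)."""

    def __init__(self, structures):
        self.structures = list(structures)

    def __len__(self):
        return len(self.structures)

    def __getitem__(self, index):
        return self.structures[index]

    def filter_by(self, predicate):
        # Keep only the structures for which the predicate returns True.
        return ToyDataset(s for s in self.structures if predicate(s))

    def random_split(self, fractions, seed=0):
        # Shuffle the indices reproducibly, then cut them into consecutive chunks.
        indices = list(range(len(self)))
        random.Random(seed).shuffle(indices)
        splits, start = [], 0
        for fraction in fractions:
            stop = start + round(fraction * len(self))
            splits.append(ToyDataset(self[i] for i in indices[start:stop]))
            start = stop
        return splits


# Each toy "structure" is just a list of element symbols, so len() counts atoms.
dataset = ToyDataset(["C"] + ["H"] * n for n in range(20))

small_structures = dataset.filter_by(lambda atoms: len(atoms) < 10)
trainset, testset = dataset.random_split([0.9, 0.1], seed=42)

print(len(small_structures), len(trainset), len(testset))
```

Because the shuffle is seeded, calling random_split() again with the same seed reproduces the same partition, which is the property that matters when comparing models trained on the same split.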

Installing load-atoms with pip also installs a command-line interface that downloads datasets to local files readable by ase.io.read():

$ load-atoms -h
usage: load-atoms [-h] [--format FORMAT] [--root ROOT] dataset_id

Download a load_atoms dataset from the internet to a local, `ase.io.read`able file.

positional arguments:
  dataset_id       ID of the dataset to download

optional arguments:
  -h, --help       show this help message and exit
  --format FORMAT  Format to save the dataset in. Must be one of the formats supported by `ase.io.write`.
  --root ROOT      Root directory to save the dataset (default: current directory)
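
To show how the positional argument and the two options in the help text above combine, the following sketch rebuilds an equivalent parser with the standard-library argparse module. It is reconstructed from the help output only and is not the project's actual source. The function name build_parser, the "." default for --root (inferred from "default: current directory"), and the absence of a default for --format are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Reconstructed from the help text; hypothetical, not load-atoms source code.
    parser = argparse.ArgumentParser(
        prog="load-atoms",
        description=(
            "Download a load_atoms dataset from the internet "
            "to a local, `ase.io.read`able file."
        ),
    )
    parser.add_argument("dataset_id", help="ID of the dataset to download")
    parser.add_argument(
        "--format",
        help="Format to save the dataset in (any format supported by `ase.io.write`).",
    )
    parser.add_argument(
        "--root",
        default=".",  # assumption: "current directory" is expressed as "."
        help="Root directory to save the dataset (default: current directory)",
    )
    return parser


# Equivalent to running: load-atoms QM9 --format extxyz --root data
args = build_parser().parse_args(["QM9", "--format", "extxyz", "--root", "data"])
print(args.dataset_id, args.format, args.root)

# With only the dataset ID given, the options fall back to their defaults.
defaults = build_parser().parse_args(["QM9"])
print(defaults.format, defaults.root)
```

Here extxyz is one of the format names that ase.io.write accepts; any other ASE-supported format name could be passed in its place.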

Contributing

load-atoms was originally conceived and developed by me, John Gardner, as part of my PhD research at the University of Oxford within the Deringer Group.

If you are interested in contributing to the project, be that adding new functionality, suggesting a dataset or fixing a bug, please see the developer guide and feel free to open an issue or pull request on the GitHub repository.