LAMMPS

FYI, you can open this documentation as a Google Colab notebook to follow along interactively.

LAMMPS is a fast molecular dynamics simulation package written in C++.

graph-pes provides a pair_style for interfacing LAMMPS with any model that inherits from GraphPESModel: this includes all of the architectures implemented in graph-pes, together with any others you implement yourself.

To run MD simulations with a graph-pes model, you need to do 3 things:

  1. build LAMMPS with support for graph_pes

  2. “deploy” the model for use with LAMMPS

  3. run LAMMPS with the pair_style graph_pes command

Let’s go through each of these steps in turn.

1. Building LAMMPS

Compiling LAMMPS can be a pain.

To combat this, we provide a script that attempts to automatically build LAMMPS with graph_pes support. We can’t guarantee that this will work in all cases, but we’ve tested it on various Linux machines, and in Google Colab, with success. If you have any issues, please open an issue.

Note

Our current pair_graph_pes.cpp implementation assumes that you will be running LAMMPS in a single process (no MPI parallelism), optionally with a single GPU attached. A multi-GPU and MPI parallelised pair_style is on our roadmap for the near future (watch this space).

Let’s start by downloading this script and running it with the --help flag to see what options are available:

[3]:
!wget https://tinyurl.com/graph-pes-build-lammps -O build-lammps.sh
!bash build-lammps.sh --help
Usage: ../../../scripts/build-lammps.sh [OPTIONS]

This script builds a LAMMPS executable with support for 'pair_style graph_pes'.

It performs the following tasks:
  1. Locates the graph-pes installation
  2. Creates a conda environment with necessary dependencies
  3. Clones the LAMMPS repository
  4. Patches LAMMPS with graph-pes source code
  5. Builds LAMMPS with graph-pes support

The final executable will be located at:
  ./graph_pes_lmp_cpu_only (if --cpu-only is used)
  ./graph_pes_lmp (default, with GPU support)

Requirements:
  - conda

Options:
  --help           Display this help message and exit
  --cpu-only       Build LAMMPS for CPU only (default: GPU enabled)
  --force-rebuild  Force rebuilding of conda environment and LAMMPS

This build-lammps.sh script creates a custom, throwaway conda environment to ensure a uniform build process. If you don’t have conda installed, you can install it from e.g. here.

Note: As we discover common problems, we’ll document them here:

  • CMake Error at (...) Failed to detect a default CUDA architecture. This is a common issue when installing on e.g. an HPC system. A simple fix (if available) is to run module load cuda before running the script.

Now let’s build our LAMMPS executable:

[4]:
!bash build-lammps.sh
Running build-lammps.sh with the following parameters:
          CPU_ONLY: false
    FORCE_REBUILD : false
Found graph-pes pair style at (...)/graph_pes/pair_style
Creating conda environment lammps-env-gpu-throwaway
Channels:
 - conda-forge
 - defaults
Platform: linux-64
Collecting package metadata (repodata.json): ...working... done
Solving environment: ...working... done

Downloading and Extracting Packages: ...working... done
Preparing transaction: ...working... done
Verifying transaction: ...working... done
Executing transaction: ...working... done
(...)
Installing collected packages: mpmath, typing-extensions, sympy, nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-cupti-cu12, nvidia-cublas-cu12, numpy, networkx, MarkupSafe, fsspec, filelock, triton, nvidia-cusparse-cu12, nvidia-cudnn-cu12, jinja2, nvidia-cusolver-cu12, torch
Successfully installed MarkupSafe-3.0.1 filelock-3.16.1 fsspec-2024.9.0 jinja2-3.1.4 mpmath-1.3.0 networkx-3.4.1 numpy-2.1.2 nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-9.1.0.70 nvidia-cufft-cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-cu12-12.1.0.106 nvidia-nccl-cu12-2.20.5 nvidia-nvjitlink-cu12-12.6.77 nvidia-nvtx-cu12-12.1.105 sympy-1.13.3 torch-2.4.1 triton-3.0.0 typing-extensions-4.12.2

done
#
# To activate this environment, use
#
#     $ conda activate lammps-env-gpu-throwaway
#
# To deactivate an active environment, use
#
#     $ conda deactivate

Conda environment lammps-env-gpu-throwaway successfully activated
Cloning into 'lammps'...
remote: Enumerating objects: 403276, done.
remote: Counting objects: 100% (3486/3486), done.
remote: Compressing objects: 100% (1504/1504), done.
remote: Total 403276 (delta 2317), reused 3036 (delta 1977), pack-reused 399790 (from 1)
Receiving objects: 100% (403276/403276), 757.23 MiB | 52.42 MiB/s, done.
Resolving deltas: 100% (332564/332564), done.
Updating files: 100% (13334/13334), done.
-- The CXX compiler identification is GNU 12.4.0
-- Detecting CXX compiler ABI info
-- Detecting CXX compiler ABI info - done
-- Check for working CXX compiler: (..)/miniconda3/envs/lammps-env-gpu-throwaway/bin/c++ - skipped
-- Detecting CXX compile features
-- Detecting CXX compile features - done
-- Found Git: (...)/miniconda/envs/lammps-env-gpu-throwaway/bin/git (found version "2.47.0")
-- Running check for auto-generated files from make-based build system
-- Checking for module 'mpi-cxx'
--   Package 'mpi-cxx', required by 'virtual:world', not found
-- Looking for C++ include omp.h
-- Looking for C++ include omp.h - found
-- Found OpenMP_CXX: -fopenmp (found version "4.5")
-- Found OpenMP: TRUE (found version "4.5") found components: CXX
-- Found CURL: (...)/miniconda/envs/lammps-env-gpu-throwaway/lib/libcurl.so (found version "8.10.1") found components: HTTP HTTPS
-- Found GZIP: /usr/bin/gzip
-- Could NOT find FFMPEG (missing: FFMPEG_EXECUTABLE)
-- Looking for C++ include cmath
-- Looking for C++ include cmath - found
-- Generating style headers...
-- Generating package headers...
-- Generating lmpinstalledpkgs.h...
-- Found Python3: (...)/miniconda/envs/lammps-env-gpu-throwaway/bin/python3.10 (found version "3.10.15") found components: Interpreter
-- Could NOT find ClangFormat (missing: ClangFormat_EXECUTABLE) (Required is at least version "11.0")
-- The following tools and libraries have been found and configured:
 * Git
 * OpenMP
 * CURL
 * Python3

-- <<< Build configuration >>>
   LAMMPS Version:   20240829 patch_29Aug2024-modified
   Operating System: Linux Rocky 8.9
   CMake Version:    3.30.5
   Build type:       RelWithDebInfo
   Install path:     (...)/.local
   Generator:        Unix Makefiles using /usr/bin/gmake
-- Enabled packages: <None>
-- <<< Compilers and Flags: >>>
-- C++ Compiler:     (...)/miniconda/envs/lammps-env-gpu-throwaway/bin/c++
      Type:          GNU
      Version:       12.4.0
      C++ Standard:  17
      C++ Flags:     -O2 -g -DNDEBUG
      Defines:       LAMMPS_SMALLBIG;LAMMPS_MEMALIGN=64;LAMMPS_OMP_COMPAT=4;LAMMPS_CURL;LAMMPS_GZIP
-- <<< Linker flags: >>>
-- Executable name:  lmp
-- Static library flags:
-- Found CUDA: /usr/local/cuda-12.2 (found version "12.2")
-- The CUDA compiler identification is NVIDIA 12.2.140
-- Detecting CUDA compiler ABI info
-- Detecting CUDA compiler ABI info - done
-- Check for working CUDA compiler: /usr/local/cuda-12.2/bin/nvcc - skipped
-- Detecting CUDA compile features
-- Detecting CUDA compile features - done
-- Found CUDAToolkit: /usr/local/cuda-12.2/include (found version "12.2.140")
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD
-- Performing Test CMAKE_HAVE_LIBC_PTHREAD - Failed
-- Looking for pthread_create in pthreads
-- Looking for pthread_create in pthreads - not found
-- Looking for pthread_create in pthread
-- Looking for pthread_create in pthread - found
-- Found Threads: TRUE
-- Caffe2: CUDA detected: 12.2
-- Caffe2: CUDA nvcc is: /usr/local/cuda-12.2/bin/nvcc
-- Caffe2: CUDA toolkit directory: /usr/local/cuda-12.2
-- Caffe2: Header version is: 12.2
-- /usr/local/cuda-12.2/lib64/libnvrtc.so shorthash is 000ca627
-- USE_CUDNN is set to 0. Compiling without cuDNN support
-- USE_CUSPARSELT is set to 0. Compiling without cuSPARSELt support
-- Autodetected CUDA architecture(s):  8.6 8.6
-- Added CUDA NVCC flags for: -gencode;arch=compute_86,code=sm_86
-- Found Torch: (...)/miniconda/envs/lammps-env-gpu-throwaway/lib/python3.10/site-packages/torch/lib/libtorch.so
-- Configuring done (4.9s)
-- Generating done (0.0s)
-- Build files have been written to: (...)/graph-pes/ignore/lammps/build
Building LAMMPS executable
(...)
[ 98%] Linking CXX static library liblammps.a
[ 98%] Built target lammps
[ 98%] Building CXX object CMakeFiles/lmp.dir(...)/graph-pes/ignore/lammps/src/main.cpp.o
[100%] Linking CXX executable lmp
[100%] Built target lmp
LAMMPS executable successfully built with graph-pes, and is available at ./graph_pes_lmp

We can check that our graph_pes_lmp executable works by inspecting the list of available pair_style modules:

[6]:
!./graph_pes_lmp -h | grep -A8 "* Pair styles:"
* Pair styles:

born            buck            buck/coul/cut   coul/cut        coul/debye
coul/dsf        coul/wolf       meam/c          reax            reax/c
mesont/tpm      graph_pes       hybrid          hybrid/omp      hybrid/molecular
hybrid/molecular/omp            hybrid/overlay  hybrid/overlay/omp
hybrid/scaled   hybrid/scaled/omp               lj/cut          lj/cut/coul/cut
lj/expand       morse           soft            table           yukawa
zbl             zero

Success!

2. Model deployment

To use a GraphPESModel with LAMMPS, we first need to convert it into a format that LAMMPS can understand. This involves compiling the PyTorch model into a TorchScript program that can be loaded and executed from C++. All functions and classes provided by graph-pes are compatible with this process.

By default, the graph-pes-train command saves a deployed model to <output-dir>/lammps_model.pt. We can also deploy models from within Python using the deploy_model function.

graph_pes.utils.lammps.deploy_model(model, path)[source]

Deploy a GraphPESModel for use with LAMMPS.

Use the resulting model with LAMMPS as follows:

pair_style graph_pes <cpu>
pair_coeff * * path/to/model.pt <element-of-type-1> <element-of-type-2> ...
Parameters:

  • model – the GraphPESModel to deploy.

  • path – the file path to save the deployed model to.

Let’s create a simple LennardJones model and deploy it for use in LAMMPS:

[1]:
from graph_pes.models import LennardJones
from graph_pes.utils.lammps import deploy_model

model = LennardJones(epsilon=0.4, sigma=1.2, cutoff=5.0)
deploy_model(model, "lj_model_lammps.pt")
[2]:
!ls | grep ".*pt"
lj_model_lammps.pt
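
Although this isn’t required for running LAMMPS, you can sanity check the deployed artefact from Python: the file written by deploy_model is a TorchScript archive, and so (as a minimal sketch) it can be loaded back with torch.jit.load:

import torch

# load the deployed model back into Python;
# map_location="cpu" avoids needing a GPU for this check
scripted_model = torch.jit.load("lj_model_lammps.pt", map_location="cpu")

# the result is a TorchScript module
print(isinstance(scripted_model, torch.jit.ScriptModule))  # True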

3. Running MD

Now let’s run a simple “NVT” simulation of a copper crystal using our Lennard-Jones potential.

First of all, we create a starting structure for our simulation:

[73]:
import numpy as np
from ase.build import bulk, make_supercell
from load_atoms import view

atoms = bulk("Cu", "hcp", a=3.6, c=5.8)
atoms = make_supercell(atoms, np.eye(3) * 5)
atoms.write("starting-structure.data", format="lammps-data")

view(atoms)
[74]:
!cat starting-structure.data | head -n 15
starting-structure.data (written by ASE)

250      atoms
1  atom types
0.0                      18  xlo xhi
0.0      15.588457268119896  ylo yhi
0.0                      29  zlo zhi
                     -9                       0                       0  xy xz yz


Atoms

     1   1                       0                       0                       0
     2   1                       0      2.0784609690826525      2.8999999999999999
     3   1                       0                       0      5.8000000000000007

pair_style graph_pes usage

Usage:

pair_style graph_pes <cpu>
pair_coeff * * <model_file> <element_symbols>

Use pair_style graph_pes cpu to run the model on the CPU only. By default, the model will be run on the GPU if available.

In the pair_coeff command, replace <model_file> with a path to your deployed model and <element_symbols> with the chemical symbols of the atoms in your system (e.g. Cu for copper) in the order of their respective LAMMPS types.

For instance, if you have a structure with C, H and O atoms, and the LAMMPS types for C, H and O are 1, 2 and 3 respectively, then you should use pair_coeff * * lj_model_lammps.pt C H O.
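
The mapping between LAMMPS types and chemical species is therefore fixed by whatever wrote your data file. If you build your structures with ASE, one way to pin this ordering explicitly is the specorder argument of the lammps-data writer. Below is a short sketch using a hypothetical methanol structure (not part of the copper workflow above):

from ase.build import molecule

# a hypothetical C/H/O structure, purely for illustration
atoms = molecule("CH3OH")
atoms.center(vacuum=5.0)  # lammps-data files need a simulation cell

# specorder fixes the species -> type mapping: type 1 = C, type 2 = H, type 3 = O,
# matching a subsequent `pair_coeff * * <model_file> C H O`
atoms.write("structure.data", format="lammps-data", specorder=["C", "H", "O"])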

[75]:
from pathlib import Path

input_script = """
# a basic NVT simulation using pair_style graph_pes
#
# expected variables:
#   output_dir: path to output directory
#   temp_K: target temperature in K
#   timestep_fs: simulation timestep in fs
#   total_time: total simulation time in ps

variable timestep_ps equal ${timestep_fs}/1000
variable n_steps equal ${total_time}/${timestep_ps}

log ${output_dir}/log.lammps

# Use LAMMPS 'metal' units (Angstrom, eV, ps)
units metal
atom_style atomic
newton off

# Read initial structure from data file
read_data starting-structure.data
mass 1 63.546

# Define graph-pes pair style
pair_style      graph_pes
pair_coeff      * * lj_model_lammps.pt Cu

# Define neighbor list
neighbor 0.5 bin
neigh_modify delay 0 every 1 check yes

timestep ${timestep_ps}

# Dump output every 50 timesteps
dump 1 all atom 50 ${output_dir}/dumps/*.data

# Thermodynamic output every 1000 steps
thermo 1000
thermo_style custom step time cpu temp etotal press
thermo_modify flush yes

# Setup NVT ensemble with Nose-Hoover thermostat
# relaxation time of 10 fs
fix 1 all nvt temp ${temp_K} ${temp_K} 0.01
velocity all create ${temp_K} 42
run ${n_steps}
"""

Path("nvt.in").write_text(input_script);

Now let’s (finally) run our simulation:

[76]:
! mkdir -p nvt-simulation/dumps
! ./graph_pes_lmp -in nvt.in -var output_dir "nvt-simulation" -var temp_K 400 -var timestep_fs 1 -var total_time 10

LAMMPS (29 Aug 2024)
OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/comm.cpp:98)
  using 1 OpenMP thread(s) per MPI task
Reading data file ...
  triclinic box = (0 0 0) to (18 15.588457 29) with tilt (-9 0 0)
WARNING: Triclinic box skew is large. LAMMPS will run inefficiently. (src/domain.cpp:221)
  1 by 1 by 1 MPI processor grid
  reading atoms ...
  250 atoms
  read_data CPU = 0.001 seconds
GraphPES is using device cuda
Loading model from lj_model_lammps.pt
Freezing TorchScript model...

CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE

Your simulation uses code contributions which should be cited:
- Type Label Framework: https://doi.org/10.1021/acs.jpcb.3c08419
The log file lists these citations in BibTeX format.

CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE-CITE

Neighbor list info ...
  update: every = 1 steps, delay = 0 steps, check = yes
  max neighbors/atom: 2000, page size: 100000
  master list distance cutoff = 5.5
  ghost atom cutoff = 5.5
  binsize = 2.75, bins = 10 6 11
  1 neighbor lists, perpetual/occasional/extra = 1 0 0
  (1) pair graph_pes, perpetual
      attributes: full, newton off
      pair build: full/bin/atomonly
      stencil: full/bin/3d
      bin: standard
Setting up Verlet run ...
  Unit style    : metal
  Current step  : 0
  Time step     : 0.001
Per MPI rank memory allocation (min/avg/max) = 3.076 | 3.076 | 3.076 Mbytes
   Step          Time           CPU            Temp          TotEng         Press
         0   0              0              400            9.4960562      361.48303
      1000   1              3.411231       432.8207      -87.101875     -880.49216
      2000   2              6.4558746      413.18378     -190.99657     -2779.0133
      3000   3              9.5907054      437.73432     -245.78015      658.24861
      4000   4              12.88936       404.33008     -279.66436      1123.0481
      5000   5              16.439895      404.60413     -290.74592      132.02435
      6000   6              20.475099      405.53413     -325.35688      3090.0458
      7000   7              24.82393       454.21859     -338.08393     -370.19839
      8000   8              30.120096      414.12291     -371.99827      1340.3408
      9000   9              36.077551      439.844       -378.79259      6711.7289
     10000   10             42.336936      404.27497     -385.80563     -737.94683
Loop time of 42.3371 on 1 procs for 10000 steps with 250 atoms

Performance: 20.408 ns/day, 1.176 hours/ns, 236.199 timesteps/s, 59.050 katom-step/s
99.4% CPU use with 1 MPI tasks x 1 OpenMP threads

MPI task timing breakdown:
Section |  min time  |  avg time  |  max time  |%varavg| %total
---------------------------------------------------------------
Pair    | 42.133     | 42.133     | 42.133     |   0.0 | 99.52
Neigh   | 0.064318   | 0.064318   | 0.064318   |   0.0 |  0.15
Comm    | 0.021263   | 0.021263   | 0.021263   |   0.0 |  0.05
Output  | 0.055924   | 0.055924   | 0.055924   |   0.0 |  0.13
Modify  | 0.048878   | 0.048878   | 0.048878   |   0.0 |  0.12
Other   |            | 0.01398    |            |       |  0.03

Nlocal:            250 ave         250 max         250 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Nghost:            732 ave         732 max         732 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Neighs:              0 ave           0 max           0 min
Histogram: 1 0 0 0 0 0 0 0 0 0
FullNghs:         9430 ave        9430 max        9430 min
Histogram: 1 0 0 0 0 0 0 0 0 0

Total # of neighbors = 9430
Ave neighs/atom = 37.72
Neighbor list builds = 394
Dangerous builds = 0
Total wall time: 0:00:42

That took 42s to simulate 10 ps of MD.

We’ve used a very simple LJ model here, so we expect (a) the simulation to run very quickly, and (b) the MD to have produced some LJ-type clusters:

[77]:
from pathlib import Path

from ase.io import read

final_file = sorted(
    Path("nvt-simulation/dumps/").glob("*.data"),
    key=lambda x: int(x.stem),
)[-1]
final_structure = read(str(final_file), format="lammps-dump-text")
final_structure.symbols = "Cu" * len(final_structure)
view(final_structure)
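
Beyond a visual check, the thermodynamic output can be pulled out of the log file for a quick quantitative look. Here is a minimal sketch (assuming the log follows the thermo table format shown above) that checks the average temperature over the run:

from pathlib import Path

import numpy as np

# extract the thermo table from the LAMMPS log written by our input script
lines = Path("nvt-simulation/log.lammps").read_text().splitlines()
start = next(i for i, line in enumerate(lines) if line.strip().startswith("Step"))
end = next(i for i, line in enumerate(lines) if line.strip().startswith("Loop time"))

header = lines[start].split()
data = np.array([[float(x) for x in line.split()] for line in lines[start + 1 : end]])

temps = data[:, header.index("Temp")]
print(f"average temperature: {temps.mean():.0f} K (target was 400 K)")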