Dear Developers,
I am using DeePMD-kit v3.0.0b3 for pretraining and fine-tuning with DPA-2. The software was installed offline with CUDA 11.8, which matches my system's CUDA version. Both pretraining and fine-tuning completed successfully, and molecular dynamics simulations driven by ASE run without issue. However, when I run LAMMPS with the frozen model file (model.pth) via `lmp -in in.lmp`, I hit the error below; the same error occurs with a build of LAMMPS that I compiled myself. Could you please help me resolve this issue? Thank you for your assistance. Here is the output of `lmp -in in.lmp`:
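For reference, my input follows the standard DeePMD-kit pair style usage. This is only a minimal sketch, not my actual `in.lmp` (which is in the linked archive below); the data file name and thermostat settings here are illustrative placeholders, while the units, timestep, and `run 100` match the log:

```lammps
# Minimal sketch of a LAMMPS input loading a frozen PyTorch model
# through the DeePMD-kit pair style. File names and MD settings
# other than units/timestep/run are placeholders.
units         metal
boundary      p p p
atom_style    atomic

read_data     conf.lmp        # placeholder data file name

pair_style    deepmd model.pth
pair_coeff    * *

timestep      0.0001
fix           1 all nvt temp 300.0 300.0 0.1   # placeholder thermostat
run           100
```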
WARNING: There was an error initializing an OpenFabrics device.
Local host: xc06n08
Local device: mlx5_0
LAMMPS (2 Aug 2023)
OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/comm.cpp:98)
using 1 OpenMP thread(s) per MPI task
DeePMD-kit: Successfully load libcudart.so.12
2024-09-02 01:44:51.820682: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2024-09-02 01:44:51.820809: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2024-09-02 01:44:51.821687: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
Loaded 1 plugins from /share/home/yjli/apps/dp-v300b3-cuda124/lib/deepmd_lmp
Reading data file ...
orthogonal box = (0 0 0) to (50 50 50)
1 by 1 by 1 MPI processor grid
reading atoms ...
6 atoms
Finding 1-2 1-3 1-4 neighbors ...
special bond factors lj: 0 0 0
special bond factors coul: 0 0 0
0 = max # of 1-2 neighbors
0 = max # of 1-3 neighbors
0 = max # of 1-4 neighbors
1 = max # of special neighbors
special bonds CPU = 0.000 seconds
read_data CPU = 0.003 seconds
DeePMD-kit WARNING: Environmental variable DP_INTRA_OP_PARALLELISM_THREADS is not set. Tune DP_INTRA_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable DP_INTER_OP_PARALLELISM_THREADS is not set. Tune DP_INTER_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable OMP_NUM_THREADS is not set. Tune OMP_NUM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
Summary of lammps deepmd module ...
Info of deepmd-kit:
installed to: /share/home/yjli/apps/dp-v300b3-cuda124
source:
source branch: HEAD
source commit: cbf2de6
source commit at: 2024-07-27 05:11:58 +0000
support model ver.: 1.1
build variant: cuda
build with tf inc: /share/home/yjli/apps/dp-v300b3-cuda124/lib/python3.11/site-packages/tensorflow/include;/share/home/yjli/apps/dp-v300b3-cuda124/include
build with tf lib: /share/home/yjli/apps/dp-v300b3-cuda124/lib/python3.11/site-packages/tensorflow/libtensorflow_cc.so.2
build with pt lib: torch;torch_library;/share/home/yjli/apps/dp-v300b3-cuda124/lib/python3.11/site-packages/torch/lib/libc10.so;/home/conda/feedstock_root/build_artifacts/deepmd-kit_1722057349216/_build_env/targets/x86_64-linux/lib/stubs/libcuda.so;/share/home/yjli/apps/dp-v300b3-cuda124/lib/libnvrtc.so;/share/home/yjli/apps/dp-v300b3-cuda124/lib/libnvToolsExt.so;/share/home/yjli/apps/dp-v300b3-cuda124/lib/libcudart.so;/share/home/yjli/apps/dp-v300b3-cuda124/lib/python3.11/site-packages/torch/lib/libc10_cuda.so
set tf intra_op_parallelism_threads: 0
set tf inter_op_parallelism_threads: 0
Info of lammps module:
use deepmd-kit at: /share/home/yjli/apps/dp-v300b3-cuda124
load model from: ./model.pth to gpu 0
DeePMD-kit WARNING: Environmental variable DP_INTRA_OP_PARALLELISM_THREADS is not set. Tune DP_INTRA_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable DP_INTER_OP_PARALLELISM_THREADS is not set. Tune DP_INTER_OP_PARALLELISM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
DeePMD-kit WARNING: Environmental variable OMP_NUM_THREADS is not set. Tune OMP_NUM_THREADS for the best performance. See https://deepmd.rtfd.io/parallelism/ for more information.
Info of model(s):
using 1 model(s): ./model.pth
rcut in model: 9
ntypes in model: 3
Generated 0 of 1 mixed pair_coeff terms from geometric mixing rule
Neighbor list info ...
update: every = 2 steps, delay = 10 steps, check = yes
max neighbors/atom: 2000, page size: 100000
master list distance cutoff = 11
ghost atom cutoff = 11
1 neighbor lists, perpetual/occasional/extra = 1 0 0
(1) pair deepmd, perpetual
attributes: full, newton on
pair build: full/nsq
stencil: none
bin: none
WARNING: Proc sub-domain size < neighbor skin, could lead to lost atoms (src/domain.cpp:966)
Setting up Verlet run ...
Unit style : metal
Current step : 0
Time step : 0.0001
WARNING: Communication cutoff adjusted to 11 (src/comm.cpp:732)
ERROR on proc 0: DeePMD-kit C API Error: DeePMD-kit Error: DeePMD-kit PyTorch backend error: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript, serialized code (most recent call last):
File "code/torch/deepmd/pt/model/model/transform_output.py", line 156, in forward_lower
vvi = split_vv1[_44]
svvi = split_svv1[_44]
_45 = _36(vvi, svvi, coord_ext, do_virial, do_atomic_virial, create_graph, )
ffi, aviri, = _45
ffi0 = torch.unsqueeze(ffi, -2)
File "code/__torch__/deepmd/pt/model/model/transform_output.py", line 191, in task_deriv_one
faked_grad = torch.ones_like(energy)
lst = annotate(List[Optional[Tensor]], [faked_grad])
_52 = torch.autograd.grad([energy], [extended_coord], lst, True, create_graph)
~~~~~~~~~~~~~~~~~~~ <--- HERE
extended_force = _52[0]
if torch.__isnot__(extended_force, None):
Traceback of TorchScript, original code (most recent call last):
File "/share/home/yjli/apps/dp-v300b3-cuda124/lib/python3.11/site-packages/deepmd/pt/model/model/transform_output.py", line 138, in forward_lower
for vvi, svvi in zip(split_vv1, split_svv1):
# nf x nloc x 3, nf x nloc x 9
ffi, aviri = task_deriv_one(
~~~~~~~~~~~~~~ <--- HERE
vvi,
svvi,
File "/share/home/yjli/apps/dp-v300b3-cuda124/lib/python3.11/site-packages/deepmd/pt/model/model/transform_output.py", line 80, in task_deriv_one
faked_grad = torch.ones_like(energy)
lst = torch.jit.annotate(List[Optional[torch.Tensor]], [faked_grad])
extended_force = torch.autograd.grad(
~~~~~~~~~~~~~~~~~~~ <--- HERE
[energy],
[extended_coord],
RuntimeError: max(): Expected reduction dim to be specified for input.numel() == 0. Specify the reduction dim with the 'dim' argument.
(/home/conda/feedstock_root/build_artifacts/deepmd-kit_1722057349216/work/source/lmp/pair_deepmd.cpp:586)
Last command: run 100
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
Here are all the relevant files:
link: https://pan.baidu.com/s/1dfFwZhzANTwI70Pf7We5Hg
extract code: a88t