
Applications User Guide
Spack
Spack is a package manager for supercomputers, Linux, and macOS. It makes installing scientific software easier. Spack packages are installation scripts, which are essentially recipes for building (and testing) software.
Setting up the Spack environment
$ module load spack
- How to check the compilers available on the machine and installed through Spack:
spack compilers
- How to find compilers, library packages and applications installed on the machine:
If the package is available as a module file, use the commands below to find and load the package in the user environment.
$ module avail or module av (lists the available modules)
$ module load package_name (loads the package)
- How to find the path of an installed package:
$ spack find --path <package-name>
Load Intel compiler
spack load intel-oneapi-compilers
Matching packages:
dc7udtm intel-oneapi-compilers@2021.2.0%gcc@11.2.0 arch=linux-centos7-cascadelake
toomj3b intel-oneapi-compilers@2021.3.0%gcc@8.3.0 arch=linux-centos7-skylake_avx512
zgrpvbj intel-oneapi-compilers@2021.3.0%gcc@11.2.0 arch=linux-centos7-cascadelake
forrfki intel-oneapi-compilers@2021.4.0%gcc@11.2.0 arch=linux-centos7-cascadelake
ksraq7r intel-oneapi-compilers@2022.0.1%gcc@11.2.0 arch=linux-centos7-cascadelake
sooevvg intel-oneapi-compilers@2022.1.0%gcc@12.1.0 arch=linux-centos7-cascadelake
Use a more specific spec (e.g., prepend '/' to the hash).
spack load /toomj3b
(type the hash of the required build)
Check loaded compiler
Intel – which icc
MPI – which mpirun
GCC – which gcc
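A minimal sketch of a typical session, assuming the setup-env.sh path and the /toomj3b hash shown above (use the hash reported by spack find on your system):
source /home/apps/spack/share/spack/setup-env.sh
spack load /toomj3b # intel-oneapi-compilers@2021.3.0 %gcc@8.3.0
spack find --loaded # confirm what is now in the environment
which icc # should resolve to the Spack install tree
which mpirun # only if an MPI package has also been loaded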
Check installed application using spack
spack find packageName
- Install Gromacs using spack
- Check the package availability
spack find gromacs
- Set environment
spack load gromacs@2021.2
- Check the availability of the MPI-enabled Gromacs binary
gmx_mpi
- Get dataset for benchmarking
wget --no-check-certificate http://ftp.gromacs.org/pub/benchmarks/water_GMX50_bare.tar.gz
- Untar the file
tar -xvf water_GMX50_bare.tar.gz
cd water-cut1.0_GMX50_bare/
cd 3072/
- Running Gromacs
gmx_mpi grompp -f pme.mdp -c conf.gro -p topol.top -o water_pme.tpr
gmx_mpi mdrun -nsteps 40 -s water_pme.tpr
- Sample Output
               Core t (s)   Wall t (s)      (%)
       Time:      638.801       16.002   3992.1
                 (ns/day)    (hour/ns)
Performance:        0.443       54.206
- Benchmarking on multiple nodes
Script to submit the job-
#!/bin/bash
#SBATCH -N 3 #number of nodes
#SBATCH --ntasks-per-node=16 #MPI processes per node
#SBATCH --time=15:00:00 #maximum wall time allocated for the job
#SBATCH --job-name=gromacs_test #job name
#SBATCH --error=gromacs.%J.err #filename for error file
#SBATCH --output=gromacs.%J.out #filename for output file
#SBATCH --partition=cpu #partition cpu/gpu
ulimit -s unlimited #setting up unlimited stack space
#setting Spack source
source /home/apps/spack/share/spack/setup-env.sh
spack load gromacs@2021.2
time mpirun -np $SLURM_NTASKS gmx_mpi mdrun -ntomp 4 -s water_pme.tpr
Command to run the script:
sbatch <script_name>
sbatch gromacs.sh
After a successful run, two files will be created, as specified in the script:
1. Output file (the job output can be checked here)
2. Error file (refer to this file for error checking)
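A short sketch of submitting and monitoring the job, using the file names from the script above (the <jobid> placeholder is the ID printed by sbatch):
sbatch gromacs.sh # prints: Submitted batch job <jobid>
squeue --me # list your pending and running jobs
tail -f gromacs.<jobid>.out # follow the output file once the job starts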
Batch Script for running gromacs on Cluster (GPU)
#!/bin/bash
#SBATCH -N 2 #number of nodes
#SBATCH --ntasks-per-node=16 #MPI processes per node
#SBATCH --time=15:00:00 #maximum wall time allocated
#SBATCH --job-name=gromacs_test #job name
#SBATCH --error=gro_gpu.%J.err #filename for error file
#SBATCH --output=gro_gpu.%J.out #filename for output file
#SBATCH --partition=gpu #partition cpu/gpu
#SBATCH --gres=gpu:2 # number of GPUs per node
#setting up unlimited stack space
ulimit -s unlimited
source /home/apps/spack/share/spack/setup-env.sh
spack load gromacs@2022.2%gcc@=11.2.0
# program execution
mpirun --oversubscribe -np $SLURM_NTASKS gmx_mpi mdrun -nsteps 40 -s water_pme.tpr
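Optionally, mdrun can be told explicitly to offload the non-bonded work to the GPUs; a sketch, assuming the GPU build loaded above (-nb is a standard gmx mdrun option):
mpirun --oversubscribe -np $SLURM_NTASKS gmx_mpi mdrun -nsteps 40 -nb gpu -s water_pme.tpr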
- Install LAMMPS using Spack
- Installation of LAMMPS with Spack (molecule package enabled)
spack install -v -j30 lammps@20220623 +asphere +class2 +kspace +manybody +molecule +mpiio +opt +replica +rigid +granular +openmp ^openmpi ^cmake@3.21.3%gcc@10.3.0
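Before starting a long build, it can help to preview how Spack will concretize the spec; a sketch using the same variants as above:
spack spec lammps@20220623 +asphere +class2 +kspace +manybody +molecule +mpiio +opt +replica +rigid +granular +openmp ^openmpi ^cmake@3.21.3%gcc@10.3.0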
- Set environment
- Check installed package
spack find lammps
spack find lammps
-- linux-centos7-cascadelake / gcc@10.3.0 -----------------------
lammps@20220623
-- linux-centos7-cascadelake / intel@2021.3.0 -------------------
lammps@20210310
==> 2 installed packages
Add the package to the user environment
spack load lammps@20210310
- Check loaded packages in user environment
spack find --loaded
- Benchmarking
Change to the examples/melt directory
cd Application-Data/lammps/lammps-10Mar21/examples/melt
- Running lammps
lmp -in in.melt
LAMMPS (10 Mar 2021)
OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/src/comm.cpp:94)
using 1 OpenMP thread(s) per MPI task
Lattice spacing in x,y,z = 1.6795962 1.6795962 1.6795962
Created orthogonal box = (0.0000000 0.0000000 0.0000000) to (16.795962 16.795962 16.795962)
1 by 1 by 1 MPI processor grid
Created 4000 atoms
create_atoms CPU = 0.095 seconds
Neighbor list info …
update every 20 steps, delay 0 steps, check no
max neighbors/atom: 2000, page size: 100000
master list distance cutoff = 2.8
ghost atom cutoff = 2.8
binsize = 1.4, bins = 12 12 12
1 neighbor lists, perpetual/occasional/extra = 1 0 0
(1) pair lj/cut, perpetual
attributes: half, newton on
pair build: half/bin/atomonly/newton
stencil: half/bin/3d/newton
bin: standard
Setting up Verlet run ...
Unit style : lj
Current step : 0
Time step : 0.005
Per MPI rank memory allocation (min/avg/max) = 3.222 | 3.222 | 3.222 Mbytes
Step Temp E_pair E_mol TotEng Press
0 3 -6.7733681 0 -2.2744931 -3.7033504
50 1.6758903 -4.7955425 0 -2.2823355 5.670064
100 1.6458363 -4.7492704 0 -2.2811332 5.8691042
150 1.6324555 -4.7286791 0 -2.280608 5.9589514
200 1.6630725 -4.7750988 0 -2.2811136 5.7364886
250 1.6275257 -4.7224992 0 -2.281821 5.9567365
Loop time of 0.715434 on 1 procs for 250 steps with 4000 atoms
Performance: 150957.395 tau/day, 349.438 timesteps/s
99.4% CPU use with 1 MPI tasks x 1 OpenMP threads
MPI task timing breakdown:
Section | min time | avg time | max time |%varavg| %total
—————————————————————
Pair | 0.60798 | 0.60798 | 0.60798 | 0.0 | 84.98
Neigh | 0.080307 | 0.080307 | 0.080307 | 0.0 | 11.22
Comm | 0.010035 | 0.010035 | 0.010035 | 0.0 | 1.40
Output | 0.0047811 | 0.0047811 | 0.0047811 | 0.0 | 0.67
Modify | 0.010413 | 0.010413 | 0.010413 | 0.0 | 1.46
Other | | 0.001916 | | | 0.27
Nlocal: 4000.00 ave 4000 max 4000 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Nghost: 5499.00 ave 5499 max 5499 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Neighs: 151513.0 ave 151513 max 151513 min
Histogram: 1 0 0 0 0 0 0 0 0 0
Total # of neighbors = 151513
Ave neighs/atom = 37.878250
Neighbor list builds = 12
Dangerous builds not checked
Total wall time: 0:00:02
Try running LAMMPS with MPI
mpirun -np 4 lmp -in in.melt
LAMMPS (10 Mar 2021)
OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/src/comm.cpp:94)
using 1 OpenMP thread(s) per MPI task
Lattice spacing in x,y,z = 1.6795962 1.6795962 1.6795962
Created orthogonal box = (0.0000000 0.0000000 0.0000000) to (16.795962 16.795962 16.795962)
1 by 1 by 5 MPI processor grid
Created 4000 atoms
create_atoms CPU = 0.002 seconds
Neighbor list info …
update every 20 steps, delay 0 steps, check no
max neighbors/atom: 2000, page size: 100000
master list distance cutoff = 2.8
ghost atom cutoff = 2.8
binsize = 1.4, bins = 12 12 12
1 neighbor lists, perpetual/occasional/extra = 1 0 0
(1) pair lj/cut, perpetual
attributes: half, newton on
pair build: half/bin/atomonly/newton
stencil: half/bin/3d/newton
bin: standard
Setting up Verlet run ...
Unit style : lj
Current step : 0
Time step : 0.005
Per MPI rank memory allocation (min/avg/max) = 2.735 | 2.755 | 2.760 Mbytes
Step Temp E_pair E_mol TotEng Press
0 3 -6.7733681 0 -2.2744931 -3.7033504
50 1.6892453 -4.8154864 0 -2.2822519 5.5409013
100 1.6610305 -4.7717142 0 -2.2807913 5.731998
150 1.6549075 -4.7625049 0 -2.2807643 5.8132757
200 1.6297199 -4.7245501 0 -2.2805815 5.97328
250 1.6319279 -4.7282494 0 -2.2809695 5.9259169
Loop time of 0.202217 on 5 procs for 250 steps with 4000 atoms
Performance: 534080.130 tau/day, 1236.297 timesteps/s
99.5% CPU use with 5 MPI tasks x 1 OpenMP threads
MPI task timing breakdown:
Section | min time | avg time | max time |%varavg| %total
—————————————————————
Pair | 0.14526 | 0.15024 | 0.16138 | 1.5 | 74.30
Neigh | 0.019629 | 0.020201 | 0.020766 | 0.3 | 9.99
Comm | 0.016437 | 0.028303 | 0.033698 | 3.7 | 14.00
Output | 0.00013851 | 0.0001916 | 0.00025488 | 0.0 | 0.09
Modify | 0.0024797 | 0.002548 | 0.0027186 | 0.2 | 1.26
Other | | 0.0007309 | | | 0.36
Nlocal: 800.000 ave 810 max 791 min
Histogram: 1 0 1 1 0 0 0 1 0 1
Nghost: 3008.00 ave 3019 max 2990 min
Histogram: 1 0 0 0 0 1 1 0 1 1
Neighs: 30321.0 ave 31356 max 29290 min
Histogram: 1 0 0 0 2 1 0 0 0 1
Total # of neighbors = 151605
Ave neighs/atom = 37.901250
Neighbor list builds = 12
Dangerous builds not checked
Total wall time: 0:00:00
Try running LAMMPS with Slurm Script
#!/bin/sh
#SBATCH -N 1 # specify number of nodes
#SBATCH --ntasks-per-node=10 # specify number of cores per node
#SBATCH --time=00:10:00 # specify maximum duration of run
#SBATCH --job-name=lammps # specify job name
#SBATCH --error=lammps.%J.err # specify error file name
#SBATCH --output=lammps.%J.out # specify output file name
#SBATCH --partition=cpu # specify type of resource such as CPU/GPU/High memory etc.
### Load the necessary modules and environment for running
ulimit -s unlimited
. /home/apps/spack/share/spack/setup-env.sh
spack load intel-oneapi-compilers@2021.3.0 /zgrpvbj
spack load intel-oneapi-mpi@2021.3.0 /ceebwul
spack load lammps@20210310
#Change to the directory where the input files are located
cd /home/cdacapp/Application-Data/lammps/lammps-10Mar21/examples/crack/
### Run the mpi program with mpirun
mpirun -np 4 lmp -in in.crack
Command to run the script:
sbatch <script_name>
sbatch lammps.sh
After a successful run, two files will be created, as specified in the script:
1. Output file (the job output can be checked here)
2. Error file (refer to this file for error checking)
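If the LAMMPS build includes the +openmp variant (as in the install spec earlier in this section), a hybrid MPI + OpenMP run is also possible; a minimal sketch:
export OMP_NUM_THREADS=4
mpirun -np 4 lmp -sf omp -pk omp 4 -in in.melt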
- Running Quantum ESPRESSO
- Setting Environment
spack load quantum-espresso@7.0
- Obtaining Benchmarks
This is a larger case suitable for scaling on large distributed systems.
It can be obtained from:
https://repository.praceri.eu/git/UEABS/ueabs/tree/master/quantum_espresso/test_cases/medium
cd $pwd/quantum-espresso/ueabs/quantum_espresso/test_cases
- Running test_cases
time mpirun -n 4 pw.x -npool 2 -in ausurf.in
Dense grid: 2158381 G-vectors FFT dimensions: ( 180, 90, 288)
Smooth grid: 763307 G-vectors FFT dimensions: ( 125, 64, 200)
Estimated max dynamical RAM per process > 5.74 GB
Estimated total dynamical RAM > 22.44 GB
Initial potential from superposition of free atoms
starting charge 1230.6995, renormalised to 1232.0000
negative rho (up, down): 3.043E+00 0.000E+00
Starting wfcs are 1008 randomized atomic wfcs
total cpu time spent up to now is 84.0 secs
per-process dynamical memory: 2077.5 Mb
Self-consistent Calculation
iteration # 1 ecut= 25.00 Ry beta= 0.70
Davidson diagonalization with overlap
ethr = 1.00E-02, avg # of iterations = 5.0
Threshold (ethr) on eigenvalues was too large:
Diagonalizing with lowered threshold
Davidson diagonalization with overlap
c_bands: 3 eigenvalues not converged
ethr = 4.37E-04, avg # of iterations = 20.0
negative rho (up, down): 2.992E+00 0.000E+00
total cpu time spent up to now is 556.2 secs
total energy = -11423.48950106 Ry
estimated scf accuracy < 6.31679869 Ry
iteration # 2 ecut= 25.00 Ry beta= 0.70
Davidson diagonalization with overlap
ethr = 5.13E-04, avg # of iterations = 16.0
negative rho (up, down): 2.993E+00 0.000E+00
total cpu time spent up to now is 996.1 secs
total energy = -11408.37866024 Ry
estimated scf accuracy < 196.20589225 Ry
iteration # 3 ecut= 25.00 Ry beta= 0.70
Davidson diagonalization with overlap
ethr = 5.13E-04, avg # of iterations = 10.5
negative rho (up, down): 3.038E+00 0.000E+00
total cpu time spent up to now is 1363.2 secs
total energy = -11426.41161046 Ry
estimated scf accuracy < 5.03428114 Ry
iteration # 4 ecut= 25.00 Ry beta= 0.70
Davidson diagonalization with overlap
ethr = 4.09E-04, avg # of iterations = 2.5
negative rho (up, down): 3.046E+00 0.000E+00
total cpu time spent up to now is 1547.2 secs
total energy = -11426.62748943 Ry
estimated scf accuracy < 4.18858357 Ry
iteration # 5 ecut= 25.00 Ry beta= 0.70
Davidson diagonalization with overlap
ethr = 3.40E-04, avg # of iterations = 2.0
negative rho (up, down): 3.061E+00 0.000E+00
total cpu time spent up to now is 1729.5 secs
total energy = -11426.86607226 Ry
estimated scf accuracy < 4.92165230 Ry
iteration # 6 ecut= 25.00 Ry beta= 0.70
Davidson diagonalization with overlap
ethr = 3.40E-04, avg # of iterations = 1.0
negative rho (up, down): 3.070E+00 0.000E+00
total cpu time spent up to now is 1883.4 secs
total energy = -11427.08074058 Ry
estimated scf accuracy < 0.09057940 Ry
iteration # 7 ecut= 25.00 Ry beta= 0.70
Davidson diagonalization with overlap
c_bands: 2 eigenvalues not converged
ethr = 7.35E-06, avg # of iterations = 20.0
negative rho (up, down): 3.079E+00 0.000E+00
total cpu time spent up to now is 2207.5 secs
total energy = -11427.07323221 Ry
estimated scf accuracy < 0.20864446 Ry
iteration # 8 ecut= 25.00 Ry beta= 0.70
Davidson diagonalization with overlap
ethr = 7.35E-06, avg # of iterations = 16.5
negative rho (up, down): 3.080E+00 0.000E+00
total cpu time spent up to now is 2448.4 secs
total energy = -11427.08884909 Ry
estimated scf accuracy < 0.09443612 Ry
iteration # 9 ecut= 25.00 Ry beta= 0.70
Davidson diagonalization with overlap
ethr = 7.35E-06, avg # of iterations = 3.0
negative rho (up, down): 3.082E+00 0.000E+00
total cpu time spent up to now is 2631.2 secs
total energy = -11427.09314220 Ry
estimated scf accuracy < 0.01256330 Ry
iteration # 10 ecut= 25.00 Ry beta= 0.70
Davidson diagonalization with overlap
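The -npool option splits the MPI ranks into pools that work on different k-points; it must divide the total rank count. A sketch of scaling the same test case to more ranks (values are illustrative):
time mpirun -n 16 pw.x -npool 2 -in ausurf.in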
Benchmarking on multiple nodes
For 4 nodes
Script to run Quantum ESPRESSO on 4 nodes
#!/bin/bash
#SBATCH --job-name=qe_test
#SBATCH -o qe_out%j.out
#SBATCH -e qe_err%j.err
#SBATCH -N 4
#SBATCH --ntasks-per-node=4
echo -e '\n submitted Quantum Espresso job'
echo 'hostname'
hostname
# loads Open MPI and Quantum Espresso modules
#module load openmpi/gcc
#module load qe
source /home/apps/spack/share/spack/setup-env.sh
spack load quantum-espresso@7.0
# run Quantum Espresso using Open MPI's mpirun
# results will be printed to output.file
#command to run – time mpirun -n 4 pw.x -npool 2 -in <input file>
time mpirun -n 4 pw.x -npool 2 -in /home/cdacapp/Application-Data/quantum-espresso/ueabs/quantum_espresso/test_cases/small/ausurf.in
Command to run the script:
sbatch <script_name>
sbatch job.sh
After a successful run, two files will be created, as specified in the script:
1. Output file (the job output can be checked here)
2. Error file (refer to this file for error checking)
- Install OpenFOAM using Spack
- Check the package availability
spack find openfoam
- Set environment
spack load openfoam@2106
- Running OpenFOAM
- Obtaining Benchmarks
# Obtaining Benchmark: motorBike 2M cells
wget http://openfoamwiki.net/images/6/62/Motorbike_bench_template.tar.gz
tar -xzvf Motorbike_bench_template.tar.gz
cd bench_template
- Run the script – run.sh
#!/bin/bash
# Prepare cases
for i in 1 2 4 6 8 12 16 20 24; do
d=run_$i
echo "Prepare case ${d}..."
cp -r basecase $d
cd $d
if [ $i -eq 1 ]
then
mv Allmesh_serial Allmesh
fi
sed -i "s/method.*/method scotch;/" system/decomposeParDict
sed -i "s/numberOfSubdomains.*/numberOfSubdomains ${i};/" system/decomposeParDict
time ./Allmesh
cd ..
done
# Run cases
for i in 1 2 4 6 8 12 16 20 24; do
echo "Run for ${i}..."
cd run_$i
if [ $i -eq 1 ]
then
simpleFoam > log.simpleFoam 2>&1
else
mpirun -np ${i} simpleFoam -parallel > log.simpleFoam 2>&1
fi
cd ..
done
sbatch run.sh
- An output file will be generated, e.g. slurm-39266.out
- To check the status of the running job, type:
squeue --me
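To compare the runs, the final ExecutionTime reported in each solver log (log.simpleFoam, written by run.sh above) can be collected; a minimal sketch:
for i in 1 2 4 6 8 12 16 20 24; do
printf "%-8s " "run_$i"
grep "ExecutionTime" run_$i/log.simpleFoam | tail -1
done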
- Install cp2k from spack
- Running cp2k
- Set environment
spack load cp2k@8.2
- Benchmarking
- Download benchmarking data from
wget https://repository.praceri.eu/git/UEABS/ueabs/tree/master/quantum_espresso/test_cases/medium
- cd cp2k/benchmarks/TestCaseC_H2O-DFT-LS
- Command to run test_case
mpirun -np 128 <cp2k executable> -i <input_file.in>
mpirun -np 128 cp2k.psmp -i H2O-DFT-LS.inp
Sample Output
ATOMIC KIND INFORMATION
- Atomic kind: O Number of atoms: 2048
Orbital Basis Set DZVP-MOLOPT-SR-GTH
Number of orbital shell sets: 1
Number of orbital shells: 5
Number of primitive Cartesian functions: 5
Number of Cartesian basis functions: 14
Number of spherical basis functions: 13
Norm type: 2
Normalised Cartesian orbitals:
Set Shell Orbital Exponent Coefficient
1 1 2s 10.389228 0.396646
3.849621 0.208811
1.388401 -0.301641
0.496955 -0.274061
0.162492 -0.033677
1 2 3s 10.389228 0.303673
3.849621 0.240943
1.388401 -0.313066
0.496955 -0.043055
0.162492 0.213991
1 3 3px 10.389228 -1.530415
3.849621 -1.371928
1.388401 -0.761951
0.496955 -0.253695
0.162492 -0.035541
1 3 3py 10.389228 -1.530415
3.849621 -1.371928
1.388401 -0.761951
0.496955 -0.253695
0.162492 -0.035541
1 3 3pz 10.389228 -1.530415
3.849621 -1.371928
1.388401 -0.761951
0.496955 -0.253695
0.162492 -0.035541
1 4 4px 10.389228 -0.565392
.
.
.
.
PW_GRID| Information for grid number 1
PW_GRID| Grid distributed over 128 processors
PW_GRID| Real space group dimensions 128 1
PW_GRID| the grid is blocked: NO
PW_GRID| Cutoff [a.u.] 150.0
PW_GRID| spherical cutoff: NO
PW_GRID| Bounds 1 -210 209 Points: 420
PW_GRID| Bounds 2 -210 209 Points: 420
PW_GRID| Bounds 3 -210 209 Points: 420
PW_GRID| Volume element (a.u.^3) 0.5576E-02 Volume (a.u.^3) 413100.3686
PW_GRID| Grid span FULLSPACE
PW_GRID| Distribution Average Max Min
PW_GRID| G-Vectors 578812.5 579180 578760
PW_GRID| G-Rays 1378.1 1379 1378
PW_GRID| Real Space Points 578812.5 705600 529200
.
.
.
Total Electron Density at R=0: 0.425028
Re-scaling the density matrix to get the right number of electrons
# Electrons Trace(P) Scaling factor
16384 16384.000 1.000
Energy with the initial guess: -34723.021315705
.
.
.
Benchmarking on multiple nodes
For 4 nodes
Script to run CP2K on 4 nodes
#!/bin/bash
#SBATCH --job-name=cp2k_test
#SBATCH -o cp2k_out%j.out
#SBATCH -e cp2k_err%j.err
#SBATCH -N 4
#SBATCH --ntasks-per-node=4
source /home/apps/spack/share/spack/setup-env.sh
spack load cp2k@8.2
#srun cp2k.psmp test.inp
mpirun -np 128 cp2k.psmp -i /home/cdacapp/Application-Data/quantum-espresso/ueabs/cp2k/benchmarks/TestCaseC_H2O-DFT-LS/H2O-DFT-LS.inp
Command to run the script:
sbatch <script_name>
sbatch cp2k.sh
After a successful run, two files will be created, as specified in the script:
1. Output file (the job output can be checked here)
2. Error file (refer to this file for error checking)
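cp2k.psmp is the MPI + OpenMP build of CP2K, so MPI ranks can be traded for OpenMP threads; a sketch keeping ranks x threads equal to the 128 cores used above (values are illustrative):
export OMP_NUM_THREADS=4
mpirun -np 32 cp2k.psmp -i H2O-DFT-LS.inp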
- Install NWChem using Spack
- Check the package availability
spack find nwchem
- Set environment
spack load nwchem@7.0.2
- Running NWChem
- Obtaining Benchmarks
wget https://nwchemgit.github.io/c240_631gs.nw
cd <input file directory>
- Command to run test_case
- mpirun -np $NP --bind-to core --map-by ppr:$ITE:$RESOURCE:pe=$PE <executable file name along with path to executable file> input_file
- mpirun -np $NP --bind-to core --map-by ppr:$ITE:$RESOURCE:pe=$PE /home/apps/spack/opt/spack/linux-centos7-cascadelake/intel-2021.3.0/nwchem-7.0.2-deq5wvlvropbxxgxbrnvhoktalam4wcu/bin/nwchem c240_631gs.nw
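The variables in the command above are placeholders: $NP is the total number of MPI processes, $ITE the number of processes per resource, $RESOURCE the hardware resource used by --map-by ppr, and $PE the number of cores bound to each process. A sketch with illustrative values (adjust them to your node layout):
NP=32 # total MPI processes
ITE=16 # processes per resource (here: per node)
RESOURCE=node # resource type used by --map-by ppr
PE=2 # cores bound to each process
mpirun -np $NP --bind-to core --map-by ppr:$ITE:$RESOURCE:pe=$PE nwchem c240_631gs.nw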
Output
Sum of atomic energies: -9038.46099563
Non-variational initial energy
——————————
Total energy = -9227.773601
1-e energy = -155477.628661
2-e energy = 74518.540306
HOMO = -0.162636
LUMO = -0.107661
Time after variat. SCF: 693.1
Time prior to 1st pass: 694.3
.
.
.
- Installation of CloverLeaf
- Command to install CloverLeaf:
spack install -v -j30 cloverleaf@1.1 build=mpi_only %gcc@12.2.0 ^openmpi@4.1.4
- Check the package availability
spack find cloverleaf
- Set environment
- spack load cloverleaf@1.1
- Running CloverLeaf
- Obtaining Benchmarks
- Command for running:
- export OMP_NUM_THREADS=1
- mpirun -np 40 <executable file> -i <input-file>
- mpirun -np 40 /home/apps/spack/opt/spack/linux-centos7-cascadelake/gcc-12.2.0/cloverleaf-1.1-bvvsa5dlbmit4xckpjn4kaux2t52zv7c/bin/clover_leaf -i clover_bm1024_short.in
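A sketch of a batch wrapper for the same run, mirroring the other Slurm scripts in this guide (job, file and partition names are illustrative):
#!/bin/bash
#SBATCH --job-name=cloverleaf_test
#SBATCH -N 1
#SBATCH --ntasks-per-node=40
#SBATCH --partition=cpu
#SBATCH --output=cloverleaf.%J.out
#SBATCH --error=cloverleaf.%J.err
source /home/apps/spack/share/spack/setup-env.sh
spack load cloverleaf@1.1
export OMP_NUM_THREADS=1
mpirun -np 40 clover_leaf -i clover_bm1024_short.in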
Benchmarking
- HPCG
- Installation of HPCG
- spack install -v -j30 hpcg@3.1%gcc@12.2.0
- Check the package availability
- spack find hpcg
- Set environment
- spack load hpcg@3.1
- Running HPCG
- Obtaining Benchmarks
- Input file: an input file named hpcg.dat will be created in the bin folder (see the sample hpcg.dat sketch after this list)
- Problem size (line 3)
- Run time in seconds (line 4)
- Command to run hpcg benchmarking
- mpirun -np 32 xhpcg
- Sample Output:
- After execution, a .txt file will be created; refer to it for the output.
- The execution may take a while.
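A sketch of a typical hpcg.dat (the exact header text may differ between releases); line 3 sets the local problem dimensions and line 4 the run time in seconds:
HPCG benchmark input file
Sandia National Laboratories; University of Tennessee, Knoxville
104 104 104
60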
- HPL
- Installation of HPL
- spack install -v -j30 hpl@2.3 +openmp%gcc@12.2.0 ^openblas@0.3.20 ^openmpi@4.1.4
- Check the package availability
- spack find hpl
- Set environment
- spack load hpl@2.3
- Running HPL
- Obtaining Benchmarks
- Create the run script (run_hpl_ccx.sh):
#! /bin/bash
# To load HPL into environment
spack load hpl@2.3%gcc@12.2.0
### performance settings ###
echo 3 > /proc/sys/vm/drop_caches
echo 1 > /proc/sys/vm/compact_memory
echo 0 > /proc/sys/kernel/numa_balancing
echo 'always' > /sys/kernel/mm/transparent_hugepage/enabled
echo 'always' > /sys/kernel/mm/transparent_hugepage/defrag
ldd `which xhpl`
which mpicc
sleep 10
# Run the appfile as root, which specifies 16 processes, each with its own CPU binding for OpenMP
# set the CPU governor to performance
# sudo cpupower frequency-set -g performance
# Verify the knem module is loaded
lsmod | grep -q knem
if [ $? -eq 1 ]; then
echo "Loading knem module..."
sudo modprobe -v knem
fi
mpi_options="--mca mpi_leave_pinned 1 --bind-to none --report-bindings --mca btl self,vader"
mpi_options="$mpi_options --map-by ppr:1:l3cache -x OMP_NUM_THREADS=4 -x OMP_PROC_BIND=TRUE -x OMP_PLACES=cores"
mpirun $mpi_options -app ./appFile_ccx
- The script "run_hpl_ccx.sh" requires two additional files: "appFile_ccx" and "xhpl_ccx.sh".
- appFile_ccx
-np 1 ./xhpl_ccx.sh 0 0-3 4
-np 1 ./xhpl_ccx.sh 0 4-7 4
-np 1 ./xhpl_ccx.sh 0 8-11 4
-np 1 ./xhpl_ccx.sh 0 12-15 4
-np 1 ./xhpl_ccx.sh 1 16-19 4
-np 1 ./xhpl_ccx.sh 1 20-23 4
-np 1 ./xhpl_ccx.sh 1 24-27 4
-np 1 ./xhpl_ccx.sh 1 28-31 4
-np 1 ./xhpl_ccx.sh 2 32-35 4
-np 1 ./xhpl_ccx.sh 2 36-39 4
-np 1 ./xhpl_ccx.sh 2 40-43 4
-np 1 ./xhpl_ccx.sh 2 44-47 4
-np 1 ./xhpl_ccx.sh 3 48-51 4
-np 1 ./xhpl_ccx.sh 3 52-55 4
-np 1 ./xhpl_ccx.sh 3 56-59 4
-np 1 ./xhpl_ccx.sh 3 60-63 4
-np 1 ./xhpl_ccx.sh 4 64-67 4
-np 1 ./xhpl_ccx.sh 4 68-71 4
-np 1 ./xhpl_ccx.sh 4 72-75 4
-np 1 ./xhpl_ccx.sh 4 76-79 4
-np 1 ./xhpl_ccx.sh 5 80-83 4
-np 1 ./xhpl_ccx.sh 5 84-87 4
-np 1 ./xhpl_ccx.sh 5 88-91 4
-np 1 ./xhpl_ccx.sh 5 92-95 4
-np 1 ./xhpl_ccx.sh 6 96-99 4
-np 1 ./xhpl_ccx.sh 6 100-103 4
-np 1 ./xhpl_ccx.sh 6 104-107 4
-np 1 ./xhpl_ccx.sh 6 108-111 4
-np 1 ./xhpl_ccx.sh 7 112-115 4
-np 1 ./xhpl_ccx.sh 7 116-119 4
-np 1 ./xhpl_ccx.sh 7 120-123 4
-np 1 ./xhpl_ccx.sh 7 124-127 4
- xhpl_ccx.sh
#! /bin/bash
#
# Bind memory to node $1 and four child threads to CPUs specified in $2
export OMP_NUM_THREADS=$3
export GOMP_CPU_AFFINITY="$2"
export OMP_PROC_BIND=TRUE
# BLIS_JC_NT=1 (No outer loop parallelization):
export BLIS_JC_NT=1
# BLIS_IC_NT= #cores/ccx (# of 2nd level threads – one per core in the shared L3 cache domain):
export BLIS_IC_NT=$OMP_NUM_THREADS
# BLIS_JR_NT=1 (No 4th level threads):
export BLIS_JR_NT=1
# BLIS_IR_NT=1 (No 5th level threads):
export BLIS_IR_NT=1
numactl --membind=$1 /home/apps/spack/opt/spack/linux-centos7-cascadelake/gcc-12.2.0/hpl-2.3-gneaaoilemgmzmx4kuxaciqvxm7iixav/bin/xhpl
- HPL.dat is the data file provided with the installation in the bin directory (see the abridged sketch after the run command below).
- Command to run benchmarking
- Run the run script – run_hpl_ccx.sh
- mpirun -np 4 <executable-file>
- mpirun -np 4 /home/apps/spack/opt/spack/linux-centos7-cascadelake/gcc-12.2.0/hpl-2.3-gneaaoilemgmzmx4kuxaciqvxm7iixav/bin/xhpl
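Performance is controlled mainly by the problem size (N), block size (NB) and process grid (P x Q) in HPL.dat; an abridged sketch of the relevant lines (values are illustrative, and P x Q must equal the number of MPI ranks):
1            # of problems sizes (N)
40000        Ns
1            # of NBs
192          NBs
1            # of process grids (P x Q)
2            Ps
2            Qs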
- Sample Output
HPL_pdgesv() end time Wed Nov 16 17:23:27 2022
——————————————————————————–
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 2.11745643e-02 …… PASSED
================================================================================
T/V N NB P Q Time Gflops
——————————————————————————–
WR00R2L4 29 1 2 2 0.00 9.5141e-02
HPL_pdgesv() start time Wed Nov 16 17:23:27 2022
HPL_pdgesv() end time Wed Nov 16 17:23:27 2022
.
.
.
.
——————————————————————————–
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 2.22409445e-02 …… PASSED
================================================================================
Finished 864 tests with the following results:
864 tests completed and passed residual checks,
0 tests completed and failed residual checks,
0 tests skipped because of illegal input values.
——————————————————————————–
End of Tests.
- STREAM
- Installation of Stream
- spack install -v -j30 stream@5.10%intel@2021.6.0
- Check the package availability
- spack find stream
- Set environment
- spack load stream@5.10%intel@2021.6.0
- Running STREAM
- Obtaining Benchmarks
Successful installation of the package provides two executables.
- Command to run the benchmark (run either executable directly):
- the C executable
- the Fortran executable
- Sample Output
———————————————-
Double precision appears to have 16 digits of accuracy
Assuming 8 bytes per DOUBLE PRECISION word
———————————————-
———————————————-
STREAM Version $Revision: 5.6 $
———————————————-
Array size = 20000000
Offset = 0
The total memory requirement is 457 MB
You are running each test 10 times
—
The *best* time for each test is used
*EXCLUDING* the first and last iterations
———————————————-
Number of Threads = 40
———————————————-
Printing one line per active thread….
Printing one line per active thread….
Printing one line per active thread….
Printing one line per active thread….
Printing one line per active thread….
.
.
.
.
Your clock granularity/precision appears to be 1 microseconds
—————————————————-
Function Rate (MB/s) Avg time Min time Max time
Copy: ********** 0.0031 0.0026 0.0034
Scale: ********** 0.0027 0.0026 0.0028
Add: ********** 0.0039 0.0038 0.0041
Triad: ********** 0.0036 0.0035 0.0038
—————————————————-
Solution Validates!
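STREAM is OpenMP-threaded, so the thread count can be pinned before the run; a sketch (the executable name is an assumption; use whichever binary spack load stream puts on your PATH):
export OMP_NUM_THREADS=40
stream_c.exe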
- NAMD
- Obtaining Benchmarks
# STMV
$ wget --no-check-certificate https://www.ks.uiuc.edu/Research/namd/utilities/stmv/par_all27_prot_na.inp
$ wget --no-check-certificate https://www.ks.uiuc.edu/Research/namd/utilities/stmv/stmv.namd
$ wget --no-check-certificate https://www.ks.uiuc.edu/Research/namd/utilities/stmv/stmv.pdb.gz
$ wget --no-check-certificate https://www.ks.uiuc.edu/Research/namd/utilities/stmv/stmv.psf.gz
$ gunzip stmv.psf.gz
$ gunzip stmv.pdb.gz
Batch Script for running namd on Cluster (CPU)
#!/bin/bash
#SBATCH --job-name=namd #Job name
#SBATCH --nodes=1 #Number of nodes requested
#SBATCH --ntasks-per-node=4 #Number of processes per node
#SBATCH --time=00:30:00 #Maximum time limit for job
#SBATCH --partition=cpu #Cluster partition setup
#SBATCH --error=namd%J.err #specify error file name
#SBATCH --output=namd%J.out #specify output file name
# Load the necessary modules and environment for running namd
. /home/apps/spack/share/spack/setup-env.sh
spack load namd@2.14 #load the namd package from spack
module load openmpi/3.1.5 #load openmpi from available module
# Change to the directory where the input files are located
cd /home/cdacapp/Application-Data/namd/apoa1/
# Run NAMD
mpirun -np $SLURM_NTASKS namd2 apoa1.namd
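The script above runs the apoa1 case; to run the STMV benchmark downloaded earlier, change to the directory holding the stmv files and point namd2 at stmv.namd (a sketch; the path is a placeholder):
cd /path/to/stmv
mpirun -np $SLURM_NTASKS namd2 stmv.namd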