
Applications User Guide

Spack

Spack is a package manager for supercomputers, Linux, and macOS. It makes installing scientific software easier. Spack packages are installation scripts, essentially recipes for building (and testing) software.

Setting up the Spack environment

$ module load spack
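Once the spack module is loaded, a typical workflow looks like the following (a minimal sketch; the package name zlib is only an illustration):

spack list zlib        # search the available package recipes
spack install zlib     # build and install the package
spack find zlib        # confirm that it is installed
spack load zlib        # add it to the current shell environment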

 

  1. How to check the compilers available on the machine and installed through Spack?

spack compilers

 

  2. How to find compilers, library packages, and applications installed on the machine?

If the package is available as a module file, use the commands below to find and load it into the user environment.

$ module avail   (or module av; lists the available modules)

$ module load package_name   (loads the package into the user environment)

 

  3. How to find the installation path of a package?

$ spack find --paths <package-name>

 

Load Intel compiler

spack load intel-oneapi-compilers

Matching packages:
    dc7udtm intel-oneapi-compilers@2021.2.0%gcc@11.2.0 arch=linux-centos7-cascadelake
    toomj3b intel-oneapi-compilers@2021.3.0%gcc@8.3.0 arch=linux-centos7-skylake_avx512
    zgrpvbj intel-oneapi-compilers@2021.3.0%gcc@11.2.0 arch=linux-centos7-cascadelake
    forrfki intel-oneapi-compilers@2021.4.0%gcc@11.2.0 arch=linux-centos7-cascadelake
    ksraq7r intel-oneapi-compilers@2022.0.1%gcc@11.2.0 arch=linux-centos7-cascadelake
    sooevvg intel-oneapi-compilers@2022.1.0%gcc@12.1.0 arch=linux-centos7-cascadelake
Use a more specific spec (e.g., prepend '/' to the hash).

 

spack load /toomj3b

(use the hash of the specific build you require)
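For example, to list the short hash of every installed build and then load a specific one (a minimal sketch):

spack find -l intel-oneapi-compilers    # -l prints the short hash in front of each spec
spack load /toomj3b                     # load the build identified by that hash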

 

Check the loaded compilers

Intel - which icc

MPI  - which mpirun

GCC  - which gcc

 

Check installed applications using Spack

spack find <package-name>

  • Install GROMACS using Spack
  • Check the package availability

spack find gromacs

 

  • Set environment

spack load gromacs@2021.2

 

Check the availability of the MPI-enabled GROMACS binary

gmx_mpi
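A couple of quick checks confirm that the MPI-enabled binary is on the PATH (a minimal sketch):

which gmx_mpi          # path of the loaded GROMACS binary
gmx_mpi --version      # print version and build information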

 

  • Get dataset for benchmarking

wget --no-check-certificate http://ftp.gromacs.org/pub/benchmarks/water_GMX50_bare.tar.gz

 

  • Untar the file

tar -xvf water_GMX50_bare.tar.gz

cd water-cut1.0_GMX50_bare/

cd 3072/

 

  • Running GROMACS

gmx_mpi grompp -f pme.mdp -c conf.gro -p topol.top -o water_pme.tpr


 

gmx_mpi mdrun -nsteps 40 -s water_pme.tpr

 

 

  • Sample Output

               Core t (s)   Wall t (s)        (%)
       Time:      638.801       16.002     3992.1
                 (ns/day)    (hour/ns)
Performance:        0.443       54.206

 

  • Benchmarking on multiple nodes

Script to submit the job:

 

 

#!/bin/bash
#SBATCH -N 3                          # number of nodes
#SBATCH --ntasks-per-node=16          # MPI processes per node
#SBATCH --time=15:00:00               # maximum wall time allocated for the job
#SBATCH --job-name=gromacs_test       # job name
#SBATCH --error=gromacs.%J.err        # filename for error file
#SBATCH --output=gromacs.%J.out       # filename for output file
#SBATCH --partition=cpu               # partition cpu/gpu

# setting up unlimited stack space
ulimit -s unlimited

# setting up the Spack environment
source /home/apps/spack/share/spack/setup-env.sh
spack load gromacs@2021.2

time mpirun -np $SLURM_NTASKS gmx_mpi mdrun -ntomp 4 -s water_pme.tpr

 

 

Command to run the script:

sbatch <script_name>

sbatch gromacs.sh

 

After a successful run, two files will be created as specified in the script:

  1. Output file (check the job output here)
  2. Error file (refer to this file for error checking)
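While the job is queued or running, its state can be checked with standard Slurm commands (a minimal sketch; use the job ID reported by sbatch):

squeue --me          # list your queued and running jobs
scancel <job_id>     # cancel the job if required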

 

Batch script for running GROMACS on the cluster (GPU)

 

#!/bin/bash
#SBATCH -N 2                          # number of nodes
#SBATCH --ntasks-per-node=16          # MPI processes per node
#SBATCH --time=15:00:00               # maximum wall time allocated
#SBATCH --job-name=gromacs_test       # job name
#SBATCH --error=gro_gpu.%J.err        # filename for error file
#SBATCH --output=gro_gpu.%J.out       # filename for output file
#SBATCH --partition=gpu               # partition cpu/gpu
#SBATCH --gres=gpu:2                  # number of GPUs per node

# setting up unlimited stack space
ulimit -s unlimited

# setting up the Spack environment
source /home/apps/spack/share/spack/setup-env.sh
spack load gromacs@2022.2%gcc@11.2.0

# program execution
mpirun --oversubscribe -np $SLURM_NTASKS gmx_mpi mdrun -nsteps 40 -s water_pme.tpr
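If you want to make the GPU offload explicit, GROMACS accepts flags such as -nb gpu for the non-bonded work; a hedged variant of the run line (adjust to your build and node layout) could be:

mpirun --oversubscribe -np $SLURM_NTASKS gmx_mpi mdrun -nb gpu -nsteps 40 -s water_pme.tpr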

  • Install LAMMPS using Spack
  • Installation of LAMMPS with Spack (molecule package enabled)

spack install -v -j30 lammps@20220623 +asphere +class2 +kspace +manybody +molecule +mpiio +opt +replica +rigid +granular +openmp ^openmpi ^cmake@3.21.3%gcc@10.3.0
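The variants requested above, and any others the recipe supports, can be inspected with Spack's query commands before or after installation (a minimal sketch):

spack info lammps       # list supported versions and variants (+asphere, +kspace, ...)
spack spec lammps       # show the fully concretized spec that would be built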

  • Set environment
    • Check installed package

spack find lammps


-- linux-centos7-cascadelake / gcc@10.3.0 -----------------------
lammps@20220623

-- linux-centos7-cascadelake / intel@2021.3.0 -------------------
lammps@20210310

==> 2 installed packages

 

Add the package to the user environment

            spack load lammps@20210310

 

  • Check loaded packages in user environment

spack find --loaded

 

  • Benchmarking

Change to the examples/melt directory:

cd Application-Data/lammps/lammps-10Mar21/examples/melt

 

  • Running lammps

lmp -in in.melt

 

LAMMPS (10 Mar 2021)

OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/src/comm.cpp:94)

using 1 OpenMP thread(s) per MPI task

Lattice spacing in x,y,z = 1.6795962 1.6795962 1.6795962

Created orthogonal box = (0.0000000 0.0000000 0.0000000) to (16.795962 16.795962 16.795962)

  1 by 1 by 1 MPI processor grid

Created 4000 atoms

create_atoms CPU = 0.095 seconds

Neighbor list info …

update every 20 steps, delay 0 steps, check no

max neighbors/atom: 2000, page size: 100000

master list distance cutoff = 2.8

ghost atom cutoff = 2.8

binsize = 1.4, bins = 12 12 12

  1 neighbor lists, perpetual/occasional/extra = 1 0 0

  (1) pair lj/cut, perpetual

attributes: half, newton on

pair build: half/bin/atomonly/newton

stencil: half/bin/3d/newton

bin: standard

Setting up Verlet run ...

  Unit style    : lj

  Current step  : 0

  Time step     : 0.005

Per MPI rank memory allocation (min/avg/max) = 3.222 | 3.222 | 3.222 Mbytes

Step Temp E_pair E_mol TotEng Press

       0            3   -6.7733681            0   -2.2744931   -3.7033504

      50    1.6758903   -4.7955425            0   -2.2823355     5.670064

     100    1.6458363   -4.7492704            0   -2.2811332    5.8691042

     150    1.6324555   -4.7286791            0    -2.280608    5.9589514

     200    1.6630725   -4.7750988            0   -2.2811136    5.7364886

     250    1.6275257   -4.7224992            0    -2.281821    5.9567365

Loop time of 0.715434 on 1 procs for 250 steps with 4000 atoms

 

Performance: 150957.395 tau/day, 349.438 timesteps/s

99.4% CPU use with 1 MPI tasks x 1 OpenMP threads

 

MPI task timing breakdown:

Section |  min time  |  avg time  |  max time  |%varavg| %total

---------------------------------------------------------------

Pair    | 0.60798    | 0.60798    | 0.60798    |   0.0 | 84.98

Neigh   | 0.080307   | 0.080307   | 0.080307   |   0.0 | 11.22

Comm    | 0.010035   | 0.010035   | 0.010035   |   0.0 |  1.40

Output  | 0.0047811  | 0.0047811  | 0.0047811  |   0.0 |  0.67

Modify  | 0.010413   | 0.010413   | 0.010413   |   0.0 |  1.46

Other   |            | 0.001916   |            |       |  0.27

 

Nlocal:        4000.00 ave        4000 max        4000 min

Histogram: 1 0 0 0 0 0 0 0 0 0

Nghost:        5499.00 ave        5499 max        5499 min

Histogram: 1 0 0 0 0 0 0 0 0 0

Neighs:        151513.0 ave      151513 max      151513 min

Histogram: 1 0 0 0 0 0 0 0 0 0

 

Total # of neighbors = 151513

Ave neighs/atom = 37.878250

Neighbor list builds = 12

Dangerous builds not checked

Total wall time: 0:00:02

 

Try running LAMMPS with MPI

 

mpirun -np 4 lmp -in in.melt

LAMMPS (10 Mar 2021)

OMP_NUM_THREADS environment is not set. Defaulting to 1 thread. (src/src/comm.cpp:94)

using 1 OpenMP thread(s) per MPI task

Lattice spacing in x,y,z = 1.6795962 1.6795962 1.6795962

Created orthogonal box = (0.0000000 0.0000000 0.0000000) to (16.795962 16.795962 16.795962)

  1 by 1 by 5 MPI processor grid

Created 4000 atoms

create_atoms CPU = 0.002 seconds

Neighbor list info …

update every 20 steps, delay 0 steps, check no

max neighbors/atom: 2000, page size: 100000

master list distance cutoff = 2.8

ghost atom cutoff = 2.8

binsize = 1.4, bins = 12 12 12

  1 neighbor lists, perpetual/occasional/extra = 1 0 0

  (1) pair lj/cut, perpetual

attributes: half, newton on

pair build: half/bin/atomonly/newton

stencil: half/bin/3d/newton

bin: standard

Setting up Verlet run ...

  Unit style    : lj

  Current step  : 0

  Time step     : 0.005

Per MPI rank memory allocation (min/avg/max) = 2.735 | 2.755 | 2.760 Mbytes

Step Temp E_pair E_mol TotEng Press

       0            3   -6.7733681            0   -2.2744931   -3.7033504

      50    1.6892453   -4.8154864            0   -2.2822519    5.5409013

     100    1.6610305   -4.7717142            0   -2.2807913     5.731998

     150    1.6549075   -4.7625049            0   -2.2807643    5.8132757

     200    1.6297199   -4.7245501            0   -2.2805815      5.97328

     250    1.6319279   -4.7282494            0   -2.2809695    5.9259169

Loop time of 0.202217 on 5 procs for 250 steps with 4000 atoms

 

Performance: 534080.130 tau/day, 1236.297 timesteps/s

99.5% CPU use with 5 MPI tasks x 1 OpenMP threads

 

MPI task timing breakdown:

Section |  min time  |  avg time  |  max time  |%varavg| %total

---------------------------------------------------------------

Pair    | 0.14526    | 0.15024    | 0.16138    |   1.5 | 74.30

Neigh   | 0.019629   | 0.020201   | 0.020766   |   0.3 |  9.99

Comm    | 0.016437   | 0.028303   | 0.033698   |   3.7 | 14.00

Output  | 0.00013851 | 0.0001916  | 0.00025488 |   0.0 |  0.09

Modify  | 0.0024797  | 0.002548   | 0.0027186  |   0.2 |  1.26

Other   |            | 0.0007309  |            |       |  0.36

 

Nlocal:        800.000 ave         810 max         791 min

Histogram: 1 0 1 1 0 0 0 1 0 1

Nghost:        3008.00 ave        3019 max        2990 min

Histogram: 1 0 0 0 0 1 1 0 1 1

Neighs:        30321.0 ave       31356 max       29290 min

Histogram: 1 0 0 0 2 1 0 0 0 1

 

Total # of neighbors = 151605

Ave neighs/atom = 37.901250

Neighbor list builds = 12

Dangerous builds not checked

Total wall time: 0:00:00

 

 

Try running LAMMPS with a Slurm script

 

 

#!/bin/sh
#SBATCH -N 1                          # specify number of nodes
#SBATCH --ntasks-per-node=10          # specify number of cores per node
#SBATCH --time=00:10:00               # specify maximum duration of run
#SBATCH --job-name=lammps             # specify job name
#SBATCH --error=lammps.%J.err         # specify error file name
#SBATCH --output=lammps.%J.out        # specify output file name
#SBATCH --partition=cpu               # specify type of resource such as CPU/GPU/high memory

 

### Load the necessary modules and environment for running

 

ulimit -s unlimited

 

. /home/apps/spack/share/spack/setup-env.sh

 

spack load intel-oneapi-compilers@2021.3.0 /zgrpvbj

spack load intel-oneapi-mpi@2021.3.0 /ceebwul

spack load lammps@20210310

 

#Change to the directory where the input files are located

 

cd /home/cdacapp/Application-Data/lammps/lammps-10Mar21/examples/crack/

 

### Run the mpi program with mpirun

 

mpirun -np 4 lmp -in in.crack

 

Command to run the script:

sbatch <script_name>

sbatch lammps.sh

 

After a successful run, two files will be created as specified in the script:

  1. Output file (check the job output here)
  2. Error file (refer to this file for error checking)

  • Running Quantum ESPRESSO
  • Setting the environment

spack load quantum-espresso@7.0

 

  • Obtaining Benchmarks

This is a larger case suitable for scaling on large distributed systems.

Can be obtained from:

https://repository.praceri.eu/git/UEABS/ueabs/tree/master/quantum_espresso/test_cases/medium

 

cd $pwd/quantum-espresso/ueabs/quantum_espresso/test_cases

 

  • Running test_cases

time mpirun -n 4 pw.x -npool 2 -in ausurf.in

 

Dense  grid:  2158381 G-vectors     FFT dimensions: ( 180,  90, 288)

     Smooth grid:   763307 G-vectors     FFT dimensions: ( 125,  64, 200)

     Estimated max dynamical RAM per process >       5.74 GB

     Estimated total dynamical RAM >      22.44 GB

     Initial potential from superposition of free atoms

starting charge    1230.6995, renormalised to    1232.0000

negative rho (up, down):  3.043E+00 0.000E+00

     Starting wfcs are 1008 randomized atomic wfcs

total cpu time spent up to now is       84.0 secs

per-process dynamical memory:  2077.5 Mb

     Self-consistent Calculation

iteration #  1     ecut=    25.00 Ry     beta= 0.70

     Davidson diagonalization with overlap

ethr =  1.00E-02,  avg # of iterations =  5.0

 

     Threshold (ethr) on eigenvalues was too large:

Diagonalizing with lowered threshold

 

     Davidson diagonalization with overlap

c_bands:  3 eigenvalues not converged

ethr =  4.37E-04,  avg # of iterations = 20.0

 

negative rho (up, down):  2.992E+00 0.000E+00

 

total cpu time spent up to now is      556.2 secs

 

total energy              =  -11423.48950106 Ry

estimated scf accuracy    <       6.31679869 Ry

 

iteration #  2     ecut=    25.00 Ry     beta= 0.70

     Davidson diagonalization with overlap

ethr =  5.13E-04,  avg # of iterations = 16.0

 

negative rho (up, down):  2.993E+00 0.000E+00

 

total cpu time spent up to now is      996.1 secs

 

total energy              =  -11408.37866024 Ry

estimated scf accuracy    <     196.20589225 Ry

 

iteration #  3     ecut=    25.00 Ry     beta= 0.70

     Davidson diagonalization with overlap

ethr =  5.13E-04,  avg # of iterations = 10.5

 

negative rho (up, down):  3.038E+00 0.000E+00

 

total cpu time spent up to now is     1363.2 secs

 

total energy              =  -11426.41161046 Ry

estimated scf accuracy    <       5.03428114 Ry

 

iteration #  4     ecut=    25.00 Ry     beta= 0.70

     Davidson diagonalization with overlap

ethr =  4.09E-04,  avg # of iterations =  2.5

 

negative rho (up, down):  3.046E+00 0.000E+00

 

total cpu time spent up to now is     1547.2 secs

 

total energy              =  -11426.62748943 Ry

estimated scf accuracy    <       4.18858357 Ry

 

iteration #  5     ecut=    25.00 Ry     beta= 0.70

     Davidson diagonalization with overlap

ethr =  3.40E-04,  avg # of iterations =  2.0

 

negative rho (up, down):  3.061E+00 0.000E+00

 

total cpu time spent up to now is     1729.5 secs

 

total energy              =  -11426.86607226 Ry

estimated scf accuracy    <       4.92165230 Ry

 

iteration #  6     ecut=    25.00 Ry     beta= 0.70

     Davidson diagonalization with overlap

ethr =  3.40E-04,  avg # of iterations =  1.0

 

negative rho (up, down):  3.070E+00 0.000E+00

 

total cpu time spent up to now is     1883.4 secs

 

total energy              =  -11427.08074058 Ry

estimated scf accuracy    <       0.09057940 Ry

 

iteration #  7     ecut=    25.00 Ry     beta= 0.70

     Davidson diagonalization with overlap

c_bands:  2 eigenvalues not converged

ethr =  7.35E-06,  avg # of iterations = 20.0

 

negative rho (up, down):  3.079E+00 0.000E+00

 

total cpu time spent up to now is     2207.5 secs

 

total energy              =  -11427.07323221 Ry

estimated scf accuracy    <       0.20864446 Ry

 

iteration #  8     ecut=    25.00 Ry     beta= 0.70

     Davidson diagonalization with overlap

ethr =  7.35E-06,  avg # of iterations = 16.5

 

negative rho (up, down):  3.080E+00 0.000E+00

 

total cpu time spent up to now is     2448.4 secs

 

total energy              =  -11427.08884909 Ry

estimated scf accuracy    <       0.09443612 Ry

 

iteration #  9     ecut=    25.00 Ry     beta= 0.70

     Davidson diagonalization with overlap

ethr =  7.35E-06,  avg # of iterations =  3.0

 

negative rho (up, down):  3.082E+00 0.000E+00

 

total cpu time spent up to now is     2631.2 secs

 

total energy              =  -11427.09314220 Ry

estimated scf accuracy    <       0.01256330 Ry

 

iteration # 10     ecut=    25.00 Ry     beta= 0.70

     Davidson diagonalization with overlap

 

 

Benchmarking on multiple nodes

 

For 4 nodes

Script to run Quantum ESPRESSO on 4 nodes:

 

#!/bin/bash

#SBATCH --job-name=qe_test
#SBATCH -o qe_out%j.out
#SBATCH -e qe_err%j.err
#SBATCH -N 4
#SBATCH --ntasks-per-node=4

echo -e '\n submitted Quantum Espresso job'
echo 'hostname'
hostname

# load Open MPI and Quantum Espresso modules
#module load openmpi/gcc
#module load qe
source /home/apps/spack/share/spack/setup-env.sh
spack load quantum-espresso@7.0

# run Quantum Espresso using Open MPI's mpirun
# results will be printed to the output file
# command to run: time mpirun -n 4 pw.x -npool 2 -in <input file>
time mpirun -n 4 pw.x -npool 2 -in /home/cdacapp/Application-Data/quantum-espresso/ueabs/quantum_espresso/test_cases/small/ausurf.in

 

Command to run the script:

sbatch <script_name>

sbatch job.sh

After a successful run, two files will be created as specified in the script:

  1. Output file (check the job output here)
  2. Error file (refer to this file for error checking)

  • Install OpenFOAM using Spack

 

  • Check the package availability

spack find openfoam

 

  • Set environment

spack load openfoam@2106

 

  • Running OpenFOAM
    • Obtaining Benchmarks

# Obtaining Benchmark: motorBike 2M cells

wget http://openfoamwiki.net/images/6/62/Motorbike_bench_template.tar.gz

 

tar -xzvf Motorbike_bench_template.tar.gz

cd bench_template

 

  • Run the script - run.sh

 

#!/bin/bash

# Prepare cases
for i in 1 2 4 6 8 12 16 20 24; do
    d=run_$i
    echo "Prepare case ${d}..."
    cp -r basecase $d
    cd $d
    if [ $i -eq 1 ]
    then
        mv Allmesh_serial Allmesh
    fi
    sed -i "s/method.*/method scotch;/" system/decomposeParDict
    sed -i "s/numberOfSubdomains.*/numberOfSubdomains ${i};/" system/decomposeParDict
    time ./Allmesh
    cd ..
done

# Run cases
for i in 1 2 4 6 8 12 16 20 24; do
    echo "Run for ${i}..."
    cd run_$i
    if [ $i -eq 1 ]
    then
        simpleFoam > log.simpleFoam 2>&1
    else
        mpirun -np ${i} simpleFoam -parallel > log.simpleFoam 2>&1
    fi
    cd ..
done

 

 

sbatch run.sh

 

  • An output file will be generated, e.g. slurm-39266.out
  • To check the status of the running job, type:

squeue --me
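Once a job has finished, its accounting record can also be inspected (a minimal sketch; substitute the real job ID):

sacct -j <job_id> --format=JobID,JobName,Elapsed,State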

  • Install cp2k from spack
  • Running cp2k
    • Set environment

spack load cp2k@8.2

 

  • Benchmarking
    • Download benchmarking data from

wget https://repository.praceri.eu/git/UEABS/ueabs/tree/master/quantum_espresso/test_cases/medium

 

  • cd cp2k/benchmarks/TestCaseC_H2O-DFT-LS

 

  • Command to run test_case

mpirun -np 128 <cp2k executable> -i <input_file.in>

mpirun -np 128 cp2k.psmp -i H2O-DFT-LS.inp

 

Sample Output

ATOMIC KIND INFORMATION

 

  1. Atomic kind: O Number of atoms:    2048

 

     Orbital Basis Set                                        DZVP-MOLOPT-SR-GTH

 

       Number of orbital shell sets:                                           1

       Number of orbital shells:                                               5

       Number of primitive Cartesian functions:                                5

       Number of Cartesian basis functions:                                   14

       Number of spherical basis functions:                                   13

       Norm type:                                                              2

 

Normalised Cartesian orbitals:

 

                        Set   Shell   Orbital            Exponent    Coefficient

 

                          1       1    2s               10.389228       0.396646

                                                         3.849621       0.208811

                                                         1.388401      -0.301641

                                                         0.496955      -0.274061

                                                         0.162492      -0.033677

 

                          1       2    3s               10.389228       0.303673

                                                         3.849621       0.240943

                                                         1.388401      -0.313066

                                                         0.496955      -0.043055

                                                         0.162492       0.213991

 

                          1       3    3px              10.389228      -1.530415

                                                         3.849621      -1.371928

                                                         1.388401      -0.761951

                                                         0.496955      -0.253695

                                                         0.162492      -0.035541

                          1       3    3py              10.389228      -1.530415

                                                         3.849621      -1.371928

                                                         1.388401      -0.761951

                                                         0.496955      -0.253695

                                                         0.162492      -0.035541

                          1       3    3pz              10.389228      -1.530415

                                                         3.849621      -1.371928

                                                         1.388401      -0.761951

                                                         0.496955      -0.253695

                                                         0.162492      -0.035541

 

                          1       4    4px              10.389228      -0.565392

 

.

.

.

.

PW_GRID| Information for grid number                                          1

 PW_GRID| Grid distributed over                                   128 processors

 PW_GRID| Real space group dimensions                                   128    1

 PW_GRID| the grid is blocked:                                                NO

 PW_GRID| Cutoff [a.u.]                                                    150.0

 PW_GRID| spherical cutoff:                                                   NO

 PW_GRID|   Bounds   1           -210     209                Points:         420

 PW_GRID|   Bounds   2           -210     209                Points:         420

 PW_GRID|   Bounds   3           -210     209                Points:         420

 PW_GRID| Volume element (a.u.^3)  0.5576E-02     Volume (a.u.^3)    413100.3686

 PW_GRID| Grid span                                                    FULLSPACE

 PW_GRID|   Distribution                         Average         Max         Min

 PW_GRID|   G-Vectors                           578812.5      579180      578760

 PW_GRID|   G-Rays                                1378.1        1379        1378

 PW_GRID|   Real Space Points                   578812.5      705600      529200

 

.

.

.

Total Electron Density at R=0:                                         0.425028

 Re-scaling the density matrix to get the right number of electrons

                  # Electrons              Trace(P)               Scaling factor

                        16384             16384.000                        1.000

 Energy with the initial guess:    -34723.021315705

.

.

.

 

Benchmarking on multiple nodes

 

For 4 nodes

Script to run CP2K on 4 nodes:

 

#!/bin/bash

#SBATCH --job-name=cp2k_test
#SBATCH -o cp2k_out%j.out
#SBATCH -e cp2k_err%j.err
#SBATCH -N 4
#SBATCH --ntasks-per-node=4

source /home/apps/spack/share/spack/setup-env.sh
spack load cp2k@8.2

#srun cp2k.psmp test.inp
mpirun -np 128 cp2k.psmp -i /home/cdacapp/Application-Data/quantum-espresso/ueabs/cp2k/benchmarks/TestCaseC_H2O-DFT-LS/H2O-DFT-LS.inp

Command to run the script:

sbatch <script_name>

sbatch cp2k.sh

After a successful run, two files will be created as specified in the script:

  1. Output file (check the job output here)
  2. Error file (refer to this file for error checking)

  • Install NWChem using Spack

 

  • Check the package availability

spack find nwchem

 

  • Set environment

spack load nwchem@7.0.2

 

  • Running NWChem
    • Obtaining Benchmarks

wget https://nwchemgit.github.io/c240_631gs.nw

cd <input file directory>

 

  • Command to run test_case (the placeholder variables are illustrated after the example below)
    • mpirun -np $NP --bind-to core --map-by ppr:$ITE:$RESOURCE:pe=$PE <path to executable> <input_file>

 

  • mpirun -np $NP --bind-to core --map-by ppr:$ITE:$RESOURCE:pe=$PE /home/apps/spack/opt/spack/linux-centos7-cascadelake/intel-2021.3.0/nwchem-7.0.2-deq5wvlvropbxxgxbrnvhoktalam4wcu/bin/nwchem c240_631gs.nw
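The placeholders $NP, $ITE, $RESOURCE and $PE are not defined in this guide; the values below are only an illustration of how they might be set for Open MPI's --map-by ppr syntax (assuming nwchem is on the PATH after spack load nwchem@7.0.2):

NP=128              # total number of MPI ranks (illustrative value)
ITE=32              # ranks placed per mapping resource (illustrative value)
RESOURCE=socket     # mapping resource: socket, numa, node, ... (illustrative value)
PE=1                # cores bound to each rank (illustrative value)
mpirun -np $NP --bind-to core --map-by ppr:$ITE:$RESOURCE:pe=$PE nwchem c240_631gs.nw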

 

 

 

 

Output

 Sum of atomic energies:       -9038.46099563

 

      Non-variational initial energy

      ------------------------------

 

 Total energy =   -9227.773601

 1-e energy   = -155477.628661

 2-e energy   =   74518.540306

 HOMO         =      -0.162636

 LUMO         =      -0.107661

 

   Time after variat. SCF:    693.1

   Time prior to 1st pass:    694.3

.

.

.

  • Installation of CloverLeaf

Command to install CloverLeaf:

spack install -v -j30 cloverleaf@1.1 build=mpi_only %gcc@12.2.0 ^openmpi@4.1.4

 

 

  • Check the package availability

spack find cloverleaf

 

 

  • Set environment
    • spack load cloverleaf@1.1

 

  • Running CloverLeaf
    • Obtaining Benchmarks
    • Command for running:
      • export OMP_NUM_THREADS=1
      • mpirun -np 40 <executable file> -i <input-file>
      • mpirun -np 40 /home/apps/spack/opt/spack/linux-centos7-cascadelake/gcc-12.2.0/cloverleaf-1.1-bvvsa5dlbmit4xckpjn4kaux2t52zv7c/bin/clover_leaf -i clover_bm1024_short.in

 

 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++

 

 

 

Benchmarking

 

  1. HPCG
  • Installation of HPCG
    • spack install -v -j30 hpcg@3.1%gcc@12.2.0
  • Check the package availability
    • spack find hpcg
  • Set environment
    • spack load hpcg@3.1

 

  • Running HPCG
    • Obtaining Benchmarks
    • Input file: the input file named hpcg.dat will be created in the bin folder (see the sample below)
      • Problem size - (line 3)
      • Run time - (line 4)
    • Command to run the HPCG benchmark
      • mpirun -np 32 xhpcg
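A stock hpcg.dat looks roughly like the sketch below; line 3 holds the local problem dimensions (nx ny nz) and line 4 the target run time in seconds:

HPCG benchmark input file
Sandia National Laboratories; University of Tennessee, Knoxville
104 104 104
60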

 

  • Sample output:
    • After execution, HPCG creates a .txt results file; refer to it for the output.
    • The execution may take a while.

 

 

 

 

 

  2. HPL
  • Installation of HPL
    • spack install -v -j30 hpl@2.3 +openmp%gcc@12.2.0 ^openblas@0.3.20 ^openmpi@4.1.4
  • Check the package availability
    • spack find hpl

 

  • Set environment
    • spack load hpl@2.3

 

  • Running HPL
    • Obtaining Benchmarks
    • Create the run script (run_hpl_ccx.sh):

#!/bin/bash

# To load HPL into the environment
spack load hpl@2.3%gcc@12.2.0

### performance settings ###
echo 3 > /proc/sys/vm/drop_caches
echo 1 > /proc/sys/vm/compact_memory
echo 0 > /proc/sys/kernel/numa_balancing
echo 'always' > /sys/kernel/mm/transparent_hugepage/enabled
echo 'always' > /sys/kernel/mm/transparent_hugepage/defrag

ldd `which xhpl`
which mpicc
sleep 10

# Run the appfile as root, which specifies 16 processes, each with its own CPU binding for OpenMP
# set the CPU governor to performance
# sudo cpupower frequency-set -g performance

# Verify the knem module is loaded
lsmod | grep -q knem
if [ $? -eq 1 ]; then
    echo "Loading knem module..."
    sudo modprobe -v knem
fi

mpi_options="--mca mpi_leave_pinned 1 --bind-to none --report-bindings --mca btl self,vader"
mpi_options="$mpi_options --map-by ppr:1:l3cache -x OMP_NUM_THREADS=4 -x OMP_PROC_BIND=TRUE -x OMP_PLACES=cores"

mpirun $mpi_options -app ./appFile_ccx

 

  • The script "run_hpl_ccx.sh" requires two additional files: "appFile_ccx" and "xhpl_ccx.sh".
    • appFile_ccx

-np 1 ./xhpl_ccx.sh 0 0-3 4

-np 1 ./xhpl_ccx.sh 0 4-7 4

-np 1 ./xhpl_ccx.sh 0 8-11 4

-np 1 ./xhpl_ccx.sh 0 12-15 4

-np 1 ./xhpl_ccx.sh 1 16-19 4

-np 1 ./xhpl_ccx.sh 1 20-23 4

-np 1 ./xhpl_ccx.sh 1 24-27 4

-np 1 ./xhpl_ccx.sh 1 28-31 4

-np 1 ./xhpl_ccx.sh 2 32-35 4

-np 1 ./xhpl_ccx.sh 2 36-39 4

-np 1 ./xhpl_ccx.sh 2 40-43 4

-np 1 ./xhpl_ccx.sh 2 44-47 4

-np 1 ./xhpl_ccx.sh 3 48-51 4

-np 1 ./xhpl_ccx.sh 3 52-55 4

-np 1 ./xhpl_ccx.sh 3 56-59 4

-np 1 ./xhpl_ccx.sh 3 60-63 4

-np 1 ./xhpl_ccx.sh 4 64-67 4

-np 1 ./xhpl_ccx.sh 4 68-71 4

-np 1 ./xhpl_ccx.sh 4 72-75 4

-np 1 ./xhpl_ccx.sh 4 76-79 4

-np 1 ./xhpl_ccx.sh 5 80-83 4

-np 1 ./xhpl_ccx.sh 5 84-87 4

-np 1 ./xhpl_ccx.sh 5 88-91 4

-np 1 ./xhpl_ccx.sh 5 92-95 4

-np 1 ./xhpl_ccx.sh 6 96-99 4

-np 1 ./xhpl_ccx.sh 6 100-103 4

-np 1 ./xhpl_ccx.sh 6 104-107 4

-np 1 ./xhpl_ccx.sh 6 108-111 4

-np 1 ./xhpl_ccx.sh 7 112-115 4

-np 1 ./xhpl_ccx.sh 7 116-119 4

-np 1 ./xhpl_ccx.sh 7 120-123 4

-np 1 ./xhpl_ccx.sh 7 124-127 4

 

  • xhpl_ccx.sh

#! /bin/bash

#

# Bind memory to node $1 and four child threads to CPUs specified in $2

export OMP_NUM_THREADS=$3

export GOMP_CPU_AFFINITY="$2"

export OMP_PROC_BIND=TRUE

# BLIS_JC_NT=1 (No outer loop parallelization):

export BLIS_JC_NT=1

# BLIS_IC_NT= #cores/ccx (# of 2nd level threads – one per core in the shared L3 cache domain):

export BLIS_IC_NT=$OMP_NUM_THREADS

# BLIS_JR_NT=1 (No 4th level threads):

export BLIS_JR_NT=1

# BLIS_IR_NT=1 (No 5th level threads):

export BLIS_IR_NT=1

numactl --membind=$1 /home/apps/spack/opt/spack/linux-centos7-cascadelake/gcc-12.2.0/hpl-2.3-gneaaoilemgmzmx4kuxaciqvxm7iixav/bin/xhpl

 

  • HPL.dat is the data file available along with the installation package in bin; its key lines are sketched below.
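The first part of a stock HPL.dat looks roughly like the sketch below; the lines most worth tuning are the problem sizes (Ns), the block sizes (NBs) and the process grids (Ps x Qs), where P x Q should equal the number of MPI ranks:

HPLinpack benchmark input file
Innovative Computing Laboratory, University of Tennessee
HPL.out      output file name (if any)
6            device out (6=stdout,7=stderr,file)
4            # of problems sizes (N)
29 30 34 35  Ns
4            # of NBs
1 2 3 4      NBs
0            PMAP process mapping (0=Row-,1=Column-major)
3            # of process grids (P x Q)
2 1 4        Ps
2 4 1        Qs
16.0         threshold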

 

  • Command to run benchmarking
    • Run the run script - run_hpl_ccx.sh
    • mpirun -np 4 <executable-file>

 

  • mpirun -np 4 /home/apps/spack/opt/spack/linux-centos7-cascadelake/gcc-12.2.0/hpl-2.3-gneaaoilemgmzmx4kuxaciqvxm7iixav/bin/xhpl

 

  • Sample Output

 

HPL_pdgesv() end time   Wed Nov 16 17:23:27 2022

 

--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=   2.11745643e-02 ...... PASSED

================================================================================

T/V                N    NB     P     Q               Time                 Gflops

--------------------------------------------------------------------------------

WR00R2L4          29     1     2     2               0.00             9.5141e-02

HPL_pdgesv() start time Wed Nov 16 17:23:27 2022

 

HPL_pdgesv() end time   Wed Nov 16 17:23:27 2022

 

.

.

.

.

--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)=   2.22409445e-02 ...... PASSED

================================================================================

 

Finished    864 tests with the following results:

            864 tests completed and passed residual checks,

              0 tests completed and failed residual checks,

              0 tests skipped because of illegal input values.

--------------------------------------------------------------------------------

 

End of Tests.

 

 

  3. STREAM

 

  • Installation of STREAM
    • spack install -v -j30 stream@5.10%intel@2021.6.0
  • Check the package availability
    • spack find stream

 

  • Set environment
    • spack load stream@5.10%intel@2021.6.0

 

 

  • Running STREAM
    • Obtaining Benchmarks

            A successful installation of the package provides two executables.

 

  • Command to run the benchmark (see the sketch after this list)
    • run the exe built from the C source
    • run the exe built from the Fortran source
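Assuming the Spack build puts executables named stream_c.exe and stream_f.exe on the PATH (the names may differ on your installation), a run could look like this:

spack load stream@5.10%intel@2021.6.0
export OMP_NUM_THREADS=40      # one thread per core; adjust to the node
stream_c.exe                   # C version of the benchmark
stream_f.exe                   # Fortran version of the benchmark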

 

 

  • Sample Output

 

----------------------------------------------
 Double precision appears to have 16 digits of accuracy
 Assuming 8 bytes per DOUBLE PRECISION word
----------------------------------------------
 ----------------------------------------------
 STREAM Version $Revision: 5.6 $
 ----------------------------------------------

 Array size =   20000000

 Offset     =          0

 The total memory requirement is  457 MB

 You are running each test  10 times

 --
 The *best* time for each test is used
 *EXCLUDING* the first and last iterations
 ----------------------------------------------
 Number of Threads =           40
 ----------------------------------------------

 Printing one line per active thread….

 Printing one line per active thread….

 Printing one line per active thread….

 Printing one line per active thread….

 Printing one line per active thread….

.

.

.

.

Your clock granularity/precision appears to be      1 microseconds

 ----------------------------------------------------

Function     Rate (MB/s)  Avg time   Min time  Max time

Copy:      **********      0.0031      0.0026      0.0034

Scale:     **********      0.0027      0.0026      0.0028

Add:       **********      0.0039      0.0038      0.0041

Triad:     **********      0.0036      0.0035      0.0038

 ----------------------------------------------------

 Solution Validates!

Running NAMD

Obtaining Benchmarks

# STMV

$ wget --no-check-certificate https://www.ks.uiuc.edu/Research/namd/utilities/stmv/par_all27_prot_na.inp
$ wget --no-check-certificate https://www.ks.uiuc.edu/Research/namd/utilities/stmv/stmv.namd
$ wget --no-check-certificate https://www.ks.uiuc.edu/Research/namd/utilities/stmv/stmv.pdb.gz
$ wget --no-check-certificate https://www.ks.uiuc.edu/Research/namd/utilities/stmv/stmv.psf.gz

$ gunzip stmv.psf.gz

$ gunzip stmv.pdb.gz

 

Batch script for running NAMD on the cluster (CPU)

 

#!/bin/bash

 

#SBATCH --job-name=namd               # Job name
#SBATCH --nodes=1                     # Number of nodes requested
#SBATCH --ntasks-per-node=4           # Number of processes per node
#SBATCH --time=00:30:00               # Maximum time limit for job
#SBATCH --partition=cpu               # Cluster partition setup
#SBATCH --error=namd%J.err            # specify error file name
#SBATCH --output=namd%J.out           # specify output file name

 

# Load the necessary modules and environment for running namd

 

. /home/apps/spack/share/spack/setup-env.sh

spack load namd@2.14                                              #load the namd package from spack

module load openmpi/3.1.5                             #load openmpi from available module

 

# Change to the directory where the input files are located

 

cd /home/cdacapp/Application-Data/namd/apoa1/

 

# Run NAMD

mpirun -np $SLURM_NTASKS namd2 apoa1.namd
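Save the script under any name (namd.sh is used here only as an example) and submit it with sbatch, as for the other applications:

sbatch namd.sh       # submit the job
squeue --me          # check its state in the queue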