What is Batch?
Most users of Linux systems are familiar with working interactively at the command line. Common interactive tasks include searching for files and directories, directory management (clean-up, reorganization, etc.), file editing, code debugging, running window-based software, using performance tools, and compiling software, among many others.
The initial access to the OSC clusters is through a system of login nodes. These nodes are reserved for light, day-to-day activities: typically editing and creating files, uploading and downloading files, and monitoring jobs submitted to the batch system. Time and memory are limited in the login shells.
The only access to significant resources on the HPC machines is through the batch process.
Consider the following session, in which we compile and run the program [nest.c] on one of OSC's clusters. By default, the command gcc creates an executable file called a.out.
Sequence of interactive commands in a shell on an OSC cluster:
opt-login01:$ cd /home/userid/pathtodir
opt-login01:$ gcc nest.c -lm
opt-login01:$ ./a.out 459 121
For loop counts N=459 M=121 Sum = 306505.406250
opt-login01:$ /bin/rm a.out
A batch file is a script containing Unix commands, which are executed in sequence exactly as they would be if run interactively. Some lines at the beginning of the file give the batch system software the parameters it needs to know. These opening lines are called the header of the batch file. We can 'wrap' the commands from the interactive session above into a file so that it can be submitted as a batch job. In the file called [nest.pbs], the opening lines that begin with #PBS are the header lines. These must precede the rest of the script. The remainder of the script consists of the commands, which are executed sequentially.
#PBS -N nest
#PBS -l walltime=01:00:00
#PBS -j oe
set -x
# Create TMPDIR directory
cd /tmp
mkdir $PBS_JOBID
cd $PBS_JOBID
export TMPDIR=$PWD
# Compile and run code
cd $PBS_O_WORKDIR
gcc nest.c -lm
cp a.out $TMPDIR
cd $TMPDIR
./a.out 459 121
# Remove TMPDIR directory
cd /tmp
rm -rf $TMPDIR
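To the shell itself, the #PBS header lines are ordinary comments; only the batch system reads them. The sketch below illustrates this with a small stand-in script written to a temporary file. The file name pattern and the script contents are hypothetical and chosen only for this demonstration; they are not part of nest.pbs.

```shell
# Write a tiny stand-in job script to a temporary file (hypothetical content).
job_script=$(mktemp /tmp/demo_job.XXXXXX)
cat > "$job_script" <<'EOF'
#PBS -N demo
#PBS -l walltime=00:01:00
#PBS -j oe
echo "commands start here"
EOF

# Count the header lines that the batch system would read.
header_lines=$(grep -c '^#PBS' "$job_script")
echo "header lines: $header_lines"

# Run the same file directly: the shell skips the #PBS lines as comments
# and executes only the commands that follow them.
direct_output=$(sh "$job_script")
echo "$direct_output"

# Clean up the temporary file.
rm -f "$job_script"
```

Running the file directly like this ignores the requested job name and walltime, so it is useful only as a quick check of the commands, not as a substitute for submitting the job.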
The user submits the batch job with the command qsub. The job runs once the batch system starts it, and it ends after the last command in the script has completed.
Submitting a batch script to Torque:
[giuliani.6@cluster ~] qsub nest.pbs
1234567.cluster.mecheng.osu.edu
It is important to note the numeric prefix of the output from qsub. This number is the unique identifier assigned to the batch job. When the job has completed, a log file is created in the directory from which the job was submitted. The file name begins with the job name specified in the script (discussed in the [PBS Header Lines] section), followed by ".o" and the unique job number: nest.o1234567. The file contains the standard output stream from the execution of the script, that is, everything that would have appeared on your monitor if the commands had been run interactively.
[The 'set -x' command in our script asks the shell to echo each command as it is executed. This produces the lines beginning with '+' characters in the log file.]
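As a minimal sketch of the naming rule described above, the following shell fragment derives the log file name from the string that qsub prints. The variable names are illustrative, and the job ID is the sample value from the text; on a cluster it could be captured with something like job_id=$(qsub nest.pbs).

```shell
# Sample qsub output copied from the text (assumed here for illustration).
job_id="1234567.cluster.mecheng.osu.edu"

# Job name as given to "#PBS -N" in the script.
job_name="nest"

# Strip everything from the first dot onward to keep the numeric prefix.
job_number=${job_id%%.*}

# Log file name: job name, then ".o", then the job number.
log_file="${job_name}.o${job_number}"
echo "$log_file"   # prints nest.o1234567
```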
Contents of nest.o1234567:
++ cd /tmp
++ mkdir 131.cluster.mecheng.osu.edu
++ cd 131.cluster.mecheng.osu.edu
++ export TMPDIR=/tmp/131.cluster.mecheng.osu.edu
++ TMPDIR=/tmp/131.cluster.mecheng.osu.edu
++ cd /share/newhome/jegiuliani
++ gcc nest.c -lm
++ cp a.out /tmp/131.cluster.mecheng.osu.edu
++ cd /tmp/131.cluster.mecheng.osu.edu
++ ./a.out 459 121
For loop counts N=459 M=121 Sum=306505.406250
++ cd /tmp
++ rm -rf /tmp/131.cluster.mecheng.osu.edu
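The effect of 'set -x' can be reproduced outside the batch system. The shell writes its trace lines to standard error, and the "#PBS -j oe" header joins standard error with standard output, which is why both appear in the same log file. The sketch below mimics that by redirecting standard error into standard output; the echoed word is arbitrary, and the number of leading '+' characters can differ between shells and nesting levels.

```shell
# Run one command with tracing on, capturing stdout and stderr together.
# The command substitution runs in a subshell, so tracing does not stay
# enabled in the surrounding shell.
trace_output=$( { set -x; echo hello; } 2>&1 )

# The capture holds a trace line ending in "+ echo hello" and then the
# command's own output, "hello".
printf '%s\n' "$trace_output"
```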