Managing Jobs on LINCOMM

One of the main reasons for using LINCOMM is that it can run very long jobs. This article will teach you how to manage such jobs on LINCOMM.

Taken from SSCC's Linstat KB.

Foreground and Background Jobs

Normally when you type a command, it is processed and you see the results (if any) before the cursor returns and you can type a new command. These jobs are said to be running in the foreground. If you put a job in the background, the cursor returns immediately and you can keep giving commands and doing other work while your job is running. When the job finishes, a message will appear on your screen.

If a job is running in the background, it will keep running even if you log out, so you can start a long job before you leave in the evening, log out, and get the results the next morning (or next week, or next month). Just keep track of which LINCOMM server you are using when you start a job, because if you need to manage that job you'll need to return to that server.

What you should not do while a job is running in the background is start another CPU-intensive job.

To run a job in the background, add an ampersand (&) at the end of the command. For example, if you type:

stata -b do myprogram

Stata will start and run myprogram.do in the foreground, so you cannot use your session for anything else until the job is done. On the other hand:

stata -b do myprogram &

runs Stata in the background. The cursor returns immediately and you can do other things while Stata is running your program. When it is done you'll see a message like:

[1]    Done                          stata -b do myprogram
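The same pattern can be used inside a script. The sketch below is written for sh/bash and uses sleep 1 as a stand-in for a long job such as stata -b do myprogram; $! holds the PID of the most recently started background job, and wait pauses until that job finishes. LINCOMM's login shell appears to be tcsh (the ps output in the next section lists tcsh), and in tcsh wait takes no PID and simply waits for all background jobs, so treat this as an illustration rather than a drop-in recipe.

```shell
#!/bin/sh
# Start a stand-in "long job" in the background.
# (sleep 1 is only a placeholder for something like: stata -b do myprogram)
sleep 1 &
job_pid=$!        # $! = PID of the most recently started background job
echo "Started background job, PID $job_pid"

# The shell is free while the job runs; other work can happen here.
echo "Doing other work while the job runs..."

# Block until that specific job finishes, then report its exit status.
wait "$job_pid"
job_status=$?
echo "Job $job_pid finished with exit status $job_status"
```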

Switching Between Foreground and Background

If you have a job running in the foreground and want to put it in the background, press CTRL-z (if the job has opened a separate window, you must return to your main LINCOMM window before pressing CTRL-z). This suspends the current job and gives you your cursor back. A suspended job does not run, so next type bg to resume it in the background. You can also type fg to move the job back to the foreground, either from being suspended or from the background.
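Under the hood, suspending and resuming are done with signals: CTRL-z sends the foreground job a stop signal (SIGTSTP), and bg or fg resume it with SIGCONT. The Linux-only sh/bash sketch below imitates this on a stand-in sleep process so you can watch the state change. It sends SIGSTOP rather than SIGTSTP because SIGSTOP cannot be caught or ignored, which makes the demo behave the same in a script as at a terminal. The proc_state helper, which reads the state letter from /proc, is hypothetical and not a LINCOMM command.

```shell
#!/bin/sh
# Demo of suspend/resume with signals (Linux only: reads /proc).
sleep 30 &                 # stand-in job
demo_pid=$!

# Hypothetical helper: print the state letter from /proc/PID/stat
# (T = stopped, S = sleeping, R = running).
proc_state() { awk '{print $3}' "/proc/$1/stat"; }

kill -STOP "$demo_pid"     # suspend, similar to what CTRL-z does
sleep 1                    # give the signal time to take effect
after_stop=$(proc_state "$demo_pid")
echo "after STOP: $after_stop"    # T: suspended, not running at all

kill -CONT "$demo_pid"     # resume, as bg or fg do
sleep 1
after_cont=$(proc_state "$demo_pid")
echo "after CONT: $after_cont"    # no longer T; sleep is back in its timer

kill "$demo_pid"           # clean up the stand-in job
```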

Monitoring Jobs

The ps command (think "processes") gives you a list of processes you are running on the server. The output will be similar to the following:

  PID TTY          TIME CMD
29413 pts/30   00:00:00 tcsh
 1601 pts/30   00:00:00 emacs
 1602 pts/30   00:00:00 emacs
 1605 pts/30   00:00:00 ps

PID is short for Process IDentifier, and is used when you need to specify a particular job. Keep in mind that LINCOMM is a cluster of four servers, and ps will only show you the jobs you are running on the server you're logged into. See "Switching Between LINCOMM Servers" to learn how to get back to the LINCOMM server where you started your job.

Unfortunately, by default ps only shows processes attached to your current session, so jobs you started in an earlier session will not appear. To see all your jobs from any session, type:

ps aux | grep NetID

where NetID should be replaced by your NetID (e.g. ps aux | grep jdoe). This lists all processes on the server, then filters the list to show only the lines that contain your NetID.
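Because grep matches anywhere on a line, grep jdoe can also pick up other users' processes whose command line happens to mention jdoe (a path such as /home/jdoe/data, say). A slightly stricter sketch, using a hypothetical sh/bash helper called filter_user, keeps the ps header plus only the lines whose first (USER) column is exactly your NetID. It assumes ps aux displays your NetID in full; ps may abbreviate long user names in that column. The standard ps -u NetID, which selects processes by user directly, avoids text matching altogether.

```shell
#!/bin/sh
# Hypothetical helper: from `ps aux`-style input on stdin, print the header
# line plus only the lines whose USER column (field 1) equals $1 exactly.
filter_user() {
    awk -v u="$1" 'NR == 1 || $1 == u'
}

# Typical use (replace jdoe with your NetID):
#   ps aux | filter_user jdoe
```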

Another useful command for monitoring jobs is top. It shows the "top" jobs (in terms of resources used) currently running on the server. With it you can check that your job is actually doing work by confirming that its %CPU is greater than zero. Keep in mind, though, that a job can easily get stuck in a state where it uses CPU without doing anything productive.
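The same check can be made without watching top, by sampling how much CPU time the kernel has charged to a process. This Linux-only sh/bash sketch reads fields 14 and 15 of /proc/PID/stat (user and system time, in clock ticks) twice, a couple of seconds apart; cpu_ticks and is_using_cpu are hypothetical helper names. As with %CPU, a rising count shows that the job is consuming CPU, not that it is making progress.

```shell
#!/bin/sh
# Hypothetical helper: total CPU time (user + system, in clock ticks)
# charged so far to process $1, from fields 14 and 15 of /proc/$1/stat.
# Assumes the command name in field 2 contains no spaces.
cpu_ticks() {
    awk '{print $14 + $15}' "/proc/$1/stat"
}

# Hypothetical helper: succeed (exit 0) if process $1 used any CPU
# during an interval of $2 seconds (default 2).
is_using_cpu() {
    before=$(cpu_ticks "$1") || return 2    # 2 = no such process
    sleep "${2:-2}"
    after=$(cpu_ticks "$1") || return 2
    [ "$after" -gt "$before" ]
}

# Typical use (1602 is the example PID from the ps output above):
#   if is_using_cpu 1602; then echo "using CPU"; else echo "idle or waiting"; fi
```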

top also gives you a sense of how busy the server is. The LINCOMM servers have 24 CPUs, so if %CPU adds up to more than 2400%, programs will have to share the available CPU time. If the LINCOMM server you're on has less CPU time available than your program is capable of using, consider switching to a different LINCOMM server.
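The arithmetic above can be sketched as a small sh/bash filter: add up the %CPU values of all processes and compare the total with 100% times the number of CPUs. cpu_headroom is a hypothetical helper. Feeding it from top -b -n 1, as in the usage line, assumes top's default layout (seven header lines, %CPU in the ninth column), so check the column position against your own top output first; a single snapshot is also only a rough estimate. nproc prints the number of CPUs available.

```shell
#!/bin/sh
# Hypothetical helper: read %CPU values (one per line) on stdin and compare
# their sum with the server's capacity of $1 CPUs x 100%.
cpu_headroom() {
    awk -v ncpu="$1" '
        { total += $1 }
        END {
            cap = ncpu * 100
            printf "in use: %.1f%% of %d%%\n", total, cap
            if (total > cap)
                print "oversubscribed: programs are sharing CPU time"
            else
                printf "headroom: about %.0f CPUs\n", (cap - total) / 100
        }'
}

# Typical use (assumes top's default column layout):
#   top -b -n 1 | awk 'NR > 7 {print $9}' | cpu_headroom "$(nproc)"
```

For example, two processes at 1200% and 600% on a 24-CPU server total 1800% of 2400%, leaving roughly 6 CPUs free.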

Unfortunately, top does not monitor all the resources a server needs to run jobs.

Killing a Job

If you need to stop a running job, use the kill command. Simply type kill followed by the PID of the job you want to kill. For example:

kill 1602

This doesn't actually stop the job; it merely asks the program to shut down (by default, kill sends the TERM signal), giving it an opportunity to clean up temporary files and such. Adding the -9 switch, on the other hand, kills a program immediately, with or without its consent. Thus:

kill -9 1602

kills process 1602 outright.
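A common way to combine the two is to ask politely first and use kill -9 only if the program has not exited after a grace period. The sh/bash sketch below does that; stop_job and its default 10-second grace period are hypothetical choices, not a LINCOMM convention. kill -0 sends no signal at all; it only checks whether the process still exists.

```shell
#!/bin/sh
# Hypothetical helper: ask process $1 to exit (plain kill sends SIGTERM),
# wait up to $2 seconds (default 10), then use kill -9 if it is still there.
stop_job() {
    target=$1
    grace=${2:-10}
    kill "$target" || return 1              # polite request; fails if no such PID
    while [ "$grace" -gt 0 ] && kill -0 "$target" 2>/dev/null; do
        sleep 1                             # kill -0 only tests that the PID exists
        grace=$((grace - 1))
    done
    if kill -0 "$target" 2>/dev/null; then
        echo "PID $target is still running; sending kill -9" >&2
        kill -9 "$target"
    fi
}

# Typical use, with the PID from the example above:
#   stop_job 1602
```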

Switching Between LINCOMM Servers

LINCOMM is actually a cluster of four servers. When you log in, you're assigned to a server at random to help balance the load between them. However, you can connect to a specific server, either to monitor a job you started there previously or because the server you were assigned to turns out to be particularly busy.

Be sure to note which server you're on when you start a long job. If the server name is not in your prompt, you can identify it by typing:

cat /etc/hostname
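One way to keep track is to append a line to a personal log each time you start a long job, recording the server name, the PID, the time, and the command. In the sketch below, the sh/bash function note_job and the log file ~/lincomm_jobs.txt are hypothetical; uname -n prints the machine's network node name, which on Linux normally matches the contents of /etc/hostname.

```shell
#!/bin/sh
# Hypothetical helper: record where and when a background job was started.
# Usage: note_job PID DESCRIPTION...
# The log location can be changed by setting JOBLOG (default ~/lincomm_jobs.txt).
note_job() {
    jpid=$1
    shift
    printf '%s\t%s\t%s\t%s\n' "$(uname -n)" "$jpid" "$(date '+%Y-%m-%d %H:%M')" "$*" \
        >> "${JOBLOG:-$HOME/lincomm_jobs.txt}"
}

# Typical use right after starting a job in the background:
#   stata -b do myprogram &
#   note_job $! "stata -b do myprogram"
# Later, to see which server each job is on:
#   cat ~/lincomm_jobs.txt
```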




Keywords: jobs long running background foreground wait
Doc ID: 111804
Owned by: Eric D. in Agricultural & Applied Economics
Created: 2021-06-19
Updated: 2021-06-19
Sites: Agricultural & Applied Economics