Specific Commands to use with Research Computing

This page discusses commands that can assist you in your use of the research servers. Most of them are standard Linux commands that you should get to know.

man

All servers have built-in documentation on almost all commands. To access it use the man command. man is short for "manual." This is the most important command there is!

Run:

man ls

In the man program the following keys are used for navigation:

  • SPACE - next page
  • b - previous page
  • q - quit
  • / - search. After pressing '/' enter the search term and press enter.
  • n - find next search match
  • N - find previous search match

That should be all you need. You can always do this if you want more info:

man man

ps

The ps command lists running processes. With no options it lists only your own processes that are attached to the current terminal:

jdoe@hongkong:~$ ps
    PID TTY          TIME CMD
 849033 pts/0    00:00:00 bash
 939343 pts/0    00:00:00 ps


Columns are:

  • PID - Process ID
  • TTY - Controlling terminal device (you probably won't ever need this)
  • TIME - Cumulative CPU time
  • CMD - The name of the process. This may or may not be the same as the name of the program that is running.

Add the "-f" (full) switch for more info:

jdoe@hongkong:~$ ps -f
UID          PID    PPID  C STIME TTY          TIME CMD
jdoe   849033  849032  0 09:59 pts/0    00:00:00 -bash
jdoe   940919  849033  0 14:13 pts/0    00:00:00 ps -f

Additional columns:

  1. UID - User ID of user that owns the process
  2. PPID - Parent Process ID. The ID of the process that started this process
  3. C - CPU utilization. This is the integer value of the percent usage over the lifetime of the process.
  4. STIME - Start time (real time)

Or use the BSD-style "u" (user-oriented) option to see resource usage:

jdoe@hongkong:~$ ps u
USER         PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
jdoe    1880582  0.0  0.0   8664  5300 pts/0    Ss   11:47   0:00 -bash
jdoe    1898512  0.0  0.0   8692  5296 pts/1    Ss+  12:22   0:00 -bash
jdoe    1901001  0.0  0.0  10056  3284 pts/0    R+   12:27   0:00 ps u

Additional columns:

  1. %CPU - % of a CPU core used
  2. %MEM - % of system memory used

By default only your processes that are attached to a terminal are shown. To show all of your processes, including background ones with no terminal (marked "?" in the TTY column):

jdoe@wurc3:~$ ps -u jdoe
    PID TTY          TIME CMD
1880535 ?        00:00:00 systemd
1880539 ?        00:00:00 (sd-pam)
1880576 ?        00:00:00 sshd
1880582 pts/0    00:00:00 bash
1898511 ?        00:00:00 tmux: server
1898512 pts/1    00:00:00 bash
1900951 pts/0    00:00:00 ps
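
When the default columns are not what you need, the -o option selects columns by name. The following is a minimal sketch that assumes the procps version of ps found on most Linux servers (the --sort option is specific to it). The column selection is only one reasonable choice.

```shell
# List your own processes with chosen columns, busiest first.
# ni = niceness, pcpu/pmem = %CPU/%MEM, etime = time since the process started.
ps -u "$(id -un)" -o pid,ppid,ni,pcpu,pmem,etime,args --sort=-pcpu | head -n 10
```

Run "man ps" and search for "STANDARD FORMAT SPECIFIERS" to see every column name that -o accepts.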

kill

The kill command stops a running process. This should only be used when:

  1. You are unable to stop the process any other way
  2. You are aware of the repercussions of stopping the process. Depending on what the program is doing and how it was designed, there could be data loss.

jdoe@hongkong:~$ kill 234562

This tells the process to terminate nicely, giving it the opportunity to run its exit/cleanup code. If a process does not stop with that command you can force it to stop:

jdoe@hongkong:~$ kill -kill 234562
jdoe@hongkong:~$ kill -9   234562

These two commands do the same thing: they send the KILL signal, which the process cannot catch, so none of its cleanup code runs. Be careful, as this can cause problems if the process was writing to a database or something similar.
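
The "ask nicely first, then force" sequence described above can be sketched as a short script. This is an illustration, not a site-provided tool: it practices on a throwaway sleep process, and the 5-second grace period is an arbitrary choice.

```shell
# Start a throwaway process to practice on (sleep stands in for a real job).
sleep 300 &
pid=$!

# Ask politely first: plain kill sends SIGTERM, which the process may handle.
kill "$pid"

# Wait up to 5 seconds. "kill -0" sends nothing; it only tests that the PID exists.
for attempt in 1 2 3 4 5; do
    kill -0 "$pid" 2>/dev/null || break
    sleep 1
done

# Still running? Force it with SIGKILL (the same as kill -9).
if kill -0 "$pid" 2>/dev/null; then
    kill -KILL "$pid"
fi

# Collect the exit status; 128 + signal number means "killed by that signal".
status=0
wait "$pid" || status=$?
echo "process $pid ended with status $status"
```

Note that wait only works for processes started from the same shell. For a process started elsewhere, the kill -0 check alone tells you whether it is gone.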

top and htop

top and htop are both dashboards that show current server activity. Run them, then type "q" to quit or "h" for help.

top -u mathaccountname

This shows JUST your own processes (replace mathaccountname with your own account name), which makes the output easier to understand.

htop is good at showing CPU core usage and memory usage in a more visual style.

nice and renice

nice and renice alter the priority of processes by adjusting a value called niceness. nice does this when you start the process. renice does this after the process has already been started.
The niceness is a number between -20 and 19. By default, your processes run at 0. Negative numbers are high priority. Positive numbers are low priority. Only administrators can set negative niceness.

If you want your process to defer to other processes on the system, you can start your scripts with a nice level above 0.
When you nice a script, you are saying: "I'm sharing this server, and I'm happy to step aside if you need to use it." Your niced process will get less execution time on the CPU per unit time than other processes.

Example:

nice -n 10 myscript

This runs the program myscript with a niceness of 10.

renice allows you to move an already running program to a lower priority. However, programs that are already running have to be referred to by their PID (process ID number). If you run top ("q" to quit) you'll see a list of the processes running on that server at that moment, and you should be able to find your PID there. You can also run ps to list your processes.

Example:

Say you figure out that the PID of your process is 21827.

renice 10 -p 21827

This resets the nice level of that process to 10. The process continues to run, but yields the CPU more readily to other processes.

(There are other ways to figure out PIDs. This is just one way.)
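
To see both commands working together, here is a small practice run. It is a sketch under a few assumptions: sleep stands in for a real job, ps is the procps version, and renice is the util-linux version, where -n sets an absolute niceness.

```shell
# "nice" with no command prints the niceness it is running at.
nice                 # the current shell, normally 0
nice -n 10 nice      # the inner nice runs with its niceness raised by 10

# Start a low-priority background job (sleep stands in for real work).
nice -n 10 sleep 300 &
pid=$!
sleep 1              # give the job a moment to start

# Lower its priority further; regular users can only raise niceness.
renice -n 15 -p "$pid"

# Confirm the change: the NI column shows the niceness.
ps -o pid,ni,comm -p "$pid"

# Clean up the practice job.
kill "$pid"
```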

ionice

ionice sets the I/O scheduling priority for a process. There are three I/O scheduling classes:

  • Idle (3) - A program running with idle I/O priority only gets disk time when no other program has asked for disk I/O for a defined grace period. The impact of idle I/O processes on normal system activity should be zero.
  • Best Effort (2) - This is the effective scheduling class for any process that has not asked for a specific I/O priority.
  • Real Time (1) - The RT scheduling class is given first access to the disk, regardless of what else is going on in the system. Thus the RT class needs to be used with some care, as it can starve other processes. This is not available to non-administrator users.

The Best Effort class (again, the default class for all processes) has eight priorities that can be set on any process in the class: 0 through 7. The lower the number, the higher the priority. Unless explicitly set, the priority within the Best Effort class is derived dynamically from the CPU nice level of the process: io_priority = (cpu_nice + 20) / 5
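
As a worked example of that formula, the sketch below computes the derived I/O priority for a few niceness values. The function name io_priority is made up for this illustration, and the result is assumed to round down, as shell integer arithmetic does.

```shell
# Best-effort I/O priority derived from CPU niceness, per the formula above.
# Shell arithmetic is integer-only, so the division rounds down.
io_priority() {
    echo $(( ($1 + 20) / 5 ))
}

for cpu_nice in -20 0 10 19; do
    echo "nice $cpu_nice -> I/O priority $(io_priority "$cpu_nice")"
done
```

So a process at the default niceness of 0 lands at I/O priority 4, and a process niced to 19 lands at 7, the lowest priority in the class.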

Example:
You have a process named "my_disk_eater" that is very disk intensive and you don't care how long it takes. You decide to let it run in such a way that it will not slow down any other processes. You run it like this:

ionice -c 3 my_disk_eater

Example:
You have started a process (PID 323323) that needs to read in a large amount of data. You observe that the process spends a long time waiting for the files to be read, and you would like it to finish more quickly. You run this command to change its I/O priority:

ionice -c2 -n0 -p 323323

free

free gives you a picture of the memory usage on a server. 

Example:

free

This will show memory information in kibibytes. More useful is this, which scales the numbers to appropriate units:

free -h

Which could show:

               total        used        free      shared  buff/cache   available
Mem:           502Gi       3.4Gi       114Gi       4.0Mi       384Gi       495Gi
Swap:           15Gi       1.0Mi        15Gi

Note the free column. This is the memory that is not in use at all. Memory used by the filesystem cache is taken from this free pool and shown in the buff/cache column, but most of it can be handed back to programs when needed. That is what the available column shows: an estimate of the memory your applications can use without swapping.
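
If you want a single number to check before starting a large job, the available column can be turned into a percentage. This is a minimal sketch: the function name available_percent is made up, and it assumes the column order shown in the sample output above.

```shell
# Print the share of total memory that is still available, as a whole percent.
# Reads "free" output on stdin; assumes the column order shown above,
# where "total" is field 2 and "available" is field 7 of the Mem: line.
available_percent() {
    awk '/^Mem:/ { printf "%d\n", $7 * 100 / $2 }'
}

# -m reports plain mebibytes so awk can do arithmetic on the values.
free -m | available_percent
```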

time

time can be run against a specific command to find out how much CPU time it takes to run. Results show real (wall-clock) time, CPU time spent in the user process, and CPU time spent in the system kernel on its behalf.

Example:

We'll run a command that just waits 5 seconds:

time sleep 5

Which would show:

real    0m5.001s
user    0m0.001s
sys    0m0.000s
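
The sleep example spends its time waiting, so user and sys stay near zero. For contrast, the sketch below times a CPU-bound loop and then captures the elapsed time in a variable. It assumes the bash shell, where time is a built-in keyword and TIMEFORMAT controls its output. The loop count is arbitrary.

```shell
# A CPU-bound loop: nearly all of the elapsed time shows up as "user".
time (i=0; while [ "$i" -lt 200000 ]; do i=$((i + 1)); done)

# Capture just the elapsed (real) seconds in a variable.
# TIMEFORMAT is a bash variable; %R selects the real time.
TIMEFORMAT='%R'
elapsed=$( { time sleep 1; } 2>&1 )
echo "sleep 1 took ${elapsed}s"
```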

nvtop

The nvtop command provides a means of checking and monitoring GPU usage. It provides GPU device information, device utilization in the form of charts and graphs, and process lists.

Device 0 [Quadro RTX 6000] PCIe GEN 1@16x RX: 0.000 KiB/s TX: 0.000 KiB/s
 GPU 300MHz  MEM 405MHz  TEMP 109°F FAN N/A% POW  15 / 250 W
 GPU[                             0%] MEM[||             1.531Gi/22.500Gi]

 Device 1 [Quadro RTX 6000] PCIe GEN 1@16x RX: 0.000 KiB/s TX: 0.000 KiB/s
 GPU 300MHz  MEM 405MHz  TEMP 118°F FAN N/A% POW  15 / 250 W
 GPU[                             0%] MEM[               0.344Gi/22.500Gi]
   ┌──────────────────────────────────┐   ┌──────────────────────────────────┐
100│GPU0 %                            │100│GPU1 %                            │
   │GPU0 mem%                         │   │GPU1 mem%                         │
 75│       │                          │ 75│───────┐                          │
   │       │                          │   │       │                          │
 50│       │                          │ 50│       │                          │
   │       │                          │   │       │                          │
 25│       │                          │ 25│       │                          │
  0│───────┴──────────────────────────│  0│───────┴──────────────────────────│
   └──────────────────────────────────┘   └──────────────────────────────────┘
    PID USER DEV    TYPE  GPU        GPU MEM    CPU  HOST MEM Command          


F2Setup   F6Sort    F9Kill    F10Quit    F12Save Config                        

For more details see: https://github.com/Syllo/nvtop

gpumon

This command is a wrapper around the nvidia-smi command that refreshes the display every 2 seconds. When you run it you see:

Tue Aug  1 14:16:31 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05    Driver Version: 520.61.05    CUDA Version: 11.8     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Quadro RTX 6000     On   | 00000000:61:00.0 Off |                    0 |
| N/A   43C    P0    59W / 250W |  17503MiB / 23040MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  Quadro RTX 6000     On   | 00000000:DB:00.0 Off |                    0 |
| N/A   53C    P0    62W / 250W |    165MiB / 23040MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A   1628578      C   /usr/bin/python3                17500MiB |
|    1   N/A  N/A   1628578      C   /usr/bin/python3                  162MiB |
+-----------------------------------------------------------------------------+
Press CTRL-C to stop

Things to note:

  • Top line: the driver version and the newest CUDA version that driver supports.
  • Second row: the column headers for the list of GPUs
    • First column
      • GPU number: here there are two, 0 and 1
      • GPU model: Quadro RTX 6000
    • Second column
      • Memory used and total. For GPU #1 there are 165MiB used out of 23040MiB
    • Third column
      • At the moment the utilization of each GPU is 0%
  • Bottom section: Processes
    • The GPU column indicates which GPU the process is using
    • The PID column is the process ID of the process
    • The Type will generally be 'C' for 'Compute'. You may see 'G' for 'Graphics'
    • The process name
    • And the GPU memory usage

You can also run nvidia-smi directly. The man page has a great deal of information on this.
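
For scripts, nvidia-smi can print selected fields as CSV instead of the full table. The following is a sketch: the helper name gpu_mem_percent is made up for this example, and it only produces output on a server that has NVIDIA GPUs and drivers installed.

```shell
# Turn "index, used MiB, total MiB" lines into a percentage per GPU.
gpu_mem_percent() {
    awk -F', ' '{ printf "GPU %s: %d%% memory used\n", $1, $2 * 100 / $3 }'
}

# noheader and nounits leave bare numbers, which are easy to parse.
nvidia-smi --query-gpu=index,memory.used,memory.total \
           --format=csv,noheader,nounits | gpu_mem_percent
```

Run "nvidia-smi --help-query-gpu" to list the other fields that --query-gpu accepts.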


Keywords:
research, computing, linux, ssh, command line 
Doc ID:
114568
Owned by:
Erik M. in UW Math Department
Created:
2021-10-28
Updated:
2024-07-01
Sites:
UW Math Department