Specific Commands to use with Research Computing
man
All servers have built-in documentation on almost all commands. To access it use the man command. man is short for "manual." This is the most important command there is!
Run:
man ls
In the man program the following keys are used for navigation:
- SPACE - next page
- b - previous page
- q - quit
- / - search. After pressing '/' enter the search term and press enter.
- n - find next search match
- N - find previous search match
That should be all you need. You can always do this if you want more info:
man man
ps
The ps command lists running processes. With no options it lists only your own processes that are attached to the current terminal:
jdoe@hongkong:~$ ps
PID TTY TIME CMD
849033 pts/0 00:00:00 bash
939343 pts/0 00:00:00 ps
Columns are:
- PID - Process ID
- TTY - Controlling terminal device (you probably won't ever need this)
- TIME - Cumulative CPU time
- CMD - the name of the process. This may or may not be the same as the name of the program that is running.
Add the "-f" (full) switch for more info:
jdoe@hongkong:~$ ps -f
UID PID PPID C STIME TTY TIME CMD
jdoe 849033 849032 0 09:59 pts/0 00:00:00 -bash
jdoe 940919 849033 0 14:13 pts/0 00:00:00 ps -f
Additional columns:
- UID - User ID of user that owns the process
- PPID - Parent Process ID. The ID of the process that started this process
- C - CPU utilization. This is the integer value of the percent usage over the lifetime of the process.
- STIME - Start time (real time)
The BSD-style "u" option (no dash) shows CPU and memory usage:
jdoe@hongkong:~$ ps u
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
jdoe 1880582 0.0 0.0 8664 5300 pts/0 Ss 11:47 0:00 -bash
jdoe 1898512 0.0 0.0 8692 5296 pts/1 Ss+ 12:22 0:00 -bash
jdoe 1901001 0.0 0.0 10056 3284 pts/0 R+ 12:27 0:00 ps u
Additional columns:
- %CPU - % of a CPU core used
- %MEM - % of system memory used
By default only processes attached to a terminal, such as those started from your SSH login, are shown. To show all your processes:
jdoe@wurc3:~$ ps -u jdoe
PID TTY TIME CMD
1880535 ? 00:00:00 systemd
1880539 ? 00:00:00 (sd-pam)
1880576 ? 00:00:00 sshd
1880582 pts/0 00:00:00 bash
1898511 ? 00:00:00 tmux: server
1898512 pts/1 00:00:00 bash
1900951 pts/0 00:00:00 ps
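The PPID column from ps -f can be used to find which processes a given shell started. The following is a minimal sketch of that idea, not a standard tool: children_of is a hypothetical helper name, and it assumes the eight-column "ps -f" layout shown earlier (UID PID PPID C STIME TTY TIME CMD).

```shell
# Hypothetical helper: print the PID and command name of every process
# whose parent is the PID given as $1.
# Reads "ps -f" style output on stdin; assumes PPID is column 3,
# PID is column 2, and the command name is column 8.
children_of() {
    awk -v parent="$1" 'NR > 1 && $3 == parent { print $2, $8 }'
}

# Usage on a live server (commented out so the sketch has no side effects):
#   ps -f | children_of "$$"      # processes started by the current shell
```

Fed the "ps -f" sample above with parent 849033, this would print the PID of the "ps -f" process and its command name.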
kill
The kill command stops a running process. This should only be used when:
- You are unable to stop the process any other way
- You are aware of the repercussions of stopping the process. Depending on what the program is doing and how it was designed, there could be data loss.
jdoe@hongkong:~$ kill 234562
This tells the process to terminate nicely, giving it the opportunity to run its exit/cleanup code. If a process does not stop with that command, you can force it to stop:
jdoe@hongkong:~$ kill -KILL 234562
jdoe@hongkong:~$ kill -9 234562
These two commands do the same thing. Be careful: a forced kill gives the process no chance to clean up, which can cause problems if it was writing to a database or something similar.
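The "ask nicely first, force only if needed" sequence can be wrapped in a small function. This is a sketch under assumptions, not a site-provided tool: stop_process is a hypothetical name and the 10-second default grace period is arbitrary.

```shell
# Hypothetical helper: ask a process to exit, then force it only if needed.
#   $1 - PID to stop
#   $2 - seconds to wait before forcing (assumed default: 10)
stop_process() {
    pid=$1
    grace=${2:-10}

    # Plain kill sends SIGTERM; this fails if the PID does not exist
    # or belongs to another user.
    kill "$pid" 2>/dev/null || return 1

    while [ "$grace" -gt 0 ]; do
        # Signal 0 sends nothing; it only tests whether the PID still exists.
        kill -0 "$pid" 2>/dev/null || return 0
        sleep 1
        grace=$((grace - 1))
    done

    # Last resort: SIGKILL cannot be caught, so no cleanup code runs.
    kill -KILL "$pid" 2>/dev/null
    return 0
}

# Usage (commented out so the sketch has no side effects):
#   stop_process 234562 30
```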
top and htop
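top and htop show a continuously updating list of running processes along with their CPU and memory usage. To show only one user's processes (replace mathaccountname with the account name):
top -u mathaccountname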
nice and renice
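nice starts a program at a lower CPU scheduling priority, and renice changes the priority of a process that is already running. Nice values range from -20 (highest priority) to 19 (lowest priority). Non-administrator users can normally only raise the value.
Start myscript with its nice value raised by 10:
nice -n 10 myscript
Set the nice value of the running process with PID 21827 to 10:
renice 10 -p 21827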
ionice
ionice sets the I/O scheduling priority for a process. There are three I/O scheduling classes:
- Idle (3) - A program running with idle I/O priority will only get disk time when no other program has asked for disk I/O for a defined grace period. The impact of idle I/O processes on normal system activity should be zero.
- Best Effort (2) - This is the effective scheduling class for any process that has not asked for a specific I/O priority.
- Real Time (1) - The RT scheduling class is given first access to the disk, regardless of what else is going on in the system. Thus the RT class needs to be used with some care, as it can starve other processes. This is not available to non-administrator users.
The Best Effort class (again, the default class for all processes) has eight priorities that can be set on any process in the class: 0 through 7. The lower the number, the higher the priority. Unless explicitly set, the priority within the Best Effort class is derived dynamically from the CPU nice level of the process: io_priority = (cpu_nice + 20) / 5
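To make the formula concrete, here is a small sketch that evaluates it for a few nice values using shell integer arithmetic. The function name derived_io_priority is hypothetical; the formula is the one given above.

```shell
# Best Effort I/O priority derived from a CPU nice value,
# using io_priority = (cpu_nice + 20) / 5 with integer division.
derived_io_priority() {
    echo $(( ($1 + 20) / 5 ))
}

for n in -20 0 10 19; do
    echo "nice $n -> io priority $(derived_io_priority "$n")"
done
# nice -20 -> io priority 0
# nice 0 -> io priority 4
# nice 10 -> io priority 6
# nice 19 -> io priority 7
```

So a process started with the default nice value of 0 lands in the middle of the Best Effort range, and raising its nice value also lowers its I/O priority.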
Example:
You have a process named "my_disk_eater" that is very disk intensive and you don't care how long it takes. You decide to let it run in such a way that it will not slow down any other processes. You run it like this:
ionice -c 3 my_disk_eater
Example:
You started a process (PID 323323) that needs to read in a large amount of data. You observe that the process spends a long time waiting for the files to be read, and you would like it to finish more quickly. You run this command to change its I/O priority:
ionice -c2 -n0 -p 323323
free
free gives you a picture of the memory usage on a server.
Example:
free
This will show memory information in kibibytes. More useful is this, which scales the numbers to appropriate units:
free -h
Which could show:
total used free shared buff/cache available
Mem: 502Gi 3.4Gi 114Gi 4.0Mi 384Gi 495Gi
Swap: 15Gi 1.0Mi 15Gi
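The "available" column is usually the number to look at before starting a memory-hungry job. The sketch below shows one way to check it from a script. The helper names available_mib and enough_memory are hypothetical, and the column position assumes the layout shown above, where "available" is the last column of the Mem: row.

```shell
# Hypothetical helper: read "free -m" output on stdin and print the
# "available" column of the Mem: row, in MiB.
available_mib() {
    awk '/^Mem:/ { print $7 }'
}

# Hypothetical helper: succeed only if at least $1 MiB is available.
# Reads "free -m" output on stdin.
enough_memory() {
    avail=$(available_mib)
    [ "${avail:-0}" -ge "$1" ]
}

# Usage on a live server (commented out so the sketch has no side effects):
#   free -m | enough_memory 64000 && ./my_analysis
```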
time
time reports how long a command takes to run. We'll run a command that just waits 5 seconds:
time sleep 5
Which would show:
real 0m5.001s
user 0m0.001s
sys 0m0.000s
Rows are:
- real - Elapsed (wall-clock) time
- user - CPU time spent running the program's own code
- sys - CPU time spent in the kernel on the program's behalf
sleep uses almost no CPU, so user and sys are close to zero even though 5 seconds elapsed.
nvtop
The nvtop command provides a means of checking and monitoring GPU usage. It provides GPU device information, device utilization in the form of charts and graphs, and process lists.
Device 0 [Quadro RTX 6000] PCIe GEN 1@16x RX: 0.000 KiB/s TX: 0.000 KiB/s
GPU 300MHz MEM 405MHz TEMP 109°F FAN N/A% POW 15 / 250 W
GPU[ 0%] MEM[|| 1.531Gi/22.500Gi]
Device 1 [Quadro RTX 6000] PCIe GEN 1@16x RX: 0.000 KiB/s TX: 0.000 KiB/s
GPU 300MHz MEM 405MHz TEMP 118°F FAN N/A% POW 15 / 250 W
GPU[ 0%] MEM[ 0.344Gi/22.500Gi]
┌──────────────────────────────────┐ ┌──────────────────────────────────┐
100│GPU0 % │100│GPU1 % │
│GPU0 mem% │ │GPU1 mem% │
75│ │ │ 75│───────┐ │
│ │ │ │ │ │
50│ │ │ 50│ │ │
│ │ │ │ │ │
25│ │ │ 25│ │ │
0│───────┴──────────────────────────│ 0│───────┴──────────────────────────│
└──────────────────────────────────┘ └──────────────────────────────────┘
PID USER DEV TYPE GPU GPU MEM CPU HOST MEM Command
F2Setup F6Sort F9Kill F10Quit F12Save Config
For more details see: https://github.com/Syllo/nvtop
gpumon
Tue Aug 1 14:16:31 2023
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 520.61.05 Driver Version: 520.61.05 CUDA Version: 11.8 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Quadro RTX 6000 On | 00000000:61:00.0 Off | 0 |
| N/A 43C P0 59W / 250W | 17503MiB / 23040MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Quadro RTX 6000 On | 00000000:DB:00.0 Off | 0 |
| N/A 53C P0 62W / 250W | 165MiB / 23040MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1628578 C /usr/bin/python3 17500MiB |
| 1 N/A N/A 1628578 C /usr/bin/python3 162MiB |
+-----------------------------------------------------------------------------+
- Top line: the driver version and the CUDA version it supports.
- Second row: the column headers for the list of GPUs
  - First column
    - GPU number: here there are two, 0 and 1
    - GPU model: Quadro RTX 6000
  - Second column
    - Memory used and available. For GPU #1 there is 165MiB used out of 23040MiB available
  - Third column
    - At the moment the utilization of each GPU is 0%
- Bottom section: Processes
  - The GPU column indicates which GPU the process is using
  - The PID column is the process ID of the process
  - The Type will generally be 'C' for 'Compute'. You may see 'G' for 'Graphics'
  - The process name
  - And the GPU memory usage
You can also run nvidia-smi directly. The man page has a great deal of information on this.
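nvidia-smi can also print selected fields as CSV, which is easier to use in scripts than the table above. As a sketch of what that makes possible, the hypothetical helper below picks the GPU with the least memory in use. The --query-gpu and --format options are part of nvidia-smi; the helper name least_used_gpu and the script name my_script.py are made up for illustration.

```shell
# Hypothetical helper: read "index, memory.used" CSV rows on stdin
# (as produced with --format=csv,noheader,nounits) and print the index
# of the GPU with the least memory in use.
least_used_gpu() {
    awk -F', *' 'NR == 1 || $2 + 0 < min { min = $2 + 0; idx = $1 }
                 END { print idx }'
}

# Usage on a GPU server (commented out so the sketch has no side effects):
#   gpu=$(nvidia-smi --query-gpu=index,memory.used \
#                    --format=csv,noheader,nounits | least_used_gpu)
#   CUDA_VISIBLE_DEVICES=$gpu python3 my_script.py
```

With the two GPUs shown above (17503MiB and 165MiB in use), this would select GPU 1. Note that CUDA may number devices differently from nvidia-smi unless CUDA_DEVICE_ORDER=PCI_BUS_ID is set, so treat the index as a hint rather than a guarantee.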
Resources
- How to nice and renice your processes: https://www.nixtutor.com/linux/changing-priority-on-linux-processes/
- 12 tips for using top: https://www.tecmint.com/12-top-command-examples-in-linux/
- All about Free: http://www.linfo.org/free.html
- And don't forget the man command!