Using Silo

Silo is the SSCC's secure computing enclave. It consists of secure servers and data storage for working with data covered by HIPAA and other sensitive data. Silo contains Windows-based servers (WinSilo), Linux-based servers (LinSilo and LinSiloBig) and an HTCondor pool (CondorSilo). WinSilo acts as the gateway to all of Silo: when you log in, you'll start on WinSilo and can log into the other servers from there.

Keep reading to learn how to access Silo and use the Windows servers in the Silo environment. Later sections cover Silo's Linux servers and submitting jobs to CondorSilo.

Getting Access

If you are interested in using Silo, please contact the SSCC Help Desk. SMPH researchers should set up a consultation with the Clinical and Health Informatics Institute. Depending on the nature of your data you may need to get approval from your IRB, the UW-Madison Office of Cybersecurity, or other relevant authorities. Using Silo will expedite that process: the Office of Cybersecurity has already carried out a formal risk assessment of Silo and found it to be low risk, and many of the relevant authorities are already familiar with it. A list of all software available for SMPH can be found on our Biomedical Research Software on Silo page.

Connecting to Silo requires multifactor authentication using Duo, the same system used for your UW-Madison NetID. Most people use the Duo app on their smartphone, but you can also use a separate hardware token. When you contact the Help Desk about using Silo please mention which method you prefer. Instructions for obtaining and using a hardware token can be found here. Consider turning on Duo Restore so you can easily continue using Duo after upgrading your phone.

Installing the Citrix Workspace App

To use Silo you'll need to have the Citrix Workspace App installed on your computer. Just click on the appropriate link below and then run the installer after it finishes downloading. If you've already installed the Citrix Workspace App on your computer in order to use Winstat you do not need to install it again.

    In order to use Silo you'll need to download and install Citrix Workspace on each computer you use.

    See Using Winstat for more information about using the Citrix Workspace App.

    Logging In

    To log in to Silo, go to the Silo web site. If you're asked to give permission for programs to run, do so.

    [Screenshot: Silo login screen]

    At the login screen, first give your SSCC username and password as usual. If it is your first time logging into Silo, you'll be prompted to follow the instructions on the screen to set up Duo on your smartphone. You'll be able to use the same Duo app for both Silo and your NetID.

    The Silo File System

    Silo has an isolated file system that is separate from the SSCC's primary file system, but they have similar structures. All files are available using either Windows or Linux. The key locations are:

    Location                                  Linux Name   Windows Name
    Home Directory (Private Space)            ~            Z: drive
    Project Directories (Shared Space)        /project     V: drive
    SMPH Project Directories (Shared Space)   /smph        S: drive

    Home directories are primarily meant for configuration files, installed packages, and other small files. They have a quota of 20GB, which can be expanded on request to 40GB. Research data should be stored in Project directories, which have no quotas and are shared with the other members of your research group. SMPH researchers will be given project space on the S: drive, which is SMPH's space in DoIT's RestrictedDrive service.

    Moving Data To and From Silo

    You can move data into Silo using Globus, a tool for securely transferring research data. See Using Globus to Move Data into Silo for more information.

    To move results out of Silo, we have created a folder called silosync in the Z: drive of each Silo user. Every five minutes, an automated script copies anything placed in this folder to a corresponding folder in your Z: drive on the SSCC's primary file system. One easy way to access the SSCC's primary file system is to log into Winstat, which you'll see listed when you log into Silo. Winstat is very similar to WinSilo but, not being a high-security server, it can communicate with the local hard drive of your computer. After logging into Winstat, you'll see a drive called Local Disk (C: on {your computer}). You can drag files from the Z: drive to this drive to move files from SSCC's file system to your computer. You're also welcome to access the SSCC's primary file system by mapping a drive to it (Windows / macOS) or using FTP.
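Since your Z: drive is also your Linux home directory, you can drop results into silosync from the Linux command line. A minimal sketch (the results file name is hypothetical; ~/silosync is the folder described above):

```shell
# Copy a results file into the silosync folder in your home directory;
# the automated script then copies it to the SSCC's primary file system
# within about five minutes.
mkdir -p "$HOME/silosync"                 # the folder already exists on Silo; -p makes this safe elsewhere
printf 'id,value\n1,42\n' > results.csv   # stand-in for a real results file
cp results.csv "$HOME/silosync/"
ls "$HOME/silosync"
```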

    Using WinSilo

    Once you've logged in, WinSilo behaves just like a regular Windows server, with a few important exceptions:

    • Silo cannot access the Internet. This can affect programs in unexpected ways: for example, Stata's findit command takes much longer to run than usual and then only gives partial results, because it tries to reach Stata's web server and does not display any results until that attempt times out. Fortunately the results it does give are the ones you're most likely to need.
    • You can install R packages from CRAN and Bioconductor, Stata packages from SSC, and Python packages from PyPI. If you need to install packages from other sources contact the Help Desk.
    • You cannot copy and paste between Silo and your own computer.
    • Silo cannot access disk space on your computer.
    • You cannot print from Silo.

    Silo Downtime

    Silo is down from 6:00am to 8:00am on the third Wednesday of each month for security updates.

    If you don't anticipate using Silo's Linux servers, you can stop reading at this point. Welcome to Silo!

    Silo Linux Servers

    Silo has four kinds of Linux servers. To use them, first sign into the Silo environment, which will put you on WinSilo, and then you can log into them from there.

    LinSilo is a cluster of three servers (linsilo001, linsilo002, linsilo003) with 44 cores and 384GB of memory each. When you log into LinSilo, you'll be automatically directed to the least busy server. If you start a long job on a server, you'll need to go back to the same server to manage it. You can switch to a different server with the ssh command; e.g. to get to linsilo001 type:

    ssh linsilo001

    LinSiloBig is a cluster of two servers (linsilobig001, linsilobig002) with 80 cores and 768GB of memory each. Again, when you log into LinSiloBig you'll be directed to the least busy server, but you can switch with ssh. In fact, using LinSiloBig is so similar to using LinSilo that you can assume instructions for using LinSilo apply to LinSiloBig unless we say otherwise.

    Jobs that cannot run without the additional memory LinSiloBig provides should have first priority on LinSiloBig. Jobs that can take full advantage of the additional cores it provides are second priority. Please do not use LinSiloBig for jobs that could run just as well on LinSilo or CondorSilo.

    LinSiloGPU is a server (linsilogpu001) with 48 cores, 768GB of RAM, and an NVIDIA T4 GPU. It has been "loaned" to Silo from SSCC's Linstat cluster for COVID-19 work requiring natural language processing, and this work has priority on it.

    CondorSilo is a cluster of nine servers with 44 cores and 384GB of RAM each. It runs jobs submitted via HTCondor.

    The Linux servers in the Silo environment were funded by SMPH. Other researchers are welcome to use them, but SMPH researchers have priority.

    Logging into LinSilo

    To log into LinSilo, click on the Windows logo button and find the LinSilo folder in the programs list. Then click on the LinSilo or LinSiloBig icon. This will start a program called X-Win32. The first thing you'll see is a utility window that you can ignore (but don't close it or it will close the entire program—minimize it instead). The login prompt will come up shortly thereafter, and then a terminal window once you log in.

    If you'll use these frequently, you can right-click on them and pin them so that they'll come up as soon as you click the Windows logo button.

    If you don't need graphics, you can also log into LinSilo using SecureCRT.

    Running Programs on LinSilo

    LinSilo has a wide variety of software installed, including both general-purpose statistical software and specialized software for biomedical research.

    SSCC has used tcsh as its default Linux shell for more than 20 years, but bash has become more popular. (If you google how to do something in Linux the solution you find will probably be written for bash, but most of the time it will work in tcsh too since they're not that different.) Many SMPH researchers have a great deal of experience using bash, so we have made it the default shell for SMPH members. If you'd like to switch shells (in either direction) contact the Help Desk.
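If you're not sure which shell you're currently using, you can check from the Linux command line:

```shell
# Print your login shell; on SSCC servers this will typically be
# /bin/tcsh or /bin/bash.
echo "${SHELL:-unknown}"
```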

    Here are the commands to run a few selected programs on LinSilo. In the "Command to run a long job" column, the command is given in the form needed by bash. tcsh users should omit the nohup at the beginning of the command.

    Running Programs
    Program                     Command to run it interactively   Command to run a long job that will continue after you log out (bash version)
    R                           R                                 nohup R CMD BATCH myprogram.R &
    Python 3.7 (command line)   python                            nohup python myprogram.py &
    Python 2.7 (command line)   python2                           nohup python2 myprogram.py &
    Spyder (Python IDE)         spyder
    Jupyter Notebook            jupyter notebook
    Stata                       xstata                            nohup stata -b do mydofile &
    Matlab                      matlab                            nohup matlab -nodisplay < myprogram.m > myprogram.log &
    SAS                         sas                               nohup sas myprogram.sas &

    Of course there are many ways to run these programs, and many more programs!
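The nohup prefix in the table above is what lets a job keep running after you log out: it detaches the job from the terminal's hangup signal and sends its output to a file. A minimal demonstration, with a trivial command standing in for a real analysis script:

```shell
# Run a job in the background, immune to hangups, with output sent to a log file.
nohup sh -c 'echo analysis complete' > job.log 2>&1 &
wait           # for this demo only; in real use you would simply log out
cat job.log    # the job's output survives in the log file
```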

    LinSilo is very similar to Linstat, the SSCC's general-purpose Linux cluster, so you may find Using Linstat and Managing Jobs on Linstat helpful.

    Running RStudio on LinSilo

    To run RStudio on LinSilo, go to the LinSilo folder in the programs list and double-click on the LinSilo RStudio Server or LinSiloBig RStudio Server icon. (Again, if you'll use these frequently you can pin them.) This will open a web browser on the Windows server containing an RStudio interface that connects to R running on LinSilo or LinSiloBig. Jobs run in RStudio will continue to run even if you log out.


    Using CondorSilo

    CondorSilo runs HTCondor, developed by the UW-Madison Computer Science Department. HTCondor is designed to let you run multiple jobs on multiple servers efficiently: when you submit jobs to an HTCondor queue, HTCondor will find available servers to run them. General documentation for using HTCondor can be found here. CondorSilo differs from a standard HTCondor installation in three important ways:

    1. It shares the Silo file system with the other Silo servers, so there's no need to transfer the files needed by CondorSilo jobs to the CondorSilo servers.
    2. It does not checkpoint or terminate jobs.
    3. It cannot send you email when a job finishes because it is in the isolated Silo environment. You'll have to check the status of your jobs.

    CondorSilo does not have the simple scripts for submitting jobs that the SSCC's primary Condor flock has (this is something of an experiment). Instead, you will need to create a submit file for each job, telling CondorSilo how to run it. This will give you more control over how the job is run and allow CondorSilo to allocate jobs more efficiently. We'll provide some examples you can copy shortly.

    A submit file will specify the program to run and the arguments needed to run it, but also tell CondorSilo how many cores and how much memory it needs. The core requirement is used in deciding where to run a job, but does not limit the number of cores the job can actually use.

    For example, if your submit file tells CondorSilo your job needs 30 cores (i.e. the submit file includes request_cpus = 30) and CondorSilo sees that a server is only running one job which requires a single core, it may put your job on that server. However, your job can try to use 44 cores and it will get 43 of them, and that's fine. If the single-core job finishes, your job will then use all 44 of the server's cores. But if your submit file said the job required 44 cores, CondorSilo would not assign it to that server until after the single-core job finished. Meanwhile, if someone else submits a job that requires 30 cores, CondorSilo will not assign it to your server until your job is finished.

    You might think that it would be nicer to share the server, but a server running two jobs that can each use all of its cores will take longer to finish both jobs than if they were run sequentially. If your job can take advantage of all of a server's cores, the best way to share the server with others is for your job to get done as quickly as possible and get out of the way. We thus suggest that if your job can take advantage of many cores, you tell CondorSilo you need 30.

    Memory is different from cores, in that a job can only use as much memory as was requested in the submit file. Be sure to request as much as you need! For most statistical software, start by seeing how big your data set is, then estimate how big it's going to be by the time you're done, then add a 25-50% safety margin to that. Even better, run a single instance of the job on LinSilo and monitor its memory usage with top. If you don't specify how much memory you need, you'll get a fraction of the server's memory equal to the fraction of the server's cores you requested, roughly 8.8GB/core.
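For example, you could monitor a trial run's memory use from another terminal on LinSilo. A sketch using top in batch mode (the sleep command stands in for your real job):

```shell
# Take one batch-mode snapshot of a single process with top;
# the RES column shows its resident memory use.
sleep 5 &                                        # stand-in for a real analysis job
pid=$!
snapshot=$(top -b -n 1 -p "$pid" | tail -n 2)    # header row plus the process row
echo "$snapshot"
kill "$pid"
```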

    Submitting Jobs to CondorSilo

    To submit a job to CondorSilo, create a submit file, and then run:

    condor_submit my_submit_file

    where my_submit_file should be replaced by the actual name of your submit file. We have example submit files for the most popular programs in the next section.

    A submit file is just a text file. However, it must use Linux line endings (<Line Feed>), not Windows line endings (<Carriage Return> <Line Feed>) or Mac line endings (<Carriage Return>). If you use a Linux text editor like emacs or xemacs to write submit files that won't be an issue. You can use a Windows text editor like Notepad++, which is available on WinSilo, but you may need to change the line endings by clicking Edit, EOL Conversion. Regular Notepad will not work.
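You can also check and fix line endings from the Linux command line using standard tools. A sketch (job.sub is a hypothetical submit file, created here with Windows endings to demonstrate the fix):

```shell
# Create a submit file with Windows (CRLF) line endings, as a Windows
# editor might save it, then strip the carriage returns.
printf 'universe = vanilla\r\nqueue\r\n' > job.sub
CR=$(printf '\r')                             # a literal carriage-return character
grep -q "$CR" job.sub && echo "job.sub has Windows line endings"
tr -d '\r' < job.sub > job_fixed.sub          # remove every carriage return
grep -q "$CR" job_fixed.sub || echo "job_fixed.sub has Linux line endings"
```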

    Creating a submit file for a given program often takes some close reading of the program's documentation and some trial and error. HTCondor jobs run in the background with no user interface ("headless") and the documentation for your program is likely to have instructions for how to do that. Often it will refer to running jobs in batch mode.

    The most common components of a submit file are listed below. Some of them will always be needed, like executable; others should be used when needed.

    universe = vanilla

    This tells HTCondor what context to run your job in. It's unlikely you'll need anything other than vanilla.

    executable =  program to run 

    Replace program to run with the actual program to run, including the path to find it. You can identify the path to a program with which. For example, running which R will return /software/R/bin/R. To submit an R job to CondorSilo, you would put executable = /software/R/bin/R in your submit file.
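To see how which works, try it on a program available on any Linux system (ls stands in here for a program like R on Silo):

```shell
# which searches the directories in your PATH and prints the
# full location of the first matching executable.
which ls
```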

    arguments =  arguments for your program 

    Replace arguments for your program with the arguments (things you type after the command) for your program. For many programs, the arguments will tell the program what script to run as well as how to run it. For example, to submit an R script called example.R and save the output to example.out, use arguments = CMD BATCH example.R example.out. This corresponds to running R CMD BATCH example.R example.out at the command line, a standard way to run an R job in batch mode.

    input =  input file 

    output =  output file 

    error =  error file 

    Linux has standard places for programs to get input and send output and error messages, called stdin, stdout, and stderr. These optional lines tell CondorSilo to use files for these purposes. Not all programs use them but Matlab, for example, will run scripts sent to stdin, so you can run a Matlab script called example.m with input = example.m.

    request_cpus =  number of cores 
    request_memory =  amount of RAM  GB

    Replace number of cores and amount of RAM with what you need. As discussed above, for cores you're just setting the minimum your job will accept, while for memory you're also setting the maximum your job will be able to use.

    environment = "name=value"

    The environment line allows you to set system environment variables for your job. For example, Python jobs need PYTHONHOME to be set to the location where Python is installed, so you need to include environment = "PYTHONHOME=/software/anaconda37". Note that there can be no spaces between the name and value.  

    accounting_group = smph

    SMPH researchers have priority on CondorSilo (since SMPH funded the servers), and the accounting_group line tells CondorSilo this is an SMPH job. Non-SMPH researchers should omit the accounting_group line.


    queue

    The queue line goes at the end and tells CondorSilo to actually put the job in the queue. You can put multiple jobs in the queue with one submit file: queue 10 will put ten jobs in the queue, for example. Each job will have a process number (ten jobs would be numbered 0 through 9), and you can use $(Process) elsewhere in the submit file to refer to that number. For example, to queue ten Matlab scripts called example0.m through example9.m you'd use input = example$(Process).m, or to pass the process number to a Stata do file as an argument it can use internally you'd use arguments = -b do mydofile $(Process).
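Putting those pieces together, here is a hedged sketch of a complete submit file that queues ten Matlab jobs (the script names and paths follow the examples in this document):

```
universe = vanilla
executable = /software/matlab/bin/matlab
arguments = -nodisplay -nojvm
input = example$(Process).m
output = example$(Process).output
error = example$(Process).error
request_cpus = 1
queue 10
```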

    Example Submit Files

    The following are example submit files for popular statistical software. Because the COVID-19 pandemic has put hiring on hold, the SSCC has not yet been able to hire the Biomedical Research Computing Facilitator who will be able to provide similar examples for biomedical research software. But we'll help you develop them as best we can, and will be very happy to add working submit files to our library of examples.

    R, Python, and Matlab will only use multiple cores if you are using code that's been written to do so. If you are, we suggest you tell your program to ask for 44 cores but put in your submit file that you'll accept 30 (request_cpus = 30).

    If you only request one core you'll get about 8.8GB of memory. If you needed 50GB, for example, you'd request it with request_memory = 50 GB.


    R

    universe = vanilla
    executable = /software/R/bin/R
    arguments = CMD BATCH example.R example.output
    error = example.error
    request_cpus = 1
    queue


    Python

    universe = vanilla
    environment = "PYTHONHOME=/software/anaconda37"
    executable = /software/anaconda37/bin/python3
    arguments = example.py
    output = example.output
    error = example.error
    request_cpus = 1
    queue


    Stata

    universe = vanilla
    executable = /software/stata/stata
    arguments = -b do example.do
    request_cpus = 32
    queue

    CondorSilo runs Stata MP, licensed for 32 cores. Note that if you tell Stata to run in batch mode it will put all output and error messages in example.log regardless of what you specify in output = or error = lines.


    Matlab

    universe = vanilla
    executable = /software/matlab/bin/matlab
    arguments = -nodisplay -nojvm
    input = test.m
    output = test.output
    error = test.error
    request_cpus = 1
    queue

    Logging in to Silo Using a Hardware Token

    If you would like to use a token to log into Silo and have one already, please contact the Help Desk and include the serial number printed on the back of your token in your email. The token must be black and white, with the words "OTP c100" printed on the front; we cannot use the blue and green tokens that say "Duo" on the front. If you do not have a token, please let us know and we will issue you one.

    [Image: Silo token types]

    Log in by going to the Silo login page and giving your SSCC username and password as usual. Then click Enter a Passcode and press the button on your token. Enter the 6-digit number from your token and click Log In.


    If later you wish to add a smartphone app to your Silo account so you can authenticate with either the app or a token, click Add a new device under the SSCC logo and follow the onscreen instructions.

    If you have any questions about using Silo, feel free to contact the Help Desk.

    Keywords: silo winsilo linsilo winrd hipaa secure server
    Doc ID: 102689
    Owner: Mitchell K.
    Group: Social Science Computing Cooperative
    Created: 2020-05-31 10:23 CDT
    Updated: 2022-03-09 13:52 CDT
    Sites: Social Science Computing Cooperative