Using Silo

Silo is the SSCC's secure computing enclave. It consists of secure servers and data storage for working with data covered by HIPAA and other sensitive data. Silo contains Windows-based servers (WinSilo), Linux based servers (LinSilo and LinSiloBig) and an HTCondor pool (CondorSilo). WinSilo acts as the gateway to all of Silo: when you log in, you'll start on WinSilo and can log into the other servers from there.

Keep reading to learn how to access Silo and use the Windows servers in the Silo environment. Later sections cover the Silo Linux servers, CondorSilo, and logging in with a hardware token.

Getting Access

If you are interested in using Silo, please contact the  SSCC Help Desk. Depending on the nature of your data you may need to get approval from your IRB, the UW-Madison Office of Cybersecurity, or other relevant authorities. Using Silo will expedite that process because it has already had a formal risk assessment by the Office of Cybersecurity and was found to be low risk, and many of the relevant authorities are already familiar with it.

Connecting to Silo requires multifactor authentication using Duo, the same system used for your UW-Madison NetID. Most people use the Duo app on their smartphone, but you can also use a separate hardware token. When you contact the Help Desk about using Silo, please mention which method you prefer. Instructions for obtaining and using a hardware token can be found in Logging in to Silo Using a Hardware Token at the end of this article.

Installing the Citrix Workspace App

To use Silo you'll need to have the Citrix Workspace App installed on your computer. Just click on the appropriate link below and then run the installer after it finishes downloading. If you've already installed the Citrix Workspace App on your computer in order to use Winstat you do not need to install it again.

See  Using Winstat  for more information about using the Citrix Workspace App.

Logging In

To log in to Silo, go to the website silo.ssc.wisc.edu. If you're asked to give permission for programs to run, do so.

[Image: Silo login screen]

At the login screen, first give your SSCC username and password as usual. If it is your first time logging into Silo, you'll be prompted to follow the instructions on the screen to set up Duo on your smartphone. You'll be able to use the same Duo app for both Silo and your NetID.

The Silo File System

Silo has an isolated file system that is separate from SSCC's primary file system, but they have similar structures. All files are available using either Windows or Linux. The key locations are:


Location                                   Linux Name    Windows Name
Home Directory (Private Space)             ~             Z: Drive
Project Directories (Shared Space)         /project      V: Drive
SMPH Project Directories (Shared Space)    /smph         S: Drive

Home directories are primarily meant for configuration files, installed packages, and other small files. They have a quota of 20GB, which can be expanded on request to 40GB. Research data should be stored in Project directories, which have no quotas and are shared with the other members of your research group. SMPH researchers will be given project space on the S: drive, which is SMPH's space in DoIT's RestrictedDrive service.

Moving Data To and From Silo

To move sensitive data into Silo, contact the  SSCC Help Desk  and we'll work with you to find the most convenient way to transfer your data. We are working with popular data providers to try to automate this process.

To move non-sensitive files into Silo and to get results out, you'll use SSCC's primary file system as a staging area. The easiest way to access SSCC's primary file system is to log into Winstat, which you'll see when you log into Silo. Winstat is very similar to WinSilo but, not being a high-security server, it can communicate with the local hard drive of your computer. After logging into Winstat, you'll see a drive called Local Disk (C: on {your computer}). You can drag files to and from this drive to move them between your computer and SSCC's primary file system. You're also welcome to access SSCC's primary file system by mapping a drive to it (Windows / MacOS) or using FTP.

To move non-sensitive files into Silo, put them in a folder on your Z: drive in the primary SSCC file system and then contact the  SSCC Help Desk  and ask that that folder be copied to Silo.

Moving data off of Silo's file system is simpler. We have created a folder called silosync in the Z: drive of each Silo user. Every five minutes, an automated script copies anything placed in this folder to a corresponding folder in your Z: drive on SSCC's primary file system. You can then access it using Winstat. Similar folders can be created within projects. (silosync does not automatically copy files from the primary SSCC file system to Silo.) It is your responsibility to ensure that any data you place in the silosync folder can be appropriately stored on SSCC's primary file system and does not require the additional security Silo provides.

Using WinSilo

Once you've logged in, WinSilo behaves just like a regular Windows server, with a few important exceptions:

  • Silo cannot access the Internet. This can affect programs in unexpected ways: for example, Stata's findit command takes much longer to run than usual and then only gives partial results, because it tries to reach Stata's web server and does not display any results until that attempt times out. Fortunately the results it does give are the ones you're most likely to need.
  • You can install R packages from CRAN and Bioconductor, Stata packages from SSC, and Python packages from PyPI. If you need to install packages from other sources contact the Help Desk.
  • You cannot copy and paste between Silo and your own computer.
  • Silo cannot access disk space on your computer.
  • You cannot print from Silo.

Silo Downtime

Silo has a downtime from 7:00am-9:00am the first Wednesday of the month for security updates.

If you don't anticipate using Silo's Linux servers, you can stop reading at this point. Welcome to Silo!

Silo Linux Servers

Silo has three kinds of Linux servers. To use them, first sign into the Silo environment, which will put you on WinSilo, and then you can log into them from there.

 LinSilo  is a cluster of three servers ( linsilo001, linsilo002, linsilo003) with 44 cores and 384GB of memory each. When you log into LinSilo, you'll be automatically directed to the least busy server. If you start a long job on a server, you'll need to go back to the same server to manage it. You can switch to a different server with the  ssh  command; e.g. to get to linsilo001 type:

ssh linsilo001

LinSiloBig is a cluster of two servers (linsilobig001, linsilobig002) with 80 cores and 768GB of memory each. Again, when you log into LinSiloBig you'll be directed to the least busy server, but you can switch with ssh. In fact, using LinSiloBig is so similar to using LinSilo that you can assume instructions for using LinSilo apply to LinSiloBig unless we say otherwise.

Jobs that cannot run without the additional memory LinSiloBig provides should have first priority on LinSiloBig. Jobs that can take full advantage of the additional cores it provides are second priority. Please do not use LinSiloBig for jobs that could run just as well on LinSilo or CondorSilo.

 CondorSilo  is a cluster of nine servers with 44 cores and 384GB of RAM each. It runs jobs submitted via HTCondor.

The Linux servers in the Silo environment were funded by SMPH. Other researchers are welcome to use them, but SMPH researchers have priority.

Logging into LinSilo

To log into LinSilo, click on the Windows logo button and find the LinSilo folder in the programs list. Then click on the LinSilo or LinSiloBig icon. This will start a program called X-Win32. The first thing you'll see is a utility window that you can ignore (but don't close it or it will close the entire program—minimize it instead). The login prompt will come up shortly thereafter, and then a terminal window once you log in.

If you'll use these frequently, you can right-click on them and pin them so that they'll come up as soon as you click the Windows logo button.

If you don't need graphics, you can also log into LinSilo using SecureCRT.

Running Programs on LinSilo

LinSilo has a wide variety of software installed, including both general-purpose statistical software and specialized software for biomedical research.

SSCC has used  tcsh as its default Linux shell for more than 20 years, but  bash has become more popular. (If you google how to do something in Linux the solution you find will probably be written for  bash  , but most of the time it will work in  tcsh too since they're not that different.) Many SMPH researchers have a great deal of experience using  bash, so we have made it the default shell for members from SMPH. If you'd like to switch shells (in either direction) contact the  Help Desk  .

Here are the commands to run a few selected programs on LinSilo. In the "Command to run a long job" column, the command is given in the form needed by bash.  tcsh  users should omit the nohup at the beginning of the command.

Program                      Command to run it interactively   Command to run a long job that will continue after you log out (bash version)
R                            R                                 nohup R CMD BATCH myprogram.R &
Python 3.7 (command line)    python                            nohup python myprogram.py &
Python 2.7 (command line)    python2                           nohup python2 myprogram.py &
Spyder (Python IDE)          spyder
Jupyter Notebook             jupyter notebook
Stata                        xstata                            nohup stata -b do mydofile &
Matlab                       matlab                            nohup matlab -nodisplay < myprogram.m > myprogram.log &
SAS                          sas                               nohup sas myprogram.sas &

Of course there are many ways to run these programs, and many more programs!
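
As a rough illustration of the long-job pattern in the table, the sketch below uses a short stand-in command in place of a real R or Stata job. The stand-in command and the log file name myprogram.log are assumptions made for the demonstration, and the syntax is for bash (or sh), not tcsh.

```shell
# Stand-in for a long job: it sleeps briefly, then reports that it finished.
# On LinSilo you would put a real command here, e.g. R CMD BATCH myprogram.R
nohup sh -c 'sleep 1; echo "job finished"' > myprogram.log 2>&1 &

# $! holds the process ID of the job just started in the background.
job_pid=$!
echo "Started background job with PID $job_pid"

# For this demonstration only, wait for the job and show its log.
# With a real long job you would log out instead and read the log later.
wait "$job_pid"
cat myprogram.log
```

Noting the process ID and the server you started the job on makes it easier to find the job again when you log back in.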

LinSilo is very similar to Linstat, SSCC's general-purpose Linux cluster, so you may find Using Linstat and Managing Jobs on Linstat helpful.

Running RStudio on LinSilo

To run RStudio on LinSilo, go to the LinSilo folder in the programs list and double-click on the LinSilo RStudio Server or LinSiloBig RStudio Server icon. (Again, if you'll use these frequently you can pin them.) This will open a web browser on the Windows server containing an RStudio interface that connects to R running on LinSilo or LinSiloBig. Jobs run in RStudio will continue to run even if you log out.

CondorSilo

CondorSilo runs HTCondor, developed by the UW-Madison Computer Science Department. HTCondor is designed to let you run multiple jobs on multiple servers efficiently: when you submit jobs to an HTCondor queue, HTCondor will find available servers to run them.  General documentation for using HTCondor can be found here. CondorSilo differs from a standard HTCondor installation in three important ways:

  1. It shares the Silo file system with the other Silo servers, so there's no need to transfer the files needed by CondorSilo jobs to the CondorSilo servers.
  2. It does not checkpoint or terminate jobs.
  3. It cannot send you email when a job finishes because it is in the isolated Silo environment. You'll have to check the status of your jobs.
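
Since there is no email notification, one way to check on your jobs is HTCondor's standard condor_q command, which lists your jobs that are still in the queue. The sketch below is a minimal example that assumes you run it on a Silo Linux server where the HTCondor tools are installed; the guard around the command and the queue_checked variable are only there so the snippet behaves sensibly elsewhere.

```shell
# Check whether the HTCondor command-line tools are available here.
if command -v condor_q >/dev/null 2>&1; then
    # List your jobs that are still queued or running.
    condor_q
    queue_checked=yes
else
    echo "condor_q is not available on this machine"
    queue_checked=no
fi
```

A job that no longer appears in the condor_q listing has left the queue, so its output and log files are the next place to look.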

CondorSilo does not have the simple scripts for submitting jobs that SSCC's primary Condor flock has (this is something of an experiment). Instead, you will need to create a submit file for each job, telling CondorSilo how to run it. This will give you more control over how the job is run and allow CondorSilo to allocate jobs more efficiently. We'll provide some examples you can copy shortly.

A submit file will specify the program to run and the arguments needed to run it, but also tell CondorSilo how many cores and how much memory it needs. The core requirement is used in deciding where to run a job, but does not limit the number of cores the job can actually use. For example, if your submit file tells CondorSilo your job needs 30 cores (i.e. the submit file includes request_cpus = 30) and CondorSilo sees that a server is only running one job which requires a single core, it may put your job on that server. However, your job can try to use 44 cores and it will get 43 of them, and that's fine. If the single-core job finishes, your job will then use all of the 44 cores in the server. But if your submit file said the job required 44 cores, CondorSilo would not assign it to that server until after the single-core job finished.

Meanwhile, if someone else submits a job that requires 30 cores, CondorSilo will not assign it to your server until your job is finished. You might think that it would be nicer to share the server, but a server running two jobs that can use all of its cores will take longer to finish both jobs than if they were run sequentially. If your job can take advantage of all of a server's cores, the best way to share the server with others is for your job to get done as quickly as possible and get out of the way. We thus suggest that if your job can take advantage of many cores, you tell CondorSilo you need 30.

Memory is different from cores, in that a job can only use as much memory as was requested in the submit file. Be sure to request as much as you need! For most statistical software, start by seeing how big your data set is, then estimate how big it's going to be by the time you're done, then add a 25-50% safety margin to that. Even better, run a single instance of the job   on LinSilo and monitor its memory usage with top. If you don't specify how much memory you need, you'll get a fraction of the server's memory equal to the fraction of the server's cores you requested, roughly 8.8GB/core.
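
The safety-margin arithmetic can be sketched in shell as follows. The 12GB estimate is a hypothetical figure chosen for illustration, and 50% is simply the upper end of the margin suggested above; substitute your own numbers.

```shell
# Hypothetical estimate of how large the data will be by the end of the job, in GB.
estimated_data_gb=12

# Safety margin as a percentage; the text suggests 25-50%.
margin_percent=50

# Add the margin and round up to a whole number of GB (integer arithmetic).
request_gb=$(( (estimated_data_gb * (100 + margin_percent) + 99) / 100 ))

# Print the line to put in the submit file.
echo "request_memory = ${request_gb} GB"
```

With these example numbers the script prints request_memory = 18 GB.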

Submitting Jobs to CondorSilo

To submit a job to CondorSilo, create a submit file, and then run:

condor_submit  my_submit_file 

where my_submit_file should be replaced by the actual name of your submit file. We have example submit files for the most popular programs in the next section.

A submit file is just a text file. However, it must use Linux line endings (<Line Feed>), not Windows line endings (<Carriage Return> <Line Feed>) or Mac line endings (<Carriage Return>). If you use a Linux text editor like emacs or xemacs to write submit files that won't be an issue. You can use a Windows text editor like Notepad++, which is available on WinSilo, but you may need to change the line endings by clicking  Edit,  EOL Conversion. Regular Notepad will not work.
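
If a submit file was saved with Windows line endings, one possible fix from the Linux command line is to strip the carriage returns with the standard tr utility. This is a minimal sketch, not an official SSCC procedure: the file names are hypothetical, and the sample file is created only so the example is self-contained. It handles Windows line endings only, not the old Mac style.

```shell
# Work in a temporary directory so the demonstration does not touch your own files.
demo_dir=$(mktemp -d)

# Create a small file with Windows (CRLF) line endings to stand in for a submit file.
printf 'universe = vanilla\r\nqueue\r\n' > "$demo_dir/submit_windows.txt"

# Delete every carriage return, leaving only Linux line feeds.
tr -d '\r' < "$demo_dir/submit_windows.txt" > "$demo_dir/submit_linux.txt"

# Count the carriage returns left in the converted file.
remaining=$(tr -cd '\r' < "$demo_dir/submit_linux.txt" | wc -c | tr -d ' ')
echo "Carriage returns remaining: $remaining"
```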

Creating a submit file for a given program often takes some close reading of the program's documentation and some trial and error. HTCondor jobs run in the background with no user interface ("headless") and the documentation for your program is likely to have instructions for how to do that. Often it will refer to running jobs in batch mode.

The most common components of a submit file are listed below. Some of them will always be needed, like executable; others should be used when needed.

universe = vanilla

This tells HTCondor what context to run your job in. It's unlikely you'll need anything other than vanilla  .

executable =  program to run 

Replace program to run with the actual program to run, including the path needed to find it. You can identify the path to a program with which. For example, running which R will return /software/R/bin/R. To submit an R job to CondorSilo, you would put executable = /software/R/bin/R in your submit file.

arguments =  arguments for your program 

Replace arguments for your program  with the arguments (things you type after the command) for your program. For many programs, the arguments will tell the program what script to run as well as how to run it. For example, to submit an R script called example.R and save the output to  example.out, use arguments = CMD BATCH example.R example.out. This corresponds to running R CMD BATCH example.R example.out at the command line, a standard way to run an R job in batch mode.

input =  input file 

output =  output file 

error =  error file 

Linux has standard places for programs to get input and send output and error messages, called  stdin,  stdout, and  stderr. These optional lines tell CondorSilo to use files for these purposes. Not all programs use them but Matlab, for example, will run scripts sent to stdin, so you can run a Matlab script called example.m with input = example.m.
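
For readers less familiar with these streams, the sketch below shows the command-line equivalent of the input, output, and error lines, using the standard sort utility as a stand-in program. The file names are hypothetical and the files are created in a temporary directory only for the demonstration.

```shell
# Work in a temporary directory and create a small input file.
demo_dir=$(mktemp -d)
printf 'banana\napple\ncherry\n' > "$demo_dir/example.in"

# sort reads its data from stdin (<), writes results to stdout (>),
# and would write any error messages to stderr (2>).
sort < "$demo_dir/example.in" > "$demo_dir/example.out" 2> "$demo_dir/example.err"

# Show what the program sent to stdout.
cat "$demo_dir/example.out"
```

In a submit file, the three redirections would roughly correspond to input = example.in, output = example.out, and error = example.err.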

request_cpus =  number of cores 
request_memory =  amount of RAM  GB

Replace number of cores and amount of RAM with what you need. As discussed above, for cores you're just setting the minimum your job will accept, while for memory you're also setting the maximum your job will be able to use.

environment = "name=value"

The environment line allows you to set system environment variables for your job. For example, Python jobs need PYTHONHOME to be set to the location where Python is installed, so you need to include environment = "PYTHONHOME=/software/anaconda37". Note that there can be no spaces between the name and value.  
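
The effect is roughly analogous to setting a variable for a single command at the shell prompt. In the sketch below, sh -c is a stand-in for the real job and simply prints the variable it was given; the shown variable is introduced only for the demonstration.

```shell
# Run one command with PYTHONHOME set only for that command.
# The stand-in job just prints the value it sees.
shown=$(env PYTHONHOME=/software/anaconda37 sh -c 'echo "$PYTHONHOME"')
echo "The job would see PYTHONHOME=$shown"
```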

accounting_group = smph

SMPH researchers have priority on CondorSilo (since SMPH funded the servers), and the accounting_group line tells CondorSilo this is an SMPH job. Non-SMPH researchers should omit the accounting_group line.

queue

The queue line goes at the end and tells CondorSilo to actually put the job in the queue. You can put multiple jobs in the queue with one submit file: queue 10 will put ten jobs in the queue, for example. Each job will have a process number (ten jobs would be numbered 0 through 9), and you can use $(Process) elsewhere in the submit file to refer to that number. For example, to queue ten Matlab scripts called example0.m through example9.m you'd use input = example$(Process).m, or to pass the process number to a Stata do file as an argument it can use internally you'd use arguments = -b do example.do $(Process).
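
As an illustration, the sketch below writes a submit file for the ten Matlab scripts from the shell. The file name example_matlab.sub is hypothetical, the Matlab path and arguments are borrowed from the Matlab example in this article, and using $(Process) in the output and error names is one reasonable way to keep each job's files separate.

```shell
# Write a submit file that queues ten Matlab jobs.
# The quoted 'EOF' stops the shell from treating $(Process) as a command
# substitution, so the text reaches the file exactly as written.
cat > example_matlab.sub <<'EOF'
universe = vanilla
executable = /software/matlab/bin/matlab
arguments = -nodisplay -nojvm
input = example$(Process).m
output = example$(Process).output
error = example$(Process).error
request_cpus = 1
queue 10
EOF

# Show the lines that use the process number.
grep -F '$(Process)' example_matlab.sub
```

You would then submit the file with condor_submit example_matlab.sub on a server where the HTCondor tools are available.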

Example Submit Files

The following are example submit files for popular statistical software. Because the COVID-19 pandemic has put hiring on hold, SSCC has not yet been able to hire the Biomedical Research Computing Facilitator who will be able to provide similar examples for biomedical research software. But we'll help you develop them as best we can, and will be very happy to add working submit files to our library of examples.

R, Python, and Matlab will only use multiple cores if you are using code that's been written to do so. If you are, we suggest you tell your program to ask for 44 cores but put in your submit file that you'll accept 30 (request_cpus = 30).

If you only request one core you'll get about 8.8GB of memory. If you needed 50GB, for example, you'd request it with request_memory = 50 GB 

R

universe = vanilla
executable = /software/R/bin/R
arguments = CMD BATCH example.R example.output
error = example.error
request_cpus = 1
queue

Python

universe = vanilla
environment = "PYTHONHOME=/software/anaconda37"
executable = /software/anaconda37/bin/python3
arguments = example.py
output = example.output
error = example.error
request_cpus = 1
queue

Stata

universe = vanilla
executable = /software/stata/stata
arguments = -b do example.do
request_cpus = 32
queue

CondorSilo runs Stata MP, licensed for 32 cores. Note that if you tell Stata to run example.do in batch mode it will put all output and error messages in  example.log  regardless of what you specify in output = or error = lines.

Matlab

universe = vanilla
executable = /software/matlab/bin/matlab
arguments = -nodisplay -nojvm
input = test.m
output = test.output
error = test.error
request_cpus = 1
queue

Logging in to Silo Using a Hardware Token

If you would like to use a token to log into Silo and have one already, please contact the  Help Desk  and include the serial number printed on the back of your token in your email. The token must be black and white colored, with the words "OTP c100" printed on the front - we cannot use the blue and green tokens that say "Duo" on the front. If you do not have a token, please let us know and we will issue you one.

[Image: Silo token types]

Log in by going to  silo.ssc.wisc.edu  and giving your SSCC username and password as usual. Then click Enter a Passcode and press the button on your token. Enter the 6-digit number from your token and click Log In  .

If later you wish to add a smartphone app to your Silo account so you can authenticate with either the app or a token, click Add a new device under the SSCC logo and follow the onscreen instructions.

If you have any questions about using Silo, feel free to contact the Help Desk  .





Keywords: silo winsilo linsilo winrd hipaa secure server
Doc ID: 102689
Owner: Mitchell K.
Group: Social Science Computing Cooperative
Created: 2020-05-31 10:23 CDT
Updated: 2020-06-26 13:15 CDT
Sites: Social Science Computing Cooperative