Using Globus to Move Data into Silo
Globus is a popular platform for accessing, transferring, and sharing research data, including data covered by HIPAA and other sensitive data. UW-Madison recently purchased a campus Globus license, and SSCC staff have created a Globus endpoint within the Silo environment. This allows SSCC members to use Globus to transfer data into the Silo environment directly, without needing the assistance of SSCC staff. Recall that you can move results out of Silo by putting them in your silosync folder.
To use Globus, go to app.globus.org. If you expect to transfer relatively small amounts of data, such that your transfers will finish in a few minutes or less, you don't need to install anything. However, if you will be moving large amounts of data, we suggest installing the Globus Connect Personal client. It runs as a service and manages transfers initiated using the web app. It will continue transfers even if you close the web app completely, and it will resume transfers if they are interrupted. See the section on installing and configuring the client below for more details, but note that you need to log into the Globus web app at least once before setting up the client. Globus also has a Python API if you're interested in automating transfers.
If you want to upload data from your own computer you'll need to first connect to the UW-Madison network using VPN.
The first time you log into Globus you will be prompted to create a Globus account that is linked to your UW-Madison NetID. The instructions under Logging into Globus for the First Time will walk you through the process. This account is also linked to your SSCC account. Note that we needed to create a separate "collection" (more on collections in a moment) for SSCC members whose username does not match their NetID, but you'll still be able to use Globus normally.
Globus organizes data into "collections," which can be both sources of data and places to put data. SSCC has three collections where you can put data, which will then be moved into the Silo file system. We expect that many UW-Madison departments and research centers will set up Globus collections so that you can transfer data from them to Silo quickly and easily over the high-speed campus network.
If you install the Globus Connect Personal client, you can designate certain folders on your computer (including network drives) as a collection that only you can see. Then you can use that collection to transfer data from your computer to Silo. However, you can also use the web app to upload data from your computer without installing the client or setting up a collection. Keep in mind that transfers from your computer will be limited to the speed of the network you are using at the time.
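As a rough way to judge whether a transfer from your own computer will finish in a few minutes, you can estimate its duration from the data size and your connection speed. The sketch below is illustrative only: the function names, the 80% efficiency factor, and the five-minute threshold are assumptions made for this example, not SSCC or Globus guidance.

```python
def estimate_transfer_minutes(size_gb, speed_mbps, efficiency=0.8):
    """Rough transfer time in minutes.

    size_gb    -- data size in gigabytes (decimal, 10**9 bytes)
    speed_mbps -- nominal network speed in megabits per second
    efficiency -- assumed fraction of the nominal speed actually achieved
    """
    bits = size_gb * 8 * 10**9
    usable_bits_per_second = speed_mbps * 10**6 * efficiency
    return bits / usable_bits_per_second / 60


def client_recommended(size_gb, speed_mbps, threshold_minutes=5):
    """True if the estimate exceeds the assumed 'few minutes' threshold."""
    return estimate_transfer_minutes(size_gb, speed_mbps) > threshold_minutes


print(round(estimate_transfer_minutes(50, 100), 1))  # 83.3
print(client_recommended(1, 100))                    # False
```

Under these assumptions, 50 GB over a 100 Mbps connection takes well over an hour, which is the kind of transfer where the Globus Connect Personal client is worth installing.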
For security reasons, the Globus endpoint in Silo does not have direct access to the Silo file system. Imported files will be copied to the Silo file system 15-30 minutes after the import is complete. If you create a new directory in Silo, it may take up to 15 minutes for a corresponding directory to be created in Globus. Globus will also ask you to confirm your identity periodically, or before you carry out certain tasks.
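To make the timing concrete, the 15-30 minute delay can be turned into a window for when to look for your files in Silo. This is a minimal sketch using only the figures stated above; the function name is hypothetical and the example date is arbitrary.

```python
from datetime import datetime, timedelta


def silo_copy_window(import_finished):
    """Return (earliest, latest) times the files should appear in Silo,
    based on the 15-30 minute delay after the import completes."""
    return (import_finished + timedelta(minutes=15),
            import_finished + timedelta(minutes=30))


earliest, latest = silo_copy_window(datetime(2024, 1, 15, 14, 0))
print(earliest.strftime("%H:%M"), "to", latest.strftime("%H:%M"))  # 14:15 to 14:30
```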
If you have data on a server that does not use Globus, you can download it to your computer and then transfer it using Globus. However, do not download restricted data to your computer unless your computer has been approved to store such data. If it has not been approved, contact the SSCC Help Desk and we'll work with the data provider to identify an alternative method, such as transferring the data directly from their servers to Silo via Secure FTP. You will also need to contact the Help Desk before using a Globus collection that is not on the UW-Madison network.
To transfer data into Silo, the first step is to locate the collections associated with Silo. In the Collection Search box, search for ‘Silo Transfer’. Three different options will appear:
- SSCC Silo Transfer Collection – corresponds to the V: Drive or /project in Silo. Use this if you're a non-SMPH SSCC researcher and your SSCC username matches your NetID.
- SMPH Silo Transfer Collection – corresponds to the S: Drive or /smph in Silo. Use this if you're an SMPH researcher.
- SSCC Silo Transfer Collection for Unmatched Usernames – corresponds to the V: Drive or /project in Silo. Use this if your SSCC username does not match your NetID.
Choose the appropriate collection, open the project folder, and then navigate to the folder you want to put your data in.
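If you are scripting against the Globus Python SDK, the choice among the three collections and the name search can be sketched as follows. This is an illustration, not an SSCC-provided tool: both function names are hypothetical, and checking SMPH membership first is an assumption, since the list above does not say which collection applies to an SMPH researcher whose username does not match their NetID. The `tc` argument is assumed to behave like a `globus_sdk.TransferClient`, whose `endpoint_search` method yields records with `display_name` and `id` fields.

```python
def choose_silo_collection(is_smph, username_matches_netid):
    """Return the display name of the Silo collection to search for."""
    # Assumption: SMPH membership takes priority over the username check.
    if is_smph:
        return "SMPH Silo Transfer Collection"
    if username_matches_netid:
        return "SSCC Silo Transfer Collection"
    return "SSCC Silo Transfer Collection for Unmatched Usernames"


def find_collection_id(tc, display_name):
    """Return the ID of the collection whose name matches exactly, or None.

    A full-text search for 'Silo Transfer' style names can return several
    similar results, so the display name is compared exactly.
    """
    for ep in tc.endpoint_search(filter_fulltext=display_name):
        if ep["display_name"] == display_name:
            return ep["id"]
    return None
```

The exact-match comparison matters here because the name of the first collection is a prefix of the third.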
If you are uploading files from your own computer without using a collection, next choose Upload and then Select Files to Upload. You'll see the upload begin immediately.
If you are transferring files from a collection, either on your computer or elsewhere, use the second Collection Search box to find it and then identify the file or folder you wish to transfer. Select Transfer or Sync to... and then the Start button that's pointing in the appropriate direction. Because the transfer will be managed by the Globus Connect Personal client, you won't see it happening, but you can check on its progress by clicking Activity. Globus will send you an email when the transfer is complete.
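The same collection-to-collection transfer can in principle be automated with the Globus Python SDK (`pip install globus-sdk`). The sketch below is a starting point under several assumptions, not a tested SSCC procedure: it is written in the style of recent globus-sdk 3.x releases, the client ID must come from an app you register yourself with Globus, the collection IDs and paths are supplied by you, and the function names are hypothetical. The Silo collections may also require additional consents or a recent login that this sketch does not handle.

```python
def login_transfer_client(client_id):
    """Interactive browser login; returns a globus_sdk.TransferClient."""
    import globus_sdk  # third-party package, imported only when needed
    from globus_sdk.scopes import TransferScopes

    auth = globus_sdk.NativeAppAuthClient(client_id)
    auth.oauth2_start_flow(requested_scopes=TransferScopes.all)
    print("Open this URL and log in:", auth.oauth2_get_authorize_url())
    code = input("Paste the authorization code here: ").strip()
    tokens = auth.oauth2_exchange_code_for_tokens(code)
    token = tokens.by_resource_server["transfer.api.globus.org"]["access_token"]
    return globus_sdk.TransferClient(
        authorizer=globus_sdk.AccessTokenAuthorizer(token)
    )


def submit_folder_transfer(tc, source_id, dest_id, source_path, dest_path):
    """Submit a recursive folder transfer and return its task ID."""
    import globus_sdk

    tdata = globus_sdk.TransferData(
        source_endpoint=source_id,
        destination_endpoint=dest_id,
        label="Transfer to Silo",
        sync_level="checksum",  # skip files that already match at the destination
    )
    tdata.add_item(source_path, dest_path, recursive=True)
    return tc.submit_transfer(tdata)["task_id"]


def wait_for_task(tc, task_id, timeout=3600, polling_interval=30):
    """Poll a task, like watching the Activity page.

    Returns (finished, status). task_wait returns True once the task is no
    longer ACTIVE, and False if the timeout expires first.
    """
    finished = tc.task_wait(
        task_id, timeout=timeout, polling_interval=polling_interval
    )
    return finished, tc.get_task(task_id)["status"]
```

A typical sequence would be to call `login_transfer_client` once, pass the resulting client to `submit_folder_transfer` with your source and destination collection IDs, and then call `wait_for_task` with the returned task ID. A status of SUCCEEDED means the import is complete. The transfer keeps running on the Globus side even if your script exits.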
Installing and Configuring the Globus Connect Personal Client
The Globus Connect Personal client makes it easier to move large amounts of data through Globus. If you are using a managed computer, talk to your local IT department before installing any new software. You also need to log into the Globus web app at least once first.
Download Globus Connect Personal (all operating systems)
Most of the installation process is straightforward (see the full documentation for details) but we will highlight two items.
The Collection Details page will ask you to name your personal collection. Choose something that is likely to be unique so it will be easy to find in a search. Including your username in the collection name will probably work well.
Adding New Folders Your Collection Can Access for Windows
- Right-click the Globus Connect Personal icon in the taskbar and select Options.
- The "Access" tab lists folders that will be accessible via Globus for file transfers. By default, the only folder listed is your home directory. To add a folder, click the "+" icon and select the folder you wish to make accessible. To remove a folder, select it in the list and click the "-" icon.
Adding New Folders Your Collection Can Access for macOS
- Click the Globus Connect Personal icon in the main menu bar.
- In the menu that appears, select Preferences.
- The "Access" preferences tab lists the directories that are accessible for file transfers. By default, your home directory (e.g., /Users/demodoc) is listed.
- Click the "+" icon and select a folder to make it accessible for transfers. If you remove everything from the access list, no files will be accessible on your Globus Connect Personal endpoint and you will be prompted to add accessible paths. You can either click "+" and add directories and files or click Reset to Defaults.
Logging into Globus for the First Time
- In a browser, go to app.globus.org
- On the Globus home page, under the Use your existing organizational login header, select University of Wisconsin – Madison and click the Continue button.
- You will be redirected to the University of Wisconsin – Madison login page. You will need to use your NetID and password to log in.
- The first time you log in you will need to link your Globus account with your NetID and take a few other steps. If you already have a Globus account not linked to UW-Madison, you can link it to your UW-Madison account at this time. Otherwise, click Continue.
- You will then be brought to a page where you can set permissions for Globus. Click on the Allow button.
At this point, your account is ready for use and you will be brought to the Globus dashboard. You can verify that your identity has been successfully linked by clicking on the Account button in the sidebar on the left-hand side of the screen. On the account page, on the right-hand side of the screen, there is an option to Manage Identities and Link Another Identity. Click on the Manage Identities button to see what your primary identity is.