Test your script locally, then run it on LINCOMM
LINCOMM (Linux Community Servers) is built for big jobs: analyses that need more memory or time than your own computer can provide. But it is a shared resource, and it isn't the most comfortable place to write and debug code. The approach that works best is to perfect your script locally on a small sample, then run the full job on LINCOMM. This page explains why, and how the other pieces you may have already set up fit together to make that painless.
The problem this solves
Writing code is a loop of small changes: run it, see what breaks, fix it, run it again. Doing that loop directly on LINCOMM is slow and awkward. You're working over a remote connection, on a shared machine, often against a large dataset that takes a while just to load. You also tie up resources other people are waiting for, all to chase a typo.
Meanwhile, your own computer is right in front of you, with your familiar editor, and a small sample of the data loads instantly. That's the right place to get the logic correct. LINCOMM is the right place to run the script at full scale once the logic is correct.
How the workflow fits together
Three things you may have already set up combine into one smooth routine:
- Keep your data on the AAE file share. Because the same files are visible from both your computer and LINCOMM, you don't move copies around. See Where to keep your data: the AAE file share.
- Use relative paths in your script. Anchoring file paths to your script or project folder means the same code runs in both places with no editing. See Understanding relative paths.
- Run the heavy job under tmux. On LINCOMM, starting the job inside tmux lets it keep running if your connection drops. See Keep work running with tmux.
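As a minimal sketch of the relative-path idea in the list above, the Python snippet below anchors every data path to the folder that contains the script. The helper names (`project_dir`, `data_path`), the `data` subfolder, and the file name `survey.csv` are all hypothetical, chosen only for illustration; your own project layout may differ.

```python
from pathlib import Path


def project_dir(script_file):
    """Return the folder that contains the given script file."""
    return Path(script_file).resolve().parent


def data_path(script_file, *parts):
    """Build a path inside a 'data' subfolder next to the script.

    The 'data' folder name is an assumption made for this example.
    """
    return project_dir(script_file).joinpath("data", *parts)


# In a real script you would pass the built-in __file__, for example:
#     input_csv = data_path(__file__, "survey.csv")
```

Because the path is computed from the script's own location rather than from a drive letter or home directory, the same line should resolve correctly whether the project folder is opened from your computer or from LINCOMM.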
With those in place, the routine looks like this:
- On your computer, develop the script against a small sample of the data — a few hundred rows is plenty to prove the logic.
- Keep the script and its data together in a project folder on the file share, and have the script refer to the data by relative path.
- When the script runs cleanly on the sample, connect to LINCOMM, start tmux, and run the same script against the full dataset.
- Collect your results from the file share, back on your own computer.
Nothing about the script changes between the two runs. Only the size of the data does.
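One way to keep the script identical between the two runs is to pass the input file, and optionally a row limit, on the command line. The sketch below assumes a CSV input and uses a hypothetical `--limit` option; the function names and the row-counting "analysis" are placeholders for your real work, not a prescribed layout.

```python
import argparse
import csv
from itertools import islice


def read_rows(csv_path, limit=None):
    """Yield rows as dicts; stop after `limit` rows when a limit is given."""
    with open(csv_path, newline="", encoding="utf-8") as handle:
        # islice with limit=None passes every row through unchanged.
        yield from islice(csv.DictReader(handle), limit)


def parse_args(argv=None):
    parser = argparse.ArgumentParser(description="Run the analysis.")
    parser.add_argument("input_csv", help="path to the input CSV file")
    parser.add_argument("--limit", type=int, default=None,
                        help="only read this many rows (for local testing)")
    return parser.parse_args(argv)


def main(argv=None):
    args = parse_args(argv)
    # Placeholder analysis: count the rows that were read.
    count = sum(1 for _ in read_rows(args.input_csv, args.limit))
    print(f"processed {count} rows")
    return count
```

With a call to `main()` at the bottom of the file, you might run it locally with `--limit 300` and on LINCOMM with no limit at all. The code is the same in both cases; only the amount of data read changes.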
Why test on a sample first
A bug that surfaces after thirty seconds on a sample might not surface until two hours into a full run — and then you've lost two hours. Testing on a small, representative slice catches most logic errors cheaply. Once the script is correct, scaling up is just a matter of pointing it at the whole dataset.
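If the full dataset is a CSV file with a header row, a random sample can be drawn in a single pass without loading the whole file into memory. The sketch below uses reservoir sampling (Algorithm R); the function name `write_sample`, the default of 500 rows, and the fixed seed are assumptions made for the example.

```python
import csv
import random


def write_sample(source_csv, sample_csv, k=500, seed=0):
    """Write a uniform random sample of up to k data rows to sample_csv.

    Assumes the first line of source_csv is a header row.
    Returns the number of data rows written.
    """
    rng = random.Random(seed)  # fixed seed so the sample is reproducible
    reservoir = []
    with open(source_csv, newline="", encoding="utf-8") as src:
        reader = csv.reader(src)
        header = next(reader)
        for i, row in enumerate(reader):
            if i < k:
                # Fill the reservoir with the first k rows.
                reservoir.append(row)
            else:
                # Replace an existing entry with probability k / (i + 1).
                j = rng.randint(0, i)
                if j < k:
                    reservoir[j] = row
    with open(sample_csv, "w", newline="", encoding="utf-8") as dst:
        writer = csv.writer(dst)
        writer.writerow(header)
        writer.writerows(reservoir)
    return len(reservoir)
```

A uniform sample like this can still miss rare messy rows, so it may be worth adding a few known problem cases to the sample file by hand.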
Limitations and trade-offs
- Your local setup and LINCOMM aren't identical. Package or library versions can differ, so a script that runs locally can still hit a surprise on LINCOMM. Installing the same package versions in both places reduces that risk.
- Your sample should resemble the real data. If it lacks the messy cases — missing values, odd encodings — the full run can still trip on them.
- Some datasets are too large to hold on a personal computer at all. In that case, test on a slice you copy locally, or test on LINCOMM itself using a small subset before launching the full job.
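For the first limitation above, a quick way to spot version differences is to print the versions your script depends on in both places and compare the output. This is a minimal sketch for a Python project using the standard library's `importlib.metadata`; the helper name `installed_versions` and the package names in the usage comment are examples only.

```python
import sys
from importlib import metadata


def installed_versions(packages):
    """Return a dict of package name -> installed version (None if missing)."""
    versions = {"python": sys.version.split()[0]}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None  # not installed in this environment
    return versions


# Example usage (package names are placeholders):
#     for name, version in installed_versions(["numpy", "pandas"]).items():
#         print(name, version)
```

Running the same lines on your computer and on LINCOMM gives two short listings that can be compared by eye before launching the full job.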