SSCC News for August 2023

SSCC Training

The SSCC’s fall training schedule is now available. We’d like to particularly draw the attention of new graduate students to the “Introduction to…” and “Data Wrangling in…” workshops in Stata, R, and Python, as these workshops are designed for you!

You should take an “Introduction to…” workshop if you:

  • Will be taking a quantitative methods class that uses statistical software you haven’t used before
  • Don’t feel like you really understand the statistical software you use
  • Plan to take an SSCC “Data Wrangling in…” workshop (unless you’re very confident you don’t need it)

You should take a “Data Wrangling in…” workshop if you plan to do quantitative research and you have never taken a class that spent a significant amount of time on preparing real-world data for analysis. As a benchmark, you should be able to do the following quickly and easily using reproducible code, and if you can’t, this workshop is for you:

  • Given a categorical “education” variable with a large number of very specific categories, combine them into a small number of broader categories
  • Given a data set of individuals grouped into households, calculate the household incomes and create an indicator variable for “this household has children”
  • Given a data set of individuals and a data set of counties, combine them into a single data set containing both individual-level variables and county-level variables
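
To give a sense of what these three tasks look like in practice, here is a minimal sketch in Python using pandas. The data, variable names, and category groupings are all made up for illustration, and this is just one of several reasonable ways to do each task.

```python
import pandas as pd

# Hypothetical individual-level data; every name and value here is made up.
people = pd.DataFrame({
    "person_id": [1, 2, 3, 4, 5],
    "household_id": [10, 10, 10, 20, 20],
    "county": ["County A", "County A", "County A", "County B", "County B"],
    "age": [41, 39, 7, 68, 66],
    "income": [52000, 48000, 0, 30000, 12000],
    "education": ["Bachelor's degree", "12th grade, no diploma",
                  "2nd grade", "Doctorate", "GED"],
})

# Task 1: combine very specific education categories into broader ones.
# The grouping below is an illustrative assumption, not a standard coding.
education_groups = {
    "2nd grade": "Less than high school",
    "12th grade, no diploma": "Less than high school",
    "GED": "High school",
    "Bachelor's degree": "College or more",
    "Doctorate": "College or more",
}
people["education_broad"] = people["education"].map(education_groups)

# Task 2: household income and a "this household has children" indicator.
# transform() returns one value per person, so the result stays at the
# individual level and lines up with the original rows.
people["household_income"] = (
    people.groupby("household_id")["income"].transform("sum")
)
people["is_child"] = people["age"] < 18
people["has_children"] = (
    people.groupby("household_id")["is_child"].transform("any")
)

# Task 3: add county-level variables to the individual-level data.
counties = pd.DataFrame({
    "county": ["County A", "County B"],
    "county_population": [1000, 2000],  # made-up values
})
# validate="many_to_one" raises an error if a county appears more than once
# in the county data, which would silently duplicate people otherwise.
combined = people.merge(counties, on="county", how="left",
                        validate="many_to_one")

print(combined)
```

If code like this looks unfamiliar, that is a good sign the workshop will be worth your time.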

The SSCC’s statistical consultants have many years of experience helping graduate students like you, so we know what you need to learn and what’s most likely to be a challenge to you. Let us help you learn it!

Summer Tech Update and Downtime

On Saturday, August 12, from 9:00AM to 5:00PM, SSCC staff will take Winstat, Linstat, Silo, and Slurm offline for updates. SSCC disk space will also be unavailable (U:, V:, X:, Z:, /project, etc.). Any running jobs will be terminated, though Slurm will restart some jobs automatically once the downtime is over.

During this time all software will be updated to the latest version, including Stata 18.

Slurm Tips

While it’s unusual for jobs not to be able to run immediately on the Slurm cluster, when it does happen the bottleneck is almost always memory rather than cores. Keep in mind that we have a large number of servers with 128 cores and 256GB of memory, and a much smaller number of servers with 128 cores and 512GB or 1024GB of memory. Only reserve more than 250GB of memory if you’re sure you need it: jobs that reserve less than 250GB of memory essentially always start running immediately because they can run on any of the servers. It’s good to give your job a safety margin, but we’re seeing too many jobs reserving many times more memory than they need.

If you need to use a large fraction of a server’s memory, and if your job can benefit from multiple cores at all, please use at least a similar fraction of the server’s cores. As an example of what we hate to see, two jobs that reserve 500GB of memory and four cores each will fully occupy one of the 1024GB high-memory servers but leave 120 cores sitting idle. It’s better for each job to use half the server’s cores (64) since it’s using half the memory, so that when the server is full all the cores will be in use. Using more cores actually benefits others by making your job run faster and freeing up the memory it’s using sooner.
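
The arithmetic behind this advice can be checked with a few lines of Python. This is only an illustrative sketch: the function names are hypothetical, and it assumes a high-memory server with 1024GB of memory and 128 cores, as described above.

```python
def server_usage(jobs, server_memory_gb, server_cores):
    """Return (free memory in GB, idle cores) after placing jobs on a server.

    Each job is a (memory_gb, cores) tuple. Hypothetical helper for
    illustration only; it does not talk to Slurm.
    """
    used_memory = sum(memory for memory, _ in jobs)
    used_cores = sum(cores for _, cores in jobs)
    return server_memory_gb - used_memory, server_cores - used_cores


def proportional_cores(memory_gb, server_memory_gb, server_cores=128):
    """Cores matching the share of the server's memory a job reserves."""
    cores = memory_gb * server_cores // server_memory_gb
    return max(1, min(server_cores, cores))


# Two jobs with 500GB and four cores each: only 24GB is left, so no other
# large job fits, yet 120 cores sit idle.
print(server_usage([(500, 4), (500, 4)], 1024, 128))    # (24, 120)

# The same two jobs with 64 cores each leave no cores idle.
print(server_usage([(500, 64), (500, 64)], 1024, 128))  # (24, 0)

# Exactly half the memory corresponds to exactly half the cores.
print(proportional_cores(512, 1024))  # 64
```

A 500GB reservation is slightly less than half of 1024GB, so the strictly proportional figure is 62 cores; rounding up to 64 simply uses the whole server when two such jobs share it.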

If your job cannot use multiple cores, consider whether the job itself can be broken up into many smaller jobs that the cluster can run in parallel. We’ve created simple instructions for doing so in Stata, R, and Python. The SSCC’s statistical consultants will be happy to help you figure out if your work is a candidate for parallelization.
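
As a rough illustration of the idea (not necessarily the approach our instructions take), one common pattern is a Slurm job array, where each job in the array runs the same script on a different share of the work. The sketch below assumes the array was submitted with task IDs starting at zero, for example with `--array=0-3`; the function name and the list of tasks are hypothetical.

```python
import os


def tasks_for_this_job(all_tasks, task_id, n_jobs):
    """Return the share of the tasks that array job number task_id handles.

    Tasks are dealt out round-robin, so every task goes to exactly one job.
    Assumes task IDs run from 0 to n_jobs - 1.
    """
    return all_tasks[task_id::n_jobs]


# Slurm sets these environment variables for jobs in an array. Outside of
# Slurm we fall back to a single job that does all the work.
task_id = int(os.environ.get("SLURM_ARRAY_TASK_ID", 0))
n_jobs = int(os.environ.get("SLURM_ARRAY_TASK_COUNT", 1))

# Hypothetical units of work, such as simulation seeds or input file numbers.
all_tasks = list(range(10))

for task in tasks_for_this_job(all_tasks, task_id, n_jobs):
    # Replace this with the real single-core computation for one task.
    print(f"Job {task_id} of {n_jobs} is running task {task}")
```

With four jobs in the array, job 1 would handle tasks 1, 5, and 9. Each job needs only one core and a fraction of the total run time, so the pieces can start wherever the cluster has room.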

Don’t forget the “short” partition, especially on those occasions when the Slurm cluster is full. One high-memory server is reserved for short jobs, but jobs submitted to the short partition will also run on other servers if they’re available. Thus there’s no downside to using the short partition other than the maximum job length of six hours (as opposed to ten days in the default “sscc” partition).

Planning For the End of Windows 10

Microsoft has announced that it will stop supporting Windows 10 in October 2025. That means security problems will no longer be fixed after that time, which will make computers running Windows 10 a threat to any network they’re on. In accordance with UW System policy, all computers running Windows 10, whether personal or University-owned, will be blocked from the SSCC network once support for Windows 10 ends. Windows users should upgrade to Windows 11 well before that point, though there’s no particular need to do so now.

In general, computers made prior to 2017 will not be able to run Windows 11. If you have a computer that is older than that, make plans to replace it sometime in the next two years. At that point it will be at least eight years old and ready to be replaced anyway. Unfortunately, SSCC departments have about 300 computers that fall in that category. We will be reaching out to departments to provide a list and help you develop a plan, as we realize finding room in your budget to replace those computers may take the full two years.