Login Node Performance Issues

After compute nodes in Tampa began losing access to the /work filesystem, our team found inconsistent network latency between Tampa and Winter Haven (packets were taking longer than expected to travel between the two sites). After an exhaustive investigation, we discovered that one switch in Winter Haven was experiencing a [...]

System Updates for 01/06/2011 and Recurring Maintenance Schedule

Starting Thursday, January 6th, we will begin a regular maintenance period from 10:00pm until 1:00am to install routine system patches. This maintenance period will NOT affect any running or queued jobs and SHOULD NOT affect your ability to use the system, though some issues may arise during the maintenance window.

The [...]

Partial Resource Outage for Library Hardware

Hardware housed in our Library datacenter will need to be taken down on Monday, October 4th at 11:00am. The work is scheduled to last until 5:00pm, but we expect to finish before then. We will use this time to make required configuration changes to the network. The following queues will be [...]

Required Reboot on 08/27 for Bug Fix

To resolve a problem involving Java applications and NFS version 4, we will need to reboot all of the login nodes and all of the compute nodes. Our tentative plan is to do this on Friday 08/27, as several applications are experiencing difficulties because of this bug. A reboot is required [...]

System Instability and Slowness Issues Resolved

Because of a problem involving several compute nodes and the NFS file server that provides the /home directory, the system became slow and unstable during the day today. While attempting to isolate the problem nodes/processes/users, we were forced to quickly delete all jobs in an effort to prevent the failure from affecting other [...]