Research Computing, a division of Information Technology, has been established to promote the availability of high-performance computing resources essential to effective research at the University of South Florida. Research Computing supports software tools, high-performance computer hardware, and training for both faculty and students.

Our resources are freely available to faculty and students involved in research projects. We ask that you kindly acknowledge us in all publications for which our resources have been useful. Our preferred acknowledgment statement is available here. We also ask that you send an email to publications@rc.usf.edu to let us know of publications or grant requests in which our facilities are mentioned. We would also appreciate being kept informed of any grant requests that are fulfilled.

Research Computing Users Group & Workshop

For those of you who might be unaware, RC holds weekly user group and workshop meetings to answer questions and discuss issues in person.

The weekly Research Computing Workshop Sessions will be moving to the new Advanced Visualization Center beginning this week. The Visualization Center is in room PHY 147, which is across the hall from the physics auditorium. The sessions will still be held on Tuesdays from 2-3pm, and are open to anyone who would like to ask questions about or discuss issues with our systems and applications. Also, feel free to stop by to check out the Visualization Center and its brand new 3D-HD visualization wall.

Issue with /work 04/12/2012

Today it was discovered that roughly 40 compute nodes within our cluster had dropped their /work mounts.

As a result of this issue, those nodes needed to be rebooted.  Once the nodes resumed operations, /work was again available without any issue.

The reboots interrupted any user jobs that were running on those nodes.  These jobs will need to be re-submitted, both because of the interruption and because a job stops generating output and effectively stalls once its /work mount drops.

Please check your recently submitted jobs and make sure that they completed with the expected output files. If the output files are garbled or incomplete, or if there are no output files at all, please re-submit the job(s).

John DeSantis
Research Computing
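
As a starting point for the check described above, here is a minimal sketch in Python that flags output files that are missing or empty. The function name and the file names in the usage comment are hypothetical, and the script cannot tell whether a non-empty file is complete, so please still inspect your results by hand.

```python
from pathlib import Path


def find_suspect_outputs(paths):
    """Return (path, reason) pairs for output files that are missing or empty."""
    suspects = []
    for p in map(Path, paths):
        if not p.is_file():
            # The job never created this file (or it is not a regular file).
            suspects.append((str(p), "missing"))
        elif p.stat().st_size == 0:
            # The file exists but nothing was written to it.
            suspects.append((str(p), "empty"))
    return suspects


# Hypothetical usage: list the output files you expect from your own jobs.
# for path, reason in find_suspect_outputs(["run1.out", "run2.out"]):
#     print(f"{path}: {reason}")
```

Any job whose expected files are reported here is a candidate for re-submission.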

Maple 16 and Matlab R2012a News

It is our pleasure to announce that Maple 16 is now available for use on CIRCE.  Users of NX will find that it is listed under “Applications -> Education”.

Matlab release R2012a is now available for cluster use and can be downloaded by students and faculty engaging in research from the RC ISO’s site (NetID and Password required).  See our documentation for more information about installing and running Matlab here: https://rc.usf.edu/trac/doc/wiki/MatlabUser

John DeSantis

InfiniBand Fabric Issue, 04/10/2012

While installing several more compute nodes in the wh.2012.01.q hardware pool, we accidentally and briefly disconnected the power cable to one of the InfiniBand switches. If any of your jobs failed with the error below, please resubmit them; the problem has been resolved. We apologize for any inconvenience.


WARNING: There is at least one OpenFabrics device found but there are
no active ports detected (or Open MPI was unable to use them). This
is most certainly not what you wanted. Check your cables, subnet
manager configuration, etc. The openib BTL will be ignored for this
job.
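
If you have many jobs to check, a short script can search their log files for the warning above. The following is only a sketch: the function name is hypothetical, and it assumes that your job's error output was written to text files whose paths you pass in.

```python
from pathlib import Path

# Fragment taken from the first line of the warning shown above.
MARKER = "There is at least one OpenFabrics device found"


def affected_logs(paths):
    """Return the paths of the log files that contain the OpenFabrics warning."""
    hits = []
    for p in map(Path, paths):
        try:
            text = p.read_text(errors="replace")
        except OSError:
            # Skip files that are missing or unreadable.
            continue
        if MARKER in text:
            hits.append(str(p))
    return hits


# Hypothetical usage: pass the error/log files written by your own jobs.
# print(affected_logs(["myjob.e12345", "myjob.e12346"]))
```

Jobs whose logs are listed are the ones to resubmit.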

May 2012 Upgrades to /work Filesystem

During the first week of May, we will be upgrading the current /work filesystem to provide significant performance enhancements and, most importantly, long-awaited stability fixes.

/work will be migrated from a home-grown GlusterFS volume of roughly 100TB to a vendor-supported Lustre filesystem of equivalent capacity. The new filesystem is capable of 4GB/s of read/write throughput and offers a variety of options for supporting various applications as efficiently as possible. This upgrade will eliminate the issues we have had with /work since January, providing greater stability for all jobs and much improved performance for I/O-hungry jobs.

All current data on /work will be copied to the new /work filesystem. On a cut-off date, the /work filesystem will be taken offline for several hours for one last sync of the data before the new system goes live. We will narrow down this date and let everyone know when the downtime will be.

/work2 will be taken out of service following this upgrade, as the new configuration makes it redundant. Data on /work2 will NOT be transferred; it is up to you to copy any data you need to either /work or /home before the volume is taken offline.
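
One possible way to move a directory off /work2 is sketched below in Python (3.8 or later). It copies a directory tree and then reports any file whose copy is missing or differs in size. The function name and the paths in the usage comment are placeholders, not actual locations on our systems, and a tool such as rsync may suit large transfers better.

```python
import shutil
from pathlib import Path


def copy_and_verify(src, dst):
    """Copy the tree at src into dst, then return source files whose copy is missing or differs in size."""
    src, dst = Path(src), Path(dst)
    # dirs_exist_ok=True lets the copy be re-run into an existing destination.
    shutil.copytree(src, dst, dirs_exist_ok=True)
    problems = []
    for f in src.rglob("*"):
        if f.is_file():
            target = dst / f.relative_to(src)
            if not target.is_file() or target.stat().st_size != f.stat().st_size:
                problems.append(str(f))
    return problems


# Hypothetical usage; substitute your own source and destination directories.
# leftovers = copy_and_verify("/work2/path/to/your/data", "/work/path/to/your/data")
# print(leftovers or "all files copied")
```

A size comparison is only a basic check; for critical data, consider comparing checksums before deleting anything from the source.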

This work is slated to occur sometime during the first week of May. As we narrow down the date, we will update you with our revised timeline.

Thank you for your patience and cooperation during this process. Please let us know if there are any questions related to this work.