<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>USF Information Technology, Research Computing</title>
	<atom:link href="http://rc.blog.usf.edu/feed/" rel="self" type="application/rss+xml" />
	<link>http://rc.blog.usf.edu</link>
	<description>News and Information about system events, updates, and other topics.</description>
	<lastBuildDate>Wed, 16 May 2012 16:43:42 +0000</lastBuildDate>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
	<generator>http://wordpress.org/?v=3.2.1</generator>
		<item>
		<title>Login Node Performance Issues</title>
		<link>http://rc.blog.usf.edu/system/maintenance/login-node-performance-issues/</link>
		<comments>http://rc.blog.usf.edu/system/maintenance/login-node-performance-issues/#comments</comments>
		<pubDate>Wed, 16 May 2012 16:43:42 +0000</pubDate>
		<dc:creator>Brian Smith</dc:creator>
				<category><![CDATA[Maintenance]]></category>
		<category><![CDATA[System]]></category>

		<guid isPermaLink="false">http://rc.blog.usf.edu/?p=551</guid>
		<description><![CDATA[<p>After having some issues with compute nodes in Tampa losing access to the /work filesystem, our team discovered some latency inconsistencies between Tampa and Winter Haven (network packets taking longer to get from one place to the next).  After an exhaustive investigation, we discovered that one switch in Winter Haven was experiencing a [...]]]></description>
			<content:encoded><![CDATA[<p>After having some issues with compute nodes in Tampa losing access to the /work filesystem, our team discovered some latency inconsistencies between Tampa and Winter Haven (network packets taking longer to get from one place to the next).  After an exhaustive investigation, we discovered that one switch in Winter Haven was experiencing a performance bug causing it to delay the forwarding of packets, introducing latencies of 30-90ms (milliseconds).  This may not sound like a lot, but any time you do an &#8220;ls&#8221; on your home directory, potentially hundreds of individual requests can be made to the NFS server.  At 30-90ms per request, significant lag can occur between issuing the command and actually getting a response.  Likewise, when you create or access a tar archive, the thousands of operations involved are slowed down by that additional latency, making a process that could take 5 seconds instead take 5 minutes.  We&#8217;re now back down to normal latencies, which are around 200us (microseconds).</p>
<p>After fixing the configuration and reloading the switch, system performance across the board improved dramatically, resolving the file transfer performance issues we saw with NFS, the mount issues with the compute nodes, and a couple of minor annoyances (tab completion, etc.).</p>
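<p>To put rough numbers on the latency figures above, here is a minimal Python sketch of the arithmetic.  The request counts (200 for a directory listing, 5000 for a tar archive) are illustrative assumptions, not measurements from our systems, and the model assumes the requests are issued strictly one after another.</p>

```python
# Rough model: total wait = number of sequential requests x per-request latency.
# The request counts used below are assumed for illustration only.

def total_wait_seconds(requests, latency_seconds):
    """Time spent waiting on the network if requests run one after another."""
    return requests * latency_seconds

NORMAL = 200e-6    # about 200 microseconds, the normal latency quoted above
DEGRADED = 60e-3   # 60 ms, the midpoint of the 30-90 ms range quoted above

for label, requests in (("ls on a home directory", 200), ("tar archive", 5000)):
    normal = total_wait_seconds(requests, NORMAL)
    degraded = total_wait_seconds(requests, DEGRADED)
    print(f"{label}: {normal:.2f} s normally, {degraded:.1f} s with the faulty switch")
```

<p>With these assumed counts, the archive case comes to roughly 300 seconds of waiting at 60ms per request, which is the same order as the 5 minutes mentioned above.</p>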
]]></content:encoded>
			<wfw:commentRss>http://rc.blog.usf.edu/system/maintenance/login-node-performance-issues/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Research Computing Users Group &amp; Workshop</title>
		<link>http://rc.blog.usf.edu/uncategorized/research-computing-users-group-workshop/</link>
		<comments>http://rc.blog.usf.edu/uncategorized/research-computing-users-group-workshop/#comments</comments>
		<pubDate>Mon, 16 Apr 2012 18:11:49 +0000</pubDate>
		<dc:creator>Brian Smith</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://rc.blog.usf.edu/?p=549</guid>
		<description><![CDATA[<p>For those of you who might be unaware, RC has been holding weekly user group and workshop meetings to answer questions and discuss issues in person.</p>
<p>The weekly Research Computing Workshop Sessions will be moving to the new Advanced Visualization Center beginning this week.  The Visualization Center is in room PHY 147, which is across [...]]]></description>
			<content:encoded><![CDATA[<p>For those of you who might be unaware, RC has been holding weekly user group and workshop meetings to answer questions and discuss issues in person.</p>
<p>The weekly Research Computing Workshop Sessions will be moving to the new Advanced Visualization Center beginning this week.  The Visualization Center is in room PHY 147, which is across the hall from the physics auditorium.  The sessions will still be held on Tuesdays from 2-3pm, and are open to anyone who would like to ask questions about or discuss issues with our systems and applications.  Also, feel free to stop by to check out the Visualization Center and its brand new 3D-HD visualization wall.</p>
]]></content:encoded>
			<wfw:commentRss>http://rc.blog.usf.edu/uncategorized/research-computing-users-group-workshop/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Issue with /work 04/12/2012</title>
		<link>http://rc.blog.usf.edu/uncategorized/issue-with-work-04122012/</link>
		<comments>http://rc.blog.usf.edu/uncategorized/issue-with-work-04122012/#comments</comments>
		<pubDate>Thu, 12 Apr 2012 20:51:28 +0000</pubDate>
		<dc:creator>John DeSantis</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://rc.blog.usf.edu/?p=546</guid>
		<description><![CDATA[<p>Today it was discovered that roughly 40 compute nodes within our cluster had dropped their /work mounts.</p>
<p>As a result of this issue, those nodes needed to be rebooted.  Once the nodes resumed operations, /work was again available without any issue.</p>
<p>Because of the reboots, running user jobs would have been negatively affected.  These jobs will need to be re-submitted given [...]]]></description>
			<content:encoded><![CDATA[<p>Today it was discovered that roughly 40 compute nodes within our cluster had dropped their /work mounts.</p>
<p>As a result of this issue, those nodes needed to be rebooted.  Once the nodes resumed operations, /work was again available without any issue.</p>
<p>Because of the reboots, running user jobs would have been negatively affected.  These jobs will need to be re-submitted given not only the interruption caused by the reboot, but also because when /work drops as a mount, output is no longer generated and the job is effectively stalled.</p>
<p>Please check your recently submitted jobs and make sure that they completed with appropriate output files.  If the output files look corrupted or incomplete, or there are no output files at all, please re-submit the job(s).</p>
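<p>As a starting point for that check, the following Python sketch flags output files that are missing or empty.  The function name and the idea of passing in a list of expected paths are our own illustration; it cannot tell whether a non-empty file is complete, so anything it does not flag still deserves a quick look.</p>

```python
import os

def suspect_outputs(paths):
    """Return the paths that are missing or empty and so deserve a closer look."""
    suspects = []
    for path in paths:
        # A dropped /work mount can leave no file at all, or a zero-length one.
        if not os.path.isfile(path) or os.path.getsize(path) == 0:
            suspects.append(path)
    return suspects
```

<p>You would call it with the output files your jobs were expected to write, for example suspect_outputs(["/work/yourname/job1/result.out"]), where that path is only a placeholder.</p>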
<p>John DeSantis<br />
Research Computing</p>
]]></content:encoded>
			<wfw:commentRss>http://rc.blog.usf.edu/uncategorized/issue-with-work-04122012/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Maple 16 and Matlab R2012a News</title>
		<link>http://rc.blog.usf.edu/hpc-software/maple-16-and-matlab-r2012a-news/</link>
		<comments>http://rc.blog.usf.edu/hpc-software/maple-16-and-matlab-r2012a-news/#comments</comments>
		<pubDate>Thu, 12 Apr 2012 13:01:05 +0000</pubDate>
		<dc:creator>John DeSantis</dc:creator>
				<category><![CDATA[Software]]></category>
		<category><![CDATA[Updates]]></category>

		<guid isPermaLink="false">http://rc.blog.usf.edu/?p=539</guid>
		<description><![CDATA[<p>It is our pleasure to announce that Maple 16 is now available for use on CIRCE.  Users of NX will find that it is listed under &#8220;Applications -&#62; Education&#8221;.</p>
<p>Matlab release R2012a is now available for cluster use and can be downloaded by students and faculty engaging in research from the RC ISO&#8217;s site (NetID and [...]]]></description>
			<content:encoded><![CDATA[<p>It is our pleasure to announce that Maple 16 is now available for use on CIRCE.  Users of NX will find that it is listed under &#8220;Applications -&gt; Education&#8221;.</p>
<p>Matlab release R2012a is now available for cluster use and can be downloaded by students and faculty engaging in research from the RC ISO&#8217;s site (NetID and Password required).  See our documentation for more information about installing and running Matlab here: <a href="https://rc.usf.edu/trac/doc/wiki/MatlabUser" target="_blank">https://rc.usf.edu/trac/doc/wiki/MatlabUser</a></p>
<p>John DeSantis</p>
]]></content:encoded>
			<wfw:commentRss>http://rc.blog.usf.edu/hpc-software/maple-16-and-matlab-r2012a-news/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>InfiniBand Fabric Issue, 04/10/2012</title>
		<link>http://rc.blog.usf.edu/uncategorized/infiniband-fabric-issue-04102012/</link>
		<comments>http://rc.blog.usf.edu/uncategorized/infiniband-fabric-issue-04102012/#comments</comments>
		<pubDate>Tue, 10 Apr 2012 20:56:36 +0000</pubDate>
		<dc:creator>Brian Smith</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://rc.blog.usf.edu/?p=531</guid>
		<description><![CDATA[<p>During work to add several more compute nodes to the wh.2012.01.q hardware pool, the power cable to one of the InfiniBand switches was briefly disconnected by accident. If any jobs fail with the error below, please resubmit them, as the problem has been resolved. We apologize for any inconvenience.</p>
<p>&#8230;
WARNING: There is at least one [...]]]></description>
			<content:encoded><![CDATA[<p>During work to add several more compute nodes to the wh.2012.01.q hardware pool, the power cable to one of the InfiniBand switches was briefly disconnected by accident. If any jobs fail with the error below, please resubmit them, as the problem has been resolved. We apologize for any inconvenience.</p>
<p>&#8230;<br />
WARNING: There is at least one OpenFabrics device found but there are<br />
no active ports detected (or Open MPI was unable to use them). This<br />
is most certainly not what you wanted. Check your cables, subnet<br />
manager configuration, etc. The openib BTL will be ignored for this<br />
job.<br />
&#8230;</p>
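<p>If you have many jobs to check, a short Python sketch like the one below can list the log files that contain the warning above.  The function name is hypothetical, and which files hold your jobs' error output depends on how you submitted them, so treat the list of paths as something you supply yourself.</p>

```python
# Phrase taken from the first line of the Open MPI warning quoted above.
MARKER = "There is at least one OpenFabrics device found"

def jobs_to_resubmit(log_paths):
    """Return the log files that contain the Open MPI warning shown above."""
    hits = []
    for path in log_paths:
        # errors="replace" keeps the scan going if a log has stray binary bytes.
        with open(path, errors="replace") as handle:
            if MARKER in handle.read():
                hits.append(path)
    return hits
```

<p>Any job whose log shows up in the returned list ran without InfiniBand for at least part of its run and is a candidate for resubmission.</p>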
]]></content:encoded>
			<wfw:commentRss>http://rc.blog.usf.edu/uncategorized/infiniband-fabric-issue-04102012/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>May 2012 Upgrades to /work Filesystem</title>
		<link>http://rc.blog.usf.edu/uncategorized/may-2012-upgrades-to-work-filesystem/</link>
		<comments>http://rc.blog.usf.edu/uncategorized/may-2012-upgrades-to-work-filesystem/#comments</comments>
		<pubDate>Tue, 10 Apr 2012 16:51:52 +0000</pubDate>
		<dc:creator>Brian Smith</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://rc.blog.usf.edu/?p=529</guid>
		<description><![CDATA[<p>During the first week of May, we will be upgrading the current /work filesystem in order to provide significant performance enhancements and, most importantly, long-awaited fixes related to stability.</p>
<p>/work will be migrated from a home-grown GlusterFS volume of roughly 100TB to a vendor-supported Lustre filesystem of equivalent capacity, capable of 4GB/s of read/write throughput [...]]]></description>
			<content:encoded><![CDATA[<p>During the first week of May, we will be upgrading the current /work filesystem in order to provide significant performance enhancements and, most importantly, long-awaited fixes related to stability.</p>
<p>/work will be migrated from a home-grown GlusterFS volume of roughly 100TB to a vendor-supported Lustre filesystem of equivalent capacity, capable of 4GB/s of read/write throughput and a variety of options for supporting various applications as efficiently as possible.  This upgrade will eliminate the issues we have had with /work since January, providing greater stability for jobs and much improved performance for jobs that are I/O-hungry.</p>
<p>All current data on /work will be copied to the new /work filesystem. There will be a cut-off date on which the /work filesystem will be taken offline for several hours to do one last sync of the data before bringing the new system live.  We will narrow down this date and let everyone know when the downtime period will be.</p>
<p>/work2 will be taken out of service following this upgrade, as it will be made redundant by the new configuration.  Data on /work2 will NOT be transferred, and it will be up to you to ensure that data is copied to either /work or /home before the volume is taken offline.</p>
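<p>One possible way to copy a directory off /work2 and confirm that nothing was left behind is sketched below in Python.  The function name is hypothetical, the check only compares file counts (not file contents), and ordinary tools such as cp or rsync work just as well.</p>

```python
import os
import shutil

def copy_tree_checked(source, destination):
    """Copy source to a new destination directory and compare file counts."""
    # copytree creates the destination itself, so it must not exist yet.
    shutil.copytree(source, destination)

    def count_files(root):
        return sum(len(files) for _, _, files in os.walk(root))

    # True when both trees hold the same number of files.
    return count_files(source) == count_files(destination)
```

<p>For example, copy_tree_checked("/work2/yourname/project", "/work/yourname/project") would copy one project directory; both paths are placeholders for your own directories.</p>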
<p>This work is slated to occur sometime during the first week of May.  As we narrow down the date, we will update you with our revised timeline.</p>
<p>Thank you for your patience and cooperation during this process.  Please let us know if there are any questions related to this work.</p>
]]></content:encoded>
			<wfw:commentRss>http://rc.blog.usf.edu/uncategorized/may-2012-upgrades-to-work-filesystem/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>New Compute Resources: Sandy Bridge and Fermi GPUs</title>
		<link>http://rc.blog.usf.edu/uncategorized/new-compute-resources-sandy-bridge-and-fermi-gpus/</link>
		<comments>http://rc.blog.usf.edu/uncategorized/new-compute-resources-sandy-bridge-and-fermi-gpus/#comments</comments>
		<pubDate>Tue, 10 Apr 2012 16:50:18 +0000</pubDate>
		<dc:creator>Brian Smith</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://rc.blog.usf.edu/?p=527</guid>
		<description><![CDATA[<p>Despite the significant system expansion that occurred back in January, adding over 1500 CPU cores and several terabytes of memory, we can always use more resources. We are finalizing some small details to add at least 60 new systems to the cluster based on the E5-2630 Intel Sandy Bridge CPU. This will add an additional [...]]]></description>
			<content:encoded><![CDATA[<p>Despite the significant system expansion that occurred back in January, adding over 1500 CPU cores and several terabytes of memory, we can always use more resources. We are finalizing some small details to add at least 60 new systems to the cluster based on the E5-2630 Intel Sandy Bridge CPU. This will add an additional 720 CPU cores (minimum) of cutting-edge processing power to the system. We will be working to recompile applications and libraries to take advantage of the new CPUs and their capabilities when they come online during the month of June.</p>
<p>We are also working to integrate existing applications with our available GPU resources and would be very much interested in any early adopters/beta testers helping us to work out kinks in the new system.</p>
<p>Challenges to address include:</p>
<p>1. GPU scheduling<br />
2. GPU/RDMA/MPI integration<br />
3. Application integration</p>
<p>If you have an application that will lend itself well to GPU computing and already has support for the CUDA architecture, please let us know if you are interested in testing the new resources.</p>
]]></content:encoded>
			<wfw:commentRss>http://rc.blog.usf.edu/uncategorized/new-compute-resources-sandy-bridge-and-fermi-gpus/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>CIRCE Research Users: UPDATE: /home issues &#8212; 3/2/2012</title>
		<link>http://rc.blog.usf.edu/uncategorized/circe-research-users-update-home-issues-322012/</link>
		<comments>http://rc.blog.usf.edu/uncategorized/circe-research-users-update-home-issues-322012/#comments</comments>
		<pubDate>Sun, 04 Mar 2012 15:54:44 +0000</pubDate>
		<dc:creator>jfargen</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://rc.blog.usf.edu/?p=517</guid>
		<description><![CDATA[<p>The system should be back up and functioning normally.  We had to do extensive checking and debugging of the filesystem which resulted in the loss of some files.  Our timestamps indicate that many of these files are either</p>
<p>a) from today, which cannot be recovered
b) last accessed more than 2 weeks ago, which can [...]]]></description>
			<content:encoded><![CDATA[<p>The system should be back up and functioning normally.  We had to do extensive checking and debugging of the filesystem which resulted in the loss of some files.  Our timestamps indicate that many of these files are either</p>
<p>a) from today, which cannot be recovered<br />
b) last accessed more than 2 weeks ago, which can be recovered</p>
<p>Please let us know, by filing a ticket at help@usf.edu, if you have any missing files so we can restore them. </p>
<p>We apologize for this inconvenience and are working with data center services to resolve the recurring issues with /home which are the result of faulty hardware supplied by our vendor.</p>
<p>My apologies for any inconvenience today&#8217;s outage may have caused.</p>
<p>Best Regards,<br />
Brian Smith</p>
]]></content:encoded>
			<wfw:commentRss>http://rc.blog.usf.edu/uncategorized/circe-research-users-update-home-issues-322012/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Update: Storage Issue &#8212; 3/2/2012</title>
		<link>http://rc.blog.usf.edu/uncategorized/update-storage-issue-322012/</link>
		<comments>http://rc.blog.usf.edu/uncategorized/update-storage-issue-322012/#comments</comments>
		<pubDate>Fri, 02 Mar 2012 19:17:58 +0000</pubDate>
		<dc:creator>jfargen</dc:creator>
				<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">http://rc.blog.usf.edu/?p=515</guid>
		<description><![CDATA[<p>An attempt to replace a failed drive in the array has caused the storage system to go offline.  We&#8217;re working to resolve the issue. Currently, /home, /apps, and /shares are unavailable.</p>
<p>-Brian</p>
]]></description>
			<content:encoded><![CDATA[<p>An attempt to replace a failed drive in the array has caused the storage system to go offline.  We&#8217;re working to resolve the issue. Currently, /home, /apps, and /shares are unavailable.</p>
<p>-Brian</p>
]]></content:encoded>
			<wfw:commentRss>http://rc.blog.usf.edu/uncategorized/update-storage-issue-322012/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Storage Issue &#8212; 3/2/2012</title>
		<link>http://rc.blog.usf.edu/uncategorized/storage-issue-322012/</link>
		<comments>http://rc.blog.usf.edu/uncategorized/storage-issue-322012/#comments</comments>
		<pubDate>Fri, 02 Mar 2012 19:16:27 +0000</pubDate>
		<dc:creator>jfargen</dc:creator>
				<category><![CDATA[Uncategorized]]></category>
		<category><![CDATA[home]]></category>
		<category><![CDATA[hpc]]></category>
		<category><![CDATA[scheduler]]></category>
		<category><![CDATA[storage]]></category>

		<guid isPermaLink="false">http://rc.blog.usf.edu/?p=513</guid>
		<description><![CDATA[<p>To all:</p>
<p>Due to a drive failure in one of the arrays which stores /home, I&#8217;m stopping all jobs that will run from within /home in order to reduce the load on the storage system so that the array will rebuild as quickly as possible.  All new jobs submitted from within /home will be presented [...]]]></description>
			<content:encoded><![CDATA[<p>To all:</p>
<p>Due to a drive failure in one of the arrays which stores /home, I&#8217;m stopping all jobs that will run from within /home in order to reduce the load on the storage system so that the array will rebuild as quickly as possible.  All new jobs submitted from within /home will be presented with an error message stating that they should not be run from that location.  Please use /work instead.</p>
<p>I will keep this list updated as the array repairs are completed.</p>
<p>Best Regards,<br />
Brian Smith</p>
]]></content:encoded>
			<wfw:commentRss>http://rc.blog.usf.edu/uncategorized/storage-issue-322012/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>

