Titan (supercomputer)

Titan
Active: Became operational October 29, 2012
Sponsors: US DOE and NOAA (<10%)
Operators: Cray Inc.
Location: Oak Ridge National Laboratory
Architecture: 18,688 AMD Opteron 6274 16-core CPUs; 18,688 Nvidia Tesla K20X GPUs
Operating system: Cray Linux Environment
Power: 8.2 MW
Space: 404 m² (4,352 sq ft)
Memory: 710 TB (598 TB CPU and 112 TB GPU)
Storage: 10 PB, 240 GB/s IO
Speed: 17.59 petaFLOPS (LINPACK); 27 petaFLOPS theoretical peak
Cost: US$97 million
Ranking: TOP500: #1, November 12, 2012
Purpose: Scientific research
Legacy: Ranked #1 on TOP500 when built; first GPU-based supercomputer to perform over 10 petaFLOPS
Website: www.olcf.ornl.gov/titan/

Titan is a supercomputer developed by Cray Inc. at Oak Ridge National Laboratory for use in a variety of science projects. Titan is an upgrade of Jaguar, a previous supercomputer at Oak Ridge, that uses GPUs in addition to conventional CPUs; it is the first such hybrid to perform over 10 petaFLOPS. The upgrade began in October 2011, stability testing commenced in October 2012, and the machine will be available to researchers in early 2013. The initial cost of the upgrade was US$60 million, funded primarily by the United States Department of Energy.

Titan uses AMD Opteron CPUs in conjunction with Nvidia Tesla GPUs to maintain energy efficiency while providing an order-of-magnitude increase in computational power over Jaguar. It contains 18,688 CPUs paired with an equal number of GPUs and has a theoretical peak performance of 27 petaFLOPS; in the LINPACK benchmark used to rank supercomputers by speed, it performed at 17.59 petaFLOPS. This was enough to take first place in the November 2012 ranking by the TOP500 organisation.

Titan is available for any purpose; however, selection for time on the computer depends on the importance of the project and its potential to fully utilise the hybrid architecture, and projects must also be able to run on other supercomputers to avoid dependence solely on Titan. Six "vanguard" codes, dealing mostly with molecular-scale physics or climate models, were selected to be the first to run on Titan, but other projects are also queued for use of the machine. Because of the inclusion of GPUs, programmers have had to alter their existing code to properly address the new architecture. The modifications often require a greater degree of parallelism, as the GPUs can handle many more threads simultaneously than CPUs, and the changes often yield greater performance even on non-GPU-based machines.

History

To remain power efficient and up to date in processing power, Jaguar had received various upgrades since its creation in 2005, when it used the XT3 platform and performed at 25 teraFLOPS. By 2008 Jaguar had been upgraded to the XT4 platform and performed at 263 teraFLOPS, and by 2009 it had been expanded using the XT5 platform to perform at 1.4 petaFLOPS. Further upgrades brought Jaguar to 1.76 petaFLOPS before the Titan upgrade began. Plans to create a supercomputer capable of 20 petaFLOPS at ORNL were in place as far back as 2005, when Jaguar was built, but the hybrid CPU/GPU architecture was not finalised until 2010 and the name "Titan" not until 2011. Titan was announced at the private ACM/IEEE Supercomputing Conference (SC10) on November 16, 2010, although a deal with Nvidia to supply the GPUs had been signed in 2009. It was publicly announced on October 11, 2011 as the first phase of the upgrade began. Initially a new 15,000 m² (160,000 sq ft) building was planned to house the replacement for Jaguar, but it was eventually decided to use Jaguar's existing infrastructure.

Titan was funded primarily by the US Department of Energy through Oak Ridge National Laboratory. ORNL funding was sufficient to purchase the CPUs but not all of the 18,688 GPUs, so the NOAA agreed to pay for the remaining nodes in return for computing time. ORNL scientific computing chief Jeff Nichols noted that Titan cost approximately $60 million upfront, of which the NOAA contribution was less than $10 million, but would not release precise figures due to non-disclosure agreements. Over the full term of the contract with Cray, Titan will cost $97 million, not including potential upgrades to the machine.

Jaguar's internals were upgraded to Titan over the course of a year beginning October 9, 2011. Between October and December, 96 of Jaguar's 200 cabinets, each containing XT5 blades (two 6-core CPUs per node), were upgraded to XK6 blades (one 16-core CPU per node) while the remainder of the machine remained available for processing. In December, computation was moved to the 96 XK6 cabinets and the remaining 104 cabinets were upgraded to XK6. The system's interconnect (the network that allows the CPUs to communicate with each other) was updated, and ORNL's ESnet connection was upgraded to 100 Gbps to permit faster data transfer to other national laboratories, universities and research institutions. The system memory was doubled to 600 TB as the nodes were upgraded to XK6. 960 of the XK6 nodes (10 cabinets) were also fitted with Fermi-based Nvidia GPUs, as Kepler-based GPUs were not yet available; these 960 nodes were referred to as TitanDev and used to test code for Titan's full upgrade. This first phase of the upgrade increased the peak performance of Jaguar to 3.3 petaFLOPS, although the computer was still called Jaguar. Beginning on September 13, 2012, Nvidia Tesla K20X GPUs were fitted to Jaguar's XK6 compute blades, which continued to use the same CPUs. The task was completed in October and the computer was finally renamed Titan. Titan will undergo acceptance testing until early 2013; once testing is complete, it will be made available to researchers.

Hardware

Titan uses the same building and 200 cabinets, covering 404 m² (4,352 sq ft), that Jaguar did, replacing the internals and upgrading the networking facilities. Reusing the power and cooling systems already in place for Jaguar saved the lab approximately US$20 million. Titan draws 8.2 MW, 1.2 MW more than Jaguar did, but it is almost ten times as fast in terms of floating-point calculations. Power is provided to each cabinet at 480 V, which allows thinner cables than the US standard 208 V and saved US$1 million in copper. In the event of a power failure, carbon-fibre flywheels power generators that can keep the networking and storage infrastructure running for up to 16 seconds. If power is not restored within 2 seconds, diesel engines are started; they take approximately 7 seconds to come up and then power the generators indefinitely. The flywheels and generators are designed only to keep the networking and storage components powered so that a reboot is much quicker; they are not capable of powering the processing infrastructure to continue simulations. Titan's components are air-cooled with heatsinks, but the air is chilled before being pumped through the cabinets. The cooling system has a capacity of 6,600 tons and works by cooling water to 5.5 °C (42 °F), which in turn chills the recirculated air.

Titan has 18,688 nodes (4 nodes per blade, 24 blades per cabinet), each containing a 16-core AMD Opteron 6274 CPU with 32 GB of DDR3 ECC memory and an Nvidia Tesla K20X GPU with 6 GB of GDDR5 ECC memory. The total number of processor cores is 299,008 and the total amount of RAM is over 710 TB. 10 PB of storage (made up of 13,400 7,200 rpm 1 TB hard drives) is available with a transfer speed of 240 GB/s. The next storage upgrade, due in 2013, will raise the total storage to between 20 and 30 PB with a transfer speed of approximately 1 TB/s. Titan runs the Cray Linux Environment, a full version of Linux on the login nodes and a scaled-down, more efficient version on the compute nodes. GPUs were selected for their vastly higher parallel-processing efficiency over CPUs. Although the GPUs have a slower clock speed than the CPUs, each GPU contains 2,688 CUDA cores at 732 MHz, resulting in a faster overall system. Consequently, for well-optimised codes the CPU cores are used to allocate tasks to the GPUs rather than to process the data directly, as in previous supercomputers.
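As a rough sanity check on the peak figure (an illustrative calculation only, using commonly quoted per-device double-precision peaks of about 1.31 teraFLOPS for the Tesla K20X and about 0.14 teraFLOPS for the Opteron 6274, neither of which is stated in this article):

  18{,}688 \times (1.31 + 0.14)\ \text{TFLOPS} \approx 27{,}100\ \text{TFLOPS} \approx 27\ \text{petaFLOPS}

which is consistent with the 27 petaFLOPS theoretical peak quoted above and suggests the GPUs supply roughly 90% of Titan's peak floating-point capability.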

Titan's hardware has a theoretical peak performance of 27 petaFLOPS with perfectly optimised software. On November 12, 2012, the TOP500 organisation, which ranks the world's supercomputers by their LINPACK performance, announced that Titan was ranked first at 17.59 petaFLOPS, displacing IBM Sequoia. Titan was also ranked third on the Green500, the same 500 supercomputers re-ordered in terms of energy efficiency.
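Taking the figures above at face value (17.59 petaFLOPS LINPACK, 27 petaFLOPS peak and an 8.2 MW draw), a back-of-the-envelope calculation gives:

  \frac{17.59}{27} \approx 65\%\ \text{of peak}, \qquad \frac{17.59 \times 10^{15}\ \text{FLOPS}}{8.2 \times 10^{6}\ \text{W}} \approx 2.1\ \text{GFLOPS/W}

roughly in line with Titan's placing near the top of the November 2012 Green500 list.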

Research projects

File:VERA reactor core.jpg
A VERA simulation of a light water reactor's core. This image was rendered on Jaguar but the project will continue with greater detail on Titan.

Although Titan is available for use by any project, the requests for use exceeded the computing time available, so selection criteria were drawn up. In 2009, the Oak Ridge Leadership Computing Facility (OLCF) considered fifty applications for first use of the supercomputer and narrowed them down to six successful candidates, chosen not only for the importance of the research but also for their ability to fully utilise the computing power of the hybrid system. Project code had to be modified to suit the GPU processing of Titan, but was required to remain capable of running on CPU-based systems so that the projects were not solely dependent on Titan. OLCF formed the Center for Accelerated Application Readiness (CAAR) to aid researchers in modifying their code for Titan, and it holds developer workshops at Nvidia headquarters to educate users about the architecture, compilers and applications on Titan and other supercomputers. CAAR has been working with Nvidia and code vendors on compilers that integrate GPU directives into the programming languages researchers already use, typically Fortran, C or C++. Researchers can then express parallelism in their existing code without learning a new programming language, and the compiler maps it to the GPUs. Dr. Bronson Messer, a computational astrophysicist, said of the task: "...an application using Titan to the utmost must also find a way to keep the GPU busy, remembering all the while that the GPU is fast, but less flexible than the CPU." Some projects found that the changes increased the efficiency of their code on non-GPU machines; the performance of Denovo doubled on CPU-based machines.
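As an illustration only (this is not code from any Titan project), a directive-based loop of the kind described above might look like the following Fortran sketch; an accelerator-aware compiler offloads the marked loop to the GPU, while other compilers simply treat the directive as a comment and run the loop on the CPU.

  program directive_sketch
    implicit none
    integer, parameter :: n = 1000000
    real(8), allocatable :: x(:), y(:)
    real(8) :: a
    integer :: i

    allocate(x(n), y(n))
    a = 2.0d0
    x = 1.0d0
    y = 0.5d0

    ! An accelerator-aware compiler offloads this loop to the GPU;
    ! others ignore the directive and run the loop serially on the CPU.
    !$acc parallel loop
    do i = 1, n
       y(i) = a * x(i) + y(i)
    end do
    !$acc end parallel loop

    print *, 'y(1) =', y(1)   ! expect 2.5
  end program directive_sketch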

The six initial projects selected for Titan cover a range of sciences. S3D models the fine-grain physics of combustion, aiming to improve the efficiency of diesel and biofuel engines; in 2009 the project used Jaguar to produce the first fully resolved simulation of autoigniting hydrocarbon flames relevant to the efficiency of direct-injection diesel engines. WL-LSMS simulates the interactions between electrons and atoms in magnetic materials at temperatures other than absolute zero; an earlier version of the code was the first to perform at greater than one petaFLOPS, on Jaguar. Denovo simulates nuclear reactions with the aim of improving the efficiency and reducing the waste of nuclear reactors; its performance on conventional CPU-based machines doubled after the tweaks for Titan, and it performs 3.5 times faster on Titan than it did on Jaguar. The Large-scale Atomic/Molecular Massively Parallel Simulator (LAMMPS) is a molecular dynamics code that simulates particles across a range of scales, from atomic to relativistic, to improve materials science, with potential applications in semiconductors, biomolecules and polymers. CAM-SE is a combination of two codes: the Community Atmosphere Model, a global atmosphere model, and the High Order Method Modeling Environment, a code that solves fluid and thermodynamic equations; it will allow greater accuracy in climate simulations. Non-Equilibrium Radiation Diffusion (NRDF) plots non-charged particles through supernovae, with potential applications in laser fusion, fluid dynamics, medical imaging, nuclear reactors, energy storage and combustion.

The amount of code alteration required to run on the GPUs varies by project. According to Dr. Messer of the NRDF project, only a small percentage of his code runs on the GPUs because the calculations are relatively simple but processed repeatedly and in parallel. NRDF is written in CUDA Fortran, a version of standard Fortran with CUDA extensions for the GPUs. Dr. Messer's research requires hundreds of partial differential equations to track the energy, angle, angle of scatter and type of each neutrino modeled in a star going supernova, resulting in millions of individual equations. The code was named Chimera after the mythological creature because it has three "heads": the first simulates the hydrodynamics of stellar material, the second simulates radiation transport and the third simulates nuclear burning. The third "head" is the first to run on the GPUs, as nuclear burning is the most easily simulated on GPU architecture, although the other aspects of the code will be modified in time. The project currently models 14 or 15 nuclear species, but if the GPUs provide good acceleration, Dr. Messer anticipates that up to 200 species could be simulated, allowing far greater precision when comparing to empirical observation.
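For readers unfamiliar with CUDA Fortran, the following hypothetical sketch (not taken from Chimera or NRDF) shows its general shape: a kernel marked attributes(global) is executed by many GPU threads in parallel, one array element per thread, while the host Fortran program copies data to the device, launches the kernel and copies the result back.

  module rate_kernels
    use cudafor
    implicit none
  contains
    ! Each GPU thread updates one element of the array.
    attributes(global) subroutine scale_rates(rates, factor, n)
      integer, value :: n
      real(8), device :: rates(n)
      real(8), value :: factor
      integer :: i
      i = (blockIdx%x - 1) * blockDim%x + threadIdx%x
      if (i <= n) rates(i) = rates(i) * factor
    end subroutine scale_rates
  end module rate_kernels

  program cuda_fortran_sketch
    use cudafor
    use rate_kernels
    implicit none
    integer, parameter :: n = 1048576
    real(8), allocatable :: rates(:)
    real(8), device, allocatable :: rates_d(:)

    allocate(rates(n), rates_d(n))
    rates = 1.0d0
    rates_d = rates                                   ! copy host -> device
    call scale_rates<<<(n + 255) / 256, 256>>>(rates_d, 2.0d0, n)
    rates = rates_d                                   ! copy device -> host
    print *, 'rates(1) =', rates(1)                   ! expect 2.0
  end program cuda_fortran_sketch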

VERA is a light water reactor simulation written at the Consortium for Advanced Simulation of Light Water Reactors (CASL) on Jaguar. It allows engineers to monitor the performance and status of any part of a reactor core throughout the lifetime of the reactor in order to identify points of interest. Although not one of the first six projects, VERA will be run on Titan, having been optimised with assistance from CAAR and tested on TitanDev. Computer scientist Tom Evans found that adapting the code to Titan's hybrid architecture was more difficult than moving between previous CPU-based supercomputers; despite this, he aims to simulate an entire reactor fuel cycle, an eighteen- to thirty-six-month process, in one week on Titan.


Records
Preceded by: IBM Sequoia (16.325 petaFLOPS)
World's most powerful supercomputer: November 2012 – present