Australia-ATLAS

  • Worker Nodes:- two types of worker nodes at the moment (a third to come online shortly).
  • Oldest hardware is Dell R410 (2x Intel Xeon E5520, 2261MHz, 16GB RAM, 2x600GB 15k SAS operating in RAID-1 (Hardware), 1GbE connected) - 21 nodes.
  • 20 nodes of HP DL170e G6 (2x Intel Xeon X5650, 2667MHz, 24GB RAM, 2x500GB 7.2k SATA operating in RAID-0 (Software), 1GbE connected).
  • 5 nodes of Dell C6145 (4x AMD Opteron 6276, 2300MHz, 128GB RAM, 2x2TB 7.2k SATA operating in RAID-1 (Hardware), 10GbE connected).
  • Storage environment:- four types of storage nodes, each consisting of a head node connected to storage drawers via SAS.
  • Oldest is Dell 1950 (1x Intel Xeon 5148, 2330MHz, 4GB RAM, 4x1GbE in LACP connected) attached to 2x MD1000 drawers (15x1TB 7.2k SATA drives, configured in RAID-6 (Hardware), broken into 2x6TB XFS filesystems via LVM). 2 of these units, presenting 48TB of storage.
  • Next oldest is Dell R710 (2x Intel Xeon E5520, 2270MHz, 24GB RAM, 1x10GbE connected) attached to 2x MD1000 drawers (15x1TB 7.2k SAS drives, configured in RAID-6 (Hardware), broken into 2x6TB XFS filesystems via LVM). 6 of these units, presenting 144TB of storage.
  • Next oldest is IBM x3650M3 (2x Intel Xeon X5680, 3300MHz, 24GB RAM, 1x10GbE connected) attached to 3x IBM EXP2512 drawers (12x2TB 7.2k NL-SAS drives, configured in RAID-6 (Hardware), broken into 3x6TB XFS filesystems via LVM). 4 of these units, presenting 218TB of storage.
  • Newest is Dell R620 (2x Intel Xeon E5-2609, 2400MHz, 24GB RAM, 1x10GbE connected) attached to 2x Dell MD1200 drawers (12x3TB 7.2k NL-SAS drives, configured in RAID-6 (Hardware), broken into 3x9.1TB XFS filesystems via LVM). 4 of these units, presenting 220TB of storage.
  • Head node is Dell R610 (2x Intel Xeon X5650, 2667MHz, 24GB RAM, 2x600GB 10k SAS operating in RAID-1 (Hardware), 1x1GbE connected). Running DPM, SRM, GridFTP, MySQL, RFIO.
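
A rough capacity check for the storage generations above. This is only a sketch in Python using the unit counts, drawer counts and filesystem sizes quoted in the list; small differences from the quoted totals come from rounding in those figures.

    # Presented capacity per storage generation, from the figures above.
    generations = {
        # name: (units, drawers per unit, XFS filesystems per drawer, filesystem size in TB)
        "Dell 1950 + MD1000":    (2, 2, 2, 6.0),
        "Dell R710 + MD1000":    (6, 2, 2, 6.0),
        "IBM x3650M3 + EXP2512": (4, 3, 3, 6.0),
        "Dell R620 + MD1200":    (4, 2, 3, 9.1),
    }

    total = 0.0
    for name, (units, drawers, fs_per_drawer, fs_tb) in generations.items():
        capacity = units * drawers * fs_per_drawer * fs_tb
        total += capacity
        print(f"{name:24} {capacity:6.1f} TB")
    print(f"{'total':24} {total:6.1f} TB")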

CA-ALBERTA-WESTGRID-T2

CA-MCGill-CLUMEQ-T2

    • Worker Nodes:- Nodes in the SW partition have 36GB of RAM (3GB/core), arranged in a balanced configuration of six 4GB (2Rx4, 1.5V) PC3-10600 CL9 ECC DDR3 1333MHz LP RDIMMs and six 2GB (2Rx8, 1.5V) PC3-10600 CL9 ECC DDR3 1333MHz LP RDIMMs; this combination operates at a memory speed of 1333MHz. An option has been provided to increase this to 48GB/node (4GB/core). InfiniBand is configured in a 2:1 blocking network to the storage; in addition there is a 1-GbE connection. Local disk is 450GB, partitioned into 30GB for cvmfs, 20GB for vmem, 379GB for local scratch, and the rest for the OS. Each node has 12 Intel Xeon X5650 cores at 2.67GHz (two six-core CPUs).
    • Storage environment:- GPFS is the global parallel filesystem, running on top of the Data Direct Networks (DDN) SFA10000 architecture. Two SFA10000 controller couplets are configured, each with ten 60-drive expansion drawers and six hundred 2TB 7,200 rpm 3.5" SATA disk drive modules. RAID is configured as 8+2P RAID-6 tiers, each couplet providing a usable capacity of ~960 TB, for a total usable capacity of ~1.92 PB for the shared cluster. The global space is partitioned into a small partition "/sb" of ~500TB and a large partition "/lb" of ~1.3PB, which holds the shared projects scratch and the ATLAS space.
    • Each SFA10000 couplet is equipped with eight QDR InfiniBand host ports, four per RAID controller. Four storage nodes are directly connected to each SFA10000 couplet in a redundant configuration, with each storage node having two paths to the couplet, one to each RAID controller. The eight storage IO nodes (GPFS NSD servers) are based on the x3650 M3 2U rack-mount server. These systems are configured with two Intel "Westmere-EP" Model E5620 four-core processors (2.40GHz, 12MB cache, 5.6GT/s QPI, 1066MHz), 48GB of RAM configured as twelve 4GB (2Rx4, 1.5V) PC3-10600 CL9 ECC DDR3 1333MHz LP RDIMMs, two Gigabit Ethernet ports, two 146GB 10k rpm 6Gbps SAS hard drives in a RAID-1 array on a ServeRAID controller, and redundant power and cooling sub-systems.
    • The back-end block-level IO paths from each x3650 M3 to the SFA10000 storage controller are provided by two Mellanox ConnectX-2 VPI single-port QSFP QDR IB PCIe adapters. Each adapter is placed in a separate x8 PCIe gen2 slot in order to maximise block-level IO throughput to the SFA10000. Client-side GPFS network traffic is handled by a third Mellanox ConnectX-2 VPI single-port QSFP QDR IB PCIe adapter, which connects via copper cabling to a local QLogic 12200 QDR IB switch with redundant power, which in turn provides connectivity to both the HB and LM QLogic 12800-360 core switches. The attained throughput of this configuration over the GPFS filesystem is projected at ~16 GB/s.
    • ATLAS stores and accesses data via a StoRM node on which all services are currently configured: the storm-frontend, the storm-backend and a GridFTP server. The StoRM node is an 8-core machine (Intel Xeon E5620 at 2.40GHz) with an InfiniBand connection to the storage, a 10-GbE connection to the outside and an internal 1-GbE connection.
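
A quick arithmetic check of the GPFS capacity figures above; this is illustrative only, simply applying the 8/10 data fraction of an 8+2P RAID-6 tier to the quoted drive counts and sizes.

    # Each SFA10000 couplet: ten 60-drive drawers of 2TB SATA drives.
    drives_per_couplet = 600
    drive_tb = 2
    data_fraction = 8 / 10      # 8 of every 10 drives hold data in 8+2P RAID-6

    usable_per_couplet = drives_per_couplet * drive_tb * data_fraction
    total_usable = 2 * usable_per_couplet      # two couplets in the shared cluster

    print(f"usable per couplet: ~{usable_per_couplet:.0f} TB")    # ~960 TB
    print(f"total usable:       ~{total_usable / 1000:.2f} PB")   # ~1.92 PB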

CA-SCINET-T2

Worker nodes

  • 2 Xeon X5540 CPUs @ 2.53 GHz (8 cores/node)
  • 1 GigE with 4.2:1 blocking (42 nodes share one 10 GigE link to the core network; see the sketch after this list)
  • Disks:
    • 80 nodes have 250 GB, 7200 RPM SATA II drives (WD2502ABYS-23B7A)
    • 20 additional nodes have drives salvaged from other sources (multiple models SATA I/II, 400/500GB, 7200 RPM)
    • Remainder use GPFS file system for scratch space
  • 16 GB RAM
    • 0.8 GB system image resides in ramdisk
    • nodes with disk have 12 GB of swap space to provide about 27 GB available virtual memory
    • nodes that use GPFS use 1 GB for GPFS page-pool and provide about 14 GB of memory
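
The blocking ratio and memory figures above follow from simple arithmetic; a small sketch with the numbers taken from the list (rounding gives the quoted values):

    # Network blocking: 42 nodes, each with a 1 GigE uplink, share one 10 GigE link.
    blocking_ratio = 42 * 1 / 10
    print(f"blocking ratio: {blocking_ratio}:1")                # 4.2:1

    # Memory: 16 GB RAM per node, 0.8 GB of it holding the ramdisk system image.
    ram_gb, ramdisk_gb = 16, 0.8

    # Nodes with a local disk add 12 GB of swap.
    print(f"disk nodes, virtual memory: ~{ram_gb - ramdisk_gb + 12:.0f} GB")   # ~27 GB

    # Diskless nodes also reserve 1 GB for the GPFS page pool.
    print(f"GPFS nodes, available memory: ~{ram_gb - ramdisk_gb - 1:.0f} GB")  # ~14 GB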

Software

  • 4x146GB 15k RPM SAS drives in RAID-0 (IBM-ESXS, model ST3146356SS)
  • Provided via NFS from a pool node

Storage

  • 2 DDN DCS 9900 couplets shared with GPFS file systems
  • 43x16TB pools
  • 4 pool nodes
    • Xeon E5430 @ 2.66 GHz (4 cores)
    • 8 GB RAM on the node providing the software NFS server, 4 GB on the other three
    • 10 GigE connection to core network
  • 4 door nodes
    • 1 dcap, 3 gridftp doors
    • Xeon E5430 @ 2.66 GHz (4 cores)
    • 4 GB RAM
    • 1 GigE each to internal and external networks

SFU-LCG2

Production bugaboo cluster (shared with Westgrid)

  • CE bugaboo-hep.westgrid.ca
    • dual E5405, 2.0 GHz, 8 cores total, 28 GB RAM
  • Torque/Moab server + Westgrid login node - bugaboo.westgrid.ca
    • a single node: dual X5650, 2.67 GHz, 12 cores total, 24 GB RAM
  • Worker nodes
    • 31 HP nodes s2-s32: dual X5355, 2.66 GHz, 8 cores per node, 16 GB RAM, 1 x 146 GB SAS HDD, 5-year warranty expires in December 2012
    • 159 Dell nodes b2-b160: dual E5430, 2.66 GHz, 8 cores per node, 16 GB RAM, 1 x 300 GB SAS HDD
    • 239 Dell nodes b161-b400: dual X5650, 2.67 GHz, 12 cores per node, 24 GB RAM, 2 x 300 GB SAS HDD RAID-1
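
An illustrative tally of the production worker-node capacity implied by the node counts above (just arithmetic on the quoted figures, not an official capacity number):

    # Worker-node generations of the bugaboo production cluster, as listed above.
    node_types = [
        # (node count, cores per node, RAM in GB per node, description)
        (31,  8,  16, "HP s2-s32, dual X5355"),
        (159, 8,  16, "Dell b2-b160, dual E5430"),
        (239, 12, 24, "Dell b161-b400, dual X5650"),
    ]

    total_cores = sum(count * cores for count, cores, _ram, _desc in node_types)
    total_ram   = sum(count * ram   for count, _cores, ram, _desc in node_types)

    print(f"total cores: {total_cores}")     # 4388
    print(f"total RAM:   {total_ram} GB")    # 8776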

Test arbutus cluster / SFU HPC Dept Development Cluster

  • CE + Torque/Maui server + CVMFS-2.1.* test server (shared via NFS to the nodes): arbutus-hep.westgrid.ca
    • Dell 1950, Dual Quad Core Xeon E5405, 16GB RAM
  • Cluster head node: bugaboo-dev.westgrid.ca
    • Dell M610 Blade Server, Dual Hex Core Xeon X5650, 24GB RAM, 2 x 300 GB SAS HDD RAID-0
  • Worker nodes
    • 1 HP node s1: dual X5355, 2.66 GHz, 8 cores per node, 16 GB RAM, 1 x 146 GB SAS HDD, 5-year warranty expires in December 2012
    • 4 Dell M610 nodes b410-b414: dual X5650, 2.67 GHz, 12 cores per node, 24 GB RAM, 2 x 300 GB SAS HDD Raid-1
    • 1 Dell M610 node b1: dual E5430, 2.66 GHz, 8 cores per node, 16 GB RAM, 1 x 300 GB SAS HDD
  • This cluster is only for debugging/testing software and new releases of NFS-based CVMFS for bugaboo. It does not run any production or analysis jobs for ATLAS.

Storage

  • dcache-1.9.12-10 as of August 2011. Will be upgraded to the new Golden release when the bugaboo cluster is upgraded to SL6, hopefully in spring 2013
  • 1 DDN DCS 9900 couplet
  • 80 x 8TB pools
  • 10 pool nodes
    • dual E5405, 2.0 GHz, 8 cores total, 16 GB RAM
    • 8 nodes running 7 pools, 2 nodes running 10 pools
  • 2 identical pool nodes are nearline, capable of running another 20 x 8 TB pools; the disks are attached and dCache is set up on them
  • all pool nodes run gridftp doors
  • only one dcap door running on the SRM node wormhole.westgrid.ca

Networking

  • QDR InfiniBand connects all servers, dCache and worker nodes except:
    • all HP worker nodes
    • arbutus-hep
    • arbutus-cvmfs
  • internal 1 Gb Ethernet is used only for administrative tasks (installation, etc.) on all boxes with an IB interface
  • internal 1 Gb Ethernet is used for production on all HP nodes and the test arbutus cluster
  • all worker nodes have only internal network interfaces
  • NAT is running on bugaboo-hep.westgrid.ca
  • all servers (bugaboo, bugaboo-hep, arbutus-*, dCache head node and all pool nodes) have 1 Gb external network interfaces

TRIUMF-LCG2

CA-VICTORIA-WESTGRID-T2

Hermes I worker nodes

  • 250GB 7200 RPM 3.5" Simple-Swap SATA II (model WD2502ABYS-23B7A0) hard drive
  • 6 * 4GB DDR3-1333 2Rx4 LP RDIMM RAM
  • 2 * Intel Xeon Processor X5550 (2.66GHz 8MB 6.4GT/s 1333MHz 95W) CPU
  • 2 * 1 Gbit/s ethernet links bonded together with LACP for a bandwidth capacity of 2 Gb/s

Hermes II worker nodes

  • 500GB 7200 RPM SATA 3.5in Hard Drive (note: the performance of this hard drive is much better than the one in the Hermes I nodes)
  • 24GB memory (6x4GB) 1333MHz
  • 2 x Intel Xeon X5650 @ 2.66GHz (12MB cache) = 12 cores
  • QLogic QME7362-CK dual-port 40 Gb InfiniBand card

dCache pool nodes

In the Phase I cluster there are 4 pool nodes, with 24 GB of RAM each. There are 32 pools (using XFS filesystems on 16 TB LUNs) distributed over the 4 pool nodes.

The Phase II cluster expansion adds 2 pool nodes with 48 GB of RAM each. They each have 9 pools of direct SAS-attached storage; each pool is 24 TB of usable space (8+2 * 3 TB).

All pool nodes have a 10 Gbit/s ethernet link and a 40 Gb/s IB link.
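
The pool counts and sizes above imply roughly the following capacities; this is a sketch using only the quoted figures, so the real usable space will differ slightly.

    # Phase I: 32 pools of 16 TB (XFS on DCS9900 LUNs) over 4 pool nodes.
    phase1_tb = 32 * 16
    # Phase II: 2 pool nodes, 9 pools each, 24 TB usable per pool (8+2 x 3 TB).
    phase2_tb = 2 * 9 * 24

    print(f"Phase I:  {phase1_tb} TB over 4 nodes ({32 // 4} pools/node)")   # 512 TB
    print(f"Phase II: {phase2_tb} TB over 2 nodes")                          # 432 TB
    print(f"Total:    {phase1_tb + phase2_tb} TB")                           # 944 TB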

Storage back-end

There is one DCS9900 storage unit, with 60 LUNs, 32 of which are used for ATLAS data storage in dCache. Each LUN is 16 TB (8+2 disks * 2 TB @ 7200 RPM). The DCS9900 controller has a maximum throughput capability of roughly 5-6 GB/s.

The Phase II storage includes one DCS3700 controller unit and two DCS3700 expansion drawers, populated with 3TB 7200 RPM disks.

-- RyanTaylor - 2011-03-18
