ATLAS Canada Tier-1 and Tier-2 Cloud Operations

-- LeslieGroer - 28 Feb 2011

ATLAS Canada operates East and West Tier-2 Federations distributed across five university-based computing facilities.

Although not technically part of ATLAS Canada, the Tier-2 at Australia's University of Melbourne also belongs to the ATLAS Canadian cloud.

Table of Contents

Weekly Cloud Operations Reports

APEL Accounting Reports

Production Activities

EGI EMI/UMD Deployment - Fall/Winter 2012

Cream CE Deployment

Frontier and Squid

Daily SiteTest Results

Service Incident Reports (SIRs)

Meetings

Step09 Exercise

Issued Monthly Reports

SAM Reliability/Availability Summary (2009)

SAM Measured Reliability [Availability] (%)

| Month | Alberta | SFU | Toronto | Victoria |
| Nov 2009 | 86 [86] | 95 [90] | 74 [74] | 96 [96] |
| Oct 2009 | 96 [95] | 85 [85] | 44 [44] | 96 [96] |
| Sept 2009 | 92 [92] | 91 [91] | 95 [95] | 98 [97] |
| Aug 2009 | 99 [99] | 100 [86] | 97 [97] | 97 [97] |
| July 2009 | 99 [99] | 78 [77] | 97 [95] | 100 [100] |
| June 2009 | 55 [55] | 93 [93] | 91 [88] | 98 [98] |
| May 2009 | 99 [99] | 90 [89] | 55 [55] | 95 [91] |
| Apr 2009 | 86 [86] | 85 [85] | 0 [0] | 99 [99] |
| Mar 2009 | 83 [82] | 77 [77] | 34 [15] | 91 [89] |
| Feb 2009 | 98 [98] | 85 [85] | 68 [67] | 96 [96] |
| Jan 2009 | 92 [85] | 89 [89] | 97 [97] | 93 [93] |
| Dec 2008 | 98 [98] | 92 [92] | 41 [41] | 95 [95] |
| Nov 2008 | 92 [88] | 93 [93] | 45 [45] | 91 [91] |
| Oct 2008 | 98 [98] | 92 [92] | 97 [97] | 93 [93] |
| Sept 2008 | 99 [99] | 87 [87] | 95 [95] | 84 [76] |
| Aug 2008 | 98 [98] | 89 [89] | 69 [67] | 94 [81] |
| July 2008 | 84 [79] | 81 [81] | 99 [99] | 97 [97] |
| June 2008 | 96 [84] | 95 [94] | 80 [76] | 92 [92] |
| May 2008 | 95 [95] | 83 [83] | 90 [90] | 73 [73] |
| April 2008 | 92 [92] | 70 [69] | 93 [92] | 78 [78] |
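
Each cell in the table above pairs a reliability figure with the availability in square brackets. As a minimal sketch of how such cells could be split into numbers for further processing, the Python snippet below parses three rows copied from the table. The function name `parse_cell` and the `rows` dictionary are illustrative only and are not part of any ATLAS or SAM tooling; the per-month figure it prints is a plain unweighted mean across the four sites, not an official metric.

```python
import re

# Matches cells such as "95 [90]": reliability, then availability in brackets.
CELL = re.compile(r"^\s*(\d+)\s*\[(\d+)\]\s*$")


def parse_cell(cell):
    """Return (reliability, availability) as integers for a cell like '95 [90]'."""
    match = CELL.match(cell)
    if match is None:
        raise ValueError(f"unexpected cell format: {cell!r}")
    return int(match.group(1)), int(match.group(2))


# Three rows copied from the table (site order: Alberta, SFU, Toronto, Victoria).
rows = {
    "Nov 2009": ["86 [86]", "95 [90]", "74 [74]", "96 [96]"],
    "Oct 2009": ["96 [95]", "85 [85]", "44 [44]", "96 [96]"],
    "Sept 2009": ["92 [92]", "91 [91]", "95 [95]", "98 [97]"],
}

for month, cells in rows.items():
    parsed = [parse_cell(c) for c in cells]
    # Unweighted mean over sites, for illustration only.
    mean_reliability = sum(r for r, _ in parsed) / len(parsed)
    print(month, parsed, f"mean reliability {mean_reliability:.2f}%")
```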

Support Information

  • CERN Mailing List: atlas-canada-tier2-opsATcernDOTch
  • Tier-2 Operations Coordinator: bryan.caronATcernDOTch

Panda Resource Information

| Site | Panda Queue | Local Disk/core (MB) (for jobs) | vlimit (GB) | Wall Time (min) | Available (HS06) | Pledged (HS06) | Comments |
| TRIUMF | TRIUMF | 43250 | 4 | 5760 | 53280 | 25910 | |
| | TRIUMF_MCORE | 28375 | 3 | 5760 | | | |
| | ANALY_TRIUMF | 28375 | 3 | 5760 | | | |
| Australia-ATLAS | Australia_ATLAS | | | | | | |
| | ANALY_AUSTRALIA | | | | | | |
| Australia-NECTAR | Australia_NECTAR | | | | | | |
| | ANALY_NECTAR | | | | | | |
| CA-JADE | CA-JADE-cloudscheduler | | | | | | |
| CA-MCGILL-CLUMEQ-T2 | CA-MCGILL-CLUMEQ-T2 | | | | | | |
| | ANALY_MCGILL | | | | | | |
| CA-SCINET-T2 | CA-SCINET-T2 | | | | | | |
| | ANALY_SCINET | | | | | | |
| CA-VICTORIA-WESTGRID-T2 | CA-VICTORIA-WESTGRID-T2 | 24000 | 4 | 4320 | | | |
| | ANALY_VICTORIA-WG1 | 24000 | 4 | 4320 | | | |
| IAAS | IAAS-cloudscheduler | 20000 | 2.25 | 5760 | | | some clouds may have VMs with more RAM |
| | IAAS-cloudscheduler-4core | | | | | | |
| SFU-LCG2 | SFU-LCG2 | 35000 | unlim (2.5 mem) | 3600 score, 2880 mcore | | | |
| | ANALY_SFU-bugaboo | 35000 | unlim (2.5 mem) | 3600 | | | |

Resource Information

| Site | CE | # WNs | # Cores/WN | Local Disk (GB) (for jobs) | Local Disk/core (GB) (for jobs) | vlimit (GB) | Available (HS06) | Pledged (HS06) |
| TRIUMF | ce1 (cream) | 146 | 12 | 516 | 43 | 3.2 (user), unlimited (production) | 25930 | 11150 |
| | ce2 (cream) | 154 | 12 | 516 | 43 | 3.2 (user), unlimited (production) | 27350 | 11760 |
| | ce3 (cream) | 70 | 8 | 227 | 28 | 3.2 (user), unlimited (production) | 7000 | 3000 |
| Australia-ATLAS | agcream1 (cream) | 46 | 8/12/64 | 353/664/1600 | 44/55/25 | unlimited | 6500 | 5500 |
| CA-ALBERTA-WESTGRID-T2 | adm01 (cream) | 44 for ATLAS allocation (150 WNs in total, shared with WestGrid; will be independent soon) | 8 | 380 GB | 47 | unlimited | | |
| CA-MCGILL-CLUMEQ-T2 | ce02, ce03 (EMI-cream CEs) | jobs can run on any of the 600 nodes in the "SW" partition; fairshare in place | 12 | 380 GB | 31 | unlimited | 3325 | 143 cores = 2073.5 |
| CA-SCINET-T2 | lcg-admin3 (cream) | 3780; ATLAS max 215 | 8 | >=250 | 25 | 4 GB | 6650 (532 cores) | 3325 (266 cores) |
| CA-VICTORIA-WESTGRID-T2 | gorgon02, gorgon03 | 84 Hermes Classic nodes (shared with cloud and WestGrid) | 8 | 198 GiB | 24.7 GiB | 4.0 GiB (user), 4.0 GiB (production) | RAC is 913 Hermes cores = 13111 HS06 | 50% of available HS06 |
| | gorgon02, gorgon03 | 120 Heracles nodes (shared with WestGrid) | 12 | 429 GiB | 35.7 GiB | | | |
| SFU-LCG2 | arbutus-hep (cvmfs-nfs test only) | 5 | 8/12 | 120/520 | 30/45 | 4000 MB | | |
| | bugaboo (cream) | 432 | 8/12 old/new nodes (total 4448 cores, 518 assigned to ATLAS) | 120/240/520 | 30/45 | unlimited (2500 MB mem limit) | 13420 (913 cores) | 6625 (450 cores) |
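
Several entries in the table above quote an HS06 figure together with the number of cores behind it, so the implied HS06 rating per core can be cross-checked. The following is a minimal sketch using only pairs that appear in the table; the labels, the `quoted` dictionary, and the function name `hs06_per_core` are illustrative and not taken from any site tooling.

```python
# (HS06, cores) pairs quoted in the Resource Information table.
quoted = {
    "CA-SCINET-T2 available": (6650, 532),
    "CA-SCINET-T2 pledged": (3325, 266),
    "CA-MCGILL-CLUMEQ-T2 (143 cores)": (2073.5, 143),
    "CA-VICTORIA-WESTGRID-T2 RAC": (13111, 913),
    "SFU-LCG2 bugaboo available": (13420, 913),
    "SFU-LCG2 bugaboo pledged": (6625, 450),
}


def hs06_per_core(hs06, cores):
    """Implied HS06 rating of a single core."""
    return hs06 / cores


for label, (hs06, cores) in quoted.items():
    print(f"{label}: {hs06_per_core(hs06, cores):.2f} HS06/core")
```

A site whose available and pledged entries give the same per-core value (as the two CA-SCINET-T2 entries do) is quoting both figures against the same hardware rating.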

Summary of Job Slot and Vmem Information

Last updated: L. Groer 28-Jan-2016

| Site | Job Slots | HS06 | RAM/Slot | VMem/Slot | Kill on overusage | Comments |
| TRIUMF | 300 x 12 = 3600; 70 x 8 = 560 | | 4 GB; 3 GB | no limit | ulimit of 3.2 for user jobs | Killing is applied only to user analysis jobs. A production job would still be killed by the system if it runs out of memory. |
| Australia-ATLAS | 408 | | 2 GB | 8 GB | not currently | Swap is set at 4x physical memory in the OS installer, but no limits are imposed |
| CA-ALBERTA-WESTGRID-T2 | 44 x 8 = 352 | | 2 GB | | | |
| CA-MCGILL-CLUMEQ-T2 | 469 | | 3 GB | 2.7 GB | YES | Killing is necessary to protect IB and GPFS performance, but RAM per slot can be increased by limiting job slots to np=10 or 11 instead of np=12 |
| CA-SCINET-T2 | 912 | 11400 | 2 GB | 4.2 GB | ulimit at 4.2 GB per slot | 12 GB swap per node |
| CA-VICTORIA-WESTGRID-T2 | 2112 | | 2 GB | 4 GB | no | Some nodes have 3 GB RAM + 1 GB swap, but most have 2 GB RAM + 2 GB swap |
| SFU-LCG2 | 913 | | 2 GB | unlimited | YES if MEM usage > 2500 MB | A job is killed when it uses more than 2500 MB/core of memory for more than 15 minutes. Killing is needed to protect the cluster from WestGrid users crashing nodes |
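
The SFU-LCG2 rule above (a job is killed once it has used more than 2500 MB of memory per core for more than 15 minutes) can be illustrated with a small sketch. This is not the site's actual batch-system configuration: the sample format (minute, MB per core) and the function name `should_kill` are assumptions made purely for the example.

```python
LIMIT_MB_PER_CORE = 2500  # threshold quoted for SFU-LCG2
GRACE_MINUTES = 15        # overuse must last longer than this


def should_kill(samples, limit=LIMIT_MB_PER_CORE, grace=GRACE_MINUTES):
    """Return True if memory use stayed above `limit` for more than `grace` minutes.

    `samples` is a time-ordered list of (minute, mb_per_core) measurements.
    """
    over_since = None  # time at which the current overuse streak began
    for minute, mb_per_core in samples:
        if mb_per_core > limit:
            if over_since is None:
                over_since = minute
            if minute - over_since > grace:
                return True
        else:
            over_since = None  # usage dropped back, so the streak resets
    return False


# A short spike is tolerated; sustained overuse is not.
spike = [(0, 2000), (5, 2600), (10, 2700), (15, 2400), (20, 2600)]
sustained = [(0, 2600), (5, 2700), (10, 2800), (15, 2900), (20, 3000)]
print("spike:", should_kill(spike))
print("sustained:", should_kill(sustained))
```

Tracking the start of the overuse streak, rather than counting samples, keeps the check independent of how often the memory usage is polled.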

EMI installation information

Site EMI version (tarball ?) setupwn version Changes unique to site
CA-ALBERTA-WESTGRID-T2 (emi-wn-2.5.1-1_v1 tarball)  

WN configuration is a combination of old env variables from gLite and the new EMI ones. See attached file (atlas-wn_sl5_emi2.sh)

1. Added one entry to $PATH as a temporary fix for the lcg-gt issue.

export PATH=${BASEDIR}/extra:${BASEDIR}/bin:${BASEDIR}/usr/bin:${BASEDIR}/usr/sbin:$PATH

Wrapper of lcg-gt: /opt/exp_software/atlas.glite/emi-WN_latest/extra/lcg-gt

2. export PERL5LIB=${base}/usr/lib/perl5/vendor_perl:${base}/usr/lib64/perl5/vendor_perl

3. Replaced site.py with the one from http://hep.lancs.ac.uk/~msd/060213_site.py

/opt/exp_software/atlas.glite/emi-WN_latest/site-python/site.py

CA-MCGILL-CLUMEQ-T2 (emi-wn-2.5.1-1_v1 tarball)  

1. PERL5LIB=/sb/cnfs/wn-tarball/emi-wn/emi-wn-2.5.1-1_v1/usr/lib/perl5/vendor_perl:/sb/cnfs/wn-tarball/emi-wn/emi-wn-2.5.1-1_v1/usr/lib64/perl5/vendor_perl

2. PYTHONPATH=/sb/cnfs/wn-tarball/emi-wn/emi-wn-2.5.1-1_v1/site-python

CA-SCINET-T2 (emi-wn-2.5.1-1_v1 tarball)  

1. ln -sf /opt/emi2-workernode/etc/emi-version /etc/emi-version

2. export PERL5LIB=${base}/usr/lib/perl5/vendor_perl:${base}/usr/lib64/perl5/vendor_perl

3. Applied the patch: http://hep.lancs.ac.uk/~msd/060213_site.py

CA-VICTORIA-WESTGRID-T2 tarball    

Site Network Information

SiteNetworkInformation

Topic attachments
| Attachment | History | Size | Date | Who | Comment |
| Facility-Status.pdf | r1 | 113.3 K | 2009-01-20 - 18:23 | BryanCaron | |
| Tier-2-Accounting_Report_August2008.pdf | r1 | 278.5 K | 2008-09-11 - 09:31 | BryanCaron | DRAFT - until Sept 12/08 |
| Tier-2-Accounting_Report_July2008.pdf | r1 | 276.5 K | 2008-09-11 - 09:30 | BryanCaron | |
| Tier-2-Accounting_Report_June2008.pdf | r1 | 272.1 K | 2008-09-11 - 09:30 | BryanCaron | |
| Tier-2_Accounting_Report_September2008.pdf | r1 | 277.0 K | 2008-10-14 - 16:16 | BryanCaron | |
| Tier2_Reliab_200807.pdf | r1 | 49.1 K | 2008-09-11 - 09:26 | BryanCaron | |
| Tier2_Reliab_200808.pdf | r1 | 32.1 K | 2008-09-11 - 09:25 | BryanCaron | |
| Tier2_Reliab_200810.pdf | r1 | 78.7 K | 2009-01-15 - 22:22 | BryanCaron | TIer2_Reliab_200810.pdf |
| apel-CA-status-20081028-table.png | r1 | 671.2 K | 2008-10-28 - 16:20 | BryanCaron | apel-CA-status-20081028-table |
| apel-CA-status-20081028.png | r1 | 22.4 K | 2008-10-28 - 16:01 | BryanCaron | apel-CA-status-20081028 |
| atlas-wn_sl5_emi2.rtf | r1 | 3.1 K | 2013-01-31 - 22:40 | ErmingPei | EMI WN tar setup script |
| tier2-rwg-2006-02-08.pdf | r1 | 1275.8 K | 2009-01-20 - 18:22 | BryanCaron | |
| tier2_rel_200805.pdf | r1 | 33.5 K | 2008-10-22 - 05:58 | BryanCaron | May 2008 |
| tier2_rel_june2008.pdf | r1 | 33.4 K | 2008-10-22 - 05:52 | BryanCaron | June 2008 Availability and Reliability Report |
| tier2_rel_sep2008.pdf | r1 | 32.2 K | 2008-10-22 - 06:05 | BryanCaron | September 2008 |
| tier2_reliability_April08.pdf | r1 | 48.5 K | 2008-10-22 - 05:56 | BryanCaron | April 2008 |
| weekly_report-template.txt | r1 | 0.6 K | 2009-02-17 - 07:02 | BryanCaron | |
Topic revision: r477 - 2018-10-30 - LeslieGroer
 