Distributed Analysis Tutorial

Held April 18, 2007 at the University of Toronto as part of the ATLAS-Canada Physics Meeting

Toronto Tutorial

full agenda from Toronto Tutorial

Generic instructions for setting up a User Interface. Do not forget to set the correct INSTALL_ROOT and JAVA_LOCATION. You may have to create the $INSTALL_ROOT/glite/etc/vomses directory by hand and put the appropriate atlas-voms.cern.ch and atlas-lcg-voms.cern.ch files in it. This works for me on both SL3 and SL4 installations (-- IsabelTrigger - 01 May 2007). Then remember to run source $INSTALL_ROOT/etc/profile.d/grid_env.sh each time you log on and want to use the UI.
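The checks described above can be sketched as a small shell function. This is only a sketch: check_ui_setup is a hypothetical helper name, not part of the LCG software, and it assumes that the two vomses files and grid_env.sh live at exactly the paths quoted above. It creates the vomses directory if needed but does not create the files themselves, since their contents must come from your site or from the generic instructions.

```shell
# check_ui_setup ROOT
# Hypothetical helper (not part of the LCG UI distribution).
# Creates ROOT/glite/etc/vomses if it does not exist, then reports any of the
# files named in the instructions that are still missing.
# Returns 0 only if nothing is missing.
check_ui_setup() {
  root="$1"
  vomses_dir="$root/glite/etc/vomses"

  # The instructions say this directory may have to be created by hand.
  mkdir -p "$vomses_dir" || return 1

  missing=0
  for f in atlas-voms.cern.ch atlas-lcg-voms.cern.ch; do
    if [ ! -f "$vomses_dir/$f" ]; then
      echo "missing: $vomses_dir/$f"
      missing=$((missing + 1))
    fi
  done

  # The environment script that has to be sourced at each login.
  if [ ! -f "$root/etc/profile.d/grid_env.sh" ]; then
    echo "missing: $root/etc/profile.d/grid_env.sh"
    missing=$((missing + 1))
  fi

  [ "$missing" -eq 0 ]
}
```

With INSTALL_ROOT already set, one possible use in a login script is: check_ui_setup "$INSTALL_ROOT" && . "$INSTALL_ROOT/etc/profile.d/grid_env.sh"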

Toronto Tutorial Setup Information

Official Ganga Tutorial

Cronus tutorial

Background Information

Here is some information about what Distributed Analysis is (and will become) in ATLAS. If you were not able to participate in the tutorial, you can still benefit from it: just follow the instructions for obtaining your grid certificate and VO membership, and get your local system administrator to help you set up an LCG User Interface at your institute. Then see how much of the tutorial you can do on your local machine or cluster. If you run into trouble, you and your sys admin should contact Leslie, Rod or Bryan for assistance. Our goal is to make sure all Canadian institutes have a UI running for their users in plenty of time for data taking.

What is Distributed Analysis?

The ATLAS Monte Carlo AODs (and eventually the data AODs) are stored at Tier 2 centres around the world. The idea of distributed analysis is that you can write your analysis job, test it locally, and then run it on machines where the data are. This saves you from copying enormous datasets around the world, probably multiple times, using up lots of valuable bandwidth and disk space.

Why should I care? I have a huge scratch disk and a fast PC on my desk, or a great batch system in my department.

OK... so you have been developing your Athena analysis for a while now. You generate signal samples locally, or maybe on something like WestGrid. You cunningly get your background samples by logging onto lxplus, copying them from CASTOR to a big scratch disk at CERN, and then using scp to copy them to the scratch disk on your desk. It's all so simple, why change? Well, for one thing, some day soon you will probably want to run on real data. Or maybe even sooner, on some MC sample which isn't in CASTOR. But the killer will be real data... you will not be able to copy petabytes to your scratch disk, and you will NOT just be able to run all your real data jobs on lxbatch like you did for LEP. You need Distributed Analysis.

There's too much bureaucracy involved in this GRID stuff... I'll let my (students, who are young and adaptable / supervisor, who will be staying on ATLAS when I have finished my MSc and moved on to some other project) get GRID certificates, but I'll just go on doing my analysis the old way...

The bureaucracy isn't so bad now. If you want a GRID Canada certificate, here is a web interface which will help you make your application. If you use the web interface, you should receive a generated certificate request via email. You must forward it as indicated to ca@gridcanada.ca. Save the certificate you receive in reply, along with the keyfile, in the .globus directory of your Unix home directory (create the .globus directory if you do not have one). For subsequent registration steps you need to convert the certificate into a format that can be imported into your web browser (see further instructions from CERN). The conversion is done from the Unix command line and only needs to be done once, before loading the certificate into the browser, for example:

    openssl pkcs12 -export -inkey yourkeyfile.name -in yourcertfile.name -out my_cert.p12 -name "Grid Canada"
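As a sketch of how the conversion step might be wrapped, the function below first prints the subject and expiry date of the certificate, so you can confirm you picked the right file, and then runs the same openssl pkcs12 command quoted above. The name export_for_browser and its argument order are hypothetical; only the openssl options are taken from openssl itself. The file names you pass in are whatever your key and certificate files are called.

```shell
# export_for_browser KEYFILE CERTFILE OUTFILE [extra openssl pkcs12 options]
# Hypothetical wrapper around the openssl command quoted above.
export_for_browser() {
  key="$1"
  cert="$2"
  out="$3"
  shift 3

  # Show whose certificate this is and when it expires.
  openssl x509 -in "$cert" -noout -subject -enddate || return 1

  # Bundle key and certificate into a PKCS#12 file for the browser.
  # Run interactively, openssl asks for the key passphrase (if the key is
  # encrypted) and for a new export password.
  openssl pkcs12 -export -inkey "$key" -in "$cert" -out "$out" \
    -name "Grid Canada" "$@" || return 1

  # The bundle contains your private key, so keep it readable by you only.
  chmod 600 "$out"
}
```

For example: export_for_browser yourkeyfile.name yourcertfile.name my_cert.p12. Once the browser import has succeeded, it is reasonable to delete the .p12 file.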

If you don't get a GRID certificate, you will not be able to run Distributed Analysis, which means you won't have any real access to the data.

Once you have a GRID certificate, you need to join the ATLAS and ATLAS-Canada Virtual Organizations (VOs). You can do this by using another web interface. Follow the instructions and then click on the VO name, ATLAS. This takes you to the VOMS configuration. Just follow the steps. Even if you are already a member of the ATLAS VO, you may need to join the ATLAS-Canada VO (as it is new in the last few days). If you are already registered, you just need to join the ATLAS-Canada VOMS group: expand "member info" in the menu on the left, choose "Select Groups & Group Roles", select "/atlas/ca" and submit. This sends an email to Bryan Caron, who will authorize your membership of the ATLAS-Canada VOMS group. You can tell whether you are a member of the ATLAS-Canada VOMS group by looking at this link. You need a valid certificate loaded in your browser for the link to work.
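Besides the web link, membership can also be checked from a working User Interface once it has been authorized. The sketch below assumes the standard VOMS client commands voms-proxy-init and voms-proxy-info are on your path (that is, grid_env.sh has been sourced), and that voms-proxy-info -fqan prints one attribute per line in the form /atlas/ca/Role=NULL/Capability=NULL. The function names in_atlas_ca and check_atlas_ca_proxy are hypothetical.

```shell
# in_atlas_ca
# Reads VOMS attribute (FQAN) lines on stdin and succeeds if any of them
# belongs to the /atlas/ca group. Hypothetical helper name.
in_atlas_ca() {
  grep -E -q '^/atlas/ca(/|$)'
}

# check_atlas_ca_proxy
# Requests a proxy with the /atlas/ca group and checks that the proxy really
# carries it. Needs the VOMS clients and a valid certificate in ~/.globus;
# voms-proxy-init prompts for your key passphrase.
check_atlas_ca_proxy() {
  voms-proxy-init -voms atlas:/atlas/ca || return 1
  if voms-proxy-info -fqan | in_atlas_ca; then
    echo "proxy carries the /atlas/ca group"
  else
    echo "proxy does not carry /atlas/ca; your membership may still be pending"
    return 1
  fi
}
```

If voms-proxy-init itself fails, the usual suspects are a missing vomses file or a membership request that has not been authorized yet.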

Enough propaganda! Tell me about Distributed Analysis

The package used to do Distributed Analysis on the LCG is called Ganga. There are at least two other "grid flavours" used on ATLAS: NorduGrid and OSG (which has its own job submission tool, PANDA). Ganga will, in the very near future, support PANDA and NorduGrid as alternative backends, so if you learn Ganga now, you will soon be able to submit jobs anywhere on the GRID that runs ATLAS software. This tutorial will concentrate on Ganga. Ganga can also be used to submit the same analysis job to your local machine, your local batch system, the LCG, or Condor. So even if you don't plan to use the GRID for a job, you can still use Ganga, and benefit from its nice bookkeeping tools.

We will make a "Hello World" job available soon, which prospective tutorial participants can use to make sure they have all the tutorial prerequisites in good working order.

I want to start now! Recent Tutorials on Distributed Analysis and Ganga

The slides from the various tutorials will give you a solid overview of Distributed Analysis and Ganga.

Working through the Ganga tutorial on the wiki can be a good way to start, if you already have GRID certificates and have played around a little bit with LCG job submission. If you want to try to use datasets in Canada, here is a list of AODs stored on Canadian sites.

However, the Toronto tutorial was specifically designed with Canadian users in mind, and the organizers made sure that everyone could submit their jobs from a properly set up user interface, and that datasets were available, so you may need to check with local experts to make sure all that is available to you. Don't be discouraged if you run into difficulties - one of the reasons for having this tutorial is to give the Tier 1 and Tier 2 people in Canada a target date for sorting out and documenting local aspects of the user analysis chain and reporting weird CERN-centric "features" in the software.

We welcome feedback. If you have questions or comments, please contact me (Isabel dot Trigger at triumf dot ca) and I will direct your question to the appropriate expert.

-- IsabelTrigger - 23 Feb 2007

Topic revision: r6 - 2007-05-01 - IsabelTrigger