GGUS issues

How to request new features in GGUS (e.g. creation of a new support unit)

  1. The GGUS tool is available at www.ggus.org
  2. Use the "Submit a request for a new feature to GGUS" link. The direct link is https://savannah.cern.ch/support/?func=additem&group=esc

How to handle "temporary hack solutions" in GGUS


If a ticket is resolved with a fix that works only temporarily, and a more stable fix is known to be possible and due, the ROC operator on duty should:

  1. set the user ticket to "solved" and state in the solution field that a solution has been found but a more stable/permanent one is needed, warning the user that the problem might recur and advising them to open a new GGUS ticket if it reappears;
  2. open a new ticket describing the problem, the temporary fix, and the real solution (if known);
  3. link the old ticket to the new one using the "relates issue" field in the ticket.

ROC Canada Site Procedures

When a site wants to join


If a request by a site to join the grid is received, one should:

  1. Evaluate whether the site falls under ROC Canada.
  2. (a) If it falls under ROC Canada, write an e-mail to the requester providing the necessary information and explaining the procedure for joining the grid. An example of the e-mail to send is in the How to join template. (b) Otherwise, send an e-mail to the ROC that the site should fall under, asking whether they are willing to act as the ROC for that site.
  3. Check that the site administrator has signed their candidature. If they have not, send an e-mail reminding them that they need to provide a digital signature. An example of the e-mail to send is in the Request to a site to sign their candidature template. If you are asked for a template of the statement that the site has to sign, you can use the following:

 
"I have read and hereby accept the policy documents as requested in the ROC Canada e-mail. 
 All administrators and other necessary personnel at our site will be informed of and agree to 
 abide by all Grid operating policies of LCG/EGEE Security Response procedure and the 
 relevant Open Science Grid (OSG) guide." 

How to Suspend a Site

A definitive ROC Canada escalation process for deciding whether and when a site should be suspended has not yet been defined. So far, experience has identified two classes of events that can lead to this decision.

  1. ROC Canada is informed by the Grid Operator on Duty (GOoD) that a decision has been made to suspend one of the sites the ROC manages. The GOoD usually then presents the decision at the weekly operations meeting. Although the GOoD can perform the suspension, it has been agreed internally that, if ROC Canada has no serious objection to the decision, ROC Canada should suspend the site itself as a demonstration of proactivity. In this case, notify the site using the Suspension Notification template.
  2. ROC Canada decides directly to suspend the site (e.g. because of an excessive and growing number of unhandled issues affecting the site). In this case it is preferable to contact the site first, informing the administrators that the ROC is going to suspend the site. The site should be given reasonable time to reply if it wishes to oppose the measure. An example of the e-mail to send is in the Suspension Request template.
Once the site has been informed and all the "timeouts" have expired, the next steps are:
  1. Suspend the site in the GOC DB: a) select "GOCDB4 input system", b) search for the site, c) change Site Status --> suspended, d) Status Reason: enter the main reason why the site is being suspended, possibly with references to tickets.
  2. Add the suspended site to the ROC Canada BDII by sending an e-mail to roc@lcgNoSPAMtriumfNoSPAMca.
  3. Close all relevant open tickets on GGUS.
  • Tickets opened by the GOoD can be retrieved by a GGUS search using the site name as a keyword.
  • Specify in the solution: "Site suspended by ROC_Canada on _DATE_"
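If the solution text is filled in by a script rather than by hand, the _DATE_ placeholder could be substituted as in this minimal Python sketch. The function name `suspension_solution_text` is hypothetical, and the ISO 8601 date format is an assumption; the procedure above does not prescribe one.

```python
from datetime import date


def suspension_solution_text(suspended_on: date) -> str:
    """Build the GGUS solution text for a suspended site (hypothetical helper)."""
    # ISO 8601 (YYYY-MM-DD) is an assumed format; the procedure only says _DATE_.
    return f"Site suspended by ROC_Canada on {suspended_on.isoformat()}"


print(suspension_solution_text(date(2011, 8, 9)))
```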

SLA with sites

  1. First, e-mail the site administrator a copy of the SLA template for comments. An example of the e-mail to send is in the First request to sign the SLA template. There is also an editable PDF of version 1.6 of the SLA attached at the bottom of this page.
  2. A GGUS (Global Grid User Support) ticket is opened and assigned to ROC Canada. This ticket is used to track the SLA signing activity, which in general may be followed by several people within the ROC. It is closed once both the site and the ROC have signed and a signed copy has been returned to the ROC.
  3. Agree on any amendments.
  4. Have the ROC manager sign two copies of the SLA and send them to the site administrator.
  5. Have the ROC manager digitally sign a copy of the SLA and e-mail it to the site administrator for their signature.

Site Certification: overview

  1. Add the site to the GOCDB.
  2. A GGUS (Global Grid User Support) ticket is opened and assigned to ROC Canada. This ticket is used to track the certification activity, which in general may be followed by several people within the ROC. It is closed when the site is put into production.
  3. First, a gap analysis is done with the site to establish which services, and which versions, the site is going to run. The goal of this analysis is to define the actual requirements for the site and any documentation/training needed.
  4. The site administrator(s) configure the site and notify ROC Canada when this is done. Throughout this period the SAM tests run by ROC Canada remain active, to help the site admins monitor their progress.
  5. Once notification arrives that the site is correctly configured (as shown by successful SAM tests), the three-day countdown starts.
  6. An end-to-end test of the support line is done to verify that the site can receive and work on service tickets opened by the Operators-on-Duty (COD).
  7. After three days of continuous successful tests, the site is put into production.
  8. In addition to the technical certification, we require the site to sign the EGEE SLA with the ROC.
All relevant communication and interaction between the site and ROC Canada during the process described above should, in principle, go through the GGUS ticket opened in step 2.

Site Certification: details

Starting the certification

A GGUS (Global Grid User Support) ticket is opened and assigned to ROC Canada. This ticket is used to track the certification activity, which in general may be followed by several people within the ROC. It is closed when the site is put into production.
A template for an e-mail to do everything in one step is given here.

Gap Analysis

A gap analysis is done with the site to establish which services, and which versions, the site is going to run. The goal of this analysis is to define the actual requirements for the site and any documentation/training needed.

Certification Tests

The site administrator(s) configure the site and notify ROC Canada when this is done. Throughout this period the SAM tests run by ROC Canada remain active, to help the site admins monitor their progress. Details on SAM for ROC Canada can be read here.

Once notification arrives that the site is correctly configured (as shown by successful SAM tests), the three-day countdown starts.
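The three-day rule can be expressed as a small check over SAM results. The following Python sketch is illustrative only: it assumes the results are available as a chronologically ordered list of `(timestamp, passed)` pairs, a hypothetical format rather than anything SAM itself exports, and the names `countdown_start` and `ready_for_production` are made up for the example. Any failure resets the countdown, and the site qualifies once the current unbroken run of passing tests is at least 72 hours old.

```python
from datetime import datetime, timedelta

# Length of the certification countdown described above.
COUNTDOWN = timedelta(days=3)


def countdown_start(results):
    """Return when the current unbroken run of passing tests began, or None.

    results: hypothetical list of (timestamp, passed) tuples, oldest first.
    """
    start = None
    for ts, passed in results:
        if not passed:
            start = None  # any failure resets the countdown
        elif start is None:
            start = ts  # first pass after a failure (or at the very beginning)
    return start


def ready_for_production(results, now):
    """True if tests have passed continuously for at least three days."""
    start = countdown_start(results)
    return start is not None and now - start >= COUNTDOWN


t0 = datetime(2011, 8, 1)
history = [(t0, False), (t0 + timedelta(hours=1), True), (t0 + timedelta(days=2), True)]
print(ready_for_production(history, t0 + timedelta(days=3, hours=1)))
```

The sketch does not detect gaps in which no tests ran at all; in practice one would also want to confirm that the tests kept running throughout the three days.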

The site's support line will also be checked to verify that the site can receive and work on service tickets opened by the Operators-on-Duty (COD).

Putting a site in production

After three days of continuous successful tests, the site is put into production (see the next section).

After the certification: delivery of a site in Production

After the certification tests have been carried out successfully, the site must be (re)introduced into the production service.
This should not happen automatically: the site must be informed and must accept. We suggest the following simple procedure.

  1. Send an e-mail to the site administrator (cc the ROC mailing list) reporting on the certification tests and asking whether the site agrees to go into production. Ideally a "certification document" should be provided at this stage, but none is defined yet. An example of this communication can be found in the template, but bear in mind that each site's certification has its own history, which should be summarized in the e-mail.
  2. Wait for explicit acceptance by the site administrator. We assume here that the site accepts.
  3. Set the "Certified" status in the GOC DB: a) select the "ROC Management" page, b) change Site Status --> certified, c) Status Reason: enter a brief summary of the certification criteria (e.g. job submission succeeded for 1 week).
  4. Send an e-mail to the site administrator and the CIC-on-duty (cc roc@triumf) confirming the start of operation in production mode. A template is given here.

ROC Canada Email templates

Joining

Suspending

Certification

SLA signing

ROC Canada Certification Tools

A quick guide to digital signatures

-- DiQing - 09 August 2011
Topic revision: r6 - 2011-08-09 - DiQing