Many shifters still do not know how to use the operations tools or do not follow the operations procedures, so I have written some tips on how to use these tools. Please feel free to update, correct, and add useful content.

CIC operations portal

The CIC operations portal provides a straightforward dashboard for monitoring sites, along with other useful information. The CIC-on-duty (COD) team also uses the CIC portal for central monitoring, and we use its ROC dashboard for our daily monitoring.

CIC dashboard

All certified sites are monitored by the CIC dashboard. The default filter is "sites in my scope", so the dashboard gives an overview of all sites in your scope, i.e., all certified sites under ROC_Canada for us. Shifters should check it at least twice per day.

  1. When a SAM test fails, the CIC portal is notified. When you open or refresh the CIC dashboard, you will see a red flash symbol on the right side of the site tab, with the number of alarms under it. Unclosed alarms are included in this count. If you move the mouse over the symbol, it shows statistics on alarm ages. Clicking the magnifying arrow to the left of the site name opens the site accordion, which contains sub-accordions for alarms, tickets, etc. Click the magnifying arrow for "New NAGIOS alarms" to see the details of the alarms.
    1. For alarms whose last status is "OK", you can switch them off by selecting their check-boxes and then clicking the "Close selected alarms" button.
    2. You can click "View" under the details bar to view the details of an alarm.
    3. Sometimes multiple alarms from different tests are caused by the same error. For example, when there is a problem with the SE, both the SRMv2 and CE tests will fail. In this case, you can click the mask symbol of an alarm (the third one under the action bar) to open a pop-up window in which you can mask or unmask alarms.
    4. Usually, when a critical alarm is older than 3 hours (let's be more aggressive to improve reliability and availability), we submit a GGUS ticket through the CIC portal. Click the "Add a ticket for this alarm" symbol (the second symbol under the action bar); a pop-up window will open with the ticket template and links for the alarm. For masked alarms, please submit the ticket through the main alarm.
    5. When an alarm is not handled within three days, it is raised to the C-COD team, so it is important to handle alarms within three days.
  2. The "Assigned alarms" sub-accordion is for alarms assigned to a ticket. You can view the details and check the status of these alarms under this sub-accordion. If there are no assigned alarms, this sub-accordion does not appear.
  3. All GGUS tickets opened through the CIC portal can be viewed under the "Tickets" sub-accordion. You can also get statistics on all tickets submitted to the site by moving the mouse over the ticket symbol on the site accordion.
    1. Clicking the ticket update icon (the first icon) opens a pop-up window in which you can update the corresponding ticket.
    2. Clicking the GGUS ticket ID leads to the ticket details page on the GGUS portal.
    3. By default, a ticket's expiration date, i.e., the date by which the ticket is supposed to be updated or solved, is 5 days after its creation. A ticket that has been expired for 3 days is raised to the C-COD team, so it is important to update the ticket or extend its expiration date before that.
    4. When you close a GGUS ticket that was opened through the CIC portal, please close it through the CIC portal. Click the ticket update icon and choose "Site OK" for the "Escalate" field in the pop-up window. If the ticket is not closed through the CIC portal, its record will remain on the CIC portal and its status will not be updated. If alarms are linked to the GGUS ticket, closing the ticket through the CIC portal will also close these alarms.
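
As a quick reference, the timing rules above can be written down as a small Python sketch. This is only an illustration of the procedure, not part of the CIC portal or any GGUS API: the function names, the "CRITICAL" status string, and the returned action labels are made up for this example, and all real actions are still done by hand in the dashboard.

```python
from datetime import datetime, timedelta

# Thresholds taken from the procedure above.
TICKET_AFTER = timedelta(hours=3)      # critical alarm age before we open a GGUS ticket
ALARM_ESCALATION = timedelta(days=3)   # unhandled alarm age before it is raised to C-COD
TICKET_LIFETIME = timedelta(days=5)    # default expiration: 5 days after creation
TICKET_ESCALATION = timedelta(days=3)  # time after expiration before C-COD is involved


def triage_alarm(last_status, raised_at, now):
    """Suggest the next shifter action for one new (unassigned) alarm.

    Hypothetical helper: the status strings and action labels are assumptions.
    """
    age = now - raised_at
    if last_status == "OK":
        return "close"                  # last status is OK: close the alarm
    if age >= ALARM_ESCALATION:
        return "escalated to C-COD"     # too late: not handled within three days
    if age > TICKET_AFTER:
        return "open GGUS ticket"       # older than 3 hours: submit a ticket
    return "watch"                      # still young: check again later


def ticket_deadlines(created_at, expires_at=None):
    """Return (expiration date, date the ticket is raised to C-COD).

    expires_at is only given when the expiration date was extended by hand.
    """
    if expires_at is None:
        expires_at = created_at + TICKET_LIFETIME
    return expires_at, expires_at + TICKET_ESCALATION
```

For example, under these assumptions a critical alarm raised 4 hours ago gives "open GGUS ticket", and a ticket created on 2011-08-01 with the default lifetime expires on 2011-08-06 and would be raised to C-COD on 2011-08-09 if left untouched.
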
-- DiQing - 09 August 2011
Topic revision: r3 - 2011-08-09 - DiQing
 