SL7 Testing


This page describes the setup at TRIUMF for testing a worker node (WN) running SL7. The goal is to check that all software required to run ATLAS software exists on the platform. To this end, we will define Panda queues for production (score) and analysis and run complete validations of all ATLAS software releases. In addition, we plan to run HammerCloud functional tests on the queues.

Hardware

OS

  • Scientific Linux 7.2 is installed.
  • CVMFS is installed.
  • There is no AFS access.

Additional rpms

Grid middleware and batch system

Since UMD has not yet released a WN meta-package for SL7, we used the following strategy to install the grid middleware packages on the SL7 WN:

  1. Get the package list from the UMD3 emi-wn meta-package for SL6.
  2. Install packages from the EPEL7 repository. The majority of the packages can be found there, including Globus and some data management packages.
  3. For noarch packages that are not in the EPEL repository, copy them from the SL6 UMD3 repository.
  4. Build the Torque packages for SL7 ourselves.
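The strategy above can be sketched as a small lookup that decides where each package in the SL6 list would come from on SL7. This is only an illustration: the function name `choose_source` and all set contents are hypothetical, since this page does not record which repository each individual package was taken from. Checking the local build first is also an assumption, made so that Torque follows step 4 regardless of what EPEL7 offers.

```python
def choose_source(package, epel7, umd3_noarch, build_locally):
    """Return the source a package would be taken from under the strategy above."""
    if package in build_locally:      # step 4: packages we build ourselves
        return "local-build"
    if package in epel7:              # step 2: prefer EPEL7 when available
        return "epel7"
    if package in umd3_noarch:        # step 3: reuse noarch rpms from SL6 UMD3
        return "umd3-sl6-noarch"
    return "unavailable"              # recorded as "not installed"


# Hypothetical, abbreviated inputs for illustration only.
wanted = ["gfal2-util", "glite-yaim-core", "torque-client", "lcg-util"]
epel7 = {"gfal2-util", "voms", "globus-gass-copy-progs"}
umd3_noarch = {"glite-yaim-core", "lcg-tags"}
build_locally = {"torque", "torque-client"}

plan = {pkg: choose_source(pkg, epel7, umd3_noarch, build_locally) for pkg in wanted}
for pkg, source in plan.items():
    print(f"{pkg}: {source}")
```

Packages that end up as "unavailable" correspond to the kind of entries kept in the "not installed" list further down.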
The following packages were installed:
  • a1_grid_env
  • c-ares
  • cleanup-grid-accounts
  • dcache-srmclient
  • dcap
  • dcap-devel
  • dcap-libs
  • dcap-tunnel-gsi
  • dcap-tunnel-krb
  • dcap-tunnel-ssl
  • dcap-tunnel-telnet
  • dpm
  • dpm-libs
  • dpm-devel
  • dpm-perl
  • dpm-python
  • emi-version
  • fetch-crl
  • gfal2-all
  • gfal2-python
  • gfal2-util
  • gfalFS
  • gfal2-doc
  • gfal2-devel
  • ginfo
  • glite-jobid-api-c
  • glite-lb-client
  • glite-lb-common
  • glite-lb-client-progs
  • glite-lbjp-common-gss
  • glite-lbjp-common-trio
  • glite-wn-info
  • glite-yaim-clients
  • glite-yaim-core
  • globus-gass-copy-progs
  • globus-proxy-utils
  • gridsite-libs
  • jclassads
  • lcgdm-devel
  • lcgdm-libs
  • lcg-info
  • lcg-infosites
  • lcg-ManageVOTag
  • lcg-tags
  • lfc
  • lfc-libs
  • lfc-devel
  • lfc-perl
  • lfc-python
  • openldap-clients
  • python-ldap
  • uberftp
  • voms-clients-java
  • voms-devel
The following packages were also installed as dependencies:
  • CGSI-gSOAP
  • apache-commons-cli
  • apache-commons-io
  • bouncycastle
  • bouncycastle-pkix
  • canl-c
  • canl-java
  • condor-classads
  • davix-libs
  • gfal2
  • gfal2-plugin-dcap
  • gfal2-plugin-file
  • gfal2-plugin-gridftp
  • gfal2-plugin-http
  • gfal2-plugin-lfc
  • gfal2-plugin-rfio
  • gfal2-plugin-srm
  • globus-callout
  • globus-common
  • globus-ftp-client
  • globus-ftp-control
  • globus-gass-copy
  • globus-gass-transfer
  • globus-gsi-callback
  • globus-gsi-cert-utils
  • globus-gsi-credential
  • globus-gsi-openssl-error
  • globus-gsi-proxy-core
  • globus-gsi-proxy-ssl
  • globus-gsi-sysconfig
  • globus-gss-assist
  • globus-gssapi-error
  • globus-gssapi-gsi
  • globus-io
  • globus-openssl-module
  • globus-xio
  • globus-xio-gsi-driver
  • globus-xio-popen-driver
  • gsoap
  • perl-Authen-SASL
  • perl-Convert-ASN1
  • perl-DBI
  • perl-Digest-HMAC
  • perl-GSSAPI
  • perl-JSON
  • perl-LDAP
  • perl-Net-Daemon
  • perl-PlRPC
  • perl-XML-Filter-BufferText
  • perl-XML-SAX-Writer
  • pugixml
  • srm-ifce
  • voms
  • voms-api-java
In addition, we use emi-torque-client for the Torque batch system and its configuration, so the following packages were also installed:
  • glite-yaim-torque-client
  • lcg-pbs-utils
  • torque
  • torque-client
For the record, the following packages were not installed:
  • dpm-libs.i686
  • emi.amga.amga-cli
  • emi.saga-adapter.context-cpp
  • emi.saga-adapter.isn-cpp
  • emi.saga-adapter.sd-cpp
  • gfal.x86_64
  • gfal.i686
  • gfal-python
  • glite-service-discovery-api-c
  • glite-wms-brokerinfo-access
  • lcg-util
  • lcg-util-libs.x86_64
  • lcg-util-libs.i686
  • lcg-util-python
  • lcgdm-devel.i686
  • lcgdm-libs.i686
  • lfc-libs.i686
  • voms-clients3 (should change to voms-clients-java, or voms-clients-cpp)
We could not find gLexec packages in EPEL7, so gLexec was neither installed nor configured. We then configured the WNs with YAIM:
  • /opt/glite/yaim/bin/yaim -c -s site-info.def -n emi-WN -n emi-TORQUE_client
We then made the following manual changes:
  1. The join command behaves differently on SL6 and SL7, so in the YAIM function config_users we had to replace the line "while IFS=: read id user gids groups virtorg tag other; do" with "while IFS=: read user id gids groups virtorg tag other; do".
  2. SL7 uses systemctl to manage daemons and YAIM has not been updated for it, so we had to run 'systemctl enable fetch-crl-cron' and 'systemctl start fetch-crl-cron' by hand.
  3. OpenSSH also changed in SL7; in our case the host keys had to be made readable by the ssh_keys group.
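To illustrate why the variable order in item 1 matters: `IFS=: read` assigns colon-separated fields to variable names purely by position, so if the first two columns arrive swapped, the names must be swapped too. The sketch below mimics that behaviour in Python. The helper `read_fields` and the sample record are hypothetical and are not taken from YAIM or from a real users file.

```python
def read_fields(line, names):
    """Assign colon-separated fields to names by position, like `IFS=: read`.

    As with the shell builtin, the last name receives the rest of the line.
    """
    values = line.rstrip("\n").split(":", len(names) - 1)
    return dict(zip(names, values))


SL6_ORDER = ["id", "user", "gids", "groups", "virtorg", "tag", "other"]
SL7_ORDER = ["user", "id", "gids", "groups", "virtorg", "tag", "other"]

# Hypothetical record in the column order assumed for SL7 (login name first).
record = "atlas001:20001:2001:atlas:atlas::"

wrong = read_fields(record, SL6_ORDER)   # old variable order: names mislabelled
right = read_fields(record, SL7_ORDER)   # patched variable order

print("old order -> user:", wrong["user"], "id:", wrong["id"])
print("new order -> user:", right["user"], "id:", right["id"])
```

With the old order the numeric ID would be treated as the login name, which is the kind of mismatch the one-line change in config_users avoids.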
A separate Torque queue, testsl7, has been created for the test. One local customization, for the dcap DB, has not been configured.

Panda queues

Validation

Validation failures for the following releases are ignored.

Note (19 Sep 2016): the new sw-mgr skips slc5 releases (gcc fails for them, so their validations are skipped).

| Release | Comment |
| < 17.2.0 | Obsolete ? |
| any slc5 release | not supported on SL7 |
| 18.1.0-x86_64, 18.0.0-x86_64, 17.8.0-x86_64, 17.8.0-x86_64-gcc47, 17.7.5-x86_64, 17.7.4-x86_64-gcc46, 17.7.3-x86_64-gcc46, 17.7.0-slc6-x86_64 | genreflex issue |
| 19.2.2 | incorrect DB configuration for run4 ? |
| 19.2.1.6-x86_64 | incorrect DB configuration for run4 ? |
| 19.2.1.5-x86_64 | incorrect DB configuration for run4 ? |
| 20.7.3-x86_64, 20.7.2-x86_64, 20.7.1-x86_64, 20.7.0-x86_64, 20.6.0-x86_64 | LCG ROOT issue |
| 20.8.1-x86_64 | fails at most SL6 sites |
| 20.8.0-x86_64 | fails at most SL6 sites |
| 20.1.56-x86_64 | fails at most SL6 sites |
| tdaq:nightly.LCG75root6@20.7.2-x86_64 | (no longer in validation) |
| tdaq:nightly.LCG75root6@20.7.3-x86_64 | (no longer in validation) |
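When going through validation results by hand, the rules in this table can be applied with a small helper like the one below. It is a sketch only: `ignore_reason` and `version_tuple` are hypothetical names, the exact-match set holds just a few of the releases listed above, and this is not how sw-mgr itself decides what to skip.

```python
import re

# A few of the explicitly listed releases; extend with the rest of the table as needed.
IGNORED_EXACT = {"20.8.1-x86_64", "20.8.0-x86_64", "20.1.56-x86_64"}


def version_tuple(release):
    """Extract the leading dotted version as a tuple of ints, padded to 3 parts."""
    match = re.match(r"(\d+(?:\.\d+)*)", release)
    if not match:
        return ()
    parts = tuple(int(p) for p in match.group(1).split("."))
    return parts + (0,) * (3 - len(parts))


def ignore_reason(release):
    """Return why a validation failure is ignored, or None if it should be looked at."""
    if "slc5" in release:
        return "slc5 release, not supported on SL7"
    version = version_tuple(release)
    if version and version < (17, 2, 0):
        return "older than 17.2.0"
    if release in IGNORED_EXACT:
        return "listed as a known failure"
    return None


for name in ["17.0.1.1-x86_64", "20.8.1-x86_64", "20.7.9-x86_64"]:
    print(name, "->", ignore_reason(name))
```

The last name in the loop, 20.7.9-x86_64, is an invented release string used only to show the case where no rule matches.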

Hammercloud Functional Tests

Issues Seen

| Date | Resolved | Description |
| 10 Dec 2015 | 27 Jul 2016 | Pilot: the release is set up with asetup from the kit, which is an older version that does not recognize SL7. Fixed (using asetup from ALRB). |
| 10 Dec 2015 | 03 Mar 2016 | Validation: sw-mgr uses asetup from the kit, which is an older version that does not recognize SL7. Resolved by the new sw-mgr, which uses a newer version of asetup if the kit's version is old. |
| 4 Mar 2016 | Will not be fixed (slc5 releases) | The shared library libgmp.so.3 is missing on SL7. gcc 4.3.5 does not provide it, but gcc 4.3.6 does. As a result, validations fail for many analy releases that set up gcc 4.3.5. Please see details. |
| 7 Mar 2016 | Will not be fixed (slc5 releases) | Some KV tests fail on the analy queue while passing on the prod queue. Investigation shows that the prod queue ran only "Hello World" (e.g. release 17.0.1.1). Ignoring this for releases < 17.2.0; note that prod KV does not do compilations. |
| 4 Apr 2016 | | Releases which set up LCG_75root6 fail. This is a known issue; see details. |
| 4 Apr 2016 | | The genreflex command failed with the compilation error '__builtin_bswap64' was not declared in this scope. Please see details. |
| 5 Aug 2016 | Fixed; 21.0.0, 21.0.1 and 21.0.2 also fail on SL6 (19 Sep 2016) | 21.0.0, 21.0.1, 21.0.2 and 21.0.3 are being set up incorrectly (cmtsite for cmake releases). |
| 10 Aug 2016 | 20.20.0 also fails on SL6 (19 Sep 2016) | 20.20.0 dumps core during RAWtoESD processing. It seems that sites validated after July 6 fail .. why ? |
| 10 Aug 2016 | OK (19 Sep 2016) | 20.3.6.1 fails (analy): it runs out of memory. |
| 7 Mar 2017 | | 21.0.0, 21.0.1: DB config issues, will not be fixed. |
| 4 Apr 2017 | | 21.0.13.1 is not yet supported for KV, so it may fail. Ignore for now. |

Other Observations

Python 2.6 from a 32-bit release fails some tests, but only on CC7 (lxplus7):

[desilva@lxplus024 ~]$ cat /etc/redhat-release 
CentOS Linux release 7.2.1511 (Core) 
[desilva@lxplus024 ~]$ export AtlasSetup=/afs/cern.ch/atlas/software/dist/AtlasSetup
[desilva@lxplus024 ~]$ alias asetup='source $AtlasSetup/scripts/asetup.sh'
[desilva@lxplus024 ~]$ asetup 17.0.6,32,here
Using AtlasOffline/17.0.6 [cmt] with platform i686-slc5-gcc43-opt
   at /cvmfs/atlas.cern.ch/repo/sw/software/i686-slc5-gcc43-opt/17.0.6
Test area: /afs/cern.ch/user/d/desilva
manpath: warning: $MANPATH set, ignoring /etc/man_db.conf
[desilva@lxplus024 ~]$ which python
/cvmfs/atlas.cern.ch/repo/sw/software/i686-slc5-gcc43-opt/17.0.6/sw/lcg/external/Python/2.6.5/i686-slc5-gcc43-opt/bin/python
[desilva@lxplus024 ~]$ python -V
Python 2.6.5
[desilva@lxplus024 ~]$ python
Python 2.6.5 (r265:79063, Jun 29 2010, 16:44:23) 
[GCC 4.3.2] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import os,pwd
>>> os.getuid()
28311
>>> pwd.getpwuid(os.getuid()).pw_name
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
KeyError: 'getpwuid(): uid not found: 28311'
>>> 
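The session above shows pwd.getpwuid() raising KeyError for a valid uid. Code that only needs the account name can guard against this with a fallback, as in the sketch below. The helper name `current_username` and the choice of fallbacks (the USER and LOGNAME environment variables, then the numeric uid) are assumptions for illustration; this is not what the failing tests do, and it does not address why the lookup fails.

```python
import os
import pwd


def current_username():
    """Return the login name for the current uid, tolerating a failed passwd lookup."""
    uid = os.getuid()
    try:
        return pwd.getpwuid(uid).pw_name
    except KeyError:
        # Lookup failed (as in the session above): fall back to the environment,
        # and finally to the numeric uid as a string.
        return os.environ.get("USER") or os.environ.get("LOGNAME") or str(uid)


print(current_username())
```

The standard library's getpass.getuser() follows a similar idea but checks the environment variables before consulting the password database.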

References

Topic revision: r25 - 2017-04-04 - AsokaDeSilva
 