TRIUMF Docker Containers

Description of Set-up

At the TRIUMF Tier-1 we have initially set up Docker with the early goal of running ATLAS SL-6 based Docker containers on SL-7 Docker engines (worker nodes/blades).

We have decided to start with a condor-based setup in which we run 'condor_master' in the container, allowing jobs to run continuously in the container for as long as condor_master is running.

The most current version of the document is available in the Tier1Docker-v3.pdf attachment below.

The first two documents on the process, Tier1Docker-v1.pdf and Tier1Docker-v2.pdf, are still available below as well.

Some links mentioned in the PDF documents (such as our internal git repository) will not be accessible off-site. Interested persons can contact me to request a shallow copy of the git Ansible tree.

The PDF document will change frequently as we correct issues with the initial implementation, and as we work on implementing 'Docker Distribution' for secure image cataloging and storage.

Overview of Image, Engine and Container Set-up

Here is a list of noteworthy items:

Engine:

  • We set up the blade (worker node) that will run the container as if it were a regular worker node, but without all the grid middleware installed. Thus we configure user accounts, install cvmfs, install grid certificates, set up CRL fetching and any cron jobs that do account cleanup, and add any local script customizations.
  • We add a /home/docker directory where local scripts and configurations reside; these can be copied to a container at start-up.
  • The automounter is configured to never unmount cvmfs. However, it is not clear whether this matters when the cvmfs tree is bind-mounted into the container -- one of our test servers currently does not have the timeout set to zero, yet still seems to work. This needs to be validated:

    # grep cvmfs /etc/auto.master
    /cvmfs          /etc/auto.cvmfs --timeout 0
Container:
  • The container bind-mounts the following directories, file systems and files at start-up:
    • /etc/grid-security -- for CRLs and certificates
    • /opt/glite -- local access to any of the older yaim configuration files
    • /cvmfs -- the cvmfs software tree
    • /home -- user account scratch space
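The bind mounts listed above can be expressed on the docker command line. The sketch below is illustrative only: the image and container names are assumptions, not our actual tags, and the exact mount options (e.g. read-only flags) are a suggestion rather than our configuration.

```shell
# Illustrative sketch only; image/container names are assumptions.
#   /etc/grid-security : CRLs and certificates
#   /opt/glite         : older yaim configuration files
#   /cvmfs             : cvmfs software tree
#   /home              : user account scratch space
docker run -d --name sl6-wn \
  -v /etc/grid-security:/etc/grid-security:ro \
  -v /opt/glite:/opt/glite:ro \
  -v /cvmfs:/cvmfs \
  -v /home:/home \
  triumf/sl6-worker
```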
Image:
  • The PDF document mentions dumb-init, a C program used as the process supervisor at container launch. There is a link to the RPM in the document. The current state of the /root/init script, invoked by dumb-init, is attached below.
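The attached init script is not reproduced here, but its role can be sketched as follows. All paths, the hostname source and the condor environment file below are illustrative assumptions; only the overall shape (dumb-init shebang, hostname, config copy, condor_master in the foreground) reflects the setup described above.

```shell
#!/usr/bin/dumb-init /bin/sh
# Sketch of a container init in the spirit of the attached /root/init;
# paths and file names are illustrative assumptions, not our actual values.

# Give the container a hostname that condor can report.
hostname "$(cat /home/docker/hostname 2>/dev/null || echo sl6-wn)"

# Copy local scripts/configuration staged on the engine into the container.
cp -a /home/docker/config/. /etc/ 2>/dev/null

# Set the condor environment and launch condor_master in the foreground,
# so the container lives exactly as long as condor_master does.
. /etc/sysconfig/condor 2>/dev/null
exec /usr/sbin/condor_master -f
```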
Versions:
  • At this time we are using Docker version 1.10 (docker-1.10.3-44.el7.centos.x86_64)
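On an engine, the installed package and the version the daemon actually reports can be checked with standard commands (shown here as a fragment; output will vary by host):

```shell
# Which engine package is installed, and what the daemon reports.
rpm -q docker
docker version
```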

Panda queues

Panda queues are defined and active for these Docker containers.

Validation failures

These are the validations that failed. Ignoring obsolete releases older than 17.2.0, all the other failing releases were also verified to fail today (19 Sep 2016) on a normal SL6 node.

Site: TRIUMF-LCG2, Queue: TRIUMF_DOCKER
Missing            VO-atlas-AtlasPhysics-17.0.6.2.1-i686-slc5-gcc43-opt
Missing            VO-atlas-AtlasPhysics-17.0.6.2.2-i686-slc5-gcc43-opt
Missing            VO-atlas-AtlasPhysics-17.0.6.2.3-i686-slc5-gcc43-opt
Missing            VO-atlas-offline-19.2.2-x86_64-slc6-gcc47-opt
Missing            VO-atlas-offline-20.1.56-x86_64-slc6-gcc48-opt
Missing            VO-atlas-offline-20.20.0-x86_64-slc6-gcc49-opt
Missing            VO-atlas-offline-20.8.1-x86_64-slc6-gcc48-opt
Missing            VO-atlas-offline-21.0.0-x86_64-slc6-gcc49-opt
Missing            VO-atlas-offline-21.0.1-x86_64-slc6-gcc49-opt
Missing            VO-atlas-offline-21.0.2-x86_64-slc6-gcc49-opt
Missing            VO-atlas-production-19.2.1.5-x86_64-slc6-gcc47-opt
Missing            VO-atlas-production-19.2.1.6-x86_64-slc6-gcc47-opt
Missing            VO-atlas-production-20.20.0.1-x86_64-slc6-gcc49-opt
Missing            VO-atlas-production-20.20.0.2-x86_64-slc6-gcc49-opt
Missing            VO-atlas-production-20.20.0.3-x86_64-slc6-gcc49-opt
Missing            VO-atlas-TrigMC-17.0.6.2.2-i686-slc5-gcc43-opt
Missing            VO-atlas-TrigMC-17.0.6.2.3-i686-slc5-gcc43-opt
Missing            VO-atlas-TrigMC-17.0.6.2.4-i686-slc5-gcc43-opt
Missing            VO-atlas-TrigMC-17.0.6.2.5-i686-slc5-gcc43-opt

Site: TRIUMF-LCG2, Queue: ANALY_TRIUMF_DOCKER
Missing            VO-atlas-AtlasPhysics-17.0.6.2.1-i686-slc5-gcc43-opt
Missing            VO-atlas-AtlasPhysics-17.0.6.2.2-i686-slc5-gcc43-opt
Missing            VO-atlas-AtlasPhysics-17.0.6.2.3-i686-slc5-gcc43-opt
Missing            VO-atlas-offline-19.2.2-x86_64-slc6-gcc47-opt
Missing            VO-atlas-offline-20.1.56-x86_64-slc6-gcc48-opt
Missing            VO-atlas-offline-20.20.0-x86_64-slc6-gcc49-opt
Missing            VO-atlas-offline-20.8.1-x86_64-slc6-gcc48-opt
Missing            VO-atlas-offline-21.0.0-x86_64-slc6-gcc49-opt
Missing            VO-atlas-offline-21.0.1-x86_64-slc6-gcc49-opt
Missing            VO-atlas-offline-21.0.2-x86_64-slc6-gcc49-opt
Missing            VO-atlas-production-17.0.6.18-x86_64-slc5-gcc43-opt
Missing            VO-atlas-production-17.0.6.19-x86_64-slc5-gcc43-opt
Missing            VO-atlas-production-17.0.6.20-x86_64-slc5-gcc43-opt
Missing            VO-atlas-production-17.0.6.21-x86_64-slc5-gcc43-opt
Missing            VO-atlas-production-17.0.6.22-x86_64-slc5-gcc43-opt
Missing            VO-atlas-production-17.0.7.1-i686-slc5-gcc43-opt
Missing            VO-atlas-production-17.0.7.1-x86_64-slc5-gcc43-opt
Missing            VO-atlas-production-19.2.1.5-x86_64-slc6-gcc47-opt
Missing            VO-atlas-production-19.2.1.6-x86_64-slc6-gcc47-opt
Missing            VO-atlas-production-20.20.0.1-x86_64-slc6-gcc49-opt
Missing            VO-atlas-production-20.20.0.2-x86_64-slc6-gcc49-opt
Missing            VO-atlas-production-20.20.0.3-x86_64-slc6-gcc49-opt
Missing            VO-atlas-TrigMC-17.0.6.2.2-i686-slc5-gcc43-opt
Missing            VO-atlas-TrigMC-17.0.6.2.3-i686-slc5-gcc43-opt
Missing            VO-atlas-TrigMC-17.0.6.2.4-i686-slc5-gcc43-opt
Missing            VO-atlas-TrigMC-17.0.6.2.5-i686-slc5-gcc43-opt

Benchmarks

- A preliminary benchmark with HEP-SPEC06 has been done on our IBM blade

  • hardware
    • Product Name: BladeCenter HS22 -[7870AC1], with hyperthreading and turbo boost off
    • CPU: Intel(R) Xeon(R) CPU X5650 @ 2.67GHz
    • Memory: 48GB
    • disk: 2 × 300 GB
  • software
    • bare metal OS: Scientific Linux release 7.2 (Nitrogen)
    • kernel: 3.10.0-327.28.3.el7.x86_64
    • docker: docker-1.10.3-44.el7.centos.x86_64
    • docker image: Scientific Linux release 6.7 (Carbon)
  • benchmark results - HEP-SPEC06 (based on CPU2006 benchmark suite)
    • docker: 180.42
    • bare metal: 182.4
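The overhead implied by these numbers is small; a quick check of the relative difference between bare metal and the container:

```shell
# Relative HEP-SPEC06 difference: bare metal (182.4) vs. docker (180.42).
awk 'BEGIN { printf "%.2f%%\n", (182.4 - 180.42) / 182.4 * 100 }'
# → 1.09%
```

That is, the container ran within about 1% of bare metal on this benchmark.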

- Preliminary disk benchmarks using iozone have been conducted to compare the host versus a docker instance: docker_local_io_test.pdf

- Preliminary data transfer tests comparing gsiftp, dcap, http and root protocols on a docker instance vs. the host are available at: docker_datatransfer_test.pdf

  • Tier1Docker-v3.pdf: Latest document describing Engine, Registry and Image building and signing.

  • docker_build.tar.gz: tarball of example dockerfiles, kickstart file and scripts mentioned in Tier1Docker-v3.pdf

Topic attachments
  • Tier1Docker-v1.pdf (587.6 K, 2016-09-08, DeniceDeatrich): TRIUMF Tier-1 Docker setup - version 1
  • Tier1Docker-v2.pdf (623.7 K, 2017-03-27, DeniceDeatrich): Version 2 of Tier-1 Docker doc
  • Tier1Docker-v3.pdf (684.4 K, 2017-10-30, DeniceDeatrich): Latest document describing Engine, Registry and Image building and signing
  • docker_ansible.tar.gz (25.1 K, 2017-10-30, DeniceDeatrich): tarball of example ansible roles mentioned in Tier1Docker-v3.pdf
  • docker_build.tar.gz (310.2 K, 2017-10-30, DeniceDeatrich): tarball of example dockerfiles, kickstart file and scripts mentioned in Tier1Docker-v3.pdf
  • docker_local_io_test.pdf (136.9 K, 2016-10-07, RedaTafirout): local I/O tests
  • init (0.7 K, 2016-09-09, DeniceDeatrich): The current 'shebang' implementation of 'dumb-init' to set the hostname, copy local configurations, set the condor environment and launch condor_master
  • lorax_for_SL6.tar.gz (269.9 K, 2017-10-30, DeniceDeatrich): tarball of lorax source and binary rpm mentioned in Tier1Docker-v3.pdf
Topic revision: r18 - 2017-10-30 - DeniceDeatrich