\documentclass[a4paper]{jpconf} \usepackage{graphicx} \begin{document} \title{EOSCpilot: Interoperability Interim Results} \author{C. Duma$^1$, A. Costantini$^1$, D. Michelotto$^1$, A. Ceccanti$^1$, E. Fattibene$^1$ and D. Salomoni$^1$} \address{$^1$INFN Division CNAF, Bologna, Italy} \ead{ds@cnaf.infn.it}
\begin{abstract} The EOSCpilot project is the first project in the entire EOSC programme, tasked with exploring some of the scientific, technical and cultural challenges that need to be addressed in the deployment of the EOSC. It has been funded to support the first phase in the development of the European Open Science Cloud (EOSC). In this paper we present a summary of the results of the second-year activities in the field of interoperability, containing the first results of the validation of services and demonstrators in the interoperability testbeds and the revised interoperability requirements derived from these activities. \end{abstract}
\section{Introduction} The European Open Science Cloud (EOSC) programme aims to deliver an Open Data Science Environment that federates existing scientific data infrastructures to offer European science and technology researchers and practitioners seamless access to services for the storage, management, analysis and re-use of research data that are presently restricted by geographic borders and scientific disciplines. In the framework of the EOSCpilot, WP6, "Interoperability", aims to develop and demonstrate the interoperability requirements between e-Infrastructures, domain research infrastructures and other service providers needed in the European Open Science Cloud. It provides solutions, based on an analysis of existing and planned assets and techniques, to the challenge of interoperability. Two aspects of interoperability are taken into consideration: {\bf Data interoperability}, ensuring that data can be discovered, accessed and used according to the FAIR principles, and {\bf Service interoperability}, ensuring that services operated within different infrastructures can interchange and interwork. In the framework of the EOSCpilot project, INFN, and in particular CNAF, coordinates the activities of task T6.3, “Interoperability pilots (service implementation, integration, validation, provisioning for Science Demonstrators)”. One of the project's main objectives related to WP6 is to: \begin{itemize} \item “Develop a number of pilots that integrate services and infrastructures to demonstrate interoperability in a number of scientific domains” \end{itemize} mapped onto specific objectives addressed by task T6.3: \begin{itemize} \item Validating the compliance of services provided by WP5, "Services", with the specifications and requirements defined by the Science Demonstrators in WP4, "Science Demonstrators" \item Defining and setting up distributed Interoperability Pilots, involving multiple infrastructures, providers and scientific communities, with the purpose of validating the WP5 service portfolio.
\end{itemize}
\section{Activities and Achievements} \label{sec:activities} During 2018 the main activities coordinated by INFN-CNAF were: \begin{itemize} \item Supporting the setup of the Science Demonstrator pilots, following their interoperability requirements and matching them with available services and solutions \item Setting up different pilots addressing different interoperability aspects: \begin{itemize} \item Transparent networking – PiCO2 (Pilot for COnnecting COmputing centers) \item Grid and Cloud interoperability – a pilot demonstrator for one of the HEP experiments \item AAI – through the setup of a scoped interoperability pilot as part of the WLCG Authorization WG, AARC and EOSCpilot collaboration \item Resource brokering \& orchestration – leveraging INDIGO-DataCloud solutions \item Data accessibility \& interoperability of underlying storage systems – a distributed Onedata deployment \end{itemize} \item Continuous interaction and communication with the Science Demonstrator shepherds, in order to collect any new requirements resulting from the implementation of the SDs' specific use cases. \end{itemize}
\subsection{Interoperability pilots: Transparent Networking} {\bf PiCO2 (Pilot for COnnecting COmputing centers)} is one of the first interoperability pilots between generic, community-agnostic infrastructures, especially Tier-1 (national HPC/HTC centres) and Tier-2 (regional HPC/HTC centres). Its main objective is the automation of frequent, community-agnostic data flows (many large files) and code exchange between HPC (national, regional) and HTC (national, grid) infrastructures. During 2018 three technical groups were set up: \begin{itemize} \item one for building a network of peer-to-peer federations between iRODS zones (data storage service): between Tier-1 and Tier-2 centres, among Tier-2 centres, and between Tier-2 centres and the grid (a minimal cross-zone access sketch is shown at the end of this subsection) \item one for connecting the infrastructures within an L3VPN network and monitoring the performance of the network between sites \item one for facilitating the mobility and use of codes between different machines, using containers, configuration management packages, and notebooks \end{itemize} Figure~\ref{fig:1} shows the current status of the project and the sites involved. \begin{figure} \centering \includegraphics[width=\textwidth]{pico2_anrep2018.png} \caption{PiCO2 Layer 3 VPN} \label{fig:1} \end{figure}
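To give a concrete flavour of the iRODS federation activity, the following minimal sketch, written with the python-irodsclient library, shows how a user of one zone could read a data object hosted in a federated remote zone. Host names, zone names, credentials and paths are hypothetical.
\begin{verbatim}
# Minimal sketch: reading a data object from a federated iRODS zone.
# Host, zone names, credentials and paths are hypothetical.
from irods.session import iRODSSession

with iRODSSession(host='irods.tier1.example.org', port=1247,
                  user='alice', password='***', zone='LocalZone') as session:
    # In a zone federation, remote users are addressed as user#HomeZone,
    # and remote paths start with the remote zone name.
    obj = session.data_objects.get('/RemoteZone/home/alice#LocalZone/data.tar')
    with obj.open('r') as f:
        header = f.read(1024)  # read the first kilobyte remotely
\end{verbatim}
In such a setup, the federation itself is configured once by the zone administrators, after which users transparently address remote collections through their logical paths.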
\subsection{Interoperability pilots: Grid-Cloud interoperability demonstrator for the HEP community} The Dynamic On Demand Analysis Service (DODAS) is a Platform as a Service tool built by combining several solutions and products developed by the INDIGO-DataCloud H2020 project. It was extensively tested on a dedicated interoperability testbed under the umbrella of the EOSCpilot project during the first year of the project. Although originally designed for the Compact Muon Solenoid (CMS) experiment at the LHC, DODAS has been quickly adopted by the Alpha Magnetic Spectrometer (AMS) astroparticle physics experiment mounted on the ISS as a solution to exploit opportunistic computing, nowadays an extremely important topic for research domains where computing needs constantly increase. Given its flexibility and efficiency, DODAS was selected as one of the Thematic Services that will provide multi-disciplinary solutions in the EOSC-hub project, the integration and management system of the European Open Science Cloud, which started in January 2018.
During the integration pilot, the usage of any cloud (both public and private) to seamlessly extend the existing Grid computing model of CMS was demonstrated. Overall, the integration has been successful and much experience has been gained, resulting in an improved understanding of weaknesses and aspects to improve and optimise. These include: \begin{itemize} \item Federation: federated access to the underlying IaaS is key. So far we have experienced several issues, frequently because the IaaS provider already uses its own OpenID Connect Authorization Server and is thus unable to federate additional services. We adopted the ESACO solution to solve this problem; it would be crucial to have it as an EOSC-provided service. \begin{itemize} \item Such a solution for non-proprietary IaaSes would be extremely important in the EOSC landscape. A scenario where, for example, a commercial cloud is used would benefit from such functionality for accounting the overall HEPSpec usage. \end{itemize} \item Transparent data access: so far the only scalable solution we can use is XRootD (a minimal remote-read sketch is shown after this list). However, this might not fit all possible use cases; a more generic solution would be a big plus. \item Resource monitoring: we did not find a common solution for monitoring cloud resources. Although we implemented our own, we are convinced that a common strategy would be extremely valuable. \item PaaS orchestration: although the current INDIGO PaaS Orchestrator has been fully integrated and shows enormous advantages when dealing with multiple IaaSes, there is room for improvement both in the interface and in the management of IaaS ranking. \end{itemize}
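As an illustration of the transparent-data-access model, the sketch below reads part of a remote file over the XRootD protocol using the official XRootD Python bindings; the endpoint and file path are hypothetical.
\begin{verbatim}
# Minimal sketch: remote read over the XRootD protocol.
# The endpoint and file path are hypothetical.
from XRootD import client

with client.File() as f:
    status, _ = f.open('root://xrootd.example.org//store/user/alice/data.root')
    assert status.ok, status.message
    # Read the first megabyte without staging the whole file locally
    status, chunk = f.read(offset=0, size=1024 * 1024)
\end{verbatim}
The appeal of this model is that jobs running on opportunistic cloud resources read data directly from remote storage, with no site-local storage provisioning required.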
\subsection{Interoperability pilots: AAI} The EOSCpilot and AARC projects started a collaboration in the field of authentication and authorization, and of the policies and recommendations regarding their design. In the scope of the WP6 activities, this took the form of an AAI interoperability demonstrator set up as part of the AARC pilots Task 1, {\bf Pilots with research communities based on use cases provided - the WLCG use case}, regarding the {\it “Implementation of an IdP/SP Proxy, mainly to provide Token Translation Services to allow end users to log in without the need of manually managing X.509 certificates”}.
A team of people was formed, under WLCG coordination, to deal with the various activities – the {\bf WLCG Authorization Working Group (WG)}, motivated by: \begin{itemize} \item Evolving identity landscape \begin{itemize} \item User-owned X.509 certificates $\rightarrow$ federated identities \item Linking federated identities with existing VOMS authorizations is not supported \item Maintaining assurance and identity vetting for federated users is not supported \end{itemize} \item Central user blocking \begin{itemize} \item The retirement of glexec removes the blocking capability (\& traceability) \item VO-level blocking is not a realistic sanction \end{itemize} \item Data protection \begin{itemize} \item The tightening of data protection rules (GDPR) requires fine-grained, user-level access control \end{itemize} \end{itemize} The move towards federated identities and the adoption of new authorization standards by industry are a strong signal for WLCG to adapt its authorization infrastructure, whose schema is shown in Figure~\ref{fig:2}. \begin{figure} \centering \includegraphics[width=\textwidth]{aai_anrepo2018.png} \caption{WLCG AAI system} \label{fig:2} \end{figure} After an initial requirements gathering, and an analysis of how the functionalities of existing solutions match those requirements, two main activities started: \begin{enumerate} \item Design and testing of a WLCG Membership Management and Token Translation service, facilitated by pilot projects with the support of AARC (AAI Pilot Projects) \item Definition of a token-based authorization schema for downstream WLCG services and token issuers (JWT) \end{enumerate} The activities carried out during 2018 concerned: \begin{itemize} \item An IAM instance, deployed at INFN-CNAF since January 2018, to showcase the main features and integration capabilities (a minimal token-request sketch is shown after this list) \begin{itemize} \item https://wlcg-authz-wg.cloud.cnaf.infn.it/login \end{itemize} \item The migration of this deployment to the CERN infrastructure for further validation \& feedback on: \begin{itemize} \item RCAuth.eu and CERN HR database integration \item Registration \& administration management functionality \end{itemize} \end{itemize}
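As a concrete illustration of token-based access, the sketch below obtains an OAuth access token from the IAM token endpoint and inspects its JWT claims. The client credentials are hypothetical, and the client-credentials grant shown here is just one of the flows IAM supports.
\begin{verbatim}
# Minimal sketch: obtaining a token from the IAM instance and inspecting
# its JWT claims. The client credentials are hypothetical, and the
# client_credentials grant is just one of the supported OAuth 2.0 flows.
import requests
import jwt  # PyJWT

resp = requests.post(
    'https://wlcg-authz-wg.cloud.cnaf.infn.it/token',
    data={'grant_type': 'client_credentials'},
    auth=('MY_CLIENT_ID', 'MY_CLIENT_SECRET'))
resp.raise_for_status()
access_token = resp.json()['access_token']

# Inspect the claims without verifying the signature (for illustration
# only; real services must verify the token against the issuer's keys).
claims = jwt.decode(access_token, options={'verify_signature': False})
print(claims.get('iss'), claims.get('sub'), claims.get('exp'))
\end{verbatim}
Downstream WLCG services would then grant or deny access based on the claims carried by such tokens, rather than on user-managed X.509 certificates.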
\subsection{Software development lifecycle management} The software development lifecycle (SDL) process in INDIGO has been supported by a continuous software improvement process covering software quality assurance and software maintenance, including release management, support services, and the management of the pilot infrastructures needed for software integration and acceptance testing. Preview releases are made available for evaluation by user communities and resource providers through the pilot infrastructures. Release candidates are subjected to integration testing before release. Software lifecycle management is performed mostly via automated, orchestrated actions. The project's software lifecycle management services and activities, and their interdependencies, are the following: \begin{itemize} \item Version Control System (VCS): source code is made available through public VCS repositories, hosted externally on GitHub, guaranteeing the openness and visibility of the software and simplifying its exploitation beyond the project lifetime. The INDIGO-DataCloud software is released under the Apache 2.0 software license and can be deployed on both public and private Cloud infrastructures.
\item Software quality assurance criteria, and the control activities and services that enable them: \begin{itemize} \item Continuous Integration service using {\bf Jenkins}: a service to automate the building, packaging (where applicable) and execution of unit and functional tests of software components. \item Code review service using GitHub: code review of the software source code is an integral part of the SQA\@. This service facilitates the code review process: it records the comments and allows the reviewer to verify the software modifications. \item Code metrics service using {\bf Grimoire}: used to collect and visualize several metrics about the software components. \end{itemize} \item Software release and maintenance activities, services and supporting infrastructures: \begin{itemize} \item A project management service using {\bf openproject.org}, made available by the project: it provides tools such as an issue tracker, a wiki, a placeholder for documents and a project management timeline. \item Artifact repositories for RPM and Debian packages, and Docker Hub for containers: in INDIGO-DataCloud there are two types of artifacts, packaged software and virtual images. The software can be downloaded from our public repository\footnote{http://repo.indigo-datacloud.eu}. \item Release notes, installation and configuration guides, and user and development manuals, made available on {\bf GitBook}\footnote{https://indigo-dc.gitbooks.io/indigo-datacloud-releases}. \item Bug trackers using the GitHub issue tracker: services to track issues and bugs of the INDIGO-DataCloud software components. \item Integration infrastructure: composed of computing resources that directly support the Continuous Integration service. It is where the building and packaging of software occur, as well as the execution of unit and functional tests. These resources are provided by INDIGO partners. \item Testing infrastructure: provides a stable environment where users can preview the software and services developed by INDIGO-DataCloud prior to their public release. \item Preview infrastructure: where the released artifacts are deployed and made available for testing and validation by the use cases. \end{itemize} \end{itemize} The first INDIGO-DataCloud major release (codename {\tt MidnightBlue}) was issued on 1 August 2016; the second major release (codename {\tt ElectricIndigo}) was made publicly available on 14 April 2017.
\section{DevOps approach in INDIGO} Progressive levels of automation were adopted throughout the different phases of the INDIGO-DataCloud software development and delivery processes. \subsection{Services for continuous integration and SQA} The INDIGO-DataCloud CI process, in its different steps, reflects some of the main achievements of the software integration team: \begin{itemize} \item New features are developed independently from the production version in \textit{feature branches}. The creation of a pull request for a specific feature branch marks the start of the automated validation process through the execution of the SQA jobs. \item The SQA jobs perform the code style verification and calculate unit and functional test coverage. \begin{itemize} \item The tools necessary for these tests are packaged in Docker images, available on Docker Hub. \item Each test then starts a new container, which provides a clean environment for its execution (a minimal sketch of this approach is shown after this list). \item This innovative approach provides the flexibility needed to cope with the diversity of the INDIGO-DataCloud software. \end{itemize} \item The results of the several SQA jobs are made available in the Jenkins service, which notifies GitHub of their exit status. \begin{itemize} \item Only if the tests succeed is the source code validated and ready to be merged into the production branch. \end{itemize} \item The last step in the workflow is the code review, where a human review of the change is performed. After code review the source code can be merged and becomes ready for integration and later release. \end{itemize}
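The container-per-test approach mentioned above can be sketched as follows: each SQA job launches a throw-away container from a tool image and runs the component's tests inside it. The image name, test command and workspace path are hypothetical.
\begin{verbatim}
# Minimal sketch of the container-per-test approach: run a component's
# tests inside a clean, throw-away Docker container. The image name and
# test command are hypothetical.
import subprocess

def run_tests_in_container(src_dir, image='indigodatacloud/sqa-python'):
    # '--rm' discards the container afterwards, so every run starts from
    # a clean environment built from the tool image.
    subprocess.run(
        ['docker', 'run', '--rm',
         '-v', '%s:/src' % src_dir, '-w', '/src',
         image, 'pytest', '--cov'],
        check=True)

run_tests_in_container('/var/lib/jenkins/workspace/my-component')
\end{verbatim}
Because every job gets a fresh container, test runs cannot pollute one another, which is what makes a single CI service workable across very heterogeneous components.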
As a general rule, the described CI process must be followed by all the PTs contributing code to INDIGO-DataCloud. However, there are exceptions to this rule. \subsection{Continuous delivery} Continuous delivery adds, on top of the software development chain, the seamless production of software packages ready to be deployed into production services. Fast, frequent and small releases can therefore be carried out, promoting the reliability of the software. \subsection{DevOps adoption by user communities} The experience gathered throughout the project with regard to the adoption of different DevOps practices is not only useful and suitable for the software related to the core services of the INDIGO-DataCloud solution, but is also applicable to the development and distribution of the applications coming from the user communities. \section{Conclusions} Thanks to the common solutions developed by the INDIGO-DataCloud project and exercised in the EOSCpilot interoperability pilots, teams of front-line researchers in Europe are using public and private Cloud resources to obtain new results in Physics, Biology, Astronomy, Medicine, the Humanities and other disciplines. \section*{Acknowledgments} EOSCpilot has been funded by the European Commission H2020 research and innovation programme under grant agreement RIA XXXXXXX. \end{document}