\documentclass[a4paper]{jpconf}
\usepackage{graphicx}
\begin{document}
\title{INDIGO-DataCloud: Software Lifecycle Management embracing DevOps philosophy}
\author{C. Duma$^1$, A. Costantini$^1$, D. Michelotto$^1$,
M. Panella$^1$, D. Salomoni$^1$ and P. Orviz$^2$}
\address{$^1$INFN Division CNAF, Bologna, Italy}
\address{$^2$IFCA, Consejo Superior de Investigaciones Cientificas-CSIC, Santander, Spain}
\ead{cristina.aiftimiei@cnaf.infn.it}
\begin{abstract}
This paper describes the achievements of the H2020 project INDIGO-DataCloud in the field of software lifecycle management, the Continuous Integration and Delivery systems setup to manage the new releases, as a first step towards the implementation of a DevOps approach.
\end{abstract}
\section{Introduction}
INDIGO-DataCloud was a European project started in April 2015, with the
purpose of developing a modular architecture and software components to improve
how scientific work is supported at the leading edge of computing services development.
The project also focused on delivering production-quality software; thus it
defined procedures and quality metrics, which were followed by, and
automatically checked for, all the INDIGO components. A comprehensive process
to package and deliver the project's software was also defined. As an outcome of
this, INDIGO-DataCloud delivered two main software releases (the first in August 2016,
the second in April 2017), each followed by several minor updates. The latest
release consists of about 40 open modular components, 50 Docker
containers, 170 software packages, all supporting up-to-date open operating
systems. This result was accomplished by reusing and extending open source
software and, whenever applicable, contributing code to upstream projects.
\section{Software development lifecycle management and release process}
\label{sec:release}
The software development lifecycle (SDL) process (Figure~\ref{fig:1}) in INDIGO has been supported by a continuous
software improvement process covering software quality assurance and software maintenance,
including release management, support services, and the management of the pilot infrastructures
needed for software integration and acceptance testing.
Following the successful examples of other research software
development projects, such as EGEE I, II and III, and EMI, the development,
maintenance and evolution of the software were entrusted to the {\sl Product Teams}\/ (PTs).
PTs are small teams of software developers, fully responsible for the development of a particular
software product (or group of tightly related products) compliant with the project's acceptance criteria.
In order to overcome some of the challenges faced by development teams, such as long waits for
code deployment and the pressure of working on old, pending and new code simultaneously, a
Continuous Integration approach was adopted during the lifetime of the project, aiming to ensure
quick deployment of the code, faster testing and feedback, and reduced waiting time to deploy the code.
\begin{figure}
\centering
\includegraphics[width=\textwidth]{Figure5.pdf}
\caption{Software development lifecycle implementation}
\label{fig:1}
\end{figure}
Preview releases are made available for evaluation by user communities and
resource providers through the pilot infrastructures. Release
candidates are subjected to integration testing, which may include the
participation of user communities. Once the required release quality is
attained the software is made available to the general public through
INDIGO-DataCloud repositories. INDIGO then provides support for the released
software and manages bug fixes in close coordination with the developers.
%\subsection{Software development lifecycle management}
Software lifecycle management is performed mostly via automated actions orchestrated
by a Continuous Integration (CI) tool. Each change in the code committed to the team's
code repository triggers quality control activities implemented as jobs in the CI system,
performing required build and test steps. The jobs themselves are defined by the developers
together with the quality assurance and release teams, and include code style checking,
unit testing, functional testing and, whenever possible, integration testing.
Some manual steps, e.g.\ for documentation verification and code review, are also required.
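The per-commit quality control sequence described above can be sketched as follows. This is a hypothetical illustration, not the project's actual Jenkins job definitions; the step names are assumptions made for the example.

```python
# Hypothetical sketch of the per-commit SQA job sequence described
# above: code style checking, unit testing and functional testing.
# Step names are illustrative, not the project's actual job names.

def run_sqa_jobs(run_step):
    """Run the automated SQA steps in order and collect their results.

    `run_step` is a callable taking a step name and returning True on
    success, so real build/test commands can be plugged in.
    """
    steps = ["code-style-check", "unit-tests", "functional-tests"]
    results = {}
    for step in steps:
        results[step] = bool(run_step(step))
    # The change is ready for review only if every automated step passed.
    results["ready-for-review"] = all(results[s] for s in steps)
    return results

# Example run where the functional tests fail: the change is not
# forwarded to manual code review.
outcome = run_sqa_jobs(lambda step: step != "functional-tests")
```

In the real workflow these steps ran as Jenkins jobs triggered by each commit; the sketch only captures the gating logic.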
In Figure~\ref{fig:2} we depict the project's software lifecycle management services and
activities and their interdependencies:
\begin{itemize}
\item Version Control System (VCS) - Source code is made available through public VCS
repositories, hosted externally on GitHub, guaranteeing in this
way the openness and visibility of the software and simplifying its exploitation beyond the
project lifetime. The INDIGO-DataCloud software is released under the Apache 2.0
software license and can be deployed on both public and private Cloud infrastructures.
\item Software quality assurance criteria and the control activities and services that enable them:
\begin{itemize}
\item Continuous Integration service using {\bf Jenkins}: Service to automate the building,
packaging (where applicable) and execution of unit and functional tests of software components.
\item Code review service using GitHub: Code review of the software source code is an integral part of the SQA\@. This service facilitates the code review process. It records the
comments and allows the reviewer to verify the software modifications.
\item Code metrics services using {\bf Grimoire}: To collect and visualize several metrics about the software components.
\end{itemize}
\item Software release and maintenance activities, services and supporting infrastructures
\begin{itemize}
\item A project management service using {\bf openproject.org} is made available by the
project: It provides tools such as an issue tracker, a wiki, a document repository and a project management timeline.
\item Artifacts repositories for RPM and Debian packages, and Docker Hub for containers:
In INDIGO-DataCloud there are two types of artifacts, packaged software and virtual images.
The software can be downloaded from our public repository\footnote{http://repo.indigo-datacloud.eu}.
\item Release notes, installation and configuration guides, user and development manuals are made
available on {\bf GitBook}\footnote{https://indigo-dc.gitbooks.io/indigo-datacloud-releases}.
\item Bug trackers using GitHub issues tracker: Services to track issues and bugs of INDIGO-DataCloud software components.
\item Integration infrastructure: this infrastructure is composed of computing resources to directly support
the Continuous Integration service. It is where the building and packaging of software
occur, as well as the execution of unit and functional tests. These resources are provided by INDIGO partners.
\item Testing infrastructure: this infrastructure provides several types of environment, including a stable environment
where users can preview the software and services developed by INDIGO-DataCloud prior to their public release.
\item Preview infrastructure: where the released artifacts are deployed and made available for testing and validation by the use-cases.
\end{itemize}
\end{itemize}
\begin{figure}
\centering
\includegraphics[width=\textwidth]{Figure6.pdf}
\caption{Software development lifecycle, release, maintenance and exploitation interdependencies.}
\label{fig:2}
\end{figure}
The first INDIGO-DataCloud major release (codename {\tt MidnightBlue}) was made available on 1 August 2016 (see Table~\ref{tab:1} for the fact sheet). The
second INDIGO-DataCloud major release (codename {\tt ElectricIndigo}) was made publicly available on 14 April 2017 (see Table~\ref{tab:2} for the fact sheet).
\begin{figure}
\centering
\includegraphics[width=\textwidth]{TableI.pdf}
\caption{INDIGO-1 release fact sheet}
\label{tab:1}
\end{figure}
\begin{figure}
\centering
\includegraphics[width=\textwidth]{TableII.pdf}
\caption{INDIGO-2 release fact sheet}
\label{tab:2}
\end{figure}
\section{DevOps approach in INDIGO}
Progressive levels of automation were adopted throughout the different phases of
the INDIGO-DataCloud project software development and delivery processes.
Starting with a continuous integration approach, achieved already in the early
stages of the project, the second part of the project was devoted to the establishment
of the next natural step in the DevOps practices: the Continuous Delivery (CD).
\subsection{Services for continuous integration and SQA}
The INDIGO-DataCloud CI process is schematically shown
in Figure~\ref{fig:3}. The process, in its different steps, reflects some of
the main and important achievements of the software integration team, such as:
\begin{figure}
\centering
\includegraphics[width=\textwidth]{Figure8.pdf}
\caption{Continuous Integration workflow followed by new feature additions in the production codebase.}
\label{fig:3}
\end{figure}
\begin{itemize}
\item New features are developed independently from the
production version in \textit{feature branches}. The creation of
a pull request for a specific feature branch marks the start of
the automated validation process through the execution of the
SQA jobs.
\item The SQA jobs perform the code style verification and calculate unit
and functional test coverage.
\begin{itemize}
\item The tools necessary for tackling these tests are packaged in
Docker images, available in DockerHub.
\item Each test then initiates a new container that provides a
clean environment for its execution.
\item This is an innovative approach that provides the flexibility
needed to cope with the INDIGO-DataCloud software diversity.
\end{itemize}
\item The results of the several SQA jobs are made available in the Jenkins
service, which notifies GitHub of their exit status.
\begin{itemize}
\item Only if all the tests have succeeded is the source code
validated and ready to be merged into the production branch.
\end{itemize}
\item The last step in the workflow is the code review, where a human
review of the change is performed. After code review the source code
can be merged and becomes ready for integration and later release.
\end{itemize}
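The container-based test execution described above (each test starting a fresh container as a clean environment) can be sketched as follows. This is a minimal illustration assuming the standard Docker CLI; the image names and test commands are hypothetical, not the project's actual tooling.

```python
# Hypothetical sketch of running an SQA test inside a throwaway Docker
# container, as described above: "--rm" discards the container after
# the run, so every test starts from a clean environment.
import subprocess


def run_test_in_container(image, command, runner=subprocess.run):
    """Run `command` inside a fresh container of `image`.

    Returns True if the test command exited with status 0.
    `runner` is injectable so the logic can be exercised without a
    Docker daemon.
    """
    cmd = ["docker", "run", "--rm", image] + list(command)
    completed = runner(cmd)
    return completed.returncode == 0


# Illustrative call (requires Docker and the named image to exist):
# run_test_in_container("indigodatacloud/sqa-tools", ["pytest", "tests/"])
```

Decoupling the test logic from the container runtime in this way is what gives the approach the flexibility to cope with heterogeneous software stacks: only the image changes per product.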
As a general rule, the described CI process must be followed by all the PTs
contributing code to INDIGO-DataCloud. However, there are exceptions to this rule, which fall into two main categories:
\begin{itemize}
\item mature products that have well established development services
already in place;
\item new software products that aim to contribute to existing
projects/frameworks (such as OpenStack, OpenNebula and
others), which make use of dedicated
development services made available by the corresponding projects.
\end{itemize}
For those cases where products use external development services, the
PTs must ensure that:
\begin{itemize}
\item CI testing must satisfy the INDIGO-DataCloud SQA requirements;
\item provisioning of metrics may be requested by the SQA
team.
\end{itemize}
The INDIGO-DataCloud SQA team keeps track of the exceptions and continuous
integration systems being used by the several products. This is a dynamic
process. As products evolve and become accepted in their target projects they
may move from the INDIGO-DataCloud development system to external development
systems.
This is indeed one of the objectives of the SQA in INDIGO-DataCloud: to improve
the quality of the software so that it can be more easily accepted in target
projects, ensuring a path for sustainability and exploitation.
\subsection{Continuous delivery}
Continuous delivery adds, on top of the software development chain, the seamless
production of software packages ready to be deployed into production
services. Fast, frequent and small releases can therefore be carried out, thus
promoting the reliability of the software.
In the INDIGO-DataCloud scenario, the continuous delivery adoption translates
into the definition of pipelines. A pipeline is a serial execution of tasks
that encompasses in the first place the SQA jobs (CI phase) and adds as the
second part (CD phase) the building and deployment testing of the software
packages created. The pipeline only succeeds if each task is run to completion,
otherwise the process is stopped and set as a build failure.
As a result, in one pipeline execution, the source code is validated and
packaged automatically, with the only manual intervention of approving and
marking the resulting packages as production-ready. As shown in Figure~\ref{fig:4},
these software pipelines were defined in the Jenkins CI service
and differ in the number of tasks depending on the packaging type being
produced: tarballs, RPM/DEB packages and/or Docker images.
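The fail-fast behaviour of such a pipeline (a serial execution of tasks that stops, and is marked as a build failure, at the first unsuccessful task) can be sketched as follows. The task names are illustrative assumptions; the real pipelines were defined in Jenkins.

```python
# Minimal sketch of a serial CI/CD pipeline as described above: tasks
# run in order, and the pipeline stops at the first failure. Task
# names are illustrative, not the project's actual pipeline stages.

def run_pipeline(tasks):
    """tasks: list of (name, callable-returning-bool) pairs.

    Returns (succeeded, names_of_completed_tasks). If any task fails,
    execution stops there and the pipeline is a build failure.
    """
    completed = []
    for name, task in tasks:
        if not task():
            return False, completed  # build failure: stop the pipeline
        completed.append(name)
    return True, completed


ci_cd = [
    ("sqa-jobs", lambda: True),        # CI phase: SQA jobs
    ("build-packages", lambda: True),  # CD phase: package building
    ("deployment-test", lambda: False),
]
ok, done = run_pipeline(ci_cd)
```

In this example the deployment test fails, so the pipeline reports a failure and the packages are never marked as production-ready.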
\begin{figure}
\centering
\includegraphics[width=\textwidth]{Figure15.pdf}
\caption{DevOps pipeline to distribute Ubuntu Xenial and CentOS7 packages for cloud-info-provider product}
\label{fig:4}
\end{figure}
\subsection{DevOps adoption from user communities}
The experience gathered throughout the project with regards to the adoption of different DevOps
practices is not only useful and suitable for the software related to the core services in the
INDIGO-DataCloud solution, but also applicable to the development and distribution of the applications coming from the user communities.
As an example, two applications, DisVis and PowerFit, were integrated into a CI/CD pipeline
similar to the one described above. As can be seen in Figure~\ref{fig:5}, with this pipeline in place the application
developers were provided with both a means to validate the source code before merging and the automatic creation of a
new versioned Docker image, made available in the INDIGO-DataCloud catalogue for applications, i.e.\ DockerHub's {\tt indigodatacloudapps} repository.
The novelty introduced in the pipeline above is the validation of the user application. Once the application
is deployed as a Docker container, and subsequently uploaded to the {\tt indigodatacloudapps} repository,
it is instantiated in a new container to be validated. The application is then executed and its results
compared with a set of reference outputs. This pipeline implementation thus goes a step further by testing
the application execution for the last available Docker image in the catalogue.
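The comparison against reference outputs can be sketched as follows. This is an illustrative simplification: outputs are modelled as in-memory name/content pairs, and the file names shown are hypothetical, not those of the actual DisVis or PowerFit runs.

```python
# Hypothetical sketch of the application validation step described
# above: the containerised application's outputs are compared, name
# by name, against a set of reference outputs. Outputs are modelled
# as dicts of {output_name: content} for illustration.

def validate_outputs(produced, reference):
    """Return the names of reference outputs that are missing from,
    or differ in, the produced outputs. An empty list means the
    application run is validated."""
    failures = []
    for name, expected in reference.items():
        if produced.get(name) != expected:
            failures.append(name)
    return failures


# Example: the run produced two of the three expected outputs.
failures = validate_outputs(
    {"fit.log": "score=0.93", "model.out": "OK"},
    {"fit.log": "score=0.93", "model.out": "OK", "stats.txt": "n=10"},
)
```

A non-empty failure list would mark the pipeline task, and hence the whole pipeline, as failed, preventing the image from being considered validated.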
\begin{figure}
\centering
\includegraphics[width=\textwidth]{./figs/Figure16.pdf}
\caption{DisVis development workflow using a CI/CD approach}
\label{fig:5}
\end{figure}
\section{Conclusions}
Thanks to the new common solutions developed by the INDIGO project, teams of front-line
researchers in Europe are using public and private Cloud resources to obtain new results in Physics, Biology, Astronomy, Medicine, the Humanities and other disciplines.
The reliability and quality of the developed software has been substantially improved by applying the SQA
criteria in the development stage. Consequently, each change in the source code
that was meant to be integrated in the production version, was validated according
to the SQA testing requirements. Automation was key in this process, allowing the
implementation of a CI scenario that enforces the SQA criteria compliance. The application of more advanced DevOps practices
resulted in the implementation of CD pipelines to extend automation to the delivery of packages.
Thereby, safe and frequent software releases became feasible, reducing
the impact on production systems. In this regard, a pre-production validation,
through the preview testbeds or early adoption by resource centres -- handled by the
staged-rollout process -- was also implemented, reducing the likelihood of discovering
issues in production environments.
The outcomes of INDIGO-DataCloud will persist, and also be extended, after the end of the project. INDIGO was one of the three coordinating projects (together with EGI and EUDAT) in the preparation of the EOSC-hub proposal submitted to the EC H2020 EINFRA-12 call; the main goal of this recently approved project is to contribute to the EOSC implementation by enabling seamless and open access to a system of research data and services provided across nations and multiple disciplines.
Two additional Horizon 2020 projects were also approved ({\tt DEEP HybridDataCloud} and {\tt eXtreme DataCloud}) that will continue to develop and enhance INDIGO components to provide new and innovative services, committed to extending and reinforcing the DevOps culture by implementing the steps of Continuous Deployment and Monitoring.
In particular, the CNAF team has the leadership of the Software Maintenance and Support activities in both projects, and will contribute not only to the coordination of all the activities in this field but also to the deployment of the needed CI and CD infrastructures, leveraging Cloud@CNAF.
\section*{Acknowledgments}
INDIGO-DataCloud has been funded by the European Commission H2020 research and innovation programme under grant agreement RIA 653549.
\end{document}