\documentclass[a4paper]{jpconf}

\usepackage{url}
\usepackage{graphicx}
\usepackage{float}

\newcommand{\quotes}[1]{``#1''}

\begin{document}

\title{StoRM maintenance and evolution} 

\author{
  A. Ceccanti,
  E. Vianello,
  F. Giacomini
}

\address{INFN-CNAF, Bologna, IT}

\ead{
  andrea.ceccanti@cnaf.infn.it
}

\begin{abstract}
  StoRM is the storage element solution that powers the CNAF Tier-1 data center as well as more than 30 other sites. In this contribution, we highlight the main maintenance and evolution activities on StoRM during 2018.
\end{abstract}

\section*{Introduction}
\label{sec:introduction}


StoRM~\cite{storm} is a lightweight storage resource manager (SRM) solution developed at INFN-CNAF which powers the CNAF Tier-1 data center as well as more than 30 other sites. 

StoRM implements the SRM version 2.2~\cite{srm-2.2} data management specification and  is typically deployed on top of a cluster file system like IBM GPFS~\cite{gpfs}.

StoRM has a layered architecture (Figure~\ref{fig:storm-arch}), split between two main components: the StoRM frontend and backend services.
The StoRM frontend service implements the SRM interface exposed 
to client applications and frameworks. 
The StoRM backend service implements the actual storage management logic by interacting directly with the underlying file system. 
Communication between the frontend and the backend happens in two ways: 
\begin{itemize}
    \item via an XML-RPC API, for synchronous requests (a minimal sketch of this path is shown below);
    \item via a database, for asynchronous requests.
\end{itemize}
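
As a minimal illustration of the synchronous path, the sketch below shows how a request could be forwarded to the backend over XML-RPC using Python's standard library. The endpoint port and the \texttt{synchcall.ping} method name are assumptions made for illustration and do not necessarily match the actual StoRM interface.

\begin{verbatim}
# Minimal sketch of a synchronous XML-RPC call towards a backend-like
# service. The port and the method name ("synchcall.ping") are
# hypothetical and chosen for illustration only.
import xmlrpc.client

def forward_ping(backend_host, backend_port=8080):
    """Forward a ping-like synchronous request to the backend and
    return its XML-RPC response."""
    endpoint = "http://%s:%d/RPC2" % (backend_host, backend_port)
    backend = xmlrpc.client.ServerProxy(endpoint)
    # The call blocks until the backend replies (synchronous semantics).
    return backend.synchcall.ping({"requestToken": "example-token"})

if __name__ == "__main__":
    print(forward_ping("backend.example.org"))
\end{verbatim}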

Data transfer is provided by GridFTP, HTTP and XRootD services that directly access the file system underlying the StoRM deployment.

StoRM is interfaced with the IBM Tivoli Storage Manager (TSM) via GEMSS~\cite{gemss}, a component also developed at INFN, to provide optimized  data archiving and tape recall functionality. The StoRM WebDAV service provides an alternative data management interface complementary to the SRM functionality, but which does not yet support tape operations. 
A high-level representation of the StoRM architecture is given in Figure~\ref{fig:storm-arch}.

During 2018, two StoRM releases were produced:

\begin{itemize}
    \item StoRM 1.11.13~\cite{storm-1.11.13}, released on February 19th, providing updates for the StoRM backend, the YAIM module and the info provider;
    \item StoRM 1.11.14~\cite{storm-1.11.14}, released on July 25th, providing updates for the frontend and backend services, the StoRM native and XML-RPC libraries, the GridFTP DSI module and the YAIM module.
\end{itemize}

The following paragraphs describe the main StoRM maintenance and evolution activities that resulted in the above releases and in pre-release packages made available to the CNAF Tier-1 and other interested sites during 2018.

\begin{figure}
    \centering
    \includegraphics[width=.6\textwidth]{storm-arch.png}
    \caption{\label{fig:storm-arch}The StoRM high level architecture.}
\end{figure}

\section*{StoRM frontend stability improvements}

After observing repeated failures that resulted in crashes of the StoRM frontend process in production at the Tier-1, an investigation was started
to understand the cause of the failures and provide a fix to improve
the service stability.
The failures occurred mainly when a high number of requests hit the frontend. Enabling core dumping did not provide much information, besides the fact that the segmentation fault occurred mostly in the XML-RPC serialization/deserialization logic and was likely caused by stack corruption. What precisely caused the stack corruption, however, was not
understood.

In order to contain the problem, the following improvements were
implemented:

\begin{itemize}
    \item a configurable limit on the size of the request queue
    on the frontend was implemented;
    \item information about the request queue size and the number of
    active requests was added to the frontend log, in order to monitor the queue processing status in real time;
    \item the logic of the XML-RPC interaction between frontend and backend
    was refactored to use the synchronous XML-RPC API (the former use of the asynchronous API only complicated the code base without providing increased concurrency or throughput);
    \item a configurable limit on the size of the thread pool serving XML-RPC requests was introduced on the backend;
    \item a configurable limit on the size of the XML-RPC request queue was introduced on the backend (the combined effect of these limits is sketched after this list);
    \item our load test suite was tuned to generate a load comparable with
    the one observed in production for the ATLAS experiment.
\end{itemize}
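
Together, the bounded queues and the fixed-size thread pool implement a simple form of backpressure: when the queue is full, new requests are rejected immediately instead of piling up until the process is starved. The sketch below illustrates the pattern in isolation, in Python; the limit values and names are purely illustrative and do not correspond to actual StoRM configuration parameters.

\begin{verbatim}
# Bounded request queue served by a fixed-size pool of worker threads.
# Limits and names are illustrative, not StoRM configuration keys.
import queue
import threading

MAX_QUEUE_SIZE = 1000   # cap on pending requests
WORKER_THREADS = 50     # cap on concurrent processing

request_queue = queue.Queue(maxsize=MAX_QUEUE_SIZE)

def handle(request):
    pass                             # placeholder for the real logic

def worker():
    while True:
        request = request_queue.get()
        try:
            handle(request)          # process the request
        finally:
            request_queue.task_done()

def submit(request):
    """Enqueue a request, rejecting it when the queue is full instead
    of letting the backlog grow without bound."""
    try:
        request_queue.put_nowait(request)
        return True
    except queue.Full:
        return False                 # caller replies 'server busy'

for _ in range(WORKER_THREADS):
    threading.Thread(target=worker, daemon=True).start()
\end{verbatim}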

These improvements, and appropriate configuration, restored the frontend
service stability: no more crashes were observed in production, even during peak load periods.

\section*{JSON storage usage record reporting}

In consultation with all the LHC experiments, the WLCG storage providers (dCache~\cite{dcache}, DPM~\cite{dpm}, EOS~\cite{eos}, StoRM, XRootD~\cite{xrootd}) drafted a proposal for storage resource reporting in WLCG~\cite{storage-resource-reporting-proposal}.
This document proposes five requirements:
\begin{itemize}
    \item \texttt{R0}: storage systems should provide the total used space and the list of files stored (no other meta-data required);
    \item \texttt{R1}: storage systems should provide the total used and total free space for all distinct space quotas available to the experiment through a non-SRM protocol (GridFTP, HTTP or XRootD), with a data freshness on the order of ten minutes and an accuracy of tens of GB;
    \item \texttt{R2}: storage systems should provide a public summary file indicating the \quotes{topology} of the system and usage information;
    \item \texttt{R3}: storage systems should provide the total used and total free space on sub-directories, in particular any entity on which a restrictive quota has been applied;
    \item \texttt{R4}: storage systems should provide a full storage dump with file information such as size, creation time and checksum value.
\end{itemize}

Requirement \texttt{R3} has been withdrawn as no experiment supported its inclusion.
Requirement \texttt{R4} was already supported through WebDAV with a detailed and recursive \texttt{PROPFIND} operation.
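
As an illustration of how \texttt{R4} can be satisfied, a client can obtain a recursive listing of a storage area with a single \texttt{PROPFIND} request. The sketch below uses the Python \texttt{requests} library; the endpoint, the credential paths and the requested properties are assumptions made for illustration (checksums, for instance, may be exposed through server-specific properties).

\begin{verbatim}
# Recursive PROPFIND sketch for a storage dump (sizes and creation
# times). Endpoint and credential paths are hypothetical.
import requests

PROPFIND_BODY = """<?xml version="1.0" encoding="utf-8"?>
<D:propfind xmlns:D="DAV:">
  <D:prop>
    <D:getcontentlength/>
    <D:creationdate/>
  </D:prop>
</D:propfind>"""

response = requests.request(
    "PROPFIND",
    "https://webdav.example.org/atlas/",       # storage area URL
    data=PROPFIND_BODY,
    headers={"Depth": "infinity",
             "Content-Type": "application/xml"},
    cert=("/path/to/usercert.pem", "/path/to/userkey.pem"),
    verify="/etc/grid-security/certificates",  # trusted CAs
)
response.raise_for_status()
# Multistatus XML listing every file with the requested properties.
print(response.text)
\end{verbatim}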

In order to comply with the requirements \texttt{R0}, \texttt{R1} and \texttt{R2}, the following improvements were introduced in February 2018:
\begin{itemize}
    \item the backend REST endpoint used to generate the list of configured storage areas and their usage status now produces a JSON response instead of plain text; 
    \item a new command, \texttt{get-report-json}, has been added to the info provider to generate a JSON site report file, with a configurable target location.
\end{itemize}

To fulfill requirement \texttt{R2}, the Tier-1 StoRM ATLAS production instance has been configured to expose,
via the StoRM WebDAV service, the JSON usage report in a storage area accessible by any client presenting a trusted X.509 certificate.
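
A minimal sketch of how a client could retrieve and inspect such a report is shown below; the report URL and the JSON field names are assumptions made for illustration, the actual schema being defined by the WLCG storage resource reporting proposal~\cite{storage-resource-reporting-proposal}.

\begin{verbatim}
# Retrieve the JSON usage report exposed through StoRM WebDAV using an
# X.509 client certificate. URL and field names are assumed for
# illustration.
import requests

REPORT_URL = "https://webdav.example.org/info/storagesummary.json"

response = requests.get(
    REPORT_URL,
    cert=("/path/to/usercert.pem", "/path/to/userkey.pem"),
    verify="/etc/grid-security/certificates",
)
response.raise_for_status()
report = response.json()

# Print used and total space per storage share (illustrative fields).
service = report.get("storageservice", {})
for share in service.get("storageshares", []):
    print(share.get("name"), share.get("usedsize"),
          share.get("totalsize"))
\end{verbatim}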

\section*{Improved backend start-up logic}

In order to improve the StoRM backend service start-up logic, a significant refactoring of the start-up code and of the \texttt{init.d} scripts was carried out.
Before the refactoring, each start-up of the StoRM backend service spawned several processes, which made even stopping the service more complex than necessary.
After the refactoring, a single process is visible from the command line and all the unnecessary arguments have been removed. This also brought a significant improvement in the service start-up speed.

Comparing the start-up time before the refactoring:

\begin{verbatim}
$ time sh start-storm.sh 
Bootstrapping storm-backend-server                         [  OK  ]
Starting storm-backend-server                              [  OK  ]

real	0m20.495s
user	0m0.122s
sys	0m0.140s
\end{verbatim}

and after:

\begin{verbatim}
$ time sh start-storm.sh 
Starting storm-backend-server:                             [  OK  ]

real	0m5.217s
user	0m0.083s
sys	0m0.078s
\end{verbatim}

we can see that the service now starts about four times faster.

\section*{WebDAV third-party transfers support}

At the end of May 2017 the Globus Alliance announced that the open-source
Globus Toolkit would no longer be supported by the Globus team at the University of Chicago~\cite{globus-end-of-support}. This announcement had an obvious impact on WLCG, since the Grid Security Infrastructure (GSI) and GridFTP lie at the core of the WLCG data management infrastructure, and discussions started in the appropriate forums on the search for alternatives. The DOMA Third-Party Copy (TPC) Working Group~\cite{doma-tpc} was established to investigate alternatives to the GridFTP protocol for bulk transfers across WLCG sites. This led to a requirement for all storage element implementations to support either WebDAV-based or XRootD-based third-party transfers.

In order to comply with the requirement, the following improvements were introduced in the StoRM WebDAV service in November 2018:
\begin{itemize}
    \item the WebDAV service was migrated to the latest stable Spring Boot libraries~\cite{spring-boot};
    \item token-based delegation and authorization was introduced, by adding support for external OpenID Connect~\cite{oidc} providers and by introducing an internal OAuth~\cite{oauth} authorization server that can be used to issue tokens to clients authenticated with VOMS credentials;
    \item the semantics of the WebDAV \texttt{COPY} method were extended to implement third-party transfers (Figure~\ref{fig:tpc}; a sketch of such a transfer is shown after this list);
    \item a significant refactoring of the robot test suite was implemented, by moving the test suite code into the server repository and simplifying credential management. The refactoring resulted in improved usability, performance and error reporting.
\end{itemize}
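
Following the WLCG third-party-copy approach, a client triggers the transfer by issuing a \texttt{COPY} request to one storage endpoint and naming the remote endpoint in a \texttt{Source} (pull mode) or \texttt{Destination} (push mode) header; headers prefixed with \texttt{TransferHeader} are forwarded to the remote side, e.g. to pass a bearer token authorizing the remote access. The sketch below illustrates a pull-mode transfer in Python; the URLs, credential paths and token value are placeholders.

\begin{verbatim}
# Pull-mode WebDAV third-party COPY: the destination storage element
# is asked to fetch the file from the source endpoint. URLs, paths
# and the token are placeholders for illustration.
import requests

destination = "https://dst-webdav.example.org/atlas/data/file.root"
source = "https://src-webdav.example.org/atlas/data/file.root"

response = requests.request(
    "COPY",
    destination,
    headers={
        # pull mode: fetch the file from this URL
        "Source": source,
        # forwarded to the source endpoint to authorize the read
        "TransferHeaderAuthorization": "Bearer PLACEHOLDER_TOKEN",
    },
    cert=("/path/to/usercert.pem", "/path/to/userkey.pem"),
    verify="/etc/grid-security/certificates",
)
response.raise_for_status()
# The response body streams progress markers while the transfer runs.
print(response.text)
\end{verbatim}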

A pre-release of the updated StoRM WebDAV package was
deployed at the CNAF Tier-1 for the ATLAS WebDAV production instance and successfully added to the DOMA TPC testbed, where it proved to work reliably.

The initial deployment also highlighted minor issues which were solved, leading to the final StoRM WebDAV 1.1.0 release in February 2019.

\begin{figure}
    \centering
    \includegraphics[width=.6\textwidth]{tpc.png}
    \caption{\label{fig:tpc}A WebDAV push-mode third-party transfer managed by CERN File Transfer Service (FTS) against two storage elements.}
\end{figure}

\section*{Conclusions and future work}
In this contribution, we presented the main development and evolution activities performed on StoRM during 2018. Besides ordinary maintenance, in 2019 we will focus on porting StoRM
to CentOS 7 and on replacing the current YAIM-based configuration code~\cite{yaim} with a Puppet module~\cite{puppet}.
\section*{References}

\bibliographystyle{iopart-num}
\bibliography{biblio}

\end{document}