    \documentclass[a4paper]{jpconf}
    
    \usepackage{url}
    \usepackage{graphicx}
    \usepackage{float}
    
    \newcommand{\quotes}[1]{``#1''}
    
    \begin{document}
    
    
    \title{StoRM 2: initial design and development activities} 
    
    
    \author{
      A.~Ceccanti$^1$,
      F.~Giacomini$^1$,
      E.~Vianello$^1$,
      E.~Ronchieri$^1$
    }

    \address{$^1$ INFN-CNAF, Bologna, IT}
    
    
    \ead{
      andrea.ceccanti@cnaf.infn.it
    }
    
    \begin{abstract}
    
      StoRM is the storage element solution that powers the CNAF Tier 1
      data center as well as more than 30 other sites.  Experience in
      developing, maintaining and operating it at scale suggests that a
      significant refactoring of the codebase is necessary to improve
      StoRM maintainability, reliability, scalability and ease of
      operation in order to meet the data management requirements coming
      from HL-LHC and other communities served by the CNAF Tier 1 data
      center.  In this contribution we highlight the initial StoRM 2
      design and development activities.
    \end{abstract}
    
    \section{Introduction}
    \label{sec:introduction}
    
    
    StoRM was first developed by a joint collaboration between INFN-CNAF, CERN and
    ICTP to provide a lightweight storage element solution implementing the
    SRM~\cite{ref:srm} interface on top of a POSIX filesystem. StoRM has a layered
    architecture (Figure~\ref{fig:storm-arch}), split between two main components:
    the StoRM frontend and backend services. The StoRM frontend service implements
    the SRM interface exposed to client applications and frameworks. The StoRM
    backend service implements the actual storage management logic by interacting
    directly with the underlying file system.
    
    Communication between the frontend and the backend services happens in two ways:
    
    \begin{itemize}
        \item via an XML-RPC API, for synchronous requests;
        \item via a database, for asynchronous requests.
    \end{itemize}
    
    
    Data transfers are provided by GridFTP, HTTP and XRootD services that
    directly access the file system underlying the StoRM deployment.
    
    StoRM is interfaced with the IBM Tivoli Storage Manager (TSM) via
    GEMSS~\cite{ref:gemss}, a component also developed at INFN, to provide optimized
    data archiving and tape recall functionality.

    The StoRM WebDAV service provides an alternative data management interface
    complementary to the SRM functionality, although it does not yet support tape
    operations.
    
    In the past years StoRM has powered the CNAF Tier 1 data center as well as
    dozens of other sites and proved to be a reliable SRM implementation. However,
    ten years of experience in developing and operating the service at scale have
    also revealed several limitations:
    
    
    \begin{itemize}
    
        \item The StoRM code base is not unit-tested; this means that there is no
          quick feedback loop to confirm that functionality is not broken when a
          change is introduced or a refactoring is implemented; there are
          integration and load test suites that can be used to assess that
          functionality is not broken, but these test suites are more complex to
          instantiate, require a full service deployment and do not provide
          coverage information.

        \item Data management responsibilities are scattered among several
          components without clear reasons, increasing maintenance and development
          costs.

        \item The StoRM backend cannot be horizontally replicated; this causes
          operational problems in production and limits scalability and the ability
          to adapt dynamically to load changes.

        \item Logging is not harmonized among the StoRM services and limited
          tracing is provided, so that it is not trivial to trace the history of an
          incoming request across the services.

        \item Core StoRM communication and authentication functionality relies on
          dated technologies and libraries (e.g., XML-RPC, CGSI-gSOAP).

        \item The code base is significantly more complex than needed, owing to
          inorganic growth and the lack of periodic quality assessment.

    \end{itemize}
    
    
    To address these shortcomings, a redesign of the StoRM service has been planned
    and started this year, in parallel with the main StoRM maintenance and
    development activities.
    
    \begin{figure}
        \centering
        \includegraphics[width=.6\textwidth]{storm-arch.png}
        \caption{\label{fig:storm-arch}The StoRM 1 architecture.}
    \end{figure}
    
    
    \section{StoRM 2 high-level architecture}
    
    
    The StoRM 2 architecture is depicted in Figure~\ref{fig:storm2-arch}.
    
    
    \begin{figure}
        \centering
        \includegraphics[width=.6\textwidth]{high-level-arch.png}
        \caption{\label{fig:storm2-arch}The StoRM 2 high-level architecture.}
    \end{figure}
    
    The layered architecture approach is maintained, so that service logic is again
    split between frontend and backend service components.
    
    The frontend responsibility is to implement the interfaces towards the outside
    world. In practice, the frontend is implemented by multiple microservices,
    each responsible for a specific interface (SRM, WebDAV, etc.).
    
    TLS termination and client authentication are implemented at the edge of the
    service perimeter by one or more Nginx reverse proxy instances. This approach
    has several advantages:
    
    \begin{itemize}
    
      \item The TLS handling load is decoupled from the request management load.

      \item VOMS-related configuration and handling are centralized in a single
        component, leading to simplified service operation and troubleshooting.

      \item The TLS terminator becomes a natural place to implement load balancing
        for the frontend services.
    
    \end{itemize}
    
    
    VOMS authorization support is provided by an Nginx VOMS
    module~\cite{ref:nginx-voms} developed for this purpose and described in more
    detail in another contribution to this report.
    
    Besides implementing the management protocol endpoints, the frontends expose other
    management and monitoring interfaces that can be consumed by internal services.
    They may also use a relational or in-memory database to persist state information
    in support of request management and accounting.
    
    
    Frontends do not directly interact with the storage, but delegate the
    interaction to a backend service.
    
    
    The backend is a stateless service that implements basic management operations on the
    storage. The storage management operations implemented are the minimum set of
    operations needed to support the data management interfaces exposed by the
    frontends. These operations are typically either data object lifecycle
    operations (e.g., create or remove a file or a directory, list directory contents) or
    metadata operations (e.g., get the size of a file, manage ACLs).
    
    The communication between the frontend and the backend services is implemented
    on top of gRPC~\cite{ref:grpc}, a remote procedure call system initially
    developed at Google. The actual messages exchanged between them are
    synthesized from a description expressed in an interface description language
    called \textit{Protocol Buffers}~\cite{ref:protocol-buffers}; from the same
    message description, language-specific client and server stubs are generated. As
    an example, the following listing shows the description of the messages and of
    the service involved in the simple case of the \textit{version} command.
    
    {\small
    \begin{verbatim}
    message VersionRequest {
      // The version of the client calling the service.
      string version = 1;
    }
    
    message VersionResponse {
      // The version of the service answering the call
      string version = 1;
    }
    
    service VersionService {
      rpc getVersion(VersionRequest) returns (VersionResponse);
    }
    \end{verbatim}
    }
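
    The same mechanism extends to the storage operations. As a purely
    illustrative sketch, a lifecycle operation and a metadata operation could be
    described as follows; all the message, field and service names in this
    listing are hypothetical and do not reflect the actual StoRM 2 interface
    definition.

    {\small
    \begin{verbatim}
    // Hypothetical sketch: all names are illustrative assumptions.
    message MkdirRequest {
      // Path of the directory to create.
      string path = 1;
    }

    message MkdirResponse {
    }

    message GetSizeRequest {
      // Path of the file whose size is requested.
      string path = 1;
    }

    message GetSizeResponse {
      // The size of the file, in bytes.
      uint64 size = 1;
    }

    service ExampleStorageService {
      rpc mkdir(MkdirRequest) returns (MkdirResponse);
      rpc getSize(GetSizeRequest) returns (GetSizeResponse);
    }
    \end{verbatim}
    }

    In such a design, failures (e.g., a missing parent directory) could be
    reported through the standard gRPC status codes rather than through dedicated
    response fields.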
    
    
    \section{Principles guiding the development work}
    
    
    The following principles have driven the StoRM 2 development work.
    
    
    \begin{itemize}
    
        \item The source code will be kept in a Git repository hosted on the INFN
          GitLab service; the development will follow a branching model inspired
          by git-flow~\cite{ref:gitflow} and already successfully used for other
          components developed by the team (e.g., VOMS, INDIGO IAM, StoRM).

        \item The code for all main components (frontend and backend services,
          CLIs, etc.) will be hosted in a single repository and a single version
          number will be shared by all the components.

        \item A test-driven development approach will be followed, using tools that
          measure the test coverage of the codebase. The objective is to
          ensure high coverage ($>90\%$) on all code.

        \item Whenever possible, the code should be self-documenting; the source
          code folder structure will be documented with README.md files describing
          the contents of each folder; a CHANGELOG file will provide
          information on new features and bug fixes following established
          industry best practices~\cite{ref:keep-a-changelog}.

        \item The development and testing environment will be containerized, in
          order to ensure a consistent environment definition and avoid
          \quotes{works on my machine} issues.

        \item Services should provide monitoring and metrics endpoints to enable the
          collection of status information and performance metrics.

        \item Services should support graceful shutdown and draining.

        \item A CI pipeline will be in place to continuously build and test the code.

        \item A consistent configuration and logging format will be adopted across
          all the components, to make service operations easier and simplify log
          file interpretation, aggregation and management.

        \item Support for request traceability will be part of the system from its
          inception.

    \end{itemize}
    
    
    The development of StoRM 2 will be organized in Scrum-like sprints, where each
    sprint will be roughly 4--5 weeks long.
    
    The output of each sprint should be a deployable instance of the services
    implementing a subset of the whole foreseen StoRM 2 functionality.
    
    
    \section{The build and test environment}
    
    The build environment heavily relies on container technology~\cite{ref:docker},
    both to guarantee full build and test reproducibility and to offer a common
    reference platform for development.
    
    Since the code for all components is kept in a single Git repository, we have
    also opted for a single Docker image to build everything, containing all the
    needed build tools (compilers, unit testing frameworks, static and dynamic
    analyzers, external dependencies, etc.). The resulting image is large but still
    manageable, and having one image simplifies operations.
    
    Two other Docker images are also used: one is a specialization of the
    build image mentioned above and is dedicated to the build of the Nginx VOMS
    module; the other is an image with client tools used during integration testing.
    
    All the image Dockerfiles are kept in a single repository, under continuous
    integration, so that every time there is a change the images are rebuilt.
    
    \section{The StoRM 2 frontend component}
    
    The StoRM 2 frontend is composed of a set of stateless Spring Boot 2
    applications written in Java that implement the management protocol endpoints,
    such as SRM~\cite{ref:srm} and WebDAV~\cite{ref:webdav}. The frontend services
    maintain state in an external database.
    
    The main frontend responsibilities are to:
    
    \begin{itemize}
    
    
        \item implement consistent authorization, taking as input the
          authentication information exposed by the Nginx TLS terminator and
          matching this information with a common authorization policy;
    
        \item implement request validation and management, i.e.,
          protocol-specific management of request queuing as well as conflict
          handling;
    
        \item translate protocol-specific requests to a set of basic storage
          management operations executed by the backend and exposed via a set of
          gRPC services;
    
        \item provide service management and observability endpoints, to allow
          administrators to get information about the requests currently being
          serviced by the system, drain the service or manually force request status
          transitions.
    
    
    \end{itemize}
    
    
    The first frontend service developed in StoRM 2 focuses on the SRM interface,
    and at the time of this writing implements support for the SRM \textit{ping} and
    \textit{ls} methods.
    
    In the initial development sprints, significant work has been devoted to ensuring
    the testability of the frontend component in isolation, by leveraging the
    testing support provided by the Spring~\cite{ref:spring} and gRPC
    frameworks.
    
    
    \section{The StoRM 2 backend component}
    
    The StoRM 2 backend is a gRPC server that provides multiple
    services. One service responds to \textit{version} requests. Another
    service responds to storage-related requests, which represent the main
    scope of StoRM. In general there is no direct, one-to-one mapping
    between SRM requests arriving at the frontend and requests addressed
    to the backend; rather, these represent building blocks that the
    frontend can compose in order to prepare the responses to SRM clients.
    
    Among the storage requests addressed to the backend, at the moment
    only a few are implemented: \textit{ls}, in its multiple variations
    (for a file or a directory, recursive, up to a given depth, etc.),
    returns information about files and directories; \textit{pin},
    \textit{unpin} and \textit{pin status} manage the
    \verb|user.storm.pinned| attribute of filesystem entities, which is
    essential for the implementation of the more complex
    \textit{srmPrepareToGet} SRM request.
    
    All the backend requests are currently blocking: a response is sent
    back to the frontend only when the request has been fully processed.
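
    Blocking requests of this kind map naturally onto unary gRPC calls, in which
    the client sends a single request and waits for a single response. A minimal
    sketch of how the pin operations could be expressed in this style is given
    here; the message, field and service names are assumptions made for
    illustration only and are not taken from the StoRM 2 code base.

    {\small
    \begin{verbatim}
    // Hypothetical sketch: all names are illustrative assumptions.
    message PinRequest {
      // Path of the filesystem entity to pin.
      string path = 1;
      // Requested pin duration, in seconds.
      uint64 duration = 2;
    }

    message PinResponse {
    }

    message PinStatusRequest {
      // Path of the filesystem entity to inspect.
      string path = 1;
    }

    message PinStatusResponse {
      // True if the entity is currently pinned.
      bool pinned = 1;
    }

    service ExamplePinService {
      rpc pin(PinRequest) returns (PinResponse);
      rpc pinStatus(PinStatusRequest) returns (PinStatusResponse);
    }
    \end{verbatim}
    }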
    
    The backend also incorporates sub-components of more general utility
    to operate on Filesystem Extended Attributes and POSIX Access Control
    Lists~\cite{p1003.1e}, adding a layer of safety and expressivity on
    top of the native C APIs. They make it possible to define attributes
    and ACLs, respectively, and to apply them to or read them from
    filesystem entities.
    
    For example, the following sets the attribute \verb|user.storm.pinned|
    of file \verb|myFile.txt| to the pin duration:
    
    {\small
    \begin{verbatim}
    set_xattr(
      storage_dir / "myFile.txt",
      StormXAttrName{"pinned"},
      XAttrValue{duration}
    );
    \end{verbatim}
    }
    
    The following instead extends the ACL currently assigned to
    \verb|myFile.txt| with some additional entries:
    
    {\small
    \begin{verbatim}
    add_to_access_acl(
      storage_dir / "myFile.txt",
      {
        {User{"storm"}, Perms::Read | Perms::Write},
        {Group{"storm"}, Perms::Read},
        {other, Perms::None}
      }
    );
    \end{verbatim}
    }
    
    The backend is implemented in C++, in the latest standard version
    supported by the toolset installed in the reference platform
    (currently C++17). The build system is based on CMake.
    
    The backend relies on some other third-party dependencies, the most
    important being for interaction with the filesystem (Boost
    Filesystem~\cite{ref:boost.fs}), for logging (Boost
    Log~\cite{ref:boost.log}) and for handling configuration
    (yaml-cpp~\cite{ref:yaml-cpp}).
    
    
    \section{Test suite and continuous integration}
    
    The test suite is based on the Robot Framework~\cite{ref:rf} and is typically
    run in a Docker container. A deployment test pipeline~\cite{ref:glcip} runs on
    our GitLab-based continuous integration (CI) system every night (and after any
    commit on the master branch) to instantiate the main StoRM 2 services and
    execute the SRM test suite. The reports of the test suite execution are archived
    and published on the GitLab CI dashboard. Services and the test suite are
    orchestrated using Docker Compose~\cite{ref:dc}. This approach provides an
    intuitive, self-contained testing environment deployable on the CI system and on
    the developers' workstations.
    
    
    The test deployment mirrors the architecture shown in
    Figure~\ref{fig:storm2-arch}, with clients and services placed in different
    Docker networks to mimic a real-life deployment scenario.
    
    
    \section{Conclusions and future work}
    
    
    In this contribution we have described the initial design and development
    activities performed during 2018 on StoRM 2, the next incarnation of the StoRM
    storage management system.
    
    The main objective of the StoRM refactoring is to improve the service
    scalability and manageability in order to meet the data management requirements
    of HL-LHC. The initial work of this year focused on choosing tools,
    methodologies and approach with a strong emphasis on software quality.
    In the future we will build on this groundwork to provide a full replacement
    for the existing StoRM implementation. The lack of dedicated manpower for this
    activity makes it hard to estimate when StoRM 2 will be ready to be deployed in
    production.
    
    
    \section*{References}
    
    \bibliographystyle{iopart-num}
    \bibliography{biblio}
    
    \end{document}