    \documentclass[a4paper]{jpconf}
    
    \usepackage{url}
    \usepackage{graphicx}
    \usepackage{float}
    
    \newcommand{\quotes}[1]{``#1''}
    
    \begin{document}
    
    
    \title{StoRM 2: initial design and development activities}
    
    \author{
      A~Ceccanti,
      F~Giacomini,
      E~Vianello and
      E~Ronchieri
    }
    
    \address{INFN-CNAF, Bologna, IT}
    
    \ead{
      andrea.ceccanti@cnaf.infn.it
    }
    
    \begin{abstract}
      StoRM is the storage element solution that powers the CNAF Tier-1
      data center as well as more than 30 other sites.  Experience in
      developing, maintaining and operating it at scale suggests that a
      significant refactoring of the codebase is necessary to improve
      StoRM maintainability, reliability, scalability and ease of
      operation in order to meet the data management requirements coming
      from HL-LHC and other communities served by the CNAF Tier-1 data
      center.  In this contribution we highlight the initial StoRM 2
      design and development activities.
    \end{abstract}
    
    \section{Introduction}
    \label{sec:introduction}
    
    
    StoRM was first developed as a collaboration between INFN-CNAF, CERN and ICTP
    in Trieste to provide a lightweight storage element solution implementing the
    SRM~\cite{ref:srm} interface on top of a POSIX filesystem.
    StoRM has a layered architecture (Figure~\ref{fig:storm-arch}), split between
    two main components: the StoRM frontend and backend services. 
    The StoRM frontend service implements the SRM interface exposed to client
    applications and frameworks. 
    
    The StoRM backend service implements the actual storage management logic by
    interacting directly with the underlying file system. Communication between the
    frontend and the backend happens in two ways: 
    \begin{itemize}
        \item via an XML-RPC API, for synchronous requests;
        \item via a database, for asynchronous requests.
    \end{itemize}
    
    Data transfer is provided by GridFTP, HTTP and XRootD services that directly
    access the file system underlying the StoRM deployment.
    
    StoRM is interfaced with the IBM Tivoli Storage Manager (TSM) via
    GEMSS~\cite{ref:gemss}, a component also developed at INFN, to provide optimized
    data archiving and tape recall functionality. 
    
    The StoRM WebDAV service provides an alternative data management interface
    complementary to the SRM functionality, but which does not yet support tape
    operations. 
    
    In the past years, StoRM has powered the CNAF Tier-1 data center as well as
    dozens of other sites and proved to be a reliable SRM implementation.
    However, 10 years of experience in developing and operating the service at
    scale has also shown limitations:
    
    
    \begin{itemize}
    
        \item the StoRM code base is not unit-tested; as a consequence, whenever a
          change is introduced or a refactoring is implemented there is no quick
          feedback loop confirming that functionality has not been broken;
          integration and load test suites can be used for this purpose, but they
          are more complex to instantiate, require a full service deployment and
          do not provide coverage information;
    
        \item data management responsibilities are scattered among several
          components without clear reasons, increasing maintenance and development
          costs;
    
        \item the StoRM backend cannot be horizontally replicated; this causes
          operational problems in production and limits scalability and the ability
          to adapt dynamically to load changes;
    
        \item logging among the StoRM services is not harmonized, and limited
          tracing is provided, so that it is not trivial to trace the history of an
          incoming request across the services;
    
        \item core StoRM communication and authentication functionality relies on dated technologies and libraries 
            (e.g., XML-RPC, CGSI-gSOAP);
    
        \item the code base is significantly more complex than needed, due to inorganic growth and 
         the lack of periodic quality assessment of the code.
    
    \end{itemize}
    
    
    To address these shortcomings, a  redesign of the StoRM service has been planned
    and started this year in parallel with the main StoRM maintenance and
    development activities.
    
    \begin{figure}
        \centering
        \includegraphics[width=.6\textwidth]{storm-arch.png}
        \caption{\label{fig:storm-arch}The StoRM 1 architecture.}
    \end{figure}
    
    
    \section{StoRM 2 high-level architecture}
    
    
    The StoRM 2 architecture is depicted in Figure~\ref{fig:storm2-arch}.
    
    
    \begin{figure}
        \centering
        \includegraphics[width=.6\textwidth]{high-level-arch.png}
        \caption{\label{fig:storm2-arch}The StoRM 2 high-level architecture.}
    \end{figure}
    
    The layered architecture approach is maintained, so that service logic is again
    split between frontend and backend service components.
    
    The frontend responsibility is to implement the interfaces towards the outside
    world. In practice, the frontend is implemented by multiple microservices,
    each one responsible for a specific interface (SRM, WebDAV, etc.).
    
    TLS termination and client authentication are implemented at the edge of the service 
    perimeter by one (or more) Nginx reverse proxy instances.
    There are several advantages in this approach:
    
    \begin{itemize} 
    
      \item the TLS handling load is decoupled from request management load;
    
      \item VOMS-related configuration and handling are centralized in a single
        component, leading to simplified service operation and troubleshooting;
    
      \item the TLS terminator becomes a natural place to implement load balancing
        for the frontend services.
    
    \end{itemize}
    
    VOMS authorization support is provided by an Nginx VOMS
    module~\cite{ref:nginx-voms} developed for this purpose and described in more
    detail in another contribution in this report.
    
    Besides implementing the management protocol endpoints, the frontends expose other
    management and monitoring interfaces that can be consumed by internal services.
    They may use a relational or in-memory database to persist state information in
    support of request management and accounting.
    
    Frontends do not directly interact with the storage, but delegate the interaction to a backend service.
    
    The backend is a stateless service that implements basic management operations on the
    storage. The storage management operations implemented are the minimum set of
    operations needed to support the data management interfaces exposed by the
    frontends. These operations are typically either data object lifecycle
    operations (e.g., create or remove a file or a directory, list directory contents) or
    metadata operations (e.g., get the size of a file, manage ACLs).
    
    The communication between the frontend and the backend services is implemented
    on top of gRPC~\cite{ref:grpc}, a remote procedure call system initially
    developed at Google. The actual messages exchanged between them are
    synthesized from a description expressed in an interface description language
    called \textit{Protocol Buffers}~\cite{ref:protocol-buffers}; from the same
    message description, language-specific client and server stubs are generated. As
    an example, the following listing shows the description of the messages and of
    the service involved in the simple case of the \textit{version} command.
    
    {\small
    \begin{verbatim}
    message VersionRequest {
      // The version of the client calling the service. 
      string version = 1;
    }
    
    message VersionResponse {
      // The version of the service answering the call
      string version = 1;
    }
    
    service VersionService {
      rpc getVersion(VersionRequest) returns (VersionResponse);
    }
    \end{verbatim}
    }
    
    
    \section{Principles guiding the development work}
    
    
    The following principles have driven the StoRM 2 development work.
    
    
    \begin{itemize}
    
        \item the source code will be hosted in a Git repository on the INFN
          GitLab service, and the development will follow a branching model
          inspired by the Git-workflow~\cite{ref:gitflow}, used with success for
          other components developed by the team (e.g., VOMS, INDIGO IAM, StoRM);
          
        \item the code for all main components (frontend and backend services,
          CLIs, etc.) will be hosted in a single repository and a single version number
          will be shared by all the components;
    
        \item a test-driven development approach will be followed, using tools that
          measure the test coverage of the code base. The objective is to ensure high coverage
           ($>90\%$) on all code;
           
        \item whenever possible, the code should be self-documenting; the source code folder
          structure will be documented with README.md files providing a
          description of the contents of each folder; a CHANGELOG file will provide
          information on new features and bug fixes following established
          industry best practices~\cite{ref:keep-a-changelog};
    
        \item the development and testing environment will be containerized, in order to ensure
          a consistent environment definition and avoid \quotes{works on my machine} issues; 
          
        \item services should provide monitoring and metrics endpoints to enable the collection
        of status information and performance metrics;
        
        \item services should support graceful shutdown and draining;
    
        \item a CI pipeline will be in place, to continuously build and test the code;
    
        \item a consistent configuration and logging format will be adopted across all the components, to
        make service operations easier and simplify log file interpretation, aggregation and management;
    
        \item support for request traceability will be part of the system from its inception.
    
    \end{itemize}
    
    
    The development of StoRM 2 will be organized in Scrum-like sprints, each roughly 4--5 weeks long.
    The output of each sprint should be a deployable instance of the services implementing a subset of the whole
    foreseen StoRM 2 functionality.
    
    \section{The build and test environment}
    
    The build environment heavily relies on container technology~\cite{ref:docker}, both to guarantee full build and test reproducibility and to offer a common reference platform for development.
    
    Since the code for all components is kept in a single Git repository, we have also opted for a single Docker image to build everything, containing all the needed build tools (compilers, unit testing frameworks, static and dynamic analyzers, external dependencies, etc.). The resulting image is large but still manageable, and having a single image simplifies operations.
    
    There are also a couple of other Docker images: one is a specialization of the build image mentioned above and is dedicated to the build of the Nginx VOMS module; the other is an image with client tools used during integration testing.
    
    All the image Dockerfiles are kept in a single repository, under continuous integration, so that every time there is a change the images are rebuilt.
    
    \section{The StoRM 2 frontend component}
    
    The StoRM 2 frontend is composed of a set of stateless Spring Boot 2 applications, written in Java, that implement
    the management protocol endpoints, such as SRM~\cite{ref:srm} and WebDAV~\cite{ref:webdav}. 
    The frontend services maintain state in an external database.
    
    The main frontend responsibilities are to:
    \begin{itemize}
        \item implement consistent authorization, taking as input the authentication information exposed
        by the Nginx TLS terminator and matching this information with a common authorization policy;
        \item implement request validation and management, i.e., protocol-specific management of request 
        queuing as well as conflict handling;
        \item translate protocol-specific requests to a set of basic storage management operations executed
        by the backend and exposed via a set of gRPC services;
        \item provide service management and observability endpoints, to allow administrators to get information
        about the requests currently being serviced by the system, drain the service or manually force request status
        transitions.
    \end{itemize}
    
    The first frontend service developed in StoRM 2 focuses on the SRM interface, and at the time of this writing 
    implements support for the SRM \textit{ping} and \textit{ls} methods.
    
    In the initial development sprints, significant work has been devoted to ensuring the testability of the frontend component in isolation,
    by leveraging the testing support provided by the Spring~\cite{ref:spring} and gRPC frameworks.
    
    \section{The StoRM 2 backend component}
    
    The StoRM 2 backend is a gRPC server that provides multiple
    services. One service responds to \textit{version} requests. Another
    service responds to storage-related requests, which represent the main
    scope of StoRM. In general there is no direct, one-to-one mapping
    between SRM requests arriving at the frontend and requests addressed
    to the backend; rather, these represent building blocks that the
    frontend can compose in order to prepare the responses to SRM clients.
    
    Among the storage requests addressed to the backend, at the moment
    only a few are implemented: \textit{ls}, in its multiple variations
    (for a file or a directory, recursive, up to a given depth, etc.),
    returns information about files and directories; \textit{pin},
    \textit{unpin} and \textit{pin status} manage the
    \verb|user.storm.pinned| attribute of filesystem entities, which is
    essential for the implementation of the more complex
    \textit{srmPrepareToGet} SRM request.
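
    As an illustration of the depth-limited variant of \textit{ls}, the
    following is a minimal sketch written against the C++17 standard
    filesystem library rather than Boost Filesystem. The function name
    \verb|list_up_to_depth| and its signature are hypothetical and are not
    taken from the StoRM 2 code base.

    {\small
    \begin{verbatim}
    #include <filesystem>
    #include <vector>

    namespace fs = std::filesystem;

    // Hypothetical helper: collect the entries found under root,
    // descending at most max_depth levels (0 = direct children only).
    std::vector<fs::path> list_up_to_depth(fs::path const& root,
                                           int max_depth)
    {
      std::vector<fs::path> result;
      for (auto it = fs::recursive_directory_iterator(root);
           it != fs::recursive_directory_iterator(); ++it) {
        result.push_back(it->path());
        // depth() is 0 for the direct children of root
        if (it->is_directory() && it.depth() >= max_depth) {
          it.disable_recursion_pending();
        }
      }
      return result;
    }
    \end{verbatim}
    }

    With \verb|max_depth| set to 0 only the direct children of the directory
    are returned, which corresponds to a non-recursive listing.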
    
    All the backend requests are currently blocking: a response is sent
    back to the frontend only when the request has been fully processed.
    
    The backend also incorporates sub-components of more general utility
    to operate on Filesystem Extended Attributes and POSIX Access Control
    Lists~\cite{p1003.1e}, adding a layer of safety and expressivity on
    top of the native C APIs. They allow attributes and ACLs, respectively,
    to be defined and then applied to or read from filesystem entities.
    
    For example, the following call sets the attribute \verb|user.storm.pinned|
    of file \verb|myFile.txt| to the pin duration:
    
    {\small
    \begin{verbatim}
    set_xattr(
      storage_dir / "myFile.txt",
      StormXAttrName{"pinned"},
      XAttrValue{duration}
    );
    \end{verbatim}
    }
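
    One way to obtain this kind of safety over the native C API is to give
    attribute names and values their own types. The following is a minimal
    sketch of that idea, not the actual StoRM 2 implementation: only the type
    names \verb|StormXAttrName| and \verb|XAttrValue| come from the call above,
    while their internal layout, the member functions \verb|full_name| and
    \verb|str|, and the conversion of a duration to seconds are assumptions
    made for illustration.

    {\small
    \begin{verbatim}
    #include <chrono>
    #include <string>
    #include <utility>

    // Assumed layout: a name in the "user.storm." namespace.
    class StormXAttrName
    {
      std::string m_name;
     public:
      explicit StormXAttrName(std::string const& name)
          : m_name("user.storm." + name) {}
      // Full name, as it would be passed to the native API.
      std::string const& full_name() const { return m_name; }
    };

    // Assumed layout: an attribute value stored as a string.
    class XAttrValue
    {
      std::string m_value;
     public:
      explicit XAttrValue(std::string value)
          : m_value(std::move(value)) {}
      // Hypothetical: store a duration as a number of seconds.
      explicit XAttrValue(std::chrono::seconds d)
          : m_value(std::to_string(d.count())) {}
      std::string const& str() const { return m_value; }
    };
    \end{verbatim}
    }

    Since the constructors are explicit, a plain string cannot be passed by
    mistake where an attribute name or value is expected.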
    
    The following instead extends the ACL currently assigned to
    \verb|myFile.txt| with some additional entries:
    
    {\small
    \begin{verbatim}
    add_to_access_acl(
      storage_dir / "myFile.txt",
      {
        {User{"storm"}, Perms::Read | Perms::Write},
        {Group{"storm"}, Perms::Read},
        {other, Perms::None}
      }
    );
    \end{verbatim}
    }
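
    Combining permissions with the \verb+|+ operator, as in the call above,
    requires that operator to be defined for the permission type. A common C++
    technique, sketched below, is a scoped enumeration with overloaded bitwise
    operators. The enumerator values, the \verb|Execute| enumerator and the
    helpers \verb|has| and \verb|to_string| are assumptions made for
    illustration, not the actual StoRM 2 definitions.

    {\small
    \begin{verbatim}
    #include <string>

    // Assumed values, following the usual rwx bit layout.
    enum class Perms : unsigned {
      None = 0, Execute = 1, Write = 2, Read = 4
    };

    constexpr Perms operator|(Perms a, Perms b)
    {
      return static_cast<Perms>(static_cast<unsigned>(a) |
                                static_cast<unsigned>(b));
    }

    // True if any bit of p is also set in set.
    constexpr bool has(Perms set, Perms p)
    {
      return (static_cast<unsigned>(set) &
              static_cast<unsigned>(p)) != 0;
    }

    // Hypothetical helper: render permissions in rwx notation.
    inline std::string to_string(Perms p)
    {
      std::string s = "---";
      if (has(p, Perms::Read)) s[0] = 'r';
      if (has(p, Perms::Write)) s[1] = 'w';
      if (has(p, Perms::Execute)) s[2] = 'x';
      return s;
    }
    \end{verbatim}
    }

    A scoped enumeration does not convert implicitly to an integer, so
    permissions cannot be mixed up with unrelated numeric arguments.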
    
    The backend is implemented in C++, in the latest standard version
    supported by the toolset installed in the reference platform
    (currently C++17). The build system is based on CMake.
    
    The backend relies on some other third-party dependencies, the most
    important being for interaction with the filesystem (Boost
    Filesystem~\cite{ref:boost.fs}), for logging (Boost
    Log~\cite{ref:boost.log}) and for handling configuration
    (yaml-cpp~\cite{ref:yaml-cpp}).
    
    
    \section{Test suite and continuous integration}
    
    The test suite is based on the Robot Framework~\cite{ref:rf} and is typically
    run in a Docker container.  A deployment test pipeline~\cite{ref:glcip} runs on
    our GitLab-based continuous integration (CI) system every night (and after any
    commit on the master branch) to instantiate the main StoRM 2 services and
    execute the SRM test suite. The reports of the test suite execution are archived
    and published on the GitLab CI dashboard. Services and the test suite are
    orchestrated using Docker Compose~\cite{ref:dc}. This approach provides an
    intuitive, self-contained testing environment deployable on the CI system and
    on the developers' workstations.
    
    The test deployment mirrors the architecture shown in
    Figure~\ref{fig:storm2-arch}, with clients and services placed in different
    Docker networks to mimic a real-life deployment scenario.
    
    
    \section{Conclusions and future work}
    
    
    In this contribution we have described the initial design and development
    activities performed during 2018 on StoRM 2, the next incarnation of the StoRM
    storage management system. 
    
    The main objective of the StoRM refactoring is to improve the service
    scalability and manageability in order to meet the data management requirements
    of HL-LHC. The initial work of this year focused on choosing tools,
    methodologies and approaches, with a strong emphasis on software quality. 
    
    In the future we will build on this groundwork to provide a full replacement
    for the existing StoRM implementation. The lack of dedicated manpower for this
    activity makes it hard to estimate when StoRM 2 will be ready to be deployed in
    production.
    
    
    \section*{References}
    
    \bibliographystyle{iopart-num}
    \bibliography{biblio}
    
    \end{document}