    \documentclass[a4paper]{jpconf}
    \usepackage{graphicx}
    \begin{document}
    
    \title{Cloud@CNAF Management and Evolution}
    
    \author{C. Duma$^1$, A. Costantini$^1$, D. Michelotto$^1$ and D. Salomoni$^1$}
    
    \address{$^1$INFN Division CNAF, Bologna, Italy}
    
    \ead{ds@cnaf.infn.it}
    
    \begin{abstract}
    
    
Cloud@CNAF is the cloud infrastructure hosted at CNAF, based on open source solutions and aiming
to serve the different use cases present at the center. The infrastructure is the result of
the collaboration of a transversal group of people from all CNAF
functional units: networking, storage, farming, national services, distributed systems.
If 2016 was, for the Cloud@CNAF IaaS (Infrastructure as a Service) based on OpenStack,
a period of consolidation and improvement, 2017 was a year of consolidation and
operation that ended with an extreme event: the flooding of the data center, caused by the
breaking of an aqueduct pipe located in the street near CNAF. This event caused the
shutdown of the entire data center, including the Cloud@CNAF infrastructure. This paper
presents the activities carried out throughout 2018 to ensure the functioning
of the center's cloud infrastructure, which was migrated from CNAF to INFN-Ferrara:
from the re-design of the entire infrastructure, needed to cope with the limited availability of
space and weight imposed by the new location, to the physical migration of the
racks and the remote management and operation of the infrastructure, in order to continue
to provide high-quality services for our users and communities.
    
    \end{abstract}
    
    \section{Introduction}
    
The main goal of the Cloud@CNAF project \cite{catc} is to provide a production-quality
cloud infrastructure for CNAF internal activities as well as for the national and
international projects hosted at CNAF:
\begin{itemize}
  \item Internal activities
    \begin{itemize}
    \item Provisioning of VMs for CNAF departments and staff members
    \item Tutorials and courses
    \end{itemize}
  \item National and international projects
    \begin{itemize}
      \item Providing VMs for experiments hosted at CNAF, like CMS, ATLAS, EEE and FAZIA
      \item Testbeds for testing the services developed by projects like INDIGO-DataCloud, eXtreme-DataCloud and DEEP-HybridDataCloud
    \end{itemize}
\end{itemize}
    
The infrastructure made available is based on OpenStack \cite{openstack}, version Mitaka, with all the
services deployed in a High-Availability (HA) setup or in a
clustered manner (e.g. for the databases used). During 2016 the infrastructure was
enhanced by adding new compute and network resources, and its operation was improved and guaranteed by
adding monitoring, improving the support and automating the maintenance activities.

Thanks to these enhancements, Cloud@CNAF was able to offer highly reliable services to the users and communities who rely on the infrastructure.
    
At the end of 2017, early in the morning of November 9th, an aqueduct pipe located in the street near CNAF broke, as documented in Ref. \cite{flood}.
As a result, a river of water and mud flowed towards the Tier1 data center. The level of the water did not exceed the
safety threshold of the waterproof doors but, due to the porosity of the external walls and the floor, the water found its way
into the data center. Both electric lines failed at about 7:10 AM CET. Access to the data center was possible only
in the afternoon, after all the water had been pumped out.
As a result, the entire Tier1 data center went down, including the Cloud@CNAF infrastructure.
    
    \section{The resource migration}
    
Some weeks after the flooding, it was decided to move the Cloud@CNAF core services to a different location
in order to recover the services provided to communities and experiments.
Thanks to a strong relationship, both University of Parma/INFN-Parma and INFN-Ferrara offered to host our
core machinery and related services.
Due to the geographical proximity and the presence of a GARR Point of Presence (PoP), the
Cloud@CNAF core machinery was moved to the INFN-Ferrara location.

Unfortunately, we were not able to move all the Cloud@CNAF resources, due to the limited power and weight availability in the new location.
For this reason, a re-design of the infrastructure was needed.
As a first step, the services and the related machinery to be moved to the new, temporary location were selected so as to
fit the maximum power consumption and weight estimated for each of the two rooms devoted to hosting Cloud@CNAF services (see Table \ref{table:1} for details).
    
    \begin{table} [ht]
    \centering
    \begin{tabular}{ l|c|c|c||c||c| } 
    \cline{2-6}
     & \multicolumn{3}{c||}{Room1}  &  Room2 & Tot \\
    \cline{2-5}
     & Rack1 & Rack2 & Tot &  Rack3 &  \\
    \hline
Power consumption (kW) & 8.88 &  4.91 & 13.79 (15) & 5.8 (7) & 19.59\\ 
Weight (kg) & 201 & 151 & 352 (400 kg/m$^2$) & 92 (400 kg/m$^2$) & 444 \\ 
Occupancy (U) & 9 & 12 & 21 & 10 & 31 \\ 
\hline
\end{tabular}
\caption{Power consumption, weight and occupancy for each rack. In brackets, the maximum value allowed for the room.}
    \label{table:1}
    \end{table}
    
\section{Re-design of the new infrastructure}
    
Due to the limitations described in Table \ref{table:1}, only three racks were used to host the Cloud@CNAF core services.
Of these three racks, the first hosts the storage resources, the second hosts the OpenStack controller, the network
services and the GPFS cluster. The third hosts the oVirt and OpenStack compute nodes, together with
some other ancillary services (see Table \ref{table:2} for details).

Rack1 and Rack2 were connected at 2x40 Gbps through our Brocade VDX switches, while Rack1 and Rack3 were connected
at 2x10 Gbps through PowerConnect switches.
    
    \begin{table} [ht]
    \centering
    \begin{tabular}{ c|l|l|l| } 
    \cline{2-4}
    & \multicolumn{1}{|c|}{Rack1} & \multicolumn{1}{|c|}{Rack2} & \multicolumn{1}{|c|}{Rack3}\\
    \hline
    & VDX & VDX &  PowerConnect x2 \\ 
Resources & EqualLogic & Cloud controllers &  oVirt nodes\\ 
    and & Powervault & Cloud networks & Compute nodes\\
    Services & & Gridstore & DBs nodes\\
    & &  Other services & Cloud UI\\
    \hline
    \end{tabular}
    \caption{List of resources and services hosted per Rack}
    \label{table:2}
    \end{table}
    
    
Moreover, Rack1 is connected to the GARR PoP with a 1x1 Gbps fiber connection to guarantee external connectivity.
A complete overview of the new infrastructure and of the location of the related resources is shown in Figure \ref{new_c_at_c}.
As shown in Figure \ref{new_c_at_c}, and taking into account the limitations described in Table \ref{table:1}, the power consumption
was limited to 13.79 kW in Room1 (limit 15 kW) and to 5.8 kW in Room2 (limit 7 kW).
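
As a quick consistency check on the figures in Table \ref{table:1}, the power budget of the two rooms can be written out explicitly. This is only a worked sum of the values reported in the table; the symbols $P_1$, $P_2$ and $P_{\mathrm{tot}}$ are introduced here for illustration and denote the power drawn in Room1, in Room2 and in total:
\begin{eqnarray*}
P_1 &=& 8.88 + 4.91 = 13.79~\mathrm{kW} \le 15~\mathrm{kW}, \\
P_2 &=& 5.8~\mathrm{kW} \le 7~\mathrm{kW}, \\
P_{\mathrm{tot}} &=& P_1 + P_2 = 19.59~\mathrm{kW},
\end{eqnarray*}
which leaves a margin of about 1.2 kW in each room.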
    
The whole migration process (from the design to the reconfiguration of the new infrastructure) took just one business week,
after which the Cloud@CNAF infrastructure and related services were up and running, able to serve again the different projects and communities.

Thanks to the experience and documentation gathered, in June 2018, after the Tier1 returned to production status,
Cloud@CNAF was migrated back in less than three business days.
    
    \section{Cloud@CNAF evolution}
    
    
Building on the activity carried out in 2016 to improve the infrastructure \cite{catc}, in
2018 (after the core infrastructure services, relocated because of the flooding, had returned to CNAF)
the increase of the computing resources, in terms of both quality and quantity, continued in order to enhance the
services and the performance offered to users.

Thanks to this activity, during the last year Cloud@CNAF saw a growth in the number of users and use cases
implemented in the infrastructure. In particular, the number of projects increased to 87, using approximately
1035 virtual CPUs and 1.766 TB of RAM, with a total of 267 virtual machines (see Figure \ref{catc_monitor} for more details).
    
Among others, some of the projects that used the cloud infrastructure are:
\begin{itemize}
\item HARMONY - a proof-of-concept under the TTLab coordination, a project aimed at finding effective medicines against neoplasms in hematology,
\item EEE - Extreme Energy Events - Science inside Schools, a special research activity about the origin of cosmic rays carried out with the essential contribution of high school students and teachers,
\item CHNET-DHLab - the cultural heritage network of INFN, for the development of virtual laboratory services,
\item USER Support - for the development of the experiments dashboard and the hosting of its production instance, displayed on the monitor in the CNAF hallway,
\item EOSC-hub DODAS - Thematic service for the Elastic Extension of Computing Centre batch resources on external clouds,
\item Services devoted to EU projects like DEEP-HDC \cite{deep}, XDC \cite{xdc} and EOSC-pilot \cite{pilot}.
\end{itemize}
    
    \section{Conclusions and future work}
    
Due to damage to an aqueduct pipe located in the street near CNAF, a river of water and mud flowed towards the Tier1 data center, causing the
shutdown of the entire data center. As a consequence, the services and related resources hosted by Cloud@CNAF went down.

To cope with this problem, it was decided to temporarily migrate the core resources and services of Cloud@CNAF to INFN-Ferrara.
To do this, a complete re-design of the entire infrastructure was needed to tackle the limitations in terms of power consumption and
weight imposed by the new location.

The joint effort and expertise of all the CNAF people and the INFN-Ferrara colleagues made it possible to re-design, migrate and make operational
the Cloud@CNAF infrastructure and the related hosted services in less than a business week.

Thanks to the experience and the documentation gathered, in June 2018, after the Tier1 returned to production status, Cloud@CNAF
was migrated back in less than three business days.

Even with the problems described above, the Cloud@CNAF infrastructure was maintained and evolved, allowing
the users to carry on their activities and obtain their desired results.
For the next year new and challenging activities are planned, in particular the migration to the OpenStack Rocky version and the deployment of a new architecture distributed between
different functional units, Data Center and SDDS.
    
\begin{figure}[h]
\centering
\includegraphics[width=15cm,clip]{infn-fe23.png}
\caption{The new architecture of Cloud@CNAF, developed to cope with the limitations at INFN-Ferrara.}
\label{new_c_at_c}
\end{figure}
    
    
    \begin{figure}[h]
    \centering
    
    \includegraphics[width=12cm,clip]{catc_monitoring.png}
    
    \caption{Cloud@CNAF monitoring and status}
    \label{catc_monitor}
    \end{figure}
    
    
    \section{References} 
    
    \begin{thebibliography}{}
    
    \bibitem{catc}
    Cloud@CNAF - maintenance and operation, C. Duma, R. Bucchi, A. Costantini, D. Michelotto, M. Panella, D. Salomoni and G. Zizzi, CNAF Annual Report 2016, https://www.cnaf.infn.it/Annual-Report/annual-report-2016.pdf
    \bibitem{openstack}
    Web site: https://www.openstack.org/
    \bibitem{flood}
    The flood, L. dell’Agnello, CNAF Annual Report 2017, https://www.cnaf.infn.it/wp-content/uploads/2018/09/cnaf-annual-report-2017.pdf
    
    \bibitem{deep}
    Web site: https://deep-hybrid-datacloud.eu/
    \bibitem{xdc}
    Web site: www.extreme-datacloud.eu
    
    \bibitem{pilot}
    Web site: https://eoscpilot.eu
    
    
    \end{document}