diff --git a/contributions/ds_cloud_c/Artifact/ds_cloud_c.pdf b/contributions/ds_cloud_c/Artifact/ds_cloud_c.pdf
index a9758247ffd2521d01eea7081baea34d77590a73..2f1d57b6e24ea2c7b6860bbcbbeeef2c9fd35c86 100644
Binary files a/contributions/ds_cloud_c/Artifact/ds_cloud_c.pdf and b/contributions/ds_cloud_c/Artifact/ds_cloud_c.pdf differ
diff --git a/contributions/ds_cloud_c/catc_monitoring.png b/contributions/ds_cloud_c/catc_monitoring.png
new file mode 100644
index 0000000000000000000000000000000000000000..d945a1547757e3396ab7587dcacb5e95076c5c6f
Binary files /dev/null and b/contributions/ds_cloud_c/catc_monitoring.png differ
diff --git a/contributions/ds_cloud_c/ds_cloud_c.tex b/contributions/ds_cloud_c/ds_cloud_c.tex
index 69fd16b742d600f06bdf6ac44789778695db6a70..07acad7474d19551a25ce06ca38052f8dbc2dfa3 100644
--- a/contributions/ds_cloud_c/ds_cloud_c.tex
+++ b/contributions/ds_cloud_c/ds_cloud_c.tex
@@ -11,7 +11,7 @@
 \begin{abstract}
-Cloud@CNAF is the cloud IaaS hosted at CNAF, based on open source solutions aiming
+Cloud@CNAF is the cloud infrastructure hosted at CNAF, based on open source solutions aiming
 to serve different use cases present here. The infrastructure is the result of the
 collaboration of a transversal group of people from all CNAF functional units: networking,
 storage, farming, national services, distributed systems.
@@ -22,7 +22,7 @@
 aqueduct pipe located in the street nearby CNAF went broke. This event caused
 down of the entire DataCenter, including the Cloud@CNAF infrastructure.This paper
 presents the activities carried out throughout 2018 to ensure the functioning of the
 center cloud infrastructure, that saw the its migration from CNAF to INFN-Ferrara,
-starting to the re-design of the entire to coupe with the limited availability of
+starting from the re-design of the entire infrastructure to cope with the limited availability of
 space and weigth imposed by the new location, to the physical migration of the racks
 and remote management and operation of infrastructure in order to continue to
 provide high-quality services for our users and communities.
@@ -56,48 +56,75 @@
 Thanks to this enhancement, Cloud@CNAF was able to offer high reliable services
 At the end of 2017, on November 9th early at morning, an aqueduct pipe located inthe street nearby CNAF, broke as documented in Ref. \cite{flood}. As a result, a river
 of water and mud flowed towards the Tier1 data center. The level of the water did not exceeded the
-threshold of safety of thewaterproof doors but, due to the porosity of the external walls and the floor, it couldfind a way
+threshold of safety of the waterproof doors but, due to the porosity of the external walls and the floor, it could find a way
 into the data center. Both electric lines failed at about 7.10AM CET.
 Access to the data center was possibile only in the afternoon, after all the water had been pumped out.
 As a result, the entire Tier1 data center went down, included the Cloud@CNAF infrastructure.
 
-\section{The migration to INFN-Ferrara}
+\section{The resource migration}
 
 Some weeks after the flooding, we decided to move the Cloud@CNAF core services in a different location in order to recover the services we pvovided for community and experiements.
 Thanks to a strong relationship, both University of Parma/INFN-Parma and INFN-Ferrara proposed to host our core machinery and related services.
-Due to the geograpical proximity and the presence of POP GARR, we decided to move the Cloud@CNAF core machineries to the INFN-Ferrara location.
+Due to the geographical proximity and the presence of the GARR Point of Presence (PoP), we decided to move the Cloud@CNAF core machinery to the INFN-Ferrara location.
 
-Unfortunately, INFN-Ferrara was not able to host all the Cloud@CNAF resources due to a limited power availability.
+Unfortunately, INFN-Ferrara was not able to host all the Cloud@CNAF resources due to limited power availability and weight constraints.
 For such reason, we decided to carry on an important activity aimed at re-designing the new infrastructure.
 In order to do that, we selected the services and the related machinery to move to the new - temporary - location to fit the maximum power consumption
 and weight estimated for each of the two rooms devoted to host our services (see Table \ref{table:1} for details).
 
 \section{Re-design the new infrastructure}
-Due to the limitations decribed in Table\ref{table:1} we were push to re-desig the Cloud@CNAF infrastructure by using (only) three racks in order to host our core services (see Table \ref{table:1} for the list os services).
-Among this three racks, the first hosted the storage resources, the second hosted the sotage, Openstack controller and network services, together
-with the GPFS cluster and other services. The third rack hosted Ovirt and Openstack nodes and some other services.
-Rack1 and 2 have been coonected by 2x40Gbps through our VDX and Rack 1 and 3 have been connectd by 2x10Gbps
-Moreover, Rack1 is connected to POP GARR with 1x1Gbps fiber connection.
+Due to the limitations described in Table \ref{table:1}, we were pushed to re-design the Cloud@CNAF infrastructure using only three racks to host the Cloud@CNAF core services (see Table \ref{table:1} for the list of services).
+Among these three racks, the first one hosted the storage resources, the second hosted the OpenStack controller and network services, together
+with the GPFS cluster and other services. The third rack hosted the oVirt and OpenStack compute nodes and some other ancillary services.
+Racks 1 and 2 have been connected at 2x40 Gbps through our Brocade VDX switches, and Racks 1 and 3 have been connected at 2x10 Gbps through PowerConnect switches.
+
+Moreover, Rack 1 is connected to the GARR PoP with a 1x1 Gbps fiber connection.
 A complete overview of the new infrastrucure and related resource location is shown in Figure \ref{new_c_at_c}.
 As depicted by Figure \ref{new_c_at_c} and taking into account the limitations described in Table \ref{table:1}, we were able to limit the power conumption up to 13,79kW in respect to Room1 (limit 15kW) and up to 5.8kW (limit 7kW) in respect to Room2.
 
-The whole migration process (from the design to the reconfiguration of the new infrastructure) took almost a business week and after that the Cloud@CNAF and related services
-where up and running able to serve again different projects and communities.
+The whole migration process (from the design to the reconfiguration of the new infrastructure) took almost a business week, and after that the Cloud@CNAF infrastructure and related services
+were up and running, able to serve different projects and communities again.
 
-\section{Conclusions}
-Due to a damage in the aqueduct pipe located inthe street nearby CNAF, a river of water and mud flowed towards the Tier1 data center causing the
-shutdown of the entire data center. For such reason, the services and related resources hosted by Cloud@CNAF went down.
-To cope with this problem, we decided to temporary migrate che core resources and services of Clud@CNAF to INFN-Ferrara and to do this a complete re-design of the entire
-infrastructure was needed to tackle the limitations in terms of power consumption and weight imposed by the new location.
-Due to the joint effort of all the CNAF people and the INFN-Ferrara colleagues we were able to re-design, migrate and make operational the new Cloud@CNAF infrastructure and related hosted services
-in less than a business week. Thanks to the experience and the documentation provided, in June 2018 - after the Tier1 returned in its production status, Cloud@CNAF has been migrated back in less than three business days.
+\section{Cloud@CNAF evolution}
+
+Starting from the activity carried out in 2016 related to the improvements made at the infrastructure level \cite{catc}, in 2018 (after the return of the core infrastructure services following the flooding)
+the growth of the computing resources, in terms of both quality and quantity, continued in order to enhance both the services and the performance offered to users.
+
+Thanks to such activity, during the last year Cloud@CNAF saw a growth in the number of users and use cases implemented in the infrastructure; in particular,
+the number of projects increased to 87, corresponding to a total consumption of 1035 virtual CPUs and 1766 GB of RAM, with a total of 267 virtual machines (see Figure \ref{catc_monitor} for more details).
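+
+As an illustrative sketch only (it is not part of the monitoring stack shown in Figure \ref{catc_monitor}),
+comparable aggregate figures can be retrieved directly from the OpenStack APIs, for example with the
+openstacksdk Python client; the cloud name used below is a placeholder and administrative credentials are assumed:
+
+\begin{verbatim}
+# Illustrative sketch: aggregate usage figures from the OpenStack APIs.
+# Assumes admin credentials and a "cloud-cnaf" entry in clouds.yaml
+# (placeholder name, not the real configuration).
+import openstack
+
+conn = openstack.connect(cloud="cloud-cnaf")
+
+projects = list(conn.identity.projects())
+servers = list(conn.compute.servers(all_projects=True))
+
+# Hypervisor statistics report the vCPUs and RAM (in MB) allocated
+# on the compute nodes.
+vcpus_used = 0
+ram_used_mb = 0
+for hv in conn.compute.hypervisors(details=True):
+    vcpus_used += hv.vcpus_used or 0
+    ram_used_mb += hv.memory_used or 0
+
+print("projects:", len(projects))
+print("virtual machines:", len(servers))
+print("vCPUs in use:", vcpus_used)
+print("RAM in use (GB):", round(ram_used_mb / 1024))
+\end{verbatim}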
+
+Among others, some of the projects that used the cloud infrastructure are:
+\begin{itemize}
+\item HARMONY - under the TTLab coordination, a project aimed at finding resourceful medicines offensive against neoplasms in hematology,
+\item EEE - Extreme Energy Events - Science inside Schools, a special research activity on the origin of cosmic rays carried out with the essential contribution of students and teachers of high schools,
+\item USER Support - for the development of the experiments dashboard and the hosting of its production instance, displayed on the monitor in the CNAF hallway,
+\item DODAS - for the elastic extension of computing centre batch resources on external clouds,
+\item Services devoted to EU projects such as DEEP-HDC \cite{deep}, XDC \cite{xdc} and many more.
+\end{itemize}
+
+
 
-\begin{table} [h]
+\section{Conclusions and future work}
+Due to damage to the aqueduct pipe located in the street nearby CNAF, a river of water and mud flowed towards the Tier1 data center, causing the
+shutdown of the entire data center. For this reason, the services and related resources hosted by Cloud@CNAF went down.
+To cope with this problem, we decided to temporarily migrate the core resources and services of Cloud@CNAF to INFN-Ferrara and,
+to do this, a complete re-design of the entire infrastructure was needed to tackle the limitations in terms of power consumption and
+weight imposed by the new location.
+Thanks to the joint effort of all the CNAF people and the INFN-Ferrara colleagues, we were able to re-design, migrate and make operational
+the new Cloud@CNAF infrastructure and related hosted services in less than a business week.
+Thanks to the experience gained and the documentation produced, in June 2018 - after the Tier1 returned to its production status - Cloud@CNAF
+was migrated back in less than three business days.
+Even taking into account the problems described above, we were able to maintain and evolve the Cloud@CNAF infrastructure, giving existing
+and new users the possibility to continue their activities and achieve their results.
+In the next year, new and challenging activities are planned, in particular the migration to the OpenStack Rocky release.
+
+
+\begin{table} [ht]
 \centering
 \begin{tabular}{ |c|c|c|c|c|c|c| }
 \hline
@@ -113,7 +140,7 @@
 Occupancy (U) & 9 & 12 & 10 & 21 & 10 & 31 \\
 \hline
 \end{tabular}
 
 \end{table}
 
-\begin{table}
+\begin{table} [ht]
 \centering
 \begin{tabular}{ |c|c|c|c| }
 \hline
@@ -137,6 +164,13 @@
 Powervault & Cloud networks & Compute nodes \\
 
 \label{new_c_at_c}
 \end{figure}
 
+\begin{figure}[h]
+\centering
+\includegraphics[width=15cm,clip]{catc_monitoring.png}
+\caption{Cloud@CNAF monitoring and status}
+\label{catc_monitor}
+\end{figure}
+
 \section{References}
 \begin{thebibliography}{}
 
@@ -147,10 +181,15 @@
 Cloud@CNAF - maintenance and operation, C. Duma, R. Bucchi, A. Costantini, D. Mi
 
 Web site: https://www.openstack.org/
 \bibitem{flood}
 The flood, L. dell’Agnello, CNAF Annual Report 2017, https://www.cnaf.infn.it/wp-content/uploads/2018/09/cnaf-annual-report-2017.pdf
+\bibitem{deep}
+Web site: https://deep-hybrid-datacloud.eu/
+\bibitem{xdc}
+Web site: https://www.extreme-datacloud.eu/
 
-\end{thebibliography}
+\end{thebibliography}
+
 \end{document}