CNAF hosts the Italian Tier 1 data center for WLCG: over the years, the Tier 1 has become the main computing facility for INFN.
Nowadays, besides the four LHC experiments, the INFN Tier 1 provides services and resources to 30 other scientific collaborations,
including BELLE2 and several astro-particle experiments (see Table \ref{T1-pledge}).
As shown in Fig.~\ref{pledge2018}, besides the LHC experiments, the main users are the astro-particle experiments.
\begin{figure}[h]
...
...
Despite the flooding that occurred at the end of 2017, we were able to provide the resources committed to the experiments for 2018 almost on time.
...
...
\br
\end{tabular}
\end{center}
\caption{Pledged and installed resources at INFN Tier 1 in 2018 (for the CPU power an overlap factor is applied). CSN 1, CSN 2 and CSN 3 are the National Scientific Committees of INFN for, respectively, experiments in high energy physics with accelerators, astro-particle experiments and experiments in nuclear physics with accelerators.}
\label{T1-pledge}
\hfill
\end{table}
...
...
\subsection{Out of the mud}
The year 2018 began with the recovery of the data center after the flooding of November 2017.
Despite the serious damage to the power plants (both power lines were compromised), immediately after the flooding we started the recovery of both the infrastructure and the IT equipment. The first mandatory intervention was to restore at least one of the two power lines (initially relying on a leased UPS); this goal was achieved during December 2017.
In January, after the restart of the chillers, we could proceed to re-open all services, including part of the farm (at the beginning only $\sim$50 kHS06, about 1/5 of the total computing capacity, were online, while 13\% had been lost) and, one by one, the storage systems.
The first experiments to resume operations at CNAF were Alice, Virgo and Darkside:
in fact, the storage system used by Virgo and Darkside had been easily recovered after the Christmas break, while Alice is able to use computing resources relying on remote storage. During February and March, we were able to progressively re-open the services for all the other experiments.
%(Fig.\ref{farm2018} shows the restart of the farm).
Meanwhile, we had set up a new partition of the farm hosted at the CINECA super-computing center premises (see Par.~\ref{CINECAext}).
The final damage inventory shows the loss of $\sim$30 kHS06, 4 PB of data and 60 tapes; on the other hand, it was possible to repair all the other systems, recovering $\sim$20 PB of data. As for the infrastructure, the second power line was also recovered (see \cite{FLOODCHEP} for details).
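For reference, the figures quoted above are mutually consistent (a rough cross-check, assuming that both the 1/5 and the 13\% fractions refer to the farm capacity installed at the time of the flooding):
\[
5 \times 50~\mathrm{kHS06} \simeq 250~\mathrm{kHS06},
\qquad
0.13 \times 250~\mathrm{kHS06} \simeq 30~\mathrm{kHS06},
\]
in agreement with the $\sim$30 kHS06 reported as lost.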
%\begin{figure}[h]
% \begin{center}
...
...
The post-mortem analysis showed that the causes, besides the breaking of the pipe, are to be found in the unfavorable position of the data center (two underground levels) and in the excessive permeability of its perimeter (while the anti-flood doors worked). Therefore, an intervention has been carried out to improve the waterproofing of the data center and, moreover, further work is planned for summer 2019 to strengthen the perimeter of the building and to build a second water collection tank.
Even if the search for a new location for the data center had started before the flooding (the main driver being its limited expandability, unable to cope with the foreseen requirements of the HL-LHC era, when we will have to scale up to 10 MW of power for IT), the flooding gave us a second strong reason to move.
An opportunity is given by the new ECMWF center, which will be hosted in Bologna, in a new Technopole area, starting from 2019.
In the same area, the INFN Tier 1 and the CINECA\footnote{CINECA is the Italian supercomputing center, also located near Bologna ($\sim$17 km from CNAF). See \url{http://www.cineca.it/}.} computing centers can be hosted too: funding for this has been guaranteed to INFN and CINECA by the Italian Government. The goal is to have the new data center for the INFN Tier 1 fully operational by the end of 2021.
\section{INFN Tier 1 extension at CINECA}\label{CINECAext}
As mentioned in the previous paragraph, part of the farm is hosted at CINECA.
Out of the 400 kHS06 of CPU power (340 kHS06 pledged) of the CNAF farm, $\sim$180 kHS06 are provided by servers installed in the CINECA data center.
%Each server is equipped with a 10 Gbit uplink connection to the rack switch while each of them, in turn, is connected to the aggregation router with 4x40 Gbit links.
The logical network of the farm partition at CINECA is set up as an extension of the INFN Tier 1 LAN: a dedicated fiber pair interconnects the aggregation router at CINECA with the core switch at the INFN Tier 1 (see the Farm and Network chapters for more details). %Fig.~\ref{cineca-t1}).