Commit d8fe7bd0 authored by Federico Fornari

Merge branch 'master' of https://baltig.infn.it/cnaf/annual-report/ar2018.git to add new commit

parents fb7adc8f f6778a4b
@@ -118,7 +118,7 @@ link_pdf padme 2019_PADMEcontribution.pdf
build_from_source sysinfo sysinfo.tex *.png
#link_pdf virgo VirgoComputing.pdf
#build_from_source tier1 tier1.tex
build_from_source tier1 tier1.tex *.png
#build_from_source flood theflood.tex *.png
build_from_source HTC_testbed HTC_testbed_AR2018.tex
build_from_source farming ARFarming2018.tex *.png *.jpg
@@ -183,7 +183,7 @@ Introducing the sixth annual report of CNAF...
\addcontentsline{toc}{part}{The Tier 1 and Data center}
\addtocontents{toc}{\protect\mbox{}\protect\hrulefill\par}
%\includepdf[pages=1, pagecommand={\thispagestyle{empty}}]{papers/datacenter.pdf}
%\ia{The INFN Tier 1 data center}{tier1}
\ia{The INFN Tier-1}{tier1}
\ia{The INFN-Tier1: the computing farm}{farming}
%\ia{Data management and storage systems}{storage}
%\ia{Evaluation of the ClusterStor G200 Storage System}{seagate}
@@ -54,15 +54,21 @@ General job execution trend can be seen in figure~\ref{farm-jobs}.
\end{figure}
\subsubsection{CINECA extension}
Thanks to an agreement between INFN and CINECA\cite{ref:cineca}, we were able to integrate a portion of the Marconi cluster (3 racks for a total of 216 servers, providing $\sim$180 kHS06) into our computing farm, reaching a total computing power of 400 kHS06 and almost doubling the power we provided last year. Each server is equipped with a 10 Gbit uplink to the rack switch, while each rack switch, in turn, is connected to the aggregation router with 4x40 Gbit links.
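As a purely back-of-the-envelope illustration (assuming the 216 servers are evenly distributed over the 3 racks, which is not stated above), the rack-level oversubscription towards the aggregation router can be estimated as in the following sketch:
\begin{verbatim}
# Illustrative sketch only, not part of the production setup: estimate the
# rack-to-aggregation oversubscription of the CINECA farm extension,
# assuming the 216 servers are evenly split across the 3 racks.
servers_total = 216
racks = 3
server_uplink_gbit = 10        # 10 Gbit from each server to the rack switch
rack_uplink_gbit = 4 * 40      # 4x40 Gbit from each rack switch to the router

servers_per_rack = servers_total // racks              # 72
ingress_gbit = servers_per_rack * server_uplink_gbit   # 720 Gbit
oversubscription = ingress_gbit / rack_uplink_gbit     # ~4.5

print(f"{servers_per_rack} servers/rack, {ingress_gbit} Gbit ingress, "
      f"{rack_uplink_gbit} Gbit uplink -> {oversubscription:.1f}:1")
\end{verbatim}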
Thanks to the proximity of CINECA, we set up a highly reliable fiber connection between the two computing centers, with very low latency (the RTT\footnote{The round-trip time (RTT) is the time it takes for a network request to go from a starting point to a destination and back again.} is 0.48 ms vs. 0.28 ms measured on the CNAF LAN), and could avoid setting up a storage cache on the CINECA side: all the remote nodes access the storage resources hosted at CNAF in exactly the same manner as the local nodes do. This greatly simplifies the setup and increases global farm reliability (see figure~\ref{cineca} for details on the setup).
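Purely as an indicative figure (the actual impact depends on the experiments' I/O patterns, not detailed here), the extra latency per synchronous storage operation is $\Delta_{\mathrm{RTT}} \simeq 0.48 - 0.28 = 0.20$ ms, so a hypothetical job issuing $10^{5}$ synchronous remote requests would accumulate only $10^{5} \times 0.2\ \mathrm{ms} = 20$ s of additional wall-clock time.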
\begin{figure}
\centering
\includegraphics[keepaspectratio,width=12cm]{cineca.png}
\caption{INFN-T1 farm extension to CINECA}
\label{cineca}
\end{figure}
These nodes have undergone several reconfigurations, due both to the hardware and to the type of workflow of the experiments. In April we had to upgrade the BIOS to overcome a bug which was preventing full resource usage, limiting what we were getting from the nodes to $\sim$78\% of the total.
Moreover, since the nodes at CINECA are set up with standard HDDs and so many cores are available per node, we hit a bottleneck on the local disks.
To mitigate this limitation, the local RAID configuration of the disks was changed\footnote{The initial choice of RAID-1 for the local disks instead of RAID-0 proved to slow down the system, even if safer from an operational point of view.} and the number of jobs per node (which generally equals the number of logical cores) was slightly reduced. It is important to notice that we did not hit this limit with the nodes of the latest tender, since they come with two enterprise-class SSDs.
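A minimal sketch of the rationale behind the RAID change, with purely hypothetical per-disk and per-node figures (the actual HDD performance and slot count are not quoted in this report), is the following:
\begin{verbatim}
# Illustrative sketch with hypothetical numbers: compare the aggregate local
# scratch write bandwidth per job slot for RAID-1 vs RAID-0 over the two HDDs
# of a worker node. Real figures depend on the actual disks and workload.
hdd_write_mb_s = 150   # hypothetical sequential write speed of a single HDD
job_slots = 64         # hypothetical number of logical cores / job slots

raid1_mb_s = hdd_write_mb_s        # RAID-1 mirrors: roughly one disk's bandwidth
raid0_mb_s = 2 * hdd_write_mb_s    # RAID-0 stripes over both disks

for name, bw in (("RAID-1", raid1_mb_s), ("RAID-0", raid0_mb_s)):
    print(f"{name}: ~{bw} MB/s aggregate, ~{bw / job_slots:.1f} MB/s per job slot")
\end{verbatim}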
During 2018 we also kept using the Bari ReCaS farm extension, with a reduced set of nodes providing approx. 10 kHS06; see the 2017 Annual Report for details on that setup.
\subsection{Hardware resources}
contributions/tier1/cpu2018.png (27.8 KiB)
contributions/tier1/disk2018.png (28.4 KiB)
contributions/tier1/tape2018.png (30.2 KiB)

\documentclass[a4paper]{jpconf}
\usepackage{graphicx}
\usepackage{url}
\usepackage{color, colortbl}
\definecolor{LightCyan}{rgb}{0.88,1,1}
\definecolor{LightYellow}{rgb}{1,1,0.88}
\definecolor{Red}{rgb}{1,0,0}
\definecolor{Green}{rgb}{0,1,0}
\definecolor{MediumSpringGreen}{rgb}{0,0.98,0.6} %rgb(0,250,154)
\definecolor{Gold}{rgb}{1,0.84,0}%rgb(255,215,0)
\definecolor{Gainsboro}{rgb}{0.86,0.86,0.86}%rgb(220,220,220)
\begin{document}
\title{The INFN Tier-1}
\author{Luca dell'Agnello}
\address{INFN-CNAF, Bologna, IT}
\ead{luca.dellagnello@cnaf.infn.it}
\section{Introduction}
CNAF hosts the Italian Tier-1 data center for WLCG: over the years, the Tier-1 has become the main computing facility of INFN.
Nowadays, besides the four LHC experiments, the INFN Tier-1 provides services and resources to 30 other scientific collaborations, including Belle II and several astro-particle experiments (Tab.~\ref{T1-pledge})\footnote{CSN 1, CSN 2 and CSN 3 are the National Scientific Committees of INFN for, respectively, experiments in high energy physics with accelerators, astro-particle experiments and experiments in nuclear physics with accelerators.}. As shown in Fig.~\ref{pledge2018}, besides LHC, the main users are the astro-particle experiments.
\begin{figure}[h]
\begin{center}
\begin{minipage}{35pc}
\includegraphics[width=15pc]{cpu2018.png}\hspace{2pc}%
% \caption{\label{cpu2018}xxx}
% \end{minipage}\hspace{2pc}%
% \begin{minipage}{30pc}
\includegraphics[width=15pc]{disk2018.png}\hspace{2pc}%
% \caption{\label{disk2018}xxx}
% \end{minipage}
\vspace{2pc}%
% \begin{minipage}{20pc}
\begin{center}
\includegraphics[width=15pc]{tape2018.png}\hspace{2pc}%
% \caption{\label{tape2018}xxx}
\caption{\label{pledge2018}Relative requests of resources at INFN Tier-1}
\end{center}
\end{minipage}\hspace{2pc}%
\end{center}
\end{figure}
Despite the flooding that occurred at the end of 2017, we were able to provide the resources committed to the experiments for 2018 almost on time.
\begin{table}
\begin{center}
\begin{tabular}{l|rrr}
\br
\textbf{Experiment}&\textbf{CPU (HS06)}&\textbf{Disk (TB-N)}&\textbf{Tape (TB)}\\
\hline
\rowcolor{MediumSpringGreen}
ALICE&52020&5185&13497\\
\rowcolor{MediumSpringGreen}
ATLAS&85410&6480&17550\\
\rowcolor{MediumSpringGreen}
CMS&72000&7200&24440\\
\rowcolor{MediumSpringGreen}
LHCb&46805&5606&11400\\
\rowcolor{MediumSpringGreen}
\hline
\textbf{LHC Total}&\textbf{256235}&\textbf{24471}&\textbf{66887}\\
\hline
\rowcolor{LightYellow}
Belle2&13000&350&0\\
\rowcolor{LightYellow}
CDF&0&0&4000\\
\rowcolor{LightYellow}
Compass&40&10&40\\
\rowcolor{LightYellow}
KLOE&0&33&3075\\
\rowcolor{LightYellow}
LHCf&6000&90&0\\
\rowcolor{LightYellow}
NA62&3000&250&200\\
\rowcolor{LightYellow}
PADME&1500&10&500\\
\rowcolor{LightYellow}
LHCb Tier2&26085&0&0\\
\rowcolor{LightYellow}
\hline
\rowcolor{LightYellow}
\textbf{CSN 1 Total}&\textbf{49625}&\textbf{743}&\textbf{7815}\\
\hline
\rowcolor{LightCyan}
AMS&15800&1990&510\\
\rowcolor{LightCyan}
ARGO&0&120&1000\\
\rowcolor{LightCyan}
Auger&2000&615&0\\
\rowcolor{LightCyan}
BOREX&2000&185&41\\
\rowcolor{LightCyan}
CTA&4000&796&120\\
\rowcolor{LightCyan}
CUORE&1900&262&0\\
\rowcolor{LightCyan}
Cupid&100&15&10\\
\rowcolor{LightCyan}
DAMPE&8000&200&100\\
\rowcolor{LightCyan}
DARKSIDE&2000&980&300\\
\rowcolor{LightCyan}
ENUBET&500&10&0\\
\rowcolor{LightCyan}
EUCLID&1000&1042&0\\
\rowcolor{LightCyan}
Fermi&500&15&40\\
\rowcolor{LightCyan}
Gerda&40&45&40\\
\rowcolor{LightCyan}
Icarus&4000&500&1500\\
\rowcolor{LightCyan}
JUNO&3000&230&0\\
\rowcolor{LightCyan}
KM3&300&250&200\\
\rowcolor{LightCyan}
LHAASO&300&60&0\\
\rowcolor{LightCyan}
LIMADOU&400&8&0\\
\rowcolor{LightCyan}
LSPE&1000&14&0\\
\rowcolor{LightCyan}
MAGIC&296&65&150\\
\rowcolor{LightCyan}
NEWS&200&60&60\\
\rowcolor{LightCyan}
Opera&200&15&15\\
\rowcolor{LightCyan}
PAMELA&650&100&150\\
\rowcolor{LightCyan}
Virgo&30000&656&1368\\
\rowcolor{LightCyan}
Xenon100&1000&200&1000\\
\rowcolor{LightCyan}
\hline
\rowcolor{LightCyan}
\textbf{CSN 2 Total}&\textbf{79186}&\textbf{8433}&\textbf{6604}\\
\hline
\rowcolor{Gainsboro}
FOOT&200&20&0\\
\rowcolor{Gainsboro}
Famu&2250&15&187\\
\rowcolor{Gainsboro}
GAMMA/AGATA&0&0&1160\\
\rowcolor{Gainsboro}
NEWCHIM/FARCOS&0&10&300\\
\rowcolor{Gainsboro}
\hline
\rowcolor{Gainsboro}
\textbf{CSN 3 Total}&\textbf{2450}&\textbf{45}&\textbf{1460}\\
\hline \hline
\rowcolor{Green}
\textbf{Grand Total}&\textbf{387496}&\textbf{33692}&\textbf{82766}\\
\rowcolor{Green}
\textbf{Installed}&\textbf{340000}&\textbf{34000}&\textbf{71000}\\
\br
\end{tabular}
\end{center}
\caption{Pledged and installed resources at INFN Tier-1 in 2018 (for the CPU power an overlap factor is applied)}
\label{T1-pledge}
\hfill
\end{table}
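As an indicative reading of the table (this is only our illustration of what the overlap factor amounts to, not an official figure), the factor implicitly applied to the CPU power is roughly
\[
f_{\mathrm{overlap}} \simeq \frac{387\,496\ \mathrm{HS06\ (pledged)}}{340\,000\ \mathrm{HS06\ (installed)}} \approx 1.14,
\]
i.e. the installed capacity relies on the fact that the experiments do not all use their full pledges at the same time.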
\subsection{Out of the mud}
The year 2018 began with the recovery procedures of the data center after the flooding of November 2017.
Despite the serious damage to the power systems (both power lines were compromised), immediately after the flooding we started the recovery of both the infrastructure and the IT equipment. The first mandatory intervention was to restore at least one of the two power lines (with a leased UPS in the first period); this goal was achieved during December 2017.
In January, once the chillers had also been restarted, we could proceed to re-open all services, including part of the farm (at the beginning only $\sim$50 kHS06, 1/5 of the total capacity, were online, while 13\% was lost) and, one by one, the storage systems.
The first experiments to resume operations at CNAF were Alice, Virgo and Darkside: in fact, the storage system used by Virgo and Darkside had been easily recovered after the Christmas break, while Alice is able to use computing resources relying on remote storage. During February and March we were able to progressively re-open the services for all the other experiments. %(Fig.\ref{farm2018} shows the restart of the farm). Meanwhile, we had setup a new partition of the farm hosted at CINECA super-computing center premises (see Par.~\ref{CINECAext}).
The final damage inventory shows the loss of $\sim$30 kHS06 of CPU power, 4 PB of data and 60 tapes; on the other hand, it was possible to repair all the other systems, recovering $\sim$20 PB of data, and, on the infrastructure side, the second power line was also restored (see \cite{FLOODCHEP} for details).
%\begin{figure}[h]
% \begin{center}
% \includegraphics[width=40pc]{t1-img/farm2018.png}\hspace{2pc}%
% \caption{\label{farm2018}Farm usage in 2018}
% \end{center}
%\end{figure}
\subsection{The long-term consequences of the flooding}
The data center was designed taking into account all foreseeable accidents (e.g. fires, power outages, ...), but not this one.
In fact, it was believed that the only threat from water could come from very heavy rain and, indeed, waterproof doors had been installed some years ago (after a heavy rain).
The post-mortem analysis showed that the causes, besides the breaking of the pipe, are to be found in the unfavorable position of the data center (2 underground levels) and in the excessive permeability of its perimeter (whereas the anti-flood doors worked). Therefore, an intervention has been carried out to improve the waterproofing of the data center and, moreover, work is planned for summer 2019 to strengthen the perimeter of the building and to build a second water collection tank.
Even if the search for a new location for the data center had started before the flooding (the main driver being its limited expandability, unable to cope with the foreseen requirements of the HL-LHC era, when we should scale up to 10 MW of power for IT), the flooding gave us a second strong reason to move.
An opportunity is given by the new ECMWF center, which will be hosted in Bologna, in the new Technopole area, starting from 2019. The INFN Tier-1 and the CINECA computing centers can be hosted in the same area: funding for this has been guaranteed to INFN and CINECA by the Italian Government. The goal is to have the new data center for the INFN Tier-1 fully operational by the end of 2021.
\section{INFN Tier-1 extension at CINECA}\label{CINECAext}
As mentioned above, part of the farm is hosted at CINECA\footnote{CINECA is the Italian supercomputing center, also located near Bologna ($\sim$17 km from CNAF). See \url{http://www.cineca.it/}}.
Out of the 400 kHS06 of CPU power (340 kHS06 pledged) of the CNAF farm, $\sim$180 kHS06 are provided by servers installed in the CINECA data center.
%Each server is equipped with a 10 Gbit uplink connection to the rack switch while each of them, in turn, is connected to the aggregation router with 4x40 Gbit links.
The logical network of the farm partition at CINECA is set up as an extension of the INFN Tier-1 LAN: a dedicated fiber pair interconnects the aggregation router at CINECA with the core switch at the INFN Tier-1 (see the Farming and Network chapters for more details). %Fig.~\ref{cineca-t1}).
%The transmission on the fiber is managed by a couple of Infinera DCI, allowing to have a logical channel up to 1.2 Tbps (currently it is configured to transmit up to 400 Gbps).
%\begin{figure}
% % \begin{minipage}[b]{0.45\textwidth}
% \begin{center}
% \includegraphics[width=30pc]{t1-img/cineca-t1.png}
% \caption{\label{cineca-t1}Schematic view of the CINECA - INFN Tier-1 interconnection}
% \end{center}
% % \end{minipage}
%\end{figure}
These nodes, in production since March 2018 for the WLCG experiments, have been gradually opened to all the other collaborations. %Due the low latency (the RTT is 0.48 ms vs. 0.28 ms measured on the CNAF LAN), there is no need of a disk cache on the CINECA side and the WNs directly access the storage located at CNAF; in fact, the
The efficiency of the jobs\footnote{The efficiency of a job is defined as the ratio between its CPU time and its wall-clock time.} is comparable to the one measured on the farm partition at CNAF.
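In formula (our restatement of the definition given in the footnote), for a job with CPU time $t_{\mathrm{CPU}}$ and wall-clock time $t_{\mathrm{wall}}$,
\[
\varepsilon = \frac{t_{\mathrm{CPU}}}{t_{\mathrm{wall}}},
\]
so that, for instance, a purely hypothetical single-core job using 9.0 hours of CPU time over 10.0 hours of wall-clock time has $\varepsilon = 0.90$.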
Since this partition has been installed from the beginning with CentOS 7, legacy applications requiring a different flavour of operating system can use it through the Singularity container technology~\cite{singularity}.
%Moreover, this partition has undergone several reconfigurations due to both the hardware and the type of workflow of the experiments. In April we had to upgrade the BIOS to overcome a bug which was preventing the full resource usage, limiting at $\sim$~78\% of the total what we were getting from the nodes. Moreover a reconfiguration of the local RAID configuration of disks is ongoing\footnote{The initial choice of using RAID-1 for local disks instead of RAID-0 has been proven to slow down the system even if safer from an operational point of view.} as well as tests to choose the best number of computing slots.
\section*{References}
\begin{thebibliography}{9}
\bibitem{FLOODCHEP} L. dell'Agnello, ``Disaster recovery of the INFN Tier-1 data center: lesson learned'', to be published in Proceedings of the 23rd International Conference on Computing in High Energy and Nuclear Physics (CHEP 2018), EPJ Web of Conferences
\bibitem{singularity} \url{http://singularity.lbl.gov}
\end{thebibliography}
\end{document}