Commit 87985b9b authored by Lucia Morganti

Update ARFarming2018.tex

parent 4759226b
Pipeline #22381 failed
@@ -48,21 +48,27 @@ General job execution trend can be seen in figure~\ref{farm-jobs}.
\begin{figure}
\centering
\includegraphics[keepaspectratio,width=15cm]{farm-jobs.png}
\includegraphics[keepaspectratio,width=15cm]{images/farm-jobs.png}
\caption{Farm job trend during 2018}
\label{farm-jobs}
\end{figure}
\subsubsection{CINECA extension}
Thanks to an agreement between INFN and CINECA\cite{ref:cineca}, we were able to integrate a portion of Marconi cluster into our computing farm (approx. computing power of 180K HS06), reaching the total computing power of 400.000 HS06, almost doubling the power we provided last year. Thanks to the proximity of CINECA we set up a highly reliable fiber connection between the computing centers, with a very low latency, and could avoid to cache storage: all the remote nodes access storage hosted at CNAF in the exact same manner as the local nodes do. This simplifies a lot the setup and increases global farm reliability (see figure~\ref{cineca} for details on setup).
Thanks to an agreement between INFN and CINECA\cite{ref:cineca}, we were able to integrate a portion of the Marconi cluster (3 racks, for a total of 216 servers providing $\sim$180 kHS06) into our computing farm, reaching a total computing power of 400 kHS06 and almost doubling the power we provided last year. Each server has a 10 Gbit uplink to its rack switch, and each rack switch, in turn, is connected to the aggregation router with 4x40 Gbit links.
Thanks to the proximity of CINECA, we set up a highly reliable, very low-latency fiber connection between the two computing centers (the RTT\footnote{Round-trip time (RTT) is the time it takes for a network request to travel from a starting point to a destination and back again to the starting point.} is 0.48 ms vs. 0.28 ms measured on the CNAF LAN), and could avoid setting up a storage cache on the CINECA side: all the remote nodes access storage resources hosted at CNAF in exactly the same manner as the local nodes do. This greatly simplifies the setup and increases overall farm reliability (see figure~\ref{cineca} for details on the setup).
\begin{figure}
\centering
\includegraphics[keepaspectratio,width=12cm]{cineca.png}
\caption{INFN-T1 farm extension to CINECA}
\label{cineca}
\centering
\includegraphics[keepaspectratio,width=12cm]{images/cineca.png}
\caption{INFN-T1 farm extension to CINECA}
\label{cineca}
\end{figure}
Nodes at CINECA are setup with standard HDDs and since so many cores are available per node, we hit a bottleneck, having to slightly reduce the amount of jobs per node, that generally equals the number of cores. It's important to notice that we did not reach this limit with the latest tender we purchased, since it comes with 2 enterprise class SSDs.
These nodes have undergone several reconfigurations, driven both by the hardware and by the type of workflow of the experiments. In April we had to upgrade the BIOS to overcome a bug that prevented full resource usage, limiting what we were getting from the nodes to $\sim$78\% of the total.
Moreover, since the nodes at CINECA are set up with standard HDDs and so many cores are available per node, we hit a bottleneck.
To mitigate this limitation, the local RAID configuration of the disks was changed\footnote{The initial choice of RAID-1 instead of RAID-0 for the local disks proved to slow down the system, even if it is safer from an operational point of view.} and the number of jobs per node (which generally equals the number of logical cores) was slightly reduced. It is worth noting that we did not reach this limit with the nodes from the latest tender we purchased, since they come with two enterprise-class SSDs.
During 2018 we also kept using the Bari ReCaS farm extension, with a reduced set of nodes providing approx. 10 kHS06. See the 2017 AR for details on the setup.
\subsection{Hardware resources}
@@ -79,13 +85,13 @@ Year 2018 has been terrible from a security point of view. Several critical vuln
\begin{figure}
\centering
\includegraphics[keepaspectratio,width=12cm]{meltdown.jpg}
\includegraphics[keepaspectratio,width=12cm]{images/meltdown.jpg}
\caption{Meltdown and Spectre comparison}
\label{meltdown}
\end{figure}
\begin{figure}
\centering
\includegraphics[keepaspectratio,width=12cm]{meltdown2.jpg}
\includegraphics[keepaspectratio,width=12cm]{images/meltdown2.jpg}
\caption{Meltdown attack description}
\label{meltdown2}
\end{figure}