@@ -53,13 +53,13 @@ However, the installed computing power was sufficient to meet the requirements o
\caption{AMD Naples (7351) vs Rome (7452) performance comparison}
\label{Rome}
\end{figure}
Just before the end of the year we were given the opportunity to test a sample of the new AMD EPYC CPU, codename ``Rome'', who is going to take over the current mainstream product, code-name ``Naples''. Performance have increased incredibly from previous generation, we may say that thy almost doubled.
Just before the end of the year we were given the opportunity to test a sample of the new AMD EPYC CPU, codename ``Rome'', which is going to replace the current mainstream product, codename ``Naples''. Performance has increased remarkably with respect to the previous generation: we may say that it has almost doubled.
\figurename~\ref{Rome} shows the performance of a single Rome CPU compared to a dual Naples system: the Rome result is only slightly better but, since we are comparing a single CPU with a pair, this is a remarkable leap in performance. As a consequence, in the next tender we may expect a significant decrease in power consumption, since a single modern CPU (whose TDP is generally approximately 150\,W) can perform like two CPUs of the previous generation.
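As a rough illustration of this estimate, let us assume (this is a simplification on our part, not a measurement) that a CPU of either generation has a TDP of about 150\,W, and that a single Rome CPU delivers the same throughput as a dual Naples system. The ratio of CPU power needed for the same throughput would then be
\[
\frac{P_{\mathrm{Rome}}}{P_{\mathrm{Naples}}} \simeq \frac{1 \times 150\,\mathrm{W}}{2 \times 150\,\mathrm{W}} = \frac{1}{2},
\]
that is, roughly 150\,W less CPU TDP per node. This sketch only accounts for the CPUs: memory, disks and cooling are not included, so the relative saving for a whole node would be smaller.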
\subsection{Activities on our farm}
Since we migrated all our farm to centos7 a few experiments asked the possibility to run sl6 containers, since some applications are not yet validated or generally available for centos7. To do so we initially provided docker and after a brief investigation, also singularity\cite{ref:singularity}(see next paragraph for details).
After we migrated our whole farm to CentOS 7, a few experiments asked for the possibility of running SL6 containers, since some applications are not yet validated or generally available for CentOS 7. To this end we initially provided Docker and, after a brief investigation, also Singularity~\cite{ref:singularity} (see next paragraph for details).
\\
Our nodes are easily overloaded by job activities due to the multi-job approach we have: every node can run jobs for multiple VOs at the same time, decreasing the possibility of resource sharing among jobs and in the end this creates resource outages: buffers easily get saturated. The size of these buffers are defaults for the centos7 and it is clear these values do not fit our needs. Storage group spotted a problem with network communication from the Worker Nodes to the Disk Servers and this was identified by several transmission and reception buffer overruns from network card. By issuing an \texttt{ifconfig} command, it is quite simple to detect the error:
Our nodes are easily overloaded by job activity because of our multi-job approach: every node can run jobs for multiple VOs at the same time, which decreases the possibility of resource sharing among jobs and eventually causes resource exhaustion, as buffers easily get saturated. In fact, the default size of these buffers on CentOS 7 does not fit our needs. A problem was spotted in the network communication from the Worker Nodes to the Disk Servers, and it was traced to several transmission and reception buffer overruns on the network card. The error is easily detected by issuing an \texttt{ifconfig} command:
\begin{verbatim}
eth0 Link encap:Ethernet HWaddr 08:00:27:31:65:b5
...
...
@@ -126,7 +126,7 @@ configured, to allow grid users access to
this new kind of resource and to verify the versatility of a
HTCondor-CE instance on a brand new use case. Starting from late June,
submission to the production cluster was enabled to local communities,
enabling them to adapt their computing model to the new
thus allowing them to adapt their computing model to the new
environment. During this time, work has been done to better assess and
refine the necessary management tools. These activities are detailed
in the following subsections.
...
...
@@ -145,7 +145,7 @@ grows. In the old LSF configuration, five CREAM-CEs have
proved able to sustain the job workflow of INFN-T1 (with appropriate tuning and day-to-day care), and
symmetrically we think five HTCondor-CEs should be enough once the
migration is completed. A further one is already defined and can
quickly be added if needed. For performance reasons, Providing
quickly be added if needed. For performance reasons, providing
HTCondor-CEs and Submit Nodes with a fast SSD disk is highly
recommended: for this reason two such machines are installed on
bare metal, while a solution to guarantee an adequate minimum disk IO
...
...
@@ -179,10 +179,10 @@ to the UIs.
\subsubsection{Cluster Management}
A major difference between LSF and HTCondor is how to
perform ordinary administrative tasks. When using LSF, editing a small
number of text files and reconfiguring or restarting a few services on
the master host is all it is needed to perform the vast majority of
the operations. Hosts in a HTCondor cluster are different since
perform ordinary administrative tasks.
With LSF, the vast majority of management operations only require
modifying a small number of text files or restarting some services on the master host.
Hosts in a HTCondor cluster are different, since
each one has its own set of configurations. Altering the behavior of
a set of machines requires modifying and reloading the configuration of
each involved host individually. The most widely adopted approach to
...
...
@@ -227,7 +227,7 @@ add a few more capabilities, these are being frequently used and prove
themselves as big time savers.
\subsubsection{Accounting}
The INFN Tier 1 has a custom accounting model for several years now,
The INFN Tier 1 has had a customized accounting model for several years,
and this has been adapted to seamlessly work with HTCondor, seen as
just another source of Usage Records to populate the existing PostgreSQL
database. Complete and flexible accounting has been set up by configuring