
Compare revisions

Changes are shown as if the source revision was being merged into the target revision.

Showing 545 additions and 58 deletions
contributions/ds_infn_cc/infncc-net.png

198 KiB

contributions/ds_infn_cc/infncc-services.png

128 KiB

File added
\documentclass[a4paper]{jpconf}
\usepackage{graphicx}
\usepackage{tikz}
\usepackage{hyperref}
\usepackage{eurosym}
%%%%%%%%%% Start TeXmacs macros
\newcommand{\tmem}[1]{{\em #1\/}}
\newcommand{\tmop}[1]{\ensuremath{\operatorname{#1}}}
\newcommand{\tmtextit}[1]{{\itshape{#1}}}
\newcommand{\tmtt}[1]{\texttt{#1}}
%%%%%%%%%% End TeXmacs macros
\begin{document}
\title{The INFN-Tier1: the computing farm}
\author{Andrea Chierici$^1$, Stefano Dal Pra$^1$, Diego Michelotto$^1$}
\address{$^1$ INFN-CNAF, Bologna, IT}
\ead{andrea.chierici@cnaf.infn.it, stefano.dalpra@cnaf.infn.it, diego.michelotto@cnaf.infn.it}
%\begin{abstract}
%\end{abstract}
\section{Introduction}
The farming group is responsible for the management of the computing resources of the centre.
This implies the deployment of installation and configuration services, monitoring facilities and the fair distribution of the resources to the experiments that have agreed to run at CNAF.
%\begin{figure}
%\centering
%\includegraphics[keepaspectratio,width=10cm]{ge_arch.pdf}
%\caption{Grid Engine instance at INFN-T1}
%\label{ge_arch}
%\end{figure}
\section{Farming status update}
During 2018 the group was reorganized: Antonio Falabella left the group and Diego Michelotto took over from him. The turnover was quite smooth, since Diego was already familiar with many of the procedures adopted in the farming group as well as with the collaborative tools used internally.
\subsection{Computing}
It is well known that in November 2017 we suffered a flood in our data center, so a large part of 2018 was dedicated to restoring the facility,
trying to understand how much of the computing power was damaged and how much was recoverable.
We were quite lucky with the blade servers (2015 tender), while for the 2016 tender most of the nodes that we initially thought were reusable eventually failed and could not be recovered. We were able to salvage working parts from the broken servers (such as RAM, CPUs and disks) and with those we assembled some nodes to be used as service nodes: the parts were carefully tested by a system integrator that guaranteed the stability and reliability of the resulting platform.
As a result of the flood, approximately 24 kHS06 were lost.
In spring we finally installed the new tender, composed of AMD EPYC nodes providing more than 42 kHS06, each with 256 GB of RAM, 2 x 1 TB SSDs and a 10 Gbit Ethernet connection. This is the first time we have adopted a 10 Gbit connection for WNs, and we expect it to become a baseline requirement from now on: modern CPUs provide many cores, enabling us to pack more jobs into a single node, where a 1 Gbit link can be a significant bottleneck. The same applies to HDDs vs. SSDs: we believe that modern computing nodes can deliver 100\% of their capabilities only with SSDs.
The general job execution trend can be seen in Figure~\ref{farm-jobs}.
\begin{figure}
\centering
\includegraphics[keepaspectratio,width=15cm]{farm-jobs.png}
\caption{Farm job trend during 2018}
\label{farm-jobs}
\end{figure}
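As a rough, purely illustrative estimate (the job count per node and the link speeds below are assumptions, not measurements of our nodes), the following sketch shows how little network bandwidth is left to each job when a many-core node is fully packed and only has a 1 Gbit link:
\begin{verbatim}
# Back-of-the-envelope estimate of per-job network bandwidth on a worker node.
# The job count and link speeds are illustrative assumptions, not measured values.

JOBS_PER_NODE = 64            # e.g. one job per logical core on a modern CPU
LINK_SPEEDS_GBIT = [1, 10]    # old vs. new WN network interface

for speed in LINK_SPEEDS_GBIT:
    per_job_mb_s = speed * 1000 / 8 / JOBS_PER_NODE  # Gbit/s -> MB/s per job
    print(f"{speed:>2} Gbit link: ~{per_job_mb_s:.1f} MB/s per job")

# Expected output:
#  1 Gbit link: ~2.0 MB/s per job
# 10 Gbit link: ~19.5 MB/s per job
\end{verbatim}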
\subsubsection{CINECA extension}
Thanks to an agreement between INFN and CINECA\cite{ref:cineca}, we were able to integrate a portion of the Marconi cluster (3 racks, for a total of 216 servers providing $\sim$180 kHS06) into our computing farm, reaching a total computing power of 400 kHS06 and almost doubling the power we provided last year. Each server is equipped with a 10 Gbit uplink to its rack switch, and each rack switch is in turn connected to the aggregation router with 4 x 40 Gbit links.
Thanks to the proximity of CINECA, we set up a highly reliable fiber connection between the two computing centers, with very low latency
(the RTT\footnote{Round-trip time (RTT) is the time it takes for a network request to go from a starting point to a destination and back again
to the starting point.} is 0.48 ms vs. 0.28 ms measured on the CNAF LAN), and could avoid setting up a storage cache on the CINECA side:
all the remote nodes access the storage resources hosted at CNAF in exactly the same manner as the local nodes do.
This greatly simplifies the setup and increases the overall farm reliability (see Figure~\ref{cineca} for details on the setup).
\begin{figure}
\centering
\includegraphics[keepaspectratio,width=12cm]{cineca.png}
\caption{INFN-T1 farm extension to CINECA}
\label{cineca}
\end{figure}
These nodes have undergone several reconfigurations, due both to the hardware and to the type of workflow of the experiments.
In April we had to upgrade the BIOS to overcome a bug that was preventing full resource usage,
limiting what we were getting from the nodes to $\sim$78\% of the total.
Moreover, since the nodes at CINECA are set up with standard HDDs and so many cores are available per node, we hit an I/O bottleneck.
To mitigate this limitation, the local RAID configuration of the disks was
changed\footnote{The initial choice of RAID-1 for the local disks instead of RAID-0 proved to slow down the system, even if it was safer from an operational point of view.} and the number of jobs per node (which generally equals the number of logical cores) was slightly reduced. It is worth noting that we did not hit this limit with the latest tender we purchased, since it comes with two enterprise-class SSDs.
During 2018 we also kept using the Bari ReCaS farm extension,
with a reduced set of nodes providing approximately 10 kHS06\cite{ref:ar17farming}.
\subsection{Hardware resources}
The hardware resources of the farming group are quite new, and no refresh was foreseen during 2018. The main concern is the two different virtualization infrastructures, which only required a warranty renewal. Thanks to the parts recovered from the flood-damaged nodes, we were able to acquire a 2U, 4-node enclosure to be used as the main resource provider for the forthcoming HTCondor instance.
\subsection{Software updates}
During 2018 we completed the migration from SL6 to CentOS7 on all the farming nodes. The configurations are stored in our provisioning system:
for the WNs the migration process was rather simple, while for CEs and UIs we took extra care and proceeded one at a time in order to guarantee continuity
of service. The same configurations were used to upgrade LHCb-T2 and INFN-BO-T3, with minimal modifications.
All the modules produced for our site can easily be exported to other sites willing to perform the same update.
As already said, the update involved all the services with only a few exceptions: the CMS experiment uses PhEDEx\cite{ref:phedex}, a data placement and file transfer system that is incompatible with CentOS7. Since this system will be phased out in mid 2019, we agreed with the experiment not to perform any update. The same applies to a few legacy UIs and to some services for the CDF experiment that are involved in an LTDP (Long Term Data Preservation) project (more details in next year's report).
In any case, if an experiment needs a legacy OS such as SL6, we provide on all the Worker Nodes a container solution based on the Singularity\cite{ref:singu} software.
Singularity enables users to have full control of their environment through containers: it can be used to package entire scientific workflows, software and libraries, and even data. This spares T1 users from asking the farming sysadmins to install software, since everything can be put in a container and run. Users control the extent to which a container interacts with its host: there can be seamless integration, or little to no communication at all.
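As a minimal sketch of how a legacy payload can be executed through Singularity from a CentOS7 Worker Node, the snippet below wraps a \texttt{singularity exec} call; the image path and bind-mounted directories are illustrative assumptions and do not reflect the actual site configuration:
\begin{verbatim}
# Hedged sketch: run a legacy SL6 payload inside a Singularity container.
# The image location and bind-mounted directories are hypothetical examples.
import subprocess

SL6_IMAGE = "/cvmfs/some-repo/sl6-base.img"   # assumed container image path
BIND_DIRS = ["/cvmfs", "/storage"]            # assumed host directories the job needs

def run_in_sl6(command):
    """Execute 'command' (a list of strings) inside the SL6 container."""
    singularity_cmd = ["singularity", "exec"]
    for d in BIND_DIRS:
        singularity_cmd += ["--bind", d]
    singularity_cmd += [SL6_IMAGE] + command
    return subprocess.run(singularity_cmd, check=True)

if __name__ == "__main__":
    # Example: check the OS release seen by the payload.
    run_in_sl6(["cat", "/etc/redhat-release"])
\end{verbatim}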
2018 was a difficult year from a security point of view.
Several critical vulnerabilities were discovered, affecting data-center CPUs and major software stacks:
the most significant were Meltdown and Spectre~\cite{ref:meltdown} (see Figures~\ref{meltdown} and~\ref{meltdown2}).
These discoveries required us to intervene promptly in order to mitigate and/or correct the vulnerabilities,
applying software updates (mostly Linux kernel and firmware updates) that in most cases required rebooting the whole farm.
This has a significant impact in terms of resource availability, but it is mandatory in order to prevent security incidents and the possible disclosure of sensitive data.
Thanks to our internally developed dynamic update procedure, patch application is smooth and almost automatic, saving the farm staff a considerable amount of time.
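The internal procedure itself is not reproduced here; what follows is only a hedged sketch of the general drain-update-reboot pattern it implements, with placeholder helper functions standing in for the batch-system and provisioning commands used in production:
\begin{verbatim}
# Sketch of the rolling drain/update/reboot pattern used to patch worker nodes.
# The helper functions are placeholders: on the real farm they would wrap
# batch-system (drain/resume) and provisioning-system (update/reboot) commands.

def drain(node):             print(f"draining {node}")
def wait_until_empty(node):  print(f"waiting for jobs on {node} to finish")
def update_and_reboot(node): print(f"updating kernel/firmware and rebooting {node}")
def resume(node):            print(f"putting {node} back in production")

def rolling_update(worker_nodes, batch_size=2):
    """Patch the farm a few nodes at a time so most capacity stays online."""
    for i in range(0, len(worker_nodes), batch_size):
        batch = worker_nodes[i:i + batch_size]
        for node in batch:
            drain(node)                 # stop accepting new jobs
        for node in batch:
            wait_until_empty(node)      # let running jobs finish
            update_and_reboot(node)     # apply kernel/firmware updates
            resume(node)                # put the node back in production

rolling_update(["wn-001", "wn-002", "wn-003", "wn-004"])
\end{verbatim}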
\begin{figure}
\centering
\includegraphics[width=0.5\textwidth]{meltdown.jpg}
%\includegraphics[keepaspectratio,width=12cm]{meltdown.jpg}
\caption{Meltdown and Spectre comparison}
\label{meltdown}
\end{figure}
\begin{figure}
\centering
%\includegraphics[keepaspectratio,width=12cm]{meltdown2.jpg}
\includegraphics[width=0.5\textwidth]{meltdown2.jpg}
\caption{Meltdown attack description}
\label{meltdown2}
\end{figure}
\subsection{HTCondor update}
INFN-T1 decided to migrate from LSF to HTCondor for several reasons.
The main one is that HTCondor has proved to be extremely scalable and able to withstand the challenges that High Luminosity LHC will pose
to our research community in the near future. Moreover, many of the other LHC Tier 1 sites have announced the transition to HTCondor or have already completed it.
Finally, our current batch system, LSF, is no longer under warranty, since INFN decided not to renew the contract with IBM
(the provider of this software, now re-branded ``Spectrum LSF''), in order to save money and pursue the alternative offered by HTCondor.
\section{DataBase service: Highly available PostgreSQL}
In 2013 INFN-T1 switched to a custom solution for the job accounting
system~\cite{DGAS} based on a PostgreSQL backend. The database was
made more robust over time, introducing redundancy, reliable hardware
and storage. This architecture was powerful enough to also host other
database schemas, or even independent instances, meeting the
requirements of user communities (CUORE, CUPID) for their computing
models. A MySQL-based solution is also in place, to accommodate the needs of
the AUGER experiment.
\subsection{Hardware setup}
A High Availability PostgreSQL instance has been deployed on two
identical SuperMicro hosts, ``dbfarm-1'' and ``dbfarm-2'', each one equipped as
follows:
\begin{itemize}
\item Intel(R) Xeon(R) CPU E5-2603 v2 @ 1.80 GHz,
\item 32 GB of RAM,
\item two Fiber Channel controllers,
\item a Storage Area Network volume of 2 TB,
\item two redundant power supplies.
\end{itemize}
The path to the SAN storage is also fully redundant, since each Fiber Channel
controller is connected to two independent SAN switches.
One node also hosts two 1.8 TB HDDs (configured with software RAID-1) to serve as a service storage
area for supplementary database backups and other maintenance tasks.
\subsection{Software setup}
PostgreSQL 11.1 has been installed on the two hosts: dbfarm-1
has been set up to work as the master, while dbfarm-2 works as a hot standby
replica. With this configuration, the master is the main database,
while the replica can be accessed in read-only mode. This instance is
used to host the accounting database of the farming group, the hardware inventory of
the Tier 1 center (DOCET~\cite{DOCET}) and a database used by the CUPID
experiment. The content of the latter database is updated directly by
authorized users of the experiment, while jobs running on our worker
nodes can access its data from the standby replica.
A second independent instance has also been installed on dbfarm-2
working as a hot standby replica of a remote Master instance managed
by the CUORE collaboration and located at INFN-LNGS. The continuous
synchronization with the master database happens through a VPN channel.
Local read access from our Worker Nodes to this
instance can be quite intense: the standby server has
sustained up to 500 concurrent connections without any evident problem.
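As an illustration of how the master/standby split is exploited, the following sketch (host names, table schema and credentials are placeholders, not the production ones) sends read-only queries to the standby while writes go to the master, using the standard psycopg2 driver:
\begin{verbatim}
# Hedged sketch: read-only queries on the hot standby, writes on the master.
# Host names, database name, table schema and credentials are placeholders.
import psycopg2

MASTER_DSN  = "host=dbfarm-1 dbname=accounting user=acct_writer"
STANDBY_DSN = "host=dbfarm-2 dbname=accounting user=acct_reader"

def insert_job_record(record):
    """Write a new accounting record on the master (hypothetical 'jobs' table)."""
    with psycopg2.connect(MASTER_DSN) as conn, conn.cursor() as cur:
        cur.execute(
            "INSERT INTO jobs (job_id, queue, wall_time) VALUES (%s, %s, %s)",
            (record["job_id"], record["queue"], record["wall_time"]),
        )

def jobs_per_queue():
    """Read-only aggregation, served by the standby replica."""
    with psycopg2.connect(STANDBY_DSN) as conn, conn.cursor() as cur:
        cur.execute("SELECT queue, count(*) FROM jobs GROUP BY queue")
        return cur.fetchall()
\end{verbatim}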
\subsection{Distributed MySQL service}
A different solution has been in place for the AUGER experiment for several
years now, and it was recently redesigned when moving our Worker
Nodes to CentOS7. Several AUGER jobs need
concurrent read-only access to a MySQL (actually MariaDB, with CentOS7
and later) database. A single server instance cannot sustain the
overall load generated by the clients. For this reason, we have
configured a reasonable subset of Worker Nodes (two racks) to host a
local binary copy of the AUGER database. The ``master'' copy of this database
is available from a dedicated User Interface and
users can update its content when they need to.
The copy on the Worker Nodes can be updated every few months, upon
request from the experiment. To do so, we must, in order (a hedged orchestration sketch follows the list):
\begin{itemize}
\item drain any running job accessing the database
\item shutdown every MariaDB instance
\item update the binary copy using rsync
\item restart the database
\item re-enable normal AUGER activity
\end{itemize}
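A sketch of how these steps could be orchestrated over the configured racks is shown below; node names, paths and the drain mechanism are illustrative assumptions and do not reproduce the actual site scripts:
\begin{verbatim}
# Sketch of the AUGER database refresh over the configured worker nodes.
# Node names, paths and the drain mechanism are illustrative assumptions.
import subprocess

WORKER_NODES = ["wn-r1-%02d" % i for i in range(1, 4)]  # hypothetical node names
MASTER_COPY = "auger-ui:/data/auger-db/"                # hypothetical master copy on the UI
LOCAL_COPY = "/var/lib/mysql-auger/"                    # hypothetical local path on each WN

def ssh(node, command):
    """Run a shell command on a remote node (assumes passwordless ssh)."""
    subprocess.run(["ssh", node, command], check=True)

def drain_node(node):
    """Placeholder: close the node in the batch system and wait for AUGER jobs to end."""
    print(f"draining {node} (batch-system specific command goes here)")

def refresh_auger_db():
    for node in WORKER_NODES:
        drain_node(node)                            # 1. drain jobs using the database
    for node in WORKER_NODES:
        ssh(node, "systemctl stop mariadb")         # 2. shut down the local MariaDB instance
        subprocess.run(["rsync", "-a", "--delete",  # 3. update the binary copy
                        MASTER_COPY, f"{node}:{LOCAL_COPY}"], check=True)
        ssh(node, "systemctl start mariadb")        # 4. restart the database
    print("re-enable normal AUGER activity")        # 5. reopen the nodes in the batch system
\end{verbatim}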
\section{Helix Nebula Science Cloud}
During the first part of 2018, the farming group was directly involved in the pilot phase of the Helix Nebula Science Cloud project~\cite{ref:hnsc}, whose aim was to allow research institutes like INFN to test commercial clouds against HEP use cases, identifying strengths and weak points.
The pilot phase has seen some very intense interaction between the public procurers and both commercial and public service providers.
\subsection{Pilot Phase}
The pilot phase of the HNSciCloud PCP is the final step in the implementation of the hybrid cloud platform proposed by the selected contractors. During the period from January to June 2018, the technical activities of the project focused on
the scalability of the platforms and on the training of the new users that will access the pilot at the end of this phase.
Farming members guided the contractors throughout the first part of the pilot phase,
testing the scalability of the proposed platforms, organizing the events hosted by the procurers and assessing, together with the other partners of the project, the deliverables produced by the contractors.
\subsection{Conclusions of the Pilot Phase}
Improvements to the platforms were implemented during this phase and, even though
some R\&D activities still had to be completed, the overall evaluation of the first part of the pilot phase is positive.
In particular, the Buyers Group reiterated the need for a fully functioning cloud storage service and highlighted the commercial advantage such a transparent data service represents for the Contractors. Coupled with a flexible voucher scheme, such an offering will encourage a greater uptake within the Buyers Group and the wider public research sector. The demand for GPUs, even if not originally considered critical during the design phase, has grown and has highlighted a weak point in the current offer.
\section{References}
\begin{thebibliography}{9}
\bibitem{ref:cineca} Cineca webpage: https://www.cineca.it/
\bibitem{ref:ar17farming} Chierici A. et al., INFN-CNAF Annual Report 2017, edited by L. dell'Agnello, L. Morganti and E. Ronchieri, p. 111
\bibitem{ref:phedex} PhEDEx webpage: https://cmsweb.cern.ch/phedex/about.html
\bibitem{ref:singu} Singularity website: https://singularity.lbl.gov/
\bibitem{ref:meltdown} Meltdown attack website: https://meltdownattack.com/
\bibitem{ref:hnsc} Helix Nebula The Science Cloud website: https://www.hnscicloud.eu/
\bibitem{DGAS} Dal Pra, Stefano. ``Accounting Data Recovery. A Case Report from
INFN-T1'' Nota interna, Commissione Calcolo e Reti dell'INFN,
{\tt CCR-48/2014/P}
\bibitem{DOCET} Dal Pra, Stefano, and Alberto Crescente. ``The data operation centre tool. Architecture and population strategies'' Journal of Physics: Conference Series. Vol. 396. No. 4. IOP Publishing, 2012.
\end{thebibliography}
\end{document}
contributions/farming/cineca.png

16.7 KiB

contributions/farming/farm-jobs.png

230 KiB

contributions/farming/meltdown.jpg

39.2 KiB

contributions/farming/meltdown2.jpg

28.5 KiB

...@@ -6,12 +6,12 @@
\title{The \Fermi-LAT experiment}
\author{
M. Kuss$^{1}$,
F. Longo$^{2}$,
on behalf of the \Fermi LAT collaboration}
\address{$^{1}$ INFN Sezione di Pisa, Pisa, IT}
\address{$^{2}$ University of Trieste and INFN Sezione di Trieste, Trieste, IT}
\ead{michael.kuss@pi.infn.it}
\begin{abstract}
...
...@@ -28,7 +28,7 @@
\section{The GAMMA experiment and the AGATA array}
The strong interaction described by quantum chromodynamics (QCD) is responsible for binding neutrons and protons into nuclei and for the many facets of nuclear structure and reaction physics. Combined with the electroweak interaction, it determines the properties of all nuclei in a similar way as quantum electrodynamics shapes the periodic table of elements. While the latter is well understood, it is still unclear how the nuclear chart emerges from the underlying strong interactions. This requires the development of a unified description of all nuclei based on systematic theories of strong interactions at low energies, advanced few- and many-body methods, as well as a consistent description of nuclear reactions. Nuclear structure and dynamics have not reached the discovery frontier yet (e.g. new isotopes, new elements, …), and a high precision frontier is also being approached with higher beam intensities and purity, along with better efficiency and sensitivity of instruments. The access to new and complementary experiments combined with theoretical advances allows key questions to be addressed such as:
How does the nuclear chart emerge from the underlying fundamental interactions?
...@@ -51,8 +51,17 @@ What is the density and isospin dependence of the nuclear equation of state?
\noindent AGATA \cite{ref:gamma_first,ref:gamma_second} is the European Advanced Gamma Tracking Array for nuclear spectroscopy project consisting of a full shell of high purity segmented germanium detectors. Being fully instrumented with digital electronics it exploits the novel technique of gamma-ray tracking. AGATA will be employed at all the large-scale radioactive and stable beam facilities and in the long-term will be fully completed in 60 detectors unit geometry, in order to realize the envisaged scientific program. AGATA is being realized in phases with the goal of completing the first phase with 20 units by 2020. AGATA has been successfully operated since 2009 at LNL, GSI and GANIL, taking advantage of different beams and powerful ancillary detector systems. It will be used in LNL again in 2022, with stable beams and later with SPES radioactive beams, and in future years is planned to be installed in GSI/FAIR, Jyvaskyla, GANIL again, and HIE-ISOLDE.
\section{AGATA computing model and the role of CNAF}
At present the array consists of 15 units, each composed of a cluster of 3 HPGe crystals.
Each individual crystal is composed of 36 segments, for a total of 38 associated electronics channels per crystal.
The data acquisition rate, including Pulse Shape Analysis, can sustain up to 4-5 kHz of events per crystal.
The bottleneck is presently the Pulse Shape Analysis (PSA) procedure used to extract the interaction positions from the HPGe detector traces.
With future, faster processors one expects to be able to process the PSA at 10 kHz/crystal. The amount of raw data per experiment, including traces,
is about 20 TB for a standard data taking of about 1 week and can increase to 50 TB for specific experimental configurations.
The collaboration is thus acquiring locally about 250 TB of data per year. During data taking, raw data are temporarily stored
in a computer farm located at the experimental site and, later on, they are dispatched on the GRID to two different centers, CC-IN2P3 (Lyon) and CNAF (INFN Bologna),
used as Tier 1: the duplication process is a safeguard in case of failures/losses at one of the Tier 1 sites.
The GRID itself is seldom used to re-process the data; the users usually download their data set to local storage,
where they can run emulators able to manage part of or the full workflow.
\section{References}
...
contributions/icarus/ICARUS-nue-mip.png

106 KiB

contributions/icarus/ICARUS-sterile-e1529944099665.png

36.7 KiB

contributions/icarus/SBN.png

2.93 MiB

contributions/icarus/icarus-nue.png

476 KiB

\documentclass[a4paper]{jpconf}
\usepackage[font=small]{caption}
\usepackage{graphicx}
\begin{document}
\title{ICARUS}
\author{A. Rappoldi$^1$, on behalf of the ICARUS Collaboration}
\address{$^1$ INFN Sezione di Pavia, Pavia, IT}
\ead{andrea.rappoldi@pv.infn.it}
\begin{abstract}
After its successful operation at the INFN underground laboratories
of Gran Sasso (LNGS) from 2010 to 2013, ICARUS has been moved to
the Fermi National Accelerator Laboratory (FNAL) near Chicago,
where it represents an important element of the
Short Baseline Neutrino Project (SBN).
Indeed, the ICARUS T600 detector, which underwent various technical upgrades
at CERN to improve its performance and make it more suitable
for operation at shallow depth, will constitute one of three Liquid Argon (LAr) detectors
exposed to the FNAL Booster Neutrino Beam (BNB).
The purpose of this project is to provide an adequate answer to the
``sterile neutrino puzzle'' arising from the anomalies, claimed by various
other experiments, observed in the measured parameters that govern
neutrino flavor oscillations.
\end{abstract}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{The ICARUS project}
\label{ICARUS}
The technology of the Liquid Argon Time Projection Chamber (LAr TPC)
was first proposed by Carlo Rubbia in 1977. It was conceived as a tool for
detecting neutrinos in a way that would provide completely uniform imaging, with high
accuracy, of massive volumes (several thousand tons).
ICARUS T600, the first large-scale detector exploiting this detection technique,
is the biggest LAr TPC ever realized, with a cryostat containing 760 tons of liquid argon.
Its construction was the culmination of many years of ICARUS collaboration R\&D studies,
with larger and larger laboratory and industrial prototypes, mostly developed thanks
to the Italian National Institute for Nuclear Physics (INFN), with the support of CERN.
Nowadays, it represents the state of the art of this technique, and it marks a major
milestone in the practical realization of large-scale liquid-argon detectors.
The ICARUS T600 detector was previously installed in the underground Italian INFN Gran
Sasso National Laboratory (LNGS) and was the first large-mass LAr TPC operating as a continuously
sensitive general-purpose observatory.
The detector was exposed to the CERN Neutrinos to Gran Sasso (CNGS) beam,
a neutrino beam produced at CERN and
traveling undisturbed straight through Earth for 730 km.
This very successful run lasted three years (2010-2013),
during which $8.6 \cdot 10^{19}$ protons on target were collected with a
detector live time exceeding 93\%, recording 2650 CNGS neutrino interactions
(in agreement with expectations) as well as cosmic rays (with a total exposure of 0.73 kton$\cdot$year).
ICARUS T600 demonstrated the effectiveness of the so-called {\it single-phase} TPC technique
for neutrino physics, providing a series of results both from the technical and from the
physics point of view.
Besides the excellent detector performance, both as a tracking device and as a homogeneous calorimeter,
ICARUS demonstrated a remarkable capability in electron-photon separation and particle
identification, exploiting the measurement of dE/dx versus range and including the
reconstruction of the invariant mass of photon pairs (coming from $\pi^0$ decays), rejecting to an unprecedented level
the Neutral Current (NC) background to $\nu_e$ Charged Current (CC) events (see Fig.~\ref{Fig1}).
\begin{figure}[ht]
\centering
% \includegraphics[width=0.8\textwidth,natwidth=1540,natheight=340]{icarus-nue.png}
\includegraphics[width=0.8\textwidth]{icarus-nue.png}
\end{figure}
\begin{figure}[ht]
\centering
\includegraphics[width=0.6\textwidth]{ICARUS-nue-mip.png}
\caption{\label{Fig1} {\it Top:} A typical $\nu_e$ CC event recorded during the ICARUS operation
at LNGS. The neutrino, coming from the right, interacts with an Ar nucleus and produces a
proton (short, heavily ionizing track) and an electron (light gray track) which starts an electromagnetic
shower, developing to the left. {\it Bottom:} The accurate analysis of {\it dE/dx} makes it possible
to easily distinguish the parts of the track in which several particles overlap,
locating with precision the beginning of the shower.}
\end{figure}
The tiny intrinsic $\nu_e$ component in the CNGS $\nu_{\mu}$
beam allowed ICARUS to perform a sensitive search for anomalous LSND-like $\nu_\mu \rightarrow \nu_e$ oscillations.
Globally, seven electron-like events were observed, consistent with the $8.5 \pm 1.1$ events
expected from the intrinsic beam $\nu_e$ component and standard oscillations, providing limits on
the oscillation probability of $P(\nu_\mu \rightarrow \nu_e) \le 3.86 \cdot 10^{-3}$ at 90\% CL and
$P(\nu_\mu \rightarrow \nu_e) \le 7.76 \cdot 10^{-3}$ at 99\% CL, as shown in
Fig.~\ref{Fig2}.
\begin{figure}[ht]
\centering
\includegraphics[width=0.5\textwidth]{ICARUS-sterile-e1529944099665.png}
\caption{\label{Fig2} Exclusion plot for $\nu_\mu \rightarrow \nu_e$ oscillations.
The yellow star marks the best-fit point of MiniBooNE.
The ICARUS limits on the oscillation probability are shown with the red lines. Most of
the LSND allowed region is excluded, except for a small area around $\sin^2 2 \theta \sim 0.005$,
$\Delta m^2 < 1\ \mathrm{eV}^2$.
}
\end{figure}
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{ICARUS at FNAL}
\label{FNAL}
After its successful operation at LNGS, the ICARUS T600 detector was planned
to be included in the Short Baseline Neutrino Project (SBN) at Fermilab\cite{SBN},
in Chicago, which aims to give a definitive answer to the so-called
{\it Sterile Neutrino Puzzle}.
In this context, it will operate as the {\it far detector}, placed along the
Booster Neutrino Beam (BNB) line, 600 meters from the target (see Fig.~\ref{Fig3}).
\begin{figure}[h]
\centering
\includegraphics[width=0.8\textwidth]{SBN.png}
\caption{\label{Fig3} The Short Baseline Neutrino Project (SBN) at
Fermilab (Chicago) will use three LAr TPC detectors, exposed to the
Booster Neutrino Beam at different distances from the target.
The ICARUS T600 detector, placed at 600 m, will operate as the {\it far detector},
devoted to detecting any anomaly in the beam flux and spectrum with respect to
the initial beam composition measured by the {\it near detector}
(SBND).
Such anomalies, due to neutrino flavour oscillations, would consist of
either $\nu_e$ appearance or $\nu_\mu$ disappearance.
}
\end{figure}
For this purpose, the ICARUS T600 detector underwent an intensive
overhaul at CERN before being shipped to FNAL,
in order to make it better suited to operation at the surface (instead of in
an underground environment).
These important technical improvements took place within the CERN
Neutrino Platform framework (WA104) from 2015 to 2017.
In addition to significant mechanical improvements, notably
a new cold vessel with purely passive thermal insulation,
some important innovations were applied to the scintillation
light detection system\cite{PMT} and to the readout
electronics\cite{Electronics}.
% The role of ICARUS will be to detect any anomaly in the neutrino beam flux and
% composition that can occour during its propagation (from the near to the
% far detector), caused by neutrino flavour oscillation.
% This task requires to have an excellent capability to detect and identify
% neutrino interaction within the LAr sensitive volume, rejecting any other
% spurious event with a high level of confidence.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{ICARUS data amount}
\label{Computingreport_2015.pdf}
% The new ICARUS T600 detector (that has been modified and improved to operate
% at FNAL) contains about 54,000 sensitive wires (that give an electric signal
% proportional to the charge released into the LAr volume by ionizing particles)
% and 180 large PMTs, producing a prompt signal coming from the scintillation light.
% Both these analogic signal types are then converted in digital form, by mean of
% fast ADC modules.
%
% During normal run conditions, the trigger rate is about 0.5 Hz, and
% a full event, consisting of the digitized charge signals of all wires
% and all PMTs, has a size of about 80 MB (compressed).
% Therefore, the expected acquisition rate is about 40 MB/s, corrisponding
%to 1 PB/yr.
The data produced by the ICARUS detector (a LAr Time Projection Chamber)
basically consist of a large number of waveforms, generated by sampling the electric
signals induced on the sensing wires by the drift of the charge deposited along
the trajectories of charged particles within the LAr sensitive volume.
The waveforms recorded on about 54000 wires and 360 PMTs are digitized
(at sampling rates of 2.5 MHz and 500 MHz, respectively) and compressed,
resulting in a total size of about 80 MB/event.
Considering the foreseen acquisition rate of about 0.5 Hz (in normal
run conditions), the expected data flow is about 40 MB/s, which
corresponds to a data production of about 1 PB/yr.
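These figures are consistent with a simple back-of-the-envelope check; the short sketch below reproduces the arithmetic, using the event size and trigger rate quoted above:
\begin{verbatim}
# Rough consistency check of the quoted ICARUS data rates.
EVENT_SIZE_MB = 80       # compressed size per event, as quoted above
TRIGGER_RATE_HZ = 0.5    # expected trigger rate in normal run conditions

data_flow_mb_s = EVENT_SIZE_MB * TRIGGER_RATE_HZ            # ~40 MB/s
yearly_volume_pb = data_flow_mb_s * 3600 * 24 * 365 / 1e9   # MB -> PB over one year

print(f"data flow: {data_flow_mb_s:.0f} MB/s")
print(f"yearly volume: ~{yearly_volume_pb:.2f} PB/yr")      # ~1.3 PB/yr, i.e. order 1 PB/yr
\end{verbatim}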
The raw data are then processed by automated filters that recognize
and select the various event types (cosmic, beam, background, etc.) and rewrite
them in a more flexible format, suitable for the subsequent analysis,
which is also supported by interactive graphics programs.
% The experiment is expected to start commissioning phase at the end of 2018,
% with first data coming as soon as the Liquid Argon filling procedure is completed.
% Trigger logic tuning will last not less than a couple of months during which
% one PB of data is expected.
Furthermore, the ICARUS Collaboration is actively working on
producing the Monte Carlo events needed
to design and test the trigger conditions to be implemented on the detector.
This is done using the same analysis and simulation tools
developed at Fermilab for the SBN detectors (the {\it LArSoft framework}), in
order to have a common software platform and to facilitate algorithm testing
and performance checks by all members of the collaboration.
During 2018 many activities related to the detector installation
were still ongoing, and the start of data acquisition
is scheduled for 2019.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section{Role and contribution of CNAF}
\label{CNAF}
All the data (raw and reduced) will be stored at Fermilab using the local facility;
however, the ICARUS collaboration agreed to have a mirror site in Italy
(located at the CNAF INFN Tier 1) retaining a full replica of the preselected
raw data, both for redundancy and to provide more direct data access
to the European part of the collaboration.
The CNAF Tier 1 computing resources assigned to ICARUS for 2018 consist of
4000 HS06 of CPU, 500 TB of disk storage and 1500 TB of tape archive.
A small fraction of the available storage has been used to
store a copy of all the raw data acquired at LNGS,
which are still subject to analysis.
During 2018 the ICARUS T600 detector was still in preparation, so
only a limited fraction
of these resources has been used, mainly to perform data transfer tests
(from FNAL to CNAF) and to check the installation of the LArSoft framework
in the Tier 1 environment. For this last purpose, a dedicated virtual
machine with a custom environment was also used.
%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
\section*{References}
\begin{thebibliography}{1}
\bibitem{SBN}
R. Acciarri et al.,
{\it A Proposal for a Three Detector Short-Baseline Neutrino
Oscillation Program in the Fermilab Booster Neutrino Beam},
arXiv:1503.01520 [physics.ins-det]
\bibitem{PMT}
M. Babicz et al.,
{\it Test and characterization of 400 Hamamatsu R5912-MOD
photomultiplier tubes for the ICARUS T600 detector}.
JINST 13 (2018) P10030
\bibitem{Electronics}
L. Bagby et al.,
{\it New read-out electronics for ICARUS-T600 liquid
argon TPC. Description, simulation and tests of the new
front-end and ADC system}.
JINST 13 (2018) P12007
\end{thebibliography}
\end{document}
File added
...@@ -6,13 +6,13 @@
\author{C. Bozza$^1$, T. Chiarusi$^2$, K. Graf$^3$, A. Martini$^4$ for the KM3NeT Collaboration}
\address{$^1$ University of Salerno and INFN Gruppo Collegato di Salerno, Fisciano (SA), IT}
\address{$^2$ INFN Sezione di Bologna, Bologna, IT}
\address{$^3$ Friedrich-Alexander-Universit{\"a}t Erlangen-N{\"u}rnberg, Erlangen, GE}
\address{$^4$ INFN-LNF, Frascati, IT}
\ead{cbozza@unisa.it}
...@@ -24,7 +24,7 @@ from astrophysical sources; the ORCA programme is devoted to
investigate the ordering of neutrino mass eigenstates. The
unprecedented size of detectors will imply PByte-scale datasets and
calls for large computing facilities and high-performance data
centers. The data management and processing challenges of KM3NeT are
reviewed as well as the computing model. Specific attention is given
to describing the role and contributions of CNAF.
\end{abstract}
...@@ -80,7 +80,7 @@ way. One ORCA DU was also deployed and operated in 2017, with smooth
data flow and processing. At present time, most of the computing load is
due to simulations for the full building block, now being enriched with
feedback from real data analysis. As a first step, this
was done at CC-IN2P3 in Lyon, but usage of other computing centers is
increasing and is expected to soon spread to the full KM3NeT
computing landscape. This process is being driven in accordance to the
goals envisaged in setting up the computing model. The KM3NeT
...@@ -105,14 +105,14 @@ flow with a reduction from $5 GB/s$ to $5 MB/s$ per \emph{building
block}. Quasi-on-line reconstruction is performed for selected
events (alerts, monitoring). The output data are temporarily stored on
a persistent medium and distributed with fixed latency (typically less
than few hours) to various computing centers, which altogether
constitute Tier 1, where events are reconstructed by various fitting
models (mostly searching for shower-like or track-like
patterns). Reconstruction further reduces the data rate to about $1
MB/s$ per \emph{building block}. In addition, Tier 1 also takes care
of continuous detector calibration, to optimise pointing accuracy (by
working out the detector shape that changes because of water currents)
and photomultiplier operation. Local analysis centers, logically
allocated in Tier 2 of the computing model, perform physics analysis
tasks. A database system interconnects the three tiers by distributing
detector structure, qualification and calibration data, run
...@@ -124,10 +124,10 @@ book-keeping information, and slow-control and monitoring data.
\label{fig:compmodel}
\end{figure}
KM3NeT exploits computing resources in several centers and in the
GRID, as sketched in Fig.~\ref{fig:compmodel}. The conceptually simple
flow of the three-tier model is then realised by splitting the tasks
of Tier 1 to different processing centers, also optimising the data
flow and the network path. In particular, CNAF and CC-IN2P3 aim at being
mirrors of each other, containing the full data set at any moment. The
implementation for the data transfer from CC-IN2P3 to CNAF (via an
...@@ -144,9 +144,9 @@ for a while becuse of the lack of human resources.
\section{Data size and CPU requirements}
Calibration and reconstruction work in batches. The raw data related
to the batch are transferred to the center that is in charge of the
processing before it starts. In addition, a rolling buffer of data is
stored at each computing center, e.g.\ the last year of data taking.
Simulation has special needs because the input is negligible, but the
computing power required is very large compared to the needs of
...@@ -179,9 +179,8 @@ Thanks to the modular design of the detector, it is possible to quote
the computing requirements of KM3NeT per \emph{building block}, having
in mind that the ARCA programme corresponds to two \emph{building
blocks} and ORCA to one. Not all software could be benchmarked, and
some estimates are derived by scaling from ANTARES ones.
In the following, the standard conversion factor ($\sim$10) is used between cores and HEPSpec2006 (HS06).
\begin{table}
\caption{\label{cpu}Yearly resource requirements per \emph{building block}.}
...@@ -211,7 +210,7 @@ resources at CNAF has been so far below the figures for a
units are added in the following years. KM3NeT software that
runs on the GRID can use CNAF computing nodes in opportunistic mode.
Already now, the data handling policy to safeguard the products of Tier 0
is in place. Automatic synchronization from each shore station to both
CC-IN2P3 and CNAF runs daily and provides two maximally separated
paths from the data production site to final storage places. Mirroring
...@@ -219,11 +218,11 @@ and redundancy preservation between CC-IN2P3 and CNAF are foreseen and
currently at an early stage.
CNAF has already added relevant contributions to KM3NeT in terms of
know-how for IT solution deployment, e.g.~the above-mentioned synchronisation, software development solutions and the software-defined network at the Tier 0 at
the Italian site. Setting up Software Defined Networks (SDN) for data
acquisition deserves a special mention. The SDN technology\cite{SDN} is used to
configure and operate the mission-critical fabric of switches/routers
that interconnects all the on-shore resources in Tier 0 stations. The
KM3NeT DAQ is built around switches compliant with the OpenFlow 1.3
protocol and managed by dedicated controller servers. With a limited
number of Layer-2 forwarding rules, developed on purpose for the KM3NeT
...
...@@ -3,20 +3,17 @@
\begin{document}
\title{LHCb Computing at CNAF}
\author{S. Perazzini$^1$, C. Bozzi$^{2,3}$}
\address{$^1$ INFN Sezione di Bologna, Bologna, IT}
\address{$^2$ CERN, Gen\`eve, CH}
\address{$^3$ INFN Sezione di Ferrara, Ferrara, IT}
\ead{stefano.perazzini@bo.infn.it, concezio.bozzi@fe.infn.it}
\begin{abstract}
In this document a summary of the LHCb computing activities during 2018 is reported. The usage of the CPU, disk and tape resources spread among the various computing centers is analysed, with particular attention to the performance of the INFN Tier 1 at CNAF. Projections of the necessary resources in the years to come are also briefly discussed.
\end{abstract}
\section{Introduction}
...@@ -44,7 +41,7 @@ The offline reconstruction of the FULL stream for proton collision data run from
A full re-stripping of 2015, 2016 and 2017 proton collision data, started in autumn 2017, ended in April 2018. A stripping cycle of 2015 lead collision data was also performed in that period. The stripping cycle concurrent with the 2018 proton collision data taking started in June and ran continuously until November.
The INFN Tier 1 center at CNAF was in downtime from November 2017, due to a major flood incident. However, the site was again fully available in March 2018, allowing the completion of the stripping cycles on hold, waiting for the data located at CNAF (about 20\% of the total). Despite the unavailability of CNAF resources for the first months of 2018, the site performed excellently for the rest of the year, as testified by the numbers reported here.
As in previous years, LHCb continued to make use of opportunistic resources, which are not pledged to WLCG but significantly contributed to the overall usage.
...@@ -67,27 +64,27 @@ Total WLCG & 502 & 41.3 & 90.5\\ \hline
\label{tab:pledges}
\end{table}
The usage of WLCG CPU resources by LHCb is obtained from the different views provided by the EGI Accounting portal. The CPU usage is presented in Figure~\ref{fig:T0T1} for the Tier 0 and Tier 1 sites and in Figure~\ref{fig:T2} for Tier 2 sites. The same data is presented in tabular form in Table~\ref{tab:T0T1} and Table~\ref{tab:T2}, respectively.
\begin{figure}
\begin{center}
\includegraphics[width=0.8\textwidth]{T0T1.png}
\end{center}
\caption{\label{fig:T0T1}Monthly CPU work provided by the Tier 0 and
Tier 1 centers to LHCb during 2018.}
\end{figure}
\begin{figure}
\begin{center}
\includegraphics[width=0.8\textwidth]{T2.png}
\end{center}
\caption{\label{fig:T2}Monthly CPU work provided by the Tier 2 centers to LHCb during 2018.}
\end{figure}
\begin{table}[htbp]
\caption{Average CPU power provided by the Tier 0 and the Tier 1
centers to LHCb during 2018.}
\centering
\begin{tabular}{lcc}
\hline
...@@ -110,8 +107,8 @@ UK-T1-RAL & 71.7 & 74.8 \\
\end{table}
\begin{table}[htbp]
\caption{Average CPU power provided by the Tier 2
centers to LHCb during 2018.}
\centering
\begin{tabular}{lcc}
\hline
...@@ -137,21 +134,21 @@ UK & 85.7 & 29.3 \\
\label{tab:T2}
\end{table}
The average power used at the Tier 0 + Tier 1 sites is about 32\% higher than the pledges. The average power used at Tier 2 sites is about 26\% higher than the pledges.
The average CPU power accounted for by WLCG (including Tier 0/1 + Tier 2) amounts to 654 kHS06, to be compared to the 502 kHS06 estimated needs quoted in Table~\ref{tab:pledges}. The Tier 0 and Tier 1 usage is generally higher than the pledges. The LHCb computing model is flexible enough to use computing resources for all production workflows wherever available. It is important to note that this is true also for CNAF, although it started to contribute to the computing activities only in March, after the recovery from the incident. Since then the CNAF Tier 1 has offered great stability, leading to maximal efficiency in the overall exploitation of the resources. The total amount of CPU used at the Tier 0 and Tier 1 centers is detailed in Figure~\ref{fig:T0T1_MC}, showing that about 76\% of the CPU work is due to Monte Carlo simulation. The same plot shows the start of a stripping campaign in March. This corresponds to the recovery of the backlog in the re-stripping of the Run2 data collected in 2015-2017, due to the unavailability of CNAF after the incident of November 2017. As visible from the plot, the backlog was recovered by the end of April 2018, before the restart of data-taking operations. Even if all the other Tier 1 centers contributed to reprocessing these data, the recall from tape was done exclusively at CNAF. Approximately 580 TB of data were recalled from tape in about 6 weeks, with a maximum throughput of about 250 MB/s.
\begin{figure}
\begin{center}
\includegraphics[width=0.8\textwidth]{T0T1_MC.png}
\end{center}
\caption{\label{fig:T0T1_MC}Usage of LHCb resources at Tier 0 and Tier 1 sites during 2018. The plot shows the normalized CPU usage (kHS06) for the various activities.}
\end{figure}
Since the start of data taking in May 2018, tape storage grew by about 16.7 PB. Of these, 9.5 PB were due to newly collected RAW data; the rest was due to RDST (2.6 PB) and ARCHIVE (4.6 PB), the latter coming from the archival of Monte Carlo productions, the re-stripping of former real data, and new Run2 data. The total tape occupancy as of December 31st 2018 is 68.9 PB, of which 38.4 PB are used for RAW data, 13.3 PB for RDST and 17.2 PB for archived data. This is 12.9\% lower than the original request of 79.2 PB. The total tape occupancy at CNAF at the end of 2018 was about 9.3 PB, of which 3.3 PB of RAW data, 3.6 PB of ARCHIVE and 2.4 PB of RDST. This corresponds to an increase of about 2.3 PB with respect to the end of 2017. These numbers are in agreement with the share of resources expected from CNAF.
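For clarity, the quoted figures decompose consistently (a simple arithmetic check on the numbers above, not an independent measurement):
\[
9.5 + 2.6 + 4.6 = 16.7\ \mathrm{PB} \qquad \mbox{(total tape growth)}, \qquad
3.3 + 3.6 + 2.4 = 9.3\ \mathrm{PB} \qquad \mbox{(CNAF occupancy)},
\]
and the quoted 12.9\% reduction with respect to the original request of 79.2 PB is consistent with $(79.2-68.9)/79.2 \simeq 13\%$ once rounding of the inputs is taken into account.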
\begin{table}[htbp]
\caption{Disk Storage resource usage as of February 11$^{\rm th}$ 2019 for the Tier 0 and Tier 1 centers. The top row is taken from the LHCb accounting, the other ones (used, available and installed capacity) are taken from the recently commissioned WLCG Storage Space Accounting tool. The 2018 pledges are shown in the last row.}
\begin{center}
\resizebox{\columnwidth}{!}{
\begin{tabular}{|l|cc|ccccccc|}
Pledge '18 & 11.4 & 26.25 & 5.61 & 4.01 & 3.20 & 1.43 & 7.32
\end{center}
\end{table}
Table~\ref{tab:disk} shows the situation of disk storage resources at CERN and at the Tier 1 sites, both in aggregate and for each individual Tier 1 site, as of February 11$^{\rm th}$ 2019. The used space includes derived data, i.e. DST and micro-DST of both real and simulated data, and space reserved for users. The latter accounts for 1.2 PB in total, 0.9 PB of which are used. The SRR disk used and SRR disk free information concerns only permanent disk storage (previously known as ``T0D1''). The first two lines show a good agreement between what the sites report and what the LHCb accounting (first line) reports. The sum of the Tier 0 and Tier 1 2018 pledges amounts to 37.7 PB. The available disk space is 35 PB in total, 26 PB of which are used to store real and simulated datasets and user data. A total of 3.7 PB is used as tape buffer; the remaining 5 PB are free and will be used to store the output of the legacy stripping campaigns of Run1 and Run2 data that are currently being prepared. The disk space available at CNAF is about 6.6 PB, about 18\% above the pledge.
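As an illustrative cross-check (assuming the 5.61 PB entry in the pledge row of Table~\ref{tab:disk} is the CNAF pledge), the quoted excess over the pledge follows directly from
\[
\frac{6.6\ \mathrm{PB}}{5.61\ \mathrm{PB}} \simeq 1.18,
\]
i.e. about 18\% above the pledged disk capacity.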
In summary, the usage of computing resources in the 2018 calendar year has been quite smooth for LHCb. Simulation is the dominant activity in terms of CPU work. Additional unpledged resources, as well as clouds, on-demand and volunteer computing resources, were also successfully used. They were essential
in providing CPU work during the outage of the CNAF Tier 1 center. As for the INFN Tier 1 at CNAF, it returned to fully operational status in March 2018. After that, the backlog in the restripping campaign due to the unavailability of data stored at CNAF was recovered, thanks also to the contribution of other sites, in time for the restart of data taking. After March 2018, CNAF operated in a very efficient and reliable way, even delivering more CPU power than the pledged resources.
\section{Expected growth of resources in 2020-2021}
In terms of CPU requirements, the different activities result in CPU work estimates for 2020-2021, which are apportioned among the different Tiers taking into account the computing model constraints and the capacities already installed. This results in the requests shown in Table~\ref{tab:req_CPU}, together with the pledged resources for 2019. The CPU work required at CNAF would correspond to about 18\% of the total CPU requested at Tier 1 and Tier 2 sites.
\begin{table}[htbp]
\centering
\caption{CPU power requested at the different Tiers in 2020-2021. Pledged resources for 2019 are also reported.}
\end{tabular}
\end{table}
The forecast total disk and tape space usage at the end of each of the years 2019-2021 is broken down into fractions to be provided by the different Tiers. These numbers are shown in Table~\ref{tab:req_disk} for disk and in Table~\ref{tab:req_tape} for tape. The disk resources required at CNAF would be about 18\% of those requested for the Tier 1 and Tier 2 sites, while for tape storage CNAF is expected to provide about 24\% of the total tape request to the Tier 1 sites.
\begin{table}[htbp]
\centering
\hline
Disk (PB) & 2019 & 2020 & 2021 \\
\hline
Tier 0 & 13.4 & 17.2 & 19.5 \\
Tier 1 & 29.0 & 33.2 & 39.0 \\
Tier 2 & 4.0 & 7.2 & 7.5 \\
\hline
Total WLCG & 46.4 & 57.6 & 66.0 \\
\hline
\end{tabular}
\end{table}
\hline
Tape (PB) & 2019 & 2020 & 2021 \\
\hline
Tier 0 & 35.0 & 36.1 & 52.0 \\
Tier 1 & 53.1 & 55.5 & 90.0 \\
\hline
Total WLCG & 88.1 & 91.6 & 142.0 \\
\hline
\end{tabular}
\end{table}
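As a purely illustrative reading of these tables (the actual CNAF requests are defined through the usual pledge process, not by this arithmetic), applying the shares quoted above to the 2020 columns gives
\[
0.18 \times (33.2 + 7.2)\ \mathrm{PB} \simeq 7.3\ \mbox{PB of disk},
\qquad
0.24 \times 55.5\ \mathrm{PB} \simeq 13.3\ \mbox{PB of tape}
\]
as the approximate scale of the CNAF contribution in 2020.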
\section{Conclusion}
A description of the LHCb computing activities during 2018 has been given, with particular emphasis on the usage of resources and on the forecasts of resource needs until 2021. As in previous years, the CNAF Tier 1 center gave a substantial contribution to LHCb computing in terms of CPU work and storage made available to the collaboration. This achievement is particularly important this year, as CNAF was recovering from the major incident of November 2017 that unfortunately interrupted its activities. The effects of the CNAF unavailability were overcome also thanks to extra efforts from other sites and to the opportunistic usage of non-WLCG resources. The main consequence of the incident, in terms of LHCb operations, was the delay in the restripping campaign of the data collected during 2015-2017. The data that were stored at CNAF (approximately 20\% of the total) were processed once the site restarted operations in March 2018. It is worth mentioning that, despite the delay, the restripping campaign was completed before the start of data taking, according to the predicted schedule, avoiding further stress to the LHCb computing operations. It should also be emphasized that an almost negligible amount of data was lost in the incident, and in any case it was possible to recover it from backup copies stored at other sites.
\end{document}
\section{Results obtained in 2018}
During 2018 no experimental operations were performed in the LHC tunnel or in the SPS experimental area, so all the work was focused on the analysis of the data collected during the 2015 operation in p-p collisions at 13 TeV and during the 2016 operation in p-Pb collisions at 8.16 TeV.
The final results of the photon and neutron production spectra in proton-proton collisions at $\sqrt{s} =$ 13 TeV in the very forward region ($8.81 < \eta < 8.99$ and $\eta > 10.94$ for photons, $8.81 < \eta < 9.22$ and $\eta > 10.76$ for neutrons, where $\eta$ is the pseudorapidity of the particle\footnote{In accelerator experiments the pseudorapidity of a particle is defined as $\eta = - \ln [ \tan(\theta / 2) ]$, where $\theta$ is the angle between the particle momentum and the beam axis.}) were published in Physics Letters B and in the Journal of High Energy Physics, respectively \cite{LHCf_photons, LHCf_neutrons}.
These are the first published results of the collaboration at the highest available collision energy of 13 TeV at the LHC.
In addition to the proton-proton results, preliminary results for the photon spectrum in proton-lead collisions at $\sqrt{s_{NN}} = 8.16$ TeV were obtained and presented at several international conferences.
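To give a feeling for how forward these acceptances are (a simple numerical illustration obtained by inverting the pseudorapidity definition given above), the corresponding polar angles are
\[
\theta = 2\arctan\!\left(e^{-\eta}\right)
\quad\Rightarrow\quad
\theta(\eta = 8.81) \simeq 3.0 \times 10^{-4}\ \mathrm{rad},
\qquad
\theta(\eta = 10.94) \simeq 3.6 \times 10^{-5}\ \mathrm{rad},
\]
i.e. the measured particles are emitted within a fraction of a milliradian of the beam axis.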