\documentclass[a4paper]{jpconf}
\usepackage{graphicx}
\begin{document}
\title{LHCb Computing at CNAF}
\author{Stefano Perazzini}
\address{INFN Sezione di Bologna, viale Berti Pichat 6/2, 40127 Bologna (BO), Italy. E-mail: Stefano.Perazzini@bo.infn.it}
\author{Concezio Bozzi}
\address{CERN, EP/LBD, CH-1211 Geneve 23, Switzerland, and INFN Sezione di Ferrara, via Saragat 1, 44122 Ferrara, Italy. E-mail: Concezio.Bozzi@fe.infn.it}
\ead{bozzi@fe.infn.it}
\begin{abstract}
In this document a summary of the LHCb computing activities during 2018 is reported. The usage of the CPU, disk and tape resources spread among the various computing centres is analysed, with particular attention to the performance of the INFN Tier1 at CNAF. Projections of the resources needed in the coming years are also briefly discussed.
\end{abstract}
\section{Introduction}
The Large Hadron Collider beauty (LHCb) experiment is dedicated to the study of $c$- and $b$-physics at the Large Hadron Collider (LHC) accelerator at CERN. Exploiting the large production cross section of $b\bar{b}$ and $c\bar{c}$ quark pairs in proton-proton ($p-p$) collisions at the LHC, LHCb is able to analyse unprecedented quantities of heavy-flavoured hadrons, with particular attention to their $C\!P$-violation observables. Besides its core programme, the LHCb experiment is also able to perform analyses of production cross sections and electroweak physics in the forward region. To date, the LHCb collaboration is composed of about 1350 people from 79 institutes in 18 countries around the world. More than 50 physics papers were published by LHCb during 2018, for a total of almost 500 papers since the start of its activities in 2010.
The LHCb detector is a single-arm forward spectrometer covering the pseudorapidity range between 2 and 5. The detector includes a
high-precision tracking system consisting of a silicon-strip vertex detector surrounding the $p-p$ interaction region, a large-area
silicon-strip detector located upstream of a dipole magnet with a bending power of about 4 Tm, and three stations of silicon-strip
detectors and straw drift tubes placed downstream. The combined tracking system provides a momentum measurement with relative
uncertainty that varies from 0.4\% at 5 GeV/$c$ to 0.6\% at 100 GeV/$c$, and impact parameter resolution of 20 $\mu$m for tracks with high transverse momenta. Charged hadrons are identified using two ring-imaging Cherenkov detectors. Photon, electron and hadron candidates are identified by a calorimeter system consisting of scintillating-pad and preshower detectors, an electromagnetic calorimeter and a hadronic calorimeter. Muons are identified by a system composed of alternating layers of iron and multiwire proportional chambers. The trigger consists of a hardware stage, based on information from the calorimeter and muon systems, followed by a software stage, which applies a full event reconstruction.
\section{Overview of LHCb computing activities in 2018}
The usage of offline computing resources involved: (a) the production of simulated events, which runs continuously; (b) running user jobs, which is also continuous; (c) stripping cycles before and after the end of data taking; (d) the processing (i.e. reconstruction and stripping of the FULL stream, $\mu$DST streaming of the TURBO stream) of data taken in 2018 in proton-proton and heavy-ion collisions; (e) the centralized production of ntuples for analysis working groups.
Activities related to the 2018 data taking were tested in May and started at the beginning of the LHC physics run in mid-June. Data transferred from the pit to offline were steadily processed and exported throughout the run.
We recall that in Run2 LHCb implemented a trigger strategy in which the high-level trigger is split into two parts. The first one (HLT1), synchronous with data taking, writes events at a 150 kHz output rate into a temporary disk buffer located on the HLT farm nodes. Real-time calibrations and alignments are then performed and used in the second high-level trigger stage (HLT2), where event reconstruction algorithms as close as possible to those run offline are applied and the event selection takes place.
Events passing the high-level trigger selections are sent offline, either via a FULL stream of RAW events, which are then reconstructed and processed as in Run1, or via a TURBO stream, which directly records the results of the online reconstruction on tape. TURBO data are subsequently reformatted into a $\mu$DST format that does not require further processing; they are stored on disk and can be used right away for physics analysis.
The information saved for an event of the TURBO stream is customizable and can range from information related to the signal candidates only up to the full event. The average size of a TURBO event is about 30 kB, to be compared with 60 kB for the full event. The TURBO output is also split into O(5) streams, in order to optimize data access.
The offline reconstruction of the FULL stream for proton collision data ran from May until November. The reconstruction of heavy-ion collision data was run in December.
A full re-stripping of the 2015, 2016 and 2017 proton collision data, started in autumn 2017, ended in April 2018. A stripping cycle of the 2015 lead collision data was also performed in that period. The stripping cycle concurrent with the 2018 proton collision data taking started in June and ran continuously until November.
The INFN Tier1 centre at CNAF was in downtime from November 2017, due to a major flood incident. The site became fully available again in March 2018, allowing the completion of the stripping cycles that had been put on hold waiting for the data located at CNAF (about 20\% of the total). Despite the unavailability of CNAF resources in the first months of 2018, the site performed excellently for the rest of the year, as testified by the numbers reported in this document.
As in previous years, LHCb continued to make use of opportunistic resources, which are not pledged to WLCG but contributed significantly to the overall usage.
\section{Resource usage in 2018}
Table~\ref{tab:pledges} shows the resources pledged for LHCb at the various tier levels for the 2018 period.
\begin{table}[htbp]
\caption{LHCb 2018 WLCG pledges.}
\centering
\begin{tabular}{lccc}
\hline
2018 & CPU & Disk & Tape \\
& kHS06 & PB & PB \\
\hline
Tier 0 & 88 & 11.4 & 33.6 \\
Tier 1 & 250 & 26.2 & 56.9 \\
Tier 2 & 164 & 3.7 & \\ \hline
Total WLCG & 502 & 41.3 & 90.5\\ \hline
\end{tabular}
\label{tab:pledges}
\end{table}
The usage of WLCG CPU resources by LHCb is obtained from the different views provided by the EGI Accounting portal. The CPU usage is presented in Figure~\ref{fig:T0T1} for the Tier0 and Tier1s and in Figure~\ref{fig:T2} for the Tier2s. The same data are presented in tabular form in Table~\ref{tab:T0T1} and Table~\ref{tab:T2}, respectively.
\begin{figure}
\begin{center}
\includegraphics[width=0.8\textwidth]{T0T1.png}
\end{center}
\caption{\label{fig:T0T1}Monthly CPU work provided by the Tier-0 and
Tier-1 centres to LHCb during 2018.}
\end{figure}
\begin{figure}
\begin{center}
\includegraphics[width=0.8\textwidth]{T2.png}
\end{center}
\caption{\label{fig:T2}Monthly CPU work provided by the Tier-2 centres to LHCb during 2018.}
\end{figure}
\begin{table}[htbp]
\caption{Average CPU power provided by the Tier-0 and the Tier-1
centres to LHCb during 2018.}
\centering
\begin{tabular}{lcc}
\hline
$<$Power$>$ & Used & Pledge \\
& kHS06 & kHS06 \\
\hline
CH-CERN & 141.0 & 88 \\
DE-KIT & 51.3 & 42.2 \\
ES-PIC & 12.8 & 14.8 \\
FR-CCIN2P3 & 43.2 & 30.4 \\
IT-INFN-CNAF & 64.1 & 46.8 \\
NL-T1 & 38.0 & 24.6 \\
RRC-KI-T1 & 22.0 & 16.4 \\
UK-T1-RAL & 71.7 & 74.8 \\
\hline
Total & 447.6 & 338.1 \\
\hline
\end{tabular}
\label{tab:T0T1}
\end{table}
\begin{table}[htbp]
\caption{Average CPU power provided by the Tier-2
centres to LHCb during 2018.}
\centering
\begin{tabular}{lcc}
\hline
$<$Power$>$ & Used & Pledge \\
& kHS06 & kHS06 \\
\hline
China & 0.3 & 0 \\
Brazil & 11.5 & 17.2 \\
France & 27.1 & 22.9 \\
Germany & 9.3 & 8.1 \\
Israel & 0.2 & 0 \\
Italy & 8.6 & 26.0 \\
Poland & 7.6 & 4.1 \\
Romania & 2.8 & 6.5 \\
Russia & 16.0 & 19.0 \\
Spain & 7.9 & 7.0 \\
Switzerland & 29.6 & 24.0 \\
UK & 85.7 & 29.3 \\
\hline
Total & 206.3 & 164.0 \\
\hline
\end{tabular}
\label{tab:T2}
\end{table}
The average power used at Tier0+Tier1s sites is about 32\% higher than the pledges, while the average power used at Tier2s is about 26\% higher.
The average CPU power accounted for by WLCG (including Tier0/1 + Tier2) amounts to 654 kHS06, to be compared to the 502 kHS06 of estimated needs quoted in Table~\ref{tab:pledges}. The Tier0 and Tier1s usage is generally higher than the pledges, as the LHCb computing model is flexible enough to use computing resources for all production workflows wherever they are available. It is important to note that this is true also for CNAF, even though it started to contribute to the computing activities only in March, after the recovery from the incident. After that, the CNAF Tier1 offered great stability, leading to maximal efficiency in the overall exploitation of the resources. The total amount of CPU used at the Tier0 and Tier1s centres is detailed in Figure~\ref{fig:T0T1_MC}, showing that about 76\% of the CPU work is due to Monte Carlo simulation. The same plot shows the start of a stripping campaign in March, corresponding to the recovery of the backlog in the restripping of the Run2 data collected in 2015-2017, accumulated because of the unavailability of CNAF after the incident of November 2017. As visible from the plot, the backlog was recovered by the end of April 2018, before the restart of data-taking operations. Although all the other Tier1s contributed to reprocessing these data, the recall from tape was performed exclusively at CNAF. Approximately 580 TB of data were recalled from tape in about 6 weeks, with a maximum throughput of about 250 MB/s.
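As a cross-check, the over-pledge fractions quoted above follow directly from the totals of Table~\ref{tab:T0T1} and Table~\ref{tab:T2}, and the tape-recall figures imply a rough average throughput (taking the six-week duration as approximate):
\[
\frac{447.6}{338.1} \simeq 1.32, \qquad
\frac{206.3}{164.0} \simeq 1.26, \qquad
\frac{580\times 10^{12}~{\rm B}}{6\times 7\times 86400~{\rm s}} \simeq 160~{\rm MB/s},
\]
the latter being consistent with the quoted 250 MB/s peak.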
\begin{figure}
\begin{center}
\includegraphics[width=0.8\textwidth]{T0T1_MC.png}
\end{center}
\caption{\label{fig:T0T1_MC}Usage of LHCb resources at Tier0 and Tier1s during 2018. The plot shows the normalized CPU usage (kHS06) for the various activities.}
\end{figure}
Since the start of data taking in May 2018, tape storage grew by about 16.7 PB. Of these, 9.5 PB were due to newly collected RAW data. The rest was due to RDST (2.6 PB) and ARCHIVE (4.6 PB), the latter due to the archival of Monte Carlo productions, of the re-stripping of former real data, and of new Run2 data. The total tape occupancy as of December 31st, 2018, is 68.9 PB, of which 38.4 PB are used for RAW data, 13.3 PB for RDST and 17.2 PB for archived data. This is 12.9\% lower than the original request of 79.2 PB. The total tape occupancy at CNAF at the end of 2018 was about 9.3 PB, of which 3.3 PB of RAW data, 3.6 PB of ARCHIVE and 2.4 PB of RDST. This corresponds to an increase of about 2.3 PB with respect to the end of 2017. These numbers are in agreement with the share of resources expected from CNAF.
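For reference, the quoted tape figures add up consistently,
\[
9.5 + 2.6 + 4.6 = 16.7~{\rm PB}, \qquad
38.4 + 13.3 + 17.2 = 68.9~{\rm PB}, \qquad
3.3 + 3.6 + 2.4 = 9.3~{\rm PB},
\]
for the overall growth, the total occupancy and the CNAF occupancy, respectively.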
\begin{table}[htbp]
\caption{Disk storage resource usage as of February 11$^{\rm th}$ 2019 for the Tier0 and Tier1s centres. The first row is taken from the LHCb accounting; the other rows (used, available and installed capacity) are taken from the recently commissioned WLCG Storage Space Accounting tool. The 2018 pledges are shown in the last row.}
\begin{center}
\resizebox{\columnwidth}{!}{
\begin{tabular}{|l|cc|ccccccc|}
\hline
Disk (PB) & CERN & Tier 1s & CNAF & GRIDKA & IN2P3 & PIC & RAL & RRCKI & SARA \\
\hline
LHCb accounting & 6.00 & 19.78 & 4.33 & 3.32 & 2.77 & 1.21 & 4.55 & 1.63 & 1.96 \\
\hline
SRM T0D1 used & 6.32 & 19.88 & 4.37 & 3.34 & 2.77 & 1.25 & 4.54 & 1.69 & 1.92 \\
SRM T0D1 free & 2.08 & 2.87 & 0.93 & 0.44 & 0.36 & 0.11 & 0.26 & 0.71 & 0.05 \\
SRM T1D0 (used+free) & 1.40 & 2.25 & 1.30 & 0.15 & 0.03 & 0.02 & 0.63 & 0.09 & 0.03 \\
\hline
SRM T0D1+T1D0 total & 9.80 & 25.00 & 6.60 & 3.93 & 3.16 & 1.38 & 5.43 & 2.49 & 2.01 \\
\hline
Pledge '18 & 11.4 & 26.25 & 5.61 & 4.01 & 3.20 & 1.43 & 7.32 & 2.30 & 2.31 \\
\hline
\end{tabular}\label{tab:disk}
}
\end{center}
\end{table}
Table~\ref{tab:disk} shows the situation of disk storage resources at CERN and at the Tier1s, as well as at each Tier1 site, as of February 11$^{\rm th}$ 2019. The used space includes derived data, i.e. DST and micro-DST of both real and simulated data, and space reserved for users. The latter accounts for 1.2 PB in total, 0.9 PB of which are used. The SRR disk used and free information (rows labelled ``SRM T0D1'' in the table) concerns only permanent disk storage. The first two rows show good agreement between what the sites report and what the LHCb accounting (first row) reports. The sum of the Tier0 and Tier1s 2018 pledges amounts to 37.7 PB. The available disk space is 35 PB in total, 26 PB of which are used to store real and simulated datasets and user data. A total of 3.7 PB is used as tape buffer, while the remaining 5 PB are free and will be used to store the output of the legacy stripping campaigns of Run1 and Run2 data that are currently being prepared. The disk space available at CNAF is about 6.6 PB, about 18\% above the pledge.
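The quoted excess at CNAF follows directly from the values of Table~\ref{tab:disk},
\[
\frac{6.60~{\rm PB}}{5.61~{\rm PB}} \simeq 1.18,
\]
i.e. about 18\% above the 2018 pledge.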
In summary, the usage of computing resources in the 2018 calendar year was quite smooth for LHCb, with simulation being the dominant activity in terms of CPU work. Additional unpledged resources, as well as cloud, on-demand and volunteer computing resources, were also successfully used; they were essential
in providing CPU work during the outage of the CNAF Tier1 centre. The INFN Tier1 at CNAF returned to fully operational status in March 2018. After that, the backlog in the restripping campaign, due to the unavailability of data stored at CNAF, was recovered in time for the restart of data taking, thanks also to the contribution of other sites. From March 2018 onwards, CNAF operated in a very efficient and reliable way, even exceeding the pledged resources in terms of delivered CPU power.
\section{Expected growth of resources in 2020-2021}
In terms of CPU requirements, the different activities result in CPU work estimates for 2020-2021 that are apportioned among the different Tiers, taking into account the computing model constraints and the capacities that are already installed. This results in the requests shown in Table~\ref{tab:req_CPU}, together with the pledged resources for 2019. The CPU work required at CNAF would correspond to about 18\% of the total CPU requested at Tier1s+Tier2s sites.
\begin{table}[htbp]
\centering
\caption{CPU power requested at the different Tiers in 2020-2021. Pledged resources for 2019 are also reported.}
\label{tab:req_CPU}
\begin{tabular}{lccc}
\hline
CPU power (kHS06) & 2019 & 2020 & 2021 \\
\hline
Tier 0 & 86 & 98 & 125\\
Tier 1 & 268 & 328 & 409\\
Tier 2 & 193 & 185 & 229\\
\hline
Total WLCG & 547 & 611 & 763\\
\hline
\end{tabular}
\end{table}
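Purely as an indication, applying the $\sim$18\% CNAF share mentioned above to the Tier1s+Tier2s requests of Table~\ref{tab:req_CPU} would correspond to roughly
\[
0.18 \times (328 + 185) \simeq 92~{\rm kHS06} \quad {\rm in~2020}, \qquad
0.18 \times (409 + 229) \simeq 115~{\rm kHS06} \quad {\rm in~2021};
\]
these are indicative estimates only, not pledged values.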
The forecast total disk and tape space usage at the end of 2020 and 2021 is broken down into fractions to be provided by the different Tiers. These numbers are shown in Table~\ref{tab:req_disk} for disk and in Table~\ref{tab:req_tape} for tape. The disk resources required at CNAF would be about 18\% of those requested at Tier1s+Tier2s sites, while for tape storage CNAF is expected to provide about 24\% of the total tape request to Tier1s sites.
\begin{table}[htbp]
\centering
\caption{LHCb disk request for each Tier level in 2020-2021. Pledged resources for 2019 are also shown.}
\label{tab:req_disk}
\begin{tabular}{lccc}
\hline
Disk (PB) & 2019 & 2020 & 2021 \\
\hline
Tier0 & 13.4 & 17.2 & 19.5 \\
Tier 1 & 29.0 & 33.2 & 39.0 \\
Tier2 & 4 & 7.2 & 7.5 \\
\hline
Total & 46.4 & 57.6 & 66.0 \\
\hline
\end{tabular}
\end{table}
\begin{table}[htbp]
\centering
\caption{LHCb tape request for each Tier level in 2020-2021. Pledged resources for 2019 are also reported.}
\label{tab:req_tape}
\begin{tabular}{lccc}
\hline
Tape (PB) & 2019 & 2020 & 2021 \\
\hline
Tier0 & 35.0 & 36.1 & 52.0 \\
Tier 1 & 53.1 & 55.5 & 90.0 \\
\hline
Total & 88.1 & 91.6 & 142.0 \\
\hline
\end{tabular}
\end{table}
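Similarly, the CNAF shares quoted above would indicatively translate, for 2020, into about
\[
0.18 \times (33.2 + 7.2) \simeq 7.3~{\rm PB~of~disk}, \qquad
0.24 \times 55.5 \simeq 13.3~{\rm PB~of~tape},
\]
with analogous figures for 2021; again, these are rough estimates derived from Tables~\ref{tab:req_disk} and~\ref{tab:req_tape}, not pledged values.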
\section{Conclusion}
A description of the LHCb computing activities during 2018 has been given, with particular emphasis on the usage of resources and on the forecast of resource needs until 2021. As in previous years, the CNAF Tier1 centre gave a substantial contribution to LHCb computing in terms of CPU work and storage made available to the collaboration. This achievement is particularly important this year, as CNAF was recovering from the major incident of November 2017 that interrupted its activities. The effects of the CNAF unavailability were overcome thanks also to extra efforts from other sites and to the opportunistic usage of non-WLCG resources. The main consequence of the incident, in terms of LHCb operations, was the delay in the restripping campaign of the data collected during 2015-2017. The data stored at CNAF (approximately 20\% of the total) were processed once the site resumed operations in March 2018. It is worth mentioning that, despite the delay, the restripping campaign was completed before the start of data taking, according to the predicted schedule, avoiding further stress on the LHCb computing operations. It should also be emphasized that an almost negligible amount of data was lost in the incident, and in any case it was possible to recover the lost files from backup copies stored at other sites.
\end{document}