A full re-stripping of the 2015, 2016 and 2017 proton collision data, started in autumn 2017, ended in April 2018. A stripping cycle of the 2015 lead collision data was also performed in that period. The stripping cycle concurrent with the 2018 proton collision data taking started in June and ran continuously until November.
The INFN Tier1 centre at CNAF had been in downtime since November 2017, due to a major flood incident. The site became fully available again in March 2018, allowing the completion of the stripping cycles that had been on hold waiting for the data located at CNAF (about 20\% of the total). Despite the unavailability of the CNAF resources in the first months of 2018, the site performed excellently for the rest of the year, as testified by the numbers reported in this report.
As in previous years, LHCb continued to make use of opportunistic resources, which are not pledged to WLCG but contributed significantly to the overall usage.
The usage of WLCG CPU resources by LHCb is obtained from the different views provided by the EGI Accounting portal. The CPU usage is presented in Figure~\ref{fig:T0T1} for the Tier0 and Tier1 sites and in Figure~\ref{fig:T2} for the Tier2 sites. The same data is presented in tabular form in Table~\ref{tab:T0T1} and Table~\ref{tab:T2}, respectively.
\begin{figure}
\begin{center}
\includegraphics[width=0.8\textwidth]{T0T1.png}
\end{center}
\caption{\label{fig:T0T1}Monthly CPU work provided by the Tier0 and Tier1 centres to LHCb during 2018.}
\end{figure}
...
\begin{center}
\includegraphics[width=0.8\textwidth]{T2.png}
\end{center}
\caption{\label{fig:T2}Monthly CPU work provided by the Tier2 centres to LHCb during 2018.}
\end{figure}
\begin{table}[htbp]
\caption{Average CPU power provided by the Tier0 and Tier1 centres to LHCb during 2018.}
\centering
\begin{tabular}{lcc}
...
\end{table}
\begin{table}[htbp]
\caption{Average CPU power provided by the Tier2 centres to LHCb during 2018.}
\centering
\begin{tabular}{lcc}
...
\label{tab:T2}
\end{table}
The average power used at the Tier0 and Tier1 sites is about 32\% higher than the pledges, while at the Tier2 sites it is about 26\% higher than the pledges.
The average CPU power accounted for by WLCG (Tier0, Tier1 and Tier2 sites) amounts to 654 kHS06, to be compared to the 502 kHS06 of estimated needs quoted in Table~\ref{tab:pledges}. The Tier0 and Tier1 usage is generally higher than the pledges, as the LHCb computing model is flexible enough to use computing resources for all production workflows wherever they are available. It is important to note that this is true also for CNAF, even though it started to contribute to the computing activities only in March, after the recovery from the incident. From then on, the CNAF Tier1 offered great stability, leading to maximal efficiency in the overall exploitation of the resources. The total amount of CPU used at the Tier0 and Tier1 centres is detailed in Figure~\ref{fig:T0T1_MC}, showing that about 76\% of the CPU work is due to Monte Carlo simulation. The same plot shows the start of a stripping campaign in March. This corresponds to the recovery of the backlog in the re-stripping of the Run2 data collected in 2015-2017, caused by the unavailability of CNAF after the incident of November 2017. As visible from the plot, the backlog was recovered by the end of April 2018, before the restart of data-taking operations. Although all the other Tier1 centres contributed to reprocessing these data, their recall from tape was done exclusively at CNAF: approximately 580 TB of data were recalled from tape in about six weeks, with a maximum throughput of about 250 MB/s.
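As a quick consistency check (assuming $1~\mathrm{TB}=10^{12}$ bytes and taking the six-week recall window at face value), the average recall rate implied by these numbers is
\[
\frac{580\times 10^{12}~\mathrm{B}}{6\times 7\times 86400~\mathrm{s}} \approx 160~\mathrm{MB/s},
\]
comfortably below the quoted peak of about 250 MB/s. Similarly, the 654 kHS06 accounted for by WLCG exceeds the 502 kHS06 of estimated needs by a factor $654/502 \approx 1.30$, i.e. about 30\%.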
\begin{figure}
...
\caption{\label{fig:T0T1_MC}Usage of LHCb resources at the Tier0 and Tier1 sites during 2018. The plot shows the normalized CPU usage (kHS06) for the various activities.}
\end{figure}
Since the start of data taking in May 2018, tape storage has grown by about 16.7 PB. Of these, 9.5 PB were due to newly collected RAW data; the rest was due to RDST (2.6 PB) and ARCHIVE (4.6 PB), the latter coming from the archival of Monte Carlo productions, the re-stripping of older real data, and new Run2 data. The total tape occupancy as of December 31st 2018 is 68.9 PB, of which 38.4 PB are used for RAW data, 13.3 PB for RDST and 17.2 PB for archived data. This is 12.9\% lower than the original request of 79.2 PB. The total tape occupancy at CNAF at the end of 2018 was about 9.3 PB, of which 3.3 PB of RAW data, 3.6 PB of ARCHIVE and 2.4 PB of RDST. This corresponds to an increase of about 2.3 PB with respect to the end of 2017. These numbers are in agreement with the share of resources expected from CNAF.
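These figures are internally consistent: the yearly tape growth and the CNAF occupancy split as
\[
9.5 + 2.6 + 4.6 = 16.7~\mathrm{PB}, \qquad 3.3 + 3.6 + 2.4 = 9.3~\mathrm{PB},
\]
while the total occupancy splits as $38.4 + 13.3 + 17.2 = 68.9~\mathrm{PB}$.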
\begin{table}[htbp]
\caption{Disk storage resource usage as of February 11$^{\rm th}$ 2019 for the Tier0 and Tier1 centres. The top row is taken from the LHCb accounting; the other rows (used, available and installed capacity) are taken from the recently commissioned WLCG Storage Space Accounting tool. The 2018 pledges are shown in the last row.}
\label{tab:disk}
...
\end{table}
Table~\ref{tab:disk} shows the situation of disk storage resources at CERN and at the Tier1 sites, in aggregate as well as for each Tier1 site, as of February 11$^{\rm th}$ 2019. The used space includes derived data, i.e. DST and micro-DST of both real and simulated data, and space reserved for users; the latter accounts for 1.2 PB in total, 0.9 PB of which are used. The SRR disk used and SRR disk free information concerns only permanent disk storage (previously known as ``T0D1''). The first two lines show good agreement between what the sites report and what the LHCb accounting (first line) reports. The sum of the Tier0 and Tier1 2018 pledges amounts to 37.7 PB. The available disk space is 35 PB in total, 26 PB of which are used to store real and simulated datasets and user data. A total of 3.7 PB is used as tape buffer; the remaining 5 PB are free and will be used to store the output of the legacy stripping campaigns of Run1 and Run2 data that are currently being prepared. The disk space available at CNAF is about 6.6 PB, about 18\% above the pledge.
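Up to rounding, the disk budget also adds up: used space, tape buffer and free space combine to
\[
26 + 3.7 + 5 \approx 35~\mathrm{PB},
\]
i.e. 34.7 PB, consistent with the quoted 35 PB total at the precision of the inputs.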
In summary, the usage of computing resources in the 2018 calendar year has been quite smooth for LHCb. Simulation is the dominant activity in terms of CPU work. Additional unpledged resources, as well as cloud, on-demand and volunteer computing resources, were also used successfully; they were essential in providing CPU work during the outage of the CNAF Tier1 centre. The INFN Tier1 at CNAF returned to its fully operational status in March 2018. After that, the backlog in the re-stripping campaign, due to the unavailability of the data stored at CNAF, was recovered in time for the restart of data taking, thanks also to the contribution of other sites. After March 2018, CNAF operated in a very efficient and reliable way, even over-performing in terms of CPU power with respect to the pledged resources.
\section{Expected growth of resources in 2020-2021}
In terms of CPU requirements, the different activities result in CPU work estimates for 2020-2021, which are apportioned among the different Tiers taking into account the computing model constraints and the capacities that are already installed. This results in the requests shown in Table~\ref{tab:req_CPU}, together with the pledged resources for 2019. The CPU work required at CNAF would correspond to about 18\% of the total CPU requested at the Tier1 and Tier2 sites.
\begin{table}[htbp]
\centering
\caption{CPU power requested at the different Tiers in 2020-2021. Pledged resources for 2019 are also reported.}
...
\end{tabular}
\end{table}
The forecast total disk and tape space usage at the end of the years 2019 and 2020 is broken down into fractions to be provided by the different Tiers. These numbers are shown in Table~\ref{tab:req_disk} for disk and in Table~\ref{tab:req_tape} for tape. The disk resources required at CNAF would be about 18\% of those requested at the Tier1 and Tier2 sites, while for tape storage CNAF is expected to provide about 24\% of the total tape request to the Tier1 sites.
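For illustration only (an inference from the quoted shares and the 2020 requests reported in the tables below, not a figure taken from the requests themselves), these fractions would correspond at CNAF to roughly
\[
0.18\times(33.2+7.2) \approx 7.3~\mathrm{PB}~\mathrm{(disk)}, \qquad 0.24\times 55.5 \approx 13.3~\mathrm{PB}~\mathrm{(tape)}
\]
for 2020.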
\begin{table}[htbp]
\centering
...
\hline
Disk (PB) & 2019 & 2020 & 2021 \\
\hline
Tier0 & 13.4 & 17.2 & 19.5 \\
Tier1 & 29.0 & 33.2 & 39.0 \\
Tier2 & 4.0 & 7.2 & 7.5 \\
\hline
Total & 46.4 & 57.6 & 66.0 \\
\hline
...
\hline
Tape (PB) & 2019 & 2020 & 2021 \\
\hline
Tier0 & 35.0 & 36.1 & 52.0 \\
Tier1 & 53.1 & 55.5 & 90.0 \\
\hline
Total & 88.1 & 91.6 & 142.0 \\
...
\section{Conclusion}
A description of the LHCb computing activities during 2018 has been given, with particular emphasis on the usage of resources and on the forecast of resource needs until 2021. As in previous years, the CNAF Tier1 centre made a substantial contribution to LHCb computing in terms of CPU work and of storage made available to the collaboration. This achievement is particularly important this year, as CNAF was recovering from the major incident of November 2017 that unfortunately interrupted its activities. The effects of the CNAF unavailability were overcome thanks also to extra efforts from other sites and to the opportunistic usage of non-WLCG resources. The main consequence of the incident, in terms of LHCb operations, was the delay in the re-stripping campaign of the data collected during 2015-2017. The data stored at CNAF (approximately 20\% of the total) were processed once the site resumed operations in March 2018. It is worth mentioning that, despite the delay, the re-stripping campaign was completed before the start of data taking, according to the predicted schedule, avoiding further stress on the LHCb computing operations. It should also be emphasized that an almost negligible amount of data was lost in the incident, and that it was in any case possible to recover it from backup copies stored at other sites.