The ATLAS experiment at the LHC was fully operational in 2018. In this contribution we describe the ATLAS computing activities performed at the Italian sites of the Collaboration, in particular the use of the CNAF Tier-1.
\end{abstract}
\section{Introduction}
ATLAS \cite{ATLAS-det} is one of the two general-purpose detectors at the Large Hadron Collider (LHC). It investigates a wide range of physics, from the search for the Higgs boson and Standard Model studies to extra dimensions and particles that could make up dark matter. Beams of particles from the LHC collide at the centre of the ATLAS detector, producing collision debris in the form of new particles that fly out from the collision point in all directions. Six different detecting subsystems arranged in layers around the collision point record the paths, momenta and energies of the particles, allowing them to be individually identified. A huge magnet system bends the paths of charged particles so that their momenta can be measured. The interactions in the ATLAS detector create an enormous flow of data. To digest these data, ATLAS uses an advanced trigger system to decide which events to record and which to ignore. Complex data-acquisition and computing systems are then used to analyse the recorded collision events. At 46~m long, 25~m high and 25~m wide, the 7000-tonne ATLAS detector is the largest-volume particle detector ever built. It sits in a cavern 100~m below ground near the main CERN site, close to the village of Meyrin in Switzerland.
More than 3000 scientists from 174 institutes in 38 countries work on the ATLAS experiment.
ATLAS took data from 2010 to 2012 at centre-of-mass energies of 7 and 8~TeV, collecting about 5 and 20~fb$^{-1}$ of integrated luminosity, respectively. During the complete Run-2 phase (2015--2018), ATLAS collected and registered at the Tier-0 147~fb$^{-1}$ of integrated luminosity at a centre-of-mass energy of 13~TeV.
The experiment has been designed to search for New Physics over a very large set of final states and signatures, and to perform precision measurements of known Standard Model (SM) processes. Its most notable result to date has been the discovery of a new resonance at a mass of about 125~GeV \cite{ATLAS higgs}, followed by the measurement of its properties (mass, production cross sections in various channels, and couplings). These measurements have confirmed the compatibility of the new resonance with the Higgs boson, predicted by the SM but never observed before.
\section{The ATLAS Computing System}
The ATLAS Computing System \cite{ATLAS-cm} is responsible for the provision of the software framework and services, the data management system, user-support services, and the world-wide data access and job-submission system. The development of detector-specific algorithmic code for simulation, calibration, alignment, trigger and reconstruction is the responsibility of the detector projects, but the Software and Computing Project plans and coordinates these activities across detector boundaries. In particular, a significant effort has been made to ensure that relevant parts of the ``offline'' framework and event-reconstruction code can be used in the High Level Trigger. Similarly, close cooperation with Physics Coordination and the Combined Performance groups ensures the smooth development of global event-reconstruction code and of software tools for physics analysis.
\subsection{The ATLAS Computing Model}
The ATLAS Computing Model embraces the Grid paradigm and a high degree of decentralisation and sharing of computing resources. The required level of computing resources means that off-site facilities are vital to the operation of ATLAS in a way that was not the case for previous CERN-based experiments. The primary event processing occurs at CERN in a Tier-0 facility. The RAW data is archived at CERN and copied (along with the primary processed data) to the Tier-1 facilities around the world. These facilities archive the raw data, provide the reprocessing capacity, provide access to the various processed versions, and allow scheduled analysis of the processed data by physics analysis groups. Derived datasets produced by the physics groups are copied to the Tier-2 facilities for further analysis. The Tier-2 facilities also provide the simulation capacity for the experiment, with the simulated data housed at the Tier-1s. In addition, Tier-2 centres provide analysis facilities, and some provide the capacity to produce calibrations based on processing raw data. A CERN Analysis Facility provides additional analysis capacity, with an important role in calibration and algorithmic development work.

ATLAS has adopted an object-oriented approach to software, based primarily on the C++ programming language, with some components implemented in FORTRAN and Java. A component-based model has been adopted, whereby applications are built up from collections of plug-compatible components based on a variety of configuration files. This capability is supported by a common framework that provides data-processing support. This approach gives great flexibility both in meeting the basic processing needs of the experiment and in responding to changing requirements throughout its lifetime.
The heavy use of abstract interfaces allows for different implementations to be provided, supporting different persistency technologies, or optimized for the offline or high-level trigger environments.
The Athena framework is an enhanced version of the Gaudi framework, originally developed by the LHCb experiment and now a common ATLAS-LHCb project. Its major design principles are the clear separation of data and algorithms, and of transient (in-memory) and persistent (in-file) data. All levels of processing of ATLAS data, from the high-level trigger to event simulation, reconstruction and analysis, take place within the Athena framework; this makes it easier for code developers and users to test and run algorithmic code, with the assurance that all geometry and conditions data will be the same for all types of applications (simulation, reconstruction, analysis, visualization).
One of the principal challenges for ATLAS computing is to develop and operate a data storage and management infrastructure able to meet the demands of a yearly data volume of $\mathcal{O}(10)$~PB used by data processing and analysis activities spread around the world. The ATLAS Computing Model establishes the environment and operational requirements that ATLAS data-handling systems must support and provides the primary guidance for the development of the data management systems.
The ATLAS Databases and Data Management Project (DB Project) leads and coordinates ATLAS activities in these areas, with a scope encompassing technical databases (detector production, installation and survey data), detector geometry, online/TDAQ databases, conditions databases (online and offline), event data, offline processing configuration and bookkeeping, distributed data management, and distributed database and data management services. The project is responsible for ensuring the coherent development, integration and operational capability of the distributed database and data management software and infrastructure for ATLAS across these areas.
The ATLAS Computing Model defines the distribution of raw and processed data to Tier-1 and Tier-2 centres, so as to fully exploit the computing resources made available to the Collaboration. Additional computing resources are available for data processing and analysis at Tier-3 centres and other computing facilities to which ATLAS may have access. A complex set of tools and distributed services, enabling the automatic distribution and processing of large amounts of data, has been developed and deployed by ATLAS in cooperation with the LHC Computing Grid (LCG) Project and with the middleware providers of the three large Grid infrastructures we use: EGI, OSG and NorduGrid. The tools are designed to be flexible, so that they can be extended to other types of Grid middleware in the future.
The main computing operations that ATLAS has to run comprise the preparation, distribution and validation of ATLAS software, and the computing and data management operations run centrally on the Tier-0, Tier-1s and Tier-2s. The ATLAS Virtual Organization allows production and analysis users to run jobs and access data at remote sites using the ATLAS-developed Grid tools.
The Computing Model, together with the knowledge of the resources needed to store and process each ATLAS event, gives rise to estimates of required resources that can be used to design and set up the various facilities. It is not assumed that all Tier-1s or Tier-2s are of the same size; however, in order to ensure a smooth operation of the Computing Model, all Tier-1s usually have broadly similar proportions of disk, tape and CPU, and similarly for the Tier-2s.
The organization of the ATLAS Software and Computing Project reflects all areas of activity within the project itself. Strong high-level links are established with other parts of the ATLAS organization, such as the TDAQ Project and Physics Coordination, through cross-representation in the respective steering boards. The Computing Management
Board, and in particular the Planning Officer, makes sure that software and computing developments take place coherently across sub-systems and that the project as a whole meets its milestones. The International Computing Board ensures the flow of information between the ATLAS Software and Computing Project and the national resources and their Funding Agencies.
\section{The role of the Italian Computing facilities in the global ATLAS Computing}
Italy provides Tier-1, Tier-2 and Tier-3 facilities to the ATLAS collaboration. The Tier-1, located at CNAF, Bologna, is the main centre, also referred to as the ``regional'' centre. The Tier-2 centres are distributed in different areas of Italy, namely Frascati, Napoli, Milano and Roma. All four Tier-2 sites are classified as Direct Tier-2s (T2D), meaning that they have a higher importance than normal Tier-2s and can also host primary data. They are also considered satellites of the Tier-1, which is identified as the nucleus. In terms of disk and CPU, the Tier-2 sites together exceed the total ATLAS share at the Tier-1; tape is not available at the Tier-2 sites. A third category of sites is the so-called Tier-3 centres. These are smaller centres, scattered across Italy, that nevertheless contribute significantly to the overall computing power in terms of disk and CPU. The overall size of the Tier-3 sites corresponds roughly to the size of one Tier-2 site. The Tier-1 and Tier-2 sites have pledged resources, while the Tier-3 sites have no pledged resources.
In terms of pledged resources, Italy contributes 9\% of both the CPU and the disk of the ATLAS Tier-1 resources. The share of the Tier-2 facilities corresponds to 7\% of disk and 9\% of CPU of the whole ATLAS computing infrastructure. The Italian Tier-1, together with the other Italian centres, provides both resources and expertise to the ATLAS computing community, and manages the so-called Italian Cloud. Since 2015 the Italian Cloud has included not only Italian sites but also Tier-3 sites in other countries, namely South Africa and Greece.
The computing resources (disk, tape and CPU) available at the CNAF Tier-1 have been very important for all kinds of activities, including event generation, simulation, reconstruction, reprocessing and analysis, for both Monte Carlo and real data. Its major contribution has been data reprocessing, since this is a very I/O- and memory-intensive operation, normally executed only at Tier-1 centres. In this sense CNAF played a fundamental role in the precision measurement of the Higgs boson \cite{ATLAS higgs} properties in 2018 and in other analyses. The Italian centres, including CNAF, have been very active not only on the operations side but have also contributed to various aspects of ATLAS computing, in particular the network, the storage systems, the storage federations and the monitoring tools. In 2018 the Tier-1 at CNAF was particularly important for the ATLAS community in the following specific activities:
\begin{itemize}
\item improvements to WebDAV/HTTPS access for StoRM, so that it can be used as the main renaming method for ATLAS files in StoRM and for HTTP federation purposes;
\item improvements to the dynamic model of the multi-core resources operated via the LSF resource management system, and simplification of the PanDA queues, using the Harvester service to mediate the control and information flow between PanDA and the resources;
\item network troubleshooting via the perfSONAR-PS network monitoring system, used for the LHCONE overlay network, together with the other Tier-1 and Tier-2 sites;
\item planning, readiness testing and implementation of the HTCondor batch system for managing the farm resources.
\end{itemize}
\section{Main achievements of ATLAS Computing centers in Italy}
The Italian Tier-2 Federation runs all the ATLAS computing activities in the Italian cloud, supporting the operations at CNAF, the Italian Tier-1 centre, and at the Milano, Napoli, Roma1 and Frascati Tier-2 sites. This ensures an optimized use of the resources and fair and efficient data access. The computing activities of the ATLAS collaboration were carried out continuously throughout 2018, in order to analyse the Run-2 data and produce the Monte Carlo data needed for the 2018 run.
The LHC data taking started in April 2018 and, until the end of operations in December 2018, all the Italian sites (the CNAF Tier-1 and the four Tier-2s) were involved in all the computing operations of the collaboration: data reconstruction, Monte Carlo simulation, user and group analysis, and data transfer among all the sites. Besides these activities, the Italian centres contributed to the upgrade of the Computing Model, both through testing and through the development work of dedicated working groups. In 2018 ATLAS collected and registered at the Tier-0 $\sim$60.6~fb$^{-1}$ of integrated luminosity and $\sim$25~PB of raw and derived data, while the cumulative data volume distributed across all the data centres in the grid was of the order of 80~PB. The data were replicated with an efficiency of 100\% and an average throughput of $\sim$13~GB/s during the data-taking period, with peaks above 25~GB/s. For Italy alone, the average throughput was of the order of 800~MB/s, with peaks above 2~GB/s. Data replication from the Tier-0 to the Tier-2s was fast, with transfer times below 4 hours. The average number of simultaneous jobs running on the grid was about 110k for production (simulation and reconstruction) and data analysis, with peaks over 150k and an average CPU efficiency reaching more than 80\%. The use of the grid for analysis was stable at $\sim$26k simultaneous jobs, with peaks above 40k around conference periods, showing the reliability and effectiveness of grid tools for data analysis.
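
To give a sense of scale for the average transfer rates quoted above, the short conversion below is only an order-of-magnitude estimate: it assumes, purely for illustration, that the average rates were sustained over a full day ($86\,400$~s) and uses decimal units (1~PB $=10^{6}$~GB, 1~TB $=10^{3}$~GB). It is not a measured daily volume.
\[
13~\mathrm{GB/s} \times 86\,400~\mathrm{s/day} \approx 1.12\times10^{6}~\mathrm{GB/day} \approx 1.1~\mathrm{PB/day},
\]
\[
0.8~\mathrm{GB/s} \times 86\,400~\mathrm{s/day} \approx 6.9\times10^{4}~\mathrm{GB/day} \approx 69~\mathrm{TB/day}.
\]
Under the same assumption, the first line refers to the global grid average and the second to the Italian sites alone.
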
The Italian sites contributed to the development of the XRootD and HTTP/WebDAV federations. In the latter, access to the storage resources is managed using the HTTP/WebDAV protocol, in collaboration with the CERN DPM team, the Belle2 experiment, the Canadian Corporate Cloud and the RAL (UK) site. The purpose is to build a reliable storage federation, alternative to the XRootD one, to access physics data both on the grid and on cloud storage infrastructures (such as Amazon S3 and Microsoft Azure). The Italian community is particularly involved in this project, and the first results have been presented to the WLCG collaboration.
The Italian community also contributes to the development of new tools for distributed data analysis and management. Another topic of interest is the use of new computing technologies: in this field the Italian community contributed to the development and testing of muon tracking algorithms for the ATLAS High Level Trigger using GPGPUs. The Italian community is also involved in Machine Learning/Deep Learning for both analysis and Operational Intelligence, and in their applications to the experiment software and infrastructure using accelerators such as GPGPUs and FPGAs.
The contribution of the Italian sites to the computing activities, in terms of processed jobs and recorded data, has been about 9\%, in line with the resources pledged to the collaboration, with very good performance in terms of availability, reliability and efficiency. All the sites consistently rank among the top sites of the collaboration.
Besides the Tier-1 and Tier-2s, in 2018 the Tier-3s also made a significant contribution to data analysis for the Italian physics community. The Tier-3s are local farms dedicated to interactive data analysis, the last step of the analysis workflow, and to grid analysis over small data samples. Several Italian groups set up a farm for this purpose in their universities and, after a testing and validation process performed by the distributed computing team of the collaboration, all of them have been recognized as official Tier-3s of the collaboration.
\section{Impact of CNAF flooding incident on ATLAS computing activities}
The ATLAS Computing Model was designed with sufficient redundancy of the available resources to cope with emergency situations such as the flood that occurred at CNAF on 9 November 2017. Thanks to the huge effort of the whole CNAF community, the data centre gradually resumed operation from the second half of February 2018. Continuous interaction between the ATLAS distributed computing community and the CNAF staff was needed to bring computing operations fully back to normal. This close collaboration was very successful: after one month the site was almost fully operational and the ATLAS data management and processing activities were running smoothly again. In the end, the overall impact of the incident was limited, mainly thanks to the relatively quick recovery of the CNAF data centre and to the robustness of the Computing Model.
\section*{References}
\begin{thebibliography}{9}
\bibitem{ATLAS-det} ATLAS Collaboration, \emph{ATLAS Computing Technical Design Report}, ATLAS-TDR-017, CERN-LHCC-2005-022 (June 2005)
\bibitem{ATLAS higgs} ATLAS Collaboration, \emph{Observation of a new particle in the search for the Standard Model Higgs boson with the ATLAS detector at the LHC}, Phys. Lett. B \textbf{716} (2012) 1--29
\bibitem{ATLAS-cm} R. W. L. Jones and D. Barberis, \emph{The evolution of the ATLAS computing model}, J. Phys.: Conf. Ser. \textbf{219} (2010) 072037, doi:10.1088/1742-6596/219/7/072037