Commits on Source (370)
Showing with 1050 additions and 105 deletions
@@ -63,86 +63,94 @@ fi
cd ${builddir}
# prepare cover
#link_pdf cover cover.pdf
#link_pdf experiment experiment.pdf
#link_pdf datacenter datacenter.pdf
#link_pdf research research.pdf
#link_pdf transfer transfer.pdf
#link_pdf additional additional.pdf
link_pdf cover cover.pdf
link_pdf experiment experiment.pdf
link_pdf datacenter datacenter.pdf
link_pdf research research.pdf
link_pdf transfer transfer.pdf
link_pdf additional additional.pdf
build_from_source user-support main.tex *.PNG
#build_from_source ams ams.tex AMS_nuovo.pdf contributors.pdf He-MC.pdf He-MC.tiff input_output.jpg production_jobs.jpg
#build_from_source alice alice.tex *.png *.eps
#build_from_source atlas atlas.tex
#build_from_source borexino borexino.tex
#build_from_source cms report-cms-feb-2018.tex cms-jobs.eps tier-1-sr-2017.eps
build_from_source ams AMS-report-2019.tex AMS_nuovo.pdf contributors.pdf He-MC.pdf input_output.jpg production_jobs.jpg
build_from_source alice main.tex *.png
build_from_source atlas atlas.tex
build_from_source borexino Borexino_CNAFreport2018.tex
build_from_source cms report-cms-feb-2019.tex tier1-jobs-2018.pdf tier1-readiness-2018.pdf
link_pdf belle Cnaf-2019-5.0.pdf
#build_from_source cosa cosa.tex biblio.bib beegfs.PNG
#build_from_source cnprov cnprov.tex
#build_from_source cta cta.tex *.eps
#build_from_source cuore cnaf_cuore.tex cnaf_cuore.bib
#build_from_source cupid cupid.tex cupid.bib
#link_pdf dampe dampe.pdf
#link_pdf darkside ds.pdf
build_from_source cnprov cnprov.tex
build_from_source cta CTA_annualreport_2018_v1.tex *.eps
build_from_source cuore cuore.tex cuore.bib
build_from_source cupid main.tex cupid-biblio.bib
build_from_source dampe main.tex *.jpg *.png
build_from_source darkside ds-annual-report-2019.tex
#build_from_source eee eee.tex EEEarch.eps EEEmonitor.eps EEEtracks.png ELOGquery.png request.png
#build_from_source exanest exanest.tex biblio.bib monitoring.PNG storage.png
build_from_source test TEST.tex test.eps
#build_from_source fazia fazia.tex
build_from_source fermi fermi.tex
build_from_source gamma gamma.tex
build_from_source icarus report_2018.tex *.png
#build_from_source gerda gerda.tex *.pdf
#build_from_source glast glast.tex
#link_pdf juno juno.pdf
link_pdf juno juno-annual-report-2019.pdf
build_from_source km3net km3net.tex compmodel.png threetier.png
build_from_source na62 main.tex
build_from_source newchim repnewchim18.tex fig1.png
#build_from_source lhcb lhcb.tex *.jpg
#build_from_source lhcf lhcf.tex
#build_from_source limadou limadou.tex
build_from_source lhcb lhcb.tex *.png
build_from_source lhcf lhcf.tex
build_from_source limadou limadou.tex
#build_from_source lowcostdev lowcostdev.tex *.jpg
#build_from_source lspe lspe.tex biblio.bib lspe_data_path.pdf
build_from_source virgo AdV_computing_CNAF.tex
build_from_source xenon main.tex xenon-computing-model.pdf
build_from_source sc18 SC18.tex *.png
#build_from_source mw-esaco mw-esaco.tex *.png
#build_from_source mw-kube mw-kube.tex
#build_from_source mw-cdmi-storm mw-cdmi-storm.tex *.png *.jpeg
#build_from_source mw-software mw-software.tex
#build_from_source mw-iam mw-iam.tex
## Research and Developments
build_from_source sd_iam main.tex biblio.bib *.png
build_from_source sd_storm main.tex biblio.bib *.png
build_from_source sd_storm2 main.tex biblio.bib *.png
build_from_source sd_nginx_voms main.tex biblio.bib *.png
#build_from_source na62 na62.tex
#link_pdf padme padme.pdf
link_pdf padme 2019_PADMEcontribution.pdf
#build_from_source xenon xenon.tex xenon-computing-model.pdf
#build_from_source sysinfo sysinfo.tex pres_rundeck.png deploy_grafana.png
build_from_source sysinfo sysinfo.tex *.png
#link_pdf virgo VirgoComputing.pdf
#build_from_source tier1 tier1.tex
build_from_source tier1 tier1.tex *.png
#build_from_source flood theflood.tex *.png
#build_from_source farming farming.tex
build_from_source HTC_testbed HTC_testbed_AR2018.tex
build_from_source farming ARFarming2018.tex *.png *.jpg
#build_from_source dynfarm dynfarm.tex
#build_from_source storage storage.tex *.png Huawei_rack.JPG
build_from_source storage storage.tex *.PNG
#build_from_source seagate seagate.tex biblio.bib *.png *.jpg
#build_from_source dataclient dataclient.tex
#build_from_source ltpd ltpd.tex *.png
#build_from_source net net.tex *.png
build_from_source net main.tex *.png
#build_from_source ssnn1 ssnn.tex *.jpg
#build_from_source ssnn2 vmware.tex *.JPG *.jpg
#build_from_source infra Chiller.tex chiller-location.png
build_from_source audit Audit-2018.tex image.png
#build_from_source cloud_cnaf cloud_cnaf.tex *.png
#build_from_source srp SoftRel.tex ar2017.bib
build_from_source dmsq dmsq2018.tex ar2018.bib
#build_from_source st StatMet.tex sm2017.bib
build_from_source ds_eoscpilot ds_eoscpilot.tex
build_from_source ds_eoschub ds_eoschub.tex
build_from_source ds_eoscpilot ds_eoscpilot.tex *.png
build_from_source ds_eoschub ds_eoschub.tex *.png
build_from_source ds_cloud_c ds_cloud_c.tex *.png
build_from_source ds_infn_cc ds_infn_cc.tex *.png
build_from_source ds_devops_pe ds_devops_pe.tex
build_from_source ds_devops_pe ds_devops_pe.tex *.png
#build_from_source cloud_b cloud_b.tex *.png *.jpg
#build_from_source cloud_c cloud_c.tex *.png *.pdf
#build_from_source cloud_d cloud_d.tex *.png
build_from_source sdds-xdc SDDS-XDC.tex *.png
build_from_source sdds-deep SDDS-DEEP.tex *.png
build_from_source PhD_DataScience_2018 PhD-DataScience-2018.tex
build_from_source chnet dhlab.tex *.png
#build_from_source pett pett.tex bibliopett.bib
#build_from_source iso iso.tex 27001.png biblioiso.bib
build_from_source pett pett.tex bibliopett.bib
build_from_source summerstudent summerstudent.tex *.png
pdflatex ${topdir}/cnaf-annual-report-2018.tex \
&& pdflatex ${topdir}/cnaf-annual-report-2018.tex 2> /dev/null \
......
@@ -28,7 +28,7 @@
%\author{}
%\maketitle
%\includepdf[pages=1, pagecommand={\thispagestyle{empty}}]{papers/cover.pdf}
\includepdf[pages=1, pagecommand={\thispagestyle{empty}}]{papers/cover.pdf}
\newpage
\thispagestyle{empty}
@@ -82,7 +82,46 @@ Tel. +39 051 209 5475, Fax +39 051 209 5477\\
\markboth{\MakeUppercase{Introduction}}{\MakeUppercase{Introduction}}
\chapter*{Introduction}
\thispagestyle{plain}
Introducing the sixth annual report of CNAF...
\small The first months of 2018 were still affected by the flooding suffered in November 2017, and it was only in March 2018
that our data center was able to resume its full activity.
Despite this, the overall performance of the Tier 1 for the LHC experiments and for the many other astroparticle and nuclear physics experiments was very good,
enough to place CNAF's Tier 1 among the most productive ones in the WLCG ecosystem, as the reports of the experiments in this document show.
The activities of both the HPC clusters and the Cloud@CNAF infrastructure also resumed regular operations after the systems had been brought back to CNAF
from the sites that had temporarily hosted them.
The flooding did, however, have a beneficial repercussion: it sped up the decision to find a new location for our data center.
The move was already planned in order to face the challenges of High-Luminosity LHC and of the astroparticle experiments that will begin their data acquisition
in the second half of 2020, but the dramatic event of November 2017 made the fragility and weaknesses of the current installation clear.
Moreover, during 2018 three developments matured that pave the way for a strategy towards both a new site and a new computing model,
one that includes the possibility to exploit the computing power of HPC systems: the availability of a large area, the Bologna Tecnopolo, where
our new data center can be installed; the possibility of a joint upgrade together with the Italian supercomputing center CINECA, thanks to European and Italian funding;
and additional funds from the Italian Government for a project aimed at strengthening the INFN computing infrastructures.
Our R\&D activities have proceeded regularly, meeting the expected milestones and deliverables.
In particular, the path towards a European Open Science Cloud (EOSC) has seen significant progress thanks to the EOSC-hub and EOSCpilot projects,
in both of which CNAF plays an important role. Contributions to the EOSC have also come from other H2020 projects in which we are involved,
namely XDC-eXtreme DataCloud, which focuses mainly on data management services evolved for a context of distributed resources,
and DEEP-Hybrid DataCloud, which addresses the need to support intensive computing techniques requiring specialized HPC hardware
to explore very large data sets.
The External Projects and Technology Transfer (PETT) Organizational Unit has contributed to various projects in the field of computing,
communication of science, technology transfer and education. Great effort has been dedicated to the consolidation of the Technology Transfer Laboratory (INFN-TTLab),
a collaboration between CNAF and the INFN divisions of Bologna and Ferrara with the goal of promoting the transfer of our know-how towards regional enterprises.
2018 has also been the first full year in which the TTLab operated an ISO 27001 ISMS consisting of a subset of the data center resources.
This certification, acquired in order to qualify for storing and managing sensitive data,
could open new opportunities for exploiting our resources in the near future.
Also noteworthy is the involvement of CNAF in the INFN Cultural Heritage Network (CHNet),
where our expertise in Cloud technologies and software development is put to good use in the preparation of a digital library
where members of the network can safely store their datasets and access applications to process them.
This report on the accomplishments of CNAF during 2018 arrives just at the end of 2019.
The delay is due to higher-priority commitments that overlapped with its finalization,
but we are well aware that such a situation affects its usefulness as a means of transparency towards our stakeholders
and of recognition of the hard work and dedication of the personnel of the Center.
To prevent similar situations in the future we are adopting some corrections to the editing process
already for the report on 2019, and we are also planning some interesting surprises that we hope will please our readers.
\begin{flushright}
\parbox{0.7\textwidth}{
@@ -127,7 +166,7 @@ Introducing the sixth annual report of CNAF...
%\addcontentsline{toc}{chapter}{Scientific Exploitation of CNAF ICT Resources}
%\includepdf[pages=1, pagecommand={\thispagestyle{empty}}]{papers/esperiment.pdf}
%\includepdf[pages=1, pagecommand={\thispagestyle{empty}}]{papers/experiment.pdf}
%\ip{Scientific Exploitation of CNAF ICT Resources}
@@ -141,35 +180,38 @@ Introducing the sixth annual report of CNAF...
\phantomsection
\addcontentsline{toc}{part}{Scientific Exploitation of CNAF ICT Resources}
\addtocontents{toc}{\protect\mbox{}\protect\hrulefill\par}
%\includepdf[pages=1, pagecommand={\thispagestyle{empty}}]{papers/experiment.pdf}
\includepdf[pages=1, pagecommand={\thispagestyle{empty}}]{papers/experiment.pdf}
\cleardoublepage
\ia{User and Operational Support at CNAF}{user-support}
%\ia{ALICE computing at the INFN CNAF Tier 1}{alice}
%\ia{AMS-02 data processing and analysis at CNAF}{ams}
%\ia{The ATLAS experiment at the INFN CNAF Tier 1}{atlas}
%\ia{The Borexino-SOX experiment at the INFN CNAF Tier 1}{borexino}
%\ia{The Cherenkov Telescope Array}{cta}
%\ia{The CMS experiment at the INFN CNAF Tier 1}{cms}
%\ia{CSES-Limadou at CNAF}{limadou}
%\ia{CUORE experiment}{cuore}
%\ia{CUPID-0 experiment}{cupid}
%\ia{DAMPE data processing and analysis at CNAF}{dampe}
%\ia{DarkSide-50 experiment at CNAF}{darkside}
\ia{ALICE computing at the INFN CNAF Tier 1}{alice}
\ia{AMS-02 data processing and analysis at CNAF}{ams}
\ia{The ATLAS experiment at the INFN CNAF Tier 1}{atlas}
\ia{The Borexino experiment at the INFN-CNAF}{borexino}
\ia{The Cherenkov Telescope Array}{cta}
\ia{The CMS experiment at the INFN CNAF Tier 1}{cms}
\ia{The Belle II experiment at CNAF}{belle}
\ia{CSES-Limadou at CNAF}{limadou}
\ia{CUORE experiment}{cuore}
\ia{CUPID-0 experiment}{cupid}
\ia{DAMPE data processing and analysis at CNAF}{dampe}
\ia{DarkSide program at CNAF}{darkside}
%\ia{The EEE Project activity at CNAF}{eee}
\ia{TEST FOR COMMITTEE}{test}
\ia{The \emph{Fermi}-LAT experiment}{fermi}
%\ia{Fazia: running dynamical simulations for heavy ion collisions at Fermi energies}{fazia}
%\ia{The Fermi-LAT experiment}{glast}
\ia{GAMMA experiment}{gamma}
\ia{ICARUS}{icarus}
%\ia{The GERDA experiment}{gerda}
%\ia{Juno experimenti at CNAF}{juno}
\ia{Juno experiment at CNAF}{juno}
\ia{The KM3NeT neutrino telescope network and CNAF}{km3net}
\ia{The NEWCHIM activity at CNAF for the CHIMERA and FARCOS devices}{newchim}
%\ia{LHCb Computing at CNAF}{lhcb}
%\ia{The LHCf experiment}{lhcf}
\ia{LHCb Computing at CNAF}{lhcb}
\ia{The LHCf experiment}{lhcf}
%\ia{The LSPE experiment at INFN CNAF}{lspe}
%\ia{The NA62 experiment at CERN}{na62}
%\ia{The PADME experiment at INFN CNAF}{padme}
%\ia{XENON computing activities}{xenon}
\ia{The NA62 experiment at CERN}{na62}
\ia{The NEWCHIM activity at CNAF for the CHIMERA and FARCOS devices}{newchim}
\ia{The PADME experiment at INFN CNAF}{padme}
\ia{XENON computing model}{xenon}
\ia{Advanced Virgo computing at CNAF}{virgo}
%
% to keep together the next part title with its chapters in the toc
%\addtocontents{toc}{\newpage}
@@ -179,67 +221,67 @@ Introducing the sixth annual report of CNAF...
\phantomsection
\addcontentsline{toc}{part}{The Tier 1 and Data center}
\addtocontents{toc}{\protect\mbox{}\protect\hrulefill\par}
%\includepdf[pages=1, pagecommand={\thispagestyle{empty}}]{papers/datacenter.pdf}
%\ia{The INFN Tier 1 data center}{tier1}
%\ia{The computing farm}{farming}
%\ia{Data management and storage systems}{storage}
\includepdf[pages=1, pagecommand={\thispagestyle{empty}}]{papers/datacenter.pdf}
\ia{The INFN Tier 1}{tier1}
\ia{The INFN-Tier 1: the computing farm}{farming}
\ia{Data management and storage systems}{storage}
%\ia{Evaluation of the ClusterStor G200 Storage System}{seagate}
%\ia{Activity of the INFN CNAF Long Term Data Preservation (LTDP) group}{ltpd}
%\ia{The INFN Tier 1: Network}{net}
\ia{The INFN-Tier 1: Network and Security}{net}
%\ia{Cooling system upgrade and Power Usage Effectiveness improvement in the INFN CNAF Tier 1 infrastructure}{infra}
%\ia{National ICT Services Infrastructure and Services}{ssnn1}
%\ia{National ICT Services hardware and software infrastructures for Central Services}{ssnn2}
%\ia{The INFN Information System}{sysinfo}
%\ia{CNAF Provisioning system: On the way to Puppet 5}{cnprov}
\ia{The INFN Information System}{sysinfo}
\ia{CNAF Provisioning system: Puppet 5 upgrade}{cnprov}
\ia{Evaluating Migration of INFN–T1 from
CREAM-CE/LSF to HTCondor-CE/HTCondor}{HTC_testbed}
\cleardoublepage
\thispagestyle{empty}
\phantomsection
\addcontentsline{toc}{part}{Research and Developments}
\addtocontents{toc}{\protect\mbox{}\protect\hrulefill\par}
%\includepdf[pages=1, pagecommand={\thispagestyle{empty}}]{papers/research.pdf}
\includepdf[pages=1, pagecommand={\thispagestyle{empty}}]{papers/research.pdf}
\cleardoublepage
%\ia{Continuous Integration and Delivery with Kubernetes}{mw-kube}
%\ia{Middleware support, maintenance and development}{mw-software}
%\ia{Evolving the INDIGO IAM service}{mw-iam}
%\ia{Esaco: an OAuth/OIDC token introspection service}{mw-esaco}
%\ia{StoRM Quality of Service and Data Lifecycle support through CDMI}{mw-cdmi-storm}
%\ia{A low-cost platform for space software development}{lowcostdev}
%\ia{Overview of Software Reliability literature}{srp}
\ia{Evolving the INDIGO IAM service}{sd_iam}
\ia{StoRM maintenance and evolution}{sd_storm}
\ia{StoRM 2: initial design and development activities}{sd_storm2}
\ia{A VOMS module for the Nginx web server}{sd_nginx_voms}
\ia{Comparing Data Mining Techniques for Software Defect Prediction}{dmsq}
%\ia{Summary of a tutorial on statistical methods}{st}
%\ia{Dynfarm: Transition to Production}{dynfarm}
%\ia{Official testing and increased compatibility for Dataclient}{dataclient}
\ia{Common software lifecycle management in external projects: Placeholder}{ds_devops_pe}
\ia{EOSC-hub: Placeholder}{ds_eoschub}
\ia{Common software lifecycle management in external projects}{ds_devops_pe}
\ia{EOSC-hub: contributions to project achievements}{ds_eoschub}
\ia{EOSCpilot - Interoperability aspects and results}{ds_eoscpilot}
\ia{Cloud@CNAF Management and Evolution}{ds_cloud_c}
\ia{INFN CorporateCloud: Management and evolution}{ds_infn_cc}
\ia{eXtreme DataCloud project: Advanced data management services for distributed e-infrastructures}{sdds-xdc}
\ia{DEEP-HybridDataCloud project: Hybrid services for distributed e-infrastructures}{sdds-deep}
\ia{DHLab: a digital library for the INFN Cultural Heritage Network}{chnet}
\cleardoublepage
\thispagestyle{empty}
\phantomsection
\addcontentsline{toc}{part}{Technology transfer and other projects}
\addcontentsline{toc}{part}{Technology transfer, outreach and more}
\addtocontents{toc}{\protect\mbox{}\protect\hrulefill\par}
%\includepdf[pages=1, pagecommand={\thispagestyle{empty}}]{papers/transfer.pdf}
\includepdf[pages=1, pagecommand={\thispagestyle{empty}}]{papers/transfer.pdf}
\cleardoublepage
%\ia{External projects and Technology transfer}{pett}
%\ia{The ISO 27001 Certification}{iso}
%\ia{COmputing on SoC Architectures: the COSA project at CNAF}{cosa}
%\ia{The ExaNeSt project - activities at CNAF}{exanest}
\ia{External Projects and Technology Transfer}{pett}
\ia{INFN CNAF log analysis: a first experience with summer students}{summerstudent}
\ia{The annual international conference of high performance computing: SC18 from INFN point of view}{sc18}
\ia{Infrastructures and Big Data processing as pillars in the XXXIII PhD course in Data Science and Computation}{PhD_DataScience_2018}
\ia{Internal Auditing INFN for GDPR compliance}{audit}
\cleardoublepage
\thispagestyle{empty}
\phantomsection
\addcontentsline{toc}{part}{Additional information}
\addtocontents{toc}{\protect\mbox{}\protect\hrulefill\par}
%\includepdf[pages=1, pagecommand={\thispagestyle{empty}}]{papers/additional.pdf}
\includepdf[pages=1, pagecommand={\thispagestyle{empty}}]{papers/additional.pdf}
\cleardoublepage
\ia{Infrastructures and Big Data processing as pillars in the XXXIII PhD course in Data Science and Computation}{PhD_DataScience_2018}
\phantomsection
\addcontentsline{toc}{chapter}{Organization}
\markboth{\MakeUppercase{Organization}}{\MakeUppercase{Organization}}
@@ -257,14 +299,14 @@ Gaetano Maron
\subsection*{Scientific Advisory Panel}
\begin{tabular}{ l l p{7cm} }
\textit{Chairperson} & Michael Ernst & \textit{\small Brookhaven National Laboratory, USA} \\
& Gian Paolo Carlino & \textit{\small INFN -- Sezione di Napoli, Italy} \\
& Patrick Fuhrmann & \textit{\small Deutsches Elektronen-Synchrotron, Germany} \\
& José Hernandez & \textit{\small Centro de Investigaciones Energéticas, Medioambientales y Tecnológicas, Spain} \\
& Donatella Lucchesi & \textit{\small Università di Padova, Italy} \\
& Vincenzo Vagnoni & \textit{\small INFN -- Sezione di Bologna, Italy} \\
& Pierre-Etienne Macchi & \textit{\small IN2P3/CNRS, France}
\textit{Chairperson} & Eleonora Luppi & \textit{\small Università di Ferrara, Italy} \\
& Roberto Saban & \textit{\small INFN, Italy} \\
& Laura Perini & \textit{\small Università di Milano, Italy} \\
& Volker Beckman & \textit{\small IN2P3, France} \\
& Volker Guelzow & \textit{\small Deutsches Elektronen-Synchrotron, Germany} \\
& Alberto Pace & \textit{\small CERN} \\
& Eric Lancon & \textit{\small Brookhaven National Laboratory, USA} \\
& José Hernandez & \textit{\small Centro de Investigaciones Energéticas, Medioambientales y Tecnológicas, Spain}
\end{tabular}
% open local environment where the format of section and subsection
......
\documentclass[a4paper]{jpconf}
\usepackage[english]{babel}
% \usepackage{cite}
\usepackage{biblatex}
%\bibliographystyle{abnt-num}
%%%%%%%%%% Start TeXmacs macros
\newcommand{\tmtextit}[1]{{\itshape{#1}}}
\newenvironment{itemizedot}{\begin{itemize} \renewcommand{\labelitemi}{$\bullet$}\renewcommand{\labelitemii}{$\bullet$}\renewcommand{\labelitemiii}{$\bullet$}\renewcommand{\labelitemiv}{$\bullet$}}{\end{itemize}}
%%%%%%%%%% End TeXmacs macros
\begin{document}
\title{Evaluating Migration of INFN--Tier 1 from CREAM-CE/LSF to
HTCondor-CE/HTCondor}
\author{Stefano Dal Pra$^1$}
\address{$^1$ INFN-CNAF, Bologna, IT}
\ead{stefano.dalpra@cnaf.infn.it}
\begin{abstract}
The Tier 1 data center provides computing resources to a variety of HEP and
astrophysics experiments, organized in Virtual Organizations that submit
their jobs to our computing facilities through Computing Elements, which act
as Grid interfaces to the Local Resource Management System (LRMS). We planned
to phase out our current LRMS (IBM/Platform LSF 9.1.3) and CEs (CREAM),
adopting HTCondor as a replacement for LSF and HTCondor--CE instead of CREAM.
A small cluster has been set up to practice the management of the new system
and to evaluate a migration plan to the new LRMS and CE set. This document
reports on our early experience.
\end{abstract}
\section{Introduction}
The INFN-Tier 1 currently provides a computing power of about 400 kHS06,
corresponding to 35000 slots on about one thousand physical Worker Nodes.
These resources are accessed through the Grid by 24 Grid VOs and locally by
25 user groups.

The IBM/Platform LSF 9.1.3 batch system arbitrates access among all the
competing user groups, both Grid and local, according to a
\tmtextit{fairshare} policy designed to prevent underutilization of the
available resources or starvation of lower-priority groups, while ensuring a
medium--term share proportional to the configured quotas.

The CREAM--CEs act as a frontend to the underlying LSF batch system for Grid
users, submitting jobs on their behalf. This setup has proven to be an
effective solution for several years. However, the compatibility between
CREAM and HTCondor seems to be less tight than with LSF. Moreover, active
development of CREAM has recently ceased, so we cannot expect new versions
to be released, nor better HTCondor support to be implemented by an official
development team. We therefore decided to migrate our batch system from LSF
to HTCondor, which also requires changing our CEs. We selected HTCondor--CE
as the natural choice, because it is maintained by the same development team
as HTCondor. In the following we report on our experience with HTCondor and
HTCondor--CE.
\section{The HTCondor cluster}
To get acquainted with the new batch system and CE, to evaluate how they can
work together and how other components, such as the monitoring, provisioning
and accounting systems, can be integrated with HTCondor and HTCondor--CE,
and finally to devise a reasonable migration plan, a small HTCondor 8.6.13
cluster was set up during spring 2018. A HTCondor--CE was added soon after,
in late April. HTCondor is a very mature open-source product, deployed at
several major Tier 1 centers for years, so we already know that it fits our
use cases. The HTCondor--CE, on the other hand, is a more recent product,
and a number of issues might prove too problematic for us to deal with. Our
focus is therefore on ensuring that this CE implementation can be a viable
solution for us.
\subsection{The testbed}
The test cluster consists of:
\begin{itemizedot}
\item a HTCondor--CE on top of
  \item a HTCondor Central Manager and Collector
  \item 3 Worker Nodes (Compute Nodes, in HTCondor terms), with 16 slots each.
\end{itemizedot}
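The role split above can be sketched with a few standard HTCondor configuration knobs (an illustrative fragment, not the actual testbed configuration; the central-manager host name is hypothetical):

```
# All nodes point at the same central manager (hypothetical host name)
CONDOR_HOST = htc-cm.cr.cnaf.infn.it

# On the central manager host: collector and negotiator
DAEMON_LIST = MASTER, COLLECTOR, NEGOTIATOR

# On each worker ("compute") node: execution daemon only,
# exposing 16 static slots per node
DAEMON_LIST = MASTER, STARTD
NUM_SLOTS = 16
```

The HTCondor--CE then runs on its own host, on top of a regular SCHEDD that routes jobs into this pool.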
\subsection{HTCondor--CE Installation and setup}
The first CE installation was a bit tricky: the RPMs were available from OSG
repositories only, meaning that a number of default settings and
dependencies did not match EGI standards. Shortly after, however,
HTCondor--CE RPMs were made available in the same official repository as
HTCondor.
\subsubsection{Setup}
Puppet modules are available to configure both HTCondor and HTCondor--CE.
Unfortunately, they depend on \tmtextit{hiera}, which is not supported at
our site, and were thus incompatible with our puppet system. They were later
adapted to our configuration management system; in the meantime, the setup
was finalized following the official documentation.
\subsubsection{Configuration}
The first configuration was completed manually. The main documentation
source for the HTCondor--CE is the OSG website~\cite{OSGDOC}, which refers
to a tool, \tmtextit{osg-configure}, that is not present in the general
HTCondor--CE release. Because of this, the setup was completed by trial and
error. Once a working setup was obtained, a set of integration notes was
added to a public wiki~\cite{INFNWIKI}. This should give other non-OSG users
some supplementary hints for completing their installation.
\subsubsection{Accounting}
As of 2018, the official EGI accounting tool, APEL~\cite{APEL}, has no
support for HTCondor--CE. On the other hand, INFN--T1 has had a custom
accounting tool in place for several years~\cite{DGAS}. The task therefore
reduces to finding a suitable way to retrieve from HTCondor the same
information that we retrieve from CREAM--CE and LSF.
A working solution has been to use Python and the \tmtextit{python
bindings}, a set of API interfaces to the HTCondor daemons. These can be
used to query the SCHEDD at the CE and retrieve a specified set of data
about recently finished jobs, which are subsequently inserted into our local
accounting database. Notably, the Grid information (user DN, VO, etc.) is
directly available together with all the needed accounting data. This
simplifies the accounting problem, as it is no longer necessary to collect
Grid data separately from the BLAH component and then match them with the
corresponding batch records.
This solution has been used during 2018 to provide accounting for the
HTCondor--CE testbed cluster.
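As an illustration, the mapping step can be sketched as a small pure function. The ad attributes below are standard HTCondor job ClassAd names; the helper, the chosen record fields and the sample ad are hypothetical, and in production the ads would be fetched from the CE SCHEDD history via the python bindings rather than built by hand.

```python
def classad_to_record(ad):
    """Map a finished-job ClassAd (here, a plain dict) to a flat accounting record.

    The grid identity (user DN, VO) sits in the same ad as the usage data,
    so no separate BLAH lookup is needed.
    """
    return {
        "ce_job_id": ad["GlobalJobId"],
        "user_dn": ad.get("x509userproxysubject", "N/A"),
        "vo": ad.get("x509UserProxyVOName", "local"),
        "wall_s": ad["RemoteWallClockTime"],
        "cpu_s": ad["RemoteUserCpu"] + ad["RemoteSysCpu"],
        "end_time": ad["CompletionDate"],
    }

# Hypothetical sample ad with the subset of attributes used above:
sample = {
    "GlobalJobId": "ce01.cr.cnaf.infn.it#12345.0#1538000000",
    "x509userproxysubject": "/DC=ch/DC=cern/OU=Users/CN=jdoe",
    "x509UserProxyVOName": "cms",
    "RemoteWallClockTime": 3600.0,
    "RemoteUserCpu": 3400.0,
    "RemoteSysCpu": 100.0,
    "CompletionDate": 1538003600,
}
record = classad_to_record(sample)
```

Each such record would then be inserted into the local accounting database.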
\subsection{Running HTCondor--CE}
After some time spent gaining confidence with the main configuration tasks,
the testbed began receiving jobs submitted by the four LHC experiments in
September 2018. The system proved stable and smooth, and was able to work
unattended. This confirms that it can be a reliable substitute for CREAM--CE
and LSF.
\subsection{Running HTCondor}
The HTCondor batch system is a mature product with a large user base, so we
have put less effort into investigating it deeply: we already know that most
of the needed features work well. Instead, some effort has been put into
configuration management.
\subsubsection{Configuration management}
Even though a standard base of puppet classes has been adapted to our
management system, an additional Python tool has been written to improve
flexibility and readiness. The tool works by reading and enforcing, on each
node of the cluster, a set of configuration directives written in text files
accessible from a shared filesystem. The actual set of files and their read
order depend on the host role and name. In this way, a large cluster can
easily be managed as a collection of host sets. The tool is quite simple and
limited, but it can be improved as needed when more complex requirements
arise.
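The layering logic can be sketched as follows. This is a minimal sketch under assumed conventions: the file names, the common/role/host precedence order and the key = value syntax are illustrative, not the actual tool.

```python
import os
import tempfile

def read_layers(confdir, hostname, role):
    """Read key = value directives from a shared configuration directory.

    Files are read in an order that depends on the host role and name;
    later layers override earlier ones, so host-specific settings win over
    role-wide ones, which in turn win over common defaults.
    """
    layers = ["common.conf", f"role-{role}.conf", f"host-{hostname}.conf"]
    settings = {}
    for name in layers:
        path = os.path.join(confdir, name)
        if not os.path.exists(path):
            continue  # a layer may be absent for a given host
        with open(path) as f:
            for line in f:
                line = line.strip()
                if not line or line.startswith("#"):
                    continue  # skip blanks and comments
                key, _, value = line.partition("=")
                settings[key.strip()] = value.strip()
    return settings

# Demo with a throwaway directory standing in for the shared filesystem:
confdir = tempfile.mkdtemp()
with open(os.path.join(confdir, "common.conf"), "w") as f:
    f.write("num_slots = 16\nstate = production\n")
with open(os.path.join(confdir, "role-wn.conf"), "w") as f:
    f.write("state = draining\n")  # role layer overrides the common default

cfg = read_layers(confdir, "wn-01", "wn")
# cfg == {'num_slots': '16', 'state': 'draining'}
```

The enforcement step (writing the resulting settings into the node's HTCondor configuration) is omitted here.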
\subsection{The migration plan}
After using the testbed cluster, a possible plan for a smooth migration has
been devised:
\begin{itemizedot}
  \item Install and set up a new HTCondor cluster, with a few HTCondor--CEs
  and an initial small set of Worker Nodes

  \item Enable the LHC VOs on the new cluster

  \item Gradually add more Worker Nodes to the new cluster

  \item Enable the other Grid VOs

  \item Finally, enable local submissions. These come from a heterogeneous
  set of users with a potentially rich set of individual needs, and meeting
  all of them can require a considerable administrative effort.
\end{itemizedot}
\subsection{Conclusion}
A testbed cluster based on HTCondor--CE on top of the HTCondor batch system
has been deployed to evaluate the two products as substitutes for CREAM--CE
and LSF. The evaluation has mostly focused on the HTCondor--CE, as it is the
more recent product. Apart from a few minor issues, mainly related to gaps
in the available documentation, the CE proved to be a stable component. The
possibility to perform accounting has also been verified.
\section*{References}
\begin{thebibliography}{9}
\bibitem{OSGDOC} \url{https://opensciencegrid.org/docs/compute-element/install-htcondor-ce/}
\bibitem{INFNWIKI} \url{http://wiki.infn.it/progetti/htcondor-tf/htcondor-ce_setup}
\bibitem{DGAS} S. Dal Pra, ``Accounting Data Recovery. A Case Report from
INFN-T1'' Nota interna, Commissione Calcolo e Reti dell'INFN, {\tt CCR-48/2014/P}
\bibitem{APEL} \url{https://wiki.egi.eu/wiki/APEL}
\end{thebibliography}
\end{document}
\documentclass[a4paper]{jpconf}
\usepackage{graphicx}
\begin{document}
\title{ Infrastructures and Big Data processing as pillars in the XXXIII PhD couse in Data Sciece and Computation}
\title{ Infrastructures and Big Data processing as pillars in the XXXIII PhD course in Data Science and Computation}
%\address{Production Editor, \jpcs, \iopp, Dirac House, Temple Back, Bristol BS1~6BE, UK}
\author{D. Salomoni$^1$, A. Costantini$^1$, C. D. Duma$^1$, B. Martelli$^1$, D. Cesini$^1$, E. Fattibene$^1$ and D. Michelotto $^1$
\author{D. Salomoni$^1$, A. Costantini$^1$, C. D. Duma$^1$, B. Martelli$^1$, D. Cesini$^1$, E. Fattibene$^1$, D. Michelotto $^1$
% etc.
}
\address{$^1$ INFN-CNAF, Bologna, Italy}
\address{$^1$ INFN-CNAF, Bologna, IT}
\ead{davide.salomoni@cnaf.infn.it}
@@ -32,7 +32,7 @@ issue, for example: joint doctoral degrees, co-tutorship and student exchanges
member of the Course Board will provide.
The PhD course runs for four years and is aimed at training people to be able to carry out academic and industrial research at a level of abstraction that
builds atop each single scientific skill which lies at the basis of the field of ``Data Science".
builds atop each single scientific skill which lies at the basis of the field of ``Data Science''.
Drawing on this, graduates in the fields of Mathematical, Physical, Chemical and Astronomical Sciences should produce original and significant
research in terms of scientific publications and innovative applications, blending basic disciplines and finally specializing in specific fields as from those
@@ -69,7 +69,7 @@ least 3 months abroad, during the 3rd/4th year of the course.
\section{Infrastructure for Big Data processing}
As already mentioned, the didactical units Infrastructure for Big Data processing Basic (IBDB) and Advanced (IBDA), headed by Davide Salomoni with the
support of the authors, have been an integral part of the PhD couse and constituted the personalized learning plan of some PhD students.
support of the authors, have been an integral part of the PhD course and constituted the personalized learning plan of some PhD students.
In order to make the teaching material available and to enable an active interaction between teachers and students, a Content
Management System has been deployed. The CMS chosen for this activity was Moodle \cite{moodle}, and the courses
have been made available through it via a dedicated link (https://moodle.cloud.cnaf.infn.it/).
@@ -98,7 +98,7 @@ and Disaster Recovery have been described. Moreover, a discussion on computing m
\subsection{Infrastructure for Big Data processing Advanced}
The course is aimed at discussing the foundations of Cloud computing and storage services beyond IaaS (PaaS and SaaS) leading the students to understand how to
exploit distributed infrastructures for Big Data processing.
The IBDA course is intended as an evolution of the IBDB and, therefore, before following this course the IBDB should have already been achieved, or having familiarity with the covered topics.
At the end of the course, students have acquired practical and theoretical knowledge of distributed computing and storage infrastructures, cloud computing and virtualization,
parallel computing and their application to Big Data Analysis.
The course foresees an oral exam focusing on the presented topics. Students have been requested to prepare a small project discussed during the exam.
\documentclass[a4paper]{jpconf}
\usepackage{graphicx}
\newcommand{\hyp} {$^{3}_{\Lambda}\mathrm H$}
\newcommand{\antihyp}{$^{3}_{\bar{\Lambda}} \overline{\mathrm H}$}
\newcommand{\fourhhyp} {$^{4}_{\Lambda}\mathrm H$}
\newcommand{\fourhehyp} {$^{4}_{\Lambda}\mathrm{He}$}
\newcommand{\parantihyp}{$\left(^{3}_{\bar{\Lambda}} \overline{\mathrm H} \right)$}
\newcommand{\he} {$^{3}\mathrm{He}$}
\newcommand{\antihe} {$^{3}\mathrm{\overline{He}}$}
\newcommand{\hefour} {$^{4}\mathrm{He}$}
\newcommand{\pip} {$\pi^+$}
\newcommand{\pim} {$\pi^-$}
\newcommand{\pio} {$\pi$}
\newcommand{\dedx} {d$E$/d$x$}
\newcommand{\pp} {pp\;}
\newcommand{\pPb} {p--Pb\;}
\newcommand{\PbPb} {Pb--Pb\;}
\newcommand{\XeXe} {Xe--Xe\;}
\newcommand{\Mmom} {\mbox{\rm MeV$\kern-0.15em /\kern-0.12em c$}}
\newcommand{\Gmom} {\mbox{\rm GeV$\kern-0.15em /\kern-0.12em c$}}
\newcommand{\Gmass} {\mbox{\rm GeV$\kern-0.15em /\kern-0.12em c^2$}}
\newcommand{\Mmass} {\mbox{\rm MeV$\kern-0.15em /\kern-0.12em c^2$}}
%\newcommand{\pt} {$p_{\rm T}$}
\newcommand{\ctau} {$c \tau$}
\newcommand{\ct} {$ct$}
\newcommand{\LKz} {$\Lambda$/$K^{0}$}
\newcommand{\s} {\sqrt{s}}
\newcommand{\snn} {\sqrt{s_{\mathrm{NN}}}}
\newcommand{\dndy} {d$N$/d$y$}
\newcommand{\OO} {$\mathrm{O^{2}}$}
\begin{document}
\title{ALICE computing at the INFN CNAF Tier 1}
\author{S. Piano$^1$, D. Elia$^2$, S. Bagnasco$^3$, F. Noferini$^4$, N. Jacazio$^5$, G. Vino$^2$}
\address{$^1$ INFN Sezione di Trieste, Trieste, IT}
\address{$^2$ INFN Sezione di Bari, Bari, IT}
\address{$^3$ INFN Sezione di Torino, Torino, IT}
\address{$^4$ INFN Sezione di Bologna, Bologna, IT}
\address{$^5$ INFN-CNAF, Bologna, IT}
\ead{stefano.piano@ts.infn.it}
\begin{abstract}
In this paper the computing activities for the ALICE experiment at the CERN LHC are described, in particular those in connection with the contribution of the Italian community and the role of the Tier1 located at the INFN CNAF in Bologna.
\end{abstract}
\section{Experimental apparatus and physics goal}
ALICE (A Large Ion Collider Experiment) is a general-purpose heavy-ion experiment specifically designed to study the physics of strongly interacting matter and QGP (Quark-Gluon Plasma) in nucleus-nucleus collisions at the CERN LHC (Large Hadron Collider).
The experimental apparatus consists of a central barrel part, which measures hadrons, electrons and photons, and a forward spectrometer to measure muons. It has been upgraded for Run2 by installing a second arm complementing the EMCAL at the opposite azimuth, thus enhancing the jet and di-jet physics. This extension, named DCAL for ``Di-jet Calorimeter'', was installed during the Long Shutdown 1 (LS1) period. Other detectors were also upgraded or completed: in particular, the last few modules of the TRD and PHOS were installed, while the TPC was refilled with a different gas mixture and equipped with newly redesigned readout electronics. The DAQ and HLT computing farms were also upgraded to match the increased data rate foreseen in Run2 from the TPC and the TRD. A detailed description of the ALICE sub-detectors can be found in~\cite{Abelev:2014ffa}.\\
The main goal of ALICE is the study of the hot and dense matter created in ultra-relativistic nuclear collisions. At high temperature the Quantum CromoDynamics (QCD) predicts a phase transition between hadronic matter, where quarks and gluons are confined inside hadrons, and a deconfined state of matter known as Quark-Gluon Plasma~\cite{Adam:2015ptt, Adam:2016izf, Acharya:2018qsh}. Such deconfined state was also created in the primordial matter, a few microseconds after the Big Bang. The ALICE experiment creates the QGP in the laboratory through head-on collisions of heavy nuclei at the unprecedented energies of the LHC. The heavier the colliding nuclei and the higher the center-of-mass energy, the greater the chance of creating the QGP: for this reason, ALICE has also chosen lead, which is one of the largest nuclei readily available. In addition to the Pb-Pb collisions, the ALICE Collaboration is currently studying \pp and \PbPb systems, which are also used as reference data for the nucleus-nucleus collisions. During 2017 ALICE acquired also a short pilot run (corresponding to one LHC fill) of \XeXe collisions.
\section{Data taking, physics results and upgrade activities}
The main goal of the 2018 run was to complete the approved Run2 physics program, and it was fully achieved thanks to the excellent performance of the apparatus.
ALICE resumed data taking with beams in April at the restart of LHC operation with pp collisions ($\s=13$~TeV). ALICE continued to collect statistics with pp collisions from April 2nd to October 25th with the same trigger mix as in 2017. As planned, ALICE was operating with the pp luminosity leveled to $\mathrm{2.6\times10^{30}}$ $\mathrm{cm^{-2}s^{-1}}$, providing an interaction rate of 150 kHz. The HLT compression factor was improved to 8.5 throughout the data taking, so that the HLT was able to reject the larger number of spurious clusters anticipated with the Ar-CO$_2$ gas mixture in the TPC. The average RAW data event size after compression was 1.7 MB at the nominal interaction rate (150 kHz), exactly as expected and as used for the resource calculations. At the end of the pp period, ALICE reached a 43\% combined efficiency (LHC availability of 47\% $\times$ ALICE efficiency of 92\%).
The \PbPb ($\snn=5.02$~TeV) data taking period started in November 2018 and was scheduled for 24 days. The target was to reach a total integrated luminosity of 1 $\mathrm{nb^{-1}}$ for Run2 and to complete the ALICE goals for the collection of a large sample of central and minimum bias collisions. To achieve this, the interaction rate was leveled at 8 kHz (L = $\mathrm{1.0\times10^{27}}$ $\mathrm{cm^{-2}s^{-1}}$) and data were taken close to the maximum achievable readout rate. The accelerator conditions differed from those foreseen, mainly because of a 3-4 day delay in the beam start due to a solenoid coil fault in LINAC3 and a 20\% loss of integrated luminosity due to beam sizes 50\% larger at IP2 than at IP1/IP5 during the whole Pb-Pb period. The LHC time in Stable Beams was 47\%, the average data taking efficiency of ALICE was 87\% and a maximum HLT compression factor close to 9 was reached during the Pb-Pb period. To compensate for the reduced beam availability, the rates of the different triggers were adjusted to increase as much as possible the statistics in central and semi-central events. Overall, 251M central and mid-central events and 159M minimum bias events were collected. To further minimize the impact of the Pb-Pb run on tape resources, ALICE additionally compressed the non-TPC portion of the RAW data (by applying level 2 gzip compression), resulting in an additional 17\% reduction of the data volume on tape. As a result, the accumulated amount of Pb--Pb RAW data was 5.5~PiB. A total amount of RAW data of 11~PiB, including pp, was written to tape at the Tier0 and then replicated at the Tier1s. The data accumulation curve at the Tier0 is shown in Fig.\ref{fig:rawdata}; about 4.2~PiB of RAW data has been replicated to CNAF during 2018 with a maximum rate of 360 TiB per week, limited only by the tape drive speed given the 100 Gb/s LHCOPN bandwidth between CERN and CNAF, as shown in Fig.\ref{fig:tottraftape}.
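As a rough cross-check of the replication figures above, the quoted peak of 360 TiB per week can be converted into an average line rate and compared with the 100 Gb/s LHCOPN link. The short sketch below (a back-of-envelope unit conversion, not part of any ALICE accounting tool) shows the calculation:

```python
# Back-of-envelope: convert a weekly tape-replication volume into an
# average line rate, to compare with the 100 Gb/s CERN-CNAF LHCOPN link.

SECONDS_PER_WEEK = 7 * 24 * 3600  # 604800 s

def tib_per_week_to_gbps(tib_per_week: float) -> float:
    """Average rate in Gb/s for a volume in TiB transferred in one week."""
    bits = tib_per_week * 2**40 * 8  # TiB -> bytes -> bits
    return bits / SECONDS_PER_WEEK / 1e9

rate = tib_per_week_to_gbps(360)  # peak week quoted in the text
print(f"{rate:.1f} Gb/s")  # ~5.2 Gb/s, well below the 100 Gb/s link:
                           # the bottleneck is indeed the tape drives
```

Even the peak week corresponds to only a few percent of the available WAN bandwidth, consistent with the statement that the tape drive speed, not the network, was the limiting factor.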
\begin{figure}[!ht]
\begin{center}
\includegraphics[width=0.75\textwidth]{raw_data_accumulation_run2}
\end{center}
\caption{Raw data accumulation curve for Run2.}
\label{fig:rawdata}
\end{figure}
The pp data collected in 2018 have been fully calibrated and processed in Pass1, as well as the associated general-purpose MC. All productions were executed according to plan and within the CPU and storage budget, in time to free the resources for the Pb-Pb data processing. The Pb-Pb RAW data calibration and offline quality assurance validation started in parallel with the data taking, with samples of data uniformly taken from each LHC fill. The full calibration was completed by 20 December, then the production pass began at the T0/T1s. On average, 40K CPU cores were used for the production and the processing was completed by the end of February 2019. In parallel, the general purpose MC associated with the Pb-Pb data is being validated and prepared for full production.
\begin{figure}[!ht]
\begin{center}
\includegraphics[width=0.75\textwidth]{total_traffic_cnaf_tape_2018}
\end{center}
\caption{ALICE traffic per week and total traffic on the CNAF tape during 2018.}
\label{fig:tottraftape}
\end{figure}
During 2018 many new ALICE physics results have been obtained from pp, p--Pb, \PbPb and \XeXe collisions from the Run2 data taking, while the collaboration has also continued to work on results from the analysis of the Run1 data. Almost 50 papers have been submitted to journals in the last year, including in particular the main topics reported in the following.
In \pp and in \pPb collisions, for instance, ALICE studied the
$\Lambda_{\rm c}^+$ production~\cite{Acharya:2017kfy}, the prompt and non-prompt $\hbox {J}/\psi$ production and nuclear modification at mid-rapidity~\cite{Acharya:2018yud} and the measurement of the inclusive $\hbox {J}/\psi$ polarization at forward rapidity in \pp collisions
at $\s= 8$~TeV \cite{Acharya:2018uww}.
Looking at \PbPb data ALICE succeeded in studying
the $D$-meson azimuthal anisotropy in midcentral Pb-Pb collisions at $\snn=5.02$~TeV~\cite{Acharya:2017qps}, the Z$^0$-boson production at large rapidities in Pb-Pb collisions at $\snn=5.02$~TeV~\cite{Acharya:2017wpf} and the anisotropic flow of identified particles in Pb-Pb collisions at $ \snn=5.02 $~TeV~\cite{Acharya:2018zuq}. The anisotropic flow was also studied in \XeXe collisions at $\snn = 5.44$~TeV~\cite{Acharya:2018ihu}, together with the inclusive $\hbox {J}/\psi$ production~\cite{Acharya:2018jvc} and the transverse momentum spectra and nuclear modification factors of charged particles~\cite{Acharya:2018eaq}.\\
The general upgrade strategy for Run3 is conceived to deal with this challenge with expected \PbPb interaction rates of up to 50 kHz aiming at an integrated luminosity above 10 $\mathrm{nb^{-1}}$. The five TDRs, namely for the new ITS, the TPC GEM-based readout chambers, the Muon Forward Tracker, the Trigger and Readout system, and the Online/Offline computing system were fully approved by the CERN Research Board between 2014 and 2015. In 2017 the transition from the R\&D phase to the construction of prototypes of the final detector elements was successfully completed. For the major systems, the final prototype tests and evaluations were performed and the production readiness reviews have been successful, the production started during the 2017 and has been continued throughout 2018.
\section{Computing model and R\&D activity in Italy}
The ALICE computing model is still heavily based on Grid distributed computing; since the very beginning, the base principle underlying it has been that every physicist should have equal access to the data and computing resources~\cite{ALICE:2005aa}. According to this principle, the ALICE peculiarity has always been to operate its Grid as a “cloud” of computing resources (both CPU and storage) with no specific role assigned to any given center, the only difference between them being the Tier level to which they belong. All resources have to be made available to all ALICE members, according only to experiment policy and not to the physical location of the resources, and data are distributed according to network topology and resource availability, not in pre-defined datasets. The only peculiarities of the Tier1s are their size and the availability of tape custodial storage, which holds a collective second copy of raw data and allows the collaboration to run event reconstruction tasks there. In the ALICE model, though, tape recall is almost never performed: all useful data reside on disk, and the custodial tape copy is used only for safekeeping. All data access is done through the xrootd protocol, either through the use of “native” xrootd storage or, as in many large deployments, using xrootd servers in front of a distributed parallel filesystem like GPFS.\\
The model has not changed significantly for Run2, except for the scavenging of some extra computing power by opportunistically using the HLT farm when not needed for data taking. All raw data collected in 2017 has been passed through the calibration stages, including the newly developed track distortion calibration for the TPC, and has been validated by the offline QA process before entering the final reconstruction phase. The ALICE software build system has been extended with additional functionality to validate the AliRoot release candidates with a large set of raw data from different years as well as with various MC generators and configurations. It uses the CERN elastic cloud infrastructure, thus allowing for dynamic provision of resources as needed. The Grid utilization in the accounting period remained high, with no major incidents. The CPU/Wall efficiency remained constant, at about 85\% across all Tiers, similar to the previous year. The much higher data rate foreseen for Run3, though, will require a major rethinking of the current computing model in all its components, from the software framework to the algorithms and the distributed infrastructure. The design of the new computing framework for Run3, started in 2013 and mainly based on the concepts of Online-Offline integration (“\OO\ Project”), has been finalized with the corresponding Technical Design Report~\cite{Buncic:2015ari}: development and implementation phases as well as performance tests are currently ongoing.\\
The Italian share of the ALICE distributed computing effort (currently about 17\%) includes resources both from the Tier1 at CNAF and from the Tier2s in Bari, Catania, Torino and Padova-LNL, plus some extra resources in Trieste. The contribution from the Italian community to the ALICE computing in 2018 has been mainly spread over the usual items, such as the development and maintenance of the (AliRoot) software framework, the management of the computing infrastructure (Tier1 and Tier2 sites) and the participation in the Grid operations of the experiment.\\
In addition, in the framework of the computing R\&D activities in Italy, the design and development of a site dashboard project, started a couple of years ago, continued in 2017 and was finalized in the first half of 2018. In its original idea, the project aimed at building a monitoring system able to gather information from all the available sources to improve the management of a Tier2 datacenter. A centralized site dashboard has been developed, based on specific tools selected to meet tight technical requirements, like the capability to manage a huge amount of data in a fast way and through an interactive and customizable graphical user interface. Its current version, running in the Bari Tier2 site for more than two years, relies on an open source time-series database (InfluxDB), a dashboard builder for visualizing time-series metrics (Grafana) and dedicated code written to implement the gathering sensors. The Bari dashboard has been exported to all the other sites during 2016 and 2017: the project has now entered its final phase, where a unique centralized dashboard for the ALICE computing in Italy is being implemented. The project prospects also include the design of a more general monitoring system for distributed datacenters, able to provide active support to site administrators in detecting critical events as well as to improve problem solving and debugging procedures. A contribution on the Italian dashboard was presented at CHEP 2016~\cite{Elia:2017dzc}. This project also underlies the development of the new ALICE monitoring system for the \OO\ farm at CERN, which was approved by the \OO\ Technical Board; a first prototype of this monitoring system was made ready to be used for the TPC detector test at P2 in May 2018. This development corresponds to the main activity item for one of the three fellowship contracts provided by the INFN in 2017 for the LHC computing developments towards Run3 and Run4.
The other two fellowships are devoted to the analysis framework and to new strategies in the analysis algorithms. In particular, one fellow focuses on implementing a new analysis framework for the \OO\ system, while the other one is implementing a general-purpose framework to easily include industry-standard Machine Learning tools in the analysis workflows.\\
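To illustrate how a gathering sensor can feed the time-series database mentioned above, the sketch below formats one monitoring sample in the InfluxDB line protocol; the measurement and tag names are invented for illustration and are not those of the actual dashboard code:

```python
# Minimal sketch of an InfluxDB "line protocol" formatter, of the kind a
# site-dashboard gathering sensor might use before POSTing to the database.
# Measurement/tag/field names below are hypothetical examples.

def to_line(measurement: str, tags: dict, fields: dict, ts_ns: int) -> str:
    """Format one data point as: measurement,tags fields timestamp_ns."""
    tag_part = ",".join(f"{k}={v}" for k, v in sorted(tags.items()))
    field_part = ",".join(f"{k}={v}" for k, v in sorted(fields.items()))
    return f"{measurement},{tag_part} {field_part} {ts_ns}"

line = to_line("running_jobs",                    # hypothetical measurement
               {"site": "CNAF", "vo": "alice"},   # hypothetical tags
               {"value": 8000},
               1545000000000000000)               # timestamp in ns
print(line)
# running_jobs,site=CNAF,vo=alice value=8000 1545000000000000000
```

Strings in this shape are what an InfluxDB write endpoint ingests; Grafana then queries the stored series to build the dashboard panels.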
\section{Role and contribution of the INFN Tier1 at CNAF}
CNAF is a full-fledged ALICE Tier1 center, having been one of the first to join the production infrastructure years ago. According to the ALICE cloud-like computing model, it has no special assigned task or reference community, but provides computing and storage resources to the whole collaboration, along with offering valuable support staff for the experiment’s computing activities. It provides reliable xrootd access both to its disk storage and to the tape infrastructure, through a TSM plugin that was developed by CNAF staff specifically for ALICE use.\\
As a result of a flooding, the CNAF computing center stopped operation on November 8th, 2017; tape access was made available again on January 31st, 2018, and the ALICE Storage Element was fully recovered by February 23rd. The loss of CPU resources during the Tier1 shutdown was partially mitigated by the reallocation of the Tier1 worker nodes located in Bari to the Tier2 Bari queue. At the end of February 2018 the CNAF local farm was powered on again, moving gradually from 50 kHS06 to 140 kHS06. In addition, on March 15th 170 kHS06 at CINECA became available thanks to a 500 Gb/s dedicated link.
Since March running at CNAF has been remarkably stable: for example, both the disk and tape storage availabilities have been better than 98\%, ranking CNAF in the top 5 most reliable sites for ALICE. The computing resources provided for ALICE at the CNAF Tier1 center were fully used along the year, matching and often exceeding the pledged amounts due to access to resources unused by other collaborations. Overall, about 64\% of the ALICE computing activity was Monte Carlo simulation, 14\% raw data processing (which takes place at the Tier0 and Tier1 centers only) and 22\% analysis activities: Fig.~\ref{fig:runjobsusers} illustrates the share among the different activities in the ALICE running job profile along the last 12 months.\\
\begin{figure}[!ht]
\begin{center}
\includegraphics[width=0.75\textwidth]{running_jobs_per_users_2018}
\end{center}
\caption{Share among the different ALICE activities in the 2018 running jobs profile.}
\label{fig:runjobsusers}
\end{figure}
In order to optimize the use of resources and enhance the “CPU efficiency” (the ratio of CPU to Wall Clock times), an effort was started in 2011 to move the analysis tasks from user-submitted “chaotic” jobs to organized, centrally managed “analysis trains”. The current split of analysis activities, in terms of CPU hours, is about 14\% individual jobs and 86\% organized trains.
Since April 2018, CNAF has deployed the pledged resources, corresponding to about 52 kHS06 of CPU, 5140 TB of disk and 13530 TB of tape storage.\\
The INFN Tier1 has provided about 4.9\% of the total CPU hours used by ALICE since March 2018 (about 4.2\% over the whole year), ranking second among the ALICE Tier1 sites despite the flooding incident, as shown in Fig.~\ref{fig:walltimesharet1}. The cumulated fraction of CPU hours over the whole year for CNAF is about 21\% of all ALICE Tier1 sites, second only to FZK in Karlsruhe (24\%).
\begin{figure}[!ht]
\begin{center}
\includegraphics[width=0.75\textwidth]{wall_time_tier1_2018}
\end{center}
\caption{Ranking of CNAF among ALICE Tier1 centers in 2018.}
\label{fig:walltimesharet1}
\end{figure}
This amounts to about 44\% of the total Wall Time of the INFN contribution: CNAF successfully completed nearly 10.5 million jobs, for a total of more than 44 million CPU hours; the running job profile at CNAF in 2018 is shown in Fig.\ref{fig:rjobsCNAFunov}.\\
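From the two yearly totals just quoted one can estimate the average job length at CNAF; the sketch below is a quick derivation from the approximate aggregates in the text, using the $\sim$85\% CPU/Wall efficiency quoted earlier to infer the corresponding wall-clock time:

```python
# Quick estimate of the average ALICE job length at CNAF from the 2018
# totals quoted in the text (approximate yearly aggregates).

jobs_completed = 10.5e6   # jobs successfully completed
cpu_hours      = 44e6     # total CPU hours delivered
cpu_wall_eff   = 0.85     # typical CPU/Wall efficiency across all Tiers

avg_cpu_h  = cpu_hours / jobs_completed   # CPU hours per job
avg_wall_h = avg_cpu_h / cpu_wall_eff     # implied wall-clock hours per job

print(f"~{avg_cpu_h:.1f} CPU h/job, ~{avg_wall_h:.1f} wall h/job")
# ~4.2 CPU h/job, ~4.9 wall h/job
```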
Since mid-November a new job submission queue has been made available to ALICE and used to successfully test the job queueing mechanism, the scheduling policy, the priority scheme, the resource monitoring and the resource management with HTCondor at CNAF.
\begin{figure}[!ht]
\begin{center}
\includegraphics[width=0.75\textwidth]{running_jobs_CNAF_2018}
\end{center}
\caption{Running jobs profile at CNAF in 2018.}
\label{fig:rjobsCNAFunov}
\end{figure}
At the end of the last year ALICE was keeping on disk at CNAF more than 4.1 PiB of data in nearly 118 million files, plus more than 10 PiB of raw data on custodial tape storage; the reliability of the storage infrastructure is commendable, even taking into account the extra layer of complexity introduced by the xrootd interfaces. The excellent file system performance allows analysing data from the SE with an average throughput of about 1.6 GB/s and a peak throughput of about 3.0 GB/s, as shown in Fig.\ref{fig:nettrafse}.
\begin{figure}[!ht]
\begin{center}
\includegraphics[width=0.75\textwidth]{network_traffic_cnaf_se_2018}
\end{center}
\caption{Network traffic on the ALICE xrootd servers at CNAF during 2018.}
\label{fig:nettrafse}
\end{figure}
Network connectivity has also always been reliable; the 100 Gb/s of the LHCOPN and the 100 Gb/s of the LHCONE WAN links make CNAF one of the best-connected sites in the ALICE Computing Grid, allowing ALICE to sustain a total traffic of up to 360 TB of raw data per week from the Tier0 to CNAF, as demonstrated in Fig.\ref{fig:tottraftape}.
\section*{References}
\begin{thebibliography}{9}
%\cite{Abelev:2014ffa}
\bibitem{Abelev:2014ffa}
B.~B.~Abelev {\it et al.} [ALICE Collaboration],
%``Performance of the ALICE Experiment at the CERN LHC,''
Int.\ J.\ Mod.\ Phys.\ A {\bf 29} (2014) 1430044.
%doi:10.1142/S0217751X14300440
%[arXiv:1402.4476 [nucl-ex]].
%%CITATION = doi:10.1142/S0217751X14300440;%%
%310 citations counted in INSPIRE as of 01 Mar 2018
%\cite{Adam:2015ptt}
\bibitem{Adam:2015ptt}
J.~Adam {\it et al.} [ALICE Collaboration],
%``Centrality dependence of the charged-particle multiplicity density at midrapidity in Pb-Pb collisions at $\snn = 5.02$ TeV,''
Phys.\ Rev.\ Lett.\ {\bf 116} (2016) no.22, 222302.
% doi:10.1103/PhysRevLett.116.222302
% [arXiv:1512.06104 [nucl-ex]].
%%CITATION = doi:10.1103/PhysRevLett.116.222302;%%
%73 citations counted in INSPIRE as of 04 Mar 2018
%\cite{Adam:2016izf}
\bibitem{Adam:2016izf}
J.~Adam {\it et al.} [ALICE Collaboration],
%``Anisotropic flow of charged particles in Pb-Pb collisions at $\snn=5.02$ TeV,''
Phys.\ Rev.\ Lett.\ {\bf 116} (2016) no.13, 132302.
% doi:10.1103/PhysRevLett.116.132302
% [arXiv:1602.01119 [nucl-ex]].
%%CITATION = doi:10.1103/PhysRevLett.116.132302;%%
%69 citations counted in INSPIRE as of 04 Mar 2018
%\cite{Acharya:2018qsh}
\bibitem{Acharya:2018qsh}
S.~Acharya {\it et al.} [ALICE Collaboration],
%``Transverse momentum spectra and nuclear modification factors of charged particles in pp, p-Pb and Pb-Pb collisions at the LHC,''
arXiv:1802.09145 [nucl-ex].
%%CITATION = ARXIV:1802.09145;%%
%\cite{Acharya:2017kfy}
\bibitem{Acharya:2017kfy}
S.~Acharya {\it et al.} [ALICE Collaboration],
%``$\Lambda_{\rm c}^+$ production in pp collisions at $\s = 7$ TeV and in p-Pb collisions at $\snn = 5.02$ TeV,''
JHEP {\bf 1804} (2018) 108.
% doi:10.1007/JHEP04(2018)108
% [arXiv:1712.09581 [nucl-ex]].
%%CITATION = doi:10.1007/JHEP04(2018)108;%%
%34 citations counted in INSPIRE as of 07 May 2019
\bibitem{Acharya:2018yud}
S.~Acharya {\it et al.} [ALICE Collaboration],
%``Prompt and non-prompt $\hbox {J}/\psi $ production and nuclear modification at mid-rapidity in p–Pb collisions at $\snn= 5.02}$ TeV,''
Eur.\ Phys.\ J.\ C {\bf 78} (2018) no.6, 466.
% doi:10.1140/epjc/s10052-018-5881-2
% [arXiv:1802.00765 [nucl-ex]].
%%CITATION = doi:10.1140/epjc/s10052-018-5881-2;%%
%3 citations counted in INSPIRE as of 07 May 2019
%\cite{Acharya:2018uww}
\bibitem{Acharya:2018uww}
S.~Acharya {\it et al.} [ALICE Collaboration],
%``Measurement of the inclusive J/ $\psi $ polarization at forward rapidity in pp collisions at $\s = 8$ TeV,''
Eur.\ Phys.\ J.\ C {\bf 78} (2018) no.7, 562.
% doi:10.1140/epjc/s10052-018-6027-2
% [arXiv:1805.04374 [hep-ex]].
%%CITATION = doi:10.1140/epjc/s10052-018-6027-2;%%
%4 citations counted in INSPIRE as of 07 May 2019
%\cite{Acharya:2017qps}
\bibitem{Acharya:2017qps}
S.~Acharya {\it et al.} [ALICE Collaboration],
%``$D$-meson azimuthal anisotropy in midcentral Pb-Pb collisions at $\snn=5.02}$ TeV,''
Phys.\ Rev.\ Lett.\ {\bf 120} (2018) no.10, 102301.
% doi:10.1103/PhysRevLett.120.102301
% [arXiv:1707.01005 [nucl-ex]].
%%CITATION = doi:10.1103/PhysRevLett.120.102301;%%
%42 citations counted in INSPIRE as of 07 May 2019
%\cite{Acharya:2017wpf}
\bibitem{Acharya:2017wpf}
S.~Acharya {\it et al.} [ALICE Collaboration],
%``Measurement of Z$^0$-boson production at large rapidities in Pb-Pb collisions at $\snn=5.02$ TeV,''
Phys.\ Lett.\ B {\bf 780} (2018) 372.
% doi:10.1016/j.physletb.2018.03.010
% [arXiv:1711.10753 [nucl-ex]].
%%CITATION = doi:10.1016/j.physletb.2018.03.010;%%
%4 citations counted in INSPIRE as of 07 May 2019
%\cite{Acharya:2018zuq}
\bibitem{Acharya:2018zuq}
S.~Acharya {\it et al.} [ALICE Collaboration],
%``Anisotropic flow of identified particles in Pb-Pb collisions at $\snn=5.02 $ TeV,''
JHEP {\bf 1809} (2018) 006.
% doi:10.1007/JHEP09(2018)006
% [arXiv:1805.04390 [nucl-ex]].
%%CITATION = doi:10.1007/JHEP09(2018)006;%%
%11 citations counted in INSPIRE as of 07 May 2019
%\cite{Acharya:2018ihu}
\bibitem{Acharya:2018ihu}
S.~Acharya {\it et al.} [ALICE Collaboration],
%``Anisotropic flow in Xe-Xe collisions at $\snn = 5.44}$ TeV,''
Phys.\ Lett.\ B {\bf 784} (2018) 82.
% doi:10.1016/j.physletb.2018.06.059
% [arXiv:1805.01832 [nucl-ex]].
%%CITATION = doi:10.1016/j.physletb.2018.06.059;%%
%19 citations counted in INSPIRE as of 07 May 2019
%\cite{Acharya:2018jvc}
\bibitem{Acharya:2018jvc}
S.~Acharya {\it et al.} [ALICE Collaboration],
%``Inclusive J/$\psi$ production in Xe–Xe collisions at $\snn = 5.44$ TeV,''
Phys.\ Lett.\ B {\bf 785} (2018) 419.
% doi:10.1016/j.physletb.2018.08.047
% [arXiv:1805.04383 [nucl-ex]].
%%CITATION = doi:10.1016/j.physletb.2018.08.047;%%
%5 citations counted in INSPIRE as of 07 May 2019
%\cite{Acharya:2018eaq}
\bibitem{Acharya:2018eaq}
S.~Acharya {\it et al.} [ALICE Collaboration],
%``Transverse momentum spectra and nuclear modification factors of charged particles in Xe-Xe collisions at $\snn= 5.44$ TeV,''
Phys.\ Lett.\ B {\bf 788} (2019) 166.
% doi:10.1016/j.physletb.2018.10.052
% [arXiv:1805.04399 [nucl-ex]].
%%CITATION = doi:10.1016/j.physletb.2018.10.052;%%
%18 citations counted in INSPIRE as of 07 May 2019
%\cite{ALICE:2005aa}
\bibitem{ALICE:2005aa}
P.~Cortese {\it et al.} [ALICE Collaboration],
%``ALICE technical design report of the computing,''
CERN-LHCC-2005-018.
%%CITATION = CERN-LHCC-2005-018;%%
%44 citations counted in INSPIRE as of 01 Mar 2018
%\cite{Buncic:2015ari}
\bibitem{Buncic:2015ari}
P.~Buncic, M.~Krzewicki and P.~Vande Vyvre,
%``Technical Design Report for the Upgrade of the Online-Offline Computing System,''
CERN-LHCC-2015-006, ALICE-TDR-019.
%%CITATION = CERN-LHCC-2015-006, ALICE-TDR-019;%%
%53 citations counted in INSPIRE as of 01 Mar 2018
%\cite{Elia:2017dzc}
\bibitem{Elia:2017dzc}
D.~Elia {\it et al.} [ALICE Collaboration],
%``A dashboard for the Italian computing in ALICE,''
J.\ Phys.\ Conf.\ Ser.\ {\bf 898} (2017) no.9, 092054.
%doi:10.1088/1742-6596/898/9/092054
%%CITATION = doi:10.1088/1742-6596/898/9/092054;%%
%\cite{Abelev:2014dsa}
%\bibitem{Abelev:2014dsa}
% B.~B.~Abelev {\it et al.} [ALICE Collaboration],
% %``Transverse momentum dependence of inclusive primary charged-particle production in p-Pb
% collisions at $\snn=5.02~\text {TeV}$,''
% Eur.\ Phys.\ J.\ C {\bf 74} (2014) no.9, 3054.
% %doi:10.1140/epjc/s10052-014-3054-5
% %[arXiv:1405.2737 [nucl-ex]].
% %%CITATION = doi:10.1140/epjc/s10052-014-3054-5;%%
% %73 citations counted in INSPIRE as of 01 Mar 2018
%\cite{Abelev:2013haa}
%\bibitem{Abelev:2013haa}
% B.~B.~Abelev {\it et al.} [ALICE Collaboration],
% %``Multiplicity Dependence of Pion, Kaon, Proton and Lambda Production in p-Pb Collisions
% at $\snn = 5.02$ TeV,''
% Phys.\ Lett.\ B {\bf 728} (2014) 25.
% %doi:10.1016/j.physletb.2013.11.020
% %[arXiv:1307.6796 [nucl-ex]].
% %%CITATION = doi:10.1016/j.physletb.2013.11.020;%%
% %233 citations counted in INSPIRE as of 01 Mar 2018
\end{thebibliography}
\end{document}
\documentclass[a4paper]{jpconf}
\usepackage{graphicx}
\usepackage{hyperref}
\usepackage{color}
\begin{document}
\title{AMS-02 data processing and analysis at CNAF}
\author{B. Bertucci$^{1,2}$, M. Duranti$^2$, V. Formato$^{2,\ast}$, D. Spiga$^{2}$}
\address{$^1$ Universit\`a di Perugia, I-06100 Perugia, Italy}
\address{$^2$ INFN, Sezione Perugia, I-06100 Perugia, Italy}
\address{AMS experiment \url{http://ams.cern.ch}, \url{http://www.ams02.org}, \url{http://www.pg.infn.it/ams/}}
\ead{* valerio.formato@infn.it}
\begin{abstract}
AMS is a large acceptance Cosmic Ray (CR) detector operating in space, on board the International Space Station (ISS) since the 19$^{\textrm{th}}$ of May 2011.\\
%AMS is a large acceptance instrument conceived to search for anti-particles (positrons, anti-protons, anti-deuterons) coming from dark matter annihilation, primordial anti-matter (anti-He or light anti nuclei) and to perform accurate measurements in space of the cosmic radiation in the GeV-TeV energy range.
%Installed on the International Space Station (ISS) in mid-May 2011, it is operating continuously since then, with a collected statistics of $\sim$ 130 billion events up to the end of 2018.
CNAF is one of the repositories of the full AMS data set and contributes to the data production and Monte Carlo simulation of the international collaboration. It represents the central computing resource for the data analysis performed by the Italian collaboration, and its role will be reviewed in this document.
In the following, the AMS computing framework, the role of the CNAF computing center and the use of the CNAF resources in 2018 will be described.\\
In addition, the ongoing R\&D activities to integrate cloud resources into this framework will be discussed.
\end{abstract}
\section{Introduction}
AMS is a large acceptance instrument conceived to search for anti-particles
(positrons, anti-protons, anti-deuterons) coming from dark matter annihilation,
primordial anti-matter (anti-He or light anti nuclei) and to perform accurate
measurements in space of the cosmic radiation in the GeV-TeV energy range.
\begin{figure}[t]
\begin{center}
\includegraphics[width=0.49\textwidth]{AMS_nuovo.pdf}
\end{center}
\caption{\label{fig:ams_layout} The AMS-02 detector consists of nine planes of
precision silicon tracker, a transition radiation detector (TRD), four planes
of time-of-flight counters (TOF), a permanent magnet, an array of
anticoincidence counters (ACC) surrounding the inner tracker, a ring imaging
Cherenkov detector (RICH), and an electromagnetic calorimeter (ECAL).}
\end{figure}
The layout of the AMS-02 detector is shown in Fig.~\ref{fig:ams_layout}.
A large spectrometer is the core of the instrument: a magnetic field of 0.14~T,
generated by a permanent magnet, deflects positive and negative particles in
opposite directions; their trajectories are accurately measured up to TeV
energies by means of 9 layers of double-sided silicon micro-strip detectors -
the Tracker - with a spatial resolution of $\sim 10\,\mu$m in the single-point
measurement along the track. Redundant measurements of the particle
characteristics, such as velocity, absolute charge magnitude ($Z$), rigidity and
energy, are performed by a Time of Flight system, the tracker, a RICH detector
and a 3D imaging calorimeter with a depth of 17~$X_0$. A transition radiation
detector provides an independent e/p separation with a rejection power of
$\sim 10^3$ around 100 GeV.
AMS was installed on the International Space Station (ISS) in mid-May 2011
and has been operating continuously since then, with a collected statistics of
$\sim$130 billion events up to the end of 2018.
The signals from the $\sim$300,000 electronic channels of the detector and its
monitoring system (thermal and pressure sensors) are reduced on board to match
the average bandwidth of $\sim$10 Mbit/s available for data transmission from
space to ground, resulting in $\sim$100 GB/day of raw data produced by the experiment.
Due to the rapidly changing environmental conditions along the $\sim$90-minute
orbit of the ISS at 390 km altitude, continuous monitoring and adjustments of
the data-taking conditions are performed in the Payload Operation Control
Center (POCC) located at CERN, and a careful calibration of the detector response
is needed to process the raw data and reconstruct physics quantities for data analysis.
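As an aside, the downlink figures quoted above are mutually consistent. A minimal back-of-the-envelope check (not part of the AMS software; the numbers are taken directly from the text):

```python
# Back-of-the-envelope check: a sustained downlink of ~10 Mbit/s over one
# day corresponds to the quoted ~100 GB/day of raw data.

BYTES_PER_MBIT = 1e6 / 8      # 1 Mbit = 125 000 bytes
SECONDS_PER_DAY = 86400

bandwidth_mbit_s = 10
bytes_per_day = bandwidth_mbit_s * BYTES_PER_MBIT * SECONDS_PER_DAY
gb_per_day = bytes_per_day / 1e9

print(f"{gb_per_day:.0f} GB/day")  # ~108 GB/day
```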
CNAF is one of the repositories of the full AMS data set: both raw and processed
data are stored at CNAF, which is the central computing resource for the
data analysis performed by the Italian collaboration and contributes as well to the
data production and Monte Carlo simulation of the international collaboration.
\section{AMS-02 Computing Model and Computing Facilities}
As a payload on the ISS, AMS has to be compliant with all the standard
communication protocols used by NASA to communicate with the ISS, and its data have
to be transmitted through the NASA communication network.
On the ground, data are finally stored at the AMS Payload Operation Control
Center (POCC) at CERN.
Data are continuously collected, 24 hours per day, 365 days per year.
The data reconstruction pipeline consists of two main logical steps:
\begin{itemize}
\item[1)]{
the {\bf First Production} runs continuously over incoming data, performing an
initial validation and indexing. It produces the so-called ``standard'' (STD)
reconstructed data stream, ready within two hours after data are received at
CERN, which is used to calibrate the different sub-detectors as well as to
monitor the detector performance off-line. In this stage Data Summary Files
are produced for fast event selections.
}
\item[2)]{
Data from the First Production are reprocessed applying all sub-detector
calibrations, alignments, ancillary data from the ISS and slow control data to
produce reconstructed data for the physics analysis.
This {\bf Second Production} step is usually applied incrementally
to the STD data sample every 6 months, the time needed to produce and
certify the calibrations. A full reprocessing of all AMS data is carried
out periodically in case of major software updates, providing the so-called
``pass'' productions. Up to 2018, 7 full data reproductions were carried out.
The last published measurements were based on the pass6 data set, but all the analyses being carried out for the next publications are based on the pass7 one.
}
\end{itemize}
The First Production is processed at CERN on a dedicated farm of about 200
cores, whereas Monte Carlo production, ISS data reprocessing and user data
analysis are supported by a network of computing centers (see
Fig.~\ref{fig:contributors}).
\begin{figure}[t]
\begin{center}
\includegraphics[width=0.5\textwidth]{contributors.pdf}
\end{center}
\caption{AMS-02 Major Contributors to Computing Resources.}
\label{fig:contributors}
\end{figure}
Usually the centers in China and Taiwan are mostly devoted to Monte Carlo production,
while CERN, CNAF and FZJ J\"ulich are the main centers for data reprocessing.
A lightweight production platform has been realized to run at different
computing centers, on different platforms. Based on perl, python and sqlite3,
it is easily deployable and provides a fully automated production cycle,
from job submission to monitoring, validation and transfer.
\section{CNAF contribution}
CNAF is the main computing resource for data analysis of the AMS Italian collaboration, with $\sim$20000 HS06, $\sim$2 PB of storage on disk and 1 PB on tape allocated in 2018.
A full copy of the AMS raw data is preserved on tape, while, usually, the latest production and part of the Monte Carlo sample are available on disk.
More than 30 users routinely perform the bulk of their analysis at CNAF, transferring to local sites (i.e. their small local computing farms or their laptops) only reduced data sets or histograms.
As described in the following, during 2018 the possibility of an XRootD endpoint at CNAF has been explored. The goal is to federate, through XRootD, the $\sim$5 PB available for the AMS Collaboration at CERN with the $\sim$2 PB at CNAF. In this scheme, CNAF would be the second data center to share its disk space with the collaboration, optimizing its use on a large scale.
\section{Data processing strategy at CNAF}
Two local queues are available for the AMS users: the default one is the
{\it ams} queue, with a CPU limit of 11520 minutes (it allows running 8-core
multi-threaded jobs for 1 day) and a maximum of 1500 jobs running simultaneously,
whereas for test runs the {\it ams$\_$short} queue is available, with high
priority but a CPU limit of 360 minutes and a limit of 100 running jobs.
For data reprocessing or MC production the AMS production queue {\it ams$\_$prod},
with a CPU limit of 5760 minutes and no job limit, is available and accessible
only to the data production team of the international collaboration and a few
expert users of the Italian team.
In fact, the {\it ams$\_$prod} queue is used within the data analysis process
to produce data streams of pre-selected events and lightweight data files with a
custom format \cite{dst} on the full AMS data statistics.
In this way, the final analysis can easily process the reduced data set,
avoiding access to the large AMS data sample.
The data-stream and custom data file productions are usually repeated a few times a year.\\
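The queue structure above can be summarized in a small submission helper. This is only a sketch: the queue names and limits are taken from the text, `bsub` is the standard LSF submission command, and the job script name is hypothetical.

```python
# Sketch of a submission helper for the AMS queues described above.
# Queue names and limits come from the text; the script name is hypothetical.

AMS_QUEUES = {
    "ams":       {"cpu_limit_min": 11520, "max_running": 1500},
    "ams_short": {"cpu_limit_min": 360,   "max_running": 100},
    "ams_prod":  {"cpu_limit_min": 5760,  "max_running": None},  # no job limit
}

def bsub_command(queue, n_cores, script):
    """Build an LSF 'bsub' command line for an AMS job on the given queue."""
    if queue not in AMS_QUEUES:
        raise ValueError(f"unknown AMS queue: {queue}")
    return f"bsub -q {queue} -n {n_cores} {script}"

# e.g. an 8-core multi-threaded job on the default queue:
print(bsub_command("ams", 8, "run_analysis.sh"))
```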
At CNAF, on the local filesystem, the latest data production (at the time of writing still pass6, although the full transfer of pass7 is in progress) and a sub-sample of the Monte Carlo production are usually available.\\
%Due to the high I/O throughput from disk of AMS jobs, $\sim$ 1MB/s (1000 running
%jobs may generate $\sim$ 1GB/s of I/O throughput from disk), the AMS Collaboration verified the possibility to remotely access, in a batch job run ning , all the files of the AMS production at CERN via XRootD.
%To efficiently use all of the resources allocated at CNAF, in terms of disk and
%CPU time, AMS decided to adopt a double strategy to process AMS data files:
%lsf jobs using data stored on local filesystem and lsf jobs accessing data files
%at CERN via xrootd protocol.
%Currently, AMS server disk can support I/O throughput up to 40 Gb/s and we are verifying our strategy, increasing the number of lsf running on local disk and accessing external files via xrootd only when they are not locally available or if the local filesystem is going to be saturated.
In past years, the AMS-Italy collaboration explored the possibility of running part of the batch (LSF) jobs at CNAF on data accessed remotely, through XRootD, from the AMS dataset at CERN. This was proven to run smoothly and, for a couple of years now, when a dataset is not present at CNAF due to the lack of space, the analysis is performed by simply accessing these files from CERN. While usually the latest `pass` is fully available at CNAF, there is not enough space to host also the corresponding Monte Carlo data samples.\\
On the contrary, there are some data samples (e.g. reduced {\it streams} or reduced ntuples produced with a custom and lighter data format), tailored for the analyses carried out by the AMS-Italy members, that are not present at CERN and are thus essentially available only from CNAF.\\
To facilitate data analysis for users at small remote sites (i.e. the computing farms present in the various INFN branches and in the ASI headquarters) and to seamlessly integrate extra available resources (e.g. cloud resources), in 2018 the possibility of an XRootD endpoint at CNAF has been explored and is currently under test. This will allow users to read/write data on the GPFS area and will serve as a starting point for an eventual federation with the CERN AMS XRootD endpoint.\\
An XRootD federation with CERN would automatically provide the following functionalities:
\begin{itemize}
\item CNAF users will ``see'' a single {\it namespace} with all the AMS data files. They will access all the data files regardless of whether these are present at CNAF: if a file is not locally available, the XRootD server will transparently redirect to the corresponding copy at CERN;
\item the online disk space will be optimized between CERN and CNAF: the most used data sets (e.g. pass7) will be present at both sites, while less accessed ones will be present at only one of the two sites. Custom data sets, instead, will be present only at CNAF;
\item the smaller computing centers and cloud resources, essentially without disk space, will access the complete AMS data set from the XRootD federation and will have the possibility to write their output directly at CNAF.
\end{itemize}
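The redirect behaviour in the first point can be sketched as follows. This is a minimal illustration of the logic, not AMS software: all hostnames and mount points are hypothetical placeholders.

```python
import os

# Sketch of the federation redirect logic described above: a job opens a
# logical file name; if the file is hosted on the local GPFS area it is
# served from CNAF, otherwise the request transparently falls back to CERN.
# All hostnames and mount points below are hypothetical.

LOCAL_GPFS_PREFIX = "/storage/gpfs_ams"               # hypothetical mount
CNAF_ENDPOINT = "root://xrootd-cnaf.example.it//ams"  # hypothetical endpoint
CERN_ENDPOINT = "root://xrootd-cern.example.ch//ams"  # hypothetical endpoint

def resolve(lfn):
    """Return the XRootD URL a job should open for logical file name lfn."""
    rel = lfn.lstrip("/")
    if os.path.exists(os.path.join(LOCAL_GPFS_PREFIX, rel)):
        return f"{CNAF_ENDPOINT}/{rel}"    # file available at CNAF
    return f"{CERN_ENDPOINT}/{rel}"        # transparent fallback to CERN
```

In the real federation this decision is taken by the XRootD redirector itself, so the client sees a single namespace and never needs to know where the file is hosted.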
The potential of the integration of external resources (the AMS-ASI computing center and cloud resources) into a single batch system will be described in more detail in Sec.~\ref{ReD}.
%\begin{figure}[h]
%\begin{center}
%\includegraphics[width=20pc]{input_output.jpg}
%\end{center}
%\caption{Number of pending (green) and running (blue) AMS jobs, in the top plot, and input (green)/output(blue) network traffic rate, on the lower plot, as a function of time}.
%\label{fig:input_output_net}
%\end{figure}
\section{Activities in 2018}
AMS activities at CNAF in 2018 have been related to data reprocessing, Monte
Carlo production and data analysis. These activities have produced four publications, reporting the measurement of the primary and secondary components of cosmic rays from Lithium to Oxygen \cite{Aguilar:2018keu,Aguilar:2018njt} and of the fine time structures of the electron, positron, proton and Helium fluxes \cite{Aguilar:2018wmi,Aguilar:2018ons} performed by AMS.
%\subsection*{Monte Carlo production}
%As part of the network AMS computing centers, CNAF has been involved in the Monte Carlo campaign devoted to the study of proton, helium and light nuclei ions for AMS publications. To support Monte Carlo campaign, special LSF profile has been implemented to allow AMS users to submit multi-thread simulation jobs. The AMS collaboration in 2016 used $\sim$11000 CPU-years for MC production. In particular in 2016 the collaboration started to face one of the main physics objectives of the experiment: the search for primordial anti-matter, i.e. anti-Helium. Being the anti-Helium more rare than 1 particle over 10 million Helium particles, and so the signal/background so tiny, a large effort to produce a MC sample of the background (i.e. Helium) is needed in order to have a statistical meaning sample to search Helium particles being mis-identified as anti-Helium. A large MC Helium production, with 35 billion simulated events, corresponding to $\sim$ 6000 CPU-years, has been conducted. This effort has been shared among the various AMS collaboration production sites, including CNAF, as shown in Fig.\ref{fig:He-MC}.
%\begin{figure}[h]
%\begin{center}
%\includegraphics[trim = 145pt 270pt 145pt 270pt, clip, width=0.35\textwidth]{He-MC.pdf}
%\end{center}
%\caption{Sharing among the various production sites of the $\sim$ 6000 CPU-years needed for the anti-Helium analysis.}.
%\label{fig:He-MC}
%\end{figure}
\subsection*{Data analysis}
Several analyses are carried out by the Italian collaboration. In 2018, the CNAF resources for user analysis have been devoted to several different topics: the update, with more statistics, of the electron and positron analyses (which resulted in two PRL publications in 2019 \cite{Aguilar:2019pos,Aguilar:2019ele}), the measurement of the light nuclei abundances (which resulted in the PRL publications \cite{Aguilar:2018keu,Aguilar:2018njt}) and the study of their time variation, as well as the study of the proton and helium fluxes as a function of time, the deuteron abundance measurement and the antideuteron search analysis.
%The disk resources pledged in 2018, $\sim$ 2 PB, were mostly devoted to the pass7 data sample ($\sim$ 1 PB), MC data sample ($\sim$ 400 TB), selected data streams ($\sim$ 100 TB of pre-selected data used for common electron/positron, antiproton, antideuteron, proton and ion analysis) and scratch area for users.
\subsection*{Research and Development}
\label{ReD}
As mentioned above, during 2017 AMS started evaluating the technical feasibility of integrating cloud resources (possibly seamlessly), primarily in order to benefit from external, opportunistic computing resources. The architectural model foresees that all AMS data are and will be hosted at CNAF. Cloud compute resources should be able to access data remotely (possibly caching locally for the sake of I/O optimization), and the produced data (namely output files) should be moved into the CNAF storage.\\
The AMS work-flow has been successfully integrated in DODAS (Dynamic On Demand Analysis Service, a thematic service funded by the EOSC-hub European project \cite{DODAS}) and has been validated and consolidated during 2018. The success of the validation tests, performed over the resources provided by the HelixNebula Science Cloud and over the Google Cloud INFN grant, motivates further exploitation as well as an evolution of the strategy. In total, in 2018 the Italian collaboration benefited from more than 4\textit{\,k\,HS06\,yr} of opportunistic resources, representing $\sim$20\% of those obtained from CNAF.\\
In more detail, during 2019 the plan is to consolidate the usage of the INFN on-premises cloud providers, namely Cloud@ReCaS Bari and Cloud@CNAF, in the context of DODAS: consolidation by means of improved I/O management, using emerging solutions for data caching, as well as by starting to exploit geographically distributed clusters.\\
The latter consists in exploiting DODAS-based solutions to create a single logical cluster running over any available resource provider. The desired solution is to allow users to submit jobs from e.g. a CNAF-provided User Interface to a single queue, and to allow dynamic clusters to fetch payloads in a secure and transparent (to the end user) way.\\
From a technical perspective, the distributed cluster implementation will be based on HTCondor technology, which is an important strategic aspect because we expect it will allow, later on, a completely seamless integration with the batch system of the CNAF Tier 1.
As a note, the two activities mentioned above are strictly related, and the optimization of the I/O strategy will be key to the success of the distributed cluster implementation.\\
Another initiative, started during 2017, is the geographical extension of OpenStack to a remote site. In more detail, this activity concerns a geographically distributed cloud system (based on OpenStack) aiming to share and manage computing and storage resources owned by heterogeneous cooperating entities.\\
The prototype has already been developed and tested. During 2019 the plan is to complete the integration of the whole available hardware hosted at the ASI-SSDC (Space Science Data Center at the Italian Space Agency), located in Rome, and to start exploiting it through DODAS. This will be one of the providers we expect to include in the geo-distributed setup mentioned above.
The goal by the end of 2019 is to bring the computing resources hosted at the ASI-SSDC to production.
\section*{References}
\begin{thebibliography}{9}
\bibitem{Aguilar:2018keu}
M.~Aguilar {\it et al.} [AMS Collaboration],
%``Precision Measurement of Cosmic-Ray Nitrogen and its Primary and Secondary Components with the Alpha Magnetic Spectrometer on the International Space Station,''
Phys.\ Rev.\ Lett.\ {\bf 121} (2018) no.5, 051103. doi:\url{10.1103/PhysRevLett.121.051103}
%%CITATION = doi:10.1103/PhysRevLett.121.051103;%%
%6 citations counted in INSPIRE as of 24 Apr 2019
\bibitem{Aguilar:2018wmi}
M.~Aguilar {\it et al.} [AMS Collaboration],
%``Observation of Fine Time Structures in the Cosmic Proton and Helium Fluxes with the Alpha Magnetic Spectrometer on the International Space Station,''
Phys.\ Rev.\ Lett.\ {\bf 121} (2018) no.5, 051101. doi:\url{10.1103/PhysRevLett.121.051101}
%%CITATION = doi:10.1103/PhysRevLett.121.051101;%%
%8 citations counted in INSPIRE as of 24 Apr 2019
\bibitem{Aguilar:2018ons}
M.~Aguilar {\it et al.} [AMS Collaboration],
%``Observation of Complex Time Structures in the Cosmic-Ray Electron and Positron Fluxes with the Alpha Magnetic Spectrometer on the International Space Station,''
Phys.\ Rev.\ Lett.\ {\bf 121} (2018) no.5, 051102. doi:\url{10.1103/PhysRevLett.121.051102}
%%CITATION = doi:10.1103/PhysRevLett.121.051102;%%
%10 citations counted in INSPIRE as of 24 Apr 2019
\bibitem{Aguilar:2018njt}
M.~Aguilar {\it et al.} [AMS Collaboration],
%``Observation of New Properties of Secondary Cosmic Rays Lithium, Beryllium, and Boron by the Alpha Magnetic Spectrometer on the International Space Station,''
Phys.\ Rev.\ Lett.\ {\bf 120} (2018) no.2, 021101. doi:\url{10.1103/PhysRevLett.120.021101}
%%CITATION = doi:10.1103/PhysRevLett.120.021101;%%
%34 citations counted in INSPIRE as of 24 Apr 2019
\bibitem{Aguilar:2019pos}
M.~Aguilar {\it et al.} [AMS Collaboration],
% Towards Understanding the Origin of Cosmic-Ray Positrons
Phys.\ Rev.\ Lett.\ {\bf 122} (2019) no.4, 041102.
doi:\url{10.1103/PhysRevLett.122.041102}
\bibitem{Aguilar:2019ele}
M.~Aguilar {\it et al.} [AMS Collaboration],
% Towards Understanding the Origin of Cosmic-Ray Electrons
Phys.\ Rev.\ Lett.\ {\bf 122} (2019) no.10, 101101.
doi:\url{10.1103/PhysRevLett.122.101101}
\bibitem{dst} D. D'Urso and M. Duranti, J. Phys.: Conf. Ser. {\bf 664} (2015) 072016
\bibitem{DODAS}
D. Spiga {\it et al.}
%“DODAS: How to effectively exploit heterogeneous clouds for scientific computations”,
PoS(ISGC-2018 \& FCDD) {\bf 024} doi:\url{https://doi.org/10.22323/1.327.0024}
%\bibitem{xrootd} http://xrootd.org.
\end{thebibliography}
\end{document}
\documentclass[a4paper]{jpconf}
\usepackage{graphicx}
\begin{document}
\title{The ATLAS Experiment at the INFN CNAF Tier 1}
\author{A. De Salvo$^1$, L. Rinaldi$^2$}
\address{$^1$ INFN Sezione di Roma-1, Roma, IT}
\address{$^2$ Universit\`a di Bologna e INFN Sezione di Bologna, Bologna, IT}
\ead{alessandro.desalvo@roma1.infn.it, lorenzo.rinaldi@bo.infn.it}
\begin{abstract}
The ATLAS experiment at the LHC was fully operating in 2018. In this contribution we describe the ATLAS computing activities performed at the Italian sites of the Collaboration, and in particular the utilisation of the CNAF Tier 1.
\end{abstract}
\section{Introduction}
ATLAS \cite{ATLAS-det} is one of the two general-purpose detectors at the Large Hadron Collider (LHC). It investigates a wide range of physics, from the search for the Higgs boson and Standard Model studies to extra dimensions and particles that could make up dark matter. Beams of particles from the LHC collide at the center of the ATLAS detector, producing collision debris in the form of new particles which fly out from the collision point in all directions. Six different detecting subsystems, arranged in layers around the collision point, record the paths, momentum and energy of the particles, allowing them to be individually identified. A huge magnet system bends the paths of charged particles so that their momenta can be measured. The interactions in the ATLAS detector create an enormous flow of data. To digest the data, ATLAS uses an advanced trigger system to tell the detector which events to record and which to ignore. Complex data-acquisition and computing systems are then used to analyse the collision events recorded. At 46 m long, 25 m high and 25 m wide, the 7000-ton ATLAS detector is the largest-volume particle detector ever built. It sits in a cavern 100 m below ground near the main CERN site, close to the village of Meyrin in Switzerland.
More than 3000 scientists from 174 institutes in 38 countries work on the ATLAS experiment.
ATLAS took data from 2010 to 2012, at center-of-mass energies of 7 and 8 TeV, collecting about 5 and 20 fb$^{-1}$ of integrated luminosity, respectively. During the complete Run-2 phase (2015-2018) ATLAS collected and registered at the Tier 0 147 fb$^{-1}$ of integrated luminosity at a center-of-mass energy of 13 TeV.
The experiment has been designed to look for New Physics over a very large set of final states and signatures, and for precision measurements of known Standard Model (SM) processes. Its most notable result up to now has been the discovery of a new resonance at a mass of about 125 GeV \cite{ATLAS higgs}, followed by the measurement of its properties (mass, production cross sections in various channels and couplings). These measurements have confirmed the compatibility of the new resonance with the Higgs boson, foreseen by the SM but never observed before.
\section{The ATLAS Computing System}
The ATLAS Computing System \cite{ATLAS-cm} is responsible for the provision of the software framework and services, the data management system, user-support services, and the world-wide data access and job-submission system. The development of detector-specific algorithmic code for simulation, calibration, alignment, trigger and reconstruction is under the responsibility of the detector projects, but the Software and Computing Project plans and coordinates these activities across detector boundaries. In particular, a significant effort has been made to ensure that relevant parts of the “offline” framework and event-reconstruction code can be used in the High Level Trigger. Similarly, close cooperation with Physics Coordination and the Combined Performance groups ensures the smooth development of global event-reconstruction code and of software tools for physics analysis.
\subsection{The ATLAS Computing Model}
The ATLAS Computing Model embraces the Grid paradigm and a high degree of decentralisation and sharing of computing resources. The required level of computing resources means that off-site facilities are vital to the operation of ATLAS in a way that was not the case for previous CERN-based experiments. The primary event processing occurs at CERN in a Tier 0 Facility. The RAW data are archived at CERN and copied (along with the primary processed data) to the Tier 1 facilities around the world. These facilities archive the raw data, provide the reprocessing capacity, provide access to the various processed versions, and allow scheduled analysis of the processed data by physics analysis groups. Derived datasets produced by the physics groups are copied to the Tier 2 facilities for further analysis. The Tier 2 facilities also provide the simulation capacity for the experiment, with the simulated data housed at Tier 1 centers. In addition, Tier 2 centers provide analysis facilities, and some provide the capacity to produce calibrations based on processing raw data. A CERN Analysis Facility provides an additional analysis capacity, with an important role in the calibration and algorithmic development work. ATLAS has adopted an object-oriented approach to software, based primarily on the C++ programming language, but with some components implemented using FORTRAN and Java. A component-based model has been adopted, whereby applications are built up from collections of plug-compatible components based on a variety of configuration files. This capability is supported by a common framework that provides common data-processing support. This approach results in great flexibility in meeting both the basic processing needs of the experiment and the changing requirements throughout its lifetime.
The heavy use of abstract interfaces allows for different implementations to be provided, supporting different persistency technologies, or optimized for the offline or high-level trigger environments.
The Athena framework is an enhanced version of the Gaudi framework that was originally developed by the LHCb experiment, but is now a common ATLAS-LHCb project. Major design principles are the clear separation of data and algorithms, and between transient (in-memory) and persistent (in-file) data. All levels of processing of ATLAS data, from high-level trigger to event simulation, reconstruction and analysis, take place within the Athena framework; in this way it is easier for code developers and users to test and run algorithmic code, with the assurance that all geometry and conditions data will be the same for all types of applications (simulation, reconstruction, analysis, visualization).
One of the principal challenges for ATLAS computing is to develop and operate a data storage and management infrastructure able to meet the demands of a yearly data volume of O(10PB) utilized by data processing and analysis activities spread around the world. The ATLAS Computing Model establishes the environment and operational requirements that ATLAS data-handling systems must support and provides the primary guidance for the development of the data management systems.
The ATLAS Databases and Data Management Project (DB Project) leads and coordinates ATLAS activities in these areas, with a scope encompassing technical data bases (detector production, installation and survey data), detector geometry, online/TDAQ databases, conditions databases (online and offline), event data, offline processing configuration and bookkeeping, distributed data management, and distributed database and data management services. The project is responsible for ensuring the coherent development, integration and operational capability of the distributed database and data management software and infrastructure for ATLAS across these areas.
The ATLAS Computing Model defines the distribution of raw and processed data to Tier 1 and Tier 2 centers, so as to be able to exploit fully the computing resources that are made available to the Collaboration. Additional computing resources are available for data processing and analysis at Tier 3 centers and other computing facilities to which ATLAS may have access. A complex set of tools and distributed services, enabling the automatic distribution and processing of the large amounts of data, has been developed and deployed by ATLAS in cooperation with the LHC Computing Grid (LCG) Project and with the middleware providers of the three large Grid infrastructures we use: EGI, OSG and NorduGrid. The tools are designed in a flexible way, in order to have the possibility to extend them to use other types of Grid middleware in the future.
The main computing operations that ATLAS have to run comprise the preparation, distribution and validation of ATLAS software, and the computing and data management operations run centrally on Tier 0, Tier 1 sites and Tier 2 sites. The ATLAS Virtual Organization allows production and analysis users to run jobs and access data at remote sites using the ATLAS-developed Grid tools.
The Computing Model, together with the knowledge of the resources needed to store and process each ATLAS event, gives rise to estimates of required resources that can be used to design and set up the various facilities. It is not assumed that all Tier 1 sites or Tier 2 sites are of the same size; however, in order to ensure a smooth operation of the Computing Model, all Tier 1 centers usually have broadly similar proportions of disk, tape and CPU, and similarly for the Tier 2 sites.
The organization of the ATLAS Software and Computing Project reflects all areas of activity within the project itself. Strong high-level links are established with other parts of the ATLAS organization, such as the TDAQ Project and Physics Coordination, through cross-representation in the respective steering boards. The Computing Management
Board, and in particular the Planning Officer, acts to make sure that software and computing developments take place coherently across sub-systems and that the project as a whole meets its milestones. The International Computing Board assures the information flow between the ATLAS Software and Computing Project and the national resources and their Funding Agencies.
\section{The role of the Italian Computing facilities in the global ATLAS Computing}
Italy provides Tier 1, Tier 2 and Tier 3 facilities to the ATLAS collaboration. The Tier 1, located at CNAF, Bologna, is the main center, also referred to as the ``regional'' center. The Tier 2 centers are distributed in different areas of Italy, namely in Frascati, Napoli, Milano and Roma. All 4 Tier 2 sites are considered Direct Tier 2 (T2D) sites, meaning that they have a higher importance with respect to normal Tier 2s and can host primary data too. They are also considered satellites of the Tier 1, which is identified as the nucleus. The combined size of the Tier 2 sites exceeds the total ATLAS size at the Tier 1 for what concerns disk and CPU; tape is not available at the Tier 2 sites. A third category of sites is the so-called Tier 3 centers. These are smaller centers, scattered in different places in Italy, that nevertheless contribute in a consistent way to the overall computing power, in terms of disk and CPU. The overall size of the Tier 3 sites corresponds roughly to the size of a Tier 2 site. The Tier 1 and Tier 2 sites have pledged resources, while the Tier 3 sites do not have any pledged resources.
In terms of pledged resources, Italy contributes to the ATLAS computing with 9\% of both CPU and disk for the Tier 1. The share of the Tier 2 facilities corresponds to 7\% of disk and 9\% of CPU of the whole ATLAS computing infrastructure. The Italian Tier 1, together with the other Italian centers, provides both resources and expertise to the ATLAS computing community, and manages the so-called Italian Cloud of computing. Since 2015 the Italian Cloud does not include only Italian sites, but also Tier 3 sites of other countries, namely South Africa and Greece.
The computing resources, in terms of disk, tape and CPU, available at the Tier 1 at CNAF have been very important for all kinds of activities, including event generation, simulation, reconstruction, reprocessing and analysis, for both Monte Carlo and real data. Its major contribution has been data reprocessing, a very I/O- and memory-intensive operation normally executed only at Tier 1 centers. In this sense CNAF played a fundamental role in the precision measurement of the Higgs boson properties [3] in 2018 and in other analyses. The Italian centers, including CNAF, have been very active not only on the operations side but have also contributed to various aspects of the computing of the ATLAS experiment, in particular the network, the storage systems, the storage federations and the monitoring tools. The Tier 1 at CNAF has been particularly important for the ATLAS community in 2018 for some specific activities:
\begin{itemize}
\item improvements of the WebDAV/HTTPS access for StoRM, to be used as the main renaming method for ATLAS files in StoRM and for HTTP federation purposes;
\item improvements of the dynamic model of the multi-core resources operated via the LSF resource management system, and simplification of the PanDA queues, using the Harvester service to mediate the control and information flow between PanDA and the resources;
\item network troubleshooting via the perfSONAR-PS network monitoring system, used for the LHCONE overlay network, together with the other Tier 1 and Tier 2 sites;
\item planning, readiness testing and implementation of the HTCondor batch system for the management of the farming resources.
\end{itemize}
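As an illustration of the WebDAV-based renaming mentioned in the first item: in the WebDAV protocol (RFC 4918) a rename is an HTTP MOVE request carrying a \texttt{Destination} header. The sketch below is purely hypothetical; the endpoint URL, file paths and token are illustrative placeholders, not actual CNAF/StoRM settings.

```python
# Hypothetical sketch of a WebDAV rename (HTTP MOVE) against a StoRM-like
# endpoint. All names below (host, paths, token) are made-up placeholders.

def build_move_request(base_url, src_path, dst_path, token):
    """Return the method, URL and headers for a WebDAV rename (HTTP MOVE)."""
    base = base_url.rstrip("/")
    return {
        "method": "MOVE",
        "url": base + src_path,
        "headers": {
            # Per RFC 4918, Destination gives the new name of the resource.
            "Destination": base + dst_path,
            # Authentication scheme is assumed; real deployments may use
            # X.509 proxies or VO-issued tokens instead.
            "Authorization": "Bearer " + token,
        },
    }

req = build_move_request("https://storm.example.org:8443/atlas",
                         "/scratch/file.root.part",
                         "/scratch/file.root",
                         "PLACEHOLDER-TOKEN")
# With the `requests` library the call would then be:
#   requests.request(req["method"], req["url"], headers=req["headers"])
```

The upload-then-rename pattern (writing to a temporary name and issuing a MOVE once the transfer completes) is one common way such a renaming method is used; the sketch only shows how the request itself is assembled.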
\section{Main achievements of ATLAS Computing centers in Italy}
The Italian Tier 2 Federation runs all the ATLAS computing activities in the Italian cloud, supporting the operations at CNAF, the Italian Tier 1 center, and at the Milano, Napoli, Roma1 and Frascati Tier 2 sites. This ensures an optimized use of the resources and fair and efficient data access. The computing activities of the ATLAS collaboration were carried out continuously throughout 2018, in order to analyse the Run 2 data and produce the Monte Carlo samples needed for the 2018 run.
The LHC data taking started in April 2018 and, until the end of operations in December 2018, all the Italian sites, the CNAF Tier 1 and the four Tier 2 sites, were involved in all the computing operations of the collaboration: data reconstruction, Monte Carlo simulation, user and group analysis and data transfer among all the sites. Besides these activities, the Italian centers contributed to the upgrade of the Computing Model, both on the testing side and through dedicated working groups. ATLAS collected and registered at the Tier 0 about 60.6 fb$^{-1}$ and about 25 PB of raw and derived data, while the cumulative data volume distributed across all the data centers on the grid was of the order of 80 PB. The data were replicated with an efficiency of 100\% and an average throughput of the order of 13 GB/s during the data taking period, with peaks above 25 GB/s. For Italy alone, the average throughput was of the order of 800 MB/s, with peaks above 2 GB/s. Data replication from the Tier 0 to the Tier 2 sites has been quite fast, with transfer times below 4 hours. The average number of simultaneous jobs running on the grid was about 110k for production (simulation and reconstruction) and data analysis, with peaks over 150k and an average CPU efficiency of more than 80\%. The use of the grid for analysis was stable at about 26k simultaneous jobs, with peaks over 40k around conference periods, showing the reliability and effectiveness of grid tools for data analysis.
The Italian sites contributed to the development of the Xrootd and http/WebDAV federations. In the latter case the access to the storage resources is managed using the http/WebDAV protocol, in collaboration with the CERN DPM team, the Belle2 experiment, the Canadian Corporate Cloud and the RAL (UK) site. The purpose is to build a reliable storage federation, alternative to the Xrootd one, to access physics data both on the grid and on cloud storage infrastructures (such as Amazon S3, Microsoft Azure, etc.). The Italian community is particularly involved in this project and the first results have been presented to the WLCG collaboration.
The Italian community also contributes to the development of new tools for distributed data analysis and management. Another topic of interest is the usage of new computing technologies: in this field the Italian community contributed to the development and testing of muon tracking algorithms for the ATLAS High Level Trigger using GPGPUs. Other topics in which the Italian community is involved are Machine Learning and Deep Learning, for both analysis and Operational Intelligence, and their application to the experiment software and infrastructure using accelerators such as GPGPUs and FPGAs.
The contribution of the Italian sites to the computing activities, in terms of processed jobs and data recorded, has been about 9\%, in line with the resources pledged to the collaboration, with very good performance in terms of availability, reliability and efficiency. All the sites consistently rank in the top positions among the collaboration sites.
Besides the Tier 1 and Tier 2 sites, in 2018 the Tier 3 sites also gave a significant contribution to the Italian physics community for data analysis. The Tier 3 centers are local farms dedicated to interactive data analysis, the last step of the analysis workflow, and to grid analysis over small data samples. Several Italian groups have set up such farms at their universities and, after a testing and validation process performed by the distributed computing team of the collaboration, all have been recognized as official Tier 3 sites of the collaboration.
\section{Impact of CNAF flooding incident on ATLAS computing activities}
The ATLAS Computing Model was designed with sufficient redundancy of the available resources to cope with emergency situations such as the flooding that occurred on November 9th, 2017 at CNAF. Thanks to the huge effort of the whole CNAF community, the data center gradually resumed operation from the second half of February 2018. Continuous interaction between the ATLAS distributed computing community and CNAF staff was needed to bring computing operations fully back to normality. This close collaboration was very successful: after one month the site was almost fully operational and the ATLAS data management and processing activities were running smoothly again. In the end, the overall impact of the incident was limited, mainly thanks to the relatively quick recovery of the CNAF data center and to the robustness of the computing model.
\section*{References}
\begin{thebibliography}{9}
\bibitem{ATLAS-det} ATLAS Collaboration, ATLAS Computing: Technical Design Report, ATLAS-TDR-017, CERN-LHCC-2005-022, June 2005
\bibitem{ATLAS higgs} ATLAS Collaboration, Observation of a new particle in the search for the Standard Model Higgs boson with the ATLAS detector at the LHC, Physics Letters B 716 (2012) 1--29
\bibitem{ATLAS-cm} R. W. L. Jones and D. Barberis, The evolution of the ATLAS computing model, J. Phys.: Conf. Ser. 219 (2010) 072037
\end{thebibliography}
\end{document}