......@@ -2,34 +2,34 @@
\usepackage{graphicx}
\begin{document}
\title{User and Operational Support at CNAF}
\author{D. Cesini, E. Corni, F. Fornari, L. Morganti, C. Pellegrino, M. V. P. Soares, M. Tenti, L. Dell'Agnello}
\address{INFN-CNAF, Bologna, IT}
\author{D. Cesini$^1$, E. Corni$^1$, F. Fornari$^1$, L. Morganti$^1$, C. Pellegrino$^1$, M. V. P. Soares$^1$, M. Tenti$^1$, L. Dell'Agnello$^1$}
\address{$^1$ INFN-CNAF, Bologna, IT}
\ead{user-support@lists.cnaf.infn.it}
\begin{abstract}
Many different research groups, typically organized in Virtual Organizations (VOs),
exploit the Tier-1 Data center facilities for computing and/or data storage and management. Moreover, CNAF hosts two small HPC farms and a Cloud infrastructure. The User Support unit provides to the users of all CNAF facilities with a direct operational support, and promotes common technologies and best-practices to access the ICT resources in order to facilitate the usage of the center and maximize its efficiency.
exploit the Tier 1 Data center facilities for computing and/or data storage and management. Moreover, CNAF hosts two small HPC farms and a Cloud infrastructure. The User Support unit provides the users of all CNAF facilities with direct operational support, and promotes common technologies and best practices for accessing the ICT resources, in order to facilitate the usage of the center and maximize its efficiency.
\end{abstract}
\section{Current status}
Established in April 2012, the User Support team was composed in 2018 of one coordinator and up to five fellows with post-doctoral education or equivalent work experience in scientific research or computing.
The main activities of the team include:
\begin{itemize}
\item providing prompt feedback on VO-specific issues via ticketing systems or official mail channels;
\item forwarding to the appropriate Tier-1 units those requests which cannot be autonomously satisfied, and taking care of answers and fixes, e.g. via the tracker JIRA, until a solution is delivered to the experiments;
\item forwarding to the appropriate Tier 1 units those requests which cannot be autonomously satisfied, and taking care of answers and fixes, e.g. via the tracker JIRA, until a solution is delivered to the experiments;
\item supporting the experiments in the definition and debugging of computing models in distributed and Cloud environments;
\item helping the supported experiments by developing code, monitoring frameworks and writing guides and documentation for users (see e.g. https://www.cnaf.infn.it/en/users-faqs/);
\item solving issues on experiment software installation, access problems, new accounts creation and any other daily usage problems;
\item porting applications to new parallel architectures (e.g. GPUs and HPC farms);
\item providing the Tier-1 Run Coordinator, who represents CNAF at the Daily WLCG calls, and reports about resource usage and problems at the monthly meeting of the Tier-1 management body (Comitato di Gestione del Tier-1).
\item providing the Tier 1 Run Coordinator, who represents CNAF at the Daily WLCG calls, and reports about resource usage and problems at the monthly meeting of the Tier 1 management body (Comitato di Gestione del Tier 1).
\end{itemize}
People belonging to the User Support team represent INFN Tier-1 inside the VOs.
People belonging to the User Support team represent INFN Tier 1 inside the VOs.
In some cases, they are directly integrated in the supported experiments. Moreover, they can play the role of a member of any VO for debugging purposes.
The User Support staff is also involved in different CNAF internal projects, notably the Computing on SoC Architectures (COSA) project (www.cosa-project.it) dedicated to the technology tracking and benchmarking of the modern low-power architectures for computing applications.
\section{Supported experiments}
The LHC experiments represent the main users of the data center, handling more than 80\% of the total computing and storage resources funded at CNAF. Besides the four LHC experiments (ALICE, ATLAS, CMS, LHCb) for which CNAF acts as Tier-1 site, the data center also supports an ever increasing number of experiments from the Astrophysics, Astroparticle physics and High Energy Physics domains, and specifically Agata, AMS-02, Argo-YBJ, Auger, Belle II, Borexino, CDF, Compass, COSMO-WNEXT CTA, Cuore, Cupid, Dampe, DarkSide-50, Enubet, Famu, Fazia, Fermi-LAT, Gerda, Icarus, LHAASO, LHCf, Limadou, Juno, Kloe, KM3Net, Magic, NA62, Newchim, NEWS, NTOP, Opera, Padme, Pamela, Panda, Virgo, and XENON.
The LHC experiments represent the main users of the data center, handling more than 80\% of the total computing and storage resources funded at CNAF. Besides the four LHC experiments (ALICE, ATLAS, CMS, LHCb) for which CNAF acts as Tier 1 site, the data center also supports an ever increasing number of experiments from the Astrophysics, Astroparticle physics and High Energy Physics domains, and specifically Agata, AMS-02, Auger, Belle II, Borexino, CDF, Compass, COSMO-WNEXT CTA, Cuore, Cupid, Dampe, DarkSide-50, Enubet, Famu, Fazia, Fermi-LAT, Gerda, Icarus, LHAASO, LHCf, Limadou, Juno, Kloe, KM3Net, Magic, NA62, Newchim, NEWS, NTOP, Opera, Padme, Pamela, Panda, Virgo, and XENON.
Clearly, a bigger effort from the User Support team is needed to answer the varied needs of these non-LHC experiments and to encourage them to adopt more modern technologies, e.g. FTS, Dirac, token-based authorization.
\begin{figure}[ht]
......@@ -60,12 +60,13 @@ The following figures show resources pledged and used by the supported experimen
Unfortunately, the accounting data for storage, both disk and tape statistics, are available only after summer 2018, since restoring the complex system of accounting sensors after the 2017 flooding had a lower priority than the activities needed for a complete recovery of the storage resources involved in the flood.
\section{Support to HPC and cloud-based experiments}
Apart from Tier-1 facilities, CNAF hosts two small HPC farms and a cloud infrastructure. The first HPC cluster, in production since 2015, is composed of 27 nodes, some of them also equipped with one or more GPUs (NVIDIA Tesla K20, K40 and K1). All nodes are infiniband interconnected and equipped with 2 Intel CPUs, 8 physical cores each, HyperThread enabled. The cluster is accessible via the LSF batch system. It is open to various INFN communities, but the main users are theoretical physicist dealing with plasma laser acceleration simulations. The cluster serves as testing infrastructure to prepare the high resolution runs submitted to supercomputers.
Apart from Tier 1 facilities, CNAF hosts two small HPC farms and a cloud infrastructure. The first HPC cluster, in production since 2015, is composed of 27 nodes, some of them also equipped with one or more GPUs (NVIDIA Tesla K20, K40 and K1). All nodes are InfiniBand interconnected and equipped with 2 Intel CPUs, 8 physical cores each, with Hyper-Threading enabled. The cluster is accessible via the LSF batch system. It is open to various INFN communities, but the main users are theoretical physicists dealing with plasma laser acceleration simulations. The cluster is used as a testing infrastructure to prepare the high-resolution runs to be submitted afterwards to supercomputers.
A second HPC cluster entered into production in 2017 to serve the CERN accelerators R/D groups. The cluster consists of 12 nodes OmniPath interconnected. Can be access through batch queues managed by the IBM LSF system.
A second HPC cluster entered production in 2017 to serve the CERN accelerator R\&D groups. The cluster consists of 12 nodes with an OmniPath interconnect. It can be accessed through batch queues managed by the IBM LSF system.
Support is provided on a daily basis for software installation, access problems, new account creation and any other usage problems.
The User Support team manages an OpenStack-based tenant hosted within the Cloud@CNAF. This tenant, provided with 300 vCPUs, is mostly devoted to support peculiar use cases which require unusual software configurations and only for a limited amount of time. The most important of these use cases is the FAZIA experiment, for which 256 vCPUs were provided, distributed over 16 worker nodes with 8GB of RAM each, where the Debian 8.4 operating system has been installed and configured with LDAP+Kerberos for user authentication and authorization, and NFS 4 for network storage sharing. Recently, other experiments started accessing the Cloud infrastructure: AMS, EEE, FAZIA, Icarus and NTOF.
The User Support team manages an OpenStack-based tenant hosted within the Cloud@CNAF. This tenant, provided with 300 vCPUs, is mostly devoted to support peculiar use cases which require unusual software configurations and only for a limited amount of time. The most important of these use cases is the FAZIA experiment, for which 256 vCPUs were provided, distributed over 16 worker nodes with 8GB of RAM each, where the Debian 8.4 operating system has been installed and configured with LDAP and Kerberos for user authentication and authorization, and NFS 4 for network storage sharing.
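As a quick consistency check of the figures above, and assuming (this is not stated explicitly) that the vCPUs are spread evenly over the worker nodes, the FAZIA allocation corresponds to
\[
\frac{256~\mathrm{vCPUs}}{16~\mathrm{nodes}} = 16~\mathrm{vCPUs/node}, \qquad 16 \times 8~\mathrm{GB} = 128~\mathrm{GB~of~RAM~in~total},
\]
which leaves $300 - 256 = 44$ vCPUs of the tenant for the other use cases.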
Recently, other experiments started accessing the Cloud infrastructure: AMS, EEE, Icarus and NTOF.
\end{document}
......
......@@ -5,18 +5,18 @@
%\author{P. Astone$^1$, F. Badaracco$^{2,3}$, S. Bagnasco$^4$, S. Caudill$^5$, F. Carbognani$^6$, A. Cirone$^{7,8}$, G. Fronz\'e$^{4}$, J. Harms$^{2,3}$, I. LaRosa$^1$, C. Lazzaro$^9$, P. Leaci$^1$, S. Lusso$^4$, C. Palomba$^1$, R. DePietri$^{11,12}$, M. Punturo$^{10}$, L. Rei$^8$, L. Salconi$^6$, S. Vallero$^{4}$, on behalf of the Virgo collaboration}
\author{P. Astone$^1$, F. Badaracco$^{2,3}$, S. Bagnasco$^4$, S. Caudill$^5$, F. Carbognani$^6$, A. Cirone$^{7,8}$, M. Drago$^{2,3}$, G. Fronz\'e$^{4}$, J. Harms$^{2,3}$, I. LaRosa$^1$, C. Lazzaro$^9$, P. Leaci$^1$, S. Lusso$^4$, C. Palomba$^1$, R. DePietri$^{11,12}$, M. Punturo$^{10}$, L. Rei$^8$, L. Salconi$^6$, S. Vallero$^{4}$, on behalf of the Virgo collaboration}
\address{$^1$ INFN, Roma, IT}
\address{$^2$ Gran Sasso Science Institute (GSSI), IT}
\address{$^3$ INFN, Laboratori Nazionali del Gran Sasso, IT}
\address{$^4$ INFN, Torino, IT}
\address{$^5$ Nikhef, Science Park, NL}
\address{$^6$ EGO-European Gravitational Observatory, Cascina, Pisa, IT}
\address{$^7$ Universit\`a degli Studi di Genova, IT}
\address{$^8$ INFN, Genova, IT}
\address{$^9$ INFN, Padova, IT}
\address{$^{10}$ INFN, Perugia, IT}
\address{$^{11}$ Universit\`a degli Studi di Parma, IT}
\address{$^{12}$ INFN, Gruppo Collegato Parma, IT}
\address{$^1$ INFN Sezione di Roma, Roma, IT}
\address{$^2$ Gran Sasso Science Institute (GSSI), L'Aquila, IT}
\address{$^3$ INFN Laboratori Nazionali del Gran Sasso, L'Aquila, IT}
\address{$^4$ INFN Sezione di Torino, Torino, IT}
\address{$^5$ Nikhef, Amsterdam, NL}
\address{$^6$ EGO-European Gravitational Observatory, Cascina (PI), IT}
\address{$^7$ Universit\`a degli Studi di Genova, Genova, IT}
\address{$^8$ INFN Sezione di Genova, Genova, IT}
\address{$^9$ INFN Sezione di Padova, Padova, IT}
\address{$^{10}$ INFN Sezione di Perugia, Perugia, IT}
\address{$^{11}$ Universit\`a degli Studi di Parma, Parma, IT}
\address{$^{12}$ INFN Gruppo Collegato Parma, Parma, IT}
%\address{Production Editor, \jpcs, \iopp, Dirac House, Temple Back, Bristol BS1~6BE, UK}
......@@ -32,7 +32,7 @@ The amount of data processed during the last few years has emphasized the fact t
\section{Advanced Virgo computing model}
\subsection{Data production and data transfer}
The Advanced Virgo data acquisition system is writing about 35MB/s of data (so-called ``bulk data'') during O3. CNAF and CC-IN2P3 are the Virgo Tier-0: during the science runs, bulk data is stored in a circular buffer located at the Virgo site, and simultaneously transferred to the remote computing centres where they are archived in tape libraries. The transfer is realized through an ad-hoc procedure based on GridFTP (at CNAF) and iRods (at CC-IN2P3). Other data fluxes reach CNAF during science runs:
The Advanced Virgo data acquisition system is writing about 35MB/s of data (so-called ``bulk data'') during O3. CNAF and CC-IN2P3 are the Virgo Tier 0: during the science runs, bulk data is stored in a circular buffer located at the Virgo site, and simultaneously transferred to the remote computing centers where they are archived in tape libraries. The transfer is realized through an ad-hoc procedure based on GridFTP (at CNAF) and iRods (at CC-IN2P3). Other data fluxes reach CNAF during science runs:
\begin{itemize}
\item trend data (few GB/day), periodically transferred using the system described above;
......@@ -42,23 +42,23 @@ The Advanced Virgo data acquisition system is writing about 35MB/s of data (so-c
\subsection{Data Analysis at CNAF}
%The analysis of the LIGO and Virgo data was made jointly by the two collaborations; the analysis pipelines are distributed among the worldwide network of computing facilities offering computing resources to the GW experiments. CNAF was mainly used for CW analysis, looking for continuous gravitational wave signals, developed by INFN–Roma people (see hereafter more details). But at CNAF is also running part of the pyCBC pipeline, submitted via OSG, looking for compact binaries signals. pyCBC has a crucial role in the detection of the coalescence of BBH and BNS. CNAF contributed to the computation performed through pyCBC for the analysis of the events GW170814, the first BBH coalescence detected also by Virgo, and GW170817, the BNS coalescence. During the last month a new extension of CVMFS, \emph{big cvmfs} was mounted at cnaf to support another OSG pipeline, \emph{BayesWave}. The big cvmfs is able to export, in a posix fashion, big file of data from nearby cache in Amsterdam instead of accessing data directly from Nebraska. BayesWave is a Bayesian algorithm designed to robustly distinguish gravitational wave signals from noise and instrumental glitches without relying on any prior assumptions of waveform morphology. In the last year coherent WaveBurst \emph{cwb} was ported to cnaf and made available to run. cwb is a pipeline based on coherent algorithm for detection and reconstruction of modelled and unmodelled GW bursts. A new newtonian noise cancellation algoritmh, developed by the group of Gran Sasso Science Institute (\emph{GSSI}) was made available very recently. The increased number of LVC pipelines running at cnaf has led to saturate advance virgo pledge at cnaf, cnaf promptly rensponded to advance virgo needed enlargin our quota and giving experimental access to gpu.
LIGO-Virgo data analysis is organized jointly, meaning that the analysis pipelines are made available to the computing facilities related to the LVC network, ready to be distributed to each GW detector. CNAF has been mainly used for Continuous Wave(\emph{CW}) analysis, led by the Roma INFN group, and for the Compact Binary Coalescence python-based analysis (\emph{pyCBC}), submitted via OSG. In particular CNAF computationally contributed to GW170814 and GW170817 events, respectively the first BBH coalescence detected by Virgo and the first BNS merger ever observed. During the last month a new extension of CVMFS, so-called ``big cvmfs'', was mounted at CNAF to support another OSG-based pipeline, Bayes Wave. The former is able to make available, in a POSIX-like fashion, big data files from a cache in Amsterdam, instead of accessing the data directly from Nebraska. The latter is a Bayesian algorithm, designed to robustly distinguish GW signals from noise and instrumental glitches, without relying on any prior assumptions on the waveform shape. During the last year, coherent WaveBurst(\emph{cWB}), an algorithm dedicated to the detection and reconstruction of GW Bursts, was also ported to CNAF. Furthermore, new Newtonian Noise cancellation algorithms, which are currently being developed by the GSSI group, were made recently available. The increasing number of LVC pipelines running at CNAF has led to resource saturation, and consequently to a demand for enlarged computing power, together with access to GPUs.
LIGO-Virgo data analysis is organized jointly, meaning that the analysis pipelines are made available to the computing facilities related to the LVC network, ready to be distributed to each GW detector. CNAF has been mainly used for Continuous Wave (\emph{CW}) analysis, led by the Roma INFN group, and for the Compact Binary Coalescence Python-based analysis (\emph{pyCBC}), submitted via OSG. In particular, CNAF contributed computing resources to the analysis of the GW170814 and GW170817 events, respectively the first BBH coalescence detected by Virgo and the first BNS merger ever observed. During the last month a new extension of CVMFS, so-called ``big cvmfs'', was mounted at CNAF to support another OSG-based pipeline, BayesWave. The former is able to make available, in a POSIX-like fashion, big data files from a cache in Amsterdam, instead of accessing the data directly from Nebraska. The latter is a Bayesian algorithm, designed to robustly distinguish GW signals from noise and instrumental glitches, without relying on any prior assumptions on the waveform shape. During the last year, coherent WaveBurst (\emph{cWB}), an algorithm dedicated to the detection and reconstruction of GW Bursts, was also ported to CNAF. Furthermore, new Newtonian Noise cancellation algorithms, which are currently being developed by the GSSI group, were recently made available. The increasing number of LVC pipelines running at CNAF has led to resource saturation, and consequently to a demand for enlarged computing power, together with access to GPUs.
\subsubsection{CW pipeline}
CNAF has been in 2018 the main computing center for Virgo all-sky continuous wave (CW) searches. The search for this kind of signals, emitted by spinning neutron stars, covers a large portion of the source parameter space and consists of several steps organized in a hierarchical analysis pipeline. CNAF has been mainly used for the ``incoherent'' stage, based of a particular implementation of the Hough transform, which is the heaviest part of the analysis from a computational point of view. The code implementing the Hough transform has been written in such a way that the exploration of the parameter space can be split in several independent jobs, each covering a range of signal frequencies and a portion of the sky. This is an embarrassingly parallel problem, very well suited to be run in a distributed computing environment. The analysis jobs have been run using the EGI UMD grid middleware, with input and output files stored in a StoRM-based Storage Element at CNAF. Candidate post-processing, consisting of clusterisation, coincidences and ranking, and parts of the candidate follow-up analysis have been also carried on at CNAF. Typical Hough transform jobs needs about 4GB of memory (with a fraction requiring more, up to 8GB). Past year most of the resources have been used to analyze Advanced LIGO O2 data. Overall, in 2018 more than 10M CPU hours have been used at CNAF for CW searches, by running O($10^5$) jobs, with duration from a few hours to ~3 days.
In 2018 CNAF was the main computing center for Virgo all-sky continuous wave (CW) searches. The search for this kind of signal, emitted by spinning neutron stars, covers a large portion of the source parameter space and consists of several steps organized in a hierarchical analysis pipeline. CNAF has been mainly used for the ``incoherent'' stage, based on a particular implementation of the Hough transform, which is the heaviest part of the analysis from a computational point of view. The code implementing the Hough transform has been written in such a way that the exploration of the parameter space can be split into several independent jobs, each covering a range of signal frequencies and a portion of the sky. This is an embarrassingly parallel problem, very well suited to be run in a distributed computing environment. The analysis jobs have been run using the EGI UMD grid middleware, with input and output files stored in a StoRM-based Storage Element at CNAF. Candidate post-processing, consisting of clusterisation, coincidences and ranking, and parts of the candidate follow-up analysis have also been carried out at CNAF. A typical Hough transform job needs about 4GB of memory (with a fraction requiring more, up to 8GB). Last year most of the resources were used to analyze Advanced LIGO O2 data. Overall, in 2018 more than 10M CPU hours were used at CNAF for CW searches, by running O($10^5$) jobs, with durations from a few hours to $\sim$3 days.
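As an order-of-magnitude cross-check of these numbers, taking the lower bound of $10^{7}$ CPU hours and writing $N_{\mathrm{jobs}}$ for the number of jobs (the two values used below are illustrative choices within O($10^5$), not figures from the accounting), the average cost per job is
\[
\langle t \rangle \simeq \frac{10^{7}~\mathrm{CPU~hours}}{N_{\mathrm{jobs}}}, \qquad
\langle t \rangle \approx 100~\mathrm{CPU~hours}~\mathrm{for}~N_{\mathrm{jobs}} = 10^{5}, \qquad
\langle t \rangle \approx 33~\mathrm{CPU~hours}~\mathrm{for}~N_{\mathrm{jobs}} = 3\times 10^{5}.
\]
Assuming single-core jobs, so that CPU hours and wall-clock hours roughly coincide (an assumption), averages of this size are compatible with the quoted job durations, which range from a few hours up to about 72 hours, when the job count is a few times $10^{5}$.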
\subsubsection{cWB pipeline}
Starting in 2019, the coherent WaveBurst based pipelines have been ported and adapted to run at CNAF to reproduce the cWB environment setup on the worker nodes, without the constraint to read the user home account during running. It is planned to run at CNAF all Virgo offline long duration all-sky searches on the data that will be collected during the Observational Run 3 (03) that started April 1st, 2019. cWB is a data-analysis tool to search for a broad range of gravitational-wave (GW) transients. The pipeline identifies coincident events in the GW data from earth-based interferometric detectors and reconstructs the gravitational wave signal by using a constrained maximum likelihood approach. The algorithm performs a time-frequency analysis of the data, using wavelet representation, and identifies the events by clustering time-frequency pixels with significant excess coherent power. The likelihood statistics is built as a coherent sum over the responses of different detectors and estimates the total signal to noise ratio of the GW signal in the network. The pipeline splits the total analysis time into sub-periods to be analyzed in parallel jobs, using HTCondor tools and it is expected to use a consistent amount of CPU hours during 2019.
Starting in 2019, the coherent WaveBurst based pipelines have been ported and adapted to run at CNAF to reproduce the cWB environment setup on the worker nodes, without the constraint of reading the user home account at run time. It is planned to run at CNAF all Virgo offline long duration all-sky searches on the data that will be collected during the Observational Run 3 (O3) that started April 1, 2019. cWB is a data-analysis tool to search for a broad range of gravitational-wave (GW) transients. The pipeline identifies coincident events in the GW data from earth-based interferometric detectors and reconstructs the gravitational wave signal by using a constrained maximum likelihood approach. The algorithm performs a time-frequency analysis of the data, using wavelet representation, and identifies the events by clustering time-frequency pixels with significant excess coherent power. The likelihood statistic is built as a coherent sum over the responses of different detectors and estimates the total signal-to-noise ratio of the GW signal in the network. The pipeline splits the total analysis time into sub-periods to be analyzed in parallel jobs, using HTCondor tools, and it is expected to use a considerable amount of CPU hours during 2019.
\subsubsection{Newtonian noise pipeline}
The cancellation of gravitational noise from seismic fields will be a major challenge from both the theoretical and the computational point of view, since the simulations involved are very demanding. This activity requires the accurate positioning of a large number of seismometers. A cluster at CNAF was used to run position optimisations of the seismic arrays used for cancellation and to determine the cancellation performance as a function of the number of sensors and its robustness with respect to sensor-positioning accuracy.
\subsection{outlook}
The first detection of gravitational waves (GW) and the birth of multi-messenger astrophysics have opened a new field of scientific research. With the possibility to detect GW from various kind of sources we can probe new physical phenomena in regions of the Universe we couldn't explore before, with new perspectives on our knowledge about how it works.
Indeed, so far only signals from the coalescence of compact objects have been detected, while one of the most interesting and promising class of continuous GW signals, coming from asymmetrical rotating neutron stars, is still missing. Wide searches of this kind of signals require a huge amount of computational power due to the Doppler effect of the Earth motion, which disrupts the incoming signal dramatically increases the parameters space. This means that it is necessary to develop complex algorithms to reduce the computational power needed, at the price of significantly reducing the sensitivity of the search.
\subsection{Outlook}
The first detection of gravitational waves (GW) and the birth of multi-messenger astrophysics have opened a new field of scientific research. With the possibility to detect GW from various kinds of sources we can probe new physical phenomena in regions of the Universe we couldn't explore before, with new perspectives on our knowledge about how it works.
Indeed, so far only signals from the coalescence of compact objects have been detected, while one of the most interesting and promising classes of continuous GW signals, coming from asymmetrical rotating neutron stars, is still missing. Wide searches for this kind of signal require a huge amount of computational power due to the Doppler effect of the Earth motion, which disrupts the incoming signal and dramatically increases the parameter space. This means that it is necessary to develop complex algorithms to reduce the computational power needed, at the price of significantly reducing the sensitivity of the search.
The development of new algorithms, which use the high efficiency and computational power of modern GPUs, showed that the new codes on a single GPU can run with a factor of ten speed-up with respect to the older ones on a ten times more expensive multi-core CPU.
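In terms of cost effectiveness, and taking the two factors of ten quoted above at face value, the gain in throughput per unit of hardware cost can be estimated as follows. Writing $s$ for the processing speed and $c$ for the hardware cost (symbols introduced here only for illustration), with $s_{\mathrm{GPU}}/s_{\mathrm{CPU}} = 10$ and $c_{\mathrm{GPU}}/c_{\mathrm{CPU}} = 1/10$,
\[
\frac{s_{\mathrm{GPU}}/c_{\mathrm{GPU}}}{s_{\mathrm{CPU}}/c_{\mathrm{CPU}}}
= \frac{s_{\mathrm{GPU}}}{s_{\mathrm{CPU}}} \times \frac{c_{\mathrm{CPU}}}{c_{\mathrm{GPU}}}
= 10 \times 10 = 100 .
\]
This is only a rough estimate: it neglects power consumption, hosting and code-porting costs.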
For the CW case, using real data from the 9 months long run of the LIGO detectors we have estimated that on a cluster of about 200 GPUs a complete search can be done in about a couple of months, to be confronted with the several months required by the older code on a 2000 CPUs cluster.\\ A GPU cluster would be also extremely useful to test and train Machine Learning algorithms, which in the recent years were shown to be able to face very complex analyses with high efficiency and speed.\\
For the CW case, using real data from the 9-month-long run of the LIGO detectors, we have estimated that on a cluster of about 200 GPUs a complete search can be done in about a couple of months, to be compared with the several months required by the older code on a 2000-CPU cluster.\\ A GPU cluster would also be extremely useful to test and train Machine Learning algorithms, which in recent years were shown to be able to face very complex analyses with high efficiency and speed.\\
Advanced Virgo and Advanced LIGO are also exploring different technologies to face the new challenges of GW physics. The growing number of computing centers involved in GW research forces us to relax our idea on computing, searching for a way to uniformly run different pipelines in complex and heterogeneous infrastructures. For example, the de-supporting of GridFTP pushes towards the use of Rucio, a well supported and flexible tool for data-transfer and management, while the de-supporting of the Cream-CE suggests a redesign of the job submission strategy, possibly under the control of an overall management system like DIRAC. \\ CNAF staff is intensively supporting Virgo members in all these tests.
......
\documentclass[a4paper,12pt]{jpconf}
\usepackage[american]{babel}
\usepackage{geometry}
%\usepackage{fancyhdr}
\usepackage{graphicx}
\geometry{a4paper,top=4.0cm,left=2.5cm,right=2.5cm,bottom=2.7cm}
%\usepackage[mmm]{fncychap}
%\fancyhf{} % azzeriamo testatine e piedino
%\fancyhead[L]{\thepage}
%\renewcommand{\sectionmark}[1]{\markleft{\thesection.\ #1}}
%\fancyhead[R]{\bfseries\leftmark}
%\rhead{XENON computing activities}
\begin{document}
\title{XENON computing model}
%\pagestyle{fancy}
\author{Marco Selvi$^1$}
\address{$^1$ INFN Sezione di Bologna, Bologna, IT}
\ead{marco.selvi@bo.infn.it}
\begin{abstract}
The XENON project is dedicated to the direct search of dark matter at LNGS.
XENON1T was the largest double-phase TPC ever built and operated so far, with 2 t of active xenon, decommissioned in December 2018. It successfully set the best world-wide limit to the interaction cross-section of WIMPs with nucleons. In the context of rare event search detectors, the amount of data (in the form of raw waveform) was significant: order of 1 PB/year, including both Science and Calibration runs. The next phase of the experiment, XENONnT, is under construction at LNGS, with a 3 times larger TPC and correspondingly increased data rate. Its commissioning is foreseen by the end of 2019.
We describe the computing model of the XENON project, with details of the data transfer and management, the massive raw data processing, and the production of Monte Carlo simulation.
All these topics are addressed using in the most efficient way the computing resources spread mainly in the US and EU, thanks to the OSG and EGI facilities, including those available at CNAF.
\end{abstract}
\section{The XENON project}
\thispagestyle{empty}
The matter composition of the universe has been a debate topic
among scientists for centuries. In the last couple of decades a series
of astronomical and astrophysical measurements have corroborated
the hypothesis that ordinary matter (e.g. electrons, quarks,
neutrinos, etc.) represents only 15\% of the total matter in the universe.
The remaining 85\% is thought to be made of a
new, yet-undiscovered exotic species of elementary particles called
dark matter. This indirect evidence of its existence
triggered a world-wide effort to try to observe its interaction with
ordinary matter in extremely sensitive detectors, but its nature is
still a mystery.
The XENON experimental program \cite{225, mc, instr-1T} is searching
for weakly interacting massive particles (WIMPs), hypothetical
particles that, if existing, could account for dark matter and
that might interact with ordinary matter through nuclear recoil.
XENON1T is the third generation of the experimental
program; it completed the data taking at the end of 2018, setting the best world-wide limit to the interaction cross-section of WIMPs with nucleons.
The experiment employs a dual-phase (liquid-gas) xenon
time projection chamber (TPC) featuring as target for WIMPs two
tonnes of ultrapure liquid xenon. The detector is designed
in such a way to be sensitive to rare nuclear recoils of xenon
nuclei possibly induced by WIMPs scattering within the detector.
The TPC is surrounded by a water-based muon veto (MV). Each
sub-detector is read out by its own data acquisition system (DAQ).
The detector is located underground at the INFN Laboratori Nazionali
del Gran Sasso in Italy to shield the experiment from cosmic rays.
XENON1T is an order of magnitude larger than any of its predecessor
experiments. This upscaling in detector size produced a
proportional increase in the data rate and computing needs of
the collaboration. The size of the data set required the collaboration
to transition from a centralized computing model, in which the entire
dataset is stored on a local facility at various institutions, to a model in which
the data are distributed across collaboration resources. Similarly,
the computing requirements called for incorporating distributed
resources, such as the Open Science Grid (OSG) \cite{osg} and the European
Grid Infrastructure (EGI) \cite{egi}, for main computing tasks,
e.g. initial data processing and Monte Carlo production.
\section{XENON1T}
Regarding the data flow, the XENON1T experiment uses a DAQ machine hosted in the XENON1T service
building underground to acquire data. The DAQ rate in dark matter (DM) mode is $\sim$1.3 TB/day, while in calibration mode it can be significantly larger: up to
$\sim$13 TB/day.
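As a rough order-of-magnitude check, and under the purely illustrative assumption of uninterrupted data taking in DM mode for a full year (an assumption made here, not a figure quoted by the collaboration), the DM-mode rate corresponds to a yearly raw data volume of about
\[
1.3~\mathrm{TB/day} \times 365~\mathrm{days} \approx 475~\mathrm{TB} ,
\]
while the calibration-mode rate is up to ten times larger:
\[
\frac{13~\mathrm{TB/day}}{1.3~\mathrm{TB/day}} = 10 .
\]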
A significant challenge for the collaboration has been that there is
no single institution that has the capacity to store the entire data set.
This requires the data to either be stored in a cloud environment
or be distributed across various collaboration institutions. Storing
the data in a cloud environment is prohibitively expensive at this
point. The data set size and the network traffic charges would
consume the entire computing budget several times over.
The only feasible option was to distribute the data across several
computing facilities associated with collaboration institutions.
The raw data are copied into {\it Rucio}, a data handling system. There are several Rucio endpoints, or Rucio
storage elements (RSEs), around the world, including LNGS, NIKHEF, Lyon and Chicago. The raw data are replicated in at
least two locations, and there are two mirrored tape backups, at CNAF and in Stockholm, with 5.6 PB in total. %Help
When the data have to be processed, they are first copied to the Chicago storage and then processed using the OSG. The processed data are
then copied back to Chicago and become available for analysis.
In addition, each user has a home space of 100 GB on a 10 TB disk. A dedicated server will take
care of the data transfer to/from remote facilities. A high-memory 32-core machine is used to host several virtual
machines, each one running a dedicated service: the code (data processing and Monte Carlo) and document repositories on
SVN/GIT, the run database, the on-line monitoring web interface, the XENON wiki and the GRID UI.
Figure~\ref{fig:xenonCM} shows a sketch of the XENON computing model and data management scheme.
\begin{figure}[t]
\begin{center}
\includegraphics[width=15cm]{xenon-computing-model.pdf}
\end{center}
\caption{Overview of the XENON1T Job and Data Management Scheme.}
\label{fig:xenonCM}
\end{figure}
The resources at CNAF (CPU and disk) have so far been used mainly for the Monte Carlo simulation of the
detector (GEANT4 model of the detector and waveform generator), and for the ``real-data'' storage and processing. %Currently we used about XX TB of the XX TB available for 2018. %Help
%For this purpose,
The Computing Working Group of the experiment has recently made some improvements. Initially, the CNAF disk was not integrated into the Rucio framework because it was not large enough to justify the amount of work needed for the integration (it was 60 TB up to 2016). For this reason, for 2018 we requested an additional 90 TB, to reach a total of 200 TB, which the collaboration considers large enough to warrant a full integration of the disk space.\\
The second improvement has been to perform the data processing on both the US and EU GRID (previously it was done in the US only). Some software tools were successfully developed and tested during 2017, and they are now used for fully distributed massive data processing. To fulfil this goal, we requested an additional 300 HS06 of CPU, for a total of 1000 HS06, equivalent to the resources available on the US OSG.\\
The request for tape (1000 TB) in 2018 was made to fulfil the INFN requirement to keep a copy of all the XENON1T data in Italy, as discussed within the INFN Astroparticle Committee. A dedicated automatic data transfer to tape has been developed by CNAF.
The computing model described in this report allowed for a fast and effective processing and analysis of the first XENON1T data in 2017, and of the final ones in 2018, which led to the best limit in the search for WIMPs so far \cite{sr0, sr1}.
\section{XENONnT}
The planning and initial implementation of the data and job management
for the next generation experiment, XENONnT, has already
begun. The experiment is currently under construction at LNGS, and it is scheduled to start taking data by the end of 2019. The current plan is to increase the TPC volume by a factor of 3
to have 6 t of active liquid xenon. The new experimental setup will
also have an additional veto layer called Neutron Veto.
The larger detector will require modifications to the current data
and job management. The processing chain and its products will
undergo significant changes. The larger data volume
and improved knowledge about data access patterns have informed
changes to the data organization. Rather than store the full raw
dataset for later re-processing, the data coming from the detector
will be filtered to only include interesting events. The full raw
dataset will only be stored on tape at one or two sites, where one
of these sites is for long-term archival. The filtered raw dataset will
be stored at OSG/EGI sites for later reprocessing. The overall data
volume of the reduced dataset will be similar to the current data
volume of XENON1T.
\section{References}
\begin{thebibliography}{9}
\bibitem{225} Aprile E. et al. (XENON Collaboration), {\it Dark Matter Results from 225 Live Days of XENON100 Data}, Phys. Rev. Lett. {\bf 109} (2012), 181301
\bibitem{mc} Aprile E. et al. (XENON Collaboration), {\it Physics reach of the XENON1T dark matter experiment}, JCAP {\bf 04} (2016), 027
\bibitem{instr-1T} Aprile E. et al. (XENON Collaboration), {\it The XENON1T Dark Matter Experiment}, Eur. Phys. J. C {\bf 77} (2017), 881
\bibitem{osg} Pordes R. et al., {\it The Open Science Grid}, J. Phys.: Conf. Ser. {\bf 78} (2007), 012057
\bibitem{egi} Kranzlmüller D. et al., {\it The European Grid Initiative (EGI)}, Remote Instrumentation and Virtual Laboratories, Springer US, Boston, MA (2010), 61--66
\bibitem{sr0} Aprile E. et al. (XENON Collaboration), {\it First Dark Matter Search Results from the XENON1T Experiment}, Phys. Rev. Lett. {\bf 119} (2017), 181301
\bibitem{sr1} Aprile E. et al. (XENON Collaboration), {\it Dark Matter Search Results from a One Ton-Year Exposure of XENON1T}, Phys. Rev. Lett. {\bf 121} (2018), 111302
\end{thebibliography}
\end{document}
#!/bin/bash
# Convert every JPEG in the current directory to a PDF with the same base name.
for file in *.jpg; do [ -f "$file" ] && convert "$file" "${file%.jpg}.pdf"; done