\documentclass[a4paper]{jpconf}
\usepackage{graphicx}
\usepackage{url}
\usepackage{color, colortbl}
\definecolor{LightCyan}{rgb}{0.88,1,1}
\definecolor{LightYellow}{rgb}{1,1,0.88}
\definecolor{Red}{rgb}{1,0,0}
\definecolor{Green}{rgb}{0,1,0}
\definecolor{MediumSpringGreen}{rgb}{0,0.98,0.6} %rgb(0,250,154)
\definecolor{Gold}{rgb}{1,0.84,0}%rgb(255,215,0)
\definecolor{Gainsboro}{rgb}{0.86,0.86,0.86}%rgb(220,220,220)
\begin{document}
\title{The INFN Tier 1}
\author{Luca dell'Agnello$^1$}
\address{$^1$ INFN-CNAF, Bologna, IT}
\ead{luca.dellagnello@cnaf.infn.it}
\section{Introduction}
CNAF hosts the Italian Tier 1 data center for WLCG: over the years, Tier 1 has become the main computing facility for INFN.
Nowadays, besides the four LHC experiments, the INFN Tier 1 provides services and resources to 30 other scientific collaborations,
including BELLE2 and several astro-particle experiments (see Table \ref{T1-pledge}).
As shown in Fig.~\ref{pledge2018}, besides LHC, the main users are the astro-particle experiments.
\begin{figure}[h]
\begin{center}
\includegraphics[keepaspectratio,width=15cm]{pledge.png}
\caption{\label{pledge2018}Relative requests of resources at INFN Tier 1}
\end{center}
\end{figure}
Despite the flooding that occurred at the end of 2017, we were able to provide the resources committed to the experiments for 2018 almost on time.
\begin{table}
\begin{center}
\begin{tabular}{l|rrr}
\br
\textbf{Experiment}&\textbf{CPU (HS06)}&\textbf{Disk (TB-N)}&\textbf{Tape (TB)}\\
\hline
\rowcolor{MediumSpringGreen}
ALICE&52020&5185&13497\\
\rowcolor{MediumSpringGreen}
ATLAS&85410&6480&17550\\
\rowcolor{MediumSpringGreen}
CMS&72000&7200&24440\\
\rowcolor{MediumSpringGreen}
LHCb&46805&5606&11400\\
\rowcolor{MediumSpringGreen}
\hline
\textbf{LHC Total}&\textbf{256235}&\textbf{24471}&\textbf{66887}\\
\hline
\rowcolor{LightYellow}
Belle2&13000&350&0\\
\rowcolor{LightYellow}
CDF&0&0&4000\\
\rowcolor{LightYellow}
Compass&40&10&40\\
\rowcolor{LightYellow}
KLOE&0&33&3075\\
\rowcolor{LightYellow}
LHCf&6000&90&0\\
\rowcolor{LightYellow}
NA62&3000&250&200\\
\rowcolor{LightYellow}
PADME&1500&10&500\\
\rowcolor{LightYellow}
LHCb Tier2&26085&0&0\\
\rowcolor{LightYellow}
\hline
\rowcolor{LightYellow}
\textbf{CSN 1 Total}&\textbf{49625}&\textbf{743}&\textbf{7815}\\
\hline
\rowcolor{LightCyan}
AMS&15800&1990&510\\
\rowcolor{LightCyan}
ARGO&0&120&1000\\
\rowcolor{LightCyan}
Auger&2000&615&0\\
\rowcolor{LightCyan}
BOREX&2000&185&41\\
\rowcolor{LightCyan}
CTA&4000&796&120\\
\rowcolor{LightCyan}
CUORE&1900&262&0\\
\rowcolor{LightCyan}
Cupid&100&15&10\\
\rowcolor{LightCyan}
DAMPE&8000&200&100\\
\rowcolor{LightCyan}
DARKSIDE&2000&980&300\\
\rowcolor{LightCyan}
ENUBET&500&10&0\\
\rowcolor{LightCyan}
EUCLID&1000&1042&0\\
\rowcolor{LightCyan}
Fermi&500&15&40\\
\rowcolor{LightCyan}
Gerda&40&45&40\\
\rowcolor{LightCyan}
Icarus&4000&500&1500\\
\rowcolor{LightCyan}
JUNO&3000&230&0\\
\rowcolor{LightCyan}
KM3&300&250&200\\
\rowcolor{LightCyan}
LHAASO&300&60&0\\
\rowcolor{LightCyan}
LIMADOU&400&8&0\\
\rowcolor{LightCyan}
LSPE&1000&14&0\\
\rowcolor{LightCyan}
MAGIC&296&65&150\\
\rowcolor{LightCyan}
NEWS&200&60&60\\
\rowcolor{LightCyan}
Opera&200&15&15\\
\rowcolor{LightCyan}
PAMELA&650&100&150\\
\rowcolor{LightCyan}
Virgo&30000&656&1368\\
\rowcolor{LightCyan}
Xenon100&1000&200&1000\\
\rowcolor{LightCyan}
\hline
\rowcolor{LightCyan}
\textbf{CSN 2 Total}&\textbf{79186}&\textbf{8433}&\textbf{6604}\\
\hline
\rowcolor{Gainsboro}
FOOT&200&20&0\\
\rowcolor{Gainsboro}
Famu&2250&15&187\\
\rowcolor{Gainsboro}
GAMMA/AGATA&0&0&1160\\
\rowcolor{Gainsboro}
NEWCHIM/FARCOS&0&10&300\\
\rowcolor{Gainsboro}
\hline
\rowcolor{Gainsboro}
\textbf{CSN 3 Total}&\textbf{2450}&\textbf{45}&\textbf{1460}\\
\hline \hline
\rowcolor{Green}
\textbf{Grand Total}&\textbf{387496}&\textbf{33692}&\textbf{82766}\\
\rowcolor{Green}
\textbf{Installed}&\textbf{340000}&\textbf{34000}&\textbf{71000}\\
\br
\end{tabular}
\end{center}
\caption{Pledged and installed resources at INFN Tier 1 in 2018 (for the CPU power an overlap factor is applied). CSN 1, CSN 2 and CSN 3 are the National Scientific Committees of the INFN, respectively, for experiments in high energy physics with accelerators, astro-particle experiments and experiments in nuclear physics with accelerators.}
\label{T1-pledge}
\hfill
\end{table}
\subsection{Out of the mud}
The year 2018 began with the recovery procedures of the data center after the flooding of November 2017.
Despite the serious damage to the power plant (both power lines were compromised), we started the recovery of both the infrastructure and the IT equipment immediately after the flooding. The first mandatory intervention was to restore at least one of the two power lines (with a leased UPS in the first period). This goal was achieved in December 2017.
In January, after the restart of the chillers, we could proceed to re-open all services, including part of the farm (at the beginning only $\sim$ 50 kHS06, 1/5 of the total power capacity, were online, while 13\% was lost) and, one by one, the storage systems.
The first experiments to resume operations at CNAF were ALICE, Virgo and DarkSide:
in fact, the storage system used by Virgo and DarkSide was easily recovered after the Christmas break, while ALICE is able to use computing resources relying on remote storage. During February and March, we were able to progressively re-open the services for all other experiments.
%(Fig.\ref{farm2018} shows the restart of the farm). Meanwhile, we had setup a new partition of the farm hosted at CINECA super-computing center premises (see Par.~\ref{CINECAext}).
The final damage inventory shows the loss of $\sim$ 30 kHS06,
1.4 PB of data and 60 tapes; on the other hand, it was possible to repair all the other systems, recovering $\sim$ 20 PB of data.
As for the infrastructure, the second power line was also recovered (see \cite{FLOODCHEP} for details).
%\begin{figure}[h]
% \begin{center}
% \includegraphics[width=40pc]{t1-img/farm2018.png}\hspace{2pc}%
% \caption{\label{farm2018}Farm usage in 2018}
% \end{center}
%\end{figure}
\subsection{The long-term consequences of the flooding}
The data center was designed taking into account all possible accidents, e.g. fires and power outages, except for very unlikely events
such as the breaking of one of the main water pipelines in Bologna, located in a road next to CNAF,
which is precisely what happened in November 2017.
In fact, it was believed that the only threat due to water could come from a very heavy rain and, indeed,
waterproof doors were installed some years ago, after a heavy rain.
The post-mortem analysis showed that the causes, besides the breaking of the pipe, are to be found in the unfavorable position (two underground levels) and in the excessive permeability of the perimeter (while the anti-flood doors worked). Therefore, an intervention has been carried out to increase the waterproofing of the data center and, moreover, work is planned for summer 2019 to strengthen the perimeter of the building and build a second water collection tank.
Even though the search for a new location for the data center had started before the flooding (the main driver being its limited expandability, which cannot cope with the requirements foreseen for the HL-LHC era, when we should scale up to 10 MW of power for IT), the flooding gave us a second strong reason to move.
An opportunity is given by the new ECMWF center which will be hosted in Bologna, in a new Technopole area, starting from 2019.
In the same area the INFN Tier 1 and the CINECA\footnote{CINECA is the Italian Supercomputing center, also located near Bologna ($\sim17$ km from CNAF). See \url{http://www.cineca.it/}} computing centers can be hosted too: funding has been guaranteed to INFN and CINECA by the Italian Government for this. The goal is to have the new data center for the INFN Tier 1 fully operational by the end of 2021.
\section{INFN Tier 1 extension at CINECA}\label{CINECAext}
Out of the 400 kHS06 CPU power (340 kHS06 pledged) of the CNAF farm, $\sim180$ kHS06 are provided by servers installed in the CINECA data center.
%Each server is equipped with a 10 Gbit uplink connection to the rack switch while each of them, in turn, is connected to the aggregation router with 4x40 Gbit links.
The logical network of the farm partition at CINECA is set up as an extension of the INFN Tier 1 LAN: a dedicated fiber pair interconnects the aggregation router at CINECA with the core switch at the INFN Tier 1 (see Farm and Network Chapters for more details). %Fig.~\ref{cineca-t1}).
%The transmission on the fiber is managed by a couple of Infinera DCI, allowing to have a logical channel up to 1.2 Tbps (currently it is configured to transmit up to 400 Gbps).
%\begin{figure}
% % \begin{minipage}[b]{0.45\textwidth}
% \begin{center}
% \includegraphics[width=30pc]{t1-img/cineca-t1.png}
% \caption{\label{cineca-t1}Schematic view of the CINECA - INFN Tier-1 interconnection}
% \end{center}
% % \end{minipage}
%\end{figure}
These nodes, in production since March 2018 for WLCG experiments, have been gradually opened to all other collaborations. %Due the low latency (the RTT is 0.48 ms vs. 0.28 ms measured on the CNAF LAN), there is no need of a disk cache on the CINECA side and the WNs directly access the storage located at CNAF; in fact, the
The efficiency of the jobs\footnote{The efficiency of a job is defined as the ratio between its CPU time and its wall-clock time.} is comparable to the one measured on the farm partition at CNAF.
Since this partition has been installed from the beginning with CentOS 7, legacy applications requiring a different flavour of operating system can use it through the container technology Singularity~\cite{singularity}.
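To make the efficiency definition quoted in the footnote explicit, it can be written as follows; the symbols $\varepsilon$, $t_{\mathrm{CPU}}$ and $t_{\mathrm{wall}}$ are notation introduced here for illustration only and are not used elsewhere in this report.
\begin{equation}
\varepsilon = \frac{t_{\mathrm{CPU}}}{t_{\mathrm{wall}}}
\end{equation}
As an illustrative example with invented numbers, a single-core job that accumulates 9 hours of CPU time over 10 hours of wall-clock time has $\varepsilon = 0.9$; for such a job, values well below 1 typically indicate time spent waiting, e.g. for I/O.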
%Moreover, this partition has undergone several reconfigurations due to both the hardware and the type of workflow of the experiments. In April we had to upgrade the BIOS to overcome a bug which was preventing the full resource usage, limiting at $\sim$~78\% of the total what we were getting from the nodes. Moreover a reconfiguration of the local RAID configuration of disks is ongoing\footnote{The initial choice of using RAID-1 for local disks instead of RAID-0 has been proven to slow down the system even if safer from an operational point of view.} as well as tests to choose the best number of computing slots.
\section*{References}
\begin{thebibliography}{9}
\bibitem{FLOODCHEP} L. dell'Agnello, ``Disaster recovery of the INFN Tier 1 data center: lesson learned'', to be published in Proceedings of the 23rd International Conference on Computing in High Energy and Nuclear Physics - EPJ Web of Conferences
\bibitem{singularity} \url{http://singularity.lbl.gov}
\end{thebibliography}
\end{document}
File added
......@@ -2,34 +2,34 @@
\usepackage{graphicx}
\begin{document}
\title{User and Operational Support at CNAF}
\author{D. Cesini, E. Corni, F. Fornari, L. Morganti, C. Pellegrino, M. V. P. Soares, M. Tenti, L. Dell'Agnello}
\address{INFN-CNAF, Bologna, IT}
\author{D. Cesini$^1$, E. Corni$^1$, F. Fornari$^1$, L. Morganti$^1$, C. Pellegrino$^1$, M. V. P. Soares$^1$, M. Tenti$^1$, L. Dell'Agnello$^1$}
\address{$^1$ INFN-CNAF, Bologna, IT}
\ead{user-support@lists.cnaf.infn.it}
\begin{abstract}
Many different research groups, typically organized in Virtual Organizations (VOs),
exploit the Tier-1 Data center facilities for computing and/or data storage and management. Moreover, CNAF hosts two small HPC farms and a Cloud infrastructure. The User Support unit provides to the users of all CNAF facilities with a direct operational support, and promotes common technologies and best-practices to access the ICT resources in order to facilitate the usage of the center and maximize its efficiency.
exploit the Tier 1 Data center facilities for computing and/or data storage and management. Moreover, CNAF hosts two small HPC farms and a Cloud infrastructure. The User Support unit provides the users of all CNAF facilities with direct operational support, and promotes common technologies and best practices to access the ICT resources in order to facilitate the usage of the center and maximize its efficiency.
\end{abstract}
\section{Current status}
Born in April 2012, the User Support team in 2018 was composed of one coordinator and up to five fellows with post-doctoral education or equivalent work experience in scientific research or computing.
The main activities of the team include:
\begin{itemize}
\item providing prompt feedback to VO-specific issues via ticketing systems or official mail channels;
\item forwarding to the appropriate Tier-1 units those requests which cannot be autonomously satisfied, and taking care of answers and fixes, e.g. via the tracker JIRA, until a solution is delivered to the experiments;
\item forwarding to the appropriate Tier 1 units those requests which cannot be autonomously satisfied, and taking care of answers and fixes, e.g. via the tracker JIRA, until a solution is delivered to the experiments;
\item supporting the experiments in the definition and debugging of computing models in distributed and Cloud environments;
\item helping the supported experiments by developing code, monitoring frameworks and writing guides and documentation for users (see e.g. https://www.cnaf.infn.it/en/users-faqs/);
\item solving issues on experiment software installation, access problems, new accounts creation and any other daily usage problems;
\item porting applications to new parallel architectures (e.g. GPUs and HPC farms);
\item providing the Tier-1 Run Coordinator, who represents CNAF at the Daily WLCG calls, and reports about resource usage and problems at the monthly meeting of the Tier-1 management body (Comitato di Gestione del Tier-1).
\item providing the Tier 1 Run Coordinator, who represents CNAF at the Daily WLCG calls, and reports about resource usage and problems at the monthly meeting of the Tier 1 management body (Comitato di Gestione del Tier 1).
\end{itemize}
People belonging to the User Support team represent INFN Tier-1 inside the VOs.
People belonging to the User Support team represent INFN Tier 1 inside the VOs.
In some cases, they are directly integrated in the supported experiments. Moreover, they can play the role of a member of any VO for debugging purposes.
The User Support staff is also involved in different CNAF internal projects, notably the Computing on SoC Architectures (COSA) project (www.cosa-project.it) dedicated to the technology tracking and benchmarking of the modern low-power architectures for computing applications.
\section{Supported experiments}
The LHC experiments represent the main users of the data center, handling more than 80\% of the total computing and storage resources funded at CNAF. Besides the four LHC experiments (ALICE, ATLAS, CMS, LHCb) for which CNAF acts as Tier-1 site, the data center also supports an ever increasing number of experiments from the Astrophysics, Astroparticle physics and High Energy Physics domains, and specifically Agata, AMS-02, Argo-YBJ, Auger, Belle II, Borexino, CDF, Compass, COSMO-WNEXT CTA, Cuore, Cupid, Dampe, DarkSide-50, Enubet, Famu, Fazia, Fermi-LAT, Gerda, Icarus, LHAASO, LHCf, Limadou, Juno, Kloe, KM3Net, Magic, NA62, Newchim, NEWS, NTOP, Opera, Padme, Pamela, Panda, Virgo, and XENON.
The LHC experiments represent the main users of the data center, handling more than 80\% of the total computing and storage resources funded at CNAF. Besides the four LHC experiments (ALICE, ATLAS, CMS, LHCb), for which CNAF acts as a Tier 1 site, the data center also supports an ever increasing number of experiments from the Astrophysics, Astroparticle physics and High Energy Physics domains, and specifically Agata, AMS-02, Auger, Belle II, Borexino, CDF, Compass, COSMO-WNEXT, CTA, Cuore, Cupid, Dampe, DarkSide-50, Enubet, Famu, Fazia, Fermi-LAT, Gerda, Icarus, LHAASO, LHCf, Limadou, Juno, Kloe, KM3Net, Magic, NA62, Newchim, NEWS, NTOF, Opera, Padme, Pamela, Panda, Virgo, and XENON.
Clearly, a bigger effort from the User Support team is needed to answer the varied needs of these non-LHC experiments and to encourage them to adopt more modern technologies, e.g. FTS, Dirac, token-based authorization.
\begin{figure}[ht]
......@@ -60,12 +60,13 @@ The following figures show resources pledged and used by the supported experimen
Unfortunately, the accounting data for storage, both disk and tape statistics, are available only after summer 2018, since the restoration of the complex system of sensors for accounting after the 2017 flooding had a lower priority than the activities needed for a complete recovery of the storage resources involved in the flood.
\section{Support to HPC and cloud-based experiments}
Apart from Tier-1 facilities, CNAF hosts two small HPC farms and a cloud infrastructure. The first HPC cluster, in production since 2015, is composed of 27 nodes, some of them also equipped with one or more GPUs (NVIDIA Tesla K20, K40 and K1). All nodes are infiniband interconnected and equipped with 2 Intel CPUs, 8 physical cores each, HyperThread enabled. The cluster is accessible via the LSF batch system. It is open to various INFN communities, but the main users are theoretical physicist dealing with plasma laser acceleration simulations. The cluster serves as testing infrastructure to prepare the high resolution runs submitted to supercomputers.
Apart from Tier 1 facilities, CNAF hosts two small HPC farms and a cloud infrastructure. The first HPC cluster, in production since 2015, is composed of 27 nodes, some of them also equipped with one or more GPUs (NVIDIA Tesla K20, K40 and K1). All nodes are interconnected via InfiniBand and equipped with 2 Intel CPUs with 8 physical cores each and Hyper-Threading enabled. The cluster is accessible via the LSF batch system. It is open to various INFN communities, but the main users are theoretical physicists dealing with plasma laser acceleration simulations. The cluster is used as a testing infrastructure to prepare the high resolution runs to be submitted afterwards to supercomputers.
A second HPC cluster entered into production in 2017 to serve the CERN accelerators R/D groups. The cluster consists of 12 nodes OmniPath interconnected. Can be access through batch queues managed by the IBM LSF system.
A second HPC cluster entered production in 2017 to serve the CERN accelerator R\&D groups. The cluster consists of 12 nodes interconnected via Omni-Path. It can be accessed through batch queues managed by the IBM LSF system.
Support is provided on a daily basis for software installation, access problems, new account creation and any other usage problems.
The User Support team manages an OpenStack-based tenant hosted within the Cloud@CNAF. This tenant, provided with 300 vCPUs, is mostly devoted to support peculiar use cases which require unusual software configurations and only for a limited amount of time. The most important of these use cases is the FAZIA experiment, for which 256 vCPUs were provided, distributed over 16 worker nodes with 8GB of RAM each, where the Debian 8.4 operating system has been installed and configured with LDAP+Kerberos for user authentication and authorization, and NFS 4 for network storage sharing. Recently, other experiments started accessing the Cloud infrastructure: AMS, EEE, FAZIA, Icarus and NTOF.
The User Support team manages an OpenStack-based tenant hosted within the Cloud@CNAF. This tenant, provided with 300 vCPUs, is mostly devoted to support peculiar use cases which require unusual software configurations and only for a limited amount of time. The most important of these use cases is the FAZIA experiment, for which 256 vCPUs were provided, distributed over 16 worker nodes with 8GB of RAM each, where the Debian 8.4 operating system has been installed and configured with LDAP and Kerberos for user authentication and authorization, and NFS 4 for network storage sharing.
Recently, other experiments started accessing the Cloud infrastructure: AMS, EEE, Icarus and NTOF.
\end{document}
......
......@@ -5,18 +5,18 @@
%\author{P. Astone$^1$, F. Badaracco$^{2,3}$, S. Bagnasco$^4$, S. Caudill$^5$, F. Carbognani$^6$, A. Cirone$^{7,8}$, G. Fronz\'e$^{4}$, J. Harms$^{2,3}$, I. LaRosa$^1$, C. Lazzaro$^9$, P. Leaci$^1$, S. Lusso$^4$, C. Palomba$^1$, R. DePietri$^{11,12}$, M. Punturo$^{10}$, L. Rei$^8$, L. Salconi$^6$, S. Vallero$^{4}$, on behalf of the Virgo collaboration}
\author{P. Astone$^1$, F. Badaracco$^{2,3}$, S. Bagnasco$^4$, S. Caudill$^5$, F. Carbognani$^6$, A. Cirone$^{7,8}$, M. Drago$^{2,3}$, G. Fronz\'e$^{4}$, J. Harms$^{2,3}$, I. LaRosa$^1$, C. Lazzaro$^9$, P. Leaci$^1$, S. Lusso$^4$, C. Palomba$^1$, R. DePietri$^{11,12}$, M. Punturo$^{10}$, L. Rei$^8$, L. Salconi$^6$, S. Vallero$^{4}$, on behalf of the Virgo collaboration}
\address{$^1$ INFN, Roma, IT}
\address{$^2$ Gran Sasso Science Institute (GSSI), IT}
\address{$^3$ INFN, Laboratori Nazionali del Gran Sasso, IT}
\address{$^4$ INFN, Torino, IT}
\address{$^5$ Nikhef, Science Park, NL}
\address{$^6$ EGO-European Gravitational Observatory, Cascina, Pisa, IT}
\address{$^7$ Universit\`a degli Studi di Genova, IT}
\address{$^8$ INFN, Genova, IT}
\address{$^9$ INFN, Padova, IT}
\address{$^{10}$ INFN, Perugia, IT}
\address{$^{11}$ Universit\`a degli Studi di Parma, IT}
\address{$^{12}$ INFN, Gruppo Collegato Parma, IT}
\address{$^1$ INFN Sezione di Roma, Roma, IT}
\address{$^2$ Gran Sasso Science Institute (GSSI), L'Aquila, IT}
\address{$^3$ INFN Laboratori Nazionali del Gran Sasso, L'Aquila, IT}
\address{$^4$ INFN Sezione di Torino, Torino, IT}
\address{$^5$ Nikhef, Amsterdam, NL}
\address{$^6$ EGO-European Gravitational Observatory, Cascina (PI), IT}
\address{$^7$ Universit\`a degli Studi di Genova, Genova, IT}
\address{$^8$ INFN Sezione di Genova, Genova, IT}
\address{$^9$ INFN Sezione di Padova, Padova, IT}
\address{$^{10}$ INFN Sezione di Perugia, Perugia, IT}
\address{$^{11}$ Universit\`a degli Studi di Parma, Parma, IT}
\address{$^{12}$ INFN Gruppo Collegato Parma, Parma, IT}
%\address{Production Editor, \jpcs, \iopp, Dirac House, Temple Back, Bristol BS1~6BE, UK}
......@@ -32,7 +32,7 @@ The amount of data processed during the last few years has emphasized the fact t
\section{Advanced Virgo computing model}
\subsection{Data production and data transfer}
The Advanced Virgo data acquisition system is writing about 35MB/s of data (so-called ``bulk data'') during O3. CNAF and CC-IN2P3 are the Virgo Tier-0: during the science runs, bulk data is stored in a circular buffer located at the Virgo site, and simultaneously transferred to the remote computing centres where they are archived in tape libraries. The transfer is realized through an ad-hoc procedure based on GridFTP (at CNAF) and iRods (at CC-IN2P3). Other data fluxes reach CNAF during science runs:
The Advanced Virgo data acquisition system is writing about 35 MB/s of data (so-called ``bulk data'') during O3. CNAF and CC-IN2P3 are the Virgo Tier 0: during the science runs, bulk data is stored in a circular buffer located at the Virgo site and simultaneously transferred to the remote computing centers, where it is archived in tape libraries. The transfer is realized through an ad-hoc procedure based on GridFTP (at CNAF) and iRODS (at CC-IN2P3). Other data fluxes reach CNAF during science runs:
\begin{itemize}
\item trend data (few GB/day), periodically transferred using the system described above;
......@@ -42,23 +42,23 @@ The Advanced Virgo data acquisition system is writing about 35MB/s of data (so-c
\subsection{Data Analysis at CNAF}
%The analysis of the LIGO and Virgo data was made jointly by the two collaborations; the analysis pipelines are distributed among the worldwide network of computing facilities offering computing resources to the GW experiments. CNAF was mainly used for CW analysis, looking for continuous gravitational wave signals, developed by INFN–Roma people (see hereafter more details). But at CNAF is also running part of the pyCBC pipeline, submitted via OSG, looking for compact binaries signals. pyCBC has a crucial role in the detection of the coalescence of BBH and BNS. CNAF contributed to the computation performed through pyCBC for the analysis of the events GW170814, the first BBH coalescence detected also by Virgo, and GW170817, the BNS coalescence. During the last month a new extension of CVMFS, \emph{big cvmfs} was mounted at cnaf to support another OSG pipeline, \emph{BayesWave}. The big cvmfs is able to export, in a posix fashion, big file of data from nearby cache in Amsterdam instead of accessing data directly from Nebraska. BayesWave is a Bayesian algorithm designed to robustly distinguish gravitational wave signals from noise and instrumental glitches without relying on any prior assumptions of waveform morphology. In the last year coherent WaveBurst \emph{cwb} was ported to cnaf and made available to run. cwb is a pipeline based on coherent algorithm for detection and reconstruction of modelled and unmodelled GW bursts. A new newtonian noise cancellation algoritmh, developed by the group of Gran Sasso Science Institute (\emph{GSSI}) was made available very recently. The increased number of LVC pipelines running at cnaf has led to saturate advance virgo pledge at cnaf, cnaf promptly rensponded to advance virgo needed enlargin our quota and giving experimental access to gpu.
LIGO-Virgo data analysis is organized jointly, meaning that the analysis pipelines are made available to the computing facilities related to the LVC network, ready to be distributed to each GW detector. CNAF has been mainly used for Continuous Wave(\emph{CW}) analysis, led by the Roma INFN group, and for the Compact Binary Coalescence python-based analysis (\emph{pyCBC}), submitted via OSG. In particular CNAF computationally contributed to GW170814 and GW170817 events, respectively the first BBH coalescence detected by Virgo and the first BNS merger ever observed. During the last month a new extension of CVMFS, so-called ``big cvmfs'', was mounted at CNAF to support another OSG-based pipeline, Bayes Wave. The former is able to make available, in a POSIX-like fashion, big data files from a cache in Amsterdam, instead of accessing the data directly from Nebraska. The latter is a Bayesian algorithm, designed to robustly distinguish GW signals from noise and instrumental glitches, without relying on any prior assumptions on the waveform shape. During the last year, coherent WaveBurst(\emph{cWB}), an algorithm dedicated to the detection and reconstruction of GW Bursts, was also ported to CNAF. Furthermore, new Newtonian Noise cancellation algorithms, which are currently being developed by the GSSI group, were made recently available. The increasing number of LVC pipelines running at CNAF has led to resource saturation, and consequently to a demand for enlarged computing power, together with access to GPUs.
LIGO-Virgo data analysis is organized jointly, meaning that the analysis pipelines are made available to the computing facilities related to the LVC network, ready to be distributed to each GW detector. CNAF has been mainly used for Continuous Wave (\emph{CW}) analysis, led by the Roma INFN group, and for the Python-based Compact Binary Coalescence analysis (\emph{pyCBC}), submitted via OSG. In particular, CNAF contributed computing resources to the analysis of the GW170814 and GW170817 events, respectively the first BBH coalescence detected by Virgo and the first BNS merger ever observed. During the last month a new extension of CVMFS, so-called ``big cvmfs'', was mounted at CNAF to support another OSG-based pipeline, BayesWave. The former is able to make available, in a POSIX-like fashion, big data files from a cache in Amsterdam, instead of accessing the data directly from Nebraska. The latter is a Bayesian algorithm, designed to robustly distinguish GW signals from noise and instrumental glitches, without relying on any prior assumptions on the waveform shape. During the last year, coherent WaveBurst (\emph{cWB}), an algorithm dedicated to the detection and reconstruction of GW bursts, was also ported to CNAF. Furthermore, new Newtonian noise cancellation algorithms, which are currently being developed by the GSSI group, were recently made available. The increasing number of LVC pipelines running at CNAF has led to resource saturation, and consequently to a demand for more computing power, together with access to GPUs.
\subsubsection{CW pipeline}
CNAF has been in 2018 the main computing center for Virgo all-sky continuous wave (CW) searches. The search for this kind of signals, emitted by spinning neutron stars, covers a large portion of the source parameter space and consists of several steps organized in a hierarchical analysis pipeline. CNAF has been mainly used for the ``incoherent'' stage, based of a particular implementation of the Hough transform, which is the heaviest part of the analysis from a computational point of view. The code implementing the Hough transform has been written in such a way that the exploration of the parameter space can be split in several independent jobs, each covering a range of signal frequencies and a portion of the sky. This is an embarrassingly parallel problem, very well suited to be run in a distributed computing environment. The analysis jobs have been run using the EGI UMD grid middleware, with input and output files stored in a StoRM-based Storage Element at CNAF. Candidate post-processing, consisting of clusterisation, coincidences and ranking, and parts of the candidate follow-up analysis have been also carried on at CNAF. Typical Hough transform jobs needs about 4GB of memory (with a fraction requiring more, up to 8GB). Past year most of the resources have been used to analyze Advanced LIGO O2 data. Overall, in 2018 more than 10M CPU hours have been used at CNAF for CW searches, by running O($10^5$) jobs, with duration from a few hours to ~3 days.
In 2018 CNAF was the main computing center for Virgo all-sky continuous wave (CW) searches. The search for this kind of signal, emitted by spinning neutron stars, covers a large portion of the source parameter space and consists of several steps organized in a hierarchical analysis pipeline. CNAF has been mainly used for the ``incoherent'' stage, based on a particular implementation of the Hough transform, which is the heaviest part of the analysis from a computational point of view. The code implementing the Hough transform has been written in such a way that the exploration of the parameter space can be split into several independent jobs, each covering a range of signal frequencies and a portion of the sky. This is an embarrassingly parallel problem, very well suited to be run in a distributed computing environment. The analysis jobs have been run using the EGI UMD grid middleware, with input and output files stored in a StoRM-based Storage Element at CNAF. Candidate post-processing, consisting of clusterisation, coincidences and ranking, and parts of the candidate follow-up analysis have also been carried out at CNAF. A typical Hough transform job needs about 4GB of memory (with a fraction requiring more, up to 8GB). Last year most of the resources were used to analyze Advanced LIGO O2 data. Overall, in 2018 more than 10M CPU hours were used at CNAF for CW searches, by running O($10^5$) jobs, with durations from a few hours to $\sim$3 days.
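
A rough way to picture the splitting of the Hough transform stage into jobs, under the simplifying assumption (made here, not stated by the pipeline authors) that every frequency band is analyzed together with every sky patch, is the following, where $N_f$, $N_{\mathrm{sky}}$ and $N_{\mathrm{jobs}}$ are symbols introduced only for this sketch:
\begin{equation}
N_{\mathrm{jobs}} = N_f \times N_{\mathrm{sky}}
\end{equation}
Here $N_f$ is the number of frequency bands and $N_{\mathrm{sky}}$ the number of sky patches. Since the jobs are independent, they can be executed in any order and on any available slot, which is what makes the search embarrassingly parallel.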
\subsubsection{cWB pipeline}
Starting in 2019, the coherent WaveBurst (cWB) based pipelines have been ported and adapted to run at CNAF, reproducing the cWB environment setup on the worker nodes without the need to read the user home directory at run time. It is planned to run at CNAF all Virgo offline long-duration all-sky searches on the data that will be collected during the Observational Run 3 (O3), which started on April 1, 2019. cWB is a data-analysis tool to search for a broad range of gravitational-wave (GW) transients. The pipeline identifies coincident events in the GW data from earth-based interferometric detectors and reconstructs the gravitational wave signal by using a constrained maximum likelihood approach. The algorithm performs a time-frequency analysis of the data, using a wavelet representation, and identifies the events by clustering time-frequency pixels with significant excess coherent power. The likelihood statistic is built as a coherent sum over the responses of different detectors and estimates the total signal-to-noise ratio of the GW signal in the network. The pipeline splits the total analysis time into sub-periods to be analyzed in parallel jobs using HTCondor tools, and it is expected to use a substantial amount of CPU hours during 2019.
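The idea of clustering time-frequency pixels with excess power can be sketched in a few lines of Python. This is only a toy illustration under strong assumptions, not the cWB algorithm: it thresholds a single, already computed time-frequency map and groups neighbouring pixels (8-connectivity), whereas the real pipeline works on wavelet data from several detectors and uses coherent statistics. The function name \texttt{cluster\_pixels}, the map values and the threshold are hypothetical.

```python
from collections import deque


def cluster_pixels(power, threshold):
    """Group above-threshold pixels of a 2D time-frequency map into clusters.

    power: list of rows (frequency bins), each a list of values over time bins.
    Returns a list of clusters, each a sorted list of (freq_idx, time_idx).
    """
    n_f = len(power)
    n_t = len(power[0]) if n_f else 0
    seen = set()
    clusters = []
    for f in range(n_f):
        for t in range(n_t):
            if power[f][t] <= threshold or (f, t) in seen:
                continue
            # Breadth-first search over the 8 neighbours of each pixel.
            queue = deque([(f, t)])
            seen.add((f, t))
            cluster = []
            while queue:
                cf, ct = queue.popleft()
                cluster.append((cf, ct))
                for df in (-1, 0, 1):
                    for dt in (-1, 0, 1):
                        nf, nt = cf + df, ct + dt
                        if (0 <= nf < n_f and 0 <= nt < n_t
                                and (nf, nt) not in seen
                                and power[nf][nt] > threshold):
                            seen.add((nf, nt))
                            queue.append((nf, nt))
            clusters.append(sorted(cluster))
    return clusters


# Toy map (rows = frequency bins, columns = time bins) with made-up values.
tf_map = [
    [0.1, 0.2, 0.1, 0.0, 0.3],
    [0.2, 5.0, 6.0, 0.1, 0.2],
    [0.1, 0.3, 7.0, 0.2, 4.0],
    [0.0, 0.1, 0.2, 0.1, 0.3],
]
events = cluster_pixels(tf_map, threshold=3.0)
print(events)
```

In this toy map the three adjacent loud pixels form one candidate event and the isolated loud pixel forms another.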
\subsubsection{Newtonian noise pipeline}
The cancellation of gravitational noise from seismic fields will be a major challenge from both the theoretical and the computational points of view, since the simulations involved are very demanding. This activity requires the accurate positioning of a large number of seismometers. A cluster at CNAF was used to run position optimisations of the seismic arrays used for cancellation and to determine the cancellation performance as a function of the number of sensors and its robustness with respect to sensor-positioning accuracy.
\subsection{Outlook}
The first detection of gravitational waves (GW) and the birth of multi-messenger astrophysics have opened a new field of scientific research. With the possibility to detect GW from various kinds of sources we can probe new physical phenomena in regions of the Universe we could not explore before, with new perspectives on our knowledge about how it works.
Indeed, so far only signals from the coalescence of compact objects have been detected, while one of the most interesting and promising classes of continuous GW signals, coming from asymmetric rotating neutron stars, is still missing. Wide searches for this kind of signal require a huge amount of computational power because of the Doppler effect due to the Earth's motion, which disrupts the incoming signal and dramatically increases the parameter space. This means that it is necessary to develop complex algorithms to reduce the computational power needed, at the price of significantly reducing the sensitivity of the search.
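As a reminder of why the Doppler effect enlarges the parameter space, consider a source emitting at frequency $f_0$ from the sky direction $\hat{n}$. To first order in $v/c$, and neglecting the intrinsic spin-down of the source, the frequency observed at the detector is
\begin{equation}
f(t) \simeq f_0\left(1+\frac{\vec{v}(t)\cdot\hat{n}}{c}\right),
\end{equation}
where $\vec{v}(t)$ is the detector velocity with respect to the Solar System barycenter (the symbols are introduced here only for illustration). The Earth's orbital motion gives $v/c \approx 10^{-4}$ and its rotation $v/c \approx 10^{-6}$. Since the modulation depends on $\hat{n}$, the correction must be applied separately for each sky direction searched.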
The development of new algorithms, which use the high efficiency and computational power of modern GPUs, showed that the new codes on a single GPU can run with a factor of ten speed-up with respect to the older ones on a ten times more expensive multi-core CPU.
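Taking the two figures quoted above at face value (a rough estimate, not a measured benchmark), the gain in computing speed per unit hardware cost is
\begin{equation}
\frac{(\mathrm{speed}/\mathrm{cost})_{\mathrm{GPU}}}{(\mathrm{speed}/\mathrm{cost})_{\mathrm{CPU}}} \approx 10 \times 10 = 100 .
\end{equation}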
For the CW case, using real data from the 9-month-long run of the LIGO detectors, we have estimated that on a cluster of about 200 GPUs a complete search can be done in about a couple of months, to be compared with the several months required by the older code on a 2000-CPU cluster.\\ A GPU cluster would also be extremely useful to test and train Machine Learning algorithms, which in recent years have been shown to be able to tackle very complex analyses with high efficiency and speed.\\
Advanced Virgo and Advanced LIGO are also exploring different technologies to face the new challenges of GW physics. The growing number of computing centers involved in GW research forces us to relax our idea of computing and to search for a way to run different pipelines uniformly on complex and heterogeneous infrastructures. For example, the end of support for GridFTP pushes towards the use of Rucio, a well supported and flexible tool for data transfer and management, while the end of support for the CREAM-CE suggests a redesign of the job submission strategy, possibly under the control of an overall management system like DIRAC. \\ CNAF staff is intensively supporting Virgo members in all these tests.
......
......@@ -20,9 +20,9 @@
\title{XENON computing model}
%\pagestyle{fancy}
\author{Marco Selvi$^1$}
\address{$^1$ INFN Sezione di Bologna, Bologna, IT}
\ead{marco.selvi@bo.infn.it}
#!/bin/bash
# Convert every regular file in the current directory to a PDF with the same base name (uses ImageMagick's convert).
for file in *; do [ -f "$file" ] && sudo convert "$file" "${file%.*}.pdf"; done