From parchment to big data: navigating the data without a map?
Journées du SITG 2016
World Meteorological Organization, Geneva
Frédéric Hemmer, CERN
Head of the Information Technology Department
19 April 2016
2010: a New Era in Fundamental Science
LHC ring: 27 km circumference
Experiments: ALICE, ATLAS, CMS, LHCb
Exploration of a new energy frontier in p-p and Pb-Pb collisions
Tier 0 at CERN: Acquisition, First pass reconstruction, Storage & Distribution
Data rates shown in the diagram: 2011: 400-500 MB/sec; 2011: 4-6 GB/sec; 1.25 GB/sec (ions)
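To give a feel for the rates quoted on this slide, the following rough sketch converts a sustained rate into a daily volume. It assumes decimal units (1 GB = 10^9 bytes) and a constant rate over a full 24 hours, neither of which the slide states, so the results are upper-bound illustrations rather than measured volumes.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86 400 s


def daily_volume_tb(rate_gb_per_s: float) -> float:
    """Volume in TB accumulated in one day at a constant rate given in GB/s.

    Assumption (not from the slide): decimal units, 1 TB = 1000 GB.
    """
    return rate_gb_per_s * SECONDS_PER_DAY / 1000.0


if __name__ == "__main__":
    # Rates taken from the slide; the 24-hour sustained duty cycle is an assumption.
    for label, rate in [("400 MB/sec", 0.4), ("1.25 GB/sec (ions)", 1.25), ("4 GB/sec", 4.0)]:
        print(f"{label}: about {daily_volume_tb(rate):.1f} TB/day")
```

At 1.25 GB/sec this comes to roughly 108 TB per day, which is why first-pass storage and onward distribution have to run continuously alongside acquisition.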
WLCG: what and why?
A distributed computing infrastructure to provide the production and analysis environments for the LHC experiments.
Managed and operated by a worldwide collaboration between the experiments and the participating computer centres.
The resources are distributed for funding and sociological reasons.
Our task was to make use of the resources available to us, no matter where they are located.
Tier-0 (CERN): initial data reconstruction, data distribution, data recording & archiving
Tier-1 (12 centres + Russia): permanent storage, re-processing, analysis
Tier-2 (~140 centres): simulation, end-user analysis
~160 sites, 35 countries; 300000 cores; 200 PB of storage; 2 million jobs/day; 10 Gbps links
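The tier roles on this slide can be written down as a small lookup structure. This is only an illustrative sketch: the `Tier` class and the `tiers_for` helper are hypothetical names introduced here, and the role strings are copied from the slide rather than from any WLCG software.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class Tier:
    name: str
    sites: str               # site description exactly as worded on the slide
    roles: Tuple[str, ...]   # responsibilities listed on the slide


# Hypothetical model of the slide's tier table (not an official WLCG schema).
TIERS = (
    Tier("Tier-0", "CERN",
         ("initial data reconstruction", "data distribution",
          "data recording and archiving")),
    Tier("Tier-1", "12 centres + Russia",
         ("permanent storage", "re-processing", "analysis")),
    Tier("Tier-2", "~140 centres",
         ("simulation", "end-user analysis")),
)


def tiers_for(keyword: str) -> List[str]:
    """Return the names of the tiers whose listed roles mention the keyword."""
    needle = keyword.lower()
    return [t.name for t in TIERS if any(needle in role for role in t.roles)]


if __name__ == "__main__":
    print(tiers_for("analysis"))    # tiers that list some form of analysis
    print(tiers_for("simulation"))
```

Looking up "analysis" returns both Tier-1 and Tier-2, which mirrors the slide: analysis is shared between the large centres and the many smaller ones, while archiving stays at Tier-0.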
Long Term Data Preservation
Ensure that LHC data is preserved for future generations.
Not only bits & bytes, but also the programs that generated them.
More and more a requirement of funding agencies.
Many other disciplines, ranging from science to arts & humanities, are already (very) active.
Data in the Tier-0
Chart labels: 15 PB; 23 PB; 27 PB; LHC: 26/36 PB; Verification 2016 Repack
Tape contamination incident
Photo annotations: ~25 mm (~120 MB over 144 data tracks); holes and scratches; ~13 mm
Over 30 m³/min of airflow per library (home vacuum cleaner: ~2 m³/min)
Operating environment required for new-generation drives: ISO-14644 Class 8 (particles/m³):
RPi Arduino sensor
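The slide's own particle limits did not survive extraction and are not reconstructed here. As a separate illustration, the sketch below uses the classification formula from ISO 14644-1, C = 10^N × (0.1 / D)^2.08 particles per m³ for class N and particle size D in µm, to check a sensor reading against a class. The function names and the sample reading are hypothetical, and the standard's published table rounds these values, so treat the output as approximate.

```python
def iso14644_limit(iso_class: float, particle_size_um: float) -> float:
    """Approximate maximum particles per m^3 at or above the given size (in µm).

    Uses the ISO 14644-1 classification formula; the standard's table
    rounds these values, so results are approximate.
    """
    return 10 ** iso_class * (0.1 / particle_size_um) ** 2.08


def within_class(count_per_m3: float, iso_class: float, particle_size_um: float) -> bool:
    """True if a measured concentration is at or below the class limit."""
    return count_per_m3 <= iso14644_limit(iso_class, particle_size_um)


if __name__ == "__main__":
    reading = 1_200_000  # hypothetical sensor reading, particles >= 0.5 µm per m^3
    limit = iso14644_limit(8, 0.5)
    print(f"Class 8 limit at 0.5 µm: about {limit:,.0f} particles/m^3")
    print("Within Class 8:", within_class(reading, 8, 0.5))
```

With over 30 m³/min of air moving through each library, about 15 times a home vacuum cleaner by the slide's own comparison, a continuous check of this kind is one plausible use for a low-cost sensor such as the RPi/Arduino unit mentioned above.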
Open Access
The Open Source SW behind Open Access @ CERN
CERN Document Server
Institutional Repository: Green Open Access
~1.5 million records; ~7'500 new papers/year; 1 to 3'000 authors
From commented drafts to final OA versions
and Disciplinary archive, Open Data portal...
Gold Open Access
Open Data, Open Knowledge
CERN & the LHC experiments have made the first steps towards Open Data (http://opendata.cern.ch/)
Key drivers: Educational Outreach & Reproducibility
Increasingly required by Funding Agencies
Paving the way for Open Knowledge as envisioned by DPHEP (http://dphep.org), the ICFA Study Group on Data Preservation and Long Term Analysis in High Energy Physics
CERN has released Zenodo, a platform for Open Data as a Service (http://zenodo.org) ¹
Building on experience of Digital Libraries & Extreme scale data management
Targeted at the long tail of science
Citable through DOIs, including the associated software
Interfacing with GitHub, making code citable
~25000 records from 420 communities, incl. 32 software packages no longer accessible on GitHub
¹ Initially co-funded by the EC FP7 OpenAIRE series of projects
25 November 2015, Data Management@CERN
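To make "citable through DOIs" concrete, here is a minimal sketch of how a Zenodo record maps to a DOI and a resolvable link. It relies on Zenodo's DOI prefix 10.5281 and the standard https://doi.org/ resolver. The record number used is a made-up placeholder, and the helper names are hypothetical.

```python
ZENODO_DOI_PREFIX = "10.5281"  # DOI prefix used by Zenodo


def zenodo_doi(record_id: int) -> str:
    """Build the DOI string for a Zenodo record number."""
    if record_id <= 0:
        raise ValueError("record_id must be a positive integer")
    return f"{ZENODO_DOI_PREFIX}/zenodo.{record_id}"


def doi_url(doi: str) -> str:
    """Return the resolver URL for a DOI."""
    return f"https://doi.org/{doi}"


if __name__ == "__main__":
    example_record = 12345  # placeholder record number, for illustration only
    doi = zenodo_doi(example_record)
    print(doi)
    print(doi_url(doi))
```

Because the DOI stays stable even if the hosting location changes, a software release archived this way remains citable after its original repository disappears, which is the situation the slide describes for the 32 packages no longer on GitHub.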