CMS Monte Carlo Production System: From Analyst Point of View
On behalf of PdmV and Generator Groups
Wajid Ali Khan | 09.04.2019
NATIONAL CENTER FOR PHYSICS, ISLAMABAD, PAKISTAN
Overview:
Monte Carlo Production Management (McM): its usage and needs
CMS Computing Model: Worldwide LHC Computing Grid
MC Production and McM Terminology
Production Monitoring Platform (pMp)
Dataset Name Terminology and Datasets in DAS → Bridging DAS and McM
Finding Details:
Settings for cmsDrivers/Sequences
Finding/Using Particular Gridpacks/Configuration Files
Private Sample Production (LHEs/pLHE) and More Production Details
Links to be Bookmarked, HNs, Egroups, Twikis
Exercises
Wajid Ali Khan Monte Carlo Production System 09.04.2019 2/32
MC Production Management:
Analysis of LHC data at the CMS experiment requires the production of a large number
of simulated events
McM has been used to produce billions of simulated events in different campaigns during Run I and Run II
Each campaign corresponds to specific detector and LHC conditions
More than 20 different groups (PAGs, POGs, DPGs) are working at CMS using
various MC samples
Hundreds of signal and background samples are needed for various studies
A robust and reliable system is required to manage the information needed for the
configuration and prioritization of event production
It ensures efficient bookkeeping and production of MC samples for different groups
It takes input from a user in a simple way and interfaces with the CMS production
infrastructure
Path of a request: user → Monte Carlo Management → Tier-1/Tier-2 computing centers
CMS Computing Model:
CMS presents a challenging environment not only in terms of the physics to discover and the
detector to build and operate, but also in:
Data volume and the necessary computing resources
Computing resources and datasets are at least an order of magnitude larger than those of
previous experiments
The large scale CMS computing and storage requirements make it difficult to localize all of
them at one place (technical and funding reasons)
Many CMS collaborators are not based at CERN and they have access to significant
computing resources (other than CERN)
It is advantageous to harness them for CMS computing
It also helps to develop local infrastructure and secure local funding
Worldwide LHC Computing Grid (WLCG):
WLCG is composed of four levels, or “Tiers”, called 0, 1, 2 and 3.
Each tier is made up of several computer centres and provides a specific set of
services → tiers process, store and analyse data from the LHC
Tier 0 is the CERN Data Centre; it provides less than 20% of the Grid’s total capacity,
with 40% at T1s and 40% at T2s
Computing Resources Usage:
Major Campaigns sharing the computing resources
MC Production – Managerial Overview:
[Diagram: Physics Object Group, Physics Analysis Group, Detector Performance Group; MC Production Management; Computing Operations; DAS]
Generator Contact: collects the needs for simulated datasets from
within the detector or physics group, tests the requests locally, and
proposes them to the MccM for production
Generator Convener: inspects the requests made by the
Generator Contact closely and thoroughly, and then approves the
particular generator configurations
Request/Production Manager: configures campaigns and flows,
performs request chaining, sets their priority, submits requests to
the production infrastructure, handles workflows during the production
phase and sends datasets to DAS
Data Tiers Most Commonly Used:
Data Tier can be defined as the event contents/information a dataset stores. Most
commonly used are:
RAW, RECO, AOD, AODSIM, MiniAOD, NanoAOD, USER, GEN, FEVT
RAW contains full event information from the Tier-0 (i.e., from CERN), containing
’raw’ detector information (detector element, hits, etc.)
RAW is not used directly for analysis
RECO & AOD
RECO (RECOnstructed data): output from first processing by Tier-0. This layer contains
reconstructed physics objects, but it is still very detailed (∼ 2 MB)
Used mostly for dedicated studies and detector commissioning
AOD (Analysis Object Data): distilled version of RECO data, contains ∼ 40% of the RECO
information and can be used for analysis
MiniAOD
MiniAOD is a lightweight data tier, a step further in data reduction (∼ 10–15% of AOD size)
Typical event size is 30–50 kB/event; it serves the needs of ∼ 90% of CMS analyses
NanoAOD
NanoAOD is an ntuple-like format, readable with bare ROOT and containing the
per-event information that is needed in most generic analyses (30–50%)
Produced on top of MiniAOD, typical event size 1–2 kB, also fast to run: O(10–20 Hz)
Further details can be found here
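The event sizes quoted above translate directly into storage needs. The following is a minimal sketch of that arithmetic, assuming the ∼ 2 MB RECO figure is a per-event size and that AOD is 40% of it; the sample size of 10 million events is a hypothetical example, not a number from this talk.

```python
# Approximate per-event sizes in kB, taken from the figures quoted above.
# Assumption: the ~2 MB quoted for RECO is a per-event size.
RECO_KB = 2000.0
EVENT_SIZE_KB = {
    "RECO": (RECO_KB, RECO_KB),
    "AOD": (0.4 * RECO_KB, 0.4 * RECO_KB),  # ~40% of RECO
    "MiniAOD": (30.0, 50.0),
    "NanoAOD": (1.0, 2.0),
}


def dataset_size_gb(tier, n_events):
    """Return the (low, high) estimated dataset size in GB for n_events events."""
    low_kb, high_kb = EVENT_SIZE_KB[tier]
    kb_per_gb = 1.0e6  # decimal units: 1 GB = 10^6 kB
    return (low_kb * n_events / kb_per_gb, high_kb * n_events / kb_per_gb)


if __name__ == "__main__":
    n_events = 10_000_000  # hypothetical sample size
    for tier in EVENT_SIZE_KB:
        low, high = dataset_size_gb(tier, n_events)
        print(f"{tier:8s} {low:10.1f} to {high:10.1f} GB")
```

The ranges are only as good as the rounded sizes on the slide; real event sizes depend on pileup and on the campaign.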
McM Web Interface: 1
Production Interface: https://cms-pdmv.cern.ch/mcm
Development/Testing Interface: https://cms-pdmv-dev.cern.ch/mcm
McM Web Interface: 2
Register yourself if you are using McM for the first time
Click Users, then click the Add me! button in the lower left corner
Role/access rights increase can be requested at any time
Technical Overview of MC Production:
Generation (GEN-SIM): event generation, hard scattering, hadronization, validation
Questions to settle: type of physics processes? type of generators? parameters to be modified?
Simulation (GEN-SIM): particle-detector interaction; CMS geometry, magnetic field
Digitization (DIGI-RECO): DIGI, L1, DIGI2RAW, HLT; pileup situation, alignment and calibration, trigger menu, etc.
Reconstruction (DIGI-RECO): RAW2DIGI, L1Reco, RECO, VALIDATION, DQM; reconstruction algorithms
MiniAOD
NanoAOD
McM Terminology:
Types of root requests in McM:
wmLHE: simulation of the hard event by specialised event generator programs, run through
the Workload Management System (WMAgent), resulting in events written in LHE format
wmLHEGS: LHE and GEN-SIM production in a single step (default way)
pLHE: private or personal LHE files
Pythia (GEN-SIM): a generator which can do both hard scattering and hadronization;
the input in general can be an LHE file
wmLHE and pLHE are the steps that produce hard scattering processes and then
store those events in EDM format (i.e., genParticles), which can be used later by
the hadronizer
McM Terminology and MC Processing: 1
Request is a set of instructions and configuration options for Monte Carlo event
generation prepared by the Generator and PPD groups. It may represent different
processing steps and their combinations (LHE, GEN-SIM, RECO), e.g.,
TOP-RunIIFall18pLHE-00003, TOP-RunIIFall18GS-00003
TOP-RunIIAutumn18DRPremix-00109
TOP-RunIIAutumn18MiniAOD-00116, TOP-RunIIAutumn18NanoAOD-00041
Flow is a connection between at least two campaigns to produce a dataset in
more than one campaign. It can overwrite parameters in the subsequent campaign, e.g.,
flowRunIIFall18GS → flowRunIIAutumn18DRPremix →
flowRunIIAutumn18MiniAOD → flowRunIIAutumn18NanoAOD
flowPhaseIISpring17DPU200, DRNoPU, DRPU140 and similarly 0T, 38T
Prep ID is a unique identification string for a Monte Carlo request that allows it to be
tracked in different systems
Workflow is a set of tasks to be processed by the production tools. For each request
there can be multiple workflows
Campaign is a central platform in McM used to produce a set of requests sharing the
same physics goal, software release, energy and event processing configuration, e.g.,
GENonly, GEN-SIM, wmLHE, wmLHEGS, DIGI-RECO, DIGIonly, etc.
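The example Prep IDs above all read as a physics group, a campaign and a five-digit number joined by hyphens. A minimal sketch of splitting a Prep ID along those lines follows; the layout is inferred from the examples on this slide, not from an McM specification, and the names `PrepId` and `parse_prepid` are hypothetical.

```python
import re
from typing import NamedTuple


class PrepId(NamedTuple):
    pwg: str       # physics group, e.g. TOP
    campaign: str  # campaign name, e.g. RunIIAutumn18MiniAOD
    serial: int    # running number at the end of the Prep ID


# Assumed layout, inferred from the examples above: <PWG>-<Campaign>-<5 digits>
_PREPID_RE = re.compile(
    r"^(?P<pwg>[A-Za-z0-9]+)-(?P<campaign>[A-Za-z0-9_]+)-(?P<serial>\d{5})$"
)


def parse_prepid(prepid: str) -> PrepId:
    """Split a request Prep ID into physics group, campaign and number."""
    match = _PREPID_RE.match(prepid)
    if match is None:
        raise ValueError(f"not a recognised Prep ID: {prepid!r}")
    return PrepId(match["pwg"], match["campaign"], int(match["serial"]))


if __name__ == "__main__":
    for example in ("TOP-RunIIFall18GS-00003", "TOP-RunIIAutumn18MiniAOD-00116"):
        print(parse_prepid(example))
```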
McM Terminology and MC Processing: 2
[Diagram: wmLHE or pLHE → GEN + SIM → DIGI, L1, DIGI2RAW, HLT → RECO, L1Reco, VALIDATION, DQM → MiniAOD → NanoAOD; wmLHE+GS covers LHE and GEN-SIM in one step]
Sequences of the campaign can be changed at the flow level e.g.,
"magField":"0T" , "pileup":"NoPileUp", "conditions":"specificGT"
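The idea that a flow changes selected sequence parameters of a campaign can be pictured as a dictionary update. This is only an illustrative sketch: the campaign defaults below are hypothetical placeholders, and McM's internal representation may differ. The override keys and values are the ones quoted on the slide.

```python
def apply_flow(campaign_sequence: dict, flow_overrides: dict) -> dict:
    """Return a new sequence in which flow-level settings replace campaign defaults."""
    merged = dict(campaign_sequence)  # copy, so the campaign defaults stay untouched
    merged.update(flow_overrides)
    return merged


# Hypothetical campaign defaults, for illustration only
campaign_sequence = {
    "step": "DIGI,L1,DIGI2RAW,HLT",
    "magField": "38T",
    "pileup": "hypothetical_pileup_scenario",
    "conditions": "hypothetical_campaign_GT",
}

# Flow-level parameters as quoted on the slide
flow_overrides = {"magField": "0T", "pileup": "NoPileUp", "conditions": "specificGT"}

if __name__ == "__main__":
    for key, value in apply_flow(campaign_sequence, flow_overrides).items():
        print(f"{key}: {value}")
```

Parameters the flow does not mention, such as the step list here, keep the campaign value.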
McM Terminology and MC Processing: 3
Chained campaign is a sequence of campaigns connected by flows, determining the
succession of processing steps and campaigns which are needed to deliver datasets
for analysis, e.g.,
chain_RunIIWinter19PFCalib16GS_flowRunIIWinter19PFCalib16DRPU0to70_flowRunIIWinter19PFCalib16MiniAOD_flowRunIIWinter19PFCalib16NanoAOD
Chained campaigns can start from wmLHE, wmLHEGS, pLHE, GEN-SIM
Chained request is a concrete set of processing requests starting from a root
request and going through the steps of a chained campaign, e.g.,
TOP-chain_RunIIFall18wmLHEGS_flowRunIIAutumn18DRPremix_flowRunIIAutumn18MiniAOD_flowRunIIAutumn18NanoAODv4-00003
The number at the end of the request (here 00003) is always unique:
https://cms-pdmv.cern.ch/mcm/chained_requests?member_of_campaign=chain_RunIIFall18wmLHEGS_flowRunIIAutumn18DRPremix_flowRunIIAutumn18MiniAOD_flowRunIIAutumn18NanoAODv4&prepid=TOP*00003
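A chained request name can be taken apart into its physics group, chained campaign and number, and the search link shown above can be rebuilt from those pieces. The sketch below assumes that campaign and flow names contain no underscores or hyphens of their own, which holds for the examples on this slide but is not guaranteed; the helper names are hypothetical.

```python
from typing import List, NamedTuple
from urllib.parse import urlencode

MCM_CHAINED_REQUESTS = "https://cms-pdmv.cern.ch/mcm/chained_requests"


class ChainedRequest(NamedTuple):
    pwg: str
    chained_campaign: str
    number: str
    root_campaign: str
    flows: List[str]


def parse_chained_request(prepid: str) -> ChainedRequest:
    """Split <PWG>-chain_<root>_<flow>_..._<flow>-<number> into its parts."""
    pwg, rest = prepid.split("-", 1)
    chained_campaign, number = rest.rsplit("-", 1)
    parts = chained_campaign.split("_")
    if parts[0] != "chain" or len(parts) < 2:
        raise ValueError(f"not a chained request name: {prepid!r}")
    return ChainedRequest(pwg, chained_campaign, number, parts[1], parts[2:])


def search_url(req: ChainedRequest) -> str:
    """Rebuild the McM search link in the form shown on the slide."""
    query = {
        "member_of_campaign": req.chained_campaign,
        "prepid": f"{req.pwg}*{req.number}",
    }
    # keep the wildcard '*' literal, as in the link on the slide
    return MCM_CHAINED_REQUESTS + "?" + urlencode(query, safe="*")


if __name__ == "__main__":
    name = ("TOP-chain_RunIIFall18wmLHEGS_flowRunIIAutumn18DRPremix_"
            "flowRunIIAutumn18MiniAOD_flowRunIIAutumn18NanoAODv4-00003")
    req = parse_chained_request(name)
    print(req.root_campaign, req.flows)
    print(search_url(req))
```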
Production Monitoring Platform (pMp): 1
Production Monitoring Platform: https://cms-pdmv.cern.ch/pmp
pMp is developed to monitor the progress and statistics of Monte Carlo requests,
flows, campaigns and workflows using their prepIDs in different options:
Present Statistics: Total events or requests for the specific searched items
Historical Statistics: Shows expected, current and done events over time, and a list of
submitted requests with their progress
Performance Statistics: Shows total time taken by a request to go from one status to
another
A CMS user can check the status of relevant samples by using the PrepID or campaign
Inset shows the growth of RunIIAutumn18MiniAOD over time since approval
Production Monitoring Platform (pMp): 2
Present statistics in announced mode for a particular campaign
Total number of requests created, and how many are DONE, SUBMITTED, APPROVED and NEW
Status of Requests in pMp:
Link to pMp plots from McM
View announced statistics for request e.g., TOP-RunIIFall17wmLHEGS-00064
View growing statistics for request
View historical statistics for request
Buttons are present under Actions:
Requests: *TOP*Autumn18*
Campaigns: *RunIIAutumn18*
Flows: *RunIIAutumn18*
Chained Campaigns: *Autumn18*
Information that can be extracted:
Total number of submitted events
Events appeared in DAS (statistics from running jobs of a request)
Done events in DAS (statistics from finished jobs of a request)
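The search strings listed above use `*` as a wildcard. A minimal sketch of how such a pattern selects names, using shell-style matching from Python's standard library, is given below. This only illustrates the wildcard idea; how pMp and McM evaluate patterns on the server side is not described in this talk. The Prep IDs are the examples used elsewhere in these slides.

```python
from fnmatch import fnmatchcase


def select(names, pattern):
    """Return the names matching a shell-style wildcard pattern (case sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


prepids = [
    "TOP-RunIIFall18GS-00003",
    "TOP-RunIIAutumn18DRPremix-00109",
    "TOP-RunIIAutumn18MiniAOD-00116",
    "TOP-RunIIAutumn18NanoAOD-00041",
    "BTV-RunIISummer17MiniAOD-00070",
]

if __name__ == "__main__":
    for prepid in select(prepids, "*TOP*Autumn18*"):
        print(prepid)
```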
Dataset Name’s Terminology:
Dataset name in CMS always follows the format (three forward slashes): /*/*/*
/DatasetName/Campaign-ProcessString-globalTag-Ext-Version/DataTier
We can get all the data tiers from DBS with a query such as:
dataset=/TTToSemiLeptonic_mtop171p5_TuneCP5_PSweights_13TeV-powheg-pythia8/*/*
If additional statistics are required for any sample, an extension (ext1/2/3) of that
particular sample is requested in McM
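The three-part naming format can be unpacked programmatically. The sketch below splits a dataset name on the slashes and then takes the first and last hyphen-separated fields of the middle part as campaign and version; treating everything in between as one processing string, and detecting an extension by a trailing `_extN`, are simplifying assumptions. The names `DatasetName` and `parse_dataset_name` are hypothetical.

```python
import re
from typing import NamedTuple, Optional


class DatasetName(NamedTuple):
    primary: str               # physics process / generator name
    campaign: str              # first field of the processed part
    processing: str            # process string and global tag, kept together
    extension: Optional[int]   # N from a trailing _extN, if any
    version: str               # e.g. v2
    tier: str                  # data tier, e.g. MINIAODSIM


def parse_dataset_name(name: str) -> DatasetName:
    """Split /Primary/Campaign-...-Version/Tier into its components."""
    parts = name.split("/")
    if len(parts) != 4 or parts[0] != "":
        raise ValueError(f"expected /*/*/* but got {name!r}")
    _, primary, processed, tier = parts
    fields = processed.split("-")
    if len(fields) < 3:
        raise ValueError(f"unexpected processed name: {processed!r}")
    campaign, version = fields[0], fields[-1]
    processing = "-".join(fields[1:-1])
    ext = re.search(r"_ext(\d+)$", processing)
    extension = int(ext.group(1)) if ext else None
    return DatasetName(primary, campaign, processing, extension, version, tier)


if __name__ == "__main__":
    example = ("/TT_TuneCUETP8M2T4_13TeV-powheg-pythia8/"
               "RunIISummer17MiniAOD-92X_upgrade2017_realistic_v10_ext1-v2/MINIAODSIM")
    print(parse_dataset_name(example))
```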
Data Aggregation System (DAS): 1
[Screenshot labels: CMSSW, GEN-SIM, MiniAOD, McM Prep-ID]
Search for a sample on DAS: https://cmsweb.cern.ch/das
/TT_TuneCUETP8M2T4_13TeV-powheg-pythia8/RunIISummer17MiniAOD-92X_upgrade2017_realistic_v10_ext1-v2/MINIAODSIM
Samples available in DAS are marked as VALID, INVALID or PRODUCTION
Data Aggregation System (DAS): 2
[Screenshot labels: Collection of Files, GT, CMSSW]
Completed/announced samples are marked in DAS as VALID
PRODUCTION: statistics are still growing and the dataset is not yet announced
Run over the available statistics by using the allowNonValidInputDataset parameter
in the CRAB configuration file
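As an illustration of where this parameter goes, the sketch below renders the text of a small CRAB configuration with `config.Data.allowNonValidInputDataset = True`. It only produces the file contents as a string and does not need CRAB installed. The request name, parameter-set file and storage site are hypothetical placeholders, and a real configuration will usually need further settings (splitting, output location, and so on).

```python
CRAB_TEMPLATE = """\
from CRABClient.UserUtilities import config

config = config()
config.General.requestName = '{request_name}'
config.JobType.pluginName = 'Analysis'
config.JobType.psetName = '{pset_name}'
config.Data.inputDataset = '{dataset}'
# Allow running over a dataset that is not VALID (e.g. still in PRODUCTION)
config.Data.allowNonValidInputDataset = True
config.Site.storageSite = '{storage_site}'
"""


def render_crab_config(dataset, request_name, pset_name, storage_site):
    """Return the text of a minimal CRAB configuration file."""
    return CRAB_TEMPLATE.format(
        dataset=dataset,
        request_name=request_name,
        pset_name=pset_name,
        storage_site=storage_site,
    )


if __name__ == "__main__":
    text = render_crab_config(
        dataset=("/TT_TuneCUETP8M2T4_13TeV-powheg-pythia8/"
                 "RunIISummer17MiniAOD-92X_upgrade2017_realistic_v10_ext1-v2/MINIAODSIM"),
        request_name="my_analysis_test",   # hypothetical
        pset_name="my_analysis_cfg.py",    # hypothetical
        storage_site="T2_XX_MySite",       # hypothetical placeholder
    )
    print(text)
```

Results obtained this way cover only the statistics available at submission time, since the dataset is still growing.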
Finding Details from DAS/McM: 1
Log on to: https://cms-pdmv.cern.ch/mcm and click the Navigation button
There are four fields which can be used to make a search:
Prep-ID: BTV-RunIISummer17MiniAOD-00070
Dataset Name: TT_TuneCUETP8M2T4_13TeV-powheg-pythia8
MccM Ticket: BTV-2017Aug09-0000*
Request Tags: PAGLHCP19
Finding Details from DAS/McM: 2
Alternatively, go to the Requests tab (https://cms-pdmv.cern.ch/mcm/requests) and click the
Navigation tab
This lists the prepIDs in which BTV-RunIISummer17MiniAOD-00070 has been used
The output dataset can also be extracted along with the request status
Finding details from DAS/McM: 3
Options from the Select View tab for request: BTV-RunIISummer17DRPremix-00085
A couple of interesting fields are:
Pileup dataset name: pileup dataset used
Config Id: configuration files for DIGI and RECO steps
Sequences: cmsDriver information on DIGI and RECO steps
Reqmgr name: shows production status and its link from McM to the dataset in DAS
Finding Details from DAS/McM: 4
Exploring some more options for the request under consideration
Click to see the DIGI ConfigFile
Click to see the RECO ConfigFile
Click on the eye to see cmsDriver details
Available options differ from user to user
Click to see the full chain and the various steps of the same request
Finding Details for GS Requests:
Exploring some options for the GS: BTV-RunIISummer17wmLHEGS-00001
click to see fragment details LHE/wmLHEGS
grid pack location and other gen level info
get configuration files
CMSDriver/Scripts to Produce Events:
Create a request in any campaign e.g., RunIIFall18wmLHEGS
Always check the created request by running it locally before starting validation
click to see existing request in campaign
click to get the test command
click to trigger the validation
Select a campaign in accordance with your needs by checking the cmsDriver/sequences
For more details write to us at: hn-cms-prep-ops@cern.ch
HNs, Egroups and Meetings:
Prep-ops: hn-cms-prep-ops@cern.ch
Default gateway to interact with MC GEN contacts, experts from Trigger, AlcaDB,
computing, and to post MccM announcements
Generator group: hn-cms-generators@cern.ch
Main HN to discuss MC generation issues, and to post MccM announcements
CMSSW Release and data operations
hn-cms-relAnnounce@cern.ch, cms-release-dataops@cern.ch
To discuss and integrate new algorithms in official CMS software
Monte Carlo Coordination Meetings (MccM):
This meeting is open to GEN contacts (DPGs, POGs, PAGs) and CMS analysts to discuss
MC requests/tickets with MccM core team
Twiki: https://twiki.cern.ch/twiki/bin/view/CMS/PdmVMonteCarloCoordinationMeeting
CERN Time: 3 PM - 4 PM, every Wednesday
PPD General Meeting:
https://indico.cern.ch/category/3905
CERN Time: 2 PM - 4 PM, every Thursday
ORP:
CMSSW release plan, integration of new pull requests in CMSSW, etc
CERN Time: 5 PM - 6 PM, every Tuesday
Announced at: cms-release-dataops@cern.ch
Toy Example:
Suppose: /TTToSemiLeptonic_TuneCP5down_PSweights_13TeV-powheg-pythia8/RunIIFall17MiniAOD-PU2017_94X_mc2017_realistic_v11-v1/MINIAODSIM
We want to find the corresponding request in McM:
Physics Working Group-Chain Used-Unique Number
Root Request of the dataset
PdmV Twikis/Material:
PdmV Twiki: https://twiki.cern.ch/twiki/bin/view/CMS/PdmV
McM Twiki: https://twiki.cern.ch/twiki/bin/view/CMS/PdmVMcM
pMp Twiki: https://twiki.cern.ch/twiki/bin/viewauth/CMS/PdmVpMp
McM Glossary: https://twiki.cern.ch/twiki/bin/viewauth/CMS/PdmVMcMGlossary
Previous McM Tutorial: https://indico.cern.ch/event/674156
Exercises: 1
Select any MiniAOD dataset from your analysis and find it in DAS
Find its McM prepID, global tag and CMSSW release
Find LHE, GEN-SIM, DIGI-RECO requests that are used to produce this MiniAOD request
Find global tag, x-section, generator level cuts in a GEN-SIM request that were used to
produce that MiniAOD
Check whether any extension of the sample has been produced
Find the Les Houches Event file used to produce TOP-RunIIFall18GS-00003.
Find the various requests that have been made using request B2G-RunIIWinter15wmLHE-00007
as a root request.
Find cmsDriver settings used for request TOP-RunIIAutumn18MiniAOD-00006.
Find the grid pack used to produce request TOP-RunIIAutumn18MiniAOD-00006.
Find PDF used by default in the sample. There can be a number of ways but find it using
at least two different ways.
Find data-cards used to generate the root request.
Exercises: 2
Validation of the following tt̄ FCNC request, TOP-RunII-Fall18-wmLHEGS-00249, will
never succeed. Using the request checking script, patch the grid pack and check whether the
validation is successful.
What is the number of partons in the Born matrix element for the highest multiplicity?
Taking the request TOP-RunII-Fall18-wmLHEGS-00226:
Check if H→bb̄, ZZ, ττ decays have been included.
Check if all other Higgs decays have been switched off.
Check if the request includes the parton shower weights.
Find the CR tune used in the request.
List all the B2G requests from B2G*01469 to B2G*01481 in the campaign
RunIIFall17wmLHEGS.
Split the requests depending on their status: none/new, submit/submitted,
define/defined, submit/approved
Find all the SMP, B2G and TOP requests produced with the chained campaign:
chain_RunIIFall18wmLHEGS_flowRunIIAutumn18DRPremix_flowRunIIAutumn18MiniAOD_flowRunIIAutumn18NanoAODv4
Backup
Get the Test Command:
Open a terminal window
Get the test command: wget https://cms-pdmv.cern.ch/mcm/public/restapi/requests/get_test/PPD-RunIIFall18wmLHEGS-00001
Initialize your grid proxy certificate: voms-proxy-init -voms cms
Change the permission: chmod +x PPD-RunIIFall18wmLHEGS-00001
Launch the script: ./PPD-RunIIFall18wmLHEGS-00001
Request.xml, Request.py and Request.root files will be created
Read the logs and explore the generated files
You can also produce DIGI-RECO and MiniAOD in the same file by adding the appropriate
cmsDriver commands
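The same steps can be assembled for any request. The sketch below only builds and prints the command lines for a given Prep ID, following the steps listed above; it does not run them, since that requires a CMS environment and a grid certificate. The function name `get_test_steps` is hypothetical; the URL prefix is the one shown on this slide.

```python
import shlex

GET_TEST_URL = "https://cms-pdmv.cern.ch/mcm/public/restapi/requests/get_test/"


def get_test_steps(prepid):
    """Return the shell commands (as argument lists) to fetch and run a test script."""
    script = prepid  # wget saves the file under the last component of the URL
    return [
        ["wget", GET_TEST_URL + prepid],    # download the test command
        ["voms-proxy-init", "-voms", "cms"],  # initialize the grid proxy
        ["chmod", "+x", script],            # make the script executable
        ["./" + script],                    # launch it
    ]


if __name__ == "__main__":
    for step in get_test_steps("PPD-RunIIFall18wmLHEGS-00001"):
        print(" ".join(shlex.quote(arg) for arg in step))
```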
Conditions (Global Tags):
The alignment and calibration conditions needed by all stages of the data production
(SIM, DIGI: for simulated events) and processing (RECO, MiniAOD: for simulation
and reconstruction alike) in CMSSW can be retrieved using global tags
Global tag (GT):
A single entry point to retrieve all conditions consumed by a given workflow
A GT is a collection of 200–400 tags, each a set of AlCa parameters measured by
calibration experts in the DPGs/POGs and released to a dedicated database
It is usually identified by a string, e.g., 92X_upgrade17_realistic_v1
CMSCondDB: web portal for administration and navigation of the existing global tags