78 commits
ff50226
Create cron_job_runner.py
artlbv Jun 19, 2024
17b634f
Update cron_job_runner.py
artlbv Jun 20, 2024
0d0aea3
Update cron_job_runner.py
artlbv Jun 20, 2024
dd3b214
file paths changed
LukasEbeling Aug 15, 2024
418396b
cron job runner hadding output files per run
LukasEbeling Aug 15, 2024
14e9725
deleted
LukasEbeling Sep 2, 2024
e8992a7
cron jobs for creating root hists, hadding hists, and plotting
LukasEbeling Sep 2, 2024
8bdf565
init
LukasEbeling Sep 16, 2024
8493af5
path changed: year/era/...
LukasEbeling Sep 16, 2024
6c29487
log files for cron job
LukasEbeling Sep 16, 2024
bf33276
not needed
LukasEbeling Oct 18, 2024
7910141
renamed
LukasEbeling Oct 18, 2024
7dfc901
new try
LukasEbeling Oct 18, 2024
1b1e942
init
LukasEbeling Oct 18, 2024
54df4f5
init
LukasEbeling Oct 24, 2024
2d5ce4c
cap after 5 files removed
LukasEbeling Oct 24, 2024
c7b4917
merging
LukasEbeling Oct 24, 2024
6ac0a07
plotting
LukasEbeling Oct 24, 2024
5a01d2d
utils renamed
LukasEbeling Oct 24, 2024
10154ed
htcondor setup
LukasEbeling Oct 24, 2024
798f0c0
logs, queue added
LukasEbeling Oct 24, 2024
9040bce
for different in- and output path
LukasEbeling Nov 11, 2024
0bfefa9
submitted files zipped
LukasEbeling Nov 11, 2024
318109b
submit files zipped
LukasEbeling Nov 11, 2024
6e39453
merging per week added
LukasEbeling Nov 18, 2024
ec04c6f
merging per week added
LukasEbeling Nov 18, 2024
b795d46
files moved
LukasEbeling Dec 3, 2024
bf61876
directories added
LukasEbeling Dec 3, 2024
eb7b3b8
not needed
LukasEbeling Dec 3, 2024
170eabd
path to queue.txt changed
LukasEbeling Dec 4, 2024
e24e41f
init file to generate tar file of repo (submitted to htcondor)
LukasEbeling Dec 4, 2024
b01d3a5
paths adjusted
LukasEbeling Dec 4, 2024
31c99d6
clean up
LukasEbeling Dec 4, 2024
ff025d2
added to ignore file
LukasEbeling Dec 4, 2024
7531d97
paths changed
LukasEbeling Dec 4, 2024
746df4f
clean up
LukasEbeling Dec 4, 2024
828ced8
clean up, tar file not needed
LukasEbeling Dec 4, 2024
c9215e4
comment changed
LukasEbeling Dec 4, 2024
9c2932d
htcondor flag now as function in utils
LukasEbeling Dec 4, 2024
4e13251
clean up, htcondor flag put to utils
LukasEbeling Dec 4, 2024
ecc2ae2
init (so far only dump)
LukasEbeling Dec 5, 2024
a600ad4
abort condition based on file time
LukasEbeling Dec 5, 2024
32a826a
clean up, htcondor flag imported from utils
LukasEbeling Dec 5, 2024
a331577
executed cmd printed to output file
LukasEbeling Dec 5, 2024
a848e40
new abort condition for hadd
LukasEbeling Dec 5, 2024
7651944
init
LukasEbeling Dec 9, 2024
2988957
not needed
LukasEbeling Dec 9, 2024
948c699
duplicate line removed
LukasEbeling Dec 9, 2024
e95b604
init: file to run all
LukasEbeling Dec 9, 2024
aae652a
merge all eras moved to different script
LukasEbeling Dec 9, 2024
ef76c39
file cleaning added based on file timing
LukasEbeling Dec 9, 2024
5d5181d
hadd -> abort condition now based on file timing
LukasEbeling Dec 9, 2024
a76aefb
user friendly documentation
LukasEbeling Dec 9, 2024
c272950
import clean up
LukasEbeling Dec 9, 2024
d47df7f
clearer naming convention
LukasEbeling Dec 9, 2024
7478ae7
Merge branch 'main' into test/automation: updates in upstream repository
LukasEbeling Dec 16, 2024
972a441
clean up
LukasEbeling Dec 16, 2024
03dcb17
import added
LukasEbeling Dec 16, 2024
275fa53
path to dqm dir changed
LukasEbeling Dec 16, 2024
68a86e9
tar file removed
LukasEbeling Jan 30, 2025
a85cb47
replace removed
LukasEbeling Jan 30, 2025
1253c67
commented out for testing cron job
LukasEbeling Mar 17, 2025
467228c
no double merging
LukasEbeling Mar 17, 2025
bb04a6f
year no longer hard coded
LukasEbeling Mar 17, 2025
a738ebb
install path automatically found
LukasEbeling Mar 18, 2025
83a186e
log file added
LukasEbeling Mar 18, 2025
ec97097
cron jobs capped at 10 files per dataset
LukasEbeling Mar 18, 2025
10e561b
more verbose
LukasEbeling Mar 18, 2025
ba613a4
code commented in
LukasEbeling Mar 19, 2025
259f805
commented for testing
LukasEbeling May 19, 2025
a328aaa
Update README.md
LukasEbeling May 27, 2025
a0c095d
Modifications for running on 2025 PromptNano. PF JetID definition add…
jaisatul May 28, 2025
01937c5
Merge branch 'dev/automation' of github.com:cms-l1-dpg/MacrosNtuples …
jaisatul May 28, 2025
ff937c4
output directory is created if it does not exist
LukasEbeling Jun 20, 2025
5be8b7b
dirname based on topology
LukasEbeling Jun 20, 2025
55d7094
Update helper_nano_dqmoff.py
jaisatul Jun 30, 2025
588d5fd
VBF MET+DiJet histo removed from helper_nano.EtSum()
jaisatul Jul 1, 2025
708b316
MET plots switches off for now while debugging trigger paths.
LukasEbeling Jul 1, 2025
4 changes: 4 additions & 0 deletions .gitignore
Original file line number Diff line number Diff line change
@@ -0,0 +1,4 @@
**/__pycache__/**
/automation/logs/**
/automation/queue.txt
**.log
46 changes: 46 additions & 0 deletions automation/README.md
@@ -0,0 +1,46 @@
# Automation Tool Kit
Tool kit to (semi-)automate the production of DQM plots from NanoAODs stored on tier 0. The automation is based on the cron job scheduler, which executes bash scripts and commands periodically. The following commands are useful:

- show list of scheduled cron jobs: `acrontab -l`
- remove all scheduled cron jobs: `acrontab -r`
- open cron job editor: `acrontab -e`

last acron command:
```
0 * * * * lxplus cd /afs/cern.ch/user/l/lebeling/MacrosNtuples/automation && sh cron_job.sh >>cron.log 2>&1
```


Inside the cron job editor:
- save changes via ctrl+o
- close editor via ctrl+x

Before running the automation tool kit, adjust the output path (i.e. the directory in which all plots and histograms are deposited) in `utils.py` -> `dqm_prefix`.

To run the automation tool kit, open the cron editor and paste the following command (replace *PATH* by the actual installation path on lxplus):
```
0 * * * * lxplus cd /PATH/MacrosNtuples/automation && sh cron_job.sh >>cron.log 2>&1
```
Cron will execute the command once every hour, saving the output messages (and errors) into the *cron.log* logfile. More details on how to configure the timing of a cron job can be found [here](https://crontab.guru).
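For illustration, the five schedule fields of a cron entry can be broken down with a tiny parser. This is a sketch for explanation only, not project code, and it does no validation beyond counting fields:

```python
# Illustrative breakdown of the five cron schedule fields.
# "0 * * * *" means: at minute 0 of every hour, every day.
FIELDS = ["minute", "hour", "day of month", "month", "day of week"]

def describe(expr: str) -> dict:
    """Map each cron field name to its raw value (no range validation)."""
    values = expr.split()
    if len(values) != len(FIELDS):
        raise ValueError("expected 5 cron fields")
    return dict(zip(FIELDS, values))

print(describe("0 * * * *"))
```

With this, `describe("0 * * * *")` yields `minute: "0"` and `*` for the remaining fields, i.e. "once per hour, on the hour".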


## automation steps
The different processing steps are summarized as:

1) histograms
Run `python3 make_hists.py` to produce `.root` files containing histograms for all data types (i.e. EGamma, Muon, JetMET). Which selections are run is specified in `config.yaml` (see scripts). If the respective output file already exists, the histogram production is skipped.

2) merge per run
Run `python3 merge_per_run.py` to merge (i.e. hadd) the histogram files per run. If the respective output file already exists and is newer than all base histogram files, the merging is skipped.

3) merge per era/week
Run `python3 merge_per_era.py` to further merge (i.e. hadd) the histograms per era (i.e. Run2024H) and per week using the merged histograms per run. If the respective output file already exists and is newer than all base histogram files, the merging is skipped.

4) merge per type
Run `python3 merge_total.py` to merge (i.e. hadd) all histograms of one data type (i.e. EGamma, Muon, JetMET) using the merged histograms per era. If the respective output file already exists and is newer than all base histogram files, the merging is skipped.

5) plotting
Run `python3 make_plots.py` to produce png/pdf plots from all merged histograms (merged per run/era/week/total). The respective plotting scripts are specified in `config.yaml`. If the png/pdf files already exist and are newer than the histogram files, the plotting is skipped.
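The "skip if the output is newer than all inputs" check used throughout steps 2-5 can be sketched as a stand-alone helper (a simplified version; the actual scripts compare timestamps inline via `os.path.getctime`):

```python
import os
from glob import glob

def up_to_date(output: str, inputs: list) -> bool:
    """True if output exists and is newer than every input file.
    Sketch of the merge/plot skip condition; not the project's utils code."""
    if not os.path.exists(output) or not inputs:
        return False
    newest_input = max(os.path.getmtime(f) for f in inputs)
    return os.path.getmtime(output) > newest_input
```

A merge step would then do `if up_to_date(target, files): continue` before calling hadd.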

## htcondor setup
All production steps listed above can be run on HTCondor. With the flag `--htcondor`, the respective commands are not executed directly but instead written into the `queue.txt` file. With `condor_submit submit.txt`, all commands in the queue are submitted to HTCondor. This mode is highly recommended to (re-)run all files currently stored on tier 0.
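A minimal sketch of how the `--htcondor` switch could divert commands into `queue.txt` instead of running them. The names `write_queue` and `run_command` match the imports used by the scripts, but the bodies here are an assumption, not the actual `utils.py`:

```python
import subprocess

QUEUE_FILE = "queue.txt"  # later consumed by: condor_submit submit.txt

def write_queue(cmd: str, queue_file: str = QUEUE_FILE) -> None:
    """Append one command per line; submit.txt reads them via
    'queue cmd from queue.txt'. Sketch only -- the real utils may differ."""
    with open(queue_file, "a") as f:
        f.write(cmd + "\n")

def run_command(cmd: str, logfile: str) -> None:
    """Run locally, appending the command and its output to a log file."""
    with open(logfile, "a") as log:
        log.write(cmd + "\n")
        subprocess.run(cmd, shell=True, stdout=log, stderr=log, check=False)
```

Each script then chooses between the two paths with `write_queue(cmd)` or `run_command(cmd, logfile)` depending on the flag.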
45 changes: 45 additions & 0 deletions automation/config.yaml
@@ -0,0 +1,45 @@
JetMET:
datasets:
- 'JetMET0'
- 'JetMET1'
eras:
- 'Run2025*'
scripts:
- 'python3 ../l1macros/performances_nano.py -i $INFILE -o $OUTDIR/all_DiJet.root -c DiJet'
plotting:
- 'python3 ../plotting/make_DiJet_plots.py --dir $OUTDIR --config ../config_cards/full_DiJet.yaml'

EGamma:
datasets:
- 'EGamma0'
- 'EGamma1'
- 'EGamma2'
- 'EGamma3'
eras:
- 'Run2025*'
scripts:
- 'python3 ../l1macros/performances_nano.py -i $INFILE -o $OUTDIR/all_PhotonJet.root -c PhotonJet'
- 'python3 ../l1macros/performances_nano.py -i $INFILE -o $OUTDIR/all_ZToEE.root -c ZToEE'
- 'python3 ../l1macros/performances_nano_dqmoff.py -i $INFILE -o $OUTDIR/out_zee_dqmoff.root -c ZToEEDQMOff'
plotting:
- 'python3 ../plotting/make_ZToEE_plots.py --dir $OUTDIR --config ../config_cards/full_ZToEE.yaml'
- 'python3 ../plotting/make_PhotonJet_plots.py --dir $OUTDIR --config ../config_cards/full_PhotonJet.yaml'

Muon:
datasets:
- 'Muon0'
- 'Muon1'
eras:
- 'Run2025*'
scripts:
- 'python3 ../l1macros/performances_nano.py -i $INFILE -o $OUTDIR/all_ZToMuMu.root -c ZToMuMu'
- 'python3 ../l1macros/performances_nano.py -i $INFILE -o $OUTDIR/all_MuonJet.root -c MuonJet' #TODO not working
- 'python3 ../l1macros/performances_nano.py -i $INFILE -o $OUTDIR/all_ZToTauTau.root -c ZToTauTau'
- 'python3 ../l1macros/performances_nano_dqmoff.py -i $INFILE -o $OUTDIR/out_zmumu_dqmoff.root -c ZToMuMuDQMOff'
- 'python3 ../l1macros/performances_nano_dqmoff.py -i $INFILE -o $OUTDIR/out_jets_dqmoff.root -c JetsDQMOff'
- 'python3 ../l1macros/performances_nano_dqmoff.py -i $INFILE -o $OUTDIR/out_ztautau_dqmoff.root -c ZToTauTauDQMOff'
- 'python3 ../l1macros/performances_nano_dqmoff.py -i $INFILE -o $OUTDIR/out_etsum_dqmoff.root -c EtSumDQMOff'
plotting:
- 'python3 ../plotting/make_ZToMuMu_plots.py --dir $OUTDIR --config ../config_cards/full_ZToMuMu.yaml'
- 'python3 ../plotting/make_ZToTauTau_plots.py --dir $OUTDIR --config ../config_cards/full_ZToTauTau.yaml'
- 'python3 ../plotting/make_MuonJet_plots.py --dir $OUTDIR --config ../config_cards/full_MuonJet.yaml'
8 changes: 8 additions & 0 deletions automation/cron_job.sh
@@ -0,0 +1,8 @@
#!/bin/bash
# python3 make_hists.py
# python3 merge_per_run.py
# python3 merge_per_era.py
# python3 merge_total.py
# python3 make_plots.py
date
echo "All done!"
52 changes: 52 additions & 0 deletions automation/make_hists.py
@@ -0,0 +1,52 @@
#!/usr/bin/env python3
import yaml, os
from glob import glob
from utils import htcondor_flag, parse_file, run_command, write_queue, tier0, dqm_prefix

config_file = yaml.safe_load(open('config.yaml', 'r'))

htcondor = htcondor_flag()


for label, config in config_file.items():

    # step 1 - find all files on tier 0 (keep only runs >= 392241)
    fnames = []
    for era in config["eras"]:
        for dataset in config["datasets"]:
            files = glob(f"{tier0}/{era}/{dataset}/NANOAOD/PromptReco-v*/*/*/*/*/*.root")
            for f in files:
                parts = f.split("/")
                # the run number is split across two directory levels
                run_num = int(parts[-4] + parts[-3])
                if run_num >= 392241:
                    fnames.append(f)

    # step 2 - remove files that have already been processed
    # (build a new list instead of removing items while iterating)
    unprocessed = []
    for file in fnames:
        output_path = dqm_prefix + parse_file(file)
        if glob(f"{output_path}/*.root"):
            print(file + " already processed, skipping")
        else:
            unprocessed.append(file)
    fnames = unprocessed

    # step 3 - run the configured scripts on each remaining file
    for file in fnames:
        print(f"Processing file {file}")

        output_path = dqm_prefix + parse_file(file)
        os.makedirs(output_path, exist_ok=True)

        for cmd in config["scripts"]:
            cmd = cmd.replace("$OUTDIR", output_path)
            cmd = cmd.replace("$INFILE", file)

            if htcondor:
                write_queue(cmd)
            else:
                run_command(cmd, output_path + "/log.txt")
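The run-number extraction in step 1 relies on the tier-0 directory layout, where the run number is split across two consecutive path segments. A stand-alone illustration (the example path is hypothetical, but the index arithmetic matches the script above):

```python
def run_number(path: str) -> int:
    """Extract the run number from a tier-0 NanoAOD path, where it is
    split across two directory levels (e.g. .../392/241/... -> 392241)."""
    parts = path.split("/")
    return int(parts[-4] + parts[-3])

# hypothetical tier-0 path, for illustration only
example = "/store/Run2025C/Muon0/NANOAOD/PromptReco-v1/000/392/241/00000/f.root"
print(run_number(example))  # 392241
```

The cut `run_num >= 392241` then keeps only files from that run onwards.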
35 changes: 35 additions & 0 deletions automation/make_plots.py
@@ -0,0 +1,35 @@
#!/usr/bin/env python3

import os, yaml
from glob import glob
from utils import run_command, write_queue, htcondor_flag, dqm_prefix

# load config
config_dict = yaml.safe_load(open('config.yaml', 'r'))

htcondor = htcondor_flag()

# main logic: glob the merged root files and make plots
for label, config in config_dict.items():
    pattern = os.path.join(dqm_prefix, '**', label, '**', 'merged')
    merged_dirs = glob(pattern, recursive=True)

    for merged_dir in merged_dirs:

        # skip plotting if every .png file is newer than every .root file
        newest_root, oldest_png = 0, 0
        root_files = glob(f"{merged_dir}/*.root")
        png_files = glob(f"{merged_dir}/plotsL1Run3/*.png")
        if root_files:
            newest_root = max(os.path.getctime(f) for f in root_files)
        if png_files:
            oldest_png = min(os.path.getctime(f) for f in png_files)
        if oldest_png > newest_root:
            print('skipping: ' + merged_dir)
            continue

        for cmd in config["plotting"]:
            print(80 * "#" + '\n' + f"plotting for {merged_dir}")
            os.makedirs(merged_dir + '/plotsL1Run3', exist_ok=True)
            cmd = cmd.replace("$OUTDIR", merged_dir)
            print(cmd)
            if htcondor:
                write_queue(cmd)
            else:
                run_command(cmd, merged_dir + '/log.txt')
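The glob pattern above uses `**` with `recursive=True`, which matches any number of intermediate path segments, so every `merged` directory under the label is found regardless of depth. A small self-contained illustration (directory names are made up):

```python
import os
import tempfile
from glob import glob

# build a toy directory tree mimicking the dqm output layout
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "Muon", "Run2025C", "392241", "merged"))

# '**' with recursive=True spans any number of directory levels
matches = glob(os.path.join(root, "**", "merged"), recursive=True)
print(matches)  # the single 'merged' directory, three levels down
```

Without `recursive=True`, `**` behaves like a single `*` and this pattern would find nothing at that depth.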
39 changes: 39 additions & 0 deletions automation/merge_per_era.py
@@ -0,0 +1,39 @@
#!/usr/bin/env python3

from glob import glob
from utils import hadd, get_weeks, htcondor_flag, dqm_prefix

htcondor = htcondor_flag()

# collect all histogram root files merged by run
all_files = glob(f"{dqm_prefix}/*/*/*/*/*/merged/*.root")

weeks = get_weeks()

# group files by week and era
file_groups = {}
for file in all_files:
    parts = file.split('/')
    filename = parts[-1]
    run = int(parts[-3])
    era = parts[-6]
    label = parts[-7]

    # group by week (runs missing from the week map get no weekly merge)
    if run in weeks:
        week = weeks[run]
        target = f"{dqm_prefix}/Weekly/{week}/{label}/merged/{filename}"
        file_groups.setdefault(target, []).append(file)

    # group by era
    target = f"{dqm_prefix}/{label}/{era}/merged/{filename}"
    file_groups.setdefault(target, []).append(file)


# hadd grouped files
for target, files in file_groups.items():
    hadd(target, files, htcondor)
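The grouping idiom used here (build a dict mapping each merge target to its list of inputs, then hadd each group) can be shown on a toy example with the same path layout and index arithmetic. The paths are hypothetical; only the structure matters:

```python
# toy merged-by-run files following label/era/.../run/merged/file.root
files = [
    "/dqm/Muon/Run2025C/392/241/392241/merged/all_ZToMuMu.root",
    "/dqm/Muon/Run2025C/392/250/392250/merged/all_ZToMuMu.root",
]

groups = {}
for f in files:
    parts = f.split("/")
    era, label, filename = parts[-6], parts[-7], parts[-1]
    # both runs of one era collapse onto the same per-era target
    target = f"/dqm/{label}/{era}/merged/{filename}"
    groups.setdefault(target, []).append(f)

print(groups)  # one target with two input files
```

Each key of `groups` then becomes one `hadd` invocation with its value list as inputs.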
33 changes: 33 additions & 0 deletions automation/merge_per_run.py
@@ -0,0 +1,33 @@
#!/usr/bin/env python3

from glob import glob
from utils import hadd, clean, htcondor_flag, dqm_prefix

# parse arguments
htcondor = htcondor_flag()

# collect all base histogram root files
all_files = glob(f"{dqm_prefix}/*/*/*/*/*/*/*.root")
all_files = [f for f in all_files if 'merged' not in f]
cleaned_files = clean(all_files)

# group files by run: files in the same run directory share one target
file_groups = {}
for file in cleaned_files:
    parts = file.split('/')
    filehash = parts[-2]

    # replace the per-job hash directory by "merged" to build the target
    target = file.replace(filehash, "merged")
    file_groups.setdefault(target, []).append(file)

# hadd grouped files
for target, files in file_groups.items():
    hadd(target, files, htcondor)
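`hadd` is imported from `utils` and not shown in this diff. A plausible sketch of such a wrapper around ROOT's `hadd` CLI, under the assumed behaviour suggested by the commit history (skip when the target is newer than all inputs, otherwise run locally or append to the HTCondor queue). The body is an assumption, not the actual `utils.hadd`:

```python
import os
import subprocess

def hadd(target: str, files: list, htcondor: bool = False) -> None:
    """Merge ROOT files with hadd, skipping up-to-date targets.
    Sketch only -- the real utils.hadd may differ."""
    if os.path.exists(target):
        newest_input = max(os.path.getmtime(f) for f in files)
        if os.path.getmtime(target) > newest_input:
            return  # target newer than all inputs: nothing to do
    os.makedirs(os.path.dirname(target), exist_ok=True)
    cmd = "hadd -f " + target + " " + " ".join(files)
    if htcondor:
        with open("queue.txt", "a") as q:  # submitted via condor_submit
            q.write(cmd + "\n")
    else:
        subprocess.run(cmd, shell=True, check=False)
```

With `htcondor=True` the command is only queued, which is what lets a single `condor_submit submit.txt` fan out all pending merges.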
29 changes: 29 additions & 0 deletions automation/merge_total.py
@@ -0,0 +1,29 @@
#!/usr/bin/env python3

from glob import glob
from utils import hadd, clean, htcondor_flag, dqm_prefix

htcondor = htcondor_flag()

# collect all histogram root files merged by era
all_files = glob(f'{dqm_prefix}/*/*/merged/*.root')
cleaned_files = clean(all_files)

# group by type (i.e. Muon, EGamma, JetMET, etc.)
file_groups = {}
for file in cleaned_files:
    parts = file.split('/')
    filename = parts[-1]
    label = parts[-4]

    target = f"{dqm_prefix}/{label}/merged/{filename}"
    file_groups.setdefault(target, []).append(file)


# hadd grouped files
for target, files in file_groups.items():
    hadd(target, files, htcondor)
11 changes: 11 additions & 0 deletions automation/submit.txt
@@ -0,0 +1,11 @@
executable = wrapper.py
arguments = $(cmd)
output = logs/$(ClusterId).$(ProcId).out
error = logs/$(ClusterId).$(ProcId).err
log = logs/$(ClusterId).$(ProcId).log
+JobFlavour = "espresso"

should_transfer_files = yes
when_to_transfer_output = on_exit

queue cmd from queue.txt