Running Analysis

EventDealer is the piece of code that runs a list of analysis modules in the LAr10Ana/ana/ folder on a single run, and saves the all output to file. You can run it directly to test and debug, or to run it on the FermiGrid using the scripts on all data.

Running Directly (on GPVM or EAF)

To test if an analysis module is working properly, you can run the EventDealer on a single run on couppsbcgpvm01.fnal.gov or EAF. You’ll get a more complete error message in the output, and don’t need to wait for the job to be scheduled. The file IO path will be different from when running on the grid, so it doesn’t help with debugging permission issues, etc.

  1. Clone the LAr10Ana repository in your home directory or a personal working directory like /exp/e961/app/users/<username>.

  2. Edit LAr10Ana/grid_jobs/EventDealer.py, change the path to the run data tarball, the path to save the output, and the processes to run.

if __name__ == "__main__":
    if len(sys.argv) > 1:
        ProcessSingleRun(
            rundir=sys.argv[1],
            recondir=sys.argv[2],
            process_list = ["run", "event", "exposure", "scintillation", "scint_rate", "bubble", "reco", "pressure_t0", "t_expansion"])
    else:
        ProcessSingleRun(
            rundir="/exp/e961/data/SBC-25-daqdata/20260221_0.tar",
            recondir="/nashome/z/zsheng/test", # Use your own directory for testing~
            process_list = ["run", "event", "exposure", "scintillation", "scint_rate", "bubble", "reco", "pressure_t0", "t_expansion"])
  1. Activate conda environment

source LAr10Ana/setup.sh
  1. Run EventDealer in one of two ways

# using info in the else clause
python LAr10Ana/grid_jobs/EventDealer.py

# provide info in argument
python LAr10Ana/grid_jobs/EventDealer.py /exp/e961/data/SBC-25-daqdata/20260221_0.tar /nashome/z/zsheng/test

Running on the FermiGrid

The LAr10Ana/grid_jobs/ directory also contains a few bash scripts to help with submitting and running jobs on the Fermi grid. The jobs need to be submitted from couppsbcgpvm01.fnal.gov. For production mode and automated submission, use coupppro user, which is set up with managed kerberos ticket.

1. batch_run_gridjob.sh

This is the script to run to submit multiple jobs. It checks the folder of both the raw data tarball and the analysis outputs. For each run, it checks if the recon output doesn’t exist, or current repository tag (like v0.0.7) is newer than the existing output for this run. If either is true, it will proceed to submitting a job for this run. It keeps track of the run ID and job ID of submitted jobs at ~/.cache/sbc_job_list.csv, and also uses a lockfile to prevent multiple instances running at the same time.

Three tags are available for this script.

  • --verbose or -v: Verbose mode, more detailed logging. Passed to run_gridjob.sh.

  • --force-rerun or --force or -f: Force mode, submit jobs for all runs without the checks

  • --production or --pro or -p: Production mode, for automated submission using production role permission and a set of different paths. Passed to run_gridjob.sh.

2. run_gridjob.sh

This script is used to submit jobs for a single run. It can be used indepently or by batch_run_gridjob.sh. It will copy the raw data tarball to the PNFS where the grid nodes have access to, make a tarball of the LAr10Ana repository, create output path in PNFS if it doesn’t yet exist, and actually submit the job.

The working directory for each user is at /pnfs/coupp/scratch/users/${USER} to allow proper permission. Two folders will be created inside: temp_data to store a copy of the data tarball, and grid_output to store the output files of the analysis.

When submitting jobs using the jobsub_lite program, it also calcuates the appropriate disk space, RAM, and run time to request. Too much resources requested may mean longer wait time before the job is scheduled, while too little requested may result in job exceeding limit and being held. The resoruces are calculated based on the size of the tarball. This can be improved to distinguish between runs with many events, vs runs with few events but large files in each (like long scintillation runs).

  • Disk: 3x size + 5GB, min 5GB, max 500GB

  • RAM: 2x size + 2GB, minimum 2GB, max 16GB

  • Run time: 1h/5GB x size + 2h, minimum 2h, maximum 24h

Two tags are available:

  • --verbose or -v: Verbose mode, more detailed logging.

  • --production or --pro or -p: Production mode. This only works if submitted as coupppro@couppsbcgpvm01.fnal.gov. The working directory in production mode is at /pnfs/coupp/scratch/coupppro instead. The grid jobs are submitted with production role using the managed kerberos ticket. (This presumply allows for priority and better resources allocation compared to normal analysis role, but I haven’t really noticed a difference, since not a lot of people in the coupp group are running jobs. -ZS)

3. gridjob.sh

This script is the one that actually runs on the grid node. It configures conda environment, copies the data from PNFS to the internal storage of the node, untars it, and runs EventDealer.py. After EventDealer.py is finished, is copies the output folder from the node back to the PNFS.

4. cleanup.sh

This is another piece of script that needs to be run explicitly. Since all data and output files are in the scratch part of PNFS, they are not persistent. For each run in the grid_output folder, it checks the exit code of the log file. If the code is not zero, then the output folder is deleted. If it’s zero, meaning success, it will then compare the version of the output to the run existing in the destination (/exp/e961/data/SBC-25-recon/dev-output in most cases). If the version is the same or newer, then the output in the destination will be overwritten by the new output. If not, the new output will be discarded. Then, it removes the run from ~/.cache/sbc_job_list.csv. The raw data tarball is left in temp_data folder, to avoid repeated file operations.