Running Pleb
This chapter explains the three supported ways to run Pleb:
Directly via the command-line interface (CLI).
Via a settings file (TOML/JSON config).
Via a workflow file (TOML/JSON) that orchestrates multiple steps.
It also lists additional entry points (Python API) and how the modes interoperate.
Overview of modes
Running the CLI without a workflow is the simplest path: you pass flags or a settings file and optionally override values inline. Workflows build on settings by adding a sequence of steps and optional loops.
Supported entry points today:
pleb (main CLI, runs the pipeline by default)
pleb qc-report (generate QC plots from an existing run directory)
pleb ingest (mapping-driven ingest mode)
pleb workflow (multi-step sequences and loops)
Python API (pleb.pipeline.run_pipeline, pleb.param_scan.run_param_scan, pleb.qc_report.generate_qc_report, pleb.ingest.ingest_dataset)
If you want a new entry point (for example, a minimal QC-only CLI), add a workflow step or wrap the Python API in your own script.
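As a rough illustration of wrapping the Python API in your own script, a minimal QC-only command might look like the sketch below. The program name `pleb-qc-only`, the `build_parser` and `main` helpers, and the wrapper's own `--run-dir` flag are assumptions made for this example; only `pleb.qc_report.generate_qc_report` and its `run_dir` keyword are taken from this chapter.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Argument parser for a hypothetical QC-only wrapper script."""
    parser = argparse.ArgumentParser(
        prog="pleb-qc-only",  # hypothetical name, not a command shipped with Pleb
        description="Generate QC plots from an existing run directory.",
    )
    parser.add_argument("--run-dir", required=True, help="existing run directory")
    return parser


def main(argv=None) -> int:
    args = build_parser().parse_args(argv)
    # Imported lazily so that --help works even where pleb is not installed.
    from pleb.qc_report import generate_qc_report

    generate_qc_report(run_dir=args.run_dir)
    return 0
```

To use it as a script, call `main()` from an `if __name__ == "__main__":` guard and pass its return value to `SystemExit`.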
Mode 1: Direct CLI (no settings file)
You can run the CLI without a settings file by supplying the core parameters via flags. This is best for quick, one-off runs.
Minimal run (CLI-only):
pleb \
--results-dir /data/pulsar_results \
--outdir-name run_cli_only \
--set home_dir="/data/pulsars" \
--set dataset_name="DR3full" \
--set pulsars='["J1713+0747","J1909-3744"]' \
--set branches='["main"]'
Enable QC and change options:
pleb \
--results-dir /data/pulsar_results \
--outdir-name run_cli_qc \
--set home_dir="/data/pulsars" \
--set dataset_name="DR3full" \
--set pulsars='["J1713+0747"]' \
--set branches='["main"]' \
--qc \
--qc-structure-mode both \
--qc-outlier-gate
Parameter scan (CLI-only):
pleb \
--results-dir /data/pulsar_results \
--outdir-name run_cli_scan \
--set home_dir="/data/pulsars" \
--set dataset_name="DR3full" \
--set pulsars='["J1713+0747"]' \
--set branches='["main"]' \
--param-scan --scan-typical
Generate a QC report (CLI-only):
pleb qc-report --run-dir results/run_2024-01-01
Run mapping-driven ingest (CLI-only):
pleb ingest --mapping configs/catalogs/system_flags/system_flag_mapping.example.json \
--output-dir /data/pulsars
Compatibility notes (CLI-only):
--param-scan runs only the parameter scan; it does not run the full pipeline in the same invocation.
pleb qc-report does not run QC; it only reads existing QC CSVs and generates plots.
pleb ingest is a standalone mode and does not run the pipeline.
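If you launch CLI-only runs from a script, the flag list can be assembled programmatically instead of typed by hand. The sketch below is one way to do that; the `build_pleb_command` helper is hypothetical, and it assumes that `--set` accepts bare text for strings and JSON for everything else, which mirrors what the shell delivers to `pleb` in the examples above.

```python
import json
import shlex


def build_pleb_command(results_dir, outdir_name, overrides, switches=()):
    """Return the argv list for a CLI-only pleb run.

    Assumption: strings go to --set as bare text and every other value as
    compact JSON, matching what the shell passes in the examples above.
    """
    argv = ["pleb", "--results-dir", results_dir, "--outdir-name", outdir_name]
    for key, value in overrides.items():
        if isinstance(value, str):
            text = value
        else:
            text = json.dumps(value, separators=(",", ":"))
        argv += ["--set", f"{key}={text}"]
    argv += list(switches)
    return argv


cmd = build_pleb_command(
    "/data/pulsar_results",
    "run_cli_qc",
    {
        "home_dir": "/data/pulsars",
        "dataset_name": "DR3full",
        "pulsars": ["J1713+0747"],
        "branches": ["main"],
    },
    switches=["--qc", "--qc-structure-mode", "both", "--qc-outlier-gate"],
)
# Print a shell-quoted form of the command for logging or copy-paste.
print(shlex.join(cmd))
```

The resulting list can be handed to `subprocess.run(cmd, check=True)` without going through a shell, which avoids the quoting needed on the command line.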
Mode 2: CLI with a settings file
Settings files (TOML or JSON) are the primary way to configure runs. They map
directly to pleb.config.PipelineConfig and can be overridden with
--set on the CLI or by workflow steps.
Typical settings file (minimal):
# configs/runs/pipeline/minimal.toml
dataset_name = "DR3full"
home_dir = "/data/pulsars"
results_dir = "/data/pulsar_results"
outdir_name = "run_minimal"
pulsars = ["J1713+0747", "J1909-3744"]
branches = ["main"]
run_fix_dataset = true
run_tempo2 = true
run_pqc = false
Settings with QC and reporting:
# configs/runs/pipeline/qc.toml
dataset_name = "DR3full"
home_dir = "/data/pulsars"
results_dir = "/data/pulsar_results"
outdir_name = "run_qc"
pulsars = ["J1713+0747"]
branches = ["main"]
run_fix_dataset = true
run_tempo2 = true
run_pqc = true
pqc_backend_col = "group"
pqc_outlier_gate_enabled = true
pqc_outlier_gate_sigma = 6.0
Use with the CLI:
pleb --config configs/runs/pipeline/qc.toml
Override in place:
pleb --config configs/runs/pipeline/qc.toml \
--set outdir_name="run_qc_debug" \
--set pqc_outlier_gate_sigma=4.5
You can also take any CLI-only example from Mode 1 and move those values into
the settings file, then keep only the high-level CLI switches (for example
--qc or --param-scan).
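The layering of `--set` overrides on top of a settings file can be pictured with the short sketch below. It is not Pleb's implementation: `parse_override` and `apply_overrides` are hypothetical helpers, and the rule that a value is decoded as JSON when possible and kept as a plain string otherwise is an assumption about how overrides are parsed.

```python
import json


def parse_override(item):
    """Split 'key=value' and decode the value.

    Assumption: values that parse as JSON (numbers, booleans, lists, quoted
    strings) are decoded; anything else is kept as a plain string.
    """
    key, sep, raw = item.partition("=")
    if not sep:
        raise ValueError(f"expected key=value, got {item!r}")
    try:
        value = json.loads(raw)
    except json.JSONDecodeError:
        value = raw
    return key.strip(), value


def apply_overrides(settings, overrides):
    """Return a copy of settings with each override applied in order."""
    merged = dict(settings)
    for item in overrides:
        key, value = parse_override(item)
        merged[key] = value
    return merged


# A few values from configs/runs/pipeline/qc.toml, written as a dict for brevity.
settings = {
    "outdir_name": "run_qc",
    "run_pqc": True,
    "pqc_outlier_gate_sigma": 6.0,
}
# What the shell hands over for the two --set flags in the override example.
merged = apply_overrides(
    settings, ["outdir_name=run_qc_debug", "pqc_outlier_gate_sigma=4.5"]
)
```

Keys that are not overridden keep the value from the settings file, and the original `settings` dict is left untouched.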
Mode 3: Workflow files
Workflows coordinate multiple steps and optional loops. This enables iterative schemes (for example: run pipeline, apply fixes, re-run QC, repeat until stable).
Workflow file structure:
# configs/workflows/example_iterative.toml
config = "configs/runs/pipeline/test_all_steps.toml"
[[loops]]
name = "get_jumps"
max_iters = 5
steps = ["pipeline", "qc_report", "fix_apply"]
stop_if = [{ no_changes = true }, { qc_ok = true }]
[[loops]]
name = "check_params"
max_iters = 1
steps = ["param_scan", "qc_report", "fix_apply"]
stop_if = [{ no_changes = true }, { qc_ok = true }]
Run it:
pleb workflow --file configs/workflows/example_iterative.toml
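To make the loop semantics concrete, the sketch below simulates a single loop with toy data. It is an illustration under stated assumptions, not Pleb's scheduler: `run_loop` is a hypothetical helper, stop conditions are assumed to be checked once after each full pass over `steps`, and whether Pleb combines several conditions with "any" or "all" is not stated in this chapter, so the choice is left as a parameter.

```python
def run_loop(steps, max_iters, stop_conditions, run_step, combine=any):
    """Run `steps` repeatedly, at most `max_iters` times.

    stop_conditions: callables returning True when their condition holds.
    combine: `any` or `all`; which one Pleb uses is an assumption here.
    Returns the number of iterations that actually ran.
    """
    for iteration in range(1, max_iters + 1):
        for step in steps:
            run_step(step, iteration)
        if stop_conditions and combine(check() for check in stop_conditions):
            return iteration
    return max_iters


# Toy state standing in for the summary files: fewer fixes on every pass.
pending_fixes = [3, 1, 0, 0, 0]
state = {"changes": None}
log = []


def run_step(step, iteration):
    """Record the step and, for fix_apply, note how many changes it made."""
    log.append((iteration, step))
    if step == "fix_apply":
        state["changes"] = pending_fixes[iteration - 1]


def no_changes():
    return state["changes"] == 0


iterations = run_loop(
    ["pipeline", "qc_report", "fix_apply"], 5, [no_changes], run_step
)
```

With this toy data the loop ends before `max_iters` is reached, because the third pass applies no fixes.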
Workflows can also be written in JSON with the same keys. This example defines a single loop and a top-level qc_report step:
{
"config": "configs/runs/pipeline/test_all_steps.toml",
"loops": [
{
"name": "get_jumps",
"max_iters": 3,
"steps": ["pipeline", "qc_report", "fix_apply"],
"stop_if": [{"no_changes": true}, {"qc_ok": true}]
}
],
"steps": ["qc_report"]
}
Top-level steps (no loop):
config = "configs/runs/pipeline/test_all_steps.toml"
steps = ["pipeline", "qc_report"]
Per-step overrides:
config = "configs/runs/pipeline/test_all_steps.toml"
[[steps]]
name = "pipeline"
set = ["outdir_name=\"run_a\""]
[[steps]]
name = "qc_report"
overrides = { qc_report_backend = "EFF" }
Supported workflow step names:
ingest (mapping-driven ingest)
pipeline (full pipeline run)
fix_apply (pipeline run with fix-apply enabled)
param_scan (parameter scan)
qc_report (generate QC plots from latest pipeline output or run_dir)
Stop conditions:
no_changes: stop when FixDataset reports zero changes. This uses the latest fix_dataset_summary.tsv from the run.
qc_ok: stop when the QC summary has zero flagged counts. This uses the latest qc_summary.tsv.
Stop conditions can be declared as a list of strings:
stop_if = ["no_changes", "qc_ok"]
Or as a list of dictionaries:
stop_if = [{ no_changes = true }, { qc_ok = true }]
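Since both spellings describe the same conditions, a script that reads workflow files may want to reduce them to one form. The sketch below shows one way to do so; `normalize_stop_if` is a hypothetical helper, and treating a condition set to `false` as disabled is an assumption.

```python
def normalize_stop_if(stop_if):
    """Return the enabled stop-condition names in declaration order."""
    names = []
    for entry in stop_if:
        if isinstance(entry, str):
            names.append(entry)
        elif isinstance(entry, dict):
            # Assumption: a condition set to false is treated as disabled.
            names.extend(name for name, enabled in entry.items() if enabled)
        else:
            raise TypeError(f"unsupported stop_if entry: {entry!r}")
    return names


as_strings = normalize_stop_if(["no_changes", "qc_ok"])
as_dicts = normalize_stop_if([{"no_changes": True}, {"qc_ok": True}])
```

Both calls yield the same list of names, so later code only has to handle one representation.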
Compatibility notes (workflows):
qc_report requires a prior pipeline or explicit run_dir in the step definition.
param_scan is independent; it does not run the full pipeline.
fix_apply runs the pipeline with fix-apply enabled and can be placed anywhere in a loop.
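The first note can be checked before a workflow is launched. The sketch below walks a step list in execution order and reports any `qc_report` step that has nothing to read. `find_unsatisfied_qc_reports` is a hypothetical helper; it assumes that `fix_apply` counts as a pipeline run for this purpose and that an explicit run directory appears as a `run_dir` key directly on the step.

```python
# Assumption: both of these steps leave a run directory behind.
RUN_PRODUCING_STEPS = {"pipeline", "fix_apply"}


def find_unsatisfied_qc_reports(steps):
    """Return indexes of qc_report steps with no earlier run and no run_dir.

    Each step is either a name or a dict with a "name" key, as in the
    per-step override examples.
    """
    problems = []
    have_run = False
    for index, step in enumerate(steps):
        spec = {"name": step} if isinstance(step, str) else step
        name = spec["name"]
        if name == "qc_report" and not have_run and "run_dir" not in spec:
            problems.append(index)
        if name in RUN_PRODUCING_STEPS:
            have_run = True
    return problems


ok = find_unsatisfied_qc_reports(["pipeline", "qc_report"])
bad = find_unsatisfied_qc_reports(["qc_report", "pipeline"])
```

For a workflow with several loops, pass the steps flattened in the order they will run, so that a pipeline run in an earlier loop satisfies a `qc_report` in a later one.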
Other ways to run Pleb (Python API)
For scripting or integration into larger systems, use the Python API:
from pleb.config import PipelineConfig
from pleb.pipeline import run_pipeline
from pleb.param_scan import run_param_scan
from pleb.qc_report import generate_qc_report
from pleb.ingest import ingest_dataset
cfg = PipelineConfig.load("configs/runs/pipeline/test_all_steps.toml")
out = run_pipeline(cfg)
run_param_scan(cfg, scan_typical=True)
generate_qc_report(run_dir=out["tag"])
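# Same paths as the CLI ingest example in Mode 1.
mapping_file = "configs/catalogs/system_flags/system_flag_mapping.example.json"
output_root = "/data/pulsars"
ingest_dataset(mapping_file, output_root)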
This is the main additional way to run Pleb beyond the CLI, settings files, and workflows.