Running Pleb
============

This chapter explains the three supported ways to run Pleb:

1. Directly via the command-line interface (CLI).
2. Via a settings file (TOML/JSON config).
3. Via a workflow file (TOML/JSON) that orchestrates multiple steps.

It also lists additional entry points (Python API) and how the modes
interoperate.

Overview of modes
-----------------

Running the CLI without a workflow is the simplest path: you either pass the
core parameters as flags, or pass a settings file and optionally override
values inline. Workflows build on settings by adding a sequence of steps and
optional loops.

Supported entry points today:

- ``pleb`` (main CLI, runs the pipeline by default)
- ``pleb qc-report`` (generate QC plots from an existing run directory)
- ``pleb ingest`` (mapping-driven ingest mode)
- ``pleb workflow`` (multi-step sequences and loops)
- Python API (``pleb.pipeline.run_pipeline``, ``pleb.param_scan.run_param_scan``,
  ``pleb.qc_report.generate_qc_report``, ``pleb.ingest.ingest_dataset``)

If you want a new entry point (for example, a minimal QC-only CLI), add a
workflow step or wrap the Python API in your own script.


Mode 1: Direct CLI (no settings file)
-------------------------------------

You can run the CLI without a settings file by supplying the core parameters
via flags. This is best for quick, one-off runs.

Minimal run (CLI-only):

.. code-block:: bash

   pleb \
     --results-dir /data/pulsar_results \
     --outdir-name run_cli_only \
     --set home_dir="/data/pulsars" \
     --set dataset_name="DR3full" \
     --set pulsars='["J1713+0747","J1909-3744"]' \
     --set branches='["main"]'

Enable QC and change options:

.. code-block:: bash

   pleb \
     --results-dir /data/pulsar_results \
     --outdir-name run_cli_qc \
     --set home_dir="/data/pulsars" \
     --set dataset_name="DR3full" \
     --set pulsars='["J1713+0747"]' \
     --set branches='["main"]' \
     --qc \
     --qc-structure-mode both \
     --qc-outlier-gate

Parameter scan (CLI-only):

.. code-block:: bash

   pleb \
     --results-dir /data/pulsar_results \
     --outdir-name run_cli_scan \
     --set home_dir="/data/pulsars" \
     --set dataset_name="DR3full" \
     --set pulsars='["J1713+0747"]' \
     --set branches='["main"]' \
     --param-scan --scan-typical

Generate a QC report (CLI-only):

.. code-block:: bash

   pleb qc-report --run-dir results/run_2024-01-01

Run mapping-driven ingest (CLI-only):

.. code-block:: bash

   pleb ingest --mapping configs/catalogs/system_flags/system_flag_mapping.example.json \
     --output-dir /data/pulsars

Compatibility notes (CLI-only):

- ``--param-scan`` runs only the parameter scan; it does not run the full
  pipeline in the same invocation.
- ``pleb qc-report`` does not run QC; it only reads existing QC CSVs and
  generates plots.
- ``pleb ingest`` is a standalone mode and does not run the pipeline.
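
When CLI-only runs are launched from a script, the ``--set`` values have to be
quoted correctly. The sketch below builds the argument list for the minimal run
shown above. The flag names are taken from those examples; the helper name
``build_pleb_argv`` and the choice to JSON-encode non-string values are
assumptions made for illustration, not part of Pleb.

.. code-block:: python

   import json
   import shlex


   def build_pleb_argv(results_dir, outdir_name, settings):
       """Build an argv list for a CLI-only ``pleb`` run (hypothetical helper)."""
       argv = ["pleb", "--results-dir", results_dir, "--outdir-name", outdir_name]
       for key, value in settings.items():
           if isinstance(value, str):
               # Plain strings are passed bare, as the shell delivers them
               # after stripping the quotes in the examples above.
               text = value
           else:
               # Assumption: lists and other values are written as compact JSON.
               text = json.dumps(value, separators=(",", ":"))
           argv += ["--set", f"{key}={text}"]
       return argv


   argv = build_pleb_argv(
       "/data/pulsar_results",
       "run_cli_only",
       {
           "home_dir": "/data/pulsars",
           "dataset_name": "DR3full",
           "pulsars": ["J1713+0747", "J1909-3744"],
           "branches": ["main"],
       },
   )
   # Print a copy-pasteable, shell-quoted version of the command.
   print(shlex.join(argv))

Passing the list to ``subprocess.run(argv, check=True)`` instead of printing it
avoids shell quoting altogether.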


Mode 2: CLI with a settings file
--------------------------------

Settings files (TOML or JSON) are the primary way to configure runs. They map
directly to :class:`pleb.config.PipelineConfig` and can be overridden with
``--set`` on the CLI or by workflow steps.

Typical settings file (minimal):

.. code-block:: toml

   # configs/runs/pipeline/minimal.toml
   dataset_name = "DR3full"
   home_dir = "/data/pulsars"
   results_dir = "/data/pulsar_results"
   outdir_name = "run_minimal"

   pulsars = ["J1713+0747", "J1909-3744"]
   branches = ["main"]

   run_fix_dataset = true
   run_tempo2 = true
   run_pqc = false

Settings with QC and reporting:

.. code-block:: toml

   # configs/runs/pipeline/qc.toml
   dataset_name = "DR3full"
   home_dir = "/data/pulsars"
   results_dir = "/data/pulsar_results"
   outdir_name = "run_qc"

   pulsars = ["J1713+0747"]
   branches = ["main"]

   run_fix_dataset = true
   run_tempo2 = true
   run_pqc = true
   pqc_backend_col = "group"
   pqc_outlier_gate_enabled = true
   pqc_outlier_gate_sigma = 6.0

Use with the CLI:

.. code-block:: bash

   pleb --config configs/runs/pipeline/qc.toml

Override in place:

.. code-block:: bash

   pleb --config configs/runs/pipeline/qc.toml \
     --set outdir_name="run_qc_debug" \
     --set pqc_outlier_gate_sigma=4.5

You can also take any CLI-only example from Mode 1 and move those values into
the settings file, then keep only the high-level CLI switches (for example
``--qc`` or ``--param-scan``).
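
To make the override semantics concrete, here is a minimal sketch of how
``KEY=VALUE`` overrides could be layered on top of a loaded settings file. It
is not Pleb's actual parser: the function names are hypothetical, and parsing
each value as JSON with a fallback to a plain string is an assumption.

.. code-block:: python

   import json


   def parse_override(item):
       """Split ``KEY=VALUE`` and parse VALUE (assumed: JSON, else plain string)."""
       key, sep, raw = item.partition("=")
       if not sep or not key:
           raise ValueError(f"expected KEY=VALUE, got {item!r}")
       try:
           value = json.loads(raw)
       except json.JSONDecodeError:
           # Not valid JSON (for example a bare path), so keep the text as-is.
           value = raw
       return key.strip(), value


   def apply_overrides(settings, items):
       """Return a copy of ``settings`` with each override applied in order."""
       merged = dict(settings)
       for item in items:
           key, value = parse_override(item)
           merged[key] = value
       return merged


   base = {"outdir_name": "run_qc", "pqc_outlier_gate_sigma": 6.0}
   merged = apply_overrides(
       base,
       ["outdir_name=run_qc_debug", "pqc_outlier_gate_sigma=4.5"],
   )

Because overrides are applied in order, a later ``--set`` for the same key wins
in this sketch, and the original settings dictionary is left untouched.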


Mode 3: Workflow files
----------------------

Workflows coordinate multiple steps and optional loops. This enables
iterative schemes (for example: run pipeline, apply fixes, re-run QC,
repeat until stable).

Workflow file structure:

.. code-block:: toml

   # configs/workflows/example_iterative.toml
   config = "configs/runs/pipeline/test_all_steps.toml"

   [[loops]]
   name = "get_jumps"
   max_iters = 5
   steps = ["pipeline", "qc_report", "fix_apply"]
   stop_if = [{ no_changes = true }, { qc_ok = true }]

   [[loops]]
   name = "check_params"
   max_iters = 1
   steps = ["param_scan", "qc_report", "fix_apply"]
   stop_if = [{ no_changes = true }, { qc_ok = true }]

Run it:

.. code-block:: bash

   pleb workflow --file configs/workflows/example_iterative.toml

The same structure in JSON (here with a single loop plus a top-level ``steps`` list):

.. code-block:: json

   {
     "config": "configs/runs/pipeline/test_all_steps.toml",
     "loops": [
       {
         "name": "get_jumps",
         "max_iters": 3,
         "steps": ["pipeline", "qc_report", "fix_apply"],
         "stop_if": [{"no_changes": true}, {"qc_ok": true}]
       }
     ],
     "steps": ["qc_report"]
   }
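
Since a workflow can be written in either format, a script that inspects
workflow files needs to pick a parser by extension. The following is a small
sketch of that idea using only the standard library (``tomllib`` requires
Python 3.11 or newer). The function name ``load_workflow`` is hypothetical and
this is not how Pleb itself is guaranteed to load files.

.. code-block:: python

   import json
   import tomllib  # standard library from Python 3.11
   from pathlib import Path


   def load_workflow(path):
       """Load a workflow file into a plain dict, choosing the parser by suffix."""
       path = Path(path)
       suffix = path.suffix.lower()
       if suffix == ".toml":
           # tomllib only accepts binary file handles.
           with path.open("rb") as fh:
               return tomllib.load(fh)
       if suffix == ".json":
           with path.open("r", encoding="utf-8") as fh:
               return json.load(fh)
       raise ValueError(f"unsupported workflow format: {path.name}")

Both parsers return nested dictionaries and lists, so a TOML workflow and its
JSON counterpart compare equal once loaded.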

Top-level steps (no loop):

.. code-block:: toml

   config = "configs/runs/pipeline/test_all_steps.toml"

   steps = ["pipeline", "qc_report"]

Per-step overrides:

.. code-block:: toml

   config = "configs/runs/pipeline/test_all_steps.toml"

   [[steps]]
   name = "pipeline"
   set = ["outdir_name=\"run_a\""]

   [[steps]]
   name = "qc_report"
   overrides = { qc_report_backend = "EFF" }
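
Steps therefore appear in two shapes: a bare name, or a table with ``name``
plus optional ``set`` and ``overrides``. Code that processes workflows is
simpler if both are brought into one form first. The sketch below shows one
way to do that; ``normalize_step`` is a hypothetical helper, and the empty
defaults are an assumption.

.. code-block:: python

   def normalize_step(entry):
       """Return a step as a dict with ``name``, ``set`` and ``overrides`` keys."""
       if isinstance(entry, str):
           # Bare step name, as in steps = ["pipeline", "qc_report"].
           return {"name": entry, "set": [], "overrides": {}}
       if isinstance(entry, dict) and "name" in entry:
           # Table form; copy it so the caller's data is not modified and
           # any extra keys (for example run_dir) are preserved.
           step = dict(entry)
           step.setdefault("set", [])
           step.setdefault("overrides", {})
           return step
       raise TypeError(f"unsupported step entry: {entry!r}")


   steps = [
       normalize_step("pipeline"),
       normalize_step({"name": "qc_report", "overrides": {"qc_report_backend": "EFF"}}),
   ]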

Supported workflow step names:

- ``ingest`` (mapping-driven ingest)
- ``pipeline`` (full pipeline run)
- ``fix_apply`` (pipeline run with fix-apply enabled)
- ``param_scan`` (parameter scan)
- ``qc_report`` (generate QC plots from latest pipeline output or ``run_dir``)

Stop conditions:

- ``no_changes``: stop when FixDataset reports zero changes. This uses the
  latest ``fix_dataset_summary.tsv`` from the run.
- ``qc_ok``: stop when QC summary has zero flagged counts. This uses the
  latest ``qc_summary.tsv``.
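
The interaction between ``max_iters`` and ``stop_if`` can be sketched as a
bounded loop. This is an illustration under stated assumptions, not Pleb's
scheduler: it assumes conditions are checked once after each full iteration
and that any single satisfied condition ends the loop. The names ``run_loop``
and ``fake_step`` are hypothetical, and the change counts are simulated.

.. code-block:: python

   def run_loop(steps, run_step, conditions, max_iters):
       """Run ``steps`` up to ``max_iters`` times; return iterations performed."""
       for iteration in range(1, max_iters + 1):
           for name in steps:
               run_step(name)
           # Assumption: stop as soon as any one condition reports True.
           if any(check() for check in conditions.values()):
               return iteration
       return max_iters


   # Simulated FixDataset change counts, one per iteration.
   pending_changes = [3, 1, 0, 0, 0]
   state = {"changes": None}
   log = []


   def fake_step(name):
       """Stand-in for a real step: records the name and fakes a change count."""
       log.append(name)
       if name == "fix_apply":
           state["changes"] = pending_changes.pop(0)


   conditions = {"no_changes": lambda: state["changes"] == 0}
   iterations = run_loop(
       ["pipeline", "qc_report", "fix_apply"], fake_step, conditions, max_iters=5
   )

With the simulated counts above, the loop ends on the first iteration that
reports zero changes rather than running all five.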

Stop conditions can be declared as a list of strings:

.. code-block:: toml

   stop_if = ["no_changes", "qc_ok"]

Or as a list of dictionaries:

.. code-block:: toml

   stop_if = [{ no_changes = true }, { qc_ok = true }]
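
Because both spellings are accepted, tooling that reads workflow files may
want to reduce them to one form. A minimal sketch follows; the helper name
``normalize_stop_if`` is hypothetical, and treating a ``false`` value as
"condition disabled" is an assumption that the text above does not confirm.

.. code-block:: python

   KNOWN_CONDITIONS = {"no_changes", "qc_ok"}


   def normalize_stop_if(stop_if):
       """Reduce either ``stop_if`` spelling to an ordered list of condition names."""
       names = []
       for item in stop_if:
           if isinstance(item, str):
               pairs = [(item, True)]
           elif isinstance(item, dict):
               pairs = list(item.items())
           else:
               raise TypeError(f"unsupported stop_if entry: {item!r}")
           for name, enabled in pairs:
               if name not in KNOWN_CONDITIONS:
                   raise ValueError(f"unknown stop condition: {name!r}")
               # Assumption: a false value switches the condition off.
               if enabled and name not in names:
                   names.append(name)
       return names


   from_strings = normalize_stop_if(["no_changes", "qc_ok"])
   from_tables = normalize_stop_if([{"no_changes": True}, {"qc_ok": True}])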

Compatibility notes (workflows):

- ``qc_report`` requires a prior ``pipeline`` step or an explicit ``run_dir``
  in the step definition.
- ``param_scan`` is independent; it does not run the full pipeline.
- ``fix_apply`` runs the pipeline with fix-apply enabled and can be placed
  anywhere in a loop.
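
The first note can be checked before a workflow is run. The sketch below walks
a flat, ordered list of steps and reports any ``qc_report`` that has neither a
preceding ``pipeline`` step nor its own ``run_dir``. The function name is
hypothetical. Whether ``fix_apply`` also counts as a prior pipeline run is not
stated above, so the sketch is conservative and only accepts ``pipeline``.

.. code-block:: python

   def check_qc_report_order(steps):
       """Return a list of problems for ``qc_report`` steps lacking an input run."""
       problems = []
       seen_pipeline = False
       for index, step in enumerate(steps):
           if isinstance(step, str):
               step = {"name": step}
           name = step["name"]
           if name == "pipeline":
               seen_pipeline = True
           elif name == "qc_report" and not seen_pipeline and "run_dir" not in step:
               problems.append(
                   f"step {index}: qc_report has no prior pipeline step or run_dir"
               )
       return problems


   ok = check_qc_report_order(["pipeline", "qc_report"])
   bad = check_qc_report_order(["param_scan", "qc_report"])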


Other ways to run Pleb (Python API)
-----------------------------------

For scripting or integration into larger systems, use the Python API:

.. code-block:: python

   from pleb.config import PipelineConfig
   from pleb.pipeline import run_pipeline
   from pleb.param_scan import run_param_scan
   from pleb.qc_report import generate_qc_report
   from pleb.ingest import ingest_dataset

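   # Load settings and run the full pipeline.
   cfg = PipelineConfig.load("configs/runs/pipeline/test_all_steps.toml")
   out = run_pipeline(cfg)

   # Run a parameter scan with the same config, then build the QC report
   # from the pipeline's output.
   run_param_scan(cfg, scan_typical=True)
   generate_qc_report(run_dir=out["tag"])

   # Mapping-driven ingest; these paths mirror the CLI example in Mode 1.
   mapping_file = "configs/catalogs/system_flags/system_flag_mapping.example.json"
   output_root = "/data/pulsars"
   ingest_dataset(mapping_file, output_root)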

This is the main additional way to run Pleb beyond the CLI, settings files,
and workflows.