Single-Pulsar Three-Pass Workflow
=================================

Once the separate stages are understood, the next step is to chain them into
one reproducible workflow file.

Workflow mode should not be the first introduction to the pipeline. It is most
useful after the individual stages are already understood.


The Three Passes
----------------

For one pulsar, a clean three-pass workflow is:

Pass 1
  Build the first coherent branch with system flags, jumps, and variants.

Pass 2
  Run tempo2 and PQC on top of Pass 1 without applying QC edits.

Pass 3
  Read the QC outputs from Pass 2 and comment flagged TOAs on a new branch.


The Three Run Profiles
----------------------

You should already have these files:

- ``configs/runs/fixdataset/single_pulsar_step1_fix.toml``
- ``configs/runs/pqc/single_pulsar_pqc_detect.toml``
- ``configs/runs/fixdataset/single_pulsar_pqc_apply.toml``

The third one is new and is shown below.

In practice, the workflow file works best when each referenced run profile can
also be executed directly on its own. That makes stage-level debugging much
simpler.


The QC-Apply Profile
--------------------

Example:
``configs/runs/fixdataset/single_pulsar_pqc_apply.toml``

Tracked repository example:
``configs/runs/fixdataset/single_pulsar_pqc_apply.example.toml``

.. code-block:: toml

   home_dir = "/data/canonical"
   dataset_name = "EPTA-DR3/epta-dr3-data"
   results_dir = "results"
   singularity_image = "/work/containers/psrpta.sif"

   branches = ["step3_apply_qc_comments"]
   reference_branch = "step3_apply_qc_comments"
   pulsars = ["J1909-3744"]
   jobs = 1
   outdir_name = "j1909_pqc_apply"

   run_tempo2 = false
   run_pqc = false
   qc_report = false
   make_plots = false
   make_reports = false
   make_covmat = false

   run_fix_dataset = true
   fix_apply = true
   fix_base_branch = "step2_pqc_balanced_detect"
   fix_branch_name = "step3_apply_qc_comments"
   fix_commit_message = "Step3: apply PQC comments for J1909-3744"

   fix_qc_results_dir = "results/j1909_pqc_detect/qc"
   fix_qc_branch = "step2_pqc_balanced_detect"

   fix_qc_remove_outliers = true
   fix_qc_action = "comment"
   fix_qc_outlier_cols = ["bad_point", "robust_outlier", "robust_global_outlier", "bad_mad"]
   fix_qc_remove_bad = true
   fix_qc_remove_transients = false
   fix_qc_remove_solar = false
   fix_qc_remove_orbital_phase = false

   fix_generate_alltim_variants = true
   fix_backend_classifications_path = "configs/catalogs/variants/backend_classifications_legacy_new.toml"
   fix_alltim_variants_path = "configs/catalogs/variants/alltim_variants_legacy_new.toml"
   fix_jump_reference_variants = true
   fix_jump_reference_jump_flag = "-sys"

This is the single-pulsar version of
``configs/workflows/steps/step3_apply_qc_comments_variants.toml``.


How To Explain The QC-Apply Keys
--------------------------------

``fix_qc_results_dir``
  Directory containing the QC outputs produced by the detect run.

``fix_qc_branch``
  Branch name that the QC outputs correspond to.

``fix_qc_remove_outliers``
  Enable QC-driven action.

``fix_qc_action = "comment"``
  Comment flagged TOAs rather than delete them. This is the recommended first
  policy.

``fix_qc_outlier_cols``
  Explicit QC columns that should count as actionable outlier evidence.

``fix_qc_remove_transients = false``
  Do not automatically comment transient or event families until their meaning
  has been reviewed explicitly in the QC outputs.


Why The Apply Stage Uses Explicit Outlier Columns
-------------------------------------------------

Action policy is distinct from detection policy.

For a first apply pass, a narrow explicit list like:

.. code-block:: toml

   fix_qc_outlier_cols = ["bad_point", "robust_outlier", "robust_global_outlier", "bad_mad"]

is better than a vague "anything suspicious" rule.

This keeps the first apply pass auditable.


How ``fix_qc_results_dir`` And ``fix_qc_branch`` Work Together
---------------------------------------------------------------

These two keys are easy to misunderstand.

``fix_qc_results_dir``
  Points to the run-directory location where the QC outputs were written.

``fix_qc_branch``
  Tells the apply stage which branch-specific QC subdirectory or context those
  outputs correspond to.

Together they define the hand-off from Step 2 to Step 3:

- Step 2 generates QC outputs under its run directory,
- Step 3 reads those outputs back in and applies the chosen action policy to a
  new branch.

If these paths are wrong, the apply stage can appear to ignore QC results even
though the real problem is that it is reading the wrong run location.


The Workflow File
-----------------

Once the three run profiles exist, make a workflow file under
``configs/workflows/``.

Example:
``configs/workflows/single_pulsar_3pass.toml``

Tracked repository example:
``configs/workflows/single_pulsar_3pass.example.toml``

.. code-block:: toml

   config = "configs/runs/fixdataset/single_pulsar_step1_fix.toml"
   mode = "serial"

   [[groups]]
   name = "step1_fix_flags_and_jumps"
   mode = "serial"

   [[groups.steps]]
   name = "pipeline"
   config = "configs/runs/fixdataset/single_pulsar_step1_fix.toml"

   [[groups]]
   name = "step2_detect"
   mode = "serial"

   [[groups.steps]]
   name = "pipeline"
   config = "configs/runs/pqc/single_pulsar_pqc_detect.toml"

   [[groups]]
   name = "step3_apply"
   mode = "serial"

   [[groups.steps]]
   name = "pipeline"
   config = "configs/runs/fixdataset/single_pulsar_pqc_apply.toml"

This is the stripped-down single-pulsar form of the repository's branch-chained
workflow pattern in ``configs/workflows/branch_chained_fix_pqc_variants.toml``.


How Run Directories And Branches Relate In The Workflow
-------------------------------------------------------

The workflow coordinates two parallel pieces of state:

- dataset branches,
- run directories under ``results_dir``.

These are related, but they are not the same thing.

Example sequence:

- Pass 1 writes branch ``step1_fix_flags_variants`` and run directory
  ``results/j1909_step1_fix``,
- Pass 2 writes branch ``step2_pqc_balanced_detect`` and run directory
  ``results/j1909_pqc_detect``,
- Pass 3 writes branch ``step3_apply_qc_comments`` and run directory
  ``results/j1909_pqc_apply``.

The branch names define the mutation history of the dataset. The run
directories define where logs, summaries, plots, and QC products are stored.


How To Run The Workflow
-----------------------

Run:

.. code-block:: bash

   pleb workflow --file configs/workflows/single_pulsar_3pass.toml

This is most useful after the stages have already been run manually at least
once.


When To Prefer Manual Runs Over The Workflow File
-------------------------------------------------

Use the workflow file when:

- the stage order is stable,
- the branch hand-off is already understood,
- the goal is repeatability.

Run stages manually when:

- a config is still being tuned,
- a branch name or output path is changing,
- the detect/apply hand-off is being debugged,
- one stage is failing and needs isolated inspection.


Why Branch Chaining Matters
---------------------------

Each pass starts from the previous branch and writes a new branch:

- ``raw_ingest`` -> ``step1_fix_flags_variants``
- ``step1_fix_flags_variants`` -> ``step2_pqc_balanced_detect``
- ``step2_pqc_balanced_detect`` -> ``step3_apply_qc_comments``

This branch pattern matters because it preserves the logic of each stage:

- Step 1 changes metadata and jump structure,
- Step 2 generates diagnostics,
- Step 3 applies selected QC actions.


Debugging Workflow Mode
-----------------------

If the full workflow fails, do not debug the whole workflow at once.

Instead:

1. identify which pass failed,
2. run that pass directly with ``pleb --config ...``,
3. inspect the resolved config in ``run_settings/``,
4. fix the stage-specific issue,
5. rerun the workflow.

This avoids treating workflow mode like a black box.


Final Rule
----------

A workflow file is a convenience layer, not a substitute for understanding the
individual run profiles.

If each of the three run profiles cannot be explained independently, the
workflow file is still too opaque for routine use.


Related Documentation
---------------------

- workflow mode overview: :doc:`../running_modes`
- branch-chained workflow examples: :doc:`../configuration`
- repository example workflow:
  ``configs/workflows/branch_chained_fix_pqc_variants.toml``