Overview And Scope

This section defines the scope of the manual and the stage boundaries it uses.

What This Manual Covers

By the end of the exercise, a user should understand five separate things:

  1. the data tree,

  2. the config tree,

  3. the branch chain,

  4. the difference between detection and mutation,

  5. the meaning of backend grouping for PQC.

If any of these are missing, it is still possible to repeat commands mechanically, but not to debug or extend the workflow with confidence.

The Stages In Plain Terms

The pipeline is easiest to understand as four distinct operations.

Stage 1: ingest

Build a canonical pulsar directory tree from whatever raw layout the source data currently lives in.

Stage 2: first FixDataset pass

Normalize flags, insert missing jumps, prune stale jumps, optionally create variant products, and commit those edits to a new branch.

Stage 3: PQC detect run

Run tempo2 and QC detectors to produce review products. This stage should usually not edit the data files.

Stage 4: QC apply run

Read the QC products from Stage 3 and comment or delete selected TOAs. Start with comment-only.

Minimum Single-Pulsar Story

For one pulsar, the simplest robust story is:

  1. ingest raw files into a canonical tree,

  2. create a branch with system flags and jumps in place,

  3. run tempo2 and PQC on that branch,

  4. inspect QC outputs,

  5. apply comments to flagged TOAs on a new branch,

  6. rerun if needed.
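The branch chain in this story can be sketched as a small planner. This is a minimal sketch under stated assumptions: `Stage`, `plan_branch_chain`, the branch names, and the naming rule (append the stage name to the parent branch) are hypothetical. The detect run is modeled as read-only, matching step 3 of the story, where tempo2 and PQC run on the existing branch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    name: str
    creates_branch: bool  # True if the stage commits its edits to a new branch


def plan_branch_chain(start_branch, stages):
    """Return (stage, branch read, branch written or None) for each stage in order."""
    plan = []
    current = start_branch
    for stage in stages:
        if stage.creates_branch:
            # Hypothetical naming: append the stage name so lineage stays visible.
            new_branch = f"{current}-{stage.name}"
            plan.append((stage.name, current, new_branch))
            current = new_branch  # later stages build on the new branch
        else:
            plan.append((stage.name, current, None))  # read-only run on the current branch
    return plan


story = [Stage("fix", True), Stage("detect", False), Stage("apply", True)]
for step in plan_branch_chain("ingest", story):
    print(step)
# ('fix', 'ingest', 'ingest-fix')
# ('detect', 'ingest-fix', None)
# ('apply', 'ingest-fix', 'ingest-fix-apply')
```

A rerun would append further stages, for example `Stage("detect", False)` on the applied branch. Because each mutating stage forks instead of overwriting, every earlier branch stays available for diffing.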

This is slower than a one-line workflow, but it keeps the stage boundaries clear and auditable.

Stage Output Matrix

The four stages write different kinds of outputs. Keeping them separate is one of the main design strengths of the workflow.

  Stage                       Mutates dataset branch   Writes run outputs   Writes QC outputs   Main purpose
  Stage 1: ingest             yes                      limited              no                  build canonical pulsar tree from raw source layout
  Stage 2: first FixDataset   yes                      yes                  no                  normalize flags, maintain jumps, generate variants
  Stage 3: PQC detect         usually yes, but not     yes                  yes                 run tempo2 and PQC, generate review products
                              through QC action
  Stage 4: QC apply           yes                      yes                  no new detection    apply chosen QC action policy to a new branch
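The matrix can also be read as a dependency rule: the apply stage needs QC outputs, and only the detect stage writes them. The sketch below encodes the table as a dictionary and rejects a stage order that applies QC actions before any detection has run. `STAGE_MATRIX`, `check_stage_order`, and the stage keys are hypothetical names for illustration, not part of the tool.

```python
# Hypothetical encoding of the stage output matrix; keys are illustrative.
STAGE_MATRIX = {
    "ingest": {"mutates_branch": "yes", "run_outputs": "limited", "writes_qc": False},
    "fixdataset": {"mutates_branch": "yes", "run_outputs": "yes", "writes_qc": False},
    "detect": {"mutates_branch": "usually yes, not via QC action", "run_outputs": "yes", "writes_qc": True},
    "apply": {"mutates_branch": "yes", "run_outputs": "yes", "writes_qc": False},
}


def check_stage_order(order):
    """Raise ValueError if an apply stage has no earlier stage that writes QC outputs."""
    qc_available = False
    for stage in order:
        if stage not in STAGE_MATRIX:
            raise ValueError(f"unknown stage: {stage!r}")
        if stage == "apply" and not qc_available:
            raise ValueError("apply stage has no earlier QC outputs to read")
        if STAGE_MATRIX[stage]["writes_qc"]:
            qc_available = True


check_stage_order(["ingest", "fixdataset", "detect", "apply"])  # passes silently
try:
    check_stage_order(["ingest", "fixdataset", "apply"])
except ValueError as err:
    print(err)  # apply stage has no earlier QC outputs to read
```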

Glossary

backend

A canonical observing-system identity used in filenames, grouping, and often jump logic. During ingest this is defined explicitly by the mapping file.

system flag / -sys

A tim-file flag used to represent system identity after harmonization. It is often used both for jump maintenance and as a QC grouping column.

group

In some datasets, a grouping label broader than an individual backend. Some analyses use it as the QC grouping column instead of -sys.

branch

A git branch in the canonical dataset repository. Branches capture dataset mutation history across ingest, FixDataset passes, and apply stages.

run directory

A directory under results_dir containing outputs from one invocation: resolved config, summaries, plots, QC tables, and logs.

detect stage

A run that produces QC evidence and diagnostics without using QC outputs as mutation instructions.

apply stage

A later FixDataset run that reads previously generated QC outputs and turns selected QC columns into comments or deletions.

variant

An alternate include or reference product generated by FixDataset using the configured classification and variant catalogs. The standard outputs are <PSR>_<variant>_all.tim and, when jump-reference generation is enabled, <PSR>_<variant>.par.

reference_branch

The branch used as a comparison anchor and, in practice, often the branch that the run is organized around.

pqc_backend_col

The QC grouping column forwarded to PQC. This choice affects how many of PQC's thresholds and summaries should be interpreted.
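The -sys, group, and pqc_backend_col entries fit together as follows: flags are read from each TOA line, and the chosen grouping column decides how TOAs are partitioned. The sketch below assumes tempo2's FORMAT 1 line layout (file, frequency, MJD, uncertainty, site code, then flag/value pairs) and treats lines beginning with `C` or `#`, or with fewer than five fields, as non-TOA lines. `parse_toa_flags`, `group_sizes`, and all flag values are hypothetical, and the grouping shown is a toy illustration, not PQC's internals.

```python
from collections import Counter


def parse_toa_flags(line):
    """Return the flags of one FORMAT 1 TOA line as a dict, or None for non-TOA lines."""
    tokens = line.split()
    # Comments ("C ..." or "#...") and short command lines such as
    # "FORMAT 1" or "INCLUDE other.tim" do not carry TOAs.
    if not tokens or tokens[0] == "C" or tokens[0].startswith("#") or len(tokens) < 5:
        return None
    rest = tokens[5:]  # everything after file, freq, MJD, error, site
    # Pair "-flag value" tokens; "-sys" becomes the key "sys".
    return {key[1:]: value for key, value in zip(rest[0::2], rest[1::2]) if key.startswith("-")}


def group_sizes(tim_lines, backend_col):
    """Count TOAs per value of the chosen grouping column, e.g. "sys" or "group"."""
    counts = Counter()
    for line in tim_lines:
        flags = parse_toa_flags(line)
        if flags is not None:
            counts[flags.get(backend_col, "<missing>")] += 1
    return counts


tim = [
    "FORMAT 1",
    "obs_0001.ar 1400.000 55000.1 1.2 ao -sys SYS_A -group GRP_1",
    "obs_0002.ar 1400.000 55001.1 1.3 ao -sys SYS_B -group GRP_1",
    "obs_0003.ar 1400.000 55002.1 1.1 ao -sys SYS_B -group GRP_1",
    "C obs_0004.ar 430.000 55003.1 2.0 ao -sys SYS_C -group GRP_2",
    "obs_0005.ar 430.000 55004.1 2.1 ao -sys SYS_C -group GRP_2",
]
print(dict(group_sizes(tim, "sys")))    # {'SYS_A': 1, 'SYS_B': 2, 'SYS_C': 1}
print(dict(group_sizes(tim, "group")))  # {'GRP_1': 3, 'GRP_2': 1}
```

With sys as the grouping column, SYS_A and SYS_C each hold a single TOA, so any per-group statistic would rest on very little data; with group, the same TOAs form two larger groups. That difference in partitioning is one concrete way the pqc_backend_col choice shapes interpretation.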

Why PQC Exists

PQC is often misunderstood as “a tool that removes bad TOAs.” That is not the right mental model.

PQC is useful because single-pulsar timing data often contains behavior that should be reviewed explicitly:

  • isolated bad measurements,

  • backend-specific outliers,

  • step-like changes,

  • DM-like step changes,

  • solar-angle systematics,

  • orbital-phase structure,

  • transient-like events.

The job of PQC is to create evidence and diagnostics. The job of the operator is to decide what action, if any, should follow.

Why FixDataset Exists

FixDataset and PQC are easy to conflate if the stage boundary is not explained.

FixDataset is the mutation layer. It is used to:

  • normalize -sys, -group, and -pta flags,

  • insert jumps that should exist but do not,

  • prune jumps that no longer correspond to data,

  • enforce a baseline parfile policy,

  • generate consistent <PSR>_<variant>_all.tim products,

  • apply QC decisions back to tim files.
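The jump-maintenance items above reduce to set differences between the systems present in the tim data and the JUMP lines in the parfile. This sketch assumes tempo2-style `JUMP -sys <value> <offset> <fit>` lines and the common convention that one reference system carries no jump, since a jump on every system would be degenerate with the overall phase offset. `existing_jumps` and `find_jump_changes` are hypothetical helpers, and the actual FixDataset rules may differ.

```python
def existing_jumps(par_lines, flag="-sys"):
    """Collect the flag values that already have a JUMP line in the parfile."""
    keys = set()
    for line in par_lines:
        tokens = line.split()
        if len(tokens) >= 3 and tokens[0] == "JUMP" and tokens[1] == flag:
            keys.add(tokens[2])
    return keys


def find_jump_changes(par_lines, systems, reference_system, flag="-sys"):
    """Return (missing, stale) jump keys for the given set of systems."""
    present = existing_jumps(par_lines, flag)
    wanted = set(systems) - {reference_system}  # the reference system gets no jump
    missing = sorted(wanted - present)           # systems with data but no jump
    stale = sorted(present - set(systems))       # jumps whose system has no data
    return missing, stale


par = ["PSR J0000+0000", "JUMP -sys SYS_B 0.0 1", "JUMP -sys SYS_OLD 0.0 1"]
missing, stale = find_jump_changes(par, {"SYS_A", "SYS_B", "SYS_C"}, "SYS_A")
print(missing, stale)  # ['SYS_C'] ['SYS_OLD']
new_lines = [f"JUMP -sys {key} 0.0 1" for key in missing]  # offset 0.0, fit flag 1
```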

PQC flags things. FixDataset changes things.

Non-Destructive First Rule

For an initial workflow:

  • do not delete TOAs immediately,

  • use new branches for each mutation pass,

  • use fix_qc_action = "comment" first,

  • preserve the previous branch so outputs can be diffed across stages.

Together, these rules are the safest way to keep cause and effect traceable from one stage to the next.
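The comment-first rule can be illustrated with a small function that applies a QC action to flagged TOA lines. It is a sketch under assumptions: flagged TOAs are identified by 0-based line index (the real QC tables may key TOAs differently), and commenting uses the `C ` prefix, which tempo2 reads as a comment line. `apply_qc_action` is a hypothetical name, with `action` standing in for `fix_qc_action`; the `"delete"` value is assumed here from the Stage 4 description.

```python
def apply_qc_action(tim_lines, flagged, action="comment"):
    """Return a new list of tim lines with flagged TOA lines commented out or removed."""
    if action not in ("comment", "delete"):
        raise ValueError(f"unknown QC action: {action!r}")
    flagged = set(flagged)
    out = []
    for index, line in enumerate(tim_lines):
        if index not in flagged:
            out.append(line)
        elif action == "comment":
            out.append("C " + line)  # TOA text stays in the file, inactive and visible in a diff
        # action == "delete": the flagged line is dropped entirely
    return out


tim = [
    "FORMAT 1",
    "obs_0001.ar 1400.000 55000.1000000000000 1.250 ao -sys SYS_A",
    "obs_0002.ar 1400.000 55001.1000000000000 9.900 ao -sys SYS_A",
]
commented = apply_qc_action(tim, flagged=[2])
print(commented[2])  # C obs_0002.ar 1400.000 55001.1000000000000 9.900 ao -sys SYS_A
print(len(commented), len(apply_qc_action(tim, [2], action="delete")))  # 3 2
```

Commenting keeps the line count and TOA text intact, so reverting a decision is a one-line edit and a diff against the previous branch shows exactly which TOAs were affected.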

The End State

For one pulsar, a good final target is:

  • one ingest mapping JSON,

  • one ingest run profile,

  • one first-pass FixDataset profile,

  • one PQC detection profile,

  • one QC-apply profile,

  • optionally one workflow file that chains those steps.

That produces a reproducible, inspectable path from raw inputs to a reviewable cleaned branch.