Overview And Scope
This section defines the scope of the manual and the stage boundaries it uses.
What This Manual Covers
By the end of the exercise, a user should understand five separate things:
the data tree,
the config tree,
the branch chain,
the difference between detection and mutation,
the meaning of backend grouping for PQC.
If any of these is missing, the commands can still be repeated mechanically, but the workflow cannot be debugged or extended with confidence.
The Stages In Plain Terms
The pipeline is easiest to understand as four distinct operations.
- Stage 1: ingest
Build a canonical pulsar directory tree from whatever raw layout the source data currently lives in.
- Stage 2: first FixDataset pass
Normalize flags, insert missing jumps, prune stale jumps, optionally create variant products, and commit those edits to a new branch.
- Stage 3: PQC detect run
Run tempo2 and QC detectors to produce review products. This stage should usually not edit the data files.
- Stage 4: QC apply run
Read the QC products from Stage 3 and comment or delete selected TOAs. Start with comment-only.
Minimum Single-Pulsar Story
For one pulsar, the simplest robust story is:
ingest raw files into a canonical tree,
create a branch with system flags and jumps in place,
run tempo2 and PQC on that branch,
inspect QC outputs,
apply comments to flagged TOAs on a new branch,
rerun if needed.
This is slower than a one-line workflow, but it keeps the stage boundaries clear and auditable.
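The branch chain implied by this story can be sketched as a small data structure. This is an illustrative sketch only: the `BranchStep` class, the `validate_chain` helper, and every branch name are hypothetical and are not part of the workflow's own tooling. It also assumes, for simplicity, that the detect run works on the FixDataset branch without creating a new one.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class BranchStep:
    stage: str              # stage that ran
    parent: Optional[str]   # branch the stage started from (None for ingest)
    branch: str             # branch the stage worked on or created


# Hypothetical branch names, chosen only for illustration.
CHAIN = [
    BranchStep("ingest", None, "raw_ingest"),
    BranchStep("first FixDataset pass", "raw_ingest", "fix_pass1"),
    BranchStep("PQC detect run", "fix_pass1", "fix_pass1"),
    BranchStep("QC apply run", "fix_pass1", "qc_comment"),
]


def validate_chain(chain):
    """Return True if every step starts from a branch an earlier step produced."""
    seen = set()
    for step in chain:
        if step.parent is not None and step.parent not in seen:
            return False
        seen.add(step.branch)
    return True
```

Writing the chain down this way makes it easy to spot a stage that points at a branch no earlier stage produced, which is a common source of confusion when rerunning a single step.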
Stage Output Matrix
The four stages write different kinds of outputs. Keeping them separate is one of the main design strengths of the workflow.
| Stage | Mutates dataset branch | Writes run outputs | Writes QC outputs | Main purpose |
|---|---|---|---|---|
| ingest | yes | limited | no | build canonical pulsar tree from raw source layout |
| Step 1 FixDataset | yes | yes | no | normalize flags, maintain jumps, generate variants |
| Step 2 detect | usually yes, but not through QC action | yes | yes | run tempo2 and PQC, generate review products |
| Step 3 apply | yes | yes | no new detection | apply chosen QC action policy to a new branch |
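The stage output matrix can also be held as plain data, which is convenient when checking which stage is expected to produce a given kind of output. The sketch below only restates the matrix; the dictionary layout, the key names, and the `stages_where` helper are assumptions made for illustration.

```python
# Each row restates one row of the stage output matrix; values are kept as text.
STAGE_MATRIX = {
    "ingest": {
        "mutates_branch": "yes",
        "run_outputs": "limited",
        "qc_outputs": "no",
    },
    "Step 1 FixDataset": {
        "mutates_branch": "yes",
        "run_outputs": "yes",
        "qc_outputs": "no",
    },
    "Step 2 detect": {
        "mutates_branch": "usually yes, but not through QC action",
        "run_outputs": "yes",
        "qc_outputs": "yes",
    },
    "Step 3 apply": {
        "mutates_branch": "yes",
        "run_outputs": "yes",
        "qc_outputs": "no new detection",
    },
}


def stages_where(column, value):
    """Return the stages whose entry in `column` equals `value`, in table order."""
    return [stage for stage, row in STAGE_MATRIX.items() if row[column] == value]
```

For example, `stages_where("qc_outputs", "yes")` selects only the detect stage, which matches the rule that QC evidence is produced in exactly one place.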
Glossary
- backend
A canonical observing-system identity used in filenames, grouping, and often jump logic. During ingest this is defined explicitly by the mapping file.
- system flag / `-sys`
A tim-file flag used to represent system identity after harmonization. It is often used both for jump maintenance and as a QC grouping column.
- group
A broader grouping label than an individual backend in some datasets. In some analyses it is used as the QC grouping column instead of `sys`.
- branch
A git branch in the canonical dataset repository. Branches capture dataset mutation history across ingest, FixDataset passes, and apply stages.
- run directory
A directory under `results_dir` containing outputs from one invocation: resolved config, summaries, plots, QC tables, and logs.
- detect stage
A run that produces QC evidence and diagnostics without using QC outputs as mutation instructions.
- apply stage
A later FixDataset run that reads previously generated QC outputs and turns selected QC columns into comments or deletions.
- variant
An alternate include or reference product generated by FixDataset using the configured classification and variant catalogs. The standard outputs are `<PSR>_<variant>_all.tim` and, when jump-reference generation is enabled, `<PSR>_<variant>.par`.
- `reference_branch`
The branch used as a comparison anchor and, in practice, often the branch that the run is organized around.
- `pqc_backend_col`
The QC grouping column forwarded to PQC. Many thresholds and summaries are interpreted according to this choice.
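To make the effect of the grouping column concrete, the sketch below groups the same TOAs once by `sys` and once by `group`. It is a toy illustration, not PQC's implementation: the `group_toas` helper, the TOA dictionaries, and all flag values are hypothetical.

```python
from collections import defaultdict


def group_toas(toas, backend_col):
    """Group TOA names by the value of the chosen flag column."""
    groups = defaultdict(list)
    for toa in toas:
        # TOAs without the flag land in an explicit bucket instead of vanishing.
        groups[toa.get(backend_col, "UNKNOWN")].append(toa["name"])
    return dict(groups)


# Hypothetical TOAs with made-up flag values.
toas = [
    {"name": "toa1", "sys": "TEL_A.BE1.1400", "group": "TEL_A"},
    {"name": "toa2", "sys": "TEL_A.BE2.1400", "group": "TEL_A"},
    {"name": "toa3", "sys": "TEL_B.BE1.1400", "group": "TEL_B"},
    {"name": "toa4", "sys": "TEL_A.BE1.1400", "group": "TEL_A"},
]

by_sys = group_toas(toas, "sys")      # three groups, one per system
by_group = group_toas(toas, "group")  # two broader groups
```

With the finer `sys` column, each group holds fewer TOAs, so any per-group statistic rests on less data; the broader `group` column pools systems that may not share the same noise properties. Which trade-off is appropriate depends on the dataset.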
Why PQC Exists
PQC is often misunderstood as “a tool that removes bad TOAs.” That is not the right mental model.
PQC is useful because single-pulsar timing data often contains behavior that should be reviewed explicitly:
isolated bad measurements,
backend-specific outliers,
step-like changes,
DM-like step changes,
solar-angle systematics,
orbital-phase structure,
transient-like events.
The job of PQC is to create evidence and diagnostics. The job of the operator is to decide what action, if any, should follow.
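The "evidence, not action" idea can be sketched with a deliberately simple detector. The code below is not one of PQC's detectors: it is a generic median-absolute-deviation outlier check used here as a stand-in, and the function name, the threshold, and the residual values are all assumptions. The point is only that detection returns rows describing suspicious measurements and leaves the input untouched.

```python
from statistics import median


def detect_outliers(residuals, threshold=5.0):
    """Return evidence rows for residuals far from the median.

    The input list is never modified; deciding what to do with the
    returned rows is left to the operator.
    """
    med = median(residuals)
    mad = median(abs(r - med) for r in residuals)
    if mad == 0:
        return []  # no spread to measure against, so nothing is flagged
    sigma = 1.4826 * mad  # MAD scaled to a Gaussian-equivalent sigma
    return [
        {"index": i, "residual": r, "score": abs(r - med) / sigma}
        for i, r in enumerate(residuals)
        if abs(r - med) > threshold * sigma
    ]


# Hypothetical timing residuals in microseconds.
residuals_us = [0.1, -0.2, 0.0, 0.15, -0.1, 12.0]
evidence = detect_outliers(residuals_us)
```

Here `evidence` describes the last measurement as an outlier, while `residuals_us` is unchanged. Turning that row into a comment or deletion is a separate, later decision.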
Why FixDataset Exists
FixDataset and PQC are easy to conflate if the stage boundary is not explained.
FixDataset is the mutation layer. It is used to:
normalize `-sys`, `-group`, and `-pta` flags,
insert jumps that should exist but do not,
prune jumps that no longer correspond to data,
enforce a baseline parfile policy,
generate consistent `<PSR>_<variant>_all.tim` products,
apply QC decisions back to tim files.
PQC flags things. FixDataset changes things.
Non-Destructive First Rule
For an initial workflow:
do not delete TOAs immediately,
use new branches for each mutation pass,
use `fix_qc_action = "comment"` first,
preserve the previous branch so outputs can be diffed across stages.
This is the safest way to preserve cause and effect across stages.
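The difference between a comment action and a delete action can be sketched as follows. This is not FixDataset's code: the `apply_qc_action` helper and the sample lines are hypothetical, and the sketch assumes the tempo2 tim-file convention that a line starting with `C ` is treated as a comment. How FixDataset actually marks commented TOAs may differ.

```python
def apply_qc_action(tim_lines, flagged, action="comment"):
    """Return a new list of tim lines with the action applied to flagged indices.

    The input list is left untouched so the two versions can be diffed.
    """
    if action not in ("comment", "delete"):
        raise ValueError(f"unknown action: {action!r}")
    out = []
    for i, line in enumerate(tim_lines):
        if i not in flagged:
            out.append(line)
        elif action == "comment":
            # Assumed convention: a leading "C " comments out the TOA line.
            out.append("C " + line)
        # action == "delete": the flagged line is dropped entirely.
    return out


# Hypothetical TOA lines; file names and flag values are made up.
tim_lines = [
    "toa_0001.ar 1400.000 55000.1234567890123 1.250 tel -sys SYS_A",
    "toa_0002.ar 1400.000 55001.1234567890123 9.900 tel -sys SYS_A",
    "toa_0003.ar 1400.000 55002.1234567890123 1.310 tel -sys SYS_A",
]

commented = apply_qc_action(tim_lines, {1}, action="comment")
deleted = apply_qc_action(tim_lines, {1}, action="delete")
```

The commented version keeps the same number of lines, so the flagged TOA can be restored by removing the prefix. The deleted version loses the line, which is why starting with the comment action is the more recoverable choice.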
The End State
For one pulsar, a good final target is:
one ingest mapping JSON,
one ingest run profile,
one first-pass FixDataset profile,
one PQC detection profile,
one QC-apply profile,
optionally one workflow file that chains those steps.
That produces a reproducible, inspectable path from raw inputs to a reviewable cleaned branch.