Overview And Scope¶

This section defines the scope of the manual and the stage boundaries it uses.

What This Manual Covers¶

By the end of the exercise, a user should understand five separate things:

the data tree,
the config tree,
the branch chain,
the difference between detection and mutation,
the meaning of backend grouping for PQC.

If any of these is missing, you can still repeat commands mechanically, but you cannot debug or extend the workflow with confidence.

The Stages In Plain Terms¶

The pipeline is easiest to understand as four distinct operations.

Stage 1: ingest: Build a canonical pulsar directory tree from whatever raw layout the source data currently lives in.
Stage 2: first FixDataset pass: Normalize flags, insert missing jumps, prune stale jumps, optionally create variant products, and commit those edits to a new branch.
Stage 3: PQC detect run: Run tempo2 and QC detectors to produce review products. This stage should usually not edit the data files.
Stage 4: QC apply run: Read the QC products from Stage 3 and comment or delete selected TOAs. Start with comment-only.

Minimum Single-Pulsar Story¶

For one pulsar, the simplest robust story is:

ingest raw files into a canonical tree,
create a branch with system flags and jumps in place,
run tempo2 and PQC on that branch,
inspect QC outputs,
apply comments to flagged TOAs on a new branch,
rerun if needed.
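
The branch chain implied by this story can be sketched as a small data structure. This is an illustration only, not part of the pipeline: the `Stage` class, the `check_chain` helper, and every branch name below are hypothetical. The sketch also assumes that the detect run works on the existing branch and creates none of its own, as the story above describes.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Stage:
    name: str
    reads: Optional[str]   # branch the stage starts from (None for ingest)
    writes: Optional[str]  # new branch the stage creates (None if it creates none)


# Hypothetical branch names, chosen only for this illustration.
CHAIN = [
    Stage("ingest", None, "raw_ingest"),
    Stage("first FixDataset pass", "raw_ingest", "fix_flags_jumps"),
    Stage("PQC detect", "fix_flags_jumps", None),
    Stage("QC apply", "fix_flags_jumps", "qc_comments"),
]


def check_chain(chain: List[Stage]) -> List[str]:
    """Return a list of problems; an empty list means the chain is consistent."""
    existing = set()
    problems = []
    for stage in chain:
        # A stage may only start from a branch that an earlier stage created.
        if stage.reads is not None and stage.reads not in existing:
            problems.append(f"{stage.name}: reads unknown branch {stage.reads!r}")
        if stage.writes is not None:
            # Every mutation pass gets a fresh branch, so earlier ones stay diffable.
            if stage.writes in existing:
                problems.append(f"{stage.name}: would overwrite branch {stage.writes!r}")
            existing.add(stage.writes)
    return problems


print(check_chain(CHAIN))
```

For the chain above the check returns an empty list. A stage that reuses an earlier branch name, or starts from a branch that no earlier stage created, is reported as a problem.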

This is slower than a one-line workflow, but it keeps the stage boundaries clear and auditable.

Stage Output Matrix¶

The four stages write different kinds of outputs. Keeping them separate is one of the main design strengths of the workflow.

Stage	Mutates dataset branch	Writes run outputs	Writes QC outputs	Main purpose
Stage 1: ingest	yes	limited	no	build canonical pulsar tree from raw source layout
Stage 2: first FixDataset pass	yes	yes	no	normalize flags, maintain jumps, generate variants
Stage 3: PQC detect run	usually yes, but not through QC action	yes	yes	run tempo2 and PQC, generate review products
Stage 4: QC apply run	yes	yes	no new detection	apply chosen QC action policy to a new branch

Glossary¶

backend: A canonical observing-system identity used in filenames, grouping, and often jump logic. During ingest this is defined explicitly by the mapping file.
system flag / -sys: A tim-file flag used to represent system identity after harmonization. It is often used both for jump maintenance and as a QC grouping column.
group: A broader grouping label than an individual backend in some datasets. In some analyses it is used as the QC grouping column instead of sys.
branch: A git branch in the canonical dataset repository. Branches capture dataset mutation history across ingest, FixDataset passes, and apply stages.
run directory: A directory under results_dir containing outputs from one invocation: resolved config, summaries, plots, QC tables, and logs.
detect stage: A run that produces QC evidence and diagnostics without using QC outputs as mutation instructions.
apply stage: A later FixDataset run that reads previously generated QC outputs and turns selected QC columns into comments or deletions.
variant: An alternate include or reference product generated by FixDataset using the configured classification and variant catalogs. The standard outputs are <PSR>_<variant>_all.tim and, when jump-reference generation is enabled, <PSR>_<variant>.par.
reference_branch: The branch used as a comparison anchor and, in practice, often the branch that the run is organized around.
pqc_backend_col: The QC grouping column forwarded to PQC. This choice controls the way many thresholds and summaries are interpreted.
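
To see why the grouping column matters, the following sketch counts TOAs per group under two different choices of column. It is not how PQC or pleb reads tim files. It assumes tempo2-style TOA lines with five positional fields (file, frequency, MJD, uncertainty, site) followed by `-flag value` pairs. The helper names and all sample values are invented for illustration.

```python
from collections import Counter
from typing import Dict, List


def parse_flags(toa_line: str) -> Dict[str, str]:
    """Extract `-flag value` pairs that follow the five positional fields."""
    rest = toa_line.split()[5:]
    flags = {}
    for key, value in zip(rest[0::2], rest[1::2]):
        if key.startswith("-"):
            flags[key[1:]] = value
    return flags


def group_counts(lines: List[str], column: str) -> Dict[str, int]:
    """Count TOAs per value of the chosen grouping flag."""
    counts = Counter()
    for line in lines:
        counts[parse_flags(line).get(column, "<missing>")] += 1
    return dict(counts)


# Invented TOA lines; telescope and backend names are placeholders.
SAMPLE = [
    "t1.ar 1400.0 55000.1 1.2 tel1 -sys TEL1.BE_A.1400 -group TEL1.BE_A",
    "t2.ar 2600.0 55001.1 1.1 tel1 -sys TEL1.BE_A.2600 -group TEL1.BE_A",
    "t3.ar 1400.0 55002.1 0.9 tel2 -sys TEL2.BE_B.1400 -group TEL2.BE_B",
]

print(group_counts(SAMPLE, "sys"))
print(group_counts(SAMPLE, "group"))
```

Grouping by `sys` puts each of the three sample TOAs in its own group, while grouping by `group` merges the two `TEL1.BE_A` entries. Any per-group statistic is therefore computed over different sets of TOAs depending on which column is forwarded.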

Why PQC Exists¶

PQC is often misunderstood as “a tool that removes bad TOAs.” That is not the right mental model.

PQC is useful because single-pulsar timing data often contains behavior that should be reviewed explicitly:

isolated bad measurements,
backend-specific outliers,
step-like changes,
DM-like step changes,
solar-angle systematics,
orbital-phase structure,
transient-like events.

The job of PQC is to create evidence and diagnostics. The job of the operator is to decide what action, if any, should follow.

Why FixDataset Exists¶

FixDataset and PQC are easy to conflate if the stage boundary is not explained.

FixDataset is the mutation layer. It is used to:

normalize -sys, -group, and -pta flags,
insert jumps that should exist but do not,
prune jumps that no longer correspond to data,
enforce a baseline parfile policy,
generate consistent <PSR>_<variant>_all.tim products,
apply QC decisions back to tim files.

PQC flags things. FixDataset changes things.
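
The jump-maintenance items in the list above amount to comparing two sets: the systems present in the tim files and the systems that already have a jump in the parfile. The sketch below shows only that comparison. The function name `plan_jump_edits` is hypothetical, and FixDataset's actual rules (for example, how a reference system without a jump is treated) are not described in this section and are not modeled here.

```python
from typing import Dict, Iterable, List


def plan_jump_edits(
    tim_systems: Iterable[str],
    par_jump_systems: Iterable[str],
) -> Dict[str, List[str]]:
    """Compare systems seen in the data with systems that have a jump."""
    tim = set(tim_systems)
    par = set(par_jump_systems)
    return {
        "insert": sorted(tim - par),  # present in data, no jump yet
        "prune": sorted(par - tim),   # jump exists, no matching data
    }


# Placeholder system names, not taken from any real dataset.
plan = plan_jump_edits(["SYS_A", "SYS_B", "SYS_C"], ["SYS_B", "SYS_D"])
print(plan)
```

With these placeholder inputs, `SYS_A` and `SYS_C` are listed for insertion and `SYS_D` for pruning. `SYS_B` appears in both sets and needs no edit.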

Non-Destructive First Rule¶

For an initial workflow:

do not delete TOAs immediately,
use new branches for each mutation pass,
use fix_qc_action = "comment" first,
preserve the previous branch so outputs can be diffed across stages.

This is the safest way to preserve cause and effect across stages.
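
The difference between the two actions can be illustrated on a list of tim-file lines. This is a minimal sketch, not FixDataset's implementation. The function `apply_qc_action` is hypothetical, the `"delete"` value is assumed by analogy with `"comment"`, and the `C ` prefix is assumed to be the comment marker used in the tim files.

```python
from typing import Iterable, List


def apply_qc_action(
    lines: List[str],
    flagged: Iterable[int],
    action: str = "comment",
) -> List[str]:
    """Return a new list of lines with flagged TOAs commented out or removed."""
    if action not in ("comment", "delete"):
        raise ValueError(f"unknown action: {action!r}")
    flagged_set = set(flagged)
    out = []
    for index, line in enumerate(lines):
        if index not in flagged_set:
            out.append(line)
        elif action == "comment":
            # The TOA stays in the file, so the decision can be reviewed or reverted.
            out.append("C " + line)
        # For "delete" the flagged line is dropped.
    return out


toas = ["toa_0", "toa_1", "toa_2"]  # stand-ins for real TOA lines
print(apply_qc_action(toas, [1], "comment"))
print(apply_qc_action(toas, [1], "delete"))
```

With `"comment"` the output has the same number of lines as the input, so a diff against the previous branch shows exactly which TOAs were flagged. With `"delete"` that information survives only in the branch history.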

The End State¶

For one pulsar, a good final target is:

one ingest mapping JSON,
one ingest run profile,
one first-pass FixDataset profile,
one PQC detection profile,
one QC-apply profile,
optionally one workflow file that chains those steps.

That produces a reproducible, inspectable path from raw inputs to a reviewable cleaned branch.

pleb - The EPTA Data Combination Pipeline
