PQC Detection: Run QC Without Editing The Data

This page covers the detect stage.

The goals of this stage are to:

  • run tempo2 on a coherent branch,

  • run PQC detectors,

  • generate review products,

  • leave tim files unedited for now.

In this workflow, the detect stage is the first stage where residual behavior is examined systematically rather than only structurally.

Why Run PQC At All

For a single pulsar, PQC is useful because it surfaces structure that is easy to miss in a basic residual plot.

Examples:

  • one or two isolated bad measurements,

  • a problematic backend with heavy-tailed residuals,

  • a step in time,

  • a DM-like step,

  • solar-angle contamination,

  • orbital-phase dependence in binary systems,

  • broader transient structure such as bumps or glitches.

Without PQC, a run may only show that “the fit looks messy.” With PQC, the output provides a set of specific hypotheses and tables to inspect.

The Detect Profile

Create a profile under configs/runs/pqc/.

Example: configs/runs/pqc/single_pulsar_pqc_detect.toml

Tracked repository example: configs/runs/pqc/single_pulsar_pqc_detect.example.toml

home_dir = "/data/canonical"
dataset_name = "EPTA-DR3/epta-dr3-data"
results_dir = "results"
singularity_image = "/work/containers/psrpta.sif"

branches = ["step2_pqc_balanced_detect"]
reference_branch = "step2_pqc_balanced_detect"
pulsars = ["J1909-3744"]
jobs = 1
outdir_name = "j1909_pqc_detect"

run_tempo2 = true
make_plots = true
make_reports = true
make_covmat = true

run_fix_dataset = true
fix_apply = true
fix_base_branch = "step1_fix_flags_variants"
fix_branch_name = "step2_pqc_balanced_detect"
fix_commit_message = "Step2: PQC detection branch for J1909-3744"

fix_qc_remove_outliers = false

fix_generate_alltim_variants = true
fix_backend_classifications_path = "configs/catalogs/variants/backend_classifications_legacy_new.toml"
fix_alltim_variants_path = "configs/catalogs/variants/alltim_variants_legacy_new.toml"
fix_jump_reference_variants = true
fix_jump_reference_jump_flag = "-sys"

run_pqc = true
qc_report = true
qc_report_backend_col = "sys"
qc_report_compact_pdf = true
qc_report_compact_outlier_cols = ["bad_point", "robust_outlier", "robust_global_outlier", "bad_mad"]

pqc_backend_col = "sys"
pqc_drop_unmatched = false
pqc_merge_tol_seconds = 10.0

pqc_add_orbital_phase = true
pqc_add_solar_elongation = true
pqc_add_elevation = true
pqc_add_airmass = true
pqc_add_parallactic_angle = true
pqc_add_freq_bin = true
pqc_freq_bins = 20

pqc_structure_mode = "both"
pqc_structure_detrend_features = ["solar_elongation_deg", "orbital_phase", "freq_bin"]
pqc_structure_test_features = ["solar_elongation_deg", "orbital_phase", "freq_bin"]
pqc_structure_group_cols = "sys"
pqc_structure_nbins = 12
pqc_structure_min_per_bin = 3
pqc_structure_p_thresh = 0.01

pqc_robust_enabled = true
pqc_robust_scope = "backend"
pqc_robust_z_thresh = 6.0
pqc_outlier_gate_enabled = false
pqc_event_instrument = true

pqc_tau_corr_minutes = 30.0
pqc_fdr_q = 0.01
pqc_mark_only_worst_per_day = true
pqc_tau_rec_days = 7.0
pqc_window_mult = 5.0
pqc_min_points = 6
pqc_delta_chi2_thresh = 25.0

pqc_step_enabled = true
pqc_step_min_points = 20
pqc_step_delta_chi2_thresh = 25.0
pqc_step_scope = "both"

pqc_dm_step_enabled = true
pqc_dm_step_min_points = 20
pqc_dm_step_delta_chi2_thresh = 25.0
pqc_dm_step_scope = "both"

pqc_solar_events_enabled = true
pqc_orbital_phase_cut_enabled = true
pqc_eclipse_events_enabled = true
pqc_gaussian_bump_enabled = true
pqc_glitch_enabled = true

This is the single-pulsar version of the repository’s balanced detect pattern from configs/workflows/steps/step2_pqc_balanced_detect.toml.

Why This Profile Still Has FixDataset Enabled

One point that often needs clarification is why run_fix_dataset = true appears in a detect profile.

In this pattern, Step 2 still creates its own branch (fix_branch_name, derived from fix_base_branch) and can still regenerate variant products, but the QC-apply action itself is disabled.

The important switch is:

fix_qc_remove_outliers = false

With that switch off, the run is detection-oriented rather than action-oriented: QC flags are produced for review, but no TOAs are commented out or deleted.

Operationally, this profile serves three purposes at once:

  • it establishes a distinct Step-2 branch,

  • it keeps variant products available on that branch if they are needed,

  • it runs tempo2 and PQC without using QC flags as mutation instructions.
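These three purposes can be checked mechanically before launching a run. The following is a minimal Python sketch; check_detect_only is a hypothetical helper, not part of pleb, and it only inspects keys that appear in the example profile. In practice the profile could be read with tomllib.load (Python 3.11+) instead of the inline dict used here.

```python
def check_detect_only(profile: dict) -> list[str]:
    """Hypothetical pre-flight check: return problems; empty means detect-only."""
    problems = []
    # Detection needs tempo2 residuals and the PQC stage itself.
    for key in ("run_tempo2", "run_pqc"):
        if profile.get(key) is not True:
            problems.append(f"{key} should be true")
    # QC flags must not act as mutation instructions. Require an explicit
    # false instead of relying on whatever the tool's default might be.
    if profile.get("fix_qc_remove_outliers") is not False:
        problems.append("fix_qc_remove_outliers should be explicitly false")
    # Step 2 should get its own branch, distinct from the Step 1 base branch.
    if profile.get("fix_branch_name") == profile.get("fix_base_branch"):
        problems.append("fix_branch_name should differ from fix_base_branch")
    # tempo2 and PQC should run on the branch that FixDataset creates.
    if profile.get("fix_branch_name") not in profile.get("branches", []):
        problems.append("fix_branch_name should be listed in branches")
    return problems


# Subset of the example profile, as tomllib.load would return it.
profile = {
    "branches": ["step2_pqc_balanced_detect"],
    "run_tempo2": True,
    "run_fix_dataset": True,
    "fix_base_branch": "step1_fix_flags_variants",
    "fix_branch_name": "step2_pqc_balanced_detect",
    "fix_qc_remove_outliers": False,
    "run_pqc": True,
}
print(check_detect_only(profile))  # []
```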

How To Interpret The Main PQC Keys

Grouping keys:

pqc_backend_col

The column used to define backend groups for many detectors. For this workflow, "sys" is a common choice after Step 1 has harmonized -sys values. This key is one of the most consequential choices in the run because many QC summaries and thresholds are interpreted within this grouping.

Matching keys:

pqc_merge_tol_seconds

Time tolerance, in seconds, used when matching TOAs to tim metadata. If it is too small, valid matches may fail; if it is too large, unrelated rows may be merged incorrectly.
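The trade-off can be illustrated with a nearest-neighbour match under a tolerance. This is a minimal sketch, not PQC's matching code; it assumes both tables carry times in seconds and that each TOA is matched to at most one metadata row.

```python
from bisect import bisect_left


def match_by_time(toa_times, meta_times, tol_seconds):
    """Pair each TOA with the nearest metadata time within tol_seconds.

    Times are in seconds; meta_times must be sorted. Returns one metadata
    index per TOA, or None when nothing is close enough.
    """
    matches = []
    for t in toa_times:
        j = bisect_left(meta_times, t)
        best, best_dt = None, None
        # Only the neighbours around the insertion point can be nearest.
        for k in (j - 1, j):
            if 0 <= k < len(meta_times):
                dt = abs(meta_times[k] - t)
                if dt <= tol_seconds and (best_dt is None or dt < best_dt):
                    best, best_dt = k, dt
        matches.append(best)
    return matches


toas = [0.0, 100.0, 1000.0]
meta = [3.0, 95.0, 400.0]
print(match_by_time(toas, meta, 10.0))    # [0, 1, None]
print(match_by_time(toas, meta, 1.0))     # [None, None, None]  too tight
print(match_by_time(toas, meta, 1000.0))  # [0, 1, 2]  too loose
```

With the 1000 s tolerance the last TOA is attached to a metadata row recorded 600 s away, which is the "unrelated rows merged" failure mode; with 1 s, valid pairs a few seconds apart are lost.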

Feature-engineering keys:

pqc_add_orbital_phase

Compute orbital phase when binary parameters support it.
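For a simple constant-period orbit, orbital phase is the fractional number of orbits since a reference epoch. The sketch below assumes a reference epoch t0_mjd (for example T0 or TASC from the timing model) and a period pb_days; it ignores period derivatives and time-scale corrections that a real implementation would need to handle.

```python
def orbital_phase(mjd, t0_mjd, pb_days):
    """Fractional orbital phase in [0, 1) for a constant-period orbit.

    Simplifications: no PBDOT, and mjd is assumed to be on the same time
    scale as t0_mjd (e.g. barycentric).
    """
    return ((mjd - t0_mjd) / pb_days) % 1.0


# Hypothetical values: a 2-day orbit with reference epoch MJD 55000.
print(orbital_phase(55001.5, 55000.0, 2.0))  # 0.75
print(orbital_phase(54999.0, 55000.0, 2.0))  # 0.5 (epochs before t0 wrap correctly)
```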

pqc_add_solar_elongation

Add solar-angle information used by solar-structure checks.

pqc_add_freq_bin and pqc_freq_bins

Add a coarse frequency-bin feature for structure tests and diagnostics; pqc_freq_bins sets the number of bins (20 in the example profile).
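A minimal sketch of one way to build such a feature, assuming equal-width bins between the lowest and highest observing frequency; PQC may bin differently (for example on a logarithmic scale or per backend).

```python
def freq_bins(freqs_mhz, n_bins):
    """Assign each frequency to an equal-width bin index in 0..n_bins-1."""
    lo, hi = min(freqs_mhz), max(freqs_mhz)
    if hi == lo:
        return [0] * len(freqs_mhz)
    width = (hi - lo) / n_bins
    # min() keeps the highest frequency in the last bin rather than bin n_bins.
    return [min(int((f - lo) / width), n_bins - 1) for f in freqs_mhz]


print(freq_bins([1000.0, 1400.0, 1500.0, 2000.0], 4))  # [0, 1, 2, 3]
```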

Structure keys:

pqc_structure_mode

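Whether to detrend against features, test for structure, or both. The features used for each are listed in pqc_structure_detrend_features and pqc_structure_test_features. "both" is often the most informative starting point because the output then contains both the detrending and the structure-test results.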

pqc_structure_group_cols

Column used to group TOAs in the structure stage. With "sys", structure is assessed separately for each backend.
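To make the structure keys concrete, here is a minimal sketch of a binned chi-square test for one feature within one group. It assumes that pqc_structure_nbins, pqc_structure_min_per_bin and pqc_structure_p_thresh play the roles of n_bins, min_per_bin and the p-value threshold, and it uses the Wilson-Hilferty approximation for the chi-square tail probability to stay dependency-free. PQC's actual test statistic may differ.

```python
import math


def chi2_sf_wh(chi2, dof):
    """Approximate chi-square survival function (Wilson-Hilferty transform)."""
    if dof <= 0:
        return 1.0
    c = 2.0 / (9.0 * dof)
    z = ((chi2 / dof) ** (1.0 / 3.0) - (1.0 - c)) / math.sqrt(c)
    return 0.5 * math.erfc(z / math.sqrt(2.0))


def binned_structure_test(feature, resid, sigma, n_bins=12, min_per_bin=3,
                          lo=None, hi=None):
    """Test whether weighted bin means of resid differ from their common mean.

    Returns (chi2, dof, p_value). Bins with fewer than min_per_bin points
    are skipped.
    """
    lo = min(feature) if lo is None else lo
    hi = max(feature) if hi is None else hi
    width = (hi - lo) / n_bins if hi > lo else 1.0
    acc = [[0.0, 0.0, 0] for _ in range(n_bins)]  # [sum w*r, sum w, count]
    for f, r, s in zip(feature, resid, sigma):
        b = max(0, min(int((f - lo) / width), n_bins - 1))
        w = 1.0 / s ** 2
        acc[b][0] += w * r
        acc[b][1] += w
        acc[b][2] += 1
    # Weighted mean and total weight of each bin with enough points.
    bins = [(swr / sw, sw) for swr, sw, n in acc if n >= min_per_bin]
    if len(bins) < 2:
        return 0.0, 0, 1.0
    mean = sum(m * sw for m, sw in bins) / sum(sw for _, sw in bins)
    # Each bin mean has variance 1/sw, so this is a sum of squared pulls.
    chi2 = sum(sw * (m - mean) ** 2 for m, sw in bins)
    dof = len(bins) - 1
    return chi2, dof, chi2_sf_wh(chi2, dof)


phase = [i / 60 for i in range(60)]
sigma = [1.0] * 60
stepped = [10.0 if x < 0.5 else -10.0 for x in phase]
chi2, dof, p = binned_structure_test(phase, stepped, sigma, lo=0.0, hi=1.0)
print(dof, p < 0.01)  # 11 True
```

Running it once per backend (the "sys" grouping) and once per feature in pqc_structure_test_features mirrors how the example profile is organised.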

Robust outlier keys:

pqc_robust_enabled

Enable MAD-style robust outlier detection.

pqc_robust_scope

Compute robust statistics globally, by backend, or both.

pqc_robust_z_thresh

Robust z-score threshold (6.0 in the example profile). Lower values increase sensitivity and false positives; higher values suppress marginal outliers.
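The scope and threshold keys interact. A minimal sketch, assuming the common modified z-score z = 0.6745 (x - median) / MAD; PQC's exact MAD formulation, its "both" scope, and whether x is a raw or normalised residual are not reproduced here. The toy data show a point that stands out within its backend but not in the pooled distribution.

```python
import statistics
from collections import defaultdict


def robust_z(values):
    """Modified z-scores; 0.6745 rescales the MAD to a Gaussian-equivalent sigma."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    if mad == 0:
        # Degenerate group (most values identical): no robust scale available.
        return [0.0] * len(values)
    return [0.6745 * (v - med) / mad for v in values]


def robust_outliers(rows, z_thresh=6.0, scope="backend"):
    """Indices of rows with |robust z| > z_thresh.

    rows: sequence of (backend, value). scope="backend" computes statistics
    per backend; scope="global" pools all rows.
    """
    groups = defaultdict(list)
    for i, (backend, value) in enumerate(rows):
        groups[backend if scope == "backend" else "all"].append((i, value))
    flagged = []
    for members in groups.values():
        zs = robust_z([value for _, value in members])
        flagged.extend(i for (i, _), z in zip(members, zs) if abs(z) > z_thresh)
    return sorted(flagged)


rows = [("A", 0.1), ("A", -0.2), ("A", 0.0), ("A", 0.15), ("A", -0.1), ("A", 5.0),
        ("B", 1.0), ("B", 1.2), ("B", 0.9), ("B", 1.1), ("B", 1.05)]
print(robust_outliers(rows, 6.0, scope="backend"))  # [5]
print(robust_outliers(rows, 6.0, scope="global"))   # []
```

The 5.0 value has z of about 27 within backend A but only about 3.7 when pooled, because backend B's offset inflates the global MAD. This is one concrete way in which backend grouping choices change what gets flagged.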

Transient and event keys:

pqc_step_*, pqc_dm_step_*

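Enable the step and DM-step detectors and set their sensitivity: minimum number of points, delta-chi-squared threshold, and scope.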

pqc_solar_*, pqc_orbital_phase_*, pqc_eclipse_*

Domain-specific structure or event detectors.

pqc_gaussian_bump_*, pqc_glitch_*

Detectors for broader transient families such as Gaussian bumps and glitches.
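The step detectors are parameterised by a minimum number of points and a delta-chi-squared threshold. A minimal sketch of that idea, assuming a constant-offset null model and a single step at a TOA boundary; for the DM-like case it scales the step by (ref_freq / f)**2, with ref_freq = 1400 MHz as an arbitrary hypothetical reference. PQC's actual models and scan strategy may differ.

```python
def step_delta_chi2(t, resid, sigma, min_points=20, freq_mhz=None, ref_freq=1400.0):
    """Scan step epochs; return (best delta chi2, index of first TOA after the step).

    Null model: r = c. Step model: r = c + a * g * H, where H is 0 before the
    split and 1 after it, and g = 1 (achromatic) or (ref_freq / f)**2 (DM-like).
    """
    order = sorted(range(len(t)), key=lambda i: t[i])
    r = [resid[i] for i in order]
    w = [1.0 / sigma[i] ** 2 for i in order]
    g = [1.0 if freq_mhz is None else (ref_freq / freq_mhz[i]) ** 2 for i in order]
    s_w = sum(w)
    s_r = sum(wi * ri for wi, ri in zip(w, r))
    chi2_null = sum(wi * (ri - s_r / s_w) ** 2 for wi, ri in zip(w, r))
    best = (0.0, None)
    # Require at least min_points TOAs on each side of the candidate step.
    for k in range(min_points, len(r) - min_points + 1):
        h = [0.0] * k + g[k:]
        s_h = sum(wi * hi for wi, hi in zip(w, h))
        s_hh = sum(wi * hi * hi for wi, hi in zip(w, h))
        s_hr = sum(wi * hi * ri for wi, hi, ri in zip(w, h, r))
        det = s_w * s_hh - s_h * s_h
        if det <= 0:
            continue
        # Weighted least-squares solution of the 2x2 normal equations.
        c = (s_hh * s_r - s_h * s_hr) / det
        a = (s_w * s_hr - s_h * s_r) / det
        chi2_step = sum(wi * (ri - c - a * hi) ** 2 for wi, ri, hi in zip(w, r, h))
        if chi2_null - chi2_step > best[0]:
            best = (chi2_null - chi2_step, order[k])
    return best


t = list(range(60))
resid = [0.0 if ti < 30 else 1.0 for ti in t]
delta, idx = step_delta_chi2(t, resid, [0.1] * 60, min_points=20)
print(idx, delta > 25.0)  # 30 True
```

A candidate would then count as a detection when the best delta chi-squared exceeds the configured threshold (25.0 in the example profile for both step families).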

How To Pick A First PQC Strategy

For an initial run, start with a balanced profile:

  • moderate false-positive control,

  • robust outliers enabled,

  • common event detectors enabled,

  • comment-only downstream action,

  • compact report enabled.
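In the example profile, false-positive control is expressed through keys such as pqc_fdr_q = 0.01. The key name suggests false-discovery-rate control, and the standard procedure for that is Benjamini-Hochberg, sketched below. Whether PQC applies exactly this procedure, and to which p-values, is an assumption.

```python
def benjamini_hochberg(p_values, q=0.01):
    """Indices of hypotheses rejected by Benjamini-Hochberg at FDR level q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k (1-based) whose sorted p-value is <= k * q / m;
    # every hypothesis up to that rank is rejected.
    cutoff = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            cutoff = rank
    return sorted(order[:cutoff])


p = [0.0001, 0.2, 0.003, 0.03, 0.5]
print(benjamini_hochberg(p, q=0.01))                        # [0, 2]
print([i for i, x in enumerate(p) if x <= 0.01 / len(p)])  # Bonferroni: [0]
```

Benjamini-Hochberg admits the 0.003 test that a Bonferroni cut at 0.01/5 would miss, which is the sense in which an FDR setting gives moderate rather than maximally conservative control.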

Avoid two extremes in the initial pass:

  • too conservative, where almost nothing is flagged,

  • too aggressive, where the output is dominated by low-value alerts.

How To Decide Whether A Detector Family Belongs In The First Run

The broad detector families in the balanced example do not all answer the same question.

Use them selectively:

  • robust outlier detection: useful in almost every initial pass because isolated outliers are common and easy to interpret,

  • structure testing: useful early because it helps reveal residual dependence on frequency, solar angle, or orbital phase,

  • step and DM-step detectors: useful when backend changes or dispersive state changes are plausible,

  • solar and orbital-phase detectors: most useful when the pulsar, cadence, and observing geometry make those structures physically plausible,

  • bump and glitch detectors: useful when there is already reason to suspect broader transient structure, but not always necessary in the very first exploratory run.

How To Run The Detect Stage

Run:

pleb --config configs/runs/pqc/single_pulsar_pqc_detect.toml

What To Inspect After PQC

Inspect the run directory in this order:

  1. run_settings/: confirm the exact command and resolved config.

  2. tempo2 output products: confirm the branch and pulsar were processed correctly.

  3. qc/ outputs: inspect the QC CSVs and summaries.

  4. compact report products: review the summary before opening every raw table.

Core point:

Do not start from individual flagged TOAs. Start from the summaries, then move to backend-specific evidence.
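Starting from summaries can be as simple as counting flags per backend before looking at any individual TOA. A minimal sketch, assuming the QC CSVs under qc/ carry a "sys" column and one true/false column per name in qc_report_compact_outlier_cols; the real file names and column layout should be checked against an actual run directory.

```python
import csv
from collections import Counter
from pathlib import Path

# Column names taken from qc_report_compact_outlier_cols in the example profile.
OUTLIER_COLS = ("bad_point", "robust_outlier", "robust_global_outlier", "bad_mad")
TRUE_VALUES = {"true", "1", "yes"}


def summarize_qc(run_dir, backend_col="sys", outlier_cols=OUTLIER_COLS):
    """Count rows and flagged rows per backend across all CSVs under run_dir/qc."""
    totals, flagged = Counter(), Counter()
    for path in sorted(Path(run_dir, "qc").rglob("*.csv")):
        with path.open(newline="") as fh:
            for row in csv.DictReader(fh):
                backend = row.get(backend_col) or "?"
                totals[backend] += 1
                for col in outlier_cols:
                    # Missing columns or empty cells count as "not flagged".
                    if str(row.get(col) or "").strip().lower() in TRUE_VALUES:
                        flagged[(backend, col)] += 1
    return totals, flagged


def print_summary(totals, flagged, outlier_cols=OUTLIER_COLS):
    """One line per backend: TOA count and flag count per outlier column."""
    for backend in sorted(totals):
        parts = [f"{col}={flagged[(backend, col)]}" for col in outlier_cols]
        print(f"{backend}: {totals[backend]} TOAs, " + ", ".join(parts))
```

A backend whose flag counts are out of proportion to its TOA count is the natural place to move on to backend-specific evidence.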

What The QC Outputs Are For

The QC outputs serve at least four different purposes:

  • triage: determine whether the pulsar has a small number of isolated issues or a larger structural problem,

  • localization: determine whether the issue is tied to one backend, one time range, one frequency range, or one orbital/solar regime,

  • comparison: compare the effect of threshold changes or backend overrides across reruns,

  • hand-off: provide explicit inputs for the later QC-apply stage through fix_qc_results_dir and fix_qc_branch.
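For the comparison purpose, a minimal sketch that diffs the flagged TOAs of two runs for one QC column. It assumes rows are dicts as produced by csv.DictReader and that a backend column plus an MJD-like column (here hypothetically named "mjd") identify a TOA identically in both runs.

```python
def flagged_keys(rows, flag_col, key_cols=("sys", "mjd")):
    """Identifying keys of rows whose flag_col is true."""
    return {
        tuple(row[c] for c in key_cols)
        for row in rows
        if str(row.get(flag_col) or "").strip().lower() in {"true", "1", "yes"}
    }


def compare_runs(baseline_rows, rerun_rows, flag_col, key_cols=("sys", "mjd")):
    """Which TOAs became flagged, stopped being flagged, or stayed flagged."""
    before = flagged_keys(baseline_rows, flag_col, key_cols)
    after = flagged_keys(rerun_rows, flag_col, key_cols)
    return {
        "added": sorted(after - before),
        "removed": sorted(before - after),
        "kept": sorted(before & after),
    }
```

Diffing a run at pqc_robust_z_thresh = 6.0 against one at a lower threshold, for example, shows exactly which TOAs the change adds, which is the evidence a documented threshold change needs.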

What Not To Conclude Too Early

Keep these points in mind:

  • a flag is not proof of astrophysical pathology,

  • a flag is not automatically a deletion command,

  • backend grouping choices affect what is flagged,

  • threshold changes must be justified and documented,

  • report outputs and action policy are separate layers.

The Apply Stage Comes Later

Once the detect outputs can be read confidently, the next stage is a separate FixDataset pass that points at the QC results directory from this run.

That separation is critical: detection results should be inspectable on their own before any TOA is commented out or deleted.

How To Choose A Small First Detector Set

The balanced example above enables a broad set of detectors because it mirrors the repository’s existing pattern. For a very first run on a new pulsar, it is also reasonable to begin with a narrower subset and then expand.

A conservative first subset is:

  • pqc_robust_enabled = true

  • pqc_structure_mode = "both"

  • pqc_step_enabled = true

  • pqc_dm_step_enabled = true

  • pqc_solar_events_enabled = true only if solar-angle structure is a real concern for the pulsar and cadence

Then add:

  • orbital-phase diagnostics for binary systems,

  • eclipse diagnostics when physically relevant,

  • bump and glitch detectors when the residual history suggests transient behavior worth testing explicitly.
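The expansion rule can be written down as a small helper that produces detector toggles for a first pass. This is a hypothetical sketch: the key names come from the example profile (orbital-phase diagnostics are mapped to pqc_orbital_phase_cut_enabled as an assumption), while the pulsar properties passed in are judgement calls made by the user.

```python
# Always-on subset for a very first run.
FIRST_PASS = {
    "pqc_robust_enabled": True,
    "pqc_structure_mode": "both",
    "pqc_step_enabled": True,
    "pqc_dm_step_enabled": True,
}


def first_pass_detectors(solar_concern=False, is_binary=False,
                         eclipsing=False, transient_suspected=False):
    """Detector toggles for a first run; add-ons enabled only when justified."""
    keys = dict(FIRST_PASS)
    keys["pqc_solar_events_enabled"] = solar_concern
    keys["pqc_orbital_phase_cut_enabled"] = is_binary
    keys["pqc_eclipse_events_enabled"] = is_binary and eclipsing
    keys["pqc_gaussian_bump_enabled"] = transient_suspected
    keys["pqc_glitch_enabled"] = transient_suspected
    return keys


def to_toml_lines(keys):
    """Render flat bool/str settings as TOML lines to paste into a profile."""
    def fmt(value):
        if isinstance(value, bool):
            return "true" if value else "false"
        return f'"{value}"'
    return [f"{key} = {fmt(value)}" for key, value in keys.items()]


print("\n".join(to_toml_lines(first_pass_detectors(is_binary=True))))
```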

The reason to expand gradually is interpretability. When many detector families fire in the first run, it is harder to tell which family is driving the output.

How This Relates To Action Policy

The detect profile defines what evidence is produced. It does not define what will later be commented or deleted. That later decision is made by fix_qc_* keys during the apply stage.

In practical terms:

  • pqc_* determines what the QC stage measures and flags,

  • qc_report_* determines how those QC outputs are summarized for review,

  • fix_qc_* determines which QC columns later trigger comments or deletions.

Keeping those three layers conceptually separate makes the workflow much easier to tune and audit.
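Because the three layers use distinct key prefixes, the separation can also be checked mechanically when auditing a change between two profiles. A minimal sketch; split_layers and changed_layers are hypothetical helpers, and keys outside the three prefixes (including the qc_report on/off switch itself) are deliberately ignored.

```python
# Key prefixes for the three layers described above.
LAYERS = {"detect": "pqc_", "report": "qc_report_", "action": "fix_qc_"}


def split_layers(profile):
    """Group a flat profile dict into detect / report / action layers by prefix."""
    out = {name: {} for name in LAYERS}
    for key, value in profile.items():
        for name, prefix in LAYERS.items():
            if key.startswith(prefix):
                out[name][key] = value
    return out


def changed_layers(old, new):
    """Names of layers whose settings differ between two profiles."""
    a, b = split_layers(old), split_layers(new)
    return [name for name in LAYERS if a[name] != b[name]]


old = {"pqc_robust_z_thresh": 6.0, "qc_report_compact_pdf": True,
       "fix_qc_remove_outliers": False}
new = dict(old, pqc_robust_z_thresh=5.0)
print(changed_layers(old, new))  # ['detect']
```

A change confined to the detect layer alters the evidence but not the action policy, which is exactly the distinction the layering is meant to make auditable.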

For the separation between detector strategy, action strategy, and report strategy, see Configuration Reference.