API Reference

Run the data-combination diagnostics pipeline.

This package exposes a small public API for running the full pipeline, running parameter scans, and applying FixDataset operations programmatically. The implementation is refactored from the original notebooks and designed to import quickly; heavy dependencies (GitPython, libstempo/pqc) are imported lazily by entry points.

Examples

Run the pipeline programmatically:

from pathlib import Path
from pleb import PipelineConfig, run_pipeline

cfg = PipelineConfig(
    home_dir=Path("/data/epta"),
    singularity_image=Path("/images/tempo2.sif"),
    dataset_name="EPTA",
)
outputs = run_pipeline(cfg)

Run a parameter scan:

from pathlib import Path
from pleb import PipelineConfig, run_param_scan

cfg = PipelineConfig(
    home_dir=Path("/data/epta"),
    singularity_image=Path("/images/tempo2.sif"),
    dataset_name="EPTA",
    param_scan_typical=True,
)
results = run_param_scan(cfg)

See also

pleb.pipeline.run_pipeline: Full pipeline implementation.
pleb.param_scan.run_param_scan: Parameter scan runner.
pleb.dataset_fix: FixDataset helpers.

Internal Modules (No Index)

Provide the command-line interface for the data-combination pipeline.

This module wires config loading/overrides to pleb.pipeline.run_pipeline() and pleb.param_scan.run_param_scan(), including convenience flags for parameter scans and PQC reporting.

Examples

Run the full pipeline from a JSON config:

python -m pleb.cli --config pipeline.json

Run a parameter scan with a typical profile:

python -m pleb.cli --config pipeline.toml --param-scan --scan-typical

Generate a PQC report from a run directory:

python -m pleb.cli qc-report --run-dir results/run_2024-01-01

See also

pleb.config.PipelineConfig: Configuration model.
pleb.pipeline.run_pipeline: Pipeline execution entry point.
pleb.param_scan.run_param_scan: Parameter scan entry point.
pleb.qc_report.generate_qc_report: QC report generator.

Define configuration models for the data-combination pipeline.

This module provides PipelineConfig, a flattened dataclass used by the CLI and pipeline entry points to control data ingestion, fitting, reporting, and optional FixDataset or parameter-scan stages. The config is intentionally flat to simplify JSON/TOML serialization and CLI overrides.

See also

pleb.pipeline.run_pipeline: Main pipeline entry point.
pleb.param_scan.run_param_scan: Parameter scan entry point.
pleb.cli: Command-line interface that consumes PipelineConfig.

FixDataset utilities for cleaning and normalizing pulsar datasets.

This module implements deterministic file-level transformations for .par and .tim trees: include maintenance, flag standardization, deduplication, JUMP maintenance, declarative relabel/overlap rules, and optional QC-driven comment/delete actions.

Notes

Statistical operations in this module are lightweight and mostly descriptive:

Reference-system selection uses a weighted deterministic score that favors lower timing RMS, longer Tspan, better TOA precision, denser cadence, and broader overlap support across backend timfiles and MJD coverage.
Deduplication can use user-defined tolerance or auto-derived frequency tolerance from channel spacing.
QC application consumes precomputed pqc flags; this module does not fit statistical models itself.
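As a rough illustration of the auto-derived deduplication tolerance, the sketch below takes a fraction of the median spacing between distinct channel frequencies and treats TOAs that agree in MJD and frequency within tolerance as duplicates. The function names, the 0.25 fraction, and the use of the median are assumptions for illustration; the module's actual rule may differ.

```python
from statistics import median


def auto_freq_tolerance(freqs_mhz, fraction=0.25):
    """Derive a frequency tolerance (MHz) from the typical channel spacing.

    Hypothetical helper: uses a fraction of the median gap between
    distinct frequencies.
    """
    unique = sorted(set(freqs_mhz))
    if len(unique) < 2:
        return 0.0  # a single channel gives no spacing to derive from
    gaps = [b - a for a, b in zip(unique, unique[1:])]
    return fraction * median(gaps)


def duplicate_indices(toas, mjd_tol, freq_tol):
    """Return indices of TOAs that repeat an earlier (mjd, freq_mhz) pair."""
    kept = []
    dupes = []
    for i, (mjd, freq) in enumerate(toas):
        is_dupe = any(
            abs(mjd - m) <= mjd_tol and abs(freq - f) <= freq_tol
            for m, f in kept
        )
        if is_dupe:
            dupes.append(i)
        else:
            kept.append((mjd, freq))
    return dupes


# (MJD, frequency in MHz); the second row nearly repeats the first.
toas = [
    (55000.0, 1400.0),
    (55000.0, 1400.001),
    (55000.0, 1404.0),
    (55000.0, 1408.0),
    (55001.0, 1400.0),
]
freq_tol = auto_freq_tolerance([f for _, f in toas])
dupes = duplicate_indices(toas, mjd_tol=1e-6, freq_tol=freq_tol)
```

The first occurrence is always kept, so the result does not depend on anything except input order.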

Worked example

For a variant J1713+0747_new_all.tim, reference-system generation proceeds as follows:

Split included backend timfiles by -sys.
Measure timing RMS, Tspan, TOA precision, cadence, and overlap-support diagnostics per system.
Choose the reference system with the best weighted score across those metrics.
Write J1713+0747_new.par with JUMP -sys <system> 0 <fitflag> lines.

References

PQC docs (for consumed QC flags): https://golamshaifullah.github.io/pqc/index.html

Notes

The QC CSV produced here is intentionally diagnostic rather than a final truth label. It carries detector outputs such as:

bad_ou: outlier evidence from the OU-innovation bad-measurement model.
bad_mad: robust outlier evidence from MAD-based detectors.
bad_hard: optional hard sigma-gate failures.
bad_point: combined outlier indicator after event-aware reconciliation.
event_member: coherent-event membership (transient/step/solar/etc.).
outlier_any: compatibility field from pqc (bad_point OR event).

Statistical concepts used by the wrapped PQC pipeline include:

False discovery rate (FDR) control: controls the expected fraction of false positives among detections. The Benjamini-Hochberg rule sorts the m p-values as p_(1) <= ... <= p_(m), finds the largest rank k with p_(k) <= (k/m) q for target FDR q, and marks p_(1), ..., p_(k) as detections.
OU-correlated innovations: residuals are tested under a short-timescale correlated (Ornstein-Uhlenbeck) process to avoid over-flagging clustered noise as independent outliers.
Robust z-scores with MAD: the robust scale estimate MAD = median(|x - median(x)|), with sigma_robust ~= 1.4826 * MAD (for Gaussian data), gives the outlier score |x - median(x)| / sigma_robust.
Delta-chi-square model comparison: event detectors compare null and alternative local models; a large Delta chi^2 supports a structured deviation.
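Two of the concepts above are compact enough to sketch directly. The code below is a generic, standard-library illustration of the Benjamini-Hochberg step-up rule and of MAD-based robust z-scores; it is not taken from pqc, and the function names and sample values are made up for the example.

```python
from statistics import median


def benjamini_hochberg(pvalues, q=0.05):
    """Return a list of booleans marking detections at target FDR q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Find the largest rank k with p_(k) <= (k / m) * q.
    k_max = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * q:
            k_max = rank
    # Step-up rule: every p-value ranked at or below k_max is a detection.
    detected = [False] * m
    for idx in order[:k_max]:
        detected[idx] = True
    return detected


def robust_z(values):
    """Return |x - median| / (1.4826 * MAD) for each value."""
    med = median(values)
    mad = median([abs(x - med) for x in values])
    sigma = 1.4826 * mad
    if sigma == 0.0:
        # Degenerate case: more than half the values are identical.
        return [0.0 if x == med else float("inf") for x in values]
    return [abs(x - med) / sigma for x in values]


pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
detected = benjamini_hochberg(pvals, q=0.05)

residuals = [1.0, 1.1, 0.9, 1.05, 0.95, 5.0]
scores = robust_z(residuals)
```

With these inputs only the two smallest p-values pass, and only the last residual stands far from the bulk of the data.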

Worked example

If residuals in one backend include a coherent eclipse-like dip, robust/MAD detectors may initially mark those TOAs. Event detectors can then model the dip and reclassify those points as event_member so bad_point is cleared.

References

PQC documentation: https://golamshaifullah.github.io/pqc/index.html
Benjamini, Y. & Hochberg, Y. 1995, JRSS-B, 57(1), 289-300.
Rousseeuw, P. J. & Croux, C. 1993, JASA, 88(424), 1273-1283.

Notes

Compact decisions are derived from two logical sets:

outlier set (by default the union of outlier_any, bad_point, robust/bad-mad columns, etc.)
event set (transient, solar, eclipse, Gaussian bump, glitch, orbital flags)

Decision rules:

BAD_TOA: outlier and not event
REVIEW_EVENT: outlier and event
EVENT: event and not outlier
KEEP: neither set
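The four rules reduce to a small truth table. The sketch below assumes each QC row is a mapping from column name to a truthy flag. The default column tuples list only names mentioned on this page and are illustrative, not the module's actual defaults.

```python
# Illustrative defaults; the real column lists are longer and configurable.
OUTLIER_COLUMNS = ("outlier_any", "bad_point", "bad_mad")
EVENT_COLUMNS = ("event_member",)


def compact_decision(row, outlier_columns=OUTLIER_COLUMNS,
                     event_columns=EVENT_COLUMNS):
    """Map one QC row to BAD_TOA, REVIEW_EVENT, EVENT, or KEEP."""
    # A row is in a set if any of that set's columns is truthy.
    is_outlier = any(bool(row.get(col, False)) for col in outlier_columns)
    is_event = any(bool(row.get(col, False)) for col in event_columns)
    if is_outlier and is_event:
        return "REVIEW_EVENT"
    if is_outlier:
        return "BAD_TOA"
    if is_event:
        return "EVENT"
    return "KEEP"


rows = [
    {"bad_mad": True},
    {"bad_point": True, "event_member": True},
    {"event_member": True},
    {},
]
decisions = [compact_decision(row) for row in rows]
```

Missing columns count as unset, so a CSV that lacks an optional detector column still yields a decision for every row.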

References

PQC docs: https://golamshaifullah.github.io/pqc/index.html

Generate comparison and QC reports from pipeline outputs.

This module builds change reports, model-comparison summaries, and outlier tables from tempo2 outputs parsed by pleb.parsers.

See also

pleb.parsers: Parsing helpers for tempo2 logs.
pleb.pipeline.run_pipeline: Orchestrates report generation.

Infer system flags for EPTA-style tempo2 FORMAT 1 .tim files.

Goal:

When -sys/-group/-pta are missing (and sometimes -be missing), infer them cheaply and consistently.
If bandwidth (-bw) and number-of-bands (-nchan/-nband) are available, assign sub-band systems by binning frequencies into equal-width sub-bands.
Keep system format: <TEL>.<BACKEND>.<CENTRE_MHZ> (used with “-sys” flag)
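The equal-width binning can be sketched as below. The helper names are hypothetical, and so is the assumption that the overall band centre is known alongside the bandwidth and band count. The telescope and backend labels in the usage lines are placeholders, and rounding the centre to an integer MHz is an illustrative choice.

```python
def subband_centre(freq_mhz, band_centre_mhz, bw_mhz, nband):
    """Return the centre (MHz) of the equal-width sub-band containing freq_mhz."""
    low_edge = band_centre_mhz - bw_mhz / 2.0
    width = bw_mhz / nband
    index = int((freq_mhz - low_edge) // width)
    # Clamp so frequencies on or beyond the band edges fall in the end bins.
    index = min(max(index, 0), nband - 1)
    return low_edge + (index + 0.5) * width


def system_name(tel, backend, centre_mhz):
    """Format <TEL>.<BACKEND>.<CENTRE_MHZ> for use with the -sys flag."""
    return f"{tel}.{backend}.{round(centre_mhz)}"


# A 200 MHz band centred on 1400 MHz, split into two sub-bands.
low = subband_centre(1320.0, band_centre_mhz=1400.0, bw_mhz=200.0, nband=2)
high = subband_centre(1480.0, band_centre_mhz=1400.0, bw_mhz=200.0, nband=2)
edge = subband_centre(1500.0, band_centre_mhz=1400.0, bw_mhz=200.0, nband=2)
name = system_name("TEL", "BACKEND", low)
```

Clamping keeps a TOA recorded exactly at the upper band edge in the last sub-band instead of creating a spurious extra system.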

Design choices (cheap + robust):

Only TOA lines are processed; directives/comments are preserved.
We never try to infer a header; we assume FORMAT 1 and use the 2nd column as frequency (MHz).
We drop/ignore any TOA lines whose frequency is non-numeric.
Backend inference:
1. per-TOA “-be” flag if present
2. filename stem heuristic: <TEL>.<BACKEND>….tim
3. otherwise raise BackendMissingError with a sample TOA line for the UI to show the user
Second pass canonicalisation across pulsars:

Use canonicalise_centres() on a combined table of inferred centres to “snap” them across pulsars within a tolerance (default 1 MHz).
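One way to picture the snapping step is a sort-and-group pass over the inferred centres. The snap_centres function below is a hypothetical sketch and does not reproduce the signature or behaviour of canonicalise_centres(). It assumes that centres separated by consecutive gaps within the tolerance share one canonical value, taken here as the rounded cluster mean.

```python
def snap_centres(centres_mhz, tol_mhz=1.0):
    """Map each inferred centre to a canonical value shared by its cluster."""
    mapping = {}
    cluster = []

    def flush():
        # Assign the rounded cluster mean to every member, then reset.
        if cluster:
            canonical = round(sum(cluster) / len(cluster))
            for centre in cluster:
                mapping[centre] = canonical
            cluster.clear()

    for centre in sorted(set(centres_mhz)):
        # A gap larger than the tolerance starts a new cluster.
        if cluster and centre - cluster[-1] > tol_mhz:
            flush()
        cluster.append(centre)
    flush()
    return mapping


# Centres inferred independently for several pulsars (illustrative values).
inferred = [1396.0, 1396.4, 1397.0, 2639.0, 2639.8]
snapped = snap_centres(inferred, tol_mhz=1.0)
```

Chained grouping like this can merge a long run of closely spaced centres into one cluster wider than the tolerance, so a production version might cap the cluster width as well.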

See also

pleb.dataset_fix.infer_and_apply_system_flags: Integration point for FixDataset.

tempo2 execution helpers for the pipeline.

This module wraps the tempo2 CLI invocation used by the pipeline and parameter-scan workflows. It assumes tempo2 is available inside a Singularity or Apptainer container.

See also

pleb.pipeline.run_pipeline: Main workflow orchestration.
pleb.param_scan.run_param_scan: Fit-only parameter scan workflow.

Robust .tim file parsing utilities.

These helpers implement tolerant parsing and filtering for tempo2 .tim files that contain mixed headers, directives, and TOA rows.

General utility helpers for the pipeline.

These helpers provide small filesystem utilities and shared path conventions used across pipeline modules.


pleb - The EPTA Data Combination Pipeline
