API Reference

Run the data-combination diagnostics pipeline.

This package exposes a small public API for running the full pipeline, running parameter scans, and applying FixDataset operations programmatically. The implementation is refactored from the original notebooks and designed to import quickly; heavy dependencies (GitPython, libstempo/pqc) are imported lazily by entry points.

Examples

Run the pipeline programmatically:

from pathlib import Path
from pleb import PipelineConfig, run_pipeline

cfg = PipelineConfig(
    home_dir=Path("/data/epta"),
    singularity_image=Path("/images/tempo2.sif"),
    dataset_name="EPTA",
)
outputs = run_pipeline(cfg)

Run a parameter scan:

from pathlib import Path
from pleb import PipelineConfig, run_param_scan

cfg = PipelineConfig(
    home_dir=Path("/data/epta"),
    singularity_image=Path("/images/tempo2.sif"),
    dataset_name="EPTA",
    param_scan_typical=True,
)
results = run_param_scan(cfg)

See also

pleb.pipeline.run_pipeline: Full pipeline implementation.

pleb.param_scan.run_param_scan: Parameter scan runner.

pleb.dataset_fix: FixDataset helpers.

Internal Modules (No Index)

Provide the command-line interface for the data-combination pipeline.

This module wires config loading/overrides to pleb.pipeline.run_pipeline() and pleb.param_scan.run_param_scan(), including convenience flags for parameter scans and PQC reporting.

Examples

Run the full pipeline from a JSON config:

python -m pleb.cli --config pipeline.json

Run a parameter scan with a typical profile:

python -m pleb.cli --config pipeline.toml --param-scan --scan-typical

Generate a PQC report from a run directory:

python -m pleb.cli qc-report --run-dir results/run_2024-01-01

See also

pleb.config.PipelineConfig: Configuration model.

pleb.pipeline.run_pipeline: Pipeline execution entry point.

pleb.param_scan.run_param_scan: Parameter scan entry point.

pleb.qc_report.generate_qc_report: QC report generator.

Define configuration models for the data-combination pipeline.

This module provides PipelineConfig, a flattened dataclass used by the CLI and pipeline entry points to control data ingestion, fitting, reporting, and optional FixDataset or parameter-scan stages. The config is intentionally flat to simplify JSON/TOML serialization and CLI overrides.
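To illustrate why a flat dataclass keeps serialization and overrides simple, here is a minimal sketch. `MiniConfig`, `to_json`, `from_json`, and `apply_overrides` are hypothetical names, not part of pleb; only the four field names are taken from the examples in this reference, and the real PipelineConfig has more fields and may serialize differently.

```python
import json
from dataclasses import asdict, dataclass, replace
from pathlib import Path
from typing import Any, Dict

# Fields that hold filesystem paths (assumption for this sketch).
PATH_FIELDS = ("home_dir", "singularity_image")


@dataclass(frozen=True)
class MiniConfig:
    """Hypothetical stand-in for PipelineConfig; not the real class."""

    home_dir: Path
    singularity_image: Path
    dataset_name: str
    param_scan_typical: bool = False


def to_json(cfg: MiniConfig) -> str:
    # A flat dataclass maps onto one JSON object; Path values become strings.
    payload = {
        key: (str(value) if isinstance(value, Path) else value)
        for key, value in asdict(cfg).items()
    }
    return json.dumps(payload, indent=2, sort_keys=True)


def from_json(text: str) -> MiniConfig:
    raw = json.loads(text)
    for name in PATH_FIELDS:
        raw[name] = Path(raw[name])
    return MiniConfig(**raw)


def apply_overrides(cfg: MiniConfig, overrides: Dict[str, Any]) -> MiniConfig:
    # With no nesting, a CLI override is a plain field replacement.
    # Unknown keys raise TypeError, which surfaces typos early.
    return replace(cfg, **overrides)
```

Because every setting lives at the top level, a command-line flag can map directly to one field name without any path syntax for nested sections.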

See also

pleb.pipeline.run_pipeline: Main pipeline entry point.

pleb.param_scan.run_param_scan: Parameter scan entry point.

pleb.cli: Command-line interface that consumes PipelineConfig.

FixDataset utilities for cleaning and normalizing pulsar datasets.

This module implements deterministic file-level transformations for .par and .tim trees: include maintenance, flag standardization, deduplication, JUMP maintenance, declarative relabel/overlap rules, and optional QC-driven comment/delete actions.

Notes

Statistical operations in this module are lightweight and mostly descriptive:

  • Reference-system selection uses a weighted deterministic score that favors lower timing RMS, longer Tspan, better TOA precision, denser cadence, and broader overlap support across backend timfiles and MJD coverage.

  • Deduplication can use a user-defined tolerance or a frequency tolerance derived automatically from the channel spacing.

  • QC application consumes precomputed pqc flags; this module does not fit statistical models itself.
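The deduplication bullet above can be sketched as follows. This is an illustration under stated assumptions, not the module's implementation: the rule "half the median gap between distinct frequencies" and the function names `auto_freq_tolerance` and `deduplicate` are hypothetical.

```python
import statistics
from typing import List, Optional, Sequence, Tuple


def auto_freq_tolerance(
    freqs_mhz: Sequence[float], fraction: float = 0.5
) -> Optional[float]:
    """Assumed rule: a fraction of the median gap between distinct frequencies."""
    distinct = sorted(set(freqs_mhz))
    gaps = [b - a for a, b in zip(distinct, distinct[1:])]
    if not gaps:
        return None  # A single channel gives no spacing to derive from.
    return fraction * statistics.median(gaps)


def deduplicate(
    toas: Sequence[Tuple[float, float]], mjd_tol: float, freq_tol: float
) -> List[Tuple[float, float]]:
    """Keep the first of any (mjd, freq_mhz) pairs that agree within both tolerances."""
    kept: List[Tuple[float, float]] = []
    for mjd, freq in sorted(toas):
        duplicate = any(
            abs(mjd - k_mjd) <= mjd_tol and abs(freq - k_freq) <= freq_tol
            for k_mjd, k_freq in kept
        )
        if not duplicate:
            kept.append((mjd, freq))
    return kept
```

Sorting first makes the result independent of input order, which matches the deterministic behaviour described for this module.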

Worked example

For a variant J1713+0747_new_all.tim, reference-system generation proceeds as follows:

  1. Split included backend timfiles by -sys.

  2. Measure timing RMS, Tspan, TOA precision, cadence, and overlap-support diagnostics per system.

  3. Choose the reference system with the best weighted score across those metrics.

  4. Write J1713+0747_new.par with JUMP -sys <system> 0 <fitflag> lines.
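The steps above can be sketched as a small scoring routine. Everything here is an assumption made for illustration: the weights, the min-max normalisation, the metric field names, the tie-break by name, and the choice to emit JUMP lines for every system except the reference. The system names and metric values in the usage note are likewise illustrative.

```python
from dataclasses import dataclass
from typing import Dict, List, Sequence


@dataclass(frozen=True)
class SystemMetrics:
    name: str
    rms_us: float            # timing RMS (lower is better)
    tspan_days: float        # time span (longer is better)
    median_err_us: float     # TOA precision (lower is better)
    toas_per_year: float     # cadence (denser is better)
    overlap_fraction: float  # overlap support (broader is better)


# Hypothetical weights; the real module may use different values.
WEIGHTS: Dict[str, float] = {
    "rms_us": 0.30,
    "tspan_days": 0.20,
    "median_err_us": 0.20,
    "toas_per_year": 0.15,
    "overlap_fraction": 0.15,
}
LOWER_IS_BETTER = {"rms_us", "median_err_us"}


def _normalise(values: Sequence[float], lower_is_better: bool) -> List[float]:
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.5] * len(values)  # Metric carries no information.
    scaled = [(v - lo) / (hi - lo) for v in values]
    return [1.0 - s for s in scaled] if lower_is_better else scaled


def choose_reference(systems: Sequence[SystemMetrics]) -> str:
    scores = {s.name: 0.0 for s in systems}
    for metric, weight in WEIGHTS.items():
        column = [getattr(s, metric) for s in systems]
        normalised = _normalise(column, metric in LOWER_IS_BETTER)
        for system, value in zip(systems, normalised):
            scores[system.name] += weight * value
    # Highest score wins; ties break alphabetically so the result is deterministic.
    return min(scores, key=lambda name: (-scores[name], name))


def jump_lines(
    systems: Sequence[SystemMetrics], reference: str, fitflag: int = 1
) -> List[str]:
    names = sorted(s.name for s in systems if s.name != reference)
    return [f"JUMP -sys {name} 0 {fitflag}" for name in names]
```

For example, passing three `SystemMetrics` entries where one has the lowest RMS, longest span, smallest errors, densest cadence, and broadest overlap returns that entry's name from `choose_reference`, and `jump_lines` then yields one JUMP line for each of the other two.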

See Also

pleb.config.PipelineConfig

Pipeline-level integration settings.

pleb.pipeline.run_pipeline

Orchestrates FixDataset execution and reporting.

Legacy robust .tim parsing import path.

## GMS: Check if this is actually used.

This module is kept for backward compatibility and re-exports the canonical reader from pleb.tim_reader.

Legacy dataset-fix utilities extracted from FixDataset.ipynb.

## GMS: Check if these are actually used. Might be junk.

These functions implement the original notebook’s dataset correction features:

  • TIM fixes: whitespace/padd cleanup, missing flag insertion (-be/-pta/-group/-sys), NUPPI splitting, overlap removal, missing tim INCLUDE updates.

  • PAR fixes: ensure ephem/clk/ne_sw, add missing JUMPs, coordinate conversion helpers, optional param additions.

They are intentionally kept close to the notebook logic for parity.

Lightweight Git helpers used by the pipeline.

These helpers wrap common GitPython operations with minimal logging.

Kepler/orbit helper functions.

This module is adapted from the AnalysePulsars notebook and provides small, float-based orbital conversions suitable for tempo2-style workflows.

Notes

The original notebook mixed unit-aware calculations (Astropy) and raw floats. Here we keep the core conversions in plain floats and expose more advanced solvers behind optional SciPy/Astropy imports when needed.
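As an illustration of the kind of plain-float conversions described here, the sketch below solves Kepler's equation by Newton iteration and evaluates the binary mass function from tempo2-style PB (days) and A1 (light-seconds). The function names are hypothetical and are not claimed to match this module's API; the formulas themselves are standard.

```python
import math

T_SUN_S = 4.925490947e-6   # G * M_sun / c^3 in seconds
SECONDS_PER_DAY = 86400.0


def eccentric_anomaly(
    mean_anomaly: float, ecc: float, tol: float = 1e-12, max_iter: int = 50
) -> float:
    """Solve E - ecc * sin(E) = M for E (radians), with M reduced to [0, 2*pi)."""
    if not 0.0 <= ecc < 1.0:
        raise ValueError("eccentricity must satisfy 0 <= ecc < 1")
    m = math.fmod(mean_anomaly, 2.0 * math.pi)
    if m < 0.0:
        m += 2.0 * math.pi
    e_anom = m if ecc < 0.8 else math.pi  # Safer start for high eccentricity.
    for _ in range(max_iter):
        step = (e_anom - ecc * math.sin(e_anom) - m) / (1.0 - ecc * math.cos(e_anom))
        e_anom -= step
        if abs(step) < tol:
            break
    return e_anom


def true_anomaly(e_anom: float, ecc: float) -> float:
    """Convert eccentric anomaly to true anomaly (radians)."""
    return 2.0 * math.atan2(
        math.sqrt(1.0 + ecc) * math.sin(e_anom / 2.0),
        math.sqrt(1.0 - ecc) * math.cos(e_anom / 2.0),
    )


def mass_function(pb_days: float, a1_ls: float) -> float:
    """Binary mass function in solar masses: 4 pi^2 x^3 / (T_sun * Pb^2)."""
    pb_s = pb_days * SECONDS_PER_DAY
    return 4.0 * math.pi**2 * a1_ls**3 / (T_SUN_S * pb_s**2)
```

Working in seconds and light-seconds lets the solar-mass constant absorb G and c, so no unit library is needed for this calculation.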

Legacy dataset-fix utilities extracted from FixDataset.ipynb.

## GMS: Check where this is used. There are at least two versions of this which are just dead code.

These functions implement the original notebook’s dataset correction features:

  • TIM fixes: whitespace/padd cleanup, missing flag insertion (-be/-pta/-group/-sys), NUPPI splitting, overlap removal, missing tim INCLUDE updates.

  • PAR fixes: ensure ephem/clk/ne_sw, add missing JUMPs, coordinate conversion helpers, optional param additions.

They are intentionally kept close to the notebook logic for parity.

Logging helpers for the pipeline package.

This module provides a small logger factory that writes to stdout and a timestamped log file under logs/ (or $PLEB_LOG_DIR).
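A logger factory of this shape might look like the sketch below, built on the standard library only. The name `make_logger`, the log-file naming pattern, and the message format are assumptions for illustration; only the stdout handler, the timestamped file, and the logs/ or $PLEB_LOG_DIR location come from the description above.

```python
import logging
import os
import sys
from datetime import datetime
from pathlib import Path
from typing import Optional


def make_logger(name: str, log_dir: Optional[Path] = None) -> logging.Logger:
    """Return a logger that writes to stdout and to a timestamped file."""
    # An explicit argument wins, then $PLEB_LOG_DIR, then ./logs.
    directory = Path(log_dir or os.environ.get("PLEB_LOG_DIR", "logs"))
    directory.mkdir(parents=True, exist_ok=True)

    logger = logging.getLogger(name)
    logger.setLevel(logging.INFO)
    if logger.handlers:
        return logger  # Already configured; avoid duplicate output.

    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    formatter = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    handlers = (
        logging.StreamHandler(sys.stdout),
        logging.FileHandler(directory / f"{name}_{stamp}.log"),
    )
    for handler in handlers:
        handler.setFormatter(formatter)
        logger.addHandler(handler)
    return logger
```

Checking `logger.handlers` before attaching new ones keeps repeated calls with the same name from duplicating every message.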

Optional PQC integration for per-TOA quality control.

This module converts PTAQCConfig values into pqc detector configuration objects, runs pqc per pulsar, and writes a per-TOA QC table. It also supports per-backend override profiles and subprocess isolation for libstempo-related crashes.

Notes

The QC CSV produced here is intentionally diagnostic rather than a final truth label. It carries detector outputs such as:

  • bad_ou: outlier evidence from the OU-innovation bad-measurement model.

  • bad_mad: robust outlier evidence from MAD-based detectors.

  • bad_hard: optional hard sigma-gate failures.

  • bad_point: combined outlier indicator after event-aware reconciliation.

  • event_member: coherent-event membership (transient/step/solar/etc.).

  • outlier_any: compatibility field from pqc (bad_point OR event).

Statistical concepts used by the wrapped PQC pipeline include:

  1. False discovery rate (FDR) control: controls the expected fraction of false positives among detections. The Benjamini-Hochberg rule sorts the m p-values in ascending order, finds the largest rank k with p_(k) <= (k/m) q for target FDR q, and flags the tests ranked 1 through k.

  2. OU-correlated innovations: residuals are tested under a short-timescale correlated (Ornstein-Uhlenbeck) process to avoid over-flagging clustered noise as independent outliers.

  3. Robust z-scores with MAD: the robust scale estimate MAD = median(|x - median(x)|), with sigma_robust ~= 1.4826 * MAD (for Gaussian data), gives the outlier score |x - median(x)| / sigma_robust.

  4. Delta-chi-square model comparison: event detectors compare null and alternative local models; a large Delta chi^2 supports a structured deviation.
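Two of the concepts above, Benjamini-Hochberg FDR control and MAD-based robust z-scores, are compact enough to sketch directly. This is a standalone illustration of the textbook procedures, not pqc's implementation; the function names are hypothetical.

```python
import statistics
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float], q: float = 0.05) -> List[bool]:
    """Return one flag per input p-value: True where the test is rejected."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest rank k with p_(k) <= (k / m) * q.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * q:
            k = rank
    rejected = [False] * m
    for idx in order[:k]:
        rejected[idx] = True  # Every test ranked at or below k is rejected.
    return rejected


def robust_z(values: Sequence[float]) -> List[float]:
    """Outlier score |x - median| / (1.4826 * MAD) for each value."""
    med = statistics.median(values)
    mad = statistics.median([abs(v - med) for v in values])
    sigma = 1.4826 * mad  # Consistent with the standard deviation for Gaussian data.
    if sigma == 0.0:
        raise ValueError("MAD is zero; robust z-scores are undefined")
    return [abs(v - med) / sigma for v in values]
```

Note that the step-up rule can reject a p-value that fails its own rank threshold, as long as a larger-ranked p-value passes; that is why the loop tracks the largest passing rank rather than stopping at the first failure.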

Worked example

If residuals in one backend include a coherent eclipse-like dip, robust/MAD detectors may initially mark those TOAs. Event detectors can then model the dip and reclassify those points as event_member so bad_point is cleared.
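The reclassification in this worked example can be sketched with plain dictionaries. The combination rule below (any detector hit counts, and event membership clears bad_point) is an assumption chosen to match the column descriptions above; pqc's actual reconciliation may be more involved, and `reconcile` is a hypothetical name.

```python
from typing import Dict, List, Sequence


def reconcile(rows: Sequence[Dict[str, bool]]) -> List[Dict[str, bool]]:
    """Derive bad_point and outlier_any from detector flags and event membership."""
    result = []
    for row in rows:
        detector_hit = any(
            bool(row.get(col, False)) for col in ("bad_ou", "bad_mad", "bad_hard")
        )
        event = bool(row.get("event_member", False))
        # Assumed rule: a TOA explained by a coherent event is not a bad point.
        bad_point = detector_hit and not event
        result.append(
            {**row, "bad_point": bad_point, "outlier_any": bad_point or event}
        )
    return result
```

In the eclipse-like dip scenario, a TOA with `bad_mad=True` and `event_member=True` ends up with `bad_point=False` while `outlier_any` stays `True`, so downstream tools that read only the compatibility field still see it.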

See Also

pleb.pipeline.run_pipeline

Pipeline stage that invokes this module.

pleb.dataset_fix.apply_pqc_outliers

Optional downstream action stage (comment/delete flagged TOAs).

Parse tempo2 outputs and pipeline text formats.

This module provides small, resilient parsers for tempo2 logs and output artifacts such as plk logs, covariance matrices, and general2 output.

See also

pleb.reports: Utilities that consume parsed outputs.

pleb.tim_reader.read_tim_file_robust: Robust .tim reader used here.

Parameter-scan utilities for rapid fit diagnostics.

This module implements a fit-only workflow that evaluates candidate parameter additions or edits by running tempo2 on temporary .par variants. It is used by the CLI --param-scan mode.

See also

pleb.pipeline.run_pipeline: Full pipeline workflow.

pleb.reports.write_new_param_significance: Related reporting utilities.

Orchestrate the data-combination pipeline end to end.

This module coordinates git branch management, tempo2 runs, report generation, and optional quality-control steps. It stitches together the core building blocks in pleb.tempo2, pleb.reports, and pleb.dataset_fix.

See also

pleb.config.PipelineConfig: Primary configuration model.

pleb.param_scan.run_param_scan: Fit-only parameter scan workflow.

Plotting helpers for pipeline outputs.

This module renders summary plots and tables from tempo2 outputs and pipeline metadata. It relies on Matplotlib and optionally Seaborn for styling.

See also

pleb.reports: Tabular report generation.

pleb.parsers: Parsing helpers used by plots.

Binary/orbital analysis helpers for pulsar .par files.

This module provides lightweight parsing and derived-parameter calculations intended for summary reports, not full timing-model validation.

See also

pleb.kepler_orbits: Orbital mechanics helpers used in derived quantities.

pleb.config.PipelineConfig: Enables binary analysis in the pipeline.

Generate report artifacts from existing pqc CSV outputs.

This module is a post-processing/reporting layer. It does not re-run pqc; instead it reads *_qc.csv files, renders helper-script diagnostics, and can assemble a compact PDF with actionable per-backend tables.

Notes

Compact decisions are derived from two logical sets:

  • outlier set (by default the union of outlier_any, bad_point, robust/bad-mad columns, etc.)

  • event set (transient, solar, eclipse, Gaussian bump, glitch, orbital flags)

Decision rules:

  • BAD_TOA: outlier and not event

  • REVIEW_EVENT: outlier and event

  • EVENT: event and not outlier

  • KEEP: neither set
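The four decision rules above reduce to a small truth table. The sketch below encodes it directly; the function names are hypothetical, and the column names passed in are up to the caller rather than a statement of the module's defaults.

```python
from typing import Iterable, Mapping


def compact_decision(is_outlier: bool, is_event: bool) -> str:
    """Map membership in the outlier and event sets to a compact label."""
    if is_outlier and is_event:
        return "REVIEW_EVENT"
    if is_outlier:
        return "BAD_TOA"
    if is_event:
        return "EVENT"
    return "KEEP"


def decide_row(
    row: Mapping[str, bool],
    outlier_cols: Iterable[str],
    event_cols: Iterable[str],
) -> str:
    # A row belongs to a set if any of that set's columns is truthy.
    is_outlier = any(bool(row.get(col, False)) for col in outlier_cols)
    is_event = any(bool(row.get(col, False)) for col in event_cols)
    return compact_decision(is_outlier, is_event)
```

Missing columns are treated as False, so a CSV produced with some detectors disabled still yields a decision for every row.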

Generate comparison and QC reports from pipeline outputs.

This module builds change reports, model-comparison summaries, and outlier tables from tempo2 outputs parsed by pleb.parsers.

See also

pleb.parsers: Parsing helpers for tempo2 logs.

pleb.pipeline.run_pipeline: Orchestrates report generation.

Infer system flags for EPTA-style tempo2 FORMAT 1 .tim files.

Goal:

  • When -sys/-group/-pta are missing (and sometimes -be missing), infer them cheaply and consistently.

  • If bandwidth (-bw) and number-of-bands (-nchan/-nband) are available, assign sub-band systems by binning frequencies into equal-width sub-bands.

  • Keep system format: <TEL>.<BACKEND>.<CENTRE_MHZ> (used with “-sys” flag)

Design choices (cheap + robust):

  • Only TOA lines are processed; directives/comments are preserved.

  • We never try to infer a header; we assume FORMAT 1 and use the 2nd column as frequency (MHz).

  • We drop/ignore any TOA lines whose frequency is non-numeric.

  • Backend inference:

    1. per-TOA “-be” flag if present

    2. filename stem heuristic: <TEL>.<BACKEND>….tim

    3. otherwise raise BackendMissingError with a sample TOA line for the UI to show the user

  • Second pass canonicalisation across pulsars:

    Use canonicalise_centres() on a combined table of inferred centres to “snap” them across pulsars within a tolerance (default 1 MHz).
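The sub-band binning and centre snapping described above can be sketched as follows. All function names here are hypothetical (in particular `snap_centres` is a stand-in, not the real canonicalise_centres()), the band centre is assumed to be known by the caller, and the choice of the group mean as the canonical value is an assumption.

```python
import math
from typing import Dict, List, Sequence


def subband_centre(
    freq_mhz: float, band_centre_mhz: float, bw_mhz: float, nband: int
) -> float:
    """Assign a frequency to one of nband equal-width sub-bands; return its centre."""
    width = bw_mhz / nband
    low_edge = band_centre_mhz - bw_mhz / 2.0
    index = int(math.floor((freq_mhz - low_edge) / width))
    index = min(max(index, 0), nband - 1)  # Clamp frequencies at or past the edges.
    return low_edge + (index + 0.5) * width


def system_name(tel: str, backend: str, centre_mhz: float) -> str:
    """Format <TEL>.<BACKEND>.<CENTRE_MHZ> with the centre rounded to whole MHz."""
    return f"{tel}.{backend}.{int(round(centre_mhz))}"


def snap_centres(centres: Sequence[float], tol_mhz: float = 1.0) -> Dict[float, float]:
    """Map each centre to the mean of its group of neighbours within tol_mhz."""
    mapping: Dict[float, float] = {}
    group: List[float] = []

    def flush() -> None:
        if group:
            canonical = sum(group) / len(group)
            for value in group:
                mapping[value] = canonical

    for centre in sorted(set(centres)):
        if group and centre - group[-1] > tol_mhz:
            flush()
            group.clear()
        group.append(centre)
    flush()
    return mapping
```

For a 200 MHz band centred on 1400 MHz split into four sub-bands, a TOA at 1320 MHz falls in the lowest sub-band and `subband_centre` returns 1325.0.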

See also

pleb.dataset_fix.infer_and_apply_system_flags: Integration point for FixDataset.

tempo2 execution helpers for the pipeline.

This module wraps the tempo2 CLI invocation used by the pipeline and parameter-scan workflows. It assumes tempo2 is available inside a Singularity or Apptainer container.
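A containerised tempo2 call of the kind described here might be assembled as in the sketch below. The helper names and the decision to bind a single directory are assumptions; the exact arguments this module passes are not documented in this reference. The sketch relies only on the `exec` and `--bind` options shared by Singularity and Apptainer and on tempo2's `-f <par> <tim>` form.

```python
import shutil
import subprocess
from typing import List, Sequence


def build_tempo2_command(
    image: str,
    par: str,
    tim: str,
    bind_dir: str,
    runtime: str = "singularity",
    extra_args: Sequence[str] = (),
) -> List[str]:
    """Build the argument list without running anything."""
    return [
        runtime, "exec",
        "--bind", str(bind_dir),   # Make the data directory visible in the container.
        str(image),
        "tempo2", "-f", str(par), str(tim),
        *extra_args,
    ]


def run_tempo2(cmd: Sequence[str], cwd: str) -> subprocess.CompletedProcess:
    """Run a prepared command and capture its output as text."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"container runtime not found on PATH: {cmd[0]}")
    return subprocess.run(list(cmd), cwd=cwd, capture_output=True, text=True, check=False)
```

Separating command construction from execution makes the command easy to log and to unit-test on machines where no container runtime is installed.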

See also

pleb.pipeline.run_pipeline: Main workflow orchestration.

pleb.param_scan.run_param_scan: Fit-only parameter scan workflow.

Robust .tim file parsing utilities.

These helpers implement tolerant parsing and filtering for tempo2 .tim files that contain mixed headers, directives, and TOA rows.
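A tolerant reader for such files can be sketched as a line classifier plus a forgiving TOA parser. This is an illustration only: the names are hypothetical, the directive list is deliberately partial, and the comment conventions (`#` or a leading `C `) are assumed for tempo2 FORMAT 1 files. The MJD is kept as a string so that no precision is lost.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional

# Partial list of tempo2 .tim directives (assumption: extend as needed).
DIRECTIVES = {
    "FORMAT", "MODE", "INCLUDE", "TIME", "EFAC", "EQUAD",
    "JUMP", "SKIP", "NOSKIP", "END",
}


@dataclass
class Toa:
    name: str
    freq_mhz: float
    mjd: str          # Kept as text to preserve every digit.
    err_us: float
    site: str
    flags: Dict[str, str] = field(default_factory=dict)


def classify_line(line: str) -> str:
    stripped = line.strip()
    if not stripped:
        return "blank"
    if stripped.startswith("#") or stripped.startswith("C "):
        return "comment"
    if stripped.split()[0].upper() in DIRECTIVES:
        return "directive"
    return "toa"


def parse_toa(line: str) -> Optional[Toa]:
    """Parse 'name freq mjd err site [-flag value ...]'; return None if malformed."""
    tokens = line.split()
    if len(tokens) < 5:
        return None
    try:
        freq = float(tokens[1])
        float(tokens[2])          # Validate the MJD without storing the float.
        err = float(tokens[3])
    except ValueError:
        return None
    flags: Dict[str, str] = {}
    rest = tokens[5:]
    i = 0
    while i < len(rest):
        key = rest[i]
        # A flag key is '-' followed by a letter, so negative values are not keys.
        if key.startswith("-") and key[1:2].isalpha() and i + 1 < len(rest):
            flags[key] = rest[i + 1]
            i += 2
        else:
            i += 1
    return Toa(tokens[0], freq, tokens[2], err, tokens[4], flags)
```

Returning None for malformed rows, instead of raising, lets a caller count and report rejected lines while still processing the rest of the file.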

General utility helpers for the pipeline.

These helpers provide small filesystem utilities and shared path conventions used across pipeline modules.