API Reference
Run the data-combination diagnostics pipeline.
This package exposes a small public API for running the full pipeline, running parameter scans, and applying FixDataset operations programmatically. The implementation is refactored from the original notebooks and designed to import quickly; heavy dependencies (GitPython, libstempo/pqc) are imported lazily by entry points.
Examples
Run the pipeline programmatically:
from pathlib import Path

from pleb import PipelineConfig, run_pipeline

cfg = PipelineConfig(
    home_dir=Path("/data/epta"),
    singularity_image=Path("/images/tempo2.sif"),
    dataset_name="EPTA",
)
outputs = run_pipeline(cfg)
Run a parameter scan:
from pathlib import Path

from pleb import PipelineConfig, run_param_scan

cfg = PipelineConfig(
    home_dir=Path("/data/epta"),
    singularity_image=Path("/images/tempo2.sif"),
    dataset_name="EPTA",
    param_scan_typical=True,
)
results = run_param_scan(cfg)
See also
pleb.pipeline.run_pipeline: Full pipeline implementation.
pleb.param_scan.run_param_scan: Parameter scan runner.
pleb.dataset_fix: FixDataset helpers.
Internal Modules (No Index)
Provide the command-line interface for the data-combination pipeline.
This module wires config loading/overrides to pleb.pipeline.run_pipeline()
and pleb.param_scan.run_param_scan(), including convenience flags for
parameter scans and PQC reporting.
Examples
Run the full pipeline from a JSON config:
python -m pleb.cli --config pipeline.json
Run a parameter scan with a typical profile:
python -m pleb.cli --config pipeline.toml --param-scan --scan-typical
Generate a PQC report from a run directory:
python -m pleb.cli qc-report --run-dir results/run_2024-01-01
See also
pleb.config.PipelineConfig: Configuration model.
pleb.pipeline.run_pipeline: Pipeline execution entry point.
pleb.param_scan.run_param_scan: Parameter scan entry point.
pleb.qc_report.generate_qc_report: QC report generator.
Define configuration models for the data-combination pipeline.
This module provides PipelineConfig, a flattened dataclass used by the
CLI and pipeline entry points to control data ingestion, fitting, reporting,
and optional FixDataset or parameter-scan stages. The config is intentionally
flat to simplify JSON/TOML serialization and CLI overrides.
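A minimal sketch of why a flat dataclass serializes and overrides cleanly. DemoConfig is a hypothetical stand-in, not the real PipelineConfig; its four fields are borrowed from the examples on this page, and the helper names to_json, from_json, and apply_overrides are assumptions made for illustration.

```python
import json
from dataclasses import asdict, dataclass, replace
from pathlib import Path

# Fields that hold filesystem paths (assumed for this demo only).
PATH_FIELDS = ("home_dir", "singularity_image")


@dataclass(frozen=True)
class DemoConfig:
    """Hypothetical stand-in for PipelineConfig; not the real class."""
    home_dir: Path
    singularity_image: Path
    dataset_name: str
    param_scan_typical: bool = False


def to_json(cfg: DemoConfig) -> str:
    # Flat fields map one-to-one onto JSON keys; Paths become strings.
    data = {k: str(v) if isinstance(v, Path) else v
            for k, v in asdict(cfg).items()}
    return json.dumps(data, sort_keys=True)


def from_json(text: str) -> DemoConfig:
    raw = json.loads(text)
    for name in PATH_FIELDS:
        raw[name] = Path(raw[name])
    return DemoConfig(**raw)


def apply_overrides(cfg: DemoConfig, **overrides) -> DemoConfig:
    # CLI-style overrides are plain keyword replacements on a flat config.
    return replace(cfg, **overrides)


cfg = DemoConfig(Path("/data/epta"), Path("/images/tempo2.sif"), "EPTA")
scan_cfg = apply_overrides(cfg, param_scan_typical=True)
```

With no nested sections, each override is a single key, which is presumably what keeps the CLI mapping simple.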
See also
pleb.pipeline.run_pipeline: Main pipeline entry point.
pleb.param_scan.run_param_scan: Parameter scan entry point.
pleb.cli: Command-line interface that consumes PipelineConfig.
FixDataset utilities for cleaning and normalizing pulsar datasets.
This module implements deterministic file-level transformations for .par
and .tim trees: include maintenance, flag standardization,
deduplication, JUMP maintenance, declarative relabel/overlap rules, and
optional QC-driven comment/delete actions.
Notes
Statistical operations in this module are lightweight and mostly descriptive:
Reference-system selection uses a weighted deterministic score that favors lower timing RMS, longer Tspan, better TOA precision, denser cadence, and broader overlap support across backend timfiles and MJD coverage.
Deduplication can use user-defined tolerance or auto-derived frequency tolerance from channel spacing.
QC application consumes precomputed pqc flags; this module does not fit statistical models itself.
Worked example
For a variant J1713+0747_new_all.tim, reference-system generation:
1. Split included backend timfiles by -sys.
2. Measure timing RMS, Tspan, TOA precision, cadence, and overlap-support diagnostics per system.
3. Choose the reference system with the best weighted score across those metrics.
4. Write J1713+0747_new.par with JUMP -sys <system> 0 <fitflag> lines.
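The scoring and JUMP-writing steps above can be sketched as follows. This is an illustration under stated assumptions, not the module's implementation: the weights, the min-max scaling, the SystemStats container, and the choice to omit a JUMP for the reference system are all hypothetical, and the system names in the usage lines are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemStats:
    name: str           # value of the -sys flag
    rms_us: float       # timing RMS (lower is better)
    tspan_yr: float     # time span (longer is better)
    toa_err_us: float   # typical TOA uncertainty (lower is better)
    toas_per_yr: float  # cadence (higher is better)
    overlap: float      # overlap support in [0, 1] (higher is better)


# Hypothetical weights (sum to 1); not taken from the module.
WEIGHTS = {"rms_us": 0.30, "tspan_yr": 0.25, "toa_err_us": 0.20,
           "toas_per_yr": 0.15, "overlap": 0.10}
LOWER_IS_BETTER = {"rms_us", "toa_err_us"}


def _unit_scale(values, lower_is_better):
    # Min-max scale to [0, 1] so that 1 is always "best".
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0] * len(values)
    scaled = [(v - lo) / (hi - lo) for v in values]
    return [1.0 - s for s in scaled] if lower_is_better else scaled


def score_systems(systems):
    totals = {s.name: 0.0 for s in systems}
    for metric, weight in WEIGHTS.items():
        column = _unit_scale([getattr(s, metric) for s in systems],
                             metric in LOWER_IS_BETTER)
        for s, value in zip(systems, column):
            totals[s.name] += weight * value
    return totals


def choose_reference(systems):
    totals = score_systems(systems)
    # Highest score wins; ties resolve by name so the result is deterministic.
    return min(totals, key=lambda name: (-totals[name], name))


def jump_lines(systems, reference, fitflag=1):
    # Assumption: every non-reference system gets one JUMP line.
    names = sorted(s.name for s in systems if s.name != reference)
    return [f"JUMP -sys {name} 0 {fitflag}" for name in names]


systems = [
    SystemStats("EFF.P217.1400", 0.5, 10.0, 0.3, 30.0, 1.0),
    SystemStats("JBO.ROACH.1520", 2.0, 5.0, 1.0, 12.0, 0.5),
    SystemStats("NRT.NUPPI.1484", 1.0, 8.0, 0.5, 20.0, 0.8),
]
reference = choose_reference(systems)
lines = jump_lines(systems, reference)
```

Because the score uses only sorted, scaled inputs and a name tiebreak, the same input tree always yields the same reference system.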
References
PQC docs (for consumed QC flags): https://golamshaifullah.github.io/pqc/index.html
See Also
- pleb.config.PipelineConfig
Pipeline-level integration settings.
- pleb.pipeline.run_pipeline
Orchestrates FixDataset execution and reporting.
Legacy robust .tim parsing import path.
## GMS: Check if this is actually used.
This module is kept for backward compatibility and re-exports the canonical
reader from pleb.tim_reader.
Legacy dataset-fix utilities extracted from FixDataset.ipynb.
## GMS: Check if these are actually used. Might be junk.
These functions implement the original notebook’s dataset correction features:
- TIM fixes: whitespace/padd cleanup, missing flag insertion (-be/-pta/-group/-sys), NUPPI splitting, overlap removal, missing tim INCLUDE updates.
- PAR fixes: ensure ephem/clk/ne_sw, add missing JUMPs, coordinate conversion helpers, optional param additions.
They are intentionally kept close to the notebook logic for parity.
Lightweight Git helpers used by the pipeline.
These helpers wrap common GitPython operations with minimal logging.
Kepler/orbit helper functions.
This module is adapted from the AnalysePulsars notebook and provides small, float-based orbital conversions suitable for tempo2-style workflows.
Notes
The original notebook mixed unit-aware calculations (Astropy) and raw floats. Here we keep the core conversions in plain floats and expose more advanced solvers behind optional SciPy/Astropy imports when needed.
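As an illustration of the plain-float style described here, the sketch below computes the binary mass function and solves Kepler's equation with the standard library only. The function names are hypothetical and are not claimed to match this module's API; the formulas are the standard ones, with A1 in light-seconds and PB in days as in tempo2 .par files.

```python
import math

T_SUN_S = 4.925490947e-6   # G * M_sun / c^3, in seconds
SECONDS_PER_DAY = 86400.0


def mass_function(pb_days, a1_lt_s):
    """Mass function in solar masses: 4 pi^2 x^3 / (T_sun * Pb^2)."""
    pb_s = pb_days * SECONDS_PER_DAY
    return 4.0 * math.pi ** 2 * a1_lt_s ** 3 / (T_SUN_S * pb_s ** 2)


def eccentric_anomaly(mean_anomaly, ecc, tol=1e-12, max_iter=50):
    """Solve E - e*sin(E) = M for E (radians) by Newton iteration."""
    # Start from M for modest eccentricity, from pi for high eccentricity.
    ecc_anom = mean_anomaly if ecc < 0.8 else math.pi
    for _ in range(max_iter):
        residual = ecc_anom - ecc * math.sin(ecc_anom) - mean_anomaly
        step = residual / (1.0 - ecc * math.cos(ecc_anom))
        ecc_anom -= step
        if abs(step) < tol:
            break
    return ecc_anom


def true_anomaly(ecc_anom, ecc):
    """True anomaly (radians) from eccentric anomaly."""
    return 2.0 * math.atan2(math.sqrt(1.0 + ecc) * math.sin(ecc_anom / 2.0),
                            math.sqrt(1.0 - ecc) * math.cos(ecc_anom / 2.0))
```

Keeping these in raw floats avoids the Astropy import cost; unit handling is left to the caller.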
Legacy dataset-fix utilities extracted from FixDataset.ipynb.
## GMS: Check where this is used. There are at least two versions of this which are just dead code.
These functions implement the original notebook’s dataset correction features:
- TIM fixes: whitespace/padd cleanup, missing flag insertion (-be/-pta/-group/-sys), NUPPI splitting, overlap removal, missing tim INCLUDE updates.
- PAR fixes: ensure ephem/clk/ne_sw, add missing JUMPs, coordinate conversion helpers, optional param additions.
They are intentionally kept close to the notebook logic for parity.
Logging helpers for the pipeline package.
This module provides a small logger factory that writes to stdout and a
timestamped log file under logs/ (or $PLEB_LOG_DIR).
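A logger factory of this kind might look like the sketch below. The function name make_logger, the file-name pattern, and the message format are assumptions; only the stdout handler, the timestamped file, and the logs/ or $PLEB_LOG_DIR location come from the description above.

```python
import logging
import os
import sys
from datetime import datetime
from pathlib import Path


def make_logger(name="pleb_demo", log_dir=None):
    """Hypothetical factory: log to stdout and to a timestamped file."""
    logger = logging.getLogger(name)
    if logger.handlers:
        # Already configured; avoid attaching duplicate handlers.
        return logger

    # Explicit argument wins, then $PLEB_LOG_DIR, then ./logs.
    directory = Path(log_dir or os.environ.get("PLEB_LOG_DIR", "logs"))
    directory.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now().strftime("%Y%m%d_%H%M%S")
    log_path = directory / f"{name}_{stamp}.log"

    fmt = logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
    handlers = (logging.StreamHandler(sys.stdout), logging.FileHandler(log_path))
    for handler in handlers:
        handler.setFormatter(fmt)
        logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

Calling make_logger twice with the same name returns the same configured logger, so repeated imports do not double every message.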
Optional PQC integration for per-TOA quality control.
This module converts PTAQCConfig values into pqc detector
configuration objects, runs pqc per pulsar, and writes a per-TOA QC table.
It also supports per-backend override profiles and subprocess isolation for
libstempo-related crashes.
Notes
The QC CSV produced here is intentionally diagnostic rather than a final truth label. It carries detector outputs such as:
bad_ou: outlier evidence from the OU-innovation bad-measurement model.
bad_mad: robust outlier evidence from MAD-based detectors.
bad_hard: optional hard sigma-gate failures.
bad_point: combined outlier indicator after event-aware reconciliation.
event_member: coherent-event membership (transient/step/solar/etc.).
outlier_any: compatibility field from pqc (bad_point OR event).
Statistical concepts used by the wrapped PQC pipeline include:
False discovery rate (FDR) control: Controls the expected fraction of false positives among detections. The Benjamini-Hochberg step-up rule sorts the m p-values as p_(1) <= ... <= p_(m), finds the largest rank k with p_(k) <= (k/m) q for target FDR q, and flags the tests ranked 1 through k.
OU-correlated innovations: Residuals are tested under a short-timescale correlated process to avoid over-flagging clustered noise as independent outliers.
Robust z-scores with MAD: The robust scale estimate MAD = median(|x - median(x)|), with sigma_robust ~= 1.4826 * MAD (for Gaussian data), gives the outlier score |x - median(x)| / sigma_robust.
Delta-chi-square model comparison: Event detectors compare null vs alternative local models; a large Delta chi^2 supports structured deviations.
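Two of these concepts are small enough to sketch directly. The code below is a textbook rendering of the Benjamini-Hochberg step-up rule and the MAD-based robust z-score using only the standard library; it is not pqc's implementation, and the function names are chosen for illustration.

```python
from statistics import median


def benjamini_hochberg(pvalues, q=0.05):
    """Return booleans, True where a test is flagged at target FDR q."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    # Largest 1-based rank k with p_(k) <= (k/m) q; 0 means nothing is flagged.
    k = 0
    for rank, idx in enumerate(order, start=1):
        if pvalues[idx] <= rank / m * q:
            k = rank
    flagged = [False] * m
    for idx in order[:k]:
        flagged[idx] = True
    return flagged


def robust_z(values):
    """|x - median| / (1.4826 * MAD); all zeros if the MAD is zero."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    sigma = 1.4826 * mad
    if sigma == 0.0:
        return [0.0] * len(values)
    return [abs(v - med) / sigma for v in values]


pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
flags = benjamini_hochberg(pvals, q=0.05)

residuals = [0.0, 0.1, -0.1, 0.2, -0.2, 5.0]
scores = robust_z(residuals)
```

Note the step-up behaviour: a p-value that misses its own threshold is still flagged if a larger p-value at a higher rank meets its threshold.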
Worked example
If residuals in one backend include a coherent eclipse-like dip, robust/MAD
detectors may initially mark those TOAs. Event detectors can then model the
dip and reclassify those points as event_member so bad_point is cleared.
References
PQC documentation: https://golamshaifullah.github.io/pqc/index.html
Benjamini, Y. & Hochberg, Y. 1995, JRSS-B, 57(1), 289-300.
Rousseeuw, P. J. & Croux, C. 1993, JASA, 88(424), 1273-1283.
See Also
- pleb.pipeline.run_pipeline
Pipeline stage that invokes this module.
- pleb.dataset_fix.apply_pqc_outliers
Optional downstream action stage (comment/delete flagged TOAs).
Parse tempo2 outputs and pipeline text formats.
This module provides small, resilient parsers for tempo2 logs and output
artifacts such as plk logs, covariance matrices, and general2 output.
See also
pleb.reports: Utilities that consume parsed outputs.
pleb.tim_reader.read_tim_file_robust: Robust .tim reader used here.
Parameter-scan utilities for rapid fit diagnostics.
This module implements a fit-only workflow that evaluates candidate parameter
additions or edits by running tempo2 on temporary .par variants. It is used
by the CLI --param-scan mode.
See also
pleb.pipeline.run_pipeline: Full pipeline workflow.
pleb.reports.write_new_param_significance: Related reporting utilities.
Orchestrate the data-combination pipeline end to end.
This module coordinates git branch management, tempo2 runs, report generation,
and optional quality-control steps. It stitches together the core building
blocks in pleb.tempo2, pleb.reports, and pleb.dataset_fix.
See also
pleb.config.PipelineConfig: Primary configuration model.
pleb.param_scan.run_param_scan: Fit-only parameter scan workflow.
Plotting helpers for pipeline outputs.
This module renders summary plots and tables from tempo2 outputs and pipeline metadata. It relies on Matplotlib and optionally Seaborn for styling.
See also
pleb.reports: Tabular report generation.
pleb.parsers: Parsing helpers used by plots.
Binary/orbital analysis helpers for pulsar .par files.
This module provides lightweight parsing and derived-parameter calculations intended for summary reports, not full timing-model validation.
See also
pleb.kepler_orbits: Orbital mechanics helpers used in derived quantities.
pleb.config.PipelineConfig: Enables binary analysis in the pipeline.
Generate report artifacts from existing pqc CSV outputs.
This module is a post-processing/reporting layer. It does not re-run pqc;
instead it reads *_qc.csv files, renders helper-script diagnostics, and can
assemble a compact PDF with actionable per-backend tables.
Notes
Compact decisions are derived from two logical sets:
outlier set (by default union of outlier_any, bad_point, robust/bad-mad columns, etc.)
event set (transient, solar, eclipse, Gaussian bump, glitch, orbital flags)
Decision rules:
BAD_TOA: outlier and not event
REVIEW_EVENT: outlier and event
EVENT: event and not outlier
KEEP: neither set
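The decision rules above reduce to a two-flag lookup, sketched here. The column groupings OUTLIER_COLS and EVENT_COLS are hypothetical placeholders (the exact event column names are not listed on this page), and decide_row is an illustrative name rather than this module's function.

```python
# Hypothetical column groupings; the real defaults may include more columns.
OUTLIER_COLS = ("outlier_any", "bad_point", "bad_mad")
EVENT_COLS = ("event_member",)


def compact_decision(is_outlier, is_event):
    """Map membership in the outlier and event sets to a decision label."""
    if is_outlier and is_event:
        return "REVIEW_EVENT"
    if is_outlier:
        return "BAD_TOA"
    if is_event:
        return "EVENT"
    return "KEEP"


def decide_row(row):
    """Classify one QC row given as a dict of column name to truthy flag."""
    is_outlier = any(bool(row.get(col)) for col in OUTLIER_COLS)
    is_event = any(bool(row.get(col)) for col in EVENT_COLS)
    return compact_decision(is_outlier, is_event)


decisions = [
    decide_row({"bad_mad": True}),
    decide_row({"bad_point": True, "event_member": True}),
    decide_row({"event_member": True}),
    decide_row({}),
]
```

Missing columns are treated as not set, so the same function works on CSVs written with different detector subsets enabled.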
Generate comparison and QC reports from pipeline outputs.
This module builds change reports, model-comparison summaries, and outlier
tables from tempo2 outputs parsed by pleb.parsers.
See also
pleb.parsers: Parsing helpers for tempo2 logs.
pleb.pipeline.run_pipeline: Orchestrates report generation.
Infer system flags for EPTA-style tempo2 FORMAT 1 .tim files.
Goal:
When -sys/-group/-pta are missing (and sometimes -be missing), infer them cheaply and consistently.
If bandwidth (-bw) and number-of-bands (-nchan/-nband) are available, assign sub-band systems by binning frequencies into equal-width sub-bands.
Keep system format: <TEL>.<BACKEND>.<CENTRE_MHZ> (used with “-sys” flag)
Design choices (cheap + robust):
Only TOA lines are processed; directives/comments are preserved.
We never try to infer a header; we assume FORMAT 1 and use the 2nd column as frequency (MHz).
We drop/ignore any TOA lines whose frequency is non-numeric.
Backend inference, in order of preference:
    per-TOA “-be” flag if present
    filename stem heuristic: <TEL>.<BACKEND>….tim
    otherwise raise BackendMissingError with a sample TOA line for the UI to show the user
Second pass canonicalisation across pulsars:
    Use canonicalise_centres() on a combined table of inferred centres to “snap” them across pulsars within a tolerance (default 1 MHz).
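The sub-band binning and the centre-snapping pass can be sketched as below. This is a simplified illustration: assign_system and snap_centres are hypothetical names, snap_centres is a greedy stand-in for canonicalise_centres() rather than its actual algorithm, and the telescope and backend labels in the usage lines are made up.

```python
def subband_centres(centre_mhz, bw_mhz, nband):
    """Centres of nband equal-width sub-bands spanning the full bandwidth."""
    width = bw_mhz / nband
    low = centre_mhz - bw_mhz / 2.0
    return [low + (i + 0.5) * width for i in range(nband)]


def assign_system(freq_mhz, tel, backend, centre_mhz, bw_mhz, nband):
    """Return <TEL>.<BACKEND>.<CENTRE_MHZ> for the sub-band containing freq_mhz."""
    width = bw_mhz / nband
    low = centre_mhz - bw_mhz / 2.0
    idx = int((freq_mhz - low) // width)
    idx = min(max(idx, 0), nband - 1)  # clamp band-edge frequencies
    centre = low + (idx + 0.5) * width
    return f"{tel}.{backend}.{round(centre)}"


def snap_centres(centres, tol_mhz=1.0):
    """Greedy clustering: map each centre to the rounded mean of its cluster."""
    clusters = []
    for c in sorted(set(centres)):
        # Join the current cluster if within tol of its first (lowest) member.
        if clusters and c - clusters[-1][0] <= tol_mhz:
            clusters[-1].append(c)
        else:
            clusters.append([c])
    return {c: round(sum(group) / len(group))
            for group in clusters for c in group}


label = assign_system(1380.0, "EFF", "P217", 1400.0, 200.0, 4)
snapped = snap_centres([1395.6, 1396.0, 1396.4, 2639.0])
```

Snapping after inference means two pulsars whose inferred centres differ by a fraction of a MHz end up sharing one -sys label.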
See also
pleb.dataset_fix.infer_and_apply_system_flags: Integration point for FixDataset.
tempo2 execution helpers for the pipeline.
This module wraps the tempo2 CLI invocation used by the pipeline and parameter-scan workflows. It assumes tempo2 is available inside a Singularity or Apptainer container.
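A containerized tempo2 call of the kind described here could be assembled as in the sketch below. The helper name tempo2_command is hypothetical and this is not the module's actual wrapper; it relies only on the `exec` subcommand shared by Singularity and Apptainer and on tempo2's `-f <par> <tim>` invocation, and it builds the argument list without running anything.

```python
from pathlib import Path


def tempo2_command(image, par, tim, runner="singularity", extra_args=()):
    """Build (but do not run) an argv list for tempo2 inside a container.

    runner is assumed to be "singularity" or "apptainer"; both accept
    `exec <image> <command...>`.
    """
    return [runner, "exec", str(image),
            "tempo2", "-f", str(par), str(tim), *extra_args]


cmd = tempo2_command(Path("/images/tempo2.sif"),
                     "J1713+0747.par", "J1713+0747_all.tim")
```

The resulting list can be passed to subprocess.run(cmd, check=True, capture_output=True, text=True); passing a list rather than a shell string avoids quoting problems with pulsar names containing "+".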
See also
pleb.pipeline.run_pipeline: Main workflow orchestration.
pleb.param_scan.run_param_scan: Fit-only parameter scan workflow.
Robust .tim file parsing utilities.
These helpers implement tolerant parsing and filtering for tempo2 .tim files that contain mixed headers, directives, and TOA rows.
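A tolerant reader in this spirit might look like the sketch below. It is not this module's parser: parse_tim_lines is a hypothetical name, the column layout assumes tempo2 FORMAT 1 (file, frequency, MJD, uncertainty, site, then flag pairs), and the directive list is a partial one chosen for illustration.

```python
# Partial list of tempo2 .tim directives, for illustration only.
DIRECTIVES = {"FORMAT", "MODE", "INCLUDE", "JUMP", "TIME",
              "EFAC", "EQUAD", "SKIP", "NOSKIP", "END"}


def parse_tim_lines(lines):
    """Yield one dict per TOA row; skip comments, directives, malformed rows."""
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith(("#", "C ")):
            continue
        tokens = line.split()
        if tokens[0].upper() in DIRECTIVES or len(tokens) < 5:
            continue
        try:
            freq = float(tokens[1])
            float(tokens[2])  # validate only; keep the MJD text to preserve precision
            err = float(tokens[3])
        except ValueError:
            continue  # non-numeric columns: treat the row as not a TOA

        # Collect "-flag value" pairs; a key is "-" followed by a letter,
        # so negative numeric values are not mistaken for flags.
        flags, rest, i = {}, tokens[5:], 0
        while i + 1 < len(rest):
            key = rest[i]
            if len(key) > 1 and key[0] == "-" and key[1].isalpha():
                flags[key] = rest[i + 1]
                i += 2
            else:
                i += 1
        yield {"file": tokens[0], "freq_mhz": freq, "mjd": tokens[2],
               "err_us": err, "site": tokens[4], "flags": flags}


sample = [
    "FORMAT 1",
    "C commented.ar 1400.0 55000.1 1.0 eff",
    "# a note",
    "a.ar 1400.000 55000.1234567890123 1.25 eff -sys EFF.P217.1400 -be P217",
    "bad.ar notafreq 55001.0 1.0 eff",
    "INCLUDE other.tim",
]
rows = list(parse_tim_lines(sample))
```

The MJD is kept as text because a 64-bit float cannot hold the sub-microsecond precision that TOAs usually carry.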
General utility helpers for the pipeline.
These helpers provide small filesystem utilities and shared path conventions used across pipeline modules.