Optimize Mode
=============

Purpose
-------

``pleb optimize`` is the model-selection layer for PQC and workflow settings.
It treats tuning as a repeatable search problem instead of manual threshold
adjustment.

The optimizer does not replace the existing pipeline. It sits on top of it:

1. sample a candidate settings vector,
2. run PLEB with those settings,
3. read the QC artifacts already produced by PLEB,
4. compute a weighted score,
5. keep the best configuration and a full trial table.

This keeps the execution path familiar and reduces the amount of new code that
can affect production processing.

What optimize mode uses
-----------------------

The optimizer reuses existing PLEB capabilities:

- pipeline execution,
- workflow execution,
- PQC CSV outputs,
- QC summary tables,
- branch-aware run directories,
- ordinary TOML settings overrides.

It does not introduce a second QC implementation.

Main files
----------

- ``pleb/optimize/search_space.py``: search-space loading and parameter
  sampling.
- ``pleb/optimize/trial_runner.py``: adapter from optimization trials to
  existing pipeline or workflow runs.
- ``pleb/optimize/scorers.py``: metric extraction from ``*_qc.csv`` outputs.
- ``pleb/optimize/optimizer.py``: study driver, best-trial selection, result
  writing.

Configuration files
-------------------

The optimizer uses four TOML file types.

Run config
~~~~~~~~~~

This defines where the base run comes from, how many trials to execute, and
where results should be written.


Example::

    [optimize]
    base_config_path = "configs/runs/pipeline/epta-dr3-v0.toml"
    execution_mode = "pipeline"
    search_space_path = "configs/optimize/search_spaces/pqc_balanced_v1.toml"
    objective_path = "configs/optimize/objectives/balanced_qc.toml"
    folds_path = "configs/optimize/folds/time_blocks.toml"
    out_dir = "results/optimize/example_pipeline"
    study_name = "example_pipeline_pqc"
    n_trials = 20
    sampler = "random"
    seed = 12345
    jobs = 1

    [optimize.fixed_overrides]
    pulsars = ["J1713+0747"]
    run_pqc = true
    run_tempo2 = true
    run_fix_dataset = false

Search-space config
~~~~~~~~~~~~~~~~~~~

This defines which settings are allowed to move during optimization.

Example::

    [parameters.pqc_fdr_q]
    type = "float"
    low = 0.001
    high = 0.05
    log = true

    [parameters.pqc_step_enabled]
    type = "bool"

    [parameters.pqc_step_delta_chi2_thresh]
    type = "float"
    low = 10.0
    high = 60.0
    depends_on = "pqc_step_enabled"
    enabled_values = [true]

Supported parameter types are:

- ``float``
- ``int``
- ``bool``
- ``categorical``
- ``fixed``

Objective config
~~~~~~~~~~~~~~~~

This defines the weighted score.

Example::

    maximize = true

    [weights]
    residual_cleanliness = 2.0
    residual_whiteness = 1.0
    event_coherence = 0.75
    stability = 0.75
    bad_fraction = -1.5
    overfragmentation_penalty = -1.0

Fold config
~~~~~~~~~~~

This defines how repeated held-out reruns are built.

Example::

    [folds]
    mode = "time_blocks"
    n_splits = 4
    time_col = "mjd"
    backend_col = "sys"
    rerun_mode = "held_in"

Current fold modes:

- ``none``: no fold reruns, only the full trial run is scored.
- ``time_blocks``: divide TOAs by MJD blocks and rerun with one block held out
  each time.
- ``backend_holdout``: rerun with one backend held out at a time.

True held-out reruns
--------------------

This mode now performs actual reruns on reduced temporary datasets.

For each trial:

1. PLEB runs once on the full dataset.
2. If folds are enabled, PLEB builds temporary dataset trees under
   ``/_fold_datasets/``.
3. For each fold, backend tim files are rewritten so held-out TOAs are removed.
4. ``*_all.tim`` include files are updated so empty backend tim files are not
   kept in the include list.
5. PLEB reruns on each held-in fold dataset.
6. The optimizer averages fold metrics and computes stability from the spread
   across fold reruns.

This is closer to a real robustness test than simply slicing one QC CSV after a
single run.

Important limitation:

- The fold reruns are held-in reruns. PLEB does not yet have a separate
  train/apply model that fits on one subset and then predicts labels for a
  separate hold-out subset.
- In practice this means optimize mode measures stability under data removal,
  which is still useful for unsupervised QC and event detection.

Metrics
-------

The optimizer currently scores trials from the QC CSV outputs.

Available metrics include:

- ``bad_fraction``
- ``event_fraction``
- ``event_coherence``
- ``residual_cleanliness``
- ``residual_whiteness``
- ``overfragmentation_penalty``
- ``backend_inconsistency_penalty``
- ``parameter_complexity_penalty``
- ``stability``
- ``event_stability``

There are also raw counts such as:

- ``n_toas``
- ``n_bad``
- ``n_events``
- ``n_event_members``

Metric definitions
------------------

The current metrics are simple summary statistics derived from the ``*_qc.csv``
tables. They are intended to be transparent and easy to inspect, rather than
hidden model scores.

Counts
~~~~~~

- ``n_toas``: number of TOA rows in the QC table.
- ``n_bad``: number of TOAs flagged by the combined bad-point mask.
- ``n_events``: number of distinct detected events. This is counted from event
  ID columns such as ``transient_id`` and related event labels, with solar and
  orbital event flags contributing when present.
- ``n_event_members``: number of TOAs that belong to any detected event.

Fractions
~~~~~~~~~

- ``bad_fraction``: ``n_bad / n_toas``.
- ``event_fraction``: ``n_event_members / n_toas``.

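
The counts and fractions above can be illustrated with a minimal sketch. It is
not the code in ``pleb/optimize/scorers.py``: the helper name
``count_metrics``, the ``bad`` column, and the use of a single ``transient_id``
column as the only event label are assumptions made for illustration, and real
``*_qc.csv`` tables may use different columns and several event labels.

.. code-block:: python

    import csv
    import io

    # Hypothetical QC table. The column names "bad" and "transient_id" are
    # assumptions for this sketch; real *_qc.csv files may differ.
    QC_CSV = """mjd,sys,bad,transient_id
    55000.1,A,0,
    55001.2,A,1,
    55002.3,B,0,7
    55003.4,B,0,7
    55004.5,B,1,9
    """.replace("\n    ", "\n")


    def count_metrics(rows):
        """Return the counts and fractions defined in the lists above."""
        n_toas = len(rows)
        # A TOA is "bad" when the (assumed) combined mask column equals "1".
        n_bad = sum(1 for row in rows if row["bad"] == "1")
        # A TOA is an event member when it carries a non-empty event ID.
        event_ids = [row["transient_id"] for row in rows if row["transient_id"]]
        return {
            "n_toas": n_toas,
            "n_bad": n_bad,
            "n_events": len(set(event_ids)),
            "n_event_members": len(event_ids),
            "bad_fraction": n_bad / n_toas if n_toas else 0.0,
            "event_fraction": len(event_ids) / n_toas if n_toas else 0.0,
        }


    rows = list(csv.DictReader(io.StringIO(QC_CSV)))
    metrics = count_metrics(rows)

In this toy table two of the five TOAs are flagged bad and three TOAs belong to
two distinct events, so both fractions follow directly from the counts.
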

Event-structure metrics
~~~~~~~~~~~~~~~~~~~~~~~

- ``event_coherence``: among TOAs marked as event members, this is the fraction
  belonging to the most common backend. A value near 1 means event members are
  concentrated in one backend; a lower value means they are spread across
  multiple backends.
- ``overfragmentation_penalty``: fraction of detected events that contain only
  one TOA. Large values indicate that event detection is breaking structure
  into isolated single-point events.

Residual-based metrics
~~~~~~~~~~~~~~~~~~~~~~

These are computed only after removing TOAs flagged as bad.

- ``residual_cleanliness``: ``1 / (1 + MAD(clean residuals))`` where MAD is the
  median absolute deviation. Larger values mean the cleaned residuals are more
  tightly grouped.
- ``residual_whiteness``: ``1 / (1 + abs(lag-1 autocorrelation))`` for the
  cleaned residual series. Larger values mean the cleaned residuals are closer
  to white noise at one-step lag.
- ``scaled_residual_cleanliness``: ``1 / (1 + median(abs(residual) / sigma))``
  for rows with valid uncertainties. Larger values mean smaller residuals
  relative to the reported TOA uncertainty scale.

Backend-distribution metric
~~~~~~~~~~~~~~~~~~~~~~~~~~~

- ``backend_inconsistency_penalty``: normalized entropy of the bad-TOA
  distribution across backends. A value near 0 means bad TOAs are concentrated
  in one backend; a larger value means they are spread more evenly across many
  backends.

Search-complexity metric
~~~~~~~~~~~~~~~~~~~~~~~~

- ``parameter_complexity_penalty``: active tuned parameters divided by the
  total number of parameters in the search space. This penalizes settings that
  only win by turning on many extra degrees of freedom.

Fold-robustness metrics
~~~~~~~~~~~~~~~~~~~~~~~

- ``stability``: ``1 / (1 + stddev(bad_fraction across folds))``.
- ``event_stability``: ``1 / (1 + stddev(event_fraction across folds))``.

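
The two fold-robustness formulas can be sketched as follows. This is an
illustration, not the optimizer's implementation: the helper name
``stability_score`` is hypothetical, the fold values are made up, and the
population standard deviation is an assumption, since the definitions above do
not say whether the population or sample form is used.

.. code-block:: python

    import statistics


    def stability_score(fold_values):
        """Return 1 / (1 + stddev(fold_values)).

        Assumption: population standard deviation. With fewer than two folds
        there is no spread to measure, so the score defaults to 1.0.
        """
        if len(fold_values) < 2:
            return 1.0
        return 1.0 / (1.0 + statistics.pstdev(fold_values))


    # Made-up per-fold fractions from four hypothetical fold reruns.
    fold_bad_fractions = [0.04, 0.05, 0.06, 0.05]
    fold_event_fractions = [0.10, 0.30, 0.20, 0.20]

    stability = stability_score(fold_bad_fractions)
    event_stability = stability_score(fold_event_fractions)

Here the event fractions vary more across folds than the bad fractions do, so
``event_stability`` comes out lower than ``stability``.
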

For both stability metrics, values near 1 mean the metric changes little when
the data are perturbed by the fold scheme.

Output files
------------

Each optimization study writes:

- ``trials.csv``: one row per trial with score, parameters, and metrics.
- ``summary.json``: compact study summary.
- ``best_trial.json``: full record of the best trial.
- ``best_overrides.toml``: flat TOML snippet of the winning parameter values.
- ``report.md``: compact human-readable report.

Running optimize mode
---------------------

Example::

    python -m pleb.cli optimize \
        --config configs/optimize/runs/example_pipeline.toml

If ``sampler = "random"``, PLEB uses built-in random sampling.

If ``sampler = "optuna_tpe"``, Optuna must be installed in the active Python
environment.

Current limits
--------------

- optimization-level ``jobs`` must currently be ``1``;
- per-trial internal pipeline parallelism still works through ordinary PLEB
  settings such as ``jobs`` in the base pipeline config;
- workflow optimization applies sampled settings through top-level workflow
  ``set`` overrides;
- the optimizer is designed for settings selection, not for timing-model
  parameter scans. Use ``param_scan`` for timing-model experiments.

Relationship to PQC
-------------------

Optimize mode does not re-explain the detector mathematics. For detector and
statistical details, use the PQC documentation:

- https://golamshaifullah.github.io/pqc/index.html

The optimizer only answers a different question:

"Which combination of PLEB/PQC settings gives the best overall behavior under
the objective I defined?"