First FixDataset Pass: Build A Usable Single-Pulsar Branch¶
This is the stage that produces the first branch that is usually practical to analyze and maintain.
The aim is not QC yet. The aim is a branch with:
consistent system flags,
consistent jump structure,
a clean branch boundary,
optional variant products for later use.
Why This Stage Comes Before PQC¶
PQC grouping and interpretation depend on the data layout and metadata being reasonably coherent.
If PQC is run before system flags and jumps are in order, the output may be technically correct but operationally hard to interpret.
So the first serious branch-building pass should usually be:
infer or normalize system flags,
insert missing jumps,
prune stale jumps,
optionally generate variants,
commit that state to a new branch.
The Step-1 Profile¶
Create a run profile under configs/runs/fixdataset/.
Example:
configs/runs/fixdataset/single_pulsar_step1_fix.toml
Tracked repository example:
configs/runs/fixdataset/single_pulsar_step1_fix.example.toml
home_dir = "/data/canonical"
dataset_name = "EPTA-DR3/epta-dr3-data"
results_dir = "results"
singularity_image = "/work/containers/psrpta.sif"
branches = ["step1_fix_flags_variants"]
reference_branch = "step1_fix_flags_variants"
pulsars = ["J1909-3744"]
jobs = 1
outdir_name = "j1909_step1_fix"
run_tempo2 = false
run_pqc = false
qc_report = false
make_plots = false
make_reports = false
make_covmat = false
run_fix_dataset = true
fix_apply = true
fix_base_branch = "raw_ingest"
fix_branch_name = "step1_fix_flags_variants"
fix_commit_message = "Step1: normalize flags and jumps for J1909-3744"
fix_infer_system_flags = true
fix_system_flag_overwrite_existing = true
fix_insert_missing_jumps = true
fix_prune_stale_jumps = true
fix_jump_flag = "-sys"
fix_generate_alltim_variants = true
fix_backend_classifications_path = "configs/catalogs/variants/backend_classifications_legacy_new.toml"
fix_alltim_variants_path = "configs/catalogs/variants/alltim_variants_legacy_new.toml"
fix_jump_reference_variants = true
fix_jump_reference_jump_flag = "-sys"
fix_jump_reference_keep_tmp = false
fix_jump_reference_csv_dir = "results/jump_reference"
fix_dedupe_toas_within_tim = true
fix_remove_overlaps_exact = true
fix_ensure_ephem = "DE440"
fix_ensure_clk = "TT(BIPM2024)"
This is the single-pulsar version of the repository’s Step 1 pattern
from configs/workflows/steps/step1_fix_flags_variants.toml.
What Each Key Is For¶
Core routing keys:
fix_base_branch: Existing branch to mutate from. For a newly ingested dataset, this is often raw_ingest.
fix_branch_name: New branch that receives the edits.
branches and reference_branch: Keep these aligned with the branch this stage is meant to operate on.
Mutation keys:
fix_infer_system_flags: Infer or normalize system labels used later by jump logic and PQC grouping.
fix_system_flag_overwrite_existing: Overwrite existing inconsistent values. Use carefully, but for an initial harmonization pass it is often the right choice.
fix_insert_missing_jumps: Insert jumps that should exist based on backend/system structure.
fix_prune_stale_jumps: Remove jumps that no longer map to real data structure.
fix_jump_flag: Flag used as the jump grouping reference. Commonly -sys. This choice should align with the grouping logic that later stages use.
Variant keys:
fix_generate_alltim_variants: Generate variant include products named <PSR>_<variant>_all.tim for downstream analysis or review.
fix_backend_classifications_path: Classification catalog used to decide which backends belong to which variant families.
fix_alltim_variants_path: Variant-definition catalog.
fix_jump_reference_variants: Build reference-system jump variants and write variant parfiles named <PSR>_<variant>.par. This is useful when the workflow needs a reproducible set of variant products for later comparison or review.
Consistency keys:
fix_ensure_ephem: Ensure a specific ephemeris in parfiles.
fix_ensure_clk: Ensure a specific clock string in parfiles.
Why run_tempo2 Is Usually Disabled Here¶
The point of this pass is to establish a coherent branch structure and a coherent data-model boundary. Running tempo2 at the same time can blur that boundary because it adds fit products and diagnostics to a run whose main purpose is mutation.
It is therefore usually cleaner to:
finish the metadata and jump pass,
inspect the resulting branch,
run tempo2 and PQC in the next pass.
What “Basic Par File With All The Jumps” Means¶
This phrase needs a precise operational meaning.
The goal is not “every possible jump anyone could imagine.” The goal is:
the parfile reflects the current backend/system structure,
missing expected jumps are inserted,
stale or obsolete jumps are removed,
the branch becomes a sensible baseline for timing and QC.
In other words, this stage creates the first operationally coherent timing model branch.
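The "missing" and "stale" notions above reduce to a set comparison between the flag values present in the data and the jump lines present in the model. The sketch below illustrates that idea only. It is not the FixDataset implementation, whose internal rules are not described here. The function names, the system labels, and the convention that one reference system carries no jump are all assumptions.

```python
def flag_values_in_tim(tim_text: str, flag: str = "-sys") -> set[str]:
    """Collect the values that follow `flag` on TOA lines."""
    values = set()
    for line in tim_text.splitlines():
        tokens = line.split()
        if not tokens or tokens[0] == "C" or tokens[0].startswith("#"):
            continue  # blank line or comment
        for i, tok in enumerate(tokens[:-1]):
            if tok == flag:
                values.add(tokens[i + 1])
    return values


def jump_values_in_par(par_text: str, flag: str = "-sys") -> set[str]:
    """Collect the flag values named on `JUMP <flag> <value> ...` lines."""
    values = set()
    for line in par_text.splitlines():
        tokens = line.split()
        if len(tokens) >= 3 and tokens[0] == "JUMP" and tokens[1] == flag:
            values.add(tokens[2])
    return values


def jump_plan(tim_text, par_text, flag="-sys", reference=None) -> dict:
    """Compare data and model; `reference` is assumed to need no jump."""
    in_data = flag_values_in_tim(tim_text, flag)
    in_model = jump_values_in_par(par_text, flag)
    missing = in_data - in_model - ({reference} if reference else set())
    stale = in_model - in_data
    return {"missing": sorted(missing), "stale": sorted(stale)}


# Made-up system labels, for illustration only.
TIM = """FORMAT 1
a.ar 1400.0 55000.1 1.2 xx -sys TEL_A.BE_1
b.ar 1400.0 55010.1 1.1 yy -sys TEL_B.BE_2
c.ar 1400.0 55020.1 0.9 zz -sys TEL_C.BE_3
"""
PAR = """PSRJ J1909-3744
JUMP -sys TEL_B.BE_2 0.0 1
JUMP -sys TEL_D.BE_4 0.0 1
"""

plan = jump_plan(TIM, PAR, reference="TEL_A.BE_1")
print(plan)  # {'missing': ['TEL_C.BE_3'], 'stale': ['TEL_D.BE_4']}
```

Run against the base branch before the pass and against the new branch afterwards, a comparison of this kind should show both lists shrinking toward empty.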
How To Run The First Pass¶
Run:
pleb --config configs/runs/fixdataset/single_pulsar_step1_fix.toml
Because fix_apply = true, this is a branch-mutating run.
What To Inspect After The First Pass¶
Inspect:
the new git branch,
the pulsar parfile,
the pulsar _all.tim plus any <PSR>_<variant>_all.tim products,
the FixDataset summary outputs under the run directory,
the jump reference CSV outputs if enabled.
Check:
whether -sys values now look consistent,
whether expected jump lines exist in the parfile,
whether obviously obsolete jump lines were removed,
whether variant files were created where expected.
How To Read The Outputs¶
For a single pulsar, the most useful direct inspection points are usually:
the pulsar parfile in the dataset branch,
the pulsar _all.tim file,
the backend tim files under tims/,
any generated <PSR>_<variant>_all.tim and <PSR>_<variant>.par files,
the run summary files under the run directory.
The parfile answers:
which jumps exist,
whether ephemeris and clock defaults were enforced,
whether the model now reflects the intended backend partitioning.
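The parfile questions above can be answered with a small reader. This is a rough sketch assuming the usual whitespace-separated parfile layout with EPHEM, CLK, and JUMP <flag> <value> lines. summarize_par and check_defaults are hypothetical helpers, and the jump labels in the sample are made up.

```python
def summarize_par(par_text: str) -> dict:
    """Extract the ephemeris, clock string, and jump targets from a parfile."""
    summary = {"EPHEM": None, "CLK": None, "jumps": []}
    for line in par_text.splitlines():
        tokens = line.split()
        if not tokens or tokens[0].startswith("#"):
            continue  # blank line or comment
        key = tokens[0]
        if key in ("EPHEM", "CLK") and len(tokens) >= 2:
            summary[key] = tokens[1]
        elif key == "JUMP" and len(tokens) >= 3:
            summary["jumps"].append((tokens[1], tokens[2]))
    return summary


def check_defaults(summary: dict, ephem: str, clk: str) -> list[str]:
    """Report where the parfile differs from the values the profile enforces."""
    problems = []
    if summary["EPHEM"] != ephem:
        problems.append(f"EPHEM is {summary['EPHEM']}, expected {ephem}")
    if summary["CLK"] != clk:
        problems.append(f"CLK is {summary['CLK']}, expected {clk}")
    return problems


# Sample parfile fragment; the jump labels are illustrative only.
PAR = """PSRJ J1909-3744
EPHEM DE440
CLK TT(BIPM2024)
JUMP -sys TEL_A.BE_1 0.0 1
JUMP -sys TEL_B.BE_2 0.0 1
"""

summary = summarize_par(PAR)
print(summary["jumps"])
print(check_defaults(summary, ephem="DE440", clk="TT(BIPM2024)"))
```

The expected values passed to check_defaults are the fix_ensure_ephem and fix_ensure_clk settings from the profile.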
The tim files answer:
whether -sys and related flags were normalized consistently,
whether include structure still looks sane after mutation.
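Both tim-file questions can be checked in one walk from the _all.tim file. The sketch below is a hypothetical helper, not a tool feature. It assumes INCLUDE paths resolve relative to the including file, that TOA lines have five leading columns followed by flag/value pairs, and it builds a throwaway dataset with made-up names to demonstrate the report.

```python
import tempfile
from collections import Counter
from pathlib import Path


def scan_all_tim(all_tim: Path, flag: str = "-sys") -> dict:
    """Follow INCLUDE lines from an _all.tim file and tally `flag` values."""
    counts: Counter = Counter()
    missing_includes: list[str] = []
    unflagged: list[str] = []

    def scan(path: Path) -> None:
        for line in path.read_text().splitlines():
            tokens = line.split()
            if not tokens or tokens[0] == "C" or tokens[0].startswith("#"):
                continue  # blank line or comment
            if tokens[0] == "INCLUDE" and len(tokens) >= 2:
                # Assumption: INCLUDE paths are relative to the including file.
                target = path.parent / tokens[1]
                if target.is_file():
                    scan(target)
                else:
                    missing_includes.append(tokens[1])
                continue
            if len(tokens) < 5:
                continue  # short directive such as FORMAT 1 or MODE 1
            flags = tokens[5:]
            if flag in flags[:-1]:
                counts[flags[flags.index(flag) + 1]] += 1
            else:
                unflagged.append(tokens[0])  # TOA without the grouping flag

    scan(all_tim)
    return {
        "counts": dict(counts),
        "missing_includes": missing_includes,
        "unflagged": unflagged,
    }


# Throwaway demo dataset; file names and labels are illustrative only.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "tims").mkdir()
    (root / "tims" / "backend_a.tim").write_text(
        "FORMAT 1\n"
        "a1.ar 1400.0 55000.1 1.2 xx -sys TEL_A.BE_1\n"
        "a2.ar 1400.0 55001.1 1.3 xx -sys TEL_A.BE_1\n"
        "a3.ar 1400.0 55002.1 1.1 xx -be BE_1\n"
    )
    (root / "PSR_all.tim").write_text(
        "FORMAT 1\nINCLUDE tims/backend_a.tim\nINCLUDE tims/backend_b.tim\n"
    )
    report = scan_all_tim(root / "PSR_all.tim")

print(report)
```

A clean branch would be expected to report no missing includes and no unflagged TOAs, with one count per intended system label.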
What Jumps Mean In This Workflow¶
A jump is not merely a nuisance parameter. In this workflow, jumps express the partitioning implied by backend/system differences.
The following points are operationally important:
jumps are tied to how the data are grouped,
wrong grouping gives wrong jump structure,
missing jumps can bias the fit,
stale jumps can clutter the model and confuse interpretation,
jump maintenance is a data-structure task before it is a statistical task.
Common First-Pass Errors¶
using the wrong fix_base_branch,
forgetting to limit to one pulsar,
running with run_tempo2=true when the goal is only mutation,
enabling too many unrelated fix actions at once,
generating variants without understanding the catalogs used.
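Several of these errors can be caught from the profile alone before anything is mutated. The sketch below is one possible preflight, assuming the profile has already been loaded into a dict. first_pass_warnings and enabled_fix_actions are hypothetical helpers, and the rules encode this page's advice rather than anything the tool enforces.

```python
def enabled_fix_actions(profile: dict) -> list[str]:
    """List the fix_* switches that are turned on, for a quick review."""
    return sorted(
        key for key, value in profile.items()
        if key.startswith("fix_") and value is True
    )


def first_pass_warnings(profile: dict) -> list[str]:
    """Flag settings that match the common first-pass errors."""
    warnings = []
    if not profile.get("fix_base_branch"):
        warnings.append("fix_base_branch is not set")
    if len(profile.get("pulsars", [])) != 1:
        warnings.append("pulsars should name exactly one pulsar")
    if profile.get("run_tempo2"):
        warnings.append("run_tempo2 is enabled in a mutation-only pass")
    if profile.get("fix_generate_alltim_variants"):
        # Variants depend on both catalogs, so both paths should be set.
        for key in ("fix_backend_classifications_path",
                    "fix_alltim_variants_path"):
            if not profile.get(key):
                warnings.append(f"{key} is unset but variants are enabled")
    return warnings


# A deliberately flawed profile, for illustration only.
profile = {
    "pulsars": ["J1909-3744", "J0437-4715"],
    "run_tempo2": True,
    "fix_base_branch": "raw_ingest",
    "fix_apply": True,
    "fix_generate_alltim_variants": True,
    "fix_alltim_variants_path": "configs/catalogs/variants/alltim_variants_legacy_new.toml",
}

for warning in first_pass_warnings(profile):
    print("WARNING:", warning)
print("enabled:", enabled_fix_actions(profile))
```

The check cannot tell whether fix_base_branch is the right branch, only whether it is set; that still needs a look at the dataset repository.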
What This Stage Usually Writes¶
Depending on the exact settings and input data, this pass may write:
updated tim files with normalized flags,
a revised pulsar parfile with jump maintenance applied,
regenerated _all.tim include files,
optional <PSR>_<variant>_all.tim and <PSR>_<variant>.par products,
jump-reference CSV outputs under fix_jump_reference_csv_dir,
run summaries in the run directory under results_dir.
Because this pass mutates the dataset branch, it is useful to compare the raw_ingest branch and the Step-1 branch directly with git after the run.
How This Stage Connects To PQC¶
The output of this stage is not just a cleaned branch. It is also the branch that defines the grouping vocabulary for later QC:
backend/system flags are more coherent,
expected jumps are present,
stale jumps are removed,
variant products, if enabled, are available on the branch that the detect stage will read.
This is why Step 2 commonly sets:
fix_base_branch = "step1_fix_flags_variants"
and then uses pqc_backend_col = "sys" or another grouping key that now has
cleaner semantics than it had before this pass.