First FixDataset Pass: Build A Usable Single-Pulsar Branch

This stage produces the first branch that is practical to analyze and maintain.

The aim is not QC yet. The aim is:

  • consistent system flags,

  • consistent jump structure,

  • a clean branch boundary,

  • optional variant products for later use.

Why This Stage Comes Before PQC

PQC grouping and interpretation depend on the data layout and metadata being reasonably coherent.

If PQC is run before system flags and jumps are in order, the output may be technically correct but operationally hard to interpret.

So the first serious branch-building pass should usually be:

  • infer or normalize system flags,

  • insert missing jumps,

  • prune stale jumps,

  • optionally generate variants,

  • commit that state to a new branch.

The Step-1 Profile

Create a run profile under configs/runs/fixdataset/.

Example: configs/runs/fixdataset/single_pulsar_step1_fix.toml

Tracked repository example: configs/runs/fixdataset/single_pulsar_step1_fix.example.toml

home_dir = "/data/canonical"
dataset_name = "EPTA-DR3/epta-dr3-data"
results_dir = "results"
singularity_image = "/work/containers/psrpta.sif"

branches = ["step1_fix_flags_variants"]
reference_branch = "step1_fix_flags_variants"
pulsars = ["J1909-3744"]
jobs = 1
outdir_name = "j1909_step1_fix"

run_tempo2 = false
run_pqc = false
qc_report = false
make_plots = false
make_reports = false
make_covmat = false

run_fix_dataset = true
fix_apply = true
fix_base_branch = "raw_ingest"
fix_branch_name = "step1_fix_flags_variants"
fix_commit_message = "Step1: normalize flags and jumps for J1909-3744"

fix_infer_system_flags = true
fix_system_flag_overwrite_existing = true
fix_insert_missing_jumps = true
fix_prune_stale_jumps = true
fix_jump_flag = "-sys"

fix_generate_alltim_variants = true
fix_backend_classifications_path = "configs/catalogs/variants/backend_classifications_legacy_new.toml"
fix_alltim_variants_path = "configs/catalogs/variants/alltim_variants_legacy_new.toml"

fix_jump_reference_variants = true
fix_jump_reference_jump_flag = "-sys"
fix_jump_reference_keep_tmp = false
fix_jump_reference_csv_dir = "results/jump_reference"

fix_dedupe_toas_within_tim = true
fix_remove_overlaps_exact = true

fix_ensure_ephem = "DE440"
fix_ensure_clk = "TT(BIPM2024)"

This is the single-pulsar version of the repository’s Step 1 pattern from configs/workflows/steps/step1_fix_flags_variants.toml.

What Each Key Is For

Core routing keys:

fix_base_branch

Existing branch that the new branch is created from. For a newly ingested dataset, this is often raw_ingest.

fix_branch_name

New branch that receives the edits.

branches and reference_branch

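Keep these aligned with the branch this stage is meant to operate on. In the example profile both are set to step1_fix_flags_variants, the same value as fix_branch_name.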

Mutation keys:

fix_infer_system_flags

Infer or normalize system labels used later by jump logic and PQC grouping.

fix_system_flag_overwrite_existing

Overwrite existing inconsistent values. Use carefully, but for an initial harmonization pass it is often the right choice.

fix_insert_missing_jumps

Insert jumps that should exist based on backend/system structure.

fix_prune_stale_jumps

Remove jumps that no longer correspond to any system present in the data.

fix_jump_flag

Flag used as the jump grouping reference. Commonly -sys. This choice should align with the grouping logic that later stages use.

Variant keys:

fix_generate_alltim_variants

Generate variant include products named <PSR>_<variant>_all.tim for downstream analysis or review.

fix_backend_classifications_path

Classification catalog used to decide which backends belong to which variant families.

fix_alltim_variants_path

Variant-definition catalog.

fix_jump_reference_variants

Build reference-system jump variants and write variant parfiles named <PSR>_<variant>.par. This is useful when the workflow needs a reproducible set of variant products for later comparison or review.
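The two naming patterns above make it easy to list which variant products should exist for a pulsar. The sketch below is illustrative only: the variant names "legacy" and "new" are hypothetical stand-ins for whatever the variant catalog defines, and the helper names are not part of pleb.

```python
def expected_variant_products(psr: str, variants: list[str]) -> list[str]:
    """Build the file names documented for variant tim and par products."""
    products = []
    for variant in variants:
        products.append(f"{psr}_{variant}_all.tim")
        products.append(f"{psr}_{variant}.par")
    return products


def missing_products(expected: list[str], present: list[str]) -> list[str]:
    """Return expected file names that do not appear in a directory listing."""
    present_set = set(present)
    return [name for name in expected if name not in present_set]


# Hypothetical variant names and a hypothetical directory listing.
expected = expected_variant_products("J1909-3744", ["legacy", "new"])
present = [
    "J1909-3744_legacy_all.tim",
    "J1909-3744_legacy.par",
    "J1909-3744_new_all.tim",
]
missing = missing_products(expected, present)
print(missing)
```

In practice the present list would come from listing the pulsar directory on the Step-1 branch.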

Consistency keys:

fix_ensure_ephem

Ensure a specific ephemeris in parfiles.

fix_ensure_clk

Ensure a specific clock string in parfiles.
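Whether the ephemeris and clock settings were applied can be confirmed by reading the parfile directly. This is a minimal sketch that assumes the usual tempo2 parfile layout, where EPHEM and CLK each appear as a keyword followed by a value. The sample parfile text is invented, and the helper names are hypothetical.

```python
from typing import Optional


def par_value(par_text: str, key: str) -> Optional[str]:
    """Return the first value that follows `key` in a parfile, or None."""
    for line in par_text.splitlines():
        tokens = line.split()
        if len(tokens) >= 2 and tokens[0] == key:
            return tokens[1]
    return None


def header_mismatches(par_text: str, expected: dict) -> dict:
    """Map each key whose value differs to a (found, wanted) pair."""
    result = {}
    for key, wanted in expected.items():
        found = par_value(par_text, key)
        if found != wanted:
            result[key] = (found, wanted)
    return result


# Invented sample: the clock string deliberately disagrees with the profile.
PAR = """PSRJ           J1909-3744
EPHEM          DE440
CLK            TT(BIPM2021)
"""

mismatches = header_mismatches(PAR, {"EPHEM": "DE440", "CLK": "TT(BIPM2024)"})
print(mismatches)
```

An empty dictionary means both values match what fix_ensure_ephem and fix_ensure_clk request.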

Why run_tempo2 Is Usually Disabled Here

The point of this pass is to establish a coherent branch structure and a coherent data-model boundary. Running tempo2 at the same time can blur that boundary because it adds fit products and diagnostics to a run whose main purpose is mutation.

It is therefore usually cleaner to:

  1. finish the metadata and jump pass,

  2. inspect the resulting branch,

  3. run tempo2 and PQC in the next pass.

What “Basic Par File With All The Jumps” Means

This phrase needs a precise operational meaning.

The goal is not “every possible jump anyone could imagine.” The goal is:

  • the parfile reflects the current backend/system structure,

  • missing expected jumps are inserted,

  • stale or obsolete jumps are removed,

  • the branch becomes a sensible baseline for timing and QC.

In other words, this stage creates the first operationally coherent timing-model branch.

How To Run The First Pass

Run:

pleb --config configs/runs/fixdataset/single_pulsar_step1_fix.toml

Because fix_apply = true, this is a branch-mutating run.

What To Inspect After The First Pass

Inspect:

  1. the new git branch,

  2. the pulsar parfile,

  3. the pulsar _all.tim plus any <PSR>_<variant>_all.tim products,

  4. the FixDataset summary outputs under the run directory,

  5. the jump reference CSV outputs if enabled.

Check:

  • whether -sys values now look consistent,

  • whether expected jump lines exist in the parfile,

  • whether obviously obsolete jump lines were removed,

  • whether variant files were created where expected.
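The first three checks can be partly automated by comparing the -sys values found on TOA lines with the JUMP lines in the parfile. This is a rough sketch under stated assumptions: it expects tempo2-style tim lines with "-flag value" pairs and parfile lines of the form "JUMP -sys <value> ...". The sample data and system names are invented, and the helpers are not part of pleb.

```python
def sys_values(tim_text: str, flag: str = "-sys") -> set[str]:
    """Collect the values that follow `flag` on TOA lines."""
    directives = {"C", "FORMAT", "INCLUDE", "MODE"}
    values = set()
    for line in tim_text.splitlines():
        tokens = line.split()
        # Skip blank lines, comments, and directives.
        if not tokens or tokens[0] in directives or tokens[0].startswith("#"):
            continue
        for i, token in enumerate(tokens[:-1]):
            if token == flag:
                values.add(tokens[i + 1])
    return values


def jump_values(par_text: str, flag: str = "-sys") -> set[str]:
    """Collect the values named by 'JUMP <flag> <value> ...' lines."""
    values = set()
    for line in par_text.splitlines():
        tokens = line.split()
        if len(tokens) >= 3 and tokens[0] == "JUMP" and tokens[1] == flag:
            values.add(tokens[2])
    return values


# Invented sample data for illustration only.
TIM = """FORMAT 1
toa1.ar 1400.0 55000.123 1.2 ncy -sys NRT.BON.1400 -be BON
toa2.ar 1360.0 55001.456 1.1 eff -sys EFF.EBPP.1360 -be EBPP
toa3.ar 2639.0 55002.789 0.9 eff -sys EFF.EBPP.2639 -be EBPP
"""
PAR = """PSRJ J1909-3744
JUMP -sys NRT.BON.1400 0.0 1
JUMP -sys WSRT.P1.1380 0.0 1
"""

systems = sys_values(TIM)
jumps = jump_values(PAR)
print("systems without a jump:", sorted(systems - jumps))
print("jumps without data:", sorted(jumps - systems))
```

One system normally serves as the reference and carries no jump, so a single entry in the first list is usually expected. Entries in the second list are candidates for stale jumps.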

How To Read The Outputs

For a single pulsar, the most useful direct inspection points are usually:

  • the pulsar parfile in the dataset branch,

  • the pulsar _all.tim file,

  • the backend tim files under tims/,

  • any generated <PSR>_<variant>_all.tim and <PSR>_<variant>.par files,

  • the run summary files under the run directory.

The parfile answers:

  • which jumps exist,

  • whether ephemeris and clock defaults were enforced,

  • whether the model now reflects the intended backend partitioning.

The tim files answer:

  • whether -sys and related flags were normalized consistently,

  • whether include structure still looks sane after mutation.
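One quick way to test the include structure is to confirm that every INCLUDE line in an _all.tim file points at a file that exists. The sketch below assumes include paths are relative to the directory holding the _all.tim file. The file names are invented, and the demonstration builds a throwaway directory rather than touching a real dataset.

```python
import tempfile
from pathlib import Path


def missing_includes(all_tim: Path) -> list[str]:
    """Return INCLUDE targets that do not exist relative to the tim file."""
    missing = []
    for line in all_tim.read_text().splitlines():
        tokens = line.split()
        if len(tokens) >= 2 and tokens[0] == "INCLUDE":
            if not (all_tim.parent / tokens[1]).is_file():
                missing.append(tokens[1])
    return missing


# Build a small invented layout: one backend tim file exists, one does not.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "tims").mkdir()
    (root / "tims" / "NRT.BON.1400.tim").write_text("FORMAT 1\n")
    all_tim = root / "J1909-3744_all.tim"
    all_tim.write_text(
        "FORMAT 1\n"
        "INCLUDE tims/NRT.BON.1400.tim\n"
        "INCLUDE tims/EFF.EBPP.1360.tim\n"
    )
    result = missing_includes(all_tim)

print(result)
```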

What Jumps Mean In This Workflow

A jump is not merely a nuisance parameter. In this workflow, jumps express the partitioning implied by backend/system differences.

The following points are operationally important:

  • jumps are tied to how the data are grouped,

  • wrong grouping gives wrong jump structure,

  • missing jumps can bias the fit,

  • stale jumps can clutter the model and confuse interpretation,

  • jump maintenance is a data-structure task before it is a statistical task.

Common First-Pass Errors

  • using the wrong fix_base_branch,

  • forgetting to limit to one pulsar,

  • running with run_tempo2 = true when the goal is only mutation,

  • enabling too many unrelated fix actions at once,

  • generating variants without understanding the catalogs used.

What This Stage Usually Writes

Depending on the exact settings and input data, this pass may write:

  • updated tim files with normalized flags,

  • a revised pulsar parfile with jump maintenance applied,

  • regenerated _all.tim include files,

  • optional <PSR>_<variant>_all.tim and <PSR>_<variant>.par products,

  • jump-reference CSV outputs under fix_jump_reference_csv_dir,

  • run summaries in the run directory under results_dir.

Because this pass mutates the dataset branch, it is useful to compare the raw_ingest branch and the Step-1 branch directly with git after the run.
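The branch comparison can be scripted. The sketch below only builds and prints a standard git diff --stat command. Restricting the diff to a J1909-3744 path assumes the dataset keeps one directory per pulsar, and run_diff needs the path to the dataset repository, which is not specified here.

```python
import subprocess
from typing import Optional


def branch_diff_command(base: str, new: str, path: Optional[str] = None) -> list[str]:
    """Build a 'git diff --stat' command comparing two branches."""
    cmd = ["git", "diff", "--stat", f"{base}..{new}"]
    if path:
        # Limit the comparison to one path (assumed per-pulsar directory).
        cmd += ["--", path]
    return cmd


def run_diff(repo_dir: str, cmd: list[str]) -> str:
    """Run the command inside the dataset repository and return its output."""
    done = subprocess.run(cmd, cwd=repo_dir, check=True, capture_output=True, text=True)
    return done.stdout


cmd = branch_diff_command("raw_ingest", "step1_fix_flags_variants", "J1909-3744")
print(" ".join(cmd))
```

Dropping --stat from the command shows the full line-level changes to the parfile and tim files.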

How This Stage Connects To PQC

The output of this stage is not just a cleaned branch. It is also the branch that defines the grouping vocabulary for later QC:

  • backend/system flags are more coherent,

  • expected jumps are present,

  • stale jumps are removed,

  • variant products, if enabled, are available on the branch that the detect stage will read.

This is why Step 2 commonly sets:

fix_base_branch = "step1_fix_flags_variants"

and then uses pqc_backend_col = "sys" or another grouping key that now has cleaner semantics than it had before this pass.