Ingest Mapping

The ingest mode builds a standard pulsar layout from arbitrary folders using an explicit JSON mapping file. Backend names are never auto-parsed. Every backend is defined in the mapping file using the PETA naming convention:

TEL.BACKEND.CENFREQ
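As a rough illustration of the key shape, the sketch below checks that a mapping key has exactly three non-empty, dot-separated parts. The helper `split_backend_key` and the `BackendKey` type are hypothetical names for this example and are not part of pleb. The third part is kept as a string because the document does not say it must be numeric.

```python
from typing import NamedTuple


class BackendKey(NamedTuple):
    """The three dot-separated parts of a TEL.BACKEND.CENFREQ key."""
    telescope: str
    backend: str
    cenfreq: str


def split_backend_key(key: str) -> BackendKey:
    """Split a mapping key into its parts (hypothetical helper, not pleb API)."""
    parts = key.split(".")
    # Require exactly three parts, none of them empty.
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected TEL.BACKEND.CENFREQ, got {key!r}")
    return BackendKey(*parts)


if __name__ == "__main__":
    print(split_backend_key("EFF.P200.1360"))
```

This only validates keys that a person has already written into the mapping file; it does not derive backend names from file or folder names.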

Canonical output layout

For each pulsar, ingest writes:

Jxxxx+xxxx/Jxxxx+xxxx.par
Jxxxx+xxxx/Jxxxx+xxxx_all.tim (includes each backend tim)
Jxxxx+xxxx/tims/TEL.BACKEND.CENFREQ.tim
Jxxxx+xxxx/tmplts/<original_template_name>
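The layout above can be sketched as a small path builder. This is an illustration only: `canonical_paths` is a hypothetical helper, and the output directory, pulsar name, and template file name in the usage lines are example values.

```python
from pathlib import PurePosixPath


def canonical_paths(output_dir, pulsar, backends, templates):
    """Return the paths the layout describes for one pulsar (illustrative only)."""
    base = PurePosixPath(output_dir) / pulsar
    return {
        "par": base / f"{pulsar}.par",
        "all_tim": base / f"{pulsar}_all.tim",
        # One tim file per backend, named after the mapping key.
        "tims": [base / "tims" / f"{name}.tim" for name in backends],
        # Templates keep their original file names.
        "templates": [base / "tmplts" / name for name in templates],
    }


if __name__ == "__main__":
    paths = canonical_paths(
        "/data/epta/EPTA",
        "J1939+2134",
        ["EFF.P200.1360", "NRT.NUPPI.1480"],
        ["example_template.std"],  # hypothetical template file name
    )
    for tim in paths["tims"]:
        print(tim)
```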

Running ingest

You can run ingest as a subcommand or as a config-driven mode:

pleb ingest --mapping configs/ingest_mapping.example.json --output-dir /data/epta/EPTA

Or via config:

ingest_mapping_file = "configs/ingest_mapping.example.json"
ingest_output_dir = "/data/epta/EPTA"

Then:

pleb --config pipeline.toml

JSON schema

Schema file: configs/schemas/ingest_mapping.schema.json.

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "pleb ingest mapping",
  "type": "object",
  "properties": {
    "sources": { "type": "array", "items": { "type": "string" } },
    "par_roots": { "type": "array", "items": { "type": "string" } },
    "template_roots": { "type": "array", "items": { "type": "string" } },
    "ignore_backends": { "type": "array", "items": { "type": "string" } },
    "pulsar_aliases": { "type": "object", "additionalProperties": { "type": "string" } },
    "backends": {
      "type": "object",
      "additionalProperties": {
        "type": "object",
        "properties": {
          "root": { "type": "string" },
          "ignore": { "type": "boolean" },
          "tim_glob": { "type": "string" },
          "ignore_suffixes": { "type": "array", "items": { "type": "string" } }
        },
        "required": ["root"]
      }
    }
  }
}
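For a quick sanity check before running ingest, the constraints in the schema can be mirrored by hand with the standard library. The sketch below is not a full JSON Schema validator and is not how pleb itself validates the file; `check_mapping` is a hypothetical helper. A dedicated JSON Schema library pointed at the schema file would be the more complete option.

```python
def _is_str_list(value) -> bool:
    """True if value is a list whose items are all strings."""
    return isinstance(value, list) and all(isinstance(v, str) for v in value)


def check_mapping(mapping) -> list:
    """Return a list of problems found; an empty list means the checks passed."""
    if not isinstance(mapping, dict):
        return ["mapping must be a JSON object"]
    errors = []
    for key in ("sources", "par_roots", "template_roots", "ignore_backends"):
        if key in mapping and not _is_str_list(mapping[key]):
            errors.append(f"{key}: expected an array of strings")
    aliases = mapping.get("pulsar_aliases", {})
    if not isinstance(aliases, dict) or not all(
        isinstance(v, str) for v in aliases.values()
    ):
        errors.append("pulsar_aliases: expected an object with string values")
    backends = mapping.get("backends", {})
    if not isinstance(backends, dict):
        return errors + ["backends: expected an object"]
    for name, entry in backends.items():
        if not isinstance(entry, dict):
            errors.append(f"backends.{name}: expected an object")
            continue
        # "root" is the only required property of a backend entry.
        if not isinstance(entry.get("root"), str):
            errors.append(f"backends.{name}: 'root' is required and must be a string")
        if not isinstance(entry.get("ignore", False), bool):
            errors.append(f"backends.{name}: 'ignore' must be a boolean")
        if not isinstance(entry.get("tim_glob", ""), str):
            errors.append(f"backends.{name}: 'tim_glob' must be a string")
        if "ignore_suffixes" in entry and not _is_str_list(entry["ignore_suffixes"]):
            errors.append(f"backends.{name}: 'ignore_suffixes' must be an array of strings")
    return errors


if __name__ == "__main__":
    import json
    import sys

    with open(sys.argv[1], encoding="utf-8") as fh:
        for problem in check_mapping(json.load(fh)):
            print(problem)
```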

Example mapping

{
  "sources": [
    "/data/epta/raw",
    "/data/epta/legacy"
  ],
  "par_roots": [
    "/data/epta/raw/parfiles",
    "/data/epta/legacy/par"
  ],
  "template_roots": [
    "/data/epta/raw/templates"
  ],
  "ignore_backends": [
    "NRT.OLD.CHECK"
  ],
  "pulsar_aliases": {
    "B1937+21": "J1939+2134",
    "B1855+09": "J1857+0943"
  },
  "backends": {
    "EFF.P200.1360": {
      "root": "/data/epta/raw/tim/EFF/P200/1360",
      "tim_glob": "*.tim",
      "ignore_suffixes": ["_all.tim"]
    },
    "NRT.NUPPI.1480": {
      "root": "/data/epta/legacy/NRT/NUPPI/1480",
      "tim_glob": "*.tim"
    },
    "JBO.ROACH.1520": {
      "root": "/data/epta/raw/JBO/ROACH/1520",
      "ignore": false
    }
  }
}
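The document does not spell out how `tim_glob` and `ignore_suffixes` interact. One plausible reading, shown here purely as an assumption, is that `tim_glob` selects file names under the backend `root` and `ignore_suffixes` then drops any name ending in one of the listed suffixes. `select_tims` is a hypothetical helper, and its `"*.tim"` default is likewise assumed.

```python
from fnmatch import fnmatchcase


def select_tims(filenames, tim_glob="*.tim", ignore_suffixes=()):
    """Keep names matching tim_glob, then drop names with an ignored suffix.

    Assumed semantics for illustration; pleb's actual behavior may differ.
    """
    suffixes = tuple(ignore_suffixes)
    return [
        name
        for name in filenames
        if fnmatchcase(name, tim_glob) and not name.endswith(suffixes)
    ]


if __name__ == "__main__":
    names = ["J1939+2134.tim", "J1939+2134_all.tim", "notes.txt"]
    print(select_tims(names, "*.tim", ["_all.tim"]))
```

Under this reading, the `EFF.P200.1360` entry in the example would pick up per-pulsar tim files while skipping any combined `_all.tim` file that sits in the same directory.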

How to populate the mapping

List data roots (optional): sources is informational only.
Define par roots: directories where *.par files are stored.
Define template roots: directories containing profile templates.
Add pulsar aliases: map every B-name to its standard J-name.
Define backends: one entry per backend using the PETA naming convention. Each backend entry must include a root path that contains the tim files for that backend. No automatic parsing or guessing is performed.
Ignore lists: if a backend is listed in ignore_backends it is skipped entirely.
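The ignore handling described above can be sketched as a filter over the `backends` object. The treatment of `ignore_backends` follows the text; treating a per-backend `"ignore": true` as "skip this backend" is an assumption based on the property name. `active_backends` is a hypothetical helper.

```python
def active_backends(mapping):
    """Return the backend entries that would not be skipped (illustrative)."""
    ignored = set(mapping.get("ignore_backends", []))
    return {
        name: entry
        for name, entry in mapping.get("backends", {}).items()
        # Skip names on the global ignore list and entries flagged "ignore": true
        # (the second condition is an assumed reading of the flag).
        if name not in ignored and not entry.get("ignore", False)
    }


if __name__ == "__main__":
    mapping = {
        "ignore_backends": ["NRT.OLD.CHECK"],
        "backends": {
            "EFF.P200.1360": {"root": "/data/epta/raw/tim/EFF/P200/1360"},
            "JBO.ROACH.1520": {"root": "/data/epta/raw/JBO/ROACH/1520", "ignore": False},
            "NRT.OLD.CHECK": {"root": "/data/epta/legacy/NRT/OLD/CHECK"},  # example path
        },
    }
    print(sorted(active_backends(mapping)))
```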

Strict mapping rules

Backend names come only from the mapping file keys.
If a tim file is found but its pulsar name cannot be resolved to a J-name (via explicit mapping), ingest fails.
If multiple par files map to the same pulsar, ingest fails.
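The two failure rules can be illustrated with a small resolver. Several details here are assumptions rather than documented behavior: that the pulsar name is taken from the file stem, that the explicit mapping is the `pulsar_aliases` object, and that a J-name matches `J` followed by four digits, a sign, and four digits as in the layout above. `resolve_pulsar` and `index_par_files` are hypothetical helpers.

```python
import re
from pathlib import PurePosixPath

# Assumed J-name shape, based on the Jxxxx+xxxx pattern in the layout.
J_NAME = re.compile(r"^J\d{4}[+-]\d{4}$")


def resolve_pulsar(name, aliases):
    """Resolve a name to a J-name using only explicit aliases, or fail."""
    resolved = aliases.get(name, name)
    if not J_NAME.match(resolved):
        raise ValueError(f"cannot resolve {name!r} to a J-name")
    return resolved


def index_par_files(par_paths, aliases):
    """Map each J-name to its par file; fail if two files claim one pulsar."""
    index = {}
    for path in par_paths:
        # Assumption: the pulsar name is the file name without ".par".
        pulsar = resolve_pulsar(PurePosixPath(path).stem, aliases)
        if pulsar in index:
            raise ValueError(
                f"multiple par files for {pulsar}: {index[pulsar]} and {path}"
            )
        index[pulsar] = path
    return index


if __name__ == "__main__":
    aliases = {"B1937+21": "J1939+2134", "B1855+09": "J1857+0943"}
    print(resolve_pulsar("B1855+09", aliases))
```

Failing early in both cases keeps ambiguous inputs out of the canonical layout instead of silently picking one.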