Ingest Mapping

The ingest mode builds a canonical pulsar layout from arbitrary folders using an explicit JSON mapping file. Backend names are never auto-parsed. Every backend is defined in the mapping file using the PETA naming convention:

TEL.BACKEND.CENFREQ

Canonical output layout

For each pulsar, ingest writes:

  • Jxxxx+xxxx/Jxxxx+xxxx.par

  • Jxxxx+xxxx/Jxxxx+xxxx_all.tim (includes each backend tim)

  • Jxxxx+xxxx/tims/TEL.BACKEND.CENFREQ.tim

  • Jxxxx+xxxx/tmplts/<original_template_name>

Running ingest

You can run ingest as a subcommand or as a config-driven mode:

pleb ingest --mapping configs/ingest_mapping.example.json --output-dir /data/epta/EPTA

Or via config:

ingest_mapping_file = "configs/ingest_mapping.example.json"
ingest_output_dir = "/data/epta/EPTA"

Then:

pleb --config pipeline.toml

JSON schema

Schema file: configs/schemas/ingest_mapping.schema.json.

{
  "$schema": "https://json-schema.org/draft/2020-12/schema",
  "title": "pleb ingest mapping",
  "type": "object",
  "properties": {
    "sources": { "type": "array", "items": { "type": "string" } },
    "par_roots": { "type": "array", "items": { "type": "string" } },
    "template_roots": { "type": "array", "items": { "type": "string" } },
    "ignore_backends": { "type": "array", "items": { "type": "string" } },
    "pulsar_aliases": { "type": "object", "additionalProperties": { "type": "string" } },
    "backends": {
      "type": "object",
      "additionalProperties": {
        "type": "object",
        "properties": {
          "root": { "type": "string" },
          "ignore": { "type": "boolean" },
          "tim_glob": { "type": "string" },
          "ignore_suffixes": { "type": "array", "items": { "type": "string" } }
        },
        "required": ["root"]
      }
    }
  }
}

Example mapping

{
  "sources": [
    "/data/epta/raw",
    "/data/epta/legacy"
  ],
  "par_roots": [
    "/data/epta/raw/parfiles",
    "/data/epta/legacy/par"
  ],
  "template_roots": [
    "/data/epta/raw/templates"
  ],
  "ignore_backends": [
    "NRT.OLD.CHECK"
  ],
  "pulsar_aliases": {
    "B1937+21": "J1939+2134",
    "B1855+09": "J1857+0943"
  },
  "backends": {
    "EFF.P200.1360": {
      "root": "/data/epta/raw/tim/EFF/P200/1360",
      "tim_glob": "*.tim",
      "ignore_suffixes": ["_all.tim"]
    },
    "NRT.NUPPI.1480": {
      "root": "/data/epta/legacy/NRT/NUPPI/1480",
      "tim_glob": "*.tim"
    },
    "JBO.ROACH.1520": {
      "root": "/data/epta/raw/JBO/ROACH/1520",
      "ignore": false
    }
  }
}

How to populate the mapping

  1. List data roots (optional): sources is informative only.

  2. Define par roots: directories where *.par are stored.

  3. Define template roots: directories containing profile templates.

  4. Add pulsar aliases: map every B-name to its canonical J-name.

  5. Define backends: one entry per backend using the PETA naming convention. Each backend entry must include a root path that contains the tim files for that backend. No automatic parsing or guessing is performed.

  6. Ignore lists: if a backend is listed in ignore_backends it is skipped entirely.

Strict mapping rules

  • Backend names come only from the mapping file keys.

  • If a tim file is found but its pulsar name cannot be resolved to a J-name (via explicit mapping), ingest fails.

  • If multiple par files map to the same pulsar, ingest fails.