Ingest Mapping ============== The ingest mode builds a canonical pulsar layout from arbitrary folders using an explicit JSON mapping file. Backend names are **never auto-parsed**. Every backend is defined in the mapping file using the PETA naming convention: ``TEL.BACKEND.CENFREQ`` Canonical output layout ----------------------- For each pulsar, ingest writes: - ``Jxxxx+xxxx/Jxxxx+xxxx.par`` - ``Jxxxx+xxxx/Jxxxx+xxxx_all.tim`` (includes each backend tim) - ``Jxxxx+xxxx/tims/TEL.BACKEND.CENFREQ.tim`` - ``Jxxxx+xxxx/tmplts/`` Running ingest -------------- You can run ingest as a subcommand or as a config-driven mode: .. code-block:: bash pleb ingest --mapping configs/ingest_mapping.example.json --output-dir /data/epta/EPTA Or via config: .. code-block:: toml ingest_mapping_file = "configs/ingest_mapping.example.json" ingest_output_dir = "/data/epta/EPTA" Then: .. code-block:: bash pleb --config pipeline.toml JSON schema ----------- Schema file: ``configs/schemas/ingest_mapping.schema.json``. .. code-block:: json { "$schema": "https://json-schema.org/draft/2020-12/schema", "title": "pleb ingest mapping", "type": "object", "properties": { "sources": { "type": "array", "items": { "type": "string" } }, "par_roots": { "type": "array", "items": { "type": "string" } }, "template_roots": { "type": "array", "items": { "type": "string" } }, "ignore_backends": { "type": "array", "items": { "type": "string" } }, "pulsar_aliases": { "type": "object", "additionalProperties": { "type": "string" } }, "backends": { "type": "object", "additionalProperties": { "type": "object", "properties": { "root": { "type": "string" }, "ignore": { "type": "boolean" }, "tim_glob": { "type": "string" }, "ignore_suffixes": { "type": "array", "items": { "type": "string" } } }, "required": ["root"] } } } } Example mapping --------------- .. code-block:: json { "sources": [ "/data/epta/raw", "/data/epta/legacy" ], "par_roots": [ "/data/epta/raw/parfiles", "/data/epta/legacy/par" ], "template_roots": [ "/data/epta/raw/templates" ], "ignore_backends": [ "NRT.OLD.CHECK" ], "pulsar_aliases": { "B1937+21": "J1939+2134", "B1855+09": "J1857+0943" }, "backends": { "EFF.P200.1360": { "root": "/data/epta/raw/tim/EFF/P200/1360", "tim_glob": "*.tim", "ignore_suffixes": ["_all.tim"] }, "NRT.NUPPI.1480": { "root": "/data/epta/legacy/NRT/NUPPI/1480", "tim_glob": "*.tim" }, "JBO.ROACH.1520": { "root": "/data/epta/raw/JBO/ROACH/1520", "ignore": false } } } How to populate the mapping --------------------------- 1. **List data roots** (optional): ``sources`` is informative only. 2. **Define par roots**: directories where ``*.par`` are stored. 3. **Define template roots**: directories containing profile templates. 4. **Add pulsar aliases**: map every B-name to its canonical J-name. 5. **Define backends**: one entry per backend using the PETA naming convention. Each backend entry must include a ``root`` path that contains the tim files for that backend. No automatic parsing or guessing is performed. 6. **Ignore lists**: if a backend is listed in ``ignore_backends`` it is skipped entirely. Strict mapping rules -------------------- - Backend names come **only** from the mapping file keys. - If a tim file is found but its pulsar name cannot be resolved to a J-name (via explicit mapping), ingest fails. - If multiple par files map to the same pulsar, ingest fails.