Parser & Loader (v0.1)
1. Purpose
The Parser & Loader is responsible for transforming raw specification files into a structured intermediate representation (IR).
It is the only module that interacts directly with external file formats.
It MUST NOT perform semantic validation or business rule enforcement.
2. Responsibilities
The Parser & Loader MUST:
- Read specification source files
- Validate YAML syntax
- Validate encoding
- Normalize basic data structures
- Produce a deterministic Intermediate Representation (IR)
- Detect structural anomalies at syntax level
It MUST NOT:
- Resolve semantic references
- Build graph structures
- Validate constraints
- Perform diff logic
- Mutate data after IR generation
3. Supported Input Format
3.1 Primary Format
The primary supported format is:
YAML (UTF-8 encoded)
3.2 Encoding Requirements
- Input MUST be valid UTF-8
- A leading byte order mark (BOM) MUST be ignored if present
- Invalid encoding MUST produce a fatal parsing error
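Non-normative sketch of the encoding rules above, using only the Python standard library. The names `EncodingError`, `decode_source`, and the error code `E_ENCODING` are hypothetical and chosen for illustration; they are not defined by this specification.

```python
import codecs


class EncodingError(Exception):
    """Hypothetical fatal error for invalid input encoding."""

    def __init__(self, code: str, message: str) -> None:
        super().__init__(f"{code}: {message}")
        self.code = code


def decode_source(raw: bytes) -> str:
    """Decode raw bytes as strict UTF-8, ignoring a leading BOM if present."""
    bom_len = len(codecs.BOM_UTF8) if raw.startswith(codecs.BOM_UTF8) else 0
    try:
        return raw[bom_len:].decode("utf-8", errors="strict")
    except UnicodeDecodeError as exc:
        # exc.start is relative to the slice, so add the BOM length back
        # to report an offset into the original file.
        offset = bom_len + exc.start
        raise EncodingError("E_ENCODING", f"invalid UTF-8 at byte {offset}") from exc
```

Strict decoding is used so that invalid bytes fail immediately instead of being replaced silently.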
3.3 YAML Constraints
The parser MUST enforce:
- No duplicate mapping keys
- No implicit type coercion beyond what the YAML standard defines
- No anchors or aliases unless the implementation explicitly supports them
- No executable tags
If unsupported YAML features are encountered, the parser MUST fail deterministically.
4. Intermediate Representation (IR)
The output of the Parser & Loader is an Intermediate Representation (IR).
The IR MUST:
- Represent mappings as ordered key-value structures
- Represent sequences as ordered lists
- Preserve scalar values as typed primitives
- Exclude comments
- Preserve original structure hierarchy
The IR MUST be:
- Deterministic
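- Independent of insignificant file formatting (indentation width, comments, blank lines)
- Independent of implementation iteration order (mapping keys keep their source order; see Section 7)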
5. Structural Validation at Parsing Stage
The Parser & Loader MAY perform structural validation, limited to:
- Root-level type validation (e.g., must be mapping)
- Presence of mandatory top-level keys (optional, implementation-defined)
- Detection of illegal scalar types
It MUST NOT validate:
- Referential integrity
- Process structure
- Event structure
- Constraint semantics
- Versioning rules
All semantic validation belongs to the Validation Engine.
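Non-normative sketch of the permitted root-level checks, assuming the parsed root is available as a plain Python object. `StructuralError`, `check_root`, and the key names used in the usage note are hypothetical.

```python
from collections.abc import Mapping


class StructuralError(Exception):
    """Hypothetical fatal error for syntax-level structural problems."""


def check_root(root: object, required_keys: tuple = ()) -> None:
    """Check that the root is a mapping and that required top-level keys exist."""
    if not isinstance(root, Mapping):
        raise StructuralError(f"root must be a mapping, got {type(root).__name__}")
    # Missing keys are listed in the order they were requested, so the
    # message is stable across runs.
    missing = [key for key in required_keys if key not in root]
    if missing:
        raise StructuralError("missing top-level keys: " + ", ".join(missing))
```

For example, `check_root({"version": 1}, ("version", "processes"))` would fail with a message naming `processes`. The check looks only at key presence and never at what the values mean.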
6. Duplicate Key Handling
Duplicate keys within the same mapping:
- MUST result in a fatal error
- MUST include canonical location information
- MUST be deterministic in reporting order
Silent overwriting is prohibited.
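Non-normative sketch of duplicate key detection. It assumes the underlying YAML parser can deliver each mapping as a list of (key, value, location) triples in source order; `Location`, `DuplicateKeyError`, and `build_mapping` are hypothetical names, and line/column is only one possible form of location information.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    line: int
    column: int


class DuplicateKeyError(Exception):
    """Hypothetical fatal error reporting both occurrences of a key."""

    def __init__(self, key: str, first: Location, duplicate: Location) -> None:
        super().__init__(
            f"duplicate key {key!r} at {duplicate.line}:{duplicate.column} "
            f"(first defined at {first.line}:{first.column})"
        )
        self.key = key
        self.first = first
        self.duplicate = duplicate


def build_mapping(pairs):
    """Build a mapping from (key, value, Location) triples in source order."""
    seen = {}
    result = {}
    for key, value, loc in pairs:
        if key in seen:
            # Fail on the first duplicate in document order instead of
            # overwriting, so the same input always reports the same key.
            raise DuplicateKeyError(key, seen[key], loc)
        seen[key] = loc
        result[key] = value
    return result
```

Raising at the first duplicate in document order keeps the reported error stable across runs.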
7. Normalization Rules
The Parser & Loader MUST normalize:
- Line endings
- Scalar whitespace (no trimming unless specified)
- Numeric representation (preserve numeric type)
- Boolean representation (true/false only)
It MUST NOT:
- Reorder mappings
- Reorder sequences
- Inject default values
Default value injection is prohibited at this stage.
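Non-normative sketch of two of the normalization rules. It assumes that line endings are normalized to LF and that "true/false only" means only the lowercase literals `true` and `false` resolve to booleans, with every other token left as it is. Both readings are assumptions, and the function names are hypothetical.

```python
from typing import Optional


def normalize_line_endings(text: str) -> str:
    """Convert CRLF and lone CR to LF without touching other whitespace."""
    return text.replace("\r\n", "\n").replace("\r", "\n")


def as_boolean(token: str) -> Optional[bool]:
    """Return True/False for the exact literals, or None for any other token."""
    if token == "true":
        return True
    if token == "false":
        return False
    # Tokens such as "yes", "on", or "True" are not treated as booleans here.
    return None
```

Replacing CRLF before lone CR avoids turning one Windows line break into two line feeds.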
8. Error Model Integration
Parsing errors MUST:
- Use stable error codes
- Include file location (line and column if available)
- Be categorized as fatal
- Prevent further processing
The Parser MUST stop on fatal errors.
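Non-normative sketch of a parsing error that carries the fields listed above. The class name `ParseError`, the code format (`P001`), and the rendered message layout are hypothetical; the specification only requires stable codes, location information, and fatal severity.

```python
from typing import Optional


class ParseError(Exception):
    """Hypothetical fatal parsing error with a stable code and location."""

    severity = "fatal"

    def __init__(self, code: str, message: str, path: str,
                 line: Optional[int] = None, column: Optional[int] = None) -> None:
        self.code = code
        self.message = message
        self.path = path
        self.line = line
        self.column = column
        super().__init__(self.render())

    def render(self) -> str:
        """Format as path[:line[:column]]: fatal CODE: message."""
        where = self.path
        if self.line is not None:
            where += f":{self.line}"
            if self.column is not None:
                where += f":{self.column}"
        return f"{where}: {self.severity} {self.code}: {self.message}"
```

Raising the error as an exception is one way to guarantee that processing stops at the first fatal condition.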
9. Determinism Requirements
Given identical source input:
- The IR MUST be identical
- Error output MUST be identical
- Ordering MUST be stable
Given semantically identical but syntactically different YAML:
- The IR MAY differ
- Canonical equivalence is enforced later
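Non-normative sketch of a determinism check, assuming the IR can be viewed as nested dicts, lists, and scalars. `ir_fingerprint` is a hypothetical helper that hashes an order-preserving JSON rendering of the IR, which a test suite could use to compare the output of two parsing runs.

```python
import hashlib
import json


def ir_fingerprint(ir: object) -> str:
    """Hash an IR made of dicts, lists, and scalars, preserving key order."""
    # sort_keys=False keeps mapping order, so two IRs that differ only in
    # key order produce different digests.
    encoded = json.dumps(ir, sort_keys=False, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(encoded.encode("utf-8")).hexdigest()
```

Identical input should produce identical fingerprints. Reordering keys in the source may change the fingerprint, which matches the rule that the IR MAY differ for syntactically different YAML.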
10. Multi-File Support (If Implemented)
If multi-file loading is supported:
- File resolution order MUST be deterministic
- Include/import mechanisms MUST be explicitly defined
- Circular inclusion MUST be detected
- Inclusion MUST produce a single merged IR
If not supported, the implementation MUST explicitly reject multi-file inputs.
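Non-normative sketch of deterministic file resolution with circular inclusion detection. It assumes the include relationships have already been extracted into a dict that maps each file to the files it includes, in declaration order; `resolve_includes` and the file names in the usage note are hypothetical.

```python
def resolve_includes(root: str, includes: dict) -> list:
    """Return files with included files first, or raise on a circular inclusion."""
    order = []
    done = set()
    stack = []  # current inclusion chain, used to report the cycle

    def visit(path: str) -> None:
        if path in stack:
            cycle = stack[stack.index(path):] + [path]
            raise ValueError("circular inclusion: " + " -> ".join(cycle))
        if path in done:
            return
        stack.append(path)
        # Children are visited in declaration order, so the result does not
        # depend on set or filesystem ordering.
        for child in includes.get(path, []):
            visit(child)
        stack.pop()
        done.add(path)
        order.append(path)

    visit(root)
    return order
```

With `{"a.yaml": ["b.yaml", "c.yaml"], "b.yaml": ["c.yaml"]}` and root `a.yaml`, the result is `["c.yaml", "b.yaml", "a.yaml"]`. How the resolved files are merged into a single IR is outside this sketch.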
11. Security Considerations
The Parser MUST:
- Disable remote resource loading
- Disable code execution tags
- Avoid arbitrary file inclusion
- Protect against entity expansion attacks (in YAML, nested alias expansion such as the "billion laughs" pattern)
The Parser MUST be safe against malicious YAML payloads.
12. Memory and Size Constraints
The implementation SHOULD define:
- Maximum file size
- Maximum nesting depth
- Maximum mapping size
- Maximum sequence size
Exceeding limits MUST produce deterministic failure.
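Non-normative sketch of limit enforcement over a parsed structure of dicts and lists. The `Limits` fields mirror the four limits listed above, the default values are placeholders rather than recommendations, and `LimitExceeded`, `check_file_size`, and `enforce_limits` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Limits:
    # Placeholder values; the actual limits are implementation-defined.
    max_file_bytes: int = 1_048_576
    max_depth: int = 64
    max_mapping_size: int = 10_000
    max_sequence_size: int = 100_000


class LimitExceeded(Exception):
    """Hypothetical fatal error raised when a configured limit is exceeded."""


def check_file_size(raw: bytes, limits: Limits) -> None:
    if len(raw) > limits.max_file_bytes:
        raise LimitExceeded(f"file size {len(raw)} exceeds {limits.max_file_bytes} bytes")


def enforce_limits(node: object, limits: Limits, depth: int = 1) -> None:
    """Walk containers depth-first in source order; raise on the first violation."""
    if isinstance(node, dict):
        kind, cap, children = "mapping", limits.max_mapping_size, list(node.values())
    elif isinstance(node, list):
        kind, cap, children = "sequence", limits.max_sequence_size, node
    else:
        return  # scalars add no nesting or container size
    if depth > limits.max_depth:
        raise LimitExceeded(f"nesting depth {depth} exceeds {limits.max_depth}")
    if len(children) > cap:
        raise LimitExceeded(f"{kind} size {len(children)} exceeds {cap}")
    for child in children:
        enforce_limits(child, limits, depth + 1)
```

Because the depth check runs before descending, recursion never goes deeper than `max_depth`. A production parser would more likely enforce these limits while parsing, before the full structure is built.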
13. Output Contract
The Parser & Loader MUST output:
IntermediateRepresentation {
  root: MappingNode
  metadata: {
    source_path
    checksum (optional)
  }
}
The exact internal structure MAY vary, but logical equivalence MUST hold.
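Non-normative sketch of one way the output contract could be expressed as immutable Python types. `SequenceNode`, `Metadata`, the `entries` and `items` fields, and the choice of string keys are assumptions made for illustration; only `IntermediateRepresentation`, `MappingNode`, `root`, `metadata`, `source_path`, and `checksum` come from the contract above.

```python
from dataclasses import dataclass
from typing import Optional, Tuple, Union

Scalar = Union[str, int, float, bool, None]


@dataclass(frozen=True)
class SequenceNode:
    items: Tuple["Node", ...]


@dataclass(frozen=True)
class MappingNode:
    # (key, value) pairs kept in source order; string keys are an assumption.
    entries: Tuple[Tuple[str, "Node"], ...]

    def get(self, key: str) -> "Node":
        for k, v in self.entries:
            if k == key:
                return v
        raise KeyError(key)


Node = Union[Scalar, SequenceNode, MappingNode]


@dataclass(frozen=True)
class Metadata:
    source_path: str
    checksum: Optional[str] = None


@dataclass(frozen=True)
class IntermediateRepresentation:
    root: MappingNode
    metadata: Metadata
```

Frozen dataclasses and tuples are used so that the IR cannot be mutated after generation.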
14. Integration with Next Stage
The IR produced by this module is the sole input to:
Canonical Graph Builder
No other module may consume raw YAML.
15. Non-Goals
The Parser & Loader does NOT:
- Construct the Canonical Graph
- Validate business logic
- Perform diffs
- Evaluate constraints
- Generate artifacts
Its sole purpose is syntactic parsing and safe normalization.
16. Compliance Criteria
An implementation is compliant if:
- It rejects invalid YAML deterministically
- It rejects duplicate keys
- It produces stable IR
- It does not perform semantic validation
- It enforces encoding constraints
17. Summary
The Parser & Loader is:
- Deterministic
- Strict
- Non-semantic
- Immutable in output
- Security-hardened
It prepares structured data for Canonical Graph construction.