Canonicalization Engine (v0.1)

1. Purpose

The Canonicalization Engine ensures that semantically equivalent specifications produce structurally identical Canonical Graph representations.

It eliminates non-semantic variability introduced by:

Input ordering
Formatting differences
Structural noise
Implicit defaults

Canonicalization is mandatory before any semantic processing.

2. Core Guarantees

The Canonicalization Engine MUST guarantee:

Deterministic node ordering
Deterministic attribute ordering
Deterministic edge ordering
Stable canonical paths
Stable structural comparison behavior

Two semantically equivalent graphs MUST produce identical canonicalized graphs.

3. Canonicalization Scope

Canonicalization operates exclusively on:

Nodes
Node attributes
Edges
Canonical paths
Collections

It MUST NOT:

Modify semantic meaning
Add implicit defaults (unless defined by Specification)
Remove required information
Perform validation

4. Canonical Ordering Rules

Canonical ordering MUST be applied to:

4.1 Node Collections

Nodes MUST be ordered deterministically by:

Node type (stable predefined type precedence)
Primary identifier (lexicographic)
Canonical path (as tie-breaker)

Type precedence MUST be defined statically and MUST NOT change between runs.

4.2 Attributes Within Nodes

Attributes MUST be ordered lexicographically by attribute name.

Nested mappings MUST also follow canonical ordering recursively.

4.3 Edges

Outgoing edges of each node MUST be ordered by:

Edge type
Target node canonical path

4.4 Sequences

Sequences that represent unordered semantic sets MUST be sorted deterministically.

Sequences that represent semantically ordered collections MUST preserve semantic order.

The Specification MUST define which collections are ordered vs unordered.

5. Canonical Path Stabilization

Canonical paths MUST:

Be normalized
Use consistent separators
Use normalized identifiers
Avoid trailing delimiters
Be case-sensitive unless defined otherwise

Path normalization MUST NOT alter semantic identifiers.

6. Scalar Normalization

Scalars MUST be normalized as follows:

Booleans → true / false
Integers → canonical decimal form
Floats → normalized decimal form
Strings → preserved exactly
Null → explicit null

Whitespace trimming is prohibited unless explicitly defined in Specification.

7. Structural Noise Elimination

The Canonicalization Engine MAY eliminate:

Empty optional collections
Explicit default values that match specification defaults

Only if the Specification explicitly permits omission.

It MUST NOT eliminate:

Explicit user-defined values
Semantic declarations

8. Deterministic Traversal

Canonicalization MUST produce a traversal order that is:

Stable
Depth-first or breadth-first (implementation-defined)
Consistent across runs

Traversal order MUST NOT depend on original YAML ordering.

9. Structural Hash (Optional but Recommended)

The engine MAY compute a structural hash.

If implemented:

The hash MUST depend solely on canonicalized structure
The hash MUST be stable across runs
The hash algorithm MUST be documented
Hash collisions MUST be considered theoretically possible

The hash MUST NOT depend on memory addresses or runtime state.

10. Canonical Graph Immutability

Canonicalization MUST NOT mutate the original graph instance.

It MUST:

Produce a new graph instance OR
Freeze the existing graph after canonical ordering

The resulting graph MUST be immutable.

11. Handling Unordered Semantic Sets

For collections declared semantically unordered:

Sorting MUST be applied deterministically
Sorting key MUST be stable
Sorting algorithm MUST be deterministic

For collections declared semantically ordered:

Order MUST be preserved
Canonicalization MUST NOT reorder

12. Error Handling

Canonicalization SHOULD NOT produce semantic errors.

However, it MAY produce structural errors if:

Internal graph invariants are violated
Node types are inconsistent
Required attributes are missing unexpectedly

Such errors MUST be treated as fatal internal errors.

13. Performance Expectations

Canonicalization SHOULD operate in:

O(N log N)

Where N = number of nodes + edges.

Recursive normalization MUST avoid quadratic behavior.

14. Interaction with Other Engines

Canonicalization MUST precede:

Validation Engine
Diff Engine
Versioning Engine
Migration Engine

Diff MUST compare canonicalized graphs only.

Validation MUST operate on canonicalized graph only.

15. Compliance Criteria

An implementation is compliant if:

It produces identical canonical graphs for semantically equivalent specs
It guarantees stable ordering
It prevents YAML order influence
It normalizes scalars deterministically
It preserves semantic meaning

16. Non-Goals

Canonicalization does NOT:

Validate business logic
Classify changes
Perform migration
Interpret constraints

Its purpose is structural normalization only.

17. Summary

The Canonicalization Engine:

Freezes deterministic structure
Eliminates non-semantic variance
Enables reliable diffing
Enables reproducible hashing
Guarantees structural comparability

It is foundational for evolution and validation stability.

Canonicalization Engine (v0.1) ​

1. Purpose ​

2. Core Guarantees ​

3. Canonicalization Scope ​

4. Canonical Ordering Rules ​

4.1 Node Collections ​

4.2 Attributes Within Nodes ​

4.3 Edges ​

4.4 Sequences ​

5. Canonical Path Stabilization ​

6. Scalar Normalization ​

7. Structural Noise Elimination ​

8. Deterministic Traversal ​

9. Structural Hash (Optional but Recommended) ​

10. Canonical Graph Immutability ​

11. Handling Unordered Semantic Sets ​

12. Error Handling ​

13. Performance Expectations ​

14. Interaction with Other Engines ​

15. Compliance Criteria ​

16. Non-Goals ​

17. Summary ​