Appendix C – Validation Code Listings¶

C.1. General Principle¶

The code of validation modules (deterministic checking, property‑based tests, integration with Hypothesis and TLA+) is no longer duplicated in the document. Instead, each significant piece of code is stored as a separate signed artifact in IPFS. This appendix contains: - a list of artifacts with their CIDs and checksums, - a brief description of each module’s purpose, - commands for download and verification, - examples of key data structures to understand the logic. All artifacts are part of the core-tools repository (Appendix K) and are accessible via the corresponding CIDs.

C.2. Deterministic Validation (Validation Pipeline)¶

C.2.1. Main Module¶

Purpose: sequential execution of Ruff, mypy, pytest+Hypothesis, Bandit inside a sandbox with early exit on first error. Returns a structured report in the artifact format (section 2.12).

Field	Value
CID (IPFS)	`QmValidationPipelineV2`
BLAKE3 hash	`f7e6d5c4b3a291807f6e5d4c3b2a1908f7e6d5c4b3a291807f6e5d4c3b2a1908`
File name	`deterministic_pipeline.py`
Version	2.0.1 (compatible with Python 3.12+)
Signature	`ed25519:8f7e6d…`

Download and verification:

ipfs get QmValidationPipelineV2 -o deterministic_pipeline.py
sha256sum deterministic_pipeline.py
# Expected output: f7e6d5c4b3a2…  deterministic_pipeline.py

C.2.2. Main Function Signature¶

def run_deterministic_validation(
    generated_code: str,
    module_name: str = "temp_agent",
    sandbox_id: Optional[str] = None
) -> Tuple[bool, ValidationArtifact]:
    """
    Performs a full deterministic validation cycle.
    Returns (passed, artifact), where artifact contains:
    - stage_results: results of each stage (ruff, mypy, pytest, bandit)
    - metrics: execution time, number of errors
    - content_cid: CID of the saved report
    """

C.2.3. ValidationArtifact Structure¶

{
  "artifact_id": "art_val_20260420T120000Z",
  "type": "validation_report",
  "input_code_cid": "QmCode…",
  "stages": {
    "ruff": {"status": "passed", "issues": 0, "execution_time_ms": 120},
    "mypy": {"status": "passed", "type_errors": 0, "coverage": 0.92},
    "pytest": {"status": "passed", "tests_run": 42, "failures": 0},
    "bandit": {"status": "passed", "severity_high": 0, "severity_medium": 1}
  },
  "overall_status": "passed",
  "timestamp": "2026-04-20T12:00:00Z",
  "signature": "ed25519:…"
}

C.3. Property‑Based Testing (Hypothesis)¶

C.3.1. Base Test Module¶

Purpose: templates of Hypothesis strategies and decorators for checking properties of generated code. Used inside the sandbox to discover edge cases.

Field	Value
CID (IPFS)	`QmHypothesisTestSuiteV1`
BLAKE3 hash	`a1b2c3d4e5f6…`
File name	`property_tests.py`

C.3.2. Example Strategy¶

from hypothesis import given, strategies as st

@given(st.lists(st.integers()))
def test_sum_even(lst):
    result = sum_even(lst)
    expected = sum(x for x in lst if x % 2 == 0)
    assert result == expected

The full set of strategies includes generators for: - lists, dictionaries, recursive structures, - floating‑point numbers (with nan/inf control), - strings in various encodings.

C.4. Integration with TLA+ Model Checker¶

C.4.1. TLC Launch Script¶

Purpose: automatic generation of a TLC configuration file, running the model checker inside a sandbox, and parsing the results.

Field	Value
CID (IPFS)	`QmTLAValidatorV1`
BLAKE3 hash	`c9d8e7f6a5b4…`
File name	`tla_validator.py`

C.4.2. Example of a Generated TLA+ Specification (fragment)¶

---- MODULE SimpleAgentModule ----
EXTENDS Integers, TLC
VARIABLES state, memoryUsage

Init == state = "IDLE" /\ memoryUsage = 0

Next ==
  \/ state = "IDLE" /\ state' = "PROCESSING" /\ memoryUsage' = memoryUsage + 128
  \/ state = "PROCESSING" /\ state' = "IDLE" /\ memoryUsage' = 0

Invariant == memoryUsage <= 4096

Spec == Init /\ [][Next]_<<state, memoryUsage>>

Full specifications for various modules are available in Appendix D.

C.5. Time‑Based Invariant Analysis (Time‑Based Invariants)¶

C.5.1. Checking Module¶

Purpose: verification of code containing # @time_invariant: annotations using freezegun or time‑machine.

Field	Value
CID (IPFS)	`QmTimeInvariantCheckerV1`
BLAKE3 hash	`d4c3b2a1908f7e…`
File name	`time_invariant_checker.py`

C.5.2. Example Annotation and Check¶

# @time_invariant: max_retry_delay < 5000
def retry_with_backoff():
    for attempt in range(3):
        try:
            return call_external()
        except Exception:
            time.sleep(1000 * (2 ** attempt))  # delay in ms

The module extracts annotations, emulates the passage of time, and checks the condition.

C.6. Static Security Analysis (Bandit + Semgrep)¶

C.6.1. Configuration and Wrapper¶

Purpose: unified execution of Bandit and Semgrep with custom rules, including prohibition of eval, exec, os.system, and unsafe deserializations.

Field	Value
CID (IPFS)	`QmSecurityScannerV1`
BLAKE3 hash	`e5f4a3b2c1d0…`
File name	`security_scanner.py`
Semgrep rules	`QmSemgrepRulesV1` (separate artifact)

C.6.2. Example Custom Semgrep Rule (YAML)¶

rules:
  - id: no-eval
    pattern: eval(...)
    message: "eval() is forbidden in agent-generated code"
    severity: ERROR
  - id: no-subprocess-shell
    pattern: subprocess.run(..., shell=True)
    message: "shell=True is a security risk"
    severity: ERROR

C.7. Shadow Benchmarking with Chaos Injections¶

C.7.1. Shadow Testing Manager¶

Purpose: running code in an isolated shadow container, collecting performance metrics, applying chaos scenarios, and comparing against a baseline using statistical methods (P1‑2).

Field	Value
CID (IPFS)	`QmShadowBenchmarkV2`
BLAKE3 hash	`b8a7c6d5e4f3…`
File name	`shadow_benchmark.py`

C.7.2. Main Functions¶

def run_shadow_benchmark(
    new_code: str,
    baseline_metrics: dict,
    target_module: str,
    chaos_profile: Optional[str] = None
) -> ShadowBenchmarkArtifact:
    """
    Returns an artifact with metrics, regressions, and chaos test results.
    """

The result structure includes: - throughput (p50, p95, p99), - latency (p50, p95, p99), - memory_peak_mb, - resilience_score (P1‑6), - regression_flags (if degradation exceeds a threshold).

C.8. Chaos Engineering Scenarios (Executable)¶

C.8.1. Fault Injection Scripts¶

Purpose: a set of executable scripts (in Python and Rust) that run inside the shadow container to emulate network delays, packet loss, CPU throttling, and escape attempts.

Artifact	CID	Description
`network_delay.py`	`QmChaosNetDelayV1`	Introduces delay and jitter on a network interface
`packet_loss.py`	`QmChaosPacketLossV1`	Emulates packet loss via tc‑netem
`cpu_throttle.py`	`QmChaosCpuV1`	Limits CPU via cgroups
`escape_memfd.c`	`QmEscapeMemfdV1`	Fileless injection attempt (testing only)

All scripts are signed and run with restricted privileges.

C.9. Artifact Integrity Verification¶

A manifest validation_manifest.json, available at CID QmValidationManifestV2, contains the list of all CIDs and their BLAKE3 hashes for verification.

Verification command:

ipfs get QmValidationManifestV2
jq -r '.artifacts[] | "\(.cid) \(.blake3)"' validation_manifest.json | while read cid hash; do
  ipfs get $cid -o tmp_file
  echo "$hash tmp_file" | sha256sum -c
done

C.10. Relationship with Other Sections¶

5.3 Deterministic Validation Pipeline – logic and stage descriptions.
5.5 Unit and Property‑Based Testing – use of Hypothesis.
5.9–5.11 Shadow Benchmarking & Chaos – full testing cycle.
2.12 Artifact Model and Traceability – artifact structure.
Appendix D – TLA+ specifications.

C.11. Change History¶

Version	Date	Changes	Manifest CID
V1	2026-01-15	Initial set of scripts	`QmValidationManifestV1`
V2	2026-04-20	Added statistical metrics, resilience score, updated pipeline	`QmValidationManifestV2`