Appendix C – Validation Code Listings¶
C.1. General Principle¶
The code of validation modules (deterministic checking, property‑based tests, integration with Hypothesis
and TLA+) is no longer duplicated in the document. Instead, each significant piece of code is stored as
a separate signed artifact in IPFS. This appendix contains:
- a list of artifacts with their CIDs and checksums,
- a brief description of each module’s purpose,
- commands for download and verification,
- examples of key data structures to understand the logic.
All artifacts are part of the core-tools repository (Appendix K) and are accessible via the corresponding
CIDs.
C.2. Deterministic Validation (Validation Pipeline)¶
C.2.1. Main Module¶
Purpose: sequential execution of Ruff, mypy, pytest+Hypothesis, Bandit inside a sandbox with early exit on first error. Returns a structured report in the artifact format (section 2.12).
| Field | Value |
|---|---|
| CID (IPFS) | QmValidationPipelineV2 |
| BLAKE3 hash | f7e6d5c4b3a291807f6e5d4c3b2a1908f7e6d5c4b3a291807f6e5d4c3b2a1908 |
| File name | deterministic_pipeline.py |
| Version | 2.0.1 (compatible with Python 3.12+) |
| Signature | ed25519:8f7e6d… |
Download and verification:
ipfs get QmValidationPipelineV2 -o deterministic_pipeline.py
sha256sum deterministic_pipeline.py
# Expected output: f7e6d5c4b3a2… deterministic_pipeline.py
C.2.2. Main Function Signature¶
def run_deterministic_validation(
generated_code: str,
module_name: str = "temp_agent",
sandbox_id: Optional[str] = None
) -> Tuple[bool, ValidationArtifact]:
"""
Performs a full deterministic validation cycle.
Returns (passed, artifact), where artifact contains:
- stage_results: results of each stage (ruff, mypy, pytest, bandit)
- metrics: execution time, number of errors
- content_cid: CID of the saved report
"""
C.2.3. ValidationArtifact Structure¶
{
"artifact_id": "art_val_20260420T120000Z",
"type": "validation_report",
"input_code_cid": "QmCode…",
"stages": {
"ruff": {"status": "passed", "issues": 0, "execution_time_ms": 120},
"mypy": {"status": "passed", "type_errors": 0, "coverage": 0.92},
"pytest": {"status": "passed", "tests_run": 42, "failures": 0},
"bandit": {"status": "passed", "severity_high": 0, "severity_medium": 1}
},
"overall_status": "passed",
"timestamp": "2026-04-20T12:00:00Z",
"signature": "ed25519:…"
}
C.3. Property‑Based Testing (Hypothesis)¶
C.3.1. Base Test Module¶
Purpose: templates of Hypothesis strategies and decorators for checking properties of generated code. Used inside the sandbox to discover edge cases.
| Field | Value |
|---|---|
| CID (IPFS) | QmHypothesisTestSuiteV1 |
| BLAKE3 hash | a1b2c3d4e5f6… |
| File name | property_tests.py |
C.3.2. Example Strategy¶
from hypothesis import given, strategies as st
@given(st.lists(st.integers()))
def test_sum_even(lst):
result = sum_even(lst)
expected = sum(x for x in lst if x % 2 == 0)
assert result == expected
The full set of strategies includes generators for: - lists, dictionaries, recursive structures, - floating‑point numbers (with nan/inf control), - strings in various encodings.
C.4. Integration with TLA+ Model Checker¶
C.4.1. TLC Launch Script¶
Purpose: automatic generation of a TLC configuration file, running the model checker inside a sandbox, and parsing the results.
| Field | Value |
|---|---|
| CID (IPFS) | QmTLAValidatorV1 |
| BLAKE3 hash | c9d8e7f6a5b4… |
| File name | tla_validator.py |
C.4.2. Example of a Generated TLA+ Specification (fragment)¶
---- MODULE SimpleAgentModule ----
EXTENDS Integers, TLC
VARIABLES state, memoryUsage
Init == state = "IDLE" /\ memoryUsage = 0
Next ==
\/ state = "IDLE" /\ state' = "PROCESSING" /\ memoryUsage' = memoryUsage + 128
\/ state = "PROCESSING" /\ state' = "IDLE" /\ memoryUsage' = 0
Invariant == memoryUsage <= 4096
Spec == Init /\ [][Next]_<<state, memoryUsage>>
Full specifications for various modules are available in Appendix D.
C.5. Time‑Based Invariant Analysis (Time‑Based Invariants)¶
C.5.1. Checking Module¶
Purpose: verification of code containing # @time_invariant: annotations using
freezegun or time‑machine.
| Field | Value |
|---|---|
| CID (IPFS) | QmTimeInvariantCheckerV1 |
| BLAKE3 hash | d4c3b2a1908f7e… |
| File name | time_invariant_checker.py |
C.5.2. Example Annotation and Check¶
# @time_invariant: max_retry_delay < 5000
def retry_with_backoff():
for attempt in range(3):
try:
return call_external()
except Exception:
time.sleep(1000 * (2 ** attempt)) # delay in ms
The module extracts annotations, emulates the passage of time, and checks the condition.
C.6. Static Security Analysis (Bandit + Semgrep)¶
C.6.1. Configuration and Wrapper¶
Purpose: unified execution of Bandit and Semgrep with custom rules, including prohibition of
eval, exec, os.system, and unsafe deserializations.
| Field | Value |
|---|---|
| CID (IPFS) | QmSecurityScannerV1 |
| BLAKE3 hash | e5f4a3b2c1d0… |
| File name | security_scanner.py |
| Semgrep rules | QmSemgrepRulesV1 (separate artifact) |
C.6.2. Example Custom Semgrep Rule (YAML)¶
rules:
- id: no-eval
pattern: eval(...)
message: "eval() is forbidden in agent-generated code"
severity: ERROR
- id: no-subprocess-shell
pattern: subprocess.run(..., shell=True)
message: "shell=True is a security risk"
severity: ERROR
C.7. Shadow Benchmarking with Chaos Injections¶
C.7.1. Shadow Testing Manager¶
Purpose: running code in an isolated shadow container, collecting performance metrics, applying chaos scenarios, and comparing against a baseline using statistical methods (P1‑2).
| Field | Value |
|---|---|
| CID (IPFS) | QmShadowBenchmarkV2 |
| BLAKE3 hash | b8a7c6d5e4f3… |
| File name | shadow_benchmark.py |
C.7.2. Main Functions¶
def run_shadow_benchmark(
new_code: str,
baseline_metrics: dict,
target_module: str,
chaos_profile: Optional[str] = None
) -> ShadowBenchmarkArtifact:
"""
Returns an artifact with metrics, regressions, and chaos test results.
"""
The result structure includes: - throughput (p50, p95, p99), - latency (p50, p95, p99), - memory_peak_mb, - resilience_score (P1‑6), - regression_flags (if degradation exceeds a threshold).
C.8. Chaos Engineering Scenarios (Executable)¶
C.8.1. Fault Injection Scripts¶
Purpose: a set of executable scripts (in Python and Rust) that run inside the shadow container to emulate network delays, packet loss, CPU throttling, and escape attempts.
| Artifact | CID | Description |
|---|---|---|
network_delay.py |
QmChaosNetDelayV1 |
Introduces delay and jitter on a network interface |
packet_loss.py |
QmChaosPacketLossV1 |
Emulates packet loss via tc‑netem |
cpu_throttle.py |
QmChaosCpuV1 |
Limits CPU via cgroups |
escape_memfd.c |
QmEscapeMemfdV1 |
Fileless injection attempt (testing only) |
All scripts are signed and run with restricted privileges.
C.9. Artifact Integrity Verification¶
A manifest validation_manifest.json, available at CID QmValidationManifestV2, contains the list of all CIDs and their BLAKE3 hashes for verification.
Verification command:
ipfs get QmValidationManifestV2
jq -r '.artifacts[] | "\(.cid) \(.blake3)"' validation_manifest.json | while read cid hash; do
ipfs get $cid -o tmp_file
echo "$hash tmp_file" | sha256sum -c
done
C.10. Relationship with Other Sections¶
- 5.3 Deterministic Validation Pipeline – logic and stage descriptions.
- 5.5 Unit and Property‑Based Testing – use of Hypothesis.
- 5.9–5.11 Shadow Benchmarking & Chaos – full testing cycle.
- 2.12 Artifact Model and Traceability – artifact structure.
- Appendix D – TLA+ specifications.
C.11. Change History¶
| Version | Date | Changes | Manifest CID |
|---|---|---|---|
| V1 | 2026-01-15 | Initial set of scripts | QmValidationManifestV1 |
| V2 | 2026-04-20 | Added statistical metrics, resilience score, updated pipeline | QmValidationManifestV2 |