
Appendix AA: Fine-Tuning DeepSeek-V4 for Specialized Tasks

Purpose: Describe the targeted fine‑tuning of the single MoE model DeepSeek‑V4 to improve its performance in three domains critical to the system's autonomous operation:

  1. DSL rule generation (Neuro‑Symbolic L2 Compression, Memory_Hierarchy_Mem0g.md).
  2. Formal reasoning for Neuro‑Symbolic Governance (Validation_and_Verification.md, Intrinsic_Motivation.md).
  3. Multimodal verification (Meat‑Interface 3.0 / DeepSight, Meat_Interface_Tasking.md).

Fine‑tuning is performed using the QLoRA (Quantized Low‑Rank Adaptation) method on synthetic datasets generated by the system itself during Phase 2–3. The process is fully automated and triggered by the Meta‑Decision‑Pipeline when persistent quality degradation is detected in one of the domains.


AA.1. Motivation and Launch Criteria

The base DeepSeek‑V4 model shows high quality on general tasks but may exhibit suboptimal behavior in highly specialized scenarios:

  • DSL rules: the model tends to generate redundant or syntactically incorrect S‑expressions (up to 5% compilation errors).
  • Formal proofs: generating SMT‑LIB2 specifications sometimes takes 2–3 attempts because many candidates are trivial tautologies (Concolic Filtering discards ~30% of them).
  • Multimodal verification: deepfake detection accuracy and watermark extraction on specific image types (document photos, receipts) are below the target 98%.

Fine‑tuning launch criteria (via Meta‑Decision‑Pipeline):

| Domain | Metric | Launch Threshold |
| --- | --- | --- |
| DSL | DSL rule compilation error rate | > 3% over 7 days |
| Neuro‑Symbolic Governance | Share of trivial tautologies in generated SMT specs | > 35% over 30 days |
| Multimodal | Deepfake detector False Positive Rate | > 3% over 30 days |
| Multimodal | Watermark extraction accuracy | < 95% over 30 days |

Fine‑tuning is not launched more than once every 60 days per domain to avoid overfitting.
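A minimal sketch of how the Meta‑Decision‑Pipeline could evaluate these criteria, assuming metric aggregates are already available per (metric, window); the metric identifiers and function names are illustrative, not the pipeline's actual API:

from dataclasses import dataclass

@dataclass
class LaunchCriterion:
    domain: str
    metric: str
    threshold: float
    window_days: int
    breach_above: bool   # True: trigger when value > threshold; False: when value < threshold

CRITERIA = [
    LaunchCriterion("dsl", "dsl_compile_error_rate", 0.03, 7, True),
    LaunchCriterion("governance", "trivial_tautology_share", 0.35, 30, True),
    LaunchCriterion("multimodal", "deepfake_false_positive_rate", 0.03, 30, True),
    LaunchCriterion("multimodal", "watermark_extraction_accuracy", 0.95, 30, False),
]

def domains_due_for_finetuning(metrics: dict, days_since_last_run: dict,
                               max_frequency_days: int = 60) -> set:
    """Return domains whose launch threshold is breached and whose cooldown has expired."""
    due = set()
    for c in CRITERIA:
        value = metrics.get((c.metric, c.window_days))
        if value is None:
            continue
        breached = value > c.threshold if c.breach_above else value < c.threshold
        if breached and days_since_last_run.get(c.domain, float("inf")) >= max_frequency_days:
            due.add(c.domain)
    return due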


AA.2. Synthetic Dataset Preparation

AA.2.1. Data Generation for DSL Rules

Source: successful PPO executor trajectories (MEV_and_PPO_Executors.md) that passed Batch Compression.

Process:

  1. All DistilledWisdom items with a populated dsl_rule field from the last 90 days are extracted from Mem0g L2.
  2. For each rule, a training pair is generated (see the sketch after this list):
     • Input: (context: market_regime, volatility, liquidity) + (textual strategy description)
     • Target: (DSL rule in correct syntax)
  3. Additionally, negative examples are generated: pairs with intentionally corrupted syntax, so that the model learns to distinguish valid from invalid rules.
  4. Dataset size: ≥ 5000 pairs.
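A minimal sketch of how one valid/invalid pair could be emitted as a JSONL record; the DistilledWisdom field names and the corruption rule are assumptions for illustration:

import json

# One DistilledWisdom item as it might look after extraction from Mem0g L2
# (field names are assumptions, not the actual schema).
items = [{
    "market_regime": "ranging",
    "volatility": "low",
    "liquidity": "high",
    "strategy_description": "enter on mean reversion when spread exceeds 2 sigma",
    "dsl_rule": "(when (> spread (* 2 sigma)) (enter :side :long))",
}]

def corrupt(rule: str) -> str:
    # Drop the final closing paren to create an intentionally invalid rule.
    return rule.rstrip()[:-1]

with open("dataset_dsl.jsonl", "w") as f:
    for it in items:
        prompt = (
            f"Market regime: {it['market_regime']}, volatility: {it['volatility']}, "
            f"liquidity: {it['liquidity']}.\n"
            f"Strategy: {it['strategy_description']}\n"
            "Write the corresponding DSL rule:"
        )
        f.write(json.dumps({"input": prompt, "target": it["dsl_rule"], "label": "valid"}) + "\n")
        f.write(json.dumps({"input": prompt, "target": corrupt(it["dsl_rule"]), "label": "invalid"}) + "\n")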

AA.2.2. Data Generation for Neuro‑Symbolic Governance

Source: successful ConstitutionalPrinciple items with verified ProofTree (Constitutional Debate 2.0 / Neuro_Symbolic_Governance.md).

Process:

  1. All ConstitutionalPrinciple items with a formal_proof_cid field that passed Multi‑Solver verification (Z3 + CVC4 + Yices) are extracted from Mem0g L2.
  2. For each principle, a pair is formed (see the sketch after this list):
     • Input: (L3.0 axioms + current L3.1 + proposed amendment in natural language)
     • Target: (valid SMT‑LIB2 specification)
  3. Negative examples: rejected specifications (tagged trivial or counterexample_found).
  4. Dataset size: ≥ 1000 pairs.
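For illustration only, one governance record might look like the following; the axiom text and the SMT‑LIB2 body are invented, not actual L3.0/L3.1 content:

# Illustrative governance training record (all content is invented).
record = {
    "input": (
        "L3.0 axioms: ...\n"
        "Current L3.1: ...\n"
        "Proposed amendment: no agent may execute a trade that pushes the "
        "collective reserve below the safety floor."
    ),
    "target": (
        "(declare-const reserve Real)\n"
        "(declare-const safety_floor Real)\n"
        "(declare-const trade_allowed Bool)\n"
        "(assert (=> (< reserve safety_floor) (not trade_allowed)))"
    ),
    "label": "valid",  # rejected specs instead carry "trivial" or "counterexample_found"
}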

AA.2.3. Data Generation for Multimodal Verification

Source: Canary Tasks (Meat_Interface_Tasking.md) and the Meat‑Interface 3.0 verification archive.

Process:

  1. All images that passed through CanaryVerifier with high‑confidence results (confidence > 0.90) are collected.
  2. For each image, a pair is created (see the sketch after this list):
     • Input: image + verification prompt (watermark presence / real vs. fake)
     • Target: JSON with fields status, confidence, violations
  3. For deepfake detection, synthetic images are generated via DeepSeek‑V4 in Architectus mode (DeepSight Media Synthesis) and labeled synthetic.
  4. Dataset size: ≥ 10000 images.
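An illustrative record for the multimodal set; only the target field names (status, confidence, violations) come from the step above, everything else is a placeholder:

# Illustrative multimodal training record (values are invented).
record = {
    "image_cid": "Qm...",   # IPFS CID of the image (placeholder)
    "prompt": "Verify: is a DeepSight watermark present, and is the image real or synthetic?",
    "target": {
        "status": "rejected",
        "confidence": 0.97,
        "violations": ["synthetic_image", "watermark_missing"],
    },
    "source": "synthetic",  # "canary" for CanaryVerifier images, "synthetic" for Architectus-generated
}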

AA.3. QLoRA Fine‑Tuning Procedure

AA.3.1. Configuration

| Parameter | Value |
| --- | --- |
| Base model | deepseek-v4 (weights from QmDeepSeekV4Weights) |
| Quantization | 4‑bit NF4 (bitsandbytes) |
| Adapters | LoRA: rank=64, alpha=128, target_modules=["q_proj","k_proj","v_proj","o_proj","gate_proj","up_proj","down_proj"] |
| Optimizer | AdamW 8‑bit, lr=2e‑4, cosine schedule |
| Batch size | 4 per device (gradient accumulation 16 → effective batch 64 per device) |
| Epochs | 3 for DSL, 5 for Governance, 2 for Multimodal |
| Platform | Core Node (local, 4× RTX PRO 6000) or rented cluster via Vast.ai |
| Monitoring | Weights & Biases (local server) |
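For orientation, the table maps onto the usual bitsandbytes/peft/transformers configuration objects roughly as follows (a sketch assuming the Hugging Face stack; the LoRA dropout value and the output path are assumptions not fixed by the table):

import torch
from transformers import BitsAndBytesConfig, TrainingArguments
from peft import LoraConfig

bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",            # 4-bit NF4 quantization
    bnb_4bit_compute_dtype=torch.bfloat16,
    bnb_4bit_use_double_quant=True,
)

lora_config = LoraConfig(
    r=64,
    lora_alpha=128,
    target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                    "gate_proj", "up_proj", "down_proj"],
    lora_dropout=0.05,                    # dropout value is an assumption, not from the table
    bias="none",
    task_type="CAUSAL_LM",
)

training_args = TrainingArguments(
    output_dir="/var/lib/swarm/finetuned/dsl",
    num_train_epochs=3,                   # 3 for DSL, 5 for Governance, 2 for Multimodal
    per_device_train_batch_size=4,
    gradient_accumulation_steps=16,
    learning_rate=2e-4,
    lr_scheduler_type="cosine",
    warmup_ratio=0.03,
    logging_steps=10,
    save_strategy="epoch",
    bf16=True,
    optim="paged_adamw_8bit",             # AdamW 8-bit
    report_to="wandb",                    # Weights & Biases monitoring
)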

AA.3.2. Launch Script

#!/bin/bash
# scripts/finetune_deepseek.sh
set -euo pipefail

MODEL_CID="QmDeepSeekV4Weights"
DOMAIN=$1   # "dsl", "governance", "multimodal"
DATA_CID=$2 # dataset CID
OUTPUT_DIR="/var/lib/swarm/finetuned/$DOMAIN"

# Epochs per domain (see AA.3.1 / global_policy.json)
case "$DOMAIN" in
  dsl)        EPOCHS=3 ;;
  governance) EPOCHS=5 ;;
  multimodal) EPOCHS=2 ;;
  *) echo "Unknown domain: $DOMAIN" >&2; exit 1 ;;
esac

# Download weights
ipfs get "$MODEL_CID" -o /tmp/deepseek-v4

# Download dataset
ipfs get "$DATA_CID" -o "/tmp/dataset_$DOMAIN.jsonl"

# Launch QLoRA
torchrun --nproc_per_node=4 scripts/train_qlora.py \
  --model_name_or_path /tmp/deepseek-v4 \
  --dataset_path "/tmp/dataset_$DOMAIN.jsonl" \
  --output_dir "$OUTPUT_DIR" \
  --num_train_epochs "$EPOCHS" \
  --per_device_train_batch_size 4 \
  --gradient_accumulation_steps 16 \
  --learning_rate 2e-4 \
  --lr_scheduler_type cosine \
  --warmup_ratio 0.03 \
  --logging_steps 10 \
  --save_strategy epoch \
  --bf16 \
  --use_qlora \
  --lora_r 64 \
  --lora_alpha 128

# Publish adapter to IPFS (-Q prints only the root CID)
ipfs add -rQ "$OUTPUT_DIR" > /tmp/finetuned_cid.txt
echo "Fine-tuned adapter CID: $(cat /tmp/finetuned_cid.txt)"

AA.3.3. Artifacts

After fine‑tuning completes, the following are created:

  • QLoRA adapter (CID, ~200 MB).
  • Training configuration (training_config.json).
  • Training metrics (training_metrics.json — loss, eval_loss).
  • Reference dataset (CID of the dataset used for training).

All artifacts are signed and published to IPFS.
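A hedged sketch of how the artifact bundle might be laid out and published (the JSON field layout is an assumption, the metric values are illustrative, and the signing step is omitted):

import json, pathlib, subprocess

out = pathlib.Path("/var/lib/swarm/finetuned/dsl")
out.mkdir(parents=True, exist_ok=True)

# Training configuration and metrics written next to the adapter weights.
(out / "training_config.json").write_text(json.dumps({
    "domain": "dsl",
    "base_model_cid": "QmDeepSeekV4Weights",
    "dataset_cid": "<dataset CID>",
    "qlora": {"rank": 64, "alpha": 128, "learning_rate": 2e-4, "epochs": 3},
}, indent=2))
(out / "training_metrics.json").write_text(json.dumps({
    "loss": 0.41, "eval_loss": 0.47,    # illustrative values only
}, indent=2))

# Publish the whole directory to IPFS; -Q prints only the root CID.
adapter_cid = subprocess.run(["ipfs", "add", "-rQ", str(out)],
                             capture_output=True, text=True, check=True).stdout.strip()
print("adapter CID:", adapter_cid)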


AA.4. Validation of the Fine‑Tuned Model

AA.4.1. Acceptance Metrics

The fine‑tuned adapter passes the standard Validation Pipeline (Validation_and_Verification.md) with additional domain‑specific checks:

| Domain | Metric | Target Value | Test Dataset |
| --- | --- | --- | --- |
| DSL | Compilation error rate | ≤ 1% | Hold‑out set of 500 rules |
| DSL | Agreement with reference solution (LLM as Judge, Vagrant) | ≥ 98% | Hold‑out set |
| Governance | Share of trivial tautologies (after Concolic Filtering) | ≤ 10% | 200 synthetic amendments |
| Governance | Time to successful Multi‑Solver verification | ≤ 3 min | 50 amendments |
| Multimodal | False Positive Rate (deepfake) | ≤ 1% | 2000 images |
| Multimodal | Watermark extraction accuracy | ≥ 98% | 2000 watermarked images |

AA.4.2. Acceptance Procedure

  1. Shadow testing: the adapter is loaded into a separate vLLM instance on a Regional Aggregator. All production requests are duplicated to the shadow instance for 7 days. Results are compared against the baseline (current model without adapter).
  2. A/B analysis: confidence intervals are computed (bootstrap, 95%; see the sketch after this list). If the adapter shows a statistically significant improvement in the target metric without regression in the other domains, it is considered passed.
  3. Decision Pipeline: a proposal to promote the adapter goes through Governance (BFT quorum of Core Nodes).
  4. Activation: the adapter is activated via vllm_launcher with the flag --lora-adapter <CID> for the corresponding species (Architectus for Governance, Vagrant for DSL and Multimodal).
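A paired-comparison bootstrap for step 2 can be as simple as the following sketch, assuming per-request success indicators (1/0) collected from the baseline and shadow instances during the 7-day window; the sample data are illustrative:

import random

def bootstrap_diff_ci(baseline, shadow, n_boot=10_000, alpha=0.05, seed=0):
    """95% bootstrap CI for the difference in mean success rate (shadow - baseline)."""
    rng = random.Random(seed)
    diffs = []
    for _ in range(n_boot):
        b = [rng.choice(baseline) for _ in baseline]
        s = [rng.choice(shadow) for _ in shadow]
        diffs.append(sum(s) / len(s) - sum(b) / len(b))
    diffs.sort()
    lo = diffs[int(alpha / 2 * n_boot)]
    hi = diffs[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

# Example: 1 = request handled correctly, 0 = failure, per duplicated request.
baseline = [1] * 960 + [0] * 40   # 96% baseline success (illustrative)
shadow   = [1] * 985 + [0] * 15   # 98.5% with the adapter (illustrative)
lo, hi = bootstrap_diff_ci(baseline, shadow)
print(f"95% CI for improvement: [{lo:.3f}, {hi:.3f}]")   # adapter passes if lo > 0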

AA.5. Degradation Prevention (Safety Guards)

  1. Isolated storage: adapters are stored separately from the base weights; loading and unloading are atomic.
  2. Fast rollback: if a regression in production metrics is detected within 48 hours, the adapter is automatically disabled (rollback to the base model).
  3. Domain isolation: different domains are fine‑tuned independently; their adapters do not overlap, preventing negative cross‑domain influence.
  4. Drift monitoring: the Value Drift Early‑Warning System (Memory_Hierarchy_Mem0g.md, section 9) tracks embeddings of key principles before and after adapter activation.

AA.6. Integration with Other Modules

| Module | Relationship |
| --- | --- |
| Memory_Hierarchy_Mem0g.md | Storage of datasets (L2) and adapter metrics (L0 Meta‑Mem0g). |
| Validation_and_Verification.md | Adapter validation via the standard pipeline plus domain tests. |
| Intrinsic_Motivation.md | Curiosity Engine may request fine‑tuning upon detecting persistent errors. |
| Global_State_and_Decision_Pipeline.md | finetune_launch proposal type to initiate fine‑tuning. |
| Appendix B: Launch Commands | vLLM launch commands with adapter (--lora-adapter). |
| Appendix L: Configuration Files | Fine‑tuning parameters in global_policy.json. |

AA.7. Configuration in global_policy.json

{
  "finetuning": {
    "enabled": false,
    "max_frequency_days": 60,
    "min_dataset_size": {
      "dsl": 5000,
      "governance": 1000,
      "multimodal": 10000
    },
    "shadow_test_days": 7,
    "auto_rollback_hours": 48,
    "qlora": {
      "rank": 64,
      "alpha": 128,
      "learning_rate": 2e-4,
      "epochs": {
        "dsl": 3,
        "governance": 5,
        "multimodal": 2
      }
    }
  }
}
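A short sketch of how these gates might be enforced before scheduling a run; the policy file path and the helper name are assumptions:

import json, time

def finetune_allowed(domain: str, dataset_size: int, last_run_ts: float,
                     policy_path: str = "/etc/swarm/global_policy.json") -> bool:
    """Check the global_policy.json gates: enabled flag, dataset size, cooldown."""
    with open(policy_path) as f:
        ft = json.load(f)["finetuning"]
    if not ft["enabled"]:
        return False
    if dataset_size < ft["min_dataset_size"][domain]:
        return False
    days_since_last = (time.time() - last_run_ts) / 86400
    return days_since_last >= ft["max_frequency_days"]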

AA.8. Change History

| Version | Date | Changes |
| --- | --- | --- |
| V1.0 | 2026-07-15 | Initial specification of DeepSeek‑V4 fine‑tuning. |