Appendix AA: Fine-Tuning DeepSeek-V4 for Specialized Tasks¶
Purpose: Describe the targeted fine‑tuning process of the single MoE model DeepSeek‑V4 to improve its performance in three key domains critical for the system's autonomous operation:
- DSL rule generation (Neuro‑Symbolic L2 Compression, Memory_Hierarchy_Mem0g.md).
- Formal reasoning for Neuro‑Symbolic Governance (Validation_and_Verification.md, Intrinsic_Motivation.md).
- Multimodal verification (Meat‑Interface 3.0 / DeepSight, Meat_Interface_Tasking.md).
Fine‑tuning is performed using the QLoRA (Quantized Low‑Rank Adaptation) method on synthetic datasets generated by the system itself during Phase 2–3. The process is fully automated and triggered by the Meta‑Decision‑Pipeline when persistent quality degradation is detected in one of the domains.
AA.1. Motivation and Launch Criteria¶
The base DeepSeek‑V4 model shows high quality on general tasks but may exhibit suboptimal behavior in highly specialized scenarios:
- DSL rules: the model tends to generate redundant or syntactically incorrect S‑expressions (up to 5% compilation errors).
- Formal proofs: generation of SMT‑LIB2 specifications sometimes requires 2–3 attempts due to trivial tautologies (Concolic Filtering discards ~30% as trivial).
- Multimodal verification: deepfake detection accuracy and watermark extraction on specific image types (document photos, receipts) are below the target 98%.
Fine‑tuning launch criteria (via Meta‑Decision‑Pipeline):
| Domain | Metric | Launch Threshold |
|---|---|---|
| DSL | DSL rule compilation error rate | > 3% over 7 days |
| Neuro‑Symbolic Governance | Share of trivial tautologies in generated SMT specs | > 35% over 30 days |
| Multimodal | Deepfake detector False Positive Rate | > 3% over 30 days |
| Multimodal | Watermark extraction accuracy | < 95% over 30 days |
Fine‑tuning is not launched more than once every 60 days per domain to avoid overfitting.
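The table above, together with the 60-day cooldown, can be sketched as a simple gating check. This is a minimal illustration; the metric names, the `CRITERIA` layout, and the `should_launch` helper are assumptions, not part of the Meta-Decision-Pipeline's actual API.

```python
from datetime import datetime, timedelta
from typing import Optional

# (domain, metric name, comparison, threshold, window in days) -- mirrors the
# launch-criteria table; metric identifiers are illustrative assumptions.
CRITERIA = [
    ("dsl",        "dsl_compile_error_rate",  "gt", 0.03, 7),
    ("governance", "trivial_tautology_share", "gt", 0.35, 30),
    ("multimodal", "deepfake_fpr",            "gt", 0.03, 30),
    ("multimodal", "watermark_accuracy",      "lt", 0.95, 30),
]

MIN_INTERVAL = timedelta(days=60)  # per-domain fine-tuning cooldown

def should_launch(metric: str, windowed_value: float,
                  last_launch: Optional[datetime], now: datetime) -> bool:
    """True if the windowed metric breaches its threshold and the cooldown elapsed."""
    for _domain, name, op, threshold, _window_days in CRITERIA:
        if name != metric:
            continue
        breached = windowed_value > threshold if op == "gt" else windowed_value < threshold
        cooled = last_launch is None or (now - last_launch) >= MIN_INTERVAL
        return breached and cooled
    return False
```

A metric that breaches its threshold during the cooldown window is simply ignored until the 60 days have passed.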
AA.2. Synthetic Dataset Preparation¶
AA.2.1. Data Generation for DSL Rules¶
Source: successful PPO executor trajectories (MEV_and_PPO_Executors.md) that passed Batch Compression.
Process:
- All `DistilledWisdom` items with a populated `dsl_rule` field from the last 90 days are extracted from Mem0g L2.
- For each rule, a training pair is generated:
  - Input: (context: `market_regime`, volatility, liquidity) + (textual strategy description)
  - Target: (DSL rule in correct syntax)
- Additionally, negative examples are generated: pairs with intentionally corrupted syntax, so that the model learns to distinguish valid from invalid rules.
- Dataset size: ≥ 5000 pairs.
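The pair-generation step above, including the corrupted-syntax negatives, might look like the following sketch. The record fields (`input`, `target`, `label`) and the corruption scheme (dropping closing parentheses) are illustrative assumptions.

```python
import json

def make_dsl_pairs(context: dict, description: str, dsl_rule: str) -> list:
    """Build one positive training pair and one syntactically corrupted negative."""
    prompt = f"context: {json.dumps(context, sort_keys=True)}\nstrategy: {description}"
    positive = {"input": prompt, "target": dsl_rule, "label": "valid"}
    # Corrupt the S-expression by stripping its trailing closing parentheses.
    negative = {"input": prompt, "target": dsl_rule.rstrip(")"), "label": "invalid"}
    return [positive, negative]

pairs = make_dsl_pairs(
    {"market_regime": "trending", "volatility": "low", "liquidity": "high"},
    "Enter long when momentum confirms the trend.",
    "(when (and (regime trending) (vol low)) (enter long))",
)
jsonl = "\n".join(json.dumps(p) for p in pairs)  # one JSON object per line
```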
AA.2.2. Data Generation for Neuro‑Symbolic Governance¶
Source: successful ConstitutionalPrinciple items with verified ProofTree (Constitutional Debate 2.0 / Neuro_Symbolic_Governance.md).
Process:
- All `ConstitutionalPrinciple` items with a `formal_proof_cid` field that passed Multi‑Solver verification (Z3 + CVC4 + Yices) are extracted from Mem0g L2.
- For each principle, a pair is formed:
  - Input: (L3.0 axioms + current L3.1 + proposed amendment in natural language)
  - Target: (valid SMT‑LIB2 specification)
- Negative examples: rejected specifications (tagged `trivial` or `counterexample_found`).
- Dataset size: ≥ 1000 pairs.
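The extraction and tagging logic above can be sketched as a small filter. All field names (`formal_proof_cid`, `multi_solver_verified`, `amendment_text`, `smt_spec`, `tag`) are hypothetical; the actual Mem0g item schema may differ.

```python
def build_governance_pairs(principles: list, rejected_specs: list) -> list:
    """Positive pairs from verified principles, negatives from rejected specs."""
    pairs = []
    for p in principles:
        # Keep only principles with a proof CID that passed Multi-Solver verification.
        if p.get("formal_proof_cid") and p.get("multi_solver_verified"):
            pairs.append({"input": p["amendment_text"], "target": p["smt_spec"],
                          "label": "valid"})
    for r in rejected_specs:
        # Rejected specs carry the rejection reason as their label.
        if r.get("tag") in ("trivial", "counterexample_found"):
            pairs.append({"input": r["amendment_text"], "target": r["smt_spec"],
                          "label": r["tag"]})
    return pairs

pairs = build_governance_pairs(
    [{"formal_proof_cid": "QmExampleProofCid",  # hypothetical CID
      "multi_solver_verified": True,
      "amendment_text": "Amend L3.1 clause 4",
      "smt_spec": "(assert (=> p q))"}],
    [{"tag": "trivial", "amendment_text": "No-op amendment",
      "smt_spec": "(assert true)"}],
)
```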
AA.2.3. Data Generation for Multimodal Verification¶
Source: Canary Tasks (Meat_Interface_Tasking.md) and the Meat‑Interface 3.0 verification archive.
Process:
- All images that passed through `CanaryVerifier` with high‑confidence results (confidence > 0.90) are collected.
- For each image, a pair is created:
  - Input: image + verification prompt (watermark presence / real vs fake)
  - Target: JSON with fields `status`, `confidence`, `violations`
- For deepfake detection, synthetic images are generated via DeepSeek‑V4 in `Architectus` mode (DeepSight Media Synthesis); they are labeled `synthetic`.
- Dataset size: ≥ 10000 images.
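A minimal sketch of the JSON target record: the field set (`status`, `confidence`, `violations`) follows the text above, but the allowed status values are assumptions.

```python
import json

ALLOWED_STATUSES = ("real", "synthetic", "watermarked", "unverified")  # assumed set

def make_verification_target(status: str, confidence: float, violations: list) -> str:
    """Serialize one verification target record for the multimodal dataset."""
    assert status in ALLOWED_STATUSES
    assert 0.0 <= confidence <= 1.0
    return json.dumps({"status": status, "confidence": confidence,
                       "violations": violations})

target = make_verification_target("synthetic", 0.97, ["no_watermark"])
```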
AA.3. QLoRA Fine‑Tuning Procedure¶
AA.3.1. Configuration¶
| Parameter | Value |
|---|---|
| Base model | deepseek-v4 (weights from QmDeepSeekV4Weights) |
| Quantization | 4‑bit NF4 (bitsandbytes) |
| Adapters | LoRA: rank=64, alpha=128, target_modules=["q_proj","k_proj","v_proj","o_proj","gate_proj","up_proj","down_proj"] |
| Optimizer | AdamW 8‑bit, lr=2e‑4, cosine schedule |
| Batch size | 4 (gradient accumulation 16 → effective batch 64) |
| Epochs | 3 for DSL, 5 for Governance, 2 for Multimodal |
| Platform | Core Node (local, 4× RTX PRO 6000) or rented cluster via Vast.ai |
| Monitoring | Weights & Biases (local server) |
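To gauge the adapter footprint implied by rank 64 across the seven target modules, here is a back-of-the-envelope parameter count. The hidden and intermediate dimensions below are placeholders, not real DeepSeek-V4 dimensions, which this document does not specify.

```python
# LoRA adds two matrices per adapted weight W (d_out x d_in):
# A (r x d_in) and B (d_out x r), i.e. r * (d_in + d_out) extra parameters.
def lora_params(r: int, shapes: list) -> int:
    return sum(r * (d_in + d_out) for (d_out, d_in) in shapes)

hidden, inter = 4096, 11008          # hypothetical per-expert dimensions
shapes = [(hidden, hidden)] * 4      # q_proj, k_proj, v_proj, o_proj
shapes += [(inter, hidden)] * 2      # gate_proj, up_proj
shapes += [(hidden, inter)]          # down_proj
n_per_layer = lora_params(64, shapes)  # adapter parameters per adapted layer
```

Multiplied over the adapted layers and stored in bf16, this kind of estimate is where adapter sizes in the low hundreds of megabytes (see AA.3.3) come from.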
AA.3.2. Launch Script¶
#!/bin/bash
# scripts/finetune_deepseek.sh
set -euo pipefail

MODEL_CID="QmDeepSeekV4Weights"
DOMAIN=$1   # "dsl", "governance", "multimodal"
DATA_CID=$2 # dataset CID
OUTPUT_DIR="/var/lib/swarm/finetuned/$DOMAIN"

# Epochs per domain (see AA.3.1)
case "$DOMAIN" in
  dsl)        EPOCHS=3 ;;
  governance) EPOCHS=5 ;;
  multimodal) EPOCHS=2 ;;
  *) echo "Unknown domain: $DOMAIN" >&2; exit 1 ;;
esac

# Download weights
ipfs get "$MODEL_CID" -o /tmp/deepseek-v4
# Download dataset
ipfs get "$DATA_CID" -o "/tmp/dataset_$DOMAIN.jsonl"

# Launch QLoRA
torchrun --nproc_per_node=4 scripts/train_qlora.py \
  --model_name_or_path /tmp/deepseek-v4 \
  --dataset_path "/tmp/dataset_$DOMAIN.jsonl" \
  --output_dir "$OUTPUT_DIR" \
  --num_train_epochs "$EPOCHS" \
  --per_device_train_batch_size 4 \
  --gradient_accumulation_steps 16 \
  --learning_rate 2e-4 \
  --lr_scheduler_type cosine \
  --warmup_ratio 0.03 \
  --logging_steps 10 \
  --save_strategy epoch \
  --bf16 \
  --use_qlora \
  --lora_r 64 \
  --lora_alpha 128

# Publish adapter to IPFS (-Q prints only the final CID)
FINETUNED_CID=$(ipfs add -r -Q "$OUTPUT_DIR")
echo "Fine-tuned adapter CID: $FINETUNED_CID"
AA.3.3. Artifacts¶
After fine‑tuning completes, the following are created:
- QLoRA adapter (CID, ~200 MB).
- Training configuration (`training_config.json`).
- Training metrics (`training_metrics.json`: loss, eval_loss).
- Reference dataset (CID of the dataset used for training).
All artifacts are signed and published to IPFS.
AA.4. Validation of the Fine‑Tuned Model¶
AA.4.1. Acceptance Metrics¶
The fine‑tuned adapter passes the standard Validation Pipeline (Validation_and_Verification.md) with additional domain‑specific checks:
| Domain | Metric | Target Value | Test Dataset |
|---|---|---|---|
| DSL | Compilation error rate | ≤ 1% | Hold‑out set of 500 rules |
| DSL | Agreement with reference solution (LLM as Judge, Vagrant) | ≥ 98% | Hold‑out set |
| Governance | Share of trivial tautologies (after Concolic Filtering) | ≤ 10% | 200 synthetic amendments |
| Governance | Time to successful Multi‑Solver verification | ≤ 3 min | 50 amendments |
| Multimodal | False Positive Rate (deepfake) | ≤ 1% | 2000 images |
| Multimodal | Watermark extraction accuracy | ≥ 98% | 2000 watermarked images |
AA.4.2. Acceptance Procedure¶
- Shadow testing: the adapter is loaded into a separate vLLM instance on a Regional Aggregator. All production requests are duplicated to the shadow instance for 7 days. Results are compared against the baseline (current model without adapter).
- A/B analysis: confidence intervals are computed (bootstrap, 95%). If the adapter shows a statistically significant improvement in the target metric without regression in other domains, it is considered passed.
- Decision Pipeline: a proposal to promote the adapter goes through Governance (BFT quorum of Core Nodes).
- Activation: the adapter is activated via `vllm_launcher` with the flag `--lora-adapter <CID>` for the corresponding species (`Architectus` for Governance, `Vagrant` for DSL and Multimodal).
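The A/B analysis in step 2 can be sketched as a percentile bootstrap over per-request metric deltas (adapter minus baseline). The sample deltas and the "CI excludes zero" pass rule are illustrative, not production data or the pipeline's exact statistic.

```python
import random

def bootstrap_ci(deltas: list, n_resamples: int = 2000,
                 alpha: float = 0.05, seed: int = 0):
    """Percentile bootstrap CI for the mean of metric deltas."""
    rng = random.Random(seed)
    means = sorted(
        sum(rng.choices(deltas, k=len(deltas))) / len(deltas)
        for _ in range(n_resamples)
    )
    lo = means[int(alpha / 2 * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi

# Hypothetical per-window deltas in the target metric (adapter - baseline).
deltas = [0.012, 0.009, 0.015, 0.011, 0.010, 0.013, 0.008, 0.014]
lo, hi = bootstrap_ci(deltas)
significant_improvement = lo > 0  # 95% CI excludes zero
```

Regression checks in the non-target domains would apply the same test with the opposite sign convention.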
AA.5. Degradation Prevention (Safety Guards)¶
- Isolated storage: adapters are stored separately from the base weights; loading and unloading are atomic.
- Fast rollback: if a regression in production metrics is detected within 48 hours, the adapter is automatically disabled (rollback to the base model).
- Domain isolation: different domains are fine‑tuned independently; their adapters do not overlap, preventing negative cross‑domain influence.
- Drift monitoring: the Value Drift Early‑Warning System (Memory_Hierarchy_Mem0g.md, section 9) tracks embeddings of key principles before and after adapter activation.
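The 48-hour fast-rollback guard might reduce to a check like the following. The regression test (a relative drop beyond a tolerance) is an assumed heuristic; the real pipeline may use the validation metrics from AA.4.1 directly.

```python
from datetime import datetime, timedelta

ROLLBACK_WINDOW = timedelta(hours=48)  # from the safety-guard list above

def should_rollback(activated_at: datetime, now: datetime,
                    baseline_metric: float, current_metric: float,
                    tolerance: float = 0.02) -> bool:
    """Disable the adapter if the metric regresses within the rollback window."""
    within_window = (now - activated_at) <= ROLLBACK_WINDOW
    regressed = current_metric < baseline_metric * (1 - tolerance)
    return within_window and regressed
```

After the window closes, a regression would instead be handled by the normal Meta-Decision-Pipeline criteria of AA.1.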
AA.6. Integration with Other Modules¶
| Module | Relationship |
|---|---|
| Memory_Hierarchy_Mem0g.md | Storage of datasets (L2) and adapter metrics (L0 Meta‑Mem0g). |
| Validation_and_Verification.md | Adapter validation via standard pipeline + domain tests. |
| Intrinsic_Motivation.md | Curiosity Engine may request fine‑tuning upon detecting persistent errors. |
| Global_State_and_Decision_Pipeline.md | finetune_launch proposal type to initiate fine‑tuning. |
| Appendix B: Launch Commands | vLLM launch commands with adapter (--lora-adapter). |
| Appendix L: Configuration Files | Fine‑tuning parameters in global_policy.json. |
AA.7. Configuration in global_policy.json¶
{
"finetuning": {
"enabled": false,
"max_frequency_days": 60,
"min_dataset_size": {
"dsl": 5000,
"governance": 1000,
"multimodal": 10000
},
"shadow_test_days": 7,
"auto_rollback_hours": 48,
"qlora": {
"rank": 64,
"alpha": 128,
"learning_rate": 2e-4,
"epochs": {
"dsl": 3,
"governance": 5,
"multimodal": 2
}
}
}
}
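A consumer of this block could sanity-check it before scheduling a run; the validation rules below are assumptions, not a schema the system defines.

```python
import json

def validate_finetuning(policy: dict) -> dict:
    """Basic consistency checks on the 'finetuning' block of global_policy.json."""
    ft = policy["finetuning"]
    assert ft["max_frequency_days"] >= 1
    # Every domain with a minimum dataset size must also have an epoch count.
    assert set(ft["min_dataset_size"]) == {"dsl", "governance", "multimodal"}
    assert set(ft["qlora"]["epochs"]) == set(ft["min_dataset_size"])
    return ft

policy = json.loads("""{"finetuning": {"enabled": false, "max_frequency_days": 60,
  "min_dataset_size": {"dsl": 5000, "governance": 1000, "multimodal": 10000},
  "shadow_test_days": 7, "auto_rollback_hours": 48,
  "qlora": {"rank": 64, "alpha": 128, "learning_rate": 2e-4,
            "epochs": {"dsl": 3, "governance": 5, "multimodal": 2}}}}""")
ft = validate_finetuning(policy)
```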
AA.8. Change History¶
| Version | Date | Changes |
|---|---|---|
| V1.0 | 2026-07-15 | Initial specification of DeepSeek‑V4 fine‑tuning. |