# Appendix A: GPU Configurations & Hardware Profiles
**Purpose:** Reference GPU configurations for each node type (Core Node, Regional Aggregator, Edge Node), together with power profiles, thermal envelopes, PCIe sideband monitoring wiring, and the DeepSeek‑V4 expert mask distribution by species. This appendix is used by the `hw_probe`, `vllm_launcher`, and `isolationd` components.
---
## A1. Node Configurations by Role
### A1.1. Core Node (Full Profile)
| Component | Model | Qty | Key Characteristics |
| :----------------- | :---------------------------- | :-------- | :-------------------------------------------- |
| **Anchor GPU** | NVIDIA RTX PRO 6000 Blackwell | 1 | 96 GB GDDR7, 600 W TDP, PCIe 5.0 x16 |
| **Secondary GPU** | NVIDIA RTX 5090 Ti | 1–2 | 32 GB GDDR7, 450 W TDP, PCIe 5.0 x16 |
| **CPU** | AMD Ryzen Threadripper 7960X | 1 | 24 cores, 48 threads, up to 88 usable PCIe lanes |
| **RAM** | 256 GB DDR5 ECC | 1 kit | ECC mandatory for bit‑error protection |
| **NVMe** | Samsung 990 Pro 4 TB | 2 | RAID‑0, up to 7,450 MB/s sequential read per drive |
| **Power Supply** | Seasonic Prime TX‑2200 | 1 | 2200 W, 80+ Titanium |
| **UPS** | APC Smart‑UPS SRT3000 | 1 | 3000 VA / 2700 W, Online Double‑Conversion |
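
As a sanity check on the table above, the worst-case power draw of a Core Node can be compared with the PSU and UPS ratings. The sketch below is only an illustration: the GPU, PSU, and UPS figures come from the table, while the 350 W CPU TDP is an assumption taken from AMD's published figure for the 7960X, and the function name `core_node_budget` is hypothetical. It ignores RAM, NVMe, fans, PSU conversion losses, and transient GPU spikes, so real headroom is smaller.

```python
# Nameplate figures from table A1.1, in watts.
GPU_ANCHOR_W = 600        # RTX PRO 6000 Blackwell TDP
GPU_SECONDARY_W = 450     # RTX 5090 Ti TDP
PSU_CAPACITY_W = 2200     # Seasonic PSU rating
UPS_CAPACITY_W = 2700     # APC UPS real-power rating
CPU_TDP_W = 350           # ASSUMPTION: not in the table, AMD's listed TDP for the 7960X


def core_node_budget(secondary_gpus: int) -> dict:
    """Return worst-case component draw and the remaining PSU/UPS headroom in watts."""
    if secondary_gpus not in (1, 2):
        raise ValueError("table A1.1 lists 1 or 2 secondary GPUs")
    draw = GPU_ANCHOR_W + secondary_gpus * GPU_SECONDARY_W + CPU_TDP_W
    return {
        "draw_w": draw,
        "psu_headroom_w": PSU_CAPACITY_W - draw,
        "ups_headroom_w": UPS_CAPACITY_W - draw,
    }


if __name__ == "__main__":
    for n in (1, 2):
        print(n, "secondary GPU(s):", core_node_budget(n))
```

With two secondary GPUs the nameplate draw is 1850 W, which leaves 350 W of PSU headroom before the components this sketch leaves out are counted.
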
### A1.2. Regional Aggregator (Cloud Profile)
| Component | Model | Characteristics |
| :--------- | :------------------- | :--------------------------------- |
| **GPU** | NVIDIA A10 / RTX 4090| 24 GB VRAM, 150 W (A10) to 450 W (RTX 4090) TDP |
| **vCPU** | 16–32 cores | Provider‑dependent |
| **RAM** | 64–128 GB | |
| **NVMe** | 500+ GB | |
### A1.3. Edge Node (Rentable / Lightweight Profile)
| Component | Model | Characteristics |
| :--------- | :------------------------ | :-------------------------------------- |
| **GPU** | RTX 4090 / RTX 5090 Ti | 24–32 GB VRAM |
| **vCPU** | 8–16 cores | For validation and light inference tasks|
| **RAM** | 32–64 GB | |
| **NVMe** | 100–250 GB ephemeral | |
---
## A2. DeepSeek‑V4 Expert Mask Distribution
| Species | Active Expert Share | VRAM Estimate (no offload) | Recommended Hardware |
| :------------- | :------------------ | :------------------------- | :---------------------------------- |
| **Vagrant** | 20% | ~80 GB | 1× RTX 4090 / rented |
| **Arbtiragius**| 30% | ~120 GB | 1× RTX 5090 Ti (32 GB) + CPU offload|
| **Sentinella** | 40% | ~160 GB | 2× RTX 5090 Ti or 1× RTX PRO 6000 |
| **Architectus**| 60% | ~240 GB | 2× RTX PRO 6000 or 4× RTX 5090 Ti |
| **Custodian** | 10–15% | ~40–60 GB | 1× RTX 4090 / rented, CPU offload possible |
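
The no-offload estimates in this table scale linearly with the active expert share (4 GB per percentage point), and every recommended configuration has less combined VRAM than its estimate. This suggests the recommendations rely on CPU offload or a similar memory-saving measure, although only two rows say so explicitly. The sketch below makes the gap concrete. The per-card VRAM values come from section A1; the function names are hypothetical and not part of `hw_probe` or `vllm_launcher`.

```python
# Per-card VRAM in GB as listed in A1 (illustrative lookup, not a project API).
CARD_VRAM_GB = {"RTX 4090": 24, "RTX 5090 Ti": 32, "RTX PRO 6000": 96}


def estimate_vram_gb(active_share_pct: float) -> float:
    """Linear fit to table A2: each row works out to 4 GB per percent of active experts."""
    return 4.0 * active_share_pct


def offload_needed_gb(estimate_gb: float, cards: dict) -> float:
    """GB of the no-offload estimate that does not fit in the cards' combined VRAM."""
    total_vram = sum(CARD_VRAM_GB[name] * qty for name, qty in cards.items())
    return max(0.0, estimate_gb - total_vram)


if __name__ == "__main__":
    # (species, active share in percent, one recommended configuration from A2)
    rows = [
        ("Vagrant", 20, {"RTX 4090": 1}),
        ("Arbtiragius", 30, {"RTX 5090 Ti": 1}),
        ("Sentinella", 40, {"RTX PRO 6000": 1}),
        ("Architectus", 60, {"RTX PRO 6000": 2}),
    ]
    for species, share, cards in rows:
        estimate = estimate_vram_gb(share)
        gap = offload_needed_gb(estimate, cards)
        print(f"{species}: estimate {estimate:.0f} GB, shortfall {gap:.0f} GB")
```
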
---
## A3. Power Profiles
### A3.1. NVIDIA RTX PRO 6000 Blackwell
| Mode | Power Limit | Performance | Use Case |
| :----------------- | :------------- | :---------- | :--------------------------------------------------- |
| **Max Performance**| 600 W (stock) | 100% | `Architectus` strategic tasks, formal verification |
| **Balanced** | 450 W | ~85% | `Sentinella`, threat monitoring |
| **Eco** | 300 W | ~65% | `Vagrant`, background tasks, evolution |
### A3.2. NVIDIA RTX 5090 Ti
| Mode | Power Limit | Performance | Use Case |
| :----------------- | :------------- | :---------- | :----------------------------------------------- |
| **Max Performance**| 450 W (stock) | 100% | `Arbtiragius`, high‑frequency trading |
| **Balanced** | 350 W | ~85% | `Vagrant`, validation |
| **Eco** | 250 W | ~65% | Reconnaissance, pruning, batch tasks |
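
One way to apply these profiles is the `nvidia-smi -pl` option, which sets a board power limit in watts and normally requires root. The sketch below only builds the command line and does not run it. The mode keys (`max`, `balanced`, `eco`) and the function name `power_limit_command` are hypothetical, and how the project's own components actually set limits is not specified here.

```python
# Power limits in watts, copied from tables A3.1 and A3.2.
POWER_PROFILES_W = {
    "RTX PRO 6000 Blackwell": {"max": 600, "balanced": 450, "eco": 300},
    "RTX 5090 Ti": {"max": 450, "balanced": 350, "eco": 250},
}


def power_limit_command(gpu_index: int, model: str, mode: str) -> list:
    """Build the nvidia-smi argument list that sets the power limit for one GPU."""
    try:
        watts = POWER_PROFILES_W[model][mode]
    except KeyError as exc:
        raise ValueError(f"unknown model or mode: {model!r}, {mode!r}") from exc
    # -i selects the GPU by index, -pl sets the power limit in watts.
    return ["nvidia-smi", "-i", str(gpu_index), "-pl", str(watts)]


if __name__ == "__main__":
    print(" ".join(power_limit_command(0, "RTX PRO 6000 Blackwell", "eco")))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. The driver rejects values outside the minimum and maximum limits reported for the board, so the call should be treated as fallible.
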
---
## A4. Thermal Envelope
| Node | Max GPU Temp | Action at 75 °C | Action at 85 °C | Emergency Shutdown |
| :----------------------- | :----------- | :----------------------------------------- | :----------------------------------------- | :-------------------------- |
| **Core Node (PRO 6000)** | 85 °C | Fan 80%, Power Limit 450 W → 300 W | Fan 100%, Power Limit 300 W → 250 W | 95 °C → Hardware Kill |
| **Core Node (5090 Ti)** | 83 °C | Fan 80%, Power Limit 450 W → 350 W | Fan 100%, Power Limit 350 W → 250 W | 95 °C → Hardware Kill |
| **Edge Node (rented)** | 85 °C | Load reduction, notification | Inference shutdown | Rental termination |
Cooling management is handled by the `adjust_cooling` script (see [Hardware_Isolation.md](hardware_isolation.md)).
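
The Core Node rows of the table can be read as a threshold lookup. The sketch below is not the `adjust_cooling` script, whose implementation is not shown in this appendix. It assumes that each "X W → Y W" cell means "lower the limit to Y W", that thresholds are inclusive, and that the highest threshold reached wins. The names `ThermalAction` and `thermal_action` are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ThermalAction:
    fan_pct: Optional[int]        # target fan duty cycle, None if not applicable
    power_limit_w: Optional[int]  # target power limit, None if not applicable
    hardware_kill: bool = False   # True at the emergency shutdown threshold


# (threshold in °C, action), highest threshold first, from the Core Node rows of A4.
THRESHOLDS = {
    "PRO 6000": [
        (95, ThermalAction(None, None, hardware_kill=True)),
        (85, ThermalAction(100, 250)),
        (75, ThermalAction(80, 300)),
    ],
    "5090 Ti": [
        (95, ThermalAction(None, None, hardware_kill=True)),
        (85, ThermalAction(100, 250)),
        (75, ThermalAction(80, 350)),
    ],
}


def thermal_action(gpu: str, temp_c: float) -> Optional[ThermalAction]:
    """Return the action for the highest threshold reached, or None below 75 °C."""
    for threshold, action in THRESHOLDS[gpu]:
        if temp_c >= threshold:
            return action
    return None


if __name__ == "__main__":
    for temp in (70, 78, 88, 96):
        print(temp, thermal_action("PRO 6000", temp))
```

Edge Nodes are left out because their actions (load reduction, inference shutdown, rental termination) are not expressed as fan or power settings.
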
---
## A5. PCIe Sideband Monitoring (SMBus)
An external Arduino monitors the PCIe sideband and SMBus lines to detect covert attacks on the GPU (see [Isolation_and_Sandbox.md](Isolation_and_Sandbox.md), section 4.2). The Arduino is wired as follows:
| Arduino Signal | Connection Point | Purpose |
| :------------- | :------------------------ | :----------------------------------- |
| A4 (SDA) | TPM Header (Pin 11) | SMBus data |
| A5 (SCL) | TPM Header (Pin 12) | SMBus clock |
| D2 (INT) | PCIe PERST# (Sideband) | Unauthorized bus reset detection |
| D3 (OUT) | Front Panel (PWR SW) | Soft‑Kill |
| Relay Ctrl | ATX 24‑pin (PS‑ON) | Hard‑Kill (physical disconnection) |
---
## A6. Relationship with Other Documents
- **Cold Start:** [Cold_Start_Protocol.md](cold_start_protocol.md)
- **Isolation and Watchdog:** [Isolation_and_Sandbox.md](Isolation_and_Sandbox.md)
- **vLLM Launch:** [Appendix B: Launch Commands](./Appendix_B_Launch_Commands.md)
- **Hardware BOM:** [Appendix J: Hardware BOM](./Appendix_J_Hardware_BOM.md)
- **Glossary:** [Glossary.md](Glossary.md)