DC Detailing — MANJULAB Ohio BoM | PersonaPlex + LLM Brain + RAG

🖥️ Hardware — Infrastructure

7 components $43,900

Component	Vendor / Model	Qty	Unit Cost	Total	Status	Notes
GPU Server NVIDIA A100 Node (40 GB or 80 GB VRAM)	Dell / Supermicro PowerEdge R750xa + A100 PCIe	2	$15,000	$30,000	Planned	2x nodes for redundancy; primary PersonaPlex 7B INT8 inference; confirm VRAM SKU (the A100 PCIe ships with 40 GB or 80 GB, not 48 GB)
Rack Application Server CPU App / Support Server (32-core, 128 GB RAM)	Dell PowerEdge R550	2	$3,500	$7,000	Planned	Hosts Redis, PostgreSQL, Prometheus, Grafana, Orchestrator
NAS Storage Network Attached Storage (12-bay, 48 TB raw)	Synology RS1221+ w/ 12x4TB HDD	1	$2,500	$2,500	Planned	RAG documents, transcript logs, database backups; RS1221+ has 8 internal bays, so 12 drives need the RX418 expansion unit
Core Switch SFP+ 10GbE Core Switch	MikroTik CRS326-24S+2Q+	1	$800	$800	Planned	10GbE fabric; upgrade to 25GbE at >10 TPS
Firewall / Edge Router Edge Firewall, NAT, VLAN, VPN	Fortinet FortiGate 60F	1	$700	$700	Planned	TLS offload, DDoS protection, VLAN segmentation
UPS Power Battery Backup Unit (per rack)	APC Smart-UPS 3000VA RM 2U	2	$1,200	$2,400	Planned	15-min bridge per rack; auto-shutdown on extended outage
Internet Connectivity Business Fiber ISP — MONTHLY OpEx	AT&T / Spectrum Business Business Fiber 1 Gbps	1	$500	$500	Planned	** MONTHLY recurring; upgrade to 10 Gbps at >10 TPS

Hardware Subtotal: $43,900 (includes the $500/mo ISP line; one-time hardware is $43,400)

🎙️ Voice Layer

5 components $0

Component	Vendor / Model	Qty	Unit Cost	Total	Status	Notes
PersonaPlex 7B INT8 Quantized LLM Voice Model	Meta / Community PersonaPlex 7B INT8	1	$0	$0	Planned	Full-duplex; 70ms speaker switch; runs on A100 GPU node
Mimi Encoder Speech to Token Encoder (PCM 24kHz input)	Kyutai / Custom Mimi Encoder v1	1	$0	$0	Planned	Converts raw PCM audio frames to discrete speech tokens
Mimi Decoder Token to Speech Decoder (PCM 24kHz output)	Kyutai / Custom Mimi Decoder v1	1	$0	$0	Planned	Synthesizes PCM speech output from token sequence
Temporal + Depth Transformer Full-duplex Dual-stream Transformer	Custom Temporal + Depth Transformer	1	$0	$0	Planned	Simultaneous listen + speak; 70ms context switch
Text Prompt Injector LLM Answer to Voice Prompt Injector	Custom Python Prompt Injector	1	$0	$0	Planned	Injects Brain Layer LLM answer each dialogue turn dynamically

Voice Layer Subtotal: $0 (OSS)

🧠 Brain Layer (LLM API)

5 components $330/mo

Component	Vendor / Model	Qty	Unit Cost	Total	Status	Notes
GPT-4o mini Primary LLM API — 80% of traffic	OpenAI GPT-4o mini (2024-07-18)	1	$50	$50	Active	$0.15 in / $0.60 out per M tokens; majority of routine calls
Gemini 1.5 Flash Budget LLM API — overflow / cheapest	Google DeepMind Gemini 1.5 Flash	1	$30	$30	Active	$0.075 in / $0.30 out per M tokens; lowest cost at volume
Claude Haiku Quality LLM API — best quality/price ratio	Anthropic Claude 3 Haiku	1	$50	$50	Active	$0.25 in / $1.25 out per M tokens; nuanced quality tasks
GPT-4o / Claude Sonnet Premium LLM API — 20% complex tasks	OpenAI / Anthropic GPT-4o + Claude 3.5 Sonnet	1	$200	$200	Active	$2.50+ in / $10+ out per M tokens; complex reasoning
Smart LLM Router Complexity-based LLM Request Router	Custom Python Smart Router v1	1	$0	$0	Planned	Routes 80% cheap / 20% premium based on prompt complexity score

Brain Layer Subtotal: $330/mo
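
The 80/20 routing split and the per-million-token prices listed above can be illustrated with a short sketch. This is not the Smart Router v1 itself: `complexity_score`, the keyword list, the 0.3 threshold, and the choice of one cheap and one premium model are all assumptions made for illustration. Only the prices come from the table.

```python
# USD per million tokens (input, output), copied from the Notes column above.
PRICES = {
    "gpt-4o-mini": (0.15, 0.60),
    "gemini-1.5-flash": (0.075, 0.30),
    "claude-3-haiku": (0.25, 1.25),
    "gpt-4o": (2.50, 10.00),
}

# Hypothetical markers of a "complex" prompt; tune against real traffic.
COMPLEX_HINTS = ("why", "compare", "explain", "step by step", "trade-off")


def complexity_score(prompt: str) -> float:
    """Toy score in [0, 1]: longer prompts and reasoning keywords score higher."""
    words = prompt.lower().split()
    length_part = min(len(words) / 200, 1.0)
    text = " ".join(words)
    hint_part = sum(h in text for h in COMPLEX_HINTS) / len(COMPLEX_HINTS)
    return 0.5 * length_part + 0.5 * hint_part


def route(prompt: str, threshold: float = 0.3) -> str:
    """Send prompts at or above the (assumed) threshold to the premium tier."""
    return "gpt-4o" if complexity_score(prompt) >= threshold else "gpt-4o-mini"


def call_cost(model: str, tokens_in: int, tokens_out: int) -> float:
    """Cost in USD of one call at the listed per-million-token prices."""
    price_in, price_out = PRICES[model]
    return (tokens_in * price_in + tokens_out * price_out) / 1_000_000
```

In practice the threshold would be calibrated so that roughly 20% of real prompts land on the premium tier; the monthly figures in the table then follow from `call_cost` summed over actual token counts.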

📚 Knowledge Layer (RAG)

5 components $0 (OSS)

Component	Vendor / Model	Qty	Total	Status	Notes
RAG Pipeline Retrieval-Augmented Generation Orchestrator	Custom LangChain / LlamaIndex	1	$0	Planned	Orchestrates embedding, retrieval, re-ranking, answer grounding
BGE-small Embedder Text Embedding Model (384-dim vectors)	BAAI BGE-small-en-v1.5	1	$0	Planned	Self-hosted; 33M params; CPU-inferrable at low latency
Qdrant / FAISS Vector Database for Embedding Search	Qdrant / Meta FAIR Qdrant CE / FAISS v1.7	1	$0	Planned	Stores 384-dim vectors; ANN search <50ms target
Re-ranker Cross-encoder Result Re-ranker	HuggingFace / Custom ms-marco-MiniLM-L-6-v2	1	$0	Planned	Improves top-K retrieval precision before answer injection
Document Store Source Docs Ingest (PDF / FAQ / CRM / Webhooks)	Custom / MinIO MinIO OSS (S3-compatible)	1	$0	Planned	Holds raw knowledge corpus; event-triggered re-indexing

Knowledge Layer Subtotal: $0 (OSS)
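
The embed, retrieve, re-rank flow described in this table can be sketched with the standard library alone. Everything here is a stand-in: `embed` is a hashed bag-of-words rather than BGE-small, `retrieve` is a brute-force scan rather than a Qdrant or FAISS ANN index, and `rerank` uses token overlap in place of a cross-encoder. Only the 384-dimension vector size comes from the table.

```python
import hashlib
import math

DIM = 384  # matches the BGE-small vector size listed above


def embed(text: str) -> list[float]:
    """Stand-in embedder: hashed bag-of-words, L2-normalised. Not BGE-small."""
    vec = [0.0] * DIM
    for token in text.lower().split():
        bucket = int(hashlib.md5(token.encode()).hexdigest(), 16) % DIM
        vec[bucket] += 1.0
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def cosine(a: list[float], b: list[float]) -> float:
    """Dot product of two unit vectors equals their cosine similarity."""
    return sum(x * y for x, y in zip(a, b))


def retrieve(query: str, docs: list[str], k: int = 3) -> list[str]:
    """Brute-force nearest neighbours; a vector DB would use an ANN index."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]


def rerank(query: str, candidates: list[str]) -> list[str]:
    """Stand-in for a cross-encoder: score each (query, doc) pair jointly."""
    q_tokens = set(query.lower().split())

    def pair_score(doc: str) -> float:
        d_tokens = set(doc.lower().split())
        return len(q_tokens & d_tokens) / (len(q_tokens | d_tokens) or 1)

    return sorted(candidates, key=pair_score, reverse=True)


# Hypothetical knowledge-base snippets for illustration only.
DOCS = [
    "refund policy for annual plans",
    "how to reset your password",
    "office opening hours and holidays",
    "refund requests are processed in five days",
]

top = rerank("refund policy", retrieve("refund policy", DOCS, k=2))
```

The two-stage shape is the point: a cheap vector search narrows the corpus to a few candidates, and a slower pairwise scorer orders only those candidates before the answer is injected.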

🛠️ Support Services

6 components $0 (OSS)

Component	Vendor / Model	Qty	Total	Status	Notes
Redis In-memory Cache & Session Layer	Redis Labs Redis CE 7.2	1	$0	Planned	Session state, pub/sub audio events, rate-limit counters
PostgreSQL Primary Relational Database	PostgreSQL Global Dev Group PostgreSQL 16	1	$0	Planned	Users, transcripts, billing records, system configuration
Prometheus Metrics Scraping & Alerting	CNCF Prometheus 2.48	1	$0	Planned	Scrapes GPU util, latency, token usage; Alertmanager integration
Grafana Metrics Visualization & Ops Dashboard	Grafana Labs Grafana CE 10.3	1	$0	Planned	Real-time dashboards: GPU%, p99 latency, LLM cost/min, margin
Transcript Logger Conversation Analytics & Logging Service	Custom Python FastAPI Logger	1	$0	Planned	Stores, indexes, and exports all call transcripts to PostgreSQL
Admin Dashboard Knowledge-Base Management UI	Custom React + FastAPI Admin	1	$0	Planned	Manage KB docs, model routing config, usage reporting

Support Services Subtotal: $0 (OSS)

⚙️ Session Orchestrator

1 component $0

Component	Vendor / Model	Qty	Total	Status	Notes
Session Orchestrator Audio Routing + GPU/LLM Orchestrator	Custom Python 3.11 AsyncIO Service	1	$0	Planned	Routes PCM audio, intercepts monologue, injects LLM response dynamically

Session Orchestrator Subtotal: $0
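
The per-turn injection step of the orchestrator might look roughly like the AsyncIO sketch below. It covers only the turn-level hand-off (user text in, LLM answer out as the next voice prompt), not PCM audio routing. `brain_answer`, `orchestrate`, `run_session`, and the `None` end-of-session sentinel are hypothetical names and conventions, not part of the planned service.

```python
import asyncio


async def brain_answer(user_text: str) -> str:
    """Hypothetical stand-in for the Brain Layer call (LLM API + RAG)."""
    await asyncio.sleep(0)  # yield control, as a real network call would
    return f"[grounded answer to: {user_text}]"


async def orchestrate(turns: asyncio.Queue, prompts: asyncio.Queue) -> None:
    """For each user turn, fetch an answer and inject it as the voice prompt."""
    while True:
        user_text = await turns.get()
        if user_text is None:  # sentinel: the session has ended
            await prompts.put(None)
            return
        await prompts.put(await brain_answer(user_text))


async def run_session(user_turns: list[str]) -> list[str]:
    """Feed a list of user turns through the orchestrator and collect prompts."""
    turns: asyncio.Queue = asyncio.Queue()
    prompts: asyncio.Queue = asyncio.Queue()
    task = asyncio.create_task(orchestrate(turns, prompts))
    for text in user_turns:
        await turns.put(text)
    await turns.put(None)
    injected = []
    while (prompt := await prompts.get()) is not None:
        injected.append(prompt)
    await task
    return injected


result = asyncio.run(run_session(["What are your hours?", "Do you ship to Ohio?"]))
```

Queues decouple the audio-facing side from the LLM-facing side, so a slow API call delays only the next injected prompt rather than blocking the audio loop.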

🌐 Gateway Layer

1 component $0

Component	Vendor / Model	Qty	Total	Status	Notes
nginx Gateway TLS Termination + WebSocket Reverse Proxy	nginx Inc. nginx OSS 1.25	1	$0	Planned	Rate limit: 5 concurrent conns; wss:// proxy; upgrade at scale

Gateway Layer Subtotal: $0 (OSS)

💻 Client Layer

3 components $50/mo

Component	Vendor / Model	Qty	Total	Status	Notes
WebRTC / WebSocket Real-time Browser Voice Interface	W3C / Custom JS Browser WebRTC + WebSocket Stack	1	$0	Active	PCM 24kHz bidirectional; target <100ms glass-to-glass latency
Twilio SIP Trunk PSTN / Phone-In Integration — MONTHLY OpEx	Twilio Elastic SIP Trunking	1	$50	Active	** MONTHLY; ~$0.01/min inbound PSTN; ~5k min/mo at 5 TPS
Web Voice Widget Embeddable Browser Voice Widget	Custom JS / React Embeddable Widget	1	$0	Planned	Drop-in <script> embed for client websites; mobile-responsive

Client Layer Subtotal: $50/mo
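
For link sizing, the raw bitrate of the 24 kHz PCM stream can be estimated as below. The table gives only the sample rate; 16-bit samples and a single (mono) channel are assumptions, as are the function names.

```python
SAMPLE_RATE_HZ = 24_000   # from the table above
BYTES_PER_SAMPLE = 2      # assumption: 16-bit linear PCM
CHANNELS = 1              # assumption: mono


def pcm_kbps(sample_rate_hz: int = SAMPLE_RATE_HZ,
             bytes_per_sample: int = BYTES_PER_SAMPLE,
             channels: int = CHANNELS) -> float:
    """Raw one-way bitrate of an uncompressed PCM stream, in kilobits/s."""
    return sample_rate_hz * bytes_per_sample * channels * 8 / 1000


def frame_bytes(frame_ms: int,
                sample_rate_hz: int = SAMPLE_RATE_HZ,
                bytes_per_sample: int = BYTES_PER_SAMPLE,
                channels: int = CHANNELS) -> int:
    """Payload size in bytes of one PCM frame of the given duration."""
    samples = sample_rate_hz * frame_ms // 1000
    return samples * bytes_per_sample * channels
```

Under these assumptions one direction is 384 kbps, so a full-duplex session carries about 768 kbps of raw audio before WebSocket and TLS overhead.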

🔒 Security

3 components $0

Component	Vendor / Model	Qty	Total	Status	Notes
Identity & JWT Auth Authentication, Authorization & RBAC	Custom Python-jose + FastAPI Auth	1	$0	Planned	JWT issuance, refresh, RBAC, per-tenant API key management
HashiCorp Vault Secrets & API Key Manager	HashiCorp Vault CE 1.15	1	$0	Planned	Stores LLM API keys, DB creds, TLS private keys securely
TLS Certificates SSL/TLS Cert Automation (all layers)	Let's Encrypt / EFF Certbot + ACME v2	1	$0	Active	Auto-renewing 90-day wildcard certs via DNS challenge; zero cost

Security Subtotal: $0 (OSS)

🐳 Deployment

2 components $0

Component	Vendor / Model	Qty	Total	Status	Notes
Docker Compose Dev & Staging Container Orchestration	Docker Inc. Docker CE + Compose v2	1	$0	Planned	All services in docker-compose.yml; quick local iteration
Kubernetes (K3s) Production Multi-node Container Orchestration	CNCF / Rancher Labs K3s v1.29 / kubeadm	1	$0	Planned	HPA for voice pods; rolling deploys; pair with the 25GbE fabric upgrade at >10 TPS

Deployment Subtotal: $0 (OSS)

💰 COST SUMMARY — MANJULAB Ohio DC · PersonaPlex + LLM Brain + RAG · 5 TPS

Cost Category	Budget Type	Amount	Notes
Hardware — servers, switches, UPS, NAS (ex. ISP monthly)	CapEx (one-time)	$43,400	One-time hardware minus monthly ISP
Voice Layer — model weights (OSS / open source)	CapEx (one-time)	$0	Open-source weights; $0 license cost
Knowledge Layer — OSS RAG pipeline + vector DB	CapEx (one-time)	$0	LangChain / Qdrant / BGE-small — all OSS
Support Services — OSS stack (Redis, PG, Prometheus, Grafana)	CapEx (one-time)	$0	All open-source; $0 license
Session Orchestrator + Gateway + Client + Security + Deployment	CapEx (one-time)	$0	All OSS; dev labor cost separate
TOTAL CapEx (one-time hardware + setup)	CapEx	$43,400	Upfront investment for MANJULAB Ohio
── Monthly Operating Expenses ──
LLM APIs — GPT-4o mini + Gemini Flash + Claude Haiku + Premium	OpEx (monthly)	$330	Monthly API spend at 5 TPS; scales with volume
Internet / ISP — Business Fiber 1 Gbps	OpEx (monthly)	$500	Recurring monthly; upgrade to 10G at >10 TPS
Twilio SIP — PSTN inbound (monthly estimate)	OpEx (monthly)	$50	~$0.01/min; ~5,000 min/mo at 5 TPS
TOTAL Monthly OpEx (APIs + ISP + Twilio)	OpEx / month	$880	Monthly recurring spend
TOTAL Annual OpEx (× 12 months)	OpEx / year	$10,560	Annual recurring spend estimate
YEAR 1 GRAND TOTAL (CapEx + Annual OpEx)	Year 1 Total	$53,960	Full Year 1: build-out + 12 months operations
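
The roll-up in this table can be checked with a few lines of arithmetic. The figures are copied from the tables above; the function and variable names are illustrative.

```python
# Figures copied from the tables above (USD).
HARDWARE_SUBTOTAL = 43_900  # includes the $500 monthly ISP line
MONTHLY_OPEX = {"LLM APIs": 330, "ISP fiber": 500, "Twilio SIP": 50}


def capex(hardware_subtotal: int, isp_monthly: int) -> int:
    """One-time spend: hardware subtotal with the monthly ISP line removed."""
    return hardware_subtotal - isp_monthly


def total_cost(months: int) -> int:
    """CapEx plus recurring OpEx over the given number of months."""
    monthly = sum(MONTHLY_OPEX.values())
    return capex(HARDWARE_SUBTOTAL, MONTHLY_OPEX["ISP fiber"]) + monthly * months
```

`total_cost(12)` reproduces the Year 1 figure; other horizons follow by changing `months`, with the caveat from the table that API spend scales with volume.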

🚀 Future Expansion Notes

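25GbE Network Fabric: Upgrade the MikroTik 10GbE core to a 25GbE-capable switch such as the MikroTik CRS518-16XS-2XQ or Arista 7050SX3 at >10 TPS or for video workloads (the CRS354 is a 1GbE access switch and does not provide 25GbE ports).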
GPU Redundancy: Run the two planned A100 nodes active-active for GPU redundancy at 10 TPS.
Gateway Scale: Migrate nginx OSS to nginx Plus or HAProxy Enterprise at >500 concurrent sessions.
ISP Upgrade: Upgrade to 10 Gbps symmetric fiber for low-latency multi-tenant streaming.
GPU Operator: Add Kubernetes GPU Operator for dynamic scheduling across GPU nodes.
Global CDN: Add Cloudflare CDN + TURN/STUN relay for global WebRTC client scale.
