MANJULAB Ohio Data Center · PersonaPlex + LLM Brain + RAG · 5 TPS Scale
Complete hardware, software, and service inventory with costs and status tracking.

Hardware

| Component | Vendor / Model | Qty | Unit Cost | Total | Status | Notes |
|---|---|---|---|---|---|---|
| GPU Server: NVIDIA A100 node (40 GB or 80 GB VRAM) | Dell (or Supermicro equivalent): PowerEdge R750xa + NVIDIA A100 PCIe | 2 | $15,000 | $30,000 | Planned | Two nodes for redundancy; primary PersonaPlex 7B INT8 inference |
| Rack Application Server: CPU app/support server (32-core, 128 GB RAM) | Dell: PowerEdge R550 | 2 | $3,500 | $7,000 | Planned | Hosts Redis, PostgreSQL, Prometheus, Grafana, Orchestrator |
| NAS Storage: network-attached storage (12 drives, 48 TB raw) | Synology: RS1221+ with 12x 4 TB HDD | 1 | $2,500 | $2,500 | Planned | RAG documents, transcript logs, database backups; the RS1221+ has 8 bays, so 12 drives need an RX418 expansion unit |
| Core Switch: SFP+ 10GbE core switch | MikroTik: CRS326-24S+2Q+ | 1 | $800 | $800 | Planned | 10GbE fabric; upgrade to 25GbE at >10 TPS |
| Firewall / Edge Router: edge firewall, NAT, VLAN, VPN | Fortinet: FortiGate 60F | 1 | $700 | $700 | Planned | TLS offload, DDoS protection, VLAN segmentation |
| UPS Power: battery backup unit (one per rack) | APC: Smart-UPS 3000VA RM 2U | 2 | $1,200 | $2,400 | Planned | 15-minute bridge per rack; auto-shutdown on extended outage |
| Internet Connectivity: business fiber ISP (monthly OpEx) | AT&T / Spectrum Business: Business Fiber 1 Gbps | 1 | $500/mo | $500/mo | Planned | Monthly recurring, not part of CapEx; upgrade to 10 Gbps at >10 TPS |
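
The Hardware CapEx figure can be checked row by row. The short sketch below multiplies quantity by unit cost for each hardware row and sums the results. The ISP row is left out because it is a monthly charge. The names `HARDWARE`, `line_total` and `hardware_capex` are illustrative only and do not refer to any existing tooling.

```python
# Hardware rows from the table above: (component, qty, unit cost in USD).
# The ISP row is excluded because it is a monthly OpEx charge, not CapEx.
HARDWARE = [
    ("GPU Server", 2, 15_000),
    ("Rack Application Server", 2, 3_500),
    ("NAS Storage", 1, 2_500),
    ("Core Switch", 1, 800),
    ("Firewall / Edge Router", 1, 700),
    ("UPS Power", 2, 1_200),
]


def line_total(qty: int, unit_cost: int) -> int:
    """Row total: quantity times unit cost."""
    return qty * unit_cost


def hardware_capex(items: list[tuple[str, int, int]]) -> int:
    """One-time hardware spend: the sum of all row totals."""
    return sum(line_total(qty, cost) for _, qty, cost in items)


if __name__ == "__main__":
    for name, qty, cost in HARDWARE:
        print(f"{name:<24} {qty} x ${cost:,} = ${line_total(qty, cost):,}")
    print(f"Hardware CapEx: ${hardware_capex(HARDWARE):,}")  # $43,400
```
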

Voice Layer

| Component | Vendor / Model | Qty | Unit Cost | Total | Status | Notes |
|---|---|---|---|---|---|---|
| PersonaPlex 7B INT8: quantized LLM voice model | Meta / Community: PersonaPlex 7B INT8 | 1 | $0 | $0 | Planned | Full-duplex; 70 ms speaker switch; runs on A100 GPU node |
| Mimi Encoder: speech-to-token encoder (24 kHz PCM input) | Kyutai / Custom: Mimi Encoder v1 | 1 | $0 | $0 | Planned | Converts raw PCM audio frames to discrete speech tokens |
| Mimi Decoder: token-to-speech decoder (24 kHz PCM output) | Kyutai / Custom: Mimi Decoder v1 | 1 | $0 | $0 | Planned | Synthesizes PCM speech output from the token sequence |
| Temporal + Depth Transformer: full-duplex dual-stream transformer | Custom: Temporal + Depth Transformer | 1 | $0 | $0 | Planned | Simultaneous listen + speak; 70 ms context switch |
| Text Prompt Injector: LLM answer to voice prompt injector | Custom: Python Prompt Injector | 1 | $0 | $0 | Planned | Dynamically injects the Brain Layer LLM answer on each dialogue turn |
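
The Mimi encoder consumes 24 kHz PCM, so the audio path has to cut the incoming stream into encoder-sized frames. The sketch below shows one way to do that. It rests on two assumptions that this inventory does not state:

- Mimi runs at a 12.5 Hz frame rate, giving 80 ms (1,920 samples) per frame.
- The client sends 16-bit little-endian mono PCM.

`split_pcm_frames` is a hypothetical helper and is not part of any Kyutai API.

```python
import struct

SAMPLE_RATE_HZ = 24_000   # Mimi input sample rate (from the table)
FRAME_RATE_HZ = 12.5      # assumption: Mimi's 12.5 Hz token frame rate
SAMPLES_PER_FRAME = int(SAMPLE_RATE_HZ / FRAME_RATE_HZ)  # 1,920 samples = 80 ms
BYTES_PER_SAMPLE = 2      # assumption: 16-bit signed little-endian mono PCM


def split_pcm_frames(buffer: bytes) -> tuple[list[list[int]], bytes]:
    """Cut a PCM byte buffer into whole encoder frames.

    Returns (frames, remainder). Each frame is a list of int16 samples; the
    remainder holds leftover bytes to prepend to the next network read.
    """
    frame_bytes = SAMPLES_PER_FRAME * BYTES_PER_SAMPLE
    frames = []
    offset = 0
    while offset + frame_bytes <= len(buffer):
        chunk = buffer[offset:offset + frame_bytes]
        frames.append(list(struct.unpack(f"<{SAMPLES_PER_FRAME}h", chunk)))
        offset += frame_bytes
    return frames, buffer[offset:]
```

For example, 200 ms of audio is 4,800 samples (9,600 bytes). That yields two full 1,920-sample frames plus 1,920 leftover bytes, which carry over to the next read.
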

Brain Layer

| Component | Vendor / Model | Qty | Unit Cost | Total | Status | Notes |
|---|---|---|---|---|---|---|
| GPT-4o mini: primary low-cost LLM API (80% low-cost tier) | OpenAI: GPT-4o mini (2024-07-18) | 1 | $50/mo | $50/mo | Active | $0.15 in / $0.60 out per 1M tokens; handles the majority of routine calls |
| Gemini 1.5 Flash: budget LLM API (overflow, cheapest) | Google DeepMind: Gemini 1.5 Flash | 1 | $30/mo | $30/mo | Active | $0.075 in / $0.30 out per 1M tokens; lowest cost at volume |
| Claude Haiku: quality LLM API (best quality/price ratio) | Anthropic: Claude 3 Haiku | 1 | $50/mo | $50/mo | Active | $0.25 in / $1.25 out per 1M tokens; nuanced quality tasks |
| GPT-4o / Claude Sonnet: premium LLM API (20% of traffic, complex tasks) | OpenAI / Anthropic: GPT-4o + Claude 3.5 Sonnet | 1 | $200/mo | $200/mo | Active | $2.50 in / $10 out per 1M tokens (GPT-4o); $3 in / $15 out (Claude 3.5 Sonnet); complex reasoning |
| Smart LLM Router: complexity-based LLM request router | Custom: Python Smart Router v1 | 1 | $0 | $0 | Planned | Routes ~80% of requests to low-cost models and ~20% to premium models based on a prompt complexity score |
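
The Smart Router row describes a complexity-based 80/20 split but does not define the score. The sketch below is a minimal illustration under hypothetical assumptions:

- The score is built from prompt length and a few keyword hints. Both signals and the 0.5 threshold are illustrative.
- The low-cost tier is collapsed to GPT-4o mini and the premium tier to GPT-4o, for brevity.

The sketch shows the routing decision and the per-call cost at the listed per-million-token prices.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    name: str
    usd_per_m_in: float   # USD per 1M input tokens (from the table)
    usd_per_m_out: float  # USD per 1M output tokens (from the table)


CHEAP = ModelPrice("gpt-4o-mini", 0.15, 0.60)
PREMIUM = ModelPrice("gpt-4o", 2.50, 10.00)

# Hypothetical complexity signals; a production router would tune or learn these.
COMPLEX_HINTS = ("compare", "explain why", "step by step", "analyze", "summarize")


def complexity_score(prompt: str) -> float:
    """Crude 0..1 score: half from prompt length, half from keyword hints."""
    text = prompt.lower()
    length_part = min(len(text.split()) / 200, 1.0)
    hint_part = min(sum(h in text for h in COMPLEX_HINTS) / 2, 1.0)
    return 0.5 * length_part + 0.5 * hint_part


def route(prompt: str, threshold: float = 0.5) -> ModelPrice:
    """Send prompts scoring at or above the threshold to the premium tier."""
    return PREMIUM if complexity_score(prompt) >= threshold else CHEAP


def call_cost(model: ModelPrice, tokens_in: int, tokens_out: int) -> float:
    """USD cost of one call from its token counts."""
    return (tokens_in * model.usd_per_m_in + tokens_out * model.usd_per_m_out) / 1_000_000
```

At the table's prices, consider a 1,000-token prompt with a 200-token reply. It costs about $0.00027 on GPT-4o mini and $0.0045 on GPT-4o. At that rate, the $50/month GPT-4o mini line covers roughly 185,000 such calls.
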

Knowledge Layer

| Component | Vendor / Model | Qty | Total | Status | Notes |
|---|---|---|---|---|---|
| RAG Pipeline: retrieval-augmented generation orchestrator | Custom: LangChain / LlamaIndex | 1 | $0 | Planned | Orchestrates embedding, retrieval, re-ranking, and answer grounding |
| BGE-small Embedder: text embedding model (384-dim vectors) | BAAI: BGE-small-en-v1.5 | 1 | $0 | Planned | Self-hosted; 33M parameters; low-latency CPU inference |
| Qdrant / FAISS: vector database for embedding search | Qdrant / Meta FAIR: Qdrant CE / FAISS v1.7 | 1 | $0 | Planned | Stores 384-dim vectors; ANN search target <50 ms |
| Re-ranker: cross-encoder result re-ranker | HuggingFace / Custom: ms-marco-MiniLM-L-6-v2 | 1 | $0 | Planned | Improves top-K retrieval precision before answer injection |
| Document Store: source document ingest (PDF / FAQ / CRM / webhooks) | Custom / MinIO: MinIO OSS (S3-compatible) | 1 | $0 | Planned | Holds the raw knowledge corpus; event-triggered re-indexing |
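
The retrieve-then-re-rank flow in the Knowledge Layer can be sketched in plain Python. This version makes two substitutions:

- It uses exact brute-force cosine search as a stand-in for Qdrant/FAISS approximate nearest-neighbor search.
- It takes the re-ranker as a plain `scorer(query, passage)` callable. In a real deployment that callable could wrap sentence-transformers' `CrossEncoder` loaded with ms-marco-MiniLM-L-6-v2.

The function names are hypothetical.

```python
import heapq
import math
from typing import Callable, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def retrieve(query_vec: Vector, corpus: dict[str, Vector], k: int) -> list[str]:
    """Exact top-k document ids by cosine similarity (stand-in for ANN search)."""
    return heapq.nlargest(k, corpus, key=lambda doc_id: cosine(query_vec, corpus[doc_id]))


def rerank(query: str, doc_ids: list[str], texts: dict[str, str],
           scorer: Callable[[str, str], float], top_n: int) -> list[str]:
    """Re-order candidates by a (query, passage) relevance score, best first."""
    return sorted(doc_ids, key=lambda d: scorer(query, texts[d]), reverse=True)[:top_n]
```

In production the vectors would be 384-dimensional BGE-small embeddings. The usage below uses 2-dimensional toy vectors and a word-overlap scorer purely so it can be checked by hand.
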

Support Services

| Component | Vendor / Model | Qty | Total | Status | Notes |
|---|---|---|---|---|---|
| Redis: in-memory cache and session layer | Redis Labs: Redis CE 7.2 | 1 | $0 | Planned | Session state, pub/sub audio events, rate-limit counters |
| PostgreSQL: primary relational database | PostgreSQL Global Development Group: PostgreSQL 16 | 1 | $0 | Planned | Users, transcripts, billing records, system configuration |
| Prometheus: metrics scraping and alerting | CNCF: Prometheus 2.48 | 1 | $0 | Planned | Scrapes GPU utilization, latency, token usage; AlertManager integration |
| Grafana: metrics visualization and ops dashboard | Grafana Labs: Grafana CE 10.3 | 1 | $0 | Planned | Real-time dashboards: GPU %, p99 latency, LLM cost/min, margin |
| Transcript Logger: conversation analytics and logging service | Custom: Python FastAPI Logger | 1 | $0 | Planned | Stores, indexes, and exports all call transcripts to PostgreSQL |
| Admin Dashboard: knowledge-base management UI | Custom: React + FastAPI Admin | 1 | $0 | Planned | Manage KB documents, model routing config, usage reporting |
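
Redis is listed as holding rate-limit counters. A common pattern for this is a fixed-window counter: INCR a per-tenant key for the current time window and let it expire via TTL. The sketch below mimics that pattern with an in-memory dict so it runs without a Redis server. The key layout, class name and limits are hypothetical.

```python
import time
from typing import Callable


class FixedWindowLimiter:
    """In-memory stand-in for a Redis INCR + EXPIRE fixed-window counter.

    With Redis, each request would INCR a key such as
    "ratelimit:{tenant}:{window}" (hypothetical layout) and set a TTL on
    first use. Here a dict keyed by (tenant, window) plays that role.
    """

    def __init__(self, limit: int, window_s: int,
                 clock: Callable[[], float] = time.time):
        self.limit = limit
        self.window_s = window_s
        self.clock = clock
        self.counts: dict[tuple[str, int], int] = {}

    def allow(self, tenant: str) -> bool:
        """Count this request; True if it is within the current window's limit."""
        window = int(self.clock() // self.window_s)
        key = (tenant, window)
        self.counts[key] = self.counts.get(key, 0) + 1
        # Drop counters from earlier windows (Redis would expire them via TTL).
        for old in [k for k in self.counts if k[1] < window]:
            del self.counts[old]
        return self.counts[key] <= self.limit
```

The injectable `clock` keeps the sketch testable: a fake clock can step across a window boundary to show the counter resetting.
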

Session Orchestrator

| Component | Vendor / Model | Qty | Total | Status | Notes |
|---|---|---|---|---|---|
| Session Orchestrator: audio routing + GPU/LLM orchestrator | Custom: Python 3.11 AsyncIO Service | 1 | $0 | Planned | Routes PCM audio, intercepts monologue, injects LLM responses dynamically |
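
The orchestrator's core loop forwards client audio to the voice model and splices in LLM answers as they arrive. A minimal asyncio sketch follows. It assumes three queues as the interfaces: client audio in, LLM answers in, and model input out. It also assumes a simple tagged-tuple message format. The queues, the message format and the function names are all hypothetical, not the PersonaPlex interface.

```python
import asyncio

# Hypothetical message format sent to the voice model:
#   ("audio", frame_bytes) for PCM frames, ("prompt", text) for injected answers.


async def orchestrate(client_audio: asyncio.Queue, llm_answers: asyncio.Queue,
                      to_model: asyncio.Queue) -> None:
    """Forward client audio frames to the voice model, injecting LLM answers.

    Any answers that arrived since the previous frame are injected ahead of
    the next frame. A None frame on client_audio ends the session.
    """
    while True:
        frame = await client_audio.get()
        # Drain pending LLM answers without blocking the audio path.
        while not llm_answers.empty():
            await to_model.put(("prompt", llm_answers.get_nowait()))
        if frame is None:
            break
        await to_model.put(("audio", frame))


async def demo() -> list:
    audio, answers, out = asyncio.Queue(), asyncio.Queue(), asyncio.Queue()
    await audio.put(b"frame-1")
    await answers.put("Your order ships Tuesday.")
    await audio.put(b"frame-2")
    await audio.put(None)
    await orchestrate(audio, answers, out)
    return [out.get_nowait() for _ in range(out.qsize())]


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

Draining answers non-blockingly keeps a slow LLM call from ever stalling the audio stream. The trade-off is that an answer is injected only at the next frame boundary.
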

Gateway

| Component | Vendor / Model | Qty | Total | Status | Notes |
|---|---|---|---|---|---|
| nginx Gateway: TLS termination + WebSocket reverse proxy | nginx Inc.: nginx OSS 1.25 | 1 | $0 | Planned | Rate limit: 5 concurrent connections; wss:// proxy; upgrade at scale |
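
At the proxy, the 5-connection cap would typically be enforced with nginx's `limit_conn` directive. If the orchestrator should also protect itself, a minimal asyncio sketch is shown below. It assumes a reject-rather-than-queue policy. `SessionGate` and `handle_session` are hypothetical names.

```python
import asyncio


class SessionGate:
    """Admit at most max_sessions concurrent sessions; reject (never queue) the rest.

    No lock is needed inside a single asyncio event loop because try_acquire
    has no await between the check and the increment.
    """

    def __init__(self, max_sessions: int = 5):
        self.max_sessions = max_sessions
        self.active = 0

    def try_acquire(self) -> bool:
        if self.active >= self.max_sessions:
            return False
        self.active += 1
        return True

    def release(self) -> None:
        self.active -= 1


async def handle_session(gate: SessionGate, duration_s: float) -> str:
    if not gate.try_acquire():
        return "rejected"  # a real handler might answer with HTTP 503
    try:
        await asyncio.sleep(duration_s)  # stands in for streaming audio
        return "served"
    finally:
        gate.release()


async def demo() -> list[str]:
    gate = SessionGate(max_sessions=5)
    return await asyncio.gather(*(handle_session(gate, 0.01) for _ in range(7)))
```

Running `asyncio.run(demo())` starts seven sessions at once. The first five are served and the last two are rejected immediately.
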

Client

| Component | Vendor / Model | Qty | Total | Status | Notes |
|---|---|---|---|---|---|
| WebRTC / WebSocket: real-time browser voice interface | W3C / Custom JS: Browser WebRTC + WebSocket Stack | 1 | $0 | Active | Bidirectional 24 kHz PCM; target <100 ms glass-to-glass latency |
| Twilio SIP Trunk: PSTN phone-in integration (monthly OpEx) | Twilio: Elastic SIP Trunking | 1 | $50/mo | Active | Monthly recurring; ~$0.01/min inbound PSTN; ~5,000 min/mo at 5 TPS |
| Web Voice Widget: embeddable browser voice widget | Custom JS / React: Embeddable Widget | 1 | $0 | Planned | Drop-in `<script>` embed for client websites; mobile-responsive |
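
Two figures in the Client rows reduce to simple arithmetic: the raw audio bandwidth of the 24 kHz PCM stream and the Twilio estimate. The sketch below assumes 16-bit mono PCM, uncompressed, in both directions; the sample depth is not stated in the inventory. It takes the ~$0.01/min rate directly from the table. The function names are illustrative.

```python
SAMPLE_RATE_HZ = 24_000  # from the table
BITS_PER_SAMPLE = 16     # assumption: 16-bit PCM on the wire
CHANNELS = 1             # assumption: mono


def pcm_kbps(sample_rate_hz: int = SAMPLE_RATE_HZ, bits: int = BITS_PER_SAMPLE,
             channels: int = CHANNELS) -> float:
    """Uncompressed PCM bitrate in kbit/s for one direction of one session."""
    return sample_rate_hz * bits * channels / 1000


def sessions_mbps(concurrent: int) -> float:
    """Total bidirectional PCM bandwidth in Mbit/s for N concurrent sessions."""
    return concurrent * 2 * pcm_kbps() / 1000


def twilio_monthly_usd(minutes: int, usd_per_min: float = 0.01) -> float:
    """Inbound PSTN cost estimate at the table's ~$0.01/min rate."""
    return minutes * usd_per_min
```

Under these assumptions, each direction of a session is 384 kbit/s. Five concurrent bidirectional sessions come to 3.84 Mbit/s, under 1% of the 1 Gbps link, and 5,000 minutes reproduces the $50/month Twilio line.
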

Security

| Component | Vendor / Model | Qty | Total | Status | Notes |
|---|---|---|---|---|---|
| Identity & JWT Auth: authentication, authorization, and RBAC | Custom: python-jose + FastAPI Auth | 1 | $0 | Planned | JWT issuance and refresh, RBAC, per-tenant API key management |
| HashiCorp Vault: secrets and API key manager | HashiCorp: Vault CE 1.15 | 1 | $0 | Planned | Securely stores LLM API keys, database credentials, and TLS private keys |
| TLS Certificates: SSL/TLS certificate automation (all layers) | Let's Encrypt / EFF: Certbot + ACME v2 | 1 | $0 | Active | Auto-renewing 90-day wildcard certificates via DNS challenge; zero cost |
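
The auth service is listed as using python-jose, whose `jwt.encode` / `jwt.decode` handle signing and verification. To show what those calls do for HS256, here is a standard-library-only sketch: sign a header and payload with HMAC-SHA256, then check the signature, algorithm and expiry on the way back in. The function names, the 15-minute default lifetime and the hard-coded demo secret are assumptions. In this stack, the signing key would come from Vault.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url without padding, as JWT requires."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def issue_token(claims: dict, secret: bytes, ttl_s: int = 900) -> str:
    """Sign claims as an HS256 JWT with an 'exp' claim ttl_s seconds ahead."""
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {**claims, "exp": int(time.time()) + ttl_s}
    signing_input = (f"{_b64url(json.dumps(header).encode())}."
                     f"{_b64url(json.dumps(payload).encode())}")
    sig = hmac.new(secret, signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(sig)}"


def verify_token(token: str, secret: bytes) -> dict:
    """Return the claims if signature, alg and expiry are valid; else raise ValueError."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
    except ValueError:
        raise ValueError("malformed token") from None
    signing_input = f"{header_b64}.{payload_b64}".encode("ascii")
    expected = hmac.new(secret, signing_input, hashlib.sha256).digest()
    # Constant-time comparison avoids leaking signature bytes through timing.
    if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
        raise ValueError("bad signature")
    if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
        raise ValueError("unexpected alg")
    claims = json.loads(_b64url_decode(payload_b64))
    if claims.get("exp", 0) < time.time():
        raise ValueError("token expired")
    return claims
```
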

Deployment

| Component | Vendor / Model | Qty | Total | Status | Notes |
|---|---|---|---|---|---|
| Docker Compose: dev and staging container orchestration | Docker Inc.: Docker CE + Compose v2 | 1 | $0 | Planned | All services defined in docker-compose.yml; quick local iteration |
| Kubernetes (K3s): production multi-node container orchestration | CNCF / Rancher Labs: K3s v1.29 / kubeadm | 1 | $0 | Planned | HPA for voice pods; rolling deploys; upgrade to 25GbE at 10 TPS |
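
The Horizontal Pod Autoscaler (HPA) scales voice pods using the documented Kubernetes rule: desiredReplicas = ceil(currentReplicas × currentMetric / targetMetric). Scaling is skipped when the ratio is within a tolerance of 1.0 (0.1 by default), and the result is clamped to the configured minimum and maximum replicas. The sketch below reproduces that rule. The example metric, active sessions per pod, is hypothetical. So are the bounds; assuming one voice pod per GPU, maxReplicas would in practice be capped by the two A100 nodes.

```python
import math


def hpa_desired_replicas(current_replicas: int, current_metric: float,
                         target_metric: float, tolerance: float = 0.1,
                         min_replicas: int = 1, max_replicas: int = 10) -> int:
    """Kubernetes HPA rule: ceil(current * currentMetric / targetMetric).

    No scaling happens while the metric ratio is within +/- tolerance of 1.0.
    The result is clamped to [min_replicas, max_replicas].
    """
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

For example, 2 pods averaging 90 against a target of 60 scale to ceil(2 × 1.5) = 3. A reading of 63 (ratio 1.05) stays at 2 because it falls inside the tolerance band.
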

Cost Summary

| Cost Category | Budget Type | Amount | Notes |
|---|---|---|---|
| Hardware: servers, switch, firewall, UPS, NAS (excluding monthly ISP) | CapEx (one-time) | $43,400 | One-time hardware; the monthly ISP charge is excluded |
| Voice Layer — model weights (OSS / open source) | CapEx (one-time) | $0 | Open-source weights; $0 license cost |
| Knowledge Layer — OSS RAG pipeline + vector DB | CapEx (one-time) | $0 | LangChain / Qdrant / BGE-small — all OSS |
| Support Services — OSS stack (Redis, PG, Prometheus, Grafana) | CapEx (one-time) | $0 | All open-source; $0 license |
| Session Orchestrator + Gateway + Client + Security + Deployment | CapEx (one-time) | $0 | All OSS; dev labor cost separate |
| TOTAL CapEx (one-time hardware + setup) | CapEx | $43,400 | Upfront investment for MANJULAB Ohio |
| **Monthly Operating Expenses** | | | |
| LLM APIs — GPT-4o mini + Gemini Flash + Claude Haiku + Premium | OpEx (monthly) | $330 | Monthly API spend at 5 TPS; scales with volume |
| Internet / ISP — Business Fiber 1 Gbps | OpEx (monthly) | $500 | Recurring monthly; upgrade to 10G at >10 TPS |
| Twilio SIP — PSTN inbound (monthly estimate) | OpEx (monthly) | $50 | ~$0.01/min; ~5,000 min/mo at 5 TPS |
| TOTAL Monthly OpEx (APIs + ISP + Twilio) | OpEx / month | $880 | Monthly recurring spend |
| TOTAL Annual OpEx (× 12 months) | OpEx / year | $10,560 | Annual recurring spend estimate |
| YEAR 1 GRAND TOTAL (CapEx + Annual OpEx) | Year 1 Total | $53,960 | Full Year 1: build-out + 12 months operations |
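
The recurring totals roll up mechanically from the monthly lines. This sketch recomputes the monthly OpEx, annual OpEx and Year 1 grand total from the table's own figures. It is useful for what-if checks when a line changes. The dictionary keys and the function name are illustrative.

```python
CAPEX_USD = 43_400  # TOTAL CapEx row

# Monthly OpEx lines from the table (USD per month).
MONTHLY_OPEX_USD = {
    "LLM APIs": 330,        # 50 + 30 + 50 + 200 from the Brain Layer rows
    "Internet / ISP": 500,
    "Twilio SIP": 50,
}


def year_one_total(capex: int, monthly: dict[str, int], months: int = 12) -> dict[str, int]:
    """Roll monthly OpEx up to an annual figure and add one-time CapEx."""
    monthly_total = sum(monthly.values())
    annual = monthly_total * months
    return {"monthly_opex": monthly_total, "annual_opex": annual,
            "year_one_total": capex + annual}


if __name__ == "__main__":
    print(year_one_total(CAPEX_USD, MONTHLY_OPEX_USD))
```

With the table's figures this gives $880/month, $10,560/year and $53,960 for Year 1. As a what-if, doubling LLM spend to $660/month would raise the Year 1 total to $57,920.
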