📄 ManjuLAB Research · May 2026

Voice and Role Control for Full-Duplex Conversational AI

whizyoga · ManjuLAB Research Team
ManjuLAB · yogabrata.com · whizyoga@manjulab.com · +1 425-502-1519

Overview

Conversational AI has historically forced a trade-off. Traditional cascaded systems (ASR, then LLM, then TTS) offer voice and role customization but produce stilted, robotic conversations. Full-duplex models like Moshi made AI conversations feel natural, but locked users into a single fixed voice and role.

1o1 by ManjuLAB breaks this trade-off. Select from a diverse range of voices and define any role through a plain text prompt -- no retraining required. 1o1 delivers genuinely natural conversations, handles interruptions and backchannels, and maintains your chosen persona throughout.

Key insight: By combining voice prompting (audio embedding) with text prompting (natural language role definition) in a single hybrid system prompt, 1o1 disentangles speech naturalness from task-following behavior -- enabling both without compromise.

Capabilities

Full-Duplex Interaction

1o1 listens and speaks simultaneously. This removes the delays that cascaded systems introduce between stages and lets the model learn natural conversational behaviors, such as when to pause, interrupt, or backchannel.
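
As a rough illustration of what simultaneous listening and speaking means at the frame level, the sketch below steps a stub model once per audio frame: each step consumes one frame of user audio and emits one frame of agent audio, so nothing waits for the end of a user turn. The 80 ms frame length is an assumption borrowed from the Mimi codec described in the Moshi paper, and `StubDuplexModel` and `run_duplex` are hypothetical placeholders, not the 1o1 API.

```python
from dataclasses import dataclass

SAMPLE_RATE_HZ = 24_000  # the 24kHz figure quoted for the 1o1 AI Core
FRAME_MS = 80            # assumption: 80 ms frames, as in Moshi's Mimi codec
FRAME_SAMPLES = SAMPLE_RATE_HZ * FRAME_MS // 1000  # samples per frame


@dataclass
class StubDuplexModel:
    """Hypothetical stand-in for a dual-stream model (not the real 1o1 API)."""
    steps: int = 0

    def step(self, user_frame):
        # A real model would predict agent audio conditioned on both streams.
        # This stub returns silence so the loop structure stays runnable.
        self.steps += 1
        return [0.0] * len(user_frame)


def run_duplex(model, user_frames):
    """Consume one user frame and emit one agent frame per step."""
    agent_frames = []
    for frame in user_frames:
        agent_frames.append(model.step(frame))
    return agent_frames


# Two seconds of silent "user audio", split into fixed-size frames.
user_audio = [0.0] * (2 * SAMPLE_RATE_HZ)
user_frames = [
    user_audio[i:i + FRAME_SAMPLES]
    for i in range(0, len(user_audio), FRAME_SAMPLES)
]

model = StubDuplexModel()
agent_frames = run_duplex(model, user_frames)
print(FRAME_SAMPLES, len(agent_frames))
```

Under these assumptions the input and output streams advance in lockstep, which is the structural difference from a cascaded pipeline that only starts responding after a full user utterance has been transcribed.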

Hybrid Prompting

A voice prompt (audio embedding) captures vocal characteristics. A text prompt (natural language) describes the role and context. These are processed jointly. Any role is definable at inference time -- no fine-tuning needed.
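
One way to picture the hybrid prompt is as a small record that pairs a voice embedding with a role description, where the role can be swapped at inference time while the voice stays fixed. The sketch below is illustrative only: `HybridPrompt`, its fields, and the three-number placeholder embedding are assumptions made for this example, not the actual 1o1 interface or embedding format.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class HybridPrompt:
    """Hypothetical container pairing a voice embedding with a role text."""
    voice_embedding: Tuple[float, ...]  # assumed: vector derived from a voice sample
    role_text: str                      # plain-language role and context

    def with_role(self, role_text: str) -> "HybridPrompt":
        # Swap the role while keeping the same voice; no retraining involved.
        return HybridPrompt(self.voice_embedding, role_text)


# Placeholder embedding; a real one would come from an audio encoder.
voice = (0.12, -0.40, 0.88)

teacher = HybridPrompt(
    voice,
    "You are a wise and friendly teacher. Answer questions or provide "
    "advice in a clear and engaging way.",
)
banking = teacher.with_role(
    "You work for First 1o1 Bank. Your name is Maya. The customer's "
    "transaction of $1,200 was flagged in Miami, FL."
)

print(banking.voice_embedding == teacher.voice_embedding)
print(banking.role_text)
```

Keeping the two parts as separate fields mirrors the idea that vocal characteristics and task behavior are specified independently.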

Natural Backchanneling

1o1 produces contextual backchannels -- "okay", "yeah", "I see" -- that signal active listening without interrupting the speaker's flow.

Demonstration Examples

Assistant · Wise Teacher (0:42)
"You are a wise and friendly teacher. Answer questions or provide advice in a clear and engaging way."

Customer Service · Banking Support (1:08)
"You work for First 1o1 Bank. Your name is Maya. The customer's transaction of $1,200 was flagged in Miami, FL."

Medical Office · Patient Registration (1:22)
"You work for Dr. Kumar's office. Record full name, DOB, allergies, and prior conditions. Assure confidentiality."

Natural Backchanneling · Casual Conversation (0:55)
"You enjoy having a good conversation."

Architecture

🎤 Voice Prompt (Audio Embedding) + 📝 Text Prompt (Role & Context)
→ 1o1 AI Core (7B · Dual-Stream · 24kHz)
→ 🔊 Natural Speech (Full-Duplex Output)

Training Data

Real Conversations · 7,303
1,217 hours from the Fisher English corpus. Source of natural backchanneling and authentic interaction patterns.

Synthetic Assistant · 39,322
410 hours of question-answering dialogues. Transcripts generated by LLMs, synthesized via neural TTS.

Customer Service · 105,410
1,840 hours across banking, medical, and restaurant scenarios with rich contextual text prompts.
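
The figures above can be totaled with a few lines of arithmetic, which also shows how the mix is weighted toward synthetic data. This is a minimal sketch; reading the headline number of each subset as a count of dialogues is an assumption, since the page does not label it.

```python
# (headline count, hours) per subset, copied from the Training Data figures.
# Assumption: the headline count is a number of dialogues.
subsets = {
    "Real Conversations": (7_303, 1_217),
    "Synthetic Assistant": (39_322, 410),
    "Customer Service": (105_410, 1_840),
}

total_hours = sum(hours for _, hours in subsets.values())
total_dialogues = sum(count for count, _ in subsets.values())

for name, (count, hours) in subsets.items():
    share = 100 * hours / total_hours  # percent of total training hours
    avg_minutes = 60 * hours / count   # mean dialogue length in minutes
    print(f"{name}: {share:.1f}% of hours, {avg_minutes:.1f} min per dialogue")

print(f"Total: {total_dialogues} dialogues, {total_hours} hours")
```

The total comes to 3,467 hours, which is consistent with the "under 5,000 hours" figure in Key Findings.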

Citation

@article{whizyoga2026onetoone,
  title   = {Voice and Role Control for Full-Duplex Conversational AI},
  author  = {whizyoga},
  year    = {2026},
  url     = {https://yogabrata.com/research.html},
  note    = {ManjuLAB -- yogabrata.com}
}

Acknowledgments

1o1 is built on the Moshi architecture from Kyutai (CC-BY-4.0). This work was developed by whizyoga at ManjuLAB. The original research that inspired this work is the PersonaPlex project from NVIDIA ADLR. Code and model weights are released under the MIT License.

Evaluation Results

System                 Smooth Turn-Taking   User Interruption   Pause Handling   Average
🥇 1o1 AI (ManjuLAB)   90.8                 95.0                100.0            95.3
Moshi (Kyutai)         60.6                 82.1                94.1             78.9
Gemini Live            65.5                 89.1                71.8             75.5
Qwen 2.5 Omni          86.7                 43.9                54.7             61.8
Freeze Omni            1.8                  65.3                33.6             33.6
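
The Average column in the Evaluation Results table appears to be the unweighted mean of the three per-category scores, rounded to one decimal place. The page does not state this, so the check below is a sanity test of that assumption against the scores as read from the table, not a description of the authors' evaluation code.

```python
# (Smooth Turn-Taking, User Interruption, Pause Handling), reported Average
rows = {
    "1o1 AI (ManjuLAB)": ((90.8, 95.0, 100.0), 95.3),
    "Moshi (Kyutai)": ((60.6, 82.1, 94.1), 78.9),
    "Gemini Live": ((65.5, 89.1, 71.8), 75.5),
    "Qwen 2.5 Omni": ((86.7, 43.9, 54.7), 61.8),
    "Freeze Omni": ((1.8, 65.3, 33.6), 33.6),
}

TOLERANCE = 0.05 + 1e-9  # half a unit in the last reported decimal place

consistent = {}
for system, (scores, reported) in rows.items():
    mean = sum(scores) / len(scores)
    consistent[system] = abs(mean - reported) <= TOLERANCE
    print(f"{system}: mean={mean:.2f}, reported={reported}")

print(all(consistent.values()))
```

If every row passes, the reported averages are consistent with equal weighting of the three categories.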

Key Findings

01 · Efficient Specialization
Starting from pretrained weights, fewer than 5,000 hours of directed data are enough to enable full task-following.

02 · Disentangled Naturalness
Blending synthetic and real data lets the model exhibit natural speech patterns alongside strong task adherence.

03 · Emergent Generalization
1o1 handles scenarios far outside its training data by drawing on the broad pretraining of its language model foundation.
🎛 Try 1o1 AI Live

Install the backend on your Windows PC and connect it to this demo -- experience full-duplex voice AI in real time.
