Configuration Options (Options)

Configuration options are used to initialize AaaS Pilot Kit instances. They are divided into required and optional options.

Required Options

Configuration Note

The following options are required in regular usage, but some options (such as agentConfig) can be omitted under specific conditions. See each option's details.

token (string)

🔑 Digital Employee Auth Token - Identity authentication credential for calling Digital Employee services.

  • Example: "your-auth-token-here"
  • How to obtain: Contact platform administrator or visit Platform Documentation

figureId (string)

🆔 Digital Employee Avatar ID - Unique avatar resource identifier obtained from the platform.

ttsPer (string)

🎙️ Digital Employee Voice ID - Corresponds to the speaker identifier for the TTS engine.

agentConfig (AgentConfig)

🧠 AI Agent API Configuration - Configuration for LLM conversation processing.

interface IAgentConfig {
  // Keyue ONE - Intelligent outbound robot ID: https://ky.cloud.baidu.com/ky/telemarketing/config/robot/manage
  robotId?: string;
  // Keyue ONE - Intelligent customer service robot token: https://ky.cloud.baidu.com/ky/unit-app
  token?: string;
}

Configuration Requirements:

  • ⚠️ If custom agentService is not provided, this configuration is required.
  • ✅ When agentService is provided, agentConfig can be omitted, with the custom service fully handling conversation logic.

Optional Options

ttsSample (number)

📊 TTS Audio Sampling Rate (Hz) - Affects sound quality and bandwidth.

  • Default value: 16000
  • Recommended values:
    • 16000 (16k, default, balances sound quality and performance)
    • 24000 (24k, high-fidelity scenarios)
    • 8000 (8k, low bandwidth/embedded devices)
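
For instance, a high-fidelity deployment might raise the sampling rate (a configuration sketch; other required options are omitted and the placeholder values are illustrative):

```typescript
import {createAaaSPilotKit} from '@bdky/aaas-pilot-kit';

// Sketch: choose a higher TTS sampling rate for high-fidelity scenarios.
const kit = createAaaSPilotKit({
  figureId: 'your-figure-id',
  token: 'your-token',
  ttsSample: 24000, // 24 kHz: better sound quality, higher bandwidth
});
```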

locale (string | LanguageCode)

🌐 Unified Language Configuration (v1.1.2+) - Set SDK interface language and speech service language.

  • Type: 'zh' | 'en' | 'ja' | 'ko' | LanguageCode | string
  • Default value: 'zh'
  • Description: Simultaneously controls:
    • Language of SDK internal message text (such as error prompts, status messages)
    • ASR speech recognition and TTS speech synthesis language

import {createAaaSPilotKit, Language} from '@bdky/aaas-pilot-kit';

// Method 1: Use a string
const kit = createAaaSPilotKit({
  locale: 'en', // Both interface and speech use English
});

// Method 2: Use the Language enum (recommended, with type hints)
const kit2 = createAaaSPilotKit({
  locale: Language.ENGLISH,
});
Unified Configuration

Starting from v1.1.2, you only need to set locale to configure both the interface and the speech service language. If the interface and speech need different languages, override the speech service language via asr.config.lang.

Detailed Usage: See Internationalization (i18n)

messages (Partial<I18nMessages>)

📝 Custom Translation Messages (v1.1.2+) - Override or extend built-in translations.

  • Type: Partial<I18nMessages>
  • Default value: undefined

Lookup Priority:

  1. Custom messages
  2. Built-in language pack corresponding to current locale
  3. Chinese fallback (zh)
  4. Key itself (Chinese original text)
createAaaSPilotKit({
  locale: 'en',
  messages: {
    // Override a built-in translation
    '网络连接错误': 'Custom network error message',
  },
});

Detailed Usage: See Internationalization (i18n)


lang (LanguageCode)

Deprecated (v1.1.2+)

The top-level lang configuration is deprecated. Please migrate to locale.

// ❌ Old syntax (deprecated)
{ lang: 'en' }

// ✅ New syntax
{ locale: 'en' }

If you only want to set speech service language while keeping the interface in Chinese, use asr.config.lang:

// ✅ Only the speech service uses English
{
  locale: 'zh', // Keep interface in Chinese
  asr: { provider: 'baidu', config: { lang: 'en' } }
}

🌐 Language Configuration - Used to configure ASR speech recognition and TTS speech synthesis language.

  • Type: LanguageCode (string or Language constant)
  • Default value: 'zh' (Chinese)
  • Supported languages:
Language Constant     Language Code   Language Name
Language.CHINESE      'zh'            Chinese (Mandarin)
Language.ENGLISH      'en'            English
Language.JAPANESE     'ja'            Japanese
Language.SPANISH      'es'            Spanish
Language.RUSSIAN      'ru'            Russian
Language.KOREAN       'ko'            Korean
Language.VIETNAMESE   'vi'            Vietnamese
Language.GERMAN       'de'            German
Language.INDONESIAN   'id'            Indonesian
Language.THAI         'th'            Thai

Usage:

import {createAaaSPilotKit, Language} from '@bdky/aaas-pilot-kit';

// Method 1: Use a Language constant (recommended, with type hints)
const controller = createAaaSPilotKit({
  figureId: 'your-figure-id',
  token: 'your-token',
  lang: Language.ENGLISH, // English
});

// Method 2: Use a language code string directly
const controller2 = createAaaSPilotKit({
  figureId: 'your-figure-id',
  token: 'your-token',
  lang: 'en', // English
});
Note
  • Passing an unsupported language code automatically falls back to Chinese ('zh') and outputs a warning in the console
  • Using Language constants is recommended for TypeScript type hints and IDE auto-completion

rendererMode ('cloud' | 'cloud-native' | 'client')

🖥️ Renderer Mode - Select technical implementation for Digital Employee service.

  • Default value: 'cloud'
  • Options:
    • 'cloud' (default) → Cloud streaming rendering (iframe method)

      • Suitable scenarios: Desktop (PC)
      • Features: High-fidelity dynamic avatar; requires network + RTC support
    • 'cloud-native' → Cloud streaming rendering (native SDK integration)

      • Suitable scenarios: Mobile (avoids iframe click restrictions)
      • Features: High-fidelity dynamic avatar; requires network + RTC support
      • Recommended: Prefer this mode in mobile browser environments
    • 'client' → Local 2D rendering (static image + lip-sync animation)

      • Suitable scenarios: Offline use or low resource consumption
      • Features: Low resource consumption, works offline

Usage Example:

// Desktop (default)
const controller = createAaaSPilotKit({
  figureId: 'xxx',
  rendererMode: 'cloud'
});

// Mobile optimized
const controller2 = createAaaSPilotKit({
  figureId: 'xxx',
  rendererMode: 'cloud-native'
});

clientRendererConfig (object)

🖼️ Client-side Rendering Configuration - Only effective when rendererMode='client'.

Contains avatar resource paths, lip-sync mapping, TTS drive parameters, etc.

timeoutSec (number)

⏱️ Global Session Timeout (seconds) - Automatically ends conversation after timeout and releases resources.

  • Default value: 60

disconnectAlertSec (number)

Timeout Advance Warning (seconds) - How many seconds before the session timeout to announce a warning.

  • Default value: 10
  • Example: Set to 10 → Announce "Conversation will end soon" 10 seconds before timeout
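
The two timeout options work together; a configuration sketch with illustrative values (other required options omitted):

```typescript
import {createAaaSPilotKit} from '@bdky/aaas-pilot-kit';

// Sketch: end the session after 120s of inactivity,
// announcing the warning 10 seconds beforehand (i.e. at 110s).
const kit = createAaaSPilotKit({
  figureId: 'your-figure-id',
  token: 'your-token',
  timeoutSec: 120,
  disconnectAlertSec: 10,
});
```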

figureResolutionWidth / figureResolutionHeight (number)

📐 Digital Employee Avatar Resolution (pixels).

  • Requirements:
    • Must be even number
    • Minimum 400, maximum not exceeding 1920
    • Width × height combined must not exceed 1080×1920 (portrait) or 1920×1080 (landscape)
  • ⚠️ Warning: Too high a resolution may cause performance degradation
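
The rules above can be expressed as a small validation helper (a sketch; isValidFigureResolution is not part of the SDK):

```typescript
// Hypothetical helper illustrating the resolution rules above; not an SDK API.
function isValidFigureResolution(width: number, height: number): boolean {
  // Each dimension must be an even number between 400 and 1920.
  const inRange = (n: number) => n % 2 === 0 && n >= 400 && n <= 1920;
  // The combined size must fit within 1080×1920 (portrait) or 1920×1080 (landscape).
  const fits =
    (width <= 1080 && height <= 1920) || (width <= 1920 && height <= 1080);
  return inRange(width) && inRange(height) && fits;
}
```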

speechSpeed (number)

🗣️ Broadcast Speech Speed (characters/second).

  • Default value: 6 (standard broadcasting speed)
  • Adjustment suggestions:
    • Teaching/elderly scenarios → 4~5
    • Fast-paced customer service → 7~8

typeDelay / enTypeDelay (number)

⌨️ "Typewriter Effect" Character Interval (milliseconds).

  • typeDelay default value: 163 ms (smooth and natural); 1000 ms ÷ 163 ms ≈ 6 characters per second
  • enTypeDelay default value: 45 ms (interval for English characters)
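
The interval-to-rate arithmetic can be checked directly (a sketch, not SDK code):

```typescript
// Characters per second implied by a per-character interval in milliseconds.
const charsPerSecond = (delayMs: number): number => 1000 / delayMs;

const cjkRate = charsPerSecond(163); // default typeDelay → about 6.1 chars/sec
const enRate = charsPerSecond(45);   // default enTypeDelay → about 22.2 chars/sec
```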

interruptible (boolean)

🛑 Allow User Interruption - Whether to allow users to interrupt current broadcast (triggered by voice/manual input).

  • Default value: true (recommended to enable, improves interaction experience)

prologue (string)

🎬 Opening Greeting - Automatically broadcast after initialization.

  • Example: "Hello, I am your digital employee Xiaoyue. How can I help you?"

scaleX / scaleY (number)

📏 Digital Employee Avatar Scaling Ratio.

  • Default value: 1.0
  • >1 enlarge, <1 shrink

translateX / translateY (number)

↔️↕️ Digital Employee Character Position Offset (pixels).

  • translateX: positive right, negative left
  • translateY: positive down, negative up
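
A combined sketch of scaling and offset (illustrative values; other required options omitted):

```typescript
import {createAaaSPilotKit} from '@bdky/aaas-pilot-kit';

// Sketch: enlarge the avatar by 20% and nudge it toward the lower right.
const kit = createAaaSPilotKit({
  figureId: 'your-figure-id',
  token: 'your-token',
  scaleX: 1.2,    // 20% wider
  scaleY: 1.2,    // 20% taller
  translateX: 50, // 50 px to the right
  translateY: 30, // 30 px down
});
```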

position (IPosition)

🧩 Pixel-level Position and Crop Control — Used for precise layout of Digital Employee avatar display area in final screen.

🔧 Workflow:

  1. First crop "Digital Employee avatar main body rectangle area" from original portrait backing based on crop
  2. Then scale + position the cropped area to final video canvas based on location

💡 Applicable Scenarios:

  • Embed Digital Employee avatar into fixed-size "dialog box" or "product card"
  • Pixel-level positioning requirements aligned with UI design mockups
  • Block background interference, only show Digital Employee avatar upper body/facial close-up

📐 Parameter Structure:

interface IPosition {
  // Crop area (relative to the original backing)
  crop: {x: number, y: number, width: number, height: number};
  // Final positioning + scaling (relative to the output canvas)
  location: {x: number, y: number, width: number, height: number};
}

⚠️ Notes:

  • All values are in pixels; percentages are not supported
  • If the crop area exceeds the original backing, boundaries are automatically clamped
  • If location exceeds the canvas, the Digital Employee avatar becomes partially or fully invisible
  • Does not conflict with scaleX/Y and translateX/Y; the effects stack (crop and position first, then scale and translate)

Example — Center display of Digital Employee avatar upper body (original backing 1920x1080, output canvas 800x600):

const position = {
  // Crop the upper body
  crop: {
    x: 600,
    y: 0,
    width: 720,
    height: 540
  },
  // Position roughly centered on the 800×600 output canvas
  location: {
    x: 40,
    y: 20,
    width: 720,
    height: 540
  }
};

minSplitLen (number)

✂️ First Sentence Streaming Split Granularity (character count).

  • Default value: 5
  • Logic: accumulate at least N characters, then start broadcasting at the next punctuation mark; this avoids character-by-character stuttering
  • Example: "Today's weather is nice." → broadcasting starts when the period after "nice" arrives
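
The described behavior can be sketched as a small function (an illustration of the split rule, not the SDK's internal implementation; firstChunk is hypothetical):

```typescript
// Return the first broadcastable chunk: accumulate at least `minLen`
// characters, then cut at the next punctuation mark. Sketch only.
function firstChunk(text: string, minLen = 5): string | null {
  const punctuation = /[,.!?;，。！？；]/;
  for (let i = minLen - 1; i < text.length; i++) {
    if (punctuation.test(text[i])) {
      return text.slice(0, i + 1); // broadcast up to and including the punctuation
    }
  }
  return null; // not enough text yet; keep buffering
}
```

Note how a comma inside the first minLen characters is skipped, so very short fragments are never broadcast on their own.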

ttsModel ('turbo_v2' | 'quality_v2' | undefined)

TTS Model Version.

  • undefined → Standard version (stable, low latency)
  • 'turbo_v2' → Accelerated version (faster response, suitable for real-time conversation)
  • 'quality_v2' → Quality version (sound quality priority, latency will increase)

Example:

const controller = await createAaaSPilotKit({
  ttsModel: 'turbo_v2', // Use accelerated TTS
  // ... other configurations
});

asrVad (number)

🎙️ ASR Voice Endpoint Detection (VAD) Silence Timeout (milliseconds) - The pause length used to decide that the user has finished a sentence.

  • Default value: 600
  • Intuitively: how long the user can pause before the system decides the sentence is finished and starts recognition
    • Larger value → more "patient"; suits slow speakers or users who pause to think
    • Smaller value → more "sensitive"; suits fast-response scenarios
  • Recommended values:
    • Default 600 ms (suits most conversation scenarios)
    • Fast Q&A: 300~400 ms
    • Slow speech / English-teaching scenarios: 800~1000 ms
  • ⚠️ Note: Too small a value may cut off speech before the user finishes; too large makes responses feel slow
Deprecation Notice (v1.2.0+)

The top-level asrVad configuration is deprecated. Please migrate to asr.config.asrVad (Baidu ASR). The old syntax is still supported but not recommended.

// Old syntax (still compatible)
{ asrVad: 600 }

// New syntax (recommended)
{ asr: { provider: 'baidu', config: { asrVad: 600 } } }

asr (AsrConfig)

🎤 ASR Service Configuration (v1.1.0+) - Configure speech recognition service provider and its parameters.

New Feature

Starting from v1.1.0, the SDK supports the Azure (Microsoft Cloud) speech recognition service, suitable for internationalization scenarios.

Type Definition:

type AsrConfig =
  | {provider: 'baidu', config: IBaiduAsrConfig}
  | {provider: 'azure', config: IAzureSpeechConfig};

Baidu ASR Configuration (Default)

Uses Baidu speech recognition service, suitable for domestic scenarios.

Field                   Type                   Default Value   Description
asrVad                  number                 600             ASR VAD silence timeout (milliseconds)
lang                    LanguageCode           'zh'            Language configuration
audioConstraints        MediaTrackConstraints  See below       Audio constraints
enableEchoCancellation  boolean                false           Enable echo cancellation (recommended for mobile)
echoCancellationConfig  object                 -               Echo cancellation tuning parameters

Example:

import {createAaaSPilotKit, Language} from '@bdky/aaas-pilot-kit';

const controller = createAaaSPilotKit({
  token: 'xxx',
  figureId: 'xxx',
  ttsPer: 'xxx',
  agentConfig: {...},
  asr: {
    provider: 'baidu',
    config: {
      asrVad: 600,
      lang: Language.CHINESE
    }
  }
});

Azure ASR Configuration (Internationalization)

Uses Azure Microsoft Cloud speech recognition service, suitable for international multi-language scenarios.

Field                         Type                   Required   Default Value   Description
subscriptionKey               string                 ✓          -               Azure Speech subscription key
region                        string                 ✓          -               Azure region (e.g. 'eastasia', 'southeastasia')
languages                     string[]               -          ['en-US']       Recognition language list (up to 4, supports multi-language auto-switching)
phraseList                    string[]               -          -               Custom phrase list (improves proper-noun recognition)
phraseWeight                  number                 -          2               Phrase weight (1-10)
initialSilenceTimeoutMs       number                 -          30000           Initial silence timeout (milliseconds)
endSilenceTimeoutMs           number                 -          30000           End silence timeout (milliseconds)
segmentationSilenceTimeoutMs  number                 -          1000            Segmentation silence timeout (milliseconds)
connectionTimeoutMs           number                 -          -               Connection timeout (milliseconds; no timeout if unset)
enableAudioLogging            boolean                -          false           Enable audio logging (for debugging)
customEndpointId              string                 -          -               Custom speech endpoint ID
advancedConfig                Record<string, any>    -          -               Advanced configuration
audioConstraints              MediaTrackConstraints  -          See below       Audio constraints
enableEchoCancellation        boolean                -          false           Enable echo cancellation
echoCancellationConfig        object                 -          -               Echo cancellation tuning parameters

Example:

import {
  createAaaSPilotKit,
  AzureSpeechRegion,
  AzureSpeechLanguage
} from '@bdky/aaas-pilot-kit';

const controller = createAaaSPilotKit({
  token: 'xxx',
  figureId: 'xxx',
  ttsPer: 'xxx',
  agentConfig: {...},
  asr: {
    provider: 'azure',
    config: {
      subscriptionKey: 'YOUR_AZURE_SUBSCRIPTION_KEY',
      region: AzureSpeechRegion.SOUTHEAST_ASIA,
      languages: [
        AzureSpeechLanguage.CHINESE_SIMPLIFIED_CN,
        AzureSpeechLanguage.ENGLISH_US
      ],
      initialSilenceTimeoutMs: 30000,
      endSilenceTimeoutMs: 30000,
      phraseList: ['KeyueONE', 'AaaS']
    }
  }
});

Universal Audio Configuration

The following configurations apply to both Baidu and Azure providers:

audioConstraints default value:

{
  echoCancellation: true,
  noiseSuppression: true,
  autoGainControl: true
}

echoCancellationConfig structure:

{
  energyMultiplier?: number;  // Energy multiplier
  idleThreshold?: number;     // Idle threshold
  smoothingFactor?: number;   // Smoothing factor
  recoveryDelay?: number;     // Recovery delay (milliseconds)
}

Enable echo cancellation example (recommended for mobile):

asr: {
  provider: 'azure', // or 'baidu'
  config: {
    // ... provider-specific configuration
    enableEchoCancellation: true,
    audioConstraints: {
      echoCancellation: true,
      noiseSuppression: true,
      autoGainControl: true
    }
  }
}

checkAudioDeviceBeforeStart (boolean)

🎙️ [Optional] Check Audio Device Availability Before ASR Starts - Performs layered progressive detection: API support, HTTPS, device enumeration, permissions, and stream acquisition.

Default value: true

Performance Impact:

  • First detection adds ~100-500ms (mainly getUserMedia)
  • Results cached within 5 seconds, repeated calls <1ms

Recommended Scenarios:

  • Scenarios with high user experience requirements (discover issues early)
  • Scenarios needing precise error prompts (distinguish "no device", "permission denied", "device occupied")

Example:

const controller = await createAaaSPilotKit({
  checkAudioDeviceBeforeStart: true, // Auto-check before starting (recommended)
  // ... other configurations
});

// Listen for detection results
controller.emitter.on('microphone_available', (result) => {
  if (!result.available) {
    console.error('Device detection failed:', result.userMessage);

    // Provide a solution based on the error type
    if (result.error === 'PERMISSION_DENIED') {
      showPermissionGuide();
    } else if (result.error === 'HTTPS_REQUIRED') {
      showHTTPSWarning();
    }
  }
});

Related Event: microphone_available - Receive detection results
Related Method: checkAudioDevice() - Manually trigger device detection

microphoneFailureHandling ('error' | 'warn' | 'silent' | 'prompt')

🎙️ [Optional] Handling Strategy When Microphone Detection Fails - Behavior when checkAudioDeviceBeforeStart=true and microphone detection fails.

Default value: 'error'

Strategy Description:

  • 'error' (default) → Throws AsrInitializationError, terminates startup
  • 'warn' → Outputs warning in console, disables voice input, continues startup (text-only mode)
  • 'silent' → Silently degrades to text-only mode, no warning output (still emits events)
  • 'prompt' → Calls onMicrophoneCheckFailed callback, lets developer customize interaction logic

Usage Scenarios:

  • 'error' → Scenarios with strict voice function requirements (such as voice customer service)
  • 'warn' → Voice is optional function, allows text input fallback
  • 'silent' → Automatic degradation, does not interfere with user experience
  • 'prompt' → Need to display custom dialog for user confirmation

Notes:

  • After degradation, users can still perform text input through input(text) method
  • System will emit device_check_completed and microphone_available events
  • ASR service will be automatically disabled (controller.asrService.disabled = true)

Example:

const controller = await createAaaSPilotKit({
  checkAudioDeviceBeforeStart: true,
  microphoneFailureHandling: 'warn', // Warn but continue
  // ... other configurations
});

// Listen for microphone availability
controller.emitter.on('microphone_available', (result) => {
  if (!result.available) {
    console.warn('Microphone unavailable, degraded to text-only mode');
    showTextOnlyModeNotice();
  }
});

onMicrophoneCheckFailed (Function)

🔔 [Optional] Custom Handling Callback When Microphone Detection Fails - Only effective when microphoneFailureHandling='prompt'.

Type Definition:

type OnMicrophoneCheckFailed = (
  result: IAudioDeviceCheckResult,
  continueCallback: () => void
) => void | Promise<void>;

Parameter Description:

Parameter         Type                     Description
result            IAudioDeviceCheckResult  Device detection result, including error details and user-friendly prompts
continueCallback  () => void               Call this to continue the initialization process (no-microphone mode)

Typical Uses:

  • Display custom confirmation dialog
  • Provide "Continue using" or "Cancel" options
  • Record user decision for data analysis

Complete Example:

import {createAaaSPilotKit} from '@bdky/aaas-pilot-kit';

const controller = await createAaaSPilotKit({
  checkAudioDeviceBeforeStart: true,
  microphoneFailureHandling: 'prompt',
  onMicrophoneCheckFailed: async (result, continueCallback) => {
    // Display a custom dialog
    const userChoice = await showDialog({
      title: 'Microphone Unavailable',
      message: result.userMessage,
      buttons: [
        {text: 'Continue (text only)', value: 'continue'},
        {text: 'Cancel', value: 'cancel'}
      ]
    });

    if (userChoice === 'continue') {
      console.log('User chose to degrade to text-only mode');
      continueCallback(); // Continue initialization
    } else {
      console.log('User cancelled initialization');
      // Do not call continueCallback; initialization will fail
      throw new Error('User cancelled initialization');
    }
  },
  // ... other configurations
});

Error Handling:

  • If the callback throws an exception, initialization fails
  • If continueCallback is never called, initialization waits (async case) or fails (sync case)
  • If callback execution fails, a warning is output in the console and false is returned

Notes:

  • Only effective when microphoneFailureHandling='prompt'
  • Callback can be sync or async function
  • Must call continueCallback() to continue initialization process

env ('development' | 'sandbox' | 'production')

🌐 Runtime Environment.

  • 'development' → Development debugging (logs fully on)
  • 'sandbox' → Sandbox testing (simulates production)
  • 'production' → Production environment (default, optimal performance)

enableDebugMode (boolean)

🐞 Enable Debug Mode - Output full-chain logs (ASR/Agent/TTS/rendering).

  • Recommended during development debugging; must be disabled in production

hotWordReplacementRules (ReplacementRule[])

🧩 Speech Recognition Hotword Correction Rules (regex replacement).

Used to correct proper nouns, brand words, etc., where ASR recognition is inaccurate.

// Example
hotWordReplacementRules: [
  {pattern: /客悦\s*one/gi, replacement: 'Keyue·ONE'},
  {pattern: /A I/g, replacement: 'AI'}
]
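
How such rules transform a transcript can be sketched with a hypothetical applyRules helper (the SDK applies the rules internally; this only illustrates ordered regex replacement):

```typescript
interface ReplacementRule {
  pattern: RegExp;
  replacement: string;
}

// Apply each rule in order to the ASR transcript. Illustration only,
// not an SDK API.
function applyRules(text: string, rules: ReplacementRule[]): string {
  return rules.reduce((t, r) => t.replace(r.pattern, r.replacement), text);
}

const rules: ReplacementRule[] = [
  {pattern: /客悦\s*one/gi, replacement: 'Keyue·ONE'},
  {pattern: /A I/g, replacement: 'AI'},
];
```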

speechFormatters / conversationFormatters (TFormatter[])

📝 Text Format Processing Functions.

  • speechFormatters: Voice text input format processing
  • conversationFormatters: Message content format processing
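
The TFormatter signature is not shown here; assuming it is a (text: string) => string function, a speech formatter that strips markdown symbols before broadcast might look like this (a sketch):

```typescript
// Hypothetical formatter, assuming TFormatter = (text: string) => string.
// Removes common markdown symbols so TTS does not read them aloud.
const stripMarkdown = (text: string): string => text.replace(/[*_`#]/g, '');

// Usage sketch:
// createAaaSPilotKit({ speechFormatters: [stripMarkdown], /* ... */ });
```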

agentService (Newable<BaseAgentService>)

🧠 Custom Agent Service Class (must inherit from BaseAgentService).

Used to integrate a private Agent streaming API protocol. ⚠️ When agentService is set, agentConfig is ignored.

Detailed Implementation Guide: Custom AgentService Configuration

inactivityPrompt (string)

⏸️ Long-time No Interaction Prompt - When the user stays silent for the timeout period, the Digital Employee proactively broadcasts a reminder.

  • Example: "Are you still there? I can continue serving you~"

autoChromaKey (boolean)

🟢 Whether to Automatically Enable Green Screen Keying (only effective when rendererMode='cloud').

  • Default value: true → Automatically remove background, blend into your page
  • false → Keep original background (suitable for materials with alpha channel)

Configuration Example

import {createAaaSPilotKit, Language} from '@bdky/aaas-pilot-kit';

const options: IOptions = {
  // Required configuration
  token: 'your-auth-token-here',
  figureId: '209337',
  ttsPer: 'LITE_audiobook_female_1',
  agentConfig: {
    token: 'your-agent-token',
    robotId: 'your-robot-id'
  },

  // Optional configuration
  locale: Language.ENGLISH, // Unified language configuration (interface + speech)
  ttsSample: 16000,
  rendererMode: 'cloud',
  timeoutSec: 60,
  speechSpeed: 6,
  interruptible: true,
  prologue: 'Hello, I am your digital employee. How can I help you?',
  asrVad: 600,
  env: 'production',
  enableDebugMode: false,
  autoChromaKey: true,
  inactivityPrompt: 'You haven\'t spoken for a long time, I\'m going to disconnect~',

  // Hotword replacement rules
  hotWordReplacementRules: [
    {pattern: /客悦\s*one/gi, replacement: 'Keyue·ONE'},
    {pattern: /A I/g, replacement: 'AI'}
  ]
};

const controller = createAaaSPilotKit(options);