You can copy both the avatar_id and the voice_id directly from the web interface, where you can also create and manage your streaming avatars: https://akool.com/apps/upload/avatar?from=%2Fapps%2Fstreaming-avatar%2Fedit
The resources (image, video, voice) generated by our API are valid for 7 days. Please save the relevant resources as soon as possible to prevent expiration.
To experience our live avatar streaming feature in action, explore our demo built on the Agora streaming service: AKool Streaming Avatar React Demo.
Knowledge Base Integration: You can enhance your streaming avatar with contextual AI responses by integrating a Knowledge Base. When creating a session, provide a knowledge_id parameter to enable the AI to use documents and URLs from your knowledge base for more accurate and relevant responses.

API Endpoints

Avatar Management

Session Management

Live Avatar Stream Message

IAgoraRTCClient.on(event: "stream-message", listener: (uid: UID, pld: Uint8Array) => void)

IAgoraRTCClient.sendStreamMessage(msg: Uint8Array | string, flag: boolean): Promise<void>;
Send Data

Chat Type Parameters
| Parameter | Type | Required | Value | Description |
| --- | --- | --- | --- | --- |
| v | Number | Yes | 2 | Version of the message |
| type | String | Yes | chat | Message type for chat interactions |
| mid | String | Yes | | Unique message identifier for conversation tracking |
| idx | Number | Yes | | Sequential index of the message part, starting from 0 |
| fin | Boolean | Yes | | Indicates if this is the final part of the message |
| pld | Object | Yes | | Container for message payload |
| pld.text | String | Yes | | Text content to send to avatar (e.g. “Hello”) |
Command Type Parameters
| Parameter | Type | Required | Value | Description |
| --- | --- | --- | --- | --- |
| v | Number | Yes | 2 | Protocol version number |
| type | String | Yes | command | Specifies this is a system command message |
| mid | String | Yes | | Unique ID to track and correlate command messages |
| pld | Object | Yes | | Contains the command details and parameters |
| pld.cmd | String | Yes | | Command action to execute. Valid values: “set-params” (update avatar settings), “interrupt” (stop current avatar response) |
| pld.data | Object | No | | Parameters for the command (required for “set-params”) |
| pld.data.vid | String | No | | Deprecated. Use pld.data.vparams.vid instead. Voice ID to change avatar’s voice. Only used with “set-params”. Get valid IDs from Voice List API |
| pld.data.vurl | String | No | | Deprecated. Use pld.data.vparams.vurl instead. Custom voice model URL. Only used with “set-params”. Get valid URLs from Voice List API |
| pld.data.lang | String | No | | Language code for avatar responses (e.g. “en”, “es”). Only used with “set-params”. Get valid codes from Language List API |
| pld.data.mode | Number | No | | Avatar interaction style. Only used with “set-params”. “1” = Retelling (avatar repeats content), “2” = Dialogue (avatar engages in conversation) |
| pld.data.bgurl | String | No | | URL of background image/video for avatar scene. Only used with “set-params” |
| pld.data.vparams | Object | No | | Voice parameters to use for the session. |
Voice Parameters
| Parameter | Type | Required | Value | Description |
| --- | --- | --- | --- | --- |
| vid | String | No | | Voice ID to change avatar’s voice. Only used with “set-params”. Get valid IDs from Voice List API |
| vurl | String | No | | Custom voice model URL. Only used with “set-params”. Get valid URLs from Voice List API |
| speed | double | No | 1 | Controls the speed of the generated speech. Values range from 0.8 to 1.2, with 1.0 being the default speed. |
| pron_map | Object | No | | Pronunciation mapping for custom words. Example: "pron_map": { "akool": "ai ku er" } |
| stt_type | String | No | | Speech-to-text type. "openai_realtime" = OpenAI Realtime |
| turn_detection | Object | No | | Turn detection configuration. |
Turn Detection Configuration
| Parameter | Type | Required | Value | Description |
| --- | --- | --- | --- | --- |
| type | String | No | "server_vad" | Turn detection type. "server_vad" = Server VAD, "semantic_vad" = Semantic VAD |
| threshold | Number | No | 0.5 | Activation threshold (0 to 1). A higher threshold requires louder audio to activate the model, and thus might perform better in noisy environments. Available when type is "server_vad". |
| prefix_padding_ms | Number | No | 300 | Amount of audio (in milliseconds) to include before the speech detected by VAD. Available when type is "server_vad". |
| silence_duration_ms | Number | No | 500 | Duration of silence (in milliseconds) used to detect the end of speech. With shorter values, turns are detected more quickly. Available when type is "server_vad". |
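
The command tables above can be mirrored as types so that a malformed command fails at compile time. The following is a minimal sketch, not an official client: the field names come from the tables, while the helper names (`clampSpeed`, `buildSetParams`, `buildInterrupt`) and the choice to clamp `speed` on the client are assumptions made for illustration. `mode` is typed as a number because the table lists its type as Number.

```typescript
// Field names follow the Command Type, Voice Parameters, and Turn Detection
// tables; the helper names below are hypothetical.
interface TurnDetection {
  type?: "server_vad" | "semantic_vad";
  threshold?: number;
  prefix_padding_ms?: number;
  silence_duration_ms?: number;
}

interface VoiceParams {
  vid?: string;
  vurl?: string;
  speed?: number;
  pron_map?: Record<string, string>;
  stt_type?: string;
  turn_detection?: TurnDetection;
}

interface SetParamsData {
  lang?: string;
  mode?: 1 | 2; // 1 = Retelling, 2 = Dialogue
  bgurl?: string;
  vparams?: VoiceParams;
}

interface CommandMessage {
  v: 2;
  type: "command";
  mid: string;
  pld: { cmd: "set-params" | "interrupt"; data?: SetParamsData };
}

// Keep speed inside the documented 0.8 to 1.2 range (client-side clamping is
// an assumption, not documented server behavior).
function clampSpeed(speed: number): number {
  return Math.min(1.2, Math.max(0.8, speed));
}

function buildSetParams(mid: string, data: SetParamsData): CommandMessage {
  const vparams = data.vparams;
  const normalized: SetParamsData =
    vparams && vparams.speed !== undefined
      ? { ...data, vparams: { ...vparams, speed: clampSpeed(vparams.speed) } }
      : data;
  return { v: 2, type: "command", mid, pld: { cmd: "set-params", data: normalized } };
}

function buildInterrupt(mid: string): CommandMessage {
  return { v: 2, type: "command", mid, pld: { cmd: "interrupt" } };
}

const setParams = buildSetParams("cmd-1", {
  lang: "en",
  mode: 2,
  vparams: { speed: 1.5, turn_detection: { type: "server_vad", threshold: 0.5 } },
});
console.log(JSON.stringify(setParams));
console.log(JSON.stringify(buildInterrupt("cmd-2")));
```

Either object can then be serialized with JSON.stringify and passed to sendStreamMessage.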
JSON Example
{
    "v": 2,
    "type": "chat",
    "mid": "msg-1723629433573",
    "idx": 0,
    "fin": true,
    "pld": {
      "text": "Hello"
    }
}
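
The idx and fin fields suggest that a long text can be split across several chat messages that share one mid. A minimal sketch of that idea follows; the helper name `buildChatChunks` and the `maxChars` limit are assumptions for illustration, since this section does not state a size limit.

```typescript
interface ChatMessage {
  v: 2;
  type: "chat";
  mid: string;
  idx: number;
  fin: boolean;
  pld: { text: string };
}

// Split text into parts that share one mid. idx starts at 0 and only the
// last part has fin set to true.
// Note: slice() works on UTF-16 code units, so a real implementation may
// want to avoid cutting through a surrogate pair.
function buildChatChunks(text: string, mid: string, maxChars: number): ChatMessage[] {
  if (maxChars < 1) {
    throw new Error("maxChars must be at least 1");
  }
  const total = Math.max(1, Math.ceil(text.length / maxChars));
  const chunks: ChatMessage[] = [];
  for (let idx = 0; idx < total; idx++) {
    chunks.push({
      v: 2,
      type: "chat",
      mid,
      idx,
      fin: idx === total - 1,
      pld: { text: text.slice(idx * maxChars, (idx + 1) * maxChars) },
    });
  }
  return chunks;
}

const chunks = buildChatChunks("Hello, how are you today?", "msg-1723629433573", 10);
for (const chunk of chunks) {
  console.log(JSON.stringify(chunk));
}
```

Each element would then be serialized and sent in order with sendStreamMessage.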
Receive Data

Chat Type Parameters
| Parameter | Type | Value | Description |
| --- | --- | --- | --- |
| v | Number | 2 | Version of the message |
| type | String | chat | Message type for chat interactions |
| mid | String | | Unique message identifier for tracking conversation flow |
| idx | Number | | Sequential index of the message part |
| fin | Boolean | | Indicates if this is the final part of the response |
| pld | Object | | Container for message payload |
| pld.from | String | “bot” or “user” | Source of the message: “bot” for avatar responses, “user” for speech recognition input |
| pld.text | String | | Text content of the message |
Command Type Parameters
| Parameter | Type | Value | Description |
| --- | --- | --- | --- |
| v | Number | 2 | Version of the message |
| type | String | command | Message type for system commands |
| mid | String | | Unique identifier for tracking related messages in a conversation |
| pld | Object | | Container for command payload |
| pld.cmd | String | “set-params”, “interrupt” | Command to execute: “set-params” to update avatar settings, “interrupt” to stop current response |
| pld.code | Number | 1000 | Response code from the server; 1000 indicates success |
| pld.msg | String | | Response message from the server |
JSON Example
{
    "v": 2,
    "type": "chat",
    "mid": "msg-1723629433573",
    "idx": 0,
    "fin": true,
    "pld": {
      "from": "bot",
      "text": "Hello!  How can I assist you today? "
    }
}
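
Because a response can arrive in several parts, a receiver typically decodes each payload, separates chat messages from command acknowledgements, and joins the parts that share a mid. The sketch below shows one way to do this. It assumes the payload is UTF-8 encoded JSON, that parts arrive in idx order, and that the part with fin set to true is the last one; none of these is stated in this section, and all helper names are hypothetical.

```typescript
interface IncomingChat {
  v: number;
  type: "chat";
  mid: string;
  idx: number;
  fin: boolean;
  pld: { from: "bot" | "user"; text: string };
}

interface IncomingCommand {
  v: number;
  type: "command";
  mid: string;
  pld: { cmd: string; code: number; msg?: string };
}

type Incoming = IncomingChat | IncomingCommand;

// Decode the raw stream payload (assumed to be UTF-8 JSON) into a message.
function decodeStreamMessage(payload: Uint8Array | string): Incoming {
  const json = typeof payload === "string" ? payload : new TextDecoder().decode(payload);
  return JSON.parse(json) as Incoming;
}

// Collects message parts per sender and mid until the final part arrives.
class ChatAssembler {
  private parts: Record<string, string[]> = {};

  // Returns the full text once the part with fin === true arrives, else null.
  push(msg: IncomingChat): string | null {
    const key = `${msg.pld.from}:${msg.mid}`;
    const list = this.parts[key] || (this.parts[key] = []);
    list[msg.idx] = msg.pld.text;
    if (!msg.fin) {
      return null;
    }
    delete this.parts[key];
    return list.join("");
  }
}

// Returns the completed chat text, or null for partial chats and commands.
function handleStreamMessage(assembler: ChatAssembler, payload: Uint8Array | string): string | null {
  const msg = decodeStreamMessage(payload);
  if (msg.type === "command") {
    if (msg.pld.code !== 1000) {
      console.error(`command ${msg.pld.cmd} failed with code ${msg.pld.code}`);
    }
    return null;
  }
  return assembler.push(msg);
}

const assembler = new ChatAssembler();
const sample = JSON.stringify({
  v: 2,
  type: "chat",
  mid: "msg-1723629433573",
  idx: 0,
  fin: true,
  pld: { from: "bot", text: "Hello!  How can I assist you today? " },
});
console.log(handleStreamMessage(assembler, sample));
```

In a real client, handleStreamMessage would be called from the "stream-message" listener with the pld bytes it receives.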
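TypeScript Example
import AgoraRTC, { IAgoraRTCClient, UID } from 'agora-rtc-sdk-ng';

const client: IAgoraRTCClient = AgoraRTC.createClient({
  mode: 'rtc',
  codec: 'vp8',
});

// Join the channel before sending or receiving stream messages.
await client.join(agora_app_id, agora_channel, agora_token, agora_uid);

// The listener receives the sender's uid and the raw payload bytes.
client.on('stream-message', (uid: UID, payload: Uint8Array) => {
  const message = new TextDecoder().decode(payload);
  console.log('received from %s: %s', uid, message);
});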

Integrating Your Own LLM Service

Before sending a message to the avatar, you can first make an HTTP request to your own LLM service and forward its answer as the message text.
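import AgoraRTC, { IAgoraRTCClient, UID } from 'agora-rtc-sdk-ng';

const client: IAgoraRTCClient = AgoraRTC.createClient({
  mode: 'rtc',
  codec: 'vp8',
});

// Join the channel before sending or receiving stream messages.
await client.join(agora_app_id, agora_channel, agora_token, agora_uid);

// The listener receives the sender's uid and the raw payload bytes.
client.on('stream-message', (uid: UID, payload: Uint8Array) => {
  const message = new TextDecoder().decode(payload);
  console.log('received from %s: %s', uid, message);
});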

let inputMessage = 'hello';

try {
  const response = await fetch('https://your-backend-host/api/llm/answer', {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      question: inputMessage,
    }),
  });

  if (response.ok) {
    const result = await response.json();
    inputMessage = result.answer;
  } else {
    console.error("Failed to fetch from backend", response.statusText);
  }
} catch (error) {
  console.error("Error during fetch operation", error);
}

const message = {
  v: 2,
  type: "chat",
  mid: "msg-1723629433573",
  idx: 0,
  fin: true,
  pld: {
      text: inputMessage,
  },
};
// sendStreamMessage returns a Promise, so await it to surface send errors.
await client.sendStreamMessage(JSON.stringify(message), false);

Response Code Description

Note: if the response code is not 1000, the request has failed.
| Parameter | Value | Description |
| --- | --- | --- |
| code | 1000 | Success |
| code | 1003 | Parameter error or Parameter can not be empty |
| code | 1008 | The content you get does not exist |
| code | 1009 | You do not have permission to operate |
| code | 1101 | Invalid authorization or The request token has expired |
| code | 1102 | Authorization cannot be empty |
| code | 1200 | The account has been banned |
| code | 1201 | create audio error, please try again later |
| code | 1202 | The same video cannot be translated lipSync in the same language more than 1 times |
| code | 1203 | video should be with audio |
| code | 1204 | Your video duration is exceed 60s! |
| code | 1205 | Create video error, please try again later |
| code | 1207 | The video you are using exceeds the size limit allowed by the system by 300M |
| code | 1209 | Please upload a video in another encoding format |
| code | 1210 | The video you are using exceeds the value allowed by the system by 60fps |
| code | 1211 | Create lipsync error, please try again later |
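
A client can keep the table above as a lookup so that failures are logged with a readable reason. This is a minimal sketch: the codes and descriptions are copied from the table, and the helper names `isSuccess` and `describeResponseCode` are hypothetical.

```typescript
// Codes and descriptions copied from the response code table.
const RESPONSE_CODES: Record<number, string> = {
  1000: "Success",
  1003: "Parameter error or Parameter can not be empty",
  1008: "The content you get does not exist",
  1009: "You do not have permission to operate",
  1101: "Invalid authorization or The request token has expired",
  1102: "Authorization cannot be empty",
  1200: "The account has been banned",
  1201: "create audio error, please try again later",
  1202: "The same video cannot be translated lipSync in the same language more than 1 times",
  1203: "video should be with audio",
  1204: "Your video duration is exceed 60s!",
  1205: "Create video error, please try again later",
  1207: "The video you are using exceeds the size limit allowed by the system by 300M",
  1209: "Please upload a video in another encoding format",
  1210: "The video you are using exceeds the value allowed by the system by 60fps",
  1211: "Create lipsync error, please try again later",
};

// Only 1000 indicates success; every other value is treated as a failure.
function isSuccess(code: number): boolean {
  return code === 1000;
}

// Fall back to a generic message for codes that are not in the table.
function describeResponseCode(code: number): string {
  const known = RESPONSE_CODES[code];
  return known !== undefined ? known : `Unknown response code ${code}`;
}

const code = 1101;
if (!isSuccess(code)) {
  console.error(`Request failed (${code}): ${describeResponseCode(code)}`);
}
```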