# Create Session

> Create a new streaming avatar session

<Note>
  You can copy both the avatar\_id and the voice\_id directly from the web interface, where you can also create and manage your streaming avatars.

  Create and manage your avatars at: [https://akool.com/apps/upload/avatar?from=%2Fapps%2Fstreaming-avatar%2Fedit](https://akool.com/apps/upload/avatar?from=%2Fapps%2Fstreaming-avatar%2Fedit)
</Note>

<Note>
  **Knowledge Base Integration:** You can enhance your streaming avatar with contextual AI responses by integrating a [Knowledge Base](/ai-tools-suite/knowledge-base). When creating a session, provide a `knowledge_id` parameter to enable the AI to use documents and URLs from your knowledge base for more accurate and relevant responses.
</Note>
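
As a rough illustration of the note above, the following sketch builds (but does not send) a Create Session request that carries a `knowledge_id`. The endpoint URL, the `x-api-key` header, and the example IDs are taken from the OpenAPI spec on this page. The helper name `build_create_session_request`, the default `duration` of 600, and the choice of `mode_type: 2` are assumptions made for the example.

```python
import json
import urllib.request
from typing import Any, Dict, Optional

# Endpoint assembled from the server URL and path in the OpenAPI spec on this page.
CREATE_SESSION_URL = "https://openapi.akool.com/api/open/v4/liveAvatar/session/create"


def build_create_session_request(
    api_key: str,
    avatar_id: str,
    knowledge_id: Optional[str] = None,
    duration: int = 600,  # assumed default for this sketch, not an API default
) -> urllib.request.Request:
    """Build (but do not send) a POST request for the Create Session endpoint."""
    body: Dict[str, Any] = {
        "avatar_id": avatar_id,  # the only field the schema marks as required
        "duration": duration,    # seconds; the schema caps this at 3600
        "mode_type": 2,          # dialogue mode, as in the spec's example
    }
    if knowledge_id is not None:
        body["knowledge_id"] = knowledge_id
    return urllib.request.Request(
        CREATE_SESSION_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={"x-api-key": api_key, "Content-Type": "application/json"},
        method="POST",
    )


request = build_create_session_request(
    api_key="YOUR_API_KEY",                   # placeholder
    avatar_id="dvp_Tristan_cloth2_1080P",     # from the spec's example
    knowledge_id="64f8a1b2c3d4e5f6a7b8c9d0",  # from the spec's example
)
# To send it for real: urllib.request.urlopen(request)
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any credits are pre-charged.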

<Note>
  **ElevenLabs Custom Configuration:** You can use your own ElevenLabs API key and customize voice parameters by providing `elevenlabs_settings` within the `voice_params` object. This allows you to:

  * Use your own ElevenLabs account and billing
  * Choose specific ElevenLabs models (e.g., eleven\_flash\_v2\_5, eleven\_turbo\_v2\_5)
  * Fine-tune voice characteristics such as stability, similarity\_boost, and style
  * Enable speaker boost for improved voice cloning quality

  When using ElevenLabs custom settings, specify your desired ElevenLabs voice ID in the top-level `voice_id` field of the session request. The backend passes all your custom parameters through to ElevenLabs.
</Note>
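
The ElevenLabs configuration described above could be assembled along these lines. This is a minimal sketch: the field names and the 0 to 1 ranges come from the `ElevenlabsSettings` schema on this page, while the helper names `build_elevenlabs_voice_params` and `_unit_interval` are hypothetical, and the default values simply mirror the spec's example request.

```python
from typing import Any, Dict


def _unit_interval(name: str, value: float) -> float:
    """Return value unchanged if it lies in [0, 1], otherwise raise ValueError."""
    if not 0.0 <= value <= 1.0:
        raise ValueError(f"{name} must be between 0 and 1, got {value}")
    return value


def build_elevenlabs_voice_params(
    api_key: str,
    model_id: str = "eleven_flash_v2_5",
    stability: float = 0.5,
    similarity_boost: float = 0.75,
    style: float = 0.0,
    use_speaker_boost: bool = True,
) -> Dict[str, Any]:
    """Assemble a voice_params object that carries elevenlabs_settings."""
    return {
        "elevenlabs_settings": {
            "api_key": api_key,
            "model_id": model_id,
            "stability": _unit_interval("stability", stability),
            "similarity_boost": _unit_interval("similarity_boost", similarity_boost),
            "style": _unit_interval("style", style),
            "use_speaker_boost": use_speaker_boost,
        }
    }


payload = {
    "avatar_id": "dvp_Tristan_cloth2_1080P",
    # The ElevenLabs voice ID goes in the top-level voice_id field.
    "voice_id": "iP95p4xoKVk53GoZ742B",
    "voice_params": build_elevenlabs_voice_params("sk_your_elevenlabs_api_key"),
}
```

Checking the ranges on the client side is optional. It only surfaces out-of-range values before the request is sent.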


## OpenAPI

````yaml POST /api/open/v4/liveAvatar/session/create
openapi: 3.0.3
info:
  title: Live Avatar API
  description: API for managing streaming avatars and sessions
  version: 1.0.0
servers:
  - url: https://openapi.akool.com
    description: Production server
security:
  - ApiKeyAuth: []
  - BearerAuth: []
paths:
  /api/open/v4/liveAvatar/session/create:
    post:
      tags:
        - Session Management
      summary: Create Session
      description: Create a new streaming avatar session
      requestBody:
        required: true
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/CreateSessionRequest'
            example:
              avatar_id: dvp_Tristan_cloth2_1080P
              duration: 3600
              voice_id: iP95p4xoKVk53GoZ742B
              language: en
              mode_type: 2
              knowledge_id: 64f8a1b2c3d4e5f6a7b8c9d0
              background_url: https://example.com/background.jpg
              stream_type: agora
              credentials:
                agora_uid: 100000
                agora_app_id: your-app-id
                agora_channel: your-channel
                agora_token: your-token
              voice_params:
                speed: 1
                stt_language: en
                pron_map:
                  akool: ai ku er
                stt_type: openai_realtime
                turn_detection:
                  type: server_vad
                  threshold: 0.5
                  prefix_padding_ms: 300
                  silence_duration_ms: 500
                elevenlabs_settings:
                  api_key: sk_your_elevenlabs_api_key
                  model_id: eleven_flash_v2_5
                  stability: 0.5
                  similarity_boost: 0.75
                  style: 0
                  use_speaker_boost: true
      responses:
        '200':
          description: Session created successfully
          content:
            application/json:
              schema:
                allOf:
                  - $ref: '#/components/schemas/ApiResponse'
                  - type: object
                    properties:
                      data:
                        $ref: '#/components/schemas/SessionResponse'
components:
  schemas:
    CreateSessionRequest:
      type: object
      required:
        - avatar_id
      properties:
        avatar_id:
          type: string
          description: >-
            ID of the digital human model to use for the real-time avatar. To
            use a custom uploaded video, first call the avatar/create interface
            to create a template. Processing takes some time; check its status
            through the avatar/detail interface. Once status=3, pass the
            template's avatar_id in this field.
        duration:
          type: number
          maximum: 3600
          description: >-
            Session duration in seconds (max: 3600). Credits are pre-charged for
            the full duration, but any unused credits will be refunded after the
            session ends. Rates depend on your subscription plan.
        knowledge_id:
          type: string
          description: >-
            Knowledge base ID to provide context for AI responses. Create and
            manage knowledge bases using the Knowledge Base API. When provided,
            the AI will use documents and URLs from the knowledge base to
            enhance response accuracy.
        voice_id:
          type: string
          description: >-
            Voice ID that sets the avatar's voice. Get valid IDs from the Voice
            List API. Voice IDs from Akool Multilingual 2 cannot be used with
            Streaming Avatar.
        voice_url:
          type: string
          description: Custom voice model URL. Get valid URLs from the Voice List API.
        language:
          type: string
          description: >-
            Language code to use for the session. Get valid codes from the
            Language List API.
        mode_type:
          type: integer
          enum:
            - 1
            - 2
          description: >
            Avatar interaction mode that determines how the avatar responds to
            input:

            - `1`: Retelling mode - Avatar repeats the provided content verbatim

            - `2`: Dialogue mode - Avatar engages in conversational interaction
        scene_mode:
          type: string
          enum:
            - fast_dialogue
          description: >
            Scene mode for the session.


            - `fast_dialogue`: Low-latency dialogue mode. Optimized for
            real-time voice interaction; supports **voice input only**.
        e2e_type:
          type: string
          enum:
            - openai
          description: >
            End-to-end (E2E) pipeline provider. Only applicable when
            `scene_mode` is `fast_dialogue`.


            - `openai`: OpenAI
        background_url:
          type: string
          description: URL of background image/video for avatar scene
        voice_params:
          description: >
            Voice configuration for the session.


            Use `Option 2` (`FastDialogueVoiceParams`) when `scene_mode` is
            `fast_dialogue`. Otherwise use `Option 1` (`VoiceParams`).
          anyOf:
            - $ref: '#/components/schemas/VoiceParams'
            - $ref: '#/components/schemas/FastDialogueVoiceParams'
        stream_type:
          type: string
          enum:
            - agora
            - livekit
            - trtc
          default: agora
          description: >-
            Stream type to use for the session. "agora" = Agora (default),
            "livekit" = LiveKit, "trtc" = TRTC
        credentials:
          $ref: '#/components/schemas/Credentials'
    ApiResponse:
      type: object
      required:
        - code
        - msg
      properties:
        code:
          type: integer
          description: 'Business status code returned by the interface (1000: success)'
          example: 1000
        msg:
          type: string
          description: Status message returned by the interface
          example: OK
    SessionResponse:
      type: object
      properties:
        _id:
          type: string
          description: Session ID
        uid:
          type: integer
          description: User ID
        type:
          type: integer
          description: Session type
        status:
          type: integer
          enum:
            - 1
            - 2
            - 3
            - 4
          description: Session status (1:queueing, 2:processing, 3:completed, 4:failed)
        stream_type:
          type: string
          description: Stream type used for the session
        credentials:
          $ref: '#/components/schemas/Credentials'
    VoiceParams:
      type: object
      description: Voice parameters for normal mode (any session where scene_mode is not fast_dialogue)
      properties:
        speed:
          type: number
          minimum: 0.8
          maximum: 1.2
          default: 1
          description: >-
            Controls the speed of the generated speech. Values range from 0.8 to
            1.2, with 1.0 being the default speed.
        pron_map:
          type: object
          additionalProperties:
            type: string
          description: >-
            Pronunciation mapping for custom words. Each key is a word and its
            value is the pronunciation to use. Example: "akool" mapped to
            "ai ku er".
        stt_language:
          type: string
          description: >-
            Language code for speech-to-text recognition. Improves accuracy by
            using language-specific models. If not specified, the default
            language or auto-detection is used.
        stt_type:
          type: string
          enum:
            - openai_realtime
          description: Speech-to-text type. "openai_realtime" = OpenAI Realtime
        turn_detection:
          $ref: '#/components/schemas/TurnDetection'
        elevenlabs_settings:
          $ref: '#/components/schemas/ElevenlabsSettings'
    FastDialogueVoiceParams:
      type: object
      additionalProperties: false
      description: Voice parameters for `scene_mode=fast_dialogue`
      properties:
        voice_id:
          type: string
          enum:
            - alloy
            - ash
            - ballad
            - coral
            - echo
            - sage
            - shimmer
            - verse
            - marin
            - cedar
          description: >
            Preset voice ID used in fast dialogue mode.


            Notes:

            - Voices returned by the Voice List API are **not** available in
            this mode.
        stt_language:
          type: string
          description: >-
            Language code for speech-to-text recognition. Improves accuracy by
            using language-specific models. If not specified, the default
            language or auto-detection is used.
        turn_detection:
          $ref: '#/components/schemas/TurnDetection'
        elevenlabs_settings:
          $ref: '#/components/schemas/ElevenlabsSettings'
    Credentials:
      type: object
      properties:
        agora_uid:
          type: number
          description: Agora SDK user ID (required when stream_type is "agora")
        agora_app_id:
          type: string
          description: Agora App ID (optional when stream_type is "agora")
        agora_channel:
          type: string
          description: Agora channel name (required when stream_type is "agora")
        agora_token:
          type: string
          description: Agora access token (required when stream_type is "agora")
        livekit_url:
          type: string
          description: LiveKit server URL (required when stream_type is "livekit")
        livekit_token:
          type: string
          description: LiveKit access token (required when stream_type is "livekit")
        livekit_room_name:
          type: string
          description: LiveKit room name (optional when stream_type is "livekit")
        livekit_server_identity:
          type: string
          description: LiveKit server identity (optional when stream_type is "livekit")
        livekit_client_identity:
          type: string
          description: LiveKit client identity (optional when stream_type is "livekit")
        trtc_sdk_app_id:
          type: number
          description: TRTC App ID (required when stream_type is "trtc")
        trtc_sdk_room_id:
          type: string
          description: TRTC room ID (required when stream_type is "trtc")
        trtc_sdk_user_id:
          type: string
          description: TRTC user ID (required when stream_type is "trtc")
        trtc_sdk_user_sig:
          type: string
          description: >-
            TRTC authentication token (userSig) (required when stream_type is
            "trtc")
    TurnDetection:
      type: object
      properties:
        type:
          type: string
          enum:
            - server_vad
            - semantic_vad
          description: >-
            Turn detection type. "server_vad" = Server VAD, "semantic_vad" =
            Semantic VAD
        threshold:
          type: number
          minimum: 0
          maximum: 1
          description: >-
            Activation threshold (0 to 1). A higher threshold will require
            louder audio to activate the model, and thus might perform better in
            noisy environments. Available when type is "server_vad".
        prefix_padding_ms:
          type: integer
          description: >-
            Amount of audio (in milliseconds) to include before the speech
            detected by VAD. Available when type is "server_vad".
        silence_duration_ms:
          type: integer
          description: >-
            Duration of silence (in milliseconds) that marks the end of speech.
            Shorter values detect turns more quickly. Available when type is
            "server_vad".
    ElevenlabsSettings:
      type: object
      description: >-
        ElevenLabs custom voice configuration. Allows you to use your own
        ElevenLabs API key and customize voice parameters. The voice_id should
        be specified in the main voice_id field of the session request.
      properties:
        api_key:
          type: string
          description: Your ElevenLabs API key for authentication
        model_id:
          type: string
          description: >-
            ElevenLabs model ID to use for voice generation (e.g.,
            "eleven_flash_v2_5", "eleven_turbo_v2_5", "eleven_multilingual_v2")
        stability:
          type: number
          minimum: 0
          maximum: 1
          description: >-
            Controls the stability of the voice. Higher values make the voice
            more consistent, lower values add more variation. Range is 0 to 1.
        similarity_boost:
          type: number
          minimum: 0
          maximum: 1
          description: >-
            Controls how closely the AI should adhere to the original voice.
            Higher values stick closer to the original voice. Range is 0 to 1.
        style:
          type: number
          minimum: 0
          maximum: 1
          description: Controls the style exaggeration of the voice. Range is 0 to 1.
        use_speaker_boost:
          type: boolean
          description: >-
            Enhances the similarity to the original speaker. Recommended for
            improved voice cloning quality.
  securitySchemes:
    ApiKeyAuth:
      type: apiKey
      in: header
      name: x-api-key
      description: >-
        Your API key used for request authorization. If both Authorization and
        x-api-key are set, Authorization takes precedence and x-api-key is
        ignored.
    BearerAuth:
      type: http
      scheme: bearer
      description: >-
        Your token used for request authorization. Get the token as described
        at authentication/usage#get-the-token

````
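
The constraints in the spec above can be checked on the client before a request is sent. The sketch below is one possible way to do that, not an official validator. The function names `find_problems` and `unwrap` are hypothetical. It also assumes that the "required when stream_type is ..." notes apply only when a `credentials` object is supplied, since the schema does not mark `credentials` itself as required.

```python
from typing import Any, Dict, List

# Fields the Credentials schema describes as required for each stream type.
REQUIRED_CREDENTIALS = {
    "agora": ("agora_uid", "agora_channel", "agora_token"),
    "livekit": ("livekit_url", "livekit_token"),
    "trtc": ("trtc_sdk_app_id", "trtc_sdk_room_id",
             "trtc_sdk_user_id", "trtc_sdk_user_sig"),
}
# Preset voices listed in the FastDialogueVoiceParams schema.
FAST_DIALOGUE_VOICES = {"alloy", "ash", "ballad", "coral", "echo",
                        "sage", "shimmer", "verse", "marin", "cedar"}
# Properties that schema allows (it sets additionalProperties: false).
FAST_DIALOGUE_KEYS = {"voice_id", "stt_language", "turn_detection",
                      "elevenlabs_settings"}


def find_problems(payload: Dict[str, Any]) -> List[str]:
    """Return human-readable problems found in a Create Session payload."""
    problems: List[str] = []
    if not payload.get("avatar_id"):
        problems.append("avatar_id is required")
    if payload.get("duration", 0) > 3600:
        problems.append("duration must not exceed 3600 seconds")

    stream_type = payload.get("stream_type", "agora")  # schema default
    credentials = payload.get("credentials")
    if stream_type not in REQUIRED_CREDENTIALS:
        problems.append(f"unknown stream_type: {stream_type}")
    elif credentials is not None:  # assumption: only checked when supplied
        for field in REQUIRED_CREDENTIALS[stream_type]:
            if field not in credentials:
                problems.append(
                    f"credentials.{field} is required for {stream_type}")

    if payload.get("scene_mode") == "fast_dialogue":
        voice_params = payload.get("voice_params", {})
        for key in sorted(set(voice_params) - FAST_DIALOGUE_KEYS):
            problems.append(
                f"voice_params.{key} is not allowed in fast_dialogue")
        voice = voice_params.get("voice_id")
        if voice is not None and voice not in FAST_DIALOGUE_VOICES:
            problems.append(
                f"voice_params.voice_id {voice!r} is not a preset voice")
    return problems


def unwrap(response_body: Dict[str, Any]) -> Dict[str, Any]:
    """Return the data object, or raise if the business code is not 1000."""
    if response_body.get("code") != 1000:
        raise RuntimeError(
            f"create session failed: {response_body.get('code')} "
            f"{response_body.get('msg')}")
    return response_body.get("data", {})
```

Calling `find_problems` before sending and `unwrap` on the parsed JSON response keeps both checks in one place. The `data` object returned by `unwrap` would then hold the session `_id`, `status`, and `credentials` fields described by `SessionResponse`.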