Detect faces in images and videos with high accuracy. Get bounding boxes, 6-point landmarks, cropped face images, and face tracking for video content.

API Endpoints

Face Detection Operations

  • Detect Faces - Unified endpoint for face detection in images and videos (auto-detects media type)
  • Analyze Frames - Multi-frame face analysis with person deduplication for face swap preparation

Getting Started

Basic Workflow

  1. For Image Face Detection:
    • Call the Detect Faces API with an image URL or base64-encoded image
    • Only the url or img parameter is required (no need for num_frames)
    • Receive bounding boxes and 6-point landmarks for all detected faces
    • Optionally get cropped face image URLs with return_face_url=true
    • Use the landmark data for downstream tasks (e.g., face swap, face recognition)
  2. For Video Face Detection:
    • Call the Detect Faces API with a video URL
    • Specify the num_frames parameter to control how many frames to analyze (default: 5)
    • Get face tracking data across frames, including the positions of faces that are no longer visible (the removed field)
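
The two workflows differ only in the request body. A minimal sketch of a payload builder follows; `build_detect_payload` is a hypothetical helper name, not part of the API, and it only uses the parameters named on this page (url, img, num_frames, return_face_url, single_face).

```python
from typing import Any, Dict, Optional


def build_detect_payload(
    url: Optional[str] = None,
    img: Optional[str] = None,
    num_frames: Optional[int] = None,
    return_face_url: bool = False,
    single_face: bool = False,
) -> Dict[str, Any]:
    """Build a JSON body for Detect Faces (hypothetical helper, not part of the API)."""
    if not url and not img:
        # Mirrors the documented error message for a missing input.
        raise ValueError("Either 'url' or 'img' parameter must be provided")
    payload: Dict[str, Any] = {}
    if url:
        payload["url"] = url
    if img:
        payload["img"] = img
    if num_frames is not None:
        # Only relevant for videos; omit it for images.
        payload["num_frames"] = num_frames
    if return_face_url:
        payload["return_face_url"] = True
    if single_face:
        payload["single_face"] = True
    return payload


image_payload = build_detect_payload(url="https://example.com/image.jpg")
video_payload = build_detect_payload(url="https://example.com/video.mp4", num_frames=10)
```

Optional flags are left out of the body unless set, so the image payload stays as small as the examples on this page.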

Response Code Description

Error code 0 indicates success. Any non-zero error code indicates a failure. Check the error_msg field for detailed error information.
Code | Description
0 | Success
1 | Error - Check error_msg for details

Features

Dual Input Modes

The API supports two ways to provide input media:
  1. URL Mode: Provide a publicly accessible URL to an image or video
  2. Base64 Mode: Provide base64-encoded image data (with or without data URI prefix)
// URL mode
{ "url": "https://example.com/image.jpg" }

// Base64 mode
{ "img": "..." }
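
For Base64 mode, the image bytes need to be encoded before they go into the img field. The sketch below uses only the standard library; `image_bytes_to_payload` is a hypothetical helper name, and the data URI prefix is optional, as noted above.

```python
import base64
from typing import Dict, Optional


def image_bytes_to_payload(data: bytes, mime_type: Optional[str] = None) -> Dict[str, str]:
    """Encode raw image bytes into an {"img": ...} request body (hypothetical helper)."""
    encoded = base64.b64encode(data).decode("ascii")
    if mime_type:
        # Optional data URI prefix, e.g. "data:image/png;base64,...".
        encoded = f"data:{mime_type};base64,{encoded}"
    return {"img": encoded}


# In practice the bytes would come from a file, e.g. open("face.jpg", "rb").read().
payload = image_bytes_to_payload(b"abc")
payload_with_prefix = image_bytes_to_payload(b"abc", mime_type="image/png")
```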

6-Point Facial Landmarks

The API detects 6 key facial landmarks for each face:
  1. Left Eye - Center point of the left eye
  2. Right Eye - Center point of the right eye
  3. Nose Tip - Tip of the nose
  4. Mouth Center - Center point of the mouth (X-axis midpoint between mouth corners)
  5. Left Mouth Corner - Left corner of the mouth
  6. Right Mouth Corner - Right corner of the mouth
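
To make the landmark array easier to work with, the six points can be mapped to names. This sketch assumes the points arrive in the order listed above, which matches the sample response on this page but is not stated explicitly; `LANDMARK_NAMES` and `name_landmarks` are illustrative names.

```python
from typing import Dict, List, Sequence

# Assumed order: the same order as the list above.
LANDMARK_NAMES = (
    "left_eye",
    "right_eye",
    "nose_tip",
    "mouth_center",
    "left_mouth_corner",
    "right_mouth_corner",
)


def name_landmarks(points: Sequence[Sequence[float]]) -> Dict[str, List[float]]:
    """Map one face's 6-point landmark list to a name -> [x, y] dictionary."""
    if len(points) != len(LANDMARK_NAMES):
        raise ValueError(f"expected {len(LANDMARK_NAMES)} points, got {len(points)}")
    return {name: list(point) for name, point in zip(LANDMARK_NAMES, points)}


face = name_landmarks(
    [[100, 120], [150, 120], [125, 150], [125, 180], [110, 180], [140, 180]]
)
```

With the sample values, the mouth center X coordinate (125) is the midpoint of the two mouth corners (110 and 140), as described above.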

Cropped Face Images

When return_face_url=true, the API returns:
  • face_urls: URLs to cropped face images stored in cloud storage
  • crop_region: The region coordinates used for cropping
  • crop_landmarks: Landmarks relative to the cropped image
This is particularly useful for face swap operations where you need both the face image and its landmarks.

Single Face Mode

When single_face=true, the API returns only the largest face (by area) in each frame. This is useful for:
  • Portrait photos where you only care about the main subject
  • ID photos with a single person
  • Reducing response size when multiple faces are detected
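
If a full multi-face response is already on hand, the same "largest face by area" selection can be approximated on the client. This is a sketch of the idea only, not the API's internal logic; it assumes each region is [x, y, width, height] as described under Field Descriptions, and `largest_face_index` is a hypothetical name.

```python
from typing import Optional, Sequence


def largest_face_index(regions: Sequence[Sequence[float]]) -> Optional[int]:
    """Return the index of the bounding box with the largest width * height."""
    if not regions:
        return None
    return max(range(len(regions)), key=lambda i: regions[i][2] * regions[i][3])


# Two illustrative boxes: 100x120 and 60x70.
regions = [[80, 100, 100, 120], [300, 40, 60, 70]]
main_face = largest_face_index(regions)
```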

Face Tracking for Videos

For video content, the API provides advanced face tracking:
  • Persistent Face IDs - Tracks the same face across multiple frames
  • Removed Faces - Identifies faces that were present in previous frames but are no longer visible
  • Frame Timing - Provides timestamp information for each frame

Auto Media Type Detection

The API automatically detects whether the input is an image or video based on:
  • File extension (.jpg, .png, .mp4, .mov, etc.)
  • Content-Type header from the URL
  • Fallback to content analysis if needed
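
Because num_frames only matters for videos, it can help to guess the media type on the client before building a request. The sketch below covers only the file-extension step and uses the formats listed under Supported Formats; it is a client-side approximation, not the API's detection logic, and `guess_media_type` is a hypothetical name.

```python
import posixpath
from typing import Optional
from urllib.parse import urlparse

# Extensions taken from the Supported Formats list on this page.
IMAGE_EXTENSIONS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}
VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi", ".webm"}


def guess_media_type(url: str) -> Optional[str]:
    """Return "image", "video", or None based on the URL path's extension."""
    path = urlparse(url).path  # drops the query string and fragment
    extension = posixpath.splitext(path)[1].lower()
    if extension in IMAGE_EXTENSIONS:
        return "image"
    if extension in VIDEO_EXTENSIONS:
        return "video"
    return None


kind = guess_media_type("https://example.com/video.mp4")
```

A result of None means the extension alone is not enough, which is the case where the API falls back to the Content-Type header or content analysis.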

Best Practices

Image Requirements

  • Quality: Use high-resolution images for better detection accuracy
  • Face Visibility: Ensure faces are clearly visible and not obscured
  • Lighting: Well-lit images produce better detection results
  • Angle: Frontal or slightly angled faces work best (within ±45 degrees)
  • Size: Face size should be at least 80x80 pixels

Video Requirements

  • Duration: Shorter videos process faster
  • Frame Rate: Standard frame rates (24-30 fps) are optimal
  • Resolution: 720p or higher recommended for best results
  • Face Count: API can detect multiple faces per frame
  • Encoding: Use standard encoding formats (H.264 recommended)

API Usage Tips

  • Parameter Usage:
    • For Images: Only url or img parameter is required. The num_frames parameter is NOT needed.
    • For Videos: The url parameter is required, and num_frames is recommended (default: 5).
  • Frame Selection (for videos only):
    • Short videos (< 10s): 5-10 frames
    • Medium videos (10-30s): 10-20 frames
    • Long videos (> 30s): 20-50 frames
  • URL Accessibility: Ensure the media URL is publicly accessible
  • Supported Formats:
    • Images: JPG, JPEG, PNG, BMP, WEBP
    • Videos: MP4, MOV, AVI, WEBM

Understanding the Response

Response Structure

{
  "error_code": 0,
  "error_msg": "SUCCESS",
  "faces_obj": {
    "0": {
      "landmarks": [
        [[100, 120], [150, 120], [125, 150], [125, 180], [110, 180], [140, 180]]
      ],
      "landmarks_str": [
        "100,120:150,120:125,150:125,180"
      ],
      "region": [[80, 100, 100, 120]],
      "removed": [],
      "frame_time": null,
      "face_urls": null,
      "crop_region": null,
      "crop_landmarks": null
    }
  }
}

Field Descriptions

  • error_code: Status code (0 = success)
  • error_msg: Status message or error description
  • faces_obj: Dictionary keyed by frame index (as string)
    • landmarks: Array of 6-point landmarks for each detected face
      • Format: [[x1, y1], [x2, y2], [x3, y3], [x4, y4], [x5, y5], [x6, y6]]
    • landmarks_str: String format of first 4 landmarks for Face Swap API compatibility
      • Format: "x1,y1:x2,y2:x3,y3:x4,y4"
    • region: Bounding boxes for each detected face
      • Format: [x, y, width, height] where (x, y) is the top-left corner
    • removed: Bounding boxes of faces no longer visible (video only)
    • frame_time: Timestamp in seconds for this frame (video only, null for images)
    • face_urls: Cropped face image URLs (only when return_face_url=true)
    • crop_region: Cropping region in original image (only when return_face_url=true)
    • crop_landmarks: Landmarks relative to cropped image (only when return_face_url=true)
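
Many drawing and cropping libraries expect corner coordinates rather than [x, y, width, height]. The sketch below converts each region and pairs it with its landmarks, assuming region and landmarks are parallel lists within a frame, as in the sample response; the helper names are illustrative.

```python
from typing import Any, Dict, List, Sequence, Tuple


def region_to_corners(region: Sequence[float]) -> Tuple[float, float, float, float]:
    """Convert [x, y, width, height] to (left, top, right, bottom)."""
    x, y, width, height = region
    return (x, y, x + width, y + height)


def faces_in_frame(frame: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Pair each bounding box in a frame with the landmarks at the same index."""
    return [
        {"box": region_to_corners(region), "landmarks": landmarks}
        for region, landmarks in zip(frame["region"], frame["landmarks"])
    ]


# Frame "0" from the sample response above.
frame = {
    "landmarks": [[[100, 120], [150, 120], [125, 150], [125, 180], [110, 180], [140, 180]]],
    "region": [[80, 100, 100, 120]],
}
faces = faces_in_frame(frame)
```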

Common Use Cases

Face Detection for Image Processing

{
  "url": "https://example.com/portrait.jpg"
}
Use case: Detect faces in a portrait photo for face alignment, face recognition, or face swap preprocessing.
For images, the num_frames parameter is not needed and will be ignored.

Get Cropped Face Images for Face Swap

{
  "url": "https://example.com/photo.jpg",
  "return_face_url": true
}
Use case: Get cropped face images with their landmarks for direct use in Face Swap API.

Face Tracking in Video Content

{
  "url": "https://example.com/video.mp4",
  "num_frames": 15
}
Use case: Track faces across video frames for video editing, face swap in videos, or facial animation.

Single Face Detection

{
  "url": "https://example.com/group_photo.jpg",
  "single_face": true
}
Use case: Get only the main/largest face from a group photo.

Multiple Face Detection

The API automatically detects all faces in an image or video frame. No special configuration needed.

Integration with Face Swap

  1. Use the Face Detection API to get face landmarks and, optionally, cropped face URLs
  2. Pass the landmarks_str value to the Face Swap API as the opts parameter
  3. When using return_face_url=true, use crop_landmarks for the cropped face image
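
Step 2 can be sketched as follows. The code extracts the landmarks_str value for one face and, if that field is missing, rebuilds the same "x1,y1:x2,y2:x3,y3:x4,y4" string from the first four landmarks. The Face Swap request itself is not shown because its format is not described on this page; `face_swap_opts` and `landmarks_to_str` are hypothetical names.

```python
from typing import Any, Dict, Sequence


def landmarks_to_str(points: Sequence[Sequence[int]]) -> str:
    """Join the first four landmarks as "x1,y1:x2,y2:x3,y3:x4,y4"."""
    return ":".join(f"{x},{y}" for x, y in points[:4])


def face_swap_opts(result: Dict[str, Any], frame: str = "0", face: int = 0) -> str:
    """Return the opts string for one detected face in one frame."""
    frame_data = result["faces_obj"][frame]
    strings = frame_data.get("landmarks_str")
    if strings:
        return strings[face]
    # Fallback: derive the string from the raw landmark points.
    return landmarks_to_str(frame_data["landmarks"][face])


# Trimmed copy of the sample response from Response Structure.
sample_result = {
    "error_code": 0,
    "faces_obj": {
        "0": {
            "landmarks": [[[100, 120], [150, 120], [125, 150], [125, 180], [110, 180], [140, 180]]],
            "landmarks_str": ["100,120:150,120:125,150:125,180"],
        }
    },
}
opts = face_swap_opts(sample_result)
```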

Error Handling

Common Errors

Error Message | Cause | Solution
"Either 'url' or 'img' parameter must be provided" | Missing input | Provide either the url or img parameter
"Invalid URL format" | Malformed URL provided | Ensure the URL is properly formatted with a protocol (http/https)
"Failed to download media" | URL inaccessible or invalid | Verify the URL is publicly accessible
"No faces detected" | No faces found in media | Check image quality and face visibility
"Failed to process media" | Media format not supported | Use supported formats (JPG, PNG, MP4, etc.)
"Media type detection failed" | Unable to determine media type | Ensure the file has a proper extension or content-type

Handling Failed Requests

# Example error handling in Python
import requests

# For image detection (no num_frames needed)
response = requests.post(
    "https://openapi.akool.com/interface/detect-api/detect_faces",
    json={"url": "https://example.com/image.jpg"},
    headers={"x-api-key": "YOUR_API_KEY"},
    timeout=30,  # avoid hanging indefinitely on network problems
)

# For video detection (num_frames recommended)
# response = requests.post(
#     "https://openapi.akool.com/interface/detect-api/detect_faces",
#     json={"url": "https://example.com/video.mp4", "num_frames": 10},
#     headers={"x-api-key": "YOUR_API_KEY"},
#     timeout=30,
# )

result = response.json()
if result["error_code"] != 0:
    # Any non-zero code is a failure; error_msg carries the details
    print(f"Error: {result['error_msg']}")
else:
    faces = result["faces_obj"]
    # For an image, frame "0" holds every detected face
    print(f"Detected {len(faces['0']['landmarks'])} faces")
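
For code that calls the API in several places, the error_code check can be wrapped once and raised as an exception. The sketch below works on an already-parsed response dictionary; `FaceDetectionError` and `unwrap_faces` are hypothetical names, not part of any SDK.

```python
from typing import Any, Dict


class FaceDetectionError(Exception):
    """Raised when a response carries a non-zero error_code."""

    def __init__(self, code: Any, message: str) -> None:
        super().__init__(f"Face detection failed (error_code={code}): {message}")
        self.code = code
        self.message = message


def unwrap_faces(result: Dict[str, Any]) -> Dict[str, Any]:
    """Return faces_obj on success, or raise FaceDetectionError on failure."""
    code = result.get("error_code")
    if code != 0:
        raise FaceDetectionError(code, result.get("error_msg", ""))
    return result.get("faces_obj") or {}


ok_faces = unwrap_faces({"error_code": 0, "error_msg": "SUCCESS", "faces_obj": {"0": {}}})
```

Callers can then catch FaceDetectionError and inspect its message attribute against the Common Errors table.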

Performance Considerations

Processing Time

  • Images: Typically < 1 second
  • Videos: Varies based on:
    • Number of frames requested
    • Video resolution
    • Number of faces per frame

Rate Limits

Rate limits apply to all API endpoints. Please refer to your account settings for specific limits.

Optimization Tips

  • Use an appropriate num_frames value; more frames means longer processing time
  • Use single_face=true when you only need one face
  • Cache results when processing the same media multiple times
  • Process videos in batches if analyzing many videos
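
The caching tip can be sketched as an in-memory cache keyed by the request body. The class below is an illustration under stated assumptions: `DetectionCache` and `cache_key` are hypothetical names, the cache lives only for the lifetime of the process, and `fetch` stands in for whatever function actually sends the request.

```python
import hashlib
import json
from typing import Any, Callable, Dict


def cache_key(payload: Dict[str, Any]) -> str:
    """Hash the payload in a canonical form so key order does not matter."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class DetectionCache:
    """Remember responses per payload and call fetch only on a cache miss."""

    def __init__(self, fetch: Callable[[Dict[str, Any]], Dict[str, Any]]) -> None:
        self._fetch = fetch
        self._store: Dict[str, Dict[str, Any]] = {}

    def get(self, payload: Dict[str, Any]) -> Dict[str, Any]:
        key = cache_key(payload)
        if key not in self._store:
            self._store[key] = self._fetch(payload)
        return self._store[key]
```

Since different parameters (for example a different num_frames) produce different responses, the whole payload is hashed rather than the URL alone.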

Support

Need help? Contact us at [email protected]