Face Detection API Overview - Akool open api documents

Detect faces in images and videos with high accuracy. Get bounding boxes, 6-point landmarks, cropped face images, and face tracking for video content.

API Endpoints

Face Detection Operations

Detect Faces - Unified endpoint for face detection in images and videos (auto-detects media type)
Analyze Frames - Multi-frame face analysis with person deduplication for face swap preparation

Getting Started

Basic Workflow

For Image Face Detection:
- Call the Detect Faces API with an image URL or base64-encoded image
- Only the url or img parameter is required (no need for num_frames)
- Receive bounding boxes and 6-point landmarks for all detected faces
- Optionally get cropped face image URLs with return_face_url=true
- Use the landmark data for downstream tasks (e.g., face swap, face recognition)
For Video Face Detection:
- Call the Detect Faces API with a video URL
- Specify the num_frames parameter to control how many frames to analyze (default: 5)
- Get face tracking data across frames, including the positions of faces that are no longer visible
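
The two workflows differ only in the request body. The sketch below shows one way to build that body. The helper name `build_detect_payload` and its validation rules are illustrative assumptions, not part of the API; the parameter names (`url`, `img`, `num_frames`, `return_face_url`) are the ones described on this page.

```python
from typing import Any, Dict, Optional


def build_detect_payload(
    url: Optional[str] = None,
    img: Optional[str] = None,
    num_frames: Optional[int] = None,
    return_face_url: bool = False,
) -> Dict[str, Any]:
    """Build a Detect Faces request body (hypothetical helper)."""
    # One media source is required: a URL or base64 image data.
    if not url and not img:
        raise ValueError("Either 'url' or 'img' must be provided")
    payload: Dict[str, Any] = {}
    # Assumption of this sketch: if both are given, prefer url.
    if url:
        payload["url"] = url
    else:
        payload["img"] = img
    # num_frames only matters for videos, so it is omitted unless set.
    if num_frames is not None:
        payload["num_frames"] = num_frames
    if return_face_url:
        payload["return_face_url"] = True
    return payload


image_payload = build_detect_payload(url="https://example.com/image.jpg")
video_payload = build_detect_payload(
    url="https://example.com/video.mp4", num_frames=5
)
```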

Response Code Description

Error code 0 indicates success. Any non-zero error code indicates a failure. Check the error_msg field for detailed error information.

Code	Description
0	Success
1	Error - Check error_msg for details
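
A client can turn this convention into an exception so that failures are not silently ignored. This is a minimal sketch: `FaceDetectionError` and `check_response` are hypothetical names introduced here, and only the `error_code`, `error_msg`, and `faces_obj` fields come from this page.

```python
from typing import Any, Dict


class FaceDetectionError(Exception):
    """Raised for a non-zero error_code (hypothetical class)."""

    def __init__(self, code: Any, message: str) -> None:
        super().__init__(f"error_code={code}: {message}")
        self.code = code
        self.message = message


def check_response(result: Dict[str, Any]) -> Dict[str, Any]:
    """Return faces_obj on success, raise FaceDetectionError otherwise."""
    code = result.get("error_code")
    if code != 0:
        # Any non-zero (or missing) code is treated as a failure.
        raise FaceDetectionError(code, result.get("error_msg", ""))
    return result.get("faces_obj", {})
```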

Features

Dual Input Modes

The API supports two ways to provide input media:

URL Mode: Provide a publicly accessible URL to an image or video
Base64 Mode: Provide base64-encoded image data (with or without data URI prefix)

// URL mode
{ "url": "https://example.com/image.jpg" }

// Base64 mode
{ "img": "data:image/jpeg;base64,/9j/4AAQSkZJRg..." }
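
For Base64 mode, the `img` value can be produced from raw image bytes with the standard library. The sketch below assumes a JPEG and includes the data URI prefix, which this page describes as optional; the helper name `to_data_uri` is illustrative.

```python
import base64


def to_data_uri(image_bytes: bytes, mime_type: str = "image/jpeg") -> str:
    """Encode raw image bytes as a base64 data URI."""
    encoded = base64.b64encode(image_bytes).decode("ascii")
    return f"data:{mime_type};base64,{encoded}"


# Placeholder bytes (the JPEG magic number). A real call would read a
# file instead, e.g. open("photo.jpg", "rb").read().
payload = {"img": to_data_uri(b"\xff\xd8\xff\xe0")}
```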

6-Point Facial Landmarks

The API detects 6 key facial landmarks for each face:

Left Eye - Center point of the left eye
Right Eye - Center point of the right eye
Nose Tip - Tip of the nose
Mouth Center - Center point of the mouth (X-axis midpoint between mouth corners)
Left Mouth Corner - Left corner of the mouth
Right Mouth Corner - Right corner of the mouth
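
Giving the six points names makes downstream code easier to read. The sketch below assumes the points arrive in the order listed above, which is consistent with the sample response later on this page; the key names are illustrative and not returned by the API.

```python
from typing import Dict, Sequence, Tuple

# Assumed order: the same order as the list above.
LANDMARK_NAMES = (
    "left_eye",
    "right_eye",
    "nose_tip",
    "mouth_center",
    "left_mouth_corner",
    "right_mouth_corner",
)


def name_landmarks(
    points: Sequence[Sequence[float]],
) -> Dict[str, Tuple[float, float]]:
    """Map one face's 6-point landmark array to named (x, y) tuples."""
    if len(points) != len(LANDMARK_NAMES):
        raise ValueError("expected exactly 6 landmark points")
    return {name: (p[0], p[1]) for name, p in zip(LANDMARK_NAMES, points)}


face = name_landmarks(
    [[100, 120], [150, 120], [125, 150], [125, 180], [110, 180], [140, 180]]
)
# The mouth center's x value is the midpoint of the two mouth corners.
mid_x = (face["left_mouth_corner"][0] + face["right_mouth_corner"][0]) / 2
```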

Cropped Face Images

When return_face_url=true, the API returns:

face_urls: URLs to cropped face images stored in cloud storage
crop_region: The region coordinates used for cropping
crop_landmarks: Landmarks relative to the cropped image

This is particularly useful for face swap operations where you need both the face image and its landmarks.

Single Face Mode

When single_face=true, the API returns only the largest face (by area) in each frame. This is useful for:

Portrait photos where you only care about the main subject
ID photos with a single person
Reducing response size when multiple faces are detected
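
The same selection can be reproduced on the client when a full multi-face response is already in hand. This is a sketch of the "largest by area" rule, assuming each region uses the [x, y, width, height] format described under Field Descriptions; it is not the server's implementation.

```python
from typing import Sequence


def largest_face_index(regions: Sequence[Sequence[float]]) -> int:
    """Return the index of the region with the largest width * height."""
    if not regions:
        raise ValueError("no faces in this frame")
    return max(
        range(len(regions)),
        key=lambda i: regions[i][2] * regions[i][3],
    )


# Illustrative regions: a large face and a small background face.
regions = [[300, 50, 40, 40], [80, 100, 100, 120]]
main_face = largest_face_index(regions)
```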

Face Tracking for Videos

For video content, the API provides advanced face tracking:

Persistent Face IDs - Tracks the same face across multiple frames
Removed Faces - Identifies faces that were present in previous frames but are no longer visible
Frame Timing - Provides timestamp information for each frame
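
To read tracking data frame by frame, a client can walk `faces_obj` in frame order. The sketch below uses made-up sample data shaped like the response described under Understanding the Response; `summarize_frames` is a hypothetical helper.

```python
from typing import Any, Dict, List, Optional, Tuple


def summarize_frames(
    faces_obj: Dict[str, Dict[str, Any]],
) -> List[Tuple[int, Optional[float], int, int]]:
    """Return (frame index, frame_time, visible faces, removed faces)."""
    rows = []
    # Keys are strings, so sort numerically ("10" must come after "2").
    for key in sorted(faces_obj, key=int):
        frame = faces_obj[key]
        visible = len(frame.get("region") or [])
        removed = len(frame.get("removed") or [])
        rows.append((int(key), frame.get("frame_time"), visible, removed))
    return rows


# Made-up data: one face is visible in frames 0 and 1, then disappears.
sample = {
    "1": {"region": [[82, 100, 100, 120]], "removed": [], "frame_time": 0.5},
    "0": {"region": [[80, 100, 100, 120]], "removed": [], "frame_time": 0.0},
    "2": {"region": [], "removed": [[82, 100, 100, 120]], "frame_time": 1.0},
}
rows = summarize_frames(sample)
```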

Auto Media Type Detection

The API automatically detects whether the input is an image or video based on:

File extension (.jpg, .png, .mp4, .mov, etc.)
Content-Type header from the URL
Fallback to content analysis if needed
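
A client can apply the first of these checks itself, for example to decide whether to send `num_frames`. This sketch covers the file-extension step only and uses the format lists from API Usage Tips; it is a client-side guess, not the server's detection logic.

```python
from pathlib import PurePosixPath
from typing import Optional
from urllib.parse import urlparse

IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".bmp", ".webp"}
VIDEO_EXTS = {".mp4", ".mov", ".avi", ".webm"}


def guess_media_type(url: str) -> Optional[str]:
    """Guess 'image' or 'video' from the URL path, or None if unknown."""
    # Use only the path, so query strings do not hide the extension.
    suffix = PurePosixPath(urlparse(url).path).suffix.lower()
    if suffix in IMAGE_EXTS:
        return "image"
    if suffix in VIDEO_EXTS:
        return "video"
    return None


kind = guess_media_type("https://example.com/video.MP4?token=abc")
```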

Best Practices

Image Requirements

Quality: Use high-resolution images for better detection accuracy
Face Visibility: Ensure faces are clearly visible and not obscured
Lighting: Well-lit images produce better detection results
Angle: Frontal or slightly angled faces work best (within ±45 degrees)
Size: Face size should be at least 80x80 pixels

Video Requirements

Duration: Shorter videos process faster
Frame Rate: Standard frame rates (24-30 fps) are optimal
Resolution: 720p or higher recommended for best results
Face Count: API can detect multiple faces per frame
Encoding: Use standard encoding formats (H.264 recommended)

API Usage Tips

Parameter Usage:
- For Images: Only url or img parameter is required. The num_frames parameter is NOT needed.
- For Videos: Both url and num_frames parameters are recommended.
Frame Selection (for videos only):
- Short videos (< 10s): 5-10 frames
- Medium videos (10-30s): 10-20 frames
- Long videos (> 30s): 20-50 frames
URL Accessibility: Ensure the media URL is publicly accessible
Supported Formats:
- Images: JPG, JPEG, PNG, BMP, WEBP
- Videos: MP4, MOV, AVI, WEBM
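
The frame selection guidance can be written as a small lookup. This is a sketch; the helper name is illustrative, and the handling of exactly 10 s and 30 s (both treated as medium) is an assumption, since the ranges above do not say which side the boundaries fall on.

```python
from typing import Tuple


def suggest_num_frames(duration_s: float) -> Tuple[int, int]:
    """Return the suggested (min, max) num_frames for a video length."""
    if duration_s <= 0:
        raise ValueError("duration must be positive")
    if duration_s < 10:
        return (5, 10)   # short videos
    if duration_s <= 30:
        return (10, 20)  # medium videos
    return (20, 50)      # long videos


low, high = suggest_num_frames(12.5)
```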

Understanding the Response

Response Structure

{
  "error_code": 0,
  "error_msg": "SUCCESS",
  "faces_obj": {
    "0": {
      "landmarks": [
        [[100, 120], [150, 120], [125, 150], [125, 180], [110, 180], [140, 180]]
      ],
      "landmarks_str": [
        "100,120:150,120:125,150:125,180"
      ],
      "region": [[80, 100, 100, 120]],
      "removed": [],
      "frame_time": null,
      "face_urls": null,
      "crop_region": null,
      "crop_landmarks": null
    }
  }
}

Field Descriptions

error_code: Status code (0 = success)
error_msg: Status message or error description
faces_obj: Dictionary keyed by frame index (as string)
- landmarks: Array of 6-point landmarks for each detected face
  - Format: [[x1, y1], [x2, y2], [x3, y3], [x4, y4], [x5, y5], [x6, y6]]
- landmarks_str: String format of first 4 landmarks for Face Swap API compatibility
  - Format: "x1,y1:x2,y2:x3,y3:x4,y4"
- region: Bounding boxes for each detected face
  - Format: [x, y, width, height] where (x, y) is the top-left corner
- removed: Bounding boxes of faces no longer visible (video only)
- frame_time: Timestamp in seconds for this frame (video only, null for images)
- face_urls: Cropped face image URLs (only when return_face_url=true)
- crop_region: Cropping region in original image (only when return_face_url=true)
- crop_landmarks: Landmarks relative to cropped image (only when return_face_url=true)
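
The two formats above can be checked against the sample response. This sketch rebuilds `landmarks_str` from a landmark array and converts a region into corner coordinates. The API already returns `landmarks_str`, so the first helper only illustrates the format; both helper names are hypothetical.

```python
from typing import Sequence, Tuple


def to_landmarks_str(points: Sequence[Sequence[float]]) -> str:
    """Join the first 4 landmarks as 'x1,y1:x2,y2:x3,y3:x4,y4'."""
    return ":".join(f"{x},{y}" for x, y in points[:4])


def region_to_corners(
    region: Sequence[float],
) -> Tuple[float, float, float, float]:
    """Convert [x, y, width, height] to (left, top, right, bottom)."""
    x, y, w, h = region
    return (x, y, x + w, y + h)


landmarks = [[100, 120], [150, 120], [125, 150], [125, 180], [110, 180], [140, 180]]
as_str = to_landmarks_str(landmarks)
corners = region_to_corners([80, 100, 100, 120])
```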

Common Use Cases

Face Detection for Image Processing

{
  "url": "https://example.com/portrait.jpg"
}

Use case: Detect faces in a portrait photo for face alignment, face recognition, or face swap preprocessing.

For images, the num_frames parameter is not needed and will be ignored.

Get Cropped Face Images for Face Swap

{
  "url": "https://example.com/photo.jpg",
  "return_face_url": true
}

Use case: Get cropped face images with their landmarks for direct use in the Face Swap API.

Face Tracking in Video Content

{
  "url": "https://example.com/video.mp4",
  "num_frames": 15
}

Use case: Track faces across video frames for video editing, face swap in videos, or facial animation.

Single Face Detection

{
  "url": "https://example.com/group_photo.jpg",
  "single_face": true
}

Use case: Get only the main/largest face from a group photo.

Multiple Face Detection

The API automatically detects all faces in an image or video frame. No special configuration needed.

Integration with Face Swap

Use the Face Detection API to get face landmarks and, optionally, cropped face URLs
Pass the landmarks_str value to the Face Swap API as the opts parameter
When using return_face_url=true, use crop_landmarks for the cropped face image

Error Handling

Common Errors

Error Message	Cause	Solution
"Either 'url' or 'img' parameter must be provided"	Missing input	Provide either `url` or `img` parameter
"Invalid URL format"	Malformed URL provided	Ensure URL is properly formatted with protocol (http/https)
"Failed to download media"	URL inaccessible or invalid	Verify URL is publicly accessible
"No faces detected"	No faces found in media	Check image quality and face visibility
"Failed to process media"	Media format not supported	Use supported formats (JPG, PNG, MP4, etc.)
"Media type detection failed"	Unable to determine media type	Ensure file has proper extension or content-type
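
The table above can be turned into a lookup that attaches a suggested fix to an error message. This is a sketch: it assumes the server returns these messages roughly as listed, and it matches case-insensitively on substrings in case extra detail is appended. `hint_for` is a hypothetical helper.

```python
ERROR_HINTS = {
    "either 'url' or 'img' parameter must be provided":
        "Provide either url or img.",
    "invalid url format":
        "Include the protocol (http/https) and check the URL.",
    "failed to download media":
        "Verify the URL is publicly accessible.",
    "no faces detected":
        "Check image quality and face visibility.",
    "failed to process media":
        "Use a supported format (JPG, PNG, MP4, etc.).",
    "media type detection failed":
        "Give the file a proper extension or content-type.",
}


def hint_for(error_msg: str) -> str:
    """Return a suggested fix for a known error message."""
    # Normalise curly single quotes so both quote styles match.
    normalized = (
        error_msg.replace("\u2018", "'").replace("\u2019", "'").lower()
    )
    for known, hint in ERROR_HINTS.items():
        if known in normalized:
            return hint
    return "Check error_msg for details."
```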

Handling Failed Requests

# Example error handling in Python
import requests

API_URL = "https://openapi.akool.com/interface/detect-api/detect_faces"
HEADERS = {"x-api-key": "YOUR_API_KEY"}

# For image detection (no num_frames needed)
response = requests.post(
    API_URL,
    json={"url": "https://example.com/image.jpg"},
    headers=HEADERS,
    timeout=30,  # avoid waiting indefinitely on network problems
)

# For video detection (num_frames recommended)
# response = requests.post(
#     API_URL,
#     json={"url": "https://example.com/video.mp4", "num_frames": 10},
#     headers=HEADERS,
#     timeout=30,
# )

result = response.json()
if result["error_code"] != 0:
    # Any non-zero error_code is a failure; error_msg explains why
    print(f"Error: {result['error_msg']}")
else:
    # faces_obj is keyed by frame index as a string; an image has frame "0"
    faces = result["faces_obj"]
    print(f"Detected {len(faces['0']['landmarks'])} faces")

Performance Considerations

Processing Time

Images: Typically < 1 second
Videos: Varies based on:
- Number of frames requested
- Video resolution
- Number of faces per frame

Rate Limits

Rate limits apply to all API endpoints. Please refer to your account settings for specific limits.

Optimization Tips

Use appropriate num_frames value - more frames = longer processing time
Use single_face=true when you only need one face
Cache results when processing the same media multiple times
Process videos in batches if analyzing many videos
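
The caching tip can be sketched as a small in-memory cache keyed by the request body. `DetectionCache` is a hypothetical class; the `fetch` callable stands in for whatever function actually sends the request, and the stub below only counts calls so the behaviour is visible without network access.

```python
import hashlib
import json
from typing import Any, Callable, Dict


class DetectionCache:
    """Cache detection results per unique request body (in memory)."""

    def __init__(self, fetch: Callable[[Dict[str, Any]], Dict[str, Any]]) -> None:
        self._fetch = fetch
        self._store: Dict[str, Dict[str, Any]] = {}

    def get(self, payload: Dict[str, Any]) -> Dict[str, Any]:
        # sort_keys makes the key independent of dict insertion order.
        raw = json.dumps(payload, sort_keys=True).encode("utf-8")
        key = hashlib.sha256(raw).hexdigest()
        if key not in self._store:
            self._store[key] = self._fetch(payload)
        return self._store[key]


calls = []


def fake_fetch(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Stub standing in for the real HTTP request."""
    calls.append(payload)
    return {"error_code": 0, "error_msg": "SUCCESS", "faces_obj": {}}


cache = DetectionCache(fake_fetch)
first = cache.get({"url": "https://example.com/video.mp4", "num_frames": 10})
second = cache.get({"num_frames": 10, "url": "https://example.com/video.mp4"})
```

This sketch also caches failed responses; a production version would likely skip storing results with a non-zero error_code.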

Support

Need help? Contact us at info@akool.com
