Understanding OpenClaw Canvas: UI Automation for Mobile & Desktop

The OpenClaw Canvas automates graphical user interfaces on paired mobile and desktop devices. Unlike browser automation, which is confined to web pages, Canvas works with the visual layer of the device itself, so it can interact with native apps, system interfaces, and custom GUI applications.

What is the Canvas?

The Canvas is a GUI automation surface that renders the screen of a paired node (mobile or desktop) within the OpenClaw ecosystem. It allows you to programmatically interact with any visible element, making it ideal for tasks that require visual feedback or access to non-web applications.

Key capabilities include:

Rendering the UI of a paired device
Taking screenshots of the current state
Evaluating JavaScript in the context of the Canvas
Pushing A2UI elements to create hybrid interfaces
Automating interactions based on visual elements

Core Commands

`canvas present`

Displays the Canvas interface for a paired node. This is the first step in any Canvas automation workflow.

{
  "action": "present",
  "node": "iPhone12"
}
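
When requests like this are generated by a script rather than written by hand, a small builder keeps them consistent. The following is a minimal sketch, not part of OpenClaw: the function name `build_canvas_request` and the `KNOWN_ACTIONS` set are hypothetical, and the only fields it knows about are the ones that appear in this article.

```python
import json

# Action names taken from this article; the set is illustrative, not authoritative.
KNOWN_ACTIONS = {"present", "snapshot", "eval", "a2ui_push", "act"}


def build_canvas_request(action, node=None, **extra):
    """Return a Canvas request dict shaped like the JSON example above.

    Hypothetical helper: it only assembles the payload and does not send it.
    """
    if action not in KNOWN_ACTIONS:
        raise ValueError(f"unknown canvas action: {action!r}")
    request = {"action": action}
    if node is not None:
        request["node"] = node
    # Extra keyword arguments (for example javaScript) are passed through unchanged.
    request.update(extra)
    return request


present_request = build_canvas_request("present", node="iPhone12")
print(json.dumps(present_request, indent=2))
```

Rejecting unknown action names early means a typo such as "presnt" fails in the script instead of on the device.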

`canvas snapshot`

Captures the current state of the Canvas and returns an accessibility-tree-like representation that can be used to identify interactive elements by role, label, or position.

{
  "action": "snapshot",
  "node": "MacBookPro"
}
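
To show how a tree-shaped snapshot could be searched by role or label, here is a sketch under a stated assumption: each element is taken to be a dictionary with `role`, `label`, and an optional `children` list. These field names are hypothetical and the real snapshot format may differ.

```python
from typing import Iterator, List, Optional


def iter_elements(element: dict) -> Iterator[dict]:
    """Yield an element and all of its descendants, depth-first."""
    yield element
    for child in element.get("children", []):
        yield from iter_elements(child)


def find_elements(tree: dict, role: Optional[str] = None,
                  label: Optional[str] = None) -> List[dict]:
    """Return every element matching the given role and/or label."""
    matches = []
    for element in iter_elements(tree):
        if role is not None and element.get("role") != role:
            continue
        if label is not None and element.get("label") != label:
            continue
        matches.append(element)
    return matches


# Hypothetical snapshot used only for illustration.
sample_snapshot = {
    "role": "window",
    "label": "Settings",
    "children": [
        {"role": "button", "label": "Wi-Fi"},
        {"role": "group", "label": "Display", "children": [
            {"role": "button", "label": "Brightness"},
            {"role": "switch", "label": "Dark Mode"},
        ]},
    ],
}

buttons = find_elements(sample_snapshot, role="button")
print([b["label"] for b in buttons])  # ['Wi-Fi', 'Brightness']
```

Matching on role and label rather than on screen position tends to survive layout changes better, which is the main reason to prefer a structured snapshot over raw coordinates.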

`canvas eval`

Executes JavaScript code within the Canvas environment, allowing for dynamic manipulation of the displayed content or execution of complex logic.

{
  "action": "eval",
  "javaScript": "document.getElementById('button').click();"
}
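
The script in the example above throws if no element with that id exists. One way to make such requests more robust is to generate the JavaScript with a null check and to escape the id properly. The sketch below assumes only the `javaScript` field shown above; the helper name `click_by_id_script` is hypothetical, and whether `eval` returns the script's value to the caller is not confirmed here.

```python
import json


def click_by_id_script(element_id: str) -> str:
    """Build a JavaScript snippet that clicks an element only if it exists."""
    # json.dumps yields a quoted, escaped literal that is also a valid
    # JavaScript string, so quotes inside element_id cannot break the script.
    id_literal = json.dumps(element_id)
    return (
        f"(() => {{ const el = document.getElementById({id_literal}); "
        "if (!el) { return false; } el.click(); return true; })()"
    )


eval_request = {
    "action": "eval",
    "javaScript": click_by_id_script("button"),
}
print(json.dumps(eval_request, indent=2))
```

Wrapping the logic in an immediately invoked function keeps `el` out of the page's global scope and gives the snippet a single boolean result.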

`canvas a2ui_push`

Pushes A2UI (Agent-to-UI) elements to the Canvas, enabling the creation of custom interfaces that blend agent logic with user interaction.
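
This article does not show a payload for `a2ui_push`, so the following is only an illustration of the general idea. It assumes the UI elements travel as JSON Lines in a field called `jsonl`; that field name and the element shape (`component`, `id`, `text`, `label`) are hypothetical stand-ins, not the real A2UI schema.

```python
import json


def to_jsonl(messages):
    """Serialize messages as JSON Lines: one compact JSON object per line."""
    return "\n".join(json.dumps(m, separators=(",", ":")) for m in messages)


# Hypothetical UI description, for illustration only.
messages = [
    {"component": "text", "id": "title", "text": "Confirm transfer?"},
    {"component": "button", "id": "confirm", "label": "Confirm"},
]

push_request = {
    "action": "a2ui_push",
    "node": "iPhone12",
    "jsonl": to_jsonl(messages),  # assumed field name
}
print(push_request["jsonl"])
```

Consult the A2UI documentation for the actual message format before adapting this.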

Common Use Cases

Mobile App Automation

Automate native mobile applications that don't have APIs or web interfaces. This is particularly useful for apps that require visual verification or have complex gesture-based interactions.

Desktop UI Testing

Perform automated testing of desktop applications by simulating user interactions and validating visual outputs.

Hybrid Interfaces

Build interfaces that pair AI agents with traditional UI elements, so users can interact with an agent through familiar visual controls.

Accessibility Automation

Automate tasks for users with accessibility needs by creating custom navigation flows that work with screen readers and other assistive technologies.

Security Considerations

Canvas operations require explicit permission from the user, as they provide direct access to the device's display. Always ensure that Canvas automation is performed in a secure context and that sensitive information is not exposed through screenshots or logs.
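
One practical way to keep sensitive data out of logs is to redact request and response payloads before writing them. The sketch below is an assumption-laden example: the `SENSITIVE_KEYS` set is hypothetical and should be adjusted to whatever fields actually carry screen content or scripts in your setup.

```python
import json

# Hypothetical list of fields that may carry screen content or scripts.
SENSITIVE_KEYS = frozenset({"javaScript", "image", "screenshot", "text"})


def redact(value, sensitive_keys=SENSITIVE_KEYS):
    """Return a copy of value with sensitive fields replaced by a marker."""
    if isinstance(value, dict):
        return {
            key: "[REDACTED]" if key in sensitive_keys
            else redact(item, sensitive_keys)
            for key, item in value.items()
        }
    if isinstance(value, list):
        return [redact(item, sensitive_keys) for item in value]
    return value


request = {"action": "eval", "javaScript": "document.title"}
print(json.dumps(redact(request)))
```

Because `redact` builds new containers instead of modifying its input, the original payload can still be sent unchanged after the redacted copy is logged.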

Getting Started

1. Pair your device with OpenClaw.
2. Use `nodes status` to verify the connection.
3. Call `canvas present` to initialize the Canvas.
4. Use `canvas snapshot` to analyze the current UI state.
5. Perform actions using `canvas act` with the appropriate element references.
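
The Canvas part of these steps (present, snapshot, act) can be sketched as a small driver function. Everything beyond the action names is an assumption: `send` is a hypothetical callable standing in for whatever transport delivers requests to OpenClaw, the snapshot is assumed to be a tree of elements with `label`, `ref`, and `children` fields, and the `ref` field on the `act` request is a guess at how an element reference might be passed. Pairing and `nodes status` are outside the sketch.

```python
from typing import Callable, Optional


def find_ref(element: dict, label: str) -> Optional[str]:
    """Depth-first search for the reference of the first element with this label."""
    if element.get("label") == label:
        return element.get("ref")
    for child in element.get("children", []):
        ref = find_ref(child, label)
        if ref is not None:
            return ref
    return None


def run_canvas_workflow(send: Callable[[dict], dict], node: str,
                        target_label: str) -> dict:
    """Present the Canvas, take a snapshot, then act on one labelled element."""
    send({"action": "present", "node": node})
    snapshot = send({"action": "snapshot", "node": node})
    ref = find_ref(snapshot, target_label)
    if ref is None:
        raise LookupError(f"no element labelled {target_label!r} on {node}")
    return send({"action": "act", "node": node, "ref": ref})


# Stand-in transport: records requests instead of contacting a device.
sent = []


def fake_send(request: dict) -> dict:
    sent.append(request)
    if request["action"] == "snapshot":
        return {"role": "window", "label": "Login", "children": [
            {"role": "button", "label": "Sign in", "ref": "e1"},
        ]}
    return {"ok": True}


result = run_canvas_workflow(fake_send, "iPhone12", "Sign in")
print([r["action"] for r in sent])  # ['present', 'snapshot', 'act']
```

Passing the transport in as a parameter lets the same flow run against a recording stub during development and a real connection later.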

The Canvas extends OpenClaw automation beyond browser-based tools to native mobile and desktop interfaces.