Using Canvas for GUI Automation
Canvas is OpenClaw's core tool for GUI automation, enabling direct interaction with native applications and web interfaces across desktop and mobile devices. By providing a shared visual surface, Canvas bridges the gap between agents and graphical user interfaces, making actions like clicking, typing, and navigation programmable.
What is Canvas?
Canvas functions as a digital workspace where screenshots and interface elements are shared between the user and the agent. When activated, it captures the current screen or a specific application window and presents it within the chat interface. This allows both parties to see the same visual context, facilitating precise automation and remote assistance.
The tool is not limited to simple screenshots. It supports dynamic interaction through the A2UI (Agent-to-UI) protocol, which overlays interactive elements like buttons and forms directly onto the shared canvas. This transforms static images into functional interfaces that can be manipulated by both humans and AI agents.
Key Features
Visual Context Sharing: Canvas eliminates ambiguity by providing a common visual reference. Instead of describing interface elements with text, you can point directly to them on the shared screen. This is particularly valuable for complex workflows involving multiple applications or nested menus.
Cross-Platform Compatibility: Canvas uses one interaction model across operating systems and device types. Whether you're automating a macOS application, a Windows utility, or a mobile app on iOS or Android, you work with the shared surface the same way.
A2UI Integration: The A2UI protocol extends Canvas's capabilities by allowing dynamic interface elements to be pushed to the shared surface. Agents can generate buttons, input fields, and other controls that appear directly on the canvas, enabling interactive workflows without requiring traditional UI development.
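To make the idea of pushing controls to the canvas concrete, here is a minimal sketch that describes interactive elements as plain data and serializes them one JSON object per line. The Control class, its field names, and the to_push_lines helper are all hypothetical and chosen for illustration. They are not the actual A2UI message schema, which should be taken from the protocol's own documentation.

```python
import json
from dataclasses import dataclass, asdict, field


@dataclass
class Control:
    """One interactive element to show on the canvas (hypothetical shape)."""
    id: str
    kind: str   # illustrative values such as "button" or "text_input"
    label: str
    props: dict = field(default_factory=dict)


def to_push_lines(controls):
    """Serialize controls as one JSON object per line (hypothetical wire format)."""
    return "\n".join(json.dumps(asdict(c), sort_keys=True) for c in controls)


controls = [
    Control(id="name", kind="text_input", label="Project name"),
    Control(id="submit", kind="button", label="Create", props={"primary": True}),
]
payload = to_push_lines(controls)
print(payload)
```

Keeping the controls as data rather than rendering code is what lets an agent generate them on the fly. The transport that delivers the payload to the canvas is left out because it depends on the real protocol.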
Common Use Cases
Application Automation: Automate repetitive tasks within desktop applications that lack native APIs or scripting support. For example, you can use Canvas to automate data entry in legacy software, navigate through complex configuration dialogs, or perform batch operations in creative applications.
Remote Assistance: Provide real-time support by sharing your screen via Canvas. The assistant can see exactly what you're seeing and guide you through troubleshooting steps, highlight interface elements, or even perform actions remotely with your permission.
UI Testing and Validation: Verify the appearance and behavior of user interfaces across different environments. Agents can analyze Canvas screenshots to detect visual regressions, validate layout consistency, or confirm that specific elements are present and functional.
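One simple way to detect a visual regression is to compare a baseline screenshot with a new one pixel by pixel. The sketch below assumes both screenshots are already decoded into rows of (r, g, b) tuples. The function names and the 1% threshold are illustrative choices, not part of Canvas.

```python
def diff_ratio(baseline, candidate, tolerance=0):
    """Return the fraction of pixels where any channel differs by more than `tolerance`.

    Both images are lists of rows; each pixel is an (r, g, b) tuple.
    """
    if len(baseline) != len(candidate) or any(
        len(a) != len(b) for a, b in zip(baseline, candidate)
    ):
        raise ValueError("images must have identical dimensions")

    total = sum(len(row) for row in baseline)
    if total == 0:
        return 0.0

    changed = 0
    for row_a, row_b in zip(baseline, candidate):
        for px_a, px_b in zip(row_a, row_b):
            if any(abs(x - y) > tolerance for x, y in zip(px_a, px_b)):
                changed += 1
    return changed / total


def has_regression(baseline, candidate, max_ratio=0.01, tolerance=0):
    """Flag a regression when more than `max_ratio` of the pixels changed."""
    return diff_ratio(baseline, candidate, tolerance) > max_ratio


white, red = (255, 255, 255), (255, 0, 0)
baseline = [[white, white], [white, white]]
candidate = [[white, red], [white, white]]
print(diff_ratio(baseline, candidate))
```

A nonzero tolerance helps absorb anti-aliasing and compression noise. Real screenshots would need an image decoder in front of this step.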
Interactive Workflows: Create guided processes where users make decisions based on visual information. For instance, an agent could present a series of design options on the canvas and ask the user to select their preferred choice by clicking on the corresponding area.
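Selecting an option by clicking an area comes down to mapping click coordinates to a named region. The sketch below assumes the agent knows the rectangle each option occupies on the canvas. The Region class and pick_option function are hypothetical helpers, not Canvas APIs.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Region:
    """A rectangular area of the canvas, in pixels, tied to one option."""
    name: str
    x: int
    y: int
    width: int
    height: int

    def contains(self, px, py):
        # The right and bottom edges are exclusive, so adjacent regions never overlap.
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def pick_option(regions, px, py) -> Optional[str]:
    """Return the name of the first region containing the click, or None."""
    for region in regions:
        if region.contains(px, py):
            return region.name
    return None


options = [
    Region("Design A", x=0, y=0, width=300, height=200),
    Region("Design B", x=320, y=0, width=300, height=200),
]
print(pick_option(options, 350, 50))
```

A click in the gap between the two regions returns None, which gives the agent a clear signal to ask the user to choose again.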
Getting Started
To begin using Canvas, ensure that your device is properly paired with OpenClaw. The pairing process establishes a secure connection that enables screen sharing and remote control.
- Initiate a Canvas session by calling the canvas tool with the present action.
- Select the screen or application window you want to share.
- Interact with the shared content by referencing elements directly in your messages.
- For dynamic interactions, use A2UI commands to push interactive elements to the canvas.
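The first step above can be pictured as building a small tool-call request. This is a sketch under stated assumptions: the text names only the canvas tool and its present action, so the "tool", "action", and "target" field names and the build_canvas_call helper are hypothetical and may not match OpenClaw's actual call format.

```python
import json

# Only "present" is named in the article; add other actions once confirmed.
KNOWN_ACTIONS = {"present"}


def build_canvas_call(action, target=None):
    """Build a request for the canvas tool (hypothetical field names)."""
    if action not in KNOWN_ACTIONS:
        raise ValueError(f"unknown canvas action: {action!r}")
    call = {"tool": "canvas", "action": action}
    if target is not None:
        # Hypothetical field for the screen or window to share.
        call["target"] = target
    return call


request = build_canvas_call("present", target="Settings window")
print(json.dumps(request))
```

Rejecting unknown actions early keeps a typo from turning into a confusing failure later in the session.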
Remember to maintain security best practices when using Canvas. Only share screens with trusted agents, and be mindful of sensitive information that may be visible in your environment.
Canvas makes graphical interfaces as programmable as text-based systems. By combining shared visual context with dynamic interaction, it opens new possibilities for automation, support, and creative workflows.