When to Use Canvas vs. Browser in OpenClaw

OpenClaw's canvas and browser tools both enable GUI automation, but they serve different purposes and work in different ways. Understanding the differences helps you choose the right tool for the job.

Canvas: Direct Node GUI Control

The canvas tool is designed for direct automation of desktop GUI applications on paired nodes (like your Mac, PC, or Raspberry Pi). It works by sending low-level input events (mouse, keyboard) directly to the operating system's window manager.

Key Characteristics

Target: Native desktop applications (Safari, Finder, WhatsApp Desktop, etc.)
Control Level: OS-level input simulation (mouse moves, clicks, keystrokes)
Requirements:
- Node must be paired and online
- Screen Recording permission enabled on the node
- canvas tool enabled in OpenClaw config
Limitations:
- Cannot interact with UI element internals (no DOM access)
- Actions are positional; screen layout changes can break scripts
- No built-in waiting for the UI to finish loading or updating
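
Because canvas offers no built-in waiting, scripts usually have to poll for readiness themselves. The following is a minimal sketch of a generic polling helper in Python. It is not part of OpenClaw: `wait_until` and its parameters are hypothetical names chosen for illustration, and the condition you pass in (for example, a check against the latest canvas snapshot) is left to you.

```python
import time
from typing import Callable


def wait_until(condition: Callable[[], bool],
               timeout: float = 10.0,
               interval: float = 0.5) -> bool:
    """Poll `condition` until it returns True or `timeout` seconds elapse.

    Hypothetical helper, not an OpenClaw API. Returns True if the condition
    was met in time, False otherwise.
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        # Sleep between checks so the loop does not spin the CPU.
        time.sleep(interval)
```

A script would call `wait_until(...)` between positional actions and stop if it returns `False`, rather than clicking blindly on a screen that has not finished updating.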

Use Cases

- Automating desktop applications that lack APIs (e.g., QuickBooks, Adobe apps)
- Simulating user workflows for testing
- Controlling media playback in desktop apps
- Triggering actions in Electron apps that resist browser automation

Example Workflow

# Show the canvas
openclaw canvas present

# Move mouse and click
openclaw canvas eval 'window.mouseMove(100, 200); window.mouseClick()'

# Type text
openclaw canvas eval 'window.type("Hello World")'

# Take a snapshot
openclaw canvas snapshot

Browser: Full Web Automation

The browser tool provides a complete, isolated Chromium-based browser controlled via the Chrome DevTools Protocol (CDP). It's ideal for web automation with robust selectors and state inspection.

Key Characteristics

Target: Web applications and pages
Control Level: High-level DOM interaction via accessibility roles, names, and element refs taken from snapshots (e.g., e12)
Requirements:
- browser tool enabled in OpenClaw config
- Playwright (for advanced actions like act)
- Browser profile configured (e.g., openclaw)
Advantages:
- Element stability via role/name matching (not pixel position)
- Built-in waiting for navigation, AJAX, and JavaScript conditions
- Full access to network requests, console logs, and storage
- Cross-platform (works on any OS with Chromium)
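
To illustrate why role/name matching is more stable than pixel position, here is a small Python sketch. The `Element` shape is an assumption made for this example and does not reflect the actual output format of `openclaw browser snapshot`; `find_ref` and `find_ref_at` are hypothetical helpers.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Element:
    """Assumed, simplified view of one interactive element in a snapshot."""
    ref: str   # element ref such as "e12"
    role: str  # accessibility role, e.g. "button"
    name: str  # accessible name, e.g. "Sign in"
    x: int     # on-screen position, used only for the positional lookup
    y: int


def find_ref(elements: List[Element], role: str, name: str) -> Optional[str]:
    """Return the ref of the first element matching role and name."""
    for el in elements:
        if el.role == role and el.name == name:
            return el.ref
    return None


def find_ref_at(elements: List[Element], x: int, y: int) -> Optional[str]:
    """Return the ref of the element located exactly at (x, y)."""
    for el in elements:
        if (el.x, el.y) == (x, y):
            return el.ref
    return None


# The same page before and after a banner pushes the button down by 60 px.
before = [Element("e12", "button", "Sign in", 100, 200)]
after = [
    Element("e7", "banner", "Cookie notice", 100, 200),
    Element("e12", "button", "Sign in", 100, 260),
]
```

With this data, `find_ref(after, "button", "Sign in")` still resolves to the button, while `find_ref_at(after, 100, 200)` now lands on the banner. That is the failure mode positional scripts are exposed to.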

Use Cases

- Web scraping and data extraction
- Automated form filling and checkout
- Testing web applications
- Monitoring dashboards
- Interacting with complex SPAs (React, Vue, etc.)

Example Workflow

# Start browser
openclaw browser start

# Open page
openclaw browser open https://example.com

# Get interactive elements
openclaw browser snapshot --interactive

# Click by role ref
openclaw browser act kind=click ref=e12

# Wait for navigation
openclaw browser wait --url "**/dashboard"

# Take screenshot
openclaw browser screenshot

When to Choose Which?

| Scenario | Recommended Tool |
|----------|------------------|
| Automating web pages (sites you would otherwise open in Safari, Chrome, or Firefox) | browser |
| Controlling desktop apps (Slack, WhatsApp Desktop) | canvas |
| Need DOM access or network inspection | browser |
| Automating non-web apps with no API | canvas |
| Position-independent, robust selectors | browser |
| Simulating exact mouse movements | canvas |
| Working with SPAs or dynamic content | browser |
| Need to bypass web anti-bot measures | canvas (sometimes) |
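
The main rows of the table can be condensed into a small decision helper. This is only a sketch of the reasoning in Python: `Task` and `choose_tool` are hypothetical names, and the rule order (DOM or network needs first, exact mouse movement second, target type last) is one reasonable reading of the table, not an OpenClaw feature.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical description of an automation task."""
    targets_web_page: bool                    # web app/page vs. native desktop app
    needs_dom_or_network: bool = False        # DOM access, network or console inspection
    needs_exact_mouse_movement: bool = False  # pixel-precise pointer simulation


def choose_tool(task: Task) -> str:
    """Return "browser" or "canvas" following the table's guidance."""
    if task.needs_dom_or_network:
        return "browser"
    if task.needs_exact_mouse_movement:
        return "canvas"
    # Default: web content goes to browser, native apps go to canvas.
    return "browser" if task.targets_web_page else "canvas"
```

For example, `choose_tool(Task(targets_web_page=False))` returns `"canvas"` for a native app with no API, while a web task that needs network inspection returns `"browser"`.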

Summary

Use browser for reliable, maintainable web automation with high-level controls. Use canvas for direct OS-level input to desktop applications when no better API exists. The browser tool is generally preferred for web tasks due to its stability and diagnostic capabilities, while canvas serves as a powerful fallback for native app automation.