Mastering Web Fetch in OpenClaw: Efficient Web Content Extraction
The web_fetch tool in OpenClaw retrieves and extracts readable content from web pages without the overhead of full browser automation. It is designed for lightweight web research: quickly gathering information from articles, documentation, and other text-heavy pages.
What is web_fetch?
web_fetch is a tool that performs HTTP GET requests and extracts the main readable content from HTML pages, converting it into clean markdown or plain text. Unlike browser automation, web_fetch does not execute JavaScript, making it significantly faster and more resource-efficient for simple content extraction tasks.
The tool uses Readability.js for content extraction by default, which intelligently identifies and extracts the primary content of a web page, stripping away navigation, ads, and other non-essential elements. This makes it ideal for creating summaries, archiving articles, or processing documentation.
Basic Usage
To use web_fetch, you simply provide a URL:
await web_fetch({
  url: "https://docs.openclaw.ai/tools/web.md"
});
This returns the extracted content in markdown format by default. You can also specify plain text extraction:
await web_fetch({
  url: "https://docs.openclaw.ai/tools/web.md",
  extractMode: "text"
});
Configuration Options
web_fetch supports several configuration parameters to customize its behavior:
- url: The HTTP or HTTPS URL to fetch (required)
- extractMode: Either "markdown" or "text" (default: "markdown")
- maxChars: Maximum number of characters to return (useful for truncating lengthy pages)
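To make the parameter list concrete, here is a minimal TypeScript sketch of the three options with the documented default applied. The type names, the withDefaults helper, and the positive-integer check on maxChars are assumptions made for illustration; they are not part of OpenClaw's API.

```typescript
// Hypothetical types mirroring the three documented parameters.
type ExtractMode = "markdown" | "text";

interface WebFetchOptions {
  url: string;               // required
  extractMode?: ExtractMode; // documented default: "markdown"
  maxChars?: number;         // optional character limit
}

interface ResolvedOptions {
  url: string;
  extractMode: ExtractMode;
  maxChars: number | undefined;
}

// Hypothetical helper: fills in the documented default for extractMode.
// The positive-integer check on maxChars is an assumption, not documented behavior.
function withDefaults(opts: WebFetchOptions): ResolvedOptions {
  if (
    opts.maxChars !== undefined &&
    (!Number.isInteger(opts.maxChars) || opts.maxChars <= 0)
  ) {
    throw new RangeError("maxChars must be a positive integer");
  }
  return {
    url: opts.url,
    extractMode: opts.extractMode ?? "markdown",
    maxChars: opts.maxChars,
  };
}
```

Calling withDefaults with only a url yields extractMode "markdown" and leaves maxChars unset, which matches the defaults listed above.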
Advanced Configuration
For more granular control, you can configure web_fetch in your OpenClaw configuration:
{
  tools: {
    web: {
      fetch: {
        enabled: true,
        maxChars: 50000,
        maxCharsCap: 50000,
        maxResponseBytes: 2000000,
        timeoutSeconds: 30,
        cacheTtlMinutes: 15,
        maxRedirects: 3,
        userAgent: "Mozilla/5.0 (Macintosh; Intel Mac OS X 14_7_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/122.0.0.0 Safari/537.36",
        readability: true,
        firecrawl: {
          enabled: true,
          apiKey: "FIRECRAWL_API_KEY_HERE",
          baseUrl: "https://api.firecrawl.dev",
          onlyMainContent: true,
          maxAgeMs: 86400000,
          timeoutSeconds: 60,
        },
      },
    },
  },
}
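The configuration contains both maxChars and maxCharsCap. One plausible reading, and it is only an assumption here, is that maxChars is the default limit when a call does not pass one and maxCharsCap is a hard ceiling on any per-call value. Under that assumption, the effective limit could be resolved as in the sketch below; FetchLimits and resolveMaxChars are hypothetical names.

```typescript
// Hypothetical subset of the tools.web.fetch settings shown above.
interface FetchLimits {
  maxChars: number;    // assumed: default used when a call omits maxChars
  maxCharsCap: number; // assumed: upper bound for any per-call maxChars
}

// Resolve the limit for one call: fall back to the configured default,
// then clamp the result to the range [1, maxCharsCap].
function resolveMaxChars(
  requested: number | undefined,
  limits: FetchLimits
): number {
  const wanted = requested ?? limits.maxChars;
  return Math.max(1, Math.min(Math.floor(wanted), limits.maxCharsCap));
}
```

With the values from the configuration above, a call that asks for 200000 characters would be clamped to 50000, while a call that asks for 1000 would keep its own limit.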
Firecrawl Integration
When configured, web_fetch can use Firecrawl as a fallback for pages where Readability fails. Firecrawl provides advanced scraping capabilities, including JavaScript execution and bot circumvention, making it suitable for more complex sites.
To enable Firecrawl, set the API key in your configuration. Firecrawl requests are cached by default, reducing repeated API calls for the same content.
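To illustrate what time-based caching of fetch results means in general, here is a small TypeScript sketch of a cache whose entries expire after a fixed time-to-live. It is a generic illustration, not OpenClaw's or Firecrawl's actual cache; the TtlCache class and its injectable clock are assumptions introduced for the example.

```typescript
// Generic TTL cache sketch; not the cache OpenClaw or Firecrawl actually uses.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  // `now` is injectable so that expiry can be tested without waiting.
  constructor(ttlMs: number, now: () => number = Date.now) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  // Return the cached value, or undefined if it is missing or expired.
  get(key: string): V | undefined {
    const hit = this.entries.get(key);
    if (hit === undefined) return undefined;
    if (this.now() >= hit.expiresAt) {
      this.entries.delete(key); // drop stale entries lazily on read
      return undefined;
    }
    return hit.value;
  }

  // Store a value that stays valid for ttlMs from the current time.
  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}
```

A cache built as new TtlCache<string>(15 * 60 * 1000) would mirror the cacheTtlMinutes: 15 setting, and 86400000 ms corresponds to the 24-hour maxAgeMs in the Firecrawl block.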
When to Use web_fetch vs Browser Automation
Choose web_fetch when:
- You need to extract content from static HTML pages
- Performance and speed are critical
- You're processing multiple articles or documentation pages
- JavaScript execution is not required to access the content
Use browser automation when:
- The content is loaded dynamically via JavaScript
- You need to interact with page elements (click, type, etc.)
- Authentication or complex navigation is required
- You're dealing with single-page applications
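The two checklists above amount to a simple decision rule, sketched below in TypeScript. The PageNeeds shape and the chooseTool function are hypothetical names used only for illustration; in practice you may not know these properties of a page in advance.

```typescript
// Hypothetical description of what a page requires, based on the lists above.
interface PageNeeds {
  requiresJavaScript: boolean;     // content is loaded dynamically
  requiresInteraction: boolean;    // clicks, typing, and similar actions
  requiresAuthentication: boolean; // login or complex navigation
  isSinglePageApp: boolean;
}

type Tool = "web_fetch" | "browser";

// Any browser-only requirement rules out web_fetch; otherwise prefer
// the lighter tool.
function chooseTool(needs: PageNeeds): Tool {
  const needsBrowser =
    needs.requiresJavaScript ||
    needs.requiresInteraction ||
    needs.requiresAuthentication ||
    needs.isSinglePageApp;
  return needsBrowser ? "browser" : "web_fetch";
}
```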
Best Practices
- Check cache first: Results are cached for 15 minutes by default, reducing unnecessary network requests
- Set character limits: Use maxChars to prevent memory issues when processing exceptionally long pages
- Handle failures gracefully: Some sites may not be parseable; have fallback strategies
- Respect robots.txt: Always consider the website's terms of service and crawling policies
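One way to handle failures gracefully is to try several retrieval strategies in order and stop at the first that returns non-empty content. The TypeScript sketch below assumes each strategy is wrapped in a function that takes a URL and resolves to a string; the Strategy type and fetchWithFallback are hypothetical names, and how you wrap web_fetch or a browser tool in such a function depends on your setup.

```typescript
// Hypothetical wrapper shape: each strategy resolves to extracted content.
interface Strategy {
  name: string;
  fetch: (url: string) => Promise<string>;
}

interface FallbackResult {
  content: string;
  usedStrategy: string;
}

// Try strategies in order; treat thrown errors and empty content as failures.
async function fetchWithFallback(
  url: string,
  strategies: Strategy[]
): Promise<FallbackResult> {
  const errors: string[] = [];
  for (const strategy of strategies) {
    try {
      const content = await strategy.fetch(url);
      if (content.trim().length > 0) {
        return { content, usedStrategy: strategy.name };
      }
      errors.push(`${strategy.name}: empty content`);
    } catch (err) {
      const message = err instanceof Error ? err.message : String(err);
      errors.push(`${strategy.name}: ${message}`);
    }
  }
  // Surface every failure so the caller can see why nothing worked.
  throw new Error(`All strategies failed for ${url}: ${errors.join("; ")}`);
}
```

Listing the lightweight strategy first keeps the common case fast and reserves the heavier fallback for pages that need it.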
Troubleshooting
If web_fetch returns an error:
- Verify the URL is accessible and uses HTTP/HTTPS
- Check if the page requires JavaScript to display content
- Test with browser automation as an alternative
- Ensure Firecrawl is configured if needed for complex sites
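The first check in the list can be automated before a request is made. The sketch below uses the standard URL class, available in Node.js and in browsers, to confirm that a string parses as a URL and uses HTTP or HTTPS. The isFetchableUrl name is hypothetical, and the check says nothing about whether the host is actually reachable.

```typescript
// Returns true only for strings that parse as URLs with an http: or https: scheme.
function isFetchableUrl(raw: string): boolean {
  try {
    const parsed = new URL(raw); // throws a TypeError on unparseable input
    return parsed.protocol === "http:" || parsed.protocol === "https:";
  } catch {
    return false;
  }
}
```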
The web_fetch tool strikes an excellent balance between simplicity and functionality, making it an essential component of any OpenClaw workflow that involves web content extraction.