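A Detailed Look at Browser Use, an AI-Driven Browser Tool

Preface

In earlier articles on AI testing, we covered several AI browser tools, including playwright-mcp, midscene.js, and Magentic UI, and showed how to use them for AI-driven automated testing. Before these newer tools appeared, there was an earlier one, released in November 2024 and still under active development: Browser Use. Its design informed many of the tools that followed. After more than half a year and over 40 releases, the latest version at the time of writing is 0.2.5, and its feature set has matured considerably.

In this article, we take a systematic look at this AI browser tool.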

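Introduction to browser-use

browser-use aims to provide a simple yet powerful way to connect AI agents to the browser, enabling AI-based intelligent browser automation.

It is an open-source Python library with more than 61K stars on GitHub, which shows how much attention it has attracted.

It relies mainly on the browser control capabilities of Playwright/Puppeteer and the reasoning capabilities of large language models to move AI from an information assistant toward an assistant that carries out actions. MCP became popular after it, and the browser-use approach can be seen in many of the AI browser drivers that followed.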

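How It Works

browser-use is implemented as an AI agent built on LangChain and an LLM, which interprets page content and generates operation instructions. Underneath, the project relies on the Playwright framework for browser automation, supports multiple browsers (such as Chromium and Firefox), and can simulate real user actions such as clicking, typing, and navigating.

Using the model, the system automatically identifies interactive elements on the page (buttons, input fields, and so on) and combines them with its understanding of the context to generate the corresponding interaction logic, which improves automation efficiency and makes the browser AI-driven.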

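Core Architecture

browser-use integrates the AI agent with the browser through a layered architecture. The main role of each layer is as follows:

Agent layer (decision center)

This layer handles task orchestration and decision making. It manages the task flow with a small state machine and interacts with the LLM (for example OpenAI) to obtain decision instructions.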
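
The state machine idea can be illustrated with a minimal sketch. It is not browser-use's actual implementation: the `State` enum, `run_agent`, and the `make_flaky_decide` stand-in for the LLM call are all hypothetical names chosen for illustration. The sketch alternates between deciding and acting, and gives up after a fixed number of consecutive decision failures, similar in spirit to the "Result failed 1/3 times" retry visible in the run log later in this article.

```python
from enum import Enum, auto
from typing import Callable


class State(Enum):
    DECIDE = auto()  # ask the model for the next action
    ACT = auto()     # execute the chosen action
    DONE = auto()    # task finished
    FAILED = auto()  # too many consecutive failures


def run_agent(decide: Callable[[int], str],
              max_failures: int = 3,
              max_steps: int = 10) -> tuple[State, list[str]]:
    """Drive a DECIDE -> ACT loop; `decide` stands in for the LLM call."""
    state, step, failures = State.DECIDE, 1, 0
    executed: list[str] = []
    action = ""
    while state not in (State.DONE, State.FAILED) and step <= max_steps:
        if state is State.DECIDE:
            try:
                action = decide(step)
                failures = 0
                state = State.ACT
            except ValueError:
                # The same step is retried until the failure budget runs out.
                failures += 1
                if failures >= max_failures:
                    state = State.FAILED
        else:  # State.ACT
            executed.append(action)
            if action == "done":
                state = State.DONE
            else:
                step += 1
                state = State.DECIDE
    return state, executed


def make_flaky_decide() -> Callable[[int], str]:
    """Hypothetical stand-in for the LLM: fails once on step 2, then succeeds."""
    plan = {1: "go_to_url", 2: "input_and_click", 3: "done"}
    seen_failure = False

    def decide(step: int) -> str:
        nonlocal seen_failure
        if step == 2 and not seen_failure:
            seen_failure = True
            raise ValueError("Could not parse response")
        return plan[step]

    return decide


final_state, executed = run_agent(make_flaky_decide())
print(final_state.name, executed)
```

The failed decision on step 2 does not advance the step counter, so the retry repeats the same step with a fresh model response.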

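Controller layer (instruction translator)

This layer turns high-level decisions into concrete browser operations. It supports basic actions such as DOM operations and page navigation, and also manages the interaction logic across multiple tabs.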
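
As a rough illustration of this translation step, the sketch below dispatches action objects of the shape that appears in the run log later in this article (for example `{"go_to_url": {"url": ...}}`) to registered handler functions. The `ACTIONS` registry, the `action` decorator, and `dispatch` are hypothetical names, and the handlers only return a description string instead of driving a real browser.

```python
import json
from typing import Callable

# Hypothetical registry: action name -> handler function.
ACTIONS: dict[str, Callable[..., str]] = {}


def action(name: str):
    """Register a handler under the given action name."""
    def register(fn: Callable[..., str]) -> Callable[..., str]:
        ACTIONS[name] = fn
        return fn
    return register


@action("go_to_url")
def go_to_url(url: str) -> str:
    return f"Navigated to {url}"


@action("input_text")
def input_text(index: int, text: str) -> str:
    return f"Input {text} into index {index}"


@action("click_element_by_index")
def click_element_by_index(index: int) -> str:
    return f"Clicked element with index {index}"


def dispatch(raw: str) -> str:
    """Each action is a JSON object with exactly one key: the action name."""
    payload = json.loads(raw)
    if len(payload) != 1:
        raise ValueError("expected exactly one action per object")
    (name, params), = payload.items()
    if name not in ACTIONS:
        raise KeyError(f"unknown action: {name}")
    return ACTIONS[name](**params)


print(dispatch('{"go_to_url":{"url":"https://www.saucedemo.com"}}'))
print(dispatch('{"input_text":{"index":0,"text":"standard_user"}}'))
print(dispatch('{"click_element_by_index":{"index":2}}'))
```

Keeping the action name as the single key of each object makes it easy to validate one action at a time and to reject anything the registry does not know about.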

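DOM parsing engine

This layer parses the page structure and content in real time, provides visual recognition (OCR support), and builds the mapping of actionable page elements.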
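
The element mapping can be pictured as assigning a numeric index to each interactive element so that the model can refer to it, as in the `"index": 0` parameters seen in the run log later in this article. The following is a simplified sketch using only the standard library, not the library's real DOM engine; the tag list, the `InteractiveIndexer` class, the rendering format, and the sample login form are assumptions made for illustration.

```python
from html.parser import HTMLParser

# Assumption: a small set of tags treated as interactive.
INTERACTIVE_TAGS = {"a", "button", "input", "select", "textarea"}


class InteractiveIndexer(HTMLParser):
    """Collect interactive elements and number them in document order."""

    def __init__(self) -> None:
        super().__init__()
        self.elements: list[dict] = []

    def handle_starttag(self, tag, attrs):
        if tag not in INTERACTIVE_TAGS:
            return
        attr_map = dict(attrs)
        if attr_map.get("type") == "hidden":
            return  # hidden inputs cannot be clicked or typed into
        self.elements.append(
            {"index": len(self.elements), "tag": tag, "attrs": attr_map}
        )


def index_interactive(html: str) -> list[dict]:
    parser = InteractiveIndexer()
    parser.feed(html)
    return parser.elements


def render(elements: list[dict]) -> str:
    """Render the index map as compact text that could be shown to a model."""
    lines = []
    for el in elements:
        shown = " ".join(
            f'{k}="{v}"' for k, v in el["attrs"].items()
            if k in ("id", "placeholder", "type")
        )
        lines.append(f'[{el["index"]}]<{el["tag"]} {shown}>')
    return "\n".join(lines)


# Hypothetical login form used only for this example.
SAMPLE_HTML = """
<form>
  <input type="hidden" name="csrf" value="x">
  <input type="text" id="user-name" placeholder="Username">
  <input type="password" id="password" placeholder="Password">
  <input type="submit" id="login-button" value="Login">
</form>
"""

elements = index_interactive(SAMPLE_HTML)
print(render(elements))
```

With a map like this, an action such as "input text into index 0" can be resolved back to a concrete element before it is sent to the browser.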

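Browser interface layer

This layer drives the actual browser through the Playwright framework. Besides the built-in headless mode, it can also work with the user's own browser when given the browser path.

Architecture diagram

Installation

Installing with uv, the Python package manager, is recommended: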

uv venv --python 3.11
.venv\Scripts\activate
uv pip install browser-use
playwright install --with-deps

The tool needs the API key of the model you use, defined in the .env configuration file. The major LLM providers are supported:

OPENAI_API_KEY=
ANTHROPIC_API_KEY=
AZURE_OPENAI_ENDPOINT=
AZURE_OPENAI_KEY=
GOOGLE_API_KEY=
DEEPSEEK_API_KEY=
GROK_API_KEY=
NOVITA_API_KEY=
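
Before starting a run, it can help to check which of these keys are actually filled in. The helper below is a small sketch using only the standard library; the `PROVIDER_KEYS` grouping and the `configured_providers` function are hypothetical and simply mirror the variable names listed above. It assumes the .env file has already been loaded into the process environment (for example with python-dotenv's `load_dotenv()`).

```python
import os
from typing import Mapping, Optional

# Variable names taken from the .env listing; the grouping is an assumption.
PROVIDER_KEYS = {
    "OpenAI": ["OPENAI_API_KEY"],
    "Anthropic": ["ANTHROPIC_API_KEY"],
    "Azure OpenAI": ["AZURE_OPENAI_ENDPOINT", "AZURE_OPENAI_KEY"],
    "Google": ["GOOGLE_API_KEY"],
    "DeepSeek": ["DEEPSEEK_API_KEY"],
    "Grok": ["GROK_API_KEY"],
    "Novita": ["NOVITA_API_KEY"],
}


def configured_providers(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return providers whose required variables are all set and non-empty."""
    env = os.environ if env is None else env
    return [
        name
        for name, keys in PROVIDER_KEYS.items()
        if all(env.get(key, "").strip() for key in keys)
    ]


# Example with an explicit mapping instead of the real environment.
print(configured_providers({"DEEPSEEK_API_KEY": "sk-example", "AZURE_OPENAI_KEY": "x"}))
```

Azure OpenAI is reported only when both its endpoint and key are present, since either one alone is not enough to make a call.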

Using the Web UI

browser-use also provides a Web UI that can be used directly. It has to be installed separately; continue in the same venv:

git clone https://github.com/browser-use/web-ui.git
cd web-ui
uv pip install -r requirements.txt
python webui.py --ip 127.0.0.1 --port 7788

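Startup screen

Once it is running, you will see the web-ui interface. Agent Settings configures the LLM to use, Browser Settings configures how the browser is driven, and Run Agent is where the model and browser are actually invoked to carry out the automation task and where the results are recorded.

LLM configuration

This example uses a locally deployed Ollama with deepseek-r1:14b.

Run process

The browser that is launched and the run log:

The whole run also produces a GIF:

Calling from a Python script

Besides the Web UI, in most cases you will call browser-use from code. The following code calls the official DeepSeek API and includes a definition of the output format: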

import asyncio
import json
import os

from browser_use import Agent, Controller
from dotenv import load_dotenv
from langchain_deepseek import ChatDeepSeek
from pydantic import BaseModel, ConfigDict, SecretStr, ValidationError, field_validator


# Shared definition of the structured model output
class PostItem(BaseModel):
    model_config = ConfigDict(extra="ignore")

    post_title: str
    post_url: str
    num_comments: int
    hours_since_post: int

    @field_validator('num_comments', 'hours_since_post', mode='before')
    @classmethod
    def convert_numbers(cls, value):
        """Convert numeric strings to int; treat empty values as 0."""
        if isinstance(value, str) and value.isdigit():
            return int(value)
        return value or 0


class Posts(BaseModel):
    posts: list[PostItem]
    total: int


# Result parsing helper
def parse_result(result: str | dict) -> Posts | str | dict:
    """Parse the result into Posts; fall back to the raw result if it does not fit."""
    try:
        data = json.loads(result) if isinstance(result, str) else result
        print("Parsed data:", data)
        return Posts.model_validate(data)
    except (json.JSONDecodeError, ValidationError) as e:
        print(f"Parse error: {type(e).__name__}: {e}")
        # Not JSON in the expected shape: return the raw result unchanged
        return result


# Defined here but not passed to the Agent below, so the agent returns free text.
# Pass controller=controller to Agent(...) to request output in the Posts schema.
controller = Controller(output_model=Posts)

load_dotenv()
api_key = os.getenv("DEEPSEEK_API_KEY")

# Initialize the model
llm = ChatDeepSeek(base_url='https://api.deepseek.com/v1', model='deepseek-reasoner', api_key=SecretStr(api_key))


async def main():
    # Create the agent with the model.
    # The task prompt is in Chinese: test the login of standard_user on saucedemo.com
    # with two passwords (secret_sauce and an empty password), and report whether each
    # login succeeded and, if not, the error message.
    agent = Agent(
        task="测试saucedemo.com网站standard_user的登录功能, 使用不同密码(secret_sauce,空密码),登录成功则验证完成,否则需要输出错误信息。测试结果需要包含以下信息:\n1. 登录是否成功\n2. 如果登录失败,错误信息是什么\n",
        llm=llm,
        use_vision=False
    )
    history = await agent.run()
    result = history.final_result()
    parsed = parse_result(result)

    # parse_result may return the raw result, so check the type before using .posts
    if isinstance(parsed, Posts) and parsed.posts:
        for post in parsed.posts:
            print('\n--------------------------------')
            print(f'Title:            {post.post_title}')
            print(f'URL:              {post.post_url}')
            print(f'Comments:         {post.num_comments}')
            print(f'Hours since post: {post.hours_since_post}')
    else:
        print(result)


# Entry point: run the async main function
if __name__ == "__main__":
    try:
        asyncio.run(main())
    except Exception as e:
        print(f"Error while running the program: {e}")

The script also opens the browser through Playwright, and the result is similar to the web-ui run.

The program output is as follows:

C:\qiucao\AI\browser_use>python app.py
WARNING  [agent] ⚠️ DeepSeek models do not support use_vision=True yet. Setting use_vision=False for now...
INFO     [agent] 🧠 Starting an agent with main_model=deepseek-reasoner +rawtools +memory, planner_model=None, extraction_model=deepseek-reasoner
INFO     [mem0.vector_stores.faiss] Loaded FAISS index from /tmp/mem0_384_faiss/mem0.faiss with 0 vectors
INFO     [mem0.vector_stores.faiss] Loaded FAISS index from C:\Users\weiwe\.mem0\migrations_faiss/mem0_migrations.faiss with 1 vectors
INFO     [agent] 🚀 Starting task: 测试saucedemo.com网站standard_user的登录功能, 使用不同密码(secret_sauce,空密码),登录成功则验证完成,否则需要输出错误信息。测试结果需要包含以下信息:
1. 登录是否成功
2. 如果登录失败,错误信息是什么

INFO     [agent] 📍 Step 1
INFO     [agent] 🤷 Eval: Unknown - Just started the task. No previous actions taken.
INFO     [agent] 🧠 Memory: Starting task: Test login functionality on saucedemo.com for user 'standard_user' with two password cases. 0 out of 2 password tests completed. First need to navigate to login page.
INFO     [agent] 🎯 Next goal: Navigate to saucedemo.com login page to begin testing
INFO     [agent] 🛠️  Action 1/1: {"go_to_url":{"url":"https://www.saucedemo.com"}}
INFO     [controller] 🔗  Navigated to https://www.saucedemo.com
INFO     [agent] 📍 Step 2
INFO     [agent] 👍 Eval: Success - Navigation to saucedemo.com completed successfully. Login page is loaded with required input fields visible.
INFO     [agent] 🧠 Memory: 0 out of 2 password tests completed. Starting first test case: username 'standard_user' with valid password 'secret_sauce'. Will verify login success by checking URL change to /inventory.html after submission.
INFO     [agent] 🎯 Next goal: Execute first login test with valid credentials and verify success
INFO     [agent] 🛠️  Action 1/3: {"input_text":{"index":0,"text":"standard_user"}}
INFO     [agent] 🛠️  Action 2/3: {"input_text":{"index":1,"text":"secret_sauce"}}
INFO     [agent] 🛠️  Action 3/3: {"click_element_by_index":{"index":2}}
INFO     [controller] ⌨️  Input standard_user into index 0
INFO     [controller] ⌨️  Input secret_sauce into index 1
INFO     [controller] 🖱️  Clicked button with index 2:
INFO     [agent] 📍 Step 3
WARNING  [message_manager] Failed to parse model output: {
  "current_state": {
    "evaluation_previous_goal": "Success - First test case completed: Login with username 'standard_user' and password 'secret_sauce' was successful. Verified by URL change to /inventory.html and presence of product listings.",
    "memory": "1 out of 2 password tests completed. First test successful. Next: Test with empty password. Need to logout before second test.",
    "next_goal": "Logout to return to login page for second test case"
  },
  "action": [
    {"click_element_by_index": {"index": 0}},  // Open menu
    {"wait": {"seconds": 2}},  // Wait for menu animation
    {"click_element_by_index": {"index": 34}}  // Logout (assuming logout appears at index 34 after menu opens)
  ]
} Expecting value: line 8 column 48 (char 532)
WARNING  [agent] Failed to parse model output: content='{\n  "current_state": {\n    "evaluation_previous_goal": "Success - First test case completed: Login with username \'standard_user\' and password \'secret_sauce\' was successful. Verified by URL change to /inventory.html and presence of product listings.",\n    "memory": "1 out of 2 password tests completed. First test successful. Next: Test with empty password. Need to logout before second test.",\n    "next_goal": "Logout to return to login page for second test case"\n  },\n  "action": [\n    {"click_element_by_index": {"index": 0}},  // Open menu\n    {"wait": {"seconds": 2}},  // Wait for menu animation\n    {"click_element_by_index": {"index": 34}}  // Logout (assuming logout appears at index 34 after menu opens)\n  ]\n}' additional_kwargs={'refusal': None, 'reasoning_content': 'We are on the inventory page after successful login with the first password (secret_sauce). This indicates that the first test case (valid password) was successful.\n Now, we need to test the second case: empty password. \n However, we are currently logged in. We must first log out to return to the login page.\n\n Steps:\n 1. Open the menu (index 0: button with text "Open Menu")\n 2. In the menu, there should be a logout option. We need to find the logout link.\n\n But note: the interactive elements provided are only for the current viewport. The menu is not open, so the logout link is not visible yet.\n\n After clicking the menu button, the menu will open and we can see the logout link. Then we can click it.\n\n However, the current action sequence can only include actions that are based on the current state (before opening the menu). So we can only plan to click the menu button now, and then in the next step we will see the updated elements including the logout link.\n\n Alternatively, we could use a longer sequence? 
But the problem is that after clicking the menu, the page changes (menu opens) so the state updates and we get new interactive elements. Therefore, we should only click the menu now and then in the next step we will see the logout link.\n\n But note: the task requires testing two cases. We have done one (with valid password) and now we need to do the second (empty password). We must log out first.\n\n Plan:\n Step 1: Click the menu button (index 0)\n Step 2: After the menu opens, we expect to see a logout link. Then click that logout link to return to the login page.\n\n However, the current response can only include one step because the state will change after clicking the menu. So we do:\n\n Action: [{"click_element_by_index": {"index": 0}}]\n\n Then in the next step, we will see the menu and the logout link, and then we can click logout.\n\n But note: the task history memory is now at step 3/100. We have to remember that we are in the middle of the second test case.\n\n We have to update the memory: \n   First test case: valid password -> successful login (we are on inventory page)\n   Now we are going to log out to test the second case.\n\n Let\'s write the state accordingly.'} response_metadata={'token_usage': {'completion_tokens': 678, 'prompt_tokens': 4236, 'total_tokens': 4914, 'completion_tokens_details': {'accepted_prediction_tokens': None, 'audio_tokens': None, 'reasoning_tokens': 495, 'rejected_prediction_tokens': None}, 'prompt_tokens_details': {'audio_tokens': None, 'cached_tokens': 3072}, 'prompt_cache_hit_tokens': 3072, 'prompt_cache_miss_tokens': 1164}, 'model_name': 'deepseek-reasoner', 'system_fingerprint': 'fp_393bca965e_prod0425fp8', 'id': '68127967-53c3-4329-a9fe-01feb6090fc4', 'finish_reason': 'stop', 'logprobs': None} id='run-c6f0d672-4dd5-4ea7-b269-9c908f5aef4e-0' usage_metadata={'input_tokens': 4236, 'output_tokens': 678, 'total_tokens': 4914, 'input_token_details': {'cache_read': 3072}, 'output_token_details': {'reasoning': 495}} Could 
not parse response.
ERROR    [agent] ❌ Result failed 1/3 times:
 Could not parse response.
INFO     [agent] 📍 Step 3
INFO     [agent] 👍 Eval: Success - First test case completed successfully. Login with valid password 'secret_sauce' succeeded as confirmed by URL change to /inventory.html and visible product listings.
INFO     [agent] 🧠 Memory: 1 out of 2 password tests completed (valid password successful). Remaining: test with empty password. Need to logout before next test.
INFO     [agent] 🎯 Next goal: Logout to return to login page for second test case
INFO     [agent] 🛠️  Action 1/2: {"click_element_by_index":{"index":0}}
INFO     [agent] 🛠️  Action 2/2: {"wait":{"seconds":2}}
INFO     [controller] 🖱️  Clicked button with index 0: Open Menu
INFO     [controller] 🕒  Waiting for 2 seconds
INFO     [agent] 📍 Step 4
INFO     [agent] 👍 Eval: Success - Menu opened successfully after login. Logout option is visible at index 2.
INFO     [agent] 🧠 Memory: 1 out of 2 password tests completed (valid password successful). Need to logout for second test with empty password. Next: click logout button.
INFO     [agent] 🎯 Next goal: Logout to return to login page for second password test case
INFO     [agent] 🛠️  Action 1/1: {"click_element_by_index":{"index":2}}
INFO     [controller] 🖱️  Clicked button with index 2: Logout
INFO     [agent] 📍 Step 5
INFO     [agent] 👍 Eval: Success - Successfully logged out and returned to login page. All interactive elements for testing are available.
INFO     [agent] 🧠 Memory: 1 out of 2 password tests completed (valid password successful). Remaining test: username 'standard_user' with empty password. After submission, need to capture error message if login fails.
INFO     [agent] 🎯 Next goal: Execute second test case with empty password and capture results
INFO     [agent] 🛠️  Action 1/3: {"input_text":{"index":0,"text":"standard_user"}}
INFO     [agent] 🛠️  Action 2/3: {"input_text":{"index":1,"text":""}}
INFO     [agent] 🛠️  Action 3/3: {"click_element_by_index":{"index":2}}
INFO     [controller] ⌨️  Input standard_user into index 0
INFO     [controller] ⌨️  Input  into index 1
INFO     [controller] 🖱️  Clicked button with index 2:
INFO     [agent] 📍 Step 6
INFO     [agent] 👍 Eval: Success - Second test case completed: Login failed as expected with empty password. Error message 'Epic sadface: Password is required' is visible on page.
INFO     [agent] 🧠 Memory: 2 out of 2 password tests completed: 1) Valid password 'secret_sauce' - login succeeded 2) Empty password - login failed with error. No remaining tests.
INFO     [agent] 🎯 Next goal: Compile final test results and complete task
INFO     [agent] 🛠️  Action 1/1: {"done":{"text":"测试结果:\n1. 测试用例1 (密码: secret_sauce):\n   - 登录成功: 是\n   - 错误信息: 无\n\n2. 测试用例2 (密码: 空):\n   - 登录成功: 否\n   - 错误信息: \"Epic sadface: Password is required\"","success":true}}
INFO     [agent] 📄 Result: 测试结果:
1. 测试用例1 (密码: secret_sauce):
   - 登录成功: 是
   - 错误信息: 无

2. 测试用例2 (密码: 空):
   - 登录成功: 否
   - 错误信息: "Epic sadface: Password is required"
INFO     [agent] ✅ Task completed
INFO     [agent] ✅ Successfully
INFO     [agent] 📝 Total input tokens used (approximate): 33168
INFO     [agent] Agent run telemetry logged.

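From the output above, we can see that Browser-Use broke the task into 6 steps:

  1. Start the task and open https://www.saucedemo.com
  2. Enter the valid credentials standard_user/secret_sauce and click the login button
  3. Open the menu in order to log out (the first model response for this step could not be parsed as JSON because it contained // comments, so the step was retried)
  4. Click Logout in the opened menu
  5. Log in again with an empty password
  6. Finish the test and return the results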
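
The parse failure in step 3 is worth a closer look: the error "Expecting value: line 8 column 48" points exactly at the first `//` comment in the model's reply, which standard JSON does not allow. In the run above the agent simply retried the step. As an illustration only, and not something browser-use is known to do, the sketch below strips `//` comments outside string literals before parsing; `strip_line_comments` is a hypothetical helper.

```python
import json


def strip_line_comments(text: str) -> str:
    """Remove // comments that sit outside JSON string literals."""
    out: list[str] = []
    in_string = False
    escaped = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Skip to the end of the line; the newline itself is kept.
            while i < len(text) and text[i] != "\n":
                i += 1
        else:
            out.append(ch)
            i += 1
    return "".join(out)


# Shaped like the reply in the log, plus a URL to show that // inside strings survives.
raw = """{
  "action": [
    {"click_element_by_index": {"index": 0}},  // Open menu
    {"wait": {"seconds": 2}},  // Wait for menu animation
    {"go_to_url": {"url": "https://www.saucedemo.com"}}
  ]
}"""

cleaned = json.loads(strip_line_comments(raw))
print(cleaned["action"])
```

Tracking whether the scanner is inside a string matters here, because a naive search for `//` would also cut every URL in the reply.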

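Summary

The example above shows that Browser-Use can use an LLM to break a task down from the prompt we provide, parse the page effectively, carry out the steps, and return the results.

In terms of efficiency, API calls are still relatively slow, so complex tasks can take a long time. Once a task has started, we also cannot intervene in the execution path (short of forcibly interrupting it), which is a main area that later tools such as Magentic UI focus on improving.

The strength of Browser-Use is its maturity: it works well with both local models and a range of hosted models, and there is a good deal of community practice around it, including its use in automated testing and the tuning of system prompts. It is a promising direction for AI-assisted automated testing.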


Mermaid architecture diagram

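graph TD
    A[AI Agent] -->|Decision instructions| B(Controller)
    B -->|DOM operation instructions| C[DOM parsing engine]
    C -->|Page content| D[Browser interface]
    D -->|Browser interaction| E[Chrome/Edge]

    subgraph modules [Functional modules]
        B --> F[Multi-tab management]
        C --> G["Visual recognition (OCR)"]
        D --> H[Real user behavior simulation]
    end

    style A fill:#FFE4B5,stroke:#333
    style E fill:#98FB98,stroke:#333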