VisualTreeSearch

Understanding Web Agent Test-time Scaling

Authors: Danqing Zhang, Yuanli Wang, Shiying He, Yaoyao Qian, Jingyi Ni, Junyu Cao

PathOnAI.org, Northeastern University, Boston University, The University of Texas at Austin

Abstract

We present VisualTreeSearch, a fully deployed system for visualizing and understanding web agent test-time scaling. Test-time search algorithms substantially improve web agent success rates, but they remain largely confined to research settings and are rarely deployed in practice. Our system bridges this gap with three key contributions:

  1. Production-ready Solution: A fully deployed web agent tree search system with cloud-based architecture.
  2. Fast State Reset Mechanism: An API-based state reset solution that reduces reset time from 50 seconds to 2 seconds.
  3. Interactive Visualization Interface: A web UI that transparently demonstrates the agent's decision-making process.

VisualTreeSearch provides an intuitive framework for both researchers and users to understand tree search execution in web agents.

Video Demonstration

Watch the demonstration of VisualTreeSearch in action, showing real-time tree search visualization and web agent interactions.

System Architecture

VisualTreeSearch System Architecture

The VisualTreeSearch system consists of four main components:

State Reset API

A specialized service that provides an efficient state reset mechanism, enabling web agents to restore a clean initial state before starting each new trajectory. It reduces reset time from 50 seconds to 2 seconds.

Backend

Implements the tree search algorithms (BFS, DFS, MCTS, and LATS) and manages real-time WebSocket communication with the frontend to transmit agent execution information.

Browser Service

Provides isolated browser sessions where web agents can execute actions, while also managing automatic authentication using Playwright.

Frontend

Provides the user interface for configuring search tasks, visualizing tree search trajectories, and observing agent behavior through embedded browser views and execution logs.

Key Technical Innovations

API-based State Reset

When web agents interact with UIs, they modify state that persists in the website's database, which causes evaluation inconsistencies across trajectories. Our solution is an API-based state reset mechanism: a FastAPI server hosted on AWS EC2 manages the website database directly to control website state. Compared with the previous approach of restarting Docker containers, this reduces reset time from 50 seconds to 2 seconds.
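The idea of resetting state at the database level, rather than restarting a container, can be illustrated with a small sketch. This is not the project's implementation: it uses an in-memory SQLite database from the Python standard library in place of the real website database, and the `snapshot` and `reset` helpers and the `cart` table are hypothetical names chosen for illustration.

```python
import sqlite3

def snapshot(conn: sqlite3.Connection) -> str:
    """Capture schema and data as a SQL script (the clean baseline)."""
    return "\n".join(conn.iterdump())

def reset(conn: sqlite3.Connection, baseline: str) -> None:
    """Drop every user table, then replay the baseline script."""
    tables = [row[0] for row in conn.execute(
        "SELECT name FROM sqlite_master "
        "WHERE type='table' AND name NOT LIKE 'sqlite_%'")]
    for name in tables:
        conn.execute(f'DROP TABLE "{name}"')
    conn.executescript(baseline)

# Stand-in for the website database (assumption: the real system
# talks to the site's own database behind a FastAPI endpoint).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE cart (item TEXT, qty INTEGER)")
conn.execute("INSERT INTO cart VALUES ('book', 1)")
conn.commit()
baseline = snapshot(conn)

# An agent trajectory mutates persistent state...
conn.execute("INSERT INTO cart VALUES ('pen', 3)")

# ...and the reset restores the clean initial state.
reset(conn, baseline)
rows = conn.execute("SELECT item, qty FROM cart").fetchall()
print(rows)
```

In a deployed setting, a function like `reset` would sit behind an HTTP endpoint so that the search backend can call it before each new trajectory.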

Cloud-based Architecture

Unlike previous Vercel-based web agent demos, our system implements AWS ECS container-based services to overcome serverless execution limitations. This architecture supports persistent WebSocket communication and accommodates extended processing times, both of which are essential for comprehensive tree search operations.

Interactive Visualization System

The VisualTreeSearch frontend gives researchers an interpretable environment for monitoring web agents. It has three main components: a browser interface that shows the web environment in real time, a D3.js tree visualization that highlights the active trajectory, and an execution log that records the agent's sequence of operations.
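As a rough sketch of how a backend might hand a search tree to a D3.js view, the snippet below serializes a tree into nested JSON with a `children` key, which is the property `d3.hierarchy` reads by default. The `Node` class, the `to_d3` helper, and the `active` flag are hypothetical and are not taken from the VisualTreeSearch codebase.

```python
import json
from dataclasses import dataclass, field

@dataclass
class Node:
    action: str                      # action that led to this state
    children: list = field(default_factory=list)

def to_d3(node: Node, active_ids: set) -> dict:
    """Convert a search tree into nested dicts; flag the active trajectory."""
    return {
        "name": node.action,
        "active": id(node) in active_ids,
        "children": [to_d3(child, active_ids) for child in node.children],
    }

# Toy tree: two candidate actions at the root, one of them expanded further.
root = Node("start")
search = Node("click 'Search'")
type_query = Node("type 'laptop'")
add_to_cart = Node("click 'Add to cart'")
root.children = [search, type_query]
type_query.children = [add_to_cart]

# The trajectory currently being executed: start -> type -> add to cart.
active = {id(root), id(type_query), id(add_to_cart)}
payload = to_d3(root, active)
message = json.dumps(payload)        # what could be sent over a WebSocket
print(message)
```

On the frontend, the `active` flag could drive a different stroke or fill for the highlighted path.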

Visualization Interface

Tree Visualization & Live Browser View

Tree Visualization & Browser Interface

The system provides a live view of the browser interaction alongside an interactive tree visualization. The tree shows the exploration paths, with nodes representing different states and actions. Users can see the current trajectory being executed, along with detailed logs of each action.

Configuration Interface

The configuration panel allows users to customize search parameters, select algorithms (BFS, DFS, MCTS, LATS), set the starting URL, define goals, specify max depth, and control other advanced settings. This makes it easy to experiment with different search strategies and compare their effectiveness.
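A configuration like the one described above could be represented as a small validated record. The sketch below is illustrative only: the `SearchConfig` class, its field names, and the default depth are assumptions, not the system's actual schema.

```python
from dataclasses import dataclass

ALGORITHMS = ("bfs", "dfs", "mcts", "lats")

@dataclass(frozen=True)
class SearchConfig:
    algorithm: str        # one of ALGORITHMS
    starting_url: str     # page where the agent begins
    goal: str             # natural-language task description
    max_depth: int = 3    # hypothetical default

    def __post_init__(self):
        if self.algorithm not in ALGORITHMS:
            raise ValueError(f"unknown algorithm: {self.algorithm!r}")
        if self.max_depth < 1:
            raise ValueError("max_depth must be at least 1")
        if not self.starting_url.startswith(("http://", "https://")):
            raise ValueError("starting_url must be an http(s) URL")

config = SearchConfig(
    algorithm="mcts",
    starting_url="https://example.com",
    goal="Add a laptop to the cart",
    max_depth=4,
)
print(config)
```

Validating at construction time means that a bad algorithm name or depth is rejected before any browser session is started.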

Supported Search Algorithms

BFS/DFS

Basic breadth-first and depth-first search algorithms for systematic exploration of the action space. These provide a baseline for comparing more advanced search strategies.
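The difference between the two strategies comes down to how the frontier is consumed. The following sketch runs both over a toy, hand-written action space. The `SITE` graph, the `tree_search` function, and the exact-match goal test are assumptions for illustration; a real web agent would generate candidate actions from the page and judge success with an evaluator rather than a state comparison.

```python
from collections import deque

# Toy action space: state -> list of (action, next_state).
SITE = {
    "home": [("click search", "results"), ("click cart", "cart")],
    "results": [("click item", "item")],
    "item": [("click add to cart", "cart_with_item")],
    "cart": [("click home", "home")],
}

def expand(state):
    return SITE.get(state, [])

def tree_search(start, is_goal, strategy="bfs", max_depth=3):
    """Return the first action path that reaches a goal state, or None."""
    frontier = deque([(start, [])])
    while frontier:
        # BFS takes the oldest entry (queue); DFS takes the newest (stack).
        state, path = frontier.popleft() if strategy == "bfs" else frontier.pop()
        if is_goal(state):
            return path
        if len(path) < max_depth:
            for action, next_state in expand(state):
                frontier.append((next_state, path + [action]))
    return None

def goal(state):
    return state == "cart_with_item"

bfs_path = tree_search("home", goal, strategy="bfs")
dfs_path = tree_search("home", goal, strategy="dfs")
shallow = tree_search("home", goal, strategy="bfs", max_depth=2)
print(bfs_path, dfs_path, shallow)
```

Both strategies find the same three-step path here, while the depth-2 run returns `None` because the goal lies three actions from the start.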

MCTS

Monte Carlo Tree Search, which balances exploration and exploitation through its four phases: selection, expansion, simulation, and backpropagation.
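The four phases can be sketched on a toy problem. In the example below the "environment" is a sequence of three binary choices, and only the sequence `a, a, a` earns a reward. This stands in for a web task and is not how VisualTreeSearch scores states. The UCB1 selection rule, the exploration constant, and the random rollout policy are standard textbook choices assumed here for illustration.

```python
import math
import random

ACTIONS = ("a", "b")
DEPTH = 3

def reward(state):
    # Only the all-"a" sequence succeeds in this toy task.
    return 1.0 if state == ("a",) * DEPTH else 0.0

class Node:
    def __init__(self, state, parent=None):
        self.state, self.parent = state, parent
        self.children = []
        self.untried = [] if len(state) == DEPTH else list(ACTIONS)
        self.visits, self.value = 0, 0.0

def ucb1(child, c=1.4):
    exploit = child.value / child.visits
    explore = c * math.sqrt(math.log(child.parent.visits) / child.visits)
    return exploit + explore

def mcts(iterations=200, seed=0):
    rng = random.Random(seed)
    root = Node(())
    for _ in range(iterations):
        node = root
        # 1. Selection: descend through fully expanded nodes by UCB1.
        while not node.untried and node.children:
            node = max(node.children, key=ucb1)
        # 2. Expansion: add one untried child, if any remain.
        if node.untried:
            action = node.untried.pop(0)
            child = Node(node.state + (action,), parent=node)
            node.children.append(child)
            node = child
        # 3. Simulation: random rollout to a terminal state.
        state = node.state
        while len(state) < DEPTH:
            state += (rng.choice(ACTIONS),)
        r = reward(state)
        # 4. Backpropagation: update statistics up to the root.
        while node is not None:
            node.visits += 1
            node.value += r
            node = node.parent
    return root

root = mcts()
best = max(root.children, key=lambda n: n.visits)
print("most visited first action:", best.state[-1])
```

The most visited child of the root is the recommended first action. In a web agent the rollout and reward would typically come from executing actions in the browser and scoring the result, which is where a fast state reset becomes important.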

LATS

Language Agent Tree Search, which unifies reasoning, acting, and planning in language models for more effective exploration of complex web interactions.

Custom Algorithms

The system's modular design allows researchers to implement and test custom search algorithms and compare their performance with existing methods.
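One common way to support pluggable algorithms is a registry keyed by name, so that a new strategy only has to follow a shared call signature. The sketch below is a guess at what such an extension point could look like, not the project's actual interface. `REGISTRY`, `register`, the greedy best-first example, and the hand-written `SCORES` table are all hypothetical.

```python
import heapq

REGISTRY = {}

def register(name):
    """Decorator that makes a search function selectable by name."""
    def wrap(fn):
        REGISTRY[name] = fn
        return fn
    return wrap

@register("greedy")
def greedy_best_first(start, expand, score, is_goal, max_depth=3):
    """Always expand the highest-scoring state seen so far."""
    counter = 0  # tie-breaker so the heap never compares paths
    frontier = [(-score(start), counter, start, [])]
    while frontier:
        _, _, state, path = heapq.heappop(frontier)
        if is_goal(state):
            return path
        if len(path) < max_depth:
            for action, next_state in expand(state):
                counter += 1
                heapq.heappush(
                    frontier,
                    (-score(next_state), counter, next_state, path + [action]),
                )
    return None

# Toy action space and a stand-in for a learned or LLM-based value estimate.
SITE = {
    "home": [("click search", "results"), ("click cart", "cart")],
    "results": [("click item", "item")],
    "item": [("click add to cart", "cart_with_item")],
}
SCORES = {"home": 0.0, "cart": 0.1, "results": 0.5, "item": 0.8,
          "cart_with_item": 1.0}

search = REGISTRY["greedy"]
path = search(
    "home",
    expand=lambda s: SITE.get(s, []),
    score=lambda s: SCORES.get(s, 0.0),
    is_goal=lambda s: s == "cart_with_item",
)
print(path)
```

Because every registered function shares the same signature, results from a custom strategy can be compared directly with the built-in ones on the same task.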

Get Started

Interested in trying VisualTreeSearch? Our open-source implementation serves as both a demonstration tool and a foundation for future research on agent decision-making optimization. As web agents advance, robust visualization frameworks will be essential for developing more reliable autonomous systems.