Reproducible eval sandbox for testing Computer Use / browser agents that fill out job applications on company portals in a customer support context. Includes fixture sites, gold trajectories, and regression gates.
Reproducible eval sandbox for testing Computer Use / browser agents that schedule posts in Buffer in a customer support context. Includes fixture sites, gold trajectories, and regression gates.
Reproducible eval sandbox for testing Computer Use / browser agents that triage tickets in Zendesk in a customer support context. Includes fixture sites, gold trajectories, and regression gates.
Reproducible eval sandbox for testing Computer Use / browser agents that reconcile invoices in QuickBooks in a sales ops context. Includes fixture sites, gold trajectories, and regression gates.
Reproducible eval sandbox for testing Computer Use / browser agents that triage tickets in Zendesk in a sales ops context. Includes fixture sites, gold trajectories, and regression gates.
Reproducible eval sandbox for testing Computer Use / browser agents that update records in Salesforce in a sales ops context. Includes fixture sites, gold trajectories, and regression gates.
Reproducible eval sandbox for testing Computer Use / browser agents that schedule posts in Buffer in a sales ops context. Includes fixture sites, gold trajectories, and regression gates.
Reproducible eval sandbox for testing Computer Use / browser agents that manage ads in Meta Ads Manager in a sales ops context. Includes fixture sites, gold trajectories, and regression gates.
Reproducible eval sandbox for testing Computer Use / browser agents that fill out job applications on company portals in a sales ops context. Includes fixture sites, gold trajectories, and regression gates.
Reproducible eval sandbox for testing Computer Use / browser agents that book flights on Google Flights in a sales ops context. Includes fixture sites, gold trajectories, and regression gates.
Reproducible eval sandbox for testing Computer Use / browser agents that apply to jobs on LinkedIn in a sales ops context. Includes fixture sites, gold trajectories, and regression gates.
Reproducible eval sandbox for testing Computer Use / browser agents that scrape product listings on Amazon in a sales ops context. Includes fixture sites, gold trajectories, and regression gates.
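
The descriptions above all mention gold trajectories and regression gates without saying how they work. As one possible reading, the Python sketch below checks whether an agent run contains the gold steps in order, then gates on the overall pass rate. Everything in it is hypothetical and assumed for illustration: the `Step` fields, the `matches_gold` and `regression_gate` names, the in-order subsequence criterion, and the baseline/tolerance parameters are not taken from any of these sandboxes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    """One recorded agent action (hypothetical schema)."""
    action: str      # e.g. "click" or "type"
    target: str      # e.g. a selector on the fixture site
    value: str = ""  # text entered, if any


def matches_gold(run: list[Step], gold: list[Step]) -> bool:
    """Return True if every gold step appears in the run, in order.

    Extra steps in the run are tolerated; this is an assumed matching rule.
    """
    remaining = iter(run)
    # Each gold step consumes the run iterator up to its first match.
    return all(any(step == g for step in remaining) for g in gold)


def regression_gate(results: dict[str, bool],
                    baseline_pass_rate: float,
                    tolerance: float = 0.0) -> bool:
    """Pass the gate if the pass rate has not dropped below baseline - tolerance."""
    if not results:
        return False  # no evidence, so do not let the change through
    pass_rate = sum(results.values()) / len(results)
    return pass_rate >= baseline_pass_rate - tolerance
```

A harness built along these lines would call `matches_gold` once per task against the fixture site's recorded gold trajectory, collect the booleans into a dict keyed by task name, and fail the build when `regression_gate` returns False.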