Reproducible eval sandbox for testing Computer Use / browser agents on schedule posts in Buffer in real estate context. Fixture sites, gold trajectories, and regression gates.
Reproducible eval sandbox for testing Computer Use / browser agents on pull metrics from Mixpanel in real estate context. Fixture sites, gold trajectories, and regression gates.
Reproducible eval sandbox for testing Computer Use / browser agents on fill job applications on company portals in real estate context. Fixture sites, gold trajectories, and regression gates.
Reproducible eval sandbox for testing Computer Use / browser agents on download reports from Stripe dashboard in real estate context. Fixture sites, gold trajectories, and regression gates.
Reproducible eval sandbox for testing Computer Use / browser agents on apply to jobs on LinkedIn in real estate context. Fixture sites, gold trajectories, and regression gates.
Reproducible eval sandbox for testing Computer Use / browser agents on scrape product listings on Amazon in real estate context. Fixture sites, gold trajectories, and regression gates.
Reproducible eval sandbox for testing Computer Use / browser agents on extract leads from Apollo.io in real estate context. Fixture sites, gold trajectories, and regression gates.
Reproducible eval sandbox for testing Computer Use / browser agents on pull analytics from GA4 dashboard in real estate context. Fixture sites, gold trajectories, and regression gates.
Reproducible eval sandbox for testing Computer Use / browser agents on fill expense reports in Concur in real estate context. Fixture sites, gold trajectories, and regression gates.
Reproducible eval sandbox for testing Computer Use / browser agents on book flights on Google Flights in real estate context. Fixture sites, gold trajectories, and regression gates.
Reproducible eval sandbox for testing Computer Use / browser agents on manage ads in Meta Ads Manager in real estate context. Fixture sites, gold trajectories, and regression gates.
Reproducible eval sandbox for testing Computer Use / browser agents on manage ads in Meta Ads Manager in logistics context. Fixture sites, gold trajectories, and regression gates.