Hi all,
I’ve been building a couple of GTK apps with a lot of help from Claude, and I wanted a way for it to actually see and click around in the apps it was helping me build. So I started with a small MCP server, and it grew into WayDriver — a Rust library for functional testing of Wayland apps, with a bundled MCP server for the AI-agent use case.
Each session boots a headless Mutter, a private D-Bus, and PipeWire, launches your app inside that bubble, and lets you drive it through AT-SPI and real Wayland input events. You get screenshots, a WebM recording, and an event log per run, packaged as a self-contained HTML viewer so failed CI runs leave you something to look at.
The locator API is XPath over the AT-SPI tree with auto-waits baked in:
    session.locate("//Button[@name='Sign in']").click().await?;
    session.locate("//Text[@name='username']").fill("alice").await?;
    session.locate("//Label[@name='status']")
        .wait_for_text(|t| t == "saved").await?;
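For anyone wondering what "auto-wait" amounts to in practice, here is a minimal sketch of the general idea: poll until the element exists and the condition holds, or give up at a deadline. This is not WayDriver's implementation. `wait_for`, `demo`, the timeout values, and the simulated label are all made up for illustration, and the sketch is synchronous where the real API is async.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Hypothetical helper (not part of WayDriver): poll `read` until `pred`
/// accepts the value, or return an error once `timeout` has elapsed.
pub fn wait_for<T>(
    mut read: impl FnMut() -> Option<T>,
    pred: impl Fn(&T) -> bool,
    timeout: Duration,
    interval: Duration,
) -> Result<T, String> {
    let deadline = Instant::now() + timeout;
    loop {
        // `None` models "the element is not in the tree yet".
        if let Some(value) = read() {
            if pred(&value) {
                return Ok(value);
            }
        }
        if Instant::now() >= deadline {
            return Err(format!("condition not met within {:?}", timeout));
        }
        sleep(interval);
    }
}

/// Simulated status label: missing on the first read, "saving" on the
/// second, "saved" from the third read onwards.
pub fn demo() -> Result<String, String> {
    let mut reads = 0;
    wait_for(
        move || {
            reads += 1;
            match reads {
                1 => None,
                2 => Some("saving".to_string()),
                _ => Some("saved".to_string()),
            }
        },
        |t: &String| t.as_str() == "saved",
        Duration::from_secs(1),
        Duration::from_millis(5),
    )
}
```

The point of baking this into the locator is that a test never needs an explicit sleep: a missing element and a not-yet-matching element are both treated as "keep waiting" until the deadline.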
Mutter is the only backend wired up today, but the library is designed around three traits (CompositorRuntime, InputBackend, CaptureBackend), so KWin and Sway backends could be added behind the same API.
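To make the three-trait split concrete, here is a rough sketch of how a session could be generic over its backends. Only the trait names come from the library. Every method, the `Session` struct, and the `Fake*` types are assumptions invented for this example, and the sketch is synchronous for brevity.

```rust
// Only the three trait names are real; the method sets are hypothetical.
pub trait CompositorRuntime {
    fn start(&mut self) -> Result<(), String>;
}

pub trait InputBackend {
    fn click(&mut self, x: i32, y: i32) -> Result<(), String>;
}

pub trait CaptureBackend {
    fn screenshot(&mut self) -> Result<Vec<u8>, String>;
}

/// Hypothetical session type that only talks to the traits, never to a
/// concrete compositor.
pub struct Session<C, I, P> {
    pub compositor: C,
    pub input: I,
    pub capture: P,
}

impl<C: CompositorRuntime, I: InputBackend, P: CaptureBackend> Session<C, I, P> {
    /// Start the compositor first, then hand back a usable session.
    pub fn start(mut compositor: C, input: I, capture: P) -> Result<Self, String> {
        compositor.start()?;
        Ok(Self { compositor, input, capture })
    }

    /// Click at a point, then grab a frame through the capture backend.
    pub fn click_and_capture(&mut self, x: i32, y: i32) -> Result<Vec<u8>, String> {
        self.input.click(x, y)?;
        self.capture.screenshot()
    }
}

// In-memory stand-ins; a Mutter, KWin, or Sway backend would implement
// the same traits against the real compositor.
#[derive(Default)]
pub struct FakeCompositor {
    pub started: bool,
}

impl CompositorRuntime for FakeCompositor {
    fn start(&mut self) -> Result<(), String> {
        self.started = true;
        Ok(())
    }
}

#[derive(Default)]
pub struct FakeInput {
    pub clicks: Vec<(i32, i32)>,
}

impl InputBackend for FakeInput {
    fn click(&mut self, x: i32, y: i32) -> Result<(), String> {
        self.clicks.push((x, y));
        Ok(())
    }
}

pub struct FakeCapture;

impl CaptureBackend for FakeCapture {
    fn screenshot(&mut self) -> Result<Vec<u8>, String> {
        Ok(b"fake-frame".to_vec())
    }
}
```

With this shape, `Session::start(FakeCompositor::default(), FakeInput::default(), FakeCapture)` builds a session without any compositor running, and swapping in a different backend means changing the three type parameters rather than the test code.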
If you’re working on a GTK app and want to try it on a real test suite, I’d be very interested to hear how it goes. And if there’s prior art in this space I should know about, I’d be glad to hear about that too.
Longer-term I’m thinking about bindings for other languages used in the GNOME ecosystem, and about making this work against a real session rather than only headless (which would open up accessibility-tooling use cases). Both depend a lot on what people actually want, so if any of this sounds useful for what you’re working on, I’d like to hear about it.