
AI in test automation — taking stock

by Rainer Haupt

Everyone is talking about AI in testing. But who is actually using it in production?

The numbers are sobering. According to Qable.io (2025), 75 % of organisations rate AI-assisted testing as strategically important — but only 16 % have actually rolled it out in production. The World Quality Report 2025-26 confirms the picture: an average 19 % productivity gain from AI in quality work. That sounds good — until you read that one third of organisations see minimal or no results at all. The gap between intent and delivery is substantial.

Tooling hype meets practical reality

Browse blog posts and conference talks today and you could think AI has already reinvented testing. The reality in most QA teams looks different: thousands of existing tests, mature frameworks, an established CI/CD operation. “Switching AI on” does not work that way.

On top of that, many of the marketed AI testing tools are commercial, cloud-bound or still in alpha. A team running an existing setup with Robot Framework, Python and Playwright rarely finds a drop-in solution. The investment question shifts from “What can the tool do?” to “Can it integrate into our existing pipeline without rebuilding selectors, reporting and permissions?”

Where AI delivers ROI today

Three areas stand out because they demonstrably work in practice — not just in demos.

Visual regression testing is the most mature AI category in testing, full stop. Applitools has done this commercially for over ten years. Open-source alternatives like Visual Regression Tracker now offer AI-assisted image comparison with Playwright integration. Even simple pixel comparison with pixelmatch-py or SSIM metrics from scikit-image catches a lot — without an LLM, without cloud dependency.
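
As a concrete illustration, here is a minimal pixel-diff sketch in plain Python. The project known as pixelmatch-py is published on PyPI as pixelmatch; the file names and the 1 % threshold are placeholders, not recommendations:

    # Minimal visual-regression check: no LLM, no cloud dependency.
    # Assumes two same-sized screenshots (e.g. from Playwright's
    # page.screenshot()); baseline.png and current.png are placeholders.
    from PIL import Image
    from pixelmatch.contrib.PIL import pixelmatch

    baseline = Image.open("baseline.png")
    current = Image.open("current.png")
    diff = Image.new("RGBA", baseline.size)

    # Returns the number of mismatched pixels; threshold tunes sensitivity.
    mismatched = pixelmatch(baseline, current, diff, threshold=0.1)
    diff.save("diff.png")

    total = baseline.size[0] * baseline.size[1]
    assert mismatched / total < 0.01, f"{mismatched} of {total} pixels changed"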

Self-healing tests address one of the most expensive problems in test automation. Test maintenance consumes 30–40 % of QA effort. Self-healing locators measurably reduce that for minor UI changes. For Python and Playwright there is autoheal-locator-python, which works even with free or local LLMs. Practical tip: Playwright’s semantic locators (get_by_role, get_by_text) already eliminate over 80 % of locator breaks — that is the most effective measure before any AI solution.
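
What that looks like in practice, as a minimal sketch (URL, field labels and texts are placeholders):

    # Semantic locators target accessible roles and visible text instead
    # of CSS paths, so they survive most markup refactorings.
    from playwright.sync_api import sync_playwright

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto("https://example.com/login")

        # Brittle: page.locator("#app > div:nth-child(2) > button.btn-primary")
        # Robust:  role plus accessible name, independent of DOM structure
        page.get_by_role("textbox", name="Email").fill("qa@example.com")
        page.get_by_role("button", name="Sign in").click()
        page.get_by_text("Welcome back").wait_for()

        browser.close()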

AI-assisted test generation delivers measurable results when the context is right. According to a 2024 ACM study, GPT-4 reaches around 72.5 % validity on generated tests, with another 15 % identifying previously overlooked edge cases. Generation works particularly well for human-readable formats like Robot Framework. But: 92 % of tests generated without suite context fail. Context is everything — RAG integration with the existing test library is the difference between demo and production.
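
A rough sketch of what “context” means here: retrieve a few similar tests from the existing suite and put them into the prompt. The keyword-count retrieval below is a naive stand-in for a proper vector store, and the directory layout and model tag are assumptions:

    # Context-grounded test generation instead of generating blind.
    from pathlib import Path
    from litellm import completion

    def retrieve_similar_tests(feature: str, suite_dir: str = "tests/", k: int = 3) -> str:
        """Pick the k existing .robot files that mention the feature most often."""
        scored = [
            (p.read_text().lower().count(feature.lower()), p)
            for p in Path(suite_dir).glob("**/*.robot")
        ]
        top = sorted(scored, reverse=True)[:k]
        return "\n\n".join(p.read_text() for score, p in top if score > 0)

    feature = "password reset"
    context = retrieve_similar_tests(feature)

    response = completion(
        model="ollama/qwen2.5-coder:7b",  # assumed local model tag
        messages=[{
            "role": "user",
            "content": (
                "Existing Robot Framework tests from our suite:\n"
                f"{context}\n\n"
                f"Write a new test for '{feature}' reusing the same keywords, "
                "resource imports and naming conventions."
            ),
        }],
    )
    print(response.choices[0].message.content)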

What is immediately actionable — at no cost

The most useful tools need no LLM and cost nothing.

  • Faker or Mimesis for test data generation — stable, fast, no setup
  • axe-playwright-python for accessibility tests — one line of code (see the sketch after this list), catches around 57 % of WCAG issues
  • Schemathesis for automatic API tests from OpenAPI specs — finds 1.4–4.5× more defects than comparable tools, according to one study
  • pixelmatch-py for simple visual regression — three lines of code
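
To show how lightweight these are, here is the axe-playwright-python case from the list above. The scan itself really is a single call; the API follows the package’s README as far as I know it, and the URL is a placeholder:

    # Accessibility smoke test: the actual scan is the single axe.run() call.
    from playwright.sync_api import sync_playwright
    from axe_playwright_python.sync_playwright import Axe

    axe = Axe()

    with sync_playwright() as p:
        browser = p.chromium.launch()
        page = browser.new_page()
        page.goto("https://example.com")
        results = axe.run(page)  # the one line that does the work
        browser.close()

    print(f"{results.violations_count} accessibility violations")
    print(results.generate_report())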

Anyone evaluating local LLMs: install Ollama and pull Qwen 2.5 Coder 7B — it runs on a typical laptop with 16 GB of RAM. Put LiteLLM in front as middleware and switching between local and cloud becomes a one-line config change. Cost: zero. Data governance: fully in-house.
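
A minimal sketch of that setup, assuming Ollama is running locally (default port 11434) with the model pulled; the model tags are assumptions:

    # Local vs. cloud behind one interface: with LiteLLM, switching
    # really is just the model string.
    from litellm import completion

    MODEL = "ollama/qwen2.5-coder:7b"  # flip to e.g. "gpt-4o-mini" for cloud

    response = completion(
        model=MODEL,
        messages=[{"role": "user", "content": "Suggest edge cases for a date parser."}],
    )
    print(response.choices[0].message.content)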

Verdict

AI in testing is not hype, but it is not an overnight revolution either. The 16 % who use it in production do not chase the loudest tools; they chase the most pragmatic ones. The right starting point today is not AI agents — it is the unflashy problems: test data, visual regression, accessibility. That is where the ROI is immediately measurable. Until those are tidied up, an AI layer on top adds little.
