AI-native testing tools in regulated industries — why keyword-driven is not yet replaceable
by Rainer Haupt
TL;DR: AI-native testing tools such as testRigor, mabl and Virtuoso cannot replace keyword-driven frameworks for compliance-critical test execution in regulated industries today. The core problem is structural, not a maturity issue: self-healing is non-deterministic and collides with FDA, ISO 26262, DO-178C, IEC 62304 and DORA. Keyword-driven frameworks deliver determinism, reproducibility and traceable change management by design. With ISO/IEC/IEEE 29119-5:2024 as a standard vocabulary, the gap widens rather than closes.
Reading time approx. 13 min · As of: 2026-04
The AI-augmented-testing vendor pitch sounds like efficiency: write tests in natural language, selectors heal themselves, the tool generates test cases from user stories. For non-regulated web applications those are real benefits. In industries where every test run has to be legally traceable — pharma, medical devices, automotive, aerospace, banking — the assessment changes fundamentally. This analysis compares the three most prominent AI-native tools with keyword-driven frameworks against the concrete demands of FDA, ISO 26262, DO-178C, IEC 62304 and DORA.
Compliance claims are thin and mostly aspirational
| Tool | Certifications | FDA claim | On-premise | Regulated customers / notes |
|---|---|---|---|---|
| testRigor | ISO 27001, SOC 2 Type II, HIPAA, GDPR | 21 CFR Part 11 reporting support (not self-validated) | in the Enterprise plan | Medrio (eClinical / pharma) |
| mabl | SOC 2 Type II | none | no, cloud-only | ToS forbids PHI and PCI data |
| Virtuoso | SOC 2 Type II | mentions 21 CFR Part 11 in marketing, no certification | no, cloud-only (AWS EU) | last funding November 2021, valuation around USD 5.1 million |
Three findings cut across the table:
- No tool has formal tool qualification under ISO 26262-8 Clause 11 or DO-330.
- No tool ships IQ/OQ/PQ validation packages for GxP environments.
- No tool holds IEC 62304 compliance claims.
For comparison: deterministic tools such as Parasoft C/C++test carry TÜV-SÜD certification for IEC 62304. The certification gap is not a detail, it is a structural difference — and it has a technical reason.
Self-healing versus determinism — the fundamental conflict
What regulators demand:
- FDA (Software Validation) — requirements must be “consistently fulfilled”.
- ISO 26262 — tools must “comply with specified requirements” demonstrated through validation testing.
- DO-178C — provable bidirectional traceability at four points.
- GAMP 5 — validation rests on control, reproducibility, traceability.
What the AI tools do:
| Tool | Self-healing approach | Problem for regulation |
|---|---|---|
| mabl | 30+ attributes per UI element, GenAI for semantic matching (e.g. “Confirm” → “Approve”). During a plan run, heals are applied automatically without user approval. | The same test can take different interaction paths across runs. |
| Virtuoso | around 95 % self-healing accuracy. Own docs: “If your test step can’t be healed with confidence… the test step will fail.” | A 5 % uncertainty rate is incompatible with safety-critical testing. |
| testRigor | resolves elements from the end-user perspective (visible labels). Self-healing optional, marked with “fixed-by-ai” tags, rollback possible. | The most stable of the three — DOM interpretation still varies dynamically. |
The FDA explicitly distinguishes “locked” algorithms (same result every time) from “adaptive” ones. All three tools fall into the second category. PMC research on pharma GMP puts it bluntly: “regulatory agencies remain cautious due to the non-deterministic nature of AI/ML algorithms.”
Change control — where AI tools fail most critically
In regulated environments, every change to a validated test must run through formal change control: documented change request, risk assessment, approval by qualified personnel, implementation, verification. That sequence is non-negotiable under FDA 21 CFR Part 11, GAMP 5, ISO 26262 and DO-178C.
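That sequence can be enforced mechanically before any test run. The following is a minimal sketch, assuming a Git-based workflow where every commit touching validated suites must reference an approved change request (CR); the directory layout, CR naming and function names are illustrative, not part of any standard.

```python
# Minimal sketch of a pre-execution change-control gate (illustrative
# names throughout). Assumption: validated test suites live under one
# directory and each change to them must cite an approved CR in the
# commit message before the pipeline may execute them.

VALIDATED_PREFIX = "tests/validated/"  # assumed location of validated suites

def approval_gate(changed_files, commit_message, approved_crs):
    """Return (ok, reason). Block execution if a validated test file
    changed without an approved change request referenced in the
    commit message."""
    touched = [f for f in changed_files if f.startswith(VALIDATED_PREFIX)]
    if not touched:
        return True, "no validated tests changed"
    referenced = {cr for cr in approved_crs if cr in commit_message}
    if not referenced:
        return False, f"validated tests changed ({touched}) without approved CR"
    return True, f"change covered by CR(s): {sorted(referenced)}"

# Example: a change to a validated suite citing approved CR-1042 passes
ok, reason = approval_gate(
    ["tests/validated/login.robot"],
    "CR-1042: update login keyword after UI rename",
    {"CR-1042"},
)
print(ok, reason)
```

The point of the sketch is the ordering: the approval check runs before execution, which is exactly the gate that auto-healing tools invert.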
AI tools do the opposite — “change first, review later”:
| Tool | Behaviour | Regulatory issue |
|---|---|---|
| mabl | Auto-heal plus automatic element-model update on a passing plan run. Review afterwards on the Insights page. | No approval gate, no dual authorisation, no formal CR workflow. |
| Virtuoso | Manual acceptance of self-heals required (“click of a button”). Revisions history available. | No formal change-impact analysis, no electronic signatures. |
| testRigor | “fixed-by-ai” tags and rollback. | Visibility yes — but no pre-execution approval workflow. |
The FDA's 2025 PCCP guidance for AI/ML sharpens this further: even approved AI adaptations must specify upfront what changes, how it is validated, which acceptance criteria apply and how rollback works. Self-healing tests that adapt ad hoc structurally fail that framework.
Standardised keyword vocabularies as an unassailable advantage
ISO/IEC/IEEE 29119-5:2024 (second edition, December 2024) formalises keyword-driven testing with a standardised vocabulary, hierarchical keyword structures and a common data exchange format for tool interoperability. Robot Framework’s keyword-driven architecture aligns structurally with the model described in 29119-5 — a de-facto implementation, without the Robot Framework Foundation asserting any formal conformance claim.
The traceability chain in regulated industries then looks like this:
```
Requirements → Standardised Keywords → Test Specifications → Execution Logs → Results
     ↑                  ↑                      ↑                   ↑             ↑
 Jira / ALM      29119-5-aligned          .robot files         output.xml    report.html
                   vocabulary           (Git-versioned)      (step by step) (audit-ready)
```
Side by side:
| Capability | Robot Framework | testRigor | mabl | Virtuoso |
|---|---|---|---|---|
| Native RTM | [Tags] REQ-001 per test | no, only via TestRail / Jira / Xray | no | native requirements feature |
| Execution logs | output.xml and log.html with step-by-step traces, timestamps, screenshots | PDF / Word / CSV reports | screenshots + steps | revisions history |
| Electronic signatures | not built in (add via CI/CD infrastructure) | no | no | no |
| Version control | .robot = plain text → Git-native | proprietary platform | cloud-based | cloud-based |
| Change history / diff | standard Git diff | audit trail (since April 2023) | limited | revisions history |
Vendor lock-in as an existential risk
Product lifecycles in regulated industries are long:
- Medical devices: 10–30+ years
- Automotive ECUs: 15–25 years
- Pharma: keep test evidence for product lifetime plus years
- Aerospace: certification data for aircraft lifetime (30+ years)
What happens if the tool vendor disappears, gets acquired or pivots the platform during that time? The export options of the three tools:
| Tool | Export | Verdict |
|---|---|---|
| testRigor | no export to Selenium, Playwright, Cypress or standard languages | total lock-in |
| mabl | export via CLI to Playwright (TS), Selenium IDE, JSON, CSV | lossy — no auto-heal, no visual tests, no flows |
| Virtuoso | claims “various formats” without specification | proprietary NL format, hard to port |
| Robot Framework | plain text, Apache 2.0, Foundation-backed, no vendor | indefinitely maintainable |
For financial services, DORA gives this concentration risk regulatory weight: its third-party risk-management rules require financial firms to assess concentration risk for critical ICT providers. Depending on a single SaaS testing vendor for compliance-critical test infrastructure is exactly the risk type DORA addresses.
Data protection rules out cloud-only tools for many scenarios
| Requirement | testRigor | mabl | Virtuoso |
|---|---|---|---|
| On-premise | in the Enterprise plan | no | no |
| Air-gapped deployment | unclear | no | no |
| PHI processing allowed | yes (HIPAA) | no, ToS forbids it explicitly | unclear |
| PCI data allowed | unclear | no, ToS forbids it explicitly | unclear |
| EU data residency | unclear | no | yes (AWS EU) |
That makes mabl and Virtuoso architecturally incompatible with healthcare, financial and defence applications — independent of AI quality.
Verdict, with nuance
The clear claim first: AI-native testing tools cannot replace keyword-driven frameworks for compliance-critical test execution in regulated industries today. But not everything in a regulated organisation has the same compliance demand:
| Scenario | AI tools possible? | Recommendation |
|---|---|---|
| Safety-critical tests (ASIL D, SIL 3/4) | no | deterministic keywords, formally qualified |
| GxP validation (IQ / OQ / PQ) | no | keyword-driven with audit trail |
| Compliance-relevant regression tests | only with self-healing disabled | testRigor with AI features off + on-premise — conceivable |
| Low-risk smoke and exploratory tests | yes | AI tools as a complement next to a validated core |
| Non-regulated subsystems | yes | AI tools genuinely useful for efficiency |
Among the three tools, testRigor comes architecturally closest to the compliance bar: no locator-based testing, self-healing optional and disableable, on-premise option, HIPAA-certified. With AI features disabled, however, it is essentially a natural-language keyword layer — which largely undoes the AI value proposition.
The regulatory landscape is moving. The FDA PCCP framework, GAMP 5 in its second edition and the developing ISO 29119 Part 11 for AI-based systems acknowledge AI/ML as reality. They add requirements rather than relaxing determinism expectations.
When organisations use standardised keywords aligned with ISO/IEC/IEEE 29119-5, the package becomes hard to replace from a regulatory perspective:
- Each keyword has defined preconditions and postconditions with documented outcomes.
- Keywords map directly to requirement IDs.
- Plain text is Git-versionable — full change control included.
- Deterministic execution returns identical results for identical inputs.
- Vendor-independent and maintainable across 30+ year lifecycles.
- Audit-ready HTML reports without additional tooling.
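The first and fourth points can be made concrete in code. Below is a minimal sketch of a deterministic keyword implemented as a Python library function for Robot Framework; the keyword name, the banking domain and the specific checks are illustrative assumptions, not taken from any standard vocabulary.

```python
# Sketch of a deterministic keyword as a Robot Framework Python library
# function: explicit precondition and postcondition checks, and the same
# result for the same inputs on every run. All names are illustrative.

def transfer_amount(balance: int, amount: int) -> int:
    """Keyword 'Transfer Amount': debit `amount` from `balance`.

    Preconditions: amount is positive and covered by the balance.
    Postcondition: the returned balance reflects exactly one debit.
    """
    assert amount > 0, "precondition violated: amount must be positive"
    assert amount <= balance, "precondition violated: insufficient funds"
    new_balance = balance - amount
    assert new_balance == balance - amount, "postcondition violated"
    return new_balance

# Determinism in one line: identical inputs, identical result, every run.
assert transfer_amount(100, 30) == transfer_amount(100, 30) == 70
```

Nothing in this keyword re-interprets the system under test at runtime; a failed precondition fails loudly instead of being healed, which is precisely the behaviour auditors expect.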
The honest version of the conclusion: AI-augmented testing has its place in regulated industries — as a complement in areas where the compliance bar is lower. As a replacement for the validated deterministic core, it is not a viable option today.
Sources
- testRigor — 21 CFR Part 11 compliance
- testRigor — AI-based self-healing
- mabl — customer terms of service
- mabl — GenAI test automation with self-healing
- Virtuoso QA — product features
- FDA — AI/ML in software as a medical device
- FDA — PCCP guidance 2025 (Ballard Spahr summary)
- ISPE — GAMP 5 guide, 2nd edition
- ISO/IEC/IEEE 29119-5:2024 — keyword-driven testing
- Siemens — ISO 26262 tool qualification
- AFuzion — DO-178C and DO-330
- PMC — regulatory perspectives for AI/ML in pharmaceutical GMP
- DORA — compliance overview (Vantagepoint)
- Robot Framework — user guide
Evaluating test tooling for a regulated environment, or assessing an existing setup against FDA, ISO 26262, DO-178C, IEC 62304 or DORA? In the UTAA workshop we design the compliance architecture specifically for your project. Read more about the method, or get in touch directly.