The Robot Framework identity crisis

TL;DR: Robot Framework sits in a fundamental tension: a readable keyword-driven DSL versus its slow drift into a programming language. Since version 4.0, IF/ELSE, WHILE, TRY/EXCEPT, VAR and GROUP have been added. RF creator Pekka Klärck has acknowledged the tension himself — users keep asking for features that would already be available if they implemented their keywords in Python. This article surveys the seven most common community criticisms, the evolution since 2021, and the middle path that experienced users describe as pragmatic.

Reading time approx. 12 min · As of: 2026-04

In 2008 Robot Framework had a clear character: tabular, readable test data in plain-text files, complemented by keyword libraries written in Python. That was a DSL for test description, not a programming language. Almost twenty years later, less of that remains than the marketing suggests. A team setting up an RF project in 2025 finds a language that supports typed variables, conditionals, two loop types, exception handling, inline Python and named code blocks: the core constructs of a fully fledged programming language. The community is divided on whether that is progress.

What proponents praise

Strength	Substance
HTML reporting	Out of the box, hierarchical, detailed — even critics concede the superiority
Keyword-driven	Accessible to testers without programming background, provided a keyword library exists
Library ecosystem	SeleniumLibrary, Browser (Playwright), Appium, Requests, Database, SSH — web, API, mobile, desktop under one roof
RF Foundation	70+ member organisations, stable governance, active development
Slack community	Consistently described as helpful and responsive
Standardisation	Multiple kinds of automation in one unified framework

Those strengths are uncontested. The argument is about the costs that come with the language itself.

Seven common criticisms

Listed by frequency in community discussion (RF Forum, Reddit r/QualityAssurance, Hacker News, HitchDev).

Unnecessary abstraction layer. The most cited objection. Reddit threads sum it up: “It adds an unnecessary layer. It’s Python but without Python.” Hacker News discussions argue that writing tests directly in a programming language would have been simpler. Keyword reuse is described in forum samples as double-edged — a few much-used keywords, a long tail of one-off keywords.

Non-programmers rarely write tests. RF’s core promise — non-developers reading and writing tests — rarely materialises in practice according to multiple field reports. One HN voice: “I can count on no fingers how many times non-technical QA folks ever wrote tests.” Another report describes RF being introduced so product managers could write tests — nobody outside the dev team ever did.

Debugging is painful. HitchDev criticises that line numbers and the failing step are not shown. The abstraction layer obscures stack traces. Standard breakpoint support is missing, so debugging happens via log inspection rather than interactively, something pytest users take for granted.

Scaling becomes a burden. A DZone migration report covering 600+ tests describes the experience: “Without OOP and programming patterns it became a nightmare.” Keyword updates across hundreds of test cases cost disproportionately. A Medium post adds: “Keyword reusability drops as test coverage grows; tests get slower and flaky.”

Performance overhead. Multiple comparison articles report RF running 30–40 % slower than pytest, which they attribute to keyword-parsing overhead. Switching from SeleniumLibrary to the Browser Library (Playwright) reportedly brings a speedup of up to 50 %. Rigorous benchmarks are scarce, but the reports converge.
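As a rough illustration of what the reported range means for a suite, the following sketch applies the 30–40 % figure to a baseline runtime. The 600-second baseline and the function name `projected_runtime` are assumptions made up for this example, not measurements.

```python
def projected_runtime(baseline_seconds, overhead_low=0.30, overhead_high=0.40):
    """Return the (low, high) runtime in seconds if the reported overhead applied.

    The default 30-40 % range is the figure quoted in comparison articles;
    it is not a benchmark result.
    """
    low = baseline_seconds * (1 + overhead_low)
    high = baseline_seconds * (1 + overhead_high)
    return low, high


# Hypothetical baseline: a suite that takes 10 minutes under pytest.
low, high = projected_runtime(600)
print(f"Projected RF runtime: {low / 60:.1f} to {high / 60:.1f} minutes")
```

Under these assumptions a ten-minute pytest run would land at roughly 13 to 14 minutes. The point is the order of magnitude, not a precise prediction.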

IDE support trails. The RobotCode extension for VS Code lacks full auto-formatting and broad autocompletion (as of 2023). RIDE as a dedicated IDE is unstable on macOS. First-class IDE support comparable to what pytest or Playwright get in PyCharm and VS Code does not exist.

Ecosystem lock-in. RF skills transfer poorly outside the ecosystem. Job postings are noticeably rarer than for pytest or Playwright. Even proponents acknowledge the hiring problem.

The “accidental programming language”: evolution since RF 4.0

The chronology of programming features can be summarised compactly:

Version	Year	New constructs
RF 4.0	2021	IF / ELSE IF / ELSE
RF 5.0	2022	TRY/EXCEPT/ELSE/FINALLY, WHILE, BREAK/CONTINUE/RETURN, inline IF
RF 7.0	2024	VAR syntax with scope control
RF 7.2	2025	GROUP (named code blocks)
RF 7.3	2025	Variable type annotations (`${count: int}`)
RF 7.4	2025	Secret variables, typed library keywords

The result: typed variables, scoping, conditionals, two loop types, exception handling, flow control, inline Python, code grouping, type annotations. Inside .robot files RF is now de facto Turing-complete.

Community commentary follows accordingly. HitchDev: “Once you have conditionals, loops and variables — why a DSL at all? Python is the better programming language.” Comparisons to C++ templates or MediaWiki templates appear regularly in the forum — under the label “DSL treadmill”, where an originally simple language gradually turns into an incomplete programming language. An RF forum user: “In professional use, RF’s keyword-based test language is awkward. Why not Python as the language and RF as a support library?”

Pekka Klärck himself acknowledged the tension openly in 2021. The RF 5.0 survey produced TRY/EXCEPT and WHILE as the most-requested features. Klärck’s comment: “All these features would already be available if they would just implement their keywords in Python.” That is a remarkably candid admission that the userbase is pulling RF away from its founding philosophy.
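To make Klärck’s point concrete, here is a minimal sketch of a polling loop with exception handling written as a Python keyword rather than with WHILE and TRY/EXCEPT in a .robot file. The class `StatusKeywords`, the keyword `wait_until_status_is` and the injected `fetch_status` callable are hypothetical and exist only for this illustration. The only RF behaviour assumed is that public methods of a Python library class are exposed as keywords and that a raised exception fails the calling keyword.

```python
import time


class StatusKeywords:
    """Hypothetical RF keyword library; public methods become keywords."""

    def __init__(self, fetch_status=None, sleep=time.sleep):
        # fetch_status is an injected callable (an assumption of this sketch);
        # a real library would query the system under test here.
        self._fetch_status = fetch_status or (lambda: "unknown")
        self._sleep = sleep

    def wait_until_status_is(self, expected, attempts=5, interval=0.5):
        """Poll until the status matches; in a test: `Wait Until Status Is`."""
        attempts = int(attempts)    # arguments from .robot files may be strings
        interval = float(interval)
        last_problem = None
        for attempt in range(1, attempts + 1):
            try:
                status = self._fetch_status()
                if status == expected:
                    return attempt
                last_problem = f"status was {status!r}"
            except ConnectionError as err:
                # Transient failures are retried instead of failing the test.
                last_problem = f"connection failed: {err}"
            if attempt < attempts:
                self._sleep(interval)
        # Raising an exception makes the calling keyword fail with this message.
        raise AssertionError(
            f"Status never became {expected!r} after {attempts} attempts "
            f"({last_problem})"
        )
```

The loop, the retry on error and the final failure message are ordinary Python here, so they can be unit-tested and debugged without any RF control structures.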

The thin-Robot-layer pattern

The pragmatic middle path that experienced users describe: RF only at the test-case level. Logic, branches, loops, data manipulation — all in Python. RF becomes a pure runner for routing, variable handling and logging.

Two voices stand out in the community. Robin Mackaij: “I write logic in Python. For test cases, though, Python is a terrible choice — there you want to express functional steps.” Bartłomiej Hirsz, RF contributor: “You can swap the implementation without changing the test cases.” Both see the pattern as deliberate architecture, not a compromise.
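The pattern can be sketched as follows. The names `CartKeywords`, `InMemoryCart`, `add_item_to_cart` and `cart_should_contain_items` are hypothetical and chosen only for illustration. The sketch relies solely on RF’s convention of exposing public methods of a Python library class as keywords.

```python
class InMemoryCart:
    """Stand-in backend; a real one might drive a browser or call an API."""

    def __init__(self):
        self._items = {}

    def add(self, sku, quantity):
        self._items[sku] = self._items.get(sku, 0) + quantity

    def total_items(self):
        return sum(self._items.values())


class CartKeywords:
    """Hypothetical keyword library: the only layer the .robot file sees."""

    def __init__(self, backend=None):
        # Any object with add() and total_items() can replace the default.
        self._backend = backend if backend is not None else InMemoryCart()

    def add_item_to_cart(self, sku, quantity=1):
        self._backend.add(sku, int(quantity))

    def cart_should_contain_items(self, expected):
        actual = self._backend.total_items()
        if actual != int(expected):
            raise AssertionError(
                f"Expected {expected} items in cart, found {actual}"
            )


# In a .robot test case the steps would read roughly:
#     Add Item To Cart    SKU-1    2
#     Cart Should Contain Items    2
```

Because the test case only names the two keywords, the backend behind `CartKeywords` can change without touching the .robot file, which is the property the quote above describes.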

Critics see the same pattern differently. If all the real work happens in Python, the RF layer is overhead without proportional value. The DZone migration report: a feature that would have taken about two months in RF was done in pytest in a week.

A Maxilect field report provides two data points from the same team. Project 1: everything written in RF DSL — the helper library grew to 6,700 lines of RF code, “became hard to maintain”. Project 2: thin-layer pattern — RF became a pure runner, “the work felt like normal development”. One team, two architectures, a clear result.

Where RF stands in comparison

Dimension	Robot Framework	pytest	Playwright	Cypress
Sweet spot	Acceptance / ATDD, mixed-skill teams	Unit, integration, API	Web E2E	JavaScript web E2E
Execution speed	Slowest (around 30–40 % overhead)	Fastest (Python)	Fast (browser level)	Fast (in-browser)
Built-in reports	Excellent HTML logs	Plugins required	HTML + trace viewer	Dashboard
Non-coder-friendly	Yes (keyword-driven)	No	No	No
Multi-platform	Web, API, mobile, desktop	Via libraries	Web only	Web only
Debugging	Log-based, limited	Full Python debugger	Trace viewer, inspector	Time travel
Parallelisation	External (Pabot)	pytest-xdist	Native sharding	Paid feature

Migration trends are consistent with this picture. The most common ways out of RF are pytest for Python teams and native Playwright for web teams. A widely seen hybrid move is from RF + Selenium to RF + Browser Library: away from Selenium, not away from RF. Movement toward RF includes individual SpecFlow BDD teams after SpecFlow’s discontinuation. According to 6sense, RF’s market share is 4.92 %, rank 6 among test frameworks, with strength in manufacturing, telecommunications and embedded.

Verdict

The thesis “RF is drifting away from its original intent” is clearly supported by the community discussions. Since 2021 RF has systematically added programming constructs, roughly one per major release. The RF creator has acknowledged the tension. Forum threads, HN discussions and migration reports cover the DSL-treadmill problem consistently. RF in 2025 is no longer the tabular test-data format of 2008.

The defence runs: the new constructs are optional, simple test cases stay simple. Technically true. In practice teams use the features as soon as they become available. The codebase grows more complex; the learning curve longer.

The thin-layer pattern is the pragmatic answer: RF only where it actually plays to its original advantage — business-readable tests, multi-tech orchestration, audit-grade reports. Logic, data processing and control flow in Python, where they belong. Anyone setting up RF today should pick this middle path from day one as the default architecture — not as a retrofit when the suite has long outgrown the DSL.
