Robot Framework as a Pure Domain-Specific Language

TL;DR: Robot Framework pays off even without its standard libraries. In the «RF Thin Layer + Pure Python Keywords» pattern, RF is used only as a DSL orchestration layer; all keywords are pure Python functions with direct access to the full Python ecosystem. Benefits: full IDE debugger, no library-wrapper bottleneck, low lock-in, business-readable keywords for auditors. The cost: higher Python competence in the team and architectural discipline.

Reading time approx. 11 min · As of: 2026-04

Robot Framework is often equated with its standard libraries: SeleniumLibrary, Browser, BuiltIn, RequestsLibrary, DatabaseLibrary. In that reading, RF is a test framework with its own ecosystem of test tools. That reading works for small suites — and breaks down at tens of thousands of tests, heterogeneous tech stacks, and long lifecycles.

There is another reading, one that has been used for years in regulated industries (finance, pharma, social insurance, public administration): Robot Framework as a pure DSL orchestration layer, with all keywords written as pure Python functions.

The pattern

Three rules define it:

RF only as DSL. .robot files describe test cases, test suites, tags — they are specification, readable for business testers and auditors.
No RF libraries in .robot files. No Library SeleniumLibrary, no Should Be Equal, no Sleep, no Log.
All keywords in Python, with direct access to Playwright, requests, sqlalchemy, paramiko, pyrfc, zeep — whatever the Python ecosystem offers.

Applied consistently, this means a business keyword like Verify Customer Form Complete triggers ten or more Python asserts that the .robot file knows nothing about. The test specification stays domain-level; the mechanics live entirely in Python.

What you gain

Full Python ecosystem. When Playwright ships a new trace feature, it’s available immediately. With an RF library wrapper, you wait for the maintainer. The waiting bottleneck disappears because you call libraries directly without an intermediate layer. The same applies to requests (REST), zeep (SOAP), sqlalchemy (database), paramiko (SSH), pika (RabbitMQ), pyrfc/ctypes (SAP). Any interface with a Python library is reachable natively.

Real debugging. Python keywords run with breakpoints, step-through, pdb, IDE inspection. The usual RF variant with standard libraries loses that depth — stack traces are stripped, error localisation goes through library hooks. In the thin-layer pattern the full Python stack stays visible.
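
To illustrate the visible stack, here is a minimal sketch using only the standard library. The functions verify_address and _check_zip are hypothetical and not taken from any real project. The sketch shows that a failing assert inside a helper carries the whole chain of Python frames, which is what an IDE debugger or pdb.post_mortem would step through.

```python
import traceback
from typing import Callable, List


def _check_zip(zip_code: str) -> None:
    # Low-level helper: the actual assertion lives here.
    assert zip_code.isdigit(), f"ZIP not numeric: {zip_code!r}"


def verify_address(zip_code: str) -> None:
    # Business-level function; in an RF project this would be exposed as a keyword.
    _check_zip(zip_code)


def failing_frames(func: Callable[..., None], *args: str) -> List[str]:
    """Return the function names on the traceback if func fails, else an empty list."""
    try:
        func(*args)
    except AssertionError as exc:
        # pdb.post_mortem(exc.__traceback__) could be called here for interactive inspection.
        return [frame.name for frame in traceback.extract_tb(exc.__traceback__)]
    return []


print(failing_frames(verify_address, "80a1"))
```

Because the keyword is an ordinary function, it can also be run outside Robot Framework, for example under python -m pdb or from an IDE run configuration.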

Readability for testers and auditors. Precisely because keywords abstract at the business level, .robot files stay human-readable, even for staff without programming experience. This is not «testing without programming knowledge»; that promise has not held up since the record-and-replay tools of 2005. It is «not everyone has to know Python». Separating specification (.robot) from mechanics (Python) becomes a design principle.

Audit trail by default. RF generates output.xml and log.html from every run. These are not add-ons — they are native. For regulated industries (FINMA, GxP, public-sector audits), that is exactly the chain of evidence reviewers expect. Business-named keywords appear in the log; the test flow is documented end to end.
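
As a rough illustration of how the business-named keywords can be pulled out of a run for reviewers, the sketch below parses a heavily simplified output.xml fragment with the standard library. The sample XML is an assumption made for this example: real output.xml files carry many more elements and attributes, and their schema varies between Robot Framework versions. For production use, the result model behind robot.api.ExecutionResult is likely the more robust route.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple

# Simplified stand-in for an RF output.xml (assumption, not a complete schema).
SAMPLE_OUTPUT = """\
<robot>
  <suite name="Orders">
    <test name="Create Order For Existing Customer">
      <kw name="Verify Customer Form Complete">
        <status status="PASS"/>
      </kw>
      <kw name="Create Order With Item">
        <status status="PASS"/>
      </kw>
      <status status="PASS"/>
    </test>
  </suite>
</robot>
"""


def audit_rows(xml_text: str) -> List[Tuple[str, str, str]]:
    """List (test name, top-level keyword name, keyword status) for every test."""
    root = ET.fromstring(xml_text)
    rows = []
    for test in root.iter("test"):
        for kw in test.findall("kw"):  # direct children only: the business keywords
            status = kw.find("status")
            result = status.get("status") if status is not None else "UNKNOWN"
            rows.append((test.get("name"), kw.get("name"), result))
    return rows


for row in audit_rows(SAMPLE_OUTPUT):
    print(row)
```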

Lowest lock-in. Python keywords are framework-agnostic — if RF were ever to step aside, the keywords migrate to pytest with minimal effort (a function stays a function). With a standard RF setup using SeleniumLibrary and BuiltIn calls, the same migration demands a complete test rewrite.
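
The "a function stays a function" argument can be sketched as follows. The Order dataclass and verify_order_is_saved are hypothetical names chosen for this example. The same function that Robot Framework would expose as a keyword is called directly from a pytest-style test, without any RF import.

```python
from dataclasses import dataclass


@dataclass
class Order:
    order_id: str
    status: str


def verify_order_is_saved(order: Order) -> None:
    """Plain Python check; usable as an RF keyword or from any other test runner."""
    assert order.order_id, "Order ID empty"
    assert order.status == "saved", f"Status: {order.status}"


def test_order_is_saved() -> None:
    # pytest would collect this function by its test_ prefix.
    verify_order_is_saved(Order(order_id="O-1", status="saved"))


test_order_is_saved()
```

The function raises AssertionError on failure, which both Robot Framework and pytest report as a failed test.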

Open source, no licence costs. Robot Framework ships under Apache 2.0, and the mainstream Python libraries are released under open-source licences as well. Licence costs for commercial test tools at comparable enterprise scale often run into the high five to six figures per year; here they vanish.

What it costs

Performance overhead remains. RF parses and reports every step. Plain pytest is roughly 15–25 % faster. For nightly regression runs across parallel CI jobs that is irrelevant in practice. In microservice CI pipelines with sub-second targets it can matter. The question is whether the RF overhead becomes relevant in the concrete pipeline.

Higher Python bar in the team. Standard RF can start with Library SeleniumLibrary and prebuilt keywords — RF syntax knowledge gets the team a long way. The thin-layer pattern requires Python competence from day one: classes, type hints, helpers, fixtures, data models. Choosing the pattern means accepting a higher entry cost — and getting deep control in return.

Curated keyword library required. For testers to work productively, someone has to name, group, and document the keywords cleanly. Without that curation, you end up with thousands of micro-keywords with overlapping names — .robot files turn into hieroglyphics.
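
Part of that curation can be automated. Robot Framework matches keyword names case-insensitively and ignores spaces and underscores, so two differently written names can collide. The sketch below is a hypothetical helper (not part of Robot Framework) that flags such collisions in a list of (module, keyword name) pairs, which a team would have to collect from its own keyword modules.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def normalize(name: str) -> str:
    """Approximate RF keyword matching: ignore case, spaces and underscores."""
    return name.replace(" ", "").replace("_", "").lower()


def find_collisions(keywords: List[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group keywords by normalized name and keep only groups with more than one entry."""
    seen: Dict[str, List[str]] = defaultdict(list)
    for module, name in keywords:
        seen[normalize(name)].append(f"{module}.{name}")
    return {key: names for key, names in seen.items() if len(names) > 1}


inventory = [
    ("customer_form", "Verify Customer Form Complete"),
    ("legacy_checks", "verify_customer_form_complete"),
    ("order_form", "Create Order With Item"),
]
print(find_collisions(inventory))
```

A check like this could run in CI so that overlapping names are caught before they reach the .robot files.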

XML output as a scaling boundary. RF writes its results as XML. Once a single run exceeds roughly 10,000 tests, parsing output.xml slows down. A modular test architecture (small suites, parallel jobs, separate outputs) defuses this, at the cost of discipline in suite design.
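
One way to get separate outputs is to start one robot process per suite directory and, if a combined report is wanted, hand the individual files to rebot afterwards. The sketch below only builds the command lines. The directory names and the top-level suite name "Regression" are assumptions, and how the jobs are distributed (CI matrix, subprocess.run, a scheduler) is left open.

```python
from pathlib import PurePosixPath
from typing import List


def robot_commands(suite_dirs: List[str], results_root: str = "results") -> List[List[str]]:
    """One robot invocation per suite directory, each writing its own output.xml."""
    commands = []
    for suite in suite_dirs:
        out_dir = PurePosixPath(results_root) / PurePosixPath(suite).name
        # Skip per-job log/report generation; only output.xml is kept.
        commands.append(
            ["robot", "--outputdir", str(out_dir), "--log", "NONE", "--report", "NONE", suite]
        )
    return commands


def rebot_command(suite_dirs: List[str], results_root: str = "results") -> List[str]:
    """Optional post-processing step that combines the separate outputs into one report."""
    outputs = [
        str(PurePosixPath(results_root) / PurePosixPath(suite).name / "output.xml")
        for suite in suite_dirs
    ]
    return ["rebot", "--outputdir", results_root, "--name", "Regression", *outputs]


suites = ["tests/suites/customers", "tests/suites/orders"]
for command in robot_commands(suites):
    print(" ".join(command))
print(" ".join(rebot_command(suites)))
```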

What the setup looks like

A typical project structure:

project/
├── tests/
│   ├── suites/              # .robot files (test specification)
│   │   ├── customers/
│   │   │   ├── master_data.robot
│   │   │   └── maintenance.robot
│   │   └── orders/
│   ├── keywords/            # Python @keyword functions (mechanics)
│   │   ├── customer_form.py
│   │   └── order_form.py
│   ├── clients/             # Pure Python: API/DB/SAP clients
│   │   ├── customer_client.py
│   │   └── sap_rfc_client.py
│   └── fixtures/            # Setup/teardown, test data
└── pyproject.toml

Three layers, cleanly separated:

suites/ — .robot files with business-level test cases. Readable for business testers. No technical detail.
keywords/ — Python modules with @keyword-decorated functions. Bundle assertions, combine client calls, express domain operations.
clients/ — pure Python clients against the system under test’s interfaces. No RF dependency, usable outside tests too.

This layering is not new — it follows the hexagonal-architecture principle on the test side. It is, however, unusual in standard RF setups, where RF libraries tend to mix the client and keyword layers.
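
To make the clients/ layer more concrete, here is a minimal sketch of what a customer client could look like. Everything in it is an assumption for illustration: the field names mirror the attributes used in the keyword example, the raw dictionary layout is invented, and the injected fetch function stands in for a real REST, SOAP or database call so the sketch runs without a backend. The CustomerClient in the article is constructed without arguments, so a real implementation would differ.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict


@dataclass(frozen=True)
class Address:
    street: str
    zip: str


@dataclass(frozen=True)
class Contact:
    email: str


@dataclass(frozen=True)
class Customer:
    id: str
    status: str
    address: Address
    contact: Contact


class CustomerClient:
    """Pure Python client: no Robot Framework import, usable outside tests."""

    def __init__(self, fetch: Callable[[str], Dict[str, Any]]) -> None:
        self._fetch = fetch  # transport is injected (hypothetical design choice)

    def read(self, customer_id: str) -> Customer:
        raw = self._fetch(customer_id)
        return Customer(
            id=raw["id"],
            status=raw["status"],
            address=Address(street=raw["street"], zip=raw["zip"]),
            contact=Contact(email=raw["email"]),
        )


def fake_fetch(customer_id: str) -> Dict[str, Any]:
    # In-memory stand-in for the system under test.
    return {"id": customer_id, "status": "active", "street": "Main St 1",
            "zip": "8001", "email": "a@example.com"}


print(CustomerClient(fake_fetch).read("C-12345"))
```

The keyword layer then only sees typed Customer objects and never touches the wire format.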

Code example

A business keyword that bundles multiple assertions:

# tests/keywords/customer_form.py
from robot.api.deco import keyword
from tests.clients.customer_client import CustomerClient

@keyword("Verify Customer Form Complete")
def verify_customer_form_complete(customer_id: str) -> None:
    data = CustomerClient().read(customer_id)
    assert data.id == customer_id, f"ID mismatch: {data.id}"
    assert data.status == "active", f"Status: {data.status}"
    assert data.address.zip, "ZIP empty"
    assert data.address.street, "Street empty"
    assert data.contact.email, "Email empty"
    # further business assertions ...

A keyword that composes a workflow:

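# tests/keywords/order_form.py
from robot.api.deco import keyword
# Assumed module path; the project tree above does not list an order client.
from tests.clients.order_client import OrderClient

@keyword("Create Order With Item")
def create_order_with_item(customer_id: str, sku: str, qty: int) -> str:
    # RF converts the qty argument to int based on the type hint.
    client = OrderClient()
    order_id = client.new(customer_id)
    client.add_item(order_id, sku, qty)
    client.save(order_id)
    return order_id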

The .robot file only knows the business keywords:

*** Settings ***
Library    tests.keywords.customer_form
Library    tests.keywords.order_form

*** Test Cases ***
Create Order For Existing Customer
    [Tags]    smoke    orders
    Verify Customer Form Complete    C-12345
    ${order_id}=    Create Order With Item    C-12345    SKU-789    3
    Verify Order Saved    ${order_id}

If CustomerClient switches from SOAP to REST tomorrow, the .robot test stays unchanged. If the form gains a new mandatory field, a single assertion is added inside verify_customer_form_complete — the test itself does not need to be touched.
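
The SOAP-to-REST scenario can be sketched with a shared interface. The names CustomerReader, SoapCustomerClient, RestCustomerClient and verify_customer_active are hypothetical, and both clients return canned data instead of calling zeep or requests, so this shows only the shape of the idea: the verifying function depends on the interface, not on the transport.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass
class Customer:
    id: str
    status: str


class CustomerReader(Protocol):
    def read(self, customer_id: str) -> Customer: ...


class SoapCustomerClient:
    def read(self, customer_id: str) -> Customer:
        # Stand-in for a SOAP call (for example via zeep).
        return Customer(id=customer_id, status="active")


class RestCustomerClient:
    def read(self, customer_id: str) -> Customer:
        # Stand-in for a REST call (for example via requests).
        return Customer(id=customer_id, status="active")


def verify_customer_active(customer_id: str, client: CustomerReader) -> None:
    """Keyword mechanics: identical regardless of which client is passed in."""
    data = client.read(customer_id)
    assert data.id == customer_id, f"ID mismatch: {data.id}"
    assert data.status == "active", f"Status: {data.status}"


for client in (SoapCustomerClient(), RestCustomerClient()):
    verify_customer_active("C-12345", client)
```

In a real suite the keyword would typically construct or look up the client itself, so the .robot file would not need to pass it in.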

Comparison to standard RF

Criterion	Standard RF (with libraries)	RF Thin Layer + Pure Python
Keyword source	RF libraries (SeleniumLibrary, BuiltIn)	Pure Python (Playwright, requests, …)
Debugging	Stack traces stripped, library hooks	Full IDE debugger, pdb, step-through
New library features	Wait for wrapper update	Available immediately
Ecosystem depth	RF plugin landscape	Python ecosystem
Lock-in	High (RF-library specific)	Low (framework-agnostic)
Performance	RF overhead	RF overhead (same)
Team learning curve	RF + library syntax	RF + Python architecture
Audit trail	output.xml / log.html	output.xml / log.html (same)
Migration to pytest	Test rewrite	Keyword reuse

When (and when not) the effort pays off

The pattern earns its keep under three conditions:

Long-lived test suite. A pilot with 100 tests barely justifies the architectural overhead. A suite expected to run for ten years does.
Heterogeneous technologies in the system under test. When tests touch web, REST, SOAP, database, SAP, message queue, and legacy systems, the full Python ecosystem is invaluable. RF libraries don’t cover all layers.
Regulatory evidence requirements. FINMA reviews, GxP, public-sector audits — wherever tests have to serve as documentation for reviewers, business-readable keywords pay back.

What it is not:

Not abandoning Robot Framework. The DSL stays central. If you want to drop RF entirely, pytest or Behave are the better fit.
Not a universal cure. Small suites benefit little from the architectural overhead.
Not a substitute for test discipline. Architecture follows discipline, not the other way round. A team without a clear test strategy will not produce a maintainable suite even with the best pattern.

Migration from an existing standard RF setup proceeds incrementally: write new tests in the pattern, replace existing library calls at natural seams with Python keywords, build out test reuse across components. A big-bang rewrite is risky and usually unnecessary — the two patterns coexist in one suite as long as the rules are clear.
