ISO 25010 and 25059 in the context of software quality
by Rainer Haupt
TL;DR: ISO 25010:2023 defines 9 main quality characteristics (up from 8 in the 2011 edition) with 40 sub-characteristics for software product quality. ISO 25059:2023 extends this model with AI-specific sub-characteristics: 5 new plus 1 modified in product quality, and 2 new in Quality in Use. Together, the two standards form a coherent quality framework that maps systematically to test types and automation tooling.
Reading time approx. 14 min · As of: 2026-04
Making quality measurable without slipping into marketing terms is an old discipline. The SQuaRE framework of the ISO/IEC 25000 series provides the formal basis for it. With the 2023 revision of ISO 25010 and the AI extension ISO 25059 published the same year, the model is current again, including for teams that today integrate LLM components into their applications. This dossier organizes both standards, maps them to concrete test types and tools, and shows where automation carries weight and where it does not.
ISO/IEC 25010:2023 at a glance
All 9 main characteristics and 40 sub-characteristics
| # | Main characteristic | Sub-characteristics |
|---|---|---|
| 1 | Functional Suitability | Functional Completeness · Functional Correctness · Functional Appropriateness |
| 2 | Performance Efficiency | Time Behaviour · Resource Utilization · Capacity |
| 3 | Compatibility | Co-existence · Interoperability |
| 4 | Interaction Capability (formerly Usability) | Appropriateness Recognizability · Learnability · Operability · User Error Protection · User Engagement · Inclusivity · User Assistance · Self-descriptiveness |
| 5 | Reliability | Faultlessness · Availability · Fault Tolerance · Recoverability |
| 6 | Security | Confidentiality · Integrity · Non-repudiation · Accountability · Authenticity · Resistance |
| 7 | Maintainability | Modularity · Reusability · Analysability · Modifiability · Testability |
| 8 | Flexibility (formerly Portability) | Adaptability · Scalability · Installability · Replaceability |
| 9 | Safety (NEW) | Operational Constraint · Risk Identification · Fail Safe · Hazard Warning · Safe Integration |
What changed from 2011 to 2023
| Aspect | 2011 | 2023 |
|---|---|---|
| Main characteristics | 8 | 9 (Safety added) |
| Sub-characteristics | 31 | 40 |
| Usability | Standalone, 6 sub-characteristics | Renamed → Interaction Capability, 8 sub-characteristics |
| Portability | Standalone, 3 sub-characteristics | Renamed → Flexibility, 4 sub-characteristics |
| Safety | Not present | Entirely new, 5 sub-characteristics |
| Quality-in-Use | Inside 25010 | Spun out into ISO 25019:2023 |
| Scope | Software / computer systems only | Extended to all ICT products |
Renamed sub-characteristics: Maturity → Faultlessness (under Reliability), User Interface Aesthetics → User Engagement (under Interaction Capability).
New sub-characteristics besides Safety: Inclusivity (split out of Accessibility), User Assistance (split out of Accessibility), Self-descriptiveness (entirely new), Resistance (new under Security), Scalability (new under Flexibility).
Removed: Accessibility (split into Inclusivity + User Assistance).
Mapping quality characteristic to test type
| Quality characteristic | Test types |
|---|---|
| Functional Suitability | Unit tests · Integration tests · End-to-end tests · API tests · Acceptance tests · Regression tests |
| Performance Efficiency | Load tests · Stress tests · Soak tests · Spike tests · Benchmarks · Capacity tests |
| Compatibility | Cross-browser · Cross-platform · API contract tests · Interoperability tests · Co-existence tests |
| Interaction Capability | Accessibility tests · Usability tests · Visual regression · UX heuristic evaluation |
| Reliability | Chaos engineering · Fault injection · Recovery tests · Availability monitoring · Failover tests · Soak tests |
| Security | SAST · DAST · SCA · IAST · Penetration tests · Vulnerability scanning · Secret detection · Container scanning |
| Maintainability | Static code analysis · Complexity metrics · Duplicate detection · Architecture compliance · Code reviews |
| Flexibility | Installation tests · Migration tests · Configuration tests · Cross-environment tests · Scalability tests |
| Safety | HAZOP · FMEA · Fault tree analysis · Fail-safe verification · Formal verification |
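For the classical characteristics this mapping is direct: a specified expected result exists and can be asserted exactly, which is what makes the first row (Functional Suitability → unit tests) the backbone of CI. A minimal sketch of such a binary Functional Correctness check in pytest style (the VAT function and its test cases are hypothetical examples, not from the standard):

```python
# Sketch of a binary Functional Correctness check (pytest style).
# The function under test and its expected values are hypothetical.

def vat_gross(net: float, rate: float = 0.19) -> float:
    """Return the gross amount for a net price at a given VAT rate."""
    return round(net * (1 + rate), 2)

def test_vat_gross_standard_rate():
    # Classical correctness: the expected result is exactly specified.
    assert vat_gross(100.00) == 119.00

def test_vat_gross_reduced_rate():
    assert vat_gross(100.00, rate=0.07) == 107.00

if __name__ == "__main__":
    test_vat_gross_standard_rate()
    test_vat_gross_reduced_rate()
    print("ok")
```

The contrast to AI systems, where exact expected outputs often do not exist, is picked up in the ISO 25059 sections below.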
Automation potential by characteristic
| Characteristic | Automation level | Reason |
|---|---|---|
| Functional Suitability | Very high | Most mature tooling, CI/CD core |
| Performance Efficiency | Very high | Threshold-based pass/fail gates feasible |
| Maintainability | Very high | Static analysis on every commit |
| Security | High | SAST/DAST/SCA automated; pentests manual |
| Reliability | High | Chaos engineering integrates into CI/CD |
| Compatibility | High | Cross-browser automated; co-existence harder |
| Flexibility | Medium-high | IaC / containers help; migrations complex |
| Interaction Capability | Low-medium | Only ~30–40 % of WCAG issues automatable |
| Safety | Low | Expert-driven analysis; formal verification partial |
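The reason Performance Efficiency automates so well is that its checks collapse to threshold-based pass/fail gates. A sketch of such a gate using a nearest-rank 95th-percentile check (the latency samples and the 2000 ms budget are illustrative, not prescribed by the standard):

```python
# Sketch of a threshold-based performance gate: fail the pipeline
# when the 95th-percentile latency exceeds the budget.
# Sample values and the 2000 ms budget are illustrative.

def p95(samples_ms: list[float]) -> float:
    """Nearest-rank 95th percentile of a list of latencies."""
    ordered = sorted(samples_ms)
    rank = max(0, round(0.95 * len(ordered)) - 1)
    return ordered[rank]

def performance_gate(samples_ms: list[float], budget_ms: float = 2000.0) -> bool:
    return p95(samples_ms) <= budget_ms

latencies = [120.0] * 90 + [2500.0] * 10   # 10 % slow outliers
print(p95(latencies))                      # 2500.0
print(performance_gate(latencies))         # False -> gate blocks release
```

Tools like k6 or Gatling express the same idea as declarative thresholds on percentile metrics.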
ISO/IEC 25059:2023 — quality model for AI systems
Key facts
ISO 25059:2023 (“Quality model for AI systems”) was published in June 2023 as a first edition by JTC 1/SC 42 (Artificial Intelligence). The standard normatively references ISO/IEC 25010:2011 (not 2023) — a DIS revision referencing 25010:2023 is in progress. Scope: an application-specific extension of the SQuaRE framework for AI/ML systems.
Architectural decision
ISO 25059 retains all 8 main characteristics from ISO 25010:2011 and introduces no new ones. Instead it adds 5 new sub-characteristics and modifies 1 existing one in the product-quality model, plus 2 new sub-characteristics in the Quality-in-Use model.
AI-specific sub-characteristics — product quality
| Sub-characteristic | Type | Mapped to | Meaning |
|---|---|---|---|
| Functional Adaptability | NEW | Functional Suitability | Learning from data, adapting to environmental change |
| Functional Correctness | MODIFIED | Functional Suitability | Error rate expected for AI; measure correctness and incorrectness |
| Transparency | NEW | Usability | Information about the AI system available to stakeholders |
| User Controllability | NEW | Usability | User can intervene in AI function in time |
| Robustness | NEW | Reliability | Correctness preserved under adversarial or faulty input |
| Intervenability | NEW | Security | Operator can intervene to prevent harm |
AI-specific sub-characteristics — Quality in Use
| Sub-characteristic | Type | Mapped to | Meaning |
|---|---|---|---|
| Transparency | NEW | Satisfaction | Understanding the system function for end users |
| Societal & Ethical Risk Mitigation | NEW | Freedom from Risk | Fairness, accountability, explainability, privacy, human control, professional responsibility |
Test challenges for AI systems
| Challenge | Cause | Affected characteristic |
|---|---|---|
| Test oracle problem | Expected result unclear in ML | Functional Correctness |
| Non-determinism | Same input ≠ same output | Functional Correctness |
| Concept drift | Data distribution shifts over time | Functional Adaptability |
| Data drift | Input features shift statistically | Functional Adaptability |
| Adversarial attacks | Targeted manipulation of inputs | Robustness |
| Black-box models | Internal logic not inspectable | Transparency |
| Low testability | Low transparency makes testing harder | Transparency |
| Bias detection | Discrimination hidden in training data | Societal Risk Mitigation |
| Time-critical interventions | Autonomous systems react fast | Intervenability, Controllability |
| Model regression | Retraining degrades sub-metrics | Functional Correctness |
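Several of these challenges (test oracle problem, non-determinism, model regression) share one mitigation: replace exact-output assertions with statistical ones over a sample. A sketch with a stubbed, seeded stochastic classifier standing in for a real model (all names, the 95 % stub accuracy, and the tolerance band are illustrative):

```python
import random

# Sketch: testing a non-deterministic component with a statistical
# assertion instead of an exact expected output. The "model" below
# is a stub that answers correctly about 95 % of the time.

def noisy_model(x: int, rng: random.Random) -> int:
    truth = x % 2                          # hypothetical ground truth: parity
    return truth if rng.random() < 0.95 else 1 - truth

def accuracy_over_sample(n: int = 1000, seed: int = 42) -> float:
    rng = random.Random(seed)              # seeded -> reproducible in CI
    hits = sum(noisy_model(x, rng) == x % 2 for x in range(n))
    return hits / n

acc = accuracy_over_sample()
# Statistical oracle: accept an error band instead of exact equality.
assert 0.90 <= acc <= 1.0, f"accuracy {acc} outside tolerated band"
print(f"accuracy={acc:.3f} within band")
```

Seeding the randomness keeps the gate reproducible; the band, not a point value, is the contract.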
Approaches and tools for AI testing
| Quality characteristic | Test approach | Tools |
|---|---|---|
| Functional Correctness | ML metrics (Accuracy, Precision, Recall, F1, AUC-ROC), cross-validation | MLflow, Deepchecks, scikit-learn metrics |
| Functional Adaptability | Data drift monitoring, concept drift detection, retraining validation | Evidently AI, NannyML, WhyLabs, Alibi Detect |
| Robustness | Adversarial testing, noise injection, out-of-distribution detection | IBM ART, CleverHans, Giskard |
| Transparency | Explainability analysis, log inspection, model cards | SHAP, LIME, Fiddler AI |
| User Controllability | Override tests, response time for human intervention | UI test frameworks (Playwright, Cypress) |
| Intervenability | Kill-switch tests, safe-state transition tests | Chaos engineering tools + custom suites |
| Fairness / Bias | Group comparisons, demographic parity, equalised odds | Fairlearn, AIF360, Evidently AI |
| Data Quality | Schema validation, completeness, freshness | Great Expectations, Deequ, Soda Core |
| LLM evaluation | Hallucination detection, toxicity, relevance | Ragas, DeepEval, Evidently AI |
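The ML metrics in the first row all reduce to counts on the confusion matrix. A dependency-free sketch of the definitions (in practice scikit-learn's metrics module computes the same values; the evaluation counts below are hypothetical):

```python
# Precision, recall and F1 computed directly from confusion-matrix
# counts, as used for the Functional Correctness row above.
# scikit-learn's metrics module provides the same values; this
# dependency-free sketch just shows the definitions.

def precision_recall_f1(tp: int, fp: int, fn: int) -> tuple[float, float, float]:
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical evaluation run: 80 true positives, 20 false positives,
# 10 false negatives.
p, r, f1 = precision_recall_f1(tp=80, fp=20, fn=10)
print(round(p, 3), round(r, 3), round(f1, 3))  # 0.8 0.889 0.842
```

A quality gate then compares these values against the targets from the test strategy rather than asserting exact outputs.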
ISO 25010 versus ISO 25059
Structural comparison
| Aspect | ISO 25010:2023 | ISO 25059:2023 |
|---|---|---|
| Scope | All ICT products | AI/ML systems specifically |
| Type | Base standard | Application extension of the base standard |
| Main product-quality characteristics | 9 | 8 (references 25010:2011) |
| Product-quality sub-characteristics | 40 | 36 (31 inherited, 5 new; 1 of the 31 modified) |
| Quality-in-Use | Spun out into ISO 25019 | Included, 5 main characteristics + 2 new sub-characteristics |
| Normative reference | Standalone | Based on ISO 25010:2011 |
| Metrics | Defined in ISO 25023 | Own TS in development (SC 42) |
What ISO 25059 adds
| Added by 25059 | Parent characteristic | Model | Test relevance |
|---|---|---|---|
| Functional Adaptability | Functional Suitability | Product | Data drift monitoring, retraining validation |
| Functional Correctness (modified) | Functional Suitability | Product | ML metrics instead of binary pass/fail |
| Transparency | Usability | Product | Explainability tests, log audits |
| User Controllability | Usability | Product | Override-mechanism tests |
| Robustness | Reliability | Product | Adversarial testing, noise injection |
| Intervenability | Security | Product | Kill-switch tests, emergency scenarios |
| Transparency | Satisfaction | Quality-in-Use | End-user comprehensibility tests |
| Societal & Ethical Risk Mitigation | Freedom from Risk | Quality-in-Use | Bias testing, fairness metrics |
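The Robustness row can be sketched as a noise-injection test: small input perturbations must not flip the model's decision. A minimal illustration (the threshold "model", the noise level, and the inputs are hypothetical stand-ins for a real classifier and a real perturbation budget):

```python
import random

# Sketch of a noise-injection robustness test for the Robustness row:
# small input perturbations should not flip the model's decision.
# The threshold "model" and the noise level are illustrative stand-ins.

def model(score: float) -> int:
    """Hypothetical binary classifier: approve when score >= 0.5."""
    return int(score >= 0.5)

def robustness_rate(inputs, noise=0.02, trials=50, seed=0) -> float:
    rng = random.Random(seed)
    stable = 0
    total = 0
    for x in inputs:
        base = model(x)
        for _ in range(trials):
            total += 1
            stable += model(x + rng.uniform(-noise, noise)) == base
    return stable / total

# Inputs far from the 0.5 decision boundary should be fully stable.
rate = robustness_rate([0.1, 0.3, 0.8, 0.95])
assert rate == 1.0
print(f"stable under noise: {rate:.0%}")
```

Libraries such as IBM ART and Giskard generalise this idea to adversarial (targeted) rather than random perturbations.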
Conceptual differences
| Aspect | Classical software (25010) | AI systems (25059) |
|---|---|---|
| Correctness | Binary: correct or faulty | Probabilistic: error rate expected |
| Behaviour | Deterministic, reproducible | Non-deterministic, learning |
| Test data | Defined test cases | Statistical distributions, drift monitoring |
| Explainability | Code is inspectable | Black-box models, explainability needed |
| Failure analysis | Stack traces, logs | Feature importance, confusion matrices |
| Regression | Code changes cause regression | Data changes cause regression |
| Bias | Not in scope of the standard | Central challenge (Societal Risk) |
| Safety intervention | Standard error mechanisms | Explicit intervenability required |
Practical relevance — tools and test strategy
Tool recommendations by quality characteristic
| Quality characteristic | Recommended tools | Licence / cost |
|---|---|---|
| Functional Suitability | Playwright, Cypress, RestAssured, pytest, JUnit, Robot Framework | All OSS / free |
| Performance Efficiency | k6 (Grafana), Gatling, JMeter, Locust | OSS; enterprise variants available |
| Compatibility | Playwright (multi-browser), BrowserStack, Pact (contract testing) | Pact OSS; BrowserStack from USD 29/mo |
| Interaction Capability | axe-core, Pa11y, Lighthouse, Applitools Eyes | axe / Pa11y OSS; Applitools freemium |
| Reliability | Gremlin, LitmusChaos, Chaos Mesh, ToxiProxy | Litmus / Chaos Mesh OSS; Gremlin from USD 475/yr |
| Security | OWASP ZAP, SonarQube, Snyk, Trivy, GitLeaks | ZAP / Trivy OSS; Snyk freemium |
| Maintainability | SonarQube, ESLint / PMD, ArchUnit, CodeClimate | SonarQube Community OSS |
| Flexibility | Terraform / Ansible test suites, Container Structure Tests, InSpec | All OSS |
| Safety | LDRA, Parasoft, Polyspace (formal verification) | Commercial, high |
| AI: Correctness | MLflow, Deepchecks, scikit-learn metrics | All OSS |
| AI: Adaptability / Drift | Evidently AI, NannyML, WhyLabs | Evidently OSS; WhyLabs freemium |
| AI: Robustness | IBM ART, CleverHans, Giskard | All OSS |
| AI: Transparency | SHAP, LIME, Fiddler AI | SHAP / LIME OSS; Fiddler commercial |
| AI: Fairness / Bias | Fairlearn, AIF360, Evidently AI | All OSS |
| AI: Data Quality | Great Expectations, Deequ, Soda Core | GE / Deequ OSS; Soda freemium |
Building a test strategy on the standards
Phase 1 — prioritise quality characteristics. Identify which ones matter per system; not all 9 are equally important. Banking: prioritise Security and Reliability. Consumer apps: Interaction Capability and Performance. AI systems: also Robustness, Fairness, Transparency.
Phase 2 — define measurable quality goals. Examples:
- Performance: 95 % of requests under 2 seconds
- Reliability: 99.9 % uptime, recovery under 30 seconds
- Security: zero critical OWASP Top 10 findings
- Maintainability: cyclomatic complexity under 15, duplication under 3 %
- AI correctness: F1 score above 0.92 on test set
- AI fairness: demographic parity difference under 0.05
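Phase-2 goals only bite when they are checked mechanically. A sketch that encodes two of the targets above as executable checks (the 0.92 and 0.05 thresholds are the example targets from the list; the measured values fed in are hypothetical):

```python
# Sketch: Phase-2 quality goals expressed as executable checks.
# The F1 and demographic-parity targets come from the examples above;
# the measured values fed in below are hypothetical.

def demographic_parity_difference(selected_a: int, total_a: int,
                                  selected_b: int, total_b: int) -> float:
    """Absolute difference in selection rates between two groups."""
    return abs(selected_a / total_a - selected_b / total_b)

def check_goals(f1: float, dpd: float) -> list[str]:
    failures = []
    if f1 <= 0.92:
        failures.append(f"F1 {f1:.3f} not above 0.92")
    if dpd >= 0.05:
        failures.append(f"demographic parity difference {dpd:.3f} not under 0.05")
    return failures

dpd = demographic_parity_difference(selected_a=46, total_a=100,
                                    selected_b=43, total_b=100)
print(check_goals(f1=0.94, dpd=dpd))   # [] -> both goals met
```

Fairlearn ships the same demographic-parity metric ready-made; the point here is only that a goal without a check is a wish.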
Phase 3 — quality gates in the CI/CD pipeline.
| Pipeline stage | Tests | Tools |
|---|---|---|
| Build | Unit tests, SAST, linting, dependency check | pytest / JUnit, SonarQube, ESLint, Snyk |
| Integration | API tests, contract tests, accessibility | RestAssured, Pact, axe-core |
| Pre-release | Performance, DAST, chaos engineering | k6, OWASP ZAP, LitmusChaos |
| Pre-release (AI) | Model validation, bias check, data quality | Deepchecks, Fairlearn, Great Expectations |
| Production | Monitoring, drift detection, availability | Prometheus / Grafana, Evidently AI, PagerDuty |
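In the pipeline, each stage ultimately collapses to a boolean gate, and a failing gate should skip everything downstream. A minimal sketch of that fail-fast ordering (stage names mirror the table; the pass/fail results fed in are hypothetical stand-ins for real tool exit codes):

```python
# Sketch: CI/CD quality gates as an ordered, fail-fast sequence.
# Stage names mirror the table above; the pass/fail results fed in
# are hypothetical stand-ins for real tool exit codes.

STAGES = ["build", "integration", "pre-release", "production"]

def run_pipeline(results: dict[str, bool]) -> tuple[bool, list[str]]:
    """Run stages in order; stop at the first failing gate."""
    executed = []
    for stage in STAGES:
        executed.append(stage)
        if not results.get(stage, False):
            return False, executed        # fail fast, later stages skipped
    return True, executed

ok, ran = run_pipeline({"build": True, "integration": False})
print(ok, ran)   # False ['build', 'integration']
```

The same structure holds whether the gates are pytest exit codes, SonarQube quality-gate statuses, or k6 threshold results.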
Phase 4 — automate or keep manual.
| Automate | Keep manual |
|---|---|
| Functional regression | Exploratory testing |
| Performance / load tests | Usability tests with real users |
| SAST / DAST / SCA | Deep penetration testing |
| Static code analysis | Safety analysis (HAZOP, FMEA) |
| Accessibility (30–40 %) | Accessibility (60–70 %, screen reader) |
| Cross-browser tests | UX heuristic evaluation |
| Data drift monitoring | Ethical evaluation of AI outputs |
| ML metrics tracking | Explainability evaluation by domain experts |
Phase 5 — continuous monitoring.
| Monitoring metric | Mapped characteristic |
|---|---|
| Response times, throughput | Performance Efficiency |
| Error rates, HTTP 5xx | Functional Suitability |
| Uptime, MTTR | Reliability |
| Security alerts, CVEs | Security |
| Feature drift, prediction drift | AI: Functional Adaptability |
| Fairness metrics over time | AI: Societal Risk Mitigation |
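Feature-drift monitoring often boils down to comparing binned distributions between training time and now. A sketch of the population stability index (PSI), one common drift score (the bin counts are hypothetical; tools such as Evidently AI provide richer drift tests):

```python
import math

# Sketch: Population Stability Index (PSI) over pre-binned feature
# counts, a common score behind "feature drift" monitoring.
# PSI = sum((actual% - expected%) * ln(actual% / expected%)).
# Bin counts below are hypothetical; a common rule of thumb treats
# PSI > 0.2 as significant drift.

def psi(expected_counts: list[int], actual_counts: list[int]) -> float:
    e_total = sum(expected_counts)
    a_total = sum(actual_counts)
    score = 0.0
    for e, a in zip(expected_counts, actual_counts):
        e_pct = max(e / e_total, 1e-6)    # floor avoids log(0) on empty bins
        a_pct = max(a / a_total, 1e-6)
        score += (a_pct - e_pct) * math.log(a_pct / e_pct)
    return score

baseline = [100, 300, 400, 200]           # training-time distribution
current = [100, 300, 400, 200]            # identical -> no drift
shifted = [400, 300, 200, 100]            # mass moved to the first bin
print(round(psi(baseline, current), 4))   # 0.0
print(psi(baseline, shifted) > 0.2)       # True -> raise a drift alert
```

In production this check runs on a schedule against live feature logs and feeds the alerting stack listed in the table.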
Related standards in the SQuaRE ecosystem
| Standard | Content | QA relevance |
|---|---|---|
| ISO 25010:2023 | Product quality model (9 characteristics) | Base framework for test strategy |
| ISO 25019:2023 | Quality-in-Use model (split out of 25010) | User-centred quality |
| ISO 25059:2023 | AI extension of the quality model | Required for AI/ML systems |
| ISO 25023 | Measurement procedures for product quality | Concrete metrics per characteristic |
| ISO 25012 | Data quality model | Basis for data-quality tests |
| ISO/IEC 5259 | Data quality for ML / analytics | AI-specific data quality |
| ISO/IEC TR 29119-11 | Test guide for AI systems | Practical AI test methods |
| ISO/IEC 24029 | Robustness assessment of neural networks | Formal methods for robustness |
| ISO/IEC 42001 | AI management system | Governance framework |
Verdict
ISO 25010 and 25059 are not shelf theory. Anyone building a test strategy gets a complete catalogue of characteristics with mappings to test types and tools, without having to reinvent the vocabulary. The AI extension closes the gap classical product quality left open for learning systems: Robustness, Transparency and Intervenability are not marketing terms but formally defined sub-characteristics with assigned test approaches.
Three recommendations for project use. First, do not weight all 9 characteristics equally — prioritise by domain (banking ≠ consumer app ≠ AI system). Second, formulate measurable quality goals before selecting tools — the standard provides the scaffolding, not the target values. Third, automate where the characteristic has an automatable test approach (Functional Suitability, Performance, Maintainability) — where it does not (Safety, interaction, ethical AI evaluation), expert work remains indispensable.
Sources
- ISO/IEC 25010:2023 — Systems and software Quality Requirements and Evaluation (SQuaRE) — Product quality model
- ISO/IEC 25010:2011 — previous edition
- ISO/IEC 25019:2023 — Quality-in-Use model
- ISO/IEC 25059:2023 — Quality model for AI systems
- arc42 Quality Models — Quality.arc42 repository
- iso25000.com — Overview of ISO 25010
- IEC Blog — New standard ensures quality of AI systems
- ZEISS Digital Innovation — ISO 25010 in practice
- innoQ — ISO 25010 Shortcomings
Building a test strategy on ISO 25010 / 25059, or assessing an existing setup against the standards? In the UTAA workshop we prioritise characteristics and tooling for your project. Read more about the method or get in touch directly.