
ISO 25010 and 25059 in the context of software quality

by Rainer Haupt

TL;DR: ISO 25010:2023 defines 9 main quality characteristics (up from 8 in the 2011 version) with 40 sub-characteristics for software product quality. ISO 25059:2023 extends this model with 6 AI-specific sub-characteristics for AI/ML systems. Together, the two standards form a quality framework that maps systematically to test types and automation tooling.

Reading time approx. 14 min · As of: 2026-04


Making quality measurable without slipping into marketing terms is an old discipline. The SQuaRE framework of the ISO/IEC 25000 series provides the formal basis for it. With the 2023 revision of ISO 25010 and the AI extension ISO 25059 published the same year, the model is current again, including for teams that are integrating LLM components into their applications today. This dossier organizes both standards, maps them to concrete test types and tools, and shows where automation carries weight and where it does not.

ISO/IEC 25010:2023 at a glance

All 9 main characteristics and 40 sub-characteristics

# | Main characteristic | Sub-characteristics
1 | Functional Suitability | Functional Completeness · Functional Correctness · Functional Appropriateness
2 | Performance Efficiency | Time Behaviour · Resource Utilization · Capacity
3 | Compatibility | Co-existence · Interoperability
4 | Interaction Capability (formerly Usability) | Appropriateness Recognizability · Learnability · Operability · User Error Protection · User Engagement · Inclusivity · User Assistance · Self-descriptiveness
5 | Reliability | Faultlessness · Availability · Fault Tolerance · Recoverability
6 | Security | Confidentiality · Integrity · Non-repudiation · Accountability · Authenticity · Resistance
7 | Maintainability | Modularity · Reusability · Analysability · Modifiability · Testability
8 | Flexibility (formerly Portability) | Adaptability · Scalability · Installability · Replaceability
9 | Safety (NEW) | Operational Constraint · Risk Identification · Fail Safe · Hazard Warning · Safe Integration

What changed from 2011 to 2023

Aspect | 2011 | 2023
Main characteristics | 8 | 9 (Safety added)
Sub-characteristics | 31 | 40
Usability | Standalone, 6 sub-characteristics | Renamed → Interaction Capability, 8 sub-characteristics
Portability | Standalone, 3 sub-characteristics | Renamed → Flexibility, 4 sub-characteristics
Safety | Not present | Entirely new, 5 sub-characteristics
Quality in Use | Inside 25010 | Spun out into ISO 25019:2023
Scope | Software / computer systems only | Extended to all ICT products

Renamed sub-characteristics: Maturity → Faultlessness (under Reliability), User Interface Aesthetics → User Engagement (under Interaction Capability).

New sub-characteristics besides Safety: Inclusivity (split out of Accessibility), User Assistance (split out of Accessibility), Self-descriptiveness (entirely new), Resistance (new under Security), Scalability (new under Flexibility).

Removed: Accessibility (split into Inclusivity + User Assistance).

Mapping quality characteristic to test type

Quality characteristic | Test types
Functional Suitability | Unit tests · Integration tests · End-to-end tests · API tests · Acceptance tests · Regression tests
Performance Efficiency | Load tests · Stress tests · Soak tests · Spike tests · Benchmarks · Capacity tests
Compatibility | Cross-browser · Cross-platform · API contract tests · Interoperability tests · Co-existence tests
Interaction Capability | Accessibility tests · Usability tests · Visual regression · UX heuristic evaluation
Reliability | Chaos engineering · Fault injection · Recovery tests · Availability monitoring · Failover tests · Soak tests
Security | SAST · DAST · SCA · IAST · Penetration tests · Vulnerability scanning · Secret detection · Container scanning
Maintainability | Static code analysis · Complexity metrics · Duplicate detection · Architecture compliance · Code reviews
Flexibility | Installation tests · Migration tests · Configuration tests · Cross-environment tests · Scalability tests
Safety | HAZOP · FMEA · Fault tree analysis · Fail-safe verification · Formal verification

Automation potential by characteristic

Characteristic | Automation level | Reason
Functional Suitability | Very high | Most mature tooling, CI/CD core
Performance Efficiency | Very high | Threshold-based pass/fail gates feasible
Maintainability | Very high | Static analysis on every commit
Security | High | SAST/DAST/SCA automated; pentests manual
Reliability | High | Chaos engineering integrates into CI/CD
Compatibility | High | Cross-browser automated; co-existence harder
Flexibility | Medium-high | IaC / containers help; migrations complex
Interaction Capability | Low-medium | Only ~30–40 % of WCAG issues automatable
Safety | Low | Expert-driven analysis; formal verification partial

ISO/IEC 25059:2023 — quality model for AI systems

Key facts

ISO 25059:2023 (“Quality model for AI systems”) was published in June 2023 as a first edition by JTC 1/SC 42 (Artificial Intelligence). The standard normatively references ISO/IEC 25010:2011 (not 2023) — a DIS revision referencing 25010:2023 is in progress. Scope: an application-specific extension of the SQuaRE framework for AI/ML systems.

Architectural decision

ISO 25059 retains all 8 main characteristics from ISO 25010:2011 and adds no new ones. Instead, it adds five new sub-characteristics and modifies one existing one in the product-quality model, plus two new sub-characteristics in the Quality-in-Use model.

AI-specific sub-characteristics — product quality

Sub-characteristic | Type | Mapped to | Meaning
Functional Adaptability | NEW | Functional Suitability | Learning from data, adapting to environmental change
Functional Correctness | MODIFIED | Functional Suitability | An error rate is expected for AI; measure correctness and incorrectness
Transparency | NEW | Usability | Information about the AI system is available to stakeholders
User Controllability | NEW | Usability | Users can intervene in the AI function in time
Robustness | NEW | Reliability | Correctness preserved under adversarial or faulty input
Intervenability | NEW | Security | Operators can intervene to prevent harm
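Robustness in the sense above can be probed directly by noise injection: perturb each input and check whether the prediction flips. A minimal, dependency-free sketch; the threshold "model" and all numbers are hypothetical stand-ins for a real inference call:

```python
import random

def predict(x):
    """Hypothetical stand-in for a model call: a fixed threshold classifier."""
    return 1 if x >= 0.5 else 0

def robustness_under_noise(inputs, sigma=0.05, trials=100, seed=42):
    """Share of noisy predictions that agree with the clean prediction."""
    rng = random.Random(seed)
    stable = total = 0
    for x in inputs:
        clean = predict(x)
        for _ in range(trials):
            stable += predict(x + rng.gauss(0.0, sigma)) == clean
            total += 1
    return stable / total

# Inputs near the decision boundary (0.49) flip most often under noise.
score = robustness_under_noise([0.1, 0.3, 0.49, 0.7, 0.9])
print(f"robustness: {score:.2f}")
```

The same pattern scales to real models: replace predict with an inference call and the Gaussian perturbation with domain-appropriate corruptions (pixel noise, typos, paraphrases).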

AI-specific sub-characteristics — Quality in Use

Sub-characteristic | Type | Mapped to | Meaning
Transparency | NEW | Satisfaction | End users can understand the system's function
Societal & Ethical Risk Mitigation | NEW | Freedom from Risk | Fairness, accountability, explainability, privacy, human control, professional responsibility

Test challenges for AI systems

Challenge | Cause | Affected characteristic
Test oracle problem | Expected result unclear in ML | Functional Correctness
Non-determinism | Same input ≠ same output | Functional Correctness
Concept drift | Data distribution shifts over time | Functional Adaptability
Data drift | Input features shift statistically | Functional Adaptability
Adversarial attacks | Targeted manipulation of inputs | Robustness
Black-box models | Internal logic not inspectable | Transparency
Low testability | Low transparency makes testing harder | Transparency
Bias detection | Discrimination hidden in training data | Societal Risk Mitigation
Time-critical interventions | Autonomous systems react fast | Intervenability, Controllability
Model regression | Retraining degrades sub-metrics | Functional Correctness
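The first two challenges, a missing exact oracle and non-reproducible outputs, are commonly handled by asserting on an aggregate metric over a labelled set instead of exact outputs per example. A sketch of such a statistical gate, with a hypothetical flaky model standing in for a real one:

```python
import random

def aggregate_accuracy(model, test_set):
    """Share of labelled examples the model gets right in one pass."""
    return sum(model(x) == y for x, y in test_set) / len(test_set)

def statistical_oracle(model, test_set, min_accuracy=0.90, runs=5):
    """Gate on aggregate accuracy over several runs instead of asserting exact
    outputs per example; exact oracles break for non-deterministic models."""
    worst = min(aggregate_accuracy(model, test_set) for _ in range(runs))
    if worst < min_accuracy:
        raise AssertionError(f"accuracy {worst:.2f} below gate {min_accuracy}")
    return worst

# Hypothetical flaky model: answers correctly about 95% of the time.
rng = random.Random(0)
flaky_model = lambda x: x if rng.random() < 0.95 else 1 - x
labelled = [(i % 2, i % 2) for i in range(500)]
print(f"worst-run accuracy: {statistical_oracle(flaky_model, labelled):.2f}")
```

Taking the worst of several runs makes the gate conservative against non-determinism; the threshold itself comes from the quality goals, not from the standard.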

Approaches and tools for AI testing

Quality characteristic | Test approach | Tools
Functional Correctness | ML metrics (Accuracy, Precision, Recall, F1, AUC-ROC), cross-validation | MLflow, Deepchecks, scikit-learn metrics
Functional Adaptability | Data drift monitoring, concept drift detection, retraining validation | Evidently AI, NannyML, WhyLabs, Alibi Detect
Robustness | Adversarial testing, noise injection, out-of-distribution detection | IBM ART, CleverHans, Giskard
Transparency | Explainability analysis, log inspection, model cards | SHAP, LIME, Fiddler AI
User Controllability | Override tests, response time for human intervention | UI test frameworks (Playwright, Cypress)
Intervenability | Kill-switch tests, safe-state transition tests | Chaos engineering tools + custom suites
Fairness / Bias | Group comparisons, demographic parity, equalised odds | Fairlearn, AIF360, Evidently AI
Data Quality | Schema validation, completeness, freshness | Great Expectations, Deequ, Soda Core
LLM evaluation | Hallucination detection, toxicity, relevance | Ragas, DeepEval, Evidently AI
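The listed fairness tools compute metrics such as demographic parity difference; the metric itself is simple enough to illustrate without any dependency. A sketch with hypothetical binary predictions for two groups:

```python
def demographic_parity_difference(y_pred, groups):
    """Gap in positive-prediction rate between the best- and worst-treated
    group; 0.0 means all groups receive positive predictions at the same rate."""
    rates = {}
    for g in set(groups):
        preds = [p for p, grp in zip(y_pred, groups) if grp == g]
        rates[g] = sum(preds) / len(preds)
    return max(rates.values()) - min(rates.values())

# Hypothetical predictions for two demographic groups A and B.
y_pred = [1, 0, 1, 1, 0, 1, 0, 0, 1, 0]
groups = ["A", "A", "A", "A", "A", "B", "B", "B", "B", "B"]
print(round(demographic_parity_difference(y_pred, groups), 2))  # prints 0.2
```

In practice a library such as Fairlearn adds confidence intervals and more metrics (equalised odds, error-rate parity) on top of this basic group comparison.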

ISO 25010 versus ISO 25059

Structural comparison

Aspect | ISO 25010:2023 | ISO 25059:2023
Scope | All ICT products | AI/ML systems specifically
Type | Base standard | Application extension of the base standard
Main product-quality characteristics | 9 | 8 (references 25010:2011)
Product-quality sub-characteristics | 40 | 31 + 5 new = 36 (1 of them modified)
Quality in Use | Spun out into ISO 25019 | Included, 5 main characteristics + 2 new sub-characteristics
Normative reference | Standalone | Based on ISO 25010:2011
Metrics | Defined in ISO 25023 | Own TS in development (SC 42)

What ISO 25059 adds

Added by 25059 | Parent characteristic | Model | Test relevance
Functional Adaptability | Functional Suitability | Product | Data drift monitoring, retraining validation
Functional Correctness (modified) | Functional Suitability | Product | ML metrics instead of binary pass/fail
Transparency | Usability | Product | Explainability tests, log audits
User Controllability | Usability | Product | Override-mechanism tests
Robustness | Reliability | Product | Adversarial testing, noise injection
Intervenability | Security | Product | Kill-switch tests, emergency scenarios
Transparency | Satisfaction | Quality in Use | End-user comprehensibility tests
Societal & Ethical Risk Mitigation | Freedom from Risk | Quality in Use | Bias testing, fairness metrics

Conceptual differences

Aspect | Classical software (25010) | AI systems (25059)
Correctness | Binary: correct or faulty | Probabilistic: an error rate is expected
Behaviour | Deterministic, reproducible | Non-deterministic, learning
Test data | Defined test cases | Statistical distributions, drift monitoring
Explainability | Code is inspectable | Black-box models, explainability needed
Failure analysis | Stack traces, logs | Feature importance, confusion matrices
Regression | Code changes cause regression | Data changes cause regression
Bias | Not in scope of the standard | Central challenge (Societal Risk)
Safety intervention | Standard error mechanisms | Explicit intervenability required

Practical relevance — tools and test strategy

Tool recommendations by quality characteristic

Quality characteristic | Recommended tools | Licence / cost
Functional Suitability | Playwright, Cypress, RestAssured, pytest, JUnit, Robot Framework | All OSS / free
Performance Efficiency | k6 (Grafana), Gatling, JMeter, Locust | OSS; enterprise variants available
Compatibility | Playwright (multi-browser), BrowserStack, Pact (contract testing) | Pact OSS; BrowserStack from USD 29/mo
Interaction Capability | axe-core, Pa11y, Lighthouse, Applitools Eyes | axe / Pa11y OSS; Applitools freemium
Reliability | Gremlin, LitmusChaos, Chaos Mesh, ToxiProxy | Litmus / Chaos Mesh OSS; Gremlin from USD 475/yr
Security | OWASP ZAP, SonarQube, Snyk, Trivy, GitLeaks | ZAP / Trivy OSS; Snyk freemium
Maintainability | SonarQube, ESLint / PMD, ArchUnit, CodeClimate | SonarQube Community OSS
Flexibility | Terraform / Ansible test suites, Container Structure Tests, InSpec | All OSS
Safety | LDRA, Parasoft, Polyspace (formal verification) | Commercial, high
AI: Correctness | MLflow, Deepchecks, scikit-learn metrics | All OSS
AI: Adaptability / Drift | Evidently AI, NannyML, WhyLabs | Evidently OSS; WhyLabs freemium
AI: Robustness | IBM ART, CleverHans, Giskard | All OSS
AI: Transparency | SHAP, LIME, Fiddler AI | SHAP / LIME OSS; Fiddler commercial
AI: Fairness / Bias | Fairlearn, AIF360, Evidently AI | All OSS
AI: Data Quality | Great Expectations, Deequ, Soda Core | GE / Deequ OSS; Soda freemium

Building a test strategy on the standards

Phase 1 — prioritise quality characteristics. Identify which ones matter per system; not all 9 are equally important. Banking: prioritise Security and Reliability. Consumer apps: Interaction Capability and Performance. AI systems: also Robustness, Fairness, Transparency.

Phase 2 — define measurable quality goals. Examples:

  • Performance: 95 % of requests under 2 seconds
  • Reliability: 99.9 % uptime, recovery under 30 seconds
  • Security: zero critical OWASP Top 10 findings
  • Maintainability: cyclomatic complexity under 15, duplication under 3 %
  • AI correctness: F1 score above 0.92 on test set
  • AI fairness: demographic parity difference under 0.05
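Goals phrased like this become enforceable as a release gate: each metric gets a predicate, and the build fails if any predicate fails. A sketch with hypothetical measured values mirroring the thresholds above:

```python
# Hypothetical measured values next to the thresholds they must satisfy.
GATES = {
    "p95_latency_s":     (lambda v: v < 2.0,   1.4),
    "uptime_pct":        (lambda v: v >= 99.9, 99.95),
    "critical_findings": (lambda v: v == 0,    0),
    "max_complexity":    (lambda v: v < 15,    11),
    "duplication_pct":   (lambda v: v < 3.0,   1.8),
    "f1_score":          (lambda v: v > 0.92,  0.94),
    "parity_difference": (lambda v: v < 0.05,  0.03),
}

def evaluate_gates(gates):
    """Return the names of all failed gates; an empty list means release."""
    return [name for name, (check, value) in gates.items() if not check(value)]

failed = evaluate_gates(GATES)
print("PASS" if not failed else f"FAIL: {', '.join(failed)}")  # prints "PASS"
```

In a real pipeline the measured values would come from the tooling of phase 3 (k6, SonarQube, Deepchecks, …); only the thresholds live in the gate definition.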

Phase 3 — quality gates in the CI/CD pipeline.

Pipeline stage | Tests | Tools
Build | Unit tests, SAST, linting, dependency check | pytest / JUnit, SonarQube, ESLint, Snyk
Integration | API tests, contract tests, accessibility | RestAssured, Pact, axe-core
Pre-release | Performance, DAST, chaos engineering | k6, OWASP ZAP, LitmusChaos
Pre-release (AI) | Model validation, bias check, data quality | Deepchecks, Fairlearn, Great Expectations
Production | Monitoring, drift detection, availability | Prometheus / Grafana, Evidently AI, PagerDuty

Phase 4 — automate or keep manual.

Automate | Keep manual
Functional regression | Exploratory testing
Performance / load tests | Usability tests with real users
SAST / DAST / SCA | Deep penetration testing
Static code analysis | Safety analysis (HAZOP, FMEA)
Accessibility (30–40 %) | Accessibility (60–70 %, screen reader)
Cross-browser tests | UX heuristic evaluation
Data drift monitoring | Ethical evaluation of AI outputs
ML metrics tracking | Explainability evaluation by domain experts

Phase 5 — continuous monitoring.

Monitoring metric | Mapped characteristic
Response times, throughput | Performance Efficiency
Error rates, HTTP 5xx | Functional Suitability
Uptime, MTTR | Reliability
Security alerts, CVEs | Security
Feature drift, prediction drift | AI: Functional Adaptability
Fairness metrics over time | AI: Societal Risk Mitigation
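Feature and prediction drift are often scored with the Population Stability Index (PSI), which the listed drift tools compute out of the box. A dependency-free sketch of the metric on synthetic data:

```python
import math
import random

def psi(expected, actual, bins=10):
    """Population Stability Index between a reference and a live sample.
    Rule of thumb: < 0.1 stable, 0.1-0.25 moderate drift, > 0.25 drifted."""
    lo = min(min(expected), min(actual))
    hi = max(max(expected), max(actual))
    width = (hi - lo) / bins or 1.0

    def bin_shares(sample):
        counts = [0] * bins
        for x in sample:
            counts[min(int((x - lo) / width), bins - 1)] += 1
        # A small floor keeps log() defined for empty bins.
        return [max(c / len(sample), 1e-4) for c in counts]

    e, a = bin_shares(expected), bin_shares(actual)
    return sum((ai - ei) * math.log(ai / ei) for ei, ai in zip(e, a))

# Synthetic example: reference window vs. a mean-shifted live window.
rng = random.Random(1)
reference = [rng.gauss(0.0, 1.0) for _ in range(1000)]
shifted = [rng.gauss(0.5, 1.0) for _ in range(1000)]
print(f"PSI vs shifted window: {psi(reference, shifted):.3f}")
```

The thresholds in the docstring are conventional rules of thumb, not part of any ISO standard; production monitoring would evaluate PSI per feature on a schedule and alert above the chosen threshold.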
Related standards

Standard | Content | QA relevance
ISO 25010:2023 | Product quality model (9 characteristics) | Base framework for test strategy
ISO 25019:2023 | Quality-in-Use model (split out of 25010) | User-centred quality
ISO 25059:2023 | AI extension of the quality model | Required for AI/ML systems
ISO 25023 | Measurement procedures for product quality | Concrete metrics per characteristic
ISO 25012 | Data quality model | Basis for data-quality tests
ISO/IEC 5259 | Data quality for ML / analytics | AI-specific data quality
ISO/IEC TR 29119-11 | Test guide for AI systems | Practical AI test methods
ISO/IEC 24029 | Robustness assessment of neural networks | Formal methods for robustness
ISO/IEC 42001 | AI management system | Governance framework

Verdict

ISO 25010 and 25059 are not theory for the shelf. Anyone building a test strategy gets a full catalogue of characteristics with mappings to test types and tools, without having to reinvent the vocabulary. The AI extension closes the gap that classical product quality left open for learning systems: Robustness, Transparency and Intervenability are not marketing terms but formally defined sub-characteristics with assigned test approaches.

Three recommendations for project use. First, do not weight all 9 characteristics equally — prioritise by domain (banking ≠ consumer app ≠ AI system). Second, formulate measurable quality goals before selecting tools — the standard provides the scaffolding, not the target values. Third, automate where the characteristic has an automatable test approach (Functional Suitability, Performance, Maintainability) — where it does not (Safety, interaction, ethical AI evaluation), expert work remains indispensable.
