Fairness as a Design Problem: Why We Need Design Science Research in AI
What Hevner knew, and we should too.
Imagine you’re asked to build a credit scoring system. You have data, you know how to train models, and you can compute fairness metrics. But there’s a question no metric will answer for you: what does “fair” actually mean in this specific context? What trade-offs are you willing to make between accuracy and equity? How do you document those choices so they’re not arbitrary? Fairness is not just an ex post evaluation problem; it’s a socio-technical design problem.
This is where Design Science Research (DSR) comes in: a methodological approach that treats algorithmic systems as designed artifacts, not just trained models. DSR gives us a structure to surface normative choices, iterate on design, and build systems that can be governed and audited. This post explains what DSR is, why it matters for fairness, and how it changes the way we think about building algorithmic systems.
The Problem DSR Tries to Solve
Traditional empirical science aims to explain the world: we observe phenomena, build theories, make predictions. In machine learning, we train models on historical data and evaluate performance. But what happens when the core issue isn’t explaining what exists, but designing what should exist?
Algorithmic systems are not neutral. They encode assumptions about what matters, what to optimize, and how decisions should be made. When you build a fraud detection model, you are not just “learning from data”: you are deciding what counts as fraud, what false positive rate is acceptable, and who bears the cost of those errors. These are design decisions, and no dataset will make them for you.
DSR starts from the recognition that many complex problems require intervention and construction to generate useful knowledge. It’s not enough to describe how things work; we need to design artifacts that behave in certain ways, test them in context, learn from their failures, and refine the design.
The Logic of Design Science Research
DSR is structured around an iterative cycle with four core elements:
Relevant problem: the work starts from a concrete need in a real institutional context. This is not an abstract optimization puzzle, but a situation where intervention is required.
Artifact: the system, model, method, or tool built to address the problem. In AI, this might be an algorithm, a data pipeline, or a full decision-support system.
Evaluation: the artifact is tested in its intended context of use. Evaluation is not only technical (accuracy, AUC) but also about usefulness, effectiveness, and fit to the problem.
Iteration and refinement: evaluation results feed back into the next design cycle. Knowledge is generated by building, failing, adjusting, and re-testing.
The crucial point is that knowledge emerges from the process of designing and evaluating, not just from observing. In Hevner’s well-known formulation, DSR involves three interconnected cycles: a relevance cycle (does this solve a real problem?), a design cycle (build–evaluate–refine), and a rigor cycle (does it draw on existing knowledge and produce generalizable insights?).
DSR is not “let’s just hack and see what sticks.” It’s a disciplined form of design: every decision has a rationale, is documented, and is evaluated against explicit criteria.
Fairness as a Design Problem
Now let’s connect this to fairness. Most technical approaches treat fairness as a measurement or correction problem: you train a model, compute some fairness metrics (demographic parity, equalized odds), and tweak weights or thresholds to “fix” it. But this runs into fundamental issues:
Fairness metrics are often mutually incompatible; you can’t optimize them all at once. Choosing a metric is already a normative decision (the sketch after this list makes the two metrics named above concrete).
Bias is not only in the model: it’s in the data, in the target definition, in how the system is used, and in the institutional context around it.
Fairness cannot be “bolted on at the end”: if the system was designed to optimize accuracy without considering equity, ex post corrections are usually fragile patches.
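To see the first point in code, here is a minimal sketch on invented toy data. Demographic parity compares positive-prediction rates across groups; equalized odds compares error rates (TPR and FPR) across groups. When base rates differ between groups, a non-trivial classifier generally cannot drive both gaps to zero at once, so choosing which one to minimize is itself a normative call. All numbers below are made up for illustration.

```python
import numpy as np

# Invented toy data: true labels y, a binary sensitive attribute a,
# and predictions from a deliberately skewed classifier.
rng = np.random.default_rng(0)
n = 1000
y = rng.integers(0, 2, size=n)
a = rng.integers(0, 2, size=n)
y_hat = np.where(a == 1, rng.random(n) < 0.6, rng.random(n) < 0.4).astype(int)

def demographic_parity_gap(y_hat, a):
    """|P(y_hat = 1 | a = 0) - P(y_hat = 1 | a = 1)|"""
    return abs(y_hat[a == 0].mean() - y_hat[a == 1].mean())

def equalized_odds_gap(y, y_hat, a):
    """Largest between-group gap in TPR (y = 1) or FPR (y = 0)."""
    gaps = []
    for label in (0, 1):
        rates = [y_hat[(a == g) & (y == label)].mean() for g in (0, 1)]
        gaps.append(abs(rates[0] - rates[1]))
    return max(gaps)

print(f"demographic parity gap: {demographic_parity_gap(y_hat, a):.3f}")
print(f"equalized odds gap:     {equalized_odds_gap(y, y_hat, a):.3f}")
```

Pushing one of these gaps toward zero typically moves the other; which one you privilege has to be argued for, not assumed.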
Fairness is a socio-technical design problem because it involves decisions about:
What to optimize, and for whom
Which errors are tolerable, and which are unacceptable
How the system will be used in practice
Who has a voice in making these decisions
DSR provides a framework to handle these issues systematically, by making design options explicit and documenting trade-offs along the way.
Examples: Design and Fairness in Context
Credit Scoring Systems
A credit risk model learns patterns of default from historical data. If some groups have historically had less access to formal credit, the model may learn to associate their characteristics with “risk,” perpetuating exclusion. The problem is not just statistical bias; it’s that the system is designed to reproduce the past, without questioning whether that past is just.
A DSR perspective treats the system design as including decisions such as:
Which variables define “risk” and which are off-limits or sensitive
How to balance institutional risk with equitable access to credit
What appeal mechanisms exist for rejected applicants
How to assess the impact of the system on historically excluded groups
These are not questions the data can answer on their own. They are resolved by designing the system around explicit criteria, testing it in context, and adapting it based on real-world effects. One lightweight way to keep those criteria explicit is sketched below.
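The idea is simply to record each fairness-relevant choice as a structured design decision rather than leaving it implicit in code. The schema and the example entry below are entirely hypothetical: a sketch of the practice, not an established standard.

```python
from dataclasses import dataclass, field
from datetime import date

@dataclass
class DesignDecision:
    """One explicit design choice with its rationale.
    Hypothetical schema; adapt fields to your own governance process."""
    decision: str
    rationale: str
    alternatives_considered: list[str]
    evaluation_criterion: str
    decided_on: date
    owners: list[str] = field(default_factory=list)

# Invented example entry for the credit scoring discussion above.
decisions = [
    DesignDecision(
        decision="Exclude ZIP code as a feature",
        rationale="Strong proxy for protected groups in this portfolio",
        alternatives_considered=["Keep with monitoring", "Coarsen to region"],
        evaluation_criterion="Approval-rate gap between groups < 5 pp",
        decided_on=date(2024, 3, 1),
        owners=["risk-team", "fairness-review-board"],
    ),
]

for d in decisions:
    print(f"{d.decided_on} | {d.decision} | criterion: {d.evaluation_criterion}")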
Automated Fraud Detection
A fraud detection system learns to flag “suspicious” transactions. But “suspicious” often correlates with minority patterns of behavior: recent migrants, people with non-standard spending profiles, or those in informal economies. The outcome can be a system that disproportionately flags vulnerable groups.
Designing such a system involves:
Defining what counts as an “anomalous” transaction and relative to which reference population
Deciding what cost of false positives is acceptable (e.g., blocking legitimate accounts)
Setting up audit mechanisms to detect disproportionate impact on specific groups (a minimal version of such a check is sketched below)
DSR frames these as core parts of the design–evaluate cycle, not as side effects to be patched later.
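As an illustration of the third item in that list, here is a minimal disparity check, assuming you have the system’s flags, ground-truth fraud labels, and a group attribute. The tolerance ratio is an invented placeholder; setting it for real is exactly the kind of decision DSR asks you to justify and document.

```python
import numpy as np

def false_positive_rate(flagged, is_fraud):
    """Share of legitimate transactions that were flagged."""
    legit = ~is_fraud
    return flagged[legit].mean() if legit.any() else float("nan")

def audit_flagging(flagged, is_fraud, group, max_fpr_ratio=1.25):
    """Compare per-group false positive rates to the best-off group.
    max_fpr_ratio is a hypothetical tolerance, not an established standard."""
    fprs = {g: false_positive_rate(flagged[group == g], is_fraud[group == g])
            for g in np.unique(group)}
    baseline = min(fprs.values())
    report = {g: {"fpr": fpr,
                  "ratio": fpr / baseline if baseline > 0 else float("inf")}
              for g, fpr in fprs.items()}
    over = [g for g, r in report.items() if r["ratio"] > max_fpr_ratio]
    return report, over

# Invented toy data: group "B" legitimate transactions are flagged more often.
rng = np.random.default_rng(1)
n = 2000
group = rng.choice(["A", "B"], size=n)
is_fraud = rng.random(n) < 0.02
flag_prob = np.where(group == "B", 0.10, 0.04)
flagged = rng.random(n) < np.where(is_fraud, 0.8, flag_prob)

report, over = audit_flagging(flagged, is_fraud, group)
print(report)
print("Groups exceeding tolerance:", over)
```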
Public Resource Allocation
Think of a system assigning health inspections, medical appointments, or housing subsidies. If it learns from historical data where certain neighborhoods have been more heavily monitored, it can create a feedback loop: more inspections generate more recorded violations, which then justify even more inspections.
Here, design must incorporate mechanisms to:
Break discriminatory feedback loops
Embed equity criteria into prioritization rules
Audit territorial and demographic impacts of the system
DSR pushes these design choices to the foreground from the outset, instead of treating them as “unintended consequences” discovered after deployment. The toy simulation below shows how quickly such a loop can entrench itself.
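In this simulation, with entirely invented numbers, both neighborhoods have the same true violation rate; the only difference is the historical record, and a greedy “inspect where records are highest” rule turns that difference into a runaway gap.

```python
import numpy as np

# Two neighborhoods with the SAME true violation rate. Neighborhood 0 starts
# with more recorded violations only because it was monitored more heavily.
true_rate = np.array([0.10, 0.10])   # invented, identical on purpose
recorded = np.array([60.0, 20.0])    # invented initial counts
per_round = 100                      # inspections available each round

for _ in range(10):
    # Greedy prioritization: send all inspections where records are highest.
    target = int(np.argmax(recorded))
    recorded[target] += per_round * true_rate[target]

print("Recorded violations after 10 rounds:", recorded)
# Neighborhood 0 now "justifies" even more scrutiny, despite equal behavior.
```

A design response might reserve a share of inspections for randomized allocation, independent of history; whether and how much to randomize is, again, a documented design choice rather than a tuning detail.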
Why DSR Is Particularly Useful for Fairness
DSR reshapes how we talk about fairness in at least three important ways:
It Makes Normative Assumptions Explicit
Every algorithmic system takes stances on what to optimize, which trade-offs to accept, and which impacts are tolerable. DSR requires you to document those choices because you cannot build an artifact without stating what problem it addresses and by which criteria you will judge it. This reduces algorithmic arbitrariness: decisions don’t remain hidden in hyperparameters or architectures, but become part of a justified design.
It Embeds Evaluation into the Design Cycle
Instead of training a model and only then checking whether it’s fair, DSR treats fairness evaluation as an integral, iterative part of design. You test the system in context, observe its effects, adjust the design, and re-evaluate. This is especially important for fairness because many impacts are emergent and cannot be fully anticipated by pre-deployment metrics.
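In code, “evaluation inside the design cycle” can be as simple as a loop in which an explicit criterion decides whether another refinement round is needed. The sketch below is deliberately simplified and all names and numbers are invented: each cycle measures an approval-rate gap and feeds the result back into a design parameter (here, per-group thresholds, themselves a normatively and legally contestable choice).

```python
import numpy as np

rng = np.random.default_rng(2)

# Invented scores: the model systematically scores group 1 lower.
n = 5000
group = rng.integers(0, 2, size=n)
scores = rng.normal(loc=np.where(group == 1, 0.45, 0.55), scale=0.1)

thresholds = np.array([0.5, 0.5])   # one per group; a design parameter
tolerance, step = 0.02, 0.01        # invented evaluation criterion

for cycle in range(50):
    approve = scores >= thresholds[group]
    rates = np.array([approve[group == g].mean() for g in (0, 1)])
    gap = rates[0] - rates[1]
    if abs(gap) <= tolerance:       # evaluation criterion met: stop iterating
        break
    # Refine the design: nudge the disadvantaged group's threshold down.
    thresholds[int(gap > 0)] -= step

print(f"cycles: {cycle + 1}, approval rates: {np.round(rates, 3)}")
```

The point is not this particular adjustment rule but the shape of the process: evaluation is wired into the loop, and the stopping criterion is explicit and inspectable.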
It Supports Governance and Accountability
When a system is documented as a designed artifact (with explicit decisions, evaluation criteria, and refinement cycles), it becomes more auditable. You can trace why a given design decision was made, which alternatives were considered, and how its impact was evaluated. DSR doesn’t guarantee that a system is fair, but it does help ensure that fairness-relevant decisions are traceable and justifiable.
Fairness as a Design Discipline
Fairness is not a single number. It is not something you “fix” by tuning a threshold at the end of your pipeline. It’s a set of design choices that shape how an algorithmic system distributes opportunities, risks, and resources.
DSR offers a practical logic for handling those choices systematically: design with explicit criteria, evaluate in context, iterate based on real-world effects, and document the entire process. For people building algorithmic systems in institutional settings, adopting DSR is not just a methodological shift; it’s an acknowledgment that we are designing decision infrastructures, not just training models. And that calls for a different discipline: not only knowing how to code and optimize, but also how to design systems that can be governed, audited, and justified.
References and Suggested Readings
Hevner, A. R., March, S. T., Park, J., & Ram, S. (2004). Design Science in Information Systems Research. MIS Quarterly, 28(1), 75–105.
Hevner, A. R. (2007). A Three Cycle View of Design Science Research. Scandinavian Journal of Information Systems, 19(2), 87–92.
Barocas, S., Hardt, M., & Narayanan, A. (2023). Fairness and Machine Learning: Limitations and Opportunities. MIT Press. (Open-access version at fairmlbook.org.)
Kaas, M. H. L., et al. (2024). Fair by design: A sociotechnical approach to justifying the fairness of AI-enabled systems across the lifecycle.
Katzenbach, C., & Ulbricht, L. (2019). Algorithmic governance. Internet Policy Review, 8(4).
Bovens, M. (2007). Analysing and Assessing Accountability: A Conceptual Framework. European Law Journal, 13(4), 447–468.