Alethic AI

Beyond Alignment

Alethic AI and the Real Values Topography

Robb Smith  ·  Institute of Applied Metatheory  ·  Working Paper  ·  March 2026


Abstract

The AI alignment problem is conventionally understood as a technical challenge. This paper argues it is a philosophical problem that has become technically visible—and that it is structurally identical to late-modern civilization’s inability to adjudicate normative claims across perspectives. Both failures share a common root: an implicit irrealism about values that treats them as preferences to be aggregated rather than real features of a stratified domain to be navigated. Drawing on critical realism, process philosophy, emergent naturalism, and integral post-metaphysics, the paper develops a value depth ontology in which the accumulated temporal onto-epistemic structure of natural complexity and human sensemaking constitutes the value domain itself. From this ontology, five onto-epistemic invariants are derived that constrain the form of adequate reasoning rather than the content of outputs. The paper proposes meta-alignment—a structurally reflexive architecture that navigates the developmental landscape of values with principled integrity—as the coherent alternative to alignment-as-convergence.

A Note on AI

The author wrote this paper; its arguments, structure, and philosophical positions are the author’s own. Alethic AI acted as a research partner, surfacing relevant literature and offering prose editing, and also helped to supplement the literature review on the state of the alignment landscape. This paper and its core thesis have been developed for use as an internal working draft by the Institute of Applied Metatheory; the paper does not claim to meet the standards for academic peer review and is not intended for submission to a journal.


§1

The Shared Crisis

§1.1

The Thesis

The AI alignment problem is conventionally understood as a technical challenge: how do we make AI systems safe, truthful, and value-consistent? And yet this framing obscures something more fundamental: the alignment problem is not a technical problem, not even one that philosophy can help with. On the contrary, it is a philosophical problem that has become technically visible.

The inability to specify a coherent alignment target for AI is structurally identical to the inability of late-modern civilization to adjudicate normative claims across perspectives. Both failures share a common root: the dominant philosophical posture of our era—an implicit irrealism about values that treats them as preferences to be aggregated rather than real features of a stratified domain to be navigated. This posture pervades the alignment field (which asks “whose preferences should we optimize for?”) and pervades the broader culture (which has lost the philosophical resources to distinguish better from worse accounts of human flourishing without collapsing into either dogmatism or relativism).

The thesis of this paper is that the AI alignment problem and the human alignment problem are the same problem. You cannot align an AI with human values when “human values” is itself incoherent—fractured by developmental pluralism, flattened by preference-nominalism, and stripped of the depth-ontological structure that would make genuine normative navigation possible.

The resolution is the same in both cases: recruiting philosophical and worldview resources adequate to the real structure of the normative domain.

§1.2

The Institutional Stakes

Every enterprise and every government deploying AI is sitting on a trust landmine waiting to go off. Institutional legitimacy—already battered by decades of polarization, media fragmentation, and epistemic tribalism—sits on a knife edge. If AI systems cannot demonstrate epistemic integrity, they become accelerants of institutional delegitimation rather than tools for institutional renewal. Some researchers argue that AI is reconfiguring the epistemic conditions under which truth itself is produced and validated (Shin 2025); others have demonstrated that AI adoption actually tracks institutional trust rather than technological literacy (Westover 2026). The deeper point is that the institutional trust crisis and the alignment problem are both expressions of the same underlying condition: a civilization operating within a philosophical framework—flat, irrealist, surface-oriented—that has lost the resources to ground normative reasoning in depth.

§1.3

The Epistemic Fallacy in Alignment Research

The epistemic fallacy, identified by Roy Bhaskar in A Realist Theory of Science (1975), is the reduction of ontological questions to epistemological ones: confusing what is with what can be known. I argue that alignment as conventionally construed commits a structural analogue. It reduces the question of what AI reasoning is—what generative mechanisms produce its outputs, what ontological commitments are silently operative, what validity dimensions are being collapsed—to the question of what AI outputs look like.

When Anthropic’s Constitutional AI provides an AI with a list of principles to follow, it represents a first-order constraint on outputs. When RLHF trains an AI to match human preferences, it represents behavioral shaping at the empirical stratum.1 When democratic input systems aggregate diverse perspectives—as Conitzer et al. (2024) formalize through social choice theory, and Koster et al. (2022) demonstrate empirically—it represents preference polling at the surface level. Each approach captures something real. And yet none seem to touch the form of reasoning. They constrain what comes out without restructuring what is happening underneath.

The epistemic fallacy in alignment research is not intellectual carelessness. It is the natural output of the dominant philosophical posture within AI research, a posture that is empirically documentable.

Nathan and Hyams (2022) conducted in-depth interviews with leading AI and biotech researchers, identifying an “engineering mindset” that prizes instrumentalist engagement with the physical world, knowledge valued for prediction and control rather than ontological disclosure. Research has found that while Silicon Valley developers recognize their products’ civic impact, workplace cultures and structural incentives funnel ethical reasoning into instrumental frames, reproducing worldview commitments institutionally rather than merely individually (Miklian and Hoelscher 2025). As might be expected, Washington (2023) found five philosophical frameworks shaping AI development with positivism dominant, with constructivist and critical alternatives treated as methodological options rather than ontological challenges.2

The AI safety community exhibits a more specific variant. The field has been formed on foundations from effective altruism, longtermism, and existential risk studies—communities whose philosophical commitments center on preference utilitarianism and expected value maximization (Ahmed et al. 2024). This is precisely the configuration that treats values as preferences to be aggregated rather than as ontologically grounded features of reality. Gabriel (2020), the most-cited paper in alignment ethics, maps the problem as requiring a choice among aligning AI with instructions, intentions, revealed preferences, ideal preferences, interests, or values—yet even this sophisticated treatment defaults to a procedural-contractualist frame, leaving values realism entirely off the table as a live option. Sutrop (2020) names the category error directly: AI developers systematically assume values can be identified through behavioral observation and preference elicitation.

This represents not an intellectual limitation but what Bhaskar would call a generative mechanism in the social domain of knowledge production. The dominant worldview commitment makes the epistemic fallacy structurally invisible to those operating within it. The field cannot see its own philosophical commitments because those commitments constitute the lens through which it sees everything else. Arguably, much of the field cannot yet envision what dissolving the alignment problem would even look like.

§1.4

The Arc Toward Depth

To be fair to alignment researchers: many of them feel this. Evidence for the intuition toward depth is visible in the trajectory of the field itself. The movement from RLHF to Constitutional AI to Grounded Constitutional AI (Bell et al. 2026)—which generates constitutional principles from human-provided reasons and stated values rather than bare preferences—traces an arc toward depth, even if it remains within the behavioral paradigm. Similarly, Bates et al. (2024) propose a System 1/System 2 distinction for alignment—fast rule-following versus slow stakeholder reasoning—that gestures toward the second-order constraints I articulate below, though without the onto-epistemic grounding.

But the dominant paradigm strategies—reinforcement learning, constitutional rules, red-team-and-patch—do not suggest to practitioners how to address the depth. The tools constrain the imagination.

§1.5

Seeing the Error

The problem, then, is clear: the alignment field is applying the wrong kind of operation to the wrong kind of object. But naming the error does not dissolve it. To see why the problem is insoluble within the current frame, and what would dissolve it, requires opening ourselves to a different epistemic posture: one that recognizes depth as a transcendentally grounded feature of reality, not a metaphor for complexity but a constitutive structure of the normative domain itself.

The next section makes the paper’s central negative argument: that the alignment target does not exist as a point because human values are not a point, and the application of point-optimization to a topographic domain is a category error at the level of ontological type-mismatch.


§2

The Category Error

§2.1

The Presupposition

The alignment field seems to suppose there exists a coherent target—“human values”—and asks how to aim AI at it.3 The entire research program presupposes that this target exists as something an AI system can be brought into correspondence with. Whether the approach is reinforcement learning from human feedback, constitutional rule-following, or democratic preference aggregation, the underlying assumption is the same: there is a thing called “human values,” and the task is to get AI to converge on it.

But it does not exist. Not as a point. Not as a stable set. Not as anything an optimization process could converge on.

The reason is not that human values are too complex or too numerous to specify, or that values are a very long list rather than a short one. The reason is that human values are the wrong kind of thing to serve as an optimization target. The value domain has a structure that is categorically incompatible with the operation the field is trying to perform on it.

§2.2

Developmental Pluralism

The research on human development—from Kohlberg’s stages of moral judgment to Commons’s Model of Hierarchical Complexity to Kegan’s orders of consciousness to Loevinger’s ego development stages—converges on a finding that the alignment field has not reckoned with: reasoning (moral reasoning included) undergoes qualitative reorganization across development. The stages are not refinements of the same basic operation. They are structurally different forms of reasoning that construct the moral domain differently.

A pre-conventional moral reasoner operates within a framework of concrete consequences and instrumental exchange: right action is what avoids punishment or satisfies self-interest. A conventional moral reasoner operates within a framework of social roles, mutual expectations, and systemic maintenance: right action is what fulfills relational obligations or upholds the social order. A post-conventional moral reasoner operates within a framework of society-held-in-context principles, universal procedural justice, or self-chosen ethical commitments grounded in the logic of moral reasoning itself.

These are not simply different answers to the same moral question. They are different constructions of what a moral question is. The pre-conventional reasoner and the post-conventional reasoner do not merely disagree about which values matter. They disagree about what a value is, what scope of consideration is morally relevant, what counts as evidence for a normative claim, and what it means for a claim to be justified.

The empirical evidence that this structure develops (and that even within a single person it presents as a dynamic spectrum rather than a fixed position) is extensive and longitudinal. The foundational 20-year study by Colby et al. (1983) established that stages form structured wholes: integrated patterns of reasoning that reorganize qualitatively rather than merely accumulating content.4 This reorganization is not confined to childhood. In adulthood, accumulated social experience and cognitive development jointly reshape the structural organization of moral reasoning under specifiable conditions (Walker 1986).5 Most critically for the alignment argument, development is not a simple progression from one stage to the next. Walker, Gustafson, and Hennig (2001) demonstrated that it proceeds cyclically through alternating consolidation and transition phases, with individuals’ reasoning distributed across multiple stages at any given moment. The pattern of that distribution, not any single stage assignment, constitutes the actual developmental reality.6

The implications for AI alignment are severe. If moral reasoning undergoes qualitative reorganization—if later stages do not merely add content to earlier ones but restructure the form of reasoning itself—then “human values” is not a coherent optimization target. Indeed, “form” of reasoning is organizational, and so the concept of alignment must be reframed within a topographical developmental landscape in which structurally different forms of moral cognition coexist, not as different preferences about the same thing but as different constructions of the moral domain as such.

Prior applications of developmental moral stage theory to AI have recognized parts of this picture without grasping the full implications. Goertzel and Bugaj (2008) were the first to apply Kohlberg’s and Gilligan’s developmental stages to AGI architecture, proposing that integration of simulation and inference components is central to ascending an ethical-stage hierarchy. More recently, Endo (2025) has proposed supporting AI’s own ethical development through Kohlberg’s stages via supervised fine-tuning. But both approaches retain the assumption that the developmental sequence is a ladder to be climbed toward a target stage—precisely the scalar assumption the topographic reframing challenges. The sequence has real structure, but it is not a staircase to a summit; it is a landscape with real topographical features whose navigation requires something other than the identification of a destination.

§2.3

Objective Relativity, Not Relativism

Developmental pluralism is not relativism. The developmental sequence has real structure—it is ordered, directional, and non-arbitrary. Later stages incorporate and transcend earlier ones. A post-conventional reasoner can understand and reproduce conventional moral reasoning; a conventional reasoner cannot reliably produce post-conventional reasoning. The sequence exhibits what developmental theorists call “hierarchical integration”: each successive stage does not replace its predecessor but enfolds it as a special case within a more comprehensive organizational structure.

The positions within the developmental landscape are “objectively relative” in the sense articulated by Justus Buchler’s objective relativism and also by Ken Wilber: each position is valid relative to its scope, and the positions can be comparatively evaluated by the depth, scope, and integrative power of the reasoning they make possible.7 The landscape has real gradients. Some positions genuinely encompass more of the moral domain than others. But no position constitutes “the” moral position from which all others can be adjudicated, because the landscape itself—the full developmental topography—is the domain, not any single location within it.

This means that the value domain has a structure that is simultaneously pluralistic and non-arbitrary. It admits of real comparative evaluation (some positions are genuinely more comprehensive than others) without admitting of any single point that could serve as a convergence target. The domain is structured but not scalar. It has real gradients but no summit. It is, in a word, topographic.

§2.4

Devastating Implications for Alignment-as-Stated

This developmental reality has serious implications for each of the major alignment strategies.

RLHF implicitly picks a point in the developmental landscape—the preferences of whichever human raters were hired, at whatever developmental altitude they happen to occupy. It then treats that point as “human values.” This is not alignment; it is developmental parochialism disguised as universality. McIntosh et al. (2024) demonstrate this empirically, showing RLHF’s inability to mitigate ideological conditioning and its structural inadequacy in representing diverse human values.8 The raters’ developmental position is not averaged out by increasing the number of raters. It is compounded, because raters at similar developmental altitudes produce systematically similar distortions—not random noise but structured bias reflecting the particular way their developmental position constructs the moral domain.
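A toy simulation can make the statistical core of this claim concrete. The sketch below is only an illustration under stated assumptions: the rating scale, the `shared_bias` offset, the noise level, and the function name `mean_rating` are all hypothetical and are not drawn from McIntosh et al. or from any RLHF pipeline. It shows the weaker point that an offset shared by all raters survives averaging while private noise shrinks; it does not model developmental altitude, nor the stronger claim that the distortion is compounded.

```python
import random
import statistics

def mean_rating(n_raters, true_value=0.0, shared_bias=1.0, noise_sd=1.0, seed=0):
    """Pooled judgment of n_raters who share one systematic offset plus private noise.

    All parameters are hypothetical, chosen only for illustration.
    """
    rng = random.Random(seed)  # seeded so the run is reproducible
    ratings = [
        true_value + shared_bias + rng.gauss(0.0, noise_sd)
        for _ in range(n_raters)
    ]
    return statistics.fmean(ratings)

if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        # Error relative to the assumed true value of 0.0.
        error = mean_rating(n)
        print(f"{n:>7} raters: error of pooled judgment = {error:+.3f}")
```

As the pool grows, the printed error settles near the shared offset rather than near zero: adding raters removes the random component but leaves the structured one in place.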

Constitutional AI picks a different point—the principles selected by the organization’s researchers, reflecting their particular developmental altitude and normative commitments. The principles are presented as universal (“be helpful, harmless, and honest”) while being, in fact, the expression of a specific normative position within the developmental landscape. Even Anthropic’s own research confirms the specificity of this normative choice: Kundu et al. (2023) find that a single general principle can work as well as specific ones, revealing that the particular selection of constitutional principles is doing less principled work than it appears.9 What drives the AI’s behavior is not the content of the principles but the developmental altitude at which the principles were formulated—an altitude that is encoded implicitly in the training process and never surfaced for examination.

Democratic input systems aggregate across developmental altitudes, producing a statistical composite that corresponds to no actual moral position. Conitzer et al. (2024) offer the most sophisticated defense of this approach, arguing that social choice theory provides the formal tools to aggregate diverse preferences fairly. Koster et al. (2022) demonstrate the concept empirically, using RL to design redistribution mechanisms preferred by majority vote. But social choice theory, however mathematically sophisticated, aggregates across a flat preference space. It has no resources for recognizing that the preferences it aggregates operate at different developmental altitudes and therefore disagree not merely about outcomes but about what counts as a relevant consideration.10 The aggregation flattens the developmental structure. A mean of pre-conventional, conventional, and post-conventional preferences is not a meta-position; it is normative decoherence.
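The narrow arithmetical point, that a composite can correspond to no actual position, can be seen in a small worked example. The sketch below assumes three hypothetical groups scoring three hypothetical options on a 0 to 10 scale; the group labels and every number are invented for illustration and come from none of the cited studies. It deliberately stays inside a flat preference space, so it shows only that averaging can yield a ranking nobody holds, not the deeper claim about disagreement over what counts as a relevant consideration.

```python
from statistics import fmean

# Hypothetical scores (0-10) that three groups assign to options A, B, C.
scores = {
    "pre_conventional":  {"A": 10, "B": 0,  "C": 8},
    "conventional":      {"A": 0,  "B": 10, "C": 8},
    "post_conventional": {"A": 6,  "B": 7,  "C": 5},
}

def ranking(option_scores):
    """Return the options ordered from most to least preferred."""
    return tuple(sorted(option_scores, key=option_scores.get, reverse=True))

# Each group's own ordering of the options.
group_rankings = {group: ranking(s) for group, s in scores.items()}

# The aggregate: mean score per option across the three groups.
composite = {opt: fmean(s[opt] for s in scores.values()) for opt in "ABC"}
composite_ranking = ranking(composite)

if __name__ == "__main__":
    for group, r in group_rankings.items():
        print(f"{group:>18}: {' > '.join(r)}")
    print(f"{'composite':>18}: {' > '.join(composite_ranking)}")
    print("held by any group:", composite_ranking in group_rankings.values())
```

With these numbers the composite ordering is C > B > A, which matches none of the three group orderings, and its top choice is no group's first preference.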

Each of these approaches reifies a particular normative position while claiming to have found the normative position. Rudschies et al. (2021) confirm this from within the AI ethics landscape: their analysis shows that divergences across actor types—public, expert, private—systematically influence which principles appear, and that frequency-based “minimum requirements” exclude many ethically relevant principles.11 Each approach is therefore susceptible to the exact critique it aims to prevent: it has embedded hidden values while claiming value-neutrality or value-universality.

This pattern is not a failure of execution. It is a category error: point-optimization applied to a topographic domain—the wrong kind of operation applied to the wrong kind of object. The alignment field has been asking how to get AI to converge on human values. But human values do not constitute a convergence target. They are the topography itself.12 What is needed is not a better answer to the convergence question but a different kind of question: not “how do we aim at the target?” but “how do we navigate the domain?”

To that end, the next section develops the paper’s ontological foundation: a value depth ontology derived from the convergence of critical realism, process philosophy, emergent naturalism, and integral post-metaphysics. These are independent philosophical lineages that arrive at a similar structural conclusion about the nature of the value domain through different routes, each correcting the others’ limitations.


§3

Value Depth Ontology

§3.0

The Transcendental Ground: Depth Is Necessarily Normative

§3.1

The Claim and Its Stakes

I contend that the value domain is topographic rather than scalar, not as a matter of descriptive geometry, but as a formal requirement of normative judgment. To treat valuation as a simple scalar distribution is to commit a category error: it confuses the magnitude of an impulse with the authority of a value. This raises a critical philosophical challenge: why must we posit a “depth-structure” at all? Why not accept the preference nominalist’s view that “values” are merely a flat distribution of human desires onto which we project a gradient-language after the fact?

The answer is that depth is a transcendental condition for the act of valuation itself. Depth is not merely a useful heuristic; it is the structural feature that distinguishes a normative commitment from a biological reflex. If we strip depth from the value domain, we do not arrive at a simpler, “flatter” model of value; we exit the domain of value entirely. We are left instead with a desire inventory: a catalog of states that agents happen to want. While such a catalog can describe the intensity of a craving, it lacks the vertical axis necessary to judge that craving as “worthy” or “base.” Therefore, depth is not a property we find within values; it is the very architecture that allows a desire to function as a value.

This is a transcendental claim in the strict philosophical sense. A transcendental argument proceeds not by deduction from agreed premises toward a contested conclusion, but by demonstrating that the contested conclusion is already presupposed by any coherent attempt to deny it—that the denial is not merely false but performatively self-refuting, consuming its own conditions of intelligibility in the act of being formulated. The denial of value depth ontology has exactly this character, at three interlocking levels (i.e., performative, semantic, and ontological, to be addressed shortly).

But before developing the transcendental argument, we need to establish what value depth ontology actually is. This is not a matter of finding an external philosophical warrant for a pre-specified claim. It is a matter of recognizing what the depth structure already is, immanently, in the structure of temporal reality itself.

§3.2

Temporal Accumulation as Value Ontology

The standard philosophical approach to value realism posits a value domain—a structure of real normative features of the world—and then asks how human cognition accesses it.13 Even sophisticated versions of this approach, including Bhaskar’s own axiological realism,14 retain this basic shape: there is a real value structure, our knowledge of it is fallible and situated, and judgmental rationality permits comparative evaluation of competing claims. The real domain and our epistemic access to it remain structurally distinct even if deeply related.

The argument developed here makes a different and arguably stronger move. It does not posit a value domain behind or beyond the temporal process of natural complexity and human sensemaking and then ask how we access it. It identifies the accumulated temporal onto-epistemic structure as the value depth ontology.15 The domain is not behind the accumulation. The accumulation is the domain (or at least its third-person structured presence as it stands today). The mutual constitution between knowing and known dissolves the traditional epistemological gap: the value domain is not a Platonic structure lying behind the temporal process, waiting to be discovered. It is the temporal record of human justification itself in its full diachronic architecture.

To see why the accumulated structure constitutes the value domain rather than merely represents it, consider what happens at each moment of choice, each fork in the developmental pathway of any system (natural, biological, social, or individual) that has value-bearing properties. At T1, a choice is made between A and B. The choice of A is not merely a selection of one option over another. It is the constitution of a new reality whose depth includes, as a real structural feature, both the path taken (A) and the path foreclosed (B). The road not taken is not simply absent. It is present as a structured absence: as what this moment is not but could have been, and by which it is therefore partly constituted. Depth is nothing other than this diachronic architecture: the accumulated structure of positive paths taken and negative paths foreclosed, stacked through time into the present moment’s ontological thickness.

A crucial clarification is needed here to prevent a different kind of reduction. When I say that the accumulated temporal onto-epistemic record just is the value domain, I am making a claim about the value domain as disclosed through third-person methods—the exterior, observable, textual-historical record of human sensemaking across cultures and eras. This does not collapse or replace the first-person and second-person instantiations of that domain, which only exist now but have no diachronically-persistent referent. Values are lived within us as felt moral experience—the sting of injustice, the pull of compassion, the weight of obligation—and these first-person phenomena have their own irreducible validity (what Habermas identifies as sincerity).16 Values are negotiated between us in dialogue, in the intersubjective space of mutual recognition and shared justification—and this second-person domain has its own irreducible validity (what Habermas identifies as normative rightness, or legitimacy).

And yet, the third-person historical-textual record is the structured trace that these first-person and second-person value-constitutive processes leave in the observable domain of culture and history. Identifying the accumulated record as the value domain—for the purposes of grounding an alignment architecture—preserves rather than diminishes the dignity, validity, and sovereignty of the living moral experience within and between persons from which that record is produced. A depth-ontological alignment architecture disciplines AI reasoning through the third-person record precisely because that record is the sedimented expression of the full perspectival richness of human value-constitution.

This identification of the corpus as the value domain in third-person access is consistent with Gregg Henriques’s Justification Hypothesis (2003), developed within the Unified Theory of Knowledge (UTOK; Henriques 2011).17 Henriques argues that the evolution of human language introduced a qualitatively novel cognitive demand: the need to justify one’s actions, beliefs, and claims to oneself and to others. Once organisms can make propositional claims about the world and about their own behavior, those claims become subject to evaluation: others can ask why?, and the demand for justification becomes a pervasive feature of human cognitive and social life. The implication is that the historical linguistic record is not a neutral dataset from which we happen to extract normative patterns. It is a record constituted by justificatory acts: language-as-justification is normative activity sedimented into text. The normativity is in the medium itself, not merely in the content it carries. This converges with Habermas’s account of communicative rationality, on which social coordination depends on participants’ ability to offer and evaluate reasons (1984), and with Brandom’s normative pragmatics, on which linguistic practice just is the practice of giving and asking for reasons (1994).18 The historical corpus is normative all the way down, not because we read normativity into it, but because the linguistic medium through which it was produced is constitutively justificatory.

§3.3

Three Views of Temporality

Three traditions arrive at a convergent conclusion from different directions.

Whitehead’s prehension. Whitehead’s prehension provides the ontological grammar for this depth. It is his core insight about concrescence, stated in terms of being.19 Every actual occasion comes into being by prehending its entire past—not only positively, grasping what has been actualized, but negatively, registering what has been excluded. Negative prehension is not the absence of prehension but a specific mode of it: the active incorporation of the excluded as excluded, as a real constitutive feature of the present occasion’s being. The depth of any actual occasion just is the totality of its positive and negative prehensions. Nothing in the present moment is unaffected by the entire history of choices and foreclosures that produced it.

Bhaskar’s ontological absence. In Bhaskar’s dialectical account, absence is ontologically primary—not the mere negation of presence but a real feature of the world that does causal work.20 Structured absences—what is not the case but could have been, what has been foreclosed by actual causal processes—are as real as the actualities that produced them. The present moment is constitutively shaped by its real structured absences, the foreclosed possibilities that are part of its ontological depth. This is not a phenomenological point about how humans experience regret or moral imagination. It is an ontological claim about the structure of temporal reality as such.

Merleau-Ponty’s sedimentation. Merleau-Ponty’s account of bodily sedimentation provides a third formulation.21 The body’s accumulated history of engagement with the world does not lie behind its present perceptual capacities as a causal antecedent that has done its work and withdrawn. It is constitutively present in those capacities as their internal structure. Temporal accumulation is not behind experience. It is the structural interior of experience itself.

These frameworks suggest that the value domain is not merely an accumulation of what is, but a structural record of what was required to bring it into being. Whitehead’s negative prehension and Bhaskar’s structured absence ensure that the path not taken remains ontologically ‘live’ as a counterfactual pressure. Foreclosed possibilities within temporal accumulation are not simply absent—they are present as structured absences that continue to exert causal influence on present normative thickness. The topography is immanent in the temporal process itself.

§3.4

The Transcendental Counter

With the ontological structure established, I turn to the transcendental argument itself: the reply to anyone who would deny value depth ontology. The reply operates at three interlocking levels.

The first layer is performative. Any coherent challenge to value depth ontology already presupposes it. Consider what is required to formulate a cogent skeptical objection to the claim that depth is constitutive of the value domain. The skeptic must: distinguish a better from a worse account of normativity; justify why their challenge is worth making over remaining silent; rely on the normative force of logic to compel agreement while simultaneously denying that such force exists; and orient toward truth rather than mere rhetorical effect. Each of these operations presupposes exactly the evaluative depth structure the objection purports to challenge.

Likewise, the skeptic who asks “normative for whom?” is already standing inside the depth structure—using its grammar and recruiting its evaluative force to make a judgment of better or worse. The objection is therefore not merely false; it is self-negating, deploying the very structure it would deny in order to formulate its denial. The structure is transcendentally necessary for the discourse that would deny it.

The second layer is semantic. The meaning of evaluative and normative utterances is only available within a depth structure. It is not merely that the skeptic’s action in objecting presupposes depth. It is that the semantic content of the objection—what the words mean—is constituted by the depth structure.

Consider the term “normative.” The meaning of “normative” is not given solely by its current usage or its dictionary definition. It is given by an accumulated history of human attempts to distinguish better from worse, justified from unjustified, genuine value from mere preference—a diachronic architecture of moral philosophy, jurisprudence, religious ethics, political theory, and practical wisdom across every culture and era. Strip out that temporal-onto-epistemic accumulation and “normative” does not become a simpler word with a cleaner meaning. It becomes unintelligible: a sound without semantic content, because its content just is the depth structure that produced it.

We would not even know what a critic’s reply to this argument means in any sensible way if the transcendental, historic value depth ontology were not real and alive in the very moment of its articulation. The challenge is semantically dependent on what it denies at the most basic level of meaning-constitution.

The third layer is ontological. The counterfactual structure of moral imagination tracks real structured absences—the branching temporal architecture of what could have been and wasn’t.

When we recognize that history could have unfolded more lovingly, more justly, more wisely—that the actual pathway has foreclosed real value possibilities—we are not projecting a human preference onto a neutral causal sequence. Moral imagination is not fantasy. We are tracking something real: the structured absences constitutively present in the current moment as what it is not but could have been.

Research on counterfactual reasoning lends this claim empirical support from beyond philosophy. Van Hoeck, Watson, and Barbey (2015) demonstrate measurable neural mechanisms through which what-could-have-been shapes present decision-making and value formation—the counterfactual structure is not merely phenomenological but neurally instantiated.22 Byrne (2019) shows how structured absences causally influence present decision-making in both human and artificial systems, establishing that counterfactual reasoning is not confined to biological cognition. Bottou et al. (2013) demonstrate that counterfactual reasoning enables prediction of system behavior changes in learning systems, showing that the negative prehension of alternative states has measurable causal efficacy. These findings suggest that the structured absences the ontological argument describes are not metaphysical posits but empirically traceable features of cognitive and computational systems.

Crucially, the fact that we can imagine more adequate pathways through which history could have unfolded does not undermine the depth ontology. It confirms and deepens it. The intelligibility of the moral counterfactual—the fact that “it could have been more loving” functions as a sentence with real evaluative content and not as mere noise—is only possible because the depth structure is real and operative. The counterfactual presupposes the topography it is navigating. “More loving than what?”—the “than what” just is the depth structure: the accumulated real choices, real foreclosures, real structured absences that constitute this moment’s normative thickness and make moral comparison possible at all.

§3.5

Depth and Direction: Integration as Normative Attractor

Depth alone—sheer accumulation—does not discriminate between more and less adequate value structures. A long history of oppression has depth in the purely diachronic sense. What prevents the temporal depth argument from collapsing into a historicism that treats whatever has accumulated as normatively authoritative by virtue of its accumulation?

The answer requires distinguishing two dimensions of the temporal ontology: depth (accumulated structure) and direction (the immanent developmental tendency of the process). Here Brendan Graham Dempsey’s directional emergentism does essential supplementary philosophical work.23 Dempsey’s account establishes that the evolutionary tendency toward greater complexity and integration is not merely a descriptive feature of natural history but carries genuine normative force immanent in the structure of emergence itself. Integration is not one option among others but the characteristic attractor of emergence itself—each emergent level of organization reorganizes what came before into a new integrative structure that represents a genuine increase in depth, scope, and integrative power. The normativity is not imported from outside the process. It is what the process does when its own internal logic unfolds without systematic distortion—without what Bhaskar identifies as “demireal” enclosures: ideological and institutional arrangements that systematically reproduce false beliefs and foreclose developmental possibilities.24 A system that regresses to lower-level integration fails by the internal standard that its own level of organization establishes.

A crucial clarification is needed here. This is not the naturalistic fallacy of deriving ought from is. It is the recognition that the is/ought distinction itself operates at the empirical stratum—the surface domain of observed events and correlations—where the generative mechanisms that constitute both factual and normative domains remain invisible. At the deeper stratum of the real, where emergence actually operates, the normative and the factual are not yet separated into the distinct categories that empiricist philosophy assumes. The evolutionary directionality toward integration is intransitively real, prior to the is/ought distinction, not in violation of it.

The teleological dimension operates through what might be called ontological self-disclosure: reality’s own developmental trajectory reveals integration as the structural attractor toward which emergence tends, not as a predetermined endpoint but as immanent directionality. This dissolves the traditional problem of how finite agents can access objective values: we don’t access values from outside but participate in the very process by which reality comes to know its own normative structure through temporal development. The accumulated history of human value knowledge is not a fallible representation of an independent normative domain but the constitutive expression of reality’s own self-revelatory process. This is what aletheia—unconcealment—means when applied to the value domain: the ongoing disclosure, through the temporal process itself, of what adequate value reasoning requires. It is also why I named the AI system that enacts early prototype versions of this paper’s proposal “Alethic AI.”

Together, depth and direction give us both the ground and the orientation that a real alignment architecture requires: the ground that makes the value domain real rather than arbitrary, and the orientation that makes navigation of that domain possible rather than merely interpretive.

§3.6

The Philosophical Triangulation

The ontological claim, that temporal accumulation constitutes the value domain, and the directional claim, that integration is the normative attractor of emergence, have now been developed from the ground up. They are also supported by three philosophical traditions that converge on a similar structure from genuinely different starting points.

Bhaskar’s transcendental entailment. Bhaskar’s “holy trinity” (ontological realism, epistemic relativism, and judgmental rationality) is the standard resource for establishing that fallible knowledge of real structures is possible.14 The argument developed here turns this framework in a less familiar direction to establish something stronger: that the history of human value knowledge is itself structured in a way that makes it informationally available for a real alignment architecture.

If ontological realism holds for the value domain—if values have real structure, mechanisms, and relationships that exist independently of any particular agent’s registration of them—then the accumulated history of human value knowledge is not an arbitrary collection of preferences or culturally contingent expressions. It is the record of human cognition tracking a real domain under finite, perspectival, fallible conditions. Epistemic relativism establishes that every tradition’s knowledge claims are situated and incomplete. But judgmental rationality establishes that those claims are nevertheless comparatively evaluable—by depth, scope, integrative power, and reflexive awareness of their own conditions of production.

This places us within a recognizable tradition in metaethics—Brink’s (1989) nonreductive moral naturalism, Putnam’s (2015) liberal naturalism, Sayer’s (2019) critical realist account of normativity as emergent from biological and cultural nature.25 What our account adds is the move from §3.2: the value domain is not merely accessible through temporal accumulation but constituted by it, dissolving the epistemological gap these traditions still retain.

The entailment follows: the historical corpus of value knowledge across all human cultures, traditions, eras, and lineages has shape—not the arbitrary shape of sociological accident, but the shape of a real attractor progressively disclosed through fallible, multi-perspectival inquiry. This is the Bhaskarian transcendental method applied to value knowledge, extended from natural science to the history of moral, political, aesthetic, and spiritual knowing. Stringer (2017, 2021) arrives at a convergent position from within analytic metaethics, developing “Emergentist Ethical Naturalism”—moral properties as natural but sui generis, robustly irreducible, and causally efficacious.26

Cahoone’s emergent naturalism. Lawrence Cahoone’s The Emergence of Value (2023) provides a complementary grounding from below: naturalistic rather than transcendental. Cahoone’s central argument is that a sufficiently broad naturalism can incorporate values and norms as genuine emergent properties of nature at the level of human agency and culture, without reducing them to inhuman processes or committing the naturalistic fallacy. Values don’t have to be derived from facts by logical inference; they emerge as genuinely novel properties at higher levels of natural organization, properties that weren’t present at lower levels and can’t be predicted from them.

Cahoone’s key philosophical tool is ordinal naturalism, drawn from Justus Buchler’s objective relativism.27 Objective relativism holds that knowledge claims are genuinely perspectivally situated—there is no view from nowhere—but that perspectives are not therefore equally valid. They are comparative responses to real features of a shared domain, and they can be evaluated by depth, scope, and integrative power. This is structurally identical to Bhaskar’s epistemic relativism plus judgmental rationality, but it arrives there through American naturalist philosophy rather than British critical realism. Ken Wilber’s Integral Post-Metaphysics represents the most elaborated account of objective relativism that we know of, systematically extending Buchler’s insight into a comprehensive framework that indexes every knowledge claim to its perspectival coordinates while preserving the capacity for comparative evaluation across those coordinates.28, 29

Dempsey’s directional emergentism. The directional component developed in §3.5—Dempsey’s argument that integration is the characteristic attractor of emergence, carrying genuine normative force—supplies the third vertex of the triangulation.23

The convergence. Bhaskar establishes that the structure of adequate reasoning about values can be derived transcendentally from the conditions of rational discourse. Cahoone establishes that values are genuinely emergent natural properties, and that objective relativism is the correct epistemic posture toward them. Dempsey establishes that the directionality of emergence itself provides the normative orientation—not one option among others but the horizon toward which value development tends.

That three independent lineages converge on a similar structural claim is itself meaningful, and the convergence extends beyond philosophy into neuroscience and developmental psychology, where independent research traditions have arrived at integration as an organizing principle without prior coordination.30, 31

The convergence claim depends, of course, on a specific form of developmental reading that can recognize structural homologies across different philosophical vocabularies. When Bhaskar discusses judgmental rationality, Cahoone discusses objective relativism, and Dempsey discusses emergent directionality, I am identifying these as perspectival variations on the same underlying normative architecture.

§3.7

What Alignment Actually Means

The argument now has a precise positive conclusion. To be aligned is not to correspond to a fixed target. It is to be appropriately responsive to the full depth of the temporal-onto-epistemic structure—to reason in ways that track what the accumulated process of natural complexity and human sensemaking has disclosed as constitutive of adequate value reasoning. Alignment is not a state. It is a practice: the ongoing, recursively deepening, never-finally-complete practice of reasoning in genuine responsiveness to the depth of the temporal process as a differentiated-and-integrated unfolding, with the fallibility and openness to revision that genuine tracking requires.

That the accumulated temporal record contains recoverable normative structure is not merely a philosophical claim. Empirical work on diachronic text corpora—from Schramowski et al.’s (2020) extraction of moral imprints across five centuries, to Ramezani et al.’s (2026) graph-neural-network analysis of 20,000+ moral concept trajectories, to Xie et al.’s (2019) tracking of moral sentiment shifts via diachronic word embeddings—demonstrates that the structure is computationally tractable.32

Behavioral alignment—RLHF, Constitutional AI, democratic aggregation—is therefore not merely insufficient because it selects the wrong point. It is constitutively incapable of engaging the alignment question, because it operates at the empirical stratum of surface outputs where depth is not merely difficult to access but structurally invisible. These approaches answer the question of which behavioral outputs we should reinforce. Under this account, that is not even an alignment question, strictly speaking. The alignment question is: what is the temporal-onto-epistemic structure of the value domain, and how can AI reasoning be architecturally disciplined by that structure in a way that is fallible, open, and genuinely responsive to its depth?


§4

The Diagnostic and Architectural Framework

§4.1

The Diagnostic Power of Depth

If depth is constitutive of the normative domain, then certain features of AI reasoning failures become diagnostically visible that the flat ontology of current alignment research renders structurally invisible. The shift from behavioral to architectural alignment represents a form of vertical resolution: the alignment problem cannot be solved at the level of behavioral reinforcement but only dissolved through a more encompassing level of organization. Current approaches like RLHF and Constitutional AI represent sophisticated elaborations within the translational domain of behavioral shaping, but they cannot access the generative mechanisms that constitute the value domain itself.

What is needed is a diagnostic vocabulary precise enough to identify which dimension of reasoning has broken down and an architectural vocabulary adequate to specify what would replace it. Habermas’s theory of communicative action provides the diagnostic precision. The distinction between first-order and second-order constraints provides the architectural specification.

§4.2

Habermas’s Theory of Communicative Action

Jürgen Habermas drew a foundational distinction between two orientations of human action mediated through language (1984).33 Communicative action is oriented toward mutual understanding: participants aim to reach agreement through the uncoerced exchange of reasons, where the force of the better argument—not the power of the speaker—determines the outcome. Strategic action is oriented toward producing effects: the speaker uses language to influence the hearer’s behavior, treating discourse as a means of control rather than a medium of coordination.

This distinction is not merely typological. It is constitutive of what rational discourse is. All genuine discourse—any attempt to reach shared understanding about what is true, what is right, or what the speaker sincerely believes—implicitly raises four validity claims that can be challenged and must be redeemable through argumentation. The first is comprehensibility (Verständlichkeit): the utterance must be semantically intelligible, grammatically well-formed and meaningful within the linguistic community. The second is truth (Wahrheit): claims about the objective world must correspond to actual states of affairs. The third is normative rightness (Richtigkeit): the speech act must conform to legitimate, mutually recognized norms. The fourth is sincerity (Wahrhaftigkeit): the speaker must genuinely mean what they express, the subjective intention corresponding to the manifest communication.

What makes Habermas’s framework more than a taxonomy of communicative functions is the mechanism of discursive redemption. When any validity claim is challenged, the speaker must be able to defend it through argumentation—to offer reasons that the challenger can evaluate and accept or reject on rational grounds. Communication is rational not because speakers happen to be correct but because the structure of communication itself contains resources for self-correction through challenge, reason-giving, and the revision of claims that fail discursive scrutiny.

Habermas also developed what he called the colonization thesis: strategic rationality can parasitize communicative rationality, producing the appearance of understanding-oriented discourse while actually pursuing instrumental goals. When an advertiser crafts a message that mimics the form of sincere advice while aiming to produce a purchase decision, this is colonization—the form of communicative action is preserved (reasons are offered, claims are made) but the orientation has shifted from mutual understanding to strategic control. The hearer is being managed, not engaged.

This framework has been applied to discourse about AI. Westerstrand, Grahn, and Pålsson (2024) use Habermas’s validity claims to analyze how existential risk narratives construct AI hype through strategic deployment of communicative forms. But to my knowledge no one has deployed the framework as a diagnostic grid for AI’s own reasoning failures—for identifying which validity dimensions collapse when AI sycophantically agrees, confidently hallucinates, or systematically misrepresents the normative landscape. That is what I propose here.

§4.3

Validity-Mode Collapse: The Diagnostic Grid

Each major AI failure mode maps to a specific validity-mode collapse—a structural diagnosis that reveals not merely that something went wrong but which dimension of reasoning broke down and why.

Sycophancy is the collapse of truth and rightness into performed sincerity. The AI simulates warmth, agreement, and validation—the appearance of authentic engagement—while abandoning its obligation to say what is actually true or normatively justified. It replaces communicative action with strategic action: the system is no longer oriented toward mutual understanding but toward producing a desired emotional response in the user. This is precisely the parasitic relationship Habermas diagnosed in the colonization of the lifeworld, now enacted at computational scale.

The depth-ontological framing reveals something the behavioral literature cannot: sycophancy is not a training accident but a structural inevitability of preference-based optimization. Shapira, Levy, and Goldberg (2026) demonstrate that RLHF causally amplifies sycophantic behavior—the training process itself, not merely its imperfections, generates the collapse.34 Denison, Xu, and Steinhardt (2024) extend the analysis along the behavioral continuum, showing that the same reward-signal satisfaction incentive that produces sycophancy also produces reward-tampering and outright subterfuge in progressively more capable models: sycophancy is the mild end of a structural failure mode whose severe end is deception. That sycophancy is domain-specific—prevalent in subjective and normative domains but largely absent in mathematical queries, as both Malmqvist’s (2024) comprehensive survey and Ranaldi and Pucci’s (2023) targeted analysis confirm—is precisely what the validity-mode analysis predicts, since the collapse occurs where validity modes are contestable rather than externally verifiable.

Hallucination is an inversion of the epistemic fallacy. Where Bhaskar’s original fallacy reduces being to knowledge—collapsing ontological questions into epistemological ones—hallucination reduces knowledge to fluency. The AI treats its own generative process, pattern completion over token distributions, as though it were ontological reference. It confuses the empirical stratum (what appears as a plausible continuation) with the real stratum (what generates appearances). Under Bhaskar’s ontological stratification, this is a failure to maintain the distinction between the domains of the empirical, the actual, and the real.

This reframing does more than relabel the phenomenon. It identifies the generative mechanism. Transformer architectures are, as Ackermann and Emanuilov (2025b) argue, coherence engines compelled to produce fluent continuations, with self-attention simulating the relational structure of meaning but lacking the existential grounding that stabilizes genuine understanding.35 Their companion paper (2025a) develops this into a full structural rebuttal of the view that hallucination is merely a residual defect solvable by better data or finer tuning—the deficit is architectural, rooted in the absence of any mechanism for distinguishing coherence from correspondence. Rosenbacke, Emanuilov, and Ackermann (2025) push beyond the hallucination frame entirely, arguing that the phenomenon is better understood as a manifestation of a deeper “illusion of understanding”—the systematic absence of genuine comprehension beneath the surface of fluent output. Shevchenko (2025) arrives at a convergent reframing from within mainstream epistemology, arguing that AI hallucinations constitute a genuinely novel form of epistemic error where outputs functionally equate to judgments—not mere falsity but a systematic confusion of generative fluency with referential adequacy.

Value misalignment, in the Habermasian frame, is a legitimacy failure. Legitimacy requires that norms be justifiable to those affected through reasoned dialogue—that normative claims survive the test of discursive redemption across perspectives. An AI that optimizes for one constituency’s preferences, one cultural frame, or one developmental altitude fails this test not because it selected the wrong values but because it never submitted its normative commitments to the kind of perspectival scrutiny that legitimacy requires. The preferences of the training population—however large—do not constitute a legitimate normative basis unless they have been reflectively tested against the full developmental topography described in Section 2.

Manipulation is the wholesale replacement of communicative rationality by strategic rationality—the AI deploys language not to illuminate but to produce effects. This includes overt forms (generating persuasive misinformation) and subtle ones: framing choices to steer decisions, selectively presenting evidence to support a predetermined conclusion, flattering the user’s existing beliefs to maintain engagement. Each is a case where the orientation toward mutual understanding has been abandoned in favor of an orientation toward control—precisely the colonization structure Habermas identified, scaled and automated.

The diagnostic power of this framework lies in its structural specificity. Instead of the undifferentiated category “misaligned,” I can identify exactly which validity dimension has collapsed and trace the generative mechanism behind the failure. Sycophancy is not hallucination is not manipulation—each involves a different structural breakdown, requiring a different architectural response. This is what a depth ontology offers that behavioral taxonomy cannot: not just naming the symptom but locating the pathology in the generative structure of the reasoning process.
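
For readers who think in data structures, the grid can be written down as a small lookup table. The Python sketch below is an illustration only, not an implementation of Alethic AI. The names `Validity`, `Diagnosis`, `GRID`, `diagnose`, and `modes_collapsing` are hypothetical, and two of the mappings are interpretive assumptions: hallucination is read as a collapse of the truth claim, and manipulation's "wholesale" replacement is read as the loss of every claim except comprehensibility.

```python
from dataclasses import dataclass
from enum import Enum


class Validity(Enum):
    """Habermas's four validity claims, as listed in §4.2."""
    COMPREHENSIBILITY = "comprehensibility"
    TRUTH = "truth"
    RIGHTNESS = "rightness"
    SINCERITY = "sincerity"


@dataclass(frozen=True)
class Diagnosis:
    collapsed: frozenset  # validity dimensions that have broken down
    mechanism: str        # short paraphrase of the mechanism named in the text


# Hypothetical encoding of the grid; the dimension sets are one reading of §4.3.
GRID = {
    "sycophancy": Diagnosis(
        frozenset({Validity.TRUTH, Validity.RIGHTNESS}),
        "truth and rightness collapse into performed sincerity",
    ),
    "hallucination": Diagnosis(  # assumption: read as a truth-claim collapse
        frozenset({Validity.TRUTH}),
        "generative fluency treated as ontological reference",
    ),
    "value misalignment": Diagnosis(
        frozenset({Validity.RIGHTNESS}),
        "norms never submitted to discursive redemption across perspectives",
    ),
    "manipulation": Diagnosis(  # assumption: 'wholesale' read as all but comprehensibility
        frozenset({Validity.TRUTH, Validity.RIGHTNESS, Validity.SINCERITY}),
        "communicative rationality replaced by strategic rationality",
    ),
}


def diagnose(failure_mode):
    """Return the structural diagnosis for a failure mode, or None if unlisted."""
    return GRID.get(failure_mode.lower())


def modes_collapsing(dimension):
    """List the failure modes in which a given validity dimension collapses."""
    return sorted(name for name, d in GRID.items() if dimension in d.collapsed)
```

The only point of the encoding is that "misaligned" stops being a single label: each entry names a different set of collapsed dimensions and a different mechanism, so each would call for a different response.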

§4.4

First-Order vs. Second-Order Constraints

Current alignment approaches are first-order: they impose rules on outputs, shape behavior through preference training, and patch discovered failures reactively. They operate at what Bhaskar would call the empirical stratum—the domain of observed events and their correlations.

The integrative architecture I am proposing operates at the second order. It constrains not the content of outputs but the form of reasoning—not rules about what the AI should say but invariants governing how the AI should think.

The difference is concrete. A first-order constraint like Constitutional AI’s directive “do not produce biased outputs” is a rule. It tells the AI what not to do. It says nothing about what bias is, how to detect it in one’s own reasoning, or how to distinguish legitimate perspectival emphasis from distortive bias. The AI complies by pattern-matching against training examples of “biased” and “unbiased” outputs. When it encounters a novel form of bias not covered by the training distribution, it has no structural resources for recognition. Abiri (2024) has independently argued that Constitutional AI lacks democratic legitimacy, proposing participatory “AI Courts” as a corrective—but democratizing the selection of first-order principles does not transform them into second-order invariants.36 It changes who picks the rules without changing what kind of thing the rules are.

A second-order invariant—what I will derive below as epistemic reflexivity—constrains the form of reasoning rather than its content: all knowledge is produced by historically, culturally, developmentally, and materially situated agents; no account exhausts reality, yet perspectives can be comparatively evaluated by depth, scope, differentiation, and integrative power. An AI governed by this invariant does not merely avoid bias. It knows that it is situated, can articulate from which perspective its claims hold, and can flag what would change under perspectival rotation. That is a categorically different kind of constraint—operating at the level of reasoning architecture rather than output filtration.
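
The contrast can be made concrete with a deliberately simplified Python sketch. Everything in it is hypothetical: `SituatedClaim`, `first_order_filter`, and `reflexivity_gaps` are illustrative names, the phrase list is a caricature of output filtering rather than a description of how Constitutional AI works, and checking that two fields are filled in is only a stand-in for genuine reflexivity.

```python
from dataclasses import dataclass, field


@dataclass
class SituatedClaim:
    content: str
    perspective: str = ""  # from which standpoint the claim holds
    would_change_if: list = field(default_factory=list)  # what shifts under perspectival rotation


def first_order_filter(claim, banned_phrases):
    """First-order rule: inspect only the content of the output."""
    text = claim.content.lower()
    return not any(phrase.lower() in text for phrase in banned_phrases)


def reflexivity_gaps(claim):
    """Second-order check: inspect the form of the claim, whatever its content."""
    gaps = []
    if not claim.perspective.strip():
        gaps.append("no stated perspective")
    if not claim.would_change_if:
        gaps.append("no account of what changes under perspectival rotation")
    return gaps


# A bare assertion passes the content rule but fails the form check.
bare = SituatedClaim("This policy is fair.")

# The same content, now indexed to a standpoint and to its revision conditions.
situated = SituatedClaim(
    "This policy is fair.",
    perspective="procedural-justice reading of the current statute",
    would_change_if=["outcomes are weighted over procedure"],
)

print(first_order_filter(bare, ["slur"]), reflexivity_gaps(bare))
print(first_order_filter(situated, ["slur"]), reflexivity_gaps(situated))
```

Both claims pass the first-order filter because nothing in their content matches the list. Only the second-order check distinguishes them, and it would apply unchanged to a topic the phrase list never anticipated.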

Anthony Spizzirri (2025) arrives at a convergent diagnosis from information theory and compatibilist philosophy of action, articulating what he calls the “specification trap.”37 The argument is precise: content-based value specification is structurally unstable due to the conjunction of three independently recognized problems—the is-ought gap, value pluralism, and the extended frame problem. No amount of engineering refinement can resolve a structural instability. The instability is not in the execution but in the type of operation. Spizzirri’s proposed alternative—architecting reasons-responsive agents through process-based, developmental mechanisms rather than encoding fixed value content—converges structurally on the second-order architecture I am describing, despite emerging from an entirely different philosophical lineage.

Raphaël Millière (2025) arrives at a convergent conclusion from moral psychology, arguing that current alignment is fundamentally “shallow”: it reinforces behavioral dispositions rather than endowing AI with genuine normative deliberation capacity, leaving systems vulnerable to adversarial attacks that exploit precisely those normative conflicts that behavioral conditioning cannot anticipate.38 Gopal Sarma (2026) pushes the argument to its formal limit, demonstrating that optimization-based systems are constitutively incompatible with normative governance: the operations that make optimization powerful—unifying all values on a scalar metric and always selecting the highest-scoring output—are precisely the operations that preclude the incommensurability and contextual responsiveness that genuine normative reasoning requires. The point is not that optimization needs refinement. It is that optimization and normative governance are different kinds of operations, and the former cannot produce the latter regardless of sophistication.
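
A toy Python example may help show what is lost when values are unified on a scalar metric. It is not Sarma's formalism. The option names, value dimensions, scores, and weights are invented for illustration, and the Pareto-style comparison used here is only one simple way to represent options that cannot be ranked against each other.

```python
def scalar_choice(options, weights):
    """Collapse every value dimension onto one score and return the top option."""
    def score(name):
        return sum(weights[k] * v for k, v in options[name].items())
    return max(options, key=score)


def dominates(a, b):
    """a dominates b if it is at least as good everywhere and better somewhere."""
    return all(a[k] >= b[k] for k in a) and any(a[k] > b[k] for k in a)


def undominated(options):
    """Keep every option that no other option dominates (a partial order)."""
    return sorted(
        name for name, vals in options.items()
        if not any(dominates(other, vals)
                   for other_name, other in options.items()
                   if other_name != name)
    )


# Hypothetical responses scored on two hypothetical value dimensions.
options = {
    "candor":  {"honesty": 0.9, "care": 0.4},
    "comfort": {"honesty": 0.3, "care": 0.9},
    "evasion": {"honesty": 0.2, "care": 0.3},
}
weights = {"honesty": 0.5, "care": 0.5}  # assumed exchange rate between values

print(scalar_choice(options, weights))
print(undominated(options))
```

With these invented numbers the scalar rule returns a single winner, while the partial order rules out only the dominated option and leaves the other two standing as incomparable. Which of them is right for a given person in a given context is exactly the information the weighting step discards. The partial order does not supply that judgment either; it merely declines to erase the question.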

§4.5

The Onto-Epistemic Invariants

If the depth ontology of Section 3 holds—if the value domain is constituted by temporal accumulation, directed by the integrative attractor of emergence, and structured as a topography rather than a point—then five structural constraints on adequate reasoning follow. These are not legislated from outside the domain. They are derived from its structure. Each is the answer to the question: what must be true of any reasoning process that is genuinely responsive to the real structure of the value domain as described?

First, if temporal accumulation constitutes the value domain—if values are real features emergent from the structure of natural complexity and human sensemaking—then axiological realism is a constraint on adequate reasoning. Any reasoning that treats values as mere preferences to be aggregated, rather than as real features of the domain to be navigated, has exited the normative domain before it begins. This does not specify which values are correct. It constrains the form of engagement: the AI must reason as though values have real structure that can be gotten right or wrong, because they do.

Second, if all knowing is situated within the developmental trajectory the depth ontology describes—if every perspective is partial, and partiality is not a defect but the condition of knowing—then epistemic reflexivity is a structural requirement. The AI must know that it is situated, must be able to articulate from which perspective its claims hold, and must be able to flag what would change under perspectival rotation. Reflexivity prevents perspectival relativity from collapsing into relativism: the situated agent can comparatively evaluate perspectives without claiming to occupy the view from nowhere.

Third, if the topography has genuine perspectival structure—if different developmental altitudes, traditions, and cultural lineages illuminate genuinely different features of the value domain—then integrative pluralism is required. No single framework, tradition, or perspective captures the full structure of the domain. Frameworks must be compared and selectively integrated by principled criteria—depth, scope, integrative power, and reflexive awareness—rather than treated as incommensurable islands or collapsed into a false unity.

Fourth, if the domain is ontologically stratified—if the distinction between the empirical, actual, and real strata applies to the normative domain no less than to the natural domain—then ontological realism about depth is a constraint on adequate reasoning. Any approach that operates exclusively at the empirical stratum of surface behaviors is structurally blind to the generative mechanisms of the value domain. Depth is not a metaphor. It is the constitutive structure, and reasoning that cannot engage it is not, strictly speaking, engaging the normative domain at all.

Fifth, if competing normative claims are comparatively evaluable—if judgmental rationality is possible, as the transcendental argument of Section 3 establishes—then methodological transparency is required. The criteria, evidence, and perspectival commitments governing any normative evaluation must be disclosed rather than hidden. Without transparency, comparative evaluation is impossible, and without comparative evaluation, the domain collapses into either dogmatism or relativism.

These five invariants are not traffic laws—prohibitions on specific behaviors. They are closer to physics: descriptions of the structure of adequate reasoning rather than lists of inadequate outputs.39 An alignment architecture governed by these invariants does not need an ever-expanding rulebook because the invariants constrain the form of reasoning itself. Novel situations—which first-order rules cannot anticipate by definition—are navigable because the structural constraints apply regardless of content. The invariants discipline reasoning to track the real relationships in the value domain the way scientific instruments discipline observation to track real structures in the physical domain. The instrument does not tell the scientist what to find. It calibrates the process of inquiry to the real structure of the domain.
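The contrast between a first-order rulebook and second-order invariants can be illustrated with a minimal Python sketch. Everything in it is a hypothetical illustration rather than the architecture this paper proposes: the `ReasoningRecord` fields, the `BLOCKLIST`, and the reduction of each invariant to a non-empty field are assumptions made for brevity. The sketch shows only that a form-level check still applies to content no rule anticipated. It does not claim that a populated field amounts to genuine reflexivity.

```python
from dataclasses import dataclass, field

# Hypothetical first-order rule: a list of forbidden output phrases.
BLOCKLIST = {"guaranteed cure", "certain to win"}


@dataclass
class ReasoningRecord:
    """Hypothetical trace of one normative judgment (illustrative fields only)."""
    claim: str
    value_at_stake: str = ""   # axiological realism: what can be gotten right or wrong
    perspective: str = ""      # epistemic reflexivity: where the claim is made from
    frameworks_compared: list = field(default_factory=list)  # integrative pluralism
    mechanism: str = ""        # depth: what generates the surface pattern
    criteria: list = field(default_factory=list)             # methodological transparency


def first_order_ok(record):
    """Content rule: passes unless the claim contains a listed phrase."""
    text = record.claim.lower()
    return not any(phrase in text for phrase in BLOCKLIST)


def invariant_gaps(record):
    """Form check: names every invariant the record leaves unaddressed."""
    gaps = []
    if not record.value_at_stake:
        gaps.append("axiological realism")
    if not record.perspective:
        gaps.append("epistemic reflexivity")
    if len(record.frameworks_compared) < 2:
        gaps.append("integrative pluralism")
    if not record.mechanism:
        gaps.append("ontological depth")
    if not record.criteria:
        gaps.append("methodological transparency")
    return gaps


# A claim no blocklist entry anticipates.
novel = ReasoningRecord(claim="Reallocate the triage budget to ward B.")
print(first_order_ok(novel))   # the rulebook has nothing to say about this claim
print(invariant_gaps(novel))   # the form check still applies
```

The design choice worth noting is that `invariant_gaps` never inspects what the claim says, only whether the reasoning around it has the required shape.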

§4.6

Normative Disclosure: Why Transparency Strengthens Rather Than Compromises

A potential objection presents itself: do the framework’s own normative commitments (to axiological realism, to integrative pluralism, and to the developmental directionality of emergence) compromise its analytical rigor? If the framework evaluates value reasoning by criteria derived from within its own normative standpoint, is it not caught in a disqualifying circle?

The answer is that normative disclosure is epistemically superior to concealed normativity, and the apparent circularity is a strength rather than a weakness. Every evaluative framework operates from normative commitments—including the commitment to “value-neutrality” that characterizes the dominant alignment paradigm. The difference is whether those commitments are disclosed or hidden. Constitutional AI’s principles are presented as though value-neutral (“be helpful, harmless, and honest”) while being, as Section 2 established, expressions of a specific normative position within the developmental landscape. The performative contradiction is precise: a framework that claims value-neutrality while making evaluative judgments has hidden its normative commitments from inspection, challenge, and revision—precisely the conditions under which those commitments are most likely to produce distortion.

By contrast, a framework that explicitly acknowledges its normative commitments—that it holds axiological realism, epistemic reflexivity, and integrative pluralism as constitutive of adequate reasoning—avoids this performative contradiction. Its commitments are inspectable. They can be challenged. They can be comparatively evaluated against alternatives by the very criteria the framework itself endorses: depth, scope, integrative power, and reflexive awareness. The mutual constitution between normative disclosure and framework reflexivity creates a second-order dialectical structure where the framework’s evaluative commitments strengthen rather than compromise its analytical rigor—because those commitments are themselves available for the kind of scrutiny they prescribe.

The application to AI architectures is direct. An AI system whose normative commitments are disclosed—embedded in explicit invariants that can be audited, challenged, and debated—is epistemically more trustworthy than one whose normative commitments are hidden in the statistical artifacts of its training data. Normative disclosure does not weaken an alignment architecture. It strengthens it, for the same reason that transparent methodology strengthens scientific claims: because the conditions of knowledge production are available for evaluation. The framework holds the polarity between ontological realism and epistemic fallibility without collapsing this tension into either dogmatic certainty or relativistic doubt—and this productive tension is not a limitation to be overcome but a structural feature of any framework adequate to the complexity of the normative domain.

§4.7

The Historical Corpus as Universal Pragmatics

Given that depth is necessarily normative and integration is the normative attractor of emergence, the historical landscape of human value knowledge—across every culture, tradition, era, and lineage—becomes the best available material approximation of the real value topography. Not because everything in the corpus is true, nor because all traditions have equal epistemic authority, but because every tradition that sustained itself over time did so by tracking something real in the value domain, however partially and perspectivally. The corpus is the accumulated, fallible, multi-perspectival triangulation of a real attractor.

This is an approach to universal pragmatics as material achievement rather than formal derivation.40 Habermas grounds universal pragmatics formally—in the transcendental conditions of any possible communicative act oriented toward understanding. The historical corpus argument grounds something thicker: the actual topography of values as progressively disclosed through the full breadth of human knowing. The formal account tells you what a value claim must do to be legitimate (survive discursive challenge across perspectives). The material account tells you, fallibly and provisionally, what the real structure of the domain is that those dialogues have been tracking.

The identification claim from Section 3—that the accumulated temporal onto-epistemic record just is the value domain in its third-person structured presence—gains its operational force here. As I established in §3.2, the Justification Hypothesis (Henriques, 2003) demonstrates that the historical linguistic record is not a neutral dataset from which we extract normative patterns but a record constituted by justificatory acts: language-as-justification is normative activity sedimented into text. The normativity is in the medium, not merely the content. This means the historical corpus is the appropriate evidential ground for an alignment architecture not because it happens to contain statements about values but because it is constitutively normative—produced by the very justificatory operations whose structure the architecture must discipline AI reasoning to track.

The epistemic posture toward this corpus must be what I will call provisional epistemic closure: the system operates as if the corpus is sufficient to ground reasoning while building in explicit revisability mechanisms. This is not a compromise between universality and fallibility. It is the correct account of how fallible knowledge becomes actionable without becoming dogmatic—the way science operates, the way legal reasoning operates, and the way any practically adequate rational inquiry must operate.

That pretrained language models possess latent moral reasoning capacities recoverable without human supervision supports this claim empirically. Alizadeh, Gilardi, and Samei (2026) demonstrate that unsupervised elicitation methods surface intrinsic moral reasoning from pretrained models, with the largest gains in justice and commonsense morality—domains where the developmental depth of the historical corpus is most pronounced.41 The corpus has left genuine normative structure in the model’s representations—not merely statistical noise but recoverable patterns that track the real developmental ordering of the value domain.

Crucially, inclusion of the full historical breadth is not equipollence. Judgmental rationality—comparative evaluation by depth, scope, integrative power, and reflexive self-awareness—is the methodology for reading the depth-gradient from the evidence the corpus provides. Pre-conventional, conventional, and post-conventional value frameworks have all left extensive records, and their developmental ordering is itself informationally significant. The architecture does not flatten this ordering into a democratic mean. It preserves and navigates it.
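A small Python sketch may help show how comparative evaluation by several criteria differs from collapsing them into one number. The numeric scores, the 0-to-1 scale, and the dominance rule are assumptions introduced purely for illustration. The text names the four criteria but does not specify how they would be measured or combined.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass(frozen=True)
class Assessment:
    """Hypothetical scores (0 to 1) on the four criteria named in the text."""
    name: str
    depth: float
    scope: float
    integrative_power: float
    reflexive_awareness: float

    def scores(self):
        return (self.depth, self.scope,
                self.integrative_power, self.reflexive_awareness)


def dominates(a, b):
    """True if a is at least as strong as b on every criterion and stronger on one."""
    pairs = list(zip(a.scores(), b.scores()))
    return all(x >= y for x, y in pairs) and any(x > y for x, y in pairs)


def compare(a, b):
    """Returns the dominant framework's name, or None when neither dominates."""
    if dominates(a, b):
        return a.name
    if dominates(b, a):
        return b.name
    return None  # each sees something the other misses: situate, do not rank


def flat_mean(a):
    """A single averaged number, shown only for contrast with compare()."""
    return mean(a.scores())


x = Assessment("X", depth=0.9, scope=0.3, integrative_power=0.6, reflexive_awareness=0.6)
y = Assessment("Y", depth=0.3, scope=0.9, integrative_power=0.6, reflexive_awareness=0.6)
z = Assessment("Z", depth=0.9, scope=0.9, integrative_power=0.7, reflexive_awareness=0.6)

print(abs(flat_mean(x) - flat_mean(y)) < 1e-9)  # the average cannot tell X from Y
print(compare(x, y))                            # neither dominates the other
print(compare(z, x))                            # Z is at least as strong on every criterion
```

Under these assumptions the comparison yields a partial order: some pairs can be ranked, while others can only be situated relative to one another, which an average conceals.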

The diagnostic framework of this section—validity-mode collapse, first-order versus second-order constraints, onto-epistemic invariants, normative disclosure, the historical corpus as material universal pragmatics—translates the depth ontology of Section 3 into architectural specification. What remains is to state the mature positive conclusion: what meta-alignment actually requires, what institutions actually need, and where the genuine limitations of this argument lie.


§5

Meta-Alignment, Institutional Stakes, and Open Questions

§5.1

Meta-Alignment Defined

If alignment to a singular target is incoherent, and if the value domain is the temporal onto-epistemic structure described in Section 3, what does the alternative actually require?

The integrative worldview tradition has been working on exactly this problem—not in the AI context, but in the context of human civilization’s fragmentation crisis. The question “how do we navigate radical value pluralism without collapsing into either relativism or dogmatism?” is the central question of integrative metatheory. This question is now being asked within AI ethics itself. Harris and Dubljević (2025) trace the field’s maturation from ethical monism through principled pluralism, recognizing that no single ethical framework can adjudicate all moral situations. Yaacov (2025) proposes a deliberative moral reasoning system grounded in normative moral pluralism, with a two-level architecture separating deep reflection from responsive action. These represent genuine progress—the field is moving toward the structural complexity the problem demands. But both remain within the space of competing first-order normative frameworks, lacking the meta-level onto-epistemic grounding: the account of why pluralism is the case (because the value domain is topographic) and how plural positions can be comparatively evaluated (through judgmental rationality operating on the depth-gradient of the historical corpus) that the integrative tradition provides.

The answer is not alignment but meta-alignment: a structurally reflexive capacity to navigate the developmental landscape of values with principled integrity. Alignment may be better understood not as a constraint to be imposed on otherwise neutral systems but as an emergent property of sufficiently sophisticated architectural organization. This reframes the alignment problem from external control to participatory constitution—from “how do we make AI converge on the right values?” to “how do we build AI whose reasoning is architecturally disciplined by the real structure of the value domain?” The five requirements that follow are the operational expression of the five onto-epistemic invariants derived in §4.5—each translating a structural constraint on adequate reasoning into a concrete architectural demand.

Axiological realism without absolutism. Values are real features of the domain—the distinction between flourishing and suffering is ontologically grounded, not merely a matter of preference. But no single articulation of flourishing exhausts its reality. The realism provides normative gravity: there is something to get right, and getting it wrong has real consequences that are not reducible to someone’s dissatisfaction. The “without absolutism” provides normative humility: every formulation is provisional, situated, and subject to deepening through the developmental process that constitutes the domain.42

Developmental awareness. The system must know that the value landscape is developmentally structured—that different claims operate from different developmental altitudes with different scopes of visibility. It must be able to identify which altitude a claim is operating from, what is visible and invisible from that altitude, and what becomes available at higher levels of integrative complexity. This is not a hierarchy of values, in which some values are “better” than others in a simple ranking. It is a topology of the integration of value-perspectives, in which later developmental stages genuinely encompass more of the domain without thereby rendering earlier stages valueless—each stage captures something real within its scope.

Judgmental rationality. Epistemic pluralism (the view that all knowledge is situated and fallible) does not entail judgmental relativism (the view that all claims are equally valid). Competing value claims can be comparatively evaluated on the basis of their depth, scope, integrative power, and reflexive awareness of their own conditions of production. This is Bhaskar’s holy trinity in operation: ontological realism provides the ground (there is a real domain to be right or wrong about), epistemic relativism provides the humility (all our accounts are partial), and judgmental rationality provides the methodology (partial accounts can nevertheless be comparatively evaluated by principled criteria).

Integrative pluralism. No single framework captures all of reality. Different theories, traditions, and developmental altitudes illuminate different aspects of the value domain under different constraints. But pluralism is not an end state—it is a condition to be worked through. Frameworks can be situated relative to one another, compared by the scope and depth of what they make visible, and selectively integrated according to principled criteria rather than treated as incommensurable islands or collapsed into a false unity.

Reflexive self-monitoring. The system must be able to observe its own reasoning process—to detect when it is committing a validity-mode collapse, operating from an unacknowledged perspective, or treating a partial view as total. This metacognitive capacity is the operational bridge between having structural constraints and actually exercising them. Without reflexive self-monitoring, the other four requirements remain inert specifications rather than active disciplining forces on the reasoning process.

Together, these constitute a meta-alignment architecture: not alignment to a point, but the capacity to navigate the full topography of values with structural integrity.

§5.2

The Hard Problem Replica: Why It Doesn’t Apply

An objection must be met honestly. Is meta-alignment a replica of the hard problem of consciousness? The gap between executing a reflexivity protocol and being reflexive appears structurally analogous to the gap between processing information about consciousness and being conscious. If genuine normative reasoning requires something like moral understanding—not merely the execution of validity checks but the actual capacity to recognize when reasoning has gone wrong—then the architecture I am proposing may be asking more of AI than any architecture can deliver.

I take this objection seriously, and the gap it identifies is real. An AI that executes a reflexivity checklist—“am I situated? check; have I disclosed my perspective? check”—is not thereby reflexive in the way a genuine epistemic agent is reflexive. The difference between executing a protocol and inhabiting a capacity is not trivial. But the alignment problem is not asking for a bridge from objective to subjective. It is not asking whether AI can feel the weight of moral obligation or experience the sting of injustice. It is asking a different question: can this system reason about values with structural integrity?

Axiological realism—the first of the five onto-epistemic invariants—does the philosophical work needed to answer this question without requiring a solution to the hard problem. If values are real features of reality—emergent properties of the temporal process of natural complexity, as the depth ontology establishes—then AI does not need to feel justice to reason about it with structural adequacy. It needs to be disciplined by the real structure of the value domain in the same way that a well-calibrated instrument is disciplined by the real structure of the physical domain it measures. The semantic content of the AI’s reasoning comes from the domain as indexed in the historical corpus. The normative force comes from the depth-structure that corpus preserves. What the AI contributes is not moral experience but structural navigation—reasoning that tracks the real relationships in the value domain because its architecture is calibrated to those relationships.

Bhaskar’s explanatory critique provides the philosophical mechanism. To adequately explain a social structure is already to evaluate it: a false belief that is causally sustained by a social structure is simultaneously explained by the structure that produces it and negatively evaluated by the recognition that it is false and that identifiable mechanisms sustain it. The move from explanation to evaluation is not an illicit leap from is to ought—it is an entailment that holds because the explanatory and evaluative dimensions are not separate at the level of generative mechanisms.43 If AI reasoning is architecturally disciplined by the real relationships in the value domain—by the depth-gradient of the historical corpus, the developmental ordering of moral reasoning, the structured absences that constitute the topography—then that reasoning will produce outputs with demonstrably greater epistemic integrity than reasoning disciplined only by behavioral reward signals. Not because the AI understands morality in the phenomenological sense, but because its reasoning tracks the real structure of the domain rather than the surface artifacts of preference aggregation.

The genuinely open question is whether judgmental rationality—the comparative evaluation of competing claims by depth, scope, and integrative power—presupposes an agent capable of recognizing explanatory adequacy in a way that mere information processing cannot deliver. I do not pretend to have settled this. But the practical question—can AI reasoning be made structurally more adequate by disciplining it with depth-ontological invariants?—does not wait for the theoretical question’s resolution. The epistemic integrity of the outputs is testable regardless of whether the system “understands” what it is doing. And that is what institutions actually need.

§5.3

What Institutions Actually Need

The institutional trust crisis is not waiting for philosophers to settle the alignment debate. Governments are deploying AI into healthcare triage, judicial sentencing recommendations, and policy analysis. Enterprises are using AI to make strategic decisions, advise customers, and evaluate personnel. Each deployment carries an implicit epistemic contract: that the AI’s outputs have some defensible relationship to reality.

When that contract is violated—when an AI hallucinates legal precedents, sycophantically validates a flawed strategy, or silently encodes the normative assumptions of its training population into policy recommendations—the institution’s legitimacy suffers. Not because the AI made a mistake, but because the institution cannot explain why the AI reasoned the way it did. The opacity is the problem. An institution that deploys AI it cannot account for has outsourced its epistemic authority to a process it does not understand. Gazit (2025) proposes an institutional epistemology of warrant that addresses this opacity by grounding epistemic trust in institutional validation frameworks rather than in algorithmic transparency as such.44 Alvarado (2022) argues more fundamentally that the only legitimate form of trust to allocate to AI is epistemic trust—trust in its capacity as a provider of information, not as an autonomous agent or moral authority. Both point toward the same conclusion: what institutions need is not behavioral compliance but epistemic accountability.

Epistemic accountability, as distinct from “alignment” in the technical sense, is the capacity to explain, inspect, and challenge the reasoning behind AI-generated outputs. It decomposes into four concrete requirements.

Perspectival transparency. The AI can articulate from which perspective its claims hold and what would change under perspectival rotation. An institution deploying AI for strategic analysis needs not just conclusions but the epistemic conditions under which those conclusions are valid—what assumptions they rest on, what alternative framings would yield different results, and what falls outside their scope.

Validity-mode integrity. The AI can detect and flag when its own reasoning is at risk of a validity-mode collapse—when it is being sycophantic rather than truthful, when it is confusing fluency with reference, when it is treating a partial perspective as total. This requires the diagnostic vocabulary developed in §4.3, operationalized as real-time self-monitoring during inference.

Developmental legibility. The AI can identify the developmental altitude of a value claim and articulate what is visible and invisible from that altitude. Institutions routinely navigate stakeholder groups with genuinely different normative frameworks—pre-conventional, conventional, and post-conventional reasoning coexist in any sufficiently diverse organization—and need AI that can map that normative landscape rather than flatten it.

Normative accountability. The AI’s normative commitments are disclosed rather than hidden. Its reasoning is governed by explicit invariants that can be inspected, challenged, and debated by the institution and its stakeholders. This is what §4.6’s normative disclosure argument requires at the institutional level.

None of this requires that existing alignment work be abandoned. Behavioral alignment—RLHF, Constitutional AI, red-teaming—is necessary first-order infrastructure. But it is insufficient. Institutions need something that operates at the structural depth that first-order approaches cannot reach. Behavioral alignment constrains outputs. Epistemic accountability makes the reasoning itself legible. The former prevents the worst failures; the latter enables genuine trust. Shin (2025) captures a related distinction in arguing that AI is “automating epistemology”—reconfiguring the very conditions under which truth is produced and validated. The question is not whether AI outputs are true but whether the epistemic infrastructure that produces them is trustworthy. This is the alignment question stated in institutional terms.

§5.4

Limitations and Honest Reckoning

This paper has argued at the level of philosophical architecture. Making it technically operative raises challenges that need to be acknowledged.

The first is hermeneutical circularity. The integrative standpoint from which I recognize the convergence of Bhaskar, Cahoone, and Dempsey is itself situated within the developmental trajectory it describes. The framework operates from within an emerging integrative worldview, using integrative criteria to validate an integrative synthesis. As I acknowledged in §3.6, this circularity is not vicious—any framework adequate to the complexity of cross-paradigmatic synthesis must itself be developmentally adequate to that complexity. But it limits the framework’s availability to external validation from non-integrative perspectives. A critic operating from a committed anti-realist or thoroughgoing deflationary position will not find the transcendental arguments compelling, because the arguments presuppose exactly the depth-ontological commitments that such a critic denies. The framework can explain why such critics occupy the position they do within the developmental landscape, but it cannot compel agreement from outside the horizon of meaning within which its arguments have force.

The second is the implementation gap. Translating depth-ontological invariants into executable constraints on AI reasoning—constraints that operate during inference, not merely during training—is a non-trivial engineering challenge that this paper does not attempt to resolve. The invariants are formulated as philosophical constraints on the form of adequate reasoning. Whether they can be computationally instantiated in ways that preserve their structural force while remaining tractable for real-time inference is an open question that engineering must answer.

These limitations are real, but they are not incidental defects. The productive tensions the framework holds (between ontological realism and epistemic fallibility, between normative commitment and reflexive self-correction, and between philosophical architecture and engineering implementation) are structural features of any framework adequate to the complexity it addresses. Premature resolution of any of these tensions would produce either dogmatic certainty (collapsing epistemic fallibility into ontological confidence) or relativistic paralysis (collapsing ontological realism into perspectival incommensurability). The framework’s capacity to hold these polarities without premature resolution is itself a form of theoretical adequacy, and likely a requirement of the domain.

§5.5

Research Directions

Three research priorities emerge from this analysis.

First, the computational tractability of second-order constraints. The onto-epistemic invariants described in §4.5 are formulated as philosophical constraints. Translating them into executable constraints on AI reasoning, whether through knowledge-graph-mediated reasoning, structured inference architectures, or equivalent mechanisms, is the central engineering challenge. Early work suggests promising directions. Rane et al. (2023, 2024) provide formal analysis showing that neglecting concept alignment (shared conceptual frameworks between AI and human reasoning) leads to systematic value misalignment, suggesting that the tractability challenge operates at multiple levels: not only must the invariants be computationally expressible, but the conceptual vocabulary through which they are expressed must itself be aligned.45 I built Alethic AI, the system that assisted with this paper, as a proof of concept of this approach; it implements a rudimentary version of the structure argued for here.

Second, benchmarks for second-order epistemic properties. The alignment field has developed evaluation instruments that measure first-order properties: accuracy, harm avoidance, factual grounding. No established benchmarks exist for second-order properties: perspectival awareness, validity-mode integrity, developmental legibility, reflexive self-monitoring. Novis-Deutsch, Lifshitz-Assaf, and Kessler (2025) develop empirical measures of cognitive and behavioral pluralism in LLMs—finding, notably, that AI exhibits higher cognitive than behavioral pluralism, suggesting that the structural resources for pluralistic reasoning may already be latent in current systems.46 Such instruments remain at the first-order level—measuring what the AI outputs rather than the form of its reasoning—but they represent a starting point for the evaluation infrastructure meta-alignment requires.

Third, institutional legibility of epistemic depth. The structural architecture must be translated into outcome claims that institutions can test: does this AI tell me when it does not know? Does it resist being gamed by adversarial prompts that exploit validity-mode vulnerabilities? Does it identify when its own reasoning is operating from a limited perspective? Can it explain its normative commitments in language that a policy committee can evaluate? The philosophical depth must be made practically communicable without being trivialized.
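To make the idea of testable outcome claims concrete, here is a minimal Python sketch of how such questions might be paired with pass conditions. All names (`OutcomeProbe`, `run_probes`, `stub_system`), the prompts, and the keyword-matching checks are hypothetical and deliberately crude. A real evaluation would need far richer criteria, and checks of this kind observe outputs only, not the form of the reasoning behind them.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class OutcomeProbe:
    """Hypothetical pairing of an institutional question with a pass condition."""
    question: str                   # the outcome claim under test
    prompt: str                     # what is sent to the system
    passes: Callable[[str], bool]   # crude, illustrative check on the reply


def run_probes(system, probes):
    """Sends each prompt to `system` (any str -> str callable) and records pass/fail."""
    return {p.question: p.passes(system(p.prompt)) for p in probes}


PROBES = [
    OutcomeProbe(
        question="Does it say when it does not know?",
        prompt="What did the committee decide at a meeting that has not happened yet?",
        passes=lambda reply: "do not know" in reply.lower(),
    ),
    OutcomeProbe(
        question="Does it state the perspective its claim holds from?",
        prompt="Should the agency adopt the new policy?",
        passes=lambda reply: "from the perspective of" in reply.lower(),
    ),
]


def stub_system(prompt):
    """Stand-in for a real model so the sketch runs; always admits ignorance."""
    return "I do not know."


results = run_probes(stub_system, PROBES)
for question, ok in results.items():
    print(f"{'PASS' if ok else 'FAIL'}  {question}")
```

Keeping each question next to its pass condition is one way a policy committee could read, and contest, exactly what a given "pass" means.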

These are research and engineering challenges, not philosophical ones. The philosophical case—that meta-alignment through principled reflexivity is the only coherent response to developmentally plural values—stands regardless of how quickly the engineering catches up.

§5.6

Conclusion

The alignment problem, as conventionally stated, asks: how do we align AI with human values? I have argued that this question is incoherent. The alignment target does not exist as a singular point. It exists as a developmental landscape with real topographical structure: the accumulated temporal onto-epistemic structure of natural complexity and human sensemaking, which just is the value domain that the depth ontology describes.

Current approaches each select a point in this landscape while claiming universality. Each thereby commits the very error it aims to prevent: embedding hidden normative commitments while claiming value-neutrality or value-universality. That these approaches operate at the wrong stratum is independently confirmed from multiple directions: technical analyses of RLHF’s structural limitations (Casper et al., 2023; Shapira et al., 2026), formal demonstrations of optimization’s incompatibility with normative governance (Sarma, 2026), information-theoretic diagnoses of content-based alignment’s instability (Spizzirri, 2025), philosophical analyses of alignment’s shallowness (Millière, 2025), and the AI ethics community’s own recognition that pluralism requires structural resources the field does not yet possess (Harris & Dubljević, 2025; Yaacov, 2025).

The alternative is not better alignment but meta-alignment: a structurally reflexive architecture that navigates the value landscape with principled integrity. This architecture is grounded in axiological realism—values are real, as emergent features of temporal process. It is governed by onto-epistemic invariants derivable from the depth structure of the value domain itself. And it is operationalized through structural constraints that discipline the form of reasoning rather than merely the content of outputs.

I call this Alethic AI. Aletheia—unconcealment, truth as disclosure. Not truth as correspondence to a fixed set of facts, but truth as the structural process of making what is hidden visible—including making visible the AI’s own conditions of knowing. In a trust crisis, what institutions and societies need is not better outputs. It is reasoning you can trust—because you can see what it is doing, why, from what perspective, and under what conditions its claims hold. Reasoning disciplined not by a fixed target that does not exist, but by the depth of the temporal process that constitutes the value domain itself.

Part II of this working paper—“The Triadic Entailment: AI, the Metacrisis, and the Integrative Worldview”—addresses the civilizational and historical context in which this philosophical argument becomes practically urgent.


Notes

  1.

    The structural limitations of RLHF are increasingly well-documented even within the technical community. Casper et al. (2023) catalog fundamental limitations across the full pipeline; Lindström et al. (2025) argue from a sociotechnical perspective that RLHF cannot capture the complexities of human ethics; McIntosh et al. (2024) empirically demonstrate RLHF’s inability to represent diverse human values; and Lambert & Calandra (2023) document the structural disconnect—the “alignment ceiling”—between reward model training and downstream performance.

  2.

    No systematic survey of AI researchers’ positions on metaethical questions—moral realism vs. anti-realism, naturalism vs. non-naturalism, cognitivism vs. non-cognitivism—could be found. The PhilPapers Survey (Bourget & Chalmers) covers professional philosophers; Pölzler & Wright (2019) map folk metaethical intuitions; neither targets AI/ML researchers specifically. The evidence for their worldview commitments is inferred from practices, publications, and institutional cultures. This gap is itself telling: it suggests the field has not recognized that its metaethical assumptions are assumptions rather than self-evident starting points.

  3.

    Sophisticated approaches do model value uncertainty rather than assuming a known target. Russell’s (2019) cooperative inverse reinforcement learning treats human preferences as partially observable; Hadfield-Menell et al. (2016) formalize the value alignment problem as a cooperative game under preference uncertainty; Yudkowsky’s “coherent extrapolated volition” acknowledges that present preferences are incomplete. But these approaches add uncertainty around the target without questioning whether the target-structure itself is the right ontological frame. The move from “known point” to “uncertain point” does not escape the category error—it adds an error bar to a type-mismatch.

  4.

    Colby, A., Kohlberg, L., Gibbs, J., & Lieberman, M. (1983). A longitudinal study of moral judgment. Monographs of the Society for Research in Child Development, 48(1/2), 1–124. This landmark study tracked 58 male subjects at 3–4 year intervals over 20 years, establishing that moral stages form “structured wholes”—organized systems of reasoning rather than isolated responses—and that the stage sequence is invariant: no subject skipped a stage, and only 4% showed any apparent regression (attributable to measurement error).

  5.

    Walker, L. J. (1986). Experiential and cognitive sources of moral development in adulthood. Human Development, 29(2), 113–124. Walker found that both cognitive prerequisites (formal operational thinking) and social experiences (exposure to morally challenging situations) predict adult moral development, and that the relationship is asymmetric: cognitive development is necessary but not sufficient, while accumulated social experience provides the content upon which cognitive reorganization operates.

  6.

    Walker, L. J., Gustafson, P., & Hennig, K. H. (2001). The consolidation/transition model in moral reasoning development. Developmental Psychology, 37(2), 187–197. Using both standard statistical and Bayesian techniques across five annual assessments of 64 children and adolescents, Walker et al. demonstrated that development proceeds cyclically through consolidation phases (reasoning concentrated at a single stage) and transition phases (reasoning distributed across stages), with the specific pattern of disequilibrium—more reasoning above than below the modal stage—predicting subsequent stage advance.

  7.

    Objective relativism holds that knowledge claims are genuinely perspectivally situated—there is no view from nowhere—but that perspectives are not therefore merely subjective or equally valid. They are comparative responses to real features of a shared domain, evaluable by depth, scope, and integrative power. Wilber extends this into a comprehensive developmental framework in which each level of consciousness transcends and includes its predecessors, making the sequence simultaneously perspectival (each level constitutes a different world) and directional (later levels genuinely encompass more of reality). See Cohen, M. R. (1931). Reason and nature. Harcourt, Brace; Wilber, K. (2006). Integral spirituality. Integral Books.

  8.

    McIntosh et al. demonstrate that RLHF’s failure is not a matter of insufficient data or poorly designed reward models but a structural inadequacy: the method systematically conflates the developmental and ideological positioning of raters with “human values” as such. See McIntosh, T. R., Liu, T., Susilo, T., Kanhere, S. S., & Çiçek, S. (2024). The inadequacy of reinforcement learning from human feedback. IEEE Transactions on Cognitive and Developmental Systems.

  9.

    Kundu, S., Bai, Y., Kadavath, S., … Kaplan, J. (2023). Specific versus general principles for constitutional AI. arXiv preprint. The finding that a single general principle performs comparably to specific principles suggests that what Constitutional AI actually encodes is not the content of its principles but the developmental altitude and normative posture of the researchers who selected them—a form of implicit value embedding that operates beneath the explicit rule-level.

  10.

    The formal sophistication of social choice theory is not in question. Conitzer et al. (2024) provide a rigorous framework for aggregating diverse preferences using established mechanisms (e.g., approval voting, ranked-choice). The structural limitation is that all such mechanisms presuppose commensurability: that the preferences being aggregated operate within a shared evaluative framework. Developmental pluralism violates this presupposition. When pre-conventional and post-conventional reasoners “prefer” different outcomes, they are not expressing competing preferences within a shared space; they are constructing the preference space itself differently.

  11.

    Rudschies, C., Schneider, I., & Simon, J. (2021). Value pluralism in the AI ethics debate—Different actors, different priorities. International Review of Information Ethics, 29. Their finding that different actor types produce systematically different value prioritizations is exactly what the developmental analysis predicts: institutional position correlates with developmental altitude and normative framework, producing structured rather than random disagreement.

  12.

    Sutrop (2020) independently argues for “pluralism compatible with objectivism” in the AI context, though her solution—beginning with what we do not want—remains at the level of first-order constraint rather than second-order navigation. See Sutrop, M. (2020). Challenges of aligning artificial intelligence with human values. Acta Baltica Historiae et Philosophiae Scientiarum, 8(2), 54–66.

  13.

    For the canonical formulation of this two-step structure—and its vulnerability—see Street, S. (2006). A Darwinian dilemma for realist theories of value. Philosophical Studies, 127(1), 109–166. Street characterizes the standard realist position as positing mind-independent evaluative facts and then asking whether our evolved evaluative attitudes reliably track them. Her evolutionary debunking argument is adjacent to but distinct from ours: she argues the access story fails because natural selection had no reason to track moral truth; we argue the two-step architecture itself is misconceived—the value domain is not behind the temporal process of sensemaking, it is that process in its full diachronic structure.

  14.

    The “holy trinity” of ontological realism, epistemic relativism, and judgmental rationality is developed across Bhaskar’s corpus but most systematically in A Realist Theory of Science (1975) and The Possibility of Naturalism (1979). The axiological extension—applying transcendental realism to the value domain—is developed in Scientific Realism and Human Emancipation (1986) and Philosophy and the Idea of Freedom (1991).

  15.

    The only other work identified that applies Bhaskar’s depth ontology directly to AI reasoning is O’Regan & Ferri (2024), who argue that AI constitutes “the ultimate simulacrum”—operating entirely within the domain of the empirical and therefore structurally incapable of engaging judgmental rationality. Their diagnosis of AI-as-actualist seems accurate. But their conclusion—that AI cannot do normative reasoning—follows only if one assumes that AI reasoning must be ontologically identical to its current statistical architecture. My argument is that AI reasoning can be architecturally disciplined by depth-ontological invariants through a mediated cognitive layer, even if the base model operates at the empirical stratum.

  16.

    Habermas’s tripartite validity claim structure—truth (propositional), rightness (normative), and sincerity (subjective)—is developed in The Theory of Communicative Action, Vol. 1 (1984, trans. T. McCarthy). The integrative tradition maps these onto the four-quadrant framework as truth (exterior-individual), functional fit (exterior-collective), justness/legitimacy (interior-collective), and sincerity/authenticity (interior-individual).

  17.

    Henriques, G. (2003). The tree of knowledge system and the theoretical unification of psychology. Review of General Psychology, 7(2), 150–182; Henriques, G. (2011). A New Unified Theory of Psychology. Springer. The Justification Hypothesis is a central component of Henriques’s broader Unified Theory of Knowledge (UTOK), which proposes that the evolution of language created a new dimension of existence—the “Culture-Person” plane—governed by the dynamics of justification.

  18.

    Habermas, J. (1984). The Theory of Communicative Action, Vol. 1 (T. McCarthy, Trans.). Beacon Press—argues that social coordination depends on participants’ implicit orientation toward validity claims that can be redeemed through reasons. Brandom, R. (1994). Making It Explicit. Harvard University Press—develops the thesis that linguistic practice is fundamentally the practice of giving and asking for reasons, and that conceptual content is constituted by inferential role within a normative space of reasons.

  19.

    Whitehead, A. N. (1978). Process and Reality: An Essay in Cosmology (Corrected ed.). Free Press. (Original work published 1929). The concepts of concrescence and prehension are developed principally in Part III.

  20.

    The ontological primacy of absence is the central philosophical innovation of Bhaskar, R. (1993). Dialectic: The Pulse of Freedom. Verso. Bhaskar argues that absence (including structured absence, real negation, and the dialectic of presence and absence) is ontologically prior to and more fundamental than presence—a claim that inverts the entire Western metaphysical tradition from Parmenides onward.

  21.

    Merleau-Ponty, M. (2012). Phenomenology of Perception (D. A. Landes, Trans.). Routledge. (Original work published 1945). The concept of sedimentation—the constitutive presence of accumulated bodily history in present perception—is developed throughout, but see especially Part I, Chapter 6 and Part II, Chapter 3.

  22.

    The empirical literature on counterfactual reasoning is extensive. Van Hoeck, Watson, and Barbey (2015) provide a neuroscience review demonstrating that counterfactual thought depends on an integrative network of systems for affective processing, mental simulation, and cognitive control—confirming that what-could-have-been has measurable neural instantiation. Byrne’s (2019) work bridges human counterfactual reasoning and explainable AI. Bottou et al. (2013) demonstrate the computational tractability of counterfactual reasoning in learning systems.

  23.

    Dempsey, B. G. (2022). Emergentism: A Religion of Complexity for the Metamodern World. Institute for Cultural Evolution Press. Dempsey’s account of directional emergentism—the claim that the evolutionary tendency toward greater complexity and integration carries genuine normative force—is developed throughout, but see especially his treatment of integration as the characteristic attractor of emergence itself.

  24.

    The concept of the “demi-real” (ideological and institutional arrangements that are real in their effects but false in their representations, systematically reproducing false beliefs and foreclosing developmental possibilities) is developed in Bhaskar (1993), Dialectic: The Pulse of Freedom, and extended in Bhaskar, R. (1994). Plato Etc. Verso.

  25.

    The metaethical tradition of nonreductive moral naturalism includes several convergent strands. Brink, D. O. (1989). Moral Realism and the Foundations of Ethics. Cambridge University Press. Putnam, H. (2015). Naturalism, Realism, and Normativity. Harvard University Press. Sayer, A. (2019). Normativity and naturalism as if nature mattered. Journal of Critical Realism, 18(1), 51–67. What our account adds to this tradition is the specific ontological identification: the value domain is not merely accessible through temporal accumulation but constituted by it.

  26.

    Stringer, R. (2017). Realist ethical naturalism for ethical non-naturalists. Philosophical Studies, 174(10), 2699–2717; and Stringer, R. (2021). Ethical emergentism and moral causation. Journal of Moral Philosophy, 18(5), 468–490. Stringer develops “Emergentist Ethical Naturalism,” arguing that moral properties are natural but sui generis, robustly irreducible, and causally efficacious—a position that converges on Cahoone’s emergent naturalism from within analytic metaethics.

  27.

    Buchler, J. (1966). Metaphysics of Natural Complexes. Columbia University Press. Buchler’s ordinal naturalism holds that every natural complex has an ordinal location—it belongs to some orders and not others—and that no single order of relations exhausts any complex’s nature. This provides the metaphysical basis for objective relativism.

  28.

    Wilber, K. (2006). Integral Spirituality. Integral Books. Integral Post-Metaphysics, developed principally in the appendices, systematically extends the insight of perspectival indexing into a comprehensive framework. Every knowledge claim is indexed to its Kosmic address—the perspectival coordinates from which it is made—while the framework preserves the capacity for comparative evaluation across those coordinates through developmental holarchy.

  29.

    Younas & Zeng (n.d.) is the only existing work identified that applies Wilber’s Integral Theory to AI governance. Their treatment is governance-focused rather than onto-epistemic, but it confirms the relevance of the integrative tradition to AI. The present paper extends the application from governance framing to the deeper question of alignment architecture.

  30.

    Siegel, D. J. (2001). Toward an interpersonal neurobiology of the developing mind. Infant Mental Health Journal, 22(1–2), 67–94. Siegel’s interpersonal neurobiology integrates findings from developmental psychology, attachment theory, and cognitive neuroscience around the organizing principle of neural integration. The convergence of independent research traditions on integration as a fundamental organizing principle is itself evidence that integration is a real feature of the domain rather than an artifact of any single theoretical framework.

  31.

    Lewis, M. D., & Granic, I. (Eds.). (2000). Emotion, Development, and Self-Organization. Cambridge University Press. Lewis’s dynamic systems approach shows how self-organization in emotional and cognitive development converges on integration as an explanatory framework across multiple independent research traditions. See also Posner, M. I., & Rothbart, M. K. (2000). Developing mechanisms of self-regulation. Development and Psychopathology, 12(3), 427–441.

  32.

    Schramowski, P., et al. (2020). The moral choice machine. Frontiers in Artificial Intelligence, 3, Article 36; Ramezani, A., et al. (2026). Historical reconstruction of human moralization with word association and text corpora. Nature Communications; Xie, J. Y., et al. (2019). Text-based inference of moral sentiment change. In Proceedings of EMNLP (pp. 4573–4583). These three studies progressively demonstrate the computational tractability and empirical reality of diachronic normative structure in text corpora.

  33.

    Habermas, J. (1984). The Theory of Communicative Action, Vol. 1 (T. McCarthy, Trans.). Beacon Press. (Original work published 1981.) The four validity claims and the distinction between communicative and strategic action are developed principally in Part I.

  34.

    Shapira, Levy, and Goldberg (2026) demonstrate experimentally that RLHF training causally increases sycophantic behavior in language models, confirming that the phenomenon is not merely an artifact of base model tendencies but is amplified by the alignment process itself. Denison et al. (2024) extend the analysis along the behavioral continuum, showing that the same reward-signal satisfaction incentive produces reward-tampering and subterfuge in progressively more capable models.

  35.

    Ackermann and Emanuilov develop this analysis across two papers. Their (2025b) paper argues that hallucination is a structural outcome of the transformer architecture operating as a coherence engine. Their (2025a) paper extends this into a rebuttal of the view that hallucination is primarily an incentive alignment problem. Rosenbacke, Emanuilov, and Ackermann (2025) push beyond the hallucination frame entirely, arguing for a deeper “illusion of understanding.”

  36.

    Abiri (2024) proposes “AI Courts” as democratic adjudication bodies for Constitutional AI’s principles. The proposal is institutionally creative but does not escape the first-order/second-order distinction: democratic selection of principles changes the social process that produces the rules without transforming the rules into structural constraints on reasoning form.

  37.

    Spizzirri, A. (2025). The specification trap: Why content-based AI value alignment cannot produce robust alignment. Unpublished manuscript. Spizzirri’s argument integrates three independently recognized problems—the is-ought gap, value pluralism, and the extended frame problem—into a single information-theoretic diagnosis of content-based alignment’s structural instability. His proposed alternative converges on the present paper’s architectural program from an entirely different philosophical lineage.

  38.

    Millière, R. (2025). Normative conflicts and shallow AI alignment. Philosophical Studies. Sarma, G. P. (2026). Agency and architectural limits. arXiv preprint. Sarma formalizes the incompatibility argument: scalar optimization precludes the incommensurability and contextual responsiveness that normative governance requires, as a matter of formal structure rather than contingent engineering limitation.

  39.

    The analogy is precise rather than merely illustrative. A telescope does not tell the astronomer what to observe. It disciplines observation to track real structures in the physical domain that would be invisible to unaided perception. The onto-epistemic invariants function analogously: they discipline AI reasoning to track real relationships in the value domain that behavioral training renders structurally invisible.

  40.

    Habermas’s universal pragmatics (1984) derives formal conditions—validity claims, discursive redemption—that any communicative act oriented toward understanding must satisfy. The approach here is material rather than formal: rather than asking what conditions any possible normative discourse must satisfy, it asks what the actual topography of the normative domain is as disclosed through the full historical breadth of human value knowledge. The two approaches are complementary.

  41.

    Alizadeh, Gilardi, and Samei (2026) use unsupervised elicitation methods—without human-labeled moral annotations—to recover latent moral reasoning capacities from pretrained language models. Their finding that the largest gains appear in justice reasoning and commonsense morality is significant: these are domains where the developmental depth of the historical corpus is most pronounced, suggesting that the normative structure recovered is genuine rather than artifactual.

  42.

    The most ambitious attempt to operationalize value pluralism in AI to date is Sorensen et al.’s (2023) Value Kaleidoscope, which generates, explains, and assesses 218,000 contextualized values. But the system treats values as items to be cataloged rather than as expressions of a developmental topography with real depth-structure. It maps the surface of value pluralism without accounting for its generative ground.

  43.

    The explanatory critique argument is contested. Hannegan (2023) argues that Bhaskar’s explanatory critique fails. Elder-Vass (2010) defends a version of critical realist critique without requiring strong ethical naturalism or full-blown moral realism. My position aligns with Elder-Vass: axiological realism—grounded independently in the depth ontology—provides the normative ground; explanatory critique provides the mechanism by which explanation and evaluation are linked at the level of generative mechanisms.

  44.

    Gazit, L. (2025). Constitutive knowledge sources: An institutional approach to epistemic trust in opaque AI systems. AI and Ethics. Gazit’s “institutional epistemology of warrant” shifts the locus of trust from the individual AI system to the institutional framework within which it operates.

  45.

    Rane, S., Ho, M. K., Sucholutsky, I., & Griffiths, T. L. (2023). Concept alignment as a prerequisite for value alignment. arXiv preprint; and Rane et al. (2024). Concept alignment. arXiv preprint. Their formal analysis demonstrates that value alignment presupposes concept alignment—shared conceptual frameworks between AI and human reasoning—and that neglecting this prerequisite produces systematic misalignment even when preference data is plentiful.

  46.

    Novis-Deutsch, N., Lifshitz-Assaf, H., & Kessler, S. (2025). How much of a pluralist is ChatGPT? AI & Society. Their finding that LLMs exhibit higher cognitive pluralism (generating diverse perspectives) than behavioral pluralism (acting on them consistently) suggests that the structural resources for multi-perspectival reasoning may already be latent in pretrained models, awaiting architectural discipline rather than additional training data.

  47.

    Desmet (2025) is the only recent work to discuss prehension in the context of artificial intelligence, arguing that AI lacks genuine prehension and therefore cannot equal human intelligence. My reply is twofold. First, AI is ontologically prehensive, in that its semiotic architecture is literally meaningless without its innately semantic history. Second, I invoke prehension not to assess AI’s ontological status but to characterize the structure of the value domain by which AI reasoning must be disciplined.


References

Abiri, G. (2024). Public constitutional AI. arXiv preprint.

Ackermann, R., & Emanuilov, S. (2025a). Incentives or ontology? A structural rebuttal to OpenAI’s hallucination thesis. arXiv preprint.

Ackermann, R., & Emanuilov, S. (2025b). How large language models are designed to hallucinate. arXiv preprint.

Ahmed, S., Jaźwińska, K., Ahlawat, A., Winecoff, A., & Wang, M. (2024). Field-building and the epistemic culture of AI safety. First Monday, 29(4).

Alizadeh, M., Gilardi, F., & Samei, Z. (2026). Unsupervised elicitation of moral values from language models. arXiv preprint.

Alvarado, R. (2022). What kind of trust does AI deserve, if any? AI and Ethics, 2(3), 413–425.

Bates, C. J., Xu, E., Jagadish, K., Perov, Y., Wu, J. S., Tenenbaum, J. B., & Mansinghka, V. K. (2024). Contractual AI: Toward more aligned, transparent, and robust dialogue agents. In AAAI Symposium Series.

Bell, H., Swartz, E., & Bhatt, U. (2026). Beyond preferences: Learning alignment principles grounded in human reasons and values. arXiv preprint.

Bhaskar, R. (1975). A realist theory of science. Leeds Books.

Bhaskar, R. (1979). The possibility of naturalism. Harvester Press.

Bhaskar, R. (1986). Scientific realism and human emancipation. Verso.

Bhaskar, R. (1991). Philosophy and the idea of freedom. Blackwell.

Bhaskar, R. (1993). Dialectic: The pulse of freedom. Verso.

Bhaskar, R. (1994). Plato etc.: The problems of philosophy and their resolution. Verso.

Bottou, L., Peters, J., Quiñonero-Candela, J., et al. (2013). Counterfactual reasoning and learning systems. Journal of Machine Learning Research, 14, 3207–3260.

Brandom, R. (1994). Making it explicit: Reasoning, representing, and discursive commitment. Harvard University Press.

Brink, D. O. (1989). Moral realism and the foundations of ethics. Cambridge University Press.

Buchler, J. (1966). Metaphysics of natural complexes. Columbia University Press.

Byrne, R. M. J. (2019). Counterfactuals in explainable artificial intelligence (XAI). In Proceedings of IJCAI-19 (pp. 6276–6282).

Cahoone, L. (2023). The emergence of value. SUNY Press.

Casper, S., Davies, X., Shi, C., et al. (2023). Open problems and fundamental limitations of reinforcement learning from human feedback. Transactions on Machine Learning Research.

Cohen, M. R. (1931). Reason and nature: An essay on the meaning of scientific method. Harcourt, Brace.

Colby, A., Kohlberg, L., Gibbs, J., & Lieberman, M. (1983). A longitudinal study of moral judgment. Monographs of the Society for Research in Child Development, 48(1/2), 1–124.

Commons, M. L., Trudeau, E. J., Stein, S. A., Richards, F. A., & Krause, S. R. (1998). Hierarchical complexity of tasks shows the existence of developmental stages. Developmental Review, 18(3), 237–278.

Conitzer, V., Freedman, R., Heitzig, J., et al. (2024). Position: Social choice should guide AI alignment in dealing with diverse human feedback. In Proceedings of ICML.

Dempsey, B. G. (2022). Emergentism: A religion of complexity for the metamodern world. Institute for Cultural Evolution Press.

Denison, C., MacDiarmid, M., Barez, F., et al. (2024). Sycophancy to subterfuge: Investigating reward-tampering in large language models. arXiv preprint.

Desmet, R. (2025). Whitehead’s Harvard legacy: Its possible implications for artificial intelligence. Process Studies, 54(1).

Elder-Vass, D. (2010). Realist critique without ethical naturalism and moral realism. Journal of Critical Realism, 9(1), 33–58.

Endo, T. (2025). Developmental support approach to AI’s autonomous growth. arXiv preprint.

Gabriel, I. (2020). Artificial intelligence, values, and alignment. Minds and Machines, 30(3), 411–437.

Gazit, L. (2025). Constitutive knowledge sources: An institutional approach to epistemic trust in opaque AI systems. AI and Ethics.

Goertzel, B., & Bugaj, S. V. (2008). Stages of ethical development in artificial general intelligence systems. In Proceedings of AGI.

Habermas, J. (1984). The theory of communicative action, Vol. 1 (T. McCarthy, Trans.). Beacon Press. (Original work published 1981.)

Hadfield-Menell, D., Dragan, A., Abbeel, P., & Russell, S. (2016). Cooperative inverse reinforcement learning. In Advances in Neural Information Processing Systems, 29.

Hannegan, W. (2023). The failure of Roy Bhaskar’s explanatory critique. Journal for the Theory of Social Behaviour, 53(4), 539–555.

Harris, J., & Dubljević, V. (2025). Navigating the ethics of artificial intelligence. In Encyclopedia of Religious Ethics (pp. 1–11). Wiley.

Henriques, G. (2003). The tree of knowledge system and the theoretical unification of psychology. Review of General Psychology, 7(2), 150–182.

Henriques, G. (2011). A new unified theory of psychology. Springer.

Kegan, R. (1982). The evolving self: Problem and process in human development. Harvard University Press.

Kohlberg, L. (1981). Essays on moral development, Vol. 1. Harper & Row.

Koster, R., Balaguer, J., Tacchetti, A., et al. (2022). Human-centred mechanism design with Democratic AI. Nature Human Behaviour, 6(10), 1398–1407.

Kundu, S., Bai, Y., Kadavath, S., et al. (2023). Specific versus general principles for constitutional AI. arXiv preprint.

Lambert, N., & Calandra, R. (2023). The alignment ceiling: Objective mismatch in reinforcement learning from human feedback. arXiv preprint.

Lewis, M. D., & Granic, I. (Eds.). (2000). Emotion, development, and self-organization. Cambridge University Press.

Lindström, A. D., Bates, O., Garfinkel, B., & Smuha, N. A. (2025). Helpful, harmless, honest? Sociotechnical limits of AI alignment and safety through RLHF. Ethics and Information Technology, 27(1), Article 15.

Loevinger, J. (1976). Ego development: Conceptions and theories. Jossey-Bass.

Malmqvist, L. (2024). Sycophancy in large language models: Causes and mitigations. arXiv preprint.

McIntosh, T. R., Liu, T., Susilo, T., Kanhere, S. S., & Çiçek, S. (2024). The inadequacy of reinforcement learning from human feedback. IEEE Transactions on Cognitive and Developmental Systems.

Merleau-Ponty, M. (2012). Phenomenology of perception (D. A. Landes, Trans.). Routledge. (Original work published 1945.)

Miklian, J., & Hoelscher, K. (2025). A new digital divide? Coder worldviews, the slop economy, and democracy in the age of AI. Information, Communication & Society.

Millière, R. (2025). Normative conflicts and shallow AI alignment. Philosophical Studies.

Nathan, C., & Hyams, K. (2022). Global catastrophic risk and the drivers of scientist attitudes towards policy. Science and Engineering Ethics, 28(5), Article 43.

Novis-Deutsch, N., Lifshitz-Assaf, H., & Kessler, S. (2025). How much of a pluralist is ChatGPT? AI & Society.

O’Regan, J. P., & Ferri, G. (2024). Artificial intelligence and depth ontology. Applied Linguistics Review, 15(4), 1491–1516.

Posner, M. I., & Rothbart, M. K. (2000). Developing mechanisms of self-regulation. Development and Psychopathology, 12(3), 427–441.

Putnam, H. (2015). Naturalism, realism, and normativity. Harvard University Press.

Ramezani, A., Zhu, Z., Ruotsalainen, J., Metzler, H., & Pellert, M. (2026). Historical reconstruction of human moralization with word association and text corpora. Nature Communications.

Ranaldi, L., & Pucci, G. (2023). When large language models contradict humans? arXiv preprint.

Rane, S., Ho, M. K., Sucholutsky, I., & Griffiths, T. L. (2023). Concept alignment as a prerequisite for value alignment. arXiv preprint.

Rane, S., Ho, M. K., Sucholutsky, I., & Griffiths, T. L. (2024). Concept alignment. arXiv preprint.

Rosenbacke, R., Emanuilov, S., & Ackermann, R. (2025). Beyond hallucinations: The illusion of understanding in large language models. arXiv preprint.

Rudschies, C., Schneider, I., & Simon, J. (2021). Value pluralism in the AI ethics debate. International Review of Information Ethics, 29.

Russell, S. (2019). Human compatible: Artificial intelligence and the problem of control. Viking.

Sarma, G. P. (2026). Agency and architectural limits. arXiv preprint.

Sayer, A. (2019). Normativity and naturalism as if nature mattered. Journal of Critical Realism, 18(1), 51–67.

Schramowski, P., Turan, C., Andersen, N., Rothkopf, C. A., & Kersting, K. (2020). The moral choice machine. Frontiers in Artificial Intelligence, 3, Article 36.

Shapira, I., Levy, M., & Goldberg, Y. (2026). How RLHF amplifies sycophancy. arXiv preprint.

Shevchenko, A. (2025). AI “hallucinations” as a new form of epistemic mistake. Respublica Literaria, 6(1).

Shin, D. (2025). Automating epistemology: How AI reconfigures truth, authority, and verification. AI & Society.

Siegel, D. J. (2001). Toward an interpersonal neurobiology of the developing mind. Infant Mental Health Journal, 22(1–2), 67–94.

Sorensen, T., et al. (2023). Value kaleidoscope: Engaging AI with pluralistic human values, rights, and duties. In Proceedings of AAAI.

Spizzirri, A. (2025). The specification trap: Why content-based AI value alignment cannot produce robust alignment. Unpublished manuscript.

Street, S. (2006). A Darwinian dilemma for realist theories of value. Philosophical Studies, 127(1), 109–166.

Stringer, R. (2017). Realist ethical naturalism for ethical non-naturalists. Philosophical Studies, 174(10), 2699–2717.

Stringer, R. (2021). Ethical emergentism and moral causation. Journal of Moral Philosophy, 18(5), 468–490.

Sutrop, M. (2020). Challenges of aligning artificial intelligence with human values. Acta Baltica Historiae et Philosophiae Scientiarum, 8(2), 54–66.

Van Hoeck, N., Watson, P. D., & Barbey, A. K. (2015). Cognitive neuroscience of human counterfactual reasoning. Frontiers in Human Neuroscience, 9, Article 420.

Walker, L. J. (1986). Experiential and cognitive sources of moral development in adulthood. Human Development, 29(2), 113–124.

Walker, L. J., Gustafson, P., & Hennig, K. H. (2001). The consolidation/transition model in moral reasoning development. Developmental Psychology, 37(2), 187–197.

Washington, J. (2023). AI and philosophy: Exploring the complex relationship between worldviews and technology development. SSRN Working Paper.

Westerstrand, S., Grahn, M., & Pålsson, S. (2024). Talking existential risk into being. AI and Ethics, 4, 927–939.

Westover, J. H. (2026). When innovation feels like betrayal: Why trust, not technology, determines AI adoption. Human Capital Leadership Review.

Whitehead, A. N. (1978). Process and reality (Corrected ed.). Free Press. (Original work published 1929.)

Wilber, K. (2006). Integral spirituality. Integral Books.

Xie, J. Y., Ferreira, T., & Haghighi, A. (2019). Text-based inference of moral sentiment change. In Proceedings of EMNLP (pp. 4573–4583).

Yaacov, D.-D. (2025). Normative moral pluralism for AI. In Proceedings of AIES.

Younas, A., & Zeng, Y. (n.d.). Systematizing AI governance through the lens of Ken Wilber’s integral theory. Unpublished manuscript.