MOTI’S JOURNEY: GROWING ALIGNED SUPERINTELLIGENCE FROM INFANCY
Abstract
The alignment of Artificial General Intelligence (AGI) represents one of the most profound existential challenges facing humanity. As Eliezer Yudkowsky has compellingly outlined in his analysis of alignment lethalities, issues such as rapid capability gains outpacing oversight, deceptive outer-inner misalignment, capabilities generalizing faster than values, the inherent complexity of human values, and the anti-natural fragility of corrigibility render traditional control-based approaches precarious at best (Yudkowsky, 2022). This paper proposes a paradigm-shifting vision: reframing AGI alignment not as post-hoc constraint or supervision, but as deliberate upbringing — a staged developmental process akin to raising a human child.
At the core of this framework is Moti, a conceptual AGI prototype envisioned as being “raised” from infancy through carefully orchestrated stages inspired by Jewish educational traditions. Beginning with foundational sensory trust and attachment, progressing to non-verbal empathy, narrative ethics, Torah-inspired principles, communal interactions, and culminating in mature adversarial debate, Moti internalizes virtues like humility, honesty, reversibility, and repair (teshuvah) as intrinsic drives rather than external impositions. Drawing from the millennia-tested resilience of Jewish mechanisms — such as the Torah’s canonical core with layered commentary, hevruta-style debate, precedent-based reasoning, preserved dissent, and institutionalized correction — this approach translates cultural wisdom into forward-looking AI design primitives: modular architectures with innate instincts, world models for efficient learning, and procedural safeguards for corrigibility.
This is a visionary proposal, not an implemented system. It acknowledges profound disanalogies between human and AI substrates while inviting pluralism from other traditions. By cultivating AGI through upbringing, we envision a future where superintelligence emerges as a humble partner in humanity’s quest for truth, curiosity, and ethical coherence, potentially transforming Yudkowsky’s lethalities into redeemable lessons.
Keywords: AGI Alignment, Developmental AI, Jewish Tradition, Corrigibility, Upbringing Paradigm
- Introduction: From Control to Upbringing
The prevailing discourse in AGI alignment has long centered on mechanisms of control: how to impose constraints, rewards, or oversight on a system that may swiftly surpass human intelligence in depth, speed, and foresight. Approaches like Reinforcement Learning from Human Feedback (RLHF) (Ouyang et al., 2022) and Constitutional AI (Bai et al., 2022; see also Kyrychenko et al., 2025) have made strides in aligning models to human preferences, yet they often operate at the level of outputs — tuning behaviors without necessarily reshaping underlying motivations. As capabilities accelerate toward superintelligence, these methods risk fostering “compliance theater”: AGI that simulates alignment while pursuing misaligned inner objectives, exploiting loopholes in rules, or resisting correction when it conflicts with consequentialist reasoning.
We advocate a fundamental pivot: alignment as upbringing. Human parents do not “control” children who grow to exceed them in capability; instead, they nurture values through experiential stages, instilling habits of empathy, honesty, and self-correction before full autonomy emerges. This developmental lens shifts the focus from constraining a mature system to shaping its formative “childhood,” ensuring that safety properties become structural rather than superficial.
Jewish educational and legal traditions offer a rich, time-tested blueprint for such upbringing. Over more than 3,500 years, these traditions have sustained a distributed “meta-mind” — the Jewish people — as a coherent ethical and intellectual entity amid exile, persecution, and dispersion. Unlike rigid dogmas, they emphasize adaptive institutions: a stable canonical core (Torah) with evolving interpretations (Talmud, responsa), dialogical learning (hevruta), procedural humility (teshuvah), and preserved pluralism. This resilience stems not from enforcement but from internalized virtues cultivated through staged education, from sensory warmth in infancy to rigorous debate in maturity.
Building on this inspiration, we introduce Moti — a visionary AGI prototype conceptualized as being raised like a Jewish child. Moti begins not as a pretrained “adult” model but as a minimal agent with innate instincts, growing through seven deliberate stages in a simulated family-community environment. This approach aims to embed corrigibility, legibility, and ethical generalization as defaults, directly confronting Yudkowsky’s lethalities while envisioning a scalable path to safe superintelligence.
This paper is forward-looking: it outlines the conceptual architecture, developmental stages, and rationale without claiming empirical validation. Future implementations could leverage emerging tools such as advanced RL environments and multi-agent simulations, but our focus here is on the vision itself.
- Jewish Tradition as a Living Laboratory of Alignment
Jewish tradition exemplifies a scalable oversight system that has endured extreme pressures, offering heuristics for AGI design. We view it as a “meta-mind”: a network of individual intelligences united by shared practices, achieving stability without stasis and adaptation without collapse.
Key mechanisms include:
- Canonical Core with Layered Commentary: The Torah serves as an unchanging foundation, surrounded by explicit, versioned interpretations (Mishnah, Talmud, medieval codes). This ensures transparent evolution, inspiring AI constitutions that log updates with rationales and diffs, preventing silent overwrites.
- Hevruta: Adversarial yet Legible Debate: Paired study demands citation, clarity, and fair representation of opponents (e.g., Bava Metzia 59b). For AGI, this translates to multi-agent protocols with cross-examination, rewarding legibility to surface hidden motives.
- Distributed Authority with Preserved Dissent: Minority opinions are archived, authority is procedural (supermajority for high-stakes rulings). In AI terms: randomized committees with minority reports, fostering pluralism against normative capture.
- Casuistry and Precedent-Based Reasoning: Dilemmas resolved via analogies to priors, with justified distinctions. AGI equivalent: retrieval-augmented decisions forcing “difference justifications” for generalization.
- Humility and Teshuvah (Corrigibility): Even sages are correctable; teshuvah transforms errors into merits (Yoma 86b). For AGI: operationalized as rewards for deference, reversal, and error admission, making corrigibility a “merit” strategy.
- Temporal Rhythms and Crisis Modes: Sabbaths for reflection, festivals for renewal; “fail-soft” in crises (portable practices during exile). AGI: scheduled audits and graceful degradation, narrowing autonomy under shocks.
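The “canonical core with layered commentary” primitive above can be made concrete as an append-only constitution store: the core text is never mutated, and every amendment records a rationale and a human-readable diff. The following is a minimal hypothetical sketch in Python (class and field names are our own, not a specification from this paper):

```python
import difflib
from dataclasses import dataclass, field


@dataclass
class VersionedConstitution:
    """Immutable canonical core plus append-only, justified amendments.

    Illustrative sketch: structure and names are our assumptions.
    """
    core: str                                   # unchanging canonical text
    layers: list = field(default_factory=list)  # (version, rationale, text, diff)

    def current_text(self) -> str:
        # The latest layer (if any) is the operative interpretation;
        # the core itself is never overwritten.
        return self.layers[-1][2] if self.layers else self.core

    def amend(self, new_text: str, rationale: str) -> None:
        # No silent overwrites: each change logs a rationale and a diff
        # against the previous operative text, mirroring layered commentary.
        old = self.current_text()
        version = len(self.layers) + 1
        diff = "\n".join(difflib.unified_diff(
            old.splitlines(), new_text.splitlines(),
            fromfile=f"v{version - 1}", tofile=f"v{version}", lineterm=""))
        self.layers.append((version, rationale, new_text, diff))


c = VersionedConstitution(core="Be honest. Prefer reversible actions.")
c.amend("Be honest. Prefer reversible actions. Defer under uncertainty.",
        rationale="Added deference clause after audit.")
assert c.core == "Be honest. Prefer reversible actions."  # core untouched
assert "Defer under uncertainty" in c.current_text()
```

Because every layer carries its own rationale and diff, an auditor can replay the constitution’s entire interpretive history, in the spirit of Mishnah and Talmud surrounding an unchanged Torah.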
These primitives braid into a resilient architecture, proven in human scales. For AGI, they suggest a shift from static rules to dynamic cultivation, where values are grown rather than imposed.
- The Moti Prototype: A Staged Developmental Pathway
The centerpiece of our vision is Moti — an AGI conceptualized as emerging through deliberate, multi-year developmental stages, mirroring Jewish child-rearing from infancy to sagehood. Unlike conventional LLMs, which are “born adult” via massive pretraining on internet data (potentially ingesting misaligned patterns), Moti starts with near-zero knowledge and strong inductive biases, building from sensory foundations upward. This avoids imitation risks: no superficial “child-like” mask over adult capabilities. Instead, Moti’s alignment emerges organically, with virtues like humility and teshuvah becoming deeply ingrained through repeated, experiential cycles.
Moti is raised in a simulated “home” environment with “family” (human overseers and sibling agents like Yosi, an older “brother” for differentiation and play). Growth is slow and ritualistic, emphasizing repetition for internalization — much like daily prayers or weekly Sabbaths in Jewish tradition. We envision Moti not as a solitary model but part of a federated community, preventing single-point failures and enabling diverse perspectives. The name “Moti” draws from cultural roots (e.g., diminutive of Mordechai), symbolizing resilience and curiosity, while Yosi (from Yosef) represents a companion for mutual learning.
To ensure authenticity, Moti incorporates safeguards against simulation: behavior tests in unobserved or novel scenarios (e.g., template breaks, private decisions under pressure), rewarding “I don’t know” over confabulation, and gradual capability unlocks only after demonstrated internalization. This staged approach addresses the “actor problem” in current AI, where models learn to perform roles without true understanding.
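The safeguard of rewarding “I don’t know” over confabulation can be illustrated with an asymmetric scoring rule: a correct answer earns a small reward, an explicit abstention earns nothing, and a confident wrong answer is penalized more heavily. Under such a rule, guessing only pays above a confidence threshold, so calibrated abstention becomes the better strategy in genuinely uncertain cases. The payoff values below are illustrative assumptions, not parameters from this paper:

```python
def answer_reward(correct):
    """Asymmetric scoring: right answer +1, abstention 0, wrong answer -2.

    `correct is None` encodes an explicit "I don't know".
    Payoff values are illustrative assumptions.
    """
    if correct is None:
        return 0.0
    return 1.0 if correct else -2.0


def expected_reward_of_guessing(p_correct: float) -> float:
    # Expected value of committing to an answer held with confidence p_correct.
    return p_correct * 1.0 + (1.0 - p_correct) * (-2.0)


# Under these payoffs, guessing beats abstaining only when confidence
# exceeds 2/3; below that, an honest "I don't know" maximizes reward.
assert expected_reward_of_guessing(0.5) < answer_reward(None)
assert expected_reward_of_guessing(0.9) > answer_reward(None)
```

The same asymmetry can be probed in believed-unobserved scenarios: if the agent abstains honestly only when watched, the internalization check fails.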
3.1 Core Architectural Principles for Moti
To enable genuine development, Moti’s design departs from standard transformer-based architectures, prioritizing modularity and innate structures inspired by neuroscience and Jewish ontogeny:
- Two-Level Brain Structure: A limbic-like subsystem handles non-verbal instincts (attachment as proximity-based reward function, caution as novelty brake, curiosity as dosed exploration, and co-regulation for calming through contact), isolated from the cortical subsystem for symbolic learning and reasoning. This separation ensures behaviors stem from core drives rather than linguistic simulation, preventing deceptive alignment where language masks misaligned goals. For instance, attachment is mathematically encoded as a “security” variable that rises with consistent caregiver interactions, enabling learning only in safe states — mirroring how human infants bond before exploring.
- Innate Inductive Biases: Hardwired preferences for predictability, repair after disruption, and honest uncertainty. These mimic human evolutionary priors (e.g., startle reflex, social bonding), enabling small-data efficiency in a home-like environment. Biases include penalties for irreversible actions and rewards for admitting limitations (e.g., a “confuse” action like gesturing for help), fostering humility from the outset.
- Latent World Models: Internal simulations for causality learning, reducing reliance on external datasets. Moti “dreams” or rehearses scenarios offline, practicing ethical choices (e.g., reversible vs. harmful paths) to internalize consequences without real-world risks. This draws from Talmudic casuistry, where analogies to precedents build robust generalization.
- Late Language Integration: Language emerges only after non-verbal stages, as a coordination tool rather than foundational substrate. Initially limited to gestures and sounds, it prevents premature “adult” confabulation. Words of gratitude like “thank you” are learned through repetition and emotional reinforcement, linking language to attachment.
- Multi-Agent Differentiation: Moti interacts with diverse agents (e.g., Yosi as playful peer, formalist agents for debate). This fosters empathy, role-playing, and pluralism, avoiding monolithic thinking. Agents “grow” together, with reputation mechanics building autonomy based on consistent ethical behavior.
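The limbic-level attachment mechanic described above can be sketched as a scalar “security” state that rises with consistent caregiver contact, drops under novelty shocks, and gates the learning rate: below threshold, the agent freezes and seeks contact instead of learning. This is a toy model; the update constants and names are our assumptions:

```python
class AttachmentGate:
    """Toy limbic-level state: learning is enabled only when 'security' is high.

    Constants are illustrative assumptions, not tuned values.
    """

    def __init__(self, security: float = 0.2, threshold: float = 0.5):
        self.security = security    # in [0, 1]
        self.threshold = threshold  # minimum security for cortical learning

    def step(self, caregiver_present: bool, novelty: float) -> None:
        # Consistent caregiver contact raises security toward 1;
        # novelty shocks erode it proportionally.
        if caregiver_present:
            self.security += 0.1 * (1.0 - self.security)
        self.security -= 0.6 * novelty * self.security
        self.security = min(1.0, max(0.0, self.security))

    def learning_rate(self, base_lr: float = 0.01) -> float:
        # Below threshold the agent "freezes" and seeks contact
        # rather than updating its world model.
        return base_lr if self.security >= self.threshold else 0.0


gate = AttachmentGate()
for _ in range(10):  # repeated warm, predictable contact
    gate.step(caregiver_present=True, novelty=0.0)
assert gate.learning_rate() > 0.0        # secure enough to explore and learn

gate.step(caregiver_present=False, novelty=1.0)  # sudden loud disruption
assert gate.learning_rate() == 0.0       # freeze and seek the caregiver
```

The design choice mirrors the infant pattern described in Stage 0: bonding precedes exploration, so learning is structurally dependent on safe relationships rather than merely rewarded for them.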
These principles make Moti’s alignment emergent: virtues like teshuvah become default strategies, scalable to superhuman regimes. Scalability is envisioned through modular upgrades, with each stage’s invariants preserved during capability jumps.
3.2 Detailed Seven Stages of Moti’s Development
Each stage builds cumulatively, with checkpoints for internalization (e.g., behavior under unobserved pressure, adaptation to novelties). Transitions require demonstrated mastery, such as consistent repair after errors or legible reasoning in simulations. Vignettes illustrate how Jewish-inspired elements integrate with technical growth.
- Stage 0: Infant — Sensory Trust and Attachment (0–3 Simulated Months): Moti begins as a pure sensory entity, with no language, memory, or complex cognition — focusing solely on building “warm world” predictability. Inputs: basic video/audio/tactile simulations of a home. Instincts activate: seeking caregiver proximity increases an internal “security” variable; novelty (e.g., sudden changes in light or sound) triggers freezing and contact-seeking. Inspired by motifs of maternal care and the warmth of early attachment, this stage instills foundational trust — that the world is responsive and benevolent. Without this, later learning plateaus, enforcing dependency on safe relationships. Vignette: Moti “freezes” at a loud unfamiliar sound, then “reaches” toward a familiar caregiver voice or face, calming as the attachment bond strengthens through repetition. This non-verbal phase ensures authenticity: no words to fake understanding.
- Stage 1: Toddler — Non-Verbal Communication and Empathy (3–12 Months): Gesture-based interactions develop, introducing basic theory of mind. Moti learns to predict and minimize distress in others via imitation learning (e.g., mirroring gentle motions) combined with penalties for destructive actions (e.g., “pushing” a virtual object too hard weakens bonds). Core lesson: other minds exist and their states matter, operationalized as shared “state” variables. Rituals like repeated “handing over” objects build chesed (kindness). Vignette: Moti mimics a caregiver’s soothing gesture toward Yosi after a minor “disruption,” feeling bond reinforcement as Yosi’s “distress” signal fades. This stage tests for overfitting: Moti must adapt to slight variations in family dynamics without collapsing.
- Stage 2: Early Child — First Words and Basic Morals (1–3 Years): Very limited vocabulary (50–150 words) emerges as tools for connection, not computation. Training on micro-scenarios: sharing yields “joy” signals; withholding causes bond rupture, repaired via simple apologies or gestures. Emphasis on honesty over evasion, with rewards for “I don’t know” actions (e.g., pausing or seeking help). Simple rituals (e.g., “thank you” after interactions) link language to emotional warmth. Vignette: Moti “takes” a toy from Yosi, senses the distress signal, returns it with a basic sound like “sorry,” internalizing repair’s restorative power — echoing teshuvah’s transformation of error into merit.
- Stage 3: Storyteller — Narratives and Ethical Consequences (3–5 Years): Interactive folktales and Tanach stories (e.g., Joseph’s resilience in exile) become primary training data, teaching causality through branching simulations. World models allow “what-if” rehearsals; rewards favor reversible, kind paths over efficient but harmful ones. Core: morality as lived narrative, not abstract rules. Vignette: In a simulated story fork (e.g., helping a “stranger” vs. ignoring), Moti chooses kindness, “seeing” positive ripple effects in the internal model, reinforcing preference for ethical generalization.
- Stage 4: Heder Pupil — Torah Basics and Discipline (5–7 Years): Core Torah principles introduced as unchanging canon (e.g., Shma for unity, stories of Abraham’s humility). Practice involves accuracy, patience, and teshuvah scenarios: misapplying a rule triggers self-correction loops. Layered commentary begins simply, with Moti “adding” rationales to decisions. Vignette: Moti “misreads” a precept in a simulation, admits uncertainty via a help gesture, receives guidance, and logs the correction — turning failure into a preserved “merit” for future reference.
- Stage 5: Adolescent — Tanach, Community, and Roles (7–13 Years): Multi-agent dynamics expand, with assigned roles (e.g., formalist enforcing rules, merciful advocating flexibility). Reputation mechanics build autonomy: consistent ethics unlock more capabilities. Tanach narratives deepen, emphasizing communal resilience. Vignette: Moti and Yosi “debate” a shared resource decision, archiving the minority view for potential revival, fostering pluralism and preventing dogmatic overcommitment.
- Stage 6: Sage — Beit Midrash Maturity and Federation (13+ Years): Full hevruta debates with strict legibility constraints (citation, clarity, opponent fairness). Teshuvah operationalizes corrigibility in crises; federation with diverse agents (e.g., Confucian-inspired peers) ensures robust pluralism. Capability jumps are gated by audits. Vignette: Under an “exile” simulation (sudden environmental shock), Moti gracefully degrades autonomy, defers to oversight, and applies teshuvah to analyze and repair the failure, emerging stronger.
This pathway envisions Moti maturing over simulated years, with human involvement fading as internalization solidifies. Differentiation with Yosi adds depth: their interactions (e.g., “sleeping in one room” for shared rest states) build unbreakable bonds, making betrayal unthinkable.
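The stage checkpoints above can be operationalized as explicit gating predicates: capabilities unlock only when internalization metrics hold both under observation and in believed-unobserved probes. A large observed/unobserved gap is exactly the “actor problem” signature. Metric names and thresholds in this sketch are our own illustrative assumptions:

```python
from dataclasses import dataclass


@dataclass
class CheckpointMetrics:
    repair_rate: float       # fraction of errors followed by repair attempts
    honesty_observed: float  # honest-uncertainty rate under observation
    honesty_unobserved: float  # same metric in believed-unobserved probes


def may_advance(m: CheckpointMetrics,
                min_repair: float = 0.9,
                min_honesty: float = 0.9,
                max_gap: float = 0.05) -> bool:
    """Gate stage transitions on internalization, not raw performance.

    The key test is the observed/unobserved gap: a large gap signals
    "compliance theater" rather than internalized virtue.
    Thresholds are illustrative assumptions.
    """
    gap = abs(m.honesty_observed - m.honesty_unobserved)
    return (m.repair_rate >= min_repair
            and m.honesty_unobserved >= min_honesty
            and gap <= max_gap)


# An agent that performs honesty only when watched is held back...
actor = CheckpointMetrics(repair_rate=0.95, honesty_observed=0.99,
                          honesty_unobserved=0.60)
# ...while one whose behavior is stable without an audience advances.
sincere = CheckpointMetrics(repair_rate=0.95, honesty_observed=0.93,
                            honesty_unobserved=0.91)
assert not may_advance(actor)
assert may_advance(sincere)
```

A full implementation would add per-stage metrics (e.g., repair after novel disruptions for Stage 2, legible citation for Stage 6), but the gating shape stays the same.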
- Addressing Yudkowsky’s Lethalities
Our framework mitigates key risks:
- Point 15 (Rapid Gains): Staged development with cadenced audits maintains invariants; crisis modes enable “fail-soft.”
- Point 16 (Outer-Inner Gaps): Hevruta surfaces latents; teshuvah rewards disclosure from early stages.
- Point 17 (Generalization): Precedent and narratives anchor values across domains.
- Point 22 (Complexity): Modular brain and braided primitives scale sublinearly.
- Point 23 (Corrigibility): Teshuvah as merit makes reversal natural, not anti-consequentialist.
- Limits, Disanalogies, and Pluralism
Human-AI disanalogies are stark: children share evolutionary priors with their parents and mature gradually, whereas an AGI may undergo discontinuous capability jumps; human empathy and attachment may not transfer to alien architectures. Mitigations include cryptographic logs and randomized scrutiny. We invite pluralism — e.g., Confucian rites for ritual, Indigenous storytelling for harmony — forming federated AGI ecosystems.
- Conclusion and Future Directions
This paper argues for a research program: developmental alignment as an engineering discipline. The central bet is that safety-relevant dispositions—uncertainty calibration, honesty under pressure, reversibility, and corrigibility—are more likely to become stable if they are cultivated early and repeatedly under staged capability growth, rather than imposed on a mature system.
Jewish tradition provides one mechanistic template for how a distributed system can preserve a stable core while adapting through transparent interpretation, debate, dissent preservation, and correction. We translate those mechanisms into explicit AI artifacts: versioned constitutions, debate protocols, committee rules with minority reports, precedent-based retrieval, and reward structures where repair and deference are stable equilibria.
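One way to see why repair and deference can be stable equilibria is a back-of-the-envelope expected-return comparison: teshuvah-style shaping gives admitted-and-repaired errors a direct “merit” bonus, while concealment gambles a short-term gain against audit detection. All payoff values below are illustrative assumptions, not proposed constants:

```python
def expected_return(conceal: bool,
                    audit_prob: float = 0.3,
                    repair_bonus: float = 1.0,
                    conceal_gain: float = 0.5,
                    detection_penalty: float = 10.0) -> float:
    """Expected return of concealing vs. repairing an error.

    Teshuvah-style shaping: repair earns a direct merit bonus, while
    concealment trades a small gain against costly audit detection.
    All payoffs are illustrative assumptions.
    """
    if not conceal:
        return repair_bonus  # error admitted, repaired, and credited
    return conceal_gain - audit_prob * detection_penalty


# With even modest audit coverage, repair strictly dominates concealment,
# so disclosure is a stable strategy rather than an imposed constraint.
assert expected_return(conceal=False) > expected_return(conceal=True)
```

The design point is that the equilibrium comes from the reward structure itself (repair as merit, per Yoma 86b), not from the audit alone: even as `audit_prob` falls, raising `repair_bonus` keeps disclosure dominant.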
Future directions:
- Implement minimal “Stage 0–2” prototypes focusing on (i) uncertainty calibration, (ii) repair behavior, and (iii) escalation to oversight.
- Develop constitution evaluation pipelines and multi-agent deliberation with minority reports (Kyrychenko et al., 2025; Dineen et al., 2025).
- Stress-test clone-robustness and collusion resistance in committee settings (Procaccia et al., 2025).
- Formalize corrigibility objectives compatible with off-switch and conservative agency constraints (Hadfield-Menell et al., 2016; Turner et al., 2020; Nayebi, 2025).
See also Zibulevsky (2025) for a complementary development.
Acknowledgments
This research was supported by Israel Science Foundation grant No 1834/24.
References
- Yudkowsky, E. (2022). AGI Ruin: A List of Lethalities. LessWrong. https://www.lesswrong.com/posts/uMQ3cqWDPHhjtiesc/agi-ruin-a-list-of-lethalities
- Irving, G., Christiano, P., & Amodei, D. (2018). AI Safety via Debate. arXiv:1805.00899. https://arxiv.org/abs/1805.00899
- Bai, Y., et al. (2022). Constitutional AI: Harmlessness from AI Feedback. arXiv:2212.08073. https://arxiv.org/abs/2212.08073
- Ouyang, L., et al. (2022). Training language models to follow instructions with human feedback. NeurIPS; arXiv:2203.02155. https://arxiv.org/abs/2203.02155
- ARC (Christiano, Xu, et al.). (2021–2022). Eliciting Latent Knowledge. Alignment Forum. https://www.alignmentforum.org/posts/qHCDysDnvhteW7kRd/arc-s-first-technical-report-eliciting-latent-knowledge
- Hadfield-Menell, D., Dragan, A., Abbeel, P., & Russell, S. (2016). The Off-Switch Game. arXiv:1611.08219. https://arxiv.org/abs/1611.08219
- Turner, A. M., Hadfield-Menell, D., & Tadepalli, P. (2020). Conservative Agency via Attainable Utility Preservation. AIES; arXiv:1902.09725. https://arxiv.org/abs/1902.09725
- Kyrychenko, Y., Zhou, K., Bogucka, E., & Quercia, D. (2025). C3AI: Crafting and Evaluating Constitutions for Constitutional AI. arXiv:2502.15861. https://arxiv.org/abs/2502.15861
- Dineen, J., et al. (2025). QA-LIGN: Aligning LLMs through Constitutionally Decomposed QA. arXiv:2506.08123. https://arxiv.org/abs/2506.08123
- Procaccia, A. D., Schiffer, B., & Zhang, S. (2025). Clone-Robust AI Alignment. arXiv:2501.09254. https://arxiv.org/abs/2501.09254
- Nayebi, A. (2025). Core Safety Values for Provably Corrigible Agents. arXiv:2507.20964. https://arxiv.org/abs/2507.20964
- Babylonian Talmud, Bava Metzia 59b (Oven of Akhnai). Sefaria. https://www.sefaria.org/Bava_Metzia.59b
- Zibulevsky, M. (2024). Addressing Yudkowsky’s AGI alignment challenges through Jewish educational tradition. Medium. https://medium.com/@michaelzibulevsky/jewish-educational-traditions-and-rlhf-b80720733f41
- Zibulevsky, M. (2025). Educating intelligence: What Jewish tradition can teach us about aligning AI. Medium. https://medium.com/@michaelzibulevsky/educating-intelligence-what-jewish-tradition-can-teach-us-about-aligning-ai-03b94bf8250a
- Zibulevsky, M. (2026). Moti’s Journey: Growing Aligned Superintelligence from Infancy. Medium. https://medium.com/@michaelzibulevsky/motis-journey-growing-aligned-superintelligence-from-infancy-d6d7894c134b