The Artemis Solution for Green AI Architecture

A Multi-Layer Framework for Energy-Efficient Machine Learning

Alan Scott Encinas
ML Engineer & Systems Architect | Founder, MyDataOne (MD1)
February 2026 | Working Paper v2.0


Abstract

The rapid proliferation of large language models (LLMs) and generative AI systems has created an intensifying energy constraint in computing infrastructure. The International Energy Agency (IEA, 2025) estimates global data center electricity consumption at approximately 415 terawatt-hours (TWh) in 2024, projecting growth to around 945 TWh by 2030 under its Base Case scenario. This paper argues that achieving sustainable cognitive AI requires simultaneous optimization across three interdependent layers: neuromorphic hardware, software algorithms, and governance frameworks.

We synthesize current research across these optimization layers: (1) neuromorphic hardware architectures, including IBM NorthPole (demonstrating 25× energy efficiency gains in frames per joule over comparable 12nm GPUs on image classification benchmarks) and Intel Hala Point (achieving 15 TOPS/W on conventional deep neural networks); (2) software optimization frameworks including Zeus (achieving 15.3%–75.8% energy reduction in DNN training) and carbon-aware scheduling; and (3) emerging policy frameworks including the EU Energy Efficiency Directive and AI Act.

The paper further addresses the threat of model collapse from AI-generated training data contamination (Shumailov et al., Nature, 2024), establishing data integrity as an essential component of the sustainability mandate. We propose the Artemis Solution—a phased implementation framework that integrates these optimization layers into a coherent architectural strategy for sustainable AI development. The paper's contribution is a coupled clean-data and clean-power architectural framing with a phased deployment path across hardware, software, and governance.

Keywords: Green AI, neuromorphic computing, sustainable machine learning, energy efficiency, LLM inference, carbon footprint, model collapse, data integrity, von Neumann bottleneck


1. Introduction: The Energy Obstacle

The contemporary pursuit of advanced artificial intelligence has driven a technological revolution at the cost of significant environmental and economic strain. The IEA’s April 2025 special report Energy and AI provides the most comprehensive data-driven analysis to date: global data center electricity consumption reached approximately 415 TWh in 2024—about 1.5% of global electricity consumption—growing at 12% annually over the preceding five years (IEA, 2025, Executive Summary). Under the IEA’s Base Case, this figure is projected to more than double to around 945 TWh by 2030, representing approximately 3% of total global electricity consumption. AI-optimized data centers are the primary driver, with electricity demand from accelerated servers projected to grow by 30% annually.

The regional implications are particularly significant. In the United States, data centers consumed approximately 180 TWh in 2024 (IEA, 2025). The Lawrence Berkeley National Laboratory (LBNL) projects total U.S. data center electricity consumption could reach 298–376 TWh by 2028 in moderate-to-high growth scenarios (Shehabi et al., 2024, LBNL-2001635, Table ES-2). The IEA projects U.S. data center consumption will increase by approximately 240 TWh (up 130%) by 2030, noting that the U.S. is “set to consume more electricity for data centres than for the production of aluminium, steel, cement, chemicals and all other energy-intensive goods combined” (IEA, 2025, Executive Summary).

1.1 The Biological Benchmark

The human brain provides a foundational benchmark for cognitive efficiency. It operates on roughly 20 watts of power for all its cognitive tasks—reasoning, perception, motor control, and physical regulation—comparable to the power required by a standard LED light bulb (Kováč, 2010; PNAS, 2025). This efficiency derives from event-driven, parallel, in-memory processing where neurons simultaneously serve as both computation and storage elements.

By contrast, training GPT-3 (175 billion parameters) required approximately 1,287 megawatt-hours (MWh) of electricity—equivalent to approximately 120 years of average U.S. household consumption—generating an estimated 552 metric tons of CO₂ equivalent (Patterson et al., 2021, Table 4, arXiv:2104.10350). These figures establish the scale of the efficiency gap between biological and artificial cognitive systems.

Table 1: Energy Consumption Comparison—Biological vs. Artificial Cognition

System | Energy Source | Power / Energy | Processing Paradigm | Source
Human Brain | Glucose/Oxygen | ~20 Watts | Event-driven, parallel, in-memory | Kováč (2010); PNAS (2025)
GPT-3 Training | Electricity | 1,287 MWh total; 552 tCO₂e | Continuous, sequential data transfer | Patterson et al. (2021), Table 4
IBM NorthPole (Inference) | Electricity | 25× frames/joule vs. 12nm GPU | In-memory, parallel, digital neuromorphic | Modha et al., Science (2023)
Intel Hala Point | Electricity | 15 TOPS/W; 2,600W max system | Spiking neural networks, event-driven | Intel Labs (2024), ICASSP

Note: Neuromorphic benchmarks measured on specific workloads (ResNet-50 image classification for NorthPole; conventional DNN inference for Hala Point). Performance gains are workload-dependent and represent vendor-reported optimized conditions.
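
As a quick arithmetic check on the household-equivalence figure cited for GPT-3 above, the sketch below converts the 1,287 MWh training figure into household-years. The average household consumption value (~10,700 kWh/year) is an assumed EIA-style ballpark, not a figure from the cited sources.

```python
# Back-of-envelope check: GPT-3 training energy expressed in U.S. household-years.
# Assumes ~10,700 kWh/year average household consumption (a ballpark assumption,
# not a figure from Patterson et al.).
gpt3_training_mwh = 1_287            # Patterson et al. (2021), Table 4
household_kwh_per_year = 10_700      # assumed average U.S. household consumption

household_years = (gpt3_training_mwh * 1_000) / household_kwh_per_year
print(f"GPT-3 training ≈ {household_years:.0f} household-years of electricity")
# ≈ 120 household-years, matching the equivalence cited in Section 1.1
```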

1.2 The Inference Dominance Problem

While training costs have dominated public discourse, inference increasingly becomes the dominant ongoing energy cost for mature products serving at scale. Recent analyses suggest inference can account for up to 90% of a model's total lifecycle energy use in high-volume deployment scenarios (Desislavov et al., 2023; WEF, 2025). This proportion varies significantly by organization and deployment stage—training-intensive research labs will exhibit different ratios than inference-heavy consumer products.

Per-query energy estimates vary widely due to infrastructure differences, model architectures, and measurement methodologies. Epoch AI (2025) estimates approximately 0.3–0.4 Wh per ChatGPT query, noting this is roughly 10× the energy of a conventional Google search. A recent benchmarking study (arXiv:2505.09598, 2025), using an infrastructure-aware methodology, estimated per-query consumption ranging from approximately 0.43 Wh for GPT-4o short prompts to over 33 Wh for reasoning-intensive models like o3 on complex tasks. Scaled to estimated daily query volumes, the annual energy consumption of a single flagship model deployment can be substantial—one analysis projected GPT-4o's annual inference energy at 391,000–463,000 MWh, comparable to the electricity consumption of 35,000 U.S. households (arXiv:2505.09598, 2025, Figure 4).
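
To make the scaling from per-query energy to fleet-level energy concrete, the sketch below multiplies an assumed per-query figure by an assumed daily query volume. Both inputs are illustrative placeholders, not measurements from the cited studies.

```python
# Illustrative scaling from per-query energy to annual fleet energy.
# Both inputs are assumptions for illustration; actual values vary widely by
# model, infrastructure, and measurement methodology (see Section 1.2).
wh_per_query = 0.34               # assumed, within the Epoch AI (2025) 0.3-0.4 Wh range
queries_per_day = 1_000_000_000   # assumed daily query volume (placeholder)

annual_mwh = wh_per_query * queries_per_day * 365 / 1_000_000  # Wh -> MWh
print(f"Annual inference energy ≈ {annual_mwh:,.0f} MWh")
# With these placeholder inputs: ≈ 124,100 MWh/year; reasoning-heavy models with
# higher per-query energy scale proportionally higher.
```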


2. The Dual Mandate: Clean Data and Clean Power

Achieving sustainable cognition requires addressing the complete lifecycle of AI systems—from data provenance to energy sourcing. This dual mandate recognizes that neither clean data nor clean power alone is sufficient; both must be addressed in concert to achieve meaningful sustainability outcomes.

2.1 The Clean Data Crisis: Model Collapse and Data Contamination

A peer-reviewed study published in Nature by Shumailov et al. (2024, Vol. 631, pp. 755–759) demonstrated that AI models experience progressive performance degradation when trained recursively on AI-generated content. This phenomenon, termed model collapse, causes models to lose representation of low-probability events in their training distributions, progressively narrowing their understanding of the world. The study showed that after several generations of recursive training, model outputs converge toward increasingly homogeneous distributions, losing the diversity and accuracy of the original human-generated training data.

Epoch AI (2024, arXiv:2211.04325v2) estimates that high-quality text data—defined as curated, deduplicated content suitable for language model pre-training under current licensing and quality standards—may face diminishing returns between 2026 and 2028, with image and video data following by 2030–2040. These projections depend on definitions of “high-quality,” legal constraints on data access, and the effectiveness of synthetic data augmentation techniques.

Research on data poisoning reinforces the urgency. ETH Zurich (2025) demonstrated that contaminating as little as 0.001% of medical training data with misinformation caused statistically significant harmful outputs in their experimental setup. Anthropic (2024) showed that as few as 250 malicious documents can influence model behavior across fine-tuning cycles, including at large scales. A study of persistent pre-training poisoning (arXiv:2410.13722, 2024) demonstrated that small corruptions survive retraining and propagate across model generations. This data contamination creates a feedback loop with direct energy implications: models trained on degraded data require more extensive remediation, additional training cycles, and ultimately greater energy expenditure to achieve equivalent reliability.

2.2 The Clean Power Imperative: Carbon and Water Footprints

The carbon footprint of AI systems extends beyond direct electricity consumption. A Cornell University study (Li, 2025, arXiv:2304.03271) projects current AI growth trajectories could generate 24–44 million metric tons of CO₂ annually by 2030, comparable to the emissions of a mid-sized industrialized nation. Research published in Nature Sustainability (Li, 2025) estimates U.S. AI servers could require 4.2–6.6 billion cubic meters of fresh water withdrawal annually by 2027 for cooling purposes. These figures reflect direct cooling water tied to data center operation, not the full supply-chain water footprint of hardware manufacturing or energy production. For context, Denmark’s total annual water withdrawal is approximately 1.27 billion cubic meters (FAO AQUASTAT), placing the projected AI water demand at roughly 3–5× that figure—though direct comparison requires caution, as Li’s estimates focus on evaporative cooling losses while national withdrawal figures include water returned to source.

A significant complicating factor is the transparency deficit in industry reporting. As the IEA (2025) notes, “there is substantial uncertainty both about data centre consumption today and in the future.” Proprietary model architectures, undisclosed parameter counts, and variable infrastructure configurations make precise energy accounting exceptionally difficult, undermining effective policy response.


3. Layer One: Neuromorphic Hardware Architecture

Neuromorphic computing represents a fundamental departure from conventional von Neumann architecture, addressing the primary source of AI inefficiency: the physical separation between memory and processing that forces energy-intensive data movement on every computational step.

3.1 The von Neumann Bottleneck

Conventional AI systems operate on von Neumann architecture, which maintains physical separation between memory (storage) and processing (CPU/GPU). Every computational step requires moving data from memory to processor and back—a process that consumes energy and creates latency. This data movement dominates energy consumption in modern computing, often accounting for more energy than the actual computation (Modha et al., Science, 2023).

The biological brain circumvents this limitation through in-memory computing: neurons serve simultaneously as both processing and storage elements, with synaptic connections encoding learned information directly where computation occurs. This eliminates the energy cost of data transfer and enables the brain’s remarkable 20-watt operating efficiency for complex cognitive tasks.

3.2 IBM NorthPole: Production-Ready Neuromorphic AI

IBM’s NorthPole chip, detailed in Science (Modha et al., October 2023, DOI: 10.1126/science.adh1174) and subsequent IEEE publications (2024), represents the most advanced production-ready neuromorphic inference processor. The architecture comprises 22 billion transistors in 795 mm² on a 12nm process node, organized as 256 interconnected cores that each integrate memory and computation.

The efficiency gains are substantial and peer-reviewed. On the ResNet-50 image classification benchmark, NorthPole demonstrates 25× greater energy efficiency (frames per joule) than comparable 12nm GPUs—and even outperforms GPUs fabricated at the more advanced 4nm node by 5× on the same metric (Modha et al., Science, 2023, Figure 4A). Similar results were reported for the YOLOv4 object detection benchmark.

In October 2024, IBM presented LLM inference results at the IEEE High Performance Extreme Computing (HPEC) Conference. Running a 3-billion-parameter Granite LLM distilled from IBM’s Granite-8B-Code-Base model, NorthPole achieved latency below 1 millisecond per token, 46.9× faster than the next most energy-efficient GPU, while achieving 72.7× greater energy efficiency than the next lowest-latency GPU. A 16-chip NorthPole configuration in a standard 2U server achieved throughput of 28,356 tokens per second (IBM Research, 2024). Benchmark conditions: These results were achieved on a specific 3B-parameter model distilled for NorthPole’s architecture; performance on larger or differently structured models may vary.

3.3 Intel Loihi 2 and Hala Point: Scaling Spiking Neural Networks

Intel’s neuromorphic research program has produced the Loihi 2 processor and the Hala Point large-scale system. Loihi 2, fabricated on the Intel 4 process node, implements asynchronous spiking neural networks (SNNs) in which neurons communicate through discrete spikes rather than continuous values—mirroring biological neural signaling. In results published at ICASSP 2024, Loihi 2 demonstrated what Intel characterizes as “orders of magnitude gains in the efficiency, speed and adaptability of emerging small-scale edge workloads” (Intel Newsroom, April 2024). Independent evaluation by The Register (April 2024) reported benchmarks showing up to 50× faster performance and 100× less power consumption versus specific CPU/GPU baselines (Jetson Orin Nano and Core i9-7920X) on targeted inference and optimization problems.

Hala Point packages 1,152 Loihi 2 processors into a 6-rack-unit chassis, supporting 1.15 billion neurons and 128 billion synapses at a maximum system power of 2,600 watts. Intel characterizes its peak throughput at 20 petaops with efficiency exceeding 15 trillion 8-bit operations per second per watt (TOPS/W) when executing conventional deep neural networks (Intel, April 2024). Important qualification: Hala Point is a research prototype deployed at Sandia National Laboratories. As Intel’s Mike Davies acknowledged publicly, current neuromorphic systems cannot yet run transformer-based LLMs, and “the neuromorphic research field does not have a neuromorphic version of the transformer” (The Register, April 2024). The system’s advantages are most pronounced for workloads with continuous, correlated input streams (video, audio) where sparsity and temporal correlation can be exploited.

3.4 The Software Ecosystem Gap

A significant barrier to neuromorphic adoption remains the software ecosystem. Unlike the mature CUDA ecosystem for GPUs—which benefits from over a decade of tooling, libraries, and developer expertise—neuromorphic programming requires fundamentally different algorithmic approaches based on spike timing, temporal coding, and event-driven computation. Models cannot simply be ported from GPU to neuromorphic hardware without redesign.

This gap is narrowing. Intel’s Lava framework, released as open-source software (github.com/lava-nc/lava), allows developers to define spiking neural network architectures using Python syntax. IBM’s NorthPole uses a compiler-based approach that maps conventional neural networks to its architecture through automated model optimization. The Artemis framework advocated in this paper proposes a phased migration strategy: beginning with edge inference workloads where neuromorphic advantages are most pronounced (batch-size-1, real-time processing), then expanding as software tooling matures and “killer applications” demonstrate decisive superiority over conventional architectures.
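
To illustrate why porting models to spiking hardware requires algorithmic redesign rather than simple recompilation, the sketch below simulates a single leaky integrate-and-fire (LIF) neuron in plain NumPy: computation is event-driven, and downstream work is triggered only when a spike occurs. This is a toy illustration of the programming model, not the Lava or NorthPole API, and the parameters are assumptions chosen for readability.

```python
import numpy as np

# Toy leaky integrate-and-fire (LIF) neuron: the event-driven programming model
# behind spiking hardware. Downstream synapses do work only on spike events,
# which is where neuromorphic energy savings on sparse inputs come from.
# Parameters are illustrative, not taken from Loihi 2 or NorthPole.
def lif_neuron(input_current, leak=0.9, threshold=1.0):
    v = 0.0
    spikes = []
    for t, i_t in enumerate(input_current):
        v = leak * v + i_t          # membrane potential decays and integrates input
        if v >= threshold:          # spike: emit an event and reset
            spikes.append(t)
            v = 0.0
    return spikes

rng = np.random.default_rng(0)
current = rng.random(100) * 0.3     # sparse, low-amplitude input stream
events = lif_neuron(current)
print(f"{len(events)} spike events over 100 timesteps "
      f"(downstream compute runs only {len(events)} times, not 100)")
```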


4. Layer Two: Software and Algorithmic Optimization

While neuromorphic hardware offers transformative efficiency potential, the majority of current AI infrastructure operates on conventional GPU architectures. Software-level optimization provides immediate, deployable efficiency gains within existing hardware investments.

4.1 The Zeus Framework: GPU Power Optimization

The Zeus framework, developed at the University of Michigan and presented at USENIX NSDI 2023 (You, Chung & Chowdhury, pp. 119–139), demonstrates that intelligent GPU power management can achieve energy reductions of 15.3%–75.8% across diverse DNN training workloads without requiring hardware changes or significant accuracy loss. Zeus operates by jointly optimizing two parameters: GPU power limits and training batch size, finding optimal tradeoff points between energy consumption and training speed in real time.

The wide range (15.3%–75.8%) reflects workload dependence—some models and tasks offer substantially more optimization headroom than others. Applied hypothetically to GPT-3-scale training (1,287 MWh baseline per Patterson et al.), Zeus-class optimization could reduce energy consumption by 197–976 MWh per training run, depending on workload characteristics. Zeus has accumulated over 100,000 downloads on DockerHub, indicating meaningful industry adoption.
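
The sketch below illustrates the measurement side of the power-limit/energy tradeoff that Zeus automates, using NVIDIA's NVML bindings (pynvml) rather than the Zeus API itself. It assumes a Volta-or-newer GPU (for the total-energy counter) and sufficient privileges to change power limits, and the training step is a placeholder.

```python
import pynvml

# Sweep GPU power limits and measure energy for a fixed workload -- the tradeoff
# Zeus (You et al., NSDI 2023) explores automatically alongside batch size.
# Assumes a Volta-or-newer GPU (energy counter support) and privileges to set
# power limits; run_training_steps() is a placeholder for real work.
def run_training_steps():
    pass  # placeholder: a fixed number of training iterations would go here

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
min_mw, max_mw = pynvml.nvmlDeviceGetPowerManagementLimitConstraints(handle)

for limit_mw in range(min_mw, max_mw + 1, (max_mw - min_mw) // 4 or 1):
    pynvml.nvmlDeviceSetPowerManagementLimit(handle, limit_mw)      # needs privileges
    start_mj = pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)   # millijoules
    run_training_steps()
    end_mj = pynvml.nvmlDeviceGetTotalEnergyConsumption(handle)
    print(f"power limit {limit_mw / 1000:.0f} W -> {(end_mj - start_mj) / 1000:.1f} J")

pynvml.nvmlShutdown()
```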

4.2 Carbon-Aware Computing: The Chase Framework

The Chase framework extends Zeus with carbon-intensity awareness, dynamically adjusting training intensity based on real-time grid carbon signals. During periods of high renewable energy availability (e.g., afternoon solar peaks in California, when grid carbon intensity drops below 70 gCO₂/kWh), training proceeds at full intensity. During high-carbon periods (evening fossil fuel peaks exceeding 400 gCO₂/kWh), the framework throttles computation to prioritize efficiency over speed.

Important distinction: Carbon-aware scheduling is primarily load-shifting—it reduces the carbon intensity of computation by timing workloads to coincide with cleaner grid periods, rather than reducing total energy consumption directly. The energy efficiency gains come from Zeus’s power optimization; Chase adds carbon optimization on top. These approaches are complementary but should not be treated as multiplicatively independent when estimating combined impact.
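
A minimal sketch of the load-shifting logic described above is shown below. The grid_carbon_intensity() function is a hypothetical placeholder for a real-time grid signal (for example, a service such as Electricity Maps or WattTime), and the thresholds simply echo the illustrative California figures in the text.

```python
import time

# Minimal carbon-aware throttling loop, in the spirit of Chase: shift work toward
# low-carbon grid periods. grid_carbon_intensity() is a hypothetical placeholder
# for a real-time signal; thresholds echo the illustrative figures in Section 4.2.
LOW_CARBON = 70      # gCO2/kWh: run at full intensity
HIGH_CARBON = 400    # gCO2/kWh: defer or throttle work

def grid_carbon_intensity() -> float:
    raise NotImplementedError("hypothetical placeholder for a grid carbon API")

def carbon_aware_step(train_one_step, throttled_step):
    ci = grid_carbon_intensity()
    if ci < LOW_CARBON:
        train_one_step()      # clean grid: full power limit, large batches
    elif ci > HIGH_CARBON:
        time.sleep(300)       # dirty grid: defer work (load shifting)
    else:
        throttled_step()      # intermediate: reduced power limit
```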

4.3 Model Compression and Efficient Architectures

Algorithmic techniques including pruning, quantization, and knowledge distillation offer substantial efficiency gains for inference workloads. Quantization from 32-bit floating point (FP32) to 8-bit integer (INT8) representations can reduce model memory footprint and computational requirements by approximately 4× while maintaining near-equivalent accuracy for many tasks (Liu & Yin, 2024; NVIDIA TensorRT documentation). Knowledge distillation—training smaller student models to replicate larger teacher model behavior—can achieve 5–10× compression ratios while preserving 95%+ accuracy.

These techniques are particularly relevant for inference deployment, where model size directly affects latency, throughput, and energy consumption per query. Combined with hardware-aware optimization (selecting precision formats matched to accelerator capabilities), software compression can achieve meaningful efficiency improvements on existing infrastructure.
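
As a small example of the quantization technique described above, the PyTorch dynamic-quantization sketch below converts the linear layers of a toy model from FP32 to INT8 weights. This is a minimal illustration, not a production pipeline; deployed systems typically use calibrated static quantization (for example via TensorRT) for activations as well.

```python
import torch
import torch.nn as nn

# Minimal post-training dynamic quantization: FP32 linear weights -> INT8.
# Illustrates the ~4x weight-memory reduction discussed in Section 4.3 on a
# toy model; accuracy impact must be validated per task.
model = nn.Sequential(nn.Linear(1024, 1024), nn.ReLU(), nn.Linear(1024, 10))

quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

fp32_bytes = sum(p.numel() * p.element_size() for p in model.parameters())
print(f"FP32 parameter memory: {fp32_bytes / 1e6:.1f} MB")
print(quantized)  # Linear layers replaced by dynamically quantized modules
```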

4.4 DARPA ML2P: Energy-Aware Machine Learning Construction

The Defense Advanced Research Projects Agency (DARPA) has recognized energy consumption as a critical constraint for military AI deployment. The ML2P (Mapping Machine Learning to Physics) program develops what it terms the ‘Energy Semantics of ML’ (ES-ML)—capturing joule-level power-performance interactions between algorithms and physical hardware. Rather than treating energy as an afterthought, ML2P aims to make energy an intrinsic optimization variable alongside accuracy and latency (DARPA, Program Solicitation DARPA-PS-25-32).

ML2P’s approach represents a fundamental shift: treating software and hardware as an integrated system where algorithms are designed with explicit awareness of their physical energy costs. This hardware-software co-design philosophy aligns with the broader trajectory toward energy-aware computing and is particularly relevant for edge deployment scenarios where power budgets are tightly constrained.
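
The sketch below illustrates, in the most general terms, what treating energy as a first-class objective alongside accuracy and latency can look like: a weighted selection over measured candidate configurations. It is a conceptual illustration only, with made-up numbers, and does not represent the ML2P program's actual methodology.

```python
# Conceptual illustration of energy as a first-class objective alongside accuracy
# and latency (in the spirit of energy-aware co-design; not ML2P's actual method).
# Candidate figures are invented for illustration.
candidates = [
    # (name, accuracy, latency_ms, energy_mj_per_inference)
    ("fp32-large", 0.92, 45.0, 800.0),
    ("int8-large", 0.91, 20.0, 250.0),
    ("distilled",  0.89,  8.0,  60.0),
]

def score(acc, latency_ms, energy_mj, w_acc=1.0, w_lat=0.002, w_energy=0.0005):
    # Higher is better: reward accuracy, penalize latency and energy.
    return w_acc * acc - w_lat * latency_ms - w_energy * energy_mj

best = max(candidates, key=lambda c: score(*c[1:]))
print("selected configuration:", best[0])
```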


5. Layer Three: Policy and Governance Frameworks

Technical optimization alone cannot address the collective action problems inherent in AI’s environmental impact. Policy frameworks establishing reporting requirements, efficiency standards, and market incentives are necessary to create the institutional conditions for sustainable AI development.

5.1 European Union: The Energy Efficiency Directive and AI Act

The European Union has adopted the most comprehensive regulatory approach to AI sustainability. The 2023 Energy Efficiency Directive (EED, Directive 2023/1791) mandates annual reporting for data centers with installed IT power capacity of 500 kW or above, covering energy performance, water usage, waste heat utilization, and renewable energy procurement. The European Commission issued Delegated Regulation EU/2024/1364 establishing specific data center sustainability indicators and methodologies.

The EU AI Act (Regulation (EU) 2024/1689, OJ L 2024/1689, EUR-Lex: 32024R1689), which entered into force August 1, 2024, includes provisions for environmental sustainability. Providers of general-purpose AI (GPAI) models must maintain documentation on training energy consumption and known or estimated energy usage (Regulation 2024/1689, Art. 53 and Annex XI; see also White & Case LLP, 2025). The Act’s implementing measures require disclosure of computational resources used in training, testing, and validation.

Germany’s implementation of the EED through the Energy Efficiency Act (Energieeffizienzgesetz, EnEfG, enacted November 2023) establishes binding targets for data centers: 50% renewable electricity share, increasing to 100% by January 1, 2027, with a mandatory Power Usage Effectiveness (PUE) threshold of 1.2 for newly constructed facilities after July 1, 2026.
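
For readers less familiar with the PUE metric referenced above, the sketch below shows its definition and what a 1.2 cap implies for non-IT overhead; the facility figures are illustrative assumptions, not data from the EnEfG.

```python
# Power Usage Effectiveness (PUE) = total facility energy / IT equipment energy.
# A PUE cap of 1.2 (Germany's EnEfG threshold for new facilities) limits cooling
# and other overhead to 20% of IT load. Facility figures below are illustrative.
it_load_mw = 10.0          # assumed IT equipment load
total_facility_mw = 11.8   # assumed total draw including cooling and power delivery

pue = total_facility_mw / it_load_mw
print(f"PUE = {pue:.2f} ({'within' if pue <= 1.2 else 'exceeds'} the 1.2 threshold)")
```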

5.2 United States: Fragmented Approach and State-Level Action

The United States lacks comprehensive federal AI sustainability regulation. Individual states are beginning to address the gap: California’s SB 253 (Climate Corporate Data Accountability Act, 2023) requires businesses with over $1 billion in annual revenue to report Scope 1, 2, and 3 greenhouse gas emissions, which would encompass large-scale AI operations. Virginia, whose Northern Virginia corridor hosts the largest concentration of data centers globally, has seen significant grid impacts from rapid data center growth.

The economic impacts of data center expansion are increasingly visible. The PJM Interconnection, the regional grid operator serving much of the Eastern United States, has cited data center growth as a significant contributor to increased capacity needs, with capacity market prices rising substantially since 2022. A Carnegie Mellon University analysis projected that data center electricity consumption growth could add 2–8% to retail electricity prices in affected regions by 2030, depending on grid capacity expansion and energy mix assumptions (though specific impacts will vary by location and utility structure).

5.3 Industry Self-Regulation: The Climate Neutral Data Centre Pact

Over 100 data center operators have signed the Climate Neutral Data Centre Pact, committing to measurable sustainability targets including carbon-free energy procurement, water conservation measures, and circular economy practices for hardware lifecycle management. While voluntary, the Pact establishes industry benchmarks against which regulatory requirements may be calibrated.

Heat recovery programs demonstrate the potential for data centers to provide community benefit. Meta’s Odense, Denmark facility exports approximately 100,000 MWh of waste heat annually to district heating networks (Meta Sustainability Report, 2024). Such programs illustrate how operational efficiency gains can create positive externalities beyond reduced carbon footprint.


6. The Artemis Solution: Integrated Framework for Sustainable Cognition

The Artemis Solution synthesizes the preceding technical and policy analysis into a coherent architectural framework for sustainable AI. Named for precision and purposeful resource use, Artemis is not a single technology but a systems-level approach that coordinates optimization across hardware, software, and governance layers.

6.1 Architectural Principles

The Artemis framework rests on five foundational principles:

Verification-First Design: Systems should be architected to surface uncertainty, challenge claims, and resist confident error propagation. This addresses both the technical challenge of hallucination and the energy waste of remediating unreliable outputs.

Edge-Centric Processing: Neuromorphic efficiency gains are most pronounced at batch size 1, suggesting the future of sustainable AI lies in distributed, specialized processing at the network edge rather than exclusively in centralized data centers processing generalized workloads.

Carbon-Aware Orchestration: Workload scheduling should incorporate real-time grid carbon intensity, exploiting renewable energy availability to minimize carbon footprint beyond what efficiency improvements alone can achieve.

Data Provenance Governance: Training data must maintain verifiable provenance distinguishing human-generated from AI-generated content, with curation processes that preserve distribution diversity and resist contamination (a minimal pipeline sketch follows these principles).

Net-Positive Optimization: AI systems should be designed such that the energy savings they enable exceed the energy they consume—achieving positive environmental return on computational investment.
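
A minimal sketch of what the data-provenance principle might look like inside a training pipeline is shown below. The record schema, origin labels, and filtering criteria are hypothetical illustrations, not an established standard.

```python
from dataclasses import dataclass

# Hypothetical provenance record and filter illustrating the Data Provenance
# Governance principle: tag each training document with its origin and keep
# only verifiably human-generated or pre-LLM-era content. The schema is an
# illustration, not an established standard (cf. C2PA-style content credentials).
@dataclass
class ProvenanceRecord:
    doc_id: str
    origin: str                      # "human", "synthetic", or "unknown"
    source_url: str
    collected_before_llm_era: bool   # crude heuristic flag

def admissible(rec: ProvenanceRecord) -> bool:
    return rec.origin == "human" or rec.collected_before_llm_era

corpus = [
    ProvenanceRecord("a1", "human", "https://example.org/essay", False),
    ProvenanceRecord("b2", "synthetic", "https://example.org/genpage", False),
    ProvenanceRecord("c3", "unknown", "https://example.org/archive", True),
]
clean = [r for r in corpus if admissible(r)]
print(f"{len(clean)} of {len(corpus)} documents pass the provenance filter")
```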

6.2 Implementation Layers

Artemis implementation proceeds through three coordinated layers, each delivering independent value while creating foundations for subsequent optimization:

Layer 1 (Immediate — Software Optimization): Deploy Zeus/Chase frameworks for GPU power management and carbon-aware scheduling. Implement model compression through quantization (FP8/INT8) and knowledge distillation. Establish energy monitoring baselines using tools like CodeCarbon and ML CO₂ Impact Calculator (a minimal monitoring sketch follows Layer 3 below). These interventions require no hardware changes and can be deployed on existing infrastructure within weeks.

Layer 2 (Near-term — Infrastructure Transition): Migrate inference workloads to neuromorphic processors where task characteristics align with neuromorphic strengths (real-time processing, batch-size-1, continuous input streams). Implement data provenance tracking and contamination detection for training pipelines. Integrate renewable energy procurement with workload scheduling.

Layer 3 (Strategic — Architectural Transformation): Develop verification-first cognitive architectures incorporating uncertainty quantification and claim validation. Establish federated data governance frameworks ensuring training data integrity across organizations. Advance hardware-software co-design following DARPA ML2P principles.
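
Returning to the Layer 1 monitoring baseline, a minimal CodeCarbon sketch is shown below. It illustrates basic tracker usage only, not a complete monitoring stack, and the training loop is a placeholder.

```python
from codecarbon import EmissionsTracker

# Minimal Layer 1 energy/emissions baseline using CodeCarbon: wrap a training
# run and record estimated emissions. A minimal sketch of basic tracker usage,
# not a complete monitoring stack; train() is a placeholder.
def train():
    pass  # placeholder for the actual training loop

tracker = EmissionsTracker(project_name="artemis-baseline")
tracker.start()
train()
emissions_kg = tracker.stop()    # estimated kg CO2-equivalent for the run
print(f"Estimated emissions: {emissions_kg:.4f} kg CO2e")
```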

6.3 Projected Efficiency Gains

Table 2: Artemis Framework—Documented Efficiency Gains by Optimization Layer

Layer | Efficiency Gain | Benchmark Conditions | Source
Zeus GPU Optimization | 15.3%–75.8% energy reduction | Across diverse DNN training workloads (BERT, ResNet, DeepSpeech2, recommendation models) | You et al., USENIX NSDI 2023, pp. 119–139
INT8 Quantization | ~4× memory/compute reduction | FP32 to INT8 with <1% accuracy loss on standard benchmarks | NVIDIA TensorRT; Liu & Yin (2024)
NorthPole Inference | 25× frames/joule vs. 12nm GPU | ResNet-50, YOLOv4 benchmarks; comparable 12nm process | Modha et al., Science (2023), Figure 4A
NorthPole LLM Inference | 46.9× faster; 72.7× more efficient | 3B-param Granite LLM, 16-chip config, vs. GPU baselines | IBM Research, IEEE HPEC (2024)
Hala Point System | 15 TOPS/W efficiency | Conventional DNN inference; 1,152 Loihi 2 chips, 2,600W max | Intel Labs (2024)
Carbon-Aware Scheduling | Carbon intensity reduction (load-shifting) | Depends on grid mix variability; does not reduce total energy | Chase framework, ICLR Workshop 2023

Note: These efficiency gains are reported independently under specific benchmark conditions. They should not be treated as multiplicatively stackable across all workloads. Actual combined impact depends on workload characteristics, deployment context, and the degree to which individual optimizations address overlapping versus independent sources of inefficiency. Software optimization (Zeus) and hardware migration (NorthPole) address different workload pools and can be combined, while quantization and neuromorphic migration are partially overlapping interventions.


7. Conclusion and Recommendations

The energy constraint facing artificial intelligence is not a temporary growing pain but a structural challenge to the field’s continued development trajectory. With data center electricity consumption projected to more than double by 2030 (IEA, 2025), the biological brain’s 20-watt efficiency benchmark serves as both an aspirational target and a reminder of how far current architectures remain from optimal design.

The Artemis Solution provides a coherent framework for addressing this challenge across hardware, software, and policy layers. Neuromorphic architectures like IBM NorthPole and Intel Hala Point demonstrate that order-of-magnitude efficiency gains are achievable for specific workloads—with NorthPole’s 25× improvement in frames per joule on image classification, and its 72.7× energy efficiency gain on LLM inference, representing the most compelling evidence to date that the von Neumann bottleneck can be circumvented at scale. Software frameworks like Zeus demonstrate that 15–76% energy reductions are achievable on existing GPU hardware through intelligent power management.

Critically, this paper has established that the dual mandate of sustainable cognition—clean data and clean power—must be addressed in concert. Model collapse from AI-generated data contamination (Shumailov et al., Nature, 2024) creates compounding inefficiency: degraded training data leads to degraded models, which require remediation cycles that consume additional energy. The data integrity problem and the energy problem are not independent; they amplify each other.

7.1 Strategic Recommendations

For AI developers and organizations:

  1. Implement energy monitoring infrastructure (CodeCarbon, Zeus) immediately to establish consumption baselines and identify optimization opportunities.
  2. Adopt carbon-aware scheduling for non-time-critical training workloads, exploiting grid variability to minimize carbon footprint.
  3. Evaluate neuromorphic hardware for inference workloads, particularly edge deployment scenarios where batch-size-1 efficiency advantages are maximized.
  4. Establish data provenance tracking and contamination detection to maintain training data integrity and avoid model collapse.

For policymakers:

  1. Mandate energy consumption reporting for AI systems and data centers, following the EU EED model of standardized disclosure.
  2. Establish incentives for neuromorphic hardware research and renewable energy procurement in AI infrastructure.
  3. Support hardware-software co-design research programs following the DARPA ML2P model.
  4. Address grid infrastructure capacity proactively, ensuring electricity supply keeps pace with data center demand growth.

7.2 Future Research Directions

Several research frontiers require continued investigation. The ‘killer application’ for neuromorphic computing—the task demonstrating decisive superiority over conventional architectures—remains an open question, with LLM inference, robotics, and real-time sensor processing as leading candidates. The development of neuromorphic software ecosystems must accelerate to reduce the “translation tax” between conventional and spiking architectures. Standardized energy benchmarking methodologies for AI systems are urgently needed to enable meaningful cross-platform comparison and informed policy design.

Most fundamentally, the research community must develop verification-first cognitive architectures that resist confident error propagation—systems that, in the language of this research program, “don’t lie to themselves.” The intersection of reliability and sustainability represents the defining architectural challenge: building systems that are simultaneously more accurate, more efficient, and more honest about the boundaries of their knowledge.

The path to sustainable intelligence requires technological advancement guided by the principles of radical efficiency, ecological accountability, and comprehensive architectural awareness. The Artemis Solution provides a framework for that journey.


References

Anthropic. (2024). Small samples, big impact: Data poisoning in LLMs. Anthropic Research. https://www.anthropic.com/research/small-samples-poison

arXiv. (2024). Persistent pre-training poisoning of language models. arXiv preprint, 2410.13722. https://arxiv.org/abs/2410.13722

arXiv. (2025). How hungry is AI? Benchmarking energy, water, and carbon footprint of LLM inference. arXiv preprint, 2505.09598. https://arxiv.org/abs/2505.09598

Desislavov, R., Martínez-Plumed, F., & Hernández-Orallo, J. (2023). Trends in AI inference energy consumption: Beyond the performance-vs-parameter laws of deep learning. Sustainable Computing: Informatics and Systems, 38, 100857. https://doi.org/10.1016/j.suscom.2023.100857

Epoch AI. (2024). Will we run out of data? An analysis of the limits of scaling datasets in machine learning. arXiv preprint, 2211.04325v2. https://arxiv.org/abs/2211.04325

Epoch AI. (2025). How much energy does ChatGPT use? Gradient Updates. https://epochai.org/gradient-updates/how-much-energy-does-chatgpt-use

ETH Zurich. (2025). Can poisoned AI models be cured? Department of Computer Science Spotlight. https://inf.ethz.ch/news-and-events/spotlights/2025/02/can-poisoned-ai-models-be-cured.html

European Commission. (2024). Delegated Regulation EU/2024/1364 on data centre sustainability indicators. Official Journal of the European Union. https://eur-lex.europa.eu/eli/del_reg/2024/1364/oj

European Parliament and Council. (2023). Directive 2023/1791 on Energy Efficiency (EED recast). Official Journal of the European Union. https://eur-lex.europa.eu/eli/dir/2023/1791/oj

European Parliament and Council. (2024). Regulation (EU) 2024/1689 laying down harmonised rules on artificial intelligence (AI Act). Official Journal of the European Union. https://eur-lex.europa.eu/eli/reg/2024/1689/oj

IBM Research. (2023). NorthPole: An architecture for neural network inference. Science, 382(6668). https://doi.org/10.1126/science.adh1174

IBM Research. (2024). NorthPole achieves new speed and efficiency milestones for LLM inference. IEEE High Performance Extreme Computing (HPEC) Conference, September 2024. https://research.ibm.com/blog/northpole-llm-inference-results

Intel Labs. (2023). Lava: An open-source software framework for neuromorphic computing. GitHub Repository. https://github.com/lava-nc/lava

Intel Labs. (2024). Hala Point: World’s largest neuromorphic system. Intel Newsroom, April 17, 2024. ICASSP 2024 results: Efficient video and audio processing with Loihi 2. https://newsroom.intel.com/news/intel-builds-worlds-largest-neuromorphic-system

International Energy Agency. (2025). Energy and AI: Special Report. IEA Publications, April 10, 2025. https://www.iea.org/reports/energy-and-ai

Kováč, L. (2010). Bioenergetics: A key to brain and mind. EMBO Reports, 11(4), 252–255. https://doi.org/10.1038/embor.2010.38

Li, S. (2025). Making AI less “thirsty”: Uncovering and addressing the secret water footprint of AI models. arXiv preprint, 2304.03271. Updated estimates in Nature Sustainability (2025). https://arxiv.org/abs/2304.03271

Liu, V., & Yin, Y. (2024). Green AI: Exploring carbon footprints, mitigation strategies, and trade-offs in LLM training. Discover Artificial Intelligence, 4, 49. https://doi.org/10.1007/s44163-024-00149-w

Meta Platforms. (2024). Sustainability Report 2024. Waste heat recovery data for Odense, Denmark facility. https://sustainability.atmeta.com/2024-sustainability-report/

Modha, D. S., et al. (2023). Neural inference at the frontier of energy, space, and time. Science, 382(6668). https://doi.org/10.1126/science.adh1174

Patterson, D., Gonzalez, J., Le, Q., Liang, C., Munguia, L.-M., Rothchild, D., So, D., Texier, M., & Dean, J. (2021). Carbon emissions and large neural network training. arXiv preprint, 2104.10350. Table 4: GPT-3 energy = 1,287 MWh, CO₂e = 552 t. https://arxiv.org/abs/2104.10350

PNAS. (2025). Can neuromorphic computing help reduce AI’s high energy cost? Proceedings of the National Academy of Sciences. https://doi.org/10.1073/pnas.2528654122

Shehabi, A., Smith, S. J., Masanet, E., & Koomey, J. (2024). United States data center energy usage report. Lawrence Berkeley National Laboratory, LBNL-2001635. https://eta.lbl.gov/publications/united-states-data-center-energy

Shumailov, I., Shumaylov, Z., Zhao, Y., Papernot, N., Anderson, R., & Gal, Y. (2024). AI models collapse when trained on recursively generated data. Nature, 631, 755–759. https://doi.org/10.1038/s41586-024-07566-y

The Register. (2024). DoE receives Intel’s latest neuromorphic brain-in-a-box. The Register, April 25, 2024. https://www.theregister.com/2024/04/17/intel_hala_point_neuromorphic_owl/

U.S. Defense Advanced Research Projects Agency. (2025). ML2P: Mapping Machine Learning to Physics. Program Solicitation DARPA-PS-25-32. https://www.darpa.mil/research/programs/mapping-machine-learning-physics

White & Case LLP. (2025). Energy efficiency requirements under the EU AI Act. Client Advisory. https://www.whitecase.com/insight-alert/eu-ai-act-energy-efficiency

World Economic Forum. (2025). How data centres can avoid doubling their energy use by 2030. World Economic Forum, December 2025. https://www.weforum.org/stories/2025/12/data-centres-energy-use-efficiency/

You, J., Chung, J.-W., & Chowdhury, M. (2023). Zeus: Understanding and optimizing GPU energy consumption of DNN training. 20th USENIX Symposium on Networked Systems Design and Implementation (NSDI ’23), pp. 119–139. https://www.usenix.org/conference/nsdi23/presentation/you


Appendix A: Claims Ledger

This appendix provides source traceability for each quantitative claim in the paper, identifying the exact source document, specific figure or table reference, and conditions under which the data was produced. This transparency is offered in the spirit of verification-first methodology—the same standard this paper advocates for AI systems.

Table A1: Source Traceability for Key Quantitative Claims

Claim | Source & Specific Reference | Conditions / Assumptions
415 TWh global DC electricity (2024) | IEA (2025), Energy and AI, Executive Summary, p. 1 | IEA estimate; acknowledged uncertainty in DC metering
945 TWh projected by 2030 | IEA (2025), Energy and AI, Executive Summary, Base Case | Base Case scenario; Lift-Off Case projects higher
GPT-3: 1,287 MWh training energy | Patterson et al. (2021), arXiv:2104.10350, Table 4 | Estimated from compute hours × hardware TDP; single training run
GPT-3: 552 tCO₂e emissions | Patterson et al. (2021), arXiv:2104.10350, Table 4 | Based on U.S. grid average carbon intensity at time of training
Brain operates at ~20 watts | Kováč (2010), EMBO Reports 11(4); PNAS (2025) | Whole-brain metabolic estimate; varies by cognitive load
NorthPole: 25× frames/joule | Modha et al., Science (2023), Figure 4A | ResNet-50 benchmark; vs. comparable 12nm V100 GPU
NorthPole: 46.9× faster, 72.7× efficient (LLM) | IBM Research blog (Oct 2024); IEEE HPEC 2024 | 3B-param Granite model distilled for NorthPole; vs. specific GPU baselines
NorthPole: 28,356 tokens/sec | IBM Research blog (Oct 2024) | 16-chip config in 2U server; 3B-param model
Hala Point: 15 TOPS/W, 20 petaops | Intel Newsroom (April 2024); Business Wire | Vendor-reported peak; conventional DNN workloads
Hala Point: 1,152 Loihi 2 chips, 2,600W | Intel Newsroom (April 2024) | Maximum system power; actual varies by workload
Zeus: 15.3%–75.8% energy reduction | You et al., USENIX NSDI 2023, pp. 119–139 | Range across diverse workloads; vs. max batch size / max power baseline
Model collapse from recursive AI training | Shumailov et al., Nature 631 (2024), pp. 755–759 | Peer-reviewed; tested across multiple model architectures
0.001% contamination causing harmful output | ETH Zurich (2025) | Medical domain training data; specific threshold
250 documents can poison a model | Anthropic (2024) | Across fine-tuning cycles; demonstrated including at large scales
Data exhaustion 2026–2028 (text) | Epoch AI (2024), arXiv:2211.04325v2 | High-quality curated text; depends on quality definition and legal access
4.2–6.6B m³ water withdrawal by 2027 | Li (2025), arXiv:2304.03271; Nature Sustainability | U.S. AI servers; evaporative cooling estimates; withdrawal not consumption

This paper represents ongoing research. Comments, corrections, and collaboration inquiries are welcome. The claims ledger (Appendix A) is provided to facilitate verification and constructive critique.


About Alan Scott Encinas

I design and scale intelligent systems across cognitive AI, autonomous technologies, and defense. I write about what I have built, what I have learned, and what actually works.
