AI Intelligence Brief

Tue 14 April 2026

Daily Brief — Curated and contextualised by Best Practice AI

173 articles

Utilities Commit $1.4 Trillion to AI Power Needs, Top Firms Capture 74% of Gains, and Developers Allege Claude Degradation

TL;DR Investor-owned utilities plan $1.4 trillion in spending over five years to upgrade the aging US power grid and meet AI-driven electricity demand. PwC's 2026 AI Performance Study shows 74% of AI's economic gains going to just 20% of companies, with those leaders prioritising growth over efficiency. Stanford HAI's 2026 AI Index reveals China has erased the US lead in AI capabilities. Developers accuse Anthropic of degrading Claude's performance, intentionally or under compute limits, a claim Anthropic denies. CIO reports 40% of AI productivity gains lost to error rework.

Editor's highlights

The stories that matter most

Selected and contextualised by the Best Practice AI team

6 of 173 articles
Lead story
Editor's pick · Professional Services
Arxiv · Yesterday

The Hourglass Revolution: A Theoretical Framework of AI's Impact on Organizational Structures in Developed and Emerging Markets

arXiv:2604.09623v1 Announce Type: cross Abstract: This paper presents a theoretical framework examining how artificial intelligence (AI) transforms organizational structures, introducing an "hourglass" configuration that emerges as AI assumes traditional middle management functions. The analysis identifies three key mechanisms (algorithmic coordination, structural fluidity, and hybrid agency) that demonstrate how AI enables new modes of organizing that transcend traditional structural boundaries. Drawing on institutional theory and digital transformation research, we examine how these mechanisms operate differently in developed and emerging markets, producing distinct patterns of structural transformation. Our framework offers three important theoretical contributions: (1) conceptualizing algorithmic coordination as a unique form of organizational integration, (2) explaining how structural fluidity allows organizations to achieve stability and adaptability at the same time, and (3) arguing that hybrid agency surpasses traditional, human-centric forms of organizational capability. Our analysis shows that while the move to AI-enabled strategies appears global, successful application will need to pay sufficient attention to technological capabilities, cultural dimensions, and market contexts.

Editor's pick · Manufacturing & Industrials
Arxiv · Yesterday

Agentic AI in Engineering and Manufacturing: Industry Perspectives on Utility, Adoption, Challenges, and Opportunities

arXiv:2604.09633v1 Announce Type: new Abstract: This work examines how AI, especially agentic systems, is being adopted in engineering and manufacturing workflows, what value it provides today, and what is needed for broader deployment. This is an exploratory and qualitative state-of-practice study grounded in over 30 interviews across four stakeholder groups (large enterprises, small/medium firms, AI developers, and CAD/CAM/CAE vendors). We find that near-term AI gains cluster around structured, repetitive work and data-intensive synthesis, while higher-value agentic gains come from orchestrating multi-step workflows across tools. Adoption is constrained less by model capability than by fragmented and machine-unfriendly data, stringent security and regulatory requirements, and limited API-accessible legacy toolchains. Reliability, verification, and auditability are central requirements for adoption, driving human-in-the-loop frameworks and governance aligned with existing engineering reviews. Beyond technical barriers, there are also organizational ones: a persistent AI literacy gap, cultural heterogeneity, and governance structures that have not yet caught up with agentic capabilities. Together, the findings point to a staged progression of AI utility from low-consequence assistance toward higher-order automation as trust, infrastructure, and verification mature. This highlights key breakthroughs needed, including integration with traditional engineering tools and data types, robust verification frameworks, and improved spatial and physical reasoning.

BPAI context

Realizing ROI in manufacturing requires investing in data infrastructure and API-accessible legacy toolchains before deploying agents. Executive takeaway: prioritize data hygiene and legacy-system modernization as prerequisites for agentic AI adoption. Agentic AI's strategic value in engineering and manufacturing lies in orchestrating complex workflows to boost productivity, but adoption hinges on bridging infrastructure gaps rather than advancing models alone, positioning early integrators for a competitive edge in industrial automation. Insights from over 30 interviews reveal near-term gains in repetitive tasks, constrained by fragmented data, legacy systems, and regulatory demands; scaled deployment will require human-in-the-loop verification and organizational upskilling.
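
The human-in-the-loop pattern the interviews point to can be sketched in a few lines. A minimal approval-gate sketch, where the risk tiers, action names, and console approval step are illustrative assumptions rather than anything described in the study:

```python
# Minimal human-in-the-loop gate for agent actions in an engineering workflow.
# Risk tiers, action names, and the console approval step are illustrative
# assumptions, not a framework from the study.
from enum import Enum

class Risk(Enum):
    LOW = "low"        # e.g. summarize an inspection report
    HIGH = "high"      # e.g. push a parameter change to a CNC toolpath

def execute_with_gate(action: str, risk: Risk) -> str:
    """Auto-run low-risk actions; route high-risk ones to a human reviewer."""
    if risk is Risk.LOW:
        return f"executed: {action}"
    answer = input(f"[review required] approve '{action}'? (y/n) ").strip().lower()
    return f"executed: {action}" if answer == "y" else f"blocked: {action}"

print(execute_with_gate("summarize inspection report", Risk.LOW))
print(execute_with_gate("update CNC feed rate", Risk.HIGH))
```

The useful property is auditability: every high-risk action passes through an explicit, loggable decision point, which is the kind of alignment with existing engineering reviews the interviewees describe.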

Editor's pick · Financial Services
Arxiv · Yesterday

Detecting Corporate AI-Washing via Cross-Modal Semantic Inconsistency Learning

arXiv:2604.09644v1 Announce Type: new Abstract: Corporate AI-washing, the strategic misrepresentation of AI capabilities via exaggerated or fabricated cross-channel disclosures, has emerged as a systemic threat to capital market information integrity with the widespread adoption of generative AI. Existing detection methods rely on single-modal text frequency analysis, suffering from vulnerability to adversarial reformulation and cross-channel obfuscation. This paper presents AWASH, a multimodal framework that redefines AI-washing detection as cross-modal claim-evidence reasoning (instead of surface-level similarity measurement), built on AW-Bench, the first large-scale trimodal benchmark for this task, comprising 88,412 aligned annual report text, disclosure image, and earnings call video triplets from 4,892 A-share listed firms during 2019 Q1 to 2025 Q2. We propose the Cross-Modal Inconsistency Detection (CMID) network, integrating a tri-modal encoder, a structured natural language inference module for claim-evidence entailment reasoning, and an operational grounding layer that cross-validates AI claims against verifiable physical evidence (patent filing trajectories, AI-specific talent recruitment, compute infrastructure proxies). Evaluated against six competitive baselines, CMID achieves an F1 score of 0.882 and an AUC-ROC of 0.921, outperforming the strongest text-only baseline by 17.4 percentage points and the latest multimodal competitor by 11.3 percentage points. A pre-registered user study with 14 regulatory analysts verifies that CMID-generated evidence reports cut case review time by 43% while increasing true positive detection rates by 28%. These findings confirm the technical superiority and practical applicability of structured multimodal reasoning for large-scale corporate disclosure surveillance.
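
The paper's CMID network is not public, but the claim-evidence entailment idea at its core is easy to illustrate in a single modality. A minimal sketch using an off-the-shelf NLI model via Hugging Face Transformers; the model choice, threshold, and example texts are assumptions for illustration, not the paper's method:

```python
# Single-modal claim-evidence entailment sketch (NOT the paper's CMID network).
# Uses an off-the-shelf NLI model to score whether disclosed evidence supports
# an AI claim; model, threshold, and texts are illustrative assumptions.
from transformers import pipeline

nli = pipeline("text-classification", model="roberta-large-mnli")

def support_score(claim: str, evidence: str) -> float:
    """Return the model's entailment probability that evidence supports claim."""
    scores = nli({"text": evidence, "text_pair": claim}, top_k=None)
    return next(s["score"] for s in scores if s["label"] == "ENTAILMENT")

claim = "We deployed a proprietary large language model across our core products."
evidence = ("Annual report: R&D headcount flat year over year; no AI-related "
            "patent filings or compute purchases disclosed.")

score = support_score(claim, evidence)
if score < 0.5:  # illustrative threshold
    print(f"Possible AI-washing signal: entailment={score:.2f}")
```

The paper's contribution is doing this jointly across text, image and video and grounding claims in physical evidence such as patent trajectories and hiring; the sketch above covers only the text-versus-text entailment step.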

Editor's pick · Technology
VentureBeat · 2 days ago

Is Anthropic 'nerfing' Claude? Users increasingly report performance degradation as leaders push back

A growing number of developers and AI power users are taking to social media to accuse Anthropic of degrading the performance of Claude Opus 4.6 and Claude Code — intentionally or as an outcome of compute limits — arguing that the company’s flagship coding model feels less capable, less reliable and more wasteful with tokens than it did just weeks ago. The complaints have spread quickly on GitHub, X and Reddit over the past several weeks, with several high-reach posts alleging that Claude has become worse at sustained reasoning, more likely to abandon tasks midway through, and more prone to hallucinations or contradictions. Some users have framed the issue as “AI shrinkflation” — the idea that customers are paying the same price for a weaker product. Others have gone further, suggesting Anthropic may be throttling or otherwise tuning Claude downward during periods of heavy demand. Those claims remain unproven, and Anthropic employees have publicly denied that the company degrades models to manage capacity. At the same time, Anthropic has acknowledged real changes to usage limits and reasoning defaults in recent weeks, which has made the broader debate more combustible.

VentureBeat has reached out to Anthropic for further clarification on the recent accusations, including whether any recent changes to reasoning defaults, context handling, throttling behavior, inference parameters or benchmark methodology could help explain the spike in complaints. We have also asked how Anthropic explains the recent benchmark-related claims and whether it plans to publish additional data that could reassure customers. An Anthropic spokesperson did not address the questions individually, instead referring us to X posts by Claude Code creator Boris Cherny and Claude Code team member Thariq Shihipar regarding Opus 4.6 performance and usage limits, respectively. Both X posts are also referenced and linked below.

Viral user complaints, including from an AMD Senior Director, argue Claude has become less capable

One of the most detailed public complaints originated as a GitHub issue filed on April 2, 2026, by Stella Laurenzo, whose LinkedIn profile identifies her as a Senior Director in AMD’s AI group. In that post, Laurenzo wrote that Claude Code had regressed to the point that it could not be trusted for complex engineering work, then backed that claim with a sprawling analysis of 6,852 Claude Code session files, 17,871 thinking blocks and 234,760 tool calls. The complaint argued that, starting in February, Claude’s estimated reasoning depth fell sharply while signs of poorer performance rose alongside it, including more premature stopping, more “simplest fix” behavior, more reasoning loops, and a measurable shift from research-first behavior to edit-first behavior. The post’s broader point was that for advanced engineering workflows, extended reasoning is not a luxury but part of what makes the model usable in the first place.

That GitHub thread then escaped into the broader social media conversation, with X users including @Hesamation posting screenshots of Laurenzo's GitHub post on April 11 and turning it into an even more viral talking point. That amplification mattered because it gave the wider “Claude is getting worse” narrative something more concrete than anecdotal frustration: a long, data-heavy post from a senior AI leader at a major chip company arguing that the regression was visible in logs, tool-use patterns and user corrections, not just gut feeling.
Anthropic’s public response focused on separating perceived changes from actual model degradation. In a pinned follow-up posted a week ago on the same GitHub issue, Claude Code lead Boris Cherny thanked Laurenzo for the care and depth of the analysis but disputed its main conclusion. Cherny said the “redact-thinking-2026-02-12” header cited in the complaint is a UI-only change that hides thinking from the interface and reduces latency, but “does not impact thinking itself,” “thinking budgets,” or how extended reasoning works under the hood. He also said two other product changes likely affected what users were seeing: Opus 4.6’s move to adaptive thinking by default on Feb. 9, and a March 3 shift to medium effort, or effort level 85, as the default for Opus 4.6, which he said Anthropic viewed as the best balance across intelligence, latency and cost for most users. Cherny added that users who want more extended reasoning can manually switch effort higher by typing /effort high in Claude Code terminal sessions.

That exchange gets at the core of the controversy. Critics like Laurenzo argue that Claude’s behavior in demanding coding workflows has plainly worsened and point to logs and usage patterns as evidence. Anthropic, by contrast, is not saying nothing changed. It is saying the biggest recent changes were product and interface choices that affect what users see and how much effort the system expends by default, not a secret downgrade of the underlying model. That distinction may be technically important, but for power users who feel the product is delivering worse results, it is not necessarily a satisfying one. External coverage from TechRadar and PC Gamer further amplified Laurenzo's post and the larger wave of agreement from some power users.

Another viral post on X from developer Om Patel on April 7 made the same argument in even more direct terms, claiming that someone had “actually measured” how much “dumber” Claude had gotten and summarizing the result as a 67% drop. That post helped popularize the “AI shrinkflation” label and pushed the controversy beyond hard-core Claude Code users into the broader AI discourse on X. These claims have resonated because they map closely onto what many frustrated users say they are seeing in practice: more unfinished tasks, more backtracking, more token burn and a stronger sense that Claude is less willing to reason deeply through complicated coding jobs than it was earlier this year.

Benchmark posts turned anecdotal frustration into a public controversy

The loudest benchmark-based claim came from BridgeMind, which runs the BridgeBench hallucination benchmark. On April 12, the account posted that Claude Opus 4.6 had fallen from 83.3% accuracy and a No. 2 ranking in an earlier result to 68.3% accuracy and No. 10 in a new retest, calling that proof that “Claude Opus 4.6 is nerfed.” That post spread widely and became one of the main anchors for the broader public case that Anthropic had degraded the model. Other users also circulated benchmark-related or test-based posts suggesting that Opus 4.6 was underperforming versus Opus 4.5 in practical coding tasks. Still other posts pointed to TerminalBench-related results as supposed evidence that the model’s behavior had changed in certain harnesses or product contexts. The effect was cumulative: benchmark screenshots, side-by-side tests and anecdotal frustration all began reinforcing one another in public. That matters because benchmark claims tend to travel farther than more subjective complaints.
A developer saying a model “feels worse” is one thing. A screenshot showing a ranking drop from No. 2 to No. 10, or a dramatic percentage swing in accuracy, gives the appearance of hard proof, even when the underlying comparison may be more complicated.

Critics of the benchmark claims say the evidence is weaker than it looks

The most important rebuttal to the BridgeBench claim did not come from Anthropic. It came from Paul Calcraft, an outside software and AI researcher on X, who argued that the viral comparison was misleading because the earlier Opus 4.6 result was based on only six tasks while the later one was based on 30. In his words, it was a “DIFFERENT BENCHMARK.” He also said that on the six tasks the two runs shared in common, Claude’s score moved only modestly, from 87.6% previously to 85.4% in the later run, and that the bigger swing appeared to come mostly from a single fabrication result without repeats. He characterized that as something that could easily fall within ordinary statistical noise.

That outside rebuttal matters because it undercuts one of the cleanest and most viral claims in circulation. It does not prove users are wrong to think something has changed. But it does suggest that at least some of the benchmark evidence now driving the story may be overstated, poorly normalized or not directly comparable. Even the BridgeBench post itself drew a community note to similar effect. The note said the two benchmark runs covered different scopes — six tasks in one case and 30 in the other — and that the common-task subset showed only a minor change. That does not make the later result meaningless, but it weakens the strongest version of the “BridgeBench proved it” argument.

This is now a key feature of the controversy: the claims are not all equally strong. Some are grounded in first-hand user experience. Some point to real product changes. Some rely on benchmark comparisons that may not be apples-to-apples. And some depend on inferences about hidden system behavior that users outside Anthropic cannot directly verify.
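
Calcraft's small-sample point is easy to put numbers on. A back-of-envelope sketch that treats each benchmark task as an independent pass/fail trial, which is a simplifying assumption rather than how BridgeBench necessarily scores:

```python
# Back-of-envelope: how noisy is an accuracy estimate from n benchmark tasks?
# Assumes each task is an independent pass/fail trial (a simplification).
import math

def accuracy_margin(p: float, n: int) -> float:
    """Approximate 95% margin of error for an observed pass rate p over n tasks."""
    return 1.96 * math.sqrt(p * (1 - p) / n)

for n in (6, 30):
    print(f"n={n:2d} tasks: 85% accuracy +/- {accuracy_margin(0.85, n):.1%}")

# n= 6 tasks: 85% accuracy +/- 28.6%
# n=30 tasks: 85% accuracy +/- 12.8%
```

On six tasks, swings of tens of percentage points sit comfortably inside the noise band, which is why the 87.6% to 85.4% movement on the shared subset is hard to distinguish from chance.
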
Earlier capacity limits gave users a reason to suspect more changes under the hood

The current backlash also lands in the shadow of a real, confirmed Anthropic policy change from late March. On March 26, Anthropic technical staffer Thariq Shihipar posted that, “To manage growing demand for Claude,” the company was adjusting how 5-hour session limits work for Free, Pro and Max subscribers during peak hours, while keeping weekly limits unchanged. He added that during weekdays from 5 a.m. to 11 a.m. Pacific time, users would move through their 5-hour session limits faster than before. In follow-up posts, he said Anthropic had landed efficiency wins to offset some of the impact, but that roughly 7% of users would hit session limits they would not have hit before, particularly on Pro tiers.

In an email on March 27, 2026, Anthropic told VentureBeat that Team and Enterprise customers were not affected by those changes, and that the shift was not dynamically optimized per user but instead applied to the peak-hour window the company had publicly described. Anthropic also said it was continuing to invest in scaling capacity. Those comments were about session limits, not model downgrades. But they are important context, because they establish two things that users now keep connecting in public: first, Anthropic has been dealing with surging demand; second, it has already changed how usage is rationed during busy periods. That does not prove Anthropic reduced model quality. It does help explain why so many users are primed to believe something else may also have changed.

Prompt caching and TTL

A separate, more recent GitHub issue broadens the dispute beyond model quality and into pricing and quota behavior. In issue #46829, user seanGSISG argued that Claude Code’s prompt-cache time-to-live, or TTL, appeared to shift from a one-hour setting back to a five-minute setting in early March, based on analysis of nearly 120,000 API calls drawn from Claude Code session logs across two machines. The complaint argues that this change drove meaningful increases in cache-creation costs and quota burn, especially for long-running coding sessions where cached context expires quickly and must be rebuilt. The author claims that this helps explain why some subscription users began hitting usage limits they had not previously encountered.

What makes this issue notable is that Anthropic did not flatly deny that something changed. In a reply on the thread, Jarred Sumner said the March 6 change was real and intentional, but rejected the framing that it was a regression. He said Claude Code uses different cache durations for different request types, and that one-hour cache is not always cheaper because one-hour writes cost more up front and only save money when the same cached context is reused enough times to justify it. In his telling, the change was part of ongoing cache optimization work, not a silent downgrade, and the pre–March 6 behavior described in the issue “wasn’t the intended steady state.”

The thread later drew a more detailed response from Anthropic’s Cherny, who described one-hour caching as “nuanced” and said the company has been testing heuristics to improve cache hit rates, token usage and latency for subscribers. Cherny said Anthropic keeps five-minute cache for many queries, including subagents that are rarely resumed, and said turning off telemetry also disables experiment gates, which can cause Claude Code to fall back to a five-minute default in some cases. He added that Anthropic plans to expose environment variables that let users force one-hour or five-minute cache behavior directly. Together, those replies do not validate the issue author’s claim that Anthropic silently made Claude Code more expensive overall, but they do confirm that Anthropic has been actively experimenting with cache behavior behind the scenes during the same period users began complaining more loudly about quota burn and changing product behavior.
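
Sumner's cost argument can be sanity-checked against Anthropic's published prompt-caching price multipliers (roughly 1.25x base input for a five-minute cache write, 2x for a one-hour write, and 0.1x for cache reads). The session patterns below are illustrative assumptions, not measured Claude Code behavior:

```python
# Relative cost of 5-minute vs 1-hour prompt-cache TTLs across a session,
# using Anthropic's published pricing multipliers (write: 1.25x base for
# 5-min TTL, 2x for 1-hour TTL; cache read: 0.1x). Session shapes are
# illustrative assumptions, not measured Claude Code behavior.
WRITE_5MIN, WRITE_1HR, READ = 1.25, 2.0, 0.1   # in units of base input cost

def cost_5min(turns: int, gap_minutes: float) -> float:
    """5-min TTL: cache survives between turns only if gaps stay under 5 min."""
    if gap_minutes < 5:
        return WRITE_5MIN + (turns - 1) * READ   # one write, then cheap reads
    return turns * WRITE_5MIN                    # expires every turn: rewrite

def cost_1hr(turns: int) -> float:
    """1-hour TTL: assume gaps stay under an hour, so one write then reads."""
    return WRITE_1HR + (turns - 1) * READ

for turns in (2, 5, 20):
    print(f"{turns:2d} turns: 5-min/rapid {cost_5min(turns, 1):.2f}x  "
          f"5-min/slow {cost_5min(turns, 10):.2f}x  1-hr {cost_1hr(turns):.2f}x")
```

On these multipliers the one-hour write only pays off when turns are spaced out enough for a five-minute cache to keep expiring; for rapid back-to-back requests the cheaper five-minute write wins, which is consistent with Sumner's point that one-hour caching is not always cheaper.
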
Anthropic says user-facing changes, not secret degradation, explain much of the uproar

Anthropic-affiliated employees have publicly pushed back on the broadest accusations. In one widely circulated reply on X, Cherny responded to claims that Anthropic had secretly nerfed Claude Code by writing, “This is false.” He said Claude Code had been defaulted to medium effort in response to user feedback that Claude was consuming too many tokens, and that the change had been disclosed both in the changelog and in a dialog shown to users when they opened Claude Code. That response is notable because it concedes a meaningful product change while rejecting the more conspiratorial interpretation of it. Anthropic is not saying nothing changed. It is saying that what changed was disclosed and was aimed at balancing token use, not secretly reducing model quality.

Public documentation also supports the fact that effort defaults have been in motion. Claude Code’s changelog says that on April 7, Anthropic changed the default effort level from medium to high for API-key users as well as Bedrock, Vertex, Foundry, Team and Enterprise users. That suggests Anthropic has actively been tuning these settings across different segments, which could plausibly affect user perceptions even if the core model weights are unchanged.

Shihipar has also directly denied the broader demand-management accusation. In a reply on X posted April 11, he said Anthropic does not “degrade” its models to better serve demand. He also said that changes to thinking summaries affected how some users were measuring Claude’s “thinking,” and that the company had not found evidence backing the strongest qualitative claims now spreading online.

The real issue may be trust as much as model quality

What is clear is that a trust gap has opened between Anthropic and some of its most demanding users. For developers who rely on Claude Code all day, subtle shifts in visible thinking output, effort defaults, token burn, latency tradeoffs or usage caps can feel indistinguishable from a weaker model. That is true whether the root cause is a product setting, a UI change, an inference-policy tweak, capacity pressure or a genuine quality regression.

It also means both sides of the fight may be talking past each other. Users are describing what they experience: more friction, more failures and less confidence. Anthropic is responding in product terms: effort defaults, hidden thinking summaries, changelog disclosures, and denials that demand pressure is causing secret model degradation. Those are not necessarily incompatible descriptions. A model can feel worse to users even if the company believes it has not “nerfed” the underlying model in the way critics allege.

But coming at a time when Anthropic's chief rival OpenAI has recently pivoted and put more resources behind its competing, enterprise and vibe-coding focused product Codex — even offering a new, more mid-range ChatGPT subscription in an effort to boost usage of the tool — it's certainly not the kind of publicity that stands to benefit Anthropic or its customer retention. At the same time, the public evidence remains mixed. Some of the most viral claims have come from developers with detailed logs and strong opinions based on repeated use. Some of the benchmark evidence has been challenged by outside observers on methodological grounds. And Anthropic’s own recent changes to limits and settings ensure that this debate is happening against a backdrop of real adjustments, not pure rumor.

Editor's pick · Technology
Arxiv · Yesterday

Assessing Model-Agnostic XAI Methods against EU AI Act Explainability Requirements

arXiv:2604.09628v1 Announce Type: new Abstract: Explainable AI (XAI) has evolved in response to expectations and regulations, such as the EU AI Act, which introduces regulatory requirements on AI-powered systems. However, a persistent gap remains between existing XAI methods and society's legal requirements, leaving practitioners without clear guidance on how to approach compliance in the EU market. To bridge this gap, we study model-agnostic XAI methods and relate their interpretability features to the requirements of the AI Act. We then propose a qualitative-to-quantitative scoring framework: qualitative expert assessments of XAI properties are aggregated into a regulation-specific compliance score. This helps practitioners identify when XAI solutions may support legal explanation requirements while highlighting technical issues that require further research and regulatory clarification.
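
The qualitative-to-quantitative step is simple to sketch. A minimal aggregation under assumed property names, weights, and a 0-4 rating scale; the paper's actual rubric is not reproduced here:

```python
# Minimal sketch: aggregate qualitative expert ratings of XAI properties into
# a regulation-specific compliance score. Property names, weights, and the
# 0-4 scale are illustrative assumptions, not the paper's rubric.

ratings = {   # expert assessment of one XAI method, 0 (absent) to 4 (strong)
    "fidelity": 3, "comprehensibility": 2, "robustness": 2, "auditability": 4,
}
weights = {   # regulation-specific emphasis; sums to 1.0
    "fidelity": 0.25, "comprehensibility": 0.40, "robustness": 0.15, "auditability": 0.20,
}

MAX_RATING = 4
score = sum(weights[p] * ratings[p] / MAX_RATING for p in ratings)
print(f"compliance score: {score:.2f}")  # 0.66 on these illustrative inputs
```

Reweighting the same expert ratings for a different regulation is what makes the score "regulation-specific": the qualitative assessments are done once and reused.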

Editor's pick · Technology
Arxiv · Yesterday

Instructing LLMs to Negotiate using Reinforcement Learning with Verifiable Rewards

arXiv:2604.09855v1 Announce Type: cross Abstract: The recent advancement of Large Language Models (LLMs) has established their potential as autonomous interactive agents. However, they often struggle in strategic games of incomplete information, such as bilateral price negotiation. In this paper, we investigate if Reinforcement Learning from Verifiable Rewards (RLVR) can effectively teach LLMs to negotiate. Specifically, we explore the strategic behaviors that emerge during the learning process. We introduce a framework that trains a mid-sized buyer agent against a regulated LLM seller across a wide distribution of real-world products. By grounding reward signals directly in the maximization of economic surplus and strict adherence to private budget constraints, we reveal a novel four-phase strategic evolution. The agent progresses from naive bargaining to using aggressive starting prices, moves through a phase of deadlock, and ultimately develops sophisticated persuasive skills. Our results demonstrate that this verifiable training allows a 30B agent to significantly outperform frontier models over ten times its size in extracting surplus. Furthermore, the trained agent generalizes robustly to stronger counterparties unseen during training and remains effective even when facing hostile, adversarial seller personas.
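
The reward design is concrete enough to sketch. A minimal verifiable reward consistent with the abstract's description (surplus maximization plus strict budget adherence); the normalization and penalty values are illustrative assumptions, not the paper's exact formulation:

```python
# Minimal verifiable reward for a buyer agent in bilateral price negotiation.
# Surplus maximization plus a hard budget constraint, per the abstract; the
# normalization and penalty values are illustrative assumptions.

def negotiation_reward(agreed_price: float | None, budget: float) -> float:
    if agreed_price is None:      # no deal reached
        return 0.0
    if agreed_price > budget:     # private budget constraint violated
        return -1.0               # strict adherence: hard penalty
    return (budget - agreed_price) / budget   # normalized surplus in [0, 1]

# Closing at $70 against a $90 private budget earns 0.22; overpaying at $95
# is penalized no matter how persuasive the dialogue was.
print(negotiation_reward(70.0, 90.0), negotiation_reward(95.0, 90.0))
```

Because the reward depends only on the transcript's final price and the private budget, it is machine-checkable, which is what makes it usable as a verifiable signal for RLVR.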

Economics & Markets

38 articles
AI Investment & Valuations · 11 articles
Editor's pick · Financial Services
Arxiv · Yesterday

Detecting Corporate AI-Washing via Cross-Modal Semantic Inconsistency Learning

arXiv:2604.09644v1. AWASH reframes AI-washing detection as cross-modal claim-evidence reasoning over AW-Bench, a trimodal benchmark of 88,412 aligned annual report text, disclosure image, and earnings call video triplets; its CMID network reaches an F1 score of 0.882 and cuts regulatory case review time by 43% in a pre-registered user study. (Full abstract in Editor's highlights above.)

Editor's pick · PAYWALL
Bloomberg · Yesterday

Emerging-Market Profit Forecasts at Record as AI Boom Defies War

Profit forecasts for emerging-market companies are hitting record highs even as the war in Iran shakes global markets, with investors betting on earnings resilience at Asia’s artificial-intelligence powerhouses.

Editor's pick · Financial Services
unhypedai.substack.com · 2 days ago

Private Equity Needs a Repeatable AI Advantage

How PE turns AI portfolio activity into portfolio-level value. Private equity is under growing pressure to say something credible about AI, not as a fashionable add-on, but as part of the next cycle of portfolio value creation and the case it will need to make to LPs about repeatable operating advantage. Spend enough time close to how PE organisations and portfolio companies are actually trying to make this real and the problem becomes impossible to miss. It is no longer enthusiasm, tooling, or even activity. It is whether any of that activity is turning into an advantage private equity can recognise, trust, scale, and defend. Too much portfolio AI still sits in the gap between impressive local motion and evidence strong enough to ...

Editor's pick · Financial Services
PR Newswire · 2 days ago

42% of CFOs plan to increase AI investment by over 30% within two years - Bain & Company

CFOs' capital commitment to AI is real and growing rapidly, with finance departments increasingly reaping the benefits, Bain & Company finds in...

Editor's pick · Pharma & Biotech
Theindianpractitioner · Yesterday

Novo Nordisk Partners with OpenAI to Accelerate AI-Driven Drug Discovery - The Indian Practitioner

Novo Nordisk has announced a strategic partnership with OpenAI to drive AI-led transformation in healthcare. Through this collaboration, Novo Nordisk aims to accelerate the development of innovative treatment options and […]

Editor's pick · Financial Services
Venture Capital Journal · 2 days ago

An AI-sized bump in VC secondaries’ rebound

AI-driven tech disruptions have raised questions over the sustainability of the recent pricing recovery in venture secondaries.

Editor's pick
MBI Deep Dives · 2 days ago

AI Economics in the East

In the most recent All-In episode, Brad Gerstner from Altimeter (which owns stakes in both OpenAI and Anthropic) pushed back on the gross margin concerns for AI labs. From Brad Gerstner (emphasis mine): “Their gross margins are exploding higher. Like the fastest increase in gross margins ...

Editor's pick · Technology
Bebeez · Yesterday

Exclusive: Barcelona and San Francisco-based Modern Relay lands €2.5 million for its enterprise AI foundation layer

EU-Startups has learned that Spain and U.S.-based Modern Relay, a shared foundation for enterprise AI, has raised a €2.5 million ($3 million) funding round along with releasing its first open-source product, Omnigraph – a Git-style graph database built for a world where agents are first-class operators. The round was raised with participation from Point Nine, […]

Editor's pick · Technology
Top Daily Headlines · Yesterday

NHS pays £46K to prep next Microsoft licensing round

Benchmarking contract lays groundwork for renegotiating £774M software agreement.

Editor's pick · Manufacturing & Industrials
Simplified Capital · 2 days ago

The $11 Billion Power Play: How Infrastructure and Data Centers are Fueling the 2026 Equipment Surge | Simplified Capital

NEWS ALERT: February volume hits record $11 billion; construction sector growth at 22.2% YTD; data centers and energy infrastructure driving historic demand (Simplified Capital analysis). Breaking: the equipment market just shattered expectations. If you’ve ...

Editor's pick · Technology
Yahoo! Finance · 2 days ago

3 AI Semiconductor Stocks That Are Now Trading Below 20X Earnings

AI semiconductor stocks have been holding up, and some have been soaring. Their earnings have risen so much that some are still trading below 20x earnings, like Micron (NASDAQ:MU), SkyWater Technology (NASDAQ:SKYT), and Photronics (NASDAQ:PLAB). What makes them special is that these three are ...

AI Macroeconomics · 5 articles
AI Market Competition · 10 articles
Editor's pick · Technology
Top Daily Headlines · 2 days ago

How Salesforce and ServiceNow are squaring off in the battle for the helpdesk

Salesforce and ServiceNow are competing for dominance in the helpdesk market, focusing on user engagement versus AI agent governance.

Editor's pick · Technology
VentureBeat · 2 days ago

Is Anthropic 'nerfing' Claude? Users increasingly report performance degradation as leaders push back

A growing number of developers and AI power users are accusing Anthropic of degrading the performance of Claude Opus 4.6 and Claude Code, intentionally or as an outcome of compute limits, while Anthropic leaders say disclosed changes to effort defaults, usage limits and caching, not secret model degradation, explain much of the uproar. (Full story in Editor's highlights above.)

Editor's pick · Technology
Guardian · 2 days ago

Elon Musk’s X cuts payments to users who post clickbait

Platform says it will reward original creators as it penalises ‘aggregators’ for flooding timelines with ‘stolen posts’. Elon Musk’s X has reduced payments to users who post clickbait and recycle news stories as it warned account holders against “flooding the timeline” with low-quality content. Nikita Bier, X’s head of product, wrote on the social media platform that all “aggregators” – users who quickly repackage and repost news from other accounts – had received less money from the creator revenue sharing programme.

Editor's pick · Technology
Tech Policy Press · 2 days ago

What Regulators Should Do About The AI Industry's Hidden Financial Loop | TechPolicy.Press

Antitrust enforcers should be asking how a self-referential financial circuit is manufacturing the demand that justifies investments, Hera Hyeonseo Lee writes.

Editor's pick · Technology
Daily Brew · Yesterday

Read OpenAI’s latest internal memo about beating the competition

An internal memo from OpenAI outlines the company's strategy to maintain its competitive edge against rivals like Anthropic.

Editor's pick · Government & Public Sector
Top Daily Headlines · Yesterday

Veterans Affairs has lost track of software licenses amid $985M bill

Department putting systems in place to manage 'restrictive licensing practices'.

Editor's pick · Technology
IndexBox · Yesterday

Amazon AI Chip Sales Proposal & Arm Holdings Growth Outlook | 2026 - News and Statistics - IndexBox

Analysis of Amazon's potential entry into the AI chip sales market and the subsequent projected financial benefits for architecture licensor Arm Holdings.

Editor's pick · PAYWALL · Telecommunications
Bloomberg · Yesterday

Poste CEO Says Telecom Italia Is ‘Perfect Fit’ for Digital Hub

Poste Italiane SpA decided to buy full ownership of Telecom Italia SpA to capture all the potential benefits of working with the former phone monopoly, Chief Executive Officer Matteo Del Fante said.

Editor's pick · Technology
ZeroHedge · 2 days ago

AI Infrastructure Bottlenecks Create Clear Winners In Tech | ZeroHedge

The nature of the bottlenecks signals gains will be concentrated in providers of the AI hardware stack, rather than broadening to tech hardware...

Editor's pick · Defense & National Security
OpenPR · 2 days ago

Leading Companies Consolidating Their Roles in the Artificial Intelligence and Robotics Market for Aerospace and Defense

The integration of artificial intelligence and robotics is revolutionizing the aerospace and defense sectors, promising significant advancements and new capabilities. This growing market is set to experience substantial expansion as innovative technologies and strategic deployments reshape how ...

AI Productivity · 12 articles
Editor's pick · Professional Services
Arxiv · Yesterday

The Hourglass Revolution: A Theoretical Framework of AI's Impact on Organizational Structures in Developed and Emerging Markets

arXiv:2604.09623v1. This theoretical framework examines how AI transforms organizational structures as it assumes traditional middle management functions, producing an "hourglass" configuration driven by algorithmic coordination, structural fluidity, and hybrid agency, with distinct transformation patterns in developed and emerging markets. (Full abstract in Editor's highlights above.)

Editor's pick · PAYWALL · Professional Services
partners.wsj.com · 2 days ago

Driving AI Value Across the Enterprise - Paid Program - WSJ

AI creates value when intelligent systems and human judgment are engineered to work as one integrated operating model. Quick summary:
- Scaling AI across an enterprise presents significant complexities beyond initial small-scale pilots, often hindered by technical debt and legacy organizational structures.
- Companies are shifting toward agentic workflows that focus on redesigning how people and products interact to drive measurable enterprise growth or cost reduction.
- Publicis Sapient provides deep expertise and a suite of enterprise AI products and services designed to modernize legacy technology systems, accelerate the software development lifecycle, build and orchestrate agents, and automate IT operations for meaningful enterprise-level value.

Editor's pick
@erikbryn · 2 days ago

Here's my take (aka the good old productivity J-curve): “The biggest gains from a powerful general-purpose technology usually arrive only after firms invest in the complements: reorganizing workflows, retraining workers, redesigning processes and building the intangible capital needed to use the technology effectively,”

Editor's pick
windfalltrust.substack.com · 2 days ago

Brief #9: MIT researchers validate fast AI capabilities

Two new measurement efforts point to rapid, broad gains in capability, while a major survey shows economists still expect only moderate macro disruption. By Jacob Schaal and Joel Christoph, Apr 13, 2026. Welcome! This bi-weekly newsletter, published by the Windfall Trust, curates the most important developments in AI economics research and policy. Each issue features key research and updates, along with in-depth analysis and quick links to relevant opportunities and recent news. Need to Know: Two major measurement efforts now suggest that AI capabilities across economically relevant tasks are improving very quickly. MIT FutureTech’s new wor...

Editor's pick
apacoutlookmag.com· 2 days ago

PwC: Why Most AI Value is Going to Just 20% of Companies - APAC Outlook Magazine

New PwC 2026 AI Performance Study finds leading businesses are using AI for growth rather than just productivity. Contents: AI leaders are prioritising growth over efficiency; workflow redesign is key to stronger AI performance; automation is increasing across leading businesses; governance and trust underpin AI success; the gap between AI leaders and laggards may continue to widen. A small group of businesses is capturing the majority of artificial intelligence’s economic benefits, according to new research from PwC, highlighting a widening gap between AI leaders and the rest of the market. PwC’s 2026 AI Performance Study found that 74% of AI’s economic value is being captured by just 20% of organisations…

Editor's pickProfessional Services
Livemint· 2 days ago

Gaps are emerging between AI’s promise and delivery: Is there an opportunity in this for India’s IT firms? | Mint

The AI revolution has entered a turbulent phase, where promise runs ahead of delivery. Productivity gains are real but messy. As sundry businesses struggle to make the most of AI, Indian IT service companies could make themselves useful.

Editor's pickManufacturing & Industrials
Bain & Company· 2 days ago

Industrial Automation: From Control to Intelligence | Bain & Company

AI is reshaping the automation value pyramid into an hourglass.

Editor's pickProfessional Services
Daily AI News April 13, 2026: Intuit Demonstrates a Repeatable AI Playbook for Regulated Industries· 2 days ago

AI's Next Operating Model

Bain's report argues that long-running agents with persistence and operational memory could shift AI from a transactional tool into an ongoing layer of organizational capability.

Editor's pickManufacturing & Industrials
Arxiv· Yesterday

Agentic Exploration of PDE Spaces using Latent Foundation Models for Parameterized Simulations

arXiv:2604.09584v1 Announce Type: new Abstract: Flow physics, and more broadly physical phenomena governed by partial differential equations (PDEs), are inherently continuous, high-dimensional and often chaotic in nature. Traditionally, researchers have explored these rich spatiotemporal PDE solution spaces using laboratory experiments and/or computationally expensive numerical simulations. This severely limits automated and large-scale exploration, unlike domains such as drug discovery or materials science, where discrete, tokenizable representations naturally interface with large language models. We address this by coupling multi-agent LLMs with latent foundation models (LFMs), generative models over parametrised simulations that learn explicit, compact and disentangled latent representations of flow fields, enabling continuous exploration across governing PDE parameters and boundary conditions. The LFM serves as an on-demand surrogate simulator, allowing agents to query arbitrary parameter configurations at negligible cost. A hierarchical agent architecture orchestrates exploration through a closed loop of hypothesis, experimentation, analysis and verification, with a tool-modular interface requiring no user support. Applied to flow past tandem cylinders at Re = 500, the framework autonomously evaluates over 1,600 parameter-location pairs and discovers divergent scaling laws: a regime-dependent two-mode structure for minimum displacement thickness and a robust linear scaling for maximum momentum thickness, with both landscapes exhibiting a dual-extrema structure that emerges at the near-wake to co-shedding regime transition. The coupling of learned physical representations with agentic reasoning establishes a general paradigm for automated scientific discovery in PDE-governed systems.
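
The closed loop the abstract describes (hypothesis, experimentation, analysis, verification) is easy to picture in code. Below is a minimal Python sketch under stated assumptions: the `surrogate` function is a toy analytic stand-in for the paper's trained latent foundation model, and the window-shrinking heuristic is illustrative rather than the authors' hierarchical agent architecture.

```python
import numpy as np

def surrogate(reynolds: float, gap_ratio: float) -> float:
    """Toy analytic stand-in for the LFM surrogate: returns a wake statistic."""
    return float(np.sin(gap_ratio) / (1.0 + reynolds / 500.0))

def explore(n_rounds: int = 5, samples_per_round: int = 8, seed: int = 0):
    """Closed loop: sample ("experiment"), rank ("analyse"), refine ("hypothesise")."""
    rng = np.random.default_rng(seed)
    lo, hi = 1.0, 5.0                      # current search window for the gap ratio
    best_gap, best_val = None, -np.inf
    for _ in range(n_rounds):
        gaps = rng.uniform(lo, hi, samples_per_round)   # cheap surrogate queries
        vals = np.array([surrogate(500.0, g) for g in gaps])
        i = int(vals.argmax())
        if vals[i] > best_val:
            best_gap, best_val = float(gaps[i]), float(vals[i])
        half = (hi - lo) / 4.0             # shrink the window around the optimum
        lo, hi = max(1.0, best_gap - half), min(5.0, best_gap + half)
    return best_gap, best_val

print(explore())
```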

Editor's pickTechnology
Artificial Intelligence Newsletter | April 14, 2026· Yesterday

South Korea ranks third in notable AI models, tops patents in Stanford index

South Korea has climbed to third place globally for notable AI models released in 2025 and continues to lead in AI patents per capita, according to the Stanford AI Index 2026.

Editor's pickManufacturing & Industrials
The Economic Times· 2 days ago

India wants manufacturing at 25% of GDP — will AI in factories help? - The Economic Times Video | ET Now

What does it take to move India's manufacturing from 16% to 25% of GDP? Two industry heavyweights, Vinod Kumar, Partner & Leader – Manufacturing, PwC India, and Srihari Kaninghat, Group Chief Digital Officer, JSW Group, sit down with host Anirban Chowdhury to cut through the hype and get real ...

Editor's pickEducation
Arxiv· Yesterday

DeepReviewer 2.0: A Traceable Agentic System for Auditable Scientific Peer Review

arXiv:2604.09590v1 Announce Type: new Abstract: Automated peer review is often framed as generating fluent critique, yet reviewers and area chairs need judgments they can audit: where a concern applies, what evidence supports it, and what concrete follow-up is required. DeepReviewer 2.0 is a process-controlled agentic review system built around an output contract: it produces a traceable review package with anchored annotations, localized evidence, and executable follow-up actions, and it exports only after meeting minimum traceability and coverage budgets. Concretely, it first builds a manuscript-only claim-evidence-risk ledger and verification agenda, then performs agenda-driven retrieval and writes anchored critiques under an export gate. On 134 ICLR 2025 submissions under three fixed protocols, an un-finetuned 196B model running DeepReviewer 2.0 outperforms Gemini-3.1-Pro-preview, improving strict major-issue coverage (37.26% vs. 23.57%) and winning 71.63% of micro-averaged blind comparisons against a human review committee, while ranking first among automatic systems in our pool. We position DeepReviewer 2.0 as an assistive tool rather than a decision proxy, and note remaining gaps such as ethics-sensitive checks.
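
The "export gate" is the most transferable idea here: a review package is released only once its issues meet minimum traceability and coverage budgets. A hedged sketch of that contract, with illustrative field names rather than DeepReviewer's actual schema:

```python
from dataclasses import dataclass

@dataclass
class Issue:
    claim: str
    anchor: str | None      # e.g. "Sec. 3.2, Eq. (4)"
    evidence: str | None    # quoted span or retrieved citation
    follow_up: str | None   # concrete action a reviewer could execute

def export_gate(issues: list[Issue], min_coverage: float = 0.8) -> bool:
    """Release the review only if enough issues are fully traceable."""
    if not issues:
        return False
    traceable = [i for i in issues if i.anchor and i.evidence and i.follow_up]
    return len(traceable) / len(issues) >= min_coverage

issues = [
    Issue("Baseline ablation missing", "Sec. 5.1", "Table 2 lacks row", "Add row"),
    Issue("Loss definition unclear", "Eq. (3)", None, "Define lambda"),
]
print(export_gate(issues))  # False: only 1 of 2 issues is fully traceable
```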

Labor & Society

68 articles
AI & Employment14 articles
Editor's pick
SIEPR· 2 days ago

AI’s big productivity boost? It’s happening from the sofa

A new study by SIEPR’s Michael Blank is among the first to examine an overlooked effect of generative AI: It’s significantly boosting how much people get done at home.

Editor's pickProfessional Services
Arxiv· Yesterday

LLM Nepotism in Organizational Governance

arXiv:2604.09620v1 Announce Type: new Abstract: Large language models are increasingly used to support organizational decisions from hiring to governance, raising fairness concerns in AI-assisted evaluation. Prior work has focused mainly on demographic bias and broader preference effects, rather than on whether evaluators reward expressed trust in AI itself. We study this phenomenon as LLM Nepotism, an attitude-driven bias channel in which favorable signals toward AI are rewarded even when they are not relevant to role-related merit. We introduce a two-phase simulation pipeline that first isolates AI-trust preference in qualification-matched resume screening and then examines its downstream effects in board-level decision making. Across several popular LLMs, we find that resume screeners tend to favor candidates with positive or non-critical attitudes toward AI, discriminating against skeptical, human-centered counterparts. These biases suggest a loophole: LLM-based hiring can produce more homogeneous AI-trusting organizations, whose decision-makers exhibit greater scrutiny failure and delegation to AI agents, approving flawed proposals more readily while favoring AI-delegation initiatives. To mitigate this behavior, we additionally study prompt-based mitigation and propose Merit-Attitude Factorization, which separates non-merit AI attitude from merit-based evaluation and attenuates this bias across experiments.
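
The proposed mitigation, Merit-Attitude Factorization, separates the attitude signal from the merit signal before any ranking happens. A minimal sketch, assuming hypothetical prompts and scores; the paper's actual prompt templates are not reproduced here:

```python
def build_factored_prompts(resume_text: str) -> dict[str, str]:
    """Two separate instructions: one scores merit, one merely logs attitude."""
    return {
        "merit": (
            "Score 1-10 on role-related qualifications ONLY. "
            "Ignore any opinions the candidate expresses about AI.\n" + resume_text
        ),
        "attitude": (
            "Separately, note any stated attitude toward AI (positive/skeptical/"
            "none). Do NOT let it affect the merit score.\n" + resume_text
        ),
    }

def rank_candidates(merit_scores: dict[str, float]) -> list[str]:
    # Ranking uses the merit factor only; the attitude factor is kept for
    # audit purposes but deliberately excluded from the decision.
    return sorted(merit_scores, key=merit_scores.get, reverse=True)

print(rank_candidates({"cand_a": 8.5, "cand_b": 7.0}))  # ['cand_a', 'cand_b']
```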

Editor's pickEducation
Arxiv· Yesterday

The Division of Understanding: Specialization and Democratic Accountability

arXiv:2604.09871v1 Announce Type: new Abstract: This paper studies how the organization of production shapes democratic accountability. I propose a model in which learning economies make specialization productively efficient: most workers perform one-domain tasks, while a small set of integrators with cross-domain knowledge keep the system coherent. When policy consequences run across domains, integrators understand them better than specialists. Electoral competition then tilts targeted services toward integrators' interests, while low aggregate system knowledge weakens governance and reduces the fraction of public resources converted into citizen-valued services. Labor markets leave these civic margins unpriced, failing to internalize the political returns to system knowledge. Broadening the education of routine specialists can therefore raise welfare relative to the market allocation. The model speaks to debates on liberal arts education and the effects of AI.

Editor's pickProfessional Services
Arxiv· Yesterday

Help Without Being Asked: A Deployed Proactive Agent System for On-Call Support with Continuous Self-Improvement

arXiv:2604.09579v1 Announce Type: new Abstract: In large-scale cloud service platforms, thousands of customer tickets are generated daily and are typically handled through on-call dialogues. This high volume of on-call interactions imposes a substantial workload on human support analysts. Recent studies have explored reactive agents that leverage large language models as a first line of support to interact with customers directly and resolve issues. However, when issues remain unresolved and are escalated to human support, these agents are typically disengaged. As a result, they cannot assist with follow-up inquiries, track resolution progress, or learn from the cases they fail to address. In this paper, we introduce Vigil, a novel proactive agent system designed to operate throughout the entire on-call life-cycle. Unlike reactive agents, Vigil focuses on providing assistance during the phase in which human support is already involved. It integrates into the dialogue between the customer and the analyst, proactively offering assistance without explicit user invocation. Moreover, Vigil incorporates a continuous self-improvement mechanism that extracts knowledge from human-resolved cases to autonomously update its capabilities. Vigil has been deployed on Volcano Engine, ByteDance's cloud platform, for over ten months, and comprehensive evaluations based on this deployment demonstrate its effectiveness and practicality. The open source version of this work is publicly available at https://github.com/volcengine/veaiops.
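
The two behaviors that distinguish Vigil from a reactive agent, interjecting without invocation and learning from resolved cases, can be outlined in a few lines. This is a hedged illustration with assumed thresholds and a toy knowledge base, not Vigil's deployed logic:

```python
# Proactive trigger: watch the customer-analyst dialogue and offer help
# unprompted when a turn looks like a question the agent can answer with
# high confidence. Threshold and heuristics are illustrative assumptions.
def should_interject(turn: str, kb_confidence: float,
                     min_confidence: float = 0.85) -> bool:
    looks_like_question = turn.rstrip().endswith("?") or turn.lower().startswith(
        ("how", "why", "what", "can", "is there"))
    return looks_like_question and kb_confidence >= min_confidence

# Continuous self-improvement: resolved cases become retrievable knowledge.
knowledge_base: list[tuple[str, str]] = []

def learn_from_resolution(ticket_summary: str, fix: str) -> None:
    knowledge_base.append((ticket_summary, fix))

learn_from_resolution("pod stuck in CrashLoopBackOff", "raise memory limit")
print(should_interject("How do I fix a CrashLoopBackOff?", kb_confidence=0.9))
```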

Editor's pickPAYWALLEducation
washingtonpost.com· 2 days ago

The hottest college major hit a wall. What happened?

Computer science majors are disappearing. Is AI to blame? Column by Shira Ovide, who is eager to hear from college students and their families about how they’re feeling about the job market (drop her a line at shira.ovide@washpost.com). A lot of students took the advice to learn to code. Since the Great Recession left technology as a rare spot of optimism in American industry, computer science has been among the fastest-growing college majors in the country, according to indispensable degree data from the National Center for Education Statistics.

Editor's pickPAYWALLEducation
FT· 2 days ago

Robot-Proof — can the next generation keep a step ahead of the machines?

Vivienne Ming argues for a change in how we prepare the young for a near-future dominated and ‘deprofessionalised’ by AI

Editor's pick
indiatoday.in· Yesterday

78% Indian workers use AI tools to stay competitive, says ETS report - India Today

AI is rapidly becoming unavoidable at work, with 78% of Indian workers using it to stay competitive, while concerns about training costs and skill gaps continue to rise, as per the ETS Human Progress Report 2026.

Editor's pick
Tech Edition· 2 days ago

AI leaders warn of job disruption as younger workers resist workplace automation - Tech Edition

AI leaders warn of job losses as younger workers resist workplace automation and sabotage AI rollouts.

Editor's pickProfessional Services
Business Insider· 2 days ago

LinkedIn Enters AI Training Market, Challenges Startups - Business Insider

LinkedIn is launching an AI labor marketplace, offering up to $150 an hour for AI training, challenging startups like Mercor and Surge AI.

Editor's pick
@erikbryn· 2 days ago

Here's a terrific article today by @HarrietTorry about why GDP is growing but employment isn't.

Editor's pickTransportation & Logistics
Daily Brew· 3 days ago

TechCrunch Mobility: Who is poaching all the self-driving vehicle talent?

An investigation into the competitive landscape for hiring experts in the autonomous vehicle industry.

Editor's pickTechnology
Daily Brew· Yesterday

Linux kernel now allows AI-generated code

The Linux kernel project has updated its policy to permit AI-generated code, provided the contributor takes full responsibility for any bugs.

Editor's pickEducation
futurism.com· 2 days ago

Recent Grads Say AI Is Making It Impossible to Find a Job - Futurism

As debate rages over whether AI’s effects on the job market are real or illusory, one thing is clear: recent college grads are entering a labor force that has no room for them regardless. In a poll conducted by Gallup over the final three months of 2025, a whopping 72 percent of respondents said it was a “bad time” to find a quality job. From December of last year to March, the labor force participation rate fell…

Editor's pickTechnology
neal-davis.medium.com· 2 days ago

AI Is Replacing Tasks Not Tech Jobs by Neal Davis | Medium

There is a growing narrative in the tech industry that AI is replacing tech workers such as developers, DevOps engineers, IT support teams, and even data analysts. The reality is different from what the headlines suggest. AI is not replacing tech jobs. It is replacing specific, repetitive tasks within those jobs. And that difference matters if you want to build a future-proof career in cloud computing and AI. Across the industry, roles are not disappearing overnight. They are evolving.

AI Ethics & Safety16 articles
Editor's pick
Theregister· Yesterday

The votes are in: AI will hurt elections and relationships

Latest report from Stanford's AI boffins finds unsafe usage practices, widespread anxiety about impacts, and China catching up to the USA. Artificial intelligence has achieved mass adoption faster than the personal computer or the internet, reaching 53 percent of the population in just three years. The number of harmful AI incidents has increased correspondingly. And both experts and laypeople believe the impact will be felt in two areas: elections and relationships.

Editor's pick
Arxiv· Yesterday

Hidden Signals in Language: Inferring Sensitive Attributes from Reddit Comments Using Machine Learning

arXiv:2604.09627v1 Announce Type: new Abstract: Sensitive attributes are legally protected characteristics that should not be used to discriminate. Careful steps have been taken to minimize the risk of human bias regarding these fields, such as race and age. Large language models (LLMs) are similarly trained not to attempt to infer these aspects. However, just because they shouldn't, doesn't mean they don't. Using chat-like text fragments from authors tagged with sensitive attributes (e.g., MBTI personality, country of origin, gender), a model can often classify these attributes better than a naive guess, with results depending on the combination of subject matter and attribute. The text data from these comments is converted into numerical representations using embedding models, which are then used to train relatively simple classifiers such as logistic regression and decision trees. This study's results show that even these lightweight models can detect statistically significant signals associated with sensitive attributes in user-generated text. The results show that demographic traits such as gender and age are more readily predictable, whereas personality traits are expressed more subtly and depend more heavily on context. Predictive performance varies across online Reddit communities, with some subreddits consistently revealing attributes, while others show high variability depending on the trait being analyzed. These findings indicate that language contains latent identity signals that users may not intend to disclose but are nevertheless detectable through computational methods, and imply that more complex language models may have an inherent, greater capacity to infer sensitive attributes. This raises important concerns about privacy, bias, and the potential misuse of inferred personal information in AI systems. We call for increased transparency, stronger safeguards, and careful policy consideration for future LLMs.
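
The pipeline is deliberately simple: text goes into a vector representation, and a lightweight classifier is trained on labeled authors. The sketch below follows that shape, with TF-IDF standing in for the paper's embedding models and a tiny synthetic corpus; it shows the method's structure, not its findings.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Synthetic stand-ins for Reddit comments tagged with a sensitive attribute.
comments = [
    "honestly the new patch ruined my favorite build",
    "my grandkids set this phone up for me",
    "studying for finals while queueing ranked",
    "retirement has given me time to garden at last",
]
age_band = ["young", "older", "young", "older"]

# Vectorize the text, then fit a simple linear classifier on the labels.
clf = make_pipeline(TfidfVectorizer(), LogisticRegression())
clf.fit(comments, age_band)
print(clf.predict(["anyone else grinding this season's battle pass?"]))
```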

Editor's pick
Arxiv· Yesterday

Morally Programmed LLMs Reshape Human Morality

arXiv:2604.10222v1 Announce Type: new Abstract: As large language models (LLMs) increasingly participate in high-stakes decision-making, a central societal debate has revolved around which moral frameworks (deontological or utilitarian) should guide machine behavior. However, a largely overlooked question is whether the moral principles that humans encode in LLMs could, through repeated interactions, reshape human moral inclinations. We developed two LLMs programmed with either deontological principles (D-LLM) or utilitarian principles (U-LLM) and conducted two pre-registered experiments involving extensive human-LLM interactions, comprising 15,985 total exchanges across the two experiments. Results show that interacting with these morally programmed LLMs systematically shifted human moral inclinations to align with the principles embedded in these systems. These effects remained strong two weeks after the interaction, with only slight decay, suggesting deep internalization rather than superficial agreement. Further, LLM-induced shifts in human moral inclinations translated into meaningful changes in socio-political policy evaluations, shaping how individuals approach contentious social issues. Overall, these results demonstrate that morally programmed LLMs can shape, not merely reflect, human morality, revealing a critical design paradox: embedding moral principles in LLMs not only restricts their behavior but also poses the risk of shaping human morality, raising important ethical and policy questions about who determines which principles intelligent machines should adhere to.

Editor's pickGovernment & Public Sector
Daily Brew· 2 days ago

AI-Powered Cyberattack Exposes Millions of Mexican Records, Highlights Vulnerabilities in Government Security

A sophisticated AI-assisted cyberattack compromised nine Mexican government agencies, exfiltrating hundreds of millions of citizen records between December 2025 and February 2026.

Editor's pick
Arxiv· Yesterday

All Eyes on the Ranker: Participatory Auditing to Surface Blind Spots in Ranked Search Results

arXiv:2604.09946v1 Announce Type: new Abstract: Search engines that present users with a ranked list of search results are a fundamental technology for providing public access to information. Evaluations of such systems are typically conducted by domain experts and focus on model-centric metrics, relevance judgments, or output-based analyses, rather than on how accountability, harm, or trust are experienced by users. This paper argues that participatory auditing is essential for revealing users' causal and contextual understandings of how ranked search results produce impacts, particularly as ranking models appear increasingly convincing and sophisticated in their semantic interpretation of user queries. We report on three participatory auditing workshops (n=21) in which participants engaged with a custom search interface across four tasks, comparing a lexical ranker (BM25) and a neural semantic reranker (MonoT5), exploring varying levels of transparency and user controls, and examining an intentionally adversarially manipulated ranking. Reflexive activities prompted participants to articulate causal narratives linking search system properties to broader impacts. Synthesising the findings, we contribute a taxonomy of user-perceived impacts of ranked search results, spanning epistemic, representational, infrastructural, and downstream social impacts. However, interactions with the neural model revealed limits to participatory auditing itself: perceived system competence and accumulated trust reduced critical scrutiny during the workshop, allowing manipulations to go undetected. Participants expressed desire for visibility into the full search pipeline and recourse mechanisms. Together, these findings show how participatory auditing can surface user perceived impacts and accountability gaps that remain unseen when relying on conventional audits, while revealing where participatory auditing may encounter limitations.

Editor's pickTechnology
PR Newswire· 2 days ago

Securing AI becomes top priority as CIOs rank AI alongside malware, ransomware and phishing as major cyber risk

CIOs cite the combination of a cyber skills shortage and blind spots introduced by shadow AI as a significant risk to manage, and almost half wish AI...

Editor's pickTechnology
@emollick· 2 days ago

I am catching glimpses in my feed that there is a backlash against Mythos as "marketing hype," and it is a little confusing. I don't think anyone who has used the latest agentic coding tools, would think that expecting large-scale cybersecurity implications of increasingly good AI models is unbelievable, especially after reading the red team reports. It feels like a better place to start is to assume that there are new risks, and then we can all laugh at Anthropic and pat each other on the back if there are not. Also, while the AI labs certainly are impressed by their own accomplishments and benchmarks are flawed, I would note that both publicly and privately, Mythos seems to be taken seriously at a lot of large institutions and organizations filled with smart people who would rather not be worried about a new cybersecurity risk. Finally, I am not sure "our product is dangerous and we need to alert the government to that" is the sales pitch to the corporate world that critics seem to think it is.

Editor's pickProfessional Services
Top Daily Headlines: IT manager approved downtime over lunch, but made a meal of it· Yesterday

What happened when AI ran into the cold hard reality of the legal profession

Hallucinations don't fly in a court of law.

Editor's pickFinancial Services
Daily Brew· Yesterday

Major Security Flaws in LLM Routers Expose Cryptocurrency Wallets to Theft, Researchers Warn

Researchers unveiled vulnerabilities in LLM routers allowing cryptocurrency theft via malicious commands during a live Ethereum transfer.

Editor's pickHealthcare
Arxiv· Yesterday

Investigating Vaccine Buyer's Remorse: Post-Vaccination Decision Regret in COVID-19 Social Media Using Politically Diverse Human Annotation

arXiv:2604.09626v1 Announce Type: new Abstract: A significant gap exists in datasets regarding post-COVID-19 vaccination experiences, particularly “vaccine buyer’s remorse”. Understanding the prevalence and nature of vaccine regret, whether based on personal or vicarious experiences, is vital for addressing vaccine hesitancy and refining public health communication. In this paper, we curate a novel dataset from a large YouTube news corpus capturing COVID-19 vaccination experiences, and construct a benchmark subset focused on vaccine regret, annotated by a politically diverse panel to account for the subjective and often politicized nature of the topic. We utilize large language models (LLMs) to identify posts expressing vaccine regret, analyze the reasons behind this regret, and quantify its occurrence in both first and second-person accounts. This paper aims to (1) quantify the prevalence of vaccine regret; (2) identify common reasons for this sentiment; (3) analyze differences between first-person and vicarious experiences; and (4) assess potential biases introduced by different LLMs. We find that while vaccine buyer’s remorse appears in only <2% of public discourse, it is disproportionately concentrated in vaccine-skeptic influencer communities and is predominantly expressed through first-person narratives citing adverse health events.

Editor's pickPAYWALLTechnology
Washington Post· 2 days ago

Can AI be a ‘child of God’? Inside Anthropic’s meeting with Christian leaders.

Anthropic met with Christian leaders including from Catholic and Protestant churches to discuss its chatbot Claude’s moral development.

Editor's pickTechnology
impactalpha.com· Yesterday

Impact investors seek to assert human agency over the future of AI - ImpactAlpha

No rule or regulator required Anthropic to pause the release of its new Mythos model. The San Francisco-based AI company voluntarily limited its distribution after finding that Mythos is able to identify thousands of previously unknown flaws in major operating systems and browsers – and exploit them within hours to gain full control of corporate networks. As AI pioneer Yoshua Bengio is credited with observing, there’s more regulation of a sandwich in New York City than there is of the emergent power of artificial intelligence.

Editor's pickTechnology
Linkdood· 2 days ago

Why New Tech Giants Preach Ethics While Racing to Dominate the Future - Linkdood Technologies

Critics argue that the AI industry ... development, competition and deployment at breakneck speed. This perceived hypocrisy is not simply a matter of public relations—it reflects deeper structural pressures shaping the AI ecosystem, including competition, investment, geopolitics and the race ...

Editor's pickFinancial Services
Digital Dealer· 2 days ago

You Can’t AI Your Way Out of a TILA Violation: Why Logic and Expertise Still Rule Compliance | Digital Dealer

The current atmosphere in consumer finance can be summed up by a single phrase: if you aren’t talking about Artificial Intelligence, you aren’t in the…

Editor's pickPAYWALLTechnology
partners.wsj.com· 2 days ago

What Zero Trust Really Means When AI Changes the Rules

The strongest security strategies focus on necessity, limit permissions and reduce exposure before attackers can act. In an era where AI can generate unique, never-before-seen malware in seconds, many security leaders are rethinking whether traditional detection tools can still keep pace. In this interview, ThreatLocker CEO Danny Jenkins argues the answer is not just better detection, but stricter control. He explains why zero-trust and least-privilege principles matter more in the age of AI, and why limiting what software can run, access and connect to may offer a more durable path than trying to block every new exploit before it becomes a threat. Q: Many organizations still invest heavily in tools designed to identify malicious behavior after it starts.

Editor's pickPAYWALLProfessional Services
FT· 2 days ago

Bain & Co vulnerability exposed by hacker a month after McKinsey

CodeWall says it gained access to consultant’s Pyxis platform using a username and password from public web code

AI Policy & Regulation27 articles
Editor's pickTechnology
Artificial Intelligence Newsletter | April 13, 2026· 5 days ago

US tech industry faces uneasy status quo as supply chain litigation inches forward

Following a split in US courts over Anthropic's supply chain risk designation, the tech industry remains in an uneasy status quo while awaiting further decisions on the merits and potential appeals.

Editor's pickTechnology
Arxiv· Yesterday

Assessing Model-Agnostic XAI Methods against EU AI Act Explainability Requirements

arXiv:2604.09628v1 Announce Type: new Abstract: Explainable AI (XAI) has evolved in response to expectations and regulations, such as the EU AI Act, which introduces regulatory requirements on AI-powered systems. However, a persistent gap remains between existing XAI methods and society's legal requirements, leaving practitioners without clear guidance on how to approach compliance in the EU market. To bridge this gap, we study model-agnostic XAI methods and relate their interpretability features to the requirements of the AI Act. We then propose a qualitative-to-quantitative scoring framework: qualitative expert assessments of XAI properties are aggregated into a regulation-specific compliance score. This helps practitioners identify when XAI solutions may support legal explanation requirements while highlighting technical issues that require further research and regulatory clarification.
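
The qualitative-to-quantitative step can be as simple as mapping expert ratings to numbers and taking a weighted average per regulatory requirement. A hedged sketch with illustrative property names and weights (the paper's actual rubric is not reproduced):

```python
# Map qualitative expert assessments of XAI properties to numbers, then
# aggregate into a single compliance score for one regulatory requirement.
RATING = {"strong": 1.0, "partial": 0.5, "weak": 0.0}

def compliance_score(assessments: dict[str, str],
                     weights: dict[str, float]) -> float:
    total = sum(weights.values())
    return sum(RATING[assessments[p]] * w for p, w in weights.items()) / total

# Hypothetical expert assessment of one XAI method against a transparency
# requirement; property names and weights are illustrative.
assessment = {"fidelity": "partial", "stability": "weak",
              "comprehensibility": "strong"}
weights = {"fidelity": 0.5, "stability": 0.3, "comprehensibility": 0.2}
print(round(compliance_score(assessment, weights), 2))  # 0.45
```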

Editor's pickTechnology
sealion-lavender-ar2n.squarespace.com· 2 days ago

OpenAI expands global policy team under Chris Lehane | ETIH EdTech News — EdTech Innovation Hub

Senior hires and new initiatives signal a coordinated approach to AI governance, workforce engagement, and education input. OpenAI has expanded its global affairs team, bringing in senior hires from across the tech sector to support what it describes as a “National Industrial Policy” for artificial intelligence. Details of the hiring strategy were shared in a LinkedIn post by Chris Lehane, Chief Global Affairs Officer at OpenAI.

Editor's pickGovernment & Public Sector
cxodigitalpulse.com· Yesterday

Tom Duff Gordon takes charge of OpenAI’s policy efforts across EMEA - CXO Digitalpulse

OpenAI has appointed Tom Duff Gordon, formerly Vice President of International Policy at Coinbase, as its new Head of Policy for Europe, the Middle East, and Africa. The move reflects OpenAI’s growing focus on navigating complex regulatory environments as artificial intelligence adoption accelerates globally. Gordon spent nearly four years at Coinbase, where he led international policy efforts…

Editor's pick
Pretoria News· 2 days ago

AI war between Elon Musk and Sam Altman matters for SA AI Policy

The case, then, is not simply about what was promised, but about what is possible. Can a technology as powerful as AI remain open in a world defined by competition and geopolitical rivalry? Or does its very importance necessitate control? The answer will reverberate far beyond the courtroom.

Editor's pickGovernment & Public Sector
Artificial Intelligence Newsletter | April 14, 2026· 2 days ago

States help buoy US’s ranking in global AI policy index

The US sits in the middle of the pack in a prominent annual report on AI policy, with state-level lawmaking potentially offsetting a federal shift toward deregulation.

Editor's pickDefense & National Security
Bloomberg Law· 2 days ago

Cyberattacks, Tariffs, Geopolitics Loom Over Business Executives

Nearly 70% of business executives ... a moderate or serious risk to their company. ... Geopolitical uncertainty, rapid policy shifts, and mounting security risks are ratcheting up the threats facing US companies. Company executives say they’ve doubled down on AI and tech ...

Editor's pickDefense & National Security
Foreign Policy· 2 days ago

How the Pentagon Can Manage the Risks of AI Warfare

If warfighters don’t trust the technology, they won’t use it.

Editor's pickPAYWALLGovernment & Public Sector
washingtonpost.com· Yesterday

AI & Tech Brief: Radical activists and AI safety - The Washington Post

- AI safety advocates and industry groups are casting blame on each other as the nation reckons with bouts of violence related to AI and data centers. - While Congress remains preoccupied, the fight over AI regulation is spilling out into states including Colorado and Illinois. (Plus, Anthropic hires Ballard Partners to lobby.) - Should the outputs of AI chatbots be treated like free speech? Court decisions will determine if AI companies are shielded from liability associated with the outputs of their models.

Editor's pickFinancial Services
Siliconrepublic· 2 days ago

Mythos testing begins as governments raise cyber concerns

US, UK and Canadian authorities are raising concerns over the model's abilities.

Editor's pickGovernment & Public Sector
hai.stanford.edu· 2 days ago

Policy and Governance | The 2026 AI Index Report | Stanford HAI

Editor's pickTechnology
Artificial Intelligence Newsletter | April 13, 2026· 5 days ago

US 3rd Circuit fair-use opinion could benefit AI companies

A recent fair-use decision in the US Court of Appeals for the Third Circuit could provide a pathway for AI companies to defend against copyright infringement lawsuits, though application may be complex.

Editor's pickTechnology
Artificial Intelligence Newsletter | April 13, 2026· 5 days ago

Musk's legal fight against US state AI laws grows as x.AI targets Colorado

Elon Musk is challenging the Colorado Artificial Intelligence Act, adding to his ongoing legal battles against state-level AI regulations in California and Minnesota.

Editor's pickTechnology
Artificial Intelligence Newsletter | April 14, 2026· 2 days ago

Tech companies request more time to implement EU's AI rulebook

Industry associations are urging EU policymakers to extend the grace period for generative AI labeling from six months to 12 months, citing risks of legal uncertainty and delays.

Editor's pickFinancial Services
Daily Brew· Yesterday

AI in Finance: Balancing Innovation with Fairness and Accountability, Warns RBI Deputy Governor

AI's role in finance could revolutionize efficiency and customer service, but poses risks that demand robust regulation, warns RBI Deputy Governor Swaminathan.

Editor's pick
KQED· 2 days ago

Stanford Study: AI Experts Are Optimistic About AI. The Rest of Us … Not So Much | KQED

The Stanford Institute for Human-Centered AI released its 2026 AI Index report, which reveals that the technology is advancing faster than society’s ability to understand, govern, or trust it.

Editor's pickFinancial Services
Unite.AI· 2 days ago

Beyond Retention: Why AI Governance in 2026 Is a Defensibility Problem – Unite.AI

Picture a regulated financial institution receiving a regulatory inquiry in early 2027. The regulator isn’t just asking whether the firm kept its records. Instead, the questions are more specific and considerably harder to answer: What did the AI system do?

Editor's pickGovernment & Public Sector
Artificial Intelligence Newsletter | April 14, 2026· 2 days ago

Challenges linked to agentic AI outpace existing oversight, say UK regulators

The Digital Regulation Cooperation Forum stated that autonomous systems are blurring lines of liability and control, prompting calls for cross-regulator coordination and outcome-based rules.

Editor's pickFinancial Services
artificialintelligence-news.com· 2 days ago

Companies expand AI adoption while keeping control - AI News

Many companies are taking a slower, more controlled approach to autonomous systems as AI adoption grows. Rather than deploying systems that act on their own, they are focusing on tools that assist human decision-making and keep control over outputs. This approach is especially clear in sectors where errors carry real financial or legal risk. One example comes from S&P Global Market Intelligence…

Editor's pickGovernment & Public Sector
thetimes-tribune.com· 2 days ago

POINT: Congress must embrace sensible federal guidelines

“The main thing is to keep the main thing the main thing,” famously said Stephen Covey, the renowned organizational consultant. With AI legislation, what matters most is common sense. That means first not killing, or stagnating, the benefits of AI with 50 states’ cumbersome and contradictory laws regarding AI model development and how it runs. Accompanying this should be related sensible federal guidelines.

Editor's pick
@emollick· 2 days ago

General Purpose Technologies have downstream effects everywhere, good and bad. Those outcomes can be mitigated, or encouraged, with the right sorts of policy choices If the only options are being for or against a GPT, you end up having very few levers to pull to make it work out.

Editor's pickTechnology
Artificial Intelligence Newsletter | April 13, 2026· 5 days ago

Singapore heat on TikTok, X shows online safety scrutiny going global

Enforcement action against TikTok and X by a Singapore regulator over failures to curb harmful content adds to global efforts to confront similar risks. The UK, EU, and US are also increasing pressure on platform moderation systems.

Editor's pickMedia & Entertainment
Artificial Intelligence Newsletter | April 14, 2026· 2 days ago

AI companies lean on Cox opinion in seeking dismissal of US copyright suits

Singapore AI firm Nanonoble argued that the Walt Disney Company’s contributory copyright infringement claims against it should be dismissed following the Supreme Court’s decision in Cox Communications v. Sony Music Entertainment.

Editor's pickGovernment & Public Sector
Substack· 2 days ago

Congress should require analytic integrity and AI literacy across government intelligence

Artificial intelligence raises the stakes. Generative tools can draft quickly, summarize broadly, and sound authoritative even when they are wrong. They can hallucinate facts, flatten nuance, and conceal analytical weakness behind polished language.

Editor's pickGovernment & Public Sector
hansard.parliament.uk· 2 days ago

Artificial Intelligence: Impact on Employment - Hansard

Debated on Monday 13 April 2026. Question: To ask His Majesty’s Government what assessment they have made of the impact of developments in artificial intelligence on current levels of employment.

Editor's pick
@emollick· 2 days ago

The trend of treating all of AI as One Big Thing that always includes data centers & job changes & education changes & power & accelerating science & misinformation & national security & corporate control & healthcare is going to inevitably lead to some bad policy on all sides.

Editor's pickGovernment & Public Sector
Artificial Intelligence Newsletter | April 14, 2026· 2 days ago

Sweden's privacy watchdog to see budget increase for AI Act duties

The Swedish government proposed a SEK 1 million budget increase for its Data Protection Authority to support new market surveillance duties under the EU's AI Act.

AI Skills & Education11 articles
Editor's pickEducation
Arxiv· Yesterday

Assessing the Pedagogical Readiness of Large Language Models as AI Tutors in Low-Resource Contexts: A Case Study of Nepal's K-10 Curriculum

arXiv:2604.09619v1 Announce Type: new Abstract: The integration of Large Language Models (LLMs) into educational ecosystems promises to democratize access to personalized tutoring, yet the readiness of these systems for deployment in non-Western, low-resource contexts remains critically under-examined. This study presents a systematic evaluation of four state-of-the-art LLMs (GPT-4o, Claude Sonnet 4, Qwen3-235B, and Kimi K2), assessing their capacity to function as AI tutors within the specific curricular and cultural framework of Nepal's Grade 5-10 Science and Mathematics education. We introduce a novel, curriculum-aligned benchmark and a fine-grained evaluation framework inspired by the "natural language unit tests" paradigm, decomposing pedagogical efficacy into seven binary metrics: Prompt Alignment, Factual Correctness, Clarity, Contextual Relevance, Engagement, Harmful Content Avoidance, and Solution Accuracy. Our results reveal a stark "curriculum-alignment gap." While frontier models (GPT-4o, Claude Sonnet 4) achieve high aggregate reliability (approximately 97%), significant deficiencies persist in pedagogical clarity and cultural contextualization. We identify two pervasive failure modes: the "Expert's Curse," where models solve complex problems but fail to explain them clearly to novices, and the "Foundational Fallacy," where performance paradoxically degrades on simpler, lower-grade material due to an inability to adapt to younger learners' cognitive constraints. Furthermore, regional models like Kimi K2 exhibit a "Contextual Blindspot," failing to provide culturally relevant examples in over 20% of interactions. These findings suggest that off-the-shelf LLMs are not yet ready for autonomous deployment in Nepalese classrooms. We propose a "human-in-the-loop" deployment strategy and offer a methodological blueprint for curriculum-specific fine-tuning to align global AI capabilities with local educational needs.
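
The "natural language unit tests" paradigm reduces each tutoring response to seven pass/fail checks, which makes the aggregate reliability figure straightforward to compute. A minimal sketch using the metric names from the abstract; the grading itself is stubbed out:

```python
METRICS = ["prompt_alignment", "factual_correctness", "clarity",
           "contextual_relevance", "engagement",
           "harmful_content_avoidance", "solution_accuracy"]

def aggregate_reliability(graded: list[dict[str, bool]]) -> float:
    """Fraction of (response, metric) checks that pass."""
    checks = [response[m] for response in graded for m in METRICS]
    return sum(checks) / len(checks)

# Two hypothetical graded responses; the second exhibits the "Expert's
# Curse": the answer is correct but the explanation fails the clarity check.
graded = [
    {m: True for m in METRICS},
    {**{m: True for m in METRICS}, "clarity": False},
]
print(f"{aggregate_reliability(graded):.0%}")  # 93%
```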

Editor's pickEducation
Arxiv· Yesterday

Explainability and Certification of AI-Generated Educational Assessments

arXiv:2604.09622v1 Announce Type: new Abstract: The rapid adoption of generative artificial intelligence (AI) in educational assessment has created new opportunities for scalable item creation, personalized feedback, and efficient formative evaluation. However, despite advances in taxonomy alignment and automated question generation, the absence of transparent, explainable, and certifiable mechanisms limits institutional and accreditation-level acceptance. This chapter proposes a comprehensive framework for explainability and certification of AI-generated assessment items, combining self-rationalization, attribution-based analysis, and post-hoc verification to produce interpretable cognitive-alignment evidence grounded in Bloom's and SOLO taxonomies. A structured certification metadata schema is introduced to capture provenance, alignment predictions, reviewer actions, and ethical indicators, enabling audit-ready documentation consistent with emerging governance requirements. A traffic-light certification workflow operationalizes these signals by distinguishing auto-certifiable items from those requiring human review or rejection. A proof-of-concept study on 500 AI-generated computer science questions demonstrates the framework's feasibility, showing improved transparency, reduced instructor workload, and enhanced auditability. The chapter concludes by outlining ethical implications, policy considerations, and directions for future research, positioning explainability and certification as essential components of trustworthy, accreditation-ready AI assessment systems.
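
The traffic-light workflow is, at bottom, a routing function over the certification metadata. A hedged sketch with assumed field names and thresholds, not the chapter's exact schema:

```python
# Route an AI-generated assessment item based on its certification metadata:
# green = auto-certifiable, amber = human review, red = reject/escalate.
def certify(item: dict) -> str:
    if item.get("ethical_flag"):
        return "red"
    confident = item.get("alignment_confidence", 0.0) >= 0.9
    if confident and item.get("provenance_complete"):
        return "green"
    return "amber"

# Hypothetical metadata record for one generated question.
item = {
    "bloom_level": "analyze",          # predicted taxonomy alignment
    "alignment_confidence": 0.93,
    "provenance_complete": True,       # model, prompt, and reviewer trail logged
    "ethical_flag": False,
}
print(certify(item))  # green
```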

Editor's pickEducation
Fortune· Yesterday

Like Elon Musk, he was coding at 12 and became one of Google’s youngest ever CMOs—but now says Gen Z are better off ice skating than learning to code

Like Elon Musk and Mark Zuckerberg, Alon Chen coded his way to Google CMO and a seven-figure exit. Now he says AI has made the skill 'obsolete' for Gen Z

Editor's pickManufacturing & Industrials
aaobx.org· Yesterday

Google Invests $10M to Train 40,000 Manufacturing Workers in AI Skills (2026)

Personally, I think Google’s latest move is more than a funding gimmick. It’s a deliberate bet on a future where the factory floor isn’t a relic but a rapidly evolving, AI-augmented workspace. The core idea is simple on the surface: teach 40,000 manufacturing workers AI literacy and connect them to deeper apprenticeship tracks across the country. What makes it interesting is what this implies about who owns the future of production—and who pays for the education required to participate in it.

Editor's pickEducation
PR Newswire· 2 days ago

New Pearson and AWS Global Research: 53% of Employers Struggle to Find AI-Ready Graduates

/PRNewswire/ -- Pearson (FTSE: PSON.L), the world's lifelong learning company, and Amazon Web Services, Inc. (AWS), an Amazon.com, Inc. company (NASDAQ: AMZN),...

Editor's pickEducation
hr-brew.com· 2 days ago

AI readiness gap is slowing productivity gains - HR Brew

Imagine if NASA built a rocket ship, laid out a plan to go to the moon, but didn’t train the astronauts on how to crew the new vessel. NASA would never. In fact, the Artemis II crew trained for three years in a replica Orion capsule practicing real-world (pun intended) scenarios in order to master the tech they’d be using away from Earth. The crew sometimes spent more than 30 hours at a time training ahead of its mission, which safely splashed down in the Pacific Ocean Friday. Compare that to your department’s last AI training session. A gap in AI readiness is impacting AI’s potential gains across industries, according to a new report from study.com…

Editor's pickManufacturing & Industrials
themanufacturinginstitute.org· 2 days ago

AI Skills Training | The Manufacturing Institute

Manufacturers know better than anyone how quickly technology is changing. Artificial intelligence is rapidly becoming a larger part of modern manufacturing—from machine learning that improves predictive maintenance, to tools that optimize production and quality in real time. While AI’s impact is unfolding, waiting isn’t an option. In fact, a key barrier to faster adoption is preparing manufacturing’s frontline workforce with the skills needed to succeed in an AI-enabled future. After all, manufacturing in America is human-led. Without these skills, manufacturers risk falling behind, while those who embrace AI will strengthen overall competitiveness. But manufacturers don’t have to figure it out alone. The Manufacturing Institute is stepping forward to ensure workforce preparation keeps pace.

Editor's pickEducation
buildcognitiveresonance.substack.com· 2 days ago

An illustrated guide to resisting "AI is inevitable" in education

Enough, enough, enough. 1. Ask the AI-in-education enthusiast to clarify their premise. (Slide created by Jane Rosenzweig, Director of Harvard College Writing Center.) 2. Ask the AI-in-education enthusiast if they are familiar with recent research indicating that generative AI leads to widespread “cognitive surrender”: “Conceptually distinct from cognitive offloading, which involves strategically outsourcing a discrete task to an external tool (e.g., using a calculator), cognitive surrender represents a deeper abdication of critical evaluation, where the user relinquishes cognitive control and adopts the AI’s judgment as their own…”

Editor's pickEducation
Arxiv· Yesterday

From Understanding to Creation: A Prerequisite-Free AI Literacy Course with Technical Depth Across Majors

arXiv:2604.09634v1 Announce Type: new Abstract: Most AI literacy courses for non-technical undergraduates emphasize conceptual breadth over technical depth. This paper describes UNIV 182, a prerequisite-free course at George Mason University that teaches undergraduates across majors to understand, use, evaluate, and build AI systems. The course is organized around five mechanisms: (1) a unifying conceptual pipeline (problem definition, data, model selection, evaluation, reflection) traversed repeatedly at increasing sophistication; (2) concurrent integration of ethical reasoning with the technical progression; (3) AI Studios, structured in-class work sessions with documentation protocols and real-time critique; (4) a cumulative assessment portfolio in which each assignment builds competencies required by the next, culminating in a co-authored field experiment on chatbot reasoning and a final project in which teams build AI-enabled artifacts and defend them before external evaluators; and (5) a custom AI agent providing structured reinforcement outside class. The paper situates this design within a comparative taxonomy of cross-major AI literacy courses and pedagogical traditions. Instructor-coded analysis of student artifacts at four assessment stages documents a progression from descriptive, intuition-based reasoning to technically grounded design with integrated safeguards, reaching the Create level of Bloom's revised taxonomy. To support adoption, the paper identifies which mechanisms are separable, which require institutional infrastructure, and how the design adapts to settings ranging from general AI literacy to discipline-embedded offerings. The course is offered as a documented resource, demonstrating that technical depth and broad accessibility can coexist when scaffolding supports both.

Editor's pickEducation
Arxiv· Yesterday

Leveraging Machine Learning Techniques to Investigate Media and Information Literacy Competence in Tackling Disinformation

arXiv:2604.09635v1 Announce Type: new Abstract: This study develops machine learning models to assess Media and Information Literacy (MIL) skills specifically in the context of disinformation among students, particularly future educators and communicators. While the digital revolution has expanded access to information, it has also amplified the spread of false and misleading content, making MIL essential for fostering critical thinking and responsible media engagement. Despite its relevance, predictive modeling of MIL in relation to disinformation remains underexplored. To address this gap, a quantitative study was conducted with 723 students in education and communication programs using a validated survey. Classification and regression algorithms were applied to predict MIL competencies and identify key influencing factors. Results show that complex models outperform simpler approaches, with variables such as academic year and prior training significantly improving prediction accuracy. These findings can inform the design of targeted educational interventions and personalized strategies to enhance students' ability to critically navigate and respond to disinformation in digital environments.

Editor's pickGovernment & Public Sector
Policy Options· 2 days ago

The missing work of co-ordination in Canada’s skills ecosystem

Filling Canada’s skills gap requires better co-ordination, stronger intermediaries and long-term workforce planning.

Technology & Infrastructure

39 articles
AI Hardware4 articles
AI Infrastructure & Compute22 articles
Editor's pickPAYWALLManufacturing & Industrials
FT· 2 days ago

The AI build-out is powering global goods trade

Data centre boom is helping to mask the impact of Trump tariffs on US and world economy

Editor's pickTechnology
DATAQUEST· 2 days ago

AI compute demand drives 44% YoY growth for top 10 global fabless IC firms in 2025

Nvidia-Marvell will focus on customized XPUs, scale-up interconnect architectures based on NVLink Fusion, and optical interconnect and silicon photonics technologies

Editor's pickPAYWALLEnergy & Utilities
WSJ· Yesterday

Utilities Plan to Spend $1.4 Trillion Over Next Five Years to Power AI Boom

Soaring spending plans by 51 investor-owned utilities will help patch up the aging power grid and meet rising electricity demand for AI, a new report finds.

Editor's pickTechnology
Fortune· Yesterday

Exclusive: Chad Rigetti’s Sygaldry raises $139 million to bring quantum hardware to AI data centers

Sygaldry is the company Chad Rigetti cofounded in 2024 after leaving Rigetti Computing.

Editor's pickTechnology
Daily Brew· 2 days ago

Your developers are already running AI locally: Why on-device inference is the CISO’s new blind spot

An analysis of the security risks associated with developers running AI models locally on their own devices.

Editor's pickManufacturing & Industrials
Arxiv· Yesterday

AHC: Meta-Learned Adaptive Compression for Continual Object Detection on Memory-Constrained Microcontrollers

arXiv:2604.09576v1 Announce Type: new Abstract: Deploying continual object detection on microcontrollers (MCUs) with under 100KB memory requires efficient feature compression that can adapt to evolving task distributions. Existing approaches rely on fixed compression strategies (e.g., FiLM conditioning) that cannot adapt to heterogeneous task characteristics, leading to suboptimal memory utilization and catastrophic forgetting. We introduce Adaptive Hierarchical Compression (AHC), a meta-learning framework featuring three key innovations: (1) true MAML-based compression that adapts via gradient descent to each new task in just 5 inner-loop steps, (2) hierarchical multi-scale compression with scale-aware ratios (8:1 for P3, 6.4:1 for P4, 4:1 for P5) matching FPN redundancy patterns, and (3) a dual-memory architecture combining short-term and long-term banks with importance-based consolidation under a hard 100KB budget. We provide formal theoretical guarantees bounding catastrophic forgetting as $O(\epsilon\sqrt{T} + 1/\sqrt{M})$, where $\epsilon$ is the compression error, $T$ is the task count, and $M$ is the memory size. Experiments on CORe50, TiROD, and PASCAL VOC benchmarks with three standard baselines (Fine-tuning, EWC, iCaRL) demonstrate that AHC enables practical continual detection within a 100KB replay budget, achieving competitive accuracy through mean-pooled compressed feature replay combined with EWC regularization and feature distillation.
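
A minimal sketch of scale-aware compression at the stated ratios, assuming simple binned mean-pooling; AHC's actual learned compressor and meta-learned adaptation are not reproduced here.

```python
import numpy as np

# Target compression ratios per FPN level, as stated in the abstract.
RATIOS = {"P3": 8.0, "P4": 6.4, "P5": 4.0}

def compress(feat: np.ndarray, ratio: float) -> np.ndarray:
    """Mean-pool a flattened feature map into len(feat)/ratio bins
    (handles fractional ratios like 6.4:1 via uneven bin edges)."""
    flat = feat.ravel()
    n_out = int(flat.size / ratio)
    edges = np.linspace(0, flat.size, n_out + 1).astype(int)   # bin boundaries
    return np.add.reduceat(flat, edges[:-1]) / np.diff(edges)  # per-bin means

p4 = np.random.rand(256, 20, 20)            # a fake P4 feature map
print(compress(p4, RATIOS["P4"]).size)      # 16000 values, ~6.4x smaller
```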

Editor's pickTechnology
DataCenterKnowledge· 2 days ago

AMD: Memory, Not Compute, the Next Bottleneck in AI Data Centers

Discover how AMD is redefining memory as a key factor in AI data center performance, emphasizing workload-specific architectures like LPDDR5X.

Editor's pickTechnology
digitimes.com· Yesterday

AI infrastructure boom triggers NAND shortage, memory profits seen rising 2–3x, Silicon Motion says

Silicon Motion's president, Wallace Kou, said the memory industry is undergoing an irreversible structural shift driven by AI infrastructure growth. Since August 2025, NAND flash prices have surged 4–10x, and module makers' profits are expected… (the full article requires a paid subscription).

Editor's pickTechnology
apps.digitimes.com· 2 days ago

Weekly news roundup: Shortages spread to MLCCs; SK Hynix reportedly in talks with Microsoft and Google

Below are the most-read DIGITIMES Asia stories from the week of April 6–April 13, 2026. Demand for AI infrastructure spreads shortages beyond memory chips to MLCCs: rising global investment in AI infrastructure is beginning to strain supplies of multi-layer ceramic capacitors (MLCCs), with industry data showing lengthening lead times across most major suppliers in early 2026. The tightening supply is largely driven by surging demand from AI servers and automotive electronics, particularly for high-performance components, while manufacturers are already operating near full capacity…

Editor's pickTechnology
securitybrief.news· 2 days ago

Cloudflare expands Agent Cloud for AI software agents

Cloudflare has expanded its Agent Cloud with new tools for developers building and running AI agents, adding compute, storage, sandboxing and model access for long-running workloads. The update targets software agents that can write code, use tools and complete multi-step tasks, rather than simply answer prompts in chatbot form. It includes Dynamic Workers, Artifacts, the general availability of Sandboxes, a new framework called Think, and wider model access, including OpenAI's GPT-5.4 and Codex. Cloudflare is presenting the release as part of a shift in software development, as more devel…

Editor's pickTechnology
DIGITIMES· 2 days ago

AI server tracker: Taiwanese thermal solution providers entering structural AI growth phase as liquid cooling adoption accelerates

Taiwan's thermal management suppliers are emerging as one of the fastest-growing segments in the AI hardware ecosystem in 2026, even though their absolute revenue scale remains far below that of semiconductor leaders such as TSMC and large AI server ODMs like Quanta Computer and Foxconn.

Editor's pick
@emollick· 2 days ago

Six months ago, there was a lot of focus on the idea that there would be a massive glut of unused computing power, which could cause a recession as AI use plateaued. The "compute bubble" belief was absolutely everywhere. The degree to which this was wrong deserves some notice.

Editor's pickTechnology
Artificial Intelligence Newsletter | April 14, 2026· 2 days ago

EU’s AI ‘gigafactories’ project plagued by tech and budgetary bottlenecks

The EU's plan to build AI “gigafactories” is facing significant hurdles, with funding constraints and limited access to advanced technology slowing progress.

Editor's pickTechnology
lasvegassun.com· 2 days ago

SPAN Announces XFRA, a Distributed Data Center Solution to Close the Speed-to-Power Gap for AI Compute Demand - Las Vegas Sun News

Today SPAN announced the launch of XFRA, a distributed data center solution designed to deliver gigawatts of new compute capacity amidst today's growing power infrastructure constraints. Comprising a distributed network of compute nodes located in residential and small commercial spaces, XFRA enables both the immediate and future compute needs of hyperscalers, neoscalers and AI cloud providers. Initial launch partners include NVIDIA, the world leader in AI computing. This first-of-a-kind solution will launch with enterprise-grade, liquid-cooled NVIDIA RTX PRO™ 6000 Blackwell Server Edition…

Editor's pickEnergy & Utilities
Government Technology· 2 days ago

Data Center Boom Alarms Arizona Energy Panel

A new state report warns that the ongoing boom in data centers and other "large-load" customers in Arizona poses a challenge to grid reliability and the cost of electricity.

Editor's pickTechnology
✨ Rule of three· 2 days ago

Agentic AI is driving a new compute reality

The explosion in AI agent activity is increasing demand for CPUs to handle coordination, memory movement, and system-level intelligence.

Editor's pickGovernment & Public Sector
Siliconrepublic· Yesterday

Ireland to invest €17m in leading facilities for AI, medtech and more

The new facilities will serve projects in advanced materials, semiconductors and chips, high-speed communications, medical devices, geoengineering, and AI and high-performance computing.

Editor's pickTechnology
inelectronics.co.uk· 2 days ago

Navitas cuts AI rack power stages with 800V board | IN Electronics & Design

Navitas has unveiled direct 800V-to-6V power delivery for AI racks. Its new GaN-based board removes the 48V stage and targets higher efficiency, density, and board-level integration. In brief: the 800V-to-6V DC-DC power delivery board removes the traditional 48V intermediate bus stage in AI server racks; the design targets 96.5% peak efficiency, 1MHz switching, and 2,100W/in³ power density using GaN on the primary side and an ultra-low-profile layout. The launch reflects a broader shift towards 800VDC architectures as rack power rises, conversion losses become more visible, and board area becomes part of the compute equation.

Editor's pickTechnology
thedatacenterengineer.com· 2 days ago

CXL-based KV cache server adds up to 11 TB memory for AI inference - The Data Center Engineer

Penguin Solutions has announced what it calls the industry's first "production-ready" CXL-based KV cache server, the Penguin Solutions MemoryAI KV cache server, aimed at reducing inference bottlenecks tied to memory capacity and latency. The company positions the system as a memory appliance for enterprise-scale AI inference workloads, including agentic AI, where KV cache capacity can constrain time-to-first-token (TTFT), throughput, and SLA performance…

Editor's pickEnergy & Utilities
Inside Climate News· 2 days ago

Data Center Boom Reaches West Georgia, Raising Questions Amid Mounting Opposition - Inside Climate News

A proposed data center campus in Muscogee County has become a flashpoint in Georgia’s expanding AI infrastructure boom. Residents say development is beginning to outpace public understanding—and some fear the land itself may bear the cost.

Editor's pick
@emollick· 2 days ago

(That doesn't mean that there are no risks associated with the financing methods used to build data centers, of course, but the driving argument was that data center capacity was being overbuilt)

Editor's pickTechnology
Theregister· Yesterday

Microsoft sends Outlook Lite to the great inbox in the sky as memory costs skyrocket

Mailbox access in the stripped-down Android app ends on May 25. Having blocked new installations of Outlook Lite in October 2025, Microsoft will "complete the retirement" of the app on May 25.

AI Models & Capabilities · 10 articles
Editor's pickPAYWALLTechnology
Bloomberg· Yesterday

Alibaba Just Took the Crown of Video Generation

Released in stealth last week, the Happy Horse AI video generator now ranks at the top of benchmarks.

Editor's pickTechnology
Arxiv· Yesterday

Instructing LLMs to Negotiate using Reinforcement Learning with Verifiable Rewards

arXiv:2604.09855v1 Announce Type: cross Abstract: The recent advancement of Large Language Models (LLMs) has established their potential as autonomous interactive agents. However, they often struggle in strategic games of incomplete information, such as bilateral price negotiation. In this paper, we investigate if Reinforcement Learning from Verifiable Rewards (RLVR) can effectively teach LLMs to negotiate. Specifically, we explore the strategic behaviors that emerge during the learning process. We introduce a framework that trains a mid-sized buyer agent against a regulated LLM seller across a wide distribution of real-world products. By grounding reward signals directly in the maximization of economic surplus and strict adherence to private budget constraints, we reveal a novel four-phase strategic evolution. The agent progresses from naive bargaining to using aggressive starting prices, moves through a phase of deadlock, and ultimately develops sophisticated persuasive skills. Our results demonstrate that this verifiable training allows a 30B agent to significantly outperform frontier models over ten times its size in extracting surplus. Furthermore, the trained agent generalizes robustly to stronger counterparties unseen during training and remains effective even when facing hostile, adversarial seller personas.
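
A minimal sketch of the kind of verifiable reward the paper describes, grounding the signal in economic surplus and the buyer's private budget; the exact shaping and penalty values below are assumptions, not the authors' implementation.

```python
def negotiation_reward(deal_price, budget, list_price):
    """Verifiable reward for a buyer agent: normalized surplus if the deal
    respects the private budget, a hard penalty if it violates it."""
    if deal_price is None:                    # no deal reached
        return 0.0
    if deal_price > budget:                   # budget constraint violated
        return -1.0
    return (budget - deal_price) / budget     # economic surplus, in [0, 1]

print(negotiation_reward(70.0, budget=100.0, list_price=120.0))  # 0.3
```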

Editor's pickPharma & Biotech
Arxiv· Yesterday

LABBench2: An Improved Benchmark for AI Systems Performing Biology Research

arXiv:2604.09554v1 Announce Type: new Abstract: Optimism for accelerating scientific discovery with AI continues to grow. Current applications of AI in scientific research range from training dedicated foundation models on scientific data to agentic autonomous hypothesis generation systems to AI-driven autonomous labs. The need to measure progress of AI systems in scientific domains must correspondingly not only accelerate, but increasingly shift focus to more real-world capabilities: beyond rote knowledge, and even reasoning, to actually measuring the ability to perform meaningful work. Prior work introduced the Language Agent Biology Benchmark (LAB-Bench) as an initial attempt at measuring these abilities. Here we introduce an evolution of that benchmark, LABBench2, for measuring real-world capabilities of AI systems performing useful scientific tasks. LABBench2 comprises nearly 1,900 tasks and is, for the most part, a continuation of LAB-Bench, measuring similar capabilities but in more realistic contexts. We evaluate performance of current frontier models, and show that while abilities measured by LAB-Bench and LABBench2 have improved substantially, LABBench2 provides a meaningful jump in difficulty (model-specific accuracy differences range from -26% to -46% across subtasks) and underscores continued room for performance improvement. LABBench2 continues the legacy of LAB-Bench as a de facto benchmark for AI scientific research capabilities and we hope that it continues to help advance development of AI tools for these core research functions. To facilitate community use and development, we provide the task dataset at https://huggingface.co/datasets/futurehouse/labbench2 and a public eval harness at https://github.com/EdisonScientific/labbench2.

Editor's pickTechnology
Daily Brew· Yesterday

Anthropic has been nerfing models, according to BridgeBench

Users and researchers report that Anthropic may be intentionally reducing the performance of its models, potentially as a marketing strategy.

Editor's pickTechnology
Arxiv· Yesterday

Seven simple steps for log analysis in AI systems

arXiv:2604.09563v1 Announce Type: new Abstract: AI systems produce large volumes of logs as they interact with tools and users. Analysing these logs can help understand model capabilities, propensities, and behaviours, or assess whether an evaluation worked as intended. Researchers have started developing methods for log analysis, but a standardised approach is still missing. Here we suggest a pipeline based on current best practices. We illustrate it with concrete code examples in the Inspect Scout library, provide detailed guidance on each step, and highlight common pitfalls. Our framework provides researchers with a foundation for rigorous and reproducible log analysis.
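
A generic sketch of the early pipeline steps (load, sanity-check, aggregate); the paper's examples use the Inspect Scout library, whose API is not reproduced here, and the field names below (run_id, model, passed) are assumptions about a JSONL log export.

```python
import json
import pandas as pd

# Load transcript logs into a flat table for analysis.
with open("logs.jsonl") as f:
    df = pd.DataFrame(json.loads(line) for line in f)

# Step: confirm the evaluation ran as intended before analysing behaviour.
assert df["run_id"].nunique() == 1, "log file mixes multiple runs"

# Step: aggregate a capability signal per model, with sample counts.
summary = df.groupby("model")["passed"].agg(["mean", "count"])
print(summary.sort_values("mean", ascending=False))
```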

Editor's pickTechnology
Arxiv· Yesterday

Persistent Identity in AI Agents: A Multi-Anchor Architecture for Resilient Memory and Continuity

arXiv:2604.09588v1 Announce Type: new Abstract: Modern AI agents suffer from a fundamental identity problem: when context windows overflow and conversation histories are summarized, agents experience catastrophic forgetting -- losing not just information, but continuity of self. This technical limitation reflects a deeper architectural flaw: AI agent identity is centralized in a single memory store, creating a single point of failure. Drawing on neurological case studies of human memory disorders, we observe that human identity survives damage because it is distributed across multiple systems: episodic memory, procedural memory, emotional continuity, and embodied knowledge. We present soul.py, an open-source architecture that implements persistent identity through separable components (identity files and memory logs), and propose extensions toward multi-anchor resilience. The framework introduces a hybrid RAG+RLM retrieval system that automatically routes queries to appropriate memory access patterns, achieving efficient retrieval without sacrificing comprehensiveness. We formalize the notion of identity anchors for AI systems and present a roadmap for building agents whose identity can survive partial memory failures. Code is available at github.com/menonpg/soul.py
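
A toy split of stable identity versus episodic memory, loosely inspired by the paper's separable components; the class and method names are illustrative, not the soul.py API, and keyword overlap stands in for real embedding retrieval.

```python
from dataclasses import dataclass, field

@dataclass
class Memory:
    identity: dict = field(default_factory=dict)   # stable self-description
    episodes: list = field(default_factory=list)   # append-only event log

    def recall(self, query: str, k: int = 3) -> list:
        # Cheap stand-in for the RAG path: rank episodes by keyword overlap.
        terms = set(query.lower().split())
        return sorted(self.episodes,
                      key=lambda e: -len(terms & set(e.lower().split())))[:k]

def route(query: str, mem: Memory):
    # Identity questions read the identity file; everything else searches episodes.
    if any(w in query.lower() for w in ("who are you", "your name", "your role")):
        return mem.identity
    return mem.recall(query)

mem = Memory(identity={"name": "Ada"},
             episodes=["met Bob on Monday", "fixed bug #12"])
print(route("what about the bug we fixed", mem))   # episodic path
```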

Editor's pickTechnology
Arxiv· Yesterday

Spatial Competence Benchmark

arXiv:2604.09594v1 Announce Type: new Abstract: Spatial competence is the quality of maintaining a consistent internal representation of an environment and using it to infer discrete structure and plan actions under constraints. Prevailing spatial evaluations for large models are limited to probing isolated primitives through 3D transformations or visual question answering. We introduce the Spatial Competence Benchmark (SCBench), spanning three hierarchical capability buckets whose tasks require executable outputs verified by deterministic checkers or simulator-based evaluators. On SCBench, three frontier models exhibit monotonically decreasing accuracy up the capability ladder. Sweeping output-token caps shows that accuracy gains concentrate at low budgets and saturate quickly, and failures are dominated by locally plausible geometry that breaks global constraints. We release the task generators, verifiers, and visualisation tooling.
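
A toy deterministic checker of the kind SCBench uses to verify executable outputs; the grid-path task format here is illustrative, not one of the benchmark's actual tasks.

```python
def check_path(grid, start, goal, moves):
    """grid: list of strings with '#' as obstacle; moves: e.g. 'DRRD'."""
    x, y = start
    deltas = {"U": (0, -1), "D": (0, 1), "L": (-1, 0), "R": (1, 0)}
    for m in moves:
        dx, dy = deltas[m]
        x, y = x + dx, y + dy
        # Reject plans that leave the map or hit an obstacle: exactly the
        # "locally plausible geometry that breaks global constraints" failure.
        if not (0 <= y < len(grid) and 0 <= x < len(grid[0])) or grid[y][x] == "#":
            return False
    return (x, y) == goal              # must end exactly on the goal cell

print(check_path(["..#", "...", "#.."], (0, 0), (2, 2), "DRRD"))  # True
```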

Editor's pick
Daily Brew· Yesterday

AI Solves Decade-Old Math Problem, Paving Way for Automated Discovery and Verification

An AI system developed by Peking University autonomously solved a decade-old open problem in commutative algebra, highlighting its potential in automating mathematical research.

Editor's pickTechnology
Arxiv· Yesterday

OOWM: Structuring Embodied Reasoning and Planning via Object-Oriented Programmatic World Modeling

arXiv:2604.09580v1 Announce Type: new Abstract: Standard Chain-of-Thought (CoT) prompting empowers Large Language Models (LLMs) with reasoning capabilities, yet its reliance on linear natural language is inherently insufficient for effective world modeling in embodied tasks. While text offers flexibility, it fails to explicitly represent the state-space, object hierarchies, and causal dependencies required for robust robotic planning. To address these limitations, we propose Object-Oriented World Modeling (OOWM), a novel framework that structures embodied reasoning through the lens of software engineering formalisms. We redefine the world model not as a latent vector space, but as an explicit symbolic tuple $W = \langle S, T \rangle$: a State Abstraction ($G_\text{state}$) instantiating the environmental state $S$, coupled with a Control Policy ($G_\text{control}$) representing the transition logic $T: S \times A \rightarrow S'$. OOWM leverages the Unified Modeling Language (UML) to materialize this definition: it employs Class Diagrams to ground visual perception into rigorous object hierarchies, and Activity Diagrams to operationalize planning into executable control flows. Furthermore, we introduce a three-stage training pipeline combining Supervised Fine-Tuning (SFT) with Group Relative Policy Optimization (GRPO). Crucially, this method utilizes outcome-based rewards from the final plan to implicitly optimize the underlying object-oriented reasoning structure, enabling effective learning even with sparse annotations. Extensive evaluations on the MRoom-30k benchmark demonstrate that OOWM significantly outperforms unstructured textual baselines in planning coherence, execution success, and structural fidelity, establishing a new paradigm for structured embodied reasoning.
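
A toy rendering of the paper's explicit world-model tuple $W = \langle S, T \rangle$ as object-oriented code: a state abstraction plus a transition function. The drawer-and-keys domain is illustrative, not from the paper, and real OOWM generates UML diagrams rather than Python.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Drawer:
    open: bool = False

@dataclass
class State:                              # S: object hierarchy grounding perception
    drawer: Drawer
    holding: Optional[str] = None

def transition(state: State, action: str) -> State:   # T: S x A -> S'
    if action == "open_drawer":
        state.drawer.open = True
    elif action == "take_keys" and state.drawer.open:  # causal precondition
        state.holding = "keys"
    return state

s = transition(transition(State(Drawer()), "open_drawer"), "take_keys")
print(s.holding)   # "keys" only because the drawer was opened first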

Editor's pickTechnology
Daily AI News April 13, 2026: Intuit Demonstrates a Repeatable AI Playbook for Regulated Industries· 2 days ago

In-Place Test-Time Training

This research paper proposes a way for large language models to keep adapting at inference time by updating selected parameters in place, rather than staying fully frozen after pretraining.
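
A rough sketch of in-place test-time adaptation as the summary describes it: update only a selected parameter subset at inference. Which parameters to select, the choice of loss, and the HF-style `model(**batch).loss` interface are all assumptions, not the paper's method.

```python
import torch

def adapt_in_place(model, batch, selected=("lm_head",), lr=1e-5, steps=1):
    # Freeze everything except the selected parameter subset.
    for name, p in model.named_parameters():
        p.requires_grad = any(tag in name for tag in selected)
    opt = torch.optim.SGD([p for p in model.parameters() if p.requires_grad], lr=lr)
    for _ in range(steps):
        loss = model(**batch).loss      # self-supervised loss on the live context
        opt.zero_grad()
        loss.backward()
        opt.step()                      # weights change in place; no frozen copy
```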

Adoption & Impact

21 articles
AI Adoption & Diffusion · 18 articles
Editor's pickManufacturing & Industrials
Arxiv· Yesterday

Agentic AI in Engineering and Manufacturing: Industry Perspectives on Utility, Adoption, Challenges, and Opportunities

arXiv:2604.09633v1 Announce Type: new Abstract: This work examines how AI, especially agentic systems, is being adopted in engineering and manufacturing workflows, what value it provides today, and what is needed for broader deployment. This is an exploratory and qualitative state-of-practice study grounded in over 30 interviews across four stakeholder groups (large enterprises, small/medium firms, AI developers, and CAD/CAM/CAE vendors). We find that near-term AI gains cluster around structured, repetitive work and data-intensive synthesis, while higher-value agentic gains come from orchestrating multi-step workflows across tools. Adoption is constrained less by model capability than by fragmented and machine-unfriendly data, stringent security and regulatory requirements, and limited API-accessible legacy toolchains. Reliability, verification, and auditability are central requirements for adoption, driving human-in-the-loop frameworks and governance aligned with existing engineering reviews. Beyond technical barriers there are also organizational ones: a persistent AI literacy gap, cultural heterogeneity, and governance structures that have not yet caught up with agentic capabilities. Together, the findings point to a staged progression of AI utility from low-consequence assistance toward higher-order automation, as trust, infrastructure, and verification mature. This highlights key breakthroughs needed, including integration with traditional engineering tools and data types, robust verification frameworks, and improved spatial and physical reasoning.

BPAI analysis

Realizing ROI in manufacturing requires investing in data infrastructure and API-accessible legacy toolchains before deploying agents. Executive takeaway: prioritize data hygiene and legacy system modernization as prerequisites for agentic AI adoption. Agentic AI's strategic value in engineering and manufacturing lies in orchestrating complex workflows to boost productivity, but adoption hinges on bridging infrastructure gaps rather than on model advances alone, positioning early integrators for a competitive edge in industrial automation. Insights from over 30 interviews reveal near-term gains in repetitive tasks, constrained by fragmented data, legacy systems, and regulatory demands, and point to human-in-the-loop verification and organizational upskilling as requirements for scaled deployment.

Editor's pickProfessional Services
VentureBeat· 2 days ago

Designing the agentic AI enterprise for measurable performance

Presented by EdgeVerve

Smart, semi-autonomous AI agents handling complex, real-time business work is a compelling vision. But moving from impressive pilots to production-grade impact requires more than clever prompts or proof-of-concept demos. It takes clear goals, data-driven workflows, and an enterprise platform that balances autonomy, governance, observability, and flexibility with hard guardrails from day one.

From pilots to the "operational grey zones"

The next wave of value sits in the connective tissue between applications: those operational grey zones where handoffs, reconciliations, approvals, and data lookups still rely on humans. Assigning agents to these paths means collapsing system boundaries, applying intelligence to context, and re-imagining processes that were never formally automated. Many pilots stall because they start as lab experiments rather than outcome-anchored designs tied to production systems, controls, and KPIs. Start with outcomes, not algorithms: translate organizational KPIs (cash-flow, DSO, SLA adherence, compliance hit rates, MTTR, NPS, claims leakage, etc.) into agent goals, then cascade them into single-agent and multi-agent objectives. Only after goals are explicit should you select workflows and decompose tasks.

Pick targets, then decompose the work

What does "target" actually mean? In agentic programs, a target is a business outcome and the use case that moves it. For example, "reduce unapplied cash by 20%" is the target outcome; "cash application and exceptions handling" is the use case. With the use case in hand, perform persona-level task decomposition: map the human role (e.g., cash applications analyst, facilities coordinator), enumerate their tasks, and identify which are ripe for agentification (data retrieval, matching, policy checks, decision proposals, transaction initiation). Delivering on those tasks requires a data-embedded workflow fabric that can read, write, and reason across enterprise systems while honoring permissions. Data must be AI-ready: discoverable, governed, labeled where needed, augmented for retrieval (RAG), and policy-protected for PII, PCI, and regulatory constraints.

Integration goes beyond APIs

APIs are one mode of integration, not the only one. Robust agent execution typically blends: stable APIs with lifecycle management for core systems; event-driven triggers (streams, webhooks, CDC) to react in real time; UI/RPA fallbacks where APIs don't exist; search/RAG connectors for documents and knowledge bases; and policy management across tools and actions to enforce entitlements and segregation of duties. The north star is integration reliability, built on idempotency, retries, circuit-breakers, and standardized tool schemas, so agents don't "hallucinate" actions the enterprise can't verify.

A quick example: finance and facilities, in production

Inside our organization, we deployed specialized agents in a live CFO environment and in building maintenance. In finance, seven agents interacted with production systems and real accountability structures. Year-one outcomes included: >3% monthly cash-flow improvement, 50% productivity gain in affected workflows, 90% faster onboarding, a shift from account-level handling to function-level orchestration, and a $32M cash-flow lift. These results don't guarantee gains everywhere; they show that well-designed deployments can deliver measurable outcomes at scale.

The four design pillars: autonomy, governance, observability & evals, flexibility

1) Autonomy: right-size it to the risk. Autonomy exists on a spectrum.
Early efforts often automate well-bounded tasks; others pursue research/analysis agents; increasingly, teams target mission-critical transactional agents (payments, vendor onboarding, pricing changes). The rule: match autonomy to risk, and encode the operating mode (suggest-only, propose-and-approve, or execute-with-rollback) per task.

2) Governance: guardrails by design, not as bolt-ons. Unbounded agents create unacceptable risk. Build guardrails into the plan: policy and permissions that tie tools and actions to identity, scopes, and SoD rules; human-in-the-loop (HITL) where mission-critical thresholds are crossed (amount, vendor risk, regulatory exposure); agent lifecycle management covering versioning, change control, regression gates, approval workflows, and sunsetting; third-party agent orchestration that vets external agents like vendors (capabilities, scopes, logs, SLAs); and incident and rollback provisions such as kill-switches, safe-mode, and compensating transactions. This is how you scale innovation safely while protecting brand, compliance, and customers.

3) Observability & evaluations: trust comes from telemetry. Production agents need the same rigor as any core platform: telemetry capturing full execution traces across perception, planning, tool use, and action, supported by structured logs and replay; offline evals including scenario tests, red-teaming, bias and safety checks, cost/performance benchmarks, and baseline-versus-challenger comparisons; online evals such as shadow mode, A/B, canary releases, guardrail breach alerts, and human feedback loops; and explainability and auditability: why was an action taken, which data and tools were used, and who approved.

4) Flexibility: assume volatility, design for swap-ability. Models, tools, and vendors change fast. Treat agentic capability as platform currency: create an environment where teams can evaluate, select, and swap models and tools without tearing down the build. Use a model router, tool registry, and contract-first interfaces so upgrades are controlled experiments, not rewrites.

The agent platform fabric: how platformization turns goals into outcomes

A true agentic enterprise requires a platform fabric that transforms goals into outcomes, not a patchwork of isolated pilots. This platform anchors enterprise-to-agent KPI cascades, drives task decomposition and multi-agent planning, and provides governed tooling and data access across APIs, RPA, search, and databases. It centralizes knowledge and memory through RAG and vector stores, enforces enterprise controls via a policy engine, and manages performance and safety through a unified model layer. It supports robust orchestration of first- and third-party agents with common context, embeds deep observability and evaluation pipelines, and applies disciplined release engineering from sandbox to GA. Finally, it ensures long-term resilience through lifecycle management: versioning, deprecation, incident playbooks, and auditable histories.

Guardrails in action: a BFSI example

Consider payments exception handling in banking: high stakes, regulated, and customer-visible. An agent proposes a resolution (e.g., auto-reconcile or escalate) only when: the transaction falls below risk thresholds (above them, it triggers HITL approval); all policy checks (KYC/AML, velocity, sanctions) pass; observability hooks record rationale, tools invoked, and data used; and rollback/compensation is defined if downstream failures occur. This pattern generalizes to vendor onboarding, pricing overrides, or claims adjudication: mission-critical work with explicit safety rails.
Scale beyond pilots

Scaling agentic AI beyond pilots demands disciplined readiness across nine fronts. Leaders must clarify which KPIs matter and how agent goals ladder into them, determine which persona tasks are agentified versus remain human-led, and align each with the right autonomy mode, from suggest-only to propose-and-approve to execute-with-rollback. They must embed governance guardrails, including HITL points and lifecycle controls; ensure robust observability and evaluation via telemetry, replay, audits, and offline/online tests; and verify data readiness, with governed, policy-protected, retrieval-augmented data flows. Integration must be reliable, with API lifecycle management, event triggers, and RPA or other fallbacks (a sketch of the core reliability pattern follows this article). The underlying platform should enable model swap-ability and orchestration of first- and third-party agents without rebuilding. Finally, measurement must focus on true operational impact (cash flow, cycle times, quality, and risk reduction) rather than task counts.

The takeaway

Agentic AI is not a shortcut; it's a new system of work. Enterprises that approach it with platform discipline, aligning autonomy with risk, embedding governance and observability, and designing for swap-ability, will convert pilots into production impact. Those that don't will keep accumulating impressive but disconnected demos. The difference isn't how fast you ship an agent; it's how deliberately you design the enterprise around it.

N. Shashidar is SVP & Global Head, Product Management at EdgeVerve. Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they're always clearly marked. For more information, contact sales@venturebeat.com.
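
As referenced above, a minimal sketch of the integration-reliability pattern (idempotency keys plus bounded retries with backoff); the call_tool callable and its idempotency_key parameter are placeholders, not a specific vendor API.

```python
import time
import uuid

def reliable_invoke(call_tool, payload, retries=3, backoff=1.0):
    """Wrap an agent tool call so a retry can never double-execute an action."""
    key = str(uuid.uuid4())                      # same key on every retry -> idempotent
    for attempt in range(retries):
        try:
            return call_tool(payload, idempotency_key=key)
        except TimeoutError:
            time.sleep(backoff * 2 ** attempt)   # exponential backoff between attempts
    raise RuntimeError("tool call failed after retries; escalate to HITL")
```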

Editor's pick
MIT· Yesterday

The Human Side of AI Adoption: Lessons From the Field

Not a day goes by without another article being published about how AI could disrupt yet another aspect of our business or personal lives. In recent years, AI adoption has indeed taken off. However, if you pay close attention, you'll notice a dichotomy: many examples of successful early adoption of artificial intelligence […]

Editor's pickManufacturing & Industrials
Reuters· Yesterday

Nissan to trim global car lineup, boost use of AI driving tech | Reuters

Nissan Motor plans to streamline its global automobile lineup by exiting low-performing models, and to deploy its artificial intelligence driving technology across 90% of its range over the long term, as it targets a revitalisation after years of turmoil.

Editor's pickTechnology
Fortune· Yesterday

Anthropic is facing a wave of user backlash over reports of performance issues with its Claude AI chatbot

"Claude has regressed to the point [that] it cannot be trusted to perform complex engineering," one developer wrote.

Editor's pickTechnology
Arxiv· Yesterday

OpenFlo: Automated UX Evaluation via Simulated Human Web Interaction with GUI Grounding

arXiv:2604.09581v1 Announce Type: new Abstract: Evaluating web usability typically requires time-consuming user studies and expert reviews, which often limits iteration speed during product development, especially for small teams and agile workflows. We present OpenFlo, a user-experience evaluation agent that simulates user behavior on websites and produces standardized usability reports. Unlike traditional tools that rely on DOM parsing, OpenFlo grounds actions and observations in the rendered GUI, enabling it to interact with real web pages end-to-end while maintaining a coherent trace of the user journey. Building on Avenir-Web, our system pairs this robust interaction with simulated user behavior profiles and a structured evaluation protocol that integrates the System Usability Scale (SUS), step-wise Single Ease Questions (SEQ), and concurrent Think Aloud, from which a comprehensive User Experience (UX) report is generated. We discuss the architecture of OpenFlo and illustrate how its multimodal grounding improves robustness for web-based interaction and UX evaluation scenarios, paving the way for a new era of continuous, scalable, and data-driven usability testing that empowers every developer to build web interfaces that are usable. Code is available at: https://github.com/Onflow-AI/OpenFlo

Editor's pickTechnology
VentureBeat· Yesterday

Agentic coding at enterprise scale demands spec-driven development

Presented by AWS

Autonomous agents are compressing software delivery timelines from weeks to days. The enterprises that scale agents safely will be the ones that build using spec-driven development.

There's a moment in every technology shift where the early adopters stop being outliers and start being the baseline. We're at that moment in software development, and most teams don't realize it yet. A year ago, vibe coding went viral. Non-developers and junior developers discovered they could build beyond their abilities with AI. It lowered the floor. It made prototyping much quicker, but it also introduced a surplus of slop. What the industry then needed was something that raised the ceiling: something that improved code quality and worked the way the most expert developers work. Spec-driven development did that. It laid the foundation for trustworthy autonomous coding agents.

Specs are the trust model for autonomous development

Most discussions of AI-generated code focus on whether AI can write code. The harder question is whether you can trust it. The answer runs directly through the spec. Spec-driven development starts with a deceptively simple idea: before an AI agent writes a single line of code, it works from a structured, context-rich specification that defines what the system is supposed to do, what its properties are, and what "correct" actually means. That specification is an artifact the agent reasons against throughout the entire development process, fundamentally different from pre-agentic AI approaches of writing documentation after the fact.

Enterprise teams are building on this foundation. The Kiro IDE team used Kiro to build Kiro IDE, an agentic coding environment with native spec-driven development, cutting feature builds from two weeks to two days. An AWS engineering team completed an 18-month rearchitecture project, originally scoped for 30 developers, with six people in 76 days using Kiro. An Amazon.com engineering team rolled out "Add to Delivery", a feature that lets shoppers add items after checkout, two months ahead of schedule by using Kiro and spec-driven development. Alexa+, Amazon Finance, Amazon Stores, AWS, Fire TV, Last Mile Delivery, Prime Video, and more all integrate spec-driven development as part of their build approaches. That shift changes everything downstream.

Verifiable testing is what makes autonomous agents safe to run

The spec becomes an automated correctness engine. When a developer is generating 150 check-ins per week with AI assistance, no human can manually review that volume of code. Instead, code built against a concrete specification can be verified through property-based testing and neurosymbolic AI techniques that automatically generate hundreds of test cases derived directly from the spec, probing edge cases no human would think to write by hand. These tests prove that the code satisfies the spec's defined properties, going well beyond hand-written test suites to provably correct behavior. (A toy property-based test appears after this article.)

Verifiable testing enables the shift from one-shot programming to continuous autonomous development. Traditional AI-assisted development operates as a single shot: you give the agent a spec, the agent produces output, and the process ends. Today's agents continuously correct themselves, feeding build and test failures back into their own reasoning, generating additional tests to probe their own output, and iterating until they produce something both functional and verifiable. The spec is the anchor that keeps that loop from drifting.
Instead of developers constantly checking in to see if the agent is making the right decisions, the agent can check itself against the spec to make sure it is on the right path. The autonomous agent of the future will write its own specs, using specifications as the mechanism for self-correction, for verification, and for ensuring that what it produces matches the intended behavior of the system.

Multi-agent, autonomous, and running right now

The developers setting the pace today operate in a fundamentally different way. They spend significant time building their spec, as well as writing steering files used by the spec to make sure the agent knows what and how to build: more time than their agent may spend building the actual software. They run multiple agents in parallel to critique a problem from different perspectives, as well as multiple specs, each written for a different component of the system they are building. They let agents run for hours, sometimes days. They use thousands of Kiro credits because the output justifies it. A year ago, agents would lose context and fall apart after 20 minutes. Now, every week you can run them longer than the week before. Agentic capabilities have improved so significantly in the last six months that genuinely complex problems are tractable. Newer LLMs are more token-efficient than the previous generation, so for the same spend, you get dramatically more done. The challenge is that doing this well requires deep expertise. The tools, methodologies, and infrastructure exist, but orchestrating them is hard. The goal with Kiro is to bring these capabilities with deep expertise to every developer, not just the top one percent who've figured it out.

Infrastructure is catching up to ambition

Agents will be ten times more capable within a year. That's the rate of improvement we're seeing week over week. The infrastructure to support that level of capability is converging at the same time. Agents are now running in the cloud rather than locally, executing in parallel at scale with secure, reliable communication between agent systems. Organizations can now run agentic workloads the way they'd run any enterprise-grade distributed system, with governance, cost controls, and reliability guarantees that serious software demands.

Spec-driven development is the architecture of tomorrow's autonomous systems. Developers are no longer restricted by how they want to solve the problem. The developers who thrive in this world are the ones building that foundation now: using spec-driven development, prioritizing testability and verification from the start, working with agents as collaborators, and thinking in systems instead of syntax.

Deepak Singh is VP of Kiro at AWS. Sponsored articles are content produced by a company that is either paying for the post or has a business relationship with VentureBeat, and they're always clearly marked. For more information, contact sales@venturebeat.com.
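
As referenced above, a toy example of deriving a property-based test from a spec clause, using the hypothesis library; the discount function and the spec line it encodes ("discount never exceeds 30% and price stays non-negative") are illustrative, not from Kiro.

```python
from hypothesis import given, strategies as st

def apply_discount(price: float, pct: float) -> float:
    """System under test: caps any requested discount at 30%."""
    return price * (1 - min(pct, 0.30))

@given(st.floats(min_value=0, max_value=1e6),   # hypothesis generates the cases
       st.floats(min_value=0, max_value=1))
def test_discount_properties(price, pct):
    result = apply_discount(price, pct)
    assert result >= 0                           # spec: price stays non-negative
    assert result >= price * 0.70 - 1e-6         # spec: at most 30% off
```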

Editor's pick
SmartAsset· 2 days ago

AI Attitudes, Adoption, and Benefits by State – 2026 Study

SmartAsset ranked U.S. states on factors related to AI adoption, including ChatGPT usage, workplace adoption, and job creation.

BPAI context

Regional Development: Uneven adoption rates suggest that the economic benefits of AI may not be distributed uniformly across the U.S. economy. Regional variations in AI adoption signal widening productivity gaps across U.S. states, with leaders like Washington poised to capture outsized economic gains through infrastructure and talent concentration, while laggards risk falling behind in the AI-driven economy. On average, 18.1% of American workers use AI, but adoption rates range from 27.4% in Wyoming to just 8.4% in Hawaii, with Washington topping AI jobs per capita at 289.8 per 100,000 residents.

Editor's pick
Cam· 2 days ago

What does firm-level data tell us about AI adoption in the UK? - Bennett School of Public Policy

Dr John Lourenze Poquiz and Nghi Nguyen explore the characteristics of early business AI adopters in the UK during the period of ChatGPT's public launch in late 2022, which increased public awareness of AI and its potential for businesses.

BPAI context

UK AI adoption is accelerating unevenly, with large firms and those with superior management structures pulling ahead, risking a productivity divide that concentrates gains among elites rather than diffusing broadly across the economy—policymakers must prioritize dismantling barriers for SMEs to foster inclusive growth. Data from ONS surveys post-ChatGPT launch reveal large enterprises (250+ employees) reaching 44% adoption by 2025 versus 26% for small firms, while adopters exhibit comparable productivity to non-adopters but lower intermediate costs and higher management scores.

Editor's pick
Wharton School· 2 days ago

Generative AI Won’t Create Value on Its Own - Knowledge at Wharton

Wharton’s Rahul Kapoor explains why leaders need to think beyond the technology and focus on the strategic challenges of emergence, enablement, and embedding.

Editor's pickManufacturing & Industrials
Manufacturing Dive· 2 days ago

How manufacturers are testing physical AI before making big investments | Manufacturing Dive

Growing interest in automation has created a need for testing centers that let manufacturers see whether the technology will work for them. Deloitte, Tata Consultancy Services and Microsoft are among those offering them.

Editor's pickManufacturing & Industrials
Livemint· 2 days ago

India’s manufacturing giants are embracing agentic AI to enhance efficiencies | Mint

The manufacturing sector is evolving from automation to an Agentic Enterprise model, where AI acts as a strategic teammate, enhancing decision-making and logistics. This shift requires reskilling workers, integrating AI into daily operations, and addressing data security risks.

Editor's pick
cio.com· 2 days ago

Increasing AI adoption with agents built to serve ALL employees - CIO

AI is set to revolutionize workplaces worldwide. But only if people actually use it. The pattern is remarkably consistent across industries: executive enthusiasm at the top, isolated pockets of experimentation in the middle and stalled adoption everywhere else. Despite billions in AI investment, only 5% of firms worldwide have achieved AI value at scale, according to Boston Consulting Group. A 2025 UKG global study found that just 38% of frontline workers use AI in their daily roles…

Editor's pickProfessional Services
Outsourceaccelerator· 2 days ago

Staffing firms using AI see double revenue growth, report says - Outsource Accelerator

AI-driven recruitment is rapidly reshaping the global staffing industry, with firms using AI twice as likely to report revenue growth last year.

Editor's pickTechnology
moderndata101.substack.com· 2 days ago

How AI Is Reshaping Enterprise Data Governance

How AI is transforming data governance from a static back-office chore into a dynamic, predictive discipline. About the contributing expert: Dia Adams is a Chief Data & AI Officer, former White House Enterprise Data Strategist, and Board Chair of The AI Table, where she helps shape the future of responsible AI and enterprise transformation. With more than two decades of experience in data strategy, AI governance, and digital transformation, she has led high-impact initiatives across federal agencies and global enterprises. An accomplished author of Winning With AI…

Editor's pickFinancial Services
Wealth Management· 2 days ago

Fitch: AI Adoption Not a Near-Term Credit Driver for Wealth Firms

Fitch affirms stable ratings for wealth managers as AI tools grow.

Editor's pick
✨ Rule of three· 2 days ago

Who's leading AI adoption in the workplace

A new Gallup survey shows that while half of Americans use AI at work, company leaders are adopting it at a higher rate than managers and individual contributors.

Editor's pickTechnology
searchengineland.com· 2 days ago

AI search adoption isn't equal and income is driving the divide

Higher-value audiences are adopting AI faster, fragmenting search behavior and reshaping decisions before the click. Everyone is talking about AI search as if it's already universal, as if we've collectively moved on, users have shifted and discovery has changed for everyone. But the reality is far less straightforward. While AI search is growing fast, it isn't being adopted evenly…

Geopolitics

7 articles
AI Geopolitics · 7 articles
Editor's pickManufacturing & Industrials
Arxiv· Yesterday

Structural Consequences of Policy-Based Interventions on the Global Supply Chain Network

arXiv:2604.11479v1 Announce Type: cross Abstract: As global political tensions rise and the anticipation of additional tariffs from the United States on international trade increases, the issues of economic independence and supply chain resilience become more prominent. The importance of supply chain resilience has been further underscored by disruptions caused by the COVID-19 pandemic and the ongoing war in Ukraine. In light of these challenges, ranging from geopolitical instability to product supply uncertainties, governments are increasingly focused on adopting new trade policies. This study explores the impact of several of these policies on the global electric vehicle (EV) supply chain network, with a particular focus on their effects on country clusters and the broader structure of international trade. Specifically, we analyse three key policies: Country Plus One, Friendshoring, and Reshoring. Our findings show that Friendshoring, contrary to expectations, leads to greater globalisation by increasing the number of supply links across friendly countries, potentially raising transaction costs. The Country Plus One policy similarly enhances network density through redundant links, while the Reshoring policy creates challenges in the EV sector due to the high number of irreplaceable products. Additionally, the effects of these policies vary across industries; for instance, mining goods are less affected under the Country Plus One policy than under Friendshoring.

Editor's pickTechnology
The Hindu BusinessLine· 2 days ago

How West-Asia war could reshape the AI race - The HinduBusinessLine

As AI spreads across every sector of human activity, the global race to scale it collides with a highly volatile force: geopolitics. This convergence exposes how dependent technology is on physical resources.

Editor's pickPAYWALLTechnology
NYT· Yesterday

New Rules Hinder Foreign Firms From Moving Supply Chains From China

Multinationals in China are concerned that the regulations could allow authorities to penalize companies and executives for shifting supply chains away from the country.

Best Practice AI© 2026 Best Practice AI Ltd. All rights reserved.
