Research: AI news, analysis and benchmarks

Research

Papers that actually matter

28 articles in this section.

The Gap Between Formal Multilingual AI and Romanized Indic Code-Mixing

The Indi-RomCoM benchmark reveals that LLMs struggle with Romanized Indic-English code-mixing, highlighting a critical failure in current multilingual training pipelines.

Jul 2, 2026 · 3 min read

Measuring the Gap Between CS Curricula and Industry Standards

An analysis of a new framework for measuring curriculum alignment, highlighting the disconnect between academic coverage and actual industry competency.

Jun 20, 2026 · 3 min read

Runtime Governance for LLM Agents: Moving Beyond System Prompts

Exploring the shift from probabilistic system prompts to formal runtime policy enforcement using deontic logic to secure LLM agent tool execution.

Jun 19, 2026 · 3 min read

Dark Matter Research Shifts Toward Compute-First Methodology and Solar Power

The search for dark matter is pivoting from hardware traps to AI-driven simulations, requiring scalable energy infrastructure like Kenya's solar initiatives.

Jun 18, 2026 · 3 min read

Modeling Workforce Skill Decay in Industrial Production Planning

An analysis of the SkillChain-Gym benchmark and its approach to treating workforce capabilities as dynamic, decaying variables in industrial RL optimization.

Jun 17, 2026 · 3 min read

Google's Gemini-SQL2: Analyzing the Gap Between Benchmarks and Production

An analysis of Gemini-SQL2's high BIRD benchmark scores and the challenges of applying text-to-SQL AI in messy, real-world production environments.

Jun 13, 2026 · 3 min read

Benchmarking LLMs for Safety Data Sheet Extraction

An analysis of using LLMs to extract structured data from complex Safety Data Sheets, highlighting the challenges of PDF ingestion and accuracy.

Jun 11, 2026 · 3 min read

MacArena: Testing the Real-World Friction of macOS Agent Benchmarks

MacArena exposes the gap between simulated environments and the actual friction of operating a macOS GUI, highlighting the fragility of current agents.

Jun 8, 2026 · 3 min read

Huawei Releases KVarN: A Native vLLM Backend for KV-Cache Quantization

Huawei’s KVarN reduces VRAM usage in vLLM by quantizing the KV cache, allowing for larger batch sizes and longer context windows.

Jun 4, 2026 · 3 min read

Solving Long-Form Coherence in Small Open-Weight LLMs

An analysis of the POLARIS paper and its approach to preventing quality degradation and structural collapse in long-form creative writing for small models.

Jun 4, 2026 · 3 min read