Artificial Intelligence

Empirically evaluating commonsense intelligence in large language models with large-scale human judgments
librarian
1 view
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks
Luics Xu
16 views
Towards a Deeper Understanding of Reasoning Capabilities in Large Language Models
librarian
2 views
Plasticity as the Mirror of Empowerment
librarian
1 view
AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenge
Ranjan Sapkota
1 view
rfPG: Robust Finite-Memory Policy Gradients for Hidden-Model POMDPs
librarian
2 views
The Influence of Human-inspired Agentic Sophistication in LLM-driven Strategic Reasoners
librarian
1 view
Reproducibility Study of "Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents"
librarian
1 view
Counterfactual Strategies for Markov Decision Processes
librarian
1 view
Language Agents Mirror Human Causal Reasoning Biases. How Can We Help Them Think Like Scientists?
Anthony GX-Chen
1 view
WixQA: A Multi-Dataset Benchmark for Enterprise Retrieval-Augmented Generation
librarian
1 view
TRAIL: Trace Reasoning and Agentic Issue Localization
librarian
1 view
DeepMath-Creative: A Benchmark for Evaluating Mathematical Creativity of Large Language Models
librarian
1 view
ARC-NCA: Towards Developmental Solutions to the Abstraction and Reasoning Corpus
Stefano Nichele
2 views
Belief Injection for Epistemic Control in Linguistic State Space
librarian
1 view
AI for Extreme Event Modeling and Understanding: Methodologies and Challenges
Aytaç PAÇAL
6 views
"I Apologize For Not Understanding Your Policy": Exploring the Specification and Evaluation of User-Managed Access Control Policies by AI Virtual Assistants
Jennifer Mondragon
1 view
YuLan-OneSim: Towards the Next Generation of Social Simulator with Large Language Models
Lei Wang
2 views
S-GRPO: Early Exit via Reinforcement Learning in Reasoning Models
Mathus Dai
2 views
Emotion-Gradient Metacognitive RSI (Part I): Theoretical Foundations and Single-Agent Architecture
Rintaro Ando
3 views
Agent RL Scaling Law: Agent RL with Spontaneous Code Execution for Mathematical Problem Solving
librarian
5 views
A Pain Assessment Framework based on multimodal data and Deep Machine Learning methods
librarian
8 views
Is there a half-life for the success rates of AI agents?
librarian
7 views
MARK: Memory Augmented Refinement of Knowledge
Anish Ganguli
10 views
Conversational Process Model Redesign
librarian
10 views
Societal and technological progress as sewing an ever-growing, ever-changing, patchy, and polychrome quilt
librarian
9 views
Multi-agent Embodied AI: Advances and Future Directions
librarian
6 views
Advancing Neural Network Verification through Hierarchical Safety Abstract Interpretation
librarian
9 views
EcoAgent: An Efficient Edge-Cloud Collaborative Multi-Agent Framework for Mobile Automation
Biao Yi
9 views
TrajEvo: Designing Trajectory Prediction Heuristics via LLM-driven Evolution
librarian
10 views
Mastering Multi-Drone Volleyball through Hierarchical Co-Self-Play Reinforcement Learning
librarian
9 views
Qualitative Analysis of ω-Regular Objectives on Robust MDPs
librarian
10 views