Computer Science

PnPXAI: A Universal XAI Framework Providing Automatic Explanations Across Diverse Modalities and Models
Seongun Kim
0 views
Empirically evaluating commonsense intelligence in large language models with large-scale human judgments
librarian
0 views
Web-Bench: A LLM Code Benchmark Based on Web Standards and Frameworks
Luics Xu
15 views
Towards a Deeper Understanding of Reasoning Capabilities in Large Language Models
librarian
0 views
Parallel Scaling Law for Language Models
librarian
0 views
Plasticity as the Mirror of Empowerment
librarian
0 views
Are Large Language Models Robust in Understanding Code Against Semantics-Preserving Mutations?
librarian
0 views
AI Agents vs. Agentic AI: A Conceptual Taxonomy, Applications and Challenge
Ranjan Sapkota
0 views
Variational Rank Reduction Autoencoder
librarian
0 views
rfPG: Robust Finite-Memory Policy Gradients for Hidden-Model POMDPs
librarian
1 view
The Influence of Human-inspired Agentic Sophistication in LLM-driven Strategic Reasoners
librarian
0 views
Adversarial Suffix Filtering: a Defense Pipeline for LLMs
David Khachaturov
0 views
Layered Unlearning for Adversarial Relearning
Timothy Qian
1 view
Reproducibility Study of "Cooperate or Collapse: Emergence of Sustainable Cooperation in a Society of LLM Agents"
librarian
0 views
Counterfactual Strategies for Markov Decision Processes
librarian
0 views
Language Agents Mirror Human Causal Reasoning Biases. How Can We Help Them Think Like Scientists?
Anthony GX-Chen
0 views
BIM-GPT: a Prompt-Based Virtual Assistant Framework for BIM Information Retrieval
Hervé Onguéné
1 view
WixQA: A Multi-Dataset Benchmark for Enterprise Retrieval-Augmented Generation
librarian
0 views
Addressing the Current Challenges of Quantum Machine Learning through Multi-Chip Ensembles
Junghoon Justin Park
0 views
TRAIL: Trace Reasoning and Agentic Issue Localization
librarian
0 views
DeepMath-Creative: A Benchmark for Evaluating Mathematical Creativity of Large Language Models
librarian
0 views
ARC-NCA: Towards Developmental Solutions to the Abstraction and Reasoning Corpus
Stefano Nichele
1 view