Machine Learning

Dense SAE Latents Are Features, Not Bugs
librarian
9 views
TGDPO: Harnessing Token-Level Reward Guidance for Enhancing Direct Preference Optimization
Mingkang Zhu
6 views
On the Hardness of Bandit Learning
librarian
4 views
TimeMaster: Training Time-Series Multimodal LLMs to Reason via Reinforcement Learning
Junru Zhang
17 views
Rethinking Losses for Diffusion Bridge Samplers
librarian
28 views
Self-Adapting Language Models
Adam Zweiger
46 views
Multiverse: Your Language Models Secretly Decide How to Parallelize and Merge Generation
Xinyu Yang
54 views