raw.sh
Batched reward model inference and Best-of-N sampling
- 11/19/2024
Teaching chat models to solve chess puzzles
- 8/24/2024