Why CRPS wins
ACL 2026 Oral Demo · Real CRPS-30K sample
How one MCTS case becomes training data
Real Phase 1 pair as a trajectory tree
MCTS explored both a successful path and a hard negative
tau+ (successful path): selected by MCTS/RFT
Hard tau- (failed path): discarded by MCTS, reused by CRPS
CRPS synthesized target
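The trajectory tree above can be sketched in a few lines of code. This is a minimal sketch, not the CRPS pipeline. The `Node` schema and its `success` and `value` fields are hypothetical. It also assumes that the "hard" negative is the failed path sharing the longest prefix with the successful one, with ties broken by MCTS value. The demo does not say how hardness is defined for CRPS-30K.

```python
from dataclasses import dataclass, field
from typing import Iterator, List, Optional, Tuple

@dataclass
class Node:
    # One reasoning step in the search tree (hypothetical schema).
    step: str
    children: List["Node"] = field(default_factory=list)
    success: Optional[bool] = None  # set on leaves: did this path reach a correct answer?
    value: float = 0.0              # MCTS value estimate for this node

def trajectories(node: Node, prefix: Tuple[str, ...] = ()) -> Iterator[Tuple[List[str], bool, float]]:
    """Yield (steps, success, leaf_value) for every root-to-leaf path."""
    path = prefix + (node.step,)
    if not node.children:
        yield list(path), bool(node.success), node.value
        return
    for child in node.children:
        yield from trajectories(child, path)

def shared_prefix_len(a: List[str], b: List[str]) -> int:
    """Number of leading steps two trajectories have in common."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

def extract_pair(root: Node) -> Optional[Tuple[List[str], List[str]]]:
    """Return (tau_plus, tau_minus), or None if the search found no success/failure pair."""
    paths = list(trajectories(root))
    wins = [p for p in paths if p[1]]
    losses = [p for p in paths if not p[1]]
    if not wins or not losses:
        return None
    # tau+: the highest-value successful path (what MCTS/RFT would keep).
    tau_plus = max(wins, key=lambda p: p[2])[0]
    # Hard tau-: the failed path that diverges from tau+ latest (assumed definition).
    tau_minus = max(losses, key=lambda p: (shared_prefix_len(p[0], tau_plus), p[2]))[0]
    return tau_plus, tau_minus

root = Node("q", children=[
    Node("a", children=[Node("b1", success=True, value=0.9),
                        Node("b2", success=False, value=0.7)]),
    Node("c", success=False, value=0.2),
])
```

On the toy tree, `extract_pair(root)` returns `(["q", "a", "b1"], ["q", "a", "b2"])`. The hard negative is the sibling that failed only at the last step, not the early dead end `c`.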
Traditional MCTS / RFT
Use search as a filter
CRPS
Use search as contrastive evidence
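The two regimes can be contrasted as two data builders over the same search output. This is a minimal sketch under stated assumptions. The `Pair` record and its field names are hypothetical. The source does not describe how CRPS actually produces its synthesized target, so `synthesize` is a caller-supplied stand-in and `toy_synthesize` is only an illustration. The point is structural: the filter discards tau-, while the contrastive builder passes both tau+ and tau- into target construction.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

@dataclass
class Pair:
    prompt: str
    tau_plus: List[str]   # successful trajectory selected by the search
    tau_minus: List[str]  # hard negative from the same search

def rft_examples(pairs: List[Pair]) -> List[Dict[str, str]]:
    """Traditional MCTS/RFT: search acts as a filter, so only tau+ becomes data."""
    return [{"prompt": p.prompt, "target": "\n".join(p.tau_plus)} for p in pairs]

def crps_examples(pairs: List[Pair],
                  synthesize: Callable[[List[str], List[str]], str]) -> List[Dict[str, str]]:
    """Contrastive use: both trajectories feed the (hypothetical) target synthesizer."""
    return [{"prompt": p.prompt, "target": synthesize(p.tau_plus, p.tau_minus)} for p in pairs]

def toy_synthesize(pos: List[str], neg: List[str]) -> str:
    """Illustrative only: keep the shared prefix, flag where tau- went wrong, finish with tau+."""
    k = 0
    while k < min(len(pos), len(neg)) and pos[k] == neg[k]:
        k += 1
    lines = pos[:k]
    if k < len(neg):
        lines.append(f"Not: {neg[k]}")
    lines += pos[k:]
    return "\n".join(lines)

pair = Pair("2+3*4?", ["3*4=12", "2+12=14"], ["2+3=5", "5*4=20"])
```

Here `rft_examples([pair])` yields the target `"3*4=12\n2+12=14"`, with the precedence error in tau- thrown away. `crps_examples([pair], toy_synthesize)` yields `"Not: 2+3=5\n3*4=12\n2+12=14"`, so the failure becomes part of the training signal.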
Source: Phase 1 pair from data/full_experiment/phase1/gpu1/pairs.jsonl; synthesized target from data/full_experiment/crps_30k_train.jsonl.