ACL 2026 Oral Demo · Real CRPS-30K sample

How one MCTS case becomes training data

    Why CRPS wins

    Traditional MCTS/RFT
    CRPS

    Real Phase 1 pair as a trajectory tree

    MCTS explored both a successful path and a hard negative

    [Figure: trajectory tree rooted at "root". Legend: tau+ = selected by MCTS/RFT; hard tau- = discarded by MCTS, reused by CRPS; synthesized target = produced by CRPS.]

    Traditional MCTS/RFT

    Use search as a filter

    CRPS

    Use search as contrastive evidence
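
The contrast between the two views can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Trajectory` record, its `reward` and `correct` fields, and the rule that the "hard" negative is the highest-reward failed path are all hypothetical, not taken from the CRPS pipeline. The step that turns the pair into a synthesized target is not described here, so it is left out.

```python
from dataclasses import dataclass


@dataclass
class Trajectory:
    text: str       # reasoning trace produced along one search path
    reward: float   # hypothetical scalar score assigned during search
    correct: bool   # whether the final answer was judged correct


def as_filter(trajs):
    """Filter view (traditional MCTS/RFT): keep successful paths, drop the rest."""
    return [t for t in trajs if t.correct]


def as_contrastive_evidence(trajs):
    """Contrastive view (sketch): pair the best success with a hard negative."""
    positives = [t for t in trajs if t.correct]
    negatives = [t for t in trajs if not t.correct]
    if not positives or not negatives:
        return None  # no contrast available for this case
    tau_pos = max(positives, key=lambda t: t.reward)
    # Assumption: "hard" means the failed path that scored highest.
    tau_neg = max(negatives, key=lambda t: t.reward)
    return tau_pos, tau_neg


explored = [
    Trajectory("path A", 0.9, True),
    Trajectory("path B", 0.7, False),
    Trajectory("path C", 0.1, False),
]
kept = as_filter(explored)                # only the successful path survives
pair = as_contrastive_evidence(explored)  # (tau+, hard tau-) kept together
```

In the filter view the failed paths never reach the training set; in the contrastive view the hard negative is retained alongside tau+ so that a later step can use both.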

    Data provenance: tau+/tau- from data/full_experiment/phase1/gpu1/pairs.jsonl; synthesized target from data/full_experiment/crps_30k_train.jsonl.
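
To inspect the files named above without assuming anything about their schema, a minimal JSON Lines reader is enough. The helper name `peek_jsonl` is hypothetical, and the field names inside each record are not documented here, so the sketch only returns the parsed records as dictionaries.

```python
import json
from pathlib import Path


def peek_jsonl(path, n=1):
    """Return the first n non-empty records of a JSON Lines file as parsed objects."""
    records = []
    with Path(path).open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if not line:
                continue  # skip blank lines between records
            records.append(json.loads(line))
            if len(records) >= n:
                break
    return records
```

For example, `peek_jsonl("data/full_experiment/phase1/gpu1/pairs.jsonl")` would return the first record of the pair file, and calling `.keys()` on it would list whatever fields that record actually contains.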