BoxComm: Benchmarking Category-Aware Commentary Generation and Narration Rhythm in Boxing

Kaiwen Wang1,* Kaili Zheng1,* Yiming Shi1 Rongrong Deng2 Chenyi Guo1,† Ji Wu1,†
1 Tsinghua University · 2 Beijing Sport University
* Equal contribution · † Corresponding author
445 matches
77.8 hours
World Boxing Championship videos
52K Broadcast commentary sentences with category labels
260K Millisecond-level punch events
Overview

Why do we need BoxComm?

BoxComm teaser figure
Existing sports commentary benchmarks mainly focus on generic narration or sentence-level alignment. In contrast, BoxComm is designed for boxing commentary, where both what to say and when to say it matter. Boxing involves highly dynamic, sub-second actions and a much higher proportion of tactical commentary than team sports (45.6% vs. 21.7%), making direct transfer from generic sports narration benchmarks insufficient. To reflect this, BoxComm evaluates commentary along two complementary dimensions: discourse type and narration rhythm.
Key Ingredients

What makes BoxComm different?

🎙️
Category-aware commentary: Each sentence is labeled as play-by-play, tactical, or contextual, so the task is explicitly about discourse control in combat-sport commentary rather than generic narration alone.
👊
Fine-grained punch events: Detected events carry boxer side, punch technique, target area, and effectiveness to support structured understanding of combat exchanges.
⏱️
Structured evaluation: We do not rely on a single holistic judge score. Instead, BoxComm factorizes commentary quality into discourse type, local semantic correctness, and global narration rhythm, making evaluation more structured than unconstrained commentary scoring.
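To make the two annotation layers described above concrete, here is a minimal Python sketch of how a labeled commentary sentence and a punch event might be represented. Only the three category names and the four event attributes (boxer side, technique, target area, effectiveness) come from the text; every field name, value encoding, and the helper `category_shares` are hypothetical and may not match the released data format.

```python
from collections import Counter
from dataclasses import dataclass

# Category names come from the text above; all field names and value
# encodings below are illustrative assumptions, not the released schema.
CATEGORIES = ("play-by-play", "tactical", "contextual")


@dataclass(frozen=True)
class CommentarySentence:
    start_s: float   # sentence start time in seconds (assumed unit)
    end_s: float     # sentence end time in seconds (assumed unit)
    text: str
    category: str    # expected to be one of CATEGORIES


@dataclass(frozen=True)
class PunchEvent:
    time_ms: int       # event timestamp in milliseconds
    boxer_side: str    # e.g. "red" or "blue" (assumed encoding)
    technique: str     # e.g. "jab" or "hook" (assumed vocabulary)
    target: str        # e.g. "head" or "body" (assumed vocabulary)
    effective: bool    # assumed to be a binary flag


def category_shares(sentences):
    """Return the fraction of sentences that fall in each category."""
    counts = Counter(s.category for s in sentences)
    total = sum(counts.values())
    return {c: (counts[c] / total if total else 0.0) for c in CATEGORIES}
```

A per-category share of this kind is the sort of quantity behind a comparison such as the proportion of tactical commentary in boxing versus team sports.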
BoxComm statistics
BoxComm dataset statistics
Construction

How the dataset is built

BoxComm is constructed by extracting broadcast commentary and aligning it with fine-grained atom events detected in boxing exchanges.

Commentary extraction via ASR
Commentary extraction and alignment pipeline from professional boxing broadcasts.
Atom event extraction pipeline
Atom-event extraction pipeline for millisecond-level punch events used in BoxComm.
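The page does not spell out how commentary sentences are paired with punch events, so the following is only a sketch of one plausible timestamp-based pairing, not the authors' pipeline. It assumes event timestamps are sorted in ascending order and collects the events that fall between a short lookback before a sentence starts and the moment it ends. The function name `events_in_window` and the `lookback_ms` parameter are hypothetical.

```python
from bisect import bisect_left, bisect_right


def events_in_window(event_times_ms, start_ms, end_ms, lookback_ms=2000):
    """Return indices of events with start_ms - lookback_ms <= t <= end_ms.

    event_times_ms must be sorted in ascending order. The lookback models
    commentary that trails the action; its default value is an assumption.
    """
    lo = bisect_left(event_times_ms, start_ms - lookback_ms)
    hi = bisect_right(event_times_ms, end_ms)
    return list(range(lo, hi))


# Hypothetical punch timestamps and one sentence spanning 3.0 s to 5.0 s.
punch_times = [500, 1200, 3000, 4100, 9000]
matched = events_in_window(punch_times, start_ms=3000, end_ms=5000)
```

Binary search keeps each lookup logarithmic in the number of events, which matters when hundreds of thousands of punch events are matched against tens of thousands of sentences.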
Acknowledgments

This research was supported by Huawei’s AI Hundred Schools Program and was carried out using the Huawei Ascend AI technology stack. We also thank the Xinjiang Uygur Autonomous Region Sports Science Research Center and the research group led by Prof. Qingmin Fan at Beijing Sport University for their critical assistance with data collection and iterative annotation.