BoxComm: Benchmarking Category-Aware Commentary Generation and Narration Rhythm in Boxing
445 matches
77.8 hours of World Boxing Championship videos
52K
Broadcast commentary sentences with category labels
260K
Millisecond-level punch events
Overview
Why we need BoxComm?
Key Ingredients
What makes BoxComm different?
🎙️
Category-aware commentary
Each sentence is labeled as play-by-play, tactical, or contextual, so the task is explicitly about discourse control in combat-sport commentary rather than generic narration alone.
👊
Fine-grained punch events
Detected events carry boxer side, punch technique, target area, and effectiveness to support structured understanding of combat exchanges.
⏱️
Structured evaluation
We do not rely on a single holistic judge score. Instead, BoxComm factorizes commentary quality into discourse type, local semantic correctness, and global narration rhythm, making evaluation more structured than unconstrained commentary scoring.
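As a rough illustration of the three ingredients above, the sketch below models a category-labeled commentary sentence and a punch event as plain Python records. It is not the released BoxComm schema: every field name (`start_ms`, `time_ms`, `boxer_side`, and so on), the example values, the boolean encoding of effectiveness, and both helper functions are hypothetical and chosen only for illustration.

```python
from collections import Counter
from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    """The three commentary categories named above."""
    PLAY_BY_PLAY = "play-by-play"
    TACTICAL = "tactical"
    CONTEXTUAL = "contextual"


@dataclass(frozen=True)
class CommentarySentence:
    # Field names are assumptions, not the dataset's actual keys.
    start_ms: int
    end_ms: int
    text: str
    category: Category


@dataclass(frozen=True)
class PunchEvent:
    # Field names and value vocabularies are assumptions.
    time_ms: int
    boxer_side: str   # e.g. "red" or "blue"
    technique: str    # e.g. "jab"
    target: str       # e.g. "head"
    effective: bool   # effectiveness simplified to a boolean here


def category_distribution(sentences):
    """Return the fraction of sentences in each category (0.0 if empty)."""
    counts = Counter(s.category for s in sentences)
    total = sum(counts.values())
    return {c: (counts[c] / total if total else 0.0) for c in Category}


def events_in_window(events, start_ms, end_ms):
    """Return the events whose timestamp falls in [start_ms, end_ms)."""
    return [e for e in events if start_ms <= e.time_ms < end_ms]
```

Under these assumptions, `category_distribution` gives a simple view of how a commentary track divides its sentences among the three discourse types, and `events_in_window` selects the punch events that a given sentence could be describing.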
BoxComm dataset statistics
Construction
How the dataset is built
BoxComm is constructed by aligning commentary extracted from broadcasts with fine-grained atom events detected in boxing exchanges.
Commentary extraction and alignment pipeline from professional boxing broadcasts.
Atom-event extraction pipeline for millisecond-level punch events used in BoxComm.
Acknowledgments
This research was supported by Huawei’s AI Hundred Schools Program and was carried out using the Huawei Ascend AI technology stack. Additionally, we would like to acknowledge the Xinjiang Uygur Autonomous Region Sports Science Research Center and the research group led by Prof. Qingmin Fan at Beijing Sport University for their critical assistance with the data collection and annotation iteration processes.