Performance rankings across the overall average, five content domains, and four capability dimensions. Each chart shows Hard (upper) and Easy (lower) scores per model.
By Content Domain
By Capability Dimension
We introduce BizGenEval, a systematic benchmark for commercial visual content generation. The benchmark spans five representative document types (slides, charts, webpages, posters, and scientific figures) and evaluates four key capability dimensions (text rendering, layout control, attribute binding, and knowledge-based reasoning). Crossing the two yields 20 evaluation tasks. BizGenEval contains 400 carefully curated prompts and 8,000 human-verified checklist questions that assess whether generated images satisfy complex visual and semantic constraints.
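The task count follows from crossing the five document types with the four capability dimensions. The short Python sketch below enumerates that grid and derives the average number of prompts per task and checklist questions per prompt from the stated totals. The variable names are illustrative only. The derived figures are averages: the text above does not say that prompts or questions are distributed evenly.

```python
from itertools import product

# Domain and dimension names as listed in the description above.
DOMAINS = ["slides", "charts", "webpages", "posters", "scientific figures"]
DIMENSIONS = [
    "text rendering",
    "layout control",
    "attribute binding",
    "knowledge-based reasoning",
]

# Totals stated in the description above.
TOTAL_PROMPTS = 400
TOTAL_QUESTIONS = 8000

# Each (domain, dimension) pair is one evaluation task.
tasks = list(product(DOMAINS, DIMENSIONS))

# Averages only; an even split is an assumption, not a stated fact.
avg_prompts_per_task = TOTAL_PROMPTS / len(tasks)
avg_questions_per_prompt = TOTAL_QUESTIONS / TOTAL_PROMPTS

print(f"tasks: {len(tasks)}")
print(f"average prompts per task: {avg_prompts_per_task}")
print(f"average checklist questions per prompt: {avg_questions_per_prompt}")
```

Keeping a task as a (domain, dimension) pair makes it straightforward to group scores by either axis, which matches the per-domain and per-dimension views of the rankings.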
Side-by-side qualitative comparisons reveal where current models succeed and fail. Correct regions are highlighted in blue, incorrect regions in red.
If BizGenEval is useful for your research, please consider citing our paper.