LIDA | LIDA: Automated Visualizations with LLMs
🚀 LIDA is now open source on GitHub. Try it out locally on your own data! (08/14/2023)

Automatic Generation of Visualizations and Infographics with LLMs

Works with any programming language or visualization grammar
Systems that support users in the automatic creation of visualizations must address several subtasks: understanding the semantics of data, enumerating relevant visualization goals, and generating visualization specifications. In this work, we pose visualization generation as a multi-stage generation problem and argue that well-orchestrated pipelines based on large language models (LLMs) and image generation models (IGMs) are well suited to addressing these tasks. We present LIDA, a novel tool for generating grammar-agnostic visualizations and infographics. LIDA comprises four modules: a SUMMARIZER that converts data into a rich but compact natural language summary, a GOAL EXPLORER that enumerates visualization goals given the data, a VISGENERATOR that generates, refines, executes, and filters visualization code, and an INFOGRAPHER module that yields data-faithful stylized graphics using IGMs. LIDA provides a Python API and a hybrid user interface (direct manipulation and multilingual natural language) for interactive chart, infographic, and data story generation.
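The multi-stage flow described above can be sketched as plain function composition. This is an illustrative sketch only, not LIDA's actual API: `run_pipeline`, `Chart`, the prompt wording, and the `fake_model` stand-in are all hypothetical, and the LLM is replaced by any callable that maps a prompt to text. The INFOGRAPHER stage is omitted because it depends on an image generation model.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical stand-in for an LLM: any function mapping a prompt to text.
TextGen = Callable[[str], str]


@dataclass
class Chart:
    goal: str
    code: str


def summarize(rows: List[dict]) -> str:
    """Stage 1 (SUMMARIZER): compress the data into a short text summary."""
    columns = sorted(rows[0].keys()) if rows else []
    return f"{len(rows)} rows; columns: {', '.join(columns)}"


def explore_goals(summary: str, generate: TextGen, n: int) -> List[str]:
    """Stage 2 (GOAL EXPLORER): ask the model for n goals, one per line."""
    reply = generate(f"Dataset: {summary}\nPropose {n} visualization goals, one per line.")
    return [line.strip() for line in reply.splitlines() if line.strip()][:n]


def generate_chart(summary: str, goal: str, generate: TextGen) -> Chart:
    """Stage 3 (VISGENERATOR): ask the model for code that addresses one goal."""
    code = generate(f"Dataset: {summary}\nGoal: {goal}\nWrite plotting code.")
    return Chart(goal=goal, code=code)


def run_pipeline(rows: List[dict], generate: TextGen, n: int = 2) -> List[Chart]:
    """Chain the stages: the summary grounds both goal and code generation."""
    summary = summarize(rows)
    goals = explore_goals(summary, generate, n)
    return [generate_chart(summary, goal, generate) for goal in goals]


def fake_model(prompt: str) -> str:
    """Canned replies so the sketch runs without any model or network access."""
    if "visualization goals" in prompt:
        return "Compare price by brand\nShow the distribution of price"
    return "# plotting code would go here"


charts = run_pipeline([{"brand": "a", "price": 1}], fake_model, n=2)
```

The point of the design is that every later stage receives the same compact summary rather than the raw data, so the stages can be swapped or re-run independently.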
Install via pip
pip install lida
Features - What Can You Do with LIDA?
Click on the video below for an overview of the capabilities of the LIDA user interface.
Watch Video
LIDA leverages the language modeling and code writing capabilities of state-of-the-art LLMs to enable core automated visualization capabilities (data summarization, goal exploration, visualization generation, infographics generation) as well as operations on existing visualizations (visualization explanation, self-evaluation, automatic repair, recommendation).
AutoViz: Core automated visualization capabilities
Data Summarization
Datasets can be massive. LIDA summarizes data into a compact but information-dense natural language representation that is used as grounding context for all subsequent operations.
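As a rough illustration of what a compact summary might contain, the sketch below reduces a CSV table to per-column statistics and a few sample values. It is not LIDA's summarizer: the function name `summarize_csv`, the output fields, and the sample data are assumptions chosen for this example, and it uses only the Python standard library.

```python
import csv
import io
from typing import List, Optional


def _as_floats(values: List[str]) -> Optional[List[float]]:
    """Return the values as floats, or None if any value is not numeric."""
    try:
        return [float(v) for v in values] if values else None
    except ValueError:
        return None


def summarize_csv(text: str, max_samples: int = 3) -> dict:
    """Build a small, JSON-serializable summary of a CSV table (hypothetical format)."""
    rows = list(csv.DictReader(io.StringIO(text)))
    fields = []
    for name in (rows[0].keys() if rows else []):
        # Skip empty cells so they do not distort type inference.
        values = [r[name] for r in rows if r[name] not in ("", None)]
        entry = {
            "column": name,
            "num_unique": len(set(values)),
            "samples": values[:max_samples],
        }
        numeric = _as_floats(values)
        if numeric is not None:
            entry.update(dtype="number", min=min(numeric), max=max(numeric))
        else:
            entry["dtype"] = "string"
        fields.append(entry)
    return {"num_rows": len(rows), "fields": fields}


data = "city,temp\nOslo,3.5\nCairo,30\nOslo,5\n"
summary = summarize_csv(data)
```

A summary like this stays roughly the same size however many rows the dataset has, which is what makes it usable as prompt context.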
Automated Data Exploration
Unfamiliar with a dataset? LIDA provides a fully automated mode that generates meaningful visualization goals based on the dataset. EDA for free.
Grammar-Agnostic Visualizations
Want visualizations created in Python with Altair, Matplotlib, Seaborn, etc.? How about R or C++? LIDA is grammar agnostic, i.e., it can generate visualizations in any grammar that can be represented as code.
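One way to think about grammar-agnostic generation is that the target library only changes the code template the model is asked to complete. The sketch below illustrates that idea under stated assumptions: the `SCAFFOLDS` templates, the `build_prompt` function, and the prompt wording are hypothetical and are not taken from LIDA's implementation.

```python
# Hypothetical per-library templates. Supporting a new grammar means
# registering one more template; the rest of the pipeline is unchanged.
SCAFFOLDS = {
    "matplotlib": (
        "import matplotlib.pyplot as plt\n"
        "def plot(data):\n"
        "    # <model fills in plotting code here>\n"
        "    return plt\n"
    ),
    "altair": (
        "import altair as alt\n"
        "def plot(data):\n"
        "    chart = None  # <model fills in the chart specification here>\n"
        "    return chart\n"
    ),
}


def build_prompt(summary: str, goal: str, library: str) -> str:
    """Assemble a code-generation prompt for the requested library."""
    if library not in SCAFFOLDS:
        raise ValueError(f"no scaffold registered for {library!r}")
    return (
        f"Dataset summary:\n{summary}\n\n"
        f"Goal: {goal}\n\n"
        f"Complete this {library} template and return only code:\n"
        f"{SCAFFOLDS[library]}"
    )


prompt = build_prompt("3 rows; columns: city, temp", "Compare temp by city", "altair")
```

Because the templates here are plain strings, nothing in this sketch requires Matplotlib or Altair to be installed until the generated code is actually executed.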
Infographics Generation
Convert data into rich, embellished, engaging stylized infographics using image generation models. Think data stories and personalization (brand, style, marketing, etc.).
VizOps: Operations on Generated Visualizations
Visualization Explanation
Get detailed descriptions of visualization code. This has applications in accessibility, data literacy, education, and debugging/sensemaking of visualizations.
Self-Evaluation
LLMs like GPT-3.5 and GPT-4 encode visualization best practices. LIDA applies this knowledge to generate multi-dimensional evaluation scores for visualizations represented as code.
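To make "multi-dimensional evaluation scores" concrete, the sketch below validates and aggregates a model reply that scores a chart along several dimensions. Everything specific here is an assumption made for illustration: the dimension names, the 1 to 10 scale, the JSON reply format, and the `parse_scores` function are not taken from LIDA.

```python
import json
from statistics import mean
from typing import Dict, Sequence

# Assumed dimension names and scale, for illustration only.
DIMENSIONS = ("bugs", "encoding", "aesthetics")
MIN_SCORE, MAX_SCORE = 1, 10


def parse_scores(reply: str, dimensions: Sequence[str] = DIMENSIONS) -> Dict[str, object]:
    """Validate a JSON list of {"dimension", "score", "rationale"} items."""
    by_name = {item["dimension"]: item for item in json.loads(reply)}
    missing = [d for d in dimensions if d not in by_name]
    if missing:
        raise ValueError(f"model reply is missing dimensions: {missing}")
    scores = {}
    for name in dimensions:
        score = float(by_name[name]["score"])
        if not MIN_SCORE <= score <= MAX_SCORE:
            raise ValueError(f"score for {name!r} is out of range: {score}")
        scores[name] = score
    return {"scores": scores, "mean": mean(scores.values())}


# A canned reply standing in for real model output.
reply = json.dumps([
    {"dimension": "bugs", "score": 9, "rationale": "code runs without errors"},
    {"dimension": "encoding", "score": 6, "rationale": "axis choice hides the trend"},
    {"dimension": "aesthetics", "score": 6, "rationale": "labels overlap"},
])
result = parse_scores(reply)
```

Validating the reply before using it matters because model output is free text and may omit a dimension or return a score outside the requested range.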
Visualization Repair
LIDA provides methods to automatically improve visualizations (via self-evaluation feedback) or to repair visualizations based on user-provided or compiler feedback.
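Repair from compiler or runtime feedback can be pictured as a bounded loop: run the code, capture the error, hand both back to a fixer, and try again. The sketch below is a minimal illustration under stated assumptions, not LIDA's repair method: `try_run`, `repair_loop`, and `fake_fix` are hypothetical names, and `fake_fix` stands in for a model call.

```python
import traceback
from typing import Callable, Optional, Tuple


def try_run(code: str) -> Optional[str]:
    """Run code in a fresh namespace; return the traceback text, or None on success.

    Note: exec runs in the current process and is unsafe for untrusted code.
    """
    try:
        exec(compile(code, "<generated>", "exec"), {})
        return None
    except Exception:
        return traceback.format_exc()


def repair_loop(
    code: str,
    fix: Callable[[str, str], str],
    max_attempts: int = 3,
) -> Tuple[str, bool]:
    """Alternate between running the code and asking `fix` to repair it."""
    for _ in range(max_attempts):
        error = try_run(code)
        if error is None:
            return code, True
        # The error text is the feedback that grounds the next attempt.
        code = fix(code, error)
    return code, try_run(code) is None


def fake_fix(code: str, error: str) -> str:
    """Stand-in for a model call: patch the one typo this example contains."""
    return code.replace("totl", "total") if "NameError" in error else code


broken = "total = 1 + 2\nresult = totl * 2\n"
fixed_code, ok = repair_loop(broken, fake_fix)
```

Capping the number of attempts keeps the loop from running indefinitely when the fixer cannot resolve the error.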
Visualization Recommendations
Given some context (goals, or an existing visualization), LIDA can recommend additional visualizations that may be useful to the user (e.g., for comparison, or to provide additional perspectives).
System Architecture
System architecture for LIDA
LLM = Large Language Model | IGM = Image Generation Model
Example infographics generated with LIDA
FAQ
What are known limitations?
LIDA may not work well for visualization grammars that are not well represented in the LLM's training dataset. Similarly, we will likely see improved performance on datasets that resemble example datasets available online.
Performance is bottlenecked by the choice of visualization libraries and by the degrees of freedom accorded to the model when generating visualizations (e.g., a strict scaffold constrained to visualization generation only vs. a generation scaffold with access to multiple libraries and general code writing capabilities).
LIDA currently requires code execution. While effort is made to constrain the scope of generated code (via scaffolding), a sandbox environment is recommended to ensure safe code execution.
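As one small step toward safer execution, generated code can at least be run in a separate interpreter process with a timeout and a throwaway working directory. The sketch below shows that idea using only the Python standard library. It is an assumption-laden illustration, not LIDA's execution mechanism, and it is not a real sandbox: the child process can still read files and use the network, so a container or virtual machine is still advisable for untrusted code. The name `run_isolated` is hypothetical.

```python
import subprocess
import sys
import tempfile
from pathlib import Path


def run_isolated(code: str, timeout_s: float = 10.0) -> subprocess.CompletedProcess:
    """Run code in a child Python process inside a temporary directory.

    Raises subprocess.TimeoutExpired if the code runs longer than timeout_s.
    """
    with tempfile.TemporaryDirectory() as workdir:
        script = Path(workdir) / "generated.py"
        script.write_text(code, encoding="utf-8")
        return subprocess.run(
            # -I starts Python in isolated mode: it ignores PYTHON*
            # environment variables and the user site-packages directory.
            [sys.executable, "-I", str(script)],
            cwd=workdir,
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )


result = run_isolated("print(6 * 7)")
```

A crash or infinite loop in the generated code then surfaces as a non-zero return code or a timeout in the parent process instead of taking the parent down with it.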
BibTeX
A paper on LIDA is available on arXiv and has been accepted at the 2023 ACL Conference (System Demonstrations).
@inproceedings{dibia2023lida,
    title={LIDA: A Tool for Automatic Generation of Grammar-Agnostic Visualizations and Infographics using Large Language Models},
    author={Victor Dibia},
    year={2023},
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: System Demonstrations",
    publisher = "Association for Computational Linguistics",
    month={March}, 
    day={6},
    eprint={2303.02927},
    archivePrefix={arXiv},
    primaryClass={cs.AI}
}
Updates
08/14/2023: 🚀 LIDA is now open source on GitHub. Try it out locally on your own data!
05/08/2023: The LIDA paper has been accepted for publication at the ACL 2023 Conference (Demonstration Track).