🚀 LIDA is now open source on GitHub. Try it out locally on your own data! 08/14/2023 Learn more.

Automatic Generation of Visualizations and Infographics with LLMs

Works with any programming language or visualization grammar

Systems that support users in the automatic creation of visualizations must address several subtasks: understanding the semantics of data, enumerating relevant visualization goals, and generating visualization specifications. In this work, we pose visualization generation as a multi-stage generation problem and argue that well-orchestrated pipelines based on large language models (LLMs) and image generation models (IGMs) are well suited to addressing these tasks. We present LIDA, a novel tool for generating grammar-agnostic visualizations and infographics. LIDA comprises four modules: a SUMMARIZER that converts data into a rich but compact natural language summary, a GOAL EXPLORER that enumerates visualization goals given the data, a VISGENERATOR that generates, refines, executes, and filters visualization code, and an INFOGRAPHER module that yields data-faithful stylized graphics using IGMs. LIDA provides a Python API and a hybrid user interface (direct manipulation and multilingual natural language) for interactive chart, infographic, and data story generation.

Install via pip

pip install lida

Features - What Can You Do with LIDA?

Click on the video below for an overview of the capabilities of the LIDA user interface.

Watch Video

LIDA leverages the language modeling and code writing capabilities of state-of-the-art LLMs to enable core automated visualization capabilities (data summarization, goal exploration, visualization generation, infographics generation) as well as operations on existing visualizations (visualization explanation, self-evaluation, automatic repair, recommendation).

AutoViz: Core Automated Visualization Capabilities

Data Summarization

Datasets can be massive. LIDA summarizes data into a compact but information-dense natural language representation that serves as grounding context for all subsequent operations.
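To make the idea of a compact summary concrete, here is a minimal sketch using only the Python standard library. It is not LIDA's SUMMARIZER: the function names (infer_type, summarize_csv), the output fields, and the sample data are all assumptions chosen for illustration. It reduces a CSV to per-column types, value ranges, and a few sample values, which is the kind of small representation that can be placed in an LLM prompt instead of the full dataset.

```python
import csv
import io
import json


def infer_type(values):
    """Classify a column as 'number' or 'string' from its non-empty values."""
    try:
        for v in values:
            float(v)
        return "number"
    except (TypeError, ValueError):
        return "string"


def summarize_csv(text, n_samples=3):
    """Build a compact, JSON-serializable summary of CSV text (hypothetical format)."""
    rows = list(csv.DictReader(io.StringIO(text)))
    fields = []
    for name in (rows[0].keys() if rows else []):
        # Skip empty cells and cells missing from short rows.
        values = [r[name] for r in rows if r[name] not in ("", None)]
        dtype = infer_type(values)
        props = {
            "dtype": dtype,
            "num_unique": len(set(values)),
            "samples": values[:n_samples],
        }
        if dtype == "number" and values:
            nums = [float(v) for v in values]
            props["min"], props["max"] = min(nums), max(nums)
        fields.append({"column": name, "properties": props})
    return {"num_rows": len(rows), "fields": fields}


# Made-up sample data, used only to exercise the sketch.
data = "name,mpg\nfiat,30.5\nford,18\nbmw,24\n"
summary = summarize_csv(data)
print(json.dumps(summary, indent=2))
```

The summary grows with the number of columns rather than the number of rows, which is what keeps it usable as prompt context for large datasets.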

Automated Data Exploration

Unfamiliar with a dataset? LIDA provides a fully automated mode that generates meaningful visualization goals based on the dataset. EDA for free.

Grammar-Agnostic Visualizations

Want visualizations created in Python with Altair, Matplotlib, Seaborn, etc.? How about R or C++? LIDA is grammar-agnostic, i.e., it can generate visualizations in any grammar represented as code.
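One way to picture grammar-agnostic generation is a prompt builder that swaps in a different code skeleton for each target library. The sketch below is an assumption for illustration only: the SCAFFOLDS table, the build_prompt function, and the template text are hypothetical and are not taken from LIDA's source.

```python
# Hypothetical scaffolds: one code skeleton per target library.
SCAFFOLDS = {
    "matplotlib": (
        "import matplotlib.pyplot as plt\n"
        "def plot(data):\n"
        "    # <model fills in plotting code>\n"
        "    return plt\n"
    ),
    "altair": (
        "import altair as alt\n"
        "def plot(data):\n"
        "    # <model fills in chart definition>\n"
        "    return chart\n"
    ),
}


def build_prompt(summary, goal, library):
    """Assemble a generation prompt for the requested library."""
    if library not in SCAFFOLDS:
        raise ValueError(f"no scaffold registered for {library!r}")
    return (
        f"Dataset summary:\n{summary}\n\n"
        f"Goal: {goal}\n\n"
        f"Complete this {library} template and return only code:\n"
        f"{SCAFFOLDS[library]}"
    )


prompt = build_prompt(
    "columns: name (string), mpg (number)",
    "Show mpg by name",
    "altair",
)
print(prompt)
```

Under this design, supporting another grammar means registering one more skeleton; the rest of the pipeline only ever handles code as text.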

Infographics Generation

Convert data into rich, embellished, engaging stylized infographics using image generation models. Think data stories and personalization (brand, style, marketing, etc.).

VizOps: Operations on Generated Visualizations

Visualization Explanation

Get detailed descriptions of visualization code. This has applications in accessibility, data literacy, education, and debugging/sensemaking of visualizations.

Self-Evaluation

LLMs like GPT-3.5 and GPT-4 encode visualization best practices. LIDA applies these capabilities to generate multi-dimensional evaluation scores for visualizations represented as code.

Visualization Repair

LIDA provides methods to automatically improve visualizations (via self-evaluation feedback) or repair visualizations based on user-provided or compiler feedback.
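The repair idea can be sketched as a loop: run the generated code, and on failure pass the code and the error message back to a fixer, then retry. The following is a minimal illustration under stated assumptions, not LIDA's implementation. The names run_code, repair_loop, and toy_fix are hypothetical, and toy_fix is a plain string replacement standing in for what would be an LLM call in a real system.

```python
import traceback


def run_code(code):
    """Execute code in a fresh namespace; return (ok, error_message)."""
    try:
        exec(code, {"__name__": "__generated__"})
        return True, ""
    except Exception:
        return False, traceback.format_exc(limit=1)


def repair_loop(code, fix_fn, max_attempts=3):
    """Run code; on failure pass code and error to fix_fn and retry."""
    history = []
    for _ in range(max_attempts):
        ok, error = run_code(code)
        history.append(error)
        if ok:
            return code, history
        code = fix_fn(code, error)
    raise RuntimeError("could not repair code within max_attempts")


def toy_fix(code, error):
    """Stand-in for an LLM call (assumption): patches one known mistake."""
    return code.replace("valeus", "values")


broken = "values = [3, 1, 2]\ntotal = sum(valeus)\n"
fixed, history = repair_loop(broken, toy_fix)
print(fixed)
```

Note that exec runs the code inside the current process, so this sketch offers no isolation; it only shows the feedback loop.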

Visualization Recommendations

Given some context (goals, or an existing visualization), LIDA can recommend additional visualizations that may be useful to the user (e.g., for comparison, or to provide additional perspectives).

Learn more in the paper

System Architecture

Figure: System architecture for LIDA

LLM = Large Language Model | IGM = Image Generation Model

Example infographics generated with LIDA

FAQ

What are known limitations?

LIDA may not work well for visualization grammars that are not well represented in the LLM's training dataset. Similarly, we will likely see improved performance on datasets that resemble example datasets available online.

Performance is bottlenecked by the choice of visualization libraries and by the degrees of freedom accorded to the model in generating visualizations (e.g., a strict scaffold constrained to visualization generation only vs. a generation scaffold with access to multiple libraries and general code writing capabilities).

LIDA currently requires code execution. While effort is made to constrain the scope of generated code (via scaffolding), a sandbox environment is recommended to ensure safe code execution.
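As a rough illustration of keeping generated code away from the host process, the sketch below runs it in a separate Python interpreter with a timeout and a throwaway working directory. This is an assumption about one possible setup, not how LIDA executes code, and run_isolated is a hypothetical helper. It provides process isolation only: it does not restrict network or filesystem access, so a container or virtual machine is still the safer choice for untrusted code.

```python
import subprocess
import sys
import tempfile


def run_isolated(code, timeout_s=10):
    """Run code in a separate Python process.

    Returns (returncode, stdout, stderr); returncode is None on timeout.
    """
    with tempfile.TemporaryDirectory() as workdir:
        try:
            proc = subprocess.run(
                # -I starts Python in isolated mode: environment variables
                # such as PYTHONPATH and the user site directory are ignored.
                [sys.executable, "-I", "-c", code],
                cwd=workdir,
                capture_output=True,
                text=True,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            return None, "", "timed out"
        return proc.returncode, proc.stdout, proc.stderr


rc, out, err = run_isolated("print(1 + 1)")
print(rc, out.strip())
```

A non-zero return code together with the captured stderr gives the caller an error message to report or to feed into a repair step.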

Learn more in the paper

How is this built?

Source Code?

BibTeX

A paper on LIDA is available on arXiv and has been accepted at the 2023 ACL Conference (System Demonstrations).

@article{dibia2023lida,
    title={LIDA: A Tool for Automatic Generation of Grammar-Agnostic Visualizations and Infographics using Large Language Models},
    author={Victor Dibia},
    year={2023},
    booktitle = "Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics: System Demonstrations",
    publisher = "Association for Computational Linguistics",
    month={March}, 
    day={6},
    eprint={2303.02927},
    archivePrefix={arXiv},
    primaryClass={cs.AI}
}

Updates

08/14/2023 🚀 LIDA is now open source on GitHub. Try it out locally on your own data!

learn more.

05/08/2023 The LIDA paper has been accepted for publication at the ACL 2023 Conference (Demonstration Track).

learn more.