agentchat.contrib.img_utils

get_pil_image

def get_pil_image(image_file: Union[str, Image.Image]) -> Image.Image

Loads an image from a file and returns a PIL Image object.

Arguments:

image_file str, or Image - The filename, URL, URI, or base64 string of the image file.

Returns:

Image.Image - The PIL Image object.

get_image_data

def get_image_data(image_file: Union[str, Image.Image], use_b64=True) -> bytes

Loads an image and returns its data either as raw bytes or in base64-encoded format.

This function first loads an image from the specified file, URL, or base64 string using the get_pil_image function. It then saves this image in memory in PNG format and retrieves its binary content. Depending on the use_b64 flag, this binary content is either returned directly or as a base64-encoded string.

Arguments:

image_file str, or Image - The path to the image file, a URL to an image, or a base64-encoded string of the image.
use_b64 bool - If True, the function returns a base64-encoded string of the image data. If False, it returns the raw byte data of the image. Defaults to True.

Returns:

bytes - The image data in raw bytes if use_b64 is False, or a base64-encoded string if use_b64 is True.

llava_formatter

def llava_formatter(prompt: str,
                    order_image_tokens: bool = False) -> Tuple[str, List[str]]

Formats the input prompt by replacing image tags and returns the new prompt along with image locations.

Arguments:

prompt (str): The input string that may contain image tags like <img ...>.
order_image_tokens (bool, optional): Whether to order the image tokens with numbers. It will be useful for GPT-4V. Defaults to False.

Returns:

Tuple[str, List[str]]: A tuple containing the formatted string and a list of images (loaded in b64 format).

pil_to_data_uri

def pil_to_data_uri(image: Image.Image) -> str

Converts a PIL Image object to a data URI.

Arguments:

image Image.Image - The PIL Image object.

Returns:

str - The data URI string.

gpt4v_formatter

def gpt4v_formatter(prompt: str,
                    img_format: str = "uri") -> List[Union[str, dict]]

Formats the input prompt by replacing image tags and returns a list of text and images.

Arguments:

prompt (str): The input string that may contain image tags like <img ...>.
img_format (str): what image format should be used. One of "uri", "url", "pil".

Returns:

List[Union[str, dict]]: A list of alternating text and image dictionary items.

extract_img_paths

def extract_img_paths(paragraph: str) -> list

Extract image paths (URLs or local paths) from a text paragraph.

Arguments:

paragraph str - The input text paragraph.

Returns:

list - A list of extracted image paths.

message_formatter_pil_to_b64

def message_formatter_pil_to_b64(messages: List[Dict]) -> List[Dict]

Converts the PIL image URLs in the messages to base64 encoded data URIs.

This function iterates over a list of message dictionaries. For each message, if it contains a 'content' key with a list of items, it looks for items with an 'image_url' key. The function then converts the PIL image URL (pointed to by 'image_url') to a base64 encoded data URI.

Arguments:

messages List[Dict] - A list of message dictionaries. Each dictionary may contain a 'content' key with a list of items, some of which might be image URLs.

Returns:

List[Dict] - A new list of message dictionaries with PIL image URLs in the 'image_url' key converted to base64 encoded data URIs.

Example Input: [
{'content' - [{'type': 'text', 'text': 'You are a helpful AI assistant.'}], 'role': 'system'},
{'content' - [
{'type' - 'text', 'text': "What's the breed of this dog here? "},
{'type' - 'image_url', 'image_url': {'url': a PIL.Image.Image}},
{'type' - 'text', 'text': '.'}],
'role' - 'user'} ]

Example Output: [
{'content' - [{'type': 'text', 'text': 'You are a helpful AI assistant.'}], 'role': 'system'},
{'content' - [
{'type' - 'text', 'text': "What's the breed of this dog here? "},
{'type' - 'image_url', 'image_url': {'url': a B64 Image}},
{'type' - 'text', 'text': '.'}],
'role' - 'user'} ]

get_pil_image​

get_image_data​

llava_formatter​

pil_to_data_uri​

gpt4v_formatter​

extract_img_paths​

message_formatter_pil_to_b64​

get_pil_image

get_image_data

llava_formatter

pil_to_data_uri

gpt4v_formatter

extract_img_paths

message_formatter_pil_to_b64