agentchat.contrib.img_utils
get_pil_image
def get_pil_image(image_file: Union[str, Image.Image]) -> Image.Image
Loads an image from a file, URL, URI, or base64 string and returns a PIL Image object.
Arguments:
image_file
str or Image - The filename, URL, URI, or base64 string of the image file, or a PIL Image object.
Returns:
Image.Image
- The PIL Image object.
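To illustrate the kinds of string input listed above, here is a minimal sketch that guesses which kind of string was passed. It is not the library's implementation: the name classify_image_source and its heuristics are hypothetical, and only the Python standard library is used.

```python
import base64
import binascii
import re


def classify_image_source(image_file: str) -> str:
    """Hypothetical helper: guess what kind of string describes the image."""
    if image_file.startswith(("http://", "https://")):
        return "url"
    if re.match(r"data:image/[\w+.-]+;base64,", image_file):
        return "data_uri"
    try:
        # validate=True rejects characters outside the base64 alphabet,
        # such as the "." found in most file names.
        base64.b64decode(image_file, validate=True)
        return "base64"
    except (binascii.Error, ValueError):
        return "path"
```

The base64 check is only a heuristic: a short, extension-less file name made entirely of base64 characters would be misclassified, so a real loader may need stricter rules.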
get_image_data
def get_image_data(image_file: Union[str, Image.Image], use_b64=True) -> bytes
Loads an image and returns its data either as raw bytes or in base64-encoded format.
This function first loads an image from the specified file, URL, or base64 string using the get_pil_image function. It then saves this image in memory in PNG format and retrieves its binary content. Depending on the use_b64 flag, this binary content is either returned directly or as a base64-encoded string.
Arguments:
image_file
str or Image - The path to the image file, a URL to an image, or a base64-encoded string of the image.
use_b64
bool - If True, the function returns a base64-encoded string of the image data. If False, it returns the raw byte data of the image. Defaults to True.
Returns:
bytes
- The image data in raw bytes if use_b64 is False, or a base64-encoded string if use_b64 is True.
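The effect of the use_b64 flag can be sketched as follows. This is a minimal illustration, assuming the PNG bytes are already in memory. encode_image_bytes is a hypothetical stand-in rather than the library function, and the 8-byte PNG file signature is used as sample data in place of a real image.

```python
import base64
from typing import Union

# Sample data (assumption): the PNG file signature, not a complete image.
PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def encode_image_bytes(png_bytes: bytes, use_b64: bool = True) -> Union[str, bytes]:
    """Hypothetical helper: return a base64 string or the raw bytes."""
    if use_b64:
        return base64.b64encode(png_bytes).decode("utf-8")
    return png_bytes


as_text = encode_image_bytes(PNG_SIGNATURE)                # "iVBORw0KGgo="
as_bytes = encode_image_bytes(PNG_SIGNATURE, use_b64=False)
```

Decoding the base64 string with base64.b64decode recovers the original bytes, so both forms carry the same image data.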
llava_formatter
def llava_formatter(prompt: str,
order_image_tokens: bool = False) -> Tuple[str, List[str]]
Formats the input prompt by replacing image tags and returns the new prompt along with the images those tags reference.
Arguments:
- prompt (str): The input string that may contain image tags like <img ...>.
- order_image_tokens (bool, optional): Whether to number the image tokens in order of appearance. This is useful for GPT-4V. Defaults to False.
Returns:
- Tuple[str, List[str]]: A tuple containing the formatted string and a list of images (loaded in b64 format).
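The tag replacement described above can be sketched as follows. This is an illustration only, under stated assumptions: the placeholder tokens "<image>" and "<image N>" are assumed, format_prompt_sketch is a hypothetical name, and the sketch collects the image locations instead of loading them in b64 format.

```python
import re
from typing import List, Tuple

# Matches tags of the form <img LOCATION>.
IMG_TAG = re.compile(r"<img ([^>]+)>")


def format_prompt_sketch(prompt: str, order_image_tokens: bool = False) -> Tuple[str, List[str]]:
    """Hypothetical helper: swap <img ...> tags for placeholder tokens."""
    locations: List[str] = []

    def _replace(match: "re.Match[str]") -> str:
        index = len(locations)  # position of this image in the prompt
        locations.append(match.group(1).strip())
        return f"<image {index}>" if order_image_tokens else "<image>"

    return IMG_TAG.sub(_replace, prompt), locations


new_prompt, found = format_prompt_sketch(
    "Compare <img a.png> with <img b.png>.", order_image_tokens=True
)
# new_prompt == "Compare <image 0> with <image 1>."
# found == ["a.png", "b.png"]
```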
pil_to_data_uri
def pil_to_data_uri(image: Image.Image) -> str
Converts a PIL Image object to a data URI.
Arguments:
image
Image.Image - The PIL Image object.
Returns:
str
- The data URI string.
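A data URI embeds the encoded image directly in a string. The sketch below assumes the image has already been serialized to PNG bytes and that the MIME type is image/png. Both helper names are hypothetical, and this is not the library's code.

```python
import base64


def png_bytes_to_data_uri(png_bytes: bytes) -> str:
    """Hypothetical helper: wrap PNG bytes in a base64 data URI."""
    encoded = base64.b64encode(png_bytes).decode("ascii")
    return f"data:image/png;base64,{encoded}"


def data_uri_to_bytes(data_uri: str) -> bytes:
    """Hypothetical helper: recover the raw bytes from a base64 data URI."""
    header, _, encoded = data_uri.partition(",")
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("not a base64 data URI")
    return base64.b64decode(encoded)
```

Round-tripping bytes through both helpers returns the original bytes, which is a quick way to check that nothing was lost in the encoding.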
gpt4v_formatter
def gpt4v_formatter(prompt: str,
img_format: str = "uri") -> List[Union[str, dict]]
Formats the input prompt by replacing image tags and returns a list of text and images.
Arguments:
- prompt (str): The input string that may contain image tags like <img ...>.
- img_format (str, optional): The image format to use. One of "uri", "url", "pil". Defaults to "uri".
Returns:
- List[Union[str, dict]]: A list of alternating text and image dictionary items.
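Splitting a prompt into alternating text and image items can be sketched as follows. The item shapes follow the example shown for message_formatter_pil_to_b64 in this module. The function name is hypothetical, and the sketch keeps each image location as a plain string instead of converting it according to img_format.

```python
import re
from typing import Dict, List

# Matches tags of the form <img LOCATION>.
IMG_TAG = re.compile(r"<img ([^>]+)>")


def split_prompt_sketch(prompt: str) -> List[Dict]:
    """Hypothetical helper: break a prompt into text and image items."""
    items: List[Dict] = []
    last_end = 0
    for match in IMG_TAG.finditer(prompt):
        text = prompt[last_end:match.start()]
        if text:  # skip empty text between adjacent tags
            items.append({"type": "text", "text": text})
        items.append({"type": "image_url", "image_url": {"url": match.group(1).strip()}})
        last_end = match.end()
    tail = prompt[last_end:]
    if tail:
        items.append({"type": "text", "text": tail})
    return items


items = split_prompt_sketch(
    "What's the breed of this dog here? <img https://example.com/dog.png>."
)
```

For this prompt the result has three items: the leading text, the image item, and the trailing ".".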
extract_img_paths
def extract_img_paths(paragraph: str) -> list
Extract image paths (URLs or local paths) from a text paragraph.
Arguments:
paragraph
str - The input text paragraph.
Returns:
list
- A list of extracted image paths.
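One way to approximate this kind of extraction is a regular expression over common image file extensions. The pattern and the extension list below are assumptions made for illustration, not the library's actual pattern, and extract_img_paths_sketch is a hypothetical name.

```python
import re
from typing import List

# Assumed extension list; a non-whitespace run ending in one of these counts as a path.
IMG_PATH = re.compile(r"\S+\.(?:jpg|jpeg|png|gif|bmp)\b", re.IGNORECASE)


def extract_img_paths_sketch(paragraph: str) -> List[str]:
    """Hypothetical helper: find URLs or local paths that look like images."""
    return IMG_PATH.findall(paragraph)


paths = extract_img_paths_sketch(
    "Compare ./imgs/cat.png with https://example.com/dog.JPG."
)
# paths == ["./imgs/cat.png", "https://example.com/dog.JPG"]
```

The trailing word boundary keeps sentence punctuation, such as the final period here, out of the extracted path.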
message_formatter_pil_to_b64
def message_formatter_pil_to_b64(messages: List[Dict]) -> List[Dict]
Converts the PIL images held in the messages' image URLs to base64-encoded data URIs.
This function iterates over a list of message dictionaries. For each message, if it contains a 'content' key with a list of items, it looks for items with an 'image_url' key. The function then converts the PIL image stored under 'image_url' (in its 'url' field) to a base64-encoded data URI.
Arguments:
messages
List[Dict] - A list of message dictionaries. Each dictionary may contain a 'content' key with a list of items, some of which might be image URLs.
Returns:
List[Dict]
- A new list of message dictionaries with PIL image URLs in the 'image_url' key converted to base64 encoded data URIs.
Example Input:
[
    {'content': [{'type': 'text', 'text': 'You are a helpful AI assistant.'}], 'role': 'system'},
    {'content': [
        {'type': 'text', 'text': "What's the breed of this dog here? "},
        {'type': 'image_url', 'image_url': {'url': a PIL.Image.Image}},
        {'type': 'text', 'text': '.'}],
     'role': 'user'}
]
Example Output:
[
    {'content': [{'type': 'text', 'text': 'You are a helpful AI assistant.'}], 'role': 'system'},
    {'content': [
        {'type': 'text', 'text': "What's the breed of this dog here? "},
        {'type': 'image_url', 'image_url': {'url': a B64 Image}},
        {'type': 'text', 'text': '.'}],
     'role': 'user'}
]
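The traversal described above can be sketched with the standard library alone. To stay self-contained, this sketch treats raw PNG bytes as a stand-in for a PIL image and assumes the image/png MIME type. convert_image_urls_sketch is a hypothetical name and this is not the library's implementation.

```python
import base64
import copy
from typing import Dict, List


def convert_image_urls_sketch(messages: List[Dict]) -> List[Dict]:
    """Hypothetical helper: replace in-memory images with data URIs."""
    converted = []
    for message in messages:
        if isinstance(message.get("content"), list):
            # Copy first so the caller's messages are left unchanged.
            message = copy.deepcopy(message)
            for item in message["content"]:
                if isinstance(item, dict) and "image_url" in item:
                    url = item["image_url"]["url"]
                    if isinstance(url, bytes):  # stand-in for a PIL image
                        encoded = base64.b64encode(url).decode("ascii")
                        item["image_url"]["url"] = f"data:image/png;base64,{encoded}"
        converted.append(message)
    return converted
```

Items whose 'url' is already a string, and messages whose 'content' is not a list, pass through unchanged.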