Use AutoGen with Gemini via VertexAI
This notebook demonstrates how to use Autogen with Gemini via Vertex AI, which enables enhanced authentication method that also supports enterprise requirements using service accounts or even a personal Google cloud account.
Requirements
Install AutoGen with Gemini features:
pip install autogen-agentchat[gemini]~=0.2
Install other Dependencies of this Notebook
pip install chromadb markdownify pypdf
Google Cloud Account
To use VertexAI a Google Cloud account is needed. If you do not have one yet, just sign up for a free trial here.
Login to your account at console.cloud.google.com
In the next step we create a Google Cloud project, which is needed for VertexAI. The official guide for creating a project is available is here.
We will name our project Autogen-with-Gemini.
Enable Google Cloud APIs
If you wish to use Gemini with your personal account, then creating a Google Cloud account is enough. However, if a service account is needed, then a few extra steps are needed.
Enable API for Gemini
- For enabling Gemini for Google Cloud search for “api” and select Enabled APIs & services.
- Then click ENABLE APIS AND SERVICES.
- Search for Gemini, and select Gemini for Google Cloud.
A direct link will look like this for our autogen-with-gemini project: https://console.cloud.google.com/apis/library/cloudaicompanion.googleapis.com?project=autogen-with-gemini&supportedpurview=project - Click ENABLE for Gemini for Google Cloud.
Enable API for Vertex AI
- For enabling Vertex AI for Google Cloud search for “api” and select Enabled APIs & services.
- Then click ENABLE APIS AND SERVICES.
- Search for Vertex AI, and select Vertex AI API.
A direct link for our autogen-with-gemini will be: https://console.cloud.google.com/apis/library/aiplatform.googleapis.com?project=autogen-with-gemini - Click ENABLE Vertex AI API for Google Cloud.
Create a Service Account
You can find an overview of service accounts can be found in the cloud console
Detailed guide: https://cloud.google.com/iam/docs/service-accounts-create
A service account can be created within the scope of a project, so a project needs to be selected.
Next we assign the Vertex AI
User
for the service account. This can be done in the Google Cloud
console
in our autogen-with-gemini
project.
Alternatively, we can also
grant the Vertex AI
User
role by running a command using the gcloud CLI, for example in Cloud
Shell:
gcloud projects add-iam-policy-binding autogen-with-gemini \
--member=serviceAccount:autogen@autogen-with-gemini.iam.gserviceaccount.com --role roles/aiplatform.user
- Under IAM & Admin > Service Account select the newly created service accounts, and click the option “Manage keys” among the items.
- From the “ADD KEY” dropdown select “Create new key” and select the
JSON format and click CREATE.
- The new key will be downloaded automatically.
- You can then upload the service account key file to the from where
you will be running autogen.
- Please consider restricting the permissions on the key file. For
example, you could run
chmod 600 autogen-with-gemini-service-account-key.json
if your keyfile is called autogen-with-gemini-service-account-key.json.
- Please consider restricting the permissions on the key file. For
example, you could run
Configure Authentication
Authentication happens using standard Google Cloud authentication
methods,
which
means that either an already active session can be reused, or by
specifying the Google application credentials of a service account.
Additionally, AutoGen also supports authentication using
Credentials
objects in Python with the google-auth
library, which enables even more
flexibility.
For example, we can even use impersonated credentials.
Use Service Account Keyfile
The Google Cloud service account can be specified by setting the
GOOGLE_APPLICATION_CREDENTIALS
environment variable to the path to the
JSON key file of the service account.
We could even just directly set the environment variable, or we can add
the "google_application_credentials"
key with the respective value for
our model in the OAI_CONFIG_LIST.
Use the Google Default Credentials
If you are using Cloud
Shell or Cloud Shell
editor in Google
Cloud,
then you are already authenticated. If you have the Google
Cloud SDK installed locally,
then you can login by running
gcloud auth application-default login
in the command line.
Detailed instructions for installing the Google Cloud SDK can be found here.
Authentication with the Google Auth Library for Python
The google-auth library supports a wide range of authentication
scenarios, and you can simply pass a previously created Credentials
object to the llm_config
.
The official
documentation of the Python
package provides a detailed overview of the supported methods and usage
examples.
If you are already authenticated, like in Cloud
Shell, or after running the
gcloud auth application-default login
command in a CLI, then the
google.auth.default()
Python method will automatically return your
currently active credentials.
Example Config List
The config could look like the following (change project_id
and
google_application_credentials
):
config_list = [
{
"model": "gemini-pro",
"api_type": "google",
"project_id": "autogen-with-gemini",
"location": "us-west1"
},
{
"model": "gemini-1.5-pro-001",
"api_type": "google",
"project_id": "autogen-with-gemini",
"location": "us-west1"
},
{
"model": "gemini-1.5-pro",
"api_type": "google",
"project_id": "autogen-with-gemini",
"location": "us-west1",
"google_application_credentials": "autogen-with-gemini-service-account-key.json"
},
{
"model": "gemini-pro-vision",
"api_type": "google",
"project_id": "autogen-with-gemini",
"location": "us-west1"
}
]
Configure Safety Settings for VertexAI
Configuring safety settings for VertexAI is slightly different, as we have to use the speicialized safety setting object types instead of plain strings
from vertexai.generative_models import HarmBlockThreshold, HarmCategory
safety_settings = {
HarmCategory.HARM_CATEGORY_HARASSMENT: HarmBlockThreshold.BLOCK_ONLY_HIGH,
HarmCategory.HARM_CATEGORY_HATE_SPEECH: HarmBlockThreshold.BLOCK_ONLY_HIGH,
HarmCategory.HARM_CATEGORY_SEXUALLY_EXPLICIT: HarmBlockThreshold.BLOCK_ONLY_HIGH,
HarmCategory.HARM_CATEGORY_DANGEROUS_CONTENT: HarmBlockThreshold.BLOCK_ONLY_HIGH,
}
import os
from typing import Any, Callable, Dict, List, Optional, Tuple, Type, Union
import chromadb
from PIL import Image
from termcolor import colored
import autogen
from autogen import Agent, AssistantAgent, ConversableAgent, UserProxyAgent
from autogen.agentchat.contrib.img_utils import _to_pil, get_image_data
from autogen.agentchat.contrib.multimodal_conversable_agent import MultimodalConversableAgent
from autogen.agentchat.contrib.retrieve_user_proxy_agent import RetrieveUserProxyAgent
from autogen.code_utils import DEFAULT_MODEL, UNKNOWN, content_str, execute_code, extract_code, infer_lang
config_list_gemini = autogen.config_list_from_json(
"OAI_CONFIG_LIST",
filter_dict={
"model": ["gemini-1.5-pro"],
},
)
config_list_gemini_vision = autogen.config_list_from_json(
"OAI_CONFIG_LIST",
filter_dict={
"model": ["gemini-pro-vision"],
},
)
for config_list in [config_list_gemini, config_list_gemini_vision]:
for config_list_item in config_list:
config_list_item["safety_settings"] = safety_settings
seed = 25 # for caching
assistant = AssistantAgent(
"assistant", llm_config={"config_list": config_list_gemini, "seed": seed}, max_consecutive_auto_reply=3
)
user_proxy = UserProxyAgent(
"user_proxy",
code_execution_config={"work_dir": "coding", "use_docker": False},
human_input_mode="NEVER",
is_termination_msg=lambda x: content_str(x.get("content")).find("TERMINATE") >= 0,
)
result = user_proxy.initiate_chat(
assistant,
message="""
Compute the integral of the function f(x)=x^2 on the interval 0 to 1 using a Python script,
which returns the value of the definite integral""",
)
Compute the integral of the function f(x)=x^2 on the interval 0 to 1 using a Python script,
which returns the value of the definite integral
--------------------------------------------------------------------------------
Plan:
1. (code) Use Python's `scipy.integrate.quad` function to compute the integral.
```python
# filename: integral.py
from scipy.integrate import quad
def f(x):
return x**2
result, error = quad(f, 0, 1)
print(f"The definite integral of x^2 from 0 to 1 is: {result}")
```
Let me know when you have executed this code.
--------------------------------------------------------------------------------
>>>>>>>> EXECUTING CODE BLOCK 0 (inferred language is python)...
exitcode: 0 (execution succeeded)
Code output:
The definite integral of x^2 from 0 to 1 is: 0.33333333333333337
--------------------------------------------------------------------------------
The script executed successfully and returned the definite integral's value as approximately 0.33333333333333337.
This aligns with the analytical solution. The indefinite integral of x^2 is (x^3)/3. Evaluating this from 0 to 1 gives us (1^3)/3 - (0^3)/3 = 1/3 = 0.33333...
Therefore, the script successfully computed the integral of x^2 from 0 to 1.
TERMINATE
--------------------------------------------------------------------------------
Example with Gemini Multimodal
Authentication is the same for vision models as for the text based
Gemini models.
In this example an object of type Credentials
will be supplied in order to authenticate.
Here, we will use the
google application default credentials, so make sure to run the
following commands if you are not yet authenticated:
export GOOGLE_APPLICATION_CREDENTIALS=autogen-with-gemini-service-account-key.json
gcloud auth application-default login
gcloud config set project autogen-with-gemini
The GOOGLE_APPLICATION_CREDENTIALS
environment variable is a path to
our service account JSON keyfile, as described in the Use Service
Account Keyfile section above.
We also need to
set the Google cloud project, which is autogen-with-gemini
in this
example.
Note, we could also run gcloud auth application-default login
to use
our personal Google account instead of a service account. In this case
we need to run the following commands:
gcloud gcloud auth application-default login
gcloud config set project autogen-with-gemini
import google.auth
scopes = ["https://www.googleapis.com/auth/cloud-platform"]
credentials, project_id = google.auth.default(scopes)
gemini_vision_config = [
{
"model": "gemini-pro-vision",
"api_type": "google",
"project_id": project_id,
"credentials": credentials,
"location": "us-west1",
"safety_settings": safety_settings,
}
]
image_agent = MultimodalConversableAgent(
"Gemini Vision", llm_config={"config_list": gemini_vision_config, "seed": seed}, max_consecutive_auto_reply=1
)
user_proxy = UserProxyAgent("user_proxy", human_input_mode="NEVER", max_consecutive_auto_reply=0)
user_proxy.initiate_chat(
image_agent,
message="""Describe what is in this image?
<img https://github.com/microsoft/autogen/blob/0.2/website/static/img/autogen_agentchat.png?raw=true>.""",
)
user_proxy (to Gemini Vision):
Describe what is in this image?
<image>.
--------------------------------------------------------------------------------
>>>>>>>> USING AUTO REPLY...
Gemini
The image describes a conversational agent that is able to have a conversation with a human user. The agent can be customized to the user's preferences. The conversation can be in form of a joint chat or hierarchical chat.
--------------------------------------------------------------------------------
ChatResult(chat_id=None, chat_history=[{'content': 'Describe what is in this image?\n<img https://github.com/microsoft/autogen/blob/main/website/static/img/autogen_agentchat.png?raw=true>.', 'role': 'assistant'}, {'content': " The image describes a conversational agent that is able to have a conversation with a human user. The agent can be customized to the user's preferences. The conversation can be in form of a joint chat or hierarchical chat.", 'role': 'user'}], summary=" The image describes a conversational agent that is able to have a conversation with a human user. The agent can be customized to the user's preferences. The conversation can be in form of a joint chat or hierarchical chat.", cost={'usage_including_cached_inference': {'total_cost': 0.0001995, 'gemini-pro-vision': {'cost': 0.0001995, 'prompt_tokens': 267, 'completion_tokens': 44, 'total_tokens': 311}}, 'usage_excluding_cached_inference': {'total_cost': 0}}, human_input=[])
Use Gemini via the OpenAI Library in Autogen
Using Gemini via the OpenAI library is also possible once you are
already authenticated.
Run gcloud auth application-default login
to set up application default credentials locally for the example
below.
Also set the Google cloud project on the CLI if you have not
done so far:
gcloud config set project autogen-with-gemini
The prerequisites are essentially the same as in the example above.
You can read more on the topic in the official Google
docs.
A list of currently supported models can also be found in the
docs
Note, that you will need to refresh your token regularly, by
default every 1 hour.
import google.auth
scopes = ["https://www.googleapis.com/auth/cloud-platform"]
creds, project = google.auth.default(scopes)
auth_req = google.auth.transport.requests.Request()
creds.refresh(auth_req)
location = "us-west1"
prompt_price_per_1k = (
0.000125 # For more up-to-date prices see https://cloud.google.com/vertex-ai/generative-ai/pricing
)
completion_token_price_per_1k = (
0.000375 # For more up-to-date prices see https://cloud.google.com/vertex-ai/generative-ai/pricing
)
openai_gemini_config = [
{
"model": "google/gemini-1.5-pro-001",
"api_type": "openai",
"base_url": f"https://{location}-aiplatform.googleapis.com/v1beta1/projects/{project}/locations/{location}/endpoints/openapi",
"api_key": creds.token,
"price": [prompt_price_per_1k, completion_token_price_per_1k],
}
]
assistant = AssistantAgent("assistant", llm_config={"config_list": openai_gemini_config}, max_consecutive_auto_reply=3)
user_proxy = UserProxyAgent(
"user_proxy",
code_execution_config={"work_dir": "coding", "use_docker": False},
human_input_mode="NEVER",
is_termination_msg=lambda x: content_str(x.get("content")).find("TERMINATE") >= 0,
)
result = user_proxy.initiate_chat(
assistant,
message="""
Compute the integral of the function f(x)=x^3 on the interval 0 to 10 using a Python script,
which returns the value of the definite integral.""",
)
Compute the integral of the function f(x)=x^3 on the interval 0 to 10 using a Python script,
which returns the value of the definite integral.
--------------------------------------------------------------------------------
```python
# filename: integral.py
def integrate_x_cubed(a, b):
"""
This function calculates the definite integral of x^3 from a to b.
Args:
a: The lower limit of integration.
b: The upper limit of integration.
Returns:
The value of the definite integral.
"""
return (b**4 - a**4) / 4
# Calculate the integral of x^3 from 0 to 10
result = integrate_x_cubed(0, 10)
# Print the result
print(result)
```
This script defines a function `integrate_x_cubed` that takes the lower and upper limits of integration as arguments and returns the definite integral of x^3 using the power rule of integration. The script then calls this function with the limits 0 and 10 and prints the result.
Execute the script `python integral.py`, you should get the result: `2500.0`.
TERMINATE
--------------------------------------------------------------------------------