This article is contributed. See the original author and article here.
Introduction
In this article, we demonstrate how to combine GPT-4o's image understanding with function calling to unlock multimodal use cases.
We will simulate a package routing service that routes packages based on the shipping label using OCR with GPT-4o.
The model will identify the appropriate function to call based on the image analysis and the predefined actions for routing to the appropriate continent.
Background
The new GPT-4o (“o” for “omni”) can reason across audio, vision, and text in real time.
- It can respond to audio inputs in as little as 232 milliseconds, with an average of 320 milliseconds, which is similar to human response time in a conversation.
- It matches GPT-4 Turbo performance on text in English and code, with significant improvement on text in non-English languages, while also being much faster and 50% cheaper in the API.
- GPT-4o is especially better at vision and audio understanding compared to existing models.
- GPT-4o now enables function calling.
The application
We will run a Jupyter notebook that connects to GPT-4o to sort packages based on the printed labels with the shipping address.
Here are some sample labels. We use GPT-4o as an OCR engine to read the destination country from each label, and GPT-4o function calling to route the package.
The environment
The code can be found here – Azure OpenAI code examples
Make sure you create your Python virtual environment and fill in the environment variables as described in the README.md file.
The code
Connecting to Azure OpenAI GPT-4o deployment.
from dotenv import load_dotenv
from IPython.display import display, HTML, Image
import os
from openai import AzureOpenAI
import json
load_dotenv()
GPT4o_API_KEY = os.getenv("GPT4o_API_KEY")
GPT4o_DEPLOYMENT_ENDPOINT = os.getenv("GPT4o_DEPLOYMENT_ENDPOINT")
GPT4o_DEPLOYMENT_NAME = os.getenv("GPT4o_DEPLOYMENT_NAME")
client = AzureOpenAI(
    azure_endpoint=GPT4o_DEPLOYMENT_ENDPOINT,
    api_key=GPT4o_API_KEY,
    api_version="2024-02-01"
)
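`os.getenv` returns `None` for unset variables, so a typo in the `.env` file can surface as an error that does not name the missing variable. A minimal sketch of a fail-fast check, assuming the three variable names used above; `missing_env_vars` is a hypothetical helper and not part of the original notebook:

```python
import os

# Variable names taken from the connection code above.
REQUIRED_VARS = (
    "GPT4o_API_KEY",
    "GPT4o_DEPLOYMENT_ENDPOINT",
    "GPT4o_DEPLOYMENT_NAME",
)


def missing_env_vars(env=None, required=REQUIRED_VARS):
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in required if not env.get(name)]
```

One option is to call `missing_env_vars()` right after `load_dotenv()` and raise an error if the returned list is not empty.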
Defining the functions to be called after GPT-4o answers.
# Defining the functions - in this case a toy example of a shipping function
def ship_to_Oceania(location):
    return f"Shipping to Oceania based on location {location}"

def ship_to_Europe(location):
    return f"Shipping to Europe based on location {location}"

def ship_to_US(location):
    return f"Shipping to Americas based on location {location}"
Defining the available functions to be called to send to GPT-4o.
It is very IMPORTANT to send the function and parameter descriptions so GPT-4o knows which function to call.
tools = [
    {
        "type": "function",
        "function": {
            "name": "ship_to_Oceania",
            "description": "Shipping the parcel to any country in Oceania",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The country to ship the parcel to.",
                    }
                },
                "required": ["location"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "ship_to_Europe",
            "description": "Shipping the parcel to any country in Europe",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The country to ship the parcel to.",
                    }
                },
                "required": ["location"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "ship_to_US",
            "description": "Shipping the parcel to any country in the Americas",
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The country to ship the parcel to.",
                    }
                },
                "required": ["location"],
            },
        },
    },
]
available_functions = {
    "ship_to_Oceania": ship_to_Oceania,
    "ship_to_Europe": ship_to_Europe,
    "ship_to_US": ship_to_US,
}
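The tool definitions differ only in their name and description, so they could also be generated by a small helper. A sketch, assuming every routing function takes a single `location` string as above; `make_shipping_tool` and `generated_tools` are hypothetical names used only for illustration:

```python
def make_shipping_tool(name, description):
    """Build one tool definition in the same shape as the entries in `tools`."""
    return {
        "type": "function",
        "function": {
            "name": name,
            "description": description,
            "parameters": {
                "type": "object",
                "properties": {
                    "location": {
                        "type": "string",
                        "description": "The country to ship the parcel to.",
                    }
                },
                "required": ["location"],
            },
        },
    }


generated_tools = [
    make_shipping_tool("ship_to_Oceania", "Shipping the parcel to any country in Oceania"),
    make_shipping_tool("ship_to_Europe", "Shipping the parcel to any country in Europe"),
]

# Names the model may request; each one needs an entry in available_functions.
generated_names = [tool["function"]["name"] for tool in generated_tools]
```

Comparing `generated_names` with the keys of `available_functions` is a cheap way to catch a tool that was declared but never mapped to a Python function.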
A function to base64-encode our images. This is the format GPT-4o accepts for images embedded in the request.
# Encoding the images to send to GPT-4o
import base64

def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode("utf-8")
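The encoded string is later embedded in a data URL inside the user message. A sketch of a helper that builds that URL and guesses the MIME type from the file extension with the standard-library `mimetypes` module, so that JPEG labels could be handled too; `image_to_data_url` is a hypothetical name, and falling back to `image/png` is an assumption:

```python
import base64
import mimetypes


def image_to_data_url(image_path):
    """Return a base64 data URL for the image stored at image_path."""
    mime_type, _ = mimetypes.guess_type(image_path)
    if mime_type is None:
        # Assumption: treat files of unknown type as PNG.
        mime_type = "image/png"
    with open(image_path, "rb") as image_file:
        encoded = base64.b64encode(image_file.read()).decode("utf-8")
    return f"data:{mime_type};base64,{encoded}"
```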
The method to call GPT-4o.
Notice below that we send the parameter “tools” with the JSON describing the functions to be called.
def call_OpenAI(messages, tools, available_functions):
    # Step 1: send the prompt and available functions to GPT
    response = client.chat.completions.create(
        model=GPT4o_DEPLOYMENT_NAME,
        messages=messages,
        tools=tools,
        tool_choice="auto",
    )
    response_message = response.choices[0].message
    # Step 2: check if GPT wanted to call a function
    if response_message.tool_calls:
        print("Recommended Function call:")
        print(response_message.tool_calls[0])
        print()
        # Step 3: call the function
        # Note: the JSON response may not always be valid; be sure to handle errors
        function_name = response_message.tool_calls[0].function.name
        # verify function exists
        if function_name not in available_functions:
            return "Function " + function_name + " does not exist"
        function_to_call = available_functions[function_name]
        # verify function has correct number of arguments
        function_args = json.loads(response_message.tool_calls[0].function.arguments)
        if check_args(function_to_call, function_args) is False:
            return "Invalid number of arguments for function: " + function_name
        # call the function
        function_response = function_to_call(**function_args)
        print("Output of function call:")
        print(function_response)
        print()
Please note that WE, not GPT-4o, call the methods in our code, based on the answer returned by GPT-4o.
# call the function
function_response = function_to_call(**function_args)
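`call_OpenAI` relies on a `check_args` helper that is not shown in this article. A minimal sketch of one way to write it with the standard-library `inspect` module, assuming it should reject unknown argument names and missing required parameters; the notebook in the repository may implement it differently:

```python
import inspect


def check_args(function, args):
    """Return True if the keys of args match the named parameters of function."""
    params = inspect.signature(function).parameters
    # Reject argument names the function does not declare.
    for name in args:
        if name not in params:
            return False
    # Reject calls that omit a parameter that has no default value.
    for name, param in params.items():
        if param.default is inspect.Parameter.empty and name not in args:
            return False
    return True
```

This sketch only considers plain named parameters; functions that use `*args` or `**kwargs` would need extra handling.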
Iterate through all the images in the folder.
Notice the system prompt, where we tell GPT-4o what we need it to do: read the package labels and route each package by calling one of the functions.
# iterate through all the images in the data folder
import os
data_folder = "./data"
for image in os.listdir(data_folder):
    if image.endswith(".png"):
        IMAGE_PATH = os.path.join(data_folder, image)
        base64_image = encode_image(IMAGE_PATH)
        display(Image(IMAGE_PATH))
        messages = [
            {"role": "system", "content": "You are a customer service assistant for a delivery service, equipped to analyze images of package labels. Based on the country to ship the package to, you must always ship to the corresponding continent. You must always use tools!"},
            {"role": "user", "content": [
                {"type": "image_url", "image_url": {
                    "url": f"data:image/png;base64,{base64_image}"}
                }
            ]}
        ]
        call_OpenAI(messages, tools, available_functions)
Let’s run our notebook!!!
Running our code for the label above produces the following output:
Recommended Function call:
ChatCompletionMessageToolCall(id='call_lH2G1bh2j1IfBRzZcw84wg0x', function=Function(arguments='{"location":"United States"}', name='ship_to_US'), type='function')
Output of function call:
Shipping to Americas based on location United States
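The dispatch step can be reproduced offline from the printed tool call. A sketch that reuses the function name and the arguments string from the output above, with no API call involved:

```python
import json


def ship_to_US(location):
    return f"Shipping to Americas based on location {location}"


available_functions = {"ship_to_US": ship_to_US}

# Values copied from the tool call printed above.
function_name = "ship_to_US"
function_arguments = '{"location":"United States"}'

# The arguments arrive as a JSON string and must be decoded before the call.
function_args = json.loads(function_arguments)
function_response = available_functions[function_name](**function_args)
print(function_response)
```

The model only supplies the name and the JSON arguments; the lookup in `available_functions` and the call itself happen in our code.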
That’s all folks!
Thanks
Denise