Play with OpenAI API¶
In this part we will see how to play with the OpenAI API, which is a special kind of API: it returns output text in response to a given prompt 🤖
This is great for conversation, but when we want to use these responses in our apps, we run into a bit of a snag. Our apps often need these replies in a format that's well-organized and ready for further processing. This is where Pydantic shines.
It acts as a bridge, converting the LLM's text replies into the structured format our apps crave. With Pydantic, we can set up the exact data model we want, molding the LLM's output to fit. This ensures our LLMs give us responses that are not only structured and validated but also way more practical for our needs.
What is Pydantic¶
Pydantic is the most widely used data validation library for Python. Instead of writing raw JSON Schema, you can define your schema with Pydantic, which has several key advantages:
- Widespread Adoption: Pydantic is downloaded over 100 million times a month and used by over 250k repositories on GitHub. It is a familiar tool in every Python developer's toolkit.
- Simplicity: Pydantic allows you to define your models in Python, avoiding the complexities of JSON Schema.
- Framework Compatibility: Many popular Python frameworks already use Pydantic, making it a natural choice.
You can check the official documentation here 🤓
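As a quick taste before we wire it to the OpenAI API, here is a minimal sketch of the "bridge" idea: a Pydantic model validating and coercing a plain dict, the kind of data we might parse out of an LLM reply. The `Answer` model and its fields are purely illustrative names.

```python
from pydantic import BaseModel

# Hypothetical model, purely for illustration
class Answer(BaseModel):
    topic: str
    confidence: float

# A dict we might have parsed out of an LLM reply; note the float arrives as a string
parsed = {"topic": "recursion", "confidence": "0.9"}

answer = Answer(**parsed)  # Pydantic validates and coerces the types
print(type(answer.confidence))  # a real float, ready for computation
```

If a field is missing or cannot be coerced, Pydantic raises a `ValidationError` that pinpoints the offending field, which is exactly the safety net we want around unpredictable LLM output.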
#!pip install openai --upgrade
# official example from the docs
from openai import OpenAI

client = OpenAI(api_key="...")  # put your key here

completion = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a poetic assistant, skilled in explaining complex programming concepts with creative flair."},
        {"role": "user", "content": "Compose a poem that explains the concept of recursion in programming."},
    ],
)

print(completion.choices[0].message)
ChatCompletionMessage(content="In the realm of code, where logic unfolds,\nThere lies a concept, mysterious and bold.\nIts name, recursion, echoes in the night,\nA poetic dance, a sorcerer's delight.\n\nWith humble grace, it enters the fray,\nA tale of repetition, in an enchanting way.\nLike a mirror reflecting its own reflection,\nRecursion beckons, defying convention.\n\nThrough loops and loops, it goes on a quest,\nExploring the depths, as it journeys abreast.\nA function calling itself, a daring act,\nCreating a puzzle, where answers are stacked.\n\nWith each call, like ripples in a pond,\nThe problem's domain shrinks, oh so fond.\nDivide and conquer, its secret intention,\nSolving complex tasks with mathematical invention.\n\nLike a spiral staircase reaching for the sky,\nRecursion dances, reaching ever high.\nA mesmerizing fractal, an infinite embrace,\nUnraveling mysteries, at a steady pace.\n\nBut tread with caution, for power comes with care,\nRecursion can spiral into an infinite affair.\nBase cases, like anchors, must be sound,\nVerifying escape routes, holding solid ground.\n\nYet, in its essence, recursion remains sublime,\nA symphony of patterns, a poetic chime.\nFrom fractals to trees, and mazes intricate,\nRecursion weaves dreams, a creator innate.\n\nSo let us marvel at this coding art,\nWith every cycle, a humble restart.\nFor in the realm of programming's domain,\nRecursion is the poet's whisper, the programmer's refrain.", role='assistant', function_call=None, tool_calls=None)
type(completion.choices[0].message)
openai.types.chat.chat_completion_message.ChatCompletionMessage
OpenAI in a nutshell¶
Now let's encapsulate our client.chat.completions.create() call inside a Python function like this:
def get_completion(prompt: str, model: str = "gpt-3.5-turbo"):
    """
    Query your LLM with a prompt.

    Parameters:
        prompt (str): The text prompt you want the LLM to respond to.
        model (str, optional): The model used to generate the response. Defaults to "gpt-3.5-turbo".

    Returns:
        ChatCompletionMessage: The generated chat completion message from the specified model.
    """
    messages = [{"role": "user", "content": prompt}]
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0,
    )
    return response.choices[0].message
response = get_completion("What are the top three big cities in Europe by population?")
print(response)
ChatCompletionMessage(content='The top three big cities in Europe by population are:\n\n1. Istanbul, Turkey - With a population of over 15 million people, Istanbul is the most populous city in Europe.\n2. Moscow, Russia - Moscow is the second most populous city in Europe, with a population of over 12 million people.\n3. London, United Kingdom - London is the third most populous city in Europe, with a population of over 9 million people.', role='assistant', function_call=None, tool_calls=None)
Not very useful as a variable as you can see 😢
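To see why, imagine extracting just the city names from that free-text reply: we would be stuck doing brittle string surgery. A sketch, using a shortened copy of the reply above:

```python
# The reply captured above, as plain text (shortened for the example)
text = ("1. Istanbul, Turkey - With a population of over 15 million people...\n"
        "2. Moscow, Russia - ...\n"
        "3. London, United Kingdom - ...")

# Pulling structure out of prose means fragile, format-dependent parsing:
city_names = [line.split(".", 1)[1].split(",")[0].strip()
              for line in text.splitlines()]
print(city_names)  # ['Istanbul', 'Moscow', 'London']
```

One change in the model's phrasing and this parsing breaks. A structured, validated output is what we are after.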
Add Pydantic secret sauce¶
Let's use Pydantic models to create a well-organized output for our data. We'll make a CityResponse model that gathers important information like the city's name, its country, population, and the local currency. Next, we'll group these cities under a Cities model.
By doing this, we can produce a tailored list of cities, complete with detailed information, all neatly arranged for efficient data management. It's important to include a description for each field, because LangChain will use these descriptions to generate a prompt later 😎
from pydantic import BaseModel, Field
from typing import List

class CityResponse(BaseModel):
    city_name: str = Field(description="This is the Name of the city")
    country: str = Field(description="This is the country of the city")
    population_number: int = Field(description="This is the number of inhabitants")
    local_currency: str = Field(description="This is the local currency of the city")

class Cities(BaseModel):
    city: List[CityResponse]
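Those field descriptions are not just documentation: they surface in the JSON schema that Pydantic generates from the model, which is what LangChain will embed in the prompt. A self-contained sketch (models re-declared so the snippet runs on its own):

```python
from typing import List
from pydantic import BaseModel, Field

# Re-declared here so the snippet runs standalone
class CityResponse(BaseModel):
    city_name: str = Field(description="This is the Name of the city")
    country: str = Field(description="This is the country of the city")
    population_number: int = Field(description="This is the number of inhabitants")
    local_currency: str = Field(description="This is the local currency of the city")

class Cities(BaseModel):
    city: List[CityResponse]

# The field descriptions appear in the generated JSON schema
schema = Cities.schema()  # on Pydantic v2, prefer Cities.model_json_schema()
print(list(schema["properties"].keys()))
```

Inspecting `schema` shows the same structure you will see in LangChain's format instructions below.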
LangChain for better prompts¶
In this part we will add the famous LangChain
framework to shape the output of our language model into the format we desire 👨‍🍳
#!pip install langchain
from langchain.output_parsers import PydanticOutputParser
pydantic_parser = PydanticOutputParser(pydantic_object=Cities)
format_instructions = pydantic_parser.get_format_instructions()
print(format_instructions)
The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"city": {"title": "City", "type": "array", "items": {"$ref": "#/definitions/CityResponse"}}}, "required": ["city"], "definitions": {"CityResponse": {"title": "CityResponse", "type": "object", "properties": {"city_name": {"title": "City Name", "description": "This is the Name of the city", "type": "string"}, "country": {"title": "Country", "description": "This is the country of the city", "type": "string"}, "population_number": {"title": "Population Number", "description": "This is the number of inhabitants", "type": "integer"}, "local_currency": {"title": "Local Currency", "description": "This is the local currency of the city", "type": "string"}}, "required": ["city_name", "country", "population_number", "local_currency"]}}}
```
The format instructions above serve as a template for structuring the output of our Language Model. These instructions establish that the output should adhere to a particular JSON schema.
They provide examples where one JSON instance correctly follows the schema (an object with a foo key and an array value) and another instance which does not align with the schema.
Structured query for structured responses¶
Now let’s use PromptTemplate
from LangChain to configure a structured input to our language model.
from langchain.prompts import PromptTemplate

query = "What are the top three big cities in Europe by population?"

prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": pydantic_parser.get_format_instructions()},
)

_input = prompt.format_prompt(query=query)
answer = get_completion(_input.to_string())
print(answer.content)
{"city": [ { "city_name": "Istanbul", "country": "Turkey", "population_number": 15029231, "local_currency": "Turkish Lira" }, { "city_name": "Moscow", "country": "Russia", "population_number": 12692466, "local_currency": "Russian Ruble" }, { "city_name": "London", "country": "United Kingdom", "population_number": 9304016, "local_currency": "British Pound" } ]}
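That JSON string can already be turned into typed Python objects with the models we defined; LangChain's PydanticOutputParser does essentially this (plus some JSON extraction) under the hood. A standalone sketch with a shortened sample of the reply:

```python
import json
from typing import List
from pydantic import BaseModel

# Same shape as the models above, re-declared so this runs standalone
class CityResponse(BaseModel):
    city_name: str
    country: str
    population_number: int
    local_currency: str

class Cities(BaseModel):
    city: List[CityResponse]

# A shortened sample of the JSON string the model returned
raw = ('{"city": [{"city_name": "Istanbul", "country": "Turkey", '
       '"population_number": 15029231, "local_currency": "Turkish Lira"}]}')

cities = Cities(**json.loads(raw))
print(cities.city[0].country)  # typed attribute access, no dict juggling
```

From here on, `cities.city` is a list of validated `CityResponse` objects rather than raw dicts.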
Validation¶
Moreover, Pydantic models allow us to incorporate validation rules for each field. This added step of validation makes sure that every response from the LLM matches our set standards exactly.
For example, imagine we need city names to be in uppercase. To achieve this, we'll use Pydantic's validator decorator and place our specific conditions in a class method, say validate_cities. This way, when we get a response from the LLM, we can check if it meets our requirements.
from typing import List
from pydantic import BaseModel, Field, validator
from langchain.prompts import PromptTemplate

class CityResponse(BaseModel):
    city_name: str = Field(description="This is the Name of the city")
    country: str = Field(description="This is the country of the city")
    population_number: int = Field(description="This is the number of inhabitants")
    local_currency: str = Field(description="This is the local currency of the city")

class Cities(BaseModel):
    cities: List[CityResponse]

    @validator("cities", pre=True)
    def validate_cities(cls, v):
        for city in v:
            city_name = city.get("city_name", None)
            if not city_name:
                raise ValueError("'city_name' is required")
            if not city_name.isupper():
                raise ValueError(f"City name '{city_name}' is not uppercase")
        return v
pydantic_parser = PydanticOutputParser(pydantic_object=Cities)

query = "What are the top three big cities in Europe by population?"

prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": pydantic_parser.get_format_instructions()},
)

_input = prompt.format_prompt(query=query)
answer = get_completion(_input.to_string())
pydantic_parser.parse(answer.content)
---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
File ~/anaconda3/envs/vision310/lib/python3.10/site-packages/langchain/output_parsers/pydantic.py:30, in PydanticOutputParser.parse(self, text)
     29     json_object = json.loads(json_str, strict=False)
---> 30     return self.pydantic_object.parse_obj(json_object)
     32 except (json.JSONDecodeError, ValidationError) as e:

File ~/anaconda3/envs/vision310/lib/python3.10/site-packages/pydantic/main.py:527, in pydantic.main.BaseModel.parse_obj()

File ~/anaconda3/envs/vision310/lib/python3.10/site-packages/pydantic/main.py:342, in pydantic.main.BaseModel.__init__()

ValidationError: 1 validation error for Cities
cities
  City name 'Istanbul' is not uppercase (type=value_error)

During handling of the above exception, another exception occurred:

OutputParserException                     Traceback (most recent call last)
Input In [37], in <cell line: 34>()
     32 _input = prompt.format_prompt(query=query)
     33 answer = get_completion(_input.to_string())
---> 34 pydantic_parser.parse(answer.content)

File ~/anaconda3/envs/vision310/lib/python3.10/site-packages/langchain/output_parsers/pydantic.py:35, in PydanticOutputParser.parse(self, text)
     33 name = self.pydantic_object.__name__
     34 msg = f"Failed to parse {name} from completion {text}. Got: {e}"
---> 35 raise OutputParserException(msg, llm_output=text)

OutputParserException: Failed to parse Cities from completion {"cities": [ { "city_name": "Istanbul", "country": "Turkey", "population_number": 15029231, "local_currency": "Turkish Lira" }, { "city_name": "Moscow", "country": "Russia", "population_number": 12692466, "local_currency": "Russian Ruble" }, { "city_name": "London", "country": "United Kingdom", "population_number": 9304016, "local_currency": "British Pound" } ]}. Got: 1 validation error for Cities
cities
  City name 'Istanbul' is not uppercase (type=value_error)
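As expected, the validator rejects the response because the model wrote "Istanbul" rather than "ISTANBUL". An alternative design worth knowing: instead of raising, a pre-validator can normalize the data so that validation always succeeds. A self-contained sketch using the same v1-style validator:

```python
from typing import List
from pydantic import BaseModel, validator

class CityResponse(BaseModel):
    city_name: str
    country: str
    population_number: int
    local_currency: str

class Cities(BaseModel):
    cities: List[CityResponse]

    @validator("cities", pre=True)
    def uppercase_city_names(cls, v):
        # Normalize instead of rejecting: coerce each name to uppercase
        for city in v:
            if isinstance(city, dict) and city.get("city_name"):
                city["city_name"] = city["city_name"].upper()
        return v

data = {"cities": [{"city_name": "Istanbul", "country": "Turkey",
                    "population_number": 15029231, "local_currency": "Turkish Lira"}]}
print(Cities(**data).cities[0].city_name)  # ISTANBUL
```

Whether to reject or normalize depends on your use case: rejecting surfaces bad LLM output loudly, while normalizing keeps the pipeline flowing.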
Wrap it up¶
To wrap it up, integrating Pydantic models with PromptTemplate and PydanticOutputParser from LangChain gives our Python code a reliable way to extract well-organized, detailed data from Large Language Models.
This strategy simplifies the challenging task of dealing with unstructured outputs and ensures that we achieve a high standard of data quality, meeting all our specific needs and criteria.
Next level with the instructor
library¶
The instructor project, used in conjunction with Pydantic, can revolutionize the way we interact with language models. It brings simplicity, modularity, and a high degree of customization, making the OpenAI SDK even more usable for developers.
Handle proper retry¶
One of the coolest features of instructor, for me, is its validation abstraction and the max_retries
parameter. Here, the UserDetails model is passed as the response_model, and max_retries is set to 2.
import instructor
from openai import OpenAI
from pydantic import BaseModel, field_validator

# Apply the patch to the OpenAI client
client = instructor.patch(OpenAI())

class UserDetails(BaseModel):
    name: str
    age: int

    @field_validator("name")
    @classmethod
    def validate_name(cls, v):
        if v.upper() != v:
            raise ValueError("Name must be in uppercase.")
        return v

model = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=UserDetails,
    max_retries=2,
    messages=[
        {"role": "user", "content": "Extract jason is 25 years old"},
    ],
)

assert model.name == "JASON"