Play with OpenAI API¶
In this part we will see how to play with the OpenAI API, which is a special kind of API: it returns output text in response to a given prompt 🤖
This is great for conversation, but when we want to use these responses in our apps, we run into a bit of a snag. Our apps often need these replies in a format that's well-organized and ready for further processing. This is where Pydantic shines.
It acts as a bridge, converting the LLM's text replies into the structured format our apps crave. With Pydantic, we can set up the exact data model we want, molding the LLM's output to fit. This ensures our LLMs give us responses that are not only structured and validated but also way more practical for our needs.
What is Pydantic¶
Pydantic is the most widely used data validation library for Python. Instead of writing raw JSON Schema, you can define your schema with Pydantic, which has several key advantages:
- Widespread Adoption: Pydantic is downloaded over 100 million times a month and used by over 250k repositories on GitHub. It is a familiar tool in every Python developer's toolkit.
- Simplicity: Pydantic allows you to define your models in Python, avoiding the complexities of JSON Schema.
- Framework Compatibility: Many popular Python frameworks already use Pydantic, making it a natural choice.
You can check the official documentation here 🤓
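As a quick taste before we wire it to the OpenAI API, here is a minimal sketch of the "bridge" idea: a Pydantic model validating and coercing a plain dict, the kind of data we might parse out of an LLM reply. The `Answer` model and its fields are purely illustrative names.

```python
from pydantic import BaseModel

# Hypothetical model, purely for illustration
class Answer(BaseModel):
    topic: str
    confidence: float

# A dict we might have parsed out of an LLM reply; note the float arrives as a string
parsed = {"topic": "recursion", "confidence": "0.9"}

answer = Answer(**parsed)  # Pydantic validates and coerces the types
print(type(answer.confidence))  # a real float, ready for computation
```

If a field is missing or cannot be coerced, Pydantic raises a `ValidationError` that pinpoints the offending field, which is exactly the safety net we want around unpredictable LLM output.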
#!pip install openai --upgrade
# official example from the docs
from openai import OpenAI

client = OpenAI(api_key="...")  # put your key here

completion = client.chat.completions.create(
    model="gpt-3.5-turbo",
    messages=[
        {"role": "system", "content": "You are a poetic assistant, skilled in explaining complex programming concepts with creative flair."},
        {"role": "user", "content": "Compose a poem that explains the concept of recursion in programming."},
    ],
)

print(completion.choices[0].message)
ChatCompletionMessage(content="In the realm of code, where logic unfolds,\nThere lies a concept, mysterious and bold.\nIts name, recursion, echoes in the night,\nA poetic dance, a sorcerer's delight.\n\nWith humble grace, it enters the fray,\nA tale of repetition, in an enchanting way.\nLike a mirror reflecting its own reflection,\nRecursion beckons, defying convention.\n\nThrough loops and loops, it goes on a quest,\nExploring the depths, as it journeys abreast.\nA function calling itself, a daring act,\nCreating a puzzle, where answers are stacked.\n\nWith each call, like ripples in a pond,\nThe problem's domain shrinks, oh so fond.\nDivide and conquer, its secret intention,\nSolving complex tasks with mathematical invention.\n\nLike a spiral staircase reaching for the sky,\nRecursion dances, reaching ever high.\nA mesmerizing fractal, an infinite embrace,\nUnraveling mysteries, at a steady pace.\n\nBut tread with caution, for power comes with care,\nRecursion can spiral into an infinite affair.\nBase cases, like anchors, must be sound,\nVerifying escape routes, holding solid ground.\n\nYet, in its essence, recursion remains sublime,\nA symphony of patterns, a poetic chime.\nFrom fractals to trees, and mazes intricate,\nRecursion weaves dreams, a creator innate.\n\nSo let us marvel at this coding art,\nWith every cycle, a humble restart.\nFor in the realm of programming's domain,\nRecursion is the poet's whisper, the programmer's refrain.", role='assistant', function_call=None, tool_calls=None)
type(completion.choices[0].message)
openai.types.chat.chat_completion_message.ChatCompletionMessage
OpenAI in a nutshell¶
Now let's encapsulate our client.chat.completions.create() call inside a Python function like this:
def get_completion(prompt: str, model: str = "gpt-3.5-turbo"):
    """
    Query your LLM with a prompt.

    Parameters:
        prompt (str): The text prompt you want the LLM to respond to.
        model (str, optional): The model used to generate the response. Defaults to "gpt-3.5-turbo".

    Returns:
        ChatCompletionMessage: The generated chat completion message from the specified model.
    """
    messages = [{"role": "user", "content": prompt}]
    response = client.chat.completions.create(
        model=model,
        messages=messages,
        temperature=0,
    )
    return response.choices[0].message
response = get_completion("What are the top three big cities in Europe by population?")
print(response)
ChatCompletionMessage(content='The top three big cities in Europe by population are:\n\n1. Istanbul, Turkey - With a population of over 15 million people, Istanbul is the most populous city in Europe.\n2. Moscow, Russia - Moscow is the second most populous city in Europe, with a population of over 12 million people.\n3. London, United Kingdom - London is the third most populous city in Europe, with a population of over 9 million people.', role='assistant', function_call=None, tool_calls=None)
Not very useful as a variable as you can see 😢
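To see why, imagine extracting just the city names from that free-text reply: we would be stuck doing brittle string surgery. A sketch, using a shortened copy of the reply above:

```python
# The reply captured above, as plain text (shortened for the example)
text = ("1. Istanbul, Turkey - With a population of over 15 million people...\n"
        "2. Moscow, Russia - ...\n"
        "3. London, United Kingdom - ...")

# Pulling structure out of prose means fragile, format-dependent parsing:
city_names = [line.split(".", 1)[1].split(",")[0].strip()
              for line in text.splitlines()]
print(city_names)  # ['Istanbul', 'Moscow', 'London']
```

One change in the model's phrasing and this parsing breaks. A structured, validated output is what we are after.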
Add Pydantic secret sauce¶
Let's use Pydantic models to create a well-organized output for our data. We'll make a CityResponse model that gathers important information like the city's name, its country, population, and the local currency. Next, we'll group these cities under a Cities model.
By doing this, we can produce a tailored list of cities, complete with detailed information, all neatly arranged for efficient data management. It's important to include a description for each field, because LangChain will use these descriptions to generate a prompt later 😎
from pydantic import BaseModel, Field
from typing import List

class CityResponse(BaseModel):
    city_name: str = Field(description="This is the Name of the city")
    country: str = Field(description="This is the country of the city")
    population_number: int = Field(description="This is the number of inhabitants")
    local_currency: str = Field(description="This is the local currency of the city")

class Cities(BaseModel):
    city: List[CityResponse]
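Those field descriptions are not just documentation: they surface in the JSON schema that Pydantic generates from the model, which is what LangChain will embed in the prompt. A self-contained sketch (models re-declared so the snippet runs on its own):

```python
from typing import List
from pydantic import BaseModel, Field

# Re-declared here so the snippet runs standalone
class CityResponse(BaseModel):
    city_name: str = Field(description="This is the Name of the city")
    country: str = Field(description="This is the country of the city")
    population_number: int = Field(description="This is the number of inhabitants")
    local_currency: str = Field(description="This is the local currency of the city")

class Cities(BaseModel):
    city: List[CityResponse]

# The field descriptions appear in the generated JSON schema
schema = Cities.schema()  # on Pydantic v2, prefer Cities.model_json_schema()
print(list(schema["properties"].keys()))
```

Inspecting `schema` shows the same structure you will see in LangChain's format instructions below.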
LangChain for better prompts¶
In this part we will add the famous LangChain
framework to shape the output of our language model into the format we desire 👨‍🍳
#!pip install langchain
from langchain.output_parsers import PydanticOutputParser
pydantic_parser = PydanticOutputParser(pydantic_object=Cities)
format_instructions = pydantic_parser.get_format_instructions()
print(format_instructions)
The output should be formatted as a JSON instance that conforms to the JSON schema below.

As an example, for the schema {"properties": {"foo": {"title": "Foo", "description": "a list of strings", "type": "array", "items": {"type": "string"}}}, "required": ["foo"]}
the object {"foo": ["bar", "baz"]} is a well-formatted instance of the schema. The object {"properties": {"foo": ["bar", "baz"]}} is not well-formatted.

Here is the output schema:
```
{"properties": {"city": {"title": "City", "type": "array", "items": {"$ref": "#/definitions/CityResponse"}}}, "required": ["city"], "definitions": {"CityResponse": {"title": "CityResponse", "type": "object", "properties": {"city_name": {"title": "City Name", "description": "This is the Name of the city", "type": "string"}, "country": {"title": "Country", "description": "This is the country of the city", "type": "string"}, "population_number": {"title": "Population Number", "description": "This is the number of inhabitants", "type": "integer"}, "local_currency": {"title": "Local Currency", "description": "This is the local currency of the city", "type": "string"}}, "required": ["city_name", "country", "population_number", "local_currency"]}}}
```
The format instructions above serve as a template for structuring the output of our Language Model. These instructions establish that the output should adhere to a particular JSON schema.
They provide examples where one JSON instance correctly follows the schema (an object with a foo key and an array value) and another instance which does not align with the schema.
Structured query for structured responses¶
Now let’s use PromptTemplate
from LangChain to configure a structured input to our language model.
from langchain.prompts import PromptTemplate

query = "What are the top three big cities in Europe by population?"

prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": pydantic_parser.get_format_instructions()},
)

_input = prompt.format_prompt(query=query)
answer = get_completion(_input.to_string())
print(answer.content)
{"city": [ { "city_name": "Istanbul", "country": "Turkey", "population_number": 15029231, "local_currency": "Turkish Lira" }, { "city_name": "Moscow", "country": "Russia", "population_number": 12692466, "local_currency": "Russian Ruble" }, { "city_name": "London", "country": "United Kingdom", "population_number": 9304016, "local_currency": "British Pound" } ]}
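That JSON string can already be turned into typed Python objects with the models we defined; LangChain's PydanticOutputParser does essentially this (plus some JSON extraction) under the hood. A standalone sketch with a shortened sample of the reply:

```python
import json
from typing import List
from pydantic import BaseModel

# Same shape as the models above, re-declared so this runs standalone
class CityResponse(BaseModel):
    city_name: str
    country: str
    population_number: int
    local_currency: str

class Cities(BaseModel):
    city: List[CityResponse]

# A shortened sample of the JSON string the model returned
raw = ('{"city": [{"city_name": "Istanbul", "country": "Turkey", '
       '"population_number": 15029231, "local_currency": "Turkish Lira"}]}')

cities = Cities(**json.loads(raw))
print(cities.city[0].country)  # typed attribute access, no dict juggling
```

From here on, `cities.city` is a list of validated `CityResponse` objects rather than raw dicts.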
Validation¶
Moreover, Pydantic models allow us to incorporate validation rules for each field. This added step of validation makes sure that every response from the LLM matches our set standards exactly.
For example, imagine we need city names to be in uppercase. To achieve this, we'll use Pydantic's validator decorator and place our specific conditions in a class method, say validate_cities. This way, when we get a response from the LLM, we can check if it meets our requirements.
from typing import List
from pydantic import BaseModel, Field, validator
from langchain.prompts import PromptTemplate

class CityResponse(BaseModel):
    city_name: str = Field(description="This is the Name of the city")
    country: str = Field(description="This is the country of the city")
    population_number: int = Field(description="This is the number of inhabitants")
    local_currency: str = Field(description="This is the local currency of the city")

class Cities(BaseModel):
    cities: List[CityResponse]

    @validator("cities", pre=True)
    def validate_cities(cls, v):
        for city in v:
            city_name = city.get("city_name", None)
            if not city_name:
                raise ValueError("'city_name' is required")
            if not city_name.isupper():
                raise ValueError(f"City name '{city_name}' is not uppercase")
        return v
pydantic_parser = PydanticOutputParser(pydantic_object=Cities)

query = "What are the top three big cities in Europe by population?"

prompt = PromptTemplate(
    template="Answer the user query.\n{format_instructions}\n{query}\n",
    input_variables=["query"],
    partial_variables={"format_instructions": pydantic_parser.get_format_instructions()},
)

_input = prompt.format_prompt(query=query)
answer = get_completion(_input.to_string())
pydantic_parser.parse(answer.content)
---------------------------------------------------------------------------
ValidationError                           Traceback (most recent call last)
File ~/anaconda3/envs/vision310/lib/python3.10/site-packages/langchain/output_parsers/pydantic.py:30, in PydanticOutputParser.parse(self, text)
     29     json_object = json.loads(json_str, strict=False)
---> 30     return self.pydantic_object.parse_obj(json_object)
     32 except (json.JSONDecodeError, ValidationError) as e:

File ~/anaconda3/envs/vision310/lib/python3.10/site-packages/pydantic/main.py:527, in pydantic.main.BaseModel.parse_obj()

File ~/anaconda3/envs/vision310/lib/python3.10/site-packages/pydantic/main.py:342, in pydantic.main.BaseModel.__init__()

ValidationError: 1 validation error for Cities
cities
  City name 'Istanbul' is not uppercase (type=value_error)

During handling of the above exception, another exception occurred:

OutputParserException                     Traceback (most recent call last)
Input In [37], in <cell line: 34>()
     32 _input = prompt.format_prompt(query=query)
     33 answer = get_completion(_input.to_string())
---> 34 pydantic_parser.parse(answer.content)

File ~/anaconda3/envs/vision310/lib/python3.10/site-packages/langchain/output_parsers/pydantic.py:35, in PydanticOutputParser.parse(self, text)
     33 name = self.pydantic_object.__name__
     34 msg = f"Failed to parse {name} from completion {text}. Got: {e}"
---> 35 raise OutputParserException(msg, llm_output=text)

OutputParserException: Failed to parse Cities from completion {"cities": [ { "city_name": "Istanbul", "country": "Turkey", "population_number": 15029231, "local_currency": "Turkish Lira" }, { "city_name": "Moscow", "country": "Russia", "population_number": 12692466, "local_currency": "Russian Ruble" }, { "city_name": "London", "country": "United Kingdom", "population_number": 9304016, "local_currency": "British Pound" } ]}. Got: 1 validation error for Cities
cities
  City name 'Istanbul' is not uppercase (type=value_error)
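As expected, the validator rejects the response because the model wrote "Istanbul" rather than "ISTANBUL". An alternative design worth knowing: instead of raising, a pre-validator can normalize the data so that validation always succeeds. A self-contained sketch using the same v1-style validator:

```python
from typing import List
from pydantic import BaseModel, validator

class CityResponse(BaseModel):
    city_name: str
    country: str
    population_number: int
    local_currency: str

class Cities(BaseModel):
    cities: List[CityResponse]

    @validator("cities", pre=True)
    def uppercase_city_names(cls, v):
        # Normalize instead of rejecting: coerce each name to uppercase
        for city in v:
            if isinstance(city, dict) and city.get("city_name"):
                city["city_name"] = city["city_name"].upper()
        return v

data = {"cities": [{"city_name": "Istanbul", "country": "Turkey",
                    "population_number": 15029231, "local_currency": "Turkish Lira"}]}
print(Cities(**data).cities[0].city_name)  # ISTANBUL
```

Whether to reject or normalize depends on your use case: rejecting surfaces bad LLM output loudly, while normalizing keeps the pipeline flowing.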
Wrap it up¶
To wrap it up, integrating Pydantic models with PromptTemplate and PydanticOutputParser from LangChain gives our Python code a reliable way to extract well-organized, detailed data from Large Language Models.
This strategy simplifies the challenging task of dealing with unstructured outputs and ensures that we achieve a high standard of data quality, meeting all our specific needs and criteria.
Next level with the instructor
library¶
The instructor project, used in conjunction with Pydantic, can revolutionize the way we interact with language models. It brings simplicity, modularity, and a high degree of customization, making the OpenAI SDK even more usable for developers.
Handle proper retry¶
One of the coolest features of instructor, for me, is its validation abstraction and the max_retries
parameter. Here, the UserDetails model is passed as the response_model, and max_retries is set to 2.
import instructor
from openai import OpenAI
from pydantic import BaseModel, field_validator

# Apply the patch to the OpenAI client
client = instructor.patch(OpenAI())

class UserDetails(BaseModel):
    name: str
    age: int

    @field_validator("name")
    @classmethod
    def validate_name(cls, v):
        if v.upper() != v:
            raise ValueError("Name must be in uppercase.")
        return v

model = client.chat.completions.create(
    model="gpt-3.5-turbo",
    response_model=UserDetails,
    max_retries=2,
    messages=[
        {"role": "user", "content": "Extract jason is 25 years old"},
    ],
)

assert model.name == "JASON"