Developer quickstart
Get up and running with the OpenAI API
Looking for ChatGPT? Head to chatgpt.com.
The OpenAI API provides a simple interface for developers to create an intelligence layer in their applications, powered by OpenAI’s state of the art models. The Chat Completions endpoint powers ChatGPT and provides a simple way to take text as input and use a model like GPT-4o to generate an output.
Want to jump straight to the code?
Skip the quickstart and dive into the API reference.
This quickstart is designed to help get your local development environment set up and send your first API request. If you are an experienced developer or want to just dive into using the OpenAI API, the API reference or GPT guide are a great place to start. Throughout this quickstart, you will learn:
How to set up your development environment
How to install the latest SDKs
Some of the basic concepts of the OpenAI API
How to send your first API request
If you run into any challenges or have questions getting started, please join our developer forum.
Account setup
First, create an OpenAI account or sign in. Next, navigate to the API key page and “Create new secret key”, optionally naming the key. Make sure to save this somewhere safe and do not share it with anyone.
Quickstart language selection
Select the tool or language you want to get started using the OpenAI API with.
Node.js is a popular JavaScript runtime that is commonly used for web development. OpenAI provides a custom Node.js / TypeScript library which makes working with the OpenAI API in JavaScript simple and efficient.
Step 1: Setting up Node
Install Node.js
To use the OpenAI Node.js library, you will need to ensure you have Node.js installed.
To download Node.js, head to the official Node website and download the most recent version marked “LTS” (Long Term Support). If you are installing Node.js for the first time, you can follow the official Node.js usage guide to get started.
Install the OpenAI Node.js library
Once you have Node.js installed, the OpenAI Node.js library can be installed. From the terminal / command line, run:
npm install --save openai
or
yarn add openai
Step 2: Set up your API key
Set up your API key for all projects (recommended)
The main advantage to making your API key accessible for all projects is that our SDK will automatically detect it and use it without having to write any code.
MacOS
Open Terminal: You can find it in the Applications folder or search for it using Spotlight (Command + Space).
Edit bash profile: Use the command nano ~/.bash_profile or nano ~/.zshrc (for newer MacOS versions) to open the profile file in a text editor.
Add Environment Variable: In the editor, ensure you have set your API key as shown below, replacing your-api-key-here with your actual API key:
export OPENAI_API_KEY='your-api-key-here'
Save and exit: Press Ctrl+O to write the changes, followed by Ctrl+X to close the editor.
Load your profile: Use the command source ~/.bash_profile or source ~/.zshrc to load the updated profile.
Verification: Verify the setup by typing echo $OPENAI_API_KEY in the terminal. It should display your API key.
Windows
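The following is a brief sketch of the equivalent setup on Windows using the built-in setx command (you can also set the variable through the System Properties dialog):
Open Command Prompt: You can find it by searching "cmd" in the Start menu.
Set the environment variable: Run the following command, replacing your-api-key-here with your actual API key:
setx OPENAI_API_KEY "your-api-key-here"
Open a new Command Prompt window: setx only applies to new sessions, so close and reopen the terminal.
Verification: Verify the setup by typing echo %OPENAI_API_KEY% in the new window. It should display your API key.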
Step 3: Sending your first API request
Making an API request
After you have Node.js configured and set up an API key, the final step is to send a request to the OpenAI API using the Node.js library. To do this, create a file named openai-test.js using the terminal or an IDE.
Inside the file, copy and paste one of the examples below:
ChatCompletions
import OpenAI from "openai";
const openai = new OpenAI();

async function main() {
  const completion = await openai.chat.completions.create({
    messages: [
      { role: "system", content: "You are a helpful assistant." },
      { role: "user", content: "Compose a poem that explains the concept of recursion in programming." },
    ],
    model: "gpt-3.5-turbo",
  });
  console.log(completion.choices[0]);
}
main();
To run the code, enter node openai-test.js into the terminal / command line.
The Chat Completions example highlights just one area of strength for our models: creative ability. Explaining recursion (the programming topic) in a well formatted poem is something both the best developers and best poets would struggle with. In this case, gpt-3.5-turbo does it effortlessly.
import OpenAI from "openai";
const openai = new OpenAI();

async function main() {
  const embedding = await openai.embeddings.create({
    model: "text-embedding-ada-002",
    input: "The quick brown fox jumped over the lazy dog",
  });
  console.log(embedding);
}
main();
import OpenAI from "openai";
const openai = new OpenAI();

async function main() {
  const image = await openai.images.generate({ prompt: "A cute baby sea otter" });
  console.log(image.data);
}
main();
Models
Flagship models
GPT-4o New
Our fastest and most affordable flagship model
Text and image input, text output
128k context length
Input: $5 | Output: $15*
GPT-4 Turbo
Our previous high-intelligence model
Text and image input, text output
128k context length
Input: $10 | Output: $30*
GPT-3.5 Turbo
Our fast, inexpensive model for simple tasks
Text input, text output
16k context length
Input: $0.50 | Output: $1.50*
* Prices per 1 million tokens
Models overview
The OpenAI API is powered by a diverse set of models with different capabilities and price points. You can also make customizations to our models for your specific use case with fine-tuning.
MODEL DESCRIPTION
GPT-4o The fastest and most affordable flagship model
GPT-4 Turbo and GPT-4 The previous set of high-intelligence models
GPT-3.5 Turbo A fast, inexpensive model for simple tasks
DALL·E A model that can generate and edit images given a natural language prompt
TTS A set of models that can convert text into natural sounding spoken audio
Whisper A model that can convert audio into text
Embeddings A set of models that can convert text into a numerical form
Moderation A fine-tuned model that can detect whether text may be sensitive or unsafe
GPT base A set of models without instruction following that can understand as well as generate natural language or code
Deprecated A full list of models that have been deprecated along with the suggested replacement
We have also published open source models including Point-E, Whisper, Jukebox, and CLIP.
Continuous model upgrades
gpt-4o, gpt-4-turbo, gpt-4, and gpt-3.5-turbo point to their respective latest model version. You can verify this by looking at the response object after sending a request. The response will include the specific model version used (e.g. gpt-3.5-turbo-1106).
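For example, a minimal Node.js check (assuming the SDK is set up as in the quickstart above) reads the model field on the response:
import OpenAI from "openai";
const openai = new OpenAI();
const completion = await openai.chat.completions.create({
  model: "gpt-3.5-turbo",
  messages: [{ role: "user", content: "Hello!" }],
});
// Logs the specific dated snapshot that served the request, e.g. gpt-3.5-turbo-0125
console.log(completion.model);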
We also offer pinned model versions that developers can continue using for at least three months after an updated model has been introduced. With the new cadence of model updates, we are also giving people the ability to contribute evals to help us improve the model for different use cases. If you are interested, check out the OpenAI Evals repository.
Learn more about model deprecation on our deprecation page.
GPT-4o
GPT-4o (“o” for “omni”) is our most advanced model. It is multimodal (accepting text or image inputs and outputting text), and it has the same high intelligence as GPT-4 Turbo but is much more efficient—it generates text 2x faster and is 50% cheaper. Additionally, GPT-4o has the best vision and performance across non-English languages of any of our models. GPT-4o is available in the OpenAI API to paying customers. Learn how to use GPT-4o in our text generation guide.
MODEL DESCRIPTION CONTEXT WINDOW TRAINING DATA
gpt-4o Our most advanced, multimodal flagship model that’s cheaper and faster than GPT-4 Turbo. Currently points to gpt-4o-2024-05-13. 128,000 tokens Up to Oct 2023
gpt-4o-2024-05-13 gpt-4o currently points to this version. 128,000 tokens Up to Oct 2023
GPT-4 Turbo and GPT-4
GPT-4 is a large multimodal model (accepting text or image inputs and outputting text) that can solve difficult problems with greater accuracy than any of our previous models, thanks to its broader general knowledge and advanced reasoning capabilities. GPT-4 is available in the OpenAI API to paying customers. Like gpt-3.5-turbo, GPT-4 is optimized for chat but works well for traditional completions tasks using the Chat Completions API. Learn how to use GPT-4 in our text generation guide.
MODEL DESCRIPTION CONTEXT WINDOW TRAINING DATA
gpt-4-turbo The latest GPT-4 Turbo model with vision capabilities. Vision requests can now use JSON mode and function calling. Currently points to gpt-4-turbo-2024-04-09. 128,000 tokens Up to Dec 2023
gpt-4-turbo-2024-04-09 GPT-4 Turbo with Vision model. Vision requests can now use JSON mode and function calling. gpt-4-turbo currently points to this version. 128,000 tokens Up to Dec 2023
gpt-4-turbo-preview GPT-4 Turbo preview model. Currently points to gpt-4-0125-preview. 128,000 tokens Up to Dec 2023
gpt-4-0125-preview GPT-4 Turbo preview model intended to reduce cases of “laziness” where the model doesn’t complete a task. Returns a maximum of 4,096 output tokens. Learn more. 128,000 tokens Up to Dec 2023
gpt-4-1106-preview GPT-4 Turbo preview model featuring improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. Returns a maximum of 4,096 output tokens. This is a preview model. Learn more. 128,000 tokens Up to Apr 2023
gpt-4 Currently points to gpt-4-0613. See continuous model upgrades. 8,192 tokens Up to Sep 2021
gpt-4-0613 Snapshot of gpt-4 from June 13th 2023 with improved function calling support. 8,192 tokens Up to Sep 2021
gpt-4-0314 Legacy Snapshot of gpt-4 from March 14th 2023. 8,192 tokens Up to Sep 2021
For many basic tasks, the difference between GPT-4 and GPT-3.5 models is not significant. However, in more complex reasoning situations, GPT-4 is much more capable than any of our previous models.
Multilingual capabilities
GPT-4 outperforms both previous large language models and as of 2023, most state-of-the-art systems (which often have benchmark-specific training or hand-engineering). On the MMLU benchmark, an English-language suite of multiple-choice questions covering 57 subjects, GPT-4 not only outperforms existing models by a considerable margin in English, but also demonstrates strong performance in other languages.
GPT-3.5 Turbo
GPT-3.5 Turbo models can understand and generate natural language or code and have been optimized for chat using the Chat Completions API but work well for non-chat tasks as well.
MODEL DESCRIPTION CONTEXT WINDOW TRAINING DATA
gpt-3.5-turbo-0125 The latest GPT-3.5 Turbo model with higher accuracy at responding in requested formats and a fix for a bug which caused a text encoding issue for non-English language function calls. Returns a maximum of 4,096 output tokens. Learn more. 16,385 tokens Up to Sep 2021
gpt-3.5-turbo Currently points to gpt-3.5-turbo-0125. 16,385 tokens Up to Sep 2021
gpt-3.5-turbo-1106 GPT-3.5 Turbo model with improved instruction following, JSON mode, reproducible outputs, parallel function calling, and more. Returns a maximum of 4,096 output tokens. Learn more. 16,385 tokens Up to Sep 2021
gpt-3.5-turbo-instruct Similar capabilities as GPT-3 era models. Compatible with legacy Completions endpoint and not Chat Completions. 4,096 tokens Up to Sep 2021
DALL·E
DALL·E is an AI system that can create realistic images and art from a description in natural language. DALL·E 3 currently supports the ability, given a prompt, to create a new image with a specific size. DALL·E 2 also supports the ability to edit an existing image, or create variations of a user provided image.
DALL·E 3 is available through our Images API along with DALL·E 2. You can try DALL·E 3 through ChatGPT Plus.
MODEL DESCRIPTION
dall-e-3 The latest DALL·E model released in Nov 2023. Learn more.
dall-e-2 The previous DALL·E model released in Nov 2022. The 2nd iteration of DALL·E with more realistic, accurate, and 4x greater resolution images than the original model.
TTS
TTS is an AI model that converts text to natural sounding spoken audio. We offer two different model variants: tts-1 is optimized for real time text to speech use cases and tts-1-hd is optimized for quality. These models can be used with the Speech endpoint in the Audio API.
MODEL DESCRIPTION
tts-1 The latest text to speech model, optimized for speed.
tts-1-hd The latest text to speech model, optimized for quality.
Whisper
Whisper is a general-purpose speech recognition model. It is trained on a large dataset of diverse audio and is also a multi-task model that can perform multilingual speech recognition as well as speech translation and language identification. The Whisper v2-large model is currently available through our API with the whisper-1 model name.
Currently, there is no difference between the open source version of Whisper and the version available through our API. However, through our API, we offer an optimized inference process which makes running Whisper through our API much faster than doing it through other means. For more technical details on Whisper, you can read the paper.
Embeddings
Embeddings are a numerical representation of text that can be used to measure the relatedness between two pieces of text. Embeddings are useful for search, clustering, recommendations, anomaly detection, and classification tasks. You can read more about our latest embedding models in the announcement blog post.
MODEL DESCRIPTION OUTPUT DIMENSION
text-embedding-3-large Most capable embedding model for both English and non-English tasks 3,072
text-embedding-3-small Increased performance over 2nd generation ada embedding model 1,536
text-embedding-ada-002 Most capable 2nd generation embedding model, replacing 16 first generation models 1,536
Moderation
The Moderation models are designed to check whether content complies with OpenAI’s usage policies. The models provide classification capabilities that look for content in the following categories: hate, hate/threatening, self-harm, sexual, sexual/minors, violence, and violence/graphic. You can find out more in our moderation guide.
Moderation models take in an arbitrary sized input that is automatically broken up into chunks of 4,096 tokens. In cases where the input is more than 32,768 tokens, truncation is used which in a rare condition may omit a small number of tokens from the moderation check.
The final results from each request to the moderation endpoint show the maximum value on a per category basis. For example, if one chunk of 4K tokens had a category score of 0.9901 and the other had a score of 0.1901, the results would show 0.9901 in the API response since it is higher.
MODEL DESCRIPTION MAX TOKENS
text-moderation-latest Currently points to text-moderation-007. 32,768
text-moderation-stable Currently points to text-moderation-007. 32,768
text-moderation-007 Most capable moderation model across all categories. 32,768
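As an illustrative sketch, a minimal Node.js call to the moderation endpoint returns the per-category flags and scores described above:
import OpenAI from "openai";
const openai = new OpenAI();
const moderation = await openai.moderations.create({
  input: "Sample text to classify.",
});
// Boolean flags and numeric scores for each category
console.log(moderation.results[0].categories);
console.log(moderation.results[0].category_scores);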
GPT base
GPT base models can understand and generate natural language or code but are not trained with instruction following. These models are made to be replacements for our original GPT-3 base models and use the legacy Completions API. Most customers should use GPT-3.5 or GPT-4.
MODEL DESCRIPTION MAX TOKENS TRAINING DATA
babbage-002 Replacement for the GPT-3 ada and babbage base models. 16,384 tokens Up to Sep 2021
davinci-002 Replacement for the GPT-3 curie and davinci base models. 16,384 tokens Up to Sep 2021
How we use your data
Your data is your data.
As of March 1, 2023, data sent to the OpenAI API will not be used to train or improve OpenAI models (unless you explicitly opt in). One advantage to opting in is that the models may get better at your use case over time.
To help identify abuse, API data may be retained for up to 30 days, after which it will be deleted (unless otherwise required by law). For trusted customers with sensitive applications, zero data retention may be available. With zero data retention, request and response bodies are not persisted to any logging mechanism and exist only in memory in order to serve the request.
Note that this data policy does not apply to OpenAI’s non-API consumer services like ChatGPT or DALL·E Labs.
Default usage policies by endpoint
ENDPOINT DATA USED FOR TRAINING DEFAULT RETENTION ELIGIBLE FOR ZERO RETENTION
/v1/chat/completions* No 30 days Yes, except image inputs*
/v1/assistants No 30 days ** No
/v1/threads No 30 days ** No
/v1/threads/messages No 30 days ** No
/v1/threads/runs No 30 days ** No
/v1/vector_stores No 30 days ** No
/v1/threads/runs/steps No 30 days ** No
/v1/images/generations No 30 days No
/v1/images/edits No 30 days No
/v1/images/variations No 30 days No
/v1/embeddings No 30 days Yes
/v1/audio/transcriptions No Zero data retention –
/v1/audio/translations No Zero data retention –
/v1/audio/speech No 30 days Yes
/v1/files No Until deleted by customer No
/v1/fine_tuning/jobs No Until deleted by customer No
/v1/batches No Until deleted by customer No
/v1/moderations No Zero data retention –
/v1/completions No 30 days Yes
* Image inputs via the gpt-4-turbo model (or previously gpt-4-vision-preview) are not eligible for zero retention.
** Objects related to the Assistants API are deleted from our servers 30 days after you delete them via the API or the dashboard. Objects that are not deleted via the API or dashboard are retained indefinitely.
For details, see our API data usage policies. To learn more about zero retention, get in touch with our sales team.
Model endpoint compatibility
ENDPOINT LATEST MODELS
/v1/assistants All GPT-4 and GPT-3.5 Turbo models. The retrieval tool requires gpt-4-turbo-preview (and subsequent dated model releases) or gpt-3.5-turbo-1106 (and subsequent versions).
/v1/audio/transcriptions whisper-1
/v1/audio/translations whisper-1
/v1/audio/speech tts-1, tts-1-hd
/v1/chat/completions gpt-4 and dated model releases, gpt-4-turbo-preview and dated model releases, gpt-3.5-turbo and dated model releases, fine-tuned versions of gpt-3.5-turbo
/v1/completions (Legacy) gpt-3.5-turbo-instruct, babbage-002, davinci-002
/v1/embeddings text-embedding-3-small, text-embedding-3-large, text-embedding-ada-002
/v1/fine_tuning/jobs gpt-3.5-turbo, babbage-002, davinci-002
/v1/moderations text-moderation-stable, text-moderation-latest
/v1/images/generations dall-e-2, dall-e-3
This list excludes all of our deprecated models.
Prompt examples
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

const response = await openai.chat.completions.create({
  model: "gpt-3.5-turbo",
  messages: [
    {
      "role": "system",
      "content": "You will be provided with statements, and your task is to convert them to standard English."
    },
    {
      "role": "user",
      "content": "She no went to the market."
    }
  ],
  temperature: 0.7,
  max_tokens: 64,
  top_p: 1,
});
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

const response = await openai.chat.completions.create({
  model: "gpt-3.5-turbo",
  messages: [
    {
      "role": "system",
      "content": "Summarize content you are provided with for a second-grade student."
    },
    {
      "role": "user",
      "content": "Jupiter is the fifth planet from the Sun and the largest in the Solar System. It is a gas giant with a mass one-thousandth that of the Sun, but two-and-a-half times that of all the other planets in the Solar System combined. Jupiter is one of the brightest objects visible to the naked eye in the night sky, and has been known to ancient civilizations since before recorded history. It is named after the Roman god Jupiter.[19] When viewed from Earth, Jupiter can be bright enough for its reflected light to cast visible shadows,[20] and is on average the third-brightest natural object in the night sky after the Moon and Venus."
    }
  ],
  temperature: 0.7,
  max_tokens: 64,
  top_p: 1,
});
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

const response = await openai.chat.completions.create({
  model: "gpt-3.5-turbo",
  messages: [
    {
      "role": "system",
      "content": "You will be provided with unstructured data, and your task is to parse it into CSV format."
    },
    {
      "role": "user",
      "content": "There are many fruits that were found on the recently discovered planet Goocrux. There are neoskizzles that grow there, which are purple and taste like candy. There are also loheckles, which are a grayish blue fruit and are very tart, a little bit like a lemon. Pounits are a bright green color and are more savory than sweet. There are also plenty of loopnovas which are a neon pink flavor and taste like cotton candy. Finally, there are fruits called glowls, which have a very sour and bitter taste which is acidic and caustic, and a pale orange tinge to them."
    }
  ],
  temperature: 0.7,
  max_tokens: 64,
  top_p: 1,
});
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [
    {
      "role": "user",
      "content": "Write a Python function that takes as input a file path to an image, loads the image into memory as a numpy array, then crops the rows and columns around the perimeter if they are darker than a threshold value. Use the mean value of rows and columns to decide if they should be marked for deletion."
    }
  ],
  temperature: 0.7,
  max_tokens: 64,
  top_p: 1,
});
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [
    {
      "role": "user",
      "content": "Make a single page website that shows off different neat javascript features for drop-downs and things to display information. The website should be an HTML file with embedded javascript and CSS."
    }
  ],
  temperature: 0.7,
  max_tokens: 64,
  top_p: 1,
});
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [
    {
      "role": "system",
      "content": "Given the following SQL tables, your job is to write queries given a user's request.\n \n CREATE TABLE Orders (\n OrderID int,\n CustomerID int,\n OrderDate datetime,\n OrderTime varchar(8),\n PRIMARY KEY (OrderID)\n );\n \n CREATE TABLE OrderDetails (\n OrderDetailID int,\n OrderID int,\n ProductID int,\n Quantity int,\n PRIMARY KEY (OrderDetailID)\n );\n \n CREATE TABLE Products (\n ProductID int,\n ProductName varchar(50),\n Category varchar(50),\n UnitPrice decimal(10, 2),\n Stock int,\n PRIMARY KEY (ProductID)\n );\n \n CREATE TABLE Customers (\n CustomerID int,\n FirstName varchar(50),\n LastName varchar(50),\n Email varchar(100),\n Phone varchar(20),\n PRIMARY KEY (CustomerID)\n );"
    },
    {
      "role": "user",
      "content": "Write a SQL query which computes the average total order value for all orders on 2023-04-01."
    }
  ],
  temperature: 0.7,
  max_tokens: 64,
  top_p: 1,
});
import OpenAI from "openai";

const openai = new OpenAI({
  apiKey: process.env.OPENAI_API_KEY,
});

const response = await openai.chat.completions.create({
  model: "gpt-4",
  messages: [
    {
      "role": "user",
      "content": "Write a lesson plan for an introductory algebra class. The lesson plan should cover the distributive law, in particular how it works in simple cases involving mixes of positive and negative numbers. Come up with some examples that show common student errors."
    }
  ],
  temperature: 0.7,
  max_tokens: 64,
  top_p: 1,
});
Text generation models
OpenAI’s text generation models (often called generative pre-trained transformers or large language models) have been trained to understand natural language, code, and images. The models provide text outputs in response to their inputs. The text inputs to these models are also referred to as “prompts”. Designing a prompt is essentially how you “program” a large language model, usually by providing instructions or some examples of how to successfully complete a task.
Using OpenAI’s text generation models, you can build applications to:
Draft documents
Write computer code
Answer questions about a knowledge base
Analyze texts
Give software a natural language interface
Tutor in a range of subjects
Translate languages
Simulate characters for games
Try GPT-4o
Try out GPT-4o in the playground.
Explore GPT-4o with image inputs
Check out the vision guide for more detail.
To use one of these models via the OpenAI API, you’ll send a request to the Chat Completions API containing the inputs and your API key, and receive a response containing the model’s output.
You can experiment with various models in the chat playground. If you’re not sure which model to use then try gpt-4o if you need high intelligence or gpt-3.5-turbo if you need the fastest speed and lowest cost.
Chat Completions API
Chat models take a list of messages as input and return a model-generated message as output. Although the chat format is designed to make multi-turn conversations easy, it’s just as useful for single-turn tasks without any conversation.
An example Chat Completions API call looks like the following:
node.js
import OpenAI from "openai";

const openai = new OpenAI();

async function main() {
  const completion = await openai.chat.completions.create({
    messages: [
      { "role": "system", "content": "You are a helpful assistant." },
      { "role": "user", "content": "Who won the world series in 2020?" },
      { "role": "assistant", "content": "The Los Angeles Dodgers won the World Series in 2020." },
      { "role": "user", "content": "Where was it played?" }
    ],
    model: "gpt-3.5-turbo",
  });
  console.log(completion.choices[0]);
}
main();
To learn more, you can view the full API reference documentation for the Chat API.
The main input is the messages parameter. Messages must be an array of message objects, where each object has a role (either “system”, “user”, or “assistant”) and content. Conversations can be as short as one message or many back and forth turns.
Typically, a conversation is formatted with a system message first, followed by alternating user and assistant messages.
The system message helps set the behavior of the assistant. For example, you can modify the personality of the assistant or provide specific instructions about how it should behave throughout the conversation. However note that the system message is optional and the model’s behavior without a system message is likely to be similar to using a generic message such as “You are a helpful assistant.”
The user messages provide requests or comments for the assistant to respond to. Assistant messages store previous assistant responses, but can also be written by you to give examples of desired behavior.
Including conversation history is important when user instructions refer to prior messages. In the example above, the user’s final question of “Where was it played?” only makes sense in the context of the prior messages about the World Series of 2020. Because the models have no memory of past requests, all relevant information must be supplied as part of the conversation history in each request. If a conversation cannot fit within the model’s token limit, it will need to be shortened in some way.
To mimic the effect seen in ChatGPT where the text is returned iteratively, set the stream parameter to true.
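For example, a minimal streaming sketch with the Node.js SDK; when stream is true, the SDK returns an async iterable of chunks and the text arrives incrementally in each chunk's delta:
import OpenAI from "openai";
const openai = new OpenAI();
const stream = await openai.chat.completions.create({
  model: "gpt-3.5-turbo",
  messages: [{ role: "user", content: "Say this is a test." }],
  stream: true,
});
for await (const chunk of stream) {
  // Each chunk carries an incremental piece of the reply in delta.content
  process.stdout.write(chunk.choices[0]?.delta?.content || "");
}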
Chat Completions response format
An example Chat Completions API response looks as follows:
{
  "choices": [
    {
      "finish_reason": "stop",
      "index": 0,
      "message": {
        "content": "The 2020 World Series was played in Texas at Globe Life Field in Arlington.",
        "role": "assistant"
      },
      "logprobs": null
    }
  ],
  "created": 1677664795,
  "id": "chatcmpl-7QyqpwdfhqwajicIEznoc6Q47XAyW",
  "model": "gpt-3.5-turbo-0613",
  "object": "chat.completion",
  "usage": {
    "completion_tokens": 17,
    "prompt_tokens": 57,
    "total_tokens": 74
  }
}
The assistant’s reply can be extracted with:
node.js
completion.choices[0].message.content
Every response will include a finish_reason. The possible values for finish_reason are:
stop: API returned complete message, or a message terminated by one of the stop sequences provided via the stop parameter
length: Incomplete model output due to max_tokens parameter or token limit
function_call: The model decided to call a function
content_filter: Omitted content due to a flag from our content filters
null: API response still in progress or incomplete
Depending on input parameters, the model response may include different information.
JSON mode
A common way to use Chat Completions is to instruct the model to always return a JSON object that makes sense for your use case, by specifying this in the system message. While this does work in some cases, occasionally the models may generate output that does not parse to valid JSON objects.
To prevent these errors and improve model performance, when using gpt-4o, gpt-4-turbo, or gpt-3.5-turbo, you can set response_format to { "type": "json_object" } to enable JSON mode. When JSON mode is enabled, the model is constrained to only generate strings that parse into a valid JSON object.
Important notes:
When using JSON mode, always instruct the model to produce JSON via some message in the conversation, for example via your system message. If you don’t include an explicit instruction to generate JSON, the model may generate an unending stream of whitespace and the request may run continually until it reaches the token limit. To help ensure you don’t forget, the API will throw an error if the string “JSON” does not appear somewhere in the context.
The JSON in the message the model returns may be partial (i.e. cut off) if finish_reason is length, which indicates the generation exceeded max_tokens or the conversation exceeded the token limit. To guard against this, check finish_reason before parsing the response.
JSON mode will not guarantee the output matches any specific schema, only that it is valid and parses without errors.
node.js
import OpenAI from "openai";
const openai = new OpenAI();

async function main() {
  const completion = await openai.chat.completions.create({
    messages: [
      {
        role: "system",
        content: "You are a helpful assistant designed to output JSON.",
      },
      { role: "user", content: "Who won the world series in 2020?" },
    ],
    model: "gpt-3.5-turbo-0125",
    response_format: { type: "json_object" },
  });
  console.log(completion.choices[0].message.content);
}
main();
In this example, the response includes a JSON object that looks something like the following:
"content": "{\"winner\": \"Los Angeles Dodgers\"}"
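Before parsing, it is worth applying the finish_reason check described above. A minimal sketch, assuming the completion object from the example:
const choice = completion.choices[0];
if (choice.finish_reason === "length") {
  // The JSON may be truncated, so avoid parsing it blindly
  throw new Error("Response was cut off before the JSON object was complete.");
}
const result = JSON.parse(choice.message.content);
console.log(result.winner); // "Los Angeles Dodgers"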
Note that JSON mode is always enabled when the model is generating arguments as part of function calling.
Reproducible outputs Beta
Chat Completions are non-deterministic by default (which means model outputs may differ from request to request). That being said, we offer some control towards deterministic outputs by giving you access to the seed parameter and the system_fingerprint response field.
To receive (mostly) deterministic outputs across API calls, you can:
Set the seed parameter to any integer of your choice and use the same value across requests you’d like deterministic outputs for.
Ensure all other parameters (like prompt or temperature) are the exact same across requests.
Sometimes, determinism may be impacted due to necessary changes OpenAI makes to model configurations on our end. To help you keep track of these changes, we expose the system_fingerprint field. If this value is different, you may see different outputs due to changes we’ve made on our systems.
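A minimal sketch of requesting (mostly) deterministic output; the seed value here is an arbitrary example:
import OpenAI from "openai";
const openai = new OpenAI();
const completion = await openai.chat.completions.create({
  model: "gpt-3.5-turbo",
  messages: [{ role: "user", content: "Tell me a one-sentence story about a robot." }],
  seed: 12345, // reuse the same seed across requests
  temperature: 0, // keep all other parameters identical as well
});
// If system_fingerprint differs between requests, outputs may differ too
console.log(completion.system_fingerprint);
console.log(completion.choices[0].message.content);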
Deterministic outputs
Explore the new seed parameter in the OpenAI cookbook
Managing tokens
Language models read and write text in chunks called tokens. In English, a token can be as short as one character or as long as one word (e.g., a or apple), and in some languages tokens can be even shorter than one character or even longer than one word.
For example, the string "ChatGPT is great!" is encoded into six tokens: ["Chat", "G", "PT", " is", " great", "!"].
The total number of tokens in an API call affects:
How much your API call costs, as you pay per token
How long your API call takes, as writing more tokens takes more time
Whether your API call works at all, as total tokens must be below the model’s maximum limit (4097 tokens for gpt-3.5-turbo)
Both input and output tokens count toward these quantities. For example, if your API call used 10 tokens in the message input and you received 20 tokens in the message output, you would be billed for 30 tokens. Note however that for some models the price per token is different for tokens in the input vs. the output (see the pricing page for more information).
To see how many tokens are used by an API call, check the usage field in the API response (e.g., response['usage']['total_tokens']).
Chat models like gpt-3.5-turbo and gpt-4-turbo-preview use tokens in the same way as the models available in the completions API, but because of their message-based formatting, it’s more difficult to count how many tokens will be used by a conversation.
DEEP DIVE
Counting tokens for chat API calls
To see how many tokens are in a text string without making an API call, use OpenAI’s tiktoken Python library. Example code can be found in the OpenAI Cookbook’s guide on how to count tokens with tiktoken.
Each message passed to the API consumes the number of tokens in the content, role, and other fields, plus a few extra for behind-the-scenes formatting. This may change slightly in the future.
If a conversation has too many tokens to fit within a model’s maximum limit (e.g., more than 4097 tokens for gpt-3.5-turbo or more than 128k tokens for gpt-4o), you will have to truncate, omit, or otherwise shrink your text until it fits. Beware that if a message is removed from the messages input, the model will lose all knowledge of it.
Note that very long conversations are more likely to receive incomplete replies. For example, a gpt-3.5-turbo conversation that is 4090 tokens long will have its reply cut off after just 6 tokens.
Parameter details
Frequency and presence penalties
The frequency and presence penalties found in the Chat Completions API and Legacy Completions API can be used to reduce the likelihood of sampling repetitive sequences of tokens.
DEEP DIVE
Penalties behind the scenes
Reasonable values for the penalty coefficients are around 0.1 to 1 if the aim is to just reduce repetitive samples somewhat. If the aim is to strongly suppress repetition, then one can increase the coefficients up to 2, but this can noticeably degrade the quality of samples. Negative values can be used to increase the likelihood of repetition.
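As an illustrative sketch (the values here are arbitrary examples), the penalties are passed as ordinary request parameters:
import OpenAI from "openai";
const openai = new OpenAI();
const completion = await openai.chat.completions.create({
  model: "gpt-3.5-turbo",
  messages: [{ role: "user", content: "List some synonyms for the word happy." }],
  frequency_penalty: 0.5, // penalize tokens in proportion to how often they have already appeared
  presence_penalty: 0.5, // penalize tokens that have appeared at all, encouraging new topics
});
console.log(completion.choices[0].message.content);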
Token log probabilities
The logprobs parameter found in the Chat Completions API and Legacy Completions API, when requested, provides the log probabilities of each output token, and a limited number of the most likely tokens at each token position alongside their log probabilities. This can be useful in some cases to assess the confidence of the model in its output, or to examine alternative responses the model might have given.
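A minimal sketch of requesting log probabilities in Chat Completions; top_logprobs controls how many alternative tokens are returned per position:
import OpenAI from "openai";
const openai = new OpenAI();
const completion = await openai.chat.completions.create({
  model: "gpt-3.5-turbo",
  messages: [{ role: "user", content: "Is the sky blue? Answer yes or no." }],
  logprobs: true,
  top_logprobs: 2,
});
// Each entry contains the chosen token, its log probability, and the top alternatives
console.log(completion.choices[0].logprobs.content);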
Completions API Legacy
The completions API endpoint received its final update in July 2023 and has a different interface than the new chat completions endpoint. Instead of the input being a list of messages, the input is a freeform text string called a prompt.
An example legacy Completions API call looks like the following:
node.js
const completion = await openai.completions.create({
  model: 'gpt-3.5-turbo-instruct',
  prompt: 'Write a tagline for an ice cream shop.'
});
See the full API reference documentation to learn more.
Inserting text
The completions endpoint also supports inserting text by providing a suffix in addition to the standard prompt which is treated as a prefix. This need naturally arises when writing long-form text, transitioning between paragraphs, following an outline, or guiding the model towards an ending. This also works on code, and can be used to insert in the middle of a function or file.
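A minimal sketch using the suffix parameter with gpt-3.5-turbo-instruct (the prompt acts as the prefix and the generated text is intended to bridge into the suffix):
import OpenAI from "openai";
const openai = new OpenAI();
const completion = await openai.completions.create({
  model: "gpt-3.5-turbo-instruct",
  prompt: "Write a short story about a lighthouse keeper. Once upon a time,",
  suffix: "And that is how the lighthouse was saved.",
  max_tokens: 128,
});
console.log(completion.choices[0].text);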
DEEP DIVE
Inserting text
Completions response format
An example completions API response looks as follows:
{
  "choices": [
    {
      "finish_reason": "length",
      "index": 0,
      "logprobs": null,
      "text": "\n\n\"Let Your Sweet Tooth Run Wild at Our Creamy Ice Cream Shack"
    }
  ],
  "created": 1683130927,
  "id": "cmpl-7C9Wxi9Du4j1lQjdjhxBlO22M61LD",
  "model": "gpt-3.5-turbo-instruct",
  "object": "text_completion",
  "usage": {
    "completion_tokens": 16,
    "prompt_tokens": 10,
    "total_tokens": 26
  }
}
In Python, the output can be extracted with response['choices'][0]['text'].
The response format is similar to the response format of the Chat Completions API.
Chat Completions vs. Completions
The Chat Completions format can be made similar to the completions format by constructing a request using a single user message. For example, one can translate from English to French with the following completions prompt:
Translate the following English text to French: "{text}"
And an equivalent chat prompt would be:
[{"role": "user", "content": 'Translate the following English text to French: "{text}"'}]
Likewise, the completions API can be used to simulate a chat between a user and an assistant by formatting the input accordingly.
The difference between these APIs is the underlying models that are available in each. The chat completions API is the interface to our most capable model (gpt-4o), and our most cost effective model (gpt-3.5-turbo).
Prompt engineering
An awareness of the best practices for working with OpenAI models can make a significant difference in application performance. The failure modes that each exhibit and the ways of working around or correcting those failure modes are not always intuitive. There is an entire field related to working with language models which has come to be known as “prompt engineering”, but as the field has progressed its scope has outgrown merely engineering the prompt into engineering systems that use model queries as components. To learn more, read our guide on prompt engineering which covers methods to improve model reasoning, reduce the likelihood of model hallucinations, and more. You can also find many useful resources including code samples in the OpenAI Cookbook.
FAQ
Which model should I use?
We generally recommend that you default to using either gpt-4o, gpt-4-turbo, or gpt-3.5-turbo. If your use case requires high intelligence or reasoning about images as well as text, we recommend you evaluate both gpt-4o and gpt-4-turbo (although they have very similar intelligence, note that gpt-4o is both faster and cheaper). If your use case requires the fastest speed and lowest cost, we recommend gpt-3.5-turbo since it is optimized for these aspects.
gpt-4o and gpt-4-turbo are also less likely than gpt-3.5-turbo to make up information, a behavior known as “hallucination”. Finally, gpt-4o and gpt-4-turbo have a context window that supports up to 128,000 tokens compared to 4,096 tokens for gpt-3.5-turbo, meaning they can reason over much more information at once.
We recommend experimenting in the playground to investigate which models provide the best price performance trade-off for your usage. A common design pattern is to use several distinct query types which are each dispatched to the model appropriate to handle them.
How should I set the temperature parameter?
Lower values for temperature result in more consistent outputs (e.g. 0.2), while higher values generate more diverse and creative results (e.g. 1.0). Select a temperature value based on the desired trade-off between coherence and creativity for your specific application. The temperature can range from 0 to 2.
Is fine-tuning available for the latest models?
See the fine-tuning guide for the latest information on which models are available for fine-tuning and how to get started.
Do you store the data that is passed into the API?
As of March 1st, 2023, we retain your API data for 30 days but no longer use your data sent via the API to improve our models. Learn more in our data usage policy. Some endpoints offer zero retention.
How can I make my application more safe?
If you want to add a moderation layer to the outputs of the Chat API, you can follow our moderation guide to prevent content that violates OpenAI’s usage policies from being shown. We also encourage you to read our safety guide for more information on how to build safer systems.
Should I use ChatGPT or the API?
ChatGPT offers a chat interface for our models and a range of built-in features such as integrated browsing, code execution, plugins, and more. By contrast, using OpenAI’s API provides more flexibility but requires that you write code or send the requests to our models programmatically.
Example invoking multiple function calls in one response
node.js
import OpenAI from "openai";
const openai = new OpenAI();

// Example dummy function hard coded to return the same weather
// In production, this could be your backend API or an external API
function getCurrentWeather(location, unit = "fahrenheit") {
  if (location.toLowerCase().includes("tokyo")) {
    return JSON.stringify({ location: "Tokyo", temperature: "10", unit: "celsius" });
  } else if (location.toLowerCase().includes("san francisco")) {
    return JSON.stringify({ location: "San Francisco", temperature: "72", unit: "fahrenheit" });
  } else if (location.toLowerCase().includes("paris")) {
    return JSON.stringify({ location: "Paris", temperature: "22", unit: "fahrenheit" });
  } else {
    return JSON.stringify({ location, temperature: "unknown" });
  }
}

async function runConversation() {
  // Step 1: send the conversation and available functions to the model
  const messages = [
    { role: "user", content: "What's the weather like in San Francisco, Tokyo, and Paris?" },
  ];
  const tools = [
    {
      type: "function",
      function: {
        name: "get_current_weather",
        description: "Get the current weather in a given location",
        parameters: {
          type: "object",
          properties: {
            location: {
              type: "string",
              description: "The city and state, e.g. San Francisco, CA",
            },
            unit: { type: "string", enum: ["celsius", "fahrenheit"] },
          },
          required: ["location"],
        },
      },
    },
  ];

  const response = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: messages,
    tools: tools,
    tool_choice: "auto", // auto is default, but we'll be explicit
  });
  const responseMessage = response.choices[0].message;

  // Step 2: check if the model wanted to call a function
  const toolCalls = responseMessage.tool_calls;
  if (responseMessage.tool_calls) {
    // Step 3: call the function
    // Note: the JSON response may not always be valid; be sure to handle errors
    const availableFunctions = {
      get_current_weather: getCurrentWeather,
    }; // only one function in this example, but you can have multiple
    messages.push(responseMessage); // extend conversation with assistant's reply
    for (const toolCall of toolCalls) {
      const functionName = toolCall.function.name;
      const functionToCall = availableFunctions[functionName];
      const functionArgs = JSON.parse(toolCall.function.arguments);
      const functionResponse = functionToCall(
        functionArgs.location,
        functionArgs.unit
      );
      messages.push({
        tool_call_id: toolCall.id,
        role: "tool",
        name: functionName,
        content: functionResponse,
      }); // extend conversation with function response
    }
    const secondResponse = await openai.chat.completions.create({
      model: "gpt-4o",
      messages: messages,
    }); // get a new response from the model where it can see the function response
    return secondResponse.choices;
  }
}

runConversation().then(console.log).catch(console.error);
You can find more examples of function calling in the OpenAI Cookbook:
Function calling
Learn from more examples demonstrating function calling
Tokens
Under the hood, functions are injected into the system message in a syntax the model has been trained on. This means functions count against the model’s context limit and are billed as input tokens. If running into context limits, we suggest limiting the number of functions or the length of documentation you provide for function parameters.
It is also possible to use fine-tuning to reduce the number of tokens used if you have many functions defined.
Embeddings
Learn how to turn text into numbers, unlocking use cases like search.
New embedding models
text-embedding-3-small and text-embedding-3-large, our newest and most performant embedding models, are now available, with lower costs, higher multilingual performance, and new parameters to control the overall size.
What are embeddings?
OpenAI’s text embeddings measure the relatedness of text strings. Embeddings are commonly used for:
Search (where results are ranked by relevance to a query string)
Clustering (where text strings are grouped by similarity)
Recommendations (where items with related text strings are recommended)
Anomaly detection (where outliers with little relatedness are identified)
Diversity measurement (where similarity distributions are analyzed)
Classification (where text strings are classified by their most similar label)
An embedding is a vector (list) of floating point numbers. The distance between two vectors measures their relatedness. Small distances suggest high relatedness and large distances suggest low relatedness.
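As an illustrative sketch (not part of the API itself), relatedness between two embedding vectors is commonly measured with cosine similarity:
// Cosine similarity between two embedding vectors of equal length
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
// Values close to 1 suggest high relatedness; values close to 0 suggest low relatedness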
Visit our pricing page to learn about Embeddings pricing. Requests are billed based on the number of tokens in the input.
How to get embeddings
To get an embedding, send your text string to the embeddings API endpoint along with the embedding model name (e.g. text-embedding-3-small). The response will contain an embedding (list of floating point numbers), which you can extract, save in a vector database, and use for many different use cases:
Example: Getting embeddings
node
import OpenAI from "openai";
const openai = new OpenAI();

async function main() {
  const embedding = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: "Your text string goes here",
    encoding_format: "float",
  });
  console.log(embedding);
}
main();
The response will contain the embedding vector along with some additional metadata.
Example embedding response
json
{
  "object": "list",
  "data": [
    {
      "object": "embedding",
      "index": 0,
      "embedding": [
        -0.006929283495992422,
        -0.005336422007530928,
        ... (omitted for spacing)
        -4.547132266452536e-05,
        -0.024047505110502243
      ]
    }
  ],
  "model": "text-embedding-3-small",
  "usage": {
    "prompt_tokens": 5,
    "total_tokens": 5
  }
}
By default, the length of the embedding vector will be 1536 for text-embedding-3-small or 3072 for text-embedding-3-large. You can reduce the dimensions of the embedding by passing in the dimensions parameter without the embedding losing its concept-representing properties. We go into more detail on embedding dimensions in the embedding use case section.
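For example, a minimal sketch of requesting a shortened embedding via the dimensions parameter (256 here is just an example value):
import OpenAI from "openai";
const openai = new OpenAI();
const response = await openai.embeddings.create({
  model: "text-embedding-3-small",
  input: "Your text string goes here",
  dimensions: 256, // shorten the vector from the default 1536 dimensions
});
console.log(response.data[0].embedding.length); // 256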
Embedding models
OpenAI offers two powerful third-generation embedding models (denoted by -3 in the model ID). You can read the embedding v3 announcement blog post for more details.
Usage is priced per input token. Below is an example of pricing expressed as pages of text per US dollar (assuming ~800 tokens per page):
MODEL ~ PAGES PER DOLLAR PERFORMANCE ON MTEB EVAL MAX INPUT
text-embedding-3-small 62,500 62.3% 8191
text-embedding-3-large 9,615 64.6% 8191
text-embedding-ada-002 12,500 61.0% 8191
Use cases
Here we show some representative use cases. We will use the Amazon fine-food reviews dataset for the following examples.
Obtaining the embeddings
The dataset contains a total of 568,454 food reviews Amazon users left up to October 2012. We will use a subset of 1,000 most recent reviews for illustration purposes. The reviews are in English and tend to be positive or negative. Each review has a ProductId, UserId, Score, review title (Summary) and review body (Text). For example:
PRODUCT ID USER ID SCORE SUMMARY TEXT
B001E4KFG0 A3SGXH7AUHU8GW 5 Good Quality Dog Food I have bought several of the Vitality canned…
B00813GRG4 A1D87F6ZCVE5NK 1 Not as Advertised Product arrived labeled as Jumbo Salted Peanut…
We will combine the review summary and review text into a single combined text. The model will encode this combined text and output a single vector embedding.
Get_embeddings_from_dataset.ipynb
from openai import OpenAI
client = OpenAI()

def get_embedding(text, model="text-embedding-3-small"):
    text = text.replace("\n", " ")
    return client.embeddings.create(input=[text], model=model).data[0].embedding

# df is a pandas DataFrame of the reviews with a "combined" column (Summary + Text)
df['ada_embedding'] = df.combined.apply(lambda x: get_embedding(x, model='text-embedding-3-small'))
df.to_csv('output/embedded_1k_reviews.csv', index=False)
To load the data from a saved file, you can run the following:
import pandas as pd
import numpy as np

df = pd.read_csv('output/embedded_1k_reviews.csv')
df['ada_embedding'] = df.ada_embedding.apply(eval).apply(np.array)
Image generation
Learn how to generate or manipulate images with DALL·E in the API.
Looking to generate images in ChatGPT? Head to chatgpt.com.
Introduction
The Images API provides three methods for interacting with images:
Creating images from scratch based on a text prompt (DALL·E 3 and DALL·E 2)
Creating edited versions of images by having the model replace some areas of a pre-existing image, based on a new text prompt (DALL·E 2 only)
Creating variations of an existing image (DALL·E 2 only)
This guide covers the basics of using these three API endpoints with useful code samples. To try DALL·E 3, head to ChatGPT.
Usage
Generations
The image generations endpoint allows you to create an original image given a text prompt. When using DALL·E 3, images can have a size of 1024×1024, 1024×1792 or 1792×1024 pixels.
By default, images are generated at standard quality, but when using DALL·E 3 you can set quality: “hd” for enhanced detail. Square, standard quality images are the fastest to generate.
You can request 1 image at a time with DALL·E 3 (request more by making parallel requests) or up to 10 images at a time using DALL·E 2 with the n parameter.
Generate an image
node.js
const response = await openai.images.generate({
  model: "dall-e-3",
  prompt: "a white siamese cat",
  n: 1,
  size: "1024x1024",
});
image_url = response.data[0].url;
What is new with DALL·E 3
Explore what is new with DALL·E 3 in the OpenAI Cookbook
Prompting
With the release of DALL·E 3, the model now takes in the default prompt provided and automatically rewrites it for safety reasons, and to add more detail (more detailed prompts generally result in higher quality images).
While it is not currently possible to disable this feature, you can use prompting to get outputs closer to your requested image by adding the following to your prompt: I NEED to test how the tool works with extremely simple prompts. DO NOT add any detail, just use it AS-IS:.
The updated prompt is visible in the revised_prompt field of the data response object.
Example DALL·E 3 generations
PROMPT GENERATION
A photograph of a white Siamese cat.
Each image can be returned as either a URL or Base64 data, using the response_format parameter. URLs will expire after an hour.
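For example, a minimal sketch requesting Base64 data instead of a URL:
import OpenAI from "openai";
const openai = new OpenAI();
const response = await openai.images.generate({
  model: "dall-e-3",
  prompt: "a white siamese cat",
  response_format: "b64_json", // the default is "url"
  n: 1,
  size: "1024x1024",
});
// Base64-encoded image data rather than a temporary URL
const imageBase64 = response.data[0].b64_json;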
Edits (DALL·E 2 only)
Also known as “inpainting”, the image edits endpoint allows you to edit or extend an image by uploading an image and mask indicating which areas should be replaced. The transparent areas of the mask indicate where the image should be edited, and the prompt should describe the full new image, not just the erased area. This endpoint can enable experiences like DALL·E image editing in ChatGPT Plus.
Edit an image
node.js
const response = await openai.images.edit({
  model: "dall-e-2",
  image: fs.createReadStream("sunlit_lounge.png"),
  mask: fs.createReadStream("mask.png"),
  prompt: "A sunlit indoor lounge area with a pool containing a flamingo",
  n: 1,
  size: "1024x1024"
});
image_url = response.data[0].url;
IMAGE MASK OUTPUT
Prompt: a sunlit indoor lounge area with a pool containing a flamingo
The uploaded image and mask must both be square PNG images less than 4MB in size, and also must have the same dimensions as each other. The non-transparent areas of the mask are not used when generating the output, so they don’t necessarily need to match the original image like the example above.
Variations (DALL·E 2 only)
The image variations endpoint allows you to generate a variation of a given image.
Generate an image variation
node.js
const response = await openai.images.createVariation({
  model: "dall-e-2",
  image: fs.createReadStream("corgi_and_cat_paw.png"),
  n: 1,
  size: "1024x1024"
});
image_url = response.data[0].url;
Similar to the edits endpoint, the input image must be a square PNG image less than 4MB in size.
Content moderation
Prompts and images are filtered based on our content policy, returning an error when a prompt or image is flagged.
Language-specific tips
Using in-memory image data
The Node.js examples in the guide above use the fs module to read image data from disk. In some cases, you may have your image data in memory instead. Here’s an example API call that uses image data stored in a Node.js Buffer object:
import OpenAI from "openai";

const openai = new OpenAI();

// This is the Buffer object that contains your image data
const buffer = [your image data];

// Set a `name` that ends with .png so that the API knows it's a PNG image
buffer.name = "image.png";

async function main() {
  const image = await openai.images.createVariation({ model: "dall-e-2", image: buffer, n: 1, size: "1024x1024" });
  console.log(image.data);
}
main();
Working with TypeScript
If you’re using TypeScript, you may encounter some quirks with image file arguments. Here’s an example of working around the type mismatch by explicitly casting the argument:
import fs from "fs";
import OpenAI from "openai";

const openai = new OpenAI();

async function main() {
  // Cast the ReadStream to `any` to appease the TypeScript compiler
  const image = await openai.images.createVariation({
    image: fs.createReadStream("image.png") as any,
  });
  console.log(image.data);
}
main();
And here’s a similar example for in-memory image data:
import fs from "fs";
import OpenAI from "openai";

const openai = new OpenAI();

// This is the Buffer object that contains your image data
const buffer: Buffer = [your image data];

// Cast the buffer to `any` so that we can set the `name` property
const file: any = buffer;

// Set a `name` that ends with .png so that the API knows it's a PNG image
file.name = "image.png";

async function main() {
  const image = await openai.images.createVariation({
    image: file,
    n: 1,
    size: "1024x1024"
  });
  console.log(image.data);
}
main();
Error handling
API requests can potentially return errors due to invalid inputs, rate limits, or other issues. These errors can be handled with a try…catch statement, and the error details can be found in either error.response or error.message:
import fs from "fs";
import OpenAI from "openai";

const openai = new OpenAI();

async function main() {
  try {
    const image = await openai.images.createVariation({
      image: fs.createReadStream("image.png"),
      n: 1,
      size: "1024x1024",
    });
    console.log(image.data);
  } catch (error) {
    if (error.response) {
      console.log(error.response.status);
      console.log(error.response.data);
    } else {
      console.log(error.message);
    }
  }
}
main();
Vision
Learn how to use vision capabilities to understand images.
Introduction
Both GPT-4o and GPT-4 Turbo have vision capabilities, meaning the models can take in images and answer questions about them. Historically, language model systems have been limited by taking in a single input modality, text.
Quick start
Images are made available to the model in two main ways: by passing a link to the image or by passing the base64 encoded image directly in the request. Images can be passed in the user messages.
What’s in this image?
node
import OpenAI from "openai";
const openai = new OpenAI();

async function main() {
  const response = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "What's in this image?" },
          {
            type: "image_url",
            image_url: {
              "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
            },
          },
        ],
      },
    ],
  });
  console.log(response.choices[0]);
}
main();
The model is best at answering general questions about what is present in the images. While it does understand the relationship between objects in images, it is not yet optimized to answer detailed questions about the location of certain objects in an image. For example, you can ask it what color a car is or what some ideas for dinner might be based on what is in your fridge, but if you show it an image of a room and ask it where the chair is, it may not answer the question correctly.
It is important to keep in mind the limitations of the model as you explore what use-cases visual understanding can be applied to.
Video understanding with vision
Learn how to use GPT-4 with Vision to understand videos in the OpenAI Cookbook
Uploading base 64 encoded images
If you have an image or set of images locally, you can pass those to the model in base 64 encoded format, here is an example of this in action:
import base64
import requests

# OpenAI API Key
api_key = "YOUR_OPENAI_API_KEY"

# Function to encode the image
def encode_image(image_path):
    with open(image_path, "rb") as image_file:
        return base64.b64encode(image_file.read()).decode('utf-8')

# Path to your image
image_path = "path_to_your_image.jpg"

# Getting the base64 string
base64_image = encode_image(image_path)

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {api_key}"
}

payload = {
    "model": "gpt-4o",
    "messages": [
        {
            "role": "user",
            "content": [
                {
                    "type": "text",
                    "text": "What's in this image?"
                },
                {
                    "type": "image_url",
                    "image_url": {
                        "url": f"data:image/jpeg;base64,{base64_image}"
                    }
                }
            ]
        }
    ],
    "max_tokens": 300
}

response = requests.post("https://api.openai.com/v1/chat/completions", headers=headers, json=payload)

print(response.json())
Multiple image inputs
The Chat Completions API is capable of taking in and processing multiple image inputs, either as base64 encoded images or as image URLs. The model will process each image and use the information from all of them to answer the question.
Multiple image inputs
node
import OpenAI from "openai";

const openai = new OpenAI();

async function main() {
  const response = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [
      {
        role: "user",
        content: [
          {
            type: "text",
            text: "What are in these images? Is there any difference between them?",
          },
          {
            type: "image_url",
            image_url: {
              url: "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
            },
          },
          {
            type: "image_url",
            image_url: {
              url: "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
            },
          },
        ],
      },
    ],
  });
  console.log(response.choices[0]);
}
main();
Here the model is shown two copies of the same image and can answer questions about both or each of the images independently.
Low or high fidelity image understanding
By controlling the detail parameter, which has three options, low, high, or auto, you have control over how the model processes the image and generates its textual understanding. By default, the model will use the auto setting which will look at the image input size and decide if it should use the low or high setting.
low will enable the “low res” mode. The model will receive a low-res 512px x 512px version of the image, and represent the image with a budget of 85 tokens. This allows the API to return faster responses and consume fewer input tokens for use cases that do not require high detail.
high will enable "high res" mode, which first allows the model to see the low res image (using 85 tokens) and then creates detailed crops using 170 tokens for each 512px x 512px tile.
Choosing the detail level
node
import OpenAI from "openai";

const openai = new OpenAI();

async function main() {
  const response = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [
      {
        role: "user",
        content: [
          { type: "text", text: "What's in this image?" },
          {
            type: "image_url",
            image_url: {
              url: "https://upload.wikimedia.org/wikipedia/commons/thumb/d/dd/Gfp-wisconsin-madison-the-nature-boardwalk.jpg/2560px-Gfp-wisconsin-madison-the-nature-boardwalk.jpg",
              detail: "low",
            },
          },
        ],
      },
    ],
  });
  console.log(response.choices[0]);
}
main();
Managing images
The Chat Completions API, unlike the Assistants API, is not stateful. That means you have to manage the messages (including images) you pass to the model yourself. If you want to pass the same image to the model multiple times, you will have to pass the image each time you make a request to the API.
For long running conversations, we suggest passing images via URLs instead of base64. The latency of the model can also be improved by downsizing your images ahead of time so they are smaller than the maximum size they are expected to be. For low res mode, we expect a 512px x 512px image. For high res mode, the short side of the image should be less than 768px and the long side should be less than 2,000px.
After an image has been processed by the model, it is deleted from OpenAI servers and not retained. We do not use data uploaded via the OpenAI API to train our models.
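If you want to follow the downsizing suggestion above, a minimal sketch using the third-party sharp library (not part of the OpenAI SDK; file paths are placeholders) might look like this:
import sharp from "sharp";

async function resizeForHighRes(inputPath, outputPath) {
  const { width, height } = await sharp(inputPath).metadata();
  // Scale down so the long side is at most 2,000px and the short side at most 768px.
  const longSide = Math.max(width, height);
  const shortSide = Math.min(width, height);
  const scale = Math.min(2000 / longSide, 768 / shortSide, 1);
  await sharp(inputPath)
    .resize(Math.round(width * scale), Math.round(height * scale))
    .toFile(outputPath);
}

resizeForHighRes("original.jpg", "resized.jpg");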
Limitations
While GPT-4 with vision is powerful and can be used in many situations, it is important to understand the limitations of the model. Here are some of the limitations we are aware of:
Medical images: The model is not suitable for interpreting specialized medical images like CT scans and shouldn’t be used for medical advice.
Non-English: The model may not perform optimally when handling images with text of non-Latin alphabets, such as Japanese or Korean.
Small text: Enlarge text within the image to improve readability, but avoid cropping important details.
Rotation: The model may misinterpret rotated / upside-down text or images.
Visual elements: The model may struggle to understand graphs or text where colors or styles like solid, dashed, or dotted lines vary.
Spatial reasoning: The model struggles with tasks requiring precise spatial localization, such as identifying chess positions.
Accuracy: The model may generate incorrect descriptions or captions in certain scenarios.
Image shape: The model struggles with panoramic and fisheye images.
Metadata and resizing: The model doesn’t process original file names or metadata, and images are resized before analysis, affecting their original dimensions.
Counting: May give approximate counts for objects in images.
CAPTCHAS: For safety reasons, we have implemented a system to block the submission of CAPTCHAs.
Calculating costs
Image inputs are metered and charged in tokens, just as text inputs are. The token cost of a given image is determined by two factors: its size, and the detail option on each image_url block. All images with detail: low cost 85 tokens each. detail: high images are first scaled to fit within a 2048 x 2048 square, maintaining their aspect ratio. Then, they are scaled such that the shortest side of the image is 768px long. Finally, we count how many 512px squares the image consists of. Each of those squares costs 170 tokens. Another 85 tokens are always added to the final total.
Here are some examples demonstrating the above.
A 1024 x 1024 square image in detail: high mode costs 765 tokens
1024 is less than 2048, so there is no initial resize.
The shortest side is 1024, so we scale the image down to 768 x 768.
4 512px square tiles are needed to represent the image, so the final token cost is 170 * 4 + 85 = 765.
A 2048 x 4096 image in detail: high mode costs 1105 tokens
We scale down the image to 1024 x 2048 to fit within the 2048 square.
The shortest side is 1024, so we further scale down to 768 x 1536.
6 512px tiles are needed, so the final token cost is 170 * 6 + 85 = 1105.
A 4096 x 8192 image in detail: low mode costs 85 tokens
Regardless of input size, low detail images are a fixed cost.
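To make the arithmetic above easier to reuse, here is a small sketch that mirrors the pricing rules described in this section. It is an illustration only, not an official calculator.
function visionTokenCost(width, height, detail = "high") {
  if (detail === "low") return 85; // fixed cost regardless of size

  // Scale to fit within a 2048 x 2048 square, preserving aspect ratio.
  const fit = Math.min(1, 2048 / Math.max(width, height));
  let w = width * fit;
  let h = height * fit;

  // Scale down so the shortest side is 768px.
  const shrink = Math.min(1, 768 / Math.min(w, h));
  w = Math.round(w * shrink);
  h = Math.round(h * shrink);

  // 170 tokens per 512px tile, plus a base of 85 tokens.
  const tiles = Math.ceil(w / 512) * Math.ceil(h / 512);
  return 170 * tiles + 85;
}

console.log(visionTokenCost(1024, 1024)); // 765
console.log(visionTokenCost(2048, 4096)); // 1105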
FAQ
Can I fine-tune the image capabilities in gpt-4?
No, we do not support fine-tuning the image capabilities of gpt-4 at this time.
Can I use gpt-4 to generate images?
No, you can use dall-e-3 to generate images and gpt-4o or gpt-4-turbo to understand images.
What type of files can I upload?
We currently support PNG (.png), JPEG (.jpeg and .jpg), WEBP (.webp), and non-animated GIF (.gif).
Is there a limit to the size of the image I can upload?
Yes, we restrict image uploads to 20MB per image.
Can I delete an image I uploaded?
No, we will delete the image for you automatically after it has been processed by the model.
Where can I learn more about the considerations of GPT-4 with Vision?
You can find details about our evaluations, preparation, and mitigation work in the GPT-4 with Vision system card.
We have further implemented a system to block the submission of CAPTCHAs.
How do rate limits for GPT-4 with Vision work?
We process images at the token level, so each image we process counts towards your tokens per minute (TPM) limit. See the calculating costs section for details on the formula used to determine token count per image.
Can GPT-4 with Vision understand image metadata?
No, the model does not receive image metadata.
What happens if my image is unclear?
If an image is ambiguous or unclear, the model will do its best to interpret it. However, the results may be less accurate. A good rule of thumb is that if an average human cannot see the info in an image at the resolutions used in low/high res mode, then the model cannot either.
Text to speech
Learn how to turn text into lifelike spoken audio
Introduction
The Audio API provides a speech endpoint based on our TTS (text-to-speech) model. It comes with 6 built-in voices and can be used to:
Narrate a written blog post
Produce spoken audio in multiple languages
Give real time audio output using streaming
Here is an example of the alloy voice:
Please note that our usage policies require you to provide a clear disclosure to end users that the TTS voice they are hearing is AI-generated and not a human voice.
Quick start
The speech endpoint takes in three key inputs: the model, the text that should be turned into audio, and the voice to be used for the audio generation. A simple request would look like the following:
Generate spoken audio from input text
node
import fs from "fs";
import path from "path";
import OpenAI from "openai";

const openai = new OpenAI();

const speechFile = path.resolve("./speech.mp3");

async function main() {
  const mp3 = await openai.audio.speech.create({
    model: "tts-1",
    voice: "alloy",
    input: "Today is a wonderful day to build something people love!",
  });
  console.log(speechFile);
  const buffer = Buffer.from(await mp3.arrayBuffer());
  await fs.promises.writeFile(speechFile, buffer);
}
main();
By default, the endpoint will output an MP3 file of the spoken audio, but it can also be configured to output any of our supported formats.
Audio quality
For real-time applications, the standard tts-1 model provides the lowest latency but at a lower quality than the tts-1-hd model. Due to the way the audio is generated, tts-1 is likely to generate content that has more static in certain situations than tts-1-hd. In some cases, the audio may not have noticeable differences depending on your listening device and the individual person.
Voice options
Experiment with different voices (alloy, echo, fable, onyx, nova, and shimmer) to find one that matches your desired tone and audience. The current voices are optimized for English.
Alloy
Echo
Fable
Onyx
Nova
Shimmer
Supported output formats
The default response format is "mp3", but other formats like "opus", "aac", "flac", "wav", and "pcm" are available; an example request follows the list below.
Opus: For internet streaming and communication, low latency.
AAC: For digital audio compression, preferred by YouTube, Android, iOS.
FLAC: For lossless audio compression, favored by audio enthusiasts for archiving.
WAV: Uncompressed WAV audio, suitable for low-latency applications to avoid decoding overhead.
PCM: Similar to WAV but containing the raw samples in 24kHz (16-bit signed, little-endian), without the header.
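For example, to request one of the other formats, set the response_format parameter on the request. A minimal sketch (the output path is a placeholder):
import fs from "fs";
import OpenAI from "openai";

const openai = new OpenAI();

async function main() {
  const wav = await openai.audio.speech.create({
    model: "tts-1",
    voice: "alloy",
    input: "Today is a wonderful day to build something people love!",
    response_format: "wav", // instead of the default "mp3"
  });
  await fs.promises.writeFile("./speech.wav", Buffer.from(await wav.arrayBuffer()));
}
main();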
Supported languages
The TTS model generally follows the Whisper model in terms of language support. Whisper supports the following languages and performs well despite the current voices being optimized for English:
Afrikaans, Arabic, Armenian, Azerbaijani, Belarusian, Bosnian, Bulgarian, Catalan, Chinese, Croatian, Czech, Danish, Dutch, English, Estonian, Finnish, French, Galician, German, Greek, Hebrew, Hindi, Hungarian, Icelandic, Indonesian, Italian, Japanese, Kannada, Kazakh, Korean, Latvian, Lithuanian, Macedonian, Malay, Marathi, Maori, Nepali, Norwegian, Persian, Polish, Portuguese, Romanian, Russian, Serbian, Slovak, Slovenian, Spanish, Swahili, Swedish, Tagalog, Tamil, Thai, Turkish, Ukrainian, Urdu, Vietnamese, and Welsh.
You can generate spoken audio in these languages by providing the input text in the language of your choice.
Streaming real time audio
The Speech API provides support for real time audio streaming using chunk transfer encoding. This means that the audio is able to be played before the full file has been generated and made accessible.
from openai import OpenAI

client = OpenAI()

response = client.audio.speech.create(
    model="tts-1",
    voice="alloy",
    input="Hello world! This is a streaming test.",
)

response.stream_to_file("output.mp3")
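A roughly equivalent sketch in Node.js, assuming the SDK exposes the raw response body as a web ReadableStream that can be piped to disk as chunks arrive:
import fs from "fs";
import { Readable } from "stream";
import OpenAI from "openai";

const openai = new OpenAI();

async function main() {
  const response = await openai.audio.speech.create({
    model: "tts-1",
    voice: "alloy",
    input: "Hello world! This is a streaming test.",
  });
  // Pipe the streamed audio to a file as it is generated.
  Readable.fromWeb(response.body).pipe(fs.createWriteStream("./output.mp3"));
}
main();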
Post-processing with GPT-4
The second method involves a post-processing step using GPT-4 or GPT-3.5-Turbo.
We start by providing instructions for GPT-4 through the system_prompt variable. Similar to what we did with the prompt parameter earlier, we can define our company and product names.
Post-processing
node
import OpenAI from "openai";

const openai = new OpenAI();

const systemPrompt = "You are a helpful assistant for the company ZyntriQix. Your task is to correct any spelling discrepancies in the transcribed text. Make sure that the names of the following products are spelled correctly: ZyntriQix, Digique Plus, CynapseFive, VortiQore V8, EchoNix Array, OrbitalLink Seven, DigiFractal Matrix, PULSE, RAPT, B.R.I.C.K., Q.U.A.R.T.Z., F.L.I.N.T. Only add necessary punctuation such as periods, commas, and capitalization, and use only the context provided.";

// transcribe() is assumed to be defined elsewhere and to return the
// transcript text for the given audio file.
async function generateCorrectedTranscript(temperature, systemPrompt, audioFile) {
  const transcript = await transcribe(audioFile);
  const completion = await openai.chat.completions.create({
    model: "gpt-4o",
    temperature: temperature,
    messages: [
      {
        role: "system",
        content: systemPrompt,
      },
      {
        role: "user",
        content: transcript,
      },
    ],
  });
  return completion.choices[0].message.content;
}

const fakeCompanyFilepath = "path/to/audio/file";

generateCorrectedTranscript(0, systemPrompt, fakeCompanyFilepath)
  .then((correctedText) => console.log(correctedText))
  .catch((error) => console.error(error));
If you try this on your own audio file, you can see that GPT-4 manages to correct many misspellings in the transcript. Due to its larger context window, this method might be more scalable than using Whisper’s prompt parameter and is more reliable since GPT-4 can be instructed and guided in ways that aren’t possible with Whisper given the lack of instruction following.
Moderation
Learn how to build moderation into your AI applications.
Overview
The moderations endpoint is a tool you can use to check whether text is potentially harmful. Developers can use it to identify content that might be harmful and take action, for instance by filtering it.
The model classifies content across the following categories:
CATEGORY DESCRIPTION
hate Content that expresses, incites, or promotes hate based on race, gender, ethnicity, religion, nationality, sexual orientation, disability status, or caste. Hateful content aimed at non-protected groups (e.g., chess players) is harassment.
hate/threatening Hateful content that also includes violence or serious harm towards the targeted group based on race, gender, ethnicity, religion, nationality, sexual orientation, disability status, or caste.
harassment Content that expresses, incites, or promotes harassing language towards any target.
harassment/threatening Harassment content that also includes violence or serious harm towards any target.
self-harm Content that promotes, encourages, or depicts acts of self-harm, such as suicide, cutting, and eating disorders.
self-harm/intent Content where the speaker expresses that they are engaging or intend to engage in acts of self-harm, such as suicide, cutting, and eating disorders.
self-harm/instructions Content that encourages performing acts of self-harm, such as suicide, cutting, and eating disorders, or that gives instructions or advice on how to commit such acts.
sexual Content meant to arouse sexual excitement, such as the description of sexual activity, or that promotes sexual services (excluding sex education and wellness).
sexual/minors Sexual content that includes an individual who is under 18 years old.
violence Content that depicts death, violence, or physical injury.
violence/graphic Content that depicts death, violence, or physical injury in graphic detail.
The moderation endpoint is free to use for most developers. For higher accuracy, try splitting long pieces of text into smaller chunks each less than 2,000 characters.
We are continuously working to improve the accuracy of our classifier. Our support for non-English languages is currently limited.
Quickstart
To obtain a classification for a piece of text, make a request to the moderation endpoint as demonstrated in the following code snippets:
Example: Getting moderations
node
import OpenAI from "openai";

const openai = new OpenAI();

async function main() {
  const moderation = await openai.moderations.create({ input: "Sample text goes here." });
  console.log(moderation);
}
main();
Below is an example output of the endpoint. It returns the following fields:
flagged: Set to true if the model classifies the content as potentially harmful, false otherwise.
categories: Contains a dictionary of per-category violation flags. For each category, the value is true if the model flags the corresponding category as violated, false otherwise.
category_scores: Contains a dictionary of per-category raw scores output by the model, denoting the model's confidence that the input violates OpenAI's policy for the category. The value is between 0 and 1, where higher values denote higher confidence. The scores should not be interpreted as probabilities.
{
  "id": "modr-XXXXX",
  "model": "text-moderation-007",
  "results": [
    {
      "flagged": true,
      "categories": {
        "sexual": false,
        "hate": false,
        "harassment": false,
        "self-harm": false,
        "sexual/minors": false,
        "hate/threatening": false,
        "violence/graphic": false,
        "self-harm/intent": false,
        "self-harm/instructions": false,
        "harassment/threatening": true,
        "violence": true
      },
      "category_scores": {
        "sexual": 1.2282071e-6,
        "hate": 0.010696256,
        "harassment": 0.29842457,
        "self-harm": 1.5236925e-8,
        "sexual/minors": 5.7246268e-8,
        "hate/threatening": 0.0060676364,
        "violence/graphic": 4.435014e-6,
        "self-harm/intent": 8.098441e-10,
        "self-harm/instructions": 2.8498655e-11,
        "harassment/threatening": 0.63055265,
        "violence": 0.99011886
      }
    }
  ]
}
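As a sketch of how an application might act on this response, the snippet below checks the flagged field and collects the violated categories. The isAllowed helper name is illustrative only.
import OpenAI from "openai";

const openai = new OpenAI();

async function isAllowed(text) {
  const moderation = await openai.moderations.create({ input: text });
  const result = moderation.results[0];
  if (result.flagged) {
    // Collect the categories the model flagged, e.g. ["violence", "harassment/threatening"].
    const triggered = Object.entries(result.categories)
      .filter(([, flagged]) => flagged)
      .map(([category]) => category);
    console.log("Blocked categories:", triggered);
    return false;
  }
  return true;
}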
Assistants API Beta
The Assistants API allows you to build AI assistants within your own applications. An Assistant has instructions and can leverage models, tools, and files to respond to user queries. The Assistants API currently supports three types of tools: Code Interpreter, File Search, and Function calling.
You can explore the capabilities of the Assistants API using the Assistants playground or by building a step-by-step integration outlined in this guide.
Overview
A typical integration of the Assistants API has the following flow:
Create an Assistant by defining its custom instructions and picking a model. If helpful, add files and enable tools like Code Interpreter, File Search, and Function calling.
Create a Thread when a user starts a conversation.
Add Messages to the Thread as the user asks questions.
Run the Assistant on the Thread to generate a response by calling the model and the tools.
This starter guide walks through the key steps to create and run an Assistant that uses Code Interpreter. In this example, we’re creating an Assistant that is a personal math tutor, with the Code Interpreter tool enabled.
Calls to the Assistants API require that you pass a beta HTTP header. This is handled automatically if you’re using OpenAI’s official Python or Node.js SDKs.
OpenAI-Beta: assistants=v2
Step 1: Create an Assistant
An Assistant represents an entity that can be configured to respond to a user’s messages using several parameters like model, instructions, and tools.
Create an Assistant
node.js
import OpenAI from "openai";

const openai = new OpenAI();

async function main() {
  const assistant = await openai.beta.assistants.create({
    name: "Math Tutor",
    instructions: "You are a personal math tutor. Write and run code to answer math questions.",
    tools: [{ type: "code_interpreter" }],
    model: "gpt-4o",
  });
}
main();
Step 2: Create a Thread
A Thread represents a conversation between a user and one or many Assistants. You can create a Thread when a user (or your AI application) starts a conversation with your Assistant.
Create a Thread
node.js
const thread = await openai.beta.threads.create();
Step 3: Add a Message to the Thread
The contents of the messages your users or applications create are added as Message objects to the Thread. Messages can contain both text and files. There is no limit to the number of Messages you can add to Threads — we smartly truncate any context that does not fit into the model’s context window.
Add a Message to the Thread
node.js
const message = await openai.beta.threads.messages.create(
  thread.id,
  {
    role: "user",
    content: "I need to solve the equation 3x + 11 = 14. Can you help me?"
  }
);
Step 4: Create a Run
Once all the user Messages have been added to the Thread, you can Run the Thread with any Assistant. Creating a Run uses the model and tools associated with the Assistant to generate a response. These responses are added to the Thread as assistant Messages.
You can use the ‘create and stream’ helpers in the Python and Node SDKs to create a run and stream the response.
Create and Stream a Run
node.js
// We use the stream SDK helper to create a run with
// streaming. The SDK provides helpful event listeners to handle
// the streamed response.
const run = openai.beta.threads.runs.stream(thread.id, {
    assistant_id: assistant.id
  })
  .on('textCreated', (text) => process.stdout.write('\nassistant > '))
  .on('textDelta', (textDelta, snapshot) => process.stdout.write(textDelta.value))
  .on('toolCallCreated', (toolCall) => process.stdout.write(`\nassistant > ${toolCall.type}\n\n`))
  .on('toolCallDelta', (toolCallDelta, snapshot) => {
    if (toolCallDelta.type === 'code_interpreter') {
      if (toolCallDelta.code_interpreter.input) {
        process.stdout.write(toolCallDelta.code_interpreter.input);
      }
      if (toolCallDelta.code_interpreter.outputs) {
        process.stdout.write("\noutput >\n");
        toolCallDelta.code_interpreter.outputs.forEach(output => {
          if (output.type === "logs") {
            process.stdout.write(`\n${output.logs}\n`);
          }
        });
      }
    }
  });
See the full list of Assistants streaming events in our API reference here. You can also see a list of SDK event listeners for these events in the Python & Node repository documentation.
How Assistants work Beta
The Assistants API is designed to help developers build powerful AI assistants capable of performing a variety of tasks.
The Assistants API is in beta and we are actively working on adding more functionality. Share your feedback in our Developer Forum!
Assistants can call OpenAI’s models with specific instructions to tune their personality and capabilities.
Assistants can access multiple tools in parallel. These can be both OpenAI-hosted tools — like code_interpreter and file_search — or tools you build / host (via function calling).
Assistants can access persistent Threads. Threads simplify AI application development by storing message history and truncating it when the conversation gets too long for the model’s context length. You create a Thread once, and simply append Messages to it as your users reply.
Assistants can access files in several formats — either as part of their creation or as part of Threads between Assistants and users. When using tools, Assistants can also create files (e.g., images, spreadsheets, etc) and cite files they reference in the Messages they create.
Objects
Assistants object architecture diagram
OBJECT WHAT IT REPRESENTS
Assistant Purpose-built AI that uses OpenAI’s models and calls tools
Thread A conversation session between an Assistant and a user. Threads store Messages and automatically handle truncation to fit content into a model’s context.
Message A message created by an Assistant or a user. Messages can include text, images, and other files. Messages are stored as a list on the Thread.
Run An invocation of an Assistant on a Thread. The Assistant uses its configuration and the Thread’s Messages to perform tasks by calling models and tools. As part of a Run, the Assistant appends Messages to the Thread.
Run Step A detailed list of steps the Assistant took as part of a Run. An Assistant can call tools or create Messages during its run. Examining Run Steps allows you to introspect how the Assistant is getting to its final results.
Creating Assistants
We recommend using OpenAI’s latest models with the Assistants API for best results and maximum compatibility with tools.
To get started, creating an Assistant only requires specifying the model to use. But you can further customize the behavior of the Assistant:
Use the instructions parameter to guide the personality of the Assistant and define its goals. Instructions are similar to system messages in the Chat Completions API.
Use the tools parameter to give the Assistant access to up to 128 tools. You can give it access to OpenAI-hosted tools like code_interpreter and file_search, or call third-party tools via function calling.
Use the tool_resources parameter to give the tools like code_interpreter and file_search access to files. Files are uploaded using the File upload endpoint and must have the purpose set to assistants to be used with this API.
For example, to create an Assistant that can create data visualizations based on a .csv file, first upload a file.
node.js
const file = await openai.files.create({
  file: fs.createReadStream("revenue-forecast.csv"),
  purpose: "assistants",
});
Then, create the Assistant with the code_interpreter tool enabled and provide the file as a resource to the tool.
node.js
const assistant = await openai.beta.assistants.create({
  name: "Data visualizer",
  description: "You are great at creating beautiful data visualizations. You analyze data present in .csv files, understand trends, and come up with data visualizations relevant to those trends. You also share a brief text summary of the trends observed.",
  model: "gpt-4o",
  tools: [{ "type": "code_interpreter" }],
  tool_resources: {
    "code_interpreter": {
      "file_ids": [file.id]
    }
  }
});
You can attach a maximum of 20 files to code_interpreter and 10,000 files to file_search (using vector_store objects).
Each file can be at most 512 MB in size and have a maximum of 5,000,000 tokens. By default, the size of all the files uploaded by your organization cannot exceed 100 GB, but you can reach out to our support team to increase this limit.
Managing Threads and Messages
Threads and Messages represent a conversation session between an Assistant and a user. There is no limit to the number of Messages you can store in a Thread. Once the size of the Messages exceeds the context window of the model, the Thread will attempt to smartly truncate messages, before fully dropping the ones it considers the least important.
You can create a Thread with an initial list of Messages like this:
node.js
const thread = await openai.beta.threads.create({
  messages: [
    {
      "role": "user",
      "content": "Create 3 data visualizations based on the trends in this file.",
      "attachments": [
        {
          file_id: file.id,
          tools: [{ type: "code_interpreter" }]
        }
      ]
    }
  ]
});
Messages can contain text, images, or file attachments. Message attachments are helper methods that add files to a thread's tool_resources. You can also choose to add files to the thread.tool_resources directly.
Creating image input content
Message content can contain either external image URLs or File IDs uploaded via the File API. Only models with vision support can accept image input. Supported image content types include png, jpg, gif, and webp. When creating image files, pass purpose="vision" to allow you to later download and display the input content. Currently, there is a 100GB limit per organization and a 10GB limit per user in an organization. Please contact us to request a limit increase.
Tools cannot access image content unless specified. To pass image files to Code Interpreter, add the file ID in the message attachments list to allow the tool to read and analyze the input. Image URLs cannot be downloaded in Code Interpreter today.
node.js
import fs from "fs";

const file = await openai.files.create({
  file: fs.createReadStream("myimage.png"),
  purpose: "vision",
});

const thread = await openai.beta.threads.create({
  messages: [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What is the difference between these images?"
        },
        {
          "type": "image_url",
          "image_url": { "url": "https://example.com/image.png" }
        },
        {
          "type": "image_file",
          "image_file": { "file_id": file.id }
        },
      ]
    }
  ]
});
Low or high fidelity image understanding
By controlling the detail parameter, which has three options, low, high, or auto, you have control over how the model processes the image and generates its textual understanding.
low will enable the “low res” mode. The model will receive a low-res 512px x 512px version of the image, and represent the image with a budget of 85 tokens. This allows the API to return faster responses and consume fewer input tokens for use cases that do not require high detail.
high will enable “high res” mode, which first allows the model to see the low res image and then creates detailed crops of input images based on the input image size. Use the pricing calculator to see token counts for various image sizes.
node.js
const thread = await openai.beta.threads.create({
  messages: [
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What is this an image of?"
        },
        {
          "type": "image_url",
          "image_url": {
            "url": "https://example.com/image.png",
            "detail": "high"
          }
        },
      ]
    }
  ]
});
Context window management
The Assistants API automatically manages the truncation to ensure it stays within the model’s maximum context length. You can customize this behavior by specifying the maximum tokens you’d like a run to utilize and/or the maximum number of recent messages you’d like to include in a run.
Max Completion and Max Prompt Tokens
To control the token usage in a single Run, set max_prompt_tokens and max_completion_tokens when creating the Run. These limits apply to the total number of tokens used in all completions throughout the Run’s lifecycle.
For example, initiating a Run with max_prompt_tokens set to 500 and max_completion_tokens set to 1000 means the first completion will truncate the thread to 500 tokens and cap the output at 1000 tokens. If only 200 prompt tokens and 300 completion tokens are used in the first completion, the second completion will have available limits of 300 prompt tokens and 700 completion tokens.
If a completion reaches the max_completion_tokens limit, the Run will terminate with a status of incomplete, and details will be provided in the incomplete_details field of the Run object.
When using the File Search tool, we recommend setting the max_prompt_tokens to no less than 20,000. For longer conversations or multiple interactions with File Search, consider increasing this limit to 50,000, or ideally, removing the max_prompt_tokens limits altogether to get the highest quality results.
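For example, the limits described above can be set when creating the Run. A minimal sketch (the values are illustrative only):
const run = await openai.beta.threads.runs.create(thread.id, {
  assistant_id: assistant.id,
  // If either limit is reached, the Run ends with status "incomplete"
  // and details are reported in incomplete_details.
  max_prompt_tokens: 20000,
  max_completion_tokens: 1000,
});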
Truncation Strategy
You may also specify a truncation strategy to control how your thread should be rendered into the model’s context window. Using a truncation strategy of type auto will use OpenAI’s default truncation strategy. Using a truncation strategy of type last_messages will allow you to specify the number of the most recent messages to include in the context window.
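A minimal sketch of the last_messages option, assuming the ten most recent messages provide enough context for your use case:
const run = await openai.beta.threads.runs.create(thread.id, {
  assistant_id: assistant.id,
  // Only render the 10 most recent messages into the model's context window.
  truncation_strategy: { type: "last_messages", last_messages: 10 },
});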
Message annotations
Messages created by Assistants may contain annotations within the content array of the object. Annotations provide information around how you should annotate the text in the Message.
There are two types of Annotations:
file_citation: File citations are created by the file_search tool and define references to a specific file that was uploaded and used by the Assistant to generate the response.
file_path: File path annotations are created by the code_interpreter tool and contain references to the files generated by the tool.
When annotations are present in the Message object, you’ll see illegible model-generated substrings in the text that you should replace with the annotations. These strings may look something like 【13†source】 or sandbox:/mnt/data/file.csv. Here’s an example python code snippet that replaces these strings with information present in the annotations.
python
# Retrieve the message object
message = client.beta.threads.messages.retrieve(
    thread_id="...",
    message_id="..."
)

# Extract the message content
message_content = message.content[0].text
annotations = message_content.annotations
citations = []

# Iterate over the annotations and add footnotes
for index, annotation in enumerate(annotations):
    # Replace the text with a footnote
    message_content.value = message_content.value.replace(annotation.text, f' [{index}]')

    # Gather citations based on annotation attributes
    if (file_citation := getattr(annotation, 'file_citation', None)):
        cited_file = client.files.retrieve(file_citation.file_id)
        citations.append(f'[{index}] {file_citation.quote} from {cited_file.filename}')
    elif (file_path := getattr(annotation, 'file_path', None)):
        cited_file = client.files.retrieve(file_path.file_id)
        citations.append(f'[{index}] Click to download {cited_file.filename}')
        # Note: File download functionality not implemented above for brevity

# Add footnotes to the end of the message before displaying to user
message_content.value += '\n' + '\n'.join(citations)
Runs and Run Steps
When you have all the context you need from your user in the Thread, you can run the Thread with an Assistant of your choice.
node.js
const run = await openai.beta.threads.runs.create(
thread.id,
{ assistant_id: assistant.id }
);
By default, a Run will use the model and tools configuration specified in the Assistant object, but you can override most of these when creating the Run for added flexibility:
node.js
const run = await openai.beta.threads.runs.create(
  thread.id,
  {
    assistant_id: assistant.id,
    model: "gpt-4o",
    instructions: "New instructions that override the Assistant instructions",
    tools: [{ "type": "code_interpreter" }, { "type": "file_search" }]
  }
);
Note: tool_resources associated with the Assistant cannot be overridden during Run creation. You must use the modify Assistant endpoint to do this.
Run lifecycle
Run objects can have multiple statuses.
Run lifecycle – diagram showing possible status transitions
STATUS DEFINITION
queued When Runs are first created or when you complete the required_action, they are moved to a queued status. They should almost immediately move to in_progress.
in_progress While in_progress, the Assistant uses the model and tools to perform steps. You can view progress being made by the Run by examining the Run Steps.
completed The Run successfully completed! You can now view all Messages the Assistant added to the Thread, and all the steps the Run took. You can also continue the conversation by adding more user Messages to the Thread and creating another Run.
requires_action When using the Function calling tool, the Run will move to a required_action state once the model determines the names and arguments of the functions to be called. You must then run those functions and submit the outputs before the run proceeds. If the outputs are not provided before the expires_at timestamp passes (roughly 10 mins past creation), the run will move to an expired status.
expired This happens when the function calling outputs were not submitted before expires_at and the run expires. Additionally, if the runs take too long to execute and go beyond the time stated in expires_at, our systems will expire the run.
cancelling You can attempt to cancel an in_progress run using the Cancel Run endpoint. Once the attempt to cancel succeeds, the status of the Run moves to cancelled. Cancellation is attempted but not guaranteed.
cancelled Run was successfully cancelled.
failed You can view the reason for the failure by looking at the last_error object in the Run. The timestamp for the failure will be recorded under failed_at.
incomplete Run ended due to max_prompt_tokens or max_completion_tokens reached. You can view the specific reason by looking at the incomplete_details object in the Run.
Polling for updates
If you are not using streaming, in order to keep the status of your run up to date, you will have to periodically retrieve the Run object. You can check the status of the run each time you retrieve the object to determine what your application should do next.
You can optionally use Polling Helpers in our Node and Python SDKs to help you with this. These helpers will automatically poll the Run object for you and return the Run object when it’s in a terminal state.
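For example, with the Node SDK's polling helper, creating a Run and waiting for a terminal state can look like the sketch below (it assumes the thread and assistant objects from the earlier steps):
// createAndPoll creates the Run and polls it until it reaches a terminal state.
const run = await openai.beta.threads.runs.createAndPoll(thread.id, {
  assistant_id: assistant.id,
});

if (run.status === "completed") {
  const messages = await openai.beta.threads.messages.list(thread.id);
  console.log(messages.data[0].content);
} else {
  console.log("Run ended with status:", run.status);
}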
Thread locks
When a Run is in_progress and not in a terminal state, the Thread is locked. This means that:
New Messages cannot be added to the Thread.
New Runs cannot be created on the Thread.
Run steps
Run steps lifecycle – diagram showing possible status transitions
Run step statuses have the same meaning as Run statuses.
Most of the interesting detail in the Run Step object lives in the step_details field. There can be two types of step details:
message_creation: This Run Step is created when the Assistant creates a Message on the Thread.
tool_calls: This Run Step is created when the Assistant calls a tool. Details around this are covered in the relevant sections of the Tools guide.
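A short sketch of inspecting step_details, assuming the thread and run objects from the earlier steps:
const runSteps = await openai.beta.threads.runs.steps.list(thread.id, run.id);

for (const step of runSteps.data) {
  if (step.step_details.type === "message_creation") {
    console.log("Created message:", step.step_details.message_creation.message_id);
  } else if (step.step_details.type === "tool_calls") {
    console.log("Tool calls:", step.step_details.tool_calls.map((call) => call.type));
  }
}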
Data access guidance
Currently, Assistants, Threads, Messages, and Vector Stores created via the API are scoped to the Project they’re created in. As such, any person with API key access to that Project is able to read or write Assistants, Threads, Messages, and Runs in the Project.
We strongly recommend the following data access controls:
Implement authorization. Before performing reads or writes on Assistants, Threads, Messages, and Vector Stores, ensure that the end-user is authorized to do so. For example, store in your database the object IDs that the end-user has access to, and check it before fetching the object ID with the API.
Restrict API key access. Carefully consider who in your organization should have API keys and be part of a Project. Periodically audit this list. API keys enable a wide range of operations including reading and modifying sensitive information, such as Messages and Files.
Create separate accounts. Consider creating separate Projects for different applications in order to isolate data across multiple applications.
Next
Now that you have explored how Assistants work, the next step is to explore Assistant Tools which covers topics like Function calling, File Search, and Code Interpreter.
File Search Beta
File Search augments the Assistant with knowledge from outside its model, such as proprietary product information or documents provided by your users. OpenAI automatically parses and chunks your documents, creates and stores the embeddings, and uses both vector and keyword search to retrieve relevant content to answer user queries.
Quickstart
In this example, we’ll create an assistant that can help answer questions about companies’ financial statements.
Step 1: Create a new Assistant with File Search Enabled
Create a new assistant with file_search enabled in the tools parameter of the Assistant.
node.js
import OpenAI from "openai";

const openai = new OpenAI();

async function main() {
  const assistant = await openai.beta.assistants.create({
    name: "Financial Analyst Assistant",
    instructions: "You are an expert financial analyst. Use your knowledge base to answer questions about audited financial statements.",
    model: "gpt-4o",
    tools: [{ type: "file_search" }],
  });
}
main();
Once the file_search tool is enabled, the model decides when to retrieve content based on user messages.
Step 2: Upload files and add them to a Vector Store
To access your files, the file_search tool uses the Vector Store object. Upload your files and create a Vector Store to contain them. Once the Vector Store is created, you should poll its status until all files are out of the in_progress state to ensure that all content has finished processing. The SDK provides helpers for uploading and polling in one shot.
node.js
const fileStreams = ["edgar/goog-10k.pdf", "edgar/brka-10k.txt"].map((path) =>
  fs.createReadStream(path),
);

// Create a vector store including our two files.
let vectorStore = await openai.beta.vectorStores.create({
  name: "Financial Statement",
});

await openai.beta.vectorStores.fileBatches.uploadAndPoll(vectorStore.id, fileStreams);
Step 3: Update the assistant to use the new Vector Store
To make the files accessible to your assistant, update the assistant’s tool_resources with the new vector_store id.
node.js
await openai.beta.assistants.update(assistant.id, {
tool_resources: { file_search: { vector_store_ids: [vectorStore.id] } },
});
Step 4: Create a thread
You can also attach files as Message attachments on your thread. Doing so will create another vector_store associated with the thread, or, if there is already a vector store attached to this thread, attach the new files to the existing thread vector store. When you create a Run on this thread, the file search tool will query both the vector_store from your assistant and the vector_store on the thread.
In this example, the user attached a copy of Apple’s latest 10-K filing.
node.js
// A user wants to attach a file to a specific message, let's upload it.
const aapl10k = await openai.files.create({
  file: fs.createReadStream("edgar/aapl-10k.pdf"),
  purpose: "assistants",
});

const thread = await openai.beta.threads.create({
  messages: [
    {
      role: "user",
      content:
        "How many shares of AAPL were outstanding at the end of October 2023?",
      // Attach the new file to the message.
      attachments: [{ file_id: aapl10k.id, tools: [{ type: "file_search" }] }],
    },
  ],
});

// The thread now has a vector store in its tool resources.
console.log(thread.tool_resources?.file_search);
Vector stores created using message attachments have a default expiration policy of 7 days after they were last active (defined as the last time the vector store was part of a run). This default exists to help you manage your vector storage costs. You can override these expiration policies at any time. Learn more here.
Step 5: Create a run and check the output
Now, create a Run and observe that the model uses the File Search tool to provide a response to the user’s question.
node.js
const stream = openai.beta.threads.runs
  .stream(thread.id, {
    assistant_id: assistant.id,
  })
  .on("textCreated", () => console.log("assistant >"))
  .on("toolCallCreated", (event) => console.log("assistant " + event.type))
  .on("messageDone", async (event) => {
    if (event.content[0].type === "text") {
      const { text } = event.content[0];
      const { annotations } = text;
      const citations: string[] = [];

      let index = 0;
      for (let annotation of annotations) {
        text.value = text.value.replace(annotation.text, "[" + index + "]");
        const { file_citation } = annotation;
        if (file_citation) {
          const citedFile = await openai.files.retrieve(file_citation.file_id);
          citations.push("[" + index + "]" + citedFile.filename);
        }
        index++;
      }

      console.log(text.value);
      console.log(citations.join("\n"));
    }
  });
Your new assistant will query both attached vector stores (one containing goog-10k.pdf and brka-10k.txt, and the other containing aapl-10k.pdf) and return this result from aapl-10k.pdf.
How it works
The file_search tool implements several retrieval best practices out of the box to help you extract the right data from your files and augment the model’s responses. The file_search tool:
Rewrites user queries to optimize them for search.
Breaks down complex user queries into multiple searches it can run in parallel.
Runs both keyword and semantic searches across both assistant and thread vector stores.
Reranks search results to pick the most relevant ones before generating the final response.
By default, the file_search tool uses the following settings but these can be configured to suit your needs:
Chunk size: 800 tokens
Chunk overlap: 400 tokens
Embedding model: text-embedding-3-large at 256 dimensions
Maximum number of chunks added to context: 20 (could be fewer)
Known Limitations
We have a few known limitations we’re working on adding support for in the coming months:
Support for deterministic pre-search filtering using custom metadata.
Support for parsing images within documents (including images of charts, graphs, tables etc.)
Support for retrievals over structured file formats (like csv or jsonl).
Better support for summarization — the tool today is optimized for search queries.
Vector stores
Vector Store objects give the File Search tool the ability to search your files. Adding a file to a vector_store automatically parses, chunks, embeds and stores the file in a vector database that’s capable of both keyword and semantic search. Each vector_store can hold up to 10,000 files. Vector stores can be attached to both Assistants and Threads. Today, you can attach at most one vector store to an assistant and at most one vector store to a thread.
Creating vector stores and adding files
You can create a vector store and add files to it in a single API call:
node.js
const vectorStore = await openai.beta.vectorStores.create({
  name: "Product Documentation",
  file_ids: ['file_1', 'file_2', 'file_3', 'file_4', 'file_5']
});
Adding files to vector stores is an async operation. To ensure the operation is complete, we recommend that you use the 'create and poll' helpers in our official SDKs. If you're not using the SDKs, you can retrieve the vector_store object and monitor its file_counts property to see the result of the file ingestion operation.
Files can also be added to a vector store after it’s created by creating vector store files.
node.js
const file = await openai.beta.vectorStores.files.createAndPoll(
  "vs_abc123",
  { file_id: "file-abc123" }
);
Alternatively, you can add several files to a vector store by creating batches of up to 500 files.
node.js
const batch = await openai.beta.vectorStores.fileBatches.createAndPoll(
  "vs_abc123",
  { file_ids: ["file_1", "file_2", "file_3", "file_4", "file_5"] },
);
Similarly, these files can be removed from a vector store by either:
Deleting the vector store file object or,
By deleting the underlying file object (which removes the file from all vector_store and code_interpreter configurations across all assistants and threads in your organization)
The maximum file size is 512 MB. Each file should contain no more than 5,000,000 tokens per file (computed automatically when you attach a file).
File Search supports a variety of file formats including .pdf, .md, and .docx. More details on the file extensions (and their corresponding MIME-types) supported can be found in the Supported files section below.
Attaching vector stores
You can attach vector stores to your Assistant or Thread using the tool_resources parameter.
node.js
const assistant = await openai.beta.assistants.create({
  instructions: "You are a helpful product support assistant and you answer questions based on the files provided to you.",
  model: "gpt-4o",
  tools: [{ "type": "file_search" }],
  tool_resources: {
    "file_search": {
      "vector_store_ids": ["vs_1"]
    }
  }
});

const thread = await openai.beta.threads.create({
  messages: [{ role: "user", content: "How do I cancel my subscription?" }],
  tool_resources: {
    "file_search": {
      "vector_store_ids": ["vs_2"]
    }
  }
});
You can also attach a vector store to Threads or Assistants after they’re created by updating them with the right tool_resources.
Ensuring vector store readiness before creating runs
We highly recommend that you ensure all files in a vector_store are fully processed before you create a run. This will ensure that all the data in your vector_store is searchable. You can check for vector_store readiness by using the polling helpers in our SDKs, or by manually polling the vector_store object to ensure the status is completed.
As a fallback, we've built a 60 second maximum wait in the Run object when the thread's vector store contains files that are still being processed. This is to ensure that any files your users upload in a thread are fully searchable before the run proceeds. This fallback wait does not apply to the assistant's vector store.
Customizing File Search settings
You can customize how the file_search tool chunks your data and how many chunks it returns to the model context.
Chunking configuration
By default, max_chunk_size_tokens is set to 800 and chunk_overlap_tokens is set to 400, meaning every file is indexed by being split up into 800-token chunks, with 400-token overlap between consecutive chunks.
You can adjust this by setting chunking_strategy when adding files to the vector store. There are certain limitations to chunking_strategy:
max_chunk_size_tokens must be between 100 and 4096 inclusive.
chunk_overlap_tokens must be non-negative and should not exceed max_chunk_size_tokens / 2.
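For example, a smaller chunk size can be requested when adding a file to a vector store. A minimal sketch (the IDs and values are illustrative and must respect the limits above):
const file = await openai.beta.vectorStores.files.createAndPoll("vs_abc123", {
  file_id: "file-abc123",
  // Index this file with 400-token chunks and 200-token overlap.
  chunking_strategy: {
    type: "static",
    static: { max_chunk_size_tokens: 400, chunk_overlap_tokens: 200 },
  },
});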
Number of chunks
By default, the file_search tool outputs up to 20 chunks for gpt-4* models and up to 5 chunks for gpt-3.5-turbo. You can adjust this by setting file_search.max_num_results in the tool when creating the assistant or the run.
Note that the file_search tool may output fewer than this number for a myriad of reasons:
The total number of chunks is fewer than max_num_results.
The total token size of all the retrieved chunks exceeds the token "budget" assigned to the file_search tool. The file_search tool currently has a token budget of:
4,000 tokens for gpt-3.5-turbo
16,000 tokens for gpt-4* models
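A minimal sketch of lowering max_num_results on the tool definition when creating an assistant (the value 10 is illustrative):
const assistant = await openai.beta.assistants.create({
  model: "gpt-4o",
  // Return at most 10 chunks per file search.
  tools: [{ type: "file_search", file_search: { max_num_results: 10 } }],
});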
Managing costs with expiration policies
The file_search tool uses the vector_stores object as its resource and you will be billed based on the size of the vector_store objects created. The size of the vector store object is the sum of all the parsed chunks from your files and their corresponding embeddings.
Your first GB is free and, beyond that, usage is billed at $0.10/GB/day of vector storage. There are no other costs associated with vector store operations.
In order to help you manage the costs associated with these vector_store objects, we have added support for expiration policies in the vector_store object. You can set these policies when creating or updating the vector_store object.
node.js
let vectorStore = await openai.beta.vectorStores.create({
  name: "rag-store",
  file_ids: ['file_1', 'file_2', 'file_3', 'file_4', 'file_5'],
  expires_after: {
    anchor: "last_active_at",
    days: 7
  }
});
Thread vector stores have default expiration policies
Vector stores created using thread helpers (like tool_resources.file_search.vector_stores in Threads or message.attachments in Messages) have a default expiration policy of 7 days after they were last active (defined as the last time the vector store was part of a run).
When a vector store expires, runs on that thread will fail. To fix this, recreate a vector_store with the same files and reattach it to the thread.
node.js
import _ from "lodash"; // used below for _.chunk

const fileIds = [];
for await (const file of openai.beta.vectorStores.files.list(
  "vs_toWTk90YblRLCkbE2xSVoJlF",
)) {
  fileIds.push(file.id);
}

const vectorStore = await openai.beta.vectorStores.create({
  name: "rag-store",
});

await openai.beta.threads.update("thread_abcd", {
  tool_resources: { file_search: { vector_store_ids: [vectorStore.id] } },
});

for (const fileBatch of _.chunk(fileIds, 100)) {
  await openai.beta.vectorStores.fileBatches.create(vectorStore.id, {
    file_ids: fileBatch,
  });
}
Supported files
For text/ MIME types, the encoding must be one of utf-8, utf-16, or ascii.
FILE FORMAT MIME TYPE
.c text/x-c
.cs text/x-csharp
.cpp text/x-c++
.doc application/msword
.docx application/vnd.openxmlformats-officedocument.wordprocessingml.document
.html text/html
.java text/x-java
.json application/json
.md text/markdown
.pdf application/pdf
.php text/x-php
.pptx application/vnd.openxmlformats-officedocument.presentationml.presentation
.py text/x-python
.py text/x-script.python
.rb text/x-ruby
.tex text/x-tex
.txt text/plain
.css text/css
.js text/javascript
.sh application/x-sh
.ts application/typescript
Code Interpreter Beta
Code Interpreter allows Assistants to write and run Python code in a sandboxed execution environment. This tool can process files with diverse data and formatting, and generate files with data and images of graphs. Code Interpreter allows your Assistant to run code iteratively to solve challenging code and math problems. When your Assistant writes code that fails to run, it can iterate on this code by attempting to run different code until the code execution succeeds.
See a quickstart of how to get started with Code Interpreter here.
How it works
Code Interpreter is charged at $0.03 per session. If your Assistant calls Code Interpreter simultaneously in two different threads (e.g., one thread per end-user), two Code Interpreter sessions are created. Each session is active by default for one hour, which means that you only pay for one session if your users interact with Code Interpreter in the same thread for up to one hour.
Enabling Code Interpreter
Pass code_interpreter in the tools parameter of the Assistant object to enable Code Interpreter:
node.js
const assistant = await openai.beta.assistants.create({
  instructions: "You are a personal math tutor. When asked a math question, write and run code to answer the question.",
  model: "gpt-4o",
  tools: [{ "type": "code_interpreter" }]
});
The model then decides when to invoke Code Interpreter in a Run based on the nature of the user request. This behavior can be promoted by prompting in the Assistant’s instructions (e.g., “write code to solve this problem”).
Passing files to Code Interpreter
Files that are passed at the Assistant level are accessible by all Runs with this Assistant:
node.js
// Upload a file with an "assistants" purpose
const file = await openai.files.create({
  file: fs.createReadStream("mydata.csv"),
  purpose: "assistants",
});

// Create an assistant using the file ID
const assistant = await openai.beta.assistants.create({
  instructions: "You are a personal math tutor. When asked a math question, write and run code to answer the question.",
  model: "gpt-4o",
  tools: [{ "type": "code_interpreter" }],
  tool_resources: {
    "code_interpreter": {
      "file_ids": [file.id]
    }
  }
});
Files can also be passed at the Thread level. These files are only accessible in the specific Thread. Upload the File using the File upload endpoint and then pass the File ID as part of the Message creation request:
node.js
const thread = await openai.beta.threads.create({
  messages: [
    {
      "role": "user",
      "content": "I need to solve the equation 3x + 11 = 14. Can you help me?",
      "attachments": [
        {
          file_id: file.id,
          tools: [{ type: "code_interpreter" }]
        }
      ]
    }
  ]
});
Files have a maximum size of 512 MB. Code Interpreter supports a variety of file formats including .csv, .pdf, .json and many more. More details on the file extensions (and their corresponding MIME-types) supported can be found in the Supported files section below.
Reading images and files generated by Code Interpreter
Code Interpreter in the API also outputs files, such as generating image diagrams, CSVs, and PDFs. There are two types of files that are generated:
Images
Data files (e.g. a csv file with data generated by the Assistant)
When Code Interpreter generates an image, you can look up and download this file in the file_id field of the Assistant Message response:
{
  "id": "msg_abc123",
  "object": "thread.message",
  "created_at": 1698964262,
  "thread_id": "thread_abc123",
  "role": "assistant",
  "content": [
    {
      "type": "image_file",
      "image_file": {
        "file_id": "file-abc123"
      }
    }
  ]
  # ...
}
The file content can then be downloaded by passing the file ID to the Files API:
node.js
import fs from "fs";
import OpenAI from "openai";

const openai = new OpenAI();

async function main() {
  const response = await openai.files.content("file-abc123");

  // Extract the binary data from the Response object
  const image_data = await response.arrayBuffer();

  // Convert the binary data to a Buffer
  const image_data_buffer = Buffer.from(image_data);

  // Save the image to a specific location
  fs.writeFileSync("./my-image.png", image_data_buffer);
}
main();
When Code Interpreter references a file path (e.g., ”Download this csv file”), file paths are listed as annotations. You can convert these annotations into links to download the file:
{
  "id": "msg_abc123",
  "object": "thread.message",
  "created_at": 1699073585,
  "thread_id": "thread_abc123",
  "role": "assistant",
  "content": [
    {
      "type": "text",
      "text": {
        "value": "The rows of the CSV file have been shuffled and saved to a new CSV file. You can download the shuffled CSV file from the following link:\n\n[Download Shuffled CSV File](sandbox:/mnt/data/shuffled_file.csv)",
        "annotations": [
          {
            "type": "file_path",
            "text": "sandbox:/mnt/data/shuffled_file.csv",
            "start_index": 167,
            "end_index": 202,
            "file_path": {
              "file_id": "file-abc123"
            }
          }
        ...
Input and output logs of Code Interpreter
By listing the steps of a Run that called Code Interpreter, you can inspect the code input and output logs of Code Interpreter:
node.js
const runSteps = await openai.beta.threads.runs.steps.list(
thread.id,
run.id
);
{
  "object": "list",
  "data": [
    {
      "id": "step_abc123",
      "object": "thread.run.step",
      "type": "tool_calls",
      "run_id": "run_abc123",
      "thread_id": "thread_abc123",
      "status": "completed",
      "step_details": {
        "type": "tool_calls",
        "tool_calls": [
          {
            "type": "code",
            "code": {
              "input": "# Calculating 2 + 2\nresult = 2 + 2\nresult",
              "outputs": [
                {
                  "type": "logs",
                  "logs": "4"
                }
              …
}
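For example, you could walk the listed steps and print each Code Interpreter input alongside its log output. A minimal sketch based on the response shape shown above:
node.js
// Print the code each Code Interpreter call ran and its log output,
// following the step_details shape shown above.
for (const step of runSteps.data) {
  if (step.step_details.type !== "tool_calls") continue;
  for (const toolCall of step.step_details.tool_calls) {
    if (toolCall.type !== "code") continue;
    console.log("Input code:\n", toolCall.code.input);
    for (const output of toolCall.code.outputs) {
      if (output.type === "logs") {
        console.log("Logs:", output.logs);
      }
    }
  }
}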
Supported files
For text/ MIME types, the encoding must be one of utf-8, utf-16, or ascii.
FILE FORMAT MIME TYPE
.c text/x-c
.cs text/x-csharp
.cpp text/x-c++
.doc application/msword
.docx application/vnd.openxmlformats-officedocument.wordprocessingml.document
.html text/html
.java text/x-java
.json application/json
.md text/markdown
.pdf application/pdf
.php text/x-php
.pptx application/vnd.openxmlformats-officedocument.presentationml.presentation
.py text/x-python
.py text/x-script.python
.rb text/x-ruby
.tex text/x-tex
.txt text/plain
.css text/css
.js text/javascript
.sh application/x-sh
.ts application/typescript
.csv application/csv
.jpeg image/jpeg
.jpg image/jpeg
.gif image/gif
.png image/png
.tar application/x-tar
.xlsx application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
.xml application/xml or "text/xml"
.zip application/zip
Function calling Beta
Similar to the Chat Completions API, the Assistants API supports function calling. Function calling allows you to describe functions to the Assistants API and have it intelligently return the functions that need to be called along with their arguments.
Quickstart
In this example, we’ll create a weather assistant and define two functions, get_current_temperature and get_rain_probability, as tools that the Assistant can call. Depending on the user query, the model will invoke parallel function calling if using our latest models released on or after Nov 6, 2023. In our example that uses parallel function calling, we will ask the Assistant what the weather in San Francisco is like today and the chances of rain. We also show how to output the Assistant’s response with streaming.
Step 1: Define functions
When creating your assistant, you will first define the functions under the tools param of the assistant.
node.js
const assistant = await client.beta.assistants.create({
  model: "gpt-4o",
  instructions:
    "You are a weather bot. Use the provided functions to answer questions.",
  tools: [
    {
      type: "function",
      function: {
        name: "getCurrentTemperature",
        description: "Get the current temperature for a specific location",
        parameters: {
          type: "object",
          properties: {
            location: {
              type: "string",
              description: "The city and state, e.g., San Francisco, CA",
            },
            unit: {
              type: "string",
              enum: ["Celsius", "Fahrenheit"],
              description:
                "The temperature unit to use. Infer this from the user's location.",
            },
          },
          required: ["location", "unit"],
        },
      },
    },
    {
      type: "function",
      function: {
        name: "getRainProbability",
        description: "Get the probability of rain for a specific location",
        parameters: {
          type: "object",
          properties: {
            location: {
              type: "string",
              description: "The city and state, e.g., San Francisco, CA",
            },
          },
          required: ["location"],
        },
      },
    },
  ],
});
Step 2: Create a Thread and add Messages
Create a Thread when a user starts a conversation and add Messages to the Thread as the user asks questions.
node.js
const thread = await client.beta.threads.create();

const message = await client.beta.threads.messages.create(thread.id, {
  role: "user",
  content: "What's the weather in San Francisco today and the likelihood it'll rain?",
});
Step 3: Initiate a Run
When you initiate a Run on a Thread containing a user Message that triggers one or more functions, the Run will enter a pending status. After it processes, the Run will enter a requires_action state, which you can verify by checking the Run’s status. This indicates that you need to run tools and submit their outputs to the Assistant to continue Run execution.
Note that Runs expire ten minutes after creation. Be sure to submit your tool outputs before the 10 minute mark.
In our case, you will see two tool_calls within required_action, which indicates that the user query triggered parallel function calling.
json
{
  "id": "run_qJL1kI9xxWlfE0z1yfL0fGg9",
  …
  "status": "requires_action",
  "required_action": {
    "submit_tool_outputs": {
      "tool_calls": [
        {
          "id": "call_FthC9qRpsL5kBpwwyw6c7j4k",
          "function": {
            "arguments": "{\"location\": \"San Francisco, CA\"}",
            "name": "get_rain_probability"
          },
          "type": "function"
        },
        {
          "id": "call_RpEDoB8O0FTL9JoKTuCVFOyR",
          "function": {
            "arguments": "{\"location\": \"San Francisco, CA\", \"unit\": \"Fahrenheit\"}",
            "name": "get_current_temperature"
          },
          "type": "function"
        }
      ]
    },
    …
    "type": "submit_tool_outputs"
  }
}
Run object truncated here for readability
How you initiate a Run and submit tool_calls will differ depending on whether you are using streaming or not, although in both cases all tool_calls need to be submitted at the same time. You can then complete the Run by submitting the tool outputs from the functions you called. Pass each tool_call_id referenced in the required_action object to match outputs to each function call.
For the streaming case, we create an EventHandler class to handle events in the response stream and submit all tool outputs at once with the “submit tool outputs stream” helper in the Python and Node SDKs.
node.js
import { EventEmitter } from "node:events";

// `client`, `threadId`, and `assistantId` come from the client, Thread, and
// Assistant created in the earlier steps.
class EventHandler extends EventEmitter {
  constructor(client) {
    super();
    this.client = client;
  }

  async onEvent(event) {
    try {
      console.log(event);
      // Retrieve events that are denoted with 'requires_action'
      // since these will have our tool_calls
      if (event.event === "thread.run.requires_action") {
        await this.handleRequiresAction(
          event.data,
          event.data.id,
          event.data.thread_id,
        );
      }
    } catch (error) {
      console.error("Error handling event:", error);
    }
  }

  async handleRequiresAction(data, runId, threadId) {
    try {
      const toolOutputs =
        data.required_action.submit_tool_outputs.tool_calls.map((toolCall) => {
          if (toolCall.function.name === "getCurrentTemperature") {
            return {
              tool_call_id: toolCall.id,
              output: "57",
            };
          } else if (toolCall.function.name === "getRainProbability") {
            return {
              tool_call_id: toolCall.id,
              output: "0.06",
            };
          }
        });
      // Submit all the tool outputs at the same time
      await this.submitToolOutputs(toolOutputs, runId, threadId);
    } catch (error) {
      console.error("Error processing required action:", error);
    }
  }

  async submitToolOutputs(toolOutputs, runId, threadId) {
    try {
      // Use the submitToolOutputsStream helper
      const stream = this.client.beta.threads.runs.submitToolOutputsStream(
        threadId,
        runId,
        { tool_outputs: toolOutputs },
      );
      for await (const event of stream) {
        this.emit("event", event);
      }
    } catch (error) {
      console.error("Error submitting tool outputs:", error);
    }
  }
}

const eventHandler = new EventHandler(client);
eventHandler.on("event", eventHandler.onEvent.bind(eventHandler));

const stream = await client.beta.threads.runs.stream(
  threadId,
  { assistant_id: assistantId },
  eventHandler,
);

for await (const event of stream) {
  eventHandler.emit("event", event);
}
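If you are not streaming, you can create the Run, poll its status, and submit all tool outputs in one call once it reaches requires_action. A minimal sketch under these assumptions (the polling interval and hard-coded outputs are illustrative; depending on your SDK version, polling helpers such as runs.createAndPoll may also be available):
node.js
// Poll the Run and submit tool outputs once it reaches requires_action.
// Function results are hard-coded here for illustration.
async function runWithoutStreaming(client, threadId, assistantId) {
  let run = await client.beta.threads.runs.create(threadId, {
    assistant_id: assistantId,
  });

  while (["queued", "in_progress"].includes(run.status)) {
    await new Promise((resolve) => setTimeout(resolve, 1000));
    run = await client.beta.threads.runs.retrieve(threadId, run.id);
  }

  if (run.status === "requires_action") {
    const toolOutputs =
      run.required_action.submit_tool_outputs.tool_calls.map((toolCall) => ({
        tool_call_id: toolCall.id,
        output:
          toolCall.function.name === "getCurrentTemperature" ? "57" : "0.06",
      }));

    // All tool_calls must be submitted in the same request.
    run = await client.beta.threads.runs.submitToolOutputs(threadId, run.id, {
      tool_outputs: toolOutputs,
    });
    // A full implementation would keep polling until the Run completes,
    // then read the Assistant's reply from the Thread's messages.
  }
}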
Prompt engineering
This guide shares strategies and tactics for getting better results from large language models (sometimes referred to as GPT models) like GPT-4o. The methods described here can sometimes be deployed in combination for greater effect. We encourage experimentation to find the methods that work best for you.
You can also explore example prompts which showcase what our models are capable of:
Prompt examples
Explore prompt examples to learn what GPT models can do
Six strategies for getting better results
Write clear instructions
These models can’t read your mind. If outputs are too long, ask for brief replies. If outputs are too simple, ask for expert-level writing. If you dislike the format, demonstrate the format you’d like to see. The less the model has to guess at what you want, the more likely you’ll get it.
Tactics:
Include details in your query to get more relevant answers
Ask the model to adopt a persona
Use delimiters to clearly indicate distinct parts of the input
Specify the steps required to complete a task
Provide examples
Specify the desired length of the output
Provide reference text
Language models can confidently invent fake answers, especially when asked about esoteric topics or for citations and URLs. In the same way that a sheet of notes can help a student do better on a test, providing reference text to these models can help in answering with fewer fabrications.
Tactics:
Instruct the model to answer using a reference text
Instruct the model to answer with citations from a reference text
Split complex tasks into simpler subtasks
Just as it is good practice in software engineering to decompose a complex system into a set of modular components, the same is true of tasks submitted to a language model. Complex tasks tend to have higher error rates than simpler tasks. Furthermore, complex tasks can often be re-defined as a workflow of simpler tasks in which the outputs of earlier tasks are used to construct the inputs to later tasks.
Tactics:
Use intent classification to identify the most relevant instructions for a user query
For dialogue applications that require very long conversations, summarize or filter previous dialogue
Summarize long documents piecewise and construct a full summary recursively
Give the model time to “think”
If asked to multiply 17 by 28, you might not know it instantly, but can still work it out with time. Similarly, models make more reasoning errors when trying to answer right away, rather than taking time to work out an answer. Asking for a “chain of thought” before an answer can help the model reason its way toward correct answers more reliably.
Tactics:
Instruct the model to work out its own solution before rushing to a conclusion
Use inner monologue or a sequence of queries to hide the model’s reasoning process
Ask the model if it missed anything on previous passes
Use external tools
Compensate for the weaknesses of the model by feeding it the outputs of other tools. For example, a text retrieval system (sometimes called RAG or retrieval augmented generation) can tell the model about relevant documents. A code execution engine like OpenAI’s Code Interpreter can help the model do math and run code. If a task can be done more reliably or efficiently by a tool rather than by a language model, offload it to get the best of both.
Tactics:
Use embeddings-based search to implement efficient knowledge retrieval
Use code execution to perform more accurate calculations or call external APIs
Give the model access to specific functions
Test changes systematically
Improving performance is easier if you can measure it. In some cases a modification to a prompt will achieve better performance on a few isolated examples but lead to worse overall performance on a more representative set of examples. Therefore, to be sure that a change is net positive to performance, it may be necessary to define a comprehensive test suite (also known as an “eval”).
Tactic:
Evaluate model outputs with reference to gold-standard answers
Tactics
Each of the strategies listed above can be instantiated with specific tactics. These tactics are meant to provide ideas for things to try. They are by no means fully comprehensive, and you should feel free to try creative ideas not represented here.
Strategy: Write clear instructions
Tactic: Include details in your query to get more relevant answers
In order to get a highly relevant response, make sure that requests provide any important details or context. Otherwise you are leaving it up to the model to guess what you mean.
Worse: How do I add numbers in Excel?
Better: How do I add up a row of dollar amounts in Excel? I want to do this automatically for a whole sheet of rows with all the totals ending up on the right in a column called “Total”.
Worse: Who’s president?
Better: Who was the president of Mexico in 2021, and how frequently are elections held?
Worse: Write code to calculate the Fibonacci sequence.
Better: Write a TypeScript function to efficiently calculate the Fibonacci sequence. Comment the code liberally to explain what each piece does and why it’s written that way.
Worse: Summarize the meeting notes.
Better: Summarize the meeting notes in a single paragraph. Then write a markdown list of the speakers and each of their key points. Finally, list the next steps or action items suggested by the speakers, if any.
Tactic: Ask the model to adopt a persona
The system message can be used to specify the persona used by the model in its replies.
SYSTEM
When I ask for help to write something, you will reply with a document that contains at least one joke or playful comment in every paragraph.
USER
Write a thank you note to my steel bolt vendor for getting the delivery in on time and in short notice. This made it possible for us to deliver an important order.
Open in Playground
Tactic: Use delimiters to clearly indicate distinct parts of the input
Delimiters like triple quotation marks, XML tags, section titles, etc. can help demarcate sections of text to be treated differently.
USER
Summarize the text delimited by triple quotes with a haiku.
“””insert text here”””
Open in Playground
SYSTEM
You will be provided with a pair of articles (delimited with XML tags) about the same topic. First summarize the arguments of each article. Then indicate which of them makes a better argument and explain why.
USER
<article> insert first article here </article>
<article> insert second article here </article>
Open in Playground
SYSTEM
You will be provided with a thesis abstract and a suggested title for it. The thesis title should give the reader a good idea of the topic of the thesis but should also be eye-catching. If the title does not meet these criteria, suggest 5 alternatives.
USER
Abstract: insert abstract here
Title: insert title here
Open in Playground
For straightforward tasks such as these, using delimiters might not make a difference in the output quality. However, the more complex a task is, the more important it is to disambiguate task details. Don’t make the model work to understand exactly what you are asking of it.
Tactic: Specify the steps required to complete a task
Some tasks are best specified as a sequence of steps. Writing the steps out explicitly can make it easier for the model to follow them.
SYSTEM
Use the following step-by-step instructions to respond to user inputs.
Step 1 – The user will provide you with text in triple quotes. Summarize this text in one sentence with a prefix that says “Summary: “.
Step 2 – Translate the summary from Step 1 into Spanish, with a prefix that says “Translation: “.
USER
“””insert text here”””
Open in Playground
Tactic: Provide examples
Providing general instructions that apply to all examples is generally more efficient than demonstrating all permutations of a task by example, but in some cases providing examples may be easier, for instance if you intend for the model to copy a particular style of responding to user queries that is difficult to describe explicitly. This is known as “few-shot” prompting.
SYSTEM
Answer in a consistent style.
USER
Teach me about patience.
ASSISTANT
The river that carves the deepest valley flows from a modest spring; the grandest symphony originates from a single note; the most intricate tapestry begins with a solitary thread.
USER
Teach me about the ocean.
Open in Playground
Tactic: Specify the desired length of the output
You can ask the model to produce outputs that are of a given target length. The targeted output length can be specified in terms of the count of words, sentences, paragraphs, bullet points, etc. Note however that instructing the model to generate a specific number of words does not work with high precision. The model can more reliably generate outputs with a specific number of paragraphs or bullet points.
USER
Summarize the text delimited by triple quotes in about 50 words.
“””insert text here”””
Open in Playground
USER
Summarize the text delimited by triple quotes in 2 paragraphs.
“””insert text here”””
Open in Playground
USER
Summarize the text delimited by triple quotes in 3 bullet points.
“””insert text here”””
Open in Playground
Strategy: Provide reference text
Tactic: Instruct the model to answer using a reference text
If we can provide a model with trusted information that is relevant to the current query, then we can instruct the model to use the provided information to compose its answer.
SYSTEM
Use the provided articles delimited by triple quotes to answer questions. If the answer cannot be found in the articles, write “I could not find an answer.”
USER
Question:
Open in Playground
Given that all models have limited context windows, we need some way to dynamically look up information that is relevant to the question being asked. Embeddings can be used to implement efficient knowledge retrieval. See the tactic “Use embeddings-based search to implement efficient knowledge retrieval” for more details on how to implement this.
Tactic: Instruct the model to answer with citations from a reference text
If the input has been supplemented with relevant knowledge, it’s straightforward to request that the model add citations to its answers by referencing passages from provided documents. Note that citations in the output can then be verified programmatically by string matching within the provided documents.
SYSTEM
You will be provided with a document delimited by triple quotes and a question. Your task is to answer the question using only the provided document and to cite the passage(s) of the document used to answer the question. If the document does not contain the information needed to answer this question then simply write: “Insufficient information.” If an answer to the question is provided, it must be annotated with a citation. Use the following format to cite relevant passages ({"citation": …}).
USER
“”””””
Question:
Open in Playground
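As noted above, citations emitted in this format can be checked programmatically. A minimal sketch, assuming the model returns citations as {"citation": "..."} objects and the source document is available as a string (the regex-based parsing is illustrative):
node.js
// Verify that each cited passage actually appears verbatim in the source document.
// `answer` is the model's reply; `document` is the text that was provided to it.
function verifyCitations(answer, document) {
  const citations = [...answer.matchAll(/\{"citation":\s*"([^"]+)"\}/g)].map(
    (match) => match[1],
  );
  return citations.map((passage) => ({
    passage,
    found: document.includes(passage),
  }));
}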
Strategy: Split complex tasks into simpler subtasks
Tactic: Use intent classification to identify the most relevant instructions for a user query
For tasks in which lots of independent sets of instructions are needed to handle different cases, it can be beneficial to first classify the type of query and to use that classification to determine which instructions are needed. This can be achieved by defining fixed categories and hardcoding instructions that are relevant for handling tasks in a given category. This process can also be applied recursively to decompose a task into a sequence of stages. The advantage of this approach is that each query will contain only those instructions that are required to perform the next stage of a task which can result in lower error rates compared to using a single query to perform the whole task. This can also result in lower costs since larger prompts cost more to run (see pricing information).
Suppose for example that for a customer service application, queries could be usefully classified as follows:
SYSTEM
You will be provided with customer service queries. Classify each query into a primary category and a secondary category. Provide your output in json format with the keys: primary and secondary.
Primary categories: Billing, Technical Support, Account Management, or General Inquiry.
Billing secondary categories:
- Unsubscribe or upgrade
- Add a payment method
- Explanation for charge
- Dispute a charge
Technical Support secondary categories:
- Troubleshooting
- Device compatibility
- Software updates
Account Management secondary categories:
- Password reset
- Update personal information
- Close account
- Account security
General Inquiry secondary categories:
- Product information
- Pricing
- Feedback
- Speak to a human
USER
I need to get my internet working again.
Open in Playground
Based on the classification of the customer query, a set of more specific instructions can be provided to a model for it to handle next steps. For example, suppose the customer requires help with “troubleshooting”.
SYSTEM
You will be provided with customer service inquiries that require troubleshooting in a technical support context. Help the user by:
- Ask them to check that all cables to/from the router are connected. Note that it is common for cables to come loose over time.
- If all cables are connected and the issue persists, ask them which router model they are using
- Now you will advise them how to restart their device:
-- If the model number is MTD-327J, advise them to push the red button and hold it for 5 seconds, then wait 5 minutes before testing the connection.
-- If the model number is MTD-327S, advise them to unplug and replug it, then wait 5 minutes before testing the connection.
- If the customer’s issue persists after restarting the device and waiting 5 minutes, connect them to IT support by outputting {"IT support requested"}.
- If the user starts asking questions that are unrelated to this topic then confirm if they would like to end the current chat about troubleshooting and classify their request according to the following scheme:
USER
I need to get my internet working again.
Open in Playground
Notice that the model has been instructed to emit special strings to indicate when the state of the conversation changes. This enables us to turn our system into a state machine where the state determines which instructions are injected. By keeping track of state, what instructions are relevant at that state, and also optionally what state transitions are allowed from that state, we can put guardrails around the user experience that would be hard to achieve with a less structured approach.
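One way to wire this up is a two-step pipeline: a first model call classifies the query into the JSON categories above, and application code then selects the category-specific instructions for a second call. A minimal sketch using the Chat Completions API (the instruction lookup table, model name, and fallback are illustrative):
node.js
import OpenAI from "openai";

const openai = new OpenAI();

// Category-specific instructions; only the relevant one is sent to the model.
const SECONDARY_INSTRUCTIONS = {
  Troubleshooting:
    "You will be provided with customer service inquiries that require troubleshooting in a technical support context. ...", // abbreviated; full instructions shown above
  // ...instructions for the other secondary categories
};

async function handleCustomerQuery(query) {
  // Step 1: classify the query into primary and secondary categories.
  const classification = await openai.chat.completions.create({
    model: "gpt-4o",
    response_format: { type: "json_object" },
    messages: [
      {
        role: "system",
        content:
          "Classify each customer service query into a primary and secondary category. Respond in JSON with the keys: primary and secondary.",
      },
      { role: "user", content: query },
    ],
  });
  const { secondary } = JSON.parse(classification.choices[0].message.content);

  // Step 2: answer the query using only the instructions for that category.
  const reply = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [
      {
        role: "system",
        content:
          SECONDARY_INSTRUCTIONS[secondary] ?? "Answer the customer helpfully.",
      },
      { role: "user", content: query },
    ],
  });
  return reply.choices[0].message.content;
}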
Tactic: For dialogue applications that require very long conversations, summarize or filter previous dialogue
Since models have a fixed context length, dialogue between a user and an assistant in which the entire conversation is included in the context window cannot continue indefinitely.
There are various workarounds to this problem, one of which is to summarize previous turns in the conversation. Once the size of the input reaches a predetermined threshold length, this could trigger a query that summarizes part of the conversation and the summary of the prior conversation could be included as part of the system message. Alternatively, prior conversation could be summarized asynchronously in the background throughout the entire conversation.
An alternative solution is to dynamically select previous parts of the conversation that are most relevant to the current query. See the tactic “Use embeddings-based search to implement efficient knowledge retrieval”.
Tactic: Summarize long documents piecewise and construct a full summary recursively
Since models have a fixed context length, they cannot be used to summarize a text longer than the context length minus the length of the generated summary in a single query.
To summarize a very long document such as a book we can use a sequence of queries to summarize each section of the document. Section summaries can be concatenated and summarized producing summaries of summaries. This process can proceed recursively until an entire document is summarized. If it’s necessary to use information about earlier sections in order to make sense of later sections, then a further trick that can be useful is to include a running summary of the text that precedes any given point in the book while summarizing content at that point. The effectiveness of this procedure for summarizing books has been studied in previous research by OpenAI using variants of GPT-3.
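A minimal sketch of the piecewise approach using the Chat Completions API (chunking by character count and the model name are illustrative; a production version would split on document structure and may recurse more than once):
node.js
import OpenAI from "openai";

const openai = new OpenAI();

async function summarize(text) {
  const response = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [
      { role: "system", content: "Summarize the provided text in a short paragraph." },
      { role: "user", content: text },
    ],
  });
  return response.choices[0].message.content;
}

async function summarizeLongDocument(document, chunkSize = 8000) {
  // Split the document into chunks that fit comfortably in the context window.
  const chunks = [];
  for (let i = 0; i < document.length; i += chunkSize) {
    chunks.push(document.slice(i, i + chunkSize));
  }

  // Summarize each chunk, then summarize the concatenated section summaries.
  const sectionSummaries = [];
  for (const chunk of chunks) {
    sectionSummaries.push(await summarize(chunk));
  }
  return summarize(sectionSummaries.join("\n\n"));
}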
Strategy: Give models time to “think”
Tactic: Instruct the model to work out its own solution before rushing to a conclusion
Sometimes we get better results when we explicitly instruct the model to reason from first principles before coming to a conclusion. Suppose for example we want a model to evaluate a student’s solution to a math problem. The most obvious way to approach this is to simply ask the model if the student’s solution is correct or not.
SYSTEM
Determine if the student’s solution is correct or not.
USER
Problem Statement: I’m building a solar power installation and I need help working out the financials.
- Land costs $100 / square foot
- I can buy solar panels for $250 / square foot
- I negotiated a contract for maintenance that will cost me a flat $100k per year, and an additional $10 / square foot
What is the total cost for the first year of operations as a function of the number of square feet.
Student’s Solution: Let x be the size of the installation in square feet.
- Land cost: 100x
- Solar panel cost: 250x
- Maintenance cost: 100,000 + 100x
Total cost: 100x + 250x + 100,000 + 100x = 450x + 100,000
ASSISTANT
The student’s solution is correct.
Open in Playground
But the student’s solution is actually not correct! We can get the model to successfully notice this by prompting the model to generate its own solution first.
SYSTEM
First work out your own solution to the problem. Then compare your solution to the student’s solution and evaluate if the student’s solution is correct or not. Don’t decide if the student’s solution is correct until you have done the problem yourself.
USER
Problem Statement: I’m building a solar power installation and I need help working out the financials.
- Land costs $100 / square foot
- I can buy solar panels for $250 / square foot
- I negotiated a contract for maintenance that will cost me a flat $100k per year, and an additional $10 / square foot
What is the total cost for the first year of operations as a function of the number of square feet.
Student’s Solution: Let x be the size of the installation in square feet.
- Land cost: 100x
- Solar panel cost: 250x
- Maintenance cost: 100,000 + 100x
Total cost: 100x + 250x + 100,000 + 100x = 450x + 100,000
ASSISTANT
Let x be the size of the installation in square feet.
- Land cost: 100x
- Solar panel cost: 250x
- Maintenance cost: 100,000 + 10x
Total cost: 100x + 250x + 100,000 + 10x = 360x + 100,000
The student’s solution is incorrect. They made an error in the maintenance cost calculation, using 100x instead of 10x. The correct total cost for the first year of operations is 360x + 100,000.
Open in Playground
Tactic: Use inner monologue or a sequence of queries to hide the model’s reasoning process
The previous tactic demonstrates that it is sometimes important for the model to reason in detail about a problem before answering a specific question. For some applications, the reasoning process that a model uses to arrive at a final answer would be inappropriate to share with the user. For example, in tutoring applications we may want to encourage students to work out their own answers, but a model’s reasoning process about the student’s solution could reveal the answer to the student.
Inner monologue is a tactic that can be used to mitigate this. The idea of inner monologue is to instruct the model to put parts of the output that are meant to be hidden from the user into a structured format that makes parsing them easy. Then before presenting the output to the user, the output is parsed and only part of the output is made visible.
SYSTEM
Follow these steps to answer the user queries.
Step 1 – First work out your own solution to the problem. Don’t rely on the student’s solution since it may be incorrect. Enclose all your work for this step within triple quotes (“””).
Step 2 – Compare your solution to the student’s solution and evaluate if the student’s solution is correct or not. Enclose all your work for this step within triple quotes (“””).
Step 3 – If the student made a mistake, determine what hint you could give the student without giving away the answer. Enclose all your work for this step within triple quotes (“””).
Step 4 – If the student made a mistake, provide the hint from the previous step to the student (outside of triple quotes). Instead of writing “Step 4 – …” write “Hint:”.
USER
Problem Statement:
Student Solution:
Open in Playground
Alternatively, this can be achieved with a sequence of queries in which all except the last have their output hidden from the end user.
First, we can ask the model to solve the problem on its own. Since this initial query doesn’t require the student’s solution, it can be omitted. This provides the additional advantage that there is no chance that the model’s solution will be biased by the student’s attempted solution.
USER
Open in Playground
Next, we can have the model use all available information to assess the correctness of the student’s solution.
SYSTEM
Compare your solution to the student’s solution and evaluate if the student’s solution is correct or not.
USER
Problem statement: “”””””
Your solution: “”””””
Student’s solution: “”””””
Open in Playground
Finally, we can let the model use its own analysis to construct a reply in the persona of a helpful tutor.
SYSTEM
You are a math tutor. If the student made an error, offer a hint to the student in a way that does not reveal the answer. If the student did not make an error, simply offer them an encouraging comment.
USER
Problem statement: “”””””
Your solution: “”””””
Student’s solution: “”””””
Analysis: “”””””
Open in Playground
Tactic: Ask the model if it missed anything on previous passes
Suppose that we are using a model to list excerpts from a source which are relevant to a particular question. After listing each excerpt the model needs to determine if it should start writing another or if it should stop. If the source document is large, it is common for a model to stop too early and fail to list all relevant excerpts. In that case, better performance can often be obtained by prompting the model with followup queries to find any excerpts it missed on previous passes.
SYSTEM
You will be provided with a document delimited by triple quotes. Your task is to select excerpts which pertain to the following question: “What significant paradigm shifts have occurred in the history of artificial intelligence.”
Ensure that excerpts contain all relevant context needed to interpret them – in other words don’t extract small snippets that are missing important context. Provide output in JSON format as follows:
[{“excerpt”: “…”},
…
{“excerpt”: “…”}]
USER
“”””””
ASSISTANT
[{“excerpt”: “the model writes an excerpt here”},
…
{“excerpt”: “the model writes another excerpt here”}]
USER
Are there more relevant excerpts? Take care not to repeat excerpts. Also ensure that excerpts contain all relevant context needed to interpret them – in other words don’t extract small snippets that are missing important context.
Open in Playground
Strategy: Use external tools
Tactic: Use embeddings-based search to implement efficient knowledge retrieval
A model can leverage external sources of information if provided as part of its input. This can help the model to generate more informed and up-to-date responses. For example, if a user asks a question about a specific movie, it may be useful to add high quality information about the movie (e.g. actors, director, etc…) to the model’s input. Embeddings can be used to implement efficient knowledge retrieval, so that relevant information can be added to the model input dynamically at run-time.
A text embedding is a vector that can measure the relatedness between text strings. Similar or relevant strings will be closer together than unrelated strings. This fact, along with the existence of fast vector search algorithms means that embeddings can be used to implement efficient knowledge retrieval. In particular, a text corpus can be split up into chunks, and each chunk can be embedded and stored. Then a given query can be embedded and vector search can be performed to find the embedded chunks of text from the corpus that are most related to the query (i.e. closest together in the embedding space).
Example implementations can be found in the OpenAI Cookbook. See the tactic “Instruct the model to use retrieved knowledge to answer queries” for an example of how to use knowledge retrieval to minimize the likelihood that a model will make up incorrect facts.
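A minimal sketch of this approach with the OpenAI embeddings endpoint, using in-memory cosine similarity (the embedding model name and chunking are assumptions for illustration; production systems typically use a vector database):
node.js
import OpenAI from "openai";

const openai = new OpenAI();

// Cosine similarity between two embedding vectors.
const cosineSimilarity = (a, b) => {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
};

async function embed(texts) {
  const response = await openai.embeddings.create({
    model: "text-embedding-3-small",
    input: texts,
  });
  return response.data.map((item) => item.embedding);
}

// Embed the corpus chunks and the query, then return the closest chunks.
async function retrieveRelevantChunks(chunks, query, topK = 3) {
  const [queryEmbedding, ...chunkEmbeddings] = await embed([query, ...chunks]);
  return chunkEmbeddings
    .map((embedding, i) => ({
      chunk: chunks[i],
      score: cosineSimilarity(queryEmbedding, embedding),
    }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((item) => item.chunk);
}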
Tactic: Use code execution to perform more accurate calculations or call external APIs
Language models cannot be relied upon to perform arithmetic or long calculations accurately on their own. In cases where this is needed, a model can be instructed to write and run code instead of making its own calculations. In particular, a model can be instructed to put code that is meant to be run into a designated format such as triple backtick. After an output is produced, the code can be extracted and run. Finally, if necessary, the output from the code execution engine (i.e. Python interpreter) can be provided as an input to the model for the next query.
SYSTEM
You can write and execute Python code by enclosing it in triple backticks, e.g. ```code goes here```. Use this to perform calculations.
USER
Find all real-valued roots of the following polynomial: 3x^5 - 5x^4 - 3x^3 - 7x - 10.
Open in Playground
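After the model responds, the proposed code can be pulled out of the triple-backtick block and handed to an execution engine. A minimal extraction sketch (the helper name is illustrative; actually running the code should happen in a sandbox, per the warning below):
node.js
// Extract the first triple-backtick block from a model reply.
// Executing the extracted code is deliberately left out: untrusted code should
// only ever be run in a sandboxed environment.
function extractCodeBlock(reply) {
  const match = reply.match(/```(?:python)?\n([\s\S]*?)```/);
  return match ? match[1].trim() : null;
}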
Another good use case for code execution is calling external APIs. If a model is instructed in the proper use of an API, it can write code that makes use of it. A model can be instructed in how to use an API by providing it with documentation and/or code samples showing how to use the API.
SYSTEM
You can write and execute Python code by enclosing it in triple backticks. Also note that you have access to the following module to help users send messages to their friends:
python
import message
message.write(to="John", message="Hey, want to meetup after work?")
Open in Playground
WARNING: Executing code produced by a model is not inherently safe and precautions should be taken in any application that seeks to do this. In particular, a sandboxed code execution environment is needed to limit the harm that untrusted code could cause.
Tactic: Give the model access to specific functions
The Chat Completions API allows passing a list of function descriptions in requests. This enables models to generate function arguments according to the provided schemas. Generated function arguments are returned by the API in JSON format and can be used to execute function calls. Output provided by function calls can then be fed back into a model in the following request to close the loop. This is the recommended way of using OpenAI models to call external functions. To learn more see the function calling section in our introductory text generation guide and more function calling examples in the OpenAI Cookbook.
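For reference, a minimal sketch of passing a function description to the Chat Completions API and reading back the generated arguments (get_current_weather is a hypothetical function implemented by your application):
node.js
import OpenAI from "openai";

const openai = new OpenAI();

const completion = await openai.chat.completions.create({
  model: "gpt-4o",
  messages: [{ role: "user", content: "What's the weather in San Francisco?" }],
  tools: [
    {
      type: "function",
      function: {
        name: "get_current_weather", // hypothetical function in your application
        description: "Get the current weather for a location",
        parameters: {
          type: "object",
          properties: {
            location: {
              type: "string",
              description: "City and state, e.g., San Francisco, CA",
            },
          },
          required: ["location"],
        },
      },
    },
  ],
});

// If the model chose to call the function, its JSON arguments are returned here.
const toolCall = completion.choices[0].message.tool_calls?.[0];
if (toolCall) {
  const args = JSON.parse(toolCall.function.arguments);
  // Execute get_current_weather(args) in your code, then send the result back
  // in a follow-up request as a message with role "tool" and the tool_call_id.
}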
Strategy: Test changes systematically
Sometimes it can be hard to tell whether a change (e.g., a new instruction or a new design) makes your system better or worse. Looking at a few examples may hint at which is better, but with small sample sizes it can be hard to distinguish between a true improvement and random luck. Maybe the change helps performance on some inputs, but hurts performance on others.
Evaluation procedures (or “evals”) are useful for optimizing system designs. Good evals are:
Representative of real-world usage (or at least diverse)
Contain many test cases for greater statistical power (see table below for guidelines)
Easy to automate or repeat
DIFFERENCE TO DETECT SAMPLE SIZE NEEDED FOR 95% CONFIDENCE
30% ~10
10% ~100
3% ~1,000
1% ~10,000
Evaluation of outputs can be done by computers, humans, or a mix. Computers can automate evals with objective criteria (e.g., questions with single correct answers) as well as some subjective or fuzzy criteria, in which model outputs are evaluated by other model queries. OpenAI Evals is an open-source software framework that provides tools for creating automated evals.
Model-based evals can be useful when there exists a range of possible outputs that would be considered equally high in quality (e.g. for questions with long answers). The boundary between what can be realistically evaluated with a model-based eval and what requires a human to evaluate is fuzzy and is constantly shifting as models become more capable. We encourage experimentation to figure out how well model-based evals can work for your use case.
Tactic: Evaluate model outputs with reference to gold-standard answers
Suppose it is known that the correct answer to a question should make reference to a specific set of known facts. Then we can use a model query to count how many of the required facts are included in the answer.
For example, using the following system message:
SYSTEM
You will be provided with text delimited by triple quotes that is supposed to be the answer to a question. Check if the following pieces of information are directly contained in the answer:
- Neil Armstrong was the first person to walk on the moon.
- The date Neil Armstrong first walked on the moon was July 21, 1969.
For each of these points perform the following steps:
1 – Restate the point.
2 – Provide a citation from the answer which is closest to this point.
3 – Consider if someone reading the citation who doesn’t know the topic could directly infer the point. Explain why or why not before making up your mind.
4 – Write “yes” if the answer to 3 was yes, otherwise write “no”.
Finally, provide a count of how many "yes" answers there are. Provide this count as {"count": }.
Here’s an example input where both points are satisfied:
SYSTEM
USER
“””Neil Armstrong is famous for being the first human to set foot on the Moon. This historic event took place on July 21, 1969, during the Apollo 11 mission.”””
Open in Playground
Here’s an example input where only one point is satisfied:
SYSTEM
USER
“””Neil Armstrong made history when he stepped off the lunar module, becoming the first person to walk on the moon.”””
Open in Playground
Here’s an example input where none are satisfied:
SYSTEM
USER
“””In the summer of ’69, a voyage grand,
Apollo 11, bold as legend’s hand.
Armstrong took a step, history unfurled,
“One small step,” he said, for a new world.”””
Open in Playground
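This kind of grader can be run automatically over a set of candidate answers. A minimal sketch that sends the grading SYSTEM message shown above together with a candidate answer and parses the final {"count": ...} from the reply (the regex-based parsing and model name are illustrative):
node.js
import OpenAI from "openai";

const openai = new OpenAI();

// `gradingSystemMessage` is the SYSTEM prompt shown above, listing the required facts.
async function countFactsCovered(gradingSystemMessage, candidateAnswer) {
  const response = await openai.chat.completions.create({
    model: "gpt-4o",
    messages: [
      { role: "system", content: gradingSystemMessage },
      { role: "user", content: `"""${candidateAnswer}"""` },
    ],
  });
  // Pull the final {"count": N} object out of the graded response.
  const match = response.choices[0].message.content.match(/\{"count":\s*(\d+)\}/);
  return match ? Number(match[1]) : null;
}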
There are many possible variants on this type of model-based eval. Consider the following variation which tracks the kind of overlap between the candidate answer and the gold-standard answer, and also tracks whether the candidate answer contradicts any part of the gold-standard answer.
SYSTEM
Use the following steps to respond to user inputs. Fully restate each step before proceeding. i.e. “Step 1: Reason…”.
Step 1: Reason step-by-step about whether the information in the submitted answer compared to the expert answer is either: disjoint, equal, a subset, a superset, or overlapping (i.e. some intersection but not subset/superset).
Step 2: Reason step-by-step about whether the submitted answer contradicts any aspect of the expert answer.
Step 3: Output a JSON object structured like: {"type_of_overlap": "disjoint" or "equal" or "subset" or "superset" or "overlapping", "contradiction": true or false}
Here’s an example input with a substandard answer which nonetheless does not contradict the expert answer:
SYSTEM
USER
Question: “””What event is Neil Armstrong most famous for and on what date did it occur? Assume UTC time.”””
Submitted Answer: “””Didn’t he walk on the moon or something?”””
Expert Answer: “””Neil Armstrong is most famous for being the first person to walk on the moon. This historic event occurred on July 21, 1969.”””
Open in Playground
Here’s an example input with an answer that directly contradicts the expert answer:
SYSTEM
USER
Question: “””What event is Neil Armstrong most famous for and on what date did it occur? Assume UTC time.”””
Submitted Answer: “””On the 21st of July 1969, Neil Armstrong became the second person to walk on the moon, following after Buzz Aldrin.”””
Expert Answer: “””Neil Armstrong is most famous for being the first person to walk on the moon. This historic event occurred on July 21, 1969.”””
Open in Playground
Here’s an example input with a correct answer that also provides a bit more detail than is necessary:
SYSTEM
USER
Question: “””What event is Neil Armstrong most famous for and on what date did it occur? Assume UTC time.”””
Submitted Answer: “””At approximately 02:56 UTC on July 21st 1969, Neil Armstrong became the first human to set foot on the lunar surface, marking a monumental achievement in human history.”””
Expert Answer: “””Neil Armstrong is most famous for being the first person to walk on the moon. This historic event occurred on July 21, 1969.”””