Merged
4 changes: 2 additions & 2 deletions docs/guides/profiles.mdx
@@ -18,9 +18,9 @@ from interpreter import interpreter
interpreter.os = True
interpreter.llm.supports_vision = True

- interpreter.llm.model = "gpt-4-vision-preview"
+ interpreter.llm.model = "gpt-4o"

- interpreter.llm.supports_functions = False
+ interpreter.llm.supports_functions = True
interpreter.llm.context_window = 110000
interpreter.llm.max_tokens = 4096
interpreter.auto_run = True
10 changes: 5 additions & 5 deletions docs/settings/all-settings.mdx
@@ -280,17 +280,17 @@ llm:

### Vision Mode

- Enables vision mode, which adds some special instructions to the prompt and switches to `gpt-4-vision-preview`.
+ Enables vision mode, which adds some special instructions to the prompt and switches to `gpt-4o`.

<CodeGroup>
```bash Terminal
interpreter --vision
```

```python Python
- interpreter.llm.model = "gpt-4-vision-preview" # Any vision supporting model
+ interpreter.llm.model = "gpt-4o" # Any vision supporting model
interpreter.llm.supports_vision = True
- interpreter.llm.supports_functions = False # If model doesn't support functions, which is the case with gpt-4-vision.
+ interpreter.llm.supports_functions = True

interpreter.custom_instructions = """The user will show you an image of the code you write. You can view images directly.
For HTML: This will be run STATELESSLY. You may NEVER write '<!-- previous code here... --!>' or `<!-- header will go here -->` or anything like that. It is CRITICAL TO NEVER WRITE PLACEHOLDERS. Placeholders will BREAK it. You must write the FULL HTML CODE EVERY TIME. Therefore you cannot write HTML piecemeal—write all the HTML, CSS, and possibly Javascript **in one step, in one code block**. The user will help you review it visually.
@@ -302,10 +302,10 @@ If you use `plt.show()`, the resulting image will be sent to you. However, if yo
loop: True

llm:
- model: "gpt-4-vision-preview"
+ model: "gpt-4o"
temperature: 0
supports_vision: True
- supports_functions: False
+ supports_functions: True
context_window: 110000
max_tokens: 4096
custom_instructions: >
2 changes: 1 addition & 1 deletion docs/usage/terminal/vision.mdx
@@ -8,4 +8,4 @@ To use vision (highly experimental), run the following command:
interpreter --vision
```

- If a file path to an image is found in your input, it will be loaded into the vision model (`gpt-4-vision-preview` for now).
+ If a file path to an image is found in your input, it will be loaded into the vision model (`gpt-4o` for now).
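
Reviewer note, not part of this diff: the sentence in `vision.mdx` says image file paths found in the input are loaded into the vision model. The sketch below illustrates one way such detection could work. `find_image_paths` and `IMAGE_EXTENSIONS` are hypothetical names made up for illustration; this is not Open Interpreter's actual implementation, and it assumes paths contain no spaces.

```python
from pathlib import Path

# Hypothetical set of extensions treated as images (an assumption, not taken from the project).
IMAGE_EXTENSIONS = {".png", ".jpg", ".jpeg", ".gif", ".webp"}


def find_image_paths(user_input: str) -> list[str]:
    """Return whitespace-delimited tokens whose suffix looks like an image file.

    Hypothetical helper: it only inspects the text and does not check
    that the file exists on disk.
    """
    matches = []
    for token in user_input.split():
        # Drop surrounding quotes and trailing punctuation such as commas.
        cleaned = token.strip("\"'`,;()")
        if Path(cleaned).suffix.lower() in IMAGE_EXTENSIONS:
            matches.append(cleaned)
    return matches


if __name__ == "__main__":
    print(find_image_paths("What is wrong with ./shots/ui.png here?"))
```

A real implementation would likely also confirm that the file exists and handle quoted paths containing spaces before sending anything to the model.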
4 changes: 2 additions & 2 deletions interpreter/terminal_interface/profiles/defaults/os.py
@@ -6,11 +6,11 @@
interpreter.llm.supports_vision = True
# interpreter.shrink_images = True # Faster but less accurate

- interpreter.llm.model = "gpt-4-vision-preview"
+ interpreter.llm.model = "gpt-4o"

interpreter.computer.import_computer_api = True

- interpreter.llm.supports_functions = False
+ interpreter.llm.supports_functions = True
interpreter.llm.context_window = 110000
interpreter.llm.max_tokens = 4096
interpreter.auto_run = True
4 changes: 2 additions & 2 deletions interpreter/terminal_interface/profiles/defaults/vision.yaml
@@ -3,10 +3,10 @@
loop: True

llm:
- model: "gpt-4-vision-preview"
+ model: "gpt-4o"
temperature: 0
supports_vision: True
- supports_functions: False
+ supports_functions: True
context_window: 110000
max_tokens: 4096
custom_instructions: >
4 changes: 2 additions & 2 deletions tests/test_interpreter.py
@@ -662,9 +662,9 @@ def test_vision():
]

interpreter.llm.supports_vision = True
- interpreter.llm.model = "gpt-4-vision-preview"
+ interpreter.llm.model = "gpt-4o"
interpreter.system_message += "\nThe user will show you an image of the code you write. You can view images directly.\n\nFor HTML: This will be run STATELESSLY. You may NEVER write '<!-- previous code here... --!>' or `<!-- header will go here -->` or anything like that. It is CRITICAL TO NEVER WRITE PLACEHOLDERS. Placeholders will BREAK it. You must write the FULL HTML CODE EVERY TIME. Therefore you cannot write HTML piecemeal—write all the HTML, CSS, and possibly Javascript **in one step, in one code block**. The user will help you review it visually.\nIf the user submits a filepath, you will also see the image. The filepath and user image will both be in the user's message.\n\nIf you use `plt.show()`, the resulting image will be sent to you. However, if you use `PIL.Image.show()`, the resulting image will NOT be sent to you."
- interpreter.llm.supports_functions = False
+ interpreter.llm.supports_functions = True
interpreter.llm.context_window = 110000
interpreter.llm.max_tokens = 4096
interpreter.loop = True