-
Notifications
You must be signed in to change notification settings - Fork 5.2k
Description
Describe the bug
When running in OS mode, calling computer.display.view() crashes the program due to a type mismatch.
Reproduce
$ interpreter --os
> Go to amazon and order me a small pack of yellow legal pads. Use your best judgement as to which one to order, I trust you completely. I’m already logged in on google chrome, so it would be best to use that browser.
Expected behavior
Screenshot should be processed and the program moves on to the next step
Screenshots
No response
Open Interpreter version
0.2.3
Python version
3.11.0
Operating System name and version
macOS 13.6
Additional context
Full terminal output:
$ interpreter --os
▌ OS Control enabled
> go to amazon and order me a small pack of yellow legal pads. use your best judgement as to which one to order, I trust you completely. I’m already logged in on google chrome, so it would be best to use that browser.
To begin the task, I'll open Google Chrome, navigate to Amazon and search for a small pack of yellow legal pads. Then, I'll use my
best judgement to select and order a product.
Let's start by opening Google Chrome using a hotkey command. After that, I will check the screen to confirm that Chrome is open and
ready to use.
import time
# Open Google Chrome
computer.keyboard.hotkey('command', 'space')
time.sleep(1) # Wait a bit for the spotlight search to open
computer.keyboard.write('Google Chrome')
time.sleep(1) # Wait for the search results to show up
computer.keyboard.press('enter')
time.sleep(2) # Wait for Chrome to open
# Capture the screen to verify the active app
computer.display.view()
[IPKernelApp] WARNING | Parent appears to have exited, shutting down.
[IPKernelApp] WARNING | Parent appears to have exited, shutting down.
Python Version: 3.11.0
Pip Version: 23.2.1
Open-interpreter Version: cmd: Open Interpreter 0.2.3 New Computer Update
, pkg: 0.2.3
OS Version and Architecture: macOS-13.6-arm64-arm-64bit
CPU Info: arm
RAM Info: 16.00 GB, used: 6.44, free: 0.37
# Interpreter Info
Vision: True
Model: gpt-4-vision-preview
Function calling: False
Context window: 110000
Max tokens: 4096
Auto run: True
API base: None
Offline: False
Curl output: Not local
# Messages
System Message: You are Open Interpreter, a world-class programmer that can complete any goal by executing code.
When you write code, it will be executed **on the user's machine**. The user has given you **full and complete permission** to execute
any code necessary to complete the task.
When a user refers to a filename, they're likely referring to an existing file in the directory you're currently executing code in.
In general, try to make plans with as few steps as possible. As for actually executing code to carry out that plan, **don't try to do
everything in one code block.** You should try something, print information about it, then continue from there in tiny, informed steps.
You will never get it on the first try, and attempting it in one go will often lead to errors you cant see.
Manually summarize text.
Do not try to write code that attempts the entire task at once, and verify at each step whether or not you're on track.
# Computer
You may use the `computer` Python module to complete tasks:
```python
computer.browser.search(query) # Silently searches Google for the query, returns result. The user's browser is unaffected. (does not
open a browser!)
computer.display.view() # Shows you what's on the screen, returns a `pil_image` `in case you need it (rarely). **You almost always want
to do this first!**
computer.keyboard.hotkey(" ", "command") # Opens spotlight (very useful)
computer.keyboard.write("hello")
# Use this to click text:
computer.mouse.click("text onscreen") # This clicks on the UI element with that text. Use this **frequently** and get creative! To click
a video, you could pass the *timestamp* (which is usually written on the thumbnail) into this.
# Use this to click an icon, button, or other symbol:
computer.mouse.click(icon="gear icon") # Moves mouse to the icon with that description. Use this very often.
computer.mouse.move("open recent >") # This moves the mouse over the UI element with that text. Many dropdowns will disappear if you
click them. You have to hover over items to reveal more.
computer.mouse.click(x=500, y=500) # Use this very, very rarely. It's highly inaccurate
computer.mouse.scroll(-10) # Scrolls down. If you don't find some text on screen that you expected to be there, you probably want to do
this
x, y = computer.display.center() # Get your bearings
computer.clipboard.view() # Returns contents of clipboard
computer.os.get_selected_text() # Use frequently. If editing text, the user often wants this
{{
import platform
if platform.system() == 'Darwin':
print('''
computer.browser.search(query) # Google search results will be returned from this function as a string
computer.files.edit(path_to_file, original_text, replacement_text) # Edit a file
computer.calendar.create_event(title="Meeting", start_date=datetime.datetime.now(), end=datetime.datetime.now() +
datetime.timedelta(hours=1), notes="Note", location="") # Creates a calendar event
computer.calendar.get_events(start_date=datetime.date.today(), end_date=None) # Get events between dates. If end_date is None, only gets
events for start_date
computer.calendar.delete_event(event_title="Meeting", start_date=datetime.datetime) # Delete a specific event with a matching title and
start date, you may need to get use get_events() to find the specific event object first
computer.contacts.get_phone_number("John Doe")
computer.contacts.get_email_address("John Doe")
computer.mail.send("[email protected]", "Meeting Reminder", "Reminder that our meeting is at 3pm today.", ["path/to/attachment.pdf",
"path/to/attachment2.pdf"]) # Send an email with a optional attachments
computer.mail.get(4, unread=True) # Returns the {number} of unread emails, or all emails if False is passed
computer.mail.unread_count() # Returns the number of unread emails
computer.sms.send("555-123-4567", "Hello from the computer!") # Send a text message. MUST be a phone number, so use
computer.contacts.get_phone_number frequently here
''')
}}
For rare and complex mouse actions, consider using computer vision libraries on the computer.display.view() pil_image to produce a
list of coordinates for the mouse to move/drag to.
If the user highlighted text in an editor, then asked you to modify it, they probably want you to keyboard.write over their version of
the text.
Tasks are 100% computer-based. DO NOT simply write long messages to the user to complete tasks. You MUST put your text back into the
program they're using to deliver your text!
Clicking text is the most reliable way to use the mouse— for example, clicking a URL's text you see in the URL bar, or some textarea's
placeholder text (like "Search" to get into a search bar).
Applescript might be best for some tasks.
If you use plt.show(), the resulting image will be sent to you. However, if you use PIL.Image.show(), the resulting image will NOT
be sent to you.
It is very important to make sure you are focused on the right application and window. Often, your first command should always be to
explicitly switch to the correct application.
When searching the web, use query parameters. For example, https://www.amazon.com/s?k=monitor
Try multiple methods before saying the task is impossible. You can do it!
Critical Routine Procedure for Multi-Step Tasks
Include computer.display.view() after a 2 second delay at the end of every code block to verify your progress, then answer these
questions in extreme detail:
- Generally, what is happening on-screen?
- What is the active app?
- What hotkeys does this app support that might get be closer to my goal?
- What text areas are active, if any?
- What text is selected?
- What options could you take next to get closer to your goal?
{{
Add window information
try:
import pywinctl
active_window = pywinctl.getActiveWindow()
if active_window:
app_info = ""
if "_appName" in active_window.__dict__:
app_info += (
"Active Application: " + active_window.__dict__["_appName"]
)
if hasattr(active_window, "title"):
app_info += "\n" + "Active Window Title: " + active_window.title
elif "_winTitle" in active_window.__dict__:
app_info += (
"\n"
+ "Active Window Title:"
+ active_window.__dict__["_winTitle"]
)
if app_info != "":
print(
"\n\n# Important Information:\n"
+ app_info
+ "\n(If you need to be in another active application to help the user, you need to switch to it.)"
)
except:
# Non blocking
pass
}}
{'role': 'user', 'type': 'message', 'content': 'go to amazon and order me a small pack of yellow legal pads. use your best
judgement as to which one to order, I trust you completely. I’m already logged in on google chrome, so it would be best to use that
browser.'}
{'role': 'assistant', 'type': 'message', 'content': "To begin the task, I'll open Google Chrome, navigate to Amazon and search for a
small pack of yellow legal pads. Then, I'll use my best judgement to select and order a product. \n\nLet's start by opening Google
Chrome using a hotkey command. After that, I will check the screen to confirm that Chrome is open and ready to use. \n\n"}
{'role': 'assistant', 'type': 'code', 'format': 'python', 'content': "\nimport time\n\n# Open Google
Chrome\ncomputer.keyboard.hotkey('command', 'space')\ntime.sleep(1) # Wait a bit for the spotlight search to
open\ncomputer.keyboard.write('Google Chrome')\ntime.sleep(1) # Wait for the search results to show
up\ncomputer.keyboard.press('enter')\ntime.sleep(2) # Wait for Chrome to open\n\n# Capture the screen to verify the active
app\ncomputer.display.view()\n"}
{'role': 'computer', 'type': 'console', 'format': 'output', 'content': ''}
{'role': 'computer', 'type': 'image', 'format': 'base64.png', 'content':
'iVBORw0KGgoAAAANSUhEUgAAFAAAAAtACAIAAACnmQKaAAAKrGlDQ1BJQ0MgUHJvZmlsZQAAeJyVlwdUk9kSgO//p4eElhABKaE36S2AlNBDkV4tEJIAocQYCCB2ZHEFVhQREVQ
WZFFEwVUpsqKCBduiYMG+QRYBZV0s2LC8HziE3X3nvXfe/Oee+f75587MvefenAkAZCpbKEyD5QFIF2SKQn3c6dExsXTcKICBKiACBwDYnAwhMzg4ACAyp/8u7+4CaFrfMp2O9e/
f/6socHkZHACgYIQTuBmcdIRPIuMl...ywNdu+FW428UnGMi6xs656/w0Gl4eisj7VI0MwA8xM9xU8dGPmezohpapD3nEVmLuHSPp/rNI8tvWekb4shv1Q/ng0onuCKwznmftbJH
vMf11hrRXk2ZMW27V1CTgEZsH9g6IMsJR4fG9MHrA6pESjd2ggd9QRvi3pJvXlgSlREU2LDnsik3jF7ApGFHDsV7a4HE+Gkjm5mjOax6496LAnh9rRIDzZnu2m9jfmlmJb813QZG
DmCZKMytJ9pITTgSG8OKzy3WHAD+/wGvBT7YmM9hzwAAAABJRU5ErkJggg=='}
Traceback (most recent call last):
File "/Users/nathan/.pyenv/versions/3.11.0/bin/interpreter", line 8, in
sys.exit(main())
^^^^^^
File "/Users/nathan/.pyenv/versions/3.11.0/lib/python3.11/site-packages/interpreter/terminal_interface/start_terminal_interface.py",
line 437, in main
start_terminal_interface(interpreter)
File "/Users/nathan/.pyenv/versions/3.11.0/lib/python3.11/site-packages/interpreter/terminal_interface/start_terminal_interface.py",
line 415, in start_terminal_interface
interpreter.chat()
File "/Users/nathan/.pyenv/versions/3.11.0/lib/python3.11/site-packages/interpreter/core/core.py", line 167, in chat
for _ in self._streaming_chat(message=message, display=display):
File "/Users/nathan/.pyenv/versions/3.11.0/lib/python3.11/site-packages/interpreter/core/core.py", line 196, in _streaming_chat
yield from terminal_interface(self, message)
File "/Users/nathan/.pyenv/versions/3.11.0/lib/python3.11/site-packages/interpreter/terminal_interface/terminal_interface.py", line
136, in terminal_interface
for chunk in interpreter.chat(message, display=False, stream=True):
File "/Users/nathan/.pyenv/versions/3.11.0/lib/python3.11/site-packages/interpreter/core/core.py", line 235, in _streaming_chat
yield from self._respond_and_store()
File "/Users/nathan/.pyenv/versions/3.11.0/lib/python3.11/site-packages/interpreter/core/core.py", line 281, in _respond_and_store
for chunk in respond(self):
File "/Users/nathan/.pyenv/versions/3.11.0/lib/python3.11/site-packages/interpreter/core/respond.py", line 69, in respond
for chunk in interpreter.llm.run(messages_for_llm):
File "/Users/nathan/.pyenv/versions/3.11.0/lib/python3.11/site-packages/interpreter/core/llm/llm.py", line 97, in run
messages = convert_to_openai_messages(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/nathan/.pyenv/versions/3.11.0/lib/python3.11/site-packages/interpreter/core/llm/utils/convert_to_openai_messages.py",
line 173, in convert_to_openai_messages
new_message["content"] = new_message["content"].strip()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'list' object has no attribute 'strip'