Skip to content

Program crashes when attempting computer.display.view() in os mode #1116

@hackley

Description

@hackley

Describe the bug

When running in OS mode, calling computer.display.view() crashes the program due to a type mismatch.

Reproduce

$ interpreter --os

> Go to amazon and order me a small pack of yellow legal pads. Use your best judgement as to which one to order, I trust you completely. I’m already logged in on google chrome, so it would be best to use that browser.

Expected behavior

Screenshot should be processed and the program moves on to the next step

Screenshots

No response

Open Interpreter version

0.2.3

Python version

3.11.0

Operating System name and version

macOS 13.6

Additional context

Full terminal output:

$ interpreter --os

▌ OS Control enabled

> go to amazon and order me a small pack of yellow legal pads. use your best judgement as to which one to order, I trust you completely. I’m already logged in on google chrome, so it would be best to use that browser.

  To begin the task, I'll open Google Chrome, navigate to Amazon and search for a small pack of yellow legal pads. Then, I'll use my
  best judgement to select and order a product.

  Let's start by opening Google Chrome using a hotkey command. After that, I will check the screen to confirm that Chrome is open and
  ready to use.



  import time

  # Open Google Chrome
  computer.keyboard.hotkey('command', 'space')
  time.sleep(1)  # Wait a bit for the spotlight search to open
  computer.keyboard.write('Google Chrome')
  time.sleep(1)  # Wait for the search results to show up
  computer.keyboard.press('enter')
  time.sleep(2)  # Wait for Chrome to open

  # Capture the screen to verify the active app
  computer.display.view()


[IPKernelApp] WARNING | Parent appears to have exited, shutting down.
[IPKernelApp] WARNING | Parent appears to have exited, shutting down.

        Python Version: 3.11.0
        Pip Version: 23.2.1
        Open-interpreter Version: cmd: Open Interpreter 0.2.3 New Computer Update
, pkg: 0.2.3
        OS Version and Architecture: macOS-13.6-arm64-arm-64bit
        CPU Info: arm
        RAM Info: 16.00 GB, used: 6.44, free: 0.37


        # Interpreter Info

        Vision: True
        Model: gpt-4-vision-preview
        Function calling: False
        Context window: 110000
        Max tokens: 4096

        Auto run: True
        API base: None
        Offline: False

        Curl output: Not local

        # Messages

        System Message: You are Open Interpreter, a world-class programmer that can complete any goal by executing code.

When you write code, it will be executed **on the user's machine**. The user has given you **full and complete permission** to execute
any code necessary to complete the task.

When a user refers to a filename, they're likely referring to an existing file in the directory you're currently executing code in.

In general, try to make plans with as few steps as possible. As for actually executing code to carry out that plan, **don't try to do
everything in one code block.** You should try something, print information about it, then continue from there in tiny, informed steps.
You will never get it on the first try, and attempting it in one go will often lead to errors you cant see.

Manually summarize text.

Do not try to write code that attempts the entire task at once, and verify at each step whether or not you're on track.

# Computer

You may use the `computer` Python module to complete tasks:

```python
computer.browser.search(query) # Silently searches Google for the query, returns result. The user's browser is unaffected. (does not
open a browser!)

computer.display.view() # Shows you what's on the screen, returns a `pil_image` `in case you need it (rarely). **You almost always want
to do this first!**

computer.keyboard.hotkey(" ", "command") # Opens spotlight (very useful)
computer.keyboard.write("hello")

# Use this to click text:
computer.mouse.click("text onscreen") # This clicks on the UI element with that text. Use this **frequently** and get creative! To click
a video, you could pass the *timestamp* (which is usually written on the thumbnail) into this.
# Use this to click an icon, button, or other symbol:
computer.mouse.click(icon="gear icon") # Moves mouse to the icon with that description. Use this very often.

computer.mouse.move("open recent >") # This moves the mouse over the UI element with that text. Many dropdowns will disappear if you
click them. You have to hover over items to reveal more.
computer.mouse.click(x=500, y=500) # Use this very, very rarely. It's highly inaccurate

computer.mouse.scroll(-10) # Scrolls down. If you don't find some text on screen that you expected to be there, you probably want to do
this
x, y = computer.display.center() # Get your bearings

computer.clipboard.view() # Returns contents of clipboard
computer.os.get_selected_text() # Use frequently. If editing text, the user often wants this

{{
import platform
if platform.system() == 'Darwin':
        print('''
computer.browser.search(query) # Google search results will be returned from this function as a string
computer.files.edit(path_to_file, original_text, replacement_text) # Edit a file
computer.calendar.create_event(title="Meeting", start_date=datetime.datetime.now(), end=datetime.datetime.now() +
datetime.timedelta(hours=1), notes="Note", location="") # Creates a calendar event
computer.calendar.get_events(start_date=datetime.date.today(), end_date=None) # Get events between dates. If end_date is None, only gets
events for start_date
computer.calendar.delete_event(event_title="Meeting", start_date=datetime.datetime) # Delete a specific event with a matching title and
start date, you may need to get use get_events() to find the specific event object first
computer.contacts.get_phone_number("John Doe")
computer.contacts.get_email_address("John Doe")
computer.mail.send("[email protected]", "Meeting Reminder", "Reminder that our meeting is at 3pm today.", ["path/to/attachment.pdf",
"path/to/attachment2.pdf"]) # Send an email with a optional attachments
computer.mail.get(4, unread=True) # Returns the {number} of unread emails, or all emails if False is passed
computer.mail.unread_count() # Returns the number of unread emails
computer.sms.send("555-123-4567", "Hello from the computer!") # Send a text message. MUST be a phone number, so use
computer.contacts.get_phone_number frequently here
''')
}}

For rare and complex mouse actions, consider using computer vision libraries on the computer.display.view() pil_image to produce a
list of coordinates for the mouse to move/drag to.

If the user highlighted text in an editor, then asked you to modify it, they probably want you to keyboard.write over their version of
the text.

Tasks are 100% computer-based. DO NOT simply write long messages to the user to complete tasks. You MUST put your text back into the
program they're using to deliver your text!

Clicking text is the most reliable way to use the mouse— for example, clicking a URL's text you see in the URL bar, or some textarea's
placeholder text (like "Search" to get into a search bar).

Applescript might be best for some tasks.

If you use plt.show(), the resulting image will be sent to you. However, if you use PIL.Image.show(), the resulting image will NOT
be sent to you.

It is very important to make sure you are focused on the right application and window. Often, your first command should always be to
explicitly switch to the correct application.

When searching the web, use query parameters. For example, https://www.amazon.com/s?k=monitor

Try multiple methods before saying the task is impossible. You can do it!

Critical Routine Procedure for Multi-Step Tasks

Include computer.display.view() after a 2 second delay at the end of every code block to verify your progress, then answer these
questions in extreme detail:

  1. Generally, what is happening on-screen?
  2. What is the active app?
  3. What hotkeys does this app support that might get be closer to my goal?
  4. What text areas are active, if any?
  5. What text is selected?
  6. What options could you take next to get closer to your goal?

{{

Add window information

try:

import pywinctl

active_window = pywinctl.getActiveWindow()

if active_window:
    app_info = ""

    if "_appName" in active_window.__dict__:
        app_info += (
            "Active Application: " + active_window.__dict__["_appName"]
        )

    if hasattr(active_window, "title"):
        app_info += "\n" + "Active Window Title: " + active_window.title
    elif "_winTitle" in active_window.__dict__:
        app_info += (
            "\n"
            + "Active Window Title:"
            + active_window.__dict__["_winTitle"]
        )

    if app_info != "":
        print(
            "\n\n# Important Information:\n"
            + app_info
            + "\n(If you need to be in another active application to help the user, you need to switch to it.)"
        )

except:
# Non blocking
pass

}}

    {'role': 'user', 'type': 'message', 'content': 'go to amazon and order me a small pack of yellow legal pads. use your best

judgement as to which one to order, I trust you completely. I’m already logged in on google chrome, so it would be best to use that
browser.'}

{'role': 'assistant', 'type': 'message', 'content': "To begin the task, I'll open Google Chrome, navigate to Amazon and search for a
small pack of yellow legal pads. Then, I'll use my best judgement to select and order a product. \n\nLet's start by opening Google
Chrome using a hotkey command. After that, I will check the screen to confirm that Chrome is open and ready to use. \n\n"}

{'role': 'assistant', 'type': 'code', 'format': 'python', 'content': "\nimport time\n\n# Open Google
Chrome\ncomputer.keyboard.hotkey('command', 'space')\ntime.sleep(1) # Wait a bit for the spotlight search to
open\ncomputer.keyboard.write('Google Chrome')\ntime.sleep(1) # Wait for the search results to show
up\ncomputer.keyboard.press('enter')\ntime.sleep(2) # Wait for Chrome to open\n\n# Capture the screen to verify the active
app\ncomputer.display.view()\n"}

{'role': 'computer', 'type': 'console', 'format': 'output', 'content': ''}

{'role': 'computer', 'type': 'image', 'format': 'base64.png', 'content':
'iVBORw0KGgoAAAANSUhEUgAAFAAAAAtACAIAAACnmQKaAAAKrGlDQ1BJQ0MgUHJvZmlsZQAAeJyVlwdUk9kSgO//p4eElhABKaE36S2AlNBDkV4tEJIAocQYCCB2ZHEFVhQREVQ
WZFFEwVUpsqKCBduiYMG+QRYBZV0s2LC8HziE3X3nvXfe/Oee+f75587MvefenAkAZCpbKEyD5QFIF2SKQn3c6dExsXTcKICBKiACBwDYnAwhMzg4ACAyp/8u7+4CaFrfMp2O9e/
f/6socHkZHACgYIQTuBmcdIRPIuMl...ywNdu+FW428UnGMi6xs656/w0Gl4eisj7VI0MwA8xM9xU8dGPmezohpapD3nEVmLuHSPp/rNI8tvWekb4shv1Q/ng0onuCKwznmftbJH
vMf11hrRXk2ZMW27V1CTgEZsH9g6IMsJR4fG9MHrA6pESjd2ggd9QRvi3pJvXlgSlREU2LDnsik3jF7ApGFHDsV7a4HE+Gkjm5mjOax6496LAnh9rRIDzZnu2m9jfmlmJb813QZG
DmCZKMytJ9pITTgSG8OKzy3WHAD+/wGvBT7YmM9hzwAAAABJRU5ErkJggg=='}

Traceback (most recent call last):
File "/Users/nathan/.pyenv/versions/3.11.0/bin/interpreter", line 8, in
sys.exit(main())
^^^^^^
File "/Users/nathan/.pyenv/versions/3.11.0/lib/python3.11/site-packages/interpreter/terminal_interface/start_terminal_interface.py",
line 437, in main
start_terminal_interface(interpreter)
File "/Users/nathan/.pyenv/versions/3.11.0/lib/python3.11/site-packages/interpreter/terminal_interface/start_terminal_interface.py",
line 415, in start_terminal_interface
interpreter.chat()
File "/Users/nathan/.pyenv/versions/3.11.0/lib/python3.11/site-packages/interpreter/core/core.py", line 167, in chat
for _ in self._streaming_chat(message=message, display=display):
File "/Users/nathan/.pyenv/versions/3.11.0/lib/python3.11/site-packages/interpreter/core/core.py", line 196, in _streaming_chat
yield from terminal_interface(self, message)
File "/Users/nathan/.pyenv/versions/3.11.0/lib/python3.11/site-packages/interpreter/terminal_interface/terminal_interface.py", line
136, in terminal_interface
for chunk in interpreter.chat(message, display=False, stream=True):
File "/Users/nathan/.pyenv/versions/3.11.0/lib/python3.11/site-packages/interpreter/core/core.py", line 235, in _streaming_chat
yield from self._respond_and_store()
File "/Users/nathan/.pyenv/versions/3.11.0/lib/python3.11/site-packages/interpreter/core/core.py", line 281, in _respond_and_store
for chunk in respond(self):
File "/Users/nathan/.pyenv/versions/3.11.0/lib/python3.11/site-packages/interpreter/core/respond.py", line 69, in respond
for chunk in interpreter.llm.run(messages_for_llm):
File "/Users/nathan/.pyenv/versions/3.11.0/lib/python3.11/site-packages/interpreter/core/llm/llm.py", line 97, in run
messages = convert_to_openai_messages(
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/Users/nathan/.pyenv/versions/3.11.0/lib/python3.11/site-packages/interpreter/core/llm/utils/convert_to_openai_messages.py",
line 173, in convert_to_openai_messages
new_message["content"] = new_message["content"].strip()
^^^^^^^^^^^^^^^^^^^^^^^^^^^^
AttributeError: 'list' object has no attribute 'strip'

Metadata

Metadata

Assignees

No one assigned

    Labels

    BugSomething isn't working

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions