
Using llama2 locally, repeated '/chat/completions' requests return 404 from ollama serve #1052

Closed
2868151647 opened this issue Apr 12, 2024 · 16 comments
Labels: bug (Something isn't working), severity:low (Minor issues or affecting single user)

Comments

@2868151647

Describe the bug

I am using llama2. When I send 'hello' from the frontend, ollama serve shows repeated requests to '/api/embeddings' returning HTTP 200, mingled with requests to '/chat/completions' returning HTTP 404.
I also saw log output for 99 steps on the backend server.
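
For reference, the mismatch can be seen by hitting ollama serve directly (a hedged check assuming the default port 11434; these example commands are not taken from the logs below):

# ollama's native embeddings endpoint answers 200
curl http://localhost:11434/api/embeddings -d '{"model": "llama2", "prompt": "hello"}'
# ollama serves no bare /chat/completions route, so this returns "404 page not found"
curl http://localhost:11434/chat/completions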

Setup and configuration

Current version:

commit e9121b78fed0b5ef36718ca0bf59588c0b094b86 (HEAD -> main)
Author: Xingyao Wang <[email protected]>
Date:   Sun Apr 7 16:07:59 2024 +0800

use .getLogger to avoid same logging message to get printed twice (#850)

My config.toml and environment vars (be sure to redact API keys):

LLM Model name: ollama/llama2
LLM API key: ''
LLM Base URL: localhost:11434
LLM Embedding Model: llama2
local model URL: localhost:11434
workspace: ./workspace

Note: I use the real IP rather than localhost to work around communication problems between Win10 and WSL2.
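
For anyone reproducing the WSL2 setup, a hedged way to find the addresses involved (these commands assume a default WSL2 install and are not part of the original report):

# from inside WSL2: the Windows host's address on the WSL virtual switch
grep nameserver /etc/resolv.conf
# the WSL2 VM's own address, reachable from Windows
hostname -I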

My model and agent (you can see these settings in the UI):

  • Model: ollama/llama2
  • Agent: MonologueAgent

Commands I ran to install and run OpenDevin:

make setup-config
make start-backend
make start-frontend

Steps to Reproduce:
1. Set the config.
2. Start the backend, the frontend, and ollama serve.
3. Enter 'hello' in the frontend and send it.

Logs, error messages, and screenshots:
Traceback (most recent call last):
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 1437, in function_with_retries
response = original_function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 387, in _completion
raise e
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 335, in _completion
deployment = self.get_available_deployment(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 2443, in get_available_deployment
raise ValueError(f"No healthy deployment available, passed model={model}")
ValueError: No healthy deployment available, passed model=gpt-3.5-turbo-1106

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/user/OpenDevin/agenthub/monologue_agent/utils/monologue.py", line 70, in condense
resp = llm.completion(messages=messages)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/OpenDevin/opendevin/llm/llm.py", line 58, in wrapper
resp = completion_unwrapped(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 329, in completion
raise e
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 326, in completion
response = self.function_with_fallbacks(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 1420, in function_with_fallbacks
raise original_exception
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 1345, in function_with_fallbacks
response = self.function_with_retries(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 1497, in function_with_retries
raise e
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 1463, in function_with_retries
response = original_function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 387, in _completion
raise e
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 335, in _completion
deployment = self.get_available_deployment(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 2443, in get_available_deployment
raise ValueError(f"No healthy deployment available, passed model={model}")
ValueError: No healthy deployment available, passed model=gpt-3.5-turbo-1106

ERROR:
Error condensing thoughts: No healthy deployment available, passed model=gpt-3.5-turbo-1106


Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new
LiteLLM.Info: If you need to debug this error, use `litellm.set_verbose=True'.

Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new
LiteLLM.Info: If you need to debug this error, use `litellm.set_verbose=True'.

Traceback (most recent call last):
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/llms/openai.py", line 414, in completion
raise e
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/llms/openai.py", line 373, in completion
response = openai_client.chat.completions.create(**data, timeout=timeout) # type: ignore
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/openai/_utils/_utils.py", line 275, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/openai/resources/chat/completions.py", line 667, in create
return self._post(
^^^^^^^^^^^
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/openai/_base_client.py", line 1213, in post
return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/openai/_base_client.py", line 902, in request
return self._request(
^^^^^^^^^^^^^^
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/openai/_base_client.py", line 993, in _request
raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: 404 page not found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/main.py", line 997, in completion
raise e
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/main.py", line 970, in completion
response = openai_chat_completions.completion(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/llms/openai.py", line 420, in completion
raise OpenAIError(status_code=e.status_code, message=str(e))
litellm.llms.openai.OpenAIError: 404 page not found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/user/OpenDevin/agenthub/monologue_agent/utils/monologue.py", line 70, in condense
resp = llm.completion(messages=messages)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/OpenDevin/opendevin/llm/llm.py", line 58, in wrapper
resp = completion_unwrapped(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 329, in completion
raise e
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 326, in completion
response = self.function_with_fallbacks(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 1420, in function_with_fallbacks
raise original_exception
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 1345, in function_with_fallbacks
response = self.function_with_retries(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 1497, in function_with_retries
raise e
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 1463, in function_with_retries
response = original_function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 387, in _completion
raise e
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 370, in _completion
response = litellm.completion(
^^^^^^^^^^^^^^^^^^^
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/utils.py", line 2947, in wrapper
raise e
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/utils.py", line 2845, in wrapper
result = original_function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/main.py", line 2129, in completion
raise exception_type(
^^^^^^^^^^^^^^^
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/utils.py", line 8526, in exception_type
raise e
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/utils.py", line 7344, in exception_type
raise NotFoundError(
litellm.exceptions.NotFoundError: OpenAIException - 404 page not found

ERROR:
Error condensing thoughts: OpenAIException - 404 page not found

Additional Context

  • Using WSL2 on Win10
@SmartManoj
Contributor

SmartManoj commented Apr 12, 2024

ValueError: No healthy deployment available, passed model=gpt-3.5-turbo-1106

It seems LLM_MODEL is not configured correctly.

--

My config.toml and environment vars (be sure to redact API keys):

LLM Model

Are the underscores there in the key names?
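
In case it helps, a minimal sketch of the underscore-style entries this version expects, adapted from the settings quoted above (the explicit http:// scheme on LLM_BASE_URL is an assumption, not something confirmed in this thread):

LLM_MODEL="ollama/llama2"
LLM_API_KEY=""
LLM_BASE_URL="http://localhost:11434"
LLM_EMBEDDING_MODEL="llama2"
WORKSPACE_DIR="./workspace"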

@2868151647
Author

2868151647 commented Apr 12, 2024

@SmartManoj Sorry, that was just my description being unclear.
I checked config.toml and it is correct.
[screenshot: cut1]

@dproworld

I recommend putting a litellm proxy in front of the ollama server, as the implementation is buggy. Here is an example config:

LLM_API_KEY="ollama"
LLM_BASE_URL="http://localhost:4000"
LLM_MODEL="ollama/dolphin"
LLM_EMBEDDING_MODEL="llama"
WORKSPACE_DIR="./workspace"
MAX_ITERATIONS=100

with litellm server:

litellm --model ollama/dolphin --api_base http://localhost:11434
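
Once the proxy is up, a quick check that the /chat/completions route actually answers on port 4000 (a hedged sketch of an OpenAI-style request, not taken from the litellm docs):

curl http://localhost:4000/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "ollama/dolphin", "messages": [{"role": "user", "content": "hello"}]}'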

@menguzat

I did this but I keep getting

Oops. Something went wrong: Invalid \escape: line 2 column 18 (char 19)

on the front end and

ERROR:
Invalid \escape: line 2 column 18 (char 19)
Traceback (most recent call last):
File "/home/meng/OpenDevin/opendevin/controller/agent_controller.py", line 135, in step
action = self.agent.step(self.state)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/meng/OpenDevin/agenthub/planner_agent/agent.py", line 44, in step
action = parse_response(action_resp)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/meng/OpenDevin/agenthub/planner_agent/prompt.py", line 224, in parse_response
action_dict = json.loads(response)
^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/json/init.py", line 346, in loads
return _default_decoder.decode(s)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/json/decoder.py", line 353, in raw_decode
obj, end = self.scan_once(s, idx)
^^^^^^^^^^^^^^^^^^^^^^
json.decoder.JSONDecodeError: Invalid \escape: line 2 column 18 (char 19)

on the backend. What may be the issue?

LLM_API_KEY="ollama"
LLM_BASE_URL="http://localhost:4000"
LLM_MODEL="ollama/dolphin"
LLM_EMBEDDING_MODEL="llama"
WORKSPACE_DIR="./workspace"
MAX_ITERATIONS=100

with litellm server:

litellm --model ollama/dolphin --api_base http://localhost:11434

@2868151647
Author

2868151647 commented Apr 13, 2024

I don't think I need to proxy ollama serve; my requests are already getting responses.
I set up a network bridge between WSL2 and Win10 and changed the IP address so that they are on the same network segment.
I think we used different methods to achieve the same goal.

@SmartManoj
Contributor

@menguzat In /home/meng/OpenDevin/agenthub/planner_agent/agent.py, at line 43, add print(action_resp).
The error is due to the low quality of the model's output. Check out Gemini 1.5 Pro.
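
For what it's worth, a minimal Python sketch (not OpenDevin code) of why json.loads chokes here: a single unescaped backslash in the model's JSON output, e.g. a Windows-style path, is an invalid JSON escape sequence.

import json

# the model emitted a raw backslash inside a JSON string value
bad_response = '{\n  "contents": "C:\\Users\\demo"\n}'
try:
    json.loads(bad_response)
except json.JSONDecodeError as err:
    print(err)  # something like: Invalid \escape: line 2 column 18 (char 19)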

@menguzat

Hmmm... I was trying out mistral instruct.
I really want to use this with local llms so I can tinker with it without worrying about costs.
Any models to recommend?

@menguzat File "/home/meng/OpenDevin/agenthub/planner_agent/agent.py", line 43, add print(action_resp) Error due to low quality of the model. Check out Gemini 1.5 pro

@SmartManoj
Contributor

Gemini 1.5 pro is free until May 2?

@rbren
Collaborator

rbren commented Apr 21, 2024

I have generally seen this 404 error when the model is set to something unavailable.

@Aeonitis

Aeonitis commented Apr 22, 2024

@rbren Can you please share what you've done for llama3? In this discussion, you seem to state that it should work.

The settings in the client frontend at port 3000 only list ollama/llama2 and earlier versions, with gpt-3.5-turbo as the default; I can't tell yet where this list is retrieved from.

I also looked through the code and found no 'llama3' strings, only 'llama2'. The model name is generally needed in the requests, but it might be that the env variables handle that part for us...

It kind of surprises me that the OpenDevin client doesn't just reassure the user that the connection to the model server works as part of preparing for further input, e.g. by pinging the show endpoint http://localhost:11434/api/show of your ollama container with the request:

{
  "name": "llama2"
}

as shown here in the api docs. Again, you'd have to use llama3 instead if that applies. And again, the issue of llama3 not being listed in the frontend settings applies here (although I have a feeling the list is fetched from a remote URL, as there is a delay going from empty to a populated dropdown).

@SmartManoj maybe try the following to confirm connectivity?
Container-to-container ping:

docker exec -it <your-client-container> ping <your-ollama-container-name>

Or make a curl request from one container to the other:

docker exec -it <your-client-container> curl -X POST -H "Content-Type: application/json" -d '{
  "name": "llama2"
}' http://<your-ollama-container-name>:<port>/<endpoint>

There are two possibilities here that I can only guess at, since I don't have enough time to wade through the code at the moment:

  1. The requests are made from the browser client to ollama, not from within the OpenDevin server/container; in that case my advice doesn't apply, but you can just use Postman anyway.
  2. The requests are made from within the server/container, so my advice above does apply. Note that the hostname may not be localhost anymore: since your docker containers use the internal DNS, you would need to use the container name <your-ollama-container-name> instead of localhost, as shown above.

Here's the shell script I used. This was the only editing I had to do to run it (apart from the settings dropdown in the frontend UI), and I still got that NoneType request attribute error repeating itself forever:

export WORKSPACE_DIR=$(pwd)/workspace
docker run \
    --add-host host.docker.internal=host-gateway \
    -e LLM_API_KEY="11111111111111111111" \
    -e WORKSPACE_DIR="workspace" \
    -e LLM_BASE_URL="http://localhost:11434" \
    -e LLM_MODEL="ollama/llama2" \
    -e LLM_EMBEDDING_MODEL="llama2" \
    -e WORKSPACE_MOUNT_PATH=$WORKSPACE_DIR \
    -v $WORKSPACE_DIR:/opt/workspace_base \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -p 3000:3000 \
    ghcr.io/opendevin/opendevin:main
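
One thing worth double-checking (an assumption on my part, not verified here): with --add-host host.docker.internal=host-gateway, localhost inside the container still points at the container itself, so the base URL line would likely need to reference the host instead, e.g.:

    -e LLM_BASE_URL="http://host.docker.internal:11434" \

with the rest of the command unchanged.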

It's a shame I can't use OpenDevin yet, but I want to thank you all for your great work. I'm looking forward to being a future user someday.

@SmartManoj
Contributor

Did you check this for running it without Docker?

@Aeonitis

I saw it, but I'm not interested in working without Docker containers, thanks.

@SmartManoj
Contributor

SmartManoj commented Apr 22, 2024

Did you use this command?
docker exec -it opendevin python opendevin/main.py -d /workspace -t "write bash script to print 5"

@rbren
Collaborator

rbren commented Apr 23, 2024

@Aeonitis to be clear--I have not used llama3.

You can type any model you want into the UI, even if it doesn't auto-complete--setting ollama/llama3 (or whatever was passed to ollama pull) should do the trick
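
Concretely, something like this should work (a hedged example assuming a stock ollama install):

ollama pull llama3
ollama list    # note the exact tag, e.g. llama3:latest
# then enter "ollama/llama3" in the Model field of the UI settings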

@rbren
Collaborator

rbren commented Apr 23, 2024

It kinda surprises me that the OpenDevin client doesn't just reassure the user that the client-server has been secured for as a part of the prep for further input e.g. just ping the show endpoint

We're mostly trying to stay LLM/provider agnostic, but we do have this issue: #923

@mamoodi
Collaborator

mamoodi commented Jun 8, 2024

It seems multiple problems were reported in this one issue, but the author of the original issue found a solution. Please feel free to open a new issue!
