
Using llama2 locally, repeated '/chat/completions' requests return 404 from ollama serve #1052

Closed
2868151647 opened this issue Apr 12, 2024 · 16 comments
Labels: bug (Something isn't working), severity:low (Minor issues or affecting single user)

Comments

@2868151647

Describe the bug

I am using llama2. When I send 'hello' from the frontend, ollama serve shows repeated requests to '/api/embeddings' returning HTTP 200, mingled with requests to '/chat/completions' returning HTTP 404.
I also saw log output for 99 steps on the backend server.
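
For reference, the mismatch can be seen by hitting ollama serve directly (a hedged check assuming the default port 11434; these example commands are not taken from the logs below):

# ollama's native embeddings endpoint answers 200
curl http://localhost:11434/api/embeddings -d '{"model": "llama2", "prompt": "hello"}'
# ollama serves no bare /chat/completions route, so this returns "404 page not found"
curl http://localhost:11434/chat/completions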

Setup and configuration

Current version:

commit e9121b78fed0b5ef36718ca0bf59588c0b094b86 (HEAD -> main)
Author: Xingyao Wang <[email protected]>
Date:   Sun Apr 7 16:07:59 2024 +0800

use .getLogger to avoid same logging message to get printed twice (#850)

My config.toml and environment vars (be sure to redact API keys):

LLM Model name: ollama/llama2
LLM API key: ''
LLM Base URL: localhost:11434
LLM Embedding Model: llama2
local model URL: localhost:11434
workspace: ./workspace

Note: I use the real IP rather than localhost to work around communication problems between Win10 and WSL2.
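
For anyone reproducing the WSL2 setup, a hedged way to find the addresses involved (these commands assume a default WSL2 install and are not part of the original report):

# from inside WSL2: the Windows host's address on the WSL virtual switch
grep nameserver /etc/resolv.conf
# the WSL2 VM's own address, reachable from Windows
hostname -I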

My model and agent (you can see these settings in the UI):

  • Model: ollama/llama2
  • Agent: MonologueAgent

Commands I ran to install and run OpenDevin:

make setup-config
make start-backend
make start-frontend

Steps to Reproduce:
1. Set the config.
2. Start the backend, the frontend, and ollama serve.
3. Enter 'hello' in the frontend and send it.

Logs, error messages, and screenshots:
Traceback (most recent call last):
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 1437, in function_with_retries
response = original_function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 387, in _completion
raise e
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 335, in _completion
deployment = self.get_available_deployment(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 2443, in get_available_deployment
raise ValueError(f"No healthy deployment available, passed model={model}")
ValueError: No healthy deployment available, passed model=gpt-3.5-turbo-1106

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/user/OpenDevin/agenthub/monologue_agent/utils/monologue.py", line 70, in condense
resp = llm.completion(messages=messages)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/OpenDevin/opendevin/llm/llm.py", line 58, in wrapper
resp = completion_unwrapped(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 329, in completion
raise e
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 326, in completion
response = self.function_with_fallbacks(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 1420, in function_with_fallbacks
raise original_exception
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 1345, in function_with_fallbacks
response = self.function_with_retries(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 1497, in function_with_retries
raise e
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 1463, in function_with_retries
response = original_function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 387, in _completion
raise e
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 335, in _completion
deployment = self.get_available_deployment(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 2443, in get_available_deployment
raise ValueError(f"No healthy deployment available, passed model={model}")
ValueError: No healthy deployment available, passed model=gpt-3.5-turbo-1106

ERROR:
Error condensing thoughts: No healthy deployment available, passed model=gpt-3.5-turbo-1106


Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new
LiteLLM.Info: If you need to debug this error, use `litellm.set_verbose=True'.

Give Feedback / Get Help: https://github.com/BerriAI/litellm/issues/new
LiteLLM.Info: If you need to debug this error, use `litellm.set_verbose=True'.

Traceback (most recent call last):
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/llms/openai.py", line 414, in completion
raise e
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/llms/openai.py", line 373, in completion
response = openai_client.chat.completions.create(**data, timeout=timeout) # type: ignore
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/openai/_utils/_utils.py", line 275, in wrapper
return func(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/openai/resources/chat/completions.py", line 667, in create
return self._post(
^^^^^^^^^^^
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/openai/_base_client.py", line 1213, in post
return cast(ResponseT, self.request(cast_to, opts, stream=stream, stream_cls=stream_cls))
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/openai/_base_client.py", line 902, in request
return self._request(
^^^^^^^^^^^^^^
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/openai/_base_client.py", line 993, in _request
raise self._make_status_error_from_response(err.response) from None
openai.NotFoundError: 404 page not found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/main.py", line 997, in completion
raise e
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/main.py", line 970, in completion
response = openai_chat_completions.completion(
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/llms/openai.py", line 420, in completion
raise OpenAIError(status_code=e.status_code, message=str(e))
litellm.llms.openai.OpenAIError: 404 page not found

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
File "/home/user/OpenDevin/agenthub/monologue_agent/utils/monologue.py", line 70, in condense
resp = llm.completion(messages=messages)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/user/OpenDevin/opendevin/llm/llm.py", line 58, in wrapper
resp = completion_unwrapped(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 329, in completion
raise e
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 326, in completion
response = self.function_with_fallbacks(**kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 1420, in function_with_fallbacks
raise original_exception
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 1345, in function_with_fallbacks
response = self.function_with_retries(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 1497, in function_with_retries
raise e
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 1463, in function_with_retries
response = original_function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 387, in _completion
raise e
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/router.py", line 370, in _completion
response = litellm.completion(
^^^^^^^^^^^^^^^^^^^
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/utils.py", line 2947, in wrapper
raise e
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/utils.py", line 2845, in wrapper
result = original_function(*args, **kwargs)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/main.py", line 2129, in completion
raise exception_type(
^^^^^^^^^^^^^^^
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/utils.py", line 8526, in exception_type
raise e
File "/root/.cache/pypoetry/virtualenvs/opendevin-9K61RljA-py3.11/lib/python3.11/site-packages/litellm/utils.py", line 7344, in exception_type
raise NotFoundError(
litellm.exceptions.NotFoundError: OpenAIException - 404 page not found

ERROR:
Error condensing thoughts: OpenAIException - 404 page not found

Additional Context

  • Using WSL2 on Win10
@SmartManoj
Contributor

SmartManoj commented Apr 12, 2024

ValueError: No healthy deployment available, passed model=gpt-3.5-turbo-1106

It seems LLM_MODEL is not configured correctly.

--

My config.toml and environment vars (be sure to redact API keys):

LLM Model

Are the underscores there in the key names?
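
In case it helps, a minimal sketch of the underscore-style entries this version expects, adapted from the settings quoted above (the explicit http:// scheme on LLM_BASE_URL is an assumption, not something confirmed in this thread):

LLM_MODEL="ollama/llama2"
LLM_API_KEY=""
LLM_BASE_URL="http://localhost:11434"
LLM_EMBEDDING_MODEL="llama2"
WORKSPACE_DIR="./workspace"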

@2868151647
Author

2868151647 commented Apr 12, 2024

@SmartManoj Sorry, that was just my description being unclear.
I checked config.toml and it is correct.
[screenshot: cut1]

@dproworld

I recommend putting a litellm proxy in front of the ollama server, as the implementation is buggy. Here is an example config:

LLM_API_KEY="ollama"
LLM_BASE_URL="http://localhost:4000"
LLM_MODEL="ollama/dolphin"
LLM_EMBEDDING_MODEL="llama"
WORKSPACE_DIR="./workspace"
MAX_ITERATIONS=100

with litellm server:

litellm --model ollama/dolphin --api_base http://localhost:11434
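
Once the proxy is up, a quick check that the /chat/completions route actually answers on port 4000 (a hedged sketch of an OpenAI-style request, not taken from the litellm docs):

curl http://localhost:4000/chat/completions \
  -H "Content-Type: application/json" \
  -d '{"model": "ollama/dolphin", "messages": [{"role": "user", "content": "hello"}]}'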

@menguzat

I did this but I keep getting

Oops. Something went wrong: Invalid \escape: line 2 column 18 (char 19)

on the front end and

ERROR:
Invalid \escape: line 2 column 18 (char 19)
Traceback (most recent call last):
File "/home/meng/OpenDevin/opendevin/controller/agent_controller.py", line 135, in step
action = self.agent.step(self.state)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/meng/OpenDevin/agenthub/planner_agent/agent.py", line 44, in step
action = parse_response(action_resp)
^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/home/meng/OpenDevin/agenthub/planner_agent/prompt.py", line 224, in parse_response
action_dict = json.loads(response)
^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/json/init.py", line 346, in loads
return _default_decoder.decode(s)
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/json/decoder.py", line 337, in decode
obj, end = self.raw_decode(s, idx=_w(s, 0).end())
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "/usr/lib/python3.11/json/decoder.py", line 353, in raw_decode
obj, end = self.scan_once(s, idx)
^^^^^^^^^^^^^^^^^^^^^^
json.decoder.JSONDecodeError: Invalid \escape: line 2 column 18 (char 19)

on the backend. What may be the issue?

LLM_API_KEY="ollama"
LLM_BASE_URL="http://localhost:4000"
LLM_MODEL="ollama/dolphin"
LLM_EMBEDDING_MODEL="llama"
WORKSPACE_DIR="./workspace"
MAX_ITERATIONS=100

with litellm server:

litellm --model ollama/dolphin --api_base http://localhost:11434

@2868151647
Author

2868151647 commented Apr 13, 2024

I don't think I need to proxy ollama serve; my requests are already getting responses.
I set up a network bridge between WSL2 and Win10 and changed the IP address so that they are on the same network segment.
I think we used different methods to achieve the same goal.

@SmartManoj
Contributor

@menguzat In /home/meng/OpenDevin/agenthub/planner_agent/agent.py, at line 43, add print(action_resp).
The error is due to the low quality of the model's output. Check out Gemini 1.5 Pro.
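
For what it's worth, a minimal Python sketch (not OpenDevin code) of why json.loads chokes here: a single unescaped backslash in the model's JSON output, e.g. a Windows-style path, is an invalid JSON escape sequence.

import json

# the model emitted a raw backslash inside a JSON string value
bad_response = '{\n  "contents": "C:\\Users\\demo"\n}'
try:
    json.loads(bad_response)
except json.JSONDecodeError as err:
    print(err)  # something like: Invalid \escape: line 2 column 18 (char 19)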

@menguzat

Hmmm... I was trying out mistral instruct.
I really want to use this with local llms so I can tinker with it without worrying about costs.
Any models to recommend?

@menguzat File "/home/meng/OpenDevin/agenthub/planner_agent/agent.py", line 43, add print(action_resp) Error due to low quality of the model. Check out Gemini 1.5 pro

@SmartManoj
Contributor

Gemini 1.5 pro is free until May 2?

@rbren
Collaborator

rbren commented Apr 21, 2024

I have generally seen this 404 error when the model is set to something unavailable.

@Aeonitis

Aeonitis commented Apr 22, 2024

@rbren Can you please share what you've done for llama3? In this discussion, you seem to state that it should work.

The settings in the client frontend at port 3000 only list ollama/llama2 and earlier versions, with gpt-3.5-turbo as the default; I can't tell yet where this list is retrieved from.

I also looked through the code and found no 'llama3' strings, only 'llama2'. The model name is generally needed in the requests, but it might be that the env variables handle that part for us...

It kind of surprises me that the OpenDevin client doesn't just reassure the user that the connection to the model server works as part of preparing for further input, e.g. by pinging the show endpoint http://localhost:11434/api/show of your ollama container with the request:

{
  "name": "llama2"
}

as shown here in the api docs. Again, you'd have to use llama3 instead if that applies. And again, the issue of llama3 not being listed in the frontend settings applies here (although I have a feeling the list is fetched from a remote URL, as there is a delay going from empty to a populated dropdown).

@SmartManoj maybe try the following to confirm connectivity?
Container-to-container ping:

docker exec -it <your-client-container> ping <your-ollama-container-name>

Or make a curl request from one container to the other:

docker exec -it <your-client-container> curl -X POST -H "Content-Type: application/json" -d '{
  "name": "llama2"
}' http://<your-ollama-container-name>:<port>/<endpoint>

There are two possibilities here that I can only guess at, since I don't have enough time to wade through the code at the moment:

  1. The requests are made from the browser client to ollama, not from within the OpenDevin server/container; in that case my advice doesn't apply, but you can just use Postman anyway.
  2. The requests are made from within the server/container, so my advice above does apply. Note that the hostname may not be localhost anymore: since your docker containers use the internal DNS, you would need to use the container name <your-ollama-container-name> instead of localhost, as shown above.

Here's the shell script I used. This was the only editing I had to do to run it (apart from the settings dropdown in the frontend UI), and I still got that NoneType request attribute error repeating itself forever:

export WORKSPACE_DIR=$(pwd)/workspace
docker run \
    --add-host host.docker.internal=host-gateway \
    -e LLM_API_KEY="11111111111111111111" \
    -e WORKSPACE_DIR="workspace" \
    -e LLM_BASE_URL="http://localhost:11434" \
    -e LLM_MODEL="ollama/llama2" \
    -e LLM_EMBEDDING_MODEL="llama2" \
    -e WORKSPACE_MOUNT_PATH=$WORKSPACE_DIR \
    -v $WORKSPACE_DIR:/opt/workspace_base \
    -v /var/run/docker.sock:/var/run/docker.sock \
    -p 3000:3000 \
    ghcr.io/opendevin/opendevin:main
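
One thing worth double-checking (an assumption on my part, not verified here): with --add-host host.docker.internal=host-gateway, localhost inside the container still points at the container itself, so the base URL line would likely need to reference the host instead, e.g.:

    -e LLM_BASE_URL="http://host.docker.internal:11434" \

with the rest of the command unchanged.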

It's a shame I can't use OpenDevin yet, but I want to thank you all for your great work. I'm looking forward to being a future user someday.

@SmartManoj
Contributor

Did you check this for running it without Docker?

@Aeonitis

I saw it, but I'm not interested in working without Docker containers, thanks.

@SmartManoj
Contributor

SmartManoj commented Apr 22, 2024

Did you use this command?
docker exec -it opendevin python opendevin/main.py -d /workspace -t "write bash script to print 5"

@rbren
Collaborator

rbren commented Apr 23, 2024

@Aeonitis to be clear--I have not used llama3.

You can type any model you want into the UI, even if it doesn't auto-complete--setting ollama/llama3 (or whatever was passed to ollama pull) should do the trick
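
Concretely, something like this should work (a hedged example assuming a stock ollama install):

ollama pull llama3
ollama list    # note the exact tag, e.g. llama3:latest
# then enter "ollama/llama3" in the Model field of the UI settings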

@rbren
Collaborator

rbren commented Apr 23, 2024

It kinda surprises me that the OpenDevin client doesn't just reassure the user that the client-server has been secured for as a part of the prep for further input e.g. just ping the show endpoint

We're mostly trying to stay LLM/provider agnostic, but we do have this issue: #923

@mamoodi
Collaborator

mamoodi commented Jun 8, 2024

It seems multiple problems were reported in this one issue, but the author of the original issue found a solution. Please feel free to open a new issue!
