Conversation

@seun-ja seun-ja commented Aug 20, 2025

This PR aims to resolve #3211. The remote LLM implementation now provides two options: the default and OpenAI.

The default preserves the current behaviour and, as the name suggests, is used when no option is provided.

The design is extensible while preserving the original API and not breaking it.

One concern: some wasi::llm parameters are not compatible with OpenAI's API documentation, which may necessitate adjustments to the current definitions in the wit file. Documented here

[Update]

In reference to @rylev's comment, there is no need to change the wit file; the PR will just use what's already available.

Signed-off-by: Aminu Oluwaseun Joshua <seun.aminujoshua@gmail.com>
@itowlson (Collaborator) commented:
It would be useful to see an example of how a user selects this back-end in their runtime config. It might also be useful to add some discussion to the PR of the tradeoffs you considered when bundling this in with the existing cloud-gpu remote back-end vs. having it as a separate (typed) back end.

@seun-ja seun-ja marked this pull request as ready for review August 25, 2025 20:41
@seun-ja (Contributor, Author) commented Aug 26, 2025

The PR bundles the OpenAI API backend into the existing llm-remote-http crate because:

  • The two backends share many common resources, which avoids unnecessary cyclic-import issues
  • Integrating with the existing crate won't break the current set-up
  • It keeps the current streamlined options, Spin (local) and RemoteHttp, while making the latter customizable and expandable via the optional field added to the runtime config

@itowlson, I just realised that in my PR's main comment I said "breaking it"; apologies, I meant "not breaking it" 😄
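To illustrate the selection @itowlson asked about, a runtime-config entry might look like the sketch below. The field names here are assumptions for illustration, not the PR's final schema:

```toml
# Hypothetical runtime-config sketch: the OpenAI-compatible backend is
# selected on the existing RemoteHttp option via an optional field.
[llm_compute]
type = "remote_http"
url = "https://api.openai.com"
auth_token = "sk-..."
custom_llm = "open-ai"   # omit this field to keep the default backend
```

Omitting `custom_llm` keeps today's behaviour, so existing configs continue to work unchanged.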

@@ -132,6 +133,7 @@ impl LlmCompute {
pub struct RemoteHttpCompute {
url: Url,
auth_token: String,
custom_llm: Option<String>,
Reviewer (Contributor) commented:

I think we want this to be an enum which would allow us to constrain it to only the options implemented.
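For illustration, such an enum might be sketched as follows. Names here (`CustomLlm`, the accepted string values) are assumptions, not the PR's actual code:

```rust
// Hypothetical sketch: constraining custom_llm to an enum so that only
// implemented backends parse, instead of accepting any Option<String>.
use std::str::FromStr;

#[derive(Debug, Clone, PartialEq)]
pub enum CustomLlm {
    /// The existing default remote backend.
    Default,
    /// The OpenAI-compatible backend.
    OpenAi,
}

impl FromStr for CustomLlm {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "default" => Ok(CustomLlm::Default),
            "open-ai" | "openai" => Ok(CustomLlm::OpenAi),
            other => Err(format!("unsupported custom_llm value: {other}")),
        }
    }
}

fn main() {
    // Supported values parse; anything else is rejected at config time.
    assert_eq!("open-ai".parse::<CustomLlm>(), Ok(CustomLlm::OpenAi));
    assert!("llama".parse::<CustomLlm>().is_err());
    println!("ok");
}
```

Parsing into an enum moves the "unsupported backend" error to config-load time rather than request time.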


tracing::info!("Sending remote inference request to {chat_url}");

let body = CreateChatCompletionRequest {
Reviewer (Contributor) commented:

I think we should use the generate API instead of chat completions here, because our interface does not really lend itself to multiple roles; the wit interface aligns more closely with the generate API.

@seun-ja (Contributor, Author) replied:

I have looked into this option; the endpoint is /v1/responses. While it is more compatible with the current wit interface, I noticed some unusual behaviour while testing it.

For example, it didn't work with gpt-4; I got this error:

"error": {
    "message": "The model `gpt-4` does not exist or you do not have access to it.",
    "type": "invalid_request_error",
    "param": null,
    "code": "model_not_found"
 }

But when I tried it with gpt-4o, it worked; I got a different error message, and rightly so 🫣

"error": {
    "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.",
    "type": "insufficient_quota",
    "param": null,
    "code": "insufficient_quota"
  }

I also tried Ollama with the gpt-oss:20b model and got this error:

404 page not found

Digging deeper, I realised Ollama doesn't have that endpoint. Ollama does have the /generate endpoint, but it isn't compatible with what we need here; it returns this:

{
    "model": "gpt-oss:20b",
    "created_at": "2025-08-28T21:17:25.432385Z",
    "response": "",
    "done": true,
    "done_reason": "load"
}

In conclusion, I think the current setup is fine: the User role matches the expected behaviour, similar to what the default remote LLM does.
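For illustration, the chat-completions request body with a single user-role message would look roughly like this (a sketch following OpenAI's chat completions API; the model name and parameter values are placeholders):

```json
{
  "model": "gpt-4o",
  "messages": [
    { "role": "user", "content": "Tell me a joke" }
  ],
  "max_tokens": 100,
  "temperature": 0.8
}
```

Since every wit inference prompt maps to exactly one user message, the multi-role capability of chat completions is simply left unused.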

CustomLlm is now parsed directly, which catches unsupported CustomLlm values. This change also introduces a new trait, LlmWorker, which every LLM engine implements.
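A trait of this shape might be sketched as below. The method names and signatures are assumptions for illustration (the PR's real trait is presumably async and HTTP-backed); the toy engine only shows the dispatch pattern:

```rust
// Hypothetical sketch of an LlmWorker-style trait that every LLM
// engine (default remote, OpenAI-compatible, ...) would implement.
pub trait LlmWorker {
    /// Human-readable name of the backend.
    fn name(&self) -> &'static str;
    /// Run an inference request and return the generated text.
    fn infer(&self, model: &str, prompt: &str) -> Result<String, String>;
}

/// Toy engine standing in for a real HTTP-backed implementation.
struct EchoEngine;

impl LlmWorker for EchoEngine {
    fn name(&self) -> &'static str {
        "echo"
    }
    fn infer(&self, model: &str, prompt: &str) -> Result<String, String> {
        // A real engine would build and send an HTTP request here.
        Ok(format!("[{model}] {prompt}"))
    }
}

fn main() {
    // Host code holds a trait object, so it stays backend-agnostic:
    // the CustomLlm config value decides which engine gets boxed.
    let engine: Box<dyn LlmWorker> = Box::new(EchoEngine);
    let out = engine.infer("gpt-4o", "hello").unwrap();
    assert_eq!(out, "[gpt-4o] hello");
    println!("{} ok", engine.name());
}
```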

Successfully merging this pull request may close these issues.

Implement OpenAI API LLM backend