Conversation

@seun-ja seun-ja commented Aug 20, 2025

This PR aims to resolve #3211. The remote LLM implementation now provides two options: the default and OpenAI.

The default preserves the current behaviour and, as the name suggests, is used when no option is provided.

The design is extensible while preserving the original API and not breaking it.

One concern: some wasi::llm parameters are not compatible with OpenAI's API documentation, which may necessitate adjustments to the current definitions in the wit file. Documented here

[Update]

In reference to @rylev's comment, there is no need to change the wit file; the PR will just use what's already available.

Signed-off-by: Aminu Oluwaseun Joshua <seun.aminujoshua@gmail.com>
@itowlson (Collaborator) commented:
It would be useful to see an example of how a user selects this back-end in their runtime config. It might also be useful to add some discussion to the PR of the tradeoffs you considered when bundling this in with the existing cloud-gpu remote back-end vs. having it as a separate (typed) back end.

@seun-ja seun-ja marked this pull request as ready for review August 25, 2025 20:41
@seun-ja (Contributor, Author) commented Aug 26, 2025

The PR bundles the OpenAI API backend into the existing llm-remote-http crate because:

  • The two backends share many common resources, which avoids unnecessary cyclic-import issues
  • Integrating with the existing crate won't break the current set-up
  • It keeps the current streamlined options, Spin (local) and RemoteHttp, while making the latter customizable and expandable via the optional field added to the runtime config

@itowlson, I just realised that in my PR's main comment I said "breaking it"; apologies, I meant "not breaking it" 😄
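To illustrate the selection @itowlson asked about, a runtime-config entry might look like the sketch below. The field names here are assumptions for illustration, not the PR's final schema:

```toml
# Hypothetical runtime-config sketch: the OpenAI-compatible backend is
# selected on the existing RemoteHttp option via an optional field.
[llm_compute]
type = "remote_http"
url = "https://api.openai.com"
auth_token = "sk-..."
custom_llm = "open-ai"   # omit this field to keep the default backend
```

Omitting `custom_llm` keeps today's behaviour, so existing configs continue to work unchanged.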

@@ -132,6 +133,7 @@ impl LlmCompute {
pub struct RemoteHttpCompute {
url: Url,
auth_token: String,
custom_llm: Option<String>,
Reviewer (Contributor) commented:

I think we want this to be an enum which would allow us to constrain it to only the options implemented.
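For illustration, such an enum might be sketched as follows. Names here (`CustomLlm`, the accepted string values) are assumptions, not the PR's actual code:

```rust
// Hypothetical sketch: constraining custom_llm to an enum so that only
// implemented backends parse, instead of accepting any Option<String>.
use std::str::FromStr;

#[derive(Debug, Clone, PartialEq)]
pub enum CustomLlm {
    /// The existing default remote backend.
    Default,
    /// The OpenAI-compatible backend.
    OpenAi,
}

impl FromStr for CustomLlm {
    type Err = String;
    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "default" => Ok(CustomLlm::Default),
            "open-ai" | "openai" => Ok(CustomLlm::OpenAi),
            other => Err(format!("unsupported custom_llm value: {other}")),
        }
    }
}

fn main() {
    // Supported values parse; anything else is rejected at config time.
    assert_eq!("open-ai".parse::<CustomLlm>(), Ok(CustomLlm::OpenAi));
    assert!("llama".parse::<CustomLlm>().is_err());
    println!("ok");
}
```

Parsing into an enum moves the "unsupported backend" error to config-load time rather than request time.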


tracing::info!("Sending remote inference request to {chat_url}");

let body = CreateChatCompletionRequest {
Reviewer (Contributor) commented:

I think we should use the generate API instead of chat completions here, because our interface does not really lend itself to multiple roles; the wit interface aligns more closely with the generate API.

@seun-ja (Contributor, Author) replied:

I have looked into this option; the endpoint is /v1/responses. While it is more compatible with the current wit interface, I noticed some unusual behaviour while testing it.

For example, it didn't work with gpt-4; I got this error:

"error": {
    "message": "The model `gpt-4` does not exist or you do not have access to it.",
    "type": "invalid_request_error",
    "param": null,
    "code": "model_not_found"
 }

But when I tried it with gpt-4o, it worked; I got a different error message, and rightly so 🫣

"error": {
    "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.",
    "type": "insufficient_quota",
    "param": null,
    "code": "insufficient_quota"
  }

I also tried Ollama with the gpt-oss:20b model and got this error:

404 page not found

Digging deeper, I realised Ollama doesn't have that endpoint. Ollama does have the /generate endpoint, but it isn't compatible with what we need here; it returns this:

{
    "model": "gpt-oss:20b",
    "created_at": "2025-08-28T21:17:25.432385Z",
    "response": "",
    "done": true,
    "done_reason": "load"
}

In conclusion, I think the current setup is fine: the User role matches the expected behaviour, similar to what the default remote LLM does.
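For illustration, the chat-completions request body with a single user-role message would look roughly like this (a sketch following OpenAI's chat completions API; the model name and parameter values are placeholders):

```json
{
  "model": "gpt-4o",
  "messages": [
    { "role": "user", "content": "Tell me a joke" }
  ],
  "max_tokens": 100,
  "temperature": 0.8
}
```

Since every wit inference prompt maps to exactly one user message, the multi-role capability of chat completions is simply left unused.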

CustomLlm is now parsed directly, which catches unsupported CustomLlm values. This change also introduces a new trait, LlmWorker, which every LLM engine implements.
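A trait of this shape might be sketched as below. The method names and signatures are assumptions for illustration (the PR's real trait is presumably async and HTTP-backed); the toy engine only shows the dispatch pattern:

```rust
// Hypothetical sketch of an LlmWorker-style trait that every LLM
// engine (default remote, OpenAI-compatible, ...) would implement.
pub trait LlmWorker {
    /// Human-readable name of the backend.
    fn name(&self) -> &'static str;
    /// Run an inference request and return the generated text.
    fn infer(&self, model: &str, prompt: &str) -> Result<String, String>;
}

/// Toy engine standing in for a real HTTP-backed implementation.
struct EchoEngine;

impl LlmWorker for EchoEngine {
    fn name(&self) -> &'static str {
        "echo"
    }
    fn infer(&self, model: &str, prompt: &str) -> Result<String, String> {
        // A real engine would build and send an HTTP request here.
        Ok(format!("[{model}] {prompt}"))
    }
}

fn main() {
    // Host code holds a trait object, so it stays backend-agnostic:
    // the CustomLlm config value decides which engine gets boxed.
    let engine: Box<dyn LlmWorker> = Box::new(EchoEngine);
    let out = engine.infer("gpt-4o", "hello").unwrap();
    assert_eq!(out, "[gpt-4o] hello");
    println!("{} ok", engine.name());
}
```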

Successfully merging this pull request may close these issues.

Implement OpenAI API LLM backend