Implement open ai api llm backend #3238
Conversation
Signed-off-by: Aminu Oluwaseun Joshua <seun.aminujoshua@gmail.com>
It would be useful to see an example of how a user selects this backend in their runtime config. It might also be useful to add some discussion to the PR of the tradeoffs you considered when bundling this in with the existing
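As a hypothetical sketch of what that selection could look like in `runtime-config.toml` (the `custom_llm` field name comes from this PR's diff, but the `open_ai` value and the overall section shape are assumptions, not confirmed syntax):

```toml
# Sketch only: assumes Spin's [llm_compute] runtime-config section
# plus the new custom_llm field added in this PR's diff.
[llm_compute]
type = "remote_http"
url = "https://api.openai.com"
auth_token = "sk-..."     # API key for the remote service
custom_llm = "open_ai"    # assumed value; selects the OpenAI-style backend
```

Omitting `custom_llm` would presumably fall back to the existing default remote behaviour.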
The PR goes with bundling the OpenAI API backend into the existing
@itowlson, I just realised that in my PR's main comment I said "breaking it"; apologies, I meant "not breaking it" 😄
crates/factor-llm/src/spin.rs (Outdated)

```diff
@@ -132,6 +133,7 @@ impl LlmCompute {
 pub struct RemoteHttpCompute {
     url: Url,
     auth_token: String,
+    custom_llm: Option<String>,
```
I think we want this to be an enum, which would allow us to constrain it to only the options implemented.
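A minimal sketch of that suggestion, assuming illustrative names (the PR's actual identifiers and parsing path may differ): replace the free-form `Option<String>` with an enum so unsupported values are rejected at parse time.

```rust
// Hypothetical sketch: constrain custom_llm to known variants.
// Names are illustrative, not the PR's confirmed identifiers.
#[derive(Debug, PartialEq)]
pub enum CustomLlm {
    OpenAi,
}

impl std::str::FromStr for CustomLlm {
    type Err = String;

    fn from_str(s: &str) -> Result<Self, Self::Err> {
        match s {
            "open_ai" => Ok(CustomLlm::OpenAi),
            other => Err(format!("unsupported custom_llm backend: {other}")),
        }
    }
}
```

With this, a misconfigured backend name surfaces as a configuration error instead of being silently carried around as a string.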
```rust
tracing::info!("Sending remote inference request to {chat_url}");
```

```rust
let body = CreateChatCompletionRequest {
```
I think we should use the generate API instead of chat completions here, because our interface does not really lend itself to having multiple roles. The wit interface aligns more closely with the generate API than with chat completions.
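For context, the wit interface in question is roughly a single-shot inference call, which is why it maps more naturally to one prompt than to a multi-role conversation. This is an approximate sketch from memory, not the exact definition in the repo:

```wit
// Approximate shape of the llm interface (illustrative, not verbatim).
interface llm {
  infer: func(
    model: string,
    prompt: string,
    params: option<inferencing-params>
  ) -> result<inferencing-result, error>;
}
```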
I have looked into this option; the endpoint is `/v1/responses`. While it is more compatible with the current wit interface, I noticed unusual behaviour with this endpoint while testing it. For example, it didn't work with `gpt-4`; I got this error:
```json
{
  "error": {
    "message": "The model `gpt-4` does not exist or you do not have access to it.",
    "type": "invalid_request_error",
    "param": null,
    "code": "model_not_found"
  }
}
```
But when I tried it with `gpt-4o`, it worked; I just got a different error message, and rightly so 🫣:
```json
{
  "error": {
    "message": "You exceeded your current quota, please check your plan and billing details. For more information on this error, read the docs: https://platform.openai.com/docs/guides/error-codes/api-errors.",
    "type": "insufficient_quota",
    "param": null,
    "code": "insufficient_quota"
  }
}
```
I tried with Ollama using the `gpt-oss:20b` model and got this error: `404 page not found`. Digging deeper, I realised Ollama doesn't have that endpoint. Ollama does have a `/generate` endpoint, but it is not compatible with what we need here; it returns this:
```json
{
  "model": "gpt-oss:20b",
  "created_at": "2025-08-28T21:17:25.432385Z",
  "response": "",
  "done": true,
  "done_reason": "load"
}
```
In conclusion, I think the current setup is fine: the `User` role matches the expected behaviour, similar to what the default remote LLM does.
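As a sketch of what that means in practice, the chat-completions request body would wrap the single prompt in one user-role message (model name and prompt here are illustrative, not taken from the PR):

```json
{
  "model": "gpt-4o",
  "messages": [
    { "role": "user", "content": "the prompt passed to the llm infer call" }
  ]
}
```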
CustomLlm is now parsed directly, which catches unsupported CustomLlm values. Also introduces a new trait, LlmWorker, which every LLM engine implements.
Signed-off-by: Aminu Oluwaseun Joshua <seun.aminujoshua@gmail.com>
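A minimal sketch of the LlmWorker idea described in that commit message, under the assumption that each engine (the default remote HTTP backend, the OpenAI backend, etc.) implements one shared trait. The signature and the `EchoEngine` implementor are illustrative; the PR's actual trait is defined over the factor-llm types and is likely async.

```rust
// Hypothetical sketch (illustrative names and signature):
// one trait that every LLM engine implements.
trait LlmWorker {
    fn infer(&self, prompt: &str) -> Result<String, String>;
}

// A toy engine standing in for a real backend implementation.
struct EchoEngine;

impl LlmWorker for EchoEngine {
    fn infer(&self, prompt: &str) -> Result<String, String> {
        Ok(format!("echo: {prompt}"))
    }
}
```

The caller can then hold a `Box<dyn LlmWorker>` chosen from the parsed config, without branching on the backend at each call site.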
This PR aims to resolve #3211. The remote LLM implementation now provides two options: the default and OpenAI.
The default represents the current behaviour and, as the name states, is used when no option is provided.
The design is extensible while preserving the original API and not breaking it.
There were concerns that some `wasi::llm` parameters are not compatible with OpenAI's API documentation, which might have necessitated adjustments to the current definitions in the wit file.
[Update] In reference to @rylev's comment, there is no need to change the wit file; the implementation will just use what's available.