hey @ianbicking - thanks a lot for the feedback. I've merged a change to fix the links [1].
> The metadata tokens is a string [1]... that doesn't seem right. Request/response tokens generally need to be separated, as they are usually priced separately.
For the metadata you are right. Request and response tokens are billed separately and should be captured accordingly. I've put a PR to address that [2]
> It doesn't specify how the messages have to be arranged, if at all. But some providers force system/user/assistant/user... with user last. ...
We do assume that last message in the array to be from user. But we are not forcing it at the moment.
I've hit cross-LLM-compatibility errors in the past with message order, multiple system messages, and empty messages.
Multiple system messages are kind of a hack to invoke that distinct role in different positions, especially the last position. I.e., second to last message is what the user said, last message is a system message telling the LLM to REALLY FOLLOW THE INSTRUCTIONS and not get overly distracted by the user. (Though personally I usually rewrite the user message for that purpose.)
Multiple user messages in a row is likely caused by some failure in the system to produce an assistant response, like no network. You could ask the client to collapse those, but I think it's most correct to allow them. The user understands the two messages as distinct.
Multiple assistant messages, or no trailing user message, is a reasonable way to represent "please continue" without a message. These could also be collapsed, but that may or may not be accurate depending on how the messages are truncated.
This all gets even more complicated once tools are introduced.
(I also notice there's no max_tokens or stop reason. Both are pretty universal.)
These message order questions do open up a more meta question you might want to think about and decide on: is this a prescriptive spec that says how everyone _should_ behave, a descriptive spec that is roughly the outer bounds of what anyone (either user or provider) can expect... or a combination like prescriptive for the provider and descriptive for the user.
Yeah, I can completely see this, the goal of this was to be specifically for the messages object, and not a completions object, since in my experience, you usually send messages from front end to backend and then create the completion request with all the additional parameters when sending from backend to an LLM provider. So when just sending from an application to the server, trying to just capture the messages object seemed ideal. This was also designed to try and maximize cross compatibility, so it is not what the format "should be" instead, it is trying to be a format that everyone can adopt without disrupting current setups.
Huh, that's a different use case than I was imagining. I actually don't know why I'd want a standard API from a frontend and backend that I control.
In most applications where I make something chat-like (honestly a minority of my LLM use) I have application-specific data in the chat, and then I turn that into an LLM request only immediately before sending a completion request, using application-specific code.
Well, in the case of the front-end (like streamlit, gradio, etc) they send conversational messages in their own custom ways - this means I must develop against them each specifically, and that slows down any quick experimentation I would want to do as a developer. This is the client <> server interaction.
And then the conversational messages sent to the LLM are also somewhat unique to each provider. One improvement for simplicity purposes could be that we get a standard /chat/completions API for server <> LLM interaction and define a standard "messages" object in that API (vs the stand-alone messages object as defined in the OMF").
Perhaps that might be simpler, and easier to understand
> The metadata tokens is a string [1]... that doesn't seem right. Request/response tokens generally need to be separated, as they are usually priced separately.
For the metadata you are right. Request and response tokens are billed separately and should be captured accordingly. I've put a PR to address that [2]
> It doesn't specify how the messages have to be arranged, if at all. But some providers force system/user/assistant/user... with user last. ...
We do assume that last message in the array to be from user. But we are not forcing it at the moment.
[1] https://github.com/open-llm-initiative/open-message-format/p...
[2] https://github.com/open-llm-initiative/open-message-format/p...