fix chat-style prompt templates (and Mistral OpenOrca) #1970
Conversation
What about allowing special tokens as-is in the template but not the input? That way, if someone adapts a prompt template from the original HF readme, things just work, but typing special tokens into the input box won't cause problems. That is, instead of actually filling the template with string templating and tokenizing that …
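A rough sketch of that approach, assuming a generic tokenizer callback (the names and signature here are illustrative, not GPT4All's actual API): split the template around `%1`, tokenize the template pieces with special-token parsing enabled, and tokenize the user input with it disabled, so special tokens typed into the input box (or pulled in via LocalDocs) stay inert.

```cpp
#include <cstdint>
#include <functional>
#include <string>
#include <vector>

using Token = int32_t;

// Tokenizer callback: 'parseSpecial' controls whether strings such as
// "<|im_start|>" map to their special token IDs or are tokenized as plain text.
using Tokenizer = std::function<std::vector<Token>(const std::string &, bool parseSpecial)>;

std::vector<Token> buildPrompt(const Tokenizer &tokenize,
                               const std::string &tmpl,
                               const std::string &userInput)
{
    // Split the template at the %1 placeholder.
    size_t pos = tmpl.find("%1");
    std::string before = (pos == std::string::npos) ? tmpl : tmpl.substr(0, pos);
    std::string after  = (pos == std::string::npos) ? std::string() : tmpl.substr(pos + 2);

    // Template text: special tokens are honored as-is.
    std::vector<Token> out = tokenize(before, /*parseSpecial=*/true);

    // User input: special tokens are treated as ordinary text.
    std::vector<Token> user = tokenize(userInput, /*parseSpecial=*/false);
    out.insert(out.end(), user.begin(), user.end());

    std::vector<Token> tail = tokenize(after, /*parseSpecial=*/true);
    out.insert(out.end(), tail.begin(), tail.end());
    return out;
}
```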
Implemented. This does seem like the most robust option that is still user-friendly. The one casualty here is the documented ability to override `self._format_chat_prompt_template` - this is now deprecated (detected by checking the identity of the method), because it causes the prompt and prompt template to be merged into a single string.
Force-pushed from 86f45f1 to 685c1b9.
I can confirm that `<|im_end|>` is indeed missing after the model's reply (as well as the newline that would follow it). I propose modifying the prompt template so it looks like the example below, and then modifying LLModel to understand this format.
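For reference, the template format this proposal ends up with for ChatML models (matching the models2.json diff later in the thread) renders as follows, with `%1` standing for the user's message and `%2` for the model's reply:

```
<|im_start|>user
%1<|im_end|>
<|im_start|>assistant
%2<|im_end|>
```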
I'm cool with this change - I assume this means everything up to `%2` is what we pass in to the model, and the bit after `%2` is treated as a stop-signal? I do wonder, if we are going to make a breaking change to the templating anyway, whether we could use more descriptive markers like …
In the implementation I have right now, the model's EOS token is still honored, but since GPT4All throws out the EOS token (as this would be the right thing to do when the prompt format does not contain …
I'm not intending to make a breaking change, just an extension to the format - if …
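A minimal sketch of how the `%2` extension could be interpreted (hypothetical helper, not the actual LLModel code): everything before `%2`, with `%1` filled in, is what gets fed to the model, and everything after `%2` is the suffix that should follow the model's reply when it is written back into the conversation history.

```cpp
#include <string>
#include <utility>

struct SplitTemplate {
    std::string prefix; // sent to the model before generation starts
    std::string suffix; // appended after the model's reply in history, e.g. "<|im_end|>\n"
};

// Fill %1 with the user input and split the template at %2. Templates without
// %2 keep the old behavior (everything is prefix, suffix stays empty).
static SplitTemplate splitPromptTemplate(std::string tmpl, const std::string &userInput)
{
    SplitTemplate out;
    if (size_t p1 = tmpl.find("%1"); p1 != std::string::npos)
        tmpl.replace(p1, 2, userInput);

    size_t p2 = tmpl.find("%2");
    if (p2 == std::string::npos) {
        out.prefix = std::move(tmpl);
    } else {
        out.prefix = tmpl.substr(0, p2);
        out.suffix = tmpl.substr(p2 + 2);
    }
    return out;
}
```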
@apage43 please R+ if you are good with this, and merge.
This reverts commit 31841d0.
Force-pushed from 0a82546 to 02499f6.
This shouldn't be true anymore anyway because we fixed the BOS getting dropped when we shift the context.
This PR is good enough for now - ChatML and similar templates work 100% correctly with these changes, plus my current local diff:

```diff
diff --git a/gpt4all-backend/llamamodel.cpp b/gpt4all-backend/llamamodel.cpp
index e8d2ccb..b3b619e 100644
--- a/gpt4all-backend/llamamodel.cpp
+++ b/gpt4all-backend/llamamodel.cpp
@@ -412,6 +412,10 @@ bool LLamaModel::evalTokens(PromptContext &ctx, const std::vector<int32_t> &toke
// llama_decode will output logits only for the last token of the prompt
batch.logits[batch.n_tokens - 1] = true;
+ std::cerr << "decode(n_past=" << ctx.n_past << "):\n";
+ for (int i = 0; i < batch.n_tokens; i++) {
+ std::cerr << "pos=" << ctx.n_past + i << " " << tokens[i] << " '" << llama_token_to_piece(d_ptr->ctx, tokens[i]) << "'\n";
+ }
int res = llama_decode(d_ptr->ctx, batch);
llama_batch_free(batch);
return res == 0;
diff --git a/gpt4all-chat/metadata/models2.json b/gpt4all-chat/metadata/models2.json
index 903e7ad..5e33ca0 100644
--- a/gpt4all-chat/metadata/models2.json
+++ b/gpt4all-chat/metadata/models2.json
@@ -12,7 +12,7 @@
"type": "Gemma",
"description": "<strong>A state-of-the-art open model from Google</strong><br><ul><li>Fast responses</li><li>Chat based model</li><li>Trained by Google</li><li>Licensed for commercial use</li><li>Gemma is provided under and subject to the Gemma Terms of Use found at <a href=\"https://ai.google.dev/gemma/terms\">ai.google.dev/gemma/terms</a></li></ul>",
"url": "https://gpt4all.io/models/gguf/gemma-7b-it.Q4_0.gguf",
- "promptTemplate": "<start_of_turn>user\n%1<end_of_turn>\n<start_of_turn>model\n",
+ "promptTemplate": "<start_of_turn>user\n%1<end_of_turn>\n<start_of_turn>model\n%2<end_of_turn>\n",
"systemPrompt": ""
},
{
@@ -28,7 +28,7 @@
"type": "Mistral",
"description": "<strong>Best overall fast chat model</strong><br><ul><li>Fast responses</li><li>Chat based model</li><li>Trained by Mistral AI<li>Finetuned on OpenOrca dataset curated via <a href=\"https://atlas.nomic.ai/\">Nomic Atlas</a><li>Licensed for commercial use</ul>",
"url": "https://gpt4all.io/models/gguf/mistral-7b-openorca.Q4_0.gguf",
- "promptTemplate": "<|im_start|>user\n%1<|im_end|>\n<|im_start|>assistant\n",
+ "promptTemplate": "<|im_start|>user\n%1<|im_end|>\n<|im_start|>assistant\n%2<|im_end|>\n",
"systemPrompt": "<|im_start|>system\nYou are MistralOrca, a large language model trained by Alignment Lab AI. For multi-step problems, write out your reasoning for each step.\n<|im_end|>"
},
{
@@ -152,7 +152,7 @@
"type": "MPT",
"description": "<strong>Good model with novel architecture</strong><br><ul><li>Fast responses<li>Chat based<li>Trained by Mosaic ML<li>Cannot be used commercially</ul>",
"url": "https://gpt4all.io/models/gguf/mpt-7b-chat-newbpe-q4_0.gguf",
- "promptTemplate": "<|im_start|>user\n%1<|im_end|>\n<|im_start|>assistant\n",
+ "promptTemplate": "<|im_start|>user\n%1<|im_end|>\n<|im_start|>assistant\n%2<|im_end|>\n",
"systemPrompt": "<|im_start|>system\n- You are a helpful assistant chatbot trained by MosaicML.\n- You answer questions.\n- You are excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.\n- You are more than just an information source, you are also able to write poetry, short stories, and make jokes.<|im_end|>"
},
{
diff --git a/gpt4all-chat/modellist.cpp b/gpt4all-chat/modellist.cpp
index 7d07e4c..bea4b85 100644
--- a/gpt4all-chat/modellist.cpp
+++ b/gpt4all-chat/modellist.cpp
@@ -7,7 +7,7 @@
#include <QStandardPaths>
#include <algorithm>
-//#define USE_LOCAL_MODELSJSON
+#define USE_LOCAL_MODELSJSON
#define DEFAULT_EMBEDDING_MODEL "all-MiniLM-L6-v2-f16.gguf"
#define NOMIC_EMBEDDING_MODEL "nomic-embed-text-v1.txt"
```
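For illustration, with the updated Mistral OpenOrca entry above, a single exchange would serialize roughly as follows (the user and assistant text are placeholders, and exactly how the system prompt is joined to the first turn depends on the chat code). The point of the `%2` change is that the `<|im_end|>` after the reply is now supplied by the template rather than depending on the model's EOS token surviving into the history.

```
<|im_start|>system
You are MistralOrca, a large language model trained by Alignment Lab AI. For multi-step problems, write out your reasoning for each step.
<|im_end|>
<|im_start|>user
(user message)<|im_end|>
<|im_start|>assistant
(model reply)<|im_end|>
```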
An example of the kind of broken output this PR attempts to fix: a literal `<|im_end|>` appearing in the model's replies.

PR #1935 switched Mistral OpenOrca to the official prompt template (ChatML), but users have been seeing `<|im_end|>` in the output since this change. There are two problems: the special tokens in the template were not being tokenized as actual special tokens, and `<|im_end|>` was missing after the assistant's replies in the conversation history.

Now the model is actually seeing tokens 32001 (`<|im_start|>`) and 32000 (`<|im_end|>`). (The before/after token output from llama_tokenize was attached to demonstrate this.)
One limitation of calling llama_tokenize with special=true is that we can have false-positive special tokens, e.g. if the user input (or LocalDocs context) contains the strings `<s>` or `</s>`, which will be interpreted as BOS and EOS, respectively. The proper fix for this is to be able to specify raw token IDs in the prompt template so we can use special=false, but I don't know what the UI for this would look like.

Also, I haven't checked what happens to the EOS token (`<|im_end|>`) in history - for ChatML to work properly, this must come after the assistant's responses.
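Purely as an illustration of the raw-token-ID idea (nothing like this is implemented in this PR), a template could be stored as a sequence of literal text pieces and explicit token IDs, so the literal pieces can be tokenized with special=false while the special tokens are injected directly:

```cpp
#include <cstdint>
#include <string>
#include <variant>
#include <vector>

using Token = int32_t;

// One piece of a prompt template: either literal text (tokenized with
// special-token parsing disabled) or an explicit special token ID.
using TemplatePiece = std::variant<std::string, Token>;

// Hypothetical ChatML user turn for Mistral OpenOrca; 32001 is <|im_start|>
// and 32000 is <|im_end|> (IDs taken from the PR description above).
static const std::vector<TemplatePiece> kUserTurnTemplate = {
    Token(32001), std::string("user\n"),
    std::string("%1"),                 // user input substituted (and tokenized) here
    Token(32000), std::string("\n"),
    Token(32001), std::string("assistant\n"),
};
```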