Conversation

cebtenzzre (Member) commented Feb 15, 2024

Here is an example of the kind of broken output this PR attempts to fix:

[Screenshot: Mistral OpenOrca output containing a literal <|im_end|>]

PR #1935 switched Mistral OpenOrca to the official prompt template (ChatML). But users have been seeing <|im_end|> in the output since this change. There are two problems:

  • The current GGUF file was converted 5 months ago by TheBloke. It is out-of-date and does not contain the special tokens.
  • We were still using llama_tokenize with special=false. This was changed to true (which has some caveats).

Now the model is actually seeing tokens 32001 (<|im_start|>) and 32000 (<|im_end|>):

Token output from llama_tokenize

Before:

523:  <
28766: |
321: im
28730: _
2521: start
28766: |
28767: >
1838: user
13:

28708: a
28789: <
28766: |
321: im
28730: _
416: end
28766: |
3409: ><
28766: |
321: im
28730: _
2521: start
28766: |
28767: >
489: ass
11143: istant
13:

After:

Token debug:
32001: <|im_start|>
1838: user
13:

28708: a
32000: <|im_end|>
32001: <|im_start|>
489: ass
11143: istant
13:

One limitation of calling llama_tokenize with special=true is that we can have false-positive special tokens, e.g. if the user input (or LocalDocs context) contains the strings <s> or </s>, which will be interpreted as BOS and EOS, respectively. The proper fix for this is to be able to specify raw token IDs in the prompt template so we can use special=false, but I don't know what the UI for this would look like.
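For illustration, here is a minimal sketch of the tokenization call involved (not code from this PR), assuming the llama.cpp C API as it looked in early 2024, where the final boolean controls whether special tokens are parsed out of the text:

```cpp
#include <llama.h>

#include <algorithm>
#include <string>
#include <vector>

// Tokenize arbitrary text. With `special` = true, literal strings such as "<s>"
// or "</s>" in the text are mapped to the BOS/EOS special tokens; with it set
// to false they are split into ordinary pieces like "<", "/", "s", ">".
static std::vector<llama_token> tokenize(const llama_model *model,
                                         const std::string &text, bool special)
{
    std::vector<llama_token> tokens(text.size() + 8);
    int32_t n = llama_tokenize(model, text.c_str(), (int32_t)text.size(),
                               tokens.data(), (int32_t)tokens.size(),
                               /*add_bos*/ false, /*special*/ special);
    tokens.resize(std::max<int32_t>(n, 0));
    return tokens;
}
```

The false positives all come from the `special = true` path: any substring of the text that happens to match a special token's text is turned into that token's ID.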

Also, I haven't checked what happens to the EOS token (<|im_end|>) in history - for ChatML to work properly, this must come after the assistant's responses.

ThiloteE (Collaborator) commented:

LM Studio allows users to enter stop tokens in the GUI like this:

[Screenshot: LM Studio's stop-token setting in the GUI]

apage43 (Member) commented Feb 16, 2024

> The proper fix for this is to be able to specify raw token IDs in the prompt template so we can use special=false, but I don't know what the UI for this would look like.

What about allowing special tokens as-is in the template but not the input? That way if someone adapts a prompt template from the original HF readme things just work, but typing special tokens into the input box won't cause problems.

That is, instead of filling the template with string substitution and tokenizing the result:

"<special>%1</special>" -> "<special>user input here</special>" -> tokenize("<special>user input here</special>")

you would instead split the template at the marker and do:

concat(tokenize("<special>", allow_special=true), tokenize("user input here", allow_special=false), tokenize("</special>", allow_special=true))
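A rough sketch of that approach (not the final implementation), reusing the tokenize() helper from the earlier sketch and assuming the template contains exactly one %1 placeholder:

```cpp
#include <llama.h>

#include <string>
#include <vector>

// tokenize() as sketched earlier: wraps llama_tokenize with the given `special` flag.
std::vector<llama_token> tokenize(const llama_model *model, const std::string &text, bool special);

// Fill a template such as "<|im_start|>user\n%1<|im_end|>\n<|im_start|>assistant\n"
// at the token level: special tokens are honored in the template halves but not in
// the user's input, so typing "<|im_end|>" or "</s>" into the chat box cannot
// inject real special tokens.
std::vector<llama_token> buildPrompt(const llama_model *model,
                                     const std::string &tmpl,
                                     const std::string &userInput)
{
    auto pos  = tmpl.find("%1");                      // assumed present
    auto out  = tokenize(model, tmpl.substr(0, pos),  /*special*/ true);
    auto user = tokenize(model, userInput,            /*special*/ false);
    auto tail = tokenize(model, tmpl.substr(pos + 2), /*special*/ true);

    out.insert(out.end(), user.begin(), user.end());
    out.insert(out.end(), tail.begin(), tail.end());
    return out;
}
```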

cebtenzzre (Member, Author) commented:

> What about allowing special tokens as-is in the template but not the input? That way if someone adapts a prompt template from the original HF readme things just work, but typing special tokens into the input box won't cause problems.

Implemented. This does seem like the most robust option that is still user-friendly. The one casualty here is the documented ability to override self._format_chat_prompt_template - it is now deprecated (detected by checking the identity of the method) because overriding it causes the prompt and prompt template to be merged into a single string.

cebtenzzre (Member, Author) commented:

I can confirm that <|im_end|> is indeed missing after the model's reply (as well as the newline that would follow it):

pos=86 2526 ' looking'
decode(n_past=87):
pos=87 354 ' for'
decode(n_past=88):
pos=88 28723 '.'
decode(n_past=89):
pos=89 32000 '<|im_end|>'
<snip>
decode(n_past=89):
pos=89 32001 '<|im_start|>'
pos=90 1838 'user'
pos=91 13 '
'
pos=92 287 ' b'

I propose modifying the prompt template so it looks like:

<|im_start|>user
%1<|im_end|>
<|im_start|>assistant
%2<|im_end|>

And then modifying LLModel to understand this format.
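As a hypothetical illustration of that extension (the names below are made up for the sketch, not the actual LLModel API): the template is split at %2, the first half is filled with the user's input and sent to the model as before, and the second half is what gets appended to the chat history once the assistant's reply ends.

```cpp
#include <string>

// Hypothetical helper, not real LLModel code: split a chat-style template like
// "<|im_start|>user\n%1<|im_end|>\n<|im_start|>assistant\n%2<|im_end|>\n"
// into the part used to build the prompt and the suffix re-inserted after the
// assistant's response.
struct SplitTemplate {
    std::string promptPart; // everything up to (and excluding) %2
    std::string suffixPart; // everything after %2, e.g. "<|im_end|>\n"
};

SplitTemplate splitAtResponse(const std::string &tmpl)
{
    auto pos = tmpl.find("%2");
    if (pos == std::string::npos)
        return { tmpl, "" }; // old-style template: nothing appended after the reply
    return { tmpl.substr(0, pos), tmpl.substr(pos + 2) };
}
```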

Thoughts? @manyoso @apage43

apage43 (Member) commented Feb 20, 2024

> I propose modifying the prompt template so it looks like: [...]

I'm cool with this change - I assume this means everything up to %2 is what we pass in to the model and the bit after %2 is treated as a stop-signal?

I do wonder, if we are going to make a breaking change to the templating anyway, whether we could use more descriptive markers like {input} / {response} or similar (not a big deal if this makes things extra complicated, though)

cebtenzzre (Member, Author) commented:

> > I propose modifying the prompt template so it looks like: [...]

> I'm cool with this change - I assume this means everything up to %2 is what we pass in to the model and the bit after %2 is treated as a stop-signal?

In the implementation I have right now, the model's EOS token is still honored, but GPT4All throws out the EOS token (which is the right thing to do when the prompt format does not contain <|im_end|>), so we use the part after %2 to put the <|im_end|> and a newline back. (AFAIK we somehow don't already enforce a newline between the response and the next prompt?)

> I do wonder, if we are going to make a breaking change to the templating anyway, whether we could use more descriptive markers like {input} / {response} or similar (not a big deal if this makes things extra complicated, though)

I'm not intending to make a breaking change, just an extension to the format - if %2 is not seen, nothing is inserted into the chat history after the EOS token (which is fine for e.g. Alpaca-style prompting).

manyoso requested review from apage43 and removed the request for manyoso on February 21, 2024
manyoso (Collaborator) commented Feb 21, 2024

@apage43 please R+ if you are good with this and merge

ThiloteE (Collaborator) commented Feb 21, 2024

Tried this PR.

[Screenshots: the <|im_end|> pattern re-emerging in the output]

The pattern re-emerged at the 8th instruction/question I posed.

This reverts commit 31841d0.

This shouldn't be true anymore anyway because we fixed the BOS getting dropped when we shift the context.
cebtenzzre (Member, Author) commented:

This PR is good enough for now - ChatML and similar templates work 100% correctly with these changes, plus my current local diff:

diff --git a/gpt4all-backend/llamamodel.cpp b/gpt4all-backend/llamamodel.cpp
index e8d2ccb..b3b619e 100644
--- a/gpt4all-backend/llamamodel.cpp
+++ b/gpt4all-backend/llamamodel.cpp
@@ -412,6 +412,10 @@ bool LLamaModel::evalTokens(PromptContext &ctx, const std::vector<int32_t> &toke
     // llama_decode will output logits only for the last token of the prompt
     batch.logits[batch.n_tokens - 1] = true;
 
+    std::cerr << "decode(n_past=" << ctx.n_past << "):\n";
+    for (int i = 0; i < batch.n_tokens; i++) {
+        std::cerr << "pos=" << ctx.n_past + i << " " << tokens[i] << " '" << llama_token_to_piece(d_ptr->ctx, tokens[i]) << "'\n";
+    }
     int res = llama_decode(d_ptr->ctx, batch);
     llama_batch_free(batch);
     return res == 0;
diff --git a/gpt4all-chat/metadata/models2.json b/gpt4all-chat/metadata/models2.json
index 903e7ad..5e33ca0 100644
--- a/gpt4all-chat/metadata/models2.json
+++ b/gpt4all-chat/metadata/models2.json
@@ -12,7 +12,7 @@
     "type": "Gemma",
     "description": "<strong>A state-of-the-art open model from Google</strong><br><ul><li>Fast responses</li><li>Chat based model</li><li>Trained by Google</li><li>Licensed for commercial use</li><li>Gemma is provided under and subject to the Gemma Terms of Use found at <a href=\"https://ai.google.dev/gemma/terms\">ai.google.dev/gemma/terms</a></li></ul>",
     "url": "https://gpt4all.io/models/gguf/gemma-7b-it.Q4_0.gguf",
-    "promptTemplate": "<start_of_turn>user\n%1<end_of_turn>\n<start_of_turn>model\n",
+    "promptTemplate": "<start_of_turn>user\n%1<end_of_turn>\n<start_of_turn>model\n%2<end_of_turn>\n",
     "systemPrompt": ""
   },
   {
@@ -28,7 +28,7 @@
     "type": "Mistral",
     "description": "<strong>Best overall fast chat model</strong><br><ul><li>Fast responses</li><li>Chat based model</li><li>Trained by Mistral AI<li>Finetuned on OpenOrca dataset curated via <a href=\"https://atlas.nomic.ai/\">Nomic Atlas</a><li>Licensed for commercial use</ul>",
     "url": "https://gpt4all.io/models/gguf/mistral-7b-openorca.Q4_0.gguf",
-    "promptTemplate": "<|im_start|>user\n%1<|im_end|>\n<|im_start|>assistant\n",
+    "promptTemplate": "<|im_start|>user\n%1<|im_end|>\n<|im_start|>assistant\n%2<|im_end|>\n",
     "systemPrompt": "<|im_start|>system\nYou are MistralOrca, a large language model trained by Alignment Lab AI. For multi-step problems, write out your reasoning for each step.\n<|im_end|>"
   },
   {
@@ -152,7 +152,7 @@
     "type": "MPT",
     "description": "<strong>Good model with novel architecture</strong><br><ul><li>Fast responses<li>Chat based<li>Trained by Mosaic ML<li>Cannot be used commercially</ul>",
     "url": "https://gpt4all.io/models/gguf/mpt-7b-chat-newbpe-q4_0.gguf",
-    "promptTemplate": "<|im_start|>user\n%1<|im_end|>\n<|im_start|>assistant\n",
+    "promptTemplate": "<|im_start|>user\n%1<|im_end|>\n<|im_start|>assistant\n%2<|im_end|>\n",
     "systemPrompt": "<|im_start|>system\n- You are a helpful assistant chatbot trained by MosaicML.\n- You answer questions.\n- You are excited to be able to help the user, but will refuse to do anything that could be considered harmful to the user.\n- You are more than just an information source, you are also able to write poetry, short stories, and make jokes.<|im_end|>"
   },
   {
diff --git a/gpt4all-chat/modellist.cpp b/gpt4all-chat/modellist.cpp
index 7d07e4c..bea4b85 100644
--- a/gpt4all-chat/modellist.cpp
+++ b/gpt4all-chat/modellist.cpp
@@ -7,7 +7,7 @@
 #include <QStandardPaths>
 #include <algorithm>
 
-//#define USE_LOCAL_MODELSJSON
+#define USE_LOCAL_MODELSJSON
 
 #define DEFAULT_EMBEDDING_MODEL "all-MiniLM-L6-v2-f16.gguf"
 #define NOMIC_EMBEDDING_MODEL "nomic-embed-text-v1.txt"

cebtenzzre merged commit 4fc4d94 into main on Feb 21, 2024
cebtenzzre changed the title from "Fix incorrect prompting of Mistral OpenOrca" to "fix chat-style prompt templates (and Mistral OpenOrca)" on Feb 21, 2024
cebtenzzre added a commit that referenced this pull request on Jul 1, 2024:
This fixes a regression in commit 4fc4d94 ("fix chat-style prompt templates (#1970)"), which moved some return statements into a new function (LLModel::decodePrompt) without making them return from the parent as well.
cebtenzzre deleted the chatml-fix branch on February 10, 2025