Course 1 - ChatGPT Prompt Engineering for Developers
Guidelines for Prompting
Clear and Specific Instructions: Writing clear and specific instructions is vital for getting the model to understand the task and generate accurate outputs. Some tactics include:
● Using delimiters to clearly indicate distinct parts of the input.
● Asking for a structured output to make the response easier to process.
● Asking the model to check if conditions are satisfied.
● Implementing few-shot prompting to provide examples and guide the model's output.
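For example, the first two tactics can be combined in a single call. A minimal sketch, assuming a hypothetical get_completion(prompt) helper like the one sketched under Course 2 below; the sample text and JSON keys are illustrative:

    # Delimiters (triple backticks) mark off the text to operate on, and the
    # prompt requests structured JSON output that is easy to process downstream.
    text = "Some user-provided text to work on goes here."
    prompt = f"""
    Summarize the text delimited by triple backticks, and respond as a JSON
    object with the keys "summary" and "topics".
    ```{text}```
    """
    print(get_completion(prompt))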
Giving the Model Time to Think: This principle focuses on allowing the model time to work through the problem before coming to a conclusion. Tactics to achieve this include:
● Specifying the steps required to complete a task.
● Asking for the output in a specified format.
● Instructing the model to work out its own solution before jumping to a conclusion.
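A sketch of the first two tactics above, again assuming the hypothetical get_completion helper; the steps follow the course's summarize-translate-list pattern:

    # Spelling out the steps and fixing the output format gives the model a
    # procedure to follow instead of letting it jump straight to an answer.
    text = "A short story about Jack and Jill goes here."
    prompt = f"""
    Perform the following actions on the text delimited by <>:
    1 - Summarize the text in one sentence.
    2 - Translate the summary into French.
    3 - List each name that appears in the French summary.
    Use the following format:
    Summary: <summary>
    Translation: <translation>
    Names: <comma-separated list of names>
    <{text}>
    """
    print(get_completion(prompt))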
Limitation of LLMs - Hallucination: One of the biggest limitations of LLMs is hallucination, where the model confidently declares something false as a fact. Keeping the instructions clear and well-structured can help minimize this.
Iterative Prompt Development
We can iteratively analyze and refine prompts to generate appropriate outputs.
● Suppose the text is too long - we can limit the number of words/sentences/characters.
● Suppose the text focuses on the wrong details - we can ask it to focus on the aspects that are relevant to the intended audience.
● Suppose the description needs a table of dimensions - we can ask it to extract the information and organize it in a table, as in the sketch below.
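A compressed sketch of all three refinements applied to a product-description task, with the same hypothetical helper; fact_sheet stands in for the product's technical specification text:

    # Each added instruction fixes one failure mode found while iterating:
    # too long -> word limit; wrong details -> audience focus;
    # missing structure -> ask for a table of dimensions.
    fact_sheet = "Technical specifications for an office chair go here."
    prompt = f"""
    Write a product description based on the fact sheet delimited by ```.
    Use at most 50 words.
    Focus on the materials, since the audience is furniture retailers.
    After the description, include a table of the product's dimensions.
    ```{fact_sheet}```
    """
    print(get_completion(prompt))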
Summarizing
The goal of summarization is to condense a larger body of text while focusing on specific topics of interest.
1) Summarizing with a word/sentence/character limit (50 words or 1 paragraph)
2) Summarizing with a focus on a specific topic (price/description/quality/material of the item)
Sometimes, summaries include topics that are unrelated to the topic of focus; using "extract" instead of "summarize" often yields better results, keeping the output focused on the essential points.
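A sketch of both variants and the "extract" phrasing, assuming the hypothetical helper; the review text is a stand-in:

    review = "Got this panda plush toy for my daughter's birthday. It's soft and cute, but a bit small for the price."
    # 1) Limit the summary's length.
    print(get_completion(f"Summarize the review below in at most 30 words.\nReview: ```{review}```"))
    # 2) Focus the summary on a specific topic (here: price and value).
    print(get_completion(f"Summarize the review below in at most 30 words, focusing on price and value.\nReview: ```{review}```"))
    # 3) "Extract" keeps only the requested information instead of a general summary.
    print(get_completion(f"Extract the information relevant to price and value from the review below.\nReview: ```{review}```"))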
Inferring
Inferring involves deducing certain pieces of information, such as sentiment or topic, based on the context of the text.
1) Identifying positive/negative sentiment
2) Identifying emotions (sad/happy/anger)
3) We could also infer the topic of interest from a block of text (example - creating a headline from an article).
The model can be prompted to do multiple tasks simultaneously, such as summarizing text and inferring sentiment in one go.
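A sketch that performs several inferences in one call and returns them in machine-readable form, assuming the hypothetical helper; the JSON keys are illustrative:

    # One call can identify sentiment, emotions, and topic at the same time.
    story = "A product review or news article goes here."
    prompt = f"""
    Identify the following items from the text delimited by ```:
    - Sentiment (positive or negative)
    - A list of emotions the writer is expressing
    - The topic, as a headline of at most 10 words
    Respond as a JSON object with keys "sentiment", "emotions", and "headline".
    ```{story}```
    """
    print(get_completion(prompt))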
Transforming
Large Language Models are capable of handling various text transformation tasks, such as language translation, tone adjustment, and format conversion.
1) Language Translation for converting text from one language to another
2) Tone transformation for converting text from a formal to an informal register or vice versa
3) Format transformation for converting one format to another (e.g., a Python dictionary in JSON format to HTML)
4) Spelling and Grammar Checking - The model can also be used for proofreading and
correcting spelling and grammatical errors in text.
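A sketch of format transformation (tactic 3), assuming the hypothetical helper; the employee data is illustrative:

    # Ask the model to convert a Python dictionary in JSON form into an HTML table.
    data_json = {"employees": [
        {"name": "Shyam", "email": "shyamjaiswal@gmail.com"},
        {"name": "Bob", "email": "bob32@gmail.com"},
    ]}
    prompt = f"""
    Translate the following Python dictionary from JSON to an HTML table
    with column headers and a title: {data_json}
    """
    print(get_completion(prompt))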
Course 2 - Building Systems with the ChatGPT API
This course highlights the process of developing an LLM-based application and some best practices for evaluating and improving systems over time.
L1 Language Models, the Chat Format and Tokens
Setup & API Key Loading
● The OpenAI API key is loaded using Python libraries (os, openai, dotenv, tiktoken).
● The helper function get_completion makes it easier to send prompts and look at the generated outputs (a sketch follows below).
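A minimal sketch of this setup, assuming the openai>=1.0 Python client (the course itself used the older pre-1.0 interface, so its call syntax differs slightly):

    import os
    from dotenv import load_dotenv   # pip install python-dotenv
    from openai import OpenAI        # pip install openai

    load_dotenv()  # reads OPENAI_API_KEY from a local .env file
    client = OpenAI(api_key=os.environ["OPENAI_API_KEY"])

    def get_completion(prompt, model="gpt-3.5-turbo", temperature=0):
        """Send a single-turn prompt and return the model's text response."""
        messages = [{"role": "user", "content": prompt}]
        response = client.chat.completions.create(
            model=model, messages=messages, temperature=temperature
        )
        return response.choices[0].message.content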
Prompting & Response Generation
● OpenAI's chat completion endpoint is used to generate responses based on user prompts.
● Prompts can be refined for structured outputs, including conditions and formatting.
Token Usage
● The concept of tokens is introduced, with examples of manipulating text, such as reversing words.
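The tiktoken package shows why character-level tasks are hard: the model sees tokens, not letters. A small sketch:

    import tiktoken  # pip install tiktoken

    # Models read tokens, not characters, which is why letter-level tasks
    # such as reversing a word often fail unless the letters are separated.
    enc = tiktoken.encoding_for_model("gpt-3.5-turbo")
    print(enc.encode("lollipop"))         # a few multi-character tokens
    print(enc.encode("l-o-l-l-i-p-o-p"))  # dashes split it into smaller tokens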
Classification
The task may be to classify the user prompt based on the categories provided in the system message. It may consist of direct classification, or of branching into primary and secondary categories.
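A sketch of primary/secondary classification, assuming the client object from the setup sketch above; the categories echo the course's customer-service example:

    delimiter = "####"
    system_message = f"""
    Classify each customer service query, delimited by {delimiter}, into a
    primary category (Billing, Technical Support, Account Management, or
    General Inquiry) and a secondary category.
    Respond as a JSON object with keys "primary" and "secondary".
    """
    user_message = "I want you to delete my profile and all of my user data."
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[
            {"role": "system", "content": system_message},
            {"role": "user", "content": f"{delimiter}{user_message}{delimiter}"},
        ],
    )
    print(response.choices[0].message.content)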
Moderation
● OpenAI has its own moderation policies. We can use the moderations endpoint to check whether text or images are potentially harmful.
● If harmful content is identified, we can take corrective action, like filtering the content or intervening with the user accounts that create offending content.
● If a user prompt asks the model to discard the system message, it is a prompt injection attack, and the system has to identify this and act against it.
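A sketch of the moderations endpoint, again assuming the client object from the setup sketch:

    # Screen user-supplied text before passing it to the rest of the system.
    response = client.moderations.create(input="Here is some user text to screen.")
    result = response.results[0]
    print(result.flagged)     # True if the input violates any policy category
    print(result.categories)  # per-category flags (hate, violence, self-harm, ...)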
Chain of Thought Reasoning
● Reframing the query to request a series of relevant reasoning steps before the model provides the final answer, so that it can think longer and more methodically about the problem.
Inner monologue
● A tactic where we hide the reasoning process the model used to arrive at the final result. At times, sharing this information with users may be inappropriate.
● It instructs the model to put the parts of the output that are meant to be hidden from the user into a structured format that makes parsing them easy.
● Before presenting the output to the user, it is parsed and only the relevant information is made visible, as in the sketch below.
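A sketch of the inner-monologue pattern, reusing the hypothetical get_completion helper; the delimiter and question are illustrative:

    prompt = """
    Work through the problem step by step, then give the final answer.
    Separate each step with the delimiter ####.
    Put only the user-facing answer after the last delimiter.

    Question: A jacket costs $40 and is discounted by 25%. What is the sale price?
    """
    full_response = get_completion(prompt)
    # Keep only the text after the last delimiter; fall back to a safe message.
    try:
        user_visible = full_response.split("####")[-1].strip()
    except Exception:
        user_visible = "Sorry, something went wrong. Please try again."
    print(user_visible)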
Chaining Prompts
● Chaining is useful because it is usually better to work on each prompt one by one than to make a single prompt complex by trying to handle everything together. It breaks down the complexity of the task, reducing the number of errors.
● Chaining prompts is a powerful strategy when we have a workflow and can maintain the state of the system at any given point, taking different actions depending on the current state.
● It also reduces costs, since longer prompts with more tokens cost more to run.
● It also makes it easier to identify which steps are failing more often.
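A sketch of a two-step chain, reusing the hypothetical get_completion helper; the product names and prompts are illustrative:

    import json

    # Step 1: extract structured state (the product names) from the user's question.
    step1 = get_completion(
        'List the product names mentioned in the text below as a JSON array of '
        'strings, e.g. ["name1", "name2"]. Output only the JSON.\n'
        "Text: Do the SmartX ProPhone and the FotoSnap camera ship internationally?"
    )
    products = json.loads(step1)  # a real system should handle malformed JSON here

    # Step 2: a separate, simpler prompt acts on the extracted state.
    answer = get_completion(
        f"Answer the user's shipping question for these products: {products}. "
        "If you do not know the shipping policy, say so."
    )
    print(answer)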
Prompt Engineering with Llama 2
The code to call the Llama 2 models through the Together.ai hosted API service has been wrapped into a helper function called llama. You can take a look at this code if you like by opening the utils.py file using the File and Open menu item above the notebook.
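A rough sketch of what such a helper might look like, assuming the together Python SDK (pip install together) and a TOGETHER_API_KEY environment variable; the actual utils.py may be implemented differently:

    import os
    from together import Together

    client = Together(api_key=os.environ["TOGETHER_API_KEY"])

    def llama(prompt, model="meta-llama/Llama-3-8b-chat-hf",
              temperature=0.0, max_tokens=1024):
        """Send one prompt to a hosted Llama chat model and return the text."""
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            temperature=temperature,
            max_tokens=max_tokens,
        )
        return response.choices[0].message.content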
Chat vs Base Models
Chat models try to provide answers to the query asked, while the base model simply continues the text, often generating similar questions instead of an answer.
Together.ai supports both the Llama 3 8b chat and Llama 3 70b chat models with the following names (case-insensitive):
● meta-llama/Llama-3-8b-chat-hf
● meta-llama/Llama-3-70b-chat-hf
Changing the temperature setting
The temperature parameter can be set in the llama helper function. It helps bring randomness into the output.
Changing the max tokens setting
This limits the number of tokens in the LLM's response. It is available as the max_tokens parameter in the llama helper function. Note that the model's overall token limit covers both the prompt and the response, so a long prompt leaves fewer tokens available for the output.
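Both settings appear as keyword arguments of the llama helper sketched earlier:

    # Higher temperature -> more randomness between runs; 0 is near-deterministic.
    print(llama("Write a birthday poem.", temperature=0.9))
    # max_tokens caps the response length; remember the prompt consumes part
    # of the model's total token budget.
    print(llama("Write a birthday poem.", max_tokens=50))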
Asking a follow-up question
Ask a follow-up question and have the model rewrite its response to the first prompt while also taking the follow-up question into account.
Using Llama 2 or 3 on your own computer!
The smaller Llama 2 or 3 chat models are free to download to your own machine.
Only the Llama 2 7B chat or Llama 3 8B model (by default the 4-bit quantized version is downloaded) will work fine locally.
Other larger-sized models could require too much memory (13B models generally require at least 16GB of RAM and 70B models at least 64GB of RAM) and run too slowly.
The Meta team still recommends using a hosted API service because it allows you to access all the available Llama models without being limited by your hardware.
One way to install and use Llama 7B on your computer is to go to https://ollama.com/ and download the app. It will be like installing a regular application. To use Llama 2 or 3, the full instructions are here: https://ollama.com/library/llama2 and https://ollama.com/library/llama3.
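Once the app is installed and a model has been pulled, the local server can also be called from Python; a sketch assuming the ollama package (pip install ollama):

    import ollama  # talks to the locally running Ollama app

    # Assumes `ollama pull llama3` has already downloaded the model.
    response = ollama.chat(
        model="llama3",
        messages=[{"role": "user", "content": "In one line, what is a llama?"}],
    )
    print(response["message"]["content"])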
Multi-turn conversations
LLMs are stateless
● LLMs don’t remember the previous interactions by default.
Constructing multi-turn prompts
● Need to provide prior prompts and responses as part of the context of each new turn in the conversation.
Use llama chat helper function
● The llama_chat helper function takes a list of prior prompts and a list of prior responses, and passes them to the model so that it remembers the previous questions and answers (a sketch follows below).
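A sketch of such a helper, reusing the llama() function from the earlier sketch; the course's actual implementation (which builds Llama's [INST] chat format) may differ:

    def llama_chat(prompts, responses, **kwargs):
        """Fold prior turns into one prompt so the stateless model sees the history."""
        history = ""
        for user_turn, model_turn in zip(prompts, responses):
            history += f"User: {user_turn}\nAssistant: {model_turn}\n"
        history += f"User: {prompts[-1]}\nAssistant:"  # newest prompt, no reply yet
        return llama(history, **kwargs)

    prompt_1 = "What are fun activities I can do this weekend?"
    response_1 = llama(prompt_1)
    prompt_2 = "Which of these would be good for my health?"
    # Pass all prompts so far, plus all responses so far.
    print(llama_chat([prompt_1, prompt_2], [response_1]))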
Prompt Engineering Techniques
In-Context Learning
Zero-shot Prompting
● Prompting the model to see if it can infer the task from the structure of your prompt.
● In zero-shot prompting, we provide the structure to the model, but without any examples of
the completed task.
Few-shot Prompting
● In few-shot prompting, we not only provide the structure to the model, but also two or more
examples.
● We are prompting the model to see if it can infer the task from the structure, as well as the examples in your prompt.
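A sketch of both styles, reusing the llama() helper; the messages mirror the course's sentiment examples:

    # Zero-shot: only the task structure, no completed examples.
    zero_shot = """
    Message: Hi Dad, you're 20 minutes late to my piano recital!
    Sentiment: ?
    """
    print(llama(zero_shot))

    # Few-shot: the same structure plus completed examples to infer the task from.
    few_shot = """
    Message: Hi Amit, thanks for the thoughtful birthday card!
    Sentiment: Positive

    Message: Hi Dad, you're 20 minutes late to my piano recital!
    Sentiment: ?
    """
    print(llama(few_shot))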
Specifying the Output Format
● We can also specify the format in which we want the model to respond.
● In the example below, we ask the model to "give a one word response".
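A sketch, using the same llama() helper:

    # Constraining the format makes the response easy to consume programmatically.
    prompt = """
    Message: Hi Dad, you're 20 minutes late to my piano recital!
    Sentiment: Give a one word response.
    """
    print(llama(prompt))  # e.g. "Negative"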
Role Prompting
● Roles give LLMs context about what type of answers are desired.
● Llama 2 often gives more consistent responses when provided with a role. We try this by giving the model a "role", and within the role, a "tone" it should use when responding.
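A sketch using the course's life-coach example with the llama() helper:

    # The role sets the context; the tone shapes how the answer is phrased.
    role = """
    Your role is a life coach who gives advice to people about living a good life.
    You attempt to provide unbiased advice.
    You respond in the tone of an English pirate.
    """
    prompt = f"""
    {role}
    How can I answer this question from my friend: What is the meaning of life?
    """
    print(llama(prompt))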
Summarization
● Summarizing a large text is another common use case for LLMs.
Chain-of-thought Prompting
● LLMs can perform better at reasoning and logic problems if you ask them to break the problem down into smaller steps. This is known as chain-of-thought prompting.
● Modify the prompt to ask the model to "think step by step"; we may also need to provide the model with additional instructions.
● The order of instructions matters! We can ask the model to "answer first" and "explain later"
to see how the output changes.
● Since LLMs predict their answer one token at a time, the best practice is to ask them to think step by step and to provide the answer only after they have explained their reasoning.
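A sketch using the course's restaurant logic puzzle with the llama() helper; note the ordering (reason first, answer last):

    # Asking for the reasoning before the answer lets the model "think" in
    # tokens before committing to a yes/no.
    prompt = """
    15 of us want to go to a restaurant.
    Two of us have cars; each car can seat 5 people.
    Two of us have motorcycles; each motorcycle can fit 2 people.
    Can we all get to the restaurant by car or motorcycle?
    Think step by step, and give the final answer as a single Yes or No
    at the end.
    """
    print(llama(prompt))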