Replies: 5 comments 3 replies
-
Here are the changes that I made for the web-search agent. To be safe, I purged the old Docker install and made a new one with the updated prompts.
I had to tweak the embeddings extensively to make the new prompt work as intended: some combinations did not produce good results at all, giving irrelevant results, no web results whatsoever, or forcing the AI to make up the answer alone, hallucinating results that were not there.
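The sensitivity to embedding choice described above comes down to how retrieved documents are re-ranked against the query. A minimal sketch of cosine-similarity re-ranking, with a toy bag-of-words "embedding" standing in for a real model (in practice Perplexica would call something like the Llama 3 or BGE embeddings mentioned in this thread):

```python
from collections import Counter
from math import sqrt

def embed(text):
    # Toy bag-of-words vector; a real setup would call an embedding model.
    return Counter(text.lower().split())

def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = sqrt(sum(v * v for v in a.values()))
    nb = sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def rerank(query, docs):
    # Sort candidate documents by similarity to the query embedding.
    q = embed(query)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)

docs = [
    "how to bake sourdough bread",
    "quantum entanglement in quantum computing",
    "local weather forecast for today",
]
print(rerank("quantum computing applications", docs)[0])
# quantum entanglement in quantum computing
```

Swapping `embed` for a different model changes these similarity scores, which is why a different embedding backend can flip which results are considered relevant.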
-
Updated prompt, possibly slightly better.
-
Using the improved prompt, Perplexica ranked better than Perplexity, Edge Copilot, and Google Gemini in web search.

Summary of performance:
- Interaction Quality / Relevance: Perplexica (Web AI 1) performed best, accurately interpreting and responding to queries with a score of 8. The other AIs scored between 6 and 7.
- Content Relevance / Depth: Perplexica provided the most comprehensive content, scoring 8. The other AIs were consistent but slightly less detailed.
- Related Searches / Relevance: Suggested related searches were mostly relevant, with Perplexica and Perplexity Copilot performing best.

Conclusion: Perplexica (Web AI 1) demonstrated the best overall performance across all criteria, particularly excelling in interaction quality and content relevance. Perplexity Copilot (Web AI 2) and Perplexity Vanilla (Web AI 3) also performed well but had slight areas for improvement. Microsoft Copilot (Web AI 4) and Google Gemini (Web AI 5) were satisfactory but lagged behind in user experience and content depth. Overall, Perplexica's improved prompt and use of the Phi-3 medium model contributed significantly to its superior performance, making it the best choice for comprehensive, accurate, and user-friendly AI search interactions.
-
I need to be able to add and search local data sources, like a vector DB. I also need to restrict searches to certain URLs. How can this be done?
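On the URL-restriction part of the question: since Perplexica queries through SearXNG, which forwards queries to engines that honor the `site:` operator, one common approach is to rewrite the query with `site:` filters before it reaches the engine. A hypothetical helper (the function name and approach are illustrative, not a Perplexica feature):

```python
def restrict_to_sites(query, domains):
    """Append site: operators so the search engine only returns
    results from the given domains (most engines OR them together)."""
    sites = " OR ".join(f"site:{d}" for d in domains)
    return f"{query} ({sites})"

print(restrict_to_sites("quantum entanglement", ["arxiv.org", "nature.com"]))
# quantum entanglement (site:arxiv.org OR site:nature.com)
```

Indexing local data into a vector DB and searching it alongside web results would be a larger change to the retrieval pipeline; the query-rewriting trick above only covers the URL restriction.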
-
@Zirgite Thank you for your excellent work on improving and benchmarking the Perplexica copilot. I am just a bit lost about the differences from your other implementation in #258; can you please clarify what the intent is here vs. there?
-
I ran some tests to compare Perplexity and Perplexica answers. For Perplexica I used the Phi-3 model with a new prompt that applies what Meta's terminology calls system 2 thinking. I am still running tests.
The results are anonymized, as the evaluating model (GPT-4o or Claude 3.5) should not know which system is which; they are labeled AI system 1, 2, etc. My test prompt is:
"Evaluate the performance of the following Web AI Search systems focusing on the interaction between the AI model and the search results. Prioritize the overall quality of interaction, followed by content relevance, and related searches. Use the following criteria:
1. Interaction Quality:
◦ Relevance: How accurately does the AI model interpret and respond to the query?
◦ Clarity: How clear and understandable are the AI-generated responses?
◦ Helpfulness: How useful are the AI responses in guiding the user to relevant information?
◦ User Experience: How intuitive and seamless is the interaction with the AI model?
2. Content Relevance:
◦ Depth: Does the content provided cover the query comprehensively?
◦ Accuracy: Is the content factually correct and well-researched?
◦ Authority: Are the sources of the content reputable and reliable?
3. Related Searches:
◦ Relevance: How relevant are the suggested related searches to the original query?
◦ Diversity: Do the related searches cover a wide range of subtopics related to the original query?
◦ Usefulness: Are the related searches useful in refining or expanding the search?
Use the following ranking system from 1 to 10 for each criterion, where 1 represents poor performance and 10 represents excellent performance.
Present the results in a tabular format."
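Since the prompt prioritizes interaction quality over content relevance over related searches, it can help to collapse the per-criterion scores into one weighted overall number when comparing systems. A sketch with assumed weights (the prompt only states an ordering, not weights, so these are illustrative):

```python
# Assumed priority weights; the evaluation prompt only states an ordering.
WEIGHTS = {"interaction_quality": 0.5, "content_relevance": 0.3, "related_searches": 0.2}

def overall(scores):
    """scores maps each category to a list of 1-10 criterion scores;
    returns the weighted mean of the per-category averages."""
    return sum(WEIGHTS[cat] * (sum(vals) / len(vals)) for cat, vals in scores.items())

example_system = {
    "interaction_quality": [8, 8, 8, 8],  # relevance, clarity, helpfulness, UX
    "content_relevance": [8, 8, 8],       # depth, accuracy, authority
    "related_searches": [8, 8, 8],        # relevance, diversity, usefulness
}
print(round(overall(example_system), 2))  # 8.0
```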
Several useful findings.
USEFUL FIND 1: Perplexity Copilot may be more useful when you want to interact with close-to-real-time information, e.g. trip planning, the weather, etc.
Perplexica has issues here because SearXNG has to be explicitly asked to return current news from the last day. Even when the AI model builds the correct prompt, the search engine does not use real-time information and falls back to a general search.
This can be improved, since vanilla Perplexica is able to reach current news feeds, such as the local weather right now.
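One concrete way to address the freshness issue: SearXNG's search endpoint accepts a `time_range` parameter (`day`, `week`, `month`, `year`), so the query the model builds could be sent with that set whenever the question asks for current news. A sketch, assuming a local SearXNG instance on port 8080 (the URL and helper name are assumptions for illustration):

```python
from urllib.parse import urlencode

SEARXNG = "http://localhost:8080/search"  # assumed local SearXNG instance

def build_search_url(query, fresh=False):
    """Build a SearXNG JSON search URL, pinning results to the
    last day when the query needs up-to-date information."""
    params = {"q": query, "format": "json"}
    if fresh:
        params["time_range"] = "day"
    return f"{SEARXNG}?{urlencode(params)}"

print(build_search_url("local weather now", fresh=True))
```

The remaining work would be deciding when to set `fresh=True`, e.g. by having the model flag time-sensitive queries in its rewritten search prompt.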
Regarding tough topics, an example of a difficult search:
Query: Explain the concept of quantum entanglement and its potential applications in quantum computing.
Perplexity vanilla does not fall behind Copilot; it is very close in other tests too, never more than 10% lower.
On that search, Perplexica, Perplexity vanilla, and Perplexity Copilot were on par:
| Criterion | AI 1 | AI 2 | AI 3 | AI 4 |
| --- | --- | --- | --- | --- |
| Relevance | 9 | 9 | 8 | 9 |
| Clarity | 8 | 8 | 7 | 8 |
| Helpfulness | 9 | 9 | 8 | 9 |
| User Experience | 8 | 8 | 7 | 8 |
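Averaging the four criteria above per column gives a quick overall interaction score for each anonymized system:

```python
# Per-criterion scores from the table above, one list per system column
# (relevance, clarity, helpfulness, user experience).
scores = {
    "AI 1": [9, 8, 9, 8],
    "AI 2": [9, 8, 9, 8],
    "AI 3": [8, 7, 8, 7],
    "AI 4": [9, 8, 9, 8],
}
averages = {name: sum(vals) / len(vals) for name, vals in scores.items()}
print(averages)  # AI 3 averages 7.5; the other three average 8.5
```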
USEFUL FIND 2: On hard topics for which well-established search results do exist, Perplexity vanilla and Perplexica do not fall behind Perplexity Copilot. Smaller models work exceptionally well.
USEFUL FIND 3: When I asked for more depth and user-friendliness, Perplexica delivered more.
USEFUL FIND 4: In Perplexica, changing the embedding model can change the results significantly, e.g. switching from local to Ollama embeddings, or between the different options (BGE, etc.). Here I get the best results with Ollama (not local) and Llama 3 embeddings.
USEFUL FIND 5: The integration with the search engine can be improved to be more useful when searching for the latest data, such as news.
USEFUL FIND 6: Perplexity, Windows Copilot, and Gemini are SPECIFICALLY instructed to be user-friendly and return easy-to-understand results. So when the topic is deep, you need to tell them SPECIFICALLY that you are not the average user but a specialist, as they do not know that. I think this explains their slightly lower scores.