Automatically Generating URLs to Public API Documents & Reducing Hallucination (Invalid URLs) #8

@in-c0

Issue

The current method involves manually prompting ChatGPT with the following:

"Generate the URLs pointing to the appropriate page of popular public APIs based on this table"
(Refer to api-docs-urls.csv for the table format)

I then repeatedly ask for "10 more" results and merge the tables together. However, this approach often produces hallucinated (invalid) URLs or incorrect data, which makes the generated table less useful and adds overhead for validation and correction.

🤔

Proposed Solution

1 - One agent generates and maintains a list of "names" of APIs - hopefully this will lead to less hallucination

2 - One agent tries to find all candidate URLs for each API's documentation page, based on the API names from step 1.

3 - One agent validates the URLs (HTTP HEAD requests) so that only live and reachable URLs make it into the final table.

4 - One agent connects these candidate URLs to their "best matches" (e.g. Privacy, TOS, etc.). Chances are that several APIs exist under a single known name (e.g. OpenAI API -> OpenAI GPT-4 API or OpenAI Embedding API); in that case, we might have to generate new columns for the sub-APIs.
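
As a rough illustration of step 3, here is a minimal sketch of a HEAD-request check in Python using only the standard library. The function names `is_reachable` and `filter_live` and the injectable `opener` parameter are assumptions made for this example, not part of any existing code in the repo. The sketch treats any status below 400 as "live".

```python
import http.client
import urllib.error
import urllib.request


def is_reachable(url, opener=urllib.request.urlopen, timeout=10.0):
    """Return True if a HEAD request to `url` answers with a status below 400.

    `opener` is a hypothetical injection point so the check can be tested
    without network access; by default it is urllib's urlopen.
    """
    try:
        # Building the Request inside the try block catches malformed URLs too.
        request = urllib.request.Request(url, method="HEAD")
        with opener(request, timeout=timeout) as response:
            return response.status < 400
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers URLError, HTTPError (4xx/5xx), and socket timeouts;
        # ValueError covers strings that are not valid URLs.
        return False


def filter_live(candidates, opener=urllib.request.urlopen):
    """Keep only the candidate URLs that pass the HEAD check, in order."""
    return [url for url in candidates if is_reachable(url, opener=opener)]
```

One caveat: some servers reject HEAD requests even though the page loads fine with GET, so a real version would probably need a GET fallback before dropping a URL. A reachable URL is also not necessarily the right page, which is why the matching in step 4 is still needed.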

The pipeline is still vague; I'll update it as I begin working on it sometime soon!
