Currently, the core features of LetMeDoIt AI rely heavily on the strengths of OpenAI's function calling capabilities, which offer the ability:
- to extract structured data from an unstructured query
- to accept multiple functions in a single query
- to automatically choose an appropriate function from the numerous functions specified, by using the "auto" option (see the sketch below)
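For readers unfamiliar with these capabilities, here is a minimal sketch of OpenAI function calling. The `get_current_weather` and `send_email` tools are hypothetical stand-ins for illustration, not part of LetMeDoIt AI:

```python
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# Two hypothetical tools; a single request may declare many of them.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Get the current weather in a given city",
            "parameters": {
                "type": "object",
                "properties": {"city": {"type": "string"}},
                "required": ["city"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "send_email",
            "description": "Send an email to a recipient",
            "parameters": {
                "type": "object",
                "properties": {
                    "to": {"type": "string"},
                    "body": {"type": "string"},
                },
                "required": ["to", "body"],
            },
        },
    },
]

response = client.chat.completions.create(
    model="gpt-4o",
    messages=[{"role": "user", "content": "How hot is it in Hong Kong today?"}],
    tools=tools,
    tool_choice="auto",  # the model picks the appropriate function itself
)

# The unstructured query comes back as structured, validated arguments.
call = response.choices[0].message.tool_calls[0]
print(call.function.name)       # e.g. "get_current_weather"
print(call.function.arguments)  # e.g. '{"city": "Hong Kong"}'
```

In a single request, the model decides whether a tool is needed, which of the declared tools fits, and how to fill in its arguments; this is the combination of abilities that is hard to reproduce elsewhere.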
Challenges in Using Function Calling Features Without an OpenAI API Key:
- Many popular open-source AI models lack support for function calling capabilities.
- Utilizing function calling with these open-source models often demands high-end hardware to ensure smooth and timely operation.
- Although a limited number of models, including Gemini Pro and certain open-source options, do offer function calling, their capabilities are far more limited: they typically handle only one function at a time. This places them well behind OpenAI, which can intelligently and efficiently select from multiple user-specified functions in a single request.
- In our exploratory research and tests, we discovered a viable workaround. This method, however, is practical only for those willing to endure a wait of approximately 10 minutes (on a 64 GB RAM device without a GPU) for even a simple single task when numerous functions are specified simultaneously.
In essence, no existing solution matches the capabilities of OpenAI's function calling feature. There is a clear need for an innovative and efficient method to implement function calling features with open-source models on standard hardware. After extensive experimentation and overcoming numerous challenges, the author has developed a new approach:
This novel strategy breaks the function calling process down into several distinct steps, spread across multiple generations (a minimal sketch of the full pipeline follows this list):
- Intent Screening via Tool Selection Agent
- Tool Selection via Tool Selection Agent
- Retrieval of Structured Data
- Tool Execution
- Chat Extension
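To make these steps concrete, here is a minimal, hypothetical sketch of the pipeline. The `generate` helper, the prompts, and the tool-registry layout are illustrative assumptions, not the actual LetMeDoIt AI implementation:

```python
import json


def generate(prompt: str) -> str:
    """Placeholder for one completion from any local open-source model."""
    raise NotImplementedError


def run_pipeline(query: str, tools: dict) -> str:
    # 1. Intent screening: one cheap generation decides whether any tool
    #    is needed at all.
    answer = generate(
        f"Does this request require a tool? Answer yes or no.\n{query}"
    )
    if not answer.strip().lower().startswith("yes"):
        return generate(query)  # plain chat, no function calling overhead

    # 2. Tool selection: pick ONE tool name from the registry, instead of
    #    asking the model to reason over every tool schema at once.
    names = ", ".join(tools)
    tool_name = generate(
        f"Choose the single best tool for this request from [{names}].\n"
        f"Request: {query}\nTool:"
    ).strip()

    # 3. Retrieval of structured data: a focused generation extracts only
    #    the arguments the chosen tool needs, as JSON.
    schema = json.dumps(tools[tool_name]["schema"])
    args = json.loads(generate(
        f"Extract arguments for {tool_name} as JSON matching {schema}.\n"
        f"Request: {query}"
    ))

    # 4. Tool execution: ordinary Python, no model involved.
    result = tools[tool_name]["func"](**args)

    # 5. Chat extension: a final generation folds the tool result back
    #    into the conversation.
    return generate(
        f"User asked: {query}\nTool result: {result}\nRespond to the user."
    )
```

The key design choice is that each generation is small and narrowly scoped: the model never has to reason over every tool schema in one pass, which is what keeps the approach responsive on modest hardware.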
This methodology has been found to work effectively with freely available open-source models, even on devices lacking a GPU.
If you are interested, you may check, for example, how we implement this approach with Llama.cpp. A hedged sketch of one step is shown below.
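As one illustration, the "Retrieval of Structured Data" step can be approximated with llama-cpp-python's JSON-schema-constrained output. The model path and schema here are assumptions made for the sketch, not the project's actual configuration:

```python
from llama_cpp import Llama

# Hypothetical local model; any GGUF chat model can stand in here.
llm = Llama(model_path="./models/mistral-7b-instruct.gguf", n_ctx=2048, verbose=False)

# Constrain the output to a JSON schema so the arguments always parse.
response = llm.create_chat_completion(
    messages=[
        {"role": "system", "content": "Extract the arguments for the selected tool."},
        {"role": "user", "content": "How hot is it in Hong Kong today?"},
    ],
    response_format={
        "type": "json_object",
        "schema": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
)
print(response["choices"][0]["message"]["content"])  # e.g. {"city": "Hong Kong"}
```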
We invite further discussion and contributions to refine and enhance this strategy.