Tool invocation rewriting for zero-shot tool retrieval


Augmenting large language models (LLMs) with external tools, rather than relying solely on their internal knowledge, could unlock their potential to solve more challenging problems. Common approaches to such “tool learning” fall into two categories: (1) supervised methods that generate tool-calling functions, and (2) in-context learning, which uses tool documents that describe the intended tool usage along with few-shot demonstrations. Tool documents describe a tool’s functionality and how to invoke it, allowing LLMs to master individual tools.

However, these methods face practical challenges when scaling to a large number of tools. First, they suffer from input token limits. The entire list of tools cannot be fed into a single prompt, and even if it could, LLMs often struggle to pick out the relevant information from lengthy input contexts. Second, the pool of tools is evolving. LLMs are often paired with a retriever trained on labeled query–tool pairs to recommend a shortlist of tools. However, the ideal LLM toolkit should be vast and dynamic, with tools undergoing frequent updates, so providing and maintaining labels to train a retriever for such an extensive and evolving toolset would be impractical. Finally, one must contend with ambiguous user intents. Contextual detail in a query can obscure the underlying intents, and failing to identify them can lead to calling the wrong tools.

In “Re-Invoke: Tool Invocation Rewriting for Zero-Shot Tool Retrieval”, presented at EMNLP 2024, we introduce a novel unsupervised retrieval method designed specifically for tool learning to address these challenges. Re-Invoke uses LLMs for both tool document enrichment and user intent extraction to enhance tool retrieval performance across various use cases. We demonstrate that Re-Invoke consistently and significantly improves upon the baselines on both single- and multi-tool retrieval tasks across tool use benchmark datasets.
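
To make the two ideas concrete, here is a minimal sketch of how enriched tool documents and extracted intents might fit together in an unsupervised retriever. It is an illustration under stated assumptions, not the Re-Invoke implementation: the synthetic queries and the intents are written by hand where an LLM would generate them, `embed` is a toy bag-of-words stand-in for a real embedding model, and scoring each tool by its best intent-to-document similarity is one plausible aggregation chosen for the example. All function and tool names are hypothetical.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words vector; an assumed stand-in for a real embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(count * b[token] for token, count in a.items() if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_index(tools):
    """Tool document enrichment: pair each tool document with synthetic queries.

    `tools` maps a tool name to (document, synthetic_queries). In practice an
    LLM would generate the queries; here they are supplied by hand (assumption).
    """
    return {
        name: [embed(doc + " " + query) for query in queries]
        for name, (doc, queries) in tools.items()
    }


def retrieve(intents, index, k=1):
    """Rank tools by their best similarity to any extracted intent.

    `intents` would come from an LLM intent-extraction step; the max-based
    aggregation is an assumption made for this sketch.
    """
    intent_vectors = [embed(intent) for intent in intents]
    scores = {
        name: max(cosine(iv, dv) for iv in intent_vectors for dv in doc_vectors)
        for name, doc_vectors in index.items()
    }
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:k]


# Hypothetical tools, each with one hand-written synthetic query.
TOOLS = {
    "weather_api": (
        "Get the weather forecast for a city.",
        ["what is the weather forecast in paris tomorrow"],
    ),
    "currency_converter": (
        "Convert an amount between two currencies.",
        ["convert 100 dollars to euros"],
    ),
    "flight_search": (
        "Search for flights between airports.",
        ["find flights from boston to london"],
    ),
}

index = build_index(TOOLS)

# Raw query: "I'm packing for a trip next week, will it rain in Paris?"
# Hand-written intent standing in for the LLM extraction step:
intents = ["get the weather forecast for paris"]
ranked = retrieve(intents, index, k=2)
print(ranked)
```

Because no labeled query–tool pairs are involved, adding or updating a tool in this sketch only requires re-running the enrichment step for that tool and re-embedding its documents.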
