Shortly after ChatGPT came out, developers around the world started using it to create new types of software that could tackle a complex array of tasks autonomously. These apps, known as autonomous agents, are designed to work through a series of iterative tasks that incrementally fulfill a user's request to do something complicated. They work by using ChatGPT to understand and orchestrate the series of tasks that must be carried out to fulfill a user's request, answer a question, or respond to a command. One of the first bots to do this was AutoGPT, which, at the time of this writing, had over 143K stars on GitHub. Clearly, lots of people are interested in creating more autonomous, self-sustaining bots capable of completing a wide variety of tasks. And now, with the release of ChatGPT's API, it's possible to do so.
However, creating an agent in the style of an AutoGPT bot isn't easy. It requires programming knowledge, infrastructure, and lots of patience. At Locusive, we've been developing autonomous agents for business users since the early days of ChatGPT 3.5, and we wanted to put together this best practices guide to help other developers who are working on creating their own bots. This guide provides a breakdown of best practices that we've found to work well for us. It's intended for software engineers, data scientists, and PMs who are building out their own bots, along with any curious folks who want to know more about autonomous agents.
Introduction to AutoGPT agents
Have you ever wished a smart robot could just do all your errands for you? Maybe you need to send out 10 personalized emails, create a few social media posts, and respond to a customer's feature request all in one day. Maybe you want to record your next meeting, publish its notes to your internal wiki, and get a prioritized list of takeaways. Or maybe you just want to book a WeWork desk that's close to your meeting tomorrow. Chances are, you have a lot of tasks that don't require much effort but end up taking a lot of time. Until recently, you'd either need to outsource all of your tasks to a virtual (human) assistant, hire an intern or an assistant to manage these details for you, or schedule time to tackle these time-consuming tasks yourself. That's because each of these tasks requires some small, but significant, amount of human knowledge, and until recently, you couldn't automate away that knowledge with a bot.
But with the advent of large language models like ChatGPT, you can now build software that's able to respond to a wide variety of tasks, some of which can be sophisticated and require multiple steps. These software applications use tools like ChatGPT to break down your request into a series of tasks that can be accomplished one by one, until ultimately they respond back to you with an answer or confirmation that whatever you needed to get done has been handled. To function properly, though, these agents need to be plugged in to the different software tools that they need to accomplish common tasks (think search engines, APIs, data sources, etc.), and they also need to be managed by well-constructed software that orchestrates their behavior without letting them go off the rails.
When creating an agent, it's your job as the developer to ensure all of these issues are handled seamlessly for the end user. While creating an autonomous agent is similar to creating other pieces of software, there are some agent-specific nuances that you'll need to ensure you're handling. The list below provides some best practices for doing so.
Best practices for building an agent
1. Design for an iterative approach
Autonomous agents, just like people, need to plan for and execute the steps that are required to complete a task in an iterative fashion. They can't do everything all at once, so you'll need to ensure you're designing your app to run over a series of steps. One request can involve multiple iterations of prompting ChatGPT, calling an API, determining if there's enough information to proceed, and providing a final answer to a user. For example, if you want your bot to find and email 10 prospective customers for you, it needs to first work out where to find your customers (maybe you have a tool that can search LinkedIn), then determine whether each person is a prospect (scraping the person's profile and identifying if they have the correct title or position), then send each prospect an email (calling an email API to compose and send the email), and finally give you confirmation (responding back to you wherever you sent in your command). Each of these steps needs to be executed properly, and errors need to be caught and handled appropriately. In addition, ChatGPT (or whatever LLM you're using) will need to play a role in nearly every part here.
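The step-by-step flow described above can be sketched as a small loop. This is a minimal illustration rather than a production design: `run_agent`, `choose_tool`, and the `"finish"` sentinel are hypothetical names, and `choose_tool` stands in for whatever LLM call your application makes to pick the next step.

```python
from typing import Callable, Dict, List


def run_agent(
    request: str,
    choose_tool: Callable[[str, List[str]], str],
    tools: Dict[str, Callable[[str], str]],
    max_steps: int = 5,
) -> str:
    """Handle one request as a sequence of small steps.

    choose_tool is a stand-in for an LLM call: given the request and the
    observations so far, it returns a tool name or "finish".
    """
    observations: List[str] = []
    for _ in range(max_steps):
        choice = choose_tool(request, observations)
        if choice == "finish":
            # Return the most recent observation as the final answer.
            return observations[-1] if observations else "No result."
        tool = tools.get(choice)
        if tool is None:
            # Record the problem so the next iteration can react to it.
            observations.append(f"error: unknown tool {choice!r}")
            continue
        try:
            observations.append(tool(request))
        except Exception as exc:
            observations.append(f"error: {exc}")
    return "Stopped: step limit reached."
```

Each pass through the loop does one thing (pick a tool, run it, record what happened), which keeps every state explicit and gives you a natural place to catch errors.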
2. Focus on exactly one task at a time
It's tempting to prompt ChatGPT with a complex set of instructions that provide multiple logic branches depending on the context it has been given or the state that it's in, but just like with humans, asking a bot to do too many things at once will confuse it. For example, let's say your agent is supposed to answer a user's question by using contextual information from a set of trusted documents, but if it can't find the answer in the documents, it can query a search engine, look up a value in the database, or call an API. You might be tempted to provide a complex prompt that asks the bot to either provide the final answer or determine if it should invoke the search engine. But many times, this could result in a sub-optimal response.
Rather than prompting it to either provide the final answer with the documents or make a decision on which tool to run next, you should prompt it to simply select one tool to run. You can create a new tool to provide context from relevant documents and include that tool in the list of available tools that can be invoked, and let ChatGPT figure out which tool to run next. You can even provide a description of the available documents that will be provided if ChatGPT selects the "document lookup tool." By asking ChatGPT to select a tool, rather than either provide the final answer or select the next best tool if not enough context exists, you'll provide stronger guardrails against ChatGPT giving a bad or unexpected answer.
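As a rough sketch of the single-decision prompt described above, the snippet below asks only "which tool next?" and validates the reply against the known tool names. The tool names, their descriptions, and both function names are illustrative assumptions, not part of any particular API.

```python
from typing import Dict, Optional

# Hypothetical tool list; document lookup is just another tool.
TOOLS: Dict[str, str] = {
    "document_lookup": "Returns relevant passages from the trusted documents.",
    "web_search": "Queries a search engine for public information.",
    "database_lookup": "Looks up a single value in the database.",
}


def build_tool_selection_prompt(question: str, tools: Dict[str, str]) -> str:
    """Build a prompt that asks for exactly one decision: the next tool."""
    tool_lines = "\n".join(f"- {name}: {desc}" for name, desc in tools.items())
    return (
        "Select exactly one tool to run next for the question below.\n"
        "Reply with the tool name only.\n\n"
        f"Available tools:\n{tool_lines}\n\n"
        f"Question: {question}"
    )


def parse_tool_choice(reply: str, tools: Dict[str, str]) -> Optional[str]:
    """Return the chosen tool name, or None if the reply is not a known tool."""
    name = reply.strip().strip("\"'.").lower()
    return name if name in tools else None
```

Because the model is never asked to answer the question in this step, any reply that is not a known tool name can be treated as a formatting problem rather than a possible final answer.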
3. Be prepared for unexpected answers and formats
Your application will need to interact with ChatGPT over a series of intermediate steps. At times, it may ask your LLM which tool to run next; at other times, it may ask whether there's enough information to provide a final answer; and at still other times, it may need to use an LLM to process a JSON response from an API. You can prompt the LLM to perform whatever actions you need it to perform and provide you with the results in a specific format, but because LLMs are stochastic, they won't always follow your instructions. For example, if you ask for a single "Yes" or "No" as the answer, maybe 90% of the time you'll get exactly that, but the other 10% of the time, you might get something like "Final answer: Yes" or "Yes, there is enough information."
Your code needs to be prepared to handle unexpected formats like this, and it should have enough "fuzzy logic" to deal with an answer that can get you to the next step without requiring additional user input. You could potentially use ChatGPT to provide a cleaned up answer for any unexpected responses you receive, but you could also implement simple checks using regular expressions or pattern matches to process an unexpected response. Your code should be aware of what state it's in (see (1) above) so that it can properly handle an unexpected answer.
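One way to implement the simple pattern-matching check mentioned above is a tolerant yes/no parser. The sketch below uses only Python's standard `re` module; `parse_yes_no` is a hypothetical helper name, and returning `None` for ambiguous replies is one possible design choice.

```python
import re
from typing import Optional

# Match "yes" or "no" as whole words, ignoring case.
_YES_NO = re.compile(r"\b(yes|no)\b", re.IGNORECASE)


def parse_yes_no(reply: str) -> Optional[bool]:
    """Extract a yes/no decision from a loosely formatted LLM reply.

    Returns True for yes, False for no, and None when the reply contains
    neither word or contains both.
    """
    found = {match.lower() for match in _YES_NO.findall(reply)}
    if found == {"yes"}:
        return True
    if found == {"no"}:
        return False
    return None
```

A `None` result is the signal to fall back to something else, such as re-prompting the model or asking it to clean up its own answer.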
4. Expect significant latency
The downside of having such a powerful tool is that it takes a relatively long time to provide its responses. Waiting for a response from ChatGPT could add minutes (or more) to the response time for a complicated request, and users aren't used to waiting a long time to have their requests handled. Your frontend needs to have smart UX indicators in place to let the user know their request has been received and is being handled. Progress bars work great here, but if those aren't possible, even a little icon or spinning indicator can work well.
5. Limit retries
If you're not careful, your code might end up in an infinite loop, trying the same thing over and over again and never getting back to the user, all while using up your system resources and running up your cloud and LLM API bills. Your application needs to be smart enough to keep track of how many times it has tried something that could potentially not work, and if it has hit some maximum threshold, it needs to either try something new or respond with a failure to the user. This goes for individual tasks within the larger workflow of a single request, but it also applies to the entire request as a whole. If your application has iterated over a series of actions more than it should have, you need to ensure that it cuts itself off and gets back to the user.
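A minimal sketch of both limits, one per task and one per request, might look like the following. `RetryLimitExceeded`, `run_with_retries`, and `StepBudget` are hypothetical names chosen for illustration, and the default thresholds are arbitrary.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class RetryLimitExceeded(Exception):
    """Raised when a task or a whole request has used up its attempts."""


def run_with_retries(action: Callable[[], T], max_attempts: int = 3) -> T:
    """Run one task, retrying on failure up to max_attempts times."""
    last_error: Optional[Exception] = None
    for _ in range(max_attempts):
        try:
            return action()
        except Exception as exc:
            last_error = exc
    raise RetryLimitExceeded(f"gave up after {max_attempts} attempts") from last_error


class StepBudget:
    """Caps the total number of steps a single user request may take."""

    def __init__(self, max_steps: int) -> None:
        self.remaining = max_steps

    def spend(self) -> None:
        """Consume one step, or raise if the request has none left."""
        if self.remaining <= 0:
            raise RetryLimitExceeded("request step budget exhausted")
        self.remaining -= 1
```

Calling `budget.spend()` at the top of every iteration and wrapping each tool call in `run_with_retries` means either limit surfaces as the same exception, which the caller can turn into a clear failure message.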
6. Handle errors clearly
Your agent will inevitably run into errors. Maybe an API will time out, or the user won't have access to some object, or ChatGPT will give you the wrong instructions to execute. Your application needs to handle errors at every step of the way, surface them appropriately, and respond to the user with a clear and concise error message to let them know what went wrong. Sometimes the user can rephrase their request; other times, the user might have to retry their request later; and at yet other times, you might need to be alerted about a bug that your user can't work around until you fix it. In any case, your user doesn't want to see a generic "I'm sorry, something went wrong" error. Try to provide them with as many (non-technical) details as you can so that they aren't left guessing what to do next or why they can't get an answer.
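One possible way to keep technical details in your logs while giving users an actionable message is to attach a user-facing message to each error type. The exception classes and wording below are illustrative assumptions that mirror the three cases above (retry later, no access, rephrase).

```python
import logging

logger = logging.getLogger(__name__)


class AgentError(Exception):
    """Base error; its message is the fallback for unexpected failures."""

    user_message = (
        "We hit an unexpected problem on our side. "
        "Our team has been notified and is looking into it."
    )


class ToolTimeoutError(AgentError):
    user_message = (
        "A service we rely on took too long to respond. "
        "Please try again in a few minutes."
    )


class PermissionDeniedError(AgentError):
    user_message = (
        "You don't have access to the item this request needs. "
        "Ask its owner to share it with you."
    )


class BadInstructionError(AgentError):
    user_message = (
        "I couldn't work out how to handle that request. "
        "Try rephrasing it."
    )


def user_facing_message(exc: Exception) -> str:
    """Log the technical details and return a non-technical explanation."""
    logger.error("agent step failed: %r", exc)
    if isinstance(exc, AgentError):
        return exc.user_message
    return AgentError.user_message
```

Anything that is not a known `AgentError` falls through to the generic message, which is also a useful signal that a new error type deserves its own explanation.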
7. Be conscious of context windows
With ChatGPT, you're limited in both the amount of text you can send and the amount you can receive back in a response, and both limits are measured in tokens rather than words. What's worse, the token limit for your response depends on the length of your request, because the request and the response share the same context window. If you're not careful, your internal interactions with ChatGPT might get so long and complicated that there are no tokens left to respond back to the user. In cases like this, your application will need to either cut down on the amount of context that you're sending to the model or decrease the message history that you send to the model.
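Trimming the message history can be sketched as below. Two assumptions are worth calling out: `estimate_tokens` is a crude stand-in that treats roughly four characters as one token (a real application should count with its model's own tokenizer), and the first message is assumed to be a system prompt that must always be kept.

```python
from typing import Dict, List


def estimate_tokens(text: str) -> int:
    """Very rough token estimate (about 4 characters per token)."""
    return max(1, len(text) // 4)


def trim_history(
    messages: List[Dict[str, str]],
    context_window: int,
    reserve_for_reply: int,
) -> List[Dict[str, str]]:
    """Drop the oldest messages until the prompt leaves room for a reply.

    Keeps the first message (assumed to be the system prompt) and then as
    many of the most recent messages as fit in the remaining budget.
    """
    budget = context_window - reserve_for_reply
    system, rest = messages[:1], messages[1:]
    used = sum(estimate_tokens(m["content"]) for m in system)
    kept: List[Dict[str, str]] = []
    for message in reversed(rest):  # walk from newest to oldest
        cost = estimate_tokens(message["content"])
        if used + cost > budget:
            break
        kept.append(message)
        used += cost
    return system + list(reversed(kept))
```

Reserving tokens for the reply up front is the key design choice here: it guarantees the model still has room to answer after the history has been packed in.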
8. Clearly describe what tools are available and what they do
Any good agent will have a comprehensive list of tools that can be used to execute on the intermediate tasks that must be completed to handle a user's request. At some point, ChatGPT will need to decide what tool needs to be run next, or if you're using the new "functions" feature, it will need to understand which function must be called next to accomplish the next task. It's important to clearly describe what your tool (or function) does, how it works, what inputs it expects, what outputs it produces, and when it should be run. The more context that ChatGPT has, the better it will be at providing an intermediate response that moves the entire interaction forward.
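As an illustration of what a complete tool description might contain, the sketch below captures the five points above (what the tool does, its inputs, its output, and when to run it) in one structure that can be rendered into a prompt. `ToolSpec` and its field names are hypothetical and are not tied to any specific LLM API's function format.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class ToolSpec:
    """Everything the LLM needs to know to decide whether to run a tool."""

    name: str
    purpose: str
    inputs: Dict[str, str]  # argument name -> plain-language description
    output: str
    when_to_use: str

    def describe(self) -> str:
        """Render the spec as text suitable for inclusion in a prompt."""
        args = "; ".join(f"{k} ({v})" for k, v in self.inputs.items()) or "none"
        return (
            f"Tool: {self.name}\n"
            f"Purpose: {self.purpose}\n"
            f"Inputs: {args}\n"
            f"Output: {self.output}\n"
            f"When to use: {self.when_to_use}"
        )


# Example spec for an imagined email tool.
send_email = ToolSpec(
    name="send_email",
    purpose="Composes and sends one email.",
    inputs={"to": "recipient address", "body": "plain-text message"},
    output="A confirmation ID for the sent message.",
    when_to_use="Only after the recipient has been confirmed as a prospect.",
)
```

Keeping the description next to the tool's code makes it harder for the two to drift apart as the tool changes.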
9. Add support for new tools slowly and selectively
Despite the importance of having a large number of tools available for an agent to use, each tool will have its own nuances, and the way you invoke and process responses from your tools will be highly specific to each tool. Rather than adding as many tools as you can as fast as you can, figure out the primary purpose of your agent and add only those tools that it needs to do its core job. With each tool, focus on adding as many guardrails, high-quality prompts, and error handling capabilities as you can so that your users are left with a clean and seamless experience. It's better to build a limited agent that can do a few things really well rather than a generalist agent that does a lot of things poorly.
The future of agents
As large language models become more and more sophisticated and widely used, we'll inevitably see a plethora of new agents for different verticals and industries. Companies will create autonomous agents designed to handle their internal operations, startups will create agents designed to be better personal assistants, and it's possible that in the future, agents will even talk to other agents to accomplish high-level goals and objectives. When starting off with building an agent, we recommend choosing a niche that you can dive into and solve well. Grow your agent around a single use case and ensure it can work well for that use case before expanding its capabilities. Finally, if you're interested in having your own branded agent without the hassle of creating everything from scratch, Locusive's technology can help you launch your own chatbot using reliable APIs and infrastructure for handling your users' queries. Just get in touch if you're interested in learning more.
---