Generative Intelligence #1: LLMs For Search

In our inaugural newsletter, we go over how LLMs can improve internal and external search for companies.
Shanif Dhanani

Welcome to our new weekly newsletter on all things Gen AI. Every week, we'll send out a new email that dives deep into a Gen AI topic that we think is important to know about.

---

Today, let's talk about how Large Language Models (LLMs), like ChatGPT, Bard, and Claude, are changing the game in data search.

Ever found yourself sifting through an ocean of files for that one elusive email? Or chasing down a colleague for an answer hidden deep in the recesses of a weeks-old chat? You're not alone. This is a common pain point, especially for businesses juggling huge amounts of data.

LLMs are here to change that. These AI models can process and analyze massive datasets, extracting the useful bits and making data access swift and efficient.

The Magic of LLMs

When you were a kid, you were probably told that reading was good for you. It turns out it’s good for AI too. LLMs like ChatGPT were trained on hundreds of billions of words from websites, library books, magazine articles, and even the Reddit posts you wish everyone had forgotten about. They’ve binge-read almost anything and everything you can think of.

All this reading has taught them to detect patterns and relationships in language, and they have become very good at understanding how words fit together. This is also what makes LLMs extremely good at handling natural language search queries.

Forget about finding the right keywords for your search engine or the three sub-menus in the app containing the report with the one data point you need. Just ask for what you want, and the LLM will draw on its internal knowledge or the contextual data you provide and return the most relevant information for your question.

Time to retire your search engines and complex interfaces?

We've all felt the pain of searching through a tome of obscure product documentation just to answer a client's question. Now, with retrieval augmented generation (RAG), you can simply provide the right docs to an LLM and just ask it something like "Can you briefly explain how the syncing feature works on the Xpass2 XSM?"

LLMs can take the headache out of finding a nuanced detail in large amounts of documentation. Given a question like that, the model quickly scans the relevant materials and provides a straightforward answer, saving you the trouble of decoding technical documents or waiting for a response from a busy colleague.

This can work with more than just documentation, too. For example, with an LLM that’s prompted correctly, you can simply ask a question like "What's the phone number for Jane Smith at OTK Corp?" or "What were our Q3 sales numbers for the Western region last year?" and get a response back in seconds.

Bringing AI Into Your Data Ecosystem

It may sound tempting to start dumping all your data into an LLM and let it do the heavy lifting. But hold on. You can't simply shovel in your data and expect the LLM to sort it out seamlessly. The old adage "garbage in, garbage out" holds true.

How do I get started?

Getting started takes some groundwork, and most of it is in your data.

First, your data is probably not very AI-friendly. Your documents, spreadsheets, and databases likely come in many different formats and are buried in impossible-to-remember folder structures or in suspect tools your company was supposed to decommission five years ago.

You have to put in the work to clean, structure, and enrich that data. For example, if you have important data in a Google Sheet, you’re going to need to reorganize it, transforming each row into a JSON or list-based record and storing those records in a vector database so the relevant ones can be retrieved as reference material later.
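To make that concrete, here is a minimal sketch of the row-to-record step, assuming the sheet has been exported as CSV. Everything in it is hypothetical: the sample contacts, the function names, and especially `toy_embed` and `ToyVectorStore`, which use simple word counts as a stand-in for a real embedding model and a real vector database.

```python
import csv
import io
import json
import math
import re
from collections import Counter

# Hypothetical CSV export of a contacts sheet (made-up sample data).
SHEET_CSV = """name,company,phone
Jane Smith,OTK Corp,555-0100
Raj Patel,Acme Inc,555-0199
"""


def rows_to_records(csv_text):
    """Turn each spreadsheet row into a JSON string keyed by column header."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [json.dumps(row, sort_keys=True) for row in reader]


def toy_embed(text):
    """Stand-in for a real embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a if token in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """In-memory stand-in for a vector database."""

    def __init__(self):
        self._items = []  # list of (vector, original text) pairs

    def add(self, text):
        self._items.append((toy_embed(text), text))

    def search(self, query, k=1):
        """Return the k stored records most similar to the query."""
        q = toy_embed(query)
        ranked = sorted(self._items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:k]]


records = rows_to_records(SHEET_CSV)
store = ToyVectorStore()
for record in records:
    store.add(record)

top = store.search("What's the phone number for Jane Smith at OTK Corp?")
print(top[0])
```

In a real setup you would likely swap the word-count vectors for embeddings from a model and the in-memory list for a managed vector database, but the shape of the work stays the same: one clean record per row, stored so it can be looked up by meaning later.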

Also, keep in mind that while LLMs are powerful, they're not flawless. They can make mistakes, or "hallucinate" an utterly wrong answer. That’s why, whenever you’re asking a fact-based question, you need to provide the LLM with the right context to answer it (known as retrieval augmented generation, or RAG), and instruct it to use only the context you’ve provided.
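As a rough illustration of that instruction, the sketch below assembles a prompt that restricts the model to the supplied context. The function name, the prompt wording, and the hard-coded snippet are all assumptions made for this example; the exact phrasing that works best will vary by model, and the resulting string would still need to be sent to whichever LLM you use.

```python
def build_grounded_prompt(question, context_snippets):
    """Assemble a prompt that tells the model to answer only from the given context."""
    # Number each snippet so the model (and a human reviewer) can see what was supplied.
    context = "\n".join(
        f"[{i}] {snippet}" for i, snippet in enumerate(context_snippets, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context does not contain the answer, reply \"I don't know.\"\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )


# Hypothetical retrieved snippet, hard-coded here for illustration.
snippets = ['{"company": "OTK Corp", "name": "Jane Smith", "phone": "555-0100"}']
prompt = build_grounded_prompt(
    "What's the phone number for Jane Smith at OTK Corp?", snippets
)
print(prompt)
```

Giving the model an explicit way out ("I don't know") is one common way to discourage it from inventing an answer when the context falls short.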

Generative AI chatbots are the future of data search

In a nutshell, generative AI models like ChatGPT are reshaping how we interact with data. They can simplify finding information from an array of sources, but they also need thoughtful data preparation and human oversight to perform optimally. Don't expect them to understand everything about your data instantly. Striking the right balance between AI capabilities and human expertise is key to leveraging these AI models efficiently.

That’s all for this week. Feel free to reach out to us with any questions or feedback. And if you want to take a look at how an AI-enabled search chatbot can work, make sure to try the Locusive chatbot here.

Until next time,

Alain, from Locusive