Today’s business leaders recognize that generative AI has great potential to help their businesses perform better, even if they are still exploring exactly how to apply it and what the ROI may ultimately be. Indeed, as companies turn their gen-AI prototypes into scaled solutions, they must weigh factors such as the technology’s cost, accuracy, and latency to determine its long-term value.
The growing landscape of large language models (LLMs), combined with the fear of making the wrong decision, leaves some businesses in a quandary. LLMs come in all shapes and sizes and serve different purposes, and the truth is, no single LLM will solve every problem. So how can a business determine which one is right for it?
Here, we discuss how to make the best selection so your business can use generative AI with confidence.
Choose your level of LLM sophistication — the sooner, the better
Some businesses are conservative in adopting an LLM, launching pilot projects and then waiting to see how the next generation might change their application of generative AI. Their reluctance to commit may be warranted: diving in too early and failing to test properly could mean big losses. But generative AI is a rapidly evolving technology, with new foundation models introduced seemingly every week, so being too conservative and continuing to wait for the technology to evolve may mean you never actually move forward.
With that said, there are three levels of sophistication companies may consider when it comes to generative AI. The first is a simple wrapper application around GPT, designed to interact with OpenAI’s language models and provide an interface for text completions and conversation-based interactions.
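To make that first level concrete, here is a minimal sketch of such a wrapper, assuming the official openai Python SDK, an OPENAI_API_KEY in the environment, and an illustrative model name:

```python
# Minimal "level one" wrapper sketch around OpenAI's chat API.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(question: str, history: list | None = None) -> str:
    """Send the conversation so far plus a new question; return the reply."""
    messages = (history or []) + [{"role": "user", "content": question}]
    response = client.chat.completions.create(
        model="gpt-4o",  # example model name; substitute your own choice
        messages=messages,
    )
    return response.choices[0].message.content

print(ask("Summarize our return policy in one sentence."))
```

Everything of value in an application like this lives in the prompting and the interface; the model itself is entirely third-party.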
The next level of sophistication is using an LLM with retrieval-augmented generation (RAG), which allows businesses to enhance the model’s output with proprietary and/or private data. GPT-4, for example, is a powerful LLM that can understand nuanced language and even perform some reasoning. However, it hasn’t been trained on any specific company’s data, which can lead to inaccuracies, inconsistencies, or irrelevant output (hallucinations). Companies can get around hallucinations with implementations like RAG, which let them merge a base model’s broad capabilities with data unique to their business. (It should be noted that large-context models like Claude 3 may eventually render RAG obsolete. And while many such models are still in their infancy, we all know how fast technology moves, so obsolescence may come sooner rather than later.)
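In practice, RAG comes down to retrieving the most relevant private data and placing it in the model’s context before asking the question. The sketch below shows that core loop under simplifying assumptions: it uses the openai SDK and numpy, placeholder documents, and an in-memory list where a production system would use a vector database, document chunking, and more careful prompt construction.

```python
# Simplified RAG sketch: embed documents, retrieve the best match,
# and feed it to the model as context. Documents are placeholders.
import numpy as np
from openai import OpenAI

client = OpenAI()

docs = [
    "Orders over $50 ship free within the continental US.",
    "Refunds are issued to the original payment method within 5 days.",
]

def embed(text: str) -> np.ndarray:
    resp = client.embeddings.create(model="text-embedding-3-small", input=text)
    return np.array(resp.data[0].embedding)

doc_vectors = [embed(d) for d in docs]

def answer(question: str) -> str:
    q = embed(question)
    # Pick the document most similar to the question (cosine similarity).
    scores = [q @ v / (np.linalg.norm(q) * np.linalg.norm(v)) for v in doc_vectors]
    context = docs[int(np.argmax(scores))]
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system", "content": f"Answer using this context: {context}"},
            {"role": "user", "content": question},
        ],
    )
    return resp.choices[0].message.content

print(answer("Do you offer free shipping?"))
```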
In the third level of generative AI sophistication, a company runs its own models. For example, a company may take an open-source model, fine-tune it with proprietary data, and run it on its own IT infrastructure instead of relying on third-party offerings like OpenAI’s. It should be noted that this third level requires the oversight of engineers trained in machine learning.
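For a sense of what running your own model involves at its simplest, the sketch below loads an open-source model with the Hugging Face transformers library. The model name is one example among many, fine-tuning on proprietary data is a separate and more involved step, and device_map="auto" assumes the accelerate package is installed.

```python
# "Level three" sketch: serving an open-source model on your own hardware.
from transformers import pipeline

generator = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.2",  # example open-source model
    device_map="auto",  # spread layers across available GPUs/CPU (needs accelerate)
)

result = generator(
    "Explain our SLA tiers to a new customer.",
    max_new_tokens=200,
    do_sample=False,
)
print(result[0]["generated_text"])
```

The trade-off is plain: no per-token vendor fees and full control over the weights, in exchange for owning the hardware, the serving stack, and the ML expertise.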
Apply the right LLM to the right use case
Given the options here and the differences in cost and capability, companies must determine exactly what they plan to accomplish with their LLM. For example, at an ecommerce company, human support agents are trained to intervene when a customer is at risk of abandoning their cart and help them decide to complete their purchase. A chat interface can deliver the same result at one-tenth the cost. In this case, it may be worth it for the ecommerce company to invest in running its own LLM with engineers to control it.
But bigger isn’t always cost-effective, or even needed. If you run a banking application, you can’t afford to make transaction errors. For this reason, you’ll want tighter control. Developing your own model, or taking an open-source model, fine-tuning it, applying heavily engineered input and output filters, and hosting it yourself, gives you all the control you need. And for those companies that simply want to optimize the quality of their customers’ experience, a well-performing LLM from a third-party vendor would work well.
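The heavily engineered input and output filters mentioned above can start out surprisingly simple. Here is a toy sketch; the blocked patterns and the llm callable are illustrative assumptions, and production guardrails are far more sophisticated than regular expressions.

```python
# Toy input/output filter around an arbitrary llm(prompt) -> str callable.
import re

BLOCKED_INPUT = [r"\b\d{16}\b", r"(?i)wire\s+transfer"]  # e.g., raw card numbers

def guarded_call(prompt: str, llm) -> str:
    # Input filter: refuse prompts that match any blocked pattern.
    for pattern in BLOCKED_INPUT:
        if re.search(pattern, prompt):
            return "Sorry, I can't help with that request."
    output = llm(prompt)
    # Output filter: redact anything resembling a long account number.
    return re.sub(r"\b\d{10,}\b", "[REDACTED]", output)
```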
A note about observability
Regardless of the chosen LLM, understanding how the model performs is key. As tech stacks become increasingly complex, homing in on performance issues that pop up in an LLM can prove challenging. Additionally, because the tech stack and the LLM interactions are so different, there are entirely new metrics that must be tracked, such as time-to-token, hallucinations, bias, and drift. That’s where observability comes into play, providing end-to-end visibility across the stack to ensure uptime, reliability, and operational efficiency. In short, adding an LLM without that visibility could undermine a company’s ability to measure the technology’s ROI.
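As one example of those new metrics, the sketch below measures time-to-token (often called time to first token) by streaming a response through the openai SDK. The model name is again illustrative, and where you report the number, such as to your observability platform, is up to you.

```python
# Measure time-to-first-token by streaming a chat completion.
import time
from openai import OpenAI

client = OpenAI()

def timed_completion(prompt: str):
    start = time.perf_counter()
    first_token_latency = None
    pieces = []
    stream = client.chat.completions.create(
        model="gpt-4o",  # example model name
        messages=[{"role": "user", "content": prompt}],
        stream=True,
    )
    for chunk in stream:
        if not chunk.choices:
            continue
        delta = chunk.choices[0].delta.content
        if delta:
            if first_token_latency is None:
                first_token_latency = time.perf_counter() - start
            pieces.append(delta)
    return "".join(pieces), first_token_latency

text, ttft = timed_completion("Hello!")
print(f"time to first token: {ttft:.3f}s")
```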
The generative AI journey is exciting and fast-paced, if not a bit daunting. Understanding your business’s needs and matching them to the right LLM will not only ensure short-term benefits but also lay the foundation for strong future business outcomes.