“The web is a collection of data, but it’s a mess,” says Exa co-founder and CEO Will Bryk. “There’s a Joe Rogan video over here, an Atlantic article over there. There’s no organization. But the dream is for the web to act like a database.”
Websets is aimed at power users who need to find things that other search engines struggle with, such as types of people or companies. Ask it for “startups making futuristic hardware” and you get a list of specific companies, hundreds long, rather than links to web pages that happen to mention those terms. Google can’t do that, says Bryk: “There are a lot of valuable use cases for investors or recruiters or really anyone who wants any sort of data set from the web.”
Things have moved fast since MIT Technology Review broke the news in 2021 that Google researchers were exploring the use of large language models in a new kind of search engine. The idea soon drew fierce criticism, but tech companies took little notice. Three years later, giants like Google and Microsoft are jostling for a slice of this hot new trend alongside a slew of buzzy upstarts such as Perplexity and OpenAI, which launched ChatGPT Search in October.
Exa isn’t (yet) trying to outdo any of these companies. Instead, it is proposing something new. Most other search firms wrap large language models around existing search engines, using the models to analyze a user’s query and then summarize the results. But the search engines themselves haven’t changed much. Perplexity, for example, still routes its queries through Google Search or Bing. Think of today’s AI search engines as a sandwich with fresh bread but a stale filling.
More than keywords
Exa gives users familiar lists of links but uses the technology behind large language models to reinvent how search itself works. Here’s the basic idea: Google works by crawling the web and building a vast index of keywords that then get matched to users’ queries. Exa also crawls the web, but it encodes the contents of web pages into a format known as embeddings, which large language models can process.
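To make the contrast concrete, here is a minimal sketch of keyword-index lookup, assuming a tiny made-up corpus (the URLs and page texts are hypothetical, and Google’s real index is vastly more sophisticated). Each word maps to the set of pages containing it, and a query returns only pages that contain every query word.

```python
from collections import defaultdict

# Hypothetical mini-corpus standing in for crawled pages (invented for illustration).
PAGES = {
    "example.com/robots": "startup building humanoid robots",
    "example.com/news": "hardware news and reviews",
    "example.com/chips": "startup designing photonic chips hardware",
}

def build_keyword_index(pages):
    """Map each lowercase word to the set of page URLs that contain it (an inverted index)."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word].add(url)
    return index

def keyword_search(index, query):
    """Return pages containing every query word (a simple boolean AND match)."""
    words = query.lower().split()
    if not words:
        return set()
    # Copy the first posting set so the index itself is never mutated.
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return results

index = build_keyword_index(PAGES)
print(sorted(keyword_search(index, "startup hardware")))
```

The robotics startup page is missed because it never uses the word “hardware”, and a query containing “futuristic” matches nothing at all. That gap between words and meaning is what embeddings aim to close.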
Embeddings turn words into numbers in such a way that words with similar meanings become numbers with similar values. In effect, this lets Exa capture the meaning of the text on web pages, not just the keywords.
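A rough illustration of “similar meanings, similar values”: the sketch below compares hand-picked three-dimensional vectors with cosine similarity, a common way to measure how close two embeddings are. The vectors are invented for illustration; real embeddings come from a trained model and have hundreds or thousands of dimensions, and nothing here reflects Exa’s actual model.

```python
import math

# Invented 3-d vectors for illustration only; real embeddings are model outputs.
EMBEDDINGS = {
    "startup": [0.9, 0.1, 0.0],
    "company": [0.8, 0.2, 0.1],
    "banana":  [0.0, 0.1, 0.9],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors: near 1.0 means pointing the same way."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

print(cosine_similarity(EMBEDDINGS["startup"], EMBEDDINGS["company"]))  # high
print(cosine_similarity(EMBEDDINGS["startup"], EMBEDDINGS["banana"]))   # low
```

Cosine similarity ignores vector length and compares only direction, which is why it is a popular choice for comparing embeddings.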
Large language models use embeddings to predict the next words in a sentence. Exa’s search engine predicts the next link. Type “startups making futuristic hardware” and the model will come up with (real) links that might follow that phrase.
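Exa describes its model as predicting links; the sketch below swaps in a simpler, widely used stand-in, ranking pages by the cosine similarity between a query embedding and precomputed page embeddings. All vectors and URLs are hypothetical, and the query vector is simply assumed to be what some embedding model might produce for “startups making futuristic hardware”.

```python
import math

# Hypothetical page embeddings (invented 3-d vectors; URLs are placeholders).
PAGE_EMBEDDINGS = {
    "example.com/humanoid-robots": [0.9, 0.3, 0.1],
    "example.com/photonic-chips":  [0.8, 0.4, 0.0],
    "example.com/banana-bread":    [0.0, 0.1, 0.9],
}

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def top_links(query_embedding, pages, k=2):
    """Rank page URLs by similarity to the query embedding and return the best k."""
    ranked = sorted(
        pages,
        key=lambda url: cosine_similarity(query_embedding, pages[url]),
        reverse=True,
    )
    return ranked[:k]

# Assumed embedding for "startups making futuristic hardware" (made up for this sketch).
query = [0.85, 0.35, 0.05]
print(top_links(query, PAGE_EMBEDDINGS))
```

Unlike the keyword lookup, nothing here requires a page to contain the word “hardware”; pages are returned because their vectors point in a similar direction to the query’s.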
But Exa’s approach comes at a cost. Encoding pages rather than indexing keywords is slow and expensive. Exa has encoded several billion web pages, says Bryk. That’s a pittance next to Google, which has indexed around a trillion. But Bryk doesn’t see that as a problem: “You don’t have to embed the whole web to be useful,” he says. (Fun fact: “exa” means a 1 followed by 18 0s and “googol” means a 1 followed by 100 0s.)