Our take on text-2-sql

TL;DR: Build AI systems that tolerate mistakes.

Most people don't know SQL, and that's a problem: it stops too many people in a business setting from interacting with data. While for decades we've somewhat overcome this with visual interfaces, it's no surprise that the rise of large language models (LLMs) has caused a sudden flurry of tools that promise to take your question and turn it into SQL. The promise of a computer answering "What is the total number of orders?" seems within reach.

The unfortunate truth is that these tools don't really work, even though many have valiantly tried (expect half your queries to be wrong). Worse yet, we don't think they will work for a very long time. The Delphi team's work (the team is now part of Cube) shows that the missing ingredient isn't the "cleverness of the model" or the amount of data, it's the context of the question. The Delphi team made text-2-sql work by first building a semantic layer that the LLM could lean on. When an analyst takes a request, they don't just write a SQL query: they ask questions, they clarify, they get to know the data and the business. This is what the semantic layer captures.

So while text-2-sql is technically possible today, the ugly truth is that it requires analysts to build in the context first - a hard task when you are moving at a fast pace. While you could argue that shifting engineers' work from answering "analyst" questions to building a semantic model is a positive shift, it's not quite the panacea we were hoping for.

Example

Let's look at a simple example: suppose you manage the bank accounts for a company, and you want to ask "What's my current exposure to bad interest rates?" to make sure the company is getting the best rates. This question is a simple one, but it's not easy to answer. First, you must have the following information in your database:

  • Accounts, with columns: account_id, account_name, account_type
  • Account transactions, with columns: account_id, balance, date, transaction_type, value (positive or negative)
  • Rates of savings accounts, with columns: account_id, rate, date_of_rate

Second, you need to go through multiple steps such as:

  1. Getting the latest balance for each of these accounts
  2. Getting the latest rate for each of these accounts
  3. Joining the three tables together (sketched in SQL below)

[Image: First model]
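
To make those steps concrete, here's a rough SQL sketch of the query. It's a minimal illustration, assuming snake_cased versions of the table names above and a database with standard window functions; your schema and dialect may differ.

```sql
-- A minimal sketch of steps 1-3. Table names (accounts,
-- account_transactions, rates_of_saving_accounts) are assumed
-- snake_cased versions of the tables listed above.
WITH latest_balance AS (
    -- Step 1: latest balance per account, taken from the most recent transaction
    SELECT account_id, balance, date
    FROM (
        SELECT
            account_id,
            balance,
            date,
            ROW_NUMBER() OVER (PARTITION BY account_id ORDER BY date DESC) AS rn
        FROM account_transactions
    ) t
    WHERE rn = 1
),
latest_rate AS (
    -- Step 2: latest rate per account
    SELECT account_id, rate
    FROM (
        SELECT
            account_id,
            rate,
            ROW_NUMBER() OVER (PARTITION BY account_id ORDER BY date_of_rate DESC) AS rn
        FROM rates_of_saving_accounts
    ) r
    WHERE rn = 1
)
-- Step 3: join the three tables together
SELECT
    a.account_id,
    a.account_name,
    lb.balance,
    lr.rate
FROM accounts a
JOIN latest_balance lb ON lb.account_id = a.account_id
LEFT JOIN latest_rate lr ON lr.account_id = a.account_id;
```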

While the above is relatively simple, you still haven't answered the final bit of the question: what does "exposure to bad interest rates" mean? Does it mean that the rate is below a certain threshold? Does it mean that the rate is below the average rate? Does it mean that the rate is below the rate of inflation? It's no wonder the LLM needs help. All of these steps and all of this ambiguity are room for a model to be wrong, so even though we began with a simple question, it's hard for a model to reliably answer it correctly.
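
Just to show how much of the answer hinges on that definition, here's one purely illustrative reading: "bad" means the rate sits below a benchmark. The 4% figure is an invented assumption, and the query reuses the latest_balance and latest_rate CTEs from the sketch above in place of its final SELECT.

```sql
-- One possible reading, swapped in as the final SELECT of the sketch
-- above (it reuses the latest_balance and latest_rate CTEs). The 4%
-- benchmark is an invented assumption, not a definition from the post.
SELECT
    a.account_id,
    a.account_name,
    lb.balance AS exposed_balance
FROM accounts a
JOIN latest_balance lb ON lb.account_id = a.account_id
JOIN latest_rate lr ON lr.account_id = a.account_id
WHERE lr.rate < 0.04;   -- "bad" = below a hypothetical 4% benchmark
```

Change the definition and the query changes with it - which is exactly the decision an LLM cannot make for you.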

Text-straight-2-bad-sql

A lot of text-2-sql tools tackle the problem in a "text-straight-2-sql" way: take any question and return SQL. This approach is the "Shangri-La", "Atlantis", "El Dorado" of data tools: a question straight to an answer. It's a beautiful idea, but unfortunately it's fiction. They all behave something like the following tool (excuse the comically bad UX):

[Image: Simple interface]

You put the question into your black box and all you really can do is hope you get the right answer. It's easy to think of the ways this can go wrong:

  1. The model could misunderstand the question and give you bad SQL.
  2. The model could output invalid SQL that doesn't run.
  3. The model could output a subtly wrong answer that you can't tell is wrong.
  4. The answer could be slightly off, so you keep adjusting your question until the model gets it right.
  5. The SQL output could be so long and hard to understand that you can't tell whether it's correct or not.

Ultimately, if any of the above happens, you're kind of stuffed. The tool doesn't allow for failures. So what is the solution?

DAGs and chain of thought

Before we get into the solution, let's clarify what a DAG is: a directed acyclic graph, which is a fancy way of saying a graph of nodes and directed edges with no loops. It's a way of representing a series of steps. Remember the flow diagram from above?

[Image: Simple interface]
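
To make the idea concrete, here's the example query from earlier refactored into DAG nodes. This is a minimal sketch using plain SQL views with made-up names, not Quary's own model format: each view is a node, and each reference to another view or table is a directed edge.

```sql
-- Each view below is a node in the DAG; each FROM reference is an edge.
-- Plain SQL views with illustrative names, not Quary's model files.

CREATE VIEW latest_balance AS          -- node 1: depends on account_transactions
SELECT account_id, balance, date
FROM (
    SELECT account_id, balance, date,
           ROW_NUMBER() OVER (PARTITION BY account_id ORDER BY date DESC) AS rn
    FROM account_transactions
) t
WHERE rn = 1;

CREATE VIEW latest_rate AS             -- node 2: depends on rates_of_saving_accounts
SELECT account_id, rate
FROM (
    SELECT account_id, rate,
           ROW_NUMBER() OVER (PARTITION BY account_id ORDER BY date_of_rate DESC) AS rn
    FROM rates_of_saving_accounts
) r
WHERE rn = 1;

CREATE VIEW account_positions AS       -- node 3: joins nodes 1 and 2 with accounts
SELECT a.account_id, a.account_name, lb.balance, lr.rate
FROM accounts a
JOIN latest_balance lb ON lb.account_id = a.account_id
LEFT JOIN latest_rate lr ON lr.account_id = a.account_id;
```

Each node can now be inspected, tested, and reused on its own, which is what the rest of this section relies on.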

Quary solves the text-2-sql problem by firstly being a platform to build these DAGs, and secondly by accelerating that process with AI. The DAG is core, the AI is the cherry on top. That's the key difference. While you could try to ask "What's my current exposure to bad interest rates?" and hope for a good response, it's the ability to overcome AI's shortcomings by breaking down the problem that is critical. AI is fallible and that's why we're betting on:

Building AI systems that tolerate incorrect answers.

In Quary, you can still just give the tool the question and hope for the right answer. Unlike other tools though, if the AI fails, you have the tools and a clear route to fix it. All the problems that we listed above can be addressed. Crucially, the approach forces you to think through the problem step-by-step, which often surfaces little nuggets of information along the way. At each step, you can double-check the data, see the transformation, and see the data that is being used to generate the answer. Each step is verifiable, so you can give yourself extra confidence that the final answer is correct.

In the LLM world, there is a concept called chain-of-thought prompting: you ask the model to think through the problem step-by-step rather than just asking for the final answer, and by doing so, models perform better. We see this DAG approach as chain-of-thought prompting on steroids. Not only does the model explain each step, you can see each step, and Quary even generates tests for each step so that each layer of the transformation is verifiable. Suddenly, you have a huge amount of confidence in the final answer, and you've probably learned something along the way.
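
For a flavour of what such a test might check, here's a hand-written SQL assertion (not Quary's generated test syntax) against the latest_balance node from the sketch above:

```sql
-- Sanity check for the latest_balance node: there should be exactly
-- one row per account. Any rows returned mean the step is broken.
SELECT account_id, COUNT(*) AS row_count
FROM latest_balance
GROUP BY account_id
HAVING COUNT(*) > 1;
```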

In addition to having answered a single question, you have built part of a DAG of knowledge that is deployable and reusable. Rather than starting from scratch on the next question, you can reuse parts of what you have built. For example, while we built a model of exposure to bad interest rates, it's easy to see how you could reuse part of the same DAG to calculate the total money a business is holding by summing the latest account balances.
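
Using the names from the sketches above, that second question becomes a one-liner on top of the existing latest_balance node:

```sql
-- Reusing the latest_balance node for a new question:
-- how much money is the business holding in total?
SELECT SUM(balance) AS total_holdings
FROM latest_balance;
```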

[Image: Quary AI action]

It's with this in mind that we are announcing three new AI features in Quary: Generate, Explain, and Edit.

Generate

Generate, as the name implies, lets you generate a model from an English prompt. It's a powerful feature that should let you write queries faster and more accurately, and we're excited to see how it will be used. See the video below for a demo of how we generated, and saved, a latest_account_value model from an English prompt.

Explain

Explain, plainly, lets you ask for an explanation of a model. By referencing a model and asking a question, you get Quary to explain the transformation that the model performs. This is a powerful feature for understanding the transformations happening in your DAG. See the video below for a demo of how we explained the latest_account_value model.

Edit

Edit is also hopefully self-explanatory: it allows you to edit a model, which makes it easy to fix a model that is incorrect. See the video below for a demo of how we edited the latest_account_value model to add a human-readable timestamp to the output.
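
For a rough idea of what that kind of edit looks like, here's a sketch against the latest_balance-style node from earlier. The TO_CHAR call is Postgres-flavoured and the column names are assumptions; the demo's actual latest_account_value model may differ.

```sql
-- Before the edit: SELECT account_id, balance, date FROM latest_balance;
-- After the edit: a human-readable timestamp alongside the raw date.
SELECT
    account_id,
    balance,
    date,
    TO_CHAR(date, 'YYYY-MM-DD HH24:MI') AS date_readable   -- Postgres-style formatting
FROM latest_balance;
```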

If you have GitHub Copilot, you can try all these features for free in our extension. Let us know your feedback and thoughts (and here's to more model building)!