Configuration
Your project is configured with a single YAML file, named quary.yaml and placed at the root of the project. The file defines which database to use and how to connect to it. The sections below describe the options for each supported database.
BigQuery
BigQuery is a cloud-based database operated by Google as part of the Google Cloud Platform. It is a good choice for large projects and for production. To use BigQuery, create a project in the Google Cloud Platform and enable BigQuery for it. You can then configure the connection to BigQuery with the following configuration.
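An illustrative quary.yaml entry is sketched below; the `projectId` and `datasetId` field names and their values are assumptions, so check the Quary reference for the exact schema.

```yaml
# quary.yaml — BigQuery connection (field names are illustrative)
bigQuery:
  projectId: my-gcp-project # Google Cloud project that hosts BigQuery
  datasetId: my_dataset     # dataset that Quary builds models into
```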
Snowflake
Snowflake is a cloud-based data warehousing platform that offers scalable, on-demand storage and computing power. It separates storage and compute, so users can scale and pay for each independently.
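As a sketch, a Snowflake connection in quary.yaml might look like the following; every field name and value here is an assumption, so consult the Quary reference for the exact keys.

```yaml
# quary.yaml — Snowflake connection (illustrative field names)
snowflake:
  accountUrl: https://myaccount.snowflakecomputing.com
  warehouse: COMPUTE_WH # compute is provisioned per warehouse
  database: ANALYTICS
  schema: PUBLIC
  role: ANALYST
```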
ClickHouse
ClickHouse is a column-oriented database management system originally developed by Yandex. It is designed for OLAP workloads and is a good choice for large-scale data storage and analysis. Unlike the other databases here, ClickHouse has no concept of a schema, so you only need to specify the database.
All connection details are passed as environment variables. To connect to ClickHouse you need to set the following environment variables: `CLICKHOUSE_HOST`, `CLICKHOUSE_PORT`, `CLICKHOUSE_USER` and `CLICKHOUSE_PASSWORD`.
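Since the host, port and credentials come from the environment variables above, only the database needs to appear in quary.yaml. A minimal sketch (the `clickhouse` key and `database` field name are assumptions):

```yaml
# quary.yaml — ClickHouse connection (illustrative)
clickhouse:
  database: default # host/port/credentials come from the CLICKHOUSE_* env vars
```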
Postgres
Postgres is a popular open-source database. It is a good choice for small to medium projects and for production.
All connection details are passed as environment variables. The following environment variables are used to connect to the database: `PGHOST`, `PGPORT` (defaults to 5432 if not set), `PGUSER`, `PGPASSWORD` and `PGDATABASE`. They are the same as those used by the `psql` command line tool.
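Because the connection details come from the `PG*` environment variables, the quary.yaml entry itself can stay small. A sketch (the `schema` field name is an assumption):

```yaml
# quary.yaml — Postgres connection (illustrative)
postgres:
  schema: public # host/port/credentials come from PGHOST, PGUSER, etc.
```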
Model Configuration
For Postgres, models can also be configured to better suit your needs. There are two main configurations available: `unlogged` and `indexes`.
Unlogged Table
Details about unlogged tables can be found in the Postgres documentation. Unlogged tables are not written to the write-ahead log, which makes them faster to write to. They are not crash-safe, however, so there are risks to consider.
They can be configured as follows:
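A sketch of an unlogged-table configuration in a model's schema file; the exact property names (`materialization`, `database_config`, `unlogged`) are assumptions, so check the Quary reference.

```yaml
# model schema file — unlogged table (illustrative property names)
models:
  - name: staging_events
    materialization: table
    database_config:
      unlogged: true # skip the write-ahead log; faster writes, not crash-safe
```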
Indexes
Indexes can be added to tables to speed up queries. They can be configured as follows: the `columns` field is a list of columns to index, and the `unique` field is a boolean that specifies whether the index should be unique.
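A sketch of an index configuration in a model's schema file, using the `columns` and `unique` fields described above; the surrounding property names (`materialization`, `database_config`, `indexes`) are assumptions.

```yaml
# model schema file — indexes (illustrative property names)
models:
  - name: orders
    materialization: table
    database_config:
      indexes:
        - columns:       # columns to index, in order
            - order_id
          unique: true   # enforce uniqueness on order_id
        - columns:
            - customer_id
            - created_at
          unique: false
```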
Redshift
Redshift is a cloud-based data warehousing service provided by Amazon Web Services (AWS). It is designed for large-scale data storage and analysis.
All connection details are passed as environment variables. The following environment variables are used to connect to the database:
- `RSHOST`
- `RSUSER`
- `RSPASSWORD`
- `RSDATABASE`
- `RSPORT` (defaults to 5432 if not set)
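As with Postgres, the connection details come from environment variables, so the quary.yaml entry can be minimal. A sketch (the `schema` field name is an assumption):

```yaml
# quary.yaml — Redshift connection (illustrative)
redshift:
  schema: public # host/port/credentials come from the RS* env vars
```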
DuckDB
DuckDB is an in-process SQL database with a feature-rich SQL dialect; it is a great way to query data locally on your machine.
File Based
The database is stored in a file on the disk and is persisted between runs. In this case the path to the file is specified, and each run creates the file if it does not exist.
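A sketch of a file-based DuckDB configuration; the `duckdb` key and its field names are assumptions.

```yaml
# quary.yaml — file-based DuckDB (illustrative field names)
duckdb:
  path: ./database.duckdb # created on first run if it does not exist
  schema: main
```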
In Memory
The database is created in memory and is lost when the application is closed. It is faster than the file-based version and is good for rapid testing.
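An in-memory variant might be configured as follows; the `duckdbInMemory` key name is an assumption.

```yaml
# quary.yaml — in-memory DuckDB (illustrative)
duckdbInMemory:
  schema: main # data is lost when the process exits
```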
SQLite
SQLite is a file-based database that runs locally. It is a good choice for small projects and for testing. It can be operated in two modes.
In Memory
The database is created in memory and is lost when the application is closed. It is faster than the file-based version and is good for rapid testing.
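An in-memory SQLite database might be configured as follows; the `sqliteInMemory` key name is an assumption.

```yaml
# quary.yaml — in-memory SQLite (illustrative)
sqliteInMemory: {} # no options needed; data is lost on exit
```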
File Based
The other option is a file-based database, where the database is stored in a file on disk and persisted between runs. The path to the file is specified in the configuration, and each run creates the file if it does not exist. This is useful for testing, for small projects, and when you want to use the database from different applications.
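A sketch of a file-based SQLite configuration; the `sqlite` key and `path` field name are assumptions.

```yaml
# quary.yaml — file-based SQLite (illustrative)
sqlite:
  path: ./database.sqlite # created on first run if it does not exist
```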
Environment variables
To make your setup easier, you can use `.env` files to set environment variables. By default, Quary will pick up variables inside of `.env`. You can specify other `.env` files with the `-e` flag. For example, the following will read both `.env` files.
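A hypothetical invocation (the command name and flag placement should be checked against the CLI help, and `.env.production` is a made-up filename):

```shell
# Reads variables from both .env (the default) and .env.production (via -e)
quary build -e .env.production
```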
Pre-run scripts
Pre-run scripts allow you to execute arbitrary SQL code before the main Quary run. These scripts are useful for various setup tasks, such as:
- Defining user-defined functions (UDFs)
- Installing extensions
- Setting up temporary tables or views
- Configuring session parameters
To create pre-run scripts, create one or more SQL files containing the desired pre-run code and specify their paths in your quary.yaml configuration file as follows.
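A sketch of how the script paths might be listed; the `preRunScripts` key name and the file paths are assumptions.

```yaml
# quary.yaml — pre-run scripts (key name is illustrative)
preRunScripts:
  - ./scripts/create_udfs.sql        # user-defined functions
  - ./scripts/install_extensions.sql # extensions, session parameters, etc.
```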