Configuration

Configuring your project is done with a simple YAML file, named quary.yaml, placed at the root of the project. The file defines which database to use and how to connect to it. The following are the available options for configuring the database.

BigQuery

BigQuery is a cloud-based database operated by Google as part of the Google Cloud Platform. It is a good choice for large projects and for production. To use BigQuery, create a project in the Google Cloud Platform and enable BigQuery for that project. You can then configure the connection to BigQuery as follows.

bigQuery:
  projectId: my-project-id
  datasetId: my-dataset-id

Snowflake

Snowflake is a cloud-based data warehousing platform that offers scalable, on-demand storage and computing power. It separates storage and compute, enabling users to scale and pay for each independently.

snowflake:
  accountUrl: account-url
  clientId: client-id
  clientSecret: client-secret
  role: your-quary-role
  database: database-name
  schema: schema-name
  warehouse: warehouse-name

Clickhouse

Clickhouse is a column-oriented database management system developed by Yandex. It is designed for OLAP workloads and is a good choice for large-scale data storage and analysis. Unlike the other databases, Clickhouse has no concept of a schema, so you only need to specify the database.

clickhouse:
  database: transform

All connection details are passed as environment variables. To connect to Clickhouse, set the following environment variables: CLICKHOUSE_HOST, CLICKHOUSE_PORT, CLICKHOUSE_USER and CLICKHOUSE_PASSWORD.
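
For example, these can be placed in a .env file, as described in the Environment variables section below. All values here are illustrative placeholders; use whatever host, port and credentials your Clickhouse instance actually exposes.

CLICKHOUSE_HOST=localhost
CLICKHOUSE_PORT=8123
CLICKHOUSE_USER=default
CLICKHOUSE_PASSWORD=my-secret-password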

Postgres

Postgres is a popular open-source database. It is a good choice for small to medium projects and for production.

postgres:
  schema: transform

All connection details are passed as environment variables. The following environment variables are used to connect to the database: PGHOST, PGPORT (defaults to 5432 if not set), PGUSER, PGPASSWORD and PGDATABASE. They are the same variables used by the psql command line tool.
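
For example, they can be exported in the shell before running Quary. The values below are illustrative placeholders only.

export PGHOST=localhost
export PGPORT=5432
export PGUSER=quary_user
export PGPASSWORD=my-secret-password
export PGDATABASE=analytics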

Model Configuration

For Postgres, models can also be configured to better suit your needs. There are two main configurations available: unlogged tables and indexes.

Unlogged Table

Unlogged tables are not written to the write-ahead log, which makes them faster to write to. However, they are not crash-safe, so there are risks to consider. Details about unlogged tables can be found in the Postgres documentation.

They can be configured as follows:

models:
  - name: stg_orders
    materialization: table
    database_config:
      unlogged: true

Indexes

Indexes can be added to tables to speed up queries. They can be configured as follows: the columns field is a list of columns to index, and the unique field is a boolean that specifies whether the index should be unique.

models:
  - name: stg_orders
    materialization: table
    database_config:
      indexes:
        - columns: [order_id]
          unique: true
        - columns:
          - order_date
          - order_id

Redshift

Redshift is a cloud-based data warehousing service provided by Amazon Web Services (AWS). It is designed for large-scale data storage and analysis.

redshift:
  schema: your-schema

All connection details are passed as environment variables. The following environment variables are used to connect to the database:

  • RSHOST
  • RSUSER
  • RSPASSWORD
  • RSDATABASE
  • RSPORT (defaults to 5432 if not set)
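
As with Postgres, these can be exported in the shell or set in a .env file. The values below are illustrative placeholders.

RSHOST=my-cluster.example.eu-west-1.redshift.amazonaws.com
RSUSER=quary_user
RSPASSWORD=my-secret-password
RSDATABASE=analytics
RSPORT=5439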

DuckDb

DuckDb is an in-process SQL database with a feature-rich SQL dialect. It is a great way to query data locally on your machine.

File Based

The database is stored in a file on the disk and is persisted between runs. In this case the path to the file is specified, and each run creates the file if it does not exist.

duckdb:
  path: database.db

In Memory

The database is created in memory and is lost when the application is closed. It is faster than the file-based version and is good for rapid testing.

duckdbInMemory: {}

SQLite

SQLite is a file-based database that runs locally. It is a good choice for small projects and for testing. It can be operated in two ways.

In Memory

The database is created in memory and is lost when the application is closed. It is faster than the file-based version and is good for rapid testing.

sqliteInMemory: {}

File Based

The other option is to use a file-based database, where the database is stored in a file on the disk and is persisted between runs. In this case the path to the file is specified, and each run creates the file if it does not exist. It is useful for testing, for small projects, and when you want to use the database in different applications.

sqlite:
  path: db.sqlite

Environment variables

To make your setup easier, you can use .env files to set environment variables. By default, Quary will pick up variables inside .env. You can specify other .env files with the -e flag. For example, the following will read both .env files.

quary init -e='.env' -e='local.env'
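
A .env file is a plain text file of KEY=value pairs, one per line. For example, with the Postgres configuration above (all values illustrative), .env might hold the shared connection details:

PGHOST=localhost
PGPORT=5432
PGDATABASE=analytics

while local.env holds user-specific credentials:

PGUSER=quary_user
PGPASSWORD=my-secret-password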

Pre-run scripts

Pre-run scripts allow you to execute arbitrary SQL code before the main Quary run. These scripts are useful for various setup tasks, such as:

  • Defining user-defined functions (UDFs)
  • Installing extensions
  • Setting up temporary tables or views
  • Configuring session parameters

To create pre-run scripts, create one or more SQL files containing the desired pre-run code and specify the paths to these files in your quary.yaml configuration file as follows.

duckdbInMemory: {}
pre_run_scripts:
  - path/to/script1.sql
  - path/to/script2.sql
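
As a sketch of what such a script might contain for the DuckDB configuration above, the following installs an extension and defines a simple macro (DuckDb's lightweight form of a UDF). The file contents and macro name are illustrative, not part of Quary itself.

-- path/to/script1.sql
-- Install and load the httpfs extension so models can read remote files.
INSTALL httpfs;
LOAD httpfs;

-- Define a reusable scalar macro to use in downstream models.
CREATE OR REPLACE MACRO cents_to_dollars(cents) AS cents / 100.0;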