Rewriting SQL Across 15+ Database Dialects: Multi-Tenant Embedded Analytics, Layer by Layer

Sapna Singh kathik
Software Engineer @ Databrain
Published On: April 16, 2026

Key Takeaways

  1. The SQL in the UI is not the SQL the warehouse runs - embedded analytics rewrites queries for tenancy, filters, dates, quoting, and drill-downs before execution.
  2. Tenant isolation is a rewrite problem, not just a flag: database-per-tenant vs shared tables changes where substitution happens and what can break (types, admins, qualified names).
  3. Real RLS rarely reduces to "append AND" - joins, CTEs, aliases, and custom SQL force scoped predicates (often subquery wraps) on the right table.
  4. Dates and timezones are the highest-variance dialect surface - truncation, literals, session TZ, and engine-specific functions are where “works in dev, wrong in prod” hides.
  5. There is no dialect-agnostic rewrite layer that stays safe - production systems thread databaseType through every stage, validate outputs, and test per engine so one fix doesn’t silently break another.

Every query in a multi-tenant embedded analytics platform is rewritten before it touches the database. Tenant filters, date ranges, timezone conversions, and identifier quoting all change depending on whether the target is Postgres, Snowflake, BigQuery, or MySQL. We've been doing this across 15+ dialects for years. Here's what breaks.


The Query You See Is Not the Query That Runs

When a user opens a dashboard in an embedded analytics product, the system doesn't just forward their metric's SQL to the database. Every query goes through a multi-stage rewriting pipeline that:

  1. Substitutes tenant placeholders so Customer A never sees Customer B's data
  2. Injects row-level security predicates based on the user's token claims
  3. Converts date and time expressions to match the target database's syntax
  4. Fixes identifier quoting because Postgres uses double quotes, MySQL uses backticks, and MSSQL uses square brackets
  5. Rewrites GROUP BY and WHERE clauses for drill-down interactions

Each of these stages must produce valid SQL for whichever database the customer happens to use. A rewriting rule that works perfectly on PostgreSQL can generate a syntax error on BigQuery. A date filter that's correct on Snowflake can return wrong results on Trino.

This is the core engineering challenge of multi-database embedded analytics, and it's the reason most in-house teams underestimate the effort by 6-12 months.
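The five stages above compose naturally into a pipeline that threads the target dialect through every step. A minimal Python sketch (the `RewritePipeline` name and the toy stage functions are illustrative, not any product's API):

```python
from dataclasses import dataclass
from typing import Callable, List

# A rewrite stage takes (sql, database_type) and returns rewritten SQL.
RewriteStage = Callable[[str, str], str]

@dataclass
class RewritePipeline:
    stages: List[RewriteStage]

    def run(self, sql: str, database_type: str) -> str:
        # Thread the target dialect through every stage; no stage may
        # generate SQL without knowing which engine it targets.
        for stage in self.stages:
            sql = stage(sql, database_type)
        return sql

# Toy stages for illustration only -- real stages parse, not string-replace.
def substitute_tenant(sql: str, db: str) -> str:
    return sql.replace("{{DATABASE_NAME}}", "acme_prod")

def quote_identifiers(sql: str, db: str) -> str:
    return sql  # dialect-aware quoting would branch on `db` here

pipeline = RewritePipeline(stages=[substitute_tenant, quote_identifiers])
print(pipeline.run("SELECT region FROM {{DATABASE_NAME}}.sales.orders", "postgres"))
```

Keeping each stage a pure function of `(sql, database_type)` is what makes the layers independently testable later on.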

Layer 1: Tenant Placeholder Substitution

Multi-tenant analytics platforms support two isolation models: database-per-tenant (each customer has their own database) and row-level filtering (all customers share a database, with a client_id column distinguishing rows).

The query author writes SQL with placeholders:

SQL
-- Database-per-tenant model
SELECT region, SUM(revenue)
FROM {{DATABASE_NAME}}.sales.orders
GROUP BY region

-- Row-level model
SELECT region, SUM(revenue)
FROM orders
WHERE org_id = 'client_id_variable'
GROUP BY region

At runtime, the rewriting engine replaces these placeholders with actual values from the user's authentication token.

This sounds trivial. It isn't. Consider these edge cases:

Numeric vs string client IDs. If the customer's tenant column is an integer, the replacement must produce WHERE org_id = 42, not WHERE org_id = '42'. Get this wrong and PostgreSQL silently casts (maybe), MySQL silently casts (probably), and BigQuery throws a type mismatch error.

The "all clients" case. Admin users who should see all tenants can't just have the placeholder removed -- that would leave a dangling WHERE org_id = clause. The system rewrites the equality check to an existence check: WHERE org_id IS NOT NULL. This preserves the query structure while removing the filter.

Database name injection. In database-per-tenant mode, the placeholder appears in the FROM clause, not in a WHERE filter. The replacement must handle schema-qualified references like {{DATABASE_NAME}}.schema.table across databases where the syntax for multi-part names differs (Snowflake uses dots, BigQuery uses dots with backtick-quoted projects, MSSQL uses dots with bracket-quoted databases).
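The edge cases above can be captured in one substitution helper. This is a hypothetical sketch (the `substitute_tenant_filter` name and the string-replace approach are for illustration; a production engine works on parsed SQL):

```python
def substitute_tenant_filter(sql: str, client_id, is_admin: bool = False) -> str:
    """Replace the client_id_variable placeholder with a correctly typed literal."""
    placeholder = "'client_id_variable'"
    if is_admin:
        # Admins see all tenants: rewrite the equality to an existence
        # check so the WHERE clause stays structurally valid.
        return sql.replace(f"org_id = {placeholder}", "org_id IS NOT NULL")
    if isinstance(client_id, int):
        # Integer tenant columns get an unquoted literal -- quoting it
        # would trigger a type mismatch on BigQuery.
        return sql.replace(placeholder, str(client_id))
    # String IDs: escape embedded single quotes by doubling them
    # (the rule on most engines; BigQuery differs, see Layer 4 notes).
    return sql.replace(placeholder, "'" + str(client_id).replace("'", "''") + "'")

sql = "SELECT region, SUM(revenue) FROM orders WHERE org_id = 'client_id_variable' GROUP BY region"
print(substitute_tenant_filter(sql, 42))
print(substitute_tenant_filter(sql, None, is_admin=True))
```

The admin branch illustrates the key trick: never delete the predicate, rewrite it into a tautology-safe existence check so downstream stages still see a well-formed WHERE clause.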

Layer 2: RLS Predicate Injection

Row-level security conditions are defined by the dashboard author and stored as rules: "filter the orders table where region matches the user's region claim." At query time, these rules must be injected into arbitrary SQL.

The naive approach is appending AND region = 'EMEA' to the WHERE clause. This works for simple queries:

SQL
-- Before injection
SELECT product, COUNT(*)
FROM orders
WHERE status = 'shipped'
GROUP BY product

-- After injection
SELECT product, COUNT(*)
FROM orders
WHERE status = 'shipped' AND region = 'EMEA'
GROUP BY product

But real customer queries aren't simple. They have joins, subqueries, CTEs, and aliased tables. The RLS filter must find the correct table reference in the query and inject the predicate in the right place.

Join tables require subquery wrapping. When the filtered table appears in a JOIN (not in the main FROM), you can't just append to the WHERE clause -- the predicate needs to apply to that specific table. The solution is wrapping the table reference in a subquery:

SQL
-- Before injection
SELECT o.product, c.name
FROM orders o
JOIN customers c ON o.customer_id = c.id
WHERE o.status = 'shipped'

-- After injection (filter on customers table)
SELECT o.product, c.name
FROM orders o
JOIN (SELECT * FROM customers WHERE region = 'EMEA') AS c
  ON o.customer_id = c.id
WHERE o.status = 'shipped'

This subquery wrapping must respect each database's quoting rules. The alias AS c needs double quotes on PostgreSQL if it conflicts with a reserved word, backticks on MySQL, and brackets on MSSQL.

Parent-child table relationships in datamarts. When a filter targets a dimension table that's joined to a fact table, the system must know the relationship graph to inject the predicate on the right table. Filtering region = 'EMEA' on a regions dimension table that joins to orders via region_id requires knowing that join path and wrapping accordingly.

Custom SQL metrics. Dashboard authors can write arbitrary SQL. The rewriting engine can't assume any particular query structure -- it must parse the SQL string, locate table references using pattern matching, and inject predicates without breaking the author's intent.
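The subquery-wrapping transformation can be sketched with a pattern match on the aliased table reference. This is a simplified illustration (a production engine locates the reference via an AST, and the predicate would be dialect-quoted):

```python
import re

def wrap_table_with_rls(sql: str, table: str, alias: str, predicate: str) -> str:
    """Wrap an aliased `FROM table alias` or `JOIN table alias` reference
    in a filtered subquery so the predicate binds to that table only."""
    pattern = rf"\b(FROM|JOIN)\s+{re.escape(table)}\s+{re.escape(alias)}\b"
    replacement = rf"\1 (SELECT * FROM {table} WHERE {predicate}) AS {alias}"
    return re.sub(pattern, replacement, sql, flags=re.IGNORECASE)

sql = ("SELECT o.product, c.name FROM orders o "
       "JOIN customers c ON o.customer_id = c.id WHERE o.status = 'shipped'")
print(wrap_table_with_rls(sql, "customers", "c", "region = 'EMEA'"))
```

Because the subquery keeps the original alias, every downstream reference (`c.name`, the join condition) continues to resolve without further rewriting.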

Layer 3: Date and Time - Where Dialects Diverge Most

Date handling is where the "works on Postgres, breaks on BigQuery" bugs live. Every database has its own syntax for truncating dates, formatting timestamps, and applying timezone conversions.

DATE_TRUNC: Five databases, four syntaxes

Truncating a timestamp to month granularity -- one of the most common dashboard operations -- requires different SQL on nearly every database:

SQL
-- PostgreSQL, Redshift, CockroachDB, Snowflake
DATE_TRUNC('month', order_date)

-- BigQuery (argument order is reversed)
DATE_TRUNC(order_date, MONTH)

-- MSSQL, Microsoft Fabric
DATETRUNC(MONTH, order_date)

-- Athena
DATE_TRUNC('month', order_date) -- same as Postgres, but date literals need DATE prefix

-- MySQL, MongoDB connector
DATE_FORMAT(order_date, '%Y-%m-01') -- no DATE_TRUNC at all

A rewriting engine must maintain a mapping of time-grain expressions per database and generate the correct syntax at runtime. Miss one database and your customers on that warehouse see broken charts.
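One workable shape for that mapping is a template table keyed by database type. A sketch covering the engines above (treat the dictionary as illustrative, not an exhaustive dialect table):

```python
# Per-dialect templates for "truncate a timestamp to month granularity".
MONTH_TRUNC = {
    "postgres":  "DATE_TRUNC('month', {col})",
    "redshift":  "DATE_TRUNC('month', {col})",
    "snowflake": "DATE_TRUNC('month', {col})",
    "bigquery":  "DATE_TRUNC({col}, MONTH)",        # argument order reversed
    "mssql":     "DATETRUNC(MONTH, {col})",
    "mysql":     "DATE_FORMAT({col}, '%Y-%m-01')",  # no DATE_TRUNC at all
}

def truncate_to_month(column: str, database_type: str) -> str:
    try:
        return MONTH_TRUNC[database_type].format(col=column)
    except KeyError:
        # Fail loudly rather than emit Postgres syntax at an unknown engine.
        raise ValueError(f"no month-truncation rule for {database_type}")

print(truncate_to_month("order_date", "bigquery"))
```

Failing loudly on an unmapped database is deliberate: a missing entry should be a deploy-time error, not a broken chart for one customer.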

Timezone conversion: Every database invented its own syntax

When a dashboard viewer is in Tokyo (JST, UTC+9) and the data warehouse stores timestamps in UTC, the analytics platform must convert timestamps before applying date filters or rendering time-series charts.

Here's the same operation -- "convert this UTC timestamp to the user's timezone" -- across six databases:

SQL
-- PostgreSQL, CockroachDB
order_date AT TIME ZONE 'UTC' AT TIME ZONE 'Asia/Tokyo'

-- Snowflake
CONVERT_TIMEZONE('UTC', 'Asia/Tokyo', order_date)

-- BigQuery
DATETIME(TIMESTAMP(order_date, 'UTC'), 'Asia/Tokyo')

-- Trino, Athena
CAST(at_timezone(with_timezone(order_date, 'UTC'), 'Asia/Tokyo') AS timestamp)

-- MSSQL, Microsoft Fabric
order_date AT TIME ZONE 'Asia/Tokyo'

-- MySQL, ClickHouse
CONVERT_TZ(order_date, 'UTC', 'Asia/Tokyo')

Six databases, six completely different function signatures. Some take the source timezone first, some take the target first. Some return TIMESTAMP WITH TIME ZONE, others return DATETIME. PostgreSQL requires chaining two AT TIME ZONE calls (the first strips the timezone, the second applies the new one), which is a behavior that surprises even experienced Postgres developers.
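The same template-table approach extends to timezone conversion, with the argument-order differences baked into each entry. A sketch under the same assumptions as the truncation mapping:

```python
# Per-dialect templates for "convert a UTC timestamp to the user's timezone".
TZ_CONVERT = {
    "postgres":  "{col} AT TIME ZONE 'UTC' AT TIME ZONE '{tz}'",
    "snowflake": "CONVERT_TIMEZONE('UTC', '{tz}', {col})",
    "bigquery":  "DATETIME(TIMESTAMP({col}, 'UTC'), '{tz}')",
    "trino":     "CAST(at_timezone(with_timezone({col}, 'UTC'), '{tz}') AS timestamp)",
    "mssql":     "{col} AT TIME ZONE '{tz}'",
    "mysql":     "CONVERT_TZ({col}, 'UTC', '{tz}')",
}

def to_user_timezone(column: str, tz: str, database_type: str) -> str:
    return TZ_CONVERT[database_type].format(col=column, tz=tz)

print(to_user_timezone("order_date", "Asia/Tokyo", "snowflake"))
```

Note that the templates, not the caller, decide where the source and target timezones go -- that is exactly the source-first vs target-first divergence the engine has to hide.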

Session-level timezone: Some databases support it, some don't

Some databases let you set a session timezone that applies to all subsequent queries:

SQL
-- PostgreSQL
SET TIME ZONE 'Asia/Tokyo';

-- Snowflake
ALTER SESSION SET TIMEZONE = 'Asia/Tokyo';

-- MySQL
SET time_zone = 'Asia/Tokyo';

But BigQuery, MongoDB, OpenSearch, Elasticsearch, Firebolt, and ClickHouse have no session-level timezone concept. For these databases, every timestamp column must be explicitly converted in the SELECT clause. The rewriting engine must know which databases support session timezone and which require per-column conversion.

Date range filters: Athena and Trino need special treatment

When applying date range filters (e.g., "last 30 days"), most databases accept string literals:

WHERE order_date >= '2026-03-12'

But Athena and Trino require explicit DATE casting:

WHERE order_date >= DATE '2026-03-12'

Without the DATE prefix, Athena treats the literal as a string and the comparison silently returns wrong results -- no error, just incorrect data. This is the worst kind of bug: it passes all tests, looks correct in development, and produces silently wrong dashboards in production.
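The fix is to centralize date-literal generation so the typed prefix can never be forgotten for the engines that need it. A minimal sketch:

```python
def date_literal(iso_date: str, database_type: str) -> str:
    # Athena and Trino need a typed literal; a bare string comparison
    # silently returns wrong results with no error.
    if database_type in ("athena", "trino"):
        return f"DATE '{iso_date}'"
    return f"'{iso_date}'"

print(date_literal("2026-03-12", "athena"))  # DATE '2026-03-12'
```

Any code path that builds a date filter goes through this helper, so "last 30 days" cannot compile differently depending on which developer wrote the filter.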

Layer 4: Identifier Quoting

Every database has opinions about how to quote identifiers (table names, column names, schema names).

SQL
-- PostgreSQL, Redshift, CockroachDB, Trino, Athena
SELECT "Order Date" FROM "public"."orders"

-- MySQL, SingleStore
SELECT `Order Date` FROM `orders`

-- BigQuery
SELECT `Order Date` FROM `project.dataset.orders`

-- MSSQL, Microsoft Fabric
SELECT [Order Date] FROM [dbo].[orders]

The rewriting engine must apply the correct quoting when it generates any SQL fragment -- whether that's an injected WHERE clause, a subquery alias, a CTE name, or a drill-down GROUP BY column.

The backtick-to-double-quote problem. SQL parsers commonly output backtick-quoted identifiers. When the target database is PostgreSQL or Athena (which use double quotes), every backtick in the output must be replaced. But BigQuery uses backticks natively, so the same replacement would break BigQuery queries. The engine must know the target dialect before applying any quoting transformation.

Schema qualification varies by database. PostgreSQL uses schema.table, Snowflake uses database.schema.table, BigQuery uses project.dataset.table, and Databricks uses catalog.schema.table. When generating qualified table names for subquery wrapping or CTE references, the engine must construct the correct multi-part name for the target database.

Redshift lowercases everything. Redshift silently lowercases all unquoted identifiers. A column named OrderDate in the dashboard builder becomes orderdate in the Redshift catalog. The rewriting engine must normalize column name lookups for Redshift to avoid "column not found" errors that don't occur on any other database.
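A dialect-keyed quoting helper keeps all three rules (and the Redshift normalization) in one place. A sketch, assuming the simple double-the-closing-quote escape (correct for double quotes, MySQL backticks, and MSSQL brackets; BigQuery's escaping differs and is simplified here):

```python
QUOTE_STYLE = {
    "postgres": ('"', '"'),
    "trino":    ('"', '"'),
    "mysql":    ('`', '`'),
    "mssql":    ('[', ']'),
}

def quote_identifier(name: str, database_type: str) -> str:
    open_q, close_q = QUOTE_STYLE[database_type]
    # Escape an embedded closing quote by doubling it.
    return open_q + name.replace(close_q, close_q * 2) + close_q

def catalog_name(name: str, database_type: str) -> str:
    # Redshift lowercases all unquoted identifiers, so catalog lookups
    # must be normalized for that engine to avoid "column not found".
    return name.lower() if database_type == "redshift" else name

print(quote_identifier("Order Date", "mssql"))  # [Order Date]
```

Every fragment generator (injected WHERE clauses, subquery aliases, drill columns) calls these helpers instead of hard-coding a quote character.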

Layer 5: Drill-Down SQL Rewriting

Drill-down is where all four previous layers converge, with an additional challenge: the rewriting engine must structurally modify the query's GROUP BY, WHERE, and ORDER BY clauses.

When a user clicks "United States" on a revenue-by-country chart to drill into states, the engine must:

  1. Replace GROUP BY country with GROUP BY state
  2. Add WHERE country = 'United States' (respecting the database's quoting rules)
  3. Keep the aggregate expressions (SUM(revenue)) intact
  4. Update ORDER BY to reference the new grouping column

For custom SQL metrics, this requires parsing the SQL into an abstract syntax tree (AST), manipulating the tree, and regenerating SQL in the target dialect. But SQL parsers have database-specific behaviors:

BigQuery's AST is structurally different. The open-source SQL parsers that handle most dialects return a different object shape for BigQuery queries. The drill engine must special-case BigQuery to extract the correct SELECT node from the parsed result.

Quoting in the regenerated SQL. After AST manipulation, the parser regenerates SQL with backtick-quoted identifiers. For PostgreSQL, Athena, and MSSQL, these backticks must be converted to the correct quoting style. For BigQuery, double backticks (a parser artifact) must be collapsed to single backticks.

The cross-database filter problem. When drill-down applies a filter column (like country = 'United States') to all metrics on a dashboard, every metric must have a column named exactly country. If one metric's source table calls it ship_country, the filter silently fails. This is an architectural constraint, not a bug -- but it surprises every team that builds cross-dashboard drill, regardless of whether they build in-house or use a platform.
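The four-step drill rewrite can be sketched with string manipulation (a production engine performs this on an AST, and the injected column names would be dialect-quoted; both are omitted here for brevity):

```python
import re

def drill_down(sql: str, parent_col: str, child_col: str, parent_value: str) -> str:
    """One drill step: filter on the clicked parent value, regroup by the child."""
    predicate = f"{parent_col} = '{parent_value}'"
    # 1. Add the drill filter: extend an existing WHERE, else create one
    #    just before the GROUP BY.
    if re.search(r"\bWHERE\b", sql, flags=re.IGNORECASE):
        sql = re.sub(r"\bWHERE\b", f"WHERE {predicate} AND", sql,
                     count=1, flags=re.IGNORECASE)
    else:
        sql = re.sub(r"\bGROUP BY\b", f"WHERE {predicate} GROUP BY", sql,
                     count=1, flags=re.IGNORECASE)
    # 2. Swap the grouping column in SELECT / GROUP BY / ORDER BY; the
    #    negative lookahead skips "parent_col =" so the new predicate
    #    survives, and aggregates like SUM(revenue) are untouched.
    return re.sub(rf"\b{re.escape(parent_col)}\b(?!\s*=)", child_col, sql)

sql = "SELECT country, SUM(revenue) FROM orders GROUP BY country ORDER BY country"
print(drill_down(sql, "country", "state", "United States"))
```

The ordering matters: inject the filter first, then rename, so the predicate keeps referring to the parent dimension the user clicked.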

The Bug That Taught Us the Most

Early in our history, we shipped a fix for MySQL quoting in drill-down queries. MySQL uses backticks for identifiers, and our drill logic was generating unquoted column names that broke on reserved words.

The fix was simple: wrap column names in backticks during drill query generation.

Except that fix also ran for BigQuery. BigQuery already used backticks, so now columns were double-backtick-quoted: SELECT ``country`` FROM .... BigQuery rejected these queries with a cryptic syntax error.

We patched BigQuery. The patch checked for double backticks and collapsed them. This worked until a Snowflake customer reported that drill-down was quoting their column names with backticks instead of double quotes.

The root cause: our quoting logic was applied after AST regeneration but before dialect-specific post-processing. The fix for one database would break another because the quoting transformation didn't know which database it was targeting.

The lesson: every SQL rewriting rule must be parameterized by the target dialect from the very first line. There is no such thing as "database-agnostic SQL rewriting." Any rule that doesn't check the target database will eventually produce correct SQL for one dialect and broken SQL for another.

Why In-House Teams Underestimate This

When an engineering team decides to build embedded analytics in-house, the SQL layer rarely makes the initial estimate. The focus is on the frontend -- charts, dashboards, drag-and-drop. The backend is "just run the SQL against the database."

Then reality hits:

Sprint 1-2: Basic queries work. Single database, single tenant, no filters. Demo looks great.

Sprint 3-4: Multi-tenancy. Now every query needs tenant filtering. Regex-based placeholder replacement works for simple cases. Someone writes a helper function.

Sprint 5-8: Second database type. A customer uses Snowflake instead of Postgres. Half the date functions break. Identifier quoting is wrong. The team forks the query builder into two code paths.

Sprint 9-12: Dashboard filters, drill-down, cross-filtering. The query rewriting code is now 2,000+ lines of string manipulation and special cases. Bug fixes for one database break another. The team spends more time on SQL rewriting than on product features.

Month 6+: The team realizes they've built a query engine. They didn't plan for it, didn't staff for it, and didn't budget for it. But here they are, maintaining a dialect-aware SQL rewriting system that's critical infrastructure for their entire analytics feature.

This is the trajectory we've seen from dozens of SaaS companies who eventually adopt a purpose-built embedded analytics platform. The SQL rewriting layer alone -- not the charts, not the dashboards, not the embedding -- is a multi-year engineering investment.

What Good Looks Like

A production-grade SQL rewriting system for multi-database analytics has these properties:

1. Dialect-parameterized from the start. Every function that touches SQL takes a databaseType parameter. No function generates SQL without knowing the target.

2. Layered rewriting pipeline. Tenant substitution, RLS injection, date conversion, and drill manipulation are separate, composable stages. A bug in date conversion doesn't affect tenant isolation.

3. Database-grouped behavior. Databases with similar SQL syntax (e.g., PostgreSQL and CockroachDB, or MySQL and SingleStore) are grouped so that fixes apply to the group, not just one database.

4. Escape handling per dialect. String escaping differs: BigQuery uses unicode escapes for single quotes (\u0027), while most other databases double the quote character (''). Getting this wrong opens SQL injection vectors.

5. Validation before execution. After all rewriting is complete, the final SQL is validated to ensure no unreplaced placeholders remain (like {{DATABASE_NAME}} or client_id_variable). A query with leftover placeholders is rejected, not executed.

6. Test coverage per dialect. Every rewriting rule has tests that run against every supported database type. A fix for Snowflake quoting includes a regression test for PostgreSQL, MySQL, and BigQuery.
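Property 5 is the cheapest of the six to implement and catches entire classes of rewrite bugs. A minimal sketch of the validation gate (the pattern list is illustrative):

```python
import re

# Placeholder shapes that must never reach the warehouse.
PLACEHOLDER_PATTERNS = [
    re.compile(r"\{\{[A-Z_]+\}\}"),    # e.g. {{DATABASE_NAME}}
    re.compile(r"client_id_variable"), # tenant placeholder
]

def validate_final_sql(sql: str) -> str:
    """Reject -- never execute -- a query with leftover placeholders."""
    for pattern in PLACEHOLDER_PATTERNS:
        match = pattern.search(sql)
        if match:
            raise ValueError(f"unreplaced placeholder: {match.group(0)}")
    return sql

validate_final_sql("SELECT region FROM acme_prod.sales.orders")  # passes
```

Running this as the final pipeline stage turns a silent cross-tenant data bug into a loud, attributable error.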

The Academic Frontier

For teams interested in the research side of this problem, cross-dialect SQL translation is an active area in database systems research:

  1. CrackSQL (Tsinghua University, SIGMOD 2025) decomposes queries by functionality syntax trees and uses a cross-dialect syntax embedding model for accurate syntax matching across dialects. Their approach outperforms direct LLM translation by up to 77% on complex queries. [1]
  2. SQLGlot is an open-source SQL transpiler that handles dialect-to-dialect conversion using rule-based AST transformations. It's useful for one-time migrations but wasn't designed for the runtime, per-query rewriting that embedded analytics requires.
  3. SQL-GEN (2024) uses mixture-of-experts architectures to unify dialect-specific models, showing that dialect specialization and shared syntax learning can coexist in a single model.

The state of the art confirms what we've learned from production experience: cross-dialect SQL is a genuine computer science problem, not a string manipulation exercise.

Conclusion

SQL rewriting is the invisible infrastructure of multi-database embedded analytics. It sits between the dashboard and the database, silently translating every query into the correct dialect, injecting security filters, converting date functions, and fixing identifier quoting.

Most teams discover this complexity at Sprint 5, when the second database type arrives. By Sprint 12, they're maintaining a query engine they never planned to build.

If your SaaS product connects to customer databases across multiple vendors -- and in 2026, most do -- the SQL rewriting layer will eventually consume more engineering effort than the charts and dashboards combined. That's the gap that purpose-built embedded analytics platforms are designed to fill.

[1] Zhou et al., "Cracking SQL Barriers: An LLM-based Dialect Translation System," SIGMOD 2025. Code: CrackSQL on GitHub.

DataBrain supports 15+ database connectors with automatic multi-tenant SQL rewriting, dialect-aware date handling, and production-hardened RLS injection. Start a free trial or talk to our team.

Frequently Asked Questions

Does the database run the exact SQL shown in the dashboard?

Usually no. Before execution, SQL is rewritten for that warehouse: tenant scoping, RLS-style predicates, date/time dialect rules, identifier quoting, and sometimes drill-down changes to GROUP BY / WHERE / ORDER BY. Each step must output valid SQL for that engine (Postgres, Snowflake, BigQuery, MySQL, etc.).

What are the two common tenant isolation patterns?

Database-per-tenant (separate DB per customer; names go in FROM) and shared DB + row filter (one warehouse; a tenant key like client_id in WHERE). The rewriter must handle types (numeric vs string IDs), admin / all-tenant cases, and different fully qualified name rules across databases.

Why not just append AND … for tenant / RLS filters?

That works for simple single-table queries, but real metrics use JOINs, CTEs, subqueries, and aliases, and the filter often must bind to one table—not the whole outer WHERE. Teams typically wrap the target table in a subquery so the predicate applies correctly, including tricky dimension ↔ fact joins and custom SQL metrics.

Why do dates, timezones, and date ranges break across databases?

Functions and semantics differ widely (DATE_TRUNC vs DATETRUNC vs DATE_FORMAT, different timezone APIs, and inconsistent session timezone support). Engines like Athena/Trino also push teams toward typed date literals—small mistakes here are easy to ship and hard to spot in charts.

Why is dialect-agnostic SQL rewriting unrealistic?

Quoting and parser output don’t line up across engines (double quotes vs backticks vs brackets), so “one string fix” regressions are common once AST regen and dialect post-processing interact. Production systems keep rules databaseType-aware, use separate pipeline stages, reject unreplaced placeholders, and ship per-dialect tests so one fix doesn’t break another warehouse.

Make analytics your competitive advantage

Get in touch with us and see how Databrain can take your customer-facing analytics to the next level.
