Databricks SQL Query Toolkit
Use the system table queries below to answer your burning questions.
Use the all-in-one dashboard for a health check across Jobs, SQL, APC, and DLT usage.

How to use
1. Copy — Copy any of the system table queries below
2. Paste — Paste it in the Databricks “SQL Editor”
3. Run — Hit “Run” to view your results
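If you'd rather script these queries than paste them into the SQL Editor, a minimal sketch using the `databricks-sql-connector` package is below. The hostname, HTTP path, and token are placeholders you would replace with your own workspace values; this is an illustrative pattern, not part of the toolkit itself.

```python
# Run a toolkit query from Python (sketch; assumes `pip install databricks-sql-connector`).
ACTIVE_CLUSTERS_SQL = """
SELECT cluster_source, COUNT(*) AS active_clusters
FROM system.compute.clusters
WHERE delete_time IS NULL
GROUP BY cluster_source
ORDER BY cluster_source
"""

def run_query(sql: str):
    # Imported lazily so this module loads even without the connector installed.
    from databricks import sql as dbsql
    with dbsql.connect(
        server_hostname="<workspace-host>",      # placeholder
        http_path="<warehouse-http-path>",       # placeholder
        access_token="<personal-access-token>",  # placeholder
    ) as conn:
        with conn.cursor() as cur:
            cur.execute(sql)
            return cur.fetchall()
```

The same `run_query` helper works for any query in this toolkit; swap in the SQL string you want.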
General Use
-
How many clusters are currently active, and how were they created?
-- Count of active clusters by cluster_source
SELECT
  cluster_source,
  COUNT(*) AS active_clusters
FROM system.compute.clusters
WHERE delete_time IS NULL
GROUP BY cluster_source
ORDER BY cluster_source;
-
Which markets (Spot, On-Demand) do my clusters use?
-- Count of active clusters by availability
SELECT "AWS" AS cloud, aws_attributes.availability AS availability, COUNT(*) AS active_clusters
FROM system.compute.clusters
WHERE delete_time IS NULL AND aws_attributes.availability IS NOT NULL
GROUP BY aws_attributes.availability
UNION ALL
SELECT "AZURE" AS cloud, azure_attributes.availability AS availability, COUNT(*) AS active_clusters
FROM system.compute.clusters
WHERE delete_time IS NULL AND azure_attributes.availability IS NOT NULL
GROUP BY azure_attributes.availability
UNION ALL
SELECT "GCP" AS cloud, gcp_attributes.availability AS availability, COUNT(*) AS active_clusters
FROM system.compute.clusters
WHERE delete_time IS NULL AND gcp_attributes.availability IS NOT NULL
GROUP BY gcp_attributes.availability;
-
Who are the most active users?
-- Count of queries executed by user
SELECT
  executed_by AS user,
  COUNT(*) AS query_count
FROM system.query.history
GROUP BY executed_by
ORDER BY query_count DESC
LIMIT 10;
-
Who owns the most expensive clusters?
-- Cluster owners with the most DBUs consumed (a proxy for cost) in past 30 days
SELECT
  c.owned_by,
  SUM(u.usage_quantity) AS total_dbus
FROM system.billing.usage u
JOIN system.compute.clusters c
  ON u.usage_metadata.cluster_id = c.cluster_id
WHERE u.usage_start_time >= current_date() - interval 30 days
GROUP BY c.owned_by
ORDER BY total_dbus DESC
LIMIT 20;
-
How “sticky” is my workspace?
-- Divide daily active users by monthly active users
WITH all_users AS (
  SELECT
    DATE(DATE_TRUNC('month', start_time)) AS usage_month,
    DATE(start_time) AS usage_date,
    executed_by
  FROM system.query.history
  WHERE start_time >= DATE_TRUNC('month', CURRENT_DATE) - INTERVAL 6 MONTH
),
distinct_users AS (
  SELECT DISTINCT executed_by FROM all_users
),
distinct_daily_users AS (
  SELECT DISTINCT usage_month, usage_date, executed_by FROM all_users
),
distinct_monthly_users AS (
  SELECT DISTINCT usage_month, executed_by FROM all_users
),
daily_active_users AS (
  SELECT DISTINCT
    usage_month,
    usage_date,
    COUNT(executed_by) OVER (PARTITION BY DATE(usage_month), DATE(usage_date)) AS dau
  FROM distinct_daily_users
),
monthly_active_users AS (
  SELECT DISTINCT
    usage_month,
    COUNT(executed_by) OVER (PARTITION BY DATE(usage_month)) AS mau
  FROM distinct_monthly_users
),
stickiness AS (
  SELECT
    dau.usage_month,
    ROUND(AVG(dau.dau) OVER (PARTITION BY DATE(dau.usage_month)), 2) AS avg_dau,
    mau.mau,
    ROUND(AVG(dau.dau) OVER (PARTITION BY DATE(dau.usage_month)) / mau.mau, 2) AS stickiness
  FROM daily_active_users dau
  JOIN monthly_active_users mau
    ON dau.usage_month = mau.usage_month
),
total_users AS (
  SELECT DISTINCT COUNT(executed_by) OVER () AS total_users FROM distinct_users
),
final AS (
  SELECT DISTINCT
    usage_month,
    (SELECT total_users FROM total_users LIMIT 1) AS total_users,
    mau,
    avg_dau,
    stickiness
  FROM stickiness
  ORDER BY usage_month
)
SELECT * FROM final;
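The stickiness ratio this query computes is easy to sanity-check by hand. A toy check of the DAU/MAU arithmetic with invented numbers (not from any real workspace):

```python
# Toy sanity check of the DAU/MAU "stickiness" arithmetic the SQL computes.
# The counts below are invented for illustration.
daily_active_users = [10, 12, 14, 12]  # hypothetical DAU samples for one month
monthly_active_users = 40              # hypothetical distinct users that month

avg_dau = sum(daily_active_users) / len(daily_active_users)   # 48 / 4 = 12.0
stickiness = round(avg_dau / monthly_active_users, 2)         # 12 / 40 = 0.3

print(avg_dau, stickiness)  # 12.0 0.3
```

A stickiness near 1.0 means most monthly users show up daily; values well below 0.5 suggest occasional, batch-style usage.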
Cost Optimization
-
What are the compute costs associated with my Databricks workspace?
-- Total compute costs by SKU
SELECT
  t1.sku_name,
  SUM(t1.usage_quantity * lp.pricing.default) AS total_cost
FROM system.billing.usage t1
INNER JOIN system.billing.list_prices lp
  ON t1.cloud = lp.cloud
  AND t1.sku_name = lp.sku_name
  AND t1.usage_start_time >= lp.price_start_time
  AND (t1.usage_end_time <= lp.price_end_time OR lp.price_end_time IS NULL)
GROUP BY t1.sku_name
ORDER BY total_cost DESC;
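Every cost query in this toolkit uses the same core arithmetic: DBUs consumed (`usage_quantity`) multiplied by the list price for that SKU (`pricing.default`), summed per group. A toy check with invented SKU names and prices:

```python
# Toy check of the usage_quantity * pricing.default cost arithmetic.
# Rows are (sku_name, dbus, list_price_per_dbu) -- all values invented.
rows = [
    ("EXAMPLE_JOBS_SKU", 100.0, 0.15),
    ("EXAMPLE_JOBS_SKU", 50.0, 0.15),
    ("EXAMPLE_SQL_SKU", 200.0, 0.22),
]

costs = {}
for sku, dbus, price in rows:
    # 100*0.15 + 50*0.15 = 22.5 ; 200*0.22 = 44.0
    costs[sku] = costs.get(sku, 0.0) + dbus * price

print({k: round(v, 2) for k, v in costs.items()})
```

Note these are list prices; negotiated discounts are not reflected in `system.billing.list_prices`.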
-
What are the product costs of my Databricks workspace?
-- Total costs by product
WITH usage AS (
  SELECT
    usage_metadata.job_id,
    usage_metadata.cluster_id,
    usage_metadata.warehouse_id,
    billing_origin_product,
    usage_quantity,
    usage_start_time,
    usage_end_time,
    cloud,
    sku_name
  FROM system.billing.usage
),
list_prices AS (
  SELECT
    pricing.default AS price,
    cloud,
    price_start_time,
    price_end_time,
    sku_name
  FROM system.billing.list_prices
),
joined AS (
  SELECT
    u.billing_origin_product,
    SUM(u.usage_quantity * lp.price) OVER (PARTITION BY u.billing_origin_product) AS total_cost
  FROM usage u
  INNER JOIN list_prices lp
    ON u.cloud = lp.cloud
    AND u.sku_name = lp.sku_name
    AND u.usage_start_time >= lp.price_start_time
    AND (u.usage_end_time <= lp.price_end_time OR lp.price_end_time IS NULL)
)
SELECT DISTINCT billing_origin_product, total_cost
FROM joined
ORDER BY total_cost DESC;
-
Which jobs are most likely over-provisioned?
-- Jobs with the lowest CPU utilization
WITH list_cost_per_job_cluster AS (
  SELECT
    t1.workspace_id,
    t1.usage_metadata.job_id,
    t1.usage_metadata.cluster_id,
    COUNT(DISTINCT t1.usage_metadata.job_run_id) AS runs,
    SUM(t1.usage_quantity * list_prices.pricing.default) AS list_cost,
    first(identity_metadata.run_as, true) AS run_as,
    first(t1.custom_tags, true) AS custom_tags,
    MAX(t1.usage_end_time) AS last_seen_date
  FROM system.billing.usage t1
  INNER JOIN system.billing.list_prices list_prices
    ON t1.cloud = list_prices.cloud
    AND t1.sku_name = list_prices.sku_name
    AND t1.usage_start_time >= list_prices.price_start_time
    AND (t1.usage_end_time <= list_prices.price_end_time OR list_prices.price_end_time IS NULL)
  WHERE t1.sku_name LIKE '%JOBS%'
    AND t1.sku_name NOT ILIKE '%jobs_serverless%'
    AND t1.usage_metadata.job_id IS NOT NULL
    AND t1.usage_date >= CURRENT_DATE() - INTERVAL 30 DAY
  GROUP BY ALL
),
most_recent_jobs AS (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY workspace_id, job_id ORDER BY change_time DESC) AS rn
  FROM system.lakeflow.jobs
  QUALIFY rn = 1
),
jobsrollingmonth AS (
  SELECT
    t2.name,
    t1.job_id,
    t1.workspace_id,
    t1.cluster_id,
    t1.runs,
    t1.run_as,
    SUM(list_cost) AS list_cost,
    t1.last_seen_date
  FROM list_cost_per_job_cluster t1
  LEFT JOIN most_recent_jobs t2 USING (workspace_id, job_id)
  GROUP BY ALL
  ORDER BY list_cost DESC
),
clusterutil AS (
  SELECT * FROM system.compute.node_timeline
  WHERE start_time >= CURRENT_DATE() - INTERVAL 30 DAY
),
jobswithutil AS (
  SELECT
    jobsrollingmonth.workspace_id,
    jobsrollingmonth.name,
    avg(cpu_user_percent + cpu_system_percent) AS `Avg CPU Utilization`,
    max(cpu_user_percent + cpu_system_percent) AS `Peak CPU Utilization`,
    avg(cpu_wait_percent) AS `Avg CPU Disk Wait`,
    max(cpu_wait_percent) AS `Max CPU Disk Wait`,
    avg(mem_used_percent) AS `Avg Memory Utilization`,
    max(mem_used_percent) AS `Max Memory Utilization`
  FROM clusterutil
  JOIN jobsrollingmonth ON clusterutil.cluster_id = jobsrollingmonth.cluster_id
  GROUP BY 1, 2
)
SELECT * FROM jobswithutil
ORDER BY `Avg CPU Utilization` ASC
LIMIT 10;
-
Which jobs are most likely under-provisioned?
-- Jobs with highest memory utilization
WITH list_cost_per_job_cluster AS (
  SELECT
    t1.workspace_id,
    t1.usage_metadata.job_id,
    t1.usage_metadata.cluster_id,
    COUNT(DISTINCT t1.usage_metadata.job_run_id) AS runs,
    SUM(t1.usage_quantity * list_prices.pricing.default) AS list_cost,
    first(identity_metadata.run_as, true) AS run_as,
    first(t1.custom_tags, true) AS custom_tags,
    MAX(t1.usage_end_time) AS last_seen_date
  FROM system.billing.usage t1
  INNER JOIN system.billing.list_prices list_prices
    ON t1.cloud = list_prices.cloud
    AND t1.sku_name = list_prices.sku_name
    AND t1.usage_start_time >= list_prices.price_start_time
    AND (t1.usage_end_time <= list_prices.price_end_time OR list_prices.price_end_time IS NULL)
  WHERE t1.sku_name LIKE '%JOBS%'
    AND t1.sku_name NOT ILIKE '%jobs_serverless%'
    AND t1.usage_metadata.job_id IS NOT NULL
    AND t1.usage_date >= CURRENT_DATE() - INTERVAL 30 DAY
  GROUP BY ALL
),
most_recent_jobs AS (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY workspace_id, job_id ORDER BY change_time DESC) AS rn
  FROM system.lakeflow.jobs
  QUALIFY rn = 1
),
jobsrollingmonth AS (
  SELECT
    t2.name,
    t1.job_id,
    t1.workspace_id,
    t1.cluster_id,
    t1.runs,
    t1.run_as,
    SUM(list_cost) AS list_cost,
    t1.last_seen_date
  FROM list_cost_per_job_cluster t1
  LEFT JOIN most_recent_jobs t2 USING (workspace_id, job_id)
  GROUP BY ALL
  ORDER BY list_cost DESC
),
clusterutil AS (
  SELECT * FROM system.compute.node_timeline
  WHERE start_time >= CURRENT_DATE() - INTERVAL 30 DAY
),
jobswithutil AS (
  SELECT
    jobsrollingmonth.workspace_id,
    jobsrollingmonth.name,
    avg(cpu_user_percent + cpu_system_percent) AS `Avg CPU Utilization`,
    max(cpu_user_percent + cpu_system_percent) AS `Peak CPU Utilization`,
    avg(cpu_wait_percent) AS `Avg CPU Disk Wait`,
    max(cpu_wait_percent) AS `Max CPU Disk Wait`,
    avg(mem_used_percent) AS `Avg Memory Utilization`,
    max(mem_used_percent) AS `Max Memory Utilization`
  FROM clusterutil
  JOIN jobsrollingmonth ON clusterutil.cluster_id = jobsrollingmonth.cluster_id
  GROUP BY 1, 2
)
SELECT * FROM jobswithutil
ORDER BY `Avg Memory Utilization` DESC
LIMIT 20;
-
What are the most expensive jobs over the past 30 days?
-- Most expensive jobs in past 30 days
WITH list_cost_per_job AS (
  SELECT
    t1.workspace_id,
    t1.usage_metadata.job_id,
    COUNT(DISTINCT t1.usage_metadata.job_run_id) AS runs,
    SUM(t1.usage_quantity * list_prices.pricing.default) AS list_cost,
    first(identity_metadata.run_as, true) AS run_as,
    first(t1.custom_tags, true) AS custom_tags,
    MAX(t1.usage_end_time) AS last_seen_date
  FROM system.billing.usage t1
  INNER JOIN system.billing.list_prices list_prices
    ON t1.cloud = list_prices.cloud
    AND t1.sku_name = list_prices.sku_name
    AND t1.usage_start_time >= list_prices.price_start_time
    AND (t1.usage_end_time <= list_prices.price_end_time OR list_prices.price_end_time IS NULL)
  WHERE t1.sku_name LIKE '%JOBS%'
    AND t1.usage_metadata.job_id IS NOT NULL
    AND t1.usage_date >= CURRENT_DATE() - INTERVAL 30 DAY
  GROUP BY ALL
),
most_recent_jobs AS (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY workspace_id, job_id ORDER BY change_time DESC) AS rn
  FROM system.lakeflow.jobs
  QUALIFY rn = 1
)
SELECT
  t2.name,
  t1.job_id,
  t1.workspace_id,
  t1.runs,
  t1.run_as,
  SUM(list_cost) AS list_cost,
  t1.last_seen_date
FROM list_cost_per_job t1
LEFT JOIN most_recent_jobs t2 USING (workspace_id, job_id)
GROUP BY ALL
ORDER BY list_cost DESC
LIMIT 20;
-
What jobs are growing the most in costs week-over-week?
-- Top jobs growing in WoW costs
WITH job_run_timeline_with_cost AS (
  SELECT
    t1.*,
    t1.usage_metadata.job_id AS job_id,
    t1.identity_metadata.run_as AS run_as,
    t1.usage_quantity * list_prices.pricing.default AS list_cost
  FROM system.billing.usage t1
  INNER JOIN system.billing.list_prices list_prices
    ON t1.cloud = list_prices.cloud
    AND t1.sku_name = list_prices.sku_name
    AND t1.usage_start_time >= list_prices.price_start_time
    AND (t1.usage_end_time <= list_prices.price_end_time OR list_prices.price_end_time IS NULL)
  WHERE t1.billing_origin_product = 'JOBS'
    AND t1.usage_date >= CURRENT_DATE() - INTERVAL 14 DAY
),
most_recent_jobs AS (
  SELECT
    *,
    ROW_NUMBER() OVER (PARTITION BY workspace_id, job_id ORDER BY change_time DESC) AS rn
  FROM system.lakeflow.jobs
  QUALIFY rn = 1
)
SELECT
  t2.name,
  t1.workspace_id,
  t1.job_id,
  t1.sku_name,
  t1.run_as,
  Last7DaySpend,
  Last14DaySpend,
  Last7DaySpend - Last14DaySpend AS Last7DayGrowth,
  try_divide((Last7DaySpend - Last14DaySpend), Last14DaySpend) * 100 AS Last7DayGrowthPct
FROM (
  SELECT
    workspace_id,
    job_id,
    run_as,
    sku_name,
    SUM(list_cost) AS spend,
    SUM(CASE WHEN usage_end_time BETWEEN date_add(current_date(), -8) AND date_add(current_date(), -1) THEN list_cost ELSE 0 END) AS Last7DaySpend,
    SUM(CASE WHEN usage_end_time BETWEEN date_add(current_date(), -15) AND date_add(current_date(), -8) THEN list_cost ELSE 0 END) AS Last14DaySpend
  FROM job_run_timeline_with_cost
  GROUP BY ALL
) t1
LEFT JOIN most_recent_jobs t2 USING (workspace_id, job_id)
ORDER BY Last7DayGrowth DESC
LIMIT 20;
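The growth columns reduce to simple arithmetic: this week's spend minus the prior week's, divided by the prior week's. `try_divide` returns NULL rather than erroring when last week's spend was zero. A toy check with invented spend figures:

```python
# Toy check of the week-over-week growth arithmetic, mirroring try_divide's
# NULL-on-zero behavior with None. Spend figures are invented.
def growth_pct(last7_spend, prior7_spend):
    if prior7_spend == 0:
        return None  # try_divide semantics: no division-by-zero error
    return (last7_spend - prior7_spend) / prior7_spend * 100

print(growth_pct(150.0, 100.0))  # 50.0
print(growth_pct(80.0, 0.0))     # None
```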
-
What are the most expensive SQL warehouses for the last 30 days?
-- Most expensive warehouses over past 30 days
SELECT
  w.warehouse_id,
  w.workspace_id,
  w.warehouse_name,
  w.warehouse_type,
  w.warehouse_size,
  w.min_clusters,
  w.max_clusters,
  SUM(bu.usage_quantity) AS total_usage_quantity,
  SUM(bu.usage_quantity * lp.pricing.default) AS total_cost
FROM system.compute.warehouses w
JOIN system.billing.usage bu
  ON w.warehouse_id = bu.usage_metadata.warehouse_id
JOIN system.billing.list_prices lp
  ON bu.cloud = lp.cloud
  AND bu.sku_name = lp.sku_name
  AND bu.usage_start_time >= lp.price_start_time
  AND (bu.usage_end_time <= lp.price_end_time OR lp.price_end_time IS NULL)
WHERE bu.usage_date >= CURRENT_DATE() - INTERVAL 30 DAY
GROUP BY w.warehouse_id, w.workspace_id, w.warehouse_name, w.warehouse_type, w.warehouse_size, w.min_clusters, w.max_clusters
ORDER BY total_cost DESC;
-
What are my most expensive SQL queries over the past 30 days?
-- Most expensive queries in past 30 days
WITH usage_aggregates AS (
  SELECT
    bu.usage_metadata.warehouse_id,
    bu.usage_start_time AS wh_start_time,
    bu.usage_end_time AS wh_end_time,
    DATEDIFF(MILLISECOND, bu.usage_start_time, bu.usage_end_time) AS wh_duration_ms,
    bu.usage_quantity AS wh_usage_quantity,
    bu.sku_name AS sku_name
  FROM system.billing.usage bu
  WHERE bu.usage_start_time >= current_date() - INTERVAL 30 DAY
    AND bu.usage_metadata.warehouse_id IS NOT NULL
  GROUP BY bu.usage_metadata.warehouse_id, wh_start_time, wh_end_time, wh_usage_quantity, sku_name
),
query_count AS (
  SELECT
    statement_text,
    compute.warehouse_id AS warehouse_id,
    COUNT(*) AS count
  FROM system.query.history qh
  WHERE qh.start_time > CURRENT_TIMESTAMP() - INTERVAL 30 DAY
  GROUP BY statement_text, warehouse_id
)
SELECT
  qh.statement_id,
  qh.statement_text,
  qc.count AS monthly_count,
  any_value(qh.compute) AS compute,
  qh.executed_by_user_id,
  qh.total_duration_ms AS query_duration_ms,
  qh.start_time AS query_start_time,
  qh.end_time AS query_end_time,
  qh.compute.warehouse_id,
  cw.warehouse_size,
  cw.warehouse_name,
  ua.sku_name,
  lp.pricing.default AS pricing_default,
  any_value(ua.wh_start_time) AS wh_start_time,
  any_value(ua.wh_end_time) AS wh_end_time,
  ua.wh_duration_ms,
  ua.wh_usage_quantity,
  try_divide(ua.wh_usage_quantity, ua.wh_duration_ms) AS usage_dbu_per_ms,
  SUM(qh.total_duration_ms * try_divide(ua.wh_usage_quantity, ua.wh_duration_ms) * lp.pricing.default) AS estimated_query_cost,
  SUM(qc.count * qh.total_duration_ms * try_divide(ua.wh_usage_quantity, ua.wh_duration_ms) * lp.pricing.default) AS estimated_30day_query_cost
FROM system.query.history qh
JOIN system.compute.warehouses cw ON qh.compute.warehouse_id = cw.warehouse_id
JOIN usage_aggregates ua
  ON qh.compute.warehouse_id = ua.warehouse_id
  AND ua.wh_start_time <= qh.start_time
  AND ua.wh_end_time >= qh.end_time
JOIN system.billing.list_prices lp ON ua.sku_name = lp.sku_name
JOIN query_count qc
  ON qh.statement_text = qc.statement_text
  AND qh.compute.warehouse_id = qc.warehouse_id
WHERE qh.start_time >= current_date() - INTERVAL 30 DAY
GROUP BY qh.statement_id, qh.statement_text, qc.count, qh.compute.warehouse_id, qh.executed_by_user_id, qh.total_duration_ms, qh.start_time, qh.end_time, cw.warehouse_size, cw.warehouse_name, ua.sku_name, lp.pricing.default, ua.wh_duration_ms, ua.wh_usage_quantity
ORDER BY estimated_30day_query_cost DESC
LIMIT 20;
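Per-query costs here are estimates, not billed amounts: each warehouse billing interval's DBU rate is spread across queries in proportion to their duration. The core formula, checked with invented numbers:

```python
# Toy check of the per-query cost estimate:
#   query_duration_ms * (warehouse DBUs / warehouse interval ms) * price per DBU
# All numbers below are invented for illustration.
wh_usage_quantity = 2.0      # DBUs billed for the interval
wh_duration_ms = 3_600_000   # a one-hour billing interval
price_per_dbu = 0.22         # hypothetical list price
query_duration_ms = 90_000   # a 90-second query

dbu_per_ms = wh_usage_quantity / wh_duration_ms
estimated_query_cost = query_duration_ms * dbu_per_ms * price_per_dbu

print(round(estimated_query_cost, 4))  # 0.011
```

Because concurrent queries share the same warehouse interval, summing these estimates across queries can exceed the actual bill; treat the ranking, not the absolute numbers, as the signal.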
-
Why are these SQL queries so expensive, and who owns them?
-- Granular details about most expensive queries
WITH usage_aggregates AS (
  SELECT
    bu.usage_metadata.warehouse_id,
    bu.usage_start_time AS wh_start_time,
    bu.usage_end_time AS wh_end_time,
    DATEDIFF(MILLISECOND, bu.usage_start_time, bu.usage_end_time) AS wh_duration_ms,
    bu.usage_quantity AS wh_usage_quantity,
    bu.sku_name AS sku_name
  FROM system.billing.usage bu
  WHERE bu.usage_start_time >= current_date() - INTERVAL 30 DAY
    AND bu.usage_metadata.warehouse_id IS NOT NULL
  GROUP BY bu.usage_metadata.warehouse_id, wh_start_time, wh_end_time, wh_usage_quantity, sku_name
),
query_count AS (
  SELECT
    statement_text,
    compute.warehouse_id AS warehouse_id,
    COUNT(*) AS count
  FROM system.query.history qh
  WHERE qh.start_time > CURRENT_TIMESTAMP() - INTERVAL 30 DAY
  GROUP BY statement_text, warehouse_id
)
SELECT
  qh.statement_id,
  qh.statement_text,
  qc.count AS monthly_count,
  any_value(qh.compute) AS compute,
  qh.executed_by_user_id,
  qh.total_duration_ms AS query_duration_ms,
  qh.start_time AS query_start_time,
  qh.end_time AS query_end_time,
  qh.compute.warehouse_id,
  cw.warehouse_size,
  cw.warehouse_name,
  ua.sku_name,
  lp.pricing.default AS pricing_default,
  any_value(ua.wh_start_time) AS wh_start_time,
  any_value(ua.wh_end_time) AS wh_end_time,
  ua.wh_duration_ms,
  ua.wh_usage_quantity,
  try_divide(ua.wh_usage_quantity, ua.wh_duration_ms) AS usage_dbu_per_ms,
  SUM(qh.total_duration_ms * try_divide(ua.wh_usage_quantity, ua.wh_duration_ms) * lp.pricing.default) AS estimated_query_cost,
  SUM(qc.count * qh.total_duration_ms * try_divide(ua.wh_usage_quantity, ua.wh_duration_ms) * lp.pricing.default) AS estimated_30day_query_cost
FROM system.query.history qh
JOIN system.compute.warehouses cw ON qh.compute.warehouse_id = cw.warehouse_id
JOIN usage_aggregates ua
  ON qh.compute.warehouse_id = ua.warehouse_id
  AND ua.wh_start_time <= qh.start_time
  AND ua.wh_end_time >= qh.end_time
JOIN system.billing.list_prices lp ON ua.sku_name = lp.sku_name
JOIN query_count qc
  ON qh.statement_text = qc.statement_text
  AND qh.compute.warehouse_id = qc.warehouse_id
WHERE qh.start_time >= current_date() - INTERVAL 30 DAY
GROUP BY qh.statement_id, qh.statement_text, qc.count, qh.compute.warehouse_id, qh.executed_by_user_id, qh.total_duration_ms, qh.start_time, qh.end_time, cw.warehouse_size, cw.warehouse_name, ua.sku_name, lp.pricing.default, ua.wh_duration_ms, ua.wh_usage_quantity
ORDER BY estimated_30day_query_cost DESC
LIMIT 20;
-
What are the most costly APC clusters over the past 30 days?
-- Most expensive APC clusters in past 30 days
SELECT
  usage_metadata.cluster_id AS cluster_id,
  identity_metadata.created_by AS created_by,
  u.sku_name,
  SUM(u.usage_quantity * lp.pricing.default) AS total_cost,
  c.cluster_name,
  c.owned_by
FROM system.billing.usage u
INNER JOIN system.billing.list_prices lp
  ON u.cloud = lp.cloud
  AND u.sku_name = lp.sku_name
  AND u.usage_start_time >= lp.price_start_time
  AND (u.usage_end_time <= lp.price_end_time OR lp.price_end_time IS NULL)
JOIN system.compute.clusters c ON usage_metadata.cluster_id = c.cluster_id
WHERE u.usage_metadata.cluster_id IS NOT NULL
  AND u.usage_date >= CURRENT_DATE() - INTERVAL 30 DAY
  AND u.sku_name LIKE "%ALL_PURPOSE%"
GROUP BY usage_metadata.cluster_id, identity_metadata.created_by, u.sku_name, c.cluster_name, c.owned_by
ORDER BY total_cost DESC
LIMIT 20;
-
Which APC clusters are likely under-utilized?
-- APC clusters with low average CPU utilization in past 30 days
SELECT
  nt.cluster_id,
  AVG(nt.cpu_user_percent + nt.cpu_system_percent) AS avg_cpu_percent,
  AVG(nt.mem_used_percent) AS avg_mem_used_percent,
  AVG(nt.mem_swap_percent) AS avg_mem_swap_percent,
  bu.sku_name,
  SUM(bu.usage_quantity * lp.pricing.default) AS total_cost,
  cc.cluster_name,
  cc.owned_by
FROM system.compute.node_timeline nt
JOIN system.compute.clusters cc
  ON nt.cluster_id = cc.cluster_id
  AND nt.workspace_id = cc.workspace_id
JOIN system.billing.usage bu
  ON nt.cluster_id = bu.usage_metadata.cluster_id
  AND nt.workspace_id = bu.workspace_id
JOIN system.billing.list_prices lp
  ON bu.cloud = lp.cloud
  AND bu.sku_name = lp.sku_name
  AND bu.usage_start_time >= lp.price_start_time
  AND (bu.usage_end_time <= lp.price_end_time OR lp.price_end_time IS NULL)
WHERE bu.usage_date >= CURRENT_DATE() - INTERVAL 30 DAY
  AND bu.sku_name LIKE "%ALL_PURPOSE_COMPUTE%"
GROUP BY nt.cluster_id, bu.sku_name, cc.cluster_name, cc.owned_by
ORDER BY avg_cpu_percent ASC
LIMIT 20;
-
What are the most expensive DLT clusters for the past 30 days?
-- Most expensive Delta Live Tables clusters
SELECT
  usage_metadata.cluster_id AS cluster_id,
  identity_metadata.created_by AS created_by,
  u.sku_name,
  SUM(u.usage_quantity * lp.pricing.default) AS total_cost
FROM system.billing.usage u
INNER JOIN system.billing.list_prices lp
  ON u.cloud = lp.cloud
  AND u.sku_name = lp.sku_name
  AND u.usage_start_time >= lp.price_start_time
  AND (u.usage_end_time <= lp.price_end_time OR lp.price_end_time IS NULL)
WHERE u.usage_metadata.cluster_id IS NOT NULL
  AND u.usage_date >= CURRENT_DATE() - INTERVAL 30 DAY
  AND u.sku_name LIKE "%DLT%"
GROUP BY usage_metadata.cluster_id, identity_metadata.created_by, u.sku_name
ORDER BY total_cost DESC
LIMIT 20;
Performance Management
-
Which notebooks are consuming the most DBUs over time?
-- Notebooks that have consumed the most DBUs
SELECT
  usage_metadata.notebook_id AS notebook_id,
  usage_metadata.notebook_path AS notebook_path,
  SUM(usage_quantity) AS total_usage
FROM system.billing.usage
WHERE usage_metadata.notebook_id IS NOT NULL
GROUP BY notebook_id, notebook_path
ORDER BY total_usage DESC
LIMIT 10;
-
What are my longest running queries?
-- Slowest queries based upon total duration
SELECT
  statement_id,
  executed_by,
  statement_text,
  total_duration_ms,
  execution_duration_ms,
  compilation_duration_ms,
  waiting_for_compute_duration_ms,
  waiting_at_capacity_duration_ms,
  start_time,
  end_time
FROM system.query.history
ORDER BY total_duration_ms DESC
LIMIT 10;
Data and Workflow Management
-
What are my most commonly used datasets? Where are they stored?
-- Most used data sources by jobs
SELECT
  coalesce(source_table_full_name, source_path) AS source,
  count(distinct entity_id) AS distinct_jobs
FROM system.access.table_lineage
WHERE entity_type = 'JOB'
  AND coalesce(source_table_full_name, source_path) IS NOT NULL
GROUP BY 1
ORDER BY 2 DESC
LIMIT 10;
-
How do I track lineage for data transformations in the workspace?
-- Data lineage (table sources)
SELECT DISTINCT
  target_table_full_name AS target,
  coalesce(source_table_full_name, source_path) AS source
FROM system.access.table_lineage
WHERE entity_type = 'JOB'
  AND coalesce(source_table_full_name, source_path) IS NOT NULL
  AND target_table_full_name IS NOT NULL
ORDER BY 1 ASC;
Get your 1-click
Databricks Dashboard!
Wow the team with a comprehensive Databricks dashboard across Jobs, SQL, APC, and DLT!
Get a bird’s-eye view of the team’s Databricks usage and visualize all of the queries above in just a couple of clicks.
