Introducing the Sync Databricks Workspace Health Check

The Sync Databricks Workspace health check is a program we’ve spearheaded to help Databricks users identify common mistakes in their workspaces.

Here at Sync, we’ve worked with many companies and reviewed their overall Databricks workspace usage. We’ve seen all sorts of usage, from jobs and all-purpose compute to SQL warehouses and Delta Live Tables, and many of the same patterns keep recurring.

While many companies do operate Databricks well, there are some patterns we’ve observed that have led to wasted compute resources and inflated costs. As a result, we built a tool to help quickly identify these common pitfalls and give users a quick rundown of the health of their overall usage.

With our personalized health check, you’re able to gain insight into:

  • Your top 10 jobs most qualified for Gradient
  • Candidates for EBS, Photon, and autoscaling
  • Compute cluster utilization scoring
  • SQL Warehouse utilization efficiency
  • 12-month projected usage growth
  • Estimated overall cost savings
  • Jobs incorrectly run on all-purpose compute clusters

While we expect the health check to keep evolving and growing, let’s dive into some of the popular metrics used today:

Nail down APC vs. jobs compute usage to significantly reduce costs. Identifying and switching to the most cost-effective compute option for each job can immediately save up to 50%. Companies small and large often run their production jobs on all-purpose compute (APC) clusters when they should be using jobs clusters. While this is a subtle detail, fixing it can cut the cost of those jobs by as much as 2x with just a few clicks.
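As a rough sketch of what spotting this pattern can look like, the snippet below scans job definitions for tasks that point at an already-running cluster. It assumes the job settings have been exported to plain Python dictionaries whose task fields (`existing_cluster_id`, `new_cluster`, `job_cluster_key`) mirror the Databricks Jobs API as we understand it. The function name and the sample jobs are hypothetical, and this is not the health check notebook's actual implementation.

```python
def find_tasks_on_all_purpose_compute(jobs):
    """Return (job_name, task_key) pairs for tasks that reuse an existing cluster."""
    flagged = []
    for job in jobs:
        for task in job.get("tasks", []):
            # Assumption: a task that references a running cluster by ID is on
            # all-purpose compute, while jobs clusters are declared through
            # "new_cluster" or "job_cluster_key".
            if "existing_cluster_id" in task:
                flagged.append((job["name"], task["task_key"]))
    return flagged


# Hypothetical sample data, for illustration only.
sample_jobs = [
    {
        "name": "nightly_etl",
        "tasks": [
            {"task_key": "ingest", "existing_cluster_id": "0101-abc"},
            {"task_key": "transform", "job_cluster_key": "etl_cluster"},
        ],
    },
    {
        "name": "hourly_report",
        "tasks": [
            {"task_key": "build", "new_cluster": {"node_type_id": "i3.xlarge"}},
        ],
    },
]

for job_name, task_key in find_tasks_on_all_purpose_compute(sample_jobs):
    print(f"{job_name}/{task_key} runs on all-purpose compute")
```

Any task flagged this way is a candidate for moving to a jobs cluster.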

Visualize APC and warehouse utilization to identify underused clusters and warehouses within your workspace. Both APC clusters and SQL warehouses can fall into the same pitfall of being “always on” even when nobody is using them. With our health check, you’re able to quickly see where that is happening and how to prevent it.
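One simple way to think about utilization is the share of a resource's uptime that was spent doing actual work. The sketch below assumes you already have uptime and active hours per cluster or warehouse. The field names, the 20% threshold, and the sample numbers are all hypothetical choices for illustration, not the scoring the health check itself uses.

```python
def utilization(active_hours, uptime_hours):
    """Fraction of uptime during which the resource was actually doing work."""
    if uptime_hours <= 0:
        return 0.0
    return active_hours / uptime_hours


def find_underused(resources, threshold=0.2):
    """Return names of resources whose utilization falls below the threshold."""
    return [
        r["name"]
        for r in resources
        if utilization(r["active_hours"], r["uptime_hours"]) < threshold
    ]


# Hypothetical monthly figures, for illustration only.
sample_resources = [
    {"name": "shared-analytics-apc", "uptime_hours": 720, "active_hours": 90},
    {"name": "bi-warehouse", "uptime_hours": 300, "active_hours": 210},
]

print(find_underused(sample_resources))
```

A cluster that is up all month but busy for only a fraction of it is a good candidate for a shorter auto-termination or auto-stop setting.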

Review instance selection across your organization to determine whether your users are moving beyond default settings in an effort to optimize. Platform teams thrive when they’re able to see the distribution of instances in use, which helps identify what kinds of clusters are popular and effective. If the Databricks default instance type is used often (e.g. in AWS it’s “i3”), it’s likely that team members are accepting the default settings and aren’t spending much time looking for better instances for optimal performance.
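To make the idea of an instance distribution concrete, here is a minimal sketch that groups node types by family and reports each family's share. It assumes AWS-style node type names such as "i3.xlarge". The function name and the sample list are hypothetical.

```python
from collections import Counter


def instance_family_shares(node_types):
    """Map each instance family (the part before the dot) to its share of clusters."""
    families = Counter(node_type.split(".")[0] for node_type in node_types)
    total = sum(families.values())
    return {family: count / total for family, count in families.items()}


# Hypothetical node types, one entry per cluster, for illustration only.
sample_node_types = ["i3.xlarge", "i3.2xlarge", "i3.xlarge", "m5.xlarge", "r5.2xlarge"]

shares = instance_family_shares(sample_node_types)
for family, share in sorted(shares.items(), key=lambda item: item[1], reverse=True):
    print(f"{family}: {share:.0%}")
```

A large share for the default family suggests that most clusters were created without much instance tuning.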

Gain a better understanding of EBS, Photon, and autoscaling by seeing how many clusters use these features and assessing the potential savings they could bring to your jobs. Photon and autoscaling are options Databricks often recommends for job clusters. However, these features are only beneficial some of the time, depending on the characteristics of your job.

Rank your top Jobs candidates for Gradient based on schedule, duration, and consistency. One of the largest sources of cost is jobs clusters used in production. Sync’s core product offering helps to automatically optimize these clusters for cost and performance. When you’re working with hundreds, or even thousands, of jobs in your workload, it can be daunting to identify which jobs should take priority. To help with this, your Workspace health check includes a proprietary ranking system that identifies the jobs for which Gradient’s cluster optimizations are a good fit.
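The ranking system itself is proprietary, so the snippet below is not that ranking. It is only a toy heuristic showing how schedule, duration, and consistency could be combined into a single score: daily compute minutes, discounted when run times vary a lot. The formula, the function name, and the sample jobs are all assumptions made for illustration.

```python
from statistics import mean, pstdev


def candidate_score(runs_per_day, durations_minutes):
    """Toy score: daily compute minutes, discounted for inconsistent run times."""
    avg = mean(durations_minutes)
    if avg == 0:
        return 0.0
    # Coefficient of variation: 0 means every run took the same time.
    variation = pstdev(durations_minutes) / avg
    consistency = 1 / (1 + variation)
    return runs_per_day * avg * consistency


# Hypothetical jobs, for illustration only.
sample_candidates = [
    {"name": "daily_backfill", "runs_per_day": 1, "durations_minutes": [60, 120]},
    {"name": "hourly_agg", "runs_per_day": 24, "durations_minutes": [10, 10, 10]},
]

ranked = sorted(
    sample_candidates,
    key=lambda job: candidate_score(job["runs_per_day"], job["durations_minutes"]),
    reverse=True,
)
print([job["name"] for job in ranked])
```

Frequent, long, and steady jobs rise to the top under this heuristic, which matches the intuition that recurring production workloads are where optimization pays off most.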

Our health check notebook is an easy-to-use solution that you can run on your own at no cost.

Want to get a head start and learn more about integrating Gradient into your stack? Head here to request your personalized Databricks health check.

Get a bird’s eye view of your company’s Databricks usage today.