Optimizing EC2 costs on Databricks
The global data landscape is experiencing remarkable growth, with unprecedented increases in data generation and substantial investments in analytics and infrastructure. According to data from sources like Network World and G2, the global datasphere is projected to expand from 33 zettabytes in 2018 to an astounding 175 zettabytes by 2025, reflecting a compound annual growth

Noa Shavit
27 Jan 2025
Blog
Data Lake vs. Data Warehouse vs. Data Lakehouse
Data is central to modern business and society. Depending on what sort of leaky analogy you prefer, data can be the new oil, gold, or even electricity. Of course, even the biggest data sets are worthless, and might even be a liability, if they aren’t organized properly. Data collected from every corner of modern society

Noa Shavit
07 Nov 2024
Blog
Databricks driver sizing impact on cost and performance
As many previous blog posts have reported, tuning and optimizing the cluster configurations of Apache Spark is a notoriously difficult problem. Especially when a data engineer needs to lower costs or accelerate runtimes on platforms such as EMR or Databricks on AWS, tuning these parameters becomes a high priority. Here at Sync, we will experimentally

Jeffrey Chou
07 Feb 2023
Blog, Case Study