Do Graviton instances lower costs for Spark on EMR on AWS?
Here at Sync, we are passionate about optimizing cloud infrastructure for Apache Spark workloads. One question we receive a lot is “Do Graviton instances help lower costs?” For a little background, AWS built its own processors, which promise to be a “major leap” in performance. Specifically for Spark on EMR, AWS published a report
Jeffrey Chou
04 Apr 2023
Blog, Case Study
How poor provisioning of cloud resources can lead to 10X slower Apache Spark jobs
The Situation: Let’s say you’re a data engineer and you want to run your data/ML Spark job on AWS as fast as possible. You want to avoid slow Apache Spark performance. After you’ve written your code to be as efficient as possible, it’s time to deploy to the cloud. Here’s the problem: there are over
Jeffrey Chou
24 Mar 2023
Blog
How does the worker size impact costs for Apache Spark on EMR AWS?
Here at Sync, we are passionate about optimizing data infrastructure on the cloud, and one common point of confusion we hear from users is which worker instance size is best for their job. Many companies run production data pipelines on Apache Spark on the Elastic MapReduce (EMR) platform on AWS.
Jeffrey Chou
01 Mar 2023
Blog, Case Study
Databricks driver sizing impact on cost and performance
As many previous blog posts have reported, tuning and optimizing the cluster configurations of Apache Spark is a notoriously difficult problem. Especially when a data engineer needs to lower costs or accelerate runtimes on platforms such as EMR or Databricks on AWS, tuning these parameters becomes a high priority. Here at Sync, we will experimentally
Jeffrey Chou
07 Feb 2023
Blog, Case Study
Is Databricks autoscaling cost efficient?
Here at Sync, we are always trying to learn about and optimize complex cloud infrastructure, with the goal of bringing more knowledge to the community. In our previous blog post, we outlined a few high-level strategies companies employ to squeeze more efficiency out of their cloud data platforms. One very popular response from mid-sized to
Jeffrey Chou
20 Jan 2023
Blog, Case Study
The top 6 lessons learned why companies struggle with cloud data efficiency
Here at Sync, we’ve spoken with companies of all sizes, from some of the largest companies in the world to 50-person startups, that desperately need to improve the cost and efficiency of their data pipelines. Especially in today’s uncertain economy, companies worldwide are implementing best practices and utilizing SaaS tools in an effort
Jeffrey Chou
14 Dec 2022
Blog