Sync Autotuner for Apache Spark – API Launch!

Sync Gradient – Giving you choice and greater control over the cost and runtime of your Apache Spark jobs

The Sync Autotuner has enabled developers, data engineers, and data scientists, from small startups to large enterprises, to easily tune their Spark jobs and reduce costs, improve runtime, or both.

Infrastructure tuning can significantly impact data engineering productivity. Most developers and data engineers will tell you that finding the optimal Spark and cluster configurations for a Spark job is a tedious, time-consuming effort involving a lot of trial and error. There is a practically infinite number of infrastructure choices, and it isn't feasible to try them all. And when you finally land on the optimal configuration, a change in input data size, code, or spot market availability throws all that effort out the window.

The Sync Autotuner quickly provides you with an optimal set of cluster configurations in terms of cost, runtime, and infrastructure selection, and it can do this using data from a single run. The Sync Autotuner provides a UI through which you can upload Spark job run information (the Spark event log and cluster details) and receive recommendations in the form of cluster configurations that optimize your Spark job run. A data engineer can then quickly select a recommendation from the list, update the job configuration, and rest assured that the job is tuned.

The Sync Autotuner helped Duolingo reduce their ETL and ML costs by 50%, and it helped a Sr. Data Engineer at Disney save $100K in annualized costs while improving job runtime by 73%!  

The Sync Autotuner UI is a quick and simple way to try out the Sync Gradient on an existing Spark job and get real metrics on what it can do for your workloads. The Sync Autotuner API scales with you: it is there whether you want to tune a single job, a few jobs, or all your Spark workloads.

Sync Autotuner API – Programmatic access to the Sync Gradient

All the power of the Sync Autotuner available to you as REST APIs.

The Sync Autotuner API gives you programmatic access, in the form of REST APIs, to the Sync Autotuner. With this access, you can completely automate the work of generating optimal configurations for your Spark jobs, and you can do it at scale across all of them.

The recommendations returned by the Sync Autotuner API aim to provide the convenience of "plug and play": optimal configurations for Spark on AWS EMR are returned using the AWS EMR RunJobFlow schema, and optimal configurations for Databricks on AWS are returned in a format that makes it easy for you to set your Databricks cluster configuration.

We realize that there are many ways in which clusters can be spun up to run Spark jobs. We've designed our recommendations to be as simple and straightforward as possible, making it easy for you to extract optimal configurations and plug them into your workflows.
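To make that concrete, here is a minimal Python sketch of what "plug and play" might look like on the EMR side. The recommendation fragment below is a hypothetical example shaped like the real RunJobFlow request schema; the helper function name and the merge behavior are our own illustration, not part of the Sync API.

```python
import copy

def apply_emr_recommendation(base_request: dict, recommendation: dict) -> dict:
    """Overlay a recommendation (in RunJobFlow schema) onto an existing
    RunJobFlow request. Keys set by the recommendation win; nested dicts
    like "Instances" are merged rather than replaced wholesale."""
    merged = copy.deepcopy(base_request)
    for key, value in recommendation.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = {**merged[key], **value}
        else:
            merged[key] = value
    return merged

# Hypothetical recommendation fragment in RunJobFlow's "Instances" shape.
recommendation = {
    "Instances": {
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "r5.2xlarge",
        "InstanceCount": 6,
    }
}

# Your existing job's RunJobFlow request (abridged).
base = {
    "Name": "nightly-etl",
    "ReleaseLabel": "emr-6.9.0",
    "Instances": {
        "MasterInstanceType": "m5.xlarge",
        "SlaveInstanceType": "m5.xlarge",
        "InstanceCount": 10,
        "KeepJobFlowAliveWhenNoSteps": False,
    },
}

tuned = apply_emr_recommendation(base, recommendation)
# tuned["Instances"] now carries the recommended instance types and count
# while preserving settings the recommendation did not touch.
```

Because the merged dict stays in RunJobFlow form, it can then be handed to your existing launch path (for example, boto3's `emr` client `run_job_flow` call) without any reshaping.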

A typical Sync Autotuner API workflow for a single Spark job has the following steps:

  • Run your Spark job and wait for it to complete
  • Call the Sync Autotuner API to run a prediction, which generates a list of optimal configurations for your Spark job
    1. Initiate Prediction
    2. Check Prediction Status
    3. Get Prediction Results
  • Update your Spark job configuration with a recommended optimal configuration based on your business needs
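The initiate/check/get steps above form a standard submit-and-poll pattern, which can be sketched in Python as follows. The status values and the three wrapper callables are illustrative assumptions, not the API's real schema; in practice each callable would wrap the corresponding Autotuner REST endpoint.

```python
import time

def run_prediction(initiate, get_status, get_results,
                   poll_interval_s=10.0, timeout_s=600.0):
    """Drive the three-step Autotuner workflow:
      1. initiate() -> prediction_id       (Initiate Prediction)
      2. get_status(prediction_id)         (Check Prediction Status)
      3. get_results(prediction_id)        (Get Prediction Results)
    Status strings "RUNNING" / "COMPLETED" / "FAILED" are assumed here
    for illustration."""
    prediction_id = initiate()
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status(prediction_id)
        if status == "COMPLETED":
            # Returns the list of recommended configurations.
            return get_results(prediction_id)
        if status == "FAILED":
            raise RuntimeError(f"prediction {prediction_id} failed")
        if time.monotonic() > deadline:
            raise TimeoutError(f"prediction {prediction_id} timed out")
        time.sleep(poll_interval_s)
```

Keeping the HTTP calls behind plain callables makes the polling logic easy to test and easy to reuse across EMR and Databricks jobs alike.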

When you scale the Sync Autotuner API workflow to span all your Spark jobs, you end up with a workflow that looks like the one in the figure below.
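Scaling out is mostly a matter of running the single-job workflow for every job. Since the work is I/O-bound polling, a small thread pool is a natural fit; the `predict_one` callable below is a hypothetical stand-in for the single-job workflow described above.

```python
from concurrent.futures import ThreadPoolExecutor

def tune_all(job_ids, predict_one, max_parallel=8):
    """Run the single-job prediction workflow for every Spark job.
    predict_one(job_id) -> list of recommended configurations (illustrative).
    Returns {job_id: recommendations}, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return dict(zip(job_ids, pool.map(predict_one, job_ids)))
```

Bounding `max_parallel` keeps the number of in-flight predictions polite toward both the API and your own schedulers.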

Tune all your Apache Spark jobs in just a few minutes

Let’s get started!

Check out our user guide and recipes to quickly get up and running with the Sync Autotuner API. If you don’t yet have access to the Sync Autotuner then you can request access here.

We’d love to hear from you and how you’re using the Sync Autotuner API – tweet at us, find us on LinkedIn, or email us at

Launch the Gradient API