Gradient

March 2024 Release Notes

release notes

Our team has been hard at work to deliver industry-leading features to support users in achieving optimal performance within the Databricks ecosystem. Take a look at our most recent releases below.

Worker Instance Recommendations

Introducing Worker Instance Recommendations directly from the Sync UI. With this feature, you are able to tap into optimal cluster configuration recos so that your configs are optimized for individual jobs.

The instance recos within Gradient not only optimize the number of workers, but also the worker size. For example, if you are using i3.2xl instances, Gradient will find the right instance size (such as i3.xl, i3.4xl, i3.8xl, etc) in the i3 instance type.


Instance Fleet Support

If your company is using Instance Fleet Clusters, Gradient is now compatible!  There are no changes required on the user flow, as this feature is automatically supported in the backend.  Just onboard your jobs like normal into Gradient, and we’ll handle the rest.

Hosted Log Collection


Running Gradient is now more streamlined than ever! You’re now able to opt into hosted log collection entirely in the Sync environment with Sync-hosted or user-hosted collection options. What does this mean? It means that there are no extra steps or external clusters needed to run Gradient, allowing Sync to do all the heavy lifting while minimizing the impact on your Databricks workspace. 

With hosted DBX log collection within Gradient, you’re able to minimize onboarding errors due to annoying permission settings while increasing visibility into any potential collection failures, ultimately giving you and your team more control over your cluster log data.


Getting Started with Collection Setup
The Databricks Workspace integration flow is triggered when a user clicks on Add → Databricks Workspace after they have configured their workspace and webhook. Users will also now have a toggle option to choose between Sync-hosted (recommended) or User-hosted collection.

  • Sync-hosted collection – The user will be optionally prompted to share their preference for cluster logs stored for their Databricks Jobs. This will initially be an immutable setting saved on the Workspace.
    • For AWS – Users will need to add a generated IAM policy and IAM Role to their AWS account. The IAM policy allows us to ec2:DescribeInstances, ec2:DescribeVolumes, and optionally an s3:GetObject and s3:ListBucket to the specific bucket and prefix to which users have configured uploading cluster logs. S3 permissions are optional because they may be using DBFS to record cluster logs. The user needs to add a “Trusted Relationship” to the IAM role to give our Sync IAM role permissions to sts:AssumeRole using an ExternalId we provide them. Gradient will then generate this policy and trust relationship for the user in a JSON format to be copy and pasted.
    • For Azure – Coming soon!
  • User-hosted collection – For both Azure/AWS will proceed as the normal workspace integration requirements dictate

Stay up to date with the latest feature releases and updates at Sync by visiting our Product Updates documentation.

Ready to start getting the most out of your Databricks job clusters? Request a demo or reach out to us at info@synccomputing.com.

February 2024 Release Notes

release notes

We’re excited to share all the new and improved features that our team has recently released to help our customers gain full governance over their Databricks infrastructure.

Databricks Workspace Integration
Introducing the Databricks Workspace Integration for Gradient. With this new feature, you’re able to further simplify the process of connecting your Databricks Workspace to the Sync platform. This capability eases the tedious process of consolidating with the Gradient UI without the use of the Sync CLI.

To get started, head to the integrations tab in your Sync dashboard. Here you’ll see a list that includes Databricks Workspace. Navigate to the Add dropdown menu and click on the Databricks Workspace dropdown option to trigger the integration flow.


Log in to Gradient to get started.

Project Reset Data
As users integrate their projects into Sync, they are often faced with sudden config changes. Project Reset is a capability built directly into the Sync platform in which users will be able to perform a hard  “reset” on the data for a project, ultimately triggering the build of a new custom model for the related job.

Now available via the Sync API, coming soon to the Sync UI


With this new capability, you’re able to reset the following directly from the Sync UI:

  • Historical logs
  • Resets the selected project back to “learning” mode
  • Clears project graphs
  • Clears the project’s history table
{
  "result": [
    {
      "created_at": "2024-02-21T02:35:46.806Z",
      "updated_at": "2024-02-21T02:35:46.806Z",
      "id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
      "name": "string",
      "app_id": "string",
      "cluster_path": "string",
      "job_id": "string",
      "workspace_id": "string",
      "workflow_id": "string",
      "creator_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
      "product_code": "aws-emr",
      "description": "string",
      "status": "Pending Setup",
      "cluster_log_url": "string",
      "prediction_preference": "performance",
      "auto_apply_recs": true,
      "prediction_params": {
        "sla_minutes": 0,
        "force_ondemand_workers": true,
        "fix_worker_family": true,
        "fix_driver_type": true,
        "fix_scaling_type": true
      },
      "tuned_cost": 0,
      "tuned_runtime": 0,
      "project_model_id": "UNASSIGNED",
      "metrics": {
        "job_success_rate_percent": 0,
        "sla_met_percent": 0
      },
      "latest_prediction_id": "string",
      "latest_prediction_created_at": "string",
      "creator": {
        "created_at": "2024-02-21T02:35:46.806Z",
        "updated_at": "2024-02-21T02:35:46.806Z",
        "id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
        "sync_tenant_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
        "email": "string",
        "name": "string",
        "last_login": "string"
      },
      "phase": "LEARNING",
      "optimize_instance_size": true,
      "project_periodicity_type": "DAILY_SINE",
      "product_name": "string"
    }
  ]
}


User Management
With User Management, you’re able to take a hands-on approach to managing your users in Gradient. With this feature, account owners can:

  • Add a user
  • Deactivate a user
  • Assign a specific role to a user

Stay up to date with the latest feature releases and updates at Sync by visiting our Product Updates documentation.

Ready to start getting the most out of your Databricks job clusters? Reach out to us at info@synccomputing.com.

Sync Computing Partners with Databricks for Lakehouse Job Cluster and Usage Optimization

Self-improving machine learning algorithms provide job cluster optimization and insights for Databricks users

CAMBRIDGE, Mass. – Sync Computing, the industry-leading data infrastructure management platform built to leverage machine learning (ML) algorithms that allow users to automatically maximize data compute performance, today announced that it has joined forces with Databricks go-to-market (GTM) teams and their Technology Partner Program. The end goal is to help Databricks customers achieve lower costs, improved reliability, and automatic management of compute clusters at scale. With the collaboration of the two technology powerhouses efforts, Databricks customers will gain the opportunity to take advantage of Sync Computing’s Gradient solution for SLA optimization, real-time insights, and significant cost savings so that teams are able to focus on greater business objectives and ROI.

Platform and data engineering teams are constantly faced with changing pressures as the data infrastructure landscape becomes increasingly complex. They are met with ongoing needs to iterate quickly, gain real-time insights, and maximize performance all while managing cost. The Gradient platform by Sync Computing provides a single source of truth for cost tracking, data governance, and unified metrics monitoring.

The management and cost of data pipelines is top of mind for engineering teams especially in the current economic climate. However, tuning clusters to hit cost and runtime goals is a task nobody has time for,” said Jeffrey Chou, CEO and co-founder of Sync Computing. “Databricks customers who use Sync’s Gradient toolkit are now open to a whole new world of opportunities as they can offload these tasks to Gradient while they focus on more urgent business goals. Organizations absolutely love the ROI they see almost immediately.”

Sync Computing’s machine learning-powered optimization delivers recommendations for Databricks clusters, without making any changes at the code level. Using a closed-loop feedback system, Gradient automatically builds custom-tuned machine learning models for each Databricks job it is managing using historical run logs — continuously driving Databricks jobs cluster configurations to hit user-defined business goals.

Sync for Databricks allows companies to:

  • Enable platform teams full governance over config changes to meet business demands
  • Slash Databricks compute and operating costs by up to 50%
  • Gain coveted insights into DBU, cloud costs, and cluster anomalies
  • Hit SLAs even as data pipelines change

Sync integrates with leading cloud platforms like Amazon Web Services (AWS) and Microsoft Azure to programmatically optimize for tools like Apache Airflow and Databricks workflows, without changing a single line of code.

Learn how Sync helps organizations large and small optimize Databricks clusters at scale here.

About Sync Computing
Having been recognized as a Gartner Cool New Vendor, Sync Computing was originally spun out of MIT with the goal to make data and AI cloud infrastructure easier to control. With Sync’s one-of-a-kind solution, Gradient, users are given full ability to enable self-improving job clusters to hit SLA goals, gain infrastructure insights, and leverage tailored recommendations to achieve optimal performance. Recognized names such as Insider, Handelsblatt, Abnormal Security, Duolingo, and Adobe have relied on Sync to get the most out of the data-driven landscape with automated data optimization. To learn more, visit https://www.synccomputing.com.

Contact
McKinley Culbert
Marketing at Sync Computing
mckinley.culbert@synccomputing.com

Gradient New Product Update Q4 2023

Today we are excited to announce our next major product update for Gradient to help companies optimize their Databricks Jobs clusters.  This update isn’t just a simple UI upgrade…

We upgraded everything from the inside out! 

Without burying the lead, here’s a screenshot of the new project page for Gradient above.

Back in the last week of June of this year (2023), we debuted our first release of Gradient.  In the past few months we gathered all of the user feedback on how we can make the experience even better.

So what were the high level major feature requests that we learned in the past few months?

  • Visualizations – Visual graphs which show the cost and runtime impact of our recommendations to see the impact and ROI of Gradient
  • Easier integration – Easier “one-click” installation and setup experience with Databricks
  • More gains – Larger cost savings gains custom tailored to the unique nature of each job
  • Azure support – A large percentage of Databricks users are on Azure, and obviously they wanted us to support them

Those features requests weren’t small and required pretty substantial changes from the backend to the front, but at the end of the day we couldn’t agree more with the feedback.  While a sane company would prioritize and tackle these one by one, we knew each one of these were actually interrelated behind the scenes, and it wasn’t just a simple matter of checking off a list of features.

Here’s our high level demo video to see the new features in action!

So we took the challenge head on and said “let’s do all of it!”   With all of that in mind, let’s walk through each awesome new features!

Feature #1:  See Gradient’s ROI with cost and runtime Visualizations

With new timeline graphs users can see in real-time the performance of their jobs and what impact Gradient is having. As a general monitoring tool, users can now see the impact of various cloud anomalies on their cost and runtime. A summary of benefits is below:

  • Monitor your jobs total costs across both DBUs and cloud fees in real-time to stay informed
  • Ensure your job runtimes and SLAs are met
  • Learn what anomalies are impacting your jobs’ performances
  • Visualize Gradient’s value in by watching your cost and runtime goals being met

Feature #2:  Cluster integrations with AWS and Azure

Gradient now interfaces with both AWS and Azure cloud infrastructure to obtain low level metrics. We know many Databricks enterprises utilize Azure and this was a highly requested feature. A summary of benefits is below:

  • Granular compute metrics are obtained by retrieving cluster logs beyond what Databricks exposes in their system tables
  • Integrate with Databricks Workflows or Airflow to plug Gradient into how your company runs your infrastructure
  • Easy metrics gathering as Gradient does the heavy lifting for you and automatically compiles and links information across both Databricks and cloud environments

Feature #3:  A new machine learning algorithm that custom learns each job

A huge upgrade from our previous solution is a new machine learning algorithm that learns the behavior of each job individually before optimizing. One lesson we learned is each job is unique, from python, to SQL, to ML, to AI, the variety of codebases out there is massive. A blanket “heuristic” solution was not scalable, and it was clear we needed something far more intelligent. A summary of the benefits is below:

  • Historical log information is used to train custom models for each of your jobs.  Since no two jobs are alike, custom models are critical to optimizing at scale.
  • Accuracy is ensured by training on real job performance data
  • Stability is obtained with small incremental changes and monitoring to ensure reliable performance

Feature #4: Auto-import and setup all of your jobs with a single click

Integrating with the Databricks environment is not easy, as most practitioners can attest to. We invested a lot of development into “how do we make it easy to on-board jobs?” After a bunch of work and talking to early users – we’ve built the easiest system we could find – just push a button.

Behind the scenes, we’re interacting with the Databricks API, tokens, secrets, init scripts, webhooks, logging files, cloud compute metrics, storage – just to name a few. A summary of the benefits is below:

  • Gradient connects to your Databricks workspace behind the scenes to make importing and setting up job clusters as easy as a single click
  • Non-invasive webhook integration is used to link your environment with Gradient without any modifications to your code or workflows

Feature #5:  View and approve recommendations with a click

With all of the integration setup done in the previous feature, applying recommendation is now a piece of cake. Just click a button and your Databricks jobs will be automatically updated. No need to go into the DB console or change anything in another system. We take care of all of that for you! A summary of the benefits is below:

  • View recommendations in the Gradient UI for your approval before any changes are actually made
  • Click to approve and apply a single recommendation so you are always in control

Feature #6:  Change your SLA goals at any time

We always believed that business should drive infrastructure, not the other way around. Now you can change your SLA goals at anytime and Gradient will change your cluster settings to meet your goals. With the new visualizations, you can see everything changing in real time as well. A summary of the benefits is below:

  • Runtime SLA goals ultimately dictate the cost and performance of your jobs.  Longer SLAs can usually lead to lower costs, while shorter SLAs could lead to higher costs.
  • Goals change constantly for your business, Gradient allows your infrastructure to keep up at scale
  • Business lead infrastructure allows you to start with your business goals and work backwards to your infrastructure, not the other way around

Feature #7:  Enable auto-apply for self-improving jobs

One big request was for users at scale, who have hundreds or thousands of jobs. There’s no way someone would want to click an “apply” button 1000x a day! So, for our ultimate experience, we can automatically apply our recommendations and all you have to do is sit back and watch the savings. A summary of the benefits is below:

  • Focus on business goals by allowing Gradient to constantly improve your job clusters to meet your ever changing business needs
  • Optimize at scale with auto apply, no need to manually analyze individual jobs – just watch Gradient get to work across all of your jobs
  • Free your engineers from manually tweaking cluster configurations and allowing them to focus on more important work

Try it yourself!

We’d love to get your feedback on what we’re building.  We hope these features resonate with you and your use case.  If you have other use cases in mind, please let us know! 

To get started – see our docs for the installation process!

Connect with us now via booking a demo, chatting with us, or emailing us at support@synccomputing.com.

How to Use the Gradient CLI Tool to Optimize Databricks / EMR Programmatically

Introduction:

The Gradient Command Line Interface (CLI) is a powerful yet easy utility to automate the optimization of your Spark jobs from your terminal, command prompt, or automation scripts. 

Whether you are a Data Engineer, SysDevOps administrator, or just an Apache Spark enthusiast, knowing how to use the Gradient CLI can be incredibly beneficial as it can dramatically reduce the cost of your Spark workloads and while helping you hit your pipeline SLAs. 

If you are new to Gradient, you can learn more about it in the Sync Docs. In this tutorial, we’ll walk you through the Gradient CLI’s installation process and give you some examples of how to get started. This is meant to be a tour of the CLI’s overall capabilities. For an end to end recipe on how to integrate with Gradient take a look at our Quick Start and Integration Guides.

Pre Work

This tutorial assumes that you have already created a Gradient account and generated your

Sync API keys. If you haven’t generated your key yet, you can do so on the Accounts tab of the Gradient UI.

Step 1: Setting up your Environment

Let’s start by making sure our environment meets all the prerequisites. The Gradient CLI is actually part of the Sync Library, which requires Python v3.7 or above and which only runs on Linux/Unix based systems.

python --version

I am on a Mac and running python version 3.10, so I am good to go, but before we get started I am going to create a Python virtual environment with vEnv. This is a good practice for whenever you install any new Python tool, as it allows you to avoid conflicts between projects and makes environment management simpler. For this example, I am creating a virtual environment called gradient-cli that will reside under the ~/VirtualEnvironments path.

python -m venv ~/VirtualEnvironments/gradient-cli

Step 2: Install the Sync Library

Once you’ve confirmed that your system meets the prerequisites, it’s time to install the Sync Library. Start by activating your new virtual environment.

source ~/VirtualEnvironments/gradient-cli/bin/activate

Next use the pip package installer to install the latest version of the Sync Library.

pip install https://github.com/synccomputingcode/syncsparkpy/archive/latest.tar.gz

You can confirm that the installation was successful by viewing the CLI executable’s version by using the –version or –help options.

sync-cli --help

Step 3. Configure the Sync Library

Configuring the CLI with your credentials and preferences is the final step for the installation and setup for the Sync CLI. To do this, run the configure command:

sync-cli configure

You will be prompted for the following values:

Sync API key ID:

Sync API key secret:

Default prediction preference (performance, balanced, economy) [economy]:

Would you like to configure a Databricks workspace? [y/n]:

Databricks host (prefix with https://):

Databricks token:

Databricks AWS region name:

If you remember from the Pre Work, your Sync API key & secret are found on the Accounts tab of the Gradient UI. For this tutorial we are running on Databricks, so you will need to provide a Databricks Workspace and an Access token.


Databricks recommends that you set up a service principal for automation tasks. As noted in their docs, service principals give automated tools and scripts API-only access to Databricks resources, providing greater security than using users or groups.

These values are stored in ~/.sync/config.

Congrats! You are now ready to interact with Gradient from your terminal, command prompt, or automation scripts.

Step 4. Example Uses

Below are some tasks you can complete using the CLI. This is useful when you want to automate Gradient processes and incorporate them into larger workflows.

Projects

All Gradient recommendations are stored in Projects. Projects are associated with a single Spark job or a group of jobs running on the same cluster. Here are some useful commands you can use to manage your projects with the CLI. For an exhaustive list of commands use the –help option.

Project Commands:

create – Create a project

sync-cli projects create --description [TEXT] --job-id [Databricks Job ID] PROJECT_NAME

delete – Delete a project

sync-cli projects delete PROJECT_ID

get – Get info on a project

sync-cli projects get PROJECT_ID

list – List all projects for account

sync-cli projects list

Predictions

You can also use the CLI to manage, generate and retrieve predictions. This is useful when you want to automate the implementation of recommendations within your Databricks or EMR environments.

Prediction commands:

get – Retrieve a specific prediction

sync-cli predictions get --preference [performance|balanced|economy] PREDICTION_ID

list – List all predictions for account or project

sync-cli predictions list --platform [aws-emr|aws-databricks] --project TEXT

status – Get the status of a previously initiated prediction

sync-cli predictions status PREDICTION_ID

The CLI also provides platform specific commands to generate and retrieve predictions.

Databricks

For Databricks you can generate a recommendation for a previously completed job run with the following command:

sync-cli aws-databricks create-prediction --plan [Standard|Premium|Enterprise] --compute ['Jobs Compute'|'All Purpose Compute'] --project [Your Project ID] RUN_ID

If the run you provided was not already configured with the Gradient agent when it executed, you can still generate a recommendation but the basis metrics may be missing some time sensitive information that may no longer be available. To enable evaluation of prior logs executed without the Gradient agent, you can add the –allow-incomplete-cluster-report option. However, to avoid this issue altogether, you can implement the agent and re-run the job.

Alternatively, you can use the following command to run the job and request a recommendation with a single command:

sync-cli aws-databricks run-job --plan [Standard|Premium|Enterprise] --compute ['Jobs Compute'|'All Purpose Compute'] --project [Your Project ID] JOB_ID

This method is useful in cases when you are able to manually run your job without interfering with scheduled runs.

Finally, to implement a recommendation and run the job with the new configuration, you can issue the following command:

sync-cli aws-databricks run-prediction --preference [performance|balanced|economy] JOB_ID PREDICTION_ID

EMR

Similarly, for Spark EMR, you can generate a recommendation for a previously completed job. EMR does not have the same issue with regard to ephemeral cost data not being available, so you can request a recommendation on a previous run without the Gradient agent.

sync-cli aws-emr create-prediction --region [Your AWS Region] CLUSTER_ID

Use the following command to do so:

If you want to manually rerun the EMR job and immediately request a Gradient recommendation, use the following command:

sync-cli aws-emr record-run --region [Your AWS Region] CLUSTER_ID PROJECT

To execute the EMR job using the recommended configuration, use the following command:

sync-cli aws-emr run-prediction --region [Your AWS Region] PREDICTION_ID

Products

Gradient is constantly working on adding support for new data engineering platforms. To see which platforms are supported by your version of the CLI, you can use the following command:

sync-cli products

Configuration

Should you ever need to update your CLI configurations, you can call config again to change one or more your values.

sync-cli configure --api-key-id TEXT --api-key-secret TEXT --prediction-preference TEXT --databricks-host TEXT --databricks-token TEXT --databricks-region TEXT

Token

The Token command returns an access token that you can use against our REST API with clients like postman

sync-cli token

Conclusion

With these simple commands, you can automate the end to end optimization of all your Databricks or EMR workloads, dramatically reducing your costs and improving the performance. For more information refer to our developer docs or reach out to us at info@synccomputing.com.

Introducing: Gradient for Databricks

Wow the day is finally here! It’s been a long journey, but we’re so excited to announce our newest product: Gradient for Databricks.

Checkout our promo video here!

The quick pitch

Gradient is a new tool to help data engineers know when and how to optimize and lower their Databricks costs – without sacrificing performance.

For the math geeks out there, the name Gradient comes from the mathematical operator from vector calculus that is commonly used in optimization algorithms (e.g. gradient descent).

Over the past 18 months of development we’ve worked with data engineers around the world to understand their frustrations when trying to optimize their Databricks jobs. Some of the top pains we heard were:

  • “I have no idea how to tune Apache Spark”
  • “Tuning is annoying, I’d rather focus on development”
  • “There are too many jobs at my company. Manual tuning does not scale”
  • “But our Databricks costs are through the roof and I need help”

How did companies get here?

We’ve worked with companies around the world who absolutely love using Databricks. So how did so many companies (and their engineers) get to this efficiency pain point? At a high level, the story typically goes like this:

  • “The Honeymoon” phase: We are starting to use Databricks and the engineers love it
  • “The YOLO” phase: We need to build faster, let’s scale up ASAP. Don’t worry about efficiency.
  • “The Tantrum” phase: Uh oh. Everything on Databricks is exploding, especially our costs! Help!

So what did we do?

We wanted to attack the “Tantrum” problem head on. Internally three teams of data science, engineering, and product worked hand in hand with early design partners using our Spark Autotuner to figure out how to deliver a deeply technical solution that was also easy and intuitive. We used all the feedback on the biggest problems to build Gradient:

User Problem What Gradient Does
I don’t know when, why, or how to optimize my jobsGradient continuously monitors your clusters to notify you of when a new optimization is detected, estimate the cost/performance impact, and output a JSON configuration file to easily make the change.
I use Airflow or Databricks Workflows to orchestrate our jobs, everything I use must easily integrate.Our new python libraries and quick-start tutorials for Airflow and Databricks Workflows make it easy to integrate Gradient into your favorite orchestrators.
I just want to state my runtime requirements, and then still have my costs loweredSimply set your ideal max runtime and we’ll configure the cluster to hit your goals at the lowest possible cost.
My company wants us to use Autoscaling for our jobs clustersWhether you use auto-scaled or fixed clusters, Gradient supports optimizing both (or even switching from one to the other). 
I have hundreds of Databricks jobs. I need batch importing for optimizing to workProvide your Databricks token, and we’ll do all the heavy lifting of automatically fetching all of your qualified jobs and importing them into Gradient.

We want to hear from you!

Our early customers made Gradient what it is today, and we want to make sure it’s meeting companies’ needs. We made getting started super easy (you can Just login to the app here). Feel free to browse the docs here. Please let us know how we did via Intercom (in the docs and app).