Gradient

Gradient Product Update— Discover, Monitor, and Automate Databricks Clusters

Jeffrey Chou
06.06.2024

With the Databricks Data+AI Summit 2024 just around the corner, we of course had to have a major product launch to go with it!

We’re super excited to announce an entirely new user flow and features to the product, making it faster to get started and providing a more comprehensive management solution. At a high level, the new expansion involves these new features:

Discover – Find new jobs to optimize
Monitor – Track all job costs and metrics over time
Automate – Auto-apply to save time, costs, and hit SLAs

With ballooning Databricks costs and constrained budgets, Databricks efficiency is crucial for sustainable growth for any company. However, optimizing Databricks clusters is a difficult and time consuming task riddled with low level complexity and red tape.

Our goal with Gradient is to make it as easy and painless as possible to identify jobs to optimize, track overall ROI, and automate the optimization process.

The last automation piece is what sets Gradient apart. Gradient is designed to optimize at scale, for companies that have 100+ production jobs. At that scale, automation is a must, and is where we shine. Gradient provides a new level of efficiency unobtainable with any other tool on this planet.

With automatic cluster management, engineering teams are free to pursue more important business goals while Gradient works around the clock.

Let’s drill in a bit deeper into what these new features are:

Discover

Find your top jobs to optimize as well as discover new opportunities to improve your efficiency even more. This page is refreshed daily so you always get up-to date insights and historical tracking.

How to get started – Simply enter your Databricks credentials and click go! You can get running from scratch in less than a minute

What is shown:

Top jobs to optimize with Gradient
Jobs with Photon enabled
Jobs with Autoscaling enabled
All purpose compute jobs
Jobs with no-job id (meaning they could come from an external orchestrator like Airflow)

To see how fast and easy it is to get the Discover page up and running, check out the video below:

Monitor

Track Spark metrics and costs of all of your jobs managed with Gradient in a single pane of glass view. Use this view to get a bird’s eye view on all of your jobs and track the overall ROI of Gradient with the “Total Savings” view.

How to get started – Onboard your Databricks workspace in the integration page. This may require involving your devops teams as various cloud permissions are required.

What is shown:

Total core hours
Total Spend
Total recommendations applied
Total cost savings
Total estimated developer time saved
Total number of projects
Number of SLAs met

Automate

Enable auto-apply to automatically optimize your Databricks jobs clusters to hit your cost and runtime goals. Save time and money with automation.

How to get started – Onboard your Databricks workspace in the integration page (no need to repeat if already done above)

What is shown:

Job costs over time
Job runtime over time
Job configuration parameters
Cluster configurations
Spark metrics
Input data size

Conclusion

Get started in a minute yourself with the Discover page and start finding new opportunities to optimize your Databricks environment. Login yourself to get started!

Or if you’d prefer a hands on demo, we’d be happy to chat. Schedule a demo here

May 2024 Release Notes

Jeffrey Chou
05.29.2024

April showers bring May product updates! Take a look at Sync’s latest product releases and features. 💐

The Sync team is heading to San Francisco for the Databricks Data+AI Summit 2024! We’ll be at Booth #44 talking all things Gradient with a few new surprise features in store.

Want to get ahead of the crowd? Book a meeting with our team before the event here.

Download our Databricks health check notebook

Have you taken advantage of our fully customizable health check notebook yet?

With the notebook, you’ll be able to answer questions such as:
⚙️ What is the distribution of job runs by compute type?
⚙️ What does Photon usage look like?
⚙️ What are the most frequently used instance types?
⚙️ Are APC clusters being auto-terminated or sitting idle?
⚙️ What are my most expensive jobs?

The best part? It’s a free tool that gives you actionable insights so you can work toward optimally managing your Databricks jobs clusters.

Head here to get started.

Apache Airflow Integration

Apache Airflow for Databricks now directly integrates with Gradient. Via the Sync Python Library, users are able to integrate Databricks pipelines when using 3rd party tools like Airflow.

To get started simply integrate your Databricks Workspace with Gradient via the Databricks Workspace Integration. Then, configure your Airflow instance and ensure that the syncsparkpy library has been installed using the Sync CLI.

Take a look at an example Airflow DAG below:

from airflow import DAG
from airflow.decorators import task
from airflow.operators.python import PythonVirtualenvOperator
from airflow.providers.databricks.operators.databricks import DatabricksSubmitRunOperator
from airflow.utils.dates import days_ago
from airflow.models.variable import Variable
from airflow.models import TaskInstance

from sync.databricks.integrations.airflow import airflow_gradient_pre_execute_hook



default_args = {
    'owner': 'airflow'
}

with DAG(
    dag_id='gradient_databricks_multitask',
    default_args=default_args,
    schedule_interval = None,
    start_date=days_ago(2),
    tags=['demo'],
    params={
        'gradient_app_id': 'gradient_databricks_multitask',
        'gradient_auto_apply': True,
        'cluster_log_url': 'dbfs:/cluster-logs',
        'databricks_workspace_id': '10295812058'
    }
) as dag:

    def get_task_params():
        task_params = {
            "new_cluster":{
                  "node_type_id":"i3.xlarge",
                  "driver_node_type_id":"i3.xlarge",
                  "custom_tags":{},
                  "num_workers":4,
                  "spark_version":"14.0.x-scala2.12",
                  "runtime_engine":"STANDARD",
                  "aws_attributes":{
                     "first_on_demand":0,
                     "availability":"SPOT_WITH_FALLBACK",
                     "spot_bid_price_percent":100
                  }
               },
               "notebook_task":{
                  "notebook_path":"/Users/pete.tamisin@synccomputing.com/gradient_databricks_multitask",
                  "source":"WORKSPACE"
               }
        }

        return task_params

    notebook_task = DatabricksSubmitRunOperator(
        pre_execute=airflow_gradient_pre_execute_hook,
        task_id="notebook_task",
        dag=dag,
        json=get_task_params(),
    )

##################################################################


    notebook_task

And voila! After implementing your DAG, head to the Projects dashboard in Gradient to review recommendations and make any necessary changes to your cluster config.

Take a look at our documentation to get started.

April 2024 Release Notes

Jeffrey Chou
04.29.2024

Our April releases are here! Take a look at Sync’s latest product updates and features.

Sync’s Databricks Workspace health check is now self-serve and available as a notebook that you simply download and run on your own.

With the notebook, you’ll be able to answer questions such as:

⚙️ What is the distribution of job runs by compute type?
⚙️ What does Photon usage look like?
⚙️ What are the most frequently used instance types?
⚙️ Are APC clusters being auto-terminated or sitting idle?
⚙️ What are my most expensive jobs?

The best part? It’s a free tool that gives you actionable insights so you can work toward optimally managing your Databricks jobs clusters. Head here to get started.

Hosted Log Collection for Microsoft Azure

You’re now able to easily onboard your Databricks jobs on Azure. With Sync-hosted collection within Gradient, users are able to minimize onboarding errors with a “low-touch” integration process.

Want to give new features a try and learn more about the latest Gradient updates? Get started for free here.

Job Metrics Timeline View

Track custom Spark and Gradient metrics for your projects directly from the Gradient dashboard. With this enhanced view, you’re able to visualize metrics like core hours, number of workers, input data, and more!

March 2024 Release Notes

Jeffrey Chou
03.26.2024

Our team has been hard at work to deliver industry-leading features to support users in achieving optimal performance within the Databricks ecosystem. Take a look at our most recent releases below.

Worker Instance Recommendations

Introducing Worker Instance Recommendations directly from the Sync UI. With this feature, you are able to tap into optimal cluster configuration recos so that your configs are optimized for individual jobs.

The instance recos within Gradient not only optimize the number of workers, but also the worker size. For example, if you are using i3.2xl instances, Gradient will find the right instance size (such as i3.xl, i3.4xl, i3.8xl, etc) in the i3 instance type.

Instance Fleet Support

If your company is using Instance Fleet Clusters, Gradient is now compatible! There are no changes required on the user flow, as this feature is automatically supported in the backend. Just onboard your jobs like normal into Gradient, and we’ll handle the rest.

Hosted Log Collection

Running Gradient is now more streamlined than ever! You’re now able to opt into hosted log collection entirely in the Sync environment with Sync-hosted or user-hosted collection options. What does this mean? It means that there are no extra steps or external clusters needed to run Gradient, allowing Sync to do all the heavy lifting while minimizing the impact on your Databricks workspace.

With hosted DBX log collection within Gradient, you’re able to minimize onboarding errors due to annoying permission settings while increasing visibility into any potential collection failures, ultimately giving you and your team more control over your cluster log data.

Getting Started with Collection Setup
The Databricks Workspace integration flow is triggered when a user clicks on Add → Databricks Workspace after they have configured their workspace and webhook. Users will also now have a toggle option to choose between Sync-hosted (recommended) or User-hosted collection.

Sync-hosted collection – The user will be optionally prompted to share their preference for cluster logs stored for their Databricks Jobs. This will initially be an immutable setting saved on the Workspace.
- For AWS – Users will need to add a generated IAM policy and IAM Role to their AWS account. The IAM policy allows us to ec2:DescribeInstances, ec2:DescribeVolumes, and optionally an s3:GetObject and s3:ListBucket to the specific bucket and prefix to which users have configured uploading cluster logs. S3 permissions are optional because they may be using DBFS to record cluster logs. The user needs to add a “Trusted Relationship” to the IAM role to give our Sync IAM role permissions to sts:AssumeRole using an ExternalId we provide them. Gradient will then generate this policy and trust relationship for the user in a JSON format to be copy and pasted.
- For Azure – Coming soon!

User-hosted collection – For both Azure/AWS will proceed as the normal workspace integration requirements dictate

Stay up to date with the latest feature releases and updates at Sync by visiting our Product Updates documentation.

Ready to start getting the most out of your Databricks job clusters? Request a demo or reach out to us at info@synccomputing.com.

February 2024 Release Notes

Jeffrey Chou
02.21.2024

We’re excited to share all the new and improved features that our team has recently released to help our customers gain full governance over their Databricks infrastructure.

Databricks Workspace Integration
Introducing the Databricks Workspace Integration for Gradient. With this new feature, you’re able to further simplify the process of connecting your Databricks Workspace to the Sync platform. This capability eases the tedious process of consolidating with the Gradient UI without the use of the Sync CLI.

To get started, head to the integrations tab in your Sync dashboard. Here you’ll see a list that includes Databricks Workspace. Navigate to the Add dropdown menu and click on the Databricks Workspace dropdown option to trigger the integration flow.

Project Reset Data
As users integrate their projects into Sync, they are often faced with sudden config changes. Project Reset is a capability built directly into the Sync platform in which users will be able to perform a hard “reset” on the data for a project, ultimately triggering the build of a new custom model for the related job.

*Now available via the Sync API, coming soon to the Sync UI*

With this new capability, you’re able to reset the following directly from the Sync UI:

Historical logs
Resets the selected project back to “learning” mode
Clears project graphs
Clears the project’s history table

Successful response

{
  "result": [
    {
      "created_at": "2024-02-21T02:35:46.806Z",
      "updated_at": "2024-02-21T02:35:46.806Z",
      "id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
      "name": "string",
      "app_id": "string",
      "cluster_path": "string",
      "job_id": "string",
      "workspace_id": "string",
      "workflow_id": "string",
      "creator_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
      "product_code": "aws-emr",
      "description": "string",
      "status": "Pending Setup",
      "cluster_log_url": "string",
      "prediction_preference": "performance",
      "auto_apply_recs": true,
      "prediction_params": {
        "sla_minutes": 0,
        "force_ondemand_workers": true,
        "fix_worker_family": true,
        "fix_driver_type": true,
        "fix_scaling_type": true
      },
      "tuned_cost": 0,
      "tuned_runtime": 0,
      "project_model_id": "UNASSIGNED",
      "metrics": {
        "job_success_rate_percent": 0,
        "sla_met_percent": 0
      },
      "latest_prediction_id": "string",
      "latest_prediction_created_at": "string",
      "creator": {
        "created_at": "2024-02-21T02:35:46.806Z",
        "updated_at": "2024-02-21T02:35:46.806Z",
        "id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
        "sync_tenant_id": "3fa85f64-5717-4562-b3fc-2c963f66afa6",
        "email": "string",
        "name": "string",
        "last_login": "string"
      },
      "phase": "LEARNING",
      "optimize_instance_size": true,
      "project_periodicity_type": "DAILY_SINE",
      "product_name": "string"
    }
  ]
}

User Management
With User Management, you’re able to take a hands-on approach to managing your users in Gradient. With this feature, account owners can:

Add a user
Deactivate a user
Assign a specific role to a user

Stay up to date with the latest feature releases and updates at Sync by visiting our Product Updates documentation.

Ready to start getting the most out of your Databricks job clusters? Reach out to us at info@synccomputing.com.

Sync Computing Partners with Databricks for Lakehouse Job Cluster and Usage Optimization

Jeffrey Chou
02.01.2024

Self-improving machine learning algorithms provide job cluster optimization and insights for Databricks users

February 1, 2024 08:00 AM Eastern Standard Time

CAMBRIDGE, Mass. – Sync Computing, the industry-leading data infrastructure management platform built to leverage machine learning (ML) algorithms that allow users to automatically maximize data compute performance, today announced that it has joined forces with Databricks go-to-market (GTM) teams and their Technology Partner Program. The end goal is to help Databricks customers achieve lower costs, improved reliability, and automatic management of compute clusters at scale. With the collaboration of the two technology powerhouses efforts, Databricks customers will gain the opportunity to take advantage of Sync Computing’s Gradient solution for SLA optimization, real-time insights, and significant cost savings so that teams are able to focus on greater business objectives and ROI.

Platform and data engineering teams are constantly faced with changing pressures as the data infrastructure landscape becomes increasingly complex. They are met with ongoing needs to iterate quickly, gain real-time insights, and maximize performance all while managing cost. The Gradient platform by Sync Computing provides a single source of truth for cost tracking, data governance, and unified metrics monitoring.

“The management and cost of data pipelines is top of mind for engineering teams especially in the current economic climate. However, tuning clusters to hit cost and runtime goals is a task nobody has time for,” said Jeffrey Chou, CEO and co-founder of Sync Computing. “Databricks customers who use Sync’s Gradient toolkit are now open to a whole new world of opportunities as they can offload these tasks to Gradient while they focus on more urgent business goals. Organizations absolutely love the ROI they see almost immediately.”

“The collaboration between Sync and Databricks brings an entirely new perspective to code-level optimization.”

Sync Computing’s machine learning-powered optimization delivers recommendations for Databricks clusters, without making any changes at the code level. Using a closed-loop feedback system, Gradient automatically builds custom-tuned machine learning models for each Databricks job it is managing using historical run logs — continuously driving Databricks jobs cluster configurations to hit user-defined business goals.

Sync for Databricks allows companies to:

Enable platform teams full governance over config changes to meet business demands
Slash Databricks compute and operating costs by up to 50%
Gain coveted insights into DBU, cloud costs, and cluster anomalies
Hit SLAs even as data pipelines change

Sync integrates with leading cloud platforms like Amazon Web Services (AWS) and Microsoft Azure to programmatically optimize for tools like Apache Airflow and Databricks workflows, without changing a single line of code.

Learn how Sync helps organizations large and small optimize Databricks clusters at scale here.

About Sync Computing
Having been recognized as a Gartner Cool New Vendor, Sync Computing was originally spun out of MIT with the goal to make data and AI cloud infrastructure easier to control. With Sync’s one-of-a-kind solution, Gradient, users are given full ability to enable self-improving job clusters to hit SLA goals, gain infrastructure insights, and leverage tailored recommendations to achieve optimal performance. Recognized names such as Insider, Handelsblatt, Abnormal Security, Duolingo, and Adobe have relied on Sync to get the most out of the data-driven landscape with automated data optimization. To learn more, visit https://www.synccomputing.com.

Contact
McKinley Culbert
Marketing at Sync Computing
mckinley.culbert@synccomputing.com