We let users explore years' worth of cloud billing data in an interactive environment. That can easily amount to millions or even billions of rows across thousands of billing dimensions.
When a new billing file is generated, we load both cost and usage data into a Spark DataFrame. We then join the two and transform the result into a highly optimized form, which we write back out in Parquet format.
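To make the join step concrete, here is a minimal sketch in plain Python rather than Spark, so it runs without a cluster. The column names (`account_id`, `service`, `date`, `cost_usd`, `usage_amount`) and the derived `unit_cost` field are assumptions for illustration, not the actual billing schema, and the Parquet write is left out.

```python
# Hypothetical join keys; the real billing dimensions are not given in the text.
JOIN_KEYS = ("account_id", "service", "date")


def join_cost_and_usage(cost_rows, usage_rows, keys=JOIN_KEYS):
    """Inner-join cost and usage records on shared keys and derive a unit cost."""
    # Index usage rows by their key tuple for constant-time lookup.
    usage_by_key = {tuple(row[k] for k in keys): row for row in usage_rows}

    joined = []
    for cost in cost_rows:
        key = tuple(cost[k] for k in keys)
        usage = usage_by_key.get(key)
        if usage is None:
            continue  # inner join: skip cost rows with no matching usage row
        amount = usage["usage_amount"]
        row = {k: cost[k] for k in keys}
        row["cost_usd"] = cost["cost_usd"]
        row["usage_amount"] = amount
        # Guard against division by zero when nothing was consumed.
        row["unit_cost"] = cost["cost_usd"] / amount if amount else None
        joined.append(row)
    return joined


cost_rows = [
    {"account_id": "a1", "service": "compute", "date": "2024-01-01", "cost_usd": 12.0},
    {"account_id": "a1", "service": "storage", "date": "2024-01-01", "cost_usd": 3.0},
]
usage_rows = [
    {"account_id": "a1", "service": "compute", "date": "2024-01-01", "usage_amount": 48.0},
]
result = join_cost_and_usage(cost_rows, usage_rows)
```

In Spark, the same idea would typically be expressed as a DataFrame join on the shared key columns, followed by a Parquet write.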
To ingest data in parallel for thousands of customers, we built an auto-scaling algorithm that dynamically scales Druid's Middle Managers, so that our clients always have access to the latest data.
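The scaling decision can be sketched as a simple capacity calculation. This is not the actual algorithm, only an illustration under stated assumptions: each Middle Manager runs a fixed number of ingestion task slots, and the function name, parameters, and the minimum and maximum bounds are all hypothetical.

```python
import math


def desired_middle_managers(pending_tasks, running_tasks, slots_per_manager,
                            min_managers=1, max_managers=50):
    """Return how many Middle Managers are needed to run every task at once.

    All parameters are illustrative; the bounds keep the cluster from
    scaling to zero or growing without limit.
    """
    if slots_per_manager <= 0:
        raise ValueError("slots_per_manager must be positive")
    if pending_tasks < 0 or running_tasks < 0:
        raise ValueError("task counts must be non-negative")

    total_tasks = pending_tasks + running_tasks
    # Round up: a partially filled manager is still a whole manager.
    needed = math.ceil(total_tasks / slots_per_manager)
    # Clamp the result to the configured bounds.
    return max(min_managers, min(max_managers, needed))


# 10 pending + 5 running tasks at 4 slots each need 4 managers.
target = desired_middle_managers(pending_tasks=10, running_tasks=5, slots_per_manager=4)
```

A real controller would also need to smooth the target over time so that short bursts do not cause the cluster to scale up and down repeatedly.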
© nOps 2024. All Rights Reserved.