What is Databricks?
Databricks provides a cloud-based platform for data engineering and machine learning. Because the platform runs in the cloud and is built on Apache Spark, deploying the architecture is easier. It also provides a language-independent notebook interface that lets different users collaborate.
Pros of using Databricks:
Integration with the major cloud providers makes it easier for Databricks to provision the infrastructure that heavy workloads require.
Because it is integrated with the leading cloud providers in the market, Databricks can also scale out to a large number of processors to handle demanding workloads.
Databricks provides a language-independent platform, so data engineers can work in the language they are most comfortable with. Databricks supports Python, Scala, and SQL.
The architecture of Databricks:
The Databricks architecture consists of two main components:
Control plane
Data plane
Control plane: It contains the backend services that Databricks itself manages, such as the web application, notebook commands, and other workspace configurations.
Data plane: This is where your data is processed, using compute infrastructure that runs in your AWS account.
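To make the split concrete: a request such as creating a cluster is sent to the control plane (for example through the Databricks Clusters REST API), and the virtual machines it launches run in the data plane. The sketch below only builds a JSON body for the `POST /api/2.0/clusters/create` endpoint and sends nothing. The cluster name, runtime version key, instance type, and worker counts are illustrative assumptions, not Viacom's actual configuration.

```python
import json

# Path of the Databricks Clusters API "create" endpoint, served by the control plane.
CLUSTERS_CREATE_PATH = "/api/2.0/clusters/create"


def build_cluster_request(name: str, min_workers: int, max_workers: int) -> dict:
    """Build an illustrative request body for an autoscaling cluster.

    The runtime version and node type are assumptions chosen for this example.
    """
    if not 0 <= min_workers <= max_workers:
        raise ValueError("require 0 <= min_workers <= max_workers")
    return {
        "cluster_name": name,
        "spark_version": "13.3.x-scala2.12",  # assumed Databricks Runtime version key
        "node_type_id": "i3.xlarge",          # assumed AWS instance type for the nodes
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        "autotermination_minutes": 60,        # stop the cluster after 60 idle minutes
    }


if __name__ == "__main__":
    body = build_cluster_request("player-analytics-poc", min_workers=2, max_workers=8)
    # The control plane accepts this request; the instances it starts live in the data plane.
    print("POST", CLUSTERS_CREATE_PATH)
    print(json.dumps(body, indent=2))
```

The `autoscale` range is what lets the data plane grow and shrink with the workload instead of being sized by hand.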
About Viacom:
Viacom was an American media company with interests in film and television. It was formed in 2005, when the original Viacom was split into two companies, with Sumner Redstone as its controlling shareholder. By 2019, its annual revenue had reached more than 12 billion dollars.
Challenges Viacom had to overcome for a better user experience and productivity:
The major challenges at the time were delayed insights and difficulties with reporting.
Viacom faced problems such as video buffering, sudden termination of playback, and delays before the first frame appeared during continuous streaming. Viacom uses its own proprietary video players and collected performance data from them in order to improve them. Unfortunately, that data only showed how the players were behaving, not why they were malfunctioning. Once the data had been collected and the reports generated, the reports had to be customized separately for data scientists, data engineers, and non-technical business users, which was a time-consuming process.
Decreasing viewer traffic led to lower TV ad sales, which in turn reduced the company's income.
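The three symptoms above (buffering, sudden termination, and first-frame delay) can each be turned into a per-session metric from player events. The sketch below is a stdlib Python illustration of that calculation only; the event names and the (timestamp, name) schema are hypothetical, since the source does not describe Viacom's player telemetry.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical player event: (seconds since the session began, event name).
Event = Tuple[float, str]


@dataclass
class SessionQoS:
    time_to_first_frame: Optional[float]  # seconds; None if no frame was ever rendered
    buffering_seconds: float              # total time spent stalled (rebuffering)
    buffering_ratio: float                # stalled time divided by session duration
    abnormal_end: bool                    # True if the session did not finish with "ended"


def session_qos(events: List[Event]) -> SessionQoS:
    """Derive first-frame delay, buffering and termination metrics for one non-empty session."""
    events = sorted(events)  # order by timestamp
    start = next((t for t, name in events if name == "play_requested"), events[0][0])
    first_frame = next((t for t, name in events if name == "first_frame"), None)

    buffering, stall_started = 0.0, None
    for t, name in events:
        if name == "buffer_start":
            stall_started = t
        elif name == "buffer_end" and stall_started is not None:
            buffering += t - stall_started
            stall_started = None

    end_time = events[-1][0]
    if stall_started is not None:
        # A stall that never recovered counts until the last event seen.
        buffering += end_time - stall_started

    duration = end_time - start
    return SessionQoS(
        time_to_first_frame=None if first_frame is None else first_frame - start,
        buffering_seconds=buffering,
        buffering_ratio=buffering / duration if duration > 0 else 0.0,
        abnormal_end=events[-1][1] != "ended",
    )


if __name__ == "__main__":
    sample = [
        (0.0, "play_requested"),
        (2.5, "first_frame"),
        (30.0, "buffer_start"),
        (34.0, "buffer_end"),
        (60.0, "buffer_start"),
        (63.0, "error"),  # playback died while stalled
    ]
    print(session_qos(sample))
```

Metrics like these describe what went wrong in a session; explaining why still requires correlating them with other data, which was exactly the gap Viacom ran into.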
Why AWS?
Although Viacom had its own in-house data centre, it knew that the growing daily volume of data required a move to the cloud, where resources could be scaled automatically.
Viacom chose AWS because it had already experimented with AWS on a different project and knew the wide range of services AWS provides. It ultimately concluded that AWS was the perfect choice for the POC.
Solution:
Based on internal recommendations, Viacom believed that AWS and Databricks could solve these challenges, so it approached AWS for a one-month proof of concept (POC) to address them. AWS recommended Apache Kafka for ingesting the streaming data and an Apache Spark cluster for analysing that data and routing it to Amazon S3 (Simple Storage Service).
The POC quickly showed a difference in productivity, and Viacom soon realized that Databricks also helped the company resolve its internal reporting challenges.
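The recommended flow (Kafka for ingestion, Spark for processing, S3 for storage) can be illustrated without any of those services. In the stdlib sketch below, a Python list stands in for one micro-batch of Kafka message values; each JSON record is parsed and grouped under a date-partitioned, S3-style key, roughly the layout a Spark job writing partitioned output would produce. The bucket name, key prefix, and record fields are assumptions for the example; a real job would use Spark's Kafka source and an S3 sink instead.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, Iterable, List

BUCKET = "example-player-events"  # hypothetical bucket name
PREFIX = "qos/raw"                # hypothetical key prefix


def partition_batch(messages: Iterable[bytes]) -> Dict[str, List[dict]]:
    """Group one micro-batch of JSON messages by the UTC date of their "ts" field.

    Returns a mapping from an S3-style key to the records that would be written there.
    Malformed messages are skipped instead of failing the whole batch.
    """
    partitions: Dict[str, List[dict]] = defaultdict(list)
    for raw in messages:
        try:
            record = json.loads(raw)
            ts = datetime.fromtimestamp(record["ts"], tz=timezone.utc)
        except (ValueError, KeyError, TypeError):
            continue  # a real pipeline would route these to a dead-letter location
        key = f"s3://{BUCKET}/{PREFIX}/date={ts:%Y-%m-%d}/part-00000.json"
        partitions[key].append(record)
    return dict(partitions)


if __name__ == "__main__":
    batch = [
        b'{"ts": 1546300800, "event": "first_frame"}',
        b'{"ts": 1546387200, "event": "buffer_start"}',
        b"not json",
    ]
    for key, records in partition_batch(batch).items():
        print(key, len(records))
```

Partitioning by date keeps each day's data under its own prefix, so later reports can read only the days they need.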
Databricks Unified Analytics Platform enabled Viacom to deliver analytics to both technical and non-technical teams using a single code base. This simplification allows on-demand, self-service reporting of data at varying levels of granularity, so Viacom's data scientists can get the raw data they need to run their custom models, and product managers can get the insights they need to make improvements and launch new features.
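Serving different audiences from one code base at different levels of granularity can be sketched as a single report function whose grouping keys are chosen by the caller: no keys returns the raw rows a data scientist would model on, while grouping by date or device returns the coarser summary a product manager would read. The column names are hypothetical, and on Databricks this would more likely be a Spark DataFrame `groupBy`; plain Python keeps the example self-contained.

```python
from collections import defaultdict
from statistics import mean
from typing import Dict, List, Sequence, Tuple

# Hypothetical per-session row, e.g. {"date": "2019-01-01", "device": "web", "startup_s": 2.0}
Row = Dict[str, object]


def report(rows: List[Row], group_by: Sequence[str] = ()) -> List[Row]:
    """Return raw rows when group_by is empty, otherwise one summary row per group."""
    if not group_by:
        return [dict(r) for r in rows]  # raw data for custom models

    groups: Dict[Tuple, List[Row]] = defaultdict(list)
    for r in rows:
        groups[tuple(r[k] for k in group_by)].append(r)

    summary = []
    for key, members in sorted(groups.items()):
        out = dict(zip(group_by, key))
        out["sessions"] = len(members)
        out["avg_startup_s"] = round(mean(m["startup_s"] for m in members), 2)
        summary.append(out)
    return summary
```

The same function answers both kinds of request, so a fix to the underlying logic reaches every report at once.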
Results:
Using AWS and Databricks, Viacom achieved the following improvements:
Enhanced the video viewing experience.
Automated resource allocation.
Reduced video delays by 33%.
Simplified analytical reporting.
Increased views per session by up to 7 times.
Conclusion:
With Databricks, Apache Spark clusters can be ready in minutes, analyze enormous amounts of streaming data from different sources, and deliver insights within hours. The speed of cluster deployment and the fully managed cloud infrastructure made it easier for video streaming companies like Viacom to use Spark and real-time analytics to increase productivity.
Lessons learned:
Because about 1.2 TB of data was being generated every day, Viacom realized that its team was spending too much time managing clusters and tasks rather than developing applications. After the move to AWS and Databricks, AWS provided the infrastructure for the clusters, while Databricks and Apache Spark handled cluster and task management. This let the team concentrate on the application rather than the infrastructure.
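To put the 1.2 TB per day figure in perspective, a quick back-of-the-envelope calculation converts it into an average ingest rate. It assumes decimal units (1 TB = 10^12 bytes) and a perfectly even load over 24 hours; real streaming traffic is uneven, so peak rates would be higher than this average.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate_mb_per_s(bytes_per_day: float) -> float:
    """Average ingest rate in MB/s (decimal megabytes), assuming an even daily load."""
    return bytes_per_day / SECONDS_PER_DAY / 1e6


daily_bytes = 1.2e12  # 1.2 TB per day, the figure quoted above
rate = average_rate_mb_per_s(daily_bytes)
print(f"{rate:.1f} MB/s on average")              # prints "13.9 MB/s on average"
print(f"{daily_bytes / 24 / 1e9:.0f} GB per hour")  # prints "50 GB per hour"
```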
You May Send Enquiry Or Reach Us At
Tapasya Corporate Heights, Sector 126, Noida, Uttar Pradesh 201313
Toll Free: 1800 889 0952
info@codinixcloud.com
Monday - Friday
9:00 AM - 10:00 PM IST