Cloudera adopts Apache Iceberg, battles Databricks to be most open in data tables

Cloud data lake vendor Cloudera has announced the general availability of Apache Iceberg in its data platform.

Developed through the Apache Software Foundation, Iceberg offers an open table format designed for high performance on big data workloads while supporting query engines including Spark, Trino, Flink, Presto, Hive and Impala.

Iceberg started out as a Netflix project before it was donated to the Apache Software Foundation two years later in 2018.

In a blog post, Cloudera - the data platform vendor with its roots in Hadoop-based systems - said its goal was to allow multi-function analytics on data lakes, repositories that support both structured and unstructured data. The lakehouse concept was introduced to encourage users to run analytics and BI on data lake systems.

"However, it still remains driven by table formats that are tied to primary engines, and oftentimes single vendors. Companies, on the other hand, have continued to demand highly scalable and flexible analytic engines and services on the data lake, without vendor lock-in," Cloudera said.

The deployment of Iceberg in the Cloudera Data Platform (CDP) includes Cloudera Data Warehousing, Cloudera Data Engineering, and Cloudera Machine Learning. "These tools empower analysts and data scientists to easily collaborate on the same data, with their choice of tools and analytic engines," Cloudera said.

Benefits are set to include support for schema and partition changes as a single command, time travel with point-in-time queries for forensic visibility and regulatory compliance capabilities, and concurrent multi-function analytics to deliver end-to-end data lifecycle needs. Performance is also set to improve with aggressive partitioning to handle very large-scale data sets, Cloudera said.
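
To make the "time travel" idea concrete, here is a minimal toy sketch in Python. It is not Iceberg's API or Cloudera's implementation: `ToyTable`, `Snapshot`, `commit` and `as_of` are hypothetical names invented for illustration, and the integer timestamps are arbitrary. It only illustrates the general principle that a table kept as a series of immutable snapshots can be read as it stood at an earlier point in time.

```python
from bisect import bisect_right
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical record of the table's full contents at one commit."""
    committed_at: int  # commit time (arbitrary integer units in this toy)
    rows: tuple        # immutable copy of the rows at that commit


@dataclass
class ToyTable:
    """Toy table that keeps every committed snapshot (illustration only)."""
    snapshots: list = field(default_factory=list)

    def commit(self, committed_at, rows):
        # Each write appends a new snapshot; earlier ones are never modified.
        if self.snapshots and committed_at <= self.snapshots[-1].committed_at:
            raise ValueError("commits must have increasing timestamps")
        self.snapshots.append(Snapshot(committed_at, tuple(rows)))

    def as_of(self, timestamp):
        # Point-in-time read: latest snapshot committed at or before timestamp.
        times = [s.committed_at for s in self.snapshots]
        idx = bisect_right(times, timestamp)
        if idx == 0:
            raise LookupError("no snapshot exists at or before that time")
        return self.snapshots[idx - 1].rows


table = ToyTable()
table.commit(100, [("alice", 10)])
table.commit(200, [("alice", 10), ("bob", 20)])

# Reading "as of" time 150 returns the first commit, not the current state.
print(table.as_of(150))
```

Because old snapshots are never rewritten in this model, a query as of an earlier time sees exactly what was committed then, which is the property that makes point-in-time reads useful for audits and compliance checks.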

Tussle of the open source techies

However, Cloudera is not the only data lake or lakehouse vendor to commit to an open-source path.

Databricks, the company originating as an Apache Spark vendor, has also donated its storage format layer to the open-source community. The latest iteration, Delta Lake 2.0, was announced last week at the Data and AI Summit.

"Delta Lake 2.0 will bring unmatched query performance to all Delta Lake users and enable everyone to build a highly performant data lakehouse on open standards. With this contribution, Databricks customers and the open-source community will benefit from the full functionality and enhanced performance of Delta Lake 2.0," Databricks said.

Speaking to The Register, Joel Minnick, Databricks marketing VP, said: "After Delta Lake was open-sourced and there's a lot of performance enhancements and features that we had continued to build inside of the Databricks platform. We've always been an open-source company at heart and if we were doing those enhancements, we really did want to be able to give those back to the community."

Minnick said the enhancements were on the "data processing, data warehousing side of things."

Delta Lake 2.0 was donated to the Linux Foundation this week. ®
