As the race to deliver the unified analytics warehouse (UAW) heats up, EMA sees the following vendors working toward a convergence of the data warehouse and data lake: Ahana, Amazon, Cloudera, Databricks, Dremio, Google, HPE Ezmeral, Incorta, isima.io, Oracle, Qubole, SAP, Starburst, Teradata, and Vertica. EMA also anticipates that vendors that successfully deliver a unified analytics warehouse will quickly eclipse standalone data warehouse and data lake vendors, leaving those standalone offerings obsolete except for targeted use cases and analytical projects.
Ahana, the SQL analytics company for Presto, is focused on evangelizing the Presto community and bringing simplified ad hoc analytics offerings to market. Presto is an open-source distributed query engine that enables unifying analytics across databases, data lakes, and other data sources using a federation approach. Presto helps users converge data lake and data warehouse use cases using a single analytics platform.
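To make the federation idea concrete, here is a minimal Python sketch that uses only the standard library. It is not Presto's API or implementation. The "warehouse" is an in-memory SQLite table, the "data lake" is a CSV string, and every name (`make_warehouse`, `scan_lake`, `federated_revenue`, the sample data) is hypothetical. It only illustrates the general pattern of one query layer reading each source in place and joining the results.

```python
import csv
import io
import sqlite3

# Hypothetical "data lake" content: raw order events stored as CSV text.
LAKE_CSV = "customer_id,amount\n1,10.0\n2,5.5\n1,4.5\n"


def make_warehouse():
    """Build a hypothetical "data warehouse": a relational customers table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")]
    )
    return conn


def scan_warehouse(conn):
    """Connector for the warehouse: yields (id, name) rows via SQL."""
    yield from conn.execute("SELECT id, name FROM customers")


def scan_lake(csv_text):
    """Connector for the lake: yields (customer_id, amount) from raw CSV."""
    for record in csv.DictReader(io.StringIO(csv_text)):
        yield int(record["customer_id"]), float(record["amount"])


def federated_revenue(conn, csv_text):
    """Join both sources in the query layer without copying either one."""
    names = dict(scan_warehouse(conn))  # build side of a simple hash join
    totals = {}
    for customer_id, amount in scan_lake(csv_text):  # probe side
        if customer_id in names:
            name = names[customer_id]
            totals[name] = totals.get(name, 0.0) + amount
    return sorted(totals.items())


if __name__ == "__main__":
    print(federated_revenue(make_warehouse(), LAKE_CSV))
```

The point of the sketch is that neither source is loaded into the other. Each stays where it is, and only the join logic is shared.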
Amazon Web Services (AWS) offers several services that, when stitched together, would meet the requirements of the unified analytics warehouse. At this point, they do not comprise a single offering, and it takes significant effort to understand how to integrate features from the different services to achieve the capabilities of the UAW. For example, users can make their data in Amazon S3 available for analysis with Amazon Redshift for data warehousing workloads and use other analytical engines, such as Amazon Athena for ad hoc queries and Amazon EMR for big data processing. Amazon Redshift as a standalone service has some of the capabilities of the unified analytics warehouse.
Cloudera Data Platform (CDP) has combined several open-source technologies to deliver a single offering that includes a data hub, streaming data, data engineering, a data warehouse, an operational database, and machine learning. It is available on-premises as well as in the public cloud and allows for deployments across multi-cloud and hybrid environments. CDP uniquely provides consistent data security and governance across all workloads and environments with SDX (Shared Data Experience). The platform caters to a host of users, including data scientists, data analysts, and data engineers.
Databricks provides end-to-end data engineering, data management, analytics, data science, and machine learning on a single Unified Data Analytics Platform. Rooted in open-source Spark, Delta Lake, and MLflow, Databricks effectively combines the benefits of data warehouses and data lakes along with a collaborative data science workspace.
Dremio Cloud Data Lake Engine accelerates queries for different kinds of workloads using Apache Arrow, a language-agnostic, in-memory software framework for developing data analytics applications that process columnar data in both data warehouses and data lakes.
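To illustrate what processing columnar data in memory means, here is a small standard-library Python sketch. It does not use Apache Arrow or any Dremio API. The sample rows and the helper names (`to_columns`, `total_amount`, `amount_by_region`) are assumptions made for illustration. It shows only the basic idea: when values are stored column by column, an aggregate reads just the columns it needs.

```python
from array import array

# Hypothetical row-oriented records, as a transactional system might hold them.
ROWS = [
    {"order_id": 1, "region": "EU", "amount": 10.0},
    {"order_id": 2, "region": "US", "amount": 5.5},
    {"order_id": 3, "region": "EU", "amount": 4.5},
]


def to_columns(rows):
    """Pivot row records into one contiguous sequence per column."""
    return {
        "order_id": array("q", (r["order_id"] for r in rows)),  # 64-bit ints
        "region": [r["region"] for r in rows],
        "amount": array("d", (r["amount"] for r in rows)),  # 64-bit floats
    }


def total_amount(columns):
    """Aggregate a single column; other columns are never touched."""
    return sum(columns["amount"])


def amount_by_region(columns):
    """Group-by that reads only the two columns it needs."""
    totals = {}
    for region, amount in zip(columns["region"], columns["amount"]):
        totals[region] = totals.get(region, 0.0) + amount
    return totals


if __name__ == "__main__":
    cols = to_columns(ROWS)
    print(total_amount(cols))
    print(amount_by_region(cols))
```

A production columnar format adds typed buffers, null handling, and zero-copy sharing between processes, none of which this sketch attempts.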
Google Cloud offers Dataproc for data lakes and BigQuery for data warehousing, with interoperability between these services that meets the requirements of the unified analytics warehouse. Google enables data warehouse users to query data lakes from within the enterprise data warehouse (EDW), and it enables Spark/Beam users to run jobs against data stored in the data warehouse. With a notebook offering that unites these environments, query federation against databases from within the EDW, and the launch of BigQuery Omni to support multi-cloud environments, Google continues to move in the direction of the unified analytics warehouse.
Incorta Direct Data Platform unifies data lake and data warehouse architectures into a single end-to-end, high-performance data analytics platform with the ease of use needed by business users, the data fidelity needed by data scientists, and the governance and security needed by IT. The Incorta Direct Data Mapping engine is able to analyze complex, full-fidelity data in real time and eliminates the need for data modeling, cubes, batch jobs, or optimization of any kind.
isima bi(OS) is a unified insights platform for API, AI, and BI builders. bi(OS) covers the entire data management lifecycle, from ingesting heterogeneous data sources to deriving real-time insights from multi-structured data, in a single platform.
HPE Ezmeral Data Fabric has stitched together several open-source and commercial technologies to deliver a single offering that includes a data hub, streaming data, data engineering, a data warehouse, an operational database, and machine learning. It is available on-premises and in the public cloud; runs on bare metal, in VMs, or in containers; and works in multi-cloud and hybrid environments. It supports both data scientists and data engineers. By integrating with HPE Ezmeral Container Platform, it enables easy deployment of containerized workloads and a wide choice of modern applications.
Oracle Autonomous Data Warehouse provides enterprise capabilities necessary for both analytics and data science by providing access to semi-structured data through the Big Data SQL product.
Qubole Open Data Lake platform integrates several open-source platforms into a fully managed and governed unified data platform enabling machine learning, streaming analytics, data engineering, and ad hoc analytics. The platform supports the management of structured and semi-structured data in multi-cloud environments. Open-source platforms include Apache Spark, Presto, Hive/Hadoop, TensorFlow, Airflow, and Kubernetes.
SAP HANA Cloud Services includes a full set of offerings integrated to address the needs of both the data lake and the data warehouse. At the core is SAP HANA, an in-memory database capable of analyzing multi-structured data in hybrid and multi-cloud environments. It also includes SAP Data Warehouse Cloud with built-in business logic and enterprise-grade analytics capabilities for all the necessary use cases of the unified analytics warehouse.
Starburst offers an enterprise-grade version of open-source Presto, a distributed SQL query engine used to run fast interactive analytics across any data source, including the data lake and the data warehouse. Its offering is fully supported by a team of experts, including the Presto co-creators.
Teradata Vantage is a modern analytics platform that is based on the highly performant Teradata Advanced SQL database, as well as Machine Learning and Graph engines. Vantage combines open-source and commercial analytics technologies with ecosystem connectivity components to address the needs of both the data warehouse and the data lake.
Vertica is a distributed analytics database that separates compute from storage, enabling the processing of complex analytics for multi-structured data in databases, file systems, or object storage. Vertica includes embedded analytical functions for high performance and can also analyze semi-structured data types in their native structure across multiple storage types, enabling unified access for the different use cases of the unified analytics warehouse.
EMA will publish its first Radar Report for the UAW in the winter of 2020.
For the rest of the story, read the full white paper, “The Emergence of the Unified Analytics Warehouse – Data Lakes and Data Warehouses Merge.”