VMware Tanzu Greenplum - Greenplum Database

Speed and scale

Faster time to insight with in-database analytics and AI. Query and ingest petabyte-scale data sets.

Productivity

Diverse data types on a single platform (structured, semi-structured, unstructured, vector, geospatial, or graph), wherever the data is located.

Flexibility

Deployment on any infrastructure type with optimizations for bare metal, public cloud and vSphere-based private cloud.

Resilience

Based on open source PostgreSQL. A time-tested and proven platform with features including redundant components, remote disaster recovery, enhanced security, and 24x7 enterprise support.

“Whatever use case we can dream up and whatever ways we can think of to better understand the user, Greenplum allows us to do it.”

John Conley, Vice President of Data Warehousing, Conversant

“The success we've had with Greenplum is because we just use it for everything...there's nothing holding us back.”

Ian Pytlarz, Lead Data Scientist, Purdue University

“We’ve set the wheels in motion to empower lots of different teams to use data to drive efficiency, job satisfaction and innovate faster. VMware Greenplum is now mission critical at Dell.”

Jim Hall, Senior Consultant for IT Infrastructure, Dell Technologies

Architecture

Features

Cloud-agnostic for flexible deployment

Greenplum is available on leading public cloud marketplaces—Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP)—with “bring your own license” (BYOL) and hourly consumption models. It’s also available for VMware vSphere and OpenStack private clouds. Best of all, it’s the same Greenplum version and the same tools across all clouds for a consistent experience.

Value and performance in an appliance-like experience

Dell Greenplum Reference Architecture is the most performant way to run Tanzu Greenplum in an on-premises deployment. It’s a VMware-certified and supported blueprint for Dell hardware configurations that replace proprietary appliances. Users can also deploy Greenplum on HP- and Cisco-certified configurations, as well as their own commodity hardware.

Analytics from business intelligence to artificial intelligence

Machine learning, deep learning, graph, text, and statistical methods are all provided in one scale-out MPP database. Get expanded text search capabilities, supporting both lexical and AI-powered semantic searches, and high-speed, feature-rich geospatial querying. Greenplum also offers extensive support for R and Python analytical libraries, as well as Keras and TensorFlow.

Easily handled streaming data and cloud data

Greenplum includes integration with the messaging and streaming ecosystem, such as RabbitMQ. Together with improved low-latency writes, Greenplum provides fast event processing for streaming use cases.

Maximized uptime and protected data integrity

Greenplum has features for high availability, intelligent fault detection, and fast online differential recovery, as well as full and incremental backup and disaster recovery. Security and authentication features address enterprise policy and regulatory requirements.

Industry-leading performance

With its cost-based query optimizer designed for large-scale data workloads, Greenplum scales interactive and batch-mode analytics to petabyte-scale datasets without degrading query performance and throughput.

Massively parallel, highly concurrent architecture

Greenplum features a shared-nothing architecture that automates parallel processing of data and queries and petabyte-scale data ingestion. Its open source, cost-based query optimizer (GPORCA) was developed specifically to address advanced analytics, creating query plans that execute complex joins at breakthrough performance on large data volumes.
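To make the shared-nothing idea concrete, here is a minimal toy sketch in Python. It is not Greenplum code and does not reflect how Greenplum is implemented; the segment count, the CRC32 hash, and all function names are assumptions chosen for illustration. Rows are hash-distributed by key so that each "segment" owns a disjoint slice of the data, each segment aggregates its own rows in parallel, and the partial results are merged at the end.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
import zlib

NUM_SEGMENTS = 4  # hypothetical segment count for this sketch


def segment_for(key: str, num_segments: int = NUM_SEGMENTS) -> int:
    """Map a distribution key to a segment using a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_segments


def distribute(rows, num_segments: int = NUM_SEGMENTS):
    """Place each (key, amount) row on exactly one segment."""
    segments = [[] for _ in range(num_segments)]
    for key, amount in rows:
        segments[segment_for(key, num_segments)].append((key, amount))
    return segments


def local_sum(segment_rows):
    """Each segment aggregates only the rows it owns."""
    totals = defaultdict(int)
    for key, amount in segment_rows:
        totals[key] += amount
    return dict(totals)


def parallel_sum_by_key(rows, num_segments: int = NUM_SEGMENTS):
    """Run the per-segment aggregation concurrently, then merge the results."""
    segments = distribute(rows, num_segments)
    with ThreadPoolExecutor(max_workers=num_segments) as pool:
        partials = list(pool.map(local_sum, segments))
    merged = {}
    for partial in partials:
        # Keys never overlap across segments because rows are
        # hash-distributed by the same key used for grouping.
        merged.update(partial)
    return merged


rows = [("alice", 10), ("bob", 5), ("alice", 7), ("carol", 3), ("bob", 1)]
totals = parallel_sum_by_key(rows)
```

Because the grouping key is also the distribution key in this sketch, no rows have to move between segments before aggregation, which is the property that lets this style of architecture scale by adding segments.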

Based on open source projects

Avoid proprietary vendor lock-in. The Greenplum Database open source project is 100% in alignment with the PostgreSQL community. All major Tanzu Greenplum contributions are part of the Greenplum Database project and share the same database core, including the MPP architecture, analytical interfaces, and security capabilities.

Enhanced data federation with PXF

The Platform Extension Framework (PXF) in Greenplum has been improved to strengthen data federation. Businesses can query datasets in Amazon Simple Storage Service (S3) object stores, Hadoop Distributed File System (HDFS), and other relational databases via JDBC. PXF uses the PostgreSQL Foreign Data Wrapper API to access remote data sources in parallel, and it offers an abstracted data model for managing security and statistics about the remote data for query optimization.

Multiple index types supported

Greenplum supports a broad spectrum of index types, including B-tree, Hash, Bitmap, Block Range Index (BRIN), text indices, geospatial indices, and AI vector indices. Matching the index type to the data and the query pattern speeds up data retrieval and improves query performance.
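As one illustration of why index type matters, the following is a conceptual Python sketch of the idea behind a bitmap index. It is a toy in-memory model, not Greenplum's on-disk format, and the column names and helper functions are hypothetical. Each distinct value in a low-cardinality column gets one bitmask, so a filter on several columns reduces to a bitwise AND.

```python
def build_bitmap_index(values):
    """Return {distinct value: int bitmask}; bit i is set if row i holds that value."""
    index = {}
    for row_id, value in enumerate(values):
        index[value] = index.get(value, 0) | (1 << row_id)
    return index


def rows_from_bitmap(bitmap):
    """Decode a bitmask back into an ascending list of row ids."""
    row_ids = []
    row_id = 0
    while bitmap:
        if bitmap & 1:
            row_ids.append(row_id)
        bitmap >>= 1
        row_id += 1
    return row_ids


# Two low-cardinality columns of a hypothetical six-row table.
region = ["east", "west", "east", "north", "west", "east"]
status = ["open", "open", "closed", "open", "closed", "open"]

region_index = build_bitmap_index(region)
status_index = build_bitmap_index(status)

# A filter like region = 'east' AND status = 'open' becomes one bitwise AND.
matches = rows_from_bitmap(region_index["east"] & status_index["open"])
```

This style of index tends to suit columns with few distinct values, whereas a B-tree is usually the better fit for high-cardinality columns and range lookups.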

Use Cases

Enterprise analytics and AI

With support for advanced algorithms such as multi-layer perceptron and convolutional neural networks in Apache MADlib, users can begin to tackle cutting-edge use cases in speech recognition, image recognition, machine translation, and computer vision. With optional support for REST APIs, you can train, test, and deploy in a single language (SQL), reducing the occurrence of errors when putting models into production at scale.

Flexible deployment on-premises or in the cloud

Move your analytics workloads to the platform of your choice under the terms and in the timeframe you choose. Deploy on private, sovereign, or public clouds (like AWS, Microsoft Azure, or GCP) or on-premises with Greenplum Building Blocks (GBB). Have the freedom to select the best platform for each project and workload based on ease of use, performance, and total cost of ownership (TCO).

Enterprise data warehouse modernization and replatforming

Replatform legacy enterprise data warehouses (EDWs) to replace expensive, proprietary databases. Modernize with the only open source-based, multi-cloud platform for analytics offering the full range of data warehouse functionality that your enterprise demands. Gain the power of an MPP system in conjunction with proven technology to reduce the cost and complexity of application migration.

Vector management for RAG processing

Efficient vector management is at the core of RAG processing, and Tanzu Greenplum offers a robust solution. With its high-performance capabilities, Greenplum streamlines the handling and analysis of vectors, ensuring data accuracy and speed, making it an indispensable tool for optimizing RAG processing workflows.
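The retrieval step of a RAG workflow can be sketched in a few lines of Python. This is a minimal in-memory illustration, not Greenplum's vector implementation: the document ids, the three-dimensional embeddings, and the function names are all made up for the example, and a database would normally use a vector index instead of scoring every row.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query, documents, k=2):
    """Return the ids of the k documents whose embeddings are most similar to the query."""
    scored = [(cosine_similarity(query, vec), doc_id) for doc_id, vec in documents.items()]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc_id for _, doc_id in scored[:k]]


# Hypothetical embeddings; real ones would come from an embedding model.
documents = {
    "doc_a": [1.0, 0.0, 0.0],
    "doc_b": [0.9, 0.1, 0.0],
    "doc_c": [0.0, 1.0, 0.0],
}
nearest = top_k([1.0, 0.0, 0.0], documents, k=2)
```

In a RAG pipeline, the text behind the returned ids would then be passed to a language model as context for generating the answer.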

Powering IoT applications

By seamlessly ingesting and analyzing vast streams of IoT data, Tanzu Greenplum empowers businesses to make real-time, data-driven decisions. Whether it's predictive maintenance, smart city management, or supply chain optimization, Tanzu Greenplum's high performance and scalability excel in IoT applications.

Get Started with Greenplum

Downloads and documentation

Run Greenplum on Cloud Marketplaces

Run Greenplum on AWS
Run Greenplum on Azure
Run Greenplum on GCP

A unified platform for BI to AI
