PIVOTAL GemFire Features
Solve the Data Challenges Presented by Modern, Distributed Web-Oriented Applications
Pivotal GemFire is the premier in-memory data platform delivering speed and dynamic scalability, together with the reliability and data management capabilities of a traditional database. Fast and flexible, GemFire enables you to process massive quantities of data and scale within the cloud to suit the needs of users, while providing tools to simplify monitoring and management. The GemFire solution features:
- Data Consistency with Cloud Scalability – GemFire solves the problem of moving data into the cloud by implementing a shared-nothing architecture. The only shared resource in the cluster is the network. Adding nodes to the cluster adds capacity for more users and data. The application tier does not need to be aware of how data is being made consistent, how big the cluster is, how many copies of data are available, or what the current cluster membership is. Typically an issue of complex maintenance, scripts and custom code, this logic is all built into GemFire. The solution can be configured for a high level of data availability by using redundancy settings. The logic used to update data ensures that all copies of the data in the cluster are consistent at any given point. And data can be sent to remote clusters via the WAN Gateway in case of catastrophic failures. GemFire can also be used as a reliable Operational Data Store, leveraging archival systems for OLAP and offloading of data when it is no longer relevant.
- Extreme Performance and Continuous Uptime with Predictable Performance – GemFire provides for continuous uptime with built-in high availability and disaster recovery. Multiple failure detection models detect and react to failures quickly, ensuring that the cluster is always available, and that the data set is always complete. During hardware refreshes and major GemFire upgrades, GemFire’s cloud elasticity allows you to add nodes to the cluster without impacting the clients. During data model changes, GemFire uses PDX Serialization to enable different nodes and clients in a cluster to have different data model versions running at the same time. This means that new data models can be deployed with zero impact to client applications. During server-logic code changes, GemFire provides hot redeployment of any user logic classes via our management tool, GemFire SHell (gfsh).
- Data Aware Parallel Function Execution – By associating a function with a data set, GemFire will route function calls to the relevant nodes on the cluster without interference from the caller. For large data sets, this means dramatically reducing the total time to run complex calculations over traditional approaches. As the GemFire cluster expands and contracts, the calling application does not need to be aware of the cluster membership or where the data resides. Application developers can use a familiar HashMap interface to touch the data directly, or an easy-to-use function API to interact with the data set. If a failure occurs in the cluster during function execution, the function can be configured to be aware of this and restart the execution.
- Data Stream with Enterprise Data Store Correlation – GemFire can receive data from any application that is able to call a C++, C#, Java, or REST interface. Conversely, GemFire can call into anything that is available via a Java API, such as JDBC or web services. Along with a powerful query interface, data is readily accessible. This flexible API, along with the ability to ingest data from external sources and send out event notifications to other systems, gives you the ability to correlate data from multiple sources into real-time information that is important for your enterprise.
GemFire XD Features
Enterprise Real-Time Data Service on Hadoop
GemFire XD, built on over a decade of innovation, combines with Pivotal HD and HAWQ to provide the industry’s first production-quality platform for creating closed-loop analytics solutions. It does this by providing:
- The performance of in memory, combined with the scale of big data
- Larger data sizes in the same size JVM
- Direct write to a big data store (HDFS) allowing for back-end analytics
- SQL without the penalties of relational databases
In-Memory with Big Data
GemFire XD enables the creation of low-latency, scale-out OLTP applications integrated out of the box with a big data store (HDFS). This provides sub-second response to applications, while allowing the data to be analyzed in the back end via HAWQ or MapReduce in near real time.
Closed loop analytics with HDFS
GemFire XD can be configured to write incoming data directly to HDFS. This enables several scenarios:
- Capture streams of data for analysis in memory, and for roll up historically after the fact.
- Route transactions through a reliable in-memory system with assurances that data is available on disk for audit and compliance.
- Take advantage of consumer generated streams, like Twitter, for sentiment analysis.
- Detect fraud and shut it down in real time by knowing what “normal” patterns are and applying that to current data.
Scale out and scale up in-memory, scale out in HDFS
In addition to clustering, which enables elastic scale out, and HDFS integration, which enables scaling of the persistence layer, GemFire XD with Memscale allows applications to scale individual servers to hundreds of gigabytes without incurring the penalties associated with traditional garbage collection in servers.
Public performance benchmarks (YCSB) show GemFire XD with Pivotal HD to be two to three times better than HBase in throughput and latency for a variety of workloads.
Familiar SQL Interface
To access data in GemFire XD, developers use a standard ANSI SQL interface. Combining this with a PXF connector for HAWQ to read GemFire XD data, you can access your data via SQL whether in-memory, or on disk. This gives you access to OLAP and OLTP on the same data set.
For applications, GemFire XD provides both a JDBC and an ODBC interface, allowing powerful applications to be built using the familiar and friendly ecosystem provided by Spring, Java, and C++.
GemFire XD achieves this by providing:
- Relational technology based on Apache Derby
- ANSI SQL-92 compliant query engine
- Powerful distributed stored procedure execution
- Referential integrity on a distributed system
PIVOTAL GemFire Technology
What is GemFire?
GemFire is a distributed in-memory data grid for OLTP applications. It is backed by a robust, shared-nothing persistence architecture that guarantees the performance of in-memory and the reliability of disk. By combining this persistence with data redundancy, GemFire provides best-of-breed high availability for data. It is a transformational piece of technology that was purpose-built to support large volumes of data with a very high degree of data consistency. Data is stored as key-value pairs or as JSON documents in a partitioned/replicated data grid. Every feature of the product is designed to minimize latency, allowing the development of highly responsive applications that never have to be redesigned for scaling.
Familiar Java HashMap API
Data sets in GemFire are internally held in a data structure called a region, which implements the java.util.Map interface (more precisely, java.util.concurrent.ConcurrentMap), so it can be used much like a java.util.HashMap. Developers familiar with Java will quickly understand GemFire.
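Because a region behaves like a map, ordinary map operations carry over. The following is a minimal sketch, not GemFire code: a plain ConcurrentMap from the JDK stands in for a region, and the class name RegionAsMapSketch and the method restock are hypothetical. A real region is obtained from a GemFire cache and distributes these calls across the cluster.

```java
import java.util.concurrent.ConcurrentMap;

// Hypothetical stand-in: any JDK ConcurrentMap plays the role of a region here.
final class RegionAsMapSketch {

    // Adds delivered units to a stock entry using only standard ConcurrentMap calls.
    // Assumes entries are never removed while this method runs.
    static int restock(ConcurrentMap<String, Integer> inventory, String sku, int delivered) {
        inventory.putIfAbsent(sku, 0); // create the entry only if it is missing
        while (true) {
            Integer current = inventory.get(sku);
            Integer updated = current + delivered;
            // Compare-and-set: succeeds only if nobody changed the value in between.
            if (inventory.replace(sku, current, updated)) {
                return updated;
            }
        }
    }
}
```

Passing a new java.util.concurrent.ConcurrentHashMap to restock is enough to try the sketch locally; the calling code would look the same against any other ConcurrentMap implementation.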
GemFire makes optimal use of system resources like CPU, memory, network and disk to ensure that if these resources are doubled, the overall system throughput can also double, without any visible increase in latency for end-user applications. It does this by intelligently managing the placement of data while reducing network round trips. Data gets replicated only to those nodes that need the data, and requests for access are routed intelligently using the most direct path available.
Elastic at Cloud Scale
GemFire’s resource usage is designed to be elastic. If you add more resources, it increases capacity and balances the use of resources across the network. If you take away resources, it responds to the decrease in resources without impacting end-user applications, as long as the remaining elements of the cluster can handle the load. The ability to add and remove nodes in response to changing load conditions makes GemFire cloud ready for private and public cloud deployments.
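One way to picture this elasticity is to separate "which bucket does a key belong to" from "which members host that bucket". The sketch below is an illustration under stated assumptions, not GemFire's actual placement or rebalancing algorithm: the bucket count of 12, the round-robin placement, and the names BucketPlacementSketch, bucketFor and hostsFor are all hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of bucket placement; not GemFire's actual algorithm.
final class BucketPlacementSketch {
    static final int BUCKETS = 12; // assumed fixed bucket count for the example

    // Maps a key to a bucket. The result does not depend on cluster membership.
    static int bucketFor(Object key) {
        return Math.floorMod(key.hashCode(), BUCKETS);
    }

    // Returns the members hosting a bucket: one primary plus up to
    // `redundantCopies` backups, each on a distinct member (round-robin walk).
    static List<String> hostsFor(int bucket, List<String> members, int redundantCopies) {
        int copies = Math.min(redundantCopies + 1, members.size());
        List<String> hosts = new ArrayList<>();
        for (int i = 0; i < copies; i++) {
            hosts.add(members.get((bucket + i) % members.size()));
        }
        return hosts;
    }
}
```

With members a, b and c and one redundant copy, bucket 5 is hosted on c and a. After adding member d, the same bucket is hosted on b and c, while bucketFor returns the same bucket for every key, so callers never need to know that the membership changed.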
Highly Available with No Maintenance Windows Required, Even for Upgrade
GemFire runs as a cluster of processes on commodity hardware running on regular TCP/IP networks. In this environment, it takes extraordinary measures to protect data in the face of process failures, machine failures, rack failures and even data center outages. GemFire’s cluster management and resource utilization algorithms use a combination of retries, disk persistence, redundancy zones and network partitioning management techniques to ensure data consistency in a highly available system. GemFire supports the notion of rolling upgrades, which ensures your GemFire administrators do not have to plan for scheduled maintenance windows to upgrade the product or any application code that runs with GemFire.
What Makes GemFire Easy to Use for Developers?
The following capabilities make GemFire an easy-to-use solution for developers:
Deep and Rich Spring Data Integration
As a fully supported Spring project within Pivotal, Spring Data GemFire allows developers to architect applications so that data access and business logic are separated from configuration and operation code. In addition, GemFire APIs and related code samples help make developers productive quickly. Spring Data support in GemFire is enhanced and updated with each product release which allows Spring users to bootstrap applications using very familiar programming techniques to exercise the powerful set of capabilities exposed by GemFire.
Support for Memcached Clients
GemFire servers can be started up as memcached servers with full support for the memcached API. Gemcached extends the reach of GemFire to any memcached client allowing end users an easy way to get started with GemFire.
L2 Hibernate Support and Scalable Session State Management Support
Applications store increasing amounts of data in user sessions and, more importantly, user sessions may not expire for days or even months. GemFire supports the common requirements that follow from this: storing session-state information in a data grid, multiple topologies for scale (thousands of app servers that can access any session from anywhere), transmitting only deltas when user sessions are updated, and integration with popular application servers such as tcServer, JBoss, WebLogic and WebSphere.
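The idea of transmitting deltas rather than whole sessions can be sketched as follows. This is a simplified illustration, not GemFire's session module: the names SessionDeltaSketch, delta and apply are hypothetical, attribute values are assumed to be non-null strings, and removed attributes are deliberately ignored to keep the example short.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;

// Hypothetical sketch of delta propagation for session attributes.
final class SessionDeltaSketch {

    // Returns only the attributes that were added or changed between two snapshots.
    // Removed attributes are not tracked in this simplified version.
    static Map<String, String> delta(Map<String, String> before, Map<String, String> after) {
        Map<String, String> changed = new HashMap<>();
        for (Map.Entry<String, String> entry : after.entrySet()) {
            if (!Objects.equals(before.get(entry.getKey()), entry.getValue())) {
                changed.put(entry.getKey(), entry.getValue());
            }
        }
        return changed;
    }

    // Applies a received delta to a replica's copy of the session.
    static void apply(Map<String, String> replica, Map<String, String> delta) {
        replica.putAll(delta);
    }
}
```

Only the changed entries cross the network, which matters when sessions are large and individual updates are small.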
What Legacy Integration Capabilities Does GemFire Have?
GemFire includes the following legacy integration capabilities:
Support for Multiple Deployment Topologies
GemFire supports multiple deployment topologies to provide highly customized application deployments based on their latency, throughput, scale and redundancy requirements. GemFire applications can be deployed as a peer-to-peer cluster (supporting hundreds of nodes), in a client/server cluster (supporting tens of thousands of clients) or in multiple clusters daisy-chained using GemFire’s WAN replication capabilities.
Reliable Publish-Subscribe Framework Allowing Integration with the Rest of the Enterprise
GemFire provides an “eventing” framework that allows end-user applications to receive notifications when data-change events happen in the system. This includes the ability for GemFire clients to configure very customized subscriptions for data change notifications, which is supported by a highly reliable publish-subscribe notification mechanism. These subscriptions can be set up as regular expressions on keys, key sets, or continuous queries using an SQL-like querying language. Event callbacks can be programmed to integrate data coming from GemFire with the rest of the enterprise (including backend databases and exchanges). Subscription management and event distribution are highly optimized to avoid wasting CPU and network cycles and to ensure that events are delivered to the right set of consumers.
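A subscription expressed as a regular expression on keys can be pictured with the small in-process sketch below. It is not the GemFire client API: the class KeyInterestSketch and its methods registerInterest and publish are hypothetical, and delivery here is a direct method call rather than a reliable network notification.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiConsumer;
import java.util.regex.Pattern;

// Hypothetical sketch of key-pattern subscriptions; not the GemFire client API.
final class KeyInterestSketch {

    private static final class Subscription {
        final Pattern keyPattern;
        final BiConsumer<String, String> listener;

        Subscription(Pattern keyPattern, BiConsumer<String, String> listener) {
            this.keyPattern = keyPattern;
            this.listener = listener;
        }
    }

    private final List<Subscription> subscriptions = new ArrayList<>();

    // Registers a listener for every key that fully matches the regular expression.
    void registerInterest(String keyRegex, BiConsumer<String, String> listener) {
        subscriptions.add(new Subscription(Pattern.compile(keyRegex), listener));
    }

    // Delivers a data-change event only to matching subscribers and
    // returns how many listeners were notified.
    int publish(String key, String newValue) {
        int delivered = 0;
        for (Subscription subscription : subscriptions) {
            if (subscription.keyPattern.matcher(key).matches()) {
                subscription.listener.accept(key, newValue);
                delivered++;
            }
        }
        return delivered;
    }
}
```

A listener registered for "order-.*" is notified about a change to key "order-42" but not about a change to "customer-7", so consumers receive only the events they asked for.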
Support for Native Clients in Java, C++ and .NET
GemFire is one of the few data grid providers that supports a rich set of APIs and native implementations for those APIs in Java, C++ and .NET. The clients are 64-bit enabled and support full client-side caching with eventual consistency for the cached data which allows you to build applications that are powerful, respond in real time, and can manage large volumes of data without ever going to disk.
Backward Compatibility Support
GemFire clients have been deployed in kiosks on vendor sites, where the customer has no control over the upgrade schedule. As a result, GemFire allows older clients to continue operating with newer servers. Support for rolling upgrades in the cluster ensures that peers from two versions can support data access during an upgrade process.
Support for Reliable Asynchronous Event Queues
GemFire supports callbacks for read-through and write-through operations allowing applications to integrate with legacy data sources. However, when GemFire is being used as the operational data store, it makes sense to use the Async Event Queueing mechanism to update traditional databases using batch operations. As with other aspects of GemFire, high availability is built into the Async Event Queues ensuring data consistency between GemFire and the traditional database being updated.
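The batching behavior described above follows the general write-behind pattern, sketched here under stated assumptions. This is not the GemFire Async Event Queue API: WriteBehindSketch, enqueue and flushOneBatch are hypothetical names, events are plain strings, and the sketch is single-threaded where a real system would drain the queue from a background thread.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;

// Hypothetical write-behind sketch; not the GemFire Async Event Queue API.
final class WriteBehindSketch {
    private final Queue<String> pending = new ArrayDeque<>();
    private final int batchSize;
    private final Consumer<List<String>> batchWriter; // e.g. one database batch per call

    WriteBehindSketch(int batchSize, Consumer<List<String>> batchWriter) {
        this.batchSize = batchSize;
        this.batchWriter = batchWriter;
    }

    // Called on the write path: records the event without touching the database.
    void enqueue(String event) {
        pending.add(event);
    }

    // Writes up to batchSize events in FIFO order and returns how many were written.
    // Events leave the queue only after the writer returns, so a batch whose
    // write throws stays queued and can be retried.
    int flushOneBatch() {
        List<String> batch = new ArrayList<>();
        for (String event : pending) {
            if (batch.size() == batchSize) {
                break;
            }
            batch.add(event);
        }
        if (batch.isEmpty()) {
            return 0;
        }
        batchWriter.accept(batch);
        for (int i = 0; i < batch.size(); i++) {
            pending.remove();
        }
        return batch.size();
    }
}
```

Keeping events queued until the write succeeds is the design choice that lets the cache and the backing database converge after a transient database failure.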
Which Enterprise-Readiness Capabilities Does GemFire Support?
Support for Global WAN Replication
GemFire provides a globally consistent view of data using highly fault-tolerant asynchronous WAN replication capabilities that are designed to work on high-latency congested networks on the open internet. Using techniques like batch replication, tuned sliding windows, support for conflict detection and failover for both sender and receiver, GemFire WAN replication helps you synchronize data across the globe, ensuring that your users can be served even if a data center goes down for any reason.
Support for Backward- and Forward-Compatible Serialization Protocol
Applications evolve their data models at different rates. When a new application introduces additional attributes to an existing data entity, GemFire’s backward- and forward-compatible serialization protocol ensures that older and newer applications can continue to exchange information in native binary format. Older applications need no knowledge of the newer attributes, which are nevertheless preserved in the system.
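The preservation of attributes that an older application does not understand can be illustrated with a small sketch. It does not reproduce GemFire's serialization format or API: a Map of field names to values stands in for the serialized form, and the class CustomerV1 with its methods fromFields, withName and toFields is hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of unread-field preservation; not GemFire's wire format or API.
final class CustomerV1 {
    final String name;                        // the only field version 1 knows about
    private final Map<String, Object> unread; // fields written by newer versions

    private CustomerV1(String name, Map<String, Object> unread) {
        this.name = name;
        this.unread = unread;
    }

    // Reads the known field and keeps every other field untouched.
    static CustomerV1 fromFields(Map<String, Object> fields) {
        Map<String, Object> unread = new LinkedHashMap<>(fields);
        String name = (String) unread.remove("name");
        return new CustomerV1(name, unread);
    }

    // Returns a modified copy that still carries the unread fields.
    CustomerV1 withName(String newName) {
        return new CustomerV1(newName, unread);
    }

    // Writes the known field plus all unread fields back out.
    Map<String, Object> toFields() {
        Map<String, Object> out = new LinkedHashMap<>();
        out.put("name", name);
        out.putAll(unread);
        return out;
    }
}
```

If a newer application stores a customer with an extra loyaltyTier field, a version 1 application can rename the customer and write it back without dropping that field.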
Scalable Management and Monitoring Framework
GemFire collects statistics within each GemFire server through a highly optimized sampling algorithm, which provides insight into what is happening within each server. GemFire aggregates these statistics within a managing member and makes them available to management and monitoring tools via JMX MBeans. The public statistics API allows your users to plug in their own statistics within application callbacks to see what is happening across the system.
Scalable Behavior Execution Support
GemFire allows your users to write functions in Java that get executed in proximity to the data that the function will use. This is akin to writing distributed stored procedures in Java that get executed in parallel. The function-execution framework in GemFire is generic enough to support multiple execution models, including MapReduce-like executions across the grid.
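A MapReduce-like execution can be sketched with JDK parallel streams. This is an in-process illustration, not the GemFire function-execution API: each Map stands in for the data hosted by one member, and FunctionExecutionSketch and executeOnAll are hypothetical names.

```java
import java.util.List;
import java.util.Map;
import java.util.function.BinaryOperator;
import java.util.function.Function;

// Hypothetical sketch; each Map stands in for the data hosted by one member.
final class FunctionExecutionSketch {

    // Runs `function` once per partition, in parallel, then folds the partial
    // results with `combiner`. `identity` must be neutral for the combiner,
    // and the combiner must be associative.
    static <K, V, R> R executeOnAll(List<Map<K, V>> partitions,
                                    Function<Map<K, V>, R> function,
                                    R identity,
                                    BinaryOperator<R> combiner) {
        return partitions.parallelStream()
                .map(function)
                .reduce(identity, combiner);
    }
}
```

Summing the values held in every partition, for example, means passing a function that sums one partition, 0 as the identity and Integer::sum as the combiner; only the small partial sums travel back to the caller, not the data itself.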
Support for Transactions, Querying and Indexing
GemFire stores data in table-like structures called regions. A system can hold any number of regions, each storing its data as either partitioned or replicated. Regions can be co-located, allowing for optimized scalable transactions across co-located data in the cluster. GemFire supports Object Query Language (OQL), which offers support for indexing and SQL-like querying capabilities over complex object-graph collections that are stored in regions. For JSON documents, the capabilities include querying within and across documents, federated queries with support for joins across JSON and key-value regions/tables, and transactional support.
Online Monitoring Dashboards and Command Line Utilities
GemFire includes a web-based monitoring dashboard, called Pulse, that provides a bird's-eye view into the cluster with relevant summary statistics. The operator can use it to drill down into individual members, look at individual data sets, or dive into the performance characteristics of individual aspects of product behavior that are depicted via Pulse.
Powerful Offline Analysis Capabilities
GemFire supports an offline analysis tool called Visual Statistics Display (VSD). VSD opens stat files created by GemFire to provide powerful correlation capabilities across events and across members. VSD is useful in pinpointing performance or resource bottlenecks in the distributed system that otherwise would be hard to diagnose.
Sophisticated Proactive Resource Management Capabilities
GemFire actively manages CPU, disk, memory and network resources. Using auto-tuning techniques, GemFire ensures that applications have access to the right quantities of resources when executing both read and write workloads. The distributed resource manager detects critical memory conditions in the server and moves to prevent operations that can destabilize the container while working to remedy the excessive use of memory in the server.
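The critical-memory protection described above can be pictured as a guard on the write path. The sketch below is an assumption-laden illustration, not GemFire's resource manager: the class MemoryGuardSketch, its methods, the threshold value and the exception type are all hypothetical.

```java
// Hypothetical sketch of a critical-heap guard; names and threshold are assumptions.
final class MemoryGuardSketch {
    private final double criticalFraction; // e.g. 0.9 means 90% of the maximum heap

    MemoryGuardSketch(double criticalFraction) {
        this.criticalFraction = criticalFraction;
    }

    // True when used memory is at or above the critical fraction of the maximum.
    boolean isCritical(long usedBytes, long maxBytes) {
        return (double) usedBytes / maxBytes >= criticalFraction;
    }

    // Evaluates the same check against the heap of the running JVM.
    boolean isCriticalNow() {
        Runtime runtime = Runtime.getRuntime();
        long used = runtime.totalMemory() - runtime.freeMemory();
        return isCritical(used, runtime.maxMemory());
    }

    // Rejects a write instead of risking memory exhaustion; reads are unaffected.
    void checkWriteAllowed(long usedBytes, long maxBytes) {
        if (isCritical(usedBytes, maxBytes)) {
            throw new IllegalStateException("heap above critical threshold; write rejected");
        }
    }
}
```

Refusing new writes while the server is in a critical state gives eviction and garbage collection time to bring memory use back down before the process becomes unstable.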