MapR expert on the growing trend towards ‘dataware’

Pinakin Patel, Head of Solutions Engineering at MapR, explores the growing trend towards a new type of data middleware – aka ‘dataware’ – that is designed to do for data what middleware did for operating systems and applications. 

Information Technology has an overriding progression centred on performance and efficiency. The mainframes of the 1970s gave way to the racks of pizza-box servers in the data centres of the 1990s, which have in turn become the virtualised clusters within today's clouds. On the application side, highly proprietary application interfaces have given way to more open APIs such as SQL, REST and S3.

One of the drivers is the removal of complexity, and an early example is the introduction of middleware as an abstraction layer. Google's dictionary defines middleware as: ‘Software that acts as a bridge between an operating system or database and applications, especially on a network.’

However, in the modern era, the stack of hardware, middleware and applications, all connected by a network, is much less certain. With smartphone apps, the cloud, shared networks, web apps and a whole host of hybrid systems, the role of middleware as the glue that binds operating systems to applications is still valid. But middleware does not provide a consistent conduit for handling digital data, which is the foundation for most business use cases.

Data evolution

In the past, when most data came from a big relational database, integration was simple. Today, data arrives in a growing number of formats and delivery types. In no particular order, data can be structured or unstructured, file, object, real-time streaming, inference data or archived, plus several hybrid types. The data might need a certain type of encryption, higher availability, or accessibility to a third party for governance purposes. It might have regulatory requirements that mean it can only be stored in a certain geography or for a certain duration. The list of data-centric requirements, along with the need for scalable performance, is vast, complex and growing.

To address this issue, there is a growing trend towards a new type of data middleware – termed ‘dataware’ – that is designed to do for data what middleware did for operating systems and applications.

Dataware sits in the spectrum between hardware and middleware, as a conceptual layer that creates the next level of abstraction in the IT stack. Distinctly different from databases or data warehousing, dataware provides a platform-based approach that handles all data in terms of ingest, storage, availability, transformation and delivery between sources and destinations, allowing enterprises to focus solely on the use case and not the plumbing.

Beyond storage

Unlike the old world of storage, which centred on volumes, blocks, files and more recently objects, dataware presents a set of standards-based APIs that enable enterprises to manage, secure, govern and protect data, along with tools that allow a broad set of applications to consume it. The complexity of data is handled by defined policies that span locations and hardware infrastructures, from on-premises to the cloud, the edge and containers.

The abstraction is similar to the way operating systems manage hardware through device drivers, which provide a known set of interfaces to mask the complexity of the underlying graphics, networking and audio chipsets. This ability to deliver data based on need, rather than on the limitations of the underlying systems, is particularly useful in use cases that require multiple types of data served from disparate sources.

For example, take a ride-sharing service such as Uber or Lyft. The service combines real-time streaming data from drivers and passengers with traditional customer information from a database and, in the back end, makes a large number of analytical decisions about journey planning, capacity and demand. Factors such as weather, time of day and temperature can all feed into these calculations.

There may be multiple sources for these elements, each with different parsing requirements that may change as the application set evolves. Hard coding these data elements into the workflow is inefficient, especially if the data structure, type or source changes. Instead, in a dataware model, the dataware abstraction layer manages acquisition, storage, parsing and delivery of the required data to the application and handles any return path data capture.

If a new data type is required for the applications, for example geographic information system (GIS) mapping data, the dataware can handle the management complexity involved in its acquisition and present the data to the application group in the required format.
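As a rough illustration of the idea, the sketch below shows an application asking a thin abstraction layer for datasets by name, while the layer owns acquisition and parsing. It is a minimal sketch under stated assumptions, not how MapR or any other product implements dataware: the `DatawareLayer` class, the dataset names and the inline sample data are all hypothetical.

```python
import csv
import io
import json
from typing import Callable, Dict, List

Record = Dict[str, str]


class DatawareLayer:
    """Hypothetical abstraction layer: applications request datasets by name only."""

    def __init__(self) -> None:
        self._sources: Dict[str, Callable[[], List[Record]]] = {}

    def register(self, name: str,
                 fetch: Callable[[], str],
                 parse: Callable[[str], List[Record]]) -> None:
        # The layer, not the application, knows where data lives and how to parse it.
        self._sources[name] = lambda: parse(fetch())

    def get(self, name: str) -> List[Record]:
        return self._sources[name]()


def parse_csv(raw: str) -> List[Record]:
    return [dict(row) for row in csv.DictReader(io.StringIO(raw))]


def parse_json_lines(raw: str) -> List[Record]:
    return [json.loads(line) for line in raw.splitlines() if line.strip()]


# Inline strings stand in for a real database export and a real event stream.
layer = DatawareLayer()
layer.register("customers", lambda: "id,name\n1,Ada\n2,Lin\n", parse_csv)
layer.register(
    "driver_positions",
    lambda: '{"driver": "d7", "lat": "51.5", "lon": "-0.1"}\n',
    parse_json_lines,
)


def demand_summary(data: DatawareLayer) -> str:
    # Application code: no file formats, connection strings or parsers in sight.
    customers = data.get("customers")
    drivers = data.get("driver_positions")
    return f"{len(customers)} customers, {len(drivers)} drivers online"


# Adding a new data type (here, made-up GIS zone data) is a registration in the
# layer; demand_summary() and other existing application code stay unchanged.
layer.register(
    "gis_zones",
    lambda: '{"zone": "central", "surge": "1.2"}\n',
    parse_json_lines,
)
```

If the customer data later moved from CSV to another format, only the `register` call for `customers` would change in this sketch; the application function would not.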

Although the term ‘dataware’ might be new, the notion of abstraction is very old, and the more data technologies coalesce around emerging standards such as JSON, S3, Spark, Kafka and others, the easier it becomes to add data types and data processes to a dataware layer.

AI and security

Currently, the use case most relevant to dataware is Artificial Intelligence. The current generation of AI pioneers is creating and testing models to teach machines how to spot events and deal with different situations. This process of Machine Learning depends largely on access to lots of data. Dataware is well suited to these types of projects, as it can handle almost any type of data, can scale capacity almost infinitely by offloading to the cloud, and can handle requirements such as performance tiering and backup.

The other major issue that dataware can potentially help address is security. The present situation, where enterprises keep multiple silos of data that typically have separate access, encryption and governance criteria, is incredibly difficult to secure. As data starts to move from these individual silos into other areas such as AI research, test and development, applying controls to ensure security and privacy becomes more complex.

In a dataware-centric model, all data flows through this data abstraction layer, which makes it potentially easier to apply policy-based controls at a single point instead of having to fragment the security process.
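To make the single-point argument concrete, the sketch below routes every read through one gate that checks role and region and masks sensitive fields. It is only an illustration under stated assumptions: `Policy`, `PolicyGate`, the role names, the region codes and the sample record are all hypothetical and do not describe any vendor's API.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet, List

Record = Dict[str, str]


@dataclass(frozen=True)
class Policy:
    """Hypothetical policy attached to a dataset, not to an application."""
    allowed_roles: FrozenSet[str]
    allowed_regions: FrozenSet[str]
    masked_fields: FrozenSet[str] = frozenset()
    sees_raw: FrozenSet[str] = frozenset()  # roles exempt from masking


class PolicyGate:
    """Single enforcement point: every read passes through read()."""

    def __init__(self) -> None:
        self._data: Dict[str, List[Record]] = {}
        self._policies: Dict[str, Policy] = {}

    def publish(self, name: str, records: List[Record], policy: Policy) -> None:
        self._data[name] = records
        self._policies[name] = policy

    def read(self, name: str, role: str, region: str) -> List[Record]:
        policy = self._policies[name]
        if role not in policy.allowed_roles:
            raise PermissionError(f"role {role!r} may not read {name!r}")
        if region not in policy.allowed_regions:
            raise PermissionError(f"{name!r} may not be served in {region!r}")
        if role in policy.sees_raw:
            return [dict(r) for r in self._data[name]]
        # Mask sensitive fields for every other permitted role.
        return [
            {k: ("***" if k in policy.masked_fields else v) for k, v in r.items()}
            for r in self._data[name]
        ]


gate = PolicyGate()
gate.publish(
    "customers",
    [{"id": "1", "name": "Ada", "card": "4111-0000"}],
    Policy(
        allowed_roles=frozenset({"billing", "ai_research"}),
        allowed_regions=frozenset({"eu"}),
        masked_fields=frozenset({"name", "card"}),
        sees_raw=frozenset({"billing"}),
    ),
)

research_view = gate.read("customers", role="ai_research", region="eu")
billing_view = gate.read("customers", role="billing", region="eu")
```

In this sketch the AI research team receives masked records and the billing team receives raw ones from the same dataset, and a request from a region outside the policy is refused, all without any of the consuming applications carrying their own access logic.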

Dataware is an evolving concept, and several pioneers in this space, including MapR, are striving to ensure that it supports the widest ecosystem of open standards. This is vital if the technology is to avoid the proprietary pitfalls of legacy middleware platforms, which tended to lock in customers rather than provide freedom and choice. If these lessons can be learnt and enacted, then it really will be time for a new ware.