Many of the larger companies I have worked with or discussed with my colleagues have made some progress on their way to Enterprise Data Warehousing. Hardly anyone, however, has actually completed an implementation of the one single point of truth, integrating all relevant structured data and analysis in one place.
The challenges start with justifying the investment in a long-running, complex and expensive initiative in a dynamic, ever-changing business environment, with quite a lot of work to complete before the business sees the first benefits. In addition, the exponential growth of available data (structured, semi-structured and unstructured), the improved capabilities for advanced analysis on large amounts of data, and the demand for (near) real-time analysis have led to a shift of investments away from the classic Enterprise Data Warehouse.
Clearly, there is value in integrating data and in ensuring data quality. But does it pay off to have all data in one place?
- In the Oil & Gas industry, upstream and downstream are two substantially different business operations, often under the same organizational roof – what is the business benefit of integrating them in one single Data Warehouse? And what would a truly integrated data model covering both retail operations and refineries look like?
- Mergers & Acquisitions: How do you integrate the data warehouse that came with a newly acquired company into your own without impacting the agility of your own, established data warehouse? What if you sell parts of your company and need to disentangle and separate the data you have stored?
- In the Utilities industry, power plants generate not just power, but also huge amounts of data, up to petabytes a day. There certainly is value in analyzing that data, but is there value in storing it in an Enterprise Data Warehouse?
Does it make sense to use the same technology for all the different purposes? Current mainstream DBMS certainly provide everything needed for an Enterprise Data Warehouse, but are they well suited to handle the increasing amount – and, more importantly, the increasing variety – of data generated by components of the Internet of Things (IoT) or coming from Social Media platforms?
Shifting Paradigms
The main early driver of data warehousing is largely gone: modern operational systems have sufficient capacity to serve operational reporting and analysis needs. In fact, real-time reporting and analytics are gaining importance and, consequently, most of these operational systems now provide their own analytic functionality.
Advances in computing power and network speed, the advent of in-memory systems, and other technology developments allow ever larger amounts of data to be processed and stored. This is what makes large Social Media platforms and Internet of Things applications possible. Business is now able to apply “Big Data” technologies to answer different questions on different, often very large sets of structured, semi-structured or unstructured data.
Adding the availability of standard, open interfaces to that, we now see a shift of paradigms in data warehousing and analytics, away from seeing everything as a data source towards seeing a collection of Analytic Services. An analytic service independently processes and stores information. It gives direct access to analytic functionality for reporting and analysis on that information, it exposes analytic functionality, data and meta data through APIs, and it allows detailed, aggregated or otherwise preprocessed data to be extracted for use in other systems. Examples of what we consider Analytic Services include:
- SAP’s S/4HANA ERP solution with “Embedded Analytics” to report and analyze operational data in real-time
- Salesforce, SAP’s Hybris (both CRM) or SAP SuccessFactors (HRM) solutions that have their own built-in analytics components
- In-situ analysis in complex machines like airplanes or power plants, which generate a constant stream of data that is continually analyzed to check for anomalies and can be accessed through APIs
- External service providers that not only produce data, but also offer analytic services on top of that, such as
  - Social analysis by Facebook, Snapchat, Twitter and other social media
  - Nielsen or IMS Health, which collect and aggregate data relevant for their industries
  - Advanced analytics algorithms made available on IBM Watson, Google Cloud or the SAP Hana Cloud Platform
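To make the notion of an analytic service more concrete, here is a minimal Python sketch of the four capabilities described above: processing and storing information, direct analysis, exposing meta data, and extracting data. The names `AnalyticService`, `InMemorySalesService`, `ingest`, `analyze`, `metadata` and `extract` are my own illustrative assumptions; none of them is taken from any of the products listed.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Protocol


class AnalyticService(Protocol):
    """Hypothetical interface; not the API of any real product."""

    def ingest(self, record: Dict[str, Any]) -> None: ...
    def analyze(self, measure: str, group_by: str) -> Dict[Any, float]: ...
    def metadata(self) -> Dict[str, Any]: ...
    def extract(self) -> List[Dict[str, Any]]: ...


@dataclass
class InMemorySalesService:
    """Toy stand-in that keeps its records in a plain Python list."""

    name: str
    rows: List[Dict[str, Any]] = field(default_factory=list)

    def ingest(self, record: Dict[str, Any]) -> None:
        # The service processes and stores information on its own.
        self.rows.append(dict(record))

    def analyze(self, measure: str, group_by: str) -> Dict[Any, float]:
        # Direct access to analytic functionality: sum a measure per group.
        totals: Dict[Any, float] = {}
        for row in self.rows:
            key = row[group_by]
            totals[key] = totals.get(key, 0) + row[measure]
        return totals

    def metadata(self) -> Dict[str, Any]:
        # Meta data exposed to callers: service name and known field names.
        fields = sorted({key for row in self.rows for key in row})
        return {"service": self.name, "fields": fields}

    def extract(self) -> List[Dict[str, Any]]:
        # Detailed data handed out as copies for use in other systems.
        return [dict(row) for row in self.rows]
```

A real service would sit behind a network API and add security, paging and versioning. The point of the sketch is only that analysis, meta data and extraction are offered side by side by the same component.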
What independent analytic services don’t provide is cross-functional, cross-application analysis, KPIs embedded into a rich context of information, and reliable historical or auditable information. And clearly, there are practical limitations in terms of network bandwidth, processing power or storage capacity that make it worthwhile to still maintain controlled redundancies simply to optimize analysis.
That is where data warehousing comes back in, with the capability to combine analytic services to provide high-quality, cross-functional, interdisciplinary analysis that really has the potential to add unique business value.
Distributed Data Warehousing
Distributed Data Warehousing is not a new idea. It complements traditional data warehousing by applying a structured approach to combining available analytic services, providing more agility and flexibility without really having to compromise data integrity. Integration takes place on three levels:
- Integration on UI Level: Present analyses / reports from various analytic services in an integrated, usually web-based user interface.
- Integration on Meta Data Level: The underlying data warehouse solution would connect to analytic services to retrieve relevant analysis or data, would then process and merge that information with local analysis or data (if required) and present it in a unified UI (web-based, MS Excel-based or others).
- Integration on Data Level: Following a traditional data warehouse approach, data would be extracted as needed and processed, stored and analyzed locally – or again exposed through open interfaces of the distributed data warehousing solution.
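The difference between the three levels can be illustrated with a small Python sketch. It is deliberately simplified: each "service" is just a function returning an aggregated result, and the names `ui_level`, `metadata_level` and `data_level` are hypothetical helpers chosen for this example, not part of any product.

```python
from typing import Callable, Dict, List

# Assumption: a service is modeled as a function returning {key: value},
# standing in for a call to a remote analytic API.
Service = Callable[[], Dict[str, float]]


def ui_level(report_urls: Dict[str, str]) -> List[str]:
    """UI level: existing reports are only listed side by side in one place."""
    return [f"{title}: {url}" for title, url in sorted(report_urls.items())]


def metadata_level(
    services: Dict[str, Service], local: Dict[str, float]
) -> Dict[str, Dict[str, float]]:
    """Meta data level: query services on request, merge with local data by key."""
    merged = {key: {"local": value} for key, value in local.items()}
    for name, service in services.items():
        for key, value in service().items():
            merged.setdefault(key, {})[name] = value
    return merged


def data_level(
    services: Dict[str, Service], store: Dict[str, Dict[str, float]]
) -> Dict[str, Dict[str, float]]:
    """Data level: extract results and keep a local copy for later analysis."""
    for name, service in services.items():
        store[name] = dict(service())
    return store
```

With two toy services, for example `{"crm": lambda: {"EU": 12.0}, "erp": lambda: {"EU": 150.0}}`, `metadata_level` returns one merged record per key without keeping anything, while `data_level` fills a local store that can be analyzed again later. That is the trade-off between agility and controlled redundancy described above.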
This approach puts more emphasis and more urgency on the actual business value that can be generated by delivering data and analysis to the business users: it starts with analytic services that are already available and then gradually refines analytic capabilities, deepens the integration and optimizes execution over time. Distributed data warehousing can, of course, be complemented with other approaches to more agile development, such as Scrum/Agile project management, or with the broader approach of Gartner’s bi-modal BI.
Note that this is not a discussion of “Cloud” vs. “On-Premise”. Conceptually, it doesn’t really matter where a specific analytic service is located – everything becomes “hybrid”.
How does BW/4HANA fit in?
With SAP BW/4HANA, SAP has released a much leaner variant of its Business Warehouse product line, deeply integrated with the SAP Hana in-memory database. If you look behind the curtain, most of the original BW logic has been moved into the database layer, significantly improving BW performance. At the same time, deep integration into Hana allows BW to utilize more advanced functionality of not just the Hana database, but the Hana platform, enabling integration of predictive analytics, text mining, spatial analytics, etc.
This door is just opening – there is a lot for SAP left to do in terms of development and integration and there is a lot for clients left to explore.
Please refer to my “7 Reasons” Blog for more information about why S/4HANA does not replace BW (and why we still need Data Warehousing).