The time has arrived for the Data Lake to come of age. We now have the technology to build a vibrant and productive solution that can help a company evolve into an agile, data-driven organization. Data Lakes let us deploy cost-effective data stores that can cross organizational data barriers and hold unprecedented volumes of information. The challenge we face is how best to create a Lake that fulfills this vision. We are presented with numerous approaches and methodologies for creating Lakes, but they often miss the mark. Let’s look at why they miss the mark and what we can do to make our Lake a winner.
Many companies today are beginning to leverage Big Data, and how they do so varies from company to company. Facebook and Google not only defined the technology but also use it to power their innovation. Some companies are using Hadoop and NoSQL to try to replace their Data Warehouses; others use them to replace components of the analytic architecture, such as landing and staging; still others create a completely new analytic platform to complement their structured data systems. Each of these approaches can meet or exceed expectations, but too often expectations are not met because the solution is technology-driven rather than business-driven. I believe you can succeed by combining elements of all these approaches, guided by what drives business differentiation.
A Data Lake must be built to meet the agile requirements of today’s dynamic businesses. The Data Lake cannot have the rigidity that we have imposed on our legacy structured data warehouses. It must have a fluidity that allows users to quickly onboard data and then process it as required to answer the questions they have at that moment. The beauty of a Lake is that it provides a cost-effective space for experimentation and analytics. Data consumers should not have to wait for data; instead, they should be enabled by self-service facilities that allow data to be loaded and processed on demand. The challenge has been that IT organizations have imposed too many restrictions and processes for data to be added quickly. This approach needs to align better with business needs, not simply with IT’s desire for control.
The realization of a Data Lake can be transformational for an organization if it is done right. The key to success is a true partnership between technology and the business. The Data Lake must provide data in a foundational manner: it will include the basic data used to run the organization. Highly granular data, loaded and managed by IT, will form the basis of all traceable analytics. This data is well known and is core to running the business. This foundation of data should be linked to support cross-functional analysis. Data Lakes should include this base layer of data, which will be needed in most analytic scenarios, loading it as needed and ideally in real time. In addition, the Data Lake should provide pre-calculated models and analytics that are computed once and used often. This core and analytic data can then be moved to a high-performance analytic platform such as Teradata, Oracle Exalytics, Microsoft APS, or another database that can perform analytics at scale for numerous users.
To complete the deployment of a robust Data Lake, we need to provide a working area within it where advanced information consumers such as Data Scientists can join their own data with our base data to create new analytic models. In this self-governed space, these users will be able to load and create data as well as store the results of new models. This approach supports the agile operating model of a data-driven business. The Data Lake can remove the barriers that keep people from creating innovative analytics and asking questions they never asked before, allowing organizations to truly drive business value from data.
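The layered structure described above (an IT-managed foundation of granular data, pre-calculated analytics that are computed once and used often, and a self-governed working area for Data Scientists) can be sketched as a simple access policy. This is a minimal illustration under assumptions of our own: the zone names, the roles, and the specific read/write rules are hypothetical, not a prescribed design or any product’s API.

```python
from enum import Enum


class Zone(Enum):
    """Hypothetical zone names for the three layers of the Lake."""
    FOUNDATION = "foundation"  # granular core data, loaded and managed by IT
    ANALYTIC = "analytic"      # pre-calculated models: computed once, used often
    SANDBOX = "sandbox"        # self-governed working area for Data Scientists


class Role(Enum):
    """Hypothetical consumer roles."""
    IT = "it"
    DATA_SCIENTIST = "data_scientist"
    ANALYST = "analyst"


# Assumed policy: IT owns the foundation and analytic zones, while
# Data Scientists (and IT) may load and create data in the sandbox.
WRITERS = {
    Zone.FOUNDATION: {Role.IT},
    Zone.ANALYTIC: {Role.IT},
    Zone.SANDBOX: {Role.IT, Role.DATA_SCIENTIST},
}

# Assumed policy: every role may read the foundation and analytic zones,
# so Data Scientists can join sandbox data with the base data; the
# experimental sandbox is readable only by its writers.
READERS = {
    Zone.FOUNDATION: set(Role),
    Zone.ANALYTIC: set(Role),
    Zone.SANDBOX: {Role.IT, Role.DATA_SCIENTIST},
}


def can_write(role: Role, zone: Zone) -> bool:
    """Return True if the role may load or create data in the zone."""
    return role in WRITERS[zone]


def can_read(role: Role, zone: Zone) -> bool:
    """Return True if the role may query data in the zone."""
    return role in READERS[zone]


if __name__ == "__main__":
    # A Data Scientist can read base data and write results to the sandbox,
    # but cannot alter the IT-managed foundation.
    print(can_read(Role.DATA_SCIENTIST, Zone.FOUNDATION))   # True
    print(can_write(Role.DATA_SCIENTIST, Zone.SANDBOX))     # True
    print(can_write(Role.DATA_SCIENTIST, Zone.FOUNDATION))  # False
```

Keeping write access to the foundation and analytic zones with IT while opening the sandbox to Data Scientists is one way to reconcile trusted, traceable core data with the self-service the Lake is meant to enable.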
The Data Lake is here to stay. The Data Lake is good and can be great. But the danger that a Data Lake will be forgotten is real if it doesn’t meet and drive business value. The technology should support business-driven insights, not create barriers to loading and processing data. Building your Data Lake with an Agile mindset will enable quick business decisions and support self-service needs without restricting creativity.