At the end of the football season I look back at how well I predicted the outcomes against the spread. My correct selections for the season totaled 51.56%, good enough for 25th place out of 150 people. I hope my data predictions for this coming year stack up a little better.
2016, and the years that preceded it, brought a revolution in data. We have seen exponential growth in data and the emergence of businesses based solely upon information. We have seen the rise of Hadoop and Big Data. We have seen visualizations move from the IT department to the masses, and we still require more. In the coming year of our data-rich era I expect to see a continued evolution and improvement of tools and methods to meet business and consumer demands.
My predictions focus on the architecture needed to solve the problems of today. Concepts like machine learning, artificial intelligence and the cloud will all play important parts in the coming year, but my focus will be on how data consumers will be better enabled for success.
Data Democratization
Providing data to the entire organization in an efficient and effective manner has always been a problem. Data is stored in numerous places with little consideration for how information is to be integrated to enable new innovations in a business. By creating access to data that may reside in various locations and storage styles, we enable data consumers to use data to drive the business. For example, social data is a known source of valuable insights. Besides alerting retailers to data quality, it may also be used to aid support, logistics and merchandising. However, most organizations use this data only within a limited number of business areas. Making it available in a governed manner will enable all areas to leverage these valuable insights for themselves.
Self-service Data Preparation
The age of self-service began in 1917 when Clarence Saunders and the Piggly Wiggly stores patented the “Self-Serving Store” concept, which allows customers to go into a store and select their own merchandise instead of having a clerk select it from a list. Fast forward 100 years, and we see new evolutions in self-service. Retailers like Walmart are introducing the ability for customers to go into a store, scan their own items and pay for their order as they walk out, without anyone slowing down their experience unless needed.
Data is the same way. We have built data warehouses and then data marts. We built data lakes and we created data structures to support analytics. However, with the advent of self-service analytics, data consumers need data faster, and in more formats and grains, than the traditional warehouse team can deliver. In 2017 we will see self-service extend into the ETL world of data preparation and integration. Tools like Alteryx allow users to take data, which is often transactional, blend it from various sources, apply statistical methods to the data set and output the results without asking anyone in the IT department. This is revolutionary for the expanding population of business analysts and data scientists.
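To make the blend, analyze and output pattern concrete, here is a minimal sketch in plain Python. It is not how Alteryx works internally, just an illustration of the same three steps using only the standard library. The sample data, column names and function names are all hypothetical, invented for this example.

```python
import csv
import io
import statistics
from collections import defaultdict

# Hypothetical sample data standing in for two separate source systems.
TRANSACTIONS_CSV = """store_id,amount
1,20.00
1,30.00
2,10.00
2,50.00
3,40.00
"""

STORES_CSV = """store_id,region
1,East
2,East
3,West
"""

FIELDS = ["region", "transactions", "mean_amount", "median_amount"]


def read_rows(text):
    """Parse CSV text into a list of dictionaries, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def blend_and_summarize(transactions, stores):
    """Join transactions to stores on store_id, then summarize by region."""
    region_by_store = {row["store_id"]: row["region"] for row in stores}

    # Blend: attach a region to every transaction amount.
    amounts_by_region = defaultdict(list)
    for row in transactions:
        region = region_by_store.get(row["store_id"], "Unknown")
        amounts_by_region[region].append(float(row["amount"]))

    # Analyze: simple descriptive statistics per region.
    summary = []
    for region in sorted(amounts_by_region):
        amounts = amounts_by_region[region]
        summary.append({
            "region": region,
            "transactions": len(amounts),
            "mean_amount": round(statistics.mean(amounts), 2),
            "median_amount": round(statistics.median(amounts), 2),
        })
    return summary


def to_csv(rows):
    """Output: render the summary rows as CSV text."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(rows)
    return out.getvalue()


summary = blend_and_summarize(read_rows(TRANSACTIONS_CSV), read_rows(STORES_CSV))
print(to_csv(summary))
```

The point of the sketch is that nothing here requires a warehouse change request: an analyst joins two extracts, summarizes them and gets a file to share. Self-service tools wrap these same steps in a visual workflow.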
The Data Warehouse Needs to Change
I recently read an article claiming that the data warehouse was dead. However, I would like to report that the rumors of its death are greatly exaggerated. The data warehouse is far from dead; in fact, it is the core of any business reporting system, but it must adapt faster. We generate significant data through our business applications, which feed the traditional data warehouse. In 2017 businesses will adapt their data warehouse architecture, and we will finally see widespread use of Hadoop for landing data. With loading frameworks that enable quick loading of data, we will see data lakes supporting both legacy analytics and innovative data science. If by the end of 2017 Hadoop is not a key part of your data architecture, you will be falling further behind in innovation. The advantages of scale and cost free up money to spend where it is really needed. We will see expanded spending on platforms, both on premises and in the cloud, for servers that provide high-performance analytics and deliver the highest value to the business. The evolution of the data warehouse will be largely about accepting Hadoop as a business-critical part of the optimal warehouse architecture.
Overall, I believe 2017 is the next evolution in data. I think we need to get this mass of data under control and begin to empower people in every organization to use data in new, innovative ways. We cannot slow it down, so we need to find ways to be more effective. Just as Uber empowers the individual driver, 2017 will enable data consumers to answer the questions they have avoided for years.