AI/ML vendors at large present a magical view: ask your data lake a business question, and out pops an answer that drives a new business insight. I spoke with the Chief Data Officer (CDO) of a Fortune 500 company that struggled to normalize over 100 different data sources. Its data scientists spent up to 70% of their time normalizing data before even applying algorithms. Can leveraging metadata decrease the time to value for data analysis?
There is a large market of data management companies that will ingest and index data to create normalized data repositories. Such a repository becomes a sort of system of truth. It's the Holy Grail of data: "Account Opening" in one system equals "accountopening" in another dataset. These systems are massive and require a good deal of professional services to make useful. IT infrastructure companies are beginning to tackle the challenge.
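The kind of mapping involved can be sketched with a toy field-name canonicalizer. This is purely illustrative, not any vendor's actual implementation, and the function name and normalization rule are my own assumptions:

```python
import re

def canonical_field(name: str) -> str:
    """Map variant spellings of the same business term to one key
    by lowercasing and stripping whitespace and punctuation."""
    return re.sub(r"[^a-z0-9]", "", name.lower())

# Two source systems, one canonical key
assert canonical_field("Account Opening") == canonical_field("accountopening")
```

Real normalization platforms go far beyond string cleanup (synonym dictionaries, type reconciliation, lineage tracking), which is where the professional-services effort comes in.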
The future of AI consumption
I have a theory that AI/ML will eventually get pushed down to the business user. The data scientist's role will shift from performing data analysis to creating algorithms that are later packaged into prosumer software. End users will leverage these packages much as business analysts use a desktop database to query production databases, or as they use Microsoft Excel.
End users will point the AI/ML tools at unstructured data. The broader data normalization solutions will work with the metadata from existing infrastructure tools to feed data points to the packaged software, which will take the form of a SaaS offering from one of the large cloud providers.
Infrastructure companies are making strides. Several take advantage of the metadata that already exists on their storage systems. I recently viewed a NetApp presentation on NetApp Data Availability Services (NDAS). NDAS targets the IT generalist, such as a developer responsible for maintaining both the application and the infrastructure at a small SaaS company.
The solution indexes all of the metadata stored in the system. NetApp enables basic analytics on what is essentially a data lake that forms naturally as a byproduct of backing up the system, and it touts the ability to find and leverage data through an interface as simple as Google Search.
Data analysis isn't simple. We are far from being able to ask a question of petabytes of backup data and get real business insights. However, NetApp is showing a vision for how an advanced extract, transform, and load (ETL) pipeline could easily ingest and present the metadata from all of the idle, unstructured data residing in an organization.
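To make the idea concrete, here is a minimal sketch of the "extract" and a simple query step over filesystem metadata. The record shape, function names, and search rule are my own assumptions for illustration, not how NDAS actually works:

```python
import os
from dataclasses import dataclass
from typing import Iterator, List

@dataclass
class FileMeta:
    path: str
    size_bytes: int
    modified: float  # POSIX timestamp

def extract_metadata(root: str) -> Iterator[FileMeta]:
    """Extract step: walk a directory tree and pull each file's
    metadata without reading file contents."""
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(dirpath, name)
            st = os.stat(full)
            yield FileMeta(full, st.st_size, st.st_mtime)

def search(index: List[FileMeta], term: str) -> List[FileMeta]:
    """Query step: a crude, search-box-style substring match
    over the indexed paths."""
    return [m for m in index if term.lower() in m.path.lower()]
```

The point of the sketch is that metadata alone, which is cheap to collect as a side effect of backup, is already enough to answer "where does data about X live?" questions, long before anyone attempts analytics on the data itself.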
While companies such as Cohesity focus on backup (secondary) data, I find it intriguing that NetApp offers both backup and Tier 1 storage platforms. If NetApp were to offer real-time metadata from transactional systems, I could see integration with, or even competition against, in-memory database systems such as SAP HANA.
I believe it's worthwhile for the infrastructure teams inside large organizations to have conversations with their data architecture teams to develop a long-term vision for data management. It could very well impact your purchasing decisions for enterprise storage today.