GigaOm guru Om Malik goes so far as to call data without proper context “dirt” in a recent blog post, and truly some of the large piles of data gathered by companies today are nothing more than a waste of storage space. It’s a tough job for a BI system – and the analyst working with it – to make sense of data that has lost its accompanying metadata. Especially with huge amounts of data, it is critical that the data can be processed automatically, without requiring human intervention or interpretation for every single datum.
There are many products emerging in the big data space where the focus is on driving huge volumes of data into your (local or cloud-based) storage systems. Much of this data is erroneous (due to “smart” algorithms that often turn out to be less smart than advertised) and isn’t annotated with the relevant metadata that could make the difference between relevant information and garbage data.
Along the same lines, Paul Michaud of Nebility discusses the Garbage In = Garbage Out (GIGO) principle. Analysis becomes meaningless if it is done on top of inconsistent or incorrect data. Don’t get lost in the “big data” frenzy; data isn’t just data. It has to be meaningful, annotated, and consistent.
This requires a powerful ETL tool such as Kapow Katalyst that has rich transformation capabilities and can acquire the proper metadata as the data is extracted, or even add new relevant metadata based on complex business rules. And rather than just pulling in random petabytes of data, you should carefully consider what you wish to accomplish with the data and what requirements that places on the quality of the data, the number of data sources that must be combined, and the current degree of consistency between these data sources.
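To make the idea of capturing metadata at extraction time more concrete, here is a minimal Python sketch. It is not Kapow Katalyst code and does not reflect that product's API; the `Record` class, the `extract` and `apply_business_rule` functions, the source name `webshop_orders`, and the plausibility threshold are all hypothetical names and values chosen purely for illustration.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any


@dataclass
class Record:
    """A single extracted value plus the context that keeps it usable."""
    value: Any
    metadata: dict = field(default_factory=dict)


def extract(raw_rows, source, unit):
    """Attach provenance metadata while it is still known (hypothetical helper)."""
    extracted_at = datetime.now(timezone.utc).isoformat()
    return [
        Record(
            value=row,
            metadata={"source": source, "unit": unit, "extracted_at": extracted_at},
        )
        for row in raw_rows
    ]


def apply_business_rule(record, max_plausible):
    """Illustrative rule: flag non-numeric or out-of-range values as suspect."""
    v = record.value
    is_number = isinstance(v, (int, float)) and not isinstance(v, bool)
    valid = is_number and 0 <= v <= max_plausible
    record.metadata["quality"] = "ok" if valid else "suspect"
    return record


# Made-up sample input: two plausible amounts, one negative, one placeholder.
raw = [120.5, -3, "n/a", 9800]
records = [
    apply_business_rule(r, max_plausible=10_000)
    for r in extract(raw, source="webshop_orders", unit="USD")
]

# Only records that passed the rule go on to analysis; the rest stay flagged.
clean = [r for r in records if r.metadata["quality"] == "ok"]
```

The point of the sketch is that every value carries its source, unit, and extraction time from the moment it is pulled in, and that quality is recorded as metadata rather than silently discarded, so suspect records can still be reviewed later.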
It’s tempting to measure your big data strategy by the volume of data consumed, but that would be about as meaningful as judging gourmet restaurants by calories per serving. Focus on accessing, transforming, and delivering the highest quality of data for superior decision-making and competitive advantage.
By: Anne-Sofie Nielsen 




