I love the term “data-wrangling” used in a recent New York Times article titled “For Big-Data Scientists, ‘Janitor Work’ Is Key Hurdle to Insights.” The article explores the Big Data revolution and the development of software to automate the gathering, cleaning, and organizing of disparate data, and it puts a name on a step that is critical but often missed in Big Data: turning data into information.
These days, it is not difficult to collect data, store it, or even present it in pretty graphs. But if that is all it is — data — then, as many companies are finding out, it is not much use.
The whole reason we collect data is to achieve understanding through analysis. The raw material for analysis is information: data that has been contextualized and has had meaningful rules applied.
Here’s a quick example: If I want to understand why a police officer has pulled me over, the data point that my speed was 43 mph does not tell me much. However, the information that I was doing that speed in the middle of town (context) in a 30 mph zone (rule) allows me to quickly analyze the flashing lights and achieve understanding!
Automated, rule-based systems that achieve this data-to-information “wrangling” are central to any sustainable Big Data strategy. They make the wrangling affordable; they make sure it is done soon enough for the analysis to have a meaningful impact on results, and they ensure that everyone’s information is produced to the same recipe — apples to apples, not oranges or bananas.
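The speeding example can be sketched as one such rule-based data-to-information step. This is a minimal illustration only; all names, the record layout, and the speed-limit table are hypothetical, not drawn from any real system:

```python
# Minimal sketch: turning a raw data point into information by
# attaching context and applying a rule. All names and values here
# are hypothetical, chosen only to mirror the speeding example.

from dataclasses import dataclass

@dataclass
class SpeedReading:
    speed_mph: int   # raw data point
    location: str    # context

# Rule: posted speed limits per location (hypothetical values)
SPEED_LIMITS = {"town_center": 30, "highway": 65}

def to_information(reading: SpeedReading) -> dict:
    """Contextualize a raw reading and apply the speed-limit rule."""
    limit = SPEED_LIMITS[reading.location]
    return {
        "speed_mph": reading.speed_mph,
        "location": reading.location,
        "limit_mph": limit,
        "speeding": reading.speed_mph > limit,
    }

# 43 mph in a 30 mph town-center zone: the rule makes the data meaningful.
info = to_information(SpeedReading(speed_mph=43, location="town_center"))
```

Because the same rule table is applied to every reading, every consumer of the output gets information produced to the same recipe — the apples-to-apples consistency the text describes.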
If you are investing a lot of money in systems to produce piles of data, but don’t seem to be getting a lot of information, ask your team what they are doing to automate the data-wrangling!