Data Wrangling: Trivia to Information

What’s the difference between data and information? I tend to think of data as facts that are true but trivial, while information is a fact on which I can base a decision. The problem is that this distinction is often in the eye of the beholder. There are some facts we can all agree are just data: I learn the fact, and my world does not change one jot.

A great example of this type of fact is the cascade of faults that many modern machines report whenever anything goes wrong. Did all these faults happen? Yes. Did they all cause the machine to stop? No. Just one did. Which one? I don’t know. So, trivial. If it is my job to fix the machine, knowing which fault caused the malfunction, and in what context, is information. But (and here’s where perspective comes in) to my colleagues in the quality department, that same fact hardly rates as information at all.

Context is essential to turning a fact into information. Let’s think of the fact itself as what, as in “What happened?” Everyone gets the importance of also knowing when, which is why so many big data historians consist of huge collections of what-when facts. What about their friends: where, what (as in “What did it happen to?”), and who? Establishing what, when, where, what again, and who, so that we can answer why and how, is the essence of this emerging profession. Data Wrangling: smart people taking trivial facts and piecing them together until they yield information that can be used to make decisions.
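To make the “W” questions concrete, here is a minimal sketch, in Python, of a single fact tagged with all of them, next to the bare what-when pair a historian might store. The field names (`what`, `when`, `where`, `subject`, `who`) and every value are hypothetical; `subject` stands in for the second what, “what it happened to”, so the two whats don’t collide.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Fact:
    """One fact tagged with the "W" questions (hypothetical schema)."""
    what: str        # what happened, e.g. a fault code
    when: datetime   # when it happened
    where: str       # where it happened, e.g. a production line
    subject: str     # what it happened to, e.g. a specific machine
    who: str         # who was involved, e.g. the shift on duty


# A bare what-when pair: true, but on its own mostly trivia.
bare = ("E042", datetime(2024, 3, 1, 14, 5, 12))

# The same fact with the rest of its context attached.
fact = Fact(
    what="E042",
    when=datetime(2024, 3, 1, 14, 5, 12),
    where="Line 3",
    subject="Press 7",
    who="Shift B",
)
print(fact)
```

The extra fields cost little to record at the moment the fact is captured, and they are hard to reconstruct later.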

I confess I have a problem with Data Wrangling. It’s got a cool name, and it sounds about as ruggedly adventurous as typing at a computer can get. It’s also genuinely exciting to force a pile of trivial data to yield its hidden secrets, but it’s enormously labor-intensive. And its answers too often turn up after the battle is over.

So, to the extent possible, Data Wrangling needs to be automated. Computers can tag each fact with the answers to all the “W” questions. Computers can apply rules to filter and combine these facts to create information. Then, and only then, should you spend expensive human brainpower on answering why and how.
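One way to picture those two automated steps is the Python sketch below. It tags raw (fault code, timestamp) pairs with context, then applies a single rule to a fault cascade like the one described earlier. The rule is a hypothetical assumption for illustration, a simple “first-out” heuristic: among the faults a given machine logged in a short window before it stopped, treat the earliest as the candidate cause. The function names, the 30-second window, and all codes and times are made up.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Fact:
    what: str        # what happened (fault code)
    when: datetime   # when it happened
    where: str       # where it happened
    subject: str     # what it happened to
    who: str         # who was involved


def tag(raw, where, subject, who):
    """Step 1: attach context to raw (code, timestamp) pairs."""
    return [Fact(code, ts, where, subject, who) for code, ts in raw]


def candidate_cause(facts, subject, stop_time, window=timedelta(seconds=30)):
    """Step 2 (hypothetical first-out rule): of the faults logged for
    `subject` within `window` before `stop_time`, return the earliest,
    or None if there are none."""
    in_window = [
        f for f in facts
        if f.subject == subject and stop_time - window <= f.when <= stop_time
    ]
    return min(in_window, key=lambda f: f.when, default=None)


raw = [
    ("E042", datetime(2024, 3, 1, 14, 5, 12)),
    ("E017", datetime(2024, 3, 1, 14, 5, 10)),
    ("E099", datetime(2024, 3, 1, 14, 5, 13)),
    ("E003", datetime(2024, 3, 1, 13, 0, 0)),   # too old to matter
]
facts = tag(raw, "Line 3", "Press 7", "Shift B")
stop = datetime(2024, 3, 1, 14, 5, 15)
cause = candidate_cause(facts, "Press 7", stop)
print(cause.what, cause.subject)  # E017 Press 7
```

The rule only narrows four faults to one candidate; working out why E017 happened, and how to stop it happening again, is still the job for a person.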