We have been thinking about data visualization, data storage, and data analysis lately. Here are a few interesting ideas we are tracking, followed by a short guide for working with data:
The Economist recently ran an article on “data”: the vast amounts of data, the surfeit of data, the potential of data, the headaches of data. The Economist compared the shifts brought about by the speed and quantity of data generation and data exhaust (for individuals and societies), in scale and importance, to the changes brought about by the industrial revolution.
“Wal-Mart, a retail giant, handles more than 1m customer transactions every hour, feeding databases estimated at more than 2.5 petabytes—the equivalent of 167 times the books in America’s Library of Congress (see article for an explanation of how data are quantified). Facebook, a social-networking website, is home to 40 billion photos. And decoding the human genome involves analysing 3 billion base pairs—which took ten years the first time it was done, in 2003, but can now be achieved in one week.
…‘What we are seeing is the ability to have economies form around the data, and that to me is the big change at a societal and even macroeconomic level,’ says Craig Mundie, head of research and strategy at Microsoft. Data are becoming the new raw material of business: an economic input almost on a par with capital and labour. ‘Every day I wake up and ask, “how can I flow data better, manage data better, analyse data better?”’ says Rollin Ford, the CIO of Wal-Mart.”
Wired’s Chris Anderson argues that the massive quantities of data available make “scientific theory obsolete.” In The End of Theory, Anderson writes:
At the petabyte scale, information is not a matter of simple three- and four-dimensional taxonomy and order but of dimensionally agnostic statistics. It calls for an entirely different approach, one that requires us to lose the tether of data as something that can be visualized in its totality. It forces us to view data mathematically first and establish a context for it later. For instance, Google conquered the advertising world with nothing more than applied mathematics. It didn’t pretend to know anything about the culture and conventions of advertising — it just assumed that better data, with better analytical tools, would win the day. And Google was right.
Allison Walker’s FastCompany post, Mama, Don’t Let Your Babies Grow Up to Be Infographics, introduced us to a gorgeous video by Motion Theory. While she (gently) mocks IBM’s commercial, the data floating from that baby is very relevant to the work done in data collection and analysis at Health Plans. (It also reminds us of Harvard professor John Palfrey’s work; see especially Born Digital.)
Even if we don’t have the math chops to analyze megabytes of data, we can adhere to some principles in analysis. The following is summarized from a posting by Nathan Yau on his terrific blog Flowing Data. This is what you have to do, he says:
- Pay Attention to the Details: “The point is that trends and patterns are important, but so are outliers, missing data points, and inconsistencies.”
- See the Big Picture: This means, don’t get so caught up in the details that you miss the point.
- No Agendas: “This should go without saying, but approach data as objectively as possible. I’m not saying you shouldn’t have a hunch about what you’re looking for, but don’t let your preconceived ideas influence the results. Because if you go to length looking for some specific pattern, you’re probably going to find it. It’ll just be at the sacrifice of accurate results.”
- Look Outside the Data: “Context, context, context. Sometimes this will come in the form of metadata. Other times it’ll come from more data. The more you know about how the data was collected, where it came from, when it happened, and what was going on at the time, the more informative your results and the more confident you can be about your findings.”
- Ask Why: “Finally, and this is the most important thing I’ve learned, always ask why. When you see a blip in a graph, you should wonder why it’s there. If you find some correlation, you should think about whether or not it makes any sense. If it does make sense, then cool, but if not, dig deeper.”
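
For readers who like to see a principle in action, here is a rough sketch of what “pay attention to the details” might look like in a few lines of Python. It counts missing data points and flags possible outliers. The function name `summarize_details`, the sample readings, and the 1.5 × interquartile-range cutoff are our own assumptions for illustration (the cutoff is a common rule of thumb, not something taken from Yau’s post).

```python
from statistics import quantiles


def summarize_details(values):
    """Count missing points (None) and flag values outside 1.5 x IQR.

    Hypothetical helper for illustration only; the 1.5 x IQR cutoff is a
    common convention, not a rule from the original post.
    """
    present = [v for v in values if v is not None]
    missing = len(values) - len(present)

    # Quartiles of the values we actually have.
    q1, _, q3 = quantiles(present, n=4)
    spread = q3 - q1
    low, high = q1 - 1.5 * spread, q3 + 1.5 * spread

    outliers = [v for v in present if v < low or v > high]
    return {"missing": missing, "outliers": outliers}


# Made-up readings: two gaps and one suspicious spike.
readings = [12, 14, None, 13, 15, 98, 14, None, 13]
report = summarize_details(readings)
print(report)
```

A flagged value is not necessarily an error. In the spirit of “Ask Why,” it is a prompt to go back to how the data was collected before deciding whether to keep it.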