How Sparse Data Can Drive Information Density

Date: 05/07/2014

Company: FirstFuel Software

Source: Datanami

By Alex Woodie:

There is a lot going on in the world of big data today–at times, enough to make your head spin. But instead of trying to pry an advantage from every big data source, you might get better results by concentrating your analytical energies on rather thin data streams that nevertheless carry lots of weight.

All things being equal, more data is better. You can drive down uncertainties by analyzing entire datasets instead of just sampling. You can create new insights by combining datasets in creative ways. With Hadoop and in-memory data grids and stream processing and machine learning algorithms and unlimited compute and storage (Hello, AWS), the possibility of becoming all-knowing Big Data Gods at times seems almost within our grasp (Hello, NSA).

But of course, all things aren’t equal. For starters, not all data is created equal. At best, it can be dirty or incomplete; at worst, totally misleading. Like it or not, our analytical ambitions are still bound by limitations in the hardware, the software, and the people who build the applications (not to mention budgets). While processing, storage, and network costs have dropped significantly, and algorithms and artificial intelligence at times seem poised to take over, these technologies still limit how we perceive the world.

Now that your big data bubble is popped, you can start assembling, from scratch, a data analytic solution worth keeping. Of course, you’ll begin with your best and most valuable data source. For Badri Raghavan, CTO at energy analytic solutions provider FirstFuel, that means doing a relatively simple task: reading the meter.

“On the face of it, it’s extremely sparse data,” Raghavan says of the usage data FirstFuel pulls from energy meters, either for electricity or gas. “But it has enormous hidden information and insight if you have the right tools to exploit it.”

