Diving Into Hadoop
At the recent Publisher Forum in Palm Springs, Disney's Karl Reece led an amazing breakout session on a topic I've been quite curious about for the last few months: Hadoop. If you attended the PubForum, you can find the session page here: http://www.admonsters.com/session/member-breakout-session-big-data-syste... You've probably heard of the big data crunching software (maybe you've even seen the famed yellow elephant), but you might not be sure exactly what it does or how it could benefit the data operations within your company.
If that's where you're at, you might want to start with these articles Karl suggested.
http://gigaom.com/cloud/what-it-really-means-when-someone-says-hadoop/
http://gigaom.com/cloud/microsofts-hadoop-play-is-shaping-up-and-it-incl...
http://gigaom.com/2012/03/03/hadoop-jumps-through-hoops-becomes-mainstream/
From the first piece, GigaOm's Derrick Harris explains: "Hadoop is, at its core, an Apache Software Foundation project consisting of two primary subprojects — Hadoop MapReduce and the Hadoop Distributed File System. MapReduce is the parallel-processing engine that allows Hadoop to churn through large data sets in relatively short order. HDFS is the distributed file system that lets Hadoop scale across commodity servers and, importantly, store data on the compute nodes in order to boost performance (and potentially save money)."
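To make the MapReduce idea in that quote a little more concrete, here is a minimal single-machine sketch in plain Python. It is not Hadoop code and uses no Hadoop API; the log format, the sample lines and the function names are all assumptions made up for illustration. It only shows the three conceptual steps (map, group by key, reduce) that Hadoop runs in parallel across many servers.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

# Hypothetical ad log lines in the form "<timestamp> <ad_id>" (illustration only).
LOG_LINES = [
    "2012-07-10T00:00:01 ad_1",
    "2012-07-10T00:00:02 ad_2",
    "2012-07-10T00:00:03 ad_1",
]


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit one (ad_id, 1) pair per log line."""
    _timestamp, ad_id = line.split()
    yield ad_id, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group step: collect all values that share a key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return dict(groups)


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce step: collapse the values for one key into a total."""
    return key, sum(values)


def run_job(lines: Iterable[str]) -> Dict[str, int]:
    """Run map, shuffle and reduce in sequence on one machine."""
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


impression_counts = run_job(LOG_LINES)
print(impression_counts)
```

In a real cluster the map and reduce steps would run on many nodes at once, with HDFS holding the input on those same nodes; the sketch only mirrors the logical flow.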
Following up on the breakout session, we wanted to start a forum thread on Hadoop and big data processing tools in general to find out who is working with them, the benefits they've seen, the ways in which they've experimented and directions they hope to push toward, as well as the advice they give others heading down this road.
In addition, feel free to submit any questions about Hadoop, big data and other relevant topics. Big data is still a brave new world for media companies and we can all help each other decipher how the tools and processes fit into ad operations.









Comments
At Medialets (http://www.medialets.com/), we have been running Hadoop since April 2010 to process and aggregate all of our batch analytic ad and app data.
I just wrote up my experience with the latest distributions from MapR and Cloudera: http://allthingshadoop.com/2012/07/10/hadoop-distribution-bake-off-my-ex...
/*
Joe Stein
Chief Architect
joe.stein@medialets.com
*/
Hi all,
This is definitely a topic that piques my interest! TWC implemented Hadoop back in late 2009 during our ad server migration project to address some deficiencies in DFP reporting. We were spoiled with OAS enterprise, since we had direct access to all targeting parameters. I'm sure many of you suffer from the infamous "un-targeted keyword" reporting gap that exists within DFP.
When we migrated, we implemented a "surrogate key" that we pushed into the "u=" parameter in our DFP ad calls. We simultaneously implemented tracking beacons that are trapped by our internal infrastructure and ingested into Hadoop. On a nightly basis, we pull down our DFP raw logs and merge the two data sets using Pig, the data-flow scripting language that runs on Hadoop. The result is a complete picture of both the ad request and the response.
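For anyone trying to picture that nightly merge, here is a rough sketch of the join logic in plain Python rather than Pig. It is not our production script: the field names (other than the "u" key), the sample records and the function name are hypothetical, and it assumes each surrogate key appears at most once in the beacon data.

```python
from typing import Dict, List

# Hypothetical rows parsed from the ad server raw logs (the response side).
ad_server_log: List[Dict[str, str]] = [
    {"u": "k1", "line_item": "LI-100"},
    {"u": "k2", "line_item": "LI-200"},
]

# Hypothetical rows parsed from the tracking beacons (the request side).
beacons: List[Dict[str, str]] = [
    {"u": "k1", "keywords": "sports,weather"},
    {"u": "k3", "keywords": "travel"},
]


def join_on_surrogate_key(
    log_rows: List[Dict[str, str]],
    beacon_rows: List[Dict[str, str]],
    key: str = "u",
) -> List[Dict[str, str]]:
    """Inner-join the two data sets on the shared surrogate key."""
    # Index the beacons by key so each log row is matched with one lookup.
    beacons_by_key = {row[key]: row for row in beacon_rows}
    joined = []
    for row in log_rows:
        beacon = beacons_by_key.get(row[key])
        if beacon is not None:
            # Combine request-side and response-side fields into one record.
            joined.append({**beacon, **row})
    return joined


merged = join_on_surrogate_key(ad_server_log, beacons)
print(merged)
```

Rows without a match on the other side are dropped here; whether to keep them (an outer join) depends on what the downstream reports need.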
This data has allowed us to capture metrics that would otherwise be impossible, like RPM. We're also able to provide our Pricing and Inventory team with the level of keyword reporting that they were accustomed to receiving from OAS. That said, we're still scratching the surface of what's possible with the technology. Ultimately, we're aiming to provide a UI layer on top of Hadoop that leverages Hive and HBase to give our internal analysts direct access to the mountain of data we produce on a daily basis.
Hadoop is absolutely a transformative technology. It has quickly become a fundamental component of our enterprise data warehouse, and in many other organizations it is a core component of their delivery architecture. Just as spreadsheets and then databases successively revolutionized our ability to leverage moderately sized data sets to solve real-world business problems, Hadoop meets the next-level demand for enormous data processing at scale.
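As a quick illustration of the RPM arithmetic, assuming RPM here means revenue per thousand impressions (the usual ad ops reading, though definitions vary), a calculation over the merged data might look like the sketch below. The function name and the sample numbers are made up for the example.

```python
def rpm(revenue: float, impressions: int) -> float:
    """Revenue per thousand impressions; returns 0.0 when there are no impressions."""
    if impressions == 0:
        return 0.0
    return 1000.0 * revenue / impressions


# Hypothetical figures: $50.00 earned across 20,000 impressions.
example_rpm = rpm(50.0, 20000)
print(example_rpm)
```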
Personally, I'd love to hear how the rest of you are using Hadoop to further your business goals. I was hooked back in '08 when I read Google's MapReduce white paper. Feel free to reach out to me directly if you're interested in sharing or, better yet, if you're interested in participating in this evolution with us. :)
Thanks,
Ben Garrett
bgarrett@weather.com