

Sometimes we filter traffic quality from the reports…but sometimes they want to do analyses of traffic quality. “While we collect a tremendous amount of data on a daily basis, we don’t necessarily find all that data valuable on a given day,” he says.

The hunt for a SQL on Hadoop solution would not replace its Vertica MPP database, but instead would make it easier to access less commonly used data, Theisinger says. “We can just simply use to query the data and be able to get the results…pretty instantaneously, if not within minutes or an hour.” SQL Hadoop Showdown “If we release a product on any of our consumer sites or any apps, we can collect feedback on how that product feature is performing very quickly without having to build an architecture around it or instrument anything,” he says. This solution is focused on YP’s internal decision making, not its advertising business. This would enable the company to do more processing of the user behavior data as it sits on HDFS before moving the best summary data to the separate Vertica cluster for reporting and ad hoc analytics. The company is looking to SQL on Hadoop tools to give it a processing edge within its 5 PB cluster, which is based on Cloudera‘s CDH5. We’re seeing that it’s a little too slow in some regards for what we’re trying to do.” “I think that MapReduce is a little bit slow and kludgey. “I’m very interested in trying to move toward something that doesn’t look like MapReduce,” Theisinger tells Datanami. The company wants to move beyond MapReduce to something that’s faster, says Bill Theisinger, vice president of engineering for YP’s Platform Data Services group. YP currently uses MapReduce-based ETL routines to hammer that data into something more manageable and suitable for its data warehouse, which is based on Vertica, the massively parallel processing (MPP) database from Hewlett-Packard.


Because of the scale of the system-it captures more than 3.5 billion events per day-the company uses Hadoop as a staging ground. In addition to directory look-ups and advertising, the company serves local maps, hosts a cheap gas finder, and serves information about local restaurants and movies.Īs you might expect, YP captures data from those 70 million sessions and uses analytics to optimize the experience of visitors to its Web and mobile properties. You could find them in every household and in public phone booths (remember those?) Today, most directory services have migrated to the Web, and YP Holdings, the parent company of Yellow Pages, has kept up with the times.Įvery month YP Holdings attracts about 70 million unique users to its YP.com website or its mobile apps. It wasn’t long ago that the yellow pages were the go-to directory for Americans. So when the Georgia-based local marketing firm set out to find a suitable SQL engine to deliver real-time analytics atop its 1,000-node Hadoop cluster, performance at scale was a prime directive. Like many national companies, scale is important for Yellow Pages, or YP as the company is now known.
