Technofeel Things that keep me up late

29 Aug 2009

VLDB09 Overview, part 2

I'm back in Paris after four awesome days in Lyon, where I attended the Very Large Database Conference. Overall this was a very interesting event, with both very low-level technical talks and architectural presentations. The event was really well organized, as I partially mentioned in my overview of the first two days, but there is still room for improvement.

The average quality of the selected papers was pretty good, but I personally thought that there were still too many non-innovative papers and badly prepared speakers. Come on, guys, you've got the chance to expose your work to an invaluable audience at a prestigious conference, the least you can do is rehearse enough that you don't have to skip your last 10 slides... Also, another detail: how the hell can you host 700 people in France and serve lunch without dessert? I may be a bit greedy, but I wasn't the only disappointed person. That's it for the traditional French whine part. (But there was good wine!)

So the last two days were full of interesting sessions in various domains. Wednesday morning began with a keynote about how database technologies help improve the performance of game and simulation engines. For a keynote, I would have preferred a more database-focused talk, because in the end it was just about having each tick of the game act in a map / reduce fashion to compute all changes together, plus some tree usage here and there, nothing very funky (but there were some great videos of old video games to compensate... :) ).
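To make the "each tick as map / reduce" idea a bit more concrete, here is a minimal Python sketch. Everything in it is made up for illustration (the entity model, the `propose_effects` and `run_tick` names, the damage rule); the keynote did not show this code, it is only how I picture the pattern.

```python
from collections import defaultdict

# Hypothetical game state: entity id -> (position, health).
def propose_effects(entity_id, state):
    """Map step: read the frozen state and emit (target, damage) pairs."""
    pos, _health = state[entity_id]
    for other_id, (other_pos, _) in state.items():
        # Made-up rule: every entity hits neighbours within distance 1.
        if other_id != entity_id and abs(other_pos - pos) <= 1:
            yield other_id, 10

def run_tick(state):
    """One tick: map over all entities, then reduce the effects per target."""
    damage = defaultdict(int)
    for entity_id in state:  # map phase, independent per entity
        for target, amount in propose_effects(entity_id, state):
            damage[target] += amount  # reduce phase: sum per key
    # All changes are applied together, so update order cannot matter.
    return {eid: (pos, health - damage[eid])
            for eid, (pos, health) in state.items()}

state = {"a": (0, 100), "b": (1, 100), "c": (5, 100)}
new_state = run_tick(state)
```

The point of the pattern is that every entity reads the same snapshot of the previous tick, so the map phase could be spread over several cores without locking.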

The afternoon started with a great session called Map/Reduce with three presentations:

  • SQL/MapReduce: A practical approach to self-describing, polymorphic, and parallelizable user-defined functions: The talk was about a commercial product, AsterDB, and its alternative approach to Hadoop / Pig. The idea is that statisticians and other people interested in processing data prefer to write SQL queries instead of coding map / reduce functions, which are usually so specialized that they are not reusable. They answer this concern by providing an easily extensible SQL language, so you can get around the current limits of SQL by writing a few lines of Java for example, with the whole thing then being parallelized.
  • Building a High-Level Dataflow System on top of MapReduce: The Pig Experience: Really interesting presentation by a Yahoo! engineer, one of the creators of Pig, the famous high-level language for expressing data analysis programs on top of Hadoop. The focus was on how Pig is designed and implemented to actually transform your logical instructions into physically distributed jobs and stages. Pretty neat.
  • PLANET: Massively Parallel Learning of Tree Ensembles with MapReduce: Highly technical talk from a Google fellow describing "a scalable distributed framework for learning tree models over large datasets. PLANET defines tree learning as a series of distributed computations, and implements each one using the MapReduce model of distributed computation." A great breakthrough in a domain where lots of state-of-the-art learning algorithms are designed for a single machine.
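To give a feel for what "tree learning as a series of distributed computations" can mean, here is a minimal Python sketch of choosing one split of a regression tree in a map / reduce style. This is my own simplification, not PLANET's actual algorithm: the function names, the single numeric feature, the fixed candidate thresholds and the toy data are all assumptions.

```python
from collections import Counter

def map_partition(rows, thresholds):
    """Map step: per candidate threshold, emit left-side count and label sum."""
    stats = Counter()
    for x, y in rows:
        stats[("total", "n")] += 1
        stats[("total", "sum")] += y
        for t in thresholds:
            if x < t:
                stats[(t, "n")] += 1
                stats[(t, "sum")] += y
    return stats

def reduce_stats(partials):
    """Reduce step: add up the partial statistics from every partition."""
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds values key by key
    return merged

def best_split(stats, thresholds):
    """Controller: pick the threshold with the lowest squared error.

    Minimizing squared error is the same as maximizing
    sum_left^2 / n_left + sum_right^2 / n_right.
    """
    n, s = stats[("total", "n")], stats[("total", "sum")]

    def score(t):
        n_l, s_l = stats[(t, "n")], stats[(t, "sum")]
        n_r, s_r = n - n_l, s - s_l
        if n_l == 0 or n_r == 0:
            return float("-inf")  # a split with an empty side is useless
        return s_l * s_l / n_l + s_r * s_r / n_r

    return max(thresholds, key=score)

# Toy data as (feature, label) pairs, spread over two "machines".
partitions = [[(1, 1.0), (2, 1.0)], [(8, 5.0), (9, 5.0)]]
thresholds = [1.5, 5, 8.5]
merged = reduce_stats(map_partition(p, thresholds) for p in partitions)
split = best_split(merged, thresholds)
```

Only small per-threshold statistics travel between machines, never the rows themselves, which is what makes this kind of computation fit the MapReduce model.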

The day ended with a pleasant discussion panel on "How Best to build Web Scale Data Managers?". Lots of relevant questions and answers that pointed to the disappointing lack of an open source parallel database. Hopefully the next decade will solve that. (Edit: I might not have understood some of the subtle sarcasm of one of the speakers, so let's remove the criticism.)

Thursday was even more interesting, starting with the "10-year award keynote", rewarding the most influential paper of the last decade. The award went to the MonetDB team for their paper "Database Architecture Optimized for the New Bottleneck: Memory Access", which describes a novel approach to database storage (column storage, well known nowadays) and a join operation optimized for the CPU cache, thus avoiding the overhead of memory access, and how the column-oriented engine allows for [...]. They also talked about VectorWise and its enhanced query computation pipeline performing batch hit operations. Well done guys, it's pretty impressive work!
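For readers who have never looked at column storage, here is a minimal Python sketch of the layout difference. The table, its column names and its values are invented for the example, and Python will not show the cache effects the paper measures; the sketch only illustrates which data a single-column query has to touch.

```python
from array import array

# Row layout: one tuple per record, as (id, age, salary). Toy values.
rows = [(1, 25, 3000.0), (2, 41, 5200.0), (3, 33, 4100.0)]

# Column layout: one contiguous typed array per attribute.
columns = {
    "id": array("q", (r[0] for r in rows)),
    "age": array("q", (r[1] for r in rows)),
    "salary": array("d", (r[2] for r in rows)),
}

def avg_salary_row_store(rows):
    # Every full tuple is visited, even though only one field is needed.
    return sum(r[2] for r in rows) / len(rows)

def avg_salary_column_store(columns):
    # Only the salary array is scanned: sequential, densely packed values.
    col = columns["salary"]
    return sum(col) / len(col)

row_avg = avg_salary_row_store(rows)
col_avg = avg_salary_column_store(columns)
```

In a real engine the second version reads far fewer cache lines for the same answer, because the unused `id` and `age` values are never pulled through the memory hierarchy.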

After a little coffee I attended a really great talk given by a postdoc from ETH Zurich (I don't know how many of these guys came to the conference, but there were a lot!) about "Data Processing on FPGAs". I didn't know much about these reprogrammable chips before, but the guy made them look pretty cool. The presentation described how you can achieve faster, parallelized sorts and actually save money thanks to the incredibly low power consumption of these chips (8W vs 102W for a CPU). (And the guy was well prepared and gave a dynamic and fun presentation!)
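One classic way to sort in hardware is a sorting network: a fixed wiring of compare-and-swap units, where the units of one stage work on different wires and can therefore all fire at once. I am assuming that this is the kind of circuit meant here; the 4-input network below is a textbook one simulated in Python, not code from the talk.

```python
from itertools import permutations

# Comparators of a 4-input sorting network, grouped into stages.
# Comparators inside a stage touch disjoint wires, so on a chip they
# could all run in the same clock cycle.
STAGES = [[(0, 1), (2, 3)], [(0, 2), (1, 3)], [(1, 2)]]

def sort4(values):
    """Simulate the network sequentially on a list of four values."""
    wires = list(values)
    for stage in STAGES:
        for i, j in stage:
            if wires[i] > wires[j]:  # compare-and-swap unit
                wires[i], wires[j] = wires[j], wires[i]
    return wires

# Exhaustive check over all 24 input orders.
all_ok = all(sort4(p) == [0, 1, 2, 3] for p in permutations(range(4)))
```

The sequence of comparisons never depends on the data, which is exactly what makes it easy to lay out as a circuit: three stages for four inputs, regardless of their order.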

Another trendy topic followed: HadoopDB (yeah, there were Hadoop and map reduce all over the place). HadoopDB is "an architectural hybrid of MapReduce and DBMS technologies for analytical workloads", a work done by a few students at Yale University. Their approach is to reconcile the two elephants (Postgres and Hadoop both have elephants as logos), put a modified version of Hive on top of that (Hive is a SQL-to-MapReduce job interface) and use the processing power of the databases to actually do what they are meant to do, in a distributed fashion. It's an interesting track even if it feels a bit like a big hack on top of hacks. And erm, I can't say much about the end of the presentation, we didn't get to see the last 10 slides or so... :/
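Here is a minimal Python sketch of the general idea as I understood it: each worker pushes as much of the query as possible into its local database, and a reduce step merges the partial results. In-memory SQLite stands in for the per-node Postgres, plain function calls stand in for Hadoop, and the `sales` table, the function names and the data are all invented for the example.

```python
import sqlite3
from collections import defaultdict

def make_node(rows):
    """Stand-in for one worker's local database (in-memory SQLite here)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE sales (country TEXT, amount REAL)")
    db.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return db

def map_on_node(db):
    """Map step: push the aggregation down into the node's SQL engine."""
    query = "SELECT country, SUM(amount) FROM sales GROUP BY country"
    return db.execute(query).fetchall()

def reduce_partials(partials):
    """Reduce step: merge the per-node subtotals into global totals."""
    totals = defaultdict(float)
    for partial in partials:
        for country, subtotal in partial:
            totals[country] += subtotal
    return dict(totals)

nodes = [
    make_node([("FR", 10.0), ("US", 5.0)]),
    make_node([("FR", 2.5), ("SG", 7.0)]),
]
totals = reduce_partials(map_on_node(db) for db in nodes)
```

The database does the filtering and grouping it is good at, and the MapReduce layer only has to move small partial aggregates around.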

Finally, I ended my visit at VLDB09 with two presentations by Google interns about data mining to get structured result sets out of semi-structured pages with lists and tables. The papers are really neat and worth a look:

  • Harvesting Relational Tables from Lists on the Web: A method to detect fields in unstructured lists and to successfully align them into a coherent table.
  • Data Integration for the Relational Web: A description of a search engine that creates clusters of related tables on the web, ranks them and is able to create a single coherent table from the similar fields, and to actually extend the table on demand with new fields that may not have been directly included in the original tables. A really interesting approach that sounds like Google Squared :)
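To illustrate the "detect fields, then align them" step of the first paper, here is a deliberately naive Python sketch. It only splits on a few punctuation delimiters and pads short rows on the right; the real method is much smarter than this, and the function names and sample list are my own.

```python
import re

# Assumed delimiters: commas, semicolons, or a dash surrounded by spaces.
DELIMITERS = re.compile(r"\s*[,;]\s*|\s+-\s+")

def split_fields(line):
    """Naive field detection: cut the list item on the delimiters."""
    return [f.strip() for f in DELIMITERS.split(line) if f.strip()]

def align(lines):
    """Naive alignment: pad short rows so every row has the same width."""
    rows = [split_fields(line) for line in lines]
    width = max(len(row) for row in rows)
    return [row + [""] * (width - len(row)) for row in rows]

items = [
    "Paris, France, Europe",
    "Lyon - France",
    "Singapore; Singapore; Asia",
]
table = align(items)
```

The hard part, which this sketch skips entirely, is deciding where the missing field of a short row belongs when it is not simply the last one.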

Alright, I think I'm done with this dammnnnn long post that you probably haven't read to the end. I don't care, it's also a way for me to keep track of all this interesting stuff :)

VLDB guys, congratulations and see you next year in Singapore!

Comments (2) Trackbacks (0)
  1. - The guy was Samuel Madden and he is widely considered somewhat of a genius, in the database world at least. He was being comical & highly sarcastic, it may have been too subtle for some.

  2. Thanks for the notice, I may not have understood some of the English sarcasm here, my bad.

    - Jeremie



About…

Hi! I'm Jérémie, a Frenchman passionate about information retrieval, natural language processing, distributed computing, innovative web interfaces, entrepreneurship and wakeboarding!

I work for Exalead where I lead a little team on innovative challenges to rock'n'roll enterprise search.
