François Bourdoncle, Exalead® CSO and Co-founder

François Bourdoncle, one of the pioneers of the search software market, co-founded Exalead® in 2000 to revolutionize the world of enterprise search. Drawing on his experience in search technology, software development and engineering, he played a key role in the AltaVista LiveTopics project while directing the Computer Science Research Laboratory at Ecole des Mines de Paris. He also worked as a senior researcher at the Digital Research Laboratory in Paris and at Digital’s Systems Research Center in Palo Alto, California. François Bourdoncle holds a PhD in Computer Science from Ecole Polytechnique in Paris and is an author and speaker who is frequently invited to participate in major industry events.

Exalead® is a global software provider in the enterprise and Web search markets, and the maker of CloudView, the industry’s top platform for Search-Based Applications (SBAs). Exalead’s clients include leading companies such as AFP, American Greetings, Steelcase, Yellow Pages Group, and Friendster. Every month, 100 million users rely on Exalead for unified information presentation, reporting and search.

What do you consider to be the most significant challenge facing the Web industry?

What Web search obstacles have yet to be cleared?

I consider the rise of the mobile Internet to be one of today’s most significant challenges, yet it is a topic we hear too little about. As users increasingly access services through devices running platforms such as iOS and Android, to name just the two most prominent, Web technologies begin to lose their importance, and the Web itself begins to lose its universal character. There is a real risk that, bit by bit, users will become the captive property of the services they consume, and that the standards and universal access that have defined the Web to date will disintegrate. To stave off this tidal wave, we need more effective management from the Web’s governing bodies. The decision-making process needs to be accelerated, and new norms and standards forged to permit the rapid development of innovative new services. Without such an acceleration, there is a serious risk that the Internet will be appropriated by a small number of “landowners” hidden, Minitel-style, behind the applications of the Internet giants. We could see the utopia of a universal Web rapidly disappear, or even see the Web transformed into a “giant archive”, with the “living Web” subsumed by the mobile Internet. One has only to look at the battle now being waged over social networking data to be convinced.

Yet maintaining open access to data is critical to developing innovative new services. It is also pivotal to realizing the potential of “Big Data”, a subject much in the news today. The Web is the world’s first and foremost Big Data collection, yet even a single member of the Internet’s potential “landed gentry” could monopolize access to a significant portion of it. That would sharply reduce its potential and might even completely choke off access to this new Eldorado for new entrants, especially startups, posing a huge risk to innovation and raising fears of an inexorable cartelization of the Internet by a handful of companies.

In this context, ensuring user privacy at all levels also takes on a new urgency. The current trend, notably with regard to social networks, is clearly cause for serious concern. Paradoxically, the openness of the Web has reinforced respect for privacy, not the reverse.

Maintaining open access to the Internet’s Big Data can also make possible a new form of Semantic Web, which I would describe as “bottom-up” or “emergent”. The Semantic Web as originally defined is normative (or top-down); that is to say, the producers of content are responsible for structuring the information they publish. In a bottom-up model, the content already present on the Web (blogs, etc.) is taken as is and structured through semantic analysis of the raw text, whether that analysis is purely linguistic, based on machine learning technology that exploits the gigantic corpus already at our disposal, or a combination of the two.

The advantage of this approach is that it enables information to be structured according to the ways in which it is actually used, not according to some a priori idea, formed at the moment of publication, of how it will be used. I call this “the collective intelligence of usage” (*). For example, a blogger writing a post about a new restaurant she tried the evening before would not think to structure her comments, but a restaurant site that aggregates consumer reviews could automatically structure them and combine them with data from other users to form a meaningful knowledge base.
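To make the restaurant example more concrete, here is a deliberately tiny sketch of bottom-up structuring in Python. It is not Exalead’s technology; it is a toy, purely linguistic (keyword-based) stand-in for the semantic analysis described above. The restaurant list (`KNOWN_RESTAURANTS`), the sentiment word list (`LEXICON`) and the function names are hypothetical, chosen only for illustration. Each unstructured post is turned into a (restaurant, score) record, and records from many authors are then aggregated into a small knowledge base.

```python
import re
from collections import defaultdict

# Hypothetical list of restaurant names the aggregating site already knows.
KNOWN_RESTAURANTS = {"chez marcel", "le petit bistro"}

# Hypothetical sentiment lexicon: word -> polarity score.
LEXICON = {"delicious": 1, "great": 1, "friendly": 1,
           "bland": -1, "slow": -1, "rude": -1}


def structure_post(text):
    """Turn one unstructured post into a (restaurant, score) record, or None."""
    lowered = text.lower()
    # Take the first known restaurant mentioned (sorted for determinism).
    restaurant = next(
        (name for name in sorted(KNOWN_RESTAURANTS) if name in lowered), None)
    if restaurant is None:
        return None  # no identifiable restaurant in this post
    words = re.findall(r"[a-z]+", lowered)
    score = sum(LEXICON.get(word, 0) for word in words)
    return restaurant, score


def build_knowledge_base(posts):
    """Aggregate the records extracted from many authors, per restaurant."""
    kb = defaultdict(lambda: {"mentions": 0, "total_score": 0})
    for post in posts:
        record = structure_post(post)
        if record is None:
            continue
        restaurant, score = record
        kb[restaurant]["mentions"] += 1
        kb[restaurant]["total_score"] += score
    return dict(kb)


if __name__ == "__main__":
    posts = [
        "Tried Chez Marcel last night: the duck was delicious and the staff friendly.",
        "Chez Marcel again. Great wine, but service was slow.",
        "Le Petit Bistro was bland, sadly.",
        "Nothing to do with food today.",
    ]
    print(build_knowledge_base(posts))
```

None of the bloggers structured anything; the structure (which restaurant, how positive) is imposed afterwards by the site that uses the data, which is the point of the bottom-up model. A real system would replace the keyword matching with proper linguistic analysis or machine learning.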

This particular type of user-generated Big Data is a tremendous source of innovation. However, the growing privatization of the Internet by a few large corporations poses a serious risk to this wealth of collective knowledge.

Of course, search engines also need open, unbiased access to a maximum amount of public content to continue to deliver their service: providing universal access to information on the Internet. In this regard, the management of semi-structured information (that is to say, all information) by search engines will be an absolutely essential issue over the years to come.

(*) François Bourdoncle, “L’intelligence collective d’usage”, in Technologies de l’information et intelligences collectives, Éditions Lavoisier, 2010, pp. 161-174.

What Exalead advances/innovations will you present?

What are your own priorities today in terms of services, tools or Web applications?

We’re fortunate in that the new R&D resources gained as a result of our acquisition by Dassault Systèmes last June are enabling us to work simultaneously on a number of fronts.

First, the Web crawler that feeds our public WWW search engine, www.exalead.com, gives us the technical capacity to extract and structure large volumes of data on behalf of our clients. In effect, we are progressively building massive warehouses of semi-structured data that help our clients develop ever more innovative and powerful applications. This is the embryo of our emerging Data-as-a-Service (DaaS) offering, which in my opinion is the industry’s “new frontier” in terms of technical innovation. The significant investment by Dassault Systèmes, alongside Orange and Thales, in the Cloud Computing project “Andromède” will give us access to major computing resources that will let us shift our work in this domain into high gear.

Once we have created these huge data warehouses, the next logical step will be to exploit them. To this end, we are working on new, ultra-powerful technologies for managing semi-structured information. These innovations will reconcile and unify what have traditionally been very distinct technologies: relational database management systems, search engines, and Semantic Web technologies. Finally, and naturally, we are working to enhance our CloudView platform with ever greater processing capacity and with new tools that speed the development of search-based applications.

What message would you give businesses to invite them to join you at this conference and its various events?

Even though we serve the corporate market, the Web has always been at the core of Exalead’s concerns. While our public Web search engine, www.exalead.com, is primarily a technology demonstrator, it also serves as an R&D lab in which many of our most creative ideas take root. In addition, it provides a formidable testing ground for our technologies. Our clients can use our product with confidence in even the most demanding environments because they know every feature must pass the “test of the Web” before being released to the market.

Of equal importance is the fact that over one third of our clients are Web businesses operating in B2C markets. It is essential to us that we understand their needs and challenges. They are, in fact, our largest source of innovation because they themselves are continual innovators. We learn from them, develop solutions that enable them to fulfill their vision, and then adapt these innovations for use in the corporate world.

Last but certainly not least, the Web is a very important source of data for us. We use this data to help our customers build or improve their databases so they can develop high quality services at the forefront of innovation.

For all these reasons, WWW2012 is a major event for most companies, which will have a unique opportunity to discover the best technologies, understand the major Web trends, and meet users, developers, Web industry players and the best academics in the field.