WWW2009
Developers Track
Kevin Miller. Will an easy-to-use language attract content experts? |
Abstract: We propose a new web plugin that enables content creators to step easily to the next level of producing compelling web applications from scratch without |
Fernando Moreno-Torres. A new tool to improve the filtering options in advanced searching |
Abstract: We have developed a software application that analyzes an English text in detail and, in a fully automatic way, labels it with linguistic attributes and additional information. |
Tom White and Christophe
Bisciglia. Web data processing
with MapReduce and Hadoop |
Abstract: With massive growth in website traffic, extracting valuable information from clickstreams is a challenge, as existing tools struggle to scale to web-scale data. Apache Hadoop is a system for storing and processing massive amounts of data in parallel on clusters of commodity machines. With Hadoop and MapReduce it becomes feasible to run ad hoc queries over these massive datasets, opening up new possibilities for unearthing insights in web-scale data. |
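To make the MapReduce model concrete, here is a minimal word-count-style sketch over clickstream logs using Hadoop Streaming with Python scripts; the three-column log format is an assumption for illustration, not something the abstract specifies.

    #!/usr/bin/env python
    # mapper.py: emit one (url, 1) pair per clickstream log line.
    # Assumes (hypothetically) lines of the form "timestamp<TAB>user_id<TAB>url".
    import sys

    for line in sys.stdin:
        fields = line.rstrip("\n").split("\t")
        if len(fields) == 3:
            _, _, url = fields
            print("%s\t1" % url)

    #!/usr/bin/env python
    # reducer.py: sum the counts for each url. Hadoop Streaming delivers
    # mapper output sorted by key, so identical urls arrive adjacent.
    import sys

    current_url, count = None, 0
    for line in sys.stdin:
        url, value = line.rstrip("\n").split("\t")
        if url != current_url:
            if current_url is not None:
                print("%s\t%d" % (current_url, count))
            current_url, count = url, 0
        count += int(value)
    if current_url is not None:
        print("%s\t%d" % (current_url, count))

Such a pair would typically be submitted with the Hadoop Streaming jar (whose path varies by release), e.g.: hadoop jar hadoop-streaming.jar -input clicks -output clickcounts -mapper mapper.py -reducer reducer.py -file mapper.py -file reducer.py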
Víctor Torres, Jaime
Delgado, Xavier Maroñas, Silvia Llorente and Marc Gauvin. A
web-based rights management system for developing trusted value networks |
Abstract: We present an innovative architecture that
enables the digital representation of original works and derivatives while
implementing Digital Rights Management (DRM) features. The architecture’s
main focus is on promoting trust within the multimedia content value networks
rather than solely on content access and protection control. The system
combines different features common in DRM systems such as licensing, content
protection, authorization and reporting together with innovative concepts,
such as the linkage of original and derived content and the definition of
potential rights. The transmission of reporting requests across the content
value network combined with the possibility for authors to preserve rights
over derivative works enables the system to distribute income amongst all the
actors involved in different steps of the creation and distribution chain.
The implementation consists of a web application that interacts with several external services, plus a desktop user application used to render protected content. It is currently publicly accessible for evaluation. |
Geetha Manjunath, Thara S, Hitesh Bosamiya,
Santhi Guntupalli, Vinay Kumar and
Ragu Raman G. Creating
Personal Mobile Widgets without Programming |
Abstract: Our goal is to radically simplify the web so that end-users can perform their personal tasks with just a single click. For this, we define a concept called a TaskLet to represent a task-based personal interaction pattern, and we propose a platform for automatically creating, sharing, and executing TaskLets. TaskLets can be created even by a naive web user without programming knowledge, as we use the technique of Programming-By-Demonstration. TaskLets can be deployed on the client, in the cloud, or on a telecom provider's network, enabling intuitive web interaction through widgets and thin mobile browsers, as well as from mobile phones via SMS and voice. Our key innovation is a tool and platform that enable end-users to simplify their personally valuable tasks. We wish to share the proposed tool with both expert web developers and naive users of the WWW to promote a wider developer community for mobile web applications. |
Peter Baumann. A Semantic Web Ready Service Language
for Large-Scale Earth Science Archives |
Abstract: Geo data are classically categorized into vector, raster, and metadata. The last category receives plentiful attention from the Web research community; vector data are considered to some extent; but raster data, which contribute the largest data volumes, have hitherto been neglected. Hence, today service offerings can be consumed in a semantically adequate manner only at the metadata level, while requests addressing the contents of raster sets cannot be posed at all, or only through APIs. In the end, such offerings lack automatic content discovery, service chaining, and flexible retrieval and analysis. |
Sanjay Agrawal, Kaushik Chakrabarti, Surajit Chaudhuri, Venkatesh Ganti, Arnd Konig and Dong Xin. Query Portals |
Abstract: Our goal is to enable users to efficiently
and effectively search the web for informational queries and browse the
content relevant to their queries. We achieve a unique “portal”-like functionality for each query by effectively exploiting structured and unstructured content. We exploit existing structured data to identify and return, per query, a set of highly relevant entities such as people, products, movies, and locations. Further, we return additional information about the retrieved entities, such as categories, refined queries, and web sites that provide detailed information for each entity. The combination of search results and structured data creates a rich set of results for the user to focus on and refine their search. |
Christopher Adams and Tony Abou-Assaleh.
Creating Your Own |
Abstract: Street maps are a key element of local search; they make the connection between the search results and the geography. Adding a map to your website can be done easily using an API from a popular local search provider. However, the lists of restrictions are lengthy, and customization can be costly or impossible. It is possible to create a fully customizable, web-deployed street map without sponsoring the corporate leviathans, at only the cost of your time and your server. Being able to freely style and customize your map is essential; it will distinguish your website from websites with the shrink-wrapped maps that everyone has seen. Using open source software adds to the level of customizability: you will not have to wait two years for the next release and then maybe get the anticipated new feature or bug fix; you can make the change yourself. Using free data rids you of contracts, costly transactions, and hefty startup fees. As an example, we walk through creating a street map for the |
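As a hedged illustration of the open-source route, the sketch below renders one tile with Mapnik's Python bindings; the stylesheet name and bounding box are placeholders, and the underlying free data (e.g. an OpenStreetMap extract) is assumed to be configured in the stylesheet.

    # render_tile.py: render a single 256x256 street-map tile with Mapnik.
    # 'osm_style.xml' is a hypothetical stylesheet pointing at your own free
    # map data; mapnik.Box2d is named Envelope in older Mapnik releases.
    import mapnik

    TILE_SIZE = 256
    m = mapnik.Map(TILE_SIZE, TILE_SIZE)
    mapnik.load_map(m, "osm_style.xml")
    m.zoom_to_box(mapnik.Box2d(-63.58, 44.64, -63.56, 44.66))  # lon/lat bbox
    image = mapnik.Image(TILE_SIZE, TILE_SIZE)
    mapnik.render(m, image)
    image.save("tile.png", "png")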
Marina Buzzi, Maria Claudia Buzzi, Barbara Leporini and
Caterina Senette. Improving interaction via screen reader
using ARIA: an example |
Abstract: An interface that conforms to the W3C ARIA (Accessible Rich Internet Applications) suite would overcome accessibility and usability problems that prevent disabled users from actively contributing to the collaborative growth of knowledge. In a previous phase of our study, we first identified problems of interaction via screen reader with Wikipedia [2], then proposed an ARIA-based modified Wikipedia editing page [1]. At this stage, to evaluate the effectiveness of an ARIA-based formatting toolbar, we focused only on the main content of the editing page for editing/formatting purposes (using roles), not on the navigation and footer sections. |
Christian Bizer, Julius Volz and Georgi Kobilarov. Silk –
A Link Discovery Framework for the Web of Data |
Abstract: The Web of Linked Data is built upon two simple ideas: first, to employ the RDF data model to publish structured data on the Web; second, to set explicit RDF links between data items within different data sources. |
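The second idea can be made concrete in a few lines; the sketch below uses rdflib (our choice for illustration, not part of Silk, which discovers such links) to assert an explicit owl:sameAs link stating that two URIs in different data sources denote the same thing.

    # same_as.py: set an explicit RDF link between two data items.
    from rdflib import Graph, URIRef
    from rdflib.namespace import OWL

    g = Graph()
    g.add((
        URIRef("http://dbpedia.org/resource/Berlin"),
        OWL.sameAs,
        URIRef("http://sws.geonames.org/2950159/"),
    ))
    print(g.serialize(format="turtle"))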
Georgi Kobilarov, Chris Bizer, Sören Auer
and Jens Lehmann. DBpedia – A
Linked Data Hub and Data Source for Web Applications and Enterprises |
Abstract: The DBpedia project provides Linked Data identifiers for currently 2.6 million things and serves a large knowledge base of structured information. DBpedia has developed into the central interlinking hub of the Linking Open Data project; its URIs are used within named entity recognition services such as OpenCalais and annotation services such as Faviki, and the BBC has started using DBpedia as its central semantic backbone. DBpedia's structured data serves as background information in the process of interlinking datasets and provides a rich source of information for application developers. Besides making the DBpedia knowledge base available as Linked Data and RDF dumps, we offer a Lookup Service, which applications can use to discover URIs identifying concepts, and a SPARQL endpoint, from which applications can retrieve data from the DBpedia knowledge base. |
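As a minimal sketch of querying the SPARQL endpoint from an application (standard-library Python only; the label query and LIMIT are illustrative):

    # sparql_query.py: retrieve data from the public DBpedia SPARQL endpoint.
    import json
    import urllib.parse
    import urllib.request

    ENDPOINT = "http://dbpedia.org/sparql"
    QUERY = """
    SELECT ?label WHERE {
      <http://dbpedia.org/resource/Berlin> rdfs:label ?label .
    } LIMIT 5
    """

    url = ENDPOINT + "?" + urllib.parse.urlencode(
        {"query": QUERY, "format": "application/sparql-results+json"}
    )
    with urllib.request.urlopen(url) as response:
        results = json.load(response)
    for binding in results["results"]["bindings"]:
        print(binding["label"]["value"])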
Philippe Poulard.
Intrusive unit testing for Web applications |
Abstract: Several tools have been designed for automating tests on Web applications. They usually drive browsers the same way people do: they click links, fill in forms, press buttons, and check results, such as whether an expected text appears on the page. |
Olivier Rossel.
The Web, Smart and Fast |
Abstract: Many users consider the Web to be literature, yet they cannot spend that much time browsing and reading textual content. |
Sheila Méndez Núñez, Jose Emilio Labra Gayo and Javier De Andrés. Towards a
Semantic Web Environment for XBRL |
Abstract: XBRL is an emerging standard that allows enterprises to present their financial accounting information in an XML format, according to their applicable legislation. In this paper we present a system that offers the possibility of adding semantic annotations, contained in a financial ontology, to XBRL reports. Furthermore, we present a new approach that will apply probabilistic methods to determine the degree of similarity between concepts of different XBRL taxonomies. This approach will allow investors, enterprises, and XBRL users in general to perform comparisons between reports that conform to different taxonomies, even ones belonging to different countries. |
Jun Wang, Xavier Amatriain
and David Garcia Garzon. Combining multi-level
audio descriptors via web identification and aggregation |
Abstract: In this paper, we present the CLAM Aggregator tool. It offers a convenient GUI for combining multi-level audio descriptors. A reliable method embedded in the tool identifies users' local music collections against open data resources. In the context of the CLAM framework and the Annotator application, Aggregator allows users to configure, aggregate, and edit music information ranging from low-level frame scales to segment scales, and further to any metadata from the outside world, such as the Semantic Web. All these steps are designed in a flexible, graphical, user-defined way. |
Ted Drake. The Future of Vertical Search Engines with Yahoo! Boss |
Abstract: While general search engines such as Google, Yahoo!, and Ask dominate the search industry, there is a new batch of vertical, niche search engines that could fundamentally change the behavior of search. These search engines are built on the open APIs of Yahoo, Google, and other major players. However, Yahoo's recently released BOSS API has made these engines more powerful, more specialized, and easier to build and maintain. |
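A hedged sketch of a BOSS web-search call: the endpoint shape and response fields follow the v1 API as we recall it and should be treated as assumptions, and YOUR_APP_ID is a placeholder.

    # boss_search.py: query the Yahoo! BOSS web search API (v1).
    import json
    import urllib.parse
    import urllib.request

    APP_ID = "YOUR_APP_ID"  # placeholder; issued by Yahoo!
    query = urllib.parse.quote("vertical search engines")
    url = ("http://boss.yahooapis.com/ysearch/web/v1/%s?appid=%s&format=json"
           % (query, APP_ID))
    with urllib.request.urlopen(url) as response:
        data = json.load(response)
    for result in data["ysearchresponse"]["resultset_web"]:
        print(result["title"], result["url"])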
Julio
Camarero and Carlos A. Iglesias. A REST Architecture for Social Disaster Management |
Abstract: This article presents a social approach to disaster management, based on a public portal called Disasters2.0, which provides facilities for integrating and sharing user-generated information about disasters. The architecture of Disasters2.0 is designed following REST principles and integrates external mashups such as Google Maps. This architecture has been integrated with different clients, including a mobile client, a multiagent system for assisting in the decentralized management of disasters, and an expert system for automatic assignment of resources to disasters. As a result, the platform allows seamless collaboration of humans and intelligent agents, and provides a novel Web 2.0 approach for multiagent and disaster management research and for artificial intelligence teaching. |
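The REST style described here implies that disasters are addressable resources manipulated with plain HTTP verbs; the client sketch below is purely illustrative, and its base URL and JSON fields are hypothetical, not the portal's actual API.

    # disasters_client.py: illustrate the REST interaction style.
    import json
    import urllib.request

    BASE = "http://disasters2.example.org/api"  # hypothetical base URL

    def create_disaster(report):
        """POST a user-generated disaster report as a new resource."""
        request = urllib.request.Request(
            BASE + "/disasters",
            data=json.dumps(report).encode("utf-8"),
            headers={"Content-Type": "application/json"},
            method="POST",
        )
        with urllib.request.urlopen(request) as response:
            return response.headers["Location"]  # URI of the new resource

    def get_disaster(uri):
        """GET the representation of an existing disaster resource."""
        with urllib.request.urlopen(uri) as response:
            return json.load(response)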
Matt Sweeney. YUI 3: Faster, Lighter, Easier |
Abstract: The Yahoo! User Interface Library has been
widely adopted by the mainstream web development community, and is used to
power websites worldwide. |
Patrick Sinclair, Nicholas Humfrey,
Yves Raimond, Tom Scott and
Michael Smethurst. Using the Web as our Content
Management System on the BBC Music Beta |
Abstract: In this paper, we describe the BBC Music Beta, which provides a comprehensive guide to music content across the BBC. We publish a persistent web identifier for each resource in our music domain, which serves as an aggregation point for all information about it. We describe a promising approach to building web sites by re-using structured data available elsewhere on the Web: the Web becomes our Content Management System. We therefore ensure that the BBC Music Beta is a truly Semantic Web site, re-using data from a variety of places and publishing its data in a variety of formats. |
Laurent Denoue, Scott Carter, John Adcock and Gene Golovchinsky. WebNC: efficient sharing of web applications |
Abstract: WebNC is a browser plugin that leverages the Document Object Model to efficiently share web browser windows or record web browsing sessions to be replayed later. Unlike existing screen-sharing or screencasting tools, WebNC is optimized to work with web pages where a lot of scrolling happens. Rendered pages are captured as image tiles and transmitted to a central server through HTTP POST. Viewers can watch the webcasts in real time or asynchronously using a standard web browser: WebNC relies only on HTML and JavaScript to reproduce the captured web content. Along with the visual content of web pages, WebNC also captures their layout and textual content for later retrieval. The resulting webcasts require very little bandwidth, are viewable on any modern web browser, including the iPhone and Android phones, and are searchable by keyword. |
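A sketch of the tile-upload step as described (the URL and metadata fields are hypothetical; the real plugin runs inside the browser):

    # post_tile.py: send one captured image tile to a central server
    # via HTTP POST, in the spirit of WebNC's capture protocol.
    import urllib.parse
    import urllib.request

    def post_tile(session_id, x, y, png_bytes):
        params = urllib.parse.urlencode({"session": session_id, "x": x, "y": y})
        request = urllib.request.Request(
            "http://webnc.example.org/tiles?" + params,  # hypothetical URL
            data=png_bytes,
            headers={"Content-Type": "image/png"},
            method="POST",
        )
        with urllib.request.urlopen(request) as response:
            return response.status  # 200 on success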
Sharad Goel, Jake Hofman, John Langford, David Pennock and Daniel
Reeves. CentMail: Rate
Limiting via Certified Micro-Donations |
Abstract: We present a plausible path toward adoption of email postage stamps, an oft-cited method for fighting spam, along with a protocol and a prototype implementation. In the standard approach, neither senders nor recipients gain by joining unilaterally, and senders lose money. Our system, called CentMail, begins as a charity fund-raising tool: users donate $0.01 to a charity of their choice for each email they send. The user benefits by helping a cause, promoting it to friends, and potentially attracting matching donations, often at no additional cost beyond what they planned to donate anyway. Charitable organizations benefit and so may appeal to their members to join. The sender's email client inserts a uniquely generated CentMail stamp into each message. The recipient's email client verifies with CentMail that the stamp is valid for that specific message and has not been queried by an unexpectedly large number of other recipients. More generally, the system can serve to rate-limit and validate many types of transactions, broadly construed, from weblog comments to web links to account creation. |
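One way such a stamp could work, sketched under stated assumptions (the abstract does not specify the real protocol; the key handling, digest choice, and query threshold below are ours): bind a server-issued token to a digest of the specific message, so the server can both validate the stamp and count how often it is queried.

    # centmail_sketch.py: a hypothetical stamp scheme in the spirit of CentMail.
    import hashlib
    import hmac

    SERVER_KEY = b"server-secret"  # assumption: held by the stamp service
    MAX_QUERIES = 100              # assumption: tolerated recipient lookups
    query_counts = {}              # stamp -> number of verifications so far

    def issue_stamp(message_bytes):
        """Sender side: after the $0.01 donation, bind a stamp to this message."""
        digest = hashlib.sha256(message_bytes).hexdigest()
        return hmac.new(SERVER_KEY, digest.encode(), hashlib.sha256).hexdigest()

    def verify_stamp(stamp, message_bytes):
        """Recipient side: valid for this message, and not over-queried?"""
        digest = hashlib.sha256(message_bytes).hexdigest()
        expected = hmac.new(SERVER_KEY, digest.encode(), hashlib.sha256).hexdigest()
        if not hmac.compare_digest(stamp, expected):
            return False
        query_counts[stamp] = query_counts.get(stamp, 0) + 1
        return query_counts[stamp] <= MAX_QUERIES

    message = b"Hello, this is a stamped email."
    stamp = issue_stamp(message)
    assert verify_stamp(stamp, message)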
Jason Hines and Tony Abou-Assaleh. Query GeoParser: A Spatial-Keyword Query Parser Using Regular Expressions |
Abstract: There has been growing commercial interest in local information within Geographic Information Retrieval (GIR) systems. Local search engines enable the user to search for entities that contain both textual and spatial information, such as Web pages containing addresses or a business directory. Thus, queries to these systems may contain both spatial and textual components: spatial-keyword queries. Parsing such queries requires breaking the query into textual keywords and identifying the components of the geo-spatial description. For example, the query ‘Hotels near 1567 Argyle St, Halifax, NS’ could be parsed as having the keyword ‘Hotels’, the preposition ‘near’, the street number ‘1567’, the street name ‘Argyle’, the street suffix ‘St’, the city ‘Halifax’, and the province ‘NS’. Developing an accurate query parser is essential to providing relevant search results. Such a query parser can also be utilized in extracting geographic information from Web pages. |
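The worked example above maps directly onto a small regular expression; the sketch below handles only queries of that exact shape and is an illustration, not the GeoParser grammar.

    # query_parse.py: parse a spatial-keyword query of the form
    # '<keywords> <preposition> <number> <street> <suffix>, <city>, <province>'.
    import re

    PATTERN = re.compile(
        r"^(?P<keywords>.+?)\s+"
        r"(?P<preposition>near|in|at|around)\s+"
        r"(?P<number>\d+)\s+"
        r"(?P<street>[A-Za-z ]+?)\s+"
        r"(?P<suffix>St|Ave|Rd|Blvd|Dr)\s*,\s*"
        r"(?P<city>[A-Za-z ]+?)\s*,\s*"
        r"(?P<province>[A-Z]{2})$",
        re.IGNORECASE,
    )

    def parse(query):
        match = PATTERN.match(query)
        return match.groupdict() if match else None

    print(parse("Hotels near 1567 Argyle St, Halifax, NS"))
    # -> {'keywords': 'Hotels', 'preposition': 'near', 'number': '1567',
    #     'street': 'Argyle', 'suffix': 'St', 'city': 'Halifax', 'province': 'NS'}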
Sean McCleese, Chris Mattmann, Rob Raskin, Dan Crichton and Sean Hardman. A Virtual Oceanographic
Data Center |
Abstract: Oceanographic data centers at the National Aeronautics and Space Administration (NASA) are geographically sparse and disparate in their technological strengths and in their standardization on common Internet data formats. Virtualized search across these national assets would significantly benefit the oceans research community. To date, the lack of common software infrastructure and open APIs to access the data and descriptive metadata available at each site has precluded virtualized search. In this paper, we describe a nascent effort, called the |
Clint Hall.
Bootstrapping Web Pages for Accessibility and Performance |
Abstract: In this talk, I present a technique called Web Bootstrapping, a process by which an accurate collection of only those static resources and metadata necessary for a unique experience is delivered passively, by the most performant means possible. In further contrast to existing methodologies, rather than focusing on the web client's identity or version, this approach determines resources based on capability, form factor, and platform by collecting the often-immutable attributes of the client. Bootstrapping allows for rule-based, externalized, server-side configuration, further promoting progressive enhancement and client performance. |
Josep M. Pujol and Pablo
Rodriguez. Towards Distributed
Social Search Engines |
Abstract: We describe a distributed social search engine built upon open-source tools, aiming to help the community to "take back the Search". Access to the Web is universal and open, and so the mechanisms to search it should be too. We envision search as a basic service whose operation is controlled and maintained by the community itself. To that end, we present an alpha version of what could become the platform of a distributed search engine fueled by the shared resources and collaboration of the community. |
Govind Kabra and Kevin Chang. Integration at
Web-Scale: Scalable Agent Technology for Enabling Structured Vertical Search |
Abstract: The Web today has "everything": every object of interest in the real world is starting to find its presence in the online world. As such, the search needs of users are getting increasingly sophisticated. How do you search for apartments? How do you find products to buy? The traditional paradigm of Web search, starting from keyword input and ending in Web pages as output, stifles users, requiring intensive manual post-processing of search results. |