Using XML in the MASP Client-Server Protocol

Mark A. Jones, Tony L. Hansen
AT&T Labs - Research, Florham Park, NJ; AT&T Labs - Lincroft, NJ
{jones@research.att.com, tony@att.com}

Introduction

The strength of ASCII protocols for network services such as SMTP [1], NNTP [2], and IMAP [3] is their relative simplicity for debugging, trussing, etc. On the other hand, an undesirable hallmark is that each invents its own syntax for specifying requests and replies -- particularly in its conventions for quoting metacharacters, dealing with line continuations, encoding binary data, handling error conditions, etc. XML (Extensible Markup Language 1.0, [4]) is often viewed as a way to encode domain-specific data payloads carried over a protocol such as HTTP [6] [7] [8], but not as the protocol substrate itself. This paper presents our experience with MASP (Mediated Attribute Store Protocol), a simple, synchronous, fully XML, client-server protocol.

The MASP Protocol

The important states in MASP are:

  1. the initiation and successful completion of a connection from the client to the server to form a session
  2. the repeated submission of client requests for service and the return of server responses
  3. the termination of the session and the connection between the client and the server
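
As an illustration only, the three states can be followed with an incremental parser over a single, long-lived XML document. The sketch below uses Python's standard xml.etree.ElementTree.XMLPullParser; the function name iter_requests and the assumption that every request is a direct child of a client-session root element are ours, not part of MASP.

```python
import xml.etree.ElementTree as ET

def iter_requests(chunks, root_tag="client-session"):
    """Yield each complete top-level request element as it arrives.

    chunks is any iterable of text fragments read from the connection.
    The root start tag opens the session (state 1), each completed child
    is one request (state 2), and the root end tag closes it (state 3).
    """
    parser = ET.XMLPullParser(events=("start", "end"))
    depth = 0
    for chunk in chunks:
        parser.feed(chunk)
        for event, elem in parser.read_events():
            if event == "start":
                depth += 1
                if depth == 1 and elem.tag != root_tag:
                    raise ValueError(f"unexpected root element: {elem.tag}")
            else:  # "end" event
                depth -= 1
                if depth == 1:
                    yield elem      # a complete request, ready to dispatch
                elif depth == 0:
                    return          # root closed: the session is over
    # Falling out of the loop means the input ended before the root closed.

# Hypothetical session fragments, split at element boundaries.
session = [
    '<?xml version="1.0"?>\n<client-session>\n',
    "<search id='1'><select name='face[$u]'/></search>\n",
    "<search id='2'/>\n",
    "</client-session>",
]
for request in iter_requests(session):
    print(request.tag, request.get("id"))
```

A server written this way never needs to see the whole document at once; it reacts to each request as soon as that request's end tag has been parsed.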

From the client side, the protocol document begins:

<?xml version="1.0"?>
<!DOCTYPE masp SYSTEM "http://www.research.att.com/~jones/masp-client.dtd">
<client-session>

The server side is similar. A session is closed with the appropriate </client-session> and </server-session> end tags. Although arbitrary markup can represent the requests and responses, we have found the following conventions to be valuable:

  1. Each client request tag such as <search> is paired with either a server response tag such as <search-response> or an <error-response> tag.
  2. Each client request tag includes a unique id attribute which is also carried in the corresponding server response. The id provides greater security in associating responses, even in a synchronous protocol.
  3. The XML mechanism of "CDATA sections" can handle arbitrary character data. For binary data, the MASP EDATA tag was introduced with an encoding attribute (base64, quoted-printable, url and hex).
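
The first two conventions lend themselves to a mechanical check on the client side. The following is a minimal sketch, assuming that a successful response tag is always the request tag followed by "-response" (the paper shows this only for search) and that the helper name classify_response is hypothetical.

```python
import xml.etree.ElementTree as ET

def classify_response(request, response):
    """Return "ok" or "error" if response answers request; raise otherwise."""
    # Convention 2: the response carries the id of the request it answers.
    if response.get("id") != request.get("id"):
        raise ValueError("response id does not match request id")
    # Convention 1: <foo> is answered by <foo-response> or <error-response>.
    if response.tag == request.tag + "-response":
        return "ok"
    if response.tag == "error-response":
        return "error"
    raise ValueError(f"unexpected response tag: {response.tag}")

request = ET.fromstring("<search id='1'/>")
print(classify_response(request, ET.fromstring("<search-response id='1'/>")))
```

Checking the id even though the protocol is synchronous lets a client detect a response that has drifted out of step with its request instead of silently accepting it.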

The following is an example of a client search request and a successful server response. Note the use of attribute value indexing. The ix attribute references previous name attributes by index.

<search id='1'>                <!-- client request -->
 <typedecl>user$u</typedecl>
 <filter><![CDATA[(last_name[$u]='Burnes')]]></filter>
 <select name='face[$u]'/>
</search>

<search-response id='1'>       <!-- server response -->
 <resultset>
  <typedecl>user$u</typedecl>
  <results count='2'>
   <result>
    <ids> <id>hermod0000000102</id> </ids>
    <attrvals> <val ix='0' name='face[$u]'><EDATA encoding='qp'>GIF87a=01=00=01=00=80=00=00=95=76=81=00
=00=00=2c=00=00=00=00=01=00=01=00=00=02=02=44=01=00=3b=00</EDATA></val> </attrvals>
   </result>
   <result>
    <ids> <id>hermod0000000324</id> </ids>
    <attrvals> <val ix='0'><EDATA encoding='qp'>GIF87a=01=00=01=00=80=00=00=95=76=81=00=00=00=4e=00
=00=00=00=01=00=01=00=00=02=02=25=09=00=3b=00</EDATA></val> </attrvals>
   </result>
  </results>
 </resultset>
</search-response>
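
A client might unpack such a response as sketched below. This is one possible reading, not the MASP implementation: we assume that a name attribute is given the first time an ix value appears and is reused by later val elements with the same ix, and that the encoding attribute takes the spellings 'base64', 'qp', 'url' and 'hex' (only 'qp' appears in the example above). The helper names and the shortened ids and payloads are hypothetical.

```python
import base64
import binascii
import quopri
import urllib.parse
import xml.etree.ElementTree as ET

# Spellings other than 'qp' are guesses; the paper's example only shows 'qp'.
DECODERS = {
    "base64": base64.b64decode,
    "qp": quopri.decodestring,
    "url": urllib.parse.unquote_to_bytes,
    "hex": binascii.unhexlify,
}

def decode_edata(edata):
    """Decode the text of an EDATA element according to its encoding attribute."""
    encoding = edata.get("encoding")
    if encoding not in DECODERS:
        raise ValueError(f"unsupported EDATA encoding: {encoding!r}")
    return DECODERS[encoding]((edata.text or "").encode("ascii"))

def extract_results(response):
    """Return a list of (object ids, {attribute name: value}) pairs."""
    names = {}   # ix -> attribute name, learned the first time a name appears
    rows = []
    for result in response.iter("result"):
        ids = [e.text for e in result.findall("./ids/id")]
        values = {}
        for val in result.findall("./attrvals/val"):
            ix = val.get("ix")
            if val.get("name") is not None:
                names[ix] = val.get("name")
            edata = val.find("EDATA")
            # Binary values arrive as EDATA; anything else is kept as text.
            values[names[ix]] = decode_edata(edata) if edata is not None else val.text
        rows.append((ids, values))
    return rows

doc = """<search-response id='1'><resultset><results count='2'>
 <result><ids><id>a1</id></ids><attrvals>
  <val ix='0' name='face[$u]'><EDATA encoding='qp'>GIF87a=01=00</EDATA></val>
 </attrvals></result>
 <result><ids><id>a2</id></ids><attrvals>
  <val ix='0'><EDATA encoding='hex'>474946</EDATA></val>
 </attrvals></result>
</results></resultset></search-response>"""
for ids, values in extract_results(ET.fromstring(doc)):
    print(ids, values)
```

Indexing attribute names this way keeps long names out of every result after the first, which matters when a result set contains many rows.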

MASP also supports complex multi-turn protocols such as SASL [5] authentication mechanisms. XML debugging comments can be observed with a tool such as the Unix truss utility without affecting the protocol operations. Syntax errors, semantic errors, resource failures, etc. cause the server to return an appropriate <error-response>, which includes an indication of permanence, an errorcode, and an error message. For example:

<error-response id='1' permanence='permanent' errorcode='5'>
<![CDATA[Error parsing : parse error, column 22: '!'Bur...']]>
</error-response>
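
On the client, such a response maps naturally onto an exception. The sketch below is illustrative: the class MaspError and the function raise_if_error are hypothetical names, the errorcode is kept as a string because the paper does not define its type, and 'permanent' is the only permanence value shown above, so any other value is simply treated as not permanent.

```python
import xml.etree.ElementTree as ET

class MaspError(Exception):
    """Client-side view of an <error-response> (hypothetical class)."""

    def __init__(self, request_id, permanence, errorcode, message):
        super().__init__(message)
        self.request_id = request_id
        self.permanent = permanence == "permanent"
        self.errorcode = errorcode

def raise_if_error(response):
    """Return response unchanged, or raise MaspError for an error-response."""
    if response.tag != "error-response":
        return response
    raise MaspError(
        response.get("id"),
        response.get("permanence"),
        response.get("errorcode"),
        (response.text or "").strip(),   # CDATA content arrives as plain text
    )

reply = ET.fromstring("""<error-response id='1' permanence='permanent' errorcode='5'>
<![CDATA[Error parsing : parse error, column 22: '!'Bur...']]>
</error-response>""")
try:
    raise_if_error(reply)
except MaspError as err:
    print(err.request_id, err.permanent, err.errorcode, err)
```

A caller can then use the permanent flag to decide whether resubmitting the same request is worthwhile.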

Summary

MASP is an entirely XML-based client-server protocol whose extensions and conventions form a very useful protocol substrate. XML offers a standard set of mechanisms for representing structured data, and many high-quality XML parsers are now available. DTDs (or XML schemas) present a clear picture of the client and server protocol syntax and, especially with a validating parser, can enforce very precise syntactic requirements. Modifying the DTDs, changing a dispatch table in the code, and testing a new feature or command is easier than modifying ad hoc parsing code or a YACC grammar.

Most of the features that we have described for turn-taking, escaping and encoding mechanisms, error handling, attribute indexing, debugging and session management would be generally useful for many protocols. A longer version of this paper can be found at http://www.research.att.com/~jones/www9paper.htm.

References

  1. Simple Mail Transfer Protocol, RFC 821, ftp://ftp.ietf.org/rfc/rfc0821.txt
  2. Network News Transfer Protocol, RFC 977, ftp://ftp.ietf.org/rfc/rfc0977.txt
  3. Internet Message Access Protocol, RFC 2060, ftp://ftp.ietf.org/rfc/rfc2060.txt
  4. Extensible Markup Language (XML) 1.0, http://www.w3.org/TR/REC-xml
  5. Simple Authentication and Security Layer (SASL), RFC 2222, ftp://ftp.ietf.org/rfc/rfc2222.txt
  6. The Information and Content Exchange (ICE) Protocol, http://www.gca.org/ice/default.htm
  7. XML-RPC, http://www.xml-rpc.com/
  8. SOAP: Simple Object Access Protocol, ftp://ftp.ietf.org/internet-drafts/draft-box-http-soap-01.txt

Vitae

Mark Jones is a researcher at AT&T Labs. He works on information modeling, artificial intelligence, natural language processing and machine learning, particularly as these fields apply to messaging systems. Tony Hansen is a developer at AT&T Labs. He works on messaging systems, web server systems and Internet standards.