Kinji Yamasaki
Benay Dara-Abrams
Sholink Corporation
{kinji,benay}@sholink.com
Now that we are in the midst of a transition from an industrial society to an information-driven society, it is necessary to understand that the importance of information lies not only in the information itself, but also in the processes that deal with it. Today's information frameworks focus strongly on the collection of data, but neglect the relevance of that data. This paper presents a framework for dealing with three problems: 1) knowledge domain integration: screening and transformation of data into information, and of information into knowledge; 2) knowledge domain interoperability: obtaining and sharing knowledge and knowledge-related tasks in a distributed environment; and 3) knowledge domain transaction: nested and parallel transactions of information between organizations. This framework places strong emphasis not only on automated knowledge processing, but also on the necessary human intervention in the processes themselves. We call this combination of automation and human intervention in knowledge processing "information mediation".
There is a major difference between raw data and information, however. In [1] and [2], information is defined in terms of bits:
Many of the commercial information management frameworks in existence today do not capture these differences between data and information, or between information and knowledge. Most of the frameworks that do are AI driven, and AI-driven Knowledge Based Systems (KBS) present another set of problems. Most AI systems are complex in nature, because a high degree of complexity is involved in building a system that exercises human-like judgment: deciding what is data, what is information, and what is knowledge. This complexity often results in occasional or frequent incorrect judgments: data will sometimes be presented as information, or even as knowledge.
The framework proposed in this paper addresses not only the differences between data, information and knowledge, but also other information management functionalities such as interoperability and transactions. Moreover, this model, called Information Mediation, avoids the complexity presented by AI models by introducing human elements into the information management process. It builds on top of the knowledge domain models presented in [3], but with a considerable amount of revision and extension. In section 2 we discuss the overall architecture of information mediation. In section 3 we review some issues in the implementation, and in section 4 we focus on current and future work.
The Knowledge Domain Integration component differentiates among data, information and knowledge. Traditional AI systems implement automated transaction systems to handle these differentiations, along with the necessary conversions and rejections. This approach works as long as the data to be examined follow static patterns, which is rarely the case today, when compound documents with widely varying layouts are commonplace. In the Information Mediation framework, we introduce an automated workflow system whose components include automated processes and a human interface. The human interface allows information administrators to guide the system in the selection of data, storage of information and application of knowledge. The automated processes, as part of a workflow engine, execute programs that handle input data. These programs range from database tools and search engine indexers to full-blown third-party applications. The workflow system is built upon the WFMC (Workflow Management Coalition) specifications [4], but differs in certain areas of interoperability support and protocol APIs.
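As a rough illustration of the component contract such a workflow system might expose (the interface and class names below are our own sketch in Java, not part of the WFMC specifications or of the actual implementation), an automated process and a human-interface step can share a single interface:

    // Illustrative sketch only: one component contract shared by automated
    // processes and human-interface steps. Names are hypothetical.
    import java.util.Map;

    interface WorkflowComponent {
        // Invoked by the workflow engine with the input data and the process state.
        ComponentResult execute(byte[] inputData, Map<String, String> processState);
    }

    // An automated process: hands the data to an external tool such as an indexer.
    class IndexerComponent implements WorkflowComponent {
        public ComponentResult execute(byte[] inputData, Map<String, String> processState) {
            // ... run a database tool or search engine indexer over inputData ...
            return ComponentResult.completed("indexed");
        }
    }

    // A human-interface step: suspends the process until an administrator responds.
    class AdministratorReviewComponent implements WorkflowComponent {
        public ComponentResult execute(byte[] inputData, Map<String, String> processState) {
            // Present the data through a GUI and wait for the administrator's verdict.
            return ComponentResult.suspended("awaiting-review");
        }
    }

    class ComponentResult {
        final String status;
        private ComponentResult(String status) { this.status = status; }
        static ComponentResult completed(String s) { return new ComponentResult(s); }
        static ComponentResult suspended(String s) { return new ComponentResult(s); }
    }

In this sketch the human-interface component computes nothing itself; it merely suspends the process until an administrator responds, which is where human intervention enters the otherwise automated flow.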
Knowledge Domain Interoperability allows distributed collaborative work and the sharing of information and knowledge. It is achieved through the workflow's distributed environment support, built on CORBA. Processes across organizations may participate as part of the workflow through a well-defined CORBA IDL interface to the workflow component API, and freely (or subject to access control) obtain information and knowledge owned by the central workflow system. This collaboration design differs from the traditional whiteboard collaboration model, in which active participation is required. The workflow engine in this case handles all collaborative activities and transactions, and no human intervention is required. This collaboration model not only promotes sharing of information and knowledge, but also supports distributed enrichment of data into information, or of information into knowledge. This process is examined in more detail in section 3.
The Knowledge Domain Transaction component allows for nested and interactive transactions between organizations' information services. This nested model creates parallelism in the data transaction process. It not only increases the speed of network transactions, but also allows more dynamic interactions between processes of the workflow system, such as information content negotiation. The component provides an abstraction over information warehouses across organizations, and creates a means to extract and store information.
In the above example we defined a static criterion, since it does not take user requirements into consideration. This type of criterion should be used only for first-level screening, not to decide what is useful to whom. Data collection systems such as AltaVista™ deal with this kind of data selection criterion. Smarter information systems incorporate dynamic criteria into their selection processes. An example of a more dynamic criterion could be: if the source is "ComputerWorld" and the subject is Java, it is useful; otherwise, it is not. Such a criterion filters out useless data, but still does not necessarily narrow the raw data down to information alone. Moreover, in a large organization this criterion would suit some people but not all. Having each member of the organization define their own information criteria would create chaos and unwanted complexity in our information integration system.
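As a minimal sketch of such a dynamic criterion (the class and field names are illustrative only, not part of any existing product), the ComputerWorld/Java rule above could be expressed as a simple predicate applied during first-level screening:

    // Illustrative sketch of a dynamic first-level selection criterion; names are ours.
    class DataItem {
        String source;   // e.g. "ComputerWorld"
        String subject;  // e.g. "Java"
        DataItem(String source, String subject) { this.source = source; this.subject = subject; }
    }

    class SelectionCriterion {
        // Useful only if the item comes from ComputerWorld and concerns Java.
        boolean isUseful(DataItem item) {
            return "ComputerWorld".equals(item.source) && "Java".equals(item.subject);
        }
    }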
This is one of the differences between the Information Mediation model and traditional AI systems or the information systems described in [3]. In the Information Mediation framework we introduce a human interface factor into the information and knowledge discovery and sharing process. Human intervention can be used to make the final decision on what is useful and what is not. This is especially true when applying information, thus treating it as knowledge. An example would be: if the information from the INS disproves a person's legal status, then reject that person's insurance application. Such decisions and applications of knowledge could probably be made with AI-driven knowledge-based systems, but such systems are complex in nature and at times inflexible, not to mention the potential lack of confidence in the decisions of a machine.
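A hypothetical sketch of this human-in-the-loop application of knowledge (the types and the review queue below are our own illustration, not the framework's actual interfaces): the system derives a recommendation from the INS lookup but leaves the final decision to an administrator.

    // Illustrative only: the system recommends an outcome; a human confirms it
    // before the knowledge is applied.
    class InsuranceApplication {
        String applicantId;
        boolean insStatusConfirmed;   // result of the INS information lookup
    }

    interface AdministratorQueue {
        void submitForReview(InsuranceApplication application, String recommendation);
    }

    class StatusRule {
        void apply(InsuranceApplication application, AdministratorQueue reviewQueue) {
            String recommendation = application.insStatusConfirmed ? "approve" : "reject";
            reviewQueue.submitForReview(application, recommendation);
        }
    }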
The Information Mediation framework uses a workflow engine to control data, information and knowledge related tasks. Business processes are managed by the engine, and each business process can, and often does, take on data from a particular source. A workflow engine consists of multiple business processes, defined through a common process definition language, as suggested in [4]. At the start of a business process, the process must register with the workflow engine. The workflow engine is responsible for keeping track of the processes, the process components, the states that must be transferred from one component to another, and the data themselves. This centralized control model enables inter-process communication between business processes.
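The following is a simplified sketch of this registration and state-tracking role (hypothetical method names, not the engine's actual API):

    // Simplified sketch of registration and state tracking; not the actual engine API.
    import java.util.HashMap;
    import java.util.Map;

    class WorkflowEngine {
        private final Map<String, Map<String, String>> processStates = new HashMap<>();

        // Called at the start of a business process; the engine begins tracking it.
        void registerProcess(String processId) {
            processStates.put(processId, new HashMap<>());
        }

        // The engine transfers state between components on the process's behalf.
        Map<String, String> stateFor(String processId) {
            return processStates.get(processId);
        }
    }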
The workflow process components connect to the workflow engine through well-defined component APIs (the subject of the next few subsections). These components range from filters and data format converters to database tools and third-party applications. Humans can take part in a business process by participating through a graphical user interface.
The workflow system in the Information Mediation framework is trigger driven. Processes and components are invoked by triggers. A trigger can be the completion of another process or component, an action by a knowledge administrator, or a timed event. Each trigger either starts a new process or invokes a subprocess within an established process. Most components that provide a human interface trigger subprocesses and/or other processes.
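A minimal sketch of the trigger model, assuming a simple dispatcher (the types below are illustrative, not the actual engine interfaces):

    // Illustrative trigger model; the three trigger kinds mirror the text above.
    enum TriggerKind { PROCESS_COMPLETED, ADMINISTRATOR_ACTION, TIMED_EVENT }

    class Trigger {
        final TriggerKind kind;
        final String target;            // the process or subprocess to invoke
        final boolean startsNewProcess;
        Trigger(TriggerKind kind, String target, boolean startsNewProcess) {
            this.kind = kind;
            this.target = target;
            this.startsNewProcess = startsNewProcess;
        }
    }

    class TriggerDispatcher {
        void fire(Trigger trigger) {
            if (trigger.startsNewProcess) {
                // start a new business process named by trigger.target
            } else {
                // invoke a subprocess within an already established process
            }
        }
    }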
Another important aspect of the workflow engine is the dynamic definition of business processes. As mentioned before, business processes are defined through the process definition language. One of the capabilities of the Information Mediation workflow engine is dynamic loading of such definitions. This capability allows knowledge administrators to change definitions of business processes in real time.
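A sketch of what such dynamic loading might look like, assuming definitions are kept in files and keyed by process name (the storage format and names are our own illustration):

    // Illustrative sketch: definitions kept in files, reloaded by process name.
    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.util.Map;
    import java.util.concurrent.ConcurrentHashMap;

    class ProcessDefinitionRegistry {
        private final Map<String, String> definitions = new ConcurrentHashMap<>();

        // A knowledge administrator points the engine at a revised definition;
        // new process instances pick it up without restarting the engine.
        void reload(String processName, Path definitionFile) throws IOException {
            definitions.put(processName, Files.readString(definitionFile));
        }

        String definitionOf(String processName) {
            return definitions.get(processName);
        }
    }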
The Information Mediation framework uses distributed objects as the common protocol for sharing information. This choice is driven by the flexibility and cross-platform requirements of an information system. Distributed objects also offer plug-and-play interfaces and portability (see [5] for a fuller list of the advantages of using objects instead of a traditional client-server approach). A CORBA 2.0 [6] compliant ORB engine was used in the initial implementation of the Information Mediation framework.
To fully capture the current need for information sharing, and at the same time to allow flexibility and scalability, the Information Mediation framework builds its objects around a few invariants associated with an information system. These objects are placed inside the workflow system, thus allowing remote clients and systems to access information in the workflow engine. Invocations of these objects also act as triggers for business processes.
In the Information Mediation framework we identified four invariants of information systems: delivery of data, retrieval of information, processing of data, and management of information storage. Based on these invariants we defined four object interfaces using IDL, one for each invariant.
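As an illustration only, the four invariants might map onto interfaces along the following lines; the names and signatures below are hypothetical Java renderings, not the framework's actual IDL definitions:

    // Hypothetical Java rendering of the four invariants; not the actual IDL.
    interface DataDelivery {
        void deliver(byte[] rawData, String sourceDescription);   // push data into the workflow
    }

    interface InformationRetrieval {
        byte[] retrieve(String query);                             // pull information back out
    }

    interface DataProcessing {
        void process(String processName, byte[] rawData);          // trigger an enrichment process
    }

    interface StorageManagement {
        void store(String key, byte[] information);                // manage the information warehouse
        void remove(String key);
    }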
Interoperability is an important part of the Information Mediation framework because it is nearly impossible in today's world for one organization to own all the information mediation tools necessary to transform data into information and information into knowledge. Let us go back to our insurance application example. In order to receive approval of a healthcare insurance application, one must have legal status in the United States, and such status is officially recorded only at the INS. In this case interoperability between the healthcare company's information system and the INS's information system is highly desirable. With interoperability in place, the data enrichment processes and other authorities related to data, information and knowledge no longer need to reside in one location.
Unfortunately, defining a remote object as a component adds complexity to our system. Two particular complexities stand out: state control and exception handling.
As mentioned before, associated with each business process are states that the workflow engine maintains. The problem that interoperability introduces into business process definition concerns the ownership of these states. This is a tricky problem because we cannot assume that the remote information system is capable of keeping, and perhaps more importantly returning, states. In the Information Mediation workflow system we partially dealt with this problem by having the local workflow engine keep state by default; each time a business process is invoked on a remote system, the local state is copied and sent to the remote object. We have not yet found a practical way to ensure that we receive back an updated copy of the state.
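The default state-ownership rule can be sketched as follows (plain Java rather than a CORBA stub, with illustrative names): the local engine keeps the authoritative state and ships a copy with every remote invocation, merging a returned copy back only when the remote system cooperates.

    // Illustrative sketch of the default state-ownership rule.
    import java.util.HashMap;
    import java.util.Map;

    class ProcessState {
        final Map<String, String> values = new HashMap<>();

        ProcessState copy() {
            ProcessState copied = new ProcessState();
            copied.values.putAll(values);
            return copied;
        }
    }

    interface RemoteBusinessProcess {
        // In the framework this would be a CORBA stub; here it is a plain interface.
        // The remote system may or may not return an updated state.
        ProcessState invoke(String operation, ProcessState stateCopy);
    }

    class LocalEngine {
        ProcessState localState = new ProcessState();

        void invokeRemote(RemoteBusinessProcess remote, String operation) {
            ProcessState returned = remote.invoke(operation, localState.copy());
            if (returned != null) {
                localState = returned;   // best effort: only if the remote cooperates
            }
        }
    }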
Another problem introduced by interoperability and remote subprocesses is exception handling. Even though CORBA 2.0 supports some exception handling, it is not sufficient in this case, as components in remote subprocesses may become disconnected from the CORBA object implementation. Again, we have not yet come up with a suitable solution.
The transaction management component of the Information Mediation framework acts as a resource shared by all participants of the framework. It provides tools to store and retrieve data, information and knowledge into and from a database, as well as methods to perform content negotiation between multiple components, both locally and in a distributed setting. The ability to support content negotiation is important in an information-rich environment, where interactive exchange of information is often desired. Consider the following situation: two medical labs compare lab test results, and each continues its work based on the other's results. Traditional information systems let them transfer all the data at once, wait, then receive all the data back, and so on. The Information Mediation framework, with its transaction model, allows the two labs to exchange small pieces of information interactively and keep a persistent connection open until all the information exchange transactions are completed.
The Information Mediation framework's implementation of transaction processing is based on TCP transport for content negotiation and JDBC for database connectivity. We chose these two for different reasons: for content negotiation we wanted a fast and reliable communication method, while for database connectivity a platform-independent tool was required.
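A minimal sketch of this split, with a hypothetical peer host, JDBC URL, table and negotiation protocol, and with error handling omitted: information is negotiated piece by piece over one persistent TCP connection, and accepted pieces are persisted through JDBC.

    // Minimal sketch: hypothetical host, JDBC URL, table and protocol; no error handling.
    import java.io.BufferedReader;
    import java.io.InputStreamReader;
    import java.io.PrintWriter;
    import java.net.Socket;
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;

    class TransactionSketch {
        public static void main(String[] args) throws Exception {
            // Keep one connection open and exchange small pieces of information interactively.
            try (Socket peer = new Socket("lab-b.example.org", 9000);
                 PrintWriter out = new PrintWriter(peer.getOutputStream(), true);
                 BufferedReader in = new BufferedReader(new InputStreamReader(peer.getInputStream()));
                 Connection db = DriverManager.getConnection("jdbc:hypothetical:infostore")) {

                out.println("OFFER test-result 42");   // propose one piece of information
                String reply = in.readLine();          // the peer accepts, rejects or counters

                if ("ACCEPT".equals(reply)) {
                    // Persist the negotiated piece of information through JDBC.
                    try (PreparedStatement stmt = db.prepareStatement(
                            "INSERT INTO exchanged_results(id, status) VALUES (?, ?)")) {
                        stmt.setInt(1, 42);
                        stmt.setString(2, "shared");
                        stmt.executeUpdate();
                    }
                }
                // ... further OFFER/ACCEPT rounds continue over the same open connection ...
            }
        }
    }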
The ShoHealth product is used for information management control in the healthcare industry, especially for information transaction applications such as lab test transactions, patient record transactions, insurance processing, and disease tracking. For more information, contact info@sholink.com.
Many areas still need to be addressed, such as the problems created by interoperability and the use of remote process components. Other areas include security, database management and more elaborate transaction process support [7].