Ontology Mapping: The State of the Art
Abstract. Ontology mapping is seen as a solution provider in today’s landscape of ontology research. As the number of ontologies that are made publicly available and accessible on the Web increases steadily, so does the need for applications to use them. A single ontology is no longer enough to support the tasks envisaged by a distributed environment like the Semantic Web. Multiple ontologies need to be accessed from several applications. Mapping could provide a common layer from which several ontologies could be accessed and hence could exchange information in a semantically sound manner. Developing such mappings has been the focus of a variety of works originating from diverse communities over a number of years. In this article we comprehensively review and present these works. We also provide insights on the pragmatics of ontology mapping and elaborate on a theoretical approach for defining ontology mapping.
1 Introduction
Nowadays, the practitioner interested in ontology mapping is often faced with a knotty problem: there is an enormous amount of diverse work originating from different communities that claim some sort of relevance to ontology mapping. For example, terms encountered in the literature that are claimed to be relevant include alignment, merging, articulation, fusion, integration, morphism, and so on. Given this diversity, it is difficult to identify the problem areas and comprehend the solutions provided. Part of the problem is the lack of a comprehensive survey and of a standard terminology, the presence of hidden assumptions or undisclosed technical details, and the dearth of evaluation metrics.
This article aims to fill in some of these gaps, primarily the first one: the lack of a comprehensive survey. We scrutinised the literature and critically reviewed works originating from a variety of fields to provide a comprehensive overview of ontology mapping work to date. We also worked on the theoretical grounds for defining ontology mapping, which could act as the glue for better understanding similarities and pinpointing differences in the works reported.
The overall goal of this paper is not only to give readers a comprehensive overview of the ontology mapping works to date, but also to provide the insights necessary for a practical understanding of the issues involved. As such, we critique these works while reporting them, rather than merely describing them. At the same time, we review the works objectively, with emphasis on a practitioner’s interests, and try to provide answers to the following questions:
– What are the lessons learnt from this work?
– How easily can this work be replicated in similar domains?
Outline. We start by elaborating on the survey style we adopt in Section 2, where we also provide a theoretical definition of the term ‘ontology mapping’. As this article is mostly a descriptive exercise and not a normative one, we do not claim that this definition is the only possible one. We include it here for the sake of comprehending the issues involved in mapping, especially when these originate from different communities. We continue with the main section of the article, the actual survey, Section 3, which also includes illustrative examples of ontology mapping usage. In Section 5 we discuss the pragmatics of ontology mapping, and we conclude the article in Section 6.
2 Survey style
Current practice in ontology mapping entails a large number of fields ranging from machine learning, concept lattices, and formal theories to heuristics, database schema, and linguistics. Their applications also range significantly, from academic prototypes to large-scale industrial applications. Therefore, it was impractical and overwhelming to conduct a marketing-style survey with questionnaires, standardised categories, and multiple participants. In fact, there is an acknowledged dearth of standards and metrics in knowledge engineering, which would have made our job even more difficult. The few that are defined, such as the CommonKADS methodology (Schreiber et al. 2000) or the recent OntoWeb EU thematic network (OntoWeb 2002), are not fully endorsed by recognised bodies, nor do they specifically mention ontology mapping works.
We therefore scrutinised the literature to identify works that target ontology mapping, or at least are somehow related to it. We deliberately widened the scope of our survey and included works that target integration and merging, works that originate from other communities (for example, database schemata), and works that are purely theoretical. We aim to give a broad picture of ontology mapping practice today and hence do not restrict our survey to those works that are ‘labelled’ as ontology mapping tools. As we will show in the sequel, there are many angles from which the problem can be viewed, and we aim to highlight this diversity. Although we quote original works, we also offer critique, whenever appropriate, in order to maintain a uniform style, to provide comparative indicators, and to focus on a broader picture of ontology mapping. As such, the reader should expect a certain degree of subjectivity. However, this has been kept to a minimum, and we gathered most of our personal judgement in Section 5, where we elaborate on issues that we found important for the interested practitioner.
We should also note what this survey is not about. It is not a comparative review: we do not compare the works reported under any specific framework, simply because such a framework does not exist. Although efforts have been made to provide such a framework (see, for example, (OntoWeb 2002), pp. 35–51), these are far from being standards. Experience from software engineering shows that developing and agreeing on these standards is a lengthy process which takes many years and extensive resources (Moore 1998). This survey also does not make any attempt to provide standardised definitions and scope of ontology mapping. The origin and diversity of the works reported make this task arguably impossible. Only a theoretical approach could help us understand the differences and commonalities. In the next section, we elaborate on such an approach.
2.a Defining ontology mapping
We shall adopt an algebraic approach and present ontologies as logical theories. An ontology is then a pair O = (S, A), where S is the (ontological) signature, describing the vocabulary, and A is a set of (ontological) axioms, specifying the intended interpretation of the vocabulary in some domain of discourse.
Typically, an ontological signature will be modelled by some mathematical structure. For instance, it could consist of a hierarchy of concept or class symbols modelled as a partially ordered set (poset), together with a set of relation symbols whose arguments are defined over the concepts of the concept hierarchy. The relations themselves might also be structured into a poset. For the purposes of this survey we shall not commit to any particular definition of ontological signature; we refer to the definitions of ‘ontology’, ‘core ontology’, or ‘ontology signature’ in (Kalfoglou and Schorlemmer 2002; Stumme and Maedche 2001; Bench-Capon and Malcolm 1999), respectively, for some examples of what we consider here an ontological signature. In addition to the signature specification, ontological axioms are usually restricted to a particular sort or class of axioms, depending on the kind of ontology.
Ontological signature morphisms. We understand ontology mapping as the task of relating the vocabulary of two ontologies that share the same domain of discourse in such a way that the mathematical structure of ontological signatures and their intended interpretations, as specified by the ontological axioms, are respected. Structure-preserving mappings between mathematical structures are called morphisms; for instance, a function f between two posets that preserves the partial order (a ≤ b implies f(a) ≤ f(b)) is a morphism of posets. Hence we shall characterise ontology mappings as morphisms of ontological signatures as follows.
A total ontology mapping from O1 = (S1, A1) to O2 = (S2, A2) is a morphism f : S1 → S2 of ontological signatures such that A2 ⊨ f(A1), i.e., all interpretations that satisfy O2’s axioms also satisfy O1’s translated axioms. This makes an ontology mapping a theory morphism as it is usually defined in the field of algebraic specification (see, for instance, (Meseguer 1989)). In order to accommodate a weaker notion of ontology mapping we will say that there is a partial ontology mapping from O1 = (S1, A1) to O2 = (S2, A2) if there exists a sub-ontology O'1 = (S'1, A'1), with S'1 ⊆ S1 and A'1 ⊆ A1, such that there is a total mapping from O'1 to O2.
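To make the signature part of this definition concrete, the following Python sketch checks whether a mapping between two small concept hierarchies is a poset morphism. It is only an illustration under stated assumptions: signatures are reduced to concept hierarchies, the concept names and the helper functions (order_closure, preserves_order, and so on) are hypothetical, and the axiom condition A2 ⊨ f(A1) is not checked because it would require a reasoner.

```python
from itertools import product

def order_closure(symbols, pairs):
    """Reflexive-transitive closure of (lower, upper) pairs over the given symbols."""
    leq = {(s, s) for s in symbols} | set(pairs)
    changed = True
    while changed:
        changed = False
        for (a, b), (c, d) in product(list(leq), repeat=2):
            if b == c and (a, d) not in leq:
                leq.add((a, d))
                changed = True
    return leq

def preserves_order(f, leq1, leq2):
    """a <= b implies f(a) <= f(b), for every pair on which f is defined."""
    return all((f[a], f[b]) in leq2 for (a, b) in leq1 if a in f and b in f)

def is_partial_signature_mapping(f, s1, leq1, s2, leq2):
    """f is order preserving on some subset S'1 of S1 (its domain)."""
    return (set(f) <= set(s1) and set(f.values()) <= set(s2)
            and preserves_order(f, leq1, leq2))

def is_total_signature_mapping(f, s1, leq1, s2, leq2):
    """As above, but f must be defined on every symbol of S1."""
    return set(f) == set(s1) and is_partial_signature_mapping(f, s1, leq1, s2, leq2)

# Hypothetical toy signatures (concept hierarchies only).
S1 = {"Car", "Vehicle", "Thing"}
LEQ1 = order_closure(S1, {("Car", "Vehicle"), ("Vehicle", "Thing")})
S2 = {"Automobile", "Transport"}
LEQ2 = order_closure(S2, {("Automobile", "Transport")})

f = {"Car": "Automobile", "Vehicle": "Transport", "Thing": "Transport"}  # morphism
g = {"Car": "Transport", "Vehicle": "Automobile", "Thing": "Transport"}  # breaks the order
h = {"Car": "Automobile", "Vehicle": "Transport"}                        # partial only

print(is_total_signature_mapping(f, S1, LEQ1, S2, LEQ2))
print(is_total_signature_mapping(g, S1, LEQ1, S2, LEQ2))
print(is_partial_signature_mapping(h, S1, LEQ1, S2, LEQ2))
```

Here f is accepted as a total mapping, g is rejected because it reverses the order between Car and Vehicle, and h qualifies only as a partial mapping since it leaves Thing unmapped.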
Populated ontologies. Central to several approaches to ontology mapping is the concept of a populated ontology. In this case, classes of an ontological signature come equipped with their respective instances. A populated ontology can be characterised by augmenting the signature with a classification relation that defines the classification of instances to the concept symbols in the signature. This brings forth issues about the correctness of populated ontologies, namely if the classification of instances respects the structure of the ontological signature. See (Kalfoglou and Schorlemmer 2002) for a use of populated ontologies in the definition of ontology mapping.
Taking into account the population of ontologies when establishing the mapping between ontologies may be useful for relating concepts according to the meaning and use that these concepts are given by particular communities. This idea is theoretically described in (Kent 2000) and (Schorlemmer 2002), for instance, and is fundamental to the information-flow based approaches described in Section 3.f.
Ontology morphisms. So far, we have defined ontology mapping only in terms of morphisms of ontological signatures, i.e., by determining which concept and relation symbols of one ontology are mapped to concept and relation symbols of the other. A more ambitious and practically necessary approach would be to take into account how particular ontological axioms are mapped as well. Formally, this would require ontology mappings to be defined in terms of morphisms of ontologies, i.e., signature + axioms, instead of morphisms of signatures only.
Most works on ontology mapping reported here adopt the more restrictive view of ontology mapping as signature morphism. Nevertheless, some of them consider the alignment of logical sentences, and not of signature symbols only (Calvanese et al. 2001b; Madhavan et al. 2002). Thus, we will use the term ‘ontology mapping’ for mappings as ontological signature morphisms as well as mappings as ontology morphisms.
Ontology alignment, articulation and merging. Ontology mapping only constitutes a fragment of a more ambitious task concerning the alignment, articulation and merging of ontologies. Here we want to clarify our understanding of these concepts within the above theoretical picture. An ontology mapping is a morphism, which usually will consist of a collection of functions assigning the symbols used in one vocabulary to the symbols of the other. But two ontologies may be related in a more general fashion, namely by means of relations instead of functions. Hence, we will call ontology alignment the task of establishing a collection of binary relations between the vocabularies of two ontologies. Since a binary relation can itself be decomposed into a pair of total functions from a common intermediate source, we may describe the alignment of two ontologies O1 and O2 by means of a pair of ontology mappings from an intermediate source ontology O0 (see Figure 1). We shall call the intermediate ontology O0, together with its mappings, the articulation of two ontologies. For an example of ontology articulation see (Maedche and Staab 2000; Madhavan et al. 2002; Compatangelo and Meisel 2002).
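The decomposition of a binary relation into a pair of total functions can be sketched in a few lines of Python. This is an illustrative sketch, not taken from any of the cited works: the intermediate vocabulary simply contains one symbol per related pair, and the function names and example vocabulary are hypothetical.

```python
def relation_to_span(relation):
    """Decompose R (a set of (s1, s2) pairs) into an intermediate vocabulary
    O0 and two total functions O0 -> S1 and O0 -> S2 (the projections)."""
    o0 = sorted(set(relation))              # one intermediate symbol per pair
    to_o1 = {pair: pair[0] for pair in o0}
    to_o2 = {pair: pair[1] for pair in o0}
    return o0, to_o1, to_o2

def span_to_relation(o0, to_o1, to_o2):
    """Recover the binary relation from the pair of functions."""
    return {(to_o1[x], to_o2[x]) for x in o0}

# Hypothetical alignment: "Car" is related to two symbols of the other
# ontology, so it cannot be expressed as a single function from S1 to S2.
alignment = {("Car", "Automobile"), ("Car", "Vehicle"), ("Person", "Human")}
o0, to_o1, to_o2 = relation_to_span(alignment)
print(span_to_relation(o0, to_o1, to_o2) == alignment)
```

The round trip recovers the original relation, which is why an alignment and a pair of mappings from a common source carry the same information at the level of vocabularies.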
Finally, an articulation allows for defining a way in which the fusion or merging of ontologies has to be carried out. The intuitive idea is to construct the minimal union of vocabularies S1 and S2 and axioms A1 and A2 that respects the articulation, i.e., that is defined modulo the articulation (see Figure 1). This corresponds to the mathematical pushout construct, and is exploited, for instance, in the frameworks described in (Bench-Capon and Malcolm 1999; Kent 2000; Schorlemmer 2002). Again, this ‘strong’ notion of merging can be relaxed by taking the articulation of two sub-ontologies of O1 and O2 respectively, and defining the merged ontology O according to their articulation.
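For vocabularies alone, the pushout amounts to taking the disjoint union of S1 and S2 and identifying the two images of every symbol of the intermediate ontology. The Python sketch below illustrates this with a small union-find structure. It is a simplification under stated assumptions: signatures are treated as plain sets of symbols, axioms are ignored, and all names (merge_vocabularies, the example symbols) are hypothetical.

```python
def merge_vocabularies(s1, s2, o0, f1, f2):
    """Pushout of plain sets: disjoint union of s1 and s2, with f1(x) and
    f2(x) identified for every x in o0. Returns the set of merged classes."""
    # Tag symbols with their origin so that equal names stay distinct.
    parent = {(1, s): (1, s) for s in s1}
    parent.update({(2, s): (2, s) for s in s2})

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]   # path halving
            x = parent[x]
        return x

    for x in o0:
        a, b = find((1, f1[x])), find((2, f2[x]))
        if a != b:
            parent[b] = a

    classes = {}
    for node in parent:
        classes.setdefault(find(node), set()).add(node)
    return {frozenset(c) for c in classes.values()}

# Hypothetical articulation: O0 has two symbols mapped into each ontology.
s1 = {"Car", "Person"}
s2 = {"Automobile", "Human", "Hotel"}
o0 = {"vehicle", "agent"}
f1 = {"vehicle": "Car", "agent": "Person"}
f2 = {"vehicle": "Automobile", "agent": "Human"}

merged = merge_vocabularies(s1, s2, o0, f1, f2)
for cls in sorted(sorted(c) for c in merged):
    print(cls)
```

Symbols linked through the articulation end up in one class (Car with Automobile, Person with Human), while Hotel, which is not covered by the articulation, remains a class of its own.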
A word on translation and integration. Translation is used by different authors to describe two different things. First, there is the translation between formal languages, for example from Ontolingua to Prolog. This changes the syntactic structure of axioms, but not the vocabulary. This is not our concern in this survey. Second, there is the actual translation of the vocabulary. This is intimately linked to the issue of ontology mapping. The difference between mapping and translation is that the former denotes the process of defining a collection of functions that specify which concepts and relations correspond to which other concepts and relations, while the latter is the application of the mapping functions to actually translate the sentences that use one ontology into the other. This presupposes that the ontologies share the domain in which the respective vocabularies are interpreted. Under integration, on the other hand, we regard the composition of ontologies to build new ones, but whose respective vocabularies are usually not interpreted in the same domain of discourse.
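The distinction between defining a mapping and applying it can be shown with a minimal Python sketch. The representation is an assumption made for illustration only: sentences are (subject, relation, object) triples, the mapping is a dictionary over vocabulary symbols, and symbols outside the mapping (such as instance names) are left untouched.

```python
def translate(sentence, mapping):
    """Apply a vocabulary mapping to one (subject, relation, object) sentence.
    Symbols that the mapping does not cover are kept as they are."""
    return tuple(mapping.get(symbol, symbol) for symbol in sentence)

# The mapping: which symbols of ontology 1 correspond to which of ontology 2.
mapping = {"Car": "Automobile", "ownedBy": "hasOwner", "instanceOf": "type"}

# Translation: using the mapping to rewrite sentences stated in ontology 1.
sentences = [("herbie", "instanceOf", "Car"), ("herbie", "ownedBy", "jim")]
print([translate(s, mapping) for s in sentences])
```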
2.b Categorisation of works
We selected the following categories as the most appropriate ones to classify the 35 works we report in this article. These categories are not by any means standard, but merely identify the type of work being reported. In addition, some of the works belong to more than one category. In such a case, we include the cited work in both categories, with emphasis on its primary category. The categories are as follows:
– Frameworks: These are mostly a combination of tools; they provide a methodological approach to mapping, and some of them are also based on theoretical work.
– Methods and tools: Here we report tools, either stand-alone or embedded in ontology development environments, and methods used in ontology mapping.
– Translators: Although these works might be seen as peripheral to ontology mapping, they are mostly used at the early phases of ontology mapping.
– Mediators: Likewise, mediators could be seen as peripheral, but they provide some useful insights on algorithmic issues for mapping programs.
– Techniques: This is similar to methods and tools, but not so elaborated or directly connected with mapping.
– Experience reports: We found it useful to include in our survey reports on doing large-scale ontology mapping, as they provide first-hand experience of issues of scalability and of the resources involved.
– Theoretical frameworks: This is probably the most interesting category. We argue that a lot of theoretical work has not been exploited yet by ontology mapping practitioners. This category aims to highlight these works.
– Surveys: This is similar to experience reports but they are more comparative in style.
– Examples: This is our last category and the most illustrative one. It aims to show the diversity of applications of ontology mapping and the variety of case studies that have benefitted from it. We quote examples from a selection of original works which have been reported in previous categories.
3 Ontology mapping survey
3.a Frameworks
We selected the following frameworks from the literature: Fernández-Breis and Martínez-Béjar’s (Fernández-Breis and Martínez-Béjar 2002) cooperative framework for ontology integration, the MAFRA framework for distributed ontologies in the Semantic Web (Maedche and Staab 2000), the OISs framework for ontology integration systems (Calvanese et al. 2001b), Madhavan and colleagues’ framework and language for ontology mapping (Madhavan et al. 2002), the OntoMapO framework for integrating upper level ontologies (Kiryakov et al. 2001), and the IFF framework for ontology sharing (Kent 2000).
Fernández-Breis and Martínez-Béjar (Fernández-Breis and Martínez-Béjar 2002) describe a cooperative framework for integrating ontologies. In particular, they present a system that
. . . could serve as a framework for cooperatively built, integration-derived (i.e., global) ontologies.
Their system is aimed at ontology integration and is intended for use by normal and expert users. The former seek information and provide specific information with regard to their concepts, whereas the latter are integration-derived ontology constructors, in the authors’ jargon. As the normal users enter information regarding the concepts’ attributes, taxonomic relations, and associated terms into the system, the expert users process this information, and the system helps them to derive the integrated ontology. The algorithm that supports this integration is based on taxonomic features and on the detection of synonymous concepts in the two ontologies. It also takes into account the attributes of concepts, and the authors have defined a typology of equality criteria for concepts. For example, when the name-based equality criterion is called upon, both concepts must have the same attributes. An example of its use is included in Section 4.
Maedche and Staab (Maedche and Staab 2000) devised a mapping framework for distributed ontologies in the Semantic Web. The authors argue that mapping existing ontologies will be easier than creating a common ontology, because a smaller community is involved in the process. MAFRA is part of a multi-ontology system, and it aims to automatically detect similarities of entities contained in two different department ontologies. Maedche and Staab (Maedche and Staab 2000) argue:
Both ontologies must be normalized to a uniform representation, in our case RDF(S), thus eliminating syntax differences and making semantic differences between the source and the target ontology more apparent.
This normalisation process is done by a tool, LIFT, which brings DTDs, XML Schema and relational databases to the structural level of the ontology. Another interesting contribution of the MAFRA framework is the definition of a semantic bridge. This is a module that establishes correspondences between entities from the source and target ontology based on similarities found between them. All the information regarding the mapping process is accumulated and populates an ontology of mapping constructs, the so-called Semantic Bridge Ontology (SBO). The SBO is in DAML+OIL format, and the authors argue:
One of the goals in specifying the semantic bridge ontology was to maintain and exploit the existent constructs and minimize extra constructs, which could maximize as much as possible the acceptance and understanding by general semantic web tools.
In Section 4 we give a brief mapping example taken directly from (Maedche and Staab 2000).
Calvanese and colleagues (Calvanese et al. 2001b) proposed a formal framework for Ontology Integration Systems—OISs. The framework provides the basis for ontology integration, which is the main focus of their work. Their view of a formal framework is close to that of Kent (see Section 3.f), and it
. . . deals with a situation where we have various local ontologies, developed independently from each other, and we are required to build an integrated, global ontology as a mean for extracting information from the local ones.
Ontologies in their framework are expressed as Description Logic (DL) knowledge bases, and mappings between ontologies are expressed through suitable mechanisms based on queries. Although the framework does not make explicit any of the mechanisms proposed, the authors employ the notion of queries, which
. . . allow for mapping a concept in one ontology into a view, i.e., a query, over the other ontologies, which acquires the relevant information by navigating and aggregating several concepts.
They propose two approaches to realise this query/view based mapping: global-centric and local-centric. The global-centric approach is an adaptation of most data integration systems. In such systems, the authors continue, sources are databases, the global ontology is actually a database schema, and the mapping is specified by associating with each relation in the global schema one relational query over the source relations. In contrast, the local-centric approach requires reformulation of the query in terms of the queries to the local sources. The authors provide examples of using both approaches in (Calvanese et al. 2001a) and we recapitulate some of them in Section 4.
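The global-centric idea, one query over the sources per relation of the global schema, can be illustrated with a small Python sketch. It does not reproduce the Description Logic formalism of the OIS framework: source relations are plain sets of tuples, queries are ordinary Python functions, and all relation names and data are hypothetical. The local-centric approach is not shown, since it would require a query reformulation step.

```python
# Hypothetical local sources, each a set of tuples.
sources = {
    "hr_staff": {("alice", "sales"), ("bob", "it")},   # (name, department)
    "it_contractors": {("carol",)},                    # (name,)
}

# Global-centric mapping: every relation of the global schema is associated
# with exactly one query over the source relations.
global_views = {
    "Employee": lambda src: {(name,) for (name, _dept) in src["hr_staff"]}
                            | src["it_contractors"],
    "WorksIn": lambda src: set(src["hr_staff"]),
}

def answer_global(relation, src=sources):
    """Answer a query for one global relation by evaluating its view."""
    return global_views[relation](src)

print(sorted(answer_global("Employee")))
print(sorted(answer_global("WorksIn")))
```

Because each global relation is defined directly over the sources, answering a query here reduces to evaluating the associated view.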
Madhavan and colleagues (Madhavan et al. 2002) developed a framework and propose a language for ontology mapping. Their framework enables mapping between models in different representation languages without first translating the models into a common language, the authors claim. The framework uses a helper model when it is not possible to map directly between a pair of models, and it also enables representing mappings that are either incomplete or involve loose information. The models represented in their framework are representations of a domain in a formal language, and the mapping between models consists of a set of relationships between expressions over the given models. The expression language used in a mapping varies depending on the languages of the models being mapped. The authors claim that mapping formulae in their language can be fairly expressive, which makes it possible to represent complex relationships between models. They applied their framework in an example case with relational database models. They also define a typology of mapping properties: query answerability, mapping inference, and mapping composition. The authors argue:
A mapping between two models rarely maps all the concepts in one model to all concepts in the other. Instead, mappings typically loose some information and can be partial or incomplete.
Query answerability is a proposed formalisation of this property. Mapping inference provides a tool for determining types of mappings, namely equivalent mappings and minimal mappings, and mapping composition enables mapping between models that are related by intermediate models. Examples of their framework are given in Section 4.
Kiryakov and colleagues (Kiryakov et al. 2001) developed a framework for accessing and integrating upper level ontologies. They provide a service that allows a user to import linguistic ontologies onto a Web server, which will then be mapped onto other ontologies. The authors argue for
. . . a uniform representation of the ontologies and the mappings between them, a relatively simple meta-ontology (OntoMapO) of property types and relation-types should be defined.
Apart from the OntoMapO primitives and design style, which are peripheral to our survey, the authors elaborate on a set of primitives that OntoMapO offers for mapping. There are two sets of primitives defined, InterOntologyRel and IntraOntologyRel, each of which has a number of relations that aim to capture the correspondence of concepts originating from different ontologies (for example, equivalent, more-specific, meta-concept). A typology of these relations is given in the form of a hierarchy, and the authors claim that an initial prototype has been used to map parts of the Cyc ontology to EuroWordNet.
Kent (Kent 2000) proposed a framework for ontological structures to support ontology sharing. It is based on the Barwise-Seligman theory of information flow (Barwise and Seligman 1997). Kent argues that IFF represents the dynamism and stability of knowledge. The former refers to instance collections, their classification relations, and links between ontologies specified by ontological extension and synonymy (type equivalence); it is formalised with Barwise-Seligman’s local logics and their structure-preserving transformations (logic infomorphisms). Stability refers to concept/relation symbols and to constraints specified within ontologies; it is formalised with Barwise-Seligman’s regular theories and their structure-preserving transformations (theory interpretations). IFF represents ontologies as logics, and ontology sharing as a specifiable ontology extension hierarchy. An ontology, Kent continues, has a classification relation between instances and concept/relation symbols, and also has a set of constraints modelling the ontology’s semantics. In Kent’s proposed framework, a community ontology is the basic unit of ontology sharing; community ontologies share terminology and constraints through a common generic ontology that each extends, and these constraints are consensual agreements within those communities. Constraints in generic ontologies are also consensual agreements, but across communities. We further examine Kent’s work in Section 3.f, where we include a discussion on theoretical frameworks.
3.b Methods and tools
In this section we report on the FCA-Merge method for ontology merging (Stumme and Maedche 2001), the IF-Map method for ontology mapping (Kalfoglou and Schorlemmer 2002), the SMART, PROMPT and PROMPTDIFF tools for the Protégé ontology development environment from Noy and Musen, the Chimaera tool (McGuinness et al. 2000), the GLUE (Doan et al. 2002) and CAIMAN (Lacher and Groh 2001) systems, both of which use machine learning, the ITTalks web-based system (Prasad et al. 2002), the ONION system for resolving heterogeneity in ontologies (Mitra and Wiederhold 2002), and ConcepTool for entity-relationship models (Compatangelo and Meisel 2002).
Stumme and Maedche (Stumme and Maedche 2001) presented the FCA-Merge method for ontology merging. It is based on Ganter and Wille’s work on Formal Concept Analysis (Ganter and Wille 1999) and lattice exploration. The authors incorporate natural language techniques in FCA-Merge to derive a lattice of concepts. The lattice is then explored manually by a knowledge engineer who builds the merged ontology with semi-automatic guidance from FCA-Merge. In particular, FCA-Merge works as follows: the input to the method is a set of documents from which concepts and the ontologies to be merged are extracted. These documents should be representative of the domain in question and should be related to the ontologies. They also have to cover all concepts from both ontologies as well as separate them well enough. These strong assumptions have to be met in order to obtain good results from FCA-Merge. This method relies heavily on the availability of classified instances in the ontologies to be merged. Since the authors argue that such instances will not be available in most ontologies, they opt to extract instances from documents:
The extraction of instances from text documents circumvents the problem that in most applications there are no objects which are simultaneously instances of the source ontologies, and which could be used as a basis for identifying similar concepts.
In this respect, the first step of FCA-Merge could be viewed as an ontology population mechanism. This initial step could be skipped, though, if there are shared classified instances in both ontologies. Once the instances are extracted, and the concept lattice is derived, Stumme and Maedche use Formal Concept Analysis techniques to generate the formal context for each ontology. They use lexical analysis to perform, among other things, retrieval of domain-specific information:
It associates single words or composite expressions with a concept from the ontology if a corresponding entry in the domain-specific part of the lexicon exists.
Using this lexical analysis the authors associate complex expressions, like Hotel Schwarzer Adler, with the concept Hotel. Next, the two formal contexts are merged to generate a pruned concept lattice. This step involves disambiguation (since the two contexts may contain the same concepts) by means of indexing. The computation of the pruned concept lattice is done by an algorithm, TITANIC, which computes formal concepts via their key sets (or minimal generators). In terms of Formal Concept Analysis, the extents of concepts are not computed (these are the documents that they originate from, and are not needed for generating the merged ontology, the authors say); only the intents are taken into account (sets of concepts from the source ontologies). Finally, Stumme and Maedche do not compute the whole concept lattice,
. . . as it would provide too many too specific concepts. We restrict the computation to those formal concepts which are above at least one formal concept generated by an (ontology) concept of the source ontologies.
Having generated the pruned concept lattice, FCA-Merge enters its last phase, the construction of the merged ontology with human interaction. This construction is semi-automatic, as it requires background knowledge about the domain. The engineer has to resolve possible conflicts and duplicates, but there is automatic support from FCA-Merge in terms of a query/answering mechanism, which aims to guide and focus the engineer’s attention on specific parts of the construction process. A number of heuristics are incorporated in this phase (like using the key sets of concepts as evidence of class membership), and the is_a lattice is derived automatically.
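The Formal Concept Analysis notions used above (formal context, extent, intent) can be illustrated with a naive Python sketch. It is not the TITANIC algorithm and does no pruning: it simply enumerates all subsets of objects, which is exponential and only workable for toy inputs. Following the FCA-Merge setting, documents play the role of objects and ontology concepts the role of attributes; the document names, the prefixes O1 and O2, and the function names are hypothetical.

```python
from itertools import combinations

def intent(objects, context, attributes):
    """Attributes shared by all of the given objects."""
    return frozenset(a for a in attributes
                     if all(a in context[o] for o in objects))

def extent(attrs, context):
    """Objects that have all of the given attributes."""
    return frozenset(o for o, has in context.items() if attrs <= has)

def formal_concepts(context):
    """All (extent, intent) pairs of the context, by brute-force closure."""
    attributes = frozenset().union(*context.values())
    concepts = set()
    objs = list(context)
    for r in range(len(objs) + 1):
        for subset in combinations(objs, r):
            i = intent(subset, context, attributes)
            concepts.add((extent(i, context), i))
    return concepts

# Merged toy context: which concepts of ontologies O1 and O2 occur in which document.
context = {
    "doc1": {"O1:Hotel", "O2:Accommodation"},
    "doc2": {"O1:Hotel", "O2:Accommodation", "O1:Event"},
    "doc3": {"O1:Event", "O2:Concert"},
}

for ext, itt in sorted(formal_concepts(context), key=lambda c: (len(c[1]), sorted(c[1]))):
    print(sorted(itt), "<-", sorted(ext))
```

In this toy context, O1:Hotel and O2:Accommodation always occur together and therefore share a formal concept, which is the kind of evidence a knowledge engineer could use to propose merging them.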
Kalfoglou and Schorlemmer (Kalfoglou and Schorlemmer 2002) developed an automatic method for ontology mapping, IF-Map, based on the Barwise-Seligman theory of information flow (Barwise and Seligman 1997). Their method draws on the proven theoretical ground of Barwise and Seligman’s channel theory, and provides a systematic and mechanised way for deploying it in a distributed environment to perform ontology mapping among a variety of different ontologies. In Figure 2 we illustrate IF-Map’s underpinning framework for establishing mappings between ontologies. These mappings are formalised in terms of logic infomorphisms. We elaborate on these in Section 3.f.
Figure 2 clearly resembles Kent’s proposed two-step process for ontology sharing (see (Kent 2000) and Section 3.f), but it has differences in its implementation. The solid rectangular line surrounding Reference ontology, Local ontology 1 and Local ontology 2 denotes the existing ontologies. We assume that Local ontology 1 and Local ontology 2 are ontologies used by different communities and populated with their instances, while Reference ontology is an agreed understanding that favours the sharing of knowledge, and is not supposed to be populated. The dashed rectangular line surrounding Global ontology denotes an ontology that does not exist yet, but will be constructed ‘on the fly’ for the purpose of merging. This is similar to Kent’s virtual ontology of community connections (Kent 2000). The solid arrow lines linking Reference ontology with Local ontology 1 and Local ontology 2 denote information flowing between these ontologies and are formalised as logic infomorphisms. The dashed arrow lines denote the embedding from Local ontology 1 and Local ontology 2 into Global ontology. The latter is the sum of the local ontologies modulo Reference ontology and the generated logic infomorphisms.
In Figure 3 we illustrate the process of IF-Map. The authors built a step-wise process that consists of four major steps: (a) ontology harvesting, (b) translation, (c) infomorphism generation, and (d) display of results. In the ontology harvesting step, ontology acquisition is performed. They apply a variety of methods: using existing ontologies, downloading them from ontology libraries (for example, from the Ontolingua (Farquhar et al. 1997) or WebOnto (Domingue 1998) servers), editing them in ontology editors (for example, in Protégé (Grosso et al. 1999)), or harvesting them from the Web. This versatile ontology acquisition step results in a variety of ontology language formats, ranging from KIF (Genesereth and Fikes 1992) and Ontolingua to OCML (Motta 1999), RDF (Lassila and Swick 1999), Prolog, and native Protégé knowledge bases. This introduces the second step in their process, that of translation. The authors argue:
As we have declaratively specified the IF-Map method in Horn logic and execute it with the aim of a Prolog engine, we partially translate the above formats to Prolog clauses.
Although the translation step is automatic, the authors comment:
We found it practical to write our own translators. We did that to have a partial translation, customised for the purposes of ontology mapping. Furthermore, as it has been reported in a large-scale experiment with publicly available translators (Corrêa da Silva et al. 2002), the Prolog code produced is not elegant or even executable.
The next step in their process is the main mapping mechanism—the IF-Map method. This step finds logic infomorphisms, if any, between the two ontologies under examination and displays them in RDF format. The authors provide a Java front-end to the Prolog-written IF-Map program so that it can be accessed from the Web, and they are in the process of writing a Java API to enable external calls to it from other systems. Finally, they also store the results in a knowledge base for future reference and maintenance reasons.
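To make the notion of an infomorphism more concrete, the following Python sketch checks the basic condition from channel theory for a candidate pair of maps between two classifications: a token of the second classification is of the translated type exactly when its image in the first classification is of the original type. This is not the authors' Prolog implementation, and it ignores the additional conditions on constraints that a logic infomorphism must respect. The encoding of classifications as dictionaries, the name `is_infomorphism` and the toy data are assumptions made for illustration only.

```python
def is_infomorphism(class_a, class_b, type_map, token_map):
    """Check the infomorphism condition between two classifications.

    class_a, class_b: dict mapping each token to the set of types it is
                      classified under (assumed encoding).
    type_map:  maps types of A to types of B.
    token_map: maps tokens of B back to tokens of A (contravariant part).
    """
    types_a = set().union(*class_a.values()) | set(type_map)
    for token_b, types_of_token_b in class_b.items():
        token_a = token_map[token_b]
        for type_a in types_a:
            holds_in_a = type_a in class_a[token_a]
            holds_in_b = type_map[type_a] in types_of_token_b
            if holds_in_a != holds_in_b:
                return False
    return True


# Hypothetical toy data: a reference classification and a local one.
reference = {"r1": {"Person"}, "r2": {"Document"}}
local = {"alice": {"Employee"}, "report": {"Paper"}}
type_map = {"Person": "Employee", "Document": "Paper"}

good = is_infomorphism(reference, local, type_map,
                       {"alice": "r1", "report": "r2"})
bad = is_infomorphism(reference, local, type_map,
                      {"alice": "r2", "report": "r1"})
```

With the first token map the condition holds for every token and type; swapping the token images breaks it, so the second candidate is rejected.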
Noy and Musen have developed a series of tools over the past three years for performing ontology mapping, alignment and versioning. These tools are SMART (Noy and Musen 1999), PROMPT (Noy and Musen 2000), and PROMPTDIFF (Noy and Musen 2002). They are all available as a plugin for the open-source ontology editor, Protégé-2000 (Grosso et al. 1999). The tools use linguistic similarity matches between concepts for initiating the merging or alignment process, and then use the underlying ontological structures of the Protégé-2000 environment (classes, slots, facets) to inform a set of heuristics for identifying further matches between the ontologies. The authors distinguish in their work between the notions of merging and alignment, where merging is defined as
. . . the creation of a single coherent ontology and alignment as establishing links between [ontologies] and allowing the aligned ontologies to reuse information from one another.
The SMART tool is an algorithm that
. . . goes beyond class name matches and looks for linguistically similar class names, studies the structure of relations in the vicinity of recently merged concepts, and matches slot names and slot value types. . .
the authors describe. Some of the tasks for performing merging or alignment, like the initial linguistic similarity matches, can be outsourced and plugged into the PROMPT system by virtue of Protégé-2000’s open-source architecture. PROMPT is a (semi-)automatic tool and provides guidance for the engineer throughout the steps performed during merging or alignment:
Where an automatic decision is not possible, the algorithm guides the user to the places in the ontology where his intervention is necessary, suggests possible actions, and determines the conflicts in the ontology and proposes solutions for these conflicts.
Their latest tool, PROMPTDIFF, is an algorithm which integrates different heuristic matchers for comparing ontology versions. The authors combine these matchers in a fixed-point manner, using the results of one matcher as input for the others until the matchers produce no more changes. The authors argue that PROMPTDIFF performs a structure-based comparison: it compares the structure of the ontologies rather than their text serialisation. Their algorithm works on two versions of the same ontology and is based on the empirical evidence that a large fraction of frames remains unchanged and that, if two frames have the same type and have the same or very similar name, one is almost certainly an image of the other. All Protégé-specific tools from Noy and Musen have been empirically evaluated in a number of experiments using the Protégé-2000 ontology editing environment. We present examples of them in Section 4.
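The fixed-point combination of matchers can be sketched in a few lines of Python. This is only an illustration of the control loop, not the PROMPTDIFF implementation: the frame encoding and the two matchers `same_name_and_type` and `single_unmatched_sibling` are simplified stand-ins chosen for this example.

```python
def run_to_fixed_point(matchers, frames_v1, frames_v2):
    """Apply every matcher repeatedly until a full pass adds no new match."""
    matches = {}  # frame name in version 1 -> frame name in version 2
    changed = True
    while changed:
        changed = False
        for matcher in matchers:
            # Each matcher sees the matches found so far by all matchers.
            for old, new in matcher(frames_v1, frames_v2, matches).items():
                if old not in matches:
                    matches[old] = new
                    changed = True
    return matches


def same_name_and_type(v1, v2, matches):
    """Frames with identical name and type are taken to be images of each other."""
    return {n: n for n, f in v1.items()
            if n in v2 and v2[n]["type"] == f["type"]}


def single_unmatched_sibling(v1, v2, matches):
    """If matched parents each have exactly one unmatched child, pair the children."""
    found = {}
    matched_new = set(matches.values())
    for parent_old, parent_new in matches.items():
        kids_old = [n for n, f in v1.items()
                    if f["parent"] == parent_old and n not in matches]
        kids_new = [n for n, f in v2.items()
                    if f["parent"] == parent_new and n not in matched_new]
        if len(kids_old) == 1 and len(kids_new) == 1:
            found[kids_old[0]] = kids_new[0]
    return found


# Hypothetical versions of one ontology; "RedWine" was renamed to "Red_Wine".
v1 = {"Thing": {"type": "class", "parent": None},
      "Wine": {"type": "class", "parent": "Thing"},
      "RedWine": {"type": "class", "parent": "Wine"}}
v2 = {"Thing": {"type": "class", "parent": None},
      "Wine": {"type": "class", "parent": "Thing"},
      "Red_Wine": {"type": "class", "parent": "Wine"}}

result = run_to_fixed_point(
    [single_unmatched_sibling, same_name_and_type], v1, v2)
```

The structural matcher finds nothing on its first call because no parents are matched yet; only after the name matcher has run does the second pass pair the renamed frame, which is the reason for iterating until no matcher produces a change.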
McGuinness and colleagues (McGuinness et al. 2000) developed a similar tool, Chimaera, for the Ontolingua editor. Like PROMPT, Chimaera is an interactive tool, and the engineer is in charge of making decisions that will affect the merging process. Chimaera analyses the ontologies to be merged, and if linguistic matches are found, the merge is done automatically; otherwise the user is prompted for further action. Chimaera and PROMPT are quite similar in that both are embedded in ontology editing environments, but they differ in the suggestions they make to their users with regard to the merging steps.
Doan and colleagues (Doan et al. 2002) developed a system, GLUE, which employs machine learning techniques to find mappings. Given two ontologies, for each concept in one ontology, GLUE finds the most similar concept in the other ontology using probabilistic definitions of several practical similarity measures. The authors claim that this distinguishes their work from other machine learning approaches, where only a single similarity measure is used. In addition to this, GLUE also
. . . uses multiple learning strategies, each of which exploits a different type of information either in the data instances or in the taxonomic structure of the ontologies. . .
the authors continue. The similarity measures they employ are defined in terms of the joint probability distribution of the concepts involved, so
. . . instead of committing to a particular definition of similarity, GLUE calculates the joint distribution of the concepts, and lets the application use the joint distribution to compute any suitable similarity measure.
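As an illustration of how an application might derive a similarity measure from a joint distribution, the Python sketch below estimates the four joint probabilities of two concepts from instance membership and then computes a Jaccard coefficient from them. The function names and the toy instance sets are assumptions; in particular, the sketch assumes that membership of every instance in both concepts is already known, whereas GLUE has to predict it with its learners.

```python
def joint_distribution(instances, in_a, in_b):
    """Estimate P(A,B), P(A,not B), P(not A,B), P(not A,not B) by counting."""
    counts = {(True, True): 0, (True, False): 0,
              (False, True): 0, (False, False): 0}
    for x in instances:
        counts[(x in in_a, x in in_b)] += 1
    n = len(instances)
    return {key: count / n for key, count in counts.items()}


def jaccard(joint):
    """P(A and B) / P(A or B), computed only from the joint distribution."""
    union = joint[(True, True)] + joint[(True, False)] + joint[(False, True)]
    return joint[(True, True)] / union if union else 0.0


def conditional(joint):
    """P(B | A), a second measure derived from the same joint distribution."""
    p_a = joint[(True, True)] + joint[(True, False)]
    return joint[(True, True)] / p_a if p_a else 0.0


# Hypothetical universe of ten instances and two concepts.
instances = list(range(10))
concept_a = {0, 1, 2, 3}
concept_b = {2, 3, 4, 5, 6}

joint = joint_distribution(instances, concept_a, concept_b)
similarity = jaccard(joint)
```

Both `jaccard` and `conditional` read the same four numbers, which is the point of computing the joint distribution once and leaving the choice of measure to the application.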
GLUE uses a multi-learning strategy, the authors continue, because there are many different types of information a learner can glean from the training instances in order to make predictions. It can exploit the frequencies of words in the text value of instances, the instance names, the value formats, or the characteristics of value distributions. To cope with this diversity, the authors developed two learners, a content learner and a name learner. The former uses a text classification method, called Naive Bayes learning. The name learner is similar to the content learner but uses the full name of the instance instead of its content. They then developed a meta-learner that combines the predictions of the two learners. It assigns to each one of them a learner weight that indicates how much it trusts its predictions. The authors also used a technique, relaxation labelling, that assigns labels to nodes of a graph, given a set of constraints. This technique is based on the observation that the label of a node is typically influenced by the features of the node’s neighbourhood in the graph. The authors applied this technique to map two ontologies’ taxonomies, O1 to O2, by regarding concepts (nodes) in O2 as labels, and recasting the problem as finding the best label assignment to concepts (nodes) in O1, given all knowledge they have about the domain and the two taxonomies. That knowledge can include domain-independent constraints like ‘two nodes match if nodes in their neighbourhood also match’ (where neighbourhood is defined to be the children, the parents or both), as well as domain-dependent constraints like ‘if node Y is a descendant of node X, and Y matches professor, then it is unlikely that X matches assistant-professor’. The system has been empirically evaluated by mapping two university course catalogues.
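The role of the learner weights can be illustrated with a small Python sketch. It assumes that the meta-learner forms a weighted average of the scores produced by the base learners; the function name `combine_predictions`, the weights and the scores are hypothetical values for this example and are not taken from the GLUE experiments.

```python
def combine_predictions(predictions, weights):
    """Weighted average of per-label scores from several base learners.

    predictions: learner name -> {label: score}
    weights:     learner name -> how much the meta-learner trusts that learner
    """
    total_weight = sum(weights[name] for name in predictions)
    labels = set().union(*(scores.keys() for scores in predictions.values()))
    return {
        label: sum(weights[name] * scores.get(label, 0.0)
                   for name, scores in predictions.items()) / total_weight
        for label in labels
    }


# Hypothetical scores for one instance: does it belong to concept A or not?
predictions = {
    "content_learner": {"A": 0.8, "not A": 0.2},
    "name_learner": {"A": 0.3, "not A": 0.7},
}
weights = {"content_learner": 0.6, "name_learner": 0.4}

combined = combine_predictions(predictions, weights)
best_label = max(combined, key=combined.get)
```

Here the content learner is trusted more, so its preference for `A` outweighs the name learner's disagreement.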
Lacher and Groh (Lacher and Groh 2001) present CAIMAN, another system which uses machine-learning for ontology mapping. The authors elaborate on a scenario where members of a community would like to keep their own perspective on a community repository. They continue by arguing that
. . . each member in a community of interest organizes her documents according to her own categorization scheme (ontology).
This rather weak account of an ontology justifies, to a certain extent, the use of a user’s bookmark folder as a ‘personal’ ontology. The mapping task is then to align this ontology with the directory structure of CiteSeer (also known as ResearchIndex). The use of more formal community ontologies is not supported by the authors, who argue:
Information has to be indexed or categorized in a way that the user can understand and accepts. . . [This] could be achieved by enforcing a standard community ontology, by which all knowledge in the community is organized. However, due to loose coupling of members in a Community of Interest, this will not be possible.
Their mapping mechanism uses machine learning techniques for text classification to measure the probability that two concepts correspond. For each concept node in the ‘personal’ ontology, a corresponding node in the community ontology is identified. It is also assumed that repositories both on the user and on the community side may store the actual documents, as well as links to the physical locations of the documents. CAIMAN then offers two services to its users: document publication, which publishes documents that a user has newly assigned to one of the concept classes to the corresponding community concept class, and retrieval of related documents, which delivers newly added documents from the community repository to the user.
Prasad and colleagues (Prasad et al. 2002) presented a mapping mechanism which uses text classification techniques as part of their web-based system for automatic notification of information technology talks (ITTalks). Their system
. . . combines the recently emerging semantic markup language DAML + OIL, the text-based classification technology (for similarity information collection), and Bayesian reasoning (for resolving uncertainty in similarity comparisons).
They experimented with two hierarchies: the ACM topic ontology and a small ITTalks topic ontology that organises classes of IT related talks in a way that is different from the ACM classification. The text classification technique they use generates scores between concepts in the two ontologies based on their associated exemplar documents. They then use Bayesian subsumption for subsumption checking:
If a foreign concept is partially matched with a majority of children of a concept, then this concept is a better mapping than (and thus subsumes) its children.
An alternative algorithm for subsumption checking, the authors continue, is to take a Bayesian approach that considers the best mapping to be the concept that is lowest in the hierarchy and has a posterior probability greater than 0.5.
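The two subsumption heuristics described above can be sketched in Python as follows. This is an illustration under stated assumptions rather than the ITTalks implementation: the hierarchy encoding, the function names and the concept names and probabilities are all hypothetical.

```python
def parent_subsumes(concept, children_of, partially_matched):
    """Majority rule: the concept is the better mapping if more than half
    of its children are partially matched by the foreign concept."""
    children = children_of.get(concept, [])
    hits = sum(1 for child in children if child in partially_matched)
    return bool(children) and hits > len(children) / 2


def depth(concept, parent_of):
    """Number of edges from the concept up to the root of the hierarchy."""
    steps = 0
    while parent_of.get(concept) is not None:
        concept = parent_of[concept]
        steps += 1
    return steps


def lowest_confident_concept(posterior, parent_of, threshold=0.5):
    """Bayesian rule: among concepts whose posterior exceeds the threshold,
    return the one lowest in the hierarchy (ties broken by posterior)."""
    candidates = [c for c, p in posterior.items() if p > threshold]
    if not candidates:
        return None
    return max(candidates, key=lambda c: (depth(c, parent_of), posterior[c]))


# Hypothetical topic hierarchy and posterior probabilities for one foreign concept.
parent_of = {"Computing": None, "AI": "Computing",
             "MachineLearning": "AI", "KnowledgeRepresentation": "AI",
             "Robotics": "AI"}
children_of = {"Computing": ["AI"],
               "AI": ["MachineLearning", "KnowledgeRepresentation", "Robotics"]}
posterior = {"Computing": 0.9, "AI": 0.7, "MachineLearning": 0.4,
             "KnowledgeRepresentation": 0.35, "Robotics": 0.1}

majority = parent_subsumes(
    "AI", children_of, {"MachineLearning", "KnowledgeRepresentation"})
best = lowest_confident_concept(posterior, parent_of)
```

In this toy example both rules point to the same concept: two of the three children of `AI` are partially matched, and `AI` is the deepest node whose posterior exceeds 0.5.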
Mitra and Wiederhold (Mitra and Wiederhold 2002) developed the ONtology compositION system (ONION) which provides an articulation generator for resolving heterogeneity in different ontologies. The authors argue that ontology merging is inefficient:
A merging approach of creating an unified source is not scalable and is costly. . . One monolithic information source is not feasible due to unresolvable inconsistencies between them that are irrelevant to the application.
They then argue that semantic heterogeneity can be resolved by using articulation rules which express the relationship between two (or more) concepts belonging to the ontologies. Establishing such rules manually, the authors continue, is a very expensive and laborious task. On the other hand, they also claim that full automation is not feasible due to the inadequacy of today’s natural language processing technology. So, they take into account relationships in defining their articulation rules, but these are limited to subclass of, part of, attribute of, instance of, and value of. They also elaborate on a generic relation for heuristic matches:
License: This work is supported under the Advanced Knowledge Technologies (AKT) Interdisciplinary Research Collaboration (IRC), which is sponsored by the UK Engineering and Physical Sciences Research Council under grant number GR/N15764/01. The AKT IRC comprises t
