New, Major Upgrade of UMBEL Released: Version 1.50
CORALVILLE, IA, May 10, 2016 -- Structured Dynamics today released version 1.50 of UMBEL, which fully embraces a typology design and adds other computability improvements. According to Michael Bergman, UMBEL's co-editor, "the potential cleavage in UMBEL's module design caused us to stand back and question whether it [the module design] was the best approach. Ultimately, after much thought and testing, we adopted instead a typology design that brought additional benefits beyond simply modularity."
UMBEL (Upper-level Mapping and Binding Exchange Layer) is a knowledge graph and vocabulary for interoperating Web-accessible information. 90% of UMBEL's 34,000 reference concepts are entity classes that are organized into 31 mostly disjoint "SuperTypes". According to Bergman, taking advantage of this typing structure was the key to the new UMBEL design.
The Typology Design
As Bergman stated in a blog posting announcing the release, each SuperType could become its own "module", with its own boundaries and hierarchical structure. If these are properly organized, UMBEL can achieve a maximum of disjointness, modularity, and reasoning efficiency. Earlier experience with the use of modules in UMBEL pointed the way to a design for each SuperType that was as distinct and disjoint from other STs as possible. According to Bergman, "The design is effective for being able to interoperate across both fine-grained and coarse-grained datasets. For specific domains, the same design approach allows even finer-grained domain concepts to be effectively integrated."
All entity classes within a given SuperType are organized under the SuperType itself as the root. The classes within that ST are then organized hierarchically, with child classes having a subClassOf relation to their parent. Each class within the typology can become a tie-in point for external information, providing a collapsible and expandable scaffolding (the 'accordion' design). Via inferencing, multiple external sources may be related to the same typology, even though at different levels of specificity. Further, very detailed class structures can also be accommodated in this design for domain-specific purposes. Moreover, because of the single tie-in point for each typology at its root, it is also possible to swap out entire typology structures at once, should design needs require this flexibility. According to Fred Giasson, UMBEL's other co-editor, "The design also dovetails nicely with UMBEL's build and testing scripts. Indeed, the evolution of these scripts via literate programming has also been a reinforcing driver for being able to test and refine the complete ST and typologies structure."
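One way to picture the typology design is as a small tree in which every class points to its parent. The Python sketch below is illustrative only: the class names, the external class names, and the helper functions are invented for this example and are not taken from UMBEL or its build scripts.

```python
# Hypothetical mini-typology: child class -> parent class (subClassOf).
# "FoodDrink" stands in for a SuperType root; every name here is invented.
SUBCLASS_OF = {
    "Beverage": "FoodDrink",
    "Coffee": "Beverage",
    "Espresso": "Coffee",
    "BakedGood": "FoodDrink",
}

# Hypothetical external classes tied in at different levels of specificity.
EXTERNAL_TIE_INS = {
    "coarse:Drink": "Beverage",
    "fine:EspressoShot": "Espresso",
}


def ancestors(cls):
    """Return the chain of superclasses from cls up to the typology root."""
    chain = []
    while cls in SUBCLASS_OF:
        cls = SUBCLASS_OF[cls]
        chain.append(cls)
    return chain


def related(ext_a, ext_b):
    """True if the tie-in point of one external class subsumes the other's."""
    a, b = EXTERNAL_TIE_INS[ext_a], EXTERNAL_TIE_INS[ext_b]
    return a == b or a in ancestors(b) or b in ancestors(a)
```

In this sketch, `related("coarse:Drink", "fine:EspressoShot")` is true because the coarse source ties in at `Beverage`, which is an ancestor of the fine-grained tie-in point `Espresso`. That is the sense in which sources at different levels of specificity can be related through one typology.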
Summary of Version 1.50 Changes
The principal changes between the last public release, version 1.20, and this version 1.50 are:
- Removed all instance or individual listings from UMBEL; this change does NOT affect the punning used in UMBEL's design (see Metamodeling in Domain Ontologies)
- Re-aligned the SuperTypes to better support computability of the UMBEL graph and its resulting disjointness
- These SuperTypes were eliminated with concepts re-assigned: Earthscape, Extraterrestrial, Notations and Numbers
- These new SuperTypes were introduced: AreaRegion, AtomsElements, BiologicalProcesses, Forms, LocationPlaces, and OrganicChemistry, with logically reasoned assignments of RefConcepts
- The Shapes SuperType is a new ST that is inherently non-disjoint because it is shared with about half of the RefConcepts
- Situations is an important ST, overlooked in prior efforts, that helps better establish context for Activities and Events
- Made re-alignments in UMBEL's upper structure and introduced additional upper-level categories to better accommodate these refinements in SuperTypes
- A typology was created for each of the resulting 31 disjoint STs, which enabled missing concepts to be identified and added and the concepts within each ST to be better organized
- The broad adoption of the typology design for all of the (disjoint) SuperTypes also meant that prior module efforts, specifically Geo and Attributes, could now be made general to all of UMBEL. This re-integration also enabled us to retire these older modules without affecting functionality
- The tests and refinements necessary to derive this design caused us to create flexible build and testing scripts, documented via literate programming (using Clojure)
- Updated all mappings to DBpedia, Wikipedia, and schema.org
- Incorporated donated mappings to five additional LOV vocabularies
- Tested the UMBEL structure for consistency and coherence
- Updated all prior UMBEL documentation
- Expanded and updated the UMBEL.org Web site, with access to and demos of UMBEL.
The re-organizations noted above have resulted in some minor changes to the SuperTypes and how they are organized. According to Bergman, these changes have made UMBEL more computable with a higher degree of disjointness between SuperTypes. UMBEL thus now has 31 largely disjoint SuperTypes, organized into 10 or so clusters or "dimensions":
- Area or Region
- Location or Place
- Atoms and Elements
- Protists & Fungus
- Food or Drink
- Finance & Economy
These disjoint SuperTypes provide the basis for the typology design.
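Disjointness is what makes the structure checkable by machine: a concept assigned to two disjoint SuperTypes signals an error. The Python sketch below shows the idea under simplifying assumptions. It treats every SuperType except Shapes (described above as inherently non-disjoint) as pairwise disjoint, whereas UMBEL's SuperTypes are only "mostly" disjoint. The concept names, the SuperType labels, and the `violations` helper are all hypothetical.

```python
# Hypothetical assignments of concepts to SuperTypes; labels are illustrative,
# not actual UMBEL identifiers.
ASSIGNMENTS = {
    "Espresso": {"FoodDrink"},
    "RoundLoaf": {"FoodDrink", "Shapes"},          # allowed: Shapes is shared
    "Truffle": {"FoodDrink", "ProtistsFungus"},    # deliberately inconsistent
}

# SuperTypes assumed to be exempt from the disjointness check.
NON_DISJOINT = {"Shapes"}


def violations(assignments, non_disjoint):
    """Map each concept to its SuperTypes when it sits in more than one disjoint ST."""
    bad = {}
    for concept, supertypes in assignments.items():
        disjoint = sorted(supertypes - non_disjoint)
        if len(disjoint) > 1:
            bad[concept] = disjoint
    return bad


print(violations(ASSIGNMENTS, NON_DISJOINT))
```

Only the deliberately inconsistent concept is reported; the concept that shares the Shapes SuperType passes the check.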
UMBEL has two broad purposes. UMBEL’s first purpose is to provide a general vocabulary of classes and predicates for describing and mapping domain ontologies, with the specific aim of promoting interoperability with external datasets and domains. UMBEL’s second purpose is to provide a coherent framework of reference subjects and topics for grounding relevant Web-accessible content. UMBEL presently has about 34,000 of these reference concepts drawn from the Cyc knowledge base, organized into 31 mostly disjoint SuperTypes.
The grounding of information mapped by UMBEL occurs by common reference to the permanent URIs (identifiers) for UMBEL's concepts. The connections within the UMBEL upper ontology enable concepts from sources at different levels of abstraction or specificity to be logically related. Since UMBEL is an open source extract of the OpenCyc knowledge base, it can also take advantage of the reasoning capabilities within Cyc.
UMBEL’s vocabulary is designed to recognize that different sources of information have different contexts and different structures, and meaningful connections between sources are not always exact. UMBEL’s 34,000 reference concepts form a knowledge graph of subject nodes that may be related to external classes and individuals (instances and named entities). Via this coherent structure, we gain some important benefits:
- Mapping to other ontologies — disparate and heterogeneous datasets and ontologies may be related to one another by mapping to the UMBEL structure
- A scaffolding for domain ontologies — more specific domain ontologies can be made interoperable by using the UMBEL vocabulary and tying their more general concepts into the UMBEL structure
- Inferencing — the UMBEL reference concept structure is designed for inferencing, which supports better semantic search and look-ups
- Semantic tagging — UMBEL, and ontologies mapped to it, can be used as input bases to ontology-based information extraction (OBIE) for tagging text or documents; UMBEL’s “semsets” broaden these matches and can be used across languages
- Linked data mining — via the reference ontology, direct and related concepts may be retrieved and mined and then related to one another
- Creating computable knowledge bases — with complete mappings to key portions of a knowledge base, say, for Wikipedia articles, it is possible to use the UMBEL graph structure to create a computable knowledge source, with follow-on benefits in artificial intelligence and KB testing and improvements, and
- Categorizing instances and named entities — UMBEL can bring a consistent framework for typing entities and relating their descriptive attributes to one another.
UMBEL is written in the semantic Web languages of SKOS and OWL 2. It is a class structure used in linked data, along with other reference ontologies. Besides data integration, UMBEL has been used to aid concept search, concept definitions, query ranking, ontology integration, and ontology consistency checking. It has also been used to build large ontologies and for online question answering systems.
Including OpenCyc, UMBEL has about 64,000 formal mappings to DBpedia, PROTON, GeoNames, and schema.org, and provides linkages to more than 2 million Wikipedia pages (English version). All of its reference concepts and mappings are organized under a hierarchy of 31 different SuperTypes, which are mostly disjoint from one another. Development of UMBEL began in 2006. UMBEL was first released in July 2008. Version 1.00 was released in February 2011.
Where to Get UMBEL and Learn More
The UMBEL Web site provides various online tools and Web services for exploring and using UMBEL. The UMBEL GitHub site is where you can download the UMBEL Vocabulary or the UMBEL Reference Concept ontology, both under a Creative Commons Attribution 3.0 license. Other documents and backup are also available from that location.
Technical specifications for UMBEL and its various annexes are available from the UMBEL wiki site. You can also download a PDF version of the specifications from there. You are also welcome to participate in the UMBEL mailing list or LinkedIn group.
Structured Dynamics LLC
Email: mike at structureddynamics dot com