scones Tagger
Structured Dynamics' scones (Subject Concept Or Named EntitieS) tagger extracts domain-specific subject concepts and named entities from unstructured text. It also disambiguates the extracted information based on the context of the source.
The scones system uses a combination of heuristics, statistical methods and machine-learning algorithms to separately identify subject concepts and named entities within the target text. Then, using existing domain ontologies and entity dictionaries, the system further identifies and weights candidate extractions. Uniquely, the system also triangulates the extractions between concepts and entities to further aid the disambiguation task (identifying the correct entities or concepts).
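To make the triangulation idea concrete, here is a toy sketch; it is not the scones implementation. It assumes a hypothetical entity dictionary in which each candidate entity lists the subject concepts it is usually associated with, plus a hypothetical word-to-concept lexicon. Concepts and entity mentions are found separately, and each candidate entity is then weighted by how many of its associated concepts also appear in the text.

```python
# Toy sketch only: ENTITY_DICT, CONCEPT_DICT and tag() are hypothetical
# illustrations, not part of scones or structWSF.

# Hypothetical entity dictionary: surface form -> candidate entities,
# each annotated with the subject concepts it is typically associated with.
ENTITY_DICT = {
    "jaguar": [
        {"id": "Jaguar_(animal)", "concepts": {"Mammal", "Rainforest"}},
        {"id": "Jaguar_Cars", "concepts": {"Automobile", "Company"}},
    ],
    "amazon": [
        {"id": "Amazon_River", "concepts": {"River", "Rainforest"}},
        {"id": "Amazon_(company)", "concepts": {"Company", "Retail"}},
    ],
}

# Hypothetical concept lexicon: word -> subject concept.
CONCEPT_DICT = {
    "river": "River",
    "predator": "Mammal",
    "forest": "Rainforest",
    "car": "Automobile",
    "shares": "Company",
}


def tag(text):
    """Return (concepts found, {mention: (best entity id, weight)})."""
    tokens = [t.strip(".,;:!?").lower() for t in text.split()]
    # 1. Identify subject concepts and entity mentions separately.
    concepts = {CONCEPT_DICT[t] for t in tokens if t in CONCEPT_DICT}
    mentions = [t for t in tokens if t in ENTITY_DICT]
    # 2. Weight each candidate by the overlap between its associated concepts
    #    and the concepts found in the text (the "triangulation" step).
    # 3. Keep the highest-weighted candidate; ties go to the first listed.
    tags = {}
    for m in mentions:
        scored = [(len(c["concepts"] & concepts), c["id"]) for c in ENTITY_DICT[m]]
        score, best = max(scored, key=lambda s: s[0])
        tags[m] = (best, score)
    return concepts, tags


if __name__ == "__main__":
    print(tag("The jaguar is a predator of the Amazon forest."))
    print(tag("Jaguar shares rose as the car launched."))
```

In the first sentence the concepts Mammal and Rainforest pull "jaguar" toward the animal and "Amazon" toward the river; in the second, Company and Automobile select Jaguar_Cars. A real system would use far richer statistical and ontological evidence than this simple overlap count.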
The tagged information can be extracted and used in any of the formats supported by the structWSF Web services framework, including XML, CSV, JSON and various RDF serializations. Optionally, when Web pages are the source, scones can also reinject the tagged information into the page as RDFa.
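As a rough illustration of what reinjecting tags as RDFa can look like, the sketch below wraps each tagged surface form in a plain-text fragment with a `<span>` carrying the RDFa `property` and `resource` attributes. The `to_rdfa` function, the input format (surface form mapped to URI), the choice of `dcterms:subject` as the property and the example.org URIs are all assumptions for illustration. It works on plain text rather than parsing a full HTML page, which a production tool would need to do.

```python
import html
import re


def to_rdfa(text, tags, prop="dcterms:subject"):
    """Return HTML for plain `text` with tagged surface forms wrapped in RDFa spans.

    `tags` maps surface form -> URI (hypothetical input format);
    `prop` is an illustrative property choice, not a scones default.
    """
    if not tags:
        return html.escape(text)
    # Try longer forms first so "New York City" is preferred over "New York".
    forms = sorted(tags, key=len, reverse=True)
    pattern = re.compile(
        r"\b(" + "|".join(map(re.escape, forms)) + r")\b", re.IGNORECASE
    )
    lookup = {form.lower(): uri for form, uri in tags.items()}
    out, pos = [], 0
    for m in pattern.finditer(text):
        out.append(html.escape(text[pos:m.start()]))  # untagged text, escaped
        uri = lookup[m.group(1).lower()]
        out.append(
            f'<span property="{prop}" resource="{html.escape(uri)}">'
            f"{html.escape(m.group(1))}</span>"
        )
        pos = m.end()
    out.append(html.escape(text[pos:]))
    return "".join(out)


if __name__ == "__main__":
    print(to_rdfa("The jaguar lives near the Amazon.", {
        "jaguar": "http://example.org/Jaguar_(animal)",
        "Amazon": "http://example.org/Amazon_River",
    }))
```

Escaping the untagged text and attribute values keeps the generated markup well-formed even when the source contains characters such as `<` or `&`.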
Source content can be submitted as individual snippets, cut-and-pasted content, or entire documents or Web pages. Optionally, scones can be integrated into a semi-automated workflow that lets users or subject matter experts make the final tag determinations before the results are written to file.
In its standard baseline configuration, scones uses the UMBEL subject concepts ontology and entities from Wikipedia as its references. In production use, these references are best supplemented with domain-specific ontologies for concepts and with entity dictionaries relevant to the enterprise.
The scones system also includes methods for creating these enterprise-specific entity dictionaries, which are a valuable complement to the tagging methodology.
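The source does not describe how scones builds its entity dictionaries, so the following is only a generic sketch of the idea. It assumes a hypothetical record format of (URI, preferred label, aliases), normalizes every name into a lookup key, and flags surface forms that map to more than one entity, since those are the ones that will need disambiguation at tagging time.

```python
from collections import defaultdict


def build_entity_dictionary(records):
    """Build surface form -> sorted list of candidate URIs.

    `records` is an iterable of (uri, label, aliases) tuples; this record
    format is a hypothetical assumption, not a scones input format.
    """
    forms = defaultdict(set)
    for uri, label, aliases in records:
        for form in [label, *aliases]:
            # Normalize case and collapse runs of whitespace.
            key = " ".join(form.lower().split())
            if key:
                forms[key].add(uri)
    return {key: sorted(uris) for key, uris in forms.items()}


def ambiguous_forms(dictionary):
    """Return only the surface forms that point to more than one entity."""
    return {key: uris for key, uris in dictionary.items() if len(uris) > 1}


if __name__ == "__main__":
    d = build_entity_dictionary([
        ("ex:Jaguar_Cars", "Jaguar Cars", ["Jaguar", "Jaguar  Land Rover"]),
        ("ex:Jaguar_(animal)", "Jaguar", ["Panthera onca"]),
    ])
    print(ambiguous_forms(d))
```

Keeping all candidate URIs per surface form, rather than picking one at build time, leaves the final choice to the context-based disambiguation step described above.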
