scones Tagger

Structured Dynamics' scones (Subject Concept Or Named EntitieS) tagger extracts domain-specific subject concepts and entities from unstructured text. It also disambiguates the extracted information based on the context of the source content.

scones is presently in prototype form. Please contact us for an individual demo.

The scones system uses a combination of heuristics, statistical methods and machine-learning algorithms to separately identify subject concepts and named entities within the target text. Then, using existing domain ontologies and entity dictionaries, the system further identifies and weights candidate extractions. Uniquely, the system also triangulates the extractions between concepts and entities to further aid the disambiguation task (identifying the correct entities or concepts).
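The triangulation idea can be illustrated with a toy example. The sketch below is not the scones implementation: the vocabularies, identifiers, and the overlap-based score are all hypothetical stand-ins, chosen only to show how the concepts found in a text might help pick between candidate entities that share the same name.

```python
from dataclasses import dataclass

# Hypothetical reference data (assumption): a tiny concept vocabulary and an
# entity dictionary in which each candidate entity lists its related concepts.
CONCEPTS = {"fruit": "Fruit", "company": "Company", "computer": "Computer"}
ENTITIES = {
    "apple": [
        {"id": "Apple_Inc", "concepts": {"Company", "Computer"}},
        {"id": "Apple_(fruit)", "concepts": {"Fruit"}},
    ],
}


@dataclass
class Tag:
    surface: str      # the token as it appeared in the text (lowercased)
    entity_id: str    # the candidate entity selected for that token
    score: float      # share of the entity's concepts also found in the text


def tokenize(text):
    """Split on whitespace, strip simple punctuation, and lowercase."""
    return [t.strip(".,;:!?()").lower() for t in text.split()]


def tag(text):
    tokens = tokenize(text)
    # Step 1: identify subject concepts mentioned anywhere in the text.
    found_concepts = {CONCEPTS[t] for t in tokens if t in CONCEPTS}
    tags = []
    for token in tokens:
        candidates = ENTITIES.get(token)
        if not candidates:
            continue
        # Step 2: weight each candidate entity by how many of its related
        # concepts were also found in the text (a stand-in for triangulation).
        scored = [
            (len(c["concepts"] & found_concepts) / len(c["concepts"]), c["id"])
            for c in candidates
        ]
        # Step 3: keep the highest-scoring candidate (first one wins on ties).
        best_score, best_id = max(scored, key=lambda pair: pair[0])
        tags.append(Tag(token, best_id, best_score))
    return tags
```

With this toy data, tag("Apple is a company that sells a computer.") selects Apple_Inc, while tag("An apple is a fruit.") selects Apple_(fruit). A production system would replace the overlap score with the heuristics, statistical methods and machine-learning algorithms described above.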

Figure: scones information extraction flow

The tagged information can be extracted and used in any of the formats supported by the structWSF Web services framework, including XML, CSV, various RDF serializations and JSON. As an option, if Web pages are the source, scones can also reinject the tagged information back into the Web page as RDFa.
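As a rough illustration of exporting tags in two of these formats, the sketch below writes a small list of tag records to JSON and CSV using only the Python standard library. The record fields (surface, uri, type, score) and the example.org URIs are assumptions made for this example; they are not the structWSF output schema.

```python
import csv
import io
import json

# Hypothetical tag records (assumption): field names and URIs are illustrative.
SAMPLE_TAGS = [
    {"surface": "apple", "uri": "http://example.org/entity/Apple_Inc",
     "type": "entity", "score": 1.0},
    {"surface": "company", "uri": "http://example.org/concept/Company",
     "type": "concept", "score": 0.9},
]

FIELDS = ["surface", "uri", "type", "score"]


def to_json(tags):
    """Serialize tag records as a JSON document with a top-level 'tags' list."""
    return json.dumps({"tags": tags}, indent=2)


def to_csv(tags):
    """Serialize tag records as CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(tags)
    return buffer.getvalue()


if __name__ == "__main__":
    print(to_json(SAMPLE_TAGS))
    print(to_csv(SAMPLE_TAGS))
```

Keeping the tags as plain records until the last step means the same extraction result can be written to whichever format the consuming application expects.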

Source content can be submitted as individual snippets, cut-and-pasted content, or entire documents or Web pages.

Optionally, scones can be integrated into a semi-automated workflow that also enables users or subject matter experts to make final tag determinations before writing to file.
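One way such a review step might be arranged is sketched below, under stated assumptions: tags carry a numeric score, tags at or above a threshold are accepted automatically, and the rest are held for a reviewer's decision. The threshold value, field names, and function names are hypothetical and do not describe the actual scones workflow.

```python
def triage(tags, threshold=0.8):
    """Split tags into auto-accepted ones and ones that need human review."""
    accepted = [t for t in tags if t["score"] >= threshold]
    pending = [t for t in tags if t["score"] < threshold]
    return accepted, pending


def finalize(accepted, pending, decisions):
    """Merge auto-accepted tags with the pending tags a reviewer approved.

    `decisions` maps a tag's uri to True (keep) or False (reject);
    pending tags without a decision are dropped.
    """
    approved = [t for t in pending if decisions.get(t["uri"], False)]
    return accepted + approved


# Hypothetical candidate tags (assumption): illustrative values only.
candidates = [
    {"surface": "apple", "uri": "ex:Apple_Inc", "score": 0.95},
    {"surface": "jaguar", "uri": "ex:Jaguar_Cars", "score": 0.55},
    {"surface": "mercury", "uri": "ex:Mercury_(planet)", "score": 0.40},
]

accepted, pending = triage(candidates)
# A subject matter expert confirms one pending tag and rejects the other.
final_tags = finalize(accepted, pending, {"ex:Jaguar_Cars": True,
                                          "ex:Mercury_(planet)": False})
```

Here final_tags contains the apple and jaguar tags; only this final list would then be written to file.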

In its standard baseline configuration, scones uses the UMBEL subject concepts ontology and entities from Wikipedia as its references. In production use, these references are best supplemented with domain-specific concept ontologies and entity dictionaries relevant to the enterprise.

The scones system also includes methods for creating the specific entity dictionaries that are a valuable complement to the methodology.