Linked Data FAQ

Linked data is the first practical expression of the Semantic Web, useful and doable today, and applicable to all forms of data.

Sources such as the four principles of linked data in Tim Berners-Lee's Design Issues: Linked Data and the introductory statements on the linked data Wikipedia entry approximate -- but do not completely express -- an accepted, formal or official definition of linked data per se. Building from these sources and attempting to be more precise, here is the definition of linked data used internally by Structured Dynamics:

Linked data is a set of best practices for publishing and deploying instance and class data using the RDF data model, and uses uniform resource identifiers (URIs) to name the data objects. The approach exposes the data for access via the HTTP protocol, while emphasizing data interconnections, interrelationships and context useful to both humans and machine agents.

All references to linked data below embrace this definition.

Frequently Asked Questions

Listed below are some of the more prominent enterprise questions regarding linked data.

1. Does linked data require RDF?
2. Is publishing RDF sufficient to create linked data?
3. How does one publish or deploy linked data?
4. Is linked data just another term or branding for the Semantic Web?
5. Does linked data only apply to instance data?
6. What role do ontologies play with linked data?
7. Is linked data a centralized or federated approach?
8. How does one maintain context when federating linked data?
9. Does data need to be open to qualify as linked data?
10. Can legacy data be expressed as linked data?
11. Can enterprise and open or public data be intermixed as linked data?
12. How does one query or access linked data?
13. How is access control or security maintained around linked data?
14. What are the enterprise benefits of linked data? (Why adopt it?)
15. What are early applications or uses of linked data?

1. Does linked data require RDF?

Yes, as originally defined. Though other approaches can also model the first-order predicate logic (FOL) of subject-predicate-object at the core of the Resource Description Framework (RDF) data model, RDF is the one based on the open standards of the W3C. RDF and FOL are powerful because of their simplicity, their ability to express complex schema and relationships, and their suitability for modeling all extant data frameworks for unstructured, semi-structured and structured data.
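To make the subject-predicate-object idea concrete, here is a minimal sketch in plain Python rather than an RDF library. The example.org URIs and the describe helper are hypothetical illustrations; only the rdf:type URI is a real W3C identifier.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/"  # hypothetical namespace, for illustration only

graph = [
    Triple(EX + "acme", RDF_TYPE, EX + "Company"),       # instance-to-class statement
    Triple(EX + "acme", EX + "locatedIn", EX + "iowa"),  # instance-to-instance link
    Triple(EX + "iowa", RDF_TYPE, EX + "State"),
]


def describe(subject, triples):
    """Return every (predicate, object) pair asserted about one subject."""
    return {(t.predicate, t.obj) for t in triples if t.subject == subject}


about_acme = describe(EX + "acme", graph)
```

Each statement is an edge in a graph, so new relationships are added by appending triples rather than by altering a table definition.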

2. Is publishing RDF sufficient to create linked data?

No. Linked data represents a set of techniques applied to the RDF data model that names all objects as URIs and makes them accessible via the HTTP protocol (as well as other considerations; see the definition above and further discussion below).

Some vendors and data providers claim linked data support, but if their data is not accessible via HTTP using URIs for data object identification, it is not linked data. Fortunately, it is relatively straightforward to convert non-compliant RDF to linked data.
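As a rough illustration of that test, the sketch below flags subject and predicate names that are not HTTP URIs. It is an assumption-laden check, not a conformance tool: it inspects only the naming, cannot tell whether a URI actually resolves, and skips objects because they may be literal values. The function names and sample data are hypothetical.

```python
from urllib.parse import urlparse


def is_http_uri(term: str) -> bool:
    """True when the term is an absolute http:// or https:// URI."""
    parts = urlparse(term)
    return parts.scheme in ("http", "https") and bool(parts.netloc)


def non_compliant_terms(triples):
    """List subject and predicate names that are not HTTP URIs.

    Objects are skipped because they may legitimately be literal values.
    """
    flagged = []
    for subject, predicate, _obj in triples:
        for term in (subject, predicate):
            if not is_http_uri(term) and term not in flagged:
                flagged.append(term)
    return flagged


# Hypothetical sample: the second subject uses a local, non-HTTP name.
sample = [
    ("http://example.org/acme", "http://example.org/locatedIn", "http://example.org/iowa"),
    ("urn:x-local:widget-42", "http://example.org/madeBy", "http://example.org/acme"),
]
problems = non_compliant_terms(sample)
```

Converting such data usually amounts to minting an HTTP URI for each flagged name and serving the statements at that address.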

3. How does one publish or deploy linked data?

There are some excellent references for how to publish linked data. Examples include a tutorial, How to Publish Linked Data on the Web, and a white paper, Deploying Linked Data, using the example of OpenLink's Virtuoso software. There are also recommended approaches and ways to use URI identifiers, such as the W3C's working draft, Cool URIs for the Semantic Web.

However, there are not yet published guidelines for how to meet the Structured Dynamics definition above, which also emphasizes class and context matching. A number of companies and consultants, including Structured Dynamics, presently provide such assistance.

The key principles are to make links aggressively between data items with appropriate semantics (properties or relations; that is, the predicate edges between the subject and object nodes of the triple), to use URIs for the object identifiers, and to expose everything for access via the HTTP Web protocol.
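On the consuming side, HTTP access typically means asking a data object's URI for an RDF representation. The sketch below only builds such a request with Python's standard library and does not send it; the URI is hypothetical, and whether a given server honors the Accept header is an assumption about that server.

```python
from urllib.request import Request

# Real RDF media types; the preference order is an illustrative choice.
RDF_MEDIA_TYPES = "text/turtle, application/rdf+xml;q=0.9"


def build_rdf_request(uri: str) -> Request:
    """Prepare (but do not send) an HTTP GET asking for RDF about a URI."""
    return Request(uri, headers={"Accept": RDF_MEDIA_TYPES})


request = build_rdf_request("http://example.org/id/acme")  # hypothetical URI
print(request.get_method(), request.full_url, request.get_header("Accept"))
```

Passing the prepared request to urllib.request.urlopen would perform the actual retrieval.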

4. Is linked data just another term or branding for the Semantic Web?

Absolutely not, though this is a source of some confusion at present.

The Semantic Web is probably best understood as a vision or goal where semantically rich annotation of data is used by machine agents to make connections, find information or do things automatically in the background on behalf of humans. We are on a path toward this vision or goal, but under this interpretation the Semantic Web is more of a process than a state. By understanding that the Semantic Web is a vision or goal we can see why a label such as 'Web 3.0' is perhaps simplistic and incomplete.

Linked data is a set of practices somewhere in the early middle of the spectrum from the initial Web of documents to this vision of the Semantic Web. (See this related blog post at bottom for a diagram of this spectrum.)

Linked data is here today, doable today, and pragmatic today. Meaningful semantic connections can be made and there are many other manifest benefits (see below) with linked data, but automatic reasoning in the background or autonomic behavior is not yet one of them.

Strictly speaking, then, linked data represents doable best practices today within the context both of Web access and of this yet unrealized longer-term vision of the Semantic Web.

5. Does linked data only apply to instance data?

Definitely not, though early practice has been interpreted by some as such.

Linked data requires the interplay and intersection of people, instances and schema. Early exposed linked data has been dominated by instance data from sources such as Wikipedia and has lacked the schema (class) relationships that enterprises are based upon. The people aspect, in terms of connections, collaboration and joint buy-in, is also the means for establishing trust in and authority for the data.

In Structured Dynamics' terminology, class-level mappings 'explode the domain' and produce information benefits similar to Metcalfe's Law as a function of the degree of class linkages [1]. While this network effect is well known to the community, it has not yet been shown much in current linked data sets. Schema define enterprise processes and knowledge structures. Demonstrating schema (class) relationships is the next appropriate task for the linked data community.

6. What role do ontologies play with linked data?

In an RDF context, ontologies are the vocabularies and structures that capture the schema structures noted above. Ontologies embody the class and instance definitions and the predicate (property) relations that enable legacy schemas and data to be transformed into linked data graphs.

Though many public RDF vocabularies and ontologies presently exist, and should be re-used where possible and where the semantics match the existing legacy information, enterprises will require specific ontologies reflective of their own data and information relationships.

Despite the newness or intimidation perhaps associated with the ontology term, ontologies are no more complex -- indeed, are simpler and more powerful -- than the standard relational schema familiar to enterprises. If you'd like, simply substitute schema for ontology and you will be saying the same thing in an RDF context.

7. Is linked data a centralized or federated approach?

Neither, really, though the rationale and justification for linked data is grounded in federating widely disparate sources of data that can also vary widely in existing formalism and structure.

Because linked data is a set of techniques and best practices for expressing, exposing and publishing data, it can easily be applied to either centralized or federated circumstances.

However, the real world where any and all potentially relevant data can be interconnected is by definition a varied, distributed, and therefore federated world. Because of its universal RDF data model and Web-based techniques for data expression and access, linked data is the perfect vehicle, finally, for data integration and interoperability without boundaries.

8. How does one maintain context when federating linked data?

The simple case is where two data sources refer to the exact same entity or instance (individual) with the same identity. The standard sameAs predicate is used to assert the equivalence in such cases.
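As a sketch of how such equivalence assertions can be consolidated, the code below groups identifiers connected by owl:sameAs statements using a small union-find. The predicate URI is the standard OWL one; the source identifiers and the same_as_groups helper are hypothetical.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_groups(triples):
    """Group identifiers that sameAs statements declare to be one entity."""
    parent = {}

    def find(term):
        # Follow parent links to the group representative, halving paths as we go.
        parent.setdefault(term, term)
        while parent[term] != term:
            parent[term] = parent[parent[term]]
            term = parent[term]
        return term

    for subject, predicate, obj in triples:
        if predicate == OWL_SAME_AS:
            parent[find(subject)] = find(obj)

    groups = {}
    for term in list(parent):
        groups.setdefault(find(term), set()).add(term)
    return list(groups.values())


# Hypothetical identifiers from three sources describing two real-world entities.
statements = [
    ("http://a.example/acme", OWL_SAME_AS, "http://b.example/org/17"),
    ("http://b.example/org/17", OWL_SAME_AS, "http://c.example/AcmeCorp"),
    ("http://a.example/iowa", OWL_SAME_AS, "http://c.example/Iowa"),
    ("http://a.example/acme", "http://a.example/locatedIn", "http://a.example/iowa"),
]
groups = same_as_groups(statements)
```

Because sameAs is transitive, the first three identifiers end up in one group even though the first and third are never linked directly.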

The more important case is where the data sources are about similar subjects or concepts, in which case a structure of well-defined reference classes is employed. Furthermore, if these classes can themselves be expressed in a graph structure capturing the relationships amongst the concepts, we now have some fixed points in the conceptual information space for relating and tying together disparate data. Still further, such a conceptual structure also provides the means to relate the people, places, things, organizations, events, etc., of the individual instances of the world to one another as well.

Any reference structure that is composed of concept classes that are properly related to each other may provide this referential glue or backbone.

One such structure provided in open source by Structured Dynamics is the 28,000 subject concept node structure of UMBEL, itself derived from the OpenCyc knowledge base. In any event, such broad reference structures may often be accompanied by more specific domain conceptual ontologies to provide focused domain-specific context.
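The role of a reference structure can be sketched with a toy concept hierarchy. Everything named here is hypothetical: the concepts are not actual UMBEL subject concepts, and a single-parent hierarchy is a simplifying assumption (real reference structures are graphs).

```python
# child concept -> broader concept (hypothetical, single-parent for simplicity)
BROADER = {
    "srcA:Sedan": "ref:Automobile",
    "srcB:Truck": "ref:MotorVehicle",
    "ref:Automobile": "ref:MotorVehicle",
    "ref:MotorVehicle": "ref:Product",
}

# instance -> the local class its own source assigns to it
INSTANCE_TYPES = {
    "srcA:car17": "srcA:Sedan",
    "srcB:rig4": "srcB:Truck",
}


def concept_path(concept):
    """Return the concept followed by each successively broader concept."""
    path = [concept]
    while path[-1] in BROADER:
        path.append(BROADER[path[-1]])
    return path


def shared_concept(instance_a, instance_b):
    """Find the most specific reference concept covering both instances."""
    path_b = set(concept_path(INSTANCE_TYPES[instance_b]))
    for concept in concept_path(INSTANCE_TYPES[instance_a]):
        if concept in path_b:
            return concept
    return None


common = shared_concept("srcA:car17", "srcB:rig4")
```

Neither source mentions the other's classes, yet both instances can be related because their local classes attach to the same reference backbone.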

9. Does data need to be open to qualify as linked data?

No, absolutely not.

While linked data has, to date, been demonstrated using public Web data, and many desire to expose more through the open data movement, nothing prevents private, proprietary or subscription data from being linked data.

The Linking Open Data (LOD) group, originally formed to showcase linked data techniques, began with open data. To sever the idea that linked data applies only to open data, François-Paul Servant has specifically identified the parallel concept of Linking Enterprise Data (and see also the accompanying slides).

For example, with linked data (and not the more restrictive LOD sense), two or more enterprises or private parties can legitimately exchange private linked data over a private network using HTTP. As another example, linked data may be exchanged on an intranet between different departments, etc.

So long as the principles of URI naming, HTTP access, and linking predicates where possible are maintained, the approach qualifies as linked data.

10. Can legacy data be expressed as linked data?

Absolutely yes, without reservation. Indeed, non-transactional legacy data perhaps should be expressed as linked data in order to gain its manifest benefits. See #14 below.

11. Can enterprise and open or public data be intermixed as linked data?

Of course. Since linked data can be applied to any data formalism, source or schema, it is perfectly suited to integrating data from inside and outside the firewall, open or private.

12. How does one query or access linked data?

The basic query language for linked data is SPARQL (pronounced "sparkle"), which closely resembles SQL but applies to an RDF data graph. The actual datastores applied to RDF may also add a fourth element to the triple to identify the graph (namespace) to which a statement belongs, which can bring access and scale efficiencies. In these cases, the system is known as a 'quad store'. Additional data filtering techniques may be applied prior to the SPARQL query for further efficiencies.
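The idea of a quad and of pattern-based querying can be sketched without a real datastore. The code below is a toy in-memory matcher, not SPARQL and not any vendor's engine; None plays the role that a variable plays in a SPARQL pattern, and all URIs are hypothetical.

```python
def match(quads, s=None, p=None, o=None, g=None):
    """Return quads matching a pattern; None acts as a wildcard (a 'variable')."""
    pattern = (s, p, o, g)
    return [
        quad for quad in quads
        if all(want is None or want == got for want, got in zip(pattern, quad))
    ]


EX = "http://example.org/"  # hypothetical namespace

# (subject, predicate, object, graph): the fourth element names the source graph.
store = [
    (EX + "acme", EX + "locatedIn", EX + "iowa", EX + "graph/crm"),
    (EX + "acme", EX + "employs", EX + "pat", EX + "graph/hr"),
    (EX + "zenith", EX + "locatedIn", EX + "ohio", EX + "graph/crm"),
]

# Roughly: "find everything located somewhere, restricted to the CRM graph".
located = match(store, p=EX + "locatedIn", g=EX + "graph/crm")

# Restricting by graph first narrows the data before any further matching.
hr_only = match(store, g=EX + "graph/hr")
```

The graph element is what allows a store to scope a query to one source without scanning statements from all of them.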

Templated SPARQL queries and other techniques can lead to very efficient and rapid deployment of various Web services and reports, two techniques often applied by Structured Dynamics and other vendors. For example, UMBEL Web services are expressed using such SPARQL templates.
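A minimal sketch of the templating idea follows, using Python's string.Template to fill slots in a fixed SPARQL query. This is an illustration only and not the templates used by UMBEL or any vendor; the function name, the validation rule and the example URI are assumptions, while rdfs:label is a real W3C property.

```python
import re
from string import Template

# The query text is fixed; only the $subject and $limit slots vary per request.
LABEL_QUERY = Template(
    "SELECT ?label WHERE { "
    "<$subject> <http://www.w3.org/2000/01/rdf-schema#label> ?label "
    "} LIMIT $limit"
)

# Reject characters that could break out of the <...> URI delimiters.
_UNSAFE = re.compile(r'[<>"{}|\\^`\s]')


def label_query(subject_uri: str, limit: int = 10) -> str:
    """Fill the template after validating both inputs."""
    if _UNSAFE.search(subject_uri):
        raise ValueError("subject_uri contains characters not allowed in a URI reference")
    if not isinstance(limit, int) or limit < 1:
        raise ValueError("limit must be a positive integer")
    return LABEL_QUERY.substitute(subject=subject_uri, limit=limit)


query = label_query("http://example.org/acme", limit=5)  # hypothetical URI
```

A Web service built this way needs only the slot values from the caller; the resulting query string would then be sent to whatever SPARQL endpoint hosts the data.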

This SPARQL templating approach may also be combined with the use of templating standards such as Fresnel to bind instance data to display templates.

13. How is access control or security maintained around linked data?

In Structured Dynamics' view, access control or security occurs at the layer of the HTTP access and protocols, and not at the linked data layer. Thus, the same policies and procedures that have been developed for general Web access and security are applicable to linked data.

However, standard data level or Web server access and security can be enhanced by the choice of the system hosting the data. Structured Dynamics, for example, uses OpenLink's Virtuoso universal server that has proven and robust security mechanisms. Additionally, it is possible to express security and access policies using RDF ontologies as well. These potentials are largely independent of linked data techniques.

The key point is that there is nothing unique or inherent to linked data with respect to access or control or security that is not inherent with standard Web access. If a given link points to a data object from a source that has limited or controlled access, its results will not appear in the final results graph for those users subject to access restrictions.
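That last behavior can be sketched as a simple filter over results tagged with their source. The access table, user names and function below are hypothetical, and denying access to sources with no entry is a design assumption; in practice the restriction would be enforced by the Web server or hosting system, as described above.

```python
def visible_results(quads, user, acl):
    """Keep only statements whose source the given user is allowed to read.

    Sources missing from the access table are treated as restricted.
    """
    return [
        (subject, predicate, obj)
        for subject, predicate, obj, source in quads
        if user in acl.get(source, set())
    ]


# Hypothetical results gathered from two sources, one public and one restricted.
results = [
    ("ex:acme", "ex:locatedIn", "ex:iowa", "public-site"),
    ("ex:acme", "ex:creditLimit", "50000", "finance-intranet"),
]
access = {
    "public-site": {"guest", "analyst"},
    "finance-intranet": {"analyst"},
}

guest_view = visible_results(results, "guest", access)
analyst_view = visible_results(results, "analyst", access)
```

The guest simply sees a smaller results graph; nothing about the linked data itself changes.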

14. What are the enterprise benefits of linked data? (Why adopt it?)

For more than 30 years -- since the widespread adoption of electronic information systems by enterprises -- the Holy Grail has been complete, integrated access to all data. With linked data, that promise is now at hand. Here are some of the key enterprise benefits to linked data, which provide the rationales for adoption:

  • Via the RDF model, equal applicability to unstructured, semi-structured, and structured data and content
  • Elimination of internal data 'silos'
  • Integration of internal and external data
  • Easy interlinkage of enterprise, industry-standard, open public and public subscription data
  • Complete data modeling of any legacy schema
  • Flexible and easy updates and changes to existing schema
  • An end to the need to re-architect legacy schema resulting from changes to the business or M & A
  • Report creation and data display based on templates and queries, not requiring manual crafting
  • Data access, analysis and manipulation pushed out to the user level, and
  • The ability of internal linked data stores to be maintained by existing DBA procedures and assets.

15. What are early applications or uses of linked data?

Linked data is well suited to traditional knowledge base or knowledge management applications. Its near-term application to transactional or material process applications is less apparent.

Of special use is the value-added from connecting existing internal and external content via the network effect from the linkages [1].



[1] Metcalfe's law states that the value of a telecommunications network is proportional to the square of the number of users of the system (n²), where the linkages between users (nodes) exist by definition. For information bases, the data objects are the nodes. Linked data works to add the connections between the nodes. We can thus modify the original sense to become the Linked Data Law: the value of a linked data network is proportional to the square of the number of links between the data objects.