The dilated triple

Abstract

This article was published as The dilated triple. Marko A. Rodriguez, Alberto Pepe, Joshua Shinavier. In: Emergent Web Intelligence: Advanced Semantic Technologies, Advanced Information and Knowledge Processing series, Pages 3-16, ISBN:978-1-84996-076-2, Springer-Verlag. 2010.

The basic unit of meaning on the Semantic Web is the RDF statement, or triple, which combines a distinct subject, predicate and object to make a definite assertion about the world. A set of triples constitutes a graph, to which they give a collective meaning. It is upon this very simple foundation that the rich, complex knowledge structures of the Semantic Web are built. Yet the very expressiveness of RDF, by inviting comparison with real-world knowledge, highlights a fundamental shortcoming of RDF: it is limited to statements of absolute fact, in contrast to the thoroughly context-sensitive nature of human thought. However, when a statement is interpreted from beyond the scope of its local graph representation, other statements augment its meaning and identify its uniqueness. Following this line of thought, a model is presented in which each statement in an RDF graph is supplemented by some subjectively related subgraph of the same RDF graph, thereby framing the meaning of the statement within a broader context.

Introduction

The World Wide Web introduced a set of standards and protocols that has led to the development of a collectively generated graph of web resources. Individuals participate in creating this graph by contributing digital resources (e.g. documents, images, etc.) and linking them together by means of dereferenceable Hypertext Transfer Protocol (HTTP) Uniform Resource Identifiers (URIs) (Berners-Lee 1994). While the World Wide Web came to fruition in the early nineties, much of the inspiration behind it came earlier, from systems such as Vannevar Bush’s visionary Memex device (Bush 1945) and Ted Nelson’s Xanadu (Nelson 1981). What allowed the World Wide Web to become the de facto standard was that it provided a common, relatively simple, distributed platform for the exchange of digital information. The World Wide Web has had such a strong impact on the processes of communication and cognition that it can be regarded as a revolution in the history of human thought, following those of language, writing and print (Harnad 1991).

While the World Wide Web has provided an infrastructure that has revolutionized the way in which many people go about their daily lives, over the years it has become apparent that there are shortcomings to its design. Many of the standards developed for the World Wide Web lack a mechanism for representing “meaning” in a form that can be easily interpreted and used by machines. For instance, the majority of the Web is made up of a vast collection of Hypertext Markup Language (HTML) documents. HTML documents are structured such that a computer can discern the intended layout of the information contained within a document, but the content itself is expressed in natural language and is thus understandable only to humans. Furthermore, all HTML documents link web resources according to a single type of relationship. The meaning of a hypertext relationship can be loosely interpreted as “cites” or “related to”. The finer, more specific meaning of this relationship is not made explicit in the link itself. In many cases, this finer meaning is made explicit in the natural-language text of the HTML document. Unfortunately, without sophisticated text analysis algorithms, machines are not privy to this communication medium of humans. Yet, even within the single-relationship model, machines have performed quite well in supporting humans as they go about discovering and sharing information on the World Wide Web (Brin 1998, Kleinberg 1999, Haveliwala 2002, Golder 2006).

“The Web was designed as an information space, with the goal that it should be useful not only for human-human communication, but also that machines would be able to participate and help. One of the major obstacles to this has been the fact that most information on the Web is designed for human consumption, and even if it was derived from a database with well defined meanings (in at least some terms) for its columns, that the structure of the data is not evident to a robot browsing the web.” (Berners-Lee 1998)

As a remedy to the aforementioned shortcoming of the World Wide Web, the Semantic Web initiative has introduced a standard data model which makes explicit the type of relationship that exists between two web resources (Berners-Lee 2001, Berners-Lee 2001a). Furthermore, the Linked Data community has seen a need not only to link existing web resources in meaningful ways, but also to link the large amounts of data not typically published on the Web (e.g. database information) (Bizer 2008). The standard for relating web resources on the Semantic Web is the Resource Description Framework (RDF) (Berners-Lee 2001, Klyne 2004). RDF is a data model that is used to create graphs of the form $R \subseteq \underbrace{(U \cup B)}_{\text{subject}} \times \underbrace{U}_{\text{predicate}} \times \underbrace{(U \cup B \cup L)}_{\text{object}}$, where $U$ is the infinite set of all URIs (W3C/IETF 2001, Berners-Lee 2005), $B$ is the infinite set of all blank nodes, and $L$ is the infinite set of all literal values. An element of $R$ is known as a statement, or triple, and is an ordered combination of three elements: a subject, a predicate, and an object. A statement in RDF asserts a fact about the world.
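The definition of $R$ can be made concrete with a small Python sketch. The class names `URI`, `BlankNode`, `Literal` and `Graph`, the function `is_valid_triple`, and the `example.org` URIs are hypothetical illustrations, not the API of any RDF library; the point is only that each of the three positions of a triple accepts a different set of term types, as the product $(U \cup B) \times U \times (U \cup B \cup L)$ prescribes.

```python
from dataclasses import dataclass
from typing import Set, Tuple, Union


# Hypothetical term types standing in for the sets U, B and L.
@dataclass(frozen=True)
class URI:
    value: str


@dataclass(frozen=True)
class BlankNode:
    label: str


@dataclass(frozen=True)
class Literal:
    value: str


Term = Union[URI, BlankNode, Literal]


def is_valid_triple(s: Term, p: Term, o: Term) -> bool:
    """Check membership in (U ∪ B) × U × (U ∪ B ∪ L)."""
    subject_ok = isinstance(s, (URI, BlankNode))             # subject ∈ U ∪ B
    predicate_ok = isinstance(p, URI)                        # predicate ∈ U
    object_ok = isinstance(o, (URI, BlankNode, Literal))     # object ∈ U ∪ B ∪ L
    return subject_ok and predicate_ok and object_ok


class Graph:
    """A graph R is modelled as a plain set of valid (ordered) triples."""

    def __init__(self) -> None:
        self.triples: Set[Tuple[Term, Term, Term]] = set()

    def add(self, s: Term, p: Term, o: Term) -> None:
        if not is_valid_triple(s, p, o):
            raise ValueError(f"not a valid RDF triple: {(s, p, o)!r}")
        self.triples.add((s, p, o))


g = Graph()
g.add(URI("http://example.org/marko"),
      URI("http://xmlns.com/foaf/0.1/knows"),
      URI("http://example.org/alberto"))

# A literal may appear only in the object position, so this is rejected:
print(is_valid_triple(Literal("Marko"),
                      URI("http://xmlns.com/foaf/0.1/knows"),
                      URI("http://example.org/alberto")))  # False
```

Frozen dataclasses are hashable, so a graph can be held in an ordinary Python set, mirroring the set-theoretic definition of $R$. Production libraries such as rdflib provide richer term classes (for example, with datatype and language-tag support for literals).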

“The basic intuition of model-theoretic semantics is that asserting a sentence makes a claim about the world: it is another way of saying that the world is, in fact, so arranged as to be an interpretation which makes the sentence true. In other words, an assertion amounts to stating a constraint on the possible ways the world might be.” (Hayes 2004)

An example RDF statement is (aureplacedverbatimaa , aureplacedverbatimaaa , aureplacedverbatimaaaa ). (Note: all resources in this article have been prefixed in order to shorten their lengthy namespaces. For example, aureplacedverbatimaaaaa , in its extended form, is aureplacedverbatimaaaaaa .) This statement makes a claim about the world: namely that “Marko knows Alberto”. The aureplacedverbatimaaaaaaa predicate defines the meaning of the link that connects the subject aureplacedverbatimaaaaaaaa to the object aureplacedverbatimaaaaaaaaa . On the World Wide Web, the only computationally efficient way to derive such semantics would be to note that Marko’s webpage contains an aureplacedverbatimaaaaaaaaaa link to Alberto’s webpage. While this web link does not necessarily mean that “Marko knows Alberto”, it is the simplest means, without text analysis techniques, to recognize that there exists a relationship between Marko and Alberto. Thus, for machines, the World Wide Web is a homogeneous world of generic relationships, whereas on the Semantic Web the world is a rich, complicated network of meaningful relationships.
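The prefixing convention, and the contrast between an untyped hyperlink and a typed statement, can be sketched in Python. The `ex:` namespace, the webpage URLs, the `PREFIXES` table and the helper `expand` are hypothetical; only `http://xmlns.com/foaf/0.1/` is a real namespace (that of the FOAF vocabulary), and its `knows` property is used here as an assumed predicate for “Marko knows Alberto”.

```python
# Hypothetical prefix table; only the foaf entry is a real vocabulary namespace.
PREFIXES = {
    "foaf": "http://xmlns.com/foaf/0.1/",
    "ex": "http://example.org/people/",  # illustrative namespace (assumption)
}


def expand(prefixed_name: str) -> str:
    """Expand a prefixed name such as 'foaf:knows' into its full URI."""
    prefix, sep, local = prefixed_name.partition(":")
    if not sep or prefix not in PREFIXES:
        raise KeyError(f"unknown prefix in {prefixed_name!r}")
    return PREFIXES[prefix] + local


# Web of documents: a hyperlink is an untyped (source, target) pair,
# so a machine only learns that *some* relationship exists.
hyperlink = ("http://example.org/people/marko.html",
             "http://example.org/people/alberto.html")

# Semantic Web: the predicate URI names the kind of relationship.
statement = tuple(expand(name) for name in ("ex:marko", "foaf:knows", "ex:alberto"))

print(statement[1])  # http://xmlns.com/foaf/0.1/knows
```

The hyperlink carries two pieces of information and the statement three; the extra predicate is exactly what lets a machine distinguish “knows” from any other relationship between the same two resources.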

The evolution from the World Wide Web to the Semantic Web has brought greater meaning and expressiveness to our largest digital information repository (Hellman 1999). This explicit meaning provides machines with a richer landscape for supporting humans in their information discovery and sharing activities. However, while links are typed on the Semantic Web, the meaning of the type is still primarily based on human interpretation. Granted, this meaning is identified by a URI; however, for the machine there is no meaning, only symbols that it has been “hardwired” to handle (Uschold 2001).

“Machine usable content presumes that the machine knows what to do with information on the Web. One way for this to happen is for the machine to read and process a machine-sensible specification of the semantics of the information. This is a robust and very challenging approach, and largely beyond the current state of the art. A much simpler alternative is for the human Web application developers to hardwire the knowledge into the software so that when the machine runs the software, it does the correct thing with the information.” (Uschold 2001)

Because relationships among resources are denoted only by a URI, many issues arise around the notion of context. Context-sensitive algorithms have been developed to deal with problems such as term disambiguation (Magnini 2003), naming conflicts (Tierney 2005), and ontology integration (Wache 2001, Udrea 2005). The cause of such problems is the fact that statements, by themselves, ignore the situatedness that defines the semantics of such assertions (Floridi 2007). Similar criticism of the treatment of semantics has appeared in other specialized literature (Sowa 1999, Woods 2004, Zadeh 2002). Sheth et al. (Sheth 2005) have framed this issue well and have provided a clear distinction between the various levels of meaning on the Semantic Web.

• Implicit semantics reside within the minds of humans as a collective consensus and, as such, are not explicitly recorded in some machine-processable medium.

• Formal semantics are expressed in a machine-readable format, in the form of an ontology, and are primarily used for human consumption and for machine hardwiring.

• Soft semantics are extracted from probabilistic and fuzzy reasoning mechanisms that support degrees of membership and certainty.

The model proposed in this article primarily falls within the domain of soft semantics. Simply put, the purpose of the model is to supplement a statement with other statements. These other statements, while being part of the RDF graph itself, serve to contextualize the original statement.

“Contextualization is a word first used in sociolinguistics to refer to the use of language and discourse to signal relevant aspects of an interactional or communicative situation.” (Wikipedia)

The supplementary statements serve to expose the relevant aspects of the interaction between a subject and an object that are tied by a relationship defined by a predicate. With respect to the example of the statement (aureplacedverbatimaaaaaaaaaaa , aureplacedverbatimaaaaaaaaaaaa , aureplacedverbatimaaaaaaaaaaaaa ), supplementary statements help to answer the question: “What do you mean by ‘Marko knows Alberto’?”. A notion from Ludwig Wittgenstein’s theory of “language games” can be aptly borrowed: the meaning of a concept is not universal and set in stone, but shaped by “a complicated network of similarities, overlapping and criss-crossing” (Wittgenstein 1973). Following this line of thought, this article proposes a “dilated” model of an RDF triple. The dilated triple contextualizes the meaning and enhances the expressiveness of assertions on the Semantic Web.
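To make the idea of supplementing a statement with part of its own graph more tangible, the following Python sketch computes one very simple candidate context: every other statement in the same graph that mentions the statement’s subject or object. This one-hop neighbourhood is an assumption chosen for illustration, not the article’s definition of a dilated triple; the function name `dilate` and the `ex:` resources are hypothetical.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def dilate(statement: Triple, graph: Set[Triple]) -> Set[Triple]:
    """Return a context subgraph for `statement` drawn from `graph`.

    Assumption for illustration: the context is every *other* triple in the
    graph whose subject or object is the statement's subject or object
    (a one-hop neighbourhood).
    """
    s, _, o = statement
    anchors = {s, o}
    return {
        t for t in graph
        if t != statement and (t[0] in anchors or t[2] in anchors)
    }


# Hypothetical example graph.
knows = ("ex:marko", "foaf:knows", "ex:alberto")
graph = {
    knows,
    ("ex:marko", "ex:memberOf", "ex:projectX"),
    ("ex:alberto", "ex:memberOf", "ex:projectX"),
    ("ex:projectX", "ex:topic", "ex:semanticWeb"),
}

context = dilate(knows, graph)
for t in sorted(context):
    print(t)
```

In this toy graph the context offers one reading of “Marko knows Alberto”: both are members of the same project. The statement about the project’s topic is excluded because it shares no resource with the original triple; richer contexts could follow longer paths or weight statements by relevance, in the spirit of the soft semantics described above.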