Alberto Pepe edited introduction.tex  about 11 years ago

Commit id: f3c95c53d98f5717c8f15f0fec85a33e6532bb79


\section{Introduction}
The World Wide Web introduced a set of standards and protocols that has led to the development of a collectively generated graph of web resources. Individuals participate in creating this graph by contributing digital resources (e.g.~documents and images) and linking them together by means of dereferenceable Hypertext Transfer Protocol (HTTP) Uniform Resource Identifiers (URIs) \cite{lee94}. While the World Wide Web is primarily a technology that came to fruition in the early nineties, much of the inspiration that drove its development came earlier, from systems such as Vannevar Bush's visionary Memex device \cite{vbush} and Ted Nelson's Xanadu \cite{nelsonht}. What the World Wide Web provided that made it excel as the \textit{de facto} standard was a common, relatively simple, distributed platform for the exchange of digital information. The World Wide Web has had such a strong impact on the processes of communication and cognition that it can be regarded as a revolution in the history of human thought -- following those of language, writing and print \cite{harnad1991}. While the World Wide Web has provided an infrastructure that has revolutionized the way in which many people go about their daily lives, it has become apparent over the years that there are shortcomings to its design. Many of the standards developed for the World Wide Web lack a mechanism for representing ``meaning'' in a form that can be easily interpreted and used by machines. For instance, the majority of the Web is made up of a vast collection of Hypertext Markup Language (HTML) documents. HTML documents are structured such that a computer can discern the intended layout of the information contained within a document, but the content itself is expressed in natural language and is thus understandable only to humans. Furthermore, all HTML documents link web resources according to a single type of relationship.
The meaning of a hypertext relationship can be loosely interpreted as ``cites'' or ``related to''. The finer, specific meaning of this relationship is not made explicit in the link itself. In many cases, this finer meaning is made explicit within the HTML document. Unfortunately, without sophisticated text analysis algorithms, machines are not privy to the communication medium of humans. Yet, even within the single relationship model, machines have performed quite well in supporting humans as they go about discovering and sharing information on the World Wide Web \cite{anatom:brin1998,hits:kleinberg1999,topic:haveliwala2002,tagging:hub2006}.
%\begin{footnotesize}
\begin{quote}
The Web was designed as an information space, with the goal that it should be useful not only for human-human communication, but also that machines would be able to participate and help. One of the major obstacles to this has been the fact that most information on the Web is designed for human consumption, and even if it was derived from a database with well defined meanings (in at least some terms) for its columns, that the structure of the data is not evident to a robot browsing the web. \cite{berners:roadmap1998}
\end{quote}
%\end{footnotesize}
As a remedy to the aforementioned shortcoming of the World Wide Web, the Semantic Web initiative has introduced a standard data model which makes explicit the type of relationship that exists between two web resources \cite{lee:semantic2001,pubsem:lee2001}. Furthermore, the Linked Data community has not only seen a need to link existing web resources in meaningful ways, but also a need to link the large amounts of non-typical web data (e.g.~database information) \cite{linkeddata:bizer2008}.\footnote{The necessity to expose large amounts of data on the Semantic Web has driven the development of triple-store technology.
Advanced triple-store technology parallels relational database technologies by providing an efficient medium for the storage and querying of semantic graphs \cite{lee:triple2004,owlim:kiryakov2005,agraph:aasman2006}.} The standard for relating web resources on the Semantic Web is the Resource Description Framework (RDF) \cite{lee:semantic2001,rdfcon:klyne2004}. RDF is a data model\footnote{RDF is a data model, not a serialization format. Various standard serialization formats exist, such as RDF/XML, N3 \cite{n3:lee1998}, Turtle \cite{turtle:beckette2006}, and TriX \cite{trix:carroll2004}.} that is used to create graphs of the form
%%%
\begin{equation}
R \subseteq \underbrace{(U \cup B)}_\text{subject} \times \underbrace{U}_\text{predicate} \times \underbrace{(U \cup B \cup L)}_\text{object},
\end{equation}
%%%
where $U$ is the infinite set of all URIs \cite{uri:2001,uri:berners2005}, $B$ is the infinite set of all blank nodes, and $L$ is the infinite set of all literal values.\footnote{Other formalisms exist for representing an RDF graph such as the directed labeled graph, bipartite graph \cite{hayes:birdf2004}, and directed hypergraph models \cite{hyperrdf:2006}.} An element in $R$ is known as a statement, or triple, and it is composed of three ordered elements: a subject, a predicate, and an object. A statement in RDF asserts a fact about the world.
%\begin{footnotesize}
\begin{quote}
``The basic intuition of model-theoretic semantics is that asserting a sentence makes a claim about the world: it is another way of saying that the world is, in fact, so arranged as to be an interpretation which makes the sentence true. In other words, an assertion amounts to stating a constraint on the possible ways the world might be.''
\cite{rdfsem:hayes2004}
\end{quote}
%\end{footnotesize}
%%%
An example RDF statement is $(\texttt{lanl:marko}, \texttt{foaf:knows}, \texttt{ucla:apepe})$.\footnote{All resources in this article have been prefixed in order to shorten their lengthy namespaces. For example, \texttt{foaf:knows}, in its extended form, is \texttt{http://xmlns.com/foaf/0.1/knows}.} This statement makes a claim about the world: namely that ``Marko knows Alberto''. The \texttt{foaf:knows} predicate defines the meaning of the link that connects the subject \texttt{lanl:marko} to the object \texttt{ucla:apepe}. On the World Wide Web, the only way that such semantics could be derived in a computationally efficient manner would be to note that in Marko's webpage there exists an \texttt{href} link to Alberto's webpage. While this web link does not necessarily mean that ``Marko knows Alberto'', it is the simplest means, without text analysis techniques, to recognize that there exists a relationship between Marko and Alberto. Thus, for machines, the World Wide Web is a homogeneous world of generic relationships. On the Semantic Web, the world is a rich, complicated network of meaningful relationships. The evolution from the World Wide Web to the Semantic Web has brought greater meaning and expressiveness to our largest digital information repository \cite{sem:hellman1999}. This explicit meaning provides machines with a richer landscape for supporting humans in their information discovery and sharing activities. However, while links are typed on the Semantic Web, the meaning of the type is still primarily based on human interpretation. Granted, this meaning is identified by a URI; for the machine, however, there exists no meaning, just symbols that it has been ``hardwired'' to handle \cite{uschold:sem2001}.
%\begin{footnotesize}
\begin{quote}
``Machine usable content presumes that the machine knows what to do with information on the Web.
One way for this to happen is for the machine to read and process a machine-sensible specification of the semantics of the information. This is a robust and very challenging approach, and largely beyond the current state of the art. A much simpler alternative is for the human Web application developers to hardwire the knowledge into the software so that when the machine runs the software, it does the correct thing with the information.'' \cite{uschold:sem2001}
\end{quote}
%\end{footnotesize}
%%%
Because relationships among resources are only denoted by a URI, many issues arise around the notion of \textit{context}. Context-sensitive algorithms have been developed to deal with problems such as term disambiguation \cite{schema:magnini2003}, naming conflicts \cite{context:tierney2005}, and ontology integration \cite{wache:onto2001,graphont:udrea2005}. The cause of such problems is the fact that statements, by themselves, ignore the situatedness that defines the semantics of such assertions \cite{Floridi2007Web-2.0}. Similar criticism directed towards the issue of semantics has appeared in other specialized literature \cite{know:sowa1999,woods2004,zadeh2002}. Sheth et al. \cite{sheth:implicit2005} have framed this issue well and have provided a clear distinction between the various levels of meaning on the Semantic Web.\footnote{A similar distinction is also presented in \cite{uschold:sem2001}.}
%%%
\begin{itemize}
\item \textit{Implicit} semantics reside within the minds of humans as a collective consensus and, as such, are not explicitly recorded in some machine-processable medium.
\item \textit{Formal} semantics are in a machine-readable format in the form of an ontology and are primarily used for human consumption and for machine hardwiring.
\item \textit{Soft} semantics are extracted from probabilistic and fuzzy reasoning mechanisms supporting degrees of membership and certainty.
\end{itemize}
The model proposed in this article primarily falls within the domain of soft semantics. Simply put, the purpose of the model is to supplement a statement with other statements. These other statements, while being part of the RDF graph itself, serve to contextualize the original statement.
%\begin{footnotesize}
\begin{quote}
``Contextualization is a word first used in sociolinguistics to refer to the use of language and discourse to signal relevant aspects of an interactional or communicative situation.''\footnote{Wikipedia (\texttt{http://en.wikipedia.org/wiki/Contextualization}).}
\end{quote}
%\end{footnotesize}
%%%
The supplementary statements serve to expose the relevant aspects of the interaction between a subject and an object that are tied by a relationship defined by a predicate. With respect to the example of the statement $(\texttt{lanl:marko}, \texttt{foaf:knows}, \texttt{ucla:apepe})$, supplementary statements help to answer the question: ``What do you mean by `Marko knows Alberto'?''. A notion from Ludwig Wittgenstein's theory of ``language games'' can be aptly borrowed: the meaning of a concept is not universal and set in stone, but shaped by ``a complicated network of similarities, overlapping and criss-crossing'' \cite{witten:pi1973}. Following this line of thought, this article proposes a ``dilated'' model of an RDF triple. The dilated triple contextualizes the meaning and enhances the expressiveness of assertions on the Semantic Web.
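
As a minimal illustration of how the preceding definitions fit together, consider again the statement $(\texttt{lanl:marko}, \texttt{foaf:knows}, \texttt{ucla:apepe})$. All three of its elements are URIs, so
\[
(\texttt{lanl:marko}, \texttt{foaf:knows}, \texttt{ucla:apepe}) \in U \times U \times U \subseteq (U \cup B) \times U \times (U \cup B \cup L),
\]
and the statement is therefore an admissible element of an RDF graph $R$. Supplementary statements that might contextualize it could take a form such as
\[
(\texttt{lanl:marko}, \texttt{ex:coauthorOf}, \texttt{ex:article1})
\quad \text{and} \quad
(\texttt{ucla:apepe}, \texttt{ex:coauthorOf}, \texttt{ex:article1}),
\]
where the \texttt{ex:} prefix, the predicate \texttt{ex:coauthorOf}, and the resource \texttt{ex:article1} are hypothetical names introduced here only for illustration. Read together with the original statement, such statements would suggest that ``knows'' is meant in the sense of a scholarly collaboration rather than, say, a family tie. This is only a sketch of the intuition under these assumed names, not a formal definition of the dilated triple.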