Authorea

Authorea Admin Authorea admin: resolving inconsistent repository state almost 9 years ago

Commit id: 3c4ec88e6a5fa2576f14bcf6515d725ce183471f

\section{\label{sec:process}Comparing Contexts} The ``Marko knows'' examples from the previous section are reused in this section to explain how dilated triples can assist a machine in discerning and comparing the broader meaning of a statement. In order to present this example, the notion of a contextualized process is introduced. A contextualized process, as defined here, is a human or machine that maintains a perspective or expectation of how the world must be. \begin{quote}

A simple way for the process to distinguish between the various interpretations of \ttt{foaf:knows} is to intersect its history with the context of each relationship. In other words, the process can compare its history subgraph with the subgraph that constitutes a dilated triple. If $H \subseteq R$ is a graph defining the history of the process, and that history includes the process' traversal through the scholarly aspects of Marko, then $| H \cap T_x | > | H \cap T_y |$, because the process' scholarly perspective is more related to Marko and Alberto than it is to Marko and Carole. That is, the process' history $H$ has more triples in common with $T_x$ than with $T_y$. Thus, what the process means by \texttt{foaf:knows} is a ``scholarly'' \texttt{foaf:knows}. This idea is diagrammed in Figure 4, where $H$ has more in common with $T_x$ than with $T_y$; an intersection of these sets would therefore yield a solution to the query (\texttt{lanl:marko}, \texttt{foaf:knows}, \texttt{?o}) that includes Alberto and not Carole. (Note: $H$ need not be a dynamic context that is generated as a process moves through an RDF graph. $H$ can also be seen as a static, hardwired ``expectation'' of what the process should perceive. For instance, $H$ could include ontological triples and known instance triples. In such cases, querying for such relationships as \texttt{foaf:knows}, \texttt{foaf:fundedBy}, \texttt{foaf:memberOf}, etc. would yield results related to $H$, biasing the results towards those relationships that are most representative of the process' expectations.) In other words, the history of the process ``blinds'' the process in favor of interpreting its place in the graph from the scholarly angle. (Note: This notion is sometimes regarded as a ``reality tunnel'' \cite{nerosoc:wilson1979,prome:wilson1983}.) The trivial intersection method of identifying the degree of similarity between two graph structures can be extended.
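The intersection test described above can be made concrete with a small sketch. The following Python fragment is an illustration only: it assumes triples are plain tuples, and every \texttt{ex:} resource, as well as the helper names \texttt{overlap} and \texttt{best\_matches}, is hypothetical and not part of the original model.

```python
from typing import Dict, List, Set, Tuple

Triple = Tuple[str, str, str]


def overlap(history: Set[Triple], dilation: Set[Triple]) -> int:
    """Return |H ∩ T|, the number of triples shared by a history and a dilation."""
    return len(history & dilation)


def best_matches(history: Set[Triple],
                 dilations: Dict[Triple, Set[Triple]]) -> List[Triple]:
    """Return the triples whose dilated forms share the most triples with the history."""
    scores = {tau: overlap(history, t_tau) for tau, t_tau in dilations.items()}
    top = max(scores.values(), default=0)
    return [tau for tau, score in scores.items() if score == top]


# The two asserted triples from the running example.
x = ("lanl:marko", "foaf:knows", "ucla:apepe")
y = ("lanl:marko", "foaf:knows", "cap:carole")

# Hypothetical supplementary triples (assumptions made for illustration).
coauthor = ("lanl:marko", "ex:coauthoredWith", "ucla:apepe")
authored = ("lanl:marko", "ex:authored", "ex:paper1")
T_x = {x, coauthor, authored}
T_y = {y,
       ("cap:carole", "ex:motherOf", "lanl:marko"),
       ("lanl:marko", "ex:bornIn", "ex:fairfield")}

# A scholarly history: the process has already traversed these triples.
H = {coauthor, authored, ("lanl:marko", "ex:memberOf", "ex:lanl")}

# The scholarly history overlaps T_x but not T_y, so only x is returned.
matches = best_matches(H, {x: T_x, y: T_y})
```

Under these assumed data, the query (\texttt{lanl:marko}, \texttt{foaf:knows}, \texttt{?o}) would resolve to Alberto only, since the object of the single best-matching triple is \texttt{ucla:apepe}.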
Other algorithms, such as those based on spreading activation within a semantic graph \cite{spread:collins1975,inform:cohen1987,search:crestani2000,grammar:rodriguez2008}, can be used as a fuzzier, more probabilistic means of determining the relative ``semantic distance'' between two graphs \cite{semdist:delugach1993}. Spreading activation methods are more analogous to the connectionist paradigm of cognitive science than to the symbolic methods of artificial intelligence research \cite{rumelhart:conn1993}. The purpose of a spreading activation algorithm is to determine which resources in a semantic graph are most related to some other set of resources. In general, a spreading activation algorithm diffuses an energy distribution over a graph, starting from a set of resources and proceeding until a predetermined number of steps have been taken or the energy decays to some $\epsilon \approx 0$. (Note: In many ways this is analogous to finding the primary eigenvector of the graph using the power method. However, the energy vector at time step $1$ only has values for the source resources, the energy vector is decayed on each iteration, and only a limited number of iterations are executed, because a steady-state distribution is not desired.) Those resources that receive the most energy flow during the spreading activation process are considered the most similar to the set of source resources. With respect to the particular example at hand, the energy diffusion would start at the resources in $H$ and the results would be compared with the resources of $T_x$ and $T_y$. If the set of resources in $T_x$ received more energy than those in $T_y$, then the dilated triple $T_x$ is considered more representative of the context of $H$. (Note: Spreading activation on a semantic graph is complicated as edges have labels.
A framework that makes use of this fact to perform arbitrary path traversals through a semantic graph is presented in \cite{grammar:rodriguez2008}.) By taking advantage of the supplementary information contained within a dilated triple, a process has more information on which to base its interpretation of the meaning of a triple. To the process, a triple is not simply a string of three symbols, but instead is a larger knowledge structure which encapsulates the uniqueness of the relationship. The process can use this information to bias its traversal of the graph and thus how it goes about discovering information in the graph.
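The spreading activation procedure described in this section can be sketched as follows. This is a deliberately simplified illustration, not the algorithm of any cited work: it assumes edge labels and edge direction are ignored, that each resource splits its decayed energy equally among its neighbors, and that the score of a dilated triple is the total energy received by its resources. The function names, the default parameters, and all \texttt{ex:} resources are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def resources(triples: Iterable[Triple]) -> Set[str]:
    """Collect the subjects and objects that appear in a set of triples."""
    return {r for s, _p, o in triples for r in (s, o)}


def spread_activation(graph: Set[Triple], sources: Iterable[str],
                      decay: float = 0.5, max_steps: int = 2,
                      epsilon: float = 1e-6) -> Dict[str, float]:
    """Diffuse energy from the sources and return the energy each resource received."""
    # Undirected, unlabeled adjacency (a simplifying assumption).
    neighbors: Dict[str, Set[str]] = {}
    for s, _p, o in graph:
        neighbors.setdefault(s, set()).add(o)
        neighbors.setdefault(o, set()).add(s)

    source_set = set(sources)
    if not source_set:
        return {}
    # At the first time step only the source resources hold energy.
    energy = {r: 1.0 / len(source_set) for r in source_set}
    received: Dict[str, float] = {}

    for _ in range(max_steps):
        nxt: Dict[str, float] = {}
        for node, e in energy.items():
            adjacent = neighbors.get(node, set())
            if not adjacent:
                continue
            share = decay * e / len(adjacent)  # energy decays on every step
            for n in adjacent:
                nxt[n] = nxt.get(n, 0.0) + share
        for n, e in nxt.items():
            received[n] = received.get(n, 0.0) + e
        energy = nxt
        if sum(energy.values()) < epsilon:  # energy has decayed to roughly zero
            break
    return received


def context_score(received: Dict[str, float], dilation: Set[Triple]) -> float:
    """Total energy received by the resources of a dilated triple."""
    return sum(received.get(r, 0.0) for r in resources(dilation))


x = ("lanl:marko", "foaf:knows", "ucla:apepe")
y = ("lanl:marko", "foaf:knows", "cap:carole")
T_x = {x, ("lanl:marko", "ex:authored", "ex:paper1"),
       ("ucla:apepe", "ex:authored", "ex:paper1")}
T_y = {y, ("cap:carole", "ex:livesIn", "ex:fairfield")}
H = {("ex:paper1", "ex:publishedIn", "ex:journal1")}
R = T_x | T_y | H

received = spread_activation(R, resources(H), max_steps=2)
score_x = context_score(received, T_x)
score_y = context_score(received, T_y)
```

With these assumed data, the resources of $T_x$ receive more energy than those of $T_y$, so $T_x$ would be judged more representative of the context of $H$. A label-aware traversal would replace the uniform split among neighbors with path constraints on the predicates.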

\section{\label{sec:structure}Contextualizing a Relationship} The dilated form of $x \in R$, denoted $T_x$, provides a knowledge structure that is suited to contextualizing the meaning of an assertion made about the world. For example, consider the asserted triple \begin{equation} x = (\texttt{lanl:marko}, \texttt{foaf:knows}, \texttt{ucla:apepe}) \end{equation}

Both $T_x$ and $T_y$ share the same predicate \ttt{foaf:knows}. However, what is meant by Marko knowing Alberto is quite different from what is meant by Marko knowing his mother Carole (\ttt{cap:carole}). While, broadly speaking, it is true that Marko knows both Alberto and Carole, the context in which Marko knows Alberto differs greatly from the context in which Marko knows Carole. The supplementary triples that compose $T_y$ may be the RDF expression of: \begin{quote} ``Marko was born in Fairfield, California on November 30$^\text{th}$, 1979. Carole is Marko's mother. Marko's family lived in Riverside (California), Peachtree City (Georgia), Panama City (Panama), and Fairfax (Virginia). During his $10^\text{th}$ grade high-school term, Marko moved with his family back to Fairfield, California.'' \end{quote} It is obvious from these two examples that \ttt{foaf:knows} cannot sufficiently express the subtleties that exist between two people. People know each other in many different ways. There are family relationships, business relationships, scholarly relationships, and so on. It is true that these subtleties can be exposed when performing a deeper analysis of the graph surrounding a \ttt{foaf:knows} relationship, as other paths will emerge that exist between people (e.g.~vacation paths, transaction paths, coauthorship paths, etc.). The purpose of a dilated triple is to contain these corroborating statements within the relationship itself. The purpose of $T_x$ is to identify those aspects of Marko and Alberto's ``knowing'' relationship that make it unique (that give it the most meaning). Similarly, the purpose of $T_y$ is to provide a finer representation of the context in which Marko knows his mother. The supplementary triples of $T_x$ and $T_y$ augment the meaning of \ttt{foaf:knows} and frame each respective triple $x$ and $y$ in a broader context. (Note: Examples of other predicates beyond \ttt{foaf:knows} also exist.
For instance, consider the predicates \texttt{foaf:member} and \texttt{foaf:fundedBy}. In what way is an individual a member of a group, and how is that individual funded?)

\section{\label{sec:model}The Dilated Triple Model} A single predicate URI does not provide the appropriate degrees of freedom required when modeling the nuances of an RDF relationship. The model proposed in this article enhances the expressiveness of a triple such that its meaning is considered within a larger context as defined by a graph structure. It thereby provides a machine with the ability to discern the more fine-grained context in which a statement relates its subject and object. In the proposed model, every triple in an RDF graph is supplemented with other triples from the same RDF graph. The triple and its supplements form what is called a \textit{dilated triple}.\footnote{The Oxford English Dictionary provides two definitions for the word ``dilate'': ``to expand'' and ``to speak or write at length''. It will become clear through the remainder of this article that both definitions suffice to succinctly summarize the presented model.}
%%%
\begin{definition}[A Dilated Triple] Given a set of triples $R$ and a triple $\tau \in R$, a dilation of $\tau$ is a set of triples $T_\tau \subseteq R$ such that $\tau \in T_\tau$. \end{definition}
%%%
The dilated form of $\tau \in R$ is $T_\tau$. Informally, $T_\tau$ serves to elaborate the meaning of $\tau$. Formally, $T_\tau$ is a graph that at minimum contains only $\tau$ and at maximum contains all triples in $R$. The non-$\tau$ triples in $T_\tau$ (i.e.~$T_\tau \setminus \{\tau\}$) are called \textit{supplementary triples}, as they serve to contextualize, or supplement, the meaning of $\tau$. Finally, it is worth noting that every supplementary triple in $T_\tau$ has an associated dilated form, so that $T_\tau$ can be considered a set of nested sets.\footnote{The set of all dilated triples forms a \textit{dilated graph} denoted $\mca{T} = \bigcup_{\tau \in R} \{ T_\tau \}$.} An instance of $\tau$, its subject $s$, predicate $p$, object $o$, and its dilated form $T_\tau$ are diagrammed in Figure \ref{fig:dilated-triple}.
%%%
\begin{figure}[h!] \centering \includegraphics[width=0.35\textwidth]{images/dilated-triple} \caption{The dilated triple $T_\tau$.} \label{fig:dilated-triple} \end{figure}

A dilated triple can be conveniently represented in RDF using a named graph \cite{named:carroll2005}. Statements using the named graph construct are not triples, but instead are quads, with the fourth component being denoted by a URI or blank node. Formally, $\tau = (s,p,o,g)$ and $g \in U \cup B$. The fourth component is considered the ``graph'' in which the triple is contained. Thus, multiple quads with the same fourth element are considered different triples in the same graph. Named graphs were developed as a more compact (in terms of space) way to reify a triple. The reification of a triple was originally presented in the specification of RDF with the \texttt{rdf:Statement} construct \cite{rdfcon:klyne2004}. RDF reification has historically been used to add specific metadata to a triple, such as provenance, pedigree, privacy, and copyright information. In this article, the purpose of reifying a triple is to supplement its meaning with that of additional triples. While it is possible to make additional statements about the dilated triple (i.e.~the named graph component $g$), the motivation behind the dilated triple is to encapsulate many triples within a single graph, not to make statements about the graph \textit{per se}.

The following sections will further explain the way in which a dilated triple contextualizes the meaning of a statement. \S \ref{sec:structure} demonstrates, by means of an example, how supplementary triples augment the meaning of a relationship between two resources.
\S \ref{sec:process} discusses how dilated triples can be compared and used by a machine to discern context.
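As a rough illustration of the definition and of the named graph encoding, the following Python sketch checks whether a set of triples is a dilation of $\tau$ and converts a dilated triple to and from quads. It assumes triples and quads are plain tuples; the graph name \texttt{ex:Tx}, the \texttt{ex:} resources, and the function names are hypothetical and chosen only for this example.

```python
from typing import Dict, Set, Tuple

Triple = Tuple[str, str, str]
Quad = Tuple[str, str, str, str]


def is_dilation(tau: Triple, t_tau: Set[Triple], r: Set[Triple]) -> bool:
    """True if t_tau contains tau and draws all of its triples from r."""
    return tau in t_tau and t_tau <= r


def to_quads(t_tau: Set[Triple], graph_name: str) -> Set[Quad]:
    """Encode a dilated triple as quads that share one fourth component g."""
    return {(s, p, o, graph_name) for s, p, o in t_tau}


def from_quads(quads: Set[Quad]) -> Dict[str, Set[Triple]]:
    """Group quads by their fourth component to recover each named graph."""
    graphs: Dict[str, Set[Triple]] = {}
    for s, p, o, g in quads:
        graphs.setdefault(g, set()).add((s, p, o))
    return graphs


x = ("lanl:marko", "foaf:knows", "ucla:apepe")
supplement = ("lanl:marko", "ex:coauthoredWith", "ucla:apepe")  # hypothetical
unrelated = ("cap:carole", "ex:motherOf", "lanl:marko")          # hypothetical
R = {x, supplement, unrelated}

T_x = {x, supplement}
quads = to_quads(T_x, "ex:Tx")
recovered = from_quads(quads)
```

The two boundary cases in the text correspond to \texttt{is\_dilation(x, \{x\}, R)} (the dilation contains only $\tau$) and \texttt{is\_dilation(x, R, R)} (the dilation contains every triple in $R$), both of which hold in this sketch.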