What is a Knowledge Graph?
Google introduced its Knowledge Graph project in 2012 and has used it to improve the relevance of query results and the overall search experience. Google has leveraged existing knowledge graphs, such as DBpedia and Freebase, and has also opened up the process of contributing to the graph by ingesting RDFa and microdata from the Web pages it indexes, based on the vocabularies published by schema.org. The success of the Google Knowledge Graph, and its use of semantic technologies, has led to a resurgence in the use of the term in semantic research to describe similar projects. However, the term “knowledge graph” remains underspecified and, in many cases, simply refers to any directed labeled graph. We surveyed and synthesized the current literature on knowledge graphs and the historical use of the term. The pre-Semantic Web conceptualization of knowledge graphs provides guidance as to what might currently “count” as a knowledge graph, and also describes capabilities that current knowledge graphs do not yet have. From this synthesis, we propose an updated definition along with a set of knowledge graph requirements. We include an implicit requirement: that knowledge graphs represent knowledge, as opposed to bare assertions with no justification or provenance. We discuss how knowledge graphs, as defined here, are a crucial component of the future of the Web and have great potential for transformational change in data science and the domain sciences.
Knowledge graphs provide an opportunity to expand our understanding of how knowledge can be managed on the Web and how that knowledge can be distinguished from more conventional Web-based data publication schemes such as Linked Data (Bizer 2009). In recent years, knowledge graphs have grown increasingly prominent through commercial and research applications on the Web. Google was one of the first to promote a semantic metadata organizational model described as a “knowledge graph” (Singhal 2012), and many other organizations have since used the term in the literature and in less formal communication. Our purpose in this paper is to provide an explicit description of the evolving notion of a knowledge graph, and further to lay out a potential impact spectrum. We review recent formal definitions of knowledge graphs, knowledge graph analysis and construction algorithms, and popular commercial and research knowledge graphs in the literature. These new knowledge graphs do not strictly adhere to the original knowledge graph theory (van de Riet 1992), but instead follow a looser, more flexible definition. We present a more descriptive view of current, practical knowledge graphs, and discuss their potential for evolution and impact.
Rospocher et al. present knowledge graphs as collections of facts about entities, typically derived from structured data sources such as Freebase and (Rospocher 2016). They cite a dearth of event representations in current knowledge graphs as a shortcoming that limits knowledge graphs to encyclopedic items such as birth and death dates, and they attribute it primarily to the difficulty of obtaining temporal data about entities in a structured manner. Recent surveys such as those by Hogenboom et al. (Hogenboom 2016) and Deng et al. (Deng 2015) provide overviews of numerous methods for event extraction from a variety of sources, including social media, news, academic publications, and even images and video, indicating that there is great interest in finding ways to interpret such temporal data and include it in a more structured format. Another review, by Nickel et al., explores machine learning methods for knowledge graphs, but limits its definition to directed labeled graphs with an optionally predefined schema. It also reviews, but does not take a position on, the use of the closed-world versus open-world assumptions.
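To make the directed labeled graph view concrete, the following is a minimal Python sketch, not drawn from any of the cited works. The class name LabeledGraph, its methods, and the example entities are hypothetical and chosen only for illustration. It stores edges as (subject, label, object) triples, optionally checks edge labels against a predefined schema, and shows how the same missing edge is read as "false" under the closed-world assumption and as "unknown" under the open-world assumption.

```python
from dataclasses import dataclass, field
from typing import Optional, Set, Tuple

# One directed, labeled edge: (subject, edge label, object).
Triple = Tuple[str, str, str]


@dataclass
class LabeledGraph:
    """Hypothetical directed labeled graph stored as a set of triples."""

    # Optional schema: the set of permitted edge labels (None means no schema).
    schema: Optional[Set[str]] = None
    edges: Set[Triple] = field(default_factory=set)

    def add(self, subject: str, label: str, obj: str) -> None:
        """Add an edge, rejecting labels outside the schema if one is defined."""
        if self.schema is not None and label not in self.schema:
            raise ValueError(f"label {label!r} is not in the schema")
        self.edges.add((subject, label, obj))

    def holds(self, subject: str, label: str, obj: str,
              closed_world: bool = False) -> Optional[bool]:
        """Return True if the edge is asserted.

        For a missing edge, return False under the closed-world assumption
        and None (unknown) under the open-world assumption.
        """
        if (subject, label, obj) in self.edges:
            return True
        return False if closed_world else None


# Illustrative data only; the entities and the label are made up.
graph = LabeledGraph(schema={"bornIn"})
graph.add("alice", "bornIn", "paris")
asserted = graph.holds("alice", "bornIn", "paris")
open_world = graph.holds("alice", "bornIn", "rome")
closed_world = graph.holds("alice", "bornIn", "rome", closed_world=True)
```

Passing schema=None gives the schema-free case, in which any edge label is accepted.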
van de Riet and Meersman (van de Riet 1992), Stokman and de Vries (Stokman 1988), and Zhang (Zhang 2002) present a formal theory of knowledge graphs as a specialization of semantic networks in which meaning is expressed as structure, statements are unambiguous, and a limited set of relation types is used. These requirements also minimize redundancy within the knowledge graph, which simplifies analytical operations (including reasoning and queries). Popping explores the use of knowledge graphs in network text analysis and the challenges they faced at the time (Popping 2003). Following Zhang, Popping defines the knowledge graph as a type of semantic network that uses only a few types of relations, but also asserts that additional knowledge may be added to the graph.
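As an informal illustration of why a small, fixed set of relation types simplifies analysis, consider the following Python sketch. It is an assumption-laden toy, not an implementation of the cited theory: the relation names (PART_OF, KIND_OF, CAUSES), the class RestrictedGraph, and the example nodes are hypothetical, and treating a relation as transitive is a modeling choice made here for the example. Statements are kept in a set, so a repeated statement adds no redundancy, and a query along a single relation type reduces to a simple traversal.

```python
from enum import Enum
from typing import List, Set, Tuple


class Relation(Enum):
    # Illustrative relation types only; not the set used in the cited theory.
    PART_OF = "part_of"
    KIND_OF = "kind_of"
    CAUSES = "causes"


Statement = Tuple[str, Relation, str]


class RestrictedGraph:
    """Hypothetical semantic network limited to the Relation types above."""

    def __init__(self) -> None:
        # A set keeps each statement once, so repeats add no redundancy.
        self._statements: Set[Statement] = set()

    def add(self, source: str, relation: Relation, target: str) -> None:
        """Add a statement, accepting only the predefined relation types."""
        if not isinstance(relation, Relation):
            raise TypeError("relation must be a member of Relation")
        self._statements.add((source, relation, target))

    def __len__(self) -> int:
        return len(self._statements)

    def targets(self, source: str, relation: Relation) -> Set[str]:
        """Nodes directly linked from source by the given relation."""
        return {t for s, r, t in self._statements
                if s == source and r is relation}

    def reachable(self, source: str, relation: Relation) -> Set[str]:
        """Nodes reachable from source by following one relation type."""
        seen: Set[str] = set()
        stack: List[str] = [source]
        while stack:
            node = stack.pop()
            for nxt in self.targets(node, relation):
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        return seen


# Illustrative data only.
kg = RestrictedGraph()
kg.add("wheel", Relation.PART_OF, "car")
kg.add("wheel", Relation.PART_OF, "car")   # duplicate, stored once
kg.add("car", Relation.PART_OF, "fleet")
parts_closure = kg.reachable("wheel", Relation.PART_OF)
```

Because every edge carries one of a few known relation types, a query such as reachable needs no label matching or disambiguation beyond an identity check on the relation.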