Knowledge as Graphs; Graphs as Knowledge

The word “graph” is somewhat overloaded and can be ambiguous depending on who you’re talking to. In mathematics a graph is often a visualization of independent vs dependent variables (Figure 1). In data analytics, a graph is often also called a chart and represents a summary visualization of some large dataset to reduce cognitive load (Figure 2). In computer science, graphs typically involve the storage of relational knowledge and the theories surrounding those relations (Figure 3). In this article, we’ll briefly explore knowledge-oriented graph types and common uses in artificial intelligence as data structures and reasoning systems.

Figure 1: Examples of mathematical graph (Source)
Figure 2: Examples of analytics graphs (Source)
Figure 3: Example of knowledge graph (Source)

Definitions

Before we can dive into knowledge graphs, we need a few basic definitions of what comprises a graph and how to discuss the components of a graph.

A node (or vertex) is a discrete component in a graph that may contain information, and can be related to other nodes via inbound and/or outbound edges. For example, in a social graph a node might represent a person and the information contained on the node might be data points like age, gender, height, weight, etc ad nauseum.

An edge represents a relationship between two nodes, and may contain additional information about that relationship. Edges can be undirected meaning they are bidirectional between the pair of nodes, or directed where the relationship holds in one direction but not the other. To continue the social graph example, a type of undirected edge might be “has-sibling” where the relationship is equally applicable from John to Jane as it is from Jane to John. A directed edge would be “has-child” where John “has-child” Jimmy, but Jimmy “has-child” John is not valid.

A path is a collection of one or more edges that describe the relationship between two nodes.

A graph is a collection of nodes and edges that represents the overall state of a domain, be it a social graph, related facts, or a workflow for achieving some set goal. Graphs can be acyclical meaning they have no cycles or cyclical where cycles are allowed. They can be undirected containing only edges that are valid relationships in either direction (e.g. “siblings”) or directed where all edges represent relationships that can only go one way (“childOf, parentOf”). Graphs are often categorized based on these structural constraints (e.g. directed acyclic, undirected cyclic, etc).

Figure 4: Examples of directed cyclic and acyclic graphs (Source)

Types of knowledge graphs

One use of graphs as knowledge is to leverage the node-edge concept to store large collections of data and the relationships between those data points. Four common approaches are triplestores, property graphs, trees, and pattern graphs. Each has their own use case, and in some instances might be interchangeable depending on the use case.

Triplestores

A triplestore or Resource Description Framework (RDF) store is a highly-structured and standardized purpose-built database for the storage and retrieval of triples through semantic queries (Rusher). These triples consist of {subject, predicate, object} values where the subject and object are nodes in the graph, and the predicate represents an edge between them. Figure 3 above is an example of a triplestore for some data points related to science fiction movies. As can be seen, a triplestore can have more than one semantic concept associated with a subject or object, but the graph is generally directed with the subject as the “source” of the predicate and the object the “target.” Queries written in languages such as SPARQL can directly query for subject-predicate-object triples with varying depths of inference:

  • Leonard Nimoy played what characters?
  • What characters did Leonard Nimoy play in a science fiction movie?
  • Did Alec Guinness and Leonard Nimoy star in any science fiction movies together?

Some advanced triplestores even allow knowledge engineers to craft custom meta-predicates that can provide higher-level inference based on complex predicate patterns already existing in the system, allowing for dynamic expansion of inference capabilities as more is understood about a domain.

Property Graphs

Offering a more dynamic and less-structured approach than triplestores, property graphs provide a storage system capable of injecting nodes and edges up to any depth, with properties attached to nodes and/or edges for richer information. They can also be constrained to be directed or undirected, cyclical or acyclical. Property graphs consist of nodes that have one or more labels attached to them (think of a label as a type) and associated properties, linked by edges that also have labels and properties attached to them. An example of an article comment system property graph is shown in Figure 5.

Figure 5: Example of a property graph capturing article comments. Notice the node labels “article,” “user,” and “comment” with edge labels “post,” “like,” “delete,” and “about” – each with their own set of properties. (Source)

While there are currently no standards per se for property graphs, there are some industry consortiums attempting to standardize around the OpenCypher query language (spun out of Neo4j’s Cypher) and Apache TinkerPop’s semantics and API.

Trees

While not commonly thought of as graphs, trees share a lot of features with directed acyclical property graphs where traversal can only go one way, nodes and edges can contain properties, and data is organized in such a way as to facilitate searching and inference. Unlike graphs, trees almost always start off with a root node and branch into children based on some property criteria. The simplest trees use a binary split approach on some node property, while some of the more complex ones leverage distance and/or distribution heuristics and often more than one property to determine child split points (e.g. KD Tree, Ball Tree, etc).

Trees can be leveraged to rapidly reduce search spaces, and are a common tool in machine learning approaches. In supervised learning scenarios they can be used to construct categorical or regression inference models from training data. In unsupervised learning they can be leveraged in a variety of ways to classify unseen data based on previously encountered labeled training examples.

Figure 6: Example of a KD-Tree visualized as hyperplanes (l) and tree structure (r).

Pattern Graph

Up to now, all the graph types we’ve reviewed are commonly referred to as instance graphs where discrete instances of nodes and edges are created and stored in the graph structure. In contrast, a pattern graph defines the constraints on types of nodes, edges, and properties required to create and inject instances into a graph structure – a schema on steroids. In its simplest form it defines node types and edge types, leveraging unique identifiers on node types to drive edge creation, and subsequently child node creation as far as the supporting data can infer new instances. In a more powerful form a pattern graph can provide heuristics that constrain the inference-driven creation of new edges and nodes, interact with external systems, and consume and/or generate events based on inference activities. When fully realized, a pattern graph provides an AI-driven approach to inferring facts from an instance graph (aka knowledge base). An example of pattern graphs can be seen in the Velox Engineering Framework where they are used to construct Observe-Orient and Decide-Act graphs.

Figure 7: Comparison of a pattern graph and instance graph (Source)

Conclusion

Graphs have always been incredibly powerful tools for representing data with complex relationships. As our world becomes more interconnected and richer data is made available, graphs have become more prevalent in day-to-day operations for everything from knowledge capture and organization to search engines, recommendation engines and tools for discovering previously unknown patterns in data. Both knowledge-based and statistical AI are highly dependent on graph structures, and graph theory and reasoning remains a hot topic in intelligence and cognitive research. By knowing your graphs you can make more informed decisions for your use cases and improve the performance and efficacy of your knowledge systems.

References

Rusher, Jack. Rhetorical Device. https://www.w3.org/2001/sw/Europe/events/20031113-storage/positions/rusher.html. Accessed 21 July 2021.