Creating Knowledge Graph Step by Step
For a recent project, I created a simple proof-of-concept (POC) knowledge graph for a quick service restaurant (QSR).
I thought it would be fun to document the basic end-to-end steps.
Domain definition
When creating a knowledge graph, the first thing to decide is its domain. Generally, when you have a business case for creating a knowledge graph, the domain is a given.
For my project, the domain is the quick service restaurant.
Scope
Next is the scope: what your knowledge graph covers and what it does not.
The scope needs to be clearly defined before you create the knowledge graph. Of course, it can be adjusted or extended later.
I focused on the menu products and promotions that a restaurant offers its customers.
Data
We need data to build the knowledge graph.
Preprocess the data to make sure that it is clean, consistent, and usable. This includes standardizing data formats, resolving duplicates, and dealing with missing values.
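As a rough illustration of those three preprocessing steps, here is a minimal sketch that uses only the Python standard library. The record layout (code, name, status), the "UNKNOWN" placeholder, and the helper name clean_records are all made up for illustration; real data will need its own rules.

```python
def clean_records(records):
    """Standardize formats, drop duplicates, and fill missing values."""
    cleaned = {}
    for rec in records:
        raw_code = (rec.get("code") or "").strip()
        if not raw_code:
            continue  # a record without a product code cannot be identified
        code = int(raw_code)  # "0001" and "1" become the same key
        name = (rec.get("name") or "").strip() or "UNKNOWN"
        status = (rec.get("status") or "UNKNOWN").strip().upper()
        # Later duplicates are ignored: the first record for a code wins.
        cleaned.setdefault(code, {"code": code, "name": name, "status": status})
    return list(cleaned.values())


raw = [
    {"code": "0001", "name": " Big Mac Combo ", "status": "active"},
    {"code": "1", "name": "Big Mac Combo", "status": "ACTIVE"},  # duplicate
    {"code": "55", "name": None, "status": None},  # missing values
    {"code": "", "name": "No code"},  # unusable: no code
]
products = clean_records(raw)
print(products)
```

Keeping the first record for a duplicated code is only one possible policy; depending on the source system, the latest record may be the better choice.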
Define ontology
The core of a knowledge graph is its ontology, which defines the entities, their properties, and the constraints on them.
Define the entities
Restaurant menu product entities include, for example (the list is not exhaustive):
- Combo: a collection of several items that are sold together as one product. For example, McDonald’s Big Mac Combo Meal, which includes a hamburger, a drink, and french fries.
- Item: a single salable menu product. For example, a Big Mac hamburger or a medium Coke.
- Component: a single non-salable component that is part of an item and has a minimum and maximum count associated with it.
- Promotion: a bundle of items, components, or combos that can be sold together. Compared with combos, promotions generally have a short lifetime (they come and go with the market) and are more flexible and mutable.
- ChoiceBundle: a set of mutually exclusive combos, items, or components.
Define the relations (properties)
Object properties define relations between entities. For example:
- has_parent
- has_child
Data properties define the attributes of an entity. For example:
- spoken_name
- code
- size
- … …
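The parent and child relations describe the same link from two directions. The plain-Python sketch below shows that idea only; the Node class and its add_child helper are hypothetical stand-ins, and the ontology code later in this post leaves that bookkeeping to owlready2 through inverse_property.

```python
class Node:
    """Hypothetical stand-in for a menu entity such as a combo or an item."""

    def __init__(self, name):
        self.name = name
        self.has_child = []
        self.has_parent = []

    def add_child(self, child):
        # Record both directions so the two relations never disagree.
        if child not in self.has_child:
            self.has_child.append(child)
            child.has_parent.append(self)


combo = Node("Big Mac Combo")
burger = Node("Big Mac")
combo.add_child(burger)
print([n.name for n in combo.has_child])
print([n.name for n in burger.has_parent])
```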
Create Knowledge Graph
I first tried WebProtégé (https://webprotege.stanford.edu) to create entities and properties, to get a better understanding of them and of how to define relationships among entities, as shown in the following image:
However, I actually used a Python notebook for all the steps of creating the knowledge graph:
step 1, install the owlready2 and rdflib libraries
!pip install owlready2
!pip install rdflib
step 2, create ontology
import datetime  # used below for the datetime.date property ranges
from owlready2 import *
owl_path = "file:///Users/jingdongsun/ontology.owl"
onto = get_ontology(owl_path).load()
# All entity classes, object properties (relations), and data properties are defined in the
# same namespace, so these knowledge graph objects can be managed within the same scope.
with onto:
    # Classes
    class Combo(Thing):
        pass
    class Item(Thing):
        pass
    class Component(Thing):
        pass
    class Promotion(Thing):
        pass
    class ChoiceBundle(Thing):
        pass

    # Data properties
    class spoken_name(DataProperty):
        domain = [Combo, Item, Component, Promotion]
        range = [str]
    class code(DataProperty):
        domain = [Combo, Item, Component, Promotion]
        range = [int]
    class prod_class(DataProperty):
        domain = [Combo, Item, Component]
        range = [str]
    class category(DataProperty):
        domain = [Combo, Item, Component]
        range = [str]
    class size(DataProperty):
        domain = [Combo, Item, Component]
        range = [str]
    class status(DataProperty):
        domain = [Combo, Item, Component, Promotion]
        range = [str]
    class grillable(DataProperty):
        domain = [Combo, Item, Component]
        range = [bool]
    class allowed_qty(DataProperty):
        domain = [Promotion]
        range = [int]
    class instance_id(DataProperty):
        domain = [Promotion]
        range = [str]
    class redemption_mode(DataProperty):
        domain = [Promotion]
        range = [str]
    class template(DataProperty):
        domain = [Promotion]
        range = [str]
    class template_id(DataProperty):
        domain = [Promotion]
        range = [str]
    class promo_type(DataProperty):
        domain = [Promotion]
        range = [str]
    class from_date(DataProperty):
        domain = [Promotion]
        range = [datetime.date]
    class to_date(DataProperty):
        domain = [Promotion]
        range = [datetime.date]
    class qty(DataProperty):
        domain = [ChoiceBundle]
        range = [int]

    # Object properties
    class has_parent(ObjectProperty):
        domain = [Combo, Item, Component]
        range = [Promotion, Combo, Item, ChoiceBundle]
    class has_child(ObjectProperty):
        domain = [Promotion, Combo, Item, ChoiceBundle]
        range = [ChoiceBundle, Combo, Item, Component]
        inverse_property = has_parent
After creating the ontology, you can create entities and properties manually, or use scripts to load the data you have available.
All my data is in XML format, and the following steps show the code I used to load this data and create the knowledge graph instances.
step 3, load the restaurant menu product and promotion data (in XML format)
import xml.etree.ElementTree as ET
prod_path = "/Users/jingdongsun/samples/product-db.xml"
# Load the product XML file
prod_tree = ET.parse(prod_path)
prod_root = prod_tree.getroot()
promo_path = "/Users/jingdongsun/samples/promotion-db.xml"
# Load the promotion XML file
promo_tree = ET.parse(promo_path)
promo_root = promo_tree.getroot()
I cannot share the whole product-db.xml and promotion-db.xml files, but here are some example elements:
product-db.xml:
<Product statusCode="ACTIVE" productClass="VALUE_MEAL" productCategory="FOOD" salable="true" modified="false" grillGroup="LUNCH">
<ProductCode>0001</ProductCode>
<DisplayOrder>0001</DisplayOrder>
<SalesType eatin="true" takeout="true" other="true"/>
<Production>
<Grillable doNotPrint="false" status="true"/>
... ...
</Production>
... ...
<Composition>
<Component>
<ProductCode>7</ProductCode>
<DefaultQuantity>1</DefaultQuantity>
<MinQuantity>1</MinQuantity>
<MaxQuantity>1</MaxQuantity>
... ...
</Component>
<Component>
<ProductCode>89</ProductCode>
<DefaultQuantity>1</DefaultQuantity>
<MinQuantity>1</MinQuantity>
<MaxQuantity>1</MaxQuantity>
... ...
</Component>
</Composition>
... ...
</Product>
<Product statusCode="ACTIVE" productClass="PRODUCT" productCategory="FOOD" salable="true" modified="false">
<ProductCode>0055</ProductCode>
<DisplayOrder>608</DisplayOrder>
<SalesType eatin="true" takeout="true" other="true"/>
<Production>
<Grillable doNotPrint="false" status="false"/>
... ...
</Production>
... ...
<SizeSelection>
<Size entry="0" code="123" />
<Size entry="1" code="456" />
</SizeSelection>
... ...
</Product>
promotion-db.xml:
<PromotionData code="00001" instanceID="12345" priority="20" status="ACTIVE" template="Price Deal - Two Product Sets" templateId="your-templateId">
<Language code="en_US" name="English" parent="en"><Name>$2 Bundle</Name><LongDescription></LongDescription></Language>
<Language code="es_US" name="Spanish" parent="es"><Name>$2 Bundle</Name><LongDescription></LongDescription></Language>
<Promotion allowedQty="99" countTowardsPromotionLimit="true" id="0001" redemptionMode="OnTotal" type="2forPromotion">
<Actions>
<SetItemPrice discountLimit="0.00" item="Eligible Items 1" price="1.00"/>
<SetItemPrice discountLimit="0.00" item="Eligible Items 2" price="1.00"/>
</Actions>
<Conditions>
<Date from="2023-06-09" to="2023-10-11"/>
<ProductSet alias="Eligible Items 1" codes="001|2345" qty="1"/>
<ProductSet alias="Eligible Items 2" codes="004" qty="1"/>
</Conditions>
</Promotion>
</PromotionData>
<PromotionData code="10002" instanceID="12345" priority="50" status="ACTIVE" template="BOGO - Reduced Price" templateId="your-templateId">
<Language code="en_US" name="English" parent="en"><Name>BOGO for $1</Name><LongDescription></LongDescription></Language>
<Language code="es_US" name="Spanish" parent="es"><Name>BOGO for $1</Name><LongDescription></LongDescription></Language>
<Promotion allowedQty="99" countTowardsPromotionLimit="true" id="0002" redemptionMode="OnTotal" type="BOGO">
<Actions>
<SetItemPrice discountLimit="0.00" item="Eligible Items" price="1.00"/>
</Actions>
<Conditions>
<Date from="2023-08-09" to="2023-12-31"/>
<ProductSet codes="53|55|1234|9876" qty="1"/>
<ProductSet alias="Eligible Items" codes="77/66|5555|4444" eligible="true" minQty="1" qty="1"/>
</Conditions>
</Promotion>
</PromotionData>
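In product-db.xml, the productClass and salable attributes decide which entity a product becomes: a VALUE_MEAL is a combo, any other salable product is an item, and the rest are components. Here is a minimal sketch of that rule on an inline snippet. The wrapping Products element, the third (non-salable) product, and the classify helper are assumptions added for illustration.

```python
import xml.etree.ElementTree as ET

# Trimmed-down, partly invented sample; the real files carry many more fields.
SAMPLE = """
<Products>
  <Product statusCode="ACTIVE" productClass="VALUE_MEAL" salable="true">
    <ProductCode>0001</ProductCode>
  </Product>
  <Product statusCode="ACTIVE" productClass="PRODUCT" salable="true">
    <ProductCode>0055</ProductCode>
  </Product>
  <Product statusCode="ACTIVE" productClass="PRODUCT" salable="false">
    <ProductCode>7</ProductCode>
  </Product>
</Products>
"""


def classify(product_elem):
    """Return the entity kind that a <Product> element maps to."""
    if product_elem.get("productClass") == "VALUE_MEAL":
        return "Combo"
    if product_elem.get("salable") == "true":
        return "Item"
    return "Component"


root = ET.fromstring(SAMPLE)
# int() normalizes codes such as "0001" and "0055" to 1 and 55.
kinds = {int(p.findtext("ProductCode")): classify(p) for p in root}
print(kinds)
```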
step 4, create entities, properties, and knowledge graph
from rdflib import Graph, Literal, RDF, URIRef
# rdflib knows about quite a few popular namespaces, like W3C ontologies, schema.org etc.
from rdflib.namespace import XSD
# Create graph instances with entities, properties, etc
def create_graph(prod_root, promo_root):
    # Iterate over child elements of the product root to create the initial instances
    for child in prod_root:
        if child.attrib["productClass"] == "VALUE_MEAL":
            create_combo(child)
        elif child.attrib["salable"] == "true":
            create_item(child)
        else:
            create_component(child)
    # The relation processing below is not optimized for performance:
    # it loops over all instances for each combo, item, and promotion.
    # For production use, a more efficient approach is needed.
    # Iterate over child elements of the product root to create relationships
    for child in prod_root:
        if child.attrib["productClass"] == "VALUE_MEAL":
            resolve_combo_relations(child)
        elif child.attrib["salable"] == "true":
            resolve_item_relations(child)
    # Iterate over child elements of the promotion root
    for child in promo_root:
        process_promotion(child)
def create_combo(combo_elem):
    my_combo = Combo()
    my_combo.status.append(combo_elem.attrib["statusCode"])
    my_combo.category.append(combo_elem.attrib["productCategory"])
    for productCode_elem in combo_elem.findall("./ProductCode"):
        my_combo.code.append(int(productCode_elem.text))
    for size_elem in combo_elem.findall("./SizeSelection/Size"):
        my_combo.size.append(size_elem.attrib["code"])
    print("Done create combo: ", my_combo, my_combo.code)

def create_item(item_elem):
    my_item = Item()
    my_item.status.append(item_elem.attrib["statusCode"])
    my_item.category.append(item_elem.attrib["productCategory"])
    for productCode_elem in item_elem.findall("./ProductCode"):
        my_item.code.append(int(productCode_elem.text))
    for size_elem in item_elem.findall("./SizeSelection/Size"):
        my_item.size.append(size_elem.attrib["code"])
    for grill_elem in item_elem.findall("./Production/Grillable"):
        # Compare with "true": bool("false") would be True for any non-empty string
        my_item.grillable.append(grill_elem.attrib["status"] == "true")
    print("Done create item: ", my_item, my_item.code)

def create_component(comp_elem):
    my_comp = Component()
    my_comp.status.append(comp_elem.attrib["statusCode"])
    my_comp.category.append(comp_elem.attrib["productCategory"])
    for productCode_elem in comp_elem.findall("./ProductCode"):
        my_comp.code.append(int(productCode_elem.text))
    for size_elem in comp_elem.findall("./SizeSelection/Size"):
        my_comp.size.append(size_elem.attrib["code"])
    for grill_elem in comp_elem.findall("./Production/Grillable"):
        # Compare with "true": bool("false") would be True for any non-empty string
        my_comp.grillable.append(grill_elem.attrib["status"] == "true")
    print("Done create component: ", my_comp, my_comp.code)
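One detail worth checking when reading attributes such as the status of a Grillable element: XML attribute values are strings, and in Python any non-empty string is truthy, so bool("false") is True. Here is a small sketch of an explicit conversion; the helper name parse_xml_bool is hypothetical, and accepting "1" and "0" is an assumption borrowed from the XML Schema boolean type.

```python
def parse_xml_bool(text):
    """Convert an XML boolean attribute value to a Python bool."""
    value = text.strip().lower()
    if value in ("true", "1"):
        return True
    if value in ("false", "0"):
        return False
    raise ValueError(f"not a boolean attribute value: {text!r}")


print(bool("false"))            # True, because the string is not empty
print(parse_xml_bool("false"))  # False
print(parse_xml_bool("true"))   # True
```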
def resolve_combo_relations(combo_elem):
    combo_code = int(combo_elem.find("./ProductCode").text)
    for combo in Combo.instances():
        if combo_code not in combo.code:
            continue
        for comp_elem in combo_elem.findall("./Composition/Component"):
            code = int(comp_elem.find("./ProductCode").text)
            found = False
            for item in Item.instances():
                if code in item.code:
                    combo.has_child.append(item)
                    found = True
                    break
            if not found:
                for comp in Component.instances():
                    if code in comp.code:
                        combo.has_child.append(comp)
                        found = True
                        break
            # If the code is not found among the items and components, something is wrong.
            if not found:
                print("ERROR: Resolve combo relations, can not find product code ", code)
        # Found the matching combo; no more looping or searching is needed.
        break
def resolve_item_relations(item_elem):
    item_code = int(item_elem.find("./ProductCode").text)
    for item in Item.instances():
        if item_code not in item.code:
            continue
        for comp_elem in item_elem.findall("./Composition/Component"):
            code = int(comp_elem.find("./ProductCode").text)
            found = False
            for i in Item.instances():
                if code in i.code:
                    item.has_child.append(i)
                    found = True
                    break
            if not found:
                for comp in Component.instances():
                    if code in comp.code:
                        item.has_child.append(comp)
                        found = True
                        break
            # If the code is not found among the items and components, something is wrong.
            if not found:
                print("ERROR: Resolve item relations, can not find product code ", code)
        # Found the matching item; no more looping or searching is needed.
        break
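The two resolve functions above scan every instance for every component code. One possible optimization is to build a dictionary from product code to instance once and then look each code up directly. The sketch below uses plain stand-in objects instead of owlready2 instances; SimpleProduct and build_code_index are hypothetical names.

```python
class SimpleProduct:
    """Stand-in for an ontology instance whose `code` is a list of ints."""

    def __init__(self, name, codes):
        self.name = name
        self.code = list(codes)
        self.has_child = []


def build_code_index(*groups):
    """Map each product code to the first instance that carries it."""
    index = {}
    for group in groups:  # earlier groups take priority, like items before components
        for inst in group:
            for c in inst.code:
                index.setdefault(c, inst)
    return index


items = [SimpleProduct("Big Mac", [7])]
components = [SimpleProduct("Bun", [89])]
combo = SimpleProduct("Big Mac Combo", [1])

index = build_code_index(items, components)
for code in (7, 89, 999):
    child = index.get(code)
    if child is None:
        print("ERROR: can not find product code", code)
    else:
        combo.has_child.append(child)
print([c.name for c in combo.has_child])
```

With the real ontology, the same index could presumably be built from Item.instances() and Component.instances(), so that each lookup becomes a single dictionary access instead of a scan.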
from datetime import datetime

# The date format
format_str = "%Y-%m-%d"

def process_promotion(promo_elem):
    # Process all data properties
    my_promo = Promotion()
    my_promo.status.append(promo_elem.attrib["status"])
    my_promo.instance_id.append(promo_elem.attrib["instanceID"])
    my_promo.code.append(int(promo_elem.attrib["code"]))
    my_promo.template.append(promo_elem.attrib["template"])
    my_promo.template_id.append(promo_elem.attrib["templateId"])
    for name_elem in promo_elem.findall("./Language/Name"):
        my_promo.spoken_name.append(name_elem.text)
    for promotion_elem in promo_elem.findall("./Promotion"):
        my_promo.allowed_qty.append(int(promotion_elem.attrib["allowedQty"]))
        my_promo.redemption_mode.append(promotion_elem.attrib["redemptionMode"])
        my_promo.promo_type.append(promotion_elem.attrib["type"])
    for date_elem in promo_elem.findall("./Promotion/Conditions/Date"):
        my_promo.from_date.append(datetime.strptime(date_elem.attrib["from"], format_str).date())
        my_promo.to_date.append(datetime.strptime(date_elem.attrib["to"], format_str).date())
    print("Done creating promotion: ", my_promo, my_promo.spoken_name)
    # Process object properties
    for prod_set_elem in promo_elem.findall("./Promotion/Conditions/ProductSet"):
        my_promo.has_child.append(create_choice_bundle(prod_set_elem))
    print("Done processing promotion children: ", my_promo.has_child)
def create_choice_bundle(choice_elem, product_only=True):
    my_choice = ChoiceBundle()
    # qty is declared with range int, so convert the attribute string
    my_choice.qty.append(int(choice_elem.attrib["qty"]))
    # Get all choice child products
    for code in choice_elem.attrib["codes"].split("|"):
        # int() raises ValueError if a code is not a plain integer
        code = int(code)
        found = False
        # Check whether a combo is part of this bundle
        for combo in Combo.instances():
            if code in combo.code:
                my_choice.has_child.append(combo)
                found = True
                break
        if not found:
            # Check whether an item is part of this bundle
            for item in Item.instances():
                if code in item.code:
                    my_choice.has_child.append(item)
                    found = True
                    break
        if not found and not product_only:
            # Check whether a component is part of this bundle
            for comp in Component.instances():
                if code in comp.code:
                    my_choice.has_child.append(comp)
                    found = True
                    break
        # If the code is not found among the combos and products, something is wrong.
        if not found:
            print("ERROR: Create choiceBundle, can not find product code ", code)
    print("Done creating choiceBundle: ", my_choice.has_child)
    return my_choice
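To make the promotion conditions easier to follow in isolation, here is a standalone sketch that parses one Conditions element with the standard library only: the date window and the pipe-separated codes of a ProductSet. The helper names parse_conditions and is_active_on are hypothetical, and treating the date window as inclusive on both ends is an assumption.

```python
import xml.etree.ElementTree as ET
from datetime import date, datetime

CONDITIONS = """
<Conditions>
  <Date from="2023-06-09" to="2023-10-11"/>
  <ProductSet alias="Eligible Items 1" codes="001|2345" qty="1"/>
</Conditions>
"""


def parse_conditions(cond_elem):
    """Return (from_date, to_date, [(codes, qty), ...]) for a <Conditions> element."""
    date_elem = cond_elem.find("Date")
    start = datetime.strptime(date_elem.get("from"), "%Y-%m-%d").date()
    end = datetime.strptime(date_elem.get("to"), "%Y-%m-%d").date()
    bundles = []
    for ps in cond_elem.findall("ProductSet"):
        codes = [int(c) for c in ps.get("codes").split("|")]
        bundles.append((codes, int(ps.get("qty"))))
    return start, end, bundles


def is_active_on(day, start, end):
    """True if `day` falls inside the promotion window (assumed inclusive)."""
    return start <= day <= end


start, end, bundles = parse_conditions(ET.fromstring(CONDITIONS))
print(start, end, bundles)
print(is_active_on(date(2023, 7, 1), start, end))
```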
After the steps above, a basic knowledge graph has been created.
The following code shows how to query and visualize the knowledge graph.
Sample queries using SPARQL
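Each line in a SPARQL WHERE clause is a triple pattern, and variables shared between patterns join them together. Before the real queries, here is a tiny plain-Python sketch of the pattern "combos that have an active child" over hand-written triples. The triples and helper names are made up for illustration, and this is not how owlready2 executes SPARQL.

```python
triples = [
    ("combo1", "type", "Combo"),
    ("combo1", "has_child", "item1"),
    ("item1", "status", "ACTIVE"),
    ("combo2", "type", "Combo"),
    ("combo2", "has_child", "item2"),
    ("item2", "status", "INACTIVE"),
]


def objects(subject, predicate):
    """All objects o such that (subject, predicate, o) is in the data."""
    return [o for s, p, o in triples if s == subject and p == predicate]


def subjects(predicate, obj):
    """All subjects s such that (s, predicate, obj) is in the data."""
    return [s for s, p, o in triples if p == predicate and o == obj]


# ?combo type Combo . ?combo has_child ?item . ?item status "ACTIVE"
active_combos = sorted({
    combo
    for combo in subjects("type", "Combo")
    for item in objects(combo, "has_child")
    if "ACTIVE" in objects(item, "status")
})
print(active_combos)
```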
# List all Combo instances which have active children.
list(default_world.sparql("""
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX myns: <urn:webprotege:ontology:70d996e9-fb1e-4b2b-ab25-d20cd441d73b#>
SELECT DISTINCT ?combo
WHERE {
?combo rdf:type myns:Combo .
?combo myns:has_child ?item .
?item myns:status "ACTIVE" .
}
"""))
# List all Promotion instances which have children with quantity at least 1.
list(default_world.sparql("""
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX myns: <urn:webprotege:ontology:70d996e9-fb1e-4b2b-ab25-d20cd441d73b#>
SELECT DISTINCT ?promos
WHERE {
?promos rdf:type myns:Promotion .
?promos myns:has_child ?item .
?item myns:has_child ?item2 .
?item2 myns:status "ACTIVE" .
?item myns:qty ?itemQty .
FILTER (?itemQty >= 1)
}
"""))
# List all Item instances that are food and have active children.
list(default_world.sparql("""
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX myns: <urn:webprotege:ontology:70d996e9-fb1e-4b2b-ab25-d20cd441d73b#>
SELECT DISTINCT ?prod
WHERE {
?prod rdf:type myns:Item .
?prod myns:has_child ?item .
?item myns:status "ACTIVE" .
?prod myns:category ?category .
FILTER (?category = "FOOD")
}
"""))
Graph visualization
!pip install --upgrade networkx
import rdflib
from rdflib.extras.external_graph_libs import rdflib_to_networkx_multidigraph
import networkx as nx
import matplotlib.pyplot as plt
# Convert the ontology world to an RDF graph
graph = default_world.as_rdflib_graph()
# Print all the data in the Turtle format
print(graph.serialize(format='turtle'))
# Plot a NetworkX instance of the RDF graph
plt.figure(figsize=(96, 72))
G = rdflib_to_networkx_multidigraph(graph)
pos = nx.spring_layout(G, scale=2)
edge_labels = nx.get_edge_attributes(G, 'r')
nx.draw_networkx_edge_labels(G, pos, edge_labels=edge_labels)
# Pass pos so the nodes are drawn at the same positions as the edge labels
nx.draw(G, pos, with_labels=False)
plt.show()
References:
RDFLib:
owlready2:
- https://owlready2.readthedocs.io/en/latest/
- https://pypi.org/project/Owlready2/
- https://bitbucket.org/jibalamy/owlready2/src/master/
- https://github.com/pwin/owlready2
SPARQL: