Knowledge Graph & Reasoning
Table of Contents
- Introduction to Knowledge Graphs
- RDF and Knowledge Graph Components
- Loading RDF Data
- Querying Knowledge Graphs
- The Reasoner
- Defining Rules
- Forward Chaining
- Backward Chaining
- Integrity Constraints
- N3 Logic Rules
- ML Integration in Rules
- Benchmarks
Introduction to Knowledge Graphs
A knowledge graph is a structured representation of interconnected data. It captures entities, relationships between entities, and attributes in a meaningful network, making it easier to explore and derive insights from complex datasets.
RDF and Knowledge Graph Components
In RDF (Resource Description Framework), a knowledge graph consists of:
- Entities: Represented by subjects and objects (e.g., people, organizations).
- Relationships: Represented by predicates connecting entities.
- Attributes: Properties providing additional details about entities.
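As a rough illustration of these three components, the sketch below models triples as plain Python tuples, using the Alice and Bob data from this document. It does not use Kolibrie; the `is_iri` helper and the rule "IRI object means relationship, literal object means attribute" are simplifying assumptions made for this example.

```python
# Each RDF statement is a (subject, predicate, object) triple.
FOAF = "http://xmlns.com/foaf/0.1/"
EX = "http://example.org/"

triples = [
    (EX + "alice", FOAF + "name", "Alice Smith"),      # attribute (literal object)
    (EX + "alice", EX + "worksAt", EX + "company1"),   # relationship
    (EX + "alice", FOAF + "knows", EX + "bob"),        # relationship
    (EX + "bob", FOAF + "name", "Bob Johnson"),        # attribute
    (EX + "bob", EX + "worksAt", EX + "company2"),     # relationship
]

def is_iri(term):
    """Hypothetical helper: treat anything that looks like an IRI as an entity."""
    return term.startswith("http://")

# Entities appear as subjects, or as objects that are IRIs.
entities = {s for s, _, _ in triples} | {o for _, _, o in triples if is_iri(o)}
# Relationships link two entities; attributes attach a literal to an entity.
relationships = [t for t in triples if is_iri(t[2])]
attributes = [t for t in triples if not is_iri(t[2])]

print(len(entities), len(relationships), len(attributes))  # 4 3 2
```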
Loading RDF Data
Create a SparqlDatabase and load your RDF data:
use kolibrie::SparqlDatabase;
use kolibrie::execute_query::execute_query;
fn main() {
let rdf_data = r#"
<?xml version="1.0" encoding="UTF-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
xmlns:foaf="http://xmlns.com/foaf/0.1/"
xmlns:ex="http://example.org/">
<rdf:Description rdf:about="http://example.org/alice">
<foaf:name>Alice Smith</foaf:name>
<ex:worksAt rdf:resource="http://example.org/company1"/>
<foaf:knows rdf:resource="http://example.org/bob"/>
</rdf:Description>
<rdf:Description rdf:about="http://example.org/bob">
<foaf:name>Bob Johnson</foaf:name>
<ex:worksAt rdf:resource="http://example.org/company2"/>
</rdf:Description>
</rdf:RDF>
"#;
let mut db = SparqlDatabase::new();
db.parse_rdf(rdf_data);
}
Querying Knowledge Graphs
SPARQL queries retrieve entities and their relationships.
Basic Query
Retrieve people and their workplaces:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX ex: <http://example.org/>
SELECT ?person ?name ?company
WHERE {
?person foaf:name ?name .
?person ex:worksAt ?company
}
Advanced Query
Find people who know each other but work at different companies:
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX ex: <http://example.org/>
SELECT ?person1 ?person2
WHERE {
?person1 foaf:knows ?person2 .
?person1 ex:worksAt ?company1 .
?person2 ex:worksAt ?company2 .
FILTER(?company1 != ?company2)
}
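To make the semantics of the advanced query concrete, here is a minimal pure-Python sketch of how a basic graph pattern plus a FILTER can be evaluated: each triple pattern extends the set of variable bindings, and the filter discards rows afterwards. It is an illustration under simplifying assumptions (prefixed names as plain strings, nested-loop joins) and says nothing about how Kolibrie's query engine is implemented.

```python
def match_pattern(pattern, triple, binding):
    """Extend binding so that pattern matches triple, or return None."""
    extended = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if p_term.startswith("?"):                       # variable
            if extended.setdefault(p_term, t_term) != t_term:
                return None                              # conflicts with earlier binding
        elif p_term != t_term:                           # constant mismatch
            return None
    return extended

def evaluate(patterns, triples):
    """Join the triple patterns left to right with nested loops."""
    bindings = [{}]
    for pattern in patterns:
        next_bindings = []
        for binding in bindings:
            for triple in triples:
                extended = match_pattern(pattern, triple, binding)
                if extended is not None:
                    next_bindings.append(extended)
        bindings = next_bindings
    return bindings

triples = [
    ("ex:alice", "foaf:knows", "ex:bob"),
    ("ex:alice", "ex:worksAt", "ex:company1"),
    ("ex:bob", "ex:worksAt", "ex:company2"),
]
patterns = [
    ("?person1", "foaf:knows", "?person2"),
    ("?person1", "ex:worksAt", "?company1"),
    ("?person2", "ex:worksAt", "?company2"),
]

# FILTER(?company1 != ?company2), then project ?person1 ?person2
rows = [
    (b["?person1"], b["?person2"])
    for b in evaluate(patterns, triples)
    if b["?company1"] != b["?company2"]
]
print(rows)  # [('ex:alice', 'ex:bob')]
```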
The Reasoner
Beyond SPARQL queries, Kolibrie includes a Reasoner that lets you define logic rules and automatically derive new facts from existing data. The Reasoner operates independently of the query engine: use it when you need rule-based inference that produces persistent new triples, not just query-time results.
Creating a Reasoner and Adding Facts
Rust:
use datalog::reasoning::Reasoner;
let mut kg = Reasoner::new();
kg.add_abox_triple("Alice", "hasParent", "Bob");
kg.add_abox_triple("Bob", "hasParent", "Charlie");
Python:
import py_kolibrie
graph = py_kolibrie.PyKnowledgeGraph()
graph.add_abox_triple("Alice", "hasParent", "Bob")
graph.add_abox_triple("Bob", "hasParent", "Charlie")
Defining Rules
A rule consists of one or more premise triple patterns and one or more conclusion triple patterns. When all premises match, the conclusion triples are asserted.
Example rule: “If X has a parent Y and Y has a parent Z, then X has a grandparent Z.”
Rust:
use shared::terms::Term;
use shared::rule::Rule;
// Encode the predicate strings into the dictionary
let mut dict = kg.dictionary.write().unwrap();
let parent_id = dict.encode("hasParent");
let grandparent_id = dict.encode("hasGrandparent");
drop(dict);
let rule = Rule {
premise: vec![
(Term::Variable("X".into()), Term::Constant(parent_id), Term::Variable("Y".into())),
(Term::Variable("Y".into()), Term::Constant(parent_id), Term::Variable("Z".into())),
],
negative_premise: vec![],
conclusion: vec![(
Term::Variable("X".into()),
Term::Constant(grandparent_id),
Term::Variable("Z".into()),
)],
filters: vec![],
};
kg.add_rule(rule);
Python:
has_parent = graph.encode_term("hasParent")
has_grandparent = graph.encode_term("hasGrandparent")
rule = py_kolibrie.PyRule(
premise=[
py_kolibrie.PyTriplePattern(
py_kolibrie.PyTerm.Variable("X"),
py_kolibrie.PyTerm.Constant(has_parent),
py_kolibrie.PyTerm.Variable("Y")),
py_kolibrie.PyTriplePattern(
py_kolibrie.PyTerm.Variable("Y"),
py_kolibrie.PyTerm.Constant(has_parent),
py_kolibrie.PyTerm.Variable("Z")),
],
filters=[],
conclusion=[py_kolibrie.PyTriplePattern(
py_kolibrie.PyTerm.Variable("X"),
py_kolibrie.PyTerm.Constant(has_grandparent),
py_kolibrie.PyTerm.Variable("Z"),
)],
)
graph.add_rule(rule)
Forward Chaining
Forward chaining starts from the known facts and applies rules repeatedly until no new facts can be derived.
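The fixpoint idea can be sketched in a few lines of plain Python. This is a naive illustration, not Kolibrie's implementation: terms are plain strings, variables are marked with a leading `?`, and `match`, `apply_rule`, and `forward_chain` are hypothetical helper names introduced for this example.

```python
def match(pattern, fact, binding):
    """Return binding extended so that pattern equals fact, or None."""
    extended = dict(binding)
    for term, value in zip(pattern, fact):
        if term.startswith("?"):                  # variable
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:                       # constant mismatch
            return None
    return extended

def apply_rule(premises, conclusion, facts):
    """Derive every instance of the conclusion whose premises all match."""
    bindings = [{}]
    for premise in premises:
        bindings = [
            extended
            for binding in bindings
            for fact in facts
            for extended in [match(premise, fact, binding)]
            if extended is not None
        ]
    return {tuple(b.get(t, t) for t in conclusion) for b in bindings}

def forward_chain(facts, rules):
    """Apply all rules repeatedly until a round derives nothing new."""
    facts = set(facts)
    while True:
        derived = set()
        for premises, conclusion in rules:
            derived |= apply_rule(premises, conclusion, facts)
        new = derived - facts
        if not new:
            return facts
        facts |= new

facts = {("Alice", "hasParent", "Bob"), ("Bob", "hasParent", "Charlie")}
rules = [(
    [("?X", "hasParent", "?Y"), ("?Y", "hasParent", "?Z")],
    ("?X", "hasGrandparent", "?Z"),
)]
inferred = forward_chain(facts, rules) - facts
print(inferred)  # {('Alice', 'hasGrandparent', 'Charlie')}
```

Every round re-evaluates every rule against all facts, which is simple but does redundant work as the fact set grows.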
Inference Methods
| Method | Use Case |
|---|---|
| infer_new_facts() | Small datasets; basic forward chaining |
| infer_new_facts_semi_naive() | Larger datasets; efficient incremental reasoning |
| infer_new_facts_semi_naive_parallel() | Large-scale; multi-threaded inference |
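The general idea behind semi-naive evaluation is that each round only considers rule instances in which at least one premise matches a fact derived in the previous round (the "delta"). The sketch below shows that textbook technique in plain Python; it is an assumption-laden illustration, not a description of what infer_new_facts_semi_naive() does internally. The hasAncestor predicate and all helper names are hypothetical.

```python
def match(pattern, fact, binding):
    """Return binding extended so that pattern equals fact, or None."""
    extended = dict(binding)
    for term, value in zip(pattern, fact):
        if term.startswith("?"):
            if extended.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return extended

def join(premises, sources):
    """Match premise i against sources[i] and return all consistent bindings."""
    bindings = [{}]
    for premise, source in zip(premises, sources):
        bindings = [
            extended
            for binding in bindings
            for fact in source
            for extended in [match(premise, fact, binding)]
            if extended is not None
        ]
    return bindings

def semi_naive(facts, rules):
    known = set(facts)
    delta = set(facts)            # facts that are new since the last round
    while delta:
        derived = set()
        for premises, conclusion in rules:
            for i in range(len(premises)):
                # Premise i must use a new fact; the others may use any known fact.
                sources = [delta if j == i else known for j in range(len(premises))]
                for b in join(premises, sources):
                    derived.add(tuple(b.get(t, t) for t in conclusion))
        delta = derived - known
        known |= delta
    return known

facts = {
    ("Alice", "hasParent", "Bob"),
    ("Bob", "hasParent", "Charlie"),
    ("Charlie", "hasParent", "Dana"),
}
rules = [
    ([("?X", "hasParent", "?Y")], ("?X", "hasAncestor", "?Y")),
    ([("?X", "hasAncestor", "?Y"), ("?Y", "hasParent", "?Z")], ("?X", "hasAncestor", "?Z")),
]
ancestors = sorted((s, o) for s, p, o in semi_naive(facts, rules) if p == "hasAncestor")
print(len(ancestors))  # 6
```

Once the delta is empty no rule can fire on anything new, so the loop stops at the same fixpoint a naive evaluation would reach.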
Rust — run inference and print derived facts:
let inferred = kg.infer_new_facts_semi_naive();
let dict = kg.dictionary.read().unwrap();
for triple in &inferred {
let s = dict.decode(triple.subject).unwrap_or("?");
let p = dict.decode(triple.predicate).unwrap_or("?");
let o = dict.decode(triple.object).unwrap_or("?");
println!("{s} {p} {o}");
}
Output:
Alice hasGrandparent Charlie
Python:
inferred = graph.infer_new_facts()
for subject, predicate, obj in inferred:
    print(f"{subject} {predicate} {obj}")
Querying After Inference
Query the ABox (instance-level facts) after inference has run:
Rust:
let results = kg.query_abox(Some("Alice"), Some("hasGrandparent"), None);
Python:
# Returns all (subject, predicate, object) tuples in the ABox
all_facts = graph.query_abox()
Backward Chaining
Backward chaining works in reverse: given a goal pattern, Kolibrie proves whether it holds by working backwards through the rules. This is useful for answering specific queries rather than materializing all possible derivations.
Rust:
use shared::terms::Term;
let grandparent_id = kg.dictionary.write().unwrap().encode("hasGrandparent");
let query_pattern = (
Term::Variable("X".into()),
Term::Constant(grandparent_id),
Term::Variable("Z".into()),
);
let results = kg.backward_chaining(&query_pattern);
// results: Vec<HashMap<String, Term>>
for binding in &results {
println!("{:?}", binding);
}
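For intuition, goal-directed proof search can be sketched in plain Python as follows. A goal is proved either by unifying it with a stored fact or by unifying it with a rule conclusion and then proving that rule's premises. This is a generic sketch of the technique with a depth limit; the helper names and the string-based term encoding are assumptions, and Kolibrie's backward_chaining may work differently.

```python
import itertools

def is_var(term):
    return term.startswith("?")

def walk(term, subst):
    """Follow variable bindings until reaching a constant or an unbound variable."""
    while is_var(term) and term in subst:
        term = subst[term]
    return term

def unify(a, b, subst):
    """Unify two triple patterns; return the extended substitution or None."""
    subst = dict(subst)
    for x, y in zip(a, b):
        x, y = walk(x, subst), walk(y, subst)
        if x == y:
            continue
        if is_var(x):
            subst[x] = y
        elif is_var(y):
            subst[y] = x
        else:
            return None
    return subst

_fresh = itertools.count()

def rename(rule):
    """Give the rule's variables fresh names so separate uses do not clash."""
    n = next(_fresh)
    premises, conclusion = rule
    fresh = lambda pat: tuple(f"{t}_{n}" if is_var(t) else t for t in pat)
    return [fresh(p) for p in premises], fresh(conclusion)

def prove(goals, subst, facts, rules, depth=0, max_depth=10):
    """Yield every substitution under which all goals hold."""
    if not goals:
        yield subst
        return
    if depth > max_depth:          # guard against unbounded recursion
        return
    goal, rest = goals[0], goals[1:]
    for fact in facts:             # case 1: the goal is a stored fact
        s = unify(goal, fact, subst)
        if s is not None:
            yield from prove(rest, s, facts, rules, depth, max_depth)
    for rule in rules:             # case 2: the goal follows from a rule
        premises, conclusion = rename(rule)
        s = unify(goal, conclusion, subst)
        if s is not None:
            yield from prove(premises + rest, s, facts, rules, depth + 1, max_depth)

facts = [("Alice", "hasParent", "Bob"), ("Bob", "hasParent", "Charlie")]
rules = [(
    [("?X", "hasParent", "?Y"), ("?Y", "hasParent", "?Z")],
    ("?X", "hasGrandparent", "?Z"),
)]
goal = ("?X", "hasGrandparent", "?Z")
answers = [
    {v: walk(v, s) for v in goal if is_var(v)}
    for s in prove([goal], {}, facts, rules)
]
print(answers)  # [{'?X': 'Alice', '?Z': 'Charlie'}]
```

Only the facts needed to answer the goal are touched, and no hasGrandparent triple is ever stored.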
Integrity Constraints
Integrity constraints are rules whose conclusion signals an inconsistency. When a constraint fires, Kolibrie can automatically repair the knowledge graph by removing one of the conflicting triples.
Example: “No entity can be both a Professor and a Student.”
Rust:
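// Encode the constraint terms, following the same dictionary pattern as in Defining Rules
let mut dict = kg.dictionary.write().unwrap();
let isa_id = dict.encode("isA");
let professor_id = dict.encode("Professor");
let student_id = dict.encode("Student");
drop(dict);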
// A constraint conclusion uses sentinel (0, 0, 0) to signal violation
let constraint = Rule {
premise: vec![
(Term::Variable("X".into()), Term::Constant(isa_id), Term::Constant(professor_id)),
(Term::Variable("X".into()), Term::Constant(isa_id), Term::Constant(student_id)),
],
negative_premise: vec![],
conclusion: vec![(Term::Constant(0), Term::Constant(0), Term::Constant(0))],
filters: vec![],
};
kg.add_constraint(constraint);
// Inference with automatic repair of violations
let inferred = kg.infer_new_facts_semi_naive_with_repairs();
Python:
isa_id = graph.encode_term("isA")
professor_id = graph.encode_term("Professor")
student_id = graph.encode_term("Student")
constraint = py_kolibrie.PyRule(
premise=[
py_kolibrie.PyTriplePattern(
py_kolibrie.PyTerm.Variable("X"),
py_kolibrie.PyTerm.Constant(isa_id),
py_kolibrie.PyTerm.Constant(professor_id)),
py_kolibrie.PyTriplePattern(
py_kolibrie.PyTerm.Variable("X"),
py_kolibrie.PyTerm.Constant(isa_id),
py_kolibrie.PyTerm.Constant(student_id)),
],
filters=[],
conclusion=[py_kolibrie.PyTriplePattern(
py_kolibrie.PyTerm.Constant(0),
py_kolibrie.PyTerm.Constant(0),
py_kolibrie.PyTerm.Constant(0),
)],
)
graph.add_constraint(constraint)
inferred = graph.infer_new_facts_semi_naive_with_repairs()
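Conceptually, detecting and repairing a violation of the Professor/Student constraint looks like the plain-Python sketch below. Which of the two conflicting triples gets removed is a policy decision; this sketch arbitrarily drops the one asserted later, which is an assumption for illustration and not a statement about Kolibrie's repair strategy. The entity names Carol and Dave are made up.

```python
def find_violations(facts):
    """Return pairs of facts saying one entity is both a Professor and a Student."""
    professors = {s for s, p, o in facts if p == "isA" and o == "Professor"}
    students = {s for s, p, o in facts if p == "isA" and o == "Student"}
    return [
        ((x, "isA", "Professor"), (x, "isA", "Student"))
        for x in sorted(professors & students)
    ]

def repair(facts):
    """Remove one triple per violation (hypothetical policy: the later assertion)."""
    facts = list(facts)
    for a, b in find_violations(facts):
        later = a if facts.index(a) > facts.index(b) else b
        facts.remove(later)
    return facts

facts = [
    ("Carol", "isA", "Professor"),
    ("Carol", "isA", "Student"),    # conflicts with the fact above
    ("Dave", "isA", "Student"),
]
repaired = repair(facts)
print(repaired)
# [('Carol', 'isA', 'Professor'), ('Dave', 'isA', 'Student')]
```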
Inconsistency-Tolerant Querying
Query for answers that hold under every possible repair, that is, answers supported only by facts that appear in all repairs (IAR, intersection of ABox repairs, semantics):
Rust:
let results = kg.query_with_repairs(&query_pattern);
Python:
results = graph.query_with_repairs(query_pattern)
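The IAR idea can be illustrated by enumerating the repairs explicitly and querying only their intersection. The sketch below does this for the Professor/Student constraint in plain Python. It assumes conflicts are disjoint pairs, which holds for this single constraint; the function names and the Carol/Dave data are hypothetical and the code does not call Kolibrie.

```python
from itertools import product

def conflict_pairs(facts):
    """Pairs of facts that jointly violate 'no Professor is also a Student'."""
    professors = {s for s, p, o in facts if p == "isA" and o == "Professor"}
    students = {s for s, p, o in facts if p == "isA" and o == "Student"}
    return [
        ((x, "isA", "Professor"), (x, "isA", "Student"))
        for x in sorted(professors & students)
    ]

def repairs(facts):
    """Every way of restoring consistency by dropping one fact per conflict."""
    pairs = conflict_pairs(facts)
    return [set(facts) - set(removed) for removed in product(*pairs)]

def iar_answers(facts, predicate, obj):
    """Subjects s such that (s, predicate, obj) survives in all repairs."""
    common = set.intersection(*repairs(facts))
    return sorted(s for s, p, o in common if p == predicate and o == obj)

facts = [
    ("Carol", "isA", "Professor"),
    ("Carol", "isA", "Student"),
    ("Dave", "isA", "Student"),
]
print(len(repairs(facts)))                      # 2
print(iar_answers(facts, "isA", "Student"))     # ['Dave']
print(iar_answers(facts, "isA", "Professor"))   # []
```

Carol appears in neither answer because each of her two facts is missing from one of the repairs, while Dave's fact is untouched by any conflict.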
N3 Logic Rules
Kolibrie parses N3 notation directly. N3 rules use the => arrow and can be sent to the HTTP server via the n3logic field:
@prefix ex: <http://example.org/> .
{ ?X ex:hasParent ?Y .
?Y ex:hasParent ?Z . }
=> { ?X ex:hasGrandparent ?Z . } .
Send via the HTTP server:
curl -X POST http://localhost:8080/query \
-H "Content-Type: application/json" \
-d '{
"rdf": "<your RDF/XML data>",
"n3logic": "@prefix ex: <http://example.org/> . { ?X ex:hasParent ?Y . ?Y ex:hasParent ?Z . } => { ?X ex:hasGrandparent ?Z . } .",
"sparql": "PREFIX ex: <http://example.org/> SELECT ?x ?z WHERE { ?x ex:hasGrandparent ?z }"
}'
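The same request can be assembled from Python with only the standard library. The sketch below builds the JSON body with the rdf, n3logic, and sparql fields shown in the curl example and leaves the actual send commented out, since it needs a running server; the `build_request` helper is a name introduced here, and the response format is not covered by this document.

```python
import json
import urllib.request

def build_request(rdf_xml, n3_rules, sparql, url="http://localhost:8080/query"):
    """Build a POST request carrying RDF data, N3 rules, and a SPARQL query."""
    payload = {"rdf": rdf_xml, "n3logic": n3_rules, "sparql": sparql}
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

n3_rules = (
    "@prefix ex: <http://example.org/> . "
    "{ ?X ex:hasParent ?Y . ?Y ex:hasParent ?Z . } "
    "=> { ?X ex:hasGrandparent ?Z . } ."
)
sparql = (
    "PREFIX ex: <http://example.org/> "
    "SELECT ?x ?z WHERE { ?x ex:hasGrandparent ?z }"
)
request = build_request("<your RDF/XML data>", n3_rules, sparql)

# To send it against a running server:
# with urllib.request.urlopen(request) as response:
#     print(response.read().decode("utf-8"))
```

Using json.dumps avoids the manual quoting that the shell version needs for the embedded rule and query strings.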
ML Integration in Rules
Rules can invoke ML models during reasoning using the ML.PREDICT() syntax. Predictions from Python ML frameworks become first-class facts in the knowledge graph:
RULE :TemperatureForecast :-
CONSTRUCT { ?room ex:predictedTemp ?predicted_temp . }
WHERE {
?room sensor:temperature ?temp ;
sensor:humidity ?humidity ;
sensor:occupancy ?occupancy .
}
ML.PREDICT(MODEL "temperature_predictor",
INPUT {
SELECT ?room ?temp ?humidity ?occupancy
WHERE {
?room sensor:temperature ?temp ;
sensor:humidity ?humidity ;
sensor:occupancy ?occupancy .
}
},
OUTPUT ?predicted_temp
)
The model name maps to a registered Python callable. Kolibrie calls the model with the bound variables from the INPUT subquery and asserts the OUTPUT value into the CONSTRUCT pattern.
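The flow described above (look up the callable by name, feed it the bound input variables, assert the output as a new triple) can be mimicked in plain Python as follows. Everything here is hypothetical: the `MODELS` registry, the helper names, and the toy formula standing in for a trained model are assumptions for illustration and do not reflect Kolibrie's registration API.

```python
def toy_temperature_predictor(temp, humidity, occupancy):
    """Stand-in for a trained model; the formula is made up for this example."""
    return temp + 0.5 * occupancy

# Hypothetical registry mapping model names to Python callables.
MODELS = {"temperature_predictor": toy_temperature_predictor}

def select_inputs(facts):
    """Mimic the INPUT subquery: rooms that have all three sensor readings."""
    readings = {}
    for s, p, o in facts:
        if p.startswith("sensor:"):
            readings.setdefault(s, {})[p.split(":", 1)[1]] = o
    required = ("temperature", "humidity", "occupancy")
    return [
        (room, *(values[name] for name in required))
        for room, values in sorted(readings.items())
        if all(name in values for name in required)
    ]

def apply_ml_rule(facts, model_name):
    """Call the model per input row and build the CONSTRUCT triples."""
    model = MODELS[model_name]
    return [
        (room, "ex:predictedTemp", model(temp, humidity, occupancy))
        for room, temp, humidity, occupancy in select_inputs(facts)
    ]

facts = [
    ("ex:room1", "sensor:temperature", 21.0),
    ("ex:room1", "sensor:humidity", 40.0),
    ("ex:room1", "sensor:occupancy", 4),
    ("ex:room2", "sensor:temperature", 19.0),   # incomplete readings: no match
]
predicted = apply_ml_rule(facts, "temperature_predictor")
print(predicted)  # [('ex:room1', 'ex:predictedTemp', 23.0)]
```

A room with incomplete readings produces no prediction because the input pattern requires all three sensor properties to be bound.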
Benchmarks
Kolibrie’s reasoning and query performance has been evaluated against established systems.
WatDiv 10M triple dataset (20 runs per query pattern):
- Sub-millisecond to low-millisecond query times across all WatDiv patterns (L, S, F, C types)
- Outperforms Blazegraph, QLever, and Oxigraph (RocksDB) consistently across query categories
Deep Taxonomy Reasoning (hierarchy depths from 10 to 10,000 levels):
- Logarithmic scaling with hierarchy depth
- Sub-second response times at 10,000 levels
- Faster than Apache Jena and the EYE reasoner at all tested depths