Core Concepts
This page introduces the foundational concepts behind Kolibrie: the RDF data model, the SPARQL query language, and the features Kolibrie brings on top of them.
RDF: The Data Model
RDF (Resource Description Framework) represents information as triples: a subject, a predicate, and an object. Every fact in your dataset is a triple.
The same fact — “Alice knows Bob” — expressed in three common RDF serializations:
RDF/XML:
```xml
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:ex="http://example.org/">
  <rdf:Description rdf:about="http://example.org/Alice">
    <ex:knows rdf:resource="http://example.org/Bob"/>
  </rdf:Description>
</rdf:RDF>
```
Turtle:
```turtle
@prefix ex: <http://example.org/> .

ex:Alice ex:knows ex:Bob .
```
N-Triples:
```
<http://example.org/Alice> <http://example.org/knows> <http://example.org/Bob> .
```
Kolibrie accepts all three formats (plus N3), so you can load data in whichever serialization you already have.
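Whatever the serialization, the underlying model is the same: a set of (subject, predicate, object) tuples. A minimal sketch in plain Python (not Kolibrie's actual API) shows how a triple store and a wildcard pattern match work at their core:

```python
# A triple store reduced to its essence: a set of (subject, predicate, object)
# tuples. `None` in a pattern position acts as a wildcard.

store = {
    ("ex:Alice", "ex:knows", "ex:Bob"),
    ("ex:Bob", "ex:knows", "ex:Carol"),
    ("ex:Alice", "ex:age", "30"),
}

def match(store, s=None, p=None, o=None):
    """Return every triple whose fixed positions match the pattern."""
    return [
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]

# Everything known about Alice, via any predicate:
print(match(store, s="ex:Alice"))
```

Every SPARQL triple pattern is a generalization of this idea, with variables in place of the `None` wildcards.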
SPARQL: The Query Language
SPARQL (SPARQL Protocol and RDF Query Language) matches patterns against your RDF triples to retrieve or modify data. A complete example:
```sparql
PREFIX ex: <http://example.org/>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>

SELECT ?person ?salary
WHERE {
  ?person ex:hasOccupation "Engineer" .
  ?person ex:salary ?salary .
  FILTER (?salary > 60000)
}
ORDER BY DESC(?salary)
LIMIT 10
```
Key clauses:
| Clause | Purpose |
|---|---|
| PREFIX | Define namespace shortcuts |
| SELECT | Choose which variables to return |
| WHERE | Specify triple patterns to match |
| FILTER | Apply conditions to restrict results |
| BIND | Compute and assign new variables |
| GROUP BY | Aggregate results by a variable |
| INSERT / DELETE | Add or remove triples |
| VALUES | Provide inline data bindings |
| ORDER BY | Sort results |
| LIMIT / OFFSET | Page through results |
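The query above can be traced by hand. The following plain-Python sketch mirrors its semantics (match patterns into variable bindings, join, then FILTER, ORDER BY, and LIMIT); it illustrates what the clauses mean, not how Kolibrie's engine is implemented:

```python
# Evaluate the example SELECT query step by step over a tiny dataset.
triples = [
    ("ex:alice", "ex:hasOccupation", "Engineer"),
    ("ex:alice", "ex:salary", 75000),
    ("ex:bob", "ex:hasOccupation", "Engineer"),
    ("ex:bob", "ex:salary", 55000),
    ("ex:carol", "ex:hasOccupation", "Engineer"),
    ("ex:carol", "ex:salary", 90000),
]

# Pattern 1: ?person ex:hasOccupation "Engineer"
engineers = {s for s, p, o in triples
             if p == "ex:hasOccupation" and o == "Engineer"}

# Join with pattern 2: ?person ex:salary ?salary
bindings = [{"person": s, "salary": o} for s, p, o in triples
            if p == "ex:salary" and s in engineers]

# FILTER (?salary > 60000), then ORDER BY DESC(?salary), then LIMIT 10
results = sorted((b for b in bindings if b["salary"] > 60000),
                 key=lambda b: b["salary"], reverse=True)[:10]
print(results)
```

Running this yields Carol (90000) before Alice (75000), with Bob filtered out, exactly as the SPARQL query would.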
What Kolibrie Adds
Kolibrie is a complete semantic data platform built around SPARQL. Here is what it supports beyond a basic query engine:
Multi-Format RDF Parsing
Load data from RDF/XML, Turtle, N3, N-Triples, or their RDF-star variants. File-based loading is automatically parallelized.
Full SPARQL 1.1
SELECT, INSERT, DELETE, FILTER, BIND, GROUP BY, VALUES, ORDER BY, LIMIT, OFFSET, CONCAT, nested queries, and user-defined functions (UDFs) — the full language, not a subset.
Stream Processing (RSP-QL)
Write continuous queries over timestamped RDF streams using sliding windows. Kolibrie evaluates RSTREAM, ISTREAM, and DSTREAM operators as new events arrive.
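The core mechanic of a sliding window is simple to sketch in plain Python (window size and event shape are illustrative; this is the idea, not Kolibrie's RSP-QL engine): keep only events whose timestamp falls inside the window, and report the window's full contents each time it advances, in the spirit of RSTREAM.

```python
from collections import deque

WINDOW = 10        # window width, in the stream's time units (illustrative)
events = deque()   # (timestamp, triple) pairs, oldest first

def push(ts, triple):
    """Append a timestamped triple, evict expired events, report the window."""
    events.append((ts, triple))
    while events and events[0][0] <= ts - WINDOW:
        events.popleft()
    # RSTREAM-style output: everything currently inside the window
    return [t for _, t in events]

push(1, ("ex:sensor1", "ex:temp", 20))
push(5, ("ex:sensor1", "ex:temp", 22))
out = push(12, ("ex:sensor1", "ex:temp", 25))  # the ts=1 event has expired
print(out)
```

ISTREAM and DSTREAM differ only in what they report: the triples that entered the window since the last evaluation, or the ones that left it.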
Knowledge Graph Reasoning
Define logic rules and let Kolibrie derive new facts automatically using forward chaining, backward chaining, or semi-naive evaluation. Supports integrity constraints, inconsistency repair, and probabilistic reasoning.
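Forward chaining is the easiest of these to picture: apply the rules to the known facts, add whatever they derive, and repeat until nothing new appears (a fixpoint). A plain-Python sketch with a single transitivity rule over a hypothetical ex:ancestorOf predicate (not Kolibrie's rule syntax):

```python
facts = {
    ("ex:Alice", "ex:ancestorOf", "ex:Bob"),
    ("ex:Bob", "ex:ancestorOf", "ex:Carol"),
}

def forward_chain(facts):
    """Apply ancestorOf(x,y) & ancestorOf(y,z) -> ancestorOf(x,z) to a fixpoint."""
    facts = set(facts)
    while True:
        derived = {
            (x, "ex:ancestorOf", z)
            for (x, p1, y1) in facts if p1 == "ex:ancestorOf"
            for (y2, p2, z) in facts if p2 == "ex:ancestorOf" and y2 == y1
        } - facts
        if not derived:      # fixpoint: no rule produces anything new
            return facts
        facts |= derived

closed = forward_chain(facts)
print(("ex:Alice", "ex:ancestorOf", "ex:Carol") in closed)
```

Semi-naive evaluation refines this loop by joining only against facts derived in the previous round, avoiding redundant re-derivations; backward chaining instead works from a goal back toward known facts.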
ML Integration
Call machine learning models directly inside reasoning rules using the ML.PREDICT() syntax. Predictions from Python ML frameworks become first-class facts in your knowledge graph.
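The idea can be sketched in plain Python: a rule whose body calls out to a model, and whose prediction is asserted back as an ordinary fact. The function and predicate names below are illustrative stand-ins, not Kolibrie's actual API:

```python
def predict_risk(temperature):
    """Stand-in for an external ML model reached via ML.PREDICT()."""
    return "high" if temperature > 80 else "low"

facts = {
    ("ex:machine1", "ex:temperature", 95),
    ("ex:machine2", "ex:temperature", 40),
}

# Rule sketch: temperature(?m, ?t) -> riskLevel(?m, ML.PREDICT(?t))
for s, p, o in set(facts):
    if p == "ex:temperature":
        facts.add((s, "ex:riskLevel", predict_risk(o)))

print(sorted(f for f in facts if f[1] == "ex:riskLevel"))
```

Once asserted, the predicted facts are indistinguishable from loaded ones: later rules and SPARQL queries can match on them directly.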
Deployment Options
| Mode | How |
|---|---|
| Native Rust library | Add as a Cargo dependency |
| Python bindings | pip install maturin && maturin develop |
| HTTP server + web UI | cargo run --bin kolibrie-http-server |
| Docker (CPU) | docker compose up --build |
| Docker (GPU/CUDA) | docker compose --profile gpu up --build |
RDF-star / SPARQL-star
RDF-star (formerly RDF*) allows you to annotate existing triples — that is, use a triple itself as the subject or object of another triple. This is useful for provenance, confidence scores, and metadata.
Example: stating that Alice knows Bob with a confidence of 0.95:
```turtle
@prefix ex: <http://example.org/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .

<< ex:Alice ex:knows ex:Bob >> ex:confidence "0.95"^^xsd:decimal .
```
Kolibrie parses and stores quoted triples natively. You can query them using SPARQL-star syntax in your WHERE clauses.
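In data-structure terms, a quoted triple is just a triple used as a term, so nesting one tuple inside another models it directly. A plain-Python sketch of the shape (not Kolibrie's storage format):

```python
# A quoted triple is an ordinary triple appearing in the subject (or object)
# position of another triple.
quoted = ("ex:Alice", "ex:knows", "ex:Bob")       # the base triple
annotation = (quoted, "ex:confidence", 0.95)      # quoted triple as subject

store = {quoted, annotation}

def annotations_of(store, triple):
    """All (predicate, object) pairs attached to the quoted triple."""
    return [(p, o) for s, p, o in store if s == triple]

print(annotations_of(store, quoted))
```

Note that the base triple and its annotation are independent facts: the store holds both, and each can be matched separately.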
Data Flow at a Glance
A typical Kolibrie workflow:
- Load — parse RDF data from files or strings in any supported format
- Query — run SPARQL SELECT, INSERT, or DELETE statements; or build queries programmatically using the QueryBuilder API
- Reason — optionally apply rules to derive new facts from existing ones
- Stream — for live data, register RSP-QL windows and push timestamped triples as events arrive
- Integrate — expose everything through the REST API, or consume results from Python