SPARQL Tutorial

Table of Contents

  1. Introduction to SPARQL

  2. Writing Queries

  3. Advanced Processing Modes


Introduction to SPARQL

SPARQL (SPARQL Protocol and RDF Query Language) is a query language for retrieving and manipulating data stored in RDF (Resource Description Framework) format. RDF data is represented as triples, each consisting of a subject, a predicate, and an object. A SPARQL query describes triple patterns, which may contain variables. The engine returns every combination of variable bindings for which those patterns match the data.
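Here is a minimal sketch in plain Rust (std only) that shows how a single triple pattern is matched against a list of triples. It assumes every term is a plain string and that a term starting with ? is a variable. The names match_pattern and bind_term are hypothetical, and this is not Kolibrie's internal matcher.

```rust
use std::collections::HashMap;

// Assumed representation: (subject, predicate, object) as plain strings.
type Triple = (&'static str, &'static str, &'static str);
// One solution: variable name (without '?') -> bound value.
type Bindings = HashMap<String, String>;

/// Bind one pattern term against one data term.
/// Variables bind (or must agree with an earlier binding); constants must match exactly.
fn bind_term(pattern: &str, value: &str, b: &mut Bindings) -> bool {
    if let Some(var) = pattern.strip_prefix('?') {
        match b.get(var) {
            Some(existing) => existing == value,
            None => {
                b.insert(var.to_string(), value.to_string());
                true
            }
        }
    } else {
        pattern == value
    }
}

/// Return one binding set (solution) per triple that matches the pattern.
fn match_pattern(data: &[Triple], pattern: (&str, &str, &str)) -> Vec<Bindings> {
    data.iter()
        .filter_map(|&(s, p, o)| {
            let mut b = Bindings::new();
            (bind_term(pattern.0, s, &mut b)
                && bind_term(pattern.1, p, &mut b)
                && bind_term(pattern.2, o, &mut b))
            .then_some(b)
        })
        .collect()
}

fn main() {
    let data = [
        ("ex:Alice", "ex:hasOccupation", "Engineer"),
        ("ex:Bob", "ex:hasOccupation", "Teacher"),
        ("ex:Carol", "ex:hasOccupation", "Engineer"),
    ];
    // Equivalent of: SELECT ?person WHERE { ?person ex:hasOccupation "Engineer" }
    for b in match_pattern(&data, ("?person", "ex:hasOccupation", "Engineer")) {
        println!("{}", b["person"]);
    }
}
```

A variable that appears twice in a pattern, such as ?x in (?x, p, ?x), must bind to the same value in both positions. That is why bind_term checks for an existing binding first.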

SPARQL Syntax Basics

  • PREFIX: Declares a namespace prefix so that IRIs can be abbreviated (e.g. ex:Alice).
  • SELECT: Retrieves data matching query patterns.
  • WHERE: Specifies triple patterns for data retrieval.
  • FILTER: Restricts query results based on conditions.
  • BIND: Assigns a value to a variable within a query.
  • GROUP BY: Groups results by the specified variables so that aggregates such as AVG or COUNT are computed per group.
  • INSERT: Adds new data triples into the RDF dataset.
  • DELETE: Removes triples from the RDF dataset.
  • VALUES: Provides inline data bindings.
  • ORDER BY: Sorts results by one or more variables.

Example query structure:

PREFIX ex: <http://example.org/>
SELECT ?subject ?predicate ?object
WHERE {
  ?subject ?predicate ?object .
  FILTER(?predicate = ex:someProperty)
}

Writing Queries

Basic Queries

Retrieve individuals by occupation:

PREFIX ex: <http://example.org/>
SELECT ?person WHERE {
  ?person ex:hasOccupation "Engineer"
}

Aggregations

Calculate average salary:

PREFIX ds: <https://data.cityofchicago.org/resource/xzkq-xp2w/>
SELECT (AVG(?salary) AS ?average_salary) WHERE {
  ?employee ds:annual_salary ?salary
}
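AVG folds the values bound to ?salary in a group into a single number. Without GROUP BY, all solutions form one group. The std-only Rust sketch below contrasts that overall average with a per-group average, assuming salaries are already parsed into f64. The group key in average_by_group is hypothetical and stands in for whatever variable you would name in GROUP BY (for example ?department).

```rust
use std::collections::BTreeMap;

/// AVG over all solutions (no GROUP BY, so a single group).
/// Returns None when there are no rows (a choice made for this sketch).
fn average(salaries: &[f64]) -> Option<f64> {
    if salaries.is_empty() {
        None
    } else {
        Some(salaries.iter().sum::<f64>() / salaries.len() as f64)
    }
}

/// AVG per group: what `GROUP BY ?group` would compute.
/// Input rows are (hypothetical group key, salary).
fn average_by_group(rows: &[(&str, f64)]) -> BTreeMap<String, f64> {
    // Accumulate (sum, count) per group, then divide once at the end.
    let mut acc: BTreeMap<String, (f64, usize)> = BTreeMap::new();
    for &(group, salary) in rows {
        let e = acc.entry(group.to_string()).or_insert((0.0, 0));
        e.0 += salary;
        e.1 += 1;
    }
    acc.into_iter()
        .map(|(g, (sum, n))| (g, sum / n as f64))
        .collect()
}

fn main() {
    println!("{:?}", average(&[50000.0, 70000.0]));
    let rows = [("Police", 80000.0), ("Police", 60000.0), ("Fire", 90000.0)];
    for (group, avg) in average_by_group(&rows) {
        println!("{group}: {avg}");
    }
}
```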

Filtering and Bindings

Filter by author:

PREFIX dc: <http://purl.org/dc/elements/1.1/>
SELECT ?title ?author WHERE {
  ?book dc:title ?title .
  ?book dc:creator ?author .
  FILTER (?author = "Jane Austen")
}

Concatenate names:

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name WHERE {
  ?P foaf:givenName ?G .
  ?P foaf:surname ?S .
  BIND(CONCAT(?G, " ", ?S) AS ?name)
}
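BIND extends each solution with a new variable computed from variables that are already bound. The std-only Rust sketch below models this query. It assumes givenName and surname triples arrive as (?P, value) pairs and that each person has at most one surname. full_names is a hypothetical helper.

```rust
use std::collections::HashMap;

/// ?P foaf:givenName ?G . ?P foaf:surname ?S . BIND(CONCAT(?G, " ", ?S) AS ?name)
fn full_names(given: &[(&str, &str)], surnames: &[(&str, &str)]) -> Vec<String> {
    // Index surnames by person (assumes at most one surname per person).
    let by_person: HashMap<&str, &str> = surnames.iter().copied().collect();
    given
        .iter()
        // Join on ?P, then compute the bound ?name for each joined solution.
        .filter_map(|&(p, g)| by_person.get(p).map(|s| format!("{} {}", g, s)))
        .collect()
}

fn main() {
    let given = [("ex:p1", "Jane"), ("ex:p2", "Bram")];
    let surnames = [("ex:p1", "Austen")];
    for name in full_names(&given, &surnames) {
        println!("{name}");
    }
}
```

A person with a given name but no surname produces no row. The two triple patterns are joined on ?P before BIND runs, so that person never reaches the BIND step.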

Nested Queries

Find names of friends connected to Alice:

PREFIX ex: <http://example.org/>
SELECT ?friendName WHERE {
  ?person ex:name "Alice" .
  ?person ex:knows ?friend .
  {
    SELECT ?friend ?friendName WHERE {
      ?friend ex:name ?friendName
    }
  }
}
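A subquery is evaluated on its own. Its projected variables (here ?friend and ?friendName) are then joined with the outer patterns. Below is a minimal std-only Rust sketch of that evaluation order, assuming triples are plain string tuples. pairs and friend_names are hypothetical helpers.

```rust
// Assumed representation: (subject, predicate, object) as plain strings.
type Triple<'a> = (&'a str, &'a str, &'a str);

/// The pattern `?s <pred> ?o` as a list of (subject, object) pairs.
fn pairs<'a>(data: &[Triple<'a>], pred: &str) -> Vec<(&'a str, &'a str)> {
    data.iter().filter(|t| t.1 == pred).map(|t| (t.0, t.2)).collect()
}

/// Names of everyone known by the person whose ex:name is `person_name`.
fn friend_names<'a>(data: &[Triple<'a>], person_name: &str) -> Vec<&'a str> {
    // Inner SELECT, evaluated independently: one (?friend, ?friendName) row per ex:name triple.
    let sub = pairs(data, "ex:name");
    // Outer pattern `?person ex:name "Alice"`: subjects whose name matches.
    let people: Vec<&str> = pairs(data, "ex:name")
        .into_iter()
        .filter(|&(_, n)| n == person_name)
        .map(|(p, _)| p)
        .collect();
    // Outer pattern `?person ex:knows ?friend`, joined with the subquery on ?friend.
    let mut out = Vec::new();
    for (p, friend) in pairs(data, "ex:knows") {
        if !people.contains(&p) {
            continue;
        }
        for &(f, friend_name) in &sub {
            if f == friend {
                out.push(friend_name);
            }
        }
    }
    out
}

fn main() {
    let data = [
        ("ex:a", "ex:name", "Alice"),
        ("ex:b", "ex:name", "Bob"),
        ("ex:c", "ex:name", "Carol"),
        ("ex:a", "ex:knows", "ex:b"),
        ("ex:b", "ex:knows", "ex:c"),
    ];
    println!("{:?}", friend_names(&data, "Alice"));
}
```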

VALUES Clause

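VALUES supplies inline data bindings that are joined with the rest of the query. This lets you attach ad hoc values (here, a city per person) without first inserting them into the dataset: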

PREFIX ex: <http://example.org/>
SELECT ?person ?city ?friend WHERE {
  VALUES (?person ?city) {
    (ex:Alice "Brussels")
    (ex:Bob   "Ghent")
    (ex:Carol "Leuven")
  }
  ?person ex:knows ?friend .
}
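Semantically, VALUES is a small inline table that is joined with the solutions of the other patterns. The std-only Rust sketch below models that join with plain (person, city) and (person, friend) tuples, which is an assumed representation. A VALUES row whose ?person has no ex:knows triple is dropped, and a person with two friends yields two rows. values_join is a hypothetical name.

```rust
/// Inner join of an inline VALUES table with the solutions of `?person ex:knows ?friend`.
fn values_join<'a>(
    values: &[(&'a str, &'a str)], // (?person, ?city) rows from VALUES
    knows: &[(&'a str, &'a str)],  // (?person, ?friend) solutions of the triple pattern
) -> Vec<(&'a str, &'a str, &'a str)> {
    let mut out = Vec::new();
    for &(person, city) in values {
        for &(p, friend) in knows {
            // Shared variable ?person must have the same value on both sides.
            if p == person {
                out.push((person, city, friend));
            }
        }
    }
    out
}

fn main() {
    let values = [("ex:Alice", "Brussels"), ("ex:Bob", "Ghent"), ("ex:Carol", "Leuven")];
    let knows = [("ex:Alice", "ex:Bob"), ("ex:Carol", "ex:Alice"), ("ex:Carol", "ex:Bob")];
    for row in values_join(&values, &knows) {
        println!("{:?}", row);
    }
}
```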

VALUES is also useful for filtering a query to a fixed set of subjects or predicates:

PREFIX ex: <http://example.org/>
SELECT ?person ?salary WHERE {
  VALUES ?person { ex:Alice ex:Bob }
  ?person ex:salary ?salary .
}

Data Insertion

Insert a new triple conditionally:

PREFIX ex: <http://example.org/>
INSERT {
  <http://example.org/JohnDoe> ex:occupation "Software Developer"
} WHERE {
  <http://example.org/JohnDoe> ex:age "30"
}
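INSERT ... WHERE first evaluates the WHERE pattern and then instantiates the INSERT template once per solution. The std-only Rust sketch below generalizes the example by replacing the fixed subject with a variable. It assumes the graph is a BTreeSet of string triples, so inserting a triple that already exists changes nothing. insert_where is a hypothetical name.

```rust
use std::collections::BTreeSet;

// Assumed representation: (subject, predicate, object) as owned strings.
type Triple = (String, String, String);

/// INSERT { ?s ex:occupation <occupation> } WHERE { ?s ex:age <age> }
/// Returns how many new triples were actually added.
fn insert_where(graph: &mut BTreeSet<Triple>, age: &str, occupation: &str) -> usize {
    // Evaluate WHERE completely before modifying the graph.
    let subjects: Vec<String> = graph
        .iter()
        .filter(|(_, p, o)| p == "ex:age" && o == age)
        .map(|(s, _, _)| s.clone())
        .collect();
    let before = graph.len();
    // Instantiate the template once per solution; set semantics ignore duplicates.
    for s in subjects {
        graph.insert((s, "ex:occupation".to_string(), occupation.to_string()));
    }
    graph.len() - before
}

fn main() {
    let mut graph = BTreeSet::new();
    graph.insert(("ex:JohnDoe".to_string(), "ex:age".to_string(), "30".to_string()));
    let added = insert_where(&mut graph, "30", "Software Developer");
    println!("added {added}, graph size {}", graph.len());
}
```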

Insert without a condition:

PREFIX ex: <http://example.org/>
INSERT DATA {
  ex:Alice ex:salary 75000 .
  ex:Alice ex:department "Engineering" .
}

Deleting Data

Remove a specific triple:

PREFIX ex: <http://example.org/>
DELETE {
  <http://example.org/JohnDoe> ex:occupation "Engineer"
} WHERE {
  <http://example.org/JohnDoe> ex:occupation "Engineer"
}

Delete all triples matching a pattern:

PREFIX ex: <http://example.org/>
DELETE {
  ?person ex:salary ?oldSalary
} WHERE {
  ?person ex:salary ?oldSalary .
  FILTER(?oldSalary < 40000)
}
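This FILTER should compare salaries as numbers, not strings. String comparison goes character by character, so "100000" < "40000" holds for strings. The std-only Rust sketch below shows the delete step, assuming salaries are stored as integers in (person, salary) rows. delete_below is a hypothetical helper.

```rust
/// DELETE { ?person ex:salary ?oldSalary } WHERE { ... FILTER(?oldSalary < threshold) }
/// over (person, salary) rows. Returns how many rows were removed.
fn delete_below(rows: &mut Vec<(String, i64)>, threshold: i64) -> usize {
    let before = rows.len();
    // Keep every row the FILTER would not select for deletion.
    rows.retain(|(_, salary)| *salary >= threshold);
    before - rows.len()
}

fn main() {
    // As strings, '1' < '4' decides the comparison, so this prints true.
    println!("{}", "100000" < "40000");
    // As integers, it prints false.
    println!("{}", 100000 < 40000);

    let mut rows = vec![("ex:Alice".to_string(), 75000), ("ex:Bob".to_string(), 35000)];
    let removed = delete_below(&mut rows, 40000);
    println!("removed {removed}, left {:?}", rows);
}
```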

Sorting Results

Sort query results with ORDER BY, using DESC() for descending order, and cap the number of returned rows with LIMIT:

PREFIX ex: <http://example.org/>
SELECT ?person ?salary WHERE {
  ?person ex:salary ?salary
}
ORDER BY DESC(?salary)
LIMIT 10
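ORDER BY DESC(?salary) followed by LIMIT 10 amounts to a descending sort and then a truncation. Here is a std-only Rust sketch, assuming rows are (person, salary) tuples with integer salaries. top_salaries is a hypothetical name.

```rust
/// ORDER BY DESC(?salary) LIMIT `limit`
fn top_salaries(mut rows: Vec<(&str, u32)>, limit: usize) -> Vec<(&str, u32)> {
    // Comparing b to a (instead of a to b) reverses the order: highest salary first.
    rows.sort_by(|a, b| b.1.cmp(&a.1));
    // LIMIT keeps only the first `limit` rows of the sorted result.
    rows.truncate(limit);
    rows
}

fn main() {
    let rows = vec![("ex:Alice", 75000), ("ex:Bob", 92000), ("ex:Carol", 61000)];
    println!("{:?}", top_salaries(rows, 2));
}
```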

Sort by multiple variables:

PREFIX ex: <http://example.org/>
SELECT ?department ?person ?salary WHERE {
  ?person ex:department ?department .
  ?person ex:salary ?salary .
}
ORDER BY ?department DESC(?salary)
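With several sort keys, later keys only break ties left by earlier ones. Below is a std-only Rust sketch of ORDER BY ?department DESC(?salary), assuming (department, person, salary) tuples. order_rows is a hypothetical name.

```rust
/// ORDER BY ?department DESC(?salary) over (department, person, salary) rows:
/// ascending by department, then descending by salary within each department.
fn order_rows(rows: &mut [(&str, &str, u32)]) {
    rows.sort_by(|a, b| a.0.cmp(b.0).then_with(|| b.2.cmp(&a.2)));
}

fn main() {
    let mut rows = [
        ("Sales", "ex:a", 50000),
        ("Engineering", "ex:b", 70000),
        ("Engineering", "ex:c", 90000),
    ];
    order_rows(&mut rows);
    println!("{:?}", rows);
}
```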

User-Defined Functions

Register a custom function in Rust and use it in a BIND clause:

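use kolibrie::SparqlDatabase;
use kolibrie::execute_query::execute_query;

fn main() {
    // Load a small Turtle graph into an in-memory database.
    let mut db = SparqlDatabase::new();
    db.parse_turtle(r#"
        @prefix ex: <http://example.org/> .
        ex:Alice ex:firstName "Alice" .
        ex:Alice ex:lastName  "Smith" .
    "#);

    // Register "fullName" so queries can call it; the closure receives
    // the argument values as strings and returns the computed value.
    db.register_udf("fullName", |args: Vec<String>| -> String {
        args.join(" ")
    });

    // The UDF is called by name inside BIND, like a built-in function.
    let query = r#"
        PREFIX ex: <http://example.org/>
        SELECT ?fullName WHERE {
            ?p ex:firstName ?first .
            ?p ex:lastName  ?last .
            BIND(fullName(?first, ?last) AS ?fullName)
        }
    "#;

    // Each result row is indexed by position; row[0] holds ?fullName.
    for row in execute_query(query, &mut db) {
        println!("{}", row[0]);
    }
}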
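This tutorial does not describe how register_udf stores functions internally. As an illustration only, a UDF registry can be modeled as a map from names to boxed closures. UdfRegistry, register and call are hypothetical names in this std-only Rust sketch, which reuses the Vec<String> -> String signature from the example above.

```rust
use std::collections::HashMap;

// Hypothetical boxed UDF type; not Kolibrie's actual internals.
type Udf = Box<dyn Fn(Vec<String>) -> String>;

struct UdfRegistry {
    functions: HashMap<String, Udf>,
}

impl UdfRegistry {
    fn new() -> Self {
        Self { functions: HashMap::new() }
    }

    /// Store a closure under a name that queries can refer to.
    fn register<F>(&mut self, name: &str, f: F)
    where
        F: Fn(Vec<String>) -> String + 'static,
    {
        self.functions.insert(name.to_string(), Box::new(f));
    }

    /// Evaluate `name(args...)` for one solution.
    /// An unknown name yields None, i.e. the BIND target stays unbound in this sketch.
    fn call(&self, name: &str, args: Vec<String>) -> Option<String> {
        self.functions.get(name).map(|f| f(args))
    }
}

fn main() {
    let mut udfs = UdfRegistry::new();
    udfs.register("fullName", |args: Vec<String>| args.join(" "));
    let out = udfs.call("fullName", vec!["Alice".to_string(), "Smith".to_string()]);
    println!("{:?}", out);
}
```

Returning None for an unknown function mirrors the SPARQL rule that an error inside BIND leaves the target variable unbound while evaluation continues.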

RULE Definitions

RULE definitions let you express logic that derives new triples and optionally integrates ML predictions. A rule fires when the WHERE clause matches and asserts the CONSTRUCT triples:

PREFIX ex: <http://example.org/>
RULE :HighEarner :-
CONSTRUCT { ?person ex:isHighEarner true . }
WHERE {
  ?person ex:salary ?salary .
  FILTER(?salary > 100000)
}
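A forward-chaining engine applies a rule like this by matching the WHERE clause against the current graph and adding the CONSTRUCT triples for each solution. It repeats this until nothing new is derived. The std-only Rust sketch below performs one pass of :HighEarner. It assumes the graph is a BTreeSet of string triples and that salaries are stored as decimal strings. apply_high_earner is hypothetical, and this is not Kolibrie's reasoner.

```rust
use std::collections::BTreeSet;

// Assumed representation: (subject, predicate, object) as owned strings.
type Triple = (String, String, String);

/// One pass of :HighEarner. Returns the number of newly derived triples.
fn apply_high_earner(graph: &mut BTreeSet<Triple>) -> usize {
    // WHERE { ?person ex:salary ?salary . FILTER(?salary > 100000) }
    // A salary that does not parse as an integer fails the FILTER.
    let derived: Vec<Triple> = graph
        .iter()
        .filter(|(_, p, o)| p == "ex:salary" && o.parse::<i64>().map_or(false, |s| s > 100_000))
        .map(|(s, _, _)| (s.clone(), "ex:isHighEarner".to_string(), "true".to_string()))
        .collect();
    // CONSTRUCT: add each instantiated triple; count only genuinely new ones.
    let mut added = 0;
    for t in derived {
        if graph.insert(t) {
            added += 1;
        }
    }
    added
}

fn main() {
    let mut graph = BTreeSet::new();
    graph.insert(("ex:Alice".to_string(), "ex:salary".to_string(), "120000".to_string()));
    graph.insert(("ex:Bob".to_string(), "ex:salary".to_string(), "90000".to_string()));
    // Repeat until a pass derives nothing new (the fixpoint).
    while apply_high_earner(&mut graph) > 0 {}
    println!("{:?}", graph);
}
```

A second pass returns 0 because the derived triple is already present. That is the fixpoint at which a forward chainer stops.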

Rules can also call ML models inline:

PREFIX ex: <http://example.org/>
RULE :DetectCongestion() :-
CONSTRUCT { ?road ex:congestionLevel ?level . }
WHERE {
  ?d ex:road ?road ;
     ex:avgVehicleSpeed ?speed ;
     ex:vehicleCount ?count .
}
ML.PREDICT(MODEL "congestion_model",
  INPUT {
    SELECT ?road (AVG(?speed) AS ?avgSpeed) (MAX(?count) AS ?maxCount)
    WHERE {
      ?d ex:road ?road ;
         ex:avgVehicleSpeed ?speed ;
         ex:vehicleCount ?count .
    }
    GROUP BY ?road
  }, OUTPUT ?level)
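Before the model runs, the INPUT subquery reduces the detector observations to one feature row per road: the average of ?speed and the maximum of ?count. Below is a std-only Rust sketch of that aggregation, assuming observations arrive as (road, speed, count) tuples. The model invocation itself is not modeled here. input_features and Features are hypothetical names.

```rust
use std::collections::BTreeMap;

/// Per-road features, as produced by the INPUT subquery.
#[derive(Debug, PartialEq)]
struct Features {
    avg_speed: f64, // AVG(?speed) AS ?avgSpeed
    max_count: u32, // MAX(?count) AS ?maxCount
}

/// Group (?road, ?speed, ?count) observations by road and aggregate them.
fn input_features(obs: &[(&str, f64, u32)]) -> BTreeMap<String, Features> {
    // Per road: (sum of speeds, number of observations, max count so far).
    let mut acc: BTreeMap<String, (f64, usize, u32)> = BTreeMap::new();
    for &(road, speed, count) in obs {
        let e = acc.entry(road.to_string()).or_insert((0.0, 0, 0));
        e.0 += speed;
        e.1 += 1;
        e.2 = e.2.max(count);
    }
    acc.into_iter()
        .map(|(road, (sum, n, max))| (road, Features { avg_speed: sum / n as f64, max_count: max }))
        .collect()
}

fn main() {
    let obs = [("ex:E40", 60.0, 10), ("ex:E40", 40.0, 25), ("ex:R0", 90.0, 5)];
    for (road, f) in input_features(&obs) {
        println!("{road}: {:?}", f);
    }
}
```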

For full reasoning capabilities — forward chaining, backward chaining, integrity constraints, and probabilistic inference — see the Knowledge Graph & Reasoning guide.

Advanced Examples

Join several triple patterns through a shared variable (here ?location) to combine facts about related resources:

PREFIX ex: <http://example.org/>
SELECT ?person ?location ?city ?zipcode WHERE {
  ?person ex:worksAt ?location .
  ?location ex:located ?city .
  ?location ex:zipcode ?zipcode
}
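Engines typically evaluate a query like this by joining the patterns on their shared variable. Below is a std-only Rust sketch that uses hash joins on ?location, assuming each pattern's solutions are given as (key, value) pairs. index and join_locations are hypothetical names, and this does not describe Kolibrie's join strategy.

```rust
use std::collections::HashMap;

/// Build a hash index: key -> all values paired with it.
fn index<'a>(pairs: &[(&'a str, &'a str)]) -> HashMap<&'a str, Vec<&'a str>> {
    let mut m: HashMap<&'a str, Vec<&'a str>> = HashMap::new();
    for &(k, v) in pairs {
        m.entry(k).or_default().push(v);
    }
    m
}

/// ?person ex:worksAt ?location . ?location ex:located ?city . ?location ex:zipcode ?zipcode
fn join_locations<'a>(
    works_at: &[(&'a str, &'a str)], // (?person, ?location)
    located: &[(&'a str, &'a str)],  // (?location, ?city)
    zipcode: &[(&'a str, &'a str)],  // (?location, ?zipcode)
) -> Vec<(&'a str, &'a str, &'a str, &'a str)> {
    let cities = index(located);
    let zips = index(zipcode);
    let mut out = Vec::new();
    for &(person, loc) in works_at {
        // Probe both indexes; a location missing either fact yields no row.
        let (Some(cs), Some(zs)) = (cities.get(loc), zips.get(loc)) else { continue };
        for &city in cs {
            for &zip in zs {
                out.push((person, loc, city, zip));
            }
        }
    }
    out
}

fn main() {
    let works_at = [("ex:Alice", "ex:HQ")];
    let located = [("ex:HQ", "Brussels")];
    let zipcode = [("ex:HQ", "1000")];
    println!("{:?}", join_locations(&works_at, &located, &zipcode));
}
```

A location that lacks either ex:located or ex:zipcode produces no row, since all three patterns must match.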

Advanced Processing Modes

Multi-Threaded Processing

Kolibrie automatically parallelizes RDF file parsing using Rayon. Use parse_rdf_from_file() for large files to get multi-threaded ingestion with no additional configuration:

database.parse_rdf_from_file("large_dataset.rdf");

For in-memory string input, use the standard parse methods:

database.parse_rdf(&rdf_xml_string);
database.parse_turtle(&turtle_string);
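Kolibrie uses Rayon for parallel parsing. The general idea behind parallel ingestion of line-oriented input can also be sketched with std::thread::scope: split the lines into chunks, parse each chunk on its own thread, and concatenate the results in order. The line format below (three whitespace-separated terms followed by a period) is a deliberately simplified, hypothetical stand-in for real RDF syntax. parse_parallel is not part of Kolibrie's API.

```rust
use std::thread;

// Assumed representation: (subject, predicate, object) as owned strings.
type Triple = (String, String, String);

/// Parse "<s> <p> <o> ." (simplified: no literals containing spaces).
fn parse_line(line: &str) -> Option<Triple> {
    let mut parts = line.split_whitespace();
    let (s, p, o) = (parts.next()?, parts.next()?, parts.next()?);
    (parts.next() == Some(".")).then(|| (s.to_string(), p.to_string(), o.to_string()))
}

/// Split non-empty lines into roughly equal chunks and parse each on its own thread.
fn parse_parallel(input: &str, threads: usize) -> Vec<Triple> {
    let lines: Vec<&str> = input.lines().filter(|l| !l.trim().is_empty()).collect();
    let n = threads.max(1);
    let chunk = ((lines.len() + n - 1) / n).max(1); // chunks() panics on 0
    thread::scope(|scope| {
        // Spawn every worker first so the chunks are parsed concurrently.
        let handles: Vec<_> = lines
            .chunks(chunk)
            .map(|c| scope.spawn(move || c.iter().filter_map(|l| parse_line(l)).collect::<Vec<_>>()))
            .collect();
        // Join in spawn order so the output keeps the input order.
        handles.into_iter().flat_map(|h| h.join().unwrap()).collect()
    })
}

fn main() {
    let input = "<a> <p> <b> .\n<b> <p> <c> .\n<c> <p> <d> .\n";
    for t in parse_parallel(input, 2) {
        println!("{:?}", t);
    }
}
```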

CUDA-Enabled Processing

GPU-accelerated query processing is available as an experimental feature. It requires an NVIDIA GPU.

Recommended — Docker GPU profile:

docker compose --profile gpu up --build

Manual setup (Unix):

export LD_LIBRARY_PATH=<cuda_lib_path>:$LD_LIBRARY_PATH
cmake .
cmake --build .

Manual setup (Windows):

cmake -G "NMake Makefiles" -DCMAKE_BUILD_TYPE=Release .
cmake --build .

Once built with CUDA support, GPU acceleration is engaged automatically for eligible join operations.

Note: CUDA support is experimental. For production workloads, multi-threaded CPU processing is recommended.
