Kuzu 0.2.0 Release

We are very happy to announce the release of Kuzu 0.2.0! This is a major release with two headline features: (i) RDFGraphs; and (ii) the Kuzu extensions framework, along with our first extension for accessing files on HTTP(S) servers and on S3. We also made a set of improvements at the core that should make Kuzu faster behind the scenes, plus several other improvements, as discussed below.

For details on all the changes in this release, please see the change log of this release.

RDFGraphs

Kuzu’s native data model is a version of the property graph model, where you model your records as a set of entities/nodes and relationships and properties on nodes and relationships. Kuzu’s version of property graphs is, in fact, a structured property graph model, as Kuzu requires you to pre-specify the properties on your nodes and relationships. This is very close to the relational model. The primary difference is that you specify some of your tables as node tables and others as relationship tables.

The second popular graph-based data model in practice is the Resource Description Framework (RDF). RDF is in fact more than a data model. It is part of a larger set of standards by the World Wide Web Consortium (W3C), such as RDF Schema and OWL, that form a well-founded, well-standardized knowledge representation system. In contrast to the property graph model, RDF is particularly suitable for more flexible and heterogeneous information representation. All information, including the actual data as well as the schema of your data, i.e., metadata, is represented homogeneously in the form of (subject, predicate, object) triples.

Kuzu 0.2.0 introduces native support for RDF through a new extension of its data model called RDFGraphs. RDFGraphs is a lightweight extension to Kuzu’s data model that allows ingesting triples natively into Kuzu so that they can be queried using Cypher. It is lightweight because an RDFGraph is simply a wrapper around two node tables and two relationship tables that acts as a new object in Kuzu’s data model. For example, you can run CREATE/DROP RDFGraph <rdfgraph-name> to create or drop an RDFGraph, which will create or drop the four underlying tables. You can then query these underlying tables with Cypher. RDFGraphs are therefore a specific mapping of your triples into Kuzu’s native property graph data model, so that you can benefit from Kuzu’s easy, scalable, and fast querying capabilities for basic querying of RDF triples.

In short, you can now use Kuzu to store and query RDF data via Cypher!

This release is an important step in our vision to be the go-to system to model your records as graphs. Here is an example from our documentation of how you can use Kuzu to store and query RDF data. Consider a Turtle file uni.ttl modeling information about university students, faculty, and the cities they live in:

@prefix kz: <http://kuzu.io/rdf-ex#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .

kz:Waterloo a kz:City ;
    kz:name "Waterloo" ;
    kz:population 150000 .

kz:Adam a kz:student ;
    kz:livesIn kz:Waterloo ;
    kz:name "Adam" ;
    kz:age 30 .

You can create an RDFGraph named UniKG and import the above Turtle file into UniKG as follows:

CREATE RDFGraph UniKG;

COPY UniKG FROM "${PATH-TO-DIR}/uni.ttl";

You can then query all triples with IRI kz:Waterloo as subject as follows:

WITH "http://kuzu.io/rdf-ex#" as kz
MATCH (s {iri: kz+"Waterloo"})-[p:UniKG]->(o)
RETURN s.iri, p.iri, o.iri, o.val;

Output:
----------------------------------------------------------------------------------------------------------------------------
| s.iri                          | p.iri                                           | o.iri                      | o.val    |
----------------------------------------------------------------------------------------------------------------------------
| http://kuzu.io/rdf-ex#Waterloo | http://kuzu.io/rdf-ex#name                      |                            | Waterloo |
----------------------------------------------------------------------------------------------------------------------------
| http://kuzu.io/rdf-ex#Waterloo | http://kuzu.io/rdf-ex#population                |                            | 150000   |
----------------------------------------------------------------------------------------------------------------------------
| http://kuzu.io/rdf-ex#Waterloo | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | http://kuzu.io/rdf-ex#City |          |
----------------------------------------------------------------------------------------------------------------------------
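
The output above hints at how triples are split: rows whose object is a literal fill o.val, while rows whose object is another resource fill o.iri. The following minimal Python sketch illustrates one way such a split into two node sets and two relationship sets could work. It is only an illustration under stated assumptions: the function name partition_triples, the Iri and Literal helper types, and the choice to keep predicate IRIs as edge labels are hypothetical and are not taken from Kuzu's implementation or naming. The RDFGraphs documentation describes the actual mapping.

```python
from typing import NamedTuple, Union

# Hypothetical helper types, used only to tell IRIs apart from literal values.
class Iri(NamedTuple):
    value: str

class Literal(NamedTuple):
    value: Union[str, int]

KZ = "http://kuzu.io/rdf-ex#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# The three triples about kz:Waterloo from uni.ttl.
TRIPLES = [
    (Iri(KZ + "Waterloo"), Iri(RDF_TYPE), Iri(KZ + "City")),
    (Iri(KZ + "Waterloo"), Iri(KZ + "name"), Literal("Waterloo")),
    (Iri(KZ + "Waterloo"), Iri(KZ + "population"), Literal(150000)),
]

def partition_triples(triples):
    """Split triples into two node sets and two edge sets (illustrative only)."""
    resources = set()      # nodes identified by an IRI
    literals = []          # nodes holding a literal value
    resource_edges = []    # triples whose object is a resource
    literal_edges = []     # triples whose object is a literal
    for s, p, o in triples:
        resources.add(s.value)
        if isinstance(o, Iri):
            resources.add(o.value)
            resource_edges.append((s.value, p.value, o.value))
        else:
            literals.append(o.value)
            literal_edges.append((s.value, p.value, o.value))
    return resources, literals, resource_edges, literal_edges

resources, literals, resource_edges, literal_edges = partition_triples(TRIPLES)
print(len(resources), "resource nodes,", len(literals), "literal nodes")
print(len(resource_edges), "resource edges,", len(literal_edges), "literal edges")
```

In this sketch, the rdf:type triple lands in the resource-to-resource set, and the name and population triples land in the resource-to-literal set, which mirrors the empty and filled cells in the query output.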

Learn all about RDFGraphs in our documentation page for RDFGraphs: how to CREATE them, how to import triples into them from Turtle files, the property graph nodes and relationships they map to, how to query and modify them, and more.

Extensions framework

Kuzu 0.2.0 introduces a new framework for extending Kuzu’s capabilities, similar to PostgreSQL’s and DuckDB’s extensions. Extensions are a way to add new features to Kuzu without modifying the core code. The 0.2.0 version is just the beginning of our development of this framework, and we are happy to release our first extension, httpfs, which supports reading data from a file hosted on an HTTP(S) server. httpfs can also be used to read from Amazon S3. You can use the httpfs extension by installing it and dynamically loading it as follows:

INSTALL httpfs;
LOAD EXTENSION httpfs;

You can then read files hosted remotely on an HTTP(S) server or on Amazon S3 as follows:

LOAD FROM "https://raw.githubusercontent.com/kuzudb/extension/main/dataset/test/city.csv"
RETURN *;

Output:

Waterloo|150000
Kitchener|200000
Guelph|75000

The following example shows how to read a file from Amazon S3:

LOAD FROM 's3://kuzu-test/follows.parquet'
RETURN *;

You can also write to S3 using the httpfs extension. Read all about it in our documentation.

Over time, we plan to implement additional extensions, for example to support new data types, functions, and indices.

Improvements at the Core

We are also continuing non-stop to make the core of Kuzu faster and more efficient. We have improved our hash index building by parallelizing it (other parts of the copy pipeline were already parallelized) and through several other optimizations. This results in an improvement in bulk loading performance. Here is a comparison showing by how much we improved bulk loading performance of the LDBC Comments table, which consists of 220M records (~22 GB):

Threads	Kuzu 0.1.0	Kuzu 0.2.0	Performance improvement
1	536.1	496.5	7.4%
2	289.1	257.3	11.0%
4	161.7	137.5	15.0%
8	116.8	77.6	33.5%
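
As a quick check of the numbers in the table, the short Python sketch below recomputes the improvement column, assuming it is the relative reduction in load time, (old - new) / old. That formula is an assumption that matches the published figures to within rounding; it is not stated in this post. The helper names improvement_pct and speedup are illustrative, and the timings are treated as unitless because the table does not give a unit.

```python
# Bulk-loading timings copied from the table: threads -> (Kuzu 0.1.0, Kuzu 0.2.0).
TIMINGS = {
    1: (536.1, 496.5),
    2: (289.1, 257.3),
    4: (161.7, 137.5),
    8: (116.8, 77.6),
}

def improvement_pct(old, new):
    """Relative reduction in load time, as a percentage of the old time."""
    return (old - new) / old * 100.0

def speedup(old, new):
    """How many times faster the new version is than the old one."""
    return old / new

for threads, (old, new) in TIMINGS.items():
    print(
        f"{threads} thread(s): {improvement_pct(old, new):.1f}% less time, "
        f"{speedup(old, new):.2f}x speedup"
    )
```

The same data also shows that the gain grows with the thread count, which is consistent with the hash index build being the part that was newly parallelized.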

We have also improved our disk-based CSR implementation to make it faster when ingesting data through CREATE statements (intended for loading small amounts of data), and added constant compression. Both changes yield minor performance improvements in some cases.

Closing Remarks

In addition to the above, this release includes the following:

Several additional improvements to Kuzu’s command line interface
A new UUID data type
Many improvements to our testing framework

These updates were all made by our amazing interns 😎. As always, we would like to thank everyone in the Kuzu team for making this release possible and look forward to user feedback!