We are very happy to announce the release of Kùzu 0.2.0! This is a major release with two major new features: (i) RDFGraphs; and (ii) the Kùzu extensions framework, along with our first extension for accessing files on HTTP(S) servers and on S3. We also made a set of improvements at the core that should make Kùzu faster behind the scenes, as well as several other improvements, as discussed below.
For details on all the changes in this release, please see the change log of this release.
RDFGraphs
Kùzu’s native data model is a version of the property graph model, where you model your records as a set of entities (nodes) and relationships, with properties on both. Kùzu’s version of property graphs is, in fact, a structured property graph model, as Kùzu requires you to pre-specify the properties on your nodes and relationships. This is very close to the relational model. The primary difference is that you specify some of your tables as node tables and others as relationship tables.
The second popular graph-based data model in practice is Resource Description Framework (RDF). RDF is in fact more than a data model. It is part of a larger set of standards by the World Wide Web Consortium (W3C), such as RDF Schema and OWL, that form a well-founded, well-standardized knowledge representation system. In contrast to the property graph model, RDF is particularly suitable for more flexible and heterogeneous information representation. All information, including the actual data as well as the schema of your data, i.e., metadata, is represented homogeneously in the form of (subject, predicate, object) triples.
Kùzu 0.2.0 introduces native support for RDF through a new extension of its data model called RDFGraphs.
RDFGraphs is a lightweight extension to Kùzu’s data model that allows ingesting triples natively into Kùzu so that they can be queried using Cypher. It is a lightweight extension because an RDFGraph is simply a wrapper around two node tables and two relationship tables that acts as a new object in Kùzu’s data model. For example, you can run CREATE/DROP RDFGraph <rdfgraph-name> to create or drop an RDFGraph, which creates or drops the four underlying tables. You can then query these underlying tables with Cypher. RDFGraphs are therefore a specific mapping of your triples into Kùzu’s native property graph data model, so that you can benefit from Kùzu’s easy, scalable, and fast querying capabilities for basic querying of RDF triples. In short, you can now use Kùzu to store and query RDF data via Cypher!
This release is an important step in our vision to be the go-to system to model your records as graphs. Here is the example from our documentation of how you can use Kùzu to store and query RDF data.
Consider a Turtle file uni.ttl modeling information about university students, faculty, and the cities they live in:
@prefix kz: <http://kuzu.io/rdf-ex#> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
kz:Waterloo a kz:City ;
kz:name "Waterloo" ;
kz:population 150000 .
kz:Adam a kz:student ;
kz:livesIn kz:Waterloo ;
kz:name "Adam" ;
kz:age 30 .
You can create an RDFGraph named UniKG and import the above Turtle file into UniKG as follows:
CREATE RDFGraph UniKG;
COPY UniKG FROM "${PATH-TO-DIR}/uni.ttl";
You can then query all triples with the IRI kz:Waterloo as subject as follows:
WITH "http://kuzu.io/rdf-ex#" as kz
MATCH (s {iri: kz+"Waterloo"})-[p:UniKG]->(o)
RETURN s.iri, p.iri, o.iri, o.val;
Output:
----------------------------------------------------------------------------------------------------------------------------
| s.iri | p.iri | o.iri | o.val |
----------------------------------------------------------------------------------------------------------------------------
| http://kuzu.io/rdf-ex#Waterloo | http://kuzu.io/rdf-ex#name | | Waterloo |
----------------------------------------------------------------------------------------------------------------------------
| http://kuzu.io/rdf-ex#Waterloo | http://kuzu.io/rdf-ex#population | | 150000 |
----------------------------------------------------------------------------------------------------------------------------
| http://kuzu.io/rdf-ex#Waterloo | http://www.w3.org/1999/02/22-rdf-syntax-ns#type | http://kuzu.io/rdf-ex#City | |
----------------------------------------------------------------------------------------------------------------------------
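To make the mapping a little more concrete, here is a minimal Python sketch of the idea. It is not Kùzu’s implementation: it simply splits the triples of uni.ttl into two node collections (IRIs and literals) and two relationship collections (IRI-to-IRI and IRI-to-literal), and then mimics the query above by filtering on the subject. The names Triple, split_triples, and triples_with_subject are hypothetical, and the actual table names and layout are described in the RDFGraphs documentation.

```python
from dataclasses import dataclass

KZ = "http://kuzu.io/rdf-ex#"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


@dataclass(frozen=True)
class Triple:
    subject: str      # always an IRI
    predicate: str    # always an IRI
    obj: object       # either an IRI string or a literal value
    obj_is_iri: bool  # True when obj is an IRI, False when it is a literal


# The triples of uni.ttl written out by hand (Turtle's `a` is shorthand for rdf:type).
TRIPLES = [
    Triple(KZ + "Waterloo", RDF_TYPE, KZ + "City", True),
    Triple(KZ + "Waterloo", KZ + "name", "Waterloo", False),
    Triple(KZ + "Waterloo", KZ + "population", 150000, False),
    Triple(KZ + "Adam", RDF_TYPE, KZ + "student", True),
    Triple(KZ + "Adam", KZ + "livesIn", KZ + "Waterloo", True),
    Triple(KZ + "Adam", KZ + "name", "Adam", False),
    Triple(KZ + "Adam", KZ + "age", 30, False),
]


def split_triples(triples):
    """Split triples into two node collections and two relationship collections."""
    resources = set()     # node collection 1: subject and object IRIs
    literals = []         # node collection 2: literal values
    resource_edges = []   # relationship collection 1: IRI -> IRI
    literal_edges = []    # relationship collection 2: IRI -> literal
    for t in triples:
        resources.add(t.subject)
        if t.obj_is_iri:
            resources.add(t.obj)
            resource_edges.append((t.subject, t.predicate, t.obj))
        else:
            literals.append(t.obj)
            literal_edges.append((t.subject, t.predicate, t.obj))
    return resources, literals, resource_edges, literal_edges


def triples_with_subject(triples, iri):
    """Return rows shaped like (s.iri, p.iri, o.iri, o.val) for one subject."""
    return [
        (
            t.subject,
            t.predicate,
            t.obj if t.obj_is_iri else None,   # o.iri is empty for literals
            None if t.obj_is_iri else t.obj,   # o.val is empty for IRIs
        )
        for t in triples
        if t.subject == iri
    ]


resources, literals, resource_edges, literal_edges = split_triples(TRIPLES)
rows = triples_with_subject(TRIPLES, KZ + "Waterloo")
for row in rows:
    print(row)
```

As in the query output above, each row fills in either the object IRI or the object value, never both. For simplicity, this sketch keeps predicate IRIs only on the edges.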
Learn all about RDFGraphs, how to CREATE them, how to import triples into them from Turtle files, the property graph nodes and relationships they map to, how to query and modify them, and more in our documentation page for RDFGraphs.
Extensions framework
Kùzu 0.2.0 introduces a new framework for extending Kùzu’s capabilities, similar to PostgreSQL’s and DuckDB’s extensions.
Extensions are a way to add new features to Kùzu without modifying the core code.
The 0.2.0 version is just the beginning of our development of this framework, and we are happy to release our first extension, httpfs, which supports reading data from a file hosted on an HTTP(S) server. httpfs can also be used to read from Amazon S3.
You can use the httpfs extension by installing it and dynamically loading it as follows:
INSTALL httpfs;
LOAD EXTENSION httpfs;
You can then read files hosted remotely on an HTTP(S) server or on Amazon S3 as follows:
LOAD FROM "https://raw.githubusercontent.com/kuzudb/extension/main/dataset/test/city.csv"
RETURN *;
Output:
Waterloo|150000
Kitchener|200000
Guelph|75000
The following example shows how to read a file from Amazon S3:
LOAD FROM 's3://kuzu-test/follows.parquet'
RETURN *;
You can also write to S3 using the httpfs extension. Read all about it in our documentation.
Over time, we plan to implement additional extensions, for example to support new data types, functions, and indices.
Improvements at the Core
We are also continuing non-stop to make the core of Kùzu faster and more efficient. We have improved our hash index building by parallelizing it (other parts of the copy pipeline were already parallelized) and through several other optimizations. This results in an improvement in bulk loading performance. Here is a comparison showing by how much we improved bulk loading performance of the LDBC Comments table, which consists of 220M records (~22 GB):
Threads | Kùzu 0.1.0 | Kùzu 0.2.0 | Performance improvement |
---|---|---|---|
1 | 536.1 | 496.5 | 7.4% |
2 | 289.1 | 257.3 | 11.0% |
4 | 161.7 | 137.5 | 15.0% |
8 | 116.8 | 77.6 | 33.5% |
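The last column can be recomputed from the two version columns as a relative reduction, (old - new) / old. Below is a small Python sketch of that arithmetic, assuming the version columns are load times where lower is better (the table does not state units). The names LOAD_TIMES and improvement_percent are our own, for illustration only.

```python
# Values copied from the table above: threads -> (Kùzu 0.1.0, Kùzu 0.2.0).
LOAD_TIMES = {
    1: (536.1, 496.5),
    2: (289.1, 257.3),
    4: (161.7, 137.5),
    8: (116.8, 77.6),
}


def improvement_percent(old, new):
    """Relative reduction of `new` with respect to `old`, in percent."""
    return (old - new) / old * 100.0


improvements = {
    threads: improvement_percent(old, new)
    for threads, (old, new) in LOAD_TIMES.items()
}

for threads, pct in improvements.items():
    print(f"{threads} thread(s): {pct:.2f}% improvement")
```

Because the table values are rounded, the recomputed percentages can differ from the reported ones by a small fraction of a percentage point.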
We have also improved our disk-based CSR implementation to make it faster when ingesting data through CREATE statements (intended for loading small amounts of data), and added constant compression. These changes improve Kùzu’s performance in minor ways in some cases.
Closing Remarks
In addition to the above, this release includes the following:
- Several additional improvements to Kùzu’s command line interface
- A new UUID data type
- Many improvements to our testing framework
These updates were all made by our amazing interns 😎. As always, we would like to thank everyone in the Kùzu team for making this release possible and look forward to user feedback!