Kuzu v0.11.0 is here! We are excited to unveil this release, which is packed with a ton of new features and improvements, including single-file databases instead of directories, improvements to our vector and full-text search indices, and a new LLM extension to directly call LLM and embedding APIs using Cypher. Let’s dive into the details below.
Single-file databases
As of version 0.11.0, a Kuzu database is now just a file on disk instead of a directory. This single-file design makes your Kuzu databases much more portable and easier to share or archive. We recommend using the .kuzu or .kz extension to make these files easily identifiable, but you can use any file extension of your choice (or no extension at all).
Note that this change is not backward-compatible. Existing Kuzu database directories created with earlier versions must be migrated using the EXPORT/IMPORT commands. This will convert your existing Kuzu directory into the new single-file format used in v0.11.0. You can learn more about the different on-disk files created and managed by Kuzu here.
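A minimal sketch of the migration, assuming the EXPORT/IMPORT commands mentioned above are Kuzu's EXPORT DATABASE and IMPORT DATABASE statements. The export path and database file name are hypothetical placeholders.

```cypher
// Step 1: run with the older Kuzu version, connected to the existing database directory.
// '/tmp/kuzu_export' is a hypothetical export location.
EXPORT DATABASE '/tmp/kuzu_export';

// Step 2: run with Kuzu v0.11.0, connected to a new, empty database file
// (for example, my_graph.kuzu).
IMPORT DATABASE '/tmp/kuzu_export';
```

The export has to be produced by a version that can still open the old directory, which is why the two steps run under different Kuzu versions.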
Vector index and FTS improvements
This release includes extensive improvements to both the vector and full-text search (FTS) indices. Below are the highlights.
Mutable indices
Vector and full-text search (FTS) indices in Kuzu are now mutable, meaning that an index is updated as soon as data in the underlying table changes. Once an index is created on a table, any further insertion and deletion operations in that table will automatically trigger an update in the associated index, so you no longer have to recreate your indices when your base tables are updated. Currently, index updates run in a single thread; we plan to optimize this in future releases.
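The behaviour can be sketched with an FTS index as follows. The Article table, its properties, and the index name are hypothetical and chosen only for illustration. The sketch assumes the fts extension is installed and loaded in the same way as the llm extension shown later in this post.

```cypher
// Hypothetical table and index names, for illustration only
INSTALL fts;
LOAD fts;

CREATE NODE TABLE Article(id INT64 PRIMARY KEY, body STRING);
CREATE (:Article {id: 1, body: 'Graph databases store nodes and relationships'});

// Build the index once
CALL CREATE_FTS_INDEX('Article', 'article_index', ['body']);

// A row added after index creation should be searchable without rebuilding the index
CREATE (:Article {id: 2, body: 'Embedded graph databases run in-process'});

CALL QUERY_FTS_INDEX('Article', 'article_index', 'embedded')
RETURN node.id, score
ORDER BY score DESC;

// Deletions are reflected in the index as well
MATCH (a:Article {id: 2}) DELETE a;
```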
Filtered vector search with arbitrary Cypher queries
Filtered vector search is the ability to search across a subset of the vectors you have indexed. For example, suppose you have Book nodes that store as properties the vector embeddings of the abstracts of these books and the year they were published. A filtered vector search can find the nearest neighbours of a query vector only across books that were published after 2020. Or suppose that you have the relationship (:Book)-[:PublishedBy]->(:Publisher), and you want to find the nearest neighbours across books that have been PublishedBy a particular publisher, e.g. Pearson. In combination with regular graph traversals, filtered vector search can be very useful for context engineering in your LLM-based applications (see our recent blog post on this).
Our previous release (v0.10.0) supported only simple filters on the nodes that have the vectors. So you could do the first filtered search above, but not the second one. Starting with this release, you can now perform a filtered vector search on any subset of nodes in the graph that are identified by an arbitrary Cypher query.
Let’s consider a graph with (:Book)-[:PublishedBy]->(:Publisher) relationships and a vector index on the Book table.
CREATE NODE TABLE Book(id SERIAL PRIMARY KEY, title STRING, title_embedding FLOAT[384]);
CREATE NODE TABLE Publisher(name STRING PRIMARY KEY);
CREATE REL TABLE PublishedBy(FROM Book TO Publisher);
CALL CREATE_VECTOR_INDEX(
    'Book',              // table name
    'book_title_index',  // index name
    'title_embedding'    // property name on which the vector index is created
);
To retrieve relevant books from a certain publisher, e.g. Pearson, you can use a Cypher query that identifies the relevant subset of books through the PROJECT_GRAPH_CYPHER function as below:
CALL PROJECT_GRAPH_CYPHER(
    'pearson_book',  // graph name
    'MATCH (b:Book)-[:PublishedBy]->(p:Publisher {name: "Pearson"}) RETURN b'  // Cypher query
);
CALL QUERY_VECTOR_INDEX(
    'pearson_book',      // graph name
    'book_title_index',  // index name
    create_embedding('quantum world', 'open-ai', 'text-embedding-3-small', 384),  // input vector using the `llm` extension
    2  // top-K
)
RETURN *;
The projected graph is given a name, pearson_book in this case, and then passed to the QUERY_VECTOR_INDEX function.
Try this feature out, and see the docs for a more detailed example.
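The first scenario described earlier, restricting the search to books published after 2020, can be sketched the same way. This assumes the Book table also has a published_year INT64 property, which is hypothetical and not part of the schema shown above, and that QUERY_VECTOR_INDEX exposes the matched node and its distance as node and distance.

```cypher
// Assumes Book has a hypothetical `published_year INT64` property
CALL PROJECT_GRAPH_CYPHER(
    'recent_book',  // graph name
    'MATCH (b:Book) WHERE b.published_year > 2020 RETURN b'
);

CALL QUERY_VECTOR_INDEX(
    'recent_book',       // graph name
    'book_title_index',  // index name
    create_embedding('quantum world', 'open-ai', 'text-embedding-3-small', 384),
    2  // top-K
)
RETURN node.title, distance
ORDER BY distance;
```

Because the filter is an ordinary Cypher query, property predicates and relationship patterns can be combined in a single projection.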
FTS performance improvements
This release improves the top-K search performance of FTS indices. To illustrate the gains, we report the end-to-end runtime of computing the top 10 most relevant records for the query “dispossessed meaning” using a full-text search index on the ms-passage dataset, which contains approximately 8M records. The experiments were run on a machine with 2x AMD EPYC 7551 CPUs and 512GB RAM, using 1, 2, 4 and 8 threads.
| # threads | 1 | 2 | 4 | 8 |
|---|---|---|---|---|
| v0.10 | 6.2s | 4.2s | 2.8s | 2.1s |
| v0.11 | 3.3s | 2.2s | 1.3s | 1.0s |
The performance improvement is roughly 2x at every thread count, including with 8 threads.
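As a quick check on the table above, the speedup at each thread count is the ratio of the v0.10 runtime to the v0.11 runtime. The symbols below are introduced here only for this calculation.

```latex
\[
\text{speedup}(n) = \frac{t_{\text{v0.10}}(n)}{t_{\text{v0.11}}(n)}
\]
\[
\text{speedup}(1) = \frac{6.2}{3.3} \approx 1.88, \quad
\text{speedup}(2) = \frac{4.2}{2.2} \approx 1.91, \quad
\text{speedup}(4) = \frac{2.8}{1.3} \approx 2.15, \quad
\text{speedup}(8) = \frac{2.1}{1.0} = 2.10
\]
```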
LLM extension
As more and more users build AI applications on top of Kuzu, we see a growing demand for convenience features to help interact with popular LLM providers. In this release, we’re happy to announce a new LLM extension that allows you to directly call LLM APIs using Cypher.
Currently, the extension supports one function: CREATE_EMBEDDING. It takes a text string as input, along with the provider and model to use, and returns a float vector (embedding), ready for use in vector similarity search.
// Install and load the extension
INSTALL llm;
LOAD llm;
// Create an embedding via Ollama
RETURN CREATE_EMBEDDING(
    "Kuzu is an embedded graph database",
    "ollama",
    "nomic-embed-text"
);
// OR, create an embedding via OpenAI
RETURN CREATE_EMBEDDING(
    "Kuzu is an embedded graph database",
    "open-ai",
    "text-embedding-3-small"
);
Using the CREATE_EMBEDDING function this way eliminates the need for external scripts to generate embeddings. To learn more about which model providers are supported, see the docs.
This feature is only the beginning: expect more functions to be added to the LLM extension in the near future, and do reach out if you’re looking for particular functions related to LLMs.
Azure
We’ve added initial support for scanning data from Azure Blob Storage and Azure Data Lake Storage (ADLS). If you’re using Kuzu in Azure environments, check out this feature in the docs and reach out to us via a GitHub issue or discussion if you have any feedback.
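A rough sketch of what a scan might look like is shown below. It assumes the extension is named azure, is installed and loaded like the other extensions in this post, and accepts az:// paths. The container and file names are hypothetical, and credential configuration is omitted, so consult the docs for the exact setup.

```cypher
// Assumption: the extension is named `azure` and follows the usual INSTALL/LOAD pattern
INSTALL azure;
LOAD azure;

// Hypothetical container and file; credentials must be configured beforehand as described in the docs
LOAD FROM 'az://my-container/people.csv'
RETURN *
LIMIT 5;
```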
Swift API
Kuzu is designed to run anywhere. With this goal in mind, we’re excited to introduce our new Swift API, allowing you to embed and run Kuzu on iOS devices. Developers interested in building Kuzu applications with Swift can check out our Swift API docs. To help you get started, we also provide a demo that shows how to integrate Kuzu into an iOS application.
Other Cypher features
CREATE TABLE AS
In a similar way to PostgreSQL, Kuzu now supports creating and populating tables using this syntax: CREATE NODE/REL TABLE AS <subquery>. The created table has the same column names and types as the result of the subquery.
For example, the following query creates a new node table SeniorPerson with a subset of records from the Person table.
CREATE NODE TABLE SeniorPerson AS
MATCH (p:Person)
WHERE p.age > 60
RETURN p.*;
You can find more information about this feature here.
Adding/dropping node table pairs from relationship tables
You can now modify the node table pairs (FROM/TO) of a relationship table schema after creation. This allows you to add/remove relationships as your graph schema evolves over time.
The following example adds a User -> Celebrity node table pair to the Follows relationship table. Removing specific node table pairs from a relationship table is also supported.
ALTER TABLE Follows ADD FROM User TO Celebrity;
ALTER TABLE Follows DROP FROM User TO Celebrity;
More information can be found here.
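For context, here is a minimal end-to-end sketch of adding a pair and then using it. The User and Celebrity schemas and the sample names are hypothetical and exist only to make the example self-contained.

```cypher
// Hypothetical minimal schema, for illustration only
CREATE NODE TABLE User(name STRING PRIMARY KEY);
CREATE NODE TABLE Celebrity(name STRING PRIMARY KEY);
CREATE REL TABLE Follows(FROM User TO User);

// Allow Follows relationships from User to Celebrity as well
ALTER TABLE Follows ADD FROM User TO Celebrity;

CREATE (:User {name: 'alice'});
CREATE (:Celebrity {name: 'bob'});

// With the new pair in place, this relationship should now be accepted
MATCH (u:User {name: 'alice'}), (c:Celebrity {name: 'bob'})
CREATE (u)-[:Follows]->(c);
```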
Closing remarks
Over the coming months, our main goal will be to empower developers working with the next generation of AI agents and LLMs. We’ll also continue to improve Kuzu’s performance and usability features, so that you’re more productive with your graph-based workflows.
As Kuzu’s user base continues to grow, we’re excited to welcome new users to our community, and we encourage you to join us on Discord to share your thoughts and ideas as we build the future of graphs. We’re eager to see what you build with this latest release, and we love hearing from our users about what we can do to make Kuzu even better — keep the feedback coming.
See you next time!
— The Kuzu Team