Kuzu 0.9.0 Release

Kuzu Team
Kuzu Team
Developers of Kùzu Inc.

Happy April Fools! It’s been a whirlwind of activity since our last release, and we’re excited to announce Kuzu 0.9.0! The highlight of this release is a brand new vector extension, which enables similarity search over vector embeddings—fully within Kuzu. Alongside this, we’ve shipped several ecosystem integrations and performance enhancements. Read on for the full scoop.

Vector Index

Our broader vision is to position Kuzu as a feature-rich and performant foundation for AI applications that require search and querying capabilities over both structured records and unstructured data, such as text and embeddings. Following our full-text search index in version 0.8.0, we’re now releasing a vector index designed for ultra-fast search over high-dimensional embeddings. With the vector index, you can use a mix of structured search, full-text search, and vector search all within a single system. If you’re using Graph RAG or hybrid RAG for providing context to your LLM applications, Kuzu is now your one-stop shop.

Kuzu’s vector index has three important features:

  • HNSW-based: The index is a two-layered HNSW index structure.
  • Native and disk-based: Since HNSW is a graph-based index, the index is built using Kuzu’s native storage structures and is disk-based. Therefore, you do not have to worry about memory usage as you scale the number of vectors you index!
  • kNN search over arbitrary subsets of vectors: You can do approximate nearest neighbour search not only over all the vectors in the index but also on a subset of the vectors. The subset of vectors you select is predicate-agnostic, meaning you can select an arbitrary subset of your vectors (more on this below).

Kuzu’s vector index is provided as an extension. It can index embeddings that are stored as node properties of FLOAT[] type. Let’s illustrate its usage with an example in Python. We’ll use a sample dataset of books and their publishers. We have two node tables Book and Publisher, and one relationship table PublishedBy. We used the sentence transformers library to generate 384-dimensional embeddings for the book titles, which will be stored as the title_embedding (FLOAT[384]) property of the Book node table.

import kuzu
from sentence_transformers import SentenceTransformer

# Load a pre-trained embedding generation model
# https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2
model = SentenceTransformer("all-MiniLM-L6-v2")

# Initialize the database
db = kuzu.Database()
conn = kuzu.Connection(db)

# Install and load vector extension
conn.execute("INSTALL VECTOR;")
conn.execute("LOAD VECTOR;")

# Create tables
conn.execute("CREATE NODE TABLE Book(id SERIAL PRIMARY KEY, title STRING, title_embedding FLOAT[384], published_year INT64);")
conn.execute("CREATE NODE TABLE Publisher(name STRING PRIMARY KEY);")
conn.execute("CREATE REL TABLE PublishedBy(FROM Book TO Publisher);")

# Sample data
titles = ["The Quantum World", "Chronicles of the Universe", "Learning Machines", "Echoes of the Past", "The Dragon's Call"]
publishers = ["Harvard University Press", "Independent Publisher", "Pearson", "McGraw-Hill Ryerson", "O'Reilly"]
published_years = [2004, 2022, 2019, 2010, 2015]

# Insert sample data - Books with embeddings
for title, published_year in zip(titles, published_years):
    # Convert title to a 384-dimensional embedding vector
    embeddings = model.encode(title).tolist()
    conn.execute(
        """CREATE (b:Book {title: $title, title_embedding: $embeddings, published_year: $year});""",
        {"title": title, "embeddings": embeddings, "year": published_year}
    )

# Insert sample data - Publishers
for publisher in publishers:
    conn.execute(
        """CREATE (p:Publisher {name: $publisher});""",
        {"publisher": publisher}
    )

# Create relationships between Books and Publishers
for title, publisher in zip(titles, publishers):
    conn.execute("""
        MATCH (b:Book {title: $title})
        MATCH (p:Publisher {name: $publisher})
        CREATE (b)-[:PublishedBy]->(p);
    """, 
    {"title": title, "publisher": publisher}
    )

To create a vector index on the title_embedding column of the Book table, we can use the CALL statement as follows:

# Create vector index
conn.execute(
    """
    CALL CREATE_VECTOR_INDEX(
        'Book',
        'title_vec_index',
        'title_embedding'
    );
    """
)

We can now query the index to find the 2 books closest to the vector embedding of “The Quantum World”.

query_vector = model.encode("The Quantum World").tolist()
result = conn.execute(
    """
    CALL QUERY_VECTOR_INDEX(
        'Book',
        'title_vec_index',
        $query_vector,
        2
    )
    RETURN node.title ORDER BY distance;
    """,
    {"query_vector": query_vector})
print(result.get_as_df())

distance above is the vector distance of each vector to the query vector, which by default is the cosine distance. You can return this distance explicitly or just use it to order the results. The above query returns:

                   node.title
0           The Quantum World
1  Chronicles of the Universe

Since you have access to the entire Cypher, you can combine vector search with Cypher pattern matching! For example, the following finds the publishers of the two books most similar to “The Quantum World”.

result = conn.execute(
    """
    CALL QUERY_VECTOR_INDEX('book', 'title_vec_index', $query_vector, 2)
    WITH node AS n, distance
    MATCH (n)-[:PublishedBy]->(p:Publisher)
    RETURN p.name AS publisher
    """,
    {"query_vector": query_vector})
print(result.get_as_df())

The above query first finds the two books most similar to “The Quantum World”, and then matches the most similar books with the publishers, which is a regular graph traversal in Cypher. This returns:

                  publisher
0  Harvard University Press
1     Independent Publisher

An important feature of Kuzu’s vector index is that it also supports performing vector search on a subset of records, which we refer to as performing a filtered search. To perform a filtered search, we use a projected graph to filter the records before performing the vector search. A projected graph is a subgraph containing only nodes and relationships that match the given table names and filters.

Let’s use a projected graph to find publishers of books similar to “The Quantum World” that are published after 2010:

# Create a projected graph with a filter on published_year of Book
conn.execute(
    """
    CALL CREATE_PROJECTED_GRAPH(
        'filtered_book',
        {'Book': {'filter': 'n.published_year > 2010'}},
        []
    );
    """
)

query_vector = model.encode("The Quantum World").tolist()

# Query the index on the projected graph filtered_book
result = conn.execute("""
    CALL QUERY_VECTOR_INDEX(
        'filtered_book',
        'title_vec_index',
        $query_vector,
        2
    )
    WITH node AS n, distance as dist 
    MATCH (n)-[:PublishedBy]->(p:Publisher)
    RETURN n.title AS book,
           n.published_year AS year,
           p.name AS publisher
    ORDER BY dist;
    """,
    {"query_vector": query_vector})
print(result.get_as_df())

This returns the following books that are most similar to the input query and published after 2010:

                         book  year              publisher
0  Chronicles of the Universe  2022  Independent Publisher
1           Learning Machines  2019                Pearson

Here’s a brief note on Kuzu’s filtered search implementation: Our implementation is based on a “prefiltering” technique, i.e., Kuzu computes the filtered subset first and then passes this subset to the query vector index function, which uses modified HNSW search algorithm to perform a kNN search only over the filtered subset. This contrasts with post-filtering techniques, where the entire index is searched in increasing values of k and then the results are filtered. Searching within a subset is a challenging problem where naive solutions can have very high latencies as the subset gets smaller, at least at some selectivity levels. This may sound counterintuitive because the number of vectors to search across is fewer when there is a filter. However, this behavior emerges because the connections in the index, which was constructed on all vectors and not just the given subset, may mislead search algorithms. We use an advanced search algorithm based on a suite of heuristics at different selectivity levels, that keeps the latencies low across all selectivity levels. Stay tuned for a more detailed blog post on this topic!

Search performance

Let’s next give a sense of the performance you can expect from Kuzu’s vector index. We evaluate the index construction and query performance on the ann-benchmark. The benchmark results shown below are from a Mac Mini with an M4 Pro chip, 64GB RAM and a 1TB SSD. We configured Kuzu to run in its default on-disk mode using 8 threads. The table below summarizes index construction time and the average latency of 10K queries for each dataset when k is 100 (and efs parameter set to 200). These queries do not contain any filters:

DatasetDimensionNum tuplesConstruction (s)Query (ms)Recall
MNIST78460,0009.84.10.97
SIFT128100,00011.65.00.91
Glove-25251,183,51474.45.20.94
Deep1B969,990,0001691.87.90.89

As shown, we can obtain average query latencies in single digit millisecond scale on these datasets, while obtaining close to or above 90% recall.

To evaluate filtered search performance, we took another dataset GIST from ann-benchmark and ran 50 randomly selected queries from its query set on different selectivity levels. We controlled the selectivity level by applying a predicate on the “node ID” of the embeddings (recall that each vector in Kuzu is part of a node record). We changed the selectivity levels from 1% to 90%. The table below presents query latency and recall at different selectivity levels:

Selectivity (%)Query (ms)Recall
116.71.00
316.41.00
517.20.99
10148.31.00
2042.10.99
3032.30.99
407.50.90
508.20.91
7511.40.92
9011.90.92

As shown, we can maintain millisecond-scale query latencies and over 90% recall throughout the entire selectivity range! We hope filtered search, which allows you to find vectors based on vector similarity as well as arbitrary filters, can help you retrieve more helpful context for your LLM-based applications.

Arbitrary SQL scans from Postgres databases

In a previous release, we introduced the Postgres extension which enabled scanning from PostgreSQL tables. This release further extends the capability with SQL_QUERY function, allowing users to execute arbitrary read-only SQL queries on attached PostgreSQL databases and scan the query result in Kuzu.

An example is shown below:

-- Attach a Postgres database
ATTACH 'dbname=university user=postgres host=localhost password=yourpassword port=5432' AS uw (dbtype postgres);
-- Scan from a Postgres query
CALL SQL_QUERY('uw', 'SELECT id, name, age FROM person WHERE age > 20') RETURN *;
-- Bulk insert from a Postgres query
CREATE NODE TABLE person(id INT PRIMARY KEY, name STRING, age INT);
COPY person FROM SQL_QUERY('uw', 'SELECT id, name, age FROM person WHERE age > 20');

You can push arbitrarily complex queries into Postgres, e.g., those that include joins or group bys. This can be very helpful if what you want to scan from Postgres is not an entire table but some derived data. For example, often you may want to use Kuzu’s Postgres extension to move data from Postgres into Kuzu node and relationship tables. However, the records in Postgres may be in a different structure than your node or relationship records in Kuzu. Now, using SQL queries, you can turn the data in Postgres into the structure you want before moving it into Kuzu. You can find more information on this feature here.

WASM with bundled extensions

One limitation of the initial Wasm version of Kuzu was the lack of support for extensions. That meant that users couldn’t access useful functionality like full-text search, JSON support and vector search.

This release addresses that limitation by bundling the relevant extensions through static linking. The following extensions are now bundled with our Wasm binaries:

  • Full-text search (fts)
  • JSON (json)
  • Vector index (vector)

You can now use the fts, json and vector extensions in your Wasm applications with Kuzu!

New Ecosystem Integrations and APIs

G.V() integration

We’re excited to announce that Kuzu is now integrated with G.V(), a graph database client and visualization tool. You can now easily connect to Kuzu from G.V() and run Cypher queries to explore your graph data in various ways, offering added capabilities to generate custom graph visualizations on larger graphs. Check out the docs and the release announcement post on this new integration to learn more. And many thanks to Arthur Bigeard and his team for making this happen (and so quickly)!

MCP server implementation

There’s been a recent surge of excitement around the Model Context Protocol (MCP) — an open standard introduced by Anthropic to streamline how LLMs interact with external tools. We’re thrilled to contribute to this growing ecosystem: Kuzu now includes an MCP server implementation., allowing MCP-compatible clients to connect directly to your graph data.

MCP clients like Claude Desktop and Cursor can now connect to Kuzu and use it as a database for storing, querying and even debugging your graph data — see our recent blog post for an example of using Kuzu-MCP with Cursor.

Async Python API

This release introduces a new async API for Python users — you can now easily integrate Kuzu with web frameworks like FastAPI that may be using asyncio to manage database connections.

import asyncio
import kuzu

db = kuzu.Database("test_db")
# create the async connection, the underlying connection pool will be automatically created and managed by the async connection
conn = kuzu.AsyncConnection(db, max_concurrent_queries=4, max_threads_per_query=4)

async def main():
async def main():
    # create a table
    await conn.execute("CREATE NODE TABLE person(ID INT64 PRIMARY KEY, age INT64)")
    await conn.execute("MERGE (p:person {ID: 0, age: 20})")
    await conn.execute("MERGE (p:person {ID: 1, age: 25})")
    await conn.execute("MERGE (p:person {ID: 2, age: 30})")
    
    # Run async queries via asyncio
    num_queries = 10
    queries = [f"MATCH (a:person {{ID: {i}}}) RETURN a.*;" for i in range(num_queries)]
    result = await asyncio.gather(*[conn.execute(query) for query in queries])
    for r in result:
        while r.has_next():
            print(r.get_next())

See the Python client API docs for more details.

Sync Node.js API

For more feature completeness and for cases where a synchronous API is required, we now provide a synchronous API for our Node.js users. See the Node.js client API docs for more details and an example.

Unity Catalog integration

We already announced Kuzu’s integration with Unity Catalog in our previous release post. We’ve recently been added to their OSS docs page, so go there and check it out!

Performance Improvements

Below, we list two benchmark results to showcase some of the performance improvements in this release.

Aggregation

We’ve been focused on improving our aggregation performance in the last few releases starting from version 0.7.1, and this release is no exception. In this release, we further improved the performance of aggregation workloads, including distinct, hash aggregate and aggregate on distinct values. The performance gains mainly come from vectorizing the computation and paralleling the final aggregation stage.

The benchmark results below are a version of ClickBench’s benchmark suite (which are aggregation-heavy), but adapted to use Cypher and run using a modified version of their benchmarking scripts. We use our Python API to execute the queries and measure the total runtime of kuzu.Connection.execute). We run each query twice from the same process, database and connection, and record the average query latency. The queries were run on a machine with 2x AMD EPYC 7551 (total 64 cores 128 threads) and 512GB RAM.

Query0.7.1 (s)0.8.0 (s)0.8.20.9.0
Q52122.122.80.52 (44x)
Q616.917.1110.373 (46x)
Q919635.4 (6x)2.46 (79x)1.96 (100x)
Q1020647.3 (4x)2.62 (79x)1.99 (104x)
Q11156.4 (2x)0.681 (22x)0.633 (24x)
Q1228.66.88 (4x)0.684 (42x)0.737 (39x)
Q133.580.424 (8x)0.427 (8x)0.428 (8x)
Q145317.3 (3x)0.903 (59x)0.879 (60x)
Q154.030.458 (9x)0.46 (9x)0.446 (9x)
Q165.990.75 (8x)0.747 (8x)0.749 (8x)
Q178.361.01 (8x)1.06 (8x)1.05 (8x)
Q189.130.922 (10x)1.06 (9x)0.947 (10x)
Q19212.42 (9x)2.72 (8x)2.45 (9x)
Q2120.723.621.70.634 (37x)
Q222123.621.50.854 (28x)
Q231.09e+031.11e+0318.5 (60x)0.949 (1170x)
Q2422.524.623.13.24 (8x)
Q313.470.526 (7x)0.535 (6x)0.544 (6x)
Q325.360.771 (7x)0.784 (7x)0.787 (7x)
Q3348.76.22 (8x)6.27 (8x)6.38 (8x)
Q3415.81.67 (9x)1.66 (10x)1.66 (10x)
Q3517.91.75 (10x)1.78 (10x)1.8 (10x)
Q367.840.946 (8x)0.932 (8x)0.898 (9x)

The above table shows query latency in seconds for each query in each of the previous 3 releases, in addition to this one. The speedup numbers shown are relative to version 0.7.1, when we first began these improvements. For certain queries like Q23, we see up to a 1170x speedup, and for many other queries, we achieve a > 10x speedup. If you have any aggregation-heavy workloads that aren’t performing as well as you’d like, please give our new version a try and see if it helps! We’ll continue improving the performance of our core operators in upcoming releases.

Full-text search index creation

The performance of our full-text search (FTS) index creation has also significantly improved in this release. The following benchmark is run using ms-passage dataset with 8.8M documents that take up 2.9GB of disk space on a machine with 2 AMD EPYC 7551 CPUs with 64 cores and 409GB of memory.

v0.8.0 (s)v0.9.0 (s)Speedup
4601084.3x

FTS index creation in v0.9.0 is 4.3x faster than in v0.8.0! 🚀

Conclusions

Phew! This release is packed with a ton of new features and improvements. Along the way, we’ve also included numerous bug fixes to help improve your experience and productivity when working with Kuzu. We’re incredibly grateful to the Kuzu developer community — your feedback and feature requests help make each release better than the last. Try out v0.9.0, star us on GitHub and let us know what you think on Discord.

Happy graph querying!

— The Kuzu Team