Bulk import

Loading large datasets from CSV with gr import: node and relationship files, the header grammar, and performance notes.

When to use bulk import

For initial loads of more than roughly 100,000 nodes or relationships, use gr import instead of individual CREATE statements. The bulk importer bypasses the WAL, builds the storage structures directly, and is typically 10–100x faster than Cypher for cold loads.

For incremental writes after the initial load, use Cypher through the library or the CLI.

Node CSV format

Each column is named by its header. Special header tokens control how the importer reads the column:

Header	Meaning
`id:ID`	Node ID used in relationship files to connect nodes; not stored as a property
`id:ID(Space)`	Node ID within a named ID space
`:LABEL`	One or more labels, pipe-separated: `Person\|Employee`
`name:string`	String property named `name`
`age:int`	Integer property named `age`
`score:float`	Float property named `score`
`active:boolean`	Boolean property named `active`

Example people.csv:

id:ID(Person),name:string,age:int,:LABEL
1,Alice,30,Person
2,Bob,25,Person|Employee
3,Carol,35,Person

Relationship CSV format

Header	Meaning
`:START_ID`	ID of the start node (from node files)
`:START_ID(Space)`	Start node ID within a named ID space
`:END_ID`	ID of the end node
`:END_ID(Space)`	End node ID within a named ID space
`:TYPE`	Relationship type
`since:int`	Integer property named `since`

Example knows.csv:

:START_ID(Person),:END_ID(Person),:TYPE,since:int
1,2,KNOWS,2022
2,3,KNOWS,2021
1,3,KNOWS,2020

Running the import

gr import graph.gr \
  --nodes Person=people.csv \
  --rels KNOWS=knows.csv

Multiple node files, each with a different label group:

gr import graph.gr \
  --nodes Person=people.csv \
  --nodes Product=products.csv \
  --rels BOUGHT=bought.csv \
  --rels KNOWS=knows.csv

If your CSV has a :LABEL column, omit the label from the --nodes flag:

gr import graph.gr --nodes =people-with-labels.csv

Options

Flag	Default	Description
`--batch-size`	`50000`	Rows to buffer before flushing a segment
`--on-duplicate`	`skip`	What to do when two nodes have the same ID: `skip` or `error`
`--on-missing-id`	`skip`	What to do when a relationship references an unknown node ID: `skip` or `error`
`--bad-tolerance`	`0`	Number of bad rows to tolerate before aborting
`--delimiter`	`,`	CSV field delimiter
`--quote`	`"`	CSV quote character
`--array-delimiter`	`\|`	Delimiter for array values within a cell

How it works

The importer runs in four passes:

Scan IDs — read every node file and build an ID-to-position map.
Build node columns — write node storage segments from the buffered input.
Sort relationships — sort the relationship file by start-node position, then write the CSR (compressed sparse row) adjacency structure.
Finalize — write the catalog, flush, and remove the import temp files.

Because it builds storage directly, the importer does not checkpoint through the WAL. The resulting file is a complete, sealed database file ready to open immediately.

After import

The database is ready to open and query as soon as gr import exits.

Create indexes after the import if you plan to query on the imported properties:

gr run graph.gr "CREATE INDEX FOR (p:Person) ON (p.name)"
gr run graph.gr "CREATE INDEX FOR (p:Product) ON (p.sku)"

Verify the import:

gr info graph.gr
gr run graph.gr "MATCH (n) RETURN labels(n), count(*) ORDER BY count(*) DESC"