Bulk import
Loading large datasets from CSV with gr import: node and relationship files, the header grammar, and performance notes.
When to use bulk import
For initial loads of more than roughly 100,000 nodes or relationships, use gr import instead of individual CREATE statements.
The bulk importer bypasses the write-ahead log (WAL), builds the storage structures directly, and is typically 10–100x faster than Cypher CREATE statements for cold loads.
For incremental writes after the initial load, use Cypher through the library or the CLI.
Node CSV format
The header row names each column. Special header tokens control how the importer reads the column:
| Header | Meaning |
|---|---|
| id:ID | Node ID used in relationship files to connect nodes; not stored as a property |
| id:ID(Space) | Node ID within a named ID space |
| :LABEL | One or more labels, pipe-separated: Person\|Employee |
| name:string | String property named name |
| age:int | Integer property named age |
| score:float | Float property named score |
| active:boolean | Boolean property named active |
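To make the header grammar concrete, here is a minimal Python sketch that splits a header token into a property name, a kind, and an optional ID space. It is an illustration of the tokens listed in the table, not the importer's actual parser. The `Column` type and `parse_header` function are hypothetical names, and rejecting tokens without a `:` is an assumption.

```python
import re
from typing import NamedTuple, Optional


class Column(NamedTuple):
    """Hypothetical description of one CSV column (not part of gr)."""
    name: str                # property name, or "" for special columns such as :LABEL
    kind: str                # "ID", "LABEL", "string", "int", "float", "boolean", ...
    id_space: Optional[str]  # named ID space, if the token has one in parentheses


# name ":" kind, optionally followed by "(" space ")"
_HEADER = re.compile(
    r"^(?P<name>[^:()]*):(?P<kind>[A-Za-z_]+)(?:\((?P<space>[^()]+)\))?$"
)


def parse_header(token: str) -> Column:
    """Parse one header token, e.g. 'id:ID(Person)' or ':LABEL'."""
    m = _HEADER.match(token.strip())
    if m is None:
        # Assumption: untyped or malformed tokens are treated as errors here.
        raise ValueError(f"unrecognized header token: {token!r}")
    return Column(m.group("name"), m.group("kind"), m.group("space"))


columns = [
    parse_header(t) for t in "id:ID(Person),name:string,age:int,:LABEL".split(",")
]
```

The same function also accepts the relationship tokens, for example `parse_header(":START_ID(Person)")`.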
Example people.csv:
id:ID(Person),name:string,age:int,:LABEL
1,Alice,30,Person
2,Bob,25,Person|Employee
3,Carol,35,Person
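A file like this can be generated from application data with Python's standard `csv` module. The sketch below assumes in-memory records with a `labels` list. The record layout and the `write_nodes` function are hypothetical, and the pipe used to join labels matches the default label separator described above.

```python
import csv
import io

# Hypothetical in-memory records; in practice these might come from another database.
people = [
    {"id": 1, "name": "Alice", "age": 30, "labels": ["Person"]},
    {"id": 2, "name": "Bob", "age": 25, "labels": ["Person", "Employee"]},
    {"id": 3, "name": "Carol", "age": 35, "labels": ["Person"]},
]


def write_nodes(rows, out):
    """Write a node CSV with the header tokens used in people.csv."""
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["id:ID(Person)", "name:string", "age:int", ":LABEL"])
    for r in rows:
        # Multiple labels go into one cell, joined with a pipe.
        writer.writerow([r["id"], r["name"], r["age"], "|".join(r["labels"])])


buf = io.StringIO()  # swap for open("people.csv", "w", newline="") to write a file
write_nodes(people, buf)
text = buf.getvalue()
```

`csv.writer` quotes any field that contains the delimiter or the quote character, which lines up with the default `,` and `"` settings of the importer. How the importer treats escaped quotes inside a quoted field is not stated here, so test that case before relying on it.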
Relationship CSV format
| Header | Meaning |
|---|---|
| :START_ID | ID of the start node (from node files) |
| :START_ID(Space) | Start node ID within a named ID space |
| :END_ID | ID of the end node |
| :END_ID(Space) | End node ID within a named ID space |
| :TYPE | Relationship type |
| since:int | Integer property named since |
Example knows.csv:
:START_ID(Person),:END_ID(Person),:TYPE,since:int
1,2,KNOWS,2022
2,3,KNOWS,2021
1,3,KNOWS,2020
Running the import
gr import graph.gr \
--nodes Person=people.csv \
--rels KNOWS=knows.csv
To load several node and relationship files, repeat the --nodes and --rels flags, one per file:
gr import graph.gr \
--nodes Person=people.csv \
--nodes Product=products.csv \
--rels BOUGHT=bought.csv \
--rels KNOWS=knows.csv
If your CSV has a :LABEL column, omit the label from the --nodes flag:
gr import graph.gr --nodes =people-with-labels.csv
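When the list of files is produced by a script, the command line can be assembled programmatically. The following Python sketch builds the argument list using only the `--nodes` and `--rels` forms shown above. The `import_command` helper is a hypothetical name, not something gr provides.

```python
import shlex


def import_command(db, nodes, rels):
    """Build the argv for `gr import`.

    nodes and rels are lists of (label_or_type, path) pairs. Pass an empty
    label ("") for node files that carry their own :LABEL column.
    """
    argv = ["gr", "import", db]
    for label, path in nodes:
        argv += ["--nodes", f"{label}={path}"]
    for rel_type, path in rels:
        argv += ["--rels", f"{rel_type}={path}"]
    return argv


argv = import_command(
    "graph.gr",
    nodes=[("Person", "people.csv"), ("Product", "products.csv")],
    rels=[("BOUGHT", "bought.csv"), ("KNOWS", "knows.csv")],
)
print(shlex.join(argv))  # shell-quoted form, useful for logging
```

Passing the list to `subprocess.run(argv, check=True)` would run the import without going through a shell, which avoids quoting problems with unusual file names.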
Options
| Flag | Default | Description |
|---|---|---|
| --batch-size | 50000 | Rows to buffer before flushing a segment |
| --on-duplicate | skip | What to do when two nodes have the same ID: skip or error |
| --on-missing-id | skip | What to do when a relationship references an unknown node ID: skip or error |
| --bad-tolerance | 0 | Number of bad rows to tolerate before aborting |
| --delimiter | , | CSV field delimiter |
| --quote | " | CSV quote character |
| --array-delimiter | \| | Delimiter for array values within a cell |
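Because --on-duplicate and --on-missing-id both default to skip, it can be useful to look for duplicate node IDs and dangling relationship endpoints before running the import. The Python sketch below is one way to do that for small files. It is an assumption-laden illustration: it ignores ID spaces, holds all IDs in memory, and uses sample data that deliberately contains one duplicate and one unknown ID. The `column` helper is a hypothetical name.

```python
import csv
import io
from collections import Counter

# Sample data with a duplicate node ID ("2") and an unknown end ID ("9").
NODES = """id:ID(Person),name:string,age:int,:LABEL
1,Alice,30,Person
2,Bob,25,Person|Employee
2,Bobby,26,Person
"""
RELS = """:START_ID(Person),:END_ID(Person),:TYPE,since:int
1,2,KNOWS,2022
2,9,KNOWS,2021
"""


def column(text, kind):
    """Return the values of the column whose header kind (ID, START_ID, ...) matches."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # The kind is the part after ":" and before any "(Space)" suffix.
    idx = next(
        i for i, h in enumerate(header)
        if h.split(":", 1)[1].split("(", 1)[0] == kind
    )
    return [row[idx] for row in reader]


node_ids = column(NODES, "ID")
duplicates = sorted(i for i, n in Counter(node_ids).items() if n > 1)

known = set(node_ids)
endpoints = column(RELS, "START_ID") + column(RELS, "END_ID")
missing = sorted({i for i in endpoints if i not in known})

print("duplicate node IDs:", duplicates)
print("unknown endpoint IDs:", missing)
```

For files too large to hold in memory, running the import with --on-duplicate error and --on-missing-id error gives a stricter check by the importer itself.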
How it works
The importer runs in four passes:
1. Scan IDs: read every node file and build an ID-to-position map.
2. Build node columns: write node storage segments from the buffered input.
3. Sort relationships: sort the relationship rows by start-node position, then write the CSR (compressed sparse row) adjacency structure.
4. Finalize: write the catalog, flush, and remove the import temp files.
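The ID map and the CSR structure from these passes can be sketched in a few lines of Python. This is a conceptual, in-memory illustration using the people.csv and knows.csv data. The names `position`, `offsets`, `targets`, and `neighbors` are hypothetical, and the sketch says nothing about gr's actual on-disk layout.

```python
# Scan IDs: map each external node ID to a dense position.
node_ids = ["1", "2", "3"]
position = {node_id: pos for pos, node_id in enumerate(node_ids)}

# Sort relationships: translate endpoints to positions and sort by start position.
rels = [("1", "2"), ("2", "3"), ("1", "3")]  # (start ID, end ID) from knows.csv
edges = sorted((position[s], position[e]) for s, e in rels)

# Build CSR: offsets[i]..offsets[i + 1] delimits node i's slice of targets.
offsets = [0] * (len(node_ids) + 1)
for start, _ in edges:
    offsets[start + 1] += 1           # count outgoing edges per node
for i in range(len(node_ids)):
    offsets[i + 1] += offsets[i]      # prefix sum turns counts into offsets
targets = [end for _, end in edges]


def neighbors(pos):
    """Return the end-node positions of all relationships starting at pos."""
    return targets[offsets[pos]:offsets[pos + 1]]
```

Sorting by start position is what lets each node's outgoing relationships sit in one contiguous slice, so a lookup needs only two offsets and no per-node pointer chasing.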
Because it builds storage directly, the importer does not checkpoint through the WAL. The resulting file is a complete, sealed database file ready to open immediately.
After import
The database is ready to open and query as soon as gr import exits.
Create indexes after the import if you plan to query on the imported properties:
gr run graph.gr "CREATE INDEX FOR (p:Person) ON (p.name)"
gr run graph.gr "CREATE INDEX FOR (p:Product) ON (p.sku)"
Verify the import:
gr info graph.gr
gr run graph.gr "MATCH (n) RETURN labels(n), count(*) ORDER BY count(*) DESC"