The Problem

During the ACT-IAC hackathon, we used Memgraph for graph analysis on our 82.6M-edge Medicare provider network. It worked. It also demanded 100GB of RAM, a Docker container, and the Bolt protocol, and it ships under the Business Source License -- the same source-available relicensing move that burned the open source community with MongoDB, Redis, and Elasticsearch.

For a government deployment, this was unacceptable on every axis. Operational complexity: a separate graph database service with its own memory management, network protocol, and failure modes. License risk: BSL is not open source by any reasonable definition, and federal procurement teams rightly treat it with suspicion. Resource cost: 100GB of RAM just for graph analysis, on top of whatever the rest of the system needs.

We needed graph algorithms. We didn't need a graph database.

The Gap

Python has NetworkX. Java has JGraphT. Go had nothing comparable.

Go had gonum for the basics -- BFS, DFS, shortest paths, basic connectivity. A solid foundation, but no comprehensive graph algorithm library existed on top of it.

No Leiden community detection. No approximate betweenness centrality for large graphs. No network embeddings. No bipartite matching. No Louvain. No label propagation. No K-core decomposition.

If you were building anything that needed serious graph analysis in Go, you had three options: shell out to a Python process, add a graph database dependency, or implement the algorithms yourself from academic papers.

We chose the third option.

What We Built

40+ algorithms across 10 packages, all implemented clean-room from the original academic papers.

GraphWizard is a comprehensive graph algorithm library for Go. Every algorithm was implemented directly from the original academic papers, not ported from another language's implementation.

  • Leiden community detection from Traag, Waltman & van Eck (2019) -- the successor to Louvain, with provably better communities and no disconnected-community bug.
  • Hopcroft-Karp bipartite matching from the 1973 paper -- maximum-cardinality matching in O(E√V) time.
  • Node2Vec network embeddings from Grover & Leskovec (2016) -- biased random walks that capture both structural equivalence and homophily.
  • Approximate betweenness centrality for large graphs -- because exact betweenness on 82.6M edges would take days.
  • Louvain, label propagation, K-core decomposition, PageRank, HITS, and dozens more.

Every function accepts standard gonum graph interfaces. If you already use gonum, GraphWizard works with your existing graph types. Zero dependencies beyond gonum itself.

Package      Algorithms
centrality   PageRank, HITS, Betweenness, Closeness, Approximate Betweenness
community    Leiden, Louvain, Label Propagation, K-Core
matching     Hopcroft-Karp, Hungarian
embedding    Node2Vec, DeepWalk
similarity   Jaccard, Cosine, Adamic-Adar
flow         Max-Flow, Min-Cut
traversal    Random Walks, Biased Walks
metrics      Density, Clustering Coefficient, Modularity
partition    Spectral, Kernighan-Lin
layout       Force-Directed, Fruchterman-Reingold
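Many of these reduce to surprisingly compact loops. K-core decomposition, for instance, is a peeling process: repeatedly remove the minimum-degree node, tracking the highest degree seen at removal time. A self-contained sketch (an O(V²) simplification of the Matula-Beck ordering, not the library's actual code):

```go
package main

import "fmt"

// coreNumbers computes each node's k-core number by peeling the
// minimum-degree node until the graph is empty. A node's core number
// is the largest minimum-degree seen up to the moment it is peeled.
func coreNumbers(adj map[int][]int) map[int]int {
	deg := make(map[int]int, len(adj))
	for v, ns := range adj {
		deg[v] = len(ns)
	}
	core := make(map[int]int, len(adj))
	removed := make(map[int]bool, len(adj))
	k := 0
	for len(removed) < len(adj) {
		// Pick the remaining node with minimum current degree.
		minV, minD := -1, -1
		for v, d := range deg {
			if !removed[v] && (minD == -1 || d < minD) {
				minV, minD = v, d
			}
		}
		if minD > k {
			k = minD
		}
		core[minV] = k
		removed[minV] = true
		for _, u := range adj[minV] {
			if !removed[u] {
				deg[u]--
			}
		}
	}
	return core
}

func main() {
	// Triangle {1,2,3} with a pendant node 4 hanging off node 1.
	adj := map[int][]int{1: {2, 3, 4}, 2: {1, 3}, 3: {1, 2}, 4: {1}}
	fmt.Println(coreNumbers(adj)) // pendant is 1-core, triangle is 2-core
}
```

A bucket-queue version of the same peeling runs in O(V + E), which is what you want at 82.6M edges.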

The Impact

GraphWizard replaced Memgraph entirely in Integrity.

The full 82.6M-edge provider graph loads into memory as a standard gonum structure. Leiden community detection, approximate betweenness centrality, and six custom fraud algorithms run end-to-end in approximately 90 seconds.

No Docker container. No network protocol. No external service. No dual license. One binary, one process.

Before (Memgraph)                   After (GraphWizard)
100GB RAM dedicated to graph DB     In-process, shared memory space
Docker container + Bolt protocol    Function calls in Go
Business Source License             MIT License
Network latency on every query      Zero serialization overhead
Separate failure domain             Single process, single binary
Cypher query language               Native Go API

The operational simplification matters as much as the performance. One fewer service to monitor, patch, back up, and debug at 2am. For Integrity, this was the difference between a system that requires a dedicated ops team and one that a four-person team can run.

Design Decisions

Opinionated API design, academic rigor, production test coverage.

One import per domain

centrality.PageRank(g), not iterator chains or builder patterns. You call a function, you get a result. The API should be obvious to anyone who's read the paper.
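A similarity measure in this style, for example, is just a pure function over two neighbor sets. A hypothetical sketch of Jaccard similarity (names illustrative, not the actual API):

```go
package main

import "fmt"

// jaccard returns |A ∩ B| / |A ∪ B| for two neighbor lists, assuming
// neither list contains duplicates. One call, one result.
func jaccard(a, b []int) float64 {
	inA := make(map[int]bool, len(a))
	for _, v := range a {
		inA[v] = true
	}
	inter := 0
	for _, v := range b {
		if inA[v] {
			inter++
		}
	}
	union := len(a) + len(b) - inter
	if union == 0 {
		return 0 // two isolated nodes: define similarity as 0
	}
	return float64(inter) / float64(union)
}

func main() {
	// Two providers whose neighborhoods share nodes 2 and 3.
	fmt.Println(jaccard([]int{1, 2, 3}, []int{2, 3, 4})) // 0.5
}
```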

Consistent return types

Community detection algorithms return community assignments. Centrality algorithms return node scores. Matching algorithms return edge sets. No surprises.
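As an illustration of the community-detection return shape, here is a label propagation sketch that returns a node-to-community map. Published versions randomize the update order; this one fixes the order and breaks ties toward the smallest label so the example is reproducible. It is a sketch of the technique, not GraphWizard's implementation:

```go
package main

import (
	"fmt"
	"sort"
)

// labelPropagation starts every node in its own community, then
// repeatedly relabels each node with the most frequent label among its
// neighbors (ties broken by smallest label) until no label changes.
func labelPropagation(adj map[int][]int) map[int]int {
	label := make(map[int]int, len(adj))
	nodes := make([]int, 0, len(adj))
	for v := range adj {
		label[v] = v
		nodes = append(nodes, v)
	}
	sort.Ints(nodes) // deterministic sweep order
	for iter, changed := 0, true; changed && iter < 100; iter++ {
		changed = false
		for _, v := range nodes {
			counts := make(map[int]int)
			for _, u := range adj[v] {
				counts[label[u]]++
			}
			best, bestCount := label[v], 0
			for l, c := range counts {
				if c > bestCount || (c == bestCount && l < best) {
					best, bestCount = l, c
				}
			}
			if best != label[v] {
				label[v] = best
				changed = true
			}
		}
	}
	return label
}

func main() {
	// Two triangles joined through bridge node 7.
	adj := map[int][]int{
		1: {2, 3}, 2: {1, 3}, 3: {1, 2, 7},
		4: {5, 6, 7}, 5: {4, 6}, 6: {4, 5},
		7: {3, 4},
	}
	fmt.Println(labelPropagation(adj)) // two community labels
}
```

Whatever the algorithm -- Leiden, Louvain, or this one -- the caller gets back the same shape: a map from node to community.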

Standard gonum interfaces

Every algorithm accepts gonum/graph.Graph or its directed/weighted variants. If you already have a gonum graph, GraphWizard works with it. No wrapper types, no conversion steps.

97.3% test coverage

Every algorithm tested against known results from the original papers. Edge cases, empty graphs, disconnected components, single-node graphs. If it compiles, it's tested.
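In that spirit, here is what a table-driven edge-case check looks like for a density metric -- a sketch using a hypothetical density function, not GraphWizard's actual test suite:

```go
package main

import "fmt"

// density returns |E| / (|V| choose 2) for an undirected graph stored
// as an adjacency list with each edge listed in both directions.
// Graphs with fewer than two nodes are defined to have density 0.
func density(adj map[int][]int) float64 {
	n := len(adj)
	if n < 2 {
		return 0
	}
	edges := 0
	for _, ns := range adj {
		edges += len(ns)
	}
	// Each undirected edge was counted once per endpoint.
	return float64(edges) / float64(n*(n-1))
}

func main() {
	cases := []struct {
		name string
		adj  map[int][]int
		want float64
	}{
		{"empty graph", map[int][]int{}, 0},
		{"single node", map[int][]int{1: nil}, 0},
		{"triangle (complete)", map[int][]int{1: {2, 3}, 2: {1, 3}, 3: {1, 2}}, 1},
		{"edge plus isolate", map[int][]int{1: {2}, 2: {1}, 3: nil}, 1.0 / 3},
	}
	for _, c := range cases {
		fmt.Printf("%-22s density=%.3f (want %.3f)\n", c.name, density(c.adj), c.want)
	}
}
```

The degenerate inputs -- empty graph, single node, disconnected pieces -- are exactly where naive implementations divide by zero.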

Academic references

Every algorithm's doc comment cites the paper it was implemented from. You can trace any implementation back to the original research.

Why Open Source

We built it for ourselves. We open-sourced it because the Go ecosystem needed it.

GraphWizard was born from a specific need: replace Memgraph in our fraud detection system. But the library itself is general-purpose. Nothing in it is tied to fraud detection, healthcare, or government. It's a graph algorithm library, period.

The Go ecosystem deserved a comprehensive graph algorithm library with a real open source license. Not BSL, not SSPL, not "source available." MIT. Use it however you want.

We wrote about the broader philosophy in 47 Users, 47 Engineers -- about building systems that are right-sized for their actual workload, not for imagined scale. GraphWizard is the same idea applied to graph analysis: you probably don't need a graph database. You need graph algorithms.
