Today I Discovered: Graph Databases

So if you’re used to databases, you most likely know what SQL is, along with the most common SQL (relational) databases. If you’re a little more advanced, maybe you’ve heard of or even messed around with NoSQL databases. Well, today I just found something cool: graph databases, a cross between a database and mathematical graph theory.

SQL / Relational Databases

SQL, or Structured Query Language, is a language that allows you to perform operations on a database, more or less independent of the database itself, the file format on disk, etc. As an example:

SELECT * FROM Customers
WHERE Name LIKE 'Bob %'
ORDER BY Last_Seen;

This query gets some data (SELECTing it, if you will) FROM a table named Customers, filters it to show only rows WHERE the Name column starts with “Bob ” (so, first name Bob), and ORDERs the results BY the value in the Last_Seen column.
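To see a query like this actually run, here is a minimal sketch using Python's built-in sqlite3 module with an in-memory database. The rows are invented sample data, and the query selects every column with `*`; other database engines may differ in details such as case sensitivity of LIKE.

```python
import sqlite3

# In-memory database; the rows below are invented sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (Name TEXT, Last_Seen TEXT)")
conn.executemany(
    "INSERT INTO Customers (Name, Last_Seen) VALUES (?, ?)",
    [
        ("Bob Smith", "2020-03-01"),
        ("Alice Jones", "2020-02-15"),
        ("Bob Adams", "2020-01-10"),
        ("Bobby Tables", "2020-04-05"),
    ],
)

# Same shape as the query above: filter on Name, sort by Last_Seen.
rows = conn.execute(
    "SELECT * FROM Customers WHERE Name LIKE 'Bob %' ORDER BY Last_Seen"
).fetchall()

for name, last_seen in rows:
    print(name, last_seen)
```

Only the two customers whose name starts with "Bob" followed by a space come back, oldest Last_Seen first. "Bobby Tables" is skipped because of the space in the pattern.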

When most people I know say or think “Database”, this sort of thing comes to mind. Some call these SQL databases, but SQL is just the way in which you interact. A better description is to call them relational databases.

In a relational database, your data is organized into tables. Each table has a number of columns that describe what data it can hold, and your data itself is organized into rows. Think Excel here. Every table has to have one column (or a combination of columns) that must contain a value in every row, and that value must be unique. This is called that table’s primary key (PK). Other tables can reference a table’s PK in a column of their own, which is then called a foreign key. In this way, these two tables, and the data they hold, relate. See how they get their name?
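The primary key / foreign key idea can be sketched with sqlite3 as well. The Orders table, its columns, and the sample rows are made up for this illustration. Note that SQLite only enforces foreign keys once the foreign_keys pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked

# Customers.Id is the primary key: present and unique in every row.
conn.execute("CREATE TABLE Customers (Id INTEGER PRIMARY KEY, Name TEXT NOT NULL)")

# Orders.Customer_Id is a foreign key pointing at Customers.Id.
conn.execute(
    """
    CREATE TABLE Orders (
        Id INTEGER PRIMARY KEY,
        Customer_Id INTEGER NOT NULL REFERENCES Customers(Id),
        Item TEXT NOT NULL
    )
    """
)

conn.execute("INSERT INTO Customers (Id, Name) VALUES (1, 'Bob Smith')")
conn.execute("INSERT INTO Orders (Id, Customer_Id, Item) VALUES (10, 1, 'Surfboard')")

# The foreign key is what lets us join the two tables back together.
joined = conn.execute(
    "SELECT Customers.Name, Orders.Item FROM Orders "
    "JOIN Customers ON Customers.Id = Orders.Customer_Id"
).fetchall()
print(joined)

# An order that points at a customer that does not exist is rejected.
try:
    conn.execute("INSERT INTO Orders (Id, Customer_Id, Item) VALUES (11, 99, 'Wetsuit')")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True
print(rejected)
```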

Relational databases are, by far, the most common. Names like MySQL, PostgreSQL, Microsoft SQL Server, Oracle (yes, they made a product that’s literally their company name)… okay why does (almost) every one of these literally have “SQL” in the name, no wonder they’re called SQL databases.

NoSQL Databases

NoSQL, or “Not only SQL”, has a very broad syntactic definition, and a much narrower semantic definition.

By the letter of its own definition, a NoSQL database is just one that does not use SQL as its primary interface method. In practice, this means that you’re also casting away the traditional relational data model too. The main two that I can think of? MongoDB and CouchDB.

Both of them are document oriented: instead of storing rows in a table, they store documents in a collection. Both MongoDB and CouchDB take a similar approach (JSON) to this. The advantage here is that there is no hard-coded, inflexible schema that your data needs to adhere to. It’s just a collection of JSON that can be browsed, queried, and manipulated at will.
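To get a feel for the "no fixed schema" point, here is a rough sketch in plain Python. It is not the MongoDB or CouchDB API: the collection is just a list of dicts, and the find helper and the sample documents are hypothetical.

```python
import json

# A "collection" sketched as a plain list of JSON-style documents.
# Note that the documents do not all share the same fields.
people = [
    {"name": "Bob Smith", "hobbies": ["surfing", "chess"]},
    {"name": "Alice Jones", "email": "alice@example.com"},
    {"name": "Carol White", "hobbies": ["hiking"], "address": {"city": "Lisbon"}},
]

def find(collection, predicate):
    """Return every document for which predicate(document) is true."""
    return [doc for doc in collection if predicate(doc)]

# Documents without a "hobbies" field are simply skipped, not an error.
surfers = find(people, lambda doc: "surfing" in doc.get("hobbies", []))
print(json.dumps(surfers, indent=2))
```

A real document database adds indexing, persistence, and its own query language on top, but the shape of the data is the same idea.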

And yes, before someone says it, even something like Redis is technically under the NoSQL label, as a key/value database.

Graph Databases

So, here’s the cool thing. In math, we have this thing called graph theory. And when I say “graph”, I don’t mean like “line graph” or “those fancy charts someone made in Excel” graph, I mean more like a graphviz / dot graph: (Taken from the Wikipedia page)

A drawing of a graph

In that graph pictured above, the circles, also called nodes or vertices, are connected by edges, also called links. That graph is also undirected, meaning that the edges do not specify a “to” and a “from”; they are not arrows pointing from one node to another.
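In code, an undirected graph like this is often stored as a set of neighbours per node. The sketch below uses an arbitrary list of edges chosen for illustration (not necessarily the ones in the picture) and records each edge in both directions, which is what "undirected" amounts to.

```python
from collections import defaultdict

# Edges of a small undirected graph; the node numbers are arbitrary.
edges = [(1, 2), (1, 5), (2, 3), (2, 5), (3, 4), (4, 5), (4, 6)]

# Map each node to the set of nodes it shares an edge with.
neighbours = defaultdict(set)
for a, b in edges:
    neighbours[a].add(b)  # a is linked to b ...
    neighbours[b].add(a)  # ... and b is linked to a, since there is no direction

print(sorted(neighbours[4]))
print(2 in neighbours[1], 1 in neighbours[2])
```

For a directed graph you would only add the first of those two lines, so that an edge from a to b says nothing about b to a.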

Now in fairness I have played around with OrientDB, which is a hybrid system that does use SQL, and manages both relational and graph databases, but it wasn’t until I started playing with Neo4j that it actually made sense to me.

Graph databases can be really cool for showing, well, how data relates to other data, but not in the relational database way, more in a manner of mapping travel routes, or computer networks, social contacts, and so on. Something as simple as “Does person A know someone who knows someone who has an interest in surfing” is very easy to answer in a graph database… actually here it is in Cypher, Neo4j’s query language:

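MATCH (a:Person)-[:KNOWS]-()-[:KNOWS]-(surfer)
WHERE surfer.hobby = "surfing" RETURN DISTINCT surfer

Here a is a variable bound to a node with the label Person (parentheses signify a node; they kinda look like a circle). It is connected to another node by a relationship of type KNOWS, and that anonymous node in turn KNOWS a third node, which is bound to the variable surfer. The name surfer is only a variable, so the WHERE clause does the actual filtering, assuming the interest is stored in a property called hobby, and RETURN DISTINCT hands back each matching node once.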
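To make the two-hop pattern concrete without a database, here is a small plain-Python sketch of the same question. It does not use Neo4j or its driver. The people, the KNOWS pairs, and the hobby values are hypothetical sample data, and storing the interest as a "hobby" is an assumption.

```python
# Hypothetical sample data: who knows whom, and each person's hobby.
knows = [("Ann", "Ben"), ("Ben", "Cat"), ("Cat", "Dan"), ("Ann", "Eve")]
hobby = {"Ann": "chess", "Ben": "hiking", "Cat": "surfing", "Dan": "surfing", "Eve": "chess"}

# KNOWS is treated as undirected, matching the arrow-less pattern.
neighbours = {}
for a, b in knows:
    neighbours.setdefault(a, set()).add(b)
    neighbours.setdefault(b, set()).add(a)

def surfers_two_hops_from(person):
    """Surfers reachable from person via a path of two KNOWS links."""
    found = set()
    for friend in neighbours.get(person, ()):
        for friend_of_friend in neighbours.get(friend, ()):
            # Skip walking straight back to the starting person.
            if friend_of_friend != person and hobby.get(friend_of_friend) == "surfing":
                found.add(friend_of_friend)
    return found

print(surfers_two_hops_from("Ann"))
```

Ann knows Ben, who knows Cat, so Cat is found. Dan also surfs but is three links away, so he is not. The nested loops are exactly what the graph database does for you when you write the pattern, and every extra hop would mean another level of nesting here.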

Once you get used to it, that’s really powerful, and really cool to have on hand if I need it.

Like, I knew they existed before now, but I think I finally understood the true power of what a graph database can do, in just that simple part of the built-in getting started guide.