on target
Databases: relational vs object vs graph vs document
Is relational database dying? Why a model affirmed and finally consolidated over years is now object of discussion? Which alternatives exist and how they can match your needs in practical situations?
Follow me, in this post, to understand how databases evolved and how to choose between new technological alternatives that have already produced valuable results.
Note: this post focus operational (OLTP) databases. I have already addressed OLAP approach for strategic applications in many precedent posts.
A little history
Let's go back to 70s. At that time everybody was discussing which one (out of three) database model should adopt: hierarchical, network or relational. At that time everybody realized that a database system was a must, but there was still no winner.
The hierarchical model organized data into records of different types, each having a predefined set of fields. Records were arranged in a tree structure navigable through parent-child relationships. Despite the “hierarchical” label data could be organized exactly like the network model. A typical implementation was IMS that I think had (at that time) the largest database market share, at least in mainframe world.

The network model was a flexible alternative natively supporting many-to-many relationships. It was based on CODASYL consortium and implemented in products like IDMS. Data were also organized into records navigable through sets.
As you probably know the relational model organizes data into tables. Each table contains a set of records with common attributes or fields represented by columns. There is no predefined navigation and data can be accessed through relational algebra usually expressed in SQL language. The relational model was also born inside IBM which heavily promoted it, but today, in addition to DB2, many other implementations are available like Oracle (de facto market leader).
No matter the approach any database system is built to guarantee data consistency under concurrent usage by mean of ACID transaction support.
Relational won the battle
In the 80s the discussion was practically over. Relational model won the battle. Why? Was the relational model really superior? Did IBM itself kill his other alternative (IMS)? Did SQL standard play a key role?
Maybe all those reasons contributed to heavy adoption of relational model, but frankly I believe that the real killer factor is that it was extremely intuitive. It is quite intuitive for humans to organize data into tables: just consider how much spreadsheets are used and overused. The table paradigm was still more intuitive at that time with all those COBOL minded programmers.
On the other side information is by nature interrelated and not “contained in tables” thus better represented as a graph, exactly as network and hierarchical model foresight! An invoice and its items, for example, are naturally related independently on any shared key exists (or you are forced to invent) in relational implementation.
But, like in history, winner is always right and the rest are villain, so everybody started porting their applications to SQL (I still miss an excellent network product from Raima called DbVista).
The relational mismatch
Relational adoption was almost 100% at the end of millennium, but it is still used more like an
access method than as a database. Tables were so easy to create that
produced a chaotic proliferation with consequent data redundancy. Keys incompatibility and meaning conflicts between departments of the
same organization are common situations.
Maybe for this reason a new buzzword was introduced: the Data Warehouse whose goal was exactly the same for which database were originally conceived (organize data), but with a more pompous name everybody was hoping a more serious approach.
SQL itself soon left to be a standard, giving rise to a multitude of incompatible variations and dialects.
In addition to all such problem a new one was quickly taking shape: the heavy adoption of object oriented development. Objects reflect information and are naturally organized in networks or graphs and not tables. New developers with better abstraction that old COBOL school, realized that there was an Object Relational Impedance Mismatch.
On the other side most application are now stacked over mature and reliable relational engines. So what could be done? First (and still most widely used approach) was to automate conversion through what is usually called ORM (Object Relational Mapping).
ORM takes care of translating object-oriented navigation to specific SQL (and SQL dialect) you use. A well known example of ORM (possibly the widely adopted) is Hibernate. Far from perfect, we will surely inhabit with ORM for many years on.
But from now on, shall we continue using relational databases for new applications? Here we are facing a new crucial choice: shall we continue using what we bet on 30 years ago?
Object Databases
A cleaner alternative to ORM are object-oriented databases that directly add database functionality to the object oriented model and language itself. Object persistence becomes almost native and application code is tremendously simplified.
Some implementations are db4o, Cache and Jade, just to cite a few. Once you use an object database, you realize how inadequate the relational model is and you will try never go back to it.
So why aren't object databases more widely used? Here are main reasons. If none of them is applicable to your case object database is probably a good choice:
Object oriented models are usually better bound to a specific environment. If you use exclusively Java, for example you have a lot of choices. But if your environment is hybrid (Java, Cobol, C++...) there is no common denominator.
Relational models are still more efficient in batch processing, and are backed by an infinity of tools for reporting and data exploration. Object databases are still limited in ad hoc reporting tools. On the other side, do you really need ad hoc reporting on operational data? Operational applications (OLTP) usually provide a predefined set of possibly customizable reports.
Your client or his policies can constrain or suggest a specific relational technology (shall run on …).
Relational models are usually backed by larger suppliers (what is considered more reliable at corporate level).
Document Databases

With massive advent of Internet, storing large amount of documents became a must. Such documents range from images to more or less structured text, including large chunks of information encoded in XML. Once again relational technology was not natively prepared to support such kind of data. BLOGs and CLOBs were later introduced in SQL but at the same time document oriented databases appeared.
Documents have other peculiarities. The first one is life cycle. While most business records are frequently updated (for example client's balance), documents are almost never updated (you don't update an image, you replace it). In case of text documents, replacement implies in versioning for both storage and auditability reasons. In addition, document transactions are extremely granular involving a single record: a perfect fit for REST services.
But what makes document databases really different, is the fact that documents are usually retrieved through dunamic and unpredictable queries. Thus document databases can usually associate any number of fields of any length to a document. This way you can store, together with a medical image, patient name and birth data. If you late decide to add also sex and profession, you can do it even if it wasn't originally conceived.
Document databases like eXist are usually schema-less: there is no predefined data model (the schema defines how database is organized). We will go back to schema-less databases about graph databases. Many document database also parse and index its textual content to allow quick retrieval under ad hoc classification and selection criteria.
In practice, you shall consider a document database only if your application is document based.
Graph Databases
During last 3 years IT offering has dramatically changed its shape. Many applications are now offered as SAAS (software as a service), possibly supported by some kind of cloud computing. The big challenge is: different users, hosted on same database, have different data needs.
Consider a typical CRM application. Information collected about clients depends on product sold, market and many other factors. How can you handle that in a relational model? If you provide columns for every possibility you end up with an unmanageable system. Some application partially solve this issue through a small collection of extra fields, like CUSTOM1, CUSTOM2 … It is usually an unhappy and limited choice.
Conventional schema where columns or fields are predefined do not provide enough flexibility. Here graph databases come in hand. Graph databases are usually schema-less and allow a set of nodes (object instances) with dynamic properties (corresponding to columns or attributes) to be arbitrary linked to other nodes through edges (associations).
At this point you will surely ask: what about database integrity and consistency rules in an arbitrary network where you can stuff everything? Such rules can be conveniently implemented and enforced through API, quite better than using relational model.
So graph database like InfoGrid and Noe4J can be your choice if you need model flexibility. This is not limited to SAAS application, but to any application that can evolve over time and, if applications shall follow business, this more the rule than the exception.
Interested in partnership?
Would like to try or coach this technique with your clients, do you need any additional or technical detail? Please let me know! Component Bases Solutions has great interest in partnership with consultants. We can help you automate your proposed solutions in a very short time. We can also help to increase your visibility through links from many management tools we make freely available on the Web. Please contact for more detail.
Did you like this post?
Please, spend a few seconds and click on button below to socially bookmark it or tweet it. It will help to promote this blog and improve its content, Thank you!
|
Posted at 10:45PM May 13, 2010 by admin in IT | Comments[31]
Blue collar:
Sorry, I think you really did not understand the point and took a fanatic position.
First of all, I do not need to study SQL because I do not write code. My developers do that and what I know is that they interface SQL they shall write code to retrieve and store data (that is absolutely native in an object oriented database). Now, blue collar, try to see things from MY SIDE: More code = More Effort = More cost (to build and to maintain). That's what matters to me, cause I sign the check.
But I can only sign a check once my clients checks have been cleared, so it is very important to me that my client sign checks too. If they want a damned extra field I shall be able to give then that field, even if it is in no schema, so for certain applications I need a flexibility that neither relational nor object oriented database can provide. We use graph databases in such cases.
Finally we use relational database for most administrative applications since there is no better choice. We choose best alternative in each case. Who's trying to use always the same screwdriver (SQL) is you!
We never had performance problems even with large databases (financial transactions always ends in the order of Tera). On the other side keep in mind that we user small groups of skilled developers instead of an army of average programmers.
We integrate through services (API), not database. You can not agree with that (understand - you shall defend your job). We do that not because service bus is “the hot thing”, but because it makes all sense as in life: When I wanna know the amount of my sales I ask my chief accountant, I don't open his files. He's already paid to provide that service.
Posted by Franco Graziosi on May 20, 2010 at 08:19 PM UTC #







Posted by Rich on May 13, 2010 at 11:48 PM UTC #
Posted by michael on May 14, 2010 at 02:06 AM UTC #
Posted by Prakash.T on May 14, 2010 at 06:22 AM UTC #
Posted by Wilson Kutegeka on May 14, 2010 at 06:30 AM UTC #
Posted by Franco Graziosi on May 14, 2010 at 07:06 AM UTC #
Posted by Alireza Haghighatkhah on May 14, 2010 at 07:35 AM UTC #
Posted by Jose M. Arranz on May 14, 2010 at 08:31 AM UTC #
Posted by jr on May 14, 2010 at 09:50 AM UTC #
Posted by Brian on May 14, 2010 at 09:55 AM UTC #
Posted by RS on May 14, 2010 at 09:59 AM UTC #
Posted by Jesse on May 14, 2010 at 02:37 PM UTC #
Posted by gw on May 14, 2010 at 03:14 PM UTC #
Jesse:
Agree but only in part. There was a language called QBE that could do that.
Today there are plenty of graph navigation languages. In addition this kind of computation is surely better suited for MDX than SQL.
Allrezza
I think you missed the point: graph databases are schemaless, in object oriented databases the schema is your metaclass. You have your opinion because you didn't use both, otherwise you would surely change your mind.
Posted by Franco Graziosi on May 14, 2010 at 03:35 PM UTC #
Posted by blindman on May 14, 2010 at 03:59 PM UTC #
Blindman
Data shall be accessed through a single layer API. This is the rule in any complex application that shall survive like SAP, just to cite one. If you wish to provide multiple API they shall call a common layer. Modern enterprise applications rely on a bus for such purpose. Application programs that directly interface the database are toys.Posted by Franco Graziosi on May 14, 2010 at 04:52 PM UTC #
Posted by blindman on May 14, 2010 at 06:25 PM UTC #
Please, let's avoid a philosophical discussion. I used the SAP example because SAP is the most widely used corporate ERP. This is not fantasy land! It is not a pure SAAS and supports a large diversity of users. On the SAAS side no example is needed: any data access is performed through an API layer. This applies to any service provider from giants like Google and Salesforce - that are also not fantasy land - to small startup. Please, understand: I am not suggesting to follow the leaders, I am only evidencing that wrapping access to data works fine. By the way why should my developers use a proprietary database dialect and stored procedures to implement constraints that can be expressed in an high level language? Finally I am not taking our developers point of view (that humorously refer to DBAs as database janitors) but my entrepreneur perspective: I shall have the agility to deliver what my client ask in affordable time and with minimum cost.
Posted by Franco Graziosi on May 14, 2010 at 09:39 PM UTC #
Posted by rdbpro on May 15, 2010 at 01:13 PM UTC #
Posted by Tony Rogerson on May 15, 2010 at 01:37 PM UTC #
Seems that my post has caused some dislike, specially in dogmatic defenders of relational approach. Please understand that I am not criticizing the relational model or its applicability. What I suggest is considering if it is the best fit for each specific case. Relational technology es excellent for accounting applications but not everything is accounting.
I understand that relational technology has a success background, exactly like COBOL had 30 years ago but, depending on case it is not the best choice. Today's largest databases (Google, Yahoo …) are not relational and there are reasons for that.
As a manager (I am not a developer) I shall guarantee best business choice for shareholders, there is no space for passionate or religious decisions. If an user of a SAAS application asks for an extra field I shall provide it for one simple reason: he is the client, he pays the check and I shall make money! I can't either tell him that the schema does not provide that field or introduce a new schema useless to all other users.
Posted by Franco Graziosi on May 15, 2010 at 03:17 PM UTC #
To Managers and Investors :)
My belief is ...
that for *mature* vendors of RDBMS it is much more easy to add step by step OO features into their products. Although I believe also that Valentina DB, which was created from the start with ideas about OO features have a win gap here.For example, Valentina Data Model have Links between tables (and this is really other concept both on logical and on storage level), Properties of Schema Objects. We working on adding inheritance soon.
Posted by Ruslan Zasukhin on May 16, 2010 at 06:19 PM UTC #
Posted by Ruslan Zasukhin on May 16, 2010 at 07:42 PM UTC #
To Managers and Investors :)
Posted by Ruslan Zasukhin on May 16, 2010 at 07:43 PM UTC #
To Managers and Investors :)
Posted by Ruslan Zasukhin on May 16, 2010 at 07:44 PM UTC #
Posted by Ruslan Zasukhin on May 16, 2010 at 07:44 PM UTC #
My belief is ...
that for *mature* vendors of RDBMS it is much more easy to add step by step OO features into their products. Although I believe also that Valentina DB, which was created from the start with ideas about OO features have a win gap here.For example, Valentina Data Model have Links between tables (and this is really other concept both on logical and on storage level), Properties of Schema Objects. We working on adding inheritance soon.
Posted by Ruslan Zasukhin on May 16, 2010 at 07:45 PM UTC #
Posted by Ruslan Zasukhin on May 16, 2010 at 07:50 PM UTC #
Posted by Ruslan Zasukhin on May 16, 2010 at 07:55 PM UTC #
Rusian
I have approved all your comments, but, as a moderator, I would like to avoid merchandising on this blog.
Frankly I don't know Valentina, and I wish you success with it, but the fact is that OO dbs but we have already made a choice about OO DB based on our Java culture.
I hope I left clear when a schemaless language is useful, I do not pretend to further discuss subjective considerations.
Posted by 93.34.222.95 on May 17, 2010 at 12:22 AM UTC #
You say you’re not bashing RDMS or SQL but yet your piece reads very much like a typical anti-RDBMS programmer. Must programmers gripe about SQL & Relational Databases because they have no understanding of how to use them and make little effort to do so beyond reading 'SQL For Dummies" or "SQL in 24 hours" and yet if a DBA did no more then read "VB for Dummies" or ".Net in 24 hours" and then complained about OOP, developers would rake the DBA over the fire letting him/her know how foolish they are to expect to be a real programmer after simply reading one of those types of books.
There's a lot of hypocrisy from the programming/developer world when it comes to RDBMS/SQL in general. You tell yourselves that the flaws in RDBMS/SQL are due to their design and yet most of the time it’s the developers own lack of understanding how to use these that leads to their bad (very bad in some cases) attempts at accessing/using a relational DB. I can tell you that those of us in the DB/DBA world do not sit around gripping about how OO programming languages don’t work in a set based manner like SQL does. Do you know why? Because we realize that there is no one approach that is the end all/fix all to every problem.
I’ve been on both sides and I can tell you that when you let an object-oriented programmer do it their way when it comes to data storage you end up with something that is neither efficient nor extensible. I currently manage a DB that is almost 200GB in size which was designed from the very beginning and updated/revised over the last 20 years strictly by VB/VB.Net/ASP programmers who have some basic knowledge of SQL and the end result is a beast of a DB that is often difficult to manage. In fact the software vendor is surprised that we (meaning myself) are able to use their Product with a DB of this size and get acceptable performance with 200+ users working in it at the same time. We can do this because unlike the database’s architects, we have people who actually know and understand RDBMS/SQL managing the thing and so we are able to work around the short comings in the database’s design that are there thanks to fact that programmers have the been the ones making the design decisions on the database from day one. Had this vendor involved a certified/experienced DB Architect from the beginning and at each key change in the DB’s design along the way then they would not have near the problems they do now and they’d spend far less time and money on working around the problems they’ve inadvertently introduced into the system thru their lack of proper education and understanding in how to use a RDBMS & SQL.
Instead of whining and griping about how RDBMS & SQL aren’t like OOP languages perhaps you should take the time to learn how to use a RDBMS & the SQL language. If you pull a screw driver out of your toolbox and find you are unable to get it to properly drive a nail into a board do you gripe about the inefficiencies of the screw driver and how if it were more like your favorite tool you would have less trouble with the nail or do you realize that there is no one tool (or in this case approach) to solve all problems and that you should draw from your toolbox the tool that is best for the job at hand which in this case would be a hammer and not a screwdriver that can work like hammer?
Posted by BlueCollarCritic on May 20, 2010 at 03:41 PM UTC #