Recently, there has been a lot of buzz about “No SQL” databases. In fact there are at least two conferences on the topic in 2009, one on each coast. Seemingly this buzz comes from people who are proponents of:
• document-style stores, in which a database record consists of a collection of (key, value) pairs plus a payload. Examples of this class of system include CouchDB and MongoDB, and we call such systems document stores for simplicity
• key-value stores, whose records consist of (key, payload) pairs. Usually, these are implemented by distributed hash tables (DHTs), and we call these key-value stores for simplicity. Examples include Memcachedb and Dynamo. (A sketch of the two record shapes follows this list.)
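To make the distinction concrete, here is a rough sketch of the two record shapes; the field names, values, and JSON-style payload are purely hypothetical.

```python
# Hypothetical record shapes, for illustration only.

# Document-style store (e.g., CouchDB, MongoDB): a record is a collection
# of (key, value) pairs plus a payload, addressed by a document id.
document_record = {
    "_id": "order-1001",              # document key
    "customer": "C-42",               # (key, value) pairs describing the record
    "status": "shipped",
    "lines": [                        # nested payload
        {"sku": "A-7", "qty": 2},
        {"sku": "B-3", "qty": 1},
    ],
}

# Key-value store (e.g., Memcachedb, Dynamo): a record is a (key, payload)
# pair; the payload is opaque to the store and the key is typically hashed
# across a distributed hash table (DHT) to locate it.
key = "order-1001"
payload = b'{"customer": "C-42", "status": "shipped"}'
kv_record = (key, payload)
```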
In either case, one usually gets a low-level record-at-a-time DBMS interface, instead of SQL. Hence, this group identifies itself as advocating “No SQL.”
There are two possible reasons to move to either of these alternate DBMS technologies: performance and flexibility.
The performance argument goes something like the following. I started with MySQL for my data storage needs and over time found performance to be inadequate. My options were:
1. “Shard” my data to partition it across several sites, giving me a serious headache managing distributed data in my application
or
2. Abandon MySQL and pay big licensing fees for an enterprise SQL DBMS or move to something other than a SQL DBMS.
The flexibility argument goes something like the following. My data does not conform to a rigid relational schema. Hence, I can’t be bound by the structure of an RDBMS and need something more flexible.
This blog posting considers the performance argument; a subsequent posting will address the flexibility argument.
For simplicity, we will focus this discussion on the workloads for which NoSQL databases are most often considered: update- and lookup-intensive OLTP workloads, not query-intensive data warehousing workloads. We do not consider document repositories or other specialized workloads for which NoSQL systems may be well suited.
There are two ways to improve OLTP performance: provide automatic “sharding” over a shared-nothing processing environment, and improve per-server OLTP performance.
In the first case, one improves performance by providing scalability as nodes are added to a computing environment; in the second case, one improves the performance of individual nodes. Every serious SQL DBMS written in the last 10 years (e.g., Greenplum, Aster Data, Vertica, ParAccel) has provided shared-nothing scalability, and any new effort would be remiss if it did not do likewise. Hence, this component of performance should be “table stakes” for any DBMS. In my opinion, nobody should ever run a DBMS that does not provide automatic sharding over computing nodes.
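As a minimal sketch of what automatic sharding over a shared-nothing cluster amounts to, the routing step can be as simple as hashing a sharding key to pick a node; the node list and key format below are hypothetical.

```python
import hashlib

# Minimal sketch of hash-based sharding across a shared-nothing cluster.
# The node addresses and the choice of sharding key are hypothetical.
NODES = ["node0:5432", "node1:5432", "node2:5432", "node3:5432"]

def shard_for(key: str) -> str:
    """Route a record to a node by hashing its sharding key."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return NODES[int(digest, 16) % len(NODES)]

# All operations on customer C-42 land on the same node, so a transaction
# touching only that customer's data never crosses node boundaries.
print(shard_for("customer:C-42"))
```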
As a result, this posting continues with the other component, namely, single node OLTP performance. The overhead associated with OLTP databases in traditional SQL systems has little to do with SQL, which is why “NoSQL” is such a misnomer.
Instead, the major overhead in an OLTP SQL DBMS is communicating with the DBMS using ODBC or JDBC. Essentially all applications that are performance sensitive use a stored-procedure interface to run application logic inside the DBMS and avoid the crippling overhead of back-and-forth communication between the application and the DBMS. The other alternative is to run the DBMS in the same address space as the application, thereby giving up any pretense of access control or security. Such embeddable DBMSs are reasonable in some environments, but not for mainstream OLTP, where security is a big deal.
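To illustrate the difference, here is a hedged sketch using Python’s DB-API (with psycopg2 as the driver); the table, columns, and the place_order procedure are all hypothetical, and the point is only the number of client/server round trips.

```python
import psycopg2  # assumes a PostgreSQL-style server; any DB-API driver looks similar

conn = psycopg2.connect("dbname=shop")   # hypothetical database
cur = conn.cursor()

# Chatty ODBC/JDBC-style interaction: each execute() is a client/server
# round trip, so a transaction touching N rows pays N network hops.
for sku, qty in [("A-7", 2), ("B-3", 1)]:
    cur.execute(
        "UPDATE stock SET on_hand = on_hand - %s WHERE sku = %s",  # hypothetical table
        (qty, sku),
    )
conn.commit()

# Stored-procedure style: one round trip invokes logic that runs inside the
# DBMS.  'place_order' is a hypothetical server-side procedure created ahead of time.
cur.callproc("place_order", ("C-42", '[{"sku": "A-7", "qty": 2}, {"sku": "B-3", "qty": 1}]'))
conn.commit()
```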
Using either stored procedures or embedding, the useful work component is a very small percentage of total transaction cost for today’s OLTP databases, which usually fit in main memory. Instead, a recent paper [1] calculated that total OLTP time was divided almost equally among the following four overhead components:
Logging: Traditional databases write everything twice: once to the database and once to the log. Moreover, the log must be forced to disk to guarantee transaction durability. Logging is, therefore, an expensive operation (a toy sketch of this pattern appears after this list).
Locking: Before touching a record, a transaction must set a lock on it in the lock table. This is an overhead-intensive operation.
Latching: Updates to shared data structures (B-trees, the lock table, resource tables, etc.) must be done carefully in a multi-threaded environment. Typically, this is done with short-term duration latches, which are another considerable source of overhead.
Buffer Management: Data in traditional systems is stored on fixed-size disk pages. A buffer pool manages which set of disk pages is cached in memory at any given time. Moreover, records must be located on pages and the field boundaries identified. Again, these operations are overhead intensive.
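As a toy sketch of the logging overhead described above (file names and record format hypothetical), the “write twice, then force the log to disk” pattern looks roughly like this:

```python
import json
import os

# Toy write-ahead logging sketch (file names hypothetical).  A traditional
# engine writes every change twice -- once to the log and once to the data
# pages -- and must force (fsync) the log before the transaction can commit.
def commit(txn_id: int, changes: dict,
           log_path: str = "wal.log", data_path: str = "data.db") -> None:
    with open(log_path, "a") as log:
        log.write(json.dumps({"txn": txn_id, "changes": changes}) + "\n")
        log.flush()
        os.fsync(log.fileno())          # the forced disk write that makes the commit durable
    with open(data_path, "a") as data:  # the second write, to the database itself
        data.write(json.dumps(changes) + "\n")

commit(1, {"account:42": {"balance": 90}})
```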
If one eliminates any one of the above overhead components, one speeds up a DBMS by only about 25%. Eliminate three of the four and the speedup is still limited to a factor of two. You must get rid of all four to run a lot faster.
Although the No SQL systems have a variety of different features, there are some common themes. First, many manage data that is distributed across multiple sites, and provide the “table stakes” noted above. Obviously, a well-designed multi-site system, whether based on SQL or something else, is way more scalable than a single-site system.
Second, many No SQL systems are disk-based and retain a buffer pool as well as a multi-threaded architecture. This will leave intact two of the four sources of overhead above.
Concerning transactions, there is often support only for single-record transactions, together with an eventual-consistency replication scheme that assumes transactions are commutative. In effect, the “gold standard” of ACID transactions is sacrificed for performance.
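A toy illustration of why commutativity matters for eventual consistency: replicas that apply commutative updates (such as increments) in different orders still converge, whereas blind overwrites do not, and some conflict-resolution rule outside the ACID model is then needed. The values below are hypothetical.

```python
# Toy illustration: two replicas receive the same updates in different orders.

# Commutative updates (increments) converge regardless of delivery order.
replica_a = replica_b = 100
for delta in (+30, -10):
    replica_a += delta
for delta in (-10, +30):
    replica_b += delta
assert replica_a == replica_b == 120

# Non-commutative updates (blind overwrites) leave the replicas disagreeing,
# so a conflict-resolution rule outside the ACID model is required.
replica_a = replica_b = 100
for value in (130, 90):
    replica_a = value
for value in (90, 130):
    replica_b = value
assert replica_a != replica_b    # 90 vs. 130
```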
However, the net-net is that the single-node performance of a NoSQL, disk-based, non-ACID, multithreaded system is limited to a modest factor faster than a well-designed stored-procedure SQL OLTP engine. In essence, ACID transactions are jettisoned for a modest performance boost, and this performance boost has nothing to do with SQL.
However, it is possible to have one’s cake and eat it too. To go fast, one needs a stored-procedure interface to a run-time system that compiles a high-level language (for example, SQL) into low-level code. Moreover, one has to get rid of all four of the above sources of overhead.
A recent project [2] clearly indicated that this is doable and showed blazing performance on TPC-C. Watch for commercial versions of these and similar ideas with open-source packaging. Hence, I fully expect very high-speed, open-source SQL engines in the near future that provide automatic sharding. Moreover, they will continue to provide ACID transactions along with the increased programmer productivity, lower maintenance, and better data independence afforded by SQL. In short, high performance does not require jettisoning either SQL or ACID transactions.
In summary, blinding performance depends on removing overhead. Such overhead has nothing to do with SQL, but instead revolves around traditional implementations of ACID transactions, multi-threading, and disk management. To go wildly faster, one must remove all four sources of overhead, discussed above. This is possible in either a SQL context or some other context.
References
[1] S. Harizopoulos et al., “OLTP Through the Looking Glass, and What We Found There,” Proc. 2008 ACM SIGMOD Conference, Vancouver, B.C., June 2008.
[2] M. Stonebraker et al., “The End of an Architectural Era (It’s Time for a Complete Rewrite),” Proc. 2007 VLDB Conference, Vienna, Austria, Sept. 2007.
Disclosure: Michael Stonebraker is associated with four startups that are either producers or consumers of data base technology. Hence, his opinions should be considered in this light.
You seem to leave out several other sub-categories of the NoSQL movement in your discussion. For example: Google's BigTable (and clones) as well as graph databases. Considering those in addition, would that change your point of view?
Michael,
It seems to me the denormalization that occurs with typical usage of a document-oriented database improves performance: a complex document with some nested objects and/or arrays might have been 10 rows in an RDBMS instead of one contiguous piece of data. Do you agree?
Re Dwight Merriman's comment:
I was careful to point out in my blog that NoSQL databases might be advantageous for document repositories. There are multiple reasons for this possibility, including denormalization (your point), keyword retrieval, and semantic tagging.
--Michael Stonebraker
Re Johannes Ernst's comment:
I am a huge fan of "One size does not fit all." There are several implementations of SQL engines with very different performance characteristics along with a plethora of other engines. Along with the ones you mention, there are array stores such as Rasdaman and RDF stores such as Freebase. I applaud efforts to build DBMSs that are oriented toward particular market needs.
The purpose of the blog entry was to discuss the major actors in the NoSQL movement (as I see it) as they relate to bread-and-butter transaction processing (OLTP). My conclusion is that "NoSQL" really means "No disk" or "No ACID" or "No threading," i.e. speed in the OLTP market does not come from abandoning SQL. The efforts you describe as well as the ones in the above paragraphs are not focused on OLTP. My blog comments were restricted to OLTP, as I thought I made clear.
--Michael Stonebraker
You claim that scalable performance is available in "every serious SQL DBMS written in the last 10 years," suggesting that I can just add more servers to scale both the total dataset size and the rate of OLTP-style transactions (i.e., lots of little transactions, not a few increasingly large transactions).
However, I have not been able to find an example of a large-scale Web application that has been able to meet its needs with a single coherent RDBMS system. On the other hand, there are numerous examples of sites that claim to have run out of scalability and ended up partitioning their data across independent RDBMS instances or building a NoSQL system (Amazon, Facebook, Google, Yahoo, and eBay, just to name a few). Can you point us to a couple of success stories to counter the widespread belief that RDBMS systems cannot scale?
My comment referred specifically to the data warehouse marketplace, where Greenplum, Aster Data, ParAccel, and Vertica all scale linearly. Moreover, older systems including Teradata, Netezza, and DB2 also scale linearly. Microsoft will come out with its integration of DATAllegro with SQL Server this year; it is reputed to scale linearly. See the paper by Andy Pavlo et al. in SIGMOD 2009 for some example numbers on Vertica and an unnamed major RDBMS vendor.
However, you asked about the OLTP market, not the data warehouse market. Here, there are two considerations: shardability and collision frequency. An application is shardable if there is a way to partition your data so MOST transactions are local to a single node. Collision frequency refers to how often parallel transactions touch the same data item. The degenerate case is a one-record database, which will have a 100% collision frequency.
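To make "shardable" concrete, here is a rough sketch: a workload is a good candidate if, under some partitioning function, most transactions touch keys that map to a single node. The partitioning function and sample transactions below are hypothetical.

```python
# Rough sketch of the "shardable" test: a workload fits sharding if MOST
# transactions touch keys that map to a single node.  The partitioning
# function and the sample transactions are hypothetical.
NUM_NODES = 4

def node_of(customer_id: int) -> int:
    return customer_id % NUM_NODES            # partition by customer

def is_single_node(txn_customer_ids: list[int]) -> bool:
    return len({node_of(c) for c in txn_customer_ids}) == 1

transactions = [[42], [42], [7], [42, 7]]     # most touch only one customer
local = sum(is_single_node(t) for t in transactions)
print(f"{local}/{len(transactions)} transactions are single-node")
```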
If your data is shardable and there is a low collision frequency, then DB2 should scale linearly. Similarly, the myriad of layers on top of both MySQL and Postgres, for example, EnterpriseDB, should do likewise. If your data is shardable, regardless of collision frequency, VoltDB (now in beta; to go production next month) has been shown to scale linearly.
If your data is not shardable, then everybody will slow down dealing with distributed transactions. Some systems do not support distributed transactions at all, which just pushes the problem into application logic, where the performance hit will be taken. Most people I know deal with this issue by figuring out a way to rejigger their application to make it shardable. eBay (which you mentioned in your posting) is a good example of an enterprise that has taken this approach.
Michael Stonebraker, April 19, 2010
Thanks for the response; I have a couple of followup comments:
* I'm confused: you say that your comment referred to the data warehouse marketplace, but my understanding is that the entire blog posting was actually about OLTP. Quoting from the article: "we will focus this discussion on the workloads for which NoSQL databases are most often considered: update- and lookup-intensive online transaction processing workloads (OLTP), not query-intensive data-warehousing workloads." Are the blog references to Greenplum, Aster Data, etc. relevant in the OLTP context?
* Your response makes general arguments about why an RDBMS might scale for OLTP, and they seem plausible, but what I'd really like to see is a success story (not vendor claims, but someone who will stand up and describe how their OLTP application actually did achieve high scale using a single RDBMS system spanning many nodes). If RDBMSs can scale (under any plausible conditions), surely there would be an example we can all look at, and I'd guess that you are one of the people most likely to have heard about it. Is there anything you can point us to? In the absence of a compelling success story, many people are going to believe the naysayers who claim that RDBMSs can't scale for OLTP, particularly given the numerous examples of sites that couldn't scale their RDBMS systems and switched to NoSQL.
By the way, I don't think eBay qualifies as a success story under my definition: they partitioned their data to use multiple independent non-scaled RDBMSs, as opposed to a single scaled RDBMS, right?
The main problem I see with SQL is that it is usually a batch data-transfer system from server to client, requiring the locking of many records to ensure data integrity. This increases the likelihood of locking others out of records, records which may not even be being updated at that moment. I guess this would violate atomicity. To combat this, many SQL databases are programmed without update locks, which seems fine at the time because the developers think the likelihood of others trying to access the same information at the same time is low; in reality, however, there are many types of records which get updated repeatedly. This violates durability, because one person's updates are obliterated by another's, at least to the point of leaving childless parent records and parentless child records. Breaking data down into header and detail records further violates atomicity, because items like sales order records, which belong together as a unit, are divided into many parts, distributed over the disk drive, and have to be reassembled to recreate something as simple as a sales order. Associative arrays can hold a sales order in one record, albeit with relational links to customer information like customer name, address, and so on.
As complex as associative arrays can seem, multi-table joins can be even more complex, and their output even more difficult to interpret because the relationships in the data are not transparent. NoSQL relational databases can still provide relational links and a real database design without the need to always break the data out, or, indeed, to always keep the data in associative arrays. This gives freedom of design while providing a higher level of data integrity, due to the ability to lock only the individual records one needs to lock. It promotes using locking strategies in general because such strategies become practical.
Note that I am referring to the type of NoSQL that Pick is. Note also that most Pick systems today also have a SQL interface for querying; most people prefer the TCL interface, however, when given the choice. The other thing that some of these new NoSQL systems have is hashed tables. Pick/MultiValue has had these forever, and they are a vast improvement over indexing. Most Pick systems have indexing now, but it is not their primary access method and probably gets used on the order of 1/100th or 1/1000th as often as in a regular SQL system. Given the notorious fallibility of file indices, this is a great advantage. It also improves atomicity, because the indices do not have to be updated when records are, at least not as often as in a SQL system.
I don't see any major difference, off the top of my head, between SQL and NoSQL in terms of consistency and isolation; I'm not really sure what they mean in this context. In a relational NoSQL, which Pick is and not all are, data can be broken out to be shared and accessed in relation to different tables/files just like in SQL. The main difference is that SQL constrains you to do so, while Pick/MultiValue does not.
I find that associative arrays are a more real-world data model than header/detail arrangements of data. Multivalues are much more common than SQL users would have you believe and in some applications multivalues and associations are the norm, not the exception.
Performance is one thing NoSQL and Pick often claim, and we have found the claim to be true, but the differences aren't always that important. The biggest differences are improvements in data integrity with MultiValue/Pick/NoSQL databases over SQL, far fewer concurrency issues, easier querying, more understandable data results, a closer relationship between the data model and the real world, and the ability to have more flexible and imaginative database designs.
SQL has outstripped MultiValue in the market largely due to the user-interface advantages that SQL systems have usually possessed. This has led to greater investment, name recognition, and marketability for SQL. The UI gap is closing, but SQL enjoys such a lead, especially in name recognition, that MultiValue systems have faced an uphill battle. Even though there are many more applications written for MultiValue than for any other database in the world (some dating from as far back as the 1960s and still in use today), most have ancient UIs, and it is so prohibitively expensive to remake the UI that many are slowly being abandoned. Those of us able to provide UI solutions, such as AccuTerm and my BB4GL, enjoy attention every time someone sees our products. I just can't afford to market my products the way I would like. The couple of times I've put out press releases I've gotten immediate enquiries, and, were the current market able to sustain new ventures, I would have customers; but they are so busy just trying to keep their heads above the waves of the current economic hurricane that they cannot entertain new ventures. I frequently get enquiries from around the world from people who just happen upon my rarely visited website, which I cannot afford to promote on web directories.
I'll read the Stonebraker article later this evening. I just wanted to give you the perspective of someone (me) who has worked with both SQL and NoSQL systems over many, many years. To me, NoSQL means "Not Only SQL" more than no SQL at all. In Pick, we have SQL phrases that allow access to and handling of associative-array data, so it isn't necessarily a one-or-the-other situation.
Logging: Since header and detail records are broken up, relative to combined records holding associative arrays, more records and more data pages must be logged, so more logging activity must occur in SQL.
Locking: As I already mentioned, more records must be locked when there are separate detail and header records, especially when detail records are distributed over multiple disk pages.
Latching: Again, since more individual records are involved in SQL, more latching must occur to manage these records.
Buffer Management: More data must be read from the disk drive to extract the same amount of data for the same net record set. (Take my example of a simple sales order: a sales order having 20 detail lines in an associative array requires one or two disk accesses, while it takes at least 21 accesses to retrieve the same order in a SQL environment; see the sketch below.)
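To make the layout difference concrete, here is a rough sketch (field names and values hypothetical, and only 3 of the 20 detail lines shown) of the same sales order stored as one associative-array-style record versus a normalized header row plus detail rows:

```python
# Hypothetical sales order; imagine 20 detail lines instead of the 3 shown.

# Associative-array / MultiValue style: the whole order is one record,
# so it can typically be fetched with one or two disk accesses.
order_record = {
    "order_id": "SO-1001",
    "customer_id": "C-42",
    "lines": [
        {"line": 1, "sku": "A-7", "qty": 2},
        {"line": 2, "sku": "B-3", "qty": 1},
        {"line": 3, "sku": "C-9", "qty": 5},
    ],
}

# Normalized SQL style: one header row plus one row per detail line, each of
# which may sit on a different disk page and must be reassembled by a join.
order_header = ("SO-1001", "C-42")
order_details = [
    ("SO-1001", 1, "A-7", 2),
    ("SO-1001", 2, "B-3", 1),
    ("SO-1001", 3, "C-9", 5),
]
```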
Finally, performance is not the chief advantage of a MultiValue system: data integrity, better concurrency, more easily understood querying, fewer complicated table joins, and the ability to create more flexible and imaginative database designs are much more important, and they are much better in a MultiValue system. However, the MultiValue performance advantage exists and doesn't suck, if you'll excuse my French!
Also, the SQL view of atomicity is one of breaking the atom into electrons, neutrons, and protons, which destroys the identity of the atom. A MultiValue associative array of, to use my previous example, a sales order keeps the atom of the sales order together. SQL would break this not into atoms but into electrons and the like, which lose their identity. Atoms have characteristics according to their make-up; electrons do not. A sales order means a lot when held together, but its detail lines mean little when disassociated from the header and the other detail lines (and their summation). Those detail lines rarely mean anything to any other piece of data on the system except when associated with that header, and MultiValue provides ways of extracting that detail data very easily and usefully when and if such a rare occurrence arises.
P.S. - If you are ignorant of Pick/MultiValue, one of the most widely used databases in the world today, it trades under many different names, including Sequoia (pioneer of fault-tolerant systems), D3, mvBase, mvEnterprise, AP, General Automation, Mentor, Ultimate, Revelation, OpenInsight, Reality, Microdata (McDonnell Douglas), Applied Digital Data Systems (also a popular terminal manufacturer), NCR, IBM U2 (UniVerse & UniData), Rocket Software, Intersystems Cache, NorthGate, jBase, OpenQM, Maverick, ONWare, Prime Information Systems, and UniVision. Half the universities and colleges in the US use the Datatel application, which is a MultiValue system, to run their administration, academics, scheduling, billing, etc. The major banks of the UK use Pick as their main systems. Hughes Supply, the largest industrial supply house in the US, uses Pick to run its operations.