Monday, 27 May, 2019 UTC


Summary

The business applications have requirements: take customer orders, deliver customer orders, track shipping, generate inventory report, end of the day/month/quarter business report, generate business dashboards and more.  These requirements evolve slowly. They remain even when you choose a NoSQL database.
On NoSQL databases, challenges are addressed by a multitude of technologies and workarounds. Here are some of them:
  1. Copying the data with a different key to make the scans easier.
  2. Fetch all of the data needed to the application and then generate reports
  3. Load the data into relational databases to generate the report.
  4. Products provide workarounds with map-reduce views, APIs, etc.
  5. Finally, SQL itself has been unreasonably effective for both structured and semi-structured data. NoSQL databases like Couchbase, Cassandra, CosmosDB have extended SQL for JSON and wide-column data model.
NoSQL has evolved from “NO SQL” to Not Only SQL.  If you’re interested in some of the evolutionary background combined with a history of SQL evolution to support semi-structured data, I suggest the following interviews.
  1. Ravi Mayuram’s interview with Don Chamberlin[2017]:  https://youtu.be/-U_UjqnhMBI?t=3492
  2. A panel discussion between Ravi Mayuram, Don Chamberlin and Prof. Mike Carey [2018]:  https://www.youtube.com/watch?v=LAlDe1w7wxc
A lot of NoSQL databases claim “SQL Support”. The SQL standard is wide and deep, covered in nine voluminous books. No one, neither Oracle nor SQL Server, supports everything in the standard despite decades of work. So, the NoSQL databases have a long way to go in catching up. So, a detailed evaluation of the SQL support is worth your while. 
Here are the criteria you should use to evaluate the SQL support in NoSQL databases.
  1. Language support: What statements, data types, operations (joins, grouping, aggregation, windowing, pagination, etc)
  2. Indexing support:  Indexes are key to performance, especially in the workloads for interactive applications.
  3. Optimizer: Query rewrite, choosing the right access path, creating the optimal query execution path is what makes SQL a successful 4GL.  Some have a rule-based optimizer, some have a cost-based optimizer, some others have both.  Evaluating the quality of the optimizer is critical.  Typical benchmarks (TPC-C, TPC-DS, YCSB, YCSB-JSON) won’t help you here.
  4. As the saying goes: ” There are three things important in databases: performance, performance, and performance”. It’s important to measure the performance of your workload.  YCSB and the extended YCSB-JSON will make this evaluation easier.
  5. SDKs: Rich SDKs and language support speed up your development.
  6.  BI tool support: For large data analysis, support from BI tools usually via standard database connectivity drivers is important.
In this article, I’ll compare and contrast the SQL language support in Cassandra, CosmosDB, Couchbase, and MongoDB via their respective implementations.  To be sure, MongoDB doesn’t support SQL but have some comparative commands.
I’ve divided the analysis into multiple sections.  WordPress formatting makes these tables too big.  Here is a PDF version that’s compact and easy to read. [Click on the image to view the PDF]
Summary of the SQL Support in Cassandra, CosmosDB, Couchbase, and MongoDB.
 
 
 
 
 
SQL Support approach:  
SQL SQL is a declarative language, select-join-project operations serving as the foundation.
CassandraCQL: SQL inspired language for Cassandra.
CosmosDBSupports SQL along with MongoDB API, Gremlin (for the graph), etc. Simple select-from-where-order-by support.  
CouchbaseN1QL: SQL for JSON. 
Couchbase has two implementations of N1QL: query service and analytics service.
MongoDBJavascript based, simplified SQL command based query.
INPUT and OUTPUT
SQLINPUT: Set of rows (tuples)
OUTPUT: A set of rows (tuples)
CassandraINPUT: Sets of rows
OUTPUT: Set of rows
CosmosDBINPUT: Sets of JSON
OUTPUT: Set of JSON
CouchbaseINPUT: Sets of JSON
OUTPUT: Set of JSON
MongoDBINPUT: Sets of JSON
OUTPUT: Set of JSON
SELECT: FROM Clause
SQLSpecifies the datasource tables (relations)
CassandraFROM clause with only one table allowed.  Now joins, subqueries or expressions are allowed.  From clause interpretation is same as SQL.
CosmosDBFROM clause supports a single collection and self joins (same as UNNEST in Couchbase).   
CouchbaseFROM clause with multiple keyspaces (subset of bucket), subqueries, expressions.  Same as SQL
MongoDBdb.t1.find() for single table SELECT.
db.t1.aggregate() is used as the generalized query framework. aggregate() can join with additional collections using $lookup operator.  There can be multiple $lookup operators in the aggregation pipeline framework making it the closest cousin of the SQL FROM clause.
SELECT: WHERE Clause
SQLCriteria for selecting a row
CassandraStandard boolean expressions. No subqueries.
CosmosDBSame as SQL
CouchbaseStandard boolean expressions and subqueries.
MongoDBdb.t1.find({x:10});
aggregate() has the $match clause.
SELECT: SELECT Clause
SQLProjection clause
CassandraSELECT clause is same as SQL.
CosmosDBSELECT clause is same as SQL.
CouchbaseSELECT clause is same as SQL.
MongoDBdb.t1.find({x:10}, {a:1, b:1})
$project operator in the aggregation pipeline
SELECT: CTE – Common Table Expression
SQLDynamically defined data source (table, resultset)
CassandraUnsupported
CosmosDBUnsupported
CouchbaseWITH clause; same as SQL (in v6.5). Recursive CTE is unsupported
MongoDBUnsupported
SELECT: Subquery
SQLSubquery: Subqueries in the FROM clause, WHERE clause, anywhere an expression is allowed.
CassandraUnsupported
CosmosdbUnsupported
CouchbaseSupports both correlated and non-correlated subqueries.
MongoDBUnsupported in find(). Can add $match in the pipeline, but not exactly an equivalent of a subquery.
SELECT: GROUP BY
SQLGroup the rows based on one or more expressions. Quite useful in reporting and aggregation for groups.
CassandraSupported; Same as SQL.
CosmosdbUnsupported. Can only do the aggregation on the whole resultset.
CouchbaseSupported; Similar as SQL.
MongoDB$group operator in aggregate() pipeline
SELECT: HAVING clause
SQLFiltering after the aggregation.
CassandraUnsupported
CosmosDBUnsupported
CouchbaseHAVING clause; Same as SQL
MongoDB$match after the grouping and aggregation.
SELECT: ORDER BY Clause
SQLThe final order of the results produced by the query block
CassandraORDER BY clause; Same as SQL.
CosmosdbORDER BY clause; Same as SQL.
CouchbaseORDER BY clause; Same as SQL.
MongoDBdb.t1.find().sort({a:1, b:-1});
aggregate() has $sort to specify the result order.
SELECT: LIMIT, OFFSET Clause
SQLUsed for pagination of the resultset
Cassandra“LIMIT is supported.
OFFSET is unsupported.”
CosmosDBLIMIT (TOP) and OFFSET clause; Similar to SQL
CouchbaseLIMIT and OFFSET clause; Same as SQL
MongoDBskip(), limit() methods with find(). $offset, $limit with aggregate().
SELECT: JOIN clause
SQLINNER JOIN, LEFT/RIGHT/FULL outer joins.
CassandraJoins are unsupported.  Applications will have to model the data to avoid joins or do the joins in the application layer.
CosmosdbOnly self JOINs.  No INNER/LEFT/RIGHT/etc joins.
CouchbaseSupports INNER, LEFT OUTER, NEST, UNNEST and limited RIGHT outer.  Same syntax as SQL. FULL OUTER join is unsupported.
MongoDBLimited LEFT OUTER JOIN only via $lookup operator.  No join on array elements or expressions.
SELECT: Aggregation
SQLAggregation
CassandraSimple aggregation on the whole result is supported.  Aggregation with GROUP BY is unsupported.
CosmosDBSimple aggregation on the whole result is supported.  Aggregation with GROUP BY is unsupported.
CouchbaseSUM, AVG, COUNT, MAX, MIN, VARIANCE: same as SQL
MongoDB$sum, $count, $avg with grouping support
SELECT: aggregate functions
SQL Simple aggregation on the whole result is supported.  Aggregation with GROUP BY is unsupported.
CassandraUnsupported
CosmosdbUnsupported
CouchbaseSupports SQL Standard window analytical functions in 6.5.
MongoDBUnsupported
SELECT : Window (analytics/aggregation) functions
SQLWindow functions for running totals using the OVER() clause
CassandraUnsupported
CosmosDBUnsupported
CouchbaseSupports SQL Standard window analytical functions in 6.5.
See details: https://blog.couchbase.com/json-to-insights-fast-and-easy/
https://blog.couchbase.com/get-a-bigger-picture-with-n1ql-window-functions-and-cte/
MongoDBUnsupported
INSERT:  Single row/document insert.
SQLInsert a single row
CassandraINSERT statement
CosmosDBAPI Insert
CouchbaseINSERT statement
MongoDBdb.t1.save()
INSERT:  Multiple row/document insert.
SQLINSERT statement
CassandraUnsupported
CosmosdbUnsupported
CouchbaseINSERT with Multiple documents
MongoDBdb.t1.insert()
DELETE Statement
SQLDelete one or more documents
CassandraDELETE statement; Same as SQL
CosmosdbAPI delete
CouchbaseDELETE statement; Same as SQL
MongoDBdb.t1.delete()
UPSERT statement
SQLINSERT.  UPDATE if exists.
CassandraUnsupported
CosmosdbUnsupported
CouchbaseUPSERT statement.
MongoDBUnsupported
UPDATE Statement
SQL
CassandraUPDATE; Same as SQL
CosmosDBAPI update
CouchbaseUPDATE; Same as SQL
MongoDBdb.t1.update()
MERGE: Merge one relation (set of rows) to another.
SQLMerge a set of rows (documents) into another.
CassandraUnsupported
CosmosDBUnsupported
CouchbaseMERGE statement, same as SQL.
MongoDBUnsupported
PREPARE statement
SQLParse, analyze and create an execution plan.
CassandraSupported.   I see excamples of preparedStatement() in Java SDK.dd
CosmosDBUnsupported
CouchbaseSupported; PREPARE
MongoDBUnsupported
EXECUTE
SQLExecute an ad-hoc or prepared statement.
CassandraSupported in Java.
CosmosdbUnsupported
CouchbaseSupported, similar to SQL.
MongoDBUnsupported
GRANT/REVOKE
SQLGrant/REVOKE permissions for specific operation on the data set
CassandraGRANT, REVOKE
CosmosDBAPI support
CouchbaseGRANT ROLE, REVOKE ROLE
MongoDBTBD??
DESCRIBE statement
SQLDescribes the schema of a table
CassandraDESCRIBE
CosmosdbUnsupported
CouchbaseINFER describes the schema of the documents
MongoDBCompass tool — graphical only.
TRUNCATE statement
SQLTruncates the data in the table without altering security or physical schema.
CassandraTRUNCATE
CosmosDBUnsupported
CouchbaseFLUSH operation
MongoDBUnsupported.  Workaround via remove collection, recreate with the same security settings.
Value Logic (boolean values)
SQLTrue, False, NULL (Unknown)
CassandraTrue, False, NULL (Unknown)
CosmosdbTrue, False, NULL (Unknown)
CouchbaseTrue, False, NULL (Unknown), Missing
https://docs.couchbase.com/server/4.0/n1ql/n1ql-language-reference/booleanlogic.html
MongoDBTrue, False, NULL (Unknown)
Query Optimizer: Type of Optimizer
SQLRule-based and cost-based optimizer.  Does query rewrites, index selection, join ordering, join type selection and position of the tablers (inner/outer, hash-table build/probe)
CassandraRule-based optimizer.  Index selection for the single table is done since there are no joins.
CosmosDBRule-based optimizer mainly does index selection.
CouchbaseRule-based optimizer, index(es) selection.  Block Nested Loop join by default but supports hash join via user hint in the query.
MongoDB“Shape-based” optimizer according to the docs.  Each new query is matched with a query based on the “shape”.  The first time a query is run, the optimizer does index selection, but when there are multiple candidates, it’ll run multiple queries concurrently to see who returns the results first.
Query Optimizer: Index Selection
SQLYes
CassandraYes
CosmosDBYes
CouchbaseYes
MongoDBYes
Query Optimizer: Query Rewrite
SQLRewrite parts of the query to logical equivalent to better performance. E.g. Subquery rewrite, view folding, join type conversion, constant expression evaluation, etc.
CassandraNone
CosmosdbNo known rewrites
CouchbaseBasic query rewrite.  LEFT OUTER to INNER when applicable, constant expression evaluation.
MongoDBNone
Query Optimization: JOIN Type
SQLChoose the most efficient index, from the available join types.
CassandraNot applicable since joins are unsupported.
CosmosDBNot clearly documented.
CouchbaseNested loop by default. Hash join by user hint.
MongoDBJust the nested loop is supported.
TRANSACTION support.  
SQLACID support with multi-row and multi-statement support.
CassandraNo
CosmosdbYes
CouchbaseNo
MongoDBYes, in 4.0
Indexes
SQLData structures maintained to speed up the query performance.  Indexes are
CassandraSupports primary, secondary, array-indexes.  Need to install and index the data on SOLR for a search index.
CosmosDBIndexes everything by default: scalars, arrays.  No support for the search index.
CouchbaseSupports primary, secondary, composite, functional, adaptive, search, spatial, partitioned and replica index.  Indexes are eventually consistent.
MongoDBSupports primary, secondary, composite, search, spatial, partitioned and replica index.  The search index is simplistically created on a B-Tree.
SQL: Datatype support.
SQLExtensive numerical, character, date-time data type support.
CassandraNumeric, decimal, double. Int, float, varint, Timestamp, collection (set, list)
CosmosDBJSON data types: numeric, string, boolean, object, arrays
CouchbaseJSON data types: numeric, string, boolean, object, arrays
MongoDBJSON data types: numeric, string, boolean, object, arrays and custom extensions for the timestamp datatype.
Conclusion:
The popular NoSQL databases have tried to extend and support the declarative SQL for the respective data model and architectures. So, it’s important to understand the capabilities, limitations of the features and architecture during the evaluation.
References:
  1. The Unreasonable Effectiveness of SQL
The post The Unreasonable Effectiveness of SQL in NoSQL Databases: A Comparative Study. appeared first on The Couchbase Blog.