Apache Spark MapR Connector Provides JSON Support
Written by Kay Ewbank   
Monday, 05 June 2017

There's a new Native Spark Connector for MapR-DB JSON that gives developers APIs to access MapR-DB JSON documents from Apache Spark, using the Open JSON Application Interface (OJAI) API.

Apache Spark is an open source big data processing framework used for analytics on streaming and batch workloads. MapR-DB is a high-performance NoSQL database that supports two primary data models: JSON documents and wide column tables. A Spark connector is available for each data model. With the Spark/MapR-DB connectors, you can use MapR-DB as both a data source and a data destination for Spark jobs.

The Native Spark Connector for MapR-DB JSON supports loading data from a MapR-DB table as a Spark Resilient Distributed Dataset (RDD) of OJAI documents and saving a Spark RDD into a MapR-DB JSON table. (An RDD is Spark's basic data abstraction: an immutable collection of records partitioned across the cluster.)
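As a rough mental model of that load, transform and save cycle, here is a minimal sketch in plain Python. It uses neither Spark nor the connector's API: every function name is hypothetical, the "table" is just a list of JSON strings, and the sample documents are invented for illustration.

```python
import json

def load_documents(json_lines):
    """Parse one JSON document per line (a stand-in for loading a table)."""
    return [json.loads(line) for line in json_lines if line.strip()]

def split_into_partitions(docs, num_partitions):
    """Deal the documents round-robin into chunks, loosely like RDD partitions."""
    return [docs[i::num_partitions] for i in range(num_partitions)]

def map_partitions(partitions, fn):
    """Apply fn to every document, partition by partition, building new partitions."""
    return [[fn(doc) for doc in part] for part in partitions]

def save_documents(partitions):
    """Serialize back to JSON lines ordered by _id (a stand-in for saving a table)."""
    docs = sorted((d for part in partitions for d in part), key=lambda d: d["_id"])
    return [json.dumps(d, sort_keys=True) for d in docs]

def add_frequent_flag(doc):
    # Return a new dict instead of mutating the input, mirroring RDD immutability.
    return {**doc, "frequent": doc["visits"] >= 3}

# Invented sample data: each document carries an _id field as its key.
source = [
    '{"_id": "u1", "name": "Ann", "visits": 3}',
    '{"_id": "u2", "name": "Bob", "visits": 7}',
    '{"_id": "u3", "name": "Cy", "visits": 1}',
]

partitions = split_into_partitions(load_documents(source), 2)
result = save_documents(map_partitions(partitions, add_frequent_flag))

for line in result:
    print(line)
```

In a real Spark job the partitions would live on different executors and the transformation would run in parallel. The sketch only shows the shape of the data flow.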


The connector includes a set of APIs that enable MapR users to write applications that consume MapR-DB JSON tables and use them in Spark. It is a companion to the MapR-DB Binary Connector for Apache Spark, which can be used to write applications that consume HBase binary tables and use them in Spark.

The connector has two APIs that let you load data from a MapR-DB JSON table to a Spark RDD or save a Spark RDD to a MapR-DB JSON table. It also provides support for Scala bean classes, has a custom partitioner that allows you to partition data for better performance, and supports data locality. When the connector reads data from MapR-DB, it uses the data locality feature of MapR-DB to spawn the Spark executors.
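To illustrate what a key-range partitioner does in general, here is a minimal sketch in plain Python. It is not the connector's partitioner or its API: the split points, the function names and the sample documents are all assumptions made for the example. The idea is that documents whose keys fall in the same range land in the same partition, so related data can be processed together.

```python
from bisect import bisect_right
from collections import defaultdict

def partition_for(key, split_points):
    """Return the index of the key range containing `key`.

    `split_points` must be sorted. With n split points there are n + 1
    ranges, and a key equal to a split point goes to the range on its right.
    """
    return bisect_right(split_points, key)

def partition_documents(docs, split_points, key_field="_id"):
    """Group documents into partitions according to their key range."""
    partitions = defaultdict(list)
    for doc in docs:
        partitions[partition_for(doc[key_field], split_points)].append(doc)
    return dict(partitions)

# Hypothetical split points giving four ranges: < "g", "g".."n", "n".."t", >= "t".
split_points = ["g", "n", "t"]

# Invented sample documents keyed by _id.
docs = [
    {"_id": "alice", "city": "Leeds"},
    {"_id": "greg", "city": "Oslo"},
    {"_id": "nina", "city": "Turin"},
    {"_id": "zed", "city": "Quito"},
    {"_id": "bob", "city": "Perth"},
]

partitioned = partition_documents(docs, split_points)
for index in sorted(partitioned):
    print(index, [d["_id"] for d in partitioned[index]])
```

Binary search keeps the lookup cheap even with many split points, which matters when every record in a large dataset has to be routed.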

The Native Spark Connector includes support for the DataFrame and Dataset APIs, so HBase and MapR-DB binary tables can be queried directly with Spark. This removes any intermediary layers, making it easier to construct faster data pipelines and reduce the latency associated with data movement.


More Information

MapR-DB OJAI Documentation


 


Last Updated ( Monday, 05 June 2017 )