Code Examples

MongoDB

Read from database

import com.mongodb.spark._
import com.mongodb.spark.sql._
import com.mongodb.spark.config._

val dfFA = spark.read.option("database", "foxconn-analytics")
                     .option("collection", "collect_pageview")
                     // the uri option can be omitted if it is already defined in the Spark interpreter
                     .option("uri", "mongodb://mongo201,mongo202,mongo203/?replicaSet=bigdata&readPreference=secondaryPreferred")
                     .mongo()

Read from database with the schema specified by a case class

// field names must match the document keys in the collection
case class EHAVE_TBL(
    CNAME: String, CCOVEY: String, ISCORE: String,
    IFPASS: String, FACTORY: String, ISEX: String
)

val dsEHAVE = spark.read
                   .option("database", "IEDB")
                   .option("collection", "EHAVE_TBL_2016")
                   .mongo[EHAVE_TBL]()

Write to database
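
A minimal write-side sketch, assuming the connector implicits also expose a mongo() helper on DataFrameWriter; the target collection and save mode are placeholder choices.

import com.mongodb.spark.sql._

dfFA.write.option("database", "foxconn-analytics")
          .option("collection", "collect_pageview_copy")  // hypothetical target collection
          .mode("overwrite")
          .mongo()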

MySQL/MariaDB

Read from database
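
A minimal sketch using Spark's built-in JDBC data source; the host, schema, table, and credentials are placeholders (for MariaDB, swap the driver class for org.mariadb.jdbc.Driver).

val mysqlUrl = "jdbc:mysql://mysql201:3306/testdb"  // hypothetical host and schema

val dfMySQL = spark.read.format("jdbc")
                        .option("url", mysqlUrl)
                        .option("driver", "com.mysql.jdbc.Driver")
                        .option("dbtable", "employee")
                        .option("user", "spark")
                        .option("password", "secret")
                        .load()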

Write to database
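
The write side mirrors the read options; mode("append") adds rows to the target table, while "overwrite" replaces it. The target table name is hypothetical.

dfMySQL.write.format("jdbc")
             .option("url", mysqlUrl)
             .option("driver", "com.mysql.jdbc.Driver")
             .option("dbtable", "employee_copy")  // hypothetical target table
             .option("user", "spark")
             .option("password", "secret")
             .mode("append")
             .save()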

ORC File

Read from file
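
A minimal sketch; the path is a placeholder for a directory on HDFS or a local filesystem.

// reads an ORC file (or a directory of ORC part files) into a DataFrame
val dfOrc = spark.read.orc("/data/events_orc")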

Write to file
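
Writing is symmetric; the output path is again a placeholder.

// writes the DataFrame back out as ORC, replacing any existing output
dfOrc.write.mode("overwrite").orc("/data/events_orc_out")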

Parquet

Read from file
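
Parquet follows the same pattern as ORC; the path is a placeholder.

val dfParquet = spark.read.parquet("/data/events_parquet")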

Write to file
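
As with ORC, the output path is a placeholder.

dfParquet.write.mode("overwrite").parquet("/data/events_parquet_out")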

CSV

Read from CSV
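
A minimal sketch; header and inferSchema are the two options most commonly needed, and the path is a placeholder.

val dfCsv = spark.read.option("header", "true")       // first line holds column names
                      .option("inferSchema", "true")  // let Spark guess column types
                      .csv("/data/events.csv")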

Write to CSV
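
Note that Spark writes a directory of part files, not a single CSV file; the output path is a placeholder.

dfCsv.write.option("header", "true")
           .mode("overwrite")
           .csv("/data/events_csv_out")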

Oracle

Read from database
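
Oracle also goes through the JDBC data source; the thin-driver URL, service name, table, and credentials are placeholders.

val oracleUrl = "jdbc:oracle:thin:@//ora201:1521/ORCL"  // hypothetical host and service

val dfOracle = spark.read.format("jdbc")
                         .option("url", oracleUrl)
                         .option("driver", "oracle.jdbc.OracleDriver")
                         .option("dbtable", "EMPLOYEE")
                         .option("user", "spark")
                         .option("password", "secret")
                         .load()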

Write to database
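
The write side is the same shape; the target table is hypothetical.

dfOracle.write.format("jdbc")
              .option("url", oracleUrl)
              .option("driver", "oracle.jdbc.OracleDriver")
              .option("dbtable", "EMPLOYEE_COPY")  // hypothetical target table
              .option("user", "spark")
              .option("password", "secret")
              .mode("append")
              .save()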

MS SQL Server

Read from database
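
Same JDBC pattern again; the URL, database, table, and credentials are placeholders.

val dfMssql = spark.read.format("jdbc")
                        .option("url", "jdbc:sqlserver://mssql201:1433;databaseName=testdb")
                        .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
                        .option("dbtable", "dbo.employee")
                        .option("user", "spark")
                        .option("password", "secret")
                        .load()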

Use isin
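
A small sketch of Column.isin for membership filtering, reusing the dsEHAVE Dataset from the MongoDB example; the factory codes are made-up values.

import org.apache.spark.sql.functions.col

// keep only rows whose FACTORY is one of the listed codes
val dsFiltered = dsEHAVE.filter(col("FACTORY").isin("F1", "F2", "F3"))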

Concatenate all columns
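
One way to concatenate every column into a single string column is concat_ws over the full column list; the separator and new column name are arbitrary choices.

import org.apache.spark.sql.functions.{col, concat_ws}

// join all columns with "|" into one string column
val dfConcat = dsEHAVE.withColumn("all_cols",
  concat_ws("|", dsEHAVE.columns.map(col): _*))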

Hash an array of columns
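
Building on the same idea, a row fingerprint can be derived by hashing the concatenation of a chosen set of columns; the key columns here are illustrative.

import org.apache.spark.sql.functions.{col, concat_ws, sha2}

val keyCols = Array("CNAME", "CCOVEY", "FACTORY")  // hypothetical key columns

// SHA-256 over the pipe-joined key columns gives a stable row hash
val dfHashed = dsEHAVE.withColumn("row_hash",
  sha2(concat_ws("|", keyCols.map(col): _*), 256))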

UDF

User-Defined Functions

A UDF is a feature of Spark SQL for defining new Column-based functions that extend the vocabulary of Spark SQL's DSL for transforming Datasets.
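
A minimal sketch: wrap a plain Scala function with udf and apply it like any built-in Column function. The 60-point pass threshold is an example value, and ISCORE is assumed to hold numeric text.

import org.apache.spark.sql.functions.{col, udf}

// flag whether a string score meets the (example) passing threshold
val isPass = udf((score: String) => score != null && score.toInt >= 60)

val dfWithFlag = dsEHAVE.withColumn("PASS_FLAG", isPass(col("ISCORE")))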

Convert a Scala list to its Java equivalent
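
This comes up when handing values to Java APIs from driver code or a UDF; JavaConverters supplies the asJava decorator.

import scala.collection.JavaConverters._

val scalaList = List("F1", "F2", "F3")
val javaList: java.util.List[String] = scalaList.asJava  // wraps the list, no copy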

Window Function
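
A minimal sketch ranking rows within each factory; because ISCORE is stored as a string in EHAVE_TBL, it is cast to int before ordering.

import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}

// rank rows per FACTORY by numeric score, highest first
val byFactory = Window.partitionBy("FACTORY")
                      .orderBy(col("ISCORE").cast("int").desc)

val dfRanked = dsEHAVE.withColumn("rank", row_number().over(byFactory))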

Rename Dataset Column Name
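
withColumnRenamed returns a new DataFrame with the column renamed; the new name is an arbitrary example.

val dfRenamed = dsEHAVE.withColumnRenamed("CNAME", "COURSE_NAME")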

Regular Expression to extract values
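
A minimal sketch with regexp_extract; the pattern pulls the leading digits out of ISCORE and is only an example.

import org.apache.spark.sql.functions.{col, regexp_extract}

// capture group 1 = the leading run of digits; yields "" when there is no match
val dfDigits = dsEHAVE.withColumn("score_digits",
  regexp_extract(col("ISCORE"), "^(\\d+)", 1))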

Transpose rows to columns
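
A common way to turn row values into columns is groupBy plus pivot; here each distinct IFPASS value becomes a column holding per-factory row counts.

// one row per FACTORY, one column per distinct IFPASS value
val dfPivoted = dsEHAVE.groupBy("FACTORY")
                       .pivot("IFPASS")
                       .count()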
