Code Examples
MongoDB
Read from database
import com.mongodb.spark._
import com.mongodb.spark.sql._
import com.mongodb.spark.config._
val dfFA = spark.read.option("database", "foxconn-analytics")
.option("collection", "collect_pageview")
// the uri option can be omitted if it has already been defined in the Spark interpreter
.option("uri", "mongodb://mongo201,mongo202,mongo203/?replicaSet=bigdata&readPreference=secondaryPreferred")
.mongo()
Read from database with schema specified by a case class
case class EHAVE_TBL(
CNAME: String, CCOVEY: String, ISCORE: String,
IFPASS: String, FACTORY: String, ISEX: String
)
val dsEHAVE = spark.read
.option("database", "IEDB")
.option("collection", "EHAVE_TBL_2016")
.mongo[EHAVE_TBL]()
Write to database
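A minimal write sketch, mirroring the read examples above. The target collection name and save mode are placeholders; the implicit .mongo() on DataFrameWriter comes from com.mongodb.spark.sql._ (MongoSpark.save is an alternative if that implicit is not available in your connector version).
import com.mongodb.spark._
import com.mongodb.spark.sql._
// write the Dataset read above back to MongoDB
dsEHAVE.write
  .option("database", "IEDB")
  .option("collection", "EHAVE_TBL_2016_COPY")
  .mode("append")
  .mongo()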
MySQL/MariaDB
Read from database
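A minimal sketch using Spark's built-in JDBC data source; the host, port, database, table, and credentials are placeholders, and the MySQL (or MariaDB) JDBC driver jar must be on the classpath.
// read a table into a DataFrame over JDBC
val dfOrders = spark.read
  .format("jdbc")
  .option("url", "jdbc:mysql://mysql201:3306/sales")
  .option("driver", "com.mysql.jdbc.Driver") // org.mariadb.jdbc.Driver for MariaDB
  .option("dbtable", "orders")
  .option("user", "spark")
  .option("password", "secret")
  .load()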
Write to database
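A matching write sketch with the same placeholder connection details; the save mode controls whether rows are appended to or overwrite the target table.
// append the DataFrame to a table over JDBC
dfOrders.write
  .format("jdbc")
  .option("url", "jdbc:mysql://mysql201:3306/sales")
  .option("driver", "com.mysql.jdbc.Driver")
  .option("dbtable", "orders_backup")
  .option("user", "spark")
  .option("password", "secret")
  .mode("append")
  .save()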
ORC File
Read from file
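A minimal sketch; the path is a placeholder.
// read an ORC file (or directory of ORC files) into a DataFrame
val dfOrc = spark.read.orc("/data/ehave/EHAVE_TBL_2016.orc")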
Write to file
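A minimal sketch; the output path and save mode are placeholders.
// write the Dataset out as ORC, replacing any existing output
dsEHAVE.write
  .mode("overwrite")
  .orc("/data/ehave/EHAVE_TBL_2016.orc")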
Parquet
Read from file
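A minimal sketch; the path is a placeholder.
// read a Parquet file (or directory) into a DataFrame
val dfParquet = spark.read.parquet("/data/ehave/EHAVE_TBL_2016.parquet")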
Write to file
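A minimal sketch; the output path and save mode are placeholders.
// write the Dataset out as Parquet
dsEHAVE.write
  .mode("overwrite")
  .parquet("/data/ehave/EHAVE_TBL_2016.parquet")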
CSV
Read from CSV
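A minimal sketch; the path is a placeholder, and the header/inferSchema options are shown for illustration.
// read a CSV file with a header row, letting Spark infer column types
val dfCsv = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .csv("/data/ehave/ehave_tbl_2016.csv")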
Write to CSV
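A minimal sketch; the output path is a placeholder. Note that Spark writes a directory of part files rather than a single CSV file.
// write the DataFrame as CSV with a header row
dfCsv.write
  .option("header", "true")
  .mode("overwrite")
  .csv("/data/ehave/ehave_tbl_2016_out")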
Oracle
Read from database
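A sketch over Spark's JDBC source; the service name, schema, table, and credentials are placeholders, and the Oracle JDBC driver (ojdbc) jar must be on the classpath.
// read an Oracle table into a DataFrame over JDBC
val dfEmp = spark.read
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@//oracle201:1521/ORCL")
  .option("driver", "oracle.jdbc.OracleDriver")
  .option("dbtable", "HR.EMPLOYEES")
  .option("user", "spark")
  .option("password", "secret")
  .load()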
Write to database
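A matching write sketch with the same placeholder connection details.
// append the DataFrame to an Oracle table over JDBC
dfEmp.write
  .format("jdbc")
  .option("url", "jdbc:oracle:thin:@//oracle201:1521/ORCL")
  .option("driver", "oracle.jdbc.OracleDriver")
  .option("dbtable", "HR.EMPLOYEES_BACKUP")
  .option("user", "spark")
  .option("password", "secret")
  .mode("append")
  .save()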
MS SQL Server
Read from database
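A sketch over Spark's JDBC source; the host, database, table, and credentials are placeholders, and the Microsoft JDBC driver (mssql-jdbc) jar must be on the classpath.
// read a SQL Server table into a DataFrame over JDBC
val dfMsSql = spark.read
  .format("jdbc")
  .option("url", "jdbc:sqlserver://mssql201:1433;databaseName=sales")
  .option("driver", "com.microsoft.sqlserver.jdbc.SQLServerDriver")
  .option("dbtable", "dbo.orders")
  .option("user", "spark")
  .option("password", "secret")
  .load()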
Use isin
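A sketch reusing the dsEHAVE Dataset from the MongoDB example; the FACTORY values are placeholders. isin takes varargs, so a Scala Seq is expanded with `: _*`.
import org.apache.spark.sql.functions.col
// keep only rows whose FACTORY is one of the listed values
val factories = Seq("F01", "F02", "F03")
val dfFiltered = dsEHAVE.filter(col("FACTORY").isin(factories: _*))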
Concatenate all columns
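A sketch that concatenates every column of dsEHAVE into a single string column; the separator and output column name are arbitrary choices.
import org.apache.spark.sql.functions.{col, concat_ws}
// build one pipe-separated string from all columns of the Dataset
val dfAll = dsEHAVE.withColumn(
  "ALL_COLS",
  concat_ws("|", dsEHAVE.columns.map(col): _*)
)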
Hash an array of columns
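One possible approach, sketched here as SHA-256 over a concatenation of the chosen columns; the column list, separator, and hash width are assumptions.
import org.apache.spark.sql.functions.{col, concat_ws, sha2}
// fingerprint each row from a chosen array of key columns
val keyCols = Array("CNAME", "CCOVEY", "FACTORY")
val dfHashed = dsEHAVE.withColumn(
  "ROW_HASH",
  sha2(concat_ws("|", keyCols.map(col): _*), 256)
)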
UDF
User-Defined Functions
A UDF is a Spark SQL feature for defining new Column-based functions that extend the vocabulary of Spark SQL's DSL for transforming Datasets.
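A minimal sketch of defining, applying, and registering a UDF; the example function (parsing the String ISCORE column into an Int, defaulting to 0) is an illustration, not part of the original page.
import org.apache.spark.sql.functions.{col, udf}
// Column-based UDF for use in the DataFrame API
val toIntUdf = udf((s: String) => scala.util.Try(s.trim.toInt).getOrElse(0))
val dfScored = dsEHAVE.withColumn("ISCORE_INT", toIntUdf(col("ISCORE")))
// the same function registered by name for use in Spark SQL
dsEHAVE.createOrReplaceTempView("ehave")
spark.udf.register("to_int", (s: String) => scala.util.Try(s.trim.toInt).getOrElse(0))
spark.sql("SELECT CNAME, to_int(ISCORE) AS ISCORE_INT FROM ehave")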
Convert a Scala list to its Java equivalent
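A sketch, useful when a Java API (for example a JDBC or MongoDB client call) expects a java.util.List; the sample values are placeholders.
import scala.collection.JavaConverters._
// asJava wraps the Scala list in a java.util.List view
val scalaList = List("F01", "F02", "F03")
val javaList: java.util.List[String] = scalaList.asJava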
Window Function
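A sketch using row_number over a window partitioned by FACTORY and ordered by ISCORE; the partitioning and ordering columns are placeholders taken from the case class above.
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.{col, row_number}
// keep the top-scoring row per FACTORY
val byFactory = Window.partitionBy("FACTORY").orderBy(col("ISCORE").desc)
val dfTop = dsEHAVE
  .withColumn("rn", row_number().over(byFactory))
  .filter(col("rn") === 1)
  .drop("rn")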
Rename Dataset Column Name
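A sketch showing a single-column rename and a bulk rename; the new names are placeholders.
// rename one column
val dfRenamed = dsEHAVE.withColumnRenamed("CNAME", "CUSTOMER_NAME")
// rename all columns at once, here by lower-casing them
val dfLower = dsEHAVE.toDF(dsEHAVE.columns.map(_.toLowerCase): _*)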
Regular Expression to extract values
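A sketch using regexp_extract; the pattern (first run of digits) and the source column are placeholders.
import org.apache.spark.sql.functions.{col, regexp_extract}
// pull the first run of digits out of ISCORE into a new column
val dfExtracted = dsEHAVE.withColumn(
  "SCORE_DIGITS",
  regexp_extract(col("ISCORE"), "(\\d+)", 1)
)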
Transpose rows to column
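Read here as a pivot: a sketch that turns the distinct values of one column into columns of the result; the grouping column, pivot column, and aggregate are placeholders.
import org.apache.spark.sql.functions.{count, lit}
// one row per FACTORY, one column per ISEX value, counting rows in each cell
val dfPivot = dsEHAVE
  .groupBy("FACTORY")
  .pivot("ISEX")
  .agg(count(lit(1)))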