Memory
Memory usage in Spark largely falls under one of two categories: execution and storage. Execution memory is used for computation in shuffles, joins, sorts, and aggregations, while storage memory is used for caching and for propagating internal data across the cluster.
See also: Java Platform, Standard Edition HotSpot Virtual Machine Garbage Collection Tuning Guide, https://docs.oracle.com/javase/8/docs/technotes/guides/vm/gctuning/sizing.html
spark.memory.fraction (M of JVM heap)
expresses the size of M as a fraction of (JVM heap space - 300MB) (default 0.6). The rest of the space (40%) is reserved for user data structures, internal metadata in Spark, and safeguarding against OOM errors in the case of sparse and unusually large records.
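As a concrete sketch of the arithmetic above (the 4 GB heap size is a hypothetical example value, not from the source):

```python
# Sketch of how M is derived from the executor heap, assuming a
# hypothetical 4 GB heap and the default spark.memory.fraction of 0.6.
RESERVED_MB = 300                    # fixed reserved memory
heap_mb = 4096                       # hypothetical executor heap size
memory_fraction = 0.6                # spark.memory.fraction default

usable_mb = heap_mb - RESERVED_MB    # space Spark divides up
m_mb = usable_mb * memory_fraction   # M: shared execution + storage region
user_mb = usable_mb - m_mb           # remainder for user data structures,
                                     # internal metadata, and OOM headroom

print(round(m_mb), round(user_mb))   # → 2278 1518
```

With these defaults, roughly 2.2 GB of a 4 GB heap is available to the unified execution/storage region.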
spark.memory.storageFraction (R of M)
expresses the size of R as a fraction of M (default 0.5). R is the storage space within M in which cached blocks are immune to being evicted by execution.
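Putting both settings together, a sketch of R under the defaults (again assuming a hypothetical 4 GB executor heap):

```python
# Full chain: heap -> M -> R, under the default fractions.
# heap_mb is a hypothetical example value, not from the source.
RESERVED_MB = 300
heap_mb = 4096
memory_fraction = 0.6                 # spark.memory.fraction default
storage_fraction = 0.5                # spark.memory.storageFraction default

m_mb = (heap_mb - RESERVED_MB) * memory_fraction
r_mb = m_mb * storage_fraction        # R: cached blocks here are not
                                      # evicted by execution memory pressure

print(round(m_mb), round(r_mb))       # → 2278 1139
```

So on this example heap, about half of M (roughly 1.1 GB) is protected storage; the other half can be borrowed by execution when storage does not need it.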