code/bonus_chapters/mappartitions
1 file changed, +11 -1 lines changed
@@ -159,7 +159,17 @@ Note that you may perform final reduction by `RDD.reduce()` as well:
NOTE: data can be huge, but for understanding
- the `mapPartitions()` we use a very small data set.
+ the `mapPartitions()` we used a very small data set.
+
+ # Is `RDD.mapPartitions()` Scalable?
+ `RDD.mapPartitions()` is scalable here, since we return a single element
+ from each source RDD partition (which may comprise many elements). Even if
+ the number of partitions in the source RDD is high, this will not cause a
+ problem. You need to make sure that your custom function is not a bottleneck.
+ For example, if the source RDD has 100,000 partitions, then the target RDD will
+ have 100,000 elements, which makes the final reduction on
+ the target RDD very simple. Again, make sure that your custom function is simple and
+ efficient.
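
The scalability argument above can be illustrated without a Spark cluster. The following is a minimal sketch, assuming partitions are modelled as plain Python lists; the names `summarize_partition`, `merge_summaries`, and `simulate_map_partitions` are hypothetical and are not part of the original chapter. Each partition is collapsed to at most one small tuple in a single pass, so the final reduction only sees one element per partition.

```python
from functools import reduce
from typing import Iterable, Iterator, List, Tuple

# Hypothetical summary type for this sketch: (minimum, maximum, count)
Summary = Tuple[int, int, int]


def summarize_partition(elements: Iterable[int]) -> Iterator[Summary]:
    """Collapse one partition into at most one (min, max, count) tuple."""
    minimum = maximum = None
    count = 0
    # Single pass over the iterator; the partition is never held in memory.
    for x in elements:
        if count == 0:
            minimum = maximum = x
        else:
            minimum = min(minimum, x)
            maximum = max(maximum, x)
        count += 1
    # Empty partitions yield nothing, so they add no element downstream.
    if count > 0:
        yield (minimum, maximum, count)


def merge_summaries(a: Summary, b: Summary) -> Summary:
    """Combine two per-partition summaries; cheap and associative."""
    return (min(a[0], b[0]), max(a[1], b[1]), a[2] + b[2])


def simulate_map_partitions(partitions: List[List[int]]) -> Summary:
    """Plain-Python stand-in for mapPartitions() followed by reduce()."""
    per_partition = [
        summary
        for partition in partitions
        for summary in summarize_partition(iter(partition))
    ]
    # Raises TypeError if every partition is empty (nothing to reduce).
    return reduce(merge_summaries, per_partition)


partitions = [[10, 3, 7], [], [42, 1], [5]]
result = simulate_map_partitions(partitions)
print(result)  # (1, 42, 6)
```

With PySpark, the same two functions could be used as `rdd.mapPartitions(summarize_partition).reduce(merge_summaries)`, assuming `rdd` is an RDD of integers with at least one element.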
# Questions/Comments