rdd1 = sc.parallelize(data) - create an RDD from a local collection (data is a Python list, range, etc.)
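glom() - gathers the elements of each partition into a list, so rdd.glom().collect() shows which elements sit in which partition
map() - applies a function to every element of the RDD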
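mapPartitions() - the function is called once per partition
(it receives an iterator over that partition's elements), not once per element
mapPartitionsWithIndex() - like mapPartitions(), but the function also receives the partition index: f(index, iterator);
the output can therefore carry both the values and the index of the partition they came from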
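
The partition-level operations above can be sketched in plain Python, without a Spark cluster. This is only a model of the behaviour: the `partitions` list and the helper names `sum_partition` and `tag_with_index` are made up for illustration, and the way the data is split into partitions is an assumption, not what Spark would necessarily choose. The matching PySpark calls are shown in comments.

```python
# Stand-in for an RDD of 1..6 spread over 3 partitions (split is illustrative).
partitions = [[1, 2], [3, 4], [5, 6]]

# glom(): each partition becomes one list.
# PySpark: rdd.glom().collect()
glommed = [list(p) for p in partitions]

# mapPartitions(f): f is called once per partition and gets an iterator.
# PySpark: rdd.mapPartitions(sum_partition).collect()
def sum_partition(iterator):
    yield sum(iterator)

per_partition_sums = [out for p in partitions for out in sum_partition(iter(p))]

# mapPartitionsWithIndex(f): f gets the partition index and the iterator.
# PySpark: rdd.mapPartitionsWithIndex(tag_with_index).collect()
def tag_with_index(index, iterator):
    for value in iterator:
        yield (index, value)

tagged = [out for i, p in enumerate(partitions) for out in tag_with_index(i, iter(p))]

print(glommed)             # [[1, 2], [3, 4], [5, 6]]
print(per_partition_sums)  # [3, 7, 11]
print(tagged)              # [(0, 1), (0, 2), (1, 3), (1, 4), (2, 5), (2, 6)]
```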
mapValues() - used on RDDs of key-value pairs;
applies the function to the values only
and leaves the keys unchanged.
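
A plain-Python model of mapValues(), assuming a small list of (key, value) pairs. The helper `map_values` is hypothetical and only mimics the behaviour; the real call is shown in the comment.

```python
pairs = [("a", 1), ("b", 2), ("a", 3)]

# PySpark: sc.parallelize(pairs).mapValues(lambda v: v * 10).collect()
def map_values(pairs, f):
    # Keys pass through untouched; f is applied to each value.
    return [(k, f(v)) for k, v in pairs]

scaled = map_values(pairs, lambda v: v * 10)
print(scaled)  # [('a', 10), ('b', 20), ('a', 30)]
```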
MAP VS FLATMAP:
map: applies a function to every element of the RDD; exactly one output element per input element
flatMap: applies the function like map, but flattens the results,
so each input element can produce zero or more output elements.
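
The difference is easiest to see when splitting lines into words. A plain-Python sketch with made-up input strings; the PySpark equivalents are in the comments.

```python
from itertools import chain

lines = ["hello world", "spark rdd"]

# map: one output per input, so the result is a list of lists.
# PySpark: rdd.map(lambda line: line.split(" ")).collect()
mapped = [line.split(" ") for line in lines]

# flatMap: same function, but the per-element results are flattened.
# PySpark: rdd.flatMap(lambda line: line.split(" ")).collect()
flat_mapped = list(chain.from_iterable(line.split(" ") for line in lines))

print(mapped)       # [['hello', 'world'], ['spark', 'rdd']]
print(flat_mapped)  # ['hello', 'world', 'spark', 'rdd']
```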
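groupByKey vs reduceByKey:
groupByKey: (1,1,1) - every value is shuffled to the key's partition and only then grouped; more data shuffling, not optimized
reduceByKey: (3) - values are combined inside each partition before the shuffle; less data shuffling, more optimized.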
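
A rough plain-Python model of why reduceByKey shuffles less. It counts how many records would cross the network in each case. The two-partition layout and the helpers `group_by_key` and `reduce_by_key` are assumptions for illustration, not Spark's actual implementation.

```python
from collections import defaultdict

# Two partitions of (key, 1) pairs, as in a word count.
partitions = [[("a", 1), ("a", 1), ("b", 1)], [("a", 1), ("b", 1)]]

def group_by_key(partitions):
    # Every record is shuffled as-is, then grouped on the receiving side.
    shuffled = [pair for p in partitions for pair in p]
    groups = defaultdict(list)
    for k, v in shuffled:
        groups[k].append(v)
    return dict(groups), len(shuffled)

def reduce_by_key(partitions, f):
    # Combine inside each partition first, so at most one record
    # per key per partition is shuffled.
    shuffled = []
    for p in partitions:
        local = {}
        for k, v in p:
            local[k] = f(local[k], v) if k in local else v
        shuffled.extend(local.items())
    # Merge the partial results after the shuffle.
    result = {}
    for k, v in shuffled:
        result[k] = f(result[k], v) if k in result else v
    return result, len(shuffled)

grouped, grouped_shuffled = group_by_key(partitions)
reduced, reduced_shuffled = reduce_by_key(partitions, lambda a, b: a + b)

print(grouped, grouped_shuffled)  # {'a': [1, 1, 1], 'b': [1, 1]} 5
print(reduced, reduced_shuffled)  # {'a': 3, 'b': 2} 4
```

In PySpark the corresponding calls would be rdd.groupByKey().mapValues(list) and rdd.reduceByKey(lambda a, b: a + b).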
Creating a DataFrame with a DDL schema; creating a DataFrame from an RDD
/FileStore/tables/employees__1_-1.csv
/FileStore/tables/departments__1_-1.csv