
In Azure Synapse, the system configuration of a Spark pool looks like the one below, where the number of executors, vCores, and memory is defined by default.


mubhashk_0-1612465171017.png


 


Some users may need to change the number of executors or the amount of memory assigned to a Spark session at execution time.


 


Usually, we can reconfigure these by navigating to the Spark pool in the Azure portal and setting the configuration on the pool by uploading a text file that looks like this:


 


mubhashk_1-1612465200341.png


 


mubhashk_2-1612465230427.png
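
As an illustration only (the exact properties in the screenshots above may differ), the uploaded file follows the standard spark-defaults.conf format of one property name and value per line, for example:

spark.executor.instances 4
spark.executor.cores 4
spark.executor.memory 8g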


 


But in the Synapse Spark pool, a few of these user-defined configurations get overridden by the Spark pool's default values.


 


What should be the next step to persist these configurations at the Spark pool session level?


 


For notebooks:


If we want to configure a session with more executors than are defined at the system level (in this case 2 executors, as we saw above), we can use the sample code below to start the session with 4 executors. This lets a session logically acquire more executors than the pool default.


mubhashk_3-1612465255185.png
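
As a rough sketch of the same idea (the exact cell contents in the screenshot may differ), Synapse notebooks accept a %%configure magic in the first cell to set session-level resources; the -f flag forces the session to restart with the new settings, and the values below are only examples:

%%configure -f
{
    "numExecutors": 4,
    "executorCores": 4,
    "executorMemory": "8g"
}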


Execute the code below to confirm that the number of executors is the same as defined for the session, which is 4:


mubhashk_4-1612465272088.png
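
As a simple alternative (assuming the session was started with the configuration above), you can read the value back directly:

# Read back the executor count configured for this session
print(spark.conf.get("spark.executor.instances"))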


You can also see these executors in the Spark UI if you want to cross-verify:


mubhashk_5-1612465293726.png


A list of the available session configurations is summarized here.


 


We can also set up the desired session-level configuration in an Apache Spark job definition:


 


For Apache Spark Job:


 


If we want to add these configurations to our job, we have to set them when we initialize the Spark session or Spark context; for example, for a PySpark job:


 


Spark Session:


 


from pyspark.sql import SparkSession

if __name__ == "__main__":

    # create a Spark session with the necessary configuration
    spark = SparkSession \
        .builder \
        .appName("testApp") \
        .config("spark.executor.instances", "4") \
        .config("spark.executor.cores", "4") \
        .getOrCreate()


 


Spark Context:


 


from pyspark import SparkContext, SparkConf

if __name__ == "__main__":

    # create a Spark context with the necessary configuration
    conf = (
        SparkConf()
        .setAppName("testApp")
        .set("spark.hadoop.validateOutputSpecs", "false")
        .set("spark.executor.cores", "4")
        .set("spark.executor.instances", "4")
    )
    sc = SparkContext(conf=conf)


 


 


Hopefully this helps you configure the number of executors for a job or notebook as needed.


 


 
