This article is contributed. See the original author and article here.

Data scientists and other users working with smaller data sets in Azure Databricks often explore data and build machine learning (ML) models using single-machine Python and R libraries. This exploration and modeling doesn’t always require the distributed computing power of the Delta Engine and Apache Spark offered in Azure Databricks. Doing this type of work on a traditional multi-node cluster often leaves the worker machines idle or underutilized, which adds unnecessary cost.




Single Node is a new cluster mode that lets users work with their favorite libraries, such as pandas, scikit-learn, and PyTorch, without paying for the unnecessary compute associated with traditional multi-node clusters. Single Node clusters also support running Spark operations when needed: the single node hosts both the driver and the executors, which are spread across the node's available cores. This makes it possible to load and save data with the efficient Spark APIs (including security features such as User Credential Passthrough) while also doing efficient exploration and ML with the most popular single-machine libraries.
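For readers who create clusters programmatically, here is a minimal sketch of a Single Node cluster definition. It assumes the field names used in Clusters API and Terraform provider examples as we understand them: `num_workers` set to 0, the `spark.databricks.cluster.profile` and `spark.master` Spark settings, and a `ResourceClass` custom tag. Check these against the current Databricks documentation before relying on them. The helper `single_node_cluster_spec` and the cluster name are hypothetical, and the runtime version and VM size are placeholder values. The code only builds and prints a JSON payload; it does not call any API.

```python
import json


def single_node_cluster_spec(name: str, spark_version: str, node_type_id: str) -> dict:
    """Build an API-style payload for a Single Node cluster (sketch, assumed field names)."""
    return {
        "cluster_name": name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        # No worker VMs: the driver node also runs the Spark tasks.
        "num_workers": 0,
        "spark_conf": {
            # Marks the cluster as Single Node.
            "spark.databricks.cluster.profile": "singleNode",
            # Spark local mode: tasks run on threads across every core of this node.
            "spark.master": "local[*]",
        },
        "custom_tags": {"ResourceClass": "SingleNode"},
    }


# Placeholder values: pick a runtime and VM size available in your workspace.
spec = single_node_cluster_spec("ds-single-node", "7.3.x-scala2.12", "Standard_DS3_v2")
print(json.dumps(spec, indent=2))
```

The `local[*]` master is what keeps Spark on one machine: it is Spark's standard local-mode setting for using all available cores, which matches the driver-plus-executors-on-one-node behavior described above.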




When a data scientist wants to use distributed compute for tasks like hyperparameter tuning and AutoML, or to work with larger datasets, they can simply switch to a standard cluster with more nodes.
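As a rough sketch of what that switch means for an API-style cluster payload, the hypothetical helper `to_standard_cluster` below copies a Single Node definition, sets a worker count, and drops the Single Node-specific settings so that Spark is no longer pinned to local mode on one machine. It assumes the same field names as the Clusters API examples in the Databricks documentation; the cluster name, runtime version, VM size, and worker count are placeholders.

```python
import copy
import json

# Placeholder Single Node definition (assumed Clusters API field names).
single_node = {
    "cluster_name": "ds-cluster",
    "spark_version": "7.3.x-scala2.12",
    "node_type_id": "Standard_DS3_v2",
    "num_workers": 0,
    "spark_conf": {
        "spark.databricks.cluster.profile": "singleNode",
        "spark.master": "local[*]",
    },
    "custom_tags": {"ResourceClass": "SingleNode"},
}


def to_standard_cluster(spec: dict, num_workers: int) -> dict:
    """Return a multi-node copy of a Single Node spec; the input is not modified."""
    if num_workers < 1:
        raise ValueError("a standard cluster needs at least one worker")
    standard = copy.deepcopy(spec)
    standard["num_workers"] = num_workers
    # Remove the settings that mark the cluster as Single Node and force local mode.
    standard.get("spark_conf", {}).pop("spark.databricks.cluster.profile", None)
    standard.get("spark_conf", {}).pop("spark.master", None)
    standard.get("custom_tags", {}).pop("ResourceClass", None)
    return standard


print(json.dumps(to_standard_cluster(single_node, num_workers=4), indent=2))
```

Copying rather than mutating keeps the original Single Node definition available, so the same notebook work can move between the two cluster types as the workload changes. Four workers is an arbitrary choice for illustration.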




When the Single Node capability is combined with other Azure Databricks capabilities, the platform provides a truly unified experience intended to make data scientists and other analysts more efficient and effective.



Brought to you by Dr. Ware, Microsoft Office 365 Silver Partner, Charleston SC.