Distributed Computing for Advanced Smart Meter Data Management with focus on Electrical Utility Applications

EasyChair Preprint 2567
6 pages • Date: February 5, 2020

Abstract

With the advent of internet-of-things devices and sensors in the smart grid, big data analytics tools have recently attracted immense research interest for big data management and parallel data processing. However, using big data analytics platforms efficiently requires complex parameter configuration and an in-depth understanding of the data processing design concept. In this work, we analyze parallelization using the Spark regression library for Python and assess performance with workloads on up to 8 nodes. Analyzing the effect of different configurations and architectures on the performance of Apache Spark, we found that a trade-off between the number of nodes and the number of cores is necessary for efficient parallel computing. A set of node and core combinations is considered to evaluate the response of the run time. The work also shows the importance of high-performance computing capability for big data management of smart meter data. We infer that the computational time depends not only on the data size but also on the number of compute nodes and the number of cores used to execute the program.

Keyphrases: Apache Spark, Big data Parallel computing, High Performance Computing, Smart Grid, Smart Meter, execution time, load forecast, parallel computing, run-time, spark machine learning