Piyush Sewal

EDUCATION

Ph.D., Computer Science

RESEARCH, TEACHING, or OTHER INTERESTS

Artificial Intelligence, Computer Science, Computer Science Applications, Software

Scopus Publications

Performance optimization of Spark MLlib workloads using cost efficient RICG model on exponential projective sampling
Piyush Sewal, Hari Singh
Cluster Computing, 2024
A predictive analysis of the COVID-19 pandemic for traditional and tree-based regression algorithms
Hari Singh, Piyush Sewal, Dinesh Chander Verma
Impact of Digital Solutions for Improved Healthcare Delivery, 2024
Many works in the literature compare regression algorithms on different datasets. This chapter presents a model that uses the best subset selection approach for the predictors and performs an exhaustive empirical comparison of eight regression algorithms (Linear Regression, Multi-Linear Regression, Polynomial Regression, K-Nearest Neighbors, Lasso, Ridge, Decision Tree, Gradient Boost Tree, and Random Forest Regression) on various predictors from a COVID-19 dataset. The model is evaluated for train accuracy on the metrics R², Root Mean Square Error, and Mean Absolute Error. The test R² and adjusted-R² metrics evaluate the model on cross-validation prediction test errors. The predicted values of the dependent variables are checked for similarity and validated using a statistical z-test.
Correction to: Analyzing distributed Spark MLlib regression algorithms for accuracy, execution efficiency and scalability using best subset selection approach (Multimedia Tools and Applications, (2023), 83, 15, (44047-44066), 10.1007/s11042-023-17330-5)
Piyush Sewal, Hari Singh
Multimedia Tools and Applications, 2024
Analyzing distributed Spark MLlib regression algorithms for accuracy, execution efficiency and scalability using best subset selection approach
Piyush Sewal, Hari Singh
Multimedia Tools and Applications, 2024
Algorithmic Proficiency in Spark Configuration Tuning: An Empirical Study using Execution Time Metrics across Varied Workloads
Piyush Sewal, Hari Singh
Procedia Computer Science, 2024
In the realm of big data, where datasets of immense scale pose processing challenges, distributed processing platforms like open-source Apache Spark have emerged to address these issues. Spark’s internal configuration parameters exert varying impacts on execution times based on job characteristics, making manual optimization daunting. The core focus of this study lies in optimizing Spark’s internal configurations, with specific attention directed towards three types of workloads: Iterative-intensive, Memory-intensive, and CPU-intensive. Employing Grid Search, Random Search, and Evolutionary Optimization algorithms yields substantial execution time reductions: 23.24% with Grid Search, 19.71% with Random Search, and 23.06% with Evolutionary Optimization. Notably, Evolutionary Optimization achieves optimal configurations approximately 29% faster than Grid Search. While Random Search and Evolutionary Optimization share similar time requirements, Random Search’s execution time reduction for a given Spark workload is relatively lower. This research sheds light on algorithmic configuration tuning intricacies and its influence on Spark workload execution times, contributing to the exploration of optimizing big data processing platforms.
A Machine Learning Approach for Predicting Execution Statistics of Spark Application
Piyush Sewal, Hari Singh
PDGC 2022 - 2022 7th International Conference on Parallel, Distributed and Grid Computing, 2022
Apache Spark is one of the most popular and widely used open-source distributed processing frameworks. It can process huge datasets in a time-efficient manner due to its in-memory computational capabilities. However, several factors can affect the performance of an application, including the nature and size of the input dataset, the computational capability of the system, and the nature and design of the algorithm. Hence, different parameters are required to correctly predict the execution statistics of a Spark application, including the execution time of jobs, stages and tasks, memory requirement and usage at the execution level, and I/O cost in the form of read/write shuffling of data. To address these challenges, this paper presents a simulation and machine learning based prediction model that takes only a few initial samples of execution statistics and predicts the performance and execution statistics of the Spark application with high accuracy. The proposed model is evaluated on the Wordcount application in Spark standalone mode, and accuracy metrics show that it achieves high accuracy in predicting execution statistics.
A Critical Analysis of Apache Hadoop and Spark for Big Data Processing
Piyush Sewal, Hari Singh
Proceedings of IEEE International Conference on Signal Processing Computing and Control, 2021
The emergence of big data processing platforms that can work globally in an integrated manner and process huge datasets efficiently has become very significant. This paper presents a critical analysis of two big data processing platforms, Apache Hadoop MapReduce and Apache Spark. Earlier, Hadoop MapReduce was one of the most popular platforms for batch processing of huge datasets, but as the nature of data has shifted from static to dynamic, Apache Spark proves to be better for iterative jobs and live data streams. This paper aims to critically compare and analyze Hadoop 1.x, 2.x and 3.x and Spark 1.x, 2.x and 3.x on well-known key parameters such as components, storage system, resource management, fault tolerance, data processing, scalability and performance.

RECENT SCHOLAR PUBLICATIONS

A Predictive Analysis of the COVID-19 Pandemic for Traditional and Tree-Based Regression Algorithms
H Singh, P Sewal, DC Verma
Impact of Digital Solutions for Improved Healthcare Delivery, 303-340, 2025
Citations: 2
Performance optimization of Spark MLlib workloads using cost efficient RICG model on exponential projective sampling
P Sewal, H Singh
Cluster Computing 27 (8), 10569-10588, 2024
Citations: 5
Utilizing Twitter data and NLP to analyze and predict public sentiment trends in mental health
T Gupta, A Sharma, Aryan, K Rana, P Sewal
The International Conference on Recent Trends in Communication & Intelligent …, 2024
Citations: 1
Analyzing distributed Spark MLlib regression algorithms for accuracy, execution efficiency and scalability using best subset selection approach
P Sewal, H Singh
Multimedia Tools and Applications 83 (15), 44047-44066, 2024
Citations: 11
Performance comparison of apache spark and hadoop for machine learning based iterative GBTR on HIGGS and covid-19 datasets
P Sewal, H Singh
Scalable Computing: Practice and Experience 25 (3), 1373-1386, 2024
Citations: 12
Improving Execution Workloads in In-Memory Distributed Computing Platform–SPARK
P Sewal, H Singh
Jaypee University of Information Technology, Solan, HP, 2024
Algorithmic proficiency in spark configuration tuning: An empirical study using execution time metrics across varied workloads
P Sewal, H Singh
Procedia Computer Science 235, 2307-2317, 2024
Citations: 2
A machine learning approach for predicting execution statistics of spark application
P Sewal, H Singh
2022 Seventh International Conference on Parallel, Distributed and Grid …, 2022
Citations: 6
A critical analysis of apache hadoop and spark for big data processing
P Sewal, H Singh
2021 6th International Conference on Signal Processing, Computing and …, 2021
Citations: 33

MOST CITED SCHOLAR PUBLICATIONS

A critical analysis of apache hadoop and spark for big data processing
P Sewal, H Singh
2021 6th International Conference on Signal Processing, Computing and …, 2021
Citations: 33
Performance comparison of apache spark and hadoop for machine learning based iterative GBTR on HIGGS and covid-19 datasets
P Sewal, H Singh
Scalable Computing: Practice and Experience 25 (3), 1373-1386, 2024
Citations: 12
Analyzing distributed Spark MLlib regression algorithms for accuracy, execution efficiency and scalability using best subset selection approach
P Sewal, H Singh
Multimedia Tools and Applications 83 (15), 44047-44066, 2024
Citations: 11
A machine learning approach for predicting execution statistics of spark application
P Sewal, H Singh
2022 Seventh International Conference on Parallel, Distributed and Grid …, 2022
Citations: 6
Performance optimization of Spark MLlib workloads using cost efficient RICG model on exponential projective sampling
P Sewal, H Singh
Cluster Computing 27 (8), 10569-10588, 2024
Citations: 5
A Predictive Analysis of the COVID-19 Pandemic for Traditional and Tree-Based Regression Algorithms
H Singh, P Sewal, DC Verma
Impact of Digital Solutions for Improved Healthcare Delivery, 303-340, 2025
Citations: 2
Algorithmic proficiency in spark configuration tuning: An empirical study using execution time metrics across varied workloads
P Sewal, H Singh
Procedia Computer Science 235, 2307-2317, 2024
Citations: 2
Utilizing Twitter data and NLP to analyze and predict public sentiment trends in mental health
T Gupta, A Sharma, Aryan, K Rana, P Sewal
The International Conference on Recent Trends in Communication & Intelligent …, 2024
Citations: 1
Improving Execution Workloads in In-Memory Distributed Computing Platform–SPARK
P Sewal, H Singh
Jaypee University of Information Technology, Solan, HP, 2024