Abstract:
Big data is the reality of the 21st century. However, big data modeling and prediction require advanced analytics that encompass both the computing-intensive and the statistics-oriented analysis tools of data science. Regression analysis is a statistical method for predictive modeling, and it is one of the most commonly used methods in many scientific fields, such as engineering, the physical and chemical sciences, economics, management, the life and biological sciences, geology, and sociology and the other social sciences. Violations of its assumptions, such as collinearity among the predictor variables, are a significant issue in data science. Regularized methods such as Lasso and Ridge regression are designed to overcome such problems. In this study, we compare ordinary linear regression with Ridge and Lasso regression. The Vinho Verde white wine test data from the Minho (northwest) region of Portugal are used to analyze the advantages of each of the three regression methods. All required calculations and graphical displays are performed using the R software for statistical computing.