Abstract:
With the ability to simplify code deployment with one-click upload and lightweight execution, serverless computing has emerged as a promising paradigm with increasing popularity. However, open challenges remain when adapting data-intensive analytics applications to the serverless context: users of serverless analytics have difficulty coordinating computation across different stages and provisioning resources in a large configuration space. This paper presents the design and implementation of Astra, which configures and orchestrates serverless analytics jobs autonomously while taking flexibly specified user requirements into account. Astra relies on performance and cost models that characterize the intricate interplay among multi-dimensional factors (e.g., function memory size and degree of parallelism at each stage). We formulate an optimization problem based on user-specific requirements for either performance enhancement or cost reduction, and develop a set of graph-theoretic algorithms to obtain the optimal job execution. We deploy Astra on the AWS Lambda platform and conduct real-world experiments over three representative benchmarks at different scales. Results demonstrate that Astra achieves the optimal execution decision for serverless analytics, improving performance by 21% to 60% under a given budget constraint and reducing cost by 20% to 80% without violating performance requirements, compared with three baseline configuration algorithms.
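The abstract does not give Astra's actual models or algorithms, so the following is only a hypothetical sketch of the kind of problem it describes: choose a (memory size, parallelism) configuration for each stage of a sequential job so that completion time is minimized under a budget. Everything here is an assumption for illustration: the per-stage time model (`stage_time`), the cost model (`stage_cost`), the workload numbers, and the prices (roughly AWS Lambda's published per-GB-second and per-request rates, which vary by region and change over time). The search treats stages as layers of a graph and runs a label-setting constrained shortest path, keeping only Pareto-optimal (cost, time) partial plans.

```python
# Hypothetical sketch: NOT Astra's published models or algorithms.
# Assumed illustrative prices (approximate AWS Lambda x86 on-demand rates).
PRICE_PER_GB_S = 0.0000166667
PRICE_PER_REQUEST = 0.20 / 1_000_000
OVERHEAD_S = 0.5  # assumed fixed per-function startup/shuffle overhead


def stage_time(work_gb_s, mem_mb, parallelism):
    """Assumed model: work splits evenly; compute share scales with memory."""
    return OVERHEAD_S + work_gb_s / (parallelism * mem_mb / 1024)


def stage_cost(work_gb_s, mem_mb, parallelism):
    """Every function is billed GB-seconds for its duration plus one request."""
    t = stage_time(work_gb_s, mem_mb, parallelism)
    return parallelism * (t * mem_mb / 1024 * PRICE_PER_GB_S + PRICE_PER_REQUEST)


def pareto(labels):
    """Keep labels not dominated in both cost and time."""
    labels.sort(key=lambda label: (label[0], label[1]))
    front, best_time = [], float("inf")
    for cost, time, plan in labels:
        if time < best_time:  # cheaper-or-equal labels already seen are slower
            front.append((cost, time, plan))
            best_time = time
    return front


def best_plan(stages, options, budget):
    """Layered-graph search: one layer per stage, one edge per configuration."""
    labels = [(0.0, 0.0, [])]
    for work in stages:
        extended = []
        for cost, time, plan in labels:
            for mem_mb, par in options:
                c = cost + stage_cost(work, mem_mb, par)
                t = time + stage_time(work, mem_mb, par)
                if c <= budget:  # costs are positive, so prune early
                    extended.append((c, t, plan + [(mem_mb, par)]))
        labels = pareto(extended)
    # Fastest complete plan within budget, or None if infeasible.
    return min(labels, key=lambda label: label[1]) if labels else None


# Hypothetical three-stage job (work in GB-seconds at 1 GB) and search space.
STAGES = [40.0, 120.0, 20.0]
OPTIONS = [(m, p) for m in (512, 1024, 2048) for p in (1, 4, 16)]
BUDGET = 0.0035  # dollars

if __name__ == "__main__":
    cost, time, plan = best_plan(STAGES, OPTIONS, BUDGET)
    print(f"cost=${cost:.6f} time={time:.2f}s plan={plan}")
```

Pruning dominated partial plans is safe here only because, under these assumed models, each stage's cost and time are additive and independent of earlier choices. Swapping the roles of cost and time (minimize cost subject to a deadline) covers the other requirement mentioned in the abstract.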