Abstract:
With the ability to simplify code deployment with one-click upload and lightweight execution, serverless computing has emerged as a promising paradigm with increasing popularity. However, there remain open challenges when adapting data-intensive analytics applications to the serverless context, in which users of serverless analytics encounter the difficulty in coordinating computation across different stages and provisioning resources in a large configuration space. This paper presents our design and implementation of Astrea, which configures and orchestrates serverless analytics jobs in an autonomous manner, while taking into account flexibly-specified user requirements. Astrea relies on the modeling of performance and cost which characterizes the intricate interplay among multidimensional factors (e.g., function memory size, degree of parallelism at each stage). We formulate an optimization problem based on user-specific requirements towards performance enhancement or cost reduction, and develop a set of algorithms based on graph theory to obtain the optimal job execution. We deploy Astrea in the AWS Lambda platform and conduct real-world experiments over representative benchmarks, including Big Data analytics and machine learning workloads, at different scales. Extensive results demonstrate that Astrea can achieve the optimal execution decision for serverless data analytics, in comparison with various provisioning and deployment baselines. For example, when compared with three provisioning baselines, Astrea manages to reduce the job completion time by 21% to 69% under a given budget constraint, while saving cost by 20% to 84% without violating performance requirements.
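
To make the budget-constrained formulation concrete, the following is a minimal illustrative sketch, not Astrea's actual algorithm. It assumes a job is a linear sequence of stages, that each stage has a small set of candidate configurations (function memory size, degree of parallelism), and that some performance and cost model has already estimated a time and a cost for each candidate. All names (`Option`, `fastest_plan_within_budget`, `EXAMPLE_STAGES`) and all numbers are hypothetical. The search treats the stages as layers of a graph and keeps only Pareto-optimal partial paths, which is one standard way to solve a resource-constrained shortest-path problem.

```python
from typing import NamedTuple, Optional, Sequence, Tuple


class Option(NamedTuple):
    """One candidate configuration of a stage (hypothetical model output)."""
    memory_mb: int      # function memory size
    parallelism: int    # number of parallel functions at this stage
    time_s: float       # estimated stage completion time
    cost: float         # estimated stage cost, in arbitrary units


Plan = Tuple[Option, ...]


def fastest_plan_within_budget(
    stages: Sequence[Sequence[Option]], budget: float
) -> Optional[Tuple[float, float, Plan]]:
    """Return (total_time, total_cost, plan) with minimal time and cost <= budget.

    Returns None when no combination of options fits the budget.
    """
    # A label is a partial path through the layered graph: (time, cost, plan).
    labels = [(0.0, 0.0, ())]
    for options in stages:
        extended = []
        for time_s, cost, plan in labels:
            for opt in options:
                new_cost = cost + opt.cost
                if new_cost <= budget:  # drop paths that already exceed the budget
                    extended.append((time_s + opt.time_s, new_cost, plan + (opt,)))
        # Pareto pruning: after sorting by time, a label survives only if it is
        # strictly cheaper than every faster (or equally fast) label.
        extended.sort(key=lambda label: (label[0], label[1]))
        labels = []
        cheapest_so_far = float("inf")
        for label in extended:
            if label[1] < cheapest_so_far:
                labels.append(label)
                cheapest_so_far = label[1]
    if not labels:
        return None
    return min(labels, key=lambda label: (label[0], label[1]))


# Hypothetical two-stage job; the numbers are made up for illustration only.
EXAMPLE_STAGES = [
    [Option(512, 4, 10.0, 1.0), Option(1024, 8, 6.0, 2.0), Option(2048, 16, 4.0, 4.0)],
    [Option(512, 2, 8.0, 1.0), Option(1024, 4, 5.0, 3.0)],
]
```

Calling `fastest_plan_within_budget(EXAMPLE_STAGES, budget=5.0)` selects one option per stage. The symmetric problem, minimizing cost under a completion-time requirement, could be sketched the same way by swapping the roles of time and cost.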