Closures without function pointers | Lambda the Ultimate
文章推薦指數: 80 %
DX11 supports dynamic HLSL compilation directly (even for compute ... What you can do with a function pointer is only call it: f(x). LambdatheUltimate Home Feedback FAQ GettingStarted Discussions Siteoperationdiscussions RecentPosts (newtopic) Departments Courses ResearchPapers DesignDocs Quotations GenealogicalDiagrams Archives Userlogin Username: Password: CreatenewaccountRequestnewpassword Navigation Home»forums»LtUForumClosureswithoutfunctionpointers Iamworkingoncompilingadynamic,collection-oriented,(mostly)first-orderlanguageontohardwarewhichdoesnotefficientlysupportfunctionpointers(ormuchindirectionofanysort).Thiswouldbeeasyifmylanguagewerefullyfirst-order,butthelanguagehasalimitedsetofprimitivehigher-orderoperators(suchasmap,reduce,andscan).Iavoidthecreationoffunctionvaluesbyspecializingthesehigherorderoperatorsoneachdistinctfunctionargumenttheyreceive.So,map(sqrt,...)iscompiledseparatelyfrommap(cos,...)andsoforth.Inmycurrenttranslation/compilationschemeIhaveatemplateforeachoperator,suchas: map_template(f,arg_type,result_type)= emit_fn(arg) z=allocateresult_type[size(arg)] fori=1ton: z[i]=INLINE(f,arg[i])/*copybodyoffabovethis,replacingitsinputvariablewitharg[i]*/ returnz Irunintotrouble,however,whenItrytopassaclosuretoahigherorderoperator.Ican'tsimplyinlinethefunctionportionoftheclosure,sinceitwillbemissingitsclosed-overarguments.Dotheclosureparametersneedtobepassedintotheemittedfunction?ItseemsthatIneedtodistinguishordinarynakedfunctionsfromclosuresinsomeway--thoughI'mnotsurewhatarigorouswaytodothismightbe.Cansomeonesuggestsomereadingwhichwilleithergivemeideasworthcopyingorhelpmeclarifymythinking? edit:Forclarification,compilationhappensatruntime,sothesetofpossiblefirst-orderfunctionsisunbounded. Thanks, Alex ByAlexRubinsteynat2010-11-0708:41|LtUForum|previousforumtopic|nextforumtopic|otherblogs|8422reads Commentviewingoptions Flatlist-collapsed Flatlist-expanded Threadedlist-collapsed Threadedlist-expanded Date-newestfirst Date-oldestfirst 10commentsperpage30commentsperpage50commentsperpage70commentsperpage90commentsperpage150commentsperpage200commentsperpage Selectyourpreferredwaytodisplaythecommentsandclick"Savesettings"toactivateyourchanges. Defunctionalization Thegeneraltechniqueisknownasdefunctionalization,andtherespectiveWikipediapagecontainssomelinks,inparticulartoJohnReynold'soriginal1972article. Edit:Defunctionalizationassuchissomewhatdifferentfromwhatyoudescribe,butIthinkthedifferencemainlyamountstoinliningandpartiallyevaluatingtheapplyfunction. ByAndreasRossbergatSun,2010-11-0709:33|loginorregistertopostcomments Notaclosedsetoffunctions Thetroublewithdefunctionalization(asidefromresultinginextremelylargefunctionbodies)isthatIdon'thaveaknownfinitesetoffunctions.Compilationisinitiatedfromwithinaninterpreter,whichmeansIneedaschemethatcanhandlepreviouslyunseenfunctions. ByAlexRubinsteynatSun,2010-11-0717:12|loginorregistertopostcomments However,youdohaveaknown However,youdohaveaknownfinitesetoffunctionsforeachcallsite:that'showyou'reabletoinlinecosinmap(cos...),andendupwithaspecialisedmap-cosfunction. BypkhuongatSun,2010-11-0717:44|loginorregistertopostcomments I'veseenthedefunctionalizedlight Ok,IthinkIseewhatAndreasmeansby"thedifferencemainlyamountstoinliningandpartiallyevaluatingtheapplyfunction". IfIwantedtorecastwhatI'mdoingasperformingdefunctionalizationthenIcanpretendtohaveaseparateapplyfunctionforeveryeveryhigherorderoperator. So,thefollowingcode: f(x)=sqrt(x) g(x)=map(f,x) isdefunctionalizedinto f=F g(x)=apply_map(f,x) apply_map(tag,arg)=matchtagwithF->/*implementationofmap(f,arg)*/ Ithenperformatrivialcontrolflowanalysistogettheuniquetagbeingpassedtoeachinstanceofapply_map(andapply_reduce,etc..),andspecializethatcallsitetoget: g(x)=apply_map_f(x) apply_map_f(arg)=/*implementationofmap(f,arg)*/ Now,iffisactuallyaclosure,Ijustattachtheclosuredatatothefunctionfunctiontag. f=F(c_arg1,c_arg2,...) andultimatelyendupwith apply_map_f(c_arg1,c_arg2,...,arg) ByAlexRubinsteynatSun,2010-11-0722:06|loginorregistertopostcomments Notthatlight >Ithenperformatrivialcontrolflowanalysistogettheuniquetagbeingpassedtoeachinstanceofapply_map(andapply_reduce,etc..),andspecializethatcallsitetoget Hmm,yourcontrolflowanalysismightnotbethattrivial.AsIunderstanditdefunctionalizationsolvesthegenericcasewhereitisnota-prioriknownwhatargumentisgiventomap.It'saneither/orswitch:Eitheryousolvethegenericcasebypassingwhatisessentiallyaclosurearoundbydefunctionalization(whereyoumightwanttoconsiderwhetherdefunctionalizationisactuallysuchagoodapproachforsolvingthatgenericcase).Or,well,youdon't,andyoumightaswelljustspecializebyconstantpropagation. BymarcoatSun,2010-11-0723:42|loginorregistertopostcomments I'vebeenassumingthat, I'vebeenassumingthat,likeAPL,theOP'slanguagerestrictshowfunctionsareused,sothatthey'renotquitefirst-order,andtheflowanalysisbecomestrivial. BypkhuongatMon,2010-11-0816:57|loginorregistertopostcomments ShouldbeDoable Irunintotrouble,however,whenItrytopassaclosuretoahigherorderoperator.Ican'tsimplyinlinethefunctionportionoftheclosure,sinceitwillbemissingitsclosed-overarguments.Dotheclosureparametersneedtobepassedintotheemittedfunction?ItseemsthatIneedtodistinguishordinarynakedfunctionsfromclosuresinsomeway--thoughI'mnotsurewhatarigorouswaytodothismightbe.Cansomeonesuggestsomereadingwhichwilleithergivemeideasworthcopyingorhelpmeclarifymythinking? Tobehonest,Ithinktheanswerisquitetrivial:Itisalwayspossibletospecializeafunctionwithaconstantbysubstitution. Inyourcase,whereIassumethatyoualwaysmustpassaconstantdenotingafunctiontoahigher-orderfunction,youdon'tneedtothethinkaboutthegeneralcasewherethefunctionargumentmaybeunknownatcompiletime-whichmakesitaloteasier. Andindeed,yes,youneedprovisionstodealwithpossibleextraclosurearguments;butyousawthat,youcan'tjustletthemdissappear.SoIthinkyoucouldachievethatbymakingthoseexplicitinthesyntax.Forexample,bylettingthedeclarationreadmap_template(f(args),arg_type,result_type)whereargsreceivetheclosureparameters. WhatIdon'tunderstandisthatyoubuildclosuresinafirst-orderlanguage(withoutfunctionpointers)inthefirstplace.Howdoyoudothat?DoyouCurrysomefunctionapplications,provideapartiallistofconstantarguments,ordoyouhavelexicalscopeswherevariablesmaybebound?Maybeyoushouldgiveaconcreteexamplewhereyourunintoproblems. [Imean,areyounotsureyouaretryingtoimplementhigher-orderprogramming?Inthatcase,youcouldtrythedefunctionalizationapproachesmentionedintheotherposts.] BymarcoatSun,2010-11-0712:21|loginorregistertopostcomments Notquitehigherorder >Itisalwayspossibletospecializeafunctionwithaconstantbysubstitution. DoyoumeansomethingbeyondthespecializationI'malreadydoing? >Forexample,bylettingthedeclarationreadmap_template(f(args),arg_type,result_type)whereargsreceivetheclosureparameters. ThisistheonlysolutionI'vecomeupwithsofar--butIfeltsomewhatstuckinmythinkingandwantedtoseeiftherewerebetteralternatives. WhatIdon'tunderstandisthatyoubuildclosuresinafirst-orderlanguage(withoutfunctionpointers)inthefirstplace.Howdoyoudothat?DoyouCurrysomefunctionapplications,provideapartiallistofconstantarguments,ordoyouhavelexicalscopeswherevariablesmaybebound?Maybeyoushouldgiveaconcreteexamplewhereyourunintoproblems. Iendupwithclosuresbothfromlambdaliftingofnestedfunctionsandfromexplicitlypartiallyapplyingafunction. Example: vector_index(x,indices)= g(i)=x[i] map(g,indices) Theexpressionmap(g,indices)wouldgettranslatedviathemaptemplateintoalow-levelfunctionwhichiscalledwiththevectorindices. >Imean,areyounotsureyouaretryingtoimplementhigher-orderprogramming? It'sreallyanawkwardmixoffirstandhigherorder.Theprogrammerisfreetocreatefunctionsvaluesandpassthemabout.Theonlyrestrictionisthatfunctionvaluesmustallbe"statically"determinableafterexhaustiveinliningandconstantpropagation. Iguessthisschememightseempoorlymotivated(andmynotionofstaticconfusing),soit'sworthnotingthatthetargetarchitectureisagraphicsprocessorandcompilationhappensatrun-time. Anexampleofafunctionwhichwon'tbeaccepted: f(b,x)= g=ifbthensinelsecos map(g,x) >Inthatcase,youcouldtrythedefunctionalizationapproachesmentionedintheotherposts. Therearetwoproblemswithusingdefunctionalizationhere. 1)Thesetofpossiblefunctionsisnotstaticallyknown--I'mcompilingatruntimeandmorefunctionsmaybedefinedlater. 2)EvenifIknewallthepossiblefunctions,codebloatandcompilationtimewouldbecomeaseriousproblem. ByAlexRubinsteynatSun,2010-11-0717:02|loginorregistertopostcomments Isee,mostly >>Itisalwayspossibletospecializeafunctionwithaconstantbysubstitution. >DoyoumeansomethingbeyondthespecializationI'malreadydoing? No,Imeanexactlythespecializationyouaredoing.Youallowhigher-orderprogrammingwiththerestrictionthatyouonlyspecialize/propagateconstantsymbolsbutnotvariablesymbols.Thisisawell-knowntechnique.Good:veryfastexecutionofspecializedhigher-orderfunctions.Bad:possibleexponentialblow-upincode.(Imaginespecializingcompose(f,g,x)=f(g(x))forsinandcospairs,youendupwithsin/sin,sin/cos,cos/sinandcos/cosspecializations.) >Iendupwithclosuresbothfromlambdaliftingofnestedfunctionsandfromexplicitlypartiallyapplyingafunction. >Example: vector_index(x,indices)= g(i)=x[i] map(g,indices) >Theexpressionmap(g,indices)wouldgettranslatedviathemaptemplateintoalow-levelfunctionwhichiscalledwiththevectorindices. Uh,Ihaveseriousproblemsreadingthisexample.Whatvariableisi?Doyouloopovertheelementsofx?Towhatisg(i)bound,tothei-thelementofx?(Mybestestimateatinterpretingthatblobnowisthatiisuniversallyquantifiedandyoudeclarativelydefineg.Whichis,Imean,like,wow,almostlogicprogramming.g=lambdai->x[i]?) >It'sreallyanawkwardmixoffirstandhigherorder.Theprogrammerisfreetocreatefunctionsvaluesandpassthemabout.Theonlyrestrictionisthatfunctionvaluesmustallbe"statically"determinableafterexhaustiveinliningandconstantpropagation.Iguessthisschememightseempoorlymotivated(andmynotionofstaticconfusing),soit'sworthnotingthatthetargetarchitectureisagraphicsprocessorandcompilationhappensatrun-time.Anexampleofafunctionwhichwon'tbeaccepted: f(b,x)= g=ifbthensinelsecos map(g,x) No,IthinkIunderstandthemotivation.But,constantpropagation,asyouhavepointedout,willonlygetyousofar.Atsomepoint,you'llneedtodefinepreciselywhereandwhenyoucanpropagate.(Whichiseasy,Iguess,youcanonlypropagateconstantfunctionsoanyhigher-orderfunction,whenapplied,shouldonlyallowdefinitionsasargumentsforfunctions.) >Therearetwoproblemswithusingdefunctionalizationhere. >1)Thesetofpossiblefunctionsisnotstaticallyknown--I'mcompilingatruntimeandmorefunctionsmaybedefinedlater. >2)EvenifIknewallthepossiblefunctions,codebloatandcompilationtimewouldbecomeaseriousproblem. Tobehonest,Iexpectthatthatshouldn'tbeaproblemgettingdefunctionalizationtowork.Onewayoflookingat(partof)defunctionalizationisthatitjustspecializeswithaspecificsymbolapplywhichwillswitchbetweenallthefunctionspossible,wherethegenericcaseswitcheswithclosurearguments.IfIdon'tconsiderclosuresforamomentIexpectthataddingfunctionstoapplyatruntimemightnotbesuchabigproblemasyouthink? [Ijustreadyourotherresponse.Yah,weseemtoagree.But,tobehonest,ifyou'regoingtoconsiderdefunctionalization,you'regoinginthedirectionofallowinggenerichigher-orderprogramming,andthenIguesstherearebetterapproachesthandefunctionalizationaroundthen.I.e.,thenjustbitethebulletandimplementalambdainterpreter.] BymarcoatSun,2010-11-0722:52|loginorregistertopostcomments Bad:possibleexponential >Bad:possibleexponentialblow-upincode.(Imaginespecializingcompose(f,g,x)=f(g(x))forsinandcospairs,youendupwithsin/sin,sin/cos,cos/sinandcos/cosspecializations.) True,thoughI'mhopingthatsinceIonlyspecializefunctionargumentstheprogrammeractuallyusesthenumberofspecializationswillstaymanageable. >Uh,Ihaveseriousproblemsreadingthisexample.Whatvariableisi?Doyouloopovertheelementsofx?Towhatisg(i)bound,tothei-thelementofx?...g=lambdai->x[i]?) Sorry,thatwassloppytersenessonmypart.You'rerightaboutthedefinitionofg,it'sjust"g=lambdai->x[i]".Sothecode"map(g,indices)"simplyreturnsasubsetof"x"atthelocationsspecifiedin"indices". >therearebetterapproachesthandefunctionalizationaroundthen.I.e.,thenjustbitethebulletandimplementalambdainterpreter. IwishIcould,butmyexecutionenvironmentmakesitdifficult.I'mcompilingtoNVidiagraphicscardsandevenonthenewestmodelswheredynamicmemoryallocationandfunctionindirectionsarepossibletheperformancehitisterrible.Howdoyouimplementalambdainterpreterwithoutfunctionpointersordynamicmemoryallocation? ByAlexRubinsteynatMon,2010-11-0817:54|loginorregistertopostcomments LooksHard >IwishIcould,butmyexecutionenvironmentmakesitdifficult.I'mcompilingtoNVidiagraphicscardsandevenonthenewestmodelswheredynamicmemoryallocationandfunctionindirectionsarepossibletheperformancehitisterrible.Howdoyouimplementalambdainterpreterwithoutfunctionpointersordynamicmemoryallocation? Well,it'squitehardtoseeformeatthemomentwhatcoderemainsontheCPU,andwhatcodegoestotheGPU.ItlookedliketheGPUhasbranching,soyoucouldimplementfunctionpointersbyaspecializedroutinewhichbranchesonalabelinsteadofafunctionpointer.Dynamicmemorymanagement,Idon'tknow.But,hey,alldynamicmemorymanagementisstaticmanagementwithagarbagecollectorontop,sothatmightbesolvabletoo. [Iamalsopuzzlingmyselfwithalambda-calculusinterpreterwhichjustneedstwostacks,oneforstackframes,oneforvalueswithgarbagecollection,butthatisabitprematureIthink.] Havingsaidthat,IwasthinkingthatasimpleHaskell-likeapproachwhereGPUcodingiscapturedinaseparatemonadmustbedoable.Thusitisalwaystheoreticallypossible,butsuchanapproachisprobablytooslowforyourrequirements. ButtakeitupwithShap,hehastonsmoreexperienceinthisthanIdo. BymarcoatTue,2010-11-0916:47|loginorregistertopostcomments Runtimecompilationoughttobequick >Well,it'squitehardtoseeformeatthemomentwhatcoderemainsontheCPU,andwhatcodegoestotheGPU. GPUcodeis(selectively)synthesizedfromacombinationofanarrayoperator(map,reduce,scan,etc...)anditsknownfunctionargument.ThefunctionargumentistheninlinedintoatemplateGPUprogram. >ItlookedliketheGPUhasbranching,soyoucouldimplementfunctionpointersbyaspecializedroutinewhichbranchesonalabelinsteadofafunctionpointer. That'strue--Ithinkthat'sthedefunctionalizationsledgehammerapproach.Compileallyourfunctionbodiesintoasingleswitchstatement--iftheuserdefinesanewfunction,recompilethisgiganticapplyfunction.Ibelievethisapproachwouldworkbutthecompilationcostswouldbeextreme. ByAlexRubinsteynatTue,2010-11-1619:36|loginorregistertopostcomments Dependsontheshadermodel Dependsontheshadermodelyouarecompilingfor.Forabasicpixel/vertex/geometryshadercombination,compilationisfairlyfast,atleastforsmallamountsofcode.ForanyCUDA/OpenCL/ComputeShader,compilationisdirtslow...evencompilingonceatrun-timeisprobablytoomuch.ItsnotcleartomeifthenewlinkingfeaturescomingoutinCUDA/DirectComputealleviatethisproblemwhensourcecompilationisn'tneeded(thoughtheyarelanguagespecific). BySeanMcDirmidatWed,2010-11-1706:20|loginorregistertopostcomments Compiletimes >compilationisfairlyfast,atleastforsmallamountsofcode. I'mquestioninghowsmallyoucankeepthecode--anyuseofafunctionvalueinaGPUcomputationneedstoinlinethebodyofeveryfunctioninyourprogramintoagiantswitch! Furthermore,ifyouwanttocreateaninteractivesystemyouwouldneedtorecompileeverytimetheuseraddsanewfunction. Iguessifyou'reonlysupportingnon-interactivecompilationandareOKwithalossofmodularity,youcandoCFAtolimitthenumberofpotentialfunctionsflowingtoaparticularargument. >ForanyCUDA/OpenCL/ComputeShader,compilationisdirtslow...evencompilingonceatrun-timeisprobablytoomuch. CompiletimeswithCUDAdependonhowandwhatyou'recompiling.Myproject-mateandIchosetodirectlyemitPTX(GPUpseudo-assembly)andthecombinedtimeforbothourcompilerandNVidia'sJIThasbeeninthe10sofmilliseconds(forsmallishbutstillusefulbenchmarks).Ifyoutrytousethenvcctoolchaindirectly(likeCopperheadorNikola)youcanexpecttowaitasecond.Evenworse,ifyoursystemusesC++templatestoparameterizeCUDAkernels(orusesatemplatedlibrarylikeThrust)thencompiletimescanreallydrag.Thereare,however,manyadvantagestoreusinganexistingcompilersoIdon'tnecessarilyadvocategoingthePTXroute. ByAlexRubinsteynatWed,2010-11-1707:09|loginorregistertopostcomments DX11supportsdynamicHLSL DX11supportsdynamicHLSLcompilationdirectly(evenforcomputeshaders),butitcanbeveryveryslowwhentheyneedtodosomeheavydutyoptimizationsinhere.Theproblemappearstoarisewhentheiraresmallloopsthatcanbeunrolledinhugeways.I'mnotsureifthisoverheadismandatory,slightlybeneficial,orhugelybeneficial.Iaskedthedevsbuttheycouldn'tgivemeanythingthatcouldbeconsideredafinalanswer.Dynamiccompilationofshadersisnowdiscouragedevenifstillsupported,unfortunately. ThereisstillalotofworktobedoneonfastcompilationforGPUs.YourPTXapproachsoundsverynice,especiallysinceyouaren'tusinganyofthehighlevelfeaturesofthelanguage,butIwonderwhatthetradeoffsareofskippingtheNVIDIAtoolchain. OutsideofGPUs,Microsofthasbettersupportfordynamiccompilationthananyonewiththedynamiclanguageruntime.ThismeansBlingcangeneratefairlyefficient(unabstracted,non-allocating)codetomaintaintheGPUshaders. BySeanMcDirmidatWed,2010-11-1708:16|loginorregistertopostcomments Compilationtradeoffs >butIwonderwhatthetradeoffsareofskippingtheNVIDIAtoolchain Wehaveendedupreinventingnvccinatrimmeddownlessfeatureful(andsometimesbuggier)form.ThisincludesmaintainingaC-likekerneltemplatinglanguage,aninternalrepresentationofPTXandasimplecompilerbetweenthetwo.IfdynamiccompilationofCUDACcodeweresufficientlyfastIwouldhappilyswitchtousingthatinstead. ByAlexRubinsteynatWed,2010-11-1719:05|loginorregistertopostcomments Exponentialcodeexplosives Itseemstomethatparticularproblemcouldberesolvedwithalittlehotspotanalysis.Thiswouldonlyspecializedlong-lived,curriedfunctionsthatareusedoften.Andthespecializedfunctionscouldstillbegarbagecollected. Alternatively(oradditionally)youcouldfavora'specialize'tagtosupportprogrammerannotations.specializefx=fx.Thereisnofunctionalsemanticstothespecializetag,butittellstheruntimetogenerateanewcompiledfunction.Inthissense,itisuptherewith'par'and'seq'inHaskell. Aside:I'vebeenleaningevermoretowardsfavoringthose'identity'tagsasabasisforprogramannotation,tothepointthatIcontemplateevencommentsbeturnedintoacommenttaginmyASTs:commentcv=v.I'vecometothinkofpragmasasalittleugly,incomparison.Itmightbeinterestingtoprovidethefunctionsbehindtheseannotationsviapluggablemodules. Presumably,wecouldalsotag'inline'and'noinline'ifwewishedit.inlinef=f.noinlinef=f.Thesewouldcompose.Toinlineaspecializedfunction: z[i]←inline$specializefarg[i] ThoughI'mnotsurethiswouldmakesenseunlessyou'llbeapplyingadditionalargumentstof,andit'sallabithand-wavyuntilyouaddcompilersupportfortheannotationsandhammeroutthedetails. BydmbarbouratWed,2010-11-1717:36|loginorregistertopostcomments True,butmaynotberelevant Tobehonest,Ithinktheanswerisquitetrivial:Itisalwayspossibletospecializeafunctionwithaconstantbysubstitution. Whilethisstatementistrue,therearetwoproblemswithitinthecontextofwhatAlexistryingtodo: ExtensivepracticalexperienceintheCLRworldhasdemonstratedthattemplate-styleexpansionovervaluetypes(i.e.unboxedtypes)leadstoanexplosiveamountofcode.Alex'sstatementthatindirectiononthismachineisproblematicleadsmetoinfer-oratleastask-whetherhemayberelyingheavilyonunboxinginhiscontext. Closurepointersaretheresultofrun-timeallocation.Theyarenotcompile-timepointers.Infact,theyarenotconstantsatall. Allthatbeingsaid,I'mverysurprisedatthepropositionthatasensiblearchitecturecanexistwithoutadequateforfunctionpointers(ormoreprecisely,forindirectcontroltransfer),becauseeveryarchitectureIknowaboutimplementssomeformofRETURNinstruction.Whileitmaybetruethatthereisnostraightforwardindirectcallorindirectjumpinstruction,cleveruseofstack+RETURNshouldsufficeinthiscase. Alex:ifyoucontactmeoffline(shapateros-os.org),I'ldbehappytodiscussimplementationtechniqueswithyou.Therearearangeoftechniqueshere,andIlookedatanumberoftheminthecontextoftheBitCimplementation.Therightchoicedependsonbothyourtargetarchitecturesandyourinteroperabilityrequirements. Oh.IshouldperhapsaddthatwerelyheavilyonavariantofdefunctionalizationinBitC:weimplementpolymorphismbyconcretelyinstantiatingeachprocedure/valueateachtypewhereitisused.Forkernel-levelcodetheabilitytodothatisahardrequirement,butwe'velongsinceconcludedthatinmanypracticalimplementationswe'regoingtohavetospreadthatinstantiationbetweencompile-,link-,andrun-time.Thankfully,wehavesome[perhapsexcessively]cleverideasabouthowtoachievethatinatype-safe,low-overheadway. ByshapatMon,2010-11-0822:48|loginorregistertopostcomments BegtoDiffer >Closurepointersaretheresultofrun-timeallocation.Theyarenotcompile-timepointers.Infact,theyarenotconstantsatall Interesting.IfItakegraphrewritesemanticsforalambda-calculusinterpreterthenclosurescorrespondtothunks,i.e.acallf(x,y)correspondstopushingaclosure[F,x,y]intoaheap(orstack)andxandyareconstantvalues,whereFisthetagforf,andevaluatingthat.There'snoreasontonotseethatasaternaryconstant(symbol)whichisevaluatedbysomeabstractmachinewhichknowsthemeaningofF,i.e.theproceduref. Whenmovingfromtheabstractmachinetoaconcretemachine,Ithinktheobservationisthatpeoplejustseethat,hey,itisjustfastertopushx,pushy,andpushthepointertoftoastackandcallthatandpopoftheframeinsteadofimplementinganabstractmachinewhichallocatessuchathunkontheheapandsomewhatawkwardlyonlydoessymbolicmanipulation. Butit'safinedistinctioninsemantics:Isevaluationmanipulatingconstantsymbols,ordosymbolshavedynamicmeaning?Itjustdependswhereyouplaceadividerinyourhead,Iguess. Allinall,IthinkIdon'tagreewithyouthere. [Notethatthisalsoholdstrueifonethinksofclosuresastupleswhichcapturepartofanenvironmentinafunctionwhichshouldbeevaluated.I.e.,wedeterminedfandx,thusconstruct[F,x]andawaitthevaluefory.] [Iguessintheenditalsojustmatterwithwhichglassesyoulookatcode.Inthelambdacalculusanyfixedvalueisaconstant,whereasinC,anythingcompile-timedeterminableisaconstant.It'sjustamix-upofterminology.] BymarcoatTue,2010-11-0917:04|loginorregistertopostcomments PerhapsIammissingsomethingobvious Generally,whenpeoplestarttalkingaboutclosures,theyaredoingsointhecontextoflanguageshavingfirst-classprocedures.Insuchlanguages,closureformationismorethansymbolmanipulation.Given: (deff(lambda(x) (lambda(y) (+xy)))) (f5) Theresultreturnedbyf5mustrecordthevalueofxwhichisnotknowableatcompiletime.Itmustdosoforanindeterminatetime.Wecannotusuallytellhowmanytimesfwillbecalled,sowecannot(ingeneral)preallocatetheclosureitselfatcompiletime.Theclosuremustbeallocatedatruntime,andifproceduresarefirstclassitcannotresideonthestack. Whichbringsmebacktomyoriginalstatement:closurepointersaretheresultofrun-timeallocation;theyarenotcompile-timeconstantsatall. Thereareseveralimportantandusefulspecialcases: Closuresconsistingexclusivelyofthecaptureofglobalvariablescanbeentirelyelided. Closureswhoseassociatedproceduredoesnotescapecanbestack-allocated. Inspecificcaseswemaybeabletodeterminesomeofthethingsthatwecan'tdetermineingeneral,andinthosecasesseveralofmystatementsdonotapply. Yourproposaltoviewaclosureasaternaryconstantpresumesthatyoucanknowatcompiletimeallpossiblevaluesofxandy.ForveryspecialcasesIcanimagineenumeratingthem,butit'shardtoimaginethatthiswouldbeusefulinmanyrealcases. ByshapatWed,2010-11-1700:14|loginorregistertopostcomments JustDifferentTerminology Myoriginalremarkwasthatyoucanalwaysspecializefunctionsbysubstitutingaconstant. Withconstantandfunction,Imeaneither5andfinyourexampleabove,orcosandmapintheoriginalpost. Ithinkabetterphrasingwouldhavebeen:Youcanalwaysspecializefunctionargumentsbysubstitutingaconstantargumentifthatconstantisstaticallydeterminable. Onaslightlydifferentlevel:WhatIthinkIseeinthedefunctionalizationapproachisthatthatprocedureessentially'twists'higher-orderfunctionalprogramstofirst-order(interpretative)programsbyevaluatingclosures,expressedasdata,withanapplyfunctionwhennecessary.Giventhat,inLCterms,onecanseeclosuresasrecords(orinLCterms,constants)whicharemappedtofunctions(i.e.,theirmeaning)withanapplyoperator.[Onhindsight,Iprobablyshouldn'tcallarecordaconstant.] Thishaslittletodowiththeoperationalinterpretationofaclosureasarun-timedecoratedpointer.Hencethemisunderstanding. Yourinterpretationmakesbettersenseintraditionalcompilerbuildingterminology. BymarcoatWed,2010-11-1701:07|loginorregistertopostcomments You'dprobablyneedlambda You'dprobablyneedlambdaliftingforclosuresgeneratedontheGPU,butyoucertainlycouldspecializeanyclosurespassedintothesnippetofcodethatistoeventuallyrunontheGPU.AsnotedintheOP,thiscompilationcanoccurat'run-time'. IguessI'mfeelingalittleconfused.What,exactly,wouldyoucallthetimeofrun-timecompilation?Isitrun-timeorcompile-time?orjustin-time? Agreaterproblemwouldbesemanticmutability.IfaclosureencapsulatesmutablereferencesitmaybeimpracticaltoreplicatethatclosuretotheGPU.Butifourvaluesandvariablesareimmutable,thenwecanoftenspecializethemastheybecomeknown-and'constant'canreferjustaseasilytoimmutablerecordsandclosuresastovalues. BydmbarbouratWed,2010-11-1721:12|loginorregistertopostcomments Whatyoucandowitha Whatyoucandowithafunctionpointerisonlycallit:f(x).Thereareafinitenumberofdifferentfunctionsinyourprogram.Sowhatyoudoisthis:replaceacalltoafunctionpointerf(x)withcall(f,x).Thenwritecalllikethis: functioncall(f,x){ switch(f){ caseFUNCTION1:function1(x); caseFUNCTION2:function2(x); ... } } ByJulesJacobsatSun,2010-11-0712:59|loginorregistertopostcomments Defunctionalization? Thisisthesameasdefunctionalization,right?ExceptforclosuresIwouldneedtohavebothafunctiontagandadatastructurecontainingtheimplicitarguments. Anyhow,Ithinkthisdoesn'treallyworkformesincecompilationisn'thappeningallatoncebutratherdynamically,whichallowsfornewuserdefinedfunctionstoappear. ByAlexRubinsteynatSun,2010-11-0717:40|loginorregistertopostcomments Don'tforgetlink-time Anyhow,Ithinkthisdoesn'treallyworkformesincecompilationisn'thappeningallatoncebutratherdynamically,whichallowsfornewuserdefinedfunctionstoappear. Thinkingaboutlink-timehasbeenabighelpforme.Atlink-time,youalwayshaveastaticallyknownsetofdefinitions.Onceyoubringdynamiccompilationintothepicture,yourlink-timehappens(also)atruntime(inthedynamiclinkingloader).Butthatdoesn'tchangethefactthatstuffatlink-timeisstillstatic,andallowsyoutodovariousoptimizations.It'sjustthatwithdynamiclinking,youmighthavetoinvalidateearlierassumptions,andrecompilethingsjust-in-(link)time. ByManuelJ.SimoniatSun,2010-11-0718:14|loginorregistertopostcomments Lossofcodecaching? recompilethingsjust-in-(link)time. Doesn'tthatmeanthatIwouldlosetheabilitytocachepreviouslycompiledcode?Iftheuserloadsordefinesanewfunctionandthenusesitwithsomehigherorderoperator,Iwouldhavetocompletelyrecompilethedefunctionalizedimplementationofthatoperator--duplicatingmostoftheworkofthepreviouscompilation. ByAlexRubinsteynatSun,2010-11-0721:12|loginorregistertopostcomments No >Doesn'tthatmeanthatIwouldlosetheabilitytocachepreviouslycompiledcode? No,defunctionalizationmeansyouactuallydon'tneedtospecializehigher-orderfunction;thehigher-orderfunctionismadegenericbybeingabletoswitchbetweenfunctionswithanapplyoperator.(Ok,youneedtoreaddefunctionalizationwithabitofaskewedeyeforthis.) BymarcoatSun,2010-11-0723:01|loginorregistertopostcomments Recompilationofapply? thehigher-orderfunctionismadegenericbybeingabletoswitchbetweenfunctionswithanapplyoperator. EvenifIdidn'tspecializeandinline'apply'intothebodyofahigherorderoperator(whichinmycaseImust),wouldn'tIstillhavetorecompile'apply'everytimeanewfunctiongetsdefined? ByAlexRubinsteynatMon,2010-11-0817:43|loginorregistertopostcomments Inthatcasethereisno Inthatcasethereisnoothersolutionbutrecompilingcertainfunctions.Forexampleiftheuserdefined: functionfoo(f){f()} Whatareyougoingtocompilethatto?Iftheuserdefinesanewfunctionbar,hecanpassthattofoo.Butatthetimeofthedefinitionoffootherewasnobar. Thatsaidyouaregoingtocompiledowntohardware,butnotallfunctionsareknownatcompiletime?! ByJulesJacobsatMon,2010-11-0802:03|loginorregistertopostcomments Perhapshemeantthatnot Perhapshemeantthatnotallfunctionsareknownstatically,butareknowdynamicallyonageneralpurposeCPUthatmastersthehardware? BySeanMcDirmidatMon,2010-11-0804:59|loginorregistertopostcomments Yep Yes,that'swhatImeant.Aninterpreterinitiatestypespecializationandcompilationatruntime.Thismesseswithmyusualnotionsofstaticanddynamic.Ithinkitmakesmoresenseformetothinkaboutthedistinctphasesas(1)static(2)compile-time(3)dynamic. ByAlexRubinsteynatMon,2010-11-0817:35|loginorregistertopostcomments ThisisexactlywhereI'mat Amoreaccuratewaytodescribeit:mastercompiletime,masterruntime/slavecompiletime,slaveruntime.Dynamiccompilationandcodegenerationallowsyoutoblurmastercompile/runtime,especiallyifyouaregenerating/compilingcodefortheslave.Butthisdistinctionisverycommon...GPU(DirectX/OpenGL/OpenCL/CUDA/...)shaderAPIsallowyoutosubmitshadercodedynamicallytobecompiledfortheGPU. Inmycase,allofmyfunctionalcodeonlyneedstoexecuteonthemasteringCPU,andtheresultisbasicallyafunctionlessexpressiontreethatcantheneasilybetranslatedintoHLSL.Theblowupinloopsandinlinedproceduresissignificant,butatleastwitholderversionsofHLSL,thesewouldhavebeenblownupanyways(GPUsdidn'tsupportrealprocedures,indirection,orevendynamicloopingverywell).However,withnewerGPUhardwarethatdoessupportrealprocedures,indirection,anddynamiclooping,I'llprobablyhavetomodifythisapproach. BySeanMcDirmidatMon,2010-11-0823:27|loginorregistertopostcomments Stages Ilikeyourdescriptionofthestages--theyareclearerthanthetrioIusedbefore. I'malsotryingtoendwitha"functionless"representation(necessaryforefficientexecutionevenonnewerGPUs),butIhavetodoalotofworktotransformfromamoreexpressivesourcelanguageintothisfinalform. Ithinkallprojectstryingtorunhigh-levelcodeongraphicsprocessors(Copperhead,Nikola,etc...)areforcedto"concretize"codeintoalanguagewithouttheusualabstractions(functions,polymorphism,etc...). ByAlexRubinsteynatTue,2010-11-1622:06|loginorregistertopostcomments Yep,abstractionsarefine Yep,abstractionsarefineaslongastheycanmeltawayatruntime.ThisiswhyIthinkGPUsareanexcellentplaceformeta-programmingtothrive...theyreallyhaveapurposethere. BySeanMcDirmidatWed,2010-11-1706:29|loginorregistertopostcomments StagedcomputationinBling HereisasolutionthatworksformeinBlingforGPUshaders(whichisverymuchhardwarethatdoesn'tsupportindirectionverywell):writecodethatgeneratesthecodethatwillrunonthehardware.Blingdressesupexpressiontreeswithstronglytypedabstractions,whichissortofpainfultodoinC#andwouldbeanon-issueinadynamicallytypedFPLlikeClojure.Buttheresultisthatitseasytocompose,manipulate,andotherwisebuildintricatecomputationseasilyusingareallanguage(C#)ratherthanastrippeddownshaderlanguage.Anotherbenefitisthatyoucanmakeintegrationbetweenstagedandnon-stagedcodemoreseamless;e.g.,bytransparentlyrefreshingglobalparametershaderregistersviaCPU-hostedcodetointerfacewithgeneratedshadercode. YoucanusethefullpowerofyourFPLinmanipulatingexpressiontrees,butitdefinitelybecomesmoredifficultwhenyouactuallyneedindirectiontooccuronthehardware.I'vedoneabitofthisbybasicallyinliningifstatementsinthegeneratedcode,thoughcaremustbetakennottoexplodejumptables.I'vefoundthatIhaven'tneededthatmuchindirectiononthehardwareyet,thoughIhaven'texploredthetechniquefullyyet. BySeanMcDirmidatMon,2010-11-0801:33|loginorregistertopostcomments Expressiveness? Blinglooksverycool,IwishIhadknownaboutwhenIwas(inaformerlife)codinginC#. I'mcuriousabouttheexpressivenessoftheexpressionsub-language.Isitrestrictedtoscalarcomputations,orcanyouuseaggregateoperatorswithinashader? ByAlexRubinsteynatTue,2010-11-1621:47|loginorregistertopostcomments Ifyoumeanaggregating Ifyoumeanaggregatingresultsontheshader,Blingdoesn'treallysupportthat.Youcoulddosomehackwithageometryshader,butitstoodirtytotalkabout.RightnowBling'sGPUsupportispurelyforthepurposeofwritinggraphicsshaders. BySeanMcDirmidatWed,2010-11-1706:27|loginorregistertopostcomments Logs: HackThePlanet ;JavaLobby ;DailyPython-URL ;DailyWTF ;PHPeverywhere;(more) Wikis: WikiWiki ;Erlang ;CommonLisp ;Haskell ;Squeak ;Tcl;ProgramTransformation Browsearchives «June2022 Su Mo Tu We Th Fr Sa 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 Activeforumtopics ThehiddencostofexceptionhandlingCUE:Anopen-sourcedatavalidationlanguageMyarticleonstatemachinesandDSLevolutionDoweneedexactlytwobindingconstructs?ShenStandardLibrarymore Newforumtopics ThehiddencostofexceptionhandlingMyarticleonstatemachinesandDSLevolutionDoweneedexactlytwobindingconstructs?Cicadalanguage--anewdependentlytypedlanguageCUE:Anopen-sourcedatavalidationlanguagemore Recentcomments Nicewrite-up!1week6hoursagoltu3weeks1dayagoItmaybetoosubtleforme....12weeks3daysagoThequestionisaboutapotentialsubtledifference12weeks5daysagoThereisnonecessarydifference.13weeks1houragoThesis-Antithesis19weeks6daysagoNicereferences25weeks4daysagoProgrammingwithlatticestructure25weeks5daysagoAniceoverview26weeks4daysagoUpdatedlink29weeks3daysago
延伸文章資訊
- 1Example of VK_KHR_buffer_device_address? : r/vulkan - Reddit
I know this was probably added mainly to support raytracing shaders, but is there a good example ...
- 2dcl_interface (sm5 - asm) - Win32 apps | Microsoft Docs
Declare function table pointers (interfaces). ... In the HLSL virtual inlining model it's always ...
- 3Alexandre Mutel on Twitter: "Advanced HLSL using closures ...
Advanced HLSL using closures and function pointers: Shader languages like HLSL, Cg or GLSL are no...
- 4Tutorial 4: Buffers, Shaders, and HLSL - RasterTek
fx HLSL shader program through the technique pointer. The first step in this function is to set o...
- 5Restrictions on function parameters · Issue #1139 - GitHub
Regarding pointer arguments: In SPIR-V for Vulkan, a pointer operand to a function must be into o...