The C Language You Did Not Know: Pointers (你所不知道的C語言：指標篇)

2024-11-18


---
tags: DYKC, C, CLANG, C LANGUAGE, pointer
---

# [The C Language You Did Not Know](https://hackmd.io/@sysprog/c-prog/): Pointers

*Pointers serve as the bridge between "memory" and "objects"*

Copyright (**慣C**) 2015, 2016, 2018 [宅色夫](http://wiki.csie.ncku.edu.tw/User/jserv)

==[Live stream recording (part 1)](https://youtu.be/G7vERppua9o)==
==[Live stream recording (part 2)](https://www.youtube.com/watch?v=Owxols1RTAg&feature=youtu.be)==

---

## You can still write programs without any grasp of pointers

In a 5-minute talk at [CppCon 2018](https://cppcon.org/cppcon-2018-program/), "[Let's learn programming by inventing it](https://www.youtube.com/watch?v=l5Mp_DEn4bs)", Ólafur Waage pointed out that when learning C from K&R, the much-feared "pointer" is not introduced until Chapter 5, roughly halfway through the book. This can be read as "you can master most of C's features before understanding what a pointer is".

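## This lecture is not "brain gymnastics"

* A [brain teaser](http://stackoverflow.com/questions/8208021/how-to-increment-a-pointer-address-and-pointers-value/8208106#8208106) on Stack Overflow

The "Understanding Declarations" case from [C Traps and Pitfalls](http://www.literateprogramming.com/ctraps.pdf):
```cpp
(*(void(*)())0)();
```
can be rewritten as the following statements:
```cpp
typedef void (*funcptr)();
(*(funcptr) 0)();
```

- [ ] [godbolt](http://gcc.godbolt.org/): view the code generated by gcc directly in the web page
```cpp
int main() {
    typedef void (*funcptr)();
    (*(funcptr) (void *) 0)();
}
```
The corresponding assembly, built with `-Os` (optimize for size):
```assembly
main:
    pushq %rax
    xorl %eax, %eax
    call *%rax
    xorl %eax, %eax
    popq %rdx
    ret
```
> [source](https://media.giphy.com/media/G10pb1bOz98oE/giphy.gif)

An interview question from a tech company (note that `int &` is a C++ reference, so this declaration is C++ rather than C):
```cpp
void **(*d) (int &, char **(*)(char *, char **));
```
How to read the declaration above:
* d is a pointer to a function that takes two parameters:
    - a reference to an int and
    - a pointer to a function that takes two parameters:
        - a pointer to a char and
        - a pointer to a pointer to a char
    - and returns a pointer to a pointer to a char
* and returns a pointer to a pointer to void

The declaration of the [signal system call](http://man7.org/linux/man-pages/man2/signal.2.html) is also a classic:
- [How to read this prototype?](http://stackoverflow.com/questions/15739500/how-to-read-this-prototype)

## Go has pointers too

On April 27, 1999, Ken Thompson and Dennis Ritchie received the 1998 [National Medal of Technology](https://en.wikipedia.org/wiki/National_Medal_of_Technology_and_Innovation) from US President Clinton. In December of the following year, Ken Thompson, then 57 years old, retired from Bell Labs. After retiring from Bell Labs, he became a pilot. Perhaps inspired by soaring through the skies all day, he joined Google in 2006, and the following year he, together with his former Bell Labs colleague [Rob Pike](https://en.wikipedia.org/wiki/Rob_Pike) and Robert Griesemer, proposed the brand-new Go programming language inside the company. Go can be used in many areas, cloud computing included.

A good thing like pointers naturally had to be carried over from C to Go, along with the wonderful struct.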

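- According to the first [Golang Talk](https://talks.golang.org/2009/go_talk-20091030.pdf), Robert Griesemer, Ken Thompson, and Rob Pike held that the world was changing, yet systems languages had seen no dramatic change for a decade
- What the languages before Go failed to deliver:
    - Adding more libraries is not the right direction
    - The whole architecture has to be rethought in order to develop a new programming language

At the implementation level, pointer and struct usually come as a pair (explained below).

## First, list what you already know

{%youtube t5NszbIerYc %}
> Professor [David Brailsford](http://www.cs.nott.ac.uk/~psadb1/) explains pointers in C

* [C 語言: 超好懂的指標](https://kopu.chat/2017/05/15/c%E8%AA%9E%E8%A8%80-%E8%B6%85%E5%A5%BD%E6%87%82%E7%9A%84%E6%8C%87%E6%A8%99%EF%BC%8C%E5%88%9D%E5%AD%B8%E8%80%85%E8%AB%8B%E9%80%B2%EF%BD%9E/) (an easy-to-follow introduction to C pointers, in Chinese)
* [Everything you need to know about pointers in C](https://boredzo.org/pointers/)
* Open questions
    * How should the parameters and design considerations of [qsort(3)](https://linux.die.net/man/3/qsort) be explained?
    * Why is the code I come across so often written like this?
    ```cpp
    struct list **lpp;
    for (lpp = &list; *lpp != NULL; lpp = &(*lpp)->next)
    ```

## Going back to the C language specification

The [Development tools and specification standards](https://hackmd.io/s/HJFyt37Mx) lecture stressed the importance of consulting first-hand sources. The following are the pointer-related passages in ISO/IEC 9899 (referred to as "C99"):

- Searching the [specification](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1256.pdf) (PDF) for "***object***" gives 735 matches
    - Searching for "***pointer***" gives 637 matches. Interestingly, many textbooks skip object and rush into pointer, unaware that the two are two sides of the same coin
    - object != object-oriented
        - the former is about "data representation", while the latter is about "everything is object"
- C11 ([ISO/IEC 9899:201x](http://www.open-std.org/jtc1/sc22/WG14/www/docs/n1570.pdf)) / [web version](http://port70.net/~nsz/c/c11/n1570.html)
- Do not always read `&` as "and"; when pointer operations are involved, read it as "address of"
    - C99 [6.5.3.2] Address and indirection operators refers to the '==&==' address-of operator
- C99 [3.14] ***object***
    - region of data storage in the execution environment, the contents of which can represent values
    - In C, an object is a region in which ==data== is stored at run time and whose contents can explicitly represent values
    - Many people mistakenly treat (int) 7 and (float) 7.0 as equivalent in a C program. From the data-representation point of view the two are entirely different: the former corresponds to binary "111", while the IEEE 754 representation of the latter looks nothing like "111"
- C99 [6.2.4] ***Storage durations of objects***
    - An object has a storage duration that determines its lifetime. There are three storage durations: static, automatic, and allocated.
    > Pay attention to the concept of lifetime. In Chinese, the word for "initialize" (初始化) sounds like Pangu creating the world, which is easily misleading.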

    > In fact, the [English meaning](http://dictionary.reference.com/browse/initialize) of "initialize" is quite narrow: "to set (variables, counters, switches, etc.) to their starting values at the beginning of a program or subprogram."
    - The lifetime of an object is the portion of program execution during which storage is guaranteed to be reserved for it. An object exists, has a constant address and retains its last-stored value throughout its lifetime. If an object is referred to outside of its lifetime, the behavior is undefined.
    > Within an object's lifetime, its existence implies a corresponding constant memory address. Note that C only ever has call-by-value
    - The value of a pointer becomes indeterminate when the object it points to reaches the end of its lifetime.
    > A pointer acts as a "pronoun" (alias) for operating on an object. If the pointer is used to fetch the contents of the object it points to outside that object's lifetime, the result is unknown (per the passage above, the behavior is undefined). Consider doing `ptr = malloc(size); free(ptr);` first: a later `*ptr` accesses allocated storage whose lifetime has already ended
    - An object whose identifier is declared with no linkage and without the storage-class specifier static has automatic storage duration.
- C99 [6.2.5] ***Types***
    - A pointer type may be derived from a function type, an object type, or an incomplete type, called the referenced type. A pointer type describes an object whose value provides a reference to an entity of the referenced type. A pointer type derived from the referenced type T is sometimes called ‘‘pointer to T’’. The construction of a pointer type from a referenced type is called ‘‘pointer type derivation’’.
    > Pay attention to the terminology! This is evidence that C only has call-by-value: whatever is passed to a function is always a value
    > Look closely at "incomplete type" here; it is explained later.

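    > Distinguish `char []` from `char *`
    - Arithmetic types and pointer types are collectively called scalar types. Array and structure types are collectively called aggregate types.
    > Note the term "scalar type": the `++`, `--`, `+=`, `-=` operations we will see later all operate on scalars
    > [[source](http://www.cyut.edu.tw/~cpyu/oldphweb/chapter3/page3.htm)] A scalar has magnitude only and can be expressed with a number and a unit (for example, temperature = 30^o^C).
    > Scalars obey the rules of arithmetic and ordinary algebra.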

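    > Note: a scalar has a "unit" (the `sizeof` operator gives the "size" of that unit). Suppose `ptr` has a pointer type: `ptr++` does not simply add 1 to the numeric address; it advances the pointer by one "unit", that is, `sizeof(*ptr)` bytes
    - An array type of unknown size is an incomplete type. It is completed, for an identifier of that type, by specifying the size in a later declaration (with internal or external linkage). A structure or union type of unknown content is an incomplete type. It is completed, for all declarations of that type, by declaring the same structure or union tag with its defining content later in the same scope.
    > This is the principle behind the forward declaration technique common in C/C++. For example, a header can declare `struct GraphicsObject;` (without giving the detailed definition), after which `struct GraphicsObject *initGraphics(int width, int height);` is legal but `struct GraphicsObject obj;` is not
    - Array, function, and pointer types are collectively called derived declarator types. A declarator type derivation from a type T is the construction of a derived declarator type from T by the application of an array-type, a function-type, or a pointer-type derivation to T.
    > This sentence matters: three seemingly unrelated terms, "array", "function", and "pointer", are all classified as derived declarator types. Readers who are surprised at this point do not yet understand C well enough

:::info
In calculus, a "derivative" (導數) of a function at a point describes the rate of change of the function near that point. In essence, the derivative is a local linear approximation of the function, obtained through the concept of a limit. (Picture the graph of a real-valued function: the derivative at a point equals the slope of the tangent line to the graph at that point.)

:notes: The KK phonetic transcription of derivative is [dəˋrɪvətɪv], while that of derivation is [d,ɛrəv'eʃən]
:::

Back to C: a value you see is a scalar, but it may also come from a declarator type derivation applied to some type, which in practice corresponds to a derivation of an array, function, or pointer type.

**(Exercise)** Set the 32-bit integer variable at absolute address `0x67a9` to the value `0xaa6`. How would you write it?
```cpp
*(int32_t * const) (0x67a9) = 0xaa6;
/* Lvalue */
```

- A pointer to void shall have the same representation and alignment requirements as a pointer to a character type.
> Key statement! It specifies that `void *` and `char *` have interchangeable representations
```cpp
void *memcpy(void *dest, const void *src, size_t n);
```

## English matters

Install the `cdecl` program, which can generate C declarations for you.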

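```shell
$ sudo apt-get install cdecl
```
Usage example:
```shell
$ cdecl
cdecl> declare a as array of pointer to function returning pointer to function returning pointer to char
```
which produces the following output:
```cpp
char *(*(*a[])())()
```
Plugging in the wording of the C99 specification quoted earlier gives:
```shell
cdecl> declare array of pointer to function returning struct tag
```
```cpp
struct tag (*var[])()
```
If you cannot explain a C declaration in English, it usually means you do not understand it!

`cdecl` can also explain what a C declaration means, for example:
```shell
cdecl> explain char *(*fptab[])(int)
declare fptab as array of pointer to function (int) returning pointer to char
```

## The mystery of `void *`

- `void` did not exist in the earliest C and was only established in C89. Why design such a type?
    - In [the earliest C](https://www.bell-labs.com/usr/dmr/www/primevalC.html), any function without an explicitly stated return type defaulted to `int` (with `0` as the return value), which made it impossible to check a function prototype against its actual use
- The design of `void *` forces developers to go through an ==explicit== conversion or cast before they can access the underlying object; otherwise the compiler reports an error, which prevents dangerous pointer operations
- We cannot perform arithmetic directly on a `void *`
```cpp
void *p = ...;
void *p2 = p + 1; /* what exactly is the size of void? */
```
- In other words, `void *` exists to force users to use an ==explicit conversion== or a ==cast==, so as to avoid undefined behavior
- C/C++ [Implicit conversion](http://en.cppreference.com/w/cpp/language/implicit_conversion) vs. [Explicit type conversion](https://en.cppreference.com/w/cpp/language/explicit_cast)
- C99's [definition and explanation](https://www.ptt.cc/bbs/C_and_CPP/M.1460791524.A.603.html) of sign extension
- Some hardware architectures, such as ARM, require additional ==alignment==. On ARMv5 and earlier, operating on a 32-bit integer (uint32_t) requires the pointer to be aligned to a 32-bit boundary (otherwise dereferencing raises an exception). Hence, reading a uint16_t from a `void *` address has to be done with care. The two statements below are alternatives:
```cpp
/* may receive wrong value if ptr is not 2-byte aligned */
uint16_t value = *(uint16_t *) ptr;
/* portable way of reading a little-endian value, one byte at a time */
uint16_t value = *(uint8_t *) ptr | (((uint8_t *) ptr)[1] << 8);
```
Further reading: [Memory management, alignment, and hardware characteristics](https://hackmd.io/@sysprog/c-memory)

### Is `void *` really universal?

According to C99 6.3.2.3:8 [Pointers]
> A pointer to a function of one type may be converted to a pointer to a function of another type and back again; the result shall compare equal to the original pointer. If a converted pointer is used to call a function whose type is not compatible with the pointed-to type, the behavior is undefined.

In other words, C99 does not guarantee that conversions between pointers to data (in the standard, "objects or incomplete types", e.g. `char *` or `void *`) and pointers to functions are correct
- This may lead to undefined behavior (UB)
- Note: the C99 specification contains a whole series of UB; see the [Undefined Behavior lecture](https://hackmd.io/@sysprog/c-undefined-behavior)
> Further reading: [The Lost Art of Structure Packing](http://www.catb.org/esr/structure-packing/)

## There is no "double pointer", only "pointer to a pointer"

"Twin tails" (the left and right tails are "independent" entities) are not the same as "a ponytail of a ponytail" (a mapping from one entity to another)
- The Chinese character "[雙](https://www.moedict.tw/%E9%9B%99)" (double) connotes "symmetric" and "independent", which is nothing like how a "pointer to a pointer" behaves
- Saying "==double== pointer" is no longer a question of whether one understands C, but a matter of understanding the Chinese language

In C, everything is a value, so function calls are of course call-by-value only. A "pointer to a pointer" is a common technique for changing the "original value of a variable passed in".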

Consider the following code:
```cpp=1
int B = 2;
void func(int *p) { p = &B; }
int main() {
    int A = 1, C = 3;
    int *ptrA = &A;
    func(ptrA);
    printf("%d\n", *ptrA);
    return 0;
}
```
Memory layout up to and including line 5:
```graphviz
digraph structs {
    node[shape=record]
    {rank=same; structa,structb,structc}
    structptr [label="ptrA|&A"];
    structb [label="B|2"];
    structa [label="A|1"];
    structc [label="C|3"];
    structptr:ptr -> structa:A:nw
}
```
Memory layout after the value of `ptrA` is passed into `func` on line 6:
```graphviz
digraph structs {
    node[shape=record]
    {rank=same; structa,structb,structc}
    structp[label="p|