[M3devel] function pointers and comparison to nil? mis-typed function pointers?
Jay
jayk123 at hotmail.com
Sun May 25 02:48:32 CEST 2008
I see somewhat.
It's stuff around "closure".
The comparison of code bytes to -1 comes from If_closure for example.
The problem is presumably to come up with a unified representation of pointers to functions that may or may not be nested, while avoiding runtime codegen, even just a little bit, and for Modula-3 and C function pointers to use the same representation.
I don't think the present solution is really valid, and I am skeptical that there is a solution.
One of the requirements has to be dropped.
Sniffing code bytes and trying to decide if they are code or not as appears to currently happen is bogus.
I think the solution is to remove the requirement that a Modula-3 function pointer and a C function pointer are the same.
Except, well, that probably doesn't work -- it means you need two types of function pointers.
Darn this is a hard problem.
The runtime codegen required can be exceedingly simple, fast, and small IF it were allowed to be on the stack. But that's a killer these days.
I think you have to give up unification of "closures" and "function pointers".
If you take the address of a nested function and call it, you cannot access the locals of the enclosing scopes.
So in effect, you end up with "two types of function pointers".
Regular stateless ones and "closures" with some captured state.
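To make the "two types of function pointers" split concrete, here is a minimal C sketch. It is only an illustration, not how the Modula-3 compiler or runtime lays anything out. All names in it (PlainProc, Closure, OuterFrame, add_base, twice, call_plain, call_closure) are made up for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Plain, stateless procedure pointer: just a code address. */
typedef int (*PlainProc)(int);

/* Closure: a code address plus captured state. The layout is hypothetical. */
typedef struct {
    int (*code)(void *env, int arg);
    void *env; /* points at the enclosing procedure's locals */
} Closure;

/* Locals of a hypothetical enclosing procedure. */
typedef struct {
    int base;
} OuterFrame;

/* Body of the "nested" procedure: it reaches the enclosing locals via env. */
int add_base(void *env, int arg) {
    const OuterFrame *frame = env;
    return frame->base + arg;
}

/* An ordinary top-level procedure needs no environment. */
int twice(int arg) {
    return 2 * arg;
}

/* Calling a plain pointer is a single indirect call. */
int call_plain(PlainProc p, int arg) {
    assert(p != NULL);
    return p(arg);
}

/* Calling a closure passes the captured environment explicitly. */
int call_closure(const Closure *c, int arg) {
    assert(c != NULL && c->code != NULL);
    return c->code(c->env, arg);
}
```

For instance, with `OuterFrame f = { 10 }; Closure c = { add_base, &f };`, `call_closure(&c, 5)` yields 15, while `call_plain(twice, 5)` yields 10. The price is that the two kinds are different types and cannot be passed interchangeably, which is exactly the requirement that would be dropped.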
Thoughts?
I'm kind of stumped. It's a desirable problem to solve, and there is a purported solution in place, but that solution is completely bogus despite appearing to work for a long time, and I don't think a real solution exists. That is my understanding. I could be wrong on any number of points but I'm pretty sure.
I think you have to separate out function pointers and closures.
Sniffing what it points to is dubious, esp. as currently implemented.
If this is really the way to go, then signature bytes need to be worked out for all architectures that are guaranteed to not look like code.
Or vice versa -- signature bytes worked out that all functions start with, which is viable for Modula-3 but not for interop with C.
Currently a pointer-sized -1 is used.
That appears to be reasonable for x86:
    0:000> eb . ff ff ff ff
    0:000> u .
    ntdll32!DbgBreakPoint:
    7d61002d ff       ???
    7d61002e ff       ???
    7d61002f ff       ???
    7d610030 ffc3     inc ebx
but the instruction encodings or disassembly on other architectures would have to be checked.
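As a rough illustration of the sniffing check, here is a small C sketch that models memory as plain byte arrays instead of dereferencing real code addresses. It assumes a 32-bit marker word as in the x86 listings (the SPARC64 code compares a 64-bit word). The names CLOSURE_MARKER, looks_like_closure, x86_prologue and closure_record are hypothetical, and the closure record layout after the marker is a placeholder.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical marker: a 32-bit word with all bits set, i.e. -1. */
#define CLOSURE_MARKER UINT32_C(0xFFFFFFFF)

/* Returns 1 if the first word at target equals the marker, meaning the
   check would treat the pointer as a closure record rather than code. */
int looks_like_closure(const unsigned char *target) {
    uint32_t first_word;
    assert(target != NULL);
    memcpy(&first_word, target, sizeof first_word);
    return first_word == CLOSURE_MARKER;
}

/* First bytes of a 32-bit x86 prologue:
   push ebp; mov ebp,esp; sub esp,0 (truncated). */
const unsigned char x86_prologue[8] = {
    0x55, 0x8b, 0xec, 0x81, 0xec, 0x00, 0x00, 0x00
};

/* A closure record as the check expects it: marker first.
   The remaining bytes are placeholders, not a real layout. */
const unsigned char closure_record[8] = {
    0xff, 0xff, 0xff, 0xff, 0x00, 0x00, 0x00, 0x00
};
```

Here `looks_like_closure(closure_record)` is 1 and `looks_like_closure(x86_prologue)` is 0. The check is only sound if no function on the target can ever begin with the marker bytes, and that has to be established per architecture.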
- Jay
From: jayk123 at hotmail.com
To: m3devel at elegosoft.com
Date: Sun, 25 May 2008 00:16:01 +0000
Subject: [M3devel] function pointers and comparison to nil? mis-typed function pointers?
I'm being lazy... Tony, can you explain this stuff?

Comparison of function pointers..
What are the various representations and rules?
What does it mean to compare nested functions?
What does it mean to compare a function to NIL?

I'll poke around more.

What I am seeing is that comparison of function pointers to NIL is surprisingly expensive, and probably somewhat buggy. Or at least some of the runtime generated "metadata-ish" stuff is produced or typed incorrectly.

In particular, RTLinker.m3:

    PROCEDURE AddUnit (b: RT0.Binder) =
      VAR m: RT0.ModulePtr;
      BEGIN
        IF (b = NIL) THEN RETURN END;    line 119
        m := b(0);                       line 120
        IF (m = NIL) THEN RETURN END;    line 121
        AddUnitI(m);                     line 122
      END AddUnit;

generates a lot of code, just for the first line:

    (556) set_source_line
          source line 119
    (557) load
          m3cg_load (M3_DjPxE5_b): offset 0x0, convert 0xb -> 0xb
    (558) load_nil
    (559) if_eq
    (560) load
          m3cg_load (M3_DjPxE5_b): offset 0x0, convert 0xb -> 0xb
    (561) load_indirect
          load address offset 0x0 src_t 0x5 dst_t 0x5
    (562) load_integer
          integer n_bytes 0x0 hi 0x0 low 0x1 sign -1
    (563) if_eq
    (564) set_label
    (565) load_nil
    (566) load
          m3cg_load (M3_DjPxE5_b): offset 0x0, convert 0xb -> 0xb
    (567) if_ne
    (568) set_label
    (569) exit_proc
    (570) set_label
    (571) set_source_line
          source line 120

The details on the load_integer trace might not be completely correct. I will test a fix shortly. Esp. that n_bytes gets decremented to zero before the trace.

Ok, I see now why some of the bloat -- because the "then return end" is on the same line. If it were written as:

    if (b = NIL) THEN
      return
    end

It probably wouldn't look so bad. That took me a while to realize.

The following is generated for SPARC64_OPENBSD:

    line 119
        .stabn 68,0,119,.LLM61-.LLFBB4
    .LLM61:
        ldx [%fp+2175], %g1
        brz %g1, .LL26
        nop
        ldx [%fp+2175], %g1
        ldx [%g1], %g1          bus error here? yes, probably this one
        cmp %g1, -1
        be %xcc, .LL27
        nop
    .LL26:
        ldx [%fp+2175], %g1
        brz %g1, .LL33
        nop
    .LL27:
    line 120
        .stabn 68,0,120,.LLM62-.LLFBB4
    .LLM62:
        ldx [%fp+2175], %g1
        stx %g1, [%fp+2007]
        ldx [%fp+2007], %g1
        brz %g1, .LL30
        nop
        ldx [%fp+2007], %g1
        ldx [%g1], %g1          or here ?
        cmp %g1, -1
        bne %xcc, .LL30
        nop
        ldx [%fp+2007], %g1
        add %g1, 16, %g1
        ldx [%g1], %g1          or here?
        stx %g1, [%fp+2015]
        ldx [%fp+2007], %g1
        add %g1, 8, %g1
        ldx [%g1], %g1
        stx %g1, [%fp+2007]
    .LL30:
        ldx [%fp+2007], %g1
        ldx [%fp+2015], %g5
        mov 0, %o0
        call %g1, 0
        nop
        mov %o0, %g1
        stx %g1, [%fp+2023]
        ldx [%fp+2023], %g1
        stx %g1, [%fp+1999]
    line 121
        .stabn 68,0,121,.LLM63-.LLFBB4
    .LLM63:
        ldx [%fp+1999], %g1
        brz %g1, .LL33
        nop
    .LL32:
        .stabn 68,0,122,.LLM64-.LLFBB4
    .LLM64:

g1 points to RTSignal_I3

    (gdb) x/i $pc
    0x3ff0a8 <RTLinker__AddUnit+28>: ldx [ %g1 ], %g1
    (gdb) x/i $g1
    0x4021f4 <RTParams_I3>: save %sp, -208, %sp

I am willing to accept that a "function pointer" is a pair of pointers, or even three pointers: a pointer to code, a pointer to globals for position independent code, a frame pointer to locals. And that equality comparison of function pointers requires comparing two (or three) pointers. (Though the global pointer shouldn't need comparing.) At least for nested functions. Less so for non-nested? Much less for comparison to NIL?

And either way, this code is reading bogus data. There isn't a pointer at the function address, there is code. Something doesn't add up.

I'm going to try setting "aligned procedures" but that's quite bogus I think. EqualExpr.m3 says

    Note: procedures pointers are always aligned!

but maybe not?

Yeah yeah I'm being lazy. I'll read more code..

I also wonder if a "function pointer" can be optimized for the case of not being to a nested function. It looks like calling a function pointer is very inefficient.

It looks like.. am I reading that correctly?.. that if the pointer points to -1, then it is nested and a pair of pointers, and not otherwise. That -1 is treated specially as the first bytes of a function? Is that a Modula-3-ism or a SPARC-ism? It looks like a Modula-3-ism. And it seems dubious. But I'll have to read more..

NT386GNU does the same sort of wrong looking thing:

    LFBB4:
        pushl %ebp
        movl %esp, %ebp
        subl $24, %esp
    LBB5:
        .stabn 68,0,117,LM60-LFBB4
    LM60:
        movl $0, -16(%ebp)
        .stabn 68,0,119,LM61-LFBB4
    LM61:
        movl 8(%ebp), %eax
        testl %eax, %eax
        je L26
        movl 8(%ebp), %eax
        movl (%eax), %eax       BAD
        cmpl $-1, %eax          BAD
        je L27
    L26:
        movl 8(%ebp), %eax
        testl %eax, %eax
        je L33
    L27:
        .stabn 68,0,120,LM62-LFBB4
    LM62:

and NT386:

    0:000> u
    cm3!RTLinker__AddUnit:
    00607864 55              push ebp
    00607865 8bec            mov ebp,esp
    00607867 81ec0c000000    sub esp,0Ch
    0060786d 53              push ebx
    0060786e 56              push esi
    0060786f 57              push edi
    00607870 c745fc00000000  mov dword ptr [ebp-4],0
    00607877 837d0800        cmp dword ptr [ebp+8],0
    0:000> u
    cm3!RTLinker__AddUnit+0x17:
    0060787b 0f840f000000    je cm3!RTLinker__AddUnit+0x2c (00607890)
    00607881 8b7508          mov esi,dword ptr [ebp+8]
    00607884 8b5e00          mov ebx,dword ptr [esi]      BAD
    00607887 83fbff          cmp ebx,0FFFFFFFFh           BAD
    0060788a 0f840f000000    je cm3!RTLinker__AddUnit+0x3b (0060789f)
    00607890 837d0800        cmp dword ptr [ebp+8],0
    00607894 0f8505000000    jne cm3!RTLinker__AddUnit+0x3b (0060789f)
    0060789a e969000000      jmp cm3!RTLinker__AddUnit+0xa4 (00607908)

    cm3!RTLinker__AddUnit+0x20:
    00607884 8b5e00          mov ebx,dword ptr [esi] ds:002b:0062c950=81ec8b55
    0:000> u @esi
    cm3!RTLinker_I3:
    0062c950 55              push ebp
    0062c951 8bec            mov ebp,esp
    0062c953 81ec00000000    sub esp,0
    0062c959 53              push ebx
    0062c95a 56              push esi
    0062c95b 57              push edi
    0062c95c 837d0800        cmp dword ptr [ebp+8],0
    0062c960 0f8400000000    je cm3!RTLinker_I3+0x16 (0062c966)

This is just wrong. Comparing bytes of code to -1.

I think the likely fix is for the "I3" code to be laid out as a "constant function pointer", a pointer to a pair of pointers where one points to the code and one is to -1. Something like that. That can't be quite correct given that the existing data is callable.

 - Jay
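The "pair of pointers" idea from the message above, with its comparison rules, can be sketched in C roughly as follows. This is a hedged illustration under my own assumptions, not the cm3 representation: the names Code, ProcValue, proc_is_nil, proc_equal, sample_proc and proc_value_demo are invented here, and the optional globals pointer is left out since it should not need comparing.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*Code)(void);

/* Hypothetical "fat" procedure value: code address plus frame pointer. */
typedef struct {
    Code code;   /* address of the procedure's code */
    void *frame; /* enclosing frame for nested procedures, NULL otherwise */
} ProcValue;

/* Comparison to NIL only needs the code pointer. */
int proc_is_nil(ProcValue p) {
    return p.code == NULL;
}

/* Equality compares both pointers: the same nested procedure bound to
   two different frames is two different values. */
int proc_equal(ProcValue a, ProcValue b) {
    return a.code == b.code && a.frame == b.frame;
}

/* Placeholder procedure so the demo has a code address to use. */
void sample_proc(void) {}

/* Small self-check of the rules above. */
void proc_value_demo(void) {
    int frame_a = 0, frame_b = 0;
    ProcValue nil = { NULL, NULL };
    ProcValue top = { sample_proc, NULL };
    ProcValue nested_a = { sample_proc, &frame_a };
    ProcValue nested_b = { sample_proc, &frame_b };

    assert(proc_is_nil(nil));
    assert(!proc_is_nil(top));
    assert(proc_equal(nested_a, nested_a));
    assert(!proc_equal(nested_a, nested_b));
    assert(!proc_equal(top, nested_a));
}
```

With a representation like this, neither the NIL test nor the equality test ever reads the bytes at the code address, which is the part of the generated code marked BAD above.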