From hendrik at topoi.pooq.com  Sun Jul  1 02:39:57 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Sat, 30 Jun 2012 20:39:57 -0400
Subject: [M3devel] License compatibility
Message-ID: <20120701003957.GA12807@topoi.pooq.com>

I've heard, ages ago, that the SRC was not considered compatible with the GPL. I'd really like to know if this is true. Not whether it should be compatible, not whether people were afraid of it being incompatible... not whether some people think it's compatible, but whether it *is* compatible.

Has anyone ever got a definitive answer to this question?

If not, should I ask the FSF explicitly?

-- hendrik

From dabenavidesd at yahoo.es  Sun Jul  1 04:27:24 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Sun, 1 Jul 2012 03:27:24 +0100 (BST)
Subject: [M3devel] License compatibility
In-Reply-To: <20120701003957.GA12807@topoi.pooq.com>
Message-ID: <1341109644.19208.YahooMailClassic@web29703.mail.ird.yahoo.com>

Hi all:
for me the question is what kind of license they apply to GPL code for it to be compatible with us. They made an attempt for the Code Generator Interface, but DEC didn't release it, thinking of releasing it in some hardware form. The same happened with GPM2 and the HP U-code interface: a non-disclosure policy agreement negotiation.
Thanks in advance

From dragisha at m3w.org  Sun Jul  1 10:52:08 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Sun, 1 Jul 2012 10:52:08 +0200
Subject: [M3devel] Simple change to WIDECHAR type
In-Reply-To: <7BCB3BB7-D2F2-470B-8E70-2A7FF274E0FC@m3w.org>
References: <4AD84314-8E53-4A50-A51B-A0DF8DC910BC@m3w.org> <031FD9D2-8C62-48B3-8E2F-1122B7C1ED96@m3w.org> <20120630172401.DFE8E1A207C@async.async.caltech.edu> <7BCB3BB7-D2F2-470B-8E70-2A7FF274E0FC@m3w.org>
Message-ID: <95923C4D-B4F9-4245-809B-C3D186B8679C@m3w.org>

Text.Length(Dragiša Durić)= 15

out from:

  WITH me = W"Dragiša Durić" DO
    IO.Put("Text.Length(" & me & ")= " & Fmt.Int(Text.Length(me)) & "\n");
  END;

On Jun 30, 2012, at 8:12 PM, Dragiša Durić wrote:

> Jay, please explain this to me. My editor creates UTF8 files, for example. What cm3 expects after W" ?

From dabenavidesd at yahoo.es  Sun Jul  1 18:27:03 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Sun, 1 Jul 2012 17:27:03 +0100 (BST)
Subject: [M3devel] Simple change to WIDECHAR type
In-Reply-To: <95923C4D-B4F9-4245-809B-C3D186B8679C@m3w.org>
Message-ID: <1341160023.40330.YahooMailClassic@web29704.mail.ird.yahoo.com>

Hi all:
it should be less than that, but for a single character it is right. The problem is that you basically can't define a wider character in the machine, so if your machine can't ... why assume it isn't like that? So bigger machines should have bigger/smaller pointer types (char sizes with byte pointer size or word address size), change criteria rapidly, and keep it like that for the actual real operation needs mentioned, for which it was designed with char and pointer sizes hard-coded in a lot of classes in Rd/Wr (RdRep, for instance).
Thanks in advance

From hendrik at topoi.pooq.com  Sun Jul  1 19:39:57 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Sun, 1 Jul 2012 13:39:57 -0400
Subject: [M3devel] License compatibility
In-Reply-To:
References: <20120701003957.GA12807@topoi.pooq.com>
Message-ID: <20120701173957.GA8757@topoi.pooq.com>

On Sat, Jun 30, 2012 at 08:45:17PM -0400, Antony Hosking wrote:
> Not compatible. FSF official.
>
> Sent from my iPhone

So this presumably means it is impossible to distribute binary for any Modula 3 program that uses a GPL library even if you include source code. Because presumably the basic M3 run-time system is under the M3 license and therefore incompatible.

Which means it's practically impossible to provide such a program to anyone that doesn't understand how to use a compiler, which is most Windows users.

Or is there some wiggle room somewhere?

-- hendrik
>
> On Jun 30, 2012, at 20:39, Hendrik Boom wrote:
>
> > I've heard, ages ago, that the SRC was not considered compatible with
> > the GPL. I'd really like to know if this is true. Not whether it
> > should be compatible, not whether people were afraid of it being
> > incompatible... not whether some people think it's compatible, but
> > whether it *is* compatible.
> >
> > Has anyone ever got a definitive answer to this question?
> >
> > If not, should I ask the FSF explicitly?
> >
> > -- hendrik
> >

From hendrik at topoi.pooq.com  Sun Jul  1 20:58:10 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Sun, 1 Jul 2012 14:58:10 -0400
Subject: [M3devel] License compatibility
In-Reply-To:
References: <20120701003957.GA12807@topoi.pooq.com> <20120701173957.GA8757@topoi.pooq.com>
Message-ID: <20120701185810.GA9416@topoi.pooq.com>

On Sun, Jul 01, 2012 at 02:08:04PM -0400, Antony Hosking wrote:
> I thought LGPL allowed binary linkage without infection.

Only if the program is distributed in such a way that the user can relink it with updated versions of the LGPL library. I don't know if that's too much to ask of the typical dumb user I've postulated. Considering how I've had to recompile several m3 libraries just to go on using them with libXaw, it may indeed be too much to expect.

Now I don't mind sending out source code. I'm concerned with the end user who minds receiving it.

It would presumably be the Modula 3 libraries that pose the problem, I suppose. I'm not talking about the compiler itself, which is not part of my program or the libraries. I guess I'm concerned with the libraries one cannot do without, like libm3.

FSF claims that the GPL3 is compatible with more free licenses than the GPL2.

Is there a document somewhere that identifies just what the problem is with our license?

-- hendrik
>
> Sent from my iPad
>
> On Jul 1, 2012, at 1:39 PM, Hendrik Boom wrote:
>
> > On Sat, Jun 30, 2012 at 08:45:17PM -0400, Antony Hosking wrote:
> >> Not compatible. FSF official.
> >>
> >> Sent from my iPhone
> >
> > So this presumably means it is impossible to distribute binary for any
> > Modula 3 program that uses a GPL library even if you include source code.
> > Because presumably the basic M3 run-time system is under the M3 license and therefore incompatible.
> >
> > Which means it's practically impossible to provide such a program to anyone
> > that doesn't understand how to use a compiler, which is most Windows users.
> >
> > Or is there some wiggle room somewhere?
> >
> > -- hendrik
> >
> >>
> >> On Jun 30, 2012, at 20:39, Hendrik Boom wrote:
> >>
> >>> I've heard, ages ago, that the SRC was not considered compatible with
> >>> the GPL. I'd really like to know if this is true. Not whether it
> >>> should be compatible, not whether people were afraid of it being
> >>> incompatible... not whether some people think it's compatible, but
> >>> whether it *is* compatible.
> >>>
> >>> Has anyone ever got a definitive answer to this question?
> >>>
> >>> If not, should I ask the FSF explicitly?
> >>>
> >>> -- hendrik
> >>>

From dabenavidesd at yahoo.es  Sun Jul  1 21:10:16 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Sun, 1 Jul 2012 20:10:16 +0100 (BST)
Subject: [M3devel] License compatibility
In-Reply-To: <20120701185810.GA9416@topoi.pooq.com>
Message-ID: <1341169816.90454.YahooMailClassic@web29704.mail.ird.yahoo.com>

Hi all:
technically, the CM J-V-M was binary compatible with Sun JVM, wasn't it? So in terms of binary compatibility CM3 is binary compatible with Sun JDK (I guess the only version they had), wasn't that the idea to port Java to Modula-3 easily? And so if you can link Sun JDK with Gcc I guess you can do it with CM3 at least technically.
Thanks in advance

From dragisha at m3w.org  Sun Jul  1 21:15:35 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Sun, 1 Jul 2012 21:15:35 +0200
Subject: [M3devel] License compatibility
In-Reply-To: <20120701173957.GA8757@topoi.pooq.com>
References: <20120701003957.GA12807@topoi.pooq.com> <20120701173957.GA8757@topoi.pooq.com>
Message-ID: <30814087-79E4-429B-B438-F86B3375F23D@m3w.org>

GPL is not LGPL. The same restrictions do not apply. LGPL means you have to link the LGPL library dynamically, so your program will use the system's current version, presumably updatable as updates become available, regardless of your actions.

For GPL libraries, you are probably right.

On Jul 1, 2012, at 7:39 PM, Hendrik Boom wrote:

> On Sat, Jun 30, 2012 at 08:45:17PM -0400, Antony Hosking wrote:
>> Not compatible. FSF official.
>>
>> Sent from my iPhone
>
> So this presumably means it is impossible to distribute binary for any
> Modula 3 program that uses a GPL library even if you include source code.
> Because presumably the basic M3 run-time system is under the M3 license and therefore incompatible.
>
> Which means it's practically impossible to provide such a program to anyone
> that doesn't understand how to use a compiler, which is most Windows users.
>
> Or is there some wiggle room somewhere?
>
> -- hendrik
>
>>
>> On Jun 30, 2012, at 20:39, Hendrik Boom wrote:
>>
>>> I've heard, ages ago, that the SRC was not considered compatible with
>>> the GPL. I'd really like to know if this is true. Not whether it
>>> should be compatible, not whether people were afraid of it being
>>> incompatible... not whether some people think it's compatible, but
>>> whether it *is* compatible.
>>>
>>> Has anyone ever got a definitive answer to this question?
>>>
>>> If not, should I ask the FSF explicitly?
>>>
>>> -- hendrik
>>>

From hendrik at topoi.pooq.com  Sun Jul  1 21:49:50 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Sun, 1 Jul 2012 15:49:50 -0400
Subject: [M3devel] License compatibility
In-Reply-To: <1341169816.90454.YahooMailClassic@web29704.mail.ird.yahoo.com>
References: <20120701185810.GA9416@topoi.pooq.com> <1341169816.90454.YahooMailClassic@web29704.mail.ird.yahoo.com>
Message-ID: <20120701194950.GA9673@topoi.pooq.com>

On Sun, Jul 01, 2012 at 08:10:16PM +0100, Daniel Alejandro Benavides D. wrote:
> Hi all:
> technically, the CM J-V-M was binary compatible with Sun JVM, wasn't
> it? So in terms of binary compatibility CM3 is binary compatible with
> Sun JDK

You're not trying to tell me that I could use CM3 and Sun JDK interchangeably, are you? That would mean I can use the JDK to compile Modula 3 code. I have my doubts.

> (I guess the only version they had), wasn't that the idea to
> port Java to Modula-3 easily? And so if you can link Sun JDK with
> Gcc I guess you can do it with CM3 at least technically.

The question isn't whether we can link CM3 programs with gcc. The question is whether we can distribute such linked programs. And that doesn't depend on the CM3 compiler as much as the CM3 run-time system.

And it's not a question of technical compatibility. It's a matter of license compatibility. And I suspect the only way we'll get *that* to work is to write a new run-time system and new libraries that *are* built with a GPL-compatible license.

Or hope the whole issue goes away as free software drifts to freer licenses and we no longer need any GPL libraries.

-- hendrik

> Thanks in advance
>
> --- On Sun, 1/7/12, Hendrik Boom wrote:
>
> From: Hendrik Boom
> Subject: Re: [M3devel] License compatibility
> To: "m3devel at elegosoft.com"
> Date: Sunday, 1 July 2012, 13:58
>
> On Sun, Jul 01, 2012 at 02:08:04PM -0400, Antony Hosking wrote:
> > I thought LGPL allowed binary linkage without infection.
>
> Only if the program is distributed in such a way that the user can relink it
> with updated versions of the LGPL library. I don't know if that's too
> much to ask of the typical dumb user I've postulated. Considering how
> I've had to recompile several m3 libraries just to go on using them with
> libXaw, it may indeed be too much to expect.
>
> Now I don't mind sending out source code. I'm concerned with the end
> user who minds receiving it.
>
> It would presumably be the Modula 3 libraries that pose the problem, I
> suppose. I'm not talking about the compiler itself, which is not part
> of my program or the libraries. I guess I'm concerned with the
> libraries one cannot do without, like libm3.
>
> FSF claims that the GPL3 is compatible with more free licenses than the
> GPL2.
>
> Is there a document somewhere that identifies just what the problem is
> with our license?
>
> -- hendrik
>
> >
> > Sent from my iPad
> >
> > On Jul 1, 2012, at 1:39 PM, Hendrik Boom wrote:
> >
> > > On Sat, Jun 30, 2012 at 08:45:17PM -0400, Antony Hosking wrote:
> > >> Not compatible. FSF official.
> > >>
> > >> Sent from my iPhone
> > >
> > > So this presumably means it is impossible to distribute binary for any
> > > Modula 3 program that uses a GPL library even if you include source code.
> > > Because presumably the basic M3 run-time system is under the M3 license and therefore incompatible.
> > >
> > > Which means it's practically impossible to provide such a program to anyone
> > > that doesn't understand how to use a compiler, which is most Windows users.
> > >
> > > Or is there some wiggle room somewhere?
> > >
> > > -- hendrik
> > >
> > >>
> > >> On Jun 30, 2012, at 20:39, Hendrik Boom wrote:
> > >>
> > >>> I've heard, ages ago, that the SRC was not considered compatible with
> > >>> the GPL. I'd really like to know if this is true. Not whether it
> > >>> should be compatible, not whether people were afraid of it being
> > >>> incompatible... not whether some people think it's compatible, but
> > >>> whether it *is* compatible.
> > >>>
> > >>> Has anyone ever got a definitive answer to this question?
> > >>>
> > >>> If not, should I ask the FSF explicitly?
> > >>>
> > >>> -- hendrik
> > >>>

From hosking at cs.purdue.edu  Mon Jul  2 03:34:16 2012
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Sun, 1 Jul 2012 21:34:16 -0400
Subject: [M3devel] Simple change to WIDECHAR type
In-Reply-To: <20120630172401.DFE8E1A207C@async.async.caltech.edu>
References: <4AD84314-8E53-4A50-A51B-A0DF8DC910BC@m3w.org> <031FD9D2-8C62-48B3-8E2F-1122B7C1ED96@m3w.org> <20120630172401.DFE8E1A207C@async.async.caltech.edu>
Message-ID: <0F25CB80-6844-48A4-B34C-C08670CF96F8@cs.purdue.edu>

As far as I know, WIDECHAR was simply for the CM3 JVM to support Java char, which is 16-bit.

On Jun 30, 2012, at 1:24 PM, Mika Nystrom wrote:

>
> Dragiša Durić writes:
> ...
>>
>> Solution:
>> ======
>>
>> * Redefine WIDECHAR to hold at least 20 bit values, or create UNICHAR or
>> GLYPH (and leave WIDECHAR as it is for vertical compatibility) so we can
>> hold unencoded Unicode characters in scalar values in our Modula-3
>> programs, while preserving their properties.
>> * Implement properties, relations and methods defined for Unicode. With
>> ASCII, numeric order is everything. With Unicode - it is not. This is
>> probably very big project but we can start somewhere, and let interested
>> parties build on it. Dirk Muysers did work in this regard already.
>> * Whoever thinks we don't need this and our "tradition" and "legacy" are
>> important, please read this:
>> http://unicode.org/standard/WhatIsUnicode.html .
>>
>> dd
>
> Given what you have said about the near-uselessness of WIDECHAR, does anything
> actually use it much? What breaks if it is redefined to be the same as, say,
> INTEGER?
> (Or Word.T)
>
> CHAR is quite useful for processing 7-bit ASCII, and it would be lovely if
> that could go back to using the SRC data structures. For people who do stuff
> like write VLSI design tools... (probably many other large-scale applications
> would like it too).
>
> Mika

From dabenavidesd at yahoo.es  Mon Jul  2 04:51:35 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Mon, 2 Jul 2012 03:51:35 +0100 (BST)
Subject: [M3devel] License compatibility
In-Reply-To: <20120701194950.GA9673@topoi.pooq.com>
Message-ID: <1341197495.89971.YahooMailClassic@web29703.mail.ird.yahoo.com>

Hi all:
technically they were binary license compatible. I see you take what I say too hard, thanks, but don't think so hard about this. But if you need that, you can use the compiler type checking for Modula-3, so most of what you say is true. Also, whether the compiler is compatible would perhaps be a question for Eric Muller, who wrote parts of it. The nice thing about Modula-3 was that everything was object oriented (which is what Java claims about its System).
Thanks in advance

From dragisha at m3w.org  Mon Jul  2 10:09:43 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Mon, 2 Jul 2012 10:09:43 +0200
Subject: [M3devel] Simple change to WIDECHAR type
In-Reply-To: <0F25CB80-6844-48A4-B34C-C08670CF96F8@cs.purdue.edu>
References: <4AD84314-8E53-4A50-A51B-A0DF8DC910BC@m3w.org> <031FD9D2-8C62-48B3-8E2F-1122B7C1ED96@m3w.org> <20120630172401.DFE8E1A207C@async.async.caltech.edu> <0F25CB80-6844-48A4-B34C-C08670CF96F8@cs.purdue.edu>
Message-ID:

To be compatible, at least partially, with some other solution. The completeness of that other solution did not magically rub off on cm3 just because they invented WIDECHAR as a standard scalar type.

On Jul 2, 2012, at 3:34 AM, Tony Hosking wrote: > As far as I know, WIDECHAR was simply for the CM3 JVM to support Java char which is 16-bit. > > On Jun 30, 2012, at 1:24 PM, Mika Nystrom wrote: > >> >> =?utf-8?Q?Dragi=C5=A1a_Duri=C4=87?= writes: >> ... >>> >>> Solution: >>> =3D=3D=3D=3D=3D=3D >>> >>> * Redefine WIDECHAR to hold at least 20 bit values, or create UNICHAR or = >>> GLYPH (and leave WIDECHAR as it is for vertical compatibility) so we can = >>> hold unencoded Unicode characters in scalar values in our Modula-3 = >>> programs, while preserving their properties. >>> * Implement properties, relations and methods defined for Unicode. With = >>> ASCII, numeric order is everything. With Unicode - it is not. This is = >>> probably very big project but we can start somewhere, and let interested = >>> parties build on it. Dirk Muysers did work in this regard already. >>> * Whoever thinks we don't need this and our "tradition" and "legacy" are = >>> important, please read this: = >>> http://unicode.org/standard/WhatIsUnicode.html . >>> >>> dd >> >> Given what you have said about the near-uselessness of WIDECHAR, does anything >> actually use it much? What breaks if it is redefined to be the same as, say, >> INTEGER? (Or Word.T) >> >> CHAR is quite useful for processing 7-bit ASCII, and it would be lovely if >> that could go back to using the SRC data structures. For people who do stuff >> like write VLSI design tools... (probably many other large-scale applications >> would like it too). >> >> Mika > From rodney_bates at lcwb.coop Mon Jul 2 16:50:18 2012 From: rodney_bates at lcwb.coop (Rodney Bates) Date: Mon, 2 Jul 2012 07:50:18 -0700 Subject: [M3devel] UTF-8 TEXT Message-ID: <20120702075018.EEE2B9F6@resin11.mta.everyone.net> -Rodney Bates --- antony.hosking at gmail.com wrote: From: Antony Hosking To: "Rodney M. 
Bates" Cc: "m3devel at elegosoft.com" Subject: Re: [M3devel] UTF-8 TEXT Date: Thu, 28 Jun 2012 10:37:36 -0400 Why not simply say that CHAR is an enumeration representing all of UTF-32? The current definition merely says that CHAR is an enumeration containing *at least* 256 elements. We would need to translate the current Latin-1 literals into UTF-32. And we could simply have a new literal form for Unicode literals. This is almost what I would propose to do, with a couple of differences: Leave CHAR alone and fix WIDECHAR to handle the entire Unicode space. I am sure there is lots of existing code that depends on the implementation properties: ORD(FIRST(CHAR))=0, ORD(LAST(CHAR))=255, and BYTESIZE(CHAR)=1. Then I would define, in the language itself, that WIDECHAR is Unicode, not UTF-32. Thus ORD(LAST(WIDECHAR))=16_10FFFF. Then I would make it an implementation characteristic that BYTESIZE(WIDECHAR))=4. On Jun 27, 2012, at 10:12 PM, Rodney M. Bates wrote: > > > On 06/27/2012 07:32 PM, Antony Hosking wrote: >> So what do we do about 6-byte UTF-8 code points? They won't fit in WIDECHAR. Surely we should allow accessing a UTF-8 character as a CARDINAL and be done with it? >> > > Absolutely. Except I think a better way is to make WIDECHAR big enough to hold all of > Unicode. > >> Sent from my iPad >> >> On Jun 27, 2012, at 3:20 PM, "Rodney M. Bates" wrote: >> >>> >>> >>> On 06/26/2012 10:30 PM, Hendrik Boom wrote: >>>> On Tue, Jun 26, 2012 at 04:22:22PM -0400, Coleburn, Randy wrote: >>>>> I seem to recall that Rodney did some work a while back relating to TEXT. >>>>> Rodney, can you weigh in on some of this? >>>>> --Randy Coleburn >>>>> >>>>> From: Dragi?a Duri? [mailto:dragisha at m3w.org] >>>>> Sent: Tuesday, June 26, 2012 12:46 PM >>>>> To: Jay >>>>> Cc: m3devel >>>>> Subject: EXT Re: [M3devel] AND (., 16_ff). Not serious - or so I hope! >>>>> >>>>> You had idea in other message. Store length! 
>>>>>
>>>>> Another idea - store a partial list of indices to character locations. So whatever one does, that list can be used/expanded. Whatever storage issues this makes, they are probably minor as compared to the 32-bit-WIDECHAR-for-all idea.
>>>>
>>>> Most of the time, you don't need explicit integer indexes to character
>>>> locations. What you do need is an operation that fetches a character
>>>> given the string and its index (whatever data structure that index is),
>>>> and one that increments the index past that character. As long as you
>>>> can save an index and use it later on the same string, that's probably
>>>> all you ever need. And with a simple TEXT representation (such as the
>>>> obvious array of bytes containing characters of various widths) a byte
>>>> index is all you need (note: NOT a character index). It's easy even to
>>>> use TEXT and its integer indices as the data representation, as long as
>>>> you use the proper functions to parse the characters and increment the
>>>> indices by amounts that might differ from 1.
>>>>
>>>> And if your source code is represented in UTF-8, the representation that
>>>> requires little extra compiler effort to parse, your TEXT strings will
>>>> automagically appear in UTF-8.
>>>
>>> The original designers of the language and its libraries have given us
>>> two different abstractions for handling character strings (in addition
>>> to plain arrays): 1) Text, and 2) Wr, Rd, and their cousins.
>>>
>>> Text is highly general and easy to use. Concatenations and substrings
>>> are easy. Semantics, to its clients, are value semantics, similar to INTEGER.
>>> Random access by *character* number is easy and, hopefully, implemented
>>> with efficiency at least better than O(n).
>>>
>>> Wr and friends restrict you to sequential access, at least mostly, but
>>> gain implementation convenience and efficiency as a result.
>>>
>>> I feel very strongly that we should *not* take away the full generality
>>> of Text, especially efficient random access, to handle variable-length
>>> character encodings in strings. For these, let's make more friends of
>>> Wr and Rd, which already assume sequential access. For example, a
>>> filter pipe that sequentially reads a Text/Array/stream, applies a UTF-8
>>> interpretation to its bytes, and delivers a stream of Unicode characters,
>>> in variables of type WIDECHAR.
>>>
>>> Text should preserve the abstraction that it's a string of characters,
>>> generalized as it already is in cm3, to have type WIDECHAR, so they can be any
>>> Unicode character. The internal representation should, usually, not be
>>> of concern.
>>>
>>> Note that nowhere in Text are character values transferred between
>>> a Text.T and any form of I/O stream. In the Text abstraction, all
>>> characters go in and out of a Text.T in variables of type CHAR,
>>> WIDECHAR, and arrays thereof. IO, etc. is only done in streams,
>>> e.g., TextWr. We can easily add new variants of these that encode/decode
>>> by various rules.
>>>
>>> Of course, it is still valid to put a string of bytes in a Text.T and
>>> apply, e.g., UTF-8 interpretation yourself. But that's lower-level
>>> programming, and shouldn't confuse the abstraction.
>>>
>>>>
>>>> I can see a use for various wide characters -- the things you extract
>>>> from a TEXT by parsing bits of it, but none for anything
>>>> really new or complicated for wide TEXT.
>>>>
>>>> The only confusing thing is that the existing operations for extracting
>>>> bytes from TEXT have names that suggest they are extracting characters.
>>>>
>>>
>>> I think it's more than a suggestion. I think the abstraction clearly
>>> considers them characters. And it should stay that way. If you want,
>>> at a higher level of code, to treat them as bytes, that's fine, but the
>>> abstraction continues to view them as characters (which only you, the
>>> client, know is not really so.)
>>>
>>>> -- Hendrik
>>>>
>>

From rodney_bates at lcwb.coop Mon Jul 2 17:04:25 2012
From: rodney_bates at lcwb.coop (Rodney Bates)
Date: Mon, 2 Jul 2012 08:04:25 -0700
Subject: [M3devel] Simple change to WIDECHAR type
Message-ID: <20120702080425.EEE2B81F@resin11.mta.everyone.net>

-Rodney Bates

--- dragisha at m3w.org wrote:

From: Dragiša Durić
To: Antony Hosking
Cc: m3devel
Subject: Re: [M3devel] Simple change to WIDECHAR type
Date: Sat, 30 Jun 2012 09:33:00 +0200

Current GetChar/SetChars and GetWideChar/SetWideChars are not character-level access methods, in terms of Unicode. They are "byte-level", fixed-width data accesses.

Reason: Both CHAR (cardinality 2^8) and WIDECHAR (cardinality 2^16) based strings must use one or more characters to represent the whole of Unicode (cardinality just over 2^20). If we must encode in any case, then we don't have any benefit from WIDECHAR (as it is implemented/understood now) at all!

To represent Unicode with either CHAR or WIDECHAR based TEXTs, we must use either UTF-8 or UTF-16. Both are variable-width encodings, encoding one Unicode character as either 1-4 CHARs or 1-2 WIDECHARs.

What exactly is the meaning (at Modula-3's usual levels of abstraction) of character-level access? Do we need whatever bit pattern physically happens to be at some location in our data's representation? Or do we need the numerical value of an actual Unicode character, visually distinguishable in its written representation? One from that set of just over 2^20 elements?

What is the meaning of Text.Sub() based on byte-level access operations, where the first character of our resulting TEXT is in fact a prefix of some Unicode character's encoding? And/or where our last character is an invalid/incomplete suffix of some encoded character?
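
The Text.Sub question above can be made concrete with a minimal sketch. It is written in Python rather than Modula-3, purely as an illustration; byte_sub and is_valid_utf8 are hypothetical helpers, not part of cm3 or libm3, and the sketch assumes the string is held as raw UTF-8 bytes.

```python
# Hypothetical helpers for illustration; not part of cm3 or libm3.

def byte_sub(data: bytes, start: int, length: int) -> bytes:
    """Byte-level substring, analogous to Text.Sub over 8-bit CHARs."""
    return data[start:start + length]


def is_valid_utf8(data: bytes) -> bool:
    """True if data decodes as well-formed UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


name = "Dragi\u0161a".encode("utf-8")   # U+0161 takes two bytes: C5 A1
whole = byte_sub(name, 0, 7)            # ends after the second byte of U+0161
broken = byte_sub(name, 0, 6)           # ends between the two bytes of U+0161
tail = byte_sub(name, 6, 2)             # starts on a continuation byte

print(len(name), is_valid_utf8(whole), is_valid_utf8(broken), is_valid_utf8(tail))
```

Only the slice that happens to end on a character boundary is still valid UTF-8; the other two are exactly the "prefix" and "suffix" fragments described above.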

Since when is our priority fast and efficient operations that do something we don't need at all?

We are getting nothing at all with WIDECHAR. No. Single. Thing. WIDECHAR does not bring us closer to Unicode at all. WIDECHAR, together with CHAR (in the context of our current TEXT), makes two almost-solutions to the Unicode problem, and the existence of a WIDECHAR scalar type brings us a bit closer to the Unicode almost-solution of the C world and nothing else.

----------------------------------------------------------------------
I think the only reason why we got nothing is that WIDECHAR isn't wide enough. Let's fix that.
----------------------------------------------------------------------

Currently, neither GetChar nor GetWideChar can get "a character at nth position". Reason: there is no character scalar type able to hold any Unicode character.

Solution:
======

* Redefine WIDECHAR to hold at least 21-bit values, or create UNICHAR or GLYPH (and leave WIDECHAR as it is for vertical compatibility) so we can hold unencoded Unicode characters in scalar values in our Modula-3 programs, while preserving their properties.
* Implement properties, relations and methods defined for Unicode. With ASCII, numeric order is everything. With Unicode, it is not. This is probably a very big project but we can start somewhere, and let interested parties build on it. Dirk Muysers did work in this regard already.
* Whoever thinks we don't need this and that our "tradition" and "legacy" are important, please read this: http://unicode.org/standard/WhatIsUnicode.html .

dd

On Jun 29, 2012, at 5:52 PM, Dragiša Durić wrote:

> That, or UTF-16 encoding on top of current WIDECHAR.
>
> On Jun 29, 2012, at 3:50 PM, Antony Hosking wrote:
>
>> That will change WIDECHAR from a value consuming 16 bits of memory into a value consuming 32 bits of memory. In other words, all TEXT containing WIDECHAR will double in size.
>>
>> On Jun 29, 2012, at 4:35 AM, Dragiša Durić wrote:
>>
>>> m3front/src/builtinTypes/WCharr.m3, line:
>>>
>>> T := EnumType.New (16_10000, elts);
>>>
>>> to
>>>
>>> T := EnumType.New (16_100000, elts);
>>>
>>> Will this break things? Any other assumptions anywhere?
>>>
>>
>

From rodney_bates at lcwb.coop Mon Jul 2 17:09:25 2012
From: rodney_bates at lcwb.coop (Rodney Bates)
Date: Mon, 2 Jul 2012 08:09:25 -0700
Subject: [M3devel] Some earlier work
Message-ID: <20120702080925.EEE2BB96@resin11.mta.everyone.net>

Hmm. This looks very much like the original Text.i3, with CHAR replaced by UText.Char.
Dare I infer that it was inspired that way? It presents just the abstraction that
I think Text itself should present.

-Rodney Bates

--- dragisha at m3w.org wrote:

From: Dragiša Durić
To: m3devel
Subject: [M3devel] Some earlier work
Date: Sat, 30 Jun 2012 10:56:27 +0200

This is how we implemented UTF8 strings over current TEXTs. The current implementation is UNSAFE and uses glibc utf8 methods. Nothing too complicated and nothing we can't implement in Modula-3/portable C.

=====
INTERFACE UText;

IMPORT Word;  (* needed for the result type of Hash *)

TYPE
  T = TEXT;
  Char = CARDINAL;

PROCEDURE Cat(t, u: T): T;
PROCEDURE Equal(t, u: T): BOOLEAN;
PROCEDURE GetChar(t: T; i: CARDINAL): Char;
PROCEDURE ByteSize(t: T): CARDINAL;
PROCEDURE Length(t: T): CARDINAL;
PROCEDURE Empty(t: T): BOOLEAN;
PROCEDURE Sub(t: T; start: CARDINAL; length: CARDINAL := LAST(CARDINAL)): T;
PROCEDURE SetChars(VAR a: ARRAY OF Char; t: T);
PROCEDURE FromChar(ch: Char): T;
PROCEDURE FromChars(READONLY a: ARRAY OF Char): T;
PROCEDURE Hash(t: T): Word.T;
PROCEDURE Compare(t1, t2: T): [-1..1];
PROCEDURE FindChar(t: T; ch: Char; start: CARDINAL := 0): INTEGER;
PROCEDURE FindCharR(t: T; ch: Char; start: CARDINAL := LAST(INTEGER)): INTEGER;

END UText.
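
As a rough sketch of how Length, ByteSize and GetChar in an interface like UText can sit on top of a plain byte string, the following may help. It is in Python for illustration only: the function names mirror the interface, but the bodies are assumptions (the implementation described above uses glibc), and the input is assumed to be well-formed UTF-8.

```python
# Illustrative only: names mirror UText, bodies are assumptions.
# Input is assumed to be well-formed UTF-8; no validation is done.

def byte_size(t: bytes) -> int:
    """Number of bytes in the encoded text."""
    return len(t)


def length(t: bytes) -> int:
    """Number of code points: every byte except continuation bytes (10xxxxxx)."""
    return sum(1 for b in t if (b & 0xC0) != 0x80)


def decode_at(t: bytes, pos: int):
    """Return (code point, byte offset of the next character) for the character at pos."""
    lead = t[pos]
    if lead < 0x80:
        return lead, pos + 1
    if lead >> 5 == 0b110:
        width, cp = 2, lead & 0x1F
    elif lead >> 4 == 0b1110:
        width, cp = 3, lead & 0x0F
    elif lead >> 3 == 0b11110:
        width, cp = 4, lead & 0x07
    else:
        raise ValueError("offset %d is not at the start of a character" % pos)
    for b in t[pos + 1:pos + width]:
        cp = (cp << 6) | (b & 0x3F)
    return cp, pos + width


def get_char(t: bytes, i: int) -> int:
    """Code point at character index i, found by a linear scan from the start."""
    pos = 0
    for _ in range(i):
        _, pos = decode_at(t, pos)
    cp, _ = decode_at(t, pos)
    return cp


sample = "Dragi\u0161a Duri\u0107".encode("utf-8")
print(byte_size(sample), length(sample), hex(get_char(sample, 5)), hex(get_char(sample, 12)))
```

In this sketch get_char is O(n) in the character index, which is the cost of indexing by character over a variable-width encoding; decode_at, by contrast, works in constant time from a saved byte offset.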

From dragisha at m3w.org Mon Jul 2 17:13:03 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Mon, 2 Jul 2012 17:13:03 +0200
Subject: [M3devel] Some earlier work
In-Reply-To: <20120702080925.EEE2BB96@resin11.mta.everyone.net>
References: <20120702080925.EEE2BB96@resin11.mta.everyone.net>
Message-ID: <99FFC5CA-99A9-4E57-A41C-C82624123312@m3w.org>

With Brand added, it is ready for generic containers from libm3.

Yes, it was inspired by Text.i3. The idea was to make as thin an interface as possible.

On Jul 2, 2012, at 5:09 PM, Rodney Bates wrote:

> Hmm. This looks very much like original Text.i3, with CHAR replaced by UText.Char.
> Dare I infer that is was inspired that way? It presents just the abstraction that
> I think Text itself should present.
>

From dabenavidesd at yahoo.es Mon Jul 2 17:27:56 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Mon, 2 Jul 2012 16:27:56 +0100 (BST)
Subject: [M3devel] Simple change to WIDECHAR type
In-Reply-To:
Message-ID: <1341242876.32584.YahooMailClassic@web29702.mail.ird.yahoo.com>

Hi all:
I don't know if I would agree with the kind of thinking that Modula-3 needed CHAR and WIDECHAR for a JVM execution engine device, but for the interpretation function. For instance, what would be the purpose of handling more than 140 CHARs on a mobile phone? I don't see the need for that. If you need to target many languages it is useful, but in a compiler setting, not in an execution environment like CM J-V-M.
For instance, let's suppose you have a Win16 device and IBM JVM-ready hardware: would you need two types of char? Maybe, but for efficiency reasons, not for anything more.
I agree with WIDECHAR devices in the sense that a general-purpose language is better than many language encodings, but we need to see the devices for that, for instance mobile phones, etc. Normally JVM-ready phones.
Thanks in advance

--- On Mon, 2/7/12, Dragiša Durić wrote:

From: Dragiša Durić
Subject: Re: [M3devel] Simple change to WIDECHAR type
To: "Tony Hosking"
CC: m3devel at elegosoft.com
Date: Monday, 2 July 2012, 03:09

To be compatible, at least partially, with some other solution. The completeness of that other solution did not magically rub off on cm3 just because they invented WIDECHAR as a standard scalar type.

On Jul 2, 2012, at 3:34 AM, Tony Hosking wrote:

> As far as I know, WIDECHAR was simply for the CM3 JVM to support Java char which is 16-bit.
>
> On Jun 30, 2012, at 1:24 PM, Mika Nystrom wrote:
>
>>
>> Dragiša Durić writes:
>> ...
>>>
>>> Solution:
>>> ======
>>>
>>> * Redefine WIDECHAR to hold at least 21-bit values, or create UNICHAR or
>>> GLYPH (and leave WIDECHAR as it is for vertical compatibility) so we can
>>> hold unencoded Unicode characters in scalar values in our Modula-3
>>> programs, while preserving their properties.
>>> * Implement properties, relations and methods defined for Unicode. With
>>> ASCII, numeric order is everything. With Unicode - it is not. This is
>>> probably very big project but we can start somewhere, and let interested
>>> parties build on it. Dirk Muysers did work in this regard already.
>>> * Whoever thinks we don't need this and our "tradition" and "legacy" are
>>> important, please read this:
>>> http://unicode.org/standard/WhatIsUnicode.html .
>>>
>>> dd
>>
>> Given what you have said about the near-uselessness of WIDECHAR, does anything
>> actually use it much? What breaks if it is redefined to be the same as, say,
>> INTEGER? (Or Word.T)
>>
>> CHAR is quite useful for processing 7-bit ASCII, and it would be lovely if
>> that could go back to using the SRC data structures. For people who do stuff
>> like write VLSI design tools... (probably many other large-scale applications
>> would like it too).
>>
>> Mika
>

From hosking at cs.purdue.edu Mon Jul 2 17:57:14 2012
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Mon, 2 Jul 2012 11:57:14 -0400
Subject: [M3devel] UTF-8 TEXT
In-Reply-To: <20120702075018.EEE2B9F6@resin11.mta.everyone.net>
References: <20120702075018.EEE2B9F6@resin11.mta.everyone.net>
Message-ID: <2293DFAB-7532-4969-8A77-66FBB60EE427@cs.purdue.edu>

On Jul 2, 2012, at 10:50 AM, Rodney Bates wrote:

>
>
> -Rodney Bates
>
> --- antony.hosking at gmail.com wrote:
>
>> From: Antony Hosking
>> To: "Rodney M. Bates"
>> Cc: "m3devel at elegosoft.com"
>> Subject: Re: [M3devel] UTF-8 TEXT
>> Date: Thu, 28 Jun 2012 10:37:36 -0400
>>
>> Why not simply say that CHAR is an enumeration representing all of UTF-32?
>> The current definition merely says that CHAR is an enumeration containing *at least* 256 elements.
>> We would need to translate the current Latin-1 literals into UTF-32.
>> And we could simply have a new literal form for Unicode literals.
>>
> This is almost what I would propose to do, with a couple of differences:
>
> Leave CHAR alone and fix WIDECHAR to handle the entire Unicode space.
> I am sure there is lots of existing code that depends on the implementation
> properties: ORD(FIRST(CHAR))=0, ORD(LAST(CHAR))=255, and BYTESIZE(CHAR)=1.

Fair enough. Would we leave the encoding of CHAR as ISO-Latin-1? We'd still need translation from ISO-Latin-1 to UTF-8, wouldn't we?

> Then I would define, in the language itself, that WIDECHAR is Unicode, not
> UTF-32. Thus ORD(LAST(WIDECHAR))=16_10FFFF. Then I would make it an
> implementation characteristic that BYTESIZE(WIDECHAR)=4.

I note this text from the Wikipedia entry for UTF-32:

Though a fixed number of bytes per code point appear convenient, it is not as useful as it appears. It makes truncation easier but not significantly so compared to UTF-8 and UTF-16.
It does not make it faster to find a particular offset in the string, as an "offset" can be measured in the fixed-size code units of any encoding. It does not make calculating the displayed width of a string easier except in limited cases, since even with a "fixed width" font there may be more than one code point per character position (combining marks) or more than one character position per code point (for example CJK ideographs). Combining marks mean editors cannot treat one code point as being the same as one unit for editing. Editors that limit themselves to left-to-right languages and precomposed characters can take advantage of fixed-sized code units, but such editors are unlikely to support non-BMP characters and thus can work equally well with 16-bit UTF-16 encoding.

Does this argue against WIDECHAR=UTF-32? Would we be better off simply saying WIDECHAR=UTF-16 and leaving things as they are? Yes, it would make the definition of WideCharAt a little odd, because the index would be defined in 16-bit units rather than UTF-16 glyph code-points.

By the way, if we did change WIDECHAR to an enumeration containing 16_110000 elements then the stored (memory) size of WIDECHAR would be 4 bytes given the current CM3 implementation of enumerations, which chooses a (non-PACKED) stored size of 1/2/4/8 bytes depending on the number of elements.

>
> On Jun 27, 2012, at 10:12 PM, Rodney M. Bates wrote:
>
>>
>>
>> On 06/27/2012 07:32 PM, Antony Hosking wrote:
>>> So what do we do about 6-byte UTF-8 code points? They won't fit in WIDECHAR. Surely we should allow accessing a UTF-8 character as a CARDINAL and be done with it?
>>>
>>
>> Absolutely. Except I think a better way is to make WIDECHAR big enough to hold all of
>> Unicode.
>>
>>> Sent from my iPad
>>>
>>> On Jun 27, 2012, at 3:20 PM, "Rodney M. Bates" wrote:
>>>
>>>>
>>>>
>>>> On 06/26/2012 10:30 PM, Hendrik Boom wrote:
>>>>> On Tue, Jun 26, 2012 at 04:22:22PM -0400, Coleburn, Randy wrote:
>>>>>> I seem to recall that Rodney did some work a while back relating to TEXT.
>>>>>> Rodney, can you weigh in on some of this?
>>>>>> --Randy Coleburn
>>>>>>
>>>>>> From: Dragiša Durić [mailto:dragisha at m3w.org]
>>>>>> Sent: Tuesday, June 26, 2012 12:46 PM
>>>>>> To: Jay
>>>>>> Cc: m3devel
>>>>>> Subject: EXT Re: [M3devel] AND (., 16_ff). Not serious - or so I hope!
>>>>>>
>>>>>> You had an idea in another message. Store the length!
>>>>>>
>>>>>> Another idea - store a partial list of indices to character locations. So whatever one does, that list can be used/expanded. Whatever storage issues this makes, they are probably minor as compared to the 32-bit-WIDECHAR-for-all idea.
>>>>>
>>>>> Most of the time, you don't need explicit integer indexes to character
>>>>> locations. What you do need is an operation that fetches a character
>>>>> given the string and its index (whatever data structure that index is),
>>>>> and one that increments the index past that character. As long as you
>>>>> can save an index and use it later on the same string, that's probably
>>>>> all you ever need. And with a simple TEXT representation (such as the
>>>>> obvious array of bytes containing characters of various widths) a byte
>>>>> index is all you need (note: NOT a character index). It's easy even to
>>>>> use TEXT and its integer indices as the data representation, as long as
>>>>> you use the proper functions to parse the characters and increment the
>>>>> indices by amounts that might differ from 1.
>>>>>
>>>>> And if your source code is represented in UTF-8, the representation that
>>>>> requires little extra compiler effort to parse, your TEXT strings will
>>>>> automagically appear in UTF-8.
>>>>
>>>> The original designers of the language and its libraries have given us
>>>> two different abstractions for handling character strings (in addition
>>>> to plain arrays): 1) Text, and 2) Wr, Rd, and their cousins.
>>>>
>>>> Text is highly general and easy to use. Concatenations and substrings
>>>> are easy. Semantics, to its clients, are value semantics, similar to INTEGER.
>>>> Random access by *character* number is easy and, hopefully, implemented
>>>> with efficiency at least better than O(n).
>>>>
>>>> Wr and friends restrict you to sequential access, at least mostly, but
>>>> gain implementation convenience and efficiency as a result.
>>>>
>>>> I feel very strongly that we should *not* take away the full generality
>>>> of Text, especially efficient random access, to handle variable-length
>>>> character encodings in strings. For these, let's make more friends of
>>>> Wr and Rd, which already assume sequential access. For example, a
>>>> filter pipe that sequentially reads a Text/Array/stream, applies a UTF-8
>>>> interpretation to its bytes, and delivers a stream of Unicode characters,
>>>> in variables of type WIDECHAR.
>>>>
>>>> Text should preserve the abstraction that it's a string of characters,
>>>> generalized as it already is in cm3, to have type WIDECHAR, so they can be any
>>>> Unicode character. The internal representation should, usually, not be
>>>> of concern.
>>>>
>>>> Note that nowhere in Text are character values transferred between
>>>> a Text.T and any form of I/O stream. In the Text abstraction, all
>>>> characters go in and out of a Text.T in variables of type CHAR,
>>>> WIDECHAR, and arrays thereof. IO, etc. is only done in streams,
>>>> e.g., TextWr. We can easily add new variants of these that encode/decode
>>>> by various rules.
>>>>
>>>> Of course, it is still valid to put a string of bytes in a Text.T and
>>>> apply, e.g., UTF-8 interpretation yourself. But that's lower-level
>>>> programming, and shouldn't confuse the abstraction.
>>>>
>>>>>
>>>>> I can see a use for various wide characters -- the things you extract
>>>>> from a TEXT by parsing bits of it, but none for anything
>>>>> really new or complicated for wide TEXT.
>>>>>
>>>>> The only confusing thing is that the existing operations for extracting
>>>>> bytes from TEXT have names that suggest they are extracting characters.
>>>>>
>>>>
>>>> I think it's more than a suggestion. I think the abstraction clearly
>>>> considers them characters. And it should stay that way. If you want,
>>>> at a higher level of code, to treat them as bytes, that's fine, but the
>>>> abstraction continues to view them as characters (which only you, the
>>>> client, know is not really so.)
>>>>
>>>>> -- Hendrik
>>>>>
>>>
>

From hendrik at topoi.pooq.com Mon Jul 2 18:54:44 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Mon, 2 Jul 2012 12:54:44 -0400
Subject: [M3devel] UTF-8 TEXT
In-Reply-To: <2293DFAB-7532-4969-8A77-66FBB60EE427@cs.purdue.edu>
References: <20120702075018.EEE2B9F6@resin11.mta.everyone.net> <2293DFAB-7532-4969-8A77-66FBB60EE427@cs.purdue.edu>
Message-ID: <20120702165444.GA20908@topoi.pooq.com>

On Mon, Jul 02, 2012 at 11:57:14AM -0400, Tony Hosking wrote:
>
> On Jul 2, 2012, at 10:50 AM, Rodney Bates wrote:
>
> >
> >
> > -Rodney Bates
> >
> > --- antony.hosking at gmail.com wrote:
> >
> >> From: Antony Hosking
> >> To: "Rodney M. Bates"
> >> Cc: "m3devel at elegosoft.com"
> >> Subject: Re: [M3devel] UTF-8 TEXT
> >> Date: Thu, 28 Jun 2012 10:37:36 -0400
> >>
> >> Why not simply say that CHAR is an enumeration representing all of UTF-32?
> >> The current definition merely says that CHAR is an enumeration containing *at least* 256 elements.
> >> We would need to translate the current Latin-1 literals into UTF-32.
> >> And we could simply have a new literal form for Unicode literals.
> >>
> > This is almost what I would propose to do, with a couple of differences:
> >
> > Leave CHAR alone and fix WIDECHAR to handle the entire Unicode space.
> > I am sure there is lots of existing code that depends on the implementation
> > properties: ORD(FIRST(CHAR))=0, ORD(LAST(CHAR))=255, and BYTESIZE(CHAR)=1.
>
> Fair enough. Would we leave the encoding of CHAR as ISO-Latin-1? We'd still need translation from ISO-Latin-1 to UTF-8, wouldn't we?
>
> > Then I would define, in the language itself, that WIDECHAR is Unicode, not
> > UTF-32. Thus ORD(LAST(WIDECHAR))=16_10FFFF. Then I would make it an
> > implementation characteristic that BYTESIZE(WIDECHAR)=4.
>
> I note this text from the Wikipedia entry for UTF-32:

I had just looked this paragraph up on Wikipedia to post it when I
noticed you had already done so.

>
> Though a fixed number of bytes per code point appear convenient, it is
> not as useful as it appears.

Which is the gist of my objection to implementing TEXT as
fixed-width 16, 20, or 32-bit storage units. It wastes space without
much gain. (An exception might be made for a few languages that can be
efficiently stored in 16 bits but not in UTF-8.)

> It makes truncation easier but not significantly so compared to UTF-8
> and UTF-16.
> It does not make it faster to find a particular offset in the string,
> as an "offset" can be measured in the fixed-size code units of any
> encoding.

Exactly why I want character extraction to be expressible in efficient
"offsets" with implementation-independent specifications (though
possibly implementation-dependent values). I don't mind if character
counts are also made available, as long as it doesn't impose extra
overhead on those that don't use them. Operations with offsets that
allow one to extract characters and skip over characters are sufficient
for most purposes. The use of efficient offsets is independent of the
question of access to individual bytes.

> It does not make calculating the displayed width of a string easier
> except in limited cases, since even with a "fixed width" font there
> may be more than one code point per character position (combining
> marks) or more than one character position per code point (for example
> CJK ideographs).
> Combining marks mean editors cannot treat one code point as being the
> same as one unit for editing. Editors that limit themselves to
> left-to-right languages and precomposed characters can take advantage
> of fixed-sized code units, but such editors are unlikely to support
> non-BMP characters and thus can work equally well with 16-bit UTF-16
> encoding.

I'd like to point out that most string processing doesn't really deal
in characters at all, but in terms of words, phrases, symbols, and other
linguistic structures that have to be dealt with using parsing.
Assembling bytes of UTF-8 into characters is just more parsing, and
should be viewed as such. For many applications it isn't even necessary
to decode UTF-8, because it can be copied without being aware of its
character structure. And if the language ascribes special meanings only
to some of the first 128 characters, these can be unambiguously
recognised in UTF-8 without decoding UTF-8 at all. This does argue for
having byte access as well.

>
> Does this argue against WIDECHAR=UTF-32? Would we be better off simply saying WIDECHAR=UTF-16 and leaving things as they are? Yes, it would make the definition of WideCharAt a little odd, because the index would be defined in 16-bit units rather than UTF-16 glyph code-points.
>
> By the way, if we did change WIDECHAR to an enumeration containing 16_110000 elements then the stored (memory) size of WIDECHAR would be 4 bytes given the current CM3 implementation of enumerations, which chooses a (non-PACKED) stored size of 1/2/4/8 bytes depending on the number of elements.
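
The two quoted points just above (indexing in 16-bit units, and the stored size of a larger enumeration) can be illustrated numerically. This is a sketch in Python, not Modula-3: stored_bytes is a hypothetical model of the 1/2/4/8-byte rule and assumes the smallest size that covers all ordinals is chosen, which is not confirmed against the cm3 sources; utf16_units counts the 16-bit units a UTF-16 based index would step over.

```python
def stored_bytes(n_elements: int) -> int:
    """Smallest of 1, 2, 4, 8 bytes able to hold ordinals 0 .. n_elements-1 (assumed rule)."""
    for size in (1, 2, 4, 8):
        if n_elements <= 1 << (8 * size):
            return size
    raise ValueError("too many elements")


def utf16_units(s: str) -> int:
    """Number of 16-bit code units needed to store s as UTF-16."""
    return len(s.encode("utf-16-le")) // 2


clef = "\U0001D11E"   # MUSICAL SYMBOL G CLEF, a character outside the BMP
print(stored_bytes(0x10000), stored_bytes(0x110000))
print(len(clef), utf16_units(clef))
```

Under that assumed rule, an enumeration of 16_10000 elements fits in 2 bytes and one of 16_110000 elements needs 4, while a single non-BMP character occupies two 16-bit units, so a unit-based index and a character-based index diverge.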

16-bit WIDECHARs would seem to me to be the worst choice of all, except
in the special case that you *know* that all the characters you'll ever
have to deal with fit in 16 bits and most of them won't fit in 8.

I'd use WIDECHAR when I'm dealing with individual
characters/Unicode code points. I'd use TEXT when dealing with strings.
Or some custom data structure that can handle text containing strings
and other data structures (such as parse trees). Generally, there won't
be a lot of WIDECHARs around in a running program, so I don't care much
about the few extra bytes.

-- hendrik

From dabenavidesd at yahoo.es Mon Jul 2 22:44:44 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Mon, 2 Jul 2012 21:44:44 +0100 (BST)
Subject: [M3devel] UTF-8 TEXT
In-Reply-To: <20120702165444.GA20908@topoi.pooq.com>
Message-ID: <1341261884.27797.YahooMailClassic@web29706.mail.ird.yahoo.com>

Hi all:
I was thinking of back-end encoding of the CHARs in WIDECHAR using Rd/Wr-Rep, but the mentioned modules are built around the idea of efficient machine implementation. I just think that the only need for having UTF-8 or whatever encoding for CHARs and WIDECHAR is in a machine with those types. Numerous µ-coded "rare little" JVM machines are capable of handling that kind of Unicode, but anything else is just spurious to me: making that encoding for everybody in CM3.
There isn't any other machine with that byte encoding that I know about, so the good news is that the machines are reduced to:
1) Industrial-size scenario JVM
2) Small-sized vendor machines, a web browser client like a JS?

I hope with that we find some common ground for a solution to the issue.
Thanks in advance

From dmuysers at hotmail.com Fri Jul 6 11:23:34 2012
From: dmuysers at hotmail.com (Dirk Muysers)
Date: Fri, 6 Jul 2012 11:23:34 +0200
Subject: [M3devel] A question for our language lawyers
Message-ID:

The report says (2.6.9) "The values in the array will be arbitrary values of their type."

Now, ParseParams in its "init" method allocates an array of BOOLEANs and relies on the fact that it is supposedly initialised with FALSE values.
At the other hand the report says (2.2.4) "The constant default is a default value used when a record is constructed or allocated" If I allocate an array of records, which statement is stronger: - the array contains arbitray record values ? - the array record fields will be initialised to their default values? The ParseParams "init" method is obviously erroneous and works only by virtue of a happy combination of circumstances. But how is the report to be interpreted in the second case? -------------- next part -------------- An HTML attachment was scrubbed... URL: From dabenavidesd at yahoo.es Fri Jul 6 18:06:20 2012 From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.) Date: Fri, 6 Jul 2012 17:06:20 +0100 (BST) Subject: [M3devel] A question for our language lawyers In-Reply-To: Message-ID: <1341590780.97298.YahooMailClassic@web29702.mail.ird.yahoo.com> Hi all: if that's true that you say "relies " Init hence the MODULE is wrong (is not) to specify that. But record rules hasn't anything to do here. But anyway you may have a point in that record initialization are less important than record construction (c.f p.53, s2.6.8, SPwM3), and that in the array case, it might be that it is stronger the array initialization (as a declared variable) than array construction but are decided in two different cases for WITH expression, with 'a' as an a TEXT WITH non-initialization but WITH p as a READONLY array-valued expression which doesn't do what you say it needs, so you found a bug known by Jay of "incorrect" un-initialized values in m3cg, or m3cc or m3gcc or m3cgc. In that case you might need an array of uninitialized expressions else construct the value correctly before entering the inner WITH. 
Thanks in advance

From rodney_bates at lcwb.coop Fri Jul 6 18:28:10 2012
From: rodney_bates at lcwb.coop (Rodney M. Bates)
Date: Fri, 06 Jul 2012 11:28:10 -0500
Subject: [M3devel] Simple change to WIDECHAR type
In-Reply-To: <95923C4D-B4F9-4245-809B-C3D186B8679C@m3w.org>
References: <4AD84314-8E53-4A50-A51B-A0DF8DC910BC@m3w.org> <031FD9D2-8C62-48B3-8E2F-1122B7C1ED96@m3w.org> <20120630172401.DFE8E1A207C@async.async.caltech.edu> <7BCB3BB7-D2F2-470B-8E70-2A7FF274E0FC@m3w.org> <95923C4D-B4F9-4245-809B-C3D186B8679C@m3w.org>
Message-ID: <4FF7121A.9000909@lcwb.coop>

This is the result of the fact that your editor is writing UTF-8, while the compiler is reading in ISO-latin-1, as the language specifies. This was sensible at the time it was defined, but has been overcome by the advent and proliferation of Unicode.

The abstract code point values in the range 16_80..16_FF are indeed the same in Unicode and ISO-latin-1, but the bit encoding rules are different.
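To make the difference between code point and encoding concrete, here is a minimal sketch in Python (not Modula-3). Python's codecs merely stand in for the compiler's source reader, which is an assumption made for illustration. The name is encoded as UTF-8, as an editor would write it, and the bytes are then read back as ISO-Latin-1. That would account for a reported length of 15 for a 13-character name.

```python
# Illustration only: Python's codecs stand in for the compiler's source reader.
name = "Dragi\u0161a Duri\u0107"          # 13 code points, two of them above U+007F

utf8_bytes = name.encode("utf-8")         # what a UTF-8 editor writes to disk
misread = utf8_bytes.decode("latin-1")    # a Latin-1 reader sees one character per byte

print(len(name), len(utf8_bytes), len(misread))

# U+00E9 is the same code point in Unicode and ISO-Latin-1,
# but the two encodings store it differently.
e_acute = "\u00e9"
latin1_e = e_acute.encode("latin-1")      # a single byte, 0xE9
utf8_e = e_acute.encode("utf-8")          # two bytes, 0xC3 0xA9
print(latin1_e, utf8_e)
```

Re-encoding the misread string as Latin-1 and decoding it as UTF-8 recovers the original name, so no information is lost; the bytes were only interpreted under the wrong rule.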
The simple and correct solution is to fix the compiler so that, like many programs today, it can be told to use one of several encodings when interpreting its input. Then set it the same as you set your editor.

On 07/01/2012 03:52 AM, Dragiša Durić wrote:
> Text.Length(Dragiša Durić)= 15
>
> out from:
>   WITH me = W"Dragiša Durić" DO
>     IO.Put("Text.Length(" & me & ")= " & Fmt.Int(Text.Length(me)) & "\n");
>   END;
>
> On Jun 30, 2012, at 8:12 PM, Dragiša Durić wrote:
>
>> Jay, please explain this to me. My editor creates UTF8 files, for example. What cm3 expects after W" ?

From dabenavidesd at yahoo.es Fri Jul 6 19:08:25 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Fri, 6 Jul 2012 18:08:25 +0100 (BST)
Subject: [M3devel] Simple change to WIDECHAR type
In-Reply-To: <4FF7121A.9000909@lcwb.coop>
Message-ID: <1341594505.40475.YahooMailClassic@web29703.mail.ird.yahoo.com>

Hi all:
I think the problem is how to encode but not to REVEAL (which would need machine identification, so a generic target is my preferred abstraction as CM3 tried to do) the language encoding explicitly (we don't like to reveal anything of the machine from Modula-3 sense standard point of view you might need a language redefinition), I think if one needs that is because is on a machine like that. So, in a given platform you might know the encoding and that's all. The other approach is just very hard to use, to put burden of choice, my thinking is that if you need that you might end needing generics that tell at compile time what to use. Of course Type checking methods are done at instantiation time, but nevertheless is helpful that these other settings are done at compile time (which make sense for the question why do I need to compile this code).
That's because in other machines you might need to exploit three times the needed time to encode, decode and encode again (cost affects if you think in changing parameters so you might not touch that for the benefit of third parties as a default). This matters in phones where you don't have time to do that, and generally any type of type machine, so in a hard-coded way this is not helpful option for everybody at all as well. The machine-dependent solution helps if you can't compile the thing there (cross-compilations or pre-compiled binaries), but anyway I guess if we want Java compatibility (I do as a platform for binary compatibility but just when it's needed not in every execution environment, say a real HW implemented JVMs). So basically the language implementation needs to know that nobody else means that module wise model might need to be introduced, which is not something we have now.

Thanks in advance

From rodney_bates at lcwb.coop Fri Jul 6 19:54:32 2012
From: rodney_bates at lcwb.coop (Rodney M. Bates)
Date: Fri, 06 Jul 2012 12:54:32 -0500
Subject: [M3devel] UTF-8 TEXT
In-Reply-To: <2293DFAB-7532-4969-8A77-66FBB60EE427@cs.purdue.edu>
References: <20120702075018.EEE2B9F6@resin11.mta.everyone.net> <2293DFAB-7532-4969-8A77-66FBB60EE427@cs.purdue.edu>
Message-ID: <4FF72658.905@lcwb.coop>

On 07/02/2012 10:57 AM, Tony Hosking wrote:
>
> On Jul 2, 2012, at 10:50 AM, Rodney Bates wrote:
>
>> -Rodney Bates
>>
>> --- antony.hosking at gmail.com wrote:
>>
>>> From: Antony Hosking
>>> To: "Rodney M. Bates"
>>> Cc: "m3devel at elegosoft.com"
>>> Subject: Re: [M3devel] UTF-8 TEXT
>>> Date: Thu, 28 Jun 2012 10:37:36 -0400
>>>
>>> Why not simply say that CHAR is an enumeration representing all of UTF-32?
>>> The current definition merely says that CHAR is an enumeration containing *at least* 256 elements.
>>> We would need to translate the current Latin-1 literals into UTF-32.
>>> And we could simply have a new literal form for Unicode literals.
>>>
>> This is almost what I would propose to do, with a couple of differences:
>>
>> Leave CHAR alone and fix WIDECHAR to handle the entire Unicode space.
>> I am sure there is lots of existing code that depends on the implementation
>> properties: ORD(FIRST(CHAR))=0, ORD(LAST(CHAR))=255, and BYTESIZE(CHAR)=1.
>
> Fair enough. Would we leave the encoding of CHAR as ISO-Latin-1? We'd still need translation from ISO-Latin-1 to UTF-8 wouldn't we?
>

Yes. The code points for Unicode and ISO-Latin-1, in the range 128..255 map to the same characters, (as in 0..127). But the physical encoding is different. ISO-Latin-1 is encoded one byte per character unconditionally.
When Unicode is encoded in UTF-8, any code point 128 or more uses at least two bytes.

We need translations, but these belong in Wr/Rd and friends, which handle serial streams. In in-memory variables, WIDECHAR holds a Unicode code point, ARRAY OF WIDECHAR would happen to be the same representation as UTF-32, and Text.T would abstract away the internal representation.

>> Then I would define, in the language itself, that WIDECHAR is Unicode, not
>> UTF-32. Thus ORD(LAST(WIDECHAR))=16_10FFFF. Then I would make it an
>> implementation characteristic that BYTESIZE(WIDECHAR)=4.
>
> I note this text from the Wikipedia entry for UTF-32:
>
> Though a fixed number of bytes per code point appear convenient, it is not as useful as it appears. It makes truncation easier but not significantly so compared to UTF-8 and UTF-16. It does not make it faster to find a particular offset in the string, as an "offset" can be measured in the fixed-size code units of any encoding. It does not make calculating the displayed width of a string easier except in limited cases, since even with a "fixed width" font there may be more than one code point per character position (combining marks) or more than one character position per code point (for example CJK ideographs). Combining marks mean editors cannot treat one code point as being the same as one unit for editing. Editors that limit themselves to left-to-right languages and precomposed characters
> can take advantage of fixed-sized code units, but such editors are unlikely to support non-BMP characters and thus can work equally well with 16-bit UTF-16 encoding.
>
> Does this argue against WIDECHAR=UTF-32? Would we be better off simply saying WIDECHAR=UTF-16 and leaving things as they are? Yes, it would make the definition of WideCharAt a little odd, because the index would be defined in 16-bit units rather than UTF-16 glyph code-points.
>

No.
Keeping WIDECHAR at only 2^16 values does nothing to get us out of the morass we are now in where every bit of character-manipulating code has to cope with different encodings and/or variable-sized encodings. If we make WIDECHAR capable of holding any Unicode code point, then we have the possibility of dealing with characters in the same abstractions as we originally had, and, with only an 8-bit character set, still do. Specifically, we have a variable type that holds any character, arrays thereof, and a very general functional style package of strings thereof. Library streams can handle encoding transformations, and most code won't have to worry about them, beyond specifying once what encoding it wants.

Of course, you could still always do low-level stuff like putting one UTF-8 code _unit_ into a WIDECHAR or CHAR, having arrays or TEXTs thereof, and constantly fiddling with the encoding. But this should not be required.

> By the way, if we did change WIDECHAR to an enumeration containing 16_110000 elements then the stored (memory) size of WIDECHAR would be 4 bytes given the current CM3 implementation of enumerations, which chooses a (non-PACKED) stored size of 1/2/4/8 bytes depending on the number of elements.
>

I have thought about making BYTESIZE(WIDECHAR) = 3, but that would at best trade one group of problems for another. In particular, applying ORD functions and doing arithmetic on characters located in arrays (including those hidden inside Text) would always involve repacking to get things aligned. I would think we would at least want to keep WIDECHAR scalars aligned.

>>
>> On Jun 27, 2012, at 10:12 PM, Rodney M. Bates wrote:
>>
>>> On 06/27/2012 07:32 PM, Antony Hosking wrote:
>>>> So what do we do about 6-byte UTF-8 code points? They won't fit in WIDECHAR. Surely we should allow accessing a UTF-8 character as a CARDINAL and be done with it?
>>>>
>>>
>>> Absolutely.
>>> Except I think a better way is to make WIDECHAR big enough to hold all of
>>> Unicode.
>>>
>>>> Sent from my iPad
>>>>
>>>> On Jun 27, 2012, at 3:20 PM, "Rodney M. Bates" wrote:
>>>>
>>>>> On 06/26/2012 10:30 PM, Hendrik Boom wrote:
>>>>>> On Tue, Jun 26, 2012 at 04:22:22PM -0400, Coleburn, Randy wrote:
>>>>>>> I seem to recall that Rodney did some work a while back relating to TEXT.
>>>>>>> Rodney, can you weigh in on some of this?
>>>>>>> --Randy Coleburn
>>>>>>>
>>>>>>> From: Dragiša Durić [mailto:dragisha at m3w.org]
>>>>>>> Sent: Tuesday, June 26, 2012 12:46 PM
>>>>>>> To: Jay
>>>>>>> Cc: m3devel
>>>>>>> Subject: EXT Re: [M3devel] AND (., 16_ff). Not serious - or so I hope!
>>>>>>>
>>>>>>> You had idea in other message. Store length!
>>>>>>>
>>>>>>> Another idea - store partial list of indices to character locations. So whatever one does, that list can be used/expanded. Whatever storage issues this makes, they are probably minor as compared to 32bit WIDECHAR for all idea.
>>>>>>
>>>>>> Most of the time, you don't need explicit integer indexes to character
>>>>>> locations. What you do need is an operation that fetches a character
>>>>>> given the string and its index (whatever data structure that index is),
>>>>>> and one that increments the index past that character. As long as you
>>>>>> can save an index and use it later on the same string, that's probably
>>>>>> all you ever need. And with a simple TEXT representation (such as the
>>>>>> obvious array of bytes containing characters of various widths) a byte
>>>>>> index is all you need (note: NOT a character index). It's easy even to
>>>>>> use TEXT and its integer indices as the data representation, as long as
>>>>>> you use the proper functions to parse the characters and increment the
>>>>>> indices by amounts that might differ from 1.
>>>>>>
>>>>>> And if your source code is represented in UTF-8, the representation that
>>>>>> requires little extra compiler effort to parse, your TEXT strings will
>>>>>> automagically appear in UTF-8.
>>>>>
>>>>> The original designers of the language and its libraries have given us
>>>>> two different abstractions for handling character strings (in addition
>>>>> to plain arrays.) 1) Text, and 2) Wr, Rd, and their cousins.
>>>>>
>>>>> Text is highly general and easy to use. Concatenations and substrings
>>>>> are easy. Semantics, to its clients, are value semantics, similar to INTEGER.
>>>>> Random access by *character* number is easy and, hopefully, implemented
>>>>> with efficiency at least better than O(n).
>>>>>
>>>>> Wr and friends restrict you to sequential access, at least mostly, but
>>>>> gain implementation convenience and efficiency as a result.
>>>>>
>>>>> I feel very strongly that we should *not* take away the full generality
>>>>> of Text, especially efficient random access, to handle variable-length
>>>>> character encodings in strings. For these, let's make more friends of
>>>>> Wr and Rd, which already assume sequential access. For example, a
>>>>> filter pipe that sequentially reads a Text/Array/stream, applies a UTF-8
>>>>> interpretation to its bytes, and delivers a stream of Unicode characters,
>>>>> in variables of type WIDECHAR.
>>>>>
>>>>> Text should preserve the abstraction that it's a string of characters,
>>>>> generalized as it already is in cm3, to have type WIDECHAR, so they can be any
>>>>> Unicode character. The internal representation should, usually, not be
>>>>> of concern.
>>>>>
>>>>> Note that nowhere in Text are character values transferred between
>>>>> a Text.T and any form of I/O stream. In the Text abstraction, all
>>>>> characters go in and out of a Text.T in variables of type CHAR,
>>>>> WIDECHAR, and arrays thereof. IO, etc. is only done in streams,
>>>>> e.g., TextWr. We can easily add new variants of these that encode/decode
>>>>> by various rules.
>>>>>
>>>>> Of course, it is still valid to put a string of bytes in a Text.T and
>>>>> apply, e.g., UTF-8 interpretation yourself. But that's lower-level
>>>>> programming, and shouldn't confuse the abstraction.
>>>>>
>>>>>>
>>>>>> I can see a use for various wide characters -- the things you extract
>>>>>> from a TEXT by parsing bits of it, but none for anything
>>>>>> really new complicated for wide TEXT.
>>>>>>
>>>>>> The only confusing thing is that the existing operations for extracting
>>>>>> bytes from TEXT have names that suggest they are extracting characters.
>>>>>>
>>>>>
>>>>> I think it's more than a suggestion. I think the abstraction clearly
>>>>> considers them characters. And it should stay that way. If you want,
>>>>> at a higher level of code, to treat them as bytes, that's fine, but the
>>>>> abstraction continues to view them as characters (which only you, the
>>>>> client, know is not really so.)
>>>>>
>>>>>> -- Hendrik

From rodney_bates at lcwb.coop Fri Jul 6 20:27:28 2012
From: rodney_bates at lcwb.coop (Rodney M. Bates)
Date: Fri, 06 Jul 2012 13:27:28 -0500
Subject: [M3devel] A question for our language lawyers
In-Reply-To:
References:
Message-ID: <4FF72E10.3030204@lcwb.coop>

On 07/06/2012 04:23 AM, Dirk Muysers wrote:
> The report says (2.6.9)
> "The values in the array will be arbitrary values of their type."
> Now, ParseParams in its "init" method allocates an array of BOOLEANs
> and relies on the fact that it is supposedly initialised with FALSE values.
> On the other hand the report says (2.2.4)
> "The constant |default| is a default value used when a record is constructed or allocated"
> If I allocate an array of records, which statement is stronger:
> - the array contains arbitrary record values?
> - the array record fields will be initialised to their default values?

Admittedly unclearly if not misleadingly worded.
Better wording might be to say each element is initialized as it would if it were a scalar variable of its type.

I think the way to interpret this is that the array itself does not impose any initialization, but this fact will not eliminate initialization imposed by other rules, specifically, the type of the array's elements.

This is a language quirk that I have always been deeply ambivalent about. The type safety would go down the drain if variables were not initialized to a bit pattern that represents some value of the type, so we have to pay the performance penalty of executing initialization code. So why not define which value of the type is initialized-to and get behavioral predictability for free? And further save redundant initialization in the likely event that the compiler's chosen arbitrary value happens to match what the programmer wants? (OK, a smart enough optimizer might figure this out, but we could have had it even with a naive compiler.)

The contrary case is a type whose compiler-chosen representation happens to use every bit pattern in the allocated space for a value of the type. Here, no compiler-generated runtime initialization is needed.

Also, the rule we have might sometimes encourage programmers to at least give a millisecond's thought to whether they need to do some explicit initialization.

> The ParseParams "init" method is obviously erroneous and works only
> by virtue of a happy combination of circumstances.
> But how is the report to be interpreted in the second case?
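The counting argument behind the "contrary case" can be sketched in Python. The helper names stored_size and needs_runtime_init are hypothetical, and the 1/2/4/8-byte size rule is taken from the description of CM3 enumerations quoted elsewhere in this thread. The sketch models only which types leave some bit patterns unused; it makes no claim about the code any compiler actually emits.

```python
def stored_size(n_values: int) -> int:
    """Smallest of 1/2/4/8 bytes able to hold n_values distinct values (assumed rule)."""
    for size in (1, 2, 4, 8):
        if n_values <= 256 ** size:
            return size
    raise ValueError("too many values for an 8-byte representation")


def needs_runtime_init(n_values: int) -> bool:
    """True if some bit pattern of the stored size is not a value of the type."""
    return n_values < 256 ** stored_size(n_values)


# Element counts: BOOLEAN has 2 values, CHAR has 256, and the proposed
# Unicode WIDECHAR would have 16_110000 (0x110000) values.
types = {
    "BOOLEAN": 2,
    "CHAR": 256,
    "WIDECHAR (16-bit)": 2 ** 16,
    "WIDECHAR (Unicode)": 0x110000,
}
report = {name: (stored_size(n), needs_runtime_init(n)) for name, n in types.items()}

for name, (size, init) in report.items():
    print(f"{name}: {size} byte(s), initialization needed for type safety: {init}")
```

Under these assumptions a one-byte BOOLEAN has 254 invalid bit patterns, which is why an uninitialized array of BOOLEANs cannot safely be left as raw memory, while a 256-value CHAR uses every pattern of its byte.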
From dragisha at m3w.org Fri Jul 6 20:51:10 2012
From: dragisha at m3w.org (=?utf-8?Q?Dragi=C5=A1a_Duri=C4=87?=)
Date: Fri, 6 Jul 2012 20:51:10 +0200
Subject: [M3devel] Simple change to WIDECHAR type
In-Reply-To: <4FF7121A.9000909@lcwb.coop>
References: <4AD84314-8E53-4A50-A51B-A0DF8DC910BC@m3w.org> <031FD9D2-8C62-48B3-8E2F-1122B7C1ED96@m3w.org> <20120630172401.DFE8E1A207C@async.async.caltech.edu> <7BCB3BB7-D2F2-470B-8E70-2A7FF274E0FC@m3w.org> <95923C4D-B4F9-4245-809B-C3D186B8679C@m3w.org> <4FF7121A.9000909@lcwb.coop>
Message-ID:

And then, turn parsed string literals into broken WIDECHAR TEXTs?

On Jul 6, 2012, at 6:28 PM, Rodney M. Bates wrote:

> This is the result of the fact that your editor is writing UTF-8, while
> the compiler is reading in ISO-latin-1, as the language specifies. This
> was sensible at the time it was defined, but has been overcome by the
> advent and proliferation of Unicode.
>
> The abstract code point values in the range 16_80..16_FF are indeed the same in
> Unicode and ISO-latin-1, but the bit encoding rules are different.
>
> The simple and correct solution is to fix the compiler so that, like many
> programs today, it can be told to use one of several encodings when interpreting
> its input. Then set it the same as you set your editor.
>
> On 07/01/2012 03:52 AM, Dragiša Durić wrote:
>> Text.Length(Dragiša Durić)= 15
>>
>> out from:
>>   WITH me = W"Dragiša Durić" DO
>>     IO.Put("Text.Length(" & me & ")= " & Fmt.Int(Text.Length(me)) & "\n");
>>   END;
>>
>> On Jun 30, 2012, at 8:12 PM, Dragiša Durić wrote:
>>
>>> Jay, please explain this to me. My editor creates UTF8 files, for example. What cm3 expects after W" ?

From dabenavidesd at yahoo.es Fri Jul 6 21:17:51 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Fri, 6 Jul 2012 20:17:51 +0100 (BST)
Subject: [M3devel] A question for our language lawyers
In-Reply-To: <4FF72E10.3030204@lcwb.coop>
Message-ID: <1341602271.14326.YahooMailClassic@web29701.mail.ird.yahoo.com>

Hi all:
English men say array is a sequence of elements (of a common type), and a BOOLEAN is an enumeration so you might attack that distinction to define what is an initialized boolean or array of boolean in common compilers, gcc javac, etc, which if is java-like is really undefined:
http://www.drdobbs.com/architecture-and-design/the-humble-boolean-deserves-help/232900836?cid=DDJ_nl_upd_2012-05-02_h&elq=cef656ee4d6c4bca996b337620b98f85

So I prefer non-uniform rules for records different of Sets, Arrays, and records as that, note that NEW expression doesn't allow constructors to be used, so the only thing you can use is array of uninitialized variables (but current gcc or javac, etc are really wrong in that)

This means we need to address this by either a native backend (NT386) or by another language for that matter.

Thanks in advance for any comments you may have

From dabenavidesd at yahoo.es Fri Jul 6 21:57:25 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Fri, 6 Jul 2012 20:57:25 +0100 (BST)
Subject: [M3devel] A question for our language lawyers
In-Reply-To: <1341602271.14326.YahooMailClassic@web29701.mail.ird.yahoo.com>
Message-ID: <1341604645.24546.YahooMailClassic@web29706.mail.ird.yahoo.com>

Hi all:
I think if we are to type define initialization, we need a kernel to type more fun than rigid Modula-3 semantics:
http://www.hpl.hp.com/techreports/Compaq-DEC/SRC-RR-95.pdf

That said, we can define a m3kernel sort of type minimal abstraction of a Modula-3 Object, and build on top of that. Advantages are we can type theorize in every wanted way with it and still protect us from incompatible type systems, by branding the type system to allow smooth transitions. Besides parallelization implicitly in the abstract machine (kernel) and check the type safety of it. Also rewrite the type system in terms of this kernel might get us to a new language in the sense of a language definition smoothly.

If someone deems this good I can make my try.

Thanks in advance

From dmuysers at hotmail.com Fri Jul 6 21:54:54 2012
From: dmuysers at hotmail.com (Dirk Muysers)
Date: Fri, 6 Jul 2012 21:54:54 +0200
Subject: [M3devel] A question for our language lawyers
In-Reply-To: <1341602271.14326.YahooMailClassic@web29701.mail.ird.yahoo.com>
References: <1341602271.14326.YahooMailClassic@web29701.mail.ird.yahoo.com>
Message-ID:

Daniel, with my apologies, sometimes I wonder if you do it on purpose.

From dabenavidesd at yahoo.es Fri Jul 6 22:07:15 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Fri, 6 Jul 2012 21:07:15 +0100 (BST)
Subject: [M3devel] A question for our language lawyers
In-Reply-To:
Message-ID: <1341605235.19643.YahooMailClassic@web29702.mail.ird.yahoo.com>

Hi all:
thing is as today is you don't have any software to show the language is incorrect, so I can't validate you (I don't pretend to do that). Because there isn't any compiler that defines that.
Sorry for that, but nobody else seems to care, so thanks for sharing your problem; at least someone else is interested in it as well. Dr. Dobb's talks about a tri-state boolean; I thought it was meant to show that. Sorry if not.

Thanks in advance

--- On Fri, 6/7/12, Dirk Muysers wrote:

From: Dirk Muysers
Subject: Re: [M3devel] A question for our language lawyers
To: "Daniel Alejandro Benavides D." , m3devel at elegosoft.com, "Rodney M. Bates"
Date: Friday, July 6, 2012 14:54

Daniel, with my apologies, sometimes I wonder if you do it on purpose.

From dabenavidesd at yahoo.es Fri Jul 6 22:59:12 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Fri, 6 Jul 2012 21:59:12 +0100 (BST)
Subject: [M3devel] A question for our language lawyers
In-Reply-To: <1341604645.24546.YahooMailClassic@web29706.mail.ird.yahoo.com>
Message-ID: <1341608352.82920.YahooMailClassic@web29702.mail.ird.yahoo.com>

Hi all:
See Baby Modula-3, which allows field definition (value by definition, s. 3.1) for free; see p. 10-11 in the URL.

Thanks in advance

--- On Fri, 6/7/12, Daniel Alejandro Benavides D. wrote:

From: Daniel Alejandro Benavides D.
Subject: Re: [M3devel] A question for our language lawyers
To: m3devel at elegosoft.com, "Rodney M. Bates"
Date: Friday, July 6, 2012 14:57

Hi all:
I think if we are to type-define initialization, we need a kernel to type, more fun than rigid Modula-3 semantics:

http://www.hpl.hp.com/techreports/Compaq-DEC/SRC-RR-95.pdf

That said, we can define an m3kernel, a sort of minimal type abstraction of a Modula-3 Object, and build on top of that. The advantages are that we can type-theorize in every wanted way with it and still protect ourselves from incompatible type systems, by branding the type system to allow smooth transitions. Besides that, we get parallelization implicitly in the abstract machine (kernel) and can check its type safety. Also, rewriting the type system in terms of this kernel might get us smoothly to a new language, in the sense of a language definition.

If someone deems this good, I can give it a try.

Thanks in advance

From dragisha at m3w.org Fri Jul 6 23:07:59 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Fri, 6 Jul 2012 23:07:59 +0200
Subject: [M3devel] A question for our language lawyers
In-Reply-To:
References: <1341602271.14326.YahooMailClassic@web29701.mail.ird.yahoo.com>
Message-ID:

Dirk,
If you still have doubts, you are better man than most of us :)
Thanks in advance!
On Jul 6, 2012, at 9:54 PM, Dirk Muysers wrote:

> Daniel, with my apologies, sometimes I wonder if you do it on purpose.

From dabenavidesd at yahoo.es Fri Jul 6 23:44:55 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Fri, 6 Jul 2012 22:44:55 +0100 (BST)
Subject: [M3devel] A question for our language lawyers
In-Reply-To:
Message-ID: <1341611095.41843.YahooMailClassic@web29706.mail.ird.yahoo.com>

Hi all:
As I said once, why say what's right and what's wrong? In terms of standards nobody cares about that, so who cares to say it? (See other programming languages that need help first, like C and friends!)

Thanks in advance

--- On Fri, 6/7/12, Dragiša Durić wrote:

From: Dragiša Durić
Subject: Re: [M3devel] A question for our language lawyers
To: "Dirk Muysers"
CC: "Daniel Alejandro Benavides D." , m3devel at elegosoft.com, "Rodney M. Bates"
Date: Friday, July 6, 2012 16:07

Dirk,
If you still have doubts, you are better man than most of us :)
Thanks in advance!

From jay.krell at cornell.edu Sat Jul 7 08:05:39 2012
From: jay.krell at cornell.edu (Jay K)
Date: Sat, 7 Jul 2012 06:05:39 +0000
Subject: [M3devel] A question for our language lawyers
In-Reply-To:
References: <1341602271.14326.YahooMailClassic@web29701.mail.ird.yahoo.com>,
Message-ID:

I quite like the idea that all heap and stack is initialized by zeroing. This is, I believe, stronger/safer than Modula-3, at least for stack. Anyone want to measure the change?

I'd also like to see stack zeroed upon function return, so GC is easier to implement/understand...

From: dmuysers at hotmail.com
To: dabenavidesd at yahoo.es; m3devel at elegosoft.com; rodney_bates at lcwb.coop
Date: Fri, 6 Jul 2012 21:54:54 +0200
Subject: Re: [M3devel] A question for our language lawyers

Daniel, with my apologies, sometimes I wonder if you do it on purpose.

From dmuysers at hotmail.com Sat Jul 7 14:06:31 2012
From: dmuysers at hotmail.com (Dirk Muysers)
Date: Sat, 7 Jul 2012 14:06:31 +0200
Subject: [M3devel] A question for our language lawyers
Message-ID:

I reread ParseParams.m3 and, yes, they initialise the array of booleans. One should never trust one's memory, especially past a certain age. Yet I am sure I have seen one of the library modules relying on zero initialisation.
In my defence, I never (except for an occasional INC, where C would use ++) place two statements on the same line, so when I quickly browse through some code, the second statement often escapes my eyes. Nevertheless the initialisation question was worth mentioning.

From dabenavidesd at yahoo.es Sat Jul 7 14:57:03 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Sat, 7 Jul 2012 13:57:03 +0100 (BST)
Subject: [M3devel] A question for our language lawyers
In-Reply-To:
Message-ID: <1341665823.8622.YahooMailClassic@web29703.mail.ird.yahoo.com>

Hi all:
Yes, it could be, but on VAXen and Alphas I believe it did not cause wrong behavior that would reveal the incorrect initialization at start time, which most of the code trusts (Alphas just throw an exception to show that it was changed). I didn't know for sure it was wrong, but I guess that confirms the initialization code is not working by vicious value initialization. Did you see Baby Modula-3 (p. 10-11, s. 3.1, Relation to Modula-3)? It says you can override fields at the type level to override defaults.

Thanks in advance

From rodney_bates at lcwb.coop Sat Jul 7 15:59:07 2012
From: rodney_bates at lcwb.coop (Rodney M. Bates)
Date: Sat, 07 Jul 2012 08:59:07 -0500
Subject: [M3devel] Simple change to WIDECHAR type
In-Reply-To:
References: <4AD84314-8E53-4A50-A51B-A0DF8DC910BC@m3w.org> <031FD9D2-8C62-48B3-8E2F-1122B7C1ED96@m3w.org> <20120630172401.DFE8E1A207C@async.async.caltech.edu> <7BCB3BB7-D2F2-470B-8E70-2A7FF274E0FC@m3w.org> <95923C4D-B4F9-4245-809B-C3D186B8679C@m3w.org> <4FF7121A.9000909@lcwb.coop>
Message-ID: <4FF840AB.5050807@lcwb.coop>

On 07/06/2012 01:51 PM, Dragiša Durić wrote:
> And then, turn parsed string literals into broken WIDECHAR TEXTs?

Well, yes, that requires fixing WIDECHAR too. But at least it would work if you can live within the BMP.

> On Jul 6, 2012, at 6:28 PM, Rodney M. Bates wrote:
>
>> This is the result of the fact that your editor is writing UTF-8, while
>> the compiler is reading in ISO-latin-1, as the language specifies. This
>> was sensible at the time it was defined, but has been overcome by the
>> advent and proliferation of Unicode.
>>
>> The abstract code point values in the range 16_80..16_FF are indeed the same in
>> Unicode and ISO-latin-1, but the bit encoding rules are different.
>>
>> The simple and correct solution is to fix the compiler so that, like many
>> programs today, it can be told to use one of several encodings when interpreting
>> its input. Then set it the same as you set your editor.
>>
>> On 07/01/2012 03:52 AM, Dragiša Durić wrote:
>>> Text.Length(Dragiša Durić)= 15
>>>
>>> out from:
>>> WITH me = W"Dragiša Durić" DO
>>> IO.Put("Text.Length(" & me & ")= " & Fmt.Int(Text.Length(me)) & "\n");
>>> END;
>>>
>>> On Jun 30, 2012, at 8:12 PM, Dragiša Durić wrote:
>>>
>>>> Jay, please explain this to me. My editor creates UTF8 files, for example. What cm3 expects after W" ?

From dabenavidesd at yahoo.es Sat Jul 7 18:17:10 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Sat, 7 Jul 2012 17:17:10 +0100 (BST)
Subject: [M3devel] Modula-3 TLA Win32 Kernel Threads API Specification by Leslie Lamport
Message-ID: <1341677830.27299.YahooMailClassic@web29704.mail.ird.yahoo.com>

Hi all:
I wanted to share what I have found recently:

http://web.archive.org/web/20010712210213/http://www.research.compaq.com/SRC/personal/lamport/tla/threads/threads.html

I would like to do the same for POSIX 1003.4 (original DEC proposal) and post it. Would the Elego folks mind uploading the Lamport specification to the CVS tree? I think these are important design notes for the Win32 Threads API. Please let me know if you are interested at all.

Also, its TLA code might be placed in an m3theory subdirectory of m3kernel.

In fact there is a TLA checker written in connection with the Zeus Algorithm Animation system for automating the animation of proofs, so I guess we just lack that part for further integration.

Thanks in advance

From rodney_bates at lcwb.coop Tue Jul 10 17:57:04 2012
From: rodney_bates at lcwb.coop (Rodney M. Bates)
Date: Tue, 10 Jul 2012 10:57:04 -0500
Subject: [M3devel] A Unicode/WIDECHAR proposal
Message-ID: <4FFC50D0.4000805@lcwb.coop>

Here is a more-or-less comprehensive proposal to get modern support of Unicode and its various encodings into Modula-3 and its libraries, while preserving both backward compatibility and original abstractions.

Summary:

Fix WIDECHAR so it holds all of Unicode. This restores the abstractions we once had, by treating every character as a value of a scalar type, for in-memory processing. The members of a TEXT and elements of ARRAY OF WIDECHAR get this property too.

Do encoding/decoding in streams Wr and Rd, which are inherently sequential anyway. Give every stream an encoding property. Add procedures to get/put characters with encoding/decoding.

These changes are backward-compatible.
You can still do low-level stuff if you have good reason, or just want to leave existing code alone. E.g., putting the bytes of UTF-8 into the characters of a TEXT and doing your own encoding/decoding.

CHAR:

Leave CHAR as it is: exactly 256 values, encoded in ISO-Latin-1, ORD(FIRST(CHAR))=0, ORD(LAST(CHAR))=16_FF, BYTESIZE(CHAR)=1. The language allows CHAR to have more values, but changing this would no doubt seriously undermine a good bit of existing code.

WIDECHAR:

Change WIDECHAR to have exactly the Unicode range. ORD(FIRST(WIDECHAR))=0 and ORD(LAST(WIDECHAR))=16_10FFFF. The full ORD and VAL functions from/to WIDECHAR are defined by the code point to character mapping of the Unicode standard. BYTESIZE(WIDECHAR)=4.

Make actual internal representation be Unicode code points also. This happens to match UTF-32, most significantly for arrays of WIDECHAR.

Note that some of the code point values in this range are not Unicode characters. Programmers will need to account for this.

CHAR <: WIDECHAR, which means they are mutually assignable, with a runtime check in the one direction. This works because the Unicode code points and the ISO-Latin-1 code points are identical in the entire ISO-Latin-1 range, up to 16_FF. Note that at 16_80 and above, the UTF-8 encoding is more than one byte, none of them equal to the encoded code point. This is not a problem, because both CHAR and WIDECHAR are actual code points, not one of the bytes of UTF-8.

TEXT:

TEXT continues to be defined as abstractly a sequence of WIDECHAR. An index into a TEXT is an integer count of characters. The internal representation (used only in memory, and maybe in pickles) is hidden and could be just about anything. Given the extreme memory inefficiency of the current cm3 implementation of TEXT, we no doubt will want to change it, but this decision is independent and at a lower level. The abstract interface Text will hide this.
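As a rough illustration of the code-point-versus-byte distinction above, here is a small Python sketch (Python rather than Modula-3, since the proposed interfaces do not exist yet). It counts code points and UTF-8 bytes for the name from the earlier Text.Length example, then mis-decodes the UTF-8 bytes as ISO-Latin-1, which is how the compiler was described as reading the source. The helper name `misread_utf8_as_latin1` and the constant `MAX_CODE_POINT` are made up for this example.

```python
# Sketch only: Python's str is a sequence of Unicode code points, which is
# roughly the abstraction the proposal wants WIDECHAR and TEXT to provide.

MAX_CODE_POINT = 0x10FFFF  # the proposed ORD(LAST(WIDECHAR))


def misread_utf8_as_latin1(text):
    """Encode as UTF-8, then decode each byte as one ISO-Latin-1 character.

    Hypothetical helper: it mimics a reader that assumes one byte per
    character when the file was actually written as UTF-8.
    """
    return text.encode("utf-8").decode("latin-1")


name = "Dragi\u0161a Duri\u0107"  # the name used in the Text.Length example

code_points = len(name)                  # characters as the author sees them
utf8_bytes = len(name.encode("utf-8"))   # bytes the editor writes to disk
misread = len(misread_utf8_as_latin1(name))

print(code_points, utf8_bytes, misread)  # prints: 13 15 15

# Code points up to 16_FF coincide in ISO-Latin-1 and Unicode, so widening a
# CHAR-range value never changes it, but from 16_80 up its UTF-8 form is
# more than one byte.
e_acute = "\u00e9"
assert e_acute.encode("latin-1") == b"\xe9"
assert e_acute.encode("utf-8") == b"\xc3\xa9"
assert ord(e_acute) == 0xE9 <= MAX_CODE_POINT
```

The 15 in the last column matches the length reported in the thread: two of the thirteen characters take two bytes each in UTF-8, and a byte-per-character reader counts every byte.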
There is hardly a remaining need for Text.FromChar, because by assignability, Text.FromWideChar can be used in its place, with the same result. But keep FromChar, for compatibility with existing code. Text.FromChars just means the code points in the created text will happen to be members of type CHAR.

Text.GetChar and Text.GetChars will raise an exception if a to-be-gotten code point in the TEXT lies outside the type CHAR. This is a change from existing behavior, which just truncates the high bits of a WIDECHAR value and returns only the low bits. Even if we didn't add the exception, we would want this to be an assignability runtime error.

Literals:

Inside wide character and wide text literals, add two new escapes, \u, which is followed by exactly 4 hex digits denoting a code point, and \U, which is followed by exactly 6 hex digits. The letters 'u' and 'U' are used in this way in the Unicode standard. \u would be redundant with the existing \x and \X escapes, but those would merely preserve compatibility for existing code. (Or is there so little existing code using them that we could eliminate them for a more consistent system?)

Encodings:

Define an enumeration giving the possible encodings used in streams:

  TYPE Encoding = {Inherit, ISO_Latin_1, UCS_2LE, UTF_8, UTF_16, UTF_16BE, UTF_16LE, UTF_32, UTF_32BE, UTF_32LE};

ISO_Latin_1 means one byte per character, unconditionally. This is the way current Modula-3 always encodes CHAR. An attempt to Put a code point greater than 16_FF in this encoding will raise an exception. (This can happen only using newly added procedures.)

Similarly, UCS_2LE, as I understand the standard, means exactly two bytes per character, LSB first. This is what our current Wr and Rd use for WIDECHAR. Here again, an exception will be raised for a code point greater than 16_FFFF. This, also, can happen only using newly added procedures.
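The range rules for the fixed-width encodings can be sketched as follows. This is a Python model, not the proposed Modula-3 interface: the `Encoding` enum mirrors only a subset of the enumeration above, and `encode_char` and `EncodingRangeError` are hypothetical names invented for the example.

```python
from enum import Enum


class Encoding(Enum):
    """A subset of the proposal's Encoding enumeration, modelled in Python."""
    ISO_Latin_1 = "latin-1"
    UCS_2LE = "ucs-2le"
    UTF_8 = "utf-8"
    UTF_32LE = "utf-32le"


class EncodingRangeError(ValueError):
    """Hypothetical stand-in for the exception the proposal mentions."""


def encode_char(code_point, enc):
    """Encode one code point, raising if `enc` cannot represent it."""
    if not 0 <= code_point <= 0x10FFFF:
        raise EncodingRangeError("not a Unicode code point: %#x" % code_point)
    if enc is Encoding.ISO_Latin_1:
        if code_point > 0xFF:
            raise EncodingRangeError("ISO_Latin_1 holds at most 16_FF")
        return bytes([code_point])
    if enc is Encoding.UCS_2LE:
        if code_point > 0xFFFF:
            raise EncodingRangeError("UCS_2LE holds at most 16_FFFF")
        return code_point.to_bytes(2, "little")
    if enc is Encoding.UTF_32LE:
        return code_point.to_bytes(4, "little")
    # UTF-8 is variable width; surrogate code points have no UTF-8 form.
    if 0xD800 <= code_point <= 0xDFFF:
        raise EncodingRangeError("surrogate code points have no UTF-8 form")
    return chr(code_point).encode("utf-8")


# Try one code point beyond the BMP under each modelled encoding.
for enc in (Encoding.ISO_Latin_1, Encoding.UCS_2LE, Encoding.UTF_8):
    try:
        print(enc.name, encode_char(0x1F600, enc))
    except EncodingRangeError as err:
        print(enc.name, "rejected:", err)
```

In this model the two fixed-width encodings reject the code point, while UTF-8 encodes it in four bytes. Which exception the real procedures would raise is not stated in the proposal.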
Inherit means get the encoding to be used from somewhere else, for example, from the file system, in case it is able to store this property of a file.

Every Wr.T and every Rd.T has an Encoding property that can be specified when creating the stream, (from one of its subtypes). The ways of doing this can vary with the subtype. This defaults to Inherit, which means, if possible, take it from the file system, etc. Otherwise, there are defaults for the various streams.

New operations that Put/Get Unicode characters have a parameter of type Encoding, with a default value of Inherit, which means get the encoding property from the stream. Accepting this default would be the usual way to use these procedures.

Specifying the encoding differently in the Put/Get procedure allows mixed encodings in a single stream. It seems dubious to encourage this, but existing Wr and Rd already provide plenty of opportunities to do similar stuff anyway, so this just extends existing semantics to the new procedures. It also allows some existing Put/Get procedures to be defined as equivalents to new ones.

Wr:

New procedure

  PutUniWideChar(Wr: T; ch: WIDECHAR; Enc:=Encoding.Inherit)

encodes the character using Enc and appends that to the stream.

There is hardly a need for a CHAR counterpart. Since CHAR is assignable to WIDECHAR, PutUniWideChar suffices for an actual parameter of either type. Whether the caller provides a CHAR or a WIDECHAR (or whether we were alternatively to have different procedures) does _not_ affect the encoding, only the value range that can be passed in.

Similar new procedures PutUniString, PutUniWideString, and PutUniText are counterparts to PutString, PutWideString, and PutText, respectively.

Existing PutChar and PutString, which write CHARs as one byte, each become equivalent to PutUniWideChar and PutUniString, with Enc:=Encoding.ISO_Latin_1.
Similarly, existing PutWideChar and PutWideString, which write WIDECHARs as two bytes each, become equivalent to PutUniWideChar and PutUniWideString, with Enc:=Encoding.UCS_2LE.

The existing Wr interface is peculiar, IMO, in that even though there is currently no distinction between a text and a wide text, we have PutText and PutWideText. These have identical signatures, both taking a TEXT (which can contain characters in the full WIDECHAR range). The difference is that PutText rather violently truncates every character in the text to 8 bits and writes that, implicitly in ISO-Latin-1 encoding. This is not equivalent to PutUniText with Enc:=Encoding.ISO_Latin_1, because the latter will raise an exception for unencodable code points.

Rd:

New procedure

  GetUniWideChar (rd:T; Enc:=Encoding.Inherit) :WIDECHAR

decodes, using Enc, and consumes, enough bytes from rd for one Unicode code point and returns it.

There is not a lot of need for a CHAR-returning counterpart of GetUniWideChar. A caller can just assign the result from GetUniWideChar to a CHAR variable and deal with the possible range error at the call site.

GetUniSub, GetUniWideSub, GetUniSubLine, GetUniWideSubLine, GetUniText, and GetUniTextLine are counterparts to GetSub, GetWideSub, GetSubLine, GetWideSubLine, GetWideText, and GetWideLine. They differ in decoding according to the Enc parameter.

In the new GetUni* procedures, any case where a partial character is terminated by end-of-file will raise an exception. This differs from the current GetWide* procedures, which all implicitly use UCS_2LE and just insert a zero byte as the MSB in this case.

Existing GetChar, GetSub, GetSubLine, GetText, and GetLine all implicitly use the ISO-Latin-1 encoding. GetWideChar, GetWideSub, GetWideSubLine, GetWideText, and GetWideLine all implicitly use UCS_2LE. They differ from new GetUni* procedures using UCS_2LE in that the latter raise an exception on an incomplete character.
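The difference between padding a partial character and reporting it can be made concrete with a decoder that distinguishes a clean end-of-input from a character cut off by it. Again this is only a sketch in C under stated assumptions: `decode_utf8`, `decode_ucs2le`, and the `DECODE_*` codes are hypothetical names, negative return values stand in for the proposed exception, and the UTF-8 decoder does not reject overlong forms or surrogates.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical status codes; the proposal would raise an exception
   for the incomplete and malformed cases. */
enum { DECODE_EOF = -1, DECODE_INCOMPLETE = -2, DECODE_MALFORMED = -3 };

/* Decode one UTF-8 code point from buf[*pos .. len).
   On success returns the code point and advances *pos past it.
   On any negative result *pos is left unchanged. */
long decode_utf8(const unsigned char *buf, size_t len, size_t *pos)
{
    unsigned char b0;
    uint32_t cp;
    int extra, i;

    assert(buf != NULL && pos != NULL);
    if (*pos >= len) return DECODE_EOF;          /* clean end of input */

    b0 = buf[*pos];
    if (b0 < 0x80)                { extra = 0; cp = b0; }
    else if ((b0 & 0xE0) == 0xC0) { extra = 1; cp = b0 & 0x1F; }
    else if ((b0 & 0xF0) == 0xE0) { extra = 2; cp = b0 & 0x0F; }
    else if ((b0 & 0xF8) == 0xF0) { extra = 3; cp = b0 & 0x07; }
    else return DECODE_MALFORMED;                /* stray continuation byte etc. */

    /* A lead byte promising more bytes than remain is a partial character. */
    if (len - *pos - 1 < (size_t)extra) return DECODE_INCOMPLETE;

    for (i = 1; i <= extra; i++) {
        unsigned char b = buf[*pos + i];
        if ((b & 0xC0) != 0x80) return DECODE_MALFORMED;
        cp = (cp << 6) | (b & 0x3F);
    }
    if (cp > 0x10FFFF) return DECODE_MALFORMED;  /* beyond the Unicode range */

    *pos += (size_t)extra + 1;
    return (long)cp;
}

/* Decode one UCS-2LE character (two bytes, LSB first).  A lone trailing
   byte is reported as incomplete instead of being padded with a zero MSB. */
long decode_ucs2le(const unsigned char *buf, size_t len, size_t *pos)
{
    long cp;

    assert(buf != NULL && pos != NULL);
    if (*pos >= len) return DECODE_EOF;
    if (len - *pos < 2) return DECODE_INCOMPLETE;

    cp = (long)buf[*pos] | ((long)buf[*pos + 1] << 8);
    *pos += 2;
    return cp;
}
```

Because each call names its own decoder, two successive calls on the same buffer may consume different numbers of bytes, which is the mixed-encoding behavior the proposal permits but does not encourage.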
GetUniSub and GetUniSubLine return decoded characters in ARRAY OF CHAR and raise an exception if a decoded code point is not in CHAR. This might seem a bit ridiculous, but they could be useful for quick, partial adaptation of existing code to accept newer encodings and detect, without otherwise handling, higher code points.

Actually, GetWideText is documented as being identical to GetText, in behavior, as well as signature. I think this must be an editing error.

I wonder if we need to review the rules for what constitutes a line break.

A new UnGetUni would work like UnGetChar, but would reencode the pushed-back character, (retained internally as a WIDECHAR), according to its Enc parameter. The next Get* would then redecode according to its Enc parameter or implicit encoding, which could be different and consume a different number of bytes. If this seems bizarre, note that it continues established semantics. Existing UnGetChar will push back a character, implicitly in ISO-Latin-1, and it is possible to call GetWideChar next, which will use the pushed-back byte plus the byte following, decode in UCS-2LE, and return the result. UnGetUni will be more complicated to implement, but it can be done.

It seems odd that there is no UnGetWideChar. UnGetUni with Enc:=Encoding.UCS_2LE should accomplish this.

A UniCharsReady might be nice, but it would be O(n), for UTF-8 and UTF-16.

Of course, these changes will require corresponding changes in several other stream-related interfaces, particularly in providing ways to specify (and interrogate?) an encoding property of a stream.

Compiler source file encoding:

Existing rules for interpretation (de facto, from the cm3 implementation) of wide character and wide string literals depend on the encoding of the input file. At present, the compiler always assumes this is ISO-Latin-1. If it actually is a UTF-8 file, as is often the case today, this will result in incorrect conversion of literals.
If, in our current implementation, the value of such a literal is then written out by a Modula-3 program, unchanged, the program will write ISO-Latin-1. If some other program (e.g., an editor or terminal emulator) interprets this output file as UTF-8, the reverse incorrect reinterpretation will result in the original string. But if the program manipulates the characters using the language-defined abstraction, the result will in general be incorrect.

The same scenario applies when a single program reads in ISO-Latin-1, a file that was produced in UTF-8, writes in ISO-Latin-1, with the output file then being fed to some other program that interprets it as UTF-8.

From dabenavidesd at yahoo.es Wed Jul 11 00:30:15 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Tue, 10 Jul 2012 23:30:15 +0100 (BST)
Subject: [M3devel] A Unicode/WIDECHAR proposal
In-Reply-To: <4FFC50D0.4000805@lcwb.coop>
Message-ID: <1341959415.94700.YahooMailClassic@web29701.mail.ird.yahoo.com>

Hi all: Widechar is char simulation of a word sized char which is not intended by Rd/Wr implementation, read and write of literals is assuming that you won't get any real speed improvement over the DEC-SRC source to source transliteration of a given literal. This is to say, what you want is the same it is CM3 TEXT type with better functionality, is better to make polymorphic functions. e.g use FromChar receives both kind of chars without losing DEC-SRC representation characteristic and returning what you want in polymorphic (for instance your file text editor assumes you don't have real wide strings just yet one raw stream, then you can feed the text file in memory efficiently with a digital encoder optimized for your architecture and grab it there wherever you want, conversely opening an unused file you have to convert it at execution time, etc) way. Thanks in advance

From pgoltzsch at gmail.com Thu Jul 12 11:39:58 2012
From: pgoltzsch at gmail.com (Patrick Goltzsch)
Date: Thu, 12 Jul 2012 11:39:58 +0200
Subject: [M3devel] unix - unknown qualification
Message-ID: <20120712113958.33d94bc4@leda>

Hi!

I am having trouble compiling some older sources. I had the impression that it would be sufficient to "IMPORT Unix;" in ClsShare.m3 but obviously it's not:

--- building in ../AMD64_LINUX ---

new source -> compiling ClsShare.m3
"/work/mylib/src/ClsShare.m3", line 73: unknown qualification '.' (struct_flock)
"/work/mylib/src/ClsShare.m3", line 74: unknown qualification '.'
(F_WRLCK)
"/work/mylib/src/ClsShare.m3", line 75: unknown qualification '.' (L_SET)
"/work/mylib/src/ClsShare.m3", line 82: unknown qualification '.' (F_SETLK)
"/work/mylib/src/ClsShare.m3", line 98: unknown qualification '.' (struct_flock)
"/work/mylib/src/ClsShare.m3", line 99: unknown qualification '.' (F_UNLCK)
"/work/mylib/src/ClsShare.m3", line 100: unknown qualification '.' (L_SET)
"/work/mylib/src/ClsShare.m3", line 107: unknown qualification '.' (F_SETLK)
8 errors encountered

What am I doing wrong? Compiler is:

Critical Mass Modula-3 version 5.8.6
  last updated: 2010-04-11
  compiled: 2010-07-12 20:10:34
  configuration: /usr/local/cm3/bin/cm3.cfg
  host: AMD64_LINUX
  target: AMD64_LINUX

Thanks a lot,

Patrick

From rodney_bates at lcwb.coop Thu Jul 12 14:18:01 2012
From: rodney_bates at lcwb.coop (Rodney M. Bates)
Date: Thu, 12 Jul 2012 07:18:01 -0500
Subject: [M3devel] unix - unknown qualification
In-Reply-To: <20120712113958.33d94bc4@leda>
References: <20120712113958.33d94bc4@leda>
Message-ID: <4FFEC079.7040104@lcwb.coop>

I think we need to see some source code for ClsShare.m3, particularly to see what is before the dot on these lines. I don't see any of the failing qualifications in Unix.i3 in my cm3 directory.

From rodney_bates at lcwb.coop Thu Jul 12 14:27:38 2012
From: rodney_bates at lcwb.coop (Rodney M. Bates)
Date: Thu, 12 Jul 2012 07:27:38 -0500
Subject: [M3devel] unix - unknown qualification
In-Reply-To: <20120712113958.33d94bc4@leda>
References: <20120712113958.33d94bc4@leda>
Message-ID: <4FFEC2BA.4080406@lcwb.coop>

I poked around in a version of PM3. There, there are multiple, OS-dependent versions of Unix.i3. Most or all of them do have the failing qualifications declared in them. So somewhere along the line, Unix.i3 has changed and lost these declarations, leaving ClsShare in the lurch.

I don't know when or why this happened. Jay?

From pgoltzsch at gmail.com Thu Jul 12 14:58:11 2012
From: pgoltzsch at gmail.com (Patrick Goltzsch)
Date: Thu, 12 Jul 2012 14:58:11 +0200
Subject: [M3devel] unix - unknown qualification
In-Reply-To: <4FFEC079.7040104@lcwb.coop>
References: <20120712113958.33d94bc4@leda> <4FFEC079.7040104@lcwb.coop>
Message-ID: <20120712145811.2a4901d3@leda>

>>>>> Rodney M. Bates wrote:

> I think we need to see some source code for ClsShare.m3.
> particularly to see what is before the dot on these lines. I
> don't see any of the failing qualifications in Unix.i3 in my
> cm3 directory.

The first errors are caused by the following procedure, which seems to be copied from old DEC example code, as I found out while looking for a solution:

PROCEDURE FilePartLock( h : INTEGER; start, len : INTEGER ) : BOOLEAN RAISES {OSError.E} =
  VAR flock := Unix.struct_flock {
                 l_type := Unix.F_WRLCK,
                 l_whence := Unix.L_SET,
                 l_start := 0,
                 l_len := 0, (* i.e., whole file *)
                 l_pid := 0 }; (* don't care *)
  BEGIN
    flock.l_start := start;
    flock.l_len := len;
    IF Unix.fcntl( h, Unix.F_SETLK, LOOPHOLE( ADR( flock ), Ctypes.long ) ) < 0
    THEN
      IF Uerror.errno = Uerror.EACCES OR
         Uerror.errno = Uerror.EAGAIN THEN
        RETURN FALSE
      END;
      OSErrorPosix.Raise()
    END;
    RETURN TRUE
  END FilePartLock;

Regards,

Patrick

From dabenavidesd at yahoo.es Thu Jul 12 15:43:52 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Thu, 12 Jul 2012 14:43:52 +0100 (BST)
Subject: [M3devel] unix - unknown qualification
In-Reply-To: <4FFEC2BA.4080406@lcwb.coop>
Message-ID: <1342100632.27773.YahooMailClassic@web29705.mail.ird.yahoo.com>

Hi all: all gnu non-posix file consts and structs were pushed down to unix/linux-common files, but to accommodate for all non-posix standards is uncomfortable or impossible. So must use the kernel call directly to control the locking policy in C code and pass control to M3 youControlFile.c In a sane environment is better to reconstruct most of Unix Calls by Micro kernel, but I guess the world doesn't do that or maybe you can find a Unix API uniform enough Modular to do that like PosixFileC.c in libm3/src/os/POSIX for sure there is more than one outside there but who makes that thing doesn't uses Unixes like cygwin or some UnixControlFile.c that already do that would be wonderful. Thanks in advance

From dabenavidesd at yahoo.es Thu Jul 12 18:52:38 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Thu, 12 Jul 2012 17:52:38 +0100 (BST)
Subject: [M3devel] Why everything is an object
Message-ID: <1342111958.55562.YahooMailClassic@web29705.mail.ird.yahoo.com>

Hi all: If you read this might give some idea to all users about why here everything is an object for real: http://wcook.blogspot.com/ Curiosity, it doesn't much explain why functional isn't subsumed by OO, but every Object in the Baby Modula-3 is functional Thanks in advance

From jay.krell at cornell.edu Fri Jul 13 00:12:49 2012
From: jay.krell at cornell.edu (Jay K)
Date: Thu, 12 Jul 2012 22:12:49 +0000
Subject: [M3devel] unix - unknown qualification
In-Reply-To: <20120712145811.2a4901d3@leda>
References: <20120712113958.33d94bc4@leda>, <4FFEC079.7040104@lcwb.coop>, <20120712145811.2a4901d3@leda>
Message-ID:

Unix.i3 has always been a maintenance and portability problem. As such, it has been dramatically reduced. This stuff was probably removed, esp.
struct_flock.The constants can exposed easily enough, portably, but aren't useful without the struct, prpbably. You REALLY REALLY REALLY want to write this in C.Writing it in Modula-3 has many downsides. You lose safety. You lose static checking. You lose portability.You gain infinitely small efficiency.Something like: jbook2:libm3 jay$ pwd/dev2/cm3/m3-libs/libm3jbook2:libm3 jay$ find . | xargs grep flock./src/os/POSIX/FilePosixC.c:saves us from having to declare struct flock, which is gnarled up in #ifdefs../src/os/POSIX/FilePosixC.c: struct flock lock;./src/os/POSIX/FilePosixC.c: struct flock lock;./tests/os/src/locktest.c: struct flock param; ./src/os/POSIX/FilePosixC.c: /* Copyright (C) 1993, Digital Equipment Corporation *//* All rights reserved. *//* See the file COPYRIGHT for a full description. */ /*Writing part of libm3/os/POSIX/FilePosix.m3/RegularFileLock, RegularFileUnlock in Csaves us from having to declare struct flock, which is gnarled up in #ifdefs. see http://www.opengroup.org/onlinepubs/009695399/functions/fcntl.html*/ #include "m3core.h"#include #ifdef __cplusplusextern "C" {#endif #define FALSE 0#define TRUE 1 INTEGER FilePosixC__RegularFileLock(int fd){ struct flock lock; int err; ZeroMemory(&lock, sizeof(lock)); lock.l_type = F_WRLCK; lock.l_whence = SEEK_SET; if (fcntl(fd, F_SETLK, &lock) < 0) { err = errno; if (err == EACCES || err == EAGAIN) return FALSE; return -1; } return TRUE;} INTEGER FilePosixC__RegularFileUnlock(int fd){ struct flock lock; ZeroMemory(&lock, sizeof(lock)); lock.l_type = F_UNLCK; lock.l_whence = SEEK_SET; return fcntl(fd, F_SETLK, &lock);} #ifdef __cplusplus} /* extern "C" */#endif We can add this to libm3 probably. - Jay > Date: Thu, 12 Jul 2012 14:58:11 +0200 > From: pgoltzsch at gmail.com > To: m3devel at elegosoft.com > Subject: Re: [M3devel] unix - unknown qualification > > >>>>> Rodney M. Bates wrote: > > > I think we need to see some source code for ClsShare.m3. 
> > particularly to see what is before the dot on these lines. I > > don't see any of the failing qualifications in Unix.i3 in my > > cm3 directory. > > The first errors are caused by the following procedure, > which seems to copied from old DEC example code as I found > out while looking for a solution: > > PROCEDURE FilePartLock( h : INTEGER; start, len : INTEGER ) : BOOLEAN RAISES {OSError.E} = > VAR flock := Unix.struct_flock { > l_type := Unix.F_WRLCK, > l_whence := Unix.L_SET, > l_start := 0, > l_len := 0, (* i.e., whole file *) > l_pid := 0 }; (* don't care *) > BEGIN > flock.l_start := start; > flock.l_len := len; > IF Unix.fcntl( h, Unix.F_SETLK, LOOPHOLE( ADR( flock ), Ctypes.long ) ) < 0 > THEN > IF Uerror.errno = Uerror.EACCES OR > Uerror.errno = Uerror.EAGAIN THEN > RETURN FALSE > END; > OSErrorPosix.Raise() > END; > RETURN TRUE > END FilePartLock; > > > > Regards, > > Patrick -------------- next part -------------- An HTML attachment was scrubbed... URL: From jay.krell at cornell.edu Fri Jul 13 11:33:16 2012 From: jay.krell at cornell.edu (Jay K) Date: Fri, 13 Jul 2012 09:33:16 +0000 Subject: [M3devel] unix - unknown qualification In-Reply-To: References: <20120712113958.33d94bc4@leda>, <4FFEC079.7040104@lcwb.coop>, , <20120712145811.2a4901d3@leda>, Message-ID: Hey, how about I just provide copying wrappers here, like we do for stat?Where we define an idealized portable struct flock and copy it back and forth in C to the real struct flock? It is a little strange -- the wrapper is fnctl.It must check the first parameter, and know/assume its meaning. - Jay From: jay.krell at cornell.edu To: pgoltzsch at gmail.com; m3devel at elegosoft.com Date: Thu, 12 Jul 2012 22:12:49 +0000 Subject: Re: [M3devel] unix - unknown qualification Unix.i3 has always been a maintenance and portability problem.As such, it has been dramatically reduced.This stuff was probably removed, esp. 
struct_flock.
The constants can be exposed easily enough, portably, but aren't useful without the struct, probably.


You REALLY REALLY REALLY want to write this in C.
Writing it in Modula-3 has many downsides. You lose safety. You lose static checking. You lose portability.
You gain infinitely small efficiency.
Something like:


jbook2:libm3 jay$ pwd
/dev2/cm3/m3-libs/libm3
jbook2:libm3 jay$ find . | xargs grep flock
./src/os/POSIX/FilePosixC.c:saves us from having to declare struct flock, which is gnarled up in #ifdefs.
./src/os/POSIX/FilePosixC.c:    struct flock lock;
./src/os/POSIX/FilePosixC.c:    struct flock lock;
./tests/os/src/locktest.c:  struct flock param;


./src/os/POSIX/FilePosixC.c:

/* Copyright (C) 1993, Digital Equipment Corporation */
/* All rights reserved. */
/* See the file COPYRIGHT for a full description. */

/*
Writing part of libm3/os/POSIX/FilePosix.m3/RegularFileLock, RegularFileUnlock in C
saves us from having to declare struct flock, which is gnarled up in #ifdefs.

see http://www.opengroup.org/onlinepubs/009695399/functions/fcntl.html
*/

#include "m3core.h"
#include

#ifdef __cplusplus
extern "C" {
#endif

#define FALSE 0
#define TRUE 1

INTEGER FilePosixC__RegularFileLock(int fd)
{
    struct flock lock;
    int err;

    ZeroMemory(&lock, sizeof(lock));
    lock.l_type = F_WRLCK;
    lock.l_whence = SEEK_SET;

    if (fcntl(fd, F_SETLK, &lock) < 0)
    {
        err = errno;
        if (err == EACCES || err == EAGAIN)
            return FALSE;
        return -1;
    }
    return TRUE;
}

INTEGER FilePosixC__RegularFileUnlock(int fd)
{
    struct flock lock;

    ZeroMemory(&lock, sizeof(lock));
    lock.l_type = F_UNLCK;
    lock.l_whence = SEEK_SET;

    return fcntl(fd, F_SETLK, &lock);
}

#ifdef __cplusplus
} /* extern "C" */
#endif



We can add this to libm3 probably.


 - Jay

> Date: Thu, 12 Jul 2012 14:58:11 +0200
> From: pgoltzsch at gmail.com
> To: m3devel at elegosoft.com
> Subject: Re: [M3devel] unix - unknown qualification
>
> >>>>> Rodney M. Bates wrote:
>
> > I think we need to see some source code for ClsShare.m3.
> > particularly to see what is before the dot on these lines. I
> > don't see any of the failing qualifications in Unix.i3 in my
> > cm3 directory.
>
> The first errors are caused by the following procedure,
> which seems to be copied from old DEC example code as I found
> out while looking for a solution:
>
> PROCEDURE FilePartLock( h : INTEGER; start, len : INTEGER ) : BOOLEAN RAISES {OSError.E} =
>   VAR flock := Unix.struct_flock {
>     l_type := Unix.F_WRLCK,
>     l_whence := Unix.L_SET,
>     l_start := 0,
>     l_len := 0, (* i.e., whole file *)
>     l_pid := 0 }; (* don't care *)
>   BEGIN
>     flock.l_start := start;
>     flock.l_len := len;
>     IF Unix.fcntl( h, Unix.F_SETLK, LOOPHOLE( ADR( flock ), Ctypes.long ) ) < 0
>     THEN
>       IF Uerror.errno = Uerror.EACCES OR
>          Uerror.errno = Uerror.EAGAIN THEN
>         RETURN FALSE
>       END;
>       OSErrorPosix.Raise()
>     END;
>     RETURN TRUE
>   END FilePartLock;
>
>
>
> Regards,
>
> Patrick
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From rodney_bates at lcwb.coop Fri Jul 13 14:54:37 2012
From: rodney_bates at lcwb.coop (Rodney M. Bates)
Date: Fri, 13 Jul 2012 07:54:37 -0500
Subject: [M3devel] unix - unknown qualification
In-Reply-To:
References: <20120712113958.33d94bc4@leda>, <4FFEC079.7040104@lcwb.coop>, , <20120712145811.2a4901d3@leda>,
Message-ID: <50001A8D.80805@lcwb.coop>

Sounds like a good idea to me. It moves the M3/C boundary back just enough
to pick up all the #ifdef stuff, etc. but not the application-specific
code.

On 07/13/2012 04:33 AM, Jay K wrote:
> Hey, how about I just provide copying wrappers here, like we do for stat?
> Where we define an idealized portable struct flock and copy it back and forth in C to the real struct flock?
>
> It is a little strange -- the wrapper is fcntl.
> It must check the first parameter, and know/assume its meaning.
> > > - Jayrom: jay.krell at cornell.edu > To: pgoltzsch at gmail.com; m3devel at elegosoft.com > Date: Thu, 12 Jul 2012 22:12:49 +0000 > Subject: Re: [M3devel] unix - unknown qualification > > Unix.i3 has always been a maintenance and portability problem. > As such, it has been dramatically reduced. > This stuff was probably removed, esp. struct_flock. > The constants can exposed easily enough, portably, but aren't useful without the struct, prpbably. > > > You REALLY REALLY REALLY want to write this in C. > Writing it in Modula-3 has many downsides. You lose safety. You losestatic checking. You loseportability. > You gain infinitely small efficiency. > Something like: > > > jbook2:libm3 jay$ pwd > /dev2/cm3/m3-libs/libm3 > jbook2:libm3 jay$ find . | xargs grep flock > ./src/os/POSIX/FilePosixC.c:saves us from having to declare struct flock, which is gnarled up in #ifdefs. > ./src/os/POSIX/FilePosixC.c: struct flock lock; > ./src/os/POSIX/FilePosixC.c: struct flock lock; > ./tests/os/src/locktest.c: struct flock param; > > > ./src/os/POSIX/FilePosixC.c: > > /* Copyright (C) 1993, Digital Equipment Corporation */ > /* All rights reserved. */ > /* See the file COPYRIGHT for a full description. */ > > /* > Writing part of libm3/os/POSIX/FilePosix.m3/RegularFileLock, RegularFileUnlock in C > saves us from having to declare struct flock, which is gnarled up in #ifdefs. 
> > see http://www.opengroup.org/onlinepubs/009695399/functions/fcntl.html > */ > > #include "m3core.h" > #include > > #ifdef __cplusplus > extern "C" { > #endif > > #define FALSE 0 > #define TRUE 1 > > INTEGER FilePosixC__RegularFileLock(int fd) > { > struct flock lock; > int err; > > ZeroMemory(&lock, sizeof(lock)); > lock.l_type = F_WRLCK; > lock.l_whence = SEEK_SET; > > if (fcntl(fd, F_SETLK, &lock) < 0) > { > err = errno; > if (err == EACCES || err == EAGAIN) > return FALSE; > return -1; > } > return TRUE; > } > > INTEGER FilePosixC__RegularFileUnlock(int fd) > { > struct flock lock; > > ZeroMemory(&lock, sizeof(lock)); > lock.l_type = F_UNLCK; > lock.l_whence = SEEK_SET; > > return fcntl(fd, F_SETLK, &lock); > } > > #ifdef __cplusplus > } /* extern "C" */ > #endif > > > > We can add this to libm3 probably. > > > - Jay > > > > Date: Thu, 12 Jul 2012 14:58:11 +0200 > > From: pgoltzsch at gmail.com > > To: m3devel at elegosoft.com > > Subject: Re: [M3devel] unix - unknown qualification > > > > >>>>> Rodney M. Bates wrote: > > > > > I think we need to see some source code for ClsShare.m3. > > > particularly to see what is before the dot on these lines. I > > > don't see any of the failing qualifications in Unix.i3 in my > > > cm3 directory. 
> > > > The first errors are caused by the following procedure, > > which seems to copied from old DEC example code as I found > > out while looking for a solution: > > > > PROCEDURE FilePartLock( h : INTEGER; start, len : INTEGER ) : BOOLEAN RAISES {OSError.E} = > > VAR flock := Unix.struct_flock { > > l_type := Unix.F_WRLCK, > > l_whence := Unix.L_SET, > > l_start := 0, > > l_len := 0, (* i.e., whole file *) > > l_pid := 0 }; (* don't care *) > > BEGIN > > flock.l_start := start; > > flock.l_len := len; > > IF Unix.fcntl( h, Unix.F_SETLK, LOOPHOLE( ADR( flock ), Ctypes.long ) ) < 0 > > THEN > > IF Uerror.errno = Uerror.EACCES OR > > Uerror.errno = Uerror.EAGAIN THEN > > RETURN FALSE > > END; > > OSErrorPosix.Raise() > > END; > > RETURN TRUE > > END FilePartLock; > > > > > > > > Regards, > > > > Patrick From dabenavidesd at yahoo.es Fri Jul 13 16:44:55 2012 From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.) Date: Fri, 13 Jul 2012 15:44:55 +0100 (BST) Subject: [M3devel] unix - unknown qualification In-Reply-To: <50001A8D.80805@lcwb.coop> Message-ID: <1342190695.15538.YahooMailClassic@web29701.mail.ird.yahoo.com> Hi all: indeed but I'm afraid that using C API level specification programming doesn't make the bulk sense of the language, the core is about machine programming, that so many believe is better in C. But UNSAFE in my way of think is just better than C because you still have some check not bullet proof, but with appropriate module isolation you can control it doesn't propagate by using Modula-3 keen Modules in RTMachinery stopped appropriately and where the machine allows safety manageable execution you can recover from that (trapped error, like arithmetic overflow e.g to dump it in disk) or update your data and finish with an expectancy of following rules to stop execution, this is my point Jay. Now quality of current machines is going more bad than before, so who cares if we use DEC stuff. 
I wanted to say, that here the language designers tried hard to make easier to optimize itself the language and for this purpose in mind, with that objective makes sense to believe that the application itself must be compiled with Modula-3, so at some degree I'm being hypocritical about Gcc use, but sometimes using Gcc gives more time to develop the rest of the system. Thanks in advance --- El vie, 13/7/12, Rodney M. Bates escribi?: De: Rodney M. Bates Asunto: Re: [M3devel] unix - unknown qualification Para: m3devel at elegosoft.com Fecha: viernes, 13 de julio, 2012 07:54 Sounds like a good idea to me.? IT moves the M3/C boundary back just enough to pick up all the #ifdef stuff, etc. but not the application-specific code. On 07/13/2012 04:33 AM, Jay K wrote: > Hey, how about I just provide copying wrappers here, like we do for stat? > Where we define an idealized portable struct flock and copy it back and forth in C to the real struct flock? > > It is a little strange -- the wrapper is fnctl. > It must check the first parameter, and know/assume its meaning. > > >???- Jayrom: jay.krell at cornell.edu > To: pgoltzsch at gmail.com; m3devel at elegosoft.com > Date: Thu, 12 Jul 2012 22:12:49 +0000 > Subject: Re: [M3devel] unix - unknown qualification > > Unix.i3 has always been a maintenance and portability problem. > As such, it has been dramatically reduced. > This stuff was probably removed, esp. struct_flock. > The constants can exposed easily enough, portably, but aren't useful without the struct, prpbably. > > > You REALLY REALLY REALLY want to write this in C. > Writing it in Modula-3 has many downsides. You lose safety. You losestatic checking. You loseportability. > You gain infinitely small efficiency. > Something like: > > > jbook2:libm3 jay$ pwd > /dev2/cm3/m3-libs/libm3 > jbook2:libm3 jay$ find . | xargs grep flock > ./src/os/POSIX/FilePosixC.c:saves us from having to declare struct flock, which is gnarled up in #ifdefs. > ./src/os/POSIX/FilePosixC.c:? ? 
struct flock lock; > ./src/os/POSIX/FilePosixC.c:? ? struct flock lock; > ./tests/os/src/locktest.c:? struct flock param; > > > ./src/os/POSIX/FilePosixC.c: > > /* Copyright (C) 1993, Digital Equipment Corporation? ? ? ? ???*/ > /* All rights reserved.? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? ? */ > /* See the file COPYRIGHT for a full description.? ? ? ? ? ? ? */ > > /* > Writing part of libm3/os/POSIX/FilePosix.m3/RegularFileLock, RegularFileUnlock in C > saves us from having to declare struct flock, which is gnarled up in #ifdefs. > > see http://www.opengroup.org/onlinepubs/009695399/functions/fcntl.html > */ > > #include "m3core.h" > #include > > #ifdef __cplusplus > extern "C" { > #endif > > #define FALSE 0 > #define TRUE 1 > > INTEGER FilePosixC__RegularFileLock(int fd) > { >? ? ? struct flock lock; >? ? ? int err; > >? ? ? ZeroMemory(&lock, sizeof(lock)); >? ? ? lock.l_type = F_WRLCK; >? ? ? lock.l_whence = SEEK_SET; > >? ? ? if (fcntl(fd, F_SETLK, &lock) < 0) >? ? ? { >? ? ? ? ? err = errno; >? ? ? ? ? if (err == EACCES || err == EAGAIN) >? ? ? ? ? ? ? return FALSE; >? ? ? ? ? return -1; >? ? ? } >? ? ? return TRUE; > } > > INTEGER FilePosixC__RegularFileUnlock(int fd) > { >? ? ? struct flock lock; > >? ? ? ZeroMemory(&lock, sizeof(lock)); >? ? ? lock.l_type = F_UNLCK; >? ? ? lock.l_whence = SEEK_SET; > >? ? ? return fcntl(fd, F_SETLK, &lock); > } > > #ifdef __cplusplus > } /* extern "C" */ > #endif > > > > We can add this to libm3 probably. > > >???- Jay > > >? > Date: Thu, 12 Jul 2012 14:58:11 +0200 >? > From: pgoltzsch at gmail.com >? > To: m3devel at elegosoft.com >? > Subject: Re: [M3devel] unix - unknown qualification >? > >? > >>>>> Rodney M. Bates wrote: >? > >? > > I think we need to see some source code for ClsShare.m3. >? > > particularly to see what is before the dot on these lines. I >? > > don't see any of the failing qualifications in Unix.i3 in my >? > > cm3 directory. >? > >? > The first errors are caused by the following procedure, >? 
> which seems to copied from old DEC example code as I found >? > out while looking for a solution: >? > >? > PROCEDURE FilePartLock( h : INTEGER; start, len : INTEGER ) : BOOLEAN RAISES {OSError.E} = >? > VAR flock := Unix.struct_flock { >? > l_type := Unix.F_WRLCK, >? > l_whence := Unix.L_SET, >? > l_start := 0, >? > l_len := 0, (* i.e., whole file *) >? > l_pid := 0 }; (* don't care *) >? > BEGIN >? > flock.l_start := start; >? > flock.l_len := len; >? > IF Unix.fcntl( h, Unix.F_SETLK, LOOPHOLE( ADR( flock ), Ctypes.long ) ) < 0 >? > THEN >? > IF Uerror.errno = Uerror.EACCES OR >? > Uerror.errno = Uerror.EAGAIN THEN >? > RETURN FALSE >? > END; >? > OSErrorPosix.Raise() >? > END; >? > RETURN TRUE >? > END FilePartLock; >? > >? > >? > >? > Regards, >? > >? > Patrick -------------- next part -------------- An HTML attachment was scrubbed... URL: From jay.krell at cornell.edu Sat Jul 14 10:27:23 2012 From: jay.krell at cornell.edu (Jay K) Date: Sat, 14 Jul 2012 08:27:23 +0000 Subject: [M3devel] fcntl last parameter int vs. pointer Message-ID: Thoughts on Unix__fcntl(int fd, int request, int arg) { ??? return fcntl(fd, request, arg); } vs. Unix__fcntl(int fd, int request, INTEGER arg) { ??? return fcntl(fd, request, arg); } where int is 32bits and INTEGER is exactly the same size as a pointer. Will it "just work" if I change it? arg is sometimes a pointer, sometimes an integer, maybe sometimes other? Ok, let's assume 32bit integer and 32bit or 64bit pointer are the only possibilities. Are there calling conventions that care? And will pass the parameter differently/wrong? Do any calling conventions pack multiple smaller-than-64bit parameters into one 64bit register? I'm *guessing* no. I guess, as well, I can experiment with a few... ?- Jay From dabenavidesd at yahoo.es Sat Jul 14 17:31:36 2012 From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.) Date: Sat, 14 Jul 2012 16:31:36 +0100 (BST) Subject: [M3devel] fcntl last parameter int vs. 
pointer In-Reply-To: Message-ID: <1342279896.71297.YahooMailClassic@web29701.mail.ird.yahoo.com> Hi all: In fact both C and Modula-3 don't allow a signature change, original C compiler and decompiler type check the signature, although type casting is possible in both languages: http://web.cs.mun.ca/~ulf/pld/mocplus.html However, when talking about a functional language you can override as in Baby Modula-3 the type at instantiation time for methods and values, so I guess you can sort of relax the strict rules in that relation of the two object function types (to make it a subtype): http://www.hpl.hp.com/techreports/Compaq-DEC/SRC-RR-95.pdf see p. 10 - S. 3.2.4 - Discussion I just know they could make it work, but it was very hard complex system. Thanks in advance --- El s?b, 14/7/12, Jay K escribi?: De: Jay K Asunto: [M3devel] fcntl last parameter int vs. pointer Para: "m3devel" Fecha: s?bado, 14 de julio, 2012 03:27 Thoughts on Unix__fcntl(int fd, int request, int arg) { return fcntl(fd, request, arg); } vs. Unix__fcntl(int fd, int request, INTEGER arg) { return fcntl(fd, request, arg); } where int is 32bits and INTEGER is exactly the same size as a pointer. Will it "just work" if I change it? arg is sometimes a pointer, sometimes an integer, maybe sometimes other? Ok, let's assume 32bit integer and 32bit or 64bit pointer are the only possibilities. Are there calling conventions that care? And will pass the parameter differently/wrong? Do any calling conventions pack multiple smaller-than-64bit parameters into one 64bit register? I'm *guessing* no. I guess, as well, I can experiment with a few... - Jay ??? ???????? ?????? ??? ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From rodney_bates at lcwb.coop Sat Jul 14 22:05:57 2012 From: rodney_bates at lcwb.coop (Rodney M. 
Bates)
Date: Sat, 14 Jul 2012 15:05:57 -0500
Subject: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
In-Reply-To:
References: <989FB282-786A-447D-8CA9-2432AE922DBF@m3w.org> <20120626181955.GB29355@topoi.pooq.com> <20120627015457.238041A205B@async.async.caltech.edu>
Message-ID: <5001D125.6020704@lcwb.coop>

On 06/27/2012 02:58 AM, Dirk Muysers wrote:
> Some time ago I have started to develop a unicode library based
> on the old M3 text model but using UTF-8 internally rather than
> Latin-1 (see README attachment). For reasons best known to
> me I had to put it on the backburner in favour of more urgent work.
> If anybody is interested in furthering this solution I would eagerly
> give the existing (pre-alpha) code away.
> This being said, there are certainly better hash algorithms than the
> one used by m3core (eg Goulburn, see
> http://www.clockandflame.com/media/Goulburn06.pdf).
>
>

And:


1. Properties

This part deals with properties of Unicode code-points/characters. We call Unicode code-points "runes" for brevity.
Unlike WIDECHAR's, runes cover the whole gamut of the Unicode specification. We could have defined a Rune as
TYPE Rune = [0..16_10FFFF], but unfortunately not all values in the code-point range are valid and others are left
undefined, so a "Rune" is defined as an integer. The library uses defensive programming by not allowing a string to
contain any invalid or undefined Rune.

I don't understand the reasoning here. Your criticism of the subrange type is that it contains invalid values
between the bounds, which you address with dynamic value checks inside the library code. But why eliminate the
subrange and change the type to an integer? It only drastically increases the number of invalid values,
by a factor of over 2^11, if integer is 32-bit, otherwise more.
And it demotes the status of these from statically-detected, in one compile, to dynamically-detected, requiring massive testing to get an even partial level of confidence. It also precludes storing them in less than 64 bits on a 64-bit machine. Am I missing something? From jay.krell at cornell.edu Sun Jul 15 03:11:26 2012 From: jay.krell at cornell.edu (Jay) Date: Sat, 14 Jul 2012 18:11:26 -0700 Subject: [M3devel] fcntl last parameter int vs. pointer In-Reply-To: <1342279896.71297.YahooMailClassic@web29701.mail.ird.yahoo.com> References: <1342279896.71297.YahooMailClassic@web29701.mail.ird.yahoo.com> Message-ID: Daniel your replies are pointless. You have exhausted my patience. - Jay (briefly/pocket-sized-computer-aka-phone) On Jul 14, 2012, at 8:31 AM, "Daniel Alejandro Benavides D." wrote: > Hi all: > In fact both C and Modula-3 don't allow a signature change, original C compiler and decompiler type check the signature, although type casting is possible in both languages: > http://web.cs.mun.ca/~ulf/pld/mocplus.html > > However, when talking about a functional language you can override as in Baby Modula-3 the type at instantiation time for methods and values, so I guess you can sort of relax the strict rules in that relation of the two object function types (to make it a subtype): > http://www.hpl.hp.com/techreports/Compaq-DEC/SRC-RR-95.pdf > > see p. 10 - S. 3.2.4 - Discussion > > I just know they could make it work, but it was very hard complex system. > > Thanks in advance > > > --- El s?b, 14/7/12, Jay K escribi?: > > De: Jay K > Asunto: [M3devel] fcntl last parameter int vs. pointer > Para: "m3devel" > Fecha: s?bado, 14 de julio, 2012 03:27 > > > Thoughts on > > Unix__fcntl(int fd, int request, int arg) > { > return fcntl(fd, request, arg); > } > > vs. > > Unix__fcntl(int fd, int request, INTEGER arg) > { > > return fcntl(fd, request, arg); > > } > > > > where int is 32bits and INTEGER is exactly the same size as a pointer. 
> > > Will it "just work" if I change it? > arg is sometimes a pointer, sometimes an integer, maybe sometimes other? > Ok, let's assume 32bit integer and 32bit or 64bit pointer are the only possibilities. > Are there calling conventions that care? And will pass the parameter differently/wrong? > > > Do any calling conventions pack multiple smaller-than-64bit parameters into one 64bit register? > > > I'm *guessing* no. > I guess, as well, I can experiment with a few... > > > > - Jay > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dabenavidesd at yahoo.es Sun Jul 15 03:28:59 2012 From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.) Date: Sun, 15 Jul 2012 02:28:59 +0100 (BST) Subject: [M3devel] =?utf-8?b?QU5EICjigKYsIDE2X2ZmKeKApiBOb3Qgc2VyaW91cyAt?= =?utf-8?q?_or_so_I_hope!?= In-Reply-To: <5001D125.6020704@lcwb.coop> Message-ID: <1342315739.35223.YahooMailClassic@web29705.mail.ird.yahoo.com> Hi all: making changes of type representation of TEXT in text makes no sense, it's a violation of the Text abstraction http://books.google.com.co/books?id=FbemcUFa0JIC&pg=PA303&dq=source=bl&ots=POGbXUhcW1&sig=WULSpZ74yYU30s-cZ2zhtNTByd8&hl=en&redir_esc=y#v=onepage&q&f=false you are in fact claiming that the default type of CHAR is Latin-1, I don't get that because you type extend CHAR and say it's not default, there is something bad, I'm some how suspicious about this need to type in *different* two ranges for receiving a character in one type script and one in another, essentially meaning that the language is wrong in declaring TEXT as a opaque type and should use both kind of strings always or worse the non-default type, which is naturally impossible. Sorry guys, but I'm not agreeing with you in this one, I hope you make the best of CM3 work or leave alone the package a la DEC-SRC. 
If you are thinking in widening the TEXT string package make it polymorphic it doesn't add complexity burden to the language, it explains better the CHAR type and its extension but do it naturally using the language types, don't create your own one with only that purpose. THanks? in advance ? --- El s?b, 14/7/12, Rodney M. Bates escribi?: De: Rodney M. Bates Asunto: Re: [M3devel] AND (?, 16_ff)? Not serious - or so I hope! Para: m3devel at elegosoft.com Fecha: s?bado, 14 de julio, 2012 15:05 On 06/27/2012 02:58 AM, Dirk Muysers wrote: > Some time ago I have started to develop a unicode library based > on the old M3 text model but using UTF-8 internally rather than > Latin-1 (see README attachement). For reasons best known to > me I had to put it on the backburner in favour of more urgent work. > If anybody is interested in furthering this solution I would eagerly > give the existing (pre-alpha) code away. > This being said, there are certainly better hash algorithms than the > one used by m3core (eg Goullburn, see > http://www.clockandflame.com/media/Goulburn06.pdf). > > And: 1. Properties This part deals with properties of Unicode code-points/characters. We call Unicode code-points "runes" for brevity. Unlike WIDECHAR's, runes cover the the whole gamut of the Unicode specification. We could have defined a Rune as TYPE Rune = [0..16_10FFFF], but? unfortunately not all values in the code-point range are valid and others are left undefined, so a "Rune" is defined as an integer. The library uses defensive programming by not allowing a string to contain any invalid or undefined Rune. I don't understand the reasoning here.? Your criticism of the subrange type is that it contains invalid values between the bounds, which you address with dynamic value checks inside the library code.? But why eliminate the subrange and changing the type to an integer?? 
It only drastically increases the number of invalid values, by a factor of over 2^11 times, if integer is 32-bit, otherwise more.? And it demotes the status of these from statically-detected, in one compile, to dynamically-detected, requiring massive testing to get an even partial level of confidence.? It also precludes storing them in less than 64 bits on a 64-bit machine. Am I missing something? -------------- next part -------------- An HTML attachment was scrubbed... URL: From dabenavidesd at yahoo.es Sun Jul 15 03:44:36 2012 From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.) Date: Sun, 15 Jul 2012 02:44:36 +0100 (BST) Subject: [M3devel] fcntl last parameter int vs. pointer In-Reply-To: Message-ID: <1342316676.56405.YahooMailClassic@web29706.mail.ird.yahoo.com> Hi all: I'm sorry for you That I didn't exampled my self my point (perhaps I'm being too abstract for this point), but if you cared to tell all that I will say it more openly: Doing that type conversion as the first url says (look third row at the beginning a. literal) http://web.cs.mun.ca/~ulf/pld/mocplus.html#subclassing You will break the modular safety. However I'm telling you that one can make such an abstraction in Modula-3 (in Baby sized language) with functional programming making obeying subtype fcntl1 <: fcntl2, of course Jay I suppose your fcntl1 is badly signed, am I right? OK, I hope I'm being clearer. Thanks for the patience of all of that, in advance --- El s?b, 14/7/12, Jay escribi?: De: Jay Asunto: Re: [M3devel] fcntl last parameter int vs. pointer Para: "Daniel Alejandro Benavides D." CC: "m3devel" , "Jay K" Fecha: s?bado, 14 de julio, 2012 20:11 Daniel your replies are pointless. You have exhausted my patience. ?- Jay (briefly/pocket-sized-computer-aka-phone) On Jul 14, 2012, at 8:31 AM, "Daniel Alejandro Benavides D." 
wrote: Hi all: In fact both C and Modula-3 don't allow a signature change, original C compiler and decompiler type check the signature, although type casting is possible in both languages: http://web.cs.mun.ca/~ulf/pld/mocplus.html However, when talking about a functional language you can override as in Baby Modula-3 the type at instantiation time for methods and values, so I guess you can sort of relax the strict rules in that relation of the two object function types (to make it a subtype): http://www.hpl.hp.com/techreports/Compaq-DEC/SRC-RR-95.pdf see p. 10 - S. 3.2.4 - Discussion I just know they could make it work, but it was very hard complex system. Thanks in advance --- El s?b, 14/7/12, Jay K escribi?: De: Jay K Asunto: [M3devel] fcntl last parameter int vs. pointer Para: "m3devel" Fecha: s?bado, 14 de julio, 2012 03:27 Thoughts on Unix__fcntl(int fd, int request, int arg) { return fcntl(fd, request, arg); } vs. Unix__fcntl(int fd, int request, INTEGER arg) { return fcntl(fd, request, arg); } where int is 32bits and INTEGER is exactly the same size as a pointer. Will it "just work" if I change it? arg is sometimes a pointer, sometimes an integer, maybe sometimes other? Ok, let's assume 32bit integer and 32bit or 64bit pointer are the only possibilities. Are there calling conventions that care? And will pass the parameter differently/wrong? Do any calling conventions pack multiple smaller-than-64bit parameters into one 64bit register? I'm *guessing* no. I guess, as well, I can experiment with a few... - Jay ??? ???????? ?????? ??? ? -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From dmuysers at hotmail.com Sun Jul 15 10:13:35 2012 From: dmuysers at hotmail.com (Dirk Muysers) Date: Sun, 15 Jul 2012 10:13:35 +0200 Subject: [M3devel] =?utf-8?b?QU5EICjigKYsIDE2X2ZmKeKApiBOb3Qgc2VyaW91cyAt?= =?utf-8?q?_or_so_I_hope!?= In-Reply-To: <5001D125.6020704@lcwb.coop> References: <989FB282-786A-447D-8CA9-2432AE922DBF@m3w.org><20120626181955.GB29355@topoi.pooq.com><20120627015457.238041A205B@async.async.caltech.edu> <5001D125.6020704@lcwb.coop> Message-ID: My reasoning here was a pragmatic rather than a type-theoretical one. A rune defined as an integer can be freely passed around, while as a subrange it undergoes a hidden range check at every assignment. Now that range check wouldn't buy me anything, since the validation of a rune entails more than a simple range check and remains unavoidable in order to ensure the postcondition of pure Unicode in any text. -------------------------------------------------- From: "Rodney M. Bates" Sent: Saturday, July 14, 2012 10:05 PM To: Subject: Re: [M3devel] AND (?, 16_ff)? Not serious - or so I hope! > > > On 06/27/2012 02:58 AM, Dirk Muysers wrote: >> Some time ago I have started to develop a unicode library based >> on the old M3 text model but using UTF-8 internally rather than >> Latin-1 (see README attachement). For reasons best known to >> me I had to put it on the backburner in favour of more urgent work. >> If anybody is interested in furthering this solution I would eagerly >> give the existing (pre-alpha) code away. >> This being said, there are certainly better hash algorithms than the >> one used by m3core (eg Goullburn, see >> http://www.clockandflame.com/media/Goulburn06.pdf). >> >> > And: > > > 1. Properties > > This part deals with properties of Unicode code-points/characters. We call > Unicode code-points "runes" for brevity. > Unlike WIDECHAR's, runes cover the the whole gamut of the Unicode > specification. 
We could have defined a Rune as > TYPE Rune = [0..16_10FFFF], but unfortunately not all values in the > code-point range are valid and others are left > undefined, so a "Rune" is defined as an integer. The library uses > defensive programming by not allowing a string to > contain any invalid or undefined Rune. > > I don't understand the reasoning here. Your criticism of the subrange > type is that it contains invalid values > between the bounds, which you address with dynamic value checks inside the > library code. But why eliminate the > subrange and changing the type to an integer? It only drastically > increases the number of invalid values, > by a factor of over 2^11 times, if integer is 32-bit, otherwise more. And > it demotes the status of these > from statically-detected, in one compile, to dynamically-detected, > requiring massive testing to get an even > partial level of confidence. It also precludes storing them in less than > 64 bits on a 64-bit machine. > > Am I missing something? > From dabenavidesd at yahoo.es Sun Jul 15 15:14:51 2012 From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.) Date: Sun, 15 Jul 2012 14:14:51 +0100 (BST) Subject: [M3devel] =?utf-8?b?QU5EICjigKYsIDE2X2ZmKeKApiBOb3Qgc2VyaW91cyAt?= =?utf-8?q?_or_so_I_hope!?= In-Reply-To: Message-ID: <1342358091.65493.YahooMailClassic@web29704.mail.ird.yahoo.com> Hi all: wouldn't be pragmas the best solution here, making them inlining of TEXT type as some representation specific character type, still not making the language obey rules that aren't inherently correct, by that I mean, CHARs are what they are and string of CHARs values are compatible in current implementation just that it doesn't care too much to validate when one character or another is in typed. Thanks in advance --- El dom, 15/7/12, Dirk Muysers escribi?: De: Dirk Muysers Asunto: Re: [M3devel] AND (?, 16_ff)? Not serious - or so I hope! Para: "Rodney M. 
Bates" CC: m3devel at elegosoft.com Fecha: domingo, 15 de julio, 2012 03:13 My reasoning here was a pragmatic rather than a type-theoretical one. A rune defined as an integer can be freely passed around, while as a subrange it undergoes a hidden range check at every assignment. Now that range check wouldn't buy me anything, since the validation of a rune entails more than a simple range check and remains unavoidable in order to ensure the postcondition of pure Unicode in any text. -------------------------------------------------- From: "Rodney M. Bates" Sent: Saturday, July 14, 2012 10:05 PM To: Subject: Re: [M3devel] AND (?, 16_ff)? Not serious - or so I hope! > > > On 06/27/2012 02:58 AM, Dirk Muysers wrote: >> Some time ago I have started to develop a unicode library based >> on the old M3 text model but using UTF-8 internally rather than >> Latin-1 (see README attachement). For reasons best known to >> me I had to put it on the backburner in favour of more urgent work. >> If anybody is interested in furthering this solution I would eagerly >> give the existing (pre-alpha) code away. >> This being said, there are certainly better hash algorithms than the >> one used by m3core (eg Goullburn, see >> http://www.clockandflame.com/media/Goulburn06.pdf). >> >> > And: > > > 1. Properties > > This part deals with properties of Unicode code-points/characters. We call Unicode code-points "runes" for brevity. > Unlike WIDECHAR's, runes cover the the whole gamut of the Unicode specification. We could have defined a Rune as > TYPE Rune = [0..16_10FFFF], but? unfortunately not all values in the code-point range are valid and others are left > undefined, so a "Rune" is defined as an integer. The library uses defensive programming by not allowing a string to > contain any invalid or undefined Rune. > > I don't understand the reasoning here.? 
Your criticism of the subrange type is that it contains invalid values
> between the bounds, which you address with dynamic value checks inside the library code. But why eliminate the
> subrange and changing the type to an integer? It only drastically increases the number of invalid values,
> by a factor of over 2^11 times, if integer is 32-bit, otherwise more. And it demotes the status of these
> from statically-detected, in one compile, to dynamically-detected, requiring massive testing to get an even
> partial level of confidence. It also precludes storing them in less than 64 bits on a 64-bit machine.
>
> Am I missing something?
>

From rodney_bates at lcwb.coop Sun Jul 15 18:22:48 2012
From: rodney_bates at lcwb.coop (Rodney M. Bates)
Date: Sun, 15 Jul 2012 11:22:48 -0500
Subject: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
In-Reply-To: <1342315739.35223.YahooMailClassic@web29705.mail.ird.yahoo.com>
References: <1342315739.35223.YahooMailClassic@web29705.mail.ird.yahoo.com>
Message-ID: <5002EE58.6010401@lcwb.coop>

On 07/14/2012 08:28 PM, Daniel Alejandro Benavides D. wrote:
> Hi all:
> making changes of type representation of TEXT in text makes no sense, it's a violation of the Text abstraction

No, I disagree here. A primary property of an abstraction is that clients can use it
_without_ knowledge of the internal representation. The representation can be changed
without altering the behavior of any program that uses the abstraction. A program that
imports representation-dependent interfaces such as TextRep.i3 is an exception, but
doing so makes it a known abstraction violator from the beginning.
> http://books.google.com.co/books?id=FbemcUFa0JIC&pg=PA303&dq=source=bl&ots=POGbXUhcW1&sig=WULSpZ74yYU30s-cZ2zhtNTByd8&hl=en&redir_esc=y#v=onepage&q&f=false
>
> you are in fact claiming that the default type of CHAR is Latin-1, I don't get that because you type extend CHAR and say it's not default, there is something bad, I'm somehow suspicious about this need to type in *different* two ranges for receiving a character in one type script and one in another, essentially meaning that the language is wrong in declaring TEXT as an opaque type and should use both kind of strings always or worse the non-default type, which is naturally impossible.

I'm not sure what you are saying here. The language does clearly say that CHAR
contains (at least) ISO-Latin-1. But I am not proposing to extend CHAR beyond exactly
ISO-Latin-1, as it is in every implementation of Modula-3. This is because I am
sure doing so would break a large amount of existing code. Such code assumes
that BYTESIZE(CHAR)=1.

I _am_ proposing to extend WIDECHAR to hold Unicode. WIDECHAR was added with this
in mind, but today it fails because its range is too limited. I think WIDECHAR was
probably added at a time when only 2^16 code points were in the standard(s). But that
has changed. This is a very simple fix for that.

As for TEXT, the CM3 version is, and always was, abstractly a string of WIDECHAR. The
procedures that have parameters of type CHAR just do the widening or narrowing
at the time a character is passed in or out. The fact that the current representation
holds some characters in 8-bit array elements is hidden by the Text abstraction,
and can be changed if convenient.

In contrast, Wr/Rd and friends do not hide character representations in the stream.
This is as it must be, and I am proposing only to add additional representations
that they can handle, and make it convenient for the usual case that an entire
stream uses the same representation of characters.
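The widening and narrowing described above can be observed from client code without touching the representation. What follows is only a minimal sketch, assuming the CM3 version of the Text interface with FromWideChar, HasWideChars, GetChar and GetWideChar; the module name TextWiden is made up for illustration, and nothing here is taken from the proposed changes themselves.

```modula3
MODULE TextWiden EXPORTS Main;

(* Sketch only: assumes the CM3 Text interface, which offers both CHAR and
   WIDECHAR views of the same abstract TEXT value.  The module name is
   hypothetical. *)

IMPORT IO, Fmt, Text;

VAR t: TEXT;

BEGIN
  (* One narrow character followed by one character outside ISO-Latin-1
     (16_161, LATIN SMALL LETTER S WITH CARON). *)
  t := "A" & Text.FromWideChar (VAL (16_161, WIDECHAR));

  (* The client sees a sequence of characters; how they are stored is the
     business of the implementation. *)
  IO.Put ("Length       = " & Fmt.Int (Text.Length (t)) & "\n");
  IO.Put ("HasWideChars = " & Fmt.Bool (Text.HasWideChars (t)) & "\n");

  (* Index 0 holds a value that fits in CHAR, so the narrow accessor is safe. *)
  IO.Put ("GetChar(0)     = "
            & Fmt.Int (ORD (Text.GetChar (t, 0))) & "\n");

  (* Index 1 does not fit in CHAR, so use the wide accessor. *)
  IO.Put ("GetWideChar(1) = 16_"
            & Fmt.Int (ORD (Text.GetWideChar (t, 1)), 16) & "\n");
END TextWiden.
```

The point of the sketch is that none of these calls mention 8-bit or 16-bit storage, which is why the representation behind TEXT could change without breaking such a client.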
> Sorry guys, but I'm not agreeing with you in this one, I hope you make the best of CM3 work or leave alone the package a la DEC-SRC. > If you are thinking in widening the TEXT string package make it polymorphic it doesn't add complexity burden to the language, it explains better the CHAR type and its extension but do it naturally using the language types, don't create your own one with only that purpose. > THanks in advance > > > > > --- El *s?b, 14/7/12, Rodney M. Bates //* escribi?: > > > De: Rodney M. Bates > Asunto: Re: [M3devel] AND (?, 16_ff)? Not serious - or so I hope! > Para: m3devel at elegosoft.com > Fecha: s?bado, 14 de julio, 2012 15:05 > > > > On 06/27/2012 02:58 AM, Dirk Muysers wrote: > > Some time ago I have started to develop a unicode library based > > on the old M3 text model but using UTF-8 internally rather than > > Latin-1 (see README attachement). For reasons best known to > > me I had to put it on the backburner in favour of more urgent work. > > If anybody is interested in furthering this solution I would eagerly > > give the existing (pre-alpha) code away. > > This being said, there are certainly better hash algorithms than the > > one used by m3core (eg Goullburn, see > > http://www.clockandflame.com/media/Goulburn06.pdf). > > > > > And: > > > 1. Properties > > This part deals with properties of Unicode code-points/characters. We call Unicode code-points "runes" for brevity. > Unlike WIDECHAR's, runes cover the the whole gamut of the Unicode specification. We could have defined a Rune as > TYPE Rune = [0..16_10FFFF], but unfortunately not all values in the code-point range are valid and others are left > undefined, so a "Rune" is defined as an integer. The library uses defensive programming by not allowing a string to > contain any invalid or undefined Rune. > > I don't understand the reasoning here. 
Your criticism of the subrange type is that it contains invalid values
> between the bounds, which you address with dynamic value checks inside the library code. But why eliminate the
> subrange and changing the type to an integer? It only drastically increases the number of invalid values,
> by a factor of over 2^11 times, if integer is 32-bit, otherwise more. And it demotes the status of these
> from statically-detected, in one compile, to dynamically-detected, requiring massive testing to get an even
> partial level of confidence. It also precludes storing them in less than 64 bits on a 64-bit machine.
>
> Am I missing something?
>

From mika at async.caltech.edu Sun Jul 15 19:39:11 2012
From: mika at async.caltech.edu (Mika Nystrom)
Date: Sun, 15 Jul 2012 10:39:11 -0700
Subject: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
In-Reply-To:
References: <989FB282-786A-447D-8CA9-2432AE922DBF@m3w.org><20120626181955.GB29355@topoi.pooq.com><20120627015457.238041A205B@async.async.caltech.edu> <5001D125.6020704@lcwb.coop>
Message-ID: <20120715173911.D18A61A208F@async.async.caltech.edu>

I believe the compilers in existence are smart enough not to insert the
range check when the types are the same on both sides of the :=.
At least for copying... i.e.,

  a, b : WIDECHAR;
  BEGIN
    a := b
  END

should not imply a range check. With the types in question, that is
probably by far the most common operation, too.

    Mika

"Dirk Muysers" writes:
>My reasoning here was a pragmatic rather than a type-theoretical one.
>A rune defined as an integer can be freely passed around, while as
>a subrange it undergoes a hidden range check at every assignment.
>Now that range check wouldn't buy me anything, since the validation
>of a rune entails more than a simple range check and remains unavoidable
>in order to ensure the postcondition of pure Unicode in any text.
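The two positions above can be put side by side in a few lines. This is a sketch under stated assumptions, not code from Dirk's library: the type name CodePoint and the procedure IsScalarValue are hypothetical, and "valid" is narrowed here to "not a surrogate" (16_D800 .. 16_DFFF), which is only one ingredient of the fuller validation Dirk describes. The subrange carries the 0 .. 16_10FFFF bound in the type, while the remaining test stays an explicit call.

```modula3
MODULE RuneCheck EXPORTS Main;

(* Sketch only.  CodePoint and IsScalarValue are hypothetical names used
   for illustration; they are not part of any existing library. *)

IMPORT IO, Fmt;

TYPE
  (* The full Unicode code-point range, expressed as a subrange. *)
  CodePoint = [0 .. 16_10FFFF];

(* Assumption: "valid" means "Unicode scalar value", i.e. any code point
   outside the surrogate block.  A real library may check more. *)
PROCEDURE IsScalarValue (c: CodePoint): BOOLEAN =
  BEGIN
    RETURN (c < 16_D800) OR (c > 16_DFFF);
  END IsScalarValue;

PROCEDURE Show (c: CodePoint) =
  BEGIN
    IO.Put ("16_" & Fmt.Int (c, 16) & " scalar value: "
              & Fmt.Bool (IsScalarValue (c)) & "\n");
  END Show;

VAR a, b: CodePoint;

BEGIN
  a := 16_E9;        (* constant, known to be in range at compile time *)
  b := a;            (* same subrange type on both sides: the copy Mika
                        expects compilers to leave unchecked *)
  Show (b);
  Show (16_D800);    (* inside the subrange, yet not a scalar value *)
  Show (16_10FFFF);  (* the last code point *)
END RuneCheck.
```

With this shape the bound violations are rejected by the type, and only the "holes" inside the range need the dynamic test, which is the trade-off the thread is weighing.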
From dabenavidesd at yahoo.es Mon Jul 16 03:53:00 2012 From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.) Date: Mon, 16 Jul 2012 02:53:00 +0100 (BST) Subject: [M3devel] =?utf-8?b?QU5EICjigKYsIDE2X2ZmKeKApiBOb3Qgc2VyaW91cyAt?= =?utf-8?q?_or_so_I_hope!?= In-Reply-To: <5002EE58.6010401@lcwb.coop> Message-ID: <1342403580.18580.YahooMailClassic@web29705.mail.ird.yahoo.com> Hi all: Yes, I was referring to native module clients (not C, nor anything else modules) to be able to change rep, is a violation. About the problem of type, is that REF CHAR is a value space strictly more than Latin-1, so this is what I mean, encoding in one type or another must be determined by its subexpressions not by defaults like TEXT type, this is what I mean, width subtyping refers to add some value range as you say may or may be not in the same range of Unicode then it must be called WIDECHAR, you can't call it UCHAR etc, it misses the point of abstraction here, if so, how many types, we would want, 20, 30 according to the bit ending please give a break, we are not C doers, and if we are then call them in your libraries we don't need to contaminate us, sorry I'm not telling that you are being noisy but this certainly could be that (also me). Rodney, please correct me when I say something wrong but are you saying that you will start to put in every interface procedures and stuff to convert oh no, sorry; I hope I'm not that guy converting because somebody needed an extra interface to code some language, it will be a real mess. Thanks in advance --- El dom, 15/7/12, Rodney M. Bates escribi?: De: Rodney M. Bates Asunto: Re: [M3devel] AND (?, 16_ff)? Not serious - or so I hope! Para: "Daniel Alejandro Benavides D." CC: m3devel at elegosoft.com Fecha: domingo, 15 de julio, 2012 11:22 On 07/14/2012 08:28 PM, Daniel Alejandro Benavides D. wrote: > Hi all: > making changes of type representation of TEXT in text makes no sense, it's a violation of the Text abstraction No, I disagree here.? 
A primary property of an abstraction is that clients can use it _without_ knowledge of the internal representation.? The representation can be changed without altering the behavior of any program that uses the abstraction.? A program that imports representation-dependent interfaces such as TextRep.i3 is an exception, but doing so means it known abstraction violator, from the beginning. > http://books.google.com.co/books?id=FbemcUFa0JIC&pg=PA303&dq=source=bl&ots=POGbXUhcW1&sig=WULSpZ74yYU30s-cZ2zhtNTByd8&hl=en&redir_esc=y#v=onepage&q&f=false > > you are in fact claiming that the default type of CHAR is Latin-1, I don't get that because you type extend CHAR and say it's not default, there is something bad, I'm some how suspicious about this need to type in *different* two ranges for receiving a character in one type script and one in another, essentially meaning that the language is wrong in declaring TEXT as a opaque type and should use both kind of strings always or worse the non-default type, which is naturally impossible. I'm not sure what you are saying here.? The language does clearly say that CHAR contains (at least) ISO-Latin-1. But I am not proposing to extend CHAR beyond exactly ISO-latin-1, as it is in every implementation of Modula-3. This is because I am sure doing so would break a large amount of existing code.? Such code assumes that BYTESIZE(CHAR)=1. I _am_ proposing to extend WIDECHAR to hold Unicode.? WIDECHAR was added with this in mind, but today, it fails because its range is too limited.? I think probably WIDECHAR was added at a time when only 2^16 code points were in the standard(s).? But that has changed.? This is a very simple fix of that. As for TEXT, the CM3 version is and always was abstract a string of WIDECHAR.? The procedures that have parameters of type CHAR just do the widening or narrowing at the time a character is passed in or out. 
The fact that the current representation holds some characters in 8-bit array elements is hidden by the Text abstraction, and can be changed if convenient. In contrast, Wr/Rd and friends do not hide character representations in the stream.? This is as it must be, and I am proposing only to add additional representations that they can handle, and make it convenient for the usual case that an entire stream uses the same representation of characters. > Sorry guys, but I'm not agreeing with you in this one, I hope you make the best of CM3 work or leave alone the package a la DEC-SRC. > If you are thinking in widening the TEXT string package make it polymorphic it doesn't add complexity burden to the language, it explains better the CHAR type and its extension but do it naturally using the language types, don't create your own one with only that purpose. > THanks? in advance > > > > > --- El *s?b, 14/7/12, Rodney M. Bates //* escribi?: > > >? ???De: Rodney M. Bates >? ???Asunto: Re: [M3devel] AND (?, 16_ff)? Not serious - or so I hope! >? ???Para: m3devel at elegosoft.com >? ???Fecha: s?bado, 14 de julio, 2012 15:05 > > > >? ???On 06/27/2012 02:58 AM, Dirk Muysers wrote: >? ? ? > Some time ago I have started to develop a unicode library based >? ? ? > on the old M3 text model but using UTF-8 internally rather than >? ? ? > Latin-1 (see README attachement). For reasons best known to >? ? ? > me I had to put it on the backburner in favour of more urgent work. >? ? ? > If anybody is interested in furthering this solution I would eagerly >? ? ? > give the existing (pre-alpha) code away. >? ? ? > This being said, there are certainly better hash algorithms than the >? ? ? > one used by m3core (eg Goullburn, see >? ? ? > http://www.clockandflame.com/media/Goulburn06.pdf). >? ? ? > >? ? ? > >? ???And: > > >? ???1. Properties > >? ???This part deals with properties of Unicode code-points/characters. We call Unicode code-points "runes" for brevity. >? 
???Unlike WIDECHAR's, runes cover the the whole gamut of the Unicode specification. We could have defined a Rune as >? ???TYPE Rune = [0..16_10FFFF], but? unfortunately not all values in the code-point range are valid and others are left >? ???undefined, so a "Rune" is defined as an integer. The library uses defensive programming by not allowing a string to >? ???contain any invalid or undefined Rune. > >? ???I don't understand the reasoning here.? Your criticism of the subrange type is that it contains invalid values >? ???between the bounds, which you address with dynamic value checks inside the library code.? But why eliminate the >? ???subrange and changing the type to an integer?? It only drastically increases the number of invalid values, >? ???by a factor of over 2^11 times, if integer is 32-bit, otherwise more.? And it demotes the status of these >? ???from statically-detected, in one compile, to dynamically-detected, requiring massive testing to get an even >? ???partial level of confidence.? It also precludes storing them in less than 64 bits on a 64-bit machine. > >? ???Am I missing something? > -------------- next part -------------- An HTML attachment was scrubbed... URL: From rodney_bates at lcwb.coop Mon Jul 16 18:45:58 2012 From: rodney_bates at lcwb.coop (Rodney M. Bates) Date: Mon, 16 Jul 2012 11:45:58 -0500 Subject: [M3devel] New OrdSets generic package Message-ID: <50044546.4010202@lcwb.coop> Now checked in inside m3-libs/ordsets, OrdSets is a generic interface and module for dynamically-sized sets of large-range ordinal types, in functional style. From comments in OrdSets.ig: (* This interface provides operations on sets whose members are of an ordinal type. It is written in a functional style. It never mutates a set value, (except for some internal lazy computation--not visible to clients), and thus it sometimes is able to share heap objects. 
Its primary use pattern is where the set values can have widely varying sizes, you want a very large maximum size limit, but many of the sets are expected to be much smaller than the maximum. For this to happen, you probably want to instantiate only with INTEGER or WIDECHAR. It will work with LONGINT, but only if its target-machine-dependent range is a subrange of INTEGER. There is no space or time performance benefit to instantiating with a subrange of the base type.

If this does not fit your needs, you probably want to use Modula-3's builtin set type, or some other package.

The set representations occupy variable-sized heap objects, just sufficient for the set value. In the most general case, these use heap-allocated open arrays of machine words, with one bit per actual set member, plus some overhead, of course.

If you compile with a later CM3 Modula-3 compiler and garbage collector that tolerate misaligned "pseudo" pointers, i.e., with the least significant bit set to one, you can set a boolean constant in the corresponding module OrdSets.mg. This will cause it to utilize this Modula-3 implementation feature to store sufficiently small set values entirely within the pointer word, avoiding the high space and time overheads of heap allocation. The CM3 5-8 compiler is sufficient. SRC M3, PM3, EZM3, and earlier CM3 versions are not. As of 2012-7-15, Pickles do not handle these. Enable this with DoPseudoPointers, in OrdSets.mg.
*)

From dabenavidesd at yahoo.es Thu Jul 19 17:02:21 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Thu, 19 Jul 2012 16:02:21 +0100 (BST)
Subject: [M3devel] About a new AMD64 binary
Message-ID: <1342710141.21612.YahooMailClassic@web29703.mail.ird.yahoo.com>

Hi all:
I'm writing to ask whether the produced .deb file(s) are available somewhere, to install on AMD64_LINUX.
Hendrik, you have a copy yourself, right?
Thanks in advance
-------------- next part --------------
An HTML attachment was scrubbed...
URL: From hendrik at topoi.pooq.com Sun Jul 1 02:39:57 2012 From: hendrik at topoi.pooq.com (Hendrik Boom) Date: Sat, 30 Jun 2012 20:39:57 -0400 Subject: [M3devel] License compatibility Message-ID: <20120701003957.GA12807@topoi.pooq.com> I've heard, ages ago, that the SRC was not considered compatible with the GPL. I'd really like to know if this is true. Not whether it should be compatible, not whether people were afraid of it being incompatible... not whether some people think it's cmopatible, but whether it *is* compatible. Has anyone ever got a definitive answer to this question? If not, should I ask the FSF explicitly? -- hendrik From dabenavidesd at yahoo.es Sun Jul 1 04:27:24 2012 From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.) Date: Sun, 1 Jul 2012 03:27:24 +0100 (BST) Subject: [M3devel] License compatibility In-Reply-To: <20120701003957.GA12807@topoi.pooq.com> Message-ID: <1341109644.19208.YahooMailClassic@web29703.mail.ird.yahoo.com> Hi all: for me the question is, what kind of license they apply to GPL code for being compatible with us. They did an attempt for the Code Generator Interface, but DEC didn't release for thinking releasing it in some hardware way. Same happened with GPM2 from HP U-code interface, non-disclosure policy agreement negotiation. Thanks in advance --- El s?b, 30/6/12, Hendrik Boom escribi?: De: Hendrik Boom Asunto: [M3devel] License compatibility Para: m3devel at elegosoft.com Fecha: s?bado, 30 de junio, 2012 19:39 I've heard, ages ago, that the SRC was not considered compatible with the GPL.? I'd really like to know if this is true.? Not whether it should be compatible, not whether people were afraid of it being incompatible... not whether some people think it's cmopatible, but whether it *is* compatible. Has anyone ever got a definitive answer to this question? If not, should I ask the FSF explicitly? -- hendrik -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From dragisha at m3w.org Sun Jul 1 10:52:08 2012 From: dragisha at m3w.org (=?utf-8?Q?Dragi=C5=A1a_Duri=C4=87?=) Date: Sun, 1 Jul 2012 10:52:08 +0200 Subject: [M3devel] Simple change to WIDECHAR type In-Reply-To: <7BCB3BB7-D2F2-470B-8E70-2A7FF274E0FC@m3w.org> References: <4AD84314-8E53-4A50-A51B-A0DF8DC910BC@m3w.org> <031FD9D2-8C62-48B3-8E2F-1122B7C1ED96@m3w.org> <20120630172401.DFE8E1A207C@async.async.caltech.edu> <7BCB3BB7-D2F2-470B-8E70-2A7FF274E0FC@m3w.org> Message-ID: <95923C4D-B4F9-4245-809B-C3D186B8679C@m3w.org> Text.Length(Dragi?a Duri?)= 15 out from: WITH me = W"Dragi?a Duri?" DO IO.Put("Text.Length(" & me & ")= " & Fmt.Int(Text.Length(me)) & "\n"); END; On Jun 30, 2012, at 8:12 PM, Dragi?a Duri? wrote: > Jay, please explain this to me. My editor creates UTF8 files, for example. What cm3 expects after W" ? From dabenavidesd at yahoo.es Sun Jul 1 18:27:03 2012 From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.) Date: Sun, 1 Jul 2012 17:27:03 +0100 (BST) Subject: [M3devel] Simple change to WIDECHAR type In-Reply-To: <95923C4D-B4F9-4245-809B-C3D186B8679C@m3w.org> Message-ID: <1341160023.40330.YahooMailClassic@web29704.mail.ird.yahoo.com> Hi all: it should be less than that, but for single character is right, the problem is that you can't define a wider character in the machine basically, so if your machine can't ... why assume it isn't like that? So bigger machines should have bigger/smaller pointer types (char sizes with byte pointer size or word address size) and change rapidly criteria and keep it like that for the mentioned actual real operation needs for which was designed with char hard-coded and pointer sizes in a lot of classes in Rd/Wr (RdRep, for instance) Thanks in advance --- El dom, 1/7/12, Dragi?a Duri? escribi?: De: Dragi?a Duri? Asunto: Re: [M3devel] Simple change to WIDECHAR type Para: "Mika Nystrom" CC: m3devel at elegosoft.com Fecha: domingo, 1 de julio, 2012 03:52 Text.Length(Dragi?a Duri?)= 15 out from: ? 
WITH me = W"Dragi?a Duri?" DO ? ? IO.Put("Text.Length(" & me & ")= " & Fmt.Int(Text.Length(me)) & "\n"); ? END; On Jun 30, 2012, at 8:12 PM, Dragi?a Duri? wrote: > Jay, please explain this to me. My editor creates UTF8 files, for example. What cm3 expects after W" ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From hendrik at topoi.pooq.com Sun Jul 1 19:39:57 2012 From: hendrik at topoi.pooq.com (Hendrik Boom) Date: Sun, 1 Jul 2012 13:39:57 -0400 Subject: [M3devel] License compatibility In-Reply-To: References: <20120701003957.GA12807@topoi.pooq.com> Message-ID: <20120701173957.GA8757@topoi.pooq.com> On Sat, Jun 30, 2012 at 08:45:17PM -0400, Antony Hosking wrote: > Not compatible. FSF official. > > Sent from my iPhone So this presumably means it is impossible to distribute binary for any Modula 3 program that uses a GPL library even if you include source code. Because presumably the basic M3 run-time system is under the M3 license and therefore incompatible. Which means it's practically impossible to provide such a program to anyone that doesn't understand how to use a compiler, which is most Windows users. Or is there some wiggle room somewhere? -- hendrik > > On Jun 30, 2012, at 20:39, Hendrik Boom wrote: > > > I've heard, ages ago, that the SRC was not considered compatible with > > the GPL. I'd really like to know if this is true. Not whether it > > should be compatible, not whether people were afraid of it being > > incompatible... not whether some people think it's cmopatible, but > > whether it *is* compatible. > > > > Has anyone ever got a definitive answer to this question? > > > > If not, should I ask the FSF explicitly? 
> >
> > -- hendrik
> >

From hendrik at topoi.pooq.com Sun Jul 1 20:58:10 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Sun, 1 Jul 2012 14:58:10 -0400
Subject: [M3devel] License compatibility
In-Reply-To:
References: <20120701003957.GA12807@topoi.pooq.com> <20120701173957.GA8757@topoi.pooq.com>
Message-ID: <20120701185810.GA9416@topoi.pooq.com>

On Sun, Jul 01, 2012 at 02:08:04PM -0400, Antony Hosking wrote:
> I thought LGPL allowed binary linkage without infection.

Only if the program is distributed in such a way that the user can relink it
with updated versions of the LGPL library. I don't know if that's too
much to ask of the typical dumb user I've postulated. Considering how
I've had to recompile several m3 libraries just to go on using them with
libXaw, it may indeed be too much to expect.

Now I don't mind sending out source code. I'm concerned with the end
user who minds receiving it.

It would presumably be the Modula 3 libraries that pose the problem, I
suppose. I'm not talking about the compiler itself, which is not part
of my program or the libraries. I guess I'm concerned with the
libraries one cannot do without, like libm3.

FSF claims that the GPL3 is compatible with more free licenses than the
GPL2.

Is there a document somewhere that identifies just what the problem is
with our license?

-- hendrik

>
> Sent from my iPad
>
> On Jul 1, 2012, at 1:39 PM, Hendrik Boom wrote:
>
> > On Sat, Jun 30, 2012 at 08:45:17PM -0400, Antony Hosking wrote:
> >> Not compatible. FSF official.
> >>
> >> Sent from my iPhone
> >
> > So this presumably means it is impossible to distribute binary for any
> > Modula 3 program that uses a GPL library even if you include source code.
> > Because presumably the basic M3 run-time system is under the M3 license and therefore incompatible.
> >
> > Which means it's practically impossible to provide such a program to anyone
> > that doesn't understand how to use a compiler, which is most Windows users.
> > > > Or is there some wiggle room somewhere? > > > > -- hendrik > > > >> > >> On Jun 30, 2012, at 20:39, Hendrik Boom wrote: > >> > >>> I've heard, ages ago, that the SRC was not considered compatible with > >>> the GPL. I'd really like to know if this is true. Not whether it > >>> should be compatible, not whether people were afraid of it being > >>> incompatible... not whether some people think it's cmopatible, but > >>> whether it *is* compatible. > >>> > >>> Has anyone ever got a definitive answer to this question? > >>> > >>> If not, should I ask the FSF explicitly? > >>> > >>> -- hendrik > >>> From dabenavidesd at yahoo.es Sun Jul 1 21:10:16 2012 From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.) Date: Sun, 1 Jul 2012 20:10:16 +0100 (BST) Subject: [M3devel] License compatibility In-Reply-To: <20120701185810.GA9416@topoi.pooq.com> Message-ID: <1341169816.90454.YahooMailClassic@web29704.mail.ird.yahoo.com> Hi all: technically, the CM J-V-M was binary compatible with Sun JVM, wasn't it? So in terms of binary compatibility CM3 is binary compatible with Sun JDK (I guess the only version they had), wasn't that the idea to port Java to Modula-3 easily? Ando so if you can link Sun JDK with Gcc I guess you can do it with CM3 at least technically. Thanks in advance --- El dom, 1/7/12, Hendrik Boom escribi?: De: Hendrik Boom Asunto: Re: [M3devel] License compatibility Para: "m3devel at elegosoft.com" Fecha: domingo, 1 de julio, 2012 13:58 On Sun, Jul 01, 2012 at 02:08:04PM -0400, Antony Hosking wrote: > I thought LGPL allowed binary linkage without infection. Only if the program is distributed in such a way that the user can relink it with updated versions of the LGPL library.? I don't know if that's too much to ask of the typical dumb user I've postulated.? Considering how I've had to recompile several m3 libraries just to go on using them with libXaw, it may indeed be too much to expect. Now I don't mind sending out source code.? 
I'm concerned with the end user who minds receiving it. It would presumably be the Modula 3 libraries that pose the problem, I suppose.? I'm not talking about the compiler itself, which is not part of my program or the libraries.? I guess I'm concerned with the libraries one cannot do without, like libm3. FSF claims that the GPL3 is compatible with more free licensess than the GPL2. Is there a document somewhere that identifies just what the problem is with out license? -- hendrik > > Sent from my iPad > > On Jul 1, 2012, at 1:39 PM, Hendrik Boom wrote: > > > On Sat, Jun 30, 2012 at 08:45:17PM -0400, Antony Hosking wrote: > >> Not compatible.? FSF official. > >> > >> Sent from my iPhone > > > > So this presumably means it is impossible to distribute binary for any > > Modula 3 program that uses a GPL library even if you include source code. > > Because presumably the basic M3 run-time system is under the M3 license and therefore incompatible. > > > > Which means it's practically impossible to provide such a program to anyone > > that doesn't understand how to use a compiler, which is most Windows users. > > > > Or is there some wiggle room somewhere? > > > > -- hendrik > > > >> > >> On Jun 30, 2012, at 20:39, Hendrik Boom wrote: > >> > >>> I've heard, ages ago, that the SRC was not considered compatible with > >>> the GPL.? I'd really like to know if this is true.? Not whether it > >>> should be compatible, not whether people were afraid of it being > >>> incompatible... not whether some people think it's cmopatible, but > >>> whether it *is* compatible. > >>> > >>> Has anyone ever got a definitive answer to this question? > >>> > >>> If not, should I ask the FSF explicitly? > >>> > >>> -- hendrik > >>> -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From dragisha at m3w.org Sun Jul 1 21:15:35 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Sun, 1 Jul 2012 21:15:35 +0200
Subject: [M3devel] License compatibility
In-Reply-To: <20120701173957.GA8757@topoi.pooq.com>
References: <20120701003957.GA12807@topoi.pooq.com> <20120701173957.GA8757@topoi.pooq.com>
Message-ID: <30814087-79E4-429B-B438-F86B3375F23D@m3w.org>

GPL is not LGPL. The same restrictions do not apply. LGPL means you have to link the LGPL library dynamically, so your program will use the system's current version, presumably updatable as updates become available, regardless of your actions.

For GPL libraries, you are probably right.

On Jul 1, 2012, at 7:39 PM, Hendrik Boom wrote:

> On Sat, Jun 30, 2012 at 08:45:17PM -0400, Antony Hosking wrote:
>> Not compatible. FSF official.
>>
>> Sent from my iPhone
>
> So this presumably means it is impossible to distribute binary for any
> Modula 3 program that uses a GPL library even if you include source code.
> Because presumably the basic M3 run-time system is under the M3 license and therefore incompatible.
>
> Which means it's practically impossible to provide such a program to anyone
> that doesn't understand how to use a compiler, which is most Windows users.
>
> Or is there some wiggle room somewhere?
>
> -- hendrik
>
>>
>> On Jun 30, 2012, at 20:39, Hendrik Boom wrote:
>>
>>> I've heard, ages ago, that the SRC was not considered compatible with
>>> the GPL. I'd really like to know if this is true. Not whether it
>>> should be compatible, not whether people were afraid of it being
>>> incompatible... not whether some people think it's compatible, but
>>> whether it *is* compatible.
>>>
>>> Has anyone ever got a definitive answer to this question?
>>>
>>> If not, should I ask the FSF explicitly?
>>>
>>> -- hendrik
>>>

From hendrik at topoi.pooq.com Sun Jul 1 21:49:50 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Sun, 1 Jul 2012 15:49:50 -0400
Subject: [M3devel] License compatibility
In-Reply-To: <1341169816.90454.YahooMailClassic@web29704.mail.ird.yahoo.com>
References: <20120701185810.GA9416@topoi.pooq.com> <1341169816.90454.YahooMailClassic@web29704.mail.ird.yahoo.com>
Message-ID: <20120701194950.GA9673@topoi.pooq.com>

On Sun, Jul 01, 2012 at 08:10:16PM +0100, Daniel Alejandro Benavides D. wrote:
> Hi all:
> technically, the CM J-V-M was binary compatible with Sun JVM, wasn't
> it? So in terms of binary compatibility CM3 is binary compatible with
> Sun JDK

You're not trying to tell me that I could use CM3 and Sun JDK
interchangeably, are you? That would mean I can use the JDK to compile
Modula 3 code. I have my doubts.

> (I guess the only version they had), wasn't that the idea to
> port Java to Modula-3 easily? And so if you can link Sun JDK with
> Gcc I guess you can do it with CM3 at least technically.

The question isn't whether we can link CM3 programs with gcc. The
question is whether we can distribute such linked programs. And that
doesn't depend on the CM3 compiler as much as the CM3 run-time system.
And it's not a question of technical compatibility. It's a matter of
license compatibility.

And I suspect the only way we'll get *that* to work is to write a new
run-time system and new libraries that *are* built with a
GPL-compatible license. Or hope the whole issue goes away as free
software drifts to freer licenses and we no longer need any GPL
libraries.

-- hendrik

> Thanks in advance
>
> --- On Sun, 1/7/12, Hendrik Boom wrote:
>
> From: Hendrik Boom
> Subject: Re: [M3devel] License compatibility
> To: "m3devel at elegosoft.com"
> Date: Sunday, July 1, 2012 13:58
>
> On Sun, Jul 01, 2012 at 02:08:04PM -0400, Antony Hosking wrote:
> > I thought LGPL allowed binary linkage without infection.
> > Only if the program is distributed in such a way that the user can relink it > with updated versions of the LGPL library.? I don't know if that's too > much to ask of the typical dumb user I've postulated.? Considering how > I've had to recompile several m3 libraries just to go on using them with > libXaw, it may indeed be too much to expect. > > Now I don't mind sending out source code.? I'm concerned with the end > user who minds receiving it. > > It would presumably be the Modula 3 libraries that pose the problem, I > suppose.? I'm not talking about the compiler itself, which is not part > of my program or the libraries.? I guess I'm concerned with the > libraries one cannot do without, like libm3. > > FSF claims that the GPL3 is compatible with more free licensess than the > GPL2. > > Is there a document somewhere that identifies just what the problem is > with out license? > > -- hendrik > > > > > Sent from my iPad > > > > On Jul 1, 2012, at 1:39 PM, Hendrik Boom wrote: > > > > > On Sat, Jun 30, 2012 at 08:45:17PM -0400, Antony Hosking wrote: > > >> Not compatible.? FSF official. > > >> > > >> Sent from my iPhone > > > > > > So this presumably means it is impossible to distribute binary for any > > > Modula 3 program that uses a GPL library even if you include source code. > > > Because presumably the basic M3 run-time system is under the M3 license and therefore incompatible. > > > > > > Which means it's practically impossible to provide such a program to anyone > > > that doesn't understand how to use a compiler, which is most Windows users. > > > > > > Or is there some wiggle room somewhere? > > > > > > -- hendrik > > > > > >> > > >> On Jun 30, 2012, at 20:39, Hendrik Boom wrote: > > >> > > >>> I've heard, ages ago, that the SRC was not considered compatible with > > >>> the GPL.? I'd really like to know if this is true.? Not whether it > > >>> should be compatible, not whether people were afraid of it being > > >>> incompatible... 
> > >>> not whether some people think it's compatible, but
> > >>> whether it *is* compatible.
> > >>>
> > >>> Has anyone ever got a definitive answer to this question?
> > >>>
> > >>> If not, should I ask the FSF explicitly?
> > >>>
> > >>> -- hendrik
> > >>>

From hosking at cs.purdue.edu Mon Jul 2 03:34:16 2012
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Sun, 1 Jul 2012 21:34:16 -0400
Subject: [M3devel] Simple change to WIDECHAR type
In-Reply-To: <20120630172401.DFE8E1A207C@async.async.caltech.edu>
References: <4AD84314-8E53-4A50-A51B-A0DF8DC910BC@m3w.org> <031FD9D2-8C62-48B3-8E2F-1122B7C1ED96@m3w.org> <20120630172401.DFE8E1A207C@async.async.caltech.edu>
Message-ID: <0F25CB80-6844-48A4-B34C-C08670CF96F8@cs.purdue.edu>

As far as I know, WIDECHAR was simply for the CM3 JVM to support Java char, which is 16-bit.

On Jun 30, 2012, at 1:24 PM, Mika Nystrom wrote:

>
> Dragiša Durić writes:
> ...
>>
>> Solution:
>> ======
>>
>> * Redefine WIDECHAR to hold at least 20 bit values, or create UNICHAR or GLYPH (and leave WIDECHAR as it is for vertical compatibility) so we can hold unencoded Unicode characters in scalar values in our Modula-3 programs, while preserving their properties.
>> * Implement properties, relations and methods defined for Unicode. With ASCII, numeric order is everything. With Unicode - it is not. This is probably very big project but we can start somewhere, and let interested parties build on it. Dirk Muysers did work in this regard already.
>> * Whoever thinks we don't need this and our "tradition" and "legacy" are important, please read this: http://unicode.org/standard/WhatIsUnicode.html .
>>
>> dd
>
> Given what you have said about the near-uselessness of WIDECHAR, does anything
> actually use it much? What breaks if it is redefined to be the same as, say,
> INTEGER?
> (Or Word.T)
>
> CHAR is quite useful for processing 7-bit ASCII, and it would be lovely if
> that could go back to using the SRC data structures. For people who do stuff
> like write VLSI design tools... (probably many other large-scale applications
> would like it too).
>
> Mika

From dabenavidesd at yahoo.es Mon Jul 2 04:51:35 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Mon, 2 Jul 2012 03:51:35 +0100 (BST)
Subject: [M3devel] License compatibility
In-Reply-To: <20120701194950.GA9673@topoi.pooq.com>
Message-ID: <1341197495.89971.YahooMailClassic@web29703.mail.ird.yahoo.com>

Hi all:
technically they were binary license compatible. I see you take what I say too hard, thanks, but don't think so hard about this. If you need that, you can use the compiler's type checking for Modula-3, so most of what you say is true. Whether the compiler is compatible would perhaps be a question for Eric Muller, who wrote parts of it. The nice thing about Modula-3 was that everything in it was object oriented (which is what Java claims about its System).

Thanks in advance

From dragisha at m3w.org Mon Jul 2 10:09:43 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Mon, 2 Jul 2012 10:09:43 +0200
Subject: [M3devel] Simple change to WIDECHAR type
In-Reply-To: <0F25CB80-6844-48A4-B34C-C08670CF96F8@cs.purdue.edu>
References: <4AD84314-8E53-4A50-A51B-A0DF8DC910BC@m3w.org> <031FD9D2-8C62-48B3-8E2F-1122B7C1ED96@m3w.org> <20120630172401.DFE8E1A207C@async.async.caltech.edu> <0F25CB80-6844-48A4-B34C-C08670CF96F8@cs.purdue.edu>
Message-ID:

To be compatible, at least partially, with some other solution. The completeness of that other solution did not magically rub off on cm3 just because they introduced WIDECHAR as a standard scalar type.

On Jul 2, 2012, at 3:34 AM, Tony Hosking wrote:

> As far as I know, WIDECHAR was simply for the CM3 JVM to support Java char which is 16-bit.

From rodney_bates at lcwb.coop Mon Jul 2 16:50:18 2012
From: rodney_bates at lcwb.coop (Rodney Bates)
Date: Mon, 2 Jul 2012 07:50:18 -0700
Subject: [M3devel] UTF-8 TEXT
Message-ID: <20120702075018.EEE2B9F6@resin11.mta.everyone.net>

-Rodney Bates

--- antony.hosking at gmail.com wrote:

From: Antony Hosking
To: "Rodney M. Bates"
Cc: "m3devel at elegosoft.com"
Subject: Re: [M3devel] UTF-8 TEXT
Date: Thu, 28 Jun 2012 10:37:36 -0400

Why not simply say that CHAR is an enumeration representing all of UTF-32?
The current definition merely says that CHAR is an enumeration containing *at least* 256 elements.
We would need to translate the current Latin-1 literals into UTF-32.
And we could simply have a new literal form for Unicode literals.

This is almost what I would propose to do, with a couple of differences:

Leave CHAR alone and fix WIDECHAR to handle the entire Unicode space. I am sure there is lots of existing code that depends on the implementation properties: ORD(FIRST(CHAR))=0, ORD(LAST(CHAR))=255, and BYTESIZE(CHAR)=1.

Then I would define, in the language itself, that WIDECHAR is Unicode, not UTF-32. Thus ORD(LAST(WIDECHAR))=16_10FFFF. Then I would make it an implementation characteristic that BYTESIZE(WIDECHAR)=4.

On Jun 27, 2012, at 10:12 PM, Rodney M. Bates wrote:

>
>
> On 06/27/2012 07:32 PM, Antony Hosking wrote:
>> So what do we do about 6-byte UTF-8 code points? They won't fit in WIDECHAR. Surely we should allow accessing a UTF-8 character as a CARDINAL and be done with it?
>>
>
> Absolutely. Except I think a better way is to make WIDECHAR big enough to hold all of
> Unicode.
>
>> Sent from my iPad
>>
>> On Jun 27, 2012, at 3:20 PM, "Rodney M. Bates" wrote:
>>
>>>
>>>
>>> On 06/26/2012 10:30 PM, Hendrik Boom wrote:
>>>> On Tue, Jun 26, 2012 at 04:22:22PM -0400, Coleburn, Randy wrote:
>>>>> I seem to recall that Rodney did some work a while back relating to TEXT.
>>>>> Rodney, can you weigh in on some of this?
>>>>> --Randy Coleburn
>>>>>
>>>>> From: Dragiša Durić [mailto:dragisha at m3w.org]
>>>>> Sent: Tuesday, June 26, 2012 12:46 PM
>>>>> To: Jay
>>>>> Cc: m3devel
>>>>> Subject: EXT Re: [M3devel] AND (., 16_ff). Not serious - or so I hope!
>>>>>
>>>>> You had idea in other message. Store length!
>>>>>
>>>>> Another idea - store partial list of indices to character locations. So whatever one does, that list can be used/expanded. Whatever storage issues this makes, they are probably minor as compared to 32bit WIDECHAR for all idea.
>>>>
>>>> Most of the time, you don't need explicit integer indexes to character
>>>> locations. What you do need is an operation that fetches a character
>>>> given the string and its index (whatever data structure that index is),
>>>> and one that increments the index past that character. As long as you
>>>> can save an index and use it later on the same string, that's probably
>>>> all you ever need. And with a simple TEXT representation (such as the
>>>> obvious array of bytes containing characters of various widths) a byte
>>>> index is all you need (note: NOT a character index). It's easy even to
>>>> use TEXT and its integer indices as the data representation, as long as
>>>> you use the proper functions to parse the characters and increment the
>>>> indices by amounts that might differ from 1.
>>>>
>>>> And if your source code is represented in UTF-8, the representation that
>>>> requires little extra compiler effort to parse, your TEXT strings will
>>>> automagically appear in UTF-8.
>>>
>>> The original designers of the language and its libraries have given us
>>> two different abstractions for handling character strings (in addition
>>> to plain arrays.) 1) Text, and 2) Wr, Rd, and their cousins.
>>>
>>> Text is highly general and easy to use. Concatenations and substrings
>>> are easy. Semantics, to its clients, are value semantics, similar to INTEGER.
>>> Random access by *character* number is easy and, hopefully, implemented
>>> with efficiency at least better than O(n).
>>>
>>> Wr and friends restrict you to sequential access, at least mostly, but
>>> gain implementation convenience and efficiency as a result.
>>>
>>> I feel very strongly that we should *not* take away the full generality
>>> of Text, especially efficient random access, to handle variable-length
>>> character encodings in strings. For these, let's make more friends of
>>> Wr and Rd, which already assume sequential access. For example, a
>>> filter pipe that sequentially reads a Text/Array/stream, applies a UTF-8
>>> interpretation to its bytes, and delivers a stream of Unicode characters,
>>> in variables of type WIDECHAR.
>>>
>>> Text should preserve the abstraction that it's a string of characters,
>>> generalized as it already is in cm3, to have type WIDECHAR, so they can be any
>>> Unicode character. The internal representation should, usually, not be
>>> of concern.
>>>
>>> Note that nowhere in Text are character values transferred between
>>> a Text.T and any form of I/O stream. In the Text abstraction, all
>>> characters go in and out of a Text.T in variables of type CHAR,
>>> WIDECHAR, and arrays thereof. IO, etc. is only done in streams,
>>> e.g., TextWr. We can easily add new variants of these that encode/decode
>>> by various rules.
>>>
>>> Of course, it is still valid to put a string of bytes in a Text.T and
>>> apply, e.g., UTF-8 interpretation yourself. But that's lower-level
>>> programming, and shouldn't confuse the abstraction.
>>>
>>>>
>>>> I can see a use for various wide characters -- the things you extract
>>>> from a TEXT by parsing bits of it, but none for anything
>>>> really new complicated for wide TEXT.
>>>>
>>>> The only confusing thing is that the existing operations for extracting
>>>> bytes from TEXT have names that suggest they are extracting characters.
>>>>
>>>
>>> I think it's more than a suggestion. I think the abstraction clearly
>>> considers them characters. And it should stay that way.
>>> If you want,
>>> at a higher level of code, to treat them as bytes, that's fine, but the
>>> abstraction continues to view them as characters (which only you, the
>>> client, know is not really so.)
>>>
>>>> -- Hendrik
>>>>
>>

From rodney_bates at lcwb.coop Mon Jul 2 17:04:25 2012
From: rodney_bates at lcwb.coop (Rodney Bates)
Date: Mon, 2 Jul 2012 08:04:25 -0700
Subject: [M3devel] Simple change to WIDECHAR type
Message-ID: <20120702080425.EEE2B81F@resin11.mta.everyone.net>

-Rodney Bates

--- dragisha at m3w.org wrote:

From: Dragiša Durić
To: Antony Hosking
Cc: m3devel
Subject: Re: [M3devel] Simple change to WIDECHAR type
Date: Sat, 30 Jun 2012 09:33:00 +0200

Current GetChar/SetChars and GetWideChar/SetWideChars are not character-level access methods, in terms of Unicode. They are "byte-level", fixed width data accesses.

Reason: Both CHAR (cardinality 2^8) and WIDECHAR (cardinality 2^16) based strings must use one or more characters to represent whole Unicode (cardinality 2^20). If we must encode in any case, then we don't have any benefit of WIDECHAR (as it is implemented/understood now) at all!

To represent Unicode with either CHAR or WIDECHAR based TEXTs - we must use either UTF-8 or UTF-16. Both are one-to-multibyte encodings, encoding one Unicode character to either 1-4 CHARs or 1-2 WIDECHARs.

What exactly is the meaning (at Modula-3's usual levels of abstraction) of character-level access? Do we need whatever bit pattern physically happens to be at some location in our data's representation? Or maybe we need the numerical representation of the actual Unicode character value, the one visually distinguishable in written representation? One from that set of 2^20 elements?

What is the meaning of Text.Sub() based on byte-level access operations where our resulting TEXT's first character is in fact a prefix of some Unicode character's encoding? And/or where our last character is an invalid/incomplete suffix of some encoded character?

Since when are fast and efficient operations doing something we don't need at all our priority?

We are getting nothing at all with WIDECHAR. No. Single. Thing. WIDECHAR does not make us closer to Unicode at all. WIDECHAR, together with CHAR (in the context of our current TEXT), makes two almost-solutions to the Unicode problem, and the existence of the WIDECHAR scalar type makes us a bit closer to the Unicode almost-solution of the C world and nothing else.

------------------------------------------------------------------------
I think the only reason why we got nothing is that WIDECHAR isn't wide enough. Let's fix that.
------------------------------------------------------------------------

Currently, neither GetChar nor GetWideChar can get "a character at nth position". Reason: No character scalar type to keep any Unicode character.

Solution:
======

* Redefine WIDECHAR to hold at least 20 bit values, or create UNICHAR or GLYPH (and leave WIDECHAR as it is for vertical compatibility) so we can hold unencoded Unicode characters in scalar values in our Modula-3 programs, while preserving their properties.
* Implement properties, relations and methods defined for Unicode. With ASCII, numeric order is everything. With Unicode - it is not. This is probably very big project but we can start somewhere, and let interested parties build on it. Dirk Muysers did work in this regard already.
* Whoever thinks we don't need this and our "tradition" and "legacy" are important, please read this: http://unicode.org/standard/WhatIsUnicode.html .

dd

On Jun 29, 2012, at 5:52 PM, Dragiša Durić wrote:

> That, or UTF-16 encoding on top of current WIDECHAR.
>
> On Jun 29, 2012, at 3:50 PM, Antony Hosking wrote:
>
>> That will change WIDECHAR from a value consuming 16-bits of memory into a value consuming 32-bits of memory.
>> In other words, all TEXT containing WIDECHAR will double in size.
>>
>> On Jun 29, 2012, at 4:35 AM, Dragiša Durić wrote:
>>
>>> m3front/src/builtinTypes/WCharr.m3, line:
>>>
>>> T := EnumType.New (16_10000, elts);
>>>
>>> to
>>>
>>> T := EnumType.New (16_100000, elts);
>>>
>>> Will this break things? Any other assumptions anywhere?
>>>
>>
>

From rodney_bates at lcwb.coop Mon Jul 2 17:09:25 2012
From: rodney_bates at lcwb.coop (Rodney Bates)
Date: Mon, 2 Jul 2012 08:09:25 -0700
Subject: [M3devel] Some earlier work
Message-ID: <20120702080925.EEE2BB96@resin11.mta.everyone.net>

Hmm. This looks very much like the original Text.i3, with CHAR replaced by UText.Char. Dare I infer that it was inspired that way? It presents just the abstraction that I think Text itself should present.

-Rodney Bates

--- dragisha at m3w.org wrote:

From: Dragiša Durić
To: m3devel
Subject: [M3devel] Some earlier work
Date: Sat, 30 Jun 2012 10:56:27 +0200

This is how we implemented UTF8 strings over current TEXTs. The current implementation is UNSAFE and uses glibc utf8 methods. Nothing too complicated and nothing we can't implement in Modula-3/portable C.

=====

INTERFACE UText;

IMPORT Word;

TYPE
  T = TEXT;
  Char = CARDINAL;

PROCEDURE Cat(t, u: T): T;
PROCEDURE Equal(t, u: T): BOOLEAN;
PROCEDURE GetChar(t: T; i: CARDINAL): Char;
PROCEDURE ByteSize(t: T): CARDINAL;
PROCEDURE Length(t: T): CARDINAL;
PROCEDURE Empty(t: T): BOOLEAN;
PROCEDURE Sub(t: T; start: CARDINAL; length: CARDINAL := LAST(CARDINAL)): T;
PROCEDURE SetChars(VAR a: ARRAY OF Char; t: T);
PROCEDURE FromChar(ch: Char): T;
PROCEDURE FromChars(READONLY a: ARRAY OF Char): T;
PROCEDURE Hash(t: T): Word.T;
PROCEDURE Compare(t1, t2: T): [-1..1];
PROCEDURE FindChar(t: T; ch: Char; start: CARDINAL := 0): INTEGER;
PROCEDURE FindCharR(t: T; ch: Char; start: CARDINAL := LAST(INTEGER)): INTEGER;

END UText.
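
A minimal sketch, in portable C, of the kind of decoding an interface like UText needs underneath: one step that fetches the character at a byte index and advances the index past it, plus a character count and an indexed fetch built on that step. The function names (utf8_next, utext_length, utext_byte_size, utext_get_char) are hypothetical and only mirror the procedure names in the interface; this is not the posted implementation, which the message describes as UNSAFE Modula-3 over glibc. The sketch assumes NUL-terminated UTF-8 input and does not reject overlong forms or surrogate code points.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode the UTF-8 sequence that starts at byte index *i of s (n bytes long).
   Precondition: *i < n.
   On success, return the code point and advance *i past the whole sequence.
   On malformed or truncated input, return -1 and advance *i by one byte.
   Assumption: overlong forms and surrogates are NOT rejected here. */
int32_t utf8_next(const unsigned char *s, size_t n, size_t *i)
{
    unsigned char lead = s[*i];
    size_t len;
    int32_t cp;

    if (lead < 0x80)                { len = 1; cp = lead; }
    else if ((lead & 0xE0) == 0xC0) { len = 2; cp = lead & 0x1F; }
    else if ((lead & 0xF0) == 0xE0) { len = 3; cp = lead & 0x0F; }
    else if ((lead & 0xF8) == 0xF0) { len = 4; cp = lead & 0x07; }
    else { *i += 1; return -1; }    /* stray continuation or invalid lead byte */

    if (len > n - *i) { *i += 1; return -1; }   /* sequence runs off the end */

    for (size_t k = 1; k < len; k++) {
        unsigned char c = s[*i + k];
        if ((c & 0xC0) != 0x80) { *i += 1; return -1; }  /* not a continuation byte */
        cp = (cp << 6) | (c & 0x3F);
    }
    *i += len;
    return cp;
}

/* Size in bytes, in the spirit of the proposed UText.ByteSize. */
size_t utext_byte_size(const char *t)
{
    return strlen(t);
}

/* Number of characters (code points), in the spirit of UText.Length.
   Each malformed byte is counted as one character. */
size_t utext_length(const char *t)
{
    const unsigned char *s = (const unsigned char *)t;
    size_t n = strlen(t), i = 0, count = 0;
    while (i < n) {
        (void)utf8_next(s, n, &i);
        count++;
    }
    return count;
}

/* Character at 0-based character position `index`, in the spirit of
   UText.GetChar. This is a linear scan from the start of the string.
   Returns -1 if index is out of range or the bytes there are malformed. */
int32_t utext_get_char(const char *t, size_t index)
{
    const unsigned char *s = (const unsigned char *)t;
    size_t n = strlen(t), i = 0, pos = 0;
    while (i < n) {
        int32_t cp = utf8_next(s, n, &i);
        if (pos == index) return cp;
        pos++;
    }
    return -1;
}
```

For the UTF-8 bytes of "Dragiša Durić", utext_byte_size gives 15 and utext_length gives 13. That is the gap between the Text.Length result of 15 reported earlier in this thread and the number of characters a reader actually sees. The linear scan in utext_get_char is the cost of keeping the byte representation; a saved byte index passed back to utf8_next avoids rescanning.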

From dragisha at m3w.org Mon Jul 2 17:13:03 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Mon, 2 Jul 2012 17:13:03 +0200
Subject: [M3devel] Some earlier work
In-Reply-To: <20120702080925.EEE2BB96@resin11.mta.everyone.net>
References: <20120702080925.EEE2BB96@resin11.mta.everyone.net>
Message-ID: <99FFC5CA-99A9-4E57-A41C-C82624123312@m3w.org>

With Brand added, it is ready for generic containers from libm3.

Yes, it was inspired by Text.i3. The idea was to make as thin an interface as possible.

On Jul 2, 2012, at 5:09 PM, Rodney Bates wrote:

> Hmm. This looks very much like original Text.i3, with CHAR replaced by UText.Char.
> Dare I infer that it was inspired that way? It presents just the abstraction that
> I think Text itself should present.
>

From dabenavidesd at yahoo.es Mon Jul 2 17:27:56 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Mon, 2 Jul 2012 16:27:56 +0100 (BST)
Subject: [M3devel] Simple change to WIDECHAR type
In-Reply-To:
Message-ID: <1341242876.32584.YahooMailClassic@web29702.mail.ird.yahoo.com>

Hi all:
I don't know if I would agree with the kind of thinking that Modula-3 needed CHAR and WIDECHAR for a JVM execution engine device, but for the interpretation function. For instance, what would be the purpose of handling more than 140 CHARs on a mobile phone? I don't see the need for that. If you need to target many languages it is useful, but in a compiler setting, not in an execution environment like the CM J-V-M.

For instance, let's suppose you have a Win16 device and IBM JVM-ready hardware: would you need two types of char? Maybe, but for efficiency reasons, not for anything more. I agree with WIDECHAR devices in the sense that a general-purpose language is better than many language encodings, but we need to see the devices for that, for instance mobile phones, etc. Normally JVM-ready phones.

Thanks in advance

From hosking at cs.purdue.edu Mon Jul 2 17:57:14 2012
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Mon, 2 Jul 2012 11:57:14 -0400
Subject: [M3devel] UTF-8 TEXT
In-Reply-To: <20120702075018.EEE2B9F6@resin11.mta.everyone.net>
References: <20120702075018.EEE2B9F6@resin11.mta.everyone.net>
Message-ID: <2293DFAB-7532-4969-8A77-66FBB60EE427@cs.purdue.edu>

On Jul 2, 2012, at 10:50 AM, Rodney Bates wrote:

>
>
> -Rodney Bates
>
> --- antony.hosking at gmail.com wrote:
>
>> From: Antony Hosking
>> To: "Rodney M. Bates"
>> Cc: "m3devel at elegosoft.com"
>> Subject: Re: [M3devel] UTF-8 TEXT
>> Date: Thu, 28 Jun 2012 10:37:36 -0400
>>
>> Why not simply say that CHAR is an enumeration representing all of UTF-32?
>> The current definition merely says that CHAR is an enumeration containing *at least* 256 elements.
>> We would need to translate the current Latin-1 literals into UTF-32.
>> And we could simply have a new literal form for Unicode literals.
>>
> This is almost what I would propose to do, with a couple of differences:
>
> Leave CHAR alone and fix WIDECHAR to handle the entire Unicode space.
> I am sure there is lots of existing code that depends on the implementation
> properties: ORD(FIRST(CHAR))=0, ORD(LAST(CHAR))=255, and BYTESIZE(CHAR)=1.

Fair enough. Would we leave the encoding of CHAR as ISO-Latin-1? We'd still need translation from ISO-Latin-1 to UTF-8, wouldn't we?

> Then I would define, in the language itself, that WIDECHAR is Unicode, not
> UTF-32. Thus ORD(LAST(WIDECHAR))=16_10FFFF. Then I would make it an
> implementation characteristic that BYTESIZE(WIDECHAR)=4.

I note this text from the Wikipedia entry for UTF-32:

Though a fixed number of bytes per code point appear convenient, it is not as useful as it appears. It makes truncation easier but not significantly so compared to UTF-8 and UTF-16.
It does not make it faster to find a particular offset in the string, as an "offset" can be measured in the fixed-size code units of any encoding. It does not make calculating the displayed width of a string easier except in limited cases, since even with a "fixed width" font there may be more than one code point per character position (combining marks) or more than one character position per code point (for example CJK ideographs). Combining marks mean editors cannot treat one code point as being the same as one unit for editing. Editors that limit themselves to left-to-right languages and precomposed characters can take advantage of fixed-sized code units, but such editors are unlikely to support non-BMP characters and thus can work equally well with 16-bit UTF-16 encoding.

Does this argue against WIDECHAR=UTF-32? Would we be better off simply saying WIDECHAR=UTF-16 and leaving things as they are? Yes, it would make the definition of WideCharAt a little odd, because the index would be defined in 16-bit units rather than UTF-16 glyph code-points.

By the way, if we did change WIDECHAR to an enumeration containing 16_110000 elements then the stored (memory) size of WIDECHAR would be 4 bytes given the current CM3 implementation of enumerations, which chooses a (non-PACKED) stored size of 1/2/4/8 bytes depending on the number of elements.

> On Jun 27, 2012, at 10:12 PM, Rodney M. Bates wrote:
>
>> On 06/27/2012 07:32 PM, Antony Hosking wrote:
>>> So what do we do about 6-byte UTF-8 code points? They won't fit in WIDECHAR. Surely we should allow accessing a UTF-8 character as a CARDINAL and be done with it?
>>>
>>
>> Absolutely. Except I think a better way is to make WIDECHAR big enough to hold all of
>> Unicode.
>>
>>> Sent from my iPad
>>>
>>> On Jun 27, 2012, at 3:20 PM, "Rodney M. Bates" wrote:
>>>
>>>> On 06/26/2012 10:30 PM, Hendrik Boom wrote:
>>>>> On Tue, Jun 26, 2012 at 04:22:22PM -0400, Coleburn, Randy wrote:
>>>>>> I seem to recall that Rodney did some work a while back relating to TEXT.
>>>>>> Rodney, can you weigh in on some of this?
>>>>>> --Randy Coleburn
>>>>>>
>>>>>> From: Dragiša Durić [mailto:dragisha at m3w.org]
>>>>>> Sent: Tuesday, June 26, 2012 12:46 PM
>>>>>> To: Jay
>>>>>> Cc: m3devel
>>>>>> Subject: EXT Re: [M3devel] AND (., 16_ff). Not serious - or so I hope!
>>>>>>
>>>>>> You had an idea in another message. Store the length!
>>>>>>
>>>>>> Another idea - store a partial list of indices to character locations. So whatever one does, that list can be used/expanded. Whatever storage issues this causes, they are probably minor compared to the 32-bit-WIDECHAR-for-all idea.
>>>>>
>>>>> Most of the time, you don't need explicit integer indexes to character
>>>>> locations. What you do need is an operation that fetches a character
>>>>> given the string and its index (whatever data structure that index is),
>>>>> and one that increments the index past that character. As long as you
>>>>> can save an index and use it later on the same string, that's probably
>>>>> all you ever need. And with a simple TEXT representation (such as the
>>>>> obvious array of bytes containing characters of various widths) a byte
>>>>> index is all you need (note: NOT a character index). It's easy even to
>>>>> use TEXT and its integer indices as the data representation, as long as
>>>>> you use the proper functions to parse the characters and increment the
>>>>> indices by amounts that might differ from 1.
>>>>>
>>>>> And if your source code is represented in UTF-8, the representation that
>>>>> requires little extra compiler effort to parse, your TEXT strings will
>>>>> automagically appear in UTF-8.
>>>>
>>>> The original designers of the language and its libraries have given us
>>>> two different abstractions for handling character strings (in addition
>>>> to plain arrays.) 1) Text, and 2) Wr, Rd, and their cousins.
>>>>
>>>> Text is highly general and easy to use. Concatenations and substrings
>>>> are easy. Semantics, to its clients, are value semantics, similar to INTEGER.
>>>> Random access by *character* number is easy and, hopefully, implemented
>>>> with efficiency at least better than O(n).
>>>>
>>>> Wr and friends restrict you to sequential access, at least mostly, but
>>>> gain implementation convenience and efficiency as a result.
>>>>
>>>> I feel very strongly that we should *not* take away the full generality
>>>> of Text, especially efficient random access, to handle variable-length
>>>> character encodings in strings. For these, let's make more friends of
>>>> Wr and Rd, which already assume sequential access. For example, a
>>>> filter pipe that sequentially reads a Text/Array/stream, applies a UTF-8
>>>> interpretation to its bytes, and delivers a stream of Unicode characters,
>>>> in variables of type WIDECHAR.
>>>>
>>>> Text should preserve the abstraction that it's a string of characters,
>>>> generalized as it already is in cm3, to have type WIDECHAR, so they can be any
>>>> Unicode character. The internal representation should, usually, not be
>>>> of concern.
>>>>
>>>> Note that nowhere in Text are character values transferred between
>>>> a Text.T and any form of I/O stream. In the Text abstraction, all
>>>> characters go in and out of a Text.T in variables of type CHAR,
>>>> WIDECHAR, and arrays thereof. IO, etc. is only done in streams,
>>>> e.g., TextWr. We can easily add new variants of these that encode/decode
>>>> by various rules.
>>>>
>>>> Of course, it is still valid to put a string of bytes in a Text.T and
>>>> apply, e.g., UTF-8 interpretation yourself.
>>>> But that's lower-level programming, and shouldn't confuse the abstraction.
>>>>
>>>>> I can see a use for various wide characters -- the things you extract
>>>>> from a TEXT by parsing bits of it, but none for anything
>>>>> really new or complicated for wide TEXT.
>>>>>
>>>>> The only confusing thing is that the existing operations for extracting
>>>>> bytes from TEXT have names that suggest they are extracting characters.
>>>>>
>>>>
>>>> I think it's more than a suggestion. I think the abstraction clearly
>>>> considers them characters. And it should stay that way. If you want,
>>>> at a higher level of code, to treat them as bytes, that's fine, but the
>>>> abstraction continues to view them as characters (which only you, the
>>>> client, know is not really so.)
>>>>
>>>>> -- Hendrik

From hendrik at topoi.pooq.com Mon Jul 2 18:54:44 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Mon, 2 Jul 2012 12:54:44 -0400
Subject: [M3devel] UTF-8 TEXT
In-Reply-To: <2293DFAB-7532-4969-8A77-66FBB60EE427@cs.purdue.edu>
References: <20120702075018.EEE2B9F6@resin11.mta.everyone.net> <2293DFAB-7532-4969-8A77-66FBB60EE427@cs.purdue.edu>
Message-ID: <20120702165444.GA20908@topoi.pooq.com>

On Mon, Jul 02, 2012 at 11:57:14AM -0400, Tony Hosking wrote:
>
> On Jul 2, 2012, at 10:50 AM, Rodney Bates wrote:
>
> > -Rodney Bates
> >
> > --- antony.hosking at gmail.com wrote:
> >
> >> From: Antony Hosking
> >> To: "Rodney M. Bates"
> >> Cc: "m3devel at elegosoft.com"
> >> Subject: Re: [M3devel] UTF-8 TEXT
> >> Date: Thu, 28 Jun 2012 10:37:36 -0400
> >>
> >> Why not simply say that CHAR is an enumeration representing all of UTF-32?
> >> The current definition merely says that CHAR is an enumeration containing *at least* 256 elements.
> >> We would need to translate the current Latin-1 literals into UTF-32.
> >> And we could simply have a new literal form for Unicode literals.
> >>
> > This is almost what I would propose to do, with a couple of differences:
> >
> > Leave CHAR alone and fix WIDECHAR to handle the entire Unicode space.
> > I am sure there is lots of existing code that depends on the implementation
> > properties: ORD(FIRST(CHAR))=0, ORD(LAST(CHAR))=255, and BYTESIZE(CHAR)=1.
>
> Fair enough. Would we leave the encoding of CHAR as ISO-Latin-1? We'd still need translation from ISO-Latin-1 to UTF-8 wouldn't we?
>
> > Then I would define, in the language itself, that WIDECHAR is Unicode, not
> > UTF-32. Thus ORD(LAST(WIDECHAR))=16_10FFFF. Then I would make it an
> > implementation characteristic that BYTESIZE(WIDECHAR)=4.
>
> I note this text from the Wikipedia entry for UTF-32:

I had just looked this paragraph up on Wikipedia to post it when I noticed you had already done so.

> Though a fixed number of bytes per code point appear convenient, it is
> not as useful as it appears.

Which is the gist of my objection to implementing TEXT as fixed-width 16-, 20-, or 32-bit storage units. It wastes space without much gain. (An exception might be made for a few languages that can be stored efficiently in 16 bits but not in UTF-8.)

> It makes truncation easier but not significantly so compared to UTF-8
> and UTF-16.
> It does not make it faster to find a particular offset in the string,
> as an "offset" can be measured in the fixed-size code units of any
> encoding.

Exactly why I want character extraction to be expressible in efficient "offsets" with implementation-independent specifications (though possibly implementation-dependent values). I don't mind if character counts are also made available, as long as that doesn't impose extra overhead on those who don't use them. Operations with offsets that allow one to extract characters and skip over characters are sufficient for most purposes. The use of efficient offsets is independent of the question of access to individual bytes.

> It does not make calculating the displayed width of a string easier
> except in limited cases, since even with a "fixed width" font there
> may be more than one code point per character position (combining
> marks) or more than one character position per code point (for example
> CJK ideographs).
> Combining marks mean editors cannot treat one code point as being the
> same as one unit for editing. Editors that limit themselves to
> left-to-right languages and precomposed characters can take advantage
> of fixed-sized code units, but such editors are unlikely to support
> non-BMP characters and thus can work equally well with 16-bit UTF-16
> encoding.

I'd like to point out that most string processing doesn't really deal in characters at all, but in words, phrases, symbols, and other linguistic structures that have to be dealt with by parsing. Assembling bytes of UTF-8 into characters is just more parsing, and should be viewed as such. For many applications it isn't even necessary to decode UTF-8, because it can be copied without being aware of its character structure. And if the language ascribes special meanings only to some of the first 128 characters, these can be unambiguously recognised in UTF-8 without decoding UTF-8 at all. This does argue for having byte access as well.

> Does this argue against WIDECHAR=UTF-32? Would we be better off simply saying WIDECHAR=UTF-16 and leaving things as they are? Yes, it would make the definition of WideCharAt a little odd, because the index would be defined in 16-bit units rather than UTF-16 glyph code-points.
>
> By the way, if we did change WIDECHAR to an enumeration containing 16_110000 elements then the stored (memory) size of WIDECHAR would be 4 bytes given the current CM3 implementation of enumerations, which chooses a (non-PACKED) stored size of 1/2/4/8 bytes depending on the number of elements.
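Two of the points made above in this message can be sketched in a few lines: a byte offset plus "fetch the character here and step past it" is enough to walk a UTF-8 string, and ASCII delimiters can be found in UTF-8 bytes without decoding anything. The sketch is in Python rather than Modula-3 so that it can be run as-is. The names decode_at, code_points, and split_fields are hypothetical, and the decoder is deliberately minimal: it assumes well-formed input apart from the checks shown and does not reject overlong forms or surrogates.

```python
def decode_at(buf: bytes, i: int) -> tuple:
    """Decode the UTF-8 sequence that starts at byte offset i.

    Returns (code_point, offset_of_next_character).
    Sketch only: overlong forms and surrogates are not rejected.
    """
    b0 = buf[i]
    if b0 < 0x80:                  # 0xxxxxxx: ASCII, one byte
        return b0, i + 1
    if 0xC0 <= b0 < 0xE0:          # 110xxxxx: one continuation byte
        extra, cp = 1, b0 & 0x1F
    elif 0xE0 <= b0 < 0xF0:        # 1110xxxx: two continuation bytes
        extra, cp = 2, b0 & 0x0F
    elif 0xF0 <= b0 < 0xF8:        # 11110xxx: three continuation bytes
        extra, cp = 3, b0 & 0x07
    else:                          # 10xxxxxx or 11111xxx
        raise ValueError("offset %d is not the start of a character" % i)
    if i + extra >= len(buf):
        raise ValueError("truncated sequence at offset %d" % i)
    for b in buf[i + 1 : i + 1 + extra]:
        if b & 0xC0 != 0x80:       # continuation bytes look like 10xxxxxx
            raise ValueError("bad continuation byte after offset %d" % i)
        cp = (cp << 6) | (b & 0x3F)
    return cp, i + 1 + extra


def code_points(buf: bytes) -> list:
    """Walk the whole buffer using nothing but byte offsets."""
    out, i = [], 0
    while i < len(buf):
        cp, i = decode_at(buf, i)
        out.append(cp)
    return out


def split_fields(buf: bytes, delimiter: bytes = b";") -> list:
    """Split on an ASCII delimiter without decoding.

    Every byte of a multi-byte UTF-8 sequence is >= 0x80, so a byte
    below 0x80 is always a complete ASCII character.
    """
    return buf.split(delimiter)


# Escapes keep this source file pure ASCII, whatever the editor's encoding.
sample = "Dragi\u0161a Duri\u0107;na\u00efve;\U0001D11E".encode("utf-8")
fields = split_fields(sample)
for field in fields:
    print(len(field), "bytes,", len(code_points(field)), "code points")
```

The caller never needs to know how many characters precede a given offset. It only saves offsets returned by decode_at and reuses them on the same buffer, which is the property asked for above.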
16-bit WIDECHARs would seem to me to be the worst choice of all, except in the special case that you *know* that all the characters you'll ever have to deal with fit in 16 bits and most of them won't fit in 8.

I'd use WIDECHAR when I'm dealing with individual characters/Unicode code points. I'd use TEXT when dealing with strings. Or some custom data structure that can handle text containing strings and other data structures (such as parse trees). Generally, there won't be a lot of WIDECHARs around in a running program, so I don't care much about the few extra bytes.

-- hendrik

From dabenavidesd at yahoo.es Mon Jul 2 22:44:44 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Mon, 2 Jul 2012 21:44:44 +0100 (BST)
Subject: [M3devel] UTF-8 TEXT
In-Reply-To: <20120702165444.GA20908@topoi.pooq.com>
Message-ID: <1341261884.27797.YahooMailClassic@web29706.mail.ird.yahoo.com>

Hi all:
I was thinking of back-end encoding of the CHARs in WIDECHAR using Rd/Wr-Rep, but the mentioned modules are built around the idea of efficient machine implementation. I just think that the only need for having UTF-8 or whatever encoding for CHAR and WIDECHAR is on a machine with those types. Numerous µ-coded "rare little" JVM machines are capable of handling that kind of Unicode, but anything else is just spurious to me, such as making that encoding apply to everybody in CM3. There isn't any other machine with that byte encoding that I know about, so the good news is that the machines are reduced to:
1) Industrial-size scenario: JVM
2) Small vendor machines: a web browser client, like a JS?
I hope with that we find some common ground for a solution to the issue.
Thanks in advance

From dmuysers at hotmail.com Fri Jul 6 11:23:34 2012
From: dmuysers at hotmail.com (Dirk Muysers)
Date: Fri, 6 Jul 2012 11:23:34 +0200
Subject: [M3devel] A question for our language lawyers
Message-ID:

The report says (2.6.9)
"The values in the array will be arbitrary values of their type."
Now, ParseParams in its "init" method allocates an array of BOOLEANs
and relies on the fact that it is supposedly initialised with FALSE values.
On the other hand the report says (2.2.4)
"The constant default is a default value used when a record is constructed or allocated"
If I allocate an array of records, which statement is stronger:
- the array contains arbitrary record values?
- the array record fields will be initialised to their default values?
The ParseParams "init" method is obviously erroneous and works only by virtue of a happy combination of circumstances.
But how is the report to be interpreted in the second case?

From dabenavidesd at yahoo.es Fri Jul 6 18:06:20 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Fri, 6 Jul 2012 17:06:20 +0100 (BST)
Subject: [M3devel] A question for our language lawyers
In-Reply-To:
Message-ID: <1341590780.97298.YahooMailClassic@web29702.mail.ird.yahoo.com>

Hi all:
if it is true, as you say, that Init "relies" on that, then the MODULE is wrong to do so. But the record rules have nothing to do with it here. Anyway, you may have a point in that record initialization is less important than record construction (cf. p.53, s2.6.8, SPwM3), and that in the array case array initialization (as a declared variable) might be stronger than array construction. But these are decided as two different cases for the WITH expression: with 'a' as a TEXT there is a WITH without initialization, but with p as a READONLY array-valued expression the WITH doesn't do what you say it needs. So you may have found a bug known to Jay, of "incorrect" uninitialized values in m3cg, or m3cc or m3gcc or m3cgc. In that case you might need an array of uninitialized expressions, or else construct the value correctly before entering the inner WITH.
Thanks in advance

From rodney_bates at lcwb.coop Fri Jul 6 18:28:10 2012
From: rodney_bates at lcwb.coop (Rodney M. Bates)
Date: Fri, 06 Jul 2012 11:28:10 -0500
Subject: [M3devel] Simple change to WIDECHAR type
In-Reply-To: <95923C4D-B4F9-4245-809B-C3D186B8679C@m3w.org>
References: <4AD84314-8E53-4A50-A51B-A0DF8DC910BC@m3w.org> <031FD9D2-8C62-48B3-8E2F-1122B7C1ED96@m3w.org> <20120630172401.DFE8E1A207C@async.async.caltech.edu> <7BCB3BB7-D2F2-470B-8E70-2A7FF274E0FC@m3w.org> <95923C4D-B4F9-4245-809B-C3D186B8679C@m3w.org>
Message-ID: <4FF7121A.9000909@lcwb.coop>

This is the result of the fact that your editor is writing UTF-8, while the compiler is reading in ISO-latin-1, as the language specifies. This was sensible at the time it was defined, but has been overtaken by the advent and proliferation of Unicode.

The abstract code point values in the range 16_80..16_FF are indeed the same in Unicode and ISO-latin-1, but the bit encoding rules are different.
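The reported length of 15 for a 13-character name follows directly from this mismatch. A small sketch, in Python purely for illustration and not taken from the cm3 scanner: the editor writes UTF-8, and a reader that assumes ISO-Latin-1 turns every byte into one character.

```python
# Escapes keep this source file pure ASCII, whatever the editor's encoding.
name = "Dragi\u0161a Duri\u0107"         # 13 characters, two of them outside Latin-1

utf8_bytes = name.encode("utf-8")        # what a UTF-8 editor writes to the file
misread = utf8_bytes.decode("latin-1")   # one character per byte, as a Latin-1 reader sees it

print(len(name), len(utf8_bytes), len(misread))   # prints: 13 15 15

# Same code point, different bytes: U+00E9 lies in the range 16_80..16_FF.
e_acute = "\u00e9"
print(e_acute.encode("latin-1").hex(), e_acute.encode("utf-8").hex())   # prints: e9 c3a9
```

The two letters with diacritics each take two bytes in UTF-8, which accounts for the two extra "characters".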
The simple and correct solution is to fix the compiler so that, like many programs today, it can be told to use one of several encodings when interpreting its input. Then set it the same as you set your editor.

On 07/01/2012 03:52 AM, Dragiša Durić wrote:
> Text.Length(Dragiša Durić)= 15
>
> out from:
> WITH me = W"Dragiša Durić" DO
>   IO.Put("Text.Length(" & me & ")= " & Fmt.Int(Text.Length(me)) & "\n");
> END;
>
> On Jun 30, 2012, at 8:12 PM, Dragiša Durić wrote:
>
>> Jay, please explain this to me. My editor creates UTF8 files, for example. What cm3 expects after W" ?

From dabenavidesd at yahoo.es Fri Jul 6 19:08:25 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Fri, 6 Jul 2012 18:08:25 +0100 (BST)
Subject: [M3devel] Simple change to WIDECHAR type
In-Reply-To: <4FF7121A.9000909@lcwb.coop>
Message-ID: <1341594505.40475.YahooMailClassic@web29703.mail.ird.yahoo.com>

Hi all:
I think the problem is how to encode without having to REVEAL the language encoding explicitly (which would need machine identification, so a generic target is my preferred abstraction, as CM3 tried to do). We don't like to reveal anything of the machine from the Modula-3 standard point of view, and you might need a language redefinition. I think if one needs that, it is because one is on a machine like that. So, on a given platform you might know the encoding and that's all.

The other approach is just very hard to use, since it puts the burden of choice on the user. My thinking is that if you need that, you might end up needing generics that tell at compile time what to use. Of course type checking of methods is done at instantiation time, but nevertheless it is helpful that these other settings are made at compile time (which makes sense for the question of why I need to compile this code). That's because on other machines you might need to spend three times the needed time to encode, decode and encode again (the cost matters if you think of changing parameters, so you might not touch that, for the benefit of third parties, as a default). This matters on phones, where you don't have time to do that, and generally on any type of machine, so a hard-coded way is not a helpful option for everybody either.

The machine-dependent solution helps if you can't compile the thing there (cross-compilation or pre-compiled binaries), but anyway I guess that applies if we want Java compatibility (I do, as a platform for binary compatibility, but just when it's needed, not in every execution environment, say a real HW-implemented JVM). So basically the language implementation needs to know that nobody else means that a module-wise model might need to be introduced, which is not something we have now.

Thanks in advance

From rodney_bates at lcwb.coop Fri Jul 6 19:54:32 2012
From: rodney_bates at lcwb.coop (Rodney M. Bates)
Date: Fri, 06 Jul 2012 12:54:32 -0500
Subject: [M3devel] UTF-8 TEXT
In-Reply-To: <2293DFAB-7532-4969-8A77-66FBB60EE427@cs.purdue.edu>
References: <20120702075018.EEE2B9F6@resin11.mta.everyone.net> <2293DFAB-7532-4969-8A77-66FBB60EE427@cs.purdue.edu>
Message-ID: <4FF72658.905@lcwb.coop>

On 07/02/2012 10:57 AM, Tony Hosking wrote:
>
> On Jul 2, 2012, at 10:50 AM, Rodney Bates wrote:
>
>> -Rodney Bates
>>
>> --- antony.hosking at gmail.com wrote:
>>
>>> From: Antony Hosking
>>> To: "Rodney M. Bates"
>>> Cc: "m3devel at elegosoft.com"
>>> Subject: Re: [M3devel] UTF-8 TEXT
>>> Date: Thu, 28 Jun 2012 10:37:36 -0400
>>>
>>> Why not simply say that CHAR is an enumeration representing all of UTF-32?
>>> The current definition merely says that CHAR is an enumeration containing *at least* 256 elements.
>>> We would need to translate the current Latin-1 literals into UTF-32.
>>> And we could simply have a new literal form for Unicode literals.
>>>
>> This is almost what I would propose to do, with a couple of differences:
>>
>> Leave CHAR alone and fix WIDECHAR to handle the entire Unicode space.
>> I am sure there is lots of existing code that depends on the implementation
>> properties: ORD(FIRST(CHAR))=0, ORD(LAST(CHAR))=255, and BYTESIZE(CHAR)=1.
>
> Fair enough. Would we leave the encoding of CHAR as ISO-Latin-1? We'd still need translation from ISO-Latin-1 to UTF-8 wouldn't we?
>

Yes. The code points for Unicode and ISO-Latin-1 in the range 128..255 map to the same characters (as in 0..127). But the physical encoding is different. ISO-Latin-1 is encoded one byte per character unconditionally.
When Unicode is encoded in UTF-8, any code point 128 or more uses at least two bytes. We need translations, but these belong in Wr/Rd and friends, which handle serial streams. In in-memory variables, WIDECHAR holds a Unicode code point, ARRAY OF WIDECHAR would happen to be the same representation as UTF-32, and Text.T would abstract away the internal representation.

>> Then I would define, in the language itself, that WIDECHAR is Unicode, not
>> UTF-32. Thus ORD(LAST(WIDECHAR))=16_10FFFF. Then I would make it an
>> implementation characteristic that BYTESIZE(WIDECHAR)=4.
>
> I note this text from the Wikipedia entry for UTF-32:
>
> Though a fixed number of bytes per code point appear convenient, it is not as useful as it appears. It makes truncation easier but not significantly so compared to UTF-8 and UTF-16. It does not make it faster to find a particular offset in the string, as an "offset" can be measured in the fixed-size code units of any encoding. It does not make calculating the displayed width of a string easier except in limited cases, since even with a "fixed width" font there may be more than one code point per character position (combining marks) or more than one character position per code point (for example CJK ideographs). Combining marks mean editors cannot treat one code point as being the same as one unit for editing. Editors that limit themselves to left-to-right languages and precomposed characters can take advantage of fixed-sized code units, but such editors are unlikely to support non-BMP characters and thus can work equally well with 16-bit UTF-16 encoding.
>
> Does this argue against WIDECHAR=UTF-32? Would we be better off simply saying WIDECHAR=UTF-16 and leaving things as they are? Yes, it would make the definition of WideCharAt a little odd, because the index would be defined in 16-bit units rather than UTF-16 glyph code-points.
>

No. Keeping WIDECHAR at only 2^16 values does nothing to get us out of the morass we are now in, where every bit of character-manipulating code has to cope with different encodings and/or variable-sized encodings.

If we make WIDECHAR capable of holding any Unicode code point, then we have the possibility of dealing with characters in the same abstractions as we originally had, and, with only an 8-bit character set, still do. Specifically, we have a variable type that holds any character, arrays thereof, and a very general functional-style package of strings thereof. Library streams can handle encoding transformations, and most code won't have to worry about them, beyond specifying once what encoding it wants.

Of course, you could still always do low-level stuff like putting one UTF-8 code _unit_ into a WIDECHAR or CHAR, having arrays or TEXTs thereof, and constantly fiddling with the encoding. But this should not be required.

> By the way, if we did change WIDECHAR to an enumeration containing 16_110000 elements then the stored (memory) size of WIDECHAR would be 4 bytes given the current CM3 implementation of enumerations, which chooses a (non-PACKED) stored size of 1/2/4/8 bytes depending on the number of elements.
>

I have thought about making BYTESIZE(WIDECHAR) = 3, but that would at best trade one group of problems for another. In particular, applying ORD functions and doing arithmetic on characters located in arrays (including those hidden inside Text) would always involve repacking to get things aligned. I would think we would at least want to keep WIDECHAR scalars aligned.

From rodney_bates at lcwb.coop Fri Jul 6 20:27:28 2012
From: rodney_bates at lcwb.coop (Rodney M. Bates)
Date: Fri, 06 Jul 2012 13:27:28 -0500
Subject: [M3devel] A question for our language lawyers
In-Reply-To:
References:
Message-ID: <4FF72E10.3030204@lcwb.coop>

On 07/06/2012 04:23 AM, Dirk Muysers wrote:
> The report says (2.6.9)
> "The values in the array will be arbitrary values of their type."
> Now, ParseParams in its "init" method allocates an array of BOOLEANs
> and relies on the fact that it is supposedly initialised with FALSE values.
> On the other hand the report says (2.2.4)
> "The constant |default| is a default value used when a record is constructed or allocated"
> If I allocate an array of records, which statement is stronger:
> - the array contains arbitrary record values?
> - the array record fields will be initialised to their default values?

Admittedly unclearly if not misleadingly worded.
Better wording might be
to say each element is initialized as it would if it were a scalar variable
of its type.

I think the way to interpret this is that the array itself does not impose
any initialization, but this fact will not eliminate initialization
imposed by other rules, specifically, the type of the array's elements.

This is a language quirk that I have always been deeply ambivalent about.
The type safety would go down the drain if variables were not initialized
to a bit pattern that represents some value of the type, so we have to pay
the performance penalty of executing initialization code. So why not define
which value of the type is initialized-to and get behavioral predictability
for free? And further save redundant initialization in the likely event
that the compiler's chosen arbitrary value happens to match what the
programmer wants?

(OK, a smart enough optimizer might figure this out, but we could have
had it even with a naive compiler.)

The contrary case is a type whose compiler-chosen representation happens
to use every bit pattern in the allocated space for a value of the type.
Here, no compiler-generated runtime initialization is needed.

Also, the rule we have might sometimes encourage programmers to at least give a
millisecond's thought to whether they need to do some explicit initialization.

> The ParseParams "init" method is obviously erroneous and works only
> by virtue of a happy combination of circumstances.
> But how is the report to be interpreted in the second case?
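The two cases in the question can be set side by side in a small sketch.
All names here (InitDemo, Flags, Opt, Opts, NewFlags) are hypothetical and
are not taken from ParseParams; the sketch only illustrates the reading
given above, namely that NEW promises nothing beyond "some value of the
type" for BOOLEAN elements, while record elements still get whatever
initialization their own field defaults impose.

```modula3
MODULE InitDemo EXPORTS Main;
(* Sketch only: InitDemo, Flags, Opt, Opts and NewFlags are hypothetical
   names used for illustration. *)
IMPORT IO, Fmt;

TYPE
  Flags = REF ARRAY OF BOOLEAN;
  Opt   = RECORD seen: BOOLEAN := FALSE; count: INTEGER := 0 END;
  Opts  = REF ARRAY OF Opt;

PROCEDURE NewFlags (n: CARDINAL): Flags =
  (* The report only promises arbitrary BOOLEAN values after NEW,
     so every element is set explicitly instead of relying on luck. *)
  VAR f := NEW (Flags, n);
  BEGIN
    FOR i := FIRST (f^) TO LAST (f^) DO f[i] := FALSE END;
    RETURN f;
  END NewFlags;

VAR
  f := NewFlags (4);
  (* Assumption, following the interpretation above: each element is
     initialized like a variable of type Opt, so its fields take their
     declared defaults. *)
  o := NEW (Opts, 4);
BEGIN
  IO.Put (Fmt.Bool (f[0]) & " " & Fmt.Bool (o[0].seen) & " "
          & Fmt.Int (o[0].count) & "\n");
END InitDemo.
```

Written this way, the array of BOOLEANs no longer depends on the allocator
happening to return zeroed storage, and the record case does not need an
explicit loop as long as the field defaults are the values wanted.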

From dragisha at m3w.org Fri Jul 6 20:51:10 2012
From: dragisha at m3w.org (=?utf-8?Q?Dragi=C5=A1a_Duri=C4=87?=)
Date: Fri, 6 Jul 2012 20:51:10 +0200
Subject: [M3devel] Simple change to WIDECHAR type
In-Reply-To: <4FF7121A.9000909@lcwb.coop>
References: <4AD84314-8E53-4A50-A51B-A0DF8DC910BC@m3w.org>
 <031FD9D2-8C62-48B3-8E2F-1122B7C1ED96@m3w.org>
 <20120630172401.DFE8E1A207C@async.async.caltech.edu>
 <7BCB3BB7-D2F2-470B-8E70-2A7FF274E0FC@m3w.org>
 <95923C4D-B4F9-4245-809B-C3D186B8679C@m3w.org>
 <4FF7121A.9000909@lcwb.coop>
Message-ID:

And then, turn parsed string literals into broken WIDECHAR TEXTs?

On Jul 6, 2012, at 6:28 PM, Rodney M. Bates wrote:

> This is the result of the fact that your editor is writing UTF-8, while
> the compiler is reading in ISO-latin-1, as the language specifies. This
> was sensible at the time it was defined, but has been overcome by the
> advent and proliferation of Unicode.
>
> The abstract code point values in the range 16_80..16_FF are indeed the same in
> Unicode and ISO-latin-1, but the bit encoding rules are different.
>
> The simple and correct solution is to fix the compiler so that, like many
> programs today, it can be told to use one of several encodings when interpreting
> its input. Then set it the same as you set your editor.
>
> On 07/01/2012 03:52 AM, Dragiša Durić wrote:
>> Text.Length(Dragiša Durić)= 15
>>
>> out from:
>> WITH me = W"Dragiša Durić" DO
>>   IO.Put("Text.Length(" & me & ")= " & Fmt.Int(Text.Length(me)) & "\n");
>> END;
>>
>> On Jun 30, 2012, at 8:12 PM, Dragiša Durić wrote:
>>
>>> Jay, please explain this to me. My editor creates UTF8 files, for example. What cm3 expects after W" ?
>>
>>

From dabenavidesd at yahoo.es Fri Jul 6 21:17:51 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Fri, 6 Jul 2012 20:17:51 +0100 (BST)
Subject: [M3devel] A question for our language lawyers
In-Reply-To: <4FF72E10.3030204@lcwb.coop>
Message-ID: <1341602271.14326.YahooMailClassic@web29701.mail.ird.yahoo.com>

Hi all:
In English, an array is a sequence of elements (of a common type), and a
BOOLEAN is an enumeration, so you might attack that distinction to define
what an initialized boolean or array of booleans is in common compilers
(gcc, javac, etc.), which, if Java-like, is really undefined:

http://www.drdobbs.com/architecture-and-design/the-humble-boolean-deserves-help/232900836?cid=DDJ_nl_upd_2012-05-02_h&elq=cef656ee4d6c4bca996b337620b98f85

So I prefer non-uniform rules for records, different from those for sets
and arrays. Note that a NEW expression doesn't allow constructors to be
used, so the only thing you can use is an array of uninitialized variables
(but current gcc, javac, etc. are really wrong in that).

This means we need to address this either with a native backend (NT386) or
with another language, for that matter.

Thanks in advance for any comments you may have

From dabenavidesd at yahoo.es Fri Jul 6 21:57:25 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Fri, 6 Jul 2012 20:57:25 +0100 (BST)
Subject: [M3devel] A question for our language lawyers
In-Reply-To: <1341602271.14326.YahooMailClassic@web29701.mail.ird.yahoo.com>
Message-ID: <1341604645.24546.YahooMailClassic@web29706.mail.ird.yahoo.com>

Hi all:
I think that if we are to define initialization through types, we need a
kernel that is more fun to type than rigid Modula-3 semantics:

http://www.hpl.hp.com/techreports/Compaq-DEC/SRC-RR-95.pdf

That said, we can define an m3kernel, a sort of minimal type abstraction of
a Modula-3 object, and build on top of that. The advantages are that we can
theorize about types in every wanted way with it and still protect
ourselves from incompatible type systems, by branding the type system to
allow smooth transitions. Besides that, we get parallelization implicitly
in the abstract machine (kernel) and can check its type safety. Also,
rewriting the type system in terms of this kernel might get us smoothly to
a new language, in the sense of a language definition.
If someone deems this good I can give it a try.
Thanks in advance

From dmuysers at hotmail.com Fri Jul 6 21:54:54 2012
From: dmuysers at hotmail.com (Dirk Muysers)
Date: Fri, 6 Jul 2012 21:54:54 +0200
Subject: [M3devel] A question for our language lawyers
In-Reply-To: <1341602271.14326.YahooMailClassic@web29701.mail.ird.yahoo.com>
References: <1341602271.14326.YahooMailClassic@web29701.mail.ird.yahoo.com>
Message-ID:

Daniel, with my apologies, sometimes I wonder if you do it on purpose.

From dabenavidesd at yahoo.es Fri Jul 6 22:07:15 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Fri, 6 Jul 2012 21:07:15 +0100 (BST)
Subject: [M3devel] A question for our language lawyers
In-Reply-To:
Message-ID: <1341605235.19643.YahooMailClassic@web29702.mail.ird.yahoo.com>

Hi all:
The thing is that, as of today, you don't have any software to show that
the language is incorrect, so I can't validate your claim (I don't pretend
to do that), because there isn't any compiler that defines that.
Sorry for that, but nobody else seems to care, so thanks for sharing your
problem; at least someone else is interested in it as well. Dr. Dobb's
talks about a tri-state boolean, and I thought the article was meant to
show that. Sorry if not.
Thanks in advance

From dabenavidesd at yahoo.es Fri Jul 6 22:59:12 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Fri, 6 Jul 2012 21:59:12 +0100 (BST)
Subject: [M3devel] A question for our language lawyers
In-Reply-To: <1341604645.24546.YahooMailClassic@web29706.mail.ird.yahoo.com>
Message-ID: <1341608352.82920.YahooMailClassic@web29702.mail.ird.yahoo.com>

Hi all:
See that Baby Modula-3 allows field definition (value by definition, s. 3.1)
for free; see pp. 10-11 of the report linked in my previous message.
Thanks in advance

From dragisha at m3w.org Fri Jul 6 23:07:59 2012
From: dragisha at m3w.org (=?utf-8?Q?Dragi=C5=A1a_Duri=C4=87?=)
Date: Fri, 6 Jul 2012 23:07:59 +0200
Subject: [M3devel] A question for our language lawyers
In-Reply-To:
References: <1341602271.14326.YahooMailClassic@web29701.mail.ird.yahoo.com>
Message-ID:

Dirk,

If you still have doubts, you are a better man than most of us :)

Thanks in advance!

From dabenavidesd at yahoo.es Fri Jul 6 23:44:55 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Fri, 6 Jul 2012 22:44:55 +0100 (BST)
Subject: [M3devel] A question for our language lawyers
In-Reply-To:
Message-ID: <1341611095.41843.YahooMailClassic@web29706.mail.ird.yahoo.com>

Hi all:
As I said once, why say what's right, what is wrong, in terms of standards nobody cares that, so who cares to say that. (See other programming languages that need help first, like C and friends!)
Thanks in advance

--- On Fri, 6/7/12, Dragiša Durić wrote:

From: Dragiša Durić
Subject: Re: [M3devel] A question for our language lawyers
To: "Dirk Muysers"
CC: "Daniel Alejandro Benavides D." , m3devel at elegosoft.com, "Rodney M. Bates"
Date: Friday, 6 July 2012 16:07

Dirk,
If you still have doubts, you are better man than most of us :)
Thanks in advance!

On Jul 6, 2012, at 9:54 PM, Dirk Muysers wrote:

Daniel, with my apologies, sometimes I wonder if you do it on purpose.

From: Daniel Alejandro Benavides D.
Sent: Friday, July 06, 2012 9:17 PM
To: m3devel at elegosoft.com ; Rodney M. Bates
Subject: Re: [M3devel] A question for our language lawyers

Hi all:
English men say array is a sequence of elements (of a common type), and a BOOLEAN is an enumeration so you might attack that distinction to define what is an initialized boolean or array of boolean in common compilers, gcc, javac, etc, which if is java-like is really undefined:
http://www.drdobbs.com/architecture-and-design/the-humble-boolean-deserves-help/232900836?cid=DDJ_nl_upd_2012-05-02_h&elq=cef656ee4d6c4bca996b337620b98f85
So I prefer non-uniform rules for records different of Sets, Arrays, and records as that, note that NEW expression doesn't allow constructors to be used, so the only thing you can use is array of uninitialized variables (but current gcc or javac, etc are really wrong in that)
This means we need to address this by either a native backend (NT386) or by another language for that matter.
Thanks in advance for any comments you may have

From jay.krell at cornell.edu Sat Jul 7 08:05:39 2012
From: jay.krell at cornell.edu (Jay K)
Date: Sat, 7 Jul 2012 06:05:39 +0000
Subject: [M3devel] A question for our language lawyers
In-Reply-To:
References: <1341602271.14326.YahooMailClassic@web29701.mail.ird.yahoo.com>,
Message-ID:

I quite like the idea that all heap and stack is initialized by zeroing. This is, I believe, stronger/safer than Modula-3, at least for stack.
Anyone want to measure the change?
I'd also like to see stack zeroed upon function return, so GC is easier to implement/understand...

From dmuysers at hotmail.com Sat Jul 7 14:06:31 2012
From: dmuysers at hotmail.com (Dirk Muysers)
Date: Sat, 7 Jul 2012 14:06:31 +0200
Subject: [M3devel] A question for our language lawyers
Message-ID:

I reread ParseParams.m3 and, yes, they initialise the array of booleans. One should never trust one's memory, especially past a certain age. Yet I am sure I have seen one of the library modules relying on zero initialisation.
For my excuse, I never (except an occasional INC, where C would use ++) place two statements on the same line, so when I quickly browse through some code, the second statement often escapes my eyes.
Nevertheless the initialisation question was worth mentioning.

From dabenavidesd at yahoo.es Sat Jul 7 14:57:03 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Sat, 7 Jul 2012 13:57:03 +0100 (BST)
Subject: [M3devel] A question for our language lawyers
In-Reply-To:
Message-ID: <1341665823.8622.YahooMailClassic@web29703.mail.ird.yahoo.com>

Hi all:
yes, it could be, but VAXen and Alpha's I believe it did not cause the wrong behavior to show that incorrect initialization at start time, that most part of it trust on it (Alphas just throw an exception to show that it was changed).
I didn't know it was wrong for sure, but I guess that confirms the initialization code is not working by vicious value initialization.
Did you see the Baby Modula-3 (in p.10 - 11, s 3.1 - Relation to Modula-3) it says you can do overriding at the type level overriding of fields to override defaults?
Thanks in advance

From rodney_bates at lcwb.coop Sat Jul 7 15:59:07 2012
From: rodney_bates at lcwb.coop (Rodney M. Bates)
Date: Sat, 07 Jul 2012 08:59:07 -0500
Subject: [M3devel] Simple change to WIDECHAR type
In-Reply-To:
References: <4AD84314-8E53-4A50-A51B-A0DF8DC910BC@m3w.org> <031FD9D2-8C62-48B3-8E2F-1122B7C1ED96@m3w.org> <20120630172401.DFE8E1A207C@async.async.caltech.edu> <7BCB3BB7-D2F2-470B-8E70-2A7FF274E0FC@m3w.org> <95923C4D-B4F9-4245-809B-C3D186B8679C@m3w.org> <4FF7121A.9000909@lcwb.coop>
Message-ID: <4FF840AB.5050807@lcwb.coop>

On 07/06/2012 01:51 PM, Dragiša Durić wrote:
> And then, turn parsed string literals into broken WIDECHAR TEXTs?
>

Well, yes, that requires fixing WIDECHAR too. But at least it would work if you can live within the BMP.

> On Jul 6, 2012, at 6:28 PM, Rodney M. Bates wrote:
>
>> This is the result of the fact that your editor is writing UTF-8, while
>> the compiler is reading in ISO-latin-1, as the language specifies. This
>> was sensible at the time it was defined, but has been overcome by the
>> advent and proliferation of Unicode.
>>
>> The abstract code point values in the range 16_80..16_FF are indeed the same in
>> Unicode and ISO-latin-1, but the bit encoding rules are different.
>>
>> The simple and correct solution is to fix the compiler so that, like many
>> programs today, it can be told to use one of several encodings when interpreting
>> its input. Then set it the same as you set your editor.
>>
>> On 07/01/2012 03:52 AM, Dragiša Durić wrote:
>>> Text.Length(Dragiša Durić)= 15
>>>
>>> out from:
>>> WITH me = W"Dragiša Durić" DO
>>>   IO.Put("Text.Length(" & me & ")= " & Fmt.Int(Text.Length(me)) & "\n");
>>> END;
>>>
>>> On Jun 30, 2012, at 8:12 PM, Dragiša Durić wrote:
>>>
>>>> Jay, please explain this to me. My editor creates UTF8 files, for example. What cm3 expects after W" ?

From dabenavidesd at yahoo.es Sat Jul 7 18:17:10 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Sat, 7 Jul 2012 17:17:10 +0100 (BST)
Subject: [M3devel] Modula-3 TLA Win32 Kernel Threads API Specification by Leslie Lamport
Message-ID: <1341677830.27299.YahooMailClassic@web29704.mail.ird.yahoo.com>

Hi all:
I wanted to share what I have found recently:
http://web.archive.org/web/20010712210213/http://www.research.compaq.com/SRC/personal/lamport/tla/threads/threads.html
I would like to make that for POSIX 1003.4 (original DEC proposal) and post it. Would the Elego folks mind uploading the Lamport specification to the CVS tree? I think these are important design notes on the Win32 Threads API; please let me know if you are interested. Alas, it's TLA code; it may be considered for an m3theory subdirectory of m3kernel.
In fact there is a TLA checker written in connection with the Zeus Algorithm Animation system for automating the animation of proofs, so I guess we just lack that part for further integration.
Thanks in advance

From rodney_bates at lcwb.coop Tue Jul 10 17:57:04 2012
From: rodney_bates at lcwb.coop (Rodney M. Bates)
Date: Tue, 10 Jul 2012 10:57:04 -0500
Subject: [M3devel] A Unicode/WIDECHAR proposal
Message-ID: <4FFC50D0.4000805@lcwb.coop>

Here is a more-or-less comprehensive proposal to get modern support of Unicode and its various encodings into Modula-3 and its libraries, while preserving both backward compatibility and original abstractions.

Summary:

Fix WIDECHAR so it holds all of Unicode. This restores the abstractions we once had, by treating every character as a value of a scalar type, for in-memory processing. The members of a TEXT and elements of ARRAY OF WIDECHAR get this property too.

Do encoding/decoding in streams Wr and Rd, which are inherently sequential anyway. Give every stream an encoding property. Add procedures to get/put characters with encoding/decoding. These changes are backward-compatible.
You can still do low-level stuff if you have good reason, or just want to leave existing code alone. E.g., putting the bytes of UTF-8 into the characters of a TEXT and doing your own encoding/decoding.

CHAR:

Leave CHAR as it is: exactly 256 values, encoded in ISO-Latin-1, ORD(FIRST(CHAR))=0, ORD(LAST(CHAR))=16_FF, BYTESIZE(CHAR)=1. The language allows CHAR to have more values, but changing this would no doubt seriously undermine a good bit of existing code.

WIDECHAR:

Change WIDECHAR to have exactly the Unicode range. ORD(FIRST(WIDECHAR))=0 and ORD(LAST(WIDECHAR))=16_10FFFF. The full ORD and VAL functions from/to WIDECHAR are defined by the code point to character mapping of the Unicode standard. BYTESIZE(WIDECHAR)=4.

Make actual internal representation be Unicode code points also. This happens to match UTF-32, most significantly for arrays of WIDECHAR.

Note that some of the code point values in this range are not Unicode characters. Programmers will need to account for this.

CHAR <: WIDECHAR, which means they are mutually assignable, with runtime check in the one direction. This works because the Unicode code points and the ISO-Latin-1 code points are identical in the entire ISO-Latin-1 range, up to 16_FF. Note that at 16_80 and above, the UTF-8 encoding is more than one byte, so it never coincides with the single-byte ISO-Latin-1 encoding. This is not a problem, because both CHAR and WIDECHAR are actual code points, not one of the bytes of UTF-8.

TEXT:

TEXT continues to be defined as abstractly a sequence of WIDECHAR. An index into a TEXT is an integer count of characters. The internal representation (used only in memory, and maybe in pickles) is hidden and could be just about anything.

Given the extreme memory inefficiency of the current cm3 implementation of TEXT, we no doubt will want to change it, but this decision is independent and at a lower level. The abstract interface Text will hide this.
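
As a rough illustration of the distinction drawn above between code points and encoded bytes, here is a small sketch. It is written in Python only because the relevant codecs are built in there; it is not Modula-3, and the helper name describe is made up for this example. It shows that code points up to 16_FF have the same value in ISO-Latin-1 and Unicode, that UTF-8 needs two bytes from 16_80 upward, and that the range 0..16_10FFFF contains values (surrogates, for instance) that are not encodable characters.

```python
# Illustration only (Python, not Modula-3).  The helper name `describe`
# is hypothetical and exists just for this example.

def describe(cp):
    """Return (latin1_bytes, utf8_bytes) for a code point in 0..0xFF."""
    ch = chr(cp)  # the abstract character, identified by its code point
    return ch.encode("latin-1"), ch.encode("utf-8")

# Below 16_80 both encodings give the same single byte; from 16_80 on,
# ISO-Latin-1 still uses one byte while UTF-8 uses two.
for cp in (0x41, 0x7F, 0x80, 0xE9, 0xFF):
    latin1, utf8 = describe(cp)
    print(f"16_{cp:02X}: latin-1={latin1.hex()} utf-8={utf8.hex()}")

# The proposed WIDECHAR range ends at 16_10FFFF; Python's chr() has the
# same upper bound, so this call succeeds.
last = chr(0x10FFFF)

# Not every value in the range is a character: a surrogate code point
# cannot be encoded as UTF-8.
try:
    chr(0xD800).encode("utf-8")
    surrogate_encodable = True
except UnicodeEncodeError:
    surrogate_encodable = False
```

The point of the sketch is only that a CHAR or WIDECHAR value would hold the code point itself, so the differing byte sequences matter at the stream boundary and nowhere else.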
There is hardly a remaining need for Text.FromChar, because by assignability, Text.FromWideChar can be used in its place, with the same result. But keep FromChar, for compatibility with existing code.

Text.FromChars just means the code points in the created text will happen to be members of type CHAR.

Text.GetChar and Text.GetChars will raise an exception if a to-be-gotten code point in the TEXT lies outside the type CHAR. This is a change from existing behavior, which just truncates the high bits of a WIDECHAR value and returns only the low bits. Even if we didn't add the exception, we would want this to be an assignability runtime error.

Literals:

Inside wide character and wide text literals, add two new escapes, \u, which is followed by exactly 4 hex digits denoting a code point, and \U, which is followed by exactly 6 hex digits. The letters 'u' and 'U' are used in this way in the Unicode standard. \u would be redundant with the existing \x and \X escapes, but those would merely preserve compatibility for existing code. (Or is there so little existing code using them that we could eliminate them for a more consistent system?)

Encodings:

Define an enumeration giving the possible encodings used in streams:

TYPE Encoding
   = {Inherit, ISO_Latin_1, UCS_2LE, UTF_8, UTF_16, UTF_16BE, UTF_16LE,
      UTF_32, UTF_32BE, UTF_32LE};

ISO_Latin_1 means one byte per character, unconditionally. This is the way current Modula-3 always encodes CHAR. An attempt to Put a code point greater than 16_FF in this encoding will raise an exception. (This can happen only using newly added procedures.)

Similarly, UCS_2LE, as I understand the standard, means exactly two bytes per character, LSB first. This is what our current Wr and Rd use for WIDECHAR. Here again, an exception will be raised for a code point greater than 16_FFFF. This, also, can happen only using newly added procedures.
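
The range checks described for the two fixed-width encodings could look roughly like the following. This is a sketch in Python, not a proposal for the actual Modula-3 code; the names EncodingError, encode_latin1 and encode_ucs2le are hypothetical and do not exist in any Modula-3 library.

```python
# Sketch of the proposed range checks for the fixed-width encodings.
# EncodingError, encode_latin1 and encode_ucs2le are hypothetical names.

class EncodingError(Exception):
    """Raised when a code point cannot be represented in an encoding."""


def encode_latin1(cp):
    """One byte per character; code points above 16_FF are rejected."""
    if not 0 <= cp <= 0xFF:
        raise EncodingError(f"16_{cp:X} is not encodable in ISO-Latin-1")
    return bytes([cp])


def encode_ucs2le(cp):
    """Exactly two bytes per character, least significant byte first."""
    if not 0 <= cp <= 0xFFFF:
        raise EncodingError(f"16_{cp:X} is not encodable in UCS-2LE")
    return cp.to_bytes(2, "little")


def try_encode(encoder, cp):
    """Return the encoded bytes, or None if the encoder rejects cp."""
    try:
        return encoder(cp)
    except EncodingError:
        return None
```

For instance, try_encode(encode_latin1, 0x161) gives None because 16_161 does not fit in one byte, while try_encode(encode_ucs2le, 0x161) gives the two bytes 16_61 and 16_01 in that order.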
Inherit means get the encoding to be used from somewhere else, for example, from the file system, in case it is able to store this property of a file.

Every Wr.T and every Rd.T has an Encoding property that can be specified when creating the stream (from one of its subtypes). The ways of doing this can vary with the subtype. This defaults to Inherit, which means, if possible, take it from the file system, etc. Otherwise, there are defaults for the various streams.

New operations that Put/Get Unicode characters have a parameter of type Encoding, with a default value of Inherit, which means get the encoding property from the stream. Accepting this default would be the usual way to use these procedures.

Specifying the encoding differently in the Put/Get procedure allows mixed encodings in a single stream. It seems dubious to encourage this, but existing Wr and Rd already provide plenty of opportunities to do similar stuff anyway, so this just extends existing semantics to the new procedures. It also allows some existing Put/Get procedures to be defined as equivalents to new ones.

Wr:

New procedure
  PutUniWideChar(Wr: T; ch: WIDECHAR; Enc:=Encoding.Inherit)
encodes the character using Enc and appends that to the stream.

There is hardly a need for a CHAR counterpart. Since CHAR is assignable to WIDECHAR, PutUniWideChar suffices for an actual parameter of either type. Whether the caller provides a CHAR or a WIDECHAR (or whether we were alternatively to have different procedures) does _not_ affect the encoding, only the value range that can be passed in.

Similar new procedures PutUniString, PutUniWideString, and PutUniText are counterparts to PutString, PutWideString, and PutText, respectively.

Existing PutChar and PutString, which write CHARs as one byte, each become equivalent to PutUniWideChar and PutUniString, with Enc:=Encoding.ISO_Latin_1.
Similarly, existing PutWideChar and PutWideString, which write WIDECHARs as two bytes each, become equivalent to PutUniWideChar and PutUniWideString, with Enc:=Encoding.UCS_2LE.

The existing Wr interface is peculiar, IMO, in that even though there is currently no distinction between a text and a wide text, we have PutText and PutWideText. These have identical signatures, both taking a TEXT (which can contain characters in the full WIDECHAR range). The difference is that PutText rather violently truncates every character in the text to 8 bits and writes that, implicitly in ISO-Latin-1 encoding. This is not equivalent to PutUniText with Enc:=Encoding.ISO_Latin_1, because the latter will raise an exception for unencodable code points.

Rd:

New procedure
  GetUniWideChar (rd:T; Enc:=Encoding.Inherit) :WIDECHAR
decodes, using Enc, and consumes, enough bytes from rd for one Unicode code point and returns it.

There is not a lot of need for a CHAR-returning counterpart of GetUniWideChar. A caller can just assign the result from GetUniWideChar to a CHAR variable and deal with the possible range error at the call site.

GetUniSub, GetUniWideSub, GetUniSubLine, GetUniWideSubLine, GetUniText, and GetUniTextLine are counterparts to GetSub, GetWideSub, GetSubLine, GetWideSubLine, GetWideText, and GetWideLine. They differ in decoding according to the Enc parameter.

In the new GetUni* procedures, any case where a partial character is terminated by end-of-file will raise an exception. This differs from the current GetWide* procedures, which all implicitly use UCS_2LE and just insert a zero byte as the MSB in this case.

Existing GetChar, GetSub, GetSubLine, GetText, and GetLine all implicitly use the ISO-Latin-1 encoding. GetWideChar, GetWideSub, GetWideSubLine, GetWideText, and GetWideLine all implicitly use UCS_2LE. They differ from new GetUni* procedures using UCS_2LE in that the latter raise an exception on an incomplete character.
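
To make the difference at end-of-file concrete, the following sketch contrasts the described existing GetWideChar behaviour with the proposed GetUniWideChar behaviour. It is Python over an in-memory byte stream, not Modula-3; the function names, the PartialCharacter exception, and the use of strings to name encodings are all assumptions made for the illustration, and only three of the proposed encodings are handled.

```python
# Sketch only: Python stand-ins for the behaviour described above.
# get_wide_char_legacy, get_uni_wide_char and PartialCharacter are
# hypothetical names; encodings are named by plain strings here.
import io


class PartialCharacter(Exception):
    """Raised when end-of-file cuts an encoded character short."""


def get_wide_char_legacy(rd):
    """Described existing behaviour: UCS-2LE, missing MSB read as zero."""
    data = rd.read(2)
    if not data:
        raise EOFError("end of stream")
    if len(data) == 1:
        data += b"\x00"  # silently supply a zero most significant byte
    return int.from_bytes(data, "little")


def _utf8_length(lead):
    """Number of bytes in a UTF-8 sequence, judged from its lead byte."""
    if lead < 0x80:
        return 1
    if 0xC2 <= lead <= 0xDF:
        return 2
    if 0xE0 <= lead <= 0xEF:
        return 3
    if 0xF0 <= lead <= 0xF4:
        return 4
    raise ValueError(f"invalid UTF-8 lead byte 16_{lead:02X}")


def get_uni_wide_char(rd, enc):
    """Proposed behaviour: consume exactly one encoded character, and
    raise PartialCharacter if end-of-file arrives in the middle of it."""
    fixed_width = {"ISO_Latin_1": 1, "UCS_2LE": 2}
    if enc not in fixed_width and enc != "UTF_8":
        raise NotImplementedError(enc)
    first = rd.read(1)
    if not first:
        raise EOFError("end of stream")
    width = fixed_width.get(enc) or _utf8_length(first[0])
    data = first + rd.read(width - 1)
    if len(data) < width:
        raise PartialCharacter(f"{len(data)} of {width} bytes before EOF")
    if enc == "UTF_8":
        return ord(data.decode("utf-8"))
    return int.from_bytes(data, "little")
```

With the three bytes 16_61 16_01 16_41 as input, the legacy reader returns 16_161 and then 16_41, while the sketch of the proposed reader in UCS_2LE returns 16_161 and then raises on the dangling byte.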
GetUniSub and GetUniSubLine return decoded characters in ARRAY OF CHAR and raise an exception if a decoded code point is not in CHAR. This might seem a bit ridiculous, but they could be useful for quick, partial adaptation of existing code to accept newer encodings and detect, without otherwise handling, higher code points.

Actually, GetWideText is documented as being identical to GetText, in behavior, as well as signature. I think this must be an editing error.

I wonder if we need to review the rules for what constitutes a line break.

A new UnGetUni would work like UnGetChar, but would reencode the pushed-back character (retained internally as a WIDECHAR), according to its Enc parameter. The next Get* would then redecode according to its Enc parameter or implicit encoding, which could be different and consume a different number of bytes. If this seems bizarre, note that it continues established semantics. Existing UnGetChar will push back a character, implicitly in ISO-Latin-1, and it is possible to call GetWideChar next, which will use the pushed-back byte plus the byte following, decode in UCS-2LE, and return the result. UnGetUni will be more complicated to implement, but it can be done.

It seems odd that there is no UnGetWideChar. UnGetUni with Enc:=Encoding.UCS_2LE should accomplish this.

A UniCharsReady might be nice, but it would be O(n), for UTF-8 and UTF-16.

Of course, these changes will require corresponding changes in several other stream-related interfaces, particularly in providing ways to specify (and interrogate?) an encoding property of a stream.

Compiler source file encoding:

Existing rules for interpretation (de facto, from the cm3 implementation) of wide character and wide string literals depend on the encoding of the input file. At present, the compiler always assumes this is ISO-Latin-1. If it actually is a UTF-8 file, as is often the case today, this will result in incorrect conversion of literals.
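
The incorrect conversion can be reproduced outside the compiler. The sketch below (Python, used only as a convenient way to decode bytes under two different assumptions; the variable names are made up) takes the 13-character name from the earlier Text.Length report, encodes it as a UTF-8 editor would, and decodes it the way an ISO-Latin-1 reader would. The result has 15 characters, which matches the length reported earlier in the thread.

```python
# Illustration only (Python, not cm3): a UTF-8 source literal decoded
# under the assumption that the file is ISO-Latin-1.

name = "Dragi\u0161a Duri\u0107"          # the intended 13 characters

source_bytes = name.encode("utf-8")       # what a UTF-8 editor writes
misread = source_bytes.decode("latin-1")  # what an ISO-Latin-1 reader sees

intended_length = len(name)               # characters the author typed
byte_length = len(source_bytes)           # bytes in the source file
misread_length = len(misread)             # characters after misreading

# Each two-byte UTF-8 sequence has turned into two separate characters,
# so indexing into the misread text no longer finds the intended letter.
intended_sixth = name[5]
misread_sixth = misread[5]

# Writing the misread characters back out as ISO-Latin-1 reproduces the
# original bytes, which is why unmodified literals can appear correct.
round_trip = misread.encode("latin-1").decode("utf-8")
```

Only the two non-ASCII letters are affected; each contributes one extra character to the misread text.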
If, in our current implementation, the value of such a literal is then written out by a Modula-3 program, unchanged, the program will write ISO-Latin-1. If some other program (e.g., an editor or terminal emulator) interprets this output file as UTF-8, the reverse incorrect reinterpretation will result in the original string. But if the program manipulates the characters using the language-defined abstraction, the result will in general be incorrect. The same scenario applies when a single program reads in ISO-Latin-1, a file that was produced in UTF-8, writes in ISO-Latin-1, with the output file then being fed to some other program that interprets it as UTF-8. From dabenavidesd at yahoo.es Wed Jul 11 00:30:15 2012 From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.) Date: Tue, 10 Jul 2012 23:30:15 +0100 (BST) Subject: [M3devel] A Unicode/WIDECHAR proposal In-Reply-To: <4FFC50D0.4000805@lcwb.coop> Message-ID: <1341959415.94700.YahooMailClassic@web29701.mail.ird.yahoo.com> Hi all: Widechar is char simulation of a word sized char which is not intended by Rd/Wr implementation, read and write of literals is assuming that you won't get any real speed improvement over the DEC-SRC source to source transliteration of a given literal. This is to say, what you want is the same it is CM3 TEXT type with better functionality, is better to make polymorphic functions. e.g use FromChar receives both kind of chars without losing DEC-SRC representation characteristic and returning what you want in polymorphic (for instance your file text editor assumes you don't have real wide strings just yet one raw stream, then you can feed the text file in memory efficiently with a digital encoder optimized for your architecture and grab it there wherever you want, conversely opening an unused file you have to convert it at execution time, etc) way. Thanks in advance --- El mar, 10/7/12, Rodney M. Bates escribi?: De: Rodney M. 
Bates Asunto: [M3devel] A Unicode/WIDECHAR proposal Para: "m3devel" Fecha: martes, 10 de julio, 2012 10:57 Here is a more-or-less comprehensive proposal to get modern support of Unicode and its various encodings into Modula-3 and its libraries, while preserving both backward compatibility and original abstractions. Summary: Fix WIDECHAR so it holds all of Unicode.? This restores the abstractions we once had, by treating every character as a value of a scalar type, for in-memory processing.? The members of a TEXT and elements of ARRAY OF WIDECHAR get this property too. Do encoding/decoding in streams Wr and Rd, which are inherently sequential anyway.? Give every stream an encoding property.? Add procedures to get/put characters with encoding/decoding.? These changes are backward-compatable. You can still do low-level stuff if you have good reason, or just want to leave existing code alone.? E.g., putting the bytes of UTF-8 into the characters of a TEXT and doing your own encoding/decoding. CHAR: Leave CHAR as it is: exactly 256 values, encoded in ISO-Latin-1, ORD(FIRST(CHAR))=0, ORD(LAST(CHAR))=16_FF, BYTESIZE(CHAR)=1.? The language allows CHAR to have more values, but changing this would no doubt seriously undermine a good bit of existing code. WIDECHAR: Change WIDECHAR to have exactly the Unicode range. ORD(FIRST(WIDECHAR))=0 and ORD(LAST(WIDECHAR))=16_10FFFF.? The full ORD and VAL functions from/to WIDECHAR are defined by the code point to character mapping of the Unicode standard.? BYTESIZE(WIDECHAR)=4. Make actual internal representation be Unicode code points also.? This happens to match UTF-32, most significantly for arrays of WIDECHAR. Note that some of the codepoint values in this range are not unicode characters.? Programmers will need to account for this. CHAR <: WIDECHAR, which means they are mutually assignable, with runtime check in the one direction.? 
This works because the Unicode code points and the ISO-Latin-1 code points are identical in the entire ISO-Latin-1 range, up to 16_FF.? Note that at 16_80 and above, the UTF-8 encoding is more than one byte, none of them equal to the encoded code point.? This is not a problem, because both CHAR and WIDECHAR are actual code points, not one of the bytes UTF-8. TEXT: TEXT continues to be defined as abstractly a sequence of WIDECHAR.? An index into a TEXT is an integer count of characters.? The internal representation (used only in memory, and maybe in pickles) is hidden and could be just about anything. Given the extreme memory inefficiency of the current cm3 implementation of TEXT, we no doubt will want to change it, but this decision is independent and at a lower level.? The abstract interface Text will hide this. There is hardly a remaining need for Text.FromChar, because by assignability, Text.FromWideChar can be used in its place, with the same result.? But keep FromChar, for compatability with existing code. Text.FromChars just means the code points in the created text will happen to be members of type CHAR. Text.GetChar and Text.GetChars will raise an exception if a to-be-gotten code point in the TEXT lies outside the type CHAR.? This is a change from existing behavior, which just truncates the high bits of a WIDECHAR value and returns only the low bits.? Even if we didn't add the exception, we would want this to be an assignability runtime error. Literals: Inside wide character and wide text literals, add two new escapes, \u, which is followed by exactly 4 hex digits denoting a code point, and \U, which is followed by exactly 6 hex digits.? The letters 'u' and 'U' are used in this way in the Unicode standard.? \u would be redundant with the existing \x and \X escapes, but those would merely preserve compatability for existing code.? (Or is there so little existing code using them that we could eliminate them for a more consistent system?) 
Encodings: Define an enumeration giving the possible encodings used in streams: TYPE Encoding ???= {Inherit, ISO_Latin_1, UCS_2LE, UTF_8, UTF_16, UTF_16BE, UTF_16LE, ? ? ? UTF_32, UTF_32BE, UTF_32LE}; ISO_Latin_1 means one byte per character, unconditionally.? This is the way current Modula-3 always encodes CHAR.? An attempt to Put a code point greater than 16_FF in this encoding will raise an exception. (This can happen only using newly added procedures.) Similarly, UCS_2LE, as I understand the standard, means exactly two bytes per character, LSB first.? This is what our current Wr and Rd use for WIDECHAR.? Here again, an exception will be raised for a code point greater than 16_FFFF.? This, also, can happen only using newly added procedures. Inherit means get the encoding to be used from somewhere else, for example, from the file system, in case it is able to store this property of a file. Every Wr.T and every Rd.T has an Encoding property that can be specified when creating the stream, (from one of its subtypes).? The ways of doing this can vary with the subtype.? This defaults to Inherit, which means, if possible, take it from the file system, etc. Otherwise, there are defaults for the various streams. New operations that Put/Get Unicode characters have a parameter of type Encoding, with a default value of Inherit, which means get the encoding property from the stream.? Accepting this default would be the usual way to use these procedures. Specifying the encoding differently in the Put/Get procedure allows mixed encodings in a single stream.? It seems dubious to encourage this, but existing Wr and Rd already provide plenty of opportunities to do similar stuff anyway, so this just extends existing semantics to the new procedures.? It also allows some existing Put/Get procedures to be defined as equivalents to new ones. Wr: New procedure ? PutUniWideChar(Wr: T; ch: WIDECHAR; Enc:=Encoding.Inherit) encodes the character using Enc and appends that to the stream.? 
There is hardly a need for a CHAR counterpart.? Since CHAR is assignable to WIDECHAR, PutUniWideChar suffices for an actual parameter of either type.? Whether the caller provides a CHAR or a WIDECHAR (or whether we were alternatively to have different procedures) does _not_ affect the encoding, only the value range that can be passed in. Similar new procedures PutUniString, PutUniWideString, and PutUniText are counterparts to PutString, PutWideString, and PutText, respectively. Existing PutChar and PutString, which write CHARs as one byte, each become equivalent to PutUniWideChar and PutUniString, with Enc:=Encoding.ISO_Latin_1.? Similarly, Existing PutWideChar and PutWideString, which write WIDECHARs as two bytes each, becomes equivalent to PutUniWideChar and PutUniWideString, with Enc:=Encoding.UCS_2LE. The existing Wr interface is peculiar, IMO, in that even though there is currently no distinction between a text and a wide text, we have PutText and PutWideText.? These have identical signatures, both taking a TEXT (which can contain characters in the full WIDECHAR range).? The difference is that PutText rather violently truncates every character in the text to 8 bits and writes that, implicitly in ISO-Latin-1 encoding.? This is not equivalent to PutUniText with Enc:=Encoding.ISO_Latin_1, because the latter will raise an exception for unencodable code points. Rd: New procedure ? GetUniWideChar (rd:T; Enc:=Encoding.Inherit) :WIDECHAR decodes, using Enc, and consumes, enough bytes from rd for one Unicode code point and returns it.? There is not a lot of need for a CHAR-returning counterpart of GetUniWideChar.? A caller can just assign the result from GetUniWideChar to a CHAR variable and deal with the possible range error at the call site. GetUniSub, GetUniWideSub, GetUniSubLine, GetUniWideSubLine, GetUniText, and GetUniTextLine are counterparts to GetSub, GetWideSub, GetSubLine GetWideSubLine, GetWideText, and GetWideLine.? 
They differ in decoding according to the Enc parameter.

In the new GetUni* procedures, any case where a partial character is terminated by end-of-file will raise an exception. This differs from the current GetWide* procedures, which all implicitly use UCS_2LE and just insert a zero byte as the MSB in this case.

Existing GetChar, GetSub, GetSubLine, GetText, and GetLine all implicitly use the ISO-Latin-1 encoding. GetWideChar, GetWideSub, GetWideSubLine, GetWideText, and GetWideLine all implicitly use UCS_2LE. They differ from new GetUni* procedures using UCS_2LE in that the latter raise an exception on an incomplete character.

GetUniSub and GetUniSubLine return decoded characters in ARRAY OF CHAR and raise an exception if a decoded code point is not in CHAR. This might seem a bit ridiculous, but they could be useful for quick, partial adaptation of existing code to accept newer encodings and detect, without otherwise handling, higher code points.

Actually, GetWideText is documented as being identical to GetText, in behavior as well as signature. I think this must be an editing error.

I wonder if we need to review the rules for what constitutes a line break.

A new UnGetUni would work like UnGetChar, but would reencode the pushed-back character (retained internally as a WIDECHAR) according to its Enc parameter. The next Get* would then redecode according to its Enc parameter or implicit encoding, which could be different and consume a different number of bytes. If this seems bizarre, note that it continues established semantics. Existing UnGetChar will push back a character, implicitly in ISO-Latin-1, and it is possible to call GetWideChar next, which will use the pushed-back byte plus the byte following, decode in UCS-2LE, and return the result. UnGetUni will be more complicated to implement, but it can be done.

It seems odd that there is no UnGetWideChar. UnGetUni with Enc:=Encoding.UCS_2LE should accomplish this.
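
As a rough illustration of why the same character can occupy a different number of bytes depending on Enc (and so why a pushed-back character has to be re-encoded), here is a minimal C sketch of the per-encoding rules described in this proposal. It is only a sketch under stated assumptions: the names encoding_t and encode_code_point are invented for this example and are not part of Wr, Rd, or any cm3 interface; a return value of -1 stands in for the exception the proposal would raise; and rejecting surrogate values in every encoding is an assumption of the sketch, not something the proposal states.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical C mirror of three members of the proposed Encoding type. */
typedef enum { ENC_ISO_LATIN_1, ENC_UCS_2LE, ENC_UTF_8 } encoding_t;

/* Encode one code point into buf, which must hold at least 4 bytes.
   Returns the number of bytes written, or -1 where the proposal would
   raise an exception because the code point is not encodable. */
int encode_code_point(encoding_t enc, uint32_t cp, unsigned char *buf)
{
    /* Assumption of this sketch: values above 16_10FFFF and surrogates
       are treated as unencodable in every encoding. */
    if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return -1;

    switch (enc) {
    case ENC_ISO_LATIN_1:            /* exactly one byte per character */
        if (cp > 0xFF)
            return -1;
        buf[0] = (unsigned char)cp;
        return 1;

    case ENC_UCS_2LE:                /* exactly two bytes, LSB first */
        if (cp > 0xFFFF)
            return -1;
        buf[0] = (unsigned char)(cp & 0xFF);
        buf[1] = (unsigned char)(cp >> 8);
        return 2;

    case ENC_UTF_8:                  /* one to four bytes */
        if (cp < 0x80) {
            buf[0] = (unsigned char)cp;
            return 1;
        }
        if (cp < 0x800) {
            buf[0] = (unsigned char)(0xC0 | (cp >> 6));
            buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
            return 2;
        }
        if (cp < 0x10000) {
            buf[0] = (unsigned char)(0xE0 | (cp >> 12));
            buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
            return 3;
        }
        buf[0] = (unsigned char)(0xF0 | (cp >> 18));
        buf[1] = (unsigned char)(0x80 | ((cp >> 12) & 0x3F));
        buf[2] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
        buf[3] = (unsigned char)(0x80 | (cp & 0x3F));
        return 4;
    }
    return -1;
}
```

With this sketch, code point 16_161 is rejected for ISO_Latin_1 but takes two bytes in both UCS_2LE and UTF_8, while 16_1F600 is rejected for UCS_2LE and takes four bytes in UTF_8. A Get* that decodes with a different encoding than the preceding UnGetUni used would therefore consume a different number of bytes.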
A UniCharsReady might be nice, but it would be O(n) for UTF-8 and UTF-16.

Of course, these changes will require corresponding changes in several other stream-related interfaces, particularly in providing ways to specify (and interrogate?) an encoding property of a stream.

Compiler source file encoding:

Existing rules for interpretation (de facto, from the cm3 implementation) of wide character and wide string literals depend on the encoding of the input file. At present, the compiler always assumes this is ISO-Latin-1. If it actually is a UTF-8 file, as is often the case today, this will result in incorrect conversion of literals.

If, in our current implementation, the value of such a literal is then written out by a Modula-3 program, unchanged, the program will write ISO-Latin-1. If some other program (e.g., an editor or terminal emulator) interprets this output file as UTF-8, the reverse incorrect reinterpretation will result in the original string. But if the program manipulates the characters using the language-defined abstraction, the result will in general be incorrect.

The same scenario applies when a single program reads, in ISO-Latin-1, a file that was produced in UTF-8, then writes in ISO-Latin-1, with the output file then being fed to some other program that interprets it as UTF-8.

From pgoltzsch at gmail.com Thu Jul 12 11:39:58 2012
From: pgoltzsch at gmail.com (Patrick Goltzsch)
Date: Thu, 12 Jul 2012 11:39:58 +0200
Subject: [M3devel] unix - unknown qualification
Message-ID: <20120712113958.33d94bc4@leda>

Hi!

I am having trouble compiling some older sources. I had the impression that it would be sufficient to "IMPORT Unix;" in ClsShare.m3 but obviously it's not:

--- building in ../AMD64_LINUX ---

new source -> compiling ClsShare.m3
"/work/mylib/src/ClsShare.m3", line 73: unknown qualification '.' (struct_flock)
"/work/mylib/src/ClsShare.m3", line 74: unknown qualification '.'
(F_WRLCK)
"/work/mylib/src/ClsShare.m3", line 75: unknown qualification '.' (L_SET)
"/work/mylib/src/ClsShare.m3", line 82: unknown qualification '.' (F_SETLK)
"/work/mylib/src/ClsShare.m3", line 98: unknown qualification '.' (struct_flock)
"/work/mylib/src/ClsShare.m3", line 99: unknown qualification '.' (F_UNLCK)
"/work/mylib/src/ClsShare.m3", line 100: unknown qualification '.' (L_SET)
"/work/mylib/src/ClsShare.m3", line 107: unknown qualification '.' (F_SETLK)
8 errors encountered

What am I doing wrong? Compiler is:

Critical Mass Modula-3 version 5.8.6
  last updated: 2010-04-11
  compiled: 2010-07-12 20:10:34
  configuration: /usr/local/cm3/bin/cm3.cfg
  host: AMD64_LINUX
  target: AMD64_LINUX

Thanks a lot,

Patrick

From rodney_bates at lcwb.coop Thu Jul 12 14:18:01 2012
From: rodney_bates at lcwb.coop (Rodney M. Bates)
Date: Thu, 12 Jul 2012 07:18:01 -0500
Subject: [M3devel] unix - unknown qualification
In-Reply-To: <20120712113958.33d94bc4@leda>
References: <20120712113958.33d94bc4@leda>
Message-ID: <4FFEC079.7040104@lcwb.coop>

I think we need to see some source code for ClsShare.m3, particularly to see what is before the dot on these lines. I don't see any of the failing qualifications in Unix.i3 in my cm3 directory.

From rodney_bates at lcwb.coop Thu Jul 12 14:27:38 2012
From: rodney_bates at lcwb.coop (Rodney M. Bates)
Date: Thu, 12 Jul 2012 07:27:38 -0500
Subject: [M3devel] unix - unknown qualification
In-Reply-To: <20120712113958.33d94bc4@leda>
References: <20120712113958.33d94bc4@leda>
Message-ID: <4FFEC2BA.4080406@lcwb.coop>

I poked around in a version of PM3. There, there are multiple, OS-dependent versions of Unix.i3. Most or all of them do have the failing qualifications declared in them. So somewhere along the line, Unix.i3 has changed and lost these declarations, leaving ClsShare in the lurch. I don't know when or why this happened. Jay?

From pgoltzsch at gmail.com Thu Jul 12 14:58:11 2012
From: pgoltzsch at gmail.com (Patrick Goltzsch)
Date: Thu, 12 Jul 2012 14:58:11 +0200
Subject: [M3devel] unix - unknown qualification
In-Reply-To: <4FFEC079.7040104@lcwb.coop>
References: <20120712113958.33d94bc4@leda> <4FFEC079.7040104@lcwb.coop>
Message-ID: <20120712145811.2a4901d3@leda>

>>>>> Rodney M. Bates wrote:

> I think we need to see some source code for ClsShare.m3.
> particularly to see what is before the dot on these lines. I
> don't see any of the failing qualifications in Unix.i3 in my
> cm3 directory.

The first errors are caused by the following procedure, which seems to be copied from old DEC example code, as I found out while looking for a solution:

PROCEDURE FilePartLock( h : INTEGER; start, len : INTEGER ) : BOOLEAN RAISES {OSError.E} =
  VAR flock := Unix.struct_flock {
    l_type := Unix.F_WRLCK,
    l_whence := Unix.L_SET,
    l_start := 0,
    l_len := 0, (* i.e., whole file *)
    l_pid := 0 }; (* don't care *)
  BEGIN
    flock.l_start := start;
    flock.l_len := len;
    IF Unix.fcntl( h, Unix.F_SETLK, LOOPHOLE( ADR( flock ), Ctypes.long ) ) < 0
    THEN
      IF Uerror.errno = Uerror.EACCES OR
         Uerror.errno = Uerror.EAGAIN THEN
        RETURN FALSE
      END;
      OSErrorPosix.Raise()
    END;
    RETURN TRUE
  END FilePartLock;

Regards,

Patrick

From dabenavidesd at yahoo.es Thu Jul 12 15:43:52 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Thu, 12 Jul 2012 14:43:52 +0100 (BST)
Subject: [M3devel] unix - unknown qualification
In-Reply-To: <4FFEC2BA.4080406@lcwb.coop>
Message-ID: <1342100632.27773.YahooMailClassic@web29705.mail.ird.yahoo.com>

Hi all:
all GNU non-POSIX file constants and structs were pushed down to the unix/linux-common files, but accommodating all the non-POSIX standards is uncomfortable or impossible. So you must use the kernel call directly to control the locking policy in C code and pass control to M3 (youControlFile.c).
In a sane environment it is better to reconstruct most of the Unix calls through a microkernel, but I guess the world doesn't do that. Or maybe you can find a Unix API that is uniform and modular enough to do that, like PosixFileC.c in libm3/src/os/POSIX. For sure there is more than one out there, but whoever makes that kind of thing doesn't use Unixes like Cygwin. Some UnixControlFile.c that already does that would be wonderful.
Thanks in advance

From dabenavidesd at yahoo.es Thu Jul 12 18:52:38 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Thu, 12 Jul 2012 17:52:38 +0100 (BST)
Subject: [M3devel] Why everything is an object
Message-ID: <1342111958.55562.YahooMailClassic@web29705.mail.ird.yahoo.com>

Hi all:
If you read this, it might give all users some idea about why everything here really is an object:
http://wcook.blogspot.com/

Curiously, it doesn't much explain why functional isn't subsumed by OO, but every object in Baby Modula-3 is functional.
Thanks in advance

From jay.krell at cornell.edu Fri Jul 13 00:12:49 2012
From: jay.krell at cornell.edu (Jay K)
Date: Thu, 12 Jul 2012 22:12:49 +0000
Subject: [M3devel] unix - unknown qualification
In-Reply-To: <20120712145811.2a4901d3@leda>
References: <20120712113958.33d94bc4@leda>, <4FFEC079.7040104@lcwb.coop>, <20120712145811.2a4901d3@leda>
Message-ID:

Unix.i3 has always been a maintenance and portability problem. As such, it has been dramatically reduced. This stuff was probably removed, esp.
struct_flock. The constants can be exposed easily enough, portably, but aren't useful without the struct, probably.

You REALLY REALLY REALLY want to write this in C. Writing it in Modula-3 has many downsides. You lose safety. You lose static checking. You lose portability. You gain infinitely small efficiency. Something like:

jbook2:libm3 jay$ pwd
/dev2/cm3/m3-libs/libm3
jbook2:libm3 jay$ find . | xargs grep flock
./src/os/POSIX/FilePosixC.c:saves us from having to declare struct flock, which is gnarled up in #ifdefs.
./src/os/POSIX/FilePosixC.c:    struct flock lock;
./src/os/POSIX/FilePosixC.c:    struct flock lock;
./tests/os/src/locktest.c:  struct flock param;

./src/os/POSIX/FilePosixC.c:

/* Copyright (C) 1993, Digital Equipment Corporation */
/* All rights reserved. */
/* See the file COPYRIGHT for a full description. */

/*
Writing part of libm3/os/POSIX/FilePosix.m3/RegularFileLock, RegularFileUnlock in C
saves us from having to declare struct flock, which is gnarled up in #ifdefs.

see http://www.opengroup.org/onlinepubs/009695399/functions/fcntl.html
*/

#include "m3core.h"
#include

#ifdef __cplusplus
extern "C" {
#endif

#define FALSE 0
#define TRUE 1

INTEGER FilePosixC__RegularFileLock(int fd)
{
    struct flock lock;
    int err;

    ZeroMemory(&lock, sizeof(lock));
    lock.l_type = F_WRLCK;
    lock.l_whence = SEEK_SET;

    if (fcntl(fd, F_SETLK, &lock) < 0)
    {
        err = errno;
        if (err == EACCES || err == EAGAIN)
            return FALSE;
        return -1;
    }
    return TRUE;
}

INTEGER FilePosixC__RegularFileUnlock(int fd)
{
    struct flock lock;

    ZeroMemory(&lock, sizeof(lock));
    lock.l_type = F_UNLCK;
    lock.l_whence = SEEK_SET;

    return fcntl(fd, F_SETLK, &lock);
}

#ifdef __cplusplus
} /* extern "C" */
#endif

We can add this to libm3 probably.

 - Jay

From jay.krell at cornell.edu Fri Jul 13 11:33:16 2012
From: jay.krell at cornell.edu (Jay K)
Date: Fri, 13 Jul 2012 09:33:16 +0000
Subject: [M3devel] unix - unknown qualification
In-Reply-To:
References: <20120712113958.33d94bc4@leda>, <4FFEC079.7040104@lcwb.coop>, , <20120712145811.2a4901d3@leda>,
Message-ID:

Hey, how about I just provide copying wrappers here, like we do for stat? Where we define an idealized portable struct flock and copy it back and forth in C to the real struct flock?

It is a little strange -- the wrapper is fcntl. It must check the first parameter, and know/assume its meaning.

 - Jay

From rodney_bates at lcwb.coop Fri Jul 13 14:54:37 2012
From: rodney_bates at lcwb.coop (Rodney M. Bates)
Date: Fri, 13 Jul 2012 07:54:37 -0500
Subject: [M3devel] unix - unknown qualification
In-Reply-To:
References: <20120712113958.33d94bc4@leda>, <4FFEC079.7040104@lcwb.coop>, , <20120712145811.2a4901d3@leda>,
Message-ID: <50001A8D.80805@lcwb.coop>

Sounds like a good idea to me. It moves the M3/C boundary back just enough to pick up all the #ifdef stuff, etc. but not the application-specific code.

On 07/13/2012 04:33 AM, Jay K wrote:
> Hey, how about I just provide copying wrappers here, like we do for stat?
> Where we define an idealized portable struct flock and copy it back and forth in C to the real struct flock?
>
> It is a little strange -- the wrapper is fcntl.
> It must check the first parameter, and know/assume its meaning.
>
>  - Jay

From dabenavidesd at yahoo.es Fri Jul 13 16:44:55 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Fri, 13 Jul 2012 15:44:55 +0100 (BST)
Subject: [M3devel] unix - unknown qualification
In-Reply-To: <50001A8D.80805@lcwb.coop>
Message-ID: <1342190695.15538.YahooMailClassic@web29701.mail.ird.yahoo.com>

Hi all:
indeed, but I'm afraid that programming at the level of the C API specification doesn't make the bulk sense of the language; the core is about machine programming, which so many believe is better in C. But UNSAFE, in my way of thinking, is just better than C because you still have some checks. They are not bulletproof, but with appropriate module isolation you can make sure the problem doesn't propagate, by using Modula-3 modules in RTMachinery stopped appropriately. Where the machine allows safe, manageable execution, you can recover from that (a trapped error, like arithmetic overflow, e.g. to dump it to disk) or update your data and finish with an expectancy of following rules to stop execution. This is my point, Jay.
Now the quality of current machines is getting worse than before, so who cares if we use DEC stuff.
I wanted to say that the language designers here tried hard to make the language itself easier to optimize, and with that objective in mind it makes sense to believe that the application itself must be compiled with Modula-3. So to some degree I'm being hypocritical about GCC use, but sometimes using GCC gives more time to develop the rest of the system.
Thanks in advance

From jay.krell at cornell.edu Sat Jul 14 10:27:23 2012
From: jay.krell at cornell.edu (Jay K)
Date: Sat, 14 Jul 2012 08:27:23 +0000
Subject: [M3devel] fcntl last parameter int vs. pointer
Message-ID:

Thoughts on

Unix__fcntl(int fd, int request, int arg)
{
    return fcntl(fd, request, arg);
}

vs.

Unix__fcntl(int fd, int request, INTEGER arg)
{
    return fcntl(fd, request, arg);
}

where int is 32bits and INTEGER is exactly the same size as a pointer.

Will it "just work" if I change it?

arg is sometimes a pointer, sometimes an integer, maybe sometimes other?
Ok, let's assume 32bit integer and 32bit or 64bit pointer are the only possibilities.

Are there calling conventions that care? And will pass the parameter differently/wrong?
Do any calling conventions pack multiple smaller-than-64bit parameters into one 64bit register?

I'm *guessing* no.

I guess, as well, I can experiment with a few...

 - Jay

From dabenavidesd at yahoo.es Sat Jul 14 17:31:36 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Sat, 14 Jul 2012 16:31:36 +0100 (BST)
Subject: [M3devel] fcntl last parameter int vs.
pointer In-Reply-To: Message-ID: <1342279896.71297.YahooMailClassic@web29701.mail.ird.yahoo.com> Hi all: In fact both C and Modula-3 don't allow a signature change, original C compiler and decompiler type check the signature, although type casting is possible in both languages: http://web.cs.mun.ca/~ulf/pld/mocplus.html However, when talking about a functional language you can override as in Baby Modula-3 the type at instantiation time for methods and values, so I guess you can sort of relax the strict rules in that relation of the two object function types (to make it a subtype): http://www.hpl.hp.com/techreports/Compaq-DEC/SRC-RR-95.pdf see p. 10 - S. 3.2.4 - Discussion I just know they could make it work, but it was very hard complex system. Thanks in advance --- El s?b, 14/7/12, Jay K escribi?: De: Jay K Asunto: [M3devel] fcntl last parameter int vs. pointer Para: "m3devel" Fecha: s?bado, 14 de julio, 2012 03:27 Thoughts on Unix__fcntl(int fd, int request, int arg) { return fcntl(fd, request, arg); } vs. Unix__fcntl(int fd, int request, INTEGER arg) { return fcntl(fd, request, arg); } where int is 32bits and INTEGER is exactly the same size as a pointer. Will it "just work" if I change it? arg is sometimes a pointer, sometimes an integer, maybe sometimes other? Ok, let's assume 32bit integer and 32bit or 64bit pointer are the only possibilities. Are there calling conventions that care? And will pass the parameter differently/wrong? Do any calling conventions pack multiple smaller-than-64bit parameters into one 64bit register? I'm *guessing* no. I guess, as well, I can experiment with a few... - Jay ??? ???????? ?????? ??? ? -------------- next part -------------- An HTML attachment was scrubbed... URL: From rodney_bates at lcwb.coop Sat Jul 14 22:05:57 2012 From: rodney_bates at lcwb.coop (Rodney M. 
 Bates)
Date: Sat, 14 Jul 2012 15:05:57 -0500
Subject: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
In-Reply-To:
References: <989FB282-786A-447D-8CA9-2432AE922DBF@m3w.org> <20120626181955.GB29355@topoi.pooq.com> <20120627015457.238041A205B@async.async.caltech.edu>
Message-ID: <5001D125.6020704@lcwb.coop>

On 06/27/2012 02:58 AM, Dirk Muysers wrote:
> Some time ago I have started to develop a unicode library based
> on the old M3 text model but using UTF-8 internally rather than
> Latin-1 (see README attachment). For reasons best known to
> me I had to put it on the backburner in favour of more urgent work.
> If anybody is interested in furthering this solution I would eagerly
> give the existing (pre-alpha) code away.
> This being said, there are certainly better hash algorithms than the
> one used by m3core (eg Goulburn, see
> http://www.clockandflame.com/media/Goulburn06.pdf).
>

And:

1. Properties

This part deals with properties of Unicode code-points/characters. We call Unicode code-points "runes" for brevity.
Unlike WIDECHAR's, runes cover the whole gamut of the Unicode specification. We could have defined a Rune as
TYPE Rune = [0..16_10FFFF], but unfortunately not all values in the code-point range are valid and others are left
undefined, so a "Rune" is defined as an integer. The library uses defensive programming by not allowing a string to
contain any invalid or undefined Rune.

I don't understand the reasoning here. Your criticism of the subrange type is that it contains invalid values
between the bounds, which you address with dynamic value checks inside the library code. But why eliminate the
subrange and change the type to an integer? It only drastically increases the number of invalid values,
by a factor of over 2^11 times, if integer is 32-bit, otherwise more.
And it demotes the status of these
from statically-detected, in one compile, to dynamically-detected, requiring massive testing to get an even
partial level of confidence. It also precludes storing them in less than 64 bits on a 64-bit machine.

Am I missing something?

From jay.krell at cornell.edu Sun Jul 15 03:11:26 2012
From: jay.krell at cornell.edu (Jay)
Date: Sat, 14 Jul 2012 18:11:26 -0700
Subject: [M3devel] fcntl last parameter int vs. pointer
In-Reply-To: <1342279896.71297.YahooMailClassic@web29701.mail.ird.yahoo.com>
References: <1342279896.71297.YahooMailClassic@web29701.mail.ird.yahoo.com>
Message-ID:

Daniel your replies are pointless. You have exhausted my patience.

 - Jay (briefly/pocket-sized-computer-aka-phone)

On Jul 14, 2012, at 8:31 AM, "Daniel Alejandro Benavides D." wrote:

> Hi all:
> In fact both C and Modula-3 don't allow a signature change; the original C compiler and decompiler type check the signature, although type casting is possible in both languages:
> http://web.cs.mun.ca/~ulf/pld/mocplus.html
>
> However, when talking about a functional language you can override, as in Baby Modula-3, the type at instantiation time for methods and values, so I guess you can sort of relax the strict rules in that relation of the two object function types (to make it a subtype):
> http://www.hpl.hp.com/techreports/Compaq-DEC/SRC-RR-95.pdf
>
> see p. 10 - S. 3.2.4 - Discussion
>
> I just know they could make it work, but it was a very hard, complex system.
>
> Thanks in advance
>
> --- On Sat, 14/7/12, Jay K wrote:
>
> From: Jay K
> Subject: [M3devel] fcntl last parameter int vs. pointer
> To: "m3devel"
> Date: Saturday, 14 July 2012 03:27
>
> Thoughts on
>
> Unix__fcntl(int fd, int request, int arg)
> {
>     return fcntl(fd, request, arg);
> }
>
> vs.
>
> Unix__fcntl(int fd, int request, INTEGER arg)
> {
>     return fcntl(fd, request, arg);
> }
>
> where int is 32bits and INTEGER is exactly the same size as a pointer.
>
> Will it "just work" if I change it?
> arg is sometimes a pointer, sometimes an integer, maybe sometimes other?
> Ok, let's assume 32bit integer and 32bit or 64bit pointer are the only possibilities.
> Are there calling conventions that care? And will pass the parameter differently/wrong?
>
> Do any calling conventions pack multiple smaller-than-64bit parameters into one 64bit register?
>
> I'm *guessing* no.
> I guess, as well, I can experiment with a few...
>
> - Jay
>

From dabenavidesd at yahoo.es Sun Jul 15 03:28:59 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Sun, 15 Jul 2012 02:28:59 +0100 (BST)
Subject: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
In-Reply-To: <5001D125.6020704@lcwb.coop>
Message-ID: <1342315739.35223.YahooMailClassic@web29705.mail.ird.yahoo.com>

Hi all:
making changes of type representation of TEXT in text makes no sense, it's a violation of the Text abstraction
http://books.google.com.co/books?id=FbemcUFa0JIC&pg=PA303&dq=source=bl&ots=POGbXUhcW1&sig=WULSpZ74yYU30s-cZ2zhtNTByd8&hl=en&redir_esc=y#v=onepage&q&f=false

you are in fact claiming that the default type of CHAR is Latin-1, I don't get that because you type extend CHAR and say it's not default, there is something bad, I'm somehow suspicious about this need to type in *different* two ranges for receiving a character in one type script and one in another, essentially meaning that the language is wrong in declaring TEXT as an opaque type and should use both kinds of strings always or worse the non-default type, which is naturally impossible.

Sorry guys, but I'm not agreeing with you in this one, I hope you make the best of CM3 work or leave alone the package a la DEC-SRC.
If you are thinking of widening the TEXT string package, make it polymorphic; it doesn't add complexity burden to the language, it explains better the CHAR type and its extension, but do it naturally using the language types, don't create your own one with only that purpose.
Thanks in advance

--- On Sat, 14/7/12, Rodney M. Bates wrote:

From: Rodney M. Bates
Subject: Re: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
To: m3devel at elegosoft.com
Date: Saturday, 14 July 2012 15:05

On 06/27/2012 02:58 AM, Dirk Muysers wrote:
> Some time ago I have started to develop a unicode library based
> on the old M3 text model but using UTF-8 internally rather than
> Latin-1 (see README attachment). For reasons best known to
> me I had to put it on the backburner in favour of more urgent work.
> If anybody is interested in furthering this solution I would eagerly
> give the existing (pre-alpha) code away.
> This being said, there are certainly better hash algorithms than the
> one used by m3core (eg Goulburn, see
> http://www.clockandflame.com/media/Goulburn06.pdf).
>

And:

1. Properties

This part deals with properties of Unicode code-points/characters. We call Unicode code-points "runes" for brevity.
Unlike WIDECHAR's, runes cover the whole gamut of the Unicode specification. We could have defined a Rune as
TYPE Rune = [0..16_10FFFF], but unfortunately not all values in the code-point range are valid and others are left
undefined, so a "Rune" is defined as an integer. The library uses defensive programming by not allowing a string to
contain any invalid or undefined Rune.

I don't understand the reasoning here. Your criticism of the subrange type is that it contains invalid values
between the bounds, which you address with dynamic value checks inside the library code. But why eliminate the
subrange and change the type to an integer?
It only drastically increases the number of invalid values,
by a factor of over 2^11 times, if integer is 32-bit, otherwise more. And it demotes the status of these
from statically-detected, in one compile, to dynamically-detected, requiring massive testing to get an even
partial level of confidence. It also precludes storing them in less than 64 bits on a 64-bit machine.

Am I missing something?

From dabenavidesd at yahoo.es Sun Jul 15 03:44:36 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Sun, 15 Jul 2012 02:44:36 +0100 (BST)
Subject: [M3devel] fcntl last parameter int vs. pointer
In-Reply-To:
Message-ID: <1342316676.56405.YahooMailClassic@web29706.mail.ird.yahoo.com>

Hi all:
I'm sorry that I didn't give an example of my point myself (perhaps I'm being too abstract on this point), but since you cared to say all that, I will say it more openly:
Doing that type conversion as the first url says (look at the third row at the beginning, a. literal)
http://web.cs.mun.ca/~ulf/pld/mocplus.html#subclassing

you will break the modular safety.
However, I'm telling you that one can make such an abstraction in Modula-3 (in the Baby sized language) with functional programming, making it obey the subtype relation fcntl1 <: fcntl2. Of course, Jay, I suppose your fcntl1 is badly signed, am I right?
OK, I hope I'm being clearer.
Thanks for the patience of all of that, in advance

--- On Sat, 14/7/12, Jay wrote:

From: Jay
Subject: Re: [M3devel] fcntl last parameter int vs. pointer
To: "Daniel Alejandro Benavides D."
CC: "m3devel" , "Jay K"
Date: Saturday, 14 July 2012 20:11

Daniel your replies are pointless. You have exhausted my patience.

 - Jay (briefly/pocket-sized-computer-aka-phone)

On Jul 14, 2012, at 8:31 AM, "Daniel Alejandro Benavides D."
wrote:

Hi all:
In fact both C and Modula-3 don't allow a signature change; the original C compiler and decompiler type check the signature, although type casting is possible in both languages:
http://web.cs.mun.ca/~ulf/pld/mocplus.html

However, when talking about a functional language you can override, as in Baby Modula-3, the type at instantiation time for methods and values, so I guess you can sort of relax the strict rules in that relation of the two object function types (to make it a subtype):
http://www.hpl.hp.com/techreports/Compaq-DEC/SRC-RR-95.pdf

see p. 10 - S. 3.2.4 - Discussion

I just know they could make it work, but it was a very hard, complex system.

Thanks in advance

--- On Sat, 14/7/12, Jay K wrote:

From: Jay K
Subject: [M3devel] fcntl last parameter int vs. pointer
To: "m3devel"
Date: Saturday, 14 July 2012 03:27

Thoughts on

Unix__fcntl(int fd, int request, int arg)
{
    return fcntl(fd, request, arg);
}

vs.

Unix__fcntl(int fd, int request, INTEGER arg)
{
    return fcntl(fd, request, arg);
}

where int is 32bits and INTEGER is exactly the same size as a pointer.

Will it "just work" if I change it?
arg is sometimes a pointer, sometimes an integer, maybe sometimes other?
Ok, let's assume 32bit integer and 32bit or 64bit pointer are the only possibilities.
Are there calling conventions that care? And will pass the parameter differently/wrong?

Do any calling conventions pack multiple smaller-than-64bit parameters into one 64bit register?

I'm *guessing* no.
I guess, as well, I can experiment with a few...

 - Jay
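
For readers following the fcntl question in this thread, here is a minimal C sketch of one way a wrapper could accept a pointer-sized third argument and still hand fcntl a value of the type each request expects. It is an illustration only, not the m3core implementation: the name `fcntl_wide` is made up, `intptr_t` stands in for Modula-3's INTEGER on the assumption that both are pointer-sized, and only the three classic record-lock requests are treated as pointer-taking (other pointer-taking requests exist on some systems and would need their own cases).

```c
#include <fcntl.h>
#include <stdint.h>

/* Hypothetical wrapper (not the real Unix__fcntl): the caller always passes
   a pointer-sized integer; the wrapper converts it per request. */
int fcntl_wide(int fd, int request, intptr_t arg)
{
    switch (request) {
    case F_GETLK:
    case F_SETLK:
    case F_SETLKW:
        /* Record-lock requests read their argument as a struct flock pointer. */
        return fcntl(fd, request, (struct flock *)arg);
    default:
        /* Requests such as F_SETFD or F_SETFL read an int; requests such as
           F_GETFD ignore the argument. */
        return fcntl(fd, request, (int)arg);
    }
}
```

Converting inside the wrapper means the variadic fcntl always receives an argument of the type it will read, so the result does not depend on how a particular calling convention happens to pass a 32-bit versus a 64-bit value.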
From dmuysers at hotmail.com Sun Jul 15 10:13:35 2012
From: dmuysers at hotmail.com (Dirk Muysers)
Date: Sun, 15 Jul 2012 10:13:35 +0200
Subject: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
In-Reply-To: <5001D125.6020704@lcwb.coop>
References: <989FB282-786A-447D-8CA9-2432AE922DBF@m3w.org><20120626181955.GB29355@topoi.pooq.com><20120627015457.238041A205B@async.async.caltech.edu> <5001D125.6020704@lcwb.coop>
Message-ID:

My reasoning here was a pragmatic rather than a type-theoretical one.
A rune defined as an integer can be freely passed around, while as
a subrange it undergoes a hidden range check at every assignment.
Now that range check wouldn't buy me anything, since the validation
of a rune entails more than a simple range check and remains unavoidable
in order to ensure the postcondition of pure Unicode in any text.

--------------------------------------------------
From: "Rodney M. Bates"
Sent: Saturday, July 14, 2012 10:05 PM
To:
Subject: Re: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!

>
> On 06/27/2012 02:58 AM, Dirk Muysers wrote:
>> Some time ago I have started to develop a unicode library based
>> on the old M3 text model but using UTF-8 internally rather than
>> Latin-1 (see README attachment). For reasons best known to
>> me I had to put it on the backburner in favour of more urgent work.
>> If anybody is interested in furthering this solution I would eagerly
>> give the existing (pre-alpha) code away.
>> This being said, there are certainly better hash algorithms than the
>> one used by m3core (eg Goulburn, see
>> http://www.clockandflame.com/media/Goulburn06.pdf).
>>
>
> And:
>
> 1. Properties
>
> This part deals with properties of Unicode code-points/characters. We call
> Unicode code-points "runes" for brevity.
> Unlike WIDECHAR's, runes cover the whole gamut of the Unicode
> specification.
> We could have defined a Rune as
> TYPE Rune = [0..16_10FFFF], but unfortunately not all values in the
> code-point range are valid and others are left
> undefined, so a "Rune" is defined as an integer. The library uses
> defensive programming by not allowing a string to
> contain any invalid or undefined Rune.
>
> I don't understand the reasoning here. Your criticism of the subrange
> type is that it contains invalid values
> between the bounds, which you address with dynamic value checks inside the
> library code. But why eliminate the
> subrange and change the type to an integer? It only drastically
> increases the number of invalid values,
> by a factor of over 2^11 times, if integer is 32-bit, otherwise more. And
> it demotes the status of these
> from statically-detected, in one compile, to dynamically-detected,
> requiring massive testing to get an even
> partial level of confidence. It also precludes storing them in less than
> 64 bits on a 64-bit machine.
>
> Am I missing something?
>

From dabenavidesd at yahoo.es Sun Jul 15 15:14:51 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Sun, 15 Jul 2012 14:14:51 +0100 (BST)
Subject: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
In-Reply-To:
Message-ID: <1342358091.65493.YahooMailClassic@web29704.mail.ird.yahoo.com>

Hi all:
wouldn't pragmas be the best solution here, making them inline the TEXT type as some representation-specific character type, still not making the language obey rules that aren't inherently correct? By that I mean, CHARs are what they are, and strings of CHAR values are compatible in the current implementation; it's just that it doesn't care too much to validate when one character or another is typed in.
Thanks in advance

--- On Sun, 15/7/12, Dirk Muysers wrote:

From: Dirk Muysers
Subject: Re: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
To: "Rodney M.
 Bates"
CC: m3devel at elegosoft.com
Date: Sunday, 15 July 2012 03:13

My reasoning here was a pragmatic rather than a type-theoretical one.
A rune defined as an integer can be freely passed around, while as
a subrange it undergoes a hidden range check at every assignment.
Now that range check wouldn't buy me anything, since the validation
of a rune entails more than a simple range check and remains unavoidable
in order to ensure the postcondition of pure Unicode in any text.

--------------------------------------------------
From: "Rodney M. Bates"
Sent: Saturday, July 14, 2012 10:05 PM
To:
Subject: Re: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!

>
> On 06/27/2012 02:58 AM, Dirk Muysers wrote:
>> Some time ago I have started to develop a unicode library based
>> on the old M3 text model but using UTF-8 internally rather than
>> Latin-1 (see README attachment). For reasons best known to
>> me I had to put it on the backburner in favour of more urgent work.
>> If anybody is interested in furthering this solution I would eagerly
>> give the existing (pre-alpha) code away.
>> This being said, there are certainly better hash algorithms than the
>> one used by m3core (eg Goulburn, see
>> http://www.clockandflame.com/media/Goulburn06.pdf).
>>
>
> And:
>
> 1. Properties
>
> This part deals with properties of Unicode code-points/characters. We call Unicode code-points "runes" for brevity.
> Unlike WIDECHAR's, runes cover the whole gamut of the Unicode specification. We could have defined a Rune as
> TYPE Rune = [0..16_10FFFF], but unfortunately not all values in the code-point range are valid and others are left
> undefined, so a "Rune" is defined as an integer. The library uses defensive programming by not allowing a string to
> contain any invalid or undefined Rune.
>
> I don't understand the reasoning here.
> Your criticism of the subrange type is that it contains invalid values
> between the bounds, which you address with dynamic value checks inside the library code. But why eliminate the
> subrange and change the type to an integer? It only drastically increases the number of invalid values,
> by a factor of over 2^11 times, if integer is 32-bit, otherwise more. And it demotes the status of these
> from statically-detected, in one compile, to dynamically-detected, requiring massive testing to get an even
> partial level of confidence. It also precludes storing them in less than 64 bits on a 64-bit machine.
>
> Am I missing something?
>

From rodney_bates at lcwb.coop Sun Jul 15 18:22:48 2012
From: rodney_bates at lcwb.coop (Rodney M. Bates)
Date: Sun, 15 Jul 2012 11:22:48 -0500
Subject: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
In-Reply-To: <1342315739.35223.YahooMailClassic@web29705.mail.ird.yahoo.com>
References: <1342315739.35223.YahooMailClassic@web29705.mail.ird.yahoo.com>
Message-ID: <5002EE58.6010401@lcwb.coop>

On 07/14/2012 08:28 PM, Daniel Alejandro Benavides D. wrote:
> Hi all:
> making changes of type representation of TEXT in text makes no sense, it's a violation of the Text abstraction

No, I disagree here. A primary property of an abstraction is that clients can use it
_without_ knowledge of the internal representation. The representation can be changed
without altering the behavior of any program that uses the abstraction. A program that
imports representation-dependent interfaces such as TextRep.i3 is an exception, but doing
so means it is a known abstraction violator, from the beginning.
> http://books.google.com.co/books?id=FbemcUFa0JIC&pg=PA303&dq=source=bl&ots=POGbXUhcW1&sig=WULSpZ74yYU30s-cZ2zhtNTByd8&hl=en&redir_esc=y#v=onepage&q&f=false
>
> you are in fact claiming that the default type of CHAR is Latin-1, I don't get that because you type extend CHAR and say it's not default, there is something bad, I'm somehow suspicious about this need to type in *different* two ranges for receiving a character in one type script and one in another, essentially meaning that the language is wrong in declaring TEXT as an opaque type and should use both kinds of strings always or worse the non-default type, which is naturally impossible.

I'm not sure what you are saying here. The language does clearly say that CHAR contains (at least) ISO-Latin-1.
But I am not proposing to extend CHAR beyond exactly ISO-Latin-1, as it is in every implementation of Modula-3.
This is because I am sure doing so would break a large amount of existing code. Such code assumes that
BYTESIZE(CHAR)=1.

I _am_ proposing to extend WIDECHAR to hold Unicode. WIDECHAR was added with this in mind, but today,
it fails because its range is too limited. I think probably WIDECHAR was added at a time when only 2^16
code points were in the standard(s). But that has changed. This is a very simple fix of that.

As for TEXT, the CM3 version is and always was, abstractly, a string of WIDECHAR. The procedures that have
parameters of type CHAR just do the widening or narrowing at the time a character is passed in or out.
The fact that the current representation holds some characters in 8-bit array elements is hidden by
the Text abstraction, and can be changed if convenient.

In contrast, Wr/Rd and friends do not hide character representations in the stream. This is as it must be,
and I am proposing only to add additional representations that they can handle, and make it convenient
for the usual case that an entire stream uses the same representation of characters.
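
To make the idea of a stream-level character representation concrete, here is a small C sketch that encodes one code point as UTF-8, which is one representation a reader or writer could support. It is not taken from the proposal or from any CM3 source: the function name `utf8_encode` is hypothetical, and C is used only because the thread already contains C. The four-byte branch covers code points above 16_FFFF, which are exactly the values a 16-bit WIDECHAR cannot hold.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode code point cp as UTF-8 into out.
   Returns the number of bytes written (1..4), or 0 if cp is a
   surrogate or lies above 0x10FFFF. */
size_t utf8_encode(uint32_t cp, unsigned char out[4])
{
    if (cp < 0x80u) {                     /* ASCII: one byte */
        out[0] = (unsigned char)cp;
        return 1;
    }
    if (cp < 0x800u) {                    /* includes the rest of Latin-1: two bytes */
        out[0] = (unsigned char)(0xC0u | (cp >> 6));
        out[1] = (unsigned char)(0x80u | (cp & 0x3Fu));
        return 2;
    }
    if (cp < 0x10000u) {                  /* rest of the 16-bit range: three bytes */
        if (cp >= 0xD800u && cp <= 0xDFFFu)
            return 0;                     /* surrogates are not characters */
        out[0] = (unsigned char)(0xE0u | (cp >> 12));
        out[1] = (unsigned char)(0x80u | ((cp >> 6) & 0x3Fu));
        out[2] = (unsigned char)(0x80u | (cp & 0x3Fu));
        return 3;
    }
    if (cp <= 0x10FFFFu) {                /* beyond 16 bits: four bytes */
        out[0] = (unsigned char)(0xF0u | (cp >> 18));
        out[1] = (unsigned char)(0x80u | ((cp >> 12) & 0x3Fu));
        out[2] = (unsigned char)(0x80u | ((cp >> 6) & 0x3Fu));
        out[3] = (unsigned char)(0x80u | (cp & 0x3Fu));
        return 4;
    }
    return 0;
}
```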
> Sorry guys, but I'm not agreeing with you in this one, I hope you make the best of CM3 work or leave alone the package a la DEC-SRC.
> If you are thinking of widening the TEXT string package, make it polymorphic; it doesn't add complexity burden to the language, it explains better the CHAR type and its extension, but do it naturally using the language types, don't create your own one with only that purpose.
> Thanks in advance
>
> --- On *Sat, 14/7/12, Rodney M. Bates* wrote:
>
>     From: Rodney M. Bates
>     Subject: Re: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
>     To: m3devel at elegosoft.com
>     Date: Saturday, 14 July 2012 15:05
>
>     On 06/27/2012 02:58 AM, Dirk Muysers wrote:
>      > Some time ago I have started to develop a unicode library based
>      > on the old M3 text model but using UTF-8 internally rather than
>      > Latin-1 (see README attachment). For reasons best known to
>      > me I had to put it on the backburner in favour of more urgent work.
>      > If anybody is interested in furthering this solution I would eagerly
>      > give the existing (pre-alpha) code away.
>      > This being said, there are certainly better hash algorithms than the
>      > one used by m3core (eg Goulburn, see
>      > http://www.clockandflame.com/media/Goulburn06.pdf).
>      >
>
>     And:
>
>     1. Properties
>
>     This part deals with properties of Unicode code-points/characters. We call Unicode code-points "runes" for brevity.
>     Unlike WIDECHAR's, runes cover the whole gamut of the Unicode specification. We could have defined a Rune as
>     TYPE Rune = [0..16_10FFFF], but unfortunately not all values in the code-point range are valid and others are left
>     undefined, so a "Rune" is defined as an integer. The library uses defensive programming by not allowing a string to
>     contain any invalid or undefined Rune.
>
>     I don't understand the reasoning here.
>     Your criticism of the subrange type is that it contains invalid values
>     between the bounds, which you address with dynamic value checks inside the library code. But why eliminate the
>     subrange and change the type to an integer? It only drastically increases the number of invalid values,
>     by a factor of over 2^11 times, if integer is 32-bit, otherwise more. And it demotes the status of these
>     from statically-detected, in one compile, to dynamically-detected, requiring massive testing to get an even
>     partial level of confidence. It also precludes storing them in less than 64 bits on a 64-bit machine.
>
>     Am I missing something?
>

From mika at async.caltech.edu Sun Jul 15 19:39:11 2012
From: mika at async.caltech.edu (Mika Nystrom)
Date: Sun, 15 Jul 2012 10:39:11 -0700
Subject: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
In-Reply-To:
References: <989FB282-786A-447D-8CA9-2432AE922DBF@m3w.org><20120626181955.GB29355@topoi.pooq.com><20120627015457.238041A205B@async.async.caltech.edu> <5001D125.6020704@lcwb.coop>
Message-ID: <20120715173911.D18A61A208F@async.async.caltech.edu>

I believe the compilers in existence are smart enough not to insert the
range check when the types are the same on both sides of the :=.

At least for copying... i.e.,

  a, b : WIDECHAR;
BEGIN
  a := b
END

should not imply a range check.

With the types in question, that is probably by far the most common
operation, too.

    Mika

"Dirk Muysers" writes:
>My reasoning here was a pragmatic rather than a type-theoretical one.
>A rune defined as an integer can be freely passed around, while as
>a subrange it undergoes a hidden range check at every assignment.
>Now that range check wouldn't buy me anything, since the validation
>of a rune entails more than a simple range check and remains unavoidable
>in order to ensure the postcondition of pure Unicode in any text.
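
The point that rune validation "entails more than a simple range check" can be sketched in a few lines of C. This is an assumption-laden illustration, not code from the library under discussion: the name `rune_is_scalar` is hypothetical, and "valid" is taken here to mean only "a Unicode scalar value", that is, within 0..16_10FFFF and not a surrogate. The library may well reject more, for example unassigned code points.

```c
#include <stdbool.h>
#include <stdint.h>

#define RUNE_MAX 0x10FFFFu  /* upper bound of the Unicode code-point range */

/* Hypothetical check: true if r is a Unicode scalar value. */
bool rune_is_scalar(uint32_t r)
{
    if (r > RUNE_MAX)
        return false;   /* what the subrange [0..16_10FFFF] alone would catch */
    if (r >= 0xD800u && r <= 0xDFFFu)
        return false;   /* surrogates: inside the subrange, still not characters */
    return true;
}
```

Under this reading the two positions in the thread are compatible: a subrange type would cover the first test statically or with a cheap check, while the second test (and anything stricter) has to live in library code either way.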
From dabenavidesd at yahoo.es Mon Jul 16 03:53:00 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Mon, 16 Jul 2012 02:53:00 +0100 (BST)
Subject: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
In-Reply-To: <5002EE58.6010401@lcwb.coop>
Message-ID: <1342403580.18580.YahooMailClassic@web29705.mail.ird.yahoo.com>

Hi all:
Yes, I was referring to native module clients (not C, nor modules in anything else); for them to be able to change the rep is a violation.
About the problem of type: REF CHAR is a value space strictly larger than Latin-1. So this is what I mean: encoding in one type or another must be determined by its subexpressions, not by defaults like the TEXT type. Width subtyping refers to adding some value range which, as you say, may or may not be in the same range as Unicode; then it must be called WIDECHAR, you can't call it UCHAR etc., that misses the point of abstraction here. If so, how many types would we want, 20, 30, according to the bit ending? Please give me a break, we are not C doers, and if we are, then call them in your libraries, we don't need to contaminate ourselves. Sorry, I'm not saying that you are being noisy, but this certainly could be that (also me).
Rodney, please correct me when I say something wrong, but are you saying that you will start to put procedures and stuff to convert in every interface? Oh no, sorry; I hope I'm not that guy converting because somebody needed an extra interface to code some language, it will be a real mess.
Thanks in advance

--- On Sun, 15/7/12, Rodney M. Bates wrote:

From: Rodney M. Bates
Subject: Re: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
To: "Daniel Alejandro Benavides D."
CC: m3devel at elegosoft.com
Date: Sunday, 15 July 2012 11:22

On 07/14/2012 08:28 PM, Daniel Alejandro Benavides D. wrote:
> Hi all:
> making changes of type representation of TEXT in text makes no sense, it's a violation of the Text abstraction

No, I disagree here.
A primary property of an abstraction is that clients can use it
_without_ knowledge of the internal representation. The representation can be changed
without altering the behavior of any program that uses the abstraction. A program that
imports representation-dependent interfaces such as TextRep.i3 is an exception, but doing
so means it is a known abstraction violator, from the beginning.

> http://books.google.com.co/books?id=FbemcUFa0JIC&pg=PA303&dq=source=bl&ots=POGbXUhcW1&sig=WULSpZ74yYU30s-cZ2zhtNTByd8&hl=en&redir_esc=y#v=onepage&q&f=false
>
> you are in fact claiming that the default type of CHAR is Latin-1, I don't get that because you type extend CHAR and say it's not default, there is something bad, I'm somehow suspicious about this need to type in *different* two ranges for receiving a character in one type script and one in another, essentially meaning that the language is wrong in declaring TEXT as an opaque type and should use both kinds of strings always or worse the non-default type, which is naturally impossible.

I'm not sure what you are saying here. The language does clearly say that CHAR contains (at least) ISO-Latin-1.
But I am not proposing to extend CHAR beyond exactly ISO-Latin-1, as it is in every implementation of Modula-3.
This is because I am sure doing so would break a large amount of existing code. Such code assumes that
BYTESIZE(CHAR)=1.

I _am_ proposing to extend WIDECHAR to hold Unicode. WIDECHAR was added with this in mind, but today,
it fails because its range is too limited. I think probably WIDECHAR was added at a time when only 2^16
code points were in the standard(s). But that has changed. This is a very simple fix of that.

As for TEXT, the CM3 version is and always was, abstractly, a string of WIDECHAR. The procedures that have
parameters of type CHAR just do the widening or narrowing at the time a character is passed in or out.
The fact that the current representation holds some characters in 8-bit array elements is hidden by
the Text abstraction, and can be changed if convenient.

In contrast, Wr/Rd and friends do not hide character representations in the stream. This is as it must be,
and I am proposing only to add additional representations that they can handle, and make it convenient
for the usual case that an entire stream uses the same representation of characters.

> Sorry guys, but I'm not agreeing with you in this one, I hope you make the best of CM3 work or leave alone the package a la DEC-SRC.
> If you are thinking of widening the TEXT string package, make it polymorphic; it doesn't add complexity burden to the language, it explains better the CHAR type and its extension, but do it naturally using the language types, don't create your own one with only that purpose.
> Thanks in advance
>
> --- On *Sat, 14/7/12, Rodney M. Bates* wrote:
>
>     From: Rodney M. Bates
>     Subject: Re: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
>     To: m3devel at elegosoft.com
>     Date: Saturday, 14 July 2012 15:05
>
>     On 06/27/2012 02:58 AM, Dirk Muysers wrote:
>      > Some time ago I have started to develop a unicode library based
>      > on the old M3 text model but using UTF-8 internally rather than
>      > Latin-1 (see README attachment). For reasons best known to
>      > me I had to put it on the backburner in favour of more urgent work.
>      > If anybody is interested in furthering this solution I would eagerly
>      > give the existing (pre-alpha) code away.
>      > This being said, there are certainly better hash algorithms than the
>      > one used by m3core (eg Goulburn, see
>      > http://www.clockandflame.com/media/Goulburn06.pdf).
>      >
>
>     And:
>
>     1. Properties
>
>     This part deals with properties of Unicode code-points/characters. We call Unicode code-points "runes" for brevity.
>     Unlike WIDECHAR's, runes cover the whole gamut of the Unicode specification. We could have defined a Rune as
>     TYPE Rune = [0..16_10FFFF], but unfortunately not all values in the code-point range are valid and others are left
>     undefined, so a "Rune" is defined as an integer. The library uses defensive programming by not allowing a string to
>     contain any invalid or undefined Rune.
>
>     I don't understand the reasoning here. Your criticism of the subrange type is that it contains invalid values
>     between the bounds, which you address with dynamic value checks inside the library code. But why eliminate the
>     subrange and change the type to an integer? It only drastically increases the number of invalid values,
>     by a factor of over 2^11 times, if integer is 32-bit, otherwise more. And it demotes the status of these
>     from statically-detected, in one compile, to dynamically-detected, requiring massive testing to get an even
>     partial level of confidence. It also precludes storing them in less than 64 bits on a 64-bit machine.
>
>     Am I missing something?
>

From rodney_bates at lcwb.coop Mon Jul 16 18:45:58 2012
From: rodney_bates at lcwb.coop (Rodney M. Bates)
Date: Mon, 16 Jul 2012 11:45:58 -0500
Subject: [M3devel] New OrdSets generic package
Message-ID: <50044546.4010202@lcwb.coop>

Now checked in inside m3-libs/ordsets, OrdSets is a generic interface and module
for dynamically-sized sets of large-range ordinal types, in functional style.

From comments in OrdSets.ig:

(* This interface provides operations on sets whose members are of an
   ordinal type. It is written in a functional style. It never
   mutates a set value, (except for some internal lazy
   computation--not visible to clients), and thus it sometimes is able
   to share heap objects.
   Its primary use pattern is where the set values can have widely
   varying sizes, you want a very large maximum size limit, but many of the sets
   are expected to be much smaller than the maximum. For this to happen,
   you probably want to instantiate only with INTEGER or WIDECHAR.
   It will work with LONGINT, but only if its target-machine-dependent
   range is a subrange of INTEGER. There is no space or time performance
   benefit to instantiating with a subrange of the base type.

   If this does not fit your needs, you probably want to use Modula-3's
   builtin set type, or some other package.

   The set representations occupy variable-sized heap objects, just
   sufficient for the set value. In the most general case, these use
   heap-allocated open arrays of machine words, with one bit per actual
   set member, plus some overhead, of course.

   If you compile with a later CM3 Modula-3 compiler and garbage collector
   that tolerate misaligned "pseudo" pointers, i.e., with the least significant
   bit set to one, you can set a boolean constant in the corresponding module
   OrdSets.mg. This will cause it to utilize this Modula-3 implementation
   feature to store sufficiently small set values entirely within the pointer
   word, avoiding the high space and time overheads of heap allocation.
   The CM3 5-8 compiler is sufficient. SRC M3, PM3, EZM3, and earlier
   CM3 versions are not. As of 2012-7-15, Pickles do not handle these.
   Enable this with DoPseudoPointers, in OrdSets.mg.
*)

From dabenavidesd at yahoo.es Thu Jul 19 17:02:21 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Thu, 19 Jul 2012 16:02:21 +0100 (BST)
Subject: [M3devel] About a new AMD64 binary
Message-ID: <1342710141.21612.YahooMailClassic@web29703.mail.ird.yahoo.com>

Hi all:
I'm writing to ask whether the generated .deb file(s) are available somewhere, to install on AMD64_LINUX.
Hendrik, you have a copy yourself, right?
Thanks in advance
From hendrik at topoi.pooq.com Sun Jul 1 19:39:57 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Sun, 1 Jul 2012 13:39:57 -0400
Subject: [M3devel] License compatibility
In-Reply-To:
References: <20120701003957.GA12807@topoi.pooq.com>
Message-ID: <20120701173957.GA8757@topoi.pooq.com>

On Sat, Jun 30, 2012 at 08:45:17PM -0400, Antony Hosking wrote:
> Not compatible. FSF official.
>
> Sent from my iPhone

So this presumably means it is impossible to distribute binary for any
Modula 3 program that uses a GPL library even if you include source code.
Because presumably the basic M3 run-time system is under the M3 license and therefore incompatible.

Which means it's practically impossible to provide such a program to anyone
that doesn't understand how to use a compiler, which is most Windows users.

Or is there some wiggle room somewhere?

-- hendrik

>
> On Jun 30, 2012, at 20:39, Hendrik Boom wrote:
>
> > I've heard, ages ago, that the SRC was not considered compatible with
> > the GPL. I'd really like to know if this is true. Not whether it
> > should be compatible, not whether people were afraid of it being
> > incompatible... not whether some people think it's cmopatible, but
> > whether it *is* compatible.
> >
> > Has anyone ever got a definitive answer to this question?
> >
> > If not, should I ask the FSF explicitly?
> >
> > -- hendrik
> >

From hendrik at topoi.pooq.com Sun Jul 1 20:58:10 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Sun, 1 Jul 2012 14:58:10 -0400
Subject: [M3devel] License compatibility
In-Reply-To:
References: <20120701003957.GA12807@topoi.pooq.com> <20120701173957.GA8757@topoi.pooq.com>
Message-ID: <20120701185810.GA9416@topoi.pooq.com>

On Sun, Jul 01, 2012 at 02:08:04PM -0400, Antony Hosking wrote:
> I thought LGPL allowed binary linkage without infection.

Only if the program is distributed in such a way that the user can relink it
with updated versions of the LGPL library. I don't know if that's too
much to ask of the typical dumb user I've postulated. Considering how
I've had to recompile several m3 libraries just to go on using them with
libXaw, it may indeed be too much to expect.

Now I don't mind sending out source code. I'm concerned with the end
user who minds receiving it.

It would presumably be the Modula-3 libraries that pose the problem, I
suppose. I'm not talking about the compiler itself, which is not part
of my program or the libraries. I guess I'm concerned with the
libraries one cannot do without, like libm3.

FSF claims that the GPL3 is compatible with more free licenses than the
GPL2.

Is there a document somewhere that identifies just what the problem is
with our license?

-- hendrik

>
> Sent from my iPad
>
> On Jul 1, 2012, at 1:39 PM, Hendrik Boom wrote:
>
> > On Sat, Jun 30, 2012 at 08:45:17PM -0400, Antony Hosking wrote:
> >> Not compatible. FSF official.
> >>
> >> Sent from my iPhone
> >
> > So this presumably means it is impossible to distribute binary for any
> > Modula 3 program that uses a GPL library even if you include source code.
> > Because presumably the basic M3 run-time system is under the M3 license and therefore incompatible.
> >
> > Which means it's practically impossible to provide such a program to anyone
> > that doesn't understand how to use a compiler, which is most Windows users.
> > > > Or is there some wiggle room somewhere? > > > > -- hendrik > > > >> > >> On Jun 30, 2012, at 20:39, Hendrik Boom wrote: > >> > >>> I've heard, ages ago, that the SRC was not considered compatible with > >>> the GPL. I'd really like to know if this is true. Not whether it > >>> should be compatible, not whether people were afraid of it being > >>> incompatible... not whether some people think it's cmopatible, but > >>> whether it *is* compatible. > >>> > >>> Has anyone ever got a definitive answer to this question? > >>> > >>> If not, should I ask the FSF explicitly? > >>> > >>> -- hendrik > >>> From dabenavidesd at yahoo.es Sun Jul 1 21:10:16 2012 From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.) Date: Sun, 1 Jul 2012 20:10:16 +0100 (BST) Subject: [M3devel] License compatibility In-Reply-To: <20120701185810.GA9416@topoi.pooq.com> Message-ID: <1341169816.90454.YahooMailClassic@web29704.mail.ird.yahoo.com> Hi all: technically, the CM J-V-M was binary compatible with Sun JVM, wasn't it? So in terms of binary compatibility CM3 is binary compatible with Sun JDK (I guess the only version they had), wasn't that the idea to port Java to Modula-3 easily? Ando so if you can link Sun JDK with Gcc I guess you can do it with CM3 at least technically. Thanks in advance --- El dom, 1/7/12, Hendrik Boom escribi?: De: Hendrik Boom Asunto: Re: [M3devel] License compatibility Para: "m3devel at elegosoft.com" Fecha: domingo, 1 de julio, 2012 13:58 On Sun, Jul 01, 2012 at 02:08:04PM -0400, Antony Hosking wrote: > I thought LGPL allowed binary linkage without infection. Only if the program is distributed in such a way that the user can relink it with updated versions of the LGPL library.? I don't know if that's too much to ask of the typical dumb user I've postulated.? Considering how I've had to recompile several m3 libraries just to go on using them with libXaw, it may indeed be too much to expect. Now I don't mind sending out source code.? 
I'm concerned with the end user who minds receiving it. It would presumably be the Modula 3 libraries that pose the problem, I suppose.? I'm not talking about the compiler itself, which is not part of my program or the libraries.? I guess I'm concerned with the libraries one cannot do without, like libm3. FSF claims that the GPL3 is compatible with more free licensess than the GPL2. Is there a document somewhere that identifies just what the problem is with out license? -- hendrik > > Sent from my iPad > > On Jul 1, 2012, at 1:39 PM, Hendrik Boom wrote: > > > On Sat, Jun 30, 2012 at 08:45:17PM -0400, Antony Hosking wrote: > >> Not compatible.? FSF official. > >> > >> Sent from my iPhone > > > > So this presumably means it is impossible to distribute binary for any > > Modula 3 program that uses a GPL library even if you include source code. > > Because presumably the basic M3 run-time system is under the M3 license and therefore incompatible. > > > > Which means it's practically impossible to provide such a program to anyone > > that doesn't understand how to use a compiler, which is most Windows users. > > > > Or is there some wiggle room somewhere? > > > > -- hendrik > > > >> > >> On Jun 30, 2012, at 20:39, Hendrik Boom wrote: > >> > >>> I've heard, ages ago, that the SRC was not considered compatible with > >>> the GPL.? I'd really like to know if this is true.? Not whether it > >>> should be compatible, not whether people were afraid of it being > >>> incompatible... not whether some people think it's cmopatible, but > >>> whether it *is* compatible. > >>> > >>> Has anyone ever got a definitive answer to this question? > >>> > >>> If not, should I ask the FSF explicitly? > >>> > >>> -- hendrik > >>> -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From dragisha at m3w.org Sun Jul 1 21:15:35 2012 From: dragisha at m3w.org (=?utf-8?Q?Dragi=C5=A1a_Duri=C4=87?=) Date: Sun, 1 Jul 2012 21:15:35 +0200 Subject: [M3devel] License compatibility In-Reply-To: <20120701173957.GA8757@topoi.pooq.com> References: <20120701003957.GA12807@topoi.pooq.com> <20120701173957.GA8757@topoi.pooq.com> Message-ID: <30814087-79E4-429B-B438-F86B3375F23D@m3w.org> GPL is not LGPL. No same restrictions apply. LGPL means you have to link LGPL library dynamically so your program will use system's current version, presumably updateable as update becomes available, regardless of your actions. For GPL libraries, you are probably right. On Jul 1, 2012, at 7:39 PM, Hendrik Boom wrote: > On Sat, Jun 30, 2012 at 08:45:17PM -0400, Antony Hosking wrote: >> Not compatible. FSF official. >> >> Sent from my iPhone > > So this presumably means it is impossible to distribute binary for any > Modula 3 program that uses a GPL library even if you include source code. > Because presumably the basic M3 run-time system is under the M3 license and therefore incompatible. > > Which means it's practically impossible to provide such a program to anyone > that doesn't understand how to use a compiler, which is most Windows users. > > Or is there some wiggle room somewhere? > > -- hendrik > >> >> On Jun 30, 2012, at 20:39, Hendrik Boom wrote: >> >>> I've heard, ages ago, that the SRC was not considered compatible with >>> the GPL. I'd really like to know if this is true. Not whether it >>> should be compatible, not whether people were afraid of it being >>> incompatible... not whether some people think it's cmopatible, but >>> whether it *is* compatible. >>> >>> Has anyone ever got a definitive answer to this question? >>> >>> If not, should I ask the FSF explicitly? 
>>>
>>> -- hendrik
>>>

From hendrik at topoi.pooq.com Sun Jul 1 21:49:50 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Sun, 1 Jul 2012 15:49:50 -0400
Subject: [M3devel] License compatibility
In-Reply-To: <1341169816.90454.YahooMailClassic@web29704.mail.ird.yahoo.com>
References: <20120701185810.GA9416@topoi.pooq.com> <1341169816.90454.YahooMailClassic@web29704.mail.ird.yahoo.com>
Message-ID: <20120701194950.GA9673@topoi.pooq.com>

On Sun, Jul 01, 2012 at 08:10:16PM +0100, Daniel Alejandro Benavides D. wrote:
> Hi all:
> technically, the CM J-V-M was binary compatible with Sun JVM, wasn't
> it? So in terms of binary compatibility CM3 is binary compatible with
> Sun JDK

You're not trying to tell me that I could use CM3 and Sun JDK
interchangeably, are you? That would mean I can use the JDK to compile
Modula-3 code. I have my doubts.

> (I guess the only version they had), wasn't that the idea to
> port Java to Modula-3 easily? And so if you can link Sun JDK with
> Gcc I guess you can do it with CM3 at least technically.

The question isn't whether we can link CM3 programs with gcc. The
question is whether we can distribute such linked programs. And that
doesn't depend on the CM3 compiler as much as the CM3 run-time system.

And it's not a question of technical compatibility. It's a matter of
license compatibility. And I suspect the only way we'll get *that* to work
is to write a new run-time system and new libraries that *are* built with
a GPL-compatible license.

Or hope the whole issue goes away as free software drifts to freer
licenses and we no longer need any GPL libraries.

-- hendrik

> Thanks in advance
>
> --- On Sun, 1/7/12, Hendrik Boom wrote:
>
> From: Hendrik Boom
> Subject: Re: [M3devel] License compatibility
> To: "m3devel at elegosoft.com"
> Date: Sunday, July 1, 2012 13:58
>
> On Sun, Jul 01, 2012 at 02:08:04PM -0400, Antony Hosking wrote:
> > I thought LGPL allowed binary linkage without infection.
> > Only if the program is distributed in such a way that the user can relink it > with updated versions of the LGPL library.? I don't know if that's too > much to ask of the typical dumb user I've postulated.? Considering how > I've had to recompile several m3 libraries just to go on using them with > libXaw, it may indeed be too much to expect. > > Now I don't mind sending out source code.? I'm concerned with the end > user who minds receiving it. > > It would presumably be the Modula 3 libraries that pose the problem, I > suppose.? I'm not talking about the compiler itself, which is not part > of my program or the libraries.? I guess I'm concerned with the > libraries one cannot do without, like libm3. > > FSF claims that the GPL3 is compatible with more free licensess than the > GPL2. > > Is there a document somewhere that identifies just what the problem is > with out license? > > -- hendrik > > > > > Sent from my iPad > > > > On Jul 1, 2012, at 1:39 PM, Hendrik Boom wrote: > > > > > On Sat, Jun 30, 2012 at 08:45:17PM -0400, Antony Hosking wrote: > > >> Not compatible.? FSF official. > > >> > > >> Sent from my iPhone > > > > > > So this presumably means it is impossible to distribute binary for any > > > Modula 3 program that uses a GPL library even if you include source code. > > > Because presumably the basic M3 run-time system is under the M3 license and therefore incompatible. > > > > > > Which means it's practically impossible to provide such a program to anyone > > > that doesn't understand how to use a compiler, which is most Windows users. > > > > > > Or is there some wiggle room somewhere? > > > > > > -- hendrik > > > > > >> > > >> On Jun 30, 2012, at 20:39, Hendrik Boom wrote: > > >> > > >>> I've heard, ages ago, that the SRC was not considered compatible with > > >>> the GPL.? I'd really like to know if this is true.? Not whether it > > >>> should be compatible, not whether people were afraid of it being > > >>> incompatible... 
not whether some people think it's compatible, but
> > >>> whether it *is* compatible.
> > >>>
> > >>> Has anyone ever got a definitive answer to this question?
> > >>>
> > >>> If not, should I ask the FSF explicitly?
> > >>>
> > >>> -- hendrik
> > >>>

From hosking at cs.purdue.edu Mon Jul 2 03:34:16 2012
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Sun, 1 Jul 2012 21:34:16 -0400
Subject: [M3devel] Simple change to WIDECHAR type
In-Reply-To: <20120630172401.DFE8E1A207C@async.async.caltech.edu>
References: <4AD84314-8E53-4A50-A51B-A0DF8DC910BC@m3w.org> <031FD9D2-8C62-48B3-8E2F-1122B7C1ED96@m3w.org> <20120630172401.DFE8E1A207C@async.async.caltech.edu>
Message-ID: <0F25CB80-6844-48A4-B34C-C08670CF96F8@cs.purdue.edu>

As far as I know, WIDECHAR was simply for the CM3 JVM to support Java char, which is 16-bit.

On Jun 30, 2012, at 1:24 PM, Mika Nystrom wrote:

>
> Dragiša Durić writes:
> ...
>>
>> Solution:
>> ======
>>
>> * Redefine WIDECHAR to hold at least 21-bit values, or create UNICHAR or
>> GLYPH (and leave WIDECHAR as it is for vertical compatibility) so we can
>> hold unencoded Unicode characters in scalar values in our Modula-3
>> programs, while preserving their properties.
>> * Implement properties, relations and methods defined for Unicode. With
>> ASCII, numeric order is everything. With Unicode - it is not. This is
>> probably a very big project, but we can start somewhere and let interested
>> parties build on it. Dirk Muysers did work in this regard already.
>> * Whoever thinks we don't need this and our "tradition" and "legacy" are
>> important, please read this:
>> http://unicode.org/standard/WhatIsUnicode.html .
>>
>> dd
>
> Given what you have said about the near-uselessness of WIDECHAR, does anything
> actually use it much? What breaks if it is redefined to be the same as, say,
> INTEGER?
(Or Word.T) > > CHAR is quite useful for processing 7-bit ASCII, and it would be lovely if > that could go back to using the SRC data structures. For people who do stuff > like write VLSI design tools... (probably many other large-scale applications > would like it too). > > Mika From dabenavidesd at yahoo.es Mon Jul 2 04:51:35 2012 From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.) Date: Mon, 2 Jul 2012 03:51:35 +0100 (BST) Subject: [M3devel] License compatibility In-Reply-To: <20120701194950.GA9673@topoi.pooq.com> Message-ID: <1341197495.89971.YahooMailClassic@web29703.mail.ird.yahoo.com> Hi all: technically they were binary license compatibles, I see you take too hard what I say thanks, but don't think so hard about this. But in the need of that you can use the compiler type checking for Modula-3, so most of what you say is true, also if the compiler is compatible perhaps would be question for Eric Muller, who wrote parts of it, the nice thing about Modula-3 was that it was everything object oriented (which is what Java claims about its System). Thanks in advance --- El dom, 1/7/12, Hendrik Boom escribi?: De: Hendrik Boom Asunto: Re: [M3devel] License compatibility Para: "m3devel at elegosoft.com" Fecha: domingo, 1 de julio, 2012 14:49 On Sun, Jul 01, 2012 at 08:10:16PM +0100, Daniel Alejandro Benavides D. wrote: > Hi all: > technically, the CM J-V-M was binary compatible with Sun JVM, wasn't > it? So in terms of binary compatibility CM3 is binary compatible with > Sun JDK You're not trying to tell me that I could use CM3 and Sun JDK interchangably, are you?? That would mean I can use the JDK to compile Modula 3 code.? I have my doubts. > (I guess the only version they had), wasn't that the idea to > port Java to Modula-3 easily?? Ando so if you can link Sun JDK with > Gcc I guess you can do it with CM3 at least technically. The question isn't whether we can link CM3 programs with gcc.? THe question is whether we can distribute such linked programs.? 
And that doesn't depend on the CM3 compiler as much as the CM3 run-time system. And it's not aa question of technical compatibility.? It's a matter off license compatibility.? And I suspet the only way we'll get *thst* to work is? to write a new run-time system and new libraries that *are* built with a GPL-compatibble license. Or hope the whole issue goes away as free software drifts to freeer licenses and we no longer need any GPL libraries. -- hendrik > Thanks in advance > > --- El dom, 1/7/12, Hendrik Boom escribi?: > > De: Hendrik Boom > Asunto: Re: [M3devel] License compatibility > Para: "m3devel at elegosoft.com" > Fecha: domingo, 1 de julio, 2012 13:58 > > On Sun, Jul 01, 2012 at 02:08:04PM -0400, Antony Hosking wrote: > > I thought LGPL allowed binary linkage without infection. > > Only if the program is distributed in such a way that the user can relink it > with updated versions of the LGPL library. I don't know if that's too > much to ask of the typical dumb user I've postulated. Considering how > I've had to recompile several m3 libraries just to go on using them with > libXaw, it may indeed be too much to expect. > > Now I don't mind sending out source code. I'm concerned with the end > user who minds receiving it. > > It would presumably be the Modula 3 libraries that pose the problem, I > suppose. I'm not talking about the compiler itself, which is not part > of my program or the libraries. I guess I'm concerned with the > libraries one cannot do without, like libm3. > > FSF claims that the GPL3 is compatible with more free licensess than the > GPL2. > > Is there a document somewhere that identifies just what the problem is > with out license? > > -- hendrik > > > > > Sent from my iPad > > > > On Jul 1, 2012, at 1:39 PM, Hendrik Boom wrote: > > > > > On Sat, Jun 30, 2012 at 08:45:17PM -0400, Antony Hosking wrote: > > >> Not compatible. FSF official. 
> > >> > > >> Sent from my iPhone > > > > > > So this presumably means it is impossible to distribute binary for any > > > Modula 3 program that uses a GPL library even if you include source code. > > > Because presumably the basic M3 run-time system is under the M3 license and therefore incompatible. > > > > > > Which means it's practically impossible to provide such a program to anyone > > > that doesn't understand how to use a compiler, which is most Windows users. > > > > > > Or is there some wiggle room somewhere? > > > > > > -- hendrik > > > > > >> > > >> On Jun 30, 2012, at 20:39, Hendrik Boom wrote: > > >> > > >>> I've heard, ages ago, that the SRC was not considered compatible with > > >>> the GPL. I'd really like to know if this is true. Not whether it > > >>> should be compatible, not whether people were afraid of it being > > >>> incompatible... not whether some people think it's cmopatible, but > > >>> whether it *is* compatible. > > >>> > > >>> Has anyone ever got a definitive answer to this question? > > >>> > > >>> If not, should I ask the FSF explicitly? > > >>> > > >>> -- hendrik > > >>> -------------- next part -------------- An HTML attachment was scrubbed... URL: From dragisha at m3w.org Mon Jul 2 10:09:43 2012 From: dragisha at m3w.org (=?utf-8?Q?Dragi=C5=A1a_Duri=C4=87?=) Date: Mon, 2 Jul 2012 10:09:43 +0200 Subject: [M3devel] Simple change to WIDECHAR type In-Reply-To: <0F25CB80-6844-48A4-B34C-C08670CF96F8@cs.purdue.edu> References: <4AD84314-8E53-4A50-A51B-A0DF8DC910BC@m3w.org> <031FD9D2-8C62-48B3-8E2F-1122B7C1ED96@m3w.org> <20120630172401.DFE8E1A207C@async.async.caltech.edu> <0F25CB80-6844-48A4-B34C-C08670CF96F8@cs.purdue.edu> Message-ID: To be compatible, at least partially, with some other solution. Completeness of that other solution did not rub magically on cm3 just because they invented WIDECHAR as standard scalar type. 
On Jul 2, 2012, at 3:34 AM, Tony Hosking wrote:

> As far as I know, WIDECHAR was simply for the CM3 JVM to support Java char, which is 16-bit.
>
> On Jun 30, 2012, at 1:24 PM, Mika Nystrom wrote:
>
>>
>> Dragiša Durić writes:
>> ...
>>>
>>> Solution:
>>> ======
>>>
>>> * Redefine WIDECHAR to hold at least 21-bit values, or create UNICHAR or
>>> GLYPH (and leave WIDECHAR as it is for vertical compatibility) so we can
>>> hold unencoded Unicode characters in scalar values in our Modula-3
>>> programs, while preserving their properties.
>>> * Implement properties, relations and methods defined for Unicode. With
>>> ASCII, numeric order is everything. With Unicode - it is not. This is
>>> probably a very big project, but we can start somewhere and let interested
>>> parties build on it. Dirk Muysers did work in this regard already.
>>> * Whoever thinks we don't need this and our "tradition" and "legacy" are
>>> important, please read this:
>>> http://unicode.org/standard/WhatIsUnicode.html .
>>>
>>> dd
>>
>> Given what you have said about the near-uselessness of WIDECHAR, does anything
>> actually use it much? What breaks if it is redefined to be the same as, say,
>> INTEGER? (Or Word.T)
>>
>> CHAR is quite useful for processing 7-bit ASCII, and it would be lovely if
>> that could go back to using the SRC data structures. For people who do stuff
>> like write VLSI design tools... (probably many other large-scale applications
>> would like it too).
>>
>> Mika
>

From rodney_bates at lcwb.coop Mon Jul 2 16:50:18 2012
From: rodney_bates at lcwb.coop (Rodney Bates)
Date: Mon, 2 Jul 2012 07:50:18 -0700
Subject: [M3devel] UTF-8 TEXT
Message-ID: <20120702075018.EEE2B9F6@resin11.mta.everyone.net>

-Rodney Bates

--- antony.hosking at gmail.com wrote:

From: Antony Hosking
To: "Rodney M. Bates"
Cc: "m3devel at elegosoft.com"
Subject: Re: [M3devel] UTF-8 TEXT
Date: Thu, 28 Jun 2012 10:37:36 -0400

Why not simply say that CHAR is an enumeration representing all of UTF-32?
The current definition merely says that CHAR is an enumeration containing *at least* 256 elements.
We would need to translate the current Latin-1 literals into UTF-32.
And we could simply have a new literal form for Unicode literals.

This is almost what I would propose to do, with a couple of differences:
Leave CHAR alone and fix WIDECHAR to handle the entire Unicode space.
I am sure there is lots of existing code that depends on the implementation
properties: ORD(FIRST(CHAR))=0, ORD(LAST(CHAR))=255, and BYTESIZE(CHAR)=1.

Then I would define, in the language itself, that WIDECHAR is Unicode, not UTF-32.
Thus ORD(LAST(WIDECHAR))=16_10FFFF. Then I would make it an implementation
characteristic that BYTESIZE(WIDECHAR)=4.

On Jun 27, 2012, at 10:12 PM, Rodney M. Bates wrote:

>
>
> On 06/27/2012 07:32 PM, Antony Hosking wrote:
>> So what do we do about 6-byte UTF-8 code points? They won't fit in WIDECHAR. Surely we should allow accessing a UTF-8 character as a CARDINAL and be done with it?
>>
>
> Absolutely. Except I think a better way is to make WIDECHAR big enough to hold all of
> Unicode.
>
>> Sent from my iPad
>>
>> On Jun 27, 2012, at 3:20 PM, "Rodney M. Bates" wrote:
>>
>>>
>>>
>>> On 06/26/2012 10:30 PM, Hendrik Boom wrote:
>>>> On Tue, Jun 26, 2012 at 04:22:22PM -0400, Coleburn, Randy wrote:
>>>>> I seem to recall that Rodney did some work a while back relating to TEXT.
>>>>> Rodney, can you weigh in on some of this?
>>>>> --Randy Coleburn
>>>>>
>>>>> From: Dragiša Durić [mailto:dragisha at m3w.org]
>>>>> Sent: Tuesday, June 26, 2012 12:46 PM
>>>>> To: Jay
>>>>> Cc: m3devel
>>>>> Subject: EXT Re: [M3devel] AND (., 16_ff). Not serious - or so I hope!
>>>>>
>>>>> You had an idea in another message. Store length!
>>>>>
>>>>> Another idea - store a partial list of indices to character locations. So whatever one does, that list can be used/expanded. Whatever storage issues this makes, they are probably minor compared to the 32-bit-WIDECHAR-for-all idea.
>>>>
>>>> Most of the time, you don't need explicit integer indexes to character
>>>> locations. What you do need is an operation that fetches a character
>>>> given the string and its index (whatever data structure that index is),
>>>> and one that increments the index past that character. As long as you
>>>> can save an index and use it later on the same string, that's probably
>>>> all you ever need. And with a simple TEXT representation (such as the
>>>> obvious array of bytes containing characters of various widths) a byte
>>>> index is all you need (note: NOT a character index). It's easy even to
>>>> use TEXT and its integer indices as the data representation, as long as
>>>> you use the proper functions to parse the characters and increment the
>>>> indices by amounts that might differ from 1.
>>>>
>>>> And if your source code is represented in UTF-8, the representation that
>>>> requires little extra compiler effort to parse, your TEXT strings will
>>>> automagically appear in UTF-8.
>>>
>>> The original designers of the language and its libraries have given us
>>> two different abstractions for handling character strings (in addition
>>> to plain arrays): 1) Text, and 2) Wr, Rd, and their cousins.
>>>
>>> Text is highly general and easy to use. Concatenations and substrings
>>> are easy. Semantics, to its clients, are value semantics, similar to INTEGER.
>>> Random access by *character* number is easy and, hopefully, implemented
>>> with efficiency at least better than O(n).
>>>
>>> Wr and friends restrict you to sequential access, at least mostly, but
>>> gain implementation convenience and efficiency as a result.
>>> >>> I feel very stongly that we should *not* take away the full generality >>> of Text, especially efficient random access, to handle variable-length >>> character encodings in strings. For these, lets make more friends of >>> Wr and Rd, which already assume sequential access. For example, a >>> filter pipe that sequentially reads a Text/Array/stream, applies a UTF-8 >>> interpretation to its bytes, and delivers a stream of Unicode characters, >>> in variables of type WIDECHAR. >>> >>> Text should preserve the abstraction that it's a string of characters, >>> generalized as it already is in cm3, to have type WIDECHAR, so they can be any >>> Unicode character. The internal representation should, usually, not be >>> of concern. >>> >>> Note that nowhere in Text are character values transferred between >>> a Text.T and any form of I/O stream. In the Text abstraction, all >>> characters go in and out of a Text.T in variables of type CHAR, >>> WIDECHAR, and arrays thereof. IO, etc. is only done in streams, >>> e.g, TextWr. We can easily add new variants of these that encode/decode >>> by various rules. >>> >>> Of course, it is still valid to put a string of bytes in a Text.T and >>> apply, e.g., UTF-8 interpretation yourself. But that's lower-level >>> programming, and shouldn't confuse the abstraction. >>> >>>> >>>> I can see a use for various wide characters -- the things you extract >>>> from a TEXT by parsing biits of it, but none for anything >>>> really new complicated for wide TEXT. >>>> >>>> The only confusing thing is that the existing operations for extracting >>>> bytes from TEXT have names that suggest they are extracting characters. >>>> >>> >>> I think it's more than a suggestion. I think the abstraction clearly >>> considers them characters. And it should stay that way. 
If you want, >>> at a higher level of code, to treat them as bytes, that's fine, but the >>> abstraction continues to view them as characters (which only you, the >>> client, know is not really so.) >>> >>>> -- Hendrik >>>> >> From rodney_bates at lcwb.coop Mon Jul 2 17:04:25 2012 From: rodney_bates at lcwb.coop (Rodney Bates) Date: Mon, 2 Jul 2012 08:04:25 -0700 Subject: [M3devel] Simple change to WIDECHAR type Message-ID: <20120702080425.EEE2B81F@resin11.mta.everyone.net> -Rodney Bates --- dragisha at m3w.org wrote: From: Dragi?a Duri? To: Antony Hosking Cc: m3devel Subject: Re: [M3devel] Simple change to WIDECHAR type Date: Sat, 30 Jun 2012 09:33:00 +0200 Current GetChar/SetChars and GetWideChar/SetWideChars are not character-level access methods, in terms of Unicode. They are "byte-level", fixed width data accesses. Reason: Both CHAR (cardinality 2^8) and WIDECHAR (cardinality 2^16) based strings must use one or more characters to represent whole Unicode (cardinality 2^20). If we must encode in any case, then we don't have any benefit of WIDECHAR (as it is implemented/understood now) at all! To represent Unicode with either CHAR or WIDECHAR based TEXTs - we must use either UTF-8 or UTF-16. Both are one-to-multibyte encodings, encoding one Unicode character to either 1-4 CHARs or 1-2 WIDECHARs. What exactly is meaning (at Modula-3 usual levels of abstraction) of character-level access? Do we need whatever bit pattern physically happening at some location in our data's representation. Or maybe we need numerical representation of actual, visually distinguishable in written representation, Unicode character value? One from that set of 2^20 elements? What is meaning of Text.Sub() based on byte-level access operations where our resulting TEXTs first character is in fact a prefix of some Unicode characters encoding? And/or where our last character is invalid/incomplete suffix of some encoded character. 
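The Text.Sub question above can be made concrete with a minimal sketch. It is written in Python purely for brevity and is not cm3 code; it assumes a string stored as raw UTF-8 bytes, and the helper names (`is_valid_utf8`, `utf16_units`) are hypothetical. It shows a byte-indexed substring ending in the middle of an encoded character, and a non-BMP character needing two 16-bit units.

```python
# Illustration only: a string stored as raw UTF-8 bytes, sliced by byte index.
NAME = "Dragi\u0161a Duri\u0107"      # 13 code points
RAW = NAME.encode("utf-8")            # 15 bytes: U+0161 and U+0107 take 2 bytes each


def is_valid_utf8(chunk: bytes) -> bool:
    """Return True if chunk decodes as complete, well-formed UTF-8."""
    try:
        chunk.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


def utf16_units(text: str) -> int:
    """Number of 16-bit code units needed to store text as UTF-16."""
    return len(text.encode("utf-16-le")) // 2


head = RAW[:6]   # byte-level "Sub": ends on the lead byte of U+0161
tail = RAW[6:]   # starts on the continuation byte of U+0161

if __name__ == "__main__":
    print(is_valid_utf8(head), is_valid_utf8(tail))   # False False
    print(utf16_units("\U0001D11E"))                  # 2 (a surrogate pair)
```

Both halves of the byte-level split are malformed on their own, which is the situation the paragraph describes; a split at byte 7 instead of 6 would fall on a character boundary.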
Since when are fast and efficient operations doing something we don't need at all our priority? We are getting nothing at all with WIDECHAR. No. Single. Thing. WIDECHAR does not make us closer to Unicode at all. WIDECHAR, together with CHAR (in context of our current TEXT) makes two almost-solutions to Unicode problem and existence of WIDECHAR scalar type makes us a bit closer to Unicode almost-solution of C world and nothing else. ------------------------------------------------------------------------------------------------------------------------------------------- I think the only reason why we got nothing is that WIDECHAR isn't wide enough. Let's fix that. --------------------------------------------------------------------------------------------------------------------------------------- Currently, neither GetChar nor GetWideChar can get "a character at nth position". Reason: No character scalar type to keep any Unicode character. Solution: ====== * Redefine WIDECHAR to hold at least 20 bit values, or create UNICHAR or GLYPH (and leave WIDECHAR as it is for vertical compatibility) so we can hold unencoded Unicode characters in scalar values in our Modula-3 programs, while preserving their properties. * Implement properties, relations and methods defined for Unicode. With ASCII, numeric order is everything. With Unicode - it is not. This is probably very big project but we can start somewhere, and let interested parties build on it. Dirk Muysers did work in this regard already. * Whoever thinks we don't need this and our "tradition" and "legacy" are important, please read this: http://unicode.org/standard/WhatIsUnicode.html . dd On Jun 29, 2012, at 5:52 PM, Dragi?a Duri? wrote: > That, or UTF-16 encoding on top of current WIDECHAR. > > On Jun 29, 2012, at 3:50 PM, Antony Hosking wrote: > >> That will change WIDECHAR from a value consuming 16-bits of memory into a value consuming 32-bits of memory. 
In other words, all TEXT containing WIDECHAR will double in size. >> >> On Jun 29, 2012, at 4:35 AM, Dragiša Durić wrote: >> >>> m3front/src/builtinTypes/WCharr.m3, line: >>> >>> T := EnumType.New (16_10000, elts); >>> >>> to >>> >>> T := EnumType.New (16_100000, elts); >>> >>> Will this break things? Any other assumptions anywhere? >>> >> > From rodney_bates at lcwb.coop Mon Jul 2 17:09:25 2012 From: rodney_bates at lcwb.coop (Rodney Bates) Date: Mon, 2 Jul 2012 08:09:25 -0700 Subject: [M3devel] Some earlier work Message-ID: <20120702080925.EEE2BB96@resin11.mta.everyone.net> Hmm. This looks very much like the original Text.i3, with CHAR replaced by UText.Char. Dare I infer that it was inspired that way? It presents just the abstraction that I think Text itself should present. -Rodney Bates --- dragisha at m3w.org wrote: From: Dragiša Durić To: m3devel Subject: [M3devel] Some earlier work Date: Sat, 30 Jun 2012 10:56:27 +0200 This is how we implemented UTF8 strings over current TEXTs. The current implementation is UNSAFE and uses glibc utf8 methods. Nothing too complicated and nothing we can't implement in Modula-3/portable C.
=====
INTERFACE UText;
IMPORT Word;
TYPE
  T = TEXT;
  Char = CARDINAL;
PROCEDURE Cat(t, u: T): T;
PROCEDURE Equal(t, u: T): BOOLEAN;
PROCEDURE GetChar(t: T; i: CARDINAL): Char;
PROCEDURE ByteSize(t: T): CARDINAL;
PROCEDURE Length(t: T): CARDINAL;
PROCEDURE Empty(t: T): BOOLEAN;
PROCEDURE Sub(t: T; start: CARDINAL; length: CARDINAL := LAST(CARDINAL)): T;
PROCEDURE SetChars(VAR a: ARRAY OF Char; t: T);
PROCEDURE FromChar(ch: Char): T;
PROCEDURE FromChars(READONLY a: ARRAY OF Char): T;
PROCEDURE Hash(t: T): Word.T;
PROCEDURE Compare(t1, t2: T): [-1..1];
PROCEDURE FindChar(t: T; ch: Char; start: CARDINAL := 0): INTEGER;
PROCEDURE FindCharR(t: T; ch: Char; start: CARDINAL := LAST(INTEGER)): INTEGER;
END UText.
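How a few of these UText operations might behave over UTF-8 bytes can be sketched as follows. This is a hedged illustration in Python, not the glibc-based implementation mentioned in the message: the function names only mirror the interface, the bodies are hypothetical, and the input is assumed to be well-formed UTF-8.

```python
def utf8_starts(data: bytes) -> list:
    """Byte offsets where a code point begins (every non-continuation byte)."""
    return [i for i, b in enumerate(data) if (b & 0xC0) != 0x80]


def byte_size(data: bytes) -> int:
    """Analogue of ByteSize: storage units, not characters."""
    return len(data)


def length(data: bytes) -> int:
    """Analogue of Length: number of Unicode code points."""
    return len(utf8_starts(data))


def get_char(data: bytes, i: int) -> int:
    """Analogue of GetChar: the i-th code point as an integer (Char = CARDINAL)."""
    starts = utf8_starts(data) + [len(data)]
    return ord(data[starts[i]:starts[i + 1]].decode("utf-8"))


def sub(data: bytes, start: int, count: int) -> bytes:
    """Analogue of Sub: indices count code points, so no character is split."""
    starts = utf8_starts(data) + [len(data)]
    n = len(starts) - 1
    begin = min(start, n)
    end = min(begin + count, n)
    return data[starts[begin]:starts[end]]


if __name__ == "__main__":
    name = "Dragi\u0161a Duri\u0107".encode("utf-8")
    print(byte_size(name), length(name))     # 15 13
    print(hex(get_char(name, 5)))            # 0x161
    print(sub(name, 8, 5).decode("utf-8"))   # the last five characters
```

Each call here rescans the bytes, so character indexing is O(n); caching the offset list, or stepping by byte offsets, would trade that cost against extra storage.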
From dragisha at m3w.org Mon Jul 2 17:13:03 2012 From: dragisha at m3w.org (=?utf-8?Q?Dragi=C5=A1a_Duri=C4=87?=) Date: Mon, 2 Jul 2012 17:13:03 +0200 Subject: [M3devel] Some earlier work In-Reply-To: <20120702080925.EEE2BB96@resin11.mta.everyone.net> References: <20120702080925.EEE2BB96@resin11.mta.everyone.net> Message-ID: <99FFC5CA-99A9-4E57-A41C-C82624123312@m3w.org> With Brand added, it is ready for generic containers from libm3. Yes, it was inspired by Text.i3. Idea was to make as thin an interface as possible. On Jul 2, 2012, at 5:09 PM, Rodney Bates wrote: > Hmm. This looks very much like original Text.i3, with CHAR replaced by UText.Char. > Dare I infer that is was inspired that way? It presents just the abstraction that > I think Text itself should present. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dabenavidesd at yahoo.es Mon Jul 2 17:27:56 2012 From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.) Date: Mon, 2 Jul 2012 16:27:56 +0100 (BST) Subject: [M3devel] Simple change to WIDECHAR type In-Reply-To: Message-ID: <1341242876.32584.YahooMailClassic@web29702.mail.ird.yahoo.com> Hi all: I don't know if I would agree with the kind of thinking that Modula-3 needed CHAR and WIDECHAR for a JVM execution engine device, but for the interpretation function. For instance what would be the purpose of handling more than 140 CHARS in a mobile phone, I don't see the need for that, or if you need to target many languages is useful but in a compiler setting not in an execution environment like CM J-V-M For instance let's suppose you have a Win16 device and an IBM JVM ready hardware, would you need two types of char? Maybe but for efficiency reasons, not for anything more. I agree with WIDECHAR devices in the sense of a General purpose language is better than many language encodings but we need to see the devices for that, for instance mobile phones, etc. Normally JVM-ready phones. 
Thanks in advance --- On Mon, 2/7/12, Dragiša Durić wrote: From: Dragiša Durić Subject: Re: [M3devel] Simple change to WIDECHAR type To: "Tony Hosking" CC: m3devel at elegosoft.com Date: Monday, 2 July 2012 03:09 To be compatible, at least partially, with some other solution. The completeness of that other solution did not magically rub off on cm3 just because they invented WIDECHAR as a standard scalar type. On Jul 2, 2012, at 3:34 AM, Tony Hosking wrote: > As far as I know, WIDECHAR was simply for the CM3 JVM to support Java char which is 16-bit. > > On Jun 30, 2012, at 1:24 PM, Mika Nystrom wrote: > >> >> Dragiša Durić writes: >> ... >>> >>> Solution: >>> ====== >>> >>> * Redefine WIDECHAR to hold at least 20 bit values, or create UNICHAR or >>> GLYPH (and leave WIDECHAR as it is for vertical compatibility) so we can >>> hold unencoded Unicode characters in scalar values in our Modula-3 >>> programs, while preserving their properties. >>> * Implement properties, relations and methods defined for Unicode. With >>> ASCII, numeric order is everything. With Unicode - it is not. This is >>> probably very big project but we can start somewhere, and let interested >>> parties build on it. Dirk Muysers did work in this regard already. >>> * Whoever thinks we don't need this and our "tradition" and "legacy" are >>> important, please read this: >>> http://unicode.org/standard/WhatIsUnicode.html . >>> >>> dd >> >> Given what you have said about the near-uselessness of WIDECHAR, does anything >> actually use it much? What breaks if it is redefined to be the same as, say, >> INTEGER? (Or Word.T) >> >> CHAR is quite useful for processing 7-bit ASCII, and it would be lovely if >> that could go back to using the SRC data structures. For people who do stuff >> like write VLSI design tools... (probably many other large-scale applications >> would like it too). >> >>
Mika > From hosking at cs.purdue.edu Mon Jul 2 17:57:14 2012 From: hosking at cs.purdue.edu (Tony Hosking) Date: Mon, 2 Jul 2012 11:57:14 -0400 Subject: [M3devel] UTF-8 TEXT In-Reply-To: <20120702075018.EEE2B9F6@resin11.mta.everyone.net> References: <20120702075018.EEE2B9F6@resin11.mta.everyone.net> Message-ID: <2293DFAB-7532-4969-8A77-66FBB60EE427@cs.purdue.edu> On Jul 2, 2012, at 10:50 AM, Rodney Bates wrote: > > > -Rodney Bates > > --- antony.hosking at gmail.com wrote: > >> From: Antony Hosking >> To: "Rodney M. Bates" >> Cc: "m3devel at elegosoft.com" >> Subject: Re: [M3devel] UTF-8 TEXT >> Date: Thu, 28 Jun 2012 10:37:36 -0400 >> >> Why not simply say that CHAR is an enumeration representing all of UTF-32? >> The current definition merely says that CHAR is an enumeration containing *at least* 256 elements. >> We would need to translate the current Latin-1 literals into UTF-32. >> And we could simply have a new literal form for Unicode literals. >> > This is almost what I would propose to do, with a couple of differences: > > Leave CHAR alone and fix WIDECHAR to handle the entire Unicode space. > I am sure there is lots of existing code that depends on the implementation > properties: ORD(FIRST(CHAR))=0, ORD(LAST(CHAR))=255, and BYTESIZE(CHAR)=1. Fair enough. Would we leave the encoding of CHAR as ISO-Latin-1? We'd still need translation from ISO-Latin-1 to UTF-8, wouldn't we? > Then I would define, in the language itself, that WIDECHAR is Unicode, not > UTF-32. Thus ORD(LAST(WIDECHAR))=16_10FFFF. Then I would make it an > implementation characteristic that BYTESIZE(WIDECHAR)=4. I note this text from the Wikipedia entry for UTF-32: Though a fixed number of bytes per code point appear convenient, it is not as useful as it appears. It makes truncation easier but not significantly so compared to UTF-8 and UTF-16.
It does not make it faster to find a particular offset in the string, as an "offset" can be measured in the fixed-size code units of any encoding. It does not make calculating the displayed width of a string easier except in limited cases, since even with a "fixed width" font there may be more than one code point per character position (combining marks) or more than one character position per code point (for example CJK ideographs). Combining marks mean editors cannot treat one code point as being the same as one unit for editing. Editors that limit themselves to left-to-right languages and precomposed characters can take advantage of fixed-sized code units, but such editors are unlikely to support non-BMP characters and thus can work equally well with 16-bit UTF-16 encoding. Does this argue against WIDECHAR=UTF-32? Would we be better off simply saying WIDECHAR=UTF-16 and leaving things as they are? Yes, it would make the definition of WideCharAt a little odd, because the index would be defined in 16-bit units rather than UTF-16 glyph code-points. By the way, if we did change WIDECHAR to an enumeration containing 16_110000 elements then the stored (memory) size of WIDECHAR would be 4 bytes given the current CM3 implementation of enumerations, which chooses a (non-PACKED) stored size of 1/2/4/8 bytes depending on the number of elements. From hendrik at topoi.pooq.com Mon Jul 2 18:54:44 2012 From: hendrik at topoi.pooq.com (Hendrik Boom) Date: Mon, 2 Jul 2012 12:54:44 -0400 Subject: [M3devel] UTF-8 TEXT In-Reply-To: <2293DFAB-7532-4969-8A77-66FBB60EE427@cs.purdue.edu> References: <20120702075018.EEE2B9F6@resin11.mta.everyone.net> <2293DFAB-7532-4969-8A77-66FBB60EE427@cs.purdue.edu> Message-ID: <20120702165444.GA20908@topoi.pooq.com> On Mon, Jul 02, 2012 at 11:57:14AM -0400, Tony Hosking wrote: > > On Jul 2, 2012, at 10:50 AM, Rodney Bates wrote: > > > > > > > -Rodney Bates > > > > --- antony.hosking at gmail.com wrote: > > > >> From: Antony Hosking > >> To: "Rodney M. Bates" > >> Cc: "m3devel at elegosoft.com" > >> Subject: Re: [M3devel] UTF-8 TEXT > >> Date: Thu, 28 Jun 2012 10:37:36 -0400 > >> > >> Why not simply say that CHAR is an enumeration representing all of UTF-32? > >> The current definition merely says that CHAR is an enumeration containing *at least* 256 elements. > >> We would need to translate the current Latin-1 literals into UTF-32. > >> And we could simply have a new literal form for Unicode literals. > >> > > This is almost what I would propose to do, with a couple of differences: > > > > Leave CHAR alone and fix WIDECHAR to handle the entire Unicode space. > > I am sure there is lots of existing code that depends on the implementation > > properties: ORD(FIRST(CHAR))=0, ORD(LAST(CHAR))=255, and BYTESIZE(CHAR)=1. > > Fair enough. Would we leave the encoding of CHAR as ISO-Latin-1? We'd still need translation from ISO-Latin-1 to UTF-8, wouldn't we? > > > Then I would define, in the language itself, that WIDECHAR is Unicode, not > > UTF-32. Thus ORD(LAST(WIDECHAR))=16_10FFFF. Then I would make it an > > implementation characteristic that BYTESIZE(WIDECHAR)=4. > > I note this text from the Wikipedia entry for UTF-32: I had just looked this paragraph up on Wikipedia to post it when I noticed you had already done so. > > Though a fixed number of bytes per code point appear convenient, it is > not as useful as it appears. Which is the gist of my objection to implementing TEXT as fixed-width 16-, 20-, or 32-bit storage units. It wastes space without much gain. (An exception might be made for a few languages that can be efficiently stored in 16 bits but not in UTF-8.) > It makes truncation easier but not significantly so compared to UTF-8 > and UTF-16. > It does not make it faster to find a particular offset in the string, > as an "offset" can be measured in the fixed-size code units of any > encoding. Exactly why I want character-extraction to be expressible in efficient "offsets" with implementation-independent specifications (though possibly implementation-dependent values). I don't mind if character counts are also made available, as long as it doesn't impose extra overhead on those that don't use them. Operations with offsets that allow one to extract characters and skip over characters are sufficient for most purposes. The use of efficient offsets is independent of the question of access to individual bytes. > It does not make calculating the displayed width of a string easier > except in limited cases, since even with a "fixed width" font there > may be more than one code point per character position (combining > marks) or more than one character position per code point (for example > CJK ideographs). > Combining marks mean editors cannot treat one code point as being the > same as one unit for editing. Editors that limit themselves to > left-to-right languages and precomposed characters can take advantage > of fixed-sized code units, but such editors are unlikely to support > non-BMP characters and thus can work equally well with 16-bit UTF-16 > encoding. I'd like to point out that most string processing doesn't really deal in characters at all, but in terms of words, phrases, symbols, and other linguistic structures that have to be dealt with using parsing. Assembling bytes of UTF-8 into characters is just more parsing, and should be viewed as such. For many applications it isn't even necessary to decode UTF-8, because it can be copied without being aware of its character structure. And if the language ascribes special meanings only to some of the first 128 characters, these can be unambiguously recognised in UTF-8 without decoding UTF-8 at all. This does argue for having byte access as well. > > Does this argue against WIDECHAR=UTF-32? Would we be better off simply saying WIDECHAR=UTF-16 and leaving things as they are? Yes, it would make the definition of WideCharAt a little odd, because the index would be defined in 16-bit units rather than UTF-16 glyph code-points. > > By the way, if we did change WIDECHAR to an enumeration containing 16_110000 elements then the stored (memory) size of WIDECHAR would be 4 bytes given the current CM3 implementation of enumerations, which chooses a (non-PACKED) stored size of 1/2/4/8 bytes depending on the number of elements. 16-bit WIDECHARs would seem to me to be the worst choice of all, except in the special case that you *know* that all the characters you'll ever have to deal with fit in 16 bits and most of them won't fit in 8. I'd use WIDECHAR when I'm dealing with individual characters/Unicode code points. I'd use TEXT when dealing with strings. Or some custom data structure that can handle text containing strings and other data structures (such as parse trees). Generally, there won't be a lot of WIDECHARs around in a running program, so I don't care much about the few extra bytes. -- hendrik From dabenavidesd at yahoo.es Mon Jul 2 22:44:44 2012 From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.) Date: Mon, 2 Jul 2012 21:44:44 +0100 (BST) Subject: [M3devel] UTF-8 TEXT In-Reply-To: <20120702165444.GA20908@topoi.pooq.com> Message-ID: <1341261884.27797.YahooMailClassic@web29706.mail.ird.yahoo.com> Hi all: I was thinking of back-end encoding of the CHARs in WIDECHAR using Rd/Wr-Rep but the mentioned modules are done around the idea of efficient machine implementation. I just think that the only need for having a UTF-8 or whatever encoding for CHARs and WIDECHAR is in a machine with those types. Numerous ?-coded "rare little" JVM machines are capable of handling that kind of Unicodes but anything else is just spurious to me, make that encoding for everybody in CM3. There isn't any other machine with that byte encoding that I know about so the good news is that the machines are reduced to: 1) Industrial Size scenario JVM 2) Small sized vendor machines, a web browser client like a JS? I hope with that we find some common ground for a solution for the issue.
Thanks in advance From dmuysers at hotmail.com Fri Jul 6 11:23:34 2012 From: dmuysers at hotmail.com (Dirk Muysers) Date: Fri, 6 Jul 2012 11:23:34 +0200 Subject: [M3devel] A question for our language lawyers Message-ID: The report says (2.6.9) "The values in the array will be arbitrary values of their type." Now, ParseParams in its "init" method allocates an array of BOOLEANs and relies on the fact that it is supposedly initialised with FALSE values.
On the other hand the report says (2.2.4)
"The constant default is a default value used when a record is constructed or allocated"

If I allocate an array of records, which statement is stronger:
- the array contains arbitrary record values?
- the array record fields will be initialised to their default values?

The ParseParams "init" method is obviously erroneous and works only by virtue of a happy combination of circumstances. But how is the report to be interpreted in the second case?

From dabenavidesd at yahoo.es Fri Jul 6 18:06:20 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Fri, 6 Jul 2012 17:06:20 +0100 (BST)
Subject: [M3devel] A question for our language lawyers
In-Reply-To:
Message-ID: <1341590780.97298.YahooMailClassic@web29702.mail.ird.yahoo.com>

Hi all:
if that's true that you say "relies " Init hence the MODULE is wrong (is not) to specify that. But record rules haven't anything to do here. But anyway you may have a point in that record initialization is less important than record construction (cf. p.53, s2.6.8, SPwM3), and that in the array case, it might be that it is stronger the array initialization (as a declared variable) than array construction but are decided in two different cases for WITH expression, with 'a' as a TEXT WITH non-initialization but WITH p as a READONLY array-valued expression which doesn't do what you say it needs, so you found a bug known by Jay of "incorrect" un-initialized values in m3cg, or m3cc or m3gcc or m3cgc. In that case you might need an array of uninitialized expressions else construct the value correctly before entering the inner WITH.
Thanks in advance

From rodney_bates at lcwb.coop Fri Jul 6 18:28:10 2012
From: rodney_bates at lcwb.coop (Rodney M. Bates)
Date: Fri, 06 Jul 2012 11:28:10 -0500
Subject: [M3devel] Simple change to WIDECHAR type
In-Reply-To: <95923C4D-B4F9-4245-809B-C3D186B8679C@m3w.org>
References: <4AD84314-8E53-4A50-A51B-A0DF8DC910BC@m3w.org> <031FD9D2-8C62-48B3-8E2F-1122B7C1ED96@m3w.org> <20120630172401.DFE8E1A207C@async.async.caltech.edu> <7BCB3BB7-D2F2-470B-8E70-2A7FF274E0FC@m3w.org> <95923C4D-B4F9-4245-809B-C3D186B8679C@m3w.org>
Message-ID: <4FF7121A.9000909@lcwb.coop>

This is the result of the fact that your editor is writing UTF-8, while the compiler is reading in ISO-latin-1, as the language specifies. This was sensible at the time it was defined, but has been overcome by the advent and proliferation of Unicode.

The abstract code point values in the range 16_80..16_FF are indeed the same in Unicode and ISO-latin-1, but the bit encoding rules are different.
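A small sketch of that difference, in Python rather than Modula-3 (an illustration only; it does not model how cm3 itself reads source files): the same code point gets different bytes under the two encodings, and reading UTF-8 bytes as ISO-Latin-1 turns every byte into one character, which is consistent with the length of 15 reported for the 13-character name.

```python
# U+00E9 is the same code point in ISO-Latin-1 and in Unicode,
# but the bytes written for it differ between the two encodings.
e_acute = "\u00e9"
latin1_bytes = e_acute.encode("latin-1")   # one byte:  E9
utf8_bytes = e_acute.encode("utf-8")       # two bytes: C3 A9

# A UTF-8 editor saves this 13-character name as 15 bytes, because the
# two non-ASCII letters (U+0161 and U+0107) take two bytes each.
name = "Dragi\u0161a Duri\u0107"
saved = name.encode("utf-8")

# A reader that assumes ISO-Latin-1 maps every byte to one character.
misread = saved.decode("latin-1")

print(latin1_bytes.hex(), utf8_bytes.hex())
print(len(name), len(saved), len(misread))
```

The second line of output is `13 15 15`: the misread string has one character per byte, not one per code point.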
The simple and correct solution is to fix the compiler so that, like many programs today, it can be told to use one of several encodings when interpreting its input. Then set it the same as you set your editor.

On 07/01/2012 03:52 AM, Dragiša Durić wrote:
> Text.Length(Dragiša Durić)= 15
>
> out from:
>   WITH me = W"Dragiša Durić" DO
>     IO.Put("Text.Length(" & me & ")= " & Fmt.Int(Text.Length(me)) & "\n");
>   END;
>
> On Jun 30, 2012, at 8:12 PM, Dragiša Durić wrote:
>
>> Jay, please explain this to me. My editor creates UTF8 files, for example. What cm3 expects after W" ?
>
>

From dabenavidesd at yahoo.es Fri Jul 6 19:08:25 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Fri, 6 Jul 2012 18:08:25 +0100 (BST)
Subject: [M3devel] Simple change to WIDECHAR type
In-Reply-To: <4FF7121A.9000909@lcwb.coop>
Message-ID: <1341594505.40475.YahooMailClassic@web29703.mail.ird.yahoo.com>

Hi all:
I think the problem is how to encode but not to REVEAL (which would need machine identification, so a generic target is my preferred abstraction as CM3 tried to do) the language encoding explicitly (we don't like to reveal anything of the machine from Modula-3 sense standard point of view you might need a language redefinition), I think if one needs that is because is on a machine like that. So, in a given platform you might know the encoding and that's all. The other approach is just very hard to use, to put burden of choice, my thinking is that if you need that you might end needing generics that tell at compile time what to use. Of course Type checking methods are done at instantiation time, but nevertheless is helpful that these other settings are done at compile time (which make sense for the question why do I need to compile this code).
That's because in other machines you might need to exploit three times the needed time to encode, decode and encode again (cost affects if you think in changing parameters so you might not touch that for the benefit of third parties as a default). This matters in phones where you don't have time to do that, and generally any type of type machine, so in a hard-coded way this is not helpful option for everybody at all as well. The machine-dependent solution helps if you can't compile the thing there (cross-compilations or pre-compiled binaries), but anyway I guess if we want Java compatibility (I do as a platform for binary compatibility but just when it's needed not in every execution environment, say a real HW implemented JVMs). So basically the language implementation needs to know that nobody else means that module wise model might need to be introduced, which is not something we have now.
Thanks in advance

From rodney_bates at lcwb.coop Fri Jul 6 19:54:32 2012
From: rodney_bates at lcwb.coop (Rodney M. Bates)
Date: Fri, 06 Jul 2012 12:54:32 -0500
Subject: [M3devel] UTF-8 TEXT
In-Reply-To: <2293DFAB-7532-4969-8A77-66FBB60EE427@cs.purdue.edu>
References: <20120702075018.EEE2B9F6@resin11.mta.everyone.net> <2293DFAB-7532-4969-8A77-66FBB60EE427@cs.purdue.edu>
Message-ID: <4FF72658.905@lcwb.coop>

On 07/02/2012 10:57 AM, Tony Hosking wrote:
>
> On Jul 2, 2012, at 10:50 AM, Rodney Bates wrote:
>
>>
>>
>> -Rodney Bates
>>
>> --- antony.hosking at gmail.com wrote:
>>
>>> From: Antony Hosking
>>> To: "Rodney M. Bates"
>>> Cc: "m3devel at elegosoft.com"
>>> Subject: Re: [M3devel] UTF-8 TEXT
>>> Date: Thu, 28 Jun 2012 10:37:36 -0400
>>>
>>> Why not simply say that CHAR is an enumeration representing all of UTF-32?
>>> The current definition merely says that CHAR is an enumeration containing *at least* 256 elements.
>>> We would need to translate the current Latin-1 literals into UTF-32.
>>> And we could simply have a new literal form for Unicode literals.
>>>
>> This is almost what I would propose to do, with a couple of differences:
>>
>> Leave CHAR alone and fix WIDECHAR to handle the entire Unicode space.
>> I am sure there is lots of existing code that depends on the implementation
>> properties: ORD(FIRST(CHAR))=0, ORD(LAST(CHAR))=255, and BYTESIZE(CHAR)=1.
>
> Fair enough. Would we leave the encoding of CHAR as ISO-Latin-1? We'd still need translation from ISO-Latin-1 to UTF-8 wouldn't we?
>

Yes. The code points for Unicode and ISO-Latin-1, in the range 128..255 map to the same characters, (as in 0..127). But the physical encoding is different. ISO-Latin-1 is encoded one byte per character unconditionally.
When Unicode is encoded in UTF-8, any code point 128 or more uses at least two bytes. We need translations, but these belong in Wr/Rd and friends, which handle serial streams. In in-memory variables, WIDECHAR holds a Unicode code point, ARRAY OF WIDECHAR would happen to be the same representation as UTF-32, and Text.T would abstract away the internal representation.

>> Then I would define, in the language itself, that WIDECHAR is Unicode, not
>> UTF-32. Thus ORD(LAST(WIDECHAR))=16_10FFFF. Then I would make it an
>> implementation characteristic that BYTESIZE(WIDECHAR)=4.
>
> I note this text from the Wikipedia entry for UTF-32:
>
> Though a fixed number of bytes per code point appear convenient, it is not as useful as it appears. It makes truncation easier but not significantly so compared to UTF-8 and UTF-16. It does not make it faster to find a particular offset in the string, as an "offset" can be measured in the fixed-size code units of any encoding. It does not make calculating the displayed width of a string easier except in limited cases, since even with a "fixed width" font there may be more than one code point per character position (combining marks) or more than one character position per code point (for example CJK ideographs). Combining marks mean editors cannot treat one code point as being the same as one unit for editing. Editors that limit themselves to left-to-right languages and precomposed characters can take advantage of fixed-sized code units, but such editors are unlikely to support non-BMP characters and thus can work equally well with 16-bit UTF-16 encoding.
>
>
> Does this argue against WIDECHAR=UTF-32? Would we be better off simply saying WIDECHAR=UTF-16 and leaving things as they are? Yes, it would make the definition of WideCharAt a little odd, because the index would be defined in 16-bit units rather than UTF-16 glyph code-points.
>

No.
Keeping WIDECHAR at only 2^16 values does nothing to get us out of the morass we are now in where every bit of character-manipulating code has to cope with different encodings and/or variable-sized encodings.

If we make WIDECHAR capable of holding any Unicode code point, then we have the possibility of dealing with characters in the same abstractions as we originally had, and, with only an 8-bit character set, still do. Specifically, we have a variable type that holds any character, arrays thereof, and a very general functional style package of strings thereof. Library streams can handle encoding transformations, and most code won't have to worry about them, beyond specifying once what encoding it wants.

Of course, you could still always do low-level stuff like putting one UTF-8 code _unit_ into a WIDECHAR or CHAR, having arrays or TEXTs thereof, and constantly fiddling with the encoding. But this should not be required.

> By the way, if we did change WIDECHAR to an enumeration containing 16_110000 elements then the stored (memory) size of WIDECHAR would be 4 bytes given the current CM3 implementation of enumerations, which chooses a (non-PACKED) stored size of 1/2/4/8 bytes depending on the number of elements.
>

I have thought about making BYTESIZE(WIDECHAR) = 3, but that would at best trade one group of problems for another. In particular, applying ORD functions and doing arithmetic on characters located in arrays (including those hidden inside Text) would always involve repacking to get things aligned. I would think we would at least want to keep WIDECHAR scalars aligned.

>>
>> On Jun 27, 2012, at 10:12 PM, Rodney M. Bates wrote:
>>
>>>
>>>
>>> On 06/27/2012 07:32 PM, Antony Hosking wrote:
>>>> So what do we do about 6-byte UTF-8 code points? They won't fit in WIDECHAR. Surely we should allow accessing a UTF-8 character as a CARDINAL and be done with it?
>>>>
>>>
>>> Absolutely.
>>> Except I think a better way is to make WIDECHAR big enough to hold all of
>>> Unicode.
>>>
>>>> Sent from my iPad
>>>>
>>>> On Jun 27, 2012, at 3:20 PM, "Rodney M. Bates" wrote:
>>>>
>>>>>
>>>>>
>>>>> On 06/26/2012 10:30 PM, Hendrik Boom wrote:
>>>>>> On Tue, Jun 26, 2012 at 04:22:22PM -0400, Coleburn, Randy wrote:
>>>>>>> I seem to recall that Rodney did some work a while back relating to TEXT.
>>>>>>> Rodney, can you weigh in on some of this?
>>>>>>> --Randy Coleburn
>>>>>>>
>>>>>>> From: Dragiša Durić [mailto:dragisha at m3w.org]
>>>>>>> Sent: Tuesday, June 26, 2012 12:46 PM
>>>>>>> To: Jay
>>>>>>> Cc: m3devel
>>>>>>> Subject: EXT Re: [M3devel] AND (., 16_ff). Not serious - or so I hope!
>>>>>>>
>>>>>>> You had idea in other message. Store length!
>>>>>>>
>>>>>>> Another idea - store partial list of indices to character locations. So whatever one does, that list can be used/expanded. Whatever storage issues this makes, they are probably minor as compared to 32bit WIDECHAR for all idea.
>>>>>>
>>>>>> Most of the time, you don't need explicit integer indexes to character
>>>>>> locations. What you do need is an operation that fetches a character
>>>>>> given the string and its index (whatever data structure that index is),
>>>>>> and one that increments the index past that character. As long as you
>>>>>> can save an index and use it later on the same string, that's probably
>>>>>> all you ever need. And with a simple TEXT representation (such as the
>>>>>> obvious array of bytes containing characters of various widths) a byte
>>>>>> index is all you need (note: NOT a character index). It's easy even to
>>>>>> use TEXT and its integer indices as the data representation, as long as
>>>>>> you use the proper functions to parse the characters and increment the
>>>>>> indices by amounts that might differ from 1.
>>>>>>
>>>>>> And if your source code is represented in UTF-8, the representation that
>>>>>> requires little extra compiler effort to parse, your TEXT strings will
>>>>>> automagically appear in UTF-8.
>>>>>
>>>>> The original designers of the language and its libraries have given us
>>>>> two different abstractions for handling character strings (in addition
>>>>> to plain arrays.) 1) Text, and 2) Wr, Rd, and their cousins.
>>>>>
>>>>> Text is highly general and easy to use. Concatenations and substrings
>>>>> are easy. Semantics, to its clients, are value semantics, similar to INTEGER.
>>>>> Random access by *character* number is easy and, hopefully, implemented
>>>>> with efficiency at least better than O(n).
>>>>>
>>>>> Wr and friends restrict you to sequential access, at least mostly, but
>>>>> gain implementation convenience and efficiency as a result.
>>>>>
>>>>> I feel very strongly that we should *not* take away the full generality
>>>>> of Text, especially efficient random access, to handle variable-length
>>>>> character encodings in strings. For these, let's make more friends of
>>>>> Wr and Rd, which already assume sequential access. For example, a
>>>>> filter pipe that sequentially reads a Text/Array/stream, applies a UTF-8
>>>>> interpretation to its bytes, and delivers a stream of Unicode characters,
>>>>> in variables of type WIDECHAR.
>>>>>
>>>>> Text should preserve the abstraction that it's a string of characters,
>>>>> generalized as it already is in cm3, to have type WIDECHAR, so they can be any
>>>>> Unicode character. The internal representation should, usually, not be
>>>>> of concern.
>>>>>
>>>>> Note that nowhere in Text are character values transferred between
>>>>> a Text.T and any form of I/O stream. In the Text abstraction, all
>>>>> characters go in and out of a Text.T in variables of type CHAR,
>>>>> WIDECHAR, and arrays thereof. IO, etc. is only done in streams,
>>>>> e.g., TextWr.
>>>>> We can easily add new variants of these that encode/decode
>>>>> by various rules.
>>>>>
>>>>> Of course, it is still valid to put a string of bytes in a Text.T and
>>>>> apply, e.g., UTF-8 interpretation yourself. But that's lower-level
>>>>> programming, and shouldn't confuse the abstraction.
>>>>>
>>>>>>
>>>>>> I can see a use for various wide characters -- the things you extract
>>>>>> from a TEXT by parsing bits of it, but none for anything
>>>>>> really new complicated for wide TEXT.
>>>>>>
>>>>>> The only confusing thing is that the existing operations for extracting
>>>>>> bytes from TEXT have names that suggest they are extracting characters.
>>>>>>
>>>>>
>>>>> I think it's more than a suggestion. I think the abstraction clearly
>>>>> considers them characters. And it should stay that way. If you want,
>>>>> at a higher level of code, to treat them as bytes, that's fine, but the
>>>>> abstraction continues to view them as characters (which only you, the
>>>>> client, know is not really so.)
>>>>>
>>>>>> -- Hendrik
>>>>>>
>>>>
>>
>>
>>
>
>

From rodney_bates at lcwb.coop Fri Jul 6 20:27:28 2012
From: rodney_bates at lcwb.coop (Rodney M. Bates)
Date: Fri, 06 Jul 2012 13:27:28 -0500
Subject: [M3devel] A question for our language lawyers
In-Reply-To:
References:
Message-ID: <4FF72E10.3030204@lcwb.coop>

On 07/06/2012 04:23 AM, Dirk Muysers wrote:
> The report says (2.6.9)
> "The values in the array will be arbitrary values of their type."
> Now, ParseParams in its "init" method allocates an array of BOOLEANs
> and relies on the fact that it is supposedly initialised with FALSE values.
> At the other hand the report says (2.2.4)
> "The constant |default| is a default value used when a record is constructed or allocated"
> If I allocate an array of records, which statement is stronger:
> - the array contains arbitrary record values ?
> - the array record fields will be initialised to their default values?

Admittedly unclearly if not misleadingly worded.
Better wording might be to say each element is initialized as it would be if it were a scalar variable of its type. I think the way to interpret this is that the array itself does not impose any initialization, but this fact will not eliminate initialization imposed by other rules, specifically, the type of the array's elements.

This is a language quirk that I have always been deeply ambivalent about. The type safety would go down the drain if variables were not initialized to a bit pattern that represents some value of the type, so we have to pay the performance penalty of executing initialization code. So why not define which value of the type is initialized-to and get behavioral predictability for free? And further save redundant initialization in the likely event that the compiler's chosen arbitrary value happens to match what the programmer wants? (OK, a smart enough optimizer might figure this out, but we could have had it even with a naive compiler.)

The contrary case is a type whose compiler-chosen representation happens to use every bit pattern in the allocated space for a value of the type. Here, no compiler-generated runtime initialization is needed.

Also, the rule we have might sometimes encourage programmers to at least give a millisecond's thought to whether they need to do some explicit initialization.

> The ParseParams "init" method is obviously erroneous and works only
> by virtue of a happy combination of circumstances.
> But how is the report to be interpreted in the second case?
From dragisha at m3w.org Fri Jul 6 20:51:10 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Fri, 6 Jul 2012 20:51:10 +0200
Subject: [M3devel] Simple change to WIDECHAR type
In-Reply-To: <4FF7121A.9000909@lcwb.coop>
References: <4AD84314-8E53-4A50-A51B-A0DF8DC910BC@m3w.org> <031FD9D2-8C62-48B3-8E2F-1122B7C1ED96@m3w.org> <20120630172401.DFE8E1A207C@async.async.caltech.edu> <7BCB3BB7-D2F2-470B-8E70-2A7FF274E0FC@m3w.org> <95923C4D-B4F9-4245-809B-C3D186B8679C@m3w.org> <4FF7121A.9000909@lcwb.coop>
Message-ID:

And then, turn parsed string literals into broken WIDECHAR TEXTs?

On Jul 6, 2012, at 6:28 PM, Rodney M. Bates wrote:

> This is the result of the fact that your editor is writing UTF-8, while
> the compiler is reading in ISO-latin-1, as the language specifies. This
> was sensible at the time it was defined, but has been overcome by the
> advent and proliferation of Unicode.
>
> The abstract code point values in the range 16_80..16_FF are indeed the same in
> Unicode and ISO-latin-1, but the bit encoding rules are different.
>
> The simple and correct solution is to fix the compiler so that, like many
> programs today, it can be told to use one of several encodings when interpreting
> its input. Then set it the same as you set your editor.
>
> On 07/01/2012 03:52 AM, Dragiša Durić wrote:
>> Text.Length(Dragiša Durić)= 15
>>
>> out from:
>>   WITH me = W"Dragiša Durić" DO
>>     IO.Put("Text.Length(" & me & ")= " & Fmt.Int(Text.Length(me)) & "\n");
>>   END;
>>
>> On Jun 30, 2012, at 8:12 PM, Dragiša Durić wrote:
>>
>>> Jay, please explain this to me. My editor creates UTF8 files, for example. What cm3 expects after W" ?
>>
>>

From dabenavidesd at yahoo.es Fri Jul 6 21:17:51 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Fri, 6 Jul 2012 20:17:51 +0100 (BST)
Subject: [M3devel] A question for our language lawyers
In-Reply-To: <4FF72E10.3030204@lcwb.coop>
Message-ID: <1341602271.14326.YahooMailClassic@web29701.mail.ird.yahoo.com>

Hi all:
English men say array is a sequence of elements (of a common type), and a BOOLEAN is an enumeration so you might attack that distinction to define what is an initialized boolean or array of boolean in common compilers, gcc javac, etc, which if is java-like is really undefined:
http://www.drdobbs.com/architecture-and-design/the-humble-boolean-deserves-help/232900836?cid=DDJ_nl_upd_2012-05-02_h&elq=cef656ee4d6c4bca996b337620b98f85
So I prefer non-uniform rules for records different of Sets, Arrays, and records as that, note that NEW expression doesn't allow constructors to be used, so the only thing you can use is array of uninitialized variables (but current gcc or javac, etc are really wrong in that) This means we need to address this by either a native backend (NT386) or by another language for that matter.
Thanks in advance for any comments you may have

From dabenavidesd at yahoo.es Fri Jul 6 21:57:25 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Fri, 6 Jul 2012 20:57:25 +0100 (BST)
Subject: [M3devel] A question for our language lawyers
In-Reply-To: <1341602271.14326.YahooMailClassic@web29701.mail.ird.yahoo.com>
Message-ID: <1341604645.24546.YahooMailClassic@web29706.mail.ird.yahoo.com>

Hi all:
I think if we are to type define initialization, we need a kernel to type more fun than rigid Modula-3 semantics:
http://www.hpl.hp.com/techreports/Compaq-DEC/SRC-RR-95.pdf
That said, we can define a m3kernel sort of type minimal abstraction of a Modula-3 Object, and built on top of that. Advantages are we can type theorize in every wanted way with it and still protect us from incompatible type systems, by branding the type system to allow smooth transitions. Besides parallelization implicitly in the abstract machine (kernel) and check the type safety of it. Also rewrite the type system in terms of this kernel might get us to a new language in the sense of a language definition smoothly If someone steems this good I can make my try.
Thanks in advance

From dmuysers at hotmail.com Fri Jul 6 21:54:54 2012
From: dmuysers at hotmail.com (Dirk Muysers)
Date: Fri, 6 Jul 2012 21:54:54 +0200
Subject: [M3devel] A question for our language lawyers
In-Reply-To: <1341602271.14326.YahooMailClassic@web29701.mail.ird.yahoo.com>
References: <1341602271.14326.YahooMailClassic@web29701.mail.ird.yahoo.com>
Message-ID:

Daniel, with my apologies, sometimes I wonder if you do it on purpose.

From dabenavidesd at yahoo.es Fri Jul 6 22:07:15 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Fri, 6 Jul 2012 21:07:15 +0100 (BST)
Subject: [M3devel] A question for our language lawyers
In-Reply-To:
Message-ID: <1341605235.19643.YahooMailClassic@web29702.mail.ird.yahoo.com>

Hi all:
thing is as today is you don't have any software to show the language is incorrect, so I can't validate you (I don't pretend to do that). Because there isn't any compiler that defines that.
Sorry for that, but nobody else seems to care, so thanks for sharing your problem; at least someone else is interested in it as well. The Dr. Dobb's article talks about a tri-state boolean, and I thought it illustrated that point. Sorry if not.
Thanks in advance
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From dabenavidesd at yahoo.es Fri Jul 6 22:59:12 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Fri, 6 Jul 2012 21:59:12 +0100 (BST)
Subject: [M3devel] A question for our language lawyers
In-Reply-To: <1341604645.24546.YahooMailClassic@web29706.mail.ird.yahoo.com>
Message-ID: <1341608352.82920.YahooMailClassic@web29702.mail.ird.yahoo.com>

Hi all:
See that Baby Modula-3 allows field definition (value by definition, s. 3.1) for free; see pp. 10-11 at the URL below.
Thanks in advance

--- On Fri, 6/7/12, Daniel Alejandro Benavides D. wrote:
From: Daniel Alejandro Benavides D.
Subject: Re: [M3devel] A question for our language lawyers
To: m3devel at elegosoft.com, "Rodney M. Bates"
Date: Friday, 6 July 2012 14:57

Hi all:
I think that if we are to define initialization at the type level, we need a kernel that is more flexible to type than the rigid Modula-3 semantics:

http://www.hpl.hp.com/techreports/Compaq-DEC/SRC-RR-95.pdf

That said, we can define an m3kernel, a sort of minimal type abstraction of a Modula-3 object, and build on top of that. The advantages are that we can do type theory with it in whatever way we want and still protect ourselves from incompatible type systems, by branding the type system to allow smooth transitions. Besides that, we would get implicit parallelization in the abstract machine (the kernel) and could check its type safety. Also, rewriting the type system in terms of this kernel might take us smoothly to a new language, in the sense of a new language definition.

If someone deems this a good idea, I can give it a try.
Thanks in advance
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From dragisha at m3w.org Fri Jul 6 23:07:59 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Fri, 6 Jul 2012 23:07:59 +0200
Subject: [M3devel] A question for our language lawyers
In-Reply-To:
References: <1341602271.14326.YahooMailClassic@web29701.mail.ird.yahoo.com>
Message-ID:

Dirk,

If you still have doubts, you are a better man than most of us :)

Thanks in advance!
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From dabenavidesd at yahoo.es Fri Jul 6 23:44:55 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Fri, 6 Jul 2012 22:44:55 +0100 (BST)
Subject: [M3devel] A question for our language lawyers
In-Reply-To:
Message-ID: <1341611095.41843.YahooMailClassic@web29706.mail.ird.yahoo.com>

Hi all:
As I said once, why say what's right and what's wrong? When it comes to standards nobody cares about that, so who cares to say it? (See other programming languages that need help first, like C and friends!)
Thanks in advance
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From jay.krell at cornell.edu Sat Jul 7 08:05:39 2012
From: jay.krell at cornell.edu (Jay K)
Date: Sat, 7 Jul 2012 06:05:39 +0000
Subject: [M3devel] A question for our language lawyers
In-Reply-To:
References: <1341602271.14326.YahooMailClassic@web29701.mail.ird.yahoo.com>,
Message-ID:

I quite like the idea that all heap and stack is initialized by zeroing. This is, I believe, stronger/safer than what Modula-3 requires, at least for the stack. Anyone want to measure the change?

I'd also like to see the stack zeroed upon function return, so GC is easier to implement/understand...
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From dmuysers at hotmail.com Sat Jul 7 14:06:31 2012
From: dmuysers at hotmail.com (Dirk Muysers)
Date: Sat, 7 Jul 2012 14:06:31 +0200
Subject: [M3devel] A question for our language lawyers
Message-ID:

I reread ParseParams.m3 and, yes, they initialise the array of booleans. One should never trust one's memory, especially past a certain age. Yet I am sure I have seen one of the library modules relying on zero initialisation.
In my defence, I never (except for an occasional INC, where C would use ++) place two statements on the same line, so when I quickly browse through some code, the second statement often escapes my eyes. Nevertheless, the initialisation question was worth mentioning.
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From dabenavidesd at yahoo.es Sat Jul 7 14:57:03 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Sat, 7 Jul 2012 13:57:03 +0100 (BST)
Subject: [M3devel] A question for our language lawyers
In-Reply-To:
Message-ID: <1341665823.8622.YahooMailClassic@web29703.mail.ird.yahoo.com>

Hi all:
yes, it could be, but on VAXen and Alphas I believe this did not cause any wrong behavior that would expose incorrect initialization at start time, and most of the code trusts it (Alphas just throw an exception to show that it was changed). I didn't know for sure that it was wrong, but I guess that confirms the initialization code is not working by vicious value initialization.

Did you see that Baby Modula-3 (pp. 10-11, s. 3.1, "Relation to Modula-3") says you can override fields at the type level in order to override defaults?
Thanks in advance
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From rodney_bates at lcwb.coop Sat Jul 7 15:59:07 2012
From: rodney_bates at lcwb.coop (Rodney M. Bates)
Date: Sat, 07 Jul 2012 08:59:07 -0500
Subject: [M3devel] Simple change to WIDECHAR type
In-Reply-To:
References: <4AD84314-8E53-4A50-A51B-A0DF8DC910BC@m3w.org> <031FD9D2-8C62-48B3-8E2F-1122B7C1ED96@m3w.org> <20120630172401.DFE8E1A207C@async.async.caltech.edu> <7BCB3BB7-D2F2-470B-8E70-2A7FF274E0FC@m3w.org> <95923C4D-B4F9-4245-809B-C3D186B8679C@m3w.org> <4FF7121A.9000909@lcwb.coop>
Message-ID: <4FF840AB.5050807@lcwb.coop>

On 07/06/2012 01:51 PM, Dragiša Durić wrote:
> And then, turn parsed string literals into broken WIDECHAR TEXTs?
>

Well, yes, that requires fixing WIDECHAR too. But at least it would work if you can live within the BMP.

> On Jul 6, 2012, at 6:28 PM, Rodney M. Bates wrote:
>
>> This is the result of the fact that your editor is writing UTF-8, while
>> the compiler is reading in ISO-latin-1, as the language specifies. This
>> was sensible at the time it was defined, but has been overcome by the
>> advent and proliferation of Unicode.
>>
>> The abstract code point values in the range 16_80..16_FF are indeed the same in
>> Unicode and ISO-latin-1, but the bit encoding rules are different.
>>
>> The simple and correct solution is to fix the compiler so that, like many
>> programs today, it can be told to use one of several encodings when interpreting
>> its input. Then set it the same as you set your editor.
>>
>> On 07/01/2012 03:52 AM, Dragiša Durić wrote:
>>> Text.Length(Dragiša Durić)= 15
>>>
>>> out from:
>>> WITH me = W"Dragiša Durić" DO
>>> IO.Put("Text.Length("& me& ")="& Fmt.Int(Text.Length(me))& "\n");
>>> END;
>>>
>>> On Jun 30, 2012, at 8:12 PM, Dragiša Durić wrote:
>>>
>>>> Jay, please explain this to me. My editor creates UTF8 files, for example. What cm3 expects after W" ?
>>>
>>>
>
>
From dabenavidesd at yahoo.es Sat Jul 7 18:17:10 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Sat, 7 Jul 2012 17:17:10 +0100 (BST)
Subject: [M3devel] Modula-3 TLA Win32 Kernel Threads API Specification by Leslie Lamport
Message-ID: <1341677830.27299.YahooMailClassic@web29704.mail.ird.yahoo.com>

Hi all:
I wanted to share what I have found recently:

http://web.archive.org/web/20010712210213/http://www.research.compaq.com/SRC/personal/lamport/tla/threads/threads.html

I would like to do the same for POSIX 1003.4 (the original DEC proposal) and post it. Would the Elego folks mind uploading the Lamport specification to the CVS tree? I think these are important design notes on the Win32 Threads API. Please let me know if you are interested at all.

Also, its TLA code might go in an m3theory subdirectory of m3kernel.

In fact there is a TLA checker written in connection with the Zeus Algorithm Animation system for automating the animation of proofs, so I guess we just lack that part for further integration.

Thanks in advance
-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From rodney_bates at lcwb.coop Tue Jul 10 17:57:04 2012
From: rodney_bates at lcwb.coop (Rodney M. Bates)
Date: Tue, 10 Jul 2012 10:57:04 -0500
Subject: [M3devel] A Unicode/WIDECHAR proposal
Message-ID: <4FFC50D0.4000805@lcwb.coop>

Here is a more-or-less comprehensive proposal to get modern support of Unicode and its various encodings into Modula-3 and its libraries, while preserving both backward compatibility and original abstractions.

Summary:

Fix WIDECHAR so it holds all of Unicode. This restores the abstractions we once had, by treating every character as a value of a scalar type, for in-memory processing. The members of a TEXT and elements of ARRAY OF WIDECHAR get this property too.

Do encoding/decoding in streams Wr and Rd, which are inherently sequential anyway. Give every stream an encoding property. Add procedures to get/put characters with encoding/decoding.

These changes are backward-compatible.
You can still do low-level stuff if you have good reason, or just want to leave existing code alone. E.g., putting the bytes of UTF-8 into the characters of a TEXT and doing your own encoding/decoding.

CHAR:

Leave CHAR as it is: exactly 256 values, encoded in ISO-Latin-1, ORD(FIRST(CHAR))=0, ORD(LAST(CHAR))=16_FF, BYTESIZE(CHAR)=1. The language allows CHAR to have more values, but changing this would no doubt seriously undermine a good bit of existing code.

WIDECHAR:

Change WIDECHAR to have exactly the Unicode range. ORD(FIRST(WIDECHAR))=0 and ORD(LAST(WIDECHAR))=16_10FFFF. The full ORD and VAL functions from/to WIDECHAR are defined by the code point to character mapping of the Unicode standard. BYTESIZE(WIDECHAR)=4. Make actual internal representation be Unicode code points also. This happens to match UTF-32, most significantly for arrays of WIDECHAR.

Note that some of the code point values in this range are not Unicode characters. Programmers will need to account for this.

CHAR <: WIDECHAR, which means they are mutually assignable, with runtime check in the one direction. This works because the Unicode code points and the ISO-Latin-1 code points are identical in the entire ISO-Latin-1 range, up to 16_FF. Note that at 16_80 and above, the UTF-8 encoding is more than one byte, none of them equal to the encoded code point. This is not a problem, because both CHAR and WIDECHAR are actual code points, not one of the bytes of UTF-8.

TEXT:

TEXT continues to be defined as abstractly a sequence of WIDECHAR. An index into a TEXT is an integer count of characters. The internal representation (used only in memory, and maybe in pickles) is hidden and could be just about anything. Given the extreme memory inefficiency of the current cm3 implementation of TEXT, we no doubt will want to change it, but this decision is independent and at a lower level. The abstract interface Text will hide this.
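
The distinction between code points and encoded bytes, which the CHAR, WIDECHAR and TEXT definitions here rely on, can be illustrated outside Modula-3. The following is a minimal Python sketch, not cm3 code, and all variable names are made up for illustration. It shows that a 13-character name occupies 15 bytes in UTF-8, which is presumably where the Text.Length of 15 reported earlier on this list came from when UTF-8 source was read as ISO-Latin-1, and that an ISO-Latin-1 code point coincides with the Unicode code point while its UTF-8 encoding does not.

```python
# Illustration only: Python's str plays the role of an abstract sequence of
# code points (like the proposed TEXT); bytes plays the role of an encoded stream.
name = "Dragi\u0161a Duri\u0107"          # two of the characters lie outside ISO-Latin-1

utf8_bytes = name.encode("utf-8")         # what a UTF-8 editor writes to the file
misread = utf8_bytes.decode("latin-1")    # what a reader assuming ISO-Latin-1 sees

print(len(name))        # 13 characters
print(len(utf8_bytes))  # 15 bytes in UTF-8
print(len(misread))     # 15 "characters", one per byte, when misread

# In the range 16_80..16_FF the code point is the same in ISO-Latin-1 and in
# Unicode, but UTF-8 needs two bytes, neither of them equal to the code point.
e_acute = "\u00e9"
latin1_bytes = e_acute.encode("latin-1")  # single byte 16_E9
utf8_e = e_acute.encode("utf-8")          # bytes 16_C3 16_A9
```

Indexing by character rather than by byte, as the TEXT definition requires, corresponds to working with `name` here rather than with `utf8_bytes`.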
There is hardly a remaining need for Text.FromChar, because by assignability, Text.FromWideChar can be used in its place, with the same result. But keep FromChar, for compatibility with existing code.

Text.FromChars just means the code points in the created text will happen to be members of type CHAR.

Text.GetChar and Text.GetChars will raise an exception if a to-be-gotten code point in the TEXT lies outside the type CHAR. This is a change from existing behavior, which just truncates the high bits of a WIDECHAR value and returns only the low bits. Even if we didn't add the exception, we would want this to be an assignability runtime error.

Literals:

Inside wide character and wide text literals, add two new escapes, \u, which is followed by exactly 4 hex digits denoting a code point, and \U, which is followed by exactly 6 hex digits. The letters 'u' and 'U' are used in this way in the Unicode standard. \u would be redundant with the existing \x and \X escapes, but those would merely preserve compatibility for existing code. (Or is there so little existing code using them that we could eliminate them for a more consistent system?)

Encodings:

Define an enumeration giving the possible encodings used in streams:

  TYPE Encoding
    = {Inherit, ISO_Latin_1, UCS_2LE, UTF_8, UTF_16, UTF_16BE, UTF_16LE,
       UTF_32, UTF_32BE, UTF_32LE};

ISO_Latin_1 means one byte per character, unconditionally. This is the way current Modula-3 always encodes CHAR. An attempt to Put a code point greater than 16_FF in this encoding will raise an exception. (This can happen only using newly added procedures.)

Similarly, UCS_2LE, as I understand the standard, means exactly two bytes per character, LSB first. This is what our current Wr and Rd use for WIDECHAR. Here again, an exception will be raised for a code point greater than 16_FFFF. This, also, can happen only using newly added procedures.
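The range rules for the two fixed-width encodings can be sketched in C as follows. This is only an illustration under stated assumptions: the names encoding, ENC_ISO_LATIN_1, ENC_UCS_2LE and put_fixed are invented here, and a return value of 0 stands in for the exception the proposal would raise.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for two members of the proposed Encoding type. */
typedef enum { ENC_ISO_LATIN_1, ENC_UCS_2LE } encoding;

/* Writes cp to out in the given fixed-width encoding.
   Returns the number of bytes written (1 or 2), or 0 when cp cannot be
   represented, which is where the proposal would raise an exception. */
size_t put_fixed(encoding enc, uint32_t cp, unsigned char out[2])
{
    switch (enc) {
    case ENC_ISO_LATIN_1:
        if (cp > 0xFF)
            return 0;                          /* beyond 16_FF */
        out[0] = (unsigned char)cp;
        return 1;
    case ENC_UCS_2LE:
        if (cp > 0xFFFF)
            return 0;                          /* beyond 16_FFFF */
        out[0] = (unsigned char)(cp & 0xFF);   /* least significant byte first */
        out[1] = (unsigned char)(cp >> 8);
        return 2;
    }
    return 0;                                  /* unknown encoding */
}
```

With this sketch, code point 16_161 is rejected under ISO_Latin_1 but is written as the bytes 16_61 16_01 under UCS_2LE, and 16_10000 is rejected under both.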
Inherit means get the encoding to be used from somewhere else, for example, from the file system, in case it is able to store this property of a file.

Every Wr.T and every Rd.T has an Encoding property that can be specified when creating the stream, (from one of its subtypes). The ways of doing this can vary with the subtype. This defaults to Inherit, which means, if possible, take it from the file system, etc. Otherwise, there are defaults for the various streams.

New operations that Put/Get Unicode characters have a parameter of type Encoding, with a default value of Inherit, which means get the encoding property from the stream. Accepting this default would be the usual way to use these procedures.

Specifying the encoding differently in the Put/Get procedure allows mixed encodings in a single stream. It seems dubious to encourage this, but existing Wr and Rd already provide plenty of opportunities to do similar stuff anyway, so this just extends existing semantics to the new procedures. It also allows some existing Put/Get procedures to be defined as equivalents to new ones.

Wr:

New procedure

  PutUniWideChar(Wr: T; ch: WIDECHAR; Enc:=Encoding.Inherit)

encodes the character using Enc and appends that to the stream.

There is hardly a need for a CHAR counterpart. Since CHAR is assignable to WIDECHAR, PutUniWideChar suffices for an actual parameter of either type. Whether the caller provides a CHAR or a WIDECHAR (or whether we were alternatively to have different procedures) does _not_ affect the encoding, only the value range that can be passed in.

Similar new procedures PutUniString, PutUniWideString, and PutUniText are counterparts to PutString, PutWideString, and PutText, respectively.

Existing PutChar and PutString, which write CHARs as one byte, each become equivalent to PutUniWideChar and PutUniString, with Enc:=Encoding.ISO_Latin_1.
Similarly, existing PutWideChar and PutWideString, which write WIDECHARs as two bytes each, become equivalent to PutUniWideChar and PutUniWideString, with Enc:=Encoding.UCS_2LE.

The existing Wr interface is peculiar, IMO, in that even though there is currently no distinction between a text and a wide text, we have PutText and PutWideText. These have identical signatures, both taking a TEXT (which can contain characters in the full WIDECHAR range). The difference is that PutText rather violently truncates every character in the text to 8 bits and writes that, implicitly in ISO-Latin-1 encoding. This is not equivalent to PutUniText with Enc:=Encoding.ISO_Latin_1, because the latter will raise an exception for unencodable code points.

Rd:

New procedure

  GetUniWideChar (rd:T; Enc:=Encoding.Inherit) :WIDECHAR

decodes, using Enc, and consumes, enough bytes from rd for one Unicode code point and returns it.

There is not a lot of need for a CHAR-returning counterpart of GetUniWideChar. A caller can just assign the result from GetUniWideChar to a CHAR variable and deal with the possible range error at the call site.

GetUniSub, GetUniWideSub, GetUniSubLine, GetUniWideSubLine, GetUniText, and GetUniTextLine are counterparts to GetSub, GetWideSub, GetSubLine, GetWideSubLine, GetWideText, and GetWideLine. They differ in decoding according to the Enc parameter.

In the new GetUni* procedures, any case where a partial character is terminated by end-of-file will raise an exception. This differs from the current GetWide* procedures, which all implicitly use UCS_2LE and just insert a zero byte as the MSB in this case.

Existing GetChar, GetSub, GetSubLine, GetText, and GetLine all implicitly use the ISO-Latin-1 encoding. GetWideChar, GetWideSub, GetWideSubLine, GetWideText, and GetWideLine all implicitly use UCS_2LE. They differ from new GetUni* procedures using UCS_2LE in that the latter raise an exception on an incomplete character.
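The difference at end-of-file between the existing GetWide* behavior and the proposed GetUni* behavior can be sketched like this in C. Everything here is an assumption made for illustration: the function name get_ucs2le, the in-memory buffer standing in for a reader, and the return code -1 standing in for the proposed exception.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder for one UCS-2LE character from buf[*pos .. len).
   Assumes *pos <= len.
   Returns  1 and stores the code point in *cp on success,
            0 at a clean end of input,
           -1 if strict is nonzero and only one byte remains
              (the incomplete character the GetUni* procedures would reject).
   With strict == 0, a lone trailing byte is accepted with a zero MSB,
   which mirrors the behavior described for the existing GetWide* procedures. */
int get_ucs2le(const unsigned char *buf, size_t len, size_t *pos,
               int strict, uint32_t *cp)
{
    size_t left = len - *pos;

    if (left == 0)
        return 0;
    if (left == 1) {
        if (strict)
            return -1;                /* incomplete character */
        *cp = buf[*pos];              /* MSB taken as zero */
        *pos += 1;
        return 1;
    }
    /* Least significant byte first, then most significant byte. */
    *cp = (uint32_t)buf[*pos] | ((uint32_t)buf[*pos + 1] << 8);
    *pos += 2;
    return 1;
}
```

Given the three bytes 16_61 16_01 16_41, both modes first return code point 16_161. On the leftover byte, the strict mode reports an incomplete character and consumes nothing, while the non-strict mode returns 16_41.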
GetUniSub and GetUniSubLine return decoded characters in ARRAY OF CHAR and raise an exception if a decoded code point is not in CHAR. This might seem a bit ridiculous, but they could be useful for quick, partial adaptation of existing code to accept newer encodings and detect, without otherwise handling, higher code points.

Actually, GetWideText is documented as being identical to GetText, in behavior, as well as signature. I think this must be an editing error.

I wonder if we need to review the rules for what constitutes a line break.

A new UnGetUni would work like UnGetChar, but would reencode the pushed-back character, (retained internally as a WIDECHAR), according to its Enc parameter. The next Get* would then redecode according to its Enc parameter or implicit encoding, which could be different and consume a different number of bytes. If this seems bizarre, note that it continues established semantics. Existing UnGetChar will push back a character, implicitly in ISO-Latin-1, and it is possible to call GetWideChar next, which will use the pushed-back byte plus the byte following, decode in UCS-2LE, and return the result. UnGetUni will be more complicated to implement, but it can be done.

It seems odd that there is no UnGetWideChar. UnGetUni with Enc:=Encoding.UCS_2LE should accomplish this.

A UniCharsReady might be nice, but it would be O(n), for UTF-8 and UTF-16.

Of course, these changes will require corresponding changes in several other stream-related interfaces, particularly in providing ways to specify (and interrogate?) an encoding property of a stream.

Compiler source file encoding:

Existing rules for interpretation (de facto, from the cm3 implementation) of wide character and wide string literals depend on the encoding of the input file. At present, the compiler always assumes this is ISO-Latin-1. If it actually is a UTF-8 file, as is often the case today, this will result in incorrect conversion of literals.
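The effect of reading a UTF-8 source file as ISO-Latin-1 shows up directly in character counts. The following C sketch is an illustration only; the function name count_utf8_code_points is made up, and it assumes its input is well-formed UTF-8.

```c
#include <stddef.h>

/* Hypothetical helper: counts code points in a well-formed UTF-8 byte
   sequence by counting every byte that is not a continuation byte
   (continuation bytes have the bit pattern 10xxxxxx). */
size_t count_utf8_code_points(const unsigned char *s, size_t n)
{
    size_t count = 0;
    for (size_t i = 0; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            count++;
    }
    return count;
}
```

Read as ISO-Latin-1, every byte is one character, so the count is simply the byte length n. A 13-character name in which two letters each take two UTF-8 bytes therefore counts as 15 characters under the ISO-Latin-1 assumption, which appears consistent with the Text.Length of 15 reported earlier in this thread.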
If, in our current implementation, the value of such a literal is then written out by a Modula-3 program, unchanged, the program will write ISO-Latin-1. If some other program (e.g., an editor or terminal emulator) interprets this output file as UTF-8, the reverse incorrect reinterpretation will result in the original string. But if the program manipulates the characters using the language-defined abstraction, the result will in general be incorrect.

The same scenario applies when a single program reads in ISO-Latin-1, a file that was produced in UTF-8, writes in ISO-Latin-1, with the output file then being fed to some other program that interprets it as UTF-8.

From dabenavidesd at yahoo.es Wed Jul 11 00:30:15 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Tue, 10 Jul 2012 23:30:15 +0100 (BST)
Subject: [M3devel] A Unicode/WIDECHAR proposal
In-Reply-To: <4FFC50D0.4000805@lcwb.coop>
Message-ID: <1341959415.94700.YahooMailClassic@web29701.mail.ird.yahoo.com>

Hi all:
WIDECHAR is a CHAR simulation of a word-sized char, which is not what the Rd/Wr implementation intends; reading and writing literals assumes that you won't get any real speed improvement over the DEC-SRC source-to-source transliteration of a given literal. That is to say, what you want is the same CM3 TEXT type with better functionality, so it is better to make polymorphic functions. E.g., have FromChar receive both kinds of chars without losing the DEC-SRC representation characteristic, and return what you want in a polymorphic way (for instance, your text editor assumes you don't have real wide strings, just one raw stream; then you can feed the text file into memory efficiently with an encoder optimized for your architecture and grab it there wherever you want; conversely, on opening an unused file you have to convert it at execution time, etc.).
Thanks in advance

From pgoltzsch at gmail.com Thu Jul 12 11:39:58 2012
From: pgoltzsch at gmail.com (Patrick Goltzsch)
Date: Thu, 12 Jul 2012 11:39:58 +0200
Subject: [M3devel] unix - unknown qualification
Message-ID: <20120712113958.33d94bc4@leda>

Hi!

I am having trouble compiling some older sources. I had the impression that it would be sufficient to "IMPORT Unix;" in ClsShare.m3 but obviously it's not:

--- building in ../AMD64_LINUX ---

new source -> compiling ClsShare.m3
"/work/mylib/src/ClsShare.m3", line 73: unknown qualification '.' (struct_flock)
"/work/mylib/src/ClsShare.m3", line 74: unknown qualification '.'
(F_WRLCK)
"/work/mylib/src/ClsShare.m3", line 75: unknown qualification '.' (L_SET)
"/work/mylib/src/ClsShare.m3", line 82: unknown qualification '.' (F_SETLK)
"/work/mylib/src/ClsShare.m3", line 98: unknown qualification '.' (struct_flock)
"/work/mylib/src/ClsShare.m3", line 99: unknown qualification '.' (F_UNLCK)
"/work/mylib/src/ClsShare.m3", line 100: unknown qualification '.' (L_SET)
"/work/mylib/src/ClsShare.m3", line 107: unknown qualification '.' (F_SETLK)
8 errors encountered

What am I doing wrong? Compiler is:

Critical Mass Modula-3 version 5.8.6
  last updated: 2010-04-11
  compiled: 2010-07-12 20:10:34
  configuration: /usr/local/cm3/bin/cm3.cfg
  host: AMD64_LINUX
  target: AMD64_LINUX

Thanks a lot,

Patrick

From rodney_bates at lcwb.coop Thu Jul 12 14:18:01 2012
From: rodney_bates at lcwb.coop (Rodney M. Bates)
Date: Thu, 12 Jul 2012 07:18:01 -0500
Subject: [M3devel] unix - unknown qualification
In-Reply-To: <20120712113958.33d94bc4@leda>
References: <20120712113958.33d94bc4@leda>
Message-ID: <4FFEC079.7040104@lcwb.coop>

I think we need to see some source code for ClsShare.m3, particularly to see what is before the dot on these lines. I don't see any of the failing qualifications in Unix.i3 in my cm3 directory.

On 07/12/2012 04:39 AM, Patrick Goltzsch wrote:
> Hi!
>
> I am having trouble compiling some older sources. I had the
> impression that it would be sufficient to "IMPORT Unix;" in
> ClsShare.m3 but obviously it's not:
>
> --- building in ../AMD64_LINUX ---
>
> new source -> compiling ClsShare.m3
> "/work/mylib/src/ClsShare.m3", line 73: unknown qualification '.' (struct_flock)
> "/work/mylib/src/ClsShare.m3", line 74: unknown qualification '.' (F_WRLCK)
> "/work/mylib/src/ClsShare.m3", line 75: unknown qualification '.' (L_SET)
> "/work/mylib/src/ClsShare.m3", line 82: unknown qualification '.' (F_SETLK)
> "/work/mylib/src/ClsShare.m3", line 98: unknown qualification '.' (struct_flock)
> "/work/mylib/src/ClsShare.m3", line 99: unknown qualification '.'
(F_UNLCK)
> "/work/mylib/src/ClsShare.m3", line 100: unknown qualification '.' (L_SET)
> "/work/mylib/src/ClsShare.m3", line 107: unknown qualification '.' (F_SETLK)
> 8 errors encountered
>
> What am I doing wrong? Compiler is:
>
> Critical Mass Modula-3 version 5.8.6
>   last updated: 2010-04-11
>   compiled: 2010-07-12 20:10:34
>   configuration: /usr/local/cm3/bin/cm3.cfg
>   host: AMD64_LINUX
>   target: AMD64_LINUX
>
> Thanks a lot,
>
> Patrick
>

From rodney_bates at lcwb.coop Thu Jul 12 14:27:38 2012
From: rodney_bates at lcwb.coop (Rodney M. Bates)
Date: Thu, 12 Jul 2012 07:27:38 -0500
Subject: [M3devel] unix - unknown qualification
In-Reply-To: <20120712113958.33d94bc4@leda>
References: <20120712113958.33d94bc4@leda>
Message-ID: <4FFEC2BA.4080406@lcwb.coop>

I poked around in a version of PM3. There, there are multiple, OS-dependent versions of Unix.i3. Most or all of them do have the failing qualifications declared in them. So somewhere along the line, Unix.i3 has changed and lost these declarations, leaving ClsShare in the lurch.

I don't know when or why this happened. Jay?

On 07/12/2012 04:39 AM, Patrick Goltzsch wrote:
> Hi!
>
> I am having trouble compiling some older sources. I had the
> impression that it would be sufficient to "IMPORT Unix;" in
> ClsShare.m3 but obviously it's not:
>
> --- building in ../AMD64_LINUX ---
>
> new source -> compiling ClsShare.m3
> "/work/mylib/src/ClsShare.m3", line 73: unknown qualification '.' (struct_flock)
> "/work/mylib/src/ClsShare.m3", line 74: unknown qualification '.' (F_WRLCK)
> "/work/mylib/src/ClsShare.m3", line 75: unknown qualification '.' (L_SET)
> "/work/mylib/src/ClsShare.m3", line 82: unknown qualification '.' (F_SETLK)
> "/work/mylib/src/ClsShare.m3", line 98: unknown qualification '.' (struct_flock)
> "/work/mylib/src/ClsShare.m3", line 99: unknown qualification '.' (F_UNLCK)
> "/work/mylib/src/ClsShare.m3", line 100: unknown qualification '.'
(L_SET)
> "/work/mylib/src/ClsShare.m3", line 107: unknown qualification '.' (F_SETLK)
> 8 errors encountered
>
> What am I doing wrong? Compiler is:
>
> Critical Mass Modula-3 version 5.8.6
>   last updated: 2010-04-11
>   compiled: 2010-07-12 20:10:34
>   configuration: /usr/local/cm3/bin/cm3.cfg
>   host: AMD64_LINUX
>   target: AMD64_LINUX
>
> Thanks a lot,
>
> Patrick
>

From pgoltzsch at gmail.com Thu Jul 12 14:58:11 2012
From: pgoltzsch at gmail.com (Patrick Goltzsch)
Date: Thu, 12 Jul 2012 14:58:11 +0200
Subject: [M3devel] unix - unknown qualification
In-Reply-To: <4FFEC079.7040104@lcwb.coop>
References: <20120712113958.33d94bc4@leda> <4FFEC079.7040104@lcwb.coop>
Message-ID: <20120712145811.2a4901d3@leda>

>>>>> Rodney M. Bates wrote:

> I think we need to see some source code for ClsShare.m3.
> particularly to see what is before the dot on these lines. I
> don't see any of the failing qualifications in Unix.i3 in my
> cm3 directory.

The first errors are caused by the following procedure, which seems to be copied from old DEC example code as I found out while looking for a solution:

PROCEDURE FilePartLock( h : INTEGER; start, len : INTEGER ) : BOOLEAN RAISES {OSError.E} =
  VAR flock := Unix.struct_flock {
                 l_type := Unix.F_WRLCK,
                 l_whence := Unix.L_SET,
                 l_start := 0,
                 l_len := 0, (* i.e., whole file *)
                 l_pid := 0 }; (* don't care *)
  BEGIN
    flock.l_start := start;
    flock.l_len := len;
    IF Unix.fcntl( h, Unix.F_SETLK, LOOPHOLE( ADR( flock ), Ctypes.long ) ) < 0
    THEN
      IF Uerror.errno = Uerror.EACCES OR
         Uerror.errno = Uerror.EAGAIN THEN
        RETURN FALSE
      END;
      OSErrorPosix.Raise()
    END;
    RETURN TRUE
  END FilePartLock;

Regards,

Patrick

From dabenavidesd at yahoo.es Thu Jul 12 15:43:52 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Thu, 12 Jul 2012 14:43:52 +0100 (BST)
Subject: [M3devel] unix - unknown qualification
In-Reply-To: <4FFEC2BA.4080406@lcwb.coop>
Message-ID: <1342100632.27773.YahooMailClassic@web29705.mail.ird.yahoo.com>

Hi all:
All GNU non-POSIX file constants and structs were pushed down to the unix/linux-common files, but accommodating all non-POSIX standards is uncomfortable or impossible. So you must use the kernel call directly in C code to control the locking policy, and pass control to M3 (youControlFile.c). In a sane environment it is better to reconstruct most Unix calls on a microkernel, but I guess the world doesn't do that. Or maybe you can find a Unix API that is uniform and modular enough to do that, like PosixFileC.c in libm3/src/os/POSIX. For sure there is more than one out there, but whoever makes such a thing doesn't use Unixes like Cygwin; some UnixControlFile.c that already does that would be wonderful.
Thanks in advance

From dabenavidesd at yahoo.es Thu Jul 12 18:52:38 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Thu, 12 Jul 2012 17:52:38 +0100 (BST)
Subject: [M3devel] Why everything is an object
Message-ID: <1342111958.55562.YahooMailClassic@web29705.mail.ird.yahoo.com>

Hi all:
Reading this might give all users some idea of why everything here really is an object:
http://wcook.blogspot.com/
Curiously, it doesn't much explain why functional isn't subsumed by OO, but every object in Baby Modula-3 is functional.
Thanks in advance

From jay.krell at cornell.edu Fri Jul 13 00:12:49 2012
From: jay.krell at cornell.edu (Jay K)
Date: Thu, 12 Jul 2012 22:12:49 +0000
Subject: [M3devel] unix - unknown qualification
In-Reply-To: <20120712145811.2a4901d3@leda>
References: <20120712113958.33d94bc4@leda>, <4FFEC079.7040104@lcwb.coop>, <20120712145811.2a4901d3@leda>
Message-ID:

Unix.i3 has always been a maintenance and portability problem.
As such, it has been dramatically reduced.
This stuff was probably removed, esp.
struct_flock.
The constants can be exposed easily enough, portably, but aren't useful without the struct, probably.

You REALLY REALLY REALLY want to write this in C.
Writing it in Modula-3 has many downsides. You lose safety. You lose static checking. You lose portability.
You gain infinitely small efficiency.
Something like:

jbook2:libm3 jay$ pwd
/dev2/cm3/m3-libs/libm3
jbook2:libm3 jay$ find . | xargs grep flock
./src/os/POSIX/FilePosixC.c:saves us from having to declare struct flock, which is gnarled up in #ifdefs.
./src/os/POSIX/FilePosixC.c:    struct flock lock;
./src/os/POSIX/FilePosixC.c:    struct flock lock;
./tests/os/src/locktest.c:  struct flock param;

./src/os/POSIX/FilePosixC.c:

/* Copyright (C) 1993, Digital Equipment Corporation */
/* All rights reserved. */
/* See the file COPYRIGHT for a full description. */

/*
Writing part of libm3/os/POSIX/FilePosix.m3/RegularFileLock, RegularFileUnlock in C
saves us from having to declare struct flock, which is gnarled up in #ifdefs.

see http://www.opengroup.org/onlinepubs/009695399/functions/fcntl.html
*/

#include "m3core.h"
#include

#ifdef __cplusplus
extern "C" {
#endif

#define FALSE 0
#define TRUE 1

INTEGER FilePosixC__RegularFileLock(int fd)
{
    struct flock lock;
    int err;

    ZeroMemory(&lock, sizeof(lock));
    lock.l_type = F_WRLCK;
    lock.l_whence = SEEK_SET;

    if (fcntl(fd, F_SETLK, &lock) < 0)
    {
        err = errno;
        if (err == EACCES || err == EAGAIN)
            return FALSE;
        return -1;
    }
    return TRUE;
}

INTEGER FilePosixC__RegularFileUnlock(int fd)
{
    struct flock lock;

    ZeroMemory(&lock, sizeof(lock));
    lock.l_type = F_UNLCK;
    lock.l_whence = SEEK_SET;

    return fcntl(fd, F_SETLK, &lock);
}

#ifdef __cplusplus
} /* extern "C" */
#endif

We can add this to libm3 probably.

 - Jay

> Date: Thu, 12 Jul 2012 14:58:11 +0200
> From: pgoltzsch at gmail.com
> To: m3devel at elegosoft.com
> Subject: Re: [M3devel] unix - unknown qualification
>
> >>>>> Rodney M. Bates wrote:
>
> > I think we need to see some source code for ClsShare.m3.
> > particularly to see what is before the dot on these lines. I
> > don't see any of the failing qualifications in Unix.i3 in my
> > cm3 directory.
>
> The first errors are caused by the following procedure,
> which seems to be copied from old DEC example code, as I found
> out while looking for a solution:
>
> PROCEDURE FilePartLock( h : INTEGER; start, len : INTEGER ) : BOOLEAN RAISES {OSError.E} =
>   VAR flock := Unix.struct_flock {
>                  l_type := Unix.F_WRLCK,
>                  l_whence := Unix.L_SET,
>                  l_start := 0,
>                  l_len := 0, (* i.e., whole file *)
>                  l_pid := 0 }; (* don't care *)
>   BEGIN
>     flock.l_start := start;
>     flock.l_len := len;
>     IF Unix.fcntl( h, Unix.F_SETLK, LOOPHOLE( ADR( flock ), Ctypes.long ) ) < 0
>     THEN
>       IF Uerror.errno = Uerror.EACCES OR
>          Uerror.errno = Uerror.EAGAIN THEN
>         RETURN FALSE
>       END;
>       OSErrorPosix.Raise()
>     END;
>     RETURN TRUE
>   END FilePartLock;
>
>
>
> Regards,
>
> Patrick

From jay.krell at cornell.edu Fri Jul 13 11:33:16 2012
From: jay.krell at cornell.edu (Jay K)
Date: Fri, 13 Jul 2012 09:33:16 +0000
Subject: [M3devel] unix - unknown qualification
In-Reply-To:
References: <20120712113958.33d94bc4@leda>, <4FFEC079.7040104@lcwb.coop>, , <20120712145811.2a4901d3@leda>,
Message-ID:

Hey, how about I just provide copying wrappers here, like we do for stat?
Where we define an idealized portable struct flock and copy it back and forth in C to the real struct flock?

It is a little strange -- the wrapper is fcntl.
It must check the first parameter, and know/assume its meaning.

 - Jay

From rodney_bates at lcwb.coop Fri Jul 13 14:54:37 2012
From: rodney_bates at lcwb.coop (Rodney M. Bates)
Date: Fri, 13 Jul 2012 07:54:37 -0500
Subject: [M3devel] unix - unknown qualification
In-Reply-To:
References: <20120712113958.33d94bc4@leda>, <4FFEC079.7040104@lcwb.coop>, , <20120712145811.2a4901d3@leda>,
Message-ID: <50001A8D.80805@lcwb.coop>

Sounds like a good idea to me. It moves the M3/C boundary back just enough to pick up
all the #ifdef stuff, etc. but not the application-specific code.

On 07/13/2012 04:33 AM, Jay K wrote:
> Hey, how about I just provide copying wrappers here, like we do for stat?
> Where we define an idealized portable struct flock and copy it back and forth in C to the real struct flock?
>
> It is a little strange -- the wrapper is fcntl.
> It must check the first parameter, and know/assume its meaning.
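
The copying-wrapper idea discussed above could look roughly like the following. This is only a minimal sketch, not the actual cm3 code: the record `m3_flock_t` and the function `m3_fcntl_lock` are hypothetical names invented for illustration, and the field widths are an assumption about what a Modula-3 RECORD could mirror portably.

```c
#include <fcntl.h>
#include <stdint.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical idealized record: fixed-width fields, so its layout does
   not depend on how the platform declares struct flock. */
typedef struct {
    int64_t l_start;
    int64_t l_len;
    int32_t l_pid;
    int32_t l_type;    /* F_RDLCK, F_WRLCK or F_UNLCK */
    int32_t l_whence;  /* SEEK_SET, SEEK_CUR or SEEK_END */
} m3_flock_t;

/* Hypothetical wrapper for the locking requests (F_GETLK, F_SETLK,
   F_SETLKW): copy the portable record into the native struct flock,
   call the real fcntl, then copy the result back. */
int m3_fcntl_lock(int fd, int request, m3_flock_t *m3)
{
    struct flock native;
    int result;

    memset(&native, 0, sizeof(native));
    native.l_type = (short)m3->l_type;
    native.l_whence = (short)m3->l_whence;
    native.l_start = (off_t)m3->l_start;
    native.l_len = (off_t)m3->l_len;
    native.l_pid = (pid_t)m3->l_pid;

    result = fcntl(fd, request, &native);

    /* F_GETLK reports a conflicting lock through the struct, so copy
       every field back regardless of the request. */
    m3->l_type = native.l_type;
    m3->l_whence = native.l_whence;
    m3->l_start = (int64_t)native.l_start;
    m3->l_len = (int64_t)native.l_len;
    m3->l_pid = (int32_t)native.l_pid;

    return result;
}
```

With a wrapper of this shape, the Modula-3 side would declare only the fixed-layout record and never the platform's struct flock; requests whose third argument is a plain integer would still need a separate entry point.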
>
>
>   - Jay

From dabenavidesd at yahoo.es Fri Jul 13 16:44:55 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Fri, 13 Jul 2012 15:44:55 +0100 (BST)
Subject: [M3devel] unix - unknown qualification
In-Reply-To: <50001A8D.80805@lcwb.coop>
Message-ID: <1342190695.15538.YahooMailClassic@web29701.mail.ird.yahoo.com>

Hi all:
indeed, but I'm afraid that programming at the level of the C API goes against the bulk of what the language is for; its core is machine-level programming, which so many believe is better done in C. But UNSAFE, to my way of thinking, is simply better than C because you still get some checking. It is not bulletproof, but with appropriate module isolation you can make sure errors don't propagate: Modula-3 modules in the RTMachinery can stop them appropriately, and where the machine allows safe, manageable execution you can recover (from a trapped error such as arithmetic overflow, e.g. by dumping to disk), or update your data and finish, following the rules for stopping execution. This is my point, Jay. Now the quality of current machines is getting worse than before, so who cares if we use DEC stuff.
I wanted to say that the language designers here tried hard to make the language itself easier to optimize, and with that objective in mind it makes sense to believe that the application itself must be compiled with Modula-3. So to some degree I'm being hypocritical about GCC use, but sometimes using GCC leaves more time to develop the rest of the system.
Thanks in advance

From jay.krell at cornell.edu Sat Jul 14 10:27:23 2012
From: jay.krell at cornell.edu (Jay K)
Date: Sat, 14 Jul 2012 08:27:23 +0000
Subject: [M3devel] fcntl last parameter int vs. pointer
Message-ID:

Thoughts on

Unix__fcntl(int fd, int request, int arg)
{
    return fcntl(fd, request, arg);
}

vs.

Unix__fcntl(int fd, int request, INTEGER arg)
{
    return fcntl(fd, request, arg);
}

where int is 32 bits and INTEGER is exactly the same size as a pointer.

Will it "just work" if I change it?
arg is sometimes a pointer, sometimes an integer, maybe sometimes other?
Ok, let's assume 32bit integer and 32bit or 64bit pointer are the only possibilities.
Are there calling conventions that care? And will pass the parameter differently/wrong?

Do any calling conventions pack multiple smaller-than-64bit parameters into one 64bit register?

I'm *guessing* no.
I guess, as well, I can experiment with a few...

 - Jay

From dabenavidesd at yahoo.es Sat Jul 14 17:31:36 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Sat, 14 Jul 2012 16:31:36 +0100 (BST)
Subject: [M3devel] fcntl last parameter int vs.
 pointer
In-Reply-To:
Message-ID: <1342279896.71297.YahooMailClassic@web29701.mail.ird.yahoo.com>

Hi all:
In fact, neither C nor Modula-3 allows a signature change; the original C compiler and decompiler type-check the signature, although type casting is possible in both languages:
http://web.cs.mun.ca/~ulf/pld/mocplus.html

However, in a functional language you can override, as in Baby Modula-3, the type of methods and values at instantiation time, so I guess you can somewhat relax the strict rules relating the two object function types (to make one a subtype of the other):
http://www.hpl.hp.com/techreports/Compaq-DEC/SRC-RR-95.pdf

see p. 10 - S. 3.2.4 - Discussion

I just know they could make it work, but it was a very hard, complex system.

Thanks in advance

From rodney_bates at lcwb.coop Sat Jul 14 22:05:57 2012
From: rodney_bates at lcwb.coop (Rodney M.
 Bates)
Date: Sat, 14 Jul 2012 15:05:57 -0500
Subject: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
In-Reply-To:
References: <989FB282-786A-447D-8CA9-2432AE922DBF@m3w.org> <20120626181955.GB29355@topoi.pooq.com> <20120627015457.238041A205B@async.async.caltech.edu>
Message-ID: <5001D125.6020704@lcwb.coop>

On 06/27/2012 02:58 AM, Dirk Muysers wrote:
> Some time ago I have started to develop a unicode library based
> on the old M3 text model but using UTF-8 internally rather than
> Latin-1 (see README attachment). For reasons best known to
> me I had to put it on the backburner in favour of more urgent work.
> If anybody is interested in furthering this solution I would eagerly
> give the existing (pre-alpha) code away.
> This being said, there are certainly better hash algorithms than the
> one used by m3core (eg Goulburn, see
> http://www.clockandflame.com/media/Goulburn06.pdf).
>
>

And:


1. Properties

This part deals with properties of Unicode code-points/characters. We call Unicode code-points "runes" for brevity.
Unlike WIDECHAR's, runes cover the whole gamut of the Unicode specification. We could have defined a Rune as
TYPE Rune = [0..16_10FFFF], but unfortunately not all values in the code-point range are valid and others are left
undefined, so a "Rune" is defined as an integer. The library uses defensive programming by not allowing a string to
contain any invalid or undefined Rune.

I don't understand the reasoning here. Your criticism of the subrange type is that it contains invalid values
between the bounds, which you address with dynamic value checks inside the library code. But why eliminate the
subrange and change the type to an integer? It only drastically increases the number of invalid values,
by a factor of over 2^11, if integer is 32-bit, otherwise more. And it demotes the status of these
from statically-detected, in one compile, to dynamically-detected, requiring massive testing to get an even
partial level of confidence. It also precludes storing them in less than 64 bits on a 64-bit machine.

Am I missing something?

From jay.krell at cornell.edu Sun Jul 15 03:11:26 2012
From: jay.krell at cornell.edu (Jay)
Date: Sat, 14 Jul 2012 18:11:26 -0700
Subject: [M3devel] fcntl last parameter int vs. pointer
In-Reply-To: <1342279896.71297.YahooMailClassic@web29701.mail.ird.yahoo.com>
References: <1342279896.71297.YahooMailClassic@web29701.mail.ird.yahoo.com>
Message-ID:

Daniel your replies are pointless. You have exhausted my patience.

 - Jay (briefly/pocket-sized-computer-aka-phone)

From dabenavidesd at yahoo.es Sun Jul 15 03:28:59 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Sun, 15 Jul 2012 02:28:59 +0100 (BST)
Subject: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
In-Reply-To: <5001D125.6020704@lcwb.coop>
Message-ID: <1342315739.35223.YahooMailClassic@web29705.mail.ird.yahoo.com>

Hi all:
making changes of type representation of TEXT in text makes no sense, it's a violation of the Text abstraction
http://books.google.com.co/books?id=FbemcUFa0JIC&pg=PA303&dq=source=bl&ots=POGbXUhcW1&sig=WULSpZ74yYU30s-cZ2zhtNTByd8&hl=en&redir_esc=y#v=onepage&q&f=false

You are in fact claiming that the default type of CHAR is Latin-1. I don't get that, because you type-extend CHAR and say it's not the default; something is wrong there. I'm somehow suspicious about this need to type in two *different* ranges to receive a character in one script and one in another, which essentially means that the language is wrong in declaring TEXT as an opaque type and should always use both kinds of strings or, worse, the non-default type, which is naturally impossible. Sorry guys, but I don't agree with you on this one. I hope you make the best of the CM3 work or leave the package alone, a la DEC-SRC.
If you are thinking of widening the TEXT string package, make it polymorphic. That doesn't add a complexity burden to the language, and it explains the CHAR type and its extension better. But do it naturally, using the language's types; don't create your own type for that purpose alone.
Thanks in advance

From dabenavidesd at yahoo.es Sun Jul 15 03:44:36 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Sun, 15 Jul 2012 02:44:36 +0100 (BST)
Subject: [M3devel] fcntl last parameter int vs. pointer
In-Reply-To:
Message-ID: <1342316676.56405.YahooMailClassic@web29706.mail.ird.yahoo.com>

Hi all:
I'm sorry that I didn't give an example of my point myself (perhaps I'm being too abstract here), but since you cared to say all that, I will put it more openly:
Doing that type conversion, as the first URL says (look at the third row at the beginning, a. literal)
http://web.cs.mun.ca/~ulf/pld/mocplus.html#subclassing

will break modular safety. However, I'm telling you that one can make such an abstraction in Modula-3 (in a Baby-sized language) with functional programming, by obeying the subtype relation fcntl1 <: fcntl2. Of course, Jay, I suppose your fcntl1 is badly signed, am I right?
OK, I hope I'm being clearer.
Thanks for everyone's patience, in advance
URL: From dmuysers at hotmail.com Sun Jul 15 10:13:35 2012 From: dmuysers at hotmail.com (Dirk Muysers) Date: Sun, 15 Jul 2012 10:13:35 +0200 Subject: [M3devel] =?utf-8?b?QU5EICjigKYsIDE2X2ZmKeKApiBOb3Qgc2VyaW91cyAt?= =?utf-8?q?_or_so_I_hope!?= In-Reply-To: <5001D125.6020704@lcwb.coop> References: <989FB282-786A-447D-8CA9-2432AE922DBF@m3w.org><20120626181955.GB29355@topoi.pooq.com><20120627015457.238041A205B@async.async.caltech.edu> <5001D125.6020704@lcwb.coop> Message-ID: My reasoning here was a pragmatic rather than a type-theoretical one. A rune defined as an integer can be freely passed around, while as a subrange it undergoes a hidden range check at every assignment. Now that range check wouldn't buy me anything, since the validation of a rune entails more than a simple range check and remains unavoidable in order to ensure the postcondition of pure Unicode in any text. -------------------------------------------------- From: "Rodney M. Bates" Sent: Saturday, July 14, 2012 10:05 PM To: Subject: Re: [M3devel] AND (?, 16_ff)? Not serious - or so I hope! > > > On 06/27/2012 02:58 AM, Dirk Muysers wrote: >> Some time ago I have started to develop a unicode library based >> on the old M3 text model but using UTF-8 internally rather than >> Latin-1 (see README attachement). For reasons best known to >> me I had to put it on the backburner in favour of more urgent work. >> If anybody is interested in furthering this solution I would eagerly >> give the existing (pre-alpha) code away. >> This being said, there are certainly better hash algorithms than the >> one used by m3core (eg Goullburn, see >> http://www.clockandflame.com/media/Goulburn06.pdf). >> >> > And: > > > 1. Properties > > This part deals with properties of Unicode code-points/characters. We call > Unicode code-points "runes" for brevity. > Unlike WIDECHAR's, runes cover the the whole gamut of the Unicode > specification. 
> We could have defined a Rune as
> TYPE Rune = [0..16_10FFFF], but unfortunately not all values in the
> code-point range are valid and others are left undefined, so a "Rune"
> is defined as an integer. The library uses defensive programming by
> not allowing a string to contain any invalid or undefined Rune.
>
> I don't understand the reasoning here. Your criticism of the subrange
> type is that it contains invalid values between the bounds, which you
> address with dynamic value checks inside the library code. But why
> eliminate the subrange and change the type to an integer? It only
> drastically increases the number of invalid values, by a factor of
> over 2^11, if integer is 32-bit, otherwise more. And it demotes the
> status of these from statically-detected, in one compile, to
> dynamically-detected, requiring massive testing to get an even
> partial level of confidence. It also precludes storing them in less
> than 64 bits on a 64-bit machine.
>
> Am I missing something?

From dabenavidesd at yahoo.es Sun Jul 15 15:14:51 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Sun, 15 Jul 2012 14:14:51 +0100 (BST)
Subject: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
In-Reply-To:
Message-ID: <1342358091.65493.YahooMailClassic@web29704.mail.ird.yahoo.com>

Hi all:
wouldn't pragmas be the best solution here, inlining the TEXT type as some representation-specific character type, while still not making the language obey rules that aren't inherently correct? By that I mean that CHARs are what they are, and strings of CHAR values are compatible in the current implementation; it just doesn't care much about validating which character is typed in.
Thanks in advance

From rodney_bates at lcwb.coop Sun Jul 15 18:22:48 2012
From: rodney_bates at lcwb.coop (Rodney M. Bates)
Date: Sun, 15 Jul 2012 11:22:48 -0500
Subject: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
In-Reply-To: <1342315739.35223.YahooMailClassic@web29705.mail.ird.yahoo.com>
References: <1342315739.35223.YahooMailClassic@web29705.mail.ird.yahoo.com>
Message-ID: <5002EE58.6010401@lcwb.coop>

On 07/14/2012 08:28 PM, Daniel Alejandro Benavides D. wrote:
> Hi all:
> making changes of type representation of TEXT in text makes no sense, it's a violation of the Text abstraction

No, I disagree here. A primary property of an abstraction is that clients can use it _without_ knowledge of the internal representation. The representation can be changed without altering the behavior of any program that uses the abstraction. A program that imports representation-dependent interfaces such as TextRep.i3 is an exception, but doing so means it is a known abstraction violator from the beginning.
> http://books.google.com.co/books?id=FbemcUFa0JIC&pg=PA303&dq=source=bl&ots=POGbXUhcW1&sig=WULSpZ74yYU30s-cZ2zhtNTByd8&hl=en&redir_esc=y#v=onepage&q&f=false
>
> you are in fact claiming that the default type of CHAR is Latin-1. I don't get that, because you type-extend CHAR and say it's not the default; there is something bad here. I'm somehow suspicious about this need to type in two *different* ranges for receiving a character in one type script and one in another, essentially meaning that the language is wrong in declaring TEXT as an opaque type and should always use both kinds of strings or, worse, the non-default type, which is naturally impossible.

I'm not sure what you are saying here. The language does clearly say that CHAR contains (at least) ISO-Latin-1. But I am not proposing to extend CHAR beyond exactly ISO-Latin-1, as it is in every implementation of Modula-3. This is because I am sure doing so would break a large amount of existing code. Such code assumes that BYTESIZE(CHAR)=1.

I _am_ proposing to extend WIDECHAR to hold Unicode. WIDECHAR was added with this in mind, but today it fails because its range is too limited. I think WIDECHAR was probably added at a time when only 2^16 code points were in the standard(s). But that has changed. This is a very simple fix of that.

As for TEXT, the CM3 version is and always was abstractly a string of WIDECHAR. The procedures that have parameters of type CHAR just do the widening or narrowing at the time a character is passed in or out. The fact that the current representation holds some characters in 8-bit array elements is hidden by the Text abstraction, and can be changed if convenient.

In contrast, Wr/Rd and friends do not hide character representations in the stream. This is as it must be, and I am proposing only to add additional representations that they can handle, and to make it convenient for the usual case that an entire stream uses the same representation of characters.
> Sorry guys, but I'm not agreeing with you on this one. I hope you make the best of the CM3 work or leave the package alone, a la DEC-SRC.
> If you are thinking of widening the TEXT string package, make it polymorphic: it doesn't add a complexity burden to the language, and it explains the CHAR type and its extension better. But do it naturally using the language types; don't create your own one just for that purpose.
> Thanks in advance

From mika at async.caltech.edu Sun Jul 15 19:39:11 2012
From: mika at async.caltech.edu (Mika Nystrom)
Date: Sun, 15 Jul 2012 10:39:11 -0700
Subject: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
In-Reply-To:
References: <989FB282-786A-447D-8CA9-2432AE922DBF@m3w.org> <20120626181955.GB29355@topoi.pooq.com> <20120627015457.238041A205B@async.async.caltech.edu> <5001D125.6020704@lcwb.coop>
Message-ID: <20120715173911.D18A61A208F@async.async.caltech.edu>

I believe the compilers in existence are smart enough not to insert the
range check when the types are the same on both sides of the :=.
At least for copying... i.e.,

    a, b : WIDECHAR;
  BEGIN
    a := b
  END

should not imply a range check. With the types in question, that is
probably by far the most common operation, too.

    Mika

"Dirk Muysers" writes:
>My reasoning here was a pragmatic rather than a type-theoretical one.
>A rune defined as an integer can be freely passed around, while as
>a subrange it undergoes a hidden range check at every assignment.
>Now that range check wouldn't buy me anything, since the validation
>of a rune entails more than a simple range check and remains unavoidable
>in order to ensure the postcondition of pure Unicode in any text.
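
[Editorial sketch, not part of any message above.] The trade-off argued in this thread can be illustrated with a minimal Modula-3 program. It assumes a subrange declaration like the one the README rejected; the names RuneSketch, Rune and IsScalarValue are hypothetical and do not come from Dirk's library. It only checks for surrogates; rejecting unassigned code points, which the README also asks for, would need Unicode data tables and is left out. Whether a particular compiler really omits the check on a same-type copy, as Mika expects, is not verified here.

```modula3
MODULE RuneSketch EXPORTS Main;
(* Hypothetical sketch: Rune, IsScalarValue and the module name are
   illustrative only and are not taken from the library discussed. *)
IMPORT IO, Fmt;

TYPE
  Rune = [0 .. 16_10FFFF];  (* the whole Unicode code-point range *)

PROCEDURE IsScalarValue (r: Rune): BOOLEAN =
  (* The subrange already rules out values above 16_10FFFF, so only the
     surrogate block 16_D800 .. 16_DFFF needs an explicit test here. *)
  BEGIN
    RETURN r < 16_D800 OR r > 16_DFFF;
  END IsScalarValue;

VAR
  a, b: Rune;
  n: INTEGER := 16_20AC;

BEGIN
  a := n;  (* INTEGER to Rune: normally needs a runtime range check *)
  b := a;  (* Rune to Rune: same type on both sides, no check required *)
  IO.Put ("b = 16_" & Fmt.Int (b, 16) & ", scalar value: "
            & Fmt.Bool (IsScalarValue (b)) & "\n");
  IO.Put ("16_D800, scalar value: "
            & Fmt.Bool (IsScalarValue (16_D800)) & "\n");
END RuneSketch.
```

With Rune declared as INTEGER instead, both assignments would be unchecked, but IsScalarValue would also have to test the upper and lower bounds itself.
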
From dabenavidesd at yahoo.es Mon Jul 16 03:53:00 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Mon, 16 Jul 2012 02:53:00 +0100 (BST)
Subject: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
In-Reply-To: <5002EE58.6010401@lcwb.coop>
Message-ID: <1342403580.18580.YahooMailClassic@web29705.mail.ird.yahoo.com>

Hi all:
Yes, I was referring to native module clients (not C or any other foreign modules): for them to be able to change the representation is a violation.
As for the type problem, REF CHAR has a value space strictly larger than Latin-1. What I mean is that the encoding in one type or another must be determined by its subexpressions, not by defaults such as the TEXT type. Width subtyping means adding some value range which, as you say, may or may not be in the same range as Unicode; in that case it must be called WIDECHAR. You can't call it UCHAR etc.; that misses the point of the abstraction. Otherwise, how many types would we want, 20 or 30, according to the bit ending? Please give us a break: we are not C doers, and if we are, then call them in your libraries; we don't need to contaminate ourselves. Sorry, I'm not saying that you are being noisy, but this certainly could be that (me too).
Rodney, please correct me if I say something wrong, but are you saying that you will start putting conversion procedures and the like in every interface? Oh no, sorry; I hope I'm not the guy doing the converting because somebody needed an extra interface to code some language. It will be a real mess.
Thanks in advance

From rodney_bates at lcwb.coop Mon Jul 16 18:45:58 2012
From: rodney_bates at lcwb.coop (Rodney M. Bates)
Date: Mon, 16 Jul 2012 11:45:58 -0500
Subject: [M3devel] New OrdSets generic package
Message-ID: <50044546.4010202@lcwb.coop>

Now checked in inside m3-libs/ordsets, OrdSets is a generic interface and module for dynamically-sized sets of large-range ordinal types, in functional style.

From comments in OrdSets.ig:

(* This interface provides operations on sets whose members are of an ordinal type. It is written in a functional style. It never mutates a set value (except for some internal lazy computation, not visible to clients), and thus it sometimes is able to share heap objects.
Its primary use pattern is where the set values can have widely varying sizes, you want a very large maximum size limit, but many of the sets are expected to be much smaller than the maximum. For this to happen, you probably want to instantiate only with INTEGER or WIDECHAR. It will work with LONGINT, but only if its target-machine-dependent range is a subrange of INTEGER. There is no space or time performance benefit to instantiating with a subrange of the base type. If this does not fit your needs, you probably want to use Modula-3's builtin set type, or some other package.

The set representations occupy variable-sized heap objects, just sufficient for the set value. In the most general case, these use heap-allocated open arrays of machine words, with one bit per actual set member, plus some overhead, of course.

If you compile with a later CM3 Modula-3 compiler and garbage collector that tolerate misaligned "pseudo" pointers, i.e., with the least significant bit set to one, you can set a boolean constant in the corresponding module OrdSets.mg. This will cause it to utilize this Modula-3 implementation feature to store sufficiently small set values entirely within the pointer word, avoiding the high space and time overheads of heap allocation. The CM3 5-8 compiler is sufficient. SRC M3, PM3, EZM3, and earlier CM3 versions are not. As of 2012-7-15, Pickles do not handle these. Enable this with DoPseudoPointers, in OrdSets.mg. *)

From dabenavidesd at yahoo.es Thu Jul 19 17:02:21 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Thu, 19 Jul 2012 16:02:21 +0100 (BST)
Subject: [M3devel] About a new AMD64 binary
Message-ID: <1342710141.21612.YahooMailClassic@web29703.mail.ird.yahoo.com>

Hi all:
I'm writing to ask whether the produced .deb file(s) are available somewhere, to install on AMD64_LINUX. Hendrik, you have a copy yourself, right?
Thanks in advance