From dabenavidesd at yahoo.es Sun Jun 3 18:51:51 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Sun, 3 Jun 2012 17:51:51 +0100 (BST)
Subject: [M3devel] Renewed interest in Modula-3 in HP Labs
In-Reply-To: <1338470019.63945.YahooMailClassic@web29703.mail.ird.yahoo.com>
Message-ID: <1338742311.65879.YahooMailClassic@web29703.mail.ird.yahoo.com>

Hi all:

Looking at supporting a C backend, we would need to know how much we can optimize the energy consumption of any backend code generator, or how much total compilation time we can spend in M3CG (the result could be that we need to distribute a precompiled form; see p. 7 of http://www.fdi.ucm.es/profesor/ricardo/ei2/crisis.pdf ). This would be a rather good measure of whether an object-code backend is needed or not (like the GCC one, or a JVM one, or a translation-based one, in the way the first Pascal implementations were Pascal manually coded for the machine).

For instance, HP had the HP3000 [1], with several measurements; their "u-code" interface was not open but proprietary, so you couldn't get their compiler for A-L/SPL (contrary to Pascal). I'm sure they have worked on this problem as well for newer machines (such as FPGA reposition programs for VAXen and Alpha), but how much they will emulate in software I don't know. I write that because VAX is essentially translated to Alpha via M3CG, via hardware and equally in software. I know they are producing VAX in FPGA, but I don't know about Alphas at all.

Thanks in advance

[1] R. P. Blake, "Exploring a Stack Architecture," Computer, vol. 10, no. 5, pp. 30-39, May 1977.

--- On Thu, 31/5/12, Daniel Alejandro Benavides D. wrote:

From: Daniel Alejandro Benavides D.
Subject: [M3devel] Renewed interest in Modula-3 in HP Labs
To: m3devel at elegosoft.com
Date: Thursday, 31 May 2012, 08:13

Hi all:

I see there are some products coming from HP and others, but especially HP, that claim to provide lower consumption in data center power management. As I see it, they are working on Tycoon as a data processor (created in Germany and Europe). As Greg Nelson wrote code for profiling the Alphas and Itanium, perhaps they are interested in work on ESC, but nevertheless also in Modula-3 and its family of languages (Quest), as Tycoon is based on them. If I may say so, Quest was defined by its simple denotational semantics, which is the natural deduction system of Baby Modula-3 (though it lacks more than that, you can process its language through the former).

Do we want to confirm that? If anyone is interested in the TML - TVM, please write to me with any other questions or comments.

Thanks in advance

http://www.eetimes.com/electronics-news/4373994/HP-cuts-data-center-power-in-lab-tests?cid=NL_EETimesDaily
http://tycoon.hpl.hp.com/~tycoon/doc/users_manual_en/ch-intro.html
http://wwwmatthes.in.tum.de/file/Publications/1992/Math92/paper.pdf

From hendrik at topoi.pooq.com Sun Jun 3 23:18:47 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Sun, 3 Jun 2012 17:18:47 -0400
Subject: [M3devel] Renewed interest in Modula-3 in HP Labs
In-Reply-To: <1338742311.65879.YahooMailClassic@web29703.mail.ird.yahoo.com>
References: <1338470019.63945.YahooMailClassic@web29703.mail.ird.yahoo.com> <1338742311.65879.YahooMailClassic@web29703.mail.ird.yahoo.com>
Message-ID: <20120603211847.GA17923@topoi.pooq.com>

On Sun, Jun 03, 2012 at 05:51:51PM +0100, Daniel Alejandro Benavides D. wrote:
> semantics, which is the natural deduction system of Baby Modula-3

You keep mentioning Baby Modula-3, but I have no idea what it is.
Can you explain and provide links?

-- hendrik

From dabenavidesd at yahoo.es Sun Jun 3 23:48:42 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Sun, 3 Jun 2012 22:48:42 +0100 (BST)
Subject: [M3devel] Renewed interest in Modula-3 in HP Labs
In-Reply-To: <20120603211847.GA17923@topoi.pooq.com>
Message-ID: <1338760122.84788.YahooMailClassic@web29705.mail.ird.yahoo.com>

Hi all:

For sure, yes. It's a first-order, prototype-oriented functional programming language for writing the type systems of programming languages (in countries where Spanish is the native tongue, like Abadi's, the most common games or toys are baby dolls, if you care, hence its name, if I may say so).

Basically, the language itself is not dissimilar from Modula-3 in its object-oriented part. It has a type system in lambda calculus, written for its meta-languages as well (e.g. Modula-3). Its denotational semantics are expressed in a natural deduction system logic. It was basically constructed to explain object-oriented languages, though it wasn't written specially for that, but for constructing type system calculi (you could say a kind of IBM's Axiom for the computer science type theoretician, if I may say so).

No other system besides the DEC ones has ever played with it (its functional language, although simple, is not easily executable, so Cardelli and others decided to use a different calculus for their joint book "A Theory of Objects"). But on the very core issue of unification it led the work on type systems of its time.

Thanks in advance
From dragisha at m3w.org Wed Jun 6 09:57:40 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Wed, 6 Jun 2012 09:57:40 +0200
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To: <20120606064732.2C9242474003@birch.elegosoft.com>
References: <20120606064732.2C9242474003@birch.elegosoft.com>
Message-ID: <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org>

Jay,

What benefit from the 4.6 backend do we expect for cm3 if most of the optimizer is "optimized out" of cm3cg?

If our "trees" are the reason why you must switch optimizations off, is it not more logical to fix our "trees"? One by one, if need be: a look into gm2 (for example), then a fix in our backend. That way, future porting to the most recent gccs would be much easier?

TIA,
dd

On Jun 6, 2012, at 8:47 AM, Jay Krell wrote:
> Log message:
> remove more of the optimizer

From jay.krell at cornell.edu Wed Jun 6 10:10:06 2012
From: jay.krell at cornell.edu (Jay K)
Date: Wed, 6 Jun 2012 08:10:06 +0000
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To: <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org>
References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org>
Message-ID:

I have very mixed feelings about the optimizer.

1) I'm not certain it is worth the time it takes to run.
2) Fixing our trees isn't necessarily trivial. The most expedient thing is neither to fix the trees nor to remove the optimizer code, but merely to set the optimizer to be off in parse.c.
3) gcc is huge; I'd kind of like to see if building it can be made much faster/smaller.
4) Probably what really got me started here is the gmp/mpfr/mpc dependency.
5) The "best" thing isn't necessarily to use gcc at all.
6) I'll maybe move up to 4.7 soon.
6b) And maybe not spend so much time on it? Maybe just ln -s in gmp/mpfr/mpc and port only the needed changes? Maybe even not using g++ but the hybrid gcc/g++ I use for gcc-apple (4.2).
7) Do folks out there really use the Modula-3/gcc optimizer, and notice it produces code that runs much faster?

 - Jay

From jay.krell at cornell.edu Wed Jun 6 10:15:32 2012
From: jay.krell at cornell.edu (Jay K)
Date: Wed, 6 Jun 2012 08:15:32 +0000
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To:
References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org>
Message-ID:

> > What benefit from 4.6 backend do we expect for cm3 if most of optimizer

PS: just the general goodness of staying current. Even if a hacked-up current. 4.7.0 is out already.

 - Jay

From dragisha at m3w.org Wed Jun 6 10:51:33 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Wed, 6 Jun 2012 10:51:33 +0200
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To:
References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org>
Message-ID:

I am using it, and I need it.

Does it run better/faster? I didn't test, but is that something one even has to ask, with these days' architectures...?

Only if you turned everything off in 5.8.6 and later, as you're doing now, then my "-O2" default is probably of no benefit at all :).

Generally, our "pitch" is to "sell" super-modern-ultra-blast-mega-fast-superlative-OO and everything else you only dreamed about? And then add "no CPU optimizations"? Imagine that.

On Jun 6, 2012, at 10:10 AM, Jay K wrote:
> 7) Do folks out there really use the Modula-3/gcc optimizer, and notice it produces code that runs much faster?

From jay.krell at cornell.edu Wed Jun 6 11:38:18 2012
From: jay.krell at cornell.edu (Jay K)
Date: Wed, 6 Jun 2012 09:38:18 +0000
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To:
References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org>
Message-ID:

5.8.6 does allow many optimizations to occur.
We turn off a very small number directly.
Functions that call setjmp have optimizations inhibited by declaring all locals volatile.
We don't give the compiler good type information, and we take the address of stuff more than necessary, by generating very low-level code.

Where you have e.g.

    MODULE Foo;
    TYPE Point = RECORD x, y: INTEGER END;
    PROCEDURE GetY(VAR pt: Point): INTEGER =
      BEGIN
        RETURN pt.y;
      END GetY;

We generate the equivalent of:

    #include <stddef.h>   /* for ptrdiff_t */

    typedef ptrdiff_t INTEGER;
    typedef char* ADDRESS;

    /* y is reached by byte offset from an untyped address */
    INTEGER Foo_GetY(ADDRESS pt) { return *(INTEGER*)(pt + sizeof(INTEGER)); }

Maybe I'll wrap up 4.6, not enable it, and move on to 4.7.

 - Jay

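A minimal C sketch of the contrast Jay describes, assuming a hypothetical typed counterpart (GetY_typed, which is an illustration and not something cm3 emits) next to the untyped form from his message. Both functions read the same field; only the typed one tells the compiler that it is reading the y member of a Point, which is the kind of type information the optimizer is said to be missing:

```c
#include <stddef.h>

typedef ptrdiff_t INTEGER;
typedef char *ADDRESS;

/* Hypothetical C rendering of the Modula-3 RECORD from the message. */
typedef struct {
    INTEGER x;
    INTEGER y;
} Point;

/* Typed access: the compiler sees a Point and a named field. */
INTEGER GetY_typed(const Point *pt)
{
    return pt->y;
}

/* Untyped access, in the style of the generated code quoted above:
   the field is reached by byte offset from a plain address.
   This assumes y sits exactly sizeof(INTEGER) bytes into the record. */
INTEGER GetY_untyped(ADDRESS pt)
{
    return *(INTEGER *)(pt + sizeof(INTEGER));
}
```

Calling both on the same record should give the same result; the difference is only in what the compiler can prove about the access.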
From jay.krell at cornell.edu Wed Jun 6 11:42:52 2012
From: jay.krell at cornell.edu (Jay K)
Date: Wed, 6 Jun 2012 09:42:52 +0000
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To:
References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org>
Message-ID:

> Functions that call setjmp

I meant -- functions with TRY/EXCEPT or TRY/FINALLY. :)

 - Jay

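A minimal C sketch, assuming plain setjmp/longjmp as a stand-in for TRY/EXCEPT, of why locals in such functions end up volatile. It is not cm3's generated code, and the names handler, raise_error, and count_with_handler are made up for illustration. In C, a non-volatile local that is modified between setjmp and longjmp has an indeterminate value after the jump; marking it volatile makes the value well defined, at the cost of keeping it out of registers:

```c
#include <setjmp.h>

/* Hypothetical handler frame, standing in for an exception handler. */
static jmp_buf handler;

/* Stand-in for RAISE: transfer control back to the setjmp site. */
static void raise_error(void)
{
    longjmp(handler, 1);
}

int count_with_handler(void)
{
    /* volatile: this local is modified after setjmp and read after
       longjmp, so without volatile its value would be indeterminate. */
    volatile int attempts = 0;

    if (setjmp(handler) == 0) {
        attempts = 1;      /* the "TRY" body changes a local ...      */
        raise_error();     /* ... and then raises                      */
        return -1;         /* not reached                              */
    }

    /* The "EXCEPT" arm: the volatile local still holds the update. */
    return attempts;
}
```

Every access to a volatile local goes through memory, which is why applying it to all locals of a function inhibits much of what the optimizer would otherwise do there.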
From dragisha at m3w.org Wed Jun 6 12:17:54 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Wed, 6 Jun 2012 12:17:54 +0200
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To:
References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org>
Message-ID:

I know that much about generated code :).

The "good" thing is that not many things have changed in the *m3 backend since I ported pm3 to LINUX_ALPHA :)

From dabenavidesd at yahoo.es Wed Jun 6 16:17:23 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Wed, 6 Jun 2012 15:17:23 +0100 (BST)
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To:
Message-ID: <1338992243.7847.YahooMailClassic@web29704.mail.ird.yahoo.com>

Hi all:

I noticed that originally factored code is better, and if it's dead, then that's optimization.
I don't know too much about gcc or gdb, but factoring to match Open64 (C++) might be better.

About Alphas, I know that the DEC Firefly was commercialized as the SMP VS3520/40 and the unreleased V3820/40. Given that a DB vendor ported products to it, shouldn't we use their backends for a DB machine? Besides that, I think that developing a product for that end is what HP is doing:

http://www.zdnetasia.com/hp-aiming-for-data-protection-battleground-62305019.htm?src=newsletter

That said, Alphas wouldn't use gcc but their own backend-directed optimizer, as for their internal DEC language products.

Thanks in advance

From mika at async.caltech.edu Wed Jun 6 18:18:08 2012
From: mika at async.caltech.edu (Mika Nystrom)
Date: Wed, 06 Jun 2012 09:18:08 -0700
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To:
References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org>
Message-ID: <20120606161808.7F5EA1A205B@async.async.caltech.edu>

Jay K writes:
> ...
>7) Do folks out there really use the Modula-3/gcc optimizer, and notice it produces code that runs much faster?

If we are talking about turning on optimizations in the m3makefile, then the answer is:

Yes! At least with CM3 it makes a huge difference in runtime. Without the optimizer, CM3-produced code runs far slower than PM3-produced code (I've seen 3X, I think). With it, CM3 can sometimes keep up. Unless you use a lot of TYPECASE or other constructs that have a much less efficient implementation in the CM3 libraries than in the PM3 libraries.

    Mika

From dabenavidesd at yahoo.es Wed Jun 6 20:50:59 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Wed, 6 Jun 2012 19:50:59 +0100 (BST)
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To: <20120606161808.7F5EA1A205B@async.async.caltech.edu>
Message-ID: <1339008659.61806.YahooMailClassic@web29705.mail.ird.yahoo.com>

Hi all:

This is very bad news; it sounds like we have an old RT (runtime). I wonder how parallelized DEC-SRC Vulcan or similar environments were.

Thanks in advance

From hendrik at topoi.pooq.com Thu Jun 7 02:06:30 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Wed, 6 Jun 2012 20:06:30 -0400
Subject: [M3devel] ran out of space in /tmp while building .deb
Message-ID: <20120607000630.GA4233@topoi.pooq.com>

While trying to build a deb for Modula-3 on my laptop (a wheezy 32-bit Intel machine), /tmp got full and the build aborted.

Obviously, I should place /tmp elsewhere -- except that there's no entry in my /etc/fstab telling it where the tmpfs should be mounted. If I could just get it not to mount anything on /tmp, things should be fine. Apparently, though, the kernel just knows better, and I'm stuck with a small /tmp.

Is there any way to tell make-dist.py that it's supposed to put its temporary files somewhere other than /tmp?

-- hendrik

From dragisha at m3w.org Thu Jun 7 03:02:19 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Thu, 7 Jun 2012 03:02:19 +0200
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To: <20120607011634.468b6bbf@wenus.next.com.pl>
References: <20120606064732.2C9242474003@birch.elegosoft.com> <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org> <20120607011634.468b6bbf@wenus.next.com.pl>
Message-ID: <741029B0-331E-4E10-9886-86A78B0ED3CC@m3w.org>

Try ALPHA_LINUX, maybe ask Jay first :)

On Jun 7, 2012, at 1:16 AM, Dariusz Knociński wrote:

> On 2012-06-06, at 12:17:54, Dragiša Durić wrote:
>
>> I know that much about generated code :).
>>
>> "Good" thing is - not many things changed in *m3 backend since I ported pm3
>> to LINUX_ALPHA :)
>>
> Let me ask a stupid question. Is cm3 working on LINUX_ALPHA? I have one ES40
> working server with Gentoo Linux.
>
> Best Regards
> Dariusz Knociński.

From jay.krell at cornell.edu Thu Jun 7 03:19:20 2012
From: jay.krell at cornell.edu (Jay K)
Date: Thu, 7 Jun 2012 01:19:20 +0000
Subject: [M3devel] ran out of space in /tmp while building .deb
In-Reply-To: <20120607000630.GA4233@topoi.pooq.com>
References: <20120607000630.GA4233@topoi.pooq.com>
Message-ID:

Use the source. Change it if needed.

 - Jay

From jay.krell at cornell.edu Thu Jun 7 03:28:15 2012
From: jay.krell at cornell.edu (Jay K)
Date: Thu, 7 Jun 2012 01:28:15 +0000
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To: <741029B0-331E-4E10-9886-86A78B0ED3CC@m3w.org>
References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org>, <20120607011634.468b6bbf@wenus.next.com.pl>, <741029B0-331E-4E10-9886-86A78B0ED3CC@m3w.org>
Message-ID:

> Is cm3 working on LINUX_ALPHA? I have one ES40 working server with Gentoo Linux

I don't think it does yet, but give me ssh access and I can most likely make it work pretty quickly.

There is very, very little to porting these days. The main thing is finding the jmpbuf size and adding the target to various tables, describing it as little or big endian, 32-bit or 64-bit, etc. But even that is often automatic: if the name starts with "alpha_" or contains "64", it is assumed 64-bit. If it contains "alpha", it is probably assumed little endian. If it contains "_linux", then it is assumed Linux, etc.

For the jmpbuf size we can just assume something big like 1k (that is tremendous overkill). The jmpbuf size should/will soon be eliminated as a factor in porting anyway.

And then you just need to create a config file ALPHA_LINUX that includes "Alpha64.common" and "Linux.common" or such.

Does ALPHA_LINUX have a 32-bit mode/ABI? Or is it all 64-bit all the time? I.e., what does this do:

    echo > foo.c
    gcc -m32 foo.c

I had some Alphas but I've sold them all. I was given access to Alphas running Tru64 v4.something and v5.something and got that to work. But the "kernel" (Tru64 vs. Linux), and not the "processor architecture" (alpha, x86, sparc), is generally the larger concern, and Linux is really old hat at this point.

See... one day we'll generate C (and maybe have cooperative suspend) and these questions will all just go away. The answer will be "of course, most likely, nothing special".

 - Jay

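The name-based defaults Jay lists can be sketched in C as below. This is only an illustration of the stated rules, not the code cm3 uses: TargetGuess, guess_target, and lower_copy are hypothetical names, and the case-insensitive matching is an assumption made here because the target names in this thread are written in upper case while the patterns are quoted in lower case.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type for the guesses described in the message. */
typedef struct {
    bool is_64bit;          /* starts with "alpha_" or contains "64" */
    bool is_little_endian;  /* contains "alpha" (only rule stated)   */
    bool is_linux;          /* contains "_linux"                     */
} TargetGuess;

/* Copy name into out in lower case, truncating to fit, so that the
   substring checks below ignore case (an assumption of this sketch). */
static void lower_copy(const char *name, char *out, size_t size)
{
    size_t i = 0;
    if (size == 0)
        return;
    for (; name[i] != '\0' && i + 1 < size; i++)
        out[i] = (char)tolower((unsigned char)name[i]);
    out[i] = '\0';
}

/* Apply the three naming rules to a target name such as "ALPHA_LINUX". */
TargetGuess guess_target(const char *name)
{
    char lower[64];
    TargetGuess guess;

    lower_copy(name, lower, sizeof lower);
    guess.is_64bit = strncmp(lower, "alpha_", 6) == 0
                  || strstr(lower, "64") != NULL;
    guess.is_little_endian = strstr(lower, "alpha") != NULL;
    guess.is_linux = strstr(lower, "_linux") != NULL;
    return guess;
}
```

Under these rules a name like ALPHA_LINUX would be guessed as 64-bit, little endian, and Linux without any table entry. A false is_little_endian here only means the one stated rule did not match; it says nothing about targets the message does not cover.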
From jay.krell at cornell.edu Thu Jun 7 06:45:38 2012
From: jay.krell at cornell.edu (Jay K)
Date: Thu, 7 Jun 2012 04:45:38 +0000
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To: <20120606161808.7F5EA1A205B@async.async.caltech.edu>
References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org>, <20120606161808.7F5EA1A205B@async.async.caltech.edu>
Message-ID:

Daniel, I can't find the email now; as usual, you are probably wrong.

We don't have an older runtime, we have a newer one, I think. With more allowance for dynamic loading.

Mika,

Maybe a TYPECASE-intense design is generally poor? dynamic_cast is slow in some C++ implementations. And I've never seen it used much. Some, but not much. The "type matching" that C++ exception handling has to do isn't particularly fast, though there are other costs there. Other than the stack walk, there is "finding the base of the object", and strcmp to do the actual type match -- name-based type equality and all that, with a hope that it suffices and no runtime checking of type hashes like Modula-3 does.

Maybe you should switch on your own type tag? But I guess Modula-3 doesn't have unions. Or use OBJECT and method calls?

Which reminds me... it bothers me that OBJECT requires heap allocation and garbage collection. It shouldn't require either. I know we have function pointers available to simulate it without heap allocation, but what I don't know is whether the "implicit downcast" in a virtual function/method call is doable in safe code or not. I'll have to look into it, but I'm busy now.

Maybe there is an optimization whereby the compiler can figure out that there is a small set of likely types that it could check first? Or maybe the full feature could be implemented more efficiently? Maybe it can be optimized based on the fact that the types known to the system are read-mostly, rarely written/appended? I don't know. I'd really have to look into what the language supports and how it is implemented. I'm not certain of either.

In C++, typeid() is fast, and requires that there be virtual functions (OBJECT). Is TYPECASE limited to OBJECTs? Or to heap-allocated data?

Later..

 - Jay

From dragisha at m3w.org Thu Jun 7 09:30:29 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Thu, 7 Jun 2012 09:30:29 +0200
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To:
References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org>, <20120606161808.7F5EA1A205B@async.async.caltech.edu>
Message-ID: <0DF4844B-46D5-4AC7-97AD-AE18A38C2BED@m3w.org>

Exactly. Relevant parts of initialization are incremental.

On Jun 7, 2012, at 6:45 AM, Jay K wrote:

> Daniel, I can't find the email now, as usual, you are probably wrong.
>
> We don't have an older runtime, we have a newer one, I think.
> With more allowance for dynamic loading.

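Jay's suggestion in this thread to "switch on your own type tag" can be sketched in C, which does have unions. This is a hedged illustration only: Shape, ShapeTag, and shape_area are made-up names, and nothing here reflects how TYPECASE is implemented in the CM3 or PM3 runtime. The point is that dispatch becomes a comparison on a small integer stored in the value itself, with no runtime type lookup:

```c
/* Hypothetical tag identifying which variant a Shape holds. */
typedef enum { SHAPE_CIRCLE, SHAPE_RECT } ShapeTag;

/* A tagged union: the tag says which member of u is valid. */
typedef struct {
    ShapeTag tag;
    union {
        struct { double radius; } circle;
        struct { double width, height; } rect;
    } u;
} Shape;

/* Dispatch on the explicit tag instead of asking a runtime for the type. */
double shape_area(const Shape *s)
{
    switch (s->tag) {
    case SHAPE_CIRCLE:
        return 3.141592653589793 * s->u.circle.radius * s->u.circle.radius;
    case SHAPE_RECT:
        return s->u.rect.width * s->u.rect.height;
    }
    return -1.0;  /* unknown tag */
}
```

The trade-off is that the set of variants is closed and checked by hand, whereas TYPECASE or OBJECT methods let new types be added without touching existing code.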
URL: From dabenavidesd at yahoo.es Thu Jun 7 16:48:24 2012 From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.) Date: Thu, 7 Jun 2012 15:48:24 +0100 (BST) Subject: [M3devel] [M3commit] CVS Update: cm3 In-Reply-To: <0DF4844B-46D5-4AC7-97AD-AE18A38C2BED@m3w.org> Message-ID: <1339080504.10970.YahooMailClassic@web29703.mail.ird.yahoo.com> Hi all: Yes, but your estimation that user kind of behavior respect a programmer educated in using true multitask machine is not accurate. You can't read a program in two parts at a same machine, you need two different people, that it's so a true system of processors, you need a different kind of system to execute some action described. Little is said if you need to modify an OS code and another one also needs that how do you change the OS without interfering the other? To maintain a consistent view of your system? DEC-SRC were very well educated people who thought that easy of this was not hold in their system (Bob Taylor). They created yet another improvement to Modula-3+ in Modula-2+e. Instead of taking inspiration for that kind of systems, they developed a newer one but I don't know much more than that it was a Win system-like. That is the reason why Modula-3 in Object code view isn't quite of many other traditional OS fixed Machine (systems that don't scale anyhow). Instead of Virtual Machinery you are confronted a true Multitasking machine. OK, if you care about that, think what is done to be done for Modula-3 is the full formal definition of the language which starts in Baby Modula-3 consist in that user of the language use it in its own description (hard to explain but that's the only way I'm afraid). Thanks in advance --- On Thu, 7/6/12, Dragiša Durić wrote: From: Dragiša Durić Subject: Re: [M3devel] [M3commit] CVS Update: cm3 To: "Jay K" CC: "m3devel" Date: Thursday, 7 June 2012 02:30 Exactly. Relevant parts of initialization are incremental.
On Jun 7, 2012, at 6:45 AM, Jay K wrote: Daniel, I can't find the email now, as usual, you are probably wrong. We don't have an older runtime, we have a newer one, I think. With more allowance for dynamic loading. -------------- next part -------------- An HTML attachment was scrubbed... URL: From hendrik at topoi.pooq.com Thu Jun 7 17:35:52 2012 From: hendrik at topoi.pooq.com (Hendrik Boom) Date: Thu, 7 Jun 2012 11:35:52 -0400 Subject: [M3devel] ran out of space in /tmp while building .deb In-Reply-To: References: <20120607000630.GA4233@topoi.pooq.com> Message-ID: <20120607153552.GA8202@topoi.pooq.com> On Thu, Jun 07, 2012 at 01:19:20AM +0000, Jay K wrote: > > Use the source. Change it if needed. - Jay Thanks. But before I started hacking the source I found another way. It turns out that there's a parameter that suppresses mounting /tmp as a tmpfs, and it seems Debian thinks they got it wrong, and when the current initscripts trickle down from sid to testing the problem will go away by itself. I didn't wait; I changed the parameter; I won't have to hack the source. -- hendrik From hendrik at topoi.pooq.com Thu Jun 7 17:37:29 2012 From: hendrik at topoi.pooq.com (Hendrik Boom) Date: Thu, 7 Jun 2012 11:37:29 -0400 Subject: [M3devel] .debs for modula 3 In-Reply-To: References: <20120607000630.GA4233@topoi.pooq.com> Message-ID: <20120607153729.GB8202@topoi.pooq.com> By the way, is there anything I should be doing with these .debs I'm creating other than just using them myself? -- hendrik From dabenavidesd at yahoo.es Thu Jun 7 18:06:53 2012 From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.) Date: Thu, 7 Jun 2012 17:06:53 +0100 (BST) Subject: [M3devel] .debs for modula 3 In-Reply-To: <20120607153729.GB8202@topoi.pooq.com> Message-ID: <1339085213.73020.YahooMailClassic@web29702.mail.ird.yahoo.com> Hi all: Encouraging. What about a chip cipher sign from the repository?
I like the idea of the signing of the deb, if I had a utility to sign them by yourself or Elego folks who want to recreate them there (I think this is mostly perl) guys? http://www.advogato.org/article/750.html A different question is whether their sharing of packages is accepted by Elego admin since most of the development occurs not only there so you know, so use a center development or distributed (only DEC-SRC used their Vesta to sign cache builds but maybe others used it in DEC-*, etc). Thanks in advance --- On Thu, 7/6/12, Hendrik Boom wrote: From: Hendrik Boom Subject: [M3devel] .debs for modula 3 To: "m3devel" Date: Thursday, 7 June 2012 10:37 By the way, is there anything I should be doing with these .debs I'm creating other than just using them myself? -- hendrik -------------- next part -------------- An HTML attachment was scrubbed... URL: From mika at async.caltech.edu Thu Jun 7 18:36:41 2012 From: mika at async.caltech.edu (Mika Nystrom) Date: Thu, 07 Jun 2012 09:36:41 -0700 Subject: [M3devel] [M3commit] CVS Update: cm3 In-Reply-To: References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org> , <20120606161808.7F5EA1A205B@async.async.caltech.edu> Message-ID: <20120607163641.A81351A205B@async.async.caltech.edu> Hi Jay, TYPECASE is limited to "reference" types, which effectively means heap-allocated. Unless you can get alloca in there, I suppose... what I mean is that in Green Book Modula-3 the only way to get a reference type is either through a heap allocation or an UNSAFE operation. TYPECASE is sometimes the only way to do things. In the Green Book there are examples of using subtyping to have multiple generations of objects in the same pickles, for example. In my program, it was inside an interpreter that's figuring things out without any prior type information, using ISTYPE or TYPECASE.
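The trade-off in this exchange, TYPECASE inside an interpreter versus Jay's suggestion to use OBJECT and method calls, can be sketched outside Modula-3. What follows is a minimal Python sketch and not code from anyone's interpreter: `isinstance` merely stands in for TYPECASE/ISTYPE, and the class names (`Value`, `IntValue`, `TextValue`, `PairValue`) and both `describe_by_*` functions are hypothetical.

```python
# Hypothetical value classes for a toy interpreter; none of these names
# come from the thread or from any Modula-3 library.
class Value:
    def describe(self):
        raise NotImplementedError


class IntValue(Value):
    def __init__(self, n):
        self.n = n

    def describe(self):
        return "int " + str(self.n)


class TextValue(Value):
    def __init__(self, s):
        self.s = s

    def describe(self):
        return "text " + repr(self.s)


class PairValue(Value):
    def __init__(self, left, right):
        self.left = left
        self.right = right

    def describe(self):
        return "pair(" + self.left.describe() + ", " + self.right.describe() + ")"


def describe_by_typecase(v):
    """TYPECASE style: test the runtime type arm by arm; the first match wins."""
    if isinstance(v, IntValue):
        return "int " + str(v.n)
    elif isinstance(v, TextValue):
        return "text " + repr(v.s)
    elif isinstance(v, PairValue):
        return ("pair(" + describe_by_typecase(v.left) + ", "
                + describe_by_typecase(v.right) + ")")
    raise TypeError("no arm matches " + type(v).__name__)


def describe_by_method(v):
    """OBJECT style: a single dynamic dispatch, no chain of type tests."""
    return v.describe()


sample = PairValue(IntValue(1), TextValue("x"))
```

Both functions give the same answer for `sample`. The method form requires that every value class be written with the operation in mind, while the type-test form also works on values whose types were defined elsewhere, which appears to be the situation Mika describes.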
The issue with TYPECASE that I brought up is actually that the implementation of TYPECASE and ISTYPE is far slower in the CM3 m3core than in PM3's (= SRC M3 as far as I know). The reason (which you allude to) is that Critical Mass did a lot of work on supporting dynamic loading of Modula-3 code (loading in types not known at compile time) and as with many of the other projects they carried out, the code quality was so-so. Because of the restrictions of SRC and P M3, types are statically allocated at compile time and all their subtyping relationships are known at that time. There is simply a static array of the types. CM3, on the other hand, has some more complicated dynamic data structure that makes all the TYPECASE and ISTYPE operations much more cumbersome. It's all in RT0 somewhere. In short, CM3 does "more" than SRC M3 did but at a heavy performance cost. And of course no one uses the "more" bit now. Kind of like what they did to TEXTs... good ideas for some users, but somewhat half-baked implementation. Given that dynamic loading is used so little, if at all, and it in any case only happens infrequently itself, it seems there ought to be a way to achieve what the CM3 guys were trying to do while retaining the performance of the older implementation, but not if your code is a "rush job". I think it would have been sensible to vet Critical Mass's code a bit better before switching from PM3 to CM3 for the "official" distribution of Modula-3. I still use PM3 quite a bit. I can no longer blame the TEXTs, nor can I blame the pthreads implementation's being broken since I use CM3 with user threads. Now it's mainly because m3gdb works great on FreeBSD-5.5 with PM3-generated code. I've tried so many times to get things working on other machines with CM3 and newer m3gdb and there's always something annoyingly wrong. Life's too short... Mika P.S. how are the pthreads coming along? I saw some checkins (Dragisa), does the thread tester run without hanging or crashing now? 
I'd love to use pthreads but it's not been high on my list to debug as long as I can live with user threads... From rcolebur at SCIRES.COM Thu Jun 7 18:52:04 2012 From: rcolebur at SCIRES.COM (Coleburn, Randy) Date: Thu, 7 Jun 2012 12:52:04 -0400 Subject: [M3devel] [M3commit] CVS Update: cm3 In-Reply-To: <20120607163641.A81351A205B@async.async.caltech.edu> References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org> , <20120606161808.7F5EA1A205B@async.async.caltech.edu> <20120607163641.A81351A205B@async.async.caltech.edu> Message-ID: Mika: I concur with what you are saying about needing a way to retain the good ideas in CM3 without sacrificing so much on performance. As far as the thread test program goes, it still shows the implementation is broken somehow on Windows (2000, XP, & 7). What can I do to help debug and solve this problem? Am I correct that on Windows, Modula-3 threads are supposed to map to OS (Windows) threads?
Regards, Randy Coleburn From dabenavidesd at yahoo.es Thu Jun 7 21:42:44 2012 From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.) Date: Thu, 7 Jun 2012 20:42:44 +0100 (BST) Subject: [M3devel] [M3commit] CVS Update: cm3 In-Reply-To: Message-ID: <1339098164.65798.YahooMailClassic@web29703.mail.ird.yahoo.com> Hi all: Yes, it is, but the same conditioning over System Pthreads, is that you can't always link the threads against themselves, so you need re-implement it correctly. Good style DEC-SRC threads might be along the verification project for the Alpha with Vector extensions: http://barroso.org/publications/piranha_asilomar.pdf Thanks in advance --- On Thu, 7/6/12, Coleburn, Randy wrote: From: Coleburn, Randy Subject: Re: [M3devel] [M3commit] CVS Update: cm3 To: "m3devel at elegosoft.com" Date: Thursday, 7 June 2012 11:52 Mika: I concur with what you are saying about needing a way to retain the good ideas in CM3 without sacrificing so much on performance. As far as the thread test program goes, it still shows the implementation is broken somehow on Windows (2000, XP, & 7). What can I do to help debug and solve this problem? Am I correct that on Windows, Modula-3 threads are supposed to map to OS (Windows) threads? Regards, Randy Coleburn -------------- next part -------------- An HTML attachment was scrubbed... URL: From dragisha at m3w.org Thu Jun 7 22:09:58 2012 From: dragisha at m3w.org (Dragiša Durić) Date: Thu, 7 Jun 2012 22:09:58 +0200 Subject: [M3devel] [M3commit] CVS Update: cm3 In-Reply-To: <20120607163641.A81351A205B@async.async.caltech.edu> References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org> , <20120606161808.7F5EA1A205B@async.async.caltech.edu> <20120607163641.A81351A205B@async.async.caltech.edu> Message-ID: <166377B4-8CC2-4415-A08E-0655E75227A4@m3w.org> Are you sure about this? Both pm3 and cm3 load type structures from object files on initialization. Type data is in UNTRACED REF ARRAY... structures, for both of them.
Difference is in algorithm being incremental, "multi-pass" in cm3 and single-pass in pm3/SRC. Also, for garbage collection, there is a check to see if number of modules (meaning more globals areas) has grown, and rebuilding of globals list in case it is. There is nothing static in type structure of Modula-3. On Jun 7, 2012, at 6:36 PM, Mika Nystrom wrote: > Because of the restrictions of SRC and P M3, types are statically > allocated at compile time and all their subtyping relationships are known > at that time. There is simply a static array of the types. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mika at async.caltech.edu Thu Jun 7 22:35:37 2012 From: mika at async.caltech.edu (Mika Nystrom) Date: Thu, 07 Jun 2012 13:35:37 -0700 Subject: [M3devel] [M3commit] CVS Update: cm3 In-Reply-To: <166377B4-8CC2-4415-A08E-0655E75227A4@m3w.org> References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org> , <20120606161808.7F5EA1A205B@async.async.caltech.edu> <20120607163641.A81351A205B@async.async.caltech.edu> <166377B4-8CC2-4415-A08E-0655E75227A4@m3w.org> Message-ID: <20120607203537.1CBC81A205B@async.async.caltech.edu> Sorry, "static" was (slightly) the wrong word. I believe they are malloced as an array during program startup. There is something significant about the ordering of this array, which is why you can't just add types to the PM3 environment during runtime. CM3 uses more indirection, so it's much easier to add things while running, but it also makes TYPECASE, ISTYPE, etc., slower. Possibly NARROW (explicit as well as implicit) as well... Mika Dragiša Durić writes: > >Are you sure about this? > >Both pm3 and cm3 load type structures from object files on initialization. Type data is in UNTRACED REF ARRAY... structures, for both of them. > >Difference is in algorithm being incremental, "multi-pass" in cm3 and single-pass in pm3/SRC. Also, for garbage collection, there is a check to see if number of modules (meaning more globals areas) has grown, and rebuilding of globals list in case it is. > >There is nothing static in type structure of Modula-3. > >On Jun 7, 2012, at 6:36 PM, Mika Nystrom wrote: > >> Because of the restrictions of SRC and P M3, types are statically >> allocated at compile time and all their subtyping relationships are known >> at that time. There is simply a static array of the types.
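Mika's remark that the ordering of the type array is significant, and that more indirection makes ISTYPE and TYPECASE slower, matches a well-known pair of techniques for subtype tests. The Python sketch below is an illustration under assumptions, not the m3core source: it assumes typecodes are handed out in pre-order over the type tree so that every type's subtypes occupy one contiguous range, and all names (`TypeCell`, `number_types`, `last_subtype_tc`, the example types) are hypothetical.

```python
# Illustrative only: not taken from the PM3 or CM3 runtime.
class TypeCell:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        self.typecode = None          # filled in by number_types
        self.last_subtype_tc = None   # largest typecode among this type's subtypes
        if parent is not None:
            parent.children.append(self)


def number_types(root):
    """Assign typecodes in pre-order, so the subtypes of a type T are
    exactly the typecodes in [T.typecode, T.last_subtype_tc]."""
    table = []

    def visit(cell):
        cell.typecode = len(table)
        table.append(cell)
        for child in cell.children:
            visit(child)
        cell.last_subtype_tc = len(table) - 1

    visit(root)
    return table


def is_subtype_by_range(typecode, target):
    """Ordered-array test: two comparisons, no loop."""
    return target.typecode <= typecode <= target.last_subtype_tc


def is_subtype_by_chain(cell, target):
    """Indirect test: follow parent links until the target or the root."""
    while cell is not None:
        if cell is target:
            return True
        cell = cell.parent
    return False


# A small hypothetical type tree.
root = TypeCell("ROOT")
expr = TypeCell("Expr", root)
literal = TypeCell("Literal", expr)
add = TypeCell("Add", expr)
text = TypeCell("Text", root)
table = number_types(root)
```

Under this numbering, adding a new subtype of `Expr` after startup would force renumbering of every later type, which would be consistent with the restriction Mika describes; the chain walk needs no renumbering but costs one loop iteration per level of the hierarchy on every test.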

From hendrik at topoi.pooq.com Thu Jun 7 23:11:35 2012 From: hendrik at topoi.pooq.com (Hendrik Boom) Date: Thu, 7 Jun 2012 17:11:35 -0400 Subject: [M3devel] [M3commit] CVS Update: cm3 In-Reply-To: <20120607163641.A81351A205B@async.async.caltech.edu> References: <20120606064732.2C9242474003@birch.elegosoft.com> <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org> <20120606161808.7F5EA1A205B@async.async.caltech.edu> <20120607163641.A81351A205B@async.async.caltech.edu> Message-ID: <20120607211135.GA6314@topoi.pooq.com> On Thu, Jun 07, 2012 at 09:36:41AM -0700, Mika Nystrom wrote: > Hi Jay, > > TYPECASE is limited to "reference" types, which effectively means > heap-allocated. Unless you can get alloca in there, I suppose... what > I mean is that in Green Book Modula-3 the only way to get a reference > type is either through a heap allocation or an UNSAFE operation. > > TYPECASE is sometimes the only way to do things. In the Green Book > there are examples of using subtyping to have multiple generations > of objects in the same pickles, for example. In my program, it was > inside an interpreter that's figuring things out without any prior > type information, using ISTYPE or TYPECASE. > > The issue with TYPECASE that I brought up is actually that the > implementation of TYPECASE and ISTYPE is far slower in the CM3 m3core than > in PM3's (= SRC M3 as far as I know). The reason (which you allude to) > is that Critical Mass did a lot of work on supporting dynamic loading > of Modula-3 code (loading in types not known at compile time) and as > with many of the other projects they carried out, the code quality was > so-so. Because of the restrictions of SRC and P M3, types are statically > allocated at compile time and all their subtyping relationships are known > at that time. There is simply a static array of the types.
CM3, on the > other hand, has some more complicated dynamic data structure that makes > all the TYPECASE and ISTYPE operations much more cumbersome. It's all > in RT0 somewhere. In short, CM3 does "more" than SRC M3 did but at a > heavy performance cost. And of course no one uses the "more" bit now. I'd like to, if I only knew how. I'd be really interested in having the low-level infrastructure for JIT code generators. -- hendrik From rcolebur at SCIRES.COM Thu Jun 7 23:44:56 2012 From: rcolebur at SCIRES.COM (Coleburn, Randy) Date: Thu, 7 Jun 2012 17:44:56 -0400 Subject: [M3devel] [M3commit] CVS Update: cm3 In-Reply-To: <1339098164.65798.YahooMailClassic@web29703.mail.ird.yahoo.com> References: <1339098164.65798.YahooMailClassic@web29703.mail.ird.yahoo.com> Message-ID: Daniel: I'm impressed by your ability to provide so many different research links in your posts. But, after looking at the link you gave in response to my post, I don't see the immediate relevance to my question regarding Modula-3 threading on Windows. Also, I'm sorry, but I have a very difficult time trying to understand what you are saying in your posts. I suppose it must have something to do with the translation between our different languages. Forgive me, but I don't understand your reply. --Randy Coleburn From: Daniel Alejandro Benavides D. [mailto:dabenavidesd at yahoo.es] Sent: Thursday, June 07, 2012 3:43 PM To: m3devel at elegosoft.com; Coleburn, Randy Subject: Re: [M3devel] [M3commit] CVS Update: cm3 Hi all: Yes, it is, but the same conditioning over System Pthreads, is that you can't always link the threads against themselves, so you need re-implement it correctly. 
Good style DEC-SRC threads might be along the verification project for the Alpha with Vector extensions:
http://barroso.org/publications/piranha_asilomar.pdf

Thanks in advance

--- On Thu, 7/6/12, Coleburn, Randy wrote:

From: Coleburn, Randy
Subject: Re: [M3devel] [M3commit] CVS Update: cm3
To: "m3devel at elegosoft.com"
Date: Thursday, June 7, 2012 11:52

Mika:

I concur with what you are saying about needing a way to retain the good ideas in CM3 without sacrificing so much on performance.

As far as the thread test program goes, it still shows the implementation is broken somehow on Windows (2000, XP, & 7). What can I do to help debug and solve this problem?

Am I correct that on Windows, Modula-3 threads are supposed to map to OS (Windows) threads?

Regards,
Randy Coleburn

-----Original Message-----
From: Mika Nystrom [mailto:mika at async.caltech.edu]
Sent: Thursday, June 07, 2012 12:37 PM
To: Jay K
Cc: m3devel at elegosoft.com
Subject: Re: [M3devel] [M3commit] CVS Update: cm3

Hi Jay,

TYPECASE is limited to "reference" types, which effectively means heap-allocated. Unless you can get alloca in there, I suppose... what I mean is that in Green Book Modula-3 the only way to get a reference type is either through a heap allocation or an UNSAFE operation.

TYPECASE is sometimes the only way to do things. In the Green Book there are examples of using subtyping to have multiple generations of objects in the same pickles, for example. In my program, it was inside an interpreter that's figuring things out without any prior type information, using ISTYPE or TYPECASE.

The issue with TYPECASE that I brought up is actually that the implementation of TYPECASE and ISTYPE is far slower in the CM3 m3core than in PM3's (= SRC M3 as far as I know). The reason (which you allude to) is that Critical Mass did a lot of work on supporting dynamic loading of Modula-3 code (loading in types not known at compile time) and as with many of the other projects they carried out, the code quality was so-so. Because of the restrictions of SRC and P M3, types are statically allocated at compile time and all their subtyping relationships are known at that time. There is simply a static array of the types. CM3, on the other hand, has some more complicated dynamic data structure that makes all the TYPECASE and ISTYPE operations much more cumbersome. It's all in RT0 somewhere. In short, CM3 does "more" than SRC M3 did but at a heavy performance cost. And of course no one uses the "more" bit now.

Kind of like what they did to TEXTs... good ideas for some users, but somewhat half-baked implementation. Given that dynamic loading is used so little, if at all, and it in any case only happens infrequently itself, it seems there ought to be a way to achieve what the CM3 guys were trying to do while retaining the performance of the older implementation, but not if your code is a "rush job". I think it would have been sensible to vet Critical Mass's code a bit better before switching from PM3 to CM3 for the "official" distribution of Modula-3.

I still use PM3 quite a bit. I can no longer blame the TEXTs, nor can I blame the pthreads implementation's being broken since I use CM3 with user threads. Now it's mainly because m3gdb works great on FreeBSD-5.5 with PM3-generated code. I've tried so many times to get things working on other machines with CM3 and newer m3gdb and there's always something annoyingly wrong. Life's too short...

    Mika

P.S. how are the pthreads coming along? I saw some checkins (Dragisa), does the thread tester run without hanging or crashing now? I'd love to use pthreads but it's not been high on my list to debug as long as I can live with user threads...

Jay K writes:
>
>Daniel, I can't find the email now, as usual, you are probably wrong.
>
>We don't have an older runtime, we have a newer one, I think.
>With more allowance for dynamic loading.
>
>Mika,
>Maybe a TYPECASE-intense design is generally poor?
>dynamic_cast is slow in some C++ implementations.
>And I've never seen it used much. Some, but not much.
>The "type matching" that C++ exception handling has to do isn't
>particularly fast, though there are other costs there.
>Other than the stack walk, there is "finding the base of the
>object", and strcmp to do the actual type match --
>name-based-type-equality and all that, with a hope that it suffices
>and no runtime checking of type hashes like Modula-3 does..
>
>Maybe you should switch on your own type tag?
>  But I guess Modula-3 doesn't have unions.
>Or use OBJECT and method calls?
>
>Which reminds me...it bothers me that OBJECT requires heap allocation
>and garbage collection. It shouldn't require either.
>I know we have function pointers available to simulate it, without
>heap allocation, but what I don't know, is if the "implicit downcast"
>in a virtual function/method call is doable in safe code or not.
>I'll have to look into it..but I'm busy now..
>
>Maybe there is an optimization whereby the compiler can figure out that
>there is a small set of likely types that it could check first?
>
>Or maybe the full feature could be implemented more efficiently?
>
>Maybe it can be optimized based on the fact that the types known to the
>system are read-mostly, rarely written/appended?
>
>I don't know.
>I'd really have to look into what the language supports and how it is
>implemented. I'm not certain of either.
>
>In C++, typeid() is fast, and requires there be virtual functions
>(OBJECT). Is TYPECASE limited to OBJECTs?
>Or heap allocated data?
>
>Later..
> - Jay
>
>----------------------------------------
>> To: jay.krell at cornell.edu
>> CC: m3devel at elegosoft.com
>> Subject: Re: [M3devel] [M3commit] CVS Update: cm3
>> Date: Wed, 6 Jun 2012 09:18:08 -0700
>> From: mika at async.caltech.edu
>>
>> Jay K writes:
>> >
>> ...
>> >7) Do folks out there really use the Modula-3/gcc optimizer, and
>> >notice it produces code that runs much faster?
>>
>> If we are talking about turning on optimizations in the m3makefile,
>> then the answer is:
>>
>> Yes! At least with CM3 it makes a huge difference in runtime. Without
>> the optimizer CM3-produced code runs far slower than PM3-produced
>> code (I've seen 3X I think.) With it, CM3 can sometimes keep up.
>> Unless you use a lot of TYPECASE or other constructs that have a much
>> less efficient implementation in the CM3 libraries than in the PM3 libraries.
>>
>> Mika

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From dragisha at m3w.org Fri Jun 8 00:01:50 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Fri, 8 Jun 2012 00:01:50 +0200
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To: <20120607203537.1CBC81A205B@async.async.caltech.edu>
References: <20120606064732.2C9242474003@birch.elegosoft.com>,
 <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org> ,
 <20120606161808.7F5EA1A205B@async.async.caltech.edu>
 <20120607163641.A81351A205B@async.async.caltech.edu>
 <166377B4-8CC2-4415-A08E-0655E75227A4@m3w.org>
 <20120607203537.1CBC81A205B@async.async.caltech.edu>
Message-ID: <1495A062-1210-4D07-815C-D3609442C51B@m3w.org>

I've worked with both runtimes at this level (but not lately). And I can't think of one reason why this would be correct. (It does not make me right, I know:). Structures are equivalent, IIRC, primary difference being in algorithm.
Incremental RTLinker operation results in possible reallocation of type structures (bottom of the world, you-are-the-wizard-if you-read-this), but they are still "static" for most of (99.999..%) the process lifetime.

The question is important and I am sure it is fixable, if only we can identify the problem here. There is nothing inherent in the ability to load dynamically that demands bad data structures at the bottom of the M3 world. Only (not-improbable) sub-optimal decisions made by the cmass people at the time.

On Jun 7, 2012, at 10:35 PM, Mika Nystrom wrote:

> Sorry, "static" was (slightly) the wrong word.
>
> I believe they are malloced as an array during program startup. There is
> something significant about the ordering of this array, which is why you
> can't just add types to the PM3 environment during runtime. CM3 uses
> more indirection, so it's much easier to add things while running,
> but it also makes TYPECASE, ISTYPE, etc., slower. Possibly NARROW
> (explicit as well as implicit) as well...
>
> Mika

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From mika at async.caltech.edu Fri Jun 8 00:23:11 2012
From: mika at async.caltech.edu (Mika Nystrom)
Date: Thu, 07 Jun 2012 15:23:11 -0700
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To: <1495A062-1210-4D07-815C-D3609442C51B@m3w.org>
References: <20120606064732.2C9242474003@birch.elegosoft.com>,
 <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org> ,
 <20120606161808.7F5EA1A205B@async.async.caltech.edu>
 <20120607163641.A81351A205B@async.async.caltech.edu>
 <166377B4-8CC2-4415-A08E-0655E75227A4@m3w.org>
 <20120607203537.1CBC81A205B@async.async.caltech.edu>
 <1495A062-1210-4D07-815C-D3609442C51B@m3w.org>
Message-ID: <20120607222311.E35A71A205B@async.async.caltech.edu>

Admittedly it's been a while since I looked at this. I think what's going on is that they used some sort of topological sorting in SRC M3, which was broken by Critical Mass.
The reason for the slowdowns is clear if you study the following code
for IsSubtype.

PM3:

PROCEDURE IsSubtype (a, b: Typecode): BOOLEAN =
  VAR t := Get (b);
  BEGIN
    IF (a >= RT0u.nTypes) THEN BadType (a) END;
    IF (a = 0) THEN RETURN TRUE END;
    RETURN (t.typecode <= a AND a <= t.lastSubTypeTC);
  END IsSubtype;

CM3:

PROCEDURE IsSubtype (a, b: Typecode): BOOLEAN =
  VAR t: RT0.TypeDefn;
  BEGIN
    IF (a = RT0.NilTypecode) THEN RETURN TRUE END;
    t := Get (a);
    IF (t = NIL) THEN RETURN FALSE; END;
    IF (t.typecode = b) THEN RETURN TRUE END;
    WHILE (t.kind = ORD (TK.Obj)) DO
      IF (t.link_state = 0) THEN FinishTypecell (t, NIL); END;
      t := LOOPHOLE (t, RT0.ObjectTypeDefn).parent;
      IF (t = NIL) THEN RETURN FALSE; END;
      IF (t.typecode = b) THEN RETURN TRUE; END;
    END;
    IF (t.traced # 0)
      THEN RETURN (b = RT0.RefanyTypecode);
      ELSE RETURN (b = RT0.AddressTypecode);
    END;
  END IsSubtype;

Now let's take a peek at Typecase (it is emitted by the compiler for SRC
and P M3!)...

PROCEDURE ScanTypecase (ref: REFANY;
                        x: ADDRESS(*ARRAY [0..] OF Cell*)): INTEGER =
  VAR p: UNTRACED REF TypecaseCell;  i: INTEGER;  tc, xc: Typecode;
  BEGIN
    IF (ref = NIL) THEN RETURN 0; END;
    tc := TYPECODE (ref);
    p := x;  i := 0;
    LOOP
      IF (p.uid = 0) THEN RETURN i; END;
      IF (p.defn = NIL) THEN
        p.defn := FindType (p.uid);
        IF (p.defn = NIL) THEN
          Fail (RTE.MissingType, RTModule.FromDataAddress(x),
                LOOPHOLE (p.uid, ADDRESS), NIL);
        END;
      END;
      xc := LOOPHOLE (p.defn, RT0.TypeDefn).typecode;
      IF (tc = xc) OR IsSubtype (tc, xc) THEN RETURN i; END;
      INC (p, ADRSIZE (p^));  INC (i);
    END;
  END ScanTypecase;

    Mika

Dragiša Durić writes:
>
>I've worked with both runtimes at this level (but not lately). And I
>can't think of one reason why this would be correct. (It does not make
>me right, I know:). Structures are equivalent, IIRC, primary difference
>being in algorithm.
>Incremental RTLinker operation results in possible
>reallocation of type structures (bottom of the world,
>you-are-the-wizard-if you-read-this), but they are still "static" for
>the most of (99.999..%) process lifetime.
>
>Question is important and I am sure it is fixable, if only we can
>identify problem here. There is nothing inherent to ability for dynamic
>loading demanding bad data structures at the bottom of M3 world. Only
>(not-improbable) sub-optimal decisions made by cmass people at the
>moment.
>
>On Jun 7, 2012, at 10:35 PM, Mika Nystrom wrote:
>
>> Sorry, "static" was (slightly) the wrong word.
>>
>> I believe they are malloced as an array during program startup. There is
>> something significant about the ordering of this array, which is why you
>> can't just add types to the PM3 environment during runtime. CM3 uses
>> more indirection, so it's much easier to add things while running,
>> but it also makes TYPECASE, ISTYPE, etc., slower. Possibly NARROW
>> (explicit as well as implicit) as well...
>>
>> Mika

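
As a rough illustration of the two IsSubtype versions quoted above, here is a small C sketch of the same contrast: a constant-time range test in the PM3 style against a parent-chain walk in the CM3 style. It assumes typecodes are assigned in preorder over a single-inheritance hierarchy, which is what the PM3 range test appears to rely on. The TypeCell layout, the five-type hierarchy, and the function names are invented for this example; they are not the m3core definitions, and the NIL-typecode and REFANY/ADDRESS special cases are left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical type cell; field names only loosely follow the quoted code. */
typedef struct TypeCell {
    int typecode;        /* preorder number of this type */
    int last_subtype_tc; /* largest typecode among this type's descendants */
    int parent;          /* typecode of the supertype, -1 for the root */
} TypeCell;

/* Example hierarchy, numbered in preorder (an assumption of this sketch):
 *   0 Root
 *   1   Shape
 *   2     Circle
 *   3     Square
 *   4   Stream
 */
static const TypeCell types[] = {
    {0, 4, -1},
    {1, 3, 0},
    {2, 2, 1},
    {3, 3, 1},
    {4, 4, 0},
};
enum { N_TYPES = sizeof types / sizeof types[0] };

/* PM3-style test: a is a subtype of b iff a falls inside b's typecode range.
 * Constant time, but only valid while the preorder numbering is intact. */
static bool is_subtype_interval(int a, int b) {
    assert(a >= 0 && a < N_TYPES && b >= 0 && b < N_TYPES);
    return types[b].typecode <= a && a <= types[b].last_subtype_tc;
}

/* CM3-style test: follow parent links from a until b or the root is reached.
 * Needs no global ordering, but costs one step per inheritance level. */
static bool is_subtype_walk(int a, int b) {
    assert(a >= 0 && a < N_TYPES && b >= 0 && b < N_TYPES);
    for (int t = a; t != -1; t = types[t].parent) {
        if (t == b) return true;
    }
    return false;
}
```

Both functions give the same answers on this table. The difference is that adding a type at run time only needs a new parent link for the walk, whereas the range test would need the typecodes renumbered, which may be why the ordering of the PM3 type array matters.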
From jay.krell at cornell.edu Fri Jun 8 01:18:51 2012
From: jay.krell at cornell.edu (Jay)
Date: Thu, 7 Jun 2012 16:18:51 -0700
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To:
References: <20120606064732.2C9242474003@birch.elegosoft.com>
 <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org>
Message-ID:

Actually what I showed is frequently wrong. We often use bitfield references, which seems weird or wrong, but seems to work generally ok and produce better code. The RIGHT thing to use would be "component refs" in gcc parlance, but currently we don't and it isn't a small change. There is kind of a mismatch in the compiler architecture currently...

 - Jay (briefly/pocket-sized-computer-aka-phone)

On Jun 6, 2012, at 3:17 AM, Dragiša Durić wrote:

> I know that much about generated code :).
>
> "Good" thing is - not many things changed in *m3 backend since I ported pm3 to LINUX_ALPHA :)
>
> On Jun 6, 2012, at 11:42 AM, Jay K wrote:
>
>>
>>> Functions that call setjmp
>>
>> I meant -- functions with TRY/EXCEPT or TRY/FINALLY. :)
>>
>> - Jay
>>
>> ----------------------------------------
>>> From: jay.krell at cornell.edu
>>> To: dragisha at m3w.org
>>> Date: Wed, 6 Jun 2012 09:38:18 +0000
>>> CC: jkrell at elego.de; m3devel at elegosoft.com
>>> Subject: Re: [M3devel] [M3commit] CVS Update: cm3
>>>
>>> 5.8.6 does allow many optimizations to occur.
>>> We turn off a very small number directly.
>>> Functions that call setjmp have optimizations inhibited by declaring all locals volatile.
>>> We don't give the compiler good type information, and we take the address of stuff more than necessary, by
>>> generating very low level code.
>>> Where you have e.g.
>>> MODULE Foo;
>>> TYPE Point = RECORD x,y:INTEGER END;
>>> PROCEDURE GetY(VAR pt:Point):INTEGER = BEGIN RETURN pt.y; END GetY;
>>>
>>> We generate the equivalent of:
>>>
>>> typedef ptrdiff_t INTEGER;
>>> typedef char* ADDRESS;
>>> INTEGER Foo_GetY(ADDRESS pt) { return *(INTEGER*)(pt + sizeof(INTEGER)); }
>>>
>>> Maybe I'll wrap up 4.6, not enable it, and move on to 4.7..
>>>
>>> - Jay
>>>
>>> ________________________________
>>>> Subject: Re: [M3devel] [M3commit] CVS Update: cm3
>>>> From: dragisha at m3w.org
>>>> Date: Wed, 6 Jun 2012 10:51:33 +0200
>>>> CC: jkrell at elego.de; m3devel at elegosoft.com
>>>> To: jay.krell at cornell.edu
>>>>
>>>> I am using it, and I need it.
>>>>
>>>> Does it run better/faster? I didn't test, but is it something to even
>>>> ask, these days, architectures, ? ?
>>>>
>>>> Only if you turned everything off in 5.8.6 and later, as you're doing it
>>>> now, then probably my "-O2" default it is of no benefit at all :).
>>>>
>>>> Generally, our "pitch" to "sell"
>>>> super-modern-ultra-blast-mega-fast-superlative-OO and everything else
>>>> you only dreamed about? And add "no CPU optimizations"? Imagine that.
>>>>
>>>> On Jun 6, 2012, at 10:10 AM, Jay K wrote:
>>>>
>>>> 7) Do folks out there really use the Modula-3/gcc optimizer, and notice
>>>> it produces code that runs much faster?
>>>>

From dabenavidesd at yahoo.es Fri Jun 8 01:21:47 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Fri, 8 Jun 2012 00:21:47 +0100 (BST)
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To:
Message-ID: <1339111307.87279.YahooMailClassic@web29706.mail.ird.yahoo.com>

Hi all:
perhaps this would show it:
Again, what I'm saying is that you can use a WinNT system thread without losing M3 semantics as long as it is implemented as it is in the consistent Model Architecture of the system:
http://research.microsoft.com/en-us/um/people/qadeer/talks/microsoft-dec00.ppt

Recently a guy from Intel (Rick Hudson) explained his thoughts on that (but I find the same problem: I can't understand the problem he is talking about that much).

Rialto NT OS was implemented along the lines for embedded devices (nice!):
http://www.youtube.com/watch?v=WUfvvFD5tAA

DEC-SRC and MS worked together on this, in acting like so there was an Alpha "beta" Win2000, but it didn't happen, as the piranha project :(

See, these new architectures don't scale for that much, they say (sorry HW guys, but show me a good proof; I'm writing this from nothing related to it).

Thanks in advance

--- On Thu, 7/6/12, Coleburn, Randy wrote:

From: Coleburn, Randy
Subject: Re: [M3devel] [M3commit] CVS Update: cm3
To: "m3devel at elegosoft.com"
Date: Thursday, June 7, 2012 16:44

Daniel:

I'm impressed by your ability to provide so many different research links in your posts.

But, after looking at the link you gave in response to my post, I don't see the immediate relevance to my question regarding Modula-3 threading on Windows.

Also, I'm sorry, but I have a very difficult time trying to understand what you are saying in your posts. I suppose it must have something to do with the translation between our different languages. Forgive me, but I don't understand your reply.

--Randy Coleburn

-------------- next part --------------
An HTML attachment was scrubbed...
URL:

From jay.krell at cornell.edu Fri Jun 8 04:05:11 2012
From: jay.krell at cornell.edu (Jay K)
Date: Fri, 8 Jun 2012 02:05:11 +0000
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To:
References: , <1339098164.65798.YahooMailClassic@web29703.mail.ird.yahoo.com>,
Message-ID:

1. Yes, Daniel generally doesn't make sense to me either.

2. > Am I correct that on Windows, Modula-3 threads are supposed to map to OS (Windows) threads?

Yes. Note though that "LOCK" doesn't map directly to EnterCriticalSection and more significantly, m3core provides essentially condition variables, which don't map directly to Win32, prior to Vista. (LOCK at least does a delay-heap-allocation-and-initialize before EnterCriticalSection, but also interacts with the condition variables I recall.)
I did a bunch of "research" and our condition variable substitution is pretty good now, equivalent to what Java implementations do.Definitely better than others e.g. Boost. At some point maybe we could use the condition variables that Vista introduces, but 1) I'm reluctant to drop 2000/XP/etc. support and 2) if we implemented something that chose one implementation or the other at runtime, we'd lose coverage of the pre-Vista code. (I'm really disappointed in this area in Win32, that NT 3.1 and Windows 95 didn't have small locks, zero-or-at-least-statically-initializable locks, read/write locks, "once", and condition variables. Vista, finally, has all that. (SRWLOCK are the first three all in one -- small, zero-initialized, read/write...and given them, I'm not sure you really need "once".) Also note that historically we maintained a thread pool, so /creating/ a Modula-3 thread did not necessarily create a Win32 thread. I removed that though, so the implementation is more direct now, albeit probably slower. I didn't realize or forgot we had a problem here. I can try to look into it. The Win32 and pthreads implementation is similar enough, that it might easily be the same problem. - Jay From: rcolebur at SCIRES.COM To: m3devel at elegosoft.com Date: Thu, 7 Jun 2012 17:44:56 -0400 Subject: Re: [M3devel] [M3commit] CVS Update: cm3 Daniel: I?m impressed by your ability to provide so many different research links in your posts. But, after looking at the link you gave in response to my post, I don?t see the immediate relevance to my question regarding Modula-3 threading on Windows. Also, I?m sorry, but I have a very difficult time trying to understand what you are saying in your posts. I suppose it must have something to do with the translation between our different languages. Forgive me, but I don?t understand your reply. --Randy Coleburn From: Daniel Alejandro Benavides D. 
[mailto:dabenavidesd at yahoo.es] Sent: Thursday, June 07, 2012 3:43 PM To: m3devel at elegosoft.com; Coleburn, Randy Subject: Re: [M3devel] [M3commit] CVS Update: cm3 Hi all: Yes, it is, but the same conditioning over System Pthreads, is that you can't always link the threads against themselves, so you need re-implement it correctly. Good style DEC-SRC threads might be along the verification project for the Alpha with Vector extensions: http://barroso.org/publications/piranha_asilomar.pdf Thanks in advance --- El jue, 7/6/12, Coleburn, Randy escribi?: De: Coleburn, Randy Asunto: Re: [M3devel] [M3commit] CVS Update: cm3 Para: "m3devel at elegosoft.com" Fecha: jueves, 7 de junio, 2012 11:52Mika: I concur with what you are saying about needing a way to retain the good ideas in CM3 without sacrificing so much on performance. As far as the thread test program goes, it still shows the implementation is broken somehow on Windows (2000, XP, & 7). What can I do to help debug and solve this problem? Am I correct that on Windows, Modula-3 threads are supposed to map to OS (Windows) threads? Regards, Randy Coleburn -----Original Message----- From: Mika Nystrom [mailto:mika at async.caltech.edu] Sent: Thursday, June 07, 2012 12:37 PM To: Jay K Cc: m3devel at elegosoft.com Subject: Re: [M3devel] [M3commit] CVS Update: cm3 Hi Jay, TYPECASE is limited to "reference" types, which effectively means heap-allocated. Unless you can get alloca in there, I suppose... what I mean is that in Green Book Modula-3 the only way to get a reference type is either through a heap allocation or an UNSAFE operation. TYPECASE is sometimes the only way to do things. In the Green Book there are examples of using subtyping to have multiple generations of objects in the same pickles, for example. In my program, it was inside an interpreter that's figuring things out without any prior type information, using ISTYPE or TYPECASE. 
The issue with TYPECASE that I brought up is actually that the implementation of TYPECASE and ISTYPE is far slower in the CM3 m3core than in PM3's (= SRC M3 as far as I know). The reason (which you allude to) is that Critical Mass did a lot of work on supporting dynamic loading of Modula-3 code (loading in types not known at compile time) and as with many of the other projects they carried out, the code quality was so-so. Because of the restrictions of SRC and P M3, types are statically allocated at compile time and all their subtyping relationships are known at that time. There is simply a static array of the types. CM3, on the other hand, has some more complicated dynamic data structure that makes all the TYPECASE and ISTYPE operations much more cumbersome. It's all in RT0 somewhere. In short, CM3 does "more" than SRC M3 did but at a heavy performance cost. And of course no one uses the "more" bit now. Kind of like what they did to TEXTs... good ideas for some users, but somewhat half-baked implementation. Given that dynamic loading is used so little, if at all, and it in any case only happens infrequently itself, it seems there ought to be a way to achieve what the CM3 guys were trying to do while retaining the performance of the older implementation, but not if your code is a "rush job". I think it would have been sensible to vet Critical Mass's code a bit better before switching from PM3 to CM3 for the "official" distribution of Modula-3. I still use PM3 quite a bit. I can no longer blame the TEXTs, nor can I blame the pthreads implementation's being broken since I use CM3 with user threads. Now it's mainly because m3gdb works great on FreeBSD-5.5 with PM3-generated code. I've tried so many times to get things working on other machines with CM3 and newer m3gdb and there's always something annoyingly wrong. Life's too short... Mika P.S. how are the pthreads coming along? I saw some checkins (Dragisa), does the thread tester run without hanging or crashing now? 
I'd love to use pthreads but it's not been high on my list to debug as long as I can live with user threads...

Jay K writes:
>
>Daniel, I can't find the email now, as usual, you are probably wrong.
>
>We don't have an older runtime, we have a newer one, I think.
>With more allowance for dynamic loading.
>
>Mika,
>Maybe a TYPECASE-intense design is generally poor?
>dynamic_cast is slow in some C++ implementations.
>And I've never seen it used much. Some, but not much.
>The "type matching" that C++ exception handling has to do isn't
>particularly fast, though there are other costs there.
>Other than the stack walk, there is "finding the base of the
>object", and strcmp to do the actual type match --
>name-based-type-equality and all that, with a hope that it suffices
>and no runtime checking of type hashes like Modula-3 does..
>
>Maybe you should switch on your own type tag?
>  But I guess Modula-3 doesn't have unions.
>Or use OBJECT and method calls?
>
>Which reminds me...it bothers me that OBJECT requires heap allocation
>and garbage collection. It shouldn't require either.
>I know we have function pointers available to simulate it, without
>heap allocation, but what I don't know, is if the "implicit downcast"
>in a virtual function/method call is doable in safe code or not.
>I'll have to look into it..but I'm busy now..
>
>Maybe there is an optimization whereby the compiler can figure out that
>there is a small set of likely types that it could check first?
>
>Or maybe the full feature could be implemented more efficiently?
>
>Maybe it can be optimized based on the fact that the types known to the
>system are read-mostly, rarely written/appended?
>
>I don't know.
>I'd really have to look into what the language supports and how it is
>implemented. I'm not certain of either.
>
>In C++, typeid() is fast, and requires there be virtual functions
>(OBJECT). Is TYPECASE limited to OBJECTs?
>Or heap allocated data?
>
>Later..
> - Jay
>
>----------------------------------------
>> To: jay.krell at cornell.edu
>> CC: m3devel at elegosoft.com
>> Subject: Re: [M3devel] [M3commit] CVS Update: cm3
>> Date: Wed, 6 Jun 2012 09:18:08 -0700
>> From: mika at async.caltech.edu
>>
>> Jay K writes:
>> >
>> ...
>> >7) Do folks out there really use the Modula-3/gcc optimizer, and notice it
>> >produces code that runs much faster?
>>
>> If we are talking about turning on optimizations in the m3makefile, then the
>> answer is:
>>
>> Yes! At least with CM3 it makes a huge difference in runtime. Without
>> the optimizer CM3-produced code runs far slower than PM3-produced
>> code (I've seen 3X I think.) With it, CM3 can sometimes keep up.
>> Unless you use a lot of TYPECASE or other constructs that have a much
>> less efficient implementation in the CM3 libraries than in the PM3 libraries.
>>
>> Mika

From jay.krell at cornell.edu Fri Jun 8 04:13:02 2012
From: jay.krell at cornell.edu (Jay K)
Date: Fri, 8 Jun 2012 02:13:02 +0000
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To: <20120607211135.GA6314@topoi.pooq.com>
References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org>, , <20120606161808.7F5EA1A205B@async.async.caltech.edu>, , <20120607163641.A81351A205B@async.async.caltech.edu>, <20120607211135.GA6314@topoi.pooq.com>
Message-ID:

> I'd like to, if I only knew how. I'd be really interested in having the
> low-level infrastructure for JIT code generators

Would you be satisfied with a Modula-3 interpreter that interpreted a mostly-compiled form?
It shouldn't be difficult.
I don't know if our intermediate code was designed with interpretation in mind, but it seems like it wouldn't be particularly difficult.
You'd want a "linker" that just zips all the files and puts it "in" or "next to" the stub executable.
This would solve the distribution format problem, partly.
The existing intermediate code is platform-specific, but not by much (again: jumpbuf size, word size, endian, win32 vs. posix).

But I have to admit, I'm keener on generating C than a JIT or an interpreter, and interpreter is not JIT.

Um. What do you hope to gain from JIT?
A big reason I ask..is because..well, do you want to ship some portable-executable that relies on JIT being already installed/available? Or do you want to carry the JITer and its code together?
Or do you want to target an existing widely deployed JITer such as CLR or Java?

In my opinion, the biggest advantage of JIT is portable-executable, depending on widely deployed JITer.
But targeting CLR or Java isn't as easy as targeting your own custom thing.

I understand there are other advantages -- faster compilation, optimization very specific to runtime environment.
But I think portable-executable is most important. That's why I like "script". :)
There are disadvantages to JIT: slower execution/startup, maybe harder to debug, easy to reverse engineer (if you care).

Heck, at some point you just ship the compiler and portable-executable is source code.
There are pluses and minuses all around.

 - Jay

From jay.krell at cornell.edu Fri Jun 8 04:19:57 2012
From: jay.krell at cornell.edu (Jay K)
Date: Fri, 8 Jun 2012 02:19:57 +0000
Subject: [M3devel] gcc 4.6 backend w/o optimizer?
Message-ID:

I need to know if I can start moving targets up to a gcc 4.6 backend, given that I've removed the vast majority of the optimizer from it.
I will test some of the targets, maybe not all of them. So far I386_DARWIN and AMD64_DARWIN work and I built boot/cross archives for a very large list, and I can run cm3 on Solaris also (I forget which architectures, there are 4, probably SPARC32 at least).
Or if there is vehement rejection of a missing optimizer, I can abandon 4.6 and start work on 4.7 instead.
I get tired of the unnecessary tedium that I invented, so with 4.7, I'll try to keep the diff small, in particular:
  keep the gmp/mpfr/mpc dependencies
  don't compile it with C++ (except parse.c)

There is no longer a "core" distribution of gcc, but I'll still cut out vast swaths like all but the C and LTO frontends (Java, C++, Objective C, Objective C++, Fortran, Ada), all of the libraries (libjava, libada, libssp, libmudflap, libgfortran, libquadmath, libgcc, libstdc++, etc.).

I know I have one rejection of this but that might not be enough. Tony?

 - Jay

From dragisha at m3w.org Fri Jun 8 09:15:59 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Fri, 8 Jun 2012 09:15:59 +0200
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To: <20120607222311.E35A71A205B@async.async.caltech.edu>
References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org> , <20120606161808.7F5EA1A205B@async.async.caltech.edu> <20120607163641.A81351A205B@async.async.caltech.edu> <166377B4-8CC2-4415-A08E-0655E75227A4@m3w.org> <20120607203537.1CBC81A205B@async.async.caltech.edu> <1495A062-1210-4D07-815C-D3609442C51B@m3w.org> <20120607222311.E35A71A205B@async.async.caltech.edu>
Message-ID: <77A59CCB-0800-4C3C-8AF6-5B455B29DEF7@m3w.org>

Thank you for the effort.

A possible solution is to map typecodes to orderable IDs and re-sort every time the dynamic loader changes type metadata. Any takers?

That way, we would add only one or two array lookups to every TYPECASE invocation. The additional cost of the re-sort is paid only once, or a small number of times.

On Jun 8, 2012, at 12:23 AM, Mika Nystrom wrote:

> The reason for the slowdowns is clear if you study the following code
> for IsSubtype.
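A minimal sketch of the idea suggested in this message, under stated assumptions: if every type gets an ordered id from a pre-order walk of the subtype tree, together with the largest id found in its subtree, then a subtype test becomes two or three table lookups and a range comparison, and the tables are rebuilt only when a new type is loaded. The names TypeTable, register and is_subtype are hypothetical, the single-inheritance tree is an assumption, and this is not the RT0/RTType code; Python is used only for illustration.

```python
class TypeTable:
    """Toy typecode table: subtype test by pre-order id range (illustrative only)."""

    def __init__(self):
        self.parent = {}  # typecode -> parent typecode (None for a root type)
        self.order = {}   # typecode -> pre-order id in the subtype tree
        self.last = {}    # typecode -> largest pre-order id inside its subtree

    def register(self, typecode, parent=None):
        """Add a type (e.g. on dynamic load) and renumber the whole table."""
        if typecode in self.parent:
            raise ValueError("typecode already registered")
        if parent is not None and parent not in self.parent:
            raise KeyError("unknown parent typecode")
        self.parent[typecode] = parent
        self._renumber()  # rare, so a full O(n) rebuild is acceptable here

    def _renumber(self):
        children = {}
        roots = []
        for tc, p in self.parent.items():
            if p is None:
                roots.append(tc)
            else:
                children.setdefault(p, []).append(tc)
        self.order.clear()
        self.last.clear()
        counter = 0
        for root in roots:
            # Iterative depth-first walk; (tc, True) marks "subtree finished".
            stack = [(root, False)]
            while stack:
                tc, finished = stack.pop()
                if finished:
                    self.last[tc] = counter - 1
                    continue
                self.order[tc] = counter
                counter += 1
                stack.append((tc, True))
                for child in reversed(children.get(tc, [])):
                    stack.append((child, False))

    def is_subtype(self, typecode, ancestor):
        """True if typecode is ancestor or one of its descendants."""
        return self.order[ancestor] <= self.order[typecode] <= self.last[ancestor]


# Hypothetical typecodes: 10 is a root, 20 and 40 extend 10, 30 extends 20.
table = TypeTable()
table.register(10)
table.register(20, parent=10)
table.register(30, parent=20)
table.register(40, parent=10)
```

The design choice is to pay for the renumbering at load time so that the hot path (every TYPECASE arm or ISTYPE call) never walks a parent chain.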
From dragisha at m3w.org Fri Jun 8 10:06:49 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Fri, 8 Jun 2012 10:06:49 +0200
Subject: [M3devel] [M3commit] CVS Update: cm3 (windows condition variables)
In-Reply-To:
References: , <1339098164.65798.YahooMailClassic@web29703.mail.ird.yahoo.com>,
Message-ID:

Please explain this more, and if you can - draw parallel to *nix.

TIA

On Jun 8, 2012, at 4:05 AM, Jay K wrote:

> Note though that "LOCK" doesn't map directly to EnterCriticalSection and more significantly, m3core provides essentially condition variables, which don't map directly to Win32, prior to Vista.
> (LOCK at least does a delay-heap-allocation-and-initialize before EnterCriticalSection, but also interacts with the condition variables I recall.)
>
> I did a bunch of "research" and our condition variable substitution is pretty good now, equivalent to what Java implementations do.
> Definitely better than others e.g. Boost.

From jay.krell at cornell.edu Fri Jun 8 11:23:38 2012
From: jay.krell at cornell.edu (Jay K)
Date: Fri, 8 Jun 2012 09:23:38 +0000
Subject: [M3devel] [M3commit] CVS Update: cm3 (windows condition variables)
In-Reply-To:
References: , <1339098164.65798.YahooMailClassic@web29703.mail.ird.yahoo.com>, ,
Message-ID:

sorry -- clarification, we are similar to the widely used Sun/Oracle JVM.
Not necessarily state-of-the-art, but not bad.

Our locks map pretty directly to underlying pthread mutex, Win32 critical section.
Maybe not 100% directly. Maybe we delay-heap-allocate-and-initialize, i.e. so lock declaration/creation is super cheap -- just leave room for a pointer -- but there is a small extra code per lock acquire/release.
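The delay-allocate-and-initialize scheme described in the paragraph above can be sketched as follows. This is an illustration, not m3core's code: LazyMutex and _init_guard are hypothetical names, and serializing first-time initialization through one global guard is an assumption about how such a scheme could be made thread-safe.

```python
import threading

# Assumed design: one process-wide guard, used only the first time a lock is touched.
_init_guard = threading.Lock()


class LazyMutex:
    """A mutex whose real lock object is created on first acquire (illustrative only)."""

    __slots__ = ("_lock",)

    def __init__(self):
        # Declaration is cheap: just room for a reference, nothing allocated yet.
        self._lock = None

    def _ensure(self):
        # The "small extra code" on the acquire path: one check per acquire.
        lock = self._lock
        if lock is None:
            with _init_guard:
                lock = self._lock  # re-check: another thread may have won the race
                if lock is None:
                    lock = threading.Lock()
                    self._lock = lock
        return lock

    def acquire(self):
        self._ensure().acquire()

    def release(self):
        # release() is only legal after acquire(), so the lock already exists.
        self._lock.release()

    def __enter__(self):
        self.acquire()
        return self

    def __exit__(self, exc_type, exc, tb):
        self.release()
        return False
```

The trade-off is the one named in the message: creating a lock costs almost nothing, and each acquire pays for an extra check.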
Our condition variable functionality maps pretty directly to pthread condition variables.
Prior to Vista, there were no Win32 condition variables, but what we do is pretty good, better than many implementations out there (e.g. older Modula-3, Boost) and similar to widely used implementations, e.g. Sun/Oracle Java. In particular we do not have a giant lock for condition variable operations, which some literature says you need.

Historically the Win32 Modula-3 threading library had a giant lock to aid in condition variable implementation.
It was pretty bad.

Since pthread and Win32 are widely used, hopefully they are really good, and if not, will be improved for the vast majority of code to reuse. Tony, in your research, you should be sure to compare against Win32 SRWLOCK and newer versions of Windows (i.e. newer than XP). I'll try to read your paper.

 - Jay

> Subject: Re: [M3devel] [M3commit] CVS Update: cm3 (windows condition variables)
> From: antony.hosking at gmail.com
> Date: Fri, 8 Jun 2012 04:38:20 -0400
> CC: jay.krell at cornell.edu; m3devel at elegosoft.com
> To: dragisha at m3w.org
>
> On Jun 8, 2012, at 4:06 AM, Dragiša Durić wrote:
>
> > Please explain this more, and if you can - draw parallel to *nix.
> >
> > TIA
> >
> > On Jun 8, 2012, at 4:05 AM, Jay K wrote:
> >
> >> Note though that "LOCK" doesn't map directly to EnterCriticalSection and more significantly, m3core provides essentially condition variables, which don't map directly to Win32, prior to Vista.
> >> (LOCK at least does a delay-heap-allocation-and-initialize before EnterCriticalSection, but also interacts with the condition variables I recall.)
> >>
> >> I did a bunch of "research" and our condition variable substitution is pretty good now, equivalent to what Java implementations do.
> >> Definitely better than others e.g. Boost.
>
> We are certainly NOT equivalent to state-of-the-art Java implementations. Take a look at http://dx.doi.org/10.1145/2093157.2093184 for example.
> > - Tony > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dragisha at m3w.org Fri Jun 8 12:38:30 2012 From: dragisha at m3w.org (=?utf-8?Q?Dragi=C5=A1a_Duri=C4=87?=) Date: Fri, 8 Jun 2012 12:38:30 +0200 Subject: [M3devel] [M3commit] CVS Update: cm3 (windows condition variables) In-Reply-To: <8531351D-A02E-4635-971F-C96736810851@cs.purdue.edu> References: , <1339098164.65798.YahooMailClassic@web29703.mail.ird.yahoo.com>, , <8531351D-A02E-4635-971F-C96736810851@cs.purdue.edu> Message-ID: <07030E90-1842-47DB-8D68-86F154E05E0D@m3w.org> At least under Linux, uncontended access to futex is (IMHO) CAS based, user space operation. Same thing? On Jun 8, 2012, at 12:25 PM, Tony Hosking wrote: > My point is that modern JVMs, including Sun/Oracle HotSpot, don?t map every synchronized statement to an invocation of an underlying pthread or win32 lock. Instead, they use fast processor synchronization primitives like CAS compiled into the code to quickly "lock" an object in the vast majority of cases when no other thread is trying to lock the same object, without mapping to some pthread or win32 mutex. > > On Jun 8, 2012, at 5:23 AM, Jay K wrote: > >> sorry -- clarification, we are similar to the widely used Sun/Oracle JVM. >> Not necessarily state-of-the-art, but not bad. >> >> >> Our locks map pretty directly to underlying pthread mutex, Win32 critical section. >> Maybe not 100% directly. Maybe we delay-heap-allocate-and-initialize, i.e. so lock declaration/creation is super cheap -- just leave room for a pointer -- but there is a small extra code per lock acquire/release. >> >> >> Our condition variable functionaliy maps pretty directly to pthread condition variables. >> Prior to Vista, there were no Win32 condition variables, but what we do is pretty good, better than many implementations out there (e.g. older Modula-3, Boost) and similar to widely used implementations, e.g. Sun/Oracle Java. 
In particular we do not have a giant lock for condition variable operations, which some literature says you need. >> >> >> Historically the Win32 Modula-3 threading library had a giant lock to aid in condition variable implementation. >> It was pretty bad. >> >> >> Since pthread and Win32 are widely used, hopefully they are really good, and if not, will be improved for the vast majority of code to reuse. Tony, in your research, you should be sure to compare against Win32 SRWLOCK and newer versions of Windows (i.e. newer than XP). I'll try to read your paper. >> >> >> - Jay >> > Subject: Re: [M3devel] [M3commit] CVS Update: cm3 (windows condition variables) >> > From: antony.hosking at gmail.com >> > Date: Fri, 8 Jun 2012 04:38:20 -0400 >> > CC: jay.krell at cornell.edu; m3devel at elegosoft.com >> > To: dragisha at m3w.org >> > >> > >> > On Jun 8, 2012, at 4:06 AM, Dragi?a Duri? wrote: >> > >> > > Please explain this more, and if you can - draw parallel to *nix. >> > > >> > > TIA >> > > >> > > On Jun 8, 2012, at 4:05 AM, Jay K wrote: >> > > >> > >> Note though that "LOCK" doesn't map directly to EnterCriticalSection and more significantly, m3core provides essentially condition variables, which don't map directly to Win32, prior to Vista. >> > >> (LOCK at least does a delay-heap-allocation-and-initialize before EnterCriticalSection, but also interacts with the condition variables I recall.) >> > >> >> > >> >> > >> I did a bunch of "research" and our condition variable substitution is pretty good now, equivalent to what Java implementations do. >> > >> Definitely better than others e.g. Boost. >> > >> > We are certainly NOT equivalent to state-of-the-art Java implementations. Take a look at http://dx.doi.org/10.1145/2093157.2093184 for example. >> > >> > - Tony >> > > > > > Antony Hosking | Associate Professor | Computer Science | Purdue University > 305 N. 
University Street | West Lafayette | IN 47907 | USA > Mobile +1 765 427 5484 > > > > > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dragisha at m3w.org Fri Jun 8 12:48:47 2012 From: dragisha at m3w.org (=?utf-8?Q?Dragi=C5=A1a_Duri=C4=87?=) Date: Fri, 8 Jun 2012 12:48:47 +0200 Subject: [M3devel] [M3commit] CVS Update: cm3 (windows condition variables) In-Reply-To: <07030E90-1842-47DB-8D68-86F154E05E0D@m3w.org> References: , <1339098164.65798.YahooMailClassic@web29703.mail.ird.yahoo.com>, , <8531351D-A02E-4635-971F-C96736810851@cs.purdue.edu> <07030E90-1842-47DB-8D68-86F154E05E0D@m3w.org> Message-ID: <39FC6EBD-FA8B-4C5F-89CC-E83986C0E01E@m3w.org> On Jun 8, 2012, at 12:38 PM, Dragi?a Duri? wrote: > At least under Linux, uncontended access to futex is (IMHO) CAS based, user space operation. > > Same thing? > Meaning: "At least under Linux, Modula-3 using pthreads does same thing as modern JVMs?" > On Jun 8, 2012, at 12:25 PM, Tony Hosking wrote: > >> My point is that modern JVMs, including Sun/Oracle HotSpot, don?t map every synchronized statement to an invocation of an underlying pthread or win32 lock. Instead, they use fast processor synchronization primitives like CAS compiled into the code to quickly "lock" an object in the vast majority of cases when no other thread is trying to lock the same object, without mapping to some pthread or win32 mutex. >> >> On Jun 8, 2012, at 5:23 AM, Jay K wrote: >> >>> sorry -- clarification, we are similar to the widely used Sun/Oracle JVM. >>> Not necessarily state-of-the-art, but not bad. >>> >>> >>> Our locks map pretty directly to underlying pthread mutex, Win32 critical section. >>> Maybe not 100% directly. Maybe we delay-heap-allocate-and-initialize, i.e. so lock declaration/creation is super cheap -- just leave room for a pointer -- but there is a small extra code per lock acquire/release. 
>>> >>> >>> Our condition variable functionaliy maps pretty directly to pthread condition variables. >>> Prior to Vista, there were no Win32 condition variables, but what we do is pretty good, better than many implementations out there (e.g. older Modula-3, Boost) and similar to widely used implementations, e.g. Sun/Oracle Java. In particular we do not have a giant lock for condition variable operations, which some literature says you need. >>> >>> >>> Historically the Win32 Modula-3 threading library had a giant lock to aid in condition variable implementation. >>> It was pretty bad. >>> >>> >>> Since pthread and Win32 are widely used, hopefully they are really good, and if not, will be improved for the vast majority of code to reuse. Tony, in your research, you should be sure to compare against Win32 SRWLOCK and newer versions of Windows (i.e. newer than XP). I'll try to read your paper. >>> >>> >>> - Jay >>> > Subject: Re: [M3devel] [M3commit] CVS Update: cm3 (windows condition variables) >>> > From: antony.hosking at gmail.com >>> > Date: Fri, 8 Jun 2012 04:38:20 -0400 >>> > CC: jay.krell at cornell.edu; m3devel at elegosoft.com >>> > To: dragisha at m3w.org >>> > >>> > >>> > On Jun 8, 2012, at 4:06 AM, Dragi?a Duri? wrote: >>> > >>> > > Please explain this more, and if you can - draw parallel to *nix. >>> > > >>> > > TIA >>> > > >>> > > On Jun 8, 2012, at 4:05 AM, Jay K wrote: >>> > > >>> > >> Note though that "LOCK" doesn't map directly to EnterCriticalSection and more significantly, m3core provides essentially condition variables, which don't map directly to Win32, prior to Vista. >>> > >> (LOCK at least does a delay-heap-allocation-and-initialize before EnterCriticalSection, but also interacts with the condition variables I recall.) >>> > >> >>> > >> >>> > >> I did a bunch of "research" and our condition variable substitution is pretty good now, equivalent to what Java implementations do. >>> > >> Definitely better than others e.g. Boost. 
>>> > >>> > We are certainly NOT equivalent to state-of-the-art Java implementations. Take a look at http://dx.doi.org/10.1145/2093157.2093184 for example. >>> > >>> > - Tony >>> > >> >> >> >> Antony Hosking | Associate Professor | Computer Science | Purdue University >> 305 N. University Street | West Lafayette | IN 47907 | USA >> Mobile +1 765 427 5484 >> >> >> >> >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From hosking at cs.purdue.edu Fri Jun 8 12:25:20 2012 From: hosking at cs.purdue.edu (Tony Hosking) Date: Fri, 8 Jun 2012 06:25:20 -0400 Subject: [M3devel] [M3commit] CVS Update: cm3 (windows condition variables) In-Reply-To: References: , <1339098164.65798.YahooMailClassic@web29703.mail.ird.yahoo.com>, , Message-ID: <8531351D-A02E-4635-971F-C96736810851@cs.purdue.edu> My point is that modern JVMs, including Sun/Oracle HotSpot, don?t map every synchronized statement to an invocation of an underlying pthread or win32 lock. Instead, they use fast processor synchronization primitives like CAS compiled into the code to quickly "lock" an object in the vast majority of cases when no other thread is trying to lock the same object, without mapping to some pthread or win32 mutex. On Jun 8, 2012, at 5:23 AM, Jay K wrote: > sorry -- clarification, we are similar to the widely used Sun/Oracle JVM. > Not necessarily state-of-the-art, but not bad. > > > Our locks map pretty directly to underlying pthread mutex, Win32 critical section. > Maybe not 100% directly. Maybe we delay-heap-allocate-and-initialize, i.e. so lock declaration/creation is super cheap -- just leave room for a pointer -- but there is a small extra code per lock acquire/release. > > > Our condition variable functionaliy maps pretty directly to pthread condition variables. > Prior to Vista, there were no Win32 condition variables, but what we do is pretty good, better than many implementations out there (e.g. 
older Modula-3, Boost) and similar to widely used implementations, e.g. Sun/Oracle Java. In particular we do not have a giant lock for condition variable operations, which some literature says you need. > > > Historically the Win32 Modula-3 threading library had a giant lock to aid in condition variable implementation. > It was pretty bad. > > > Since pthread and Win32 are widely used, hopefully they are really good, and if not, will be improved for the vast majority of code to reuse. Tony, in your research, you should be sure to compare against Win32 SRWLOCK and newer versions of Windows (i.e. newer than XP). I'll try to read your paper. > > > - Jay > > Subject: Re: [M3devel] [M3commit] CVS Update: cm3 (windows condition variables) > > From: antony.hosking at gmail.com > > Date: Fri, 8 Jun 2012 04:38:20 -0400 > > CC: jay.krell at cornell.edu; m3devel at elegosoft.com > > To: dragisha at m3w.org > > > > > > On Jun 8, 2012, at 4:06 AM, Dragi?a Duri? wrote: > > > > > Please explain this more, and if you can - draw parallel to *nix. > > > > > > TIA > > > > > > On Jun 8, 2012, at 4:05 AM, Jay K wrote: > > > > > >> Note though that "LOCK" doesn't map directly to EnterCriticalSection and more significantly, m3core provides essentially condition variables, which don't map directly to Win32, prior to Vista. > > >> (LOCK at least does a delay-heap-allocation-and-initialize before EnterCriticalSection, but also interacts with the condition variables I recall.) > > >> > > >> > > >> I did a bunch of "research" and our condition variable substitution is pretty good now, equivalent to what Java implementations do. > > >> Definitely better than others e.g. Boost. > > > > We are certainly NOT equivalent to state-of-the-art Java implementations. Take a look at http://dx.doi.org/10.1145/2093157.2093184 for example. > > > > - Tony > > Antony Hosking | Associate Professor | Computer Science | Purdue University 305 N. 
University Street | West Lafayette | IN 47907 | USA Mobile +1 765 427 5484 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jay.krell at cornell.edu Fri Jun 8 13:20:56 2012 From: jay.krell at cornell.edu (Jay K) Date: Fri, 8 Jun 2012 11:20:56 +0000 Subject: [M3devel] [M3commit] CVS Update: cm3 (windows condition variables) In-Reply-To: <39FC6EBD-FA8B-4C5F-89CC-E83986C0E01E@m3w.org> References: , <1339098164.65798.YahooMailClassic@web29703.mail.ird.yahoo.com>, , <8531351D-A02E-4635-971F-C96736810851@cs.purdue.edu> <07030E90-1842-47DB-8D68-86F154E05E0D@m3w.org>, <39FC6EBD-FA8B-4C5F-89CC-E83986C0E01E@m3w.org> Message-ID: > At least under Linux, uncontended access to futex is (IMHO) CAS based, user space operation. So is uncontended Win32 critical section and uncontended Win32 SRWLOCK. Just disassemble and/or step through them... Mutex/Semaphore/Event, those always go to the kernel, unfortunately. So our win32 condition-variable-ish stuff might, I have to check. It'd be unfortunate, but it still probably as good as it can be, short of depending on Vista. (Uncontended Vista+ condition variables surely don't involve the kernel either.) The CAS isn't inlined. There is a function call. A dynamically linked one, so at least on Win32, it goes through a function pointer, but other than inlining factors, it can still be very fast. It can bias to a thread, and such. But there will be a function call. - Jay Subject: Re: [M3devel] [M3commit] CVS Update: cm3 (windows condition variables) From: dragisha at m3w.org Date: Fri, 8 Jun 2012 12:48:47 +0200 CC: m3devel at elegosoft.com; jay.krell at cornell.edu To: hosking at cs.purdue.edu On Jun 8, 2012, at 12:38 PM, Dragi?a Duri? wrote:At least under Linux, uncontended access to futex is (IMHO) CAS based, user space operation. Same thing? Meaning: "At least under Linux, Modula-3 using pthreads does same thing as modern JVMs?" 
On Jun 8, 2012, at 12:25 PM, Tony Hosking wrote:My point is that modern JVMs, including Sun/Oracle HotSpot, don't map every synchronized statement to an invocation of an underlying pthread or win32 lock. Instead, they use fast processor synchronization primitives like CAS compiled into the code to quickly "lock" an object in the vast majority of cases when no other thread is trying to lock the same object, without mapping to some pthread or win32 mutex. On Jun 8, 2012, at 5:23 AM, Jay K wrote:sorry -- clarification, we are similar to the widely used Sun/Oracle JVM. Not necessarily state-of-the-art, but not bad. Our locks map pretty directly to underlying pthread mutex, Win32 critical section. Maybe not 100% directly. Maybe we delay-heap-allocate-and-initialize, i.e. so lock declaration/creation is super cheap -- just leave room for a pointer -- but there is a small extra code per lock acquire/release. Our condition variable functionaliy maps pretty directly to pthread condition variables. Prior to Vista, there were no Win32 condition variables, but what we do is pretty good, better than many implementations out there (e.g. older Modula-3, Boost) and similar to widely used implementations, e.g. Sun/Oracle Java. In particular we do not have a giant lock for condition variable operations, which some literature says you need. Historically the Win32 Modula-3 threading library had a giant lock to aid in condition variable implementation. It was pretty bad. Since pthread and Win32 are widely used, hopefully they are really good, and if not, will be improved for the vast majority of code to reuse. Tony, in your research, you should be sure to compare against Win32 SRWLOCK and newer versions of Windows (i.e. newer than XP). I'll try to read your paper. 
- Jay > Subject: Re: [M3devel] [M3commit] CVS Update: cm3 (windows condition variables) > From: antony.hosking at gmail.com > Date: Fri, 8 Jun 2012 04:38:20 -0400 > CC: jay.krell at cornell.edu; m3devel at elegosoft.com > To: dragisha at m3w.org > > > On Jun 8, 2012, at 4:06 AM, Dragi?a Duri? wrote: > > > Please explain this more, and if you can - draw parallel to *nix. > > > > TIA > > > > On Jun 8, 2012, at 4:05 AM, Jay K wrote: > > > >> Note though that "LOCK" doesn't map directly to EnterCriticalSection and more significantly, m3core provides essentially condition variables, which don't map directly to Win32, prior to Vista. > >> (LOCK at least does a delay-heap-allocation-and-initialize before EnterCriticalSection, but also interacts with the condition variables I recall.) > >> > >> > >> I did a bunch of "research" and our condition variable substitution is pretty good now, equivalent to what Java implementations do. > >> Definitely better than others e.g. Boost. > > We are certainly NOT equivalent to state-of-the-art Java implementations. Take a look at http://dx.doi.org/10.1145/2093157.2093184 for example. > > - Tony > Antony Hosking | Associate Professor | Computer Science | Purdue University305 N. University Street | West Lafayette | IN 47907 | USAMobile +1 765 427 5484 -------------- next part -------------- An HTML attachment was scrubbed... URL: From jay.krell at cornell.edu Fri Jun 8 13:50:14 2012 From: jay.krell at cornell.edu (Jay K) Date: Fri, 8 Jun 2012 11:50:14 +0000 Subject: [M3devel] [M3commit] CVS Update: cm3 (windows condition variables) In-Reply-To: References: , , <1339098164.65798.YahooMailClassic@web29703.mail.ird.yahoo.com>, , , , , , , , <8531351D-A02E-4635-971F-C96736810851@cs.purdue.edu>, <07030E90-1842-47DB-8D68-86F154E05E0D@m3w.org>, , <39FC6EBD-FA8B-4C5F-89CC-E83986C0E01E@m3w.org>, Message-ID: I don't fully understand the paper, but clearly people want to both avoid the function call, and the CAS. 
And clearly this is viable and often profitable -- often times locks are only ever acquired by one thread, or are locked many times by one thread, then many times by another, etc. The tricky part is adapting to determine which locks benefit, and handling the "transitions" (or "bias revocation") when a "second" thread does acquire the lock. Traditional C/C++ systems are always going to have the function call.Whether or not the CAS can be optimized away in such "unmanaged" systems, I don't know.For example, Win32 SRWLOCKs have no "cleanup" function, nor a required "initialize" function, so that might limit the flexibility of the implementation, though certainly is also advantageous.. - Jay From: jay.krell at cornell.edu To: dragisha at m3w.org; hosking at cs.purdue.edu Date: Fri, 8 Jun 2012 11:20:56 +0000 CC: m3devel at elegosoft.com Subject: Re: [M3devel] [M3commit] CVS Update: cm3 (windows condition variables) > At least under Linux, uncontended access to futex is (IMHO) CAS based, user space operation. So is uncontended Win32 critical section and uncontended Win32 SRWLOCK. Just disassemble and/or step through them... Mutex/Semaphore/Event, those always go to the kernel, unfortunately. So our win32 condition-variable-ish stuff might, I have to check. It'd be unfortunate, but it still probably as good as it can be, short of depending on Vista. (Uncontended Vista+ condition variables surely don't involve the kernel either.) The CAS isn't inlined. There is a function call. A dynamically linked one, so at least on Win32, it goes through a function pointer, but other than inlining factors, it can still be very fast. It can bias to a thread, and such. But there will be a function call. - Jay Subject: Re: [M3devel] [M3commit] CVS Update: cm3 (windows condition variables) From: dragisha at m3w.org Date: Fri, 8 Jun 2012 12:48:47 +0200 CC: m3devel at elegosoft.com; jay.krell at cornell.edu To: hosking at cs.purdue.edu On Jun 8, 2012, at 12:38 PM, Dragi?a Duri? 
wrote:At least under Linux, uncontended access to futex is (IMHO) CAS based, user space operation. Same thing? Meaning: "At least under Linux, Modula-3 using pthreads does same thing as modern JVMs?" On Jun 8, 2012, at 12:25 PM, Tony Hosking wrote:My point is that modern JVMs, including Sun/Oracle HotSpot, don't map every synchronized statement to an invocation of an underlying pthread or win32 lock. Instead, they use fast processor synchronization primitives like CAS compiled into the code to quickly "lock" an object in the vast majority of cases when no other thread is trying to lock the same object, without mapping to some pthread or win32 mutex. On Jun 8, 2012, at 5:23 AM, Jay K wrote:sorry -- clarification, we are similar to the widely used Sun/Oracle JVM. Not necessarily state-of-the-art, but not bad. Our locks map pretty directly to underlying pthread mutex, Win 32 critical section. Maybe not 100% directly. Maybe we delay-heap-allocate-and-initialize, i.e. so lock declaration/creation is super cheap -- just leave room for a pointer -- but there is a small extra code per lock acquire/release. Our condition variable functionaliy maps pretty directly to pthread condition variables. Prior to Vista, there were no Win32 condition variables, but what we do is pretty good, better than many implementations out there (e.g. older Modula-3, Boost) and similar to widely used implementations, e.g. Sun/Oracle Java. In particular we do not have a giant lock for condition variable operations, which some literature says you need. Historically the Win32 Modula-3 threading library had a giant lock to aid in condition variable implementation. It was pretty bad. Since pthread and Win32 are widely used, hopefully they are really good, and if not, will be improved for the va st majority of code to reuse. Tony, in your research, you should be sure to compare against Win32 SRWLOCK and newer versions of Windows (i.e. newer than XP). I'll try to read your paper. 
 - Jay

> Subject: Re: [M3devel] [M3commit] CVS Update: cm3 (windows condition variables)
> From: antony.hosking at gmail.com
> Date: Fri, 8 Jun 2012 04:38:20 -0400
> CC: jay.krell at cornell.edu; m3devel at elegosoft.com
> To: dragisha at m3w.org
>
> On Jun 8, 2012, at 4:06 AM, Dragiša Durić wrote:
>
> > Please explain this more, and if you can - draw parallel to *nix.
> >
> > TIA
> >
> > On Jun 8, 2012, at 4:05 AM, Jay K wrote:
> >
> >> Note though that "LOCK" doesn't map directly to EnterCriticalSection and more significantly, m3core provides essentially condition variables, which don't map directly to Win32, prior to Vista.
> >> (LOCK at least does a delay-heap-allocation-and-initialize before EnterCriticalSection, but also interacts with the condition variables I recall.)
> >>
> >> I did a bunch of "research" and our condition variable substitution is pretty good now, equivalent to what Java implementations do.
> >> Definitely better than others e.g. Boost.
>
> We are certainly NOT equivalent to state-of-the-art Java implementations. Take a look at http://dx.doi.org/10.1145/2093157.2093184 for example.
>
> - Tony
>

Antony Hosking | Associate Professor | Computer Science | Purdue University
305 N. University Street | West Lafayette | IN 47907 | USA
Mobile +1 765 427 5484

From hosking at cs.purdue.edu Fri Jun 8 16:40:35 2012
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Fri, 8 Jun 2012 10:40:35 -0400
Subject: [M3devel] [M3commit] CVS Update: cm3 (windows condition variables)
In-Reply-To:
References: <1339098164.65798.YahooMailClassic@web29703.mail.ird.yahoo.com> <8531351D-A02E-4635-971F-C96736810851@cs.purdue.edu> <07030E90-1842-47DB-8D68-86F154E05E0D@m3w.org> <39FC6EBD-FA8B-4C5F-89CC-E83986C0E01E@m3w.org>
Message-ID: <9A392836-2304-4D12-BB87-78A01C7391DF@cs.purdue.edu>

Right.
On Jun 8, 2012, at 7:50 AM, Jay K wrote:

> I don't fully understand the paper, but clearly people want to both avoid the function call, and the CAS.
> And clearly this is viable and often profitable -- oftentimes locks are only ever acquired by one thread, or are locked many times by one thread, then many times by another, etc. The tricky part is adapting to determine which locks benefit, and handling the "transitions" (or "bias revocation") when a "second" thread does acquire the lock.
>
> Traditional C/C++ systems are always going to have the function call.
> Whether or not the CAS can be optimized away in such "unmanaged" systems, I don't know.
> For example, Win32 SRWLOCKs have no "cleanup" function, nor a required "initialize" function, so that might limit the flexibility of the implementation, though it certainly is also advantageous.
>
> - Jay
>
> From: jay.krell at cornell.edu
> To: dragisha at m3w.org; hosking at cs.purdue.edu
> Date: Fri, 8 Jun 2012 11:20:56 +0000
> CC: m3devel at elegosoft.com
> Subject: Re: [M3devel] [M3commit] CVS Update: cm3 (windows condition variables)
>
> > At least under Linux, uncontended access to futex is (IMHO) CAS based, user space operation.
>
> So is uncontended Win32 critical section and uncontended Win32 SRWLOCK. Just disassemble and/or step through them...
> Mutex/Semaphore/Event, those always go to the kernel, unfortunately.
> So our win32 condition-variable-ish stuff might, I have to check. It'd be unfortunate, but it is still probably as good as it can be, short of depending on Vista. (Uncontended Vista+ condition variables surely don't involve the kernel either.)
>
> The CAS isn't inlined. There is a function call. A dynamically linked one, so at least on Win32, it goes through a function pointer, but other than inlining factors, it can still be very fast. It can bias to a thread, and such. But there will be a function call.
>
> - Jay
>
> Subject: Re: [M3devel] [M3commit] CVS Update: cm3 (windows condition variables)
> From: dragisha at m3w.org
> Date: Fri, 8 Jun 2012 12:48:47 +0200
> CC: m3devel at elegosoft.com; jay.krell at cornell.edu
> To: hosking at cs.purdue.edu
>
> On Jun 8, 2012, at 12:38 PM, Dragiša Durić wrote:
>
> At least under Linux, uncontended access to futex is (IMHO) CAS based, user space operation.
>
> Same thing?
>
> Meaning: "At least under Linux, Modula-3 using pthreads does the same thing as modern JVMs?"
>
> On Jun 8, 2012, at 12:25 PM, Tony Hosking wrote:
>
> My point is that modern JVMs, including Sun/Oracle HotSpot, don't map every synchronized statement to an invocation of an underlying pthread or win32 lock. Instead, they use fast processor synchronization primitives like CAS compiled into the code to quickly "lock" an object in the vast majority of cases when no other thread is trying to lock the same object, without mapping to some pthread or win32 mutex.
>
> On Jun 8, 2012, at 5:23 AM, Jay K wrote:
>
> sorry -- clarification, we are similar to the widely used Sun/Oracle JVM.
> Not necessarily state-of-the-art, but not bad.
>
> Our locks map pretty directly to underlying pthread mutex, Win32 critical section.
> Maybe not 100% directly. Maybe we delay-heap-allocate-and-initialize, i.e. so lock declaration/creation is super cheap -- just leave room for a pointer -- but there is a small amount of extra code per lock acquire/release.
>
> Our condition variable functionality maps pretty directly to pthread condition variables.
> Prior to Vista, there were no Win32 condition variables, but what we do is pretty good, better than many implementations out there (e.g. older Modula-3, Boost) and similar to widely used implementations, e.g. Sun/Oracle Java. In particular we do not have a giant lock for condition variable operations, which some literature says you need.
>
> Historically the Win32 Modula-3 threading library had a giant lock to aid in condition variable implementation.
> It was pretty bad.
>
> Since pthread and Win32 are widely used, hopefully they are really good, and if not, will be improved for the vast majority of code to reuse. Tony, in your research, you should be sure to compare against Win32 SRWLOCK and newer versions of Windows (i.e. newer than XP). I'll try to read your paper.
>
> - Jay
> > Subject: Re: [M3devel] [M3commit] CVS Update: cm3 (windows condition variables)
> > From: antony.hosking at gmail.com
> > Date: Fri, 8 Jun 2012 04:38:20 -0400
> > CC: jay.krell at cornell.edu; m3devel at elegosoft.com
> > To: dragisha at m3w.org
> >
> > On Jun 8, 2012, at 4:06 AM, Dragiša Durić wrote:
> >
> > > Please explain this more, and if you can - draw parallel to *nix.
> > >
> > > TIA
> > >
> > > On Jun 8, 2012, at 4:05 AM, Jay K wrote:
> > >
> > >> Note though that "LOCK" doesn't map directly to EnterCriticalSection and more significantly, m3core provides essentially condition variables, which don't map directly to Win32, prior to Vista.
> > >> (LOCK at least does a delay-heap-allocation-and-initialize before EnterCriticalSection, but also interacts with the condition variables I recall.)
> > >>
> > >> I did a bunch of "research" and our condition variable substitution is pretty good now, equivalent to what Java implementations do.
> > >> Definitely better than others e.g. Boost.
> >
> > We are certainly NOT equivalent to state-of-the-art Java implementations. Take a look at http://dx.doi.org/10.1145/2093157.2093184 for example.
> >
> > - Tony
> >
>
> Antony Hosking | Associate Professor | Computer Science | Purdue University
> 305 N. University Street | West Lafayette | IN 47907 | USA
> Mobile +1 765 427 5484
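
For readers following the inline-CAS point in this exchange, here is a minimal sketch in C11 of what an inlinable uncontended fast path could look like. It is an illustration only: the `thin_lock` type and the `thin_lock_*` names are hypothetical, it is not how m3core, HotSpot, pthreads or Win32 implement their locks, and it leaves out both the out-of-line slow path (blocking on an OS primitive) and the lock biasing discussed above.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical thin lock: 0 means unlocked, any other value is the owner's id. */
typedef struct {
    _Atomic uintptr_t owner;
} thin_lock;

static inline void thin_lock_init(thin_lock *l)
{
    atomic_init(&l->owner, (uintptr_t)0);
}

/* Fast path: a single CAS that the compiler can inline at the lock site.
   Returns 1 if the lock was free and is now owned by `self`, 0 otherwise. */
static inline int thin_lock_try(thin_lock *l, uintptr_t self)
{
    uintptr_t expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &l->owner, &expected, self,
        memory_order_acquire, memory_order_relaxed);
}

static inline void thin_lock_release(thin_lock *l, uintptr_t self)
{
    /* Only the current owner may release the lock. */
    assert(atomic_load_explicit(&l->owner, memory_order_relaxed) == self);
    atomic_store_explicit(&l->owner, (uintptr_t)0, memory_order_release);
}

/* Single-threaded walk-through; returns 1 if every step behaved as expected. */
int thin_lock_demo(void)
{
    thin_lock l;
    int ok = 1;

    thin_lock_init(&l);
    ok &= thin_lock_try(&l, 1) == 1;  /* uncontended: the CAS succeeds */
    ok &= thin_lock_try(&l, 2) == 0;  /* held by 1: a real runtime would now
                                         fall back to its out-of-line slow path */
    thin_lock_release(&l, 1);
    ok &= thin_lock_try(&l, 2) == 1;  /* free again */
    thin_lock_release(&l, 2);
    return ok;
}
```

The design point is only that the common case costs one atomic instruction and no call; everything that makes a production lock hard (waiting, fairness, bias revocation) lives in the part this sketch omits.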
From hendrik at topoi.pooq.com Fri Jun 8 16:55:40 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Fri, 8 Jun 2012 10:55:40 -0400
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To:
References: <20120606064732.2C9242474003@birch.elegosoft.com> <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org> <20120606161808.7F5EA1A205B@async.async.caltech.edu> <20120607163641.A81351A205B@async.async.caltech.edu> <20120607211135.GA6314@topoi.pooq.com>
Message-ID: <20120608145540.GA10805@topoi.pooq.com>

On Fri, Jun 08, 2012 at 02:13:02AM +0000, Jay K wrote:
>
> > I'd like to, if I only knew how. I'd be really interested in having the
> > low-level infrastructure for JIT code generators
> Would you be satisfied with a Modula-3 interpreter that interpreted a
> mostly-compiled form? It shouldn't be difficult.

That would be lovely, for all the reasons and opportunities you
mentioned, but it's mostly orthogonal to what I want.

I want to write JIT implementations for other languages, languages that
have their own methods for defining data structures, but I want them to
be interoperable with the Modula 3 I know and like.

I don't mind writing a code generator or two, if necessary. But an
interpreter would provide portability instead of efficiency. Having
both could be useful.

For example, I'd like to implement a formalism that enables me to
download code from the net, formally verify its safety and then be able
to execute it really fast. Yes, I might be compiling it all at once
instead of a line at a time, but I do want to be able to add it to an
existing running program, and saying "JIT" is about the easiest brief
summary.

I'm quite aware that doing more than a half-assed version of this would
be a big project, and that's probably an understatement.

> I don't know if our intermediate code was designed with interpretation
> in mind, but it seems like it wouldn't be particularly difficult.
> You'd want a "linker" that just zips all the files and puts it "in" or
> "next to" the stub executable. This would solve the distribution
> format problem, partly. The existing intermediate code is
> platform-specific, but not by much (again: jumpbuf size, word size,
> endian, win32 vs. posix).

> But I have to admit, I'm keener on generating C than a JIT or an
> interpreter, and interpreter is not JIT.
> Um. What do you hope to gain from JIT?

The ability to dynamically add code to an existing program and have it
run fast. Possibly to have the program generate additional code to add
to itself.

> A big reason I ask..is
> because..well, do you want to ship some portable-executable that
> relies on JIT being already installed/available? Or do you want to
> carry the JITer and its code together? Or do you want to target an
> existing widely deployed JITer such as CLR or Java? In my opinion,
> the biggest advantage of JIT is portable-executable, depending on
> widely deployed JITer. But targeting CLR or Java isn't as easy as
> targeting your own custom thing. I understand there are other
> advantages -- faster compilation, optimization very specific to
> runtime environment. But I think portable-executable is most important.
> That's why I like "script". :) There are disadvantages to JIT: slower
> execution/startup, maybe harder to debug, easy to reverse engineer (if
> you care). Heck, at some point you just ship the compiler and
> portable-executable is source code. There are pluses and minuses all
> around.

JIT is for speed. Otherwise, interpretation would suffice, and could
even be portable. But even an interpreter would like to be able to add
new garbage-collectible types, which is what I'm asking for at the
moment.
 - Jay

From hosking at cs.purdue.edu Fri Jun 8 16:39:39 2012
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Fri, 8 Jun 2012 10:39:39 -0400
Subject: [M3devel] [M3commit] CVS Update: cm3 (windows condition variables)
In-Reply-To: <07030E90-1842-47DB-8D68-86F154E05E0D@m3w.org>
References: <1339098164.65798.YahooMailClassic@web29703.mail.ird.yahoo.com> <8531351D-A02E-4635-971F-C96736810851@cs.purdue.edu> <07030E90-1842-47DB-8D68-86F154E05E0D@m3w.org>
Message-ID: <7FF89030-927D-40C6-993D-DB44E88A35AD@cs.purdue.edu>

Agreed, but we should be able to inline the CAS, avoiding a function call.

On Jun 8, 2012, at 6:38 AM, Dragiša Durić wrote:

> At least under Linux, uncontended access to futex is (IMHO) CAS based, user space operation.
>
> Same thing?
>
> On Jun 8, 2012, at 12:25 PM, Tony Hosking wrote:
>
>> My point is that modern JVMs, including Sun/Oracle HotSpot, don't map every synchronized statement to an invocation of an underlying pthread or win32 lock. Instead, they use fast processor synchronization primitives like CAS compiled into the code to quickly "lock" an object in the vast majority of cases when no other thread is trying to lock the same object, without mapping to some pthread or win32 mutex.
>>
>> On Jun 8, 2012, at 5:23 AM, Jay K wrote:
>>
>>> sorry -- clarification, we are similar to the widely used Sun/Oracle JVM.
>>> Not necessarily state-of-the-art, but not bad.
>>>
>>> Our locks map pretty directly to underlying pthread mutex, Win32 critical section.
>>> Maybe not 100% directly. Maybe we delay-heap-allocate-and-initialize, i.e. so lock declaration/creation is super cheap -- just leave room for a pointer -- but there is a small amount of extra code per lock acquire/release.
>>>
>>> Our condition variable functionality maps pretty directly to pthread condition variables.
>>> Prior to Vista, there were no Win32 condition variables, but what we do is pretty good, better than many implementations out there (e.g.
older Modula-3, Boost) and similar to widely used implementations, e.g. Sun/Oracle Java. In particular we do not have a giant lock for condition variable operations, which some literature says you need.
>>>
>>> Historically the Win32 Modula-3 threading library had a giant lock to aid in condition variable implementation.
>>> It was pretty bad.
>>>
>>> Since pthread and Win32 are widely used, hopefully they are really good, and if not, will be improved for the vast majority of code to reuse. Tony, in your research, you should be sure to compare against Win32 SRWLOCK and newer versions of Windows (i.e. newer than XP). I'll try to read your paper.
>>>
>>> - Jay
>>> > Subject: Re: [M3devel] [M3commit] CVS Update: cm3 (windows condition variables)
>>> > From: antony.hosking at gmail.com
>>> > Date: Fri, 8 Jun 2012 04:38:20 -0400
>>> > CC: jay.krell at cornell.edu; m3devel at elegosoft.com
>>> > To: dragisha at m3w.org
>>> >
>>> > On Jun 8, 2012, at 4:06 AM, Dragiša Durić wrote:
>>> >
>>> > > Please explain this more, and if you can - draw parallel to *nix.
>>> > >
>>> > > TIA
>>> > >
>>> > > On Jun 8, 2012, at 4:05 AM, Jay K wrote:
>>> > >
>>> > >> Note though that "LOCK" doesn't map directly to EnterCriticalSection and more significantly, m3core provides essentially condition variables, which don't map directly to Win32, prior to Vista.
>>> > >> (LOCK at least does a delay-heap-allocation-and-initialize before EnterCriticalSection, but also interacts with the condition variables I recall.)
>>> > >>
>>> > >> I did a bunch of "research" and our condition variable substitution is pretty good now, equivalent to what Java implementations do.
>>> > >> Definitely better than others e.g. Boost.
>>> >
>>> > We are certainly NOT equivalent to state-of-the-art Java implementations. Take a look at http://dx.doi.org/10.1145/2093157.2093184 for example.
>>> >
>>> > - Tony
>>> >
>>
>> Antony Hosking | Associate Professor | Computer Science | Purdue University
>> 305 N. University Street | West Lafayette | IN 47907 | USA
>> Mobile +1 765 427 5484
>>
>

From dabenavidesd at yahoo.es Fri Jun 8 17:20:48 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Fri, 8 Jun 2012 16:20:48 +0100 (BST)
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To: <20120608145540.GA10805@topoi.pooq.com>
Message-ID: <1339168848.48067.YahooMailClassic@web29701.mail.ird.yahoo.com>

Hi all:
interestingly, someone has already done that (a web search turns up others):
http://compilers.iecc.com/comparch/article/98-03-247
Besides a partial JVM. It would be a selling point for CM3 if this were readily implemented and efficient.
Thanks in advance

--- On Fri, 8/6/12, Hendrik Boom wrote:

From: Hendrik Boom
Subject: Re: [M3devel] [M3commit] CVS Update: cm3
To: "m3devel"
Date: Friday, 8 June 2012 09:55

On Fri, Jun 08, 2012 at 02:13:02AM +0000, Jay K wrote:
>
> > I'd like to, if I only knew how. I'd be really interested in having the
> > low-level infrastructure for JIT code generators
> Would you be satisfied with a Modula-3 interpreter that interpreted a
> mostly-compiled form? It shouldn't be difficult.

That would be lovely, for all the reasons and opportunities you mentioned, but it's mostly orthogonal to what I want.

I want to write JIT implementations for other languages, languages that have their own methods for defining data structures, but I want them to be interoperable with the Modula 3 I know and like.

I don't mind writing a code generator or two, if necessary. But an interpreter would provide portability instead of efficiency. Having both could be useful.

For example, I'd like to implement a formalism that enables me to download code from the net, formally verify its safety and then be able to execute it really fast.
Yes, I might be compiling it all at once instead of a line at a time, but I do want to be able to add it to an existing running program, and saying "JIT" is about the easiest brief summary.

I'm quite aware that doing more than a half-assed version of this would be a big project, and that's probably an understatement.

> I don't know if our intermediate code was designed with interpretation
> in mind, but it seems like it wouldn't be particularly difficult.
> You'd want a "linker" that just zips all the files and puts it "in" or
> "next to" the stub executable. This would solve the distribution
> format problem, partly. The existing intermediate code is
> platform-specific, but not by much (again: jumpbuf size, word size,
> endian, win32 vs. posix).

> But I have to admit, I'm keener on generating C than a JIT or an
> interpreter, and interpreter is not JIT.
> Um. What do you hope to gain from JIT?

The ability to dynamically add code to an existing program and have it run fast. Possibly to have the program generate additional code to add to itself.

> A big reason I ask..is
> because..well, do you want to ship some portable-executable that
> relies on JIT being already installed/available? Or do you want to
> carry the JITer and its code together? Or do you want to target an
> existing widely deployed JITer such as CLR or Java? In my opinion,
> the biggest advantage of JIT is portable-executable, depending on
> widely deployed JITer. But targeting CLR or Java isn't as easy as
> targeting your own custom thing. I understand there are other
> advantages -- faster compilation, optimization very specific to
> runtime environment. But I think portable-executable is most important.
> That's why I like "script". :) There are disadvantages to JIT: slower
> execution/startup, maybe harder to debug, easy to reverse engineer (if
> you care). Heck, at some point you just ship the compiler and
> portable-executable is source code. There are pluses and minuses all
> around.
JIT is for speed. Otherwise, interpretation would suffice, and could even be portable. But even an interpreter would like to be able to add new garbage-collectible types, which is what I'm asking for at the moment.

 - Jay

From dmuysers at hotmail.com Fri Jun 8 17:37:04 2012
From: dmuysers at hotmail.com (Dirk Muysers)
Date: Fri, 8 Jun 2012 17:37:04 +0200
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To: <20120608145540.GA10805@topoi.pooq.com>
References: <20120606064732.2C9242474003@birch.elegosoft.com> <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org> <20120606161808.7F5EA1A205B@async.async.caltech.edu> <20120607163641.A81351A205B@async.async.caltech.edu> <20120607211135.GA6314@topoi.pooq.com> <20120608145540.GA10805@topoi.pooq.com>
Message-ID:

That would be relatively easy. libjit offers an excellent infrastructure for building just in time compilers.
On the down-side: Slow program start and a considerable waste of memory resources.
Their code generator is as good as non-optimised C.
An example: A JIT translator for Oberon.

--------------------------------------------------
From: "Hendrik Boom"
Sent: Friday, June 08, 2012 4:55 PM
To: "m3devel"
Subject: Re: [M3devel] [M3commit] CVS Update: cm3

> On Fri, Jun 08, 2012 at 02:13:02AM +0000, Jay K wrote:
>>
>> > I'd like to, if I only knew how. I'd be really interested in having the
>> > low-level infrastructure for JIT code generators
>> Would you be satisfied with a Modula-3 interpreter that interpreted a
>> mostly-compiled form? It shouldn't be difficult.
>
> That would be lovely, for all the reasons and opportunities you
> mentioned, but it's mostly orthogonal to what I want.
>
> I want to write JIT implementations for other languages, languages that
> have their own methods for defining data structures, but I want them to
> be interoperable with the Modula 3 I know and like.
>
> I don't mind writing a code generator or two, if necessary. But an
> interpreter would provide portability instead of efficiency. Having
> both could be useful.
>
> For example, I'd like to implement a formalism that enables me to
> download code from the net, formally verify its safety and then be able
> to execute it really fast. Yes, I might be compiling it all at once
> instead of a line at a time, but I do want to be able to add it to an
> existing running program, and saying "JIT" is about the easiest brief
> summary.
>
> I'm quite aware that doing more than a half-assed version of this would
> be a big project, and that's probably an understatement.
>
>> I don't know if our intermediate code was designed with interpretation
>> in mind, but it seems like it wouldn't be particularly difficult.
>> You'd want a "linker" that just zips all the files and puts it "in" or
>> "next to" the stub executable. This would solve the distribution
>> format problem, partly. The existing intermediate code is
>> platform-specific, but not by much (again: jumpbuf size, word size,
>> endian, win32 vs. posix).
>
>> But I have to admit, I'm keener on generating C than a JIT or an
>> interpreter, and interpreter is not JIT.
>> Um. What do you hope to gain from JIT?
>
> The ability to dynamically add code to an existing program and have it
> run fast. Possibly to have the program generate additional code to add
> to itself.
>
>> A big reason I ask..is
>> because..well, do you want to ship some portable-executable that
>> relies on JIT being already installed/available? Or do you want to
>> carry the JITer and its code together? Or do you want to target an
>> existing widely deployed JITer such as CLR or Java? In my opinion,
>> the biggest advantage of JIT is portable-executable, depending on
>> widely deployed JITer. But targeting CLR or Java isn't as easy as
>> targeting your own custom thing.
I understand there are other
>> advantages -- faster compilation, optimization very specific to
>> runtime environment. But I think portable-executable is most important.
>> That's why I like "script". :) There are disadvantages to JIT: slower
>> execution/startup, maybe harder to debug, easy to reverse engineer (if
>> you care). Heck, at some point you just ship the compiler and
>> portable-executable is source code. There are pluses and minuses all
>> around.
>
> JIT is for speed. Otherwise, interpretation would suffice, and could
> even be portable. But even an interpreter would like to be able to add
> new garbage-collectible types, which is what I'm asking for at the
> moment.
>
> - Jay
>

From dabenavidesd at yahoo.es Fri Jun 8 20:50:23 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Fri, 8 Jun 2012 19:50:23 +0100 (BST)
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To:
Message-ID: <1339181423.68039.YahooMailClassic@web29701.mail.ird.yahoo.com>

Hi all:
Olivetti M3 had an AST-based interpreter, and Vulcan was an AST-based environment; I don't know which was better. Vulcan was heavily parallelized, so it could be a nice basis for a multi-threaded execution engine. The Olivetti M3 AST toolkit would most likely be a good AST for building an extensible kind of meta-environment (and you could retarget it to C), so one could, for instance, use it to generate a portable environment in that sense and then execute it on a fast Vulcan-style parallel make to get a fast JIT builder.
Thanks in advance

--- On Fri, 8/6/12, Dirk Muysers wrote:

From: Dirk Muysers
Subject: Re: [M3devel] [M3commit] CVS Update: cm3
To: "Hendrik Boom", "m3devel"
Date: Friday, 8 June 2012 10:37

That would be relatively easy. libjit offers an excellent infrastructure for building just in time compilers.
On the down-side: Slow program start and a considerable waste of memory resources.
Their code generator is as good as non-optimised C.
An example: A JIT translator for Oberon.

--------------------------------------------------
From: "Hendrik Boom"
Sent: Friday, June 08, 2012 4:55 PM
To: "m3devel"
Subject: Re: [M3devel] [M3commit] CVS Update: cm3

> On Fri, Jun 08, 2012 at 02:13:02AM +0000, Jay K wrote:
>>
>> > I'd like to, if I only knew how. I'd be really interested in having the
>> > low-level infrastructure for JIT code generators
>> Would you be satisfied with a Modula-3 interpreter that interpreted a
>> mostly-compiled form? It shouldn't be difficult.
>
> That would be lovely, for all the reasons and opportunities you
> mentioned, but it's mostly orthogonal to what I want.
>
> I want to write JIT implementations for other languages, languages that
> have their own methods for defining data structures, but I want them to
> be interoperable with the Modula 3 I know and like.
>
> I don't mind writing a code generator or two, if necessary. But an
> interpreter would provide portability instead of efficiency. Having
> both could be useful.
>
> For example, I'd like to implement a formalism that enables me to
> download code from the net, formally verify its safety and then be able
> to execute it really fast. Yes, I might be compiling it all at once
> instead of a line at a time, but I do want to be able to add it to an
> existing running program, and saying "JIT" is about the easiest brief
> summary.
>
> I'm quite aware that doing more than a half-assed version of this would
> be a big project, and that's probably an understatement.
>
>> I don't know if our intermediate code was designed with interpretation
>> in mind, but it seems like it wouldn't be particularly difficult.
>> You'd want a "linker" that just zips all the files and puts it "in" or
>> "next to" the stub executable. This would solve the distribution
>> format problem, partly. The existing intermediate code is
>> platform-specific, but not by much (again: jumpbuf size, word size,
>> endian, win32 vs. posix).
>
>> But I have to admit, I'm keener on generating C than a JIT or an
>> interpreter, and interpreter is not JIT.
>> Um. What do you hope to gain from JIT?
>
> The ability to dynamically add code to an existing program and have it
> run fast. Possibly to have the program generate additional code to add
> to itself.
>
>> A big reason I ask..is
>> because..well, do you want to ship some portable-executable that
>> relies on JIT being already installed/available? Or do you want to
>> carry the JITer and its code together? Or do you want to target an
>> existing widely deployed JITer such as CLR or Java? In my opinion,
>> the biggest advantage of JIT is portable-executable, depending on
>> widely deployed JITer. But targeting CLR or Java isn't as easy as
>> targeting your own custom thing. I understand there are other
>> advantages -- faster compilation, optimization very specific to
>> runtime environment. But I think portable-executable is most important.
>> That's why I like "script". :) There are disadvantages to JIT: slower
>> execution/startup, maybe harder to debug, easy to reverse engineer (if
>> you care). Heck, at some point you just ship the compiler and
>> portable-executable is source code. There are pluses and minuses all
>> around.
>
> JIT is for speed. Otherwise, interpretation would suffice, and could
> even be portable. But even an interpreter would like to be able to add
> new garbage-collectible types, which is what I'm asking for at the
> moment.
>
> - Jay
>

From jay.krell at cornell.edu Sun Jun 10 10:34:36 2012
From: jay.krell at cornell.edu (Jay K)
Date: Sun, 10 Jun 2012 08:34:36 +0000
Subject: [M3devel] reducing our diff to gcc?
Message-ID:

reducing our diff to gcc?

Ignore my hacking: extern C, removal of optimizer, removal of gmp/mpfr/mpc..

but wait: do people like removal of gmp/mpfr/mpc? I do. I'm torn.

But to my point:

gimplify.c: I think we can achieve the diff via langhook.gimplify_expr.

tree.def: I think frontends can add their own codes in separate files, so the diff can be removed.

but, tree-nested.c, I doubt this can be avoided..so I'm left probably
just not bothering with the others.

Thoughts?

There is also at least one bug fix...that I could avoid needing.
There is a bug optimizing our form of div/mod.
We could avoid that by going back to function calls, but..again, I'm torn.

If you configure -enable-checking, at least currently, there are asserts that have to be removed.

I think I'll just go ahead and patch 4.7 "completely", w/o overdoing it.

 - Jay

From dragisha at m3w.org Sun Jun 10 10:58:00 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Sun, 10 Jun 2012 10:58:00 +0200
Subject: [M3devel] reducing our diff to gcc?
In-Reply-To:
References:
Message-ID: <58EF8A55-4D81-401C-AC47-C5826F6EE759@m3w.org>

Think Occam. Not overdoing is a good idea :).

On Jun 10, 2012, at 10:34 AM, Jay K wrote:

>
> I think I'll just go ahead and patch 4.7 "completely", w/o overdoing it.

From jay.krell at cornell.edu Sun Jun 10 12:31:46 2012
From: jay.krell at cornell.edu (Jay K)
Date: Sun, 10 Jun 2012 10:31:46 +0000
Subject: [M3devel] reducing our diff to gcc?
In-Reply-To: <58EF8A55-4D81-401C-AC47-C5826F6EE759@m3w.org>
References: <58EF8A55-4D81-401C-AC47-C5826F6EE759@m3w.org>
Message-ID:

Hehe. If someone builds something over-complicated, am I obligated to strip it back down?
:)

 - Jay

________________________________
> Subject: Re: [M3devel] reducing our diff to gcc?
> From: dragisha at m3w.org
> Date: Sun, 10 Jun 2012 10:58:00 +0200
> CC: m3devel at elegosoft.com
> To: jay.krell at cornell.edu
>
> Think Occam. Not overdoing is a good idea :).
>
> On Jun 10, 2012, at 10:34 AM, Jay K wrote:
>
>
> I think I'll just go ahead and patch 4.7 "completely", w/o overdoing it.
>

From dragisha at m3w.org Sun Jun 10 13:05:35 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Sun, 10 Jun 2012 13:05:35 +0200
Subject: [M3devel] reducing our diff to gcc?
In-Reply-To:
References: <58EF8A55-4D81-401C-AC47-C5826F6EE759@m3w.org>
Message-ID: <5663CB3D-C3ED-4BA0-823F-4D251B29F1A6@m3w.org>

That makes your change complicated :). So, no! :)

On Jun 10, 2012, at 12:31 PM, Jay K wrote:

>
> Hehe. If someone builds something over-complicated, am I obligated to strip it back down?
> :)
>
> - Jay
>
> ________________________________
>> Subject: Re: [M3devel] reducing our diff to gcc?
>> From: dragisha at m3w.org
>> Date: Sun, 10 Jun 2012 10:58:00 +0200
>> CC: m3devel at elegosoft.com
>> To: jay.krell at cornell.edu
>>
>> Think Occam. Not overdoing is a good idea :).
>>
>> On Jun 10, 2012, at 10:34 AM, Jay K wrote:
>>
>>
>> I think I'll just go ahead and patch 4.7 "completely", w/o overdoing it.
>>
>

From dragisha at m3w.org Sun Jun 10 16:16:00 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Sun, 10 Jun 2012 16:16:00 +0200
Subject: [M3devel] new kid on the block: http://lycus.org/
Message-ID: <3590891F-3B7B-46B1-83F6-7155F9254927@m3w.org>

Maybe of interest. A friend of mine, a D fan, sent this to me.

From rodney_bates at lcwb.coop Mon Jun 11 14:39:09 2012
From: rodney_bates at lcwb.coop (Rodney M. Bates)
Date: Mon, 11 Jun 2012 07:39:09 -0500
Subject: [M3devel] reducing our diff to gcc?
In-Reply-To:
References:
Message-ID: <4FD5E6ED.3040503@lcwb.coop>

On 06/10/2012 03:34 AM, Jay K wrote:
>
> reducing our diff to gcc?
>
>
> Ignore my hacking: extern C, removal of optimizer, removal of gmp/mpfr/mpc..
>
>
> but wait: do people like removal of gmp/mpfr/mpc? I do. I'm torn.
>
> But to my point:
>
>
> gimplify.c: I think we can achieve the diff via langhook.gimplify_expr.
>
>
> tree.def: I think frontends can add their own codes in separate files, so the diff can be removed.
>
> but, tree-nested.c, I doubt this can be avoided..so I'm left probably
> just not bothering with the others.
>

tree-nested.c has been a thorn in my side from its inception. It broke a whole lot of stuff in m3gdb, everything that has to do with nonlocal variable access and/or variables of procedure type. It reshuffles the activation record (AR) around, with multiple copies of lots of things, especially the static link, which has either two or, if I remember right, three copies in different places. Moreover, they don't all point to the same place in their target AR.

All this wouldn't be too bad if we got the debug info altered to reflect the reality, but by the time tree-nested does its thing, it's kind of late to do that easily. That's one of the attractions of llvm to me: it's well set up to transform both the code and its debug info in parallel when doing optimization. Maybe gcc would be easier too, if we didn't do our own debug info production in parse.c. That could be a lot of work, but would fit nicely with switching to dwarf.

As I understood it, all of the changes tree-nested.c makes are really only needed for the interaction between nonlocal variable access _and_ inlining. The last I knew, we have had inlining disabled from the beginning anyway. Jay, if this is still true, and as you are into disabling various gcc optimizations, what would you think of just disabling what tree-nested does?

>
> Thoughts?
>
> There is also at least one bug fix...that I could avoid needing.
> There is a bug optimizing our form of div/mod.
> We could avoid that by going back to function calls, but..again, I'm torn.
>
> If you configure -enable-checking, at least currently, there are asserts that have to be removed.
>
> I think I'll just go ahead and patch 4.7 "completely", w/o overdoing it.
>
> - Jay
>

From jay.krell at cornell.edu Mon Jun 11 21:07:26 2012
From: jay.krell at cornell.edu (Jay K)
Date: Mon, 11 Jun 2012 19:07:26 +0000
Subject: [M3devel] reducing our diff to gcc?
In-Reply-To: <4FD5E6ED.3040503@lcwb.coop>
References: <4FD5E6ED.3040503@lcwb.coop>
Message-ID:

> Maybe gcc would be easier too, if we didn't do our own debug
> info production in parse.c.

Correct. It is "our fault" for doing weird things debugging-wise.

> That could be a lot of work

It is "the right amount of work", but yeah, kind of a lot.

> but would fit nicely with switching to dwarf.

We'd just use -g and use whatever gcc wants for the target system. Sometimes DWARF, sometimes not; we wouldn't care.

> As I understood it, all of the changes tree-nested.c makes are really only
> needed for the interaction between nonlocal variable access _and_ inlining.

I don't think so, but I don't know.

> The last I knew we have had inlining disabled from the beginning anyway.

We have inlining on, mostly, aside from a small sprinkling of "volatile". It is off in the gcc 4.6 backend, but I never enabled that and am moving on to 4.7 rapidly.

> what would you think of just disabling what tree-nested does?

I'm really not sure it is possible. Sure, if nested functions were used only for "lexical hiding" of the functions themselves. But Modula-3 uses the "static link" in a unique-to-itself way. I don't expect gcc to "just work". I can explain the Modula-3 unique way if people want.

It turns out (I have thought about this a bunch) there is no good way to handle the static link, given that you can take the addresses of nested functions. (Right?)

Where you don't take the address, the static link can just be an extra parameter. Or maybe this is dealt with elsewhere or otherwise...

We do actually use "extra parameter" sometimes for the static link. And maybe elsewhere/otherwise is in the frontend, mostly..just mostly...

There are comments in tree-nested.c indicating it has "bad history". But actually, I'm not sure it does things so poorly.

The basic theory of nested functions includes stuffing locals into a struct, at least the locals accessed by nested functions, and passing a pointer to that struct as an extra parameter. The locals include said pointer to the struct of locals, in the case of multiple nesting levels. OR you can "flatten" things, I guess, maybe. Flattening is problematic though, given that nested functions can be mutually recursive and such: you want to update just one place and have all the other code follow pointers to it. Optimization can copy around copies instead of pointers, where it is profitable.

Sorry, I don't have time to explain right now.

 - Jay

From rodney_bates at lcwb.coop Tue Jun 12 18:17:50 2012
From: rodney_bates at lcwb.coop (Rodney M. Bates)
Date: Tue, 12 Jun 2012 11:17:50 -0500
Subject: [M3devel] reducing our diff to gcc?
In-Reply-To:
References: <4FD5E6ED.3040503@lcwb.coop>
Message-ID: <4FD76BAE.7060702@lcwb.coop>

On 06/11/2012 02:07 PM, Jay K wrote:
>
> > Maybe gcc would be easier too, if we didn't do our own debug
> > info production in parse.c.
>
> Correct. It is "our fault" for doing weird things debugging-wise.
>
> > That could be a lot of work
>
> It is "the right amount of work", but yeah, kind of a lot.
>
> > but would fit nicely with switching to dwarf.
> We'd just use -g and use whatever gcc wants for the target system. > Sometimes Dwarf, sometimes not, we wouldn't care. > It's going to require quite a lot in m3gdb. Stock gdb has readers for several debug info formats, but there's a lot that is language-dependent, even for C, let alone the other languages supported by stock gdb. I think this has considerable debug-format dependency too, leading to a Cartesian product. It is certainly that way for Modula-3. I would be greatly surprised if gcc didn't also require at least a bit of M3-dependent work, even for dwarf. > > > As I understood it, all of the changes tree-nested.c makes are really only > > needed for the interaction between nonlocal variable access _and_ inlining. > > > I don't think so, but I don't know. > > > > The last I knew we have had inlining disabled from the beginning anyway. > > > We have inlining on mostly. Aside from a small sprinkling of "volatile". > Off in gcc 4.6 backend, but I never enabled that and am moving on to 4.7 rapidly. > > > > what would you think of just disabling what tree-nested does? > I'm really not sure it is possible. > Sure, if nested functions used only for "lexical hiding" of the functions themselves. > But Modula-3 uses the "static link" in a unique-to-itself way. > I don't expect gcc to "just work". > I can explain the Modula-3 unique way if people want. > It turns out..I have thought about this a bunch, there is no good way to handle the static link, > given that you can take the addresses of nested functions. (Right?) > Please elaborate. Yes, you can take the address of a nested function. But you can only pass it as a parameter. You can't assign it to a variable. This latter restriction requires some runtime enforcement, but I think it is taken care of by explicitly coded runtime checks generated by parse.c or earlier. 
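To make the static-link discussion concrete, here is a minimal hand-written sketch in plain C of the general idea both posters refer to: the locals a nested procedure touches live in a struct, a pointer to that struct is passed as an extra parameter, and a nested procedure passed as a parameter travels as a code pointer paired with that link. All names here (`outer_frame`, `add_step`, `closure`, `apply_twice`, `outer`) are hypothetical, and this is an illustration only; it is not what tree-nested.c, parse.c, or the Modula-3 frontend actually generates.

```c
/* Locals of the outer procedure that the nested procedure references. */
struct outer_frame {
    int total;  /* updated by the nested procedure */
    int step;   /* read by the nested procedure */
};

/* The nested procedure, lifted to file scope by hand. Its first
   parameter plays the role of the static link. */
static void add_step(struct outer_frame *link, int times)
{
    link->total += link->step * times;
}

/* A nested procedure passed as a parameter: code pointer plus static link. */
struct closure {
    void (*code)(struct outer_frame *link, int times);
    struct outer_frame *link;
};

/* Receives the procedure as a parameter and calls it twice. It never
   stores the closure, so the link cannot outlive the outer frame. */
static void apply_twice(struct closure c, int times)
{
    c.code(c.link, times);
    c.code(c.link, times);
}

/* The outer procedure: builds its frame and passes the nested procedure down. */
int outer(int step)
{
    struct outer_frame frame = { 0, step };
    struct closure c = { add_step, &frame };
    apply_twice(c, 3);
    return frame.total;
}
```

In this sketch, `outer(5)` adds 5 * 3 twice and returns 30. The "pass only, never assign to a variable" restriction described above is what keeps `c.link` from dangling after `outer` returns.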
The nested-function language extension to C, implemented by stock gcc, allows the taking of the address of a nested function, without the restriction against assigning it to a variable, with no linguistic safety added. If, in C, you use such a function "address" for a function that has returned, to quote from gcc, "all hell will break loose". But this should imply that stock gcc support is enough for Modula-3.

>
> Where you don't take the address, the static link can just be an extra parameter.
>

Either way, you need a static link, and it is just passed as an extra parameter. In the x86 case, it is always passed in the same register (ecx, if I recall) and always immediately stored by prolog code at the same place in the AR. tree-nested doesn't mess with this, but adds extra static-linkish variable(s) elsewhere in the AR, derived from this one, and uses them in some/all places.

> Or maybe this is dealt with elsewhere or otherwise...
>
> We do actually use "extra parameter" sometimes for static link.
> And maybe elsewhere/otherwise is in the frontend, mostly..just mostly...
>
> There are comments in tree-nested.c indicating it has "bad history".
> But actually, I'm not sure it does things so poorly.
>

I haven't read the comments in later gcc versions, but the bad history I recall is that it greatly simplifies an "insanely complicated" scheme. Unfortunately, the simplification is all compile-time, at the expense of replacing a relatively simple runtime scheme with one I would call at least very complicated, if not insanely so.

> The basic theory of nested functions includes stuffing locals into a struct,
> at least locals accessed by nested functions, and passing a pointer to that struct
> as an extra parameter. The locals include said pointer to struct of locals, in the case
> of multiple nesting levels. OR you can "flatten" things, I guess, maybe.

Actually, it's the other way around. All locals start out in a flat AR. If the function contains nested function(s), tree-nested collects the locals that are referenced nonlocally (i.e., from within one of the nested functions) into a local struct. Then, the nested functions get and use what you could call a "derived static link" (a better term is needed) that points directly to this struct rather than to the whole AR. I guess this helps with inlining, in case the struct isn't actually located in the same way inside the parent AR.

> Flattening is problematic though, given nested functions can be mutually recursive
> and such..you want to update just one place and have all the other code follow pointers to it.
> Optimization can copy around copies instead of pointers, where it is profitable.
> Sorry, I don't have time to explain right now.
>
> - Jay

From dabenavidesd at yahoo.es Wed Jun 13 04:18:33 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Wed, 13 Jun 2012 03:18:33 +0100 (BST)
Subject: [M3devel] reducing our diff to gcc?
In-Reply-To: <4FD76BAE.7060702@lcwb.coop>
Message-ID: <1339553913.24183.YahooMailClassic@web29705.mail.ird.yahoo.com>

Hi all:
In fact, the language-dependent parts of a debugger are inherently 'part' of the compiler architecture (gdb needs to re-implement a lot of machinery from gcc; maybe it is still the same, but it could be reorganized to cut it down, if that is done in C).
I have heard that m3gdb is about 20k lines of code, which is hard for me to work with, and being in C makes it worse. I think a full debugger can be implemented in about that many lines (ldb, at least, is like that), so I don't know how much of m3gdb is really not already in gdb.

Now, m3gcc (or m3cgc, m3cg, m3cc) is of no interest to GNU, so why keep it like that? We should use it as a real backend, not just for one language but as a real architecture. Since it isn't one now, what would it take to do that? In fact, that's what we are trying to do with the JIT, right? What I have found tells me that, AFAIK, C code tends to be portable in the form of a stack architecture like M3CG more than in any other form.

On the other hand, there is compiling gcc over and over again: I don't know how many of us want to do that each time we compile a Modula-3 distribution (I do). Now, I don't think gcc wants to add and support our ideal architecture, but who knows whether it will work out for us; maybe they will want it, won't they?

Thanks in advance

From jay.krell at cornell.edu Sat Jun 16 08:09:33 2012
From: jay.krell at cornell.edu (Jay K)
Date: Sat, 16 Jun 2012 06:09:33 +0000
Subject: [M3devel] help test 4.7 backend?
Message-ID:

help test 4.7 backend?

Can folks try out the new 4.7 backend?

Edit m3-sys/m3cc/src/m3makefile: add your platform to the list near the top, mapped to "47", and then run scripts/python/boot2.sh.

Then do it again, but edit config/Unix.common, the function m3_backend, to always do
  args += m3back_optimize
and optionally, but preferably, try with -O3 instead of -O2 in the same file, and try running some GUI apps like solitaire.

I could use help particularly with:
 SPARC{32,64}_LINUX
 PPC_{LINUX,OPENBSD,NETBSD,FREEBSD,DARWIN}
 ALPHA_OSF
 I386_LINUX, I386_INTERIX, I386_MINGWIN, I386_CYGWIN, because I'm being lazy
I can do various x86/amd64, either in a VM or on opencsw, but splitting that load would be good too. I might go back to not having much time, soon or temporarily.

Still to do:
 apply OpenBSD patches
 update from 4.7.0 to 4.7.1, which was just released

Thanks,
 - Jay

From jay.krell at cornell.edu Sat Jun 16 10:47:35 2012
From: jay.krell at cornell.edu (Jay K)
Date: Sat, 16 Jun 2012 08:47:35 +0000
Subject: [M3devel] ALPHA_LINUX
In-Reply-To:
References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org>, <20120607011634.468b6bbf@wenus.next.com.pl>, <741029B0-331E-4E10-9886-86A78B0ED3CC@m3w.org>
Message-ID:

> > Is cm3 working on LINUX_ALPHA? I have one ES40 working server with Gentoo Linux
> I don't think it does yet, but give me ssh access and I can most likely make it work pretty quickly.
> There is very very very little to porting these days.

So forgetful of me. Yes, it works.

See: http://www.opencm3.net/uploaded-archives/index.html

 - Jay

From jay.krell at cornell.edu Sat Jun 16 11:03:22 2012
From: jay.krell at cornell.edu (Jay K)
Date: Sat, 16 Jun 2012 09:03:22 +0000
Subject: [M3devel] IA64_LINUX
In-Reply-To:
References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org>, <20120607011634.468b6bbf@wenus.next.com.pl>, <741029B0-331E-4E10-9886-86A78B0ED3CC@m3w.org>
Message-ID:

Also IA64_LINUX, which I thought I was working on recently, yet I already put it up here a while ago:

http://www.opencm3.net/uploaded-archives/index.html

I don't remember if I solved the coding for finding the register spill stack, and indeed I don't see the code in m3core, so there is a little bit to do there. I expect there might be a GC bug there, or maybe we should make all stores volatile, or something.

 - Jay

From mark at wickensonline.co.uk Sat Jun 16 11:49:45 2012
From: mark at wickensonline.co.uk (Mark Wickens)
Date: Sat, 16 Jun 2012 10:49:45 +0100
Subject: [M3devel] IA64_LINUX
In-Reply-To:
References: <20120606064732.2C9242474003@birch.elegosoft.com> <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org> <20120607011634.468b6bbf@wenus.next.com.pl> <741029B0-331E-4E10-9886-86A78B0ED3CC@m3w.org>
Message-ID:

If you feel the need to address the issues, let me know and I'll put the ZX6000 online for you.

Mark.

Sent from my iPad

On 16 Jun 2012, at 10:03, Jay K wrote:

> also IA64_LINUX, I thought I was working on recently, yet I already put up here a while ago:
>
> http://www.opencm3.net/uploaded-archives/index.html
>
> i don't remember if I solved the finding the register spill stack coding..and indeed..I don't see the code in m3core...so a little bit to do there... I expect there might be a GC bug there..or maybe we should make all stores volatile..or something...
>
> - Jay
>
> From: jay.krell at cornell.edu
> To: dragisha at m3w.org; dknoto at gmail.com
> CC: m3devel at elegosoft.com
> Subject: ALPHA_LINUX
> Date: Sat, 16 Jun 2012 08:47:35 +0000
>
> > > Is cm3 working on LINUX_ALPHA? I have one ES40 working server with Gentoo Linux
> > I don't think it does yet, but give me ssh access and I can most likely make it work pretty quickly.
> > There is very very very little to porting these days.
> > > So forgetful of me. > > Yes, it works. > > See: > http://www.opencm3.net/uploaded-archives/index.html > > > - Jay > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dragisha at m3w.org Sun Jun 17 20:36:02 2012 From: dragisha at m3w.org (=?utf-8?Q?Dragi=C5=A1a_Duri=C4=87?=) Date: Sun, 17 Jun 2012 20:36:02 +0200 Subject: [M3devel] g_open, GLib wrapper for open() Message-ID: From doc: === There is a group of functions which wrap the common POSIX functions dealing with filenames (g_open(), g_rename(), g_mkdir(), g_stat(), g_unlink(), g_remove(), g_fopen(), g_freopen()). The point of these wrappers is to make it possible to handle file names with any Unicode characters in them on Windows without having to use ifdefs and the wide character API in the application code. The pathname argument should be in the GLib file name encoding. On POSIX this is the actual on-disk encoding which might correspond to the locale settings of the process (or the G_FILENAME_ENCODING environment variable), or not. On Windows the GLib file name encoding is UTF-8. Note that the Microsoft C library does not use UTF-8, but has separate APIs for current system code page and wide characters (UTF-16). The GLib wrappers call the wide character API if present (on modern Windows systems), otherwise convert to/from the system code page. === Template for g_open is: int g_open (const gchar *filename, int flags, int mode); Obviously, I need FilePosix.i3 and descendants, but under Windows? Anyone met/solved this? TIA, dd -------------- next part -------------- An HTML attachment was scrubbed... URL: From dabenavidesd at yahoo.es Mon Jun 18 22:57:07 2012 From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.) 
Date: Mon, 18 Jun 2012 21:57:07 +0100 (BST) Subject: [M3devel] g_open, GLib wrapper for open() In-Reply-To: Message-ID: <1340053027.33106.YahooMailClassic@web29705.mail.ird.yahoo.com> Hi all: Win32 doesn't support Unicode character code set natively but as separated strings API for both ANSI and Unicode same as C Run-Time library just as you say, not as it's in Windows NT native code set for all strings. But I don't think the Win32 Win98 Is a common type of system daily, so I guess you can be safe without that. Couldn't you? Thanks in advance --- El dom, 17/6/12, Dragi?a Duri? escribi?: De: Dragi?a Duri? Asunto: [M3devel] g_open, GLib wrapper for open() Para: "m3devel" Fecha: domingo, 17 de junio, 2012 13:36 >From doc:===?There is a group of functions which wrap the common POSIX functions dealing with filenames (g_open(), g_rename(), g_mkdir(), g_stat(), g_unlink(), g_remove(), g_fopen(), g_freopen()). The point of these wrappers is to make it possible to handle file names with any Unicode characters in them on Windows without having to use ifdefs and the wide character API in the application code. The pathname argument should be in the GLib file name encoding. On POSIX this is the actual on-disk encoding which might correspond to the locale settings of the process (or the G_FILENAME_ENCODING environment variable), or not. On Windows the GLib file name encoding is UTF-8. Note that the Microsoft C library does not use UTF-8, but has separate APIs for current system code page and wide characters (UTF-16). The GLib wrappers call the wide character API if present (on modern Windows systems), otherwise convert to/from the system code page.===Template for g_open is: int g_open (const gchar *filename, int flags, int mode);Obviously, I need FilePosix.i3 and descendants, but under Windows? Anyone met/solved this? TIA,dd -------------- next part -------------- An HTML attachment was scrubbed... 

From dragisha at m3w.org Tue Jun 19 09:15:32 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Tue, 19 Jun 2012 09:15:32 +0200
Subject: [M3devel] g_open, GLib wrapper for open()
In-Reply-To: <1340053027.33106.YahooMailClassic@web29705.mail.ird.yahoo.com>
References: <1340053027.33106.YahooMailClassic@web29705.mail.ird.yahoo.com>
Message-ID: <110BFBCA-C682-4210-8D44-375550B6DB55@m3w.org>

You cannot do without it. Once you need to access a file from a Gtk application, and the file name contains at least one non-ASCII character, you cannot ignore it.

On Jun 18, 2012, at 10:57 PM, Daniel Alejandro Benavides D. wrote:

> Hi all:
> Win32 doesn't use a single Unicode character set natively; instead it has separate string APIs for ANSI and Unicode, the same as the C run-time library, just as you say. That is unlike Windows NT itself, whose native character set is used for all strings. But I don't think Win32 on Win98 is a commonly used system these days, so I guess you can safely do without that. Couldn't you?
> Thanks in advance
>
> --- On Sun, 17/6/12, Dragiša Durić wrote:
>
> From: Dragiša Durić
> Subject: [M3devel] g_open, GLib wrapper for open()
> To: "m3devel"
> Date: Sunday, June 17, 2012 13:36
>
> From doc:
> ===
> There is a group of functions which wrap the common POSIX functions dealing with filenames (g_open(), g_rename(), g_mkdir(), g_stat(), g_unlink(), g_remove(), g_fopen(), g_freopen()). The point of these wrappers is to make it possible to handle file names with any Unicode characters in them on Windows without having to use ifdefs and the wide character API in the application code.
>
> The pathname argument should be in the GLib file name encoding. On POSIX this is the actual on-disk encoding which might correspond to the locale settings of the process (or the G_FILENAME_ENCODING environment variable), or not.
>
> On Windows the GLib file name encoding is UTF-8.
> Note that the Microsoft C library does not use UTF-8, but has separate APIs for current system code page and wide characters (UTF-16). The GLib wrappers call the wide character API if present (on modern Windows systems), otherwise convert to/from the system code page.
> ===
> The prototype for g_open is:
>
> int g_open (const gchar *filename,
>             int flags,
>             int mode);
> Obviously, I need FilePosix.i3 and descendants, but what about Windows? Has anyone met or solved this?
>
> TIA,
> dd
>

From hendrik at topoi.pooq.com Tue Jun 19 16:35:34 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Tue, 19 Jun 2012 10:35:34 -0400
Subject: [M3devel] missing m3gdb?
Message-ID: <20120619143534.GA30034@topoi.pooq.com>

I downloaded the development version in mid-May and succeeded in
building cm3-all-AMD64_LINUX-d5.9.0-20120518.deb. I then removed my
existing Modula 3, installed the new .deb, and proceeded to use it with
no problems until today.

Today I tried to use the debugger, and discovered that m3gdb is missing.

Did I bungle something or was m3gdb left out of the script for building
the .deb for some reason? If the latter, is it still missing?

The only package I remember deliberately removing is ESC, which didn't
compile.

-- hendrik

From hendrik at topoi.pooq.com Tue Jun 19 17:13:04 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Tue, 19 Jun 2012 11:13:04 -0400
Subject: [M3devel] missing m3gdb?
In-Reply-To: <20120619143534.GA30034@topoi.pooq.com>
References: <20120619143534.GA30034@topoi.pooq.com>
Message-ID: <20120619151304.GB30034@topoi.pooq.com>

On Tue, Jun 19, 2012 at 10:35:34AM -0400, Hendrik Boom wrote:
> I downloaded the development version in mid-May and succeeded in
> building cm3-all-AMD64_LINUX-d5.9.0-20120518.deb. I then removed my
> existing Modula 3, installed the new .deb, and proceeded to use it with
> no problems until today.
>
> Today I tried to use the debugger, and discovered that m3gdb is missing.
>
> Did I bungle something or was m3gdb left out of the script for building
> the .deb for some reason? If the latter, is it still missing?
>
> The only package I remember deliberately removing is ESC, which didn't
> compile.

I don't know if this is relevant, but:

On LINUXLIBC6, which I've only partially recompiled so far from those
same mid-May sources, I get

(m3gdb) bt
#0  0x0804c75e in RunSeq (code=0xb6c3436c, exec=0xbfdad6d4) at ../src/PqCd.m3:907
#1  0x0804c950 in EnvRunMe (self=0xb6c34308) at ../src/PqCd.m3:923
Debug info for file "Stupid.mc" not in stabs format
(m3gdb)

which suggests there may be some incompatibility, possibly caused by the
partial recompilation of Modula 3. I don't know whether the debugger is
there from my initial download or from my recompilation.

>
> -- hendrik

From dabenavidesd at yahoo.es Tue Jun 19 17:57:52 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Tue, 19 Jun 2012 16:57:52 +0100 (BST)
Subject: [M3devel] missing m3gdb?
In-Reply-To: <20120619143534.GA30034@topoi.pooq.com>
Message-ID: <1340121472.14046.YahooMailClassic@web29706.mail.ird.yahoo.com>

Hi all:
I haven't checked the last released cm3-std build, but I didn't need to, since nothing broke at build time. ESC, however, hasn't compiled since the last CM3, as HP's version didn't compile for me (though an older CM3 did compile the same HP version), so I tried that and it worked OK, which might be good for timing it.
I don't know whether your m3cgc works with other releases or not; I guess it should not break m3gdb support (whichever m3cgc you use). My main comment here is that you shouldn't update anything unless it isn't working OK (I guess this is pure SW Eng blah blah, but if it works ...).
Thanks in advance

--- On Tue, 19/6/12, Hendrik Boom wrote:

From: Hendrik Boom
Subject: [M3devel] missing m3gdb?
To: "m3devel"
Date: Tuesday, June 19, 2012 09:35

I downloaded the development version in mid-May and succeeded in
building cm3-all-AMD64_LINUX-d5.9.0-20120518.deb. I then removed my
existing Modula 3, installed the new .deb, and proceeded to use it with
no problems until today.

Today I tried to use the debugger, and discovered that m3gdb is missing.

Did I bungle something or was m3gdb left out of the script for building
the .deb for some reason? If the latter, is it still missing?

The only package I remember deliberately removing is ESC, which didn't
compile.

-- hendrik

From dragisha at m3w.org Tue Jun 19 18:08:00 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Tue, 19 Jun 2012 18:08:00 +0200
Subject: [M3devel] missing m3gdb?
In-Reply-To: <20120619143534.GA30034@topoi.pooq.com>
References: <20120619143534.GA30034@topoi.pooq.com>
Message-ID: <121739E0-E3E7-486A-905D-C296A6B302BC@m3w.org>

Short answer: if you need m3gdb, use the 5.8.6 release version.

On Jun 19, 2012, at 4:35 PM, Hendrik Boom wrote:

> I downloaded the development version in mid-May and succeeded in
> building cm3-all-AMD64_LINUX-d5.9.0-20120518.deb. I then removed my
> existing Modula 3, installed the new .deb, and proceeded to use it with
> no problems until today.
>
> Today I tried to use the debugger, and discovered that m3gdb is missing.

From dabenavidesd at yahoo.es Tue Jun 19 18:28:33 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Tue, 19 Jun 2012 17:28:33 +0100 (BST)
Subject: [M3devel] missing m3gdb?
In-Reply-To: <121739E0-E3E7-486A-905D-C296A6B302BC@m3w.org>
Message-ID: <1340123313.64869.YahooMailClassic@web29704.mail.ird.yahoo.com>

Hi all:
I don't think it is so much a waste of time: if the compiler has done its job well, you don't need a debugger; if it hasn't, don't waste more time, use a different (optimized) compiler.
ESC had an X postfix for every package name that had been verified, so you could keep a verified, compiled version as a reference for program behavior and then use an experimental, debuggable version.
That was the idea with a module system with separate compilation and version stamps, IMHO: to really have a fast-to-execute and easy-to-debug cycle, and you need that lately as compiler versions are getting faster or harder to debug.
The interesting question is whether you could use the same infrastructure to verify in less time or not; that would prove whether ESC is worth anything, which I'm sure nobody uses for that reasoning (neither Dragisha nor Hendrik), though most people do ahead of time.
Thanks in advance

From hendrik at topoi.pooq.com Tue Jun 19 18:55:16 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Tue, 19 Jun 2012 12:55:16 -0400
Subject: [M3devel] missing m3gdb?
In-Reply-To: <121739E0-E3E7-486A-905D-C296A6B302BC@m3w.org>
References: <20120619143534.GA30034@topoi.pooq.com> <121739E0-E3E7-486A-905D-C296A6B302BC@m3w.org>
Message-ID: <20120619165516.GA32036@topoi.pooq.com>

That was my fallback plan, and I'd still have to recompile it so it
would access current libraries on Debian.

What I wanted to know was whether it was intentional to leave the
debugger out of the current .deb-building script (because, perhaps,
it didn't work).

And as I've said before, recompiling from source is too much work for a
beginner. Not that I class myself as a beginner anymore, but if, for
example, I wanted to submit a video game written in Modula 3 to an
open-source video-game competition, the judges would have to be able to
run it on their machines, and they would be beginners.

So if the development source doesn't build a working .deb, I'll build
one from 5.8.6. But if I didn't bungle the .deb build, and the missing
m3gdb isn't a known bug, it probably warrants some attention, by
someone, someday.

-- hendrik

The LINUXLIBC6 problem may just be a problem with an incomplete build.
I've restarted it after installing postgresql (which was holding things
up), and it's compiling, compiling, and compiling now. But I really had
thought the AMD64 Linux build was good, and it seems not to be.

-- hendrik

On Tue, Jun 19, 2012 at 06:08:00PM +0200, Dragiša Durić wrote:
> Short answer: if you need m3gdb, use the 5.8.6 release version.
>
> On Jun 19, 2012, at 4:35 PM, Hendrik Boom wrote:
>
> > I downloaded the development version in mid-May and succeeded in
> > building cm3-all-AMD64_LINUX-d5.9.0-20120518.deb. I then removed my
> > existing Modula 3, installed the new .deb, and proceeded to use it with
> > no problems until today.
> >
> > Today I tried to use the debugger, and discovered that m3gdb is missing.
>

From hendrik at topoi.pooq.com Tue Jun 19 19:00:39 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Tue, 19 Jun 2012 13:00:39 -0400
Subject: [M3devel] ESC
In-Reply-To: <1340123313.64869.YahooMailClassic@web29704.mail.ird.yahoo.com>
References: <121739E0-E3E7-486A-905D-C296A6B302BC@m3w.org> <1340123313.64869.YahooMailClassic@web29704.mail.ird.yahoo.com>
Message-ID: <20120619170038.GB32036@topoi.pooq.com>

On Tue, Jun 19, 2012 at 05:28:33PM +0100, Daniel Alejandro Benavides D. wrote:
> Hi all:
> I don't think it is so much a waste of time: if the compiler has done its job well, you don't need a debugger; if it hasn't, don't waste more time, use a different (optimized) compiler.
> ESC had an X postfix for every package name that had been verified, so you could keep a verified, compiled version as a reference for program behavior and then use an experimental, debuggable version.
> That was the idea with a module system with separate compilation and version stamps, IMHO: to really have a fast-to-execute and easy-to-debug cycle, and you need that lately as compiler versions are getting faster or harder to debug.
> The interesting question is whether you could use the same infrastructure to verify in less time or not; that would prove whether ESC is worth anything, which I'm sure nobody uses for that reasoning (neither Dragisha nor Hendrik), though most people do ahead of time.
> Thanks in advance
>
> --- On Tue, 19/6/12, Dragiša Durić wrote:

Yes, I agree. It would be worthwhile to track down the ESC source code.
Or rewrite it. But until that's been done I'll probably need a debugger.
And maybe occasionally afterward, for the things that ESC doesn't catch.

-- hendrik

From hendrik at topoi.pooq.com Tue Jun 19 19:57:13 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Tue, 19 Jun 2012 13:57:13 -0400
Subject: [M3devel] Rebuilding 5.8.6 for current Debian.
In-Reply-To: <20120619165516.GA32036@topoi.pooq.com>
References: <20120619143534.GA30034@topoi.pooq.com> <121739E0-E3E7-486A-905D-C296A6B302BC@m3w.org> <20120619165516.GA32036@topoi.pooq.com>
Message-ID: <20120619175713.GA32389@topoi.pooq.com>

On Tue, Jun 19, 2012 at 12:55:16PM -0400, Hendrik Boom wrote:
>
> So if the development source doesn't build a working .deb, I'll build
> one from 5.8.6.

The current 5.8.6 .deb is not compatible with current versions of
Debian.

If I build a .deb from the sources in cm3-src-all-5.8.6-REL.tgz, will
its version number be 5.8.6, or some modification of 5.8.6? I'd very
much want it to be *different* so that my build will be recognised as a
more recent build (for a more recent version of Debian). If not, is
there a way of specifying it explicitly?

The new .deb I make will likely not be compatible with really old
versions of Debian.

-- hendrik

From dabenavidesd at yahoo.es Tue Jun 19 20:17:07 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Tue, 19 Jun 2012 19:17:07 +0100 (BST)
Subject: [M3devel] ESC
In-Reply-To: <20120619170038.GB32036@topoi.pooq.com>
Message-ID: <1340129827.93106.YahooMailClassic@web29706.mail.ird.yahoo.com>

Hi all:
In fact, there was another ESC written exclusively for the purpose of finding the time complexity (from source) of multi-threaded programs. That would be another approach to finding out whether the ESC system and its prover (Simplify) perform OK in normal use, in the average case (though Simplify has unsoundnesses, and the checker coming from the ESC front end is program-dependent). At least in a programming environment like Modula-3, knowing the complexity class of a programming model is something I want.
However, there is evidence that such an environment, used for big SW development at IBM and targeting Modula-3, was not good without formal software analysis (on both fronts, development and performance).
The thing is, I don't know how many systematic studies of software developers exist aside from IBM's in the 80's and some later ones for Modula-3.
So based on experience I can infer it's good, but in the real world I don't know how many will buy the idea if it is not backed by some really good experience and some real proof.
Anyone else :)?
Thanks in advance

--- On Tue, 19/6/12, Hendrik Boom wrote:

From: Hendrik Boom
Subject: [M3devel] ESC
To: "m3devel"
Date: Tuesday, June 19, 2012 12:00

On Tue, Jun 19, 2012 at 05:28:33PM +0100, Daniel Alejandro Benavides D. wrote:
> Hi all:
> I don't think it is so much a waste of time: if the compiler has done its job well, you don't need a debugger; if it hasn't, don't waste more time, use a different (optimized) compiler.
> ESC had an X postfix for every package name that had been verified, so you could keep a verified, compiled version as a reference for program behavior and then use an experimental, debuggable version.
> That was the idea with a module system with separate compilation and version stamps, IMHO: to really have a fast-to-execute and easy-to-debug cycle, and you need that lately as compiler versions are getting faster or harder to debug.
> The interesting question is whether you could use the same infrastructure to verify in less time or not; that would prove whether ESC is worth anything, which I'm sure nobody uses for that reasoning (neither Dragisha nor Hendrik), though most people do ahead of time.
> Thanks in advance
>
> --- On Tue, 19/6/12, Dragiša Durić wrote:

Yes, I agree. It would be worthwhile to track down the ESC source code.
Or rewrite it. But until that's been done I'll probably need a debugger.
And maybe occasionally afterward, for the things that ESC doesn't catch.

-- hendrik

From hendrik at topoi.pooq.com Wed Jun 20 13:17:06 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Wed, 20 Jun 2012 07:17:06 -0400
Subject: [M3devel] test driver?
Message-ID: <20120620111705.GA10486@topoi.pooq.com>

Is there a test suite driver somewhere in the Modula 3 ecosystem?

I'd like to feed various files of test data into a program to see if it
produces acceptable output. Currently it's all text in and out, but I'd
prefer not to have to rewrite my test suite because of trivialities,
such as spelling corrections in my error messages.

This is for regression testing, so automation is appreciated.

-- hendrik

From dragisha at m3w.org Wed Jun 20 13:26:53 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Wed, 20 Jun 2012 13:26:53 +0200
Subject: [M3devel] test driver?
In-Reply-To: <20120620111705.GA10486@topoi.pooq.com>
References: <20120620111705.GA10486@topoi.pooq.com>
Message-ID:

cm3/m3-libs/libm3/tests

and the directories under it.

AFAIK, there is continuous building/testing configured for cm3. Search for Hudson, Modula-3?

On Jun 20, 2012, at 1:17 PM, Hendrik Boom wrote:

> Is there a test suite driver somewhere in the Modula 3 ecosystem?
>
> I'd like to feed various files of test data into a program to see if it
> produces acceptable output. Currently it's all text in and out, but I'd
> prefer not to have to rewrite my test suite because of trivialities,
> such as spelling corrections in my error messages.
>
> This is for regression testing, so automation is appreciated.
>
> -- hendrik
>

From dabenavidesd at yahoo.es Wed Jun 20 14:41:26 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Wed, 20 Jun 2012 13:41:26 +0100 (BST)
Subject: [M3devel] test driver?
In-Reply-To:
Message-ID: <1340196086.64556.YahooMailClassic@web29705.mail.ird.yahoo.com>

Hi all:
Black-box testing for C, or for m3cgc, m3cg, or m3cc, is something we should use on a daily basis.
I know of a free testing platform for C# based on Spec#. I think we could use it for static optimization (testing -O2 and -O3); it combines both, adding reasoning to the system (knowledge management):
http://books.google.com.co/books?id=Am43BAC06L8C

This could be a good thing to do in later stages (code generation, etc.).
Thanks in advance

From wagner at elegosoft.com Fri Jun 22 09:16:16 2012
From: wagner at elegosoft.com (mail.elegosoft.com)
Date: Fri, 22 Jun 2012 09:16:16 +0200
Subject: [M3devel] help test 4.7 backend?
In-Reply-To:
References:
Message-ID: <20120622091616.18b39755.wagner@elegosoft.com>

I just noticed that m3tests have been hanging on luthien/AMD64_FREEBSD
for several days now in p006:

http://hudson.modula3.com:8080/job/cm3-current-test-m3tests-AMD64_FREEBSD/479/console

I don't know if it is related, but it used to run OK.

Olaf

On Sat, 16 Jun 2012 06:09:33 +0000
Jay K wrote:

>
> help test 4.7 backend?
>
>
> Can folks try out the new 4.7 backend?
> edit m3-sys/m3cc/src/m3makefile
> add your platform to the list near the top, mapped to "47"
> and then run scripts/python/boot2.sh
> and then, do it again, but edit config/Unix.common, the function
> m3_backend to always args += m3back_optimize
> and optionally but preferably try with -O3 instead of -O2 in
> the same file
> and try running some GUI apps like solitaire
>
>
> I could use help particularly with:
>  SPARC{32,64}_LINUX
>  PPC_{LINUX,OPENBSD,NETBSD,FREEBSD,DARWIN}
>  ALPHA_OSF
>  I386_LINUX, I386_INTERIX, I386_MINGWIN, I386_CYGWIN, because I'm being lazy
>
>
>
> I can do various x86/amd64, either in a VM or opencsw,
> but splitting that load would be good too.
> I might go back to not having much time soon or temporarily.
>
>
> Still to do:
>   apply OpenBSD patches
>   update from 4.7.0 to 4.7.1 that was just released.
>
>
> Thanks,
>  - Jay
>

--
Olaf Wagner -- elego Software Solutions GmbH
Gustav-Meyer-Allee 25 / Gebäude 12, 13355 Berlin, Germany
phone: +49 30 23 45 86 96 mobile: +49 177 2345 869 fax: +49 30 23 45 86 95
http://www.elegosoft.com | Geschäftsführer: Olaf Wagner | Sitz: Berlin
Handelregister: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr: DE163214194

From dabenavidesd at yahoo.es Fri Jun 22 17:51:37 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Fri, 22 Jun 2012 16:51:37 +0100 (BST)
Subject: [M3devel] help test 4.7 backend?
In-Reply-To: <20120622091616.18b39755.wagner@elegosoft.com>
Message-ID: <1340380297.77309.YahooMailClassic@web29705.mail.ird.yahoo.com>

Hi all:
Maybe not, unless somebody is unintentionally optimizing m3tests/src so aggressively that it breaks the semantics of Modula-3 threads? I mean, the m3 sources are OK with respect to the Thread interface, but I don't think the same can be said for the thing they call pthreads, though DEC SRC heavily influenced it. The only way to test
that is not in SW but in the HW, kernel aside; but with this HW I can't be sure they are running thread-safe system code (in other words, those machines are badly behaved).
I have been thinking about this idea, but it requires making a virtual machine for Modula-3, which would be worth the effort for that matter. It could have multithreading capabilities, a tough multitasking system and all.
Jay and all: we could try the DEC/Compaq Alpha/Piranha simulator to catch that kind of error.
Thanks in advance

--- On Fri, 22/6/12, mail.elegosoft.com wrote:

From: mail.elegosoft.com
Subject: Re: [M3devel] help test 4.7 backend?
To: m3devel at elegosoft.com
Date: Friday, June 22, 2012 02:16

I just noticed that m3tests have been hanging on luthien/AMD64_FREEBSD
for several days now in p006:

http://hudson.modula3.com:8080/job/cm3-current-test-m3tests-AMD64_FREEBSD/479/console

I don't know if it is related, but it used to run OK.

Olaf

On Sat, 16 Jun 2012 06:09:33 +0000
Jay K wrote:

>
> help test 4.7 backend?
>
>
> Can folks try out the new 4.7 backend?
> edit m3-sys/m3cc/src/m3makefile
> add your platform to the list near the top, mapped to "47"
> and then run scripts/python/boot2.sh
> and then, do it again, but edit config/Unix.common, the function
> m3_backend to always args += m3back_optimize
> and optionally but preferably try with -O3 instead of -O2 in
> the same file
> and try running some GUI apps like solitaire
>
>
> I could use help particularly with:
>  SPARC{32,64}_LINUX
>  PPC_{LINUX,OPENBSD,NETBSD,FREEBSD,DARWIN}
>  ALPHA_OSF
>  I386_LINUX, I386_INTERIX, I386_MINGWIN, I386_CYGWIN, because I'm being lazy
>
>
>
> I can do various x86/amd64, either in a VM or opencsw,
> but splitting that load would be good too.
> I might go back to not having much time soon or temporarily.
>
>
> Still to do:
>   apply OpenBSD patches
>   update from 4.7.0 to 4.7.1 that was just released.
>
>
> Thanks,
>  - Jay
>

--
Olaf Wagner -- elego Software Solutions GmbH
Gustav-Meyer-Allee 25 / Gebäude 12, 13355 Berlin, Germany
phone: +49 30 23 45 86 96 mobile: +49 177 2345 869 fax: +49 30 23 45 86 95
http://www.elegosoft.com | Geschäftsführer: Olaf Wagner | Sitz: Berlin
Handelregister: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr: DE163214194

From jay.krell at cornell.edu Sat Jun 23 02:45:17 2012
From: jay.krell at cornell.edu (Jay K)
Date: Sat, 23 Jun 2012 00:45:17 +0000
Subject: [M3devel] help test 4.7 backend?
In-Reply-To: <20120622091616.18b39755.wagner@elegosoft.com>
References: , <20120622091616.18b39755.wagner@elegosoft.com>
Message-ID:

I kind of haven't touched FreeBSD. They are still on gcc 4.5. But maybe I did.
I'll look into it, maybe soon, but I'm super busy for the next two weeks.
I'm hoping to test FreeBSD/x86 and FreeBSD/amd64 with gcc 4.7 and then move them to it.

Thank you for pointing this out. It is good to see the Hudson stuff continue to work.
My nodes are kind of all down/gone -- some remain but the router is no longer configured as it was.

 - Jay

----------------------------------------
> Date: Fri, 22 Jun 2012 09:16:16 +0200
> From: wagner at elegosoft.com
> To: m3devel at elegosoft.com
> Subject: Re: [M3devel] help test 4.7 backend?
>
> I just noticed that m3tests have been hanging on luthien/AMD64_FREEBSD
> for several days now in p006:
>
> http://hudson.modula3.com:8080/job/cm3-current-test-m3tests-AMD64_FREEBSD/479/console
>
> I don't know if it is related, but it used to run OK.
>
> Olaf
>
> On Sat, 16 Jun 2012 06:09:33 +0000
> Jay K wrote:
>
> >
> > help test 4.7 backend?
> >
> >
> > Can folks try out the new 4.7 backend?
> > edit m3-sys/m3cc/src/m3makefile
> > add your platform to the list near the top, mapped to "47"
> > and then run scripts/python/boot2.sh
> > and then, do it again, but edit config/Unix.common, the function
> > m3_backend to always args += m3back_optimize
> > and optionally but preferably try with -O3 instead of -O2 in
> > the same file
> > and try running some GUI apps like solitaire
> >
> >
> > I could use help particularly with:
> >  SPARC{32,64}_LINUX
> >  PPC_{LINUX,OPENBSD,NETBSD,FREEBSD,DARWIN}
> >  ALPHA_OSF
> >  I386_LINUX, I386_INTERIX, I386_MINGWIN, I386_CYGWIN, because I'm being lazy
> >
> >
> >
> > I can do various x86/amd64, either in a VM or opencsw,
> > but splitting that load would be good too.
> > I might go back to not having much time soon or temporarily.
> >
> >
> > Still to do:
> >   apply OpenBSD patches
> >   update from 4.7.0 to 4.7.1 that was just released.
> >
> >
> > Thanks,
> >  - Jay
> >
>
> --
> Olaf Wagner -- elego Software Solutions GmbH
> Gustav-Meyer-Allee 25 / Gebäude 12, 13355 Berlin, Germany
> phone: +49 30 23 45 86 96 mobile: +49 177 2345 869 fax: +49 30 23 45 86 95
> http://www.elegosoft.com | Geschäftsführer: Olaf Wagner | Sitz: Berlin
> Handelregister: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr: DE163214194

From dragisha at m3w.org Mon Jun 25 12:51:05 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Mon, 25 Jun 2012 12:51:05 +0200
Subject: [M3devel] Windows, Unicode file names
Message-ID: <33AC198A-B8BB-40E9-9F05-6E08A3676539@m3w.org>

Is anybody aware of issues with FSWin32.m3 in cases where one actually has to handle non-ASCII filenames under Windows? Has anyone met this problem?

TIA,
dd

From dabenavidesd at yahoo.es Mon Jun 25 18:52:35 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Mon, 25 Jun 2012 17:52:35 +0100 (BST)
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <33AC198A-B8BB-40E9-9F05-6E08A3676539@m3w.org>
Message-ID: <1340643155.54846.YahooMailClassic@web29701.mail.ird.yahoo.com>

Hi all:
I was asked why there wasn't a faster Modula-3 environment (the Modula-3 NT386GNU is way too slow even nowadays), and I had no answer. I guess this is the same question as who wants a Windows-ready environment; if you are interested, DEC had a project, M3lite, for WinNT/95 (compatible) systems.
I guess addressing compatibility for old users might get better results for CM3, but that's history now, and it makes little or no sense anyway on the understanding that Windows 8 will be incompatible with Win32.
As of today I haven't understood what new API they will bring, and frankly I don't care whether they have a new system to get hands on, but certainly you would want something like that if you have a tablet or mobile phone, where there isn't much time to spend compiling GCC from source.
Thanks in advance

From dragisha at m3w.org Mon Jun 25 18:54:56 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Mon, 25 Jun 2012 18:54:56 +0200
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <1340643155.54846.YahooMailClassic@web29701.mail.ird.yahoo.com>
References: <1340643155.54846.YahooMailClassic@web29701.mail.ird.yahoo.com>
Message-ID: <8A65A674-1120-459E-98FC-AF622D24EC66@m3w.org>

Daniel, please start your own topics and don't dilute other discussions with off-topic talk.

Thanks in advance,
dd

On Jun 25, 2012, at 6:52 PM, Daniel Alejandro Benavides D. wrote:

> Hi all:
> I was asked why there wasn't a faster Modula-3 environment (the Modula-3 NT386GNU is way too slow even nowadays), and I had no answer. I guess this is the same question as who wants a Windows-ready environment; if you are interested, DEC had a project, M3lite, for WinNT/95 (compatible) systems.

From dabenavidesd at yahoo.es Mon Jun 25 19:04:20 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Mon, 25 Jun 2012 18:04:20 +0100 (BST)
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <8A65A674-1120-459E-98FC-AF622D24EC66@m3w.org>
Message-ID: <1340643860.74333.YahooMailClassic@web29701.mail.ird.yahoo.com>

Hi all:
Thanks, but I don't know whether you know that M3lite was a Win95/NT-compatible system.
Perhaps I missed what your point is, but I guess this is the same question (though I don't know your answer either; that's a different point). See the M3-FAQ (WHAT IS M3-LITE, MS-WINDOWS SUPPORT).
Thanks in advance
From dragisha at m3w.org Mon Jun 25 19:07:20 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Mon, 25 Jun 2012 19:07:20 +0200
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <1340643860.74333.YahooMailClassic@web29701.mail.ird.yahoo.com>
References: <1340643860.74333.YahooMailClassic@web29701.mail.ird.yahoo.com>
Message-ID: <8E8B1021-7B2C-415F-A965-F49257C4C2FB@m3w.org>

See subject: Windows, Unicode file names.

Thanks in advance.

From dabenavidesd at yahoo.es Mon Jun 25 19:27:44 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Mon, 25 Jun 2012 18:27:44 +0100 (BST)
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <8E8B1021-7B2C-415F-A965-F49257C4C2FB@m3w.org>
Message-ID: <1340645264.336.YahooMailClassic@web29701.mail.ird.yahoo.com>

Hi all:

As I understood it, you don't want to talk about a Win95/NT-compatible distribution of Modula-3, but in turn you want to keep compatibility with older file name encodings. I don't mind that if it is useful anyway (newer Windows versions don't care at all either), but I don't know what your problem was, because it won't be able to be solved!

Thanks in advance

PS: Being clearer about topics is what I want, so please feel free to tell me whenever I'm not.
From dragisha at m3w.org Mon Jun 25 19:36:39 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Mon, 25 Jun 2012 19:36:39 +0200
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <1340645264.336.YahooMailClassic@web29701.mail.ird.yahoo.com>
References: <1340645264.336.YahooMailClassic@web29701.mail.ird.yahoo.com>
Message-ID:

Daniel,

I can talk about many things, and most things Modula-3 are of interest to me. Once you start a topic, and I can understand what it is about, and it meets my interests, I'll be there.

The problem I met with filenames is nothing old. Windows can open files with filenames in ASCII and UTF-16. For everything else you must check twice and do a workaround.

I've written here in the hope that I can get into some fruitful discussion with people who understand this problem. My solution is a workaround and assumes the filename is UTF-8 or ASCII. I would like to start a discussion on this and work from there to a more general solution.

dd
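The "UTF-8 or ASCII" assumption in the message above boils down to one decision: a name made only of 7-bit bytes means the same thing in every code page, while anything else needs conversion before Windows sees it. A minimal sketch of that check in portable C follows; the function name `is_ascii_only` is hypothetical and is not part of FSWin32.m3 or of the Win32 API.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns 1 if every byte of the NUL-terminated
   string is 7-bit ASCII, 0 as soon as a byte >= 0x80 is found.
   A pure-ASCII name can go to a narrow ("A") call unchanged; any
   other name would need the wide ("W") path after conversion. */
int is_ascii_only(const char *s)
{
    const unsigned char *p = (const unsigned char *)s;

    assert(s != NULL);
    while (*p != 0) {
        if (*p >= 0x80) {
            return 0;
        }
        p++;
    }
    return 1;
}
```

A caller would branch on the result, for example taking the narrow path for "report.txt" and the wide path for a name containing "š". Whether the non-ASCII bytes really are UTF-8 is still an assumption the caller has to supply; this check cannot tell.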
From rcolebur at SCIRES.COM Mon Jun 25 19:51:10 2012
From: rcolebur at SCIRES.COM (Coleburn, Randy)
Date: Mon, 25 Jun 2012 13:51:10 -0400
Subject: [M3devel] EXT [M3commit] CVS Update: cm3
In-Reply-To: <20120624094248.AAB932474003@birch.elegosoft.com>
References: <20120624094248.AAB932474003@birch.elegosoft.com>
Message-ID:

Does this mean HPUX will no longer be supported?

-----Original Message-----
From: Jay Krell [mailto:jkrell at elego.de]
Sent: Sunday, June 24, 2012 7:43 AM
To: m3commit at elegosoft.com
Subject: EXT [M3commit] CVS Update: cm3

CVSROOT: /usr/cvs
Changes by: jkrell at birch. 12/06/24 11:42:45

Modified files:
cm3/m3-sys/cminstall/src/config-no-install/: Unix.common

Log message:
hpux_flags is never used, remove it

From dabenavidesd at yahoo.es Mon Jun 25 20:06:10 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Mon, 25 Jun 2012 19:06:10 +0100 (BST)
Subject: [M3devel] Windows, Unicode file names
In-Reply-To:
Message-ID: <1340647570.11529.YahooMailClassic@web29706.mail.ird.yahoo.com>

Hi all:

OK, good. The Win32 API deals with NLS (National Language Support), for ASCII and other formats, through the NLS API. But it appears not to have been used in the DEC-SRC WinNT port of Modula-3 (it is in CM3, though it isn't compiled on the elego servers, but here):

http://www.cs.purdue.edu/homes/hosking/m3/help/gen_html/m3core/src/win32/WinNLS.i3.html

Thanks in advance
From dragisha at m3w.org Mon Jun 25 20:20:01 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Mon, 25 Jun 2012 20:20:01 +0200
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <1340647570.11529.YahooMailClassic@web29706.mail.ird.yahoo.com>
References: <1340647570.11529.YahooMailClassic@web29706.mail.ird.yahoo.com>
Message-ID: <6DF57887-C46F-408C-863F-1242C4C4C6A9@m3w.org>

Yes, they exposed parts of NLS. That's how the problem can be solved, albeit partially: by using the methods exposed there.

What we don't have is a way to communicate the actual encoding of a string to the FS module, so that FS methods can handle filenames accordingly.
From jay.krell at cornell.edu Mon Jun 25 20:40:43 2012
From: jay.krell at cornell.edu (Jay K)
Date: Mon, 25 Jun 2012 18:40:43 +0000
Subject: [M3devel] EXT [M3commit] CVS Update: cm3
In-Reply-To:
References: <20120624094248.AAB932474003@birch.elegosoft.com>,
Message-ID:

No. This does not represent a loss of any support for any target.
It is just the removal of a local variable that is initialized and never further referenced, unless I read the code incorrectly.
On the other hand, I don't think anyone here has HPUX available for any testing/development.
I used to, but no longer.
 - Jay

From dragisha at m3w.org Mon Jun 25 20:49:22 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Mon, 25 Jun 2012 20:49:22 +0200
Subject: [M3devel] Windows, Unicode file names
In-Reply-To:
References: <33AC198A-B8BB-40E9-9F05-6E08A3676539@m3w.org>
Message-ID: <461C7DCF-E432-4434-BAD3-7FA3B9775F45@m3w.org>

My situation was this: the Gtk2 interface (GtkFileChooser in my case) returns a UTF-8 encoded string, UTF-8 being GLib's internal/native encoding. Neither CreateFileA nor CreateFileW can handle it, so I "hardcoded" some logic into FS.OpenFile(Readonly) and handled the case of non-ASCII input.

The ideal would be to have encoding information as an integral part of every TEXT, but...

To my knowledge, POSIX systems handle UTF-8 filenames well (to be checked), so explicit information on encoding for FS is needed only on Windows.
From jay.krell at cornell.edu Mon Jun 25 20:44:18 2012
From: jay.krell at cornell.edu (Jay K)
Date: Mon, 25 Jun 2012 18:44:18 +0000
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <33AC198A-B8BB-40E9-9F05-6E08A3676539@m3w.org>
References: <33AC198A-B8BB-40E9-9F05-6E08A3676539@m3w.org>
Message-ID:

Functions like CreateFileA use the "ANSI" or "OEM" code page, subject to a public global in Win32, and the two code pages vary per install (or per user). It is just not a good system.

Functions like CreateFileW work very well with 16-bit encoded characters.

Can/do we arrange to have 16-bit encoded characters?

 - Jay

From dragisha at m3w.org Mon Jun 25 21:05:59 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Mon, 25 Jun 2012 21:05:59 +0200
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>
References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>
Message-ID: <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org>

If you cared enough to check FSWin32.m3, the answer would be obvious :).

Whatever I do with the pathname before I call FS.OpenFile(Readonly), FSWin32.m3 will call CreateFileA. My solution is:

PROCEDURE OpenFileReadonly(p: Pathname.T): File.T RAISES {OSError.E}=
  VAR
    handle: WinNT.HANDLE;
    fname := M3toC.SharedTtoS(p);
    dwNum := WinNLS.MultiByteToWideChar (WinNLS.CP_UTF8, 0, fname, -1, NIL, 0);
    pwText: WinBaseTypes.PCWSTR;
  BEGIN
    IF dwNum = 0 OR dwNum = Text.Length(p) + 1 THEN
      (* dwNum includes terminating null character. that's +1 above. *)
      handle := WinBase.CreateFile(
                    lpFileName := fname,
                    dwDesiredAccess := WinNT.GENERIC_READ,
                    dwShareMode := WinNT.FILE_SHARE_READ,
                    lpSecurityAttributes := NIL,
                    dwCreationDisposition := WinBase.OPEN_EXISTING,
                    dwFlagsAndAttributes := 0,
                    hTemplateFile := NIL);
    ELSE
      pwText := LOOPHOLE(NEW(UNTRACED REF ARRAY OF CHAR, dwNum*2), WinBaseTypes.PCWSTR);
      EVAL WinNLS.MultiByteToWideChar (WinNLS.CP_UTF8, 0, fname, -1, pwText, dwNum);
      handle := WinBase.CreateFileW(
                    lpFileName := pwText,
                    dwDesiredAccess := WinNT.GENERIC_READ,
                    dwShareMode := WinNT.FILE_SHARE_READ,
                    lpSecurityAttributes := NIL,
                    dwCreationDisposition := WinBase.OPEN_EXISTING,
                    dwFlagsAndAttributes := 0,
                    hTemplateFile := NIL);
      DISPOSE(pwText);
    END;

    IF LOOPHOLE(handle, INTEGER) = WinBase.INVALID_HANDLE_VALUE THEN
      Fail(p, fname);
    END;
    M3toC.FreeSharedS(p, fname);
    RETURN FileWin32.New(handle, FileWin32.Read)
  END OpenFileReadonly;

And similar in OpenFile. Not nice :).

Also, I've added the CP_UTF8 constant to WinNLS.i3.
From dabenavidesd at yahoo.es Mon Jun 25 21:01:56 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Mon, 25 Jun 2012 20:01:56 +0100 (BST)
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <6DF57887-C46F-408C-863F-1242C4C4C6A9@m3w.org>
Message-ID: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>

Hi all:

So do you need a double-byte character string module, as currently in the TEXT types? But you can do that already, can't you?

Thanks in advance
From jay.krell at cornell.edu Mon Jun 25 21:39:04 2012
From: jay.krell at cornell.edu (Jay K)
Date: Mon, 25 Jun 2012 19:39:04 +0000
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org>
References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>, <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org>
Message-ID:

I think I know what to do here and will look into it... later...

We have TEXT. We should just always get WIDECHARs out of it and call CreateFileW.
Assuming UTF8 is the wrong solution at this level, and passing in UTF8 won't work with the correct solution.
A layer above this needs to decode UTF8, if that is the encoding.

Unless someone has declared and implemented that TEXT is in fact always UTF8-encoded, which I doubt.

 - Jay
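The layering Jay proposes (decode UTF-8 above the file layer, hand 16-bit units to a "W" function) can be illustrated without Win32 at all. The sketch below is an assumption-laden stand-in, not CM3 or Windows code: `decode_utf8` and `utf8_to_utf16` are hypothetical names, and the only thing borrowed from the thread is the two-call pattern used with MultiByteToWideChar, where a first call with no output buffer returns the required size including the terminator.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence at s into *cp. Returns the number of bytes
   consumed, or 0 if the sequence is malformed, overlong, a surrogate,
   or above U+10FFFF. */
static size_t decode_utf8(const unsigned char *s, uint32_t *cp)
{
    size_t n, i;
    uint32_t c, min;

    if (s[0] < 0x80) { *cp = s[0]; return 1; }
    else if ((s[0] & 0xE0) == 0xC0) { n = 2; c = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { n = 3; c = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { n = 4; c = s[0] & 0x07; min = 0x10000; }
    else return 0;                      /* stray continuation or bad lead byte */

    for (i = 1; i < n; i++) {
        /* A NUL terminator fails this test too, so we never read past it. */
        if ((s[i] & 0xC0) != 0x80) return 0;
        c = (c << 6) | (uint32_t)(s[i] & 0x3F);
    }
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) return 0;
    *cp = c;
    return n;
}

/* Convert NUL-terminated UTF-8 to NUL-terminated UTF-16.
   With dst == NULL only the required size is computed.
   Returns the number of 16-bit units including the terminator,
   or 0 on malformed input or insufficient capacity. */
size_t utf8_to_utf16(const char *src, uint16_t *dst, size_t cap)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t out = 0;

    assert(src != NULL);
    while (*s != 0) {
        uint32_t cp;
        size_t n = decode_utf8(s, &cp);
        size_t units;

        if (n == 0) return 0;
        units = (cp >= 0x10000) ? 2 : 1;
        if (dst != NULL) {
            if (out + units >= cap) return 0;   /* keep room for the terminator */
            if (units == 2) {
                cp -= 0x10000;
                dst[out]     = (uint16_t)(0xD800 | (cp >> 10));
                dst[out + 1] = (uint16_t)(0xDC00 | (cp & 0x3FF));
            } else {
                dst[out] = (uint16_t)cp;
            }
        }
        out += units;
        s += n;
    }
    if (dst != NULL) {
        if (out >= cap) return 0;
        dst[out] = 0;
    }
    return out + 1;
}
```

Typical use is a sizing call with a NULL buffer, an allocation of that many 16-bit units, and a second call to fill them. Unlike the workaround posted earlier in the thread, a malformed name is reported as an error here rather than passed to the narrow path; that is a design choice of this sketch, not something the thread settled.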
From dragisha at m3w.org Mon Jun 25 21:48:09 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Mon, 25 Jun 2012 21:48:09 +0200
Subject: [M3devel] Windows, Unicode file names
In-Reply-To:
References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>, <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org>
Message-ID: <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org>

It can be what the cm3 people had in mind when they created WIDECHAR as a catchall for Unicode. At first glance it looked like no solution to me, but after counting to ten, I think it is. We can have a UTF-8 layer and use it when and where needed, to recode our strings to the catchall WIDECHAR/WIDETEXT.

As long as we agree on what exactly WIDECHAR is :)

=== From Wikipedia
The Microsoft Windows application programming interfaces Win32 and Win64, as well as the Java and .Net Framework platforms, require that wide character variables be defined as 16-bit values, and that characters be encoded using UTF-16 (due to former use of UCS-2), while modern Unix-like systems generally require 32-bit values encoded using UTF-32[citation needed].
===
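The 16-bit versus 32-bit question in the Wikipedia passage quoted above matters because the two widths are not interchangeable element by element: a code point above U+FFFF is one 32-bit value but two 16-bit units. A small sketch of the conversion, in portable C, with the hypothetical name `utf32_to_utf16`; nothing here is taken from the CM3 runtime.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Convert n UTF-32 code points to UTF-16 units in dst (capacity cap).
   Returns the number of 16-bit units written (no terminator is added),
   or (size_t)-1 if a code point is not a valid Unicode scalar value
   or dst is too small. */
size_t utf32_to_utf16(const uint32_t *src, size_t n, uint16_t *dst, size_t cap)
{
    size_t i, out = 0;

    assert(src != NULL && dst != NULL);
    for (i = 0; i < n; i++) {
        uint32_t cp = src[i];

        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            return (size_t)-1;
        }
        if (cp < 0x10000) {
            /* Basic Multilingual Plane: one unit, same numeric value. */
            if (out + 1 > cap) return (size_t)-1;
            dst[out++] = (uint16_t)cp;
        } else {
            /* Supplementary plane: split into a surrogate pair. */
            if (out + 2 > cap) return (size_t)-1;
            cp -= 0x10000;
            dst[out++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[out++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
    }
    return out;
}
```

For the two code points U+0161 and U+1F600 this yields three units: 0x0161 followed by the pair 0xD83D, 0xDE00. A 16-bit WIDECHAR therefore has to be read as a UTF-16 unit rather than as a whole character if text outside the BMP is to survive.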
From jay.krell at cornell.edu Mon Jun 25 22:17:52 2012
From: jay.krell at cornell.edu (Jay K)
Date: Mon, 25 Jun 2012 20:17:52 +0000
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org>
References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>, <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org> , <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org>
Message-ID:

I don't care if WIDECHAR is 16 bits or 32 bits, as long as I can convert from TEXT to a flat array of either and, if 32 bits, walk the array checking for values > 0xFFFF, throw an exception or return some error if any are found, narrow to 16 bits, call some "W" function, and free the flat array.
The size can, I guess, vary between Win32 and non-Win32 platforms.
Its size should be stored in a global to communicate between Modula-3 and C.

I'd also quite like it if TEXT were internally represented as a nul-terminated flat array of 8 and/or 16 and/or 32-bit quantities, materializing some of them on demand. But I suspect that flat and readonly and exposing a concat operation are in conflict. I'm not sure. MFC uses a flat reference-counted nul-terminated representation and it works pretty well. It doesn't materialize other widths on demand.

 - Jay
As long as we agree on what exacty WIDECHAR is :)===From wikipediaThe Microsoft Windows application programming interfaces Win32 and Win64, as well as the Java and .Net Framework platforms, require that wide character variables be defined as 16-bit values, and that characters be encoded using UTF-16 (due to former use of UCS-2), while modern Unix-like systems generally require 32-bit values encoded using UTF-32[citation needed].=== On Jun 25, 2012, at 9:39 PM, Jay K wrote:I think I know what to do here and will look into it..later.. We have TEXT. We should just always get WIDECHARs out of it and call CreateFileW. Assuming UTF8 is the wrong solution at this level, and passing in UTF8 won't work with the correct solution. A layer above this needs to decode UTF8, if that is the encoding. Unless someone has declared and implemented that TEXT is in fact always UTF8-encoded, which I doubt. - Jay From: dragisha at m3w.org Date: Mon, 25 Jun 2012 21:05:59 +0200 To: dabenavidesd at yahoo.es CC: m3devel at elegosoft.com Subject: Re: [M3devel] Windows, Unicode file names If you cared enough to check FSWin32.m3, answer would be obvious :). Whatever I do with pathname before I call FS.OpenFile(Readonly)? - FSWin32.m3 will call CreateFileA. My solution is: PROCEDURE OpenFileReadonly(p: Pathname.T): File.T RAISES {OSError.E}= VAR handle: WinNT.HANDLE; fname := M3toC.SharedTtoS(p); dwNum := WinNLS.MultiByteToWideChar (WinNLS.CP_UTF8, 0, fname, -1, NIL, 0); pwText: WinBaseTypes.PCWSTR; BEGIN IF dwNum = 0 OR dwNum = Text.Length(p) + 1 THEN (* dwNum includes terminating null character. that's +1 above. 
*) handle := WinBase.CreateFile( lpFileName := fname, dwDesiredAccess := WinNT.GENERIC_READ, dwShareMode := WinNT.FILE_SHARE_READ, lpSecurityAttributes := NIL, dwCreationDisposition := WinBase.OPEN_EXISTING, dwFlagsAndAttributes := 0, hTemplateFile := NIL); ELSE pwText := LOOPHOLE(NEW(UNTRACED REF ARRAY OF CHAR, dwNum*2), WinBaseTypes.PCWSTR); EVAL WinNLS.MultiByteToWideChar (WinNLS.CP_UTF8, 0, fname, -1, pwText, dwNum); handle := WinBase.CreateFileW( lpFileName := pwText, dwDesiredAccess := WinNT.GENERIC_READ, dwShareMode := WinNT.FILE_SHARE_READ, lpSecurityAttributes := NIL, dwCreationDisposition := WinBase.OPEN_EXISTING, dwFlagsAndAttributes := 0, hTemplateFile := NIL); DISPOSE(pwText); END; IF LOOPHOLE(handle, INTEGER) = WinBase.INVALID_HANDLE_VALUE THEN Fail(p, fname); END; M3toC.FreeSharedS(p, fname); RETURN FileWin32.New(handle, FileWin32.Read) END OpenFileReadonly; And similar in OpenFile. Not nice :). Also, I've added CP_UTF8 constant to WinNLS.i3. On Jun 25, 2012, at 9:01 PM, Daniel Alejandro Benavides D. wrote:Hi all: So do you need Double-Byte Character String module as currently in TEXT types? but you can do that already. Couldn't you? Thanks in advance --- El lun, 25/6/12, Dragi?a Duri? escribi?: De: Dragi?a Duri? Asunto: Re: [M3devel] Windows, Unicode file names Para: "Daniel Alejandro Benavides D." CC: "m3devel" Fecha: lunes, 25 de junio, 2012 13:20 Yes, they exposed parts of NLS. That's how problem can be, albeit partially, solved. By using methods exposed there. What we don't have is how to communicate actual encoding of string to FS module so FS methods can handle filenames accordingly. On Jun 25, 2012, at 8:06 PM, Daniel Alejandro Benavides D. wrote:Hi all: OK, good, Win32 API dealt with inter-NLS (National Language Support) at ASCII and other formats level with NLS API. 
But it appears to be have not been used for DEC-SRC WinNT port of Modula-3 (but for CM3, though it isn't compiled in elego servers, but here): http://www.cs.purdue.edu/homes/hosking/m3/help/gen_html/m3core/src/win32/WinNLS.i3.html Thanks in advance --- El lun, 25/6/12, Dragi?a Duri? escribi?: De: Dragi?a Duri? Asunto: Re: [M3devel] Windows, Unicode file names Para: "Daniel Alejandro Benavides D." CC: "m3devel" Fecha: lunes, 25 de junio, 2012 12:36 Daniel, I can talk about many things, and most things Modula-3 are of interest to me. Once you start a topic, and I can understand what is it about, and it meets my interests - I'll be there. Problem I met with filenames is nothing old. Windows can open files with filenames in ASCII and UTF-16. Everything else - you must check twice and do a workaround. I've written here in hope I can get i to some fruitful discussion with people who understand this problem. My solution is a workaround and assumes filename is UTF-8 or ASCII. I would like to start discussion on this and work from there to more general solution. dd On Jun 25, 2012, at 7:27 PM, Daniel Alejandro Benavides D. wrote:Hi all: I as I understood, thought you don't want to talk about compatible W 95 / NT distro of Modula-3. But in turn you want to keep compatibility with older file name encodes. I don't care that but if its useful anyway (because newer windows don't care at all either) I don't know know your problem was because it won't be able to be solved! Thanks in advance -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From hendrik at topoi.pooq.com Mon Jun 25 22:34:22 2012 From: hendrik at topoi.pooq.com (Hendrik Boom) Date: Mon, 25 Jun 2012 16:34:22 -0400 Subject: [M3devel] Windows, Unicode file names In-Reply-To: References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com> <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org> <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org> Message-ID: <20120625203422.GA24287@topoi.pooq.com> On Mon, Jun 25, 2012 at 08:17:52PM +0000, Jay K wrote: > > I'd also quite like if TEXT was internally represented as a nul > terminated flat array of 8 and/or 16 and/or 32bit quantities, > materializing on demand some of them. Does that conflict with NUL being a valid ASCII character? -- hendrik From rodney_bates at lcwb.coop Mon Jun 25 22:29:06 2012 From: rodney_bates at lcwb.coop (Rodney M. Bates) Date: Mon, 25 Jun 2012 15:29:06 -0500 Subject: [M3devel] Windows, Unicode file names In-Reply-To: <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org> References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>, <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org> <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org> Message-ID: <4FE8CA12.5040104@lcwb.coop> On 06/25/2012 02:48 PM, Dragiša Durić wrote: > It can be what cm3 people had in mind when they created WIDECHAR as a catchall for Unicode. > > At first glance it looked like no solution to me, but after counting to ten - I think it is. We can have an UTF-8 layer and use it when and where needed, to recode our strings to catchall WIDECHAR/WIDETEXT. > > As long as we agree on what exactly WIDECHAR is :) > ===From wikipedia > The Microsoft Windows application programming interfaces Win32 and Win64, as well as the Java and .Net Framework platforms, require that wide character variables be defined as 16-bit values, and that characters be encoded using UTF-16 (due to former use of UCS-2), while modern Unix-like systems generally require 32-bit values encoded using UTF-32[citation needed]. 
> === > This is not necessarily a proposal, but FWIW: When working on my altered cm3 TEXT implementations, I put every relevant thing I could find into a state that should allow M3 WIDECHAR to be 32-bit, with only one or two declarations changed. I think Pickles might need some attention to cope with this, however. We would want them to not only handle 32-bit WIDECHAR, but be able to read older pickle files that used 16 bits. > > On Jun 25, 2012, at 9:39 PM, Jay K wrote: > >> I think I know what to do here and will look into it..later.. >> >> We have TEXT. We should just always get WIDECHARs out of it and call CreateFileW. >> Assuming UTF8 is the wrong solution at this level, and passing in UTF8 won't work with the correct solution. >> A layer above this needs to decode UTF8, if that is the encoding. >> >> Unless someone has declared and implemented that TEXT is in fact always UTF8-encoded, which I doubt. >> >> - Jay > From jay.krell at cornell.edu Mon Jun 25 22:46:18 2012 From: jay.krell at cornell.edu (Jay K) Date: Mon, 25 Jun 2012 20:46:18 +0000 Subject: [M3devel] Windows, Unicode file names In-Reply-To: <20120625203422.GA24287@topoi.pooq.com> References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>, <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org>, , <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org>, , <20120625203422.GA24287@topoi.pooq.com> Message-ID: Somewhat but not fully. Text.Length should fetch a stored length, as I'm sure it already does. That length should always be correctly maintained, same as today. Adding one extra nul at the end doesn't invalidate the data. std::string has the same properties: c_str() can append a terminal nul on demand, but there could also be one in the string itself. I understand it is a bit weird. Maintaining a terminal nul does add cost that might be wasted, and it reduces the capacity by one. It could be on-demand, I guess. 
- Jay > Date: Mon, 25 Jun 2012 16:34:22 -0400 > From: hendrik at topoi.pooq.com > To: m3devel at elegosoft.com > Subject: Re: [M3devel] Windows, Unicode file names > > On Mon, Jun 25, 2012 at 08:17:52PM +0000, Jay K wrote: > > > > I'd also quite like if TEXT was internally represented as a nul > > terminated flat array of 8 and/or 16 and/or 32bit quantities, > > materialzing on demand some of them. > > Does that conflict with NUL being a valid ASCII character? > > -- hendrik -------------- next part -------------- An HTML attachment was scrubbed... URL: From dragisha at m3w.org Mon Jun 25 23:09:37 2012 From: dragisha at m3w.org (=?utf-8?Q?Dragi=C5=A1a_Duri=C4=87?=) Date: Mon, 25 Jun 2012 23:09:37 +0200 Subject: [M3devel] Windows, Unicode file names In-Reply-To: References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>, <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org> , <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org> Message-ID: On Jun 25, 2012, at 10:17 PM, Jay K wrote: > I don't care if WIDECHAR is 16 bits or 32bits, as long as I can convert from > TEXT to a flat array of either, and if 32bits, walk the array, checking for > 0xFFFF, throw an exception or return some error if any found, narrow to 16bits, call some "W" function, free the flat array. > The size can, I guess, vary between Win32 and non-Win32 platforms. a) If you like to make it as unportable as possible then yes - 16 or 32 is not important. b) invalid value would be over 0xFFFFF, not 0xFFFF c) Why would you narrow it to 16bit? You need to convert to UTF-16 and make it ready for Windows API calls? WinNLS does that. Simple narrowing (similar to commented in Text.i3) to 16bit and recoding from UTF-32 to UTF-16 is very different thing. d) Size varies, yes. > Its size should be stored in a global to communicate between Modula-3 and C. 
> > > I'd also quite like if TEXT was internally represented as a nul terminated flat array of 8 and/or 16 and/or 32bit quantities, materialzing on demand some of them. But I suspect that flat and readonly and exposing a concat operation are in conflict. I'm not sure. MFC uses a flat reference counted nul terminated representation and it works pretty well. It doesn't materialize-on-demand other widths. > > - Jay > Subject: Re: [M3devel] Windows, Unicode file names > From: dragisha at m3w.org > Date: Mon, 25 Jun 2012 21:48:09 +0200 > CC: dabenavidesd at yahoo.es; m3devel at elegosoft.com > To: jay.krell at cornell.edu > > It can be what cm3 people had in mind when they created WIDECHAR as a catchall for Unicode. > > At first glance it looked like no solution to me, but after counting to ten - I think it is. We can have an UTF-8 layer and use it when and where needed, to recode our strings to catchall WIDECHAR/WIDETEXT. > > As long as we agree on what exacty WIDECHAR is :) > ===From wikipedia > The Microsoft Windows application programming interfaces Win32 and Win64, as well as the Java and .Net Framework platforms, require that wide character variables be defined as 16-bit values, and that characters be encoded using UTF-16 (due to former use of UCS-2), while modern Unix-like systems generally require 32-bit values encoded using UTF-32[citation needed]. > === > > > On Jun 25, 2012, at 9:39 PM, Jay K wrote: > > I think I know what to do here and will look into it..later.. > > We have TEXT. We should just always get WIDECHARs out of it and call CreateFileW. > Assuming UTF8 is the wrong solution at this level, and passing in UTF8 won't work with the correct solution. > A layer above this needs to decode UTF8, if that is the encoding. > > Unless someone has declared and implemented that TEXT is in fact always UTF8-encoded, which I doubt. 
> > - Jay > From: dragisha at m3w.org > Date: Mon, 25 Jun 2012 21:05:59 +0200 > To: dabenavidesd at yahoo.es > CC: m3devel at elegosoft.com > Subject: Re: [M3devel] Windows, Unicode file names > > If you cared enough to check FSWin32.m3, answer would be obvious :). > > Whatever I do with pathname before I call FS.OpenFile(Readonly)? - FSWin32.m3 will call CreateFileA. My solution is: > > PROCEDURE OpenFileReadonly(p: Pathname.T): File.T RAISES {OSError.E}= > VAR > handle: WinNT.HANDLE; > fname := M3toC.SharedTtoS(p); > dwNum := WinNLS.MultiByteToWideChar (WinNLS.CP_UTF8, 0, fname, -1, NIL, 0); > pwText: WinBaseTypes.PCWSTR; > BEGIN > IF dwNum = 0 OR dwNum = Text.Length(p) + 1 THEN > (* dwNum includes terminating null character. that's +1 above. > *) > handle := WinBase.CreateFile( > lpFileName := fname, > dwDesiredAccess := WinNT.GENERIC_READ, > dwShareMode := WinNT.FILE_SHARE_READ, > lpSecurityAttributes := NIL, > dwCreationDisposition := WinBase.OPEN_EXISTING, > dwFlagsAndAttributes := 0, > hTemplateFile := NIL); > ELSE > pwText := LOOPHOLE(NEW(UNTRACED REF ARRAY OF CHAR, dwNum*2), WinBaseTypes.PCWSTR); > EVAL WinNLS.MultiByteToWideChar (WinNLS.CP_UTF8, 0, fname, -1, pwText, dwNum); > handle := WinBase.CreateFileW( > lpFileName := pwText, > dwDesiredAccess := WinNT.GENERIC_READ, > dwShareMode := WinNT.FILE_SHARE_READ, > lpSecurityAttributes := NIL, > dwCreationDisposition := WinBase.OPEN_EXISTING, > dwFlagsAndAttributes := 0, > hTemplateFile := NIL); > DISPOSE(pwText); > END; > > IF LOOPHOLE(handle, INTEGER) = WinBase.INVALID_HANDLE_VALUE THEN > Fail(p, fname); > END; > M3toC.FreeSharedS(p, fname); > RETURN FileWin32.New(handle, FileWin32.Read) > END OpenFileReadonly; > > And similar in OpenFile. Not nice :). > > Also, I've added CP_UTF8 constant to WinNLS.i3. > > On Jun 25, 2012, at 9:01 PM, Daniel Alejandro Benavides D. wrote: > > Hi all: > So do you need Double-Byte Character String module as currently in TEXT types? but you can do that already. Couldn't you? 
> Thanks in advance > > --- El lun, 25/6/12, Dragi?a Duri? escribi?: > > De: Dragi?a Duri? > Asunto: Re: [M3devel] Windows, Unicode file names > Para: "Daniel Alejandro Benavides D." > CC: "m3devel" > Fecha: lunes, 25 de junio, 2012 13:20 > > Yes, they exposed parts of NLS. That's how problem can be, albeit partially, solved. By using methods exposed there. > > What we don't have is how to communicate actual encoding of string to FS module so FS methods can handle filenames accordingly. > > On Jun 25, 2012, at 8:06 PM, Daniel Alejandro Benavides D. wrote: > > Hi all: > OK, good, Win32 API dealt with inter-NLS (National Language Support) at ASCII and other formats level with NLS API. > But it appears to be have not been used for DEC-SRC WinNT port of Modula-3 (but for CM3, though it isn't compiled in elego servers, but here): > http://www.cs.purdue.edu/homes/hosking/m3/help/gen_html/m3core/src/win32/WinNLS.i3.html > > Thanks in advance > > --- El lun, 25/6/12, Dragi?a Duri? escribi?: > > De: Dragi?a Duri? > Asunto: Re: [M3devel] Windows, Unicode file names > Para: "Daniel Alejandro Benavides D." > CC: "m3devel" > Fecha: lunes, 25 de junio, 2012 12:36 > > Daniel, > > I can talk about many things, and most things Modula-3 are of interest to me. Once you start a topic, and I can understand what is it about, and it meets my interests - I'll be there. > > Problem I met with filenames is nothing old. Windows can open files with filenames in ASCII and UTF-16. Everything else - you must check twice and do a workaround. > > I've written here in hope I can get i to some fruitful discussion with people who understand this problem. My solution is a workaround and assumes filename is UTF-8 or ASCII. I would like to start discussion on this and work from there to more general solution. > > dd > > On Jun 25, 2012, at 7:27 PM, Daniel Alejandro Benavides D. wrote: > > Hi all: > I as I understood, thought you don't want to talk about compatible W 95 / NT distro of Modula-3. 
> But in turn you want to keep compatibility with older file name encodings. > I don't care about that, but if it's useful anyway (because newer Windows versions don't care at all either) I don't know what your problem was, because it won't be able to be solved! > Thanks in advance From dragisha at m3w.org Mon Jun 25 23:11:49 2012 From: dragisha at m3w.org (Dragiša Durić) Date: Mon, 25 Jun 2012 23:11:49 +0200 Subject: [M3devel] Windows, Unicode file names In-Reply-To: <4FE8CA12.5040104@lcwb.coop> References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>, <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org> <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org> <4FE8CA12.5040104@lcwb.coop> Message-ID: <99C12F66-6DC0-4FC3-BC99-3C2A61595CBC@m3w.org> I agree with this. This way we are compatible with Unices (the majority of systems we use), but we also have a straight path to the W functions of the Windows API, similar to the method I used but with a distinct presumption about the input encoding. On Jun 25, 2012, at 10:29 PM, Rodney M. Bates wrote: > This is not necessarily a proposal, but FWIW: > > When working on my altered cm3 TEXT implementations, I put every relevant thing I could find into > a state that should allow M3 WIDECHAR to be 32-bit, with only one or two declarations > changed. I think Pickles might need some attention to cope with this, however. We would > want them to not only handle 32-bit WIDECHAR, but be able to read older pickle files that > used 16 bits. -------------- next part -------------- An HTML attachment was scrubbed... 
URL: From jay.krell at cornell.edu Mon Jun 25 23:30:08 2012 From: jay.krell at cornell.edu (Jay K) Date: Mon, 25 Jun 2012 21:30:08 +0000 Subject: [M3devel] Windows, Unicode file names In-Reply-To: References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>, <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org> , <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org> , Message-ID: > Why would you narrow it to 16bit? You need to convert to UTF-16 and make it ready for Windows API calls? Yes. > WinNLS does that. I doubt that. There is a 32bit to 16bit conversion?Ok, I guess there is. "Surrogate pairs" and all that?Maybe not in WinNLS, but easy enough for us to write, in portable C or Modula-3. :)Part of Text.i3 perhaps. So then, I guess I can sign up for WIDECHAR being 32bits across the board. - Jay Subject: Re: [M3devel] Windows, Unicode file names From: dragisha at m3w.org Date: Mon, 25 Jun 2012 23:09:37 +0200 CC: dabenavidesd at yahoo.es; m3devel at elegosoft.com To: jay.krell at cornell.edu On Jun 25, 2012, at 10:17 PM, Jay K wrote:I don't care if WIDECHAR is 16 bits or 32bits, as long as I can convert from TEXT to a flat array of either, and if 32bits, walk the array, checking for > 0xFFFF, throw an exception or return some error if any found, narrow to 16bits, call some "W" function, free the flat array. The size can, I guess, vary between Win32 and non-Win32 platforms. a) If you like to make it as unportable as possible then yes - 16 or 32 is not important.b) invalid value would be over 0xFFFFF, not 0xFFFFc) Why would you narrow it to 16bit? You need to convert to UTF-16 and make it ready for Windows API calls? WinNLS does that. Simple narrowing (similar to commented in Text.i3) to 16bit and recoding from UTF-32 to UTF-16 is very different thing.d) Size varies, yes. Its size should be stored in a global to communicate between Modula-3 and C. 
I'd also quite like if TEXT was internally represented as a nul terminated flat array of 8 and/or 16 and/or 32bit quantities, materialzing on demand some of them. But I suspect that flat and readonly and exposing a concat operation are in conflict. I'm not sure. MFC uses a flat reference counted nul terminated representation and it works pretty well. It doesn't materialize-on-demand other widths. - Jay Subject: Re: [M3devel] Windows, Unicode file names From: dragisha at m3w.org Date: Mon, 25 Jun 2012 21:48:09 +0200 CC: dabenavidesd at yahoo.es; m3devel at elegosoft.com To: jay.krell at cornell.edu It can be what cm3 people had in mind when they created WIDECHAR as a catchall for Unicode. At first glance it looked like no solution to me, but after counting to ten - I think it is. We can have an UTF-8 layer and use it when and where needed, to recode our strings to catchall WIDECHAR/WIDETEXT. As long as we agree on what exacty WIDECHAR is :)===From wikipediaThe Microsoft Windows application programming interfaces Win32 and Win64, as well as the Java and .Net Framework platforms, require that wide character variables be defined as 16-bit values, and that characters be encoded using UTF-16 (due to former use of UCS-2), while modern Unix-like systems generally require 32-bit values encoded using UTF-32[citation needed].=== On Jun 25, 2012, at 9:39 PM, Jay K wrote:I think I know what to do here and will look into it..later.. We have TEXT. We should just always get WIDECHARs out of it and call CreateFileW. Assuming UTF8 is the wrong solution at this level, and passing in UTF8 won't work with the correct solution. A layer above this needs to decode UTF8, if that is the encoding. Unless someone has declared and implemented that TEXT is in fact always UTF8-encoded, which I doubt. 
- Jay From: dragisha at m3w.org Date: Mon, 25 Jun 2012 21:05:59 +0200 To: dabenavidesd at yahoo.es CC: m3devel at elegosoft.com Subject: Re: [M3devel] Windows, Unicode file names If you cared enough to check FSWin32.m3, answer would be obvious :). Whatever I do with pathname before I call FS.OpenFile(Readonly)? - FSWin32.m3 will call CreateFileA. My solution is: PROCEDURE OpenFileReadonly(p: Pathname.T): File.T RAISES {OSError.E}= VAR handle: WinNT.HANDLE; fname := M3toC.SharedTtoS(p); dwNum := WinNLS.MultiByteToWideChar (WinNLS.CP_UTF8, 0, fname, -1, NIL, 0); pwText: WinBaseTypes.PCWSTR; BEGIN IF dwNum = 0 OR dwNum = Text.Length(p) + 1 THEN (* dwNum includes terminating null character. that's +1 above. *) handle := WinBase.CreateFile( lpFileName := fname, dwDesiredAccess := WinNT.GENERIC_READ, dwShareMode := WinNT.FILE_SHARE_READ, lpSecurityAttributes := NIL, dwCreationDisposition := WinBase.OPEN_EXISTING, dwFlagsAndAttributes := 0, hTemplateFile := NIL); ELSE pwText := LOOPHOLE(NEW(UNTRACED REF ARRAY OF CHAR, dwNum*2), WinBaseTypes.PCWSTR); EVAL WinNLS.MultiByteToWideChar (WinNLS.CP_UTF8, 0, fname, -1, pwText, dwNum); handle := WinBase.CreateFileW( lpFileName := pwText, dwDesiredAccess := WinNT.GENERIC_READ, dwShareMode := WinNT.FILE_SHARE_READ, lpSecurityAttributes := NIL, dwCreationDisposition := WinBase.OPEN_EXISTING, dwFlagsAndAttributes := 0, hTemplateFile := NIL); DISPOSE(pwText); END; IF LOOPHOLE(handle, INTEGER) = WinBase.INVALID_HANDLE_VALUE THEN Fail(p, fname); END; M3toC.FreeSharedS(p, fname); RETURN FileWin32.New(handle, FileWin32.Read) END OpenFileReadonly; And similar in OpenFile. Not nice :). Also, I've added CP_UTF8 constant to WinNLS.i3. On Jun 25, 2012, at 9:01 PM, Daniel Alejandro Benavides D. wrote:Hi all: So do you need Double-Byte Character String module as currently in TEXT types? but you can do that already. Couldn't you? Thanks in advance --- El lun, 25/6/12, Dragi?a Duri? escribi?: De: Dragi?a Duri? 
Asunto: Re: [M3devel] Windows, Unicode file names Para: "Daniel Alejandro Benavides D." CC: "m3devel" Fecha: lunes, 25 de junio, 2012 13:20 Yes, they exposed parts of NLS. That's how problem can be, albeit partially, solved. By using methods exposed there. What we don't have is how to communicate actual encoding of string to FS module so FS methods can handle filenames accordingly. On Jun 25, 2012, at 8:06 PM, Daniel Alejandro Benavides D. wrote:Hi all: OK, good, Win32 API dealt with inter-NLS (National Language Support) at ASCII and other formats level with NLS API. But it appears to be have not been used for DEC-SRC WinNT port of Modula-3 (but for CM3, though it isn't compiled in elego servers, but here): http://www.cs.purdue.edu/homes/hosking/m3/help/gen_html/m3core/src/win32/WinNLS.i3.html Thanks in advance --- El lun, 25/6/12, Dragi?a Duri? escribi?: De: Dragi?a Duri? Asunto: Re: [M3devel] Windows, Unicode file names Para: "Daniel Alejandro Benavides D." CC: "m3devel" Fecha: lunes, 25 de junio, 2012 12:36 Daniel, I can talk about many things, and most things Modula-3 are of interest to me. Once you start a topic, and I can understand what is it about, and it meets my interests - I'll be there. Problem I met with filenames is nothing old. Windows can open files with filenames in ASCII and UTF-16. Everything else - you must check twice and do a workaround. I've written here in hope I can get i to some fruitful discussion with people who understand this problem. My solution is a workaround and assumes filename is UTF-8 or ASCII. I would like to start discussion on this and work from there to more general solution. dd On Jun 25, 2012, at 7:27 PM, Daniel Alejandro Benavides D. wrote:Hi all: I as I understood, thought you don't want to talk about compatible W 95 / NT distro of Modula-3. But in turn you want to keep compatibility with older file name encodes. 
I don't care that but if its useful anyway (because newer windows don't care at all either) I don't know know your problem was because it won't be able to be solved! Thanks in advance -------------- next part -------------- An HTML attachment was scrubbed... URL: From dragisha at m3w.org Tue Jun 26 00:55:45 2012 From: dragisha at m3w.org (=?utf-8?Q?Dragi=C5=A1a_Duri=C4=87?=) Date: Tue, 26 Jun 2012 00:55:45 +0200 Subject: [M3devel] Windows, Unicode file names In-Reply-To: References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>, <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org> , <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org> , Message-ID: <1E70B011-E236-4931-AB6C-78EC47EA8126@m3w.org> On Jun 25, 2012, at 11:30 PM, Jay K wrote: > > Why would you narrow it to 16bit? You need to convert to UTF-16 and make it ready for Windows API calls? > > Yes. > > > WinNLS does that. > > > I doubt that. There is a 32bit to 16bit conversion? http://msdn.microsoft.com/en-us/library/windows/desktop/dd317756%28v=vs.85%29.aspx whatever this means: 12000utf-32Unicode UTF-32, little endian byte order; available only to managed applications 12001utf-32BEUnicode UTF-32, big endian byte order; available only to managed applications > Ok, I guess there is. "Surrogate pairs" and all that? > Maybe not in WinNLS, but easy enough for us to write, in portable C or Modula-3. :) That too :) > Part of Text.i3 perhaps. UTF-32 -> UTF-16? Maybe. > > > So then, I guess I can sign up for WIDECHAR being 32bits across the board. 
> > - Jay > > Subject: Re: [M3devel] Windows, Unicode file names > From: dragisha at m3w.org > Date: Mon, 25 Jun 2012 23:09:37 +0200 > CC: dabenavidesd at yahoo.es; m3devel at elegosoft.com > To: jay.krell at cornell.edu > > > On Jun 25, 2012, at 10:17 PM, Jay K wrote: > > I don't care if WIDECHAR is 16 bits or 32bits, as long as I can convert from > TEXT to a flat array of either, and if 32bits, walk the array, checking for > 0xFFFF, throw an exception or return some error if any found, narrow to 16bits, call some "W" function, free the flat array. > The size can, I guess, vary between Win32 and non-Win32 platforms. > > a) If you like to make it as unportable as possible then yes - 16 or 32 is not important. > b) invalid value would be over 0xFFFFF, not 0xFFFF > c) Why would you narrow it to 16bit? You need to convert to UTF-16 and make it ready for Windows API calls? WinNLS does that. Simple narrowing (similar to commented in Text.i3) to 16bit and recoding from UTF-32 to UTF-16 is very different thing. > d) Size varies, yes. > > Its size should be stored in a global to communicate between Modula-3 and C. > > > I'd also quite like if TEXT was internally represented as a nul terminated flat array of 8 and/or 16 and/or 32bit quantities, materialzing on demand some of them. But I suspect that flat and readonly and exposing a concat operation are in conflict. I'm not sure. MFC uses a flat reference counted nul terminated representation and it works pretty well. It doesn't materialize-on-demand other widths. > > - Jay > Subject: Re: [M3devel] Windows, Unicode file names > From: dragisha at m3w.org > Date: Mon, 25 Jun 2012 21:48:09 +0200 > CC: dabenavidesd at yahoo.es; m3devel at elegosoft.com > To: jay.krell at cornell.edu > > It can be what cm3 people had in mind when they created WIDECHAR as a catchall for Unicode. > > At first glance it looked like no solution to me, but after counting to ten - I think it is. 
We can have an UTF-8 layer and use it when and where needed, to recode our strings to catchall WIDECHAR/WIDETEXT. > > As long as we agree on what exacty WIDECHAR is :) > ===From wikipedia > The Microsoft Windows application programming interfaces Win32 and Win64, as well as the Java and .Net Framework platforms, require that wide character variables be defined as 16-bit values, and that characters be encoded using UTF-16 (due to former use of UCS-2), while modern Unix-like systems generally require 32-bit values encoded using UTF-32[citation needed]. > === > > > On Jun 25, 2012, at 9:39 PM, Jay K wrote: > > I think I know what to do here and will look into it..later.. > > We have TEXT. We should just always get WIDECHARs out of it and call CreateFileW. > Assuming UTF8 is the wrong solution at this level, and passing in UTF8 won't work with the correct solution. > A layer above this needs to decode UTF8, if that is the encoding. > > Unless someone has declared and implemented that TEXT is in fact always UTF8-encoded, which I doubt. > > - Jay > From: dragisha at m3w.org > Date: Mon, 25 Jun 2012 21:05:59 +0200 > To: dabenavidesd at yahoo.es > CC: m3devel at elegosoft.com > Subject: Re: [M3devel] Windows, Unicode file names > > If you cared enough to check FSWin32.m3, answer would be obvious :). > > Whatever I do with pathname before I call FS.OpenFile(Readonly)? - FSWin32.m3 will call CreateFileA. My solution is: > > PROCEDURE OpenFileReadonly(p: Pathname.T): File.T RAISES {OSError.E}= > VAR > handle: WinNT.HANDLE; > fname := M3toC.SharedTtoS(p); > dwNum := WinNLS.MultiByteToWideChar (WinNLS.CP_UTF8, 0, fname, -1, NIL, 0); > pwText: WinBaseTypes.PCWSTR; > BEGIN > IF dwNum = 0 OR dwNum = Text.Length(p) + 1 THEN > (* dwNum includes terminating null character. that's +1 above. 
> *) > handle := WinBase.CreateFile( > lpFileName := fname, > dwDesiredAccess := WinNT.GENERIC_READ, > dwShareMode := WinNT.FILE_SHARE_READ, > lpSecurityAttributes := NIL, > dwCreationDisposition := WinBase.OPEN_EXISTING, > dwFlagsAndAttributes := 0, > hTemplateFile := NIL); > ELSE > pwText := LOOPHOLE(NEW(UNTRACED REF ARRAY OF CHAR, dwNum*2), WinBaseTypes.PCWSTR); > EVAL WinNLS.MultiByteToWideChar (WinNLS.CP_UTF8, 0, fname, -1, pwText, dwNum); > handle := WinBase.CreateFileW( > lpFileName := pwText, > dwDesiredAccess := WinNT.GENERIC_READ, > dwShareMode := WinNT.FILE_SHARE_READ, > lpSecurityAttributes := NIL, > dwCreationDisposition := WinBase.OPEN_EXISTING, > dwFlagsAndAttributes := 0, > hTemplateFile := NIL); > DISPOSE(pwText); > END; > > IF LOOPHOLE(handle, INTEGER) = WinBase.INVALID_HANDLE_VALUE THEN > Fail(p, fname); > END; > M3toC.FreeSharedS(p, fname); > RETURN FileWin32.New(handle, FileWin32.Read) > END OpenFileReadonly; > > And similar in OpenFile. Not nice :). > > Also, I've added CP_UTF8 constant to WinNLS.i3. > > On Jun 25, 2012, at 9:01 PM, Daniel Alejandro Benavides D. wrote: > > Hi all: > So do you need Double-Byte Character String module as currently in TEXT types? but you can do that already. Couldn't you? > Thanks in advance > > --- El lun, 25/6/12, Dragi?a Duri? escribi?: > > De: Dragi?a Duri? > Asunto: Re: [M3devel] Windows, Unicode file names > Para: "Daniel Alejandro Benavides D." > CC: "m3devel" > Fecha: lunes, 25 de junio, 2012 13:20 > > Yes, they exposed parts of NLS. That's how problem can be, albeit partially, solved. By using methods exposed there. > > What we don't have is how to communicate actual encoding of string to FS module so FS methods can handle filenames accordingly. > > On Jun 25, 2012, at 8:06 PM, Daniel Alejandro Benavides D. wrote: > > Hi all: > OK, good, Win32 API dealt with inter-NLS (National Language Support) at ASCII and other formats level with NLS API. 
> But it appears not to have been used for the DEC-SRC WinNT port of Modula-3 (but it is for CM3, though it isn't compiled on the elego servers, only here):
> http://www.cs.purdue.edu/homes/hosking/m3/help/gen_html/m3core/src/win32/WinNLS.i3.html
>
> Thanks in advance
>
> --- On Mon, 25/6/12, Dragiša Durić wrote:
>
> From: Dragiša Durić
> Subject: Re: [M3devel] Windows, Unicode file names
> To: "Daniel Alejandro Benavides D."
> CC: "m3devel"
> Date: Monday, June 25, 2012 12:36
>
> Daniel,
>
> I can talk about many things, and most things Modula-3 are of interest to me. Once you start a topic, and I can understand what it is about, and it meets my interests - I'll be there.
>
> The problem I met with filenames is nothing old. Windows can open files with filenames in ASCII and UTF-16. Everything else - you must check twice and do a workaround.
>
> I've written here in the hope that I can get into some fruitful discussion with people who understand this problem. My solution is a workaround and assumes the filename is UTF-8 or ASCII. I would like to start a discussion on this and work from there to a more general solution.
>
> dd
>
> On Jun 25, 2012, at 7:27 PM, Daniel Alejandro Benavides D. wrote:
>
> Hi all:
> As I understood it, I thought you didn't want to talk about a W 95 / NT compatible distro of Modula-3.
> But in turn you want to keep compatibility with older file name encodings.
> I don't care about that, but if it's useful anyway (because newer Windows don't care at all either) I don't know what your problem was, because it won't be able to be solved!
> Thanks in advance
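
The UTF-32 to UTF-16 recoding that this thread keeps returning to (as opposed to simple narrowing) can be sketched in portable C. This is a minimal illustration, not code from m3core or WinNLS: the function name `utf32_to_utf16` and its error convention are hypothetical. It assumes the standard Unicode limits, namely that valid scalar values run up to 0x10FFFF and exclude the surrogate range 0xD800..0xDFFF.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of m3core: recode n UTF-32 code points
   into UTF-16 code units. Returns the number of units written, or
   (size_t)-1 if a code point is invalid or dst has too little room. */
size_t utf32_to_utf16(const uint32_t *src, size_t n,
                      uint16_t *dst, size_t cap)
{
    size_t out = 0;
    for (size_t i = 0; i < n; i++) {
        uint32_t cp = src[i];
        /* Reject values outside Unicode and lone surrogates. */
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return (size_t)-1;
        if (cp < 0x10000) {
            /* Basic Multilingual Plane: one 16-bit unit, value unchanged. */
            if (out + 1 > cap) return (size_t)-1;
            dst[out++] = (uint16_t)cp;
        } else {
            /* Supplementary plane: split the remaining 20 bits
               into a high and a low surrogate. */
            if (out + 2 > cap) return (size_t)-1;
            cp -= 0x10000;
            dst[out++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[out++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
    }
    return out;
}
```

With this sketch, a code point below 0x10000 passes through as a single unit, while anything above it becomes a surrogate pair, which is why narrowing and recoding give different results for such characters.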

From jay.krell at cornell.edu  Tue Jun 26 02:58:05 2012
From: jay.krell at cornell.edu (Jay K)
Date: Tue, 26 Jun 2012 00:58:05 +0000
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <1E70B011-E236-4931-AB6C-78EC47EA8126@m3w.org>
References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>, <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org>, <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org>, <1E70B011-E236-4931-AB6C-78EC47EA8126@m3w.org>
Message-ID:

 > http://msdn.microsoft.com/en-us/library/windows/desktop/dd317756%28v=vs.85%29.aspx
 > 12000  utf-32    Unicode UTF-32, little endian byte order; available only to managed applications
 > 12001  utf-32BE  Unicode UTF-32, big endian byte order; available only to managed applications

That is not useful to us... unless we target .NET instead of native code...
Portable Modula-3 or C it should be.

 - Jay

________________________________
> From: dragisha at m3w.org
> Date: Tue, 26 Jun 2012 00:55:45 +0200
> To: jay.krell at cornell.edu
> CC: m3devel at elegosoft.com
> Subject: Re: [M3devel] Windows, Unicode file names
>
> On Jun 25, 2012, at 11:30 PM, Jay K wrote:
>
> Why would you narrow it to 16bit? You need to convert to UTF-16 and make it ready for Windows API calls?
>
> Yes.
>
> WinNLS does that.
>
> I doubt that. There is a 32bit to 16bit conversion?
>
> http://msdn.microsoft.com/en-us/library/windows/desktop/dd317756%28v=vs.85%29.aspx
>
> whatever this means:
> 12000  utf-32    Unicode UTF-32, little endian byte order; available only to managed applications
> 12001  utf-32BE  Unicode UTF-32, big endian byte order; available only to managed applications
>
> Ok, I guess there is. "Surrogate pairs" and all that?
> Maybe not in WinNLS, but easy enough for us to write, in portable C or Modula-3. :)
>
> That too :)
>
> Part of Text.i3 perhaps.
>
> UTF-32 -> UTF-16? Maybe.
>
> So then, I guess I can sign up for WIDECHAR being 32bits across the board.
>
> - Jay
>
> ________________________________
> Subject: Re: [M3devel] Windows, Unicode file names
> From: dragisha at m3w.org
> Date: Mon, 25 Jun 2012 23:09:37 +0200
> CC: dabenavidesd at yahoo.es; m3devel at elegosoft.com
> To: jay.krell at cornell.edu
>
> On Jun 25, 2012, at 10:17 PM, Jay K wrote:
>
> I don't care if WIDECHAR is 16 bits or 32bits, as long as I can convert from TEXT to a flat array of either, and if 32bits, walk the array, checking for > 0xFFFF, throw an exception or return some error if any found, narrow to 16bits, call some "W" function, free the flat array.
> The size can, I guess, vary between Win32 and non-Win32 platforms.
>
> a) If you like to make it as unportable as possible then yes - 16 or 32 is not important.
> b) invalid value would be over 0xFFFFF, not 0xFFFF
> c) Why would you narrow it to 16bit? You need to convert to UTF-16 and make it ready for Windows API calls? WinNLS does that. Simple narrowing (similar to what is commented in Text.i3) to 16bit and recoding from UTF-32 to UTF-16 are very different things.
> d) Size varies, yes.
>
> Its size should be stored in a global to communicate between Modula-3 and C.
>
> I'd also quite like it if TEXT was internally represented as a nul terminated flat array of 8 and/or 16 and/or 32bit quantities, materializing on demand some of them. But I suspect that flat and readonly and exposing a concat operation are in conflict. I'm not sure.
> MFC uses a flat reference counted nul terminated representation and it works pretty well. It doesn't materialize-on-demand other widths.
>
> - Jay
> ________________________________
> Subject: Re: [M3devel] Windows, Unicode file names
> From: dragisha at m3w.org
> Date: Mon, 25 Jun 2012 21:48:09 +0200
> CC: dabenavidesd at yahoo.es; m3devel at elegosoft.com
> To: jay.krell at cornell.edu
>
> It can be what the cm3 people had in mind when they created WIDECHAR as a catchall for Unicode.
>
> At first glance it looked like no solution to me, but after counting to ten - I think it is. We can have a UTF-8 layer and use it when and where needed, to recode our strings to the catchall WIDECHAR/WIDETEXT.
>
> As long as we agree on what exactly WIDECHAR is :)
> === From Wikipedia
> The Microsoft Windows application programming interfaces Win32 and Win64, as well as the Java and .Net Framework platforms, require that wide character variables be defined as 16-bit values, and that characters be encoded using UTF-16 (due to former use of UCS-2), while modern Unix-like systems generally require 32-bit values encoded using UTF-32[citation needed].
> ===
>
> On Jun 25, 2012, at 9:39 PM, Jay K wrote:
>
> I think I know what to do here and will look into it... later...
>
> We have TEXT. We should just always get WIDECHARs out of it and call CreateFileW.
> Assuming UTF8 is the wrong solution at this level, and passing in UTF8 won't work with the correct solution.
> A layer above this needs to decode UTF8, if that is the encoding.
>
> Unless someone has declared and implemented that TEXT is in fact always UTF8-encoded, which I doubt.
>
> - Jay
> ________________________________
> From: dragisha at m3w.org
> Date: Mon, 25 Jun 2012 21:05:59 +0200
> To: dabenavidesd at yahoo.es
> CC: m3devel at elegosoft.com
> Subject: Re: [M3devel] Windows, Unicode file names
>
> If you cared enough to check FSWin32.m3, the answer would be obvious :).
>
> Whatever I do with the pathname before I call FS.OpenFile(Readonly)… - FSWin32.m3 will call CreateFileA. My solution is:
>
> PROCEDURE OpenFileReadonly(p: Pathname.T): File.T RAISES {OSError.E}=
>   VAR
>     handle: WinNT.HANDLE;
>     fname := M3toC.SharedTtoS(p);
>     dwNum := WinNLS.MultiByteToWideChar (WinNLS.CP_UTF8, 0, fname, -1, NIL, 0);
>     pwText: WinBaseTypes.PCWSTR;
>   BEGIN
>     IF dwNum = 0 OR dwNum = Text.Length(p) + 1 THEN
>       (* dwNum includes terminating null character. that's +1 above.
>       *)
>       handle := WinBase.CreateFile(
>                   lpFileName := fname,
>                   dwDesiredAccess := WinNT.GENERIC_READ,
>                   dwShareMode := WinNT.FILE_SHARE_READ,
>                   lpSecurityAttributes := NIL,
>                   dwCreationDisposition := WinBase.OPEN_EXISTING,
>                   dwFlagsAndAttributes := 0,
>                   hTemplateFile := NIL);
>     ELSE
>       pwText := LOOPHOLE(NEW(UNTRACED REF ARRAY OF CHAR, dwNum*2), WinBaseTypes.PCWSTR);
>       EVAL WinNLS.MultiByteToWideChar (WinNLS.CP_UTF8, 0, fname, -1, pwText, dwNum);
>       handle := WinBase.CreateFileW(
>                   lpFileName := pwText,
>                   dwDesiredAccess := WinNT.GENERIC_READ,
>                   dwShareMode := WinNT.FILE_SHARE_READ,
>                   lpSecurityAttributes := NIL,
>                   dwCreationDisposition := WinBase.OPEN_EXISTING,
>                   dwFlagsAndAttributes := 0,
>                   hTemplateFile := NIL);
>       DISPOSE(pwText);
>     END;
>
>     IF LOOPHOLE(handle, INTEGER) = WinBase.INVALID_HANDLE_VALUE THEN
>       Fail(p, fname);
>     END;
>     M3toC.FreeSharedS(p, fname);
>     RETURN FileWin32.New(handle, FileWin32.Read)
>   END OpenFileReadonly;
>
> And similar in OpenFile. Not nice :).
>
> Also, I've added the CP_UTF8 constant to WinNLS.i3.
>
> On Jun 25, 2012, at 9:01 PM, Daniel Alejandro Benavides D. wrote:
>
> Hi all:
> So do you need a Double-Byte Character String module, as currently in the TEXT types? But you can do that already, couldn't you?
> Thanks in advance
>
> --- On Mon, 25/6/12, Dragiša Durić wrote:
>
> From: Dragiša Durić
> Subject: Re: [M3devel] Windows, Unicode file names
> To: "Daniel Alejandro Benavides D."
> CC: "m3devel"
> Date: Monday, June 25, 2012 13:20
>
> Yes, they exposed parts of NLS. That's how the problem can be, albeit partially, solved. By using methods exposed there.
>
> What we don't have is a way to communicate the actual encoding of a string to the FS module so FS methods can handle filenames accordingly.
>
> On Jun 25, 2012, at 8:06 PM, Daniel Alejandro Benavides D. wrote:
>
> Hi all:
> OK, good, the Win32 API dealt with inter-NLS (National Language Support) at the level of ASCII and other formats with the NLS API.
> But it appears not to have been used for the DEC-SRC WinNT port of Modula-3 (but it is for CM3, though it isn't compiled on the elego servers, only here):
> http://www.cs.purdue.edu/homes/hosking/m3/help/gen_html/m3core/src/win32/WinNLS.i3.html
>
> Thanks in advance
>
> --- On Mon, 25/6/12, Dragiša Durić wrote:
>
> From: Dragiša Durić
> Subject: Re: [M3devel] Windows, Unicode file names
> To: "Daniel Alejandro Benavides D."
> CC: "m3devel"
> Date: Monday, June 25, 2012 12:36
>
> Daniel,
>
> I can talk about many things, and most things Modula-3 are of interest to me. Once you start a topic, and I can understand what it is about, and it meets my interests - I'll be there.
>
> The problem I met with filenames is nothing old. Windows can open files with filenames in ASCII and UTF-16. Everything else - you must check twice and do a workaround.
>
> I've written here in the hope that I can get into some fruitful discussion with people who understand this problem. My solution is a workaround and assumes the filename is UTF-8 or ASCII. I would like to start a discussion on this and work from there to a more general solution.
>
> dd
>
> On Jun 25, 2012, at 7:27 PM, Daniel Alejandro Benavides D. wrote:
>
> Hi all:
> As I understood it, I thought you didn't want to talk about a W 95 / NT compatible distro of Modula-3.
> But in turn you want to keep compatibility with older file name encodings.
> I don't care about that, but if it's useful anyway (because newer Windows don't care at all either) I don't know what your problem was, because it won't be able to be solved!
> Thanks in advance

From dragisha at m3w.org  Tue Jun 26 12:18:41 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Tue, 26 Jun 2012 12:18:41 +0200
Subject: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
Message-ID: <989FB282-786A-447D-8CA9-2432AE922DBF@m3w.org>

This piece of code, from TextClass.m3, disturbs me… a lot.
If we are to use WIDECHAR, I think we must be a lot more serious than this.

Probably, text pieces are limited to 128 bytes by design, somewhere. But - whose idea was it to "narrow" by ignoring everything except the 8 LSBs? By mapping a set of 2^20 elements to a set of 2^8 elements.

Probably by someone whose mother tongue is fully writeable with ASCII :).

====
PROCEDURE GetChars (t: TEXT;  VAR a: ARRAY OF CHAR;  start: CARDINAL) =
  VAR
    info : Info;
    cnt  : INTEGER;
    next : CARDINAL := 0;
    buf  : ARRAY [0..127] OF WIDECHAR;
  BEGIN
    t.get_info (info);
    cnt := MIN (NUMBER (a), info.length - start);
    WHILE (cnt > 0) DO
      t.get_wide_chars (buf, start);
      FOR i := FIRST (buf) TO LAST (buf) DO
        IF (cnt = 0) THEN RETURN END;
        a[next] := VAL (Word.And (ORD (buf[i]), 16_ff), CHAR);
        INC (next);  DEC (cnt);
      END;
      INC (start, NUMBER (buf));
    END;
  END GetChars;
====

From dabenavidesd at yahoo.es  Tue Jun 26 14:12:42 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Tue, 26 Jun 2012 13:12:42 +0100 (BST)
Subject: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
In-Reply-To: <989FB282-786A-447D-8CA9-2432AE922DBF@m3w.org>
Message-ID: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com>

Hi all:
Maybe it is a leftover of older code (almost just used a decade ago), but if not, then was this meant to be just a partial implementation? If we are to get serious about memory usage, it seems over-strict (or just in case you don't need system NIL-terminated widechars to be checked?).
Thanks in advance

--- On Tue, 26/6/12, Dragiša Durić wrote:

From: Dragiša Durić
Subject: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
To: "m3devel"
Date: Tuesday, June 26, 2012 05:18

This piece of code, from TextClass.m3, disturbs me… a lot.

If we are to use WIDECHAR, I think we must be a lot more serious than this.

Probably, text pieces are limited to 128 bytes by design, somewhere. But - whose idea was it to "narrow" by ignoring everything except the 8 LSBs?
By mapping a set of 2^20 elements to a set of 2^8 elements.

Probably by someone whose mother tongue is fully writeable with ASCII :).

====
PROCEDURE GetChars (t: TEXT;  VAR a: ARRAY OF CHAR;  start: CARDINAL) =
  VAR
    info : Info;
    cnt  : INTEGER;
    next : CARDINAL := 0;
    buf  : ARRAY [0..127] OF WIDECHAR;
  BEGIN
    t.get_info (info);
    cnt := MIN (NUMBER (a), info.length - start);
    WHILE (cnt > 0) DO
      t.get_wide_chars (buf, start);
      FOR i := FIRST (buf) TO LAST (buf) DO
        IF (cnt = 0) THEN RETURN END;
        a[next] := VAL (Word.And (ORD (buf[i]), 16_ff), CHAR);
        INC (next);  DEC (cnt);
      END;
      INC (start, NUMBER (buf));
    END;
  END GetChars;
====

From dragisha at m3w.org  Tue Jun 26 14:27:00 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Tue, 26 Jun 2012 14:27:00 +0200
Subject: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
In-Reply-To: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com>
References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com>
Message-ID: <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org>

If you cared to read, for example, Text.i3, you would see this is exactly what the cm3 people meant it to be.

On Jun 26, 2012, at 2:12 PM, Daniel Alejandro Benavides D. wrote:

> Hi all:
> Maybe it is a leftover of older code (almost just used a decade ago), but if not, then was this meant to be just a partial implementation? If we are to get serious about memory usage, it seems over-strict (or just in case you don't need system NIL-terminated widechars to be checked?).
> Thanks in advance
>
> --- On Tue, 26/6/12, Dragiša Durić wrote:
>
> From: Dragiša Durić
> Subject: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
> To: "m3devel"
> Date: Tuesday, June 26, 2012 05:18
>
> This piece of code, from TextClass.m3, disturbs me… a lot.
>
> If we are to use WIDECHAR, I think we must be a lot more serious than this.
>
> Probably, text pieces are limited to 128 bytes by design, somewhere. But - whose idea was it to "narrow" by ignoring everything except the 8 LSBs? By mapping a set of 2^20 elements to a set of 2^8 elements.
>
> Probably by someone whose mother tongue is fully writeable with ASCII :).
>
> ====
> PROCEDURE GetChars (t: TEXT;  VAR a: ARRAY OF CHAR;  start: CARDINAL) =
>   VAR
>     info : Info;
>     cnt  : INTEGER;
>     next : CARDINAL := 0;
>     buf  : ARRAY [0..127] OF WIDECHAR;
>   BEGIN
>     t.get_info (info);
>     cnt := MIN (NUMBER (a), info.length - start);
>     WHILE (cnt > 0) DO
>       t.get_wide_chars (buf, start);
>       FOR i := FIRST (buf) TO LAST (buf) DO
>         IF (cnt = 0) THEN RETURN END;
>         a[next] := VAL (Word.And (ORD (buf[i]), 16_ff), CHAR);
>         INC (next);  DEC (cnt);
>       END;
>       INC (start, NUMBER (buf));
>     END;
>   END GetChars;
> ====
>

From dabenavidesd at yahoo.es  Tue Jun 26 14:47:31 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Tue, 26 Jun 2012 13:47:31 +0100 (BST)
Subject: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
In-Reply-To: <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org>
Message-ID: <1340714851.17688.YahooMailClassic@web29702.mail.ird.yahoo.com>

Hi all:
copied that, but the TextClass interface's GetChars is kind of different from GetChar in Text. I can't see the interrelation.
Thanks in advance

--- On Tue, 26/6/12, Dragiša Durić wrote:

From: Dragiša Durić
Subject: Re: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
To: "Daniel Alejandro Benavides D."
CC: "m3devel"
Date: Tuesday, June 26, 2012 07:27

If you cared to read, for example, Text.i3, you would see this is exactly what the cm3 people meant it to be.

On Jun 26, 2012, at 2:12 PM, Daniel Alejandro Benavides D.
wrote:

Hi all:
Maybe it is a leftover of older code (almost just used a decade ago), but if not, then was this meant to be just a partial implementation? If we are to get serious about memory usage, it seems over-strict (or just in case you don't need system NIL-terminated widechars to be checked?).
Thanks in advance

--- On Tue, 26/6/12, Dragiša Durić wrote:

From: Dragiša Durić
Subject: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
To: "m3devel"
Date: Tuesday, June 26, 2012 05:18

This piece of code, from TextClass.m3, disturbs me… a lot.

If we are to use WIDECHAR, I think we must be a lot more serious than this.

Probably, text pieces are limited to 128 bytes by design, somewhere. But - whose idea was it to "narrow" by ignoring everything except the 8 LSBs? By mapping a set of 2^20 elements to a set of 2^8 elements.

Probably by someone whose mother tongue is fully writeable with ASCII :).

====
PROCEDURE GetChars (t: TEXT;  VAR a: ARRAY OF CHAR;  start: CARDINAL) =
  VAR
    info : Info;
    cnt  : INTEGER;
    next : CARDINAL := 0;
    buf  : ARRAY [0..127] OF WIDECHAR;
  BEGIN
    t.get_info (info);
    cnt := MIN (NUMBER (a), info.length - start);
    WHILE (cnt > 0) DO
      t.get_wide_chars (buf, start);
      FOR i := FIRST (buf) TO LAST (buf) DO
        IF (cnt = 0) THEN RETURN END;
        a[next] := VAL (Word.And (ORD (buf[i]), 16_ff), CHAR);
        INC (next);  DEC (cnt);
      END;
      INC (start, NUMBER (buf));
    END;
  END GetChars;
====
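
The effect of the `Word.And (ORD (buf[i]), 16_ff)` narrowing in GetChars can be illustrated with a small C sketch. Both function names are hypothetical and nothing here is taken from m3core: `narrow_masked` mirrors the masking that the quoted procedure performs, and `narrow_checked` is one possible alternative that reports the first value that does not fit in 8 bits instead of silently truncating it.

```c
#include <stddef.h>
#include <stdint.h>

/* Lossy narrowing, mirroring Word.And (ORD (buf[i]), 16_ff):
   only the low 8 bits of each wide value survive. */
void narrow_masked(const uint32_t *src, size_t n, unsigned char *dst)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (unsigned char)(src[i] & 0xFF);
}

/* Checked narrowing (hypothetical alternative): returns 0 on success.
   On the first value above 0xFF it returns -1 and, if bad is non-NULL,
   stores that value's index in *bad. */
int narrow_checked(const uint32_t *src, size_t n,
                   unsigned char *dst, size_t *bad)
{
    for (size_t i = 0; i < n; i++) {
        if (src[i] > 0xFF) {
            if (bad != NULL) *bad = i;
            return -1;
        }
        dst[i] = (unsigned char)src[i];
    }
    return 0;
}
```

Under masking, U+0161 (š) becomes 0x61, the letter `a`, and U+0107 (ć) becomes the control byte 0x07, so the result is valid-looking but wrong text; the checked variant makes the data loss visible to the caller.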

From jay.krell at cornell.edu  Tue Jun 26 16:28:46 2012
From: jay.krell at cornell.edu (Jay K)
Date: Tue, 26 Jun 2012 14:28:46 +0000
Subject: [M3devel] AND (., 16_ff). Not serious - or so I hope!
In-Reply-To: <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org>
References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com>, <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org>
Message-ID:

 > 128 limit

I haven't read the code enough yet to verify that but you are probably right

 > ignoring everything over 16_FF

Probably that is the responsibility/claim of the caller of GetChars.
If you want to be correct in the face of non-ASCII, you are probably obligated to call GetWideChars.
Perhaps raising an exception would be reasonable to signal the loss of data. Or something.
There is HasWideChars for you to check.

There is no encoding implied, remember.
This isn't UTF8 data.

 - Jay

________________________________
> From: dragisha at m3w.org
> Date: Tue, 26 Jun 2012 14:27:00 +0200
> To: dabenavidesd at yahoo.es
> CC: m3devel at elegosoft.com
> Subject: Re: [M3devel] AND (., 16_ff). Not serious - or so I hope!
>
> If you cared to read, for example, Text.i3, you would see this is exactly what the cm3 people meant it to be.
>
> On Jun 26, 2012, at 2:12 PM, Daniel Alejandro Benavides D. wrote:
>
> Hi all:
> Maybe it is a leftover of older code (almost just used a decade ago), but if not, then was this meant to be just a partial implementation? If we are to get serious about memory usage, it seems over-strict (or just in case you don't need system NIL-terminated widechars to be checked?).
> Thanks in advance
>
> --- On Tue, 26/6/12, Dragiša Durić wrote:
>
> From: Dragiša Durić
> Subject: [M3devel] AND (., 16_ff). Not serious - or so I hope!
> To: "m3devel"
> Date: Tuesday, June 26, 2012 05:18
>
> This piece of code, from TextClass.m3, disturbs me. a lot.
>
> If we are to use WIDECHAR, I think we must be a lot more serious than this.
>
> Probably, text pieces are limited to 128 bytes by design, somewhere. But - whose idea was it to "narrow" by ignoring everything except the 8 LSBs? By mapping a set of 2^20 elements to a set of 2^8 elements.
>
> Probably by someone whose mother tongue is fully writeable with ASCII :).
>
> ====
> PROCEDURE GetChars (t: TEXT;  VAR a: ARRAY OF CHAR;  start: CARDINAL) =
>   VAR
>     info : Info;
>     cnt  : INTEGER;
>     next : CARDINAL := 0;
>     buf  : ARRAY [0..127] OF WIDECHAR;
>   BEGIN
>     t.get_info (info);
>     cnt := MIN (NUMBER (a), info.length - start);
>     WHILE (cnt > 0) DO
>       t.get_wide_chars (buf, start);
>       FOR i := FIRST (buf) TO LAST (buf) DO
>         IF (cnt = 0) THEN RETURN END;
>         a[next] := VAL (Word.And (ORD (buf[i]), 16_ff), CHAR);
>         INC (next);  DEC (cnt);
>       END;
>       INC (start, NUMBER (buf));
>     END;
>   END GetChars;
> ====

From dragisha at m3w.org  Tue Jun 26 17:14:06 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Tue, 26 Jun 2012 17:14:06 +0200
Subject: [M3devel] AND (., 16_ff). Not serious - or so I hope!
In-Reply-To:
References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com>, <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org>
Message-ID: <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org>

On Jun 26, 2012, at 4:28 PM, Jay K wrote:

> > 128 limit
>
> I haven't read the code enough yet to verify that but you are probably right

I was not right :), that call is incremental.

>
> > ignoring everything over 16_FF
>
> Probably that is the responsibility/claim of the caller of GetChars.
> If you want to be correct in the face of non-ASCII, you are probably obligated to call GetWideChars.
> Perhaps raising an exception would be reasonable to signal the loss of data. Or something.
> There is HasWideChars for you to check.
>
> There is no encoding implied, remember.
> This isn't UTF8 data.

It is not, but probably the only way to solve this without an exception is to make UTF8 the "official" 8bit encoding :)

From hendrik at topoi.pooq.com  Tue Jun 26 18:00:05 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Tue, 26 Jun 2012 12:00:05 -0400
Subject: [M3devel] Windows, Unicode file names
In-Reply-To:
References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com> <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org> <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org> <20120625203422.GA24287@topoi.pooq.com>
Message-ID: <20120626160005.GA29355@topoi.pooq.com>

On Mon, Jun 25, 2012 at 08:46:18PM +0000, Jay K wrote:
>
> Somewhat but not fully. Text.Length should fetch a stored length. As
> I'm sure it already does. That length should always be correctly
> maintained. Same as today. Adding one extra nul at the end doesn't
> invalidate the data. std::string has the same properties -- c_str() can
> on-demand append a terminal nul, but there could also be one in the
> string itself. I understand it is a bit weird. Maintaining a terminal
> nul does add cost that might be wasted. And reduces the capacity by
> one. It could be on-demand, I guess. - Jay

Don't need the 'on demand'. For the benefits of C interoperability, the
extra byte is well worth the price. What I'm worrying about is someone
using an embedded NUL as an end-of-string marker. I smell more bugs
creeping in. But I guess bugs are inherent in C use, so I'm not
surprised seeing them in C interoperation.

-- hendrik

From jay.krell at cornell.edu  Tue Jun 26 18:34:01 2012
From: jay.krell at cornell.edu (Jay)
Date: Tue, 26 Jun 2012 09:34:01 -0700
Subject: [M3devel] AND (., 16_ff). Not serious - or so I hope!
In-Reply-To: <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org>
References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com> <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org> <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org>
Message-ID:

>> > 128 limit
>>
>> I haven't read the code enough yet to verify that but you are probably right
>
> I was not right :), that call is incremental.

I looked for that aspect too but missed it. :(

>> > ignoring everything over 16_FF
>>
>> Probably that is the responsibility/claim of the caller of GetChars.
>> If you want to be correct in the face of non-ASCII, you are probably obligated to call GetWideChars.
>> Perhaps raising an exception would be reasonable to signal the loss of data. Or something.
>> There is HasWideChars for you to check.
>
>>
>> There is no encoding implied, remember.
>> This isn't UTF8 data.
>
> It is not, but probably the only way to solve this without an exception is to make UTF8 the "official" 8bit encoding :)
>

I'm torn on that. We'd have to consider ramifications like Text.Length vs buffer size requirements/expectations.

Is TEXT & its use abstracted enough to have been widened? Should we put it back and introduce WIDETEXT? That is essentially what C and C++ do. They are inconvenient for existing code but simple, predictable, and make sense. Contrast with weird hybrid systems like Perl & Python, for which I just can't get through the documentation and understand and predict how they work.

Java is in-between but also simple & predictable -- there being no narrow option other than array of byte, which is reasonable.

 - Jay

From dragisha at m3w.org  Tue Jun 26 18:46:07 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Tue, 26 Jun 2012 18:46:07 +0200
Subject: [M3devel] AND (., 16_ff). Not serious - or so I hope!
In-Reply-To:
References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com> <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org> <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org>
Message-ID: <0A373E4C-32CB-4947-A542-67F0910DBEC4@m3w.org>

You had an idea in the other message. Store the length!

Another idea - store a partial list of indices to character locations. So whatever one does, that list can be used/expanded. Whatever storage issues this makes, they are probably minor compared to the 32bit-WIDECHAR-for-all idea.

Mika had performance problems with cm3 TEXT. I hope he follows and cares to refresh us on those issues?!

On Jun 26, 2012, at 6:34 PM, Jay wrote:

> I'm torn on that. We'd have to consider ramifications like Text.Length vs buffer size requirements/expectations.
>
> Is TEXT & its use abstracted enough to have been widened? Should we put it back and introduce WIDETEXT? That is essentially what C and C++ do. They are inconvenient for existing code but simple, predictable, and make sense. Contrast with weird hybrid systems like Perl & Python, for which I just can't get through the documentation and understand and predict how they work.
>

From dabenavidesd at yahoo.es  Tue Jun 26 18:51:00 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Tue, 26 Jun 2012 17:51:00 +0100 (BST)
Subject: [M3devel] AND (., 16_ff). Not serious - or so I hope!
In-Reply-To: <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org>
Message-ID: <1340729460.40972.YahooMailClassic@web29701.mail.ird.yahoo.com>

Hi all:
it would be so much greater fun to time/verify and correct than to use by hand. Let's do it sooner rather than later.
Thanks in advance

--- On Tue, 26/6/12, Dragiša Durić wrote:

From: Dragiša Durić
Subject: Re: [M3devel] AND (., 16_ff). Not serious - or so I hope!
To: "Jay K"
CC: dabenavidesd at yahoo.es, "m3devel"
Date: Tuesday, June 26, 2012 10:14

On Jun 26, 2012, at 4:28 PM, Jay K wrote:

> 128 limit

I haven't read the code enough yet to verify that but you are probably right

I was not right :), that call is incremental.

> ignoring everything over 16_FF

Probably that is the responsibility/claim of the caller of GetChars.
If you want to be correct in the face of non-ASCII, you are probably obligated to call GetWideChars.
Perhaps raising an exception would be reasonable to signal the loss of data. Or something.
There is HasWideChars for you to check.

There is no encoding implied, remember.
This isn't UTF8 data.

It is not, but probably the only way to solve this without an exception is to make UTF8 the "official" 8bit encoding :)

From dragisha at m3w.org  Tue Jun 26 19:01:42 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Tue, 26 Jun 2012 19:01:42 +0200
Subject: [M3devel] AND (., 16_ff). Not serious - or so I hope!
In-Reply-To: <0A373E4C-32CB-4947-A542-67F0910DBEC4@m3w.org>
References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com> <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org> <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org> <0A373E4C-32CB-4947-A542-67F0910DBEC4@m3w.org>
Message-ID: <117F1599-4A24-462F-9462-7CC756BB7E4B@m3w.org>

As for input encoding… Benjamin Kowarsch (of the M2R10 project) solved this with pragmas. There is a good idea there on how to instruct the parser about the text encoding used for source code (meaning also the encoding used for string literals). As it depends on locale settings, it is important to let the compiler know how to parse the source.

Of course, Unicode string literals will be stored as UTF8 strings after parsing.

On Jun 26, 2012, at 6:46 PM, Dragiša Durić wrote:

> You had an idea in the other message. Store the length!
>
> Another idea - store a partial list of indices to character locations.
> So whatever one does, that list can be used/expanded. Whatever storage issues this makes, they are probably minor compared to the 32bit-WIDECHAR-for-all idea.
>
> Mika had performance problems with cm3 TEXT. I hope he follows and cares to refresh us on those issues?!
>
> On Jun 26, 2012, at 6:34 PM, Jay wrote:
>
>> I'm torn on that. We'd have to consider ramifications like Text.Length vs buffer size requirements/expectations.
>>
>> Is TEXT & its use abstracted enough to have been widened? Should we put it back and introduce WIDETEXT? That is essentially what C and C++ do. They are inconvenient for existing code but simple, predictable, and make sense. Contrast with weird hybrid systems like Perl & Python, for which I just can't get through the documentation and understand and predict how they work.
>>
>

From hendrik at topoi.pooq.com  Tue Jun 26 20:19:55 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Tue, 26 Jun 2012 14:19:55 -0400
Subject: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
In-Reply-To: <989FB282-786A-447D-8CA9-2432AE922DBF@m3w.org>
References: <989FB282-786A-447D-8CA9-2432AE922DBF@m3w.org>
Message-ID: <20120626181955.GB29355@topoi.pooq.com>

On Tue, Jun 26, 2012 at 12:18:41PM +0200, Dragiša Durić wrote:
> This piece of code, from TextClass.m3, disturbs me… a lot.
>
> If we are to use WIDECHAR, I think we must be a lot more serious than this.
>
> Probably, text pieces are limited to 128 bytes by design, somewhere. But - whose idea was it to "narrow" by ignoring everything except the 8 LSBs? By mapping a set of 2^20 elements to a set of 2^8 elements.
>
> Probably by someone whose mother tongue is fully writeable with ASCII :).

I'm told the Japanese hate UTF-8, because it expands their characters from two bytes to three.
-- hendrik

From mika at async.caltech.edu Tue Jun 26 20:50:08 2012
From: mika at async.caltech.edu (Mika Nystrom)
Date: Tue, 26 Jun 2012 11:50:08 -0700
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <20120626160005.GA29355@topoi.pooq.com>
References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com> <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org> <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org> <20120625203422.GA24287@topoi.pooq.com> <20120626160005.GA29355@topoi.pooq.com>
Message-ID: <20120626185008.50E131A205B@async.async.caltech.edu>

As far as I know, SRC M3 and PM3 come with a TEXT implementation that works exactly as described below. An extra byte is used at the end with a character VAL(0,CHAR). The Texts are simply arrays of 8-bit characters.

One of the big advantages of the old version is that Text.Hash is really, really fast. Especially on Alphas... it's hugely more expensive to have hash tables (i.e., Modula-3 generic Tables) keyed on Texts under CM3 than under the old compilers and runtimes. We're talking a factor of five or so in speed since the Table routines are generally entirely dominated by Text.Hash.

    Mika

Hendrik Boom writes:
>On Mon, Jun 25, 2012 at 08:46:18PM +0000, Jay K wrote:
>>
>> Somewhat but not fully. Text.Length should fetch a stored length. As
>> I'm sure it already does. That length should always be correctly
>> maintained. Same as today. Adding one extra nul at the end doesn't
>> invalidate the data. std::string has the same properties -- c_str() can
>> on-demand append a terminal nul, but there could also be one in the
>> string itself. I understand it is a bit weird. Maintaining a terminal
>> nul does add cost that might be wasted. And reduces the capacity by
>> one. It could be on-demand, I guess. - Jay
>
>Don't need the 'on demand'. For the benefits of C interoperability, the
>extra byte is well worth the price. What I'm worrying about is someone
>using an embedded NUL as an end-of-string marker.
I smell more bugs
>creeping in. But I guess bugs are inherent in C use, so I'm not
>surprised seeing them in C interoperation.
>
>-- hendrik

From mika at async.caltech.edu Tue Jun 26 20:52:21 2012
From: mika at async.caltech.edu (Mika Nystrom)
Date: Tue, 26 Jun 2012 11:52:21 -0700
Subject: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
In-Reply-To: <0A373E4C-32CB-4947-A542-67F0910DBEC4@m3w.org>
References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com> <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org> <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org> <0A373E4C-32CB-4947-A542-67F0910DBEC4@m3w.org>
Message-ID: <20120626185221.24C8B1A205B@async.async.caltech.edu>

Dragiša Durić writes:
>
>You had idea in other message. Store length!
>
>Another idea - store partial list of indices to character locations. So
>whatever one does, that list can be used/expanded. Whatever storage
>issues this makes, they are probably minor as compared to 32bit WIDECHAR
>for all idea.
>
>Mika had performance problems with cm3 TEXT. I hope he follows and cares
>to refresh us on those issues?!

Apart from the hash table issue I mentioned there were horrible performance issues when concatenating in particular ways, but I think that's been solved now. I don't think anyone has looked at Text.Hash very closely.

    Mika

From dmuysers at hotmail.com Tue Jun 26 21:38:16 2012
From: dmuysers at hotmail.com (Dirk Muysers)
Date: Tue, 26 Jun 2012 21:38:16 +0200
Subject: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
In-Reply-To: <20120626181955.GB29355@topoi.pooq.com>
References: <989FB282-786A-447D-8CA9-2432AE922DBF@m3w.org> <20120626181955.GB29355@topoi.pooq.com>
Message-ID:

So let them hate it. Memory is not a problem anymore.
--------------------------------------------------
From: "Hendrik Boom"
Sent: Tuesday, June 26, 2012 8:19 PM
To:
Subject: Re: [M3devel]AND (…, 16_ff)… Not serious - or so I hope!

> On Tue, Jun 26, 2012 at 12:18:41PM +0200, Dragiša Durić wrote:
>> This piece of code, from TextClass.m3, disturbs me… a lot.
>>
>> If we are to use WIDECHAR, I think we must be a lot more serious than
>> this.
>>
>> Probably, text pieces are limited to 128 bytes by design, somewhere.
>> But - whose idea was to "narrow" by ignoring everything except 8 LSB's?
>> By mapping set of 2^20 elements to set of 2^8 elements.
>>
>> Probably by someone whose mother tongue is fully writeable with ASCII :).
>
> I'm told the Japanese hate UTF-8, because it expands their characters
> from two bytes to three.
>
> -- hendrik
>

From dragisha at m3w.org Tue Jun 26 21:53:18 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Tue, 26 Jun 2012 21:53:18 +0200
Subject: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
In-Reply-To:
References: <989FB282-786A-447D-8CA9-2432AE922DBF@m3w.org> <20120626181955.GB29355@topoi.pooq.com>
Message-ID: <5D286A2C-CA1A-4F8B-846A-CDCBDACA2661@m3w.org>

Also… If we add length info to TEXT fragments, we might as well add encoding info :).

So, most TEXT fragments in memory will use the same (system default) encoding, but there will also be a way to mix them and to convert them to the system default or to whatever some API (like Win32) requires.

On Jun 26, 2012, at 9:38 PM, Dirk Muysers wrote:

> So let them hate it. Memory is not a problem anymore.
>
> --------------------------------------------------
> From: "Hendrik Boom"
> Sent: Tuesday, June 26, 2012 8:19 PM
> To:
> Subject: Re: [M3devel]AND (…, 16_ff)… Not serious - or so I hope!
>
>> On Tue, Jun 26, 2012 at 12:18:41PM +0200, Dragiša Durić wrote:
>>> This piece of code, from TextClass.m3, disturbs me… a lot.
>>> >>> If we are to use WIDECHAR, I think we must be a lot more serious than this. >>> >>> Probably, text pieces are limited to 128 bytes by design, somewhere. But - whose idea was to "narrow" by ignoring everything except 8 LSB's? By mapping set of 2^20 elements to set of 2^8 elements. >>> >>> Probably by someone whose mother tongue is fully writeable with ASCII :). >> >> I'm told the Japanese hate UTF-8, because it expands their characters >> from two bytes to three. >> >> -- hendrik From rcolebur at SCIRES.COM Tue Jun 26 22:22:22 2012 From: rcolebur at SCIRES.COM (Coleburn, Randy) Date: Tue, 26 Jun 2012 16:22:22 -0400 Subject: [M3devel] EXT Re: AND (., 16_ff). Not serious - or so I hope! In-Reply-To: <0A373E4C-32CB-4947-A542-67F0910DBEC4@m3w.org> References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com> <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org> <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org> <0A373E4C-32CB-4947-A542-67F0910DBEC4@m3w.org> Message-ID: I seem to recall that Rodney did some work a while back relating to TEXT. Rodney, can you weigh in on some of this? --Randy Coleburn From: Dragi?a Duri? [mailto:dragisha at m3w.org] Sent: Tuesday, June 26, 2012 12:46 PM To: Jay Cc: m3devel Subject: EXT Re: [M3devel] AND (., 16_ff). Not serious - or so I hope! You had idea in other message. Store length! Another idea - store partial list of indices to character locations. So whatever one does, that list can be used/expanded. Whatever storage issues this makes, they are probably minor as compared to 32bit WIDECHAR for all idea. Mika had performance problems with cm3 TEXT. I hope he follows and cares to refresh us on those issues?! On Jun 26, 2012, at 6:34 PM, Jay wrote: I'm torn on that. We'd have to consider ramifications like Text.Length vs buffer size requirements/expectations. Is TEXT & its use abstracted enough to have been widened? Should we put it back and introduce WIDETEXT? That is essentially what C and C++ do. 
They are inconvenient for existing code but simple predictable make sense. Contrast with weird hybrid systems like Perl & Python for which I just can't get through the documentation and understand and predict how they work.. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rodney_bates at lcwb.coop Tue Jun 26 23:42:02 2012 From: rodney_bates at lcwb.coop (Rodney M. Bates) Date: Tue, 26 Jun 2012 16:42:02 -0500 Subject: [M3devel] EXT Re: AND (., 16_ff). Not serious - or so I hope! In-Reply-To: References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com> <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org> <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org> <0A373E4C-32CB-4947-A542-67F0910DBEC4@m3w.org> Message-ID: <4FEA2CAA.4010306@lcwb.coop> On 06/26/2012 03:22 PM, Coleburn, Randy wrote: > I seem to recall that Rodney did some work a while back relating to TEXT. > > Rodney, can you weigh in on some of this? > I wrote a modified implementation of cm3 TEXT. It uses the same data structure and invariants, so any internal values it creates are useable by any existing code that imports the various revelations. It improves performance problems deriving from Cat operations' building trees that actually degenerate into linear lists (fairly likely, as it happens whenever a string is constructed by a left-to-right or right-to-left series of concatenations.) As usual, some operations on some values are slower, but it seems to be a gain overall. I have an extensive test driver and statistics gatherer, which shows good results. However, only tested it on LINUXLIBC6 and AMD64_LINUX, machines I have. Olaf was not comfortable that it was fully tested this way, and I have never taken the time to figure out how to run tests on targets I don't have. I think it is a significant improvement over the stock cm3 TEXT implementation. Whether it is as good as just going back to the pm3 implementation is not so clear. 
All three implementations correctly (except for possible bugs--none known at present, AFAIK) implement the language's abstract Text interface, and code that only uses Text would see only performance differences. > --Randy Coleburn > > *From:*Dragi?a Duri? [mailto:dragisha at m3w.org] > *Sent:* Tuesday, June 26, 2012 12:46 PM > *To:* Jay > *Cc:* m3devel > *Subject:* EXT Re: [M3devel] AND (., 16_ff). Not serious - or so I hope! > > You had idea in other message. Store length! > > Another idea - store partial list of indices to character locations. So whatever one does, that list can be used/expanded. Whatever storage issues this makes, they are probably minor as compared to 32bit WIDECHAR for all idea. > > Mika had performance problems with cm3 TEXT. I hope he follows and cares to refresh us on those issues?! > > On Jun 26, 2012, at 6:34 PM, Jay wrote: > > > > I'm torn on that. We'd have to consider ramifications like Text.Length vs buffer size requirements/expectations. > > Is TEXT & its use abstracted enough to have been widened? Should we put it back and introduce WIDETEXT? That is essentially what C and C++ do. They are inconvenient for existing code but simple predictable make sense. Contrast with weird hybrid systems like Perl & Python for which I just can't get through the documentation and understand and predict how they work.. > From hendrik at topoi.pooq.com Wed Jun 27 00:16:39 2012 From: hendrik at topoi.pooq.com (Hendrik Boom) Date: Tue, 26 Jun 2012 18:16:39 -0400 Subject: [M3devel] TEXT In-Reply-To: <5D286A2C-CA1A-4F8B-846A-CDCBDACA2661@m3w.org> References: <989FB282-786A-447D-8CA9-2432AE922DBF@m3w.org> <20120626181955.GB29355@topoi.pooq.com> <5D286A2C-CA1A-4F8B-846A-CDCBDACA2661@m3w.org> Message-ID: <20120626221639.GA28021@topoi.pooq.com> On Tue, Jun 26, 2012 at 09:53:18PM +0200, Dragi?a Duri? wrote: > Also? If we add length info to TEXT fragments, we might as well add encoding info :). 
We could do that by letting TEXT have subtypes, depending on the encoding. -- hendrik From rcolebur at SCIRES.COM Wed Jun 27 01:44:26 2012 From: rcolebur at SCIRES.COM (Coleburn, Randy) Date: Tue, 26 Jun 2012 19:44:26 -0400 Subject: [M3devel] EXT Re: EXT Re: AND (., 16_ff). Not serious - or so I hope! In-Reply-To: <4FEA2CAA.4010306@lcwb.coop> References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com> <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org> <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org> <0A373E4C-32CB-4947-A542-67F0910DBEC4@m3w.org> <4FEA2CAA.4010306@lcwb.coop> Message-ID: I am willing to run tests on platforms that I have, mostly Windows flavors. --Randy Coleburn -----Original Message----- From: Rodney M. Bates [mailto:rodney_bates at lcwb.coop] Sent: Tuesday, June 26, 2012 5:42 PM To: m3devel at elegosoft.com Subject: EXT Re: [M3devel] EXT Re: AND (., 16_ff). Not serious - or so I hope! On 06/26/2012 03:22 PM, Coleburn, Randy wrote: > I seem to recall that Rodney did some work a while back relating to TEXT. > > Rodney, can you weigh in on some of this? > I wrote a modified implementation of cm3 TEXT. It uses the same data structure and invariants, so any internal values it creates are useable by any existing code that imports the various revelations. It improves performance problems deriving from Cat operations' building trees that actually degenerate into linear lists (fairly likely, as it happens whenever a string is constructed by a left-to-right or right-to-left series of concatenations.) As usual, some operations on some values are slower, but it seems to be a gain overall. I have an extensive test driver and statistics gatherer, which shows good results. However, only tested it on LINUXLIBC6 and AMD64_LINUX, machines I have. Olaf was not comfortable that it was fully tested this way, and I have never taken the time to figure out how to run tests on targets I don't have. 
I think it is a significant improvement over the stock cm3 TEXT implementation. Whether it is as good as just going back to the pm3 implementation is not so clear. All three implementations correctly (except for possible bugs--none known at present, AFAIK) implement the language's abstract Text interface, and code that only uses Text would see only performance differences. > --Randy Coleburn > > *From:*Dragi?a Duri? [mailto:dragisha at m3w.org] > *Sent:* Tuesday, June 26, 2012 12:46 PM > *To:* Jay > *Cc:* m3devel > *Subject:* EXT Re: [M3devel] AND (., 16_ff). Not serious - or so I hope! > > You had idea in other message. Store length! > > Another idea - store partial list of indices to character locations. So whatever one does, that list can be used/expanded. Whatever storage issues this makes, they are probably minor as compared to 32bit WIDECHAR for all idea. > > Mika had performance problems with cm3 TEXT. I hope he follows and cares to refresh us on those issues?! > > On Jun 26, 2012, at 6:34 PM, Jay wrote: > > > > I'm torn on that. We'd have to consider ramifications like Text.Length vs buffer size requirements/expectations. > > Is TEXT & its use abstracted enough to have been widened? Should we put it back and introduce WIDETEXT? That is essentially what C and C++ do. They are inconvenient for existing code but simple predictable make sense. Contrast with weird hybrid systems like Perl & Python for which I just can't get through the documentation and understand and predict how they work.. > From dabenavidesd at yahoo.es Wed Jun 27 03:41:33 2012 From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.) Date: Wed, 27 Jun 2012 02:41:33 +0100 (BST) Subject: [M3devel] EXT Re: EXT Re: AND (., 16_ff). Not serious - or so I hope! 
In-Reply-To: Message-ID: <1340761293.52332.YahooMailClassic@web29704.mail.ird.yahoo.com> Hi all: even if we have non-faulty implementation the problem remains the same, the coding standard is non-uniformly used, but instead use old TEXT with C cross-compiled version seemed the way to win at least in Win flavors. I give that point to Jay, he is absolutely right,? I fear that if we don't do this correctly, we could loss in C compiler intrinsics. Perhaps before all this work continues we need to port this better we won't do real big advances more quickly. Thanks in advance --- El mar, 26/6/12, Coleburn, Randy escribi?: De: Coleburn, Randy Asunto: Re: [M3devel] EXT Re: EXT Re: AND (., 16_ff). Not serious - or so I hope! Para: "m3devel at elegosoft.com" Fecha: martes, 26 de junio, 2012 18:44 I am willing to run tests on platforms that I have, mostly Windows flavors. --Randy Coleburn -----Original Message----- From: Rodney M. Bates [mailto:rodney_bates at lcwb.coop] Sent: Tuesday, June 26, 2012 5:42 PM To: m3devel at elegosoft.com Subject: EXT Re: [M3devel] EXT Re: AND (., 16_ff). Not serious - or so I hope! On 06/26/2012 03:22 PM, Coleburn, Randy wrote: > I seem to recall that Rodney did some work a while back relating to TEXT. > > Rodney, can you weigh in on some of this? > I wrote a modified implementation of cm3 TEXT.? It uses the same data structure and invariants, so any internal values it creates are useable by any existing code that imports the various revelations.? It improves performance problems deriving from Cat operations' building trees that actually degenerate into linear lists (fairly likely, as it happens whenever a string is constructed by a left-to-right or right-to-left series of concatenations.)? As usual, some operations on some values are slower, but it seems to be a gain overall. I have an extensive test driver and statistics gatherer, which shows good results.? However, only tested it on LINUXLIBC6 and AMD64_LINUX, machines I have.? 
Olaf was not comfortable that it was fully tested this way, and I have never taken the time to figure out how to run tests on targets I don't have. I think it is a significant improvement over the stock cm3 TEXT implementation. Whether it is as good as just going back to the pm3 implementation is not so clear. All three implementations correctly (except for possible bugs--none known at present, AFAIK) implement the language's abstract Text interface, and code that only uses Text would see only performance differences. > --Randy Coleburn > > *From:*Dragi?a Duri? [mailto:dragisha at m3w.org] > *Sent:* Tuesday, June 26, 2012 12:46 PM > *To:* Jay > *Cc:* m3devel > *Subject:* EXT Re: [M3devel] AND (., 16_ff). Not serious - or so I hope! > > You had idea in other message. Store length! > > Another idea - store partial list of indices to character locations. So whatever one does, that list can be used/expanded. Whatever storage issues this makes, they are probably minor as compared to 32bit WIDECHAR for all idea. > > Mika had performance problems with cm3 TEXT. I hope he follows and cares to refresh us on those issues?! > > On Jun 26, 2012, at 6:34 PM, Jay wrote: > > > > I'm torn on that. We'd have to consider ramifications like Text.Length vs buffer size requirements/expectations. > > Is TEXT & its use abstracted enough to have been widened? Should we put it back and introduce WIDETEXT? That is essentially what C and C++ do. They are inconvenient for existing code but simple predictable make sense. Contrast with weird hybrid systems like Perl & Python for which I just can't get through the documentation and understand and predict how they work.. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dabenavidesd at yahoo.es Wed Jun 27 03:54:31 2012 From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.) 
Date: Wed, 27 Jun 2012 02:54:31 +0100 (BST) Subject: [M3devel] TEXT In-Reply-To: <20120626221639.GA28021@topoi.pooq.com> Message-ID: <1340762071.63111.YahooMailClassic@web29704.mail.ird.yahoo.com> Hi all: I don't know, but if this would coexist with everything (e.g C) hard to know is whether this will affect the overall performance, sometimes this is like that (for instance CM3 Text), but perhaps if it's just costs in memory then I wish it were like that. Thanks in advance --- El mar, 26/6/12, Hendrik Boom escribi?: De: Hendrik Boom Asunto: Re: [M3devel] TEXT Para: m3devel at elegosoft.com Fecha: martes, 26 de junio, 2012 17:16 On Tue, Jun 26, 2012 at 09:53:18PM +0200, Dragi?a Duri? wrote: > Also? If we add length info to TEXT fragments, we might as well add encoding info :). We could do that by letting TEXT have subtypes, depending on the encoding. -- hendrik -------------- next part -------------- An HTML attachment was scrubbed... URL: From mika at async.caltech.edu Wed Jun 27 03:54:57 2012 From: mika at async.caltech.edu (Mika Nystrom) Date: Tue, 26 Jun 2012 18:54:57 -0700 Subject: [M3devel] =?utf-8?b?QU5EICjigKYsIDE2X2ZmKeKApiBOb3Qgc2VyaW91cyAt?= =?utf-8?q?_or_so_I_hope!?= In-Reply-To: References: <989FB282-786A-447D-8CA9-2432AE922DBF@m3w.org> <20120626181955.GB29355@topoi.pooq.com> Message-ID: <20120627015457.238041A205B@async.async.caltech.edu> Memory is always potentially a problem!!!! One of the main reasons my group was slow at switching from PM3 to CM3 was because we were processing node names for chip designs as TEXTs. Chip designs tend to be deeply hierarchical and you wind up printing a lot of strings such as a.b.c.d.e.f.g.h to files. That's when you run into problems with Text.Cat. And memory will always be a problem since you are always designing the next generation of computers with the current generation of computers. Also even if memory weren't a problem, speed is always a problem, and speed isn't entirely unrelated to memory. 
The Text.Hash I was alluding to earlier hashes eight characters per iteration on a 64-bit machine, as long as characters are 8 bits... If you go to 16 bits it'll take at least twice as long. Furthermore, if there is more than one way (bit pattern) to represent a single CHAR, it becomes difficult to use algorithms that take more than one at a time.

    Mika

"Dirk Muysers" writes:
>So let them hate it. Memory is not a problem anymore.
>
>--------------------------------------------------
>From: "Hendrik Boom"
>Sent: Tuesday, June 26, 2012 8:19 PM
>To:
>Subject: Re: [M3devel]AND (…, 16_ff)… Not serious - or so I hope!
>
>> On Tue, Jun 26, 2012 at 12:18:41PM +0200, Dragiša Durić wrote:
>>> This piece of code, from TextClass.m3, disturbs me… a lot.
>>>
>>> If we are to use WIDECHAR, I think we must be a lot more serious than
>>> this.
>>>
>>> Probably, text pieces are limited to 128 bytes by design, somewhere.
>>> But - whose idea was to "narrow" by ignoring everything except 8 LSB's?
>>> By mapping set of 2^20 elements to set of 2^8 elements.
>>>
>>> Probably by someone whose mother tongue is fully writeable with ASCII :).
>>
>> I'm told the Japanese hate UTF-8, because it expands their characters
>> from two bytes to three.
>>
>> -- hendrik
>>

From dabenavidesd at yahoo.es Wed Jun 27 04:18:53 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Wed, 27 Jun 2012 03:18:53 +0100 (BST)
Subject: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
In-Reply-To: <20120627015457.238041A205B@async.async.caltech.edu>
Message-ID: <1340763533.57548.YahooMailClassic@web29706.mail.ird.yahoo.com>

Hi all:
Well, you have created a chicken-and-egg problem/opportunity: you can create your theory and move forward. I guess history has shown that it is not big chunks of memory that make programs execute faster, but distributed machines with less memory.
The problem is I could set a theory that explains how computers could actually evolve and so on, based on family of computers, you know like Stack Computers, and it turns out that in reality it doesn't work like that, and by the reality it's not true (also I don't consider the "reality" to be that, I don't think tablets and stuff will be takers of tomorrow as today, it's very very useful, as were Micros in their time but can't come back and do that again, Micros are gone). I don't think or hate devices or people who uses it (perhaps I'm old for that) but this things are mostly used to send messages to set up quickly a web page (an every day task which I still consider for talented people) Frankly we can say many thing sin theory again but just good people and companies can make a standard way of doing things. Thanks in advance --- El mar, 26/6/12, Mika Nystrom escribi?: De: Mika Nystrom Asunto: Re: [M3devel] AND (?, 16_ff)? Not serious - or so I hope! Para: "Dirk Muysers" CC: m3devel at elegosoft.com Fecha: martes, 26 de junio, 2012 20:54 Memory is always potentially a problem!!!! One of the main reasons my group was slow at switching from PM3 to CM3 was because we were processing node names for chip designs as TEXTs. Chip designs tend to be deeply hierarchical and you wind up printing a lot of strings such as a.b.c.d.e.f.g.h to files. That's when you run into problems with Text.Cat. And memory will always be a problem since you are always designing the next generation of computers with the current generation of computers. Also even if memory weren't a problem, speed is always a problem, and speed isn't entirely unrelated to memory.? The Text.Hash I was alluding to earlier hashes eight characters per iteration on a 64-bit machine, as long as characters are 8 bits...? If you go to 16 bits it'll take at least twice as long.? 
Furthermore if there is more than one way (bit pattern) to represent a single CHAR it becomes difficult to use algorithms that take more than one at a time. ? ? Mika "Dirk Muysers" writes: >So let them hate it. Memory is not a problem anymore. > >-------------------------------------------------- >From: "Hendrik Boom" >Sent: Tuesday, June 26, 2012 8:19 PM >To: >Subject: Re: [M3devel]AND (?, 16_ff)? Not serious - or so I hope! > >> On Tue, Jun 26, 2012 at 12:18:41PM +0200, Dragi?a Duri? wrote: >>> This piece of code, from TextClass.m3, disturbs me? a lot. >>> >>> If we are to use WIDECHAR, I think we must be a lot more serious than >>> this. >>> >>> Probably, text pieces are limited to 128 bytes by design, somewhere. >>> But - whose idea was to "narrow" by ignoring everything except 8 LSB's? >>> By mapping set of 2^20 elements to set of 2^8 elements. >>> >>> Probably by someone whose mother tongue is fully writeable with ASCII :). >> >> I'm told the Japanese hate UTF-8, because it expands? their characters >> from two bytes to three. >> >> -- hendrik >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From hendrik at topoi.pooq.com Wed Jun 27 05:30:01 2012 From: hendrik at topoi.pooq.com (Hendrik Boom) Date: Tue, 26 Jun 2012 23:30:01 -0400 Subject: [M3devel] UTF-8 TEXT In-Reply-To: References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com> <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org> <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org> <0A373E4C-32CB-4947-A542-67F0910DBEC4@m3w.org> Message-ID: <20120627033000.GB28021@topoi.pooq.com> On Tue, Jun 26, 2012 at 04:22:22PM -0400, Coleburn, Randy wrote: > I seem to recall that Rodney did some work a while back relating to TEXT. > Rodney, can you weigh in on some of this? > --Randy Coleburn > > From: Dragi?a Duri? [mailto:dragisha at m3w.org] > Sent: Tuesday, June 26, 2012 12:46 PM > To: Jay > Cc: m3devel > Subject: EXT Re: [M3devel] AND (., 16_ff). Not serious - or so I hope! 
>
> You had idea in other message. Store length!
>
> Another idea - store partial list of indices to character locations. So whatever one does, that list can be used/expanded. Whatever storage issues this makes, they are probably minor as compared to 32bit WIDECHAR for all idea.

Most of the time, you don't need explicit integer indexes to character locations. What you do need is an operation that fetches a character given the string and its index (whatever data structure that index is), and one that increments the index past that character. As long as you can save an index and use it later on the same string, that's probably all you ever need.

And with a simple TEXT representation (such as the obvious array of bytes containing characters of various widths) a byte index is all you need (note: NOT a character index).

It's easy even to use TEXT and its integer indices as the data representation, as long as you use the proper functions to parse the characters and increment the indices by amounts that might differ from 1.

And if your source code is represented in UTF-8, the representation that requires little extra compiler effort to parse, your TEXT strings will automagically appear in UTF-8.

I can see a use for various wide characters -- the things you extract from a TEXT by parsing bits of it, but none for anything really new or complicated for wide TEXT.

The only confusing thing is that the existing operations for extracting bytes from TEXT have names that suggest they are extracting characters.
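The fetch/advance pair described above can be sketched as follows. This is an illustration in Python over a `bytes` value standing in for a byte-array TEXT, not Modula-3 library code; the names `char_width`, `fetch` and `advance` are hypothetical, and the sketch assumes well-formed UTF-8 and an index that sits on the first byte of a character.

```python
def char_width(first_byte: int) -> int:
    """Length in bytes of the UTF-8 sequence that starts with first_byte."""
    if first_byte < 0x80:
        return 1
    if first_byte >> 5 == 0b110:
        return 2
    if first_byte >> 4 == 0b1110:
        return 3
    if first_byte >> 3 == 0b11110:
        return 4
    raise ValueError("not the first byte of a UTF-8 sequence")


def fetch(text: bytes, index: int) -> str:
    """Return the character that starts at byte offset index."""
    return text[index:index + char_width(text[index])].decode("utf-8")


def advance(text: bytes, index: int) -> int:
    """Return the byte offset just past the character at index."""
    return index + char_width(text[index])


text = "Durić 語".encode("utf-8")
index = 0
chars = []
while index < len(text):
    chars.append((index, fetch(text, index)))  # a saved index stays valid later
    index = advance(text, index)
print(chars)
```

The indices collected here are byte offsets, so they skip values in the middle of a multi-byte character; that is exactly the "NOT a character index" caveat.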
-- Hendrik

From dmuysers at hotmail.com Wed Jun 27 09:58:28 2012
From: dmuysers at hotmail.com (Dirk Muysers)
Date: Wed, 27 Jun 2012 09:58:28 +0200
Subject: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
In-Reply-To: <20120627015457.238041A205B@async.async.caltech.edu>
References: <989FB282-786A-447D-8CA9-2432AE922DBF@m3w.org> <20120626181955.GB29355@topoi.pooq.com> <20120627015457.238041A205B@async.async.caltech.edu>
Message-ID:

Some time ago I started to develop a Unicode library based on the old M3 text model but using UTF-8 internally rather than Latin-1 (see README attachment). For reasons best known to me I had to put it on the backburner in favour of more urgent work. If anybody is interested in furthering this solution I would eagerly give the existing (pre-alpha) code away.

This being said, there are certainly better hash algorithms than the one used by m3core (e.g. Goulburn, see http://www.clockandflame.com/media/Goulburn06.pdf).

--------------------------------------------------
From: "Mika Nystrom"
Sent: Wednesday, June 27, 2012 3:54 AM
To: "Dirk Muysers"
Cc:
Subject: Re: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!

> Memory is always potentially a problem!!!!
>
> One of the main reasons my group was slow at switching from PM3 to CM3
> was because we were processing node names for chip designs as TEXTs.
>
> Chip designs tend to be deeply hierarchical and you wind up printing a
> lot of strings such as
>
> a.b.c.d.e.f.g.h
>
> to files.
>
> That's when you run into problems with Text.Cat.
>
> And memory will always be a problem since you are always designing the
> next generation of computers with the current generation of computers.
>
> Also even if memory weren't a problem, speed is always a problem, and
> speed isn't entirely unrelated to memory. The Text.Hash I was alluding
> to earlier hashes eight characters per iteration on a 64-bit machine,
> as long as characters are 8 bits...
If you go to 16 bits it'll take > at least twice as long. Furthermore if there is more than one way > (bit pattern) to represent a single CHAR it becomes difficult to use > algorithms that take more than one at a time. > > Mika > > "Dirk Muysers" writes: >>So let them hate it. Memory is not a problem anymore. >> >>-------------------------------------------------- >>From: "Hendrik Boom" >>Sent: Tuesday, June 26, 2012 8:19 PM >>To: >>Subject: Re: [M3devel]AND (???, 16_ff)??? Not serious - or so I hope! >> >>> On Tue, Jun 26, 2012 at 12:18:41PM +0200, Dragi??a Duri?? wrote: >>>> This piece of code, from TextClass.m3, disturbs me??? a lot. >>>> >>>> If we are to use WIDECHAR, I think we must be a lot more serious than >>>> this. >>>> >>>> Probably, text pieces are limited to 128 bytes by design, somewhere. >>>> But - whose idea was to "narrow" by ignoring everything except 8 LSB's? >>>> By mapping set of 2^20 elements to set of 2^8 elements. >>>> >>>> Probably by someone whose mother tongue is fully writeable with ASCII >>>> :). >>> >>> I'm told the Japanese hate UTF-8, because it expands their characters >>> from two bytes to three. >>> >>> -- hendrik >>> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dragisha at m3w.org Wed Jun 27 11:52:53 2012 From: dragisha at m3w.org (=?utf-8?Q?Dragi=C5=A1a_Duri=C4=87?=) Date: Wed, 27 Jun 2012 11:52:53 +0200 Subject: [M3devel] Windows, Unicode file names In-Reply-To: <20120626185008.50E131A205B@async.async.caltech.edu> References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com> <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org> <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org> <20120625203422.GA24287@topoi.pooq.com> <20120626160005.GA29355@topoi.pooq.com> <20120626185008.50E131A205B@async.async.caltech.edu> Message-ID: More and more is obvious how ideal structure would be: ARRAY OF CHAR, UTF8 encoded, using SRC M3 Text.Hash(). 
What we need is to make the compiler map from the input encoding (whatever the user chooses or is chosen for him) to internal UTF8. On Jun 26, 2012, at 8:50 PM, Mika Nystrom wrote: > > As far as I know, SRC M3 and PM3 come with a TEXT implementation that > works exactly as described below. An extra byte is used at the end with > a character VAL(0,CHAR). The Texts are simply arrays of 8-bit characters. > > One of the big advantages of the old version is that Text.Hash is really, > really fast. Especially on Alphas... it's hugely more expensive to > have hash tables (i.e., Modula-3 generic Tables) keyed on Texts under > CM3 than under the old compilers and runtimes. We're talking a factor > of five or so in speed since the Table routines are generally entirely > dominated by Text.Hash. > > Mika > > Hendrik Boom writes: >> On Mon, Jun 25, 2012 at 08:46:18PM +0000, Jay K wrote: >>> >>> Somewhat but not fully. Text.Length should fetch a stored length. As >>> I'm sure it already does. That length should always be correctly >>> maintained. Same as today. Adding one extra nul at the end doesn't >>> invalidate the data. std::string has the same properties -- c_str() can >>> on-demand append a terminal nul, but there could also be one in the >>> string itself. I understand it is a bit weird. Maintaining a terminal >>> nul does add cost that might be wasted. And reduces the capacity by >>> one. It could be on-demand, I guess. - Jay >> >> Don't need the 'on demand'. For the benefits of C interoperability, the >> extra byte is well worth the price. What I'm worrying about is someone >> using an embedded NUL as an end-of-string marker. I smell more bugs >> creeping in. But I guess bugs are inherent in C use, so I'm not >> surprised seeing them in C interoperation.
>> >> -- hendrik From jay.krell at cornell.edu Wed Jun 27 12:19:08 2012 From: jay.krell at cornell.edu (Jay K) Date: Wed, 27 Jun 2012 10:19:08 +0000 Subject: [M3devel] Windows, Unicode file names In-Reply-To: References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>, <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org>, , <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org>, , <20120625203422.GA24287@topoi.pooq.com>, , <20120626160005.GA29355@topoi.pooq.com>, <20120626185008.50E131A205B@async.async.caltech.edu>, Message-ID: > It is more and more obvious what the ideal structure would be: ARRAY OF CHAR, UTF8 encoded, using SRC M3 Text.Hash(). I don't quite agree. There are two ideal approaches. 1) TEXT is like ARRAY OF CHAR and no values over 0xFF (or maybe even 0x7F); "WIDETEXT" is like ARRAY OF WIDECHAR, for 16bit or 32bit WIDECHAR. 2) Something that can change between them, or possibly store both, but is still mainly flat arrays. That is, once you store a value over 0xFF, the internal representation changes to a flat array of WIDECHAR. Probably it stays that way -- you don't want to thrash back and forth in the worst case. The lesser evil is probably to stick with the wide representation. Setting the string to empty might bounce it back narrow. Ditto assigning it from another narrow text, maybe. What I don't yet understand in all this is how to efficiently combine thread safety, immutability, and quadratic growth. The following should be as efficient as in typical C++ libraries: VAR a: TEXT; WHILE TRUE DO a := a & " "; END; I kind of think that immutability and quadratic growth are in conflict. But not because that sounds obvious. Note that typical C++ libraries do have value semantics for std::string and std::vector.
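The cost at stake in that loop can be sketched by counting character copies. The Python below (Python only for brevity; the function names are hypothetical and it models plain flat arrays, not any actual Modula-3 TEXT implementation) compares appending one character at a time to an immutable flat string with appending to a mutable buffer whose capacity doubles when full.

```python
def copies_immutable_append(n: int) -> int:
    """Characters copied when every append builds a fresh flat string."""
    copied = 0
    length = 0
    for _ in range(n):
        copied += length + 1  # copy the old contents plus the new character
        length += 1
    return copied


def copies_doubling_buffer(n: int) -> int:
    """Characters copied when a mutable buffer doubles its capacity when full."""
    copied = 0
    length = 0
    capacity = 1
    for _ in range(n):
        if length == capacity:
            copied += length  # move the existing contents to the larger buffer
            capacity *= 2
        copied += 1  # store the new character
        length += 1
    return copied
```

Under these assumptions the immutable version copies n(n+1)/2 characters in total, while the doubling buffer stays below 3n, which is the gap the question above is about.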
- Jay > From: dragisha at m3w.org > Date: Wed, 27 Jun 2012 11:52:53 +0200 > To: mika at async.caltech.edu > CC: m3devel at elegosoft.com > Subject: Re: [M3devel] Windows, Unicode file names > > It is more and more obvious what the ideal structure would be: ARRAY OF CHAR, UTF8 encoded, using SRC M3 Text.Hash(). > > What we need is to make the compiler map from the input encoding (whatever the user chooses or is chosen for him) to internal UTF8. > > On Jun 26, 2012, at 8:50 PM, Mika Nystrom wrote: > > > > > As far as I know, SRC M3 and PM3 come with a TEXT implementation that > > works exactly as described below. An extra byte is used at the end with > > a character VAL(0,CHAR). The Texts are simply arrays of 8-bit characters. > > > > One of the big advantages of the old version is that Text.Hash is really, > > really fast. Especially on Alphas... it's hugely more expensive to > > have hash tables (i.e., Modula-3 generic Tables) keyed on Texts under > > CM3 than under the old compilers and runtimes. We're talking a factor > > of five or so in speed since the Table routines are generally entirely > > dominated by Text.Hash. > > > > Mika > > > > Hendrik Boom writes: > >> On Mon, Jun 25, 2012 at 08:46:18PM +0000, Jay K wrote: > >>> > >>> Somewhat but not fully. Text.Length should fetch a stored length. As > >>> I'm sure it already does. That length should always be correctly > >>> maintained. Same as today. Adding one extra nul at the end doesn't > >>> invalidate the data. std::string has the same properties -- c_str() can > >>> on-demand append a terminal nul, but there could also be one in the > >>> string itself. I understand it is a bit weird. Maintaining a terminal > >>> nul does add cost that might be wasted. And reduces the capacity by > >>> one. It could be on-demand, I guess. - Jay > >> > >> Don't need the 'on demand'. For the benefits of C interoperability, the > >> extra byte is well worth the price.
What I'm worrying about is someone > >> using an embedded NUL as an end-of-string marker. I smell more bugs > >> creeping in. But I guess bugs are inherent in C use, so I'm not > >> surprised seeing them in C interoperation. > >> > >> -- hendrik > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dragisha at m3w.org Wed Jun 27 13:14:22 2012 From: dragisha at m3w.org (Dragiša Durić) Date: Wed, 27 Jun 2012 13:14:22 +0200 Subject: [M3devel] Windows, Unicode file names In-Reply-To: References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>, <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org>, , <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org>, , <20120625203422.GA24287@topoi.pooq.com>, , <20120626160005.GA29355@topoi.pooq.com>, <20120626185008.50E131A205B@async.async.caltech.edu>, Message-ID: <03E8005D-CD75-4699-A703-518F219A6F09@m3w.org> On Jun 27, 2012, at 12:19 PM, Jay K wrote: > > It is more and more obvious what the ideal structure would be: ARRAY OF CHAR, UTF8 encoded, using SRC M3 Text.Hash(). > > I don't quite agree. > There are two ideal approaches. > 1) > TEXT is like ARRAY OF CHAR and no values over 0xFF (or maybe even 0x7F) > "WIDETEXT" is like ARRAY OF WIDECHAR, for 16bit or 32bit WIDECHAR So we can have two representations for a single thing: a variable holding some text. And the representation depends on the question "do you need non-basic-English characters"? -------------- next part -------------- An HTML attachment was scrubbed...
URL: From dragisha at m3w.org Wed Jun 27 13:26:31 2012 From: dragisha at m3w.org (Dragiša Durić) Date: Wed, 27 Jun 2012 13:26:31 +0200 Subject: [M3devel] Windows, Unicode file names In-Reply-To: <03E8005D-CD75-4699-A703-518F219A6F09@m3w.org> References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>, <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org>, , <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org>, , <20120625203422.GA24287@topoi.pooq.com>, , <20120626160005.GA29355@topoi.pooq.com>, <20120626185008.50E131A205B@async.async.caltech.edu>, <03E8005D-CD75-4699-A703-518F219A6F09@m3w.org> Message-ID: <21109700-2223-46D4-A151-DABC7084BDC1@m3w.org> This is one place where insisting on some imagined/future purity (fully compatible with your argument - thread safety + immutability + non-quadratic performance) will lead to unreasonable fragmentation and a de facto gray area in CM3 and its usage. I am only one of the people here who de facto use TEXTs to hold UTF8 content. And while we all think/talk about a solution, every single user who needs international characters and wants to use them in a sensible way will go the same way. Then, some "proper" CM3 solution comes and what happens? We rewrite everything to support it? Or ignore it? On Jun 27, 2012, at 1:14 PM, Dragiša Durić wrote: > > On Jun 27, 2012, at 12:19 PM, Jay K wrote: > >> > It is more and more obvious what the ideal structure would be: ARRAY OF CHAR, UTF8 encoded, using SRC M3 Text.Hash(). >> >> I don't quite agree. >> There are two ideal approaches. >> 1) >> TEXT is like ARRAY OF CHAR and no values over 0xFF (or maybe even 0x7F) >> "WIDETEXT" is like ARRAY OF WIDECHAR, for 16bit or 32bit WIDECHAR > > So we can have two representations for a single thing: a variable holding some text. And the representation depends on the question "do you need non-basic-English characters"? > > -------------- next part -------------- An HTML attachment was scrubbed...
URL: From dabenavidesd at yahoo.es Wed Jun 27 13:52:29 2012 From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.) Date: Wed, 27 Jun 2012 12:52:29 +0100 (BST) Subject: [M3devel] Windows, Unicode file names In-Reply-To: <21109700-2223-46D4-A151-DABC7084BDC1@m3w.org> Message-ID: <1340797949.85516.YahooMailClassic@web29702.mail.ird.yahoo.com> Hi all: In reality it turns out that ASCII is still the suitable and adhered-to standard for Modula-2 command control, structured text, and formatting in PLC systems programming. We had better be clear when we pick something, but nevertheless I agree with internationalization as with compatibility, etc. From what I gather, TEXT is allowed to be a Latin-1 superset. Thanks in advance --- On Wed, 27/6/12, Dragiša Durić wrote: From: Dragiša Durić Subject: Re: [M3devel] Windows, Unicode file names To: "Jay K" CC: "m3devel" Date: Wednesday, 27 June 2012 06:26 This is one place where insisting on some imagined/future purity (fully compatible with your argument - thread safety + immutability + non-quadratic performance) will lead to unreasonable fragmentation and a de facto gray area in CM3 and its usage. I am only one of the people here who de facto use TEXTs to hold UTF8 content. And while we all think/talk about a solution, every single user who needs international characters and wants to use them in a sensible way will go the same way. Then, some "proper" CM3 solution comes and what happens? We rewrite everything to support it? Or ignore it? On Jun 27, 2012, at 1:14 PM, Dragiša Durić wrote: On Jun 27, 2012, at 12:19 PM, Jay K wrote: > It is more and more obvious what the ideal structure would be: ARRAY OF CHAR, UTF8 encoded, using SRC M3 Text.Hash(). I don't quite agree. There are two ideal approaches. 1) TEXT is like ARRAY OF CHAR and no values over 0xFF (or maybe even 0x7F) "WIDETEXT" is like ARRAY OF WIDECHAR, for 16bit or 32bit WIDECHAR So we can have two representations for a single thing: a variable holding some text.
And the representation depends on the question "do you need non-basic-English characters"? -------------- next part -------------- An HTML attachment was scrubbed... URL: From rodney_bates at lcwb.coop Wed Jun 27 21:20:41 2012 From: rodney_bates at lcwb.coop (Rodney M. Bates) Date: Wed, 27 Jun 2012 14:20:41 -0500 Subject: [M3devel] UTF-8 TEXT In-Reply-To: <20120627033000.GB28021@topoi.pooq.com> References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com> <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org> <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org> <0A373E4C-32CB-4947-A542-67F0910DBEC4@m3w.org> <20120627033000.GB28021@topoi.pooq.com> Message-ID: <4FEB5D09.2080601@lcwb.coop> On 06/26/2012 10:30 PM, Hendrik Boom wrote: > On Tue, Jun 26, 2012 at 04:22:22PM -0400, Coleburn, Randy wrote: >> I seem to recall that Rodney did some work a while back relating to TEXT. >> Rodney, can you weigh in on some of this? >> --Randy Coleburn >> >> From: Dragiša Durić [mailto:dragisha at m3w.org] >> Sent: Tuesday, June 26, 2012 12:46 PM >> To: Jay >> Cc: m3devel >> Subject: EXT Re: [M3devel] AND (…, 16_ff)… Not serious - or so I hope! >> >> You had idea in other message. Store length! >> >> Another idea - store partial list of indices to character locations. So whatever one does, that list can be used/expanded. Whatever storage issues this makes, they are probably minor as compared to 32bit WIDECHAR for all idea. > > Most of the time, you don't need explicit integer indexes to character > locations. What you do need is an operation that fetches a character > given the string and its index (whatever data structure that index is), > and one that increments the index past that character. As long as you > can save an index and use it later on the same string, that's probably > all you ever need. And with a simple TEXT representation (such as the > obvious array of bytes containing characters of various widths) a byte > index is all you need (note: NOT a character index).
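A minimal sketch of the two operations described here, fetching the character at a byte index and advancing the index past it, over a plain array of UTF-8 bytes. It is written in Python only for brevity; the helper names `seq_len`, `char_at`, `next_index` and `chars` are made up for illustration, and the input is assumed to be well-formed UTF-8 (no validation beyond the lead byte is done).

```python
def seq_len(lead: int) -> int:
    """Length in bytes of the UTF-8 sequence that starts with this lead byte."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError("not a UTF-8 lead byte")


def char_at(data: bytes, index: int) -> str:
    """Fetch the character whose encoding starts at byte index."""
    return data[index:index + seq_len(data[index])].decode("utf-8")


def next_index(data: bytes, index: int) -> int:
    """Advance a byte index past the character that starts there."""
    return index + seq_len(data[index])


def chars(data: bytes):
    """Walk the whole byte array using only the two operations above."""
    i = 0
    while i < len(data):
        yield char_at(data, i)
        i = next_index(data, i)
```

A saved byte index stays valid for later use on the same byte array, and the step size is 1 to 4 bytes depending on the character.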
It's easy even to > use TEXT and its integer indices as the data representation, as long as > you use the proper functions to parse the characters and increment the > indices by amounts that might differ from 1. > > And if your source code is represented in UTF-8, the representation that > requires little extra compiler effort to parse, your TEXT strings will > automagically appear in UTF-8. The original designers of the language and its libraries have given us two different abstractions for handling character strings (in addition to plain arrays): 1) Text, and 2) Wr, Rd, and their cousins. Text is highly general and easy to use. Concatenations and substrings are easy. Semantics, to its clients, are value semantics, similar to INTEGER. Random access by *character* number is easy and, hopefully, implemented with efficiency at least better than O(n). Wr and friends restrict you to sequential access, at least mostly, but gain implementation convenience and efficiency as a result. I feel very strongly that we should *not* take away the full generality of Text, especially efficient random access, to handle variable-length character encodings in strings. For these, let's make more friends of Wr and Rd, which already assume sequential access. For example, a filter pipe that sequentially reads a Text/Array/stream, applies a UTF-8 interpretation to its bytes, and delivers a stream of Unicode characters, in variables of type WIDECHAR. Text should preserve the abstraction that it's a string of characters, generalized as it already is in cm3, to have type WIDECHAR, so they can be any Unicode character. The internal representation should, usually, not be of concern. Note that nowhere in Text are character values transferred between a Text.T and any form of I/O stream. In the Text abstraction, all characters go in and out of a Text.T in variables of type CHAR, WIDECHAR, and arrays thereof. IO, etc. is only done in streams, e.g., TextWr.
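The filter-pipe idea can be sketched with an incremental decoder. The Python below uses the standard library's `codecs` incremental decoder; the function name `decode_stream` and the chunked input are assumptions for illustration. A Modula-3 version would presumably be a new Rd/Wr variant, which is not shown here.

```python
import codecs
from typing import Iterable, Iterator


def decode_stream(chunks: Iterable[bytes]) -> Iterator[str]:
    """Read byte chunks sequentially and deliver one Unicode character at a time."""
    decoder = codecs.getincrementaldecoder("utf-8")()
    for chunk in chunks:
        # The decoder buffers an incomplete trailing sequence until the next chunk.
        for ch in decoder.decode(chunk):
            yield ch
    # Flush; this raises UnicodeDecodeError if the input ended mid-character.
    for ch in decoder.decode(b"", final=True):
        yield ch
```

The consumer sees only whole characters, even when a chunk boundary falls in the middle of a multi-byte sequence, and never needs random access into the byte stream.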
We can easily add new variants of these that encode/decode by various rules. Of course, it is still valid to put a string of bytes in a Text.T and apply, e.g., UTF-8 interpretation yourself. But that's lower-level programming, and shouldn't confuse the abstraction. > > I can see a use for various wide characters -- the things you extract > from a TEXT by parsing bits of it, but none for anything > really new complicated for wide TEXT. > > The only confusing thing is that the existing operations for extracting > bytes from TEXT have names that suggest they are extracting characters. > I think it's more than a suggestion. I think the abstraction clearly considers them characters. And it should stay that way. If you want, at a higher level of code, to treat them as bytes, that's fine, but the abstraction continues to view them as characters (which only you, the client, know is not really so.) > -- Hendrik > From rodney_bates at lcwb.coop Wed Jun 27 22:04:59 2012 From: rodney_bates at lcwb.coop (Rodney M. Bates) Date: Wed, 27 Jun 2012 15:04:59 -0500 Subject: [M3devel] Windows, Unicode file names In-Reply-To: References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>, <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org>, , <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org>, , <20120625203422.GA24287@topoi.pooq.com>, , <20120626160005.GA29355@topoi.pooq.com>, <20120626185008.50E131A205B@async.async.caltech.edu>, Message-ID: <4FEB676B.1010505@lcwb.coop> On 06/27/2012 05:19 AM, Jay K wrote: > > It is more and more obvious what the ideal structure would be: ARRAY OF CHAR, UTF8 encoded, using SRC M3 Text.Hash(). > > I don't quite agree. > There are two ideal approaches.
> 1) > TEXT is like ARRAY OF CHAR and no values over 0xFF (or maybe even 0x7F) > "WIDETEXT" is like ARRAY OF WIDECHAR, for 16bit or 32bit WIDECHAR > > > 2) something that can change between them, or possibly store both, but is still mainly flat arrays > That is, once you store a value over 0xFF, the internal representation changes to flat array of WIDECHAR. > Probably it stays that way -- you don't want to thrash back and forth in worst case. > Lesser evil is probably to stick with wide representation. > Setting the string to empty might bounce it back narrow. > Ditto assigning it from another narrow text, maybe. > This is similar to what the cm3 modification of Text does now. The details of what goes on inside the implementation are a bit different than you describe. There can be mixtures of 8-bit string fragments and 16-bit string fragments, plus other stuff hooking them together. But the abstraction works just like this. > > > What I don't yet understand in all this is how to efficiently combine thread safety, immutability, and quadratic growth. > > The following should be as efficient as in typical C++ libraries: > > > VAR a: TEXT; VAR a: TEXT:= " "; > WHILE TRUE DO > a := a & " "; > END; > In pm3 Text, this will take quadratic time and linear space. The partial strings will be garbage collected, as no copies of the pointers to them are made. GetChar is then O(1). In cm3 Text, this is linear in both time and space, but the space usage has a much higher constant factor than in pm3. In pm3, the asymptotic space used is exactly what the characters themselves require, i.e., one byte per character. For cm3, I count 21 native words per character, plus fragmentation loss for 3 separate heap objects per character. That's 84 times or 168 times, depending on word size. Well, lots of people keep saying RAM is virtually free these days. I guess we really need to hope they are right. GetChar is O(n) when the string is built linearly like this.
Best case is O(log n) when built by Cats of single characters. My modification of cm3 Text lies between these. It flattens strings up to a point, then does some imperfect balancing of them higher in trees. Frankly, I think I like going back to the pm3 implementation best. > > I kind of think that immutability and quadratic growth are in conflict. They are, to a considerable extent, as with all functional-style data structures. But more sophisticated (i.e., complicated) implementations can mitigate somewhat. > But not because that sounds obvious. > Note that typical C++ libraries do have value semantics for std::string and std::vector. > > > - Jay > > > > From: dragisha at m3w.org > > Date: Wed, 27 Jun 2012 11:52:53 +0200 > > To: mika at async.caltech.edu > > CC: m3devel at elegosoft.com > > Subject: Re: [M3devel] Windows, Unicode file names > > > > It is more and more obvious what the ideal structure would be: ARRAY OF CHAR, UTF8 encoded, using SRC M3 Text.Hash(). > > > > What we need is to make the compiler map from the input encoding (whatever the user chooses or is chosen for him) to internal UTF8. > > > > On Jun 26, 2012, at 8:50 PM, Mika Nystrom wrote: > > > > > > > > As far as I know, SRC M3 and PM3 come with a TEXT implementation that > > > works exactly as described below. An extra byte is used at the end with > > > a character VAL(0,CHAR). The Texts are simply arrays of 8-bit characters. > > > > > > One of the big advantages of the old version is that Text.Hash is really, > > > really fast. Especially on Alphas... it's hugely more expensive to > > > have hash tables (i.e., Modula-3 generic Tables) keyed on Texts under > > > CM3 than under the old compilers and runtimes. We're talking a factor > > > of five or so in speed since the Table routines are generally entirely > > > dominated by Text.Hash. > > > > > > Mika > > > > > > Hendrik Boom writes: > > >> On Mon, Jun 25, 2012 at 08:46:18PM +0000, Jay K wrote: > > >>> > > >>> Somewhat but not fully.
Text.Length should fetch a stored length. As > > >>> I'm sure it already does. That length should always be correctly > > >>> maintained. Same as today. Adding one extra nul at the end doesn't > > >>> invalidate the data. std::string has the same properties -- c_str() can > > >>> on-demand append a terminal nul, but there could also be one in the > > >>> string itself. I understand it is a bit weird. Maintaining a terminal > > >>> nul does add cost that might be wasted. And reduces the capacity by > > >>> one. It could be on-demand, I guess. - Jay > > >> > > >> Don't need the 'on demand'. For the benefits of C interoperability, the > > >> extra byte is well worth the price. What I'm worrying about is someone > > >> using an embedded NUL as an end-of-string marker. I smell more bugs > > >> creeping in. But I guess bugs are inherent in C use, so I'm not > > >> surprised seeing them in C interoperation. > > >> > > >> -- hendrik > > From rodney_bates at lcwb.coop Wed Jun 27 22:10:42 2012 From: rodney_bates at lcwb.coop (Rodney M. Bates) Date: Wed, 27 Jun 2012 15:10:42 -0500 Subject: [M3devel] AND (…, 16_ff)… Not serious - or so I hope! In-Reply-To: <989FB282-786A-447D-8CA9-2432AE922DBF@m3w.org> References: <989FB282-786A-447D-8CA9-2432AE922DBF@m3w.org> Message-ID: <4FEB68C2.7000202@lcwb.coop> Yes, this is a disturbing quirk, and quite out of character with the nature of Modula-3. It would be consistent to say that CHAR <: WIDECHAR, and apply the usual assignability rules. That would make this a runtime range error. On 06/26/2012 05:18 AM, Dragiša Durić wrote: > This piece of code, from TextClass.m3, disturbs me… a lot. > > If we are to use WIDECHAR, I think we must be a lot more serious than this. > > Probably, text pieces are limited to 128 bytes by design, somewhere. But - whose idea was to "narrow" by ignoring everything except 8 LSB's? By mapping set of 2^20 elements to set of 2^8 elements.
> > Probably by someone whose mother tongue is fully writeable with ASCII :). > >
> ====
> PROCEDURE GetChars (t: TEXT; VAR a: ARRAY OF CHAR; start: CARDINAL) =
>   VAR
>     info : Info;
>     cnt  : INTEGER;
>     next : CARDINAL := 0;
>     buf  : ARRAY [0..127] OF WIDECHAR;
>   BEGIN
>     t.get_info (info);
>     cnt := MIN (NUMBER (a), info.length - start);
>     WHILE (cnt > 0) DO
>       t.get_wide_chars (buf, start);
>       FOR i := FIRST (buf) TO LAST (buf) DO
>         IF (cnt = 0) THEN RETURN END;
>         a[next] := VAL (Word.And (ORD (buf[i]), 16_ff), CHAR);
>         INC (next); DEC (cnt);
>       END;
>       INC (start, NUMBER (buf));
>     END;
>   END GetChars;
> ====
> > From rodney_bates at lcwb.coop Wed Jun 27 22:27:29 2012 From: rodney_bates at lcwb.coop (Rodney M. Bates) Date: Wed, 27 Jun 2012 15:27:29 -0500 Subject: [M3devel] AND (…, 16_ff)… Not serious - or so I hope! In-Reply-To: References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com> <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org> <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org> Message-ID: <4FEB6CB1.3070509@lcwb.coop> On 06/26/2012 11:34 AM, Jay wrote: > >>> > 128 limit >>> >>> I haven't read the code enough yet to verify that but you are probably right >> >> I was not right :), that call is incremental. > > I looked for that aspect too but missed it. :( > > >>> > ignoring everything over 16_FF >>> >>> Probably that is the responsibility/claim of the caller of GetChars. >>> If you want to be correct in the face of non-ASCII, you are probably obligated to call GetWideChars. >>> Perhaps raising an exception would be reasonable to signal the loss of data. Or something. >>> There is HasWideChars for you to check. >> >> >>> >>> There is no encoding implied remember. >>> This isn't UTF8 data. >> >> It is not, but probably only way to solve this without exception is to make UTF8 "official" 8bit encoding :) >> > > I'm torn on that. We'd have to consider ramifications like Text.Length vs buffer size requirements/expectations.
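The difference between the masking in GetChars and a checked narrowing can be sketched as follows. This is Python, used only for brevity, with hypothetical function names; it models the behaviour under discussion and is not the TextClass code.

```python
def narrow_masked(code_point: int) -> int:
    """Keep only the 8 least significant bits, as AND (..., 16_ff) does."""
    return code_point & 0xFF


def narrow_checked(code_point: int) -> int:
    """Return the value unchanged if it fits in 8 bits, otherwise raise."""
    if not 0 <= code_point <= 0xFF:
        raise ValueError(f"code point {code_point:#x} does not fit in 8 bits")
    return code_point
```

With masking, U+0161 (š) silently comes back as 0x61, the letter "a"; the checked version reports the loss instead of hiding it.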
> > > Is TEXT & its use abstracted enough to have been widened? Should we put it back and introduce WIDETEXT? That is essentially what C and C++ do. They are inconvenient for existing code but simple, predictable, and make sense. Contrast with weird hybrid systems like Perl & Python for which I just can't get through the documentation and understand and predict how they work. > TEXT is well abstracted and can be widened, with the exception that truncating characters to 8 bits to return them in a CHAR is wrong. It should be a checked runtime error, and this should be documented. Note that while we have two types CHAR and WIDECHAR for scalars (and can also have arrays thereof), there is still only one type TEXT. Conceptually, it should be viewed as holding strings of WIDECHAR, with some convenience functions for putting CHARs into and getting them out of a TEXT, when the programmer knows the value is in this range. The fact that our implementation stores some values in fields of type CHAR is a hidden implementation detail. There is nothing in the abstraction that requires it to be done this way, or enables clients to know that. We do have two kinds of text literals, conventional and wide. They differ only in how the value is specified, and the ability to specify characters outside of CHAR. > > Java is in-between but also simple & predictable -- there being no narrow option other than array of byte, which is reasonable. > > > - Jay From rodney_bates at lcwb.coop Thu Jun 28 04:12:26 2012 From: rodney_bates at lcwb.coop (Rodney M.
Bates) Date: Wed, 27 Jun 2012 21:12:26 -0500 Subject: [M3devel] UTF-8 TEXT In-Reply-To: References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com> <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org> <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org> <0A373E4C-32CB-4947-A542-67F0910DBEC4@m3w.org> <20120627033000.GB28021@topoi.pooq.com> <4FEB5D09.2080601@lcwb.coop> Message-ID: <4FEBBD8A.5020206@lcwb.coop> On 06/27/2012 07:32 PM, Antony Hosking wrote: > So what do we do about 6-byte UTF-8 code points? They won't fit in WIDECHAR. Surely we should allow accessing a UTF-8 character as a CARDINAL and be done with it? > Absolutely. Except I think a better way is to make WIDECHAR big enough to hold all of Unicode. > Sent from my iPad > > On Jun 27, 2012, at 3:20 PM, "Rodney M. Bates" wrote: > >> >> >> On 06/26/2012 10:30 PM, Hendrik Boom wrote: >>> On Tue, Jun 26, 2012 at 04:22:22PM -0400, Coleburn, Randy wrote: >>>> I seem to recall that Rodney did some work a while back relating to TEXT. >>>> Rodney, can you weigh in on some of this? >>>> --Randy Coleburn >>>> >>>> From: Dragiša Durić [mailto:dragisha at m3w.org] >>>> Sent: Tuesday, June 26, 2012 12:46 PM >>>> To: Jay >>>> Cc: m3devel >>>> Subject: EXT Re: [M3devel] AND (…, 16_ff)… Not serious - or so I hope! >>>> >>>> You had idea in other message. Store length! >>>> >>>> Another idea - store partial list of indices to character locations. So whatever one does, that list can be used/expanded. Whatever storage issues this makes, they are probably minor as compared to 32bit WIDECHAR for all idea. >>> >>> Most of the time, you don't need explicit integer indexes to character >>> locations. What you do need is an operation that fetches a character >>> given the string and its index (whatever data structure that index is), >>> and one that increments the index past that character. As long as you >>> can save an index and use it later on the same string, that's probably >>> all you ever need.
And with a simple TEXT representation (such as the >>> obvious array of bytes containing characters of various widths) a byte >>> index is all you need (note: NOT a character index). It's easy even to >>> use TEXT and its integer indices as the data representation, as long as >>> you use the proper functions to parse the characters and increment the >>> indices by amounts that might differ from 1. >>> >>> And if your source code is represented in UTF-8, the representation that >>> requires little extra compiler effort to parse, your TEXT strings will >>> automagically appear in UTF-8. >> >> The original designers of the language and its libraries have given us >> two different abstractions for handling character strings (in addition >> to plain arrays): 1) Text, and 2) Wr, Rd, and their cousins. >> >> Text is highly general and easy to use. Concatenations and substrings >> are easy. Semantics, to its clients, are value semantics, similar to INTEGER. >> Random access by *character* number is easy and, hopefully, implemented >> with efficiency at least better than O(n). >> >> Wr and friends restrict you to sequential access, at least mostly, but >> gain implementation convenience and efficiency as a result. >> >> I feel very strongly that we should *not* take away the full generality >> of Text, especially efficient random access, to handle variable-length >> character encodings in strings. For these, let's make more friends of >> Wr and Rd, which already assume sequential access. For example, a >> filter pipe that sequentially reads a Text/Array/stream, applies a UTF-8 >> interpretation to its bytes, and delivers a stream of Unicode characters, >> in variables of type WIDECHAR. >> >> Text should preserve the abstraction that it's a string of characters, >> generalized as it already is in cm3, to have type WIDECHAR, so they can be any >> Unicode character. The internal representation should, usually, not be >> of concern.
>> >> Note that nowhere in Text are character values transferred between >> a Text.T and any form of I/O stream. In the Text abstraction, all >> characters go in and out of a Text.T in variables of type CHAR, >> WIDECHAR, and arrays thereof. IO, etc. is only done in streams, >> e.g., TextWr. We can easily add new variants of these that encode/decode >> by various rules. >> >> Of course, it is still valid to put a string of bytes in a Text.T and >> apply, e.g., UTF-8 interpretation yourself. But that's lower-level >> programming, and shouldn't confuse the abstraction. >> >>> >>> I can see a use for various wide characters -- the things you extract >>> from a TEXT by parsing bits of it, but none for anything >>> really new complicated for wide TEXT. >>> >>> The only confusing thing is that the existing operations for extracting >>> bytes from TEXT have names that suggest they are extracting characters. >>> >> >> I think it's more than a suggestion. I think the abstraction clearly >> considers them characters. And it should stay that way. If you want, >> at a higher level of code, to treat them as bytes, that's fine, but the >> abstraction continues to view them as characters (which only you, the >> client, know is not really so.) >> >>> -- Hendrik >>> > From jay.krell at cornell.edu Thu Jun 28 07:31:04 2012 From: jay.krell at cornell.edu (Jay K) Date: Thu, 28 Jun 2012 05:31:04 +0000 Subject: [M3devel] Windows, Unicode file names In-Reply-To: <4FEB676B.1010505@lcwb.coop> References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>, , <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org>, , , , <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org>, , , , <20120625203422.GA24287@topoi.pooq.com>, , , , <20120626160005.GA29355@topoi.pooq.com>, , <20120626185008.50E131A205B@async.async.caltech.edu>, , , , <4FEB676B.1010505@lcwb.coop> Message-ID: > Random access by *character* number is easy and, hopefully, implemented > with efficiency at least better than O(n).
Random access by "something, not 'character'" should be O(1). > > I kind of think that immutability and quadratic growth are in conflict. > > They are, to a considerable extent, as with all functional-style data structures. > But more sophisticated (i.e., complicated) implementations can mitigate somewhat. I'm hoping we can win here somehow. In Java and C# they solve this by having, in a sense, two string types: constant "string"s and mutable "StringBuffer"s. Strings never grow. They are always flat. StringBuffers grow quadratically. They are always flat. They are mutable. I suspect we need to do something similar. Somehow. As I understand, C# and Java do expose string concatenation. As I understand, they are similar to Modula-3 here, in that the compiler knows about string concatenation and rewrites the code somewhat. Thinking about it further, I suspect my example also can't/doesn't run performantly in Java or C# either. Hopefully we can come up with some good solution to this. I have to run. - Jay ---------------------------------------- > Date: Wed, 27 Jun 2012 15:04:59 -0500 > From: rodney_bates at lcwb.coop > To: m3devel at elegosoft.com > Subject: Re: [M3devel] Windows, Unicode file names > > > > On 06/27/2012 05:19 AM, Jay K wrote: > > > It is more and more obvious what the ideal structure would be: ARRAY OF CHAR, UTF8 encoded, using SRC M3 Text.Hash(). > > > > I don't quite agree. > > There are two ideal approaches. > > 1) > > TEXT is like ARRAY OF CHAR and no values over 0xFF (or maybe even 0x7F) > > "WIDETEXT" is like ARRAY OF WIDECHAR, for 16bit or 32bit WIDECHAR > > > > > > 2) something that can change between them, or possibly store both, but is still mainly flat arrays > > That is, once you store a value over 0xFF, the internal representation changes to flat array of WIDECHAR. > > Probably it stays that way -- you don't want to thrash back and forth in worst case. > > Lesser evil is probably to stick with wide representation.
> > Setting the string to empty might bounce it back narrow. > > Ditto assigning it from another narrow text, maybe. > > > > This is similar to what the cm3 modification of Text does now. The details of > what goes on inside the implementation are a bit different than you describe. > There can be mixtures of 8-bit string fragments and 16-bit string fragments, plus > other stuff hooking them together. But the abstraction works just like this. > > > > > > > What I don't yet understand in all this is how to efficiently combine thread safety, immutability, and quadratic growth. > > > > The following should be as efficient as in typical C++ libraries: > > > > > > VAR a: TEXT; > > VAR a: TEXT:= " "; > > > WHILE TRUE DO > > a := a & " "; > > END; > > > > In pm3 Text, this will take quadratic time and linear space. The partial > strings will be garbage collected, as no copies of the pointers to them > are made. GetChar is then O(1). > > In cm3 Text, this is linear in both time and space, but the space usage > has a much higher constant factor than in pm3. In pm3, the asymptotic space used > is exactly what the characters themselves require, i.e, one byte per character. > For cm3, I count 21 native words per character, plus fragmentation loss > for 3 separate heap objects per character. That's 84 times or 168 times, > depending on word size. Well, lots of people keep saying RAM is virtually > free these days. I guess we really need to hope they are right. > GetChar is O(n) when the string is built linearly like this. > Best case is O(log n) when built by Cats of single characters. > > My modification of cm3 Text lies between these. It flattens strings > up to a point, then does some imperfect balancing of them higher in trees. > > Frankly, I think I like going back to the pm3 implementation best. > > > > > I kind of thing that immutability and quadratic growth are in conflict. > > They are, to a considerable extent, as with all functional-style data structures. 
> But more sophisticated (i.e., complicated) implementations can mitigate somewhat. > > > But not because that sounds obvious. > > Note that typical C++ libraries do have value semantics for std::string and std::vector. > > > > > > - Jay > > > > > > > From: dragisha at m3w.org > > > Date: Wed, 27 Jun 2012 11:52:53 +0200 > > > To: mika at async.caltech.edu > > > CC: m3devel at elegosoft.com > > > Subject: Re: [M3devel] Windows, Unicode file names > > > > > > More and more is obvious how ideal structure would be: ARRAY OF CHAR, UTF8 encoded, using SRC M3 Text.Hash(). > > > > > > What we need is to make compler map from input encoding (whatever user chooses or is choosen for him) to internal UTF8. > > > > > > On Jun 26, 2012, at 8:50 PM, Mika Nystrom wrote: > > > > > > > > > > > As far as I know, SRC M3 and PM3 come with a TEXT implementation that > > > > works exactly as described below. An extra byte is used at the end with > > > > a character VAL(0,CHAR). The Texts are simply arrays of 8-bit characters. > > > > > > > > One of the big advantages of the old version is that Text.Hash is really, > > > > really fast. Especially on Alphas... it's hugely more expensive to > > > > have hash tables (i.e., Modula-3 generic Tables) keyed on Texts under > > > > CM3 than under the old compilers and runtimes. We're talking a factor > > > > of five or so in speed since the Table routines are generally entirely > > > > dominated by Text.Hash. > > > > > > > > Mika > > > > > > > > Hendrik Boom writes: > > > >> On Mon, Jun 25, 2012 at 08:46:18PM +0000, Jay K wrote: > > > >>> > > > >>> Somewhat but not fully. Text.Length should fetch a stored length. As > > > >>> I'm sure it already does.That length should always be correctly > > > >>> maintained. 
Same as today. Adding one extra nul at the end doesn't > > > >>> invalidate the data. std::string has the same properties -- c_str() can > > > >>> on-demand append a terminal nul, but there could also be one in the > > > >>> string itself. I understand it is a bit weird. Maintaining a terminal > > > >>> nul does add cost that might be wasted. And reduces the capacity by > > > >>> one. It could be on-demand, I guess. - Jay > > > >> > > > >> Don't need the 'on demand'. For the benefits of C interoperability, the > > > >> extra byte is well worth the price. What I'm worrying about is someone > > > >> using an embedded NUL as an end-of-string marker. I smell more bugs > > > >> creeping in. But I guess bugs are inherent in C use, so I'm not > > > >> surprised seeing them in C interoperation. > > > >> > > > >> -- hendrik > > > From hendrik at topoi.pooq.com Thu Jun 28 14:37:56 2012 From: hendrik at topoi.pooq.com (Hendrik Boom) Date: Thu, 28 Jun 2012 08:37:56 -0400 Subject: [M3devel] Windows, Unicode file names In-Reply-To: References: <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org> <20120625203422.GA24287@topoi.pooq.com> <20120626160005.GA29355@topoi.pooq.com> <20120626185008.50E131A205B@async.async.caltech.edu> <4FEB676B.1010505@lcwb.coop> Message-ID: <20120628123756.GA2279@topoi.pooq.com> On Thu, Jun 28, 2012 at 05:31:04AM +0000, Jay K wrote: > > > Random access by *character* number is easy and, hopefully, implemented > > with efficiency at least better than O(n). > > > Random access by "something, not 'character'" should be O(1). Quite agree. There should be a fetch-byte operation, and a fetch-character operation. Fetch-character should return a character and the index to the next character. > > > > > I kind of think that immutability and quadratic growth are in conflict. > > > > They are, to a considerable extent, as with all functional-style data structures. > > But more sophisticated (i.e., complicated) implementations can mitigate somewhat. 
> > I'm hoping we can win here somehow. > In Java and C# they solve this by having, in a sense, two string types. > constant "string"s > and mutable "StringBuffer"s > Strings never grow. They are always flat. > StringBuffers grow quadratically. They are always flat. They are mutable. > > I suspect we need to do something similar. Somehow. > > As I understand, C# and Java do expose string concatenation. > As I understand, they are similar to Modula-3 here, in that the compiler knows > about string concatenation and rewrites the code somewhat. > Thinking about it further, I suspect my example also can't/doesn't run performantly in Java or C# either. > > > Hopefully we can come up with some good solution to this. > I have to run. Initially, create a string as a simple array of bytes. Then, when we start concatenating, use a cm3-like representation. (We could delay this until our string gets a little long, or until a pointer to it gets copied.) Maintain that as long as we're still concatenating. We might try balancing the tree somewhat if it gets biggish. But as soon as we start indexing or hashing, or anything like that, we can change representation to the simple array of bytes. Usually at that point we're finished concatenating. -- hendrik > > > - Jay From hendrik at topoi.pooq.com Thu Jun 28 14:44:46 2012 From: hendrik at topoi.pooq.com (Hendrik Boom) Date: Thu, 28 Jun 2012 08:44:46 -0400 Subject: [M3devel] UTF-8 TEXT In-Reply-To: <4FEB5D09.2080601@lcwb.coop> References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com> <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org> <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org> <0A373E4C-32CB-4947-A542-67F0910DBEC4@m3w.org> <20120627033000.GB28021@topoi.pooq.com> <4FEB5D09.2080601@lcwb.coop> Message-ID: <20120628124446.GB2279@topoi.pooq.com> On Wed, Jun 27, 2012 at 02:20:41PM -0500, Rodney M. Bates wrote: > > Text is highly general and easy to use. Concatenations and substrings > are easy. 
Semantics, to its clients, are value semantics, similar to INTEGER. > Random access by *character* number is easy and, hopefully, implemented > with efficiency at least better than O(n). Does it have to be a *character* number we use to index a string? I don't know of any situations where that aspect is important enough to force everyone to waste storage on it. -- hendrik From dragisha at m3w.org Thu Jun 28 14:48:38 2012 From: dragisha at m3w.org (Dragiša Durić) Date: Thu, 28 Jun 2012 14:48:38 +0200 Subject: [M3devel] UTF-8 TEXT In-Reply-To: <20120628124446.GB2279@topoi.pooq.com> References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com> <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org> <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org> <0A373E4C-32CB-4947-A542-67F0910DBEC4@m3w.org> <20120627033000.GB28021@topoi.pooq.com> <4FEB5D09.2080601@lcwb.coop> <20120628124446.GB2279@topoi.pooq.com> Message-ID: <01C11478-BAEA-4BAC-8ECE-FA5A28933A44@m3w.org> glyph sounds better, I agree! :) On Jun 28, 2012, at 2:44 PM, Hendrik Boom wrote: > On Wed, Jun 27, 2012 at 02:20:41PM -0500, Rodney M. Bates wrote: >> >> Text is highly general and easy to use. Concatenations and substrings >> are easy. Semantics, to its clients, are value semantics, similar to INTEGER. >> Random access by *character* number is easy and, hopefully, implemented >> with efficiency at least better than O(n). > > Does it have to be a *character* number we use to index a string? I > don't know of any situations where that aspect is important enough > to force everyone to waste storage on it. 
> > -- hendrik From hendrik at topoi.pooq.com Thu Jun 28 14:51:03 2012 From: hendrik at topoi.pooq.com (Hendrik Boom) Date: Thu, 28 Jun 2012 08:51:03 -0400 Subject: [M3devel] Windows, Unicode file names In-Reply-To: <03E8005D-CD75-4699-A703-518F219A6F09@m3w.org> References: <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org> <20120625203422.GA24287@topoi.pooq.com> <20120626160005.GA29355@topoi.pooq.com> <20120626185008.50E131A205B@async.async.caltech.edu> <03E8005D-CD75-4699-A703-518F219A6F09@m3w.org> Message-ID: <20120628125103.GC2279@topoi.pooq.com> On Wed, Jun 27, 2012 at 01:14:22PM +0200, Dragiša Durić wrote: > > On Jun 27, 2012, at 12:19 PM, Jay K wrote: > > > > More and more is obvious how ideal structure would be: ARRAY OF CHAR, UTF8 encoded, using SRC M3 Text.Hash(). > > > > I don't quite agree. > > There are two ideal approaches. > > 1) > > TEXT is like ARRAY OF CHAR and no values over 0xFF (or maybe even 0x7F) > > "WIDETEXT" is like ARRAY OF WIDECHAR, for 16-bit or 32-bit WIDECHAR > > So we can have two representations for single thing: variable holding some text. And representation depends on a question "do you need non-basic-english-characters"? I'm starting to discover that a lot of my English documents have non-ASCII characters in them. In particular, the separate open and close quotation marks around quoted speech take more than one byte in Unicode. True, in a starvation-level character set, they are both represented as " , but that's really not what they are. -- hendrik From dabenavidesd at yahoo.es Thu Jun 28 15:51:59 2012 From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.) Date: Thu, 28 Jun 2012 14:51:59 +0100 (BST) Subject: [M3devel] UTF-8 TEXT In-Reply-To: <01C11478-BAEA-4BAC-8ECE-FA5A28933A44@m3w.org> Message-ID: <1340891519.52552.YahooMailClassic@web29704.mail.ird.yahoo.com> Hi all: it can't be used like that (as in early DEC-SRC versions) because TEXT is an opaque type; you can't reveal it like that at that level. 
Anyway, whoever needs the characteristics of the JVM or similar languages must know that they are very inefficient, and C is pretty neat but also very much UNSAFE. Perhaps anybody who wants that should use integer strings to map every symbol of the computer directly from hardware (but supporting every kind of format seems over-complex in space and time). The operating system normally doesn't handle I/O itself in many cases; the I/O subsystem does (like Windows I/O), and in some cases it takes advantage of not waiting for a thread to return control to the app. There are many computers that use non-ASCII terms, but normally they support that script, so why put more weight on it? Maybe I should ask either lexicographers or specific language users what they need, in a comprehensible manner, and not rely on hard-coded standards many still don't use. Thanks in advance --- On Thu, 28/6/12, Dragiša Durić wrote: From: Dragiša Durić Subject: Re: [M3devel] UTF-8 TEXT To: "Hendrik Boom" CC: m3devel at elegosoft.com Date: Thursday, 28 June 2012 07:48 glyph sounds better, I agree! :) On Jun 28, 2012, at 2:44 PM, Hendrik Boom wrote: > On Wed, Jun 27, 2012 at 02:20:41PM -0500, Rodney M. Bates wrote: >> >> Text is highly general and easy to use. Concatenations and substrings >> are easy. Semantics, to its clients, are value semantics, similar to INTEGER. >> Random access by *character* number is easy and, hopefully, implemented >> with efficiency at least better than O(n). > > Does it have to be a *character* number we use to index a string? I > don't know of any situations where that aspect is important enough > to force everyone to waste storage on it. > > -- hendrik From rodney_bates at lcwb.coop Thu Jun 28 16:10:02 2012 From: rodney_bates at lcwb.coop (Rodney M. 
Bates) Date: Thu, 28 Jun 2012 09:10:02 -0500 Subject: [M3devel] UTF-8 TEXT In-Reply-To: <20120628124446.GB2279@topoi.pooq.com> References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com> <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org> <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org> <0A373E4C-32CB-4947-A542-67F0910DBEC4@m3w.org> <20120627033000.GB28021@topoi.pooq.com> <4FEB5D09.2080601@lcwb.coop> <20120628124446.GB2279@topoi.pooq.com> Message-ID: <4FEC65BA.9080809@lcwb.coop> On 06/28/2012 07:44 AM, Hendrik Boom wrote: > On Wed, Jun 27, 2012 at 02:20:41PM -0500, Rodney M. Bates wrote: >> >> Text is highly general and easy to use. Concatenations and substrings >> are easy. Semantics, to its clients, are value semantics, similar to INTEGER. >> Random access by *character* number is easy and, hopefully, implemented >> with efficiency at least better than O(n). > > Does it have to be a *character* number we use to index a string? I > don't know of any situations where that aspect is important enough > to force everyone to waste storage on it. > > -- hendrik > It is absolutely essential that it be a character, if you care about Text being a meaningful abstraction. A byte index is a very low-level view, now that we have a variable-length encoding, and *especially* now that there are multiple possible ways of representing strings. When it was only ASCII (or ISO-latin1), it was a character index, and the abstraction was there. The fact that it was also a byte index is a coincidental consequence of the choice of underlying physical representation. Now we have a much messier situation regarding representations, but we should not destroy the abstraction and force everyone to always get down into the bowels of the different representations. There will still be mechanisms for low-level coding if you have some compelling reason, or just don't want to rewrite something existing. 
But let's protect the option of dealing with characters with the same abstraction we have had in the past. From dabenavidesd at yahoo.es Thu Jun 28 17:18:31 2012 From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.) Date: Thu, 28 Jun 2012 16:18:31 +0100 (BST) Subject: [M3devel] UTF-8 TEXT In-Reply-To: <4FEC65BA.9080809@lcwb.coop> Message-ID: <1340896711.88750.YahooMailClassic@web29702.mail.ird.yahoo.com> Hi all: the string class is by definition a superset of CHAR scripts; note that TEXT is a primitive type, so it can have every string characteristic. Thus we don't need any other non-primitive TEXT types. The need for other TEXT isn't a matter, as it can add or have any characters but put burden of choice to the implementation. WIDECHARs aren't needed by Modula-3 at all, but to keep copying CHARs in other is not in my view more string formats is the real advantage to speed up implementation to get two CHARs strings. So I agree that we must look at the performance burden in the cited implementations, for instance keeping compatibility without losing special performance. My view is that we need to reimplement that in m3core, either in C, for instance, or in some safe subset of Modula-3 to speed it up a little, for instance DEC-SRC, etc., or a subset of SPIN-M3 (some things I like). But this is more stuff to do, fun certainly, but I would want to concentrate on supporting that either by OS definition or by accessing hardware (who cares about using the C RT for Linux, but if we can be faster let's do it, whatever it takes). In the end we can provide better interfaces to develop current OSes than they provide to us, so what does it matter then if we offer some code to Linux, if they are interested at all. Greg Nelson said that Rd/Wr are a very nice piece of string type unappreciated by most of the current mainstream languages. Thanks in advance --- On Thu, 28/6/12, Rodney M. Bates wrote: From: Rodney M. 
Bates Subject: Re: [M3devel] UTF-8 TEXT To: m3devel at elegosoft.com Date: Thursday, 28 June 2012 09:10 On 06/28/2012 07:44 AM, Hendrik Boom wrote: > On Wed, Jun 27, 2012 at 02:20:41PM -0500, Rodney M. Bates wrote: >> >> Text is highly general and easy to use. Concatenations and substrings >> are easy. Semantics, to its clients, are value semantics, similar to INTEGER. >> Random access by *character* number is easy and, hopefully, implemented >> with efficiency at least better than O(n). > > Does it have to be a *character* number we use to index a string? I > don't know of any situations where that aspect is important enough > to force everyone to waste storage on it. > > -- hendrik > It is absolutely essential that it be a character, if you care about Text being a meaningful abstraction. A byte index is a very low-level view, now that we have a variable-length encoding, and *especially* now that there are multiple possible ways of representing strings. When it was only ASCII (or ISO-latin1), it was a character index, and the abstraction was there. The fact that it was also a byte index is a coincidental consequence of the choice of underlying physical representation. Now we have a much messier situation regarding representations, but we should not destroy the abstraction and force everyone to always get down into the bowels of the different representations. There will still be mechanisms for low-level coding if you have some compelling reason, or just don't want to rewrite something existing. But let's protect the option of dealing with characters with the same abstraction we have had in the past. -------------- next part -------------- An HTML attachment was scrubbed... 
From hendrik at topoi.pooq.com Thu Jun 28 19:02:30 2012 From: hendrik at topoi.pooq.com (Hendrik Boom) Date: Thu, 28 Jun 2012 13:02:30 -0400 Subject: [M3devel] UTF-8 TEXT In-Reply-To: <4FEC65BA.9080809@lcwb.coop> References: <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org> <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org> <0A373E4C-32CB-4947-A542-67F0910DBEC4@m3w.org> <20120627033000.GB28021@topoi.pooq.com> <4FEB5D09.2080601@lcwb.coop> <20120628124446.GB2279@topoi.pooq.com> <4FEC65BA.9080809@lcwb.coop> Message-ID: <20120628170230.GD2279@topoi.pooq.com> On Thu, Jun 28, 2012 at 09:10:02AM -0500, Rodney M. Bates wrote: > > > On 06/28/2012 07:44 AM, Hendrik Boom wrote: > >On Wed, Jun 27, 2012 at 02:20:41PM -0500, Rodney M. Bates wrote: > >> > >>Text is highly general and easy to use. Concatenations and substrings > >>are easy. Semantics, to its clients, are value semantics, similar to INTEGER. > >>Random access by *character* number is easy and, hopefully, implemented > >>with efficiency at least better than O(n). > > > >Does it have to be a *character* number we use to index a string? I > >don't know of any situations where that aspect is important enough > >to force everyone to waste storage on it. > > > >-- hendrik > > > > It is absolutely essential that it be a character, if you care about > Text being a meaningful abstraction. A byte index is a very low-level > view, now that we have a variable-length encoding, and *especially* > now that there are multiple possible ways of representing strings. I'm not arguing whether the index should point to a character. I'm questioning whether it need be a count of characters. This is surely a matter of data representation rather than concept. A character index could be implemented in a variety of ways. It certainly could be implemented as a character count, presumably for legacy applications with attendant costs. It could be implemented as a byte count if the string were implemented as an array of bytes. 
It could be implemented as a machine address, constrained to index into a particular string. It could be implemented as a pointer into a linked list of string pieces, together with an offset indicating where in that piece it currently points. We could even implement byte and character counts in the more exotic TEXT data structures if we chose; we have freedom of representation of TEXT without compromising integer. We can implement *both* character extractors using an INTEGER *byte* count AND character extractors using an INTEGER *character* count. And we can do this in just about any representation of TEXT we come up with. The specification for the abstraction doesn't even have to say that it's a byte count. It's sufficient to say one can use an index that is chosen for implementation efficiency. Though it's tempting to provide a byte count for an operation that extracts bytes, not characters. Now that would be a low-level operation that does break the abstraction. -- hendrik > > When it was only ASCII (or ISO-latin1), it was a character > index, and the abstraction was there. The fact that it was also a > byte index is a coincidental consequence of the choice of underlying > physical representation. Now we have a much messier situation regarding > representations, but we should not destroy the abstraction and force > everyone to always get down into the bowels of the different representations. > > There will still be mechanisms for low-level coding if you have some > compelling reason, or just don't want to rewrite something existing. > But let's protect the option of dealing with characters with the same > abstraction we have had in the past. Yes, it was obviously a mistake for Modula-3 not to distinguish between two types for character and byte. And it's not the only language to have made that mistake. There are two different abstractions here, with different meanings, but they share one name and one implementation. 
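A rough sketch of the byte-indexed fetch-character idea discussed above, for illustration only. It is written in Python rather than Modula-3, it assumes the text is held as a flat array of UTF-8 bytes, and the names utf8_len, fetch_byte, fetch_char and chars are made up here; they are not part of any Text interface. fetch_char takes a byte index and hands back the character plus the index of the next character, so a sequential scan never needs a character count.

```python
def utf8_len(lead):
    """Number of bytes in the UTF-8 sequence that starts with this lead byte."""
    if lead < 0x80:
        return 1
    if 0xC0 <= lead < 0xE0:
        return 2
    if 0xE0 <= lead < 0xF0:
        return 3
    if 0xF0 <= lead < 0xF8:
        return 4
    raise ValueError("index does not point at the start of a character")


def fetch_byte(data, i):
    """Low-level view: the raw byte stored at byte index i."""
    return data[i]


def fetch_char(data, i):
    """Character starting at byte index i, plus the byte index of the next one."""
    n = utf8_len(data[i])
    return data[i:i + n].decode("utf-8"), i + n


def chars(data):
    """Walk the whole byte array using only fetch_char."""
    out = []
    i = 0
    while i < len(data):
        c, i = fetch_char(data, i)
        out.append(c)
    return out


# Hypothetical sample: curly quotes around "Hi", a space, then s-with-caron.
sample = "\u201cHi\u201d \u0161"
data = sample.encode("utf-8")
print(len(sample), "characters in", len(data), "bytes")
print(chars(data) == list(sample))
```

The index stays an ordinary integer, but it counts bytes; whether a client ever sees that is a matter of how the abstraction is specified.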
Frankly, I don't care which of the two retains the name CHAR. It's all the same to me whether (a) characters are called WIDECHAR and bytes CHAR or (b) characters are called CHAR and bytes BYTE, because either way programs are going to have to be changed to adapt to the new world. (a) is probably less disruptive to legacy programs that only ever need to deal with legacy ASCII files. (b) is probably conceptually cleaner. What's important is that both mechanisms remain available for dealing with values of type TEXT. The designers of Modula-3 have done an admirable job of providing a collection of abstractions that enable both conceptually clean and efficient implementations. Let's not mess it up by providing only a conceptually clean, inefficient interface. -- hendrik From dragisha at m3w.org Thu Jun 28 19:19:48 2012 From: dragisha at m3w.org (Dragiša Durić) Date: Thu, 28 Jun 2012 19:19:48 +0200 Subject: [M3devel] Windows, Unicode file names In-Reply-To: <20120628125103.GC2279@topoi.pooq.com> References: <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org> <20120625203422.GA24287@topoi.pooq.com> <20120626160005.GA29355@topoi.pooq.com> <20120626185008.50E131A205B@async.async.caltech.edu> <03E8005D-CD75-4699-A703-518F219A6F09@m3w.org> <20120628125103.GC2279@topoi.pooq.com> Message-ID: My language (Serbian) is written with two alphabets. Before ISO-8859-2 we used ten (yes, 10) different encodings to represent our alphabet(s) with 8 bits. With ISO-8859-2 we got a solution for the Latin alphabet, but we had to use ISO-8859-5 for Cyrillic. One of our ten encodings (the national standard came late) covered both Latin and Cyrillic in 8 bits. Back in 1991-2 I implemented a system for handling the above-mentioned ten encodings. After that experience, and after a decade or so of using/fighting ten encodings, you can trust me - even the notion of having a single encoding for all language needs is a lifesaver :). 
That is where my oversensitivity to the idea of having two ways to interpret strings comes from. Two ways, just because we can? Ok, we can use two, we can use ten, we can use fifty encodings!! But the sensible way is to use one, if possible. And it is possible! It is called UTF-8. On Jun 28, 2012, at 2:51 PM, Hendrik Boom wrote: > On Wed, Jun 27, 2012 at 01:14:22PM +0200, Dragiša Durić wrote: >> >> On Jun 27, 2012, at 12:19 PM, Jay K wrote: >> >>>> More and more is obvious how ideal structure would be: ARRAY OF CHAR, UTF8 encoded, using SRC M3 Text.Hash(). >>> >>> I don't quite agree. >>> There are two ideal approaches. >>> 1) >>> TEXT is like ARRAY OF CHAR and no values over 0xFF (or maybe even 0x7F) >>> "WIDETEXT" is like ARRAY OF WIDECHAR, for 16-bit or 32-bit WIDECHAR >> >> So we can have two representations for single thing: variable holding some text. And representation depends on a question "do you need non-basic-english-characters"? > > I'm starting to discover that a lot of my English documents have > non-ASCII characters in them. In particular, the separate open and close > quotation marks around quoted speech take more than one byte in > Unicode. True, in a starvation-level character set, they are both > represented as " , but that's really not what they are. > > -- hendrik From rcolebur at SCIRES.COM Fri Jun 29 01:35:29 2012 From: rcolebur at SCIRES.COM (Coleburn, Randy) Date: Thu, 28 Jun 2012 19:35:29 -0400 Subject: [M3devel] EXT Re: UTF-8 TEXT In-Reply-To: <4FEB5D09.2080601@lcwb.coop> References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com> <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org> <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org> <0A373E4C-32CB-4947-A542-67F0910DBEC4@m3w.org> <20120627033000.GB28021@topoi.pooq.com> <4FEB5D09.2080601@lcwb.coop> Message-ID: ... > I feel very strongly that we should *not* take away the full generality of Text, > especially efficient random access, to handle variable-length character > encodings in strings. 
For these, let's make more friends of Wr and Rd, which > already assume sequential access. For example, a filter pipe that sequentially > reads a Text/Array/stream, applies a UTF-8 interpretation to its bytes, and > delivers a stream of Unicode characters, in variables of type WIDECHAR. > > Text should preserve the abstraction that it's a string of characters, > generalized as it already is in cm3, to have type WIDECHAR, so they can be any > Unicode character. The internal representation should, usually, not be of concern. ... I concur with Rodney. We need to hold true to the design tenets of the language and keep the full generality of Text with efficient random access, and add new variants of the Rd/Wr/etc. abstractions that deal with the various variable-length character encodings as sequential-access streams. --Randy Coleburn From dabenavidesd at yahoo.es Fri Jun 29 02:21:19 2012 From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.) Date: Fri, 29 Jun 2012 01:21:19 +0100 (BST) Subject: [M3devel] Windows, Unicode file names In-Reply-To: Message-ID: <1340929279.13051.YahooMailClassic@web29705.mail.ird.yahoo.com> Hi all: in fact CM had the idea of rewriting the Modula-3 language definition in terms of the UTF standard, but it never came out. Perhaps we will need to maintain two definitions, one SPwM3 and a newer CM-style one, and based on those standards make a front end that can write to the two kinds of standards and make them interoperable. One way of promoting CM3 could be to talk about a renewed Modula-3, JVM-enabled, etc., for system applications (like Win32, Unix), whereas DEC-SRC Modula-3 would be for research and development with a parallelized environment, like a research system for an open AAA compiler (I don't know many others writing parallel compilers) with ESC, Vesta, etc. Thanks in advance --- On Thu, 28/6/12, Dragiša Durić wrote: From: Dragiša Durić 
Subject: Re: [M3devel] Windows, Unicode file names To: "Hendrik Boom" CC: m3devel at elegosoft.com Date: Thursday, 28 June 2012 12:19 My language (Serbian) is written with two alphabets. Before ISO-8859-2 we used ten (yes, 10) different encodings to represent our alphabet(s) with 8 bits. With ISO-8859-2 we got a solution for the Latin alphabet, but we had to use ISO-8859-5 for Cyrillic. One of our ten encodings (the national standard came late) covered both Latin and Cyrillic in 8 bits. Back in 1991-2 I implemented a system for handling the above-mentioned ten encodings. After that experience, and after a decade or so of using/fighting ten encodings, you can trust me - even the notion of having a single encoding for all language needs is a lifesaver :). That is where my oversensitivity to the idea of having two ways to interpret strings comes from. Two ways, just because we can? Ok, we can use two, we can use ten, we can use fifty encodings!! But the sensible way is to use one, if possible. And it is possible! It is called UTF-8. On Jun 28, 2012, at 2:51 PM, Hendrik Boom wrote: > On Wed, Jun 27, 2012 at 01:14:22PM +0200, Dragiša Durić wrote: >> >> On Jun 27, 2012, at 12:19 PM, Jay K wrote: >> >>>> More and more is obvious how ideal structure would be: ARRAY OF CHAR, UTF8 encoded, using SRC M3 Text.Hash(). >>> >>> I don't quite agree. >>> There are two ideal approaches. >>> 1) >>> TEXT is like ARRAY OF CHAR and no values over 0xFF (or maybe even 0x7F) >>> "WIDETEXT" is like ARRAY OF WIDECHAR, for 16-bit or 32-bit WIDECHAR >> >> So we can have two representations for single thing: variable holding some text. And representation depends on a question "do you need non-basic-english-characters"? > > I'm starting to discover that a lot of my English documents have > non-ASCII characters in them. In particular, the separate open and close > quotation marks around quoted speech take more than one byte in > Unicode. 
True, in a starvation-level character set, they are both > represented as " , but that's really not what they are. > > -- hendrik From dragisha at m3w.org Fri Jun 29 10:35:38 2012 From: dragisha at m3w.org (Dragiša Durić) Date: Fri, 29 Jun 2012 10:35:38 +0200 Subject: [M3devel] Simple change to WIDECHAR type Message-ID: m3front/src/builtinTypes/WCharr.m3, line: T := EnumType.New (16_10000, elts); to T := EnumType.New (16_100000, elts); Will this break things? Any other assumptions anywhere? From dabenavidesd at yahoo.es Fri Jun 29 17:47:50 2012 From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.) Date: Fri, 29 Jun 2012 16:47:50 +0100 (BST) Subject: [M3devel] Simple change to WIDECHAR type In-Reply-To: Message-ID: <1340984870.74508.YahooMailClassic@web29702.mail.ird.yahoo.com> Hi all: more important than a maximum-length char type, we need a minimum optimal char table size (and thus word size, so we can optimize). Use WIDECHAR and move it towards modular arithmetic over CHAR {0..255}, so you can select at RT a common base to represent your character system I/O. This happens in micro digital signal processors, if you care about the output of such systems in practice. Also, a character terminal is important to note, for instance as a way of evaluating the speed of a signal processor design. DEC had a lot of devices like that, so having a common interface to those systems is useful. Many mainframes are still handled mostly through that device, so I guess it is important for such a system to support most types of encodings: http://vt100.net/docs/vt510-rm/chapter8 http://en.wikipedia.org/wiki/ISO/IEC_8859-5 P Zollo wrote an emulator for that device, so maybe we can test the speed of character streaming with that. Thanks in advance --- On Fri, 29/6/12, Dragiša Durić wrote: From: Dragiša Durić 
Asunto: [M3devel] Simple change to WIDECHAR type Para: "m3devel" Fecha: viernes, 29 de junio, 2012 03:35 m3front/src/builtinTypes/WCharr.m3, line: ? ? T := EnumType.New (16_10000, elts); to ? ? T := EnumType.New (16_100000, elts); Will this break things? Any other assumptions anywhere? -------------- next part -------------- An HTML attachment was scrubbed... URL: From dragisha at m3w.org Fri Jun 29 17:52:55 2012 From: dragisha at m3w.org (=?utf-8?Q?Dragi=C5=A1a_Duri=C4=87?=) Date: Fri, 29 Jun 2012 17:52:55 +0200 Subject: [M3devel] Simple change to WIDECHAR type In-Reply-To: References: Message-ID: <4AD84314-8E53-4A50-A51B-A0DF8DC910BC@m3w.org> That, or UTF-16 encoding on top of current WIDECHAR. On Jun 29, 2012, at 3:50 PM, Antony Hosking wrote: > That will change WIDECHAR from a value consuming 16-bits of memory into a value consuming 32-bits of memory. In other words, all TEXT containing WIDECHAR will double in size. > > On Jun 29, 2012, at 4:35 AM, Dragi?a Duri? wrote: > >> m3front/src/builtinTypes/WCharr.m3, line: >> >> T := EnumType.New (16_10000, elts); >> >> to >> >> T := EnumType.New (16_100000, elts); >> >> Will this break things? Any other assumptions anywhere? >> > From dabenavidesd at yahoo.es Fri Jun 29 18:08:57 2012 From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.) Date: Fri, 29 Jun 2012 17:08:57 +0100 (BST) Subject: [M3devel] Simple change to WIDECHAR type In-Reply-To: <4AD84314-8E53-4A50-A51B-A0DF8DC910BC@m3w.org> Message-ID: <1340986137.5745.YahooMailClassic@web29704.mail.ird.yahoo.com> Hi all: I repeat we need performance well udnerstood as a matter of issue here. Who cares imposed standards, ISO are the real standards, no point to complain about that as DEC put de-facto on its terminals. Thanks in advance ? --- El vie, 29/6/12, Dragi?a Duri? escribi?: De: Dragi?a Duri? 
Asunto: Re: [M3devel] Simple change to WIDECHAR type Para: "Antony Hosking" CC: "m3devel" Fecha: viernes, 29 de junio, 2012 10:52 That, or UTF-16 encoding on top of current WIDECHAR. On Jun 29, 2012, at 3:50 PM, Antony Hosking wrote: > That will change WIDECHAR from a value consuming 16-bits of memory into a value consuming 32-bits of memory.? In other words, all TEXT containing WIDECHAR will double in size. > > On Jun 29, 2012, at 4:35 AM, Dragi?a Duri? wrote: > >> m3front/src/builtinTypes/WCharr.m3, line: >> >>???T := EnumType.New (16_10000, elts); >> >> to >> >>???T := EnumType.New (16_100000, elts); >> >> Will this break things? Any other assumptions anywhere? >> > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dragisha at m3w.org Sat Jun 30 09:33:00 2012 From: dragisha at m3w.org (=?utf-8?Q?Dragi=C5=A1a_Duri=C4=87?=) Date: Sat, 30 Jun 2012 09:33:00 +0200 Subject: [M3devel] Simple change to WIDECHAR type In-Reply-To: <4AD84314-8E53-4A50-A51B-A0DF8DC910BC@m3w.org> References: <4AD84314-8E53-4A50-A51B-A0DF8DC910BC@m3w.org> Message-ID: <031FD9D2-8C62-48B3-8E2F-1122B7C1ED96@m3w.org> Current GetChar/SetChars and GetWideChar/SetWideChars are not character-level access methods, in terms of Unicode. They are "byte-level", fixed width data accesses. Reason: Both CHAR (cardinality 2^8) and WIDECHAR (cardinality 2^16) based strings must use one or more characters to represent whole Unicode (cardinality 2^20). If we must encode in any case, then we don't have any benefit of WIDECHAR (as it is implemented/understood now) at all! To represent Unicode with either CHAR or WIDECHAR based TEXTs - we must use either UTF-8 or UTF-16. Both are one-to-multibyte encodings, encoding one Unicode character to either 1-4 CHARs or 1-2 WIDECHARs. What exactly is meaning (at Modula-3 usual levels of abstraction) of character-level access? Do we need whatever bit pattern physically happening at some location in our data's representation. 
Or maybe we need the numerical value of the actual Unicode character, the one that is visually distinguishable in writing? One from that set of 2^20 elements?

What is the meaning of Text.Sub() built on byte-level access operations, when the first character of the resulting TEXT is in fact the prefix of some Unicode character's encoding? And/or when its last character is an invalid or incomplete suffix of some encoded character?

Since when are fast and efficient operations that do something we don't need at all our priority?

We are getting nothing at all with WIDECHAR. No. Single. Thing. WIDECHAR does not bring us any closer to Unicode. WIDECHAR, together with CHAR (in the context of our current TEXT), makes two almost-solutions to the Unicode problem, and the existence of a WIDECHAR scalar type brings us a bit closer to the Unicode almost-solution of the C world and nothing else.

Currently, neither GetChar nor GetWideChar can get "a character at the nth position". Reason: there is no character scalar type that can hold any Unicode character.

Solution:
======

* Redefine WIDECHAR to hold at least 20 bit values, or create UNICHAR or GLYPH (and leave WIDECHAR as it is for backward compatibility) so we can hold unencoded Unicode characters in scalar values in our Modula-3 programs, while preserving their properties.
* Implement the properties, relations and methods defined for Unicode. With ASCII, numeric order is everything. With Unicode, it is not. This is probably a very big project, but we can start somewhere and let interested parties build on it. Dirk Muysers has already done work in this regard.
* Whoever thinks we don't need this and that our "tradition" and "legacy" are important, please read this: http://unicode.org/standard/WhatIsUnicode.html .

dd

On Jun 29, 2012, at 5:52 PM, Dragiša Durić wrote:

> That, or UTF-16 encoding on top of current WIDECHAR.
>
> On Jun 29, 2012, at 3:50 PM, Antony Hosking wrote:
>
>> That will change WIDECHAR from a value consuming 16-bits of memory into a value consuming 32-bits of memory.
>> In other words, all TEXT containing WIDECHAR will double in size.
>>
>> On Jun 29, 2012, at 4:35 AM, Dragiša Durić wrote:
>>
>>> m3front/src/builtinTypes/WCharr.m3, line:
>>>
>>> T := EnumType.New (16_10000, elts);
>>>
>>> to
>>>
>>> T := EnumType.New (16_100000, elts);
>>>
>>> Will this break things? Any other assumptions anywhere?
>>>
>>
>

From dragisha at m3w.org Sat Jun 30 10:56:27 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Sat, 30 Jun 2012 10:56:27 +0200
Subject: [M3devel] Some earlier work
Message-ID: <31C25C13-66B2-4637-9D33-C2E5E80AB9DA@m3w.org>

This is how we implemented UTF-8 strings over current TEXTs. The current implementation is UNSAFE and uses glibc UTF-8 methods. Nothing too complicated, and nothing we can't implement in Modula-3 or portable C.

=====

INTERFACE UText;

IMPORT Word;  (* needed for Word.T in Hash *)

TYPE
  T = TEXT;
  Char = CARDINAL;

PROCEDURE Cat(t, u: T): T;
PROCEDURE Equal(t, u: T): BOOLEAN;
PROCEDURE GetChar(t: T; i: CARDINAL): Char;
PROCEDURE ByteSize(t: T): CARDINAL;
PROCEDURE Length(t: T): CARDINAL;
PROCEDURE Empty(t: T): BOOLEAN;
PROCEDURE Sub(t: T; start: CARDINAL; length: CARDINAL := LAST(CARDINAL)): T;
PROCEDURE SetChars(VAR a: ARRAY OF Char; t: T);
PROCEDURE FromChar(ch: Char): T;
PROCEDURE FromChars(READONLY a: ARRAY OF Char): T;
PROCEDURE Hash(t: T): Word.T;
PROCEDURE Compare(t1, t2: T): [-1..1];
PROCEDURE FindChar(t: T; ch: Char; start: CARDINAL := 0): INTEGER;
PROCEDURE FindCharR(t: T; ch: Char; start: CARDINAL := LAST(INTEGER)): INTEGER;

END UText.

From hendrik at topoi.pooq.com Sat Jun 30 16:29:24 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Sat, 30 Jun 2012 10:29:24 -0400
Subject: [M3devel] Some earlier work
In-Reply-To: <31C25C13-66B2-4637-9D33-C2E5E80AB9DA@m3w.org>
References: <31C25C13-66B2-4637-9D33-C2E5E80AB9DA@m3w.org>
Message-ID: <20120630142924.GB12402@topoi.pooq.com>

On Sat, Jun 30, 2012 at 10:56:27AM +0200, Dragiša Durić
wrote:
> This is how we implemented

Any chance you could show us the implementation and not just the INTERFACE?

-- hendrik

From dragisha at m3w.org Sat Jun 30 16:39:16 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Sat, 30 Jun 2012 16:39:16 +0200
Subject: [M3devel] Some earlier work
In-Reply-To: <20120630142924.GB12402@topoi.pooq.com>
References: <31C25C13-66B2-4637-9D33-C2E5E80AB9DA@m3w.org> <20120630142924.GB12402@topoi.pooq.com>
Message-ID: <0C45D4FF-8279-404F-A68E-35656D261959@m3w.org>

Of course. http://dl.dropbox.com/u/60554338/UText.m3

On Jun 30, 2012, at 4:29 PM, Hendrik Boom wrote:

> On Sat, Jun 30, 2012 at 10:56:27AM +0200, Dragiša Durić wrote:
>> This is how we implemented
>
> Any chance you could show us the implementation and not just the INTERFACE?
>
> -- hendrik

From jay.krell at cornell.edu Sat Jun 30 18:52:54 2012
From: jay.krell at cornell.edu (Jay K)
Date: Sat, 30 Jun 2012 16:52:54 +0000
Subject: [M3devel] Simple change to WIDECHAR type
In-Reply-To: <031FD9D2-8C62-48B3-8E2F-1122B7C1ED96@m3w.org>
References: , , <4AD84314-8E53-4A50-A51B-A0DF8DC910BC@m3w.org>, <031FD9D2-8C62-48B3-8E2F-1122B7C1ED96@m3w.org>
Message-ID:

I don't fully buy this. 16-bit WIDECHAR is very useful on Windows.

It can be used directly with a vast, vast number of functions.

A 32-bit char would require conversion to and from all the time.

As well, there are no code pages when using 16-bit characters. 8-bit characters are interpreted by Windows in a way that varies per OS and per user, and which isn't stored with the string. I realize that Modula-3 code doesn't necessarily use the same interpretation.

"No code pages" is the advantage of UTF-8: pick one "code page", if "code page" means "how to encode/decode more than 8 bits, 8 bits at a time". Hope all the data is 7-bit clean, so it doesn't matter. Otherwise convert to and from a lot.

I do understand that current Unicode requires 20 bits, and that a 32-bit character type is justifiable.
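To make the bit-width point concrete, here is a small sketch in C. It is not code from the cm3 tree, and the names utf16_encode and bits_needed are made up for this illustration. It shows that Unicode code points run up to 0x10FFFF (which takes 21 bits, slightly more than the 20 mentioned in this thread), and that UTF-16 stores anything above 0xFFFF as a pair of 16-bit surrogates.

```c
#include <stddef.h>
#include <stdint.h>

/* Encode one Unicode code point as UTF-16.
   Writes 1 or 2 sixteen-bit units into out and returns how many were
   written. Returns 0 if cp is not a valid scalar value (a surrogate,
   or larger than 0x10FFFF). Hypothetical helper, for illustration only. */
size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0;  /* reserved for surrogates */
    if (cp > 0x10FFFF) return 0;                 /* beyond the Unicode range */
    if (cp < 0x10000) {                          /* fits in one 16-bit unit */
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                               /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));    /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF));  /* low surrogate: bottom 10 bits */
    return 2;
}

/* Smallest number of bits able to represent v (0 for v == 0). */
unsigned bits_needed(uint32_t v)
{
    unsigned n = 0;
    while (v != 0) {
        n++;
        v >>= 1;
    }
    return n;
}
```

With these definitions, bits_needed(0x10FFFF) is 21, and a code point such as U+1F600 needs two 16-bit units, which is why a 16-bit WIDECHAR cannot hold every Unicode character in one scalar.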
As I understand, this was debated when Unicode was first designed but rejected as too large. - Jay ---------------------------------------- > From: dragisha at m3w.org > Date: Sat, 30 Jun 2012 09:33:00 +0200 > To: antony.hosking at gmail.com > CC: m3devel at elegosoft.com > Subject: Re: [M3devel] Simple change to WIDECHAR type > > Current GetChar/SetChars and GetWideChar/SetWideChars are not character-level access methods, in terms of Unicode. They are "byte-level", fixed width data accesses. Reason: Both CHAR (cardinality 2^8) and WIDECHAR (cardinality 2^16) based strings must use one or more characters to represent whole Unicode (cardinality 2^20). If we must encode in any case, then we don't have any benefit of WIDECHAR (as it is implemented/understood now) at all! > > To represent Unicode with either CHAR or WIDECHAR based TEXTs - we must use either UTF-8 or UTF-16. Both are one-to-multibyte encodings, encoding one Unicode character to either 1-4 CHARs or 1-2 WIDECHARs. > > What exactly is meaning (at Modula-3 usual levels of abstraction) of character-level access? Do we need whatever bit pattern physically happening at some location in our data's representation. Or maybe we need numerical representation of actual, visually distinguishable in written representation, Unicode character value? One from that set of 2^20 elements? > > What is meaning of Text.Sub() based on byte-level access operations where our resulting TEXTs first character is in fact a prefix of some Unicode characters encoding? And/or where our last character is invalid/incomplete suffix of some encoded character. > > Since when are fast and efficient operations doing something we don't need at all our priority? > > We are getting nothing at all with WIDECHAR. No. Single. Thing. WIDECHAR does not make us closer to Unicode at all. 
WIDECHAR, together with CHAR (in context of our current TEXT) makes two almost-solutions to Unicode problem and existence of WIDECHAR scalar type makes us a bit closer to Unicode almost-solution of C world and nothing else. > > Currently, neither GetChar nor GetWideChar can get "a character at nth position". Reason: No character scalar type to keep any Unicode character. > > Solution: > ====== > > * Redefine WIDECHAR to hold at least 20 bit values, or create UNICHAR or GLYPH (and leave WIDECHAR as it is for vertical compatibility) so we can hold unencoded Unicode characters in scalar values in our Modula-3 programs, while preserving their properties. > * Implement properties, relations and methods defined for Unicode. With ASCII, numeric order is everything. With Unicode - it is not. This is probably very big project but we can start somewhere, and let interested parties build on it. Dirk Muysers did work in this regard already. > * Whoever thinks we don't need this and our "tradition" and "legacy" are important, please read this: http://unicode.org/standard/WhatIsUnicode.html . > > dd > > On Jun 29, 2012, at 5:52 PM, Dragi?a Duri? wrote: > > > That, or UTF-16 encoding on top of current WIDECHAR. > > > > On Jun 29, 2012, at 3:50 PM, Antony Hosking wrote: > > > >> That will change WIDECHAR from a value consuming 16-bits of memory into a value consuming 32-bits of memory. In other words, all TEXT containing WIDECHAR will double in size. > >> > >> On Jun 29, 2012, at 4:35 AM, Dragi?a Duri? wrote: > >> > >>> m3front/src/builtinTypes/WCharr.m3, line: > >>> > >>> T := EnumType.New (16_10000, elts); > >>> > >>> to > >>> > >>> T := EnumType.New (16_100000, elts); > >>> > >>> Will this break things? Any other assumptions anywhere? 
> >>> > >> > > > From dragisha at m3w.org Sat Jun 30 19:17:23 2012 From: dragisha at m3w.org (=?utf-8?Q?Dragi=C5=A1a_Duri=C4=87?=) Date: Sat, 30 Jun 2012 19:17:23 +0200 Subject: [M3devel] Simple change to WIDECHAR type In-Reply-To: References: , , <4AD84314-8E53-4A50-A51B-A0DF8DC910BC@m3w.org>, <031FD9D2-8C62-48B3-8E2F-1122B7C1ED96@m3w.org> Message-ID: <7E7703E6-48BD-4DCE-8336-97D41249EDCF@m3w.org> And since when usefulness on Windows defined anything Modula-3? To use vastvastvastnumber of Windows functions on Modula-3 TEXT you must call at least one Modula-3 function on your argument to make it passable to Windows API function. To make another single-call Modula-3 function mapping UTF-8 to Windows API acceptable argument is five minutes task. So, you are in fact gaining nothing with WIDECHAR you can't have with UTF8 packed in Text8.T. 32bit characters is what we have on non-Windows. And we must convert all the time if we are to use Modula-3 WIDECHAR based TEXT to non-Windows wchar strings. Are you arguing Windows is more important than all other platforms we support or what? On Jun 30, 2012, at 6:52 PM, Jay K wrote: > > I don't fully buy this. 16bit WIDECHAR is very useful on Windows. > > It can be used directly with a vast vast vast vast number of functions. > > 32bit char would be require conversion to and from all the time. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mika at async.caltech.edu Sat Jun 30 19:24:01 2012 From: mika at async.caltech.edu (Mika Nystrom) Date: Sat, 30 Jun 2012 10:24:01 -0700 Subject: [M3devel] Simple change to WIDECHAR type In-Reply-To: <031FD9D2-8C62-48B3-8E2F-1122B7C1ED96@m3w.org> References: <4AD84314-8E53-4A50-A51B-A0DF8DC910BC@m3w.org> <031FD9D2-8C62-48B3-8E2F-1122B7C1ED96@m3w.org> Message-ID: <20120630172401.DFE8E1A207C@async.async.caltech.edu> =?utf-8?Q?Dragi=C5=A1a_Duri=C4=87?= writes: ... 
>
>Solution:
>======
>
>* Redefine WIDECHAR to hold at least 20 bit values, or create UNICHAR or GLYPH (and leave WIDECHAR as it is for vertical compatibility) so we can hold unencoded Unicode characters in scalar values in our Modula-3 programs, while preserving their properties.
>* Implement properties, relations and methods defined for Unicode. With ASCII, numeric order is everything. With Unicode - it is not. This is probably very big project but we can start somewhere, and let interested parties build on it. Dirk Muysers did work in this regard already.
>* Whoever thinks we don't need this and our "tradition" and "legacy" are important, please read this: http://unicode.org/standard/WhatIsUnicode.html .
>
>dd

Given what you have said about the near-uselessness of WIDECHAR, does anything actually use it much? What breaks if it is redefined to be the same as, say, INTEGER? (Or Word.T)

CHAR is quite useful for processing 7-bit ASCII, and it would be lovely if that could go back to using the SRC data structures. For people who do stuff like write VLSI design tools... (probably many other large-scale applications would like it too).

Mika

From dragisha at m3w.org Sat Jun 30 20:12:45 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Sat, 30 Jun 2012 20:12:45 +0200
Subject: [M3devel] Simple change to WIDECHAR type
In-Reply-To: <20120630172401.DFE8E1A207C@async.async.caltech.edu>
References: <4AD84314-8E53-4A50-A51B-A0DF8DC910BC@m3w.org> <031FD9D2-8C62-48B3-8E2F-1122B7C1ED96@m3w.org> <20120630172401.DFE8E1A207C@async.async.caltech.edu>
Message-ID: <7BCB3BB7-D2F2-470B-8E70-2A7FF274E0FC@m3w.org>

I don't see where WIDECHAR, such as it is, can be useful. Especially since TEXT in cm3 is a non-flat structure, and it almost always takes additional processing to prepare it even for a Windows API argument. That additional processing of a dendriform cm3 TEXT is in no way more efficient if some of its nodes already look just like Windows texts.
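For a rough idea of what a single-call conversion to a Windows-style argument might involve, here is a sketch in C rather than Modula-3. It assumes the TEXT has already been flattened into a byte array of UTF-8; utf8_decode and utf8_to_utf16 are hypothetical names, not cm3, glibc or Windows functions.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode one UTF-8 sequence starting at s, with n bytes available.
   Stores the code point in *cp and returns the number of bytes consumed,
   or 0 if the input is truncated, overlong or otherwise malformed. */
static size_t utf8_decode(const unsigned char *s, size_t n, uint32_t *cp)
{
    size_t len, i;
    uint32_t c, min;

    if (n == 0) return 0;
    if (s[0] < 0x80) { *cp = s[0]; return 1; }           /* plain ASCII */
    else if ((s[0] & 0xE0) == 0xC0) { len = 2; c = s[0] & 0x1F; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { len = 3; c = s[0] & 0x0F; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { len = 4; c = s[0] & 0x07; min = 0x10000; }
    else return 0;                                       /* stray continuation or bad lead byte */

    if (n < len) return 0;                               /* sequence cut short */
    for (i = 1; i < len; i++) {
        if ((s[i] & 0xC0) != 0x80) return 0;             /* not a continuation byte */
        c = (c << 6) | (s[i] & 0x3F);
    }
    /* Reject overlong forms, values past U+10FFFF, and surrogates. */
    if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF)) return 0;
    *cp = c;
    return len;
}

/* Convert n bytes of UTF-8 into UTF-16 code units in out (room for cap units).
   Returns the number of units written, or (size_t)-1 if the input is
   malformed or out is too small. No terminator is appended. */
size_t utf8_to_utf16(const char *in, size_t n, uint16_t *out, size_t cap)
{
    const unsigned char *s = (const unsigned char *)in;
    size_t i = 0, w = 0;

    while (i < n) {
        uint32_t cp;
        size_t k = utf8_decode(s + i, n - i, &cp);
        if (k == 0) return (size_t)-1;
        i += k;
        if (cp < 0x10000) {                              /* one 16-bit unit */
            if (w + 1 > cap) return (size_t)-1;
            out[w++] = (uint16_t)cp;
        } else {                                         /* surrogate pair */
            if (w + 2 > cap) return (size_t)-1;
            cp -= 0x10000;
            out[w++] = (uint16_t)(0xD800 | (cp >> 10));
            out[w++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
    }
    return w;
}
```

The conversion is a single linear pass either way, so keeping TEXT as CHAR-based UTF-8 and converting at the API boundary costs about the same as flattening a WIDECHAR-based TEXT would.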
Also, cm3 TEXT is overengineered - I hope I don't have to argue this. Everything is second to efficient concat operation. IMO, we must leave TEXT to be simple and CHAR based. Just like you need for your VLSI tools. And use something like UText.i3/m3 to use such objects to represent Unicode (UTF-8 encoded) any-language strings. And use WText.* for communication with wchar API's like Windows'. BTW, WIDECHAR literals are non sufficiently defined in cm3. There is a hole size of Moon. What is input encoding for source files containing WIDECHAR literals? For example: CONST Me = W"Dragi?a Duri?"; Jay, please explain this to me. My editor creates UTF8 files, for example. What cm3 expects after W" ? On Jun 30, 2012, at 7:24 PM, Mika Nystrom wrote: > > =?utf-8?Q?Dragi=C5=A1a_Duri=C4=87?= writes: > ... >> >> Solution: >> =3D=3D=3D=3D=3D=3D >> >> * Redefine WIDECHAR to hold at least 20 bit values, or create UNICHAR or = >> GLYPH (and leave WIDECHAR as it is for vertical compatibility) so we can = >> hold unencoded Unicode characters in scalar values in our Modula-3 = >> programs, while preserving their properties. >> * Implement properties, relations and methods defined for Unicode. With = >> ASCII, numeric order is everything. With Unicode - it is not. This is = >> probably very big project but we can start somewhere, and let interested = >> parties build on it. Dirk Muysers did work in this regard already. >> * Whoever thinks we don't need this and our "tradition" and "legacy" are = >> important, please read this: = >> http://unicode.org/standard/WhatIsUnicode.html . >> >> dd > > Given what you have said about the near-uselessness of WIDECHAR, does anything > actually use it much? What breaks if it is redefined to be the same as, say, > INTEGER? (Or Word.T) > > CHAR is quite useful for processing 7-bit ASCII, and it would be lovely if > that could go back to using the SRC data structures. For people who do stuff > like write VLSI design tools... 
(probably many other large-scale applications > would like it too). > > Mika From dabenavidesd at yahoo.es Sun Jun 3 18:51:51 2012 From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.) Date: Sun, 3 Jun 2012 17:51:51 +0100 (BST) Subject: [M3devel] Renewed interest in Modula-3 in HP Labs In-Reply-To: <1338470019.63945.YahooMailClassic@web29703.mail.ird.yahoo.com> Message-ID: <1338742311.65879.YahooMailClassic@web29703.mail.ird.yahoo.com> Hi all l looking to support a c-backend we would need to know how much can we optimize the energy consumption of any backend CG or how long can we use M3CG in compilation time total (the result could be that we need to distribute precompiled form, see p. 7: http://www.fdi.ucm.es/profesor/ricardo/ei2/crisis.pdf ). This would be a rather good measure of the need of a Object code backend or not (like Gcc, or JVM one, or a translation based like Pascal first implementations were pascal manually machine coded). For instance HP had? HP3000 [1] with several measurements, as their "u-code" Interface was not open but proprietary so you couldn't get their compiler for A-L/SPL (contrary to pascal). I'm sure they have worked out in this problem as well as for newer machines (like for fpga reposition programs for VAXen and Alpha) but how much they will emulate in SW I don't know. I write that because VAX is essentially translated to Alpha via M3CG via HW and equally in SW. I know they are producing VAX in FPGA, but don't know abut Alphas at all. Thanks in advance [1] R. P. Blake, ?Exploring a Stack Architecture,? Computer, vol. 10, no. 5, pp. 30?39, May 1977. --- El jue, 31/5/12, Daniel Alejandro Benavides D. escribi?: De: Daniel Alejandro Benavides D. Asunto: [M3devel] Renewed interest in Modula-3 in HP Labs Para: m3devel at elegosoft.com Fecha: jueves, 31 de mayo, 2012 08:13 Hi all: I see there is some products coming from HP, and others, but specially HP, claiming that provide lower consumption in data center power management. 
As I see they are working in Tycoon as a Data processor (created in Germany and Europe). As Greg Nelson wrote code for profiling the Alphas and Itanium, perhaps they are interested in work on ESC, but nevertheless Modula-3 and family languages (Quest) as Tycoon is based on them. If I may say so, Quest was defined by its simple denotational semantics, which is the natural deduction system of Baby Modula-3 (though it lacks more than that, but you can process the language of it through the former) Do we want to confirm that, if anyone interested in the TML - TVM please write me for any other questions or comments Thanks in advance http://www.eetimes.com/electronics-news/4373994/HP-cuts-data-center-power-in-lab-tests?cid=NL_EETimesDaily http://tycoon.hpl.hp.com/~tycoon/doc/users_manual_en/ch-intro.html http://wwwmatthes.in.tum.de/file/Publications/1992/Math92/paper.pdf -------------- next part -------------- An HTML attachment was scrubbed... URL: From hendrik at topoi.pooq.com Sun Jun 3 23:18:47 2012 From: hendrik at topoi.pooq.com (Hendrik Boom) Date: Sun, 3 Jun 2012 17:18:47 -0400 Subject: [M3devel] Renewed interest in Modula-3 in HP Labs In-Reply-To: <1338742311.65879.YahooMailClassic@web29703.mail.ird.yahoo.com> References: <1338470019.63945.YahooMailClassic@web29703.mail.ird.yahoo.com> <1338742311.65879.YahooMailClassic@web29703.mail.ird.yahoo.com> Message-ID: <20120603211847.GA17923@topoi.pooq.com> On Sun, Jun 03, 2012 at 05:51:51PM +0100, Daniel Alejandro Benavides D. wrote: > semantics, which is the natural deduction system of Baby Modula-3 You keep mentioning Baby Modula 3, but I have no idea what it is. Can you expalin and provide lins? -- hendrik From dabenavidesd at yahoo.es Sun Jun 3 23:48:42 2012 From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.) 
Date: Sun, 3 Jun 2012 22:48:42 +0100 (BST) Subject: [M3devel] Renewed interest in Modula-3 in HP Labs In-Reply-To: <20120603211847.GA17923@topoi.pooq.com> Message-ID: <1338760122.84788.YahooMailClassic@web29705.mail.ird.yahoo.com> Hi all: for sure yes, it's a first-order prototype-oriented functional programming language for writing programming language's type systems (in Spanish-native tongue countries like Abadi's, most common games or toy tool are Baby dolls, if you care. hence its name if I may say so). Basically the? language itself is not dissimilar from Modula-3 in its object-oriented part. It has a type system in lambda calculus, written for its meta-languages as well (e.g. Modula-3). Its denotational semantics are expressed in a natural deduction system logic. Basically was constructed to explain object oriented languages, though it wasn't written specially for that, but for type system calculus construction (you could say a kind of IBM's Axiom for computers science type theoretician? if I may say so). No other system besides DEC ones had ever play with it (its functional language although simple is not easily executable so Cardelli and others decide to use a different calculus for their joint Book "A Theory of Objects"). But at the? very core issue of unification it lead the work on type systems for its times. Thanks in advance --- El dom, 3/6/12, Hendrik Boom escribi?: De: Hendrik Boom Asunto: Re: [M3devel] Renewed interest in Modula-3 in HP Labs Para: m3devel at elegosoft.com Fecha: domingo, 3 de junio, 2012 16:18 On Sun, Jun 03, 2012 at 05:51:51PM +0100, Daniel Alejandro Benavides D. wrote: > semantics, which is the natural deduction system of Baby Modula-3 You keep mentioning Baby Modula 3, but I have no idea what it is.? Can you expalin and provide lins? -- hendrik -------------- next part -------------- An HTML attachment was scrubbed... 
URL:

From dragisha at m3w.org Wed Jun 6 09:57:40 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Wed, 6 Jun 2012 09:57:40 +0200
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To: <20120606064732.2C9242474003@birch.elegosoft.com>
References: <20120606064732.2C9242474003@birch.elegosoft.com>
Message-ID: <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org>

Jay,

What benefit do we expect for cm3 from the 4.6 backend if most of the optimizer is "optimized out" of cm3cg?

If our "trees" are the reason why you must switch optimizations off, is it not more logical to fix our "trees"? One by one, if need be: a look into gm2 (for example), then a fix in our backend. That way, future porting to the most recent gcc's will be much easier?

TIA,
dd

On Jun 6, 2012, at 8:47 AM, Jay Krell wrote:

> Log message:
> remove more of the optimizer

From jay.krell at cornell.edu Wed Jun 6 10:10:06 2012
From: jay.krell at cornell.edu (Jay K)
Date: Wed, 6 Jun 2012 08:10:06 +0000
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To: <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org>
References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org>
Message-ID:

I have very mixed feelings about the optimizer.

1) I'm not certain it is worth the time it takes to run.
2) Fixing our trees isn't necessarily trivial. The most expedient thing is neither to fix the trees nor to remove the optimizer code, but merely to set the optimizer to be off in parse.c.
3) gcc is huge. I'd kind of like to see if actually building it can be made much faster/smaller.
4) Probably what really got me started here is the gmp/mpfr/mpc dependency.
5) The "best" thing isn't necessarily to use gcc at all.
6) I'll maybe move up to 4.7 soon.
6b) and maybe not spend so much time on it? Maybe just ln -s in gmp/mpfr/mpc and port only the needed changes?
Maybe even not using g++ but the hybrid gcc/g++ I use for gcc-apple (4.2) 7) Do folks out there really use the Modula-3/gcc optimizer, and notice it produces code that runs much faster? ?- Jay ________________________________ > From: dragisha at m3w.org > Date: Wed, 6 Jun 2012 09:57:40 +0200 > To: jkrell at elego.de > CC: m3devel at elegosoft.com > Subject: Re: [M3devel] [M3commit] CVS Update: cm3 > > Jay, > > What benefit from 4.6 backend do we expect for cm3 if most of optimizer > is "optimized out" of cm3cg? > > If our "trees" are reason why you must switch optimizations off, is it > not more logical to fix our "trees"? One by one, if need be. A look > into gm2 (for example), a fix in our backend. That way, future porting > to most recent gcc's will be much easier? > > TIA, > dd > > On Jun 6, 2012, at 8:47 AM, Jay Krell wrote: > > Log message: > remove more of the optimizer > From jay.krell at cornell.edu Wed Jun 6 10:15:32 2012 From: jay.krell at cornell.edu (Jay K) Date: Wed, 6 Jun 2012 08:15:32 +0000 Subject: [M3devel] [M3commit] CVS Update: cm3 In-Reply-To: References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org>, Message-ID: > > What benefit from 4.6 backend do we expect for cm3 if most of optimizer ps: just the general goodness of staying current. Even if a hacked up current. 4.7.0 is out already.. ?- Jay ---------------------------------------- > From: jay.krell at cornell.edu > To: dragisha at m3w.org; jkrell at elego.de > CC: m3devel at elegosoft.com > Subject: RE: [M3devel] [M3commit] CVS Update: cm3 > Date: Wed, 6 Jun 2012 08:10:06 +0000 > > > I have very mixed feelings about the optimizer. > 1) I'm not certain it is worth the time it takes to run. > 2) Fixing our trees isn't necessarily trivial. > The most expedient thing is neither to fix the trees, nor remove the optimizer code, but merely > to set the optimizer to be off in parse.c. 
> 3) gcc is huge, I'd kind of like to see if I can get actually building it can be made much faster/smaller > 4) Probably what really got me started here is the gmp/mpfr/mpc dependency. > 5) The "best" thing isn't necessarily to use gcc at all. > 6) I'll maybe move up to 4.7 soon. > 6b) and maybe not spend so much time on it? Maybe just ln -s in gmp/mpfr/mpc and port only the needed changes? > Maybe even not using g++ but the hybrid gcc/g++ I use for gcc-apple (4.2) > 7) Do folks out there really use the Modula-3/gcc optimizer, and notice it produces code that runs much faster? > > > - Jay > > > ________________________________ > > From: dragisha at m3w.org > > Date: Wed, 6 Jun 2012 09:57:40 +0200 > > To: jkrell at elego.de > > CC: m3devel at elegosoft.com > > Subject: Re: [M3devel] [M3commit] CVS Update: cm3 > > > > Jay, > > > > What benefit from 4.6 backend do we expect for cm3 if most of optimizer > > is "optimized out" of cm3cg? > > > > If our "trees" are reason why you must switch optimizations off, is it > > not more logical to fix our "trees"? One by one, if need be. A look > > into gm2 (for example), a fix in our backend. That way, future porting > > to most recent gcc's will be much easier? > > > > TIA, > > dd > > > > On Jun 6, 2012, at 8:47 AM, Jay Krell wrote: > > > > Log message: > > remove more of the optimizer > > > From dragisha at m3w.org Wed Jun 6 10:51:33 2012 From: dragisha at m3w.org (=?utf-8?Q?Dragi=C5=A1a_Duri=C4=87?=) Date: Wed, 6 Jun 2012 10:51:33 +0200 Subject: [M3devel] [M3commit] CVS Update: cm3 In-Reply-To: References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org> Message-ID: I am using it, and I need it. Does it run better/faster? I didn't test, but is it something to even ask, these days, architectures, ? ? Only if you turned everything off in 5.8.6 and later, as you'r doing it now, then probably my "-O2" default it is of no benefit at all :). 
Generally, our "pitch" to "sell" super-modern-ultra-blast-mega-fast-superlative-OO and everything else you only dreamed about? And add "no CPU optimizations"? Imagine that.

On Jun 6, 2012, at 10:10 AM, Jay K wrote:

> 7) Do folks out there really use the Modula-3/gcc optimizer, and notice it produces code that runs much faster?

From jay.krell at cornell.edu Wed Jun 6 11:38:18 2012
From: jay.krell at cornell.edu (Jay K)
Date: Wed, 6 Jun 2012 09:38:18 +0000
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To:
References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org> ,
Message-ID:

5.8.6 does allow many optimizations to occur.
We turn off a very small number directly.
Functions that call setjmp have optimizations inhibited by declaring all locals volatile.
We don't give the compiler good type information, and we take the address of stuff more than necessary, by generating very low level code.

Where you have e.g.

MODULE Foo;
TYPE Point = RECORD x,y:INTEGER END;
PROCEDURE GetY(VAR pt:Point):INTEGER =
  BEGIN
    RETURN pt.y;
  END GetY;

We generate the equivalent of:

#include <stddef.h>  /* for ptrdiff_t */

typedef ptrdiff_t INTEGER;
typedef char* ADDRESS;

INTEGER Foo_GetY(ADDRESS pt)
{
    return *(INTEGER*)(pt + sizeof(INTEGER));
}

Maybe I'll wrap up 4.6, not enable it, and move on to 4.7..

 - Jay

________________________________
> Subject: Re: [M3devel] [M3commit] CVS Update: cm3
> From: dragisha at m3w.org
> Date: Wed, 6 Jun 2012 10:51:33 +0200
> CC: jkrell at elego.de; m3devel at elegosoft.com
> To: jay.krell at cornell.edu
>
> I am using it, and I need it.
>
> Does it run better/faster? I didn't test, but is it something to even
> ask, these days, architectures, ? ?
>
> Only if you turned everything off in 5.8.6 and later, as you'r doing it
> now, then probably my "-O2" default it is of no benefit at all :).
>
> Generally, our "pitch" to "sell"
> super-modern-ultra-blast-mega-fast-superlative-OO and everything else
> you only dreamed about? And add "no CPU optimizations"? Imagine that.
>
> On Jun 6, 2012, at 10:10 AM, Jay K wrote:
>
> 7) Do folks out there really use the Modula-3/gcc optimizer, and notice
> it produces code that runs much faster?
>

From jay.krell at cornell.edu Wed Jun 6 11:42:52 2012
From: jay.krell at cornell.edu (Jay K)
Date: Wed, 6 Jun 2012 09:42:52 +0000
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To:
References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org>
Message-ID:

 > Functions that call setjmp

I meant -- functions with TRY/EXCEPT or TRY/FINALLY. :)

 - Jay

----------------------------------------
> From: jay.krell at cornell.edu
> To: dragisha at m3w.org
> Date: Wed, 6 Jun 2012 09:38:18 +0000
> CC: jkrell at elego.de; m3devel at elegosoft.com
> Subject: Re: [M3devel] [M3commit] CVS Update: cm3
>
>
> 5.8.6 does allow many optimizations to occur.
> We turn off a very small number directly.
> Functions that call setjmp have optimizations inhibited by declaring all locals volatile.
> We don't give the compiler good type information, and we take the address of stuff more than necessary, by
> generating very low level code.
> Where you have e.g.
> MODULE Foo;
> TYPE Point = RECORD x,y:INTEGER END;
> PROCEDURE GetY(VAR pt:Point):INTEGER = BEGIN RETURN pt.y; END GetY;
>
>
> We generate the equivalent of:
>
>
> typedef ptrdiff_t INTEGER;
> typedef char* ADDRESS;
> INTEGER Foo_GetY(ADDRESS pt) { return *(INTEGER*)(pt + sizeof(INTEGER)); }
>
>
> Maybe I'll wrap up 4.6, not enable it, and move on to 4.7..
>
>
> - Jay
>
>
> ________________________________
> > Subject: Re: [M3devel] [M3commit] CVS Update: cm3
> > From: dragisha at m3w.org
> > Date: Wed, 6 Jun 2012 10:51:33 +0200
> > CC: jkrell at elego.de; m3devel at elegosoft.com
> > To: jay.krell at cornell.edu
> >
> > I am using it, and I need it.
> >
> > Does it run better/faster? I didn't test, but is that even something
> > to ask, given today's architectures?
> >
> > Only if you turned everything off in 5.8.6 and later, as you're doing
> > now, then my "-O2" default is probably of no benefit at all :).
> >
> > Generally, our "pitch" to "sell"
> > super-modern-ultra-blast-mega-fast-superlative-OO and everything else
> > you only dreamed about? And add "no CPU optimizations"? Imagine that.
> >
> > On Jun 6, 2012, at 10:10 AM, Jay K wrote:
> >
> > 7) Do folks out there really use the Modula-3/gcc optimizer, and notice
> > it produces code that runs much faster?
> >
>

From dragisha at m3w.org Wed Jun 6 12:17:54 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Wed, 6 Jun 2012 12:17:54 +0200
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To:
References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org>
Message-ID:

I know that much about generated code :).

"Good" thing is - not many things changed in *m3 backend since I ported pm3 to LINUX_ALPHA :)

On Jun 6, 2012, at 11:42 AM, Jay K wrote:

>
> > Functions that call setjmp
>
>
> I meant -- functions with TRY/EXCEPT or TRY/FINALLY. :)
>
> - Jay
>
> ----------------------------------------
>> From: jay.krell at cornell.edu
>> To: dragisha at m3w.org
>> Date: Wed, 6 Jun 2012 09:38:18 +0000
>> CC: jkrell at elego.de; m3devel at elegosoft.com
>> Subject: Re: [M3devel] [M3commit] CVS Update: cm3
>>
>>
>> 5.8.6 does allow many optimizations to occur.
>> We turn off a very small number directly.
>> Functions that call setjmp have optimizations inhibited by declaring all locals volatile.
>> We don't give the compiler good type information, and we take the address of stuff more than necessary, by
>> generating very low level code.
>> Where you have e.g.
>> MODULE Foo;
>> TYPE Point = RECORD x,y:INTEGER END;
>> PROCEDURE GetY(VAR pt:Point):INTEGER = BEGIN RETURN pt.y; END GetY;
>>
>>
>> We generate the equivalent of:
>>
>>
>> typedef ptrdiff_t INTEGER;
>> typedef char* ADDRESS;
>> INTEGER Foo_GetY(ADDRESS pt) { return *(INTEGER*)(pt + sizeof(INTEGER)); }
>>
>>
>> Maybe I'll wrap up 4.6, not enable it, and move on to 4.7..
>>
>>
>>
>> - Jay
>>
>>
>> ________________________________
>>> Subject: Re: [M3devel] [M3commit] CVS Update: cm3
>>> From: dragisha at m3w.org
>>> Date: Wed, 6 Jun 2012 10:51:33 +0200
>>> CC: jkrell at elego.de; m3devel at elegosoft.com
>>> To: jay.krell at cornell.edu
>>>
>>> I am using it, and I need it.
>>>
>>> Does it run better/faster? I didn't test, but is that even something
>>> to ask, given today's architectures?
>>>
>>> Only if you turned everything off in 5.8.6 and later, as you're doing
>>> now, then my "-O2" default is probably of no benefit at all :).
>>>
>>> Generally, our "pitch" to "sell"
>>> super-modern-ultra-blast-mega-fast-superlative-OO and everything else
>>> you only dreamed about? And add "no CPU optimizations"? Imagine that.
>>>
>>> On Jun 6, 2012, at 10:10 AM, Jay K wrote:
>>>
>>> 7) Do folks out there really use the Modula-3/gcc optimizer, and notice
>>> it produces code that runs much faster?
>>>
>>
>

From dabenavidesd at yahoo.es Wed Jun 6 16:17:23 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Wed, 6 Jun 2012 15:17:23 +0100 (BST)
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To:
Message-ID: <1338992243.7847.YahooMailClassic@web29704.mail.ird.yahoo.com>

Hi all:
I noticed originally factored code is better, and if it's dead then that's optimization.
I don't know too much about gcc or gdb, but factoring to match open64 (C++) might be better.
About Alphas, I know that the DEC Firefly was commercialized as the SMP VS3520/40 and the unreleased V3820/40; given that a DB vendor ported products to it, shouldn't we use their backends for a DB machine? Besides that, I think that developing a product for that end is what HP is doing:
http://www.zdnetasia.com/hp-aiming-for-data-protection-battleground-62305019.htm?src=newsletter
That said, Alphas wouldn't use gcc but their own backend-directed optimizer, like for their DEClanguages internal products.
Thanks in advance

--- On Wed, 6/6/12, Dragiša Durić wrote:

From: Dragiša Durić
Subject: Re: [M3devel] [M3commit] CVS Update: cm3
To: "Jay K"
CC: "Jay Krell", "m3devel"
Date: Wednesday, June 6, 2012 05:17

I know that much about generated code :).

"Good" thing is - not many things changed in *m3 backend since I ported pm3 to LINUX_ALPHA :)

On Jun 6, 2012, at 11:42 AM, Jay K wrote:

>
> > Functions that call setjmp
>
>
> I meant -- functions with TRY/EXCEPT or TRY/FINALLY. :)
>
> - Jay
>
> ----------------------------------------
>> From: jay.krell at cornell.edu
>> To: dragisha at m3w.org
>> Date: Wed, 6 Jun 2012 09:38:18 +0000
>> CC: jkrell at elego.de; m3devel at elegosoft.com
>> Subject: Re: [M3devel] [M3commit] CVS Update: cm3
>>
>>
>> 5.8.6 does allow many optimizations to occur.
>> We turn off a very small number directly.
>> Functions that call setjmp have optimizations inhibited by declaring all locals volatile.
>> We don't give the compiler good type information, and we take the address of stuff more than necessary, by
>> generating very low level code.
>> Where you have e.g.
>> MODULE Foo;
>> TYPE Point =
RECORD x,y:INTEGER END;
>> PROCEDURE GetY(VAR pt:Point):INTEGER = BEGIN RETURN pt.y; END GetY;
>>
>>
>> We generate the equivalent of:
>>
>>
>> typedef ptrdiff_t INTEGER;
>> typedef char* ADDRESS;
>> INTEGER Foo_GetY(ADDRESS pt) { return *(INTEGER*)(pt + sizeof(INTEGER)); }
>>
>>
>> Maybe I'll wrap up 4.6, not enable it, and move on to 4.7..
>>
>>
>>
>> - Jay
>>
>>
>> ________________________________
>>> Subject: Re: [M3devel] [M3commit] CVS Update: cm3
>>> From: dragisha at m3w.org
>>> Date: Wed, 6 Jun 2012 10:51:33 +0200
>>> CC: jkrell at elego.de; m3devel at elegosoft.com
>>> To: jay.krell at cornell.edu
>>>
>>> I am using it, and I need it.
>>>
>>> Does it run better/faster? I didn't test, but is that even something
>>> to ask, given today's architectures?
>>>
>>> Only if you turned everything off in 5.8.6 and later, as you're doing
>>> now, then my "-O2" default is probably of no benefit at all :).
>>>
>>> Generally, our "pitch" to "sell"
>>> super-modern-ultra-blast-mega-fast-superlative-OO and everything else
>>> you only dreamed about? And add "no CPU optimizations"? Imagine that.
>>>
>>> On Jun 6, 2012, at 10:10 AM, Jay K wrote:
>>>
>>> 7) Do folks out there really use the Modula-3/gcc optimizer, and notice
>>> it produces code that runs much faster?
>>>
>>
>

From mika at async.caltech.edu Wed Jun 6 18:18:08 2012
From: mika at async.caltech.edu (Mika Nystrom)
Date: Wed, 06 Jun 2012 09:18:08 -0700
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To:
References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org>
Message-ID: <20120606161808.7F5EA1A205B@async.async.caltech.edu>

Jay K writes:
> ...
>7) Do folks out there really use the Modula-3/gcc optimizer, and notice it produces code that runs much faster?

If we are talking about turning on optimizations in the m3makefile, then the answer is:

Yes! At least with CM3 it makes a huge difference in runtime. Without the optimizer CM3-produced code runs far slower than PM3-produced code (I've seen 3X I think.) With it, CM3 can sometimes keep up. Unless you use a lot of TYPECASE or other constructs that have a much less efficient implementation in the CM3 libraries than in the PM3 libraries.

     Mika

From dabenavidesd at yahoo.es Wed Jun 6 20:50:59 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Wed, 6 Jun 2012 19:50:59 +0100 (BST)
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To: <20120606161808.7F5EA1A205B@async.async.caltech.edu>
Message-ID: <1339008659.61806.YahooMailClassic@web29705.mail.ird.yahoo.com>

Hi all:
this is very bad news, sounds like we had an old RT. I wonder how parallelized DEC-SRC Vulcan or similar environments were.
Thanks in advance

--- On Wed, 6/6/12, Mika Nystrom wrote:

From: Mika Nystrom
Subject: Re: [M3devel] [M3commit] CVS Update: cm3
To: "Jay K"
CC: m3devel at elegosoft.com
Date: Wednesday, June 6, 2012 11:18

Jay K writes:
> ...
>7) Do folks out there really use the Modula-3/gcc optimizer, and notice it produces code that runs much faster?

If we are talking about turning on optimizations in the m3makefile, then the answer is:

Yes! At least with CM3 it makes a huge difference in runtime. Without the optimizer CM3-produced code runs far slower than PM3-produced code (I've seen 3X I think.) With it, CM3 can sometimes keep up. Unless you use a lot of TYPECASE or other constructs that have a much less efficient implementation in the CM3 libraries than in the PM3 libraries.

     Mika

From hendrik at topoi.pooq.com Thu Jun 7 02:06:30 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Wed, 6 Jun 2012 20:06:30 -0400
Subject: [M3devel] ran out of space in /tmp while building .deb
Message-ID: <20120607000630.GA4233@topoi.pooq.com>

While trying to build a deb for modula 3 on my laptop (a wheezy 32-bit intel machine) /tmp got full and the build aborted.

Obviously, I should place /tmp elsewhere -- except that there's no entry in my /etc/fstab telling it where the tmpfs should be mounted. If I could just get it not to mount anything on /tmp things should be fine. Apparently, though, the kernel just knows better, and I'm stuck with a small /tmp.

Is there any way to tell make-dist.py that it's supposed to put its temporary files somewhere other than /tmp?

-- hendrik

From dragisha at m3w.org Thu Jun 7 03:02:19 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Thu, 7 Jun 2012 03:02:19 +0200
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To: <20120607011634.468b6bbf@wenus.next.com.pl>
References: <20120606064732.2C9242474003@birch.elegosoft.com> <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org> <20120607011634.468b6bbf@wenus.next.com.pl>
Message-ID: <741029B0-331E-4E10-9886-86A78B0ED3CC@m3w.org>

Try ALPHA_LINUX, maybe ask Jay first :)

On Jun 7, 2012, at 1:16 AM, Dariusz Knociński wrote:

> On 2012-06-06, at 12:17:54,
> Dragiša Durić wrote:
>
>> I know that much about generated code :).
>>
>> "Good" thing is - not many things changed in *m3 backend since I ported pm3
>> to LINUX_ALPHA :)
>>
> Let me ask a stupid question. Is cm3 working on LINUX_ALPHA? I have one ES40
> working server with Gentoo Linux.
>
> Best Regards
> Dariusz Knociński.

From jay.krell at cornell.edu Thu Jun 7 03:19:20 2012
From: jay.krell at cornell.edu (Jay K)
Date: Thu, 7 Jun 2012 01:19:20 +0000
Subject: [M3devel] ran out of space in /tmp while building .deb
In-Reply-To: <20120607000630.GA4233@topoi.pooq.com>
References: <20120607000630.GA4233@topoi.pooq.com>
Message-ID:

Use the source. Change it if needed.

 - Jay

> Date: Wed, 6 Jun 2012 20:06:30 -0400
> From: hendrik at topoi.pooq.com
> To: m3devel at elegosoft.com
> Subject: [M3devel] ran out of space in /tmp while building .deb
>
> While trying to build a deb for modula 3 on my laptop (a wheezy 32-bit
> intel machine) /tmp got full and the build aborted.
>
> Obviously, I should place /tmp elsewhere -- except that there's no entry
> in my /etc/fstab telling it where the tmpfs should be mounted. If I
> could just get it not to mount anything on /tmp things should be fine.
> Apparently, though, the kernel just knows better, and I'm stuck with a
> small /tmp.
>
> Is there any way to tell make-dist.py that it's supposed to put its
> temporary files somewhere other than /tmp?
>
> -- hendrik
>

From jay.krell at cornell.edu Thu Jun 7 03:28:15 2012
From: jay.krell at cornell.edu (Jay K)
Date: Thu, 7 Jun 2012 01:28:15 +0000
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To: <741029B0-331E-4E10-9886-86A78B0ED3CC@m3w.org>
References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org>, <20120607011634.468b6bbf@wenus.next.com.pl>, <741029B0-331E-4E10-9886-86A78B0ED3CC@m3w.org>
Message-ID:

 > Is cm3 working on LINUX_ALPHA? I have one ES40 working server with Gentoo Linux

I don't think it does yet, but give me ssh access and I can most likely make it work pretty quickly.
There is very very very little to porting these days.
The main thing is finding the jmpbuf size, and adding the target to various tables, describing it as little or big endian, 32bit or 64bit, etc., but even that is often automatic: if it starts "alpha_" or contains "64", it is assumed 64bit. If it contains "alpha", it is probably assumed little endian. If it contains "_linux", then it is assumed Linux, etc.

The jmpbuf size we can just assume something big like 1k (that is a tremendous overkill). jmpbuf size should/will soon be eliminated as a factor in porting anyway.
And then you just need to create a config file ALPHA_LINUX that includes "Alpha64.common" and "Linux.common" or such.

Does ALPHA_LINUX have a 32bit mode/ABI?
Or is it all 64bit all the time?
i.e. what does this do:
echo > foo.c
gcc -m32 foo.c

I had some Alphas but I've sold them all.
I was given access to Alphas running Tru64 v4.something and v5.something and got that to work.
But the "kernel" (Tru64 vs. Linux) and not the "processor architecture" (alpha, x86, sparc) is generally the larger concern, and Linux is really old hat at this point.

See..one day...we'll generate C (and maybe have cooperative suspend) and these questions will all just go away. The answer will be "of course, most likely, nothing special".

 - Jay

> From: dragisha at m3w.org
> Date: Thu, 7 Jun 2012 03:02:19 +0200
> To: dknoto at gmail.com
> CC: m3devel at elegosoft.com
> Subject: Re: [M3devel] [M3commit] CVS Update: cm3
>
> Try ALPHA_LINUX, maybe ask Jay first :)
>
> On Jun 7, 2012, at 1:16 AM, Dariusz Knociński wrote:
>
> > On 2012-06-06, at 12:17:54,
> > Dragiša Durić wrote:
> >
> >> I know that much about generated code :).
> >>
> >> "Good" thing is - not many things changed in *m3 backend since I ported pm3
> >> to LINUX_ALPHA :)
> >>
> > Let me ask a stupid question. Is cm3 working on LINUX_ALPHA? I have one ES40
> > working server with Gentoo Linux.
> >
> > Best Regards
> > Dariusz Knociński.
>
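
The name-based guessing Jay describes in the message above can be sketched in a few lines of C. This is only an illustration of those rules of thumb, not the actual cm3 code: the names TargetGuess, lower_copy and guess_target are hypothetical, and matching case-insensitively is an assumption made here so that ALPHA_LINUX matches "alpha_" and "_linux".

```c
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result type: what is assumed about a target from its name
   alone. "false" means "no assumption made", not "known to be false". */
typedef struct {
    bool assumed_64bit;
    bool assumed_little_endian;
    bool assumed_linux;
} TargetGuess;

/* Copy name into buf in lower case, truncating to fit. */
static void lower_copy(char *buf, size_t size, const char *name)
{
    size_t i = 0;
    for (; name[i] != '\0' && i + 1 < size; i++)
        buf[i] = (char)tolower((unsigned char)name[i]);
    buf[i] = '\0';
}

/* Apply the rules of thumb from the message (illustrative only):
   starts with "alpha_" or contains "64"  -> assume 64bit
   contains "alpha"                       -> assume little endian
   contains "_linux"                      -> assume Linux */
TargetGuess guess_target(const char *name)
{
    char n[64];
    TargetGuess g;

    lower_copy(n, sizeof n, name);
    g.assumed_64bit = strncmp(n, "alpha_", 6) == 0 || strstr(n, "64") != NULL;
    g.assumed_little_endian = strstr(n, "alpha") != NULL;
    g.assumed_linux = strstr(n, "_linux") != NULL;
    return g;
}
```

Under these assumptions, guess_target("ALPHA_LINUX") sets all three fields, while guess_target("I386_LINUX") only sets assumed_linux. Anything the name does not settle (jmpbuf size, for instance) would still have to come from a table or a config file.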

From jay.krell at cornell.edu Thu Jun 7 06:45:38 2012
From: jay.krell at cornell.edu (Jay K)
Date: Thu, 7 Jun 2012 04:45:38 +0000
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To: <20120606161808.7F5EA1A205B@async.async.caltech.edu>
References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org>, <20120606161808.7F5EA1A205B@async.async.caltech.edu>
Message-ID:

Daniel, I can't find the email now, as usual, you are probably wrong.

We don't have an older runtime, we have a newer one, I think.
With more allowance for dynamic loading.

Mika,
Maybe a TYPECASE-intense design is generally poor?
dynamic_cast is slow in some C++ implementations.
And I've never seen it used much. Some, but not much.
The "type matching" that C++ exception handling has to do
isn't particularly fast, though there are other costs there.
Other than the stack walk, there is "finding the base of the object",
and strcmp to do the actual type match -- name-based-type-equality
and all that, with a hope that it suffices and no runtime checking
of type hashes like Modula-3 does..

Maybe you should switch on your own type tag?
  But I guess Modula-3 doesn't have unions.
Or use OBJECT and method calls?

Which reminds me...it bothers me that OBJECT requires
heap allocation and garbage collection. It shouldn't require either.
I know we have function pointers available to simulate it,
without heap allocation, but what I don't know, is if the "implicit downcast"
in a virtual function/method call is doable in safe code or not.
I'll have to look into it..but I'm busy now..

Maybe there is an optimization whereby the compiler
can figure out that there is a small set of likely types
that it could check first?

Or maybe the full feature could be implemented more efficiently?

Maybe it can be optimized based on the fact that the types
known to the system are read-mostly, rarely written/appended?

I don't know.
I'd really have to look into what the language supports
and how it is implemented. I'm not certain of either.

In C++, typeid() is fast, and requires there be virtual
functions (OBJECT). Is TYPECASE limited to OBJECTs?
Or heap allocated data?

Later..
 - Jay

----------------------------------------
> To: jay.krell at cornell.edu
> CC: m3devel at elegosoft.com
> Subject: Re: [M3devel] [M3commit] CVS Update: cm3
> Date: Wed, 6 Jun 2012 09:18:08 -0700
> From: mika at async.caltech.edu
>
> Jay K writes:
> > ...
> >7) Do folks out there really use the Modula-3/gcc optimizer, and notice it produces code that runs much faster?
>
> If we are talking about turning on optimizations in the m3makefile, then the
> answer is:
>
> Yes! At least with CM3 it makes a huge difference in runtime. Without
> the optimizer CM3-produced code runs far slower than PM3-produced code
> (I've seen 3X I think.) With it, CM3 can sometimes keep up. Unless you
> use a lot of TYPECASE or other constructs that have a much less efficient
> implementation in the CM3 libraries than in the PM3 libraries.
>
> Mika

From dragisha at m3w.org Thu Jun 7 09:30:29 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Thu, 7 Jun 2012 09:30:29 +0200
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To:
References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org>, <20120606161808.7F5EA1A205B@async.async.caltech.edu>
Message-ID: <0DF4844B-46D5-4AC7-97AD-AE18A38C2BED@m3w.org>

Exactly. Relevant parts of initialization are incremental.

On Jun 7, 2012, at 6:45 AM, Jay K wrote:

> Daniel, I can't find the email now, as usual, you are probably wrong.
>
>
> We don't have an older runtime, we have a newer one, I think.
> With more allowance for dynamic loading.

From dabenavidesd at yahoo.es Thu Jun 7 16:48:24 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Thu, 7 Jun 2012 15:48:24 +0100 (BST)
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To: <0DF4844B-46D5-4AC7-97AD-AE18A38C2BED@m3w.org>
Message-ID: <1339080504.10970.YahooMailClassic@web29703.mail.ird.yahoo.com>

Hi all:
Yes, but your estimation of that kind of user behavior, with respect to a programmer educated in using a true multitasking machine, is not accurate.
You can't read a program in two parts on the same machine, you need two different people; that is what makes it a true system of processors, and you need a different kind of system to execute some described action.
Little is said about this: if you need to modify OS code and someone else also needs to, how do you change the OS without interfering with the other? How do you maintain a consistent view of your system?
The DEC-SRC people were very well educated and thought that this ease did not hold in their system (Bob Taylor). They created yet another improvement to Modula-3+ in Modula-2+e.
Instead of taking inspiration from that kind of system, they developed a newer one, but I don't know much more than that it was a Win-like system.
That is the reason why Modula-3, in the object code view, isn't quite like many other traditional fixed-machine OSes (systems that don't scale anyhow). Instead of virtual machinery you are confronted with a true multitasking machine.
OK, if you care about that, think that what remains to be done for Modula-3 is the full formal definition of the language, which starts in Baby Modula-3 and consists in the user of the language using it in its own description (hard to explain, but that's the only way, I'm afraid).
Thanks in advance

--- On Thu, 7/6/12, Dragiša Durić wrote:

From: Dragiša Durić
Subject: Re: [M3devel] [M3commit] CVS Update: cm3
To: "Jay K"
CC: "m3devel"
Date: Thursday, June 7, 2012 02:30

Exactly. Relevant parts of initialization are incremental.

On Jun 7, 2012, at 6:45 AM, Jay K wrote:

Daniel, I can't find the email now, as usual, you are probably wrong.

We don't have an older runtime, we have a newer one, I think.
With more allowance for dynamic loading.

From hendrik at topoi.pooq.com Thu Jun 7 17:35:52 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Thu, 7 Jun 2012 11:35:52 -0400
Subject: [M3devel] ran out of space in /tmp while building .deb
In-Reply-To:
References: <20120607000630.GA4233@topoi.pooq.com>
Message-ID: <20120607153552.GA8202@topoi.pooq.com>

On Thu, Jun 07, 2012 at 01:19:20AM +0000, Jay K wrote:
>
> Use the source. Change it if needed. - Jay

Thanks. But before I started hacking the source I found another way. It turns out that there's a parameter that suppresses mounting /tmp as a tmpfs, and it seems Debian thinks they got it wrong, and when the current initscripts trickle down from sid to testing the problem will go away by itself. I didn't wait; I changed the parameter; I won't have to hack the source.

-- hendrik

From hendrik at topoi.pooq.com Thu Jun 7 17:37:29 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Thu, 7 Jun 2012 11:37:29 -0400
Subject: [M3devel] .debs for modula 3
In-Reply-To:
References: <20120607000630.GA4233@topoi.pooq.com>
Message-ID: <20120607153729.GB8202@topoi.pooq.com>

By the way, is there anything I should be doing with these .debs I'm creating other than just using them myself?

-- hendrik

From dabenavidesd at yahoo.es Thu Jun 7 18:06:53 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Thu, 7 Jun 2012 17:06:53 +0100 (BST)
Subject: [M3devel] .debs for modula 3
In-Reply-To: <20120607153729.GB8202@topoi.pooq.com>
Message-ID: <1339085213.73020.YahooMailClassic@web29702.mail.ird.yahoo.com>

Hi all:
Encouraging. What about a chip cipher sign from the repository?
I like the idea of signing the .deb, if there were a utility to sign them, either by yourself or by the Elego folks who want to recreate them there (I think this is mostly Perl):
http://www.advogato.org/article/750.html

A different question is whether that sharing of packages is accepted by the Elego admin, since most of the development does not occur only there, so should we use a central or a distributed development model? (Only DEC-SRC used their Vesta to sign cached builds, but maybe others used it in DEC-*, etc.)
Thanks in advance

--- On Thu, 7/6/12, Hendrik Boom wrote:

From: Hendrik Boom
Subject: [M3devel] .debs for modula 3
To: "m3devel"
Date: Thursday, June 7, 2012 10:37

By the way, is there anything I should be doing with these .debs I'm creating other than just using them myself?

-- hendrik

From mika at async.caltech.edu Thu Jun 7 18:36:41 2012
From: mika at async.caltech.edu (Mika Nystrom)
Date: Thu, 07 Jun 2012 09:36:41 -0700
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To:
References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org>, <20120606161808.7F5EA1A205B@async.async.caltech.edu>
Message-ID: <20120607163641.A81351A205B@async.async.caltech.edu>

Hi Jay,

TYPECASE is limited to "reference" types, which effectively means heap-allocated. Unless you can get alloca in there, I suppose... what I mean is that in Green Book Modula-3 the only way to get a reference type is either through a heap allocation or an UNSAFE operation.

TYPECASE is sometimes the only way to do things. In the Green Book there are examples of using subtyping to have multiple generations of objects in the same pickles, for example. In my program, it was inside an interpreter that's figuring things out without any prior type information, using ISTYPE or TYPECASE.
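
As a rough illustration of what an ISTYPE or TYPECASE test involves at run time, here is a minimal C sketch. It assumes that every heap object starts with a header pointing at a type descriptor, and that each descriptor records its parent type; the names TypeDesc, ObjHeader, is_type and describe are hypothetical, and this is not the m3core implementation of either PM3 or CM3.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical runtime type descriptor: each type knows its parent. */
typedef struct TypeDesc {
    const char            *name;
    const struct TypeDesc *parent;   /* NULL for a root type */
} TypeDesc;

/* Hypothetical heap object header: records the allocated type. */
typedef struct {
    const TypeDesc *type;
} ObjHeader;

/* ISTYPE-like test: is the object's allocated type t, or a subtype of t?
   Walks the supertype chain, so the cost grows with the depth. */
bool is_type(const ObjHeader *obj, const TypeDesc *t)
{
    for (const TypeDesc *d = obj->type; d != NULL; d = d->parent)
        if (d == t)
            return true;
    return false;
}

/* Example hierarchy, as an interpreter's value types might look. */
static const TypeDesc T_Value  = { "Value",  NULL };
static const TypeDesc T_Number = { "Number", &T_Value };
static const TypeDesc T_Int    = { "Int",    &T_Number };
static const TypeDesc T_Text   = { "Text",   &T_Value };

/* TYPECASE-like dispatch: arms are tried in order, first match wins,
   so more specific types are listed before their supertypes. */
const char *describe(const ObjHeader *obj)
{
    if (is_type(obj, &T_Int))    return "integer";
    if (is_type(obj, &T_Number)) return "number";
    if (is_type(obj, &T_Text))   return "text";
    return "other value";
}
```

An object allocated as Int satisfies is_type for Int, Number and Value, and describe picks the first matching arm. An interpreter that runs such a dispatch on every value it touches pays for each of these tests, which is why the cost of the underlying subtype check matters.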

The issue with TYPECASE that I brought up is actually that the implementation of TYPECASE and ISTYPE is far slower in the CM3 m3core than in PM3's (= SRC M3 as far as I know). The reason (which you allude to) is that Critical Mass did a lot of work on supporting dynamic loading of Modula-3 code (loading in types not known at compile time) and as with many of the other projects they carried out, the code quality was so-so. Because of the restrictions of SRC M3 and PM3, types are statically allocated at compile time and all their subtyping relationships are known at that time. There is simply a static array of the types. CM3, on the other hand, has some more complicated dynamic data structure that makes all the TYPECASE and ISTYPE operations much more cumbersome. It's all in RT0 somewhere.

In short, CM3 does "more" than SRC M3 did but at a heavy performance cost. And of course no one uses the "more" bit now. Kind of like what they did to TEXTs... good ideas for some users, but somewhat half-baked implementation.

Given that dynamic loading is used so little, if at all, and it in any case only happens infrequently itself, it seems there ought to be a way to achieve what the CM3 guys were trying to do while retaining the performance of the older implementation, but not if your code is a "rush job". I think it would have been sensible to vet Critical Mass's code a bit better before switching from PM3 to CM3 for the "official" distribution of Modula-3.

I still use PM3 quite a bit. I can no longer blame the TEXTs, nor can I blame the pthreads implementation's being broken since I use CM3 with user threads. Now it's mainly because m3gdb works great on FreeBSD-5.5 with PM3-generated code. I've tried so many times to get things working on other machines with CM3 and newer m3gdb and there's always something annoyingly wrong. Life's too short...

     Mika

P.S. how are the pthreads coming along? I saw some checkins (Dragisa), does the thread tester run without hanging or crashing now?
I'd love to use pthreads but it's not been high on my list to debug as long as I can live with user threads...

Jay K writes:
>
>Daniel, I can't find the email now, as usual, you are probably wrong.
>
>
>We don't have an older runtime, we have a newer one, I think.
>With more allowance for dynamic loading.
>
>
>Mika,
>Maybe a TYPECASE-intense design is generally poor?
>dynamic_cast is slow in some C++ implementations.
>And I've never seen it used much. Some, but not much.
>The "type matching" that C++ exception handling has to do
>isn't particularly fast, though there are other costs there.
>Other than the stack walk, there is "finding the base of the object",
>and strcmp to do the actual type match -- name-based-type-equality
>and all that, with a hope that it suffices and no runtime checking
>of type hashes like Modula-3 does..
>
>
>Maybe you should switch on your own type tag?
>  But I guess Modula-3 doesn't have unions.
>Or use OBJECT and method calls?
>
>
>Which reminds me...it bothers me that OBJECT requires
>heap allocation and garbage collection. It shouldn't require either.
>I know we have function pointers available to simulate it,
>without heap allocation, but what I don't know, is if the "implicit downcast"
>in a virtual function/method call is doable in safe code or not.
>I'll have to look into it..but I'm busy now..
>
>
>Maybe there is an optimization whereby the compiler
>can figure out that there is a small set of likely types
>that it could check first?
>
>
>Or maybe the full feature could be implemented more efficiently?
>
>
>Maybe it can be optimized based on the fact that the types
>known to the system are read-mostly, rarely written/appended?
>
>
>I don't know.
>I'd really have to look into what the language supports
>and how it is implemented. I'm not certain of either.
>
>
>In C++, typeid() is fast, and requires there be virtual
>functions (OBJECT). Is TYPECASE limited to OBJECTs?
>Or heap allocated data?
>
>
>Later..
> - Jay
>
>
>----------------------------------------
>> To: jay.krell at cornell.edu
>> CC: m3devel at elegosoft.com
>> Subject: Re: [M3devel] [M3commit] CVS Update: cm3
>> Date: Wed, 6 Jun 2012 09:18:08 -0700
>> From: mika at async.caltech.edu
>>
>> Jay K writes:
>> > ...
>> >7) Do folks out there really use the Modula-3/gcc optimizer, and notice it produces code that runs much faster?
>>
>> If we are talking about turning on optimizations in the m3makefile, then the
>> answer is:
>>
>> Yes! At least with CM3 it makes a huge difference in runtime. Without
>> the optimizer CM3-produced code runs far slower than PM3-produced code
>> (I've seen 3X I think.) With it, CM3 can sometimes keep up. Unless you
>> use a lot of TYPECASE or other constructs that have a much less efficient
>> implementation in the CM3 libraries than in the PM3 libraries.
>>
>> Mika
>

From rcolebur at SCIRES.COM Thu Jun 7 18:52:04 2012
From: rcolebur at SCIRES.COM (Coleburn, Randy)
Date: Thu, 7 Jun 2012 12:52:04 -0400
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To: <20120607163641.A81351A205B@async.async.caltech.edu>
References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org>, <20120606161808.7F5EA1A205B@async.async.caltech.edu> <20120607163641.A81351A205B@async.async.caltech.edu>
Message-ID:

Mika:

I concur with what you are saying about needing a way to retain the good ideas in CM3 without sacrificing so much on performance.

As far as the thread test program goes, it still shows the implementation is broken somehow on Windows (2000, XP, & 7). What can I do to help debug and solve this problem?

Am I correct that on Windows, Modula-3 threads are supposed to map to OS (Windows) threads?

Regards,
Randy Coleburn

-----Original Message-----
From: Mika Nystrom [mailto:mika at async.caltech.edu]
Sent: Thursday, June 07, 2012 12:37 PM
To: Jay K
Cc: m3devel at elegosoft.com
Subject: Re: [M3devel] [M3commit] CVS Update: cm3

Hi Jay,

TYPECASE is limited to "reference" types, which effectively means heap-allocated. Unless you can get alloca in there, I suppose... what I mean is that in Green Book Modula-3 the only way to get a reference type is either through a heap allocation or an UNSAFE operation.

TYPECASE is sometimes the only way to do things. In the Green Book there are examples of using subtyping to have multiple generations of objects in the same pickles, for example. In my program, it was inside an interpreter that's figuring things out without any prior type information, using ISTYPE or TYPECASE.

The issue with TYPECASE that I brought up is actually that the implementation of TYPECASE and ISTYPE is far slower in the CM3 m3core than in PM3's (= SRC M3 as far as I know). The reason (which you allude to) is that Critical Mass did a lot of work on supporting dynamic loading of Modula-3 code (loading in types not known at compile time) and as with many of the other projects they carried out, the code quality was so-so. Because of the restrictions of SRC M3 and PM3, types are statically allocated at compile time and all their subtyping relationships are known at that time. There is simply a static array of the types. CM3, on the other hand, has some more complicated dynamic data structure that makes all the TYPECASE and ISTYPE operations much more cumbersome. It's all in RT0 somewhere.

In short, CM3 does "more" than SRC M3 did but at a heavy performance cost. And of course no one uses the "more" bit now. Kind of like what they did to TEXTs... good ideas for some users, but somewhat half-baked implementation.
Given that dynamic loading is used so little, if at all, and it in any case only happens infrequently itself, it seems there ought to be a way to achieve what the CM3 guys were trying to do while retaining the performance of the older implementation, but not if your code is a "rush job". I think it would have been sensible to vet Critical Mass's code a bit better before switching from PM3 to CM3 for the "official" distribution of Modula-3. I still use PM3 quite a bit. I can no longer blame the TEXTs, nor can I blame the pthreads implementation's being broken since I use CM3 with user threads. Now it's mainly because m3gdb works great on FreeBSD-5.5 with PM3-generated code. I've tried so many times to get things working on other machines with CM3 and newer m3gdb and there's always something annoyingly wrong. Life's too short... Mika P.S. how are the pthreads coming along? I saw some checkins (Dragisa), does the thread tester run without hanging or crashing now? I'd love to use pthreads but it's not been high on my list to debug as long as I can live with user threads... Jay K writes: > >Daniel=2C I can't find the email now=2C as usual=2C you are probably wrong. > > >We don't have an older runtime=2C we have a newer one=2C I think. >With more allowance for dynamic loading. > > >Mika=2C >Maybe a TYPECASE-intense design is generally poor? >dynamic_cast is slow in some C++ implementations. >And I've never seen it used much. Some=2C but not much. >The "type matching" that C++ exception handling has to do isn't >particularly fast=2C though there are other costs there. >Other than the stack walk=2C there is "finding the base of the >object"=2C and strcmp to do the actual type match -- >name-based-type-equality and all that=2C with a hope that it suffices >and no runtime checking of type hashes like Modula-3 does.. > > >Maybe you should switch on your own type tag? >=A0 But I guess Modula-3 doesn't have unions. >Or use OBJECT and method calls? 
> > >Which reminds me...it bothers me that OBJECT requires heap allocation >and garbage collection. It shouldn't require either. >I know we have function pointers available to simulate it=2C without >heap allocation=2C but what I don't know=2C is if the "implicit dow= >ncast" >in a virtual function/method call is doable in safe code or not. >I'll have to look into it..but I'm busy now.. > > >Maybe there is an optimization whereby the compiler can figure out that >there is a small set of likely types that it could check first? > > >Or maybe the full feature could be implemented more efficiently? > > >Maybe it can be optimized based on the fact that the types known to the >system are read-mostly=2C rarely written/appended? > > >I don't know. >I'd really have to look into what the language supports and how it is >implemented. I'm not certain of either. > > >In C++=2C typeid() is fast=2C and requires there be virtual functions >(OBJECT). Is TYPECASE limited to OBJECTs? >Or heap allocated data? > > >Later.. >=A0- Jay > > > > > > >---------------------------------------- >> To: jay.krell at cornell.edu >> CC: m3devel at elegosoft.com >> Subject: Re: [M3devel] [M3commit] CVS Update: cm3 >> Date: Wed=2C 6 Jun 2012 09:18:08 -0700 >> From: mika at async.caltech.edu >> >> Jay K writes: >> > >> ... >> >7) Do folks out there really use the Modula-3/gcc optimizer=3D2C and >> >not= >ice i=3D >> >t produces code that runs much faster? >> >> If we are talking about turning on optimizations in the m3makefile=2C >> the= >n the >> answer is: >> >> Yes! At least with CM3 it makes a huge difference in runtime. Without >> the optimizer CM3-produced code runs far slower than PM3-produced >> code (I've seen 3X I think.) With it=2C CM3 can sometimes keep up. >> Unless you use a lot of TYPECASE or other constructs that have a much >> less efficient implementation in the CM3 libraries than in the PM3 libraries. 
>> >> Mika > = From dabenavidesd at yahoo.es Thu Jun 7 21:42:44 2012 From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.) Date: Thu, 7 Jun 2012 20:42:44 +0100 (BST) Subject: [M3devel] [M3commit] CVS Update: cm3 In-Reply-To: Message-ID: <1339098164.65798.YahooMailClassic@web29703.mail.ird.yahoo.com> Hi all: Yes, it is, but the same conditioning over System Pthreads, is that you can't always link the threads against themselves, so you need re-implement it correctly. Good style DEC-SRC threads might be along the verification project for the Alpha with Vector extensions: http://barroso.org/publications/piranha_asilomar.pdf Thanks in advance --- El jue, 7/6/12, Coleburn, Randy escribi?: De: Coleburn, Randy Asunto: Re: [M3devel] [M3commit] CVS Update: cm3 Para: "m3devel at elegosoft.com" Fecha: jueves, 7 de junio, 2012 11:52 Mika: I concur with what you are saying about needing a way to retain the good ideas in CM3 without sacrificing so much on performance. As far as the thread test program goes, it still shows the implementation is broken somehow on Windows (2000, XP, & 7).? What can I do to help debug and solve this problem? Am I correct that on Windows, Modula-3 threads are supposed to map to OS (Windows) threads? Regards, Randy Coleburn -----Original Message----- From: Mika Nystrom [mailto:mika at async.caltech.edu] Sent: Thursday, June 07, 2012 12:37 PM To: Jay K Cc: m3devel at elegosoft.com Subject: Re: [M3devel] [M3commit] CVS Update: cm3 Hi Jay, TYPECASE is limited to "reference" types, which effectively means heap-allocated.? Unless you can get alloca in there, I suppose... what I mean is that in Green Book Modula-3 the only way to get a reference type is either through a heap allocation or an UNSAFE operation. TYPECASE is sometimes the only way to do things.? In the Green Book there are examples of using subtyping to have multiple generations of objects in the same pickles, for example.? 
In my program, it was inside an interpreter that's figuring things out without any prior type information, using ISTYPE or TYPECASE. The issue with TYPECASE that I brought up is actually that the implementation of TYPECASE and ISTYPE is far slower in the CM3 m3core than in PM3's (= SRC M3 as far as I know).? The reason (which you allude to) is that Critical Mass did a lot of work on supporting dynamic loading of Modula-3 code (loading in types not known at compile time) and as with many of the other projects they carried out, the code quality was so-so.? Because of the restrictions of SRC and P M3, types are statically allocated at compile time and all their subtyping relationships are known at that time.? There is simply a static array of the types.? CM3, on the other hand, has some more complicated dynamic data structure that makes all the TYPECASE and ISTYPE operations much more cumbersome.? It's all in RT0 somewhere.? In short, CM3 does "more" than SRC M3 did but at a heavy performance cost.? And of course no one uses the "more" bit now. Kind of like what they did to TEXTs... good ideas for some users, but somewhat half-baked implementation.? Given that dynamic loading is used so little, if at all, and it in any case only happens infrequently itself, it seems there ought to be a way to achieve what the CM3 guys were trying to do while retaining the performance of the older implementation, but not if your code is a "rush job".? I think it would have been sensible to vet Critical Mass's code a bit better before switching from PM3 to CM3 for the "official" distribution of Modula-3. I still use PM3 quite a bit.? I can no longer blame the TEXTs, nor can I blame the pthreads implementation's being broken since I use CM3 with user threads.? Now it's mainly because m3gdb works great on FreeBSD-5.5 with PM3-generated code.? I've tried so many times to get things working on other machines with CM3 and newer m3gdb and there's always something annoyingly wrong.? 
Life's too short... ? ???Mika P.S. how are the pthreads coming along?? I saw some checkins (Dragisa), does the thread tester run without hanging or crashing now?? I'd love to use pthreads but it's not been high on my list to debug as long as I can live with user threads... Jay K writes: > >Daniel=2C I can't find the email now=2C as usual=2C you are probably wrong. > > >We don't have an older runtime=2C we have a newer one=2C I think. >With more allowance for dynamic loading. > > >Mika=2C >Maybe a TYPECASE-intense design is generally poor? >dynamic_cast is slow in some C++ implementations. >And I've never seen it used much. Some=2C but not much. >The "type matching" that C++ exception handling has to do isn't >particularly fast=2C though there are other costs there. >Other than the stack walk=2C there is "finding the base of the >object"=2C and strcmp to do the actual type match -- >name-based-type-equality and all that=2C with a hope that it suffices >and no runtime checking of type hashes like Modula-3 does.. > > >Maybe you should switch on your own type tag? >=A0 But I guess Modula-3 doesn't have unions. >Or use OBJECT and method calls? > > >Which reminds me...it bothers me that OBJECT requires heap allocation >and garbage collection. It shouldn't require either. >I know we have function pointers available to simulate it=2C without >heap allocation=2C but what I don't know=2C is if the "implicit dow= >ncast" >in a virtual function/method call is doable in safe code or not. >I'll have to look into it..but I'm busy now.. > > >Maybe there is an optimization whereby the compiler can figure out that >there is a small set of likely types that it could check first? > > >Or maybe the full feature could be implemented more efficiently? > > >Maybe it can be optimized based on the fact that the types known to the >system are read-mostly=2C rarely written/appended? > > >I don't know. >I'd really have to look into what the language supports and how it is >implemented. 
I'm not certain of either. > > >In C++=2C typeid() is fast=2C and requires there be virtual functions >(OBJECT). Is TYPECASE limited to OBJECTs? >Or heap allocated data? > > >Later.. >=A0- Jay > > > > > > >---------------------------------------- >> To: jay.krell at cornell.edu >> CC: m3devel at elegosoft.com >> Subject: Re: [M3devel] [M3commit] CVS Update: cm3 >> Date: Wed=2C 6 Jun 2012 09:18:08 -0700 >> From: mika at async.caltech.edu >> >> Jay K writes: >> > >> ... >> >7) Do folks out there really use the Modula-3/gcc optimizer=3D2C and >> >not= >ice i=3D >> >t produces code that runs much faster? >> >> If we are talking about turning on optimizations in the m3makefile=2C >> the= >n the >> answer is: >> >> Yes! At least with CM3 it makes a huge difference in runtime. Without >> the optimizer CM3-produced code runs far slower than PM3-produced >> code (I've seen 3X I think.) With it=2C CM3 can sometimes keep up. >> Unless you use a lot of TYPECASE or other constructs that have a much >> less efficient implementation in the CM3 libraries than in the PM3 libraries. >> >> Mika > ??? ???????? ?????? ??? ? = -------------- next part -------------- An HTML attachment was scrubbed... URL: From dragisha at m3w.org Thu Jun 7 22:09:58 2012 From: dragisha at m3w.org (=?utf-8?Q?Dragi=C5=A1a_Duri=C4=87?=) Date: Thu, 7 Jun 2012 22:09:58 +0200 Subject: [M3devel] [M3commit] CVS Update: cm3 In-Reply-To: <20120607163641.A81351A205B@async.async.caltech.edu> References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org> , <20120606161808.7F5EA1A205B@async.async.caltech.edu> <20120607163641.A81351A205B@async.async.caltech.edu> Message-ID: <166377B4-8CC2-4415-A08E-0655E75227A4@m3w.org> Are you sure about this? Both pm3 and cm3 load type structures from object files on initialization. Type data is in UNTRACED REF ARRAY? structures, for both of them. 

Difference is in algorithm being incremental, "multi-pass" in cm3 and single-pass in pm3/SRC. Also, for garbage collection, there is a check to see if number of modules (meaning more globals areas) has grown, and rebuilding of globals list in case it is.

There is nothing static in type structure of Modula-3.

On Jun 7, 2012, at 6:36 PM, Mika Nystrom wrote:

> Because of the restrictions of SRC and P M3, types are statically
> allocated at compile time and all their subtyping relationships are known
> at that time. There is simply a static array of the types.

From mika at async.caltech.edu Thu Jun 7 22:35:37 2012
From: mika at async.caltech.edu (Mika Nystrom)
Date: Thu, 07 Jun 2012 13:35:37 -0700
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To: <166377B4-8CC2-4415-A08E-0655E75227A4@m3w.org>
References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org> , <20120606161808.7F5EA1A205B@async.async.caltech.edu> <20120607163641.A81351A205B@async.async.caltech.edu> <166377B4-8CC2-4415-A08E-0655E75227A4@m3w.org>
Message-ID: <20120607203537.1CBC81A205B@async.async.caltech.edu>

Sorry, "static" was (slightly) the wrong word.

I believe they are malloced as an array during program startup. There is something significant about the ordering of this array, which is why you can't just add types to the PM3 environment during runtime. CM3 uses more indirection, so it's much easier to add things while running, but it also makes TYPECASE, ISTYPE, etc., slower. Possibly NARROW (explicit as well as implicit) as well...

    Mika

Dragiša Durić writes:
>
>Are you sure about this?
>
>Both pm3 and cm3 load type structures from object files on
>initialization. Type data is in UNTRACED REF ARRAY... structures, for
>both of them.
>
>Difference is in algorithm being incremental, "multi-pass" in cm3 and
>single-pass in pm3/SRC. Also, for garbage collection, there is a check
>to see if number of modules (meaning more globals areas) has grown, and
>rebuilding of globals list in case it is.
>
>There is nothing static in type structure of Modula-3.
>
>On Jun 7, 2012, at 6:36 PM, Mika Nystrom wrote:
>
>> Because of the restrictions of SRC and P M3, types are statically
>> allocated at compile time and all their subtyping relationships are known
>> at that time. There is simply a static array of the types.
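Mika's remark that the ordering of the PM3 type array is significant can be illustrated with a small sketch. This is Python rather than Modula-3, and every name in it (`assign_typecodes`, `is_subtype`, the toy hierarchy) is hypothetical; it is not taken from the PM3 or CM3 runtime. It assumes typecodes are handed out in preorder, so that the subtypes of any type occupy one contiguous range, and then shows that adding a type later forces unrelated typecodes to change.

```python
# Hypothetical sketch: number a fixed type hierarchy in preorder so that every
# type's subtypes occupy one contiguous typecode range. Names are illustrative
# and are not taken from the PM3 or CM3 runtime.

def assign_typecodes(children, root):
    """Return {type_name: (typecode, last_subtype_typecode)} via a preorder walk."""
    ranges = {}
    counter = 0

    def visit(name):
        nonlocal counter
        code = counter
        counter += 1
        for child in children.get(name, []):
            visit(child)
        # Everything numbered during this visit is a subtype of `name`.
        ranges[name] = (code, counter - 1)

    visit(root)
    return ranges


def is_subtype(ranges, a, b):
    """True if type a is b or a descendant of b: a single range comparison."""
    code_a, _ = ranges[a]
    code_b, last_b = ranges[b]
    return code_b <= code_a <= last_b


hierarchy = {"ROOT": ["Shape", "Stream"], "Shape": ["Circle", "Square"]}
ranges = assign_typecodes(hierarchy, "ROOT")

# Adding a subtype of Shape afterwards needs a code inside Shape's range,
# so the whole table has to be renumbered and Stream's typecode moves.
hierarchy["Shape"].append("Triangle")
renumbered = assign_typecodes(hierarchy, "ROOT")
```

Under these assumptions the subtype test is constant-time, but the numbering is only valid for a closed set of types, which would be consistent with the observation that types cannot simply be added to a running PM3 program.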

From hendrik at topoi.pooq.com Thu Jun 7 23:11:35 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Thu, 7 Jun 2012 17:11:35 -0400
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To: <20120607163641.A81351A205B@async.async.caltech.edu>
References: <20120606064732.2C9242474003@birch.elegosoft.com> <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org> <20120606161808.7F5EA1A205B@async.async.caltech.edu> <20120607163641.A81351A205B@async.async.caltech.edu>
Message-ID: <20120607211135.GA6314@topoi.pooq.com>

On Thu, Jun 07, 2012 at 09:36:41AM -0700, Mika Nystrom wrote:
> Hi Jay,
>
> TYPECASE is limited to "reference" types, which effectively means
> heap-allocated. Unless you can get alloca in there, I suppose... what
> I mean is that in Green Book Modula-3 the only way to get a reference
> type is either through a heap allocation or an UNSAFE operation.
>
> TYPECASE is sometimes the only way to do things. In the Green Book
> there are examples of using subtyping to have multiple generations
> of objects in the same pickles, for example. In my program, it was
> inside an interpreter that's figuring things out without any prior
> type information, using ISTYPE or TYPECASE.
>
> The issue with TYPECASE that I brought up is actually that the
> implementation of TYPECASE and ISTYPE is far slower in the CM3 m3core than
> in PM3's (= SRC M3 as far as I know). The reason (which you allude to)
> is that Critical Mass did a lot of work on supporting dynamic loading
> of Modula-3 code (loading in types not known at compile time) and as
> with many of the other projects they carried out, the code quality was
> so-so. Because of the restrictions of SRC and P M3, types are statically
> allocated at compile time and all their subtyping relationships are known
> at that time. There is simply a static array of the types. CM3, on the
> other hand, has some more complicated dynamic data structure that makes
> all the TYPECASE and ISTYPE operations much more cumbersome. It's all
> in RT0 somewhere. In short, CM3 does "more" than SRC M3 did but at a
> heavy performance cost. And of course no one uses the "more" bit now.

I'd like to, if I only knew how.

I'd be really interested in having the low-level infrastructure for
JIT code generators.

-- hendrik

From rcolebur at SCIRES.COM Thu Jun 7 23:44:56 2012
From: rcolebur at SCIRES.COM (Coleburn, Randy)
Date: Thu, 7 Jun 2012 17:44:56 -0400
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To: <1339098164.65798.YahooMailClassic@web29703.mail.ird.yahoo.com>
References: <1339098164.65798.YahooMailClassic@web29703.mail.ird.yahoo.com>
Message-ID:

Daniel:

I'm impressed by your ability to provide so many different research links in your posts.

But, after looking at the link you gave in response to my post, I don't see the immediate relevance to my question regarding Modula-3 threading on Windows.

Also, I'm sorry, but I have a very difficult time trying to understand what you are saying in your posts. I suppose it must have something to do with the translation between our different languages. Forgive me, but I don't understand your reply.

--Randy Coleburn

From dragisha at m3w.org Fri Jun 8 00:01:50 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Fri, 8 Jun 2012 00:01:50 +0200
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To: <20120607203537.1CBC81A205B@async.async.caltech.edu>
References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org> , <20120606161808.7F5EA1A205B@async.async.caltech.edu> <20120607163641.A81351A205B@async.async.caltech.edu> <166377B4-8CC2-4415-A08E-0655E75227A4@m3w.org> <20120607203537.1CBC81A205B@async.async.caltech.edu>
Message-ID: <1495A062-1210-4D07-815C-D3609442C51B@m3w.org>

I've worked with both runtimes at this level (but not lately). And I can't think of one reason why this would be correct. (It does not make me right, I know :). Structures are equivalent, IIRC, primary difference being in algorithm. Incremental RTLinker operation results in possible reallocation of type structures (bottom of the world, you-are-the-wizard-if-you-read-this), but they are still "static" for most of (99.999..%) the process lifetime.

Question is important and I am sure it is fixable, if only we can identify the problem here. There is nothing inherent to the ability for dynamic loading demanding bad data structures at the bottom of the M3 world. Only (not-improbable) sub-optimal decisions made by cmass people at the moment.

On Jun 7, 2012, at 10:35 PM, Mika Nystrom wrote:

> Sorry, "static" was (slightly) the wrong word.
>
> I believe they are malloced as an array during program startup. There is
> something significant about the ordering of this array, which is why you
> can't just add types to the PM3 environment during runtime. CM3 uses
> more indirection, so it's much easier to add things while running,
> but it also makes TYPECASE, ISTYPE, etc., slower. Possibly NARROW
> (explicit as well as implicit) as well...
>
> Mika

From mika at async.caltech.edu Fri Jun 8 00:23:11 2012
From: mika at async.caltech.edu (Mika Nystrom)
Date: Thu, 07 Jun 2012 15:23:11 -0700
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To: <1495A062-1210-4D07-815C-D3609442C51B@m3w.org>
References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org> , <20120606161808.7F5EA1A205B@async.async.caltech.edu> <20120607163641.A81351A205B@async.async.caltech.edu> <166377B4-8CC2-4415-A08E-0655E75227A4@m3w.org> <20120607203537.1CBC81A205B@async.async.caltech.edu> <1495A062-1210-4D07-815C-D3609442C51B@m3w.org>
Message-ID: <20120607222311.E35A71A205B@async.async.caltech.edu>

Admittedly it's been a while since I looked at this.

I think what's going on is that they used some sort of topological sorting in SRC M3, which was broken by Critical Mass.

The reason for the slowdowns is clear if you study the following code for IsSubtype.

PM3:

PROCEDURE IsSubtype (a, b: Typecode): BOOLEAN =
  VAR t := Get (b);
  BEGIN
    IF (a >= RT0u.nTypes) THEN BadType (a) END;
    IF (a = 0) THEN RETURN TRUE END;
    RETURN (t.typecode <= a AND a <= t.lastSubTypeTC);
  END IsSubtype;

CM3:

PROCEDURE IsSubtype (a, b: Typecode): BOOLEAN =
  VAR t: RT0.TypeDefn;
  BEGIN
    IF (a = RT0.NilTypecode) THEN RETURN TRUE END;
    t := Get (a);
    IF (t = NIL) THEN RETURN FALSE; END;
    IF (t.typecode = b) THEN RETURN TRUE END;
    WHILE (t.kind = ORD (TK.Obj)) DO
      IF (t.link_state = 0) THEN FinishTypecell (t, NIL); END;
      t := LOOPHOLE (t, RT0.ObjectTypeDefn).parent;
      IF (t = NIL) THEN RETURN FALSE; END;
      IF (t.typecode = b) THEN RETURN TRUE; END;
    END;
    IF (t.traced # 0)
      THEN RETURN (b = RT0.RefanyTypecode);
      ELSE RETURN (b = RT0.AddressTypecode);
    END;
  END IsSubtype;

Now let's take a peek at Typecase (it is emitted by the compiler for SRC and P M3!)...

PROCEDURE ScanTypecase (ref: REFANY;
                        x: ADDRESS(*ARRAY [0..] OF Cell*)): INTEGER =
  VAR p: UNTRACED REF TypecaseCell; i: INTEGER; tc, xc: Typecode;
  BEGIN
    IF (ref = NIL) THEN RETURN 0; END;
    tc := TYPECODE (ref);
    p := x;
    i := 0;
    LOOP
      IF (p.uid = 0) THEN RETURN i; END;
      IF (p.defn = NIL) THEN
        p.defn := FindType (p.uid);
        IF (p.defn = NIL) THEN
          Fail (RTE.MissingType, RTModule.FromDataAddress(x),
                LOOPHOLE (p.uid, ADDRESS), NIL);
        END;
      END;
      xc := LOOPHOLE (p.defn, RT0.TypeDefn).typecode;
      IF (tc = xc) OR IsSubtype (tc, xc) THEN RETURN i; END;
      INC (p, ADRSIZE (p^));
      INC (i);
    END;
  END ScanTypecase;

    Mika

Dragiša Durić writes:
>
>I've worked with both runtimes at this level (but not lately). And I
>can't think of one reason why this would be correct. (It does not make
>me right, I know :). Structures are equivalent, IIRC, primary difference
>being in algorithm. Incremental RTLinker operation results in possible
>reallocation of type structures (bottom of the world,
>you-are-the-wizard-if-you-read-this), but they are still "static" for
>most of (99.999..%) the process lifetime.
>
>Question is important and I am sure it is fixable, if only we can
>identify the problem here. There is nothing inherent to the ability for
>dynamic loading demanding bad data structures at the bottom of the M3
>world. Only (not-improbable) sub-optimal decisions made by cmass people
>at the moment.
>
>On Jun 7, 2012, at 10:35 PM, Mika Nystrom wrote:
>
>> Sorry, "static" was (slightly) the wrong word.
>>
>> I believe they are malloced as an array during program startup. There is
>> something significant about the ordering of this array, which is why you
>> can't just add types to the PM3 environment during runtime. CM3 uses
>> more indirection, so it's much easier to add things while running,
>> but it also makes TYPECASE, ISTYPE, etc., slower. Possibly NARROW
>> (explicit as well as implicit) as well...
>>
>> Mika

From jay.krell at cornell.edu Fri Jun 8 01:18:51 2012
From: jay.krell at cornell.edu (Jay)
Date: Thu, 7 Jun 2012 16:18:51 -0700
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To:
References: <20120606064732.2C9242474003@birch.elegosoft.com> <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org>
Message-ID:

Actually what I showed is frequently wrong. We often use bitfield references, which seems weird or wrong, but seems to work generally ok and produce better code. The RIGHT thing to use would be "component refs" in gcc parlance, but currently we don't and it isn't a small change. There is kind of a mismatch in the compiler architecture currently...

 - Jay (briefly/pocket-sized-computer-aka-phone)

On Jun 6, 2012, at 3:17 AM, Dragiša Durić wrote:

> I know that much about generated code :).
>
> "Good" thing is - not many things changed in *m3 backend since I ported pm3 to LINUX_ALPHA :)
>
> On Jun 6, 2012, at 11:42 AM, Jay K wrote:
>
>>
>>> Functions that call setjmp
>>
>> I meant -- functions with TRY/EXCEPT or TRY/FINALLY. :)
>>
>> - Jay
>>
>> ----------------------------------------
>>> From: jay.krell at cornell.edu
>>> To: dragisha at m3w.org
>>> Date: Wed, 6 Jun 2012 09:38:18 +0000
>>> CC: jkrell at elego.de; m3devel at elegosoft.com
>>> Subject: Re: [M3devel] [M3commit] CVS Update: cm3
>>>
>>> 5.8.6 does allow many optimizations to occur.
>>> We turn off a very small number directly.
>>> Functions that call setjmp have optimizations inhibited by declaring all locals volatile.
>>> We don't give the compiler good type information, and we take the address of stuff more than necessary, by
>>> generating very low level code.
>>> Where you have e.g.
>>>
>>> MODULE Foo;
>>> TYPE Point = RECORD x,y:INTEGER END;
>>> PROCEDURE GetY(VAR pt:Point):INTEGER = BEGIN RETURN pt.y; END GetY;
>>>
>>> We generate the equivalent of:
>>>
>>> typedef ptrdiff_t INTEGER;
>>> typedef char* ADDRESS;
>>> INTEGER Foo_GetY(ADDRESS pt) { return *(INTEGER*)(pt + sizeof(INTEGER)); }
>>>
>>> Maybe I'll wrap up 4.6, not enable it, and move on to 4.7..
>>>
>>> - Jay
>>>
>>> ________________________________
>>>> Subject: Re: [M3devel] [M3commit] CVS Update: cm3
>>>> From: dragisha at m3w.org
>>>> Date: Wed, 6 Jun 2012 10:51:33 +0200
>>>> CC: jkrell at elego.de; m3devel at elegosoft.com
>>>> To: jay.krell at cornell.edu
>>>>
>>>> I am using it, and I need it.
>>>>
>>>> Does it run better/faster? I didn't test, but is it something to even
>>>> ask, these days, architectures, ? ?
>>>>
>>>> Only if you turned everything off in 5.8.6 and later, as you're doing it
>>>> now, then probably my "-O2" default it is of no benefit at all :).
>>>>
>>>> Generally, our "pitch" to "sell"
>>>> super-modern-ultra-blast-mega-fast-superlative-OO and everything else
>>>> you only dreamed about? And add "no CPU optimizations"? Imagine that.
>>>>
>>>> On Jun 6, 2012, at 10:10 AM, Jay K wrote:
>>>>
>>>> 7) Do folks out there really use the Modula-3/gcc optimizer, and notice
>>>> it produces code that runs much faster?
>>>>

From dabenavidesd at yahoo.es Fri Jun 8 01:21:47 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Fri, 8 Jun 2012 00:21:47 +0100 (BST)
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To:
Message-ID: <1339111307.87279.YahooMailClassic@web29706.mail.ird.yahoo.com>

Hi all:
perhaps this would show it:
Again, what I'm saying is that you can use a WinNT system thread without losing M3 semantics as long as it is implemented as it is in the consistent Model Architecture of the system:
http://research.microsoft.com/en-us/um/people/qadeer/talks/microsoft-dec00.ppt

Recently a guy from Intel (Rick Hudson) explained and set out his thoughts on that (but I find the same problem, I can't understand the problem he is talking about that much).
Rialto NT OS was implemented along the lines for embedded devices (nice!):
http://www.youtube.com/watch?v=WUfvvFD5tAA

DEC-SRC and MS worked together on this, in acting like so there was an Alpha "beta" Win2000, but it didn't happen, as the piranha project :(
See, these new architectures don't scale for that much, they say (sorry HW guys, but show me a good proof, I'm writing this from nothing related to it)
Thanks in advance

--- On Thu, 7/6/12, Coleburn, Randy wrote:

From: Coleburn, Randy
Subject: Re: [M3devel] [M3commit] CVS Update: cm3
To: "m3devel at elegosoft.com"
Date: Thursday, June 7, 2012 16:44

Daniel:

I'm impressed by your ability to provide so many different research links in your posts.

But, after looking at the link you gave in response to my post, I don't see the immediate relevance to my question regarding Modula-3 threading on Windows.

Also, I'm sorry, but I have a very difficult time trying to understand what you are saying in your posts. I suppose it must have something to do with the translation between our different languages. Forgive me, but I don't understand your reply.

--Randy Coleburn

From: Daniel Alejandro Benavides D. [mailto:dabenavidesd at yahoo.es]
Sent: Thursday, June 07, 2012 3:43 PM
To: m3devel at elegosoft.com; Coleburn, Randy
Subject: Re: [M3devel] [M3commit] CVS Update: cm3

Hi all:
Yes, it is, but the same conditioning over System Pthreads, is that you can't always link the threads against themselves, so you need to re-implement it correctly.
Good style DEC-SRC threads might be along the verification project for the Alpha with Vector extensions:
http://barroso.org/publications/piranha_asilomar.pdf
Thanks in advance

--- On Thu, 7/6/12, Coleburn, Randy wrote:

From: Coleburn, Randy
Subject: Re: [M3devel] [M3commit] CVS Update: cm3
To: "m3devel at elegosoft.com"
Date: Thursday, June 7, 2012 11:52

Mika:

I concur with what you are saying about needing a way to retain the good ideas in CM3 without sacrificing so much on performance.

As far as the thread test program goes, it still shows the implementation is broken somehow on Windows (2000, XP, & 7). What can I do to help debug and solve this problem?

Am I correct that on Windows, Modula-3 threads are supposed to map to OS (Windows) threads?

Regards,
Randy Coleburn

-----Original Message-----
From: Mika Nystrom [mailto:mika at async.caltech.edu]
Sent: Thursday, June 07, 2012 12:37 PM
To: Jay K
Cc: m3devel at elegosoft.com
Subject: Re: [M3devel] [M3commit] CVS Update: cm3

Hi Jay,

TYPECASE is limited to "reference" types, which effectively means heap-allocated. Unless you can get alloca in there, I suppose... what I mean is that in Green Book Modula-3 the only way to get a reference type is either through a heap allocation or an UNSAFE operation.

TYPECASE is sometimes the only way to do things. In the Green Book there are examples of using subtyping to have multiple generations of objects in the same pickles, for example. In my program, it was inside an interpreter that's figuring things out without any prior type information, using ISTYPE or TYPECASE.

The issue with TYPECASE that I brought up is actually that the implementation of TYPECASE and ISTYPE is far slower in the CM3 m3core than in PM3's (= SRC M3 as far as I know). The reason (which you allude to) is that Critical Mass did a lot of work on supporting dynamic loading of Modula-3 code (loading in types not known at compile time) and as with many of the other projects they carried out, the code quality was so-so. Because of the restrictions of SRC and P M3, types are statically allocated at compile time and all their subtyping relationships are known at that time. There is simply a static array of the types. CM3, on the other hand, has some more complicated dynamic data structure that makes all the TYPECASE and ISTYPE operations much more cumbersome. It's all in RT0 somewhere. In short, CM3 does "more" than SRC M3 did but at a heavy performance cost. And of course no one uses the "more" bit now.

Kind of like what they did to TEXTs... good ideas for some users, but somewhat half-baked implementation. Given that dynamic loading is used so little, if at all, and it in any case only happens infrequently itself, it seems there ought to be a way to achieve what the CM3 guys were trying to do while retaining the performance of the older implementation, but not if your code is a "rush job". I think it would have been sensible to vet Critical Mass's code a bit better before switching from PM3 to CM3 for the "official" distribution of Modula-3.

I still use PM3 quite a bit. I can no longer blame the TEXTs, nor can I blame the pthreads implementation's being broken since I use CM3 with user threads. Now it's mainly because m3gdb works great on FreeBSD-5.5 with PM3-generated code. I've tried so many times to get things working on other machines with CM3 and newer m3gdb and there's always something annoyingly wrong. Life's too short...

     Mika

P.S. how are the pthreads coming along? I saw some checkins (Dragisa), does the thread tester run without hanging or crashing now? I'd love to use pthreads but it's not been high on my list to debug as long as I can live with user threads...

Jay K writes:
>
>Daniel, I can't find the email now, as usual, you are probably wrong.
>
>We don't have an older runtime, we have a newer one, I think.
>With more allowance for dynamic loading.
>
>Mika,
>Maybe a TYPECASE-intense design is generally poor?
>dynamic_cast is slow in some C++ implementations.
>And I've never seen it used much. Some, but not much.
>The "type matching" that C++ exception handling has to do isn't
>particularly fast, though there are other costs there.
>Other than the stack walk, there is "finding the base of the
>object", and strcmp to do the actual type match --
>name-based-type-equality and all that, with a hope that it suffices
>and no runtime checking of type hashes like Modula-3 does..
>
>Maybe you should switch on your own type tag?
>  But I guess Modula-3 doesn't have unions.
>Or use OBJECT and method calls?
>
>Which reminds me...it bothers me that OBJECT requires heap allocation
>and garbage collection. It shouldn't require either.
>I know we have function pointers available to simulate it, without
>heap allocation, but what I don't know, is if the "implicit downcast"
>in a virtual function/method call is doable in safe code or not.
>I'll have to look into it..but I'm busy now..
>
>Maybe there is an optimization whereby the compiler can figure out that
>there is a small set of likely types that it could check first?
>
>Or maybe the full feature could be implemented more efficiently?
>
>Maybe it can be optimized based on the fact that the types known to the
>system are read-mostly, rarely written/appended?
>
>I don't know.
>I'd really have to look into what the language supports and how it is
>implemented. I'm not certain of either.
>
>In C++, typeid() is fast, and requires there be virtual functions
>(OBJECT). Is TYPECASE limited to OBJECTs?
>Or heap allocated data?
>
>Later..
> - Jay
>
>----------------------------------------
>> To: jay.krell at cornell.edu
>> CC: m3devel at elegosoft.com
>> Subject: Re: [M3devel] [M3commit] CVS Update: cm3
>> Date: Wed, 6 Jun 2012 09:18:08 -0700
>> From: mika at async.caltech.edu
>>
>> Jay K writes:
>> >
>> ...
>> >7) Do folks out there really use the Modula-3/gcc optimizer, and
>> >notice it produces code that runs much faster?
>>
>> If we are talking about turning on optimizations in the m3makefile,
>> then the answer is:
>>
>> Yes! At least with CM3 it makes a huge difference in runtime. Without
>> the optimizer CM3-produced code runs far slower than PM3-produced
>> code (I've seen 3X I think.) With it, CM3 can sometimes keep up.
>> Unless you use a lot of TYPECASE or other constructs that have a much
>> less efficient implementation in the CM3 libraries than in the PM3 libraries.
>>
>> Mika
>

From jay.krell at cornell.edu Fri Jun 8 04:05:11 2012
From: jay.krell at cornell.edu (Jay K)
Date: Fri, 8 Jun 2012 02:05:11 +0000
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To:
References: , <1339098164.65798.YahooMailClassic@web29703.mail.ird.yahoo.com>,
Message-ID:

1. Yes, Daniel generally doesn't make sense to me either.

2.
> Am I correct that on Windows, Modula-3 threads are supposed to map to OS (Windows) threads?

Yes.
Note though that "LOCK" doesn't map directly to EnterCriticalSection and more significantly, m3core provides essentially condition variables, which don't map directly to Win32, prior to Vista.
(LOCK at least does a delay-heap-allocation-and-initialize before EnterCriticalSection, but also interacts with the condition variables I recall.)

I did a bunch of "research" and our condition variable substitution is pretty good now, equivalent to what Java implementations do.
Definitely better than others e.g. Boost.

At some point maybe we could use the condition variables that Vista introduces, but 1) I'm reluctant to drop 2000/XP/etc. support and 2) if we implemented something that chose one implementation or the other at runtime, we'd lose coverage of the pre-Vista code.

(I'm really disappointed in this area in Win32, that NT 3.1 and Windows 95 didn't have small locks, zero-or-at-least-statically-initializable locks, read/write locks, "once", and condition variables. Vista, finally, has all that. (SRWLOCK are the first three all in one -- small, zero-initialized, read/write...and given them, I'm not sure you really need "once".))

Also note that historically we maintained a thread pool, so /creating/ a Modula-3 thread did not necessarily create a Win32 thread. I removed that though, so the implementation is more direct now, albeit probably slower.

I didn't realize or forgot we had a problem here. I can try to look into it.
The Win32 and pthreads implementation is similar enough, that it might easily be the same problem.

 - Jay

From jay.krell at cornell.edu Fri Jun 8 04:13:02 2012
From: jay.krell at cornell.edu (Jay K)
Date: Fri, 8 Jun 2012 02:13:02 +0000
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To: <20120607211135.GA6314@topoi.pooq.com>
References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org>, , <20120606161808.7F5EA1A205B@async.async.caltech.edu>, , <20120607163641.A81351A205B@async.async.caltech.edu>, <20120607211135.GA6314@topoi.pooq.com>
Message-ID:

> I'd like to, if I only knew how. I'd be really interested in having the
> low-level infrastructure for JIT code generators

Would you be satisfied with a Modula-3 interpreter that interpreted a mostly-compiled form?
It shouldn't be difficult.
I don't know if our intermediate code was designed with interpretation in mind, but it seems like it wouldn't be particularly difficult.
You'd want a "linker" that just zips all the files and puts it "in" or "next to" the stub executable.
This would solve the distribution format problem, partly.
The existing intermediate code is platform-specific, but not by much (again: jumpbuf size, word size, endian, win32 vs. posix).

But I have to admit, I'm keener on generating C than a JIT or an interpreter, and interpreter is not JIT.

Um. What do you hope to gain from JIT?
A big reason I ask..is because..well, do you want to ship some portable-executable that relies on JIT being already installed/available? Or do you want to carry the JITer and its code together?
Or do you want to target an existing widely deployed JITer such as CLR or Java?

In my opinion, the biggest advantage of JIT is portable-executable, depending on widely deployed JITer.
But targeting CLR or Java isn't as easy as targeting your own custom thing.

I understand there are other advantages -- faster compilation, optimization very specific to runtime environment.
But I think portable-executable is most important. That's why I like "script". :)
There are disadvantages to JIT: slower execution/startup, maybe harder to debug, easy to reverse engineer (if you care).

Heck, at some point you just ship the compiler and portable-executable is source code.
There are pluses and minuses all around.

 - Jay

From jay.krell at cornell.edu Fri Jun 8 04:19:57 2012
From: jay.krell at cornell.edu (Jay K)
Date: Fri, 8 Jun 2012 02:19:57 +0000
Subject: [M3devel] gcc 4.6 backend w/o optimizer?
Message-ID:

I need to know if I can start moving targets up to a gcc 4.6 backend, given that I've removed the vast majority of the optimizer from it.
I will test some of the targets, maybe not all of them. So far I386_DARWIN and AMD64_DARWIN work and I built boot/cross archives for a very large list, and I can run cm3 on Solaris also (I forget which architectures, there are 4, probably SPARC32 at least).

Or if there is vehement rejection of a missing optimizer, I can abandon 4.6 and start work on 4.7 instead.
I get tired of the unnecessary tedium that I invented, so with 4.7, I'll try to keep the diff small, in particular:
  keep the gmp/mpfr/mpc dependencies
  don't compile it with C++ (except parse.c)

There is no longer a "core" distribution of gcc, but I'll still cut out vast swaths like all but the C and LTO frontends (Java, C++, Objective C, Objective C++, Fortran, Ada), all of the libraries (libjava, libada, libssp, libmudflap, libgfortran, libquadmath, libgcc, libstdc++, etc.)

I know I have one rejection of this but that might not be enough. Tony?

 - Jay

From dragisha at m3w.org Fri Jun 8 09:15:59 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Fri, 8 Jun 2012 09:15:59 +0200
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To: <20120607222311.E35A71A205B@async.async.caltech.edu>
References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org> , <20120606161808.7F5EA1A205B@async.async.caltech.edu> <20120607163641.A81351A205B@async.async.caltech.edu> <166377B4-8CC2-4415-A08E-0655E75227A4@m3w.org> <20120607203537.1CBC81A205B@async.async.caltech.edu> <1495A062-1210-4D07-815C-D3609442C51B@m3w.org> <20120607222311.E35A71A205B@async.async.caltech.edu>
Message-ID: <77A59CCB-0800-4C3C-8AF6-5B455B29DEF7@m3w.org>

Thank you for effort.

Possible solution is to map typecodes to orderable id's and re-sort every time dynamic loader changes type metadata. Any takers?

That way, we will only add one to two array lookups to every TYPECASE invocation. Additional complexity for re-sort is single to small number of invocations.

On Jun 8, 2012, at 12:23 AM, Mika Nystrom wrote:

> The reason for the slowdowns is clear if you study the following code
> for IsSubtype.
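
The renumbering idea proposed above can be sketched in C. This is only an illustration under stated assumptions, not m3core code: `TypeCell`, `add_type`, `renumber` and `is_subtype` are hypothetical names, the table is a fixed-size array, and only single-inheritance object types are modelled. Each type gets a preorder id plus the largest id in its subtree, so the subtype test becomes the same kind of range check the PM3 version of IsSubtype performs, and the cost of a dynamic load moves into `renumber`.

```c
#include <assert.h>

/* Hypothetical, simplified type table: NOT the real RT0 structures. */
enum { MAX_TYPES = 64, NO_PARENT = -1 };

typedef struct {
    int parent;   /* typecode of the supertype, or NO_PARENT for a root */
    int order;    /* preorder id assigned by renumber() */
    int last;     /* largest preorder id among this type and its subtypes */
} TypeCell;

static TypeCell types[MAX_TYPES];
static int n_types = 0;

/* Depth-first walk that numbers tc and everything below it.
   Quadratic overall, which is tolerable only because it runs rarely. */
static int number_subtree(int tc, int next)
{
    types[tc].order = next++;
    for (int child = 0; child < n_types; child++)
        if (types[child].parent == tc)
            next = number_subtree(child, next);
    types[tc].last = next - 1;
    return next;
}

/* Rebuild all ids; meant to run only when the set of types changes. */
static void renumber(void)
{
    int next = 0;
    for (int tc = 0; tc < n_types; tc++)
        if (types[tc].parent == NO_PARENT)
            next = number_subtree(tc, next);
}

/* Register a type (as a dynamic loader might) and return its typecode,
   or -1 if the table is full. */
int add_type(int parent)
{
    assert(parent == NO_PARENT || (parent >= 0 && parent < n_types));
    if (n_types >= MAX_TYPES) return -1;
    types[n_types].parent = parent;
    n_types++;
    renumber();
    return n_types - 1;
}

/* Is a a subtype of b?  Two table lookups and two comparisons,
   with no walk up the parent chain. */
int is_subtype(int a, int b)
{
    return types[b].order <= types[a].order
        && types[a].order <= types[b].last;
}
```

Typecodes stay stable here; only the `order` and `last` fields change on a load, which is the "one to two array lookups" per test mentioned above. A real runtime would also have to keep readers from seeing a half-renumbered table, which this sketch does not attempt.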

From dragisha at m3w.org Fri Jun 8 10:06:49 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Fri, 8 Jun 2012 10:06:49 +0200
Subject: [M3devel] [M3commit] CVS Update: cm3 (windows condition variables)
In-Reply-To:
References: , <1339098164.65798.YahooMailClassic@web29703.mail.ird.yahoo.com>,
Message-ID:

Please explain this more, and if you can - draw parallel to *nix.

TIA

On Jun 8, 2012, at 4:05 AM, Jay K wrote:

> Note though that "LOCK" doesn't map directly to EnterCriticalSection and more significantly, m3core provides essentially condition variables, which don't map directly to Win32, prior to Vista.
> (LOCK at least does a delay-heap-allocation-and-initialize before EnterCriticalSection, but also interacts with the condition variables I recall.)
>
>
> I did a bunch of "research" and our condition variable substitution is pretty good now, equivalent to what Java implementations do.
> Definitely better than others e.g. Boost.

From jay.krell at cornell.edu Fri Jun 8 11:23:38 2012
From: jay.krell at cornell.edu (Jay K)
Date: Fri, 8 Jun 2012 09:23:38 +0000
Subject: [M3devel] [M3commit] CVS Update: cm3 (windows condition variables)
In-Reply-To:
References: , <1339098164.65798.YahooMailClassic@web29703.mail.ird.yahoo.com>, ,
Message-ID:

sorry -- clarification, we are similar to the widely used Sun/Oracle JVM.
Not necessarily state-of-the-art, but not bad.

Our locks map pretty directly to underlying pthread mutex, Win32 critical section.
Maybe not 100% directly. Maybe we delay-heap-allocate-and-initialize, i.e. so lock declaration/creation is super cheap -- just leave room for a pointer -- but there is a small extra code per lock acquire/release.
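
The "just leave room for a pointer" scheme can be sketched as follows. This is a hedged illustration only: it uses pthreads and C11 atomics rather than Win32 critical sections, and `LazyLock`, `lazy_get`, `lazy_lock` and `lazy_unlock` are hypothetical names, not the m3core implementation. The lock starts as a null pointer, and the first acquire allocates and publishes the real mutex with a compare-and-swap; the extra load on every acquire and release is the "small extra code" mentioned above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical stand-in for a lock object: only a pointer-sized slot,
   zero-initialized, so declaring one costs nothing. */
typedef struct {
    _Atomic(pthread_mutex_t *) impl;
} LazyLock;

/* Return the underlying mutex, creating it on first use. */
static pthread_mutex_t *lazy_get(LazyLock *l)
{
    pthread_mutex_t *m = atomic_load(&l->impl);
    if (m != NULL) return m;            /* common case: already created */

    pthread_mutex_t *fresh = malloc(sizeof *fresh);
    if (fresh == NULL) abort();
    if (pthread_mutex_init(fresh, NULL) != 0) abort();

    pthread_mutex_t *expected = NULL;
    if (atomic_compare_exchange_strong(&l->impl, &expected, fresh))
        return fresh;                   /* we published our mutex */

    /* Another thread won the race: discard ours and use theirs. */
    pthread_mutex_destroy(fresh);
    free(fresh);
    return expected;
}

void lazy_lock(LazyLock *l)
{
    int rc = pthread_mutex_lock(lazy_get(l));
    assert(rc == 0);
    (void)rc;
}

void lazy_unlock(LazyLock *l)
{
    int rc = pthread_mutex_unlock(lazy_get(l));
    assert(rc == 0);
    (void)rc;
}
```

The sketch never frees the mutex; a garbage-collected runtime would presumably need some cleanup hook for that, which is left out here.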
Our condition variable functionaliy maps pretty directly to pthread condition variables.Prior to Vista, there were no Win32 condition variables, but what we do is pretty good, better than many implementations out there (e.g. older Modula-3, Boost) and similar to widely used implementations, e.g. Sun/Oracle Java. In particular we do not have a giant lock for condition variable operations, which some literature says you need. Historically the Win32 Modula-3 threading library had a giant lock to aid in condition variable implementation.It was pretty bad. Since pthread and Win32 are widely used, hopefully they are really good, and if not, will be improved for the vast majority of code to reuse. Tony, in your research, you should be sure to compare against Win32 SRWLOCK and newer versions of Windows (i.e. newer than XP). I'll try to read your paper. - Jay> Subject: Re: [M3devel] [M3commit] CVS Update: cm3 (windows condition variables) > From: antony.hosking at gmail.com > Date: Fri, 8 Jun 2012 04:38:20 -0400 > CC: jay.krell at cornell.edu; m3devel at elegosoft.com > To: dragisha at m3w.org > > > On Jun 8, 2012, at 4:06 AM, Dragi?a Duri? wrote: > > > Please explain this more, and if you can - draw parallel to *nix. > > > > TIA > > > > On Jun 8, 2012, at 4:05 AM, Jay K wrote: > > > >> Note though that "LOCK" doesn't map directly to EnterCriticalSection and more significantly, m3core provides essentially condition variables, which don't map directly to Win32, prior to Vista. > >> (LOCK at least does a delay-heap-allocation-and-initialize before EnterCriticalSection, but also interacts with the condition variables I recall.) > >> > >> > >> I did a bunch of "research" and our condition variable substitution is pretty good now, equivalent to what Java implementations do. > >> Definitely better than others e.g. Boost. > > We are certainly NOT equivalent to state-of-the-art Java implementations. Take a look at http://dx.doi.org/10.1145/2093157.2093184 for example. 
>
> - Tony

From dragisha at m3w.org Fri Jun 8 12:38:30 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Fri, 8 Jun 2012 12:38:30 +0200
Subject: [M3devel] [M3commit] CVS Update: cm3 (windows condition variables)
In-Reply-To: <8531351D-A02E-4635-971F-C96736810851@cs.purdue.edu>
References: <1339098164.65798.YahooMailClassic@web29703.mail.ird.yahoo.com>, <8531351D-A02E-4635-971F-C96736810851@cs.purdue.edu>
Message-ID: <07030E90-1842-47DB-8D68-86F154E05E0D@m3w.org>

At least under Linux, uncontended access to a futex is (IMHO) a CAS-based, user-space operation.

Same thing?

On Jun 8, 2012, at 12:25 PM, Tony Hosking wrote:

> My point is that modern JVMs, including Sun/Oracle HotSpot, don't map every synchronized statement to an invocation of an underlying pthread or win32 lock. Instead, they use fast processor synchronization primitives like CAS compiled into the code to quickly "lock" an object in the vast majority of cases when no other thread is trying to lock the same object, without mapping to some pthread or win32 mutex.

From dragisha at m3w.org Fri Jun 8 12:48:47 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Fri, 8 Jun 2012 12:48:47 +0200
Subject: [M3devel] [M3commit] CVS Update: cm3 (windows condition variables)
In-Reply-To: <07030E90-1842-47DB-8D68-86F154E05E0D@m3w.org>
References: <1339098164.65798.YahooMailClassic@web29703.mail.ird.yahoo.com>, <8531351D-A02E-4635-971F-C96736810851@cs.purdue.edu>, <07030E90-1842-47DB-8D68-86F154E05E0D@m3w.org>
Message-ID: <39FC6EBD-FA8B-4C5F-89CC-E83986C0E01E@m3w.org>

On Jun 8, 2012, at 12:38 PM, Dragiša Durić wrote:

> At least under Linux, uncontended access to futex is (IMHO) CAS based, user space operation.
>
> Same thing?

Meaning: "At least under Linux, does Modula-3 using pthreads do the same thing as modern JVMs?"

From hosking at cs.purdue.edu Fri Jun 8 12:25:20 2012
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Fri, 8 Jun 2012 06:25:20 -0400
Subject: [M3devel] [M3commit] CVS Update: cm3 (windows condition variables)
References: <1339098164.65798.YahooMailClassic@web29703.mail.ird.yahoo.com>
Message-ID: <8531351D-A02E-4635-971F-C96736810851@cs.purdue.edu>

My point is that modern JVMs, including Sun/Oracle HotSpot, don't map every synchronized statement to an invocation of an underlying pthread or win32 lock. Instead, they use fast processor synchronization primitives like CAS compiled into the code to quickly "lock" an object in the vast majority of cases when no other thread is trying to lock the same object, without mapping to some pthread or win32 mutex.

On Jun 8, 2012, at 5:23 AM, Jay K wrote:

> sorry -- clarification, we are similar to the widely used Sun/Oracle JVM.
> Not necessarily state-of-the-art, but not bad.
>
> Our locks map pretty directly to underlying pthread mutex, Win32 critical section.
> Maybe not 100% directly. Maybe we delay-heap-allocate-and-initialize, i.e. so lock declaration/creation is super cheap -- just leave room for a pointer -- but there is a small extra code per lock acquire/release.
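A minimal sketch, assuming C11 atomics and POSIX threads, of the idea in the message above: try a single CAS first and fall back to an operating-system primitive only under contention. This is not HotSpot's or m3core's algorithm. It is the simple three-state scheme often used with futexes, with a pthread mutex and condition variable standing in for the kernel wait queue. The names thin_lock, thin_lock_try_fast and so on are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>

/* state: 0 = free, 1 = held, 2 = held and a waiter may be sleeping. */
typedef struct {
    atomic_int      state;
    pthread_mutex_t mu;   /* touched only on the contended path */
    pthread_cond_t  cv;
} thin_lock;

#define THIN_LOCK_INIT { 0, PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER }

/* Fast path: one compare-and-swap, small enough to be inlined at the call site. */
static inline bool thin_lock_try_fast(thin_lock *l)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&l->state, &expected, 1);
}

/* Slow path: mark the lock as contended and sleep until it is handed over.
 * The exchange happens while mu is held, so a releaser that observes
 * state 2 cannot signal before this thread is waiting on cv. */
static void thin_lock_slow(thin_lock *l)
{
    pthread_mutex_lock(&l->mu);
    while (atomic_exchange(&l->state, 2) != 0)
        pthread_cond_wait(&l->cv, &l->mu);
    pthread_mutex_unlock(&l->mu);
}

/* Not reentrant: a thread must not acquire a lock it already holds. */
static inline void thin_lock_acquire(thin_lock *l)
{
    if (!thin_lock_try_fast(l))
        thin_lock_slow(l);
}

/* Release: one atomic exchange; the mutex is taken only if someone may be asleep. */
static inline void thin_lock_release(thin_lock *l)
{
    if (atomic_exchange(&l->state, 0) == 2) {
        pthread_mutex_lock(&l->mu);
        pthread_cond_signal(&l->cv);
        pthread_mutex_unlock(&l->mu);
    }
}
```

In the uncontended case, acquire and release each cost one atomic read-modify-write and never enter the pthread library, which is the effect a compiler obtains when it emits the CAS inline rather than calling into a lock function.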

From jay.krell at cornell.edu Fri Jun 8 13:20:56 2012
From: jay.krell at cornell.edu (Jay K)
Date: Fri, 8 Jun 2012 11:20:56 +0000
Subject: [M3devel] [M3commit] CVS Update: cm3 (windows condition variables)
In-Reply-To: <39FC6EBD-FA8B-4C5F-89CC-E83986C0E01E@m3w.org>
References: <1339098164.65798.YahooMailClassic@web29703.mail.ird.yahoo.com>, <8531351D-A02E-4635-971F-C96736810851@cs.purdue.edu>, <07030E90-1842-47DB-8D68-86F154E05E0D@m3w.org>, <39FC6EBD-FA8B-4C5F-89CC-E83986C0E01E@m3w.org>

> At least under Linux, uncontended access to futex is (IMHO) CAS based, user space operation.

So are an uncontended Win32 critical section and an uncontended Win32 SRWLOCK. Just disassemble and/or step through them...
Mutex/Semaphore/Event, those always go to the kernel, unfortunately.
So our Win32 condition-variable-ish stuff might, I have to check. It'd be unfortunate, but it is still probably as good as it can be, short of depending on Vista. (Uncontended Vista+ condition variables surely don't involve the kernel either.)

The CAS isn't inlined. There is a function call. A dynamically linked one, so at least on Win32, it goes through a function pointer, but other than inlining factors, it can still be very fast. It can bias to a thread, and such. But there will be a function call.

 - Jay

From jay.krell at cornell.edu Fri Jun 8 13:50:14 2012
From: jay.krell at cornell.edu (Jay K)
Date: Fri, 8 Jun 2012 11:50:14 +0000
Subject: [M3devel] [M3commit] CVS Update: cm3 (windows condition variables)
References: <1339098164.65798.YahooMailClassic@web29703.mail.ird.yahoo.com>, <8531351D-A02E-4635-971F-C96736810851@cs.purdue.edu>, <07030E90-1842-47DB-8D68-86F154E05E0D@m3w.org>, <39FC6EBD-FA8B-4C5F-89CC-E83986C0E01E@m3w.org>

I don't fully understand the paper, but clearly people want to avoid both the function call and the CAS.
And clearly this is viable and often profitable -- often locks are only ever acquired by one thread, or are locked many times by one thread, then many times by another, etc. The tricky part is adapting to determine which locks benefit, and handling the "transitions" (or "bias revocation") when a "second" thread does acquire the lock.

Traditional C/C++ systems are always going to have the function call.
Whether or not the CAS can be optimized away in such "unmanaged" systems, I don't know.
For example, Win32 SRWLOCKs have no "cleanup" function, nor a required "initialize" function, so that might limit the flexibility of the implementation, though it is certainly also advantageous.

 - Jay

From hosking at cs.purdue.edu Fri Jun 8 16:40:35 2012
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Fri, 8 Jun 2012 10:40:35 -0400
Subject: [M3devel] [M3commit] CVS Update: cm3 (windows condition variables)
References: <1339098164.65798.YahooMailClassic@web29703.mail.ird.yahoo.com>, <8531351D-A02E-4635-971F-C96736810851@cs.purdue.edu>, <07030E90-1842-47DB-8D68-86F154E05E0D@m3w.org>, <39FC6EBD-FA8B-4C5F-89CC-E83986C0E01E@m3w.org>
Message-ID: <9A392836-2304-4D12-BB87-78A01C7391DF@cs.purdue.edu>

Right.

On Jun 8, 2012, at 7:50 AM, Jay K wrote:

> I don't fully understand the paper, but clearly people want to both avoid the function call, and the CAS.
> And clearly this is viable and often profitable -- often times locks are only ever acquired by one thread, or are locked many times by one thread, then many times by another, etc. The tricky part is adapting to determine which locks benefit, and handling the "transitions" (or "bias revocation") when a "second" thread does acquire the lock.
>
> Traditional C/C++ systems are always going to have the function call.
> Whether or not the CAS can be optimized away in such "unmanaged" systems, I don't know.
> For example, Win32 SRWLOCKs have no "cleanup" function, nor a required "initialize" function, so that might limit the flexibility of the implementation, though certainly is also advantageous.

From hendrik at topoi.pooq.com Fri Jun 8 16:55:40 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Fri, 8 Jun 2012 10:55:40 -0400
Subject: [M3devel] [M3commit] CVS Update: cm3
References: <20120606064732.2C9242474003@birch.elegosoft.com> <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org> <20120606161808.7F5EA1A205B@async.async.caltech.edu> <20120607163641.A81351A205B@async.async.caltech.edu> <20120607211135.GA6314@topoi.pooq.com>
Message-ID: <20120608145540.GA10805@topoi.pooq.com>

On Fri, Jun 08, 2012 at 02:13:02AM +0000, Jay K wrote:
>
> > I'd like to, if I only knew how. I'd be really interested in having the
> > low-level infrastructure for JIT code generators
> Would you be satisfied with a Modula-3 interpreter that interpreted a
> mostly-compiled form? It shouldn't be difficult.

That would be lovely, for all the reasons and opportunities you mentioned, but it's mostly orthogonal to what I want.

I want to write JIT implementations for other languages, languages that have their own methods for defining data structures, but I want them to be interoperable with the Modula-3 I know and like. I don't mind writing a code generator or two, if necessary. But an interpreter would provide portability instead of efficiency. Having both could be useful.

For example, I'd like to implement a formalism that enables me to download code from the net, formally verify its safety and then be able to execute it really fast. Yes, I might be compiling it all at once instead of a line at a time, but I do want to be able to add it to an existing running program, and saying "JIT" is about the easiest brief summary.

I'm quite aware that doing more than a half-assed version of this would be a big project, and that's probably an understatement.

> I don't know if our intermediate code was designed with interpretation
> in mind, but it seems like it wouldn't be particularly difficult.
> You'd want a "linker" that just zips all the files and puts it "in" or
> "next to" the stub executable. This would solve the distribution
> format problem, partly. The existing intermediate code is
> platform-specific, but not by much (again: jumpbuf size, word size,
> endian, win32 vs. posix).
> But I have to admit, I'm keener on generating C than a JIT or an
> interpreter, and interpreter is not JIT.
> Um. What do you hope to gain from JIT?

The ability to dynamically add code to an existing program and have it run fast. Possibly to have the program generate additional code to add to itself.

> A big reason I ask..is
> because..well, do you want to ship some portable-executable that
> relies on JIT being already installed/available? Or do you want to
> carry the JITer and its code together? Or do you want to target an
> existing widely deployed JITer such as CLR or Java? In my opinion,
> the biggest advantage of JIT is portable-executable, depending on
> widely deployed JITer. But targeting CLR or Java isn't as easy as
> targeting your own custom thing. I understand there are other
> advantages -- faster compilation, optimization very specific to
> runtime environment. But I think portable-executable is most important.
> That's why I like "script". :) There are disadvantages to JIT: slower
> execution/startup, maybe harder to debug, easy to reverse engineer (if
> you care). Heck, at some point you just ship the compiler and
> portable-executable is source code. There are pluses and minuses all
> around.

JIT is for speed. Otherwise, interpretation would suffice, and could even be portable. But even an interpreter would like to be able to add new garbage-collectible types, which is what I'm asking for at the moment.
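A toy sketch, assuming a stack-oriented intermediate form, of what "an interpreter for a mostly-compiled form" amounts to at its core. The opcodes, the instr record and the run function are hypothetical and are not the actual M3CG operation set; a real interpreter would also need typed loads and stores, calls, exceptions and integration with the garbage collector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical opcodes for a tiny stack machine. */
typedef enum { OP_PUSH, OP_ADD, OP_MUL, OP_JUMP_IF_ZERO, OP_HALT } opcode;

typedef struct {
    opcode  op;
    int32_t arg;   /* constant for OP_PUSH, target index for OP_JUMP_IF_ZERO */
} instr;

enum { STACK_MAX = 64 };

/* Execute `code` and return the value on top of the stack at OP_HALT. */
static int32_t run(const instr *code, size_t n)
{
    int32_t stack[STACK_MAX];
    size_t sp = 0;   /* number of values currently on the stack */
    size_t pc = 0;   /* index of the next instruction */

    while (pc < n) {
        instr i = code[pc++];
        switch (i.op) {
        case OP_PUSH:
            assert(sp < STACK_MAX);
            stack[sp++] = i.arg;
            break;
        case OP_ADD:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] += stack[sp];
            break;
        case OP_MUL:
            assert(sp >= 2);
            sp--;
            stack[sp - 1] *= stack[sp];
            break;
        case OP_JUMP_IF_ZERO:
            assert(sp >= 1);
            if (stack[--sp] == 0)
                pc = (size_t)i.arg;
            break;
        case OP_HALT:
            assert(sp >= 1);
            return stack[sp - 1];
        }
    }
    assert(0 && "program ran off the end without OP_HALT");
    return 0;
}
```

For instance, the sequence PUSH 2, PUSH 3, ADD, PUSH 4, MUL, HALT evaluates (2 + 3) * 4. A JIT in the sense discussed above would translate the same instruction array to machine code once instead of dispatching on every instruction.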
- Jay

From hosking at cs.purdue.edu Fri Jun 8 16:39:39 2012
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Fri, 8 Jun 2012 10:39:39 -0400
Subject: [M3devel] [M3commit] CVS Update: cm3 (windows condition variables)
In-Reply-To: <07030E90-1842-47DB-8D68-86F154E05E0D@m3w.org>
References: , <1339098164.65798.YahooMailClassic@web29703.mail.ird.yahoo.com>, , <8531351D-A02E-4635-971F-C96736810851@cs.purdue.edu> <07030E90-1842-47DB-8D68-86F154E05E0D@m3w.org>
Message-ID: <7FF89030-927D-40C6-993D-DB44E88A35AD@cs.purdue.edu>

Agreed, but we should be able to inline the CAS, avoiding a function call.

On Jun 8, 2012, at 6:38 AM, Dragiša Durić wrote:

> At least under Linux, uncontended access to a futex is (IMHO) a CAS-based, user-space operation.
>
> Same thing?
>
> On Jun 8, 2012, at 12:25 PM, Tony Hosking wrote:
>
>> My point is that modern JVMs, including Sun/Oracle HotSpot, don't map every synchronized statement to an invocation of an underlying pthread or win32 lock. Instead, they use fast processor synchronization primitives like CAS compiled into the code to quickly "lock" an object in the vast majority of cases when no other thread is trying to lock the same object, without mapping to some pthread or win32 mutex.
>>
>> On Jun 8, 2012, at 5:23 AM, Jay K wrote:
>>
>>> sorry -- clarification, we are similar to the widely used Sun/Oracle JVM.
>>> Not necessarily state-of-the-art, but not bad.
>>>
>>> Our locks map pretty directly to underlying pthread mutex, Win32 critical section.
>>> Maybe not 100% directly. Maybe we delay-heap-allocate-and-initialize, i.e. so lock declaration/creation is super cheap -- just leave room for a pointer -- but there is a small extra code per lock acquire/release.
>>>
>>> Our condition variable functionality maps pretty directly to pthread condition variables.
>>> Prior to Vista, there were no Win32 condition variables, but what we do is pretty good, better than many implementations out there (e.g. older Modula-3, Boost) and similar to widely used implementations, e.g. Sun/Oracle Java. In particular we do not have a giant lock for condition variable operations, which some literature says you need.
>>>
>>> Historically the Win32 Modula-3 threading library had a giant lock to aid in condition variable implementation.
>>> It was pretty bad.
>>>
>>> Since pthread and Win32 are widely used, hopefully they are really good, and if not, will be improved for the vast majority of code to reuse. Tony, in your research, you should be sure to compare against Win32 SRWLOCK and newer versions of Windows (i.e. newer than XP). I'll try to read your paper.
>>>
>>> - Jay
>>>
>>> > Subject: Re: [M3devel] [M3commit] CVS Update: cm3 (windows condition variables)
>>> > From: antony.hosking at gmail.com
>>> > Date: Fri, 8 Jun 2012 04:38:20 -0400
>>> > CC: jay.krell at cornell.edu; m3devel at elegosoft.com
>>> > To: dragisha at m3w.org
>>> >
>>> > On Jun 8, 2012, at 4:06 AM, Dragiša Durić wrote:
>>> >
>>> > > Please explain this more, and if you can - draw parallel to *nix.
>>> > >
>>> > > TIA
>>> > >
>>> > > On Jun 8, 2012, at 4:05 AM, Jay K wrote:
>>> > >
>>> > >> Note though that "LOCK" doesn't map directly to EnterCriticalSection and more significantly, m3core provides essentially condition variables, which don't map directly to Win32, prior to Vista.
>>> > >> (LOCK at least does a delay-heap-allocation-and-initialize before EnterCriticalSection, but also interacts with the condition variables I recall.)
>>> > >>
>>> > >> I did a bunch of "research" and our condition variable substitution is pretty good now, equivalent to what Java implementations do.
>>> > >> Definitely better than others e.g. Boost.
>>> >
>>> > We are certainly NOT equivalent to state-of-the-art Java implementations. Take a look at http://dx.doi.org/10.1145/2093157.2093184 for example.
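To illustrate the point made in the thread above (an inlined CAS for the uncontended case, with the operating-system primitive reserved for contention), here is a minimal C11 sketch. It is not the m3core, HotSpot, or futex implementation. `hybrid_lock_t` and every function name are hypothetical, and the slow path is assumed to be a plain pthread mutex plus condition variable.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>

/* Hypothetical hybrid lock; none of these names come from m3core. */
typedef struct {
    atomic_int state;         /* 0 = free, 1 = held */
    atomic_int waiters;       /* threads currently parked on the slow path */
    atomic_int slow_entries;  /* statistics: how often the slow path ran */
    pthread_mutex_t mu;       /* protects the condition variable */
    pthread_cond_t cv;        /* where contending threads sleep */
} hybrid_lock_t;

#define HYBRID_LOCK_INIT \
    { 0, 0, 0, PTHREAD_MUTEX_INITIALIZER, PTHREAD_COND_INITIALIZER }

/* One CAS attempt; small enough for a compiler to inline at each lock site. */
static inline int hybrid_try_fast(hybrid_lock_t *l)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&l->state, &expected, 1);
}

/* Contended case only: register as a waiter and sleep until the CAS wins. */
static void hybrid_lock_slow(hybrid_lock_t *l)
{
    atomic_fetch_add(&l->slow_entries, 1);
    pthread_mutex_lock(&l->mu);
    atomic_fetch_add(&l->waiters, 1);
    while (!hybrid_try_fast(l))
        pthread_cond_wait(&l->cv, &l->mu);
    atomic_fetch_sub(&l->waiters, 1);
    pthread_mutex_unlock(&l->mu);
}

static inline void hybrid_lock(hybrid_lock_t *l)
{
    if (!hybrid_try_fast(l))      /* uncontended: no library call at all */
        hybrid_lock_slow(l);
}

static inline void hybrid_unlock(hybrid_lock_t *l)
{
    assert(atomic_load(&l->state) == 1);
    atomic_store(&l->state, 0);
    if (atomic_load(&l->waiters) > 0) {   /* wake sleepers only if any exist */
        pthread_mutex_lock(&l->mu);
        pthread_cond_broadcast(&l->cv);
        pthread_mutex_unlock(&l->mu);
    }
}
```

With no contention, `hybrid_lock` and `hybrid_unlock` touch only the atomic words and never call into pthreads, so `slow_entries` stays at zero. A production lock would also need fairness and spinning policies that this sketch leaves out.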
>>> >
>>> > - Tony
>>> >
>>
>> Antony Hosking | Associate Professor | Computer Science | Purdue University
>> 305 N. University Street | West Lafayette | IN 47907 | USA
>> Mobile +1 765 427 5484
>>

From dabenavidesd at yahoo.es Fri Jun 8 17:20:48 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Fri, 8 Jun 2012 16:20:48 +0100 (BST)
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To: <20120608145540.GA10805@topoi.pooq.com>
Message-ID: <1339168848.48067.YahooMailClassic@web29701.mail.ird.yahoo.com>

Hi all:
Interesting: someone did that (a web search turns up others): http://compilers.iecc.com/comparch/article/98-03-247, besides a partial JVM. It would be a selling point for CM3 if this were readily implemented and efficient.
Thanks in advance

From dmuysers at hotmail.com Fri Jun 8 17:37:04 2012
From: dmuysers at hotmail.com (Dirk Muysers)
Date: Fri, 8 Jun 2012 17:37:04 +0200
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To: <20120608145540.GA10805@topoi.pooq.com>
References: <20120606064732.2C9242474003@birch.elegosoft.com><55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org><20120606161808.7F5EA1A205B@async.async.caltech.edu><20120607163641.A81351A205B@async.async.caltech.edu><20120607211135.GA6314@topoi.pooq.com> <20120608145540.GA10805@topoi.pooq.com>
Message-ID: 

That would be relatively easy. libjit offers an excellent infrastructure for building just-in-time compilers. On the downside: slow program start and a considerable waste of memory resources. Its code generator is as good as non-optimised C. An example: a JIT translator for Oberon.

From dabenavidesd at yahoo.es Fri Jun 8 20:50:23 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Fri, 8 Jun 2012 19:50:23 +0100 (BST)
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To: 
Message-ID: <1339181423.68039.YahooMailClassic@web29701.mail.ird.yahoo.com>

Hi all:
Olivetti M3 had an AST-based interpreter, and Vulcan was an AST-based environment; I don't know which was better. Vulcan was heavily parallelized, so it could be a nice basis for a multi-threaded execution engine. The Olivetti M3 AST toolkit could well be a good AST for building an extensible kind of meta-environment (and you could retarget it to C): for instance, use it to generate a portable environment in that sense, and then run that on a fast, Vulcan-style parallel engine to make a fast JIT builder.
Thanks in advance

From jay.krell at cornell.edu Sun Jun 10 10:34:36 2012
From: jay.krell at cornell.edu (Jay K)
Date: Sun, 10 Jun 2012 08:34:36 +0000
Subject: [M3devel] reducing our diff to gcc?
Message-ID: 

reducing our diff to gcc?

Ignore my hacking: extern C, removal of optimizer, removal of gmp/mpfr/mpc..

but wait: do people like removal of gmp/mpfr/mpc? I do. I'm torn.
But to my point:

gimplify.c: I think we can achieve the diff via langhook.gimplify_expr.

tree.def: I think frontends can add their own codes in separate files, so the diff can be removed.

but, tree-nested.c, I doubt this can be avoided..so I'm left probably just not bothering with the others.

Thoughts?

There is also at least one bug fix...that I could avoid needing.
There is a bug optimizing our form of div/mod.
We could avoid that by going back to function calls, but..again, I'm torn.

If you configure --enable-checking, at least currently, there are asserts that have to be removed.

I think I'll just go ahead and patch 4.7 "completely", w/o overdoing it.

 - Jay

From dragisha at m3w.org Sun Jun 10 10:58:00 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Sun, 10 Jun 2012 10:58:00 +0200
Subject: [M3devel] reducing our diff to gcc?
In-Reply-To: 
References: 
Message-ID: <58EF8A55-4D81-401C-AC47-C5826F6EE759@m3w.org>

Think Occam. Not overdoing is a good idea :).

On Jun 10, 2012, at 10:34 AM, Jay K wrote:
>
> I think I'll just go ahead and patch 4.7 "completely", w/o overdoing it.

From jay.krell at cornell.edu Sun Jun 10 12:31:46 2012
From: jay.krell at cornell.edu (Jay K)
Date: Sun, 10 Jun 2012 10:31:46 +0000
Subject: [M3devel] reducing our diff to gcc?
In-Reply-To: <58EF8A55-4D81-401C-AC47-C5826F6EE759@m3w.org>
References: , <58EF8A55-4D81-401C-AC47-C5826F6EE759@m3w.org>
Message-ID: 

Hehe. If someone builds something over-complicated, am I obligated to strip it back down? :)

 - Jay

From dragisha at m3w.org Sun Jun 10 13:05:35 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Sun, 10 Jun 2012 13:05:35 +0200
Subject: [M3devel] reducing our diff to gcc?
In-Reply-To: 
References: , <58EF8A55-4D81-401C-AC47-C5826F6EE759@m3w.org>
Message-ID: <5663CB3D-C3ED-4BA0-823F-4D251B29F1A6@m3w.org>

That makes your change complicated :). So, no! :)

On Jun 10, 2012, at 12:31 PM, Jay K wrote:
>
> Hehe. If someone builds something over-complicated, am I obligated to strip it back down?
> :)
>
> - Jay

From dragisha at m3w.org Sun Jun 10 16:16:00 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Sun, 10 Jun 2012 16:16:00 +0200
Subject: [M3devel] new kid on the block: http://lycus.org/
Message-ID: <3590891F-3B7B-46B1-83F6-7155F9254927@m3w.org>

Maybe of interest. A friend of mine, D fan, sent this to me.

From rodney_bates at lcwb.coop Mon Jun 11 14:39:09 2012
From: rodney_bates at lcwb.coop (Rodney M. Bates)
Date: Mon, 11 Jun 2012 07:39:09 -0500
Subject: [M3devel] reducing our diff to gcc?
In-Reply-To: 
References: 
Message-ID: <4FD5E6ED.3040503@lcwb.coop>

On 06/10/2012 03:34 AM, Jay K wrote:
>
> reducing our diff to gcc?
>
> Ignore my hacking: extern C, removal of optimizer, removal of gmp/mpfr/mpc..
>
> but wait: do people like removal of gmp/mpfr/mpc? I do. I'm torn.
>
> But to my point:
>
> gimplify.c: I think we can achieve the diff via langhook.gimplify_expr.
>
> tree.def: I think frontends can add their own codes in separate files, so the diff can be removed.
>
> but, tree-nested.c, I doubt this can be avoided..so I'm left probably
> just not bothering with the others.
>

tree-nested.c has been a thorn in my side from its inception. It broke a whole lot of stuff in m3gdb, everything that has to do with nonlocal variable access and/or variables of procedure type. It reshuffles the activation record around, with multiple copies of lots of things, especially the static link, which has either two, or, if I remember right, three copies in different places. Moreover, they don't all point to the same place in their target AR.

All this wouldn't be too bad, if we got the debug info altered to reflect the reality, but by the time tree-nested does its thing, it's kind of late to do that easily. That's one of the attractions of llvm to me, that it's well set up to transform both the code and its debug info in parallel, when doing optimization. Maybe gcc would be easier too, if we didn't do our own debug info production in parse.c. That could be a lot of work, but would fit nicely with switching to dwarf.

As I understood it, all of the changes tree-nested.c makes are really only needed for the interaction between nonlocal variable access _and_ inlining. The last I knew we have had inlining disabled from the beginning anyway. Jay, if this is still true, and as you are into disabling various gcc optimizations, what would you think of just disabling what tree-nested does?

>
> Thoughts?
>
> There is also at least one bug fix...that I could avoid needing.
> There is a bug optimizing our form of div/mod.
> We could avoid that by going back to function calls, but..again, I'm torn.
>
> If you configure -enable-checking, at least currently, there are asserts that have to be removed.
>
> I think I'll just go ahead and patch 4.7 "completely", w/o overdoing it.
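For background on the div/mod remark quoted above: Modula-3 defines DIV as rounding toward negative infinity, with x MOD y = x - y * (x DIV y), whereas C's `/` and `%` truncate toward zero. The thread does not say what the optimizer bug is. The sketch below only shows what the "function calls" alternative could look like in C; the helper names are illustrative and are not taken from the cm3 runtime.

```c
#include <assert.h>

/* Floor division built on C's truncating '/'.
   Illustrative name; b is assumed to be non-zero. */
static long floor_div(long a, long b)
{
    long q = a / b;                  /* C99: truncates toward zero */
    if ((a % b != 0) && ((a < 0) != (b < 0)))
        q -= 1;                      /* inexact and signs differ: step down */
    return q;
}

/* Matching modulus: the result takes the sign of the divisor. */
static long floor_mod(long a, long b)
{
    long r = a % b;                  /* C99: sign follows the dividend */
    if (r != 0 && ((r < 0) != (b < 0)))
        r += b;                      /* move into the divisor's sign range */
    return r;
}
```

For example, `floor_div(-7, 2)` is -4 and `floor_mod(-7, 2)` is 1, where plain C gives -3 and -1.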
>
> - Jay
>

From jay.krell at cornell.edu Mon Jun 11 21:07:26 2012
From: jay.krell at cornell.edu (Jay K)
Date: Mon, 11 Jun 2012 19:07:26 +0000
Subject: [M3devel] reducing our diff to gcc?
In-Reply-To: <4FD5E6ED.3040503@lcwb.coop>
References: , <4FD5E6ED.3040503@lcwb.coop>
Message-ID: 

> Maybe gcc would be easier too, if we didn't do our own debug
> info production in parse.c.

Correct. It is "our fault" for doing weird things debugging-wise.

> That could be a lot of work

It is "the right amount of work", but yeah, kind of a lot.

> but would fit nicely with switching to dwarf.

We'd just use -g and use whatever gcc wants for the target system. Sometimes Dwarf, sometimes not, we wouldn't care.

> As I understood it, all of the changes tree-nested.c makes are really only
> needed for the interaction between nonlocal variable access _and_ inlining.

I don't think so, but I don't know.

> The last I knew we have had inlining disabled from the beginning anyway.

We have inlining on mostly. Aside from a small sprinkling of "volatile".
Off in gcc 4.6 backend, but I never enabled that and am moving on to 4.7 rapidly.

> what would you think of just disabling what tree-nested does?

I'm really not sure it is possible.
Sure, if nested functions used only for "lexical hiding" of the functions themselves.
But Modula-3 uses the "static link" in a unique-to-itself way.
I don't expect gcc to "just work".
I can explain the Modula-3 unique way if people want.
It turns out..I have thought about this a bunch, there is no good way to handle the static link, given that you can take the addresses of nested functions. (Right?)
Where you don't take the address, the static link can just be an extra parameter.

Or maybe this is dealt with elsewhere or otherwise...

We do actually use "extra parameter" sometimes for static link.
And maybe elsewhere/otherwise is in the frontend, mostly..just mostly...

There are comments in tree-nested.c indicating it has "bad history".
But actually, I'm not sure it does things so poorly.

The basic theory of nested functions includes stuffing locals into a struct, at least locals accessed by nested functions, and passing a pointer to that struct as an extra parameter. The locals include said pointer to struct of locals, in the case of multiple nesting levels. OR you can "flatten" things, I guess, maybe.
Flattening is problematic though, given nested functions can be mutually recursive and such..you want to update just one place and have all the other code follow pointers to it.
Optimization can copy around copies instead of pointers, where it is profitable.
Sorry, I don't have time to explain right now.

 - Jay
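The "struct of locals plus extra parameter" scheme described in this thread can be written out by hand in plain C. This is only a sketch of the general idea, not what tree-nested.c or the Modula-3 front end actually emits. Every name is made up for illustration, and the `closure` pair is one assumed way to represent a nested procedure passed as a parameter.

```c
#include <assert.h>

/* Locals of `outer` that the nested function touches, gathered in one struct. */
struct outer_frame {
    int total;   /* updated by the nested function */
    int scale;   /* read by the nested function */
};

/* The "nested" function lifted to top level; `link` plays the static link. */
static void add_scaled(struct outer_frame *link, int x)
{
    link->total += link->scale * x;
}

/* A nested procedure passed as a parameter: code address plus static link. */
struct closure {
    void (*code)(struct outer_frame *link, int x);
    struct outer_frame *link;
};

/* Calls the closure once per element, handing back its own static link. */
static void for_each(const int *items, int n, struct closure visit)
{
    for (int i = 0; i < n; i++)
        visit.code(visit.link, items[i]);
}

static int outer(const int *items, int n, int scale)
{
    struct outer_frame frame = { 0, scale };         /* lives in outer's AR */
    struct closure visit = { add_scaled, &frame };   /* valid only while outer runs */
    for_each(items, n, visit);
    return frame.total;
}
```

Calling `outer` on the array {1, 2, 3} with scale 10 returns 60. Because `visit.link` points into `outer`'s activation record, the pair must not outlive the call, which is why a language may allow passing such a value down as a parameter while forbidding storing it in a variable.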
From rodney_bates at lcwb.coop Tue Jun 12 18:17:50 2012
From: rodney_bates at lcwb.coop (Rodney M. Bates)
Date: Tue, 12 Jun 2012 11:17:50 -0500
Subject: [M3devel] reducing our diff to gcc?
In-Reply-To: 
References: , <4FD5E6ED.3040503@lcwb.coop>
Message-ID: <4FD76BAE.7060702@lcwb.coop>

On 06/11/2012 02:07 PM, Jay K wrote:
>
> > Maybe gcc would be easier too, if we didn't do our own debug
> > info production in parse.c.
>
> Correct. It is "our fault" for doing weird things debugging-wise.
>
> > That could be a lot of work
>
> It is "the right amount of work", but yeah, kind of a lot.
>
> > but would fit nicely with switching to dwarf.
> We'd just use -g and use whatever gcc wants for the target system.
> Sometimes Dwarf, sometimes not, we wouldn't care.
>

It's going to require quite a lot in m3gdb. Stock gdb has readers for several debug info formats, but there's a lot that is language-dependent, even for C, let alone the other languages supported by stock gdb. I think this has considerable debug-format dependency too, leading to a Cartesian product. It is certainly that way for Modula-3. I would be greatly surprised if gcc didn't also require at least a bit of M3-dependent work, even for dwarf.

> > As I understood it, all of the changes tree-nested.c makes are really only
> > needed for the interaction between nonlocal variable access _and_ inlining.
>
> I don't think so, but I don't know.
>
> > The last I knew we have had inlining disabled from the beginning anyway.
>
> We have inlining on mostly. Aside from a small sprinkling of "volatile".
> Off in gcc 4.6 backend, but I never enabled that and am moving on to 4.7 rapidly.
>
> > what would you think of just disabling what tree-nested does?
> I'm really not sure it is possible.
> Sure, if nested functions used only for "lexical hiding" of the functions themselves.
> But Modula-3 uses the "static link" in a unique-to-itself way.
> I don't expect gcc to "just work".
> I can explain the Modula-3 unique way if people want.
> It turns out..I have thought about this a bunch, there is no good way to handle the static link,
> given that you can take the addresses of nested functions. (Right?)
>

Please elaborate. Yes, you can take the address of a nested function. But you can only pass it as a parameter. You can't assign it to a variable. This latter restriction requires some runtime enforcement, but I think it is taken care of by explicitly coded runtime checks generated by parse.c or earlier.

The nested-function language extension to C, implemented by stock gcc, allows the taking of the address of a nested function, without the restriction against assigning it to a variable, with no linguistic safety added. If, in C, you use such a function "address" for a function that has returned, to quote from gcc "all hell will break loose". But this should imply that stock gcc support is enough for Modula-3.

>
> Where you don't take the address, the static link can just be an extra parameter.
>

Either way, you need a static link, and it is just passed as an extra parameter. In the x86 case, it is always passed in the same register (ecx, if I recall) and always immediately stored by prolog code at the same place in the AR. tree-nested doesn't mess with this, but adds extra static-linkish variable(s) elsewhere in the AR, derived from this one, and uses them in some/all places.

>
> Or maybe this is dealt with elsewhere or otherwise...
>
> We do actually use "extra parameter" sometimes for static link.
> And maybe elsewhere/otherwise is in the frontend, mostly..just mostly...
>
> There are comments in tree-nested.c indicating it has "bad history".
> But actually, I'm not sure it does things so poorly.
>

I haven't read the comments in later gcc versions, but the bad history I recall is that it greatly simplifies an "insanely complicated" scheme. Unfortunately, the simplification is all compile-time, at the expense of replacing a relatively simple runtime scheme with one I would call at least very complicated, if not insanely so.

> The basic theory of nested functions includes stuffing locals into a struct,
> at least locals accessed by nested functions, and passing a pointer to that struct
> as an extra parameter. The locals include said pointer to struct of locals, in the case
> of multiple nesting levels. OR you can "flatten" things, I guess, maybe.

Actually, it's the other way around. All locals start out in a flat AR.
If the function contains nested function(s), tree-nested collects the locals that are referenced nonlocally (i.e., from within one of the nested functions) into a local struct. Then, the nested functions get and use what you could call a "derived static link" (a better term is needed) that points directly to this struct rather than to the whole AR. I guess this helps with inlining, in case the struct isn't actually located in the same way inside the parent AR.

> Flattening is problematic though, given nested functions can be mutually recursive
> and such..you want to update just one place and have all the other code follow pointers to it.
> Optimization can copy around copies instead of pointers, where it is profitable.
> Sorry, I don't have time to explain right now.
>
> - Jay

From dabenavidesd at yahoo.es Wed Jun 13 04:18:33 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Wed, 13 Jun 2012 03:18:33 +0100 (BST)
Subject: [M3devel] reducing our diff to gcc?
In-Reply-To: <4FD76BAE.7060702@lcwb.coop>
Message-ID: <1339553913.24183.YahooMailClassic@web29705.mail.ird.yahoo.com>

Hi all:
In fact, the language-dependent parts of a debugger are inherently part of the compiler architecture (gdb needs to re-implement a lot of machinery from gcc; maybe that is still the case, but it could be reorganized to cut it down, if that is done in C).
I have heard m3gdb is about 20k lines of code. That is hard for me, and in C, worse. I think a full debugger can be implemented in that many lines (at least ldb is like that), so I don't know how much of m3gdb is really not in Gdb.
Now, m3gcc (or m3cgc, m3cg, or m3cc) is not of interest to GNU, so why keep it like that? We should use it as a real backend, not just for a language but as a real architecture. As it isn't one, what would it take to do that? In fact that's what we are trying to do with JIT, right? What I have found tells me that C code tends, AFAIK, to be more portable in the form of a stack architecture like M3CG than anything else.
On the other hand, as for compiling gcc over and over again, I don't know how many of us want to do that each time we compile a Modula-3 distribution (I do). Now, I don't think gcc wants to add and support our ideal architecture, but who knows whether the thing will work for us; maybe they will want it, won't they?
Thanks in advance

--- On Tue, 12/6/12, Rodney M. Bates wrote:

From: Rodney M. Bates
Subject: Re: [M3devel] reducing our diff to gcc?
To: "m3devel"
Date: Tuesday, 12 June 2012, 11:17

On 06/11/2012 02:07 PM, Jay K wrote:
>
> > Maybe gcc would be easier too, if we didn't do our own debug
> > info production in parse.c.
>
> Correct. It is "our fault" for doing weird things debugging-wise.
>
> > That could be a lot of work
>
> It is "the right amount of work", but yeah, kind of a lot.
>
> > but would fit nicely with switching to dwarf.
>
> We'd just use -g and use whatever gcc wants for the target system.
> Sometimes Dwarf, sometimes not, we wouldn't care.

It's going to require quite a lot in m3gdb. Stock gdb has readers for several debug info formats, but there's a lot that is language-dependent, even for C, let alone the other languages supported by stock gdb. I think this has considerable debug-format dependency too, leading to a Cartesian product. It is certainly that way for Modula-3. I would be greatly surprised if gcc didn't also require at least a bit of M3-dependent work, even for dwarf.

> > As I understood it, all of the changes tree-nested.c makes are really only
> > needed for the interaction between nonlocal variable access _and_ inlining.
>
> I don't think so, but I don't know.
>
> > The last I knew we have had inlining disabled from the beginning anyway.
>
> We have inlining on mostly. Aside from a small sprinkling of "volatile".
> Off in gcc 4.6 backend, but I never enabled that and am moving on to 4.7 rapidly.
>
> > what would you think of just disabling what tree-nested does?
>
> I'm really not sure it is possible.
> Sure, if nested functions used only for "lexical hiding" of the functions themselves.
> But Modula-3 uses the "static link" in a unique-to-itself way.
> I don't expect gcc to "just work".
> I can explain the Modula-3 unique way if people want.
> It turns out..I have thought about this a bunch, there is no good way to handle the static link,
> given that you can take the addresses of nested functions. (Right?)

Please elaborate. Yes, you can take the address of a nested function. But you can only pass it as a parameter. You can't assign it to a variable. This latter restriction requires some runtime enforcement, but I think it is taken care of by explicitly coded runtime checks generated by parse.c or earlier.

The nested-function language extension to C, implemented by stock gcc, allows taking the address of a nested function, without the restriction against assigning it to a variable, and with no linguistic safety added. If, in C, you use such a function "address" for a function that has returned, to quote from gcc, "all hell will break loose". But this should imply that stock gcc support is enough for Modula-3.

> Where you don't take the address, the static link can just be an extra parameter.

Either way, you need a static link, and it is just passed as an extra parameter. In the x86 case, it is always passed in the same register (ecx, if I recall) and always immediately stored by prolog code at the same place in the AR. tree-nested doesn't mess with this, but adds extra static-linkish variable(s) elsewhere in the AR, derived from this one, and uses them in some/all places.

> Or maybe this is dealt with elsewhere or otherwise...
>
> We do actually use "extra parameter" sometimes for static link.
> And maybe elsewhere/otherwise is in the frontend, mostly..just mostly...
>
> There are comments in tree-nested.c indicating it has "bad history".
> But actually, I'm not sure it does things so poorly.

I haven't read the comments in later gcc versions, but the bad history I recall is that it greatly simplifies an "insanely complicated" scheme. Unfortunately, the simplification is all compile-time, at the expense of replacing a relatively simple runtime scheme with one I would call at least very complicated, if not insanely so.

> The basic theory of nested functions includes stuffing locals into a struct,
> at least locals accessed by nested functions, and passing a pointer to that struct
> as an extra parameter. The locals include said pointer to struct of locals, in the case
> of multiple nesting levels. OR you can "flatten" things, I guess, maybe.

Actually, it's the other way around. All locals start out in a flat AR. If the function contains nested function(s), tree-nested collects the locals that are referenced nonlocally (i.e., from within one of the nested functions) into a local struct. Then, the nested functions get and use what you could call a "derived static link" (a better term is needed) that points directly to this struct rather than to the whole AR. I guess this helps with inlining, in case the struct isn't actually located in the same way inside the parent AR.
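
A minimal sketch of the scheme described in the paragraph above, written by hand in standard C rather than with gcc's nested-function extension. The names `outer_frame`, `add_step`, and `outer` are hypothetical, and this only illustrates the idea (nonlocally referenced locals gathered into a struct, with a pointer to that struct passed as an extra parameter); it is not what tree-nested.c actually emits:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; an illustration only, not gcc output. */

/* The locals of outer() that the "nested" function references nonlocally. */
struct outer_frame {
    int total;  /* written by the nested function */
    int step;   /* read by the nested function */
};

/* The nested function, lifted to file scope. The extra first parameter
   plays the role of the "derived static link": it points at the struct of
   shared locals, not at the whole activation record of outer(). */
static void add_step(struct outer_frame *link, int times)
{
    assert(link != NULL);
    for (int i = 0; i < times; i++)
        link->total += link->step;
}

/* The enclosing function. Only the shared locals live in the struct;
   purely local variables stay ordinary locals of the frame. */
int outer(int step, int times)
{
    struct outer_frame frame = { 0, step };
    int unrelated = 42;          /* never referenced nonlocally */

    (void)unrelated;
    add_step(&frame, times);     /* static link passed explicitly */
    return frame.total;
}
```

Calling `outer(3, 4)` adds 3 to `total` four times through the link pointer. A debugger that expects the link to point at the whole AR would look in the wrong place here, which is the kind of mismatch the thread attributes to tree-nested.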
> Flattening is problematic though, given nested functions can be mutually recursive
> and such..you want to update just one place and have all the other code follow pointers to it.
> Optimization can copy around copies instead of pointers, where it is profitable.
> Sorry, I don't have time to explain right now.
>
> - Jay
>
> ----------------------------------------
>> Date: Mon, 11 Jun 2012 07:39:09 -0500
>> From: rodney_bates at lcwb.coop
>> To: m3devel at elegosoft.com
>> Subject: Re: [M3devel] reducing our diff to gcc?
>>
>> On 06/10/2012 03:34 AM, Jay K wrote:
>>>
>>> reducing our diff to gcc?
>>>
>>> Ignore my hacking: extern C, removal of optimizer, removal of gmp/mpfr/mpc..
>>>
>>> but wait: do people like removal of gmp/mpfr/mpc? I do. I'm torn.
>>>
>>> But to my point:
>>>
>>> gimplify.c: I think we can achieve the diff via langhook.gimplify_expr.
>>>
>>> tree.def: I think frontends can add their own codes in separate files, so the diff can be removed.
>>>
>>> but, tree-nested.c, I doubt this can be avoided..so I'm left probably
>>> just not bothering with the others.
>>>
>>
>> tree-nested.c has been a thorn in my side from its inception. It broke a whole
>> lot of stuff in m3gdb, everything that has to do with nonlocal variable access
>> and/or variables of procedure type. It reshuffles the activation record around,
>> with multiple copies of lots of things, especially the static link, which has
>> either two, or, if I remember right, three copies in different places. Moreover,
>> they don't all point to the same place in their target AR.
>>
>> All this wouldn't be too bad, if we got the debug info altered to reflect the
>> reality, but by the time tree-nested does its thing, it's kind of late to do
>> that easily. That's one of the attractions of llvm to me, that it's well set
>> up to transform both the code and its debug info in parallel, when doing
>> optimization. Maybe gcc would be easier too, if we didn't do our own debug
>> info production in parse.c. That could be a lot of work, but would
>> fit nicely with switching to dwarf.
>>
>> As I understood it, all of the changes tree-nested.c makes are really only
>> needed for the interaction between nonlocal variable access _and_ inlining.
>> The last I knew we have had inlining disabled from the beginning anyway.
>> Jay, if this is still true, and as you are into disabling various gcc
>> optimizations, what would you think of just disabling what tree-nested does?
>>
>>>
>>> Thoughts?
>>>
>>> There is also at least one bug fix...that I could avoid needing.
>>> There is a bug optimizing our form of div/mod.
>>> We could avoid that by going back to function calls, but..again, I'm torn.
>>>
>>> If you configure -enable-checking, at least currently, there are asserts that have to be removed.
>>>
>>> I think I'll just go ahead and patch 4.7 "completely", w/o overdoing it.
>>>
>>> - Jay
>>>

From jay.krell at cornell.edu Sat Jun 16 08:09:33 2012
From: jay.krell at cornell.edu (Jay K)
Date: Sat, 16 Jun 2012 06:09:33 +0000
Subject: [M3devel] help test 4.7 backend?

help test 4.7 backend?

Can folks try out the new 4.7 backend?

edit m3-sys/m3cc/src/m3makefile
add your platform to the list near the top, mapped to "47"
and then run scripts/python/boot2.sh

and then, do it again, but edit config/Unix.common, the function m3_backend,
to always args += m3back_optimize
and optionally but preferably try with -O3 instead of -O2 in the same file

and try running some GUI apps like solitaire

I could use help particularly with:
 SPARC{32,64}_LINUX
 PPC_{LINUX,OPENBSD,NETBSD,FREEBSD,DARWIN}
 ALPHA_OSF
 I386_LINUX, I386_INTERIX, I386_MINGWIN, I386_CYGWIN, because I'm being lazy

I can do various x86/amd64, either in a VM or opencsw, but splitting that load would be good too.
I might go back to not having much time soon or temporarily.

Still to do:
 apply OpenBSD patches
 update from 4.7.0 to 4.7.1 that was just released.

Thanks,
 - Jay

From jay.krell at cornell.edu Sat Jun 16 10:47:35 2012
From: jay.krell at cornell.edu (Jay K)
Date: Sat, 16 Jun 2012 08:47:35 +0000
Subject: [M3devel] ALPHA_LINUX
References: <20120606064732.2C9242474003@birch.elegosoft.com> <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org> <20120607011634.468b6bbf@wenus.next.com.pl> <741029B0-331E-4E10-9886-86A78B0ED3CC@m3w.org>

> > Is cm3 working on LINUX_ALPHA? I have one ES40 working server with Gentoo Linux
>
> I don't think it does yet, but give me ssh access and I can most likely make it work pretty quickly.
> There is very very very little to porting these days.

So forgetful of me.
Yes, it works.
See: http://www.opencm3.net/uploaded-archives/index.html

 - Jay

From jay.krell at cornell.edu Sat Jun 16 11:03:22 2012
From: jay.krell at cornell.edu (Jay K)
Date: Sat, 16 Jun 2012 09:03:22 +0000
Subject: [M3devel] IA64_LINUX
References: <20120606064732.2C9242474003@birch.elegosoft.com> <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org> <20120607011634.468b6bbf@wenus.next.com.pl> <741029B0-331E-4E10-9886-86A78B0ED3CC@m3w.org>

Also IA64_LINUX, which I thought I was working on recently, yet I already put it up here a while ago:

http://www.opencm3.net/uploaded-archives/index.html

I don't remember if I solved the coding for finding the register spill stack..and indeed..I don't see the code in m3core...so a little bit to do there... I expect there might be a GC bug there..or maybe we should make all stores volatile..or something...

 - Jay

From: jay.krell at cornell.edu
To: dragisha at m3w.org; dknoto at gmail.com
CC: m3devel at elegosoft.com
Subject: ALPHA_LINUX
Date: Sat, 16 Jun 2012 08:47:35 +0000

> > Is cm3 working on LINUX_ALPHA? I have one ES40 working server with Gentoo Linux
>
> I don't think it does yet, but give me ssh access and I can most likely make it work pretty quickly.
> There is very very very little to porting these days.

So forgetful of me.
Yes, it works.
See:
http://www.opencm3.net/uploaded-archives/index.html

 - Jay

From mark at wickensonline.co.uk Sat Jun 16 11:49:45 2012
From: mark at wickensonline.co.uk (Mark Wickens)
Date: Sat, 16 Jun 2012 10:49:45 +0100
Subject: [M3devel] IA64_LINUX
References: <20120606064732.2C9242474003@birch.elegosoft.com> <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org> <20120607011634.468b6bbf@wenus.next.com.pl> <741029B0-331E-4E10-9886-86A78B0ED3CC@m3w.org>

If you feel the need to address the issues let me know and I'll put the ZX6000 online for you.

Mark.

Sent from my iPad

On 16 Jun 2012, at 10:03, Jay K wrote:

> Also IA64_LINUX, which I thought I was working on recently, yet I already put it up here a while ago:
>
> http://www.opencm3.net/uploaded-archives/index.html
>
> I don't remember if I solved the coding for finding the register spill stack..and indeed..I don't see the code in m3core...so a little bit to do there... I expect there might be a GC bug there..or maybe we should make all stores volatile..or something...
>
> - Jay
>
> From: jay.krell at cornell.edu
> To: dragisha at m3w.org; dknoto at gmail.com
> CC: m3devel at elegosoft.com
> Subject: ALPHA_LINUX
> Date: Sat, 16 Jun 2012 08:47:35 +0000
>
> > > Is cm3 working on LINUX_ALPHA? I have one ES40 working server with Gentoo Linux
> >
> > I don't think it does yet, but give me ssh access and I can most likely make it work pretty quickly.
> > There is very very very little to porting these days.
>
> So forgetful of me.
>
> Yes, it works.
>
> See:
> http://www.opencm3.net/uploaded-archives/index.html
>
> - Jay
>

From dragisha at m3w.org Sun Jun 17 20:36:02 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Sun, 17 Jun 2012 20:36:02 +0200
Subject: [M3devel] g_open, GLib wrapper for open()

From doc:
===
There is a group of functions which wrap the common POSIX functions dealing with filenames (g_open(), g_rename(), g_mkdir(), g_stat(), g_unlink(), g_remove(), g_fopen(), g_freopen()). The point of these wrappers is to make it possible to handle file names with any Unicode characters in them on Windows without having to use ifdefs and the wide character API in the application code.

The pathname argument should be in the GLib file name encoding. On POSIX this is the actual on-disk encoding which might correspond to the locale settings of the process (or the G_FILENAME_ENCODING environment variable), or not.

On Windows the GLib file name encoding is UTF-8. Note that the Microsoft C library does not use UTF-8, but has separate APIs for current system code page and wide characters (UTF-16). The GLib wrappers call the wide character API if present (on modern Windows systems), otherwise convert to/from the system code page.
===
Template for g_open is:

int g_open (const gchar *filename,
            int flags,
            int mode);

Obviously, I need FilePosix.i3 and descendants, but under Windows? Anyone met/solved this?

TIA,
dd

From dabenavidesd at yahoo.es Mon Jun 18 22:57:07 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Mon, 18 Jun 2012 21:57:07 +0100 (BST)
Subject: [M3devel] g_open, GLib wrapper for open()
Message-ID: <1340053027.33106.YahooMailClassic@web29705.mail.ird.yahoo.com>

Hi all:
Win32 doesn't support the Unicode character set natively, but rather through separate string APIs for ANSI and Unicode, the same as the C run-time library, just as you say; it is not like Windows NT, where Unicode is the native code set for all strings. But I don't think Win32 on Win98 is a common kind of system in daily use, so I guess you can be safe without that. Couldn't you?
Thanks in advance

--- On Sun, 17/6/12, Dragiša Durić wrote:

From: Dragiša Durić
Subject: [M3devel] g_open, GLib wrapper for open()
To: "m3devel"
Date: Sunday, 17 June 2012, 13:36

From doc:
===
There is a group of functions which wrap the common POSIX functions dealing with filenames (g_open(), g_rename(), g_mkdir(), g_stat(), g_unlink(), g_remove(), g_fopen(), g_freopen()). The point of these wrappers is to make it possible to handle file names with any Unicode characters in them on Windows without having to use ifdefs and the wide character API in the application code.

The pathname argument should be in the GLib file name encoding. On POSIX this is the actual on-disk encoding which might correspond to the locale settings of the process (or the G_FILENAME_ENCODING environment variable), or not.

On Windows the GLib file name encoding is UTF-8. Note that the Microsoft C library does not use UTF-8, but has separate APIs for current system code page and wide characters (UTF-16). The GLib wrappers call the wide character API if present (on modern Windows systems), otherwise convert to/from the system code page.
===
Template for g_open is:

int g_open (const gchar *filename,
            int flags,
            int mode);

Obviously, I need FilePosix.i3 and descendants, but under Windows? Anyone met/solved this?

TIA,
dd
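
As a rough illustration of the conversion step the quoted GLib documentation mentions (a UTF-8 file name being turned into wide characters before a wide-character API is called), here is a minimal sketch in portable C. The function name `utf8_to_utf16` is hypothetical and this is not GLib's code; it rejects malformed sequences and surrogate code points but, as an assumption kept for brevity, does not reject overlong encodings:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper, not part of GLib. Decodes a NUL-terminated UTF-8
   string into NUL-terminated UTF-16 code units. Returns the number of
   units written (excluding the terminator), or (size_t)-1 on malformed
   input or when the output buffer is too small. */
size_t utf8_to_utf16(const char *in, uint16_t *out, size_t out_cap)
{
    const unsigned char *p = (const unsigned char *)in;
    size_t n = 0;

    assert(in != NULL && out != NULL);
    while (*p != 0) {
        uint32_t cp;
        int extra;  /* number of continuation bytes still expected */

        if (*p < 0x80)                { cp = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { cp = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { cp = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { cp = *p & 0x07; extra = 3; }
        else return (size_t)-1;       /* stray continuation or bad lead byte */
        p++;

        for (; extra > 0; extra--, p++) {
            /* This test also stops at an early NUL, so we never over-read. */
            if ((*p & 0xC0) != 0x80) return (size_t)-1;
            cp = (cp << 6) | (uint32_t)(*p & 0x3F);
        }
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return (size_t)-1;

        if (cp < 0x10000) {
            if (n + 1 >= out_cap) return (size_t)-1;  /* keep room for NUL */
            out[n++] = (uint16_t)cp;
        } else {
            /* Code points above the BMP become a surrogate pair. */
            if (n + 2 >= out_cap) return (size_t)-1;
            cp -= 0x10000;
            out[n++] = (uint16_t)(0xD800 | (cp >> 10));
            out[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
    }
    if (n >= out_cap) return (size_t)-1;
    out[n] = 0;
    return n;
}
```

On Windows, a buffer produced this way is the sort of argument a wide-character call would take (after a cast to the platform's wide-character type); per the documentation quoted above, the GLib wrappers handle that step themselves, so a Modula-3 binding would presumably only need to hand `g_open` a UTF-8 string.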

From dragisha at m3w.org Tue Jun 19 09:15:32 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Tue, 19 Jun 2012 09:15:32 +0200
Subject: [M3devel] g_open, GLib wrapper for open()
In-Reply-To: <1340053027.33106.YahooMailClassic@web29705.mail.ird.yahoo.com>
References: <1340053027.33106.YahooMailClassic@web29705.mail.ird.yahoo.com>
Message-ID: <110BFBCA-C682-4210-8D44-375550B6DB55@m3w.org>

You could not do without it. Once you need to access a file from a Gtk application, and the file is named with at least one Unicode character, you cannot ignore it.

On Jun 18, 2012, at 10:57 PM, Daniel Alejandro Benavides D. wrote:

> Hi all:
> Win32 doesn't support the Unicode character set natively, but rather through separate string APIs for ANSI and Unicode, the same as the C run-time library, just as you say; it is not like Windows NT, where Unicode is the native code set for all strings. But I don't think Win32 on Win98 is a common kind of system in daily use, so I guess you can be safe without that. Couldn't you?
> Thanks in advance
>
> --- On Sun, 17/6/12, Dragiša Durić wrote:
>
> From: Dragiša Durić
> Subject: [M3devel] g_open, GLib wrapper for open()
> To: "m3devel"
> Date: Sunday, 17 June 2012, 13:36
>
> From doc:
> ===
> There is a group of functions which wrap the common POSIX functions dealing with filenames (g_open(), g_rename(), g_mkdir(), g_stat(), g_unlink(), g_remove(), g_fopen(), g_freopen()). The point of these wrappers is to make it possible to handle file names with any Unicode characters in them on Windows without having to use ifdefs and the wide character API in the application code.
>
> The pathname argument should be in the GLib file name encoding. On POSIX this is the actual on-disk encoding which might correspond to the locale settings of the process (or the G_FILENAME_ENCODING environment variable), or not.
>
> On Windows the GLib file name encoding is UTF-8. Note that the Microsoft C library does not use UTF-8, but has separate APIs for current system code page and wide characters (UTF-16). The GLib wrappers call the wide character API if present (on modern Windows systems), otherwise convert to/from the system code page.
> ===
> Template for g_open is:
>
> int g_open (const gchar *filename,
>             int flags,
>             int mode);
>
> Obviously, I need FilePosix.i3 and descendants, but under Windows? Anyone met/solved this?
>
> TIA,
> dd
>

From hendrik at topoi.pooq.com Tue Jun 19 16:35:34 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Tue, 19 Jun 2012 10:35:34 -0400
Subject: [M3devel] missing m3gdb?
Message-ID: <20120619143534.GA30034@topoi.pooq.com>

Having downloaded the development version in mid-May and succeeded in
building cm3-all-AMD64_LINUX-d5.9.0-20120518.deb, I then removed my
existing Modula 3, installed the new .deb, and proceeded to use it with
no problems until today.

Today I tried to use the debugger, and discovered that m3gdb is missing.

Did I bungle something or was m3gdb left out of the script for building
the .deb for some reason? If the latter, is it still missing?

The only package I remember deliberately removing is ESC, which didn't
compile.

-- hendrik

From hendrik at topoi.pooq.com Tue Jun 19 17:13:04 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Tue, 19 Jun 2012 11:13:04 -0400
Subject: [M3devel] missing m3gdb?
In-Reply-To: <20120619143534.GA30034@topoi.pooq.com>
References: <20120619143534.GA30034@topoi.pooq.com>
Message-ID: <20120619151304.GB30034@topoi.pooq.com>

On Tue, Jun 19, 2012 at 10:35:34AM -0400, Hendrik Boom wrote:
> Having downloaded the development version in mid-May and succeeded in
> building cm3-all-AMD64_LINUX-d5.9.0-20120518.deb, I then removed my
> existing Modula 3, installed the new .deb, and proceeded to use it with
> no problems until today.
>
> Today I tried to use the debugger, and discovered that m3gdb is missing.
>
> Did I bungle something or was m3gdb left out of the script for building
> the .deb for some reason? If the latter, is it still missing?
>
> The only package I remember deliberately removing is ESC, which didn't
> compile.

I don't know if this is relevant, but:

On LINUXLIBC6, which I've only partially recompiled so far from those
same mid-May sources, I get

(m3gdb) bt
#0 0x0804c75e in RunSeq (code=0xb6c3436c, exec=0xbfdad6d4) at ../src/PqCd.m3:907
#1 0x0804c950 in EnvRunMe (self=0xb6c34308) at ../src/PqCd.m3:923
Debug info for file "Stupid.mc" not in stabs format
(m3gdb)

which suggests there may be some incompatibility, possibly caused by
the partial recompilation of Modula 3. I don't know whether the
debugger is there from my initial download or from my recompilation.

>
> -- hendrik

From dabenavidesd at yahoo.es Tue Jun 19 17:57:52 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Tue, 19 Jun 2012 16:57:52 +0100 (BST)
Subject: [M3devel] missing m3gdb?
In-Reply-To: <20120619143534.GA30034@topoi.pooq.com>
Message-ID: <1340121472.14046.YahooMailClassic@web29706.mail.ird.yahoo.com>

Hi all:
I haven't checked the last (released) cm3-std build, but I didn't need to, since nothing broke at build time. ESC hasn't been compiled since the last CM3, as the HP version didn't compile for me (though an older CM3 did compile the same HP version), so I tried that and it worked OK, which might be good for timing it.
I don't know whether your m3cgc works with other releases; I guess it should not break m3gdb support (whichever m3cgc you use).
My main comment here is that you shouldn't update something, or anything else, unless it isn't working OK (I guess this is pure SW Eng blah blah, but if it works ...).
Thanks in advance

--- On Tue, 19/6/12, Hendrik Boom wrote:

From: Hendrik Boom
Subject: [M3devel] missing m3gdb?
To: "m3devel"
Date: Tuesday, 19 June 2012, 09:35

Having downloaded the development version in mid-May and succeeded in
building cm3-all-AMD64_LINUX-d5.9.0-20120518.deb, I then removed my
existing Modula 3, installed the new .deb, and proceeded to use it with
no problems until today.

Today I tried to use the debugger, and discovered that m3gdb is missing.

Did I bungle something or was m3gdb left out of the script for building
the .deb for some reason? If the latter, is it still missing?

The only package I remember deliberately removing is ESC, which didn't
compile.

-- hendrik

From dragisha at m3w.org Tue Jun 19 18:08:00 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Tue, 19 Jun 2012 18:08:00 +0200
Subject: [M3devel] missing m3gdb?
In-Reply-To: <20120619143534.GA30034@topoi.pooq.com>
References: <20120619143534.GA30034@topoi.pooq.com>
Message-ID: <121739E0-E3E7-486A-905D-C296A6B302BC@m3w.org>

Short answer: If you need m3gdb, use the 5.8.6 release version.

On Jun 19, 2012, at 4:35 PM, Hendrik Boom wrote:

> Having downloaded the development version in mid-May and succeeded in
> building cm3-all-AMD64_LINUX-d5.9.0-20120518.deb, I then removed my
> existing Modula 3, installed the new .deb, and proceeded to use it with
> no problems until today.
>
> Today I tried to use the debugger, and discovered that m3gdb is missing.

From dabenavidesd at yahoo.es Tue Jun 19 18:28:33 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Tue, 19 Jun 2012 17:28:33 +0100 (BST)
Subject: [M3devel] missing m3gdb?
In-Reply-To: <121739E0-E3E7-486A-905D-C296A6B302BC@m3w.org>
Message-ID: <1340123313.64869.YahooMailClassic@web29704.mail.ird.yahoo.com>

Hi all:
I don't think it is so much a waste of time. If the compiler has done well, you don't need a debugger; if it hasn't, don't waste more time, use a different (optimized) compiler.
ESC had an X postfix for every package name that had been verified, so you could keep a verified and compiled version as a reference for program behavior and then use an experimental, debuggable version.
That was the idea with a module system with separate compilation and version stamps, IMHO: to have a cycle that is really fast to execute and easy to debug, and you need that lately, as compiler versions are getting faster or harder to debug.
The interesting question is whether you could use the same infrastructure to verify in less time or not; that will prove whether ESC is worth anything, which I'm sure nobody uses because of that broken reasoning (not Dragisha's nor Hendrik's), but most people do ahead of time.
Thanks in advance

--- On Tue, 19/6/12, Dragiša Durić wrote:

From: Dragiša Durić
Subject: Re: [M3devel] missing m3gdb?
To: "Hendrik Boom"
CC: "m3devel"
Date: Tuesday, 19 June 2012, 11:08

Short answer: If you need m3gdb, use the 5.8.6 release version.

On Jun 19, 2012, at 4:35 PM, Hendrik Boom wrote:

Having downloaded the development version in mid-May and succeeded in
building cm3-all-AMD64_LINUX-d5.9.0-20120518.deb, I then removed my
existing Modula 3, installed the new .deb, and proceeded to use it with
no problems until today.

Today I tried to use the debugger, and discovered that m3gdb is missing.

From hendrik at topoi.pooq.com Tue Jun 19 18:55:16 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Tue, 19 Jun 2012 12:55:16 -0400
Subject: [M3devel] missing m3gdb?
In-Reply-To: <121739E0-E3E7-486A-905D-C296A6B302BC@m3w.org>
References: <20120619143534.GA30034@topoi.pooq.com> <121739E0-E3E7-486A-905D-C296A6B302BC@m3w.org>
Message-ID: <20120619165516.GA32036@topoi.pooq.com>

That was my fallback plan, and I'd still have to recompile it so it
would access current libraries on Debian.

What I wanted to know was whether it was intentional to leave the
debugger out of the current .deb-building script (because, perhaps,
it didn't work).

And as I've said before, recompiling from source is too much work
for a beginner. Not that I class myself as a beginner anymore, but if,
for example, I'd want to submit a video game written in Modula 3 to an
open-source video-game competition, the judges would have to be able to
run it on their machines, and they would be beginners.

So if the development-source doesn't build a working .deb, I'll build
one from 5.8.6.

But if I didn't bungle the .deb build, and the missing m3gdb isn't a known bug,
it probably warrants some attention, by someone, someday..

-- hendrik

The LINUXLIBC6 problem may just be a problem with an incomplete build.
I've restarted it after installing postgresql (which was holding things
up), and it's compiling, compiling, and compiling now.

But I really had thought the AMD64 Linux build was good, and it seemed
not to be.

-- hendrik

On Tue, Jun 19, 2012 at 06:08:00PM +0200, Dragiša Durić wrote:
> Short answer: If you need m3gdb, use the 5.8.6 release version.
>
> On Jun 19, 2012, at 4:35 PM, Hendrik Boom wrote:
>
> > Having downloaded the development version in mid-May and succeeded in
> > building cm3-all-AMD64_LINUX-d5.9.0-20120518.deb, I then removed my
> > existing Modula 3, installed the new .deb, and proceeded to use it with
> > no problems until today.
> >
> > Today I tried to use the debugger, and discovered that m3gdb is missing.
> From hendrik at topoi.pooq.com Tue Jun 19 19:00:39 2012 From: hendrik at topoi.pooq.com (Hendrik Boom) Date: Tue, 19 Jun 2012 13:00:39 -0400 Subject: [M3devel] ESC In-Reply-To: <1340123313.64869.YahooMailClassic@web29704.mail.ird.yahoo.com> References: <121739E0-E3E7-486A-905D-C296A6B302BC@m3w.org> <1340123313.64869.YahooMailClassic@web29704.mail.ird.yahoo.com> Message-ID: <20120619170038.GB32036@topoi.pooq.com> On Tue, Jun 19, 2012 at 05:28:33PM +0100, Daniel Alejandro Benavides D. wrote: > Hi all: > I don't think is so much a waste of time, if compiler has done well, you don't need to make a debugger, but instead if it hasn't done it well, don't waste more time use a different compiler (optimized). > ESC had a X postfix for every package name that has been verified so you could have a different verified and compiled version and as a reference for program behavior and then use an experimental debug able version. > That was the idea with a Module system with separate compilation and version stamps, IMHO, to really have a fast to execute and easy to debug around cycle and you need that lately as compiler versions are getting faster or harder to debug. > The interesting stuff is whether you could use the same infrastructure to verify in less time or not, that will proof ESC is worth of anything which I'm sure no body uses for that reasoning broken -not Dragisha's nor Hendrick- but most people do ahead of time. > Thanks in advance > > --- El mar, 19/6/12, Dragi?a Duri? escribi?: Yes, I agree. It would be worthwhile to track down the ESC source code. Or rewrite it. But until that's been done I'll probably need a debugger. And maybe occasinoally afterward, for the things that ESC doesn't catch. -- hendrik From hendrik at topoi.pooq.com Tue Jun 19 19:57:13 2012 From: hendrik at topoi.pooq.com (Hendrik Boom) Date: Tue, 19 Jun 2012 13:57:13 -0400 Subject: [M3devel] Rebuilding 5.8.6 for current Debian. 
In-Reply-To: <20120619165516.GA32036@topoi.pooq.com>
References: <20120619143534.GA30034@topoi.pooq.com> <121739E0-E3E7-486A-905D-C296A6B302BC@m3w.org> <20120619165516.GA32036@topoi.pooq.com>
Message-ID: <20120619175713.GA32389@topoi.pooq.com>

On Tue, Jun 19, 2012 at 12:55:16PM -0400, Hendrik Boom wrote:
>
> So if the development-source doesn't build a working .deb, I'll build
> one from 5.8.6.

The current 5.8.6 .deb is not compatible with current versions of Debian.

If I build a .deb from the sources in cm3-src-all-5.8.6-REL.tgz, will its version number be 5.8.6, or some modification of 5.8.6? I'd very much want it to be *different* so that my build will be recognised as a more recent build (for a more recent version of Debian). If not, is there a way of specifying it explicitly?

The new .deb I make will likely not be compatible with really old versions of Debian.

-- hendrik

From dabenavidesd at yahoo.es Tue Jun 19 20:17:07 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Tue, 19 Jun 2012 19:17:07 +0100 (BST)
Subject: [M3devel] ESC
In-Reply-To: <20120619170038.GB32036@topoi.pooq.com>
Message-ID: <1340129827.93106.YahooMailClassic@web29706.mail.ird.yahoo.com>

Hi all:
in fact there was another ESC written exclusively for the purpose of finding the time complexity (from source) of multi-threaded programs, but this would be another approach to find whether the ESC system and its proof machine (Simplify) will perform OK when used on a normal basis, in the average-case scenario (but Simplify has unsoundnesses and a program-dependent checker coming from the ESC front end). At least in a programming environment like Modula-3, having the complexity class of a programming model is something I want.
However, there is proof that such an environment, used for big SW development at IBM and targeting Modula-3, was not good without formal software analysis (on both fronts, development and performance).
The thing is, I don't know how many studies of software developers based on systematic analysis exist aside from IBM's in the 80s and some later ones for Modula-3. So based on experience I can infer it's good, but in the real world I don't know how many will buy the idea if it is not backed by some real good experience and some real proof. Anyone else :)?
Thanks in advance

From hendrik at topoi.pooq.com Wed Jun 20 13:17:06 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Wed, 20 Jun 2012 07:17:06 -0400
Subject: [M3devel] test driver?
Message-ID: <20120620111705.GA10486@topoi.pooq.com>

Is there a test suite driver somewhere in the Modula 3 ecosystem?

I'd like to feed various files of test data into a program to see if it produces acceptable output. Currently it's all text in and out, but I'd prefer not to have to rewrite my test suite because of trivialities, such as spelling corrections in my error messages.

This is for regression testing, so automation is appreciated.

-- hendrik

From dragisha at m3w.org Wed Jun 20 13:26:53 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Wed, 20 Jun 2012 13:26:53 +0200
Subject: [M3devel] test driver?
In-Reply-To: <20120620111705.GA10486@topoi.pooq.com>
References: <20120620111705.GA10486@topoi.pooq.com>
Message-ID:

cm3/m3-libs/libm3/tests

And under. AFAIK, there is continuous building/testing configured for cm3. Search for Hudson, Modula-3...

From dabenavidesd at yahoo.es Wed Jun 20 14:41:26 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Wed, 20 Jun 2012 13:41:26 +0100 (BST)
Subject: [M3devel] test driver?
In-Reply-To:
Message-ID: <1340196086.64556.YahooMailClassic@web29705.mail.ird.yahoo.com>

Hi all:
black-box testing for C or m3cgc, m3cg, or m3cc is something we should use on a daily basis.
I know of a free testing platform for C# based on Spec#. I think we could use it for static optimization (testing -O2, -O3), which combines both by adding reasoning to the system (knowledge management):
http://books.google.com.co/books?id=Am43BAC06L8C
This can be a good thing to do in later stages (code generation, etc).
Thanks in advance

From wagner at elegosoft.com Fri Jun 22 09:16:16 2012
From: wagner at elegosoft.com (mail.elegosoft.com)
Date: Fri, 22 Jun 2012 09:16:16 +0200
Subject: [M3devel] help test 4.7 backend?
In-Reply-To:
References:
Message-ID: <20120622091616.18b39755.wagner@elegosoft.com>

I just noticed that m3tests have been hanging on luthien/AMD64_FREEBSD for several days now in p006:

http://hudson.modula3.com:8080/job/cm3-current-test-m3tests-AMD64_FREEBSD/479/console

I don't know if it is related, but it used to run OK.

Olaf

On Sat, 16 Jun 2012 06:09:33 +0000
Jay K wrote:

>
> help test 4.7 backend?
>
>
> Can folks try out the new 4.7 backend?
> edit m3-sys/m3cc/src/m3makefile
> add your platform to the list near the top, mapped to "47"
> and then run scripts/python/boot2.sh
> and then, do it again, but edit config/Unix.common, the function
> m3_backend to always args += m3back_optimize
> and optionally but preferably try with -O3 instead of -O2 in
> the same file
> and try running some GUI apps like solitaire
>
>
> I could use help particularly with:
>  SPARC{32,64}_LINUX
>  PPC_{LINUX,OPENBSD,NETBSD,FREEBSD,DARWIN}
>  ALPHA_OSF
>  I386_LINUX, I386_INTERIX, I386_MINGWIN, I386_CYGWIN, because I'm being lazy
>
>
> I can do various x86/amd64, either in a VM or opencsw,
> but splitting that load would be good too.
> I might go back to not having much time soon or temporarily.
>
>
> Still to do:
>   apply OpenBSD patches
>   update from 4.7.0 to 4.7.1 that was just released.
>
>
> Thanks,
>  - Jay
>

--
Olaf Wagner -- elego Software Solutions GmbH
Gustav-Meyer-Allee 25 / Gebäude 12, 13355 Berlin, Germany
phone: +49 30 23 45 86 96  mobile: +49 177 2345 869  fax: +49 30 23 45 86 95
http://www.elegosoft.com | Geschäftsführer: Olaf Wagner | Sitz: Berlin
Handelregister: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr: DE163214194

From dabenavidesd at yahoo.es Fri Jun 22 17:51:37 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Fri, 22 Jun 2012 16:51:37 +0100 (BST)
Subject: [M3devel] help test 4.7 backend?
In-Reply-To: <20120622091616.18b39755.wagner@elegosoft.com>
Message-ID: <1340380297.77309.YahooMailClassic@web29705.mail.ird.yahoo.com>

Hi all:
maybe not, unless somebody is unintentionally applying optimization too aggressively to m3tests/src and so breaking the semantics of Modula-3 threads? I mean, the m3 sources are OK with respect to the Thread interface, but I don't think the thing they call pthreads can be the same at the same time, though DEC-SRC heavily influenced it. The only way to test
that is not in SW bugs but in the HW, kernel aside; but with this HW I can't be sure they are doing thread-safe system code (in other words, those machines are badly behaved).
I have been thinking about this idea, but it would require making a virtual machine for Modula-3 to be worth playing with for that matter. It could have multithreading capabilities, a tough multitasking system and all.
Jay and all, we could try the DEC/Compaq Alpha/Piranha simulator to catch that kind of error.
Thanks in advance

From jay.krell at cornell.edu Sat Jun 23 02:45:17 2012
From: jay.krell at cornell.edu (Jay K)
Date: Sat, 23 Jun 2012 00:45:17 +0000
Subject: [M3devel] help test 4.7 backend?
In-Reply-To: <20120622091616.18b39755.wagner@elegosoft.com>
References: , <20120622091616.18b39755.wagner@elegosoft.com>
Message-ID:

I kind of haven't touched FreeBSD. They are still on gcc 4.5. But maybe I did. I'll look into it maybe soon... but I'm super busy the next two weeks.

I'm hoping to test FreeBSD/x86 and FreeBSD/amd64 with gcc 4.7 and then move them to it.

Thank you for pointing this out. It is good to see the Hudson stuff continue to work. My nodes are kind of all down/gone -- some remain but the router is no longer configured as it was.

 - Jay

From dragisha at m3w.org Mon Jun 25 12:51:05 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Mon, 25 Jun 2012 12:51:05 +0200
Subject: [M3devel] Windows, Unicode file names
Message-ID: <33AC198A-B8BB-40E9-9F05-6E08A3676539@m3w.org>

Is anybody aware of issues with FSWin32.m3 in cases where one actually has to cover the situation of non-ASCII filenames under Windows? Met the problem?

TIA,
dd

From dabenavidesd at yahoo.es Mon Jun 25 18:52:35 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Mon, 25 Jun 2012 17:52:35 +0100 (BST)
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <33AC198A-B8BB-40E9-9F05-6E08A3676539@m3w.org>
Message-ID: <1340643155.54846.YahooMailClassic@web29701.mail.ird.yahoo.com>

Hi all:
I was asked why there wasn't a faster Modula-3 environment (the Modula-3 NT386GNU is way too slow even nowadays), and without an answer I guess this is the same question as who wants a Windows-ready environment. If you are interested, DEC had a project, M3lite, for a WinNT/95 (compatible) system. I guess addressing compatibility for old users might get better results for CM3, but that's history now, which makes little or no sense anyway on the understanding that Windows 8 will be incompatible with Win32 anyway.
As of today I haven't understood what new API they will bring, and frankly I don't care either whether they have a new system to get hands on, but certainly you would want something like that if you have a tablet or mobile phone, where there isn't much time to spend compiling GCC from source.
Thanks in advance

From dragisha at m3w.org Mon Jun 25 18:54:56 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Mon, 25 Jun 2012 18:54:56 +0200
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <1340643155.54846.YahooMailClassic@web29701.mail.ird.yahoo.com>
References: <1340643155.54846.YahooMailClassic@web29701.mail.ird.yahoo.com>
Message-ID: <8A65A674-1120-459E-98FC-AF622D24EC66@m3w.org>

Daniel, please start your own topics and don't dilute other discussions with off-topic talk.
Thanks in advance,
dd

From dabenavidesd at yahoo.es Mon Jun 25 19:04:20 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Mon, 25 Jun 2012 18:04:20 +0100 (BST)
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <8A65A674-1120-459E-98FC-AF622D24EC66@m3w.org>
Message-ID: <1340643860.74333.YahooMailClassic@web29701.mail.ird.yahoo.com>

Hi all:
thanks, but I don't know if you know that M3lite was a Win95/NT-compatible system.
Perhaps I missed what your point is, but this is the same question, I guess (but I don't know your answer either; that's a different point). See M3-FAQ (WHAT IS M3-LITE, MS-WINDOWS SUPPORT)
Thanks in advance
From dragisha at m3w.org Mon Jun 25 19:07:20 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Mon, 25 Jun 2012 19:07:20 +0200
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <1340643860.74333.YahooMailClassic@web29701.mail.ird.yahoo.com>
References: <1340643860.74333.YahooMailClassic@web29701.mail.ird.yahoo.com>
Message-ID: <8E8B1021-7B2C-415F-A965-F49257C4C2FB@m3w.org>

See subject - Windows, Unicode file names.

Thanks in advance.

From dabenavidesd at yahoo.es Mon Jun 25 19:27:44 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Mon, 25 Jun 2012 18:27:44 +0100 (BST)
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <8E8B1021-7B2C-415F-A965-F49257C4C2FB@m3w.org>
Message-ID: <1340645264.336.YahooMailClassic@web29701.mail.ird.yahoo.com>

Hi all:
As I understood it, you don't want to talk about a Win95/NT-compatible distro of Modula-3.
But in turn you want to keep compatibility with older file name encodings.
I don't care about that, but if it's useful anyway (because newer Windows versions don't care at all either) I don't know what your problem was, because it won't be able to be solved!
Thanks in advance

PS: being clearer about topics is what I want, so please feel free to tell me when I'm not

From dragisha at m3w.org Mon Jun 25 19:36:39 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Mon, 25 Jun 2012 19:36:39 +0200
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <1340645264.336.YahooMailClassic@web29701.mail.ird.yahoo.com>
References: <1340645264.336.YahooMailClassic@web29701.mail.ird.yahoo.com>
Message-ID:

Daniel,

I can talk about many things, and most things Modula-3 are of interest to me. Once you start a topic, and I can understand what it is about, and it meets my interests - I'll be there.

The problem I met with filenames is nothing old. Windows can open files with filenames in ASCII and UTF-16. Everything else - you must check twice and do a workaround.

I've written here in the hope that I can get into some fruitful discussion with people who understand this problem. My solution is a workaround and assumes the filename is UTF-8 or ASCII. I would like to start a discussion on this and work from there to a more general solution.

dd
From rcolebur at SCIRES.COM Mon Jun 25 19:51:10 2012
From: rcolebur at SCIRES.COM (Coleburn, Randy)
Date: Mon, 25 Jun 2012 13:51:10 -0400
Subject: [M3devel] EXT [M3commit] CVS Update: cm3
In-Reply-To: <20120624094248.AAB932474003@birch.elegosoft.com>
References: <20120624094248.AAB932474003@birch.elegosoft.com>
Message-ID:

Does this mean HPUX will no longer be supported?

-----Original Message-----
From: Jay Krell [mailto:jkrell at elego.de]
Sent: Sunday, June 24, 2012 7:43 AM
To: m3commit at elegosoft.com
Subject: EXT [M3commit] CVS Update: cm3

CVSROOT: /usr/cvs
Changes by: jkrell at birch. 12/06/24 11:42:45

Modified files:
  cm3/m3-sys/cminstall/src/config-no-install/: Unix.common

Log message:
  hpux_flags is never used, remove it

From dabenavidesd at yahoo.es Mon Jun 25 20:06:10 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Mon, 25 Jun 2012 19:06:10 +0100 (BST)
Subject: [M3devel] Windows, Unicode file names
In-Reply-To:
Message-ID: <1340647570.11529.YahooMailClassic@web29706.mail.ird.yahoo.com>

Hi all:
OK, good. The Win32 API dealt with inter-NLS (National Language Support) at the level of ASCII and other formats with the NLS API.
But it appears not to have been used for the DEC-SRC WinNT port of Modula-3 (but it is for CM3, though it isn't compiled on the elego servers, but here):
http://www.cs.purdue.edu/homes/hosking/m3/help/gen_html/m3core/src/win32/WinNLS.i3.html

Thanks in advance

From dragisha at m3w.org Mon Jun 25 20:20:01 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Mon, 25 Jun 2012 20:20:01 +0200
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <1340647570.11529.YahooMailClassic@web29706.mail.ird.yahoo.com>
References: <1340647570.11529.YahooMailClassic@web29706.mail.ird.yahoo.com>
Message-ID: <6DF57887-C46F-408C-863F-1242C4C4C6A9@m3w.org>

Yes, they exposed parts of NLS. That's how the problem can be, albeit partially, solved: by using methods exposed there.

What we don't have is a way to communicate the actual encoding of a string to the FS module so FS methods can handle filenames accordingly.

From jay.krell at cornell.edu Mon Jun 25 20:40:43 2012
From: jay.krell at cornell.edu (Jay K)
Date: Mon, 25 Jun 2012 18:40:43 +0000
Subject: [M3devel] EXT [M3commit] CVS Update: cm3
In-Reply-To:
References: <20120624094248.AAB932474003@birch.elegosoft.com>,
Message-ID:

No. This does not represent a loss of any support for any target.
It is just a removal of a local variable that is initialized and never further referenced, unless I read the code incorrectly.
On the other hand, I don't think anyone here has HPUX available for any testing/development.
I used to, but no longer.
 - Jay

From dragisha at m3w.org Mon Jun 25 20:49:22 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Mon, 25 Jun 2012 20:49:22 +0200
Subject: [M3devel] Windows, Unicode file names
In-Reply-To:
References: <33AC198A-B8BB-40E9-9F05-6E08A3676539@m3w.org>
Message-ID: <461C7DCF-E432-4434-BAD3-7FA3B9775F45@m3w.org>

My situation was - the Gtk2 interface (GtkFileChooser in my case) returns a UTF-8 encoded string, UTF-8 being GLib's internal/native encoding. Neither CreateFileA nor CreateFileW can handle it, so I "hardcoded" some logic into FS.OpenFile(Readonly)... and handled the case of non-ASCII input.

Ideal would be to have encoding information as an integral part of every TEXT, but...

To my knowledge, POSIX systems handle UTF-8 filenames well (...check:) so explicit information on encoding for FS is needed only for Windows.

On Jun 25, 2012, at 8:44 PM, Jay K wrote:

> Functions like CreateFileA use the "ANSI" or "OEM" code page, subject to a public global in Win32, and the two code pages vary per-install (or per-user). It is just not a good system.
>
>
> Functions like CreateFileW work very well with 16bit encoded characters.
>
>
> Can/do we arrange to have 16bit encoded characters?
>
>
> - Jay

From jay.krell at cornell.edu Mon Jun 25 20:44:18 2012
From: jay.krell at cornell.edu (Jay K)
Date: Mon, 25 Jun 2012 18:44:18 +0000
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <33AC198A-B8BB-40E9-9F05-6E08A3676539@m3w.org>
References: <33AC198A-B8BB-40E9-9F05-6E08A3676539@m3w.org>
Message-ID:

Functions like CreateFileA use the "ANSI" or "OEM" code page, subject to a public global in Win32, and the two code pages vary per-install (or per-user). It is just not a good system.

Functions like CreateFileW work very well with 16bit encoded characters.

Can/do we arrange to have 16bit encoded characters?

 - Jay

From dragisha at m3w.org Mon Jun 25 21:05:59 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Mon, 25 Jun 2012 21:05:59 +0200
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>
References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>
Message-ID: <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org>

If you cared enough to check FSWin32.m3, the answer would be obvious :).

Whatever I do with the pathname before I call FS.OpenFile(Readonly)...
- FSWin32.m3 will call CreateFileA. My solution is:

PROCEDURE OpenFileReadonly(p: Pathname.T): File.T RAISES {OSError.E}=
  VAR
    handle: WinNT.HANDLE;
    fname := M3toC.SharedTtoS(p);
    (* First call: ask how many wide characters the UTF-8 name needs. *)
    dwNum := WinNLS.MultiByteToWideChar (WinNLS.CP_UTF8, 0, fname, -1, NIL, 0);
    pwText: WinBaseTypes.PCWSTR;
  BEGIN
    IF dwNum = 0 OR dwNum = Text.Length(p) + 1 THEN
      (* dwNum includes terminating null character. that's +1 above.
         Conversion failed or the name is plain ASCII: use the narrow call. *)
      handle := WinBase.CreateFile(
                    lpFileName := fname,
                    dwDesiredAccess := WinNT.GENERIC_READ,
                    dwShareMode := WinNT.FILE_SHARE_READ,
                    lpSecurityAttributes := NIL,
                    dwCreationDisposition := WinBase.OPEN_EXISTING,
                    dwFlagsAndAttributes := 0,
                    hTemplateFile := NIL);
    ELSE
      (* Non-ASCII input: convert UTF-8 to UTF-16 and use the wide call. *)
      pwText := LOOPHOLE(NEW(UNTRACED REF ARRAY OF CHAR, dwNum*2), WinBaseTypes.PCWSTR);
      EVAL WinNLS.MultiByteToWideChar (WinNLS.CP_UTF8, 0, fname, -1, pwText, dwNum);
      handle := WinBase.CreateFileW(
                    lpFileName := pwText,
                    dwDesiredAccess := WinNT.GENERIC_READ,
                    dwShareMode := WinNT.FILE_SHARE_READ,
                    lpSecurityAttributes := NIL,
                    dwCreationDisposition := WinBase.OPEN_EXISTING,
                    dwFlagsAndAttributes := 0,
                    hTemplateFile := NIL);
      DISPOSE(pwText);
    END;

    IF LOOPHOLE(handle, INTEGER) = WinBase.INVALID_HANDLE_VALUE THEN
      Fail(p, fname);
    END;
    M3toC.FreeSharedS(p, fname);
    RETURN FileWin32.New(handle, FileWin32.Read)
  END OpenFileReadonly;

And similar in OpenFile. Not nice :).

Also, I've added the CP_UTF8 constant to WinNLS.i3.

On Jun 25, 2012, at 9:01 PM, Daniel Alejandro Benavides D. wrote:

> Hi all:
> So do you need a Double-Byte Character String module, as currently in the TEXT types? But you can do that already, couldn't you?
> Thanks in advance
>
> --- On Mon, 25/6/12, Dragiša Durić wrote:
>
> From: Dragiša Durić
> Subject: Re: [M3devel] Windows, Unicode file names
> To: "Daniel Alejandro Benavides D."
> CC: "m3devel"
> Date: Monday, 25 June 2012 13:20
>
> Yes, they exposed parts of NLS. That's how the problem can be, albeit partially, solved: by using the methods exposed there.
>
> What we don't have is a way to communicate the actual encoding of a string to the FS module so that FS methods can handle filenames accordingly.
>
> On Jun 25, 2012, at 8:06 PM, Daniel Alejandro Benavides D. wrote:
>
>> Hi all:
>> OK, good, the Win32 API dealt with inter-NLS (National Language Support) at the level of ASCII and other formats with the NLS API.
>> But it appears not to have been used for the DEC-SRC WinNT port of Modula-3 (but for CM3, though it isn't compiled on the elego servers, but here):
>> http://www.cs.purdue.edu/homes/hosking/m3/help/gen_html/m3core/src/win32/WinNLS.i3.html
>>
>> Thanks in advance
>>
>> --- On Mon, 25/6/12, Dragiša Durić wrote:
>>
>> From: Dragiša Durić
>> Subject: Re: [M3devel] Windows, Unicode file names
>> To: "Daniel Alejandro Benavides D."
>> CC: "m3devel"
>> Date: Monday, 25 June 2012 12:36
>>
>> Daniel,
>>
>> I can talk about many things, and most things Modula-3 are of interest to me. Once you start a topic, and I can understand what it is about, and it meets my interests - I'll be there.
>>
>> The problem I met with filenames is nothing old. Windows can open files with filenames in ASCII and UTF-16. Everything else - you must check twice and do a workaround.
>>
>> I've written here in the hope that I can get into some fruitful discussion with people who understand this problem. My solution is a workaround and assumes the filename is UTF-8 or ASCII. I would like to start a discussion on this and work from there to a more general solution.
>>
>> dd
>>
>> On Jun 25, 2012, at 7:27 PM, Daniel Alejandro Benavides D. wrote:
>>
>>> Hi all:
>>> As I understood it, you don't want to talk about a W 95 / NT compatible distro of Modula-3.
>>> But in turn you want to keep compatibility with older file name encodings.
>>> I don't care about that, but if it's useful anyway (because newer Windows versions don't care at all either), I don't know what your problem was, because it won't be able to be solved!
>>> Thanks in advance
>>
>
-------------- next part --------------
An HTML attachment was scrubbed...

From dabenavidesd at yahoo.es Mon Jun 25 21:01:56 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Mon, 25 Jun 2012 20:01:56 +0100 (BST)
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <6DF57887-C46F-408C-863F-1242C4C4C6A9@m3w.org>
Message-ID: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>

Hi all:
So do you need a Double-Byte Character String module, as currently in the TEXT types? But you can do that already, couldn't you?
Thanks in advance

-------------- next part --------------
An HTML attachment was scrubbed...

From jay.krell at cornell.edu Mon Jun 25 21:39:04 2012
From: jay.krell at cornell.edu (Jay K)
Date: Mon, 25 Jun 2012 19:39:04 +0000
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org>
References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>, <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org>

I think I know what to do here and will look into it... later.

We have TEXT. We should just always get WIDECHARs out of it and call CreateFileW.
Assuming UTF8 is the wrong solution at this level, and passing in UTF8 won't work with the correct solution.
A layer above this needs to decode UTF8, if that is the encoding.

Unless someone has declared and implemented that TEXT is in fact always UTF8-encoded, which I doubt.

- Jay

-------------- next part --------------
An HTML attachment was scrubbed...
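
A minimal sketch, in Modula-3, of what a UTF-8 decoding layer sitting above FS might do before the file-system code sees the name. It assumes the TEXT holds raw UTF-8 bytes, one per CHAR, as in the GtkFileChooser case discussed in this thread. The module name Utf8Decode, the procedure Next and the exception Malformed are hypothetical and not part of cm3, and the sketch does not reject overlong forms or surrogate code points, which a real layer would have to do.

```modula3
MODULE Utf8Decode EXPORTS Main;

IMPORT IO, Fmt, Text;

EXCEPTION Malformed;   (* hypothetical: raised on a broken byte sequence *)

(* Decode the UTF-8 sequence that starts at t[i], return its code point,
   and advance i past the sequence.  The caller guarantees i < Length(t). *)
PROCEDURE Next (t: TEXT; VAR i: CARDINAL): INTEGER RAISES {Malformed} =
  VAR
    len := Text.Length(t);
    b   := ORD(Text.GetChar(t, i));
    cp, extra: INTEGER;
  BEGIN
    IF    b < 16_80 THEN cp := b;         extra := 0;   (* ASCII *)
    ELSIF b < 16_C0 THEN RAISE Malformed;               (* stray continuation *)
    ELSIF b < 16_E0 THEN cp := b - 16_C0; extra := 1;
    ELSIF b < 16_F0 THEN cp := b - 16_E0; extra := 2;
    ELSIF b < 16_F8 THEN cp := b - 16_F0; extra := 3;
    ELSE RAISE Malformed;
    END;
    INC(i);
    FOR k := 1 TO extra DO
      IF i >= len THEN RAISE Malformed END;             (* truncated input *)
      b := ORD(Text.GetChar(t, i));
      IF b < 16_80 OR b > 16_BF THEN RAISE Malformed END;
      cp := cp * 64 + (b - 16_80);   (* append six payload bits *)
      INC(i);
    END;
    RETURN cp;
  END Next;

VAR
  s := "caf\303\251";   (* "cafe" with an acute e, as UTF-8 bytes C3 A9 *)
  i: CARDINAL := 0;
BEGIN
  TRY
    WHILE i < Text.Length(s) DO
      IO.Put("U+" & Fmt.Int(Next(s, i), 16) & "\n");
    END;
  EXCEPT
    Malformed => IO.Put("malformed UTF-8\n");
  END;
END Utf8Decode.
```

With a layer like this, the code points could be handed on as WIDECHARs, and the Win32-specific module would not need to guess the encoding of its input.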
From dragisha at m3w.org Mon Jun 25 21:48:09 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Mon, 25 Jun 2012 21:48:09 +0200
Subject: [M3devel] Windows, Unicode file names
References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>, <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org>
Message-ID: <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org>

It can be what the cm3 people had in mind when they created WIDECHAR as a catchall for Unicode.

At first glance it looked like no solution to me, but after counting to ten - I think it is. We can have a UTF-8 layer and use it when and where needed, to recode our strings to the catchall WIDECHAR/WIDETEXT.

As long as we agree on what exactly WIDECHAR is :)

===From wikipedia
The Microsoft Windows application programming interfaces Win32 and Win64, as well as the Java and .Net Framework platforms, require that wide character variables be defined as 16-bit values, and that characters be encoded using UTF-16 (due to former use of UCS-2), while modern Unix-like systems generally require 32-bit values encoded using UTF-32[citation needed].
===

-------------- next part --------------
An HTML attachment was scrubbed...
From jay.krell at cornell.edu Mon Jun 25 22:17:52 2012
From: jay.krell at cornell.edu (Jay K)
Date: Mon, 25 Jun 2012 20:17:52 +0000
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org>
References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>, <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org>, <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org>

I don't care if WIDECHAR is 16 bits or 32 bits, as long as I can convert from TEXT to a flat array of either, and if 32 bits, walk the array, checking for > 0xFFFF, throw an exception or return some error if any found, narrow to 16 bits, call some "W" function, free the flat array.
The size can, I guess, vary between Win32 and non-Win32 platforms.
Its size should be stored in a global to communicate between Modula-3 and C.

I'd also quite like it if TEXT was internally represented as a nul terminated flat array of 8 and/or 16 and/or 32-bit quantities, materializing some of them on demand. But I suspect that flat and readonly and exposing a concat operation are in conflict. I'm not sure. MFC uses a flat reference counted nul terminated representation and it works pretty well. It doesn't materialize other widths on demand.

- Jay

-------------- next part --------------
An HTML attachment was scrubbed...
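
The walk, check and narrow step described in the message above could look roughly like the following Modula-3 sketch. It assumes the 32-bit form is available as an array of INTEGER code points; the names NarrowDemo, Unit16, Narrow and OutOfRange are hypothetical and not taken from cm3, and the plain subrange type says nothing about the in-memory layout a real "W" call would need.

```modula3
MODULE NarrowDemo EXPORTS Main;

IMPORT IO, Fmt;

TYPE
  Unit16 = [0 .. 16_FFFF];   (* hypothetical 16-bit unit type *)

EXCEPTION OutOfRange(INTEGER);   (* carries the offending code point *)

(* Copy src into a fresh array of 16-bit units with one extra element
   for a terminating NUL; refuse anything that does not fit in 16 bits. *)
PROCEDURE Narrow (READONLY src: ARRAY OF INTEGER): REF ARRAY OF Unit16
  RAISES {OutOfRange} =
  VAR dst := NEW(REF ARRAY OF Unit16, NUMBER(src) + 1);
  BEGIN
    FOR i := 0 TO LAST(src) DO
      IF src[i] < 0 OR src[i] > 16_FFFF THEN
        RAISE OutOfRange(src[i]);
      END;
      dst[i] := src[i];
    END;
    dst[LAST(dst^)] := 0;   (* terminating NUL *)
    RETURN dst;
  END Narrow;

BEGIN
  TRY
    (* All three values fit in 16 bits. *)
    WITH w = Narrow(ARRAY OF INTEGER {16_44, 16_161, 16_17E}) DO
      IO.Put(Fmt.Int(NUMBER(w^)) & " units including NUL\n");
    END;
    (* The last value is above 0xFFFF, so this call raises OutOfRange. *)
    EVAL Narrow(ARRAY OF INTEGER {16_41, 16_1F600});
  EXCEPT
    OutOfRange (cp) =>
      IO.Put("cannot narrow code point " & Fmt.Int(cp, 16) & "\n");
  END;
END NarrowDemo.
```

Raising an exception, rather than silently truncating, keeps a name containing characters outside the 16-bit range from opening the wrong file.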
From hendrik at topoi.pooq.com Mon Jun 25 22:34:22 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Mon, 25 Jun 2012 16:34:22 -0400
Subject: [M3devel] Windows, Unicode file names
References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com> <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org> <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org>
Message-ID: <20120625203422.GA24287@topoi.pooq.com>

On Mon, Jun 25, 2012 at 08:17:52PM +0000, Jay K wrote:
>
> I'd also quite like it if TEXT was internally represented as a nul
> terminated flat array of 8 and/or 16 and/or 32-bit quantities,
> materializing some of them on demand.

Does that conflict with NUL being a valid ASCII character?

-- hendrik

From rodney_bates at lcwb.coop Mon Jun 25 22:29:06 2012
From: rodney_bates at lcwb.coop (Rodney M. Bates)
Date: Mon, 25 Jun 2012 15:29:06 -0500
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org>
References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>, <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org> <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org>
Message-ID: <4FE8CA12.5040104@lcwb.coop>

On 06/25/2012 02:48 PM, Dragiša Durić wrote:
> It can be what the cm3 people had in mind when they created WIDECHAR as a catchall for Unicode.
>
> At first glance it looked like no solution to me, but after counting to ten - I think it is. We can have a UTF-8 layer and use it when and where needed, to recode our strings to the catchall WIDECHAR/WIDETEXT.
>
> As long as we agree on what exactly WIDECHAR is :)
> ===From wikipedia
> The Microsoft Windows application programming interfaces Win32 and Win64, as well as the Java and .Net Framework platforms, require that wide character variables be defined as 16-bit values, and that characters be encoded using UTF-16 (due to former use of UCS-2), while modern Unix-like systems generally require 32-bit values encoded using UTF-32[citation needed].
> ===
>

This is not necessarily a proposal, but FWIW:

When working on my altered cm3 TEXT implementations, I put every relevant thing I could find into a state that should allow M3 WIDECHAR to be 32-bit, with only one or two declarations changed. I think Pickles might need some attention to cope with this, however. We would want them not only to handle 32-bit WIDECHAR, but also to be able to read older pickle files that used 16 bits.

From jay.krell at cornell.edu Mon Jun 25 22:46:18 2012
From: jay.krell at cornell.edu (Jay K)
Date: Mon, 25 Jun 2012 20:46:18 +0000
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <20120625203422.GA24287@topoi.pooq.com>
References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>, <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org>, <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org>, <20120625203422.GA24287@topoi.pooq.com>

Somewhat but not fully.

Text.Length should fetch a stored length, as I'm sure it already does.
That length should always be correctly maintained, same as today.
Adding one extra nul at the end doesn't invalidate the data.
std::string has the same properties -- c_str() can append a terminal nul on demand, but there could also be one in the string itself.
I understand it is a bit weird.

Maintaining a terminal nul does add cost that might be wasted.
And it reduces the capacity by one.
It could be on-demand, I guess.
- Jay

-------------- next part --------------
An HTML attachment was scrubbed...

From dragisha at m3w.org Mon Jun 25 23:09:37 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Mon, 25 Jun 2012 23:09:37 +0200
Subject: [M3devel] Windows, Unicode file names
References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>, <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org>, <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org>

On Jun 25, 2012, at 10:17 PM, Jay K wrote:

> I don't care if WIDECHAR is 16 bits or 32 bits, as long as I can convert from
> TEXT to a flat array of either, and if 32 bits, walk the array, checking for > 0xFFFF, throw an exception or return some error if any found, narrow to 16 bits, call some "W" function, free the flat array.
> The size can, I guess, vary between Win32 and non-Win32 platforms.

a) If you like to make it as unportable as possible then yes - 16 or 32 is not important.
b) An invalid value would be one over 0x10FFFF, not 0xFFFF.
c) Why would you narrow it to 16 bits? You need to convert to UTF-16 and make it ready for Windows API calls? WinNLS does that. Simple narrowing to 16 bits (similar to what is commented in Text.i3) and recoding from UTF-32 to UTF-16 are very different things.
d) Size varies, yes.

> Its size should be stored in a global to communicate between Modula-3 and C.
> > > I'd also quite like if TEXT was internally represented as a nul terminated flat array of 8 and/or 16 and/or 32bit quantities, materialzing on demand some of them. But I suspect that flat and readonly and exposing a concat operation are in conflict. I'm not sure. MFC uses a flat reference counted nul terminated representation and it works pretty well. It doesn't materialize-on-demand other widths. > > - Jay > Subject: Re: [M3devel] Windows, Unicode file names > From: dragisha at m3w.org > Date: Mon, 25 Jun 2012 21:48:09 +0200 > CC: dabenavidesd at yahoo.es; m3devel at elegosoft.com > To: jay.krell at cornell.edu > > It can be what cm3 people had in mind when they created WIDECHAR as a catchall for Unicode. > > At first glance it looked like no solution to me, but after counting to ten - I think it is. We can have an UTF-8 layer and use it when and where needed, to recode our strings to catchall WIDECHAR/WIDETEXT. > > As long as we agree on what exacty WIDECHAR is :) > ===From wikipedia > The Microsoft Windows application programming interfaces Win32 and Win64, as well as the Java and .Net Framework platforms, require that wide character variables be defined as 16-bit values, and that characters be encoded using UTF-16 (due to former use of UCS-2), while modern Unix-like systems generally require 32-bit values encoded using UTF-32[citation needed]. > === > > > On Jun 25, 2012, at 9:39 PM, Jay K wrote: > > I think I know what to do here and will look into it..later.. > > We have TEXT. We should just always get WIDECHARs out of it and call CreateFileW. > Assuming UTF8 is the wrong solution at this level, and passing in UTF8 won't work with the correct solution. > A layer above this needs to decode UTF8, if that is the encoding. > > Unless someone has declared and implemented that TEXT is in fact always UTF8-encoded, which I doubt. 
>
> - Jay
> From: dragisha at m3w.org
> Date: Mon, 25 Jun 2012 21:05:59 +0200
> To: dabenavidesd at yahoo.es
> CC: m3devel at elegosoft.com
> Subject: Re: [M3devel] Windows, Unicode file names
>
> If you cared enough to check FSWin32.m3, answer would be obvious :).
>
> Whatever I do with pathname before I call FS.OpenFile(Readonly)... - FSWin32.m3 will call CreateFileA. My solution is:
>
> PROCEDURE OpenFileReadonly(p: Pathname.T): File.T RAISES {OSError.E}=
>   VAR
>     handle: WinNT.HANDLE;
>     fname := M3toC.SharedTtoS(p);
>     dwNum := WinNLS.MultiByteToWideChar (WinNLS.CP_UTF8, 0, fname, -1, NIL, 0);
>     pwText: WinBaseTypes.PCWSTR;
>   BEGIN
>     IF dwNum = 0 OR dwNum = Text.Length(p) + 1 THEN
>       (* dwNum includes terminating null character. that's +1 above. *)
>       handle := WinBase.CreateFile(
>                   lpFileName := fname,
>                   dwDesiredAccess := WinNT.GENERIC_READ,
>                   dwShareMode := WinNT.FILE_SHARE_READ,
>                   lpSecurityAttributes := NIL,
>                   dwCreationDisposition := WinBase.OPEN_EXISTING,
>                   dwFlagsAndAttributes := 0,
>                   hTemplateFile := NIL);
>     ELSE
>       pwText := LOOPHOLE(NEW(UNTRACED REF ARRAY OF CHAR, dwNum*2), WinBaseTypes.PCWSTR);
>       EVAL WinNLS.MultiByteToWideChar (WinNLS.CP_UTF8, 0, fname, -1, pwText, dwNum);
>       handle := WinBase.CreateFileW(
>                   lpFileName := pwText,
>                   dwDesiredAccess := WinNT.GENERIC_READ,
>                   dwShareMode := WinNT.FILE_SHARE_READ,
>                   lpSecurityAttributes := NIL,
>                   dwCreationDisposition := WinBase.OPEN_EXISTING,
>                   dwFlagsAndAttributes := 0,
>                   hTemplateFile := NIL);
>       DISPOSE(pwText);
>     END;
>
>     IF LOOPHOLE(handle, INTEGER) = WinBase.INVALID_HANDLE_VALUE THEN
>       Fail(p, fname);
>     END;
>     M3toC.FreeSharedS(p, fname);
>     RETURN FileWin32.New(handle, FileWin32.Read)
>   END OpenFileReadonly;
>
> And similar in OpenFile. Not nice :).
>
> Also, I've added CP_UTF8 constant to WinNLS.i3.
>
> On Jun 25, 2012, at 9:01 PM, Daniel Alejandro Benavides D. wrote:
>
> Hi all:
> So do you need a Double-Byte Character String module, as currently in TEXT types? But you can do that already, couldn't you?
> Thanks in advance
>
> --- On Mon, 25/6/12, Dragiša Durić wrote:
>
> From: Dragiša Durić
> Subject: Re: [M3devel] Windows, Unicode file names
> To: "Daniel Alejandro Benavides D."
> CC: "m3devel"
> Date: Monday, 25 June 2012 13:20
>
> Yes, they exposed parts of NLS. That's how problem can be, albeit partially, solved. By using methods exposed there.
>
> What we don't have is how to communicate actual encoding of string to FS module so FS methods can handle filenames accordingly.
>
> On Jun 25, 2012, at 8:06 PM, Daniel Alejandro Benavides D. wrote:
>
> Hi all:
> OK, good, the Win32 API dealt with inter-NLS (National Language Support) at the level of ASCII and other formats with the NLS API.
> But it appears not to have been used for the DEC-SRC WinNT port of Modula-3 (only for CM3, though it isn't compiled on the elego servers, but here):
> http://www.cs.purdue.edu/homes/hosking/m3/help/gen_html/m3core/src/win32/WinNLS.i3.html
>
> Thanks in advance
>
> --- On Mon, 25/6/12, Dragiša Durić wrote:
>
> From: Dragiša Durić
> Subject: Re: [M3devel] Windows, Unicode file names
> To: "Daniel Alejandro Benavides D."
> CC: "m3devel"
> Date: Monday, 25 June 2012 12:36
>
> Daniel,
>
> I can talk about many things, and most things Modula-3 are of interest to me. Once you start a topic, and I can understand what it is about, and it meets my interests - I'll be there.
>
> Problem I met with filenames is nothing old. Windows can open files with filenames in ASCII and UTF-16. Everything else - you must check twice and do a workaround.
>
> I've written here in hope I can get into some fruitful discussion with people who understand this problem. My solution is a workaround and assumes filename is UTF-8 or ASCII. I would like to start discussion on this and work from there to more general solution.
>
> dd
>
> On Jun 25, 2012, at 7:27 PM, Daniel Alejandro Benavides D. wrote:
>
> Hi all:
> As I understood it, you don't want to talk about a W 95 / NT compatible distro of Modula-3.
> But in turn you want to keep compatibility with older file name encodings.
> I don't care about that, but if it's useful anyway (because newer Windows versions don't care at all either), I don't know what your problem was, because it won't be able to be solved!
> Thanks in advance

From dragisha at m3w.org Mon Jun 25 23:11:49 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Mon, 25 Jun 2012 23:11:49 +0200
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <4FE8CA12.5040104@lcwb.coop>
References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>, <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org> <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org> <4FE8CA12.5040104@lcwb.coop>
Message-ID: <99C12F66-6DC0-4FC3-BC99-3C2A61595CBC@m3w.org>

I agree with this. This way we are compatible with Unices (majority of systems we use) but we also have a straight way to the W functions of the Windows API, similar to the method I used but with a distinct presumption of input encoding.

On Jun 25, 2012, at 10:29 PM, Rodney M. Bates wrote:

> This is not necessarily a proposal, but FWIW:
>
> When working on my altered cm3 TEXT implementations, I put every relevant thing I could find into
> a state that should allow M3 WIDECHAR to be 32-bit, with only one or two declarations
> changed. I think Pickles might need some attention to cope with this, however. We would
> want them to not only handle 32-bit WIDECHAR, but be able to read older pickle files that
> used 16-bits.
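Editorial sketch, not part of the thread: the quoted OpenFileReadonly decides between CreateFile and CreateFileW by comparing the wide-character count from the sizing call with Text.Length(p) + 1. The idea behind that test can be shown in portable C without any Windows call. The helper names below are hypothetical and exist in no Modula-3 or Win32 library; the sketch assumes the input is well-formed UTF-8 and does not try to reproduce what MultiByteToWideChar does with malformed bytes.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: number of UTF-16 code units needed for a
   NUL-terminated, well-formed UTF-8 string, not counting the terminator.
   Malformed input is NOT detected; a real converter must validate it. */
size_t utf16_units_for_utf8(const char *s)
{
    size_t units = 0;
    const unsigned char *p;
    for (p = (const unsigned char *)s; *p != 0; ++p) {
        if ((*p & 0xC0) == 0x80) {
            continue;                  /* continuation byte, already counted */
        }
        units += (*p >= 0xF0) ? 2 : 1; /* 4-byte sequence needs a surrogate pair */
    }
    return units;
}

/* Nonzero when the UTF-16 length equals the byte length, which for
   well-formed UTF-8 happens only if every byte is ASCII. */
int same_length_in_utf16(const char *s)
{
    return utf16_units_for_utf8(s) == strlen(s);
}
```

Every multi-byte UTF-8 sequence shrinks when counted in UTF-16 units (2 or 3 bytes become 1 unit, 4 bytes become 2), so equal lengths imply a pure ASCII name, for which the narrow and wide file APIs should agree.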
From jay.krell at cornell.edu Mon Jun 25 23:30:08 2012
From: jay.krell at cornell.edu (Jay K)
Date: Mon, 25 Jun 2012 21:30:08 +0000
Subject: [M3devel] Windows, Unicode file names
In-Reply-To:
References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>, <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org>, <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org>
Message-ID:

> Why would you narrow it to 16bit? You need to convert to UTF-16 and make it ready for Windows API calls?

Yes.

> WinNLS does that.

I doubt that. There is a 32bit to 16bit conversion?
Ok, I guess there is. "Surrogate pairs" and all that?
Maybe not in WinNLS, but easy enough for us to write, in portable C or Modula-3. :)
Part of Text.i3 perhaps.

So then, I guess I can sign up for WIDECHAR being 32bits across the board.

 - Jay

Subject: Re: [M3devel] Windows, Unicode file names
From: dragisha at m3w.org
Date: Mon, 25 Jun 2012 23:09:37 +0200
CC: dabenavidesd at yahoo.es; m3devel at elegosoft.com
To: jay.krell at cornell.edu

On Jun 25, 2012, at 10:17 PM, Jay K wrote:

> I don't care if WIDECHAR is 16 bits or 32bits, as long as I can convert from TEXT to a flat array of either, and if 32bits, walk the array, checking for > 0xFFFF, throw an exception or return some error if any found, narrow to 16bits, call some "W" function, free the flat array.
> The size can, I guess, vary between Win32 and non-Win32 platforms.

a) If you like to make it as unportable as possible then yes - 16 or 32 is not important.
b) invalid value would be over 0xFFFFF, not 0xFFFF
c) Why would you narrow it to 16bit? You need to convert to UTF-16 and make it ready for Windows API calls? WinNLS does that. Simple narrowing (similar to what is commented in Text.i3) to 16bit and recoding from UTF-32 to UTF-16 are very different things.
d) Size varies, yes.

> Its size should be stored in a global to communicate between Modula-3 and C.
>
> I'd also quite like if TEXT was internally represented as a nul terminated flat array of 8 and/or 16 and/or 32bit quantities, materializing on demand some of them. But I suspect that flat and readonly and exposing a concat operation are in conflict. I'm not sure. MFC uses a flat reference counted nul terminated representation and it works pretty well. It doesn't materialize-on-demand other widths.
>
>  - Jay

Subject: Re: [M3devel] Windows, Unicode file names
From: dragisha at m3w.org
Date: Mon, 25 Jun 2012 21:48:09 +0200
CC: dabenavidesd at yahoo.es; m3devel at elegosoft.com
To: jay.krell at cornell.edu

It can be what cm3 people had in mind when they created WIDECHAR as a catchall for Unicode.

At first glance it looked like no solution to me, but after counting to ten - I think it is. We can have an UTF-8 layer and use it when and where needed, to recode our strings to catchall WIDECHAR/WIDETEXT.

As long as we agree on what exactly WIDECHAR is :)

=== From wikipedia
The Microsoft Windows application programming interfaces Win32 and Win64, as well as the Java and .Net Framework platforms, require that wide character variables be defined as 16-bit values, and that characters be encoded using UTF-16 (due to former use of UCS-2), while modern Unix-like systems generally require 32-bit values encoded using UTF-32 [citation needed].
===

On Jun 25, 2012, at 9:39 PM, Jay K wrote:

> I think I know what to do here and will look into it..later..
>
> We have TEXT. We should just always get WIDECHARs out of it and call CreateFileW.
> Assuming UTF8 is the wrong solution at this level, and passing in UTF8 won't work with the correct solution.
> A layer above this needs to decode UTF8, if that is the encoding.
>
> Unless someone has declared and implemented that TEXT is in fact always UTF8-encoded, which I doubt.
>
>  - Jay

From dragisha at m3w.org Tue Jun 26 00:55:45 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Tue, 26 Jun 2012 00:55:45 +0200
Subject: [M3devel] Windows, Unicode file names
In-Reply-To:
References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>, <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org>, <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org>
Message-ID: <1E70B011-E236-4931-AB6C-78EC47EA8126@m3w.org>

On Jun 25, 2012, at 11:30 PM, Jay K wrote:

> > Why would you narrow it to 16bit? You need to convert to UTF-16 and make it ready for Windows API calls?
>
> Yes.
>
> > WinNLS does that.
>
> I doubt that. There is a 32bit to 16bit conversion?

http://msdn.microsoft.com/en-us/library/windows/desktop/dd317756%28v=vs.85%29.aspx

whatever this means:

12000  utf-32    Unicode UTF-32, little endian byte order; available only to managed applications
12001  utf-32BE  Unicode UTF-32, big endian byte order; available only to managed applications

> Ok, I guess there is. "Surrogate pairs" and all that?
> Maybe not in WinNLS, but easy enough for us to write, in portable C or Modula-3. :)

That too :)

> Part of Text.i3 perhaps.

UTF-32 -> UTF-16? Maybe.

> So then, I guess I can sign up for WIDECHAR being 32bits across the board.
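Editorial sketch, not part of the thread: the exchange above turns on the difference between narrowing 32-bit values to 16 bits and recoding UTF-32 as UTF-16 with surrogate pairs. A minimal portable C version of the recoding might look like the following. The function name and error convention are assumptions made for illustration. The upper bound used is U+10FFFF, the end of the Unicode code space, rather than either threshold quoted in the messages.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode n UTF-32 code points as UTF-16.
   Returns the number of 16-bit units written, or (size_t)-1 if a code
   point is not a Unicode scalar value or the output buffer is too small. */
size_t utf32_to_utf16(const uint32_t *in, size_t n, uint16_t *out, size_t cap)
{
    size_t used = 0;
    size_t i;
    for (i = 0; i < n; ++i) {
        uint32_t cp = in[i];
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) {
            return (size_t)-1;          /* outside Unicode, or a lone surrogate */
        }
        if (cp < 0x10000) {
            if (used + 1 > cap) return (size_t)-1;
            out[used++] = (uint16_t)cp; /* BMP: one unit, value unchanged */
        } else {
            if (used + 2 > cap) return (size_t)-1;
            cp -= 0x10000;              /* 20 significant bits remain */
            out[used++] = (uint16_t)(0xD800 + (cp >> 10));   /* high surrogate */
            out[used++] = (uint16_t)(0xDC00 + (cp & 0x3FF)); /* low surrogate */
        }
    }
    return used;
}
```

With this shape, values above 0xFFFF are not an error in themselves; only values beyond U+10FFFF or inside the surrogate range are rejected, which is the distinction point c) in the quoted reply is drawing.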
From jay.krell at cornell.edu Tue Jun 26 02:58:05 2012
From: jay.krell at cornell.edu (Jay K)
Date: Tue, 26 Jun 2012 00:58:05 +0000
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <1E70B011-E236-4931-AB6C-78EC47EA8126@m3w.org>
References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>, <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org>, <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org>, <1E70B011-E236-4931-AB6C-78EC47EA8126@m3w.org>
Message-ID:

> http://msdn.microsoft.com/en-us/library/windows/desktop/dd317756%28v=vs.85%29.aspx
> 12000 utf-32 Unicode UTF-32, little endian byte order; available only to managed applications
> 12001 utf-32BE Unicode UTF-32, big endian byte order; available only to managed applications

Is not useful to us...unless we target .NET instead of native code...
Portable Modula-3 or C it should be.

 - Jay

________________________________
> From: dragisha at m3w.org
> Date: Tue, 26 Jun 2012 00:55:45 +0200
> To: jay.krell at cornell.edu
> CC: m3devel at elegosoft.com
> Subject: Re: [M3devel] Windows, Unicode file names
>
> On Jun 25, 2012, at 11:30 PM, Jay K wrote:
>
> > > Why would you narrow it to 16bit? You need to convert to UTF-16 and
> > > make it ready for Windows API calls?
> >
> > Yes.
> >
> > > WinNLS does that.
> >
> > I doubt that. There is a 32bit to 16bit conversion?
>
> http://msdn.microsoft.com/en-us/library/windows/desktop/dd317756%28v=vs.85%29.aspx
>
> whatever this means:
> 12000 utf-32 Unicode UTF-32, little endian byte order; available only to
> managed applications
> 12001 utf-32BE Unicode UTF-32, big endian byte order; available only to
> managed applications
>
> > Ok, I guess there is. "Surrogate pairs" and all that?
> > Maybe not in WinNLS, but easy enough for us to write, in portable C or
> > Modula-3. :)
>
> That too :)
>
> > Part of Text.i3 perhaps.
>
> UTF-32 -> UTF-16? Maybe.
>
> > So then, I guess I can sign up for WIDECHAR being 32bits across the board.
> > - Jay > > ________________________________ > Subject: Re: [M3devel] Windows, Unicode file names > From: dragisha at m3w.org > Date: Mon, 25 Jun 2012 23:09:37 +0200 > CC: dabenavidesd at yahoo.es; > m3devel at elegosoft.com > To: jay.krell at cornell.edu > > > On Jun 25, 2012, at 10:17 PM, Jay K wrote: > > I don't care if WIDECHAR is 16 bits or 32bits, as long as I can convert from > TEXT to a flat array of either, and if 32bits, walk the array, checking > for > 0xFFFF, throw an exception or return some error if any found, > narrow to 16bits, call some "W" function, free the flat array. > The size can, I guess, vary between Win32 and non-Win32 platforms. > > a) If you like to make it as unportable as possible then yes - 16 or 32 > is not important. > b) invalid value would be over 0xFFFFF, not 0xFFFF > c) Why would you narrow it to 16bit? You need to convert to UTF-16 and > make it ready for Windows API calls? WinNLS does that. Simple narrowing > (similar to commented in Text.i3) to 16bit and recoding from UTF-32 to > UTF-16 is very different thing. > d) Size varies, yes. > > Its size should be stored in a global to communicate between Modula-3 and C. > > > I'd also quite like if TEXT was internally represented as a nul > terminated flat array of 8 and/or 16 and/or 32bit quantities, > materialzing on demand some of them. But I suspect that flat and > readonly and exposing a concat operation are in conflict. I'm not sure. > MFC uses a flat reference counted nul terminated representation and it > works pretty well. It doesn't materialize-on-demand other widths. > > - Jay > ________________________________ > Subject: Re: [M3devel] Windows, Unicode file names > From: dragisha at m3w.org > Date: Mon, 25 Jun 2012 21:48:09 +0200 > CC: dabenavidesd at yahoo.es; m3devel at elegosoft.com > To: jay.krell at cornell.edu > > It can be what cm3 people had in mind when they created WIDECHAR as a > catchall for Unicode. 
> > At first glance it looked like no solution to me, but after counting to > ten - I think it is. We can have an UTF-8 layer and use it when and > where needed, to recode our strings to catchall WIDECHAR/WIDETEXT. > > As long as we agree on what exacty WIDECHAR is :) > ===From wikipedia > The Microsoft Windows application programming > interfaces Win32 and Win64, > as well as > the Java and .Net > Framework platforms, > require that wide character variables be defined as 16-bit values, and > that characters be encoded > using UTF-16 (due to former use of > UCS-2), while modern Unix-like > systems generally require 32-bit values encoded > using UTF-32[citation > needed]. > === > > > On Jun 25, 2012, at 9:39 PM, Jay K wrote: > > I think I know what to do here and will look into it..later.. > > We have TEXT. We should just always get WIDECHARs out of it and call > CreateFileW. > Assuming UTF8 is the wrong solution at this level, and passing in UTF8 > won't work with the correct solution. > A layer above this needs to decode UTF8, if that is the encoding. > > Unless someone has declared and implemented that TEXT is in fact always > UTF8-encoded, which I doubt. > > - Jay > ________________________________ > From: dragisha at m3w.org > Date: Mon, 25 Jun 2012 21:05:59 +0200 > To: dabenavidesd at yahoo.es > CC: m3devel at elegosoft.com > Subject: Re: [M3devel] Windows, Unicode file names > > If you cared enough to check FSWin32.m3, answer would be obvious :). > > Whatever I do with pathname before I call FS.OpenFile(Readonly)? - > FSWin32.m3 will call CreateFileA. My solution is: > > PROCEDURE OpenFileReadonly(p: Pathname.T): File.T RAISES {OSError.E}= > VAR > handle: WinNT.HANDLE; > fname := M3toC.SharedTtoS(p); > dwNum := WinNLS.MultiByteToWideChar (WinNLS.CP_UTF8, 0, fname, -1, > NIL, 0); > pwText: WinBaseTypes.PCWSTR; > BEGIN > IF dwNum = 0 OR dwNum = Text.Length(p) + 1 THEN > (* dwNum includes terminating null character. that's +1 above. 
> *) > handle := WinBase.CreateFile( > lpFileName := fname, > dwDesiredAccess := WinNT.GENERIC_READ, > dwShareMode := WinNT.FILE_SHARE_READ, > lpSecurityAttributes := NIL, > dwCreationDisposition := WinBase.OPEN_EXISTING, > dwFlagsAndAttributes := 0, > hTemplateFile := NIL); > ELSE > pwText := LOOPHOLE(NEW(UNTRACED REF ARRAY OF CHAR, dwNum*2), > WinBaseTypes.PCWSTR); > EVAL WinNLS.MultiByteToWideChar (WinNLS.CP_UTF8, 0, fname, -1, > pwText, dwNum); > handle := WinBase.CreateFileW( > lpFileName := pwText, > dwDesiredAccess := WinNT.GENERIC_READ, > dwShareMode := WinNT.FILE_SHARE_READ, > lpSecurityAttributes := NIL, > dwCreationDisposition := WinBase.OPEN_EXISTING, > dwFlagsAndAttributes := 0, > hTemplateFile := NIL); > DISPOSE(pwText); > END; > > IF LOOPHOLE(handle, INTEGER) = WinBase.INVALID_HANDLE_VALUE THEN > Fail(p, fname); > END; > M3toC.FreeSharedS(p, fname); > RETURN FileWin32.New(handle, FileWin32.Read) > END OpenFileReadonly; > > And similar in OpenFile. Not nice :). > > Also, I've added CP_UTF8 constant to WinNLS.i3. > > On Jun 25, 2012, at 9:01 PM, Daniel Alejandro Benavides D. wrote: > > Hi all: > So do you need Double-Byte Character String module as currently in TEXT > types? but you can do that already. Couldn't you? > Thanks in advance > > --- El lun, 25/6/12, Dragi?a > Duri? > escribi?: > > De: Dragi?a Duri? > > Asunto: Re: [M3devel] Windows, Unicode file names > Para: "Daniel Alejandro Benavides D." > > > CC: "m3devel" > > Fecha: lunes, 25 de junio, 2012 13:20 > > Yes, they exposed parts of NLS. That's how problem can be, albeit > partially, solved. By using methods exposed there. > > What we don't have is how to communicate actual encoding of string to > FS module so FS methods can handle filenames accordingly. > > On Jun 25, 2012, at 8:06 PM, Daniel Alejandro Benavides D. wrote: > > Hi all: > OK, good, Win32 API dealt with inter-NLS (National Language Support) at > ASCII and other formats level with NLS API. 
> But it appears to be have not been used for DEC-SRC WinNT port of > Modula-3 (but for CM3, though it isn't compiled in elego servers, but > here): > http://www.cs.purdue.edu/homes/hosking/m3/help/gen_html/m3core/src/win32/WinNLS.i3.html > > Thanks in advance > > --- El lun, 25/6/12, Dragi?a > Duri? > escribi?: > > De: Dragi?a Duri? > > Asunto: Re: [M3devel] Windows, Unicode file names > Para: "Daniel Alejandro Benavides D." > > > CC: "m3devel" > > Fecha: lunes, 25 de junio, 2012 12:36 > > Daniel, > > I can talk about many things, and most things Modula-3 are of interest > to me. Once you start a topic, and I can understand what is it about, > and it meets my interests - I'll be there. > > Problem I met with filenames is nothing old. Windows can open files > with filenames in ASCII and UTF-16. Everything else - you must check > twice and do a workaround. > > I've written here in hope I can get i to some fruitful discussion with > people who understand this problem. My solution is a workaround and > assumes filename is UTF-8 or ASCII. I would like to start discussion on > this and work from there to more general solution. > > dd > > On Jun 25, 2012, at 7:27 PM, Daniel Alejandro Benavides D. wrote: > > Hi all: > I as I understood, thought you don't want to talk about compatible W 95 > / NT distro of Modula-3. > But in turn you want to keep compatibility with older file name encodes. > I don't care that but if its useful anyway (because newer windows don't > care at all either) I don't know know your problem was because it won't > be able to be solved! > Thanks in advance > > > From dragisha at m3w.org Tue Jun 26 12:18:41 2012 From: dragisha at m3w.org (=?utf-8?Q?Dragi=C5=A1a_Duri=C4=87?=) Date: Tue, 26 Jun 2012 12:18:41 +0200 Subject: [M3devel] =?windows-1252?q?AND_=28=85=2C_16=5Fff=29=85_Not_seriou?= =?windows-1252?q?s_-_or_so_I_hope!?= Message-ID: <989FB282-786A-447D-8CA9-2432AE922DBF@m3w.org> This piece of code, from TextClass.m3, disturbs me? a lot. 
If we are to use WIDECHAR, I think we must be a lot more serious than this. Probably, text pieces are limited to 128 bytes by design, somewhere. But - whose idea was to "narrow" by ignoring everything except 8 LSB's? By mapping set of 2^20 elements to set of 2^8 elements. Probably by someone whose mother tongue is fully writeable with ASCII :). ==== PROCEDURE GetChars (t: TEXT; VAR a: ARRAY OF CHAR; start: CARDINAL) = VAR info : Info; cnt : INTEGER; next : CARDINAL := 0; buf : ARRAY [0..127] OF WIDECHAR; BEGIN t.get_info (info); cnt := MIN (NUMBER (a), info.length - start); WHILE (cnt > 0) DO t.get_wide_chars (buf, start); FOR i := FIRST (buf) TO LAST (buf) DO IF (cnt = 0) THEN RETURN END; a[next] := VAL (Word.And (ORD (buf[i]), 16_ff), CHAR); INC (next); DEC (cnt); END; INC (start, NUMBER (buf)); END; END GetChars; ==== From dabenavidesd at yahoo.es Tue Jun 26 14:12:42 2012 From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.) Date: Tue, 26 Jun 2012 13:12:42 +0100 (BST) Subject: [M3devel] =?utf-8?b?QU5EICjigKYsIDE2X2ZmKeKApiBOb3Qgc2VyaW91cyAt?= =?utf-8?q?_or_so_I_hope!?= In-Reply-To: <989FB282-786A-447D-8CA9-2432AE922DBF@m3w.org> Message-ID: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com> Hi all: Maybe is a left over of older code (almost just used a decade ago) but if not, then this meant to be just a partial implementation? If we are to get serious about memory usage seems over strict (or just in case you don't need system NIL terminated widechars be checked?). Thanks in advance --- El mar, 26/6/12, Dragi?a Duri? escribi?: De: Dragi?a Duri? Asunto: [M3devel] AND (?, 16_ff)? Not serious - or so I hope! Para: "m3devel" Fecha: martes, 26 de junio, 2012 05:18 This piece of code, from TextClass.m3, disturbs me? a lot. If we are to use WIDECHAR, I think we must be a lot more serious than this. Probably, text pieces are limited to 128 bytes by design, somewhere. But - whose idea was to "narrow" by ignoring everything except 8 LSB's? 
By mapping set of 2^20 elements to set of 2^8 elements. Probably by someone whose mother tongue is fully writeable with ASCII :). ==== PROCEDURE GetChars (t: TEXT;? VAR a: ARRAY OF CHAR;? start: CARDINAL) = VAR ???info : Info; ???cnt? : INTEGER; ???next : CARDINAL := 0; ???buf? : ARRAY [0..127] OF WIDECHAR; BEGIN ???t.get_info (info); ???cnt := MIN (NUMBER (a), info.length - start); ???WHILE (cnt > 0) DO ? ???t.get_wide_chars (buf, start); ? ???FOR i := FIRST (buf) TO LAST (buf) DO ? ? ???IF (cnt = 0) THEN RETURN END; ? ? ???a[next] := VAL (Word.And (ORD (buf[i]), 16_ff), CHAR); ? ? ???INC (next);? DEC (cnt); ? ???END; ? ???INC (start, NUMBER (buf)); ???END; END GetChars; ==== -------------- next part -------------- An HTML attachment was scrubbed... URL: From dragisha at m3w.org Tue Jun 26 14:27:00 2012 From: dragisha at m3w.org (=?utf-8?Q?Dragi=C5=A1a_Duri=C4=87?=) Date: Tue, 26 Jun 2012 14:27:00 +0200 Subject: [M3devel] =?windows-1252?q?AND_=28=85=2C_16=5Fff=29=85_Not_seriou?= =?windows-1252?q?s_-_or_so_I_hope!?= In-Reply-To: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com> References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com> Message-ID: <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org> If you cared to read, for example Text.i3, you would see this is exactly what cm3 people meant to be. On Jun 26, 2012, at 2:12 PM, Daniel Alejandro Benavides D. wrote: > Hi all: > Maybe is a left over of older code (almost just used a decade ago) but if not, then this meant to be just a partial implementation? If we are to get serious about memory usage seems over strict (or just in case you don't need system NIL terminated widechars be checked?). > Thanks in advance > > --- El mar, 26/6/12, Dragi?a Duri? escribi?: > > De: Dragi?a Duri? > Asunto: [M3devel] AND (?, 16_ff)? Not serious - or so I hope! > Para: "m3devel" > Fecha: martes, 26 de junio, 2012 05:18 > > This piece of code, from TextClass.m3, disturbs me? a lot. 
>
> If we are to use WIDECHAR, I think we must be a lot more serious than this.
>
> Probably, text pieces are limited to 128 bytes by design, somewhere. But - whose idea was to "narrow" by ignoring everything except 8 LSB's? By mapping set of 2^20 elements to set of 2^8 elements.
>
> Probably by someone whose mother tongue is fully writeable with ASCII :).
>
> ====
> PROCEDURE GetChars (t: TEXT;  VAR a: ARRAY OF CHAR;  start: CARDINAL) =
>   VAR
>     info : Info;
>     cnt  : INTEGER;
>     next : CARDINAL := 0;
>     buf  : ARRAY [0..127] OF WIDECHAR;
>   BEGIN
>     t.get_info (info);
>     cnt := MIN (NUMBER (a), info.length - start);
>     WHILE (cnt > 0) DO
>       t.get_wide_chars (buf, start);
>       FOR i := FIRST (buf) TO LAST (buf) DO
>         IF (cnt = 0) THEN RETURN END;
>         a[next] := VAL (Word.And (ORD (buf[i]), 16_ff), CHAR);
>         INC (next);  DEC (cnt);
>       END;
>       INC (start, NUMBER (buf));
>     END;
>   END GetChars;
> ====

From dabenavidesd at yahoo.es Tue Jun 26 14:47:31 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Tue, 26 Jun 2012 13:47:31 +0100 (BST)
Subject: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
In-Reply-To: <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org>
Message-ID: <1340714851.17688.YahooMailClassic@web29702.mail.ird.yahoo.com>

Hi all:
copied that, but interface TextClass GetChars is kind of different from GetChar in Text. I can't see the interrelation
Thanks in advance

From jay.krell at cornell.edu Tue Jun 26 16:28:46 2012
From: jay.krell at cornell.edu (Jay K)
Date: Tue, 26 Jun 2012 14:28:46 +0000
Subject: [M3devel] AND (., 16_ff). Not serious - or so I hope!
In-Reply-To: <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org>
References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com>, <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org>
Message-ID:

 > 128 limit

I haven't read the code enough yet to verify that but you are probably right

 > ignoring everything over 16_FF

Probably that is the responsibility/claim of the caller of GetChars.
If you want to be correct in the face of non-ASCII, you are probably obligated to call GetWideChars.
Perhaps raising an exception would be reasonable to signal the loss of data. Or something.
There is HasWideChars for you to check.

There is no encoding implied remember.
This isn't UTF8 data.

 - Jay

________________________________
> From: dragisha at m3w.org
> Date: Tue, 26 Jun 2012 14:27:00 +0200
> To: dabenavidesd at yahoo.es
> CC: m3devel at elegosoft.com
> Subject: Re: [M3devel] AND (., 16_ff). Not serious - or so I hope!
>
> If you cared to read, for example Text.i3, you would see this is
> exactly what cm3 people meant to be.
>
> On Jun 26, 2012, at 2:12 PM, Daniel Alejandro Benavides D. wrote:
>
> Hi all:
> Maybe is a left over of older code (almost just used a decade ago) but
> if not, then this meant to be just a partial implementation? If we are
> to get serious about memory usage seems over strict (or just in case
> you don't need system NIL terminated widechars be checked?).
> Thanks in advance
>
> --- On Tue, 26/6/12, Dragiša Durić wrote:
>
> From: Dragiša Durić
> Subject: [M3devel] AND (., 16_ff). Not serious - or so I hope!
> To: "m3devel"
> Date: Tuesday, June 26, 2012 05:18
>
> This piece of code, from TextClass.m3, disturbs me… a lot.
>
> If we are to use WIDECHAR, I think we must be a lot more serious than this.
>
> Probably, text pieces are limited to 128 bytes by design, somewhere.
> But - whose idea was to "narrow" by ignoring everything except 8 LSB's?
> By mapping set of 2^20 elements to set of 2^8 elements.
>
> Probably by someone whose mother tongue is fully writeable with ASCII :).
>
> ====
> PROCEDURE GetChars (t: TEXT;  VAR a: ARRAY OF CHAR;  start: CARDINAL) =
>   VAR
>     info : Info;
>     cnt  : INTEGER;
>     next : CARDINAL := 0;
>     buf  : ARRAY [0..127] OF WIDECHAR;
>   BEGIN
>     t.get_info (info);
>     cnt := MIN (NUMBER (a), info.length - start);
>     WHILE (cnt > 0) DO
>       t.get_wide_chars (buf, start);
>       FOR i := FIRST (buf) TO LAST (buf) DO
>         IF (cnt = 0) THEN RETURN END;
>         a[next] := VAL (Word.And (ORD (buf[i]), 16_ff), CHAR);
>         INC (next);  DEC (cnt);
>       END;
>       INC (start, NUMBER (buf));
>     END;
>   END GetChars;
> ====

From dragisha at m3w.org Tue Jun 26 17:14:06 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Tue, 26 Jun 2012 17:14:06 +0200
Subject: [M3devel] AND (., 16_ff). Not serious - or so I hope!
In-Reply-To:
References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com>, <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org>
Message-ID: <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org>

On Jun 26, 2012, at 4:28 PM, Jay K wrote:

> > 128 limit
>
> I haven't read the code enough yet to verify that but you are probably right

I was not right :), that call is incremental.

> > ignoring everything over 16_FF
>
> Probably that is the responsibility/claim of the caller of GetChars.
> If you want to be correct in the face of non-ASCII, you are probably obligated to call GetWideChars.
> Perhaps raising an exception would be reasonable to signal the loss of data. Or something.
> There is HasWideChars for you to check.
>
> There is no encoding implied remember.
> This isn't UTF8 data.
It is not, but probably only way to solve this without exception is to make UTF8 "official" 8bit encoding :)

From hendrik at topoi.pooq.com Tue Jun 26 18:00:05 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Tue, 26 Jun 2012 12:00:05 -0400
Subject: [M3devel] Windows, Unicode file names
In-Reply-To:
References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com> <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org> <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org> <20120625203422.GA24287@topoi.pooq.com>
Message-ID: <20120626160005.GA29355@topoi.pooq.com>

On Mon, Jun 25, 2012 at 08:46:18PM +0000, Jay K wrote:
>
> Somewhat but not fully. Text.Length should fetch a stored length. As
> I'm sure it already does. That length should always be correctly
> maintained. Same as today. Adding one extra nul at the end doesn't
> invalidate the data. std::string has the same properties -- c_str() can
> on-demand append a terminal nul, but there could also be one in the
> string itself. I understand it is a bit weird. Maintaining a terminal
> nul does add cost that might be wasted. And reduces the capacity by
> one. It could be on-demand, I guess. - Jay

Don't need the 'on demand'. For the benefits of C interoperability, the
extra byte is well worth the price. What I'm worrying about is someone
using an embedded NUL as an end-of-string marker. I smell more bugs
creeping in. But I guess bugs are inherent in C use, so I'm not
surprised seeing them in C interoperation.

-- hendrik

From jay.krell at cornell.edu Tue Jun 26 18:34:01 2012
From: jay.krell at cornell.edu (Jay)
Date: Tue, 26 Jun 2012 09:34:01 -0700
Subject: [M3devel] AND (., 16_ff). Not serious - or so I hope!
In-Reply-To: <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org>
References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com> <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org> <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org>
Message-ID:

>> > 128 limit
>>
>> I haven't read the code enough yet to verify that but you are probably right
>
> I was not right :), that call is incremental.

I looked for that aspect too but missed it. :(

>> > ignoring everything over 16_FF
>>
>> Probably that is the responsibility/claim of the caller of GetChars.
>> If you want to be correct in the face of non-ASCII, you are probably obligated to call GetWideChars.
>> Perhaps raising an exception would be reasonable to signal the loss of data. Or something.
>> There is HasWideChars for you to check.
>>
>> There is no encoding implied remember.
>> This isn't UTF8 data.
>
> It is not, but probably only way to solve this without exception is to make UTF8 "official" 8bit encoding :)

I'm torn on that. We'd have to consider ramifications like Text.Length vs buffer size requirements/expectations.

Is TEXT & its use abstracted enough to have been widened? Should we put it back and introduce WIDETEXT? That is essentially what C and C++ do. They are inconvenient for existing code but simple predictable make sense. Contrast with weird hybrid systems like Perl & Python for which I just can't get through the documentation and understand and predict how they work..

Java is in-between but also simple & predictable -- there being no narrow option other than array of byte, which is reasonable.

 - Jay

From dragisha at m3w.org Tue Jun 26 18:46:07 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Tue, 26 Jun 2012 18:46:07 +0200
Subject: [M3devel] AND (., 16_ff). Not serious - or so I hope!
In-Reply-To:
References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com> <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org> <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org>
Message-ID: <0A373E4C-32CB-4947-A542-67F0910DBEC4@m3w.org>

You had idea in other message. Store length!

Another idea - store partial list of indices to character locations. So whatever one does, that list can be used/expanded. Whatever storage issues this makes, they are probably minor as compared to 32bit WIDECHAR for all idea.

Mika had performance problems with cm3 TEXT. I hope he follows and cares to refresh us on those issues?!

On Jun 26, 2012, at 6:34 PM, Jay wrote:

> I'm torn on that. We'd have to consider ramifications like Text.Length vs buffer size requirements/expectations.
>
> Is TEXT & its use abstracted enough to have been widened? Should we put it back and introduce WIDETEXT? That is essentially what C and C++ do. They are inconvenient for existing code but simple predictable make sense. Contrast with weird hybrid systems like Perl & Python for which I just can't get through the documentation and understand and predict how they work..

From dabenavidesd at yahoo.es Tue Jun 26 18:51:00 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Tue, 26 Jun 2012 17:51:00 +0100 (BST)
Subject: [M3devel] AND (., 16_ff). Not serious - or so I hope!
In-Reply-To: <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org>
Message-ID: <1340729460.40972.YahooMailClassic@web29701.mail.ird.yahoo.com>

Hi all:
it would be so much greater fun time/verify and correct than use by hand. Let's do it sooner than later.
Thanks in advance

From dragisha at m3w.org Tue Jun 26 19:01:42 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Tue, 26 Jun 2012 19:01:42 +0200
Subject: [M3devel] AND (., 16_ff). Not serious - or so I hope!
In-Reply-To: <0A373E4C-32CB-4947-A542-67F0910DBEC4@m3w.org>
References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com> <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org> <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org> <0A373E4C-32CB-4947-A542-67F0910DBEC4@m3w.org>
Message-ID: <117F1599-4A24-462F-9462-7CC756BB7E4B@m3w.org>

As for input encoding… Benjamin Kowarch (of M2R10 project) solved this with pragmas. There is good idea on how to instruct parser about text encoding used for source code (meaning also encoding used for string literals). As it's dependent on locale settings, it is important to let compiler know how to parse source.

Of course, Unicode string literals will be stored as UTF8 strings after parsing.

On Jun 26, 2012, at 6:46 PM, Dragiša Durić wrote:

> You had idea in other message. Store length!
>
> Another idea - store partial list of indices to character locations.
> So whatever one does, that list can be used/expanded. Whatever storage issues this makes, they are probably minor as compared to 32bit WIDECHAR for all idea.
>
> Mika had performance problems with cm3 TEXT. I hope he follows and cares to refresh us on those issues?!
>
> On Jun 26, 2012, at 6:34 PM, Jay wrote:
>
>> I'm torn on that. We'd have to consider ramifications like Text.Length vs buffer size requirements/expectations.
>>
>> Is TEXT & its use abstracted enough to have been widened? Should we put it back and introduce WIDETEXT? That is essentially what C and C++ do. They are inconvenient for existing code but simple predictable make sense. Contrast with weird hybrid systems like Perl & Python for which I just can't get through the documentation and understand and predict how they work..

From hendrik at topoi.pooq.com Tue Jun 26 20:19:55 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Tue, 26 Jun 2012 14:19:55 -0400
Subject: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
In-Reply-To: <989FB282-786A-447D-8CA9-2432AE922DBF@m3w.org>
References: <989FB282-786A-447D-8CA9-2432AE922DBF@m3w.org>
Message-ID: <20120626181955.GB29355@topoi.pooq.com>

On Tue, Jun 26, 2012 at 12:18:41PM +0200, Dragiša Durić wrote:
> This piece of code, from TextClass.m3, disturbs me… a lot.
>
> If we are to use WIDECHAR, I think we must be a lot more serious than this.
>
> Probably, text pieces are limited to 128 bytes by design, somewhere. But - whose idea was to "narrow" by ignoring everything except 8 LSB's? By mapping set of 2^20 elements to set of 2^8 elements.
>
> Probably by someone whose mother tongue is fully writeable with ASCII :).

I'm told the Japanese hate UTF-8, because it expands their characters from two bytes to three.
-- hendrik

From mika at async.caltech.edu Tue Jun 26 20:50:08 2012
From: mika at async.caltech.edu (Mika Nystrom)
Date: Tue, 26 Jun 2012 11:50:08 -0700
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <20120626160005.GA29355@topoi.pooq.com>
References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com> <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org> <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org> <20120625203422.GA24287@topoi.pooq.com> <20120626160005.GA29355@topoi.pooq.com>
Message-ID: <20120626185008.50E131A205B@async.async.caltech.edu>

As far as I know, SRC M3 and PM3 come with a TEXT implementation that
works exactly as described below. An extra byte is used at the end
with a character VAL(0,CHAR). The Texts are simply arrays of 8-bit characters.

One of the big advantages of the old version is that Text.Hash is really,
really fast. Especially on Alphas... it's hugely more expensive to have
hash tables (i.e., Modula-3 generic Tables) keyed on Texts under CM3 than
under the old compilers and runtimes. We're talking a factor of five or so
in speed since the Table routines are generally entirely dominated by Text.Hash.

    Mika

Hendrik Boom writes:
>On Mon, Jun 25, 2012 at 08:46:18PM +0000, Jay K wrote:
>>
>> Somewhat but not fully. Text.Length should fetch a stored length. As
>> I'm sure it already does. That length should always be correctly
>> maintained. Same as today. Adding one extra nul at the end doesn't
>> invalidate the data. std::string has the same properties -- c_str() can
>> on-demand append a terminal nul, but there could also be one in the
>> string itself. I understand it is a bit weird. Maintaining a terminal
>> nul does add cost that might be wasted. And reduces the capacity by
>> one. It could be on-demand, I guess. - Jay
>
>Don't need the 'on demand'. For the benefits of C interoperability, the
>extra byte is well worth the price. What I'm worrying about is someone
>using an embedded NUL as an end-of-string marker. I smell more bugs
>creeping in. But I guess bugs are inherent in C use, so I'm not
>surprised seeing them in C interoperation.
>
>-- hendrik

From mika at async.caltech.edu Tue Jun 26 20:52:21 2012
From: mika at async.caltech.edu (Mika Nystrom)
Date: Tue, 26 Jun 2012 11:52:21 -0700
Subject: [M3devel] AND (., 16_ff). Not serious - or so I hope!
In-Reply-To: <0A373E4C-32CB-4947-A542-67F0910DBEC4@m3w.org>
References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com> <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org> <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org> <0A373E4C-32CB-4947-A542-67F0910DBEC4@m3w.org>
Message-ID: <20120626185221.24C8B1A205B@async.async.caltech.edu>

Dragiša Durić writes:
>You had idea in other message. Store length!
>
>Another idea - store partial list of indices to character locations. So
>whatever one does, that list can be used/expanded. Whatever storage
>issues this makes, they are probably minor as compared to 32bit WIDECHAR
>for all idea.
>
>Mika had performance problems with cm3 TEXT. I hope he follows and cares
>to refresh us on those issues?!

Apart from the hash table issue I mentioned there were horrible performance
issues when concatenating in particular ways, but I think that's been solved
now. I don't think anyone has looked at Text.Hash very closely.

    Mika

From dmuysers at hotmail.com Tue Jun 26 21:38:16 2012
From: dmuysers at hotmail.com (Dirk Muysers)
Date: Tue, 26 Jun 2012 21:38:16 +0200
Subject: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
In-Reply-To: <20120626181955.GB29355@topoi.pooq.com>
References: <989FB282-786A-447D-8CA9-2432AE922DBF@m3w.org> <20120626181955.GB29355@topoi.pooq.com>
Message-ID:

So let them hate it. Memory is not a problem anymore.
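The "two bytes to three" remark in this exchange follows from UTF-8's length rule: code points below 16_80 take one byte, below 16_800 two, below 16_10000 three, and everything above that four. What follows is only a minimal sketch of that rule as a standalone Modula-3 program. The module name Utf8Len and the procedures Utf8Bytes and Show are hypothetical names made up for this illustration and are not part of the cm3 libraries; the only library calls assumed are IO.Put and Fmt.Int from libm3.

```modula3
MODULE Utf8Len EXPORTS Main;

IMPORT IO, Fmt;

(* Hypothetical helper: number of bytes UTF-8 needs for code point "cp".
   Assumes 0 <= cp <= 16_10FFFF; surrogate values are not rejected. *)
PROCEDURE Utf8Bytes (cp: CARDINAL): CARDINAL =
  BEGIN
    IF    cp < 16_80    THEN RETURN 1;
    ELSIF cp < 16_800   THEN RETURN 2;
    ELSIF cp < 16_10000 THEN RETURN 3;
    ELSE                     RETURN 4;
    END;
  END Utf8Bytes;

(* Hypothetical helper: print one labelled result. *)
PROCEDURE Show (name: TEXT; cp: CARDINAL) =
  BEGIN
    IO.Put (name & ": " & Fmt.Int (Utf8Bytes (cp)) & " byte(s)\n");
  END Show;

BEGIN
  Show ("U+0041 LATIN CAPITAL LETTER A", 16_41);
  Show ("U+0161 LATIN SMALL LETTER S WITH CARON", 16_161);
  Show ("U+3042 HIRAGANA LETTER A", 16_3042);
END Utf8Len.
```

Built as a program with cm3 (source file Utf8Len.m3), this should report 1, 2 and 3 bytes respectively. Most kana and kanji fall in the three-byte range, whereas legacy encodings such as Shift_JIS store them in two bytes, which is the expansion being complained about.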
--------------------------------------------------
From: "Hendrik Boom"
Sent: Tuesday, June 26, 2012 8:19 PM
To:
Subject: Re: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!

> On Tue, Jun 26, 2012 at 12:18:41PM +0200, Dragiša Durić wrote:
>> This piece of code, from TextClass.m3, disturbs me… a lot.
>>
>> If we are to use WIDECHAR, I think we must be a lot more serious than
>> this.
>>
>> Probably, text pieces are limited to 128 bytes by design, somewhere.
>> But - whose idea was to "narrow" by ignoring everything except 8 LSB's?
>> By mapping set of 2^20 elements to set of 2^8 elements.
>>
>> Probably by someone whose mother tongue is fully writeable with ASCII :).
>
> I'm told the Japanese hate UTF-8, because it expands their characters
> from two bytes to three.
>
> -- hendrik

From dragisha at m3w.org Tue Jun 26 21:53:18 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Tue, 26 Jun 2012 21:53:18 +0200
Subject: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
In-Reply-To:
References: <989FB282-786A-447D-8CA9-2432AE922DBF@m3w.org> <20120626181955.GB29355@topoi.pooq.com>
Message-ID: <5D286A2C-CA1A-4F8B-846A-CDCBDACA2661@m3w.org>

Also… If we add length info to TEXT fragments, we might as well add encoding info :). So, most of TEXT fragments in memory will use same (system default) encoding but there will also be a way to mix them, convert to system default or anything some API (like Win32) requires.

On Jun 26, 2012, at 9:38 PM, Dirk Muysers wrote:

> So let them hate it. Memory is not a problem anymore.
>
> --------------------------------------------------
> From: "Hendrik Boom"
> Sent: Tuesday, June 26, 2012 8:19 PM
> To:
> Subject: Re: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
>
>> On Tue, Jun 26, 2012 at 12:18:41PM +0200, Dragiša Durić wrote:
>>> This piece of code, from TextClass.m3, disturbs me… a lot.
>>>
>>> If we are to use WIDECHAR, I think we must be a lot more serious than this.
>>>
>>> Probably, text pieces are limited to 128 bytes by design, somewhere. But - whose idea was to "narrow" by ignoring everything except 8 LSB's? By mapping set of 2^20 elements to set of 2^8 elements.
>>>
>>> Probably by someone whose mother tongue is fully writeable with ASCII :).
>>
>> I'm told the Japanese hate UTF-8, because it expands their characters
>> from two bytes to three.
>>
>> -- hendrik

From rcolebur at SCIRES.COM Tue Jun 26 22:22:22 2012
From: rcolebur at SCIRES.COM (Coleburn, Randy)
Date: Tue, 26 Jun 2012 16:22:22 -0400
Subject: [M3devel] EXT Re: AND (., 16_ff). Not serious - or so I hope!
In-Reply-To: <0A373E4C-32CB-4947-A542-67F0910DBEC4@m3w.org>
References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com> <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org> <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org> <0A373E4C-32CB-4947-A542-67F0910DBEC4@m3w.org>
Message-ID:

I seem to recall that Rodney did some work a while back relating to TEXT.

Rodney, can you weigh in on some of this?

--Randy Coleburn

From rodney_bates at lcwb.coop Tue Jun 26 23:42:02 2012
From: rodney_bates at lcwb.coop (Rodney M. Bates)
Date: Tue, 26 Jun 2012 16:42:02 -0500
Subject: [M3devel] EXT Re: AND (., 16_ff). Not serious - or so I hope!
In-Reply-To:
References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com> <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org> <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org> <0A373E4C-32CB-4947-A542-67F0910DBEC4@m3w.org>
Message-ID: <4FEA2CAA.4010306@lcwb.coop>

On 06/26/2012 03:22 PM, Coleburn, Randy wrote:
> I seem to recall that Rodney did some work a while back relating to TEXT.
>
> Rodney, can you weigh in on some of this?
>

I wrote a modified implementation of cm3 TEXT. It uses the same data structure and invariants, so any internal values it creates are useable by any existing code that imports the various revelations. It improves performance problems deriving from Cat operations' building trees that actually degenerate into linear lists (fairly likely, as it happens whenever a string is constructed by a left-to-right or right-to-left series of concatenations.) As usual, some operations on some values are slower, but it seems to be a gain overall.

I have an extensive test driver and statistics gatherer, which shows good results. However, only tested it on LINUXLIBC6 and AMD64_LINUX, machines I have. Olaf was not comfortable that it was fully tested this way, and I have never taken the time to figure out how to run tests on targets I don't have.

I think it is a significant improvement over the stock cm3 TEXT implementation. Whether it is as good as just going back to the pm3 implementation is not so clear.
All three implementations correctly (except for possible bugs--none known at present, AFAIK) implement the language's abstract Text interface, and code that only uses Text would see only performance differences.

> --Randy Coleburn
>
> *From:* Dragiša Durić [mailto:dragisha at m3w.org]
> *Sent:* Tuesday, June 26, 2012 12:46 PM
> *To:* Jay
> *Cc:* m3devel
> *Subject:* EXT Re: [M3devel] AND (., 16_ff). Not serious - or so I hope!
>
> You had idea in other message. Store length!
>
> Another idea - store partial list of indices to character locations. So whatever one does, that list can be used/expanded. Whatever storage issues this makes, they are probably minor as compared to 32bit WIDECHAR for all idea.
>
> Mika had performance problems with cm3 TEXT. I hope he follows and cares to refresh us on those issues?!
>
> On Jun 26, 2012, at 6:34 PM, Jay wrote:
>
> I'm torn on that. We'd have to consider ramifications like Text.Length vs buffer size requirements/expectations.
>
> Is TEXT & its use abstracted enough to have been widened? Should we put it back and introduce WIDETEXT? That is essentially what C and C++ do. They are inconvenient for existing code but simple predictable make sense. Contrast with weird hybrid systems like Perl & Python for which I just can't get through the documentation and understand and predict how they work..
>

From hendrik at topoi.pooq.com Wed Jun 27 00:16:39 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Tue, 26 Jun 2012 18:16:39 -0400
Subject: [M3devel] TEXT
In-Reply-To: <5D286A2C-CA1A-4F8B-846A-CDCBDACA2661@m3w.org>
References: <989FB282-786A-447D-8CA9-2432AE922DBF@m3w.org> <20120626181955.GB29355@topoi.pooq.com> <5D286A2C-CA1A-4F8B-846A-CDCBDACA2661@m3w.org>
Message-ID: <20120626221639.GA28021@topoi.pooq.com>

On Tue, Jun 26, 2012 at 09:53:18PM +0200, Dragiša Durić wrote:
> Also… If we add length info to TEXT fragments, we might as well add encoding info :).
We could do that by letting TEXT have subtypes, depending on the encoding. -- hendrik From rcolebur at SCIRES.COM Wed Jun 27 01:44:26 2012 From: rcolebur at SCIRES.COM (Coleburn, Randy) Date: Tue, 26 Jun 2012 19:44:26 -0400 Subject: [M3devel] EXT Re: EXT Re: AND (., 16_ff). Not serious - or so I hope! In-Reply-To: <4FEA2CAA.4010306@lcwb.coop> References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com> <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org> <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org> <0A373E4C-32CB-4947-A542-67F0910DBEC4@m3w.org> <4FEA2CAA.4010306@lcwb.coop> Message-ID: I am willing to run tests on platforms that I have, mostly Windows flavors. --Randy Coleburn -----Original Message----- From: Rodney M. Bates [mailto:rodney_bates at lcwb.coop] Sent: Tuesday, June 26, 2012 5:42 PM To: m3devel at elegosoft.com Subject: EXT Re: [M3devel] EXT Re: AND (., 16_ff). Not serious - or so I hope! On 06/26/2012 03:22 PM, Coleburn, Randy wrote: > I seem to recall that Rodney did some work a while back relating to TEXT. > > Rodney, can you weigh in on some of this? > I wrote a modified implementation of cm3 TEXT. It uses the same data structure and invariants, so any internal values it creates are useable by any existing code that imports the various revelations. It improves performance problems deriving from Cat operations' building trees that actually degenerate into linear lists (fairly likely, as it happens whenever a string is constructed by a left-to-right or right-to-left series of concatenations.) As usual, some operations on some values are slower, but it seems to be a gain overall. I have an extensive test driver and statistics gatherer, which shows good results. However, only tested it on LINUXLIBC6 and AMD64_LINUX, machines I have. Olaf was not comfortable that it was fully tested this way, and I have never taken the time to figure out how to run tests on targets I don't have. 
I think it is a significant improvement over the stock cm3 TEXT implementation. Whether it is as good as just going back to the pm3 implementation is not so clear. All three implementations correctly (except for possible bugs--none known at present, AFAIK) implement the language's abstract Text interface, and code that only uses Text would see only performance differences. > --Randy Coleburn > > *From:*Dragi?a Duri? [mailto:dragisha at m3w.org] > *Sent:* Tuesday, June 26, 2012 12:46 PM > *To:* Jay > *Cc:* m3devel > *Subject:* EXT Re: [M3devel] AND (., 16_ff). Not serious - or so I hope! > > You had idea in other message. Store length! > > Another idea - store partial list of indices to character locations. So whatever one does, that list can be used/expanded. Whatever storage issues this makes, they are probably minor as compared to 32bit WIDECHAR for all idea. > > Mika had performance problems with cm3 TEXT. I hope he follows and cares to refresh us on those issues?! > > On Jun 26, 2012, at 6:34 PM, Jay wrote: > > > > I'm torn on that. We'd have to consider ramifications like Text.Length vs buffer size requirements/expectations. > > Is TEXT & its use abstracted enough to have been widened? Should we put it back and introduce WIDETEXT? That is essentially what C and C++ do. They are inconvenient for existing code but simple predictable make sense. Contrast with weird hybrid systems like Perl & Python for which I just can't get through the documentation and understand and predict how they work.. > From dabenavidesd at yahoo.es Wed Jun 27 03:41:33 2012 From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.) Date: Wed, 27 Jun 2012 02:41:33 +0100 (BST) Subject: [M3devel] EXT Re: EXT Re: AND (., 16_ff). Not serious - or so I hope! 
In-Reply-To:
Message-ID: <1340761293.52332.YahooMailClassic@web29704.mail.ird.yahoo.com>

Hi all:
even if we have a non-faulty implementation, the problem remains the same: the coding standard is not applied uniformly. Using the old TEXT with a C cross-compiled version instead seemed the way to win, at least on Windows flavors. I give that point to Jay; he is absolutely right. I fear that if we don't do this correctly, we could lose out on C compiler intrinsics. Perhaps, before all this work continues, we need to port this better; otherwise we won't make really big advances any faster.
Thanks in advance

From dabenavidesd at yahoo.es Wed Jun 27 03:54:31 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Wed, 27 Jun 2012 02:54:31 +0100 (BST)
Subject: [M3devel] TEXT
In-Reply-To: <20120626221639.GA28021@topoi.pooq.com>
Message-ID: <1340762071.63111.YahooMailClassic@web29704.mail.ird.yahoo.com>

Hi all:
I don't know, but if this is to coexist with everything (e.g. C), it is hard to know whether it will affect overall performance. Sometimes it does (CM3 Text, for instance), but if the cost is only in memory, then I wish it were like that.
Thanks in advance

--- On Tue, 26/6/12, Hendrik Boom wrote:

From: Hendrik Boom
Subject: Re: [M3devel] TEXT
To: m3devel at elegosoft.com
Date: Tuesday, 26 June 2012 17:16

On Tue, Jun 26, 2012 at 09:53:18PM +0200, Dragiša Durić wrote:
> Also… If we add length info to TEXT fragments, we might as well add encoding info :).

We could do that by letting TEXT have subtypes, depending on the encoding.

-- hendrik

From mika at async.caltech.edu Wed Jun 27 03:54:57 2012
From: mika at async.caltech.edu (Mika Nystrom)
Date: Tue, 26 Jun 2012 18:54:57 -0700
Subject: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
In-Reply-To:
References: <989FB282-786A-447D-8CA9-2432AE922DBF@m3w.org> <20120626181955.GB29355@topoi.pooq.com>
Message-ID: <20120627015457.238041A205B@async.async.caltech.edu>

Memory is always potentially a problem!!!!

One of the main reasons my group was slow at switching from PM3 to CM3 was because we were processing node names for chip designs as TEXTs.

Chip designs tend to be deeply hierarchical and you wind up printing a lot of strings such as

a.b.c.d.e.f.g.h

to files.

That's when you run into problems with Text.Cat.

And memory will always be a problem since you are always designing the next generation of computers with the current generation of computers.

Also even if memory weren't a problem, speed is always a problem, and speed isn't entirely unrelated to memory.
The Text.Hash I was alluding to earlier hashes eight characters per iteration on a 64-bit machine, as long as characters are 8 bits... If you go to 16 bits it'll take at least twice as long. Furthermore, if there is more than one way (bit pattern) to represent a single CHAR, it becomes difficult to use algorithms that take more than one at a time.

     Mika

"Dirk Muysers" writes:
>So let them hate it. Memory is not a problem anymore.
>
>--------------------------------------------------
>From: "Hendrik Boom"
>Sent: Tuesday, June 26, 2012 8:19 PM
>To:
>Subject: Re: [M3devel]AND (…, 16_ff)… Not serious - or so I hope!
>
>> On Tue, Jun 26, 2012 at 12:18:41PM +0200, Dragiša Durić wrote:
>>> This piece of code, from TextClass.m3, disturbs me… a lot.
>>>
>>> If we are to use WIDECHAR, I think we must be a lot more serious than
>>> this.
>>>
>>> Probably, text pieces are limited to 128 bytes by design, somewhere.
>>> But - whose idea was to "narrow" by ignoring everything except 8 LSB's?
>>> By mapping set of 2^20 elements to set of 2^8 elements.
>>>
>>> Probably by someone whose mother tongue is fully writeable with ASCII :).
>>
>> I'm told the Japanese hate UTF-8, because it expands their characters
>> from two bytes to three.
>>
>> -- hendrik
>>

From dabenavidesd at yahoo.es Wed Jun 27 04:18:53 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Wed, 27 Jun 2012 03:18:53 +0100 (BST)
Subject: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
In-Reply-To: <20120627015457.238041A205B@async.async.caltech.edu>
Message-ID: <1340763533.57548.YahooMailClassic@web29706.mail.ird.yahoo.com>

Hi all:
Well, you have created a chicken-and-egg problem/opportunity: you can create your theory and move forward. I guess history has shown that it is not big chunks of memory that make programs execute faster, but distributed machines with less memory.
The problem is that I could set out a theory that explains how computers could actually evolve, based on families of computers (you know, like stack computers), and it turns out that in reality it doesn't work like that, so by reality's measure it's not true. (I also don't consider the "reality" to be that: I don't think tablets and the like will dominate tomorrow as they do today. They are very, very useful, as micros were in their time, but that can't come back and happen again; micros are gone.) I don't hate the devices or the people who use them (perhaps I'm too old for that), but these things are mostly used to send messages or to quickly set up a web page (an everyday task which I still consider one for talented people). Frankly, we can say many things in theory, but only good people and companies can establish a standard way of doing things.
Thanks in advance

From hendrik at topoi.pooq.com Wed Jun 27 05:30:01 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Tue, 26 Jun 2012 23:30:01 -0400
Subject: [M3devel] UTF-8 TEXT
In-Reply-To:
References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com> <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org> <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org> <0A373E4C-32CB-4947-A542-67F0910DBEC4@m3w.org>
Message-ID: <20120627033000.GB28021@topoi.pooq.com>

On Tue, Jun 26, 2012 at 04:22:22PM -0400, Coleburn, Randy wrote:
> I seem to recall that Rodney did some work a while back relating to TEXT.
> Rodney, can you weigh in on some of this?
> --Randy Coleburn
>
> From: Dragiša Durić [mailto:dragisha at m3w.org]
> Sent: Tuesday, June 26, 2012 12:46 PM
> To: Jay
> Cc: m3devel
> Subject: EXT Re: [M3devel] AND (., 16_ff). Not serious - or so I hope!
>
> You had idea in other message. Store length!
>
> Another idea - store partial list of indices to character locations. So whatever one does, that list can be used/expanded. Whatever storage issues this makes, they are probably minor as compared to 32bit WIDECHAR for all idea.

Most of the time, you don't need explicit integer indexes to character locations. What you do need is an operation that fetches a character given the string and its index (whatever data structure that index is), and one that increments the index past that character. As long as you can save an index and use it later on the same string, that's probably all you ever need. And with a simple TEXT representation (such as the obvious array of bytes containing characters of various widths) a byte index is all you need (note: NOT a character index). It's even easy to use TEXT and its integer indices as the data representation, as long as you use the proper functions to parse the characters and increment the indices by amounts that might differ from 1.

And if your source code is represented in UTF-8, the representation that requires little extra compiler effort to parse, your TEXT strings will automagically appear in UTF-8.

I can see a use for various wide characters -- the things you extract from a TEXT by parsing bits of it, but none for anything really new or complicated for wide TEXT.

The only confusing thing is that the existing operations for extracting bytes from TEXT have names that suggest they are extracting characters.
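
The two operations described above (fetch the character at a byte index, and advance the index past it) can be sketched as follows. This is only an illustration in Python, not Modula-3 and not any existing M3 library interface: the names `utf8_len`, `char_at`, `next_index` and `chars` are hypothetical, and the input is assumed to be well-formed UTF-8.

```python
def utf8_len(lead: int) -> int:
    """Length in bytes of the UTF-8 sequence whose first byte is `lead`."""
    if lead < 0x80:
        return 1                      # 0xxxxxxx: single-byte (ASCII)
    if lead >> 5 == 0b110:
        return 2                      # 110xxxxx: two-byte sequence
    if lead >> 4 == 0b1110:
        return 3                      # 1110xxxx: three-byte sequence
    if lead >> 3 == 0b11110:
        return 4                      # 11110xxx: four-byte sequence
    raise ValueError("byte index does not point at the start of a character")


def char_at(buf: bytes, i: int) -> str:
    """Fetch the character that starts at byte index i."""
    return buf[i:i + utf8_len(buf[i])].decode("utf-8")


def next_index(buf: bytes, i: int) -> int:
    """Advance byte index i past the character that starts there."""
    return i + utf8_len(buf[i])


def chars(buf: bytes):
    """Yield (byte_index, character) pairs, left to right."""
    i = 0
    while i < len(buf):
        yield i, char_at(buf, i)
        i = next_index(buf, i)
```

A saved byte index stays valid for later use on the same buffer, which is the property the paragraph above relies on; what is given up is O(1) access by character number.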
-- Hendrik

From dmuysers at hotmail.com Wed Jun 27 09:58:28 2012
From: dmuysers at hotmail.com (Dirk Muysers)
Date: Wed, 27 Jun 2012 09:58:28 +0200
Subject: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
In-Reply-To: <20120627015457.238041A205B@async.async.caltech.edu>
References: <989FB282-786A-447D-8CA9-2432AE922DBF@m3w.org> <20120626181955.GB29355@topoi.pooq.com> <20120627015457.238041A205B@async.async.caltech.edu>
Message-ID:

Some time ago I started to develop a Unicode library based on the old M3 text model but using UTF-8 internally rather than Latin-1 (see README attachment). For reasons best known to me I had to put it on the back burner in favour of more urgent work. If anybody is interested in furthering this solution, I would eagerly give the existing (pre-alpha) code away.

This being said, there are certainly better hash algorithms than the one used by m3core (e.g. Goulburn, see http://www.clockandflame.com/media/Goulburn06.pdf).

From dragisha at m3w.org Wed Jun 27 11:52:53 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Wed, 27 Jun 2012 11:52:53 +0200
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <20120626185008.50E131A205B@async.async.caltech.edu>
References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com> <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org> <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org> <20120625203422.GA24287@topoi.pooq.com> <20120626160005.GA29355@topoi.pooq.com> <20120626185008.50E131A205B@async.async.caltech.edu>
Message-ID:

It is more and more obvious what the ideal structure would be: ARRAY OF CHAR, UTF-8 encoded, using SRC M3 Text.Hash().
What we need is to make the compiler map from the input encoding (whatever the user chooses, or is chosen for him) to internal UTF-8.

On Jun 26, 2012, at 8:50 PM, Mika Nystrom wrote:

>
> As far as I know, SRC M3 and PM3 come with a TEXT implementation that
> works exactly as described below. An extra byte is used at the end with
> a character VAL(0,CHAR). The Texts are simply arrays of 8-bit characters.
>
> One of the big advantages of the old version is that Text.Hash is really,
> really fast. Especially on Alphas... it's hugely more expensive to
> have hash tables (i.e., Modula-3 generic Tables) keyed on Texts under
> CM3 than under the old compilers and runtimes. We're talking a factor
> of five or so in speed since the Table routines are generally entirely
> dominated by Text.Hash.
>
> Mika
>
> Hendrik Boom writes:
>> On Mon, Jun 25, 2012 at 08:46:18PM +0000, Jay K wrote:
>>>
>>> Somewhat but not fully. Text.Length should fetch a stored length. As
>>> I'm sure it already does.That length should always be correctly
>>> maintained. Same as today.Adding one extra nul at the end doesn't
>>> invalidate the data.std::string has the same properties -- c_str() can
>>> on-demand append a terminal nul,but there could also be one in the
>>> string itself.I understand it is a bit wierd. Maintaining a terminal
>>> nul does add cost that might be wasted.And reduces the capacity by
>>> one.It could be on-demand, I guess. - Jay
>>
>> Don't need the 'on demand'. For the benefits of C interoperability, the
>> extra byte is well worth the price. What I'm worrying about is someone
>> using an enbedded NUL as an end-of-string marker. I smell more bugs
>> creeping in. But I guess bug are inherent in C use, so I'm not
>> surprised seeing them in C interoperation.
>>
>> -- hendrik

From jay.krell at cornell.edu Wed Jun 27 12:19:08 2012
From: jay.krell at cornell.edu (Jay K)
Date: Wed, 27 Jun 2012 10:19:08 +0000
Subject: [M3devel] Windows, Unicode file names
In-Reply-To:
References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>, <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org>, , <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org>, , <20120625203422.GA24287@topoi.pooq.com>, , <20120626160005.GA29355@topoi.pooq.com>, <20120626185008.50E131A205B@async.async.caltech.edu>,
Message-ID:

> More and more is obvious how ideal structure would be: ARRAY OF CHAR, UTF8 encoded, using SRC M3 Text.Hash().

I don't quite agree.
There are two ideal approaches.

1)
  TEXT is like ARRAY OF CHAR and no values over 0xFF (or maybe even 0x7F)
  "WIDETEXT" is like ARRAY OF WIDECHAR, for 16bit or 32bit WIDECHAR

2) something that can change between them, or possibly store both, but is still mainly flat arrays
That is, once you store a value over 0xFF, the internal representation changes to a flat array of WIDECHAR.
Probably it stays that way -- you don't want to thrash back and forth in the worst case.
The lesser evil is probably to stick with the wide representation.
Setting the string to empty might bounce it back to narrow.
Ditto assigning it from another narrow text, maybe.

What I don't yet understand in all this is how to efficiently combine thread safety, immutability, and quadratic growth.

The following should be as efficient as in typical C++ libraries:

  VAR a: TEXT;
  WHILE TRUE DO
    a := a & " ";
  END;

I kind of think that immutability and quadratic growth are in conflict.
But not because that sounds obvious.
Note that typical C++ libraries do have value semantics for std::string and std::vector.
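
To make the cost concern above concrete, here is a rough sketch that only counts character copies; it does not model any actual TEXT implementation. It assumes that "quadratic growth" refers to the capacity-doubling strategy typical of C++ string buffers, and that an immutable flat string must be copied in full on every append. Python is used purely for illustration, and both function names are hypothetical.

```python
def copies_immutable(n: int) -> int:
    """Characters copied when each append builds a fresh flat string."""
    total = 0
    length = 0
    for _ in range(n):
        total += length + 1      # copy the old contents plus the new character
        length += 1
    return total


def copies_doubling(n: int) -> int:
    """Characters copied when appending in place to a buffer that doubles."""
    total = 0
    length = 0
    capacity = 1
    for _ in range(n):
        if length == capacity:   # buffer full: reallocate and copy contents
            total += length
            capacity *= 2
        total += 1               # write the new character
        length += 1
    return total
```

Under these assumptions the first count grows as n(n+1)/2, while the second stays below 3n. The doubling buffer is cheap precisely because it is mutated in place, which is where it collides with immutability.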
- Jay

From dragisha at m3w.org Wed Jun 27 13:14:22 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Wed, 27 Jun 2012 13:14:22 +0200
Subject: [M3devel] Windows, Unicode file names
In-Reply-To:
References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>, <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org>, , <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org>, , <20120625203422.GA24287@topoi.pooq.com>, , <20120626160005.GA29355@topoi.pooq.com>, <20120626185008.50E131A205B@async.async.caltech.edu>,
Message-ID: <03E8005D-CD75-4699-A703-518F219A6F09@m3w.org>

On Jun 27, 2012, at 12:19 PM, Jay K wrote:

> > More and more is obvious how ideal structure would be: ARRAY OF CHAR, UTF8 encoded, using SRC M3 Text.Hash().
>
> I don't quite agree.
> There are two ideal approaches.
> 1)
> TEXT is like ARRAY OF CHAR and no values over 0xFF (or maybe even 0x7F)
> "WiDETEXT" is like ARRAY OF WIDECHAR, for 16bit or 32bit WIDECHAR

So we can have two representations for a single thing: a variable holding some text. And the representation depends on the question "do you need non-basic-English characters"?
From dragisha at m3w.org Wed Jun 27 13:26:31 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Wed, 27 Jun 2012 13:26:31 +0200
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <03E8005D-CD75-4699-A703-518F219A6F09@m3w.org>
References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>, <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org>, , <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org>, , <20120625203422.GA24287@topoi.pooq.com>, , <20120626160005.GA29355@topoi.pooq.com>, <20120626185008.50E131A205B@async.async.caltech.edu>, <03E8005D-CD75-4699-A703-518F219A6F09@m3w.org>
Message-ID: <21109700-2223-46D4-A151-DABC7084BDC1@m3w.org>

This is one place where insisting on some imagined/future purity (fully compatible with your argument - thread safety + immutability + non-quadratic performance) will lead to unreasonable fragmentation and a de facto gray area in CM3 and its usage.

I am only one of the people here who de facto use TEXTs to hold UTF-8 content. And while we all think/talk about a solution, every single user who needs international characters and wants to use them in a sensible way will go the same way.

Then some "proper" CM3 solution comes, and what happens? We rewrite everything to support it? Or ignore it?
From dabenavidesd at yahoo.es Wed Jun 27 13:52:29 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Wed, 27 Jun 2012 12:52:29 +0100 (BST)
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <21109700-2223-46D4-A151-DABC7084BDC1@m3w.org>
Message-ID: <1340797949.85516.YahooMailClassic@web29702.mail.ird.yahoo.com>

Hi all:
In reality, it turns out that ASCII is still the suitable and adhered-to standard for Modula-2 command control, structured text, and formatting in PLC systems programming. Whatever we pick, we had better be clear about it; nevertheless, I agree with internationalization as well as with compatibility, etc.
From what I gather, TEXT is allowed to be a superset of Latin-1.
Thanks in advance

From rodney_bates at lcwb.coop Wed Jun 27 21:20:41 2012
From: rodney_bates at lcwb.coop (Rodney M. Bates)
Date: Wed, 27 Jun 2012 14:20:41 -0500
Subject: [M3devel] UTF-8 TEXT
In-Reply-To: <20120627033000.GB28021@topoi.pooq.com>
References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com> <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org> <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org> <0A373E4C-32CB-4947-A542-67F0910DBEC4@m3w.org> <20120627033000.GB28021@topoi.pooq.com>
Message-ID: <4FEB5D09.2080601@lcwb.coop>

On 06/26/2012 10:30 PM, Hendrik Boom wrote:
> On Tue, Jun 26, 2012 at 04:22:22PM -0400, Coleburn, Randy wrote:
>> I seem to recall that Rodney did some work a while back relating to TEXT.
>> Rodney, can you weigh in on some of this?
>> --Randy Coleburn
>>
>> From: Dragiša Durić [mailto:dragisha at m3w.org]
>> Sent: Tuesday, June 26, 2012 12:46 PM
>> To: Jay
>> Cc: m3devel
>> Subject: EXT Re: [M3devel] AND (., 16_ff). Not serious - or so I hope!
>>
>> You had idea in other message. Store length!
>>
>> Another idea - store partial list of indices to character locations. So whatever one does, that list can be used/expanded. Whatever storage issues this makes, they are probably minor as compared to 32bit WIDECHAR for all idea.
>
> Most of the time, you don't need explicit integer indexes to character
> locations. What you do need is an operation that fetches a character
> given the string and its index (whatever data structure that index is),
> and one that increments the index past that character. As long as you
> can save an index and use it later on the same string, that's probably
> all you ever need. And with a simple TEXT representation (such as the
> obvious array of bytes containing characters of various widths) a byte
> index is all you need (note: NOT a character index).
> It's easy even to
> use TEXT and its integer indices as the data representation, as long as
> you use the proper functions parse the characters and increment the
> indices by amounts that might differ from 1.
>
> And if your source code is represented in UTF-8, the representation that
> requires little extra compiler effort to parse, your TEXT strings will
> automagically appear in UTF-8.

The original designers of the language and its libraries have given us two different abstractions for handling character strings (in addition to plain arrays): 1) Text, and 2) Wr, Rd, and their cousins.

Text is highly general and easy to use. Concatenations and substrings are easy. Semantics, to its clients, are value semantics, similar to INTEGER. Random access by *character* number is easy and, hopefully, implemented with efficiency at least better than O(n).

Wr and friends restrict you to sequential access, at least mostly, but gain implementation convenience and efficiency as a result.

I feel very strongly that we should *not* take away the full generality of Text, especially efficient random access, to handle variable-length character encodings in strings. For these, let's make more friends of Wr and Rd, which already assume sequential access. For example, a filter pipe that sequentially reads a Text/Array/stream, applies a UTF-8 interpretation to its bytes, and delivers a stream of Unicode characters, in variables of type WIDECHAR.

Text should preserve the abstraction that it's a string of characters, generalized as it already is in cm3, to have type WIDECHAR, so they can be any Unicode character. The internal representation should, usually, not be of concern.

Note that nowhere in Text are character values transferred between a Text.T and any form of I/O stream. In the Text abstraction, all characters go in and out of a Text.T in variables of type CHAR, WIDECHAR, and arrays thereof. IO, etc. is only done in streams, e.g., TextWr.
We can easily add new variants of these that encode/decode by various rules. Of course, it is still valid to put a string of bytes in a Text.T and apply, e.g., UTF-8 interpretation yourself. But that's lower-level programming, and shouldn't confuse the abstraction. > > I can see a use for various wide characters -- the things you extract > from a TEXT by parsing biits of it, but none for anything > really new complicated for wide TEXT. > > The only confusing thing is that the existing operations for extracting > bytes from TEXT have names that suggest they are extracting characters. > I think it's more than a suggestion. I think the abstraction clearly considers them characters. And it should stay that way. If you want, at a higher level of code, to treat them as bytes, that's fine, but the abstraction continues to view them as characters (which only you, the client, know is not really so.) > -- Hendrik > From rodney_bates at lcwb.coop Wed Jun 27 22:04:59 2012 From: rodney_bates at lcwb.coop (Rodney M. Bates) Date: Wed, 27 Jun 2012 15:04:59 -0500 Subject: [M3devel] Windows, Unicode file names In-Reply-To: References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>, <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org>, , <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org>, , <20120625203422.GA24287@topoi.pooq.com>, , <20120626160005.GA29355@topoi.pooq.com>, <20120626185008.50E131A205B@async.async.caltech.edu>, Message-ID: <4FEB676B.1010505@lcwb.coop> On 06/27/2012 05:19 AM, Jay K wrote: > > More and more is obvious how ideal structure would be: ARRAY OF CHAR, UTF8 encoded, using SRC M3 Text.Hash(). > > I don't quite agree. > There are two ideal approaches. 
> 1)
> TEXT is like ARRAY OF CHAR and no values over 0xFF (or maybe even 0x7F)
> "WIDETEXT" is like ARRAY OF WIDECHAR, for 16-bit or 32-bit WIDECHAR
>
> 2) something that can change between them, or possibly store both, but is still mainly flat arrays
> That is, once you store a value over 0xFF, the internal representation changes to a flat array of WIDECHAR.
> Probably it stays that way -- you don't want to thrash back and forth in the worst case.
> The lesser evil is probably to stick with the wide representation.
> Setting the string to empty might bounce it back to narrow.
> Ditto assigning it from another narrow text, maybe.
>

This is similar to what the cm3 modification of Text does now. The details of what goes on inside the implementation are a bit different than you describe. There can be mixtures of 8-bit string fragments and 16-bit string fragments, plus other stuff hooking them together. But the abstraction works just like this.

>
> What I don't yet understand in all this is how to efficiently combine thread safety, immutability, and quadratic growth.
>
> The following should be as efficient as in typical C++ libraries:
>
> VAR a: TEXT := " ";
> WHILE TRUE DO
>   a := a & " ";
> END;
>

In pm3 Text, this will take quadratic time and linear space. The partial strings will be garbage collected, as no copies of the pointers to them are made. GetChar is then O(1).

In cm3 Text, this is linear in both time and space, but the space usage has a much higher constant factor than in pm3. In pm3, the asymptotic space used is exactly what the characters themselves require, i.e., one byte per character. For cm3, I count 21 native words per character, plus fragmentation loss for 3 separate heap objects per character. That's 84 times or 168 times, depending on word size. Well, lots of people keep saying RAM is virtually free these days. I guess we really need to hope they are right.
GetChar is O(n) when the string is built linearly like this.
Best case is O(log n) when built by Cats of single characters.

My modification of cm3 Text lies between these. It flattens strings up to a point, then does some imperfect balancing of them higher in trees.

Frankly, I think I like going back to the pm3 implementation best.

>
> I kind of think that immutability and quadratic growth are in conflict.

They are, to a considerable extent, as with all functional-style data structures. But more sophisticated (i.e., complicated) implementations can mitigate somewhat.

> But not because that sounds obvious.
> Note that typical C++ libraries do have value semantics for std::string and std::vector.
>
> - Jay
>
> > From: dragisha at m3w.org
> > Date: Wed, 27 Jun 2012 11:52:53 +0200
> > To: mika at async.caltech.edu
> > CC: m3devel at elegosoft.com
> > Subject: Re: [M3devel] Windows, Unicode file names
> >
> > It is more and more obvious that the ideal structure would be: ARRAY OF CHAR, UTF8 encoded, using SRC M3 Text.Hash().
> >
> > What we need is to make the compiler map from the input encoding (whatever the user chooses or is chosen for him) to internal UTF8.
> >
> > On Jun 26, 2012, at 8:50 PM, Mika Nystrom wrote:
> >
> > >
> > > As far as I know, SRC M3 and PM3 come with a TEXT implementation that works exactly as described below. An extra byte is used at the end with a character VAL(0,CHAR). The Texts are simply arrays of 8-bit characters.
> > >
> > > One of the big advantages of the old version is that Text.Hash is really, really fast. Especially on Alphas... it's hugely more expensive to have hash tables (i.e., Modula-3 generic Tables) keyed on Texts under CM3 than under the old compilers and runtimes. We're talking a factor of five or so in speed since the Table routines are generally entirely dominated by Text.Hash.
> > >
> > > Mika
> > >
> > > Hendrik Boom writes:
> > >> On Mon, Jun 25, 2012 at 08:46:18PM +0000, Jay K wrote:
> > >>>
> > >>> Somewhat but not fully.
> > >>> Text.Length should fetch a stored length. As I'm sure it already does. That length should always be correctly maintained. Same as today. Adding one extra nul at the end doesn't invalidate the data. std::string has the same properties -- c_str() can on-demand append a terminal nul, but there could also be one in the string itself. I understand it is a bit weird. Maintaining a terminal nul does add cost that might be wasted. And reduces the capacity by one. It could be on-demand, I guess. - Jay
> > >>
> > >> Don't need the 'on demand'. For the benefits of C interoperability, the extra byte is well worth the price. What I'm worrying about is someone using an embedded NUL as an end-of-string marker. I smell more bugs creeping in. But I guess bugs are inherent in C use, so I'm not surprised seeing them in C interoperation.
> > >>
> > >> -- hendrik
> >

From rodney_bates at lcwb.coop Wed Jun 27 22:10:42 2012
From: rodney_bates at lcwb.coop (Rodney M. Bates)
Date: Wed, 27 Jun 2012 15:10:42 -0500
Subject: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
In-Reply-To: <989FB282-786A-447D-8CA9-2432AE922DBF@m3w.org>
References: <989FB282-786A-447D-8CA9-2432AE922DBF@m3w.org>
Message-ID: <4FEB68C2.7000202@lcwb.coop>

Yes, this is a disturbing quirk, and quite out of character with the nature of Modula-3. It would be consistent to say that CHAR<:WIDECHAR, and apply the usual assignability rules. That would make this a runtime range error.

On 06/26/2012 05:18 AM, Dragiša Durić wrote:
> This piece of code, from TextClass.m3, disturbs me… a lot.
>
> If we are to use WIDECHAR, I think we must be a lot more serious than this.
>
> Probably, text pieces are limited to 128 bytes by design, somewhere. But - whose idea was to "narrow" by ignoring everything except the 8 LSBs? By mapping a set of 2^20 elements to a set of 2^8 elements.
>
> Probably by someone whose mother tongue is fully writeable with ASCII :).
>
> ====
> PROCEDURE GetChars (t: TEXT; VAR a: ARRAY OF CHAR; start: CARDINAL) =
>   VAR
>     info : Info;
>     cnt  : INTEGER;
>     next : CARDINAL := 0;
>     buf  : ARRAY [0..127] OF WIDECHAR;
>   BEGIN
>     t.get_info (info);
>     cnt := MIN (NUMBER (a), info.length - start);
>     WHILE (cnt > 0) DO
>       t.get_wide_chars (buf, start);
>       FOR i := FIRST (buf) TO LAST (buf) DO
>         IF (cnt = 0) THEN RETURN END;
>         a[next] := VAL (Word.And (ORD (buf[i]), 16_ff), CHAR);
>         INC (next); DEC (cnt);
>       END;
>       INC (start, NUMBER (buf));
>     END;
>   END GetChars;
> ====
>

From rodney_bates at lcwb.coop Wed Jun 27 22:27:29 2012
From: rodney_bates at lcwb.coop (Rodney M. Bates)
Date: Wed, 27 Jun 2012 15:27:29 -0500
Subject: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
In-Reply-To: 
References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com> <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org> <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org>
Message-ID: <4FEB6CB1.3070509@lcwb.coop>

On 06/26/2012 11:34 AM, Jay wrote:
>
>>> > 128 limit
>>>
>>> I haven't read the code enough yet to verify that but you are probably right
>>
>> I was not right :), that call is incremental.
>
> I looked for that aspect too but missed it. :(
>
>>> > ignoring everything over 16_FF
>>>
>>> Probably that is the responsibility/claim of the caller of GetChars.
>>> If you want to be correct in the face of non-ASCII, you are probably obligated to call GetWideChars.
>>> Perhaps raising an exception would be reasonable to signal the loss of data. Or something.
>>> There is HasWideChars for you to check.
>>
>>> There is no encoding implied, remember.
>>> This isn't UTF8 data.
>>
>> It is not, but probably the only way to solve this without an exception is to make UTF8 the "official" 8-bit encoding :)
>>
>
> I'm torn on that. We'd have to consider ramifications like Text.Length vs buffer size requirements/expectations.
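
The two behaviours under discussion here, silently keeping the low 8 bits (as the quoted GetChars does with Word.And (..., 16_ff)) versus signalling the loss of data, can be contrasted in a small sketch. It is written in Python rather than Modula-3 only so that it can be run without a cm3 installation; the names narrow_masked and narrow_checked are illustrative assumptions and are not part of any cm3 interface.

```python
def narrow_masked(text):
    """Mimic the quoted GetChars: keep only the low 8 bits of each code point."""
    return bytes(ord(ch) & 0xFF for ch in text)


def narrow_checked(text):
    """Refuse to narrow any character that does not fit in 8 bits."""
    out = bytearray()
    for index, ch in enumerate(text):
        code = ord(ch)
        if code > 0xFF:
            # Hypothetical policy: report the loss instead of hiding it.
            raise ValueError(
                f"character U+{code:04X} at index {index} does not fit in 8 bits"
            )
        out.append(code)
    return bytes(out)


if __name__ == "__main__":
    name = "Dragi\u0161a"            # U+0161 is LATIN SMALL LETTER S WITH CARON
    print(narrow_masked(name))       # b'Dragiaa': U+0161 silently became 0x61
    try:
        narrow_checked(name)
    except ValueError as err:
        print("rejected:", err)
```

With masking, a character above 16_ff turns into an unrelated 8-bit character and the caller cannot tell; with the checked variant the caller has to decide what to do, which is closer to the range-error behaviour argued for in this thread.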
>
> Is TEXT & its use abstracted enough to have been widened? Should we put it back and introduce WIDETEXT? That is essentially what C and C++ do. They are inconvenient for existing code but simple, predictable, and make sense. Contrast with weird hybrid systems like Perl & Python, for which I just can't get through the documentation and understand and predict how they work.
>

TEXT is well abstracted and can be widened, with the exception that truncating characters to 8 bits to return them in a CHAR is wrong. It should be a checked runtime error, and this should be documented.

Note that while we have two types, CHAR and WIDECHAR, for scalars (and can also have arrays thereof), there is still only one type TEXT. Conceptually, it should be viewed as holding strings of WIDECHAR, with some convenience functions for putting CHARs into and getting them out of a TEXT, when the programmer knows the value is in this range. The fact that our implementation stores some values in fields of type CHAR is a hidden implementation detail. There is nothing in the abstraction that requires it to be done this way, or enables clients to know that.

We do have two kinds of text literals, conventional and wide. They differ only in how the value is specified, and the ability to specify characters outside of CHAR.

>
> Java is in-between but also simple & predictable -- there being no narrow option other than array of byte, which is reasonable.
>
> - Jay

From rodney_bates at lcwb.coop Thu Jun 28 04:12:26 2012
From: rodney_bates at lcwb.coop (Rodney M.
 Bates)
Date: Wed, 27 Jun 2012 21:12:26 -0500
Subject: [M3devel] UTF-8 TEXT
In-Reply-To: 
References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com> <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org> <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org> <0A373E4C-32CB-4947-A542-67F0910DBEC4@m3w.org> <20120627033000.GB28021@topoi.pooq.com> <4FEB5D09.2080601@lcwb.coop>
Message-ID: <4FEBBD8A.5020206@lcwb.coop>

On 06/27/2012 07:32 PM, Antony Hosking wrote:
> So what do we do about 6-byte UTF-8 code points? They won't fit in WIDECHAR. Surely we should allow accessing a UTF-8 character as a CARDINAL and be done with it?
>

Absolutely. Except I think a better way is to make WIDECHAR big enough to hold all of Unicode.

> Sent from my iPad
>
> On Jun 27, 2012, at 3:20 PM, "Rodney M. Bates" wrote:
>
>>
>> On 06/26/2012 10:30 PM, Hendrik Boom wrote:
>>> On Tue, Jun 26, 2012 at 04:22:22PM -0400, Coleburn, Randy wrote:
>>>> I seem to recall that Rodney did some work a while back relating to TEXT.
>>>> Rodney, can you weigh in on some of this?
>>>> --Randy Coleburn
>>>>
>>>> From: Dragiša Durić [mailto:dragisha at m3w.org]
>>>> Sent: Tuesday, June 26, 2012 12:46 PM
>>>> To: Jay
>>>> Cc: m3devel
>>>> Subject: EXT Re: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
>>>>
>>>> You had an idea in the other message. Store the length!
>>>>
>>>> Another idea - store a partial list of indices to character locations. So whatever one does, that list can be used/expanded. Whatever storage issues this makes, they are probably minor as compared to the 32-bit-WIDECHAR-for-all idea.
>>>
>>> Most of the time, you don't need explicit integer indexes to character locations. What you do need is an operation that fetches a character given the string and its index (whatever data structure that index is), and one that increments the index past that character. As long as you can save an index and use it later on the same string, that's probably all you ever need.
>>> And with a simple TEXT representation (such as the obvious array of bytes containing characters of various widths) a byte index is all you need (note: NOT a character index). It's easy even to use TEXT and its integer indices as the data representation, as long as you use the proper functions to parse the characters and increment the indices by amounts that might differ from 1.
>>>
>>> And if your source code is represented in UTF-8, the representation that requires little extra compiler effort to parse, your TEXT strings will automagically appear in UTF-8.
>>
>> The original designers of the language and its libraries have given us two different abstractions for handling character strings (in addition to plain arrays): 1) Text, and 2) Wr, Rd, and their cousins.
>>
>> Text is highly general and easy to use. Concatenations and substrings are easy. Semantics, to its clients, are value semantics, similar to INTEGER. Random access by *character* number is easy and, hopefully, implemented with efficiency at least better than O(n).
>>
>> Wr and friends restrict you to sequential access, at least mostly, but gain implementation convenience and efficiency as a result.
>>
>> I feel very strongly that we should *not* take away the full generality of Text, especially efficient random access, to handle variable-length character encodings in strings. For these, let's make more friends of Wr and Rd, which already assume sequential access. For example, a filter pipe that sequentially reads a Text/Array/stream, applies a UTF-8 interpretation to its bytes, and delivers a stream of Unicode characters, in variables of type WIDECHAR.
>>
>> Text should preserve the abstraction that it's a string of characters, generalized, as it already is in cm3, to have type WIDECHAR, so they can be any Unicode character. The internal representation should, usually, not be of concern.
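
Both ideas quoted above, a fetch operation that takes a byte index and returns the character together with the index of the next one, and a sequential filter that turns UTF-8 bytes into a stream of code points, can be sketched in a few lines. This is a minimal illustration in Python (chosen only so it can be run without a cm3 installation), not the cm3 library: the names fetch_char and code_points are assumptions, and the decoder is deliberately simplified (it does not reject overlong forms or surrogate values).

```python
def fetch_char(data, i):
    """Decode the UTF-8 character that starts at byte index i.

    Returns (code_point, next_index). Raises ValueError on malformed input.
    Simplified sketch: overlong forms and surrogates are not rejected.
    """
    first = data[i]
    if first < 0x80:                       # one-byte (ASCII) character
        return first, i + 1
    if 0xC0 <= first < 0xE0:
        length, code = 2, first & 0x1F
    elif 0xE0 <= first < 0xF0:
        length, code = 3, first & 0x0F
    elif 0xF0 <= first < 0xF8:
        length, code = 4, first & 0x07
    else:                                  # stray continuation byte or 0xF8..0xFF
        raise ValueError(f"invalid lead byte 0x{first:02X} at index {i}")
    if i + length > len(data):
        raise ValueError(f"truncated sequence at index {i}")
    for k in range(i + 1, i + length):
        byte = data[k]
        if byte & 0xC0 != 0x80:
            raise ValueError(f"invalid continuation byte at index {k}")
        code = (code << 6) | (byte & 0x3F)
    return code, i + length


def code_points(data):
    """Sequential filter: yield the code points of a UTF-8 byte string."""
    i = 0
    while i < len(data):
        code, i = fetch_char(data, i)
        yield code


if __name__ == "__main__":
    data = "a\u0161\u20ac".encode("utf-8")   # 1-, 2- and 3-byte characters
    i = 0
    while i < len(data):
        code, nxt = fetch_char(data, i)
        print(f"byte index {i}: U+{code:04X}, next index {nxt}")
        i = nxt
```

The byte index advances by 1, 2, or 3 here, which is the "amounts that might differ from 1" of the quoted message; the generator is the sequential-access view, and neither function offers random access by character number.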
>>
>> Note that nowhere in Text are character values transferred between a Text.T and any form of I/O stream. In the Text abstraction, all characters go in and out of a Text.T in variables of type CHAR, WIDECHAR, and arrays thereof. IO, etc. is only done in streams, e.g., TextWr. We can easily add new variants of these that encode/decode by various rules.
>>
>> Of course, it is still valid to put a string of bytes in a Text.T and apply, e.g., a UTF-8 interpretation yourself. But that's lower-level programming, and shouldn't confuse the abstraction.
>>
>>>
>>> I can see a use for various wide characters -- the things you extract from a TEXT by parsing bits of it, but none for anything really new or complicated for wide TEXT.
>>>
>>> The only confusing thing is that the existing operations for extracting bytes from TEXT have names that suggest they are extracting characters.
>>>
>>
>> I think it's more than a suggestion. I think the abstraction clearly considers them characters. And it should stay that way. If you want, at a higher level of code, to treat them as bytes, that's fine, but the abstraction continues to view them as characters (which only you, the client, know is not really so).
>>
>>> -- Hendrik
>>>
>

From jay.krell at cornell.edu Thu Jun 28 07:31:04 2012
From: jay.krell at cornell.edu (Jay K)
Date: Thu, 28 Jun 2012 05:31:04 +0000
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <4FEB676B.1010505@lcwb.coop>
References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>, , <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org>, , , , <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org>, , , , <20120625203422.GA24287@topoi.pooq.com>, , , , <20120626160005.GA29355@topoi.pooq.com>, , <20120626185008.50E131A205B@async.async.caltech.edu>, , , , <4FEB676B.1010505@lcwb.coop>
Message-ID: 

> Random access by *character* number is easy and, hopefully, implemented with efficiency at least better than O(n).
Random access by "something, not 'character'" should be O(1).

> > I kind of think that immutability and quadratic growth are in conflict.
>
> They are, to a considerable extent, as with all functional-style data structures.
> But more sophisticated (i.e., complicated) implementations can mitigate somewhat.

I'm hoping we can win here somehow.
In Java and C# they solve this by having, in a sense, two string types:
constant "string"s
and mutable "StringBuffer"s
Strings never grow. They are always flat.
StringBuffers grow quadratically. They are always flat. They are mutable.

I suspect we need to do something similar. Somehow.

As I understand, C# and Java do expose string concatenation.
As I understand, they are similar to Modula-3 here, in that the compiler knows about string concatenation and rewrites the code somewhat.
Thinking about it further, I suspect my example also can't/doesn't run performantly in Java or C# either.

Hopefully we can come up with some good solution to this.
I have to run.

 - Jay

----------------------------------------
> Date: Wed, 27 Jun 2012 15:04:59 -0500
> From: rodney_bates at lcwb.coop
> To: m3devel at elegosoft.com
> Subject: Re: [M3devel] Windows, Unicode file names
>
> On 06/27/2012 05:19 AM, Jay K wrote:
> > > It is more and more obvious that the ideal structure would be: ARRAY OF CHAR, UTF8 encoded, using SRC M3 Text.Hash().
> >
> > I don't quite agree.
> > There are two ideal approaches.
> > 1)
> > TEXT is like ARRAY OF CHAR and no values over 0xFF (or maybe even 0x7F)
> > "WIDETEXT" is like ARRAY OF WIDECHAR, for 16-bit or 32-bit WIDECHAR
> >
> > 2) something that can change between them, or possibly store both, but is still mainly flat arrays
> > That is, once you store a value over 0xFF, the internal representation changes to a flat array of WIDECHAR.
> > Probably it stays that way -- you don't want to thrash back and forth in the worst case.
> > The lesser evil is probably to stick with the wide representation.
> > Setting the string to empty might bounce it back to narrow.
> > Ditto assigning it from another narrow text, maybe.
> >
>
> This is similar to what the cm3 modification of Text does now. The details of what goes on inside the implementation are a bit different than you describe. There can be mixtures of 8-bit string fragments and 16-bit string fragments, plus other stuff hooking them together. But the abstraction works just like this.
>
> >
> > What I don't yet understand in all this is how to efficiently combine thread safety, immutability, and quadratic growth.
> >
> > The following should be as efficient as in typical C++ libraries:
> >
> > VAR a: TEXT := " ";
> > WHILE TRUE DO
> >   a := a & " ";
> > END;
> >
>
> In pm3 Text, this will take quadratic time and linear space. The partial strings will be garbage collected, as no copies of the pointers to them are made. GetChar is then O(1).
>
> In cm3 Text, this is linear in both time and space, but the space usage has a much higher constant factor than in pm3. In pm3, the asymptotic space used is exactly what the characters themselves require, i.e., one byte per character. For cm3, I count 21 native words per character, plus fragmentation loss for 3 separate heap objects per character. That's 84 times or 168 times, depending on word size. Well, lots of people keep saying RAM is virtually free these days. I guess we really need to hope they are right.
> GetChar is O(n) when the string is built linearly like this.
> Best case is O(log n) when built by Cats of single characters.
>
> My modification of cm3 Text lies between these. It flattens strings up to a point, then does some imperfect balancing of them higher in trees.
>
> Frankly, I think I like going back to the pm3 implementation best.
>
> >
> > I kind of think that immutability and quadratic growth are in conflict.
>
> They are, to a considerable extent, as with all functional-style data structures.
> But more sophisticated (i.e., complicated) implementations can mitigate somewhat.
>
> > But not because that sounds obvious.
> > Note that typical C++ libraries do have value semantics for std::string and std::vector.
> >
> > - Jay
> >
> > > From: dragisha at m3w.org
> > > Date: Wed, 27 Jun 2012 11:52:53 +0200
> > > To: mika at async.caltech.edu
> > > CC: m3devel at elegosoft.com
> > > Subject: Re: [M3devel] Windows, Unicode file names
> > >
> > > It is more and more obvious that the ideal structure would be: ARRAY OF CHAR, UTF8 encoded, using SRC M3 Text.Hash().
> > >
> > > What we need is to make the compiler map from the input encoding (whatever the user chooses or is chosen for him) to internal UTF8.
> > >
> > > On Jun 26, 2012, at 8:50 PM, Mika Nystrom wrote:
> > >
> > > >
> > > > As far as I know, SRC M3 and PM3 come with a TEXT implementation that works exactly as described below. An extra byte is used at the end with a character VAL(0,CHAR). The Texts are simply arrays of 8-bit characters.
> > > >
> > > > One of the big advantages of the old version is that Text.Hash is really, really fast. Especially on Alphas... it's hugely more expensive to have hash tables (i.e., Modula-3 generic Tables) keyed on Texts under CM3 than under the old compilers and runtimes. We're talking a factor of five or so in speed since the Table routines are generally entirely dominated by Text.Hash.
> > > >
> > > > Mika
> > > >
> > > > Hendrik Boom writes:
> > > >> On Mon, Jun 25, 2012 at 08:46:18PM +0000, Jay K wrote:
> > > >>>
> > > >>> Somewhat but not fully. Text.Length should fetch a stored length. As I'm sure it already does. That length should always be correctly maintained.
> > > >>> Same as today. Adding one extra nul at the end doesn't invalidate the data. std::string has the same properties -- c_str() can on-demand append a terminal nul, but there could also be one in the string itself. I understand it is a bit weird. Maintaining a terminal nul does add cost that might be wasted. And reduces the capacity by one. It could be on-demand, I guess. - Jay
> > > >>
> > > >> Don't need the 'on demand'. For the benefits of C interoperability, the extra byte is well worth the price. What I'm worrying about is someone using an embedded NUL as an end-of-string marker. I smell more bugs creeping in. But I guess bugs are inherent in C use, so I'm not surprised seeing them in C interoperation.
> > > >>
> > > >> -- hendrik
> > >

From hendrik at topoi.pooq.com Thu Jun 28 14:37:56 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Thu, 28 Jun 2012 08:37:56 -0400
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: 
References: <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org> <20120625203422.GA24287@topoi.pooq.com> <20120626160005.GA29355@topoi.pooq.com> <20120626185008.50E131A205B@async.async.caltech.edu> <4FEB676B.1010505@lcwb.coop>
Message-ID: <20120628123756.GA2279@topoi.pooq.com>

On Thu, Jun 28, 2012 at 05:31:04AM +0000, Jay K wrote:
>
> > Random access by *character* number is easy and, hopefully, implemented with efficiency at least better than O(n).
>
> Random access by "something, not 'character'" should be O(1).

Quite agree. There should be a fetch-byte operation, and a fetch-character operation. Fetch-character should return a character and the index to the next character.

> > > I kind of think that immutability and quadratic growth are in conflict.
> >
> > They are, to a considerable extent, as with all functional-style data structures.
> > But more sophisticated (i.e., complicated) implementations can mitigate somewhat.
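
The cost behind the conflict quoted above can be made concrete by counting how many characters have to be copied. The sketch below is only a model, written in Python so that it can be run anywhere: it assumes a flat, immutable string representation in which every concatenation copies the old contents, and compares that with collecting pieces in a mutable buffer and flattening once. The function names are illustrative assumptions, and the counters model the copying rather than measure what any particular runtime (pm3, cm3, or CPython) really does.

```python
def build_by_repeated_concat(n):
    """Build a string of n spaces the way `a := a & " "` would on a flat,
    immutable representation. Returns (result, characters_copied)."""
    result = ""
    copied = 0
    for _ in range(n):
        # A flat immutable string must copy the old contents plus the new piece.
        copied += len(result) + 1
        result = result + " "
    return result, copied


def build_with_buffer(n):
    """Collect the pieces in a mutable list and flatten once at the end.
    Returns (result, characters_copied)."""
    pieces = []
    for _ in range(n):
        pieces.append(" ")
    result = "".join(pieces)
    copied = len(result)          # one final pass over all the characters
    return result, copied


if __name__ == "__main__":
    for n in (10, 100, 1000):
        _, slow = build_by_repeated_concat(n)
        _, fast = build_with_buffer(n)
        print(f"n={n}: repeated concat copies {slow}, buffer copies {fast}")
```

Under this model the repeated concatenation copies n(n+1)/2 characters while the buffer copies n, which is the quadratic-versus-linear gap that a separate mutable builder type, or a lazily flattened tree of fragments, is meant to close.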
>
> I'm hoping we can win here somehow.
> In Java and C# they solve this by having, in a sense, two string types:
> constant "string"s
> and mutable "StringBuffer"s
> Strings never grow. They are always flat.
> StringBuffers grow quadratically. They are always flat. They are mutable.
>
> I suspect we need to do something similar. Somehow.
>
> As I understand, C# and Java do expose string concatenation.
> As I understand, they are similar to Modula-3 here, in that the compiler knows about string concatenation and rewrites the code somewhat.
> Thinking about it further, I suspect my example also can't/doesn't run performantly in Java or C# either.
>
> Hopefully we can come up with some good solution to this.
> I have to run.

Initially, create a string as a simple array of bytes.

Then, when we start concatenating, use a cm3-like representation. (We could delay this until our string gets a little long, or until a pointer to it gets copied.)

Maintain that as long as we're still concatenating. We might try balancing the tree somewhat if it gets biggish.

But as soon as we start indexing or hashing, or anything like that, we can change the representation to the simple array of bytes. Usually at that point we're finished concatenating.

-- hendrik

>
>  - Jay

From hendrik at topoi.pooq.com Thu Jun 28 14:44:46 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Thu, 28 Jun 2012 08:44:46 -0400
Subject: [M3devel] UTF-8 TEXT
In-Reply-To: <4FEB5D09.2080601@lcwb.coop>
References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com> <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org> <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org> <0A373E4C-32CB-4947-A542-67F0910DBEC4@m3w.org> <20120627033000.GB28021@topoi.pooq.com> <4FEB5D09.2080601@lcwb.coop>
Message-ID: <20120628124446.GB2279@topoi.pooq.com>

On Wed, Jun 27, 2012 at 02:20:41PM -0500, Rodney M. Bates wrote:
>
> Text is highly general and easy to use. Concatenations and substrings are easy.
> Semantics, to its clients, are value semantics, similar to INTEGER.
> Random access by *character* number is easy and, hopefully, implemented with efficiency at least better than O(n).

Does it have to be a *character* number we use to index a string? I don't know of any situations where that aspect is important enough to force everyone to waste storage on it.

-- hendrik

From dragisha at m3w.org Thu Jun 28 14:48:38 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Thu, 28 Jun 2012 14:48:38 +0200
Subject: [M3devel] UTF-8 TEXT
In-Reply-To: <20120628124446.GB2279@topoi.pooq.com>
References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com> <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org> <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org> <0A373E4C-32CB-4947-A542-67F0910DBEC4@m3w.org> <20120627033000.GB28021@topoi.pooq.com> <4FEB5D09.2080601@lcwb.coop> <20120628124446.GB2279@topoi.pooq.com>
Message-ID: <01C11478-BAEA-4BAC-8ECE-FA5A28933A44@m3w.org>

glyph sounds better, I agree! :)

On Jun 28, 2012, at 2:44 PM, Hendrik Boom wrote:

> On Wed, Jun 27, 2012 at 02:20:41PM -0500, Rodney M. Bates wrote:
>>
>> Text is highly general and easy to use. Concatenations and substrings are easy. Semantics, to its clients, are value semantics, similar to INTEGER.
>> Random access by *character* number is easy and, hopefully, implemented with efficiency at least better than O(n).
>
> Does it have to be a *character* number we use to index a string? I don't know of any situations where that aspect is important enough to force everyone to waste storage on it.
>
> -- hendrik

From hendrik at topoi.pooq.com Thu Jun 28 14:51:03 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Thu, 28 Jun 2012 08:51:03 -0400
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <03E8005D-CD75-4699-A703-518F219A6F09@m3w.org>
References: <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org> <20120625203422.GA24287@topoi.pooq.com> <20120626160005.GA29355@topoi.pooq.com> <20120626185008.50E131A205B@async.async.caltech.edu> <03E8005D-CD75-4699-A703-518F219A6F09@m3w.org>
Message-ID: <20120628125103.GC2279@topoi.pooq.com>

On Wed, Jun 27, 2012 at 01:14:22PM +0200, Dragiša Durić wrote:
>
> On Jun 27, 2012, at 12:19 PM, Jay K wrote:
>
> > > It is more and more obvious that the ideal structure would be: ARRAY OF CHAR, UTF8 encoded, using SRC M3 Text.Hash().
> >
> > I don't quite agree.
> > There are two ideal approaches.
> > 1)
> > TEXT is like ARRAY OF CHAR and no values over 0xFF (or maybe even 0x7F)
> > "WIDETEXT" is like ARRAY OF WIDECHAR, for 16-bit or 32-bit WIDECHAR
>
> So we can have two representations for a single thing: a variable holding some text. And the representation depends on the question "do you need non-basic-English characters?"

I'm starting to discover that a lot of my English documents have non-ASCII characters in them. In particular, the separate open and close quotation marks around quoted speech take more than one byte in UTF-8. True, in a starvation-level character set, they are both represented as " , but that's really not what they are.

-- hendrik

From dabenavidesd at yahoo.es Thu Jun 28 15:51:59 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Thu, 28 Jun 2012 14:51:59 +0100 (BST)
Subject: [M3devel] UTF-8 TEXT
In-Reply-To: <01C11478-BAEA-4BAC-8ECE-FA5A28933A44@m3w.org>
Message-ID: <1340891519.52552.YahooMailClassic@web29704.mail.ird.yahoo.com>

Hi all:
it can't be used like that (as in early DEC-SRC versions) because TEXT is an opaque type; you can't reveal it like that at that level.
Anyway, whoever needs the characteristics of the JVM or similar languages must know that the likes are very inefficient, and C is pretty neat but also very much UNSAFE. Perhaps somebody who wants that should use integer strings to map every symbol of the computer directly from hardware (but supporting every kind of format seems over-complex in space and time).

The operating system normally doesn't handle I/O in many cases; the I/O subsystem (like Windows I/O) does, and in some cases it takes advantage of not waiting for a thread to return control to the app.

Many computers use non-ASCII terms, but normally they support that script, so why put more weight on it? Maybe I should ask either lexicographers or specific language users what they need, in a comprehensible manner, rather than hard-coded standards that many still don't use.

Thanks in advance

--- On Thu, 28/6/12, Dragiša Durić wrote:

From: Dragiša Durić
Subject: Re: [M3devel] UTF-8 TEXT
To: "Hendrik Boom"
CC: m3devel at elegosoft.com
Date: Thursday, 28 June 2012 07:48

glyph sounds better, I agree! :)

On Jun 28, 2012, at 2:44 PM, Hendrik Boom wrote:

> On Wed, Jun 27, 2012 at 02:20:41PM -0500, Rodney M. Bates wrote:
>>
>> Text is highly general and easy to use. Concatenations and substrings are easy. Semantics, to its clients, are value semantics, similar to INTEGER.
>> Random access by *character* number is easy and, hopefully, implemented with efficiency at least better than O(n).
>
> Does it have to be a *character* number we use to index a string? I don't know of any situations where that aspect is important enough to force everyone to waste storage on it.
>
> -- hendrik

From rodney_bates at lcwb.coop Thu Jun 28 16:10:02 2012
From: rodney_bates at lcwb.coop (Rodney M.
 Bates)
Date: Thu, 28 Jun 2012 09:10:02 -0500
Subject: [M3devel] UTF-8 TEXT
In-Reply-To: <20120628124446.GB2279@topoi.pooq.com>
References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com> <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org> <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org> <0A373E4C-32CB-4947-A542-67F0910DBEC4@m3w.org> <20120627033000.GB28021@topoi.pooq.com> <4FEB5D09.2080601@lcwb.coop> <20120628124446.GB2279@topoi.pooq.com>
Message-ID: <4FEC65BA.9080809@lcwb.coop>

On 06/28/2012 07:44 AM, Hendrik Boom wrote:
> On Wed, Jun 27, 2012 at 02:20:41PM -0500, Rodney M. Bates wrote:
>>
>> Text is highly general and easy to use. Concatenations and substrings are easy. Semantics, to its clients, are value semantics, similar to INTEGER.
>> Random access by *character* number is easy and, hopefully, implemented with efficiency at least better than O(n).
>
> Does it have to be a *character* number we use to index a string? I don't know of any situations where that aspect is important enough to force everyone to waste storage on it.
>
> -- hendrik
>

It is absolutely essential that it be a character, if you care about Text being a meaningful abstraction. A byte index is a very low-level view, now that we have a variable-length encoding, and *especially* now that there are multiple possible ways of representing strings.

When it was only ASCII (or ISO-latin1), it was a character index, and the abstraction was there. The fact that it was also a byte index is a coincidental consequence of the choice of underlying physical representation. Now we have a much messier situation regarding representations, but we should not destroy the abstraction and force everyone to always get down into the bowels of the different representations.

There will still be mechanisms for low-level coding if you have some compelling reason, or just don't want to rewrite something existing.
But let's protect the option of dealing with characters with the same abstraction we have had in the past.

From dabenavidesd at yahoo.es Thu Jun 28 17:18:31 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Thu, 28 Jun 2012 16:18:31 +0100 (BST)
Subject: [M3devel] UTF-8 TEXT
In-Reply-To: <4FEC65BA.9080809@lcwb.coop>
Message-ID: <1340896711.88750.YahooMailClassic@web29702.mail.ird.yahoo.com>

Hi all:
the string class is by definition a super-set of CHAR scripts; note that TEXT is a primitive type, so it can have every string characteristic. Thus we don't need any other non-primitive TEXT types. The need for another TEXT isn't an issue, as it can add or hold any characters, but it puts the burden of choice on the implementation. WIDECHARs aren't needed by Modula-3 at all, but to keep copying CHARs in other is not in my view more string formats is the real advantage to speed up implementation to get two CHARs strings.

So I agree that we must look at the performance burden in the cited implementations, for instance keeping compatibility without losing special performance. My view is that we need to reimplement that in m3core, either in C, for instance, or in some safe subset of Modula-3 to speed it up a little, for instance DEC-SRC, etc., or a subset of SPIN-M3 (something I like). But this is more stuff to do, fun certainly, and I would want to concentrate on supporting that either by OS definition or by accessing hardware (who cares about using the C RT for Linux, but if we can be faster let's do whatever it takes). In the end we can provide better interfaces for developing a current OS than they provide to us, so what does it matter if we offer some code to Linux, if they are interested at all?

Greg Nelson said that Rd/Wr are a very nice piece of string type, unappreciated by most of the current mainstream languages.

Thanks in advance

--- On Thu, 28/6/12, Rodney M. Bates wrote:

From: Rodney M.
Bates
Subject: Re: [M3devel] UTF-8 TEXT
To: m3devel at elegosoft.com
Date: Thursday, 28 June 2012, 09:10

On 06/28/2012 07:44 AM, Hendrik Boom wrote:
> On Wed, Jun 27, 2012 at 02:20:41PM -0500, Rodney M. Bates wrote:
>>
>> Text is highly general and easy to use. Concatenations and substrings
>> are easy. Semantics, to its clients, are value semantics, similar to INTEGER.
>> Random access by *character* number is easy and, hopefully, implemented
>> with efficiency at least better than O(n).
>
> Does it have to be a *character* number we use to index a string? I
> don't know of any situations where that aspect is important enough
> to force everyone to waste storage on it.
>
> -- hendrik
>

It is absolutely essential that it be a character, if you care about Text being a meaningful abstraction. A byte index is a very low-level view, now that we have a variable-length encoding, and *especially* now that there are multiple possible ways of representing strings.

When it was only ASCII (or ISO-latin1), it was a character index, and the abstraction was there. The fact that it was also a byte index is a coincidental consequence of the choice of underlying physical representation. Now we have a much messier situation regarding representations, but we should not destroy the abstraction and force everyone to always get down into the bowels of the different representations.

There will still be mechanisms for low-level coding if you have some compelling reason, or just don't want to rewrite something existing. But let's protect the option of dealing with characters with the same abstraction we have had in the past.

From hendrik at topoi.pooq.com Thu Jun 28 19:02:30 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Thu, 28 Jun 2012 13:02:30 -0400
Subject: [M3devel] UTF-8 TEXT
In-Reply-To: <4FEC65BA.9080809@lcwb.coop>
References: <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org> <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org> <0A373E4C-32CB-4947-A542-67F0910DBEC4@m3w.org> <20120627033000.GB28021@topoi.pooq.com> <4FEB5D09.2080601@lcwb.coop> <20120628124446.GB2279@topoi.pooq.com> <4FEC65BA.9080809@lcwb.coop>
Message-ID: <20120628170230.GD2279@topoi.pooq.com>

On Thu, Jun 28, 2012 at 09:10:02AM -0500, Rodney M. Bates wrote:
>
>
> On 06/28/2012 07:44 AM, Hendrik Boom wrote:
> >On Wed, Jun 27, 2012 at 02:20:41PM -0500, Rodney M. Bates wrote:
> >>
> >>Text is highly general and easy to use. Concatenations and substrings
> >>are easy. Semantics, to its clients, are value semantics, similar to INTEGER.
> >>Random access by *character* number is easy and, hopefully, implemented
> >>with efficiency at least better than O(n).
> >
> >Does it have to be a *character* number we use to index a string? I
> >don't know of any situations where that aspect is important enough
> >to force everyone to waste storage on it.
> >
> >-- hendrik
> >
>
> It is absolutely essential that it be a character, if you care about
> Text being a meaningful abstraction. A byte index is a very low level
> view, now that we have a variable-length encoding, and *especially*
> now that there are multiple possible ways of representing strings.

I'm not arguing whether the index should point to a character. I'm questioning whether it need be a count of characters. This is surely a matter of data representation rather than concept.

A character index could be implemented in a variety of ways. It certainly could be implemented as a character count, presumably for legacy applications with attendant costs. It could be implemented as a byte count if the string were implemented as an array of bytes.
It could be implemented as a machine address, constrained to index into a particular string. It could be implemented as a pointer into a linked list of string pieces, together with an offset indicating where in that piece it currently points.

We could even implement byte and character counts in the more exotic TEXT data structures if we chose; we have freedom of representation of TEXT without compromising integer.

We can implement *both* character extractors using an INTEGER *byte* count AND character extractors using an INTEGER *character* count. And we can do this in just about any representation of TEXT we come up with.

The specification for the abstraction doesn't even have to say that it's a byte count. It's sufficient to say one can use an index that is chosen for implementation efficiency.

Though it's tempting to provide a byte count for an operation that extracts bytes, not characters. Now that would be a low-level operation that does break the abstraction.

-- hendrik

>
> When it was only ASCII (or ISO-latin1), it was a character
> index, and the abstraction was there. The fact that it was also a
> byte index is a coincidental consequence of the choice of underlying
> physical representation. Now we have a much messier situation regarding
> representations, but we should not destroy the abstraction and force
> everyone to always get down into the bowels of the different representations.
>
> There will still be mechanisms for low-level coding if you have some
> compelling reason, or just don't want to rewrite something existing.
> But let's protect the option of dealing with characters with the same
> abstraction we have had in the past.

Yes, it was obviously a mistake for Modula 3 not to distinguish between two types for character and byte. And it's not the only language to have made that mistake. There are two different abstractions here, with different meanings, but they share one name and one implementation.
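
To make the difference between a byte index and a character index concrete, here is a minimal sketch, assuming a TEXT whose CHARs hold UTF-8 bytes. The module and procedure names (Utf8Index, IsContinuation, CharLength, ByteOffset) are hypothetical and are not part of m3core; only Text.Length, Text.GetChar, Text.FromChar, Fmt.Int and IO.Put are standard library calls. It shows that the two counts differ for the same string, and that mapping a character index to a byte offset costs a linear scan in this particular representation.

```modula3
MODULE Utf8Index EXPORTS Main;
(* Sketch only: assumes the CHARs of a TEXT hold UTF-8 bytes.
   IsContinuation, CharLength and ByteOffset are hypothetical names,
   not part of m3core. *)

IMPORT IO, Fmt, Text;

PROCEDURE IsContinuation (c: CHAR): BOOLEAN =
  (* UTF-8 continuation bytes have the bit pattern 10xxxxxx. *)
  BEGIN
    RETURN ORD (c) >= 16_80 AND ORD (c) <= 16_BF;
  END IsContinuation;

PROCEDURE CharLength (t: TEXT): CARDINAL =
  (* Every byte that is not a continuation byte starts one character. *)
  VAR n: CARDINAL := 0;
  BEGIN
    FOR i := 0 TO Text.Length (t) - 1 DO
      IF NOT IsContinuation (Text.GetChar (t, i)) THEN INC (n) END;
    END;
    RETURN n;
  END CharLength;

PROCEDURE ByteOffset (t: TEXT; charIndex: CARDINAL): CARDINAL =
  (* Byte offset at which character number charIndex starts (O(n) scan).
     Returns Text.Length (t) when charIndex is past the end. *)
  VAR seen: CARDINAL := 0;
  BEGIN
    FOR i := 0 TO Text.Length (t) - 1 DO
      IF NOT IsContinuation (Text.GetChar (t, i)) THEN
        IF seen = charIndex THEN RETURN i END;
        INC (seen);
      END;
    END;
    RETURN Text.Length (t);
  END ByteOffset;

VAR
  (* "s", then U+0161 encoded as the bytes 16_C5 16_A1, then "a" *)
  sample := "s" & Text.FromChar (VAL (16_C5, CHAR))
                & Text.FromChar (VAL (16_A1, CHAR)) & "a";
BEGIN
  IO.Put ("bytes: " & Fmt.Int (Text.Length (sample)) & "\n");
  IO.Put ("characters: " & Fmt.Int (CharLength (sample)) & "\n");
  IO.Put ("byte offset of character 2: "
            & Fmt.Int (ByteOffset (sample, 2)) & "\n");
END Utf8Index.
```

For the sample string the byte length is 4 while the character length is 3, and character 2 starts at byte 3. A representation that cached such offsets could answer faster, at the cost of extra storage.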
Frankly, I don't care which of the two retains the name CHAR. It's all the same to me whether
(a) characters are called WIDECHAR and bytes CHAR
or
(b) characters are called CHAR and bytes BYTE,
because either way programs are going to have to be changed to adapt to the new world. (a) is probably less disruptive to legacy programs that only ever need to deal with legacy ASCII files. (b) is probably conceptually cleaner.

What's important is that both mechanisms remain available for dealing with values of type TEXT.

The designers of Modula 3 have done an admirable job of providing a collection of abstractions that enable both conceptually clean and efficient implementations. Let's not mess it up by providing only a conceptually clean, inefficient interface.

-- hendrik

From dragisha at m3w.org Thu Jun 28 19:19:48 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Thu, 28 Jun 2012 19:19:48 +0200
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <20120628125103.GC2279@topoi.pooq.com>
References: <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org> <20120625203422.GA24287@topoi.pooq.com> <20120626160005.GA29355@topoi.pooq.com> <20120626185008.50E131A205B@async.async.caltech.edu> <03E8005D-CD75-4699-A703-518F219A6F09@m3w.org> <20120628125103.GC2279@topoi.pooq.com>
Message-ID:

My language (Serbian) is written with two alphabets. Before ISO-8859-2 we used ten (yes, 10) different encodings to represent our alphabet(s) with 8 bits. With ISO-8859-2 we got a solution for the Latin alphabet, but we had to use ISO-8859-5 for Cyrillic. One of our ten encodings (the national standard came late) covered both Latin and Cyrillic in 8 bits.

Back in 1991-2 I implemented a system for handling the above-mentioned ten encodings. After that experience, and after a decade or so of using/fighting ten encodings, you can trust me: even the notion of having a single encoding for all language needs is a lifesaver :).

That is where my oversensitivity to the idea of having two ways to interpret strings comes from. Two ways, just because we can? OK, we can use two, we can use ten, we can use fifty encodings!! But the sensible way is to use one, if possible. And it is possible! It is called UTF-8.

On Jun 28, 2012, at 2:51 PM, Hendrik Boom wrote:

> On Wed, Jun 27, 2012 at 01:14:22PM +0200, Dragiša Durić wrote:
>>
>> On Jun 27, 2012, at 12:19 PM, Jay K wrote:
>>
>>>> More and more is obvious how ideal structure would be: ARRAY OF CHAR, UTF8 encoded, using SRC M3 Text.Hash().
>>>
>>> I don't quite agree.
>>> There are two ideal approaches.
>>> 1)
>>> TEXT is like ARRAY OF CHAR and no values over 0xFF (or maybe even 0x7F)
>>> "WiDETEXT" is like ARRAY OF WIDECHAR, for 16bit or 32bit WIDECHAR
>>
>> So we can have two representations for single thing: variable holding some text. And representation depends on a question "do you need non-basic-english-characters"?
>
> I'm starting to discover that a lot of my English documents have
> non-ASCII characters in them. In particular, the separate open and close
> quotation marks around quoted speech take more than one byte in
> Unicode. True, in a starvation-level character set, they are both
> represented as " , but that's really not what they are.
>
> -- hendrik

From rcolebur at SCIRES.COM Fri Jun 29 01:35:29 2012
From: rcolebur at SCIRES.COM (Coleburn, Randy)
Date: Thu, 28 Jun 2012 19:35:29 -0400
Subject: [M3devel] EXT Re: UTF-8 TEXT
In-Reply-To: <4FEB5D09.2080601@lcwb.coop>
References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com> <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org> <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org> <0A373E4C-32CB-4947-A542-67F0910DBEC4@m3w.org> <20120627033000.GB28021@topoi.pooq.com> <4FEB5D09.2080601@lcwb.coop>
Message-ID:

...
> I feel very strongly that we should *not* take away the full generality of Text,
> especially efficient random access, to handle variable-length character
> encodings in strings.
> For these, let's make more friends of Wr and Rd, which
> already assume sequential access. For example, a filter pipe that sequentially
> reads a Text/Array/stream, applies a UTF-8 interpretation to its bytes, and
> delivers a stream of Unicode characters, in variables of type WIDECHAR.
>
> Text should preserve the abstraction that it's a string of characters,
> generalized as it already is in cm3, to have type WIDECHAR, so they can be any
> Unicode character. The internal representation should, usually, not be of concern.
...

I concur with Rodney. We need to hold true to the design tenets of the language and keep the full generality of Text with efficient random access, and add new variants of the Rd/Wr/etc. abstractions that deal with the various variable-length character encodings as sequential-access streams.

--Randy Coleburn

From dabenavidesd at yahoo.es Fri Jun 29 02:21:19 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Fri, 29 Jun 2012 01:21:19 +0100 (BST)
Subject: [M3devel] Windows, Unicode file names
In-Reply-To:
Message-ID: <1340929279.13051.YahooMailClassic@web29705.mail.ird.yahoo.com>

Hi all:
in fact CM had the idea of rewriting the Modula-3 language definition in terms of the UTF standard, but it never came out. Perhaps we will need to maintain two definitions, one SPwM3 and a newer CM-style one, and based on those standards make a front end that can write to both kinds of standard and make them interoperable.

One way of promoting CM3 could be to talk about a renewed Modula-3, JVM-enabled, etc., for system applications (like Win32, Unix), whereas DEC-SRC Modula-3 would be for research and development with a parallelized environment, like a research system for an open AAA compiler (I don't know many others writing parallel compilers) with ESC, Vesta, etc.

Thanks in advance

--- On Thu, 28/6/12, Dragiša Durić wrote:

From: Dragiša Durić
Subject: Re: [M3devel] Windows, Unicode file names
To: "Hendrik Boom"
CC: m3devel at elegosoft.com
Date: Thursday, 28 June 2012, 12:19

My language (Serbian) is written with two alphabets. Before ISO-8859-2 we used ten (yes, 10) different encodings to represent our alphabet(s) with 8 bits. With ISO-8859-2 we got a solution for the Latin alphabet, but we had to use ISO-8859-5 for Cyrillic. One of our ten encodings (the national standard came late) covered both Latin and Cyrillic in 8 bits.

Back in 1991-2 I implemented a system for handling the above-mentioned ten encodings. After that experience, and after a decade or so of using/fighting ten encodings, you can trust me: even the notion of having a single encoding for all language needs is a lifesaver :).

That is where my oversensitivity to the idea of having two ways to interpret strings comes from. Two ways, just because we can? OK, we can use two, we can use ten, we can use fifty encodings!! But the sensible way is to use one, if possible. And it is possible! It is called UTF-8.

On Jun 28, 2012, at 2:51 PM, Hendrik Boom wrote:

> On Wed, Jun 27, 2012 at 01:14:22PM +0200, Dragiša Durić wrote:
>>
>> On Jun 27, 2012, at 12:19 PM, Jay K wrote:
>>
>>>> More and more is obvious how ideal structure would be: ARRAY OF CHAR, UTF8 encoded, using SRC M3 Text.Hash().
>>>
>>> I don't quite agree.
>>> There are two ideal approaches.
>>> 1)
>>> TEXT is like ARRAY OF CHAR and no values over 0xFF (or maybe even 0x7F)
>>> "WiDETEXT" is like ARRAY OF WIDECHAR, for 16bit or 32bit WIDECHAR
>>
>> So we can have two representations for single thing: variable holding some text. And representation depends on a question "do you need non-basic-english-characters"?
>
> I'm starting to discover that a lot of my English documents have
> non-ASCII characters in them. In particular, the separate open and close
> quotation marks around quoted speech take more than one byte in
> Unicode.
> True, in a starvation-level character set, they are both
> represented as " , but that's really not what they are.
>
> -- hendrik

From dragisha at m3w.org Fri Jun 29 10:35:38 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Fri, 29 Jun 2012 10:35:38 +0200
Subject: [M3devel] Simple change to WIDECHAR type
Message-ID:

m3front/src/builtinTypes/WCharr.m3, line:

    T := EnumType.New (16_10000, elts);

to

    T := EnumType.New (16_100000, elts);

Will this break things? Any other assumptions anywhere?

From dabenavidesd at yahoo.es Fri Jun 29 17:47:50 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Fri, 29 Jun 2012 16:47:50 +0100 (BST)
Subject: [M3devel] Simple change to WIDECHAR type
In-Reply-To:
Message-ID: <1340984870.74508.YahooMailClassic@web29702.mail.ird.yahoo.com>

Hi all:
more important than a maximum-length char type, we need a minimum optimal char table size (and thus one in word size, so we can optimize). Use WIDECHAR and move it towards modular arithmetic of CHAR {0..255}, so you can select at RT a common base to represent your character system I/O. This happens in micro digital signal processors, if you care about the output of such systems in practice. A character terminal is also important to note, for instance as a way of evaluating the speed of a signal processor design. DEC had a lot of devices like that, so having a common interface to those systems is useful. Many mainframes are still handled mostly through that device, so I guess it is important for such a system to support most types of encodings:
http://vt100.net/docs/vt510-rm/chapter8
http://en.wikipedia.org/wiki/ISO/IEC_8859-5

P Zollo wrote an emulator for that device, so maybe we can test the speed of character streaming with that.

Thanks in advance

--- On Fri, 29/6/12, Dragiša Durić wrote:

From: Dragiša Durić
Subject: [M3devel] Simple change to WIDECHAR type
To: "m3devel"
Date: Friday, 29 June 2012, 03:35

m3front/src/builtinTypes/WCharr.m3, line:

    T := EnumType.New (16_10000, elts);

to

    T := EnumType.New (16_100000, elts);

Will this break things? Any other assumptions anywhere?

From dragisha at m3w.org Fri Jun 29 17:52:55 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Fri, 29 Jun 2012 17:52:55 +0200
Subject: [M3devel] Simple change to WIDECHAR type
In-Reply-To:
References:
Message-ID: <4AD84314-8E53-4A50-A51B-A0DF8DC910BC@m3w.org>

That, or UTF-16 encoding on top of current WIDECHAR.

On Jun 29, 2012, at 3:50 PM, Antony Hosking wrote:

> That will change WIDECHAR from a value consuming 16-bits of memory into a value consuming 32-bits of memory. In other words, all TEXT containing WIDECHAR will double in size.
>
> On Jun 29, 2012, at 4:35 AM, Dragiša Durić wrote:
>
>> m3front/src/builtinTypes/WCharr.m3, line:
>>
>> T := EnumType.New (16_10000, elts);
>>
>> to
>>
>> T := EnumType.New (16_100000, elts);
>>
>> Will this break things? Any other assumptions anywhere?
>>
>

From dabenavidesd at yahoo.es Fri Jun 29 18:08:57 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Fri, 29 Jun 2012 17:08:57 +0100 (BST)
Subject: [M3devel] Simple change to WIDECHAR type
In-Reply-To: <4AD84314-8E53-4A50-A51B-A0DF8DC910BC@m3w.org>
Message-ID: <1340986137.5745.YahooMailClassic@web29704.mail.ird.yahoo.com>

Hi all:
I repeat, we need performance to be well understood as the issue here. Who cares about imposed standards? ISO standards are the real standards; there is no point complaining about that, as DEC made them de facto on its terminals.

Thanks in advance

--- On Fri, 29/6/12, Dragiša Durić wrote:

From: Dragiša Durić
Subject: Re: [M3devel] Simple change to WIDECHAR type
To: "Antony Hosking"
CC: "m3devel"
Date: Friday, 29 June 2012, 10:52

That, or UTF-16 encoding on top of current WIDECHAR.

On Jun 29, 2012, at 3:50 PM, Antony Hosking wrote:

> That will change WIDECHAR from a value consuming 16-bits of memory into a value consuming 32-bits of memory. In other words, all TEXT containing WIDECHAR will double in size.
>
> On Jun 29, 2012, at 4:35 AM, Dragiša Durić wrote:
>
>> m3front/src/builtinTypes/WCharr.m3, line:
>>
>> T := EnumType.New (16_10000, elts);
>>
>> to
>>
>> T := EnumType.New (16_100000, elts);
>>
>> Will this break things? Any other assumptions anywhere?
>>
>

From dragisha at m3w.org Sat Jun 30 09:33:00 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Sat, 30 Jun 2012 09:33:00 +0200
Subject: [M3devel] Simple change to WIDECHAR type
In-Reply-To: <4AD84314-8E53-4A50-A51B-A0DF8DC910BC@m3w.org>
References: <4AD84314-8E53-4A50-A51B-A0DF8DC910BC@m3w.org>
Message-ID: <031FD9D2-8C62-48B3-8E2F-1122B7C1ED96@m3w.org>

Current GetChar/SetChars and GetWideChar/SetWideChars are not character-level access methods, in terms of Unicode. They are "byte-level", fixed-width data accesses. Reason: Both CHAR (cardinality 2^8) and WIDECHAR (cardinality 2^16) based strings must use one or more characters to represent the whole of Unicode (cardinality 2^20). If we must encode in any case, then we don't have any benefit from WIDECHAR (as it is implemented/understood now) at all!

To represent Unicode with either CHAR or WIDECHAR based TEXTs, we must use either UTF-8 or UTF-16. Both are one-to-multibyte encodings, encoding one Unicode character to either 1-4 CHARs or 1-2 WIDECHARs.

What exactly is the meaning (at Modula-3's usual levels of abstraction) of character-level access? Do we need whatever bit pattern physically happens to be at some location in our data's representation?
Or maybe we need the numerical representation of an actual Unicode character value, visually distinguishable in written representation? One from that set of 2^20 elements?

What is the meaning of Text.Sub() based on byte-level access operations, where our resulting TEXT's first character is in fact a prefix of some Unicode character's encoding? And/or where our last character is an invalid/incomplete suffix of some encoded character?

Since when are fast and efficient operations doing something we don't need at all our priority?

We are getting nothing at all with WIDECHAR. No. Single. Thing. WIDECHAR does not make us closer to Unicode at all. WIDECHAR, together with CHAR (in the context of our current TEXT), makes two almost-solutions to the Unicode problem, and the existence of the WIDECHAR scalar type makes us a bit closer to the Unicode almost-solution of the C world and nothing else.

Currently, neither GetChar nor GetWideChar can get "a character at nth position". Reason: No character scalar type to keep any Unicode character.

Solution:
======

* Redefine WIDECHAR to hold at least 20 bit values, or create UNICHAR or GLYPH (and leave WIDECHAR as it is for vertical compatibility) so we can hold unencoded Unicode characters in scalar values in our Modula-3 programs, while preserving their properties.
* Implement properties, relations and methods defined for Unicode. With ASCII, numeric order is everything. With Unicode - it is not. This is probably a very big project but we can start somewhere, and let interested parties build on it. Dirk Muysers did work in this regard already.
* Whoever thinks we don't need this and our "tradition" and "legacy" are important, please read this: http://unicode.org/standard/WhatIsUnicode.html .

dd

On Jun 29, 2012, at 5:52 PM, Dragiša Durić wrote:

> That, or UTF-16 encoding on top of current WIDECHAR.
>
> On Jun 29, 2012, at 3:50 PM, Antony Hosking wrote:
>
>> That will change WIDECHAR from a value consuming 16-bits of memory into a value consuming 32-bits of memory.
>> In other words, all TEXT containing WIDECHAR will double in size.
>>
>> On Jun 29, 2012, at 4:35 AM, Dragiša Durić wrote:
>>
>>> m3front/src/builtinTypes/WCharr.m3, line:
>>>
>>> T := EnumType.New (16_10000, elts);
>>>
>>> to
>>>
>>> T := EnumType.New (16_100000, elts);
>>>
>>> Will this break things? Any other assumptions anywhere?
>>>
>>
>

From dragisha at m3w.org Sat Jun 30 10:56:27 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Sat, 30 Jun 2012 10:56:27 +0200
Subject: [M3devel] Some earlier work
Message-ID: <31C25C13-66B2-4637-9D33-C2E5E80AB9DA@m3w.org>

This is how we implemented UTF8 strings over current TEXTs. Current implementation is UNSAFE and uses glibc utf8 methods. Nothing too complicated and nothing we can't implement in Modula-3/portable C.

=====
INTERFACE UText;

IMPORT Word;  (* needed for the result type of Hash *)

TYPE
  T = TEXT;
  Char = CARDINAL;

PROCEDURE Cat(t, u: T): T;
PROCEDURE Equal(t, u: T): BOOLEAN;
PROCEDURE GetChar(t: T; i: CARDINAL): Char;
PROCEDURE ByteSize(t: T): CARDINAL;
PROCEDURE Length(t: T): CARDINAL;
PROCEDURE Empty(t: T): BOOLEAN;
PROCEDURE Sub(t: T; start: CARDINAL; length: CARDINAL := LAST(CARDINAL)): T;
PROCEDURE SetChars(VAR a: ARRAY OF Char; t: T);
PROCEDURE FromChar(ch: Char): T;
PROCEDURE FromChars(READONLY a: ARRAY OF Char): T;
PROCEDURE Hash(t: T): Word.T;
PROCEDURE Compare(t1, t2: T): [-1..1];
PROCEDURE FindChar(t: T; ch: Char; start: CARDINAL := 0): INTEGER;
PROCEDURE FindCharR(t: T; ch: Char; start: CARDINAL := LAST(INTEGER)): INTEGER;

END UText.

From hendrik at topoi.pooq.com Sat Jun 30 16:29:24 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Sat, 30 Jun 2012 10:29:24 -0400
Subject: [M3devel] Some earlier work
In-Reply-To: <31C25C13-66B2-4637-9D33-C2E5E80AB9DA@m3w.org>
References: <31C25C13-66B2-4637-9D33-C2E5E80AB9DA@m3w.org>
Message-ID: <20120630142924.GB12402@topoi.pooq.com>

On Sat, Jun 30, 2012 at 10:56:27AM +0200, Dragiša Durić
wrote:
> This is how we implemented

Any chance you could show us the implementation and not just the INTERFACE?

-- hendrik

From dragisha at m3w.org Sat Jun 30 16:39:16 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Sat, 30 Jun 2012 16:39:16 +0200
Subject: [M3devel] Some earlier work
In-Reply-To: <20120630142924.GB12402@topoi.pooq.com>
References: <31C25C13-66B2-4637-9D33-C2E5E80AB9DA@m3w.org> <20120630142924.GB12402@topoi.pooq.com>
Message-ID: <0C45D4FF-8279-404F-A68E-35656D261959@m3w.org>

Of course.

http://dl.dropbox.com/u/60554338/UText.m3

On Jun 30, 2012, at 4:29 PM, Hendrik Boom wrote:

> On Sat, Jun 30, 2012 at 10:56:27AM +0200, Dragiša Durić wrote:
>> This is how we implemented
>
> Any chance you could show us the implementation and not just the INTERFACE?
>
> -- hendrik

From jay.krell at cornell.edu Sat Jun 30 18:52:54 2012
From: jay.krell at cornell.edu (Jay K)
Date: Sat, 30 Jun 2012 16:52:54 +0000
Subject: [M3devel] Simple change to WIDECHAR type
In-Reply-To: <031FD9D2-8C62-48B3-8E2F-1122B7C1ED96@m3w.org>
References: , , <4AD84314-8E53-4A50-A51B-A0DF8DC910BC@m3w.org>, <031FD9D2-8C62-48B3-8E2F-1122B7C1ED96@m3w.org>
Message-ID:

I don't fully buy this. 16bit WIDECHAR is very useful on Windows.

It can be used directly with a vast vast vast vast number of functions.

32bit char would require conversion to and from all the time.

As well, there are no codepages when using 16-bit characters. 8-bit characters are interpreted in a way on/by Windows that varies per OS and per user and which isn't stored with the string. I realize that Modula-3 code doesn't necessarily use the same interpretation.

"No codepages" is the advantage of utf8: pick one "code page", if "code page" means "how to encode/decode more than 8 bits, 8 bits at a time". Hope all the data is 7 bit clean, so it doesn't matter. Otherwise convert to and from a lot.

I do understand that current Unicode requires 20 bits, and that a 32bit character type is justifiable.
As I understand, this was debated when Unicode was first designed but rejected as too large.

 - Jay

----------------------------------------
> From: dragisha at m3w.org
> Date: Sat, 30 Jun 2012 09:33:00 +0200
> To: antony.hosking at gmail.com
> CC: m3devel at elegosoft.com
> Subject: Re: [M3devel] Simple change to WIDECHAR type
>
> Current GetChar/SetChars and GetWideChar/SetWideChars are not character-level access methods, in terms of Unicode. They are "byte-level", fixed-width data accesses. Reason: Both CHAR (cardinality 2^8) and WIDECHAR (cardinality 2^16) based strings must use one or more characters to represent the whole of Unicode (cardinality 2^20). If we must encode in any case, then we don't have any benefit from WIDECHAR (as it is implemented/understood now) at all!
>
> To represent Unicode with either CHAR or WIDECHAR based TEXTs, we must use either UTF-8 or UTF-16. Both are one-to-multibyte encodings, encoding one Unicode character to either 1-4 CHARs or 1-2 WIDECHARs.
>
> What exactly is the meaning (at Modula-3's usual levels of abstraction) of character-level access? Do we need whatever bit pattern physically happens to be at some location in our data's representation? Or maybe we need the numerical representation of an actual Unicode character value, visually distinguishable in written representation? One from that set of 2^20 elements?
>
> What is the meaning of Text.Sub() based on byte-level access operations, where our resulting TEXT's first character is in fact a prefix of some Unicode character's encoding? And/or where our last character is an invalid/incomplete suffix of some encoded character?
>
> Since when are fast and efficient operations doing something we don't need at all our priority?
>
> We are getting nothing at all with WIDECHAR. No. Single. Thing. WIDECHAR does not make us closer to Unicode at all.
> WIDECHAR, together with CHAR (in the context of our current TEXT), makes two almost-solutions to the Unicode problem, and the existence of the WIDECHAR scalar type makes us a bit closer to the Unicode almost-solution of the C world and nothing else.
>
> Currently, neither GetChar nor GetWideChar can get "a character at nth position". Reason: No character scalar type to keep any Unicode character.
>
> Solution:
> ======
>
> * Redefine WIDECHAR to hold at least 20 bit values, or create UNICHAR or GLYPH (and leave WIDECHAR as it is for vertical compatibility) so we can hold unencoded Unicode characters in scalar values in our Modula-3 programs, while preserving their properties.
> * Implement properties, relations and methods defined for Unicode. With ASCII, numeric order is everything. With Unicode - it is not. This is probably a very big project but we can start somewhere, and let interested parties build on it. Dirk Muysers did work in this regard already.
> * Whoever thinks we don't need this and our "tradition" and "legacy" are important, please read this: http://unicode.org/standard/WhatIsUnicode.html .
>
> dd
>
> On Jun 29, 2012, at 5:52 PM, Dragiša Durić wrote:
>
> > That, or UTF-16 encoding on top of current WIDECHAR.
> >
> > On Jun 29, 2012, at 3:50 PM, Antony Hosking wrote:
> >
> >> That will change WIDECHAR from a value consuming 16-bits of memory into a value consuming 32-bits of memory. In other words, all TEXT containing WIDECHAR will double in size.
> >>
> >> On Jun 29, 2012, at 4:35 AM, Dragiša Durić wrote:
> >>
> >>> m3front/src/builtinTypes/WCharr.m3, line:
> >>>
> >>> T := EnumType.New (16_10000, elts);
> >>>
> >>> to
> >>>
> >>> T := EnumType.New (16_100000, elts);
> >>>
> >>> Will this break things? Any other assumptions anywhere?
> >>>
> >>
> >
>

From dragisha at m3w.org Sat Jun 30 19:17:23 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Sat, 30 Jun 2012 19:17:23 +0200
Subject: [M3devel] Simple change to WIDECHAR type
In-Reply-To:
References: , , <4AD84314-8E53-4A50-A51B-A0DF8DC910BC@m3w.org>, <031FD9D2-8C62-48B3-8E2F-1122B7C1ED96@m3w.org>
Message-ID: <7E7703E6-48BD-4DCE-8336-97D41249EDCF@m3w.org>

And since when has usefulness on Windows defined anything in Modula-3?

To use the vast vast vast number of Windows functions on a Modula-3 TEXT you must call at least one Modula-3 function on your argument to make it passable to a Windows API function. Making another single-call Modula-3 function that maps UTF-8 to an argument acceptable to the Windows API is a five-minute task. So you are in fact gaining nothing with WIDECHAR that you can't have with UTF8 packed in Text8.T.

32bit characters are what we have on non-Windows. And we must convert all the time if we are to use Modula-3 WIDECHAR based TEXT with non-Windows wchar strings.

Are you arguing Windows is more important than all other platforms we support or what?

On Jun 30, 2012, at 6:52 PM, Jay K wrote:

>
> I don't fully buy this. 16bit WIDECHAR is very useful on Windows.
>
> It can be used directly with a vast vast vast vast number of functions.
>
> 32bit char would require conversion to and from all the time.

From mika at async.caltech.edu Sat Jun 30 19:24:01 2012
From: mika at async.caltech.edu (Mika Nystrom)
Date: Sat, 30 Jun 2012 10:24:01 -0700
Subject: [M3devel] Simple change to WIDECHAR type
In-Reply-To: <031FD9D2-8C62-48B3-8E2F-1122B7C1ED96@m3w.org>
References: <4AD84314-8E53-4A50-A51B-A0DF8DC910BC@m3w.org> <031FD9D2-8C62-48B3-8E2F-1122B7C1ED96@m3w.org>
Message-ID: <20120630172401.DFE8E1A207C@async.async.caltech.edu>

Dragiša Durić writes:
...
>
>Solution:
>======
>
>* Redefine WIDECHAR to hold at least 20 bit values, or create UNICHAR or GLYPH (and leave WIDECHAR as it is for vertical compatibility) so we can hold unencoded Unicode characters in scalar values in our Modula-3 programs, while preserving their properties.
>* Implement properties, relations and methods defined for Unicode. With ASCII, numeric order is everything. With Unicode - it is not. This is probably very big project but we can start somewhere, and let interested parties build on it. Dirk Muysers did work in this regard already.
>* Whoever thinks we don't need this and our "tradition" and "legacy" are important, please read this: http://unicode.org/standard/WhatIsUnicode.html .
>
>dd

Given what you have said about the near-uselessness of WIDECHAR, does anything actually use it much? What breaks if it is redefined to be the same as, say, INTEGER? (Or Word.T)

CHAR is quite useful for processing 7-bit ASCII, and it would be lovely if that could go back to using the SRC data structures. For people who do stuff like write VLSI design tools... (probably many other large-scale applications would like it too).

    Mika

From dragisha at m3w.org Sat Jun 30 20:12:45 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Sat, 30 Jun 2012 20:12:45 +0200
Subject: [M3devel] Simple change to WIDECHAR type
In-Reply-To: <20120630172401.DFE8E1A207C@async.async.caltech.edu>
References: <4AD84314-8E53-4A50-A51B-A0DF8DC910BC@m3w.org> <031FD9D2-8C62-48B3-8E2F-1122B7C1ED96@m3w.org> <20120630172401.DFE8E1A207C@async.async.caltech.edu>
Message-ID: <7BCB3BB7-D2F2-470B-8E70-2A7FF274E0FC@m3w.org>

I don't see where WIDECHAR can be useful, such as it is. Esp. since TEXT in cm3 is non-flat structure, and it is almost always additional processing to prepare it even for a Windows API argument. Additional processing from dendriform cm3 TEXT is in no way more efficient if some nodes are already just-like-Windows-texts.
Also, cm3 TEXT is overengineered - I hope I don't have to argue this. Everything is second to efficient concat operation.

IMO, we must leave TEXT to be simple and CHAR based. Just like you need for your VLSI tools. And use something like UText.i3/m3 to use such objects to represent Unicode (UTF-8 encoded) any-language strings. And use WText.* for communication with wchar APIs like Windows'.

BTW, WIDECHAR literals are not sufficiently defined in cm3. There is a hole the size of the Moon. What is the input encoding for source files containing WIDECHAR literals? For example:

CONST Me = W"Dragiša Durić";

Jay, please explain this to me. My editor creates UTF8 files, for example. What does cm3 expect after W" ?

From dabenavidesd at yahoo.es Sun Jun 3 18:51:51 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Sun, 3 Jun 2012 17:51:51 +0100 (BST)
Subject: [M3devel] Renewed interest in Modula-3 in HP Labs
In-Reply-To: <1338470019.63945.YahooMailClassic@web29703.mail.ird.yahoo.com>
Message-ID: <1338742311.65879.YahooMailClassic@web29703.mail.ird.yahoo.com>

Hi all:
looking to support a C backend, we would need to know how much we can optimize the energy consumption of any backend CG, or how long we can use M3CG in total compilation time (the result could be that we need to distribute a precompiled form, see p. 7:
http://www.fdi.ucm.es/profesor/ricardo/ei2/crisis.pdf
).
This would be a rather good measure of the need for an object code backend or not (like Gcc, or a JVM one, or a translation-based one, as the first Pascal implementations were: Pascal manually machine coded).
For instance HP had HP3000 [1] with several measurements, as their "u-code" interface was not open but proprietary, so you couldn't get their compiler for A-L/SPL (contrary to Pascal).
I'm sure they have worked on this problem as well as for newer machines (like for fpga reposition programs for VAXen and Alpha) but how much they will emulate in SW I don't know. I write that because VAX is essentially translated to Alpha via M3CG via HW and equally in SW. I know they are producing VAX in FPGA, but don't know about Alphas at all.
Thanks in advance

[1] R. P. Blake, "Exploring a Stack Architecture," Computer, vol. 10, no. 5, pp. 30-39, May 1977.

--- On Thu, 31/5/12, Daniel Alejandro Benavides D. wrote:

From: Daniel Alejandro Benavides D.
Subject: [M3devel] Renewed interest in Modula-3 in HP Labs
To: m3devel at elegosoft.com
Date: Thursday, 31 May 2012, 08:13

Hi all:
I see there are some products coming from HP, and others, but especially HP, claiming that they provide lower consumption in data center power management.
As I see they are working in Tycoon as a Data processor (created in Germany and Europe). As Greg Nelson wrote code for profiling the Alphas and Itanium, perhaps they are interested in work on ESC, but nevertheless Modula-3 and family languages (Quest) as Tycoon is based on them.
If I may say so, Quest was defined by its simple denotational semantics, which is the natural deduction system of Baby Modula-3 (though it lacks more than that, but you can process the language of it through the former)
Do we want to confirm that, if anyone interested in the TML - TVM please write me for any other questions or comments
Thanks in advance

http://www.eetimes.com/electronics-news/4373994/HP-cuts-data-center-power-in-lab-tests?cid=NL_EETimesDaily

http://tycoon.hpl.hp.com/~tycoon/doc/users_manual_en/ch-intro.html

http://wwwmatthes.in.tum.de/file/Publications/1992/Math92/paper.pdf

From hendrik at topoi.pooq.com Sun Jun 3 23:18:47 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Sun, 3 Jun 2012 17:18:47 -0400
Subject: [M3devel] Renewed interest in Modula-3 in HP Labs
In-Reply-To: <1338742311.65879.YahooMailClassic@web29703.mail.ird.yahoo.com>
References: <1338470019.63945.YahooMailClassic@web29703.mail.ird.yahoo.com> <1338742311.65879.YahooMailClassic@web29703.mail.ird.yahoo.com>
Message-ID: <20120603211847.GA17923@topoi.pooq.com>

On Sun, Jun 03, 2012 at 05:51:51PM +0100, Daniel Alejandro Benavides D. wrote:
> semantics, which is the natural deduction system of Baby Modula-3

You keep mentioning Baby Modula 3, but I have no idea what it is. Can you explain and provide links?

-- hendrik

From dabenavidesd at yahoo.es Sun Jun 3 23:48:42 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Sun, 3 Jun 2012 22:48:42 +0100 (BST)
Subject: [M3devel] Renewed interest in Modula-3 in HP Labs
In-Reply-To: <20120603211847.GA17923@topoi.pooq.com>
Message-ID: <1338760122.84788.YahooMailClassic@web29705.mail.ird.yahoo.com>

Hi all:
for sure yes, it's a first-order prototype-oriented functional programming language for writing programming languages' type systems (in Spanish-native tongue countries like Abadi's, the most common games or toys are baby dolls, if you care; hence its name, if I may say so).
Basically the language itself is not dissimilar from Modula-3 in its object-oriented part. It has a type system in lambda calculus, written for its meta-languages as well (e.g. Modula-3). Its denotational semantics are expressed in a natural deduction system logic.
Basically it was constructed to explain object oriented languages, though it wasn't written specially for that, but for type system calculus construction (you could say a kind of IBM's Axiom for computer science type theoreticians, if I may say so).
No other system besides DEC ones had ever played with it (its functional language, although simple, is not easily executable, so Cardelli and others decided to use a different calculus for their joint book "A Theory of Objects"). But at the very core issue of unification it led the work on type systems for its times.
Thanks in advance

From dragisha at m3w.org Wed Jun 6 09:57:40 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Wed, 6 Jun 2012 09:57:40 +0200
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To: <20120606064732.2C9242474003@birch.elegosoft.com>
References: <20120606064732.2C9242474003@birch.elegosoft.com>
Message-ID: <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org>

Jay,

What benefit from 4.6 backend do we expect for cm3 if most of optimizer is "optimized out" of cm3cg?

If our "trees" are reason why you must switch optimizations off, is it not more logical to fix our "trees"? One by one, if need be. A look into gm2 (for example), a fix in our backend. That way, future porting to most recent gcc's will be much easier?

TIA,
dd

On Jun 6, 2012, at 8:47 AM, Jay Krell wrote:

> Log message:
> remove more of the optimizer

From jay.krell at cornell.edu Wed Jun 6 10:10:06 2012
From: jay.krell at cornell.edu (Jay K)
Date: Wed, 6 Jun 2012 08:10:06 +0000
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To: <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org>
References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org>
Message-ID:

I have very mixed feelings about the optimizer.
1) I'm not certain it is worth the time it takes to run.
2) Fixing our trees isn't necessarily trivial.
   The most expedient thing is neither to fix the trees, nor remove the optimizer code, but merely to set the optimizer to be off in parse.c.
3) gcc is huge, I'd kind of like to see if I can get actually building it can be made much faster/smaller
4) Probably what really got me started here is the gmp/mpfr/mpc dependency.
5) The "best" thing isn't necessarily to use gcc at all.
6) I'll maybe move up to 4.7 soon.
6b) and maybe not spend so much time on it? Maybe just ln -s in gmp/mpfr/mpc and port only the needed changes?
   Maybe even not using g++ but the hybrid gcc/g++ I use for gcc-apple (4.2)
7) Do folks out there really use the Modula-3/gcc optimizer, and notice it produces code that runs much faster?

 - Jay

From jay.krell at cornell.edu Wed Jun 6 10:15:32 2012
From: jay.krell at cornell.edu (Jay K)
Date: Wed, 6 Jun 2012 08:15:32 +0000
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To:
References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org>,
Message-ID:

> > What benefit from 4.6 backend do we expect for cm3 if most of optimizer

ps: just the general goodness of staying current. Even if a hacked up current.
4.7.0 is out already..

 - Jay

From dragisha at m3w.org Wed Jun 6 10:51:33 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Wed, 6 Jun 2012 10:51:33 +0200
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To:
References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org>
Message-ID:

I am using it, and I need it.

Does it run better/faster? I didn't test, but is it something to even ask, these days, architectures, ...?

Only if you turned everything off in 5.8.6 and later, as you're doing it now, then probably my "-O2" default is of no benefit at all :).

Generally, our "pitch" to "sell" super-modern-ultra-blast-mega-fast-superlative-OO and everything else you only dreamed about? And add "no CPU optimizations"? Imagine that.

On Jun 6, 2012, at 10:10 AM, Jay K wrote:

> 7) Do folks out there really use the Modula-3/gcc optimizer, and notice it produces code that runs much faster?

From jay.krell at cornell.edu Wed Jun 6 11:38:18 2012
From: jay.krell at cornell.edu (Jay K)
Date: Wed, 6 Jun 2012 09:38:18 +0000
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To:
References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org> ,
Message-ID:

5.8.6 does allow many optimizations to occur.
We turn off a very small number directly.
Functions that call setjmp have optimizations inhibited by declaring all locals volatile.
We don't give the compiler good type information, and we take the address of stuff more than necessary, by generating very low level code.
Where you have e.g.

MODULE Foo;
TYPE Point = RECORD x,y:INTEGER END;
PROCEDURE GetY(VAR pt:Point):INTEGER = BEGIN RETURN pt.y; END GetY;

We generate the equivalent of:

#include <stddef.h>  /* for ptrdiff_t */
typedef ptrdiff_t INTEGER;
typedef char* ADDRESS;
/* the record type is lost; y is found by byte offset from the address */
INTEGER Foo_GetY(ADDRESS pt) { return *(INTEGER*)(pt + sizeof(INTEGER)); }

Maybe I'll wrap up 4.6, not enable it, and move on to 4.7..

 - Jay

From jay.krell at cornell.edu Wed Jun 6 11:42:52 2012
From: jay.krell at cornell.edu (Jay K)
Date: Wed, 6 Jun 2012 09:42:52 +0000
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To:
References: <20120606064732.2C9242474003@birch.elegosoft.com>, , <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org>, , , ,
Message-ID:

 > Functions that call setjmp

I meant -- functions with TRY/EXCEPT or TRY/FINALLY. :)

 - Jay

From dragisha at m3w.org Wed Jun 6 12:17:54 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Wed, 6 Jun 2012 12:17:54 +0200
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To:
References: <20120606064732.2C9242474003@birch.elegosoft.com>, , <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org>, , , ,
Message-ID:

I know that much about generated code :).

"Good" thing is - not many things changed in *m3 backend since I ported pm3 to LINUX_ALPHA :)

From dabenavidesd at yahoo.es Wed Jun 6 16:17:23 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Wed, 6 Jun 2012 15:17:23 +0100 (BST)
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To:
Message-ID: <1338992243.7847.YahooMailClassic@web29704.mail.ird.yahoo.com>

Hi all:
I noticed originally factored code is better, and if it's dead then that's optimization.
I don't know too much gcc or gdb, but factoring to match open64 (c++) might be better.
About Alphas, I know that DEC Firefly was commercialized as SMP VS3520/40 and the unreleased V3820/40; given that a DB vendor ported products to it, shouldn't we use their backends to a DB machine?
Besides that I think that developing a product for that end is what HP is doing:
http://www.zdnetasia.com/hp-aiming-for-data-protection-battleground-62305019.htm?src=newsletter

That said, Alphas wouldn't use gcc but their own backend directed optimizer, like for their DEClanguages internal products.
Thanks in advance

From mika at async.caltech.edu Wed Jun 6 18:18:08 2012
From: mika at async.caltech.edu (Mika Nystrom)
Date: Wed, 06 Jun 2012 09:18:08 -0700
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To:
References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org>
Message-ID: <20120606161808.7F5EA1A205B@async.async.caltech.edu>

Jay K writes:
> ...
>7) Do folks out there really use the Modula-3/gcc optimizer, and notice it produces code that runs much faster?

If we are talking about turning on optimizations in the m3makefile, then the answer is: Yes! At least with CM3 it makes a huge difference in runtime. Without the optimizer CM3-produced code runs far slower than PM3-produced code (I've seen 3X I think.) With it, CM3 can sometimes keep up. Unless you use a lot of TYPECASE or other constructs that have a much less efficient implementation in the CM3 libraries than in the PM3 libraries.

    Mika

From dabenavidesd at yahoo.es Wed Jun 6 20:50:59 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Wed, 6 Jun 2012 19:50:59 +0100 (BST)
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To: <20120606161808.7F5EA1A205B@async.async.caltech.edu>
Message-ID: <1339008659.61806.YahooMailClassic@web29705.mail.ird.yahoo.com>

Hi all:
this is very bad news, sounds like we had an old RT. I wonder how parallelized was DEC-SRC Vulcan or alike environments.
Thanks in advance

From hendrik at topoi.pooq.com Thu Jun 7 02:06:30 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Wed, 6 Jun 2012 20:06:30 -0400
Subject: [M3devel] ran out of space in /tmp while building .deb
Message-ID: <20120607000630.GA4233@topoi.pooq.com>

While trying to build a deb for modula 3 on my laptop (a wheezy 32-bit intel machine) /tmp got full and the build aborted.

Obviously, I should place /tmp elsewhere -- except that there's no entry in my /etc/fstab telling it where the tmpfs should be mounted. If I could just get it not to mount anything on /tmp things should be fine. Apparently, though, the kernel just knows better, and I'm stuck with a small /tmp.

Is there any way to tell make-dist.py that it's supposed to put its temporary files somewhere other than /tmp?

-- hendrik

From dragisha at m3w.org Thu Jun 7 03:02:19 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Thu, 7 Jun 2012 03:02:19 +0200
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To: <20120607011634.468b6bbf@wenus.next.com.pl>
References: <20120606064732.2C9242474003@birch.elegosoft.com> <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org> <20120607011634.468b6bbf@wenus.next.com.pl>
Message-ID: <741029B0-331E-4E10-9886-86A78B0ED3CC@m3w.org>

Try ALPHA_LINUX, maybe ask Jay first :)

On Jun 7, 2012, at 1:16 AM, Dariusz Knociński wrote:

> On 2012-06-06, at 12:17:54
> Dragiša Durić wrote:
>
>> I know that much about generated code :).
>>
>> "Good" thing is - not many things changed in *m3 backend since I ported pm3
>> to LINUX_ALPHA :)
>>
> Let me ask a stupid question. Is cm3 working on LINUX_ALPHA? I have one ES40
> working server with Gentoo Linux.
>
> Best Regards
> Dariusz Knociński.
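
Jay's remark earlier in this thread, that locals are declared volatile in functions with TRY/EXCEPT or TRY/FINALLY, follows from a rule of C's setjmp/longjmp: a non-volatile automatic variable that is modified after setjmp has an indeterminate value once longjmp returns control to that setjmp. The sketch below is a hand-written illustration of that rule, not output of cm3cg; the names `handler`, `raise_error` and `count_then_fail` are invented for the example.

```c
#include <setjmp.h>

/* Hypothetical stand-in for an exception frame. */
static jmp_buf handler;

/* Plays the role of RAISE: jumps back to the active handler. */
static void raise_error(void) {
    longjmp(handler, 1);
}

/* Counts up to limit inside a "TRY" block, then raises.
   Returns the value of count seen in the "EXCEPT" part. */
int count_then_fail(int limit) {
    /* volatile: count changes between setjmp and longjmp, so without
       the qualifier its value after longjmp would be indeterminate. */
    volatile int count = 0;

    if (setjmp(handler) == 0) {        /* TRY */
        for (;;) {
            if (count == limit) {
                raise_error();
            }
            count = count + 1;
        }
    }
    /* EXCEPT: reached through longjmp */
    return count;
}
```

The qualifier forces every read and write of `count` to go through memory, which is what makes the value reliable after the jump, and also what keeps the optimizer from holding such locals in registers.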

From jay.krell at cornell.edu Thu Jun 7 03:19:20 2012
From: jay.krell at cornell.edu (Jay K)
Date: Thu, 7 Jun 2012 01:19:20 +0000
Subject: [M3devel] ran out of space in /tmp while building .deb
In-Reply-To: <20120607000630.GA4233@topoi.pooq.com>
References: <20120607000630.GA4233@topoi.pooq.com>
Message-ID:

Use the source. Change it if needed.

 - Jay

From jay.krell at cornell.edu Thu Jun 7 03:28:15 2012
From: jay.krell at cornell.edu (Jay K)
Date: Thu, 7 Jun 2012 01:28:15 +0000
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To: <741029B0-331E-4E10-9886-86A78B0ED3CC@m3w.org>
References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org>, , , , , , <20120607011634.468b6bbf@wenus.next.com.pl>, <741029B0-331E-4E10-9886-86A78B0ED3CC@m3w.org>
Message-ID:

 > Is cm3 working on LINUX_ALPHA? I have one ES40 working server with Gentoo Linux

I don't think it does yet, but give me ssh access and I can most likely make it work pretty quickly.
There is very very very little to porting these days.
The main thing is finding the jmpbuf size and adding the target to various
tables, describing it as little or big endian, 32-bit or 64-bit, etc. Even
that is often automatic: if the name starts with "alpha_" or contains "64",
it is assumed to be 64-bit. If it contains "alpha", it is probably assumed
little endian. If it contains "_linux", then it is assumed to be Linux, etc.
For the jmpbuf size we can just assume something big like 1k (that is
tremendous overkill). The jmpbuf size should/will soon be eliminated as a
factor in porting anyway.

And then you just need to create a config file ALPHA_LINUX that includes
"Alpha64.common" and "Linux.common" or such.

Does ALPHA_LINUX have a 32-bit mode/ABI? Or is it all 64-bit all the time?
I.e., what does this do:

  echo > foo.c
  gcc -m32 foo.c

I had some Alphas but I've sold them all. I was given access to Alphas
running Tru64 v4.something and v5.something and got that to work. But the
"kernel" (Tru64 vs. Linux), and not the "processor architecture" (alpha,
x86, sparc), is generally the larger concern, and Linux is really old hat
at this point.

See... one day... we'll generate C (and maybe have cooperative suspend) and
these questions will all just go away. The answer will be "of course, most
likely, nothing special".

 - Jay

-------------- next part --------------
An HTML attachment was scrubbed...
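
The target-name conventions described in the porting message above can be sketched roughly as follows. This is only an illustration of the stated rules, not the actual cm3 configuration code: the function name `guess_target_traits` and the dictionary keys are invented here, and anything the message does not cover is left as `None` rather than guessed.

```python
def guess_target_traits(target):
    """Guess word size, byte order and OS from a target name.

    Hypothetical helper: the rules follow the description in the thread
    ("alpha_" prefix or "64" means 64-bit, "alpha" means little endian,
    "_linux" means Linux). Unknown traits stay None.
    """
    name = target.lower()
    traits = {"word_bits": None, "endian": None, "os": None}
    if name.startswith("alpha_") or "64" in name:
        traits["word_bits"] = 64
    if "alpha" in name:
        traits["endian"] = "little"
    if "_linux" in name:
        traits["os"] = "linux"
    return traits


print(guess_target_traits("ALPHA_LINUX"))
```

Under these rules the spelling matters: "ALPHA_LINUX" triggers all three guesses, while "LINUX_ALPHA" would only match the "alpha" rule.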
From jay.krell at cornell.edu Thu Jun 7 06:45:38 2012
From: jay.krell at cornell.edu (Jay K)
Date: Thu, 7 Jun 2012 04:45:38 +0000
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To: <20120606161808.7F5EA1A205B@async.async.caltech.edu>
References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org>, <20120606161808.7F5EA1A205B@async.async.caltech.edu>

Daniel, I can't find the email now, as usual, you are probably wrong.

We don't have an older runtime, we have a newer one, I think.
With more allowance for dynamic loading.

Mika,
Maybe a TYPECASE-intense design is generally poor?
dynamic_cast is slow in some C++ implementations.
And I've never seen it used much. Some, but not much.
The "type matching" that C++ exception handling has to do
isn't particularly fast, though there are other costs there.
Other than the stack walk, there is "finding the base of the object",
and strcmp to do the actual type match -- name-based type equality
and all that, with a hope that it suffices and no runtime checking
of type hashes like Modula-3 does.

Maybe you should switch on your own type tag?
But I guess Modula-3 doesn't have unions.
Or use OBJECT and method calls?

Which reminds me... it bothers me that OBJECT requires
heap allocation and garbage collection. It shouldn't require either.
I know we have function pointers available to simulate it,
without heap allocation, but what I don't know is whether the
"implicit downcast" in a virtual function/method call is doable
in safe code or not.
I'll have to look into it, but I'm busy now.

Maybe there is an optimization whereby the compiler
can figure out that there is a small set of likely types
that it could check first?

Or maybe the full feature could be implemented more efficiently?

Maybe it can be optimized based on the fact that the types
known to the system are read-mostly, rarely written/appended?

I don't know.
I'd really have to look into what the language supports
and how it is implemented. I'm not certain of either.

In C++, typeid() is fast, and requires there be virtual
functions (OBJECT). Is TYPECASE limited to OBJECTs?
Or heap allocated data?

Later..
 - Jay

----------------------------------------
> To: jay.krell at cornell.edu
> CC: m3devel at elegosoft.com
> Subject: Re: [M3devel] [M3commit] CVS Update: cm3
> Date: Wed, 6 Jun 2012 09:18:08 -0700
> From: mika at async.caltech.edu
>
> Jay K writes:
> >
> ...
> >7) Do folks out there really use the Modula-3/gcc optimizer, and notice it
> >produces code that runs much faster?
>
> If we are talking about turning on optimizations in the m3makefile, then the
> answer is:
>
> Yes! At least with CM3 it makes a huge difference in runtime. Without
> the optimizer CM3-produced code runs far slower than PM3-produced code
> (I've seen 3X I think.) With it, CM3 can sometimes keep up. Unless you
> use a lot of TYPECASE or other constructs that have a much less efficient
> implementation in the CM3 libraries than in the PM3 libraries.
>
> Mika

From dragisha at m3w.org Thu Jun 7 09:30:29 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Thu, 7 Jun 2012 09:30:29 +0200
Subject: [M3devel] [M3commit] CVS Update: cm3
References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org>, <20120606161808.7F5EA1A205B@async.async.caltech.edu>
Message-ID: <0DF4844B-46D5-4AC7-97AD-AE18A38C2BED@m3w.org>

Exactly. Relevant parts of initialization are incremental.

On Jun 7, 2012, at 6:45 AM, Jay K wrote:

> Daniel, I can't find the email now, as usual, you are probably wrong.
>
> We don't have an older runtime, we have a newer one, I think.
> With more allowance for dynamic loading.

-------------- next part --------------
An HTML attachment was scrubbed...
From dabenavidesd at yahoo.es Thu Jun 7 16:48:24 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Thu, 7 Jun 2012 15:48:24 +0100 (BST)
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To: <0DF4844B-46D5-4AC7-97AD-AE18A38C2BED@m3w.org>
Message-ID: <1339080504.10970.YahooMailClassic@web29703.mail.ird.yahoo.com>

Hi all:
Yes, but your estimation that user kind of behavior respect a programmer
educated in using a true multitasking machine is not accurate. You can't
read a program in two parts on the same machine, you need two different
people, that it's so a true system of processors, you need a different kind
of system to execute some action described. Little is said if you need to
modify OS code and another one also needs that: how do you change the OS
without interfering with the other? To maintain a consistent view of your
system?

DEC-SRC were very well educated people who thought that easy of this was
not hold in their system (Bob Taylor). They created yet another improvement
to Modula-3+ in Modula-2+e. Instead of taking inspiration from that kind of
system, they developed a newer one, but I don't know much more than that it
was a Win system-like.

That is the reason why Modula-3 in Object code view isn't quite of many
other traditional OS fixed Machine (systems that don't scale anyhow).
Instead of Virtual Machinery you are confronted with a true Multitasking
machine.

OK, if you care about that, think what is done to be done for Modula-3 is
the full formal definition of the language which starts in Baby Modula-3
consist in that user of the language use it in its own description (hard to
explain but that's the only way I'm afraid).

Thanks in advance

From hendrik at topoi.pooq.com Thu Jun 7 17:35:52 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Thu, 7 Jun 2012 11:35:52 -0400
Subject: [M3devel] ran out of space in /tmp while building .deb
References: <20120607000630.GA4233@topoi.pooq.com>
Message-ID: <20120607153552.GA8202@topoi.pooq.com>

On Thu, Jun 07, 2012 at 01:19:20AM +0000, Jay K wrote:
>
> Use the source. Change it if needed. - Jay

Thanks. But before I started hacking the source I found another way.
It turns out that there's a parameter that suppresses mounting /tmp as a
tmpfs, and it seems Debian thinks they got it wrong, and when the current
initscripts trickle down from sid to testing the problem will go away by
itself. I didn't wait; I changed the parameter; I won't have to hack the
source.

-- hendrik

From hendrik at topoi.pooq.com Thu Jun 7 17:37:29 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Thu, 7 Jun 2012 11:37:29 -0400
Subject: [M3devel] .debs for modula 3
References: <20120607000630.GA4233@topoi.pooq.com>
Message-ID: <20120607153729.GB8202@topoi.pooq.com>

By the way, is there anything I should be doing with these .debs I'm
creating other than just using them myself?

-- hendrik

From dabenavidesd at yahoo.es Thu Jun 7 18:06:53 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Thu, 7 Jun 2012 17:06:53 +0100 (BST)
Subject: [M3devel] .debs for modula 3
In-Reply-To: <20120607153729.GB8202@topoi.pooq.com>
Message-ID: <1339085213.73020.YahooMailClassic@web29702.mail.ird.yahoo.com>

Hi all:
Encouraging. What about a chip cipher sign from the repository?
I like the idea of signing the deb, if I had a utility to sign them by
yourself or Elego folks who want to recreate them there (I think this is
mostly perl), guys?
http://www.advogato.org/article/750.html

A different question is whether their sharing of packages is accepted by
the Elego admin, since most of the development occurs not only there, so you
know, so use a central development or a distributed one (only DEC-SRC used
their Vesta to sign cache builds but maybe others used it in DEC-*, etc).

Thanks in advance

From mika at async.caltech.edu Thu Jun 7 18:36:41 2012
From: mika at async.caltech.edu (Mika Nystrom)
Date: Thu, 07 Jun 2012 09:36:41 -0700
Subject: [M3devel] [M3commit] CVS Update: cm3
References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org>, <20120606161808.7F5EA1A205B@async.async.caltech.edu>
Message-ID: <20120607163641.A81351A205B@async.async.caltech.edu>

Hi Jay,

TYPECASE is limited to "reference" types, which effectively means
heap-allocated. Unless you can get alloca in there, I suppose... what
I mean is that in Green Book Modula-3 the only way to get a reference type
is either through a heap allocation or an UNSAFE operation.

TYPECASE is sometimes the only way to do things. In the Green Book there
are examples of using subtyping to have multiple generations of objects
in the same pickles, for example. In my program, it was inside an interpreter
that's figuring things out without any prior type information, using
ISTYPE or TYPECASE.
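
As a rough illustration of the interpreter situation just described, here is a hedged sketch in which Python's `isinstance` chain stands in for Modula-3's TYPECASE: arms are tried in order, the first match wins, and a value of a subtype also matches an arm naming its supertype. The class names `Value`, `IntValue`, `TextValue` and the function `describe` are invented for the example; none of them come from the thread or from any Modula-3 library.

```python
class Value:
    """Root of a toy value hierarchy (hypothetical, for illustration only)."""


class IntValue(Value):
    def __init__(self, n):
        self.n = n


class TextValue(Value):
    def __init__(self, s):
        self.s = s


def describe(v):
    """Dispatch on the runtime type, trying arms in order like TYPECASE."""
    if isinstance(v, IntValue):        # most specific arms come first
        return "int %d" % v.n
    elif isinstance(v, TextValue):
        return "text %r" % v.s
    elif isinstance(v, Value):         # any other subtype of Value lands here
        return "some other value"
    else:                              # plays the role of the ELSE arm
        return "not a value"
```

Each call can cost one runtime subtype test per arm, so the speed of that test in the runtime matters a great deal for code written in this style.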
The issue with TYPECASE that I brought up is actually that the
implementation of TYPECASE and ISTYPE is far slower in the CM3 m3core
than in PM3's (= SRC M3 as far as I know). The reason (which you allude
to) is that Critical Mass did a lot of work on supporting dynamic loading
of Modula-3 code (loading in types not known at compile time) and, as with
many of the other projects they carried out, the code quality was so-so.
Because of the restrictions of SRC M3 and PM3, types are statically
allocated at compile time and all their subtyping relationships are known
at that time. There is simply a static array of the types. CM3, on the
other hand, has some more complicated dynamic data structure that makes
all the TYPECASE and ISTYPE operations much more cumbersome. It's all
in RT0 somewhere.

In short, CM3 does "more" than SRC M3 did but at a heavy performance cost.
And of course no one uses the "more" bit now.

Kind of like what they did to TEXTs... good ideas for some users, but
a somewhat half-baked implementation. Given that dynamic loading is used so
little, if at all, and in any case only happens infrequently itself,
it seems there ought to be a way to achieve what the CM3 guys were trying
to do while retaining the performance of the older implementation, but
not if your code is a "rush job". I think it would have been sensible
to vet Critical Mass's code a bit better before switching from PM3 to CM3
for the "official" distribution of Modula-3.

I still use PM3 quite a bit. I can no longer blame the TEXTs, nor can
I blame the pthreads implementation's being broken since I use CM3 with
user threads. Now it's mainly because m3gdb works great on FreeBSD-5.5
with PM3-generated code. I've tried so many times to get things working
on other machines with CM3 and newer m3gdb and there's always something
annoyingly wrong. Life's too short...

     Mika

P.S. how are the pthreads coming along? I saw some checkins (Dragisa),
does the thread tester run without hanging or crashing now?
I'd love to use pthreads but it's not been high on my list to debug as long
as I can live with user threads...

From rcolebur at SCIRES.COM Thu Jun 7 18:52:04 2012
From: rcolebur at SCIRES.COM (Coleburn, Randy)
Date: Thu, 7 Jun 2012 12:52:04 -0400
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To: <20120607163641.A81351A205B@async.async.caltech.edu>
References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org>, <20120606161808.7F5EA1A205B@async.async.caltech.edu> <20120607163641.A81351A205B@async.async.caltech.edu>

Mika:

I concur with what you are saying about needing a way to retain the good
ideas in CM3 without sacrificing so much on performance.

As far as the thread test program goes, it still shows the implementation
is broken somehow on Windows (2000, XP, & 7). What can I do to help debug
and solve this problem?

Am I correct that on Windows, Modula-3 threads are supposed to map to OS
(Windows) threads?

Regards,
Randy Coleburn

From dabenavidesd at yahoo.es Thu Jun 7 21:42:44 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Thu, 7 Jun 2012 20:42:44 +0100 (BST)
Subject: [M3devel] [M3commit] CVS Update: cm3
Message-ID: <1339098164.65798.YahooMailClassic@web29703.mail.ird.yahoo.com>

Hi all:
Yes, it is, but the same conditioning over System Pthreads, is that you
can't always link the threads against themselves, so you need to
re-implement it correctly. Good style DEC-SRC threads might be along the
verification project for the Alpha with Vector extensions:
http://barroso.org/publications/piranha_asilomar.pdf

Thanks in advance

From dragisha at m3w.org Thu Jun 7 22:09:58 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Thu, 7 Jun 2012 22:09:58 +0200
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To: <20120607163641.A81351A205B@async.async.caltech.edu>
References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org>, <20120606161808.7F5EA1A205B@async.async.caltech.edu> <20120607163641.A81351A205B@async.async.caltech.edu>
Message-ID: <166377B4-8CC2-4415-A08E-0655E75227A4@m3w.org>

Are you sure about this?

Both pm3 and cm3 load type structures from object files on initialization.
Type data is in UNTRACED REF ARRAY... structures, for both of them.
Difference is in algorithm being incremental, "multi-pass" in cm3 and single-pass in pm3/SRC. Also, for garbage collection, there is a check to see if number of modules (meaning more globals areas) has grown, and rebuilding of globals list in case it is. There is nothing static in type structure of Modula-3. On Jun 7, 2012, at 6:36 PM, Mika Nystrom wrote: > Because of the restrictions of SRC and P M3, types are statically > allocated at compile time and all their subtyping relationships are known > at that time. There is simply a static array of the types. -------------- next part -------------- An HTML attachment was scrubbed... URL: From mika at async.caltech.edu Thu Jun 7 22:35:37 2012 From: mika at async.caltech.edu (Mika Nystrom) Date: Thu, 07 Jun 2012 13:35:37 -0700 Subject: [M3devel] [M3commit] CVS Update: cm3 In-Reply-To: <166377B4-8CC2-4415-A08E-0655E75227A4@m3w.org> References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org> , <20120606161808.7F5EA1A205B@async.async.caltech.edu> <20120607163641.A81351A205B@async.async.caltech.edu> <166377B4-8CC2-4415-A08E-0655E75227A4@m3w.org> Message-ID: <20120607203537.1CBC81A205B@async.async.caltech.edu> Sorry, "static" was (slightly) the wrong word. I believe they are malloced as an array during program startup. There is something significant about the ordering of this array, which is why you can't just add types to the PM3 environment during runtime. CM3 uses more indirection, so it's much easier to add things while running, but it also makes TYPECASE, ISTYPE, etc., slower. Possibly NARROW (explicit as well as implicit) as well... Mika =?utf-8?Q?Dragi=C5=A1a_Duri=C4=87?= writes: > >--Apple-Mail=_D8C54D3B-50C9-47D3-AD4D-116B678A55EC >Content-Transfer-Encoding: quoted-printable >Content-Type: text/plain; > charset=windows-1252 > >Are you sure about this? > >Both pm3 and cm3 load type structures from object files on = >initialization. 
Type data is in UNTRACED REF ARRAY=85 structures, for = >both of them. > >Difference is in algorithm being incremental, "multi-pass" in cm3 and = >single-pass in pm3/SRC. Also, for garbage collection, there is a check = >to see if number of modules (meaning more globals areas) has grown, and = >rebuilding of globals list in case it is. >=20 >There is nothing static in type structure of Modula-3. > >On Jun 7, 2012, at 6:36 PM, Mika Nystrom wrote: > >> Because of the restrictions of SRC and P M3, types are statically >> allocated at compile time and all their subtyping relationships are = >known >> at that time. There is simply a static array of the types. > > >--Apple-Mail=_D8C54D3B-50C9-47D3-AD4D-116B678A55EC >Content-Transfer-Encoding: quoted-printable >Content-Type: text/html; > charset=windows-1252 > >-webkit-nbsp-mode: space; -webkit-line-break: after-white-space; ">Are = >you sure about this?


From hendrik at topoi.pooq.com Thu Jun 7 23:11:35 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Thu, 7 Jun 2012 17:11:35 -0400
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To: <20120607163641.A81351A205B@async.async.caltech.edu>
References: <20120606064732.2C9242474003@birch.elegosoft.com> <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org> <20120606161808.7F5EA1A205B@async.async.caltech.edu> <20120607163641.A81351A205B@async.async.caltech.edu>
Message-ID: <20120607211135.GA6314@topoi.pooq.com>

On Thu, Jun 07, 2012 at 09:36:41AM -0700, Mika Nystrom wrote:
> Hi Jay,
>
> TYPECASE is limited to "reference" types, which effectively means
> heap-allocated. Unless you can get alloca in there, I suppose... what
> I mean is that in Green Book Modula-3 the only way to get a reference
> type is either through a heap allocation or an UNSAFE operation.
>
> TYPECASE is sometimes the only way to do things. In the Green Book
> there are examples of using subtyping to have multiple generations
> of objects in the same pickles, for example. In my program, it was
> inside an interpreter that's figuring things out without any prior
> type information, using ISTYPE or TYPECASE.
>
> The issue with TYPECASE that I brought up is actually that the
> implementation of TYPECASE and ISTYPE is far slower in the CM3 m3core than
> in PM3's (= SRC M3 as far as I know). The reason (which you allude to)
> is that Critical Mass did a lot of work on supporting dynamic loading
> of Modula-3 code (loading in types not known at compile time) and as
> with many of the other projects they carried out, the code quality was
> so-so. Because of the restrictions of SRC and P M3, types are statically
> allocated at compile time and all their subtyping relationships are known
> at that time. There is simply a static array of the types. CM3, on the
> other hand, has some more complicated dynamic data structure that makes
> all the TYPECASE and ISTYPE operations much more cumbersome. It's all
> in RT0 somewhere. In short, CM3 does "more" than SRC M3 did but at a
> heavy performance cost. And of course no one uses the "more" bit now.

I'd like to, if I only knew how. I'd be really interested in having
the low-level infrastructure for JIT code generators.

-- hendrik

From rcolebur at SCIRES.COM Thu Jun 7 23:44:56 2012
From: rcolebur at SCIRES.COM (Coleburn, Randy)
Date: Thu, 7 Jun 2012 17:44:56 -0400
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To: <1339098164.65798.YahooMailClassic@web29703.mail.ird.yahoo.com>
References: <1339098164.65798.YahooMailClassic@web29703.mail.ird.yahoo.com>
Message-ID:

Daniel:

I'm impressed by your ability to provide so many different research links in your posts.

But, after looking at the link you gave in response to my post, I don't see the immediate relevance to my question regarding Modula-3 threading on Windows.

Also, I'm sorry, but I have a very difficult time trying to understand what you are saying in your posts. I suppose it must have something to do with the translation between our different languages. Forgive me, but I don't understand your reply.

--Randy Coleburn

From: Daniel Alejandro Benavides D. [mailto:dabenavidesd at yahoo.es]
Sent: Thursday, June 07, 2012 3:43 PM
To: m3devel at elegosoft.com; Coleburn, Randy
Subject: Re: [M3devel] [M3commit] CVS Update: cm3

Hi all:
Yes, it is, but the same conditioning over System Pthreads, is that you can't always link the threads against themselves, so you need to re-implement it correctly.
Good style DEC-SRC threads might be along the verification project for the Alpha with Vector extensions:
http://barroso.org/publications/piranha_asilomar.pdf

Thanks in advance

--- On Thu, 7/6/12, Coleburn, Randy wrote:

From: Coleburn, Randy
Subject: Re: [M3devel] [M3commit] CVS Update: cm3
To: "m3devel at elegosoft.com"
Date: Thursday, June 7, 2012 11:52

Mika:

I concur with what you are saying about needing a way to retain the good ideas in CM3 without sacrificing so much on performance.

As far as the thread test program goes, it still shows the implementation is broken somehow on Windows (2000, XP, & 7). What can I do to help debug and solve this problem?

Am I correct that on Windows, Modula-3 threads are supposed to map to OS (Windows) threads?

Regards,
Randy Coleburn

-----Original Message-----
From: Mika Nystrom [mailto:mika at async.caltech.edu]
Sent: Thursday, June 07, 2012 12:37 PM
To: Jay K
Cc: m3devel at elegosoft.com
Subject: Re: [M3devel] [M3commit] CVS Update: cm3

Hi Jay,

TYPECASE is limited to "reference" types, which effectively means heap-allocated. Unless you can get alloca in there, I suppose... what I mean is that in Green Book Modula-3 the only way to get a reference type is either through a heap allocation or an UNSAFE operation.

TYPECASE is sometimes the only way to do things. In the Green Book there are examples of using subtyping to have multiple generations of objects in the same pickles, for example. In my program, it was inside an interpreter that's figuring things out without any prior type information, using ISTYPE or TYPECASE.

The issue with TYPECASE that I brought up is actually that the implementation of TYPECASE and ISTYPE is far slower in the CM3 m3core than in PM3's (= SRC M3 as far as I know). The reason (which you allude to) is that Critical Mass did a lot of work on supporting dynamic loading of Modula-3 code (loading in types not known at compile time) and as with many of the other projects they carried out, the code quality was so-so. Because of the restrictions of SRC and P M3, types are statically allocated at compile time and all their subtyping relationships are known at that time. There is simply a static array of the types. CM3, on the other hand, has some more complicated dynamic data structure that makes all the TYPECASE and ISTYPE operations much more cumbersome. It's all in RT0 somewhere. In short, CM3 does "more" than SRC M3 did but at a heavy performance cost. And of course no one uses the "more" bit now.

Kind of like what they did to TEXTs... good ideas for some users, but somewhat half-baked implementation. Given that dynamic loading is used so little, if at all, and it in any case only happens infrequently itself, it seems there ought to be a way to achieve what the CM3 guys were trying to do while retaining the performance of the older implementation, but not if your code is a "rush job". I think it would have been sensible to vet Critical Mass's code a bit better before switching from PM3 to CM3 for the "official" distribution of Modula-3.

I still use PM3 quite a bit. I can no longer blame the TEXTs, nor can I blame the pthreads implementation's being broken since I use CM3 with user threads. Now it's mainly because m3gdb works great on FreeBSD-5.5 with PM3-generated code. I've tried so many times to get things working on other machines with CM3 and newer m3gdb and there's always something annoyingly wrong. Life's too short...

    Mika

P.S. how are the pthreads coming along? I saw some checkins (Dragisa), does the thread tester run without hanging or crashing now? I'd love to use pthreads but it's not been high on my list to debug as long as I can live with user threads...

Jay K writes:
>
>Daniel, I can't find the email now, as usual, you are probably wrong.
>
>
>We don't have an older runtime, we have a newer one, I think.
>With more allowance for dynamic loading.
>
>
>Mika,
>Maybe a TYPECASE-intense design is generally poor?
>dynamic_cast is slow in some C++ implementations.
>And I've never seen it used much. Some, but not much.
>The "type matching" that C++ exception handling has to do isn't
>particularly fast, though there are other costs there.
>Other than the stack walk, there is "finding the base of the
>object", and strcmp to do the actual type match --
>name-based-type-equality and all that, with a hope that it suffices
>and no runtime checking of type hashes like Modula-3 does..
>
>
>Maybe you should switch on your own type tag?
>  But I guess Modula-3 doesn't have unions.
>Or use OBJECT and method calls?
>
>
>Which reminds me...it bothers me that OBJECT requires heap allocation
>and garbage collection. It shouldn't require either.
>I know we have function pointers available to simulate it, without
>heap allocation, but what I don't know, is if the "implicit downcast"
>in a virtual function/method call is doable in safe code or not.
>I'll have to look into it..but I'm busy now..
>
>
>Maybe there is an optimization whereby the compiler can figure out that
>there is a small set of likely types that it could check first?
>
>
>Or maybe the full feature could be implemented more efficiently?
>
>
>Maybe it can be optimized based on the fact that the types known to the
>system are read-mostly, rarely written/appended?
>
>
>I don't know.
>I'd really have to look into what the language supports and how it is
>implemented. I'm not certain of either.
>
>
>In C++, typeid() is fast, and requires there be virtual functions
>(OBJECT). Is TYPECASE limited to OBJECTs?
>Or heap allocated data?
>
>
>Later..
> - Jay
>
>----------------------------------------
>> To: jay.krell at cornell.edu
>> CC: m3devel at elegosoft.com
>> Subject: Re: [M3devel] [M3commit] CVS Update: cm3
>> Date: Wed, 6 Jun 2012 09:18:08 -0700
>> From: mika at async.caltech.edu
>>
>> Jay K writes:
>> >
>> ...
>> >7) Do folks out there really use the Modula-3/gcc optimizer, and
>> >notice it produces code that runs much faster?
>>
>> If we are talking about turning on optimizations in the m3makefile,
>> then the answer is:
>>
>> Yes! At least with CM3 it makes a huge difference in runtime. Without
>> the optimizer CM3-produced code runs far slower than PM3-produced
>> code (I've seen 3X I think.) With it, CM3 can sometimes keep up.
>> Unless you use a lot of TYPECASE or other constructs that have a much
>> less efficient implementation in the CM3 libraries than in the PM3 libraries.
>>
>> Mika

From dragisha at m3w.org Fri Jun 8 00:01:50 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Fri, 8 Jun 2012 00:01:50 +0200
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To: <20120607203537.1CBC81A205B@async.async.caltech.edu>
References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org> , <20120606161808.7F5EA1A205B@async.async.caltech.edu> <20120607163641.A81351A205B@async.async.caltech.edu> <166377B4-8CC2-4415-A08E-0655E75227A4@m3w.org> <20120607203537.1CBC81A205B@async.async.caltech.edu>
Message-ID: <1495A062-1210-4D07-815C-D3609442C51B@m3w.org>

I've worked with both runtimes at this level (but not lately). And I can't think of one reason why this would be correct. (It does not make me right, I know:). Structures are equivalent, IIRC, primary difference being in algorithm.
Incremental RTLinker operation results in possible reallocation of type structures (bottom of the world, you-are-the-wizard-if-you-read-this), but they are still "static" for most (99.999..%) of the process lifetime.

Question is important and I am sure it is fixable, if only we can identify problem here. There is nothing inherent to ability for dynamic loading demanding bad data structures at the bottom of M3 world. Only (not-improbable) sub-optimal decisions made by cmass people at the moment.

On Jun 7, 2012, at 10:35 PM, Mika Nystrom wrote:

> Sorry, "static" was (slightly) the wrong word.
>
> I believe they are malloced as an array during program startup. There is
> something significant about the ordering of this array, which is why you
> can't just add types to the PM3 environment during runtime. CM3 uses
> more indirection, so it's much easier to add things while running,
> but it also makes TYPECASE, ISTYPE, etc., slower. Possibly NARROW
> (explicit as well as implicit) as well...
>
> Mika

From mika at async.caltech.edu Fri Jun 8 00:23:11 2012
From: mika at async.caltech.edu (Mika Nystrom)
Date: Thu, 07 Jun 2012 15:23:11 -0700
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To: <1495A062-1210-4D07-815C-D3609442C51B@m3w.org>
References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org> , <20120606161808.7F5EA1A205B@async.async.caltech.edu> <20120607163641.A81351A205B@async.async.caltech.edu> <166377B4-8CC2-4415-A08E-0655E75227A4@m3w.org> <20120607203537.1CBC81A205B@async.async.caltech.edu> <1495A062-1210-4D07-815C-D3609442C51B@m3w.org>
Message-ID: <20120607222311.E35A71A205B@async.async.caltech.edu>

Admittedly it's been a while since I looked at this.

I think what's going on is that they used some sort of topological sorting
in SRC M3, which was broken by Critical Mass.

The reason for the slowdowns is clear if you study the following code
for IsSubtype.

PM3:

PROCEDURE IsSubtype (a, b: Typecode): BOOLEAN =
  VAR t := Get (b);
  BEGIN
    IF (a >= RT0u.nTypes) THEN BadType (a) END;
    IF (a = 0) THEN RETURN TRUE END;
    RETURN (t.typecode <= a AND a <= t.lastSubTypeTC);
  END IsSubtype;

CM3:

PROCEDURE IsSubtype (a, b: Typecode): BOOLEAN =
  VAR t: RT0.TypeDefn;
  BEGIN
    IF (a = RT0.NilTypecode) THEN RETURN TRUE END;
    t := Get (a);
    IF (t = NIL) THEN RETURN FALSE; END;
    IF (t.typecode = b) THEN RETURN TRUE END;
    WHILE (t.kind = ORD (TK.Obj)) DO
      IF (t.link_state = 0) THEN FinishTypecell (t, NIL); END;
      t := LOOPHOLE (t, RT0.ObjectTypeDefn).parent;
      IF (t = NIL) THEN RETURN FALSE; END;
      IF (t.typecode = b) THEN RETURN TRUE; END;
    END;
    IF (t.traced # 0)
      THEN RETURN (b = RT0.RefanyTypecode);
      ELSE RETURN (b = RT0.AddressTypecode);
    END;
  END IsSubtype;

Now let's take a peek at Typecase (it is emitted by the compiler for SRC
and P M3!)...

PROCEDURE ScanTypecase (ref: REFANY;
                        x: ADDRESS(*ARRAY [0..] OF Cell*)): INTEGER =
  VAR p: UNTRACED REF TypecaseCell;  i: INTEGER;  tc, xc: Typecode;
  BEGIN
    IF (ref = NIL) THEN RETURN 0; END;
    tc := TYPECODE (ref);
    p := x;  i := 0;
    LOOP
      IF (p.uid = 0) THEN RETURN i; END;
      IF (p.defn = NIL) THEN
        p.defn := FindType (p.uid);
        IF (p.defn = NIL) THEN
          Fail (RTE.MissingType, RTModule.FromDataAddress(x),
                LOOPHOLE (p.uid, ADDRESS), NIL);
        END;
      END;
      xc := LOOPHOLE (p.defn, RT0.TypeDefn).typecode;
      IF (tc = xc) OR IsSubtype (tc, xc) THEN RETURN i; END;
      INC (p, ADRSIZE (p^));  INC (i);
    END;
  END ScanTypecase;

    Mika

Dragiša Durić writes:
>I've worked with both runtimes at this level (but not lately). And I
>can't think of one reason why this would be correct. (It does not make
>me right, I know:). Structures are equivalent, IIRC, primary difference
>being in algorithm. Incremental RTLinker operation results in possible
>reallocation of type structures (bottom of the world,
>you-are-the-wizard-if-you-read-this), but they are still "static" for
>most (99.999..%) of the process lifetime.
>
>Question is important and I am sure it is fixable, if only we can
>identify problem here. There is nothing inherent to ability for dynamic
>loading demanding bad data structures at the bottom of M3 world. Only
>(not-improbable) sub-optimal decisions made by cmass people at the
>moment.
>
>On Jun 7, 2012, at 10:35 PM, Mika Nystrom wrote:
>
>> Sorry, "static" was (slightly) the wrong word.
>>
>> I believe they are malloced as an array during program startup. There
>> is something significant about the ordering of this array, which is why
>> you can't just add types to the PM3 environment during runtime. CM3 uses
>> more indirection, so it's much easier to add things while running,
>> but it also makes TYPECASE, ISTYPE, etc., slower. Possibly NARROW
>> (explicit as well as implicit) as well...
>>
>> Mika


From jay.krell at cornell.edu Fri Jun 8 01:18:51 2012
From: jay.krell at cornell.edu (Jay)
Date: Thu, 7 Jun 2012 16:18:51 -0700
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To:
References: <20120606064732.2C9242474003@birch.elegosoft.com> <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org>
Message-ID:

Actually what I showed is frequently wrong. We often use bitfield references, which seems weird or wrong, but seems to work generally ok and produce better code. The RIGHT thing to use would be "component refs" in gcc parlance, but currently we don't and it isn't a small change. There is kind of a mismatch in the compiler architecture currently...

 - Jay (briefly/pocket-sized-computer-aka-phone)

On Jun 6, 2012, at 3:17 AM, Dragiša Durić wrote:

> I know that much about generated code :).
>
> "Good" thing is - not many things changed in *m3 backend since I ported pm3 to LINUX_ALPHA :)
>
> On Jun 6, 2012, at 11:42 AM, Jay K wrote:
>
>>
>>> Functions that call setjmp
>>
>>
>> I meant -- functions with TRY/EXCEPT or TRY/FINALLY. :)
>>
>> - Jay
>>
>> ----------------------------------------
>>> From: jay.krell at cornell.edu
>>> To: dragisha at m3w.org
>>> Date: Wed, 6 Jun 2012 09:38:18 +0000
>>> CC: jkrell at elego.de; m3devel at elegosoft.com
>>> Subject: Re: [M3devel] [M3commit] CVS Update: cm3
>>>
>>>
>>> 5.8.6 does allow many optimizations to occur.
>>> We turn off a very small number directly.
>>> Functions that call setjmp have optimizations inhibited by declaring all locals volatile.
>>> We don't give the compiler good type information, and we take the address of stuff more than necessary, by
>>> generating very low level code.
>>> Where you have e.g.
>>> MODULE Foo;
>>> TYPE Point = RECORD x,y:INTEGER END;
>>> PROCEDURE GetY(VAR pt:Point):INTEGER = BEGIN RETURN pt.y; END GetY;
>>>
>>>
>>> We generate the equivalent of:
>>>
>>>
>>> typedef ptrdiff_t INTEGER;
>>> typedef char* ADDRESS;
>>> INTEGER Foo_GetY(ADDRESS pt) { return *(INTEGER*)(pt + sizeof(INTEGER)); }
>>>
>>>
>>> Maybe I'll wrap up 4.6, not enable it, and move on to 4.7..
>>>
>>>
>>>
>>> - Jay
>>>
>>>
>>> ________________________________
>>>> Subject: Re: [M3devel] [M3commit] CVS Update: cm3
>>>> From: dragisha at m3w.org
>>>> Date: Wed, 6 Jun 2012 10:51:33 +0200
>>>> CC: jkrell at elego.de; m3devel at elegosoft.com
>>>> To: jay.krell at cornell.edu
>>>>
>>>> I am using it, and I need it.
>>>>
>>>> Does it run better/faster? I didn't test, but is it something to even
>>>> ask, these days, architectures, ? ?
>>>>
>>>> Only if you turned everything off in 5.8.6 and later, as you're doing it
>>>> now, then probably my "-O2" default is of no benefit at all :).
>>>>
>>>> Generally, our "pitch" to "sell"
>>>> super-modern-ultra-blast-mega-fast-superlative-OO and everything else
>>>> you only dreamed about? And add "no CPU optimizations"? Imagine that.
>>>>
>>>> On Jun 6, 2012, at 10:10 AM, Jay K wrote:
>>>>
>>>> 7) Do folks out there really use the Modula-3/gcc optimizer, and notice
>>>> it produces code that runs much faster?
>>>>
>>>
>>
>

From dabenavidesd at yahoo.es Fri Jun 8 01:21:47 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Fri, 8 Jun 2012 00:21:47 +0100 (BST)
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To:
Message-ID: <1339111307.87279.YahooMailClassic@web29706.mail.ird.yahoo.com>

Hi all:
perhaps this would show it:
Again, what I'm saying is that you can use a WinNT system thread without losing M3 semantics as long as it is implemented as it is in the consistent Model Architecture of the system:
http://research.microsoft.com/en-us/um/people/qadeer/talks/microsoft-dec00.ppt

Recently a guy from Intel (Rick Hudson) explained and laid out his thoughts on that (but I find the same problem: I can't understand the problem he is talking about that much).

Rialto NT OS was implemented along the lines for embedded devices (nice!):
http://www.youtube.com/watch?v=WUfvvFD5tAA

DEC-SRC and MS worked together on this, in acting like so there was an Alpha "beta" Win2000, but it didn't happen, as the piranha project :(

See this new architectures don't scale for that much they say (sorry HW guys, but show me a good proof I'm writing this from nothing related to it)

Thanks in advance

--- On Thu, 7/6/12, Coleburn, Randy wrote:

From: Coleburn, Randy
Subject: Re: [M3devel] [M3commit] CVS Update: cm3
To: "m3devel at elegosoft.com"
Date: Thursday, June 7, 2012 16:44

Daniel:

I'm impressed by your ability to provide so many different research links in your posts.

But, after looking at the link you gave in response to my post, I don't see the immediate relevance to my question regarding Modula-3 threading on Windows.

Also, I'm sorry, but I have a very difficult time trying to understand what you are saying in your posts. I suppose it must have something to do with the translation between our different languages. Forgive me, but I don't understand your reply.

--Randy Coleburn

From jay.krell at cornell.edu Fri Jun 8 04:05:11 2012
From: jay.krell at cornell.edu (Jay K)
Date: Fri, 8 Jun 2012 02:05:11 +0000
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To:
References: , <1339098164.65798.YahooMailClassic@web29703.mail.ird.yahoo.com>,
Message-ID:

1. Yes, Daniel generally doesn't make sense to me either.

2.
> Am I correct that on Windows, Modula-3 threads are supposed to map to OS (Windows) threads?

Yes. Note though that "LOCK" doesn't map directly to EnterCriticalSection and, more significantly, m3core provides essentially condition variables, which don't map directly to Win32 prior to Vista. (LOCK at least does a delay-heap-allocation-and-initialize before EnterCriticalSection, but also interacts with the condition variables, I recall.)
I did a bunch of "research" and our condition variable substitution is pretty good now, equivalent to what Java implementations do. Definitely better than others e.g. Boost.

At some point maybe we could use the condition variables that Vista introduces, but 1) I'm reluctant to drop 2000/XP/etc. support and 2) if we implemented something that chose one implementation or the other at runtime, we'd lose coverage of the pre-Vista code.

(I'm really disappointed in this area in Win32, that NT 3.1 and Windows 95 didn't have small locks, zero-or-at-least-statically-initializable locks, read/write locks, "once", and condition variables. Vista, finally, has all that. (SRWLOCK are the first three all in one -- small, zero-initialized, read/write...and given them, I'm not sure you really need "once".))

Also note that historically we maintained a thread pool, so /creating/ a Modula-3 thread did not necessarily create a Win32 thread. I removed that though, so the implementation is more direct now, albeit probably slower.

I didn't realize or forgot we had a problem here. I can try to look into it. The Win32 and pthreads implementations are similar enough that it might easily be the same problem.

 - Jay

From: rcolebur at SCIRES.COM
To: m3devel at elegosoft.com
Date: Thu, 7 Jun 2012 17:44:56 -0400
Subject: Re: [M3devel] [M3commit] CVS Update: cm3

Daniel:

I'm impressed by your ability to provide so many different research links in your posts. But, after looking at the link you gave in response to my post, I don't see the immediate relevance to my question regarding Modula-3 threading on Windows.

Also, I'm sorry, but I have a very difficult time trying to understand what you are saying in your posts. I suppose it must have something to do with the translation between our different languages. Forgive me, but I don't understand your reply.

--Randy Coleburn

From: Daniel Alejandro Benavides D.
[mailto:dabenavidesd at yahoo.es]
Sent: Thursday, June 07, 2012 3:43 PM
To: m3devel at elegosoft.com; Coleburn, Randy
Subject: Re: [M3devel] [M3commit] CVS Update: cm3

Hi all:
Yes, it is, but the same conditioning over System Pthreads, is that you can't always link the threads against themselves, so you need re-implement it correctly.
Good style DEC-SRC threads might be along the verification project for the Alpha with Vector extensions:
http://barroso.org/publications/piranha_asilomar.pdf
Thanks in advance

--- On Thu, 7/6/12, Coleburn, Randy wrote:

From: Coleburn, Randy
Subject: Re: [M3devel] [M3commit] CVS Update: cm3
To: "m3devel at elegosoft.com"
Date: Thursday, 7 June 2012 11:52

Mika:

I concur with what you are saying about needing a way to retain the good ideas in CM3 without sacrificing so much on performance.

As far as the thread test program goes, it still shows the implementation is broken somehow on Windows (2000, XP, & 7). What can I do to help debug and solve this problem?

Am I correct that on Windows, Modula-3 threads are supposed to map to OS (Windows) threads?

Regards,
Randy Coleburn

-----Original Message-----
From: Mika Nystrom [mailto:mika at async.caltech.edu]
Sent: Thursday, June 07, 2012 12:37 PM
To: Jay K
Cc: m3devel at elegosoft.com
Subject: Re: [M3devel] [M3commit] CVS Update: cm3

Hi Jay,

TYPECASE is limited to "reference" types, which effectively means heap-allocated. Unless you can get alloca in there, I suppose... what I mean is that in Green Book Modula-3 the only way to get a reference type is either through a heap allocation or an UNSAFE operation.

TYPECASE is sometimes the only way to do things. In the Green Book there are examples of using subtyping to have multiple generations of objects in the same pickles, for example. In my program, it was inside an interpreter that's figuring things out without any prior type information, using ISTYPE or TYPECASE.
The issue with TYPECASE that I brought up is actually that the implementation of TYPECASE and ISTYPE is far slower in the CM3 m3core than in PM3's (= SRC M3 as far as I know). The reason (which you allude to) is that Critical Mass did a lot of work on supporting dynamic loading of Modula-3 code (loading in types not known at compile time) and as with many of the other projects they carried out, the code quality was so-so. Because of the restrictions of SRC and P M3, types are statically allocated at compile time and all their subtyping relationships are known at that time. There is simply a static array of the types. CM3, on the other hand, has some more complicated dynamic data structure that makes all the TYPECASE and ISTYPE operations much more cumbersome. It's all in RT0 somewhere. In short, CM3 does "more" than SRC M3 did but at a heavy performance cost. And of course no one uses the "more" bit now. Kind of like what they did to TEXTs... good ideas for some users, but somewhat half-baked implementation. Given that dynamic loading is used so little, if at all, and it in any case only happens infrequently itself, it seems there ought to be a way to achieve what the CM3 guys were trying to do while retaining the performance of the older implementation, but not if your code is a "rush job". I think it would have been sensible to vet Critical Mass's code a bit better before switching from PM3 to CM3 for the "official" distribution of Modula-3. I still use PM3 quite a bit. I can no longer blame the TEXTs, nor can I blame the pthreads implementation's being broken since I use CM3 with user threads. Now it's mainly because m3gdb works great on FreeBSD-5.5 with PM3-generated code. I've tried so many times to get things working on other machines with CM3 and newer m3gdb and there's always something annoyingly wrong. Life's too short... Mika P.S. how are the pthreads coming along? I saw some checkins (Dragisa), does the thread tester run without hanging or crashing now? 

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From jay.krell at cornell.edu Fri Jun 8 04:13:02 2012
From: jay.krell at cornell.edu (Jay K)
Date: Fri, 8 Jun 2012 02:13:02 +0000
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To: <20120607211135.GA6314@topoi.pooq.com>
References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org>, , <20120606161808.7F5EA1A205B@async.async.caltech.edu>, , <20120607163641.A81351A205B@async.async.caltech.edu>, <20120607211135.GA6314@topoi.pooq.com>
Message-ID:

> I'd like to, if I only knew how. I'd be really interested in having the
> low-level infrastructure for JIT code generators

Would you be satisfied with a Modula-3 interpreter that interpreted a mostly-compiled form? It shouldn't be difficult. I don't know if our intermediate code was designed with interpretation in mind, but it seems like it wouldn't be particularly difficult. You'd want a "linker" that just zips all the files and puts it "in" or "next to" the stub executable.
This would solve the distribution format problem, partly. The existing intermediate code is platform-specific, but not by much (again: jumpbuf size, word size, endian, win32 vs. posix).

But I have to admit, I'm keener on generating C than a JIT or an interpreter, and an interpreter is not a JIT.

Um. What do you hope to gain from JIT? A big reason I ask..is because..well, do you want to ship some portable-executable that relies on JIT being already installed/available? Or do you want to carry the JITer and its code together? Or do you want to target an existing widely deployed JITer such as CLR or Java?

In my opinion, the biggest advantage of JIT is portable-executable, depending on widely deployed JITer. But targeting CLR or Java isn't as easy as targeting your own custom thing.

I understand there are other advantages -- faster compilation, optimization very specific to runtime environment. But I think portable-executable is most important. That's why I like "script". :)

There are disadvantages to JIT: slower execution/startup, maybe harder to debug, easy to reverse engineer (if you care).

Heck, at some point you just ship the compiler and portable-executable is source code. There are pluses and minuses all around.

 - Jay

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From jay.krell at cornell.edu Fri Jun 8 04:19:57 2012
From: jay.krell at cornell.edu (Jay K)
Date: Fri, 8 Jun 2012 02:19:57 +0000
Subject: [M3devel] gcc 4.6 backend w/o optimizer?
Message-ID:

I need to know if I can start moving targets up to a gcc 4.6 backend, given that I've removed the vast majority of the optimizer from it. I will test some of the targets, maybe not all of them. So far I386_DARWIN and AMD64_DARWIN work and I built boot/cross archives for a very large list, and I can run cm3 on Solaris also (I forget which architectures, there are 4, probably SPARC32 at least).

Or if there is vehement rejection of a missing optimizer, I can abandon 4.6 and start work on 4.7 instead. I get tired of the unnecessary tedium that I invented, so with 4.7, I'll try to keep the diff small, in particular:
  keep the gmp/mpfr/mpc dependencies
  don't compile it with C++ (except parse.c)

There is no longer a "core" distribution of gcc, but I'll still cut out vast swaths like all but the C and LTO frontends (Java, C++, Objective C, Objective C++, Fortran, Ada), all of the libraries (libjava, libada, libssp, libmudflap, libgfortran, libquadmath, libgcc, libstdc++, etc.)

I know I have one rejection of this but that might not be enough.

Tony?

 - Jay

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From dragisha at m3w.org Fri Jun 8 09:15:59 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Fri, 8 Jun 2012 09:15:59 +0200
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To: <20120607222311.E35A71A205B@async.async.caltech.edu>
References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org> , <20120606161808.7F5EA1A205B@async.async.caltech.edu> <20120607163641.A81351A205B@async.async.caltech.edu> <166377B4-8CC2-4415-A08E-0655E75227A4@m3w.org> <20120607203537.1CBC81A205B@async.async.caltech.edu> <1495A062-1210-4D07-815C-D3609442C51B@m3w.org> <20120607222311.E35A71A205B@async.async.caltech.edu>
Message-ID: <77A59CCB-0800-4C3C-8AF6-5B455B29DEF7@m3w.org>

Thank you for the effort.

A possible solution is to map typecodes to orderable IDs and re-sort every time the dynamic loader changes type metadata. Any takers?

That way, we will only add one to two array lookups to every TYPECASE invocation. The additional complexity of the re-sort is limited to a single or a small number of invocations.

On Jun 8, 2012, at 12:23 AM, Mika Nystrom wrote:

> The reason for the slowdowns is clear if you study the following code
> for IsSubtype.
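
The typecode-ordering idea above can be sketched in C roughly as follows. This is only an illustrative sketch under stated assumptions, not the RT0/RTType code in m3core: it assumes single inheritance (as for Modula-3 object types), numbers the type tree in preorder each time a type is registered (standing in for the "re-sort" when the dynamic loader adds types), and answers a subtype query with a range check. All names (`register_type`, `renumber_types`, `is_subtype`, `MAX_TYPES`, `NO_PARENT`) are hypothetical, and this version uses three array lookups per query rather than the one or two suggested.

```c
#include <assert.h>

#define MAX_TYPES 64        /* hypothetical fixed capacity for the sketch */
#define NO_PARENT (-1)      /* marks a root type */

static int parent_of[MAX_TYPES]; /* typecode -> supertype typecode */
static int order_lo[MAX_TYPES];  /* typecode -> preorder id */
static int order_hi[MAX_TYPES];  /* typecode -> largest preorder id in its subtree */
static int n_types = 0;

/* Assign preorder ids to the subtree rooted at t; returns the next free id. */
static int number_subtree(int t, int next)
{
    order_lo[t] = next++;
    for (int c = 0; c < n_types; c++)
        if (parent_of[c] == t)
            next = number_subtree(c, next);
    order_hi[t] = next - 1;
    return next;
}

/* Recompute all ids; the rare, slow step run when type metadata changes. */
static void renumber_types(void)
{
    int next = 0;
    for (int t = 0; t < n_types; t++)
        if (parent_of[t] == NO_PARENT)
            next = number_subtree(t, next);
}

/* Register a new type under an existing parent (or NO_PARENT); returns its typecode. */
static int register_type(int parent)
{
    assert(n_types < MAX_TYPES);
    assert(parent == NO_PARENT || (parent >= 0 && parent < n_types));
    int tc = n_types++;
    parent_of[tc] = parent;
    renumber_types();
    return tc;
}

/* Fast path: a is a subtype of b iff a's id falls inside b's subtree range. */
static int is_subtype(int a, int b)
{
    return order_lo[b] <= order_lo[a] && order_lo[a] <= order_hi[b];
}
```

The renumbering here is quadratic in the number of types, which is tolerable only because it runs when types are added, not on every TYPECASE or ISTYPE.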

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From dragisha at m3w.org Fri Jun 8 10:06:49 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Fri, 8 Jun 2012 10:06:49 +0200
Subject: [M3devel] [M3commit] CVS Update: cm3 (windows condition variables)
In-Reply-To:
References: , <1339098164.65798.YahooMailClassic@web29703.mail.ird.yahoo.com>,
Message-ID:

Please explain this more, and if you can - draw parallel to *nix.

TIA

On Jun 8, 2012, at 4:05 AM, Jay K wrote:

> Note though that "LOCK" doesn't map directly to EnterCriticalSection and more significantly, m3core provides essentially condition variables, which don't map directly to Win32, prior to Vista.
> (LOCK at least does a delay-heap-allocation-and-initialize before EnterCriticalSection, but also interacts with the condition variables I recall.)
>
> I did a bunch of "research" and our condition variable substitution is pretty good now, equivalent to what Java implementations do.
> Definitely better than others e.g. Boost.

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From jay.krell at cornell.edu Fri Jun 8 11:23:38 2012
From: jay.krell at cornell.edu (Jay K)
Date: Fri, 8 Jun 2012 09:23:38 +0000
Subject: [M3devel] [M3commit] CVS Update: cm3 (windows condition variables)
In-Reply-To:
References: , <1339098164.65798.YahooMailClassic@web29703.mail.ird.yahoo.com>, ,
Message-ID:

sorry -- clarification, we are similar to the widely used Sun/Oracle JVM. Not necessarily state-of-the-art, but not bad.

Our locks map pretty directly to underlying pthread mutex, Win32 critical section. Maybe not 100% directly. Maybe we delay-heap-allocate-and-initialize, i.e. so lock declaration/creation is super cheap -- just leave room for a pointer -- but there is a small extra code per lock acquire/release.
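
The delayed allocate-and-initialize scheme described above might look roughly like this in C. It is a sketch under assumptions, not the m3core implementation: it assumes POSIX threads and C11 atomics, and the names `lazy_lock`, `lazy_lock_get`, `lazy_lock_acquire` and `lazy_lock_release` are hypothetical. Declaring a lock costs only a zeroed pointer; the real mutex is created on first acquire, and a compare-and-swap decides which thread's mutex wins if two threads race to create it.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdlib.h>

/* Hypothetical lock slot: declaring one only leaves room for a pointer. */
typedef struct {
    _Atomic(pthread_mutex_t *) impl;   /* NULL until the first acquire */
} lazy_lock;

#define LAZY_LOCK_INIT { NULL }

/* Return the underlying mutex, allocating and initializing it on first use. */
static pthread_mutex_t *lazy_lock_get(lazy_lock *l)
{
    pthread_mutex_t *m = atomic_load_explicit(&l->impl, memory_order_acquire);
    if (m != NULL)
        return m;                      /* common case: already created */

    pthread_mutex_t *fresh = malloc(sizeof *fresh);
    if (fresh == NULL || pthread_mutex_init(fresh, NULL) != 0)
        abort();

    pthread_mutex_t *expected = NULL;
    if (atomic_compare_exchange_strong(&l->impl, &expected, fresh))
        return fresh;                  /* this thread published its mutex */

    /* Another thread published first: discard ours and use theirs. */
    pthread_mutex_destroy(fresh);
    free(fresh);
    return expected;
}

static void lazy_lock_acquire(lazy_lock *l)
{
    pthread_mutex_lock(lazy_lock_get(l));
}

static void lazy_lock_release(lazy_lock *l)
{
    pthread_mutex_t *m = atomic_load_explicit(&l->impl, memory_order_acquire);
    assert(m != NULL);                 /* releasing a never-acquired lock is a bug */
    pthread_mutex_unlock(m);
}
```

The "small extra code per lock acquire/release" mentioned above corresponds to the atomic load and NULL check that run before every call into the underlying mutex.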

Our condition variable functionality maps pretty directly to pthread condition variables. Prior to Vista, there were no Win32 condition variables, but what we do is pretty good, better than many implementations out there (e.g. older Modula-3, Boost) and similar to widely used implementations, e.g. Sun/Oracle Java. In particular we do not have a giant lock for condition variable operations, which some literature says you need.

Historically the Win32 Modula-3 threading library had a giant lock to aid in condition variable implementation. It was pretty bad.

Since pthread and Win32 are widely used, hopefully they are really good, and if not, will be improved for the vast majority of code to reuse. Tony, in your research, you should be sure to compare against Win32 SRWLOCK and newer versions of Windows (i.e. newer than XP). I'll try to read your paper.

 - Jay

> Subject: Re: [M3devel] [M3commit] CVS Update: cm3 (windows condition variables)
> From: antony.hosking at gmail.com
> Date: Fri, 8 Jun 2012 04:38:20 -0400
> CC: jay.krell at cornell.edu; m3devel at elegosoft.com
> To: dragisha at m3w.org
>
> On Jun 8, 2012, at 4:06 AM, Dragiša Durić wrote:
>
> > Please explain this more, and if you can - draw parallel to *nix.
> >
> > TIA
> >
> > On Jun 8, 2012, at 4:05 AM, Jay K wrote:
> >
> >> Note though that "LOCK" doesn't map directly to EnterCriticalSection and more significantly, m3core provides essentially condition variables, which don't map directly to Win32, prior to Vista.
> >> (LOCK at least does a delay-heap-allocation-and-initialize before EnterCriticalSection, but also interacts with the condition variables I recall.)
> >>
> >> I did a bunch of "research" and our condition variable substitution is pretty good now, equivalent to what Java implementations do.
> >> Definitely better than others e.g. Boost.
>
> We are certainly NOT equivalent to state-of-the-art Java implementations. Take a look at http://dx.doi.org/10.1145/2093157.2093184 for example.
>
> - Tony
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From dragisha at m3w.org Fri Jun 8 12:38:30 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Fri, 8 Jun 2012 12:38:30 +0200
Subject: [M3devel] [M3commit] CVS Update: cm3 (windows condition variables)
In-Reply-To: <8531351D-A02E-4635-971F-C96736810851@cs.purdue.edu>
References: , <1339098164.65798.YahooMailClassic@web29703.mail.ird.yahoo.com>, , <8531351D-A02E-4635-971F-C96736810851@cs.purdue.edu>
Message-ID: <07030E90-1842-47DB-8D68-86F154E05E0D@m3w.org>

At least under Linux, uncontended access to futex is (IMHO) CAS based, user space operation.

Same thing?

On Jun 8, 2012, at 12:25 PM, Tony Hosking wrote:

> My point is that modern JVMs, including Sun/Oracle HotSpot, don't map every synchronized statement to an invocation of an underlying pthread or win32 lock. Instead, they use fast processor synchronization primitives like CAS compiled into the code to quickly "lock" an object in the vast majority of cases when no other thread is trying to lock the same object, without mapping to some pthread or win32 mutex.
>
> On Jun 8, 2012, at 5:23 AM, Jay K wrote:
>
>> sorry -- clarification, we are similar to the widely used Sun/Oracle JVM.
>> Not necessarily state-of-the-art, but not bad.
>>
>> Our locks map pretty directly to underlying pthread mutex, Win32 critical section.
>> Maybe not 100% directly. Maybe we delay-heap-allocate-and-initialize, i.e. so lock declaration/creation is super cheap -- just leave room for a pointer -- but there is a small extra code per lock acquire/release.
>>
>> Our condition variable functionality maps pretty directly to pthread condition variables.
>> Prior to Vista, there were no Win32 condition variables, but what we do is pretty good, better than many implementations out there (e.g. older Modula-3, Boost) and similar to widely used implementations, e.g. Sun/Oracle Java.
In particular we do not have a giant lock for condition variable operations, which some literature says you need.
>
> Antony Hosking | Associate Professor | Computer Science | Purdue University
> 305 N.
University Street | West Lafayette | IN 47907 | USA
> Mobile +1 765 427 5484
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From dragisha at m3w.org Fri Jun 8 12:48:47 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Fri, 8 Jun 2012 12:48:47 +0200
Subject: [M3devel] [M3commit] CVS Update: cm3 (windows condition variables)
In-Reply-To: <07030E90-1842-47DB-8D68-86F154E05E0D@m3w.org>
References: , <1339098164.65798.YahooMailClassic@web29703.mail.ird.yahoo.com>, , <8531351D-A02E-4635-971F-C96736810851@cs.purdue.edu> <07030E90-1842-47DB-8D68-86F154E05E0D@m3w.org>
Message-ID: <39FC6EBD-FA8B-4C5F-89CC-E83986C0E01E@m3w.org>

On Jun 8, 2012, at 12:38 PM, Dragiša Durić wrote:

> At least under Linux, uncontended access to futex is (IMHO) CAS based, user space operation.
>
> Same thing?
>

Meaning: "At least under Linux, Modula-3 using pthreads does same thing as modern JVMs?"

> On Jun 8, 2012, at 12:25 PM, Tony Hosking wrote:
>
>> My point is that modern JVMs, including Sun/Oracle HotSpot, don't map every synchronized statement to an invocation of an underlying pthread or win32 lock. Instead, they use fast processor synchronization primitives like CAS compiled into the code to quickly "lock" an object in the vast majority of cases when no other thread is trying to lock the same object, without mapping to some pthread or win32 mutex.
>>
>> On Jun 8, 2012, at 5:23 AM, Jay K wrote:
>>
>>> sorry -- clarification, we are similar to the widely used Sun/Oracle JVM.
>>> Not necessarily state-of-the-art, but not bad.
>>>
>>> Our locks map pretty directly to underlying pthread mutex, Win32 critical section.
>>> Maybe not 100% directly. Maybe we delay-heap-allocate-and-initialize, i.e. so lock declaration/creation is super cheap -- just leave room for a pointer -- but there is a small extra code per lock acquire/release.
>>> >
>>> > We are certainly NOT equivalent to state-of-the-art Java implementations. Take a look at http://dx.doi.org/10.1145/2093157.2093184 for example.
>>> >
>>> > - Tony
>>> >
>>
>> Antony Hosking | Associate Professor | Computer Science | Purdue University
>> 305 N. University Street | West Lafayette | IN 47907 | USA
>> Mobile +1 765 427 5484
>>
>

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From hosking at cs.purdue.edu Fri Jun 8 12:25:20 2012
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Fri, 8 Jun 2012 06:25:20 -0400
Subject: [M3devel] [M3commit] CVS Update: cm3 (windows condition variables)
In-Reply-To:
References: , <1339098164.65798.YahooMailClassic@web29703.mail.ird.yahoo.com>, ,
Message-ID: <8531351D-A02E-4635-971F-C96736810851@cs.purdue.edu>

My point is that modern JVMs, including Sun/Oracle HotSpot, don't map every synchronized statement to an invocation of an underlying pthread or win32 lock. Instead, they use fast processor synchronization primitives like CAS compiled into the code to quickly "lock" an object in the vast majority of cases when no other thread is trying to lock the same object, without mapping to some pthread or win32 mutex.

On Jun 8, 2012, at 5:23 AM, Jay K wrote:

> sorry -- clarification, we are similar to the widely used Sun/Oracle JVM.
> Not necessarily state-of-the-art, but not bad.
>
> Our locks map pretty directly to underlying pthread mutex, Win32 critical section.
> Maybe not 100% directly. Maybe we delay-heap-allocate-and-initialize, i.e. so lock declaration/creation is super cheap -- just leave room for a pointer -- but there is a small extra code per lock acquire/release.
>
> Our condition variable functionality maps pretty directly to pthread condition variables.
> Prior to Vista, there were no Win32 condition variables, but what we do is pretty good, better than many implementations out there (e.g.
older Modula-3, Boost) and similar to widely used implementations, e.g. Sun/Oracle Java. In particular we do not have a giant lock for condition variable operations, which some literature says you need.

Antony Hosking | Associate Professor | Computer Science | Purdue University
305 N.
University Street | West Lafayette | IN 47907 | USA
Mobile +1 765 427 5484

-------------- next part --------------
An HTML attachment was scrubbed...
URL:
From jay.krell at cornell.edu Fri Jun 8 13:20:56 2012
From: jay.krell at cornell.edu (Jay K)
Date: Fri, 8 Jun 2012 11:20:56 +0000
Subject: [M3devel] [M3commit] CVS Update: cm3 (windows condition variables)
In-Reply-To: <39FC6EBD-FA8B-4C5F-89CC-E83986C0E01E@m3w.org>
References: , <1339098164.65798.YahooMailClassic@web29703.mail.ird.yahoo.com>, , <8531351D-A02E-4635-971F-C96736810851@cs.purdue.edu> <07030E90-1842-47DB-8D68-86F154E05E0D@m3w.org>, <39FC6EBD-FA8B-4C5F-89CC-E83986C0E01E@m3w.org>
Message-ID:

> At least under Linux, uncontended access to futex is (IMHO) CAS based, user space operation.

So is uncontended Win32 critical section and uncontended Win32 SRWLOCK. Just disassemble and/or step through them...

Mutex/Semaphore/Event, those always go to the kernel, unfortunately. So our win32 condition-variable-ish stuff might, I have to check. It'd be unfortunate, but it is still probably as good as it can be, short of depending on Vista. (Uncontended Vista+ condition variables surely don't involve the kernel either.)

The CAS isn't inlined. There is a function call. A dynamically linked one, so at least on Win32, it goes through a function pointer, but other than inlining factors, it can still be very fast. It can bias to a thread, and such. But there will be a function call.

 - Jay

Subject: Re: [M3devel] [M3commit] CVS Update: cm3 (windows condition variables)
From: dragisha at m3w.org
Date: Fri, 8 Jun 2012 12:48:47 +0200
CC: m3devel at elegosoft.com; jay.krell at cornell.edu
To: hosking at cs.purdue.edu

On Jun 8, 2012, at 12:38 PM, Dragiša Durić wrote:
At least under Linux, uncontended access to futex is (IMHO) CAS based, user space operation.
Same thing?

Meaning: "At least under Linux, Modula-3 using pthreads does same thing as modern JVMs?"
 - Jay

> Subject: Re: [M3devel] [M3commit] CVS Update: cm3 (windows condition variables)
> From: antony.hosking at gmail.com
> Date: Fri, 8 Jun 2012 04:38:20 -0400
> CC: jay.krell at cornell.edu; m3devel at elegosoft.com
> To: dragisha at m3w.org
>
> On Jun 8, 2012, at 4:06 AM, Dragiša Durić wrote:
>
> > Please explain this more, and if you can - draw parallel to *nix.
> >
> > TIA
> >
> > On Jun 8, 2012, at 4:05 AM, Jay K wrote:
> >
> >> Note though that "LOCK" doesn't map directly to EnterCriticalSection and more significantly, m3core provides essentially condition variables, which don't map directly to Win32, prior to Vista.
> >> (LOCK at least does a delay-heap-allocation-and-initialize before EnterCriticalSection, but also interacts with the condition variables I recall.)
> >>
> >> I did a bunch of "research" and our condition variable substitution is pretty good now, equivalent to what Java implementations do.
> >> Definitely better than others e.g. Boost.
>
> We are certainly NOT equivalent to state-of-the-art Java implementations. Take a look at http://dx.doi.org/10.1145/2093157.2093184 for example.
>
> - Tony
>
> Antony Hosking | Associate Professor | Computer Science | Purdue University
> 305 N. University Street | West Lafayette | IN 47907 | USA
> Mobile +1 765 427 5484

From jay.krell at cornell.edu Fri Jun 8 13:50:14 2012
From: jay.krell at cornell.edu (Jay K)
Date: Fri, 8 Jun 2012 11:50:14 +0000
Subject: [M3devel] [M3commit] CVS Update: cm3 (windows condition variables)
References: <1339098164.65798.YahooMailClassic@web29703.mail.ird.yahoo.com> <8531351D-A02E-4635-971F-C96736810851@cs.purdue.edu> <07030E90-1842-47DB-8D68-86F154E05E0D@m3w.org> <39FC6EBD-FA8B-4C5F-89CC-E83986C0E01E@m3w.org>

I don't fully understand the paper, but clearly people want to both avoid the function call, and the CAS.
And clearly this is viable and often profitable -- oftentimes locks are only ever acquired by one thread, or are locked many times by one thread, then many times by another, etc. The tricky part is adapting to determine which locks benefit, and handling the "transitions" (or "bias revocation") when a "second" thread does acquire the lock.

Traditional C/C++ systems are always going to have the function call.
Whether or not the CAS can be optimized away in such "unmanaged" systems, I don't know.
For example, Win32 SRWLOCKs have no "cleanup" function, nor a required "initialize" function, so that might limit the flexibility of the implementation, though it is certainly also advantageous.

 - Jay

From hosking at cs.purdue.edu Fri Jun 8 16:40:35 2012
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Fri, 8 Jun 2012 10:40:35 -0400
Subject: [M3devel] [M3commit] CVS Update: cm3 (windows condition variables)
References: <1339098164.65798.YahooMailClassic@web29703.mail.ird.yahoo.com> <8531351D-A02E-4635-971F-C96736810851@cs.purdue.edu> <07030E90-1842-47DB-8D68-86F154E05E0D@m3w.org> <39FC6EBD-FA8B-4C5F-89CC-E83986C0E01E@m3w.org>
Message-ID: <9A392836-2304-4D12-BB87-78A01C7391DF@cs.purdue.edu>

Right.
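To make the "uncontended acquire is a single CAS in user space" point from this thread concrete, here is a minimal C11 sketch. It is an illustration under stated assumptions, not how m3core, pthreads, Win32 or HotSpot implement their locks: the names `thin_lock`, `thin_try_lock`, `thin_lock_acquire` and `thin_lock_release` are hypothetical, the slow path merely yields (POSIX `sched_yield`) where a real lock would block in the kernel, and no thread biasing is attempted.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical "thin" lock: 0 = free, 1 = held. Not m3core's layout. */
typedef struct {
    atomic_int state;
} thin_lock;

#define THIN_LOCK_INIT { 0 }

/* Fast path: one compare-and-swap in user space, no kernel call.
   static inline lets the compiler expand it at the call site, which is
   the "inline the CAS, avoid the function call" idea. */
static inline bool thin_try_lock(thin_lock *l)
{
    int expected = 0;
    return atomic_compare_exchange_strong_explicit(
        &l->state, &expected, 1,
        memory_order_acquire, memory_order_relaxed);
}

/* Slow path, reached only under contention. Assumption: a real lock
   would block here (futex, pthread mutex, Win32 event); yielding is a
   stand-in that keeps the sketch short. */
static void thin_lock_slow(thin_lock *l)
{
    while (!thin_try_lock(l))
        sched_yield();
}

static inline void thin_lock_acquire(thin_lock *l)
{
    if (!thin_try_lock(l))
        thin_lock_slow(l);
}

static inline void thin_lock_release(thin_lock *l)
{
    /* Releasing a lock that is not held is a caller bug. */
    assert(atomic_load_explicit(&l->state, memory_order_relaxed) == 1);
    atomic_store_explicit(&l->state, 0, memory_order_release);
}
```

A caller would declare `thin_lock l = THIN_LOCK_INIT;` and bracket the critical section with `thin_lock_acquire(&l)` and `thin_lock_release(&l)`. Removing even the CAS, as the biased-locking work referenced in the thread aims to do, needs an ownership and revocation protocol that this sketch does not cover.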
From hendrik at topoi.pooq.com Fri Jun 8 16:55:40 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Fri, 8 Jun 2012 10:55:40 -0400
Subject: [M3devel] [M3commit] CVS Update: cm3
References: <20120606064732.2C9242474003@birch.elegosoft.com> <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org> <20120606161808.7F5EA1A205B@async.async.caltech.edu> <20120607163641.A81351A205B@async.async.caltech.edu> <20120607211135.GA6314@topoi.pooq.com>
Message-ID: <20120608145540.GA10805@topoi.pooq.com>

On Fri, Jun 08, 2012 at 02:13:02AM +0000, Jay K wrote:
>
> > I'd like to, if I only knew how. I'd be really interested in having the
> > low-level infrastructure for JIT code generators
> Would you be satisfied with a Modula-3 interpreter that interpreted a
> mostly-compiled form? It shouldn't be difficult.

That would be lovely, for all the reasons and opportunities you mentioned, but it's mostly orthogonal to what I want.

I want to write JIT implementations for other languages, languages that have their own methods for defining data structures, but I want them to be interoperable with the Modula 3 I know and like.

I don't mind writing a code generator or two, if necessary. But an interpreter would provide portability instead of efficiency. Having both could be useful.

For example, I'd like to implement a formalism that enables me to download code from the net, formally verify its safety and then be able to execute it really fast. Yes, I might be compiling it all at once instead of a line at a time, but I do want to be able to add it to an existing running program, and saying "JIT" is about the easiest brief summary.

I'm quite aware that doing more than a half-assed version of this would be a big project, and that's probably an understatement.

> I don't know if our intermediate code was designed with interpretation
> in mind, but it seems like it wouldn't be particularly difficult.
> You'd want a "linker" that just zips all the files and puts it "in" or
> "next to" the stub executable. This would solve the distribution
> format problem, partly. The existing intermediate code is
> platform-specific, but not by much (again: jumpbuf size, word size,
> endian, win32 vs. posix).
> But I have to admit, I'm keener on generating C than a JIT or an
> interpreter, and interpreter is not JIT.
> Um. What do you hope to gain from JIT?

The ability to dynamically add code to an existing program and have it run fast. Possibly to have the program generate additional code to add to itself.

> A big reason I ask..is
> because..well, do you want to ship some portable-executable that
> relies on JIT being already installed/available? Or do you want to
> carry the JITer and its code together? Or do you want to target an
> existing widely deployed JITer such as CLR or Java? In my opinion,
> the biggest advantage of JIT is portable-executable, depending on
> widely deployed JITer. But targeting CLR or Java isn't as easy as
> targeting your own custom thing. I understand there are other
> advantages -- faster compilation, optimization very specific to
> runtime environment. But I think portable-executable is most important.
> That's why I like "script". :) There are disadvantages to JIT: slower
> execution/startup, maybe harder to debug, easy to reverse engineer (if
> you care). Heck, at some point you just ship the compiler and
> portable-executable is source code. There are pluses and minuses all
> around.

JIT is for speed. Otherwise, interpretation would suffice, and could even be portable. But even an interpreter would like to be able to add new garbage-collectible types, which is what I'm asking for at the moment.
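The quoted remark that the intermediate code is platform-specific only in jumpbuf size, word size, endianness and win32 vs. posix suggests what a loader or interpreter for a shipped "mostly-compiled form" would need to check first. The following C sketch is a guess at that shape only: `ir_target`, `ir_host_target` and `ir_target_matches` are hypothetical names, nothing here is taken from cm3 or M3CG, and "word size" is assumed to mean pointer size.

```c
#include <assert.h>
#include <setjmp.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of the four properties the thread lists as baked
   into the intermediate code. Field names are illustrative only. */
typedef struct {
    size_t jmpbuf_size;   /* sizeof(jmp_buf) on the target */
    size_t word_size;     /* bytes per word; assumed equal to pointer size */
    int    little_endian; /* 1 = little-endian, 0 = big-endian */
    int    win32;         /* 1 = Win32, 0 = POSIX-like */
} ir_target;

/* Probe byte order by looking at the first byte of a 16-bit 1. */
static int host_is_little_endian(void)
{
    const uint16_t probe = 1;
    return *(const unsigned char *)&probe == 1;
}

/* Describe the machine this code is running on. */
static ir_target ir_host_target(void)
{
    ir_target t;
    t.jmpbuf_size   = sizeof(jmp_buf);
    t.word_size     = sizeof(void *);
    t.little_endian = host_is_little_endian();
#ifdef _WIN32
    t.win32 = 1;
#else
    t.win32 = 0;
#endif
    return t;
}

/* A loader could refuse intermediate code built for a different target. */
static int ir_target_matches(const ir_target *built_for, const ir_target *host)
{
    assert(built_for != NULL && host != NULL);
    return built_for->jmpbuf_size   == host->jmpbuf_size
        && built_for->word_size     == host->word_size
        && built_for->little_endian == host->little_endian
        && built_for->win32         == host->win32;
}
```

With only four fields in play, a distribution format could plausibly carry one copy of the intermediate code per distinct combination, or record the combination in a header and reject mismatches as above.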
From hosking at cs.purdue.edu Fri Jun 8 16:39:39 2012
From: hosking at cs.purdue.edu (Tony Hosking)
Date: Fri, 8 Jun 2012 10:39:39 -0400
Subject: [M3devel] [M3commit] CVS Update: cm3 (windows condition variables)
In-Reply-To: <07030E90-1842-47DB-8D68-86F154E05E0D@m3w.org>
References: <1339098164.65798.YahooMailClassic@web29703.mail.ird.yahoo.com> <8531351D-A02E-4635-971F-C96736810851@cs.purdue.edu> <07030E90-1842-47DB-8D68-86F154E05E0D@m3w.org>
Message-ID: <7FF89030-927D-40C6-993D-DB44E88A35AD@cs.purdue.edu>

Agreed, but we should be able to inline the CAS, avoiding a function call.

On Jun 8, 2012, at 6:38 AM, Dragiša Durić wrote:

> At least under Linux, uncontended access to futex is (IMHO) CAS based, user space operation.
>
> Same thing?

From dabenavidesd at yahoo.es Fri Jun 8 17:20:48 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Fri, 8 Jun 2012 16:20:48 +0100 (BST)
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To: <20120608145540.GA10805@topoi.pooq.com>
Message-ID: <1339168848.48067.YahooMailClassic@web29701.mail.ird.yahoo.com>

Hi all:
Interesting, someone did that (see others in a web search engine):
http://compilers.iecc.com/comparch/article/98-03-247
Besides a partial JVM. It would be a selling point for CM3 to be readily implemented and efficient.
Thanks in advance

--- On Fri, 8/6/12, Hendrik Boom wrote:

From: Hendrik Boom
Subject: Re: [M3devel] [M3commit] CVS Update: cm3
To: "m3devel"
Date: Friday, 8 June 2012 09:55

On Fri, Jun 08, 2012 at 02:13:02AM +0000, Jay K wrote:
>
> > I'd like to, if I only knew how. I'd be really interested in having the
> > low-level infrastructure for JIT code generators
> Would you be satisfied with a Modula-3 interpreter that interpreted a
> mostly-compiled form? It shouldn't be difficult.

That would be lovely, for all the reasons and opportunities you mentioned, but it's mostly orthogonal to what I want.

I want to write JIT implementations for other languages, languages that have their own methods for defining data structures, but I want them to be interoperable with the Modula 3 I know and like.

I don't mind writing a code generator or two, if necessary. But an interpreter would provide portability instead of efficiency. Having both could be useful.

For example, I'd like to implement a formalism that enables me to download code from the net, formally verify its safety and then be able to execute it really fast.
From dmuysers at hotmail.com Fri Jun 8 17:37:04 2012
From: dmuysers at hotmail.com (Dirk Muysers)
Date: Fri, 8 Jun 2012 17:37:04 +0200
Subject: [M3devel] [M3commit] CVS Update: cm3
In-Reply-To: <20120608145540.GA10805@topoi.pooq.com>
References: <20120606064732.2C9242474003@birch.elegosoft.com> <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org> <20120606161808.7F5EA1A205B@async.async.caltech.edu> <20120607163641.A81351A205B@async.async.caltech.edu> <20120607211135.GA6314@topoi.pooq.com> <20120608145540.GA10805@topoi.pooq.com>

That would be relatively easy. libjit offers an excellent infrastructure for building just-in-time compilers.
On the downside: slow program start and a considerable waste of memory resources. Their code generator is as good as non-optimised C.
An example: a JIT translator for Oberon.

--------------------------------------------------
From: "Hendrik Boom"
Sent: Friday, June 08, 2012 4:55 PM
To: "m3devel"
Subject: Re: [M3devel] [M3commit] CVS Update: cm3

> On Fri, Jun 08, 2012 at 02:13:02AM +0000, Jay K wrote:
>>
>> > I'd like to, if I only knew how. I'd be really interested in having the
>> > low-level infrastructure for JIT code generators
>> Would you be satisfied with a Modula-3 interpreter that interpreted a
>> mostly-compiled form? It shouldn't be difficult.
>
> That would be lovely, for all the reasons and opportunities you
> mentioned, but it's mostly orthogonal to what I want.
>
> I want to write JIT implementations for other languages, languages that
> have their own methods for defining data structures, but I want them to
> be interoperable with the Modula 3 I know and like.
From dabenavidesd at yahoo.es Fri Jun 8 20:50:23 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Fri, 8 Jun 2012 19:50:23 +0100 (BST)
Subject: [M3devel] [M3commit] CVS Update: cm3
Message-ID: <1339181423.68039.YahooMailClassic@web29701.mail.ird.yahoo.com>

Hi all:
Olivetti M3 had an AST-based interpreter, and Vulcan was an AST-based environment; I don't know which was better. Vulcan was heavily parallelized, so it could be nice for making a multi-threaded execution engine. The Olivetti M3 AST toolkit could be a good AST for building an extensible kind of meta-environment (and you could retarget it to C); for instance, use it to generate a portable environment in that sense, and then execute it on the fast, parallel Vulcan to make a fast JIT builder.
Thanks in advance

--- On Fri, 8/6/12, Dirk Muysers wrote:

From: Dirk Muysers
Subject: Re: [M3devel] [M3commit] CVS Update: cm3
To: "Hendrik Boom", "m3devel"
Date: Friday, 8 June 2012 10:37

That would be relatively easy. libjit offers an excellent infrastructure for building just-in-time compilers.
On the downside: slow program start and a considerable waste of memory resources. Their code generator is as good as non-optimised C.
An example: A JIT translator for Oberon. -------------------------------------------------- From: "Hendrik Boom" Sent: Friday, June 08, 2012 4:55 PM To: "m3devel" Subject: Re: [M3devel] [M3commit] CVS Update: cm3 > On Fri, Jun 08, 2012 at 02:13:02AM +0000, Jay K wrote: >> >>? > I'd like to, if I only knew how.? I'd be really interested in having the >>? > low-level infrastructure for JIT code generators >>? Would you be satisfied with a Modula-3 interpreter that interpreted a >> mostly-compiled form?It shouldn't be difficult. > > That would be lovely, for all the reasons and opportunitied you > mentioned, but it's mostly orthogonal to what I want. > > I want to write JIT implementations for other languages, languages that > have their own methods for defining data structures, but I want them to > be interoperable with the Modula 3 I know and like. > > I don't mind writing a code generator or two, if necessary.? But an > interpreter would provide poratbility instead of efficiency.? Having > both could be useful. > > For example, I'd like to implement a formalism that enables me to > download code from the net, formally verify its safety and then be able > to execute it really fast.? Yes, I might be comiling it all at once > instead of a line at at time, but I do want to be able to add it to an > existing running program, and saying "JIT" is about the easiest brief > summary. > > I'm quite aware that doing more than a half-assed version of this would > be a big project, and that's probably an understatement. >? >> I don't know if our intermediate code was designed with interpretation >> in mind, but it seems like it wouldn't be particularly difficult. >> You'd want a "linker" that just zips all the files and puts it "in" or >> "next to" the stub executable.? This would solve the distribution >> format problem, partly.The existing intermediate code is >> platform-specific, but not by much (again: jumpbuf size, word size, >> endian,win32 vs. posix). 
>
>> But I have to admit, I'm keener on generating C than a JIT or an
>> interpreter, and interpreter is not JIT.
>> Um. What do you hope to gain from JIT?
>
> The ability to dynamically add code to an existing program and have it
> run fast. Possibly to have the program generate additional code to add
> to itself.
>
>> A big reason I ask..is
>> because..well, do you want to ship some portable-executable that
>> relies on JIT being already installed/available? Or do you want to
>> carry the JITer and its code together? Or do you want to target an
>> existing widely deployed JITer such as CLR or Java? In my opinion,
>> the biggest advantage of JIT is portable-executable, depending on
>> widely deployed JITer. But targeting CLR or Java isn't as easy as
>> targeting your own custom thing. I understand there are other
>> advantages -- faster compilation, optimization very specific to
>> runtime environment. But I think portable-executable is most important.
>> That's why I like "script". :) There are disadvantages to JIT: slower
>> execution/startup, maybe harder to debug, easy to reverse engineer (if
>> you care). Heck, at some point you just ship the compiler and
>> portable-executable is source code. There are pluses and minuses all
>> around.
>
> JIT is for speed. Otherwise, interpretation would suffice, and could
> even be portable. But even an interpreter would like to be able to add
> new garbage-collectible types, which is what I'm asking for at the
> moment.
>
> - Jay
>

From jay.krell at cornell.edu Sun Jun 10 10:34:36 2012
From: jay.krell at cornell.edu (Jay K)
Date: Sun, 10 Jun 2012 08:34:36 +0000
Subject: [M3devel] reducing our diff to gcc?

reducing our diff to gcc?

Ignore my hacking: extern C, removal of optimizer, removal of gmp/mpfr/mpc..

but wait: do people like removal of gmp/mpfr/mpc? I do. I'm torn.
But to my point:

gimplify.c: I think we can achieve the diff via langhook.gimplify_expr.

tree.def: I think frontends can add their own codes in separate files, so the diff can be removed.

but, tree-nested.c, I doubt this can be avoided..so I'm left probably
just not bothering with the others.

Thoughts?

There is also at least one bug fix...that I could avoid needing.
There is a bug optimizing our form of div/mod.
We could avoid that by going back to function calls, but..again, I'm torn.

If you configure -enable-checking, at least currently, there are asserts that have to be removed.

I think I'll just go ahead and patch 4.7 "completely", w/o overdoing it.

 - Jay

From dragisha at m3w.org Sun Jun 10 10:58:00 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Sun, 10 Jun 2012 10:58:00 +0200
Subject: [M3devel] reducing our diff to gcc?
Message-ID: <58EF8A55-4D81-401C-AC47-C5826F6EE759@m3w.org>

Think Occam. Not overdoing is good idea :).

On Jun 10, 2012, at 10:34 AM, Jay K wrote:

>
> I think I'll just go ahead and patch 4.7 "completely", w/o overdoing it.

From jay.krell at cornell.edu Sun Jun 10 12:31:46 2012
From: jay.krell at cornell.edu (Jay K)
Date: Sun, 10 Jun 2012 10:31:46 +0000
Subject: [M3devel] reducing our diff to gcc?
In-Reply-To: <58EF8A55-4D81-401C-AC47-C5826F6EE759@m3w.org>
References: <58EF8A55-4D81-401C-AC47-C5826F6EE759@m3w.org>

Hehe. If someone builds something over-complicated, am I obligated to strip it back down?
:)

 - Jay

________________________________
> Subject: Re: [M3devel] reducing our diff to gcc?
> From: dragisha at m3w.org
> Date: Sun, 10 Jun 2012 10:58:00 +0200
> CC: m3devel at elegosoft.com
> To: jay.krell at cornell.edu
>
> Think Occam. Not overdoing is good idea :).
>
> On Jun 10, 2012, at 10:34 AM, Jay K wrote:
>
> > I think I'll just go ahead and patch 4.7 "completely", w/o overdoing it.
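On the div/mod point raised above: Modula-3 defines DIV as rounding toward negative infinity and MOD as taking the sign of the divisor, whereas C99's `/` and `%` truncate toward zero. A minimal sketch of what "going back to function calls" could look like on the C side follows. The helper names `m3_div` and `m3_mod` are hypothetical and are not claimed to match anything in m3core or the gcc backend; division by zero and overflow (the most negative value divided by -1) are deliberately not handled.

```c
/* Hypothetical helpers: a sketch of floored division and modulus in C.
   Not taken from m3core or from the cm3 gcc backend. */

/* Floored division: the quotient rounds toward negative infinity. */
long m3_div(long a, long b)
{
    long q = a / b;                       /* C99 truncates toward zero */
    if ((a % b != 0) && ((a < 0) != (b < 0)))
        q -= 1;                           /* inexact and signs differ: step down */
    return q;
}

/* Floored modulus: a nonzero result takes the sign of the divisor. */
long m3_mod(long a, long b)
{
    long r = a % b;                       /* C99: sign follows the dividend */
    if (r != 0 && ((r < 0) != (b < 0)))
        r += b;                           /* move the remainder to the divisor's side */
    return r;
}
```

With these definitions `a == m3_div(a, b) * b + m3_mod(a, b)` still holds; only the rounding direction differs from plain `/` and `%` when the operands have opposite signs.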
>

From dragisha at m3w.org Sun Jun 10 13:05:35 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Sun, 10 Jun 2012 13:05:35 +0200
Subject: [M3devel] reducing our diff to gcc?
References: <58EF8A55-4D81-401C-AC47-C5826F6EE759@m3w.org>
Message-ID: <5663CB3D-C3ED-4BA0-823F-4D251B29F1A6@m3w.org>

That makes your change complicated :). So, no! :)

On Jun 10, 2012, at 12:31 PM, Jay K wrote:

>
> Hehe. If someone builds something over-complicated, am I obligated to strip it back down?
> :)
>
> - Jay
>
> ________________________________
>> Subject: Re: [M3devel] reducing our diff to gcc?
>> From: dragisha at m3w.org
>> Date: Sun, 10 Jun 2012 10:58:00 +0200
>> CC: m3devel at elegosoft.com
>> To: jay.krell at cornell.edu
>>
>> Think Occam. Not overdoing is good idea :).
>>
>> On Jun 10, 2012, at 10:34 AM, Jay K wrote:
>>
>>
>> I think I'll just go ahead and patch 4.7 "completely", w/o overdoing it.
>>
>

From dragisha at m3w.org Sun Jun 10 16:16:00 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Sun, 10 Jun 2012 16:16:00 +0200
Subject: [M3devel] new kid on the block: http://lycus.org/
Message-ID: <3590891F-3B7B-46B1-83F6-7155F9254927@m3w.org>

Maybe of interest. A friend of mine, D fan, sent this to me.

From rodney_bates at lcwb.coop Mon Jun 11 14:39:09 2012
From: rodney_bates at lcwb.coop (Rodney M. Bates)
Date: Mon, 11 Jun 2012 07:39:09 -0500
Subject: [M3devel] reducing our diff to gcc?
Message-ID: <4FD5E6ED.3040503@lcwb.coop>

On 06/10/2012 03:34 AM, Jay K wrote:
>
> reducing our diff to gcc?
>
>
> Ignore my hacking: extern C, removal of optimizer, removal of gmp/mpfr/mpc..
>
>
> but wait: do people like removal of gmp/mpfr/mpc? I do. I'm torn.
>
> But to my point:
>
>
> gimplify.c: I think we can achieve the diff via langhook.gimplify_expr.
>
>
> tree.def: I think frontends can add their own codes in separate files, so the diff can be removed.
>
>
> but, tree-nested.c, I doubt this can be avoided..so I'm left probably
> just not bothering with the others.
>

tree-nested.c has been a thorn in my side from its inception. It broke a whole lot of stuff in m3gdb, everything that has to do with nonlocal variable access and/or variables of procedure type. It reshuffles the activation record around, with multiple copies of lots of things, especially the static link, which has either two, or, if I remember right, three copies in different places. Moreover, they don't all point to the same place in their target AR.

All this wouldn't be too bad, if we got the debug info altered to reflect the reality, but by the time tree-nested does its thing, it's kind of late to do that easily. That's one of the attractions of llvm to me, that it's well set up to transform both the code and its debug info in parallel, when doing optimization. Maybe gcc would be easier too, if we didn't do our own debug info production in parse.c. That could be a lot of work, but would fit nicely with switching to dwarf.

As I understood it, all of the changes tree-nested.c makes are really only needed for the interaction between nonlocal variable access _and_ inlining. The last I knew we have had inlining disabled from the beginning anyway. Jay, if this is still true, and as you are into disabling various gcc optimizations, what would you think of just disabling what tree-nested does?

>
> Thoughts?
>
>
> There is also at least one bug fix...that I could avoid needing.
> There is a bug optimizing our form of div/mod.
> We could avoid that by going back to function calls, but..again, I'm torn.
>
>
> If you configure -enable-checking, at least currently, there are asserts that have to be removed.
>
>
> I think I'll just go ahead and patch 4.7 "completely", w/o overdoing it.
>
>
> - Jay
>

From jay.krell at cornell.edu Mon Jun 11 21:07:26 2012
From: jay.krell at cornell.edu (Jay K)
Date: Mon, 11 Jun 2012 19:07:26 +0000
Subject: [M3devel] reducing our diff to gcc?
In-Reply-To: <4FD5E6ED.3040503@lcwb.coop>
References: <4FD5E6ED.3040503@lcwb.coop>

 > Maybe gcc would be easier too, if we didn't do our own debug
 > info production in parse.c.

Correct. It is "our fault" for doing weird things debugging-wise.

 > That could be a lot of work

It is "the right amount of work", but yeah, kind of a lot.

 > but would fit nicely with switching to dwarf.

We'd just use -g and use whatever gcc wants for the target system.
Sometimes Dwarf, sometimes not, we wouldn't care.

 > As I understood it, all of the changes tree-nested.c makes are really only
 > needed for the interaction between nonlocal variable access _and_ inlining.

I don't think so, but I don't know.

 > The last I knew we have had inlining disabled from the beginning anyway.

We have inlining on mostly. Aside from a small sprinkling of "volatile".
Off in gcc 4.6 backend, but I never enabled that and am moving on to 4.7 rapidly.

 > what would you think of just disabling what tree-nested does?

I'm really not sure it is possible.
Sure, if nested functions used only for "lexical hiding" of the functions themselves.
But Modula-3 uses the "static link" in a unique-to-itself way.
I don't expect gcc to "just work".
I can explain the Modula-3 unique way if people want.
It turns out..I have thought about this a bunch, there is no good way to handle the static link,
given that you can take the addresses of nested functions. (Right?)

Where you don't take the address, the static link can just be an extra parameter.

Or maybe this is dealt with elsewhere or otherwise...

We do actually use "extra parameter" sometimes for static link.
And maybe elsewhere/otherwise is in the frontend, mostly..just mostly...

There are comments in tree-nested.c indicating it has "bad history".
But actually, I'm not sure it does things so poorly.

The basic theory of nested functions includes stuffing locals into a struct,
at least locals accessed by nested functions, and passing a pointer to that struct
as an extra parameter. The locals include said pointer to struct of locals, in the case
of multiple nesting levels. OR you can "flatten" things, I guess, maybe.

Flattening is problematic though, given nested functions can be mutually recursive
and such..you want to update just one place and have all the other code follow pointers to it.
Optimization can copy around copies instead of pointers, where it is profitable.
Sorry, I don't have time to explain right now.

 - Jay

----------------------------------------
> Date: Mon, 11 Jun 2012 07:39:09 -0500
> From: rodney_bates at lcwb.coop
> To: m3devel at elegosoft.com
> Subject: Re: [M3devel] reducing our diff to gcc?
>
>
>
> On 06/10/2012 03:34 AM, Jay K wrote:
> >
> > reducing our diff to gcc?
> >
> >
> > Ignore my hacking: extern C, removal of optimizer, removal of gmp/mpfr/mpc..
> >
> >
> > but wait: do people like removal of gmp/mpfr/mpc? I do. I'm torn.
> >
> > But to my point:
> >
> >
> > gimplify.c: I think we can achieve the diff via langhook.gimplify_expr.
> >
> >
> > tree.def: I think frontends can add their own codes in separate files, so the diff can be removed.
> >
> >
> > but, tree-nested.c, I doubt this can be avoided..so I'm left probably
> > just not bothering with the others.
> >
>
> tree-nested.c has been a thorn in my side from its inception. It broke a whole
> lot of stuff in m3gdb, everything that has to do with nonlocal variable access
> and/or variables of procedure type. It reshuffles the activation record around,
> with multiple copies of lots of things, especially the static link, which has
> either two, or, if I remember right, three copies in different places. Moreover,
> they don't all point to the same place in their target AR.
> > All this wouldn't be too bad, if we got the debug info altered to reflect the > reality, but by the time tree-nested does its thing, it's kind of late to do > that easily. That's one of the attractions of llvm to me, that it's well set > up to transform both the code and its debug info in parallel, when doing > optimization. Maybe gcc would be easier too, if we didn't do our own debug > info production in parse.c. That could be a lot of work, but would fit > fit nicely with switching to dwarf. > > As I understood it, all of the changes tree-nested.c makes are really only > needed for the interaction between nonlocal variable access _and_ inlining. > The last I knew we have had inlining disabled from the beginning anyway. > Jay, if this is still true, and as you are into disabling various gcc > optimizations, what would you think of just disabling what tree-nested does? > > > > > Thoughts? > > > > > > There is also at least one bug fix...that I could avoid needing. > > There is a bug optimizing our form of div/mod. > > We could avoid that by going back to function calls, but..again, I'm torn. > > > > > > If you configure -enable-checking, at least currently, there are asserts that have to be removed. > > > > > > I think I'll just go ahead and patch 4.7 "completely", w/o overdoing it. > > > > > > - Jay > > From rodney_bates at lcwb.coop Tue Jun 12 18:17:50 2012 From: rodney_bates at lcwb.coop (Rodney M. Bates) Date: Tue, 12 Jun 2012 11:17:50 -0500 Subject: [M3devel] reducing our diff to gcc? In-Reply-To: References: , <4FD5E6ED.3040503@lcwb.coop> Message-ID: <4FD76BAE.7060702@lcwb.coop> On 06/11/2012 02:07 PM, Jay K wrote: > > > Maybe gcc would be easier too, if we didn't do our own debug > > info production in parse.c. > > Correct. It is "our fault" for doing wierd things debugging-wise. > > > That could be a lot of work > > > It is "the right amount of work", but yeah, kind of a lot. > > > but would fit fit nicely with switching to dwarf. 
> We'd just use -g and use whatever gcc wants for the target system.
> Sometimes Dwarf, sometimes not, we wouldn't care.
>

It's going to require quite a lot in m3gdb. Stock gdb has readers for several debug info formats, but there's a lot that is language-dependent, even for C, let alone the other languages supported by stock gdb. I think this has considerable debug-format dependency too, leading to a Cartesian product. It is certainly that way for Modula-3. I would be greatly surprised if gcc didn't also require at least a bit of M3-dependent work, even for dwarf.

>
> > As I understood it, all of the changes tree-nested.c makes are really only
> > needed for the interaction between nonlocal variable access _and_ inlining.
>
>
> I don't think so, but I don't know.
>
>
> > The last I knew we have had inlining disabled from the beginning anyway.
>
>
> We have inlining on mostly. Aside from a small sprinkling of "volatile".
> Off in gcc 4.6 backend, but I never enabled that and am moving on to 4.7 rapidly.
>
>
> > what would you think of just disabling what tree-nested does?
> I'm really not sure it is possible.
> Sure, if nested functions used only for "lexical hiding" of the functions themselves.
> But Modula-3 uses the "static link" in a unique-to-itself way.
> I don't expect gcc to "just work".
> I can explain the Modula-3 unique way if people want.
> It turns out..I have thought about this a bunch, there is no good way to handle the static link,
> given that you can take the addresses of nested functions. (Right?)
>

Please elaborate. Yes, you can take the address of a nested function. But you can only pass it as a parameter. You can't assign it to a variable. This latter restriction requires some runtime enforcement, but I think it is taken care of by explicitly coded runtime checks generated by parse.c or earlier.
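To make the parameter-only rule above concrete, here is a minimal sketch in portable C (deliberately not using gcc's nested-function extension). It assumes a procedure value is represented as a code address plus a static link pointing at the enclosing procedure's locals. Every name in it (`outer_frame`, `proc_closure`, `add_to_total`, `for_each`, `sum_items`) is hypothetical, and it is not claimed to be how cm3, parse.c, or tree-nested.c actually lay things out.

```c
#include <stddef.h>

/* All names are hypothetical; this is an illustration, not cm3 or gcc code. */

/* Locals of the enclosing procedure that the nested procedure touches. */
struct outer_frame {
    int total;
};

/* A procedure value passed as a parameter: code address plus static link. */
struct proc_closure {
    void (*code)(void *static_link, int value);
    void *static_link;
};

/* The "nested" procedure, lifted to top level.
   The static link arrives as an extra parameter. */
static void add_to_total(void *static_link, int value)
{
    struct outer_frame *up = static_link;
    up->total += value;          /* nonlocal access goes through the static link */
}

/* A callee that receives the procedure as a parameter and only calls it. */
static void for_each(const int *items, size_t n, struct proc_closure visit)
{
    for (size_t i = 0; i < n; i++)
        visit.code(visit.static_link, items[i]);
}

/* The enclosing procedure: its frame stays live for the whole call to for_each. */
int sum_items(const int *items, size_t n)
{
    struct outer_frame frame = { 0 };
    struct proc_closure visit = { add_to_total, &frame };
    for_each(items, n, visit);
    return frame.total;
}
```

Because `visit` only ever travels down the call chain as a parameter, `frame` is still live whenever `add_to_total` runs. Storing the closure somewhere that outlives `sum_items` is exactly the case the assignment restriction rules out.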
The nested-function language extension to C, implemented by stock gcc, allows the taking of the address of a nested function, without the restriction against assigning it to a variable, with no linguistic safety added. If, in C, you use such a function "address" after the containing function has returned, to quote from the gcc documentation, "all hell breaks loose". But this should imply that stock gcc support is enough for Modula-3.

>
>
> Where you don't take the address, the static link can just be an extra parameter.
>

Either way, you need a static link, and it is just passed as an extra parameter. In the x86 case, it is always passed in the same register (ecx, if I recall) and always immediately stored by prolog code at the same place in the AR. tree-nested doesn't mess with this, but adds extra static-linkish variable(s) elsewhere in the AR, derived from this one, and uses them in some/all places.

>
> Or maybe this is dealt with elsewhere or otherwise...
>
>
> We do actually use "extra parameter" sometimes for static link.
> And maybe elsewhere/otherwise is in the frontend, mostly..just mostly...
>
>
> There are comments in tree-nested.c indicating it has "bad history".
> But actually, I'm not sure it does things so poorly.
>

I haven't read the comments in later gcc versions, but the bad history I recall is that it greatly simplifies an "insanely complicated" scheme. Unfortunately, the simplification is all compile-time, at the expense of replacing a relatively simple runtime scheme with one I would call at least very complicated, if not insanely so.

> The basic theory of nested functions includes stuffing locals into a struct,
> at least locals accessed by nested functions, and passing a pointer to that struct
> as an extra parameter. The locals include said pointer to struct of locals, in the case
> of multiple nesting levels. OR you can "flatten" things, I guess, maybe.

Actually, it's the other way around. All locals start out in a flat AR.
If the function contains nested function(s), tree-nested collects the locals that are referenced nonlocally (i.e., from within one of the nested functions) into a local struct. Then, the nested functions get and use what you could call a "derived static link" (a better term is needed) that points directly to this struct rather than to the whole AR. I guess this helps with inlining, in case the struct isn't actually located in the same way inside the parent AR. > Flattening is problematic though, given nested functions can be mutually recursive > and such..you want to update just one place and have all the other code follow pointers to it. > Optimization can copy around copies instead of pointers, where it is profitable. > Sorry, I don't have time to explain right now. > > > - Jay > > > ---------------------------------------- >> Date: Mon, 11 Jun 2012 07:39:09 -0500 >> From: rodney_bates at lcwb.coop >> To: m3devel at elegosoft.com >> Subject: Re: [M3devel] reducing our diff to gcc? >> >> >> >> On 06/10/2012 03:34 AM, Jay K wrote: >>> >>> reducing our diff to gcc? >>> >>> >>> Ignore my hacking: extern C, removal of optimizer, removal of gmp/mpfr/mpc.. >>> >>> >>> but wait: do people like removal of gmp/mpfr/mpc? I do. I'm torn. >>> >>> But to my point: >>> >>> >>> gimplify.c: I think we can achieve the diff via langhook.gimplify_expr. >>> >>> >>> tree.def: I think frontends can add their own codes in separate files, so the diff can be removed. >>> >>> >>> but, tree-nested.c, I doubt this can be avoided..so I'm left probably >>> just not bothering with the others. >>> >> >> tree-nested.c has been a thorn in my side from its inception. I broke a whole >> lot of stuff in m3gdb, everything that has to do with nonlocal variable access >> and/or variables of procedure type. It reshuffles the activation record around, >> with multiple copies of lots of things, especially the static link, which has >> either two, or, if I remember right, three copies in different places. 
Moreover,
>> they don't all point to the same place in their target AR.
>>
>> All this wouldn't be too bad, if we got the debug info altered to reflect the
>> reality, but by the time tree-nested does its thing, it's kind of late to do
>> that easily. That's one of the attractions of llvm to me, that it's well set
>> up to transform both the code and its debug info in parallel, when doing
>> optimization. Maybe gcc would be easier too, if we didn't do our own debug
>> info production in parse.c. That could be a lot of work, but would fit
>> nicely with switching to dwarf.
>>
>> As I understood it, all of the changes tree-nested.c makes are really only
>> needed for the interaction between nonlocal variable access _and_ inlining.
>> The last I knew we have had inlining disabled from the beginning anyway.
>> Jay, if this is still true, and as you are into disabling various gcc
>> optimizations, what would you think of just disabling what tree-nested does?
>>
>>>
>>> Thoughts?
>>>
>>>
>>> There is also at least one bug fix...that I could avoid needing.
>>> There is a bug optimizing our form of div/mod.
>>> We could avoid that by going back to function calls, but..again, I'm torn.
>>>
>>>
>>> If you configure -enable-checking, at least currently, there are asserts that have to be removed.
>>>
>>>
>>> I think I'll just go ahead and patch 4.7 "completely", w/o overdoing it.
>>>
>>>
>>> - Jay
>>>
>

From dabenavidesd at yahoo.es Wed Jun 13 04:18:33 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Wed, 13 Jun 2012 03:18:33 +0100 (BST)
Subject: [M3devel] reducing our diff to gcc?
In-Reply-To: <4FD76BAE.7060702@lcwb.coop>
Message-ID: <1339553913.24183.YahooMailClassic@web29705.mail.ird.yahoo.com>

Hi all:
In fact, the language-dependent parts of a debugger are inherently part of the compiler architecture (gdb has to re-implement a lot of machinery from gcc; maybe it is still the same code, but it could be reorganized to cut it down if that is done in C). I have heard m3gdb is about 20k lines of code, which is hard for me, and worse in C. I think a full debugger can be implemented in that many lines, at least ldb is like that, so I don't know how much of m3gdb is really not in gdb.

Now, m3gcc or m3cgc or m3cg or m3cc is not of interest to GNU, so why keep it like that? We should use it as a real backend, not just for one language but as a real architecture; since it isn't one today, what would it take to do that? In fact that's what we are trying to do with JIT, right? What I have found tells me that, AFAIK, C code tends to be more portable in the form of a stack architecture like M3CG than anything else. On the other hand, I don't know how many of us want to compile gcc over and over again each time we compile a Modula-3 distribution (I do). Now, I don't think gcc wants to add and support our ideal architecture, but who knows, if the thing works for us, maybe they will want it, won't they?

Thanks in advance

From jay.krell at cornell.edu Sat Jun 16 08:09:33 2012
From: jay.krell at cornell.edu (Jay K)
Date: Sat, 16 Jun 2012 06:09:33 +0000
Subject: [M3devel] help test 4.7 backend?

help test 4.7 backend?

Can folks try out the new 4.7 backend?

edit m3-sys/m3cc/src/m3makefile
add your platform to the list near the top, mapped to "47"
and then run scripts/python/boot2.sh

and then, do it again, but edit config/Unix.common, the function m3_backend
to always args += m3back_optimize

and optionally but preferably try with -O3 instead of -O2 in the same file

and try running some GUI apps like solitaire

I could use help particularly with:
 SPARC{32,64}_LINUX
 PPC_{LINUX,OPENBSD,NETBSD,FREEBSD,DARWIN}
 ALPHA_OSF
 I386_LINUX, I386_INTERIX, I386_MINGWIN, I386_CYGWIN, because I'm being lazy
I can do various x86/amd64, either in a VM or opencsw, but splitting that load would be good too.

I might go back to not having much time soon or temporarily.

Still to do:
  apply OpenBSD patches
  update from 4.7.0 to 4.7.1 that was just released.

Thanks,
 - Jay

From jay.krell at cornell.edu Sat Jun 16 10:47:35 2012
From: jay.krell at cornell.edu (Jay K)
Date: Sat, 16 Jun 2012 08:47:35 +0000
Subject: [M3devel] ALPHA_LINUX
References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org>, <20120607011634.468b6bbf@wenus.next.com.pl>, <741029B0-331E-4E10-9886-86A78B0ED3CC@m3w.org>

> > Is cm3 working on LINUX_ALPHA? I have one ES40 working server with Gentoo Linux
> I don't think it does yet, but give me ssh access and I can most likely make it work pretty quickly.
> There is very very very little to porting these days.

So forgetful of me.
Yes, it works.
See: http://www.opencm3.net/uploaded-archives/index.html

 - Jay

From jay.krell at cornell.edu Sat Jun 16 11:03:22 2012
From: jay.krell at cornell.edu (Jay K)
Date: Sat, 16 Jun 2012 09:03:22 +0000
Subject: [M3devel] IA64_LINUX
References: <20120606064732.2C9242474003@birch.elegosoft.com>, <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org>, <20120607011634.468b6bbf@wenus.next.com.pl>, <741029B0-331E-4E10-9886-86A78B0ED3CC@m3w.org>

also IA64_LINUX, I thought I was working on recently, yet I already put up here a while ago:

http://www.opencm3.net/uploaded-archives/index.html

I don't remember if I solved the coding for finding the register spill stack..and indeed..I don't see the code in m3core...so a little bit to do there... I expect there might be a GC bug there..or maybe we should make all stores volatile..or something...
 - Jay

From: jay.krell at cornell.edu
To: dragisha at m3w.org; dknoto at gmail.com
CC: m3devel at elegosoft.com
Subject: ALPHA_LINUX
Date: Sat, 16 Jun 2012 08:47:35 +0000

> > Is cm3 working on LINUX_ALPHA? I have one ES40 working server with Gentoo Linux
> I don't think it does yet, but give me ssh access and I can most likely make it work pretty quickly.
> There is very very very little to porting these days.

So forgetful of me.
Yes, it works.
See: http://www.opencm3.net/uploaded-archives/index.html

 - Jay

From mark at wickensonline.co.uk Sat Jun 16 11:49:45 2012
From: mark at wickensonline.co.uk (Mark Wickens)
Date: Sat, 16 Jun 2012 10:49:45 +0100
Subject: [M3devel] IA64_LINUX
In-Reply-To:
References: <20120606064732.2C9242474003@birch.elegosoft.com> <55A889D3-F9CA-4CDC-8F54-82788B68041C@m3w.org> <20120607011634.468b6bbf@wenus.next.com.pl> <741029B0-331E-4E10-9886-86A78B0ED3CC@m3w.org>
Message-ID:

If you feel the need to address the issues, let me know and I'll put the ZX6000 online for you.

Mark.

Sent from my iPad

On 16 Jun 2012, at 10:03, Jay K wrote:

> also IA64_LINUX, which I thought I was working on recently, yet I already put it up here a while ago:
>
> http://www.opencm3.net/uploaded-archives/index.html
>
> I don't remember if I solved the coding for finding the register spill stack..and indeed..I don't see the code in m3core...so a little bit to do there... I expect there might be a GC bug there..or maybe we should make all stores volatile..or something...
>
> - Jay
>
> From: jay.krell at cornell.edu
> To: dragisha at m3w.org; dknoto at gmail.com
> CC: m3devel at elegosoft.com
> Subject: ALPHA_LINUX
> Date: Sat, 16 Jun 2012 08:47:35 +0000
>
> > > Is cm3 working on LINUX_ALPHA? I have one ES40 working server with Gentoo Linux
> > I don't think it does yet, but give me ssh access and I can most likely make it work pretty quickly.
> > There is very very very little to porting these days.
>
> So forgetful of me.
>
> Yes, it works.
>
> See:
> http://www.opencm3.net/uploaded-archives/index.html
>
> - Jay
>

From dragisha at m3w.org Sun Jun 17 20:36:02 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Sun, 17 Jun 2012 20:36:02 +0200
Subject: [M3devel] g_open, GLib wrapper for open()
Message-ID:

From doc:
===
There is a group of functions which wrap the common POSIX functions dealing with filenames (g_open(), g_rename(), g_mkdir(), g_stat(), g_unlink(), g_remove(), g_fopen(), g_freopen()). The point of these wrappers is to make it possible to handle file names with any Unicode characters in them on Windows without having to use ifdefs and the wide character API in the application code.

The pathname argument should be in the GLib file name encoding. On POSIX this is the actual on-disk encoding which might correspond to the locale settings of the process (or the G_FILENAME_ENCODING environment variable), or not.

On Windows the GLib file name encoding is UTF-8. Note that the Microsoft C library does not use UTF-8, but has separate APIs for current system code page and wide characters (UTF-16). The GLib wrappers call the wide character API if present (on modern Windows systems), otherwise convert to/from the system code page.
===
Template for g_open is:

int g_open (const gchar *filename, int flags, int mode);

Obviously, I need FilePosix.i3 and descendants, but what about under Windows? Has anyone met or solved this?

TIA,
dd

From dabenavidesd at yahoo.es Mon Jun 18 22:57:07 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Mon, 18 Jun 2012 21:57:07 +0100 (BST)
Subject: [M3devel] g_open, GLib wrapper for open()
In-Reply-To:
Message-ID: <1340053027.33106.YahooMailClassic@web29705.mail.ird.yahoo.com>

Hi all:
Win32 doesn't support the Unicode character set natively; instead it offers separate string APIs for ANSI and Unicode, as does the C run-time library, just as you say, whereas on Windows NT Unicode is the native code set for all strings. But I don't think Win32 on Win98 is a common type of system these days, so I guess you can be safe without that. Couldn't you?
Thanks in advance

--- On Sun, 17/6/12, Dragiša Durić wrote:

From: Dragiša Durić
Subject: [M3devel] g_open, GLib wrapper for open()
To: "m3devel"
Date: Sunday, 17 June 2012 13:36

From doc:
===
There is a group of functions which wrap the common POSIX functions dealing with filenames (g_open(), g_rename(), g_mkdir(), g_stat(), g_unlink(), g_remove(), g_fopen(), g_freopen()). The point of these wrappers is to make it possible to handle file names with any Unicode characters in them on Windows without having to use ifdefs and the wide character API in the application code.

The pathname argument should be in the GLib file name encoding. On POSIX this is the actual on-disk encoding which might correspond to the locale settings of the process (or the G_FILENAME_ENCODING environment variable), or not.

On Windows the GLib file name encoding is UTF-8. Note that the Microsoft C library does not use UTF-8, but has separate APIs for current system code page and wide characters (UTF-16). The GLib wrappers call the wide character API if present (on modern Windows systems), otherwise convert to/from the system code page.
===
Template for g_open is:

int g_open (const gchar *filename, int flags, int mode);

Obviously, I need FilePosix.i3 and descendants, but what about under Windows? Has anyone met or solved this?

TIA,
dd

-------------- next part --------------
An HTML attachment was scrubbed...
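
The conversion that the quoted GLib documentation describes, from a UTF-8 file name to the UTF-16 form expected by the Windows wide-character API, can be sketched in portable C. This is only an illustration under stated assumptions: utf8_to_utf16 is a hypothetical helper name, it is not part of GLib or of cm3, and it does not reject overlong UTF-8 encodings.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a GLib or cm3 function): convert a
   NUL-terminated UTF-8 string into UTF-16 code units.
   Returns the number of code units written, not counting the
   terminating 0, or -1 on malformed input or if dst is too small.
   Simplification: overlong encodings are not rejected. */
static long utf8_to_utf16(const char *src, uint16_t *dst, size_t cap)
{
    const unsigned char *s = (const unsigned char *)src;
    size_t n = 0;

    while (*s != 0) {
        uint32_t cp;
        int extra;   /* number of continuation bytes that must follow */

        if (*s < 0x80)                { cp = *s;        extra = 0; }
        else if ((*s & 0xE0) == 0xC0) { cp = *s & 0x1F; extra = 1; }
        else if ((*s & 0xF0) == 0xE0) { cp = *s & 0x0F; extra = 2; }
        else if ((*s & 0xF8) == 0xF0) { cp = *s & 0x07; extra = 3; }
        else return -1;   /* stray continuation byte or invalid lead byte */
        s++;

        for (int i = 0; i < extra; i++, s++) {
            /* A continuation byte looks like 10xxxxxx; this test also
               catches a string that ends in the middle of a sequence. */
            if ((*s & 0xC0) != 0x80) return -1;
            cp = (cp << 6) | (uint32_t)(*s & 0x3F);
        }

        /* Surrogate code points and values above U+10FFFF are invalid. */
        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF)) return -1;

        if (cp < 0x10000) {
            if (n + 1 >= cap) return -1;   /* keep room for the terminator */
            dst[n++] = (uint16_t)cp;
        } else {
            if (n + 2 >= cap) return -1;
            cp -= 0x10000;                 /* split into a surrogate pair */
            dst[n++] = (uint16_t)(0xD800 | (cp >> 10));
            dst[n++] = (uint16_t)(0xDC00 | (cp & 0x3FF));
        }
    }

    if (n >= cap) return -1;
    dst[n] = 0;
    return (long)n;
}
```

On Windows the resulting buffer could be handed to a wide-character call such as _wopen, and the Win32 function MultiByteToWideChar with CP_UTF8 performs the same conversion; a Modula-3 binding would presumably need an equivalent step before calling the wide-character file API.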
From dragisha at m3w.org Tue Jun 19 09:15:32 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Tue, 19 Jun 2012 09:15:32 +0200
Subject: [M3devel] g_open, GLib wrapper for open()
In-Reply-To: <1340053027.33106.YahooMailClassic@web29705.mail.ird.yahoo.com>
References: <1340053027.33106.YahooMailClassic@web29705.mail.ird.yahoo.com>
Message-ID: <110BFBCA-C682-4210-8D44-375550B6DB55@m3w.org>

You cannot do without it. Once you need to access a file from a Gtk application, and the file's name contains at least one Unicode character, you cannot ignore it.

On Jun 18, 2012, at 10:57 PM, Daniel Alejandro Benavides D. wrote:

> Hi all:
> Win32 doesn't support the Unicode character set natively; instead it offers separate string APIs for ANSI and Unicode, as does the C run-time library, just as you say, whereas on Windows NT Unicode is the native code set for all strings. But I don't think Win32 on Win98 is a common type of system these days, so I guess you can be safe without that. Couldn't you?
> Thanks in advance
>
> --- On Sun, 17/6/12, Dragiša Durić wrote:
>
> From: Dragiša Durić
> Subject: [M3devel] g_open, GLib wrapper for open()
> To: "m3devel"
> Date: Sunday, 17 June 2012 13:36
>
> From doc:
> ===
> There is a group of functions which wrap the common POSIX functions dealing with filenames (g_open(), g_rename(), g_mkdir(), g_stat(), g_unlink(), g_remove(), g_fopen(), g_freopen()). The point of these wrappers is to make it possible to handle file names with any Unicode characters in them on Windows without having to use ifdefs and the wide character API in the application code.
>
> The pathname argument should be in the GLib file name encoding. On POSIX this is the actual on-disk encoding which might correspond to the locale settings of the process (or the G_FILENAME_ENCODING environment variable), or not.
>
> On Windows the GLib file name encoding is UTF-8.
> Note that the Microsoft C library does not use UTF-8, but has separate APIs for current system code page and wide characters (UTF-16). The GLib wrappers call the wide character API if present (on modern Windows systems), otherwise convert to/from the system code page.
> ===
> Template for g_open is:
>
> int g_open (const gchar *filename,
>             int flags,
>             int mode);
> Obviously, I need FilePosix.i3 and descendants, but what about under Windows? Has anyone met or solved this?
>
> TIA,
> dd
>

From hendrik at topoi.pooq.com Tue Jun 19 16:35:34 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Tue, 19 Jun 2012 10:35:34 -0400
Subject: [M3devel] missing m3gdb?
Message-ID: <20120619143534.GA30034@topoi.pooq.com>

Having downloaded the development version in mid-May and succeeded in
building cm3-all-AMD64_LINUX-d5.9.0-20120518.deb, I then removed my
existing Modula 3, installed the new .deb, and proceeded to use it with
no problems until today.

Today I tried to use the debugger, and discovered that m3gdb is missing.

Did I bungle something or was m3gdb left out of the script for building
the .deb for some reason? If the latter, is it still missing?

The only package I remember deliberately removing is ESC, which didn't
compile.

-- hendrik

From hendrik at topoi.pooq.com Tue Jun 19 17:13:04 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Tue, 19 Jun 2012 11:13:04 -0400
Subject: [M3devel] missing m3gdb?
In-Reply-To: <20120619143534.GA30034@topoi.pooq.com>
References: <20120619143534.GA30034@topoi.pooq.com>
Message-ID: <20120619151304.GB30034@topoi.pooq.com>

On Tue, Jun 19, 2012 at 10:35:34AM -0400, Hendrik Boom wrote:
> Having downloaded the development version in mid-May and succeeded in
> building cm3-all-AMD64_LINUX-d5.9.0-20120518.deb, I then removed my
> existing Modula 3, installed the new .deb, and proceeded to use it with
> no problems until today.
>
> Today I tried to use the debugger, and discovered that m3gdb is missing.
>
> Did I bungle something or was m3gdb left out of the script for building
> the .deb for some reason? If the latter, is it still missing?
>
> The only package I remember deliberately removing is ESC, which didn't
> compile.

I don't know if this is relevant, but:

On LINUXLIBC6, which I've only partially recompiled so far from those same mid-May sources, I get

(m3gdb) bt
#0 0x0804c75e in RunSeq (code=0xb6c3436c, exec=0xbfdad6d4) at ../src/PqCd.m3:907
#1 0x0804c950 in EnvRunMe (self=0xb6c34308) at ../src/PqCd.m3:923
Debug info for file "Stupid.mc" not in stabs format
(m3gdb)

which suggests there may be some incompatibility, possibly caused by the partial recompilation of Modula 3. I don't know whether the debugger is there from my initial download or from my recompilation.

>
> -- hendrik

From dabenavidesd at yahoo.es Tue Jun 19 17:57:52 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Tue, 19 Jun 2012 16:57:52 +0100 (BST)
Subject: [M3devel] missing m3gdb?
In-Reply-To: <20120619143534.GA30034@topoi.pooq.com>
Message-ID: <1340121472.14046.YahooMailClassic@web29706.mail.ird.yahoo.com>

Hi all:
I haven't checked the last (released) cm3-std build, but I didn't need it since nothing broke at build time. ESC hasn't been compiled since the last CM3, as the HP version didn't compile for me (though an older CM3 did compile the same HP version), so I tried that and it worked OK, which might be good for timing it. I don't know whether your m3cgc works with other releases or not; I guess it should not break m3gdb support (whichever m3cgc you use).
My main comment here is that you shouldn't update anything unless it isn't working OK (I guess this is pure SW Eng blah blah, but if it works ...).
Thanks in advance

--- On Tue, 19/6/12, Hendrik Boom wrote:

From: Hendrik Boom
Subject: [M3devel] missing m3gdb?
To: "m3devel"
Date: Tuesday, 19 June 2012 09:35

Having downloaded the development version in mid-May and succeeded in
building cm3-all-AMD64_LINUX-d5.9.0-20120518.deb, I then removed my
existing Modula 3, installed the new .deb, and proceeded to use it with
no problems until today.

Today I tried to use the debugger, and discovered that m3gdb is missing.

Did I bungle something or was m3gdb left out of the script for building
the .deb for some reason? If the latter, is it still missing?

The only package I remember deliberately removing is ESC, which didn't
compile.

-- hendrik

From dragisha at m3w.org Tue Jun 19 18:08:00 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Tue, 19 Jun 2012 18:08:00 +0200
Subject: [M3devel] missing m3gdb?
In-Reply-To: <20120619143534.GA30034@topoi.pooq.com>
References: <20120619143534.GA30034@topoi.pooq.com>
Message-ID: <121739E0-E3E7-486A-905D-C296A6B302BC@m3w.org>

Short answer: If you need m3gdb - use the 5.8.6 release version.

On Jun 19, 2012, at 4:35 PM, Hendrik Boom wrote:

> Having downloaded the development version in mid-May and succeeded in
> building cm3-all-AMD64_LINUX-d5.9.0-20120518.deb, I then removed my
> existing Modula 3, installed the new .deb, and proceeded to use it with
> no problems until today.
>
> Today I tried to use the debugger, and discovered that m3gdb is missing.

From dabenavidesd at yahoo.es Tue Jun 19 18:28:33 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Tue, 19 Jun 2012 17:28:33 +0100 (BST)
Subject: [M3devel] missing m3gdb?
In-Reply-To: <121739E0-E3E7-486A-905D-C296A6B302BC@m3w.org>
Message-ID: <1340123313.64869.YahooMailClassic@web29704.mail.ird.yahoo.com>

Hi all:
I don't think it is so much a waste of time: if the compiler has done well, you don't need a debugger, but if it hasn't, don't waste more time; use a different (optimized) compiler.
ESC had an X postfix for every package name that had been verified, so you could have a separate verified and compiled version as a reference for program behavior and then use an experimental, debuggable version.
That was the idea with a module system with separate compilation and version stamps, IMHO: to really have a fast-to-execute and easy-to-debug cycle, and you need that lately as compiler versions are getting faster or harder to debug.
The interesting question is whether you could use the same infrastructure to verify in less time or not; that will prove whether ESC is worth anything, which I'm sure nobody uses for that reasoning broken -not Dragisha's nor Hendrik's- but most people do ahead of time.
Thanks in advance

--- On Tue, 19/6/12, Dragiša Durić wrote:

From: Dragiša Durić
Subject: Re: [M3devel] missing m3gdb?
To: "Hendrik Boom"
CC: "m3devel"
Date: Tuesday, 19 June 2012 11:08

Short answer: If you need m3gdb - use the 5.8.6 release version.

On Jun 19, 2012, at 4:35 PM, Hendrik Boom wrote:

Having downloaded the development version in mid-May and succeeded in
building cm3-all-AMD64_LINUX-d5.9.0-20120518.deb, I then removed my
existing Modula 3, installed the new .deb, and proceeded to use it with
no problems until today.

Today I tried to use the debugger, and discovered that m3gdb is missing.

From hendrik at topoi.pooq.com Tue Jun 19 18:55:16 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Tue, 19 Jun 2012 12:55:16 -0400
Subject: [M3devel] missing m3gdb?
In-Reply-To: <121739E0-E3E7-486A-905D-C296A6B302BC@m3w.org>
References: <20120619143534.GA30034@topoi.pooq.com> <121739E0-E3E7-486A-905D-C296A6B302BC@m3w.org>
Message-ID: <20120619165516.GA32036@topoi.pooq.com>

That was my fallback plan, and I'd still have to recompile it so it would access current libraries on Debian.

What I wanted to know was whether it was intentional to leave the debugger out of the current .deb-building script (because, perhaps, it didn't work).

And as I've said before, recompiling from source is too much work for a beginner. Not that I class myself as a beginner anymore, but if, for example, I wanted to submit a video game written in Modula 3 to an open-source video-game competition, the judges would have to be able to run it on their machines, and they would be beginners.

So if the development source doesn't build a working .deb, I'll build one from 5.8.6. But if I didn't bungle the .deb build, and the missing m3gdb isn't a known bug, it probably warrants some attention, by someone, someday.

-- hendrik

The LINUXLIBC6 problem may just be a problem with an incomplete build. I've restarted it after installing postgresql (which was holding things up), and it's compiling, compiling, and compiling now.

But I really had thought the AMD64 Linux build was good, and it seemed not to be.

-- hendrik

On Tue, Jun 19, 2012 at 06:08:00PM +0200, Dragiša Durić wrote:
> Short answer: If you need m3gdb - use the 5.8.6 release version.
>
> On Jun 19, 2012, at 4:35 PM, Hendrik Boom wrote:
>
> > Having downloaded the development version in mid-May and succeeded in
> > building cm3-all-AMD64_LINUX-d5.9.0-20120518.deb, I then removed my
> > existing Modula 3, installed the new .deb, and proceeded to use it with
> > no problems until today.
> >
> > Today I tried to use the debugger, and discovered that m3gdb is missing.
>

From hendrik at topoi.pooq.com Tue Jun 19 19:00:39 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Tue, 19 Jun 2012 13:00:39 -0400
Subject: [M3devel] ESC
In-Reply-To: <1340123313.64869.YahooMailClassic@web29704.mail.ird.yahoo.com>
References: <121739E0-E3E7-486A-905D-C296A6B302BC@m3w.org> <1340123313.64869.YahooMailClassic@web29704.mail.ird.yahoo.com>
Message-ID: <20120619170038.GB32036@topoi.pooq.com>

On Tue, Jun 19, 2012 at 05:28:33PM +0100, Daniel Alejandro Benavides D. wrote:
> Hi all:
> I don't think it is so much a waste of time: if the compiler has done well, you don't need a debugger, but if it hasn't, don't waste more time; use a different (optimized) compiler.
> ESC had an X postfix for every package name that had been verified, so you could have a separate verified and compiled version as a reference for program behavior and then use an experimental, debuggable version.
> That was the idea with a module system with separate compilation and version stamps, IMHO: to really have a fast-to-execute and easy-to-debug cycle, and you need that lately as compiler versions are getting faster or harder to debug.
> The interesting question is whether you could use the same infrastructure to verify in less time or not; that will prove whether ESC is worth anything, which I'm sure nobody uses for that reasoning broken -not Dragisha's nor Hendrik's- but most people do ahead of time.
> Thanks in advance
>
> --- On Tue, 19/6/12, Dragiša Durić wrote:

Yes, I agree. It would be worthwhile to track down the ESC source code. Or rewrite it.

But until that's been done I'll probably need a debugger. And maybe occasionally afterward, for the things that ESC doesn't catch.

-- hendrik

From hendrik at topoi.pooq.com Tue Jun 19 19:57:13 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Tue, 19 Jun 2012 13:57:13 -0400
Subject: [M3devel] Rebuilding 5.8.6 for current Debian.
In-Reply-To: <20120619165516.GA32036@topoi.pooq.com>
References: <20120619143534.GA30034@topoi.pooq.com> <121739E0-E3E7-486A-905D-C296A6B302BC@m3w.org> <20120619165516.GA32036@topoi.pooq.com>
Message-ID: <20120619175713.GA32389@topoi.pooq.com>

On Tue, Jun 19, 2012 at 12:55:16PM -0400, Hendrik Boom wrote:
>
> So if the development-source doesn't build a working .deb, I'll build
> one from 5.8.6.

The current 5.8.6 .deb is not compatible with current versions of Debian.

If I build a .deb from the sources in cm3-src-all-5.8.6-REL.tgz, will its version number be 5.8.6, or some modification of 5.8.6? I'd very much want it to be *different* so that my build will be recognised as a more recent build (for a more recent version of Debian). If not, is there a way of specifying it explicitly?

The new .deb I make will likely not be compatible with really old versions of Debian.

-- hendrik

From dabenavidesd at yahoo.es Tue Jun 19 20:17:07 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Tue, 19 Jun 2012 19:17:07 +0100 (BST)
Subject: [M3devel] ESC
In-Reply-To: <20120619170038.GB32036@topoi.pooq.com>
Message-ID: <1340129827.93106.YahooMailClassic@web29706.mail.ird.yahoo.com>

Hi all:
In fact there was another ESC written exclusively for the purpose of finding the time complexity (from source) of multi-threaded programs, but this would be another approach to find whether the ESC system and its prover (Simplify) will perform OK when used on a regular basis, in the average-case scenario (but Simplify has unsoundnesses and a program-dependent checker coming from the ESC front end); at least in a programming environment like Modula-3, having the complexity class of a programming model is something I want.
However, there is evidence that such an environment, used for big SW development at IBM and targeting Modula-3, was not good without formal software analysis (on both fronts, development and performance).
The thing is, I don't know how many studies of software developers based on systematic analysis there are, aside from IBM's in the 80s and some more for Modula-3 later. So based on experience I can infer it's good, but in the real world I don't know how many will buy an idea not backed by some real good experience and some real proof. Anyone else :)?
Thanks in advance

--- On Tue, 19/6/12, Hendrik Boom wrote:

From: Hendrik Boom
Subject: [M3devel] ESC
To: "m3devel"
Date: Tuesday, 19 June 2012 12:00

On Tue, Jun 19, 2012 at 05:28:33PM +0100, Daniel Alejandro Benavides D. wrote:
> Hi all:
> I don't think it is so much a waste of time: if the compiler has done well, you don't need a debugger, but if it hasn't, don't waste more time; use a different (optimized) compiler.
> ESC had an X postfix for every package name that had been verified, so you could have a separate verified and compiled version as a reference for program behavior and then use an experimental, debuggable version.
> That was the idea with a module system with separate compilation and version stamps, IMHO: to really have a fast-to-execute and easy-to-debug cycle, and you need that lately as compiler versions are getting faster or harder to debug.
> The interesting question is whether you could use the same infrastructure to verify in less time or not; that will prove whether ESC is worth anything, which I'm sure nobody uses for that reasoning broken -not Dragisha's nor Hendrik's- but most people do ahead of time.
> Thanks in advance
>
> --- On Tue, 19/6/12, Dragiša Durić wrote:

Yes, I agree. It would be worthwhile to track down the ESC source code. Or rewrite it.

But until that's been done I'll probably need a debugger.
And maybe occasionally afterward, for the things that ESC doesn't catch.

-- hendrik

From hendrik at topoi.pooq.com Wed Jun 20 13:17:06 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Wed, 20 Jun 2012 07:17:06 -0400
Subject: [M3devel] test driver?
Message-ID: <20120620111705.GA10486@topoi.pooq.com>

Is there a test suite driver somewhere in the Modula 3 ecosystem?

I'd like to feed various files of test data into a program to see if it
produces acceptable output. Currently it's all text in and out, but I'd
prefer not to have to rewrite my test suite because of trivialities,
such as spelling corrections in my error messages.

This is for regression testing, so automation is appreciated.

-- hendrik

From dragisha at m3w.org Wed Jun 20 13:26:53 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Wed, 20 Jun 2012 13:26:53 +0200
Subject: [M3devel] test driver?
In-Reply-To: <20120620111705.GA10486@topoi.pooq.com>
References: <20120620111705.GA10486@topoi.pooq.com>
Message-ID:

cm3/m3-libs/libm3/tests

And below.

AFAIK, there is continuous building/testing configured for cm3. Search for Hudson, Modula-3?

On Jun 20, 2012, at 1:17 PM, Hendrik Boom wrote:

> Is there a test suite driver somewhere in the Modula 3 ecosystem?
>
> I'd like to feed various files of test data into a program to see if it
> produces acceptable output. Currently it's all text in and out, but I'd
> prefer not to have to rewrite my test suite because of trivialities,
> such as spelling corrections in my error messages.
>
> This is for regression testing, so automation is appreciated.
>
> -- hendrik
>

From dabenavidesd at yahoo.es Wed Jun 20 14:41:26 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Wed, 20 Jun 2012 13:41:26 +0100 (BST)
Subject: [M3devel] test driver?
In-Reply-To:
Message-ID: <1340196086.64556.YahooMailClassic@web29705.mail.ird.yahoo.com>

Hi all:
Black-box testing for C or m3cgc, m3cg, or m3cc is something we should use on a daily basis. I know of a free testing platform for C# based on Spec#; I think we could use it for static optimization (testing -O2, -O3), which combines both, adding reasoning to the system (knowledge management):
http://books.google.com.co/books?id=Am43BAC06L8C

This can be a good thing to do in later stages (code generation, etc).
Thanks in advance

--- On Wed, 20/6/12, Dragiša Durić wrote:

From: Dragiša Durić
Subject: Re: [M3devel] test driver?
To: "Hendrik Boom"
CC: "m3devel"
Date: Wednesday, 20 June 2012 06:26

cm3/m3-libs/libm3/tests

And below.

AFAIK, there is continuous building/testing configured for cm3. Search for Hudson, Modula-3?

On Jun 20, 2012, at 1:17 PM, Hendrik Boom wrote:

> Is there a test suite driver somewhere in the Modula 3 ecosystem?
>
> I'd like to feed various files of test data into a program to see if it
> produces acceptable output. Currently it's all text in and out, but I'd
> prefer not to have to rewrite my test suite because of trivialities,
> such as spelling corrections in my error messages.
>
> This is for regression testing, so automation is appreciated.
>
> -- hendrik
>

From wagner at elegosoft.com Fri Jun 22 09:16:16 2012
From: wagner at elegosoft.com (mail.elegosoft.com)
Date: Fri, 22 Jun 2012 09:16:16 +0200
Subject: [M3devel] help test 4.7 backend?
In-Reply-To:
References:
Message-ID: <20120622091616.18b39755.wagner@elegosoft.com>

I just noticed that m3tests have been hanging on luthien/AMD64_FREEBSD
for several days now in p006:

http://hudson.modula3.com:8080/job/cm3-current-test-m3tests-AMD64_FREEBSD/479/console

I don't know if it is related, but it used to run OK.

Olaf

On Sat, 16 Jun 2012 06:09:33 +0000
Jay K wrote:

>
> help test 4.7 backend?
>
>
> Can folks try out the new 4.7 backend?
>   edit m3-sys/m3cc/src/m3makefile
>   add your platform to the list near the top, mapped to "47"
>   and then run scripts/python/boot2.sh
>   and then do it again, but edit config/Unix.common, the function
>   m3_backend, to always do args += m3back_optimize
>   and optionally but preferably try with -O3 instead of -O2 in
>   the same file
>   and try running some GUI apps like solitaire
>
> I could use help particularly with:
>   SPARC{32,64}_LINUX
>   PPC_{LINUX,OPENBSD,NETBSD,FREEBSD,DARWIN}
>   ALPHA_OSF
>   I386_LINUX, I386_INTERIX, I386_MINGWIN, I386_CYGWIN, because I'm being lazy
>
> I can do various x86/amd64, either in a VM or opencsw,
> but splitting that load would be good too.
> I might go back to not having much time soon or temporarily.
>
> Still to do:
>   apply OpenBSD patches
>   update from 4.7.0 to 4.7.1, which was just released.
>
> Thanks,
>  - Jay
>

--
Olaf Wagner -- elego Software Solutions GmbH
Gustav-Meyer-Allee 25 / Gebäude 12, 13355 Berlin, Germany
phone: +49 30 23 45 86 96 mobile: +49 177 2345 869 fax: +49 30 23 45 86 95
http://www.elegosoft.com | Geschäftsführer: Olaf Wagner | Sitz: Berlin
Handelregister: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr: DE163214194

From dabenavidesd at yahoo.es Fri Jun 22 17:51:37 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Fri, 22 Jun 2012 16:51:37 +0100 (BST)
Subject: [M3devel] help test 4.7 backend?
In-Reply-To: <20120622091616.18b39755.wagner@elegosoft.com>
Message-ID: <1340380297.77309.YahooMailClassic@web29705.mail.ird.yahoo.com>

Hi all:
maybe not, else if somebody isn't playing optimization unintended aggressively for m3tests/src but to break semantics of Modula-3 threads? I mean, the m3 sources are OK with respect to the Thread interface, but I don't think the thing they call pthreads can be the same at the same time, though DEC-SRC heavily influenced it. The only way to test
that is not in a SW bug but in the HW, kernel aside, but with this HW I can't be sure they are doing thread-safe system code (in other words those machines are badly behaved). I have been thinking about this idea, but it requires making a Virtual Machine for Modula-3 worth the value of playing it for that matter. It could have multithreading capabilities, a tough multitasking system and all.
Jay, and all, we could try the DEC/Compaq Alpha/Piranha simulator to catch that kind of error.
Thanks in advance

--- On Fri, 22/6/12, mail.elegosoft.com wrote:

From: mail.elegosoft.com
Subject: Re: [M3devel] help test 4.7 backend?
To: m3devel at elegosoft.com
Date: Friday, 22 June 2012 02:16

I just noticed that m3tests have been hanging on luthien/AMD64_FREEBSD
for several days now in p006:

http://hudson.modula3.com:8080/job/cm3-current-test-m3tests-AMD64_FREEBSD/479/console

I don't know if it is related, but it used to run OK.

Olaf

On Sat, 16 Jun 2012 06:09:33 +0000
Jay K wrote:

>
> help test 4.7 backend?
>
>
> Can folks try out the new 4.7 backend?
>   edit m3-sys/m3cc/src/m3makefile
>   add your platform to the list near the top, mapped to "47"
>   and then run scripts/python/boot2.sh
>   and then do it again, but edit config/Unix.common, the function
>   m3_backend, to always do args += m3back_optimize
>   and optionally but preferably try with -O3 instead of -O2 in
>   the same file
>   and try running some GUI apps like solitaire
>
> I could use help particularly with:
>   SPARC{32,64}_LINUX
>   PPC_{LINUX,OPENBSD,NETBSD,FREEBSD,DARWIN}
>   ALPHA_OSF
>   I386_LINUX, I386_INTERIX, I386_MINGWIN, I386_CYGWIN, because I'm being lazy
>
> I can do various x86/amd64, either in a VM or opencsw,
> but splitting that load would be good too.
> I might go back to not having much time soon or temporarily.
>
> Still to do:
>   apply OpenBSD patches
>   update from 4.7.0 to 4.7.1, which was just released.
>
> Thanks,
>  - Jay
>

--
Olaf Wagner -- elego Software Solutions GmbH
Gustav-Meyer-Allee 25 / Gebäude 12, 13355 Berlin, Germany
phone: +49 30 23 45 86 96 mobile: +49 177 2345 869 fax: +49 30 23 45 86 95
http://www.elegosoft.com | Geschäftsführer: Olaf Wagner | Sitz: Berlin
Handelregister: Amtsgericht Charlottenburg HRB 77719 | USt-IdNr: DE163214194

From jay.krell at cornell.edu Sat Jun 23 02:45:17 2012
From: jay.krell at cornell.edu (Jay K)
Date: Sat, 23 Jun 2012 00:45:17 +0000
Subject: [M3devel] help test 4.7 backend?
In-Reply-To: <20120622091616.18b39755.wagner@elegosoft.com>
References: <20120622091616.18b39755.wagner@elegosoft.com>
Message-ID:

I kind of haven't touched FreeBSD. They are still on gcc 4.5.
But maybe I did. I'll look into it maybe soon..but I'm super busy the next two weeks.
I'm hoping to test FreeBSD/x86 and FreeBSD/amd64 with gcc 4.7 and then move them to it.

Thank you for pointing this out. It is good to see the Hudson stuff continue to work. My nodes are kind of all down/gone -- some remain but the router is no longer configured as it was.

 - Jay

----------------------------------------
> Date: Fri, 22 Jun 2012 09:16:16 +0200
> From: wagner at elegosoft.com
> To: m3devel at elegosoft.com
> Subject: Re: [M3devel] help test 4.7 backend?
>
> I just noticed that m3tests have been hanging on luthien/AMD64_FREEBSD
> for several days now in p006:
>
> http://hudson.modula3.com:8080/job/cm3-current-test-m3tests-AMD64_FREEBSD/479/console
>
> I don't know if it is related, but it used to run OK.
>
> Olaf
>
> On Sat, 16 Jun 2012 06:09:33 +0000
> Jay K wrote:
>
> >
> > help test 4.7 backend?
> >
> >
> > Can folks try out the new 4.7 backend?
From dragisha at m3w.org Mon Jun 25 12:51:05 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Mon, 25 Jun 2012 12:51:05 +0200
Subject: [M3devel] Windows, Unicode file names
Message-ID: <33AC198A-B8BB-40E9-9F05-6E08A3676539@m3w.org>

Is anybody aware of issues with FSWin32.m3 in cases where one actually has to handle non-ASCII filenames under Windows? Has anyone met this problem?

TIA,
dd

From dabenavidesd at yahoo.es Mon Jun 25 18:52:35 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Mon, 25 Jun 2012 17:52:35 +0100 (BST)
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <33AC198A-B8BB-40E9-9F05-6E08A3676539@m3w.org>
Message-ID: <1340643155.54846.YahooMailClassic@web29701.mail.ird.yahoo.com>

Hi all:
I was asked why there isn't a faster Modula-3 environment (the Modula-3 NT386GNU is way too slow even nowadays), and I had no answer. I guess this is the same question as who wants a Windows-ready environment. If you are interested, DEC had a project, M3lite, for WinNT/95 (compatible) systems. I guess addressing compatibility for old users might get better results for CM3, but that's history now, and it makes little or no sense anyway, on the understanding that Windows 8 will be incompatible with Win32. As of today I haven't understood what new API they will bring, and frankly I don't care whether they have a new system to get our hands on, but you would certainly want something like that if you have a tablet or mobile phone, where there isn't much time to spend compiling GCC from source.
Thanks in advance

--- On Mon, 25/6/12, Dragiša Durić wrote:

From: Dragiša Durić
Subject: [M3devel] Windows, Unicode file names
To: "m3devel"
Date: Monday, 25 June 2012 05:51

Is anybody aware of issues with FSWin32.m3 in cases where one actually has to handle non-ASCII filenames under Windows? Has anyone met this problem?

TIA,
dd

From dragisha at m3w.org Mon Jun 25 18:54:56 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Mon, 25 Jun 2012 18:54:56 +0200
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <1340643155.54846.YahooMailClassic@web29701.mail.ird.yahoo.com>
References: <1340643155.54846.YahooMailClassic@web29701.mail.ird.yahoo.com>
Message-ID: <8A65A674-1120-459E-98FC-AF622D24EC66@m3w.org>

Daniel, please start your own topics and don't dilute other discussions with off-topic talk.
Thanks in advance,
dd

On Jun 25, 2012, at 6:52 PM, Daniel Alejandro Benavides D. wrote:

> Hi all:
> I was asked why there isn't a faster Modula-3 environment (the Modula-3 NT386GNU is way too slow even nowadays), and I had no answer. I guess this is the same question as who wants a Windows-ready environment. If you are interested, DEC had a project, M3lite, for WinNT/95 (compatible) systems

From dabenavidesd at yahoo.es Mon Jun 25 19:04:20 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Mon, 25 Jun 2012 18:04:20 +0100 (BST)
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <8A65A674-1120-459E-98FC-AF622D24EC66@m3w.org>
Message-ID: <1340643860.74333.YahooMailClassic@web29701.mail.ird.yahoo.com>

Hi all:
Thanks, but I don't know whether you are aware that M3lite was a Win95/NT-compatible system.
Perhaps I missed your point, but I guess this is the same question (though I don't know your answer either; that's a different point). See the M3-FAQ (WHAT IS M3-LITE, MS-WINDOWS SUPPORT).
Thanks in advance

--- On Mon, 25/6/12, Dragiša Durić wrote:

From: Dragiša Durić
Subject: Re: [M3devel] Windows, Unicode file names
To: "Daniel Alejandro Benavides D."
CC: "m3devel"
Date: Monday, 25 June 2012 11:54

Daniel, please start your own topics and don't dilute other discussions with off-topic talk.

Thanks in advance,
dd
From dragisha at m3w.org Mon Jun 25 19:07:20 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Mon, 25 Jun 2012 19:07:20 +0200
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <1340643860.74333.YahooMailClassic@web29701.mail.ird.yahoo.com>
References: <1340643860.74333.YahooMailClassic@web29701.mail.ird.yahoo.com>
Message-ID: <8E8B1021-7B2C-415F-A965-F49257C4C2FB@m3w.org>

See the subject: Windows, Unicode file names. Thanks in advance.

On Jun 25, 2012, at 7:04 PM, Daniel Alejandro Benavides D. wrote:

> Hi all:
> Thanks, but I don't know whether you are aware that M3lite was a Win95/NT-compatible system.
> Perhaps I missed your point, but I guess this is the same question (though I don't know your answer either; that's a different point). See the M3-FAQ (WHAT IS M3-LITE, MS-WINDOWS SUPPORT).
> Thanks in advance

From dabenavidesd at yahoo.es Mon Jun 25 19:27:44 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Mon, 25 Jun 2012 18:27:44 +0100 (BST)
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <8E8B1021-7B2C-415F-A965-F49257C4C2FB@m3w.org>
Message-ID: <1340645264.336.YahooMailClassic@web29701.mail.ird.yahoo.com>

Hi all:
As I understood it, you don't want to talk about a Win95/NT-compatible distribution of Modula-3.
But in turn you want to keep compatibility with older file name encodings.
I don't care about that, but even if it is useful (newer Windows versions don't care about it either), I don't know what your problem was, because it can't be solved!
Thanks in advance

PS: Being clearer about topics is what I want, so please feel free to tell me whenever I'm not.

--- On Mon, 25/6/12, Dragiša Durić wrote:

From: Dragiša Durić
Subject: Re: [M3devel] Windows, Unicode file names
To: "Daniel Alejandro Benavides D."
CC: "m3devel"
Date: Monday, 25 June 2012 12:07

See the subject: Windows, Unicode file names. Thanks in advance.
From dragisha at m3w.org Mon Jun 25 19:36:39 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Mon, 25 Jun 2012 19:36:39 +0200
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <1340645264.336.YahooMailClassic@web29701.mail.ird.yahoo.com>
References: <1340645264.336.YahooMailClassic@web29701.mail.ird.yahoo.com>
Message-ID:

Daniel,

I can talk about many things, and most things Modula-3 are of interest to me. Once you start a topic, and I can understand what it is about, and it meets my interests, I'll be there.

The problem I met with filenames is nothing old. Windows can open files with filenames in ASCII and UTF-16. Everything else - you must check twice and do a workaround.

I've written here in the hope that I can get into some fruitful discussion with people who understand this problem. My solution is a workaround and assumes the filename is UTF-8 or ASCII. I would like to start a discussion on this and work from there to a more general solution.

dd

On Jun 25, 2012, at 7:27 PM, Daniel Alejandro Benavides D. wrote:

> Hi all:
> As I understood it, you don't want to talk about a Win95/NT-compatible distribution of Modula-3.
> But in turn you want to keep compatibility with older file name encodings.
> I don't care about that, but even if it is useful (newer Windows versions don't care about it either), I don't know what your problem was, because it can't be solved!
> Thanks in advance
From rcolebur at SCIRES.COM Mon Jun 25 19:51:10 2012
From: rcolebur at SCIRES.COM (Coleburn, Randy)
Date: Mon, 25 Jun 2012 13:51:10 -0400
Subject: [M3devel] EXT [M3commit] CVS Update: cm3
In-Reply-To: <20120624094248.AAB932474003@birch.elegosoft.com>
References: <20120624094248.AAB932474003@birch.elegosoft.com>
Message-ID:

Does this mean HPUX will no longer be supported?

-----Original Message-----
From: Jay Krell [mailto:jkrell at elego.de]
Sent: Sunday, June 24, 2012 7:43 AM
To: m3commit at elegosoft.com
Subject: EXT [M3commit] CVS Update: cm3

CVSROOT: /usr/cvs
Changes by: jkrell at birch. 12/06/24 11:42:45

Modified files:
cm3/m3-sys/cminstall/src/config-no-install/: Unix.common

Log message:
hpux_flags is never used, remove it

From dabenavidesd at yahoo.es Mon Jun 25 20:06:10 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Mon, 25 Jun 2012 19:06:10 +0100 (BST)
Subject: [M3devel] Windows, Unicode file names
In-Reply-To:
Message-ID: <1340647570.11529.YahooMailClassic@web29706.mail.ird.yahoo.com>

Hi all:
OK, good. The Win32 API handled NLS (National Language Support) for ASCII and other encodings with the NLS API.
But it appears not to have been used in the DEC SRC WinNT port of Modula-3 (it is in CM3, though it isn't compiled on the elego servers, only here):
http://www.cs.purdue.edu/homes/hosking/m3/help/gen_html/m3core/src/win32/WinNLS.i3.html

Thanks in advance

--- On Mon, 25/6/12, Dragiša Durić wrote:

From: Dragiša Durić
Subject: Re: [M3devel] Windows, Unicode file names
To: "Daniel Alejandro Benavides D."
CC: "m3devel"
Date: Monday, 25 June 2012 12:36

Daniel,

I can talk about many things, and most things Modula-3 are of interest to me. Once you start a topic, and I can understand what it is about, and it meets my interests, I'll be there.

The problem I met with filenames is nothing old. Windows can open files with filenames in ASCII and UTF-16. Everything else - you must check twice and do a workaround.
I've written here in the hope that I can get into some fruitful discussion with people who understand this problem. My solution is a workaround and assumes the filename is UTF-8 or ASCII. I would like to start a discussion on this and work from there to a more general solution.

dd

From dragisha at m3w.org Mon Jun 25 20:20:01 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Mon, 25 Jun 2012 20:20:01 +0200
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <1340647570.11529.YahooMailClassic@web29706.mail.ird.yahoo.com>
References: <1340647570.11529.YahooMailClassic@web29706.mail.ird.yahoo.com>
Message-ID: <6DF57887-C46F-408C-863F-1242C4C4C6A9@m3w.org>

Yes, they exposed parts of NLS. That's how the problem can be solved, albeit partially: by using the methods exposed there.

What we don't have is a way to communicate the actual encoding of a string to the FS module so that FS methods can handle filenames accordingly.

On Jun 25, 2012, at 8:06 PM, Daniel Alejandro Benavides D. wrote:

> Hi all:
> OK, good. The Win32 API handled NLS (National Language Support) for ASCII and other encodings with the NLS API.
> But it appears not to have been used in the DEC SRC WinNT port of Modula-3 (it is in CM3, though it isn't compiled on the elego servers, only here):
> http://www.cs.purdue.edu/homes/hosking/m3/help/gen_html/m3core/src/win32/WinNLS.i3.html
>
> Thanks in advance
From jay.krell at cornell.edu Mon Jun 25 20:40:43 2012
From: jay.krell at cornell.edu (Jay K)
Date: Mon, 25 Jun 2012 18:40:43 +0000
Subject: [M3devel] EXT [M3commit] CVS Update: cm3
In-Reply-To:
References: <20120624094248.AAB932474003@birch.elegosoft.com>,
Message-ID:

No. This does not represent a loss of any support for any target.
It is just a removal of a local variable that is initialized and never further referenced, unless I read the code incorrectly.
On the other hand, I don't think anyone here has HPUX available for any testing/development.
I used to, but no longer.
 - Jay

> From: rcolebur at SCIRES.COM
> To: jkrell at elego.de; m3devel at elegosoft.com
> Date: Mon, 25 Jun 2012 13:51:10 -0400
> Subject: Re: [M3devel] EXT [M3commit] CVS Update: cm3
>
> Does this mean HPUX will no longer be supported?

From dragisha at m3w.org Mon Jun 25 20:49:22 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Mon, 25 Jun 2012 20:49:22 +0200
Subject: [M3devel] Windows, Unicode file names
In-Reply-To:
References: <33AC198A-B8BB-40E9-9F05-6E08A3676539@m3w.org>
Message-ID: <461C7DCF-E432-4434-BAD3-7FA3B9775F45@m3w.org>

My situation was this: the Gtk2 interface (GtkFileChooser in my case) returns a UTF-8 encoded string, UTF-8 being GLib's internal/native encoding. Neither CreateFileA nor CreateFileW can handle it directly, so I "hardcoded" some logic into FS.OpenFile(Readonly) and handled the case of non-ASCII input.

The ideal would be to have encoding information as an integral part of every TEXT, but... To my knowledge, POSIX systems handle UTF-8 filenames well (this needs checking), so explicit information on encoding for FS is needed only for Windows.

On Jun 25, 2012, at 8:44 PM, Jay K wrote:

> Functions like CreateFileA use the "ANSI" or "OEM" code page, subject to a public global in Win32, and the two code pages vary per-install (or per-user). It is just not a good system.
>
> Functions like CreateFileW work very well with 16-bit encoded characters.
>
> Can/do we arrange to have 16-bit encoded characters?
>
> - Jay

From jay.krell at cornell.edu Mon Jun 25 20:44:18 2012
From: jay.krell at cornell.edu (Jay K)
Date: Mon, 25 Jun 2012 18:44:18 +0000
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <33AC198A-B8BB-40E9-9F05-6E08A3676539@m3w.org>
References: <33AC198A-B8BB-40E9-9F05-6E08A3676539@m3w.org>
Message-ID:

Functions like CreateFileA use the "ANSI" or "OEM" code page, subject to a public global in Win32, and the two code pages vary per-install (or per-user). It is just not a good system.

Functions like CreateFileW work very well with 16-bit encoded characters.

Can/do we arrange to have 16-bit encoded characters?

 - Jay

> From: dragisha at m3w.org
> Date: Mon, 25 Jun 2012 12:51:05 +0200
> To: m3devel at elegosoft.com
> Subject: [M3devel] Windows, Unicode file names
>
> Is anybody aware of issues with FSWin32.m3 in cases where one actually has to handle non-ASCII filenames under Windows? Has anyone met this problem?
>
> TIA,
> dd

From dragisha at m3w.org Mon Jun 25 21:05:59 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Mon, 25 Jun 2012 21:05:59 +0200
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>
References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>
Message-ID: <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org>

If you cared enough to check FSWin32.m3, the answer would be obvious :).

Whatever I do with the pathname before I call FS.OpenFile(Readonly)
- FSWin32.m3 will call CreateFileA. My solution is:

PROCEDURE OpenFileReadonly(p: Pathname.T): File.T RAISES {OSError.E}=
  VAR
    handle: WinNT.HANDLE;
    fname := M3toC.SharedTtoS(p);
    dwNum := WinNLS.MultiByteToWideChar (WinNLS.CP_UTF8, 0, fname, -1, NIL, 0);
    pwText: WinBaseTypes.PCWSTR;
  BEGIN
    IF dwNum = 0 OR dwNum = Text.Length(p) + 1 THEN
      (* dwNum includes terminating null character. that's +1 above. *)
      handle := WinBase.CreateFile(
                  lpFileName := fname,
                  dwDesiredAccess := WinNT.GENERIC_READ,
                  dwShareMode := WinNT.FILE_SHARE_READ,
                  lpSecurityAttributes := NIL,
                  dwCreationDisposition := WinBase.OPEN_EXISTING,
                  dwFlagsAndAttributes := 0,
                  hTemplateFile := NIL);
    ELSE
      pwText := LOOPHOLE(NEW(UNTRACED REF ARRAY OF CHAR, dwNum*2), WinBaseTypes.PCWSTR);
      EVAL WinNLS.MultiByteToWideChar (WinNLS.CP_UTF8, 0, fname, -1, pwText, dwNum);
      handle := WinBase.CreateFileW(
                  lpFileName := pwText,
                  dwDesiredAccess := WinNT.GENERIC_READ,
                  dwShareMode := WinNT.FILE_SHARE_READ,
                  lpSecurityAttributes := NIL,
                  dwCreationDisposition := WinBase.OPEN_EXISTING,
                  dwFlagsAndAttributes := 0,
                  hTemplateFile := NIL);
      DISPOSE(pwText);
    END;

    IF LOOPHOLE(handle, INTEGER) = WinBase.INVALID_HANDLE_VALUE THEN
      Fail(p, fname);
    END;
    M3toC.FreeSharedS(p, fname);
    RETURN FileWin32.New(handle, FileWin32.Read)
  END OpenFileReadonly;

And similar in OpenFile. Not nice :).

Also, I've added the CP_UTF8 constant to WinNLS.i3.

On Jun 25, 2012, at 9:01 PM, Daniel Alejandro Benavides D. wrote:

> Hi all:
> So do you need a Double-Byte Character String module, as currently in the TEXT types? But you can do that already, couldn't you?
> Thanks in advance
From dabenavidesd at yahoo.es Mon Jun 25 21:01:56 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Mon, 25 Jun 2012 20:01:56 +0100 (BST)
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <6DF57887-C46F-408C-863F-1242C4C4C6A9@m3w.org>
Message-ID: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>

Hi all:
So do you need a Double-Byte Character String module, as currently in the TEXT types? But you can do that already, couldn't you?
Thanks in advance

--- On Mon, 25/6/12, Dragiša Durić wrote:

From: Dragiša Durić
Subject: Re: [M3devel] Windows, Unicode file names
To: "Daniel Alejandro Benavides D."
CC: "m3devel"
Date: Monday, 25 June 2012 13:20

Yes, they exposed parts of NLS. That's how the problem can be solved, albeit partially: by using the methods exposed there.

What we don't have is a way to communicate the actual encoding of a string to the FS module so that FS methods can handle filenames accordingly.
From jay.krell at cornell.edu Mon Jun 25 21:39:04 2012
From: jay.krell at cornell.edu (Jay K)
Date: Mon, 25 Jun 2012 19:39:04 +0000
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org>
References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>, <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org>
Message-ID:

I think I know what to do here and will look into it... later.

We have TEXT. We should just always get WIDECHARs out of it and call CreateFileW.
Assuming UTF8 is the wrong solution at this level, and passing in UTF8 won't work with the correct solution.
A layer above this needs to decode UTF8, if that is the encoding.

Unless someone has declared and implemented that TEXT is in fact always UTF8-encoded, which I doubt.

 - Jay

From: dragisha at m3w.org
Date: Mon, 25 Jun 2012 21:05:59 +0200
To: dabenavidesd at yahoo.es
CC: m3devel at elegosoft.com
Subject: Re: [M3devel] Windows, Unicode file names

If you cared enough to check FSWin32.m3, the answer would be obvious :).
Whatever I do with the pathname before I call FS.OpenFile(Readonly) - FSWin32.m3 will call CreateFileA.
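For readers following the thread, the length comparison in the OpenFileReadonly workaround posted earlier can be illustrated outside Win32. Called with cbMultiByte = -1 and cchWideChar = 0, MultiByteToWideChar reports how many 16-bit units the conversion needs, terminator included; for well-formed UTF-8 that equals the byte count plus one only when every byte is ASCII. The sketch below is portable C rather than Modula-3, chosen because the Win32 calls under discussion are C interfaces. The names utf16_units_with_nul and needs_wide_call are hypothetical, not part of Win32, m3core, or the posted patch, and the counting assumes well-formed input.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count the UTF-16 code units that a well-formed, NUL-terminated UTF-8
   string converts to, plus one for the terminator. Hypothetical helper
   that imitates the size query of MultiByteToWideChar(CP_UTF8, ...). */
size_t utf16_units_with_nul(const char *utf8) {
    size_t units = 0;
    assert(utf8 != NULL);
    for (const unsigned char *p = (const unsigned char *)utf8; *p != 0; p++) {
        if ((*p & 0xC0) == 0x80) continue;   /* continuation byte: no new unit */
        units += (*p >= 0xF0) ? 2 : 1;       /* 4-byte sequence: surrogate pair */
    }
    return units + 1;
}

/* Same decision as the posted workaround: if the unit count equals the
   byte count (both with terminator), the name is pure ASCII and the
   narrow call is safe; otherwise the wide call is needed. */
int needs_wide_call(const char *utf8) {
    return utf16_units_with_nul(utf8) != strlen(utf8) + 1;
}
```

With this test, "file.txt" stays on the narrow path, while "Dragi\xC5\xA1" "a" (the UTF-8 bytes of Dragiša) is routed to the wide call.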
From dragisha at m3w.org Mon Jun 25 21:48:09 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Mon, 25 Jun 2012 21:48:09 +0200
Subject: [M3devel] Windows, Unicode file names
In-Reply-To:
References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>, <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org>
Message-ID: <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org>

It may be what the cm3 people had in mind when they created WIDECHAR as a catchall for Unicode. At first glance it looked like no solution to me, but after counting to ten, I think it is. We can have a UTF-8 layer and use it when and where needed, to recode our strings to the catchall WIDECHAR/WIDETEXT.

As long as we agree on what exactly WIDECHAR is :)

=== From Wikipedia
The Microsoft Windows application programming interfaces Win32 and Win64, as well as the Java and .Net Framework platforms, require that wide character variables be defined as 16-bit values, and that characters be encoded using UTF-16 (due to former use of UCS-2), while modern Unix-like systems generally require 32-bit values encoded using UTF-32[citation needed].
===

On Jun 25, 2012, at 9:39 PM, Jay K wrote:

> I think I know what to do here and will look into it... later.
>
> We have TEXT. We should just always get WIDECHARs out of it and call CreateFileW.
> Assuming UTF8 is the wrong solution at this level, and passing in UTF8 won't work with the correct solution.
> A layer above this needs to decode UTF8, if that is the encoding.
>
> Unless someone has declared and implemented that TEXT is in fact always UTF8-encoded, which I doubt.
>
> - Jay
> From: dragisha at m3w.org
> Date: Mon, 25 Jun 2012 21:05:59 +0200
> To: dabenavidesd at yahoo.es
> CC: m3devel at elegosoft.com
> Subject: Re: [M3devel] Windows, Unicode file names
>
> If you cared enough to check FSWin32.m3, the answer would be obvious :).
>
> Whatever I do with the pathname before I call FS.OpenFile(Readonly) - FSWin32.m3 will call CreateFileA.
My solution is: > > PROCEDURE OpenFileReadonly(p: Pathname.T): File.T RAISES {OSError.E}= > VAR > handle: WinNT.HANDLE; > fname := M3toC.SharedTtoS(p); > dwNum := WinNLS.MultiByteToWideChar (WinNLS.CP_UTF8, 0, fname, -1, NIL, 0); > pwText: WinBaseTypes.PCWSTR; > BEGIN > IF dwNum = 0 OR dwNum = Text.Length(p) + 1 THEN > (* dwNum includes terminating null character. that's +1 above. > *) > handle := WinBase.CreateFile( > lpFileName := fname, > dwDesiredAccess := WinNT.GENERIC_READ, > dwShareMode := WinNT.FILE_SHARE_READ, > lpSecurityAttributes := NIL, > dwCreationDisposition := WinBase.OPEN_EXISTING, > dwFlagsAndAttributes := 0, > hTemplateFile := NIL); > ELSE > pwText := LOOPHOLE(NEW(UNTRACED REF ARRAY OF CHAR, dwNum*2), WinBaseTypes.PCWSTR); > EVAL WinNLS.MultiByteToWideChar (WinNLS.CP_UTF8, 0, fname, -1, pwText, dwNum); > handle := WinBase.CreateFileW( > lpFileName := pwText, > dwDesiredAccess := WinNT.GENERIC_READ, > dwShareMode := WinNT.FILE_SHARE_READ, > lpSecurityAttributes := NIL, > dwCreationDisposition := WinBase.OPEN_EXISTING, > dwFlagsAndAttributes := 0, > hTemplateFile := NIL); > DISPOSE(pwText); > END; > > IF LOOPHOLE(handle, INTEGER) = WinBase.INVALID_HANDLE_VALUE THEN > Fail(p, fname); > END; > M3toC.FreeSharedS(p, fname); > RETURN FileWin32.New(handle, FileWin32.Read) > END OpenFileReadonly; > > And similar in OpenFile. Not nice :). > > Also, I've added CP_UTF8 constant to WinNLS.i3. > > On Jun 25, 2012, at 9:01 PM, Daniel Alejandro Benavides D. wrote: > > Hi all: > So do you need Double-Byte Character String module as currently in TEXT types? but you can do that already. Couldn't you? > Thanks in advance > > --- El lun, 25/6/12, Dragi?a Duri? escribi?: > > De: Dragi?a Duri? > Asunto: Re: [M3devel] Windows, Unicode file names > Para: "Daniel Alejandro Benavides D." > CC: "m3devel" > Fecha: lunes, 25 de junio, 2012 13:20 > > Yes, they exposed parts of NLS. That's how problem can be, albeit partially, solved. By using methods exposed there. 
> > What we don't have is how to communicate actual encoding of string to FS module so FS methods can handle filenames accordingly. > > On Jun 25, 2012, at 8:06 PM, Daniel Alejandro Benavides D. wrote: > > Hi all: > OK, good, Win32 API dealt with inter-NLS (National Language Support) at ASCII and other formats level with NLS API. > But it appears to be have not been used for DEC-SRC WinNT port of Modula-3 (but for CM3, though it isn't compiled in elego servers, but here): > http://www.cs.purdue.edu/homes/hosking/m3/help/gen_html/m3core/src/win32/WinNLS.i3.html > > Thanks in advance > > --- El lun, 25/6/12, Dragi?a Duri? escribi?: > > De: Dragi?a Duri? > Asunto: Re: [M3devel] Windows, Unicode file names > Para: "Daniel Alejandro Benavides D." > CC: "m3devel" > Fecha: lunes, 25 de junio, 2012 12:36 > > Daniel, > > I can talk about many things, and most things Modula-3 are of interest to me. Once you start a topic, and I can understand what is it about, and it meets my interests - I'll be there. > > Problem I met with filenames is nothing old. Windows can open files with filenames in ASCII and UTF-16. Everything else - you must check twice and do a workaround. > > I've written here in hope I can get i to some fruitful discussion with people who understand this problem. My solution is a workaround and assumes filename is UTF-8 or ASCII. I would like to start discussion on this and work from there to more general solution. > > dd > > On Jun 25, 2012, at 7:27 PM, Daniel Alejandro Benavides D. wrote: > > Hi all: > I as I understood, thought you don't want to talk about compatible W 95 / NT distro of Modula-3. > But in turn you want to keep compatibility with older file name encodes. > I don't care that but if its useful anyway (because newer windows don't care at all either) I don't know know your problem was because it won't be able to be solved! > Thanks in advance -------------- next part -------------- An HTML attachment was scrubbed... 
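
The sizing step in the workaround quoted above (calling MultiByteToWideChar with a NIL output buffer, then comparing the result with Text.Length(p) + 1) can be illustrated in portable C. This is only a sketch under stated assumptions, not code from cm3 or Win32: the names utf16_units_needed and is_plain_ascii are hypothetical, and the decoder is deliberately simplified (it does not reject overlong forms or encoded surrogates). It shows why the two counts agree only when the name is plain ASCII: every non-ASCII character takes at least two UTF-8 bytes but at most two UTF-16 units, and only four-byte sequences need two units.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Count the UTF-16 code units needed for a NUL-terminated UTF-8 string,
   including the terminating NUL (the same convention as the sizing call
   in the workaround). Returns 0 on malformed input.
   Hypothetical helper; simplified validation only. */
size_t utf16_units_needed(const char *utf8)
{
    const unsigned char *s = (const unsigned char *)utf8;
    size_t units = 0;

    assert(utf8 != NULL);
    while (*s != 0) {
        size_t len, i;
        if (*s < 0x80)                { len = 1; units += 1; }
        else if ((*s & 0xE0) == 0xC0) { len = 2; units += 1; }
        else if ((*s & 0xF0) == 0xE0) { len = 3; units += 1; }
        else if ((*s & 0xF8) == 0xF0) { len = 4; units += 2; } /* surrogate pair */
        else return 0;            /* stray continuation or invalid lead byte */

        /* Every following byte must be a continuation byte (10xxxxxx).
           A premature NUL fails this test, so we never read past the end. */
        for (i = 1; i < len; i++)
            if ((s[i] & 0xC0) != 0x80) return 0;
        s += len;
    }
    return units + 1;             /* +1 for the terminating NUL */
}

/* The shortcut test: unit count equals byte count only for pure ASCII. */
int is_plain_ascii(const char *utf8)
{
    return utf16_units_needed(utf8) == strlen(utf8) + 1;
}
```

With this, a name such as "file.txt" takes the narrow (CreateFileA-style) path, while a name containing a two-byte sequence such as "\xC4\x87" would be recoded first. Unlike the workaround, the sketch treats malformed input as "not ASCII" rather than falling back to the narrow call.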

From jay.krell at cornell.edu Mon Jun 25 22:17:52 2012
From: jay.krell at cornell.edu (Jay K)
Date: Mon, 25 Jun 2012 20:17:52 +0000
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org>
References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>, <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org>, <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org>
Message-ID:

I don't care if WIDECHAR is 16 bits or 32 bits, as long as I can convert from TEXT to a flat array of either, and if 32 bits, walk the array, checking for > 0xFFFF, throw an exception or return some error if any found, narrow to 16 bits, call some "W" function, free the flat array.
The size can, I guess, vary between Win32 and non-Win32 platforms.
Its size should be stored in a global to communicate between Modula-3 and C.

I'd also quite like if TEXT was internally represented as a nul terminated flat array of 8 and/or 16 and/or 32 bit quantities, materializing on demand some of them. But I suspect that flat and readonly and exposing a concat operation are in conflict. I'm not sure. MFC uses a flat reference counted nul terminated representation and it works pretty well. It doesn't materialize-on-demand other widths.

- Jay
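
The "walk the array, check for > 0xFFFF, narrow to 16 bits" step described in the message above can be sketched in C as follows. This is a hedged illustration only: narrow_to_16 is a hypothetical name, and the sketch assumes the TEXT has already been flattened into an array of 32-bit values. It reports an error instead of narrowing when any value does not fit in 16 bits, and checks everything before writing so that the output is untouched on failure.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Narrow n 32-bit characters into 16-bit units.
   Returns 0 on success, -1 if any value is above 0xFFFF.
   dst must have room for n units; it is not modified on failure.
   Hypothetical helper for illustration. */
int narrow_to_16(const uint32_t *src, size_t n, uint16_t *dst)
{
    size_t i;

    assert(src != NULL && dst != NULL);

    /* First pass: reject the whole string if anything does not fit. */
    for (i = 0; i < n; i++)
        if (src[i] > 0xFFFF) return -1;

    /* Second pass: plain narrowing, safe because every value fits. */
    for (i = 0; i < n; i++)
        dst[i] = (uint16_t)src[i];
    return 0;
}
```

Note that plain narrowing like this is not the same thing as producing UTF-16: characters outside the 16-bit range are refused here rather than recoded.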

From hendrik at topoi.pooq.com Mon Jun 25 22:34:22 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Mon, 25 Jun 2012 16:34:22 -0400
Subject: [M3devel] Windows, Unicode file names
In-Reply-To:
References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com> <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org> <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org>
Message-ID: <20120625203422.GA24287@topoi.pooq.com>

On Mon, Jun 25, 2012 at 08:17:52PM +0000, Jay K wrote:
>
> I'd also quite like if TEXT was internally represented as a nul
> terminated flat array of 8 and/or 16 and/or 32bit quantities,
> materializing on demand some of them.

Does that conflict with NUL being a valid ASCII character?

-- hendrik

From rodney_bates at lcwb.coop Mon Jun 25 22:29:06 2012
From: rodney_bates at lcwb.coop (Rodney M. Bates)
Date: Mon, 25 Jun 2012 15:29:06 -0500
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org>
References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>, <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org> <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org>
Message-ID: <4FE8CA12.5040104@lcwb.coop>

On 06/25/2012 02:48 PM, Dragiša Durić wrote:
> It can be what cm3 people had in mind when they created WIDECHAR as a catchall for Unicode.
>
> At first glance it looked like no solution to me, but after counting to ten - I think it is. We can have a UTF-8 layer and use it when and where needed, to recode our strings to catchall WIDECHAR/WIDETEXT.
>
> As long as we agree on what exactly WIDECHAR is :)
> === From Wikipedia
> The Microsoft Windows application programming interfaces Win32 and Win64, as well as the Java and .Net Framework platforms, require that wide character variables be defined as 16-bit values, and that characters be encoded using UTF-16 (due to former use of UCS-2), while modern Unix-like systems generally require 32-bit values encoded using UTF-32 [citation needed].
> ===
>

This is not necessarily a proposal, but FWIW:

When working on my altered cm3 TEXT implementations, I put every relevant thing I could find into a state that should allow M3 WIDECHAR to be 32-bit, with only one or two declarations changed. I think Pickles might need some attention to cope with this, however. We would want them to not only handle 32-bit WIDECHAR, but be able to read older pickle files that used 16 bits.

From jay.krell at cornell.edu Mon Jun 25 22:46:18 2012
From: jay.krell at cornell.edu (Jay K)
Date: Mon, 25 Jun 2012 20:46:18 +0000
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <20120625203422.GA24287@topoi.pooq.com>
References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>, <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org>, <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org>, <20120625203422.GA24287@topoi.pooq.com>
Message-ID:

Somewhat but not fully. Text.Length should fetch a stored length. As I'm sure it already does.
That length should always be correctly maintained. Same as today.
Adding one extra nul at the end doesn't invalidate the data.
std::string has the same properties -- c_str() can on-demand append a terminal nul, but there could also be one in the string itself.
I understand it is a bit weird.

Maintaining a terminal nul does add cost that might be wasted.
And reduces the capacity by one.
It could be on-demand, I guess.

- Jay

> Date: Mon, 25 Jun 2012 16:34:22 -0400
> From: hendrik at topoi.pooq.com
> To: m3devel at elegosoft.com
> Subject: Re: [M3devel] Windows, Unicode file names
>
> Does that conflict with NUL being a valid ASCII character?
>
> -- hendrik

From dragisha at m3w.org Mon Jun 25 23:09:37 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Mon, 25 Jun 2012 23:09:37 +0200
Subject: [M3devel] Windows, Unicode file names
In-Reply-To:
References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>, <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org>, <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org>
Message-ID:

On Jun 25, 2012, at 10:17 PM, Jay K wrote:

> I don't care if WIDECHAR is 16 bits or 32bits, as long as I can convert from
> TEXT to a flat array of either, and if 32bits, walk the array, checking for > 0xFFFF, throw an exception or return some error if any found, narrow to 16bits, call some "W" function, free the flat array.
> The size can, I guess, vary between Win32 and non-Win32 platforms.

a) If you like to make it as unportable as possible then yes - 16 or 32 is not important.
b) An invalid value would be one over 0x10FFFF, not 0xFFFF.
c) Why would you narrow it to 16bit? You need to convert to UTF-16 and make it ready for Windows API calls? WinNLS does that. Simple narrowing (similar to what is commented in Text.i3) to 16bit and recoding from UTF-32 to UTF-16 are very different things.
d) Size varies, yes.

> Its size should be stored in a global to communicate between Modula-3 and C.

From dragisha at m3w.org Mon Jun 25 23:11:49 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Mon, 25 Jun 2012 23:11:49 +0200
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <4FE8CA12.5040104@lcwb.coop>
References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>, <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org> <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org> <4FE8CA12.5040104@lcwb.coop>
Message-ID: <99C12F66-6DC0-4FC3-BC99-3C2A61595CBC@m3w.org>

I agree with this. This way we are compatible with Unices (the majority of systems we use) but we also have a straight way to the W functions of the Windows API, similar to the method I used but with a distinct presumption about the input encoding.

On Jun 25, 2012, at 10:29 PM, Rodney M. Bates wrote:

> This is not necessarily a proposal, but FWIW:
>
> When working on my altered cm3 TEXT implementations, I put every relevant thing I could find into
> a state that should allow M3 WIDECHAR to be 32-bit, with only one or two declarations
> changed. I think Pickles might need some attention to cope with this, however. We would
> want them to not only handle 32-bit WIDECHAR, but be able to read older pickle files that
> used 16 bits.
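
If WIDECHAR were 32 bits, the "straight way" to the Windows W functions would require recoding UTF-32 to UTF-16 rather than simply narrowing. A minimal sketch of that recoding in portable C is given below. It is an illustration under stated assumptions, not part of cm3 or WinNLS: utf32_to_utf16 is a hypothetical name, and the two-call convention (pass NULL to get the size, then call again with a buffer) is borrowed from the Win32 style used earlier in the thread. Code points up to 0xFFFF are copied as one unit; code points from 0x10000 to 0x10FFFF become a surrogate pair; anything above 0x10FFFF, or inside the surrogate range itself, is rejected.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Encode n UTF-32 code points as UTF-16.
   If dst is NULL, only the required number of 16-bit units is computed.
   Returns the unit count, or (size_t)-1 on an invalid code point
   (above 0x10FFFF, or in the surrogate range 0xD800..0xDFFF).
   On error dst may have been partially written.
   Hypothetical helper for illustration. */
size_t utf32_to_utf16(const uint32_t *src, size_t n, uint16_t *dst)
{
    size_t i, units = 0;

    assert(src != NULL || n == 0);
    for (i = 0; i < n; i++) {
        uint32_t cp = src[i];

        if (cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            return (size_t)-1;

        if (cp < 0x10000) {
            /* Basic Multilingual Plane: one unit, value unchanged. */
            if (dst) dst[units] = (uint16_t)cp;
            units += 1;
        } else {
            /* Supplementary plane: subtract 0x10000, leaving 20 bits,
               and split them 10/10 across a surrogate pair. */
            cp -= 0x10000;
            if (dst) {
                dst[units]     = (uint16_t)(0xD800 + (cp >> 10));   /* high */
                dst[units + 1] = (uint16_t)(0xDC00 + (cp & 0x3FF)); /* low  */
            }
            units += 2;
        }
    }
    return units;
}
```

A caller would typically call it once with dst set to NULL to size the buffer, allocate that many units plus one for a terminating NUL, and call it again to fill the buffer before passing it to a W function.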

From jay.krell at cornell.edu Mon Jun 25 23:30:08 2012
From: jay.krell at cornell.edu (Jay K)
Date: Mon, 25 Jun 2012 21:30:08 +0000
Subject: [M3devel] Windows, Unicode file names
In-Reply-To:
References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>, <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org>, <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org>
Message-ID:

> Why would you narrow it to 16bit? You need to convert to UTF-16 and make it ready for Windows API calls?

Yes.

> WinNLS does that.

I doubt that. There is a 32bit to 16bit conversion?
Ok, I guess there is. "Surrogate pairs" and all that?
Maybe not in WinNLS, but easy enough for us to write, in portable C or Modula-3. :)
Part of Text.i3 perhaps.

So then, I guess I can sign up for WIDECHAR being 32bits across the board.

- Jay

From dragisha at m3w.org Tue Jun 26 00:55:45 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Tue, 26 Jun 2012 00:55:45 +0200
Subject: [M3devel] Windows, Unicode file names
In-Reply-To:
References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>, <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org>, <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org>
Message-ID: <1E70B011-E236-4931-AB6C-78EC47EA8126@m3w.org>

On Jun 25, 2012, at 11:30 PM, Jay K wrote:

> > Why would you narrow it to 16bit? You need to convert to UTF-16 and make it ready for Windows API calls?
>
> Yes.
>
> > WinNLS does that.
>
> I doubt that. There is a 32bit to 16bit conversion?

http://msdn.microsoft.com/en-us/library/windows/desktop/dd317756%28v=vs.85%29.aspx

whatever this means:

12000   utf-32     Unicode UTF-32, little endian byte order; available only to managed applications
12001   utf-32BE   Unicode UTF-32, big endian byte order; available only to managed applications

> Ok, I guess there is. "Surrogate pairs" and all that?
> Maybe not in WinNLS, but easy enough for us to write, in portable C or Modula-3. :)

That too :)

> Part of Text.i3 perhaps.

UTF-32 -> UTF-16? Maybe.

> So then, I guess I can sign up for WIDECHAR being 32bits across the board.
> > - Jay > > Subject: Re: [M3devel] Windows, Unicode file names > From: dragisha at m3w.org > Date: Mon, 25 Jun 2012 23:09:37 +0200 > CC: dabenavidesd at yahoo.es; m3devel at elegosoft.com > To: jay.krell at cornell.edu > > > On Jun 25, 2012, at 10:17 PM, Jay K wrote: > > I don't care if WIDECHAR is 16 bits or 32bits, as long as I can convert from > TEXT to a flat array of either, and if 32bits, walk the array, checking for > 0xFFFF, throw an exception or return some error if any found, narrow to 16bits, call some "W" function, free the flat array. > The size can, I guess, vary between Win32 and non-Win32 platforms. > > a) If you like to make it as unportable as possible then yes - 16 or 32 is not important. > b) invalid value would be over 0xFFFFF, not 0xFFFF > c) Why would you narrow it to 16bit? You need to convert to UTF-16 and make it ready for Windows API calls? WinNLS does that. Simple narrowing (similar to commented in Text.i3) to 16bit and recoding from UTF-32 to UTF-16 is very different thing. > d) Size varies, yes. > > Its size should be stored in a global to communicate between Modula-3 and C. > > > I'd also quite like if TEXT was internally represented as a nul terminated flat array of 8 and/or 16 and/or 32bit quantities, materialzing on demand some of them. But I suspect that flat and readonly and exposing a concat operation are in conflict. I'm not sure. MFC uses a flat reference counted nul terminated representation and it works pretty well. It doesn't materialize-on-demand other widths. > > - Jay > Subject: Re: [M3devel] Windows, Unicode file names > From: dragisha at m3w.org > Date: Mon, 25 Jun 2012 21:48:09 +0200 > CC: dabenavidesd at yahoo.es; m3devel at elegosoft.com > To: jay.krell at cornell.edu > > It can be what cm3 people had in mind when they created WIDECHAR as a catchall for Unicode. > > At first glance it looked like no solution to me, but after counting to ten - I think it is. 
We can have an UTF-8 layer and use it when and where needed, to recode our strings to catchall WIDECHAR/WIDETEXT. > > As long as we agree on what exacty WIDECHAR is :) > ===From wikipedia > The Microsoft Windows application programming interfaces Win32 and Win64, as well as the Java and .Net Framework platforms, require that wide character variables be defined as 16-bit values, and that characters be encoded using UTF-16 (due to former use of UCS-2), while modern Unix-like systems generally require 32-bit values encoded using UTF-32[citation needed]. > === > > > On Jun 25, 2012, at 9:39 PM, Jay K wrote: > > I think I know what to do here and will look into it..later.. > > We have TEXT. We should just always get WIDECHARs out of it and call CreateFileW. > Assuming UTF8 is the wrong solution at this level, and passing in UTF8 won't work with the correct solution. > A layer above this needs to decode UTF8, if that is the encoding. > > Unless someone has declared and implemented that TEXT is in fact always UTF8-encoded, which I doubt. > > - Jay > From: dragisha at m3w.org > Date: Mon, 25 Jun 2012 21:05:59 +0200 > To: dabenavidesd at yahoo.es > CC: m3devel at elegosoft.com > Subject: Re: [M3devel] Windows, Unicode file names > > If you cared enough to check FSWin32.m3, answer would be obvious :). > > Whatever I do with pathname before I call FS.OpenFile(Readonly)? - FSWin32.m3 will call CreateFileA. My solution is: > > PROCEDURE OpenFileReadonly(p: Pathname.T): File.T RAISES {OSError.E}= > VAR > handle: WinNT.HANDLE; > fname := M3toC.SharedTtoS(p); > dwNum := WinNLS.MultiByteToWideChar (WinNLS.CP_UTF8, 0, fname, -1, NIL, 0); > pwText: WinBaseTypes.PCWSTR; > BEGIN > IF dwNum = 0 OR dwNum = Text.Length(p) + 1 THEN > (* dwNum includes terminating null character. that's +1 above. 
> *) > handle := WinBase.CreateFile( > lpFileName := fname, > dwDesiredAccess := WinNT.GENERIC_READ, > dwShareMode := WinNT.FILE_SHARE_READ, > lpSecurityAttributes := NIL, > dwCreationDisposition := WinBase.OPEN_EXISTING, > dwFlagsAndAttributes := 0, > hTemplateFile := NIL); > ELSE > pwText := LOOPHOLE(NEW(UNTRACED REF ARRAY OF CHAR, dwNum*2), WinBaseTypes.PCWSTR); > EVAL WinNLS.MultiByteToWideChar (WinNLS.CP_UTF8, 0, fname, -1, pwText, dwNum); > handle := WinBase.CreateFileW( > lpFileName := pwText, > dwDesiredAccess := WinNT.GENERIC_READ, > dwShareMode := WinNT.FILE_SHARE_READ, > lpSecurityAttributes := NIL, > dwCreationDisposition := WinBase.OPEN_EXISTING, > dwFlagsAndAttributes := 0, > hTemplateFile := NIL); > DISPOSE(pwText); > END; > > IF LOOPHOLE(handle, INTEGER) = WinBase.INVALID_HANDLE_VALUE THEN > Fail(p, fname); > END; > M3toC.FreeSharedS(p, fname); > RETURN FileWin32.New(handle, FileWin32.Read) > END OpenFileReadonly; > > And similar in OpenFile. Not nice :). > > Also, I've added CP_UTF8 constant to WinNLS.i3. > > On Jun 25, 2012, at 9:01 PM, Daniel Alejandro Benavides D. wrote: > > Hi all: > So do you need Double-Byte Character String module as currently in TEXT types? but you can do that already. Couldn't you? > Thanks in advance > > --- El lun, 25/6/12, Dragi?a Duri? escribi?: > > De: Dragi?a Duri? > Asunto: Re: [M3devel] Windows, Unicode file names > Para: "Daniel Alejandro Benavides D." > CC: "m3devel" > Fecha: lunes, 25 de junio, 2012 13:20 > > Yes, they exposed parts of NLS. That's how problem can be, albeit partially, solved. By using methods exposed there. > > What we don't have is how to communicate actual encoding of string to FS module so FS methods can handle filenames accordingly. > > On Jun 25, 2012, at 8:06 PM, Daniel Alejandro Benavides D. wrote: > > Hi all: > OK, good, Win32 API dealt with inter-NLS (National Language Support) at ASCII and other formats level with NLS API. 
> But it appears to be have not been used for DEC-SRC WinNT port of Modula-3 (but for CM3, though it isn't compiled in elego servers, but here): > http://www.cs.purdue.edu/homes/hosking/m3/help/gen_html/m3core/src/win32/WinNLS.i3.html > > Thanks in advance > > --- El lun, 25/6/12, Dragi?a Duri? escribi?: > > De: Dragi?a Duri? > Asunto: Re: [M3devel] Windows, Unicode file names > Para: "Daniel Alejandro Benavides D." > CC: "m3devel" > Fecha: lunes, 25 de junio, 2012 12:36 > > Daniel, > > I can talk about many things, and most things Modula-3 are of interest to me. Once you start a topic, and I can understand what is it about, and it meets my interests - I'll be there. > > Problem I met with filenames is nothing old. Windows can open files with filenames in ASCII and UTF-16. Everything else - you must check twice and do a workaround. > > I've written here in hope I can get i to some fruitful discussion with people who understand this problem. My solution is a workaround and assumes filename is UTF-8 or ASCII. I would like to start discussion on this and work from there to more general solution. > > dd > > On Jun 25, 2012, at 7:27 PM, Daniel Alejandro Benavides D. wrote: > > Hi all: > I as I understood, thought you don't want to talk about compatible W 95 / NT distro of Modula-3. > But in turn you want to keep compatibility with older file name encodes. > I don't care that but if its useful anyway (because newer windows don't care at all either) I don't know know your problem was because it won't be able to be solved! > Thanks in advance -------------- next part -------------- An HTML attachment was scrubbed... 
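The UTF-32 to UTF-16 recoding that this message says would be "easy enough for us to write, in portable C or Modula-3" could look roughly like the C sketch below. It is only an illustration under stated assumptions: the names utf16_encode and utf32_to_utf16 are hypothetical and are not part of WinNLS, Text.i3, or any cm3 library. It treats lone surrogates and anything above 0x10FFFF (the Unicode upper limit) as invalid.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode one Unicode code point as UTF-16.
   Returns the number of 16-bit units written (1 or 2), or 0 if the
   value is not a valid Unicode scalar value. */
static size_t utf16_encode(uint32_t cp, uint16_t out[2])
{
    if (cp >= 0xD800 && cp <= 0xDFFF) return 0; /* reserved for surrogates */
    if (cp > 0x10FFFF) return 0;                /* beyond the Unicode range */
    if (cp < 0x10000) {                         /* fits in one unit */
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                              /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));   /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF)); /* low surrogate: low 10 bits */
    return 2;
}

/* Hypothetical helper: convert n UTF-32 code points to UTF-16.
   dst must have room for 2*n units. Returns the number of units
   written, or (size_t)-1 if any input value is invalid. */
static size_t utf32_to_utf16(const uint32_t *src, size_t n, uint16_t *dst)
{
    size_t used = 0;
    assert(src != NULL && dst != NULL);
    for (size_t i = 0; i < n; i++) {
        size_t k = utf16_encode(src[i], dst + used);
        if (k == 0) return (size_t)-1;
        used += k;
    }
    return used;
}
```

This is recoding, not narrowing: a value above 0xFFFF becomes two 16-bit units instead of being rejected or truncated, which is the distinction drawn in point c) of the quoted reply.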
From jay.krell at cornell.edu Tue Jun 26 02:58:05 2012
From: jay.krell at cornell.edu (Jay K)
Date: Tue, 26 Jun 2012 00:58:05 +0000
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <1E70B011-E236-4931-AB6C-78EC47EA8126@m3w.org>
References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>, , <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org>, , , <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org>, , , , , <1E70B011-E236-4931-AB6C-78EC47EA8126@m3w.org>
Message-ID:

 > http://msdn.microsoft.com/en-us/library/windows/desktop/dd317756%28v=vs.85%29.aspx
 > 12000  utf-32    Unicode UTF-32, little endian byte order; available only to managed applications
 > 12001  utf-32BE  Unicode UTF-32, big endian byte order; available only to managed applications

Is not useful to us...unless we target .NET instead of native code...
Portable Modula-3 or C it should be.

 - Jay

From dragisha at m3w.org Tue Jun 26 12:18:41 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Tue, 26 Jun 2012 12:18:41 +0200
Subject: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
Message-ID: <989FB282-786A-447D-8CA9-2432AE922DBF@m3w.org>

This piece of code, from TextClass.m3, disturbs me… a lot.
If we are to use WIDECHAR, I think we must be a lot more serious than this.

Probably, text pieces are limited to 128 bytes by design, somewhere. But - whose idea was to "narrow" by ignoring everything except 8 LSB's? By mapping set of 2^20 elements to set of 2^8 elements.

Probably by someone whose mother tongue is fully writeable with ASCII :).

====
PROCEDURE GetChars (t: TEXT;  VAR a: ARRAY OF CHAR;  start: CARDINAL) =
  VAR
    info : Info;
    cnt  : INTEGER;
    next : CARDINAL := 0;
    buf  : ARRAY [0..127] OF WIDECHAR;
  BEGIN
    t.get_info (info);
    cnt := MIN (NUMBER (a), info.length - start);
    WHILE (cnt > 0) DO
      t.get_wide_chars (buf, start);
      FOR i := FIRST (buf) TO LAST (buf) DO
        IF (cnt = 0) THEN RETURN END;
        a[next] := VAL (Word.And (ORD (buf[i]), 16_ff), CHAR);
        INC (next);  DEC (cnt);
      END;
      INC (start, NUMBER (buf));
    END;
  END GetChars;
====

From dabenavidesd at yahoo.es Tue Jun 26 14:12:42 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Tue, 26 Jun 2012 13:12:42 +0100 (BST)
Subject: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
In-Reply-To: <989FB282-786A-447D-8CA9-2432AE922DBF@m3w.org>
Message-ID: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com>

Hi all:
Maybe it is a leftover of older code (almost just used a decade ago), but if not, then was this meant to be just a partial implementation? If we are to get serious about memory usage, it seems over-strict (or just in case you don't need system NIL-terminated widechars to be checked?).
Thanks in advance

From dragisha at m3w.org Tue Jun 26 14:27:00 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Tue, 26 Jun 2012 14:27:00 +0200
Subject: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
In-Reply-To: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com>
References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com>
Message-ID: <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org>

If you cared to read, for example Text.i3, you would see this is exactly what cm3 people meant it to be.

On Jun 26, 2012, at 2:12 PM, Daniel Alejandro Benavides D. wrote:

> Hi all:
> Maybe is a left over of older code (almost just used a decade ago) but if not, then this meant to be just a partial implementation? If we are to get serious about memory usage seems over strict (or just in case you don't need system NIL terminated widechars be checked?).
> Thanks in advance

From dabenavidesd at yahoo.es Tue Jun 26 14:47:31 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Tue, 26 Jun 2012 13:47:31 +0100 (BST)
Subject: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
In-Reply-To: <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org>
Message-ID: <1340714851.17688.YahooMailClassic@web29702.mail.ird.yahoo.com>

Hi all:
copied that, but GetChars in the TextClass interface is kind of different from GetChar in Text. I can't see the interrelation.
Thanks in advance

--- On Tue, 26/6/12, Dragiša Durić wrote:

From: Dragiša Durić
Subject: Re: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
To: "Daniel Alejandro Benavides D."
CC: "m3devel"
Date: Tuesday, 26 June 2012 07:27

If you cared to read, for example Text.i3, you would see this is exactly what cm3 people meant to be.
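To make the concern about Word.And (ORD (buf[i]), 16_ff) concrete, the C sketch below shows what masking to the low 8 bits does to two code points, next to a checked alternative that refuses to narrow when data would be lost. This is only an illustration: uint32_t stands in for WIDECHAR (an assumption, since the thread is still debating its width), and narrow_masked and narrow_checked are hypothetical names, not cm3 procedures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Mirrors the masking step in the quoted GetChars: keep the low
   8 bits and silently discard everything above them. */
static unsigned char narrow_masked(uint32_t wide)
{
    return (unsigned char)(wide & 0xFF);
}

/* Hypothetical alternative: narrow only if nothing would be lost.
   Returns 1 and fills dst on success; returns 0 and leaves dst
   untouched if any element is above 0xFF. */
static int narrow_checked(const uint32_t *src, size_t n, unsigned char *dst)
{
    size_t i;
    assert(src != NULL && dst != NULL);
    for (i = 0; i < n; i++) {
        if (src[i] > 0xFF) return 0;   /* would lose data: refuse */
    }
    for (i = 0; i < n; i++) {
        dst[i] = (unsigned char)src[i];
    }
    return 1;
}
```

With masking, U+0161 (š) becomes 0x61, the letter 'a', and U+0107 (ć) becomes 0x07, a control character, so the result is a different string rather than a detectable error. The checked variant corresponds to the "raise an exception or test HasWideChars first" option raised later in the thread.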
From jay.krell at cornell.edu Tue Jun 26 16:28:46 2012
From: jay.krell at cornell.edu (Jay K)
Date: Tue, 26 Jun 2012 14:28:46 +0000
Subject: [M3devel] AND (., 16_ff). Not serious - or so I hope!
In-Reply-To: <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org>
References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com>, <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org>
Message-ID:

 > 128 limit

I haven't read the code enough yet to verify that but you are probably right

 > ignoring everything over 16_FF

Probably that is the responsibility/claim of the caller of GetChars.
If you want to be correct in the face of non-ASCII, you are probably obligated to call GetWideChars.
Perhaps raising an exception would be reasonable to signal the loss of data. Or something.
There is HasWideChars for you to check.

There is no encoding implied remember.
This isn't UTF8 data.

 - Jay

From dragisha at m3w.org Tue Jun 26 17:14:06 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Tue, 26 Jun 2012 17:14:06 +0200
Subject: [M3devel] AND (., 16_ff). Not serious - or so I hope!
In-Reply-To:
References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com>, <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org>
Message-ID: <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org>

On Jun 26, 2012, at 4:28 PM, Jay K wrote:

> > 128 limit
>
> I haven't read the code enough yet to verify that but you are probably right

I was not right :), that call is incremental.

> > ignoring everything over 16_FF
>
> Probably that is the responsibility/claim of the caller of GetChars.
> If you want to be correct in the face of non-ASCII, you are probably obligated to call GetWideChars.
> Perhaps raising an exception would be reasonable to signal the loss of data. Or something.
> There is HasWideChars for you to check.
>
> There is no encoding implied remember.
> This isn't UTF8 data.

It is not, but probably the only way to solve this without an exception is to make UTF8 the "official" 8bit encoding :)

From hendrik at topoi.pooq.com Tue Jun 26 18:00:05 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Tue, 26 Jun 2012 12:00:05 -0400
Subject: [M3devel] Windows, Unicode file names
In-Reply-To:
References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com> <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org> <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org> <20120625203422.GA24287@topoi.pooq.com>
Message-ID: <20120626160005.GA29355@topoi.pooq.com>

On Mon, Jun 25, 2012 at 08:46:18PM +0000, Jay K wrote:
>
> Somewhat but not fully. Text.Length should fetch a stored length. As
> I'm sure it already does. That length should always be correctly
> maintained. Same as today. Adding one extra nul at the end doesn't
> invalidate the data. std::string has the same properties -- c_str() can
> on-demand append a terminal nul, but there could also be one in the
> string itself. I understand it is a bit weird. Maintaining a terminal
> nul does add cost that might be wasted. And reduces the capacity by
> one. It could be on-demand, I guess.
>
> - Jay

Don't need the 'on demand'. For the benefits of C interoperability, the extra byte is well worth the price.

What I'm worrying about is someone using an embedded NUL as an end-of-string marker. I smell more bugs creeping in. But I guess bugs are inherent in C use, so I'm not surprised seeing them in C interoperation.

-- hendrik

From jay.krell at cornell.edu Tue Jun 26 18:34:01 2012
From: jay.krell at cornell.edu (Jay)
Date: Tue, 26 Jun 2012 09:34:01 -0700
Subject: [M3devel] AND (., 16_ff). Not serious - or so I hope!
In-Reply-To: <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org>
References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com> <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org> <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org>
Message-ID:

>> > 128 limit
>>
>> I haven't read the code enough yet to verify that but you are probably right
>
> I was not right :), that call is incremental.

I looked for that aspect too but missed it. :(

>> > ignoring everything over 16_FF
>>
>> Probably that is the responsibility/claim of the caller of GetChars.
>> If you want to be correct in the face of non-ASCII, you are probably obligated to call GetWideChars.
>> Perhaps raising an exception would be reasonable to signal the loss of data. Or something.
>> There is HasWideChars for you to check.
>>
>> There is no encoding implied remember.
>> This isn't UTF8 data.
>
> It is not, but probably only way to solve this without exception is to make UTF8 "official" 8bit encoding :)

I'm torn on that. We'd have to consider ramifications like Text.Length vs buffer size requirements/expectations.

Is TEXT & its use abstracted enough to have been widened? Should we put it back and introduce WIDETEXT? That is essentially what C and C++ do. They are inconvenient for existing code but simple, predictable, and make sense. Contrast with weird hybrid systems like Perl & Python for which I just can't get through the documentation and understand and predict how they work..

Java is in-between but also simple & predictable -- there being no narrow option other than array of byte, which is reasonable.

 - Jay

From dragisha at m3w.org Tue Jun 26 18:46:07 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Tue, 26 Jun 2012 18:46:07 +0200
Subject: [M3devel] AND (., 16_ff). Not serious - or so I hope!
In-Reply-To:
References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com> <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org> <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org>
Message-ID: <0A373E4C-32CB-4947-A542-67F0910DBEC4@m3w.org>

You had an idea in the other message. Store length!

Another idea - store a partial list of indices to character locations. So whatever one does, that list can be used/expanded.

Whatever storage issues this makes, they are probably minor as compared to the idea of a 32bit WIDECHAR for all.

Mika had performance problems with cm3 TEXT. I hope he follows and cares to refresh us on those issues?!

From dabenavidesd at yahoo.es Tue Jun 26 18:51:00 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Tue, 26 Jun 2012 17:51:00 +0100 (BST)
Subject: [M3devel] AND (., 16_ff). Not serious - or so I hope!
In-Reply-To: <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org>
Message-ID: <1340729460.40972.YahooMailClassic@web29701.mail.ird.yahoo.com>

Hi all:
it would be so much greater fun time/verify and correct than use by hand. Let's do it sooner than later.
Thanks in advance

From dragisha at m3w.org Tue Jun 26 19:01:42 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Tue, 26 Jun 2012 19:01:42 +0200
Subject: [M3devel] AND (., 16_ff). Not serious - or so I hope!
In-Reply-To: <0A373E4C-32CB-4947-A542-67F0910DBEC4@m3w.org>
References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com> <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org> <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org> <0A373E4C-32CB-4947-A542-67F0910DBEC4@m3w.org>
Message-ID: <117F1599-4A24-462F-9462-7CC756BB7E4B@m3w.org>

As for input encoding… Benjamin Kowarch (of M2R10 project) solved this with pragmas. There is a good idea on how to instruct the parser about the text encoding used for source code (meaning also the encoding used for string literals). As it's dependent on locale settings, it is important to let the compiler know how to parse the source. Of course, Unicode string literals will be stored as UTF8 strings after parsing.

On Jun 26, 2012, at 6:46 PM, Dragiša Durić wrote:

> You had idea in other message. Store length!
>
> Another idea - store partial list of indices to character locations.
So whatever one does, that list can be used/expanded. Whatever storage issues this makes, they are probably minor as compared to 32bit WIDECHAR for all idea.
>
> Mika had performance problems with cm3 TEXT. I hope he follows and cares to refresh us on those issues?!
>
> On Jun 26, 2012, at 6:34 PM, Jay wrote:
>
>> I'm torn on that. We'd have to consider ramifications like Text.Length vs buffer size requirements/expectations.
>>
>> Is TEXT & its use abstracted enough to have been widened? Should we put it back and introduce WIDETEXT? That is essentially what C and C++ do. They are inconvenient for existing code but simple, predictable, and make sense. Contrast with weird hybrid systems like Perl & Python, for which I just can't get through the documentation and understand and predict how they work.
>>

From hendrik at topoi.pooq.com Tue Jun 26 20:19:55 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Tue, 26 Jun 2012 14:19:55 -0400
Subject: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
In-Reply-To: <989FB282-786A-447D-8CA9-2432AE922DBF@m3w.org>
References: <989FB282-786A-447D-8CA9-2432AE922DBF@m3w.org>
Message-ID: <20120626181955.GB29355@topoi.pooq.com>

On Tue, Jun 26, 2012 at 12:18:41PM +0200, Dragiša Durić wrote:
> This piece of code, from TextClass.m3, disturbs me… a lot.
>
> If we are to use WIDECHAR, I think we must be a lot more serious than this.
>
> Probably, text pieces are limited to 128 bytes by design, somewhere. But - whose idea was to "narrow" by ignoring everything except 8 LSB's? By mapping set of 2^20 elements to set of 2^8 elements.
>
> Probably by someone whose mother tongue is fully writeable with ASCII :).

I'm told the Japanese hate UTF-8, because it expands their characters from two bytes to three.
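
[Editorial sketch] The two-bytes-versus-three remark above is easy to check. This is a minimal Python sketch, assuming that "two bytes" refers to a legacy encoding such as Shift_JIS or to UTF-16 (the thread does not say which); the sample string and the helper name `encoded_sizes` are arbitrary choices made for illustration.

```python
# Compare encoded sizes of a short Japanese string.
# Assumption: "two bytes" means a legacy encoding such as Shift_JIS,
# or UTF-16; the sample text is arbitrary.
sample = "\u65e5\u672c\u8a9e"  # three kanji, all in the Basic Multilingual Plane


def encoded_sizes(text):
    """Return the byte length of `text` in each encoding of interest."""
    return {
        name: len(text.encode(name))
        for name in ("shift_jis", "utf-16-le", "utf-8")
    }


sizes = encoded_sizes(sample)
for name, size in sizes.items():
    print(f"{name}: {size} bytes for {len(sample)} characters")
```

The flip side is that plain ASCII text stays at one byte per character in UTF-8 but doubles in UTF-16, which is the trade-off the rest of the thread argues over.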
-- hendrik From mika at async.caltech.edu Tue Jun 26 20:50:08 2012 From: mika at async.caltech.edu (Mika Nystrom) Date: Tue, 26 Jun 2012 11:50:08 -0700 Subject: [M3devel] Windows, Unicode file names In-Reply-To: <20120626160005.GA29355@topoi.pooq.com> References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com> <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org> <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org> <20120625203422.GA24287@topoi.pooq.com> <20120626160005.GA29355@topoi.pooq.com> Message-ID: <20120626185008.50E131A205B@async.async.caltech.edu> As far as I know, SRC M3 and PM3 come with a TEXT implementation that works exactly as described below. An extra byte is used at the end with a character VAL(0,CHAR). The Texts are simply arrays of 8-bit characters. One of the big advantages of the old version is that Text.Hash is really, really fast. Especially on Alphas... it's hugely more expensive to have hash tables (i.e., Modula-3 generic Tables) keyed on Texts under CM3 than under the old compilers and runtimes. We're talking a factor of five or so in speed since the Table routines are generally entirely dominated by Text.Hash. Mika Hendrik Boom writes: >On Mon, Jun 25, 2012 at 08:46:18PM +0000, Jay K wrote: >> >> Somewhat but not fully. Text.Length should fetch a stored length. As >> I'm sure it already does.That length should always be correctly >> maintained. Same as today.Adding one extra nul at the end doesn't >> invalidate the data.std::string has the same properties -- c_str() can >> on-demand append a terminal nul,but there could also be one in the >> string itself.I understand it is a bit wierd. Maintaining a terminal >> nul does add cost that might be wasted.And reduces the capacity by >> one.It could be on-demand, I guess. - Jay > >Don't need the 'on demand'. For the benefits of C interoperability, the >extra byte is well worth the price. What I'm worrying about is someone >using an enbedded NUL as an end-of-string marker. 
I smell more bugs
>creeping in. But I guess bugs are inherent in C use, so I'm not
>surprised seeing them in C interoperation.
>
>-- hendrik

From mika at async.caltech.edu Tue Jun 26 20:52:21 2012
From: mika at async.caltech.edu (Mika Nystrom)
Date: Tue, 26 Jun 2012 11:52:21 -0700
Subject: [M3devel] AND (., 16_ff). Not serious - or so I hope!
In-Reply-To: <0A373E4C-32CB-4947-A542-67F0910DBEC4@m3w.org>
References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com> <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org> <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org> <0A373E4C-32CB-4947-A542-67F0910DBEC4@m3w.org>
Message-ID: <20120626185221.24C8B1A205B@async.async.caltech.edu>

Dragiša Durić writes:
>
>You had idea in other message. Store length!
>
>Another idea - store partial list of indices to character locations. So
>whatever one does, that list can be used/expanded. Whatever storage
>issues this makes, they are probably minor as compared to 32bit WIDECHAR
>for all idea.
>
>Mika had performance problems with cm3 TEXT. I hope he follows and cares
>to refresh us on those issues?!

Apart from the hash table issue I mentioned, there were horrible performance issues when concatenating in particular ways, but I think that's been solved now.

I don't think anyone has looked at Text.Hash very closely.

    Mika

From dmuysers at hotmail.com Tue Jun 26 21:38:16 2012
From: dmuysers at hotmail.com (Dirk Muysers)
Date: Tue, 26 Jun 2012 21:38:16 +0200
Subject: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
In-Reply-To: <20120626181955.GB29355@topoi.pooq.com>
References: <989FB282-786A-447D-8CA9-2432AE922DBF@m3w.org> <20120626181955.GB29355@topoi.pooq.com>
Message-ID:

So let them hate it. Memory is not a problem anymore.
-------------------------------------------------- From: "Hendrik Boom" Sent: Tuesday, June 26, 2012 8:19 PM To: Subject: Re: [M3devel]AND (?, 16_ff)? Not serious - or so I hope! > On Tue, Jun 26, 2012 at 12:18:41PM +0200, Dragi?a Duri? wrote: >> This piece of code, from TextClass.m3, disturbs me? a lot. >> >> If we are to use WIDECHAR, I think we must be a lot more serious than >> this. >> >> Probably, text pieces are limited to 128 bytes by design, somewhere. >> But - whose idea was to "narrow" by ignoring everything except 8 LSB's? >> By mapping set of 2^20 elements to set of 2^8 elements. >> >> Probably by someone whose mother tongue is fully writeable with ASCII :). > > I'm told the Japanese hate UTF-8, because it expands their characters > from two bytes to three. > > -- hendrik > From dragisha at m3w.org Tue Jun 26 21:53:18 2012 From: dragisha at m3w.org (=?utf-8?Q?Dragi=C5=A1a_Duri=C4=87?=) Date: Tue, 26 Jun 2012 21:53:18 +0200 Subject: [M3devel] =?windows-1252?q?AND_=28=85=2C_16=5Fff=29=85_Not_seriou?= =?windows-1252?q?s_-_or_so_I_hope!?= In-Reply-To: References: <989FB282-786A-447D-8CA9-2432AE922DBF@m3w.org> <20120626181955.GB29355@topoi.pooq.com> Message-ID: <5D286A2C-CA1A-4F8B-846A-CDCBDACA2661@m3w.org> Also? If we add length info to TEXT fragments, we might as well add encoding info :). So, most of TEXT fragments in memory will use same (system default) encoding but there will also be a way to mix them, convert to system default or anything some API (like Win32) requires. On Jun 26, 2012, at 9:38 PM, Dirk Muysers wrote: > So let them hate it. Memory is not a problem anymore. > > -------------------------------------------------- > From: "Hendrik Boom" > Sent: Tuesday, June 26, 2012 8:19 PM > To: > Subject: Re: [M3devel]AND (?, 16_ff)? Not serious - or so I hope! > >> On Tue, Jun 26, 2012 at 12:18:41PM +0200, Dragi?a Duri? wrote: >>> This piece of code, from TextClass.m3, disturbs me? a lot. 
>>> >>> If we are to use WIDECHAR, I think we must be a lot more serious than this. >>> >>> Probably, text pieces are limited to 128 bytes by design, somewhere. But - whose idea was to "narrow" by ignoring everything except 8 LSB's? By mapping set of 2^20 elements to set of 2^8 elements. >>> >>> Probably by someone whose mother tongue is fully writeable with ASCII :). >> >> I'm told the Japanese hate UTF-8, because it expands their characters >> from two bytes to three. >> >> -- hendrik From rcolebur at SCIRES.COM Tue Jun 26 22:22:22 2012 From: rcolebur at SCIRES.COM (Coleburn, Randy) Date: Tue, 26 Jun 2012 16:22:22 -0400 Subject: [M3devel] EXT Re: AND (., 16_ff). Not serious - or so I hope! In-Reply-To: <0A373E4C-32CB-4947-A542-67F0910DBEC4@m3w.org> References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com> <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org> <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org> <0A373E4C-32CB-4947-A542-67F0910DBEC4@m3w.org> Message-ID: I seem to recall that Rodney did some work a while back relating to TEXT. Rodney, can you weigh in on some of this? --Randy Coleburn From: Dragi?a Duri? [mailto:dragisha at m3w.org] Sent: Tuesday, June 26, 2012 12:46 PM To: Jay Cc: m3devel Subject: EXT Re: [M3devel] AND (., 16_ff). Not serious - or so I hope! You had idea in other message. Store length! Another idea - store partial list of indices to character locations. So whatever one does, that list can be used/expanded. Whatever storage issues this makes, they are probably minor as compared to 32bit WIDECHAR for all idea. Mika had performance problems with cm3 TEXT. I hope he follows and cares to refresh us on those issues?! On Jun 26, 2012, at 6:34 PM, Jay wrote: I'm torn on that. We'd have to consider ramifications like Text.Length vs buffer size requirements/expectations. Is TEXT & its use abstracted enough to have been widened? Should we put it back and introduce WIDETEXT? That is essentially what C and C++ do. 
They are inconvenient for existing code but simple predictable make sense. Contrast with weird hybrid systems like Perl & Python for which I just can't get through the documentation and understand and predict how they work.. -------------- next part -------------- An HTML attachment was scrubbed... URL: From rodney_bates at lcwb.coop Tue Jun 26 23:42:02 2012 From: rodney_bates at lcwb.coop (Rodney M. Bates) Date: Tue, 26 Jun 2012 16:42:02 -0500 Subject: [M3devel] EXT Re: AND (., 16_ff). Not serious - or so I hope! In-Reply-To: References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com> <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org> <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org> <0A373E4C-32CB-4947-A542-67F0910DBEC4@m3w.org> Message-ID: <4FEA2CAA.4010306@lcwb.coop> On 06/26/2012 03:22 PM, Coleburn, Randy wrote: > I seem to recall that Rodney did some work a while back relating to TEXT. > > Rodney, can you weigh in on some of this? > I wrote a modified implementation of cm3 TEXT. It uses the same data structure and invariants, so any internal values it creates are useable by any existing code that imports the various revelations. It improves performance problems deriving from Cat operations' building trees that actually degenerate into linear lists (fairly likely, as it happens whenever a string is constructed by a left-to-right or right-to-left series of concatenations.) As usual, some operations on some values are slower, but it seems to be a gain overall. I have an extensive test driver and statistics gatherer, which shows good results. However, only tested it on LINUXLIBC6 and AMD64_LINUX, machines I have. Olaf was not comfortable that it was fully tested this way, and I have never taken the time to figure out how to run tests on targets I don't have. I think it is a significant improvement over the stock cm3 TEXT implementation. Whether it is as good as just going back to the pm3 implementation is not so clear. 
All three implementations correctly (except for possible bugs--none known at present, AFAIK) implement the language's abstract Text interface, and code that only uses Text would see only performance differences. > --Randy Coleburn > > *From:*Dragi?a Duri? [mailto:dragisha at m3w.org] > *Sent:* Tuesday, June 26, 2012 12:46 PM > *To:* Jay > *Cc:* m3devel > *Subject:* EXT Re: [M3devel] AND (., 16_ff). Not serious - or so I hope! > > You had idea in other message. Store length! > > Another idea - store partial list of indices to character locations. So whatever one does, that list can be used/expanded. Whatever storage issues this makes, they are probably minor as compared to 32bit WIDECHAR for all idea. > > Mika had performance problems with cm3 TEXT. I hope he follows and cares to refresh us on those issues?! > > On Jun 26, 2012, at 6:34 PM, Jay wrote: > > > > I'm torn on that. We'd have to consider ramifications like Text.Length vs buffer size requirements/expectations. > > Is TEXT & its use abstracted enough to have been widened? Should we put it back and introduce WIDETEXT? That is essentially what C and C++ do. They are inconvenient for existing code but simple predictable make sense. Contrast with weird hybrid systems like Perl & Python for which I just can't get through the documentation and understand and predict how they work.. > From hendrik at topoi.pooq.com Wed Jun 27 00:16:39 2012 From: hendrik at topoi.pooq.com (Hendrik Boom) Date: Tue, 26 Jun 2012 18:16:39 -0400 Subject: [M3devel] TEXT In-Reply-To: <5D286A2C-CA1A-4F8B-846A-CDCBDACA2661@m3w.org> References: <989FB282-786A-447D-8CA9-2432AE922DBF@m3w.org> <20120626181955.GB29355@topoi.pooq.com> <5D286A2C-CA1A-4F8B-846A-CDCBDACA2661@m3w.org> Message-ID: <20120626221639.GA28021@topoi.pooq.com> On Tue, Jun 26, 2012 at 09:53:18PM +0200, Dragi?a Duri? wrote: > Also? If we add length info to TEXT fragments, we might as well add encoding info :). 
We could do that by letting TEXT have subtypes, depending on the encoding. -- hendrik From rcolebur at SCIRES.COM Wed Jun 27 01:44:26 2012 From: rcolebur at SCIRES.COM (Coleburn, Randy) Date: Tue, 26 Jun 2012 19:44:26 -0400 Subject: [M3devel] EXT Re: EXT Re: AND (., 16_ff). Not serious - or so I hope! In-Reply-To: <4FEA2CAA.4010306@lcwb.coop> References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com> <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org> <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org> <0A373E4C-32CB-4947-A542-67F0910DBEC4@m3w.org> <4FEA2CAA.4010306@lcwb.coop> Message-ID: I am willing to run tests on platforms that I have, mostly Windows flavors. --Randy Coleburn -----Original Message----- From: Rodney M. Bates [mailto:rodney_bates at lcwb.coop] Sent: Tuesday, June 26, 2012 5:42 PM To: m3devel at elegosoft.com Subject: EXT Re: [M3devel] EXT Re: AND (., 16_ff). Not serious - or so I hope! On 06/26/2012 03:22 PM, Coleburn, Randy wrote: > I seem to recall that Rodney did some work a while back relating to TEXT. > > Rodney, can you weigh in on some of this? > I wrote a modified implementation of cm3 TEXT. It uses the same data structure and invariants, so any internal values it creates are useable by any existing code that imports the various revelations. It improves performance problems deriving from Cat operations' building trees that actually degenerate into linear lists (fairly likely, as it happens whenever a string is constructed by a left-to-right or right-to-left series of concatenations.) As usual, some operations on some values are slower, but it seems to be a gain overall. I have an extensive test driver and statistics gatherer, which shows good results. However, only tested it on LINUXLIBC6 and AMD64_LINUX, machines I have. Olaf was not comfortable that it was fully tested this way, and I have never taken the time to figure out how to run tests on targets I don't have. 
I think it is a significant improvement over the stock cm3 TEXT implementation. Whether it is as good as just going back to the pm3 implementation is not so clear. All three implementations correctly (except for possible bugs--none known at present, AFAIK) implement the language's abstract Text interface, and code that only uses Text would see only performance differences. > --Randy Coleburn > > *From:*Dragi?a Duri? [mailto:dragisha at m3w.org] > *Sent:* Tuesday, June 26, 2012 12:46 PM > *To:* Jay > *Cc:* m3devel > *Subject:* EXT Re: [M3devel] AND (., 16_ff). Not serious - or so I hope! > > You had idea in other message. Store length! > > Another idea - store partial list of indices to character locations. So whatever one does, that list can be used/expanded. Whatever storage issues this makes, they are probably minor as compared to 32bit WIDECHAR for all idea. > > Mika had performance problems with cm3 TEXT. I hope he follows and cares to refresh us on those issues?! > > On Jun 26, 2012, at 6:34 PM, Jay wrote: > > > > I'm torn on that. We'd have to consider ramifications like Text.Length vs buffer size requirements/expectations. > > Is TEXT & its use abstracted enough to have been widened? Should we put it back and introduce WIDETEXT? That is essentially what C and C++ do. They are inconvenient for existing code but simple predictable make sense. Contrast with weird hybrid systems like Perl & Python for which I just can't get through the documentation and understand and predict how they work.. > From dabenavidesd at yahoo.es Wed Jun 27 03:41:33 2012 From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.) Date: Wed, 27 Jun 2012 02:41:33 +0100 (BST) Subject: [M3devel] EXT Re: EXT Re: AND (., 16_ff). Not serious - or so I hope! 
In-Reply-To: Message-ID: <1340761293.52332.YahooMailClassic@web29704.mail.ird.yahoo.com> Hi all: even if we have non-faulty implementation the problem remains the same, the coding standard is non-uniformly used, but instead use old TEXT with C cross-compiled version seemed the way to win at least in Win flavors. I give that point to Jay, he is absolutely right,? I fear that if we don't do this correctly, we could loss in C compiler intrinsics. Perhaps before all this work continues we need to port this better we won't do real big advances more quickly. Thanks in advance --- El mar, 26/6/12, Coleburn, Randy escribi?: De: Coleburn, Randy Asunto: Re: [M3devel] EXT Re: EXT Re: AND (., 16_ff). Not serious - or so I hope! Para: "m3devel at elegosoft.com" Fecha: martes, 26 de junio, 2012 18:44 I am willing to run tests on platforms that I have, mostly Windows flavors. --Randy Coleburn -----Original Message----- From: Rodney M. Bates [mailto:rodney_bates at lcwb.coop] Sent: Tuesday, June 26, 2012 5:42 PM To: m3devel at elegosoft.com Subject: EXT Re: [M3devel] EXT Re: AND (., 16_ff). Not serious - or so I hope! On 06/26/2012 03:22 PM, Coleburn, Randy wrote: > I seem to recall that Rodney did some work a while back relating to TEXT. > > Rodney, can you weigh in on some of this? > I wrote a modified implementation of cm3 TEXT.? It uses the same data structure and invariants, so any internal values it creates are useable by any existing code that imports the various revelations.? It improves performance problems deriving from Cat operations' building trees that actually degenerate into linear lists (fairly likely, as it happens whenever a string is constructed by a left-to-right or right-to-left series of concatenations.)? As usual, some operations on some values are slower, but it seems to be a gain overall. I have an extensive test driver and statistics gatherer, which shows good results.? However, only tested it on LINUXLIBC6 and AMD64_LINUX, machines I have.? 
Olaf was not comfortable that it was fully tested this way, and I have never taken the time to figure out how to run tests on targets I don't have. I think it is a significant improvement over the stock cm3 TEXT implementation. Whether it is as good as just going back to the pm3 implementation is not so clear. All three implementations correctly (except for possible bugs--none known at present, AFAIK) implement the language's abstract Text interface, and code that only uses Text would see only performance differences. > --Randy Coleburn > > *From:*Dragi?a Duri? [mailto:dragisha at m3w.org] > *Sent:* Tuesday, June 26, 2012 12:46 PM > *To:* Jay > *Cc:* m3devel > *Subject:* EXT Re: [M3devel] AND (., 16_ff). Not serious - or so I hope! > > You had idea in other message. Store length! > > Another idea - store partial list of indices to character locations. So whatever one does, that list can be used/expanded. Whatever storage issues this makes, they are probably minor as compared to 32bit WIDECHAR for all idea. > > Mika had performance problems with cm3 TEXT. I hope he follows and cares to refresh us on those issues?! > > On Jun 26, 2012, at 6:34 PM, Jay wrote: > > > > I'm torn on that. We'd have to consider ramifications like Text.Length vs buffer size requirements/expectations. > > Is TEXT & its use abstracted enough to have been widened? Should we put it back and introduce WIDETEXT? That is essentially what C and C++ do. They are inconvenient for existing code but simple predictable make sense. Contrast with weird hybrid systems like Perl & Python for which I just can't get through the documentation and understand and predict how they work.. > -------------- next part -------------- An HTML attachment was scrubbed... URL: From dabenavidesd at yahoo.es Wed Jun 27 03:54:31 2012 From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.) 
Date: Wed, 27 Jun 2012 02:54:31 +0100 (BST) Subject: [M3devel] TEXT In-Reply-To: <20120626221639.GA28021@topoi.pooq.com> Message-ID: <1340762071.63111.YahooMailClassic@web29704.mail.ird.yahoo.com> Hi all: I don't know, but if this would coexist with everything (e.g C) hard to know is whether this will affect the overall performance, sometimes this is like that (for instance CM3 Text), but perhaps if it's just costs in memory then I wish it were like that. Thanks in advance --- El mar, 26/6/12, Hendrik Boom escribi?: De: Hendrik Boom Asunto: Re: [M3devel] TEXT Para: m3devel at elegosoft.com Fecha: martes, 26 de junio, 2012 17:16 On Tue, Jun 26, 2012 at 09:53:18PM +0200, Dragi?a Duri? wrote: > Also? If we add length info to TEXT fragments, we might as well add encoding info :). We could do that by letting TEXT have subtypes, depending on the encoding. -- hendrik -------------- next part -------------- An HTML attachment was scrubbed... URL: From mika at async.caltech.edu Wed Jun 27 03:54:57 2012 From: mika at async.caltech.edu (Mika Nystrom) Date: Tue, 26 Jun 2012 18:54:57 -0700 Subject: [M3devel] =?utf-8?b?QU5EICjigKYsIDE2X2ZmKeKApiBOb3Qgc2VyaW91cyAt?= =?utf-8?q?_or_so_I_hope!?= In-Reply-To: References: <989FB282-786A-447D-8CA9-2432AE922DBF@m3w.org> <20120626181955.GB29355@topoi.pooq.com> Message-ID: <20120627015457.238041A205B@async.async.caltech.edu> Memory is always potentially a problem!!!! One of the main reasons my group was slow at switching from PM3 to CM3 was because we were processing node names for chip designs as TEXTs. Chip designs tend to be deeply hierarchical and you wind up printing a lot of strings such as a.b.c.d.e.f.g.h to files. That's when you run into problems with Text.Cat. And memory will always be a problem since you are always designing the next generation of computers with the current generation of computers. Also even if memory weren't a problem, speed is always a problem, and speed isn't entirely unrelated to memory. 
The Text.Hash I was alluding to earlier hashes eight characters per iteration on a 64-bit machine, as long as characters are 8 bits... If you go to 16 bits it'll take at least twice as long. Furthermore, if there is more than one way (bit pattern) to represent a single CHAR, it becomes difficult to use algorithms that take more than one at a time.

    Mika

"Dirk Muysers" writes:
>So let them hate it. Memory is not a problem anymore.
>
>--------------------------------------------------
>From: "Hendrik Boom"
>Sent: Tuesday, June 26, 2012 8:19 PM
>To:
>Subject: Re: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
>
>> On Tue, Jun 26, 2012 at 12:18:41PM +0200, Dragiša Durić wrote:
>>> This piece of code, from TextClass.m3, disturbs me… a lot.
>>>
>>> If we are to use WIDECHAR, I think we must be a lot more serious than
>>> this.
>>>
>>> Probably, text pieces are limited to 128 bytes by design, somewhere.
>>> But - whose idea was to "narrow" by ignoring everything except 8 LSB's?
>>> By mapping set of 2^20 elements to set of 2^8 elements.
>>>
>>> Probably by someone whose mother tongue is fully writeable with ASCII :).
>>
>> I'm told the Japanese hate UTF-8, because it expands their characters
>> from two bytes to three.
>>
>> -- hendrik
>>

From dabenavidesd at yahoo.es Wed Jun 27 04:18:53 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Wed, 27 Jun 2012 03:18:53 +0100 (BST)
Subject: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
In-Reply-To: <20120627015457.238041A205B@async.async.caltech.edu>
Message-ID: <1340763533.57548.YahooMailClassic@web29706.mail.ird.yahoo.com>

Hi all:
Well, you have created a chicken-and-egg problem/opportunity: you can create your theory and move forward. I guess history has shown that it is not big chunks of memory that make programs execute faster, but distributed machines with less memory.
The problem is I could set a theory that explains how computers could actually evolve and so on, based on family of computers, you know like Stack Computers, and it turns out that in reality it doesn't work like that, and by the reality it's not true (also I don't consider the "reality" to be that, I don't think tablets and stuff will be takers of tomorrow as today, it's very very useful, as were Micros in their time but can't come back and do that again, Micros are gone). I don't think or hate devices or people who uses it (perhaps I'm old for that) but this things are mostly used to send messages to set up quickly a web page (an every day task which I still consider for talented people) Frankly we can say many thing sin theory again but just good people and companies can make a standard way of doing things. Thanks in advance --- El mar, 26/6/12, Mika Nystrom escribi?: De: Mika Nystrom Asunto: Re: [M3devel] AND (?, 16_ff)? Not serious - or so I hope! Para: "Dirk Muysers" CC: m3devel at elegosoft.com Fecha: martes, 26 de junio, 2012 20:54 Memory is always potentially a problem!!!! One of the main reasons my group was slow at switching from PM3 to CM3 was because we were processing node names for chip designs as TEXTs. Chip designs tend to be deeply hierarchical and you wind up printing a lot of strings such as a.b.c.d.e.f.g.h to files. That's when you run into problems with Text.Cat. And memory will always be a problem since you are always designing the next generation of computers with the current generation of computers. Also even if memory weren't a problem, speed is always a problem, and speed isn't entirely unrelated to memory.? The Text.Hash I was alluding to earlier hashes eight characters per iteration on a 64-bit machine, as long as characters are 8 bits...? If you go to 16 bits it'll take at least twice as long.? 
Furthermore if there is more than one way (bit pattern) to represent a single CHAR it becomes difficult to use algorithms that take more than one at a time. ? ? Mika "Dirk Muysers" writes: >So let them hate it. Memory is not a problem anymore. > >-------------------------------------------------- >From: "Hendrik Boom" >Sent: Tuesday, June 26, 2012 8:19 PM >To: >Subject: Re: [M3devel]AND (?, 16_ff)? Not serious - or so I hope! > >> On Tue, Jun 26, 2012 at 12:18:41PM +0200, Dragi?a Duri? wrote: >>> This piece of code, from TextClass.m3, disturbs me? a lot. >>> >>> If we are to use WIDECHAR, I think we must be a lot more serious than >>> this. >>> >>> Probably, text pieces are limited to 128 bytes by design, somewhere. >>> But - whose idea was to "narrow" by ignoring everything except 8 LSB's? >>> By mapping set of 2^20 elements to set of 2^8 elements. >>> >>> Probably by someone whose mother tongue is fully writeable with ASCII :). >> >> I'm told the Japanese hate UTF-8, because it expands? their characters >> from two bytes to three. >> >> -- hendrik >> -------------- next part -------------- An HTML attachment was scrubbed... URL: From hendrik at topoi.pooq.com Wed Jun 27 05:30:01 2012 From: hendrik at topoi.pooq.com (Hendrik Boom) Date: Tue, 26 Jun 2012 23:30:01 -0400 Subject: [M3devel] UTF-8 TEXT In-Reply-To: References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com> <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org> <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org> <0A373E4C-32CB-4947-A542-67F0910DBEC4@m3w.org> Message-ID: <20120627033000.GB28021@topoi.pooq.com> On Tue, Jun 26, 2012 at 04:22:22PM -0400, Coleburn, Randy wrote: > I seem to recall that Rodney did some work a while back relating to TEXT. > Rodney, can you weigh in on some of this? > --Randy Coleburn > > From: Dragi?a Duri? [mailto:dragisha at m3w.org] > Sent: Tuesday, June 26, 2012 12:46 PM > To: Jay > Cc: m3devel > Subject: EXT Re: [M3devel] AND (., 16_ff). Not serious - or so I hope! 
>
> You had idea in other message. Store length!
>
> Another idea - store partial list of indices to character locations. So whatever one does, that list can be used/expanded. Whatever storage issues this makes, they are probably minor as compared to 32bit WIDECHAR for all idea.

Most of the time, you don't need explicit integer indexes to character locations. What you do need is an operation that fetches a character given the string and its index (whatever data structure that index is), and one that increments the index past that character. As long as you can save an index and use it later on the same string, that's probably all you ever need. And with a simple TEXT representation (such as the obvious array of bytes containing characters of various widths) a byte index is all you need (note: NOT a character index). It's easy even to use TEXT and its integer indices as the data representation, as long as you use the proper functions to parse the characters and increment the indices by amounts that might differ from 1.

And if your source code is represented in UTF-8, the representation that requires little extra compiler effort to parse, your TEXT strings will automagically appear in UTF-8.

I can see a use for various wide characters -- the things you extract from a TEXT by parsing bits of it, but none for anything really new complicated for wide TEXT.

The only confusing thing is that the existing operations for extracting bytes from TEXT have names that suggest they are extracting characters.
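
The byte-index traversal described above can be sketched roughly as follows. This is only an illustration under stated assumptions: `Utf8Walk` and `Next` are hypothetical names (nothing like them is claimed to exist in m3core), the TEXT is assumed to hold well-formed UTF-8 with one byte per CHAR, and no validation of lead or continuation bytes is attempted. Only `Text.GetChar`, `Text.Length`, `Word.And`, `Word.Or`, `Word.LeftShift`, `Fmt.Int` and `IO.Put` are taken from the standard interfaces.

```modula3
MODULE Utf8Walk EXPORTS Main;

IMPORT Fmt, IO, Text, Word;

(* Hypothetical helper: decode the code point whose first byte is at
   byte index i of t, and advance i past it.  Assumes t holds
   well-formed UTF-8, one byte per CHAR; malformed input is not
   detected. *)
PROCEDURE Next (t: TEXT; VAR i: CARDINAL): INTEGER =
  VAR
    b  : INTEGER := ORD (Text.GetChar (t, i));
    n  : CARDINAL;  (* number of continuation bytes that follow *)
    cp : INTEGER;   (* code point accumulated so far *)
  BEGIN
    IF    b < 16_80 THEN n := 0; cp := b;
    ELSIF b < 16_E0 THEN n := 1; cp := Word.And (b, 16_1F);
    ELSIF b < 16_F0 THEN n := 2; cp := Word.And (b, 16_0F);
    ELSE                 n := 3; cp := Word.And (b, 16_07);
    END;
    INC (i);
    FOR k := 1 TO n DO
      (* Each continuation byte contributes its low 6 bits. *)
      cp := Word.Or (Word.LeftShift (cp, 6),
                     Word.And (ORD (Text.GetChar (t, i)), 16_3F));
      INC (i);
    END;
    RETURN cp;
  END Next;

VAR
  s   : TEXT     := "h\303\251llo";  (* "héllo" written as UTF-8 bytes *)
  pos : CARDINAL := 0;               (* a byte index, not a character index *)

BEGIN
  WHILE pos < Text.Length (s) DO
    IO.Put ("U+" & Fmt.Int (Next (s, pos), 16) & "\n");
  END;
END Utf8Walk.
```

Here `Text.Length (s)` is 6 (bytes) while the loop body runs 5 times; the two bytes `\303\251` come out as the single code point 16_E9. A saved value of `pos` can be reused later on the same string, which is the only property the argument above relies on.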
-- Hendrik

From dmuysers at hotmail.com  Wed Jun 27 09:58:28 2012
From: dmuysers at hotmail.com (Dirk Muysers)
Date: Wed, 27 Jun 2012 09:58:28 +0200
Subject: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
In-Reply-To: <20120627015457.238041A205B@async.async.caltech.edu>
References: <989FB282-786A-447D-8CA9-2432AE922DBF@m3w.org> <20120626181955.GB29355@topoi.pooq.com> <20120627015457.238041A205B@async.async.caltech.edu>
Message-ID:

Some time ago I started to develop a Unicode library based on the old M3 text model but using UTF-8 internally rather than Latin-1 (see README attachment). For reasons best known to me I had to put it on the back burner in favour of more urgent work. If anybody is interested in furthering this solution I would eagerly give the existing (pre-alpha) code away.
That said, there are certainly better hash algorithms than the one used by m3core (e.g. Goulburn, see http://www.clockandflame.com/media/Goulburn06.pdf).

--------------------------------------------------
From: "Mika Nystrom"
Sent: Wednesday, June 27, 2012 3:54 AM
To: "Dirk Muysers"
Cc:
Subject: Re: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!

> Memory is always potentially a problem!!!!
>
> One of the main reasons my group was slow at switching from PM3 to CM3
> was because we were processing node names for chip designs as TEXTs.
>
> Chip designs tend to be deeply hierarchical and you wind up printing a
> lot of strings such as
>
> a.b.c.d.e.f.g.h
>
> to files.
>
> That's when you run into problems with Text.Cat.
>
> And memory will always be a problem since you are always designing the
> next generation of computers with the current generation of computers.
>
> Also even if memory weren't a problem, speed is always a problem, and
> speed isn't entirely unrelated to memory. The Text.Hash I was alluding
> to earlier hashes eight characters per iteration on a 64-bit machine,
> as long as characters are 8 bits...
If you go to 16 bits it'll take
> at least twice as long. Furthermore if there is more than one way
> (bit pattern) to represent a single CHAR it becomes difficult to use
> algorithms that take more than one at a time.
>
> Mika
>
> "Dirk Muysers" writes:
>>So let them hate it. Memory is not a problem anymore.
>>
>>--------------------------------------------------
>>From: "Hendrik Boom"
>>Sent: Tuesday, June 26, 2012 8:19 PM
>>To:
>>Subject: Re: [M3devel]AND (…, 16_ff)… Not serious - or so I hope!
>>
>>> On Tue, Jun 26, 2012 at 12:18:41PM +0200, Dragiša Durić wrote:
>>>> This piece of code, from TextClass.m3, disturbs me… a lot.
>>>>
>>>> If we are to use WIDECHAR, I think we must be a lot more serious than
>>>> this.
>>>>
>>>> Probably, text pieces are limited to 128 bytes by design, somewhere.
>>>> But - whose idea was to "narrow" by ignoring everything except 8 LSB's?
>>>> By mapping set of 2^20 elements to set of 2^8 elements.
>>>>
>>>> Probably by someone whose mother tongue is fully writeable with ASCII
>>>> :).
>>>
>>> I'm told the Japanese hate UTF-8, because it expands their characters
>>> from two bytes to three.
>>>
>>> -- hendrik
>>>
>

From dragisha at m3w.org  Wed Jun 27 11:52:53 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Wed, 27 Jun 2012 11:52:53 +0200
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <20120626185008.50E131A205B@async.async.caltech.edu>
References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com> <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org> <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org> <20120625203422.GA24287@topoi.pooq.com> <20120626160005.GA29355@topoi.pooq.com> <20120626185008.50E131A205B@async.async.caltech.edu>
Message-ID:

More and more is obvious how ideal structure would be: ARRAY OF CHAR, UTF8 encoded, using SRC M3 Text.Hash().
What we need is to make compiler map from input encoding (whatever user chooses or is chosen for him) to internal UTF8.

On Jun 26, 2012, at 8:50 PM, Mika Nystrom wrote:

>
> As far as I know, SRC M3 and PM3 come with a TEXT implementation that
> works exactly as described below. An extra byte is used at the end with
> a character VAL(0,CHAR). The Texts are simply arrays of 8-bit characters.
>
> One of the big advantages of the old version is that Text.Hash is really,
> really fast. Especially on Alphas... it's hugely more expensive to
> have hash tables (i.e., Modula-3 generic Tables) keyed on Texts under
> CM3 than under the old compilers and runtimes. We're talking a factor
> of five or so in speed since the Table routines are generally entirely
> dominated by Text.Hash.
>
> Mika
>
> Hendrik Boom writes:
>> On Mon, Jun 25, 2012 at 08:46:18PM +0000, Jay K wrote:
>>>
>>> Somewhat but not fully. Text.Length should fetch a stored length. As
>>> I'm sure it already does. That length should always be correctly
>>> maintained. Same as today. Adding one extra nul at the end doesn't
>>> invalidate the data. std::string has the same properties -- c_str() can
>>> on-demand append a terminal nul, but there could also be one in the
>>> string itself. I understand it is a bit weird. Maintaining a terminal
>>> nul does add cost that might be wasted. And reduces the capacity by
>>> one. It could be on-demand, I guess. - Jay
>>
>> Don't need the 'on demand'. For the benefits of C interoperability, the
>> extra byte is well worth the price. What I'm worrying about is someone
>> using an embedded NUL as an end-of-string marker. I smell more bugs
>> creeping in. But I guess bugs are inherent in C use, so I'm not
>> surprised seeing them in C interoperation.
>>
>> -- hendrik

From jay.krell at cornell.edu  Wed Jun 27 12:19:08 2012
From: jay.krell at cornell.edu (Jay K)
Date: Wed, 27 Jun 2012 10:19:08 +0000
Subject: [M3devel] Windows, Unicode file names
In-Reply-To:
References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>, <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org>, , <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org>, , <20120625203422.GA24287@topoi.pooq.com>, , <20120626160005.GA29355@topoi.pooq.com>, <20120626185008.50E131A205B@async.async.caltech.edu>,
Message-ID:

> More and more is obvious how ideal structure would be: ARRAY OF CHAR, UTF8 encoded, using SRC M3 Text.Hash().

I don't quite agree.
There are two ideal approaches.

1) TEXT is like ARRAY OF CHAR and no values over 0xFF (or maybe even 0x7F)
   "WIDETEXT" is like ARRAY OF WIDECHAR, for 16bit or 32bit WIDECHAR

2) something that can change between them, or possibly store both, but is still mainly flat arrays
That is, once you store a value over 0xFF, the internal representation changes to flat array of WIDECHAR.
Probably it stays that way -- you don't want to thrash back and forth in worst case.
Lesser evil is probably to stick with wide representation.
Setting the string to empty might bounce it back narrow.
Ditto assigning it from another narrow text, maybe.

What I don't yet understand in all this is how to efficiently combine thread safety, immutability, and quadratic growth.

The following should be as efficient as in typical C++ libraries:

VAR a: TEXT;
WHILE TRUE DO
  a := a & " ";
END;

I kind of think that immutability and quadratic growth are in conflict.
But not because that sounds obvious.
Note that typical C++ libraries do have value semantics for std::string and std::vector.
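
For comparison, the loop above can be set beside a writer-based version that appends into a growable buffer and produces the TEXT once at the end. This is a hedged sketch, not a benchmark and not a statement about how any particular TEXT implementation behaves: `CatLoop`, `ByCat` and `ByWriter` are hypothetical names, the loops are bounded by an assumed count `n` so that the program terminates, and `a` starts as the empty TEXT. `TextWr.New`, `TextWr.ToText` and `Wr.PutChar` are the standard libm3 calls.

```modula3
MODULE CatLoop EXPORTS Main;

IMPORT Fmt, IO, Text, TextWr, Thread, Wr;

<* FATAL Wr.Failure, Thread.Alerted *>

(* Builds the result with repeated concatenation.  Every iteration
   yields a new immutable TEXT value; how much is copied or allocated
   per step depends on the TEXT implementation in use. *)
PROCEDURE ByCat (n: CARDINAL): TEXT =
  VAR a: TEXT := "";
  BEGIN
    FOR i := 1 TO n DO a := a & " " END;
    RETURN a;
  END ByCat;

(* Appends into a text writer and converts to TEXT once at the end,
   so the intermediate states are never exposed as TEXT values. *)
PROCEDURE ByWriter (n: CARDINAL): TEXT =
  VAR wr := TextWr.New ();
  BEGIN
    FOR i := 1 TO n DO Wr.PutChar (wr, ' ') END;
    RETURN TextWr.ToText (wr);
  END ByWriter;

BEGIN
  IO.Put (Fmt.Int (Text.Length (ByCat (1000))) & "\n");
  IO.Put (Fmt.Int (Text.Length (ByWriter (1000))) & "\n");
END CatLoop.
```

Both procedures return a TEXT of length `n`; the difference is only in where the mutable state lives. The writer is private to `ByWriter`, so the immutability of TEXT as seen by clients is not affected.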
- Jay

> From: dragisha at m3w.org
> Date: Wed, 27 Jun 2012 11:52:53 +0200
> To: mika at async.caltech.edu
> CC: m3devel at elegosoft.com
> Subject: Re: [M3devel] Windows, Unicode file names
>
> More and more is obvious how ideal structure would be: ARRAY OF CHAR, UTF8 encoded, using SRC M3 Text.Hash().
>
> What we need is to make compiler map from input encoding (whatever user chooses or is chosen for him) to internal UTF8.
>
> On Jun 26, 2012, at 8:50 PM, Mika Nystrom wrote:
>
> >
> > As far as I know, SRC M3 and PM3 come with a TEXT implementation that
> > works exactly as described below. An extra byte is used at the end with
> > a character VAL(0,CHAR). The Texts are simply arrays of 8-bit characters.
> >
> > One of the big advantages of the old version is that Text.Hash is really,
> > really fast. Especially on Alphas... it's hugely more expensive to
> > have hash tables (i.e., Modula-3 generic Tables) keyed on Texts under
> > CM3 than under the old compilers and runtimes. We're talking a factor
> > of five or so in speed since the Table routines are generally entirely
> > dominated by Text.Hash.
> >
> > Mika
> >
> > Hendrik Boom writes:
> >> On Mon, Jun 25, 2012 at 08:46:18PM +0000, Jay K wrote:
> >>>
> >>> Somewhat but not fully. Text.Length should fetch a stored length. As
> >>> I'm sure it already does. That length should always be correctly
> >>> maintained. Same as today. Adding one extra nul at the end doesn't
> >>> invalidate the data. std::string has the same properties -- c_str() can
> >>> on-demand append a terminal nul, but there could also be one in the
> >>> string itself. I understand it is a bit weird. Maintaining a terminal
> >>> nul does add cost that might be wasted. And reduces the capacity by
> >>> one. It could be on-demand, I guess. - Jay
> >>
> >> Don't need the 'on demand'. For the benefits of C interoperability, the
> >> extra byte is well worth the price.
What I'm worrying about is someone
> >> using an embedded NUL as an end-of-string marker. I smell more bugs
> >> creeping in. But I guess bugs are inherent in C use, so I'm not
> >> surprised seeing them in C interoperation.
> >>
> >> -- hendrik
>

From dragisha at m3w.org  Wed Jun 27 13:14:22 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Wed, 27 Jun 2012 13:14:22 +0200
Subject: [M3devel] Windows, Unicode file names
In-Reply-To:
References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>, <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org>, , <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org>, , <20120625203422.GA24287@topoi.pooq.com>, , <20120626160005.GA29355@topoi.pooq.com>, <20120626185008.50E131A205B@async.async.caltech.edu>,
Message-ID: <03E8005D-CD75-4699-A703-518F219A6F09@m3w.org>

On Jun 27, 2012, at 12:19 PM, Jay K wrote:

> > More and more is obvious how ideal structure would be: ARRAY OF CHAR, UTF8 encoded, using SRC M3 Text.Hash().
>
> I don't quite agree.
> There are two ideal approaches.
> 1)
> TEXT is like ARRAY OF CHAR and no values over 0xFF (or maybe even 0x7F)
> "WIDETEXT" is like ARRAY OF WIDECHAR, for 16bit or 32bit WIDECHAR

So we can have two representations for single thing: variable holding some text. And representation depends on a question "do you need non-basic-english-characters"?
From dragisha at m3w.org  Wed Jun 27 13:26:31 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Wed, 27 Jun 2012 13:26:31 +0200
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <03E8005D-CD75-4699-A703-518F219A6F09@m3w.org>
References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>, <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org>, , <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org>, , <20120625203422.GA24287@topoi.pooq.com>, , <20120626160005.GA29355@topoi.pooq.com>, <20120626185008.50E131A205B@async.async.caltech.edu>, <03E8005D-CD75-4699-A703-518F219A6F09@m3w.org>
Message-ID: <21109700-2223-46D4-A151-DABC7084BDC1@m3w.org>

This is one place where insisting on some imagined/future purity (fully compatible with your argument - thread safety + immutability + non-quadratic performance) will lead to unreasonable fragmentation and a de-facto gray area in CM3 and its usage.

I am only one of the people here who de-facto use TEXTs to hold UTF8 content. And while we all think/talk about a solution, every single user who needs international characters and wants to use them in a sensible way will go the same way. Then some "proper" CM3 solution comes and what happens? We rewrite everything to support it? Or ignore it?

On Jun 27, 2012, at 1:14 PM, Dragiša Durić wrote:

>
> On Jun 27, 2012, at 12:19 PM, Jay K wrote:
>
>> > More and more is obvious how ideal structure would be: ARRAY OF CHAR, UTF8 encoded, using SRC M3 Text.Hash().
>>
>> I don't quite agree.
>> There are two ideal approaches.
>> 1)
>> TEXT is like ARRAY OF CHAR and no values over 0xFF (or maybe even 0x7F)
>> "WIDETEXT" is like ARRAY OF WIDECHAR, for 16bit or 32bit WIDECHAR
>
> So we can have two representations for single thing: variable holding some text. And representation depends on a question "do you need non-basic-english-characters"?
>
From dabenavidesd at yahoo.es  Wed Jun 27 13:52:29 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Wed, 27 Jun 2012 12:52:29 +0100 (BST)
Subject: [M3devel] Windows, Unicode file names
In-Reply-To: <21109700-2223-46D4-A151-DABC7084BDC1@m3w.org>
Message-ID: <1340797949.85516.YahooMailClassic@web29702.mail.ird.yahoo.com>

Hi all:
In reality it turns out that ASCII is still the suitable and adhered-to standard for Modula-2 command control, structured text, and formatting in PLC systems programming. When we pick something we had better be clear about it, but nevertheless I agree with internationalization as well as with compatibility, etc.
From what I gather, TEXT is allowed to be a Latin-1 superset.
Thanks in advance

--- On Wed, 27/6/12, Dragiša Durić wrote:

From: Dragiša Durić
Subject: Re: [M3devel] Windows, Unicode file names
To: "Jay K"
CC: "m3devel"
Date: Wednesday, 27 June 2012 06:26

This is one place where insisting on some imagined/future purity (fully compatible with your argument - thread safety + immutability + non-quadratic performance) will lead to unreasonable fragmentation and de-facto gray area in CM3 and its usage.

I am only one of people here who de-facto uses TEXT's to hold UTF8 content. And while we all think/talk about solution, every single user who needs international characters and wants to use them in sensible way - will go same way. Then, some "proper" CM3 solution comes and what happens? We rewrite everything to support it? Or ignore it?

On Jun 27, 2012, at 1:14 PM, Dragiša Durić wrote:

On Jun 27, 2012, at 12:19 PM, Jay K wrote:

 > More and more is obvious how ideal structure would be: ARRAY OF CHAR, UTF8 encoded, using SRC M3 Text.Hash().

I don't quite agree.
There are two ideal approaches.
1)
  TEXT is like ARRAY OF CHAR and no values over 0xFF (or maybe even 0x7F)
  "WIDETEXT" is like ARRAY OF WIDECHAR, for 16bit or 32bit WIDECHAR

So we can have two representations for single thing: variable holding some text.
And representation depends on a question "do you need non-basic-english-characters"?

From rodney_bates at lcwb.coop  Wed Jun 27 21:20:41 2012
From: rodney_bates at lcwb.coop (Rodney M. Bates)
Date: Wed, 27 Jun 2012 14:20:41 -0500
Subject: [M3devel] UTF-8 TEXT
In-Reply-To: <20120627033000.GB28021@topoi.pooq.com>
References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com> <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org> <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org> <0A373E4C-32CB-4947-A542-67F0910DBEC4@m3w.org> <20120627033000.GB28021@topoi.pooq.com>
Message-ID: <4FEB5D09.2080601@lcwb.coop>

On 06/26/2012 10:30 PM, Hendrik Boom wrote:
> On Tue, Jun 26, 2012 at 04:22:22PM -0400, Coleburn, Randy wrote:
>> I seem to recall that Rodney did some work a while back relating to TEXT.
>> Rodney, can you weigh in on some of this?
>> --Randy Coleburn
>>
>> From: Dragiša Durić [mailto:dragisha at m3w.org]
>> Sent: Tuesday, June 26, 2012 12:46 PM
>> To: Jay
>> Cc: m3devel
>> Subject: EXT Re: [M3devel] AND (., 16_ff). Not serious - or so I hope!
>>
>> You had idea in other message. Store length!
>>
>> Another idea - store partial list of indices to character locations. So whatever one does, that list can be used/expanded. Whatever storage issues this makes, they are probably minor as compared to 32bit WIDECHAR for all idea.
>
> Most of the time, you don't need explicit integer indexes to character
> locations. What you do need is an operation that fetches a character
> given the string and its index (whatever data structure that index is),
> and one that increments the index past that character. As long as you
> can save an index and use it later on the same string, that's probably
> all you ever need. And with a simple TEXT representation (such as the
> obvious array of bytes containing characters of various widths) a byte
> index is all you need (note: NOT a character index).
It's easy even to
> use TEXT and its integer indices as the data representation, as long as
> you use the proper functions to parse the characters and increment the
> indices by amounts that might differ from 1.
>
> And if your source code is represented in UTF-8, the representation that
> requires little extra compiler effort to parse, your TEXT strings will
> automagically appear in UTF-8.

The original designers of the language and its libraries have given us two different abstractions for handling character strings (in addition to plain arrays): 1) Text, and 2) Wr, Rd, and their cousins.

Text is highly general and easy to use. Concatenations and substrings are easy. Semantics, to its clients, are value semantics, similar to INTEGER. Random access by *character* number is easy and, hopefully, implemented with efficiency at least better than O(n).

Wr and friends restrict you to sequential access, at least mostly, but gain implementation convenience and efficiency as a result.

I feel very strongly that we should *not* take away the full generality of Text, especially efficient random access, to handle variable-length character encodings in strings. For these, let's make more friends of Wr and Rd, which already assume sequential access. For example, a filter pipe that sequentially reads a Text/Array/stream, applies a UTF-8 interpretation to its bytes, and delivers a stream of Unicode characters, in variables of type WIDECHAR.

Text should preserve the abstraction that it's a string of characters, generalized as it already is in cm3, to have type WIDECHAR, so they can be any Unicode character. The internal representation should, usually, not be of concern.

Note that nowhere in Text are character values transferred between a Text.T and any form of I/O stream. In the Text abstraction, all characters go in and out of a Text.T in variables of type CHAR, WIDECHAR, and arrays thereof. IO, etc. is only done in streams, e.g., TextWr.
We can easily add new variants of these that encode/decode by various rules.

Of course, it is still valid to put a string of bytes in a Text.T and apply, e.g., UTF-8 interpretation yourself. But that's lower-level programming, and shouldn't confuse the abstraction.

>
> I can see a use for various wide characters -- the things you extract
> from a TEXT by parsing bits of it, but none for anything
> really new complicated for wide TEXT.
>
> The only confusing thing is that the existing operations for extracting
> bytes from TEXT have names that suggest they are extracting characters.
>

I think it's more than a suggestion. I think the abstraction clearly considers them characters. And it should stay that way. If you want, at a higher level of code, to treat them as bytes, that's fine, but the abstraction continues to view them as characters (which only you, the client, know is not really so.)

> -- Hendrik
>

From rodney_bates at lcwb.coop  Wed Jun 27 22:04:59 2012
From: rodney_bates at lcwb.coop (Rodney M. Bates)
Date: Wed, 27 Jun 2012 15:04:59 -0500
Subject: [M3devel] Windows, Unicode file names
In-Reply-To:
References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>, <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org>, , <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org>, , <20120625203422.GA24287@topoi.pooq.com>, , <20120626160005.GA29355@topoi.pooq.com>, <20120626185008.50E131A205B@async.async.caltech.edu>,
Message-ID: <4FEB676B.1010505@lcwb.coop>

On 06/27/2012 05:19 AM, Jay K wrote:
> > More and more is obvious how ideal structure would be: ARRAY OF CHAR, UTF8 encoded, using SRC M3 Text.Hash().
>
> I don't quite agree.
> There are two ideal approaches.
> 1)
> TEXT is like ARRAY OF CHAR and no values over 0xFF (or maybe even 0x7F)
> "WIDETEXT" is like ARRAY OF WIDECHAR, for 16bit or 32bit WIDECHAR
>
>
> 2) something that can change between them, or possibly store both, but is still mainly flat arrays
> That is, once you store a value over 0xFF, the internal representation changes to flat array of WIDECHAR.
> Probably it stays that way -- you don't want to thrash back and forth in worst case.
> Lesser evil is probably to stick with wide representation.
> Setting the string to empty might bounce it back narrow.
> Ditto assigning it from another narrow text, maybe.
>

This is similar to what the cm3 modification of Text does now. The details of what goes on inside the implementation are a bit different than you describe. There can be mixtures of 8-bit string fragments and 16-bit string fragments, plus other stuff hooking them together. But the abstraction works just like this.

>
>
> What I don't yet understand in all this is how to efficiently combine thread safety, immutability, and quadratic growth.
>
> The following should be as efficient as in typical C++ libraries:
>
>
> VAR a: TEXT;
  VAR a: TEXT := " ";
> WHILE TRUE DO
> a := a & " ";
> END;
>

In pm3 Text, this will take quadratic time and linear space. The partial strings will be garbage collected, as no copies of the pointers to them are made. GetChar is then O(1).

In cm3 Text, this is linear in both time and space, but the space usage has a much higher constant factor than in pm3. In pm3, the asymptotic space used is exactly what the characters themselves require, i.e., one byte per character. For cm3, I count 21 native words per character, plus fragmentation loss for 3 separate heap objects per character. That's 84 times or 168 times as much (21 words of 4 or 8 bytes each), depending on word size. Well, lots of people keep saying RAM is virtually free these days. I guess we really need to hope they are right. GetChar is O(n) when the string is built linearly like this.
Best case is O(log n) when built by Cats of single characters.

My modification of cm3 Text lies between these. It flattens strings up to a point, then does some imperfect balancing of them higher in trees.

Frankly, I think I like going back to the pm3 implementation best.

>
> I kind of think that immutability and quadratic growth are in conflict.

They are, to a considerable extent, as with all functional-style data structures. But more sophisticated (i.e., complicated) implementations can mitigate this somewhat.

> But not because that sounds obvious.
> Note that typical C++ libraries do have value semantics for std::string and std::vector.
>
>
> - Jay
>
>
> > From: dragisha at m3w.org
> > Date: Wed, 27 Jun 2012 11:52:53 +0200
> > To: mika at async.caltech.edu
> > CC: m3devel at elegosoft.com
> > Subject: Re: [M3devel] Windows, Unicode file names
> >
> > More and more is obvious how ideal structure would be: ARRAY OF CHAR, UTF8 encoded, using SRC M3 Text.Hash().
> >
> > What we need is to make compiler map from input encoding (whatever user chooses or is chosen for him) to internal UTF8.
> >
> > On Jun 26, 2012, at 8:50 PM, Mika Nystrom wrote:
> >
> > >
> > > As far as I know, SRC M3 and PM3 come with a TEXT implementation that
> > > works exactly as described below. An extra byte is used at the end with
> > > a character VAL(0,CHAR). The Texts are simply arrays of 8-bit characters.
> > >
> > > One of the big advantages of the old version is that Text.Hash is really,
> > > really fast. Especially on Alphas... it's hugely more expensive to
> > > have hash tables (i.e., Modula-3 generic Tables) keyed on Texts under
> > > CM3 than under the old compilers and runtimes. We're talking a factor
> > > of five or so in speed since the Table routines are generally entirely
> > > dominated by Text.Hash.
> > >
> > > Mika
> > >
> > > Hendrik Boom writes:
> > >> On Mon, Jun 25, 2012 at 08:46:18PM +0000, Jay K wrote:
> > >>>
> > >>> Somewhat but not fully.
Text.Length should fetch a stored length. As
> > >>> I'm sure it already does. That length should always be correctly
> > >>> maintained. Same as today. Adding one extra nul at the end doesn't
> > >>> invalidate the data. std::string has the same properties -- c_str() can
> > >>> on-demand append a terminal nul, but there could also be one in the
> > >>> string itself. I understand it is a bit weird. Maintaining a terminal
> > >>> nul does add cost that might be wasted. And reduces the capacity by
> > >>> one. It could be on-demand, I guess. - Jay
> > >>
> > >> Don't need the 'on demand'. For the benefits of C interoperability, the
> > >> extra byte is well worth the price. What I'm worrying about is someone
> > >> using an embedded NUL as an end-of-string marker. I smell more bugs
> > >> creeping in. But I guess bugs are inherent in C use, so I'm not
> > >> surprised seeing them in C interoperation.
> > >>
> > >> -- hendrik
> >

From rodney_bates at lcwb.coop  Wed Jun 27 22:10:42 2012
From: rodney_bates at lcwb.coop (Rodney M. Bates)
Date: Wed, 27 Jun 2012 15:10:42 -0500
Subject: [M3devel] AND (…, 16_ff)… Not serious - or so I hope!
In-Reply-To: <989FB282-786A-447D-8CA9-2432AE922DBF@m3w.org>
References: <989FB282-786A-447D-8CA9-2432AE922DBF@m3w.org>
Message-ID: <4FEB68C2.7000202@lcwb.coop>

Yes, this is a disturbing quirk, and quite out of character with the nature of Modula-3. It would be consistent to say that CHAR <: WIDECHAR, and apply the usual assignability rules. That would make this a runtime range error.

On 06/26/2012 05:18 AM, Dragiša Durić wrote:
> This piece of code, from TextClass.m3, disturbs me… a lot.
>
> If we are to use WIDECHAR, I think we must be a lot more serious than this.
>
> Probably, text pieces are limited to 128 bytes by design, somewhere. But - whose idea was to "narrow" by ignoring everything except 8 LSB's? By mapping set of 2^20 elements to set of 2^8 elements.
>
> Probably by someone whose mother tongue is fully writeable with ASCII :).
>
> ====
> PROCEDURE GetChars (t: TEXT; VAR a: ARRAY OF CHAR; start: CARDINAL) =
>   VAR
>     info : Info;
>     cnt  : INTEGER;
>     next : CARDINAL := 0;
>     buf  : ARRAY [0..127] OF WIDECHAR;
>   BEGIN
>     t.get_info (info);
>     cnt := MIN (NUMBER (a), info.length - start);
>     WHILE (cnt > 0) DO
>       t.get_wide_chars (buf, start);
>       FOR i := FIRST (buf) TO LAST (buf) DO
>         IF (cnt = 0) THEN RETURN END;
>         a[next] := VAL (Word.And (ORD (buf[i]), 16_ff), CHAR);
>         INC (next); DEC (cnt);
>       END;
>       INC (start, NUMBER (buf));
>     END;
>   END GetChars;
> ====
>
>

From rodney_bates at lcwb.coop  Wed Jun 27 22:27:29 2012
From: rodney_bates at lcwb.coop (Rodney M. Bates)
Date: Wed, 27 Jun 2012 15:27:29 -0500
Subject: [M3devel] AND (., 16_ff). Not serious - or so I hope!
In-Reply-To:
References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com> <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org> <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org>
Message-ID: <4FEB6CB1.3070509@lcwb.coop>

On 06/26/2012 11:34 AM, Jay wrote:
>
>>> > 128 limit
>>>
>>> I haven't read the code enough yet to verify that but you are probably right
>>
>> I was not right :), that call is incremental.
>
> I looked for that aspect too but missed it. :(
>
>
>>> > ignoring everything over 16_FF
>>>
>>> Probably that is the responsibility/claim of the caller of GetChars.
>>> If you want to be correct in the face of non-ASCII, you are probably obligated to call GetWideChars.
>>> Perhaps raising an exception would be reasonable to signal the loss of data. Or something.
>>> There is HasWideChars for you to check.
>>
>>
>>>
>>> There is no encoding implied remember.
>>> This isn't UTF8 data.
>>
>> It is not, but probably only way to solve this without exception is to make UTF8 "official" 8bit encoding :)
>>
>
> I'm torn on that. We'd have to consider ramifications like Text.Length vs buffer size requirements/expectations.
>
>
> Is TEXT & its use abstracted enough to have been widened? Should we put it back and introduce WIDETEXT? That is essentially what C and C++ do. They are inconvenient for existing code but simple predictable make sense. Contrast with weird hybrid systems like Perl & Python for which I just can't get through the documentation and understand and predict how they work..
>

TEXT is well abstracted and can be widened, with the exception that truncating characters to 8 bits to return them in a CHAR is wrong. It should be a checked runtime error, and this should be documented.

Note that while we have two types CHAR and WIDECHAR for scalars (and can also have arrays thereof), there is still only one type TEXT. Conceptually, it should be viewed as holding strings of WIDECHAR, with some convenience functions for putting CHARs into and getting them out of a TEXT, when the programmer knows the value is in this range.

The fact that our implementation stores some values in fields of type CHAR is a hidden implementation detail. There is nothing in the abstraction that requires it to be done this way, or enables clients to know that.

We do have two kinds of text literals, conventional and wide. They differ only in how the value is specified, and the ability to specify characters outside of CHAR.

>
> Java is in-between but also simple & predictable -- there being no narrow option other than array of byte, which is reasonable.
>
>
> - Jay

From rodney_bates at lcwb.coop  Thu Jun 28 04:12:26 2012
From: rodney_bates at lcwb.coop (Rodney M. Bates)
Date: Wed, 27 Jun 2012 21:12:26 -0500
Subject: [M3devel] UTF-8 TEXT
In-Reply-To:
References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com> <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org> <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org> <0A373E4C-32CB-4947-A542-67F0910DBEC4@m3w.org> <20120627033000.GB28021@topoi.pooq.com> <4FEB5D09.2080601@lcwb.coop>
Message-ID: <4FEBBD8A.5020206@lcwb.coop>

On 06/27/2012 07:32 PM, Antony Hosking wrote:
> So what do we do about 6-byte UTF-8 code points? They won't fit in WIDECHAR. Surely we should allow accessing a UTF-8 character as a CARDINAL and be done with it?
>

Absolutely. Except I think a better way is to make WIDECHAR big enough to hold all of Unicode.

> Sent from my iPad
>
> On Jun 27, 2012, at 3:20 PM, "Rodney M. Bates" wrote:
>
>>
>>
>> On 06/26/2012 10:30 PM, Hendrik Boom wrote:
>>> On Tue, Jun 26, 2012 at 04:22:22PM -0400, Coleburn, Randy wrote:
>>>> I seem to recall that Rodney did some work a while back relating to TEXT.
>>>> Rodney, can you weigh in on some of this?
>>>> --Randy Coleburn
>>>>
>>>> From: Dragiša Durić [mailto:dragisha at m3w.org]
>>>> Sent: Tuesday, June 26, 2012 12:46 PM
>>>> To: Jay
>>>> Cc: m3devel
>>>> Subject: EXT Re: [M3devel] AND (., 16_ff). Not serious - or so I hope!
>>>>
>>>> You had idea in other message. Store length!
>>>>
>>>> Another idea - store partial list of indices to character locations. So whatever one does, that list can be used/expanded. Whatever storage issues this makes, they are probably minor as compared to 32bit WIDECHAR for all idea.
>>>
>>> Most of the time, you don't need explicit integer indexes to character
>>> locations. What you do need is an operation that fetches a character
>>> given the string and its index (whatever data structure that index is),
>>> and one that increments the index past that character. As long as you
>>> can save an index and use it later on the same string, that's probably
>>> all you ever need.
And with a simple TEXT representation (such as the >>> obvious array of bytes containing characters of various widths) a byte >>> index is all you need (note: NOT a character index). It's easy even to >>> use TEXT and its integer indices as the data representation, as long as >>> you use the proper functions to parse the characters and increment the >>> indices by amounts that might differ from 1. >>> >>> And if your source code is represented in UTF-8, the representation that >>> requires little extra compiler effort to parse, your TEXT strings will >>> automagically appear in UTF-8. >> >> The original designers of the language and its libraries have given us >> two different abstractions for handling character strings (in addition >> to plain arrays.) 1) Text, and 2) Wr, Rd, and their cousins. >> >> Text is highly general and easy to use. Concatenations and substrings >> are easy. Semantics, to its clients, are value semantics, similar to INTEGER. >> Random access by *character* number is easy and, hopefully, implemented >> with efficiency at least better than O(n). >> >> Wr and friends restrict you to sequential access, at least mostly, but >> gain implementation convenience and efficiency as a result. >> >> I feel very strongly that we should *not* take away the full generality >> of Text, especially efficient random access, to handle variable-length >> character encodings in strings. For these, let's make more friends of >> Wr and Rd, which already assume sequential access. For example, a >> filter pipe that sequentially reads a Text/Array/stream, applies a UTF-8 >> interpretation to its bytes, and delivers a stream of Unicode characters, >> in variables of type WIDECHAR. >> >> Text should preserve the abstraction that it's a string of characters, >> generalized as it already is in cm3, to have type WIDECHAR, so they can be any >> Unicode character. The internal representation should, usually, not be >> of concern.
>> >> Note that nowhere in Text are character values transferred between >> a Text.T and any form of I/O stream. In the Text abstraction, all >> characters go in and out of a Text.T in variables of type CHAR, >> WIDECHAR, and arrays thereof. IO, etc. is only done in streams, >> e.g., TextWr. We can easily add new variants of these that encode/decode >> by various rules. >> >> Of course, it is still valid to put a string of bytes in a Text.T and >> apply, e.g., UTF-8 interpretation yourself. But that's lower-level >> programming, and shouldn't confuse the abstraction. >> >>> >>> I can see a use for various wide characters -- the things you extract >>> from a TEXT by parsing bits of it, but none for anything >>> really new complicated for wide TEXT. >>> >>> The only confusing thing is that the existing operations for extracting >>> bytes from TEXT have names that suggest they are extracting characters. >>> >> >> I think it's more than a suggestion. I think the abstraction clearly >> considers them characters. And it should stay that way. If you want, >> at a higher level of code, to treat them as bytes, that's fine, but the >> abstraction continues to view them as characters (which only you, the >> client, know is not really so.) >> >>> -- Hendrik >>> > From jay.krell at cornell.edu Thu Jun 28 07:31:04 2012 From: jay.krell at cornell.edu (Jay K) Date: Thu, 28 Jun 2012 05:31:04 +0000 Subject: [M3devel] Windows, Unicode file names In-Reply-To: <4FEB676B.1010505@lcwb.coop> References: <1340650916.77840.YahooMailClassic@web29703.mail.ird.yahoo.com>, <68D99A3E-8244-402A-8718-20DA1669F61A@m3w.org>, <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org>, <20120625203422.GA24287@topoi.pooq.com>, <20120626160005.GA29355@topoi.pooq.com>, <20120626185008.50E131A205B@async.async.caltech.edu>, <4FEB676B.1010505@lcwb.coop> Message-ID: > Random access by *character* number is easy and, hopefully, implemented > with efficiency at least better than O(n).
Random access by "something, not 'character'" should be O(1). > > I kind of think that immutability and quadratic growth are in conflict. > > They are, to a considerable extent, as with all functional-style data structures. > But more sophisticated (i.e., complicated) implementations can mitigate somewhat. I'm hoping we can win here somehow. In Java and C# they solve this by having, in a sense, two string types. constant "string"s and mutable "StringBuffer"s Strings never grow. They are always flat. StringBuffers grow quadratically. They are always flat. They are mutable. I suspect we need to do something similar. Somehow. As I understand, C# and Java do expose string concatenation. As I understand, they are similar to Modula-3 here, in that the compiler knows about string concatenation and rewrites the code somewhat. Thinking about it further, I suspect my example also can't/doesn't run performantly in Java or C# either. Hopefully we can come up with some good solution to this. I have to run. - Jay ---------------------------------------- > Date: Wed, 27 Jun 2012 15:04:59 -0500 > From: rodney_bates at lcwb.coop > To: m3devel at elegosoft.com > Subject: Re: [M3devel] Windows, Unicode file names > > > > On 06/27/2012 05:19 AM, Jay K wrote: > > > More and more is obvious how ideal structure would be: ARRAY OF CHAR, UTF8 encoded, using SRC M3 Text.Hash(). > > > > I don't quite agree. > > There are two ideal approaches. > > 1) > > TEXT is like ARRAY OF CHAR and no values over 0xFF (or maybe even 0x7F) > > "WIDETEXT" is like ARRAY OF WIDECHAR, for 16bit or 32bit WIDECHAR > > > > > > 2) something that can change between them, or possibly store both, but is still mainly flat arrays > > That is, once you store a value over 0xFF, the internal representation changes to flat array of WIDECHAR. > > Probably it stays that way -- you don't want to thrash back and forth in worst case. > > Lesser evil is probably to stick with wide representation.
> > Setting the string to empty might bounce it back narrow. > > Ditto assigning it from another narrow text, maybe. > > > > This is similar to what the cm3 modification of Text does now. The details of > what goes on inside the implementation are a bit different than you describe. > There can be mixtures of 8-bit string fragments and 16-bit string fragments, plus > other stuff hooking them together. But the abstraction works just like this. > > > > > > > What I don't yet understand in all this is how to efficiently combine thread safety, immutability, and quadratic growth. > > > > The following should be as efficient as in typical C++ libraries: > > > > > > VAR a: TEXT; > > VAR a: TEXT:= " "; > > > WHILE TRUE DO > > a := a & " "; > > END; > > > > In pm3 Text, this will take quadratic time and linear space. The partial > strings will be garbage collected, as no copies of the pointers to them > are made. GetChar is then O(1). > > In cm3 Text, this is linear in both time and space, but the space usage > has a much higher constant factor than in pm3. In pm3, the asymptotic space used > is exactly what the characters themselves require, i.e., one byte per character. > For cm3, I count 21 native words per character, plus fragmentation loss > for 3 separate heap objects per character. That's 84 times or 168 times, > depending on word size. Well, lots of people keep saying RAM is virtually > free these days. I guess we really need to hope they are right. > GetChar is O(n) when the string is built linearly like this. > Best case is O(log n) when built by Cats of single characters. > > My modification of cm3 Text lies between these. It flattens strings > up to a point, then does some imperfect balancing of them higher in trees. > > Frankly, I think I like going back to the pm3 implementation best. > > > > > I kind of think that immutability and quadratic growth are in conflict. > > They are, to a considerable extent, as with all functional-style data structures.
> But more sophisticated (i.e., complicated) implementations can mitigate somewhat. > > > But not because that sounds obvious. > > Note that typical C++ libraries do have value semantics for std::string and std::vector. > > > > > > - Jay > > > > > > > From: dragisha at m3w.org > > > Date: Wed, 27 Jun 2012 11:52:53 +0200 > > > To: mika at async.caltech.edu > > > CC: m3devel at elegosoft.com > > > Subject: Re: [M3devel] Windows, Unicode file names > > > > > > More and more is obvious how ideal structure would be: ARRAY OF CHAR, UTF8 encoded, using SRC M3 Text.Hash(). > > > > > > What we need is to make compiler map from input encoding (whatever user chooses or is chosen for him) to internal UTF8. > > > > > > On Jun 26, 2012, at 8:50 PM, Mika Nystrom wrote: > > > > > > > > > > > As far as I know, SRC M3 and PM3 come with a TEXT implementation that > > > > works exactly as described below. An extra byte is used at the end with > > > > a character VAL(0,CHAR). The Texts are simply arrays of 8-bit characters. > > > > > > > > One of the big advantages of the old version is that Text.Hash is really, > > > > really fast. Especially on Alphas... it's hugely more expensive to > > > > have hash tables (i.e., Modula-3 generic Tables) keyed on Texts under > > > > CM3 than under the old compilers and runtimes. We're talking a factor > > > > of five or so in speed since the Table routines are generally entirely > > > > dominated by Text.Hash. > > > > > > > > Mika > > > > > > > > Hendrik Boom writes: > > > >> On Mon, Jun 25, 2012 at 08:46:18PM +0000, Jay K wrote: > > > >>> > > > >>> Somewhat but not fully. Text.Length should fetch a stored length. As > > > >>> I'm sure it already does. That length should always be correctly > > > >>> maintained.
Same as today. Adding one extra nul at the end doesn't > > > >>> invalidate the data. std::string has the same properties -- c_str() can > > > >>> on-demand append a terminal nul, but there could also be one in the > > > >>> string itself. I understand it is a bit weird. Maintaining a terminal > > > >>> nul does add cost that might be wasted. And reduces the capacity by > > > >>> one. It could be on-demand, I guess. - Jay > > > >> > > > >> Don't need the 'on demand'. For the benefits of C interoperability, the > > > >> extra byte is well worth the price. What I'm worrying about is someone > > > >> using an embedded NUL as an end-of-string marker. I smell more bugs > > > >> creeping in. But I guess bugs are inherent in C use, so I'm not > > > >> surprised seeing them in C interoperation. > > > >> > > > >> -- hendrik > > > From hendrik at topoi.pooq.com Thu Jun 28 14:37:56 2012 From: hendrik at topoi.pooq.com (Hendrik Boom) Date: Thu, 28 Jun 2012 08:37:56 -0400 Subject: [M3devel] Windows, Unicode file names In-Reply-To: References: <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org> <20120625203422.GA24287@topoi.pooq.com> <20120626160005.GA29355@topoi.pooq.com> <20120626185008.50E131A205B@async.async.caltech.edu> <4FEB676B.1010505@lcwb.coop> Message-ID: <20120628123756.GA2279@topoi.pooq.com> On Thu, Jun 28, 2012 at 05:31:04AM +0000, Jay K wrote: > > > Random access by *character* number is easy and, hopefully, implemented > > with efficiency at least better than O(n). > > > Random access by "something, not 'character'" should be O(1). Quite agree. There should be a fetch-byte operation, and a fetch-character operation. Fetch-character should return a character and the index to the next character. > > > > > I kind of think that immutability and quadratic growth are in conflict. > > > > They are, to a considerable extent, as with all functional-style data structures. > > But more sophisticated (i.e., complicated) implementations can mitigate somewhat.
> > I'm hoping we can win here somehow. > In Java and C# they solve this by having, in a sense, two string types. > constant "string"s > and mutable "StringBuffer"s > Strings never grow. They are always flat. > StringBuffers grow quadratically. They are always flat. They are mutable. > > I suspect we need to do something similar. Somehow. > > As I understand, C# and Java do expose string concatenation. > As I understand, they are similar to Modula-3 here, in that the compiler knows > about string concatenation and rewrites the code somewhat. > Thinking about it further, I suspect my example also can't/doesn't run performantly in Java or C# either. > > > Hopefully we can come up with some good solution to this. > I have to run. Initially, create a string as a simple array of bytes. Then, when we start concatenating, use a cm3-like representation. (We could delay this until our string gets a little long, or until a pointer to it gets copied.) Maintain that as long as we're still concatenating. We might try balancing the tree somewhat if it gets biggish. But as soon as we start indexing or hashing, or anything like that, we can change representation to the simple array of bytes. Usually at that point we're finished concatenating. -- hendrik > > > - Jay From hendrik at topoi.pooq.com Thu Jun 28 14:44:46 2012 From: hendrik at topoi.pooq.com (Hendrik Boom) Date: Thu, 28 Jun 2012 08:44:46 -0400 Subject: [M3devel] UTF-8 TEXT In-Reply-To: <4FEB5D09.2080601@lcwb.coop> References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com> <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org> <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org> <0A373E4C-32CB-4947-A542-67F0910DBEC4@m3w.org> <20120627033000.GB28021@topoi.pooq.com> <4FEB5D09.2080601@lcwb.coop> Message-ID: <20120628124446.GB2279@topoi.pooq.com> On Wed, Jun 27, 2012 at 02:20:41PM -0500, Rodney M. Bates wrote: > > Text is highly general and easy to use. Concatenations and substrings > are easy.
Semantics, to its clients, are value semantics, similar to INTEGER. > Random access by *character* number is easy and, hopefully, implemented > with efficiency at least better than O(n). Does it have to be a *character* number we use to index a string? I don't know of any situations where that aspect is important enough to force everyone to waste storage on it. -- hendrik From dragisha at m3w.org Thu Jun 28 14:48:38 2012 From: dragisha at m3w.org (Dragiša Durić) Date: Thu, 28 Jun 2012 14:48:38 +0200 Subject: [M3devel] UTF-8 TEXT In-Reply-To: <20120628124446.GB2279@topoi.pooq.com> References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com> <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org> <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org> <0A373E4C-32CB-4947-A542-67F0910DBEC4@m3w.org> <20120627033000.GB28021@topoi.pooq.com> <4FEB5D09.2080601@lcwb.coop> <20120628124446.GB2279@topoi.pooq.com> Message-ID: <01C11478-BAEA-4BAC-8ECE-FA5A28933A44@m3w.org> glyph sounds better, I agree! :) On Jun 28, 2012, at 2:44 PM, Hendrik Boom wrote: > On Wed, Jun 27, 2012 at 02:20:41PM -0500, Rodney M. Bates wrote: >> >> Text is highly general and easy to use. Concatenations and substrings >> are easy. Semantics, to its clients, are value semantics, similar to INTEGER. >> Random access by *character* number is easy and, hopefully, implemented >> with efficiency at least better than O(n). > > Does it have to be a *character* number we use to index a string? I > don't know of any situations where that aspect is important enough > to force everyone to waste storage on it.
> > -- hendrik From hendrik at topoi.pooq.com Thu Jun 28 14:51:03 2012 From: hendrik at topoi.pooq.com (Hendrik Boom) Date: Thu, 28 Jun 2012 08:51:03 -0400 Subject: [M3devel] Windows, Unicode file names In-Reply-To: <03E8005D-CD75-4699-A703-518F219A6F09@m3w.org> References: <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org> <20120625203422.GA24287@topoi.pooq.com> <20120626160005.GA29355@topoi.pooq.com> <20120626185008.50E131A205B@async.async.caltech.edu> <03E8005D-CD75-4699-A703-518F219A6F09@m3w.org> Message-ID: <20120628125103.GC2279@topoi.pooq.com> On Wed, Jun 27, 2012 at 01:14:22PM +0200, Dragiša Durić wrote: > > On Jun 27, 2012, at 12:19 PM, Jay K wrote: > > > > More and more is obvious how ideal structure would be: ARRAY OF CHAR, UTF8 encoded, using SRC M3 Text.Hash(). > > > > I don't quite agree. > > There are two ideal approaches. > > 1) > > TEXT is like ARRAY OF CHAR and no values over 0xFF (or maybe even 0x7F) > > "WIDETEXT" is like ARRAY OF WIDECHAR, for 16bit or 32bit WIDECHAR > > So we can have two representations for single thing: variable holding some text. And representation depends on a question "do you need non-basic-english-characters"? I'm starting to discover that a lot of my English documents have non-ASCII characters in them. In particular, the separate open and close quotation marks around quoted speech take more than one byte in Unicode. True, in a starvation-level character set, they are both represented as " , but that's really not what they are. -- hendrik From dabenavidesd at yahoo.es Thu Jun 28 15:51:59 2012 From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.) Date: Thu, 28 Jun 2012 14:51:59 +0100 (BST) Subject: [M3devel] UTF-8 TEXT In-Reply-To: <01C11478-BAEA-4BAC-8ECE-FA5A28933A44@m3w.org> Message-ID: <1340891519.52552.YahooMailClassic@web29704.mail.ird.yahoo.com> Hi all: it can't be used like that (as in DEC-SRC early versions) because TEXT is an opaque type, you can't reveal it like that at that level.
Anyway, whoever needs the characteristics of the JVM or similar languages must know that they are very inefficient, and C is pretty neat but also very much UNSAFE. Perhaps somebody who wants that should use integer strings to map every symbol of the computer directly from hardware (but supporting every kind of format seems over-complex in space and time). The operating system normally doesn't handle I/O in many cases; the I/O subsystem (like Windows I/O) does, and in some cases it takes advantage of not waiting for a thread to return control to the app. There are many computers that use non-ASCII terms, but normally they support that script, so why put more weight on it? Maybe I should ask either lexicographers or specific language users what they need, in a comprehensible manner, rather than rely on hard-coded standards that many still don't use. Thanks in advance --- On Thu, 28/6/12, Dragiša Durić wrote: From: Dragiša Durić Subject: Re: [M3devel] UTF-8 TEXT To: "Hendrik Boom" CC: m3devel at elegosoft.com Date: Thursday, 28 June 2012 07:48 glyph sounds better, I agree! :) On Jun 28, 2012, at 2:44 PM, Hendrik Boom wrote: > On Wed, Jun 27, 2012 at 02:20:41PM -0500, Rodney M. Bates wrote: >> >> Text is highly general and easy to use. Concatenations and substrings >> are easy. Semantics, to its clients, are value semantics, similar to INTEGER. >> Random access by *character* number is easy and, hopefully, implemented >> with efficiency at least better than O(n). > > Does it have to be a *character* number we use to index a string? I > don't know of any situations where that aspect is important enough > to force everyone to waste storage on it. > > -- hendrik From rodney_bates at lcwb.coop Thu Jun 28 16:10:02 2012 From: rodney_bates at lcwb.coop (Rodney M.
Bates) Date: Thu, 28 Jun 2012 09:10:02 -0500 Subject: [M3devel] UTF-8 TEXT In-Reply-To: <20120628124446.GB2279@topoi.pooq.com> References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com> <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org> <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org> <0A373E4C-32CB-4947-A542-67F0910DBEC4@m3w.org> <20120627033000.GB28021@topoi.pooq.com> <4FEB5D09.2080601@lcwb.coop> <20120628124446.GB2279@topoi.pooq.com> Message-ID: <4FEC65BA.9080809@lcwb.coop> On 06/28/2012 07:44 AM, Hendrik Boom wrote: > On Wed, Jun 27, 2012 at 02:20:41PM -0500, Rodney M. Bates wrote: >> >> Text is highly general and easy to use. Concatenations and substrings >> are easy. Semantics, to its clients, are value semantics, similar to INTEGER. >> Random access by *character* number is easy and, hopefully, implemented >> with efficiency at least better than O(n). > > Does it have to be a *character* number we use to index a string? I > don't know of any situations where that aspect is important enough > to force everyone to waste storage on it. > > -- hendrik > It is absolutely essential that it be a character, if you care about Text being a meaningful abstraction. A byte index is a very low-level view, now that we have a variable-length encoding, and *especially* now that there are multiple possible ways of representing strings. When it was only ASCII (or ISO Latin-1), it was a character index, and the abstraction was there. The fact that it was also a byte index is a coincidental consequence of the choice of underlying physical representation. Now we have a much messier situation regarding representations, but we should not destroy the abstraction and force everyone to always get down into the bowels of the different representations. There will still be mechanisms for low-level coding if you have some compelling reason, or just don't want to rewrite something existing.
But let's protect the option of dealing with characters with the same abstraction we have had in the past. From dabenavidesd at yahoo.es Thu Jun 28 17:18:31 2012 From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.) Date: Thu, 28 Jun 2012 16:18:31 +0100 (BST) Subject: [M3devel] UTF-8 TEXT In-Reply-To: <4FEC65BA.9080809@lcwb.coop> Message-ID: <1340896711.88750.YahooMailClassic@web29702.mail.ird.yahoo.com> Hi all: string class is a super-set by definition of CHARs scripts; note TEXT is a primitive type, so it can have every string characteristic. Thus we don't need any other non-primitive TEXT types. The need for other TEXT isn't a matter, as it can add or have any characters but put burden of choice to the implementation. WIDECHARs aren't needed by Modula-3 at all, but to keep copying CHARs in other is not in my view more string formats is the real advantage to speed up implementation to get two CHARs strings. So I agree in that we must look at the performance burden in citing implementations, for instance keep compatibility without losing special performance. My view is that we need to reimplement that in m3core, in either C, for instance, or some safe subset of Modula-3 to speed up a little, for instance DEC-SRC, etc., or a subset of SPIN-M3 (something I like). But this is more stuff to do, fun certainly, but I would want to concentrate on supporting either that by OS definition, or by accessing hardware (who cares about using the C RT for Linux, but if we can be faster let's do it in whatever it takes). In the end we can provide better interfaces to develop current OS than they provide to us, so what then does it matter if we offer some code to Linux, if at all interested. Greg Nelson said that Rd/Wr are a very nice piece of string type unappreciated by most of the current mainstream languages. Thanks in advance --- On Thu, 28/6/12, Rodney M. Bates wrote: From: Rodney M.
Bates Subject: Re: [M3devel] UTF-8 TEXT To: m3devel at elegosoft.com Date: Thursday, 28 June 2012 09:10 On 06/28/2012 07:44 AM, Hendrik Boom wrote: > On Wed, Jun 27, 2012 at 02:20:41PM -0500, Rodney M. Bates wrote: >> >> Text is highly general and easy to use. Concatenations and substrings >> are easy. Semantics, to its clients, are value semantics, similar to INTEGER. >> Random access by *character* number is easy and, hopefully, implemented >> with efficiency at least better than O(n). > > Does it have to be a *character* number we use to index a string? I > don't know of any situations where that aspect is important enough > to force everyone to waste storage on it. > > -- hendrik > It is absolutely essential that it be a character, if you care about Text being a meaningful abstraction. A byte index is a very low-level view, now that we have a variable-length encoding, and *especially* now that there are multiple possible ways of representing strings. When it was only ASCII (or ISO Latin-1), it was a character index, and the abstraction was there. The fact that it was also a byte index is a coincidental consequence of the choice of underlying physical representation. Now we have a much messier situation regarding representations, but we should not destroy the abstraction and force everyone to always get down into the bowels of the different representations. There will still be mechanisms for low-level coding if you have some compelling reason, or just don't want to rewrite something existing. But let's protect the option of dealing with characters with the same abstraction we have had in the past.
From hendrik at topoi.pooq.com Thu Jun 28 19:02:30 2012 From: hendrik at topoi.pooq.com (Hendrik Boom) Date: Thu, 28 Jun 2012 13:02:30 -0400 Subject: [M3devel] UTF-8 TEXT In-Reply-To: <4FEC65BA.9080809@lcwb.coop> References: <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org> <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org> <0A373E4C-32CB-4947-A542-67F0910DBEC4@m3w.org> <20120627033000.GB28021@topoi.pooq.com> <4FEB5D09.2080601@lcwb.coop> <20120628124446.GB2279@topoi.pooq.com> <4FEC65BA.9080809@lcwb.coop> Message-ID: <20120628170230.GD2279@topoi.pooq.com> On Thu, Jun 28, 2012 at 09:10:02AM -0500, Rodney M. Bates wrote: > > > On 06/28/2012 07:44 AM, Hendrik Boom wrote: > >On Wed, Jun 27, 2012 at 02:20:41PM -0500, Rodney M. Bates wrote: > >> > >>Text is highly general and easy to use. Concatenations and substrings > >>are easy. Semantics, to its clients, are value semantics, similar to INTEGER. > >>Random access by *character* number is easy and, hopefully, implemented > >>with efficiency at least better than O(n). > > > >Does it have to be a *character* number we use to index a string? I > >don't know of any situations where that aspect is important enough > >to force everyone to waste storage on it. > > > >-- hendrik > > > > It is absolutely essential that it be a character, if you care about > Text being a meaningful abstraction. A byte index is a very low-level > view, now that we have a variable-length encoding, and *especially* > now that there are multiple possible ways of representing strings. I'm not arguing whether the index should point to a character. I'm questioning whether it need be a count of characters. This is surely a matter of data representation rather than concept. A character index could be implemented in a variety of ways. It certainly could be implemented as a character count, presumably for legacy applications with attendant costs. It could be implemented as a byte count if the string were implemented as an array of bytes.
It could be implemented as a machine address, constrained to index into a particular string. It could be implemented as a pointer into a linked list of string pieces, together with an offset indicating where in that piece it currently points. We could even implement byte and character counts in the more exotic TEXT data structures if we chose; we have freedom of representation of TEXT without compromising integer. We can implement *both* character extractors using an INTEGER *byte* count AND character extractors using an INTEGER *character* count. And we can do this in just about any representation of TEXT we come up with. The specification for the abstraction doesn't even have to say that it's a byte count. It's sufficient to say one can use an index that is chosen for implementation efficiency. Though it's tempting to provide a byte count for an operation that extracts bytes, not characters. Now that would be a low-level operation that does break the abstraction. -- hendrik > > When it was only ASCII (or ISO Latin-1), it was a character > index, and the abstraction was there. The fact that it was also a > byte index is a coincidental consequence of the choice of underlying > physical representation. Now we have a much messier situation regarding > representations, but we should not destroy the abstraction and force > everyone to always get down into the bowels of the different representations. > > There will still be mechanisms for low-level coding if you have some > compelling reason, or just don't want to rewrite something existing. > But let's protect the option of dealing with characters with the same > abstraction we have had in the past. Yes, it was obviously a mistake for Modula 3 not to distinguish between two types for character and byte. And it's not the only language to have made that mistake. There are two different abstractions here, with different meanings, but they share one name and one implementation.
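The difference between a byte index and a character extractor built on top of it can be sketched in a few lines of Modula-3. This is only an illustration under stated assumptions, not anything that exists in m3core or libm3: the module name Utf8Demo and the procedures ByteAt and FetchChar are made up here, the TEXT is assumed to hold well-formed UTF-8 with one byte per CHAR, and nothing is validated. Only Text.GetChar, Text.Length, Fmt.Int and IO.Put are taken from the standard interfaces.

```modula3
MODULE Utf8Demo EXPORTS Main;

(* Hypothetical sketch, not part of m3core or libm3.  The names Utf8Demo,
   ByteAt and FetchChar are made up for illustration.  It assumes the TEXT
   holds well-formed UTF-8, one byte per CHAR, and does no validation. *)

IMPORT IO, Fmt, Text;

PROCEDURE ByteAt (t: TEXT; i: CARDINAL): INTEGER =
  (* Low-level view: element i of t taken as a byte value in 0..255. *)
  BEGIN
    RETURN ORD (Text.GetChar (t, i));
  END ByteAt;

PROCEDURE FetchChar (t: TEXT; i: CARDINAL; VAR next: CARDINAL): INTEGER =
  (* Character view on top of a byte index: decode the UTF-8 sequence that
     starts at byte index i, return its code point, and set next to the
     byte index of the following character. *)
  VAR
    lead := ByteAt (t, i);
    code : INTEGER;
    extra: CARDINAL;
  BEGIN
    (* The lead byte tells how many continuation bytes follow. *)
    IF lead < 16_80 THEN
      code := lead;  extra := 0;
    ELSIF lead < 16_E0 THEN
      code := lead - 16_C0;  extra := 1;
    ELSIF lead < 16_F0 THEN
      code := lead - 16_E0;  extra := 2;
    ELSE
      code := lead - 16_F0;  extra := 3;
    END;
    (* Each continuation byte 10xxxxxx contributes six more bits. *)
    FOR k := 1 TO extra DO
      code := code * 64 + (ByteAt (t, i + k) - 16_80);
    END;
    next := i + 1 + extra;
    RETURN code;
  END FetchChar;

VAR
  s     := "a\303\251z";  (* 'a', then U+00E9 as bytes C3 A9, then 'z' *)
  i    : CARDINAL := 0;
  next : CARDINAL := 0;
  count: CARDINAL := 0;
BEGIN
  (* Walk the string by byte index, one decoded character per step. *)
  WHILE i < Text.Length (s) DO
    IO.Put ("byte index " & Fmt.Int (i) & ": code point 16_"
            & Fmt.Int (FetchChar (s, i, next), 16) & "\n");
    i := next;
    INC (count);
  END;
  IO.Put (Fmt.Int (Text.Length (s)) & " bytes, "
          & Fmt.Int (count) & " characters\n");
END Utf8Demo.
```

The literal is four bytes long but holds three characters, so the saved index advances by 1, 2 and 1. A client that only ever stores the value handed back in next never needs to know whether it is a byte count or a character count, which is the point about leaving the index representation to the implementation.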
Frankly, I don't care which of the two retains the name CHAR. It's all the same to me whether (a) characters are called WIDECHAR and bytes CHAR or (b) characters are called CHAR and bytes BYTE, because either way programs are going to have to be changed to adapt to the new world. (a) is probably less disruptive to legacy programs that only ever need to deal with legacy ASCII files. (b) is probably conceptually cleaner. What's important is that both mechanisms remain available for dealing with values of type TEXT. The designers of Modula 3 have done an admirable job of providing a collection of abstractions that enable both conceptually clean and efficient implementations. Let's not mess it up by providing only a conceptually clean, inefficient interface. -- hendrik From dragisha at m3w.org Thu Jun 28 19:19:48 2012 From: dragisha at m3w.org (Dragiša Durić) Date: Thu, 28 Jun 2012 19:19:48 +0200 Subject: [M3devel] Windows, Unicode file names In-Reply-To: <20120628125103.GC2279@topoi.pooq.com> References: <157AD65E-0A51-4744-A7EF-9DC6D0108DD1@m3w.org> <20120625203422.GA24287@topoi.pooq.com> <20120626160005.GA29355@topoi.pooq.com> <20120626185008.50E131A205B@async.async.caltech.edu> <03E8005D-CD75-4699-A703-518F219A6F09@m3w.org> <20120628125103.GC2279@topoi.pooq.com> Message-ID: My language (Serbian) is written with two alphabets. Before ISO-8859-2 we used ten (yes, 10) different encodings to represent our alphabet(s) with 8 bits. With ISO-8859-2 we got a solution for the Latin alphabet, but we had to use ISO-8859-5 for Cyrillic. One of our ten encodings (the national standard came late) covered both Latin and Cyrillic in 8 bits. Back in 1991-2 I implemented a system for handling the above-mentioned ten encodings. After that experience, and after a decade or so of using/fighting ten encodings, you can trust me - even the notion of having a single encoding for all language needs is a lifesaver :).
That is where my oversensitivity to the idea of having two ways to interpret strings comes from. Two ways, just because we can? Ok, we can use two, we can use ten, we can use fifty encodings!! But the sensible way is to use one, if possible. And it is possible! It is called UTF-8. On Jun 28, 2012, at 2:51 PM, Hendrik Boom wrote: > On Wed, Jun 27, 2012 at 01:14:22PM +0200, Dragiša Durić wrote: >> >> On Jun 27, 2012, at 12:19 PM, Jay K wrote: >> >>>> More and more is obvious how ideal structure would be: ARRAY OF CHAR, UTF8 encoded, using SRC M3 Text.Hash(). >>> >>> I don't quite agree. >>> There are two ideal approaches. >>> 1) >>> TEXT is like ARRAY OF CHAR and no values over 0xFF (or maybe even 0x7F) >>> "WIDETEXT" is like ARRAY OF WIDECHAR, for 16bit or 32bit WIDECHAR >> >> So we can have two representations for single thing: variable holding some text. And representation depends on a question "do you need non-basic-english-characters"? > > I'm starting to discover that a lot of my English documents have > non-ASCII characters in them. In particular, the separate open and close > quotation marks around quoted speech take more than one byte in > Unicode. True, in a starvation-level character set, they are both > represented as " , but that's really not what they are. > > -- hendrik From rcolebur at SCIRES.COM Fri Jun 29 01:35:29 2012 From: rcolebur at SCIRES.COM (Coleburn, Randy) Date: Thu, 28 Jun 2012 19:35:29 -0400 Subject: [M3devel] EXT Re: UTF-8 TEXT In-Reply-To: <4FEB5D09.2080601@lcwb.coop> References: <1340712762.68230.YahooMailClassic@web29704.mail.ird.yahoo.com> <2BFED857-E66E-4534-8931-F822E8364D02@m3w.org> <30A65919-D57C-4E4C-97C8-D0BD5F7860EF@m3w.org> <0A373E4C-32CB-4947-A542-67F0910DBEC4@m3w.org> <20120627033000.GB28021@topoi.pooq.com> <4FEB5D09.2080601@lcwb.coop> Message-ID: ... > I feel very strongly that we should *not* take away the full generality of Text, > especially efficient random access, to handle variable-length character > encodings in strings.
> For these, let's make more friends of Wr and Rd, which
> already assume sequential access. For example, a filter pipe that sequentially
> reads a Text/Array/stream, applies a UTF-8 interpretation to its bytes, and
> delivers a stream of Unicode characters, in variables of type WIDECHAR.
>
> Text should preserve the abstraction that it's a string of characters,
> generalized as it already is in cm3, to have type WIDECHAR, so they can be any
> Unicode character. The internal representation should, usually, not be of concern.
...

I concur with Rodney. We need to hold true to the design tenets of the language and keep the full generality of Text with efficient random access, and add new variants of the Rd/Wr/etc. abstractions that deal with the various variable-length character encodings as sequential-access streams.

--Randy Coleburn

From dabenavidesd at yahoo.es Fri Jun 29 02:21:19 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Fri, 29 Jun 2012 01:21:19 +0100 (BST)
Subject: [M3devel] Windows, Unicode file names
In-Reply-To:
Message-ID: <1340929279.13051.YahooMailClassic@web29705.mail.ird.yahoo.com>

Hi all:
in fact CM had the idea of rewriting the Modula-3 language definition in terms of the UTF standard, but it never came out. Perhaps we will need to maintain two definitions, one SPwM3 and a newer CM-style one, and based on those standards make a front end that can write to both kinds of standard and make them interoperable.
One way of promoting CM3 could be to talk about a renewed Modula-3, JVM-enabled, etc., for system applications (like Win32, Unix), whereas DEC-SRC Modula-3 would be for research and development with a parallelized environment, like a research system for an open AAA compiler (I don't know of many others writing parallel compilers) with ESC, Vesta, etc.
Thanks in advance

From dragisha at m3w.org Fri Jun 29 10:35:38 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Fri, 29 Jun 2012 10:35:38 +0200
Subject: [M3devel] Simple change to WIDECHAR type
Message-ID:

m3front/src/builtinTypes/WCharr.m3, line:

    T := EnumType.New (16_10000, elts);

to

    T := EnumType.New (16_100000, elts);

Will this break things? Any other assumptions anywhere?

From dabenavidesd at yahoo.es Fri Jun 29 17:47:50 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Fri, 29 Jun 2012 16:47:50 +0100 (BST)
Subject: [M3devel] Simple change to WIDECHAR type
In-Reply-To:
Message-ID: <1340984870.74508.YahooMailClassic@web29702.mail.ird.yahoo.com>

Hi all:
more important than a maximum-length char type, we need a minimum optimal char table size (and thus word size, so we can optimize). Use WIDECHAR and move it towards modular arithmetic of CHAR {0..255}, so you can select at run time a common base to represent your character system I/O. This happens in micro digital signal processors, if you care about the output of such systems in practice. Also, a character terminal is important to note, for instance as a way of evaluating the speed of a signal processor design. DEC had a lot of devices like that, so having a common interface to those systems is useful. Many mainframes are still handled mostly through that kind of device, so I guess it is important for such a system to support most types of encodings:
http://vt100.net/docs/vt510-rm/chapter8
http://en.wikipedia.org/wiki/ISO/IEC_8859-5
P Zollo wrote an emulator for that device, so maybe we can test the speed of character streaming with that.
Thanks in advance

--- On Fri, 29/6/12, Dragiša Durić wrote:

From: Dragiša Durić
Subject: [M3devel] Simple change to WIDECHAR type
To: "m3devel"
Date: Friday, 29 June 2012, 03:35

m3front/src/builtinTypes/WCharr.m3, line:

    T := EnumType.New (16_10000, elts);

to

    T := EnumType.New (16_100000, elts);

Will this break things? Any other assumptions anywhere?

From dragisha at m3w.org Fri Jun 29 17:52:55 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Fri, 29 Jun 2012 17:52:55 +0200
Subject: [M3devel] Simple change to WIDECHAR type
In-Reply-To:
References:
Message-ID: <4AD84314-8E53-4A50-A51B-A0DF8DC910BC@m3w.org>

That, or UTF-16 encoding on top of current WIDECHAR.

On Jun 29, 2012, at 3:50 PM, Antony Hosking wrote:

> That will change WIDECHAR from a value consuming 16 bits of memory into a value consuming 32 bits of memory. In other words, all TEXT containing WIDECHAR will double in size.
>
> On Jun 29, 2012, at 4:35 AM, Dragiša Durić wrote:
>
>> m3front/src/builtinTypes/WCharr.m3, line:
>>
>> T := EnumType.New (16_10000, elts);
>>
>> to
>>
>> T := EnumType.New (16_100000, elts);
>>
>> Will this break things? Any other assumptions anywhere?
>>
>

From dabenavidesd at yahoo.es Fri Jun 29 18:08:57 2012
From: dabenavidesd at yahoo.es (Daniel Alejandro Benavides D.)
Date: Fri, 29 Jun 2012 17:08:57 +0100 (BST)
Subject: [M3devel] Simple change to WIDECHAR type
In-Reply-To: <4AD84314-8E53-4A50-A51B-A0DF8DC910BC@m3w.org>
Message-ID: <1340986137.5745.YahooMailClassic@web29704.mail.ird.yahoo.com>

Hi all:
I repeat: we need performance to be well understood as the issue here. Who cares about imposed standards? The ISO ones are the real standards; there is no point complaining about that, as DEC made them de facto on its terminals.
Thanks in advance

--- On Fri, 29/6/12, Dragiša Durić wrote:

From: Dragiša Durić
Subject: Re: [M3devel] Simple change to WIDECHAR type
To: "Antony Hosking"
CC: "m3devel"
Date: Friday, 29 June 2012, 10:52

That, or UTF-16 encoding on top of current WIDECHAR.

On Jun 29, 2012, at 3:50 PM, Antony Hosking wrote:

> That will change WIDECHAR from a value consuming 16 bits of memory into a value consuming 32 bits of memory. In other words, all TEXT containing WIDECHAR will double in size.
>
> On Jun 29, 2012, at 4:35 AM, Dragiša Durić wrote:
>
>> m3front/src/builtinTypes/WCharr.m3, line:
>>
>> T := EnumType.New (16_10000, elts);
>>
>> to
>>
>> T := EnumType.New (16_100000, elts);
>>
>> Will this break things? Any other assumptions anywhere?
>>
>

From dragisha at m3w.org Sat Jun 30 09:33:00 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Sat, 30 Jun 2012 09:33:00 +0200
Subject: [M3devel] Simple change to WIDECHAR type
In-Reply-To: <4AD84314-8E53-4A50-A51B-A0DF8DC910BC@m3w.org>
References: <4AD84314-8E53-4A50-A51B-A0DF8DC910BC@m3w.org>
Message-ID: <031FD9D2-8C62-48B3-8E2F-1122B7C1ED96@m3w.org>

Current GetChar/SetChars and GetWideChar/SetWideChars are not character-level access methods, in terms of Unicode. They are "byte-level", fixed-width data accesses. Reason: both CHAR (cardinality 2^8) and WIDECHAR (cardinality 2^16) based strings must use one or more characters to represent the whole of Unicode (cardinality 2^20). If we must encode in any case, then we don't have any benefit from WIDECHAR (as it is implemented/understood now) at all!

To represent Unicode with either CHAR or WIDECHAR based TEXTs, we must use either UTF-8 or UTF-16. Both are one-to-many encodings, encoding one Unicode character to either 1-4 CHARs or 1-2 WIDECHARs.

What exactly is the meaning (at Modula-3's usual levels of abstraction) of character-level access? Do we need whatever bit pattern is physically present at some location in our data's representation?
Or maybe we need the numerical value of the actual Unicode character, the one that is visually distinguishable in written form? One from that set of 2^20 elements?

What is the meaning of Text.Sub() based on byte-level access operations, where the first character of our resulting TEXT is in fact the tail of some Unicode character's encoding? And/or where our last character is an incomplete prefix of some encoded character?

Since when is it our priority to have fast and efficient operations that do something we don't need at all?

We are getting nothing at all with WIDECHAR. No. Single. Thing. WIDECHAR does not make us closer to Unicode at all. WIDECHAR, together with CHAR (in the context of our current TEXT), makes two almost-solutions to the Unicode problem, and the existence of a WIDECHAR scalar type makes us a bit closer to the Unicode almost-solution of the C world and nothing else.

Currently, neither GetChar nor GetWideChar can get "a character at nth position". Reason: there is no character scalar type that can hold any Unicode character.

Solution:
======

* Redefine WIDECHAR to hold at least 20 bit values, or create UNICHAR or GLYPH (and leave WIDECHAR as it is for vertical compatibility) so we can hold unencoded Unicode characters in scalar values in our Modula-3 programs, while preserving their properties.
* Implement properties, relations and methods defined for Unicode. With ASCII, numeric order is everything. With Unicode - it is not. This is probably a very big project but we can start somewhere, and let interested parties build on it. Dirk Muysers did work in this regard already.
* Whoever thinks we don't need this and our "tradition" and "legacy" are important, please read this: http://unicode.org/standard/WhatIsUnicode.html .

dd

On Jun 29, 2012, at 5:52 PM, Dragiša Durić wrote:

> That, or UTF-16 encoding on top of current WIDECHAR.
>
> On Jun 29, 2012, at 3:50 PM, Antony Hosking wrote:
>
>> That will change WIDECHAR from a value consuming 16 bits of memory into a value consuming 32 bits of memory.
>> In other words, all TEXT containing WIDECHAR will double in size.
>>
>> On Jun 29, 2012, at 4:35 AM, Dragiša Durić wrote:
>>
>>> m3front/src/builtinTypes/WCharr.m3, line:
>>>
>>> T := EnumType.New (16_10000, elts);
>>>
>>> to
>>>
>>> T := EnumType.New (16_100000, elts);
>>>
>>> Will this break things? Any other assumptions anywhere?
>>>
>>
>

From dragisha at m3w.org Sat Jun 30 10:56:27 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Sat, 30 Jun 2012 10:56:27 +0200
Subject: [M3devel] Some earlier work
Message-ID: <31C25C13-66B2-4637-9D33-C2E5E80AB9DA@m3w.org>

This is how we implemented UTF8 strings over current TEXTs. The current implementation is UNSAFE and uses glibc utf8 methods. Nothing too complicated and nothing we can't implement in Modula-3/portable C.

=====
INTERFACE UText;

IMPORT Word;  (* needed for the result type of Hash *)

TYPE
  T = TEXT;
  Char = CARDINAL;

PROCEDURE Cat(t, u: T): T;
PROCEDURE Equal(t, u: T): BOOLEAN;
PROCEDURE GetChar(t: T; i: CARDINAL): Char;
PROCEDURE ByteSize(t: T): CARDINAL;
PROCEDURE Length(t: T): CARDINAL;
PROCEDURE Empty(t: T): BOOLEAN;
PROCEDURE Sub(t: T; start: CARDINAL; length: CARDINAL := LAST(CARDINAL)): T;
PROCEDURE SetChars(VAR a: ARRAY OF Char; t: T);
PROCEDURE FromChar(ch: Char): T;
PROCEDURE FromChars(READONLY a: ARRAY OF Char): T;
PROCEDURE Hash(t: T): Word.T;
PROCEDURE Compare(t1, t2: T): [-1..1];
PROCEDURE FindChar(t: T; ch: Char; start: CARDINAL := 0): INTEGER;
PROCEDURE FindCharR(t: T; ch: Char; start: CARDINAL := LAST(INTEGER)): INTEGER;

END UText.

From hendrik at topoi.pooq.com Sat Jun 30 16:29:24 2012
From: hendrik at topoi.pooq.com (Hendrik Boom)
Date: Sat, 30 Jun 2012 10:29:24 -0400
Subject: [M3devel] Some earlier work
In-Reply-To: <31C25C13-66B2-4637-9D33-C2E5E80AB9DA@m3w.org>
References: <31C25C13-66B2-4637-9D33-C2E5E80AB9DA@m3w.org>
Message-ID: <20120630142924.GB12402@topoi.pooq.com>

On Sat, Jun 30, 2012 at 10:56:27AM +0200, Dragiša Durić
wrote:
> This is how we implemented

Any chance you could show us the implementation and not just the INTERFACE?

-- hendrik

From dragisha at m3w.org Sat Jun 30 16:39:16 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Sat, 30 Jun 2012 16:39:16 +0200
Subject: [M3devel] Some earlier work
In-Reply-To: <20120630142924.GB12402@topoi.pooq.com>
References: <31C25C13-66B2-4637-9D33-C2E5E80AB9DA@m3w.org> <20120630142924.GB12402@topoi.pooq.com>
Message-ID: <0C45D4FF-8279-404F-A68E-35656D261959@m3w.org>

Of course.

http://dl.dropbox.com/u/60554338/UText.m3

On Jun 30, 2012, at 4:29 PM, Hendrik Boom wrote:

> On Sat, Jun 30, 2012 at 10:56:27AM +0200, Dragiša Durić wrote:
>> This is how we implemented
>
> Any chance you could show us the implementation and not just the INTERFACE?
>
> -- hendrik

From jay.krell at cornell.edu Sat Jun 30 18:52:54 2012
From: jay.krell at cornell.edu (Jay K)
Date: Sat, 30 Jun 2012 16:52:54 +0000
Subject: [M3devel] Simple change to WIDECHAR type
In-Reply-To: <031FD9D2-8C62-48B3-8E2F-1122B7C1ED96@m3w.org>
References: , , <4AD84314-8E53-4A50-A51B-A0DF8DC910BC@m3w.org>, <031FD9D2-8C62-48B3-8E2F-1122B7C1ED96@m3w.org>
Message-ID:

I don't fully buy this. 16bit WIDECHAR is very useful on Windows.

It can be used directly with a vast vast vast vast number of functions.

32bit char would require conversion to and from all the time.

As well, there are no codepages when using 16-bit characters. 8 bit characters are interpreted in a way on/by Windows that varies per OS and per user and which isn't stored with the string. I realize that Modula-3 code doesn't necessarily use the same interpretation.

"No codepages" is the advantage of utf8 -- pick one "code page", if "code page" means "how to encode/decode more than 8 bits, 8 bits at a time". Hope all the data is 7 bit clean, so it doesn't matter. Otherwise convert to and from a lot.

I do understand that current Unicode requires 20 bits, and that a 32bit character type is justifiable.
As I understand it, this was debated when Unicode was first designed but rejected as too large.

 - Jay

From dragisha at m3w.org Sat Jun 30 19:17:23 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Sat, 30 Jun 2012 19:17:23 +0200
Subject: [M3devel] Simple change to WIDECHAR type
In-Reply-To:
References: , , <4AD84314-8E53-4A50-A51B-A0DF8DC910BC@m3w.org>, <031FD9D2-8C62-48B3-8E2F-1122B7C1ED96@m3w.org>
Message-ID: <7E7703E6-48BD-4DCE-8336-97D41249EDCF@m3w.org>

And since when has usefulness on Windows defined anything in Modula-3? To use the vastvastvastnumber of Windows functions on a Modula-3 TEXT you must call at least one Modula-3 function on your argument to make it passable to a Windows API function. Writing another single-call Modula-3 function that maps UTF-8 to an argument acceptable to the Windows API is a five-minute task. So you are in fact gaining nothing with WIDECHAR that you can't have with UTF8 packed in Text8.T.

32bit characters are what we have on non-Windows. And we must convert all the time if we are to pass Modula-3 WIDECHAR based TEXT to non-Windows wchar strings. Are you arguing Windows is more important than all other platforms we support, or what?

On Jun 30, 2012, at 6:52 PM, Jay K wrote:

>
> I don't fully buy this. 16bit WIDECHAR is very useful on Windows.
>
> It can be used directly with a vast vast vast vast number of functions.
>
> 32bit char would be require conversion to and from all the time.

From mika at async.caltech.edu Sat Jun 30 19:24:01 2012
From: mika at async.caltech.edu (Mika Nystrom)
Date: Sat, 30 Jun 2012 10:24:01 -0700
Subject: [M3devel] Simple change to WIDECHAR type
In-Reply-To: <031FD9D2-8C62-48B3-8E2F-1122B7C1ED96@m3w.org>
References: <4AD84314-8E53-4A50-A51B-A0DF8DC910BC@m3w.org> <031FD9D2-8C62-48B3-8E2F-1122B7C1ED96@m3w.org>
Message-ID: <20120630172401.DFE8E1A207C@async.async.caltech.edu>

Dragiša Durić writes:
...
>
>Solution:
>======
>
>* Redefine WIDECHAR to hold at least 20 bit values, or create UNICHAR or GLYPH (and leave WIDECHAR as it is for vertical compatibility) so we can hold unencoded Unicode characters in scalar values in our Modula-3 programs, while preserving their properties.
>* Implement properties, relations and methods defined for Unicode. With ASCII, numeric order is everything. With Unicode - it is not. This is probably very big project but we can start somewhere, and let interested parties build on it. Dirk Muysers did work in this regard already.
>* Whoever thinks we don't need this and our "tradition" and "legacy" are important, please read this: http://unicode.org/standard/WhatIsUnicode.html .
>
>dd

Given what you have said about the near-uselessness of WIDECHAR, does anything actually use it much? What breaks if it is redefined to be the same as, say, INTEGER? (Or Word.T)

CHAR is quite useful for processing 7-bit ASCII, and it would be lovely if that could go back to using the SRC data structures. For people who do stuff like write VLSI design tools... (probably many other large-scale applications would like it too).

     Mika

From dragisha at m3w.org Sat Jun 30 20:12:45 2012
From: dragisha at m3w.org (Dragiša Durić)
Date: Sat, 30 Jun 2012 20:12:45 +0200
Subject: [M3devel] Simple change to WIDECHAR type
In-Reply-To: <20120630172401.DFE8E1A207C@async.async.caltech.edu>
References: <4AD84314-8E53-4A50-A51B-A0DF8DC910BC@m3w.org> <031FD9D2-8C62-48B3-8E2F-1122B7C1ED96@m3w.org> <20120630172401.DFE8E1A207C@async.async.caltech.edu>
Message-ID: <7BCB3BB7-D2F2-470B-8E70-2A7FF274E0FC@m3w.org>

I don't see where WIDECHAR can be useful, such as it is. Especially since TEXT in cm3 is a non-flat structure, and it almost always takes additional processing to prepare it even for a Windows API argument. Additional processing from a dendriform cm3 TEXT is in no way more efficient if some nodes are already just-like-Windows-texts.
Also, cm3 TEXT is overengineered - I hope I don't have to argue this. Everything is second to an efficient concat operation.

IMO, we must leave TEXT to be simple and CHAR based. Just like you need for your VLSI tools. And use something like UText.i3/m3 to use such objects to represent Unicode (UTF-8 encoded) any-language strings. And use WText.* for communication with wchar APIs like Windows'.

BTW, WIDECHAR literals are not sufficiently defined in cm3. There is a hole the size of the Moon. What is the input encoding for source files containing WIDECHAR literals? For example:

CONST Me = W"Dragiša Durić";

Jay, please explain this to me. My editor creates UTF8 files, for example. What does cm3 expect after W" ?
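
[Editor's sketch, not part of the thread.] The code-point-level access argued for above, as opposed to byte-level Text.GetChar and Text.Sub, could look roughly like the following over a TEXT whose CHARs are assumed to carry well-formed UTF-8. The procedure names SeqLen, CodePointLength and CodePointAt are made up for illustration; this is not the UText.m3 implementation linked earlier in the thread, and malformed input is not validated. Note also that Unicode code points run from 16_0 to 16_10FFFF, so a scalar type needs 21 bits, not 20, to hold every one of them.

```modula3
MODULE Main;

(* Illustrative sketch only: hypothetical helper names, input assumed
   to be well-formed UTF-8 stored one byte per CHAR. *)

IMPORT IO, Fmt, Text;

(* "Dragiša" with the s-caron (U+0161) written as its two UTF-8 bytes,
   C5 A1, using octal escapes: 8 bytes, 7 code points. *)
CONST Sample = "Dragi\305\241a";

(* Length in bytes of the UTF-8 sequence introduced by lead byte b.
   Anything that is not a valid multi-byte lead counts as 1 so that
   a scan always makes progress. *)
PROCEDURE SeqLen(b: INTEGER): CARDINAL =
  VAR n: CARDINAL := 1;
  BEGIN
    IF    b >= 16_F0 AND b < 16_F8 THEN n := 4   (* 11110xxx *)
    ELSIF b >= 16_E0 AND b < 16_F0 THEN n := 3   (* 1110xxxx *)
    ELSIF b >= 16_C0 AND b < 16_E0 THEN n := 2   (* 110xxxxx *)
    END;
    RETURN n
  END SeqLen;

(* Number of code points in t, found by hopping from lead byte to
   lead byte. *)
PROCEDURE CodePointLength(t: TEXT): CARDINAL =
  VAR i, n: CARDINAL := 0;
  BEGIN
    WHILE i < Text.Length(t) DO
      INC(i, SeqLen(ORD(Text.GetChar(t, i))));
      INC(n)
    END;
    RETURN n
  END CodePointLength;

(* Code point number k (counting from 0) of t. *)
PROCEDURE CodePointAt(t: TEXT; k: CARDINAL): CARDINAL =
  VAR
    i: CARDINAL := 0;
    b, len: INTEGER;
    cp: CARDINAL;
  BEGIN
    (* Skip k whole sequences to find the byte offset of the target. *)
    FOR skipped := 1 TO k DO
      INC(i, SeqLen(ORD(Text.GetChar(t, i))))
    END;
    b := ORD(Text.GetChar(t, i));
    len := SeqLen(b);
    (* Strip the length marker from the lead byte. *)
    CASE len OF
    | 2 => cp := b - 16_C0
    | 3 => cp := b - 16_E0
    | 4 => cp := b - 16_F0
    ELSE   cp := b
    END;
    (* Each continuation byte 10xxxxxx contributes six more bits. *)
    FOR j := 1 TO len - 1 DO
      cp := cp * 64 + (ORD(Text.GetChar(t, i + j)) - 16_80)
    END;
    RETURN cp
  END CodePointAt;

BEGIN
  IO.Put("bytes:        " & Fmt.Int(Text.Length(Sample)) & "\n");
  IO.Put("code points:  " & Fmt.Int(CodePointLength(Sample)) & "\n");
  IO.Put("code point 5: 16_" & Fmt.Int(CodePointAt(Sample, 5), 16) & "\n");
END Main.
```

Both procedures scan from the start of the text, so indexing by code point is linear in the length of the text. That is the trade-off against the constant-time random access that other posters in the thread want Text to keep.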