Build an erlang computer (was:Computers are fast)


Build an erlang computer (was:Computers are fast)

Joe Armstrong (TN/EAB)

Absolutely.

In my never ending quest for inefficiency I have seldom met a problem
which could not be solved in the twinkling of an eye (apart, that is,
from our friend that wanted to do O(10^19) computations to force some
crypto system :-)

I don't know if this is the right place to discuss this, but
I've been thinking about building a quiet and fast computer.

Has anybody done this? - I think I'm beginning to understand
the things I have to do to make it quiet - but how do I make a fast
computer?

I want to maximize my bangs for the buck.

Any ideas on the tradeoffs between processor caches, main memory sizes,
CPU clock speeds, etc.?

I'd like to optimize for

  - compiling programs with a few hundred source code files
  - making a high performance web server

Where is my money better spent?

   A cheapish processor with as much memory as possible:
   say an Athlon 64 3000+ (2GHz, 512KB cache) at 1295 kr,
   with 4 G memory (about 1000 kr/G).

   Or the cheapest dual core, an Athlon 64 X2 3800+ (2GHz, 1MB cache) at 3250 kr,
   with 2 G memory.

In the old days I always said that buying more memory was better than
buying a faster processor - is this still true? - also what
is the effect of increasing the size of the processor cache contra
more main memory for the same money?

Has anybody made any measurements here?

How about a cheapo processor, water cooling and overclocking?

Is this the way to go?


<aside> It would be interesting to have some timings
for, say, the time to compile all the Erlang in
stdlib, done on different processors, with different cache
sizes etc.
</aside>

Question: Is there a simple program to make main memory totally
unavailable in my linux system? - I'd like to boot the system
saying "use only 100K of memory" or "use only 200K of memory"
(I currently have 512K) so I can measure the effect of different
memory sizes on performance.


Cheers

/Joe


> -----Original Message-----
> From: [hidden email]
> [mailto:[hidden email]] On Behalf Of David Hopwood
> Sent: den 25 januari 2006 03:16
> To: [hidden email]
> Subject: Computers are fast (was: Recursive list comprehension)
>
> Richard A. O'Keefe wrote:
> > maybe they're right to dismiss the cost as a sacrifice for
> > provably correct programs,
> >
> > But they *DON'T* dismiss cost as an issue.  They just
> aren't one-eyed
> > and prejudiced about it.  They are aware that development
> time is a cost.
> > They are aware that a high level language enlarges the
> scope of what
> > you
> > *CAN* program at all.  They are aware that cheap ($1000)
> computers are
> > now more than a thousand times faster than expensive ($1000000)
> > mainframes of 25 years ago and have about a thousand times
> as much memory.
>
> Absolutely. Recently, during the development of a soft
> real-time program, I accidentally left on a debugging option
> that was dumping megabytes per second of barely useful debug
> information to disk. But the machine was so damn fast that
> this wasn't at all noticeable from the program's performance,
> and it still met its real-time deadlines.
>
> Some programmers, usually those who do almost all their
> programming in C,
> C++ and similar languages, seem to be obsessed with low level
> C++ optimizations
> that shave off a few cycles here and there. They haven't
> internalized just how fast modern computers are capable of running.
>
> When a program runs perceptibly slowly, it is almost always
> due to misdesign, not programming language. Slow applications
> (and operating systems) are slow because they're doing the
> wrong thing, not because they're doing the right thing
> slowly. To effectively optimize at a high level, you need a
> high level language.
>
> --
> David Hopwood <[hidden email]>
>
>

RE: Build an erlang computer (was:Computers are fast)

Vlad Dumitrescu XX (LN/EAB)
Hi,

> Question: Is there a simple program to make main memory
> totally unavailable in my linux system? - I'd like to boot
> the system saying "use only 100K of memory" or "use only 200K
> of memory"
> (I currently have 512K) so I can measure the effect of
> different memory sizes on performance.

There usually is a mem=XXXM boot parameter for the kernel. I don't know
if it works to restrict memory, or if it is just for cases where the
kernel doesn't recognize all installed memory by default.
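
For example (a sketch only -- the exact file name and syntax depend on
your bootloader), a GRUB menu.lst kernel line restricting Linux to 256M
might look like:

                kernel /boot/vmlinuz-2.6 root=/dev/hda1 mem=256M

You can also type the parameter at the boot prompt for a one-off test,
which avoids editing any files.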

Otherwise, it is also possible to restrict memory usage per user, using
the PAM module.

You can control the per-user limits using the pam_limits module and
/etc/security/limits.conf. For example, limits for group users might
look like this:

                @users     hard  core    0
                @users     hard  nproc   50
                @users     hard  rss     5000

This says to prohibit the creation of core files, restrict the number of
processes to 50, and restrict memory usage per user to 5M.

Regards,
Vlad

Re: Build an erlang computer (was:Computers are fast)

James Hague
In reply to this post by Joe Armstrong (TN/EAB)
> In my never ending quest for inefficiency I have seldom met a problem
> which could not be solved in the twinkling of an eye (apart, that is,
> from our friend that wanted to do O(10^19) computations to force some
> crypto system :-)

How about parsing a 300+ megabyte XML file?  I have a lean and mean
XML parser in Erlang, one that only deals with a strict subset of XML,
and operates entirely on binaries.  On 8 or 10 megabyte files, it's
great, but on this monster--heh.  After about 30 minutes the emulator
dies with "Abnormal Termination."  I suspect it's running out of
memory.

It's interesting when these kinds of crazy problems come along :)

Re: Build an erlang computer (was:Computers are fast)

David Hopwood
James Hague wrote:
>>In my never ending quest for inefficiency I have seldom met a problem
>>which could not be solved in the twinkling of an eye (apart, that is,
>>from our friend that wanted to do O(10^19) computations to force some
>>crypto system :-)

That's an example of "doing the wrong thing, [rather than] doing the right
thing slowly."

> How about parsing a 300+ megabyte XML file?

And yet another example! :-)

--
David Hopwood <[hidden email]>


[Off-topic] Re: Build an erlang computer (was:Computers are fast)

David Hopwood
In reply to this post by Joe Armstrong (TN/EAB)
Joe Armstrong (AL/EAB) wrote:

> I'd like to optimize for
>
>   - compiling programs with a few hundred source code files
>   - making a high performance web server
>
> Where is my money better spent?
>
>    A cheapish processor with as much memory as possible:
>    say an Athlon 64 3000+ (2GHz, 512KB cache) at 1295 kr,
>    with 4 G memory (about 1000 kr/G).
>
>    Or the cheapest dual core, an Athlon 64 X2 3800+ (2GHz, 1MB cache) at 3250 kr,
>    with 2 G memory.
>
> In the old days I always said that buying more memory was better than
> buying a faster processor - is this still true? - also what
> is the effect of increasing the size of the processor cache contra
> more main memory for the same money?

For the sake of argument, suppose that the dual core would do everything twice
as fast (it certainly won't do better). This will not result in a 2x perceived
increase in performance, since most of the things that are being done twice as
fast took imperceptible time to begin with.

It is more effective to remove situations that cause pathologically bad
performance -- that is, performance so bad that just doubling the speed won't
fix it. One such situation is virtual memory thrashing. The 4G box will be
able to do more before running into this problem (and the 64-bit address
space allows you to actually make use of 4G).

--
David Hopwood <[hidden email]>


Re: Build an erlang computer (was:Computers are fast)

Scott Lystig Fritchie
In reply to this post by Vlad Dumitrescu XX (LN/EAB)
>>>>> "vd" == Vlad Dumitrescu XX \(LN/EAB\) <Vlad> writes:

vd> There usually is a mem=XXXM boot parameter for the kernel. I
vd> don't know if it works to restrict memory, or if it is just for
vd> cases where the kernel doesn't recognize all installed memory by default.

If your machine has more RAM than XXX megabytes, then that extra RAM
will not be used.

Beware using that kernel option: accidentally using "K" rather than
"M" can have bad effects on machines that are further away than arm's
reach.  (Not that I would *ever* do such a thing to a machine 2700
kilometers away, obviously.  :-)

Another way to do it is the mlock() system call.  That will pin the
contents of VM pages into RAM.  I've had problems mlock()ing more than
1.3GB of RAM at one time on a Red Hat AS2.1 machine, even if I only do
it 100MB at a time.  I dunno if later kernels avoid that problem.

-Scott

Re: Build an erlang computer (was:Computers are fast)

Valentin Micic
In reply to this post by Joe Armstrong (TN/EAB)
>Question: Is there a simple program to make main memory totally
>unavailable in my linux system?

Yeah... it's called the JVM ;-)

Cheers,
Valentin.

Re: Build an erlang computer (was:Computers are fast)

James Hague
In reply to this post by David Hopwood
> > How about parsing a 300+ megabyte XML file?
>
> And yet another example! :-)

Not gonna argue with that, but that doesn't mean such files don't exist :)

Re: [Off-topic] Re: Build an erlang computer (was:Computers are fast)

Matthias Lang
In reply to this post by David Hopwood

 > Joe Armstrong (AL/EAB) wrote:
 > >    Say an Athlon 64 3000+ (2GHz, 512KB cache) at 1295 kr
...
 > >    Or the cheapest dual core, an Athlon 64 X2 3800+ (2GHz, 1MB cache) at 3250 kr

David Hopwood writes:

 > For the sake of argument, suppose that the dual core would do
 > everything twice as fast (it certainly won't do better).

The dual-core CPU in Joe's case has twice as much cache, so finding
_something_ it's more than twice as quick on isn't a lost cause.

Besides, the X2 can probably expose concurrency problems in device
drivers and the like at more than 10x the rate of the uniprocessor.

Matthias

RE: Build an erlang computer (was:Computers are fast)

Joe Armstrong (TN/EAB)
In reply to this post by Joe Armstrong (TN/EAB)

This is pure lunacy - design goal 10 in
http://www.w3.org/TR/2003/PER-xml-20031030/
says:

     " 10. Terseness in XML markup is of minimal importance. "

  But terseness of expression *is* important if you have lots of data,
which implies that you should not use XML when there is lots of data.

  Using XML for voluminous data is a sure sign of bad design

  << In another project I bumped into, XML was being used to represent
     a quantity that had three discrete states.

     THREE STATES CAN BE REPRESENTED IN TWO BITS

     But they chose XML - the declaration of a single state took about
     190 bytes - and they had *lots* of records, which they stored in a
     big database.

     Now the database was slow, so they bought more memory; it was
     still slow, so they wanted to go distributed - so they asked me,
     since "Joe knows something about distributed programming" >>

   Mindless use of XML is a sure sign of excruciatingly bad design.

   Idea - grade moderately difficult - XML should compress very nicely,
since the same tags get repeated over and over again; thus in LZSS
compression duplicated tags will appear as pointers.

   How about writing an XML parser that works directly from an LZSS
compressed XML stream *without* doing the decompression first? If
you're clever and cache the results of the parses of the things that
the LZSS pointers point to, you might be able to write a very fast and
compact parser.

   I will give a bottle of whisky to the first person to send me a
correct Erlang program that does this.

 Cheers

/Joe



RE: Build an erlang computer (was:Computers are fast)

Vlad Dumitrescu
Hi Joe,

>    How about writing an XML parser that works directly from
> an LZSS compressed XML stream
> *without* doing the decompression first?

I suppose you mean "without decompressing the *whole* stream first", right?

If it can be done without decompressing at all, I think that only one bottle
is _way_ too small a reward ;-)

Regards,
/Vlad





RE: [Off-topic] Re: Build an erlang computer (was:Computers are fast)

Joe Armstrong (TN/EAB)
In reply to this post by Joe Armstrong (TN/EAB)
 

> For the sake of argument, suppose that the dual core would do
> everything twice as fast (it certainly won't do better). This
> will not result in a 2x perceived increase in performance,
> since most of the things that are being done twice as fast
> took imperceptible time to begin with.
>
> It is more effective to remove situations that cause
> pathologically bad performance -- that is, performance so bad
> that just doubling the speed won't fix it. One such situation
> is virtual memory thrashing. The 4G box will be able to do
> more before running into this problem (and the 64-bit address
> space allows you to actually make use of 4G).

Good point - I was thinking of the following strategy:

          Buy a cheapish processor (not dual core) and 2 G memory
          Upgrade 1 - if not fast enough, add 2 G more memory
          Upgrade 2 - in 18 months, change to a dual core, which will
                      by then be cheap

Upgrade 1 might not be necessary

Next question - when will I notice the difference between a 512K and a
1M processor cache?

I guess the *only* application that is pathologically bad is video
transcoding. I'm assuming this is CPU-limited and that the 1M cache is
better.

Next question - do the different motherboard chipsets really make much
difference to performance?

/Joe


Re: Build an erlang computer

Gerd Flaig
In reply to this post by Joe Armstrong (TN/EAB)
"Joe Armstrong \(AL/EAB\)" <[hidden email]> writes:

> Question: Is there a simple program to make main memory totally
> unavailable in my linux system? - I'd like to boot the system
> saying "use only 100K of memory" or "use only 200K of memory"
> (I currently have 512K) so I can measure the effect of different
> memory sizes on performance.

512k, that reminds me of my Amiga days. ;)

You can use the kernel command line arguments for this, boot e.g. with
mem=128M or mem=64M. If you don't want to reboot your main system, you
might try virtualization with User Mode Linux, Xen or the like.

Another approach would be to lock portions of memory into RAM using
mlock(2) or mlockall(2).

     Goodbyte, Gerd.
--
The last thing one knows in constructing a work is what to put first.
                -- Blaise Pascal


Re: Build an erlang computer (was:Computers are fast)

Zoltan Podlovics
In reply to this post by James Hague
James Hague wrote:
> How about parsing a 300+ megabyte XML file?  I have a lean and mean
> XML parser in Erlang, one that only deals with a strict subset of XML,
> and operates entirely on binaries.  On 8 or 10 megabyte files, it's
> great, but on this monster--heh.  After about 30 minutes the emulator
> dies with "Abnormal Termination."  I suspect it's running out of
> memory.
>
> It's interesting when these kinds of crazy problems come along :)
>

You should consider a non-extractive approach to XML processing. Using
a Virtual Token Descriptor ( http://vtd-xml.sf.net ) in an XML messaging
system would be a huge advantage, because you can send the XML data and
the VTD (parsing information) together in binary form.

Regards,
Zoltan

Compressed XML (was: Build an erlang computer (was:Computers are fast))

Ewan Higgs
In reply to this post by Joe Armstrong (TN/EAB)

--- "Joe Armstrong (AL/EAB)"
<[hidden email]> wrote:
>    How about writing an XML parser that works
> directly from an LZSS
> compressed XML stream
> *without* doing the decompression first? If
> you're clever and cache the
> results of the parses
> of the things that the LZSS pointers point to you
> might be able to write
> a very fast and compact parser.

I don't know about LZSS, but I know zlib (RFC 1950 [1])
is iterative, so if you have an erlang binding[2] you
can make a process that repeatedly reads chunks of the
file, inflates them, and passes the result to the xml
parsing process[3].
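
Something like this rough, untested sketch -- it uses the zlib module
that ships with recent OTP (the jungerl binding in [2] has a different
API), and xml_cb:chunk/2 stands in for whatever incremental entry
point the XML parser exposes:

    -module(zxml).
    -export([parse_file/2]).

    %% Stream File through zlib, handing each inflated chunk to the
    %% parser callback together with its accumulated state.
    parse_file(File, ParserState0) ->
        {ok, Fd} = file:open(File, [read, raw, binary]),
        Z = zlib:open(),
        ok = zlib:inflateInit(Z),
        State = loop(Fd, Z, ParserState0),
        ok = zlib:inflateEnd(Z),
        zlib:close(Z),
        ok = file:close(Fd),
        State.

    loop(Fd, Z, ParserState) ->
        case file:read(Fd, 65536) of
            {ok, Compressed} ->
                %% inflate/2 returns an iolist of decompressed data.
                Chunk = iolist_to_binary(zlib:inflate(Z, Compressed)),
                %% Hypothetical incremental parser entry point.
                loop(Fd, Z, xml_cb:chunk(Chunk, ParserState));
            eof ->
                ParserState
        end.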

>    I will give a bottle of whisky to the first
> person to send me a
> correct Erlang program that does this.

If the above helps you write it in a trivial amount of
time, can I claim the prize? :D

Warm regards,
Ewan

[1] http://www.faqs.org/rfcs/rfc1950.html
[2]
http://cvs.sourceforge.net/viewcvs.py/jungerl/jungerl/lib/zlib/src/zlib.erl?rev=1.2&view=auto
[3] A slightly jiggied version of your own xml parsing
library will work fine: http://www.sics.se/~joe/ericsson/xml/xml.html



Re: Compressed XML (was: Build an erlang computer (was:Computers are fast))

bryan rasmussen
>
> I don't know about LZSS, but I know zlib (RFC 1950 [1])
> is iterative, so if you have an erlang binding[2] you
> can make a process that repeatedly reads chunks of the
> file, inflates them, and passes the result to the xml
> parsing process[3].
>
Hmm, a compressed stream-based xml parser, was there a compressed sax
parser at some point?

well, found this
http://www.w3.org/2003/08/binary-interchange-workshop/33-HiT-W3C-Workshop-2003.pdf

that's pretty cool.

"Terseness in XML markup is of minimal importance"

David Hopwood
In reply to this post by Joe Armstrong (TN/EAB)
Joe Armstrong (AL/EAB) wrote:

>    Idea - grade moderately difficult - XML should compress very nicely -
> since the same tags get repeated over and over again, thus in LZSS
> compression duplicated tags will appear as pointers.

Duplicated byte strings will appear as pointers, but these will usually
not start and end at boundaries of duplicated tags.

In general I think this *kind* of idea for how to work around problems
with XML (for example) by making things even more complicated is part of
the problem. Why can't we just point and laugh at the silly people who are
designing systems that use 300+ MByte XML files?

--
David Hopwood <[hidden email]>


RE: Build an erlang computer (was:Computers are fast)

Richard A. O'Keefe-2
In reply to this post by Joe Armstrong (TN/EAB)
"Joe Armstrong \(AL/EAB\)" <[hidden email]> wrote:
        [Using XML for voluminous data] is pure lunacy

There's a student records protocol designed by someone in a government
department here; I have a copy of the spec.  They use XML.  They use
XML badly.  (With just a little redesign, I was able to reduce the
size of the average message by about a factor of two, not that anyone
was interested, of course.  People just don't use attributes enough.)
And they specify badly: no schema information of any kind (not just no
Schema and no DTD, but not even an *informal* schema).

        Idea - grade moderately difficult - XML should compress very
        nicely - since the same tags get repeated over and over again,
        thus in LZSS compression duplicated tags will appear as
        pointers.

It does.  See XMill, by
  Dan Suciu, AT&T Labs Research, Florham Park, [hidden email]
  Hartmut Liefke, Univ. of Pennsylvania, Philadelphia, [hidden email]

Shakespeare's "Timon of Athens"
 177 kB in XML   (Bosak's XML version)
 150 kB in SGML  (Bosak's original SGML version)
 121 kB in SGML  (same structure, with </SPEECH>, </STAGEDIR>, and
                  <SPEAKER> implied, <LINE> shortened to <L>, and
                  <SPEECH> shortened to <S>)
 110 kB as TEXT  (plain text stripped out of XML version)
  47 kB in XMill (compression of 3.74 relative to XML)

  49 kB XML   + gzip -9  (Bosak's)
  48 kB SGML  + gzip -9  (Bosak's)
  48 kB XMill + gzip -9  (yep, gzip made this one bigger)
  46 kB SGML  + gzip -9  (my briefer SGML)
  45 kB TEXT  + gzip -9  (plain text)

  37 kB XML   + bzip2 -9 (Bosak's,         a factor of 4.8)
  37 kB SGML  + bzip2 -9 (Bosak's,         a factor of 4.1)
  36 kB SGML  + bzip2 -9 (my briefer SGML, a factor of 3.3)
  36 kB TEXT  + bzip2 -9 (plain text,      a factor of 3.0)

  40 kB Xbmill (Bosak's XML + xbmill,      a factor of 4.4)

This is an example which is pretty heavy on text:
 Bosak's XML  markup adds 61%;
 Bosak's SGML markup adds 37%;
 briefer SGML markup adds 10%
(where "briefer" does NOT mean "expressing less structure"; the XML, SGML,
and briefer SGML versions all have the *same* logical structure).
Even so, we see that it does compress pretty well.  In fact, using
bzip2, the compressed XML is very little larger than the compressed
plain text!

XMill can be told to use multiple "containers": which elements (via a path)
to put in which containers, and what compression scheme to use for each
container; you can apparently plug your own compressors in, although
I've never done that.

        How about writing an XML parser that works directly from an LZSS
        compressed XML stream
        *without* doing the decompression first.

This is effectively what xdemill (xbdemill) does.  The compression phase
stores a (compressed) representation of the tree structure, separate from
the compressed representation of the text.

All of Bosak's XML Shakespeare in one file with <PLAYS>...</PLAYS>
wrapped around them:

    parse using nsgmls with ESIS output to /dev/null 3.3 sec
    parse using SWI's sgml, output ditto 3.6 sec
    parse using my XML parser 'qh', output ditto 0.6 sec
    decompress plays.xmi using xdemill, xml out to /dev/null 0.6 sec

qh doesn't recognise DTDs; with that deficiency it's the fastest XML
parser I've ever come across.  So xdemill is doing pretty well.

        If you're clever and cache the result of the parses of the
        things that the LZSS pointers point to you might be able to write
        a very fast and compact parser.
       
The really clever thing to do is to do the parsing as part of compression
so that the decompressor has to do very little work to build the structure.
In a general context, this is particularly nice, because you can recover
the structure, then decompress the text parts *lazily*, so the parts you
are interested in never do get decompressed.

Someone else originally asked:

        > How about parsing a 300+ megabyte XML file?

On the same 500MHz 500MB machine where I did the timing tests above,
qh parses a 300MB XML file in 21 seconds, about 1/3 of which is I/O.
For things like this an Erlang parser is *never* going to match C, so
there really is a very strong argument for parsing the XML in another
process and just sending the interesting bits or summaries to Erlang.
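
Erlang's port mechanism makes the "another process" part easy; here is
a minimal, untested sketch (qh_summary is a hypothetical wrapper around
such a C parser, writing length-prefixed summaries on stdout):

    %% Run the external parser and collect its summary messages.
    parse_external(File) ->
        Port = open_port({spawn, "./qh_summary " ++ File},
                         [binary, {packet, 4}, exit_status]),
        collect(Port, []).

    collect(Port, Acc) ->
        receive
            {Port, {data, Summary}} ->
                collect(Port, [Summary | Acc]);
            {Port, {exit_status, 0}} ->
                {ok, lists:reverse(Acc)};
            {Port, {exit_status, N}} ->
                {error, {parser_exit, N}}
        end.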


Re: "Terseness in XML markup is of minimal importance"

bryan rasmussen
In reply to this post by David Hopwood
Well here's a laugh, just in from the python list

"Just want to check which xml parser you guys have found to be the
quickest. I have xml documents with 250 000 records or more and the
processing of these documents are taking way to long. The validation is
the main problem. Any module names, non validating would be find to,
would help a lot."

I agree that people doing really big xml files are doing foolish
things, but actually...
there is a market there, a very real niche market:

http://www.datapower.com/products/xa35.html

DataPower was recently bought by IBM, weren't they? Hmm.


