Internal compiler atoms

Internal compiler atoms

Håkan Mattsson

I am trying to compile a (big) module from a list of forms, but it seems like the compiler internally generates lots of new atoms:

no more index entries in atom_tab (max=1048576)

Crash dump is being written to: erl_crash.dump...

In this case the compiler itself generated more than 300K atoms while compiling my forms.

Why are the atoms generated?

Is this anything that can be disabled?

/Håkan


_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: Internal compiler atoms

Valentin Micic-2

> Is this anything that can be disabled?

Say that there is; very soon you would be asking why there is a limit on the number of atoms.
Instead, you should write code that does not generate 700k atoms.

V/



Re: Internal compiler atoms

Albin Stigö-2
1. Try to avoid dynamically creating new atoms.

2. But I need to! Goto rule 1.

3. Dynamic atoms are not safe for long running code. You will
eventually exhaust the atom table and this will lead to subtle bugs.

4. If you really DO need to create dynamic atoms for a quick and dirty
hack, keep in mind that atoms with a common prefix, i.e. foo_1, foo_2,
foo_3 etc., will lead to worse performance because of how Erlang compares
atoms (some Erlang guru correct me if I'm wrong but this used to be
the case).


--Albin



Re: Internal compiler atoms

Håkan Mattsson
You must have misunderstood what I wrote.

It is not my program that generates all those atoms. They are generated internally by the compiler while it is compiling my program from forms. The atoms my program uses already exist in the forms data structure.

I am very well aware of the +t flag, but it is quite boring to set it to 100M just to be able to compile. As I do not understand why the compiler dynamically generates all these internal atoms, I cannot predict how big the atom table needs to be. In my latest run the compiler actually generated 25M atoms (about 25 times the default size of the atom table). It surprised me.

Please, do not come up with further suggestions about avoiding explicit creation of atoms in general. My question was much more specific.

/Håkan
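
One way to see how many atoms a compilation actually costs is to read the atom counter before and after the call. The following is only a minimal sketch, assuming OTP 20 or later (where erlang:system_info(atom_count) and erlang:system_info(atom_limit) are available); the module name atom_growth and the function measure/1 are made up for illustration.

```erlang
-module(atom_growth).
-export([measure/1]).

%% Compile a list of abstract forms in memory and report how many atoms
%% the VM created while doing so, together with the current atom limit.
%% Note: the first call also counts atoms created by loading the
%% compiler's own modules, so run it twice for a cleaner number.
measure(Forms) ->
    Before = erlang:system_info(atom_count),
    Result = compile:forms(Forms, [binary]),
    After = erlang:system_info(atom_count),
    {Result, After - Before, erlang:system_info(atom_limit)}.
```

Running this on a scaled-down version of the forms gives a rough growth rate to extrapolate from before picking a +t value; it does not explain where the atoms come from.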


Re: Internal compiler atoms

Albin Stigö-2
Håkan,

Interesting problem... I am not sure I fully understand it, then.
What do you mean by "list of forms"? How large is your module?

I'm not sure why the compiler registers so many atoms but I guess it
has to do with the compiler's internal representation of your module.

I find "The BEAM book" to be a great resource when I run into these
kinds of problems:
https://github.com/happi/theBeamBook


--Albin


Re: Internal compiler atoms

Håkan Mattsson
My module is quite large. The compressed beam file is about 45MB. But that is beside the point. It does not explain why the atoms are generated.

/Håkan


Re: Internal compiler atoms

Jachym Holecek
In reply to this post by Håkan Mattsson
# Håkan Mattsson 2017-07-27:
> In this case the compiler itself generated more than 300K atoms while
> compiling my forms.
>
> Why are the atoms generated?

Names for variables introduced by the compiler. Names for functions
introduced by the compiler. Some of this seems to happen in v3_core,
have a look at new_fun_name/2 + new_var_name/1 and follow their
callers. There seems to be more besides v3_core, too.
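
To make the mechanism concrete, here is a minimal sketch of a counter-based fresh-name generator that returns each name as an atom. It is not the compiler's source: the module fresh_names, the "tmp_var_" prefix and demo/1 are hypothetical, and only the general shape (counter in, new atom out) is meant to mirror what is described above. It assumes OTP 20 or later for erlang:system_info(atom_count).

```erlang
-module(fresh_names).
-export([new_var_name/1, demo/1]).

%% Hypothetical fresh-name generator: turns a counter into a new atom
%% and returns the incremented counter alongside it.
new_var_name(Count) ->
    {list_to_atom("tmp_var_" ++ integer_to_list(Count)), Count + 1}.

%% Generate N fresh names and return how much the atom table grew.
%% Atoms are never garbage collected, so the growth is permanent.
demo(N) ->
    Before = erlang:system_info(atom_count),
    _ = lists:foldl(fun(_, C0) ->
                            {_Name, C1} = new_var_name(C0),
                            C1
                    end, 0, lists:seq(1, N)),
    erlang:system_info(atom_count) - Before.
```

Every name that did not already exist takes one slot in the atom table, which is why a very large generated module can exhaust the table even when its own atoms are few.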

> Is this anything that can be disabled?

Probably not, but ask a compiler expert. ;-)

BR,
        -- Jachym

Re: Internal compiler atoms

Dominic Morneau
In reply to this post by Håkan Mattsson
Since you have a crash dump, you could check:

   sed -ne '/=atoms/,$p' < erl_crash.dump | head -n 100

That would show the last 100 atoms created before the crash, which might help figure out what's up.

Dominic


Re: Internal compiler atoms

Björn-Egil Dahlberg-2
In reply to this post by Håkan Mattsson
All identifiers, i.e. variables and function names, are represented as atoms internally in the compiler. In addition to your own variables, temporaries are also generated in core and kernel. These are also identifiers and therefore also atoms. This means we can't really predict how many atoms will be generated either.

And no, there is no real need for these to be represented as atoms. It would be better if they were binaries, imho. I guess the original author(s) of the compiler never expected a huge number of atoms to be generated.

The remedy is to rewrite the compiler internals to use binary strings instead of atoms for identifiers. That goes for the whole chain of things: tokenizer, parser, linter, forms, core, kernel and beam. Happy hacking. :)

// Björn-Egil


Re: Internal compiler atoms

Richard A. O'Keefe-2
In reply to this post by Albin Stigö-2


On 28/07/17 7:33 PM, Albin Stigö wrote:

> I find "The BEAM book" to be a great resource when I run in to these
> kinds of problems:
> https://github.com/happi/theBeamBook

I would like to say a public and heartfelt THANK YOU
to everyone who has contributed to the BEAM book.

Re: Internal compiler atoms

Lloyd R. Prentice-2
Thanks for the links!

Looks like I have several weeks of study ahead of me.

All the best,

Lloyd


Re: Internal compiler atoms

Richard A. O'Keefe-2
In reply to this post by Björn-Egil Dahlberg-2


On 30/07/17 1:18 AM, Björn-Egil Dahlberg wrote:
> All identifiers, i.e. variables and function names, are represented as
> atoms internally in the compiler.

For what it's worth, Quintus Prolog on 32-bit machines had a hard limit
of 2 million atoms and a realistic limit of a lot fewer.  With that in
mind, Quintus represented variable names as a sort of ersatz packed
string (in such a way that they sorted correctly).  With some of our
customers trying to compile large amounts of machine-generated code,
this was a practical necessity.  For Erlang, binaries are the obvious
choice.



Re: Internal compiler atoms

Anthony Ramine-4
In reply to this post by Håkan Mattsson

> On 27 Jul 2017, at 16:19, Håkan Mattsson <[hidden email]> wrote:
>
> I am trying to compile a (big) module from a list of forms, but it seems like the compiler internally generates lots of new atoms:
>
> no more index entries in atom_tab (max=1048576)
>
> Crash dump is being written to: erl_crash.dump...
>
> In this case the compiler itself generated more than 300K atoms while compiling my forms.
>
> Why are the atoms generated?
>
> Is this anything that can be disabled?
>
> /Håkan

Write a core_transform that replaces all atoms in c_vars by integers. Given the Core inliner already emits variables named after integer values, the rest of the Core compiler passes should be able to cope with them.

>
> On 28 Jul 2017, at 08:43, Albin Stigö <[hidden email]> wrote:
>
> 4. If you really DO need to create dynamic atoms for a quick and dirty
> hack, keep in mind that atoms with a common prefix ie. foo_1, foo_2,
> foo_3 etc will lead worse performance because of how erlang compares
> atoms (some Erlang guru correct me if I'm wrong but this used to be
> the case).

Wrong, atom comparison is always O(1).



Re: Internal compiler atoms

Anthony Ramine-4

> On 1 Aug 2017, at 13:58, Anthony Ramine <[hidden email]> wrote:
>
> Write a core_transform that replaces all atoms in c_vars by integers. Given the Core inliner already emits variables named after integer values, the rest of the Core compiler passes should be able to cope with them.

Actually I just realised that wouldn't solve the problem, because the problem is probably that one of the Core passes explicitly emits atomic names for nothing. The good news is that given the inliner emits integers, we can do the same in the culprit pass.

Re: Internal compiler atoms

Mikael Pettersson-5
In reply to this post by Anthony Ramine-4
Anthony Ramine writes:
 > > On 28 Jul 2017, at 08:43, Albin Stigö <[hidden email]> wrote:
 > >
 > > 4. If you really DO need to create dynamic atoms for a quick and dirty
 > > hack, keep in mind that atoms with a common prefix ie. foo_1, foo_2,
 > > foo_3 etc will lead worse performance because of how erlang compares
 > > atoms (some Erlang guru correct me if I'm wrong but this used to be
 > > the case).
 >
 > Wrong, atom comparison is always O(1).

Equality/inequality tests on atoms are O(1).  (That's the whole point of atoms,
and their analogues in LISP, i.e. "symbols".)

Ordering comparisons (<, >, etc.) are O(length(atom_to_list(Atom))), and with
common prefixes the cost is indeed higher.

Cf. erts/emulator/beam/utils.c:erts_cmp_atoms().
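
The two kinds of operation distinguished above can be sketched as follows. This only shows which operator is which and that ordering follows the atoms' text; it does not measure cost. The module name atom_compare_demo is made up for illustration.

```erlang
-module(atom_compare_demo).
-export([demo/0]).

demo() ->
    A = foo_1,
    B = foo_2,
    %% Equality test: two distinct atoms are never equal.
    false = (A =:= B),
    %% Ordering test: atoms are ordered by their text, so the runtime
    %% has to look past the shared "foo_" prefix to decide.
    true = (A < B),
    %% Sorting uses the same term order.
    [bar, foo_1, foo_2] = lists:sort([foo_2, bar, foo_1]),
    ok.
```

Code that only matches on atoms or tests them for equality is unaffected by shared prefixes; the prefix cost matters for sorting and for ordered structures keyed on atoms.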

Re: Internal compiler atoms

Björn Gustavsson-4
In reply to this post by Håkan Mattsson
On Thu, Jul 27, 2017 at 4:19 PM, Håkan Mattsson <[hidden email]> wrote:

>
> I am trying to compile a (big) module from a list of forms, but it seems
> like the compiler internally generates lots of new atoms:
>
>
>
> no more index entries in atom_tab (max=1048576)
>
> Crash dump is being written to: erl_crash.dump...

Please show us the code (a scaled-down representative
sample is fine).

/Björn


--
Björn Gustavsson, Erlang/OTP, Ericsson AB

Re: Internal compiler atoms

Erdem Aksu
In reply to this post by Håkan Mattsson
Hello Håkan,

If you compile your module from Core Erlang (cerl) forms and pass the 'from_core' option to compile:forms/2, the compiler will skip many passes, including the core pass, which calls v3_core:module/2, and will only parse the core forms and run core_passes.
That way you can skip the list_to_atom/1 calls that v3_core:module/2 makes.
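
The call shape for this can be sketched as below. This is a minimal illustration under stated assumptions, not a fix: the module from_core_demo and the tiny example module are made up, and the Core Erlang input is obtained here with the to_core option purely to have something to feed back in. That first step itself runs v3_core, so in real use the Core forms would have to be built directly (for example with the cerl module) to avoid it.

```erlang
-module(from_core_demo).
-export([roundtrip/0]).

roundtrip() ->
    %% Abstract forms for:
    %%   -module(tiny). -export([answer/0]). answer() -> 42.
    Forms = [{attribute, 1, module, tiny},
             {attribute, 2, export, [{answer, 0}]},
             {function, 3, answer, 0,
              [{clause, 3, [], [], [{integer, 3, 42}]}]}],
    %% Step 1 (illustration only): have the compiler emit Core Erlang.
    {ok, tiny, Core} = compile:forms(Forms, [to_core, binary]),
    %% Step 2: compile starting from the Core Erlang representation.
    {ok, tiny, Beam} = compile:forms(Core, [from_core, binary]),
    %% Load the result and call it to check that it works.
    {module, tiny} = code:load_binary(tiny, "tiny.beam", Beam),
    tiny:answer().
```

Calling from_core_demo:roundtrip() should return 42. Whether starting from Core removes enough atom creation for a module of this size is something to measure rather than assume, since later passes also introduce names.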


Br,
Erdem Aksu
