Proposal: a new Dbgi BEAM chunk

José Valim-2
Many tools in the Erlang ecosystem expect the Erlang Abstract Code chunk to exist or, if it doesn't exist, automatically generate it from the source code, provided they can find the corresponding Erlang source.

This restricts the existing tooling to only some languages and leads to code duplication across tools (such as dialyzer, debugger, cover, etc.), as each tool includes its own implementation of loading abstract code from beams, fetching it from source, and converting the abstract code to other formats.

To partially solve this issue for languages that compile directly to Core, such as LFE, I have earlier proposed a chunk that stores Core AST. However, even if we add such a chunk, I can foresee the following problems:

  * Storing the Core AST in a chunk still does not provide a way to retrieve the AST on the fly when the chunk is not available for whatever reason

  * Adding a new chunk could potentially make the situation worse because future tools may work directly on the new chunk, forcing compilers to add both Erlang Abstract Format and Core AST chunks to the .beam file. Furthermore, languages will likely want to store their own AST as well, which would further increase the .beam file size

Therefore we need a mechanism to store abstract code on .beam such that:

  * The abstract code is stored once but can be retrieved in different formats, as supported by the initial language (where the initial language is erlang, core, lfe, elixir, alpaca, etc)

  * If the abstract code is omitted, we should still provide the ability to retrieve it from source if desired, regardless of the initial language

I have written a proposal which aims to unify how abstract code, or generally speaking, debug information is stored in `.beam` files by introducing a new chunk, called "Dbgi", which is meant to replace the current "Abst" chunk. The proposal is backwards compatible and solves the problems outlined above.

The full proposal alongside a prototype can be found on GitHub: https://github.com/erlang/otp/pull/1367

Feedback is welcome!


José Valim
Skype: jv.ptec
Founder and Director of R&D

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: Proposal: a new Dbgi BEAM chunk

Vlad Dumitrescu-2
Hi,

On Mon, Mar 13, 2017 at 1:55 PM, José Valim <[hidden email]> wrote:
> Many tools in the Erlang ecosystem expect the Erlang Abstract Code chunk to exist or, if it doesn't exist, automatically generate it from the source code, provided they can find the corresponding Erlang source.
>
> I have written a proposal which aims to unify how abstract code, or generally speaking, debug information is stored in `.beam` files by introducing a new chunk, called "Dbgi", which is meant to replace the current "Abst" chunk. The proposal is backwards compatible and solves the problems outlined above.
>
> The full proposal alongside a prototype can be found on GitHub: https://github.com/erlang/otp/pull/1367
 
I think it's a good and useful idea. +1

best regards,
Vlad



Re: Proposal: a new Dbgi BEAM chunk

Kostis Sagonas-2
In reply to this post by José Valim-2
On 03/13/2017 01:55 PM, José Valim wrote:
> Many tools in the Erlang ecosystem expect the Erlang Abstract Code chunk
> to exist or, if it doesn't exist, it automatically generates one from
> the source code, if it can find the respective Erlang source.

This is the point where I got confused...

  - What does the mechanism that finds the source code have to do with
the new chunk which is stored in the .beam file?  These two are totally
orthogonal mechanisms, aren't they?

  - How is finding "the respective Erlang source" related to solving the
problems that LFE or other languages (existing and future ones) may be
facing?  Does the proposal come with some magic mechanism to "find" (I
guess "generate" is a more appropriate word here) Erlang source code
from e.g. LFE source?

Don't misunderstand me, I am not necessarily against the proposal.  It's
just that I do not see why/how renaming a BEAM chunk is helping us solve
problems that are orthogonal to the info that gets stored in this
particular chunk.

> Therefore we need a mechanism to store abstract code on .beam such that:
>
>   * The abstract code is stored once but can be retrieved in different
> formats, as supported by the initial language (where the initial
> language is erlang, core, lfe, elixir, alpaca, etc)
>
>   * If the abstract code is omitted, we should still provide the ability
> to retrieve it from source if desired, regardless of the initial language

Does this mean that it will be impossible to hide the original source
code from now on?

Does this mean that if I have a .beam file lying around from long ago or
I have written a compiler that generates .beam files without a Dbgi
chunk this is not a valid .beam file anymore?  How is that "backwards
compatible"?  (as claimed in the PR)

Apologies if I have misunderstood something...

Kostis

Re: Proposal: a new Dbgi BEAM chunk

José Valim-2
Hi Kostis,

Thanks for the comments. Answers inline.
 
>  - What does the mechanism that finds the source code have to do with the new chunk which is stored in the .beam file?  These two are totally orthogonal mechanisms, aren't they?

The new proposed Dbgi chunk does not follow the same format as the Abst chunk. It is made of three fields:

{debug_info_v1, Backend, Metadata | none}

The backend field must be a module that knows how to:

- Convert Metadata to different formats. For example, Elixir will likely store Elixir AST in the Metadata field and be able to convert it to Elixir AST, Erlang AST and Core AST.

- Retrieve the AST from source if Metadata is none. The process will likely involve: 1. finding the source for the beam file in the compile attributes, 2. parsing the source file, and 3. converting it to the desired format. That's exactly how fetching abstract code from source works today in tools like cover and debugger.
 
The proposed API for the Backend is outlined in the PR: https://github.com/erlang/otp/pull/1367
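
To make the backend contract more concrete, here is a minimal sketch of what such a module might look like. It is not taken from the PR: the module name toy_backend, the error atoms, and the assumption that Metadata is simply a list of Erlang abstract forms are all hypothetical. The format names erlang and core follow this thread, and the conversion to Core relies on compile:noenv_forms/2 with the to_core option.

```erlang
-module(toy_backend).
-export([debug_info/4]).

%% Hypothetical backend: Metadata is assumed to be a list of Erlang
%% abstract forms, or the atom none when it was left out of the chunk.

%% No metadata stored. A real backend could perform a source lookup here
%% when the caller allows it; this sketch only reports the absence.
debug_info(_Format, _Module, none, _Opts) ->
    {error, missing};

%% The metadata already is Erlang abstract code, so return it unchanged.
debug_info(erlang, _Module, Forms, _Opts) when is_list(Forms) ->
    {ok, Forms};

%% Let the compiler translate the abstract forms to Core Erlang.
debug_info(core, _Module, Forms, _Opts) when is_list(Forms) ->
    case compile:noenv_forms(Forms, [to_core, binary, return_errors]) of
        {ok, _Mod, Core} -> {ok, Core};
        _Other -> {error, failed_conversion}
    end;

%% Any format this backend does not know about.
debug_info(Format, _Module, _Metadata, _Opts) ->
    {error, {unknown_format, Format}}.
```

A backend for another language would replace the erlang and core clauses with its own translation from whatever AST it chose to store.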

>  - How is finding "the respective Erlang source" related to solving the problems that LFE or other languages (existing and future ones) may be facing?  Does the proposal come with some magic mechanism to "find" (I guess "generate" is a more appropriate word here) Erlang source code from e.g. LFE source?

As per above, the Dbgi chunk contains the backend module and the backend module has the implementation of how to retrieve the AST from source. That's why it is important for functions like beam_lib:strip/1 to not erase the Dbgi chunk but instead set the metadata field to none.
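
As an illustration of "keep the chunk, drop the metadata", the following sketch rewrites a beam so that the Dbgi chunk keeps its backend but carries none as metadata. It is only a sketch under assumptions: the module name dbgi_strip is made up, the chunk is assumed to be an unencrypted term_to_binary/1 encoding of the three-element tuple described above, and this is not how beam_lib:strip/1 itself is implemented.

```erlang
-module(dbgi_strip).
-export([drop_metadata/1]).

%% Rebuild a beam (file name or binary) with the Dbgi metadata set to none.
%% Returns {ok, NewBeamBinary} or {error, Reason}.
drop_metadata(Beam) ->
    case beam_lib:all_chunks(Beam) of
        {ok, _, Chunks} ->
            beam_lib:build_module([rewrite(C) || C <- Chunks]);
        {error, beam_lib, Reason} ->
            {error, Reason}
    end.

%% Only the Dbgi chunk is touched; every other chunk is copied as is.
rewrite({"Dbgi", Bin} = Chunk) ->
    try binary_to_term(Bin) of
        {debug_info_v1, Backend, _Metadata} ->
            {"Dbgi", term_to_binary({debug_info_v1, Backend, none})};
        _Unknown ->
            Chunk
    catch
        %% Not a plain external term (for example encrypted): leave it alone.
        error:badarg -> Chunk
    end;
rewrite(Chunk) ->
    Chunk.
```

Because the backend name survives, a tool can still ask the backend to rebuild the AST from source later on.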
  
> Don't misunderstand me, I am not necessarily against the proposal.  It's just that I do not see why/how renaming a BEAM chunk is helping us solve problems that are orthogonal to the info that gets stored in this particular chunk.

Hopefully the points above clarify it. We are not only renaming the chunk, we are adding extra information to it as well and changing the shape of the metadata stored (which is why a new chunk is required).
 
> Does this mean that it will be impossible to hide the original source code from now on?

This behaviour will be the same as today. To fully answer the question, let's outline how tools that need the AST work today:

1. Attempt to load the AST from the beam chunk

2. If the AST is not available, see if there is a source file on disk

3. If the source file is available, parse it and convert to AST
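
The three steps above can be sketched roughly as follows. This is a simplified illustration rather than the actual code of cover or debugger: the module name ast_lookup and the error atom no_source are hypothetical, and include paths and macros passed at compile time are ignored when re-parsing the source.

```erlang
-module(ast_lookup).
-export([abstract_code/1]).

%% Beam is a file name or a binary holding a compiled module.
abstract_code(Beam) ->
    case beam_lib:chunks(Beam, [abstract_code, compile_info]) of
        {ok, {_Module, Chunks}} ->
            case proplists:get_value(abstract_code, Chunks) of
                %% Step 1: the AST is stored in the beam.
                {raw_abstract_v1, Forms} ->
                    {ok, Forms};
                %% Steps 2 and 3: fall back to the source file.
                _NoAbstractCode ->
                    Info = proplists:get_value(compile_info, Chunks, []),
                    from_source(Info)
            end;
        {error, beam_lib, Reason} ->
            {error, Reason}
    end.

%% The compile info usually records the path of the source file.
from_source(Info) ->
    case proplists:get_value(source, Info) of
        undefined ->
            {error, no_source};
        Source ->
            case filelib:is_regular(Source) of
                true  -> epp:parse_file(Source, []);
                false -> {error, no_source}
            end
    end.
```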

In other words, there are two ways to hide the source from a tool:

1. Encrypt debug_info

2. Or compile without debug_info and remove the source from disk

Today, if you compile without debug_info but the source is still on disk, most tools will end up building the AST from source. If you don't want that reconstruction, then the source must not be available on disk. I aim to keep this behaviour.
 
> Does this mean that if I have a .beam file lying around from long ago or I have written a compiler that generates .beam files without a Dbgi chunk this is not a valid .beam file anymore?  How is that "backwards compatible"?  (as claimed in the PR)

The call beam_lib:chunks(BinOrPath, [abstract_code]) will continue to look for the Abst chunk for at least 3 releases for backwards compatibility reasons. It will work like this:

* Look for the Dbgi chunk; if it is available, ask the backend to convert the metadata to the Erlang format
* If the Dbgi chunk is not available, look at the old Abst chunk and return it
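
The lookup order above could be sketched like this, reading the raw chunks by their ids. This is not the beam_lib implementation: the module name dbgi_fallback and the error atoms are hypothetical, both chunks are assumed to be unencrypted, and the format name erlang_v1 is the one used by released OTP versions (this thread only shows core), so treat it as an assumption with respect to the proposal.

```erlang
-module(dbgi_fallback).
-export([erlang_forms/1]).

%% Prefer the Dbgi chunk; fall back to the old Abst chunk.
erlang_forms(Beam) ->
    case beam_lib:chunks(Beam, ["Dbgi", "Abst"], [allow_missing_chunks]) of
        {ok, {Module, Chunks}} ->
            Dbgi = proplists:get_value("Dbgi", Chunks, missing_chunk),
            Abst = proplists:get_value("Abst", Chunks, missing_chunk),
            from_chunks(Module, Dbgi, Abst);
        {error, beam_lib, Reason} ->
            {error, Reason}
    end.

%% New-style beam: let the backend produce the Erlang format.
%% The backend module must be available in the code path.
from_chunks(Module, Dbgi, _Abst) when is_binary(Dbgi) ->
    case decode(Dbgi) of
        {debug_info_v1, Backend, Metadata} ->
            Backend:debug_info(erlang_v1, Module, Metadata, []);
        _Other ->
            {error, unreadable_dbgi}
    end;
%% Old-style beam: the Abst chunk holds the abstract code directly.
from_chunks(_Module, missing_chunk, Abst) when is_binary(Abst) ->
    case decode(Abst) of
        {raw_abstract_v1, Forms} -> {ok, Forms};
        _Other -> {error, no_abstract_code}
    end;
from_chunks(_Module, _Dbgi, _Abst) ->
    {error, no_debug_info}.

%% An empty or encrypted chunk is not a valid external term.
decode(Bin) ->
    try binary_to_term(Bin) catch error:badarg -> undefined end.
```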

This means that beam_lib will be able to handle the differences between old and new beams. The only exception is code that looks up the "Abst" chunk directly, which will no longer be present in new beams, but that should not cause errors because the chunk has always been optional.

Your feedback here is very valuable because you have built many tools that work on core. With the proposal above, I hope such tools will have code like this:

case beam_lib:chunks(Beam, [debug_info]) of
  {ok, {Module, [{debug_info, {debug_info_v1, Backend, Metadata}}]}} ->
    case Backend:debug_info(core, Module, Metadata, [allow_source_lookup]) of
      {ok, CoreAST} ->
        %% work on the Core AST
        {ok, CoreAST};
      {error, Reason} ->
        %% handle backend error
        {error, Reason}
    end;
  {error, beam_lib, Reason} ->
    %% handle beam_lib error
    {error, Reason}
end

The tool no longer needs to retrieve Erlang AST and translate it to core nor know how to perform source lookups. Furthermore, the tool will work with any language that knows how to emit Core AST from the information stored in the Dbgi chunk.

Please let me know if there are more questions or points I should clarify,

José Valim
Skype: jv.ptec
Founder and Director of R&D

