Many tools in the Erlang ecosystem expect the Erlang Abstract Code chunk to exist or, if it doesn't exist, automatically generate the abstract code from the source, provided they can find the respective Erlang source. This restricts the use of the existing tooling to only some languages and leads to code duplication across tools (such as dialyzer, debugger, cover, etc.), as each tool includes its own implementation of loading abstract code from beams, fetching it from source, and converting the abstract code to other formats.

To partially solve this issue for languages that compile directly to Core, such as LFE, I have earlier proposed a chunk that stores Core AST. However, even if we add such a chunk, I can foresee the following problems:

* Storing the Core AST chunk still does not include the ability to retrieve the AST on the fly in case the chunk is not available for whatever reason

* Adding a new chunk could potentially make the situation worse because tools in the future may work directly on those new chunks, forcing compilers to add both Erlang Abstract Format and Core AST chunks to the .beam file. Furthermore, it is expected that languages may want to store their own AST as well, which will further increase the .beam file size

Therefore we need a mechanism to store abstract code on .beam such that:

* The abstract code is stored once but can be retrieved in different formats, as supported by the initial language (where the initial language is erlang, core, lfe, elixir, alpaca, etc)

* If the abstract code is omitted, we should still provide the ability to retrieve it from source if desired, regardless of the initial language

I have written a proposal which aims to unify how abstract code, or generally speaking, debug information, is stored on `.beam` by introducing a new chunk called "Dbgi", which aims to replace the current "Abst" chunk. The proposal is backwards compatible and solves the problems outlined above.
The full proposal alongside a prototype can be found on GitHub: https://github.com/erlang/otp/pull/1367

Feedback is welcome!

_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions
Hi,
On Mon, Mar 13, 2017 at 1:55 PM, José Valim <[hidden email]> wrote:
I think it's a good and useful idea. +1

best regards,
Vlad
On 03/13/2017 01:55 PM, José Valim wrote:
> Many tools in the Erlang ecosystem expect the Erlang Abstract Code chunk
> to exist or, if it doesn't exist, it automatically generates one from
> the source code, if it can find the respective Erlang source.

This is the point where I got confused...

- What does the mechanism that finds the source code have to do with the new chunk which is stored in the .beam file? These two are totally orthogonal mechanisms, aren't they?

- How is finding "the respective Erlang source" related to solving the problems that LFE or other languages (existing and future ones) may be facing? Does the proposal come with some magic mechanism to "find" (I guess "generate" is a more appropriate word here) Erlang source code from e.g. LFE source?

Don't misunderstand me, I am not necessarily against the proposal. It's just that I do not see why/how renaming a BEAM chunk is helping us solve problems that are orthogonal to the info that gets stored in this particular chunk.

> Therefore we need a mechanism to store abstract code on .beam such that:
>
> * The abstract code is stored once but can be retrieved in different
> formats, as supported by the initial language (where the initial
> language is erlang, core, lfe, elixir, alpaca, etc)
>
> * If the abstract code is omitted, we should still provide the ability
> to retrieve it from source if desired, regardless of the initial language

Does this mean that it will be impossible to hide the original source code from now on?

Does this mean that if I have a .beam file lying around from long ago, or I have written a compiler that generates .beam files without a Dbgi chunk, this is not a valid .beam file anymore? How is that "backwards compatible"? (as claimed in the PR)

Apologies if I have misunderstood something...

Kostis
Hi Kostis,

Thanks for the comments. Answers inline.

> - What does the mechanism that finds the source code have to do with the
> new chunk which is stored in the .beam file? These two are totally
> orthogonal mechanisms, aren't they?

The new proposed Dbgi chunk does not follow the same format as the Abst chunk. It is made of three fields:
The backend field must be a module that knows how to:
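As a rough illustration, a backend could look something like the sketch below: it serves the AST from the stored metadata when present and falls back to parsing the source when the metadata is `none`. The module name `example_backend`, the `{source, Path}` option, and the assumption that the metadata is simply a list of Erlang forms are all hypothetical. The `debug_info/4` callback shape follows what `beam_lib` documents from OTP 20 onward and may differ from the prototype in the PR.

```erlang
-module(example_backend).
-export([debug_info/4]).

%% Hypothetical backend sketch. Format names the requested representation
%% (erlang_v1 here); Metadata is whatever the compiler stored in the chunk.

%% Metadata was stripped: try to rebuild the AST from source.
%% The {source, Path} option is an assumption made for this example.
debug_info(erlang_v1, _Module, none, Opts) ->
    case proplists:get_value(source, Opts) of
        undefined -> {error, missing};
        Path -> epp:parse_file(Path, [])   % {ok, Forms} | {error, Reason}
    end;

%% Metadata is present: assume it already is the list of Erlang forms.
debug_info(erlang_v1, _Module, Forms, _Opts) when is_list(Forms) ->
    {ok, Forms};

%% Any other format is not supported by this sketch.
debug_info(_Format, _Module, _Metadata, _Opts) ->
    {error, unknown_format}.
```

A backend for another language would add clauses for the formats it can produce (for example Core), which is what lets tools stay ignorant of the language that produced the .beam file.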
The proposed API for the Backend is outlined in the PR: https://github.com/erlang/otp/pull/1367

> - How is finding "the respective Erlang source" related to solving the
> problems that LFE or other languages (existing and future ones) may be
> facing? Does the proposal come with some magic mechanism to "find" (I
> guess "generate" is a more appropriate word here) Erlang source code
> from e.g. LFE source?

As per above, the Dbgi chunk contains the backend module and the backend module has the implementation of how to retrieve the AST from source. That's why it is important for functions like beam_lib:strip/1 to not erase the Dbgi chunk but instead set the metadata field to none.

> Don't misunderstand me, I am not necessarily against the proposal. It's
> just that I do not see why/how renaming a BEAM chunk is helping us solve
> problems that are orthogonal to the info that gets stored in this
> particular chunk.

Hopefully the points above clarify it. We are not only renaming the chunk, we are adding extra information to it as well and changing the shape of the metadata stored (which is why a new chunk is required).

> Does this mean that it will be impossible to hide the original source
> code from now on?

This behaviour will be the same as today. To fully answer the question, let's outline how tools that need the AST work today:
In other words, the process of hiding a source from a tool is:
Today, if you set debug_info to false but the source is still on disk, most tools will end up building the AST from source. If you don't want that reconstruction then the source must not be available on disk. I aim to keep this behaviour.

> Does this mean that if I have a .beam file lying around from long ago or
> I have written a compiler that generates .beam files without a .Dbgi
> chuck this is not a valid .beam file anymore? How is that "backwards
> compatible"? (as claimed in the PR)

beam_lib:chunks(BinOrPath, [abstract_code]) will continue to look for the Abst chunk for at least 3 releases for backwards compatibility reasons. It will work like this:
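A minimal sketch of such a fallback, written as a standalone helper rather than as the actual beam_lib change: prefer the "Dbgi" chunk and only consult "Abst" when "Dbgi" is absent or empty. The module name `abst_compat` is hypothetical, the `{debug_info_v1, Backend, Data}` tuple and `debug_info/4` call follow the beam_lib documentation from OTP 20 onward (the prototype may differ), and the sketch assumes unencrypted chunks.

```erlang
-module(abst_compat).
-export([abstract_code/1]).

%% Returns {ok, Forms} | {error, Reason} for a .beam path or binary.
abstract_code(Beam) ->
    %% allow_missing_chunks makes absent chunks come back as missing_chunk
    %% instead of failing the whole call.
    case beam_lib:chunks(Beam, ["Dbgi", "Abst"], [allow_missing_chunks]) of
        {ok, {Module, Chunks}} ->
            from_dbgi(Module, Chunks);
        {error, beam_lib, Reason} ->
            {error, Reason}
    end.

%% New-style beams: ask the backend named in the chunk for Erlang forms.
from_dbgi(Module, Chunks) ->
    case proplists:get_value("Dbgi", Chunks) of
        Bin when is_binary(Bin), Bin =/= <<>> ->
            {debug_info_v1, Backend, Data} = binary_to_term(Bin),
            Backend:debug_info(erlang_v1, Module, Data, []);
        _ ->
            from_abst(Chunks)
    end.

%% Old-style beams: the forms are stored directly in the Abst chunk.
from_abst(Chunks) ->
    case proplists:get_value("Abst", Chunks) of
        Bin when is_binary(Bin), Bin =/= <<>> ->
            case binary_to_term(Bin) of
                {raw_abstract_v1, Forms} -> {ok, Forms};
                _Other -> {error, unsupported_abst_format}
            end;
        _ ->
            {error, missing}
    end.
```

Callers see the same `{ok, Forms}` result whichever chunk the forms came from, which is the point of keeping the lookup inside one function.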
This means that beam_lib will be able to handle the differences between old and new beams. The only exception is if you look up the "Abst" chunk directly, which will no longer be available, but that should not cause errors because the chunk has always been optional.

Your feedback here is very valuable because you have built many tools that work on Core. With the proposal above, I hope such tools will have code like this:
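A minimal sketch of what such tool code could look like, assuming the interface that beam_lib documents from OTP 20 onward (the `debug_info` chunk name, the `{debug_info_v1, Backend, Data}` tuple and the `core_v1` format); the prototype in the PR may have used different names. The module name `core_from_beam` is hypothetical.

```erlang
-module(core_from_beam).
-export([core/1]).

%% Returns {ok, CoreAST} | {error, Reason} for a .beam path or binary.
%% The tool never touches Erlang abstract code or source files itself:
%% it asks whatever backend the compiler recorded for Core.
core(Beam) ->
    case beam_lib:chunks(Beam, [debug_info]) of
        {ok, {Module, [{debug_info, {debug_info_v1, Backend, Data}}]}} ->
            Backend:debug_info(core_v1, Module, Data, []);
        {ok, {_Module, [{debug_info, no_debug_info}]}} ->
            {error, missing};
        {error, beam_lib, Reason} ->
            {error, Reason}
    end.
```

Because the backend is looked up from the chunk, the same function would work for a .beam produced by any compiler whose backend can emit Core.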
The tool no longer needs to retrieve Erlang AST and translate it to Core, nor know how to perform source lookups. Furthermore, the tool will work with any language that knows how to emit Core AST from the information stored in the Dbgi chunk.

Please let me know if there are more questions or points I should clarify,

José Valim
Skype: jv.ptec
Founder and Director of R&D