I really wish there were more information on how erts options can aid performance without making the configuration architecture-specific. Results that demonstrate the benefits of particular options would probably help. I know erts_alloc_config exists, which helps figure out memory allocator settings for your situation, and the typical erl documentation briefly mentions the available options. However, many of the options seem a bit obscure. For instance, "+scl false" was mentioned in connection with the recent NIF scheduling problem, though it is unclear which situations "+scl false" is best for. "+spp true" looks like a useful option (and it is fair to say that its usage requires your own testing), but an example of where it works best would help. Perhaps information about options well-suited to embedded development could also make the use cases clearer.

Benchmarks are often run against a particular release of Erlang without a good understanding of the effective options, so it may help to have a way of reporting the effective command-line options (I don't think this currently exists), since various options are picked automatically. There seem to be more automatically chosen options than what is reported in the shell prompt header, just given the number of erts options, but I could be wrong.
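As a sketch of what I mean (the flag values here are only placeholders, not recommendations, and only some settings are queryable today):

```erlang
%% Start the VM with explicit scheduler-related flags, e.g.:
%%   erl +scl false +spp true
%%
%% Inside the shell, a few effective settings can be queried with
%% erlang:system_info/1, but there is no single report of all of them:
erlang:system_info(schedulers_online).   %% number of online schedulers
erlang:system_info(multi_scheduling).    %% enabled | blocked | disabled
erlang:system_info(scheduler_bind_type). %% how schedulers bind to cores
init:get_arguments().                    %% user -flags only; emulator
                                         %% +flags are not reported here
```

This is part of the problem: `erlang:system_info/1` covers individual settings piecemeal, and `init:get_arguments/0` only sees the user flags, so the effective emulator options remain hard to audit.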
A test or benchmark that shows the maximum acceptable mean (and possibly standard deviation of) latency within a NIF or BIF, while still expecting "good" performance, seems important. I think there have been suggestions that the maximum latency should be under 1 ms, but I could be wrong, or the information could be anecdotal. Either way, it seems like a good thing to test, investigate, or expose as part of the scheduler changes. Then perhaps there could be more information about when you need to bump reductions. I don't entirely understand why a NIF or BIF can't make a yield call to hand control back to the scheduler; there is probably a good reason, and it would be nice to understand it without digging through the code (perhaps to avoid the cost of a context switch, though there shouldn't be one here). So any recent benchmarks done within the OTP team would help.
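On the Erlang side, the mechanism that does exist today is erlang:bump_reductions/1 (and, for NIFs, enif_consume_timeslice in recent releases, if available in yours). A hedged sketch of chunking a long computation so the scheduler can preempt the process at chunk boundaries; the chunk size of 100 and the function names are arbitrary placeholders:

```erlang
%% Process a long list in chunks, crediting reductions at each chunk
%% boundary so this process yields to the scheduler in a timely way.
%% Work/1 and the chunk size 100 are illustrative placeholders.
process_all(Work, [], Acc) ->
    lists:reverse(Acc);
process_all(Work, L, Acc) ->
    {Chunk, Rest} = take(100, L, []),
    NewAcc = lists:foldl(Work, Acc, Chunk),
    erlang:bump_reductions(100),  %% account for the work just done
    process_all(Work, Rest, NewAcc).

take(0, L, Acc) -> {lists:reverse(Acc), L};
take(_, [], Acc) -> {lists:reverse(Acc), []};
take(N, [H|T], Acc) -> take(N - 1, T, [H|Acc]).
```

This only helps Erlang code, of course; the open question above is what the equivalent guidance is for long-running native code.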
A common problem seems to be seeing lower average request latency when the system is under heavy load, which looks unusual at first but is probably related to the schedulers being kept busier within the Erlang VM. Information on the proper use of erts options for such issues, both for normal usage and to make benchmarking more reliable, seems important. Information from Basho about Riak configuration might help here, if they are using some of the erts flags for specific situations.
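For example, Basho's shipped vm.args files have carried scheduler-related erts flags. A sketch of the kind of fragment I mean; the specific values are illustrative, quoted from memory, and should be verified against a current Riak release rather than copied:

```
## vm.args fragment (illustrative values only, not recommendations)
+K true        ## enable kernel poll
+A 64          ## async thread pool size
+sfwi 500      ## scheduler forced wakeup interval, to counter
               ## scheduler collapse under NIF-heavy load
+scl false     ## disable scheduler compaction of load
```

An explanation from Basho of why each of these was chosen, and for which workloads, is exactly the kind of use-case documentation that seems to be missing.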