Untimely garbage collection

Shawn Pearce
I'm having a little trouble abusing Erts.  :-)

What I've set up is an in-process C driver on Linux which
allocates a pool of ErlDrvBinary objects when the port
is opened from Erts.  As this driver receives data from
the bt848 frame grabber card (it's a video capture driver),
it places the video data into the available binaries and
then sends the binaries to Erts with driver_output_term.

Within Erts, a pair of gen_server processes open two
ports:  one to the bt848 video driver, and another to
an X11 XVideo display driver.  The messages sent by
the bt848 driver with binaries attached are accepted by
one gen_server and forwarded directly to the other.

When the second gen_server gets the binaries, it sends
them to the port using Port ! {self(), {command, List}},
where List is the List of ErlDrvBinary objects given to
Erts by the bt848 driver.

So we have a traffic flow like this:

  bt848 ---> gen_server 1 --> gen_server 2 --> XVideo

Initial testing showed that allocating ErlDrvBinary
objects for each video frame was far too costly in
CPU time.  The allocator is just too slow.  So my
initial design was to have the bt848 driver use a
circular queue and just overwrite binaries as it
wraps around.  This causes a nice little side effect
of binaries being modified within Erlang when they
should be read only.

So I decided to use the ErlDrvBinary refc field.
If refc is 1, the bt848 driver owns the binary and
is free to modify the contents.  Erlang doesn't have
a reference, so it's not a problem.  If refc is > 1,
then at least one process within Erlang still
holds a reference to this binary and it cannot
be updated.  The bt848 driver uses its own micro
GC routine which just scans the queue for any
binary objects with refc == 1.
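
A minimal sketch of that refc scan, for illustration only.  It assumes a
simplified stand-in type (mock_binary) instead of the real ErlDrvBinary,
and every name in it (frame_pool, pool_find_free, pool_capture) is
hypothetical rather than taken from the actual driver.  The refc++ in
pool_capture merely models the reference Erlang gains when the binary is
sent up as a term.

```c
#include <stddef.h>
#include <string.h>

#define POOL_SIZE   4
#define FRAME_BYTES 16

/* Hypothetical stand-in for ErlDrvBinary: only the fields this sketch needs. */
typedef struct {
    long refc;                    /* 1 = only the driver holds a reference */
    char bytes[FRAME_BYTES];
} mock_binary;

typedef struct {
    mock_binary slots[POOL_SIZE];
    size_t next;                  /* where the next scan starts (circular) */
    unsigned long dropped;        /* frames skipped because no slot was free */
} frame_pool;

void pool_init(frame_pool *p)
{
    memset(p, 0, sizeof *p);
    for (size_t i = 0; i < POOL_SIZE; i++)
        p->slots[i].refc = 1;     /* the driver's own reference */
}

/* The "micro GC": scan the ring for a slot whose refc is back to 1. */
mock_binary *pool_find_free(frame_pool *p)
{
    for (size_t n = 0; n < POOL_SIZE; n++) {
        size_t i = (p->next + n) % POOL_SIZE;
        if (p->slots[i].refc == 1) {
            p->next = (i + 1) % POOL_SIZE;
            return &p->slots[i];
        }
    }
    return NULL;
}

/* Store one frame; returns the slot used, or NULL if the frame was dropped. */
mock_binary *pool_capture(frame_pool *p, const char *frame, size_t len)
{
    mock_binary *b = pool_find_free(p);
    if (b == NULL) {
        p->dropped++;             /* every slot is still referenced elsewhere */
        return NULL;
    }
    if (len > FRAME_BYTES)
        len = FRAME_BYTES;
    memcpy(b->bytes, frame, len);
    b->refc++;                    /* models the reference Erlang gains */
    return b;
}
```

With a fixed pool, a frame is simply dropped whenever no slot has refc == 1,
so a slow consumer limits capture instead of growing memory.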

This scheme worked well, provided the gen_servers
used something like:

        handle_info({_, read, List}, State) ->
                % work with binaries stored in List
                force_gc({noreply, State}).

        force_gc(Ret) ->
                garbage_collect(),
                Ret.

This seemed kind of risky, but appeared to work well.
Erts was making the tail call into force_gc, which
allowed it to remove List from the process stack.
Since the only references to the binaries were held
in List, and it's now popped off the stack, the
binaries should be unreachable and have their
refc decremented when the garbage collection occurs.
Additionally, the gen_servers use a memory heap of only
377 words, which should GC quickly.

Suddenly this has stopped working.  My C drivers are
seizing when they run out of binaries, and all of the
binaries have a refc of 3:  one for the driver that
"owns" the binary, and one for each gen_server process.

The XVideo driver only holds the binary for a short
period of time and is definitely incrementing and
decrementing refc properly.

From the perspective of my application, it would be ok
for my Erlang servers to notify the C drivers when they
are done with the binary so it can rewrite it, regardless
of the refc.  It just causes a lot more coordination code
to be written and more messages to be passed per set of
frames being moved.  It also breaks the Erlang single
assignment / everything is read only model, making it
harder for other modules to integrate well.

Does anyone who knows more about the Erlang GC and
Erlang driver development have better suggestions
than what I currently have?

I'm starting to get a little frustrated debugging
this "lack of GC" sorta-deadlock I'm in...

--
Shawn.

Why do I like Perl?  Because ``in accordance with Unix tradition Perl
gives you enough rope to hang yourself with.''

Why do I dislike Java? Because ``the class ROPE that should contain the
method HANG to do the hanging doesn't exist because there is too much
'security' built into the base language.''


Untimely garbage collection

Shawn Pearce
Shawn Pearce <spearce> scrawled:
> I'm having a little trouble abusing Erts.  :-)

...

> Does anyone who knows more about the Erlang GC and
> Erlang driver development have better suggestions
> than what I currently have?

RTFM:

spawn_opt(Module, Function, ArgumentList, Options):

        {fullsweep_after, Number}
        Here are a few cases when it could be useful to change fullsweep_after.
        Firstly, if you want binaries that are no longer used to be thrown away
        as soon as possible. (Set Number to zero.)

I'll try this and see if that makes things better.

--
Shawn.

Untimely garbage collection

Shawn Pearce
This had no effect on my program at all.
With fullsweep_after set to 0 and 1 it still
runs out of buffers and locks up.

It definitely looks like the Erlang binaries are
not being garbage collected from within erts.

--
Shawn.

Untimely garbage collection

Scott Lystig Fritchie-3
In reply to this post by Shawn Pearce
In preparing for the upcoming ACM PLI Erlang workshop, I'm suddenly
*very* interested in the issues Shawn raises.  I'll split my comments
& questions into two messages ... two short messages are harder to
ignore (intentionally or accidentally) than a single long message.
:-)

>>>>> "sp" == Shawn Pearce <spearce> writes:

sp> When the second gen_server gets the binaries, it sends them to the
sp> port using Port ! {self(), {command, List}}, where List is the
sp> List of ErlDrvBinary objects given to Erts by the bt848 driver.

The docs & erts code seem to imply that erlang:port_command/2 is the
preferred way of doing that.  {shrug}

Would I be correct to guess that your XVideo driver defines the
'outputv' method and that its 'outputv' handler accesses the pointers
inside of the ErlIOVec directly (to avoid unnecessary data copies)?

sp> Initial testing showed that allocating ErlDrvBinary objects for
sp> each video frame was far too costly in CPU time.  The allocator is
sp> just too slow.

Really?  You must really be moving a *lot* of data through those drivers.
Or is it that, after allocating an ErlDrvBinary, you don't have enough time
to copy the frame into the new ErlDrvBinary without dropping some data?

Perhaps this strategy would be useful?  Have the bt848 driver allocate
a single (or a small number of) ErlDrvBinary large enough to hold
several frames' worth of data.  The driver can choose the offset in an
ErlDrvBinary to deposit the next frame's data.  Hrm ... it isn't
obvious if this would lower your overhead or not.

sp> From the perspective of my application, it would be ok
sp> for my Erlang servers to notify the C drivers when they are done
sp> with the binary so it can rewrite it, regardless of the refc.

I had a brainstorm yesterday on this topic.  Consider this
example from the SWIG (http://www.swig.org/) documentation:

        # Copy a file
        def filecopy(source,target):
                f1 = fopen(source, "r")
                f2 = fopen(target, "w")
                buffer = malloc(8192)
                nbytes = fread(buffer,8192,1,f1)
                while (nbytes > 0):
                        fwrite(buffer,8192,1,f2)
                        nbytes = fread(buffer,8192,1,f1)
                free(buffer)

An Erlang driver cannot implement malloc and fread in this manner
because the example assumes multiple assignment: fread overwrites the
same buffer on every call.

But, what if the local Erlang process knew that certain binaries were
multiple-assignment-capable?  Then it could safely work like filecopy
above *if* it were written carefully.  This might be viable if two
things were added:

        1. If the owner process of such a binary were to send it
        to another process, the ErlDrvBinary data would be _copied_ so
        that the multiple-assignment-ignorant receiver could
        blissfully assume single-assignment semantics.

        2. The driver implemented a copy method so that the owner
        process could make a single-assignment "snapshot" of the
        multiple-assignment binary for long-term keeping.

Is this a good idea?  {shrug}

-Scott


Aggressive GC? Was Re: Untimely garbage collection

Scott Lystig Fritchie-3
In reply to this post by Shawn Pearce
While hacking drivers, I've come across a GC question, similar to
Shawn's, that I cannot answer.  

Consider this code snippet:

        iolist2binary(B) when binary(B) ->
            {B, size(B)};
        iolist2binary(L) when list(L) ->
            B = list_to_binary(L),
            {B, size(B)}.
       
        foofoo(Port, IOList) when port(Port) ->
            {IOListBinary, IOListBinaryLen} = iolist2binary(IOList),
            C = [ <<?S1_FOOFOO, IOListBinaryLen:32/integer>>, IOListBinary],
            erlang:port_command(Port, C),
            %% Vulnerability window begins here?
            get_port_reply(Port).
       
        get_port_reply(Port) ->
            receive
                {Port, ok} -> ok;
                [...]

Assume:

        1. The driver caches the pointer to IOListBinary's data
        buffer.  This permits the driver to avoid making a copy of
        that data buffer: it can access the binary's data buffer
        directly.

        2. The driver is implemented asynchronously, using a separate
        Pthread.  Thus the VM can do other work while the driver's
        Pthread takes an arbitrary amount of time to do whatever it
        does.

My question: Is it possible for the VM's GC to decrement
IOListBinary's refcount in the time between execution of the
port_command() and the receive?  If the VM's GC were extremely
aggressive(*), then if a GC happened at "Vulnerability window begins
here?", then:

        a. IOListBinary's refcount could go to zero: if the original
        IOList were a deep byte list, then foofoo() is the only thing
        that knows about IOListBinary, so its refcount would drop to
        zero.

        b. Disaster!  The driver's cached pointer to IOListBinary's
        data buffer is invalid.  The driver is executing independently
        of the VM's Pthread, so there's no guarantee that the driver
        will use the pointer before the pointer becomes invalid.

-Scott

(*) If the GC system can figure out that a binding has not yet gone
out of scope *but* that binding will never be used again, then
... I'm in trouble.


Aggressive GC? Was Re: Untimely garbage collection

Shawn Pearce
My understanding of this (and the way I'm using it in my drivers)
is this:

When port_command/2 returns, the driver's outputv method
(or just output) has been called and returned already.  This
can also be seen when just using Port ! {self(), {command, List}}:
the driver's outputv function is called while the message is being
sent.  The send doesn't proceed to the next operation
in your Erlang code until the driver has completed its outputv
work.

Since the driver wants to hold the binary for some period of
time, it must increment the refc of any ErlDrvBinary which
it wants to keep, before it returns from the outputv
function.  This sets the refc to be one higher than
the number of Erlang processes still using the binary.

When the Erlang GC kicks in at your vulnerability window,
it decrements the refc of the binary, but discovers that
the refc is still > 0, so it leaves the binary alone.

The driver calls driver_free_binary at some point in the
future, which decrements refc and determines that refc == 0,
so it deallocates the binary.

Therefore, the situation you are describing cannot occur.
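
The sequence above can be traced with a tiny model.  This is only a
sketch under stated assumptions: mock_binary, mock_outputv_keep and
mock_release are hypothetical names, not Erts calls, and the "freed"
flag stands in for the real deallocation so the state stays inspectable.

```c
/* Hypothetical stand-in for a reference-counted driver binary. */
typedef struct {
    long refc;      /* number of holders (Erlang processes + driver) */
    int  freed;     /* set instead of calling free(), for inspection */
} mock_binary;

/* Models outputv: the driver takes its own reference before returning. */
void mock_outputv_keep(mock_binary *b)
{
    b->refc++;
}

/* Drop one reference and "free" the binary when the last one goes away.
   In this model both the process GC and the driver's later release
   go through the same path. */
void mock_release(mock_binary *b)
{
    if (--b->refc == 0)
        b->freed = 1;
}
```

Starting from refc == 1 (one Erlang process), the calls
mock_outputv_keep, mock_release (the GC) and mock_release (the driver)
leave the binary alive after the GC step and freed only after the
driver's release.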

Of course, since the driver is a pthread, you do have to
be very careful about the interaction with Erts.  If you are
using the async driver interface provided by Erts, you cannot
make emulator calls like driver_free_binary or driver_output_term
from within the background pthread.  They can only be made from
the main Erts thread that your erl_drv_entry functions are invoked
on.

In my bt848 driver case, I spawn my own pthread when the port
is set up.  This pthread uses a mutex and a pipe to communicate
with the Erts thread, and all Erts interactions occur
only when Erts calls my driver's erl_drv_entry functions.
In the outputv method of my driver, I increment the refc of the
binaries, store them in a memory space shared with the pthread, and
signal the pthread to wake up.  When the pthread is done with the
binary, it signals Erts over the pipe, and my ready_input driver method
decrements the refc (and deallocates the binary if
necessary).  Synchronization between the Erts thread and my
background pthread is done through a pthread mutex, which
is locked and unlocked in the outputv/ready_input functions.

outputv:
        ev->binv[1]->refc++; // I want to keep this binary!
        pthread_mutex_lock(&ref->lock); // Sync with my pthread
        ref->the_binary = ev->binv[1];
        pthread_cond_signal(&ref->cond); // Wake up pthread
        pthread_mutex_unlock(&ref->lock); // Unsync

ready_input:
        read(ref->pipe_in, &_junk, 1); // Consume the wakeup byte
        pthread_mutex_lock(&ref->lock);
        driver_free_binary(ref->the_binary); // Drop my reference
        ref->the_binary = NULL;
        pthread_mutex_unlock(&ref->lock);

bgthread_worker:
        pthread_mutex_lock(&ref->lock);
        for(;;)
        {
                // pthread_cond_wait releases the lock while sleeping
                while(!ref->the_binary)
                        pthread_cond_wait(&ref->cond, &ref->lock);
                // Work with the_binary
                ...
                // Signal Erts we are done.
                write(ref->pipe_out, &ref, 1);
                // (Simplified: the real code must not pick up the_binary
                // again before ready_input has cleared it.)
        }
        pthread_mutex_unlock(&ref->lock);

The actual code is a little bit more complex, but this is the very
simple version of it.  Erts basically does the right thing, the
question is, does your driver?  ;-)

Plus, I don't think the GC would necessarily occur at a function
call.  I think you'd need to trip it by allocating memory or
calling erlang:garbage_collect() explicitly.

--
Shawn.

Untimely garbage collection

Shawn Pearce
In reply to this post by Scott Lystig Fritchie-3
>>>>> "slf" == Scott Lystig Fritchie <fritchie> scrawled:
slf> >>>>> "sp" == Shawn Pearce <spearce> writes:
slf> sp> When the second gen_server gets the binaries, it sends them to the
slf> sp> port using Port ! {self(), {command, List}}, where List is the
slf> sp> List of ErlDrvBinary objects given to Erts by the bt848 driver.
slf>
slf> The docs & erts code seem to imply that erlang:port_command/2 is the
slf> preferred way of doing that.  {shrug}

Learn something new every day.  Thanks, I'll update my code.

slf> Would I be correct to guess that your XVideo driver defines the
slf> 'outputv' method and that its 'outputv' handler accesses the pointers
slf> inside of the ErlIOVec directly (to avoid unnecessary data copies)?

Yes.  Because the binaries are much larger than 4*ERL_ONHEAP_BIN_LIMIT
(which in R8B-1 is 256 bytes), Erts won't combine them into a single
binary.  Instead I get the group of them as an ErlIOVec, which I then
just take the binaries out of.

However, my code is "poor" in that it assumes the data stored within the
binary starts at the first byte of the ErlDrvBinary.  In reality, the
ErlIOVec may point to a byte within the binary, not at the start.  This
can occur if the binary actually came from another driver, but the driver
used an offset to skip some leading number of bytes, or if Erts "splits"
the binary into two subbinaries without copying the data.

I guess that would be a 'bug' that I should address at some point.  Right
now the only binaries I am dealing with are controlled by other drivers
I've also written.
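
For what it's worth, a sketch of how that offset could be recovered.
The types here are simplified, hypothetical stand-ins (mock_binary,
mock_iovec) whose field names follow erl_driver.h; payload_offset is an
illustrative helper, not an Erts function.  It assumes the iovec entry's
base pointer lies inside the data of the binary it is paired with.

```c
#include <stddef.h>

/* Hypothetical stand-ins: field names follow erl_driver.h, but these are
   simplified local types, not the real definitions. */
typedef struct {
    long orig_size;        /* number of valid bytes in orig_bytes */
    char orig_bytes[64];
} mock_binary;

typedef struct {
    char  *iov_base;       /* first payload byte, possibly inside orig_bytes */
    size_t iov_len;        /* payload length in bytes */
} mock_iovec;

/* Return the payload's offset within the binary, or -1 if the iovec
   entry does not lie entirely inside the binary's data. */
long payload_offset(const mock_binary *bin, const mock_iovec *iov)
{
    const char *start = bin->orig_bytes;
    const char *end   = start + bin->orig_size;

    if (iov->iov_base < start || iov->iov_base > end)
        return -1;
    if (iov->iov_len > (size_t)(end - iov->iov_base))
        return -1;
    return (long)(iov->iov_base - start);
}
```

A driver that reads from the iovec's base and length, rather than from
the start of the binary, would handle sub-binaries and offset writes from
other drivers the same way as whole binaries.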

slf> sp> Initial testing showed that allocating ErlDrvBinary objects for
slf> sp> each video frame was far too costly in CPU time.  The allocator is
slf> sp> just too slow.
slf>
slf> Really?  You must really be moving a *lot* of data through those drivers.
slf> Or is it that, after allocating an ErlDrvBinary, you don't have enough time
slf> to copy the frame into the new ErlDrvBinary without dropping some data?
slf>
slf> Perhaps this strategy would be useful?  Have the bt848 driver allocate
slf> a single (or a small number of) ErlDrvBinary large enough to hold
slf> several frames' worth of data.  The driver can choose the offset in an
slf> ErlDrvBinary to deposit the next frame's data.  Hrm ... it isn't
slf> obvious if this would lower your overhead or not.

Eh.  It's digital video.  Frames of digital video are not exactly what I'd
call small.  Plus I have digital audio too, but those are quite small
compared to the digital video frames.

It was "initial" testing, my test consisted of a very small Erlang
driver and module, was run several times in a couple of hours, and that
was that.  I may very well have done something in that test that wasn't
close enough to real life, giving me bad results.

Part of the problem with allocating a frame (or even a group of frames)
every time I need them, rather than reusing ErlDrvBinarys, is that I could
potentially explode my memory heap quite dramatically.  If a video compressor
gets behind, I'll still be capturing video frames at "wire speed".  I'll
never get any back pressure from the compressor to slow down the capture
engine, forcing the capture engine to just skip capturing frames.

I could set up my own counters and stuff and have the video compressor send
a message to the capture driver when the video compressor starts to see that
its queue is getting long I guess...

It just seemed so much more convenient to let the driver "own" an
ErlDrvBinary, and when that binary's refc == 1, reuse it.  With a fixed
number of binaries, it's possible for the driver to feel the backpressure
very quickly, as the video compressor won't be "releasing" binaries by
setting their refc to 1.

slf> sp> From the perspective of my application, it would be ok
slf> sp> for my Erlang servers to notify the C drivers when they are done
slf> sp> with the binary so it can rewrite it, regardless of the refc.
slf>
slf> I had a brainstorm yesterday on this topic.  Consider this
slf> example from the SWIG (http://www.swig.org/) documentation:
slf>
slf>         # Copy a file
slf>         def filecopy(source,target):
slf>                 f1 = fopen(source, "r")
slf>                 f2 = fopen(target, "w")
slf>                 buffer = malloc(8192)
slf>                 nbytes = fread(buffer,8192,1,f1)
slf>                 while (nbytes > 0):
slf>                         fwrite(buffer,8192,1,f2)
slf>                         nbytes = fread(buffer,8192,1,f1)
slf>                 free(buffer)
slf>
slf> An Erlang driver cannot implement malloc and fread in this manner
slf> because of its assumption of multiple assignment.
slf>
slf> But, what if the local Erlang process knew that certain binaries were
slf> multiple-assignment-capable?  Then it could safely work like filecopy
slf> above *if* it were written carefully.  This might be viable if two
slf> things were added:
slf>
slf>         1. If the owner process of such a binary were to send it
slf>         to another process, the ErlDrvBinary data would be _copied_ so
slf>         that the multiple-assignment-ignorant receiver could
slf>         blissfully assume single-assignment semantics.
slf>
slf>         2. The driver implemented a copy method so that the owner
slf>         process could make a single-assignment "snapshot" of the
slf>         multiple-assignment binary for long-term keeping.
slf>
slf> Is this a good idea?  {shrug}

I don't like this idea that much.  What I was thinking about instead
was a way for a driver to "register" interest in an ErlDrvBinary's
refc mutation.  For example:

/* Proposed API only: driver_watch_binary does not exist in Erts. */
void
my_watcher(ErlDrvData ref0, ErlDrvBinary *theBinary)
{
        fprintf(stderr, "%p now has %ld users\r\n",
                (void *)theBinary, (long)theBinary->refc);
}

        ErlDrvBinary* b = driver_alloc_binary(8192);
        driver_watch_binary(ref0, b, my_watcher);

Then when Erts decrements refc during GC in a process, the driver
would see it, because Erts would call the my_watcher function with the
driver's own data object and the binary in question.

Then the Erlang code could do:

        % Copy a file
        filecopy(Source, Target) ->
                {ok, F1} = file:open(Source, [read, raw, binary]),
                {ok, F2} = file:open(Target, [write, raw, binary]),
                filecopy2(F1, F2).

        filecopy2(F1, F2) ->
                erlang:garbage_collect(),
                filecopy2(F1, F2, file:read(F1, 8192)).

        filecopy2(F1, F2, {ok, Buffer}) ->
                file:write(F2, Buffer),
                filecopy2(F1, F2);
        filecopy2(F1, F2, eof) ->
                file:close(F1),
                file:close(F2),
                ok;
        filecopy2(F1, F2, {error, Info}) ->
                file:close(F1),
                file:close(F2),
                exit({error, Info}).
               
Initially, the F1 driver would check during the call to file:read
whether a buffer of that size has already been allocated.  If not,
it would allocate one; if so, it would reuse that buffer, so
long as refc == 1.

The erlang:garbage_collect() call would be to force the process to
release its reference to the binary so that the driver's my_watcher
would be called, allowing the driver to see the refc decrease and
put that binary back into its list of usable buffers.

This is admittedly a stupid example, as the driver could have just
checked for refc == 1 during file:read.  This works today in R8B-1,
and I use that to some extent.

--
Shawn.

Aggressive GC? Was Re: Untimely garbage collection

Björn Gustavsson-3
In reply to this post by Scott Lystig Fritchie-3
Answer to the part about the GC:

Scott Lystig Fritchie <fritchie> writes:

> While hacking drivers, I've come across a GC question, similar to
> Shawn's, that I cannot answer.  
>
> Consider this code snippet:
>
> iolist2binary(B) when binary(B) ->
>    {B, size(B)};
> iolist2binary(L) when list(L) ->
>    B = list_to_binary(L),
>    {B, size(B)}.
>
> foofoo(Port, IOList) when port(Port) ->
>    {IOListBinary, IOListBinaryLen} = iolist2binary(IOList),
>    C = [ <<?S1_FOOFOO, IOListBinaryLen:32/integer>>, IOListBinary],
>    erlang:port_command(Port, C),
>    %% Vulnerability window begins here?

Yes. If your driver hasn't incremented the reference counter, you are
in trouble.

The compiler knows that the variables C and IOListBinary will not be
used anymore; therefore, at this point no reference to them will
be kept, and at the next garbage collection the reference counts
for the binaries will be decremented.

>    get_port_reply(Port).
>
> get_port_reply(Port) ->
>    receive
>        {Port, ok} -> ok;
>        [...]
>

/Bjorn
--
Björn Gustavsson            Ericsson Utvecklings AB
bjorn      ?T2/UAB/F/P
                            BOX 1505
+46 8 727 56 87    125 25 Älvsjö


Aggressive GC? Was Re: Untimely garbage collection

Scott Lystig Fritchie-3
>>>>> "bg" == Bjorn Gustavsson <bjorn> writes:

bg> Yes. If your driver hasn't incremented the reference counter, you
bg> are in trouble.

bg> The compiler knows that the variables C and IOListBinary will not
bg> be used anymore; therefore, at this point no reference to them
bg> will be kept, and at the next garbage collection the reference
bg> counts for the binaries will be decremented.

Those darn smart compilers!  They are so, umm, smart!  {sigh}

It's good to know that my current driver implementation is broken.  I
don't wanna fix it, but I guess there's a price to pay for 100%
correct behavior.  It's so sad when 99.99% correctness isn't good
enough.

Referring back to one of Shawn's messages ... I hadn't ever paid
attention to the "binv" member of the ErlIOVec structure.  Now I'm
quite interested!  The serialization scheme needed to get data into
the driver is very annoying -- I was under the impression that the
ErlIOVec completely hid the serialized data from me.  It's nice to
know I'm wrong.

I had discovered a problem with non-binary data in an I/O list:
write_port() allocates a buffer, "cbin" of "csize", for all non-binary
data.  write_port() frees that buffer immediately after calling the
driver's "outputv" method.  My thinking was: it's unsafe for an
asynchronous driver to try to access the non-binary data stored in
"cbin", so I'd force the Erlang code calling the driver to create
binaries for any non-binary data.

... However, upon another examination, write_port() is using
driver_alloc_binary(), *not* sys_alloc_from() or safe_alloc_from().
Driver binaries have a reference count!  If my driver increments the
refc of that thing, then I don't have to make the Erlang-side code so
contorted when given non-binary data.  Yay!

-Scott