Issues with stdin on ports


Anthony Grimes
Howdy folks.

I unfortunately have not been able to use Erlang for most of what I've
been doing lately because of a long standing issue with Erlang ports
that I'd like to start a discussion about here.

As far as I am aware, ports are generally the only option for creating
and communicating with external processes in Erlang. They seem to have
at least one particular fatal flaw which prevents them from being very
useful to me, and that is that there is no way to close stdin (and send
EOF) and then also read from the process's stdout. For example, I cannot
use a port to start the 'cat' program which listens on stdin for data
and waits for EOF and then echoes that data back to you. I can do the
first part, which is send it data on stdin, but the only way for me to
close it is to call port_close and close the entire process.

This issue prevents Erlang users from doing any even slightly more than
trivial communication with external processes without having some kind
of middleman program that handles the creation of the actual process you
need to talk to and looks for a specific byte sequence to indicate 'EOF'.

I could totally be wrong, but it seems like we need something other than
just port_close. Something like
http://www.erlang.org/doc/man/gen_tcp.html#shutdown-2 which lets you say
"Hey, I want to close the stdin of this process but still read from its
stdout." or something similar. I could be totally off track on what a
good solution would be.

So I'm wondering if people are aware of this problem, and I'd like to
make sure that people think it is an actual problem that should be
fixed. I'm also curious what people think a good solution to the problem
would be. I'm not sure I have the time/particular skill set to fix it
given that the port code is some pretty obscure (to me) C code, but
starting conversation seems like a good way to begin.

Richard A. O'Keefe

On 29/07/2013, at 4:14 PM, Anthony Grimes wrote:

> and communicating with external processes in Erlang. They seem to have
> at least one particular fatal flaw which prevents them from being very
> useful to me, and that is that there is no way to close stdin (and send
> EOF) and then also read from the process's stdout. For example, I cannot
> use a port to start the 'cat' program which listens on stdin for data
> and waits for EOF and then echoes that data back to you. I can do the
> first part, which is send it data on stdin, but the only way for me to
> close it is to call port_close and close the entire process.

Note that "only send data to a command" and "only receive data from a
command" are the traditional ways for a UNIX program to communicate
with another over a pipe.  popen(<command>, "r") reads the output of
the command and popen(<command>, "w") writes to the input of the command.
There isn't even any standard _term_ for talking about connecting to both
stdin and stdout of a command in UNIX, and that's because it's an
incredibly easy way to deadlock.

> This issue prevents Erlang users from doing any even slightly more than
> trivial communication with external processes without having some kind
> of middleman program that handles the creation of the actual process you
> need to talk to and looks for a specific byte sequence to indicate 'EOF'.

Just like it prevents C users from doing the same thing.
Unless they fake something up using named pipes or UNIX-domain sockets.
(Or message queues.  I do wish Mac OS X implemented rather more of POSIX...)

Unix anonymous pipes are simply the wrong tool for the job in _any_
programming language.

The historic way to do "slightly more than trivial communication with
external processes" has been to set the external processes up as C nodes
or to use sockets.






Anthony Grimes
I'm not sure I follow. I think you may have misunderstood what I am
trying to accomplish. Here is a simple low-level example I just whipped
up in a Clojure repl using the standard built-in Java Process library.

user=> (def proc (.exec (Runtime/getRuntime) (into-array String ["cat"])))
#'user/proc
user=> (def stdin (.getOutputStream proc))
#'user/stdin
user=> (def stdout (.getInputStream proc))
#'user/stdout
user=> (.write stdin (.getBytes "Hi!"))
nil
user=> (.close stdin)
nil
user=> (let [arr (byte-array 3)] (.read stdout arr) (String. arr))
"Hi!"

Lots of unix programs work like this. We have cat in this example, but
grep, wc, and various others work like that as well, in that when you
run them they listen for data and wait for you to send EOF and then they
start producing output. It's a very common thing.

Every language I can think of lets you work with programs like this.
Ports are an abstraction for both reading and writing. The important
missing piece of functionality is being able to send EOF and tell the
program that you are done writing so that it can produce output. I'm not
following why this is a bad thing.
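
For comparison, here is a minimal Python sketch of the same sequence as the
Clojure session above: start a filter, write to its stdin, close stdin so the
child sees EOF, then read its stdout. The helper name run_filter is
hypothetical, and the sketch assumes a Unix-like system with cat and wc on
the PATH. It writes all input before reading any output, so it is only
suitable for small inputs.

```python
import subprocess


def run_filter(argv, data):
    """Feed `data` to a filter program, send EOF, and return its stdout."""
    proc = subprocess.Popen(argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    proc.stdin.write(data)
    proc.stdin.close()        # closing our write end is what delivers EOF
    out = proc.stdout.read()  # the read side is still open and usable
    proc.stdout.close()
    proc.wait()
    return out
```

Calling run_filter(["cat"], b"Hi!") returns b"Hi!". The close of stdin in the
middle is the step that a port currently has no equivalent for.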

Perhaps this example and clarification will clear things up, and I
apologize in advance if I am missing something crucial!

-Anthony


Robert Raschke-7
In reply to this post by Anthony Grimes
Hi Anthony,

In the past, I've tended to use the port mechanism to simply kick off a C
node, which then allows you to have full control over whatever
communications needs you have.

This obviously only works if you are interfacing with a technology that
will allow you to create a C node and use the EI libs in some way. Not sure
if that is the case from what you wrote.

Regards,
Robby

Anthony Grimes
Hey Robert.


I don't think C nodes help in this case, and they don't solve the general problem. One of my use cases is talking to pygmentize, which is a Python program. If I want to do that, I have to write a middleman program that does the actual communication with this program and talks to my Erlang program over a socket, or via ports if I make the middleman program look for some specific sequence of bytes to treat as EOF since I can't send actual EOF.


Robert Raschke-7
Hi Anthony,

Could Py-interface http://www.lysator.liu.se/~tab/erlang/py_interface/ help
with that?

Note, I've not used it, so don't know much at all beyond reading that page.
It would probably mean using the Pygments library, rather than going via
the command line pygmentize. You can probably take the pygmentize source
and replace the command line handling with setting up a Python Node.

Alternatively, and not necessarily very nice, instead of streaming to
pygmentize, write to a file and invoke on that. You wouldn't even need a
port then, you could get away with os:cmd/1 (if you aren't interested in
return codes). But you already know this, I think.
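
The write-to-a-file alternative can be sketched roughly as follows, in Python
for illustration. The helper name run_on_file is hypothetical, and cat stands
in for the real command so that nothing is assumed about pygmentize's command
line. The sketch assumes a Unix-like system.

```python
import os
import subprocess
import tempfile


def run_on_file(argv_prefix, data):
    """Write `data` to a temporary file and run a command on that file."""
    with tempfile.NamedTemporaryFile(delete=False) as tmp:
        tmp.write(data)
        path = tmp.name
    try:
        # The command reads the file itself, so no stdin or EOF handling is needed.
        result = subprocess.run(argv_prefix + [path],
                                stdout=subprocess.PIPE, check=True)
        return result.stdout
    finally:
        os.unlink(path)
```

This trades the stdin problem for a temporary file per invocation.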

Robby

PS Something like expect for Erlang ports would be pretty cool, though. Not
that I'm volunteering ;-)




Jesper Louis Andersen
In reply to this post by Anthony Grimes
On Mon, Jul 29, 2013 at 6:14 AM, Anthony Grimes <i> wrote:

> So I'm wondering if people are aware of this problem, and I'd like to
> make sure that people think it is an actual problem that should be
> fixed. I'm also curious what people think a good solution to the problem
> would be. I'm not sure I have the time/particular skill set to fix it
> given that the port code is some pretty obscure (to me) C code, but
> starting conversation seems like a good way to begin.


I don't think ports were created with that use case in mind. Usually when
connecting Erlang to foreign programs, you have the advantage of being able
to specify the interface fully and then you can pick a better protocol than
"send message and then EOF". The deeper trouble here is that such an
interface means it is use-once. It will just blindly fork()/exec() programs
all the time and that hardly seems productive and efficient. It probably
will not scale very well and then you are looking at a framing solution on
the stream in any case.

Note the same problem is present with gen_tcp:shutdown/2. It seems
counterproductive to pay the handshake every time you want to communicate.

Another problem is that with a fork()/exec() solution you context switch to the
pygmentize process (or whatever process) for a single piece of work and then
you switch away again. You can't pipeline stuff to pygmentize and hence you
pay context switch overhead all the time.

In other words, I have a hard time seeing this as useful, but you may
correct me on that if you have a better use case or can shoot down my
stream of thought.

Per Hedeland-4
In reply to this post by Richard A. O'Keefe
"Richard A. O'Keefe" <ok> wrote:

>
>On 29/07/2013, at 4:14 PM, Anthony Grimes wrote:
>
>> and communicating with external processes in Erlang. They seem to have
>> at least one particular fatal flaw which prevents them from being very
>> useful to me, and that is that there is no way to close stdin (and send
>> EOF) and then also read from the process's stdout. For example, I cannot
>> use a port to start the 'cat' program which listens on stdin for data
>> and waits for EOF and then echoes that data back to you. I can do the
>> first part, which is send it data on stdin, but the only way for me to
>> close it is to call port_close and close the entire process.

FWIW, I definitely agree that this is a missing piece of functionality.
I'm not sure how useful/important it is in the grand scheme of things,
but personally I could have used it on a couple of occasions. As you
mentioned, it's basically the equivalent of TCP shutdown() that is
needed, although shutdown() is perhaps a bit over-engineered - I've
never seen anyone use SHUT_RD or SHUT_RDWR...

Also the "opposite" functionality is already available for ports via the
'eof' option - i.e. you get informed that the other end has closed its
write side, but can still write data in the other direction.

>Note that "only send data to a command" and "only receive data from a
>command" are the traditional ways for a UNIX program to communicate
>with another over a pipe.

Well, it's basically the definition of the traditional pipeline concept
of the Unix shells, and pipes are obviously what you need to implement
it - but that doesn't preclude other uses of pipes. The zsh shell even
allows you to set up "bidirectional pipes" on the commandline.

> popen(<command>, "r") reads the output of
>the command and popen(<command>, "w") writes to the input of the command.

popen() is effectively a convenience function to abstract away the
somewhat non-trivial application of pipe(), fork(), close(), and
execve() that is required to set things up correctly for two particular
and common usages of pipes in application code. (It is not used by
common shells to implement pipelines though.)
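
As a rough illustration of what such a convenience function hides, here is a
Python sketch of the "r" case built from the same system calls. The name
popen_read is hypothetical. Unlike popen(), the sketch runs the program
directly rather than through a shell, and it uses dup2() to attach the pipe
to the child's stdout, a step not listed above. It assumes a Unix-like
system.

```python
import os


def popen_read(argv):
    """Run argv and return everything it writes to stdout, as bytes."""
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: make the pipe's write end our stdout, then exec the program.
        try:
            os.close(read_fd)
            os.dup2(write_fd, 1)
            os.close(write_fd)
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # reached only if the exec step failed
    # Parent: close our copy of the write end, or we would never see EOF.
    os.close(write_fd)
    chunks = []
    while True:
        chunk = os.read(read_fd, 4096)
        if not chunk:
            break
        chunks.append(chunk)
    os.close(read_fd)
    os.waitpid(pid, 0)
    return b"".join(chunks)
```

A bidirectional setup is the same pattern with a second pipe attached to the
child's stdin.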

>There isn't even any standard _term_ for talking about connecting to both
>stdin and stdout of a command in UNIX, and that's because it's an
>incredibly easy way to deadlock.

There is no need to have a term for it, since all you need is two pipes,
one for each direction - and it's probably uncommon enough to not
warrant its own convenience function. And you can indeed easily deadlock
if you don't think about what you're doing, but I really doubt that this
is the reason for any absence of terminology or functions.

But anyway, I don't see how any of this is relevant to the question at
hand. Opening a bi-directional connection between two processes by means
of a pair of pipes is exactly what erlang:open_port/2 *already does*
when you use 'spawn' (or 'spawn_executable' these days) to start an
external process. And it has been doing this since day one, and I can't
recall anyone complaining how this is hopelessly dangerous due to the
risk of deadlock (the risk is of course reduced due to the fact that the
VM does non-blocking I/O).

>> This issue prevents Erlang users from doing any even slightly more than
>> trivial communication with external processes without having some kind
>> of middleman program that handles the creation of the actual process you
>> need to talk to and looks for a specific byte sequence to indicate 'EOF'.
>
>Just like it prevents C users from doing the same thing.

No, there is nothing that prevents C users from doing the same thing.
And even if they have to go to some effort to do it, it just means
having to write a bit more C - whereas the Erlang user can't write a bit
more Erlang to do just the small addition of "close the write side of
one of the pipes", even though the pipe pair is already there...

>Unix anonymous pipes are simply the wrong tool for the job in _any_
>programming language.
>
>The historic way to do "slightly more than trivial communication with
>external processes" has been to set the external processes up as C nodes
>or to use sockets.

Using (TCP) sockets instead of pipes doesn't really change the "risk of
deadlock". In the case of the Erlang VM (i.e. open_port vs gen_tcp), it
may actually increase it, due to the existence of the passive and
{active, once} modes for sockets - another piece of functionality that
is "missing" from ports.

--Per Hedeland


Anthony Grimes
In reply to this post by Robert Raschke-7
Right, there are certainly ways around it, but the point of this
discussion is that it seems to be a missing piece of functionality that
Erlangers cannot get without writing C code. It seems like ports are
90% of the way there. I think Mr. Hedeland represented my points nicely.

-Anthony


Richard A. O'Keefe
In reply to this post by Richard A. O'Keefe

On 29/07/2013, at 8:20 PM, Anthony Grimes wrote:

> Yeah, re-reading your post a couple of times, I think we might be on the wrong page or something. Here is a low level example of what I'd like to be able to do in Erlang. This is a Clojure repl session where I interact with the 'cat' program via the built in Java Process library:
>
> user=> (def proc (.exec (Runtime/getRuntime) (into-array String ["cat"])))
> #'user/proc
> user=> (def stdin (.getOutputStream proc))
> #'user/stdin
> user=> (def stdout (.getInputStream proc))

I have some trouble reading Clojure.  I don't know what the dots are.
Hazarding a guess,

        This is *PRECISELY* the "Hello, deadlock!"
        kind of buggy stuff that the C interface was designed
        to *not* let you write.

> Lots of unix programs work like this.
> We have cat in this example, but grep, wc, and various others work like that as well. It is this easy or easier to do the same thing in every other language I can think of.

Actually, NO.  You are talking about "filters" here,
and filters are designed to be connected into ***ACYCLIC*** networks.

> If it's fundamentally a bad thing, I'm surprised these programs work like that in the first place and that these languages support this.

The programs do NOT work the way you think they do.
A filter reads from its standard input.
It writes to its standard output.
If it could have emotions, it would view the prospect
of those two being the *same* thing with shuddering dread.
(Except of course, when the thing is the terminal.  The
user is assumed to be capable of infinite buffering.)

Erlang is perfectly happy to be connected to an ACYCLIC network
of pipe-linked processes too.

> It seems to be an entirely common place, basic feature any remotely high level programming language.

Actually, no.  The ability to connect to the standard input *AND* the standard
output of the *same* process is *not* a commonplace feature of high level
programming languages (some do, some don't) because unless you code with
extreme (and to a certain extent, non-portable) care, you end up in deadlock land.

Only if one of the programs is absolutely guaranteed to write a tiny
amount of information -- at most one PIPE_BUF worth, do you have
any shadow of a trace of a right to expect it to work.

If you don't believe me, believe the Java documentation,
where the page for java.lang.Process says

        All [the new process's] standard io (i.e. stdin, stdout, stderr)
        operations will be redirected to the parent process through
        three streams (getOutputStream(), getInputStream(),
        getErrorStream()).  The parent process uses these streams to
        feed input to and get output from the subprocess.
        *Because some native platforms only provide limited buffer size
        for standard input and output streams, failure to promptly
        write the input stream or read the output stream of the
        subprocess may cause the subprocess to block, and even deadlock.*

The POSIX guarantee for PIPE_BUF is just 512 bytes.
That is, should the parent process write 513 bytes to the child,
and the child write 513 bytes to the parent,
hello deadlock!

Like I said, connecting to *both* ends of a command through pipes
is something to anticipate with shuddering dread.  It is *not* a
standard feature to be used lightly.
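
For what it's worth, the usual way to stay out of deadlock land is to have something write and read at the same time, so that neither pipe fills up while the other side is blocked. A minimal sketch in Python (not Erlang; chosen only because its standard subprocess module keeps the pattern short), assuming a Unix-like system with cat on the PATH. The names round_trip and payload are made up for this illustration:

```python
import subprocess


def round_trip(data: bytes) -> bytes:
    """Feed `data` to cat's stdin, close stdin, and return all of cat's output."""
    proc = subprocess.Popen(
        ["cat"], stdin=subprocess.PIPE, stdout=subprocess.PIPE
    )
    # communicate() interleaves writing stdin with reading stdout, closes
    # stdin when the data is exhausted, and then waits for the child.
    out, _ = proc.communicate(data)
    return out


# 1 MiB is far more than a pipe will buffer, so a naive
# "write everything, then read everything" could block here.
payload = b"x" * (1 << 20)
echoed = round_trip(payload)
print(len(echoed))
```

The point of the sketch is only that the write side and the read side have to make progress together; how a given runtime arranges that (threads, select, non-blocking I/O) is a separate question.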

I can't find anything about external processes in the Haskell 2010
report.  System.Process
http://www.haskell.org/ghc/docs/7.4-latest/html/libraries/process-1.1.0.1/System-Process.html
isn't mentioned in Haskell 2010.  I am actually pretty shocked that
the documentation doesn't mention the deadlock problem.





Issues with stdin on ports

Richard A. O'Keefe
In reply to this post by Per Hedeland-4

On 30/07/2013, at 2:13 AM, Per Hedeland wrote:
>
> FWIW, I definitely agree that this is a missing piece of functionality.

Oh it's present in several languages.  You could argue that it's missing.
But is it missing like a car radio in a car that lacks one,
or is it missing like an ejector seat in a car that lacks one?

>> Note that "only send data to a command" and "only receive data from a
>> command" are the traditional ways for a UNIX program to communicate
>> with another over a pipe.
>
> Well, it's basically the definition of the traditional pipeline concept
> of the Unix shells, and pipes are obviously what you need to implement
> it - but that doesn't preclude other uses of pipes. The zsh shell even
> allows you to set up "bidirectional pipes" on the commandline.

There are reasons why I don't use zsh.  That's one of them.
>
>> popen(<command>, "r") reads the output of
>> the command and popen(<command>, "w") writes to the input of the command.
>
> popen() is effectively a convenience function to abstract away the
> somewhat non-trivial application of pipe(), fork(), close(), and
> execve() that is required to set things up correctly for two particular
> and common usages of pipes in application code. (It is not used by
> common shells to implement pipelines though.)

Having implemented popen() for two other high level languages, I know that.
The fact that common shells do not use it is irrelevant.
>
>> There isn't even any standard _term_ for talking about connecting to both
>> stdin and stdout of a command in UNIX, and that's because it's an
>> incredibly easy way to deadlock.
>
> There is no need to have a term for it, since all you need is two pipes,
> one for each direction

Non-sequitur.  If it's a thing you need to do often, it's a thing you
need to be able to talk about.  When I've thought of doing it, I've
used the word David Bacon used in his SETL2 system, "pump".  And then
I've used the phrase "looming disaster" and done something else.

> - and it's probably uncommon enough to not
> warrant its own convenience function. And you can indeed easily deadlock
> if you don't think about what you're doing, but I really doubt that this
> is the reason for any absence of terminology or functions.

It's not that you can easily deadlock,
it's that it's hard *NOT* to deadlock.

>  (the risk is of course reduced due to the fact that the
> VM does non-blocking I/O).

And *that* is the thing that saves Erlang.  Of course, avoiding the
coding complexity of dealing with nonblocking I/O is one of the reasons
for using a multithreading language like Erlang.
>>
>> Just like it prevents C users from doing the same thing.
>
> No, there is nothing that prevents C users from doing the same thing.

You may have misunderstood me.
*THE POPEN INTERFACE* prevents C users doing this.
Yes, all the other functions are there, and yes, if you desperately
want to program it, you can.  But it is enough work that nobody
ever does this lightly.

For that matter, it's not beyond the abilities of, say, the glibc
authors, to extend their implementation of popen to support "r+"
or "w+" modes, if there were much demand for it.  (Oddly enough,
Mac OS X 10.7.5 _does_ support "r+" mode, but the Linux I checked
does not.  Weird.)  I have _never_ been able to understand the
differences between Mac OS X and POSIX.



Issues with stdin on ports

Anthony Grimes
In reply to this post by Richard A. O'Keefe
I apologize for getting terminology wrong, I guess. I've never heard
these terms before in reference to working with external processes. I'm
still not really getting it though.

The problem here is that Erlang ports let me read from stdout and write
to stdin, but do not let me flush stdin and tell the program that I'm done
writing, which is important for certain programs that wait for this EOF
to start producing output. What am I saying that's wrong? I'm not saying
these are the same things. In fact, I'm not trying to make any
assertions about how unix processes and pipes work, and if it sounds
like that then I apologize and retract any assertions I've made.

I'm simply asking how I'm supposed to deal with programs that I have
*no* control over that wait for an EOF on their input to start producing
data. I just want to send a ^D. I have examples of programs that do that
and a thousand ways to do it in other languages, and I'm being told it
is disastrous to do. If the solution is to just keep doing what
everybody has been doing in Erlang for years, which is to write giant hack
middleman programs to do it, then I'll concede defeat. I've got to say
though, I'm pretty blown away by your response to this.

Anyways, thanks for the responses everyone!


Issues with stdin on ports

Anthony Grimes
In reply to this post by Richard A. O'Keefe
Also, fwiw, here are links to other people with the same issue over the
years:

http://erlang.org/pipermail/erlang-questions/2010-November/054330.html
http://erlang.org/pipermail/erlang-questions/2010-October/053944.html
http://erlang.org/pipermail/erlang-questions/2009-March/042123.html
http://stackoverflow.com/questions/8792376/erlang-ports-interfacing-with-a-wc-like-program

And here are hacks that have been written to get around this limitation:

https://github.com/mattsta/erlang-stdinout-pool#why-is-this-special
I assume this does as well: https://github.com/saleyn/erlexec

Cheers!

-Anthony


Issues with stdin on ports

Richard A. O'Keefe
In reply to this post by Anthony Grimes

On 30/07/2013, at 6:02 PM, Anthony Grimes wrote:
> I'm simply asking how I'm supposed to deal with programs that I have *no* control over that wait for a EOF on their input to start producing data. I just want to send a ^D.

End of file and Control-D are two different things.
UNIX technically doesn't have a character that signifies end of file.
There is an "EOF" character, but what it does is "send the current line NOW",
and sending a line with *no* characters, not even a newline, causes a read()
to return 0, and _that_ is the end of file indication.
Since 1979, when I started using UNIX, my end-of-file character has been
Control-Z, to be compatible with the other computer systems I was using.
I am rather fed up with readline-infested programs that insist on Control-D
instead of accepting the EOF character I set up.
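
To make the "read() returns 0" point concrete, here is a small sketch in Python, whose os.read is a thin wrapper over the read() system call. It assumes a POSIX-style os.pipe, and the variable names are only for illustration. No special character is ever sent; closing the last write end is what produces the zero-length read:

```python
import os

# r is the read end, w is the write end of an anonymous pipe.
r, w = os.pipe()

os.write(w, b"Hi!")
os.close(w)                 # closing the last write end is the EOF signal

first = os.read(r, 1024)    # the data that was still buffered in the pipe
second = os.read(r, 1024)   # zero bytes: the end-of-file indication
os.close(r)

print(first, second)
```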

And a program that waits for all of its input before producing any data
is by definition not a filter.

There is a very very simple technique.
(1) Create a temporary file.
(2) Open a pipe to the command, telling the command to write its output to the temporary file.
(3) Send your data to the pipe and close the pipe.
(4) Now read the temporary file.

I still don't understand why you can't do this.
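
The four steps above might look roughly like this in Python (used here only because it is compact; it is not Erlang). This is a sketch under assumptions: a Unix-like system with sort on the PATH, and run_via_temp_file is a hypothetical helper name, not an existing API:

```python
import os
import subprocess
import tempfile


def run_via_temp_file(argv, data: bytes) -> bytes:
    """Send `data` to a command's stdin and collect its output via a temp file."""
    fd, path = tempfile.mkstemp()                     # (1) temporary file
    try:
        with os.fdopen(fd, "wb") as out:
            # (2) pipe to the command, with its stdout going to the file
            proc = subprocess.Popen(argv, stdin=subprocess.PIPE, stdout=out)
            proc.stdin.write(data)                    # (3) send the data ...
            proc.stdin.close()                        #     ... and close the pipe
            proc.wait()
        with open(path, "rb") as result:              # (4) read the file back
            return result.read()
    finally:
        os.remove(path)


sorted_lines = run_via_temp_file(["sort"], b"pear\napple\nfig\n")
print(sorted_lines)
```

Because the command writes to a file rather than back into a pipe, the writer cannot be blocked by output that nobody is reading yet.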

> I have examples of programs that do that and a thousand ways to do it in other languages, and I'm being told it is disastrous to do. If the solution is to just keep doing what everybody has been doing in Erlang for years which is write giant hack middleman programs to do it,

No, that is NOT what everybody has been doing in Erlang.
Creating a temporary file is simple and safe and easy to program.



Issues with stdin on ports

Anthony Grimes
Once again, I'm sorry I haven't got the terminology right.

Sure, that's an option. I'm still unclear why I can do this in every
other language I use but not Erlang. I find it hard to believe that all
other languages with this capability are doing something utterly stupid,
but Erlang, being the best language in the known universe, has gotten it
utterly correct. But once again, I'll concede.

Thanks for the responses and teaching me a thing or two, Mr. O'Keefe!

> Anthony Grimes <mailto:i>
> July 28, 2013 9:14 PM
> Howdy folks.
>
> I unfortunately have not been able to use Erlang for most of what I've
> been doing lately because of a long standing issue with Erlang ports
> that I'd like to start a discussion about here.
>
> As far as I am aware, ports are generally the only option for creating
> and communicating with external processes in Erlang. They seem to have
> at least one particular fatal flaw which prevents them from being very
> useful to me, and that is that there is no way to close stdin (and send
> EOF) and then also read from the process's stdout. For example, I cannot
> use a port to start the 'cat' program which listens on stdin for data
> and waits for EOF and then echos that data back to you. I can do the
> first part, which is send it data on stdin, but the only way for me to
> close it is to call port_close and close the entire process.
>
> This issue prevents Erlang users from doing any even slightly more than
> trivial communication with external processes without having some kind
> of middleman program that handles the creation of the actual process you
> need to talk to and looks for a specific byte sequence to indicate 'EOF'.
>
> I could totally be wrong, but it seems like we need something other than
> just port_close. Something like
> http://www.erlang.org/doc/man/gen_tcp.html#shutdown-2
> <http://www.erlang.org/doc/man/gen_tcp.html#shutdown-2> which lets you say
> "Hey, I want to close the stdin of this process but still read from its
> stdout." or something similar. I could be totally off track on what a
> good solution would be.
>
> So I'm wondering if people are aware of this problem, and I'd like to
> make sure that people think it is an actual problem that should be
> fixed. I'm also curious what people think a good solution to the problem
> would be. I'm not sure I have the time/particular skill set to fix it
> given that the port code is some pretty obscure (to me) C code, but
> starting conversation seems like a good way to begin.

Issues with stdin on ports

Jon Meredith
In reply to this post by Richard A. O'Keefe
> End of file and Control-D are two different things.
> UNIX technically doesn't have a character that signifies end of file.

What's more the connection between ^D (or other programmed characters) and
things they cause such as signals exists in the terminal layer and does
not exist in pipes at all. Or rather _only_ exists in tty devices (and
virtual ones).
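
To make the distinction concrete, here is a minimal Python sketch (Python is used only for illustration, and the names `read_until_eof` and `demo` are made up for this example). It writes the byte 0x04, the code Control-D normally sends on a terminal, into an anonymous pipe. The reader receives it as ordinary data and only sees end of file once the write end has been closed.

```python
import os


def read_until_eof(fd):
    """Read from fd until os.read() returns b"", which is how EOF appears on a pipe."""
    chunks = []
    while True:
        chunk = os.read(fd, 4096)
        if not chunk:  # zero-length read: every write end has been closed
            break
        chunks.append(chunk)
    return b"".join(chunks)


def demo():
    read_fd, write_fd = os.pipe()
    # 0x04 is Control-D on a terminal; inside a pipe it is just another byte.
    os.write(write_fd, b"Hi!\x04more")
    # Closing the write end, not any particular byte value, signals EOF.
    os.close(write_fd)
    data = read_until_eof(read_fd)
    os.close(read_fd)
    return data
```

The 0x04 byte comes out the other side unchanged, which is why a port cannot "send EOF" by writing a character: something has to close the descriptor.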

Jon




Issues with stdin on ports

Jesper Louis Andersen
In reply to this post by Anthony Grimes
On Tue, Jul 30, 2013 at 8:30 AM, Anthony Grimes <i> wrote:

> Sure, that's an option. I'm still unclear why I can do this in every other
> language I use but not Erlang. I find it hard to believe that all other
> languages with this capability are doing something utterly stupid, but
> Erlang, being the best language in the known universe, has gotten it
> utterly correct. But once again, I'll concede.


One thing to muse about is how useful an unimplemented feature is, if said
language is more than 20 years old :)

Yes, you could add it, and it may make sense to add, but I find it way
easier to hack pygments to be able to read data in {packet, 4} format.
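
For readers unfamiliar with the workaround, here is a rough sketch of what reading {packet, 4} data could look like on the Python side. It assumes, as I understand the open_port documentation, that each message is preceded by a 4-byte big-endian length. The helper names (`read_exact`, `read_packet`, `write_packet`, `serve`, `demo`) are hypothetical and are not part of pygments or of any Erlang library.

```python
import io
import struct


def read_exact(stream, n):
    """Read exactly n bytes from stream, or return None if it ends first."""
    buf = b""
    while len(buf) < n:
        chunk = stream.read(n - len(buf))
        if not chunk:
            return None
        buf += chunk
    return buf


def read_packet(stream):
    """Return the next length-prefixed payload, or None when the stream has ended."""
    header = read_exact(stream, 4)
    if header is None:
        return None
    (length,) = struct.unpack(">I", header)  # 4-byte big-endian length
    return read_exact(stream, length)


def write_packet(stream, payload):
    """Write payload preceded by its 4-byte big-endian length."""
    stream.write(struct.pack(">I", len(payload)) + payload)
    stream.flush()


def serve(stdin, stdout, handler):
    """Answer each incoming packet with handler(packet) until stdin ends."""
    while True:
        packet = read_packet(stdin)
        if packet is None:
            break
        write_packet(stdout, handler(packet))


def demo():
    """Round-trip two packets through an in-memory buffer."""
    buf = io.BytesIO()
    write_packet(buf, b"hello")
    write_packet(buf, b"")
    buf.seek(0)
    return [read_packet(buf), read_packet(buf), read_packet(buf)]
```

With framing like this, the external program knows where each request ends without waiting for EOF, so the port can stay open for many requests. A real worker would call something like `serve(sys.stdin.buffer, sys.stdout.buffer, handler)`, with the port opened using the `{packet, 4}` option on the Erlang side.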

Issues with stdin on ports

Heinz Nikolaus Gies
I am facing the same issue as Anthony, and it is rather annoying. There are more issues involved with Erlang's handling of the STD* pipes, e.g. the lack of a way to terminate a program that does not terminate when stdin is closed. I would not argue 'because it's not in Erlang it is not important'. I greatly enjoy that Erlang does not jump at every chance to change, but that argument means nothing would ever get done :P


On Jul 30, 2013, at 10:43, Jesper Louis Andersen <jesper.louis.andersen> wrote:

>
> On Tue, Jul 30, 2013 at 8:30 AM, Anthony Grimes <i> wrote:
> Sure, that's an option. I'm still unclear why I can do this in every other language I use but not Erlang. I find it hard to believe that all other languages with this capability are doing something utterly stupid, but Erlang, being the best language in the known universe, has gotten it utterly correct. But once again, I'll concede.
>
> One thing to muse about is how useful an unimplemented feature is, if said language is more than 20 years old :)
>
> Yes, you could add it, and it may make sense to add, but I find it way easier to hack pygments to be able to read data in {packet, 4} format.
> _______________________________________________
> erlang-questions mailing list
> erlang-questions
> http://erlang.org/mailman/listinfo/erlang-questions


Issues with stdin on ports

Robert Raschke-7
Having thought about this for a couple of minutes now, and having looked
around for some ideas from other languages, if I had the problem of
interacting with some random piece of software from within Erlang, I would
write a small Jinterface wrapper around ExpectJ or expect4j and take it
from there.

(I suspect that coding a native Erlang based expect clone is non-trivial,
but maybe ripping off some existing code could get you started down that
road.)

Robby



On Tue, Jul 30, 2013 at 11:01 AM, Heinz Nikolaus Gies <heinz> wrote:

> I am facing the same issue as Anthony, and it is rather annoying. There are
> more issues involved with Erlang's handling of the STD* pipes, e.g. the lack
> of a way to terminate a program that does not terminate when stdin is closed.
> I would not argue 'because it's not in Erlang it is not important'. I greatly
> enjoy that Erlang does not jump at every chance to change, but that argument
> means nothing would ever get done :P
>
>
> On Jul 30, 2013, at 10:43, Jesper Louis Andersen <
> jesper.louis.andersen> wrote:
>
>
> On Tue, Jul 30, 2013 at 8:30 AM, Anthony Grimes <i> wrote:
>
>> Sure, that's an option. I'm still unclear why I can do this in every
>> other language I use but not Erlang. I find it hard to believe that all
>> other languages with this capability are doing something utterly stupid,
>> but Erlang, being the best language in the known universe, has gotten it
>> utterly correct. But once again, I'll concede.
>
>
> One thing to muse about is how useful an unimplemented feature is, if said
> language is more than 20 years old :)
>
> Yes, you could add it, and it may make sense to add, but I find it way
> easier to hack pygments to be able to read data in {packet, 4} format.

Issues with stdin on ports

Per Hedeland-4
In reply to this post by Richard A. O'Keefe
"Richard A. O'Keefe" <ok> wrote:
>
>On 30/07/2013, at 2:13 AM, Per Hedeland wrote:
>>
>> FWIW, I definitely agree that this is a missing piece of functionality.
>
>Oh it's present in several languages.  You could argue that it's missing.
>But is it missing like a car radio in a car that lacks one,
>or is it missing like an ejector seat in a car that lacks one?

I think it's more like a parking brake - not strictly needed, but quite
useful, and definitely not dangerous in any way.

>>> popen(<command>, "r") reads the output of
>>> the command and popen(<command>, "w") writes to the input of the command.
>>
>> popen() is effectively a convenience function to abstract away the
>> somewhat non-trivial application of pipe(), fork(), close(), and
>> execve() that is required to set things up correctly for two particular
>> and common usages of pipes in application code. (It is not used by
>> common shells to implement pipelines though.)
>
>Having implemented popen() for two other high level languages, I know that.
>The fact that common shells do not use it is irrelevant.

No, it is quite relevant, since it's one example of how popen() doesn't
define the "pipe API" in C, any more than stdio defines the "I/O API" (in
fact considerably less, since popen() is far more limited relative to
the underlying basic API, and unlike stdio lacks the portability promise
of being ANSI standard).
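
As an illustration of the underlying basic API, here is a small sketch that uses Python's thin wrappers over pipe(), fork(), dup2(), close() and execvp() to connect to both stdin and stdout of a command with two pipes. It is POSIX-only, the name `run_filter` is made up for this example, and it deliberately writes all input before reading any output, so it is only safe for inputs smaller than the pipe buffer.

```python
import os


def run_filter(argv, data):
    """Feed data to argv's stdin, close it, and return everything from its stdout."""
    to_child_r, to_child_w = os.pipe()      # parent writes, child reads
    from_child_r, from_child_w = os.pipe()  # child writes, parent reads

    pid = os.fork()
    if pid == 0:
        # Child: wire the pipe ends to fds 0 and 1, drop the originals, exec.
        try:
            os.dup2(to_child_r, 0)
            os.dup2(from_child_w, 1)
            for fd in (to_child_r, to_child_w, from_child_r, from_child_w):
                os.close(fd)
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # reached only if something above failed

    # Parent: close the ends that belong to the child.
    os.close(to_child_r)
    os.close(from_child_w)

    # Writing everything before reading is only safe for small inputs;
    # a large input could fill both pipes and deadlock.
    os.write(to_child_w, data)
    os.close(to_child_w)  # the child now sees EOF on its stdin

    chunks = []
    while True:
        chunk = os.read(from_child_r, 4096)
        if not chunk:  # child closed its stdout
            break
        chunks.append(chunk)
    os.close(from_child_r)
    os.waitpid(pid, 0)
    return b"".join(chunks)
```

Calling `run_filter(["cat"], b"Hi!")` reproduces the 'cat' scenario from the start of the thread: the write end is closed separately from the read end, which is exactly what popen() does not expose.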

>>> There isn't even any standard _term_ for talking about connecting to both
>>> stdin and stdout of a command in UNIX, and that's because it's an
>>> incredibly easy way to deadlock.
>>
>> There is no need to have a term for it, since all you need is two pipes,
>> one for each direction
>
>Non-sequitur.  If it's a thing you need to do often, it's a thing you
>need to be able to talk about.

Ditto - I didn't claim that it's something you need to do often, only
responded to your claim that the absence of a term is due to the dangers
of using it.

>>  (the risk is of course reduced due to the fact that the
>> VM does non-blocking I/O).
>
>And *that* is the thing that saves Erlang.

Yes, I was actually overly cautious with "reduced" - it completely
eliminates the risk of deadlock (at least as long as there is no way to
block input from a port).
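
The effect of non-blocking I/O can be sketched outside the VM as well. The following Python example is only an analogy, not the emulator's actual implementation, and the name `pump` is hypothetical. It sets both pipe ends to non-blocking mode and interleaves writes and reads with a selector, so an input much larger than the pipe buffer can be pushed through 'cat' without either side waiting on the other forever.

```python
import os
import selectors
import subprocess


def pump(argv, data):
    """Send data to argv's stdin and collect its stdout without blocking on either."""
    proc = subprocess.Popen(argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    in_fd = proc.stdin.fileno()
    out_fd = proc.stdout.fileno()
    os.set_blocking(in_fd, False)
    os.set_blocking(out_fd, False)

    sel = selectors.DefaultSelector()
    sel.register(out_fd, selectors.EVENT_READ)
    if data:
        sel.register(in_fd, selectors.EVENT_WRITE)
    else:
        proc.stdin.close()  # nothing to send: signal EOF right away

    view = memoryview(data)
    sent = 0
    chunks = []
    while sel.get_map():  # loop until both directions are finished
        for key, _events in sel.select():
            if key.fd == in_fd:
                try:
                    sent += os.write(in_fd, view[sent:sent + 65536])
                except BlockingIOError:
                    continue  # pipe is full for now; try again later
                except BrokenPipeError:
                    sent = len(data)  # child stopped reading; give up writing
                if sent >= len(data):
                    sel.unregister(in_fd)
                    proc.stdin.close()  # child sees EOF on its stdin
            else:
                try:
                    chunk = os.read(out_fd, 65536)
                except BlockingIOError:
                    continue
                if chunk:
                    chunks.append(chunk)
                else:  # EOF: child closed its stdout
                    sel.unregister(out_fd)
                    proc.stdout.close()
    sel.close()
    proc.wait()
    return b"".join(chunks)
```

A write-everything-then-read approach would stall on an input of this size once both pipe buffers fill up; interleaving the two directions is what removes the deadlock.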

>You may have misunderstood me.
>>>>> THE POPEN INTERFACE <<<< prevents C users doing this.

Well, that's not what you said, so I guess the problem wasn't a case of
misunderstanding.

>For that matter, it's not beyond the abilities of, say, the glibc
>authors, to extend their implementation of popen to support "r+"
>or "w+" modes, if there were much demand for it.  (Oddly enough,
>Mac OS X 10.7.5 _does_ support "r+" mode, but the Linux I checked
>does not.  Weird.)  I have _never_ been able to understand the
>differences between Mac OS X and POSIX.

This is probably due to its relationship with FreeBSD, and the
explanation seems to be present in the popen(3) man page of both:

  Because popen() is now implemented using a
  bidirectional pipe, the mode argument may request a bidirectional data
  flow.

I.e. *pipes* are bidirectional on FreeBSD (and other *BSD AFAIK) - I'm
not sure whether they actually are on MacOS X, the pipe(2) man page
certainly doesn't say so (which it does on FreeBSD). It may well also be
the case that MacOS X just copied the BSD popen(3) man page without
bothering to make sure that it worked as advertised with their
underlying pipe(2). (Side note, FreeBSD pipe(2) mentions that SysVR4 was
actually first with bidirectional pipes, and indeed Solaris has them too
- but according to its popen(3C), doesn't support "r+".)

Anyway, interesting as it may be, I'm afraid that this will have to
conclude my participation in a discussion about pipes in C and the
potential dangers of using them in other ways than popen() allows for -
it is after all completely off-topic in this thread at least.

To summarize: Erlang open_port/2 *by default* creates a bi-directional,
deadlock-free communication channel to the external process. The
question at hand is just whether adding a way to close only the "output"
side of this channel has merit.
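
For comparison, this is roughly what the requested behaviour looks like in a language that already offers it. The sketch uses Python's subprocess module; the function name `close_then_read` is made up, and nothing here is an Erlang API. It uses 'sort' because that program cannot produce any output until it has seen end of file on its input.

```python
import subprocess


def close_then_read(argv, data):
    """Write data to the child's stdin, close only that side, then read its stdout."""
    proc = subprocess.Popen(argv, stdin=subprocess.PIPE, stdout=subprocess.PIPE)
    proc.stdin.write(data)  # small input only; large inputs need interleaved I/O
    proc.stdin.close()      # flushes and closes: the child sees EOF on stdin
    output = proc.stdout.read()  # the other direction is still open
    proc.stdout.close()
    proc.wait()
    return output
```

For example, `close_then_read(["sort"], b"b\na\n")` returns the two lines in sorted order. With port_close as the only option, the equivalent step would close both directions at once and the output would be lost.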

--Per
