Streaming Data using httpc

Streaming Data using httpc

John Duffy
Hi

I'm new to Erlang so please forgive my ignorance. I'm trying to stream data from a REST API using httpc, and although I have scoured the internet and the documentation I can't find a good example of how to do this, in particular how the "sync" and "receiver" options interoperate. My unsuccessful module looks like this...

-module(streaming).

-export([data/0]).

data() ->
    {ok, RequestId} = httpc:request(get, {"http://my_streaming_data_source.com", []}, [], [{sync, false}, {receiver, self()}]),
    receive_data(RequestId).

receive_data(RequestId) ->
    receive
        {http, {RequestId, stream_start, Headers}} -> do something...;
        {http, {RequestId, stream, Data}}          -> do something...;
        {http, {RequestId, stream_end, Headers}}   -> do something...
    end,
    receive_data(RequestId).


Is the above how I should be structuring my module?

Kind regards

John Duffy




_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: Streaming Data using httpc

Paul Peregud-2
I don't have experience using httpc, so one remark only:

receive_data(RequestId, State) ->
    receive
        {http, {RequestId, stream_start, Headers}} ->
            do something...,
            receive_data(RequestId, State);
        {http, {RequestId, stream, Data}} ->
            do something...,
            receive_data(RequestId, State);
        {http, {RequestId, stream_end, Headers}} ->
            do something...
    end.

That way receive_data returns once no more data is coming your way, instead of looping forever.
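
To make the point concrete, here is a minimal sketch of such a loop that can be tried without any network access. The module name stream_loop and the choice to accumulate the parts in a list are assumptions made for illustration; only the message shapes come from the code above.

```erlang
-module(stream_loop).
-export([receive_data/2]).

%% Collects streamed parts until stream_end arrives, then returns.
%% Acc holds the parts received so far, newest first.
receive_data(RequestId, Acc) ->
    receive
        {http, {RequestId, stream_start, _Headers}} ->
            receive_data(RequestId, Acc);
        {http, {RequestId, stream, Part}} ->
            receive_data(RequestId, [Part | Acc]);
        {http, {RequestId, stream_end, _Headers}} ->
            %% No recursive call in this clause, so the loop terminates here.
            {ok, iolist_to_binary(lists:reverse(Acc))}
    end.
```

You can exercise it from the shell by sending the messages to yourself with a made-up request id (for example a reference from make_ref()) before calling stream_loop:receive_data(Ref, []).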





--
Best regards,
Paul Peregud
+48602112091

Re: Streaming Data using httpc

Jesper Louis Andersen-2
In reply to this post by John Duffy


On Sun, Apr 19, 2015 at 6:14 PM John Duffy <[hidden email]> wrote:


-module(streaming).


[...]

You are pretty close to the goal, but I think you are confusing the stream/receiver options. When streaming, you receive the data as a series of chunks, which is what your code expects, but you don't supply an option requesting streaming operation, so the message you get back never matches the tuples you are waiting for. You can add a catch-all clause to the receive in receive_data/1 to check that you have the right format in your match. You can also add an 'after' clause to time out after a while. This can make debugging easier since you "get back" to the REPL.

The following works in a quick test at my end. Note how the receive clause is different from yours, and that you get everything in one fell swoop, rather than having to match on a multitude of clauses.

For more serious work, you might want to check out some of the numerous other projects for HTTP client requests. I'm partial to Gun and Hackney myself, but there are also ibrowse, lhttpc and fusco. They have slightly different semantics and areas at which they excel, so choose wisely :)

For a prototype however, I think httpc is fine.

-module(streaming).

-export([data/0]). 

%% inets must be running before calling this, e.g. inets:start().
data() ->
    {ok, RequestId} = httpc:request(get, {"http://example.com", []}, [], [{sync, false}, {receiver, self()}]),
    receive_data(RequestId).

receive_data(RequestId) ->
    receive
        {http, {RequestId, {StatusLine, Headers, Body}}} ->
            error_logger:info_report(
            #{
             status => StatusLine,
             headers => length(Headers),
             body_size => byte_size(Body)
            }),
            ok
    after 5000 ->
            error_logger:info_report(timeout)
    end.
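
For comparison, a minimal sketch of the streaming variant, assuming the {stream, self} option is what the original question was after. The module name streaming_sketch, the Url argument and the 5000 ms timeout are illustrative choices and not part of the code above; the loop also includes the catch-all and 'after' clauses suggested earlier.

```erlang
-module(streaming_sketch).
-export([data/1, receive_data/2]).

%% Url is supplied by the caller (hypothetical parameter).
data(Url) ->
    {ok, _} = application:ensure_all_started(inets),
    %% {stream, self} asks httpc to deliver the body in parts to this process.
    {ok, RequestId} = httpc:request(get, {Url, []}, [],
                                    [{sync, false}, {stream, self}]),
    receive_data(RequestId, 5000).

receive_data(RequestId, Timeout) ->
    receive
        {http, {RequestId, stream_start, _Headers}} ->
            receive_data(RequestId, Timeout);
        {http, {RequestId, stream, Part}} ->
            io:format("~p bytes~n", [byte_size(Part)]),
            receive_data(RequestId, Timeout);
        {http, {RequestId, stream_end, _Headers}} ->
            ok;
        {http, {RequestId, {error, Reason}}} ->
            {error, Reason};
        {http, {RequestId, Other}} ->
            %% Catch-all, e.g. a reply delivered whole as {StatusLine, Headers, Body}.
            {unexpected, Other}
    after Timeout ->
        %% Nothing arrived for Timeout ms; return so the shell is usable again.
        {error, timeout}
    end.
```

The timeout restarts with every message, so a stream that keeps sending data keeps the loop alive until stream_end or an error arrives.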



Re: Streaming Data using httpc

nx-2
In reply to this post by Paul Peregud-2
Not sure what you're trying to do, but if you're actually processing
the body of the message as a stream you may want to use the stream
option and make your module a process. Otherwise you can stream the
data to a file (i.e. downloading files) using the {stream, filename()}
option.

If you want the former, and you want to do something with each
part of the stream and then ask for the next block, you can
do something like this. This assumes your module is a gen_server and
you're using the {stream, {self, once}} option to step through the
streamed response:

%% Assumes the module defines -record(state, {pid}).
handle_call({async_request, Url}, _From, State) ->
  httpc:request(get, {Url, _Headers=[]}, _HttpOptions=[],
                [{stream, {self, once}}, {sync, false}]),
  {reply, ok, State}.

%% Start stream, store httpc Pid, continue
handle_info({http, {_RequestId, stream_start, _Headers, Pid}}, State) ->
  ok = httpc:stream_next(Pid),
  {noreply, State#state{pid=Pid}};
%% Handle a chunk, continue
handle_info({http, {_RequestId, stream, Part}}, #state{pid=Pid}=State) ->
  handle_part(Part),
  ok = httpc:stream_next(Pid),
  {noreply, State};
%% Handle a 404 error, cancel the request
handle_info({http, {RequestId, {{_HTTPVersion, 404, _ReasonPhrase}, _Headers,
                                _Body}}}, State) ->
  ok = httpc:cancel_request(RequestId),
  {stop, normal, State};
%% Handle stream error, cancel the request
handle_info({http, {RequestId, {error, _Reason}}}, State) ->
  ok = httpc:cancel_request(RequestId),
  {stop, normal, State};
%% Stream end
handle_info({http, {_RequestId, stream_end, _Headers}}, State) ->
  do_something(),
  {stop, normal, State}.

The standard httpc library is flexible enough if you want a precise
level of control over the lifetime of the HTTP stream.
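
For the other case mentioned above, streaming straight to a file, a minimal sketch might look like this. The module name download_sketch, the Url and Filename arguments and the timeout values are assumptions for illustration, and the saved_to_file completion message reflects my reading of the httpc documentation, so check it against your OTP version.

```erlang
-module(download_sketch).
-export([to_file/2, await_saved/2]).

%% Url and Filename are supplied by the caller (hypothetical parameters).
to_file(Url, Filename) ->
    {ok, _} = application:ensure_all_started(inets),
    %% {stream, Filename} asks httpc to write the body to that file.
    {ok, RequestId} = httpc:request(get, {Url, []}, [],
                                    [{sync, false}, {stream, Filename}]),
    await_saved(RequestId, 30000).

%% Waits for the completion message of an asynchronous file download.
await_saved(RequestId, Timeout) ->
    receive
        {http, {RequestId, saved_to_file}} ->
            ok;
        {http, {RequestId, {error, Reason}}} ->
            {error, Reason};
        {http, {RequestId, Other}} ->
            %% E.g. a non-200 reply delivered as {StatusLine, Headers, Body}.
            {unexpected, Other}
    after Timeout ->
        {error, timeout}
    end.
```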



Re: Streaming Data using httpc

John Duffy
In reply to this post by John Duffy
Hi Jesper

Thank you for your reply, very helpful.

I'm still a bit puzzled as to why my test doesn't work. I'm beginning to wonder if I need to be sending additional headers, or something, to the server.

If I use 'curl' then everything works, I get a steady stream of data...

curl "http://stream-sandbox.oanda.com/v1/prices?accountId=99999&instruments=EUR_USD"

However, putting the same URL into your example results in a single time-out and the Erlang emulator stalling.

Kind regards

John


Re: Streaming Data using httpc

Antoine Koener

On 20 Apr 2015, at 13:05, John Duffy <[hidden email]> wrote:

> Hi Jesper
>
> Thank you for your reply, very helpful.
>
> I'm still a bit puzzled as to why my test doesn't work. I'm beginning to wonder if I need to be sending additional headers, or something, to the server.
>
> If I use 'curl' then everything works, I get a steady stream of data...


The problem seems to be the server because if you use -v for curl you will observe something like this:

< HTTP/1.1 200 Ok
* Server openresty/1.7.0.1 is not blacklisted
< Server: openresty/1.7.0.1
< Date: Tue, 21 Apr 2015 15:15:16 GMT
< Content-Type: application/json
< Transfer-Encoding: chunked
< Connection: close
< Access-Control-Allow-Origin: *
<
{"tick":{"instrument":"EUR_CHF","time":"2015-04-21T15:13:48.585046Z","bid":1.2041,"ask":1.20435}}
{"heartbeat":{"time":"2015-04-21T15:15:16.632703Z"}}
{"heartbeat":{"time":"2015-04-21T15:15:21.632772Z"}}
{"heartbeat":{"time":"2015-04-21T15:15:24.598956Z"}}

As you can see there are some strange headers:
Connection: close
The connection is not closed because it's a stream.

Transfer-Encoding: chunked
What I see is absolutely not chunked transfer, it's a bunch of JSON lines...
Chunks should be preceded by the size (hex encoded) and \r\n

So I think that the Erlang code is trying to respect the headers, close the connection and search for chunk encoding, but there's none...

It might be interesting to report those problems to the developers of this service :)
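
Whatever the cause turns out to be, the curl output above shows one JSON object per line, and a streamed body part is not guaranteed to end on a line boundary. A minimal sketch of a line buffer for such a stream follows, assuming newline-terminated messages as in that output; the module name line_buffer and the function feed/2 are made up for illustration.

```erlang
-module(line_buffer).
-export([feed/2]).

%% feed(Buffer, Part) -> {CompleteLines, NewBuffer}
%% Buffer is the unfinished tail left over from earlier parts.
feed(Buffer, Part) ->
    Data = <<Buffer/binary, Part/binary>>,
    Pieces = binary:split(Data, <<"\n">>, [global]),
    %% The last piece is whatever follows the final newline (possibly empty).
    {Lines, [Rest]} = lists:split(length(Pieces) - 1, Pieces),
    {[strip_cr(L) || L <- Lines, L =/= <<>>, L =/= <<"\r">>], Rest}.

%% Drops a trailing carriage return, in case lines end in \r\n.
strip_cr(Line) ->
    Size = byte_size(Line) - 1,
    case Line of
        <<Body:Size/binary, "\r">> -> Body;
        _ -> Line
    end.
```

Each complete line can then be handed to a JSON decoder of your choice, while NewBuffer is kept and passed in together with the next part.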






Re: Streaming Data using httpc

Fred Hebert-2
On 04/21, Antoine Koener wrote:
>
>The problem seems to be the server because if you use -v for curl you will observe something like this:
>
> [...]
>
>As you can see there's some strange headers:
>Connection: close
>The connection is not closed because it's a stream.

The Connection: close header says that the connection will be closed
once the request is done, so the client should not attempt to reuse it
as keep-alive for a follow-up request. There's nothing bad about it, and
it is not relevant to the fact you are streaming. The streaming is part
of chunked encoding (which has no content length known ahead of time,
and is self-delimiting), related to the body, not the connection.

>
>Transfer-Encoding: chunked
>What I see is absolutely not chunked transfer, it's a bunch of json lines...
>Chunks should be preceded by the size (hex encoded) and \r\n
>

That's because you need to use `--raw` in curl to avoid seeing the
content body already decoded. `-v` will decode chunked content and
display it as it comes. You can also try a tcpdump to see the raw data.
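
To illustrate what the raw form looks like, here is a minimal sketch of a decoder for a complete chunked body: each chunk is a hex size, \r\n, that many bytes of data, and \r\n, with a zero-size chunk marking the end. The module name chunked_sketch is made up, and the sketch ignores trailers and expects well-formed input; httpc does this decoding for you, so this is only to show the format.

```erlang
-module(chunked_sketch).
-export([decode/1]).

%% Decodes a complete chunked body into the plain body.
decode(Bin) ->
    decode(Bin, []).

decode(Bin, Acc) ->
    %% The size line runs up to the first \r\n.
    [SizeLine, Rest] = binary:split(Bin, <<"\r\n">>),
    %% Drop any chunk extension after ";".
    [HexSize | _] = binary:split(SizeLine, <<";">>),
    case binary_to_integer(HexSize, 16) of
        0 ->
            %% Last chunk: return everything collected so far.
            iolist_to_binary(lists:reverse(Acc));
        Size ->
            <<Chunk:Size/binary, "\r\n", Tail/binary>> = Rest,
            decode(Tail, [Chunk | Acc])
    end.
```

Calling chunked_sketch:decode(<<"4\r\nWiki\r\n5\r\npedia\r\n0\r\n\r\n">>) gives <<"Wikipedia">>.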

>It might be interesting to report those problems to the developers of
>this service :)
>

Don't. Nothing you specified there is actually a problem unless they
advertise keep-alive connections forever.



Re: Streaming Data using httpc

John Duffy
Fred, Antoine

Thank you for your replies. I will investigate further...

Kind regards

John



Re: Streaming Data using httpc

John Duffy
In reply to this post by Fred Hebert-2
Fred

By the way... a great book! It is constantly open on my desk at the moment as I try to get to grips with Erlang.

Kind regards

John Duffy

