List Question

Andrew McIntyre
Hello All,

A newbie question:

Can I tell the difference between a list of strings and a single
string?

E.g. the value in a list may just be "FT" or it might be ["FT", "ST"].

I do not know in advance if it's a single string or a list of strings,
but I want to behave differently: actually return the first element of
the list when there are multiple values. I have parsed a variable
hierarchical structure and the hierarchy can vary over instances.

As strings are lists, "is_list" returns true for both. Is that solvable,
or do I need to change the data structure to indicate it? Sorry if
this has an obvious answer, but I am new to Erlang.

Thanks

--
Best regards,
Andrew McIntyre                     mailto:[hidden email]

sent from a real computer

R&D Director
Medical-Objects
4/102 Wises Road
MAROOCHYDORE Q 4558
AUSTRALIA

www.medical-objects.com.au


_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: List Question

zxq9-2
On 2017-08-07 (Monday) 22:13:23 Andrew McIntyre wrote:
> Hello All,
>
> A Newbie question,
>
> Can I tell the difference Between a List of Strings and a Single
> String?
>
> eg the Value in a List may just be "FT" or it might be ["FT", "ST"]

Using a match, this is pretty straightforward.


choose_your_adventure("FT") ->
    path_one();
choose_your_adventure(["FT" | Rest]) ->
    ok = path_two(),
    choose_your_adventure(Rest).

Generally speaking, if your system is the one generating the lists/strings then it is useful to produce them in a way that is more semantically unambiguous.

-Craig

Re: List Question

Dmytro Lytovchenko
In reply to this post by Andrew McIntyre
A string is a list of integers: [$F, $T] (where $x is the character code for the letter x, and so on).
A list of strings is a list of lists of integers, that is, [ [$F, $T], [$S, $T] ].
So to check whether a value is a list of lists, take its first element and check whether it is also a list:

function([X | _]) when is_list(X) -> list_of_strings;   % got a nonempty list of strings!
function([X | _]) when is_integer(X) -> string;         % got a nonempty list of integers (one string)
function([]) -> empty.                                  % I am sorry, an empty list is here

Given an empty list, there is no way to know what it was supposed to contain. So an empty list is just an empty list.


Re: List Question

zxq9-2
In reply to this post by zxq9-2
On 2017-08-07 (Monday) 22:29:31 you wrote:
> Hello zxq9,
>
> Thanks, Unfortunately I do not know the value of the string that will
> be there. Its an extensible hierarchy that can be several lists deep -
> or not. Might need to revise the data structure

In this case it can be useful to consider a way of tagging values.

Imagine we want to represent a directory tree structure and have a depth-first traversal function recurse over it while creating the tree. Two things can happen: there is a flat list of new directories that need to be created, and the tree may extend deeper at each node.

The naive version would look like what you have:

["top_dir_1",
 "top_dir_2",
 ["next_level_1",
  "next_level_2"]]

This leaves a bit to be desired, not only because of the problem you have pointed out that makes it difficult to know what is deep and what is shallow, but also because we don't really have a good way to represent a full tree (what would be the name of a directory containing other directories?).

So consider instead something like this:

[{"top_dir_1", []},
 {"top_dir_2", []},
 {"top_dir_3",
  [{"next_level_1", []},
   {"next_level_2", []}]}]

Now we have a representation of each directory's name AND its contents.

We can traverse this laterally AND in depth without any ambiguity or need for carrying around a record of where we have been (by using depth recursion and tail-call recursion):


make_tree([{Dir, Contents} | Rest]) ->
    ok =
        case filelib:is_dir(Dir) of
            true ->
                ok;
            false ->
                ok = log(info, "Creating dir: ~p", [Dir]),
                file:make_dir(Dir)
        end,
    ok = file:set_cwd(Dir),
    ok = make_tree(Contents),
    ok = file:set_cwd(".."),
    make_tree(Rest);
make_tree([]) ->
    ok.


Not so bad.

In your case we could represent things perhaps a bit better by separating the types and tagging them. Instead of just "FT" and whatever other string labels you might want, you could either use atoms (totally unambiguous) or tuples as we have in the example above (also totally unambiguous). I prefer tuples, though, because they are easier to read.

[{value, "foo"},
 {tree,
  [{value, "bar"},
   {value, "foo"}]},
 {value, "baz"}]


So then we do something like:


traverse([{value, Value} | Rest]) ->
   ok = do_thing(Value),
   traverse(Rest);
traverse([{tree, Contents} | Rest]) ->
   ok = traverse(Contents),
   traverse(Rest);
traverse([]) ->
   ok.


Anyway, don't be afraid of varying your value types to say exactly what you mean. If your strings like "FT" only have meaning within your system, consider NOT USING STRINGS and using atoms instead. That makes it even easier:


[foo,
 bar,
 [foo,
  bar],
 foo]


So then we can do:


traverse([foo | Rest]) ->
    ok = do_foo(),
    traverse(Rest);
traverse([bar | Rest]) ->
    ok = do_bar(),
    traverse(Rest);
traverse([Value | Rest]) when is_list(Value) ->
    ok = traverse(Value),
    traverse(Rest);
traverse([]) ->
    ok.


And of course, you can skip the guard if you would rather match on a list shape in the listy clause there, but that is a minor detail. The point is to make your data types MEAN SOMETHING REASONABLE within your system. Use atoms when your values are meaningful only within your system. Strings are for the birds.

-Craig

Re: List Question

Oliver Korpilla
In reply to this post by Andrew McIntyre
Hello.

I would have done something like this:

is_string(Input) when is_list(Input) -> lists:all(fun erlang:is_integer/1, Input);
is_string(_Input) -> false.
 
This works fine for regular strings (lists of integers).

Erlang, however, often uses a construct called an iolist for efficient formatting of output. An iolist can contain strings, binaries, and further lists nested to arbitrary depth, and it need not be a proper list (it may end in a binary instead of the empty list []). More work would be required there.
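
As a rough sketch of that extra work: lists:all/2 raises on an improper list, whereas io_lib:char_list/1 and io_lib:deep_char_list/1 from the standard library simply return false for anything that is not a (nested) character list. The module name str_check and the function kind/1 are made up for this illustration. Note that deep_char_list/1 does not accept binaries, so this is not a full iolist check.

```erlang
-module(str_check).
-export([kind/1]).

%% Classify a term as a flat string, a nested (deep) character list,
%% or something else. An empty list is reported as a string here,
%% because it cannot be told apart from an empty list of strings.
kind(Term) ->
    case io_lib:char_list(Term) of
        true ->
            string;
        false ->
            case io_lib:deep_char_list(Term) of
                true -> deep_string;
                false -> other
            end
    end.
```

With this, kind("FT") gives string, kind(["FT", "ST"]) gives deep_string, and an improper list such as [$F | <<"T">>] gives other instead of crashing.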

Cheers,
Oliver



Re: List Question

Andrew McIntyre
In reply to this post by zxq9-2
Hello Craig,

Thanks for your help.

I am trying to store the data as efficiently as possible. It's HL7
natively and this is my test:

OBX|17|FT~TEST|8265-1^^LN&SUBCOMP|1&2&3&4|\H\Spot Image 2\N\||||||F

|~^& are delimiters. The hierarchy is only so deep, and I am using lists
of lists to provide a tree-like way to access the data, e.g. field 3,
repeat 1, component 2, subcomponent 1.

Parsed it looks like this:

[["OBX","17",
  ["FT","TEST"],
  [["8265-1",[],["LN","SUBCOMP"]]],
  [[["1","2","3","4"]]],
  "\\H\\Spot Image 2\\N\\",[],[],[],[],[],"F"]]

As the format evolves over time the hierarchy can be extended, but
older clients can still read the value they are expecting if they
follow the rules, like reading the first value in the list when you
only expect one value to be there.

Currently a typical system might have 12 million of these records, so
I want to keep the Erlang representation as small as possible. Hence I
am reluctant to tag too much, but I still need to know how to get the
value of interest. Maybe that is my non-Erlang background showing up?
Traversing 4 small lists by index should be fast?

I guess I could save the strings as binaries in the lists; then
is_binary should work? Is that the case? I gather that, on a 64-bit
system especially, binaries are more space efficient.

--
Best regards,
 Andrew                             mailto:[hidden email]

sent from a real computer



Re: List Question

zxq9-2
We would really want to know a bit about the data's natural semantics before leaping to conclusions regarding "most efficient" or especially "best" internal representation.

On the note of "internal representation"... do you need 12 million records in memory at a given time, or are these things you can store as strings and then pull the ones you want as you go? Do you need random access to records? To be able to query them? Are you performing aggregate operations over them? All of these might change the answer of "what is best".

For example, if you need a few tens of millions of records in memory at a given time AND the short strings like "FT" and "TEST" are always members of a smallish, finite set you know beforehand then atoms will be much more space efficient if that is your main concern. Atoms are tiny in memory (essentially like using integer constants in a C program), but they serialize back to strings in storage so that can change things. For example, the atom 'test' (or 'TEST') is smaller in memory than either the string "TEST" or the binary <<"TEST">>.
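
A quick way to check the size claim on your own system is erts_debug:flat_size/1, a debugging helper that reports how many heap words a term occupies (not counting the one word that refers to it). The wrapper module term_size below is only an illustration, and the numbers are rough estimates rather than a full memory accounting: the text of an atom is stored once in the atom table, which this does not count.

```erlang
-module(term_size).
-export([words/1]).

%% Heap words used by Term, excluding the single word that holds
%% the reference (or the immediate value itself).
words(Term) ->
    erts_debug:flat_size(Term).
```

term_size:words("TEST") returns 8 (four cons cells of two words each), while term_size:words('TEST') returns 0 because an atom fits entirely in the referring word.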

In any case, just to get the code right first so you have a platform to change things around, I would recommend TOTALLY FORGETTING ABOUT THIS to start with and jumping straight to tagged pairs. This will keep your semantics straight and dramatically reduce the mental overhead of writing traversal functions. You can switch that stuff around later -- but first you need things to work at least partially before you can start measuring stuff.

For example:

> [["OBX","17",
>   ["FT","TEST"],
>   [["8265-1",[],["LN","SUBCOMP"]]],
>   [[["1","2","3","4"]]],
>   "\\H\\Spot Image 2\\N\\",[],[],[],[],[],"F"]]

I would parse that to something like this:

[{foo, "OBX"},
 {bar, 17},
 {baz, {"FT", "TEST"}},
 [[{biz, "8265-1"},
   {boz, []},
   {ballz, {"LN", "SUBCOMP"}}]],
 [[{bonz, [1, 2, 3, 4]}]],
 {coz, "\\H\\Spot Image 2\\N\\"},
 {fuzz, "F"}]

The leading tag should always disambiguate the MEANING of the element you are encountering in your lists. Here I used junk terms, but I think you get the idea. Whatever atom labels you use should be unambiguous and mean something to the humans reading the code (or crash dumps).

The overhead here VS the list version is the atoms and enclosing tuple (a few bytes for both) -- which is negligible compared to the cost of using actual strings directly in your code. Remember, an atom is basically an alias for a constant, so don't fret over it too much -- just don't start doing insane things like dynamically generating an arbitrary number of unique atoms!

But again, remember, we don't care how big this is in memory yet. What we care about is that it is easy to traverse and know what you mean when you do so, and also that it is easy to unambiguously cast the data from an old format to a new one (which is inherently easier with tagged data than with semantically ambiguous lists of lists).
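
To illustrate how tagged pairs make access unambiguous, here is a minimal sketch that looks a value up by its tag with lists:keyfind/3. The module name tagged and the tags used in the usage note (segment_id, set_id) are hypothetical placeholders, just like the junk tags above.

```erlang
-module(tagged).
-export([value/2]).

%% Look up the value stored under Tag in a list of {Tag, Value} pairs.
%% Returns {ok, Value}, or the atom error when the tag is absent.
value(Tag, Pairs) ->
    case lists:keyfind(Tag, 1, Pairs) of
        {Tag, Value} -> {ok, Value};
        false -> error
    end.
```

For example, tagged:value(set_id, [{segment_id, "OBX"}, {set_id, 17}]) returns {ok, 17}, and asking for a tag that is not present returns error instead of guessing.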

Obviously giving further advice would be a bit easier if I knew more about the meaning of the data itself, but anyway, the primary concern here is getting the logic right first. You can crunch things down later -- and most of the time that has much less to do with finding a "small" internal representation for those moments while you are actively handling some bits of data in memory and a LOT more to do with how you are going to index serialized data on disk (ETS/DETS/Mnesia may be a good route for that, or Postgres, or maps, whatever -- depending on what you're up to).

Hopefully I explained more than confused.

-Craig

Re: List Question

Juan Jose Comellas-3
In reply to this post by Andrew McIntyre
Andrew, if you want to store the data in a format that is as compact as possible, I'd recommend storing the HL7 message itself as a binary and parsing on demand. If you want to store the data pre-parsed, then I would store it as a list of segments where each segment is represented by a nested tuple. That way you can reference the fields, components, etc., by their index in an O(1) operation, and you can still easily add or remove segments from a message.
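
A minimal sketch of the nested-tuple idea, assuming each segment is a tuple of fields and a field with components is itself a tuple. The module name seg_tuple and the exact layout are assumptions made for illustration; this is not the format used by the parsers linked below.

```erlang
-module(seg_tuple).
-export([field/2, component/3]).

%% Return field N (1-based) of a segment tuple, or undefined if the
%% segment has fewer fields. element/2 is a constant-time lookup.
field(N, Segment)
  when is_tuple(Segment), is_integer(N), N >= 1, N =< tuple_size(Segment) ->
    element(N, Segment);
field(_N, _Segment) ->
    undefined.

%% Return component CompNo of field FieldNo. A plain (non-tuple) field
%% is treated as its own first component.
component(FieldNo, CompNo, Segment) ->
    case field(FieldNo, Segment) of
        Field when is_tuple(Field), is_integer(CompNo),
                   CompNo >= 1, CompNo =< tuple_size(Field) ->
            element(CompNo, Field);
        Field when not is_tuple(Field), CompNo =:= 1 ->
            Field;
        _ ->
            undefined
    end.
```

With the segment {<<"OBX">>, <<"17">>, {<<"8265-1">>, <<>>, <<"LN">>}}, component(3, 3, Segment) returns <<"LN">> and component(2, 1, Segment) returns <<"17">>, so an older client that expects a single value can still read the first component.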

What I'm describing is similar to the intermediate format used by an HL7 parser (https://github.com/jcomellas/ex_hl7) I wrote for Elixir. You could probably use it as inspiration for what you need. I had also created another parser in Erlang (https://github.com/jcomellas/ehl7) that maps the segments to records, but part of it is in C using NIFs.

Let me know if you have any other doubts.




Re: List Question

Richard A. O'Keefe-2
In reply to this post by Andrew McIntyre


On 8/08/17 12:13 AM, Andrew McIntyre wrote:
> Hello All,
>
> A Newbie question,
>
> Can I tell the difference Between a List of Strings and a Single
> String?

Not if it is an empty list.
If it is OK to treat an empty list of strings as
if it were a list containing one empty string,
you can use this:

classify([[_|_]|_]) -> list;
classify([[]|_]   ) -> list;
classify([C|_]    ) when is_integer(C) -> string;
classify([]       ) -> string. % GUESS

>
> I do not know in advance if its a single string or a list of strings,
> but want to behave differently

In this case you have a poor data structure choice.
You should be passing {string,S} or {string_list,L}
or something like that.  Or you might want to use
binaries instead of character lists.
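
As a sketch of the binary option: if every leaf string is stored as a binary, a value is either a binary or a list of values, and the two cases can no longer be confused. The module name first_value is made up for illustration, and returning <<>> for an empty list is an assumption about what the caller wants.

```erlang
-module(first_value).
-export([first/1]).

%% A leaf is a binary; anything else is a list of values.
%% Descend into the first element until a leaf is reached.
first(Bin) when is_binary(Bin) -> Bin;
first([Head | _]) -> first(Head);
first([]) -> <<>>.    % assumption: an empty list means an empty value
```

Both first(<<"FT">>) and first([<<"FT">>, <<"ST">>]) return <<"FT">>.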

> - actually return the first element of
> the list when there are multiple values.

In this particular case, it sounds as though

first_string([S|_]) when is_list(S) -> S;
first_string(S)    when is_list(S) -> S.

might do.

But seriously, one of the first rules in any Lispy
language, like Lisp, Scheme, Pop-2, Pop-11, Prolog,
or Erlang, is Don't Make Your Data Ambiguous.

It's OK to have a data structure that could, in
a different context, be interpreted differently,
but if you pass something to a function, it is
up to you to make sure the function gets all the
information it needs.



Re: List Question

Richard A. O'Keefe-2
In reply to this post by Andrew McIntyre


On 8/08/17 1:46 AM, Andrew McIntyre wrote:
> Hello Craig,
>
> Thanks for your help.
>
> I am trying to store the data as efficiently as possible.
What do you mean "efficiently"?

Do you mean "in the least SPACE possible"?
In that case you should consider using binaries instead
of strings.

Do you mean "taking the least TIME possible for operations"?
In that case, you need to tag your data so that the code
that carries out the operations KNOWS right away what it is
dealing with.

> Its HL7
> natively and this is my test:

hl7.org lists a lot of standards; being unfamiliar with
them all I don't know which you are referring to.  And
you have to sign up in order to read any of them, and I
dislike signing up for things when I don't know the
consequences of signing up.

>
> OBX|17|FT~TEST|8265-1^^LN&SUBCOMP|1&2&3&4|\H\Spot Image 2\N\||||||F
>
> |~^& are delimiters.

I take it that "|" is the first level delimiter (why the heck
are they not using the Information Separator characters ASCII
has specifically for this purpose?), ~ is the second level,
^ is the third level, and & is the fourth level.

Ah, wait:

<hl7 message> ::= <hl7 segment> {"\r" <hl7 segment>}...

<hl7 segment> ::= <hl7 field> {"|" <hl7 field>}...

<hl7 field> ::= <hl7 subfield> {"^" <hl7 subfield>}...

<hl7 subfield> ::= <hl7 subsubfield> {"&" <subsubfield>}...

<hl7 subsubfield> ::= <hl7 repeating> {"~" <hl7 repeating>}

<hl7 repeating> ::= {[^\r|^&~\\]|\.]}...

The first field of a segment appears to be a
message type identifier.

One data structure for this would be
-type message()     = list(segment()).
-type segment()     = list(field()).
-type field()       = list(subfield()).
-type subfield()    = list(subsubfield()).
-type subsubfield() = list(repeating()).
-type repeating()   = string().

[[[["OBX"]]], [[["17"]]], [[["FT", "TEST"]]],
 [[["8265-1"]], [[[]]], [["LN"], ["SUBCOMP"]]],
 [[["1"], ["2"], ["3"], ["4"]]],
 [[["\\H\\Spot Image 2\\N\\"]]],
 [[[""]]], [[[""]]], [[[""]]], [[[""]]], [[[""]]],
 [[["F"]]]]

This is completely unambiguous.

Now let's suppose we use a slightly different data structure.
-type hier() = string() | {many,list(hier())}.

We get
{many,["OBX","17",{many,["FT","TEST"]},
       {many,["8265-1","",{many,["LN","SUBCOMP"]}]},
       {many,["1","2","3","4"]},
       "\\H\\Spot Image 2\\N\\","","","","","","F"]}

Note that this is unambiguous but pays the price of a
{many,_} wrapper only when there is more than one subpart,
so the relative overhead is not so large.

first({many,[H|_]}) -> first(H);
first(X)           -> X.


> Currently a typical system might have 12 million of these records so
> want to keep format as small as possible in the erlang format,

The way to do that would be to compress each segment, keep the entire
segment as a binary, and only decompress and parse it when you need
to look at that particular segment.
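
A minimal sketch of the compress-and-keep-as-binary idea, using zlib:compress/1 and zlib:uncompress/1 from the standard library. The module name seg_store is made up for illustration. Whether compression actually saves space for segments this short is something to measure; very small inputs may not shrink at all.

```erlang
-module(seg_store).
-export([pack/1, unpack/1]).

%% Compress a raw segment (binary or iolist) for storage.
pack(Segment) ->
    zlib:compress(Segment).

%% Recover the raw segment as a binary; parse it only when needed.
unpack(Packed) ->
    zlib:uncompress(Packed).
```

Round-tripping a segment through pack/1 and unpack/1 gives back the original binary.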

A rough estimate of the size of this particular example is 108 words,
which is actually less than the string took.  Let's say that an
average message takes 200 words in the {many,_} | string() form.
32-bit:  9.6 GB, not going to fit.
64-bit: 19.2 GB, this machine has 16 GB, oh dear.

Even if we assume the average segment is 100 bytes,
12 million of those is 1.2 GB (excluding the cost
of a binary for each segment). That's definitely
doable.

So we're looking at three options, I think:
(1) Keeping segments as binaries, and parsing them every time
     you need to look inside.
(2) Keeping segments as Erlang data structures in a Mnesia
     table (that is, on disc).
(3) Streaming messages through an Erlang program instead of
     storing them all in memory at once.
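
Option (3) might look roughly like this. It is only a sketch: it
assumes one segment per line in a file, which real HL7 feeds may not
satisfy, and the module name is made up. Any read error other than
end-of-file simply crashes the caller.

```erlang
-module(hl7_stream).
-export([fold_file/3]).

%% Fold Fun(Line, Acc) over every line of the file at Path without
%% ever holding more than one line in memory.
-spec fold_file(file:filename(), fun((binary(), Acc) -> Acc), Acc) -> Acc.
fold_file(Path, Fun, Acc0) ->
    {ok, Dev} = file:open(Path, [read, binary, raw, read_ahead]),
    try
        fold_lines(Dev, Fun, Acc0)
    after
        file:close(Dev)
    end.

%% Read one line at a time until eof; each Line keeps its newline.
fold_lines(Dev, Fun, Acc) ->
    case file:read_line(Dev) of
        {ok, Line} -> fold_lines(Dev, Fun, Fun(Line, Acc));
        eof        -> Acc
    end.
```

Counting segments is then
hl7_stream:fold_file(Path, fun(_Line, N) -> N + 1 end, 0).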



> hence reluctant to tag 2 much, but know how to get value of interest. Maybe
> that is my non erlang background showing up? Traversing 4 small lists
> by index should be fast??
>
> I guess I could save strings as binary in the lists then is_binary
> should work?? Is that the case. I gather on 64bit system especially
> binary is more space efficient.
>
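
On the is_binary question: yes, if every leaf is stored as a binary
then a list can only ever mean "another level of nesting", and the two
cases can be told apart with a guard. A small sketch under that
assumption (the module name is hypothetical, and returning <<>> for an
empty level is a choice made here, not something HL7 dictates):

```erlang
-module(hl7_first).
-export([first/1]).

%% Leaves are binaries; any list is a further level of nesting.
-spec first(binary() | list()) -> binary().
first(Value) when is_binary(Value) -> Value;
first([Head | _])                  -> first(Head);
first([])                          -> <<>>.
```

So hl7_first:first(<<"FT">>) and hl7_first:first([<<"FT">>, <<"ST">>])
both return <<"FT">>.
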
> Monday, August 7, 2017, 10:53:11 PM, you wrote:
>
> z> On 2017年08月07日 月曜日 22:29:31 you wrote:
>>> Hello zxq9,
>>>
>>> Thanks, Unfortunately I do not know the value of the string that will
>>> be there. Its an extensible hierarchy that can be several lists deep -
>>> or not. Might need to revise the data structure
>
> z> In this case it can be useful to consider a way of tagging values.
>
> z> Imagine we want to represent a directory tree structure and have a
> z> descent-first traversal function recurse over it while creating the
> z> tree. We have two things that can happen, there is a flat list of
> z> new directories that need to be created, and there is the
> z> possibility that the tree depth extends deeper at each node.
>
> z> The naive version would look like what you have:
>
> z> ["top_dir_1",
> z>  "top_dir_2",
> z>  ["next_level_1",
> z>   "next_level_2"]]
>
> z> This leaves a bit to be desired, not only because of the problem
> z> you have pointed out that makes it difficult to know what is deep
> z> and what is shallow, but also because we don't really have a good
> z> way to represent a full tree (what would be the name of a directory containing other directories?).
>
> z> So consider instead something like this:
>
> z> [{"top_dir_1", []},
> z>  {"top_dir_2", []},
> z>  {"top_dir_3",
> z>   [{"next_level_1", []},
> z>    {"next_level_2", []}]}]
>
> z> Now we have a representation of each directory's name AND its contents.
>
> z> We can traverse this laterally AND in depth without any ambiguity
> z> or need for carrying around a record of where we have been (by
> z> using depth recursion and tail-call recursion):
>
>
> z> make_tree([{Dir, Contents} | Rest]) ->
> z>     ok =
> z>         case filelib:is_dir(Dir) of
> z>             true ->
> z>                 ok;
> z>             false ->
> z>                 ok = log(info, "Creating dir: ~p", [Dir]),
> z>                 file:make_dir(Dir)
> z>         end,
> z>     ok = file:set_cwd(Dir),
> z>     ok = make_tree(Contents),
> z>     ok = file:set_cwd(".."),
> z>     make_tree(Rest);
> z> make_tree([]) ->
> z>     ok.
>
>
> z> Not so bad.
>
> z> In your case we could represent things perhaps a bit better by
> z> separating the types and tagging them. Instead of just "FT" and
> z> whatever other string labels you might want, you could either use
> z> atoms (totally unambiguous) or tuples as we have in the example
> z> above (also totally unambiguous). I prefer tuples, though, because they are easier to read.
>
> z> [{value, "foo"},
> z>  {tree,
> z>   [{value, "bar"},
> z>    {value, "foo"}]},
> z>  {value, "baz"}]
>
>
> z> So then we do something like:
>
>
> z> traverse([{value, Value} | Rest]) ->
> z>    ok = do_thing(Value),
> z>    traverse(Rest);
> z> traverse([{tree, Contents} | Rest]) ->
> z>    ok = traverse(Contents),
> z>    traverse(Rest);
> z> traverse([]) ->
> z>    ok.
>
>
> z> Anyway, don't be afraid of varying your value types to say exactly
> z> what you mean. If your strings like "FT" only had meaning within
> z> your system consider NOT USING STRINGS, and using atoms instead. That makes it even easier:
>
>
> z> [foo,
> z>  bar,
> z>  [foo,
> z>   bar],
> z>  foo]
>
>
> z> So then we can do:
>
>
> z> traverse([foo | Rest]) ->
> z>     ok = do_foo(),
> z>     traverse(Rest);
> z> traverse([bar | Rest]) ->
> z>     ok = do_bar(),
> z>     traverse(Rest);
> z> traverse([Value | Rest]) when is_list(Value) ->
> z>     ok = traverse(Value),
> z>     traverse(Rest);
> z> traverse([]) ->
> z>     ok.
>
>
> z> And of course, you can not use a guard if you want to match on a
> z> list shape in the listy clause there, but that is a minor detail.
> z> The point is to make your data types MEAN SOMETHING REASONABLE
> z> within your system. Use atoms when your values are meaningful only
> z> within your system. Strings are for the birds.
>
> z> -Craig
> z> _______________________________________________
> z> erlang-questions mailing list
> z> [hidden email]
> z> http://erlang.org/mailman/listinfo/erlang-questions
>
>
>
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: List Question

Joe Armstrong-2
In reply to this post by Andrew McIntyre
Hello,

I'm going to go way off topic here and not answer your specific
question about lists ...

Your last mail had the information I need - you're trying to parse HL7.
I have a few comments.

1) Your original question did not bother to mention what problem you
   were trying to solve. You asked about a sub-problem that you
   encountered when trying to solve your principal problem
   (principal problem = parse HL7) (sub-problem = differentiate lists)

2) It's *always* a good idea to ask questions about the principal
   problem first !!!!

I didn't know what HL7 was - my immediate thought was
'I wonder if anybody has written a *proper* HL7 parser in Erlang' - by
proper I mean "has expended a significant amount of thought on writing a parser"

Google is your friend - It told me what HL7 was (I hadn't a clue here
- "never heard of it") and it turned up a parser in Elixir

    https://github.com/jcomellas/ex_hl7

From the quality of the documentation I assume this is a *proper*
implementation.

Now Elixir compiles to .beam files and can be called from Erlang -
which raises another sub-problem, "how do I compile the Elixir code and
call it from Erlang", and prompts the question "is this effort worthwhile?"

Given that a parser for HL7 exists in Elixir it might be sensible to
use it "off the shelf"

I have a feeling that Elixir folks are good at reusing Erlang code -
but that reuse in the opposite direction is less easy.

The last time I fiddled a bit (yesterday as it happened) - it turned
out to be less than blindingly obvious how to call other than trivial
Elixir code from Erlang.
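
For what it's worth, the basic calling convention can be sketched like
this. Elixir modules are ordinary BEAM modules whose names are atoms
starting with 'Elixir.', and Elixir strings are binaries. The sketch
uses String.upcase from Elixir's standard library rather than anything
from ex_hl7, and it assumes Elixir's compiled modules are on the code
path; the wrapper module name is made up.

```erlang
-module(call_elixir).
-export([upcase/1]).

%% Call Elixir's String.upcase/1 if Elixir is available on the code
%% path, otherwise report that it is missing instead of crashing.
-spec upcase(binary()) -> {ok, binary()} | {error, term()}.
upcase(Bin) when is_binary(Bin) ->
    case code:ensure_loaded('Elixir.String') of
        {module, _} ->
            {ok, 'Elixir.String':upcase(Bin)};
        {error, Reason} ->
            {error, {elixir_not_available, Reason}}
    end.
```

With Elixir on the path, call_elixir:upcase(<<"ft">>) returns
{ok, <<"FT">>}. The less obvious parts are mostly about data: Elixir
structs show up as maps with a '__struct__' key, and most Elixir APIs
expect binaries where Erlang code tends to pass lists.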

I was also wondering about cross-compilation. Has anybody written
something that turns Erlang code into Elixir source code or vice versa?

Cheers

/Joe





On Mon, Aug 7, 2017 at 3:46 PM, Andrew McIntyre
<[hidden email]> wrote:

> Hello Craig,
>
> Thanks for your help.
>
> I am trying to store the data as efficiently as possible. Its HL7
> natively and this is my test:
>
> OBX|17|FT~TEST|8265-1^^LN&SUBCOMP|1&2&3&4|\H\Spot Image 2\N\||||||F
>
> |~^& are delimiters. The hierarchy is only so deep and using lists of
> lists to provide a tree like way to access the data eg Field 3, repeat
> 1 component 2 subcomponent1
>
> Parsed it looks like this:
>
> [["OBX","17",
>   ["FT","TEST"],
>   [["8265-1",[],["LN","SUBCOMP"]]],
>   [[["1","2","3","4"]]],
>   "\\H\\Spot Image 2\\N\\",[],[],[],[],[],"F"]]
>
> As the format evolves over time the hierarchy can be extended, but
> older clients can still read the value they are expecting if they
> follow the rules, like reading the first value in the list when you
> only expect one value to be there.
>
> Currently a typical system might have 12 million of these records so
> want to keep format as small as possible in the erlang format, hence
> reluctant to tag 2 much, but know how to get value of interest. Maybe
> that is my non erlang background showing up? Traversing 4 small lists
> by index should be fast??
>
> I guess I could save strings as binary in the lists then is_binary
> should work?? Is that the case. I gather on 64bit system especially
> binary is more space efficient.
>

Re: List Question

Oleksii Semilietov
I use parse_trans_pp from Ulf Wiger's parse_trans library (https://github.com/uwiger/parse_trans) to get Erlang code from compiled Elixir modules.

The common technique is to compile the Elixir project and then run
escript ~/projects/parse_trans/ebin/parse_trans_pp.beam _build/dev/lib/some_project/ebin/module_name.beam > module_name.erl

The output will be an Erlang module.
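
The same kind of conversion can be sketched with plain OTP tools, by
reading the abstract code out of the .beam file and pretty-printing it.
This is a sketch, not what parse_trans_pp does internally: the module
name beam2erl is made up, and whether beam_lib can recover abstract
code from an Elixir-compiled module depends on the OTP and Elixir
versions and on Elixir's compiler modules being on the code path, which
I have not verified.

```erlang
-module(beam2erl).
-export([source/1]).

%% Return pretty-printed Erlang source for a .beam file, or an error
%% if the file cannot be read or carries no usable debug_info.
-spec source(file:filename()) -> {ok, string()} | {error, term()}.
source(BeamPath) ->
    case beam_lib:chunks(BeamPath, [abstract_code]) of
        {ok, {_Mod, [{abstract_code, no_abstract_code}]}} ->
            {error, no_debug_info};
        {ok, {_Mod, [{abstract_code, {_Vsn, Forms}}]}} ->
            Tree = erl_syntax:form_list(Forms),
            {ok, lists:flatten(erl_prettypr:format(Tree))};
        {error, beam_lib, Reason} ->
            {error, Reason}
    end.
```

Writing the result out is then a matter of
{ok, Src} = beam2erl:source(Beam), file:write_file("module_name.erl", Src).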


Does anybody have other ways?


--
Best regards,
Alex [Oleksii Semilietov]



Re: List Question

Michael Truog
In reply to this post by Joe Armstrong-2
On 08/08/2017 01:00 AM, Joe Armstrong wrote:
> The last time I fiddled a bit (yesterday as it happened) - it turned
> out to be less than
> blindingly obvious how to call other than trivial elixir code from erlang.
>
> I was also wondering about cross-compilation. Has anybody written
> something that turns
> erlang code into elixir source code or vice. versa.

I created a script ex2erl (https://github.com/okeuday/reltool_util/blob/master/ex2erl) to convert a single Elixir module into multiple Erlang modules.  Then it is easy to include Elixir source code into an Erlang project without creating a dependency on Elixir for all use of the Erlang project.  The ex2erl script requires an Elixir installation alongside the Erlang installation and depends on the Elixir module having the debug_info in the beam output which is done by default.

Best Regards,
Michael

Re: List Question

Jonáš Trantina
In reply to this post by Joe Armstrong-2

Hi Joe,

I recently investigated this topic and found the following rebar3 plugin you can use (if you are using rebar3): https://github.com/barrel-db/rebar3_elixir_compile. I think it supports both your own Elixir code and Elixir dependencies, and it builds all of them automatically so you can use them from your Erlang code. Using it from Erlang shouldn't be a problem once you know some Elixir basics (at least how the types are mapped between Elixir and Erlang). I believe there is a plugin for the old rebar too.

While I tried the above plugin and it seemed to work fine (on a smallish project), I found that using mix (Elixir's build tool) worked much better. Since mix supports both Erlang and Elixir out of the box, it seemed more natural for mixed projects (again, tested only on a smallish test project; more info can be found at https://hexdocs.pm/mix/Mix.Tasks.Compile.Erlang.html#content).

Regards,
Jonas Trantina


Re: List Question

Andrew McIntyre
In reply to this post by Joe Armstrong-2
Hello Joe,

Thanks for your thoughts. I have been following Erlang since your first
book, but I have 20 years of OO Delphi code that parses HL7, and I was
trying to learn Erlang by implementing, in a functional way, something I
am very familiar with in an OO way. I am sure it will take a while to get
the hang of the community and of functional programming. I have a bad
habit of trying to implement everything myself, and HL7 has nuances that
make backward and forward compatibility quite good, and that starts in
the parser. We have also done it in Elixir, but there is a little too
much magic and too many syntax inconsistencies in Elixir for my taste. I
guess that's because I am used to very readable, explicit Pascal code,
whereas Elixir appeals to people coming from Ruby. In the end it runs on
the BEAM, and that is what attracts me!

I will try to explain my questions better. Thanks for all the responses.

Andrew


Tuesday, August 8, 2017, 6:00:59 PM, you wrote:

JA> Hello,

JA> I'm going to go way off topic here and not answer your specific
JA> question about lists ...

JA> Your last mail had the information I need - you're trying to parse HL7.
JA> I have a few comments.

JA> 1) Your original question did not bother to mention what problem you
JA>    were trying to solve. You asked about a sub-problem that you
JA>    encountered when trying to solve your principal problem
JA>    (principal problem = parse HL7) (sub-problem = differentiate lists)

JA> 2) It's *always* a good idea to ask questions about the principal
JA>    problem first !!!!

JA> I didn't know what HL7 was - my immediate thought was
JA> 'I wonder if anybody has written a *proper* HL7 parser in Erlang' - by
JA> proper I mean "has expended a significant amount of thought on writing a parser"

JA> Google is your friend - It told me what HL7 was (I hadn't a clue here
JA> - "never heard of it")
JA> and it turned up a parser in elixir

JA>     https://github.com/jcomellas/ex_hl7

JA> From the quality of the documentation I assume this is a *proper*
JA> implementation.

JA> Now elixir compiles to .beam files and can be called from Erlang -
JA> which raises another sub-problem, "how do I compile the elixir code
JA> and call it from Erlang", and begs the question "is this effort
JA> worthwhile"

JA> Given that a parser for HL7 exists in elixir it might be sensible to
JA> use it "off the shelf"

JA> I have a feeling that elixir folks are good at reusing erlang code -
JA> but that reuse in the opposite direction is less easy.

JA> The last time I fiddled a bit (yesterday as it happened) - it turned
JA> out to be less than blindingly obvious how to call other than trivial
JA> elixir code from erlang.

JA> I was also wondering about cross-compilation. Has anybody written
JA> something that turns erlang code into elixir source code or vice versa?

JA> Cheers

JA> /Joe





JA> On Mon, Aug 7, 2017 at 3:46 PM, Andrew McIntyre
JA> <[hidden email]> wrote:

>> Hello Craig,
>>
>> Thanks for your help.
>>
>> I am trying to store the data as efficiently as possible. It's HL7
>> natively and this is my test:
>>
>> OBX|17|FT~TEST|8265-1^^LN&SUBCOMP|1&2&3&4|\H\Spot Image 2\N\||||||F
>>
>> |~^& are delimiters. The hierarchy is only so deep, and I am using lists
>> of lists to provide a tree-like way to access the data, eg Field 3,
>> repeat 1, component 2, subcomponent 1
>>
>> Parsed it looks like this:
>>
>> [["OBX","17",
>>   ["FT","TEST"],
>>   [["8265-1",[],["LN","SUBCOMP"]]],
>>   [[["1","2","3","4"]]],
>>   "\\H\\Spot Image 2\\N\\",[],[],[],[],[],"F"]]
>>
>> As the format evolves over time the hierarchy can be extended, but
>> older clients can still read the value they are expecting if they
>> follow the rules, like reading the first value in the list when you
>> only expect one value to be there.
>>
>> Currently a typical system might have 12 million of these records, so I
>> want to keep the erlang representation as small as possible, hence I am
>> reluctant to tag too much, but I still need to know how to get the value
>> of interest. Maybe that is my non-erlang background showing up?
>> Traversing 4 small lists by index should be fast??
>>
>> I guess I could save the strings as binaries in the lists, and then
>> is_binary should work?? Is that the case? I gather that, on a 64-bit
>> system especially, a binary is more space efficient.
>>
>> Monday, August 7, 2017, 10:53:11 PM, you wrote:
>>
>> z> On 2017年08月07日 月曜日 22:29:31 you wrote:
>>>> Hello zxq9,
>>>>
>>>> Thanks. Unfortunately I do not know the value of the string that will
>>>> be there. It's an extensible hierarchy that can be several lists deep -
>>>> or not. I might need to revise the data structure.
>>
>> z> In this case it can be useful to consider a way of tagging values.
>>
>> z> Imagine we want to represent a directory tree structure and have a
>> z> descent-first traversal function recurse over it while creating the
>> z> tree. We have two things that can happen, there is a flat list of
>> z> new directories that need to be created, and there is the
>> z> possibility that the tree depth extends deeper at each node.
>>
>> z> The naive version would look like what you have:
>>
>> z> ["top_dir_1",
>> z>  "top_dir_2",
>> z>  ["next_level_1",
>> z>   "next_level_2"]]
>>
>> z> This leaves a bit to be desired, not only because of the problem
>> z> you have pointed out that makes it difficult to know what is deep
>> z> and what is shallow, but also because we don't really have a good
>> z> way to represent a full tree (what would be the name of a directory containing other directories?).
>>
>> z> So consider instead something like this:
>>
>> z> [{"top_dir_1", []},
>> z>  {"top_dir_2", []},
>> z>  {"top_dir_3",
>> z>   [{"next_level_1", []},
>> z>    {"next_level_2", []}]}]
>>
>> z> Now we have a representation of each directory's name AND its contents.
>>
>> z> We can traverse this laterally AND in depth without any ambiguity
>> z> or need for carrying around a record of where we have been (by
>> z> using depth recursion and tail-call recursion):
>>
>>
>> z> make_tree([{Dir, Contents} | Rest]) ->
>> z>     ok =
>> z>         case filelib:is_dir(Dir) of
>> z>             true ->
>> z>                 ok;
>> z>             false ->
>> z>                 ok = log(info, "Creating dir: ~p", [Dir]),
>> z>                 file:make_dir(Dir)
>> z>         end,
>> z>     ok = file:set_cwd(Dir),
>> z>     ok = make_tree(Contents),
>> z>     ok = file:set_cwd(".."),
>> z>     make_tree(Rest);
>> z> make_tree([]) ->
>> z>     ok.
>>
>>
>> z> Not so bad.
>>
>> z> In your case we could represent things perhaps a bit better by
>> z> separating the types and tagging them. Instead of just "FT" and
>> z> whatever other string labels you might want, you could either use
>> z> atoms (totally unambiguous) or tuples as we have in the example
>> z> above (also totally unambiguous). I prefer tuples, though, because they are easier to read.
>>
>> z> [{value, "foo"},
>> z>  {tree,
>> z>   [{value, "bar"},
>> z>    {value, "foo"}]},
>> z>  {value, "baz"}]
>>
>>
>> z> So then we do something like:
>>
>>
>> z> traverse([{value, Value} | Rest]) ->
>> z>    ok = do_thing(Value),
>> z>    traverse(Rest);
>> z> traverse([{tree, Contents} | Rest]) ->
>> z>    ok = traverse(Contents),
>> z>    traverse(Rest);
>> z> traverse([]) ->
>> z>    ok.
>>
>>
>> z> Anyway, don't be afraid of varying your value types to say exactly
>> z> what you mean. If your strings like "FT" only had meaning within
>> z> your system consider NOT USING STRINGS, and using atoms instead. That makes it even easier:
>>
>>
>> z> [foo,
>> z>  bar,
>> z>  [foo,
>> z>   bar],
>> z>  foo]
>>
>>
>> z> So then we can do:
>>
>>
>> z> traverse([foo | Rest]) ->
>> z>     ok = do_foo(),
>> z>     traverse(Rest);
>> z> traverse([bar | Rest]) ->
>> z>     ok = do_bar(),
>> z>     traverse(Rest);
>> z> traverse([Value | Rest]) when is_list(Value) ->
>> z>     ok = traverse(Value),
>> z>     traverse(Rest);
>> z> traverse([]) ->
>> z>     ok.
>>
>>
>> z> And of course, you can leave the guard out if you would rather match on a
>> z> list shape in the listy clause there, but that is a minor detail.
>> z> The point is to make your data types MEAN SOMETHING REASONABLE
>> z> within your system. Use atoms when your values are meaningful only
>> z> within your system. Strings are for the birds.
>>
>> z> -Craig

Re: List Question

Joe Armstrong-2
On Tue, Aug 8, 2017 at 11:52 AM, Andrew McIntyre
<[hidden email]> wrote:
> Hello Joe,
>
> Thanks for your thoughts. I have been following erlang since your first
> book, but I have 20 years of OO Delphi code that parses HL7, and I was
> trying to learn erlang by taking something I am very familiar with in an
> OO way and implementing it in a functional way. I am sure it will take a
> while to get the hang of the community and of functional programming.
>
> I have a bad habit of
> trying to implement everything myself,

I have this too - I think it's a very good habit. There are only a few
ways to *really* learn things - implement it yourself, teach it, write
a book about it.

/Joe




Re: List Question

Joe Armstrong-2
In reply to this post by Michael Truog
Thanks - great fun. Makes for an easy way to learn Elixir. Just
transpile into Erlang and read the code :-)

/Joe

On Tue, Aug 8, 2017 at 10:48 AM, Michael Truog <[hidden email]> wrote:

> On 08/08/2017 01:00 AM, Joe Armstrong wrote:
>>
>> The last time I fiddled a bit (yesterday as it happened) - it turned
>> out to be less than blindingly obvious how to call other than trivial
>> elixir code from erlang.
>>
>> I was also wondering about cross-compilation. Has anybody written
>> something that turns erlang code into elixir source code or vice versa?
>
>
> I created a script ex2erl
> (https://github.com/okeuday/reltool_util/blob/master/ex2erl) to convert a
> single Elixir module into multiple Erlang modules.  Then it is easy to
> include Elixir source code into an Erlang project without creating a
> dependency on Elixir for all use of the Erlang project.  The ex2erl script
> requires an Elixir installation alongside the Erlang installation and
> depends on the Elixir module having the debug_info in the beam output which
> is done by default.
>
> Best Regards,
> Michael
Re: List Question

Richard A. O'Keefe-2
In reply to this post by Andrew McIntyre
From what I've been able to glean about the HL7 message
format, there are two aspects:
* the basic syntax is the multi-level delimited thingy but
* there is also *semantics*, predefined message types,
  an assortment of data types, rules for mapping data
  types to trees and so on.

From a quick look at the Elixir HL7 parser, they have
taken steps to handle some (but perhaps not all) of the
*semantics* of HL7 and don't just give a tree of strings,
but more structured data.

Parsing the multi-level delimited syntax is trivial.
Dealing with the semantics is not.
I think that in figuring out for yourself how to work
with HL7 messages in Erlang, the starting point would
be
 - what message types do you want to handle?
 - what kinds of data occur in them?
 - how do you want to represent those kinds of
   data in Erlang?
 - do you actually want to represent *every* field
   at all?  Some might not be relevant to you.
 - would you be streaming messages through a
   system (like some sort of pub/sub queueing
   middleware), summarising messages, storing
   them, or what?
 - what does a type declaration for a message type
   look like in HL7?  Is there some way to automatically
   derive parsing code from that?
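
As a rough illustration of how small the purely syntactic half can be, here
is a minimal sketch that splits one segment on the default delimiters
| ~ ^ & into a tree of uniform depth. The module and function names are
made up for this example, and it deliberately ignores escape sequences
such as \H\ and the special handling the MSH segment needs (MSH carries
the delimiter characters themselves), so treat it as a starting point
rather than a parser.

```erlang
-module(hl7_sketch).
-export([parse_segment/1]).

%% Hypothetical helper, not taken from any existing library.
%% Splits one HL7 v2 segment into
%%   fields > repeats > components > subcomponents
%% Every field has the same nesting depth and every leaf is a binary,
%% so is_binary/1 is enough to tell a value from a list of values.
%% Assumes the default delimiters and does no escape processing.
parse_segment(Segment) when is_binary(Segment) ->
    [split(Field, [<<"~">>, <<"^">>, <<"&">>])
     || Field <- binary:split(Segment, <<"|">>, [global])].

%% Split on the first delimiter, then recurse with the remaining ones.
split(Bin, []) ->
    Bin;
split(Bin, [Delim | Deeper]) ->
    [split(Part, Deeper) || Part <- binary:split(Bin, Delim, [global])].
```

Keeping the depth uniform costs a few extra list cells per field, but it
removes the ambiguity between a single string and a list of strings that
started this thread.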

What I'm getting at with the last point is that there
is ASN.1 support for Erlang.  Give it an ASN.1
definition, and you get Erlang code out the other end.
I am particularly thinking of the PADS project:

PADS: Processing Arbitrary Data Streams

Kathleen Fisher and Bob Gruber, AT&T Labs

Slides in ppt http://homepages.inf.ed.ac.uk/wadler/xmlbinding/

Transactional data streams, such as sequences of stock-market buy/sell orders, credit-card purchase records, web server entries, and electronic fund transfer orders, can be mined very profitably. As an example, researchers at AT&T have built customer profiles from streams of call-detail records to significant financial effect.

Often such streams are high-volume: AT&T's call-detail stream contains roughly 300 million calls per day requiring approximately 7GBs of storage space. Typically, such stream data arrives ``as is'' in ad hoc formats with poor documentation. In addition, the data frequently contains errors. The appropriate response to such errors is application-specific. Some applications can simply discard unexpected or erroneous values and continue processing. For other applications, however, errors in the data can be the most interesting part of the data.

Understanding a new data source and producing a suitable parser are crucial first steps in any use of such data. Unfortunately, writing parsers for this kind of data is a difficult task, both tedious and error-prone. It is complicated by lack of documentation, convoluted encodings designed to save space, the need to handle errors robustly, and the need to produce efficient code to cope with the scale of the stream. Often, the hard-won understanding of the data ends up embedded in parsing code, making long-term maintenance difficult for the original writer and sharing the knowledge with others nearly impossible.

The goal of the PADS project is to provide languages and tools for simplifying data processing. We have a preliminary design of a declarative data-description language, PADSL, expressive enough to describe the data feeds we see at AT&T in practice, including ASCII, binary, EBCDIC, Cobol, and mixed data formats. From PADSL we generate a tunable C library with functions for parsing, manipulating, and summarizing the data. In joint work with Mary Fernandez and Ricardo Medel, we are working to integrate PADS and XQuery to support declarative querying of data sources with PADS descriptions.

--------------------
The PADS project moved from AT&T to
http://pads.cs.tufts.edu/doc.html

I say this in all seriousness: if I had a need to process
GB of HL7 data, I would start by seeing if PADS was adequate
to describe it, and if so I'd write an HL7->Erlang data
translator in C or ML (as PADS has C and ML versions).
If not, I'd see what ideas I could steal from PADS.

Using a declarative data language to describe the message types
I was interested in would be an up-front cost, but it would
hugely simplify later maintenance.
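
To make the idea of a declarative description a little more concrete in
Erlang terms, here is a small sketch, not PADS and not an existing
library. The module name, the field names and the choice of a map are all
assumptions made for illustration; the OBX field numbers follow the usual
HL7 v2 numbering, and only a handful are listed.

```erlang
-module(hl7_decl).
-export([fields/1, read/2]).

%% Hypothetical declarative table: segment id -> (field name -> field number).
%% Only a few OBX fields are listed here, for illustration.
fields(<<"OBX">>) ->
    #{set_id => 1, value_type => 2, observation_id => 3, result_status => 11};
fields(_Other) ->
    #{}.

%% Look a named field up in a parsed segment. The segment is assumed to be
%% a list whose head is the segment id and whose tail holds the fields in
%% order. A field that is declared but missing from a short segment reads
%% as empty.
read(Name, [SegId | Fields]) ->
    case maps:find(Name, fields(SegId)) of
        {ok, N} when N =< length(Fields) -> {ok, lists:nth(N, Fields)};
        {ok, _} -> {ok, <<>>};
        error -> {error, unknown_field}
    end.
```

The point of the design is that knowledge about a segment lives in one
table, so adding a field means adding a map entry rather than writing
another accessor by hand.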



Re: List Question

Richard A. O'Keefe-2

PS: one amazing thing with PADS is LearnPADS.
You give it a bunch of data and it tries to figure
out the format for itself.
It would be very interesting to see what LearnPADS
made of a bunch of HL7 data.

Re: List Question

Andrew McIntyre
In reply to this post by Richard A. O'Keefe-2
Hello Richard,

The OO implementation we have is in 2 parts:

1. Parse into a tree with no knowledge of semantics

2. Use a specific object to provide an interface to the data based on
its type


I am involved in the standards process and there is a blog post here for
anyone interested:

https://kb.medical-objects.com.au/display/PUB/HL7v2+parsing


Essentially I plan to do the same in erlang:

1. Parse into a tree

2. Create a module that has functions for reading the known values for a
specific segment - these are autogenerated, eg have a msh.erl that has
a msh:sending_facility() function (lower case, since a name starting
with a capital is a variable in erlang)

3. Use reader functions that allow the message to become more complex, but
permit old functions to continue to work
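
A minimal sketch of what such a reader could look like, assuming the leaves
of the tree are stored as binaries so that is_binary/1 can tell a value
from a list of values. The module name and both function names are
hypothetical, and value_type/1 only stands in for the kind of generated
accessor described in step 2.

```erlang
-module(hl7_read).
-export([first_value/1, value_type/1]).

%% Descend through first elements until a leaf (a binary) is reached.
%% An older reader that expects one value keeps working when the sender
%% later adds repeats, components or subcomponents at that position.
first_value(Bin) when is_binary(Bin) ->
    Bin;
first_value([First | _]) ->
    first_value(First);
first_value([]) ->
    <<>>.

%% Hypothetical accessor for the second field after the segment id,
%% written the way a generated per-segment function might be.
value_type([_SegId, _SetId, ValueType | _]) ->
    first_value(ValueType);
value_type(_ShortSegment) ->
    <<>>.
```

With this shape, value_type/1 returns <<"FT">> whether the field was
parsed as <<"FT">> or as [<<"FT">>, <<"TEST">>].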


Andrew
