remsh failing silently

remsh failing silently

Ben Hood
Hi,

I'm failing to invoke remsh on an OTP 17.5.6.10 build (cross compiled
for ppc64).

Kicking off the first node:

$ erl -sname a -setcookie 1
Eshell V6.4.1.7  (abort with ^G)
(a@icap5226)1>

I try to connect to it from a second node (to no avail):

$ erl -sname b -remsh a@icap5226 -setcookie 1
Eshell V6.4.1.7  (abort with ^G)
(b@icap5226)1> nodes().
[]
(b@icap5226)2>

The port mapper indicates that both nodes registered with it:

$ epmd -names
epmd: up and running on port 4369 with data:
name a at port 46201
name b at port 35447
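
A note on reading this output: registration with epmd only shows that each node has started distribution and published its port. It does not mean the two nodes are connected, since Erlang nodes normally connect to each other on first contact. A minimal sketch of checking registration from a script, assuming the "name <node> at port <port>" line format shown above (the function name epmd_has_node is made up for illustration):

```shell
#!/bin/sh
# Succeeds if the given short node name appears in `epmd -names` output
# read from standard input.
epmd_has_node() {
    # Lines of interest look like: "name a at port 46201"
    awk -v want="$1" '$1 == "name" && $2 == want { found = 1 } END { exit !found }'
}

# Example (only meaningful on a host where epmd is installed and running):
#   epmd -names | epmd_has_node a && echo "a is registered"
```

A successful check here says nothing about nodes(); that list stays empty until one node actually contacts the other.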

However, if I actively ping the first node from the second, the
ping succeeds and I assume that the net_kernel subsystem has joined
the nodes:

(b@icap5226)2> net_adm:ping('a@icap5226').
pong
(b@icap5226)3> nodes().
[a@icap5226]
(b@icap5226)4>

So I'm left wondering why the remsh argument can't trigger the same
node join when the second Erlang process is started.

The same invocation works perfectly well on more up to date versions
of OTP, but I'm prevented from just doing a large version upgrade.

Is there some way to get the runtime to output verbose diagnostics to
try to debug this?

TIA,

Ben
_______________________________________________
erlang-questions mailing list
[hidden email]
http://erlang.org/mailman/listinfo/erlang-questions

Re: remsh failing silently

Ben Hood
On Sat, Dec 29, 2018 at 9:42 AM T Ty <[hidden email]> wrote:
> 3. In b@icap5226 shell hit Ctrl-G
> 4. At the --> prompt type
>        r 'a@icap5226' <Enter>
>        c 2 <Enter>
>
> If that all works then there should be no reason for the remsh not to work.

Many thanks for the heads up, much appreciated.

Unfortunately on the ppc machine Ctrl-G is not being interpreted (but
Ctrl-C is):

$ erl -sname a -setcookie 1
Eshell V6.4.1.7  (abort with ^G)
(a@icap5226)1> ^G
(a@icap5226)1> ^C
BREAK: (a)bort (c)ontinue (p)roc info (i)nfo (l)oaded
       (v)ersion (k)ill (D)b-tables (d)istribution

So I guess I'll need to get to the bottom of this symptom first.

For background context, the OTP 17.5.6.10 for ppc64 was cross compiled
and statically linked against libncurses - potentially the statically
linked libncurses was not packaged correctly which leads to terminal
character issues?
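
One way to separate the terminal emulator from the Erlang build is to check whether Ctrl-G reaches an ordinary process on the target as a BEL byte (0x07). A rough sketch using only tr and wc; the function name contains_bel is hypothetical:

```shell
#!/bin/sh
# Succeeds if standard input contains at least one BEL byte (0x07),
# which is the byte Ctrl-G is expected to send.
contains_bel() {
    # Delete every byte except BEL, then count what is left.
    count=$(tr -cd '\007' | wc -c)
    [ $((count)) -gt 0 ]
}

# Interactive use: run the line below, press Ctrl-G, Enter, then Ctrl-D.
#   contains_bel && echo "BEL received"
```

If this succeeds in the same session where erl ignores ^G, the byte is getting through and the terminal emulator is probably not the culprit.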

Re: remsh failing silently

Mikael Pettersson-5
On Sat, Dec 29, 2018 at 3:33 PM Ben Hood <[hidden email]> wrote:

>
> On Sat, Dec 29, 2018 at 9:42 AM T Ty <[hidden email]> wrote:
> > 3. In b@icap5226 shell hit Ctrl-G
> > 4. At the --> prompt type
> >        r 'a@icap5226' <Enter>
> >        c 2 <Enter>
> >
> > If that all works then there should be no reason for the remsh not to work.
>
> Many thanks for the heads up, much appreciated.
>
> Unfortunately on the ppc machine Ctrl-G is not being interpreted (but
> Ctrl-C is):
>
> $ erl -sname a -setcookie 1
> Eshell V6.4.1.7  (abort with ^G)
> (a@icap5226)1> ^G
> (a@icap5226)1> ^C
> BREAK: (a)bort (c)ontinue (p)roc info (i)nfo (l)oaded
>        (v)ersion (k)ill (D)b-tables (d)istribution
>
> So I guess I'll need to get to the bottom of this symptom first.
>
> For background context, the OTP 17.5.6.10 for ppc64 was cross compiled
> and statically linked against libncurses - potentially the statically
> linked libncurses was not packaged correctly which leads to terminal
> character issues?

For the record, I just built OTP 17.5.6.10 natively on
powerpc64-unknown-linux-gnu, and ^G works just fine there in the
Erlang shell.

I'd have to suspect your cross compilation setup.  Could be some
./configure test not doing the right thing in the cross compile case.

Re: remsh failing silently

Ben Hood
In reply to this post by Ben Hood
On Sun, Dec 30, 2018 at 9:10 AM T Ty <[hidden email]> wrote:
>
> How are you accessing the console ? Via telnet ? Serial cable ? Possibly your terminal type is not set correctly. Some apps like putty / iterm interpret certain keystrokes for you. Real pain for emacs programmers :D

I'm connecting using Putty and Mobaxterm from a Windows 7 Citrix
instance - I suspect the terminal emulation settings are not the
greatest and I have little control over it.

However, I do have an OTP 17.4 build which was compiled natively for ppc -
on the same machine, using the same shell, this version is able to
interpret Ctrl-G and bring up the user switch command.

So this is why I believe there is an issue in the way that I cross
compiled and statically linked ncurses on the x86 build agent.

Re: remsh failing silently

Ameretat Reith
Can you check $TERM and whether terminfo is known to ncurses? Can you connect with TERM=xterm?
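
A rough way to check the second point without relying on any particular tool being installed: compiled terminfo entries usually live in a file named after the terminal, inside a subdirectory named after its first letter (for example x/xterm). That layout is an assumption; some systems use hex-named subdirectories or a hashed database instead. The function name and the directories passed to it below are only illustrative:

```shell
#!/bin/sh
# Prints the path of the compiled terminfo entry for a terminal name,
# searching the given database directories in order; fails if none has it.
find_terminfo_entry() {
    term=$1
    shift
    # Entries are assumed to be stored as <dir>/<first letter>/<name>.
    first=$(printf '%s' "$term" | cut -c1)
    for dir in "$@"; do
        if [ -f "$dir/$first/$term" ]; then
            printf '%s\n' "$dir/$first/$term"
            return 0
        fi
    done
    return 1
}

# Example:
#   find_terminfo_entry "$TERM" "$TERMINFO" "$HOME/.terminfo" /usr/share/terminfo
```

A statically linked ncurses searches the directories compiled into it, which may differ from those the system's own tools use, so finding the entry this way is necessary but may not be sufficient.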

Re: remsh failing silently

Ben Hood
In reply to this post by Mikael Pettersson-5
On Sun, Dec 30, 2018 at 2:34 PM Mikael Pettersson <[hidden email]> wrote:
> For the record, I just built OTP 17.5.6.10 natively on
> powerpc64-unknown-linux-gnu, and ^G works just fine there in the
> Erlang shell.
>
> I'd have to suspect your cross compilation setup.  Could be some
> ./configure test not doing the right thing in the cross compile case.

That sounds quite plausible.

This is the way that I configured the cross compile for ppc (I'm
trying to link openssl and ncurses statically, so that these versions
are independent of the package manager on the target machine).

There seem to be configure options to link openssl statically, but
fewer options for ncurses - does this look vaguely sane?

CONFIGURE_ARGS := --host=$(TARGET_ARCH) \
--disable-hipe \
--enable-smp-support \
--enable-threads \
--enable-kernel-poll \
--with-termcap \
--with-ssl=$(OPENSSL_DIST_PATH) \
--disable-dynamic-ssl-lib \
--build=$(BUILD_ARCH) \
--enable-native-libs \
--enable-static-nifs
LDFLAGS := "-L$(NCURSES_DIST_PATH)/lib -L$(OPENSSL_DIST_PATH)/lib -L/opt/at12.0/$(TARGET_ARCH)/lib64"
LIBS := $(OPENSSL_STATIC_LIB)
CFLAGS := "-D_GNU_SOURCE -I$(NCURSES_DIST_PATH)/include -I$(OPENSSL_DIST_PATH)/include -mcpu=$(TARGET_CPU) -mtune=$(TARGET_CPU) -O3 -pthread"

$(OTP_SRC_DIR)/make/output.mk: $(SKIP_LIBS) $(NCURSES_STATIC_LIB) $(OPENSSL_STATIC_LIB) $(OTP_SRC_DIR)/$(BOOTSTRAP_BEAM)
	cd $(OTP_SRC_DIR) && erl_xcomp_sysroot=$(ERL_TOP) \
	CC=$(CC) \
	CXX=$(CXX) \
	LD=$(LD) \
	LDFLAGS=$(LDFLAGS) \
	CFLAGS=$(CFLAGS) \
	LIBS=$(LIBS) \
	./configure $(CONFIGURE_ARGS) --prefix=$(OTP_INSTALL_DIR)

Re: remsh failing silently

Ben Hood
In reply to this post by Ameretat Reith
On Wed, Jan 2, 2019 at 10:42 AM Ameretat Reith <[hidden email]> wrote:
>
> Can you check $TERM and whether terminfo is known to ncurses? Can you connect with TERM=xterm?

Many thanks for the tip.

$TERM is set to xterm.

Unfortunately, running the remote shell like this

TERM=xterm erl -sname b -setcookie 1 -remsh a@icap5226

has the same behavior as without the TERM variable set.

So I'm thinking this is related to the way ncurses is linked during
cross compilation.

Re: remsh failing silently

Mikael Pettersson-5
In reply to this post by Ben Hood
On Wed, Jan 2, 2019 at 11:44 AM Ben Hood <[hidden email]> wrote:

>
> On Sun, Dec 30, 2018 at 2:34 PM Mikael Pettersson <[hidden email]> wrote:
> > For the record, I just built OTP 17.5.6.10 natively on
> > powerpc64-unknown-linux-gnu, and ^G works just fine there in the
> > Erlang shell.
> >
> > I'd have to suspect your cross compilation setup.  Could be some
> > ./configure test not doing the right thing in the cross compile case.
>
> That sounds quite plausible.
>
> This is the way that I configured the cross compile for ppc (I'm
> trying to link openssl and ncurses statically, so that these versions
> are independent of the package manager on the target machine).
>
> There seem to be configure options to link openssl statically, but
> fewer options for ncurses - does this look vaguely sane?

3. If you want ncurses and openssl statically linked I'd build them with --disable-shared in the first place, alternatively (temporarily) remove their .so files from the cross compiler's libs.

>
> CONFIGURE_ARGS := --host=$(TARGET_ARCH) \
> --disable-hipe \
> --enable-smp-support \
> --enable-threads \
> --enable-kernel-poll \
> --with-termcap \
> --with-ssl=$(OPENSSL_DIST_PATH) \
> --disable-dynamic-ssl-lib \
> --build=$(BUILD_ARCH) \
> --enable-native-libs \
> --enable-static-nifs
> LDFLAGS := "-L$(NCURSES_DIST_PATH)/lib -L$(OPENSSL_DIST_PATH)/lib
> -L/opt/at12.0/$(TARGET_ARCH)/lib64"
> LIBS := $(OPENSSL_STATIC_LIB)
> CFLAGS := "-D_GNU_SOURCE -I$(NCURSES_DIST_PATH)/include
> -I$(OPENSSL_DIST_PATH)/include -mcpu=$(TARGET_CPU)
> -mtune=$(TARGET_CPU) -O3 -pthread"
>
> $(OTP_SRC_DIR)/make/output.mk: $(SKIP_LIBS) $(NCURSES_STATIC_LIB)
> $(OPENSSL_STATIC_LIB) $(OTP_SRC_DIR)/$(BOOTSTRAP_BEAM)
> cd $(OTP_SRC_DIR) && erl_xcomp_sysroot=$(ERL_TOP) \
> CC=$(CC) \
> CXX=$(CXX) \
> LD=$(LD) \
> LDFLAGS=$(LDFLAGS) \
> CFLAGS=$(CFLAGS) \
> LIBS=$(LIBS) \
> ./configure $(CONFIGURE_ARGS) --prefix=$(OTP_INSTALL_DIR)

This doesn't say how ncurses was built, but I note a few things:
1. --enable-native-libs is meaningless with --disable-hipe
2. All those overrides of LDFLAGS, LIBS, CFLAGS etc look unnecessary.
If I was doing this, I'd build a self-contained cross-compiler with
binutils, gcc, glibc, ncurses, and openssl in $CROSS (e.g.
/opt/cross-ppc64-linux/), put $CROSS/bin first in PATH, then build OTP
following the procedure in HOWTO/INSTALL-CROSS.md section "Building
With configure/make Directly".
3. You might want to build ncurses and openssl with --disable-shared
to ensure that no dynamic linking is attempted.
4. If you `ldd beam.smp` on the target, does it list any unexpected
shared libraries?  And even if say libc is dynamically linked, is it
the same version as used in the cross compilation environment?

Re: remsh failing silently

dmkolesnikov
Hello,

On 2 Jan 2019, at 15.07, Mikael Pettersson <[hidden email]> wrote:

> 3. You might want to build ncurses with --disable-shared
> to ensure that no dynamic linking is attempted.

Recently, I’ve been fighting the same issue while cross compiling the VM for alpine/musl. Unfortunately, statically linking libncurses did not help (or I am holding it wrong). I cross-compiled ncurses, openssl and the VM with --disable-shared. As a result, the VM fails to start due to a missing terminal info database. I suspect that you need to compile ncurses with some magic… Do you happen to know it?

Best Regards, 
Dmitry  


Re: remsh failing silently

Ben Hood
In reply to this post by Mikael Pettersson-5
Hi Mikael,

Many thanks for taking the time to look into this, very much appreciated.

To make it easier to see what is going on, I've pushed my Makefile here:

https://github.com/0x6e6562/otp-ppc64le/blob/master/otp/Makefile

This set of Makefiles first builds openssl and ncurses statically, and
then attempts to cross compile OTP and link those two libraries.

The cross compiler is provided by the IBM Advance Toolchain:
https://developer.ibm.com/linuxonpower/advance-toolchain/

The glibc runtime for the target machine is provided by the IBM
Advance Toolchain as well.

On Wed, Jan 2, 2019 at 1:07 PM Mikael Pettersson <[hidden email]> wrote:
> > There seem to be configure options to link openssl statically, but
> 3. If you want ncurses and openssl statically linked I'd build them with --disable-shared in the first place, alternatively (temporarily) remove their .so files from the cross compiler's libs.

It doesn't look like the ncurses/openssl builds are producing any *.so files.

But I'll look into the disable-shared flag for good measure.

ATM the ncurses build produces what seems to be a static lib:

$ readelf -h ncurses/dist/lib/libncurses.a

File: ncurses/dist/lib/libncurses.a(version.o)
ELF Header:
  Magic:   7f 45 4c 46 02 01 01 00 00 00 00 00 00 00 00 00
  Class:                             ELF64
  Data:                              2's complement, little endian
  Version:                           1 (current)
  OS/ABI:                            UNIX - System V
  ABI Version:                       0
  Type:                              REL (Relocatable file)
  Machine:                           PowerPC64
  Version:                           0x1
  Entry point address:               0x0
  Start of program headers:          0 (bytes into file)
  Start of section headers:          752 (bytes into file)
  Flags:                             0x2, abiv2
  Size of this header:               64 (bytes)
  Size of program headers:           0 (bytes)
  Number of program headers:         0
  Size of section headers:           64 (bytes)
  Number of section headers:         13
  Section header string table index: 12

> This doesn't say how ncurses was built, but I note a few things:

Fair point. I was trying to be too minimalistic - for reference I've
pushed the Makefile(s) here: https://github.com/0x6e6562/otp-ppc64le

> 1. --enable-native-libs is meaningless with --disable-hipe

OK, that's good to know. I think this is left over from me trying to
get HiPE compilation working with the cross compiler, but I abandoned
this - walk before you can run.

> 2. All those overrides of LDFLAGS, LIBS, CFLAGS etc look unnecessary.

By trial and error, this was the only way I found to specify the IBM
Advance Toolchain version of gcc, ld, etc.

> If I was doing this, I'd build a self-contained cross-compiler with
> binutils, gcc, glibc, ncurses, and openssl in $CROSS (e.g.
> /opt/cross-ppc64-linux/), put $CROSS/bin first in PATH, then build OTP
> following the procedure in HOWTO/INSTALL-CROSS.md section "Building
> With configure/make Directly".

Ah, so you build a special directory of all of the cross compiled
dependencies, put the bin directory first on the PATH and then this
will make sure that the subsequent OTP build picks up those specific
versions of gcc etc?

> 3. You might want to build ncurses and openssl with --disable-shared
> to ensure that no dynamic linking is attempted.

OK - I'll make this explicit in the ncurses Makefile.

> 4. If you `ldd beam.smp` on the target, does it list any unexpected
> shared libraries?  And even if say libc is dynamically linked, is it
> the same version as used in the cross compilation environment?

This is the ldd output for beam.smp - not sure whether it shows
anything surprising.

linux-vdso64.so.1 =>  (0x00003fff84c30000)
libutil.so.1 => /opt/at12.0/lib64/power8/libutil.so.1 (0x00003fff84c00000)
libdl.so.2 => /opt/at12.0/lib64/power8/libdl.so.2 (0x00003fff84bd0000)
libm.so.6 => /opt/at12.0/lib64/power8/libm.so.6 (0x00003fff84a70000)
libpthread.so.0 => /opt/at12.0/lib64/power8/libpthread.so.0 (0x00003fff84a20000)
libc.so.6 => /opt/at12.0/lib64/power8/libc.so.6 (0x00003fff847d0000)
/opt/at12.0/lib64/ld64.so.2 => /lib64/ld64.so.2 (0x0000000051060000)
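
To make listings like this easier to scan, a small filter can print only the libraries that resolve outside the toolchain prefix. This is a sketch, assuming the usual "name => path (address)" ldd line format; the function name libs_outside_prefix is made up:

```shell
#!/bin/sh
# Reads `ldd` output on standard input and prints every resolved library
# path that does not start with the expected prefix.
libs_outside_prefix() {
    awk -v prefix="$1" '
        # Resolved entries look like: "libc.so.6 => /path/libc.so.6 (0x...)".
        # Entries without a path (such as the vDSO) are skipped.
        $2 == "=>" && $3 ~ /^\// && index($3, prefix) != 1 { print $3 }
    '
}

# Example, run on the target:
#   ldd beam.smp | libs_outside_prefix /opt/at12.0/
```

Fed the listing above with /opt/at12.0/ as the prefix, it prints only /lib64/ld64.so.2, the dynamic loader. A hit is a prompt to look closer rather than proof of a problem.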

I wonder if there is an executable test utility in the ncurses build
to make sure the library itself is compiled properly.

Ben

Re: remsh failing silently

Ben Hood
In reply to this post by Ameretat Reith
On Wed, Jan 2, 2019 at 10:42 AM Ameretat Reith <[hidden email]> wrote:
>
> Can you check $TERM and whether terminfo is known to ncurses? Can you connect with TERM=xterm?

I've managed to make some progress with terminfo.

By copying the terminfo directory from the ncurses build directory to
the target machine, and pointing the TERMINFO variable at this
database, I was able to get Ctrl-G working.

This also solved the remsh issue.

I'm not sure what is wrong with the standard /usr/share/terminfo
database from RHEL, but using this custom database makes Erlang a lot
happier.
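
The workaround described above can be wrapped so that TERMINFO is set only for the Erlang process rather than for the whole login session. A minimal sketch; the function name and the example path are placeholders, not part of OTP:

```shell
#!/bin/sh
# Runs a command with TERMINFO pointing at a specific terminfo database.
# Usage: with_terminfo <terminfo-dir> <command> [args...]
with_terminfo() {
    db=$1
    shift
    if [ ! -d "$db" ]; then
        echo "terminfo directory not found: $db" >&2
        return 1
    fi
    # env sets the variable for the child process only.
    env TERMINFO="$db" "$@"
}

# Example (the directory is a placeholder for wherever the copy lives):
#   with_terminfo /opt/otp/terminfo erl -sname b -setcookie 1 -remsh a@icap5226
```

Keeping the variable scoped to erl avoids changing how other programs on the target look up their terminal descriptions.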

I'm also not sure what the best way is to distribute the terminfo DB -
I imagine most OTP installs assume the existence of the OS standard
libncurses?

Re: remsh failing silently

Ben Hood
In reply to this post by dmkolesnikov
On Wed, Jan 2, 2019 at 1:22 PM Dmitry Kolesnikov <[hidden email]> wrote:

> Recently, I’ve been fighting the same issue while cross compiling the VM for alpine/musl. Unfortunately, statically linking libncurses did not help (or I am holding it wrong). I cross-compiled ncurses, openssl and the VM with --disable-shared. As a result, the VM fails to start due to a missing terminal info database. I suspect that you need to compile ncurses with some magic… Do you happen to know it?


What seems to work for me is to upload the terminfo database that is
generated as part of the ncurses build and point the TERMINFO env
variable to that database directory. This is on RHEL which seems to
have some quite aged stuff installed by default.
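
An alternative that might avoid shipping a separate database is to tell the ncurses build where the target keeps its terminfo files, so that the static library searches there by default instead of under its own install prefix. The sketch below only prints arguments for the ncurses ./configure script. --without-shared, --with-default-terminfo-dir and --with-terminfo-dirs are documented ncurses options, but whether the database shipped with RHEL is usable by a newer ncurses is not established in this thread, and the function name and host triplet are illustrative:

```shell
#!/bin/sh
# Prints ncurses ./configure arguments for a static build whose compiled-in
# terminfo search path points at a directory that exists on the target.
# Usage: ncurses_configure_args <host-triplet> <terminfo-dir-on-target>
ncurses_configure_args() {
    host=$1
    terminfo_dir=$2
    printf '%s\n' \
        "--host=$host" \
        "--without-shared" \
        "--with-default-terminfo-dir=$terminfo_dir" \
        "--with-terminfo-dirs=$terminfo_dir"
}

# Example (triplet and directory are assumptions about the target):
#   ./configure $(ncurses_configure_args powerpc64le-linux-gnu /usr/share/terminfo)
```

If the system database turns out to be unusable, pointing the same options at a directory installed alongside OTP would keep the TERMINFO approach but remove the need to set the variable by hand.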