Omega2+ goes dead under process load?



  • Hi all,

    I am running a load test (a compile) as an ordinary user over SSH, on firmware b233.

    There is ample disk space on /overlay (/dev/sda1). The disk subsystem appears to be fine (repeated trials).

    The info is taken from the console:
    root@Omega-745F:/# time dd if=/dev/zero of=/home/admin1/dd.out bs=1024 count=1M
    1048576+0 records in
    1048576+0 records out
    real 4m 4.14s
    user 0m 3.11s
    sys 0m 43.04s
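    As a rough sanity check on those figures: the dd run writes 1048576 records of 1024 bytes (1 GiB) in 244.14 s of wall-clock time, so the average write rate measured by this run works out to about:

    ```latex
    \text{rate} = \frac{1048576 \times 1024\ \text{B}}{244.14\ \text{s}} \approx 4.40 \times 10^{6}\ \text{B/s} \approx 4.19\ \text{MiB/s}
    ```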

    Output of free, taken at some point during the compile:
    root@Omega-745F:~# free
                 total       used       free     shared    buffers     cached
    Mem:        124808     101556      23252       2132        548       4164
    -/+ buffers/cache:      96844      27964
    Swap:       262136     125552     136584
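    To turn a free snapshot like this into percentages, a small awk filter is enough. This is a sketch that assumes the BusyBox free layout shown here (values in KiB); the function name mem_pressure is hypothetical:

    ```sh
    # mem_pressure (hypothetical helper): read BusyBox-style `free` output
    # (KiB) on stdin and print the share of RAM used by processes
    # (excluding buffers/cache) and the share of swap in use.
    mem_pressure() {
        awk '
            $1 == "Mem:"  { ram_total = $2; ram_used = $3 }
            # Some BusyBox versions print a "-/+ buffers/cache" line; prefer it.
            $1 == "-/+"   { app_used = $3; have_bc = 1 }
            $1 == "Swap:" { swap_total = $2; swap_used = $3 }
            END {
                if (ram_total == 0) { print "no Mem: line found"; exit 1 }
                if (!have_bc) app_used = ram_used
                printf "ram_used_pct=%.1f\n", 100 * app_used / ram_total
                if (swap_total > 0)
                    printf "swap_used_pct=%.1f\n", 100 * swap_used / swap_total
                else
                    print "swap_used_pct=n/a"
            }
        '
    }
    ```

    Run as free | mem_pressure. For the snapshot above it reports about 77.6% of RAM used by processes and 47.9% of swap in use, which fits a build that is leaning heavily on swap.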

    root@Omega-745F:~# uptime
    12:50:17 up 9 days, 17:05, load average: 3.79, 3.77, 3.33

    The power draw rose to 0.2 A from the usual 0.12 to 0.13 A (Wi-Fi on, CPU idle). The metal shield's temperature appeared to be normal.
    I have run the process a few times, resetting the power in between.

    Every time, the Omega2+ dies suddenly partway through. The amber LED goes off, nothing specific is logged on the console, and the SSH connection drops.

    Is this because the load is too high? Or is a thermal shutdown mechanism being triggered?
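    Because nothing is logged when the board dies, one way to see what load and memory looked like just before a failure is to append periodic snapshots of /proc/loadavg and /proc/meminfo to a file on the USB drive. A minimal sketch; the function name snapshot and the log path below are hypothetical, and it only reads the standard Linux /proc files:

    ```sh
    # snapshot [LOADAVG_FILE] [MEMINFO_FILE] (hypothetical helper)
    # Print one line: epoch timestamp, 1/5/15-minute load averages,
    # MemFree and SwapFree in kB. Defaults to the live /proc files.
    snapshot() {
        loadavg_file=${1:-/proc/loadavg}
        meminfo_file=${2:-/proc/meminfo}
        # /proc/loadavg looks like: "5.73 2.38 1.43 2/60 1234"
        read -r l1 l5 l15 _rest < "$loadavg_file"
        awk -v ts="$(date +%s)" -v l1="$l1" -v l5="$l5" -v l15="$l15" '
            $1 == "MemFree:"  { mem = $2 }
            $1 == "SwapFree:" { swap = $2 }
            END {
                printf "%s load=%s,%s,%s memfree_kb=%s swapfree_kb=%s\n",
                       ts, l1, l5, l15, mem, swap
            }
        ' "$meminfo_file"
    }
    ```

    It could be started in the background before the build, for example with while :; do snapshot >> /overlay/load.log; sync; sleep 5; done & (the sync makes it more likely that the last lines reach the disk before a sudden power-off). On Linux the load average also counts tasks blocked in uninterruptible I/O, such as waiting on swap, so a rising load together with a shrinking SwapFree would point at memory pressure rather than CPU alone.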

    Thanks..



  • This is just an update:

    /dev/sda (originally a SanDisk thumb drive) was replaced with an external USB drive (a SATA SSD in an enclosure) powered by an external supply (set to 5.1 VDC; it drops to 5.03 VDC under a 2 A+ load, checked with a rheostat).
    The observation is the same as above.
    Thanks..



  • @tjoseph1 I'd start by asking why you are compiling on the device at all; that will always be a dead end. But assuming you are simply using this as a load test, I will be happy to duplicate your test if you can provide more details.



  • @crispyoz :

    Thank you so much..

    Please see the steps below:

    -Reflash Omega2+
    # sysupgrade -n /tmp/2p-b233.bin

    -After reboot, connect the thumb drive to the dock's USB port.
    Run fdisk on /dev/sda to create sda1 = ext4 (+4G) and sda2 = swap (+512M).

    -Run mkfs.ext4 on sda1 and mkswap on sda2; "block info" then shows:
    /dev/sda1: UUID="7a784400-e822-472c-bad4-bb4b2ca64f69" VERSION="1.0" TYPE="ext4"
    /dev/sda2: UUID="236ace29-0493-432a-b628-36249636bc67" VERSION="1" TYPE="swap"

    -Set up the overlay by modifying the fstab entries:

    # cat /etc/config/fstab
    config 'global'
        option anon_swap '0'
        option anon_mount '0'
        option auto_swap '1'
        option auto_mount '1'
        option delay_root '5'
        option check_fs '0'

    config 'mount'
        option target '/overlay'
        option uuid '7a784400-e822-472c-bad4-bb4b2ca64f69'
        option enabled '1'

    -Add a 'swap' section to the fstab in the overlay:

    # cat /mnt/sda1/upper/etc/config/fstab
    config 'swap'
        option target '/dev/sda2'
        option uuid '236ace29-0493-432a-b628-36249636bc67'
        option enabled '1'

    -Reboot to make the overlay and swap active:
    # df -hT
    Filesystem           Type       Size      Used Available Use% Mounted on
    /dev/sda1            ext4       4.0G    606.3M      3.2G  16% /overlay
    overlayfs:/overlay   overlay    4.0G    606.3M      3.2G  16% /

    # free
                 total       used       free     shared    buffers     cached
    Mem:        124808      46192      78616         88       8400      15732
    -/+ buffers/cache:      22060     102748
    Swap:       532476          0     532476

    # wifisetup
    # opkg update
    # opkg upgrade wget
    # opkg install make shadow-useradd shadow-groupadd sudo
    # groupadd sudo

    /* In /etc/sudoers, uncomment the line for group sudo: */
    %sudo ALL=(ALL) ALL

    # mkdir /home
    # useradd -m -d /home/admin -G sudo admin

    -Set a password for admin and log in as admin over SSH
    $ wget -c http://musl.cc/mipsel-linux-muslsf-native.tgz

    /* Extract to, say, /opt */
    ~$ ls -l /opt/
    drwxrwxr-x 8 root root 4096 Sep 2 21:51 mipsel-linux-muslsf-native

    /* create ~/bin and add the links */
    ~$ ls -l ~/bin
    lrwxrwxrwx 1 admin admin 38 Oct 23 10:56 ar -> /opt/mipsel-linux-muslsf-native/bin/ar
    ..
    lrwxrwxrwx 1 admin admin 39 Oct 23 10:56 c++ -> /opt/mipsel-linux-muslsf-native/bin/c++
    ..
    lrwxrwxrwx 1 admin admin 41 Oct 23 10:56 strip -> /opt/mipsel-linux-muslsf-native/bin/strip
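    The links in ~/bin can be created in one go instead of one ln -s per tool. A minimal sketch; the function name link_toolchain is hypothetical, and it simply links every executable it finds in the toolchain's bin directory:

    ```sh
    # link_toolchain SRC_BIN DEST_BIN (hypothetical helper)
    # Symlink every executable in SRC_BIN into DEST_BIN, replacing old links.
    link_toolchain() {
        src_bin=$1
        dest_bin=$2
        mkdir -p "$dest_bin" || return 1
        for tool in "$src_bin"/*; do
            # Skip directories, non-executables, and the unexpanded
            # glob pattern when SRC_BIN is empty.
            [ -f "$tool" ] && [ -x "$tool" ] || continue
            ln -sf "$tool" "$dest_bin/$(basename "$tool")" || return 1
        done
    }
    ```

    For the layout above this would be run as link_toolchain /opt/mipsel-linux-muslsf-native/bin "$HOME/bin".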

    -create .profile
    ~$ cat .profile
    export PATH="${HOME}/bin":$PATH

    /* Log out and log back in */
    ~$ which gcc
    /home/admin/bin/gcc
    ~$ gcc -v
    ..
    Target: mipsel-linux-muslsf
    Configured with: ../src_gcc/configure --enable-languages=c,c++,fortran --.
    ..
    gcc version 9.3.1 20200828 (GCC)

    /* Compile musl libc. This compile never failed, and the load average never went above 2.0.
    (At this point /lib/ld-musl-mipsel-sf.so.1 points to /lib/libc.so.) */

    lrwxrwxrwx 1 root root 7 Oct 28 12:59 ld-musl-mipsel-sf.so.1 -> libc.so

    $ wget -c http://musl.libc.org/releases/musl-1.2.1.tar.gz
    /*
    Extract, then run "configure", "make" and "sudo make install"; this installs to /usr/local/musl.
    After the install, /lib/ld-musl-mipsel-sf.so.1 should point to /usr/local/musl/lib/libc.so.
    */
    $ time make
    ..
    real 18m 27.81s
    user 16m 5.47s
    sys 2m 11.48s

    $ sudo make install
    ..
    $ ls -l /lib
    ..
    lrwxrwxrwx 1 root root 27 Oct 29 07:59 ld-musl-mipsel-sf.so.1 -> /usr/local/musl/lib/libc.so
    ..
    /* Compile protobuf (always fails with the Omega2+ going dead) */
    $ wget -c https://github.com/protocolbuffers/protobuf/releases/download/v3.13.0/protobuf-cpp-3.13.0.tar.gz
    /* Extract; cd protobuf-3.13.0 */
    $ CXXFLAGS="-march=24kec -mfix-24k -mips32r2 -mmcu -mtune=24kec -msoft-float" \
      DIST_LANG="cpp" \
      ./configure --prefix="/usr/local/protobuf"

    $ time make
    Fails: the Omega2+ goes dead partway through (this build needs much more time than the musl libc build above).

    Note:
    In the qemu-mipsel system, the compile succeeds:

    $ time make
    make all-recursive
    make[1]: Entering directory '/home/admin1/protobuf-3.13.0'
    ..
    make[2]: Leaving directory '/home/admin1/protobuf-3.13.0/src'
    make[1]: Leaving directory '/home/admin1/protobuf-3.13.0'

    real 752m25.591s
    user 731m37.532s
    sys 15m6.924s

    I did have to edit two .la files, though (commenting out the original libdir line):
    1:
    $ cat /opt/mipsel-linux-muslsf-native/lib/libstdc++.la
    #libdir='/lib'
    libdir='/opt/mipsel-linux-muslsf-native/lib/'

    2:
    $ cat /opt/mipsel-linux-muslsf-native/lib/libatomic.la
    #libdir='/lib'
    libdir='/opt/mipsel-linux-muslsf-native/lib/'
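    These two .la edits could also be scripted. A sketch under the assumption that the original line is exactly libdir='/lib'; the function name fix_la_libdir is hypothetical. It comments that line out and adds one pointing at the toolchain's lib directory, using awk and a temporary file so it does not rely on sed -i:

    ```sh
    # fix_la_libdir LA_FILE NEW_DIR (hypothetical helper)
    # Replace the line libdir='/lib' with a commented copy followed by
    # libdir='NEW_DIR'; all other lines are passed through unchanged.
    fix_la_libdir() {
        la_file=$1
        new_dir=$2
        tmp_file="$la_file.tmp"
        # "\047" is a single quote; it avoids nesting quotes in the awk program.
        awk -v nd="$new_dir" '
            BEGIN { q = "\047" }
            $0 == ("libdir=" q "/lib" q) {
                print "#" $0
                print "libdir=" q nd q
                next
            }
            { print }
        ' "$la_file" > "$tmp_file" && mv "$tmp_file" "$la_file"
    }
    ```

    Since /opt is owned by root, this would be run as root, once for each of libstdc++.la and libatomic.la, with /opt/mipsel-linux-muslsf-native/lib/ as NEW_DIR.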



  • @tjoseph1 I was able to duplicate your problem. I ran the make 3 times, and each time the device died. I think it is running out of virtual memory (VM), so I'll increase swap and test again. I don't think this is a load issue; top shows the CPU running at around 55-85% during make.

    Is yours dying compiling google/protobuf/extension_set.lo?



  • @crispyoz :

    Thanks a lot for taking the time to check.

    For me, the device goes down at random, not while compiling any specific unit.
    This time it died at google/protobuf/descriptor.lo.

    One more thing: I have seen that adding "-mlxc1-sxc1" to CXXFLAGS adds more load.



  • @crispyoz :

    It compiled with OpenWRT 18.06.8 (ramips firmware):

    make[2]: Leaving directory '/home/admin1/protobuf-3.13.0/src'
    make[1]: Leaving directory '/home/admin1/protobuf-3.13.0'
    real 5h 24m 21s
    user 4h 44m 42s
    sys 14m 16.34s

    The setup is the same, except that I had to install these additional packages:
    fdisk block-mount e2fsprogs kmod-fs-ext4 kmod-usb-storage kmod-usb2

    Could you try that?

    Thanks..



  • @crispyoz :

    The Omega2+ with b233 firmware survives longer after installing two small heat sinks on the metal shield:

    [Photo: OMG-Heatsink.png]

    The left one is running OpenWRT 18.06.8 (still running); the right one was running b233 (dead).

    Thanks ..



  • @crispyoz :

    Your observation about VM was correct. Both OnionOS b233 and OpenWRT 18.06.8 crash with 512 MB of swap.

    Observations:

    With 1024 MB of swap and a pedestal fan blowing air on the metal shield, OpenWRT can complete the compile. With heat sinks as in the photograph above, OpenWRT can also complete it.

    OnionOS b233 was never able to complete the process.

    I think we are looking at two issues to address:

    1. OnionOS stability
      While compiling google/protobuf/descriptor.lo, the processor load goes too high, and you can feel it: after pressing <Enter>, the new prompt takes as long to appear as on a heavily overloaded machine.
      During that time, uptime showed:
      root@Omega-745F:~# uptime
      13:39:14 up 1:20, load average: 5.73, 2.38, 1.43
      The board died a second or two later.

      OpenWRT never exceeds a load average of 2.0, and most of the time it stays below 1.0.

    2. Thermal design
      It needs a heat sink.

    Do you know, by any chance, whether a small heat sink can be mounted without shorting other components if the metal shield is removed?

    I think the thermal side does play a role in field deployment.

    The good part is that the MT7688 has a built-in thermal shutdown. The board running OpenWRT (left) is the same one that was running OnionOS earlier and crashed at least 10 times.

    Thanks..



  • @tjoseph1 I think all you have done is discover what we already know: IoT devices are not designed for heavy load; they are minimalist devices.

    I also don't understand why anyone would use an IoT device as a development platform. That is not what it is designed for; it is designed as a deployment target. As you have discovered, you will spend too much time trying to fix issues when you use a device for something it was not designed for.

    You can remove the protective shield. I ran an Omega2+ for a few months with no shield; alas, it got caught in the rain and went poof! Maybe you could place a heat sink on the MediaTek chip, but I'm not sure that is where the issue is. I suspect it is the DDR2.



  • @crispyoz :
    Yes, you are right: IoT devices are minimalistic. Still, I don't think the time was wasted; it helped me learn the device's behavior and its limits, and how far it can be stretched without breaking it.

    There are so many IoT devices out there, and each has its own limits. In that sense, "minimalistic" is a generic term, to me.

    My job is to build a reliable gadget, looking at it from every angle. I am not at all constrained by time 🙂



  • @tjoseph1 If you are not constrained by time, then you are a very lucky person. Most of us are building stuff with timelines, and time is money.



  • @crispyoz :
    Yes, what I said has been true for many years now.
    But it makes me venture into uncharted waters and spend time on otherwise unnecessary things; I get sort of defocused.

    When I look back, people who work under time constraints are lucky.

    The grass is greener on the other side. 🙂



  • @crispyoz :
    Another thing that may not have been quite right in this whole experiment: the "limited" raw swap partition.

    Wouldn't it be better to put swap in a file on sda1 (with sda1 spanning the whole of sda), recreated at each boot, so that different blocks are used for reads and writes each time? Read-only blocks stay healthy, but blocks that are rewritten wear out, don't they?
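    The swap-file idea could look roughly like the sketch below. The function name recreate_swapfile and the path /overlay/swapfile are hypothetical. Whether the recreated file actually lands on different physical flash cells is up to ext4's block allocator and the drive's own wear-leveling controller, so this is not guaranteed to spread wear:

    ```sh
    # recreate_swapfile PATH SIZE_MB (hypothetical helper)
    # Remove any previous swap file at PATH and write a fresh, zero-filled
    # file of SIZE_MB MiB, readable and writable only by its owner (mode 600).
    recreate_swapfile() {
        path=$1
        size_mb=$2
        # Stop swapping to the old file first; ignore the error if it was not active.
        swapoff "$path" 2>/dev/null || true
        rm -f "$path" || return 1
        dd if=/dev/zero of="$path" bs=1M count="$size_mb" 2>/dev/null || return 1
        chmod 600 "$path"
    }
    ```

    At boot, as root (for example from /etc/rc.local), it would be followed by mkswap and swapon: recreate_swapfile /overlay/swapfile 1024 && mkswap /overlay/swapfile && swapon /overlay/swapfile. The existing fstab swap section for /dev/sda2 would then need to be disabled.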



  • @tjoseph1 Decay is a whole separate discussion because it depends on your use case. For my devices, decay is not an issue, as their read/write ratio is around 1:4200.

