Omega2S+ overlay JFFS2 filesystem corruption



  • Product being used: Omega2S+ on custom hardware
    Firmware version: omega2p-v0.3.3-b256, omega2p-v0.3.2-b246, also custom builds of latest https://github.com/OnionIoT/source

    Description of the issue and context:

    With from-factory firmware, or with one of the official firmwares above, the Omega2S+ seems to work fine.
    I put my Golang application files on it. The application runs fine.
    After a reboot, a lot of JFFS2 errors appear in the console log: the filesystem is broken, the app cannot start, and system binaries are also broken, so I need to recover with U-Boot's flash from USB.

    Putting my application on ramdisk (/tmp), running the app, reboot, no errors from JFFS2.

    Putting my application on USB stick (/mnt/sda1), running the app, reboot, no errors from JFFS2.

    Putting my application files on the controller and rebooting before running the app, also breaks JFFS2.

    As a desperate workaround I tried to have the application stored on USB stick and only add the procd script to start it on flash (/etc/init.d).
    Again my FS becomes corrupt!

    Is there something residing on flash after I do sysupgrade -n -p so that I'm not getting back to a good state from that? The FS looks like from factory after that... Maybe it's not.

    Expected Behavior

    Filesystem stays intact.

    Observed Behavior

    Filesystem corrupted on reboot.

    Log extracts:

    [   54.861857] jffs2: notice: (1699) jffs2_get_inode_nodes: Wrong magic bitmask 0x0000 in node header at 0x76ca80.
    [   54.872424] jffs2: notice: (1699) jffs2_get_inode_nodes: Wrong magic bitmask 0x0000 in node header at 0x76ba3c.
    [   54.882971] jffs2: notice: (1699) jffs2_get_inode_nodes: Wrong magic bitmask 0x0000 in node header at 0x76a9f8.
    [   54.893538] jffs2: notice: (1699) jffs2_get_inode_nodes: Wrong magic bitmask 0x0000 in node header at 0x7699b4.
    [   54.904196] jffs2: notice: (1699) jffs2_get_inode_nodes: Wrong magic bitmask 0x0000 in node header at 0x768970.
    [   54.914809] jffs2: notice: (1699) jffs2_get_inode_nodes: Wrong magic bitmask 0x0000 in node header at 0x76792c.
    [   54.925408] jffs2: notice: (1699) jffs2_get_inode_nodes: Wrong magic bitmask 0x0000 in node header at 0x7668e8.
    [   54.935955] jffs2: notice: (1699) jffs2_get_inode_nodes: Wrong magic bitmask 0x0000 in node header at 0x7658a4.
    [   54.946488] jffs2: notice: (1699) jffs2_get_inode_nodes: Wrong magic bitmask 0x0000 in node header at 0x764860.
    [   54.957084] jffs2: notice: (1699) jffs2_get_inode_nodes: Wrong magic bitmask 0x0000 in node header at 0x76381c.
    [   54.967637] jffs2: notice: (1699) jffs2_get_inode_nodes: Wrong magic bitmask 0x0000 in node header at 0x7627d8.
    
    [   55.240899] jffs2: iget() failed for ino #104
    [   60.268864] jffs2: warning: (1904) jffs2_get_inode_nodes: Eep. No valid nodes for ino #104.
    [   60.277413] jffs2: warning: (1904) jffs2_do_read_inode_internal: no data nodes found for ino #104
    [   60.286463] jffs2: iget() failed for ino #104
    [   65.314331] jffs2: warning: (1908) jffs2_get_inode_nodes: Eep. No valid nodes for ino #104.
    [   65.322816] jffs2: warning: (1908) jffs2_do_read_inode_internal: no data nodes found for ino #104
    [   65.331917] jffs2: iget() failed for ino #104
    [   70.359425] jffs2: warning: (1912) jffs2_get_inode_nodes: Eep. No valid nodes for ino #104.
    [   70.367973] jffs2: warning: (1912) jffs2_do_read_inode_internal: no data nodes found for ino #104
    [   70.377039] jffs2: iget() failed for ino #104
    [   75.404384] jffs2: warning: (1916) jffs2_get_inode_nodes: Eep. No valid nodes for ino #104.
    [   75.412868] jffs2: warning: (1916) jffs2_do_read_inode_internal: no data nodes found for ino #104
    [   75.422003] jffs2: iget() failed for ino #104
    

    Trying to list application files, most files are gone:

    root@Omega-266F:/# cd /isys/
    root@Omega-266F:/isys# ls -l
    [  142.705947] jffs2: notice: (1939) jffs2_get_inode_nodes: Wrong magic bitmask 0x0000 in node header at 0x742450.
    [  142.716377] jffs2: notice: (1939) jffs2_get_inode_nodes: Wrong magic bitmask 0x0000 in node header at 0x7422b0.
    [  142.727048] jffs2: notice: (1939) jffs2_get_inode_nodes: Wrong magic bitmask 0x0000 in node header at 0x74216c.
    [  142.737619] jffs2: notice: (1939) jffs2_get_inode_nodes: Wrong magic bitmask 0x0000 in node header at 0x74201c.
    [  142.747913] jffs2: warning: (1939) jffs2_do_read_inode_internal: no data nodes found for ino #105
    [  142.756959] jffs2: iget() failed for ino #105
    ls: ./config.jso[  142.762123] jffs2: warning: (1939) jffs2_get_inode_nodes: Eep. No valid nodes for ino #104.
    n: I/O error
    [  142.771902] jffs2: warning: (1939) jffs2_do_read_inode_internal: no data nodes found for ino #104
    [  142.782132] jffs2: iget() failed for ino #104
    ls: ./isysctrl: [  142.787342] jffs2: notice: (1939) jffs2_get_inode_nodes: Wrong magic bitmask 0x0000 in node header at 0x8d642c.
    I/O error
    [  142.799085] jffs2: notice: (1939) jffs2_get_inode_nodes: Wrong magic bitmask 0x0000 in node header at 0x8d619c.
    [  142.810262] jffs2: notice: (1939) jffs2_get_inode_nodes: Wrong magic bitmask 0x0000 in node header at 0x8cd2f0.
    [  142.820553] jffs2: warning: (1939) jffs2_do_read_inode_internal: no data nodes found for ino #98
    [  142.829491] jffs2: iget() failed for ino #98
    ls: ./www: I/O error
    -rwx------    1 root     root           202 Nov 29 10:25 start.sh
    

    Steps to reproduce the issue

    1. Grab unused hardware (from-factory hardware) or flash firmware omega2p-v0.3.3-b256 with sysupgrade -n -p . Wait until the LED stops blinking (overlay FS has been mounted from mtdblock6).
      Avahi is always stuck at "registering" at this point, and we want mDNS to discover the device, so we need to reboot or restart avahi.

      root@Omega-266F:/# ps | grep avahi
       2000 nobody    1588 S    avahi-daemon: registering [Omega-266F.local]
      
    2. Reboot. All looks good, no JFFS2 errors, avahi is now in "running" status.

    3. Transfer application.

      $ scp -r ../_testpackage/app root@10.150.201.26:/isys
      The authenticity of host '10.150.201.26 (10.150.201.26)' can't be established.
      RSA key fingerprint is SHA256:bmae8Lq0YSZqo/b83DFdxrquEs4fi6X1/xPYDtBKh20.
      Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
      Warning: Permanently added '10.150.201.26' (RSA) to the list of known hosts.
      root@10.150.201.26's password:
      config.json                                                                           100% 9260    94.1KB/s   00:00
      isysctrl                                                                              100% 1457KB 123.2KB/s   00:11
      start.sh                                                                              100%  202    27.3KB/s   00:00
      index.072ba125.css                                                                    100%  231KB 120.6KB/s   00:01
      index.093d5efa.js                                                                     100%  118KB 134.0KB/s   00:00
      index.html                                                                            100%  397    46.9KB/s   00:00
      
    4. Test application works OK on device

      root@Omega-266F:/# du -sh /isys
      1.8M    /isys
      
      root@Omega-266F:/# cd /isys/
      root@Omega-266F:/isys# ./isysctrl
      
    5. Reboot (using reboot command, I've also tried to use sync before rebooting).

    6. See JFFS2 errors or SQUASHFS errors during next boot.
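For step 6, it can help to put a number on "JFFS2 errors or SQUASHFS errors" so that boots can be compared. The following is a minimal sketch, not something from the original report: the function name `count_fs_errors` is hypothetical, and the match patterns are assumptions based on the log lines quoted in this thread plus the usual "SQUASHFS error" kernel prefix.

```shell
#!/bin/sh
# Count filesystem-related kernel messages read from stdin.
# Patterns are assumptions derived from the log extracts in this thread.
count_fs_errors() {
    awk '
        /jffs2: (notice|warning)/  { jffs2++ }     # bad node headers, missing data nodes
        /jffs2: iget\(\) failed/   { iget++ }      # inode could not be read at all
        /SQUASHFS error/           { squashfs++ }  # read-only rootfs read/decompress errors
        END {
            printf "jffs2_node=%d jffs2_iget=%d squashfs=%d\n", jffs2, iget, squashfs
        }
    '
}
```

On the device this would be used as `dmesg | count_fs_errors` right after each boot; three zeros would suggest a clean boot.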

    What's been attempted to resolve the issue

    Full Golang binaries, as well as Golang binaries with reduced size (LDFLAGS) and compressed with upx (because initially I thought the file size was the problem).

    Various firmware versions. The very latest (v0.3.3-b256) and what I consider "the most stable previous version" (v0.3.2-b246).
    Also tried building the firmware myself, a minimal version, from the git repo (https://github.com/OnionIoT/source, openwrt-18.06 branch).
    Even tried building OpenWRT 22 (https://github.com/openwrt/openwrt/tree/openwrt-22.03 , supported as per https://openwrt.org/toh/start?dataflt[Brand*~]=onion).

    What kind of help are you looking for?

    I must be missing something, please help me understand why it breaks.

    Attachments (if any)

    None.



  • @magge Can you clarify, please: when you wrote "Grab unused hardware (from-factory hardware)", do you mean you used an Omega2+ on a dock? My first thought was an issue with the custom board, but if you are experiencing the same issue with an Omega2+ on a dock, that would rule out that line of thinking.



  • @crispyoz Hi and thank you for responding. Sorry I was unclear there; the Omega2S+ is on custom hardware. What I meant was just that I grabbed a device that I had not messed with, i.e. a device with an Omega2S+ that should be original / as-delivered.

    A bit more; Our device is quite simple (warning, I'm not an electrical engineer). It's voltage regulation and DC motor controllers (hooked up to GPIO for direction control), multi channel PWM controller (i2c), multi channel AD converter for motor current measurement (i2c). Buttons, connectors, LEDs. I'm powering my device with a lab/bench power.

    Are there any known electrical issues that I can try to look for? I have an old-school Tektronix oscilloscope that I might be able to investigate with...



  • @magge There are 3 scenarios I am aware of that have previously been reported to corrupt the file system.

    1. flaky power supply
    2. exhausted file system capacity
    3. EMI

    2 should be easy to isolate.
    3 perhaps try shielding the Omega2S temporarily and see if this resolves your issue; if so, some redesign may be required.

    A fourth option can be a software defect that causes any files you're writing to disk to become corrupt, but JFFS2 is pretty robust, so I'm not convinced this is the most likely cause.


  • administrators

    @magge what does your Golang program do? Are there a lot of writes/reads to the fs?

    Also, cheers for following the How to Ask for Help post.



  • @crispyoz Thanks again!

    1. I've seen the issue on more than one power supply, but I will try to check with oscilloscope if I can see something.

    2. Yes, that was my initial suspicion. However, the FS still gets corrupted for some reason even when I just add a small /etc/init.d/ script (for the case where I run the app from USB storage). df -h still reports free space after I upload my files, but my understanding is that JFFS2 is a compressing FS and that the remaining space is a guesstimate.

    3. EMI. I read another article here on community that shielding was a thing. There is no radio on our device. Stick a tin foil hat on my Omega2S+ and see if it helps? 😆
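To make the free-space check in point 2 repeatable, a small helper could read the usage figure before and after each upload. This is only a sketch under stated assumptions: the function names are hypothetical, `/overlay` is assumed to be where the writable JFFS2 partition is mounted (the usual OpenWrt layout), and the 80% default threshold is an arbitrary illustration. Because JFFS2 compresses data, the number is an estimate at best.

```shell
#!/bin/sh
# Print the "Use%" figure (without the % sign) for the filesystem holding a path.
# df -P keeps each filesystem on a single line so field 5 is reliable.
fs_use_percent() {
    df -P -k "${1:-/overlay}" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# Succeed when usage is at or above a threshold (default 80, an arbitrary choice).
# Fails if the usage figure could not be read.
fs_nearly_full() {
    pct=$(fs_use_percent "$1")
    [ -n "$pct" ] && [ "$pct" -ge "${2:-80}" ]
}
```

For example, `fs_nearly_full /overlay 90 && echo "overlay above 90%"` after the scp step would flag a nearly exhausted overlay before the reboot.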

    I will try to check the power supply and get back to you. There are PWM'ed motors connected to the device, so I could see there being some noise coming back from that. That being said, I'm almost sure I've seen the FS go bad even without running the motors, i.e. uploading the software, rebooting without even running it, and getting a bad FS.

    If there is noise from motors/supply could that permanently damage the flash so that it's just broken now and that's why "nothing works"?
    Could a damaged flash operate like I have described due to caching smartness or similar; i.e. I upload the file and it looks okay, trouble only starts after a reboot?



  • @Lazar-Demin Thanks!

    My app logic only reads a config file, and that's it for FS access. It will then do GPIO to control motor directions and read buttons, I2C to PWM chip for motor speeds, I2C to AD converter chip for current measurements.
    Running the app does not seem to trigger issues from what I can tell. The trouble seems to start after uploading the files and rebooting.


  • administrators

    @magge Interesting. If the app only reads a config file then I think it's safe to assume the app itself isn't the problem.

    You should try to confirm this is the case. One way would be to run the same software + firmware combination on all-Onion hardware, like an Omega2S development kit or an Omega2 on a Dock.

    If the same issue happens, then we can safely assume the program is causing the fs issue.
    If not, we can rule out the software, and you can focus on your circuit/power supply/EMI/etc.



  • @magge If you don't have Omega2 and a dock, you could try just running your custom board but not your app. Then do some file system reads like ls -laR on a loop then see if rebooting results in a corrupted file system.
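The read-only loop suggested above could look roughly like this. It is a sketch, not a tested procedure from this thread: the function name `read_stress` and its defaults are hypothetical, and `/overlay` is assumed to be the mount point of the writable JFFS2 partition.

```shell
#!/bin/sh
# Recursively list a directory N times and report how many passes failed.
# Read-only: nothing is written to the filesystem being exercised.
read_stress() {
    dir=${1:-/overlay}
    passes=${2:-100}
    failed=0
    i=0
    while [ "$i" -lt "$passes" ]; do
        # ls exits non-zero if any entry cannot be read (e.g. an I/O error).
        ls -laR "$dir" >/dev/null 2>&1 || failed=$((failed + 1))
        i=$((i + 1))
    done
    echo "passes=$passes failed=$failed"
}
```

Running `read_stress /overlay 200`, rebooting, and then checking the boot log for JFFS2 messages would show whether reads alone on the custom board are enough to trigger the corruption.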



  • Short update.

    Unfortunately I don't have an Omega2P that is known to be good; I only have my custom hardware. Is there a prototyping board I should order when working with this? I tried to google for it but did not get any wiser...

    I have been working on this, trying to be more structured and changing fewer things at once, rebooting often and going through different variations.
    New observations when using the official omega2p-v0.3.3-b256.bin firmware

    • The binary used above (go build -ldflags "-s -w" and upx --brute, filesize < 2MB) can be transferred to Omega2P and MD5 does not change over reboot. No logs of filesystem corruption.
      However! If the app has been started at least once, the next reboot will log "xz decompression failed data is probably corrupt", and on the reboot after that the system is not well. My application fails, other things like opkg update log errors, and squashfs logs errors too. It was my understanding that squashfs is read-only on the Omega2P, so I'm not sure how it can fail.
      Recovery seems consistent by going to failsafe and wiping out rootfs_data (for example firstboot or mtd erase rootfs_data).

    • Alternative binary (no stripping, no upx, filesize 9.4 MB) I start getting JFFS2 warnings (errors?) in the log, like:

      [ 1179.029279] jffs2: warning: (703) jffs2_do_read_inode_internal: no data nodes found for ino #134
      [ 1179.038223] jffs2: iget() failed for ino #134
      

      The MD5 of the app binary changes over reboot (somehow it still worked though!).

      # before reboot
      $ md5sum /isys/isysctrl /mnt/sda1/isys-3/isysctrl-full
      ef7936ff1a4030871fd47687d9306965  /isys/isysctrl
      ef7936ff1a4030871fd47687d9306965  /mnt/sda1/isys-3/isysctrl-full
      
      # after reboot
      $ md5sum /isys/isysctrl /mnt/sda1/isys-3/isysctrl-full
      05300935ae776926d7ce20f145682392  /isys/isysctrl
      ef7936ff1a4030871fd47687d9306965  /mnt/sda1/isys-3/isysctrl-full
      

      /etc/banner got corrupted:

      BusyBox v1.28.3 () built-in shell (ash)
      
      ▒local▒▒JSON_PREFIX=▒1=▒=▒"▒$___val▒"▒eval_json_set_var▒1 .bp▒a_json_set_var1▒btb▒b▒b▒b▒blb,b▒bb$bb_a_value=▒▒2=▒local▒▒JSON_PREFIX=▒1=▒=▒"▒${▒JSON_PREFIX=▒1=} ▒$_a_value▒"▒eval_jroot@Omega-266F:/#
      

      No xz decompression errors logged when rebooting after app has been running.

      JFFS2 seems to be in trouble; for example, opkg fails:

      root@Omega-266F:/# opkg update
      [   86.107495] jffs2: notice: (1925) jffs2_get_inode_nodes: Wrong magic bitmask 0x0000 in node header at 0x220558.
      [   86.118551] jffs2: notice: (1925) jffs2_get_inode_nodes: Wrong magic bitmask 0x0000 in node header at 0x2204fc.
      [   86.129229] jffs2: notice: (1925) jffs2_get_inode_nodes: Wrong magic bitmask 0x0000 in node header at 0x22006c.
      [   86.139536] jffs2: warning: (1925) jffs2_do_read_inode_internal: no data nodes found for ino #113
      [   86.148627] jffs2: iget() failed for ino #113
      [   86.155510] jffs2: warning: (1925) jffs2_get_inode_nodes: Eep. No valid nodes for ino #113.
      [   86.164064] jffs2: warning: (1925) jffs2_do_read_inode_internal: no data nodes found for ino #113
      [   86.173074] jffs2: iget() failed for ino #113
      [   86.178849] jffs2: warning: (1925) jffs2_get_inode_nodes: Eep. No valid nodes for ino #113.
      [   86.187422] jffs2: warning: (1925) jffs2_do_read_inode_internal: no data nodes found for ino #113
      [   86.196445] jffs2: iget() failed for ino #113
      Collected errors:
       * opkg_conf_parse_file: /etc/opkg/distfeeds.conf:1: Ignoring invalid line: `▒@'
      
      
    • The best working option I have found (no stripping, upx --brute, filesize 4MB).
      No errors after transfer.
      No errors first reboot after transfer.
      No xz decompression errors first reboot after app has been running.
      Still got /etc/banner corruption, opkg update still fails.
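Since the MD5 of the binary and the contents of /etc/banner both changed across a reboot, the before/after comparison could be extended to every file in a directory. The sketch below is an illustration only: `save_manifest` and `verify_manifest` are hypothetical helper names, and it assumes the manifest is stored off-flash (for instance under /mnt/sda1) and passed as an absolute path.

```shell
#!/bin/sh
# Write an MD5 manifest for every regular file under directory $1 to file $2.
# $2 must be an absolute path outside $1, ideally on the USB stick, so the
# manifest cannot be damaged by the corruption it is meant to detect.
save_manifest() {
    ( cd "$1" && find . -type f -exec md5sum {} \; ) > "$2"
}

# Re-check directory $1 against manifest $2 (absolute path).
# Prints only the entries that did not verify and returns non-zero in that case.
verify_manifest() {
    if out=$(cd "$1" && md5sum -c "$2" 2>&1); then
        return 0
    fi
    printf '%s\n' "$out" | grep -v ': OK$'
    return 1
}
```

A possible sequence would be `save_manifest /isys /mnt/sda1/isys.md5` followed by `sync` before the reboot, then `verify_manifest /isys /mnt/sda1/isys.md5` afterwards; running the same pair on /etc would show which system files change along with the application.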



  • @magge If you don't use upx do you have the same issue?



  • @crispyoz Thanks for responding again!
    It should be 2nd bullet above, there I get a bunch of JFFS2 errors, so I'm not sure if maybe the file is just too big without upx, it's about 9.4 MB in that case.
    Looks to me like the 3rd bullet is slightly better (there the binary is not stripped but compressed with upx).



  • @magge Sorry, I missed that. We really need more information on your software's interaction with the file system. JFFS2 is a very robust system; Omega2 file system corruption such as you are describing is not something I have ever seen outside of a hardware design issue, and I don't recall any reports similar to your experience. My guess is that there is something in your code causing the issue and you've simply been lucky not to have experienced it on other hardware (I recall you mentioned your software was on another device).

    To remove hardware from the equation, I'd install your software on a stock power dock with Omega2+. Since we don't know what additional hardware your custom PCB includes, or exactly how your software interacts with the device, I can only offer an opinion on how I would proceed if faced with a similar issue.

    Honestly, I don't think the issue is the Omega2; your issue is simply too reproducible not to have been experienced by those of us who have had 100s or 1000s of Omega2s in production for years. So we're here to assist you in working through the issue until the light bulb moment arrives 🙂


  • administrators

    @magge Thanks for sharing your detailed testing report.

    I agree with @crispyoz's suggestion to try running your software on stock Onion hardware - this way we can rule out circuit issues.

    @magge said in Omega2S+ overlay JFFS2 filesystem corruption:

    Is there a prototyping board I should order when working with this? I tried to google for it but did not get any wiser...

    There's an Omega2S development board you can use. Or if you would prefer, you can use an Omega2+ and an Expansion Dock.
    Our distributors have all 3 products in stock and ready to ship.


