Firmware Boot Failure on Onion Omega2+: Steady Orange LED, Stuck Bootloader Mode and Missing Hostname
-
@mayur_ingle A custom PCB makes it trickier. Just so I am clear, you are using the Omega2+ (through-hole version), not the Omega2S+ (surface-mount)? Do you have a power dock and an ethernet expansion?
If the device can't boot, how did you carry out "Verified the /etc/rc.local configuration for any errors"? I assume you mean you checked it in your version control system, not on the device directly.
I have a few thoughts on possible causes:
1. Have you checked your design against the Hardware Design Guide? I read that your design is already well tested, but I had a similar issue a few years ago where a pin should have been pulled/pushed (I forget which) and accidentally had not been, so every once in a while my devices wouldn't boot, or started to boot and then stopped. I think this is less likely in your case, but worth checking.
2. My guess is the RAM is getting corrupted by something, because we can see the correct relocation address, then it reads the environment and stops. I recall a similar issue on another custom PCB; the Onion guys looked into it and adding some shielding resolved the issue.
3. You mentioned you are running some scripts; it is possible to corrupt your file system programmatically. Can you provide more detail on what the script(s) are doing?
If you are using an Omega2+ (through-hole), my next step would be to insert it into a standard dock and view the boot process using minicom or some other terminal software. I looked at DockLight but haven't used it; a raw terminal would rule out handshaking or similar problems as the cause.
You can access the flash by removing the cover and accessing the chip directly (if you're adventurous). @luz provided some instructions on this; check his posts for more details.
-
-
@crispyoz Thank you for your detailed response and suggestions.
1) Omega2+ Version Confirmation:
I am using the Omega2+ (through-hole version), not the Omega2S+ (surface-mount).
2) RAM Corruption and Shielding:
Your observations about potential RAM corruption due to inadequate shielding (your points 2 and 3) are insightful. I will investigate the custom PCB design to identify any potential interference or shielding gaps that might be causing the problem.
3) Script Behavior and Potential File System Corruption:
I suspect the scripts might also be contributing to the issue. For clarity:
The main_app.py script captures Modbus data packets (32 packets of 105 bytes each) and appends them to an Excel file every 2 seconds. It then publishes a JSON string (~3147 bytes) via MQTT every 5 minutes. It handles reading Excel parameters, connecting to MQTT, and clearing processed data.
The check_system_status.sh script monitors the system and triggers a reboot if main_app.py encounters issues like corruption or logging failure.
Important Observation:
The last log observed before the device got stuck in bootloader mode occurred immediately after executing the check_system_status.sh script, which takes action only if main_app.py is corrupted or fails to log data. The automatic reboot is issued from that same .sh script, and the device has been stuck ever since.
Recovery for Stuck Device:
Could you provide detailed steps to recover the device stuck in bootloader mode, given the behavior described below?
Observed Behavior:
The device fails to boot correctly, getting stuck in bootloader mode with a steady orange LED and no hostname availability.
Only a partial boot log is available after resetting.
-
@mayur_ingle this is a great bug report, very detailed!
Since you're using the through-hole Omega2+, I agree with @crispyoz's suggestion to try out your stuck devices on a standard dock, just to rule out any hardware issues.
IMO the issue is more likely to be file system corruption than RAM corruption.
Avoiding File System Corruption
@mayur_ingle said in Firmware Boot Failure on Onion Omega2+: Steady Orange LED, Stuck Bootloader Mode and Missing Hostname:
The main_app.py script captures Modbus data packets (32 packets of 105 bytes each) and appends them to an Excel file every 2 seconds.
I responded to your colleague on a GitHub Issue but I will post this here for visibility:
A few other users have reported file system instability when programs are running that frequently write to the flash storage. To get around this, we recommend moving any file writes to the /tmp directory (as this is actually on the RAM, not the flash).
In this case, data that should persist indefinitely should be copied over from /tmp to the flash filesystem (anything else on /) at some longer interval, perhaps daily. Cron is a solid tool for this copy job.
Recovering Stuck Devices
How many stuck devices do you currently have?
I'd like to confirm if the bootloader can be accessed on a stuck device.
On a working device, the bootloader menu can be enabled by powering on the device while holding the FW_RST pin (GPIO38) active. This reset pin is active-high, and this is the pin used by the reset button on the Omega2 Docks.
Keep in mind pressing the enter or space keys will not activate the bootloader menu.
Please try this first on a working device, and then try it on a "stuck" device. Report back how it goes.
-
A few other users have reported file system instability when programs are running that frequently write to the flash storage
I can echo @Lazar-Demin's observation re regular flash writes leading to file system issues. I have a custom PCB based on the Omega2S+ that maintains a sqlite3 database of network traffic on a specific set of ports; it then pushes counters via MQTT to our central server every few seconds. The upshot is that it is writing to flash sometimes every second or more. After about 3 months of normal use I started to see devices failing with file system issues. JFFS2 was able to fix many of the issues on restart and the sqlite3 db could be rebuilt to largely recover the historical data, but it was not a good long-term solution.
I added an SD card to my design and all my problems went away. The only extra cost is the hardware: I use Kingston 16GB SDHC U1 C10 cards, which you can buy for a few shekels each. A symlink or mountpoint negates the need to modify any software.
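To illustrate the symlink approach mentioned above, here is a minimal sketch. Both paths are assumptions made up for the example (the application directory and the SD card mount point will differ per design), and the helper name link_to_sd is hypothetical. It copies any existing files to the card, keeps the original directory as a .bak backup, and leaves a symlink in its place so the application keeps using the same path.

```shell
#!/bin/sh
# Hypothetical paths: adjust to your application and to where the card is mounted.
APP_DATA_DIR="${APP_DATA_DIR:-/root/myapp/data}"         # path the application already writes to (assumed)
SD_DATA_DIR="${SD_DATA_DIR:-/mnt/mmcblk0p1/myapp-data}"  # directory on the SD card (assumed)

# Replace a directory on flash with a symlink to a directory on the SD card.
# Existing files are copied to the card and the original is kept as <src>.bak.
link_to_sd() {
    src="$1"
    dst="$2"
    [ -L "$src" ] && return 0                  # already a symlink, nothing to do
    mkdir -p "$dst" || return 1
    if [ -d "$src" ]; then
        cp -R "$src/." "$dst/" || return 1     # copy existing data to the card
        mv "$src" "$src.bak" || return 1       # keep the original as a backup
    fi
    mkdir -p "$(dirname "$src")" || return 1
    ln -s "$dst" "$src"
}

# Usage (run once, with the card mounted):
#   link_to_sd "$APP_DATA_DIR" "$SD_DATA_DIR"
```

The backup directory can be removed once the data on the card has been checked.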
-
@Lazar-Demin Thank you for your detailed response and suggestions.
Updates and Observations
Setup: Using a custom PCB (not an expansion dock).
1) File Writes to /tmp
Following your suggestion, I have updated the setup to move all file writes to the /tmp directory. This is intended to prevent flash wear and file system corruption.
2) Recovery Attempts on Devices
I currently have three stuck devices.
a) On a working device:
I followed the recommended steps: powering on while holding the FW_RST pin (GPIO38) active.
Observation: This process erases the existing firmware on the working device, resets it, and allows me to re-upload the firmware. After this, folders become accessible, and the hostname is visible.
I did not see a bootloader menu during this process; it directly erased the firmware and enabled reconfiguration. In short, it is working fine.
b) On the stuck devices:
I performed the same steps as with the working device, but there was no change. The stuck devices remain in their frozen state, displaying the same log output as shared earlier:
Board: Onion Omega2 APSoC
DRAM: 128 MB
relocate_code Pointer at: 87f60000
flash manufacture id: c2, device id 20 19
find flash: MX25L25635E
*** Warning - bad CRC, using default environment
Current Status:
Following your suggestion, I updated the file writes to /tmp and the device is under observation for 8 to 10 days (as my old device got stuck after 8 days).
Do you have any further suggestions for recovering the stuck devices, given that the bootloader menu doesn’t appear to be accessible? Could there be an underlying hardware issue contributing to this behavior?
I look forward to your advice.
-
@crispyoz Thank you for sharing your experience.
I’m using the Omega2+ (through-hole version) on a custom PCB, which already has an SD card slot below it.
**Do I need to configure anything on the software side for the SD card, or can I simply insert the SD card and start using it? If any configuration steps are required, could you kindly share them with me? Your guidance would be greatly appreciated.**
-
@mayur_ingle I can see you're using a standard Onion release of the firmware so the SD Card requirements are pre-installed. You can insert an SD card into the slot and you'll see in the log something like this:
[63130.024501] mmc0: new high speed SDHC card at address 59b4
[63130.041132] mmcblk0: mmc0:59b4 SD16G 14.6 GiB
[63130.048468] mmcblk0: p1 p2
So we can see the device is mmcblk0 (the first device) and it has two partitions, p1 and p2. You set it to automount using these commands:
uci set fstab.@global[0].auto_mount='1'
uci commit fstab
Re-insert the card and you should see the card is mounted automagically. Use the mount command to see where it was mounted:
/dev/mmcblk0p2 on /mnt/mmcblk0p2 type ext4 (rw,relatime)
/dev/mmcblk0p1 on /mnt/mmcblk0p1 type vfat (rw,relatime,fmask=0000,dmask=0000,allow_utime=0022,codepage=437,iocharset=iso8859-1,shortname=mixed,errors=remount-ro)
The card I used above is from a Raspberry Pi 4B so it has two partitions; you will probably only see /dev/mmcblk0p1. You can see it is mounted at /mnt/mmcblk0p1, and you can view its contents using the command:
ls -la /mnt/mmcblk0p1
To set the default mountpoint for the card to my preferred location of /etc/myappname/data, we can add an entry to fstab.
First we need the UUID assigned to the SD Card device, using the command:
block info
The output will be something like this:
/dev/mtdblock5: UUID="188c96f5-f6939c36-1805def7-637d9f6c" VERSION="4.0" MOUNT="/rom" TYPE="squashfs"
/dev/mtdblock6: MOUNT="/overlay" TYPE="jffs2"
/dev/mtdblock7: MOUNT="/mnt/mtdblock7" TYPE="jffs2"
/dev/mmcblk0p1: UUID="3537-3964" LABEL="NO NAME" VERSION="FAT32" MOUNT="/mnt/mmcblk0p1" TYPE="vfat"
You can see the UUID of the mmc device at the bottom is "3537-3964" and the card is formatted as FAT32. Now I can add the default mount point using the following commands:
uci add fstab mount
uci set fstab.@mount[0].uuid='3537-3964'    # UUID you found above
uci set fstab.@mount[0].target='/etc/myappname/data'
uci set fstab.@mount[0].enabled='1'
uci commit fstab
Remove the card then reinsert it; you should now be able to see the card is mounted at /etc/myappname/data.
You can point your database or script output to this directory and everything will be written to your SD Card.
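One thing worth guarding against with this setup: if the card is missing or fails to mount, /etc/myappname/data is just an ordinary directory on the flash, and writes will quietly land there. A minimal sketch of a check, assuming the mount target from the fstab entry above; the helper name is_mounted is hypothetical, and its optional second argument exists only so it can be tried against a sample file instead of /proc/mounts.

```shell
#!/bin/sh
# Succeeds only if the given directory is listed as a mount point.
# The mounts table defaults to /proc/mounts.
is_mounted() {
    target="$1"
    table="${2:-/proc/mounts}"
    # Field 2 of each line in the mounts table is the mount point.
    awk -v t="$target" '$2 == t { found = 1 } END { exit !found }' "$table"
}

DATA_DIR="/etc/myappname/data"   # mount target from the fstab entry above

if is_mounted "$DATA_DIR"; then
    echo "SD card mounted at $DATA_DIR"
else
    echo "SD card not mounted at $DATA_DIR, refusing to write there" >&2
fi
```

A logging script could call is_mounted before each write and fall back to /tmp when the check fails.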
-
@mayur_ingle said in Firmware Boot Failure on Onion Omega2+: Steady Orange LED, Stuck Bootloader Mode and Missing Hostname:
a) On a working device:
I followed the recommended steps: powering on while holding the FW_RST pin (GPIO38) active.
Observation: This process erases the existing firmware on the working device, resets it, and allows me to re-upload the firmware. After this, folders become accessible, and the hostname is visible.
I did not see a bootloader menu during this process; it directly erased the firmware and enabled reconfiguration. In short working fine
This is not expected.
Can you elaborate on your observations? What do you mean by it erases the existing firmware on the device? What were the steps you had to do to make this happen? Can you post a log of the terminal?
Expected behaviour
If you have Omega2 devices manufactured in the last ~7 years, you should see a bootloader menu if the device is powered on with the FW_RST active:
You then need to select an option from the menu. See the Firmware Flashing With Web Recovery Mode docs article for the full process.
@mayur_ingle said in Firmware Boot Failure on Onion Omega2+: Steady Orange LED, Stuck Bootloader Mode and Missing Hostname:
Following your suggestion, I updated the file writes to /tmp and the device is under observation for 8 to 10 days (as my old device got stuck after 8 days).
Great! I suspect this will resolve the issue. Let us know how it goes!
-
@Lazar-Demin After retesting on a working device, I now get the bootloader menu as expected when following your steps.
Log as below:
b) On the stuck devices:
On the stuck devices there is no change. I performed the same steps multiple times, and the stuck devices remain in their frozen state, displaying the same log output as shared earlier for the non-working device.
Observed Behavior
The device fails to boot correctly, getting stuck in bootloader mode with a steady orange LED and no hostname availability.
Only a partial boot log is available after following the steps above. Log as below:
Board: Onion Omega2 APSoC
DRAM: 128 MB
relocate_code Pointer at: 87f60000
flash manufacture id: c2, device id 20 19
find flash: MX25L25635E
*** Warning - bad CRC, using default environment
-
@mayur_ingle ok, glad to hear you're now seeing the expected behaviour with the working devices.
The situation is a little unusual with the stuck devices. I didn't expect the bootloader to be impacted by the file system issue.
I agree with what @crispyoz said above:
If you are using an Omega2+ (through hole) my next step would be to insert it into a standard dock and view the boot process using minicom or some other terminal software. I looked at DockLight but haven't used it, but a raw terminal would remove any potential issues of handshaking or such causing the issue.
For the stuck devices your next step should be trying them on a standard Dock from Onion, and using a simple terminal program like screen, minicom, or putty to try to activate the bootloader menu.
Otherwise, these 3 devices might be write-offs. You can try to recover them by using an external device to rewrite the flash but we (Onion) don't recommend this procedure as a lot can go wrong.
-
@Lazar-Demin
Based on your suggestion, I updated the file writes to /tmp and monitored the device for 8–10 days. Initially, it performed as expected, but after 14 days, the device entered a stuck mode, similar to previous occurrences.
The symptoms include a steady orange LED, being stuck in bootloader mode, and a missing hostname.
I’ve attached the last log captured before this device got stuck (it was running with your suggested change) for your reference. The log includes lines related to LED set 1, dropbear, udhcpd, and python. The script is designed to automatically reboot the system if Python is detected as not running, which successfully handled such situations during these 14 days. I am also attaching the log from one of those successful recoveries below.
However, despite this safeguard, the device entered the stuck state after the attached log, failing to recover as intended.
Last log, device stuck after the "python not running" situation:
Log when the "python not running" situation was detected (device working after automatic reboot):
Do you have any further suggestions for recovering the stuck devices, given that the bootloader menu doesn’t appear to be accessible? Could there be an underlying hardware issue contributing to this behavior?
I look forward to your advice. Let me know if there is anything I can do externally on the hardware (Onion Omega2+ through-hole board, model OM-O2P).
-
@mayur_ingle said in Firmware Boot Failure on Onion Omega2+: Steady Orange LED, Stuck Bootloader Mode and Missing Hostname:
Initially, it performed as expected, but after 14 days, the device entered a stuck mode, similar to previous occurrences.
The fact that it now takes 14 days instead of 8-10 days is strange. If the cause of modules getting stuck was indeed too many writes to flash, then I would expect the problem to be resolved. If this was not the cause, I would expect the time it takes to get stuck to be the same as before. It's strange that it changed by a few days.
From a high-level, what else does the script do? Are there any other interactions with the filesystem?
How often do you move the files from /tmp to the flash storage?
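For reference, the periodic copy from /tmp to flash suggested earlier in the thread could be sketched roughly as follows. The directory names, the script location and the daily schedule are all assumptions for illustration, not details of the actual setup. Each file is copied under a temporary name and then renamed, so an interrupted copy is less likely to leave a half-written file under the final name.

```shell
#!/bin/sh
# Hypothetical paths: SRC is the RAM-backed working directory, DST is on flash.
SRC="${SRC:-/tmp/myapp}"
DST="${DST:-/root/myapp-archive}"

# Copy every regular file from SRC to DST via a temporary name, then rename.
sync_to_flash() {
    src="$1"
    dst="$2"
    [ -d "$src" ] || return 0          # nothing to copy yet
    mkdir -p "$dst" || return 1
    for f in "$src"/*; do
        [ -f "$f" ] || continue        # skip subdirectories and empty globs
        name=$(basename "$f")
        cp "$f" "$dst/.$name.part" && mv "$dst/.$name.part" "$dst/$name" || return 1
    done
    sync                               # ask the kernel to flush buffered writes
}

# Example crontab entry (edit with `crontab -e`), running this script daily at 03:00:
#   0 3 * * * /root/sync_to_flash.sh
sync_to_flash "$SRC" "$DST"
```

With one run per day, the flash sees a single burst of writes instead of one every 2 seconds.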
@mayur_ingle said in Firmware Boot Failure on Onion Omega2+: Steady Orange LED, Stuck Bootloader Mode and Missing Hostname:
The script is designed to automatically reboot and restart the system if Python is detected as not running,
How are you starting the python program in the first place? And is there a specific reason why you're rebooting the whole device if the python program stops running?
A more straightforward solution would be to run the python program as a service: the system will take care of restarting it if it stops executing, without the need for a full reboot and downtime.
This is the recommended approach.
See this blog post for more information on running a program as a service: https://onion.io/2bt-custom-initd-service/
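As a rough idea of what such a service can look like, here is a minimal procd init script sketch. It is not taken from the blog post; the file name /etc/init.d/main_app, the interpreter path and the script location are assumptions to adapt to the real setup.

```shell
#!/bin/sh /etc/rc.common
# Hypothetical file: /etc/init.d/main_app

START=99        # start late in the boot sequence
USE_PROCD=1     # let procd supervise the process

PROG="/usr/bin/python3"        # interpreter path (assumed)
SCRIPT="/root/main_app.py"     # script location (assumed)

start_service() {
    procd_open_instance
    procd_set_param command "$PROG" "$SCRIPT"
    # respawn <threshold> <timeout> <retry>:
    # an exit within 3600 s of starting counts as a crash, procd waits 5 s
    # before restarting, and gives up after 5 crashes in a row
    procd_set_param respawn 3600 5 5
    procd_set_param stdout 1   # forward stdout to the system log
    procd_set_param stderr 1   # forward stderr to the system log
    procd_close_instance
}
```

After making the file executable with chmod +x, running /etc/init.d/main_app enable followed by /etc/init.d/main_app start should register and launch the service, and logread shows its output.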
@mayur_ingle said in Firmware Boot Failure on Onion Omega2+: Steady Orange LED, Stuck Bootloader Mode and Missing Hostname:
Do you have any further suggestions for recovering the stuck devices, given that the bootloader menu doesn’t appear to be accessible?
This is a very strange situation. The bootloader is in a completely separate partition, and I'm not sure how anything done in Linux can impact the bootloader partition, especially since the bootloader partition is set to read-only from Linux...
Suggestions from my previous post:
For the stuck devices your next step should be trying them on a standard Dock from Onion, and using a simple terminal program like screen, minicom, or putty to try to activate the bootloader menu.
Otherwise, these 3 devices might be write-offs. You can try to recover them by using an external device to rewrite the flash but we (Onion) don't recommend this procedure as a lot can go wrong.
@mayur_ingle said in Firmware Boot Failure on Onion Omega2+: Steady Orange LED, Stuck Bootloader Mode and Missing Hostname:
Could there be an underlying hardware issue contributing to this behavior?
I find this highly unlikely. Otherwise this issue would be seen on all Omega devices eventually.
Since other users have seen a similar issue and successfully resolved it by moving the writes to /tmp, I would venture a guess that it has something to do with how the compressed filesystem reacts to very frequent writes to files.
There's one more avenue you could explore. Try running the same program (but as a service and all file writes going to /tmp, and your reboot script removed) on the new beta firmware.
The beta firmware is based on kernel 5.15, there may be updates to how the kernel interacts with the flash that could resolve the issue you're seeing.
More info and installation instructions for the beta firmware found here: https://onion.io/embracing-the-future-new-omega2-beta-firmware-and-documentation-site/
Let me know how it goes!
-
@mayur_ingle For my 2c, I'd run your device on a standard dock and see if you experience the same issue. I'd also move your writes to an SD card and add a regular file system check to your script that reports to the console.
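A minimal sketch of what such a check could look like, assuming a simple write and read-back probe is enough. The function name fs_check and the directory in the usage comment are hypothetical. This only catches gross failures such as a filesystem that has gone read-only or is full, and the read-back may be served from cache rather than from the storage itself.

```shell
#!/bin/sh
# Write a small probe file into DIR, read it back, and report the result.
# Returns 0 if the round trip succeeds, 1 otherwise.
fs_check() {
    dir="$1"
    probe="$dir/.fs_check.$$"
    stamp="probe-$$"
    if { echo "$stamp" > "$probe"; } 2>/dev/null \
        && [ "$(cat "$probe" 2>/dev/null)" = "$stamp" ]; then
        rm -f "$probe"
        echo "fs_check: $dir OK"
        return 0
    fi
    rm -f "$probe" 2>/dev/null
    echo "fs_check: $dir FAILED (write or read-back error)"
    return 1
}

# Hypothetical usage: check the data directory and send the result to the console.
#   fs_check /etc/myappname/data > /dev/console
```

Since the probe is itself a write, it is better run at a long interval (for example from the same cron job that archives data) than every few seconds.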