IOS APs - Flash Issues
Introduction
In some instances, IOS APs fail with errors reporting no flash access or file system corruption, especially when the WLC is upgraded.
The problem can leave the AP in one of the following states after a reboot:
- AP loses its configuration but still registers to the WLC
- AP is unable to perform an upgrade
- AP is stuck in a boot loop
- AP ends up in ROMMON; console access to the AP is needed to recover it
- AP might not be able to load the image from flash
- AP loses IOS images (recovery / code image)
WLAN Poller Logic
The script works as follows:
Verify APs
The script first verifies that the AP’s flash is accessible by checking the “show file systems” output.
If the flash is accessible, the script runs “fsck flash:”.
If no errors are reported, the script moves on.
Otherwise, it repeats the command up to 4 times, until no more errors are reported.
This ensures that the file system is clean before a recovery is even attempted. Any error reported along the way is still recorded.
The script also verifies the MD5 checksum of critical files:
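The check-and-retry loop described above can be sketched as follows. This is only an illustration, not the poller's actual code. `run_command` is a hypothetical callable that sends a CLI command to the AP and returns its text output. The error test (looking for the word "error" in the output) is an assumption, since the exact fsck output is not shown here. The cap of 4 is applied to the total number of runs, which is one possible reading of the text.

```python
from typing import Callable

MAX_FSCK_ATTEMPTS = 4  # assumed to be the total number of fsck runs


def check_flash(run_command: Callable[[str], str],
                max_attempts: int = MAX_FSCK_ATTEMPTS) -> dict:
    """Run "fsck flash:" until the output is clean or attempts run out.

    run_command is a hypothetical helper: it takes a CLI command string
    and returns the command output as text.
    """
    errors_seen = False
    for attempt in range(1, max_attempts + 1):
        output = run_command("fsck flash:")
        # Assumption: output mentioning "error" means fsck found a problem.
        if "error" not in output.lower():
            return {"clean": True, "attempts": attempt,
                    "errors_seen": errors_seen}
        errors_seen = True  # keep the record even if a later pass is clean
    return {"clean": False, "attempts": max_attempts, "errors_seen": True}
```

Keeping `errors_seen` separate from `clean` reflects the point made above: an AP whose file system was repaired by a later pass is still reported as having had an error.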
- IOS image
- Radio firmware
- Power tables
The script has to run at least once to build the MD5 hash database, so it can detect MD5 mismatches only from the 2nd run onward.
Maintaining a static database of such MD5 values is impractical (multiple files, multiple releases, multiple AP platforms). Instead, on the first run the script collects the MD5 checksum of each specific file and picks the value that got the most hits across all the APs on a WLC as the good one. For this part to work correctly, you need multiple APs of the same family. They do not have to be the exact same model (e.g. ap3g2 covers AP 2600/3600/1700/2700/3700).
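The "most hits wins" selection can be sketched with a counter per file. This is a minimal sketch under assumptions, not the script's real implementation. The input shape (tuples of AP name, file name, MD5) and the function names are hypothetical.

```python
from collections import Counter, defaultdict


def build_md5_baseline(observations):
    """Pick the most common MD5 per file name.

    observations: iterable of (ap_name, file_name, md5) tuples. This input
    shape is assumed for the example.
    """
    counts = defaultdict(Counter)
    for _ap_name, file_name, md5 in observations:
        counts[file_name][md5] += 1
    # most_common(1) yields the checksum with the most hits for that file
    return {name: c.most_common(1)[0][0] for name, c in counts.items()}


def find_mismatches(observations, baseline):
    """Return (ap_name, file_name) pairs whose MD5 differs from the baseline."""
    return [(ap, name) for ap, name, md5 in observations
            if baseline.get(name) not in (None, md5)]
```

With a single AP, or with an even split between two values, the vote carries no real information, which is why several APs of the same family are needed.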
Note: The “database” of MD5 checksums is stored in the script’s working directory in JSON format:
ap_md5_db.json - the current file
ap_md5_db.json.bkp - backup from the previous run (just in case)
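One way to keep the current file and its backup in step is shown below. This is a sketch only. The two file names come from the note above, but the function names and the layout of the dictionary stored in the JSON file are assumptions.

```python
import json
import shutil
from pathlib import Path

DB_NAME = "ap_md5_db.json"
BACKUP_NAME = "ap_md5_db.json.bkp"


def load_md5_db(directory: str = ".") -> dict:
    """Load the checksum database, or return an empty one on the first run."""
    db_path = Path(directory) / DB_NAME
    if not db_path.exists():
        return {}
    return json.loads(db_path.read_text())


def save_md5_db(db: dict, directory: str = ".") -> None:
    """Copy the previous database to the .bkp file, then write the new one."""
    db_path = Path(directory) / DB_NAME
    if db_path.exists():
        shutil.copy2(db_path, Path(directory) / BACKUP_NAME)
    db_path.write_text(json.dumps(db, indent=2))
```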
Recover APs
This part triggers a “test capwap image capwap” command only on the APs where the flash is accessible but errors were found: fsck errors, an MD5 mismatch, or both.
This recovery method causes the AP to reload once the image is downloaded and installed, so you may want to run it in the evening or at night, or at any other time when users are not impacted.
Note: You can recover APs on demand through the config.ini file. Add the list of APs that you want to recover, instead of having the script pull the list from the WLC and recover every registered AP that needs it.
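The rule for choosing which APs to recover can be sketched as a simple filter. The `ApCheck` record and its field names are hypothetical and only stand in for whatever the script keeps per AP. The command string is the one named above.

```python
from dataclasses import dataclass

RECOVERY_COMMAND = "test capwap image capwap"  # command named in the text


@dataclass
class ApCheck:
    """Hypothetical per-AP result of the verification step."""
    name: str
    flash_accessible: bool
    fsck_errors: bool
    md5_mismatch: bool


def needs_recovery(ap: ApCheck) -> bool:
    # Recover only when flash is reachable AND at least one check failed.
    return ap.flash_accessible and (ap.fsck_errors or ap.md5_mismatch)


def select_for_recovery(aps):
    """Return the names of the APs that should receive RECOVERY_COMMAND."""
    return [ap.name for ap in aps if needs_recovery(ap)]
```

An AP whose flash is not accessible is left out on purpose, since the text limits this recovery method to APs with accessible flash.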
Script Output
You’ll find the raw CLI command output in the poller’s “data” directory, under a // path, with one file per device containing all of its CLI command output.
In the same data directory, for the current day, you’ll also find the following files:
_ap_fs.csv - the file system check report, with the result of the above-mentioned tests for each AP
_ap_md5.csv - a list of all the files checked on all APs, each with its computed MD5 checksum (and, starting with the 2nd run, whether that MD5 matches the “good” one for the file)