2020-12-23 at 8:40 PM #14976
I’m wondering who else might be doing system stability/reliability testing, and what their setup is.
By system stability, I mean a test setup that represents real-world conditions and exercises the majority of the core code.
Traditionally, in open source this is such a squiggy area that it requires a community of supporters to exercise it. In commercial organizations, this is typically performed by a quality assurance group that reports to a different level of management than the software designers.
My core stability test setup consists of a Mayfly 0.5b with a Digi XB2B WiFi module, an AM2320 temperature/humidity sensor, a 4AHr battery, a uSD card, and a modified FTDI monitoring cable. My 2nd stability system consists of the same, but with a Digi LTE modem.
The modified FTDI monitoring cable provides non-intrusive monitoring of system status.
Mayfly 0.5b: this is the latest revision, and for want of any other definition, only the latest hardware revision should be used.
Digi WiFi XB2B: a stable local WiFi network may provide the most reliable access to the internet, and WiFi provides the ability to manage gateway error conditions. The big advantage of WiFi is an unmetered connection.
4AHr LiIon, Adafruit-354: charging temperature 0~45C, discharge -20~60C. I’ve been having problems with 2A batteries in cold temperatures that I can’t explain.
Pimoroni AM2320, Adafruit-3721: included so that the I2C subsystem is exercised.
I define real-world conditions as varying equipment temperature from 0C (down to -5C) to 40C, varying wireless connectivity, and varied solar input, including no solar until the battery is exhausted.
This generally looks like this – https://github.com/neilh10/ModularSensors/wiki/Testing-overview
The other part of a stable/reliable system is having a clear set of test objectives with a repeatable build process to produce the software, which I believe I have. I always start with no libraries installed and let the build pull in everything.
I am building off 0.27.0. While ModularSensors is amazing, it has a diverse ecosystem of supporting modules, and a reliable reference system is the basis of any reliability discussion.
I had been hoping to get a stability test for 0.27.0 done on my standard system over the holidays, but after a week of 24-48 hour test runs, finding some issues, tweaking, and trying again, all I can say is that I am running into issues on the basic stability system.
I did have some pretty good stability tests in the past (0.25?), so now I’m questioning those and thinking about how I might check them again. One big difference from my last tests is the mild winter, with temperatures dropping closer to 0C than before.
The issues are that the XB2B hybrid is not always responding, and I’m getting some unexpected RESETs (reboots).
The XB2B hybrid seems to get busy, and a series of “+++” does not result in a response. I’ve solved some of these with some strategic delay() calls, but I think I need to explore TinyGSM 0.10.9.
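For reference, Digi’s XBee documentation describes entering command mode as a guard time of serial silence (the GT parameter, 1 second by default), the “+++” sequence with no line ending, then another guard time, after which the radio answers “OK”. If the radio is still busy, retrying with a longer silence may help, which is roughly what the strategic delays do. This is a host-runnable sketch under that assumption; `ModemPort`, `enterCommandMode` and `BusyMock` are hypothetical names, not TinyGSM or ModularSensors APIs, and the 500 ms back-off is an arbitrary choice.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical abstraction over the serial link to the XBee.
// On real hardware these would wrap the modem's Serial port and delay().
struct ModemPort {
    virtual ~ModemPort() = default;
    virtual void write(const std::string& s) = 0;
    // Returns whatever text arrived within timeoutMs ("" if nothing).
    virtual std::string readFor(uint32_t timeoutMs) = 0;
    virtual void sleepMs(uint32_t ms) = 0;
};

// Try to enter AT command mode: stay silent for the guard time, send "+++"
// with no line ending, then wait for "OK". Each retry lengthens the leading
// silence in case the modem was still busy.
// Returns the 1-based attempt that succeeded, or 0 if none did.
int enterCommandMode(ModemPort& port, int maxAttempts,
                     uint32_t guardMs = 1000, uint32_t extraPerRetryMs = 500) {
    assert(maxAttempts > 0);
    for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
        uint32_t silence = guardMs + extraPerRetryMs * static_cast<uint32_t>(attempt - 1);
        port.sleepMs(silence);     // leading guard time
        port.write("+++");         // command sequence, no CR/LF
        // The radio only replies after the trailing guard time has elapsed.
        std::string reply = port.readFor(guardMs + 1000);
        if (reply.find("OK") != std::string::npos) return attempt;
    }
    return 0;
}

// Mock modem for host testing: ignores the first `busyCount` "+++" sequences.
struct BusyMock : ModemPort {
    int busyCount;
    int plusCount = 0;
    uint32_t sleptMs = 0;
    explicit BusyMock(int busy) : busyCount(busy) {}
    void write(const std::string& s) override { if (s == "+++") ++plusCount; }
    std::string readFor(uint32_t) override { return plusCount > busyCount ? "OK\r" : ""; }
    void sleepMs(uint32_t ms) override { sleptMs += ms; }
};
```

With a mock that ignores the first two sequences, `enterCommandMode(mock, 5)` succeeds on the 3rd attempt after 4.5 s of total leading silence.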
I’m getting periodic resets; at this point I can only think it is the watchdog kicking in. However, since it doesn’t advertise when it’s going to bite, I need to do some work on this, and also on figuring out what caused the RESET (read MCUSR early). After a reset, the XB2B hybrid is sometimes not able to find the WiFi SSID for a couple of hours. If I turn the power off and then back on, it connects immediately.
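On the ATmega1284P used by the Mayfly, MCUSR holds flags for power-on, external, brown-out, watchdog and JTAG resets. The avr-libc watchdog documentation shows saving MCUSR in an early startup section, clearing it and disabling the watchdog, because on these parts the watchdog stays enabled after a watchdog reset. Depending on the bootloader, the flags may already have been altered before the sketch runs, which is worth checking on the bench. A minimal host-side sketch for turning the saved byte into a log string; `describeResetCause` is a hypothetical helper.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Reset-cause flag bits in the ATmega1284P MCUSR register.
constexpr uint8_t PORF_BIT  = 1 << 0;  // power-on reset
constexpr uint8_t EXTRF_BIT = 1 << 1;  // external reset (RESET pin / button)
constexpr uint8_t BORF_BIT  = 1 << 2;  // brown-out reset
constexpr uint8_t WDRF_BIT  = 1 << 3;  // watchdog system reset
constexpr uint8_t JTRF_BIT  = 1 << 4;  // JTAG reset

// Turn a saved MCUSR value into a readable list for the debug log.
// More than one flag can be set if the register was not cleared after an
// earlier reset.
//
// On the AVR itself (not compiled here) the value would be captured as early
// as possible and then cleared, e.g.:  uint8_t saved = MCUSR; MCUSR = 0;
std::string describeResetCause(uint8_t mcusr) {
    std::string out;
    auto add = [&out](const char* name) {
        if (!out.empty()) out += ",";
        out += name;
    };
    if (mcusr & PORF_BIT)  add("POWER_ON");
    if (mcusr & EXTRF_BIT) add("EXTERNAL");
    if (mcusr & BORF_BIT)  add("BROWN_OUT");
    if (mcusr & WDRF_BIT)  add("WATCHDOG");
    if (mcusr & JTRF_BIT)  add("JTAG");
    return out.empty() ? std::string("UNKNOWN") : out;
}
```

A saved value of 0x08 would log as `WATCHDOG`, which would confirm or rule out the watchdog as the source of the periodic resets.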
The link to data.monitormywatershed.org often doesn’t get a response over many POSTs. I’m sort of assuming this may mostly be on the MMW side, but it could also be the XB2B link not always coming up, so I’m trying to figure out a reliable configuration. There may be something happening with the MMW response that then makes the XB2B busy, such that it doesn’t respond to “+++”.
So really, I’m just putting this out there to see if anybody else on a defined release has a setup that they are getting good long-term results with. 🙂
2021-01-04 at 9:57 PM #14995
Some status – the periodic Mayfly RESETs turned out, I think, to be happening when polling the Insitu LT500 gauge over SDI12. https://github.com/EnviroDIY/ModularSensors/issues/344
The new code breaks polling of the Insitu LT500; I’m debugging under various scenarios to try to understand it.
I’m seeing instability on the WiFi S6B hybrid that I didn’t see on the 0.25.0 release. It is repeatable, but doesn’t make a lot of sense to me at this point, so maybe it’s driver error. https://github.com/EnviroDIY/ModularSensors/issues/347
My next step is to try powering off the WiFi S6B, instead of putting it to sleep, using the LTE Bee Adapter.
2021-01-11 at 9:58 PM #15009
Well, after running for some time with the WiFi S6B, and still having problems once it has been running for a couple of hours,
I’ve switched to one of my target field internet comms: Digi XBee3 LTE-M over Verizon.
Very interesting: the Digikey RevXsystems plans at https://dataplans.digikey.com (thanks to @mbarney for the suggestion) include a new low-cost 50MB/month CAT-M1 plan.
Wow, this is excellent! Based on some recent experience with a marginal/failing system, this is going to be worthwhile.
Since this was a new XBee3 LTE, I upgraded it to the latest firmware using the Digi “TH Development” board. Then, with it on the LTE adapter on the Mayfly, I plugged in the battery. At its core the software is @srgdamiano’s hard sweat in DigiXBeeCellularTransparent.cpp, though I have modified it to show the process of connecting to the cell network.
Since connecting for the first time over LTE and Verizon is such an occasional event, I thought I would paste in the trace.
It connected first time… (whew!)
Attempting to connect to the internet and synchronize RTC with NIST
This may take up to two minutes!
Lte internet comms with Digi XBee3 Cellular LTE-M IMEI OK HwVer 4B48 FwVer 11417
Loop=Sec] rx db : Status ‘ Operator ‘ #Polled Cell Status every 1sec
0=7.89] 0:0x22 ‘OK’
1=8.91] 0:0x22 ‘OK’
2=9.93] 0:0xff ‘OK’
WATCHDOG ISR barksUntilReset 149 <–WatchDogAVR
3=10.95] 0:0xff ‘OK’
4=11.97] 0:0xff ‘OK’
5=12.99] 0:0xff ‘OK’
6=14.01] 0:0xff ‘OK’
7=15.04] 0:0x22 ‘OK’
8=16.06] 0:0x22 ‘OK’
9=17.08] 0:0x22 ‘OK’
10=18.10] 0:0x22 ‘OK’
Try +CREG ‘
11=19.13] 0:0xe ’22’
WATCHDOG ISR barksUntilReset 148 <–WatchDogAVR
12=20.15] 0:0x0 ’22’ Cnt=1
13=21.17] 0:0x0 ’22’ Cnt=2
14=22.19] 0:0x0 ’22’ Cnt=3
Digi Xbee3 setup Sucess. Registration ‘ 0 ‘
mdmIP[ 1 / 16 ] ‘ 0.0.0.0 ‘= 7
mdmIP[ 2 / 16 ] ‘ 0.0.0.0 ‘= 7
WATCHDOG ISR barksUntilReset 147 <–WatchDogAVR
mdmIP[ 3 / 16 ] ‘ 0.0.0.0 ‘= 7
mdmIP[ 4 / 16 ] ‘ 0.0.0.0 ‘= 7
mdmIP[ 5 / 16 ] ‘ 100.104.156.99 ‘= 14
XbeeWLTE IP# [ 100.104.156.99 ]
0 ] Connect time.nist.gov
WATCHDOG ISR barksUntilReset 146 <–WatchDogAVR
WATCHDOG ISR barksUntilReset 145 <–WatchDogAVR
1 ] Connect time.nist.gov
WATCHDOG ISR barksUntilReset 144 <–WatchDogAVR
NIST responded after 2562 ms
Internal Clock within 5 seconds of NIST.
Putting modem to sleep
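The trace shows a pattern worth noting: the status is polled about once a second, and success is only declared after several consecutive 0x0 readings (Cnt=1..3). Assuming the Status column is the XBee3’s AI (Association Indication) value, Digi documents 0x00 as connected to the Internet, 0x22 as registering to the cellular network and 0xFF as initializing; any other value (such as the 0xe seen once) is treated as not connected here. A sketch of that debounce; `waitForRegistration` is a hypothetical function, not code from DigiXBeeCellularTransparent.cpp.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>

// Outcome of waiting for cellular registration.
struct RegistrationResult {
    bool connected;
    int polls;  // number of status polls used
};

// Poll `readStatus` every `pollIntervalMs` until `needed` consecutive polls
// report 0x00 (connected), or `maxPolls` is exhausted. Any other value
// (e.g. 0x22 registering, 0xFF initializing) restarts the count.
RegistrationResult waitForRegistration(const std::function<uint8_t()>& readStatus,
                                       const std::function<void(uint32_t)>& sleepMs,
                                       int maxPolls, int needed = 3,
                                       uint32_t pollIntervalMs = 1000) {
    assert(maxPolls > 0 && needed > 0);
    int consecutive = 0;
    for (int poll = 1; poll <= maxPolls; ++poll) {
        uint8_t status = readStatus();
        consecutive = (status == 0x00) ? consecutive + 1 : 0;
        if (consecutive >= needed) return {true, poll};
        sleepMs(pollIntervalMs);
    }
    return {false, maxPolls};
}
```

Fed the 15 status values from the trace above, it reports success on poll 15, matching the Cnt=3 line. The same bounded-poll idea applies to the mdmIP loop, which waits for an address other than 0.0.0.0.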
2021-01-13 at 6:10 PM #15011
I’ve received more “LTE Bee Adapter” cards, and am looking to experiment with them to investigate why the XBee WiFi S6B is not reliably connecting to the local WiFi network. https://github.com/neilh10/ModularSensors/issues/21
The LTE Bee Adapter card provides power directly from the LiIon battery, control of the Xbee reset, and potentially also an Xbee power OFF capability.
The WiFi S6B is specified for 3.14 to 3.46V, and the LiIon battery can be up to 4.2V, so powering it directly from the battery isn’t a good long-term solution.
Part of my tests are to let the LiIon battery discharge, as might be expected in the field with little sun, and then incrementally charge it back up, as solar is available, possibly over days. This is one of the most difficult parts of powering (and testing), slowly varying power availability. This appears to be causing some unreliability, and I haven’t yet been able to identify if it is something in my setup or something else.
Part of the LiIon battery discharge characteristic is that its voltage drops and its internal impedance rises. The priority is to keep the Mayfly running with good traceable wall time, taking sensor readings (with wall time) and then transmitting them to the internet when power is available. The rate of voltage drop and impedance rise depends on the capacity of the LiIon battery. I’m standardizing on a 4AHr outdoor (-10C?) rated battery. LiIon impedance also rises as temperature drops. So there is a narrow window in which the battery, as measured by its voltage, can support the highest dynamic power demand, typically when using RF power. In the real world, discharging a 4AHr battery can take a week, which is normally a good thing, but for testing I’m having to be creative.
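The week-long discharge can be turned into an average current, which helps when planning how to shorten a test. A minimal sketch, assuming the full rated capacity is usable; `usableFraction` is a hypothetical knob for cold or high-impedance conditions, not a measured value.

```cpp
#include <cassert>

// Average current (mA) that empties `capacityMah` in `hours`, assuming the
// full rated capacity is usable.
double averageCurrentMa(double capacityMah, double hours) {
    assert(hours > 0.0);
    return capacityMah / hours;
}

// Runtime (hours) at a given average draw, using only `usableFraction` of the
// rated capacity (e.g. to model cold temperatures or rising impedance).
double runtimeHours(double capacityMah, double avgCurrentMa, double usableFraction = 1.0) {
    assert(avgCurrentMa > 0.0 && usableFraction > 0.0 && usableFraction <= 1.0);
    return capacityMah * usableFraction / avgCurrentMa;
}
```

`averageCurrentMa(4000, 7 * 24)` gives about 23.8 mA, so a week corresponds to roughly a 24 mA average draw; doubling the average draw would roughly halve the test time.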
So the first part of the test with the WiFi S6B/LTE Bee Adapter was to see whether it would get into the state of not connecting to the WiFi network and, if it did, whether the RESET would bring it out. However, overnight/over 24hrs it connected to the WiFi every time, as expected. So that’s a good thing. (Though MMW POSTs gave me a “201” in 500mS about 1-in-5 times, with the more typical outcome being a no-response timeout at 3000mS.)
So I’m going back to standard powering of the WiFi, but with logic analyzer probes on the WiFi S6B, and I’ll also add 0.1uF ceramic decoupling capacitance directly at the WiFi module pins. This will allow me to monitor Vcc as well as the logic signals.
2021-01-17 at 10:33 PM #15017
I’m testing the stability of the Mayfly with the Digi WiFi S6B, with software using the 0.27.5 base, running off the LiPo battery.
Some of the WiFi S6B hybrids initially connect to the WiFi network and get NIST time, but then, after a couple of sleep/wake cycles, no longer connect to the WiFi. I’ve connected a Saleae Logic Analyzer (8 channels) to a number of pins on the WiFi S6B, and it’s showing problems with the power rail. The S6B hybrid has a tight Vcc specification of 3.14 to 3.46V.
With power supplied from the USB +5V rail (500mA) plus the LiIon battery, the Saleae analog channel shows Vcc at 3.266V, and when sleep request is activated there is a 20uS glitch down to 3.118V.
RF devices require good decoupling, and the LTE adapter has this decoupling. I’ve modified an LTE adapter to take power from the Mayfly Vcc (nominally 3.3V) and feed it into the LTE adapter’s power socket. I’ve also added a large 680uF low-ESR (68mOhms) capacitor to the 2-pin power socket (Wurth 860080274013).
I expect this to smooth any power surges from the S6B.
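A rough check of what the 680uF / 68mOhm capacitor can cover uses the first-order estimate dV ≈ I·ESR + I·t/C. The 250 mA step used below is a hypothetical load chosen for illustration, not a measured S6B current, and `droopVolts` is a hypothetical helper.

```cpp
#include <cassert>

// First-order sag of a bulk capacitor that alone supplies a current step
// `amps` for `seconds`:
//   dV = I * ESR      (instant drop across the series resistance)
//      + I * t / C    (charge drawn out of the capacitance)
// Ignores the regulator's response, trace resistance and inductance.
double droopVolts(double amps, double seconds, double farads, double esrOhms) {
    assert(farads > 0.0);
    return amps * esrOhms + amps * seconds / farads;
}
```

For 250 mA over a 20uS glitch, `droopVolts(0.25, 20e-6, 680e-6, 0.068)` is about 24 mV, mostly from the ESR term. Over a 4mS event the I·t/C term alone is about 1.47 V, so millisecond-long dips have to be carried by the regulator rather than the bulk capacitor.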
With LiIon battery power (nominally 4.2V) and this carrier board, at power-up after reset there are some extended dips of 4mS, from 3.229V to 3.06V.
After running successfully for over 8 hours on a speedy soak-test cycle of sleeping/waking, taking readings and POSTing to MMW every 2 minutes, it starts failing to connect to the local WiFi. It was left running for the next 48 hours over the weekend, and failed to reconnect.
Just wondering if there are any suggestions?
2021-01-20 at 5:44 PM #15051
The WiFi S6B Vcc spec is very tight at 3.14-3.46V, and the earlier trace showed that when the WiFi comes out of sleep, the current demand could be pulling Vcc out of specification. To check whether Vcc is causing the XBee S6B WiFi modem a problem, I’ve put together a separate regulator based on the TCR3DF335,LM, which regulates to 3.35V and can pulse to 400mA, with a normal spec of 300mA.
SMT is so nice when you have the right parts. The TCR3D fits on an SC74 prototyping board with 1uF decoupling, and sits between the LiIon battery at 4.2V and the LTE Adapter board that carries the S6B and plugs into the Mayfly.
Looking at the trace, the Vcc power is much smoother and now well within specification. The Saleae probe on the S6B Vcc measures close to 3.35 (3.26V), then drops to 3.31V when the S6B turns on. When the S6B has a power draw, it drops to 3.28V for up to 1.5mS. All of this leaves good headroom above the lower 3.14V limit.
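The readings quoted in this thread can be checked against the 3.14-3.46V window with a small helper, where the margin is the distance to the nearest limit and is negative when out of spec. A sketch; `vccMargin` and `vccInSpec` are hypothetical names.

```cpp
#include <cassert>

// Digi WiFi S6B supply window as quoted in this thread.
constexpr double kVccMin = 3.14;
constexpr double kVccMax = 3.46;

// Margin (volts) to the nearest limit; negative means out of specification.
double vccMargin(double volts) {
    double toMin = volts - kVccMin;
    double toMax = kVccMax - volts;
    return toMin < toMax ? toMin : toMax;
}

bool vccInSpec(double volts) { return vccMargin(volts) >= 0.0; }
```

By this check the 3.118V glitch and the 3.06V dip seen earlier are out of spec, the raw 4.2V battery is far above it, and the 3.28V dip with the regulator keeps about 140 mV above the lower limit.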
Trace of the S6B initialization sequence below.
So now just need to let it run for a few days to see if it makes a difference.
2021-01-25 at 2:49 PM #15062 Sara Damiano (Moderator)
Did the extra power smoothing work for you? I’ve noticed the issues with the WiFi XBees, but I wasn’t having them drop that frequently, and we don’t have any “production” loggers deployed with WiFi, so I never bothered to try to fix anything.
2021-01-26 at 2:01 PM #15066
Hi Sara, the short answer is no ~ which is a good.
I’ve had some other urgent issues come up, and so have had to leave it for a while. In my test-bench trial above, over a couple of days with a 2-minute sleep/wake cycle, 3 runs from reset all failed: #1 after ~5hrs, #2 after 4.5+7.5hrs, #3 after ~4hrs. So something is happening after 100+ sleep/wake events.
It seems like it must be software tickling the S6B in some way that it doesn’t like, and something changed, but I’m not sure when. I’m still thinking about it.
I do have a system that I want to deploy using WiFi, but it’s not a high priority.
At the same time, another system on 0.27.5 with the fixed SDI-12, using Verizon/LTE CAT-M1, has been stable. It’s using 15-minute sensor reporting and a 2-hour update schedule.
2021-03-03 at 6:05 PM #15198
A status update for testing with 0.28.01: after integrating to this release with the SDI-12 bug fixed and leaving it running for two weeks, using Verizon LTE CAT-M1 has worked very well. A deliberate characteristic of this test setup was a very limited solar aspect, with a small charge of at most 0.5A in the morning, but the overall power usage has been pretty low.
Any usage of the Digi WiFi with 0.28.01 soon gets hung up, and I plan to look at it.
On my fork I also have a BatteryManagementSubsystem combined with a reliable delivery that have worked in combination very well.
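The post doesn’t describe how the fork’s reliable delivery works, so the following is only a hypothetical illustration of the general store-and-forward idea: readings stay queued until the server answers 201 (the success code seen in the MMW responses in this thread), and transmission is skipped when a battery check says power is short. With the 15-minute reading / 2-hour update schedule mentioned above, about 8 readings would be queued per upload. `ReliableQueue` and the `batteryOk` flag are assumptions, not the fork’s code.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <functional>
#include <string>

// Hypothetical store-and-forward queue: readings stay queued until the
// server acknowledges them, so a missed POST (timeout, no signal) loses nothing.
class ReliableQueue {
public:
    explicit ReliableQueue(std::size_t capacity) : capacity_(capacity) {
        assert(capacity > 0);
    }

    // Queue a reading; when full, drop the oldest so recent data is kept.
    void push(const std::string& reading) {
        if (pending_.size() == capacity_) pending_.pop_front();
        pending_.push_back(reading);
    }

    // Send queued readings oldest-first. `post` returns the HTTP status
    // (0 for no response). Stops at the first failure and keeps the rest.
    // Nothing is sent when `batteryOk` is false (hypothetical power gating).
    std::size_t flush(const std::function<int(const std::string&)>& post, bool batteryOk) {
        if (!batteryOk) return 0;
        std::size_t sent = 0;
        while (!pending_.empty()) {
            if (post(pending_.front()) != 201) break;  // 201 Created = accepted
            pending_.pop_front();
            ++sent;
        }
        return sent;
    }

    std::size_t pending() const { return pending_.size(); }

private:
    std::size_t capacity_;
    std::deque<std::string> pending_;
};
```

Stopping at the first failed POST keeps readings in time order, and the capacity bound means a long outage drops the oldest data rather than running out of RAM.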
The following picture summarizes my testing.
2021-04-12 at 12:08 PM #15372
I’ve created a low-cost monitoring console using a Raspberry Pi and an FTDI cable. This allows a release’s debug output to be monitored just as if it were on the console, and compared with https://monitormywatershed.org/
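One way such a console could line the debug output up with the MMW timestamps is to prefix each line with the Pi’s own clock, assuming the Pi keeps network time. A sketch of a line-timestamping filter; `timestampLines` and `systemNow` are hypothetical names, and opening and configuring the FTDI serial port is left to the Pi.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <ctime>
#include <functional>
#include <iostream>
#include <sstream>
#include <string>

// Copy lines from `in` to `out`, prefixing each with a UTC timestamp from
// `now` (injectable so the output can be tested). A trailing '\r' from the
// logger's CRLF line endings is stripped. Returns the number of lines copied.
std::size_t timestampLines(std::istream& in, std::ostream& out,
                           const std::function<std::time_t()>& now) {
    assert(now);
    std::size_t count = 0;
    std::string line;
    while (std::getline(in, line)) {
        if (!line.empty() && line.back() == '\r') line.pop_back();
        std::time_t t = now();
        char stamp[32];
        std::strftime(stamp, sizeof stamp, "%Y-%m-%dT%H:%M:%SZ", std::gmtime(&t));
        out << stamp << ' ' << line << '\n';
        ++count;
    }
    return count;
}

// Default clock: the host's current time.
std::time_t systemNow() {
    return std::chrono::system_clock::to_time_t(std::chrono::system_clock::now());
}
```

A `main` calling `timestampLines(std::cin, std::cout, systemNow)` could then read the FTDI port through a shell redirect, e.g. `./tsmon < /dev/ttyUSB0`, where the device name is an assumption.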
2021-04-28 at 4:07 PM #15449
An update on my stability testing; writing this partly makes me collect the dates and status.
A test system “tu_rc_EC” standalone EC “Stream Disconnect” monitor built on 0.25.0 has been running since the beginning of Oct very well. I plan on describing this and haven’t done so yet.
The “TUCA-NA13” remote Verizon wireless system in the wilds, measuring stream depth with two gauges (Keller and LTC500) and built on 0.25.0, stopped recording on MMW on March 28. It started on Jan 29th after a previous outage, so it ran for 2 months. It is very remote and appears to go through periods when the Verizon network has low signal or MMW is not responding, but it has always recovered. A site visit next week might restore it.
An early beta, “tu_rc_test06”, is in my yard; it is a duplicate of TUCA-NA13 with version 0.28.3. This is from my fork, with extra features, but based on 0.28.3. It stopped running, and I have a terminal on it that caught what happened.
So tu_rc_test06 started Apr 1st (it survived Apr 1st) and froze on Apr 19th. Looking at the log, the Mayfly awoke at
… zzzZZ Awake @ 2021-04-19T16:07:00-08:00
then POSTED to MMW successfully
— Response Code — 201 waited 2107 mS Timeout 5000
Going to sleep. Ram( 6127 ) ZZzzz…
Watchdog disabled. barksUntilReset 150 <–WatchDogAVR
then never woke up.
At a guess, as a hypothesis, the RTC never woke it up.
@srgdamiano I wonder if you’ve seen anything like this?
Looking at the Sodaq_DS3231 RTC it reinitializes every sleep cycle. In another life, working on a large product, we had some very occasional reliability issues with the I2C bus. When there was an issue it was spectacular, and once happened before a very visible customer. We came up with a workaround.
The I2C hardware protocol does not guarantee a transaction, and there could be noise on the line. So I’m trying a modification that reads back the Sodaq_DS3231 registers to verify that they have been set correctly. It is a long shot, and I’m happy to take any suggestions.
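The read-back idea can be sketched generically: write a register, read it back, and retry on a NACK or mismatch. Register addresses follow the DS3231 datasheet (0x0E is the control register, where setting INTCN and A1IE routes alarm 1 to the interrupt pin). `RegisterBus`, `writeVerified` and `NoisyMock` are hypothetical, and this is not how Sodaq_DS3231 is written.

```cpp
#include <cassert>
#include <cstdint>

// DS3231 control register (datasheet): bit2 INTCN, bit0 A1IE.
constexpr uint8_t DS3231_CONTROL = 0x0E;

// Hypothetical I2C register interface; on a Mayfly this would wrap Wire.
struct RegisterBus {
    virtual ~RegisterBus() = default;
    virtual bool writeReg(uint8_t reg, uint8_t value) = 0;  // false on NACK
    virtual bool readReg(uint8_t reg, uint8_t& value) = 0;  // false on NACK
};

// Write a register and confirm it by reading it back, retrying on a NACK or
// a mismatch. Returns the attempt that verified (1-based), or 0 if none did.
int writeVerified(RegisterBus& bus, uint8_t reg, uint8_t value, int maxAttempts = 3) {
    for (int attempt = 1; attempt <= maxAttempts; ++attempt) {
        uint8_t readBack = 0;
        if (bus.writeReg(reg, value) && bus.readReg(reg, readBack) && readBack == value)
            return attempt;
    }
    return 0;
}

// Mock DS3231 (registers 0x00-0x12) whose first `glitches` writes land with
// one bit flipped, standing in for noise on the I2C lines.
struct NoisyMock : RegisterBus {
    uint8_t regs[0x13] = {};
    int glitches;
    explicit NoisyMock(int g) : glitches(g) {}
    bool writeReg(uint8_t reg, uint8_t value) override {
        if (reg >= sizeof regs) return false;
        regs[reg] = glitches-- > 0 ? uint8_t(value ^ 0x01) : value;
        return true;
    }
    bool readReg(uint8_t reg, uint8_t& value) override {
        if (reg >= sizeof regs) return false;
        value = regs[reg];
        return true;
    }
};
```

Read-back suits configuration registers such as the control and alarm registers. The status register (0x0F) holds flags such as A1F that the chip itself sets, so a read-back there can legitimately differ from what was written.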
2021-04-28 at 5:31 PM #15450 Sara Damiano (Moderator)
No, I don’t think I’ve seen that.
2021-04-28 at 5:34 PM #15451
2021-04-30 at 2:58 PM #15460
My test06 system froze for a 2nd time. This time I pressed the User Button, which is also tied to an interrupt, and it started up again. I have made updates described here: https://github.com/neilh10/ModularSensors/issues/34 and am restarting the testing.
2021-09-13 at 6:31 PM #15887