Welcome to EnviroDIY, a community for do-it-yourself environmental science and monitoring. EnviroDIY is part of WikiWatershed, an initiative of Stroud Water Research Center designed to help people advance knowledge and stewardship of fresh water.

Data Dropped Between Mayfly and MMW (2.4%)

    • #13273
      BrianJastram
      Participant

        I’ve got 3 XBee LTE-M modems and an XBeeS6B WiFi modem all regularly transmitting data at 5 minute intervals to MMW. Is there any reason why I’m seeing about 97.6% of those data points make it through to MMW instead of 100%?
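To put the 97.6% figure in terms of individual points, here is a quick arithmetic sketch. It assumes nothing beyond the 5 minute interval stated above; the function names are made up for illustration.

```python
def expected_points(days, interval_min=5):
    """Number of points a logger should send in `days` at a fixed interval."""
    return days * 24 * 60 // interval_min


def missing_points(days, received_fraction, interval_min=5):
    """Points that never arrived, given the fraction that did arrive."""
    return expected_points(days, interval_min) * (1 - received_fraction)


# Points expected per logger per day at 5 minute intervals
print(expected_points(1))
# Dropped points per logger per day if 97.6% arrive
print(round(missing_points(1, 0.976), 1))
```

At that rate each logger would be losing roughly seven points a day.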

      • #13274
        Sara Damiano
        Moderator

          Well, it’s certainly not intended that less than 100% of the data makes it online, if that’s what you’re asking.

          Beyond that I can’t really help without a lot more information. So, here goes:

          Do you know where in the chain the data is being lost? Are you using a Mayfly? What code are you running? If ModularSensors, what version of both ModularSensors itself and of TinyGSM? What firmware version do you have on the XBee3? What firmware version do you have on the SARA R410 inside of the XBee3? Are you using the XBee3 in transparent or bypass mode?

          1. Is 100% of the data on your SD card? If so, the sensors are working and the logging portion of the code on your board is working fine; move on to 2. If not:
          1a. Are you recording the “ProcessorStats_SampleNumber”? If so, is this number being reset after a gap? If so, something is causing your board to freeze until the watchdog timer goes off. That could be an electrical/wiring issue or a code issue.
          1b. If you’re not recording sample number, how long is the gap between points when data is dropped? If the gaps are always either 15 or 20 minutes long, that’s probably the watchdog. It bites after 15 minutes, but you may miss the first new point after it bites, leaving a 20 minute gap.
          1c. If you think it’s the watchdog:
          1ci. Are you using any I2C sensors, and are you disconnecting their power between samples? If the I2C bus is held low, it will cause the low level I2C/Wire library to hang on the next attempt to communicate with any I2C device (like the RTC).
          1cii. There is also an issue with the XBee3 in bypass mode that sometimes causes hangs. I know it happens. I can’t figure out why. You can read about it on GitHub. I’ve only seen this sort of hang once every few days or so, though, not 2.4% of the time.

          2. Is the issue with both wifi and cellular or just with the cellular? If both, skip to 4.

          3. Can you see your data use by individual sessions for your cellular plan (or wifi connection)? If you are using a hologram data plan, you can see this on their dashboard.
          3a. When looking at the data sessions on hologram, do you see a session at every single 5 minute interval? If not, you either don’t have good enough service or there’s some other issue with the modem seeing and registering to the internet.
          3b. If you see a session at every interval, do you see actual data being used on that session or are they 0 byte sessions? When running the XBee3 in transparent mode it will sometimes continue to report that it is unable to register to the network even though on hologram’s side the registration has been successful. At other times (again, when in transparent mode) the XBee3 fails to properly packet the data and actually send it out. Both of these things happen more on older versions of the XBee3 firmware, but I’ve still seen them on the current version. I’ve tried a number of things to try to help the connection, but unfortunately this is on Digi to fix. I have had many, many fewer issues with empty sessions when running in bypass mode, at the trade-off of a known bug causing rare hangs in bypass.

          4. Finally, if you’re seeing the data on your SD card, you’re seeing sessions that consume data at every single five minute interval, and you’re still not seeing data on the data portal, it could be something wrong either with the portal receiving the data or displaying it.
          4a. Is the data missing no matter how you attempt to view it? (ie, sparklines, pop-up table, downloaded csv from the data portal, plots in WebTSA, csv download from WebTSA) If it’s missing everywhere it’s *probably* not being processed and put into the “main” database correctly.
          4b. If you can see the data in some places in the database, but just not all of them, there are known communication issues between the “main” database and the higher performance “daughter” databases that produce the various types of plots and other ways of viewing the data. The data isn’t lost and hopefully these issues will be fixed soon, but the only solution in this case is to wait.
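The checks in step 1 can be roughed out as a script run against the timestamps pulled off the SD card. This is only a sketch: it assumes the timestamps have already been parsed and land exactly on the 5 minute marks, and it treats any drop in the sample number as a restart. The function names are made up for illustration.

```python
from datetime import datetime, timedelta


def find_gaps(timestamps, interval_min=5):
    """Return (previous, next, gap_minutes) wherever two consecutive
    points are further apart than one logging interval."""
    gaps = []
    for prev, cur in zip(timestamps, timestamps[1:]):
        minutes = (cur - prev).total_seconds() / 60
        if minutes > interval_min:
            gaps.append((prev, cur, minutes))
    return gaps


def looks_like_watchdog(gap_minutes):
    """Gaps of exactly 15 or 20 minutes match the watchdog pattern in 1b."""
    return gap_minutes in (15, 20)


def sample_number_resets(sample_numbers):
    """Indexes where the sample counter goes down, suggesting a restart (1a)."""
    return [
        i for i in range(1, len(sample_numbers))
        if sample_numbers[i] < sample_numbers[i - 1]
    ]


# Made-up example: one 20 minute hole, after which the counter starts over
start = datetime(2020, 1, 1, 0, 0)
times = [start + timedelta(minutes=m) for m in (0, 5, 10, 30, 35, 40)]
sample_numbers = [1, 2, 3, 1, 2, 3]

for prev, cur, gap in find_gaps(times):
    print(prev, "->", cur, gap, "watchdog?", looks_like_watchdog(gap))
print("resets at rows:", sample_number_resets(sample_numbers))
```

If the SD card shows no gaps at all, none of this applies and the loss is further down the chain (steps 2 to 4).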

        • #13331
          Sara Damiano
          Moderator

            Did you get a chance to do any more troubleshooting on this?

          • #13332
            BrianJastram
            Participant

              Yes, some.
              All 4 modems are in transparent mode and there are no gaps on the SD card. The issue is on both the WiFi (1) and cellular (3) modems and the gaps are there in the data in all 3 ways I view the data on MMW except sparklines because those don’t show small gaps. Regarding Hologram some of my data sessions have no data consumption and some of the time periods are missed (see attachment).
              Thanks

              Attachments:
            • #13338
              Sara Damiano
              Moderator

                Hm. Well, I’m stumped. I have issues all the time with the cellular, but not the Wifi.

                I’m sorry; I’m going to throw a zillion questions at you again.

                You’re using 4 separate Mayfly’s, right? (ie, you don’t have both the wifi and cellular XBee’s hooked up to the same Mayfly). Your wifi is connected to “real” wifi, right? (ie, not a cellular hot-spot). Are the loggers all near each other? Does the timing of the holes in the loggers match up or are they different for each? Have you ever manually changed any settings on the XBee’s (ie, with XCTU)?

                What sensors do you have attached? I don’t think you said, but you’re using ModularSensors, right? Have you set/modified MS_SEND_BUFFER_SIZE? Can you try running with the build flags “-DMS_DATAPUBLISHERBASE_DEBUG” and “-DMS_ENVIRODIYPUBLISHER_DEBUG” to look at the outgoing json? Is your json longer than the “default” buffer of 750 (ie, is it broken up into more than one chunk)? Do you see anything strange going on like variable numbers of significant figures that sometimes get truncated leading to ill-formed json? What response codes are you getting back (ie, 201/403/5xx)?
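Once the outgoing json has been copied out of the debug output, two of the questions above (is it well formed, and does it fit in one buffer) can be checked offline. A rough sketch, assuming the payload has been pasted in as a string; the field names in the example are placeholders, and whether the real buffer also has to hold the HTTP headers is not something this sketch accounts for.

```python
import json
import math

SEND_BUFFER_SIZE = 750  # the "default" buffer size mentioned above


def check_payload(payload, buffer_size=SEND_BUFFER_SIZE):
    """Return (well_formed, size_in_bytes, chunks_needed) for one payload."""
    try:
        json.loads(payload)
        well_formed = True
    except json.JSONDecodeError:
        well_formed = False
    n_bytes = len(payload.encode("utf-8"))
    chunks = math.ceil(n_bytes / buffer_size)
    return well_formed, n_bytes, chunks


# Placeholder payload; the keys here are hypothetical, not real IDs
good = (
    '{"sampling_feature": "hypothetical-feature-id", '
    '"timestamp": "2020-01-01T00:05:00-06:00", '
    '"hypothetical-variable-id": 4.745}'
)
truncated = good[:-3]  # simulate a value cut off before the closing brace
long_payload = '{"k": "' + "x" * 1000 + '"}'

print(check_payload(good))
print(check_payload(truncated))
print(check_payload(long_payload))
```

A payload that fails the first check, or that only fails when it needs more than one chunk, would be a strong lead.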

                Are you using LTE carrier boards for the LTE XBee’s? Do you have “useCTSforStatus” set appropriately (ie, false with a carrier board, true without)? If you’re really lucky and manage to catch the board “in the act” of a failed send, do the LED’s on the carrier board turn on as expected (ie, “on” is solid, “status” is solid and then begins to blink)?

              • #13340
                BrianJastram
                Participant

                  Answers:
                  <Yes, 4 separate Mayfly’s
                  <WiFi is real (no hot-spot)
                  <3 loggers (2 cell and 1 WiFi) at same office, 4th logger is miles away
                  <The timing of the holes doesn’t necessarily match up
                  <I have changed modem settings with XCTU on one modem from airplane mode to Normal operation. I have also updated their firmware to the newest version, 11413
                  <I have a Decagon CTD-10 hooked up to the WiFi modem, rain gauges hooked up to 2 Mayflys and just a small solar panel on the 3rd.
                  <I’m using modular sensors.
                  These are the only build flags I have in my .ini

                  I see the others now in my .ino. Looks like I’m going to be doing some flag building and testing now. I’ll get back to you on this topic.
                  <I am using LTE carrier boards for the LTE XBee’s.
                  <I have “useCTSforStatus” set false for all 4 Mayflys. I guess I should change it to true for the WiFi setup.
                  <I’ll keep an eye out for the blinking light clue.
                  Thanks for your help.

                • #13341
                  BrianJastram
                  Participant

                    <This is the output from the PUBLISHER_DEBUGs
                    <I’m not aware of any variable numbers getting truncated.

                  • #13342
                    BrianJastram
                    Participant

                      <I added and set the MS_SEND_BUFFER_SIZE to 750.
                      I’ll keep monitoring the data and keep you informed.
                      Thanks again.

                    • #13354
                      Sara Damiano
                      Moderator

                        The only thing coming to mind right now is possible issues with the json or the content length header. If the length of the json doesn’t match up with the actual size of the json, the send would probably be rejected. (You’d get a 5xx I think.) Unfortunately the only way you’d see that 5xx or details on why the send failed would be to turn on the modem’s “deep” debugging, log the full output to a text file, and then scroll through the thousands and thousands of lines of output looking for a time when a send failed to see what the response was. Or you could get *really* lucky and catch it in the act. I’ve been trying to catch the LTE XBee3’s in bypass mode in the act of crashing to see if I can figure out why it sometimes locks up but I haven’t managed.
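To make the content length idea concrete, here is a sketch of what a mismatch looks like in a raw request. It is not the ModularSensors code; the host and path are placeholders, and the request is built by hand purely for illustration.

```python
def build_post(host, path, body):
    """Build a raw HTTP POST whose Content-Length matches the body."""
    body_bytes = body.encode("utf-8")
    head = (
        f"POST {path} HTTP/1.1\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/json\r\n"
        f"Content-Length: {len(body_bytes)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body_bytes


def content_length_matches(raw_request):
    """True if the Content-Length header equals the actual body size."""
    head, _, body = raw_request.partition(b"\r\n\r\n")
    for line in head.split(b"\r\n")[1:]:  # skip the request line
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            return int(value.strip()) == len(body)
    return False  # no Content-Length header at all


# Hypothetical host and path, for illustration only
good = build_post("example.org", "/hypothetical/path", '{"v": 4.60}')
# Same header, but the body lost a trailing zero after the length was computed
bad = good.replace(b"4.60", b"4.6")

print(content_length_matches(good))
print(content_length_matches(bad))
```

If the length were computed from one formatting of a value and the body written with another, the request would look like `bad` here.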

                        You could also download the csv from MonitorMW and go line-by-line against a logger csv until you find a gap. See if the data in the gap is in any way different from the surrounding lines. Does one of the values have more or fewer digits when there is a gap? (As in, something like 4.65, 4.66, 4.62, *4.6*, 4.61 where the 4.6 is suddenly “shorter” because of the missing trailing 0.) I’m kind-of grasping at straws though.
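The line-by-line comparison can be scripted instead of done by eye. A minimal sketch, assuming both files have been trimmed down to a single header row and share an identically formatted timestamp column (the real logger and portal files have extra header lines and may format times differently); the column names are made up.

```python
import csv
import io
from collections import Counter


def read_rows(csv_text, time_col):
    """Map timestamp -> row for a csv with a single header line."""
    return {row[time_col]: row for row in csv.DictReader(io.StringIO(csv_text))}


def missing_on_portal(logger_csv, portal_csv, time_col="timestamp"):
    """Timestamps that are on the SD card but not in the portal download."""
    logger = read_rows(logger_csv, time_col)
    portal = read_rows(portal_csv, time_col)
    return [t for t in logger if t not in portal]


def decimals(value):
    """Number of digits after the decimal point in a value as written."""
    return len(value.partition(".")[2])


def odd_length_values(logger_csv, value_col, time_col="timestamp"):
    """Timestamps whose value has an unusual number of decimal digits."""
    rows = read_rows(logger_csv, time_col)
    counts = Counter(decimals(r[value_col]) for r in rows.values())
    usual = counts.most_common(1)[0][0]
    return [t for t, r in rows.items() if decimals(r[value_col]) != usual]


# Made-up data following the 4.65, 4.66, 4.62, 4.6, 4.61 example
logger_csv = """timestamp,battery
2020-01-01 00:00,4.65
2020-01-01 00:05,4.66
2020-01-01 00:10,4.62
2020-01-01 00:15,4.6
2020-01-01 00:20,4.61
"""
portal_csv = """timestamp,battery
2020-01-01 00:00,4.65
2020-01-01 00:05,4.66
2020-01-01 00:10,4.62
2020-01-01 00:20,4.61
"""

print(missing_on_portal(logger_csv, portal_csv))
print(odd_length_values(logger_csv, "battery"))
```

If the two lists keep overlapping, the digit count is worth chasing; if they don’t, it’s probably a coincidence.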

                      • #13355
                        Sara Damiano
                        Moderator

                          To turn on the “deep” debugging for the modem, add build flags “-DMS_DIGIXBEECELLULARTRANSPARENT_DEBUG” and “-DMS_DIGIXBEECELLULARTRANSPARENT_DEBUG_DEEP” for cellular or “-DMS_DIGIXBEEWIFI_DEBUG” and “-DMS_DIGIXBEEWIFI_DEBUG_DEEP”. Keep the publisher build flags set. You’d also have to change your serial monitor line endings to just a carriage return instead of CR/LF (quirk of the XBee). In PlatformIO/Atom you can do that by clicking for more options in the pop-up to open the serial monitor. In VSCode, you need to add this to your platformio.ini in the env section:

                          If you want to save the output, I think you’ll need to use a different terminal monitor. I often use TeraTerm, but there are a lot of free terminals out there.

                          • #13357
                            BrianJastram
                            Participant

                              I checked the SD card from one of my cellular enabled Mayflys and was surprised to find 14 .csvs created in the last 20 days. Some of the breaks (with a missing data point) in the .csvs match with a missing data point on MMW. One of the gaps on MMW is between voltage values of differing digits after the decimal, 4.745 and 4.73. So now I’m seeing missing data points in between consecutive .csvs on my SD card and missing data on MMW that is recorded on my SD card. I’ll work on the deep debugging and clarify my situation. Right now I don’t have any good questions to ask.

                          • #13362
                            Sara Damiano
                            Moderator

                              If a new csv is being created, it’s almost certainly because the logger is restarting. It could be some irregularity in the power that’s causing the board to restart directly or it’s the watchdog. If you’re not specifically setting the file name, whenever the board attempts to talk to the SD card, it puts together the logger ID and the date and checks if a file already exists with that name. If that file exists, it adds to it; if not, it creates it. So multiple restarts in the same day wouldn’t show up as multiple files, but you would see a new file for each new day there’s a restart. If the gap is 15 or 20 minutes, chances are really high the watchdog is firing.

                              It’s also possible that there’s something up with the connections between the SD card and the Mayfly, but since you’re seeing the issue on multiple loggers I doubt that’s it.
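The file naming behavior described above can be modeled in a few lines. This is an illustration in Python, not the library’s actual code, and it assumes the name is worked out once at start-up and then reused until the next restart, which is what the "new file for each new day there’s a restart" conclusion implies. The logger ID and file name pattern are made up.

```python
import os
import tempfile
from datetime import date


class Logger:
    """Toy model: the file name is fixed from the logger ID and boot date."""

    def __init__(self, directory, logger_id, boot_day):
        name = f"{logger_id}_{boot_day.isoformat()}.csv"
        self.path = os.path.join(directory, name)

    def log(self, line):
        # Mode "a" adds to the file if it exists and creates it if not
        with open(self.path, "a") as f:
            f.write(line + "\n")


sd_card = tempfile.mkdtemp()  # stand-in for the SD card

first = Logger(sd_card, "hypothetical-logger", date(2020, 1, 1))
first.log("point 1")

# Restart later the same day: same name, so the existing file grows
second = Logger(sd_card, "hypothetical-logger", date(2020, 1, 1))
second.log("point 2")
second.log("point 3, logged on Jan 2 without a restart")

# Restart on a later day: a new file appears
third = Logger(sd_card, "hypothetical-logger", date(2020, 1, 3))
third.log("point 4")

files = sorted(os.listdir(sd_card))
print(files)
```

Under that assumption, 14 files in 20 days would mean restarts on at least 14 different days.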
