Welcome to EnviroDIY, a community for do-it-yourself environmental science and monitoring. EnviroDIY is part of WikiWatershed, an initiative of Stroud Water Research Center designed to help people advance knowledge and stewardship of fresh water.

Status update on MMW?


    • #16138
      Matt Barney
      Participant

        Just wondering if we could please get a status update on MMW. Most of our sites have now been offline for ~21 hours. My colleague replaced a Mayfly and modem this morning (for a separate issue), and its data began reporting to MMW immediately, but our others are still offline. Any ETA on a fix?

        Thanks,
        Matt

      • #16141
        neilh20
        Participant

          I’m seeing the same – strange data losses have been going on since 2021-12-03 20:15 PST.
          I’ve entered it as
          https://github.com/ODM2/ODM2DataSharingPortal/issues/535

          I also asked three weeks ago how to do a test for a reliable data server
          https://github.com/ODM2/ODM2DataSharingPortal/issues/524

          and over a year ago, I asked what a Reliable Delivery model algorithm could look like
          https://github.com/ODM2/ODM2DataSharingPortal/issues/485

        • #16144
          Heather Brooks
          Keymaster

            @mbarney, @neilh20, thanks for your posts. The switch from the LimnoTech server to Amazon Web Services happened shortly after 13:00 EST on December 7. LimnoTech’s team is seeing that their old production server is still receiving data from devices, which means that data is not reaching the new AWS production server. @aufdenkampe and his LimnoTech team are working diligently on a fix.

          • #16150
            Anthony Aufdenkampe
            Participant

              @mbarney, @neilh20, thanks for pinging us.

              As @heather mentioned, we released MonitorMW v0.12 yesterday at that time, which has some major under-the-hood improvements to substantially improve reliability, including now being hosted by AWS (zone: us-east-2; Ohio). For details read our v0.12.0: Update to Python 3.8 & Django 2.2; Migrate to AWS release notes on GitHub.

              Unfortunately, to migrate to AWS we needed to change the Domain Name System (DNS) records for our URLs, which in turn needed to propagate to all internet name servers. That propagation took a surprisingly long time, especially for two in your regions: Verizon in Brooklyn NY (this could cover Michigan), and Corporate West in San Jose CA (which covers the entire US West). We reissued the DNS change today a little after 1 ET, which did seem to resolve those persistent holdout name servers. However, on top of that, devices and networks have their own DNS caches that can sometimes persist for quite a while.
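              For anyone wanting to check which server a given network's resolver currently returns, here is a minimal Python sketch. It only reports what the machine running it resolves, not what a modem in the field has cached. The helper names and the two addresses are assumptions for illustration: the addresses come from the reserved documentation ranges and are not the real LimnoTech or AWS addresses.

```python
import socket


def resolve_ipv4(hostname, port=443):
    """Return the set of IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(
        hostname, port, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr is (ip, port).
    return {info[4][0] for info in infos}


def classify(resolved, old_ips, new_ips):
    """Label a set of resolved addresses as pointing at the old or new server."""
    if not resolved:
        return "unresolved"
    if resolved <= new_ips:
        return "new"
    if resolved <= old_ips:
        return "old"
    return "mixed-or-unknown"


# Hypothetical placeholder addresses (RFC 5737 documentation ranges).
OLD_IPS = {"192.0.2.10"}
NEW_IPS = {"198.51.100.20"}
```

              With the real addresses substituted, `classify(resolve_ipv4("monitormywatershed.org"), OLD_IPS, NEW_IPS)` would indicate whether the local resolver has picked up the new record yet.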

              Fortunately, all your missing data is still showing up in our old database on LimnoTech servers, so we will be able to sync that to new AWS servers in coming days.

              We’re hoping that the DNS caches flush on their own in the coming day or two. Meanwhile, we are looking into a couple of options in case that doesn’t happen:

              • Temporarily shut down our old production server, to try to force devices to clear their caches.
                • We’ll likely do this late Thu or early Fri of this week, for about 20-30 minutes, during which time data will unfortunately be lost.
              • Ask you all to power cycle your devices, early next week, if data is still flowing to our old servers.
              • Set up forwarding from our old servers to the new AWS servers, to get data instantaneously logging in the correct database.
                • This is not a long-term solution because it includes a potential fail point that we’ve been trying to move away from for a while!


              @mbarney, the behavior you saw with your new station makes perfect sense given the issues we’re seeing.


              @neilh20, thanks for your patience on your suggestions for reliable data delivery approaches. Now that we’ve migrated to AWS and upgraded the software stack, we’re poised to finally start carefully considering your suggestions.

            • #16153
              Matt Barney
              Participant

                Thanks @heather and @aufdenkampe, that does make sense with the symptoms we’re seeing. Good to know that the data are not being lost in the interim.

                Shutting down the old server temporarily sounds like a good idea; all of our devices are on a 15-minute sampling period, so 20+ minutes should cause at least one failed message from the device, and the relative cost of losing one or two data points would be well worth it.
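                The arithmetic behind that estimate can be sketched in a few lines of Python. This assumes each upload is an instantaneous event on a fixed schedule and that the outage start is not aligned with the schedule; the function name is made up for illustration.

```python
import math


def missed_uploads(outage_minutes, interval_minutes):
    """Return (fewest, most) scheduled uploads that can fall inside an outage.

    Uploads are treated as instantaneous events every interval_minutes, and
    the outage as a half-open window of outage_minutes that may start at any
    point relative to the upload schedule.
    """
    if outage_minutes < 0 or interval_minutes <= 0:
        raise ValueError("outage must be >= 0 and interval must be > 0")
    ratio = outage_minutes / interval_minutes
    return math.floor(ratio), math.ceil(ratio)


print(missed_uploads(20, 15))  # (1, 2)
print(missed_uploads(30, 15))  # (2, 2)
```

                So with a 15-minute interval, a 20-minute outage catches at least one upload and at most two, and a 30-minute outage catches exactly two.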

                Power cycling all of our devices would be pretty painful; some stations (e.g. with wipered sensors) are hours from their caretakers’ locations.

                Wow, it would be nice to be able to remotely initiate a cache flush (or even a reset) on the LTE modems! But, as I read Digi’s documentation, I think that would have to be initiated by an AT command from the host (Mayfly). (Right??) I don’t *think* there’s a way to address this from the Hologram dashboard, but if anyone knows, please share. Hmm… I wonder what clicking Hologram’s “Pause” button for each SIM would do…?

                Matt

              • #16157
                Matt Barney
                Participant

                  We tried the Hologram SIM pause idea on one station, but it had no effect, which isn’t too surprising.


                  @aufdenkampe, please keep us informed about the timing of the temporary shutdown of the old server. Thanks for your efforts.

                  Matt

                • #16158
                  Anthony Aufdenkampe
                  Participant

                    @mbarney, wouldn’t that have been great if the Hologram pause worked! Thanks for trying it out!

                    Have you seen any of your sites start reporting again in the last 24 hours? The couple that I’ve been tracking have not, which is disappointing. I definitely recognize how painful it would be to manually power cycle these stations.

                    We’ll likely try to shut down our production server this afternoon unless we see that some stations are starting to send data to the new server on their own.

                  • #16159
                    Matt Barney
                    Participant

                      We have 1 site that resumed uploading data at 12/8/2021 14:45 ET. We’re assuming that the volunteer restarted it.

                      5 additional sites are uploading after we restarted them, and 19 sites have not reported since the cutover to the new server.

                    • #16160
                      Anthony Aufdenkampe
                      Participant

                        @mbarney, that is very helpful to know that you have had success with power-cycling the stations. We’re seeing a few other sites switch over on their own in the last 24 hours.

                      • #16161
                        neilh20
                        Participant

                          For my status, there is one site that started uploading OK, which is good as it’s about a 3-hour drive away.

                          One site hasn’t updated. It’s on private land and requires permission to enter the property, and that’s usually only done when there is a group of activities on the site. It’s about a one-hour drive away.
                          Both are on Verizon.

                          For my local test system, running over Verizon, I monitor the output of the connection process. It’s having a lot of timeouts. The timeout is set to 5 seconds.

                        • #16162
                          Shannon Hicks
                          Moderator

                            All of our stations that are using our new sim7080 EnviroDIY LTEbee made the changeover automatically. The majority of our stations that still have the Digi XBee LTE module have been offline since the server change; however, a few random ones seemed to be fine and were online immediately after the changeover. The rest of the stations with Digi boards have required visiting the station and cycling the power to get them back online. Luckily we’re in the process of upgrading most of our stations to the sim7080 boards, so we have fewer Digi boards deployed than we did a few months ago.

                          • #16164
                            Matt Barney
                            Participant

                              Interesting – we have 1 site with the new LTEbee, and it also weathered the cutover successfully.


                              @aufdenkampe, as of now, we’re not seeing any of our Digi stations that are able to upload to the new server as a result of the temporary shutdown of your old server, assuming that that indeed took place yesterday.

                              Matt

                            • #16165
                              Anthony Aufdenkampe
                              Participant

                                @mbarney and others,

                                After doing some research, we decided that shutting down our old servers for 30 minutes would not likely force the DNS on devices to refresh.

                                Our plan is to set up proxy forwarding on our LimnoTech server to instantly forward all data to the AWS servers. We plan to do that today, and will also set up a forwarding log to keep track of which devices are still sending to LimnoTech. We’ll leave this up for weeks or maybe months, or until nearly all devices make the switch. It may be that all devices need to be rebooted for that to happen. We’ll see.
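                                One way a forwarding log like that could be summarized is sketched below in Python. It is only an illustration: the log format, a sequence of `(device_id, timestamp)` pairs, and the one-hour threshold are assumptions, not the format LimnoTech uses.

```python
from datetime import datetime, timedelta


def summarize_forward_log(forward_log, now, quiet_after=timedelta(hours=1)):
    """Split devices seen in the forwarding log into (active, quiet) lists.

    forward_log is an iterable of (device_id, seen_at) pairs, one per request
    that the old server forwarded. A device is "active" if its most recent
    forwarded request is within quiet_after of now, otherwise "quiet".
    """
    last_seen = {}
    for device_id, seen_at in forward_log:
        if device_id not in last_seen or seen_at > last_seen[device_id]:
            last_seen[device_id] = seen_at
    active = sorted(d for d, t in last_seen.items() if now - t <= quiet_after)
    quiet = sorted(d for d, t in last_seen.items() if now - t > quiet_after)
    return active, quiet
```

                                A device in the quiet list has either switched to the new server or stopped transmitting altogether, so it would still need to be checked against the data arriving on AWS.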

                                Next week, we’ll sync the data over from the old servers to the new servers on AWS. That will fill in the data gaps that you’ve all been seeing.

                              • #16166
                                Matt Barney
                                Participant

                                  Thanks @aufdenkampe, that sounds like a great plan. I like the idea of the forwarding log too.

                                  We’ll watch for a change in stations’ status once the proxy is in place.

                                • #16167
                                  Anthony Aufdenkampe
                                  Participant

                                    We started the proxy forwarding about 45 minutes ago, so all data should now be rerouted from our old servers to the new servers on AWS.

                                    Let us know how things look!

                                  • #16168
                                    neilh20
                                    Participant

                                      Thanks for the update. The system that I had that wasn’t reporting this morning is now reporting – https://monitormywatershed.org/sites/TUCA_PO03/ as of Dec. 10, 2021, 10:30 a.m. (UTC-08:00)

                                      I’m logging in using Firefox for my production sites and Chrome for my test sites.
                                      For the above, it looks different under the different browsers, and the TSV is accessible under Firefox but not under Chrome. Do you want me to log an issue on it?

                                    • #16169
                                      Matt Barney
                                      Participant

                                        @aufdenkampe – I’ve confirmed that indeed, all stations are now reporting data as expected.

                                      • #16170
                                        Anthony Aufdenkampe
                                        Participant

                                          @neilh20, do a hard reset of your browser cache for our URL to get TSV working on Chrome. That is definitely the issue there. We’ll be adding “cache-busting” code in our next update.
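                                          For readers unfamiliar with the term, one common form of cache-busting appends a fingerprint of a file's content to its URL, so the browser treats a changed file as a new resource. The Python sketch below illustrates that general idea only; it is not necessarily the approach planned for MonitorMyWatershed, and the function name and paths are hypothetical.

```python
import hashlib


def cache_busted_url(url, content, length=8):
    """Append a short content hash to url as a version query parameter.

    The URL changes whenever content changes, so a browser cannot reuse a
    stale cached copy after the file is updated.
    """
    digest = hashlib.sha256(content).hexdigest()[:length]
    separator = "&" if "?" in url else "?"
    return f"{url}{separator}v={digest}"
```

                                          The same content always yields the same URL, so unchanged files stay cached, while any edit to the file produces a new URL.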

                                          • #16175
                                            Jake Lemon
                                            Participant

                                              Hi Anthony. Will the cache-busting code negate the need for MMW users to clear their cache to view sparklines? And if so, do you have an estimate on when you’ll be doing the next update? I’m debating whether to suggest to our volunteers/partners that they clear their cache in order to view sparklines.

                                               

                                              Thanks!

                                              • #16176
                                                Anthony Aufdenkampe
                                                Participant

                                                  @jlemontu-org,  we won’t be releasing that code until February with our planned v0.13 release, so I would encourage users to clear their cache before then. You can track that progress at: https://github.com/ODM2/ODM2DataSharingPortal/issues/529

                                                  That said, you don’t have to clear the cache for all websites; just do it for MonitorMyWatershed. That’s not that painful. Clearing it for all websites is somewhat painful, so I would be clear in how you make the recommendation.

                                            • #16171
                                              neilh20
                                              Participant

                                                I cleared browsing data for “cached images and files” & “browsing history” (a complete clear of everything is quite educational to recover from)
                                                https://support.google.com/accounts/answer/32050
                                                and it worked. thanks for the tip
