Status update on MMW?

This topic has 23 replies, 7 voices, and was last updated 2024-06-17 at 1:58 PM by Heather Brooks.

Author

Posts
- 2021-12-08 at 9:53 AM #16138
  Matt Barney
  Participant
  Just wondering if we could please get a status update on MMW. Most of our sites have now been offline for ~21 hours. My colleague replaced a Mayfly and modem this morning (for a separate issue), and its data began reporting to MMW immediately, but our others are still offline. Any ETA on a fix?
  
  Thanks,
  Matt
- 2021-12-08 at 1:03 PM #16141
  neilh20
  Participant
  I’m seeing the same – and strange data losses have been going on since 2021-12-03 20:15 PST.
  I’ve entered it as
  https://github.com/ODM2/ODM2DataSharingPortal/issues/535
  
  I also asked three weeks ago how to do a test for a reliable data server
  https://github.com/ODM2/ODM2DataSharingPortal/issues/524
  
  And over a year ago, I asked what a Reliable Delivery model algorithm could look like
  https://github.com/ODM2/ODM2DataSharingPortal/issues/485
- 2021-12-08 at 1:24 PM #16144
  Heather Brooks
  Keymaster
  @mbarney, @neilh20, thanks for your posts. The switch from the LimnoTech server to Amazon Web Services happened shortly after 13:00 EST on December 7. LimnoTech’s team is seeing that their old production server is still receiving data from devices, which means that data is not going to the new AWS production server. @aufdenkampe and his LimnoTech team are working diligently on a fix.
- 2021-12-08 at 5:35 PM #16150
  Anthony Aufdenkampe
  Participant
  @mbarney, @neilh20, thanks for pinging us.
  
  As @heather mentioned, we released MonitorMW v0.12 yesterday at that time, which has some major under-the-hood changes to substantially improve reliability, including now being hosted by AWS (zone: us-east-2; Ohio). For details, read our “v0.12.0: Update to Python 3.8 & Django 2.2; Migrate to AWS” release notes on GitHub.
  
  Unfortunately, to migrate to AWS we needed to change the Domain Name System (DNS) records for our URLs, which in turn needed to propagate to all internet name servers. That propagation took a surprisingly long time, especially for two in your regions: Verizon in Brooklyn NY (this could cover Michigan), and Corporate West in San Jose CA (which covers the entire US West). We reissued the DNS change today a little after 1 ET, which did seem to resolve those persistent holdout name servers. However, on top of that, devices and networks have their own DNS caches that can sometimes persist for quite a while.
  
  Fortunately, all your missing data is still showing up in our old database on LimnoTech servers, so we will be able to sync that to new AWS servers in coming days.
  
  We’re hoping that the DNS caches flush on their own in the coming day or two. Meanwhile, we are looking into a couple of options in case that doesn’t happen:
  - Temporarily shut down our old production server, to try to force devices to clear their caches.
    
    We’ll likely do this late Thu or early Fri of this week, for about 20-30 minutes, during which time data will unfortunately be lost.
  - Ask you all to power cycle your devices, early next week, if data is still flowing to our old servers.
  - Set up forwarding from our old servers to the new AWS servers, to get data instantaneously logging in the correct database.
    
    This is not a long-term solution because it includes a potential fail point that we’ve been trying to move away from for a while!
  @mbarney, the behavior you saw with your new station makes perfect sense given the issues we’re seeing.
  
  @neilh20, thanks for your patience on your suggestions for reliable data delivery approaches. Now that we’ve migrated to AWS and upgraded the software stack, we’re poised to finally start carefully considering your suggestions.
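
For anyone who wants to see which server a given network's DNS currently points at, here is a minimal Python sketch. It only reports what the resolver on the machine running it returns, so it says nothing about the cache inside a deployed modem. The function names and the IP addresses are placeholders (documentation ranges), not the real LimnoTech or AWS addresses.

```python
import socket


def resolve_ipv4(hostname):
    """Return the set of IPv4 addresses the local resolver gives for hostname."""
    infos = socket.getaddrinfo(
        hostname, 443, family=socket.AF_INET, type=socket.SOCK_STREAM
    )
    # Each entry is (family, type, proto, canonname, sockaddr); sockaddr[0] is the IP.
    return {info[4][0] for info in infos}


def classify(resolved, old_ips, new_ips):
    """Label a resolver answer as 'new', 'old', 'mixed', or 'unknown'."""
    hits_old = bool(resolved & old_ips)
    hits_new = bool(resolved & new_ips)
    if hits_old and hits_new:
        return "mixed"
    if hits_new:
        return "new"
    if hits_old:
        return "old"
    return "unknown"


# Placeholder addresses (RFC 5737 documentation ranges), NOT the real servers.
OLD_IPS = {"192.0.2.10"}
NEW_IPS = {"198.51.100.20"}
```

With the real addresses filled in, `classify(resolve_ipv4("monitormywatershed.org"), OLD_IPS, NEW_IPS)` would report whether that network's name server has picked up the change.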
- 2021-12-08 at 6:24 PM #16153
  Matt Barney
  Participant
  Thanks @heather and @aufdenkampe, that does make sense with the symptoms we’re seeing. Good to know that the data are not being lost in the interim.
  
  Shutting down the old server temporarily sounds like a good idea; all of our devices are on a 15-minute sampling period, so 20+ minutes should cause at least one failed message from the device, and the relative cost of losing one or two data points would be well worth it.
  
  Power cycling all of our devices would be pretty painful; some stations (e.g. with wipered sensors) are hours from their caretakers’ locations.
  
  Wow, it would be nice to be able to remotely initiate a cache flush (or even a reset) on the LTE modems! But, as I read Digi’s documentation, I think that would have to be initiated by an AT command from the host (Mayfly). (Right??) I don’t *think* there’s a way to address this from the Hologram dashboard, but if anyone knows, please share. Hmm… I wonder what clicking Hologram’s “Pause” button for each SIM would do…?
  
  Matt
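
The arithmetic behind the "one or two data points" estimate above can be sketched as follows. This is a rough illustration only: it assumes reports are sent on a fixed period, treats the outage as a half-open time window, and ignores any retry behavior on the device. The function name is made up for the example.

```python
import math


def missed_reports(outage_minutes, period_minutes):
    """Return (fewest, most) scheduled reports that can fall inside an outage."""
    # Fewest: the outage starts just after a report went out.
    fewest = outage_minutes // period_minutes
    # Most: the outage starts exactly when a report is due.
    most = math.ceil(outage_minutes / period_minutes)
    return fewest, most


for outage in (20, 30):
    print(outage, missed_reports(outage, 15))
# prints:
# 20 (1, 2)
# 30 (2, 2)
```

So with a 15-minute period, a 20-minute shutdown catches at least one report from every station, and a 30-minute shutdown catches two.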
- 2021-12-09 at 9:29 AM #16157
  Matt Barney
  Participant
  We tried the Hologram SIM pause idea on one station, but it had no effect, which isn’t too surprising.
  
  @aufdenkampe, please keep us informed about the timing of the temporary shutdown of the old server. Thanks for your efforts.
  
  Matt
- 2021-12-09 at 9:38 AM #16158
  Anthony Aufdenkampe
  Participant
  @mbarney, wouldn’t that have been great if the Hologram pause worked! Thanks for trying it out!
  
  Have you seen any of your sites start reporting again in the last 24 hours? The couple that I’ve been tracking have not, which is disappointing. I definitely recognize how painful it would be to manually power cycle these stations.
  
  We’ll likely try to shut down our production server this afternoon unless we see that some stations are starting to send data to the new server on their own.
- 2021-12-09 at 10:25 AM #16159
  Matt Barney
  Participant
  We have 1 site that resumed uploading data at 12/8/2021 14:45 ET. We’re assuming that the volunteer restarted it.
  
  5 additional sites are uploading after we restarted them, and 19 sites have not reported since the cutover to the new server.
- 2021-12-09 at 12:00 PM #16160
  Anthony Aufdenkampe
  Participant
  @mbarney, it is very helpful to know that you have had success with power-cycling the stations. We’re seeing a few other sites switch over on their own in the last 24 hours.
- 2021-12-09 at 1:05 PM #16161
  neilh20
  Participant
  For my status, there is one site that started uploading OK, which is good as it’s about a 3-hour drive away.
  
  One site hasn’t updated. It’s on private land and requires permission to enter the property, and visits are usually only done when there is a group of activities on the site. It’s about a one-hour drive away.
  Both are on Verizon.
  
  For my local test system, running over Verizon, I monitor the output of the connection process. It’s having a lot of timeouts. The timeout is set to 5 seconds.
- 2021-12-09 at 2:12 PM #16162
  Shannon Hicks
  Moderator
  All of our stations that are using our new sim7080 EnviroDIY LTEbee made the changeover automatically. The majority of our stations that still have the Digi Xbee LTE module have been offline since the server change; however, a few random ones seemed to be fine and were online immediately after the changeover. The rest of the stations with Digi boards have required visiting the station and cycling the power to get them back online. Luckily we’re in the process of upgrading most of our stations to the sim7080 boards, so we have fewer Digi boards deployed than we did a few months ago.
- 2021-12-10 at 12:20 PM #16164
  Matt Barney
  Participant
  Interesting – we have 1 site with the new LTEbee, and it also weathered the cutover successfully.
  
  @aufdenkampe, As of now, we’re not seeing any of our Digi stations that are able to upload to the new server as a result of the temporary shutdown of your old server, assuming that that indeed took place yesterday.
  
  Matt
- 2021-12-10 at 12:51 PM #16165
  Anthony Aufdenkampe
  Participant
  @mbarney and others,
  
  After doing some research, we decided that shutting down our old servers for 30 minutes would not likely force the DNS on devices to refresh.
  
  Our plan is to set up proxy forwarding on our LimnoTech server to instantly forward all data to the AWS servers. We plan to do that today, and will also set up a forwarding log to keep track of which devices are still sending to LimnoTech. We’ll leave this up for weeks or maybe months, or until nearly all devices make the switch. It may be that all devices need to be rebooted for that to happen. We’ll see.
  
  Next week, we’ll sync the data over from the old servers to the new servers on AWS. That will fill in the data gaps that you’ve all been seeing.
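
As a rough picture of the forwarding-plus-log idea described above, here is a minimal Python sketch. It is not the LimnoTech setup, which is not described in this thread. The endpoint URL, the `device_id` argument, and the function names are all hypothetical, and the upstream call is swapped for a stand-in so the example runs offline.

```python
import urllib.request
from collections import Counter

NEW_URL = "https://new-server.example/api/data-stream/"  # hypothetical endpoint


def post_upstream(body, headers):
    """Send the device's original POST body to the new server; return the HTTP status."""
    request = urllib.request.Request(NEW_URL, data=body, headers=headers, method="POST")
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


def forward(device_id, body, headers, log, send=post_upstream):
    """Relay one request and note which device is still using the old hostname."""
    status = send(body, headers)
    log[device_id] += 1  # every forwarded request means this device has not switched
    return status


def fake_send(body, headers):
    """Stand-in for the real upstream call so the sketch runs without a network."""
    return 201


forwarding_log = Counter()
json_headers = {"Content-Type": "application/json"}
forward("station-A", b"{}", json_headers, forwarding_log, send=fake_send)
forward("station-A", b"{}", json_headers, forwarding_log, send=fake_send)
forward("station-B", b"{}", json_headers, forwarding_log, send=fake_send)
print(sorted(forwarding_log))  # devices still sending to the old server
```

Once a device stops appearing in new log entries, it has presumably picked up the new DNS record and is posting to AWS directly.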
- 2021-12-10 at 1:24 PM #16166
  Matt Barney
  Participant
  Thanks @aufdenkampe, that sounds like a great plan. I like the idea of the forwarding log too.
  
  We’ll watch for a change in stations’ status once the proxy is in place.
- 2021-12-10 at 1:36 PM #16167
  Anthony Aufdenkampe
  Participant
  We started the proxy forwarding about 45 minutes ago, so all data should now be rerouted from our old servers to the new servers on AWS.
  
  Let us know how things look!
- 2021-12-10 at 1:51 PM #16168
  neilh20
  Participant
  Thanks for the update. The system that I had that wasn’t reporting this morning is now reporting – https://monitormywatershed.org/sites/TUCA_PO03/ as of Dec. 10, 2021, 10:30 a.m. (UTC-08:00)
  
  I’m logging in using Firefox for my production sites and Chrome for my test sites.
  The site above looks different under the two browsers, and the TSV is accessible under Firefox but not under Chrome. Do you want me to log an issue on it?
- 2021-12-10 at 2:06 PM #16169
  Matt Barney
  Participant
  @aufdenkampe – I’ve confirmed that indeed, all stations are now reporting data as expected.
- 2021-12-10 at 2:15 PM #16170
  Anthony Aufdenkampe
  Participant
  @neilh, do a hard reset of your browser cache for our URL to get TSV working on Chrome. That is definitely the issue there. We’ll be adding “cache-busting” code in our next update.
  - 2021-12-14 at 2:59 PM #16175
    Jake Lemon
    Participant
    Hi Anthony. Will the cache-busting code negate the need for MMW users to clear their cache to view sparklines? And if so, do you have an estimate on when you’ll be doing the next update? I’m debating whether to suggest to our volunteers/partners that they clear their cache in order to view sparklines.
    
    Thanks!
    - 2021-12-14 at 3:53 PM #16176
      Anthony Aufdenkampe
      Participant
      
      @jlemontu-org, we won’t be releasing that code until February with our planned v0.13 release, so I would encourage users to clear their cache before then. You can track that progress at: https://github.com/ODM2/ODM2DataSharingPortal/issues/529
      
      That said, you don’t have to clear the cache for all websites; you can clear it just for MonitorMyWatershed, which is not that painful. Clearing it for all websites is somewhat painful, so I would be clear about that in how you make the recommendation.
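
For readers unfamiliar with the term, "cache-busting" generally means changing a static file's URL whenever its content changes, so browsers cannot keep serving a stale copy. Here is a minimal Python sketch of one common approach, a content hash in the query string. The file path and function name are hypothetical, and this thread does not say which approach the v0.13 release will use.

```python
import hashlib


def busted_url(path, content):
    """Append a short content hash so a changed file gets a new URL."""
    digest = hashlib.sha256(content).hexdigest()[:8]
    return f"{path}?v={digest}"


# Two versions of the same (hypothetical) script produce different URLs.
old = busted_url("/static/js/tsv.js", b"console.log('v0.11');")
new = busted_url("/static/js/tsv.js", b"console.log('v0.12');")
print(old)
print(new)
print(old != new)
```

Because the URL changes with the content, a browser treats the updated file as a new resource and fetches it without the user clearing anything. Django also ships a `ManifestStaticFilesStorage` backend that puts a content hash into static file names, which achieves the same effect.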
- 2021-12-10 at 3:38 PM #16171
  neilh20
  Participant
  I cleared browsing data for “cached images and files” & “browsing history” (a complete clear of everything is quite educational to recover from)
  https://support.google.com/accounts/answer/32050
  and it worked. Thanks for the tip.
- 2024-06-15 at 9:45 AM #18524
  Jim Moore
  Participant
  “download Sensor data” was very slow to respond (several hours), and now it doesn’t seem to work at all. Does anyone know the status of this? Is there another portal to download station data?
- 2024-06-15 at 9:51 AM #18525
  Jim Moore
  Participant
  Never mind! “download Sensor data” seems to be working now.
  - 2024-06-17 at 1:58 PM #18528
    Heather Brooks
    Keymaster
    @w3asa Glad to hear you were able to download your data. The Monitor My Watershed download function may not perform well if the file size is too large. Here are a couple of tips for avoiding download timeouts:
    
    Don’t collect data at 5-minute intervals for long periods unless you know you need it for a particular parameter and site; 15-minute data collection should be the default for most users.
    
    If you are requesting large volumes of data, view the data in TSV before using the download button. If TSV can plot the requested data, you can be reasonably sure the download will be successful.
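
To put the 5-minute versus 15-minute tip in numbers, here is a small Python calculation of how many rows one variable at one site accumulates. It is only arithmetic on the logging interval; the function name is made up for the example, and actual download sizes depend on how many variables are requested.

```python
MINUTES_PER_DAY = 24 * 60


def rows_per_variable(days, interval_minutes):
    """Number of readings one variable logs over the given number of days."""
    return days * MINUTES_PER_DAY // interval_minutes


for interval in (5, 15):
    print(interval, rows_per_variable(365, interval))
# prints:
# 5 105120
# 15 35040
```

A year of 5-minute data is three times the volume of a year of 15-minute data, for every variable in the request.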