
Anthony Aufdenkampe

Forum Replies Created

Viewing 8 reply threads
  • Author
    Posts
    • #16201
      Anthony Aufdenkampe
      Participant

      The first explanation that pops to mind is that the DIY Modbus wing has power bleed from the digital pins when the power is turned off.

      Modbus stop bits are high, which leaves the AltSoftSerial transmit pin (5 or 6) at 3.3V when the sensor power shuts down. This then bleeds through the RS485 converter (when it is powered off) over to the switched power rails, which interferes with the startup of some other sensors, such as the MaxSonar. For the coding solution, see ModularSensors issue #140, “Add to Modbus an AltSoftSerial ‘flush’ or ‘end’ function, to set pins low.”

      The coding solution is now provided in the complex_loop option shown in the menu_a_la_carte.ino example. Specifically, at the end of every loop, the AltSoftSerial pins need to be set to low before the system is put to sleep, as shown here: https://github.com/EnviroDIY/ModularSensors/blob/292371055ab9d9b884f8fedb7c9587181cf7789d/examples/menu_a_la_carte/menu_a_la_carte.ino#L2928-L2932

      This in turn requires an altSoftSerial.begin(9600); statement at the top of every measurement loop, as shown here: https://github.com/EnviroDIY/ModularSensors/blob/292371055ab9d9b884f8fedb7c9587181cf7789d/examples/menu_a_la_carte/menu_a_la_carte.ino#L2850-L2854

      This means that you must use the complex loop every time you use these Modbus Wings.
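
      Putting those two pieces together, here is a minimal sketch of the pattern, assuming a Mayfly with AltSoftSerial on pins 5 and 6 as described above; the linked lines of menu_a_la_carte.ino are the authoritative version:

      #include <Arduino.h>
      #include <AltSoftSerial.h>

      AltSoftSerial altSoftSerial;  // AltSoftSerial’s fixed pins on the Mayfly are 5 and 6

      void setup() {}

      void loop() {
          // Restart the emulated UART at the top of every measurement cycle
          altSoftSerial.begin(9600);

          // ... wake sensors, take readings, log and publish data ...

          // Stop AltSoftSerial and drive both of its pins low so the idle-high
          // stop-bit level cannot bleed through the unpowered RS485 adapter
          altSoftSerial.end();
          pinMode(5, OUTPUT);
          pinMode(6, OUTPUT);
          digitalWrite(5, LOW);
          digitalWrite(6, LOW);

          // ... put the logger to sleep until the next logging interval ...
      }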

      Let us know if that solves it.

    • #16197
      Anthony Aufdenkampe
      Participant

      @neilh, we’re glad to have you doing the intensive testing, reporting what you find, and having patience with us, both over the years before we could dive into these issues and over these recent weeks and coming months as we work through all the tech debt and adjust to a growing data system (we have 400 million records!).

    • #16195
      Anthony Aufdenkampe
      Participant

      @neilh, we’ve been tracking your many issues in the last few days, including #542, and have been working on solutions. I just responded in detail here:  https://github.com/ODM2/ODM2DataSharingPortal/issues/542#issuecomment-999785715

      The short story is that we’re working hard to improve error handling (i.e., making it more accurate rather than just returning a 201 immediately), but doing so has had some unintended consequences, especially for your specific code that resends all data that do not receive a 201.

      The old system would queue up all the POST requests every 5 to 10 minutes, sometimes taking a minute to complete them all. However, ModularSensors times out after 7 seconds, so the radio and logger can go back to sleep. Sending a 201 immediately, before the POST completes, allowed that to happen.

      The unintended consequence of our more accurate error handling is that if the POST doesn’t fully complete in under 7 seconds, you get a 504, even if the data do get inserted into the database. With your specific “reliable delivery” code, you then send that data all over again at the next logging time, along with one more data row. This has led to a steady increase in our server load, which was making the problem worse for everyone. That is why we recently switched back to sending 201 responses immediately.
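
      To illustrate that interaction with a self-contained C++ sketch (illustrative only, not the actual “reliable delivery” implementation; postRow is a hypothetical stand-in for the real HTTP POST): a row stays in the send queue until the server answers 201, so a 504 returned after a slow-but-successful insert means the same rows get POSTed again at the next wake-up, plus one new row.

      #include <cstdint>
      #include <deque>

      struct DataRow { uint32_t timestamp; float value; };

      std::deque<DataRow> sendQueue;  // rows not yet acknowledged with a 201

      // Hypothetical stand-in for the real POST; here it simulates a gateway
      // timeout even though the server may still insert the row.
      int16_t postRow(const DataRow& /*row*/) { return 504; }

      void publishPending() {
          while (!sendQueue.empty()) {
              int16_t status = postRow(sendQueue.front());
              if (status == 201) {
                  sendQueue.pop_front();  // acknowledged: this row is never re-sent
              } else {
                  break;  // 504 or other error: keep the row and retry at the next logging time
              }
          }
      }

      int main() {
          sendQueue.push_back(DataRow{1639500000u, 21.5f});  // one queued reading
          publishPending();  // gets a 504, so the row stays queued and is re-sent next time
          return 0;
      }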

      Right now we are fast-tracking some hot fixes to the Gunicorn app server to allow for more concurrent POSTs, along with other fixes. For the next release, we’ll be upgrading from Django 2.2 to 3.2, which enables us to use an ASGI (Asynchronous Server Gateway Interface) server that will perform much better at routing the ever-increasing traffic.

    • #16170
      Anthony Aufdenkampe
      Participant

      @neilh, do a hard reset of your browser cache for our URL to get TSV working on Chrome. That is definitely the issue there. We’ll be adding “cache-busting” code in our next update.

    • #16167
      Anthony Aufdenkampe
      Participant

      We started the proxy forwarding about 45 minutes ago, so all data should now be rerouted from our old servers to the new servers on AWS.

      Let us know how things look!

    • #16165
      Anthony Aufdenkampe
      Participant

      @mbarney and others,

      After doing some research, we decided that shutting down our old servers for 30 minutes would not likely force the DNS on devices to refresh.

      Our plan is to set up proxy forwarding on our LimnoTech server to instantly forward all data to the AWS servers. We plan to do that today, and will also set up a forwarding log to keep track of which devices are still sending to LimnoTech. We’ll leave this up for weeks or maybe months, or until nearly all devices make the switch. It may be that all devices need to be rebooted for that to happen. We’ll see.

      Next week, we’ll sync the data over from the old servers to the new servers on AWS. That will fill in the data gaps that you’ve all been seeing.

    • #16160
      Anthony Aufdenkampe
      Participant

      @mbarney, it is very helpful to know that you have had success with power-cycling the stations. We’ve seen a few other sites switch over on their own in the last 24 hours.

    • #16158
      Anthony Aufdenkampe
      Participant

      @mbarney, wouldn’t it have been great if the Hologram pause had worked! Thanks for trying it out!

      Have you seen any of your sites start reporting again in the last 24 hours? The couple that I’ve been tracking have not, which is disappointing. I definitely recognize how painful it would be to manually power cycle these stations.

      We’ll likely try to shut down our production server this afternoon unless we see that some stations are starting to send data to the new server on their own.

    • #16152
      Anthony Aufdenkampe
      Participant

      Fortunately, we deployed to AWS region US-east-2 (Ohio) and it was the US-east-1 (Virginia) data center that went down, so our release yesterday was not affected by the AWS outage!

      Unfortunately, we’ve seen two issues with caching mess up our otherwise smooth release:

      1. Domain Name Service (DNS) caches are causing some devices to send data to our old production servers (see details at https://www.envirodiy.org/topic/status-update-on-mmw/#post-16150).
      2. Browser caches are causing sparkline plots to fail again for some users, because your web browser still has old JavaScript in it that points to a database that no longer exists.

      The solution to #2 is to do a hard reset/delete/remove of your browser cache for monitormywatershed.org and related websites. This will cause your browser to fetch the latest JavaScript from our web servers, which will render the sparkline plots correctly.

      Our first task for the next round of development is to add some “cache-busting” code into the app, so errors like #2 don’t happen again (See our GitHub issue #529).

      Also, we won’t see #1 again, because now that we are on AWS, we will no longer need to change IP addresses when issuing a new release of the Monitor My Watershed web application.


    • #16176
      Anthony Aufdenkampe
      Participant

      @jlemontu-org,  we won’t be releasing that code until February with our planned v0.13 release, so I would encourage users to clear their cache before then. You can track that progress at: https://github.com/ODM2/ODM2DataSharingPortal/issues/529

      That said, you don’t have to clear the cache for all websites; just do it for MonitorMyWatershed, which is not that painful. Clearing it for all websites is somewhat painful, so be clear about that distinction when you make the recommendation.
