
MMW timeouts


    • #16194
      neilh20
      Participant

      I’m wondering whether anybody is monitoring the responses from POSTing to monitormywatershed.org,
      and whether they are getting a ‘201’ response.

      The data from my test systems is being recorded, but I’m not seeing any 201.
      The timeout is 5 seconds, so the logger now stays fully connected for 5 seconds and then gives up listening.

      [2021-12-22 09:16:14.343] Connected Internet
      [2021-12-22 09:16:14.374]
      [2021-12-22 09:16:14.374] pubDQTR Sending data to [ 0 ] monitormywatershed.org
      [2021-12-22 09:16:14.827] POST /api/data-stream/ HTTP/1.1
      [2021-12-22 09:16:14.827] Host: monitormywatershed.org
      [2021-12-22 09:16:14.827] TOKEN: 0cf7c40a-232e-457d-87d6-cea5c0757fec
      [2021-12-22 09:16:14.827] Content-Length: 467
      [2021-12-22 09:16:14.827] Content-Type: application/json
      [2021-12-22 09:16:14.827]
      [2021-12-22 09:16:14.827] {"sampling_feature":"236c674b-69b9-43af-b0d6-33d67b870ecc","timestamp":"2021-12-22T09:16:00-08:00","8c57835f-a32f-4d62-82dc-0ba09f04cf52":322,"1f2c9e91-3aa6-44d7-9312-160d04fbf877":1727.980,"65e0a9e5-cc8a-4ed6-8d28-127b5ec5e8e9":-0.893,"a0e41a66-875a-44fc-9e2f-02c6e25f6063":4.0431,"3bebd4a3-8b54-4f92-ba55-5fd2fd021358":4.215,"43bcda9b-2973-4639-af2c-f0b6bb3fa44b":-9999.00000,"ff4d732d-88d8-4a1b-b499-16417603edfe":-9999.00,"7182846e-46e0-4a10-b110-9bc32de4aca9":0}
      [2021-12-22 09:16:14.827]
      [2021-12-22 09:16:21.870] -- Response Code -- 504 waited 5011 mS Timeout 5000

      Typically there is a POST and then the server responds to acknowledge the message.
      For example, a few days ago I was getting:
      [2021-12-17 11:50:13.520] Connected Internet
      [2021-12-17 11:50:13.551]
      [2021-12-17 11:50:13.551] Sending data to [ 0 ] data.envirodiy.org
      [2021-12-17 11:50:14.114] POST /api/data-stream/ HTTP/1.1
      [2021-12-17 11:50:14.114] Host: data.envirodiy.org
      [2021-12-17 11:50:14.114] TOKEN: 0cf7c40a-232e-457d-87d6-cea5c0757fec
      [2021-12-17 11:50:14.114] Content-Length: 469
      [2021-12-17 11:50:14.114] Content-Type: application/json
      [2021-12-17 11:50:14.114]
      [2021-12-17 11:50:14.114] {"sampling_feature":"236c674b-69b9-43af-b0d6-33d67b870ecc","timestamp":"2021-12-16T22:44:00-08:00","8c57835f-a32f-4d62-82dc-0ba09f04cf52":192,"1f2c9e91-3aa6-44d7-9312-160d04fbf877":1841.210,"65e0a9e5-cc8a-4ed6-8d28-127b5ec5e8e9":-0.893,"a0e41a66-875a-44fc-9e2f-02c6e25f6063":4.1065,"3bebd4a3-8b54-4f92-ba55-5fd2fd021358":4.336,"43bcda9b-2973-4639-af2c-f0b6bb3fa44b":-9999.00000,"ff4d732d-88d8-4a1b-b499-16417603edfe":-9999.00,"7182846e-46e0-4a10-b110-9bc32de4aca9":-28}
      [2021-12-17 11:50:14.114]
      [2021-12-17 11:50:16.986] -- Response Code -- 201 waited 1614 mS Timeout 5000
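
For anyone who wants to tally these outcomes from a saved serial log, here is a minimal sketch in Python. It assumes the response line always has the shape shown in the two logs above. The names `parse_response_line` and `classify` are made up for illustration and are not part of ModularSensors, and the three labels are only one possible way to bucket the results.

```python
import re

# Assumed shape of the logger's response line, taken from the logs above.
# The separator around "Response Code" is matched loosely (\S+) because it
# may appear as "--" or as a dash character depending on how the log was copied.
RESPONSE_LINE = re.compile(
    r"Response Code\s+\S+\s+(\d{3})\s+waited\s+(\d+)\s+mS\s+Timeout\s+(\d+)"
)


def parse_response_line(line):
    """Return (status_code, waited_ms, timeout_ms), or None if the line does not match."""
    match = RESPONSE_LINE.search(line)
    if match is None:
        return None
    return tuple(int(group) for group in match.groups())


def classify(status_code, waited_ms, timeout_ms):
    """Bucket one response (hypothetical labels, chosen for this sketch)."""
    if status_code == 201:
        return "acknowledged"
    if waited_ms >= timeout_ms:
        return "timed out"
    return "rejected"


lines = [
    "[2021-12-22 09:16:21.870] -- Response Code -- 504 waited 5011 mS Timeout 5000",
    "[2021-12-17 11:50:16.986] -- Response Code -- 201 waited 1614 mS Timeout 5000",
]

for line in lines:
    parsed = parse_response_line(line)
    print(parsed, classify(*parsed))
```

Counting the labels over a day of logs would show how often the server acknowledges within the timeout window.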

    • #16195
      Anthony Aufdenkampe
      Participant

      @neilh, we’ve been tracking your many issues in the last few days, including #542, and have been working on solutions. I just responded in detail here:  https://github.com/ODM2/ODM2DataSharingPortal/issues/542#issuecomment-999785715

      The short story is that we’re working hard to improve error handling (i.e. making it more accurate, rather than just passing a 201 immediately), but doing so has had some unintended consequences, especially for your specific code that resends all data that do not receive a 201.

      The old system would queue up all the POST requests every 5 to 10 minutes, sometimes taking a minute to complete them all. However, ModularSensors times out after 7 seconds, so the radio and logger can go back to sleep. Sending a 201 immediately, before the POST completes, allowed that to happen.
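
The "acknowledge first, finish later" pattern described above can be sketched roughly as follows. This is not the portal's code: the in-memory queue, the worker thread, and the names `handle_post` and `slow_insert` are all assumptions chosen to illustrate the idea in plain Python.

```python
import queue
import threading
import time

pending = queue.Queue()   # rows accepted but not yet stored
database = []             # stand-in for the real database


def slow_insert(row):
    """Pretend the insert is slow (hypothetical delay)."""
    time.sleep(0.05)
    database.append(row)


def worker():
    """Drain the queue in the background, one row at a time."""
    while True:
        row = pending.get()
        try:
            slow_insert(row)
        finally:
            pending.task_done()


threading.Thread(target=worker, daemon=True).start()


def handle_post(row):
    """Queue the row and answer 201 right away, before the insert has run."""
    pending.put(row)
    return 201


status = handle_post({"timestamp": "2021-12-22T09:16:00-08:00", "value": 322})
print(status)      # the logger sees this immediately and can go back to sleep
pending.join()     # wait for the background insert to finish
print(len(database))
```

The trade-off is the one discussed in this thread: the 201 arrives quickly, but it no longer proves that the insert succeeded.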

      The unintended consequence of our more accurate error handling is that if the POST doesn’t fully complete in <7 seconds, you get a 504, even if the data do get inserted into the database. With your specific “reliable delivery” code, you then send those data all over again at the next logging time, along with one more data row. This has led to a steady increase in our server load that was making the problem worse for everyone, which is why we recently switched back to sending 201 responses immediately.
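
To see why this grows the load, here is a small model of the resend behavior as described in the paragraph above. It is a sketch only: the actual “reliable delivery” code is not shown in this thread, and `simulate` and its `ack` callback are hypothetical names. Each logging cycle adds one new row and sends everything still unacknowledged.

```python
def simulate(cycles, ack):
    """Return the number of rows sent in each logging cycle.

    ack(cycle) should return True when the logger sees a 201 for that
    cycle's POST, and False when it sees a timeout or 504 instead.
    """
    unacknowledged = []
    rows_sent_per_cycle = []
    for cycle in range(cycles):
        unacknowledged.append(cycle)          # one new data row per cycle
        rows_sent_per_cycle.append(len(unacknowledged))
        if ack(cycle):
            unacknowledged.clear()            # delivery confirmed, nothing to resend
    return rows_sent_per_cycle


always_201 = simulate(5, lambda cycle: True)
always_504 = simulate(5, lambda cycle: False)

print(always_201, sum(always_201))
print(always_504, sum(always_504))
```

In this model, five cycles of 201 responses send five rows in total, while five cycles of 504 responses send fifteen, even though the server may have stored every row the first time.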

      Right now we are fast-tracking some hot fixes to the Gunicorn app server to allow for more concurrent POSTs, along with other fixes. For the next release, we’ll be upgrading from Django 2.2 to 3.2, which enables us to use ASGI (Asynchronous Server Gateway Interface), which should perform much better at routing the ever-increasing traffic.

    • #16196
      neilh20
      Participant

      @aufdenkampe many thanks. I appreciate that this covers timing issues that are a bit hard to flush out.

      I suppose I’m the messenger, checking the system against what appears to be the bar from the Mayfly, but it’s a pretty hard area to check out. All I can say is that, as the messenger, I’ve seen these challenges in other systems and was trying to give a heads-up by figuring out the best way of characterizing the system in https://github.com/ODM2/ODM2DataSharingPortal/issues/524

      My LTE systems, when they get a response, now get it in 10 seconds, or time out at 25 seconds.
      More info at https://github.com/ODM2/ODM2DataSharingPortal/issues/542
      What’s nice is that the data does appear to be getting to the database.

    • #16197
      Anthony Aufdenkampe
      Participant

      @neilh, we’re glad to have you doing the intensive testing and reporting what you find. Thank you for your patience with us over the years before we could dive into these issues, and over these recent weeks and the coming months as we work through all the tech debt and adjust to a growing data system (we have 400 million records!).
