@neilh, we’ve been tracking your many issues in the last few days, including #542, and have been working on solutions. I just responded in detail here: https://github.com/ODM2/ODM2DataSharingPortal/issues/542#issuecomment-999785715
The short story is that we’re working hard to improve error handling (i.e., making the response codes more accurate, rather than just returning a 201 immediately), but doing so has had some unintended consequences, especially for your specific code, which resends all data that do not receive a 201.
The old system would queue up all the POST requests every 5 to 10 minutes, sometimes taking a minute to complete them all. However, ModularSensors times out after 7 seconds, so the radio and logger can go back to sleep. Sending a 201 immediately, before the POST completes, allowed that to happen.
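To illustrate the "send a 201 immediately" behavior, here is a minimal sketch of the general accept-then-process pattern. It is not the portal's actual code: `handle_post`, `slow_insert`, `pending`, and `stored` are hypothetical names, and the sleep is only a stand-in for a slow database write.

```python
import queue
import threading
import time

# Hypothetical names throughout; this is NOT the portal's actual code.
pending = queue.Queue()  # rows that were acknowledged but not yet written
stored = []              # stand-in for the database table


def slow_insert(row):
    """Pretend to write one row to the database, slowly."""
    time.sleep(0.05)
    stored.append(row)


def worker():
    """Drain the queue in the background, one row at a time."""
    while True:
        row = pending.get()
        try:
            slow_insert(row)
        finally:
            pending.task_done()


threading.Thread(target=worker, daemon=True).start()


def handle_post(row):
    """Queue the row and acknowledge right away, before it is written."""
    pending.put(row)
    return 201
```

The trade-off is the one described above: the logger gets its response well inside its timeout, but the 201 only says the request was accepted, not that the row was actually stored.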
The unintended consequence of our more accurate error handling is that if the POST doesn’t fully complete in <7 seconds, then you get a 504, even if the data do get inserted into the database. With your specific “reliable delivery” code, you then send those data all over again at the next logging time, along with one more data row. This has led to a steady increase in our server load, which was making the problem worse for everyone. This is why we recently switched back to sending 201 responses immediately.
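As a rough illustration of why the load keeps climbing, here is a simplified model of the resend behavior. It assumes one new row per logging interval and that every unacknowledged row is resent in the next POST; `rows_sent_per_post` and `acked` are hypothetical names, not anything from ModularSensors.

```python
def rows_sent_per_post(n_intervals, acked):
    """Return the number of rows in each POST when unacknowledged rows are resent.

    acked(i) should return True if the i-th POST received a 201.
    """
    backlog = 0
    sizes = []
    for i in range(n_intervals):
        backlog += 1           # one new reading per logging interval
        sizes.append(backlog)  # everything still unacknowledged is sent
        if acked(i):
            backlog = 0        # a 201 clears the backlog
    return sizes


always_504 = rows_sent_per_post(5, lambda i: False)
always_201 = rows_sent_per_post(5, lambda i: True)
print(always_504)  # [1, 2, 3, 4, 5]
print(always_201)  # [1, 1, 1, 1, 1]
```

In this model a logger that never sees a 201 sends 15 rows over five intervals instead of 5, and larger POSTs are in turn more likely to miss the 7-second window.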
Right now we are fast-tracking some hot fixes to the Gunicorn app server to allow for more concurrent POSTs, along with other fixes. For the next release, we’ll be upgrading Django from 2.2 to 3.2, which enables us to use an ASGI (Asynchronous Server Gateway Interface) server that should perform much better at routing the ever-increasing traffic.