Azure DPE jobs deployment latency
Incident Report for ForePaaS
Resolved
After a month of monitoring, we are happy to confirm that our patch has solved the DPE timeout problem. Azure is still facing issues with its Kubernetes API servers, but with our patch this problem no longer has any impact on customers.
On our side, we will continue to work with Azure support to make the Kubernetes API more stable in the future.
Posted Nov 12, 2020 - 18:49 CET
Update
We are still working with the Azure support team on the timeout issue. While waiting for a stable solution from Azure, we have developed an internal temporary fix to provide a better experience for our clients. Do not hesitate to reach out to your main contact at ForePaaS if you are still facing the problem.
We will let you know when the problem has been definitively fixed. Until then, we are working closely with Microsoft support to find the cause of the problem in their infrastructure.
We apologize for the inconvenience.
Posted Oct 14, 2020 - 12:06 CEST
Update
While waiting for Azure to fix the bug, we are deploying a patch to our Azure clusters that bypasses the Azure APIs and uses our backend instead.
Posted Oct 07, 2020 - 10:31 CEST
Update
Our technical team continues to work with Microsoft support to find the root cause of this error. For now, we believe it is due to high IO throttling per VM (according to metrics analysed with Microsoft support). IO throttling can cause network communication errors and make the internal API unreachable, which blocks the DPE workflow because it cannot contact other services. Following the diagnostic performed by the Microsoft support team, our technical team decided to increase the size of the disks to raise IO throughput. Our team will continue to monitor the cluster to see whether the DPE errors have disappeared or the problem persists. Thank you very much for your understanding and patience; we are doing our best to solve the problem as soon as possible.
Posted Oct 06, 2020 - 17:50 CEST
Investigating
We are currently investigating this issue.
Posted Sep 29, 2020 - 16:16 CEST
This incident affected: Microsoft Azure Clusters (Azure AKS - ForePaaS France).