Choosing QoS Level for the MQTT Bridge routes
Status
- Draft
- Proposed
- Accepted
- Deprecated
Context
This document lists the design considerations for selecting the appropriate Quality of Service (QoS) level for the MQTT Bridge. MQTT offers three QoS levels, each with different message delivery guarantees and network overhead. This record aims to define the most suitable QoS level for each use case.
Decision
The MQTT Bridge is responsible for duplicating data between the Local UNS and the Enterprise UNS. The data flows differ in lifecycle and granularity, and they are duplicated from different sources to different targets.
The purpose of this ADR is to define a specific Quality of Service (QoS) level for each kind of data. Each QoS choice must be analyzed to avoid delivery problems such as message loss or unhandled duplicates.
Considered Options
QoS levels
MQTT defines three QoS levels:
- QoS 0 (At most once): also known as fire and forget. The message is sent only once. Delivery is not guaranteed. This is the fastest and most lightweight option, suitable for non-critical data where occasional message loss is acceptable.
- QoS 1 (At least once): The publisher resends the message until an acknowledgment (called PUBACK) is received from the broker. This ensures delivery but might lead to duplicates. Suitable for moderately important data where occasional duplicates are tolerable.
- QoS 2 (Exactly once): A four-step handshake (PUBLISH, PUBREC, PUBREL, PUBCOMP) guarantees that the message is delivered exactly once. This offers the highest reliability but incurs the most overhead. Suitable for critical data requiring guaranteed delivery without duplicates.
NOTE: QoS 2 is not supported in Azure IoT Operations.
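The difference between QoS 0 and QoS 1 can be illustrated with a toy simulation. This is a simplified sketch, not a protocol implementation: the function `deliver`, the loss probability, and the retry limit are all assumptions made for illustration, and real clients retransmit under different conditions (for example on reconnect).

```python
import random

def deliver(qos, rng, loss=0.3, max_attempts=50):
    """Return how many copies of one message reach the receiver.

    Toy model: every PUBLISH and every PUBACK is lost independently
    with probability `loss` (an assumed value, not a measurement).
    """
    if qos == 0:
        # Fire and forget: one attempt, no acknowledgment.
        return 0 if rng.random() < loss else 1
    if qos == 1:
        copies = 0
        for _ in range(max_attempts):
            if rng.random() < loss:
                continue          # PUBLISH lost, the sender retries
            copies += 1           # the receiver got a (possibly repeated) copy
            if rng.random() >= loss:
                return copies     # PUBACK arrived, the sender stops
        return copies
    raise ValueError("only QoS 0 and QoS 1 are modelled here")

rng = random.Random(42)
qos0 = [deliver(0, rng) for _ in range(1000)]
qos1 = [deliver(1, rng) for _ in range(1000)]

print("QoS 0 lost messages:      ", qos0.count(0))
print("QoS 1 lost messages:      ", qos1.count(0))
print("QoS 1 duplicated messages:", sum(c > 1 for c in qos1))
```

In this model QoS 0 loses some messages but never duplicates them, while QoS 1 loses none but delivers some of them more than once.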
Data types
The data flows handled by the MQTT Bridge are described below:
| Data Flow | Source | Target | Cardinality | Lifecycle |
|---|---|---|---|---|
| Metadata | L4 | L3 | 1 | Boot phase |
| Assets Data | L3 | L4 | N | Continuously |
Decision Conclusion
Discarded level
We introduced the MQTT Bridge to our solution design to ensure reliable data transfer between PSNet (L3) and Corporate network (L4). This means data must be delivered from the source to the target without errors. Because of this requirement, we won't be using Quality of Service (QoS) level 0, as it doesn't guarantee delivery.
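As a rough sketch, the constraints stated so far (QoS 0 discarded, QoS 2 not supported in Azure IoT Operations) could be checked against a route definition before deployment. The route names and the `validate_routes` helper are hypothetical and are not part of the MQTT Bridge configuration format.

```python
# Reasons taken from this record; the data structure itself is an assumption.
REJECTED = {
    0: "QoS 0 does not guarantee delivery",
    2: "QoS 2 is not supported in Azure IoT Operations",
}
ALLOWED_QOS = {1}

def validate_routes(routes):
    """Return (route_name, reason) pairs for routes with a disallowed QoS."""
    problems = []
    for name, qos in routes.items():
        if qos in ALLOWED_QOS:
            continue
        problems.append((name, REJECTED.get(qos, f"unknown QoS level {qos!r}")))
    return problems

# Hypothetical route names, one per data flow in the table above.
routes = {"metadata": 1, "assets-data": 0}
for name, reason in validate_routes(routes):
    print(f"{name}: {reason}")
```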
Chosen levels
Metadata
The Metadata:
- is needed at least once
- needs to be retained, as required by the Data Pipelines
- can be duplicated without impacting the execution of the Data Pipelines [To be confirmed]
NB: the MQTT Bridge does not currently support setting the retain flag.
Decision: QoS level 1 will be used for Metadata duplication.
Assets Data
The Assets Data will be duplicated from L3 to L4. It:
- is needed at least once
- can be duplicated without impacting the use of the data [To be confirmed]
- is needed asynchronously
Decision: QoS level 1 will be used for Assets Data duplication.
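Because the tolerance for duplicates is still to be confirmed for both flows, it may help to see what consumer-side de-duplication could look like if it turns out to be needed. The sketch below assumes that each message carries an application-level `message_id` field; that field, the `Deduplicator` class, and the capacity value are hypothetical and do not come from the MQTT Bridge or the UNS payload format.

```python
from collections import OrderedDict

class Deduplicator:
    """Remember the most recently seen message ids and flag repeats."""

    def __init__(self, capacity=10_000):
        self.capacity = capacity
        self._seen = OrderedDict()

    def is_new(self, message_id):
        if message_id in self._seen:
            self._seen.move_to_end(message_id)  # refresh recency
            return False
        self._seen[message_id] = True
        if len(self._seen) > self.capacity:
            self._seen.popitem(last=False)      # evict the oldest id
        return True

def process(messages, dedup):
    """Return payloads in arrival order with duplicates removed."""
    return [m["payload"] for m in messages if dedup.is_new(m["message_id"])]

incoming = [
    {"message_id": "a1", "payload": 20.5},
    {"message_id": "a2", "payload": 20.7},
    {"message_id": "a1", "payload": 20.5},  # redelivered under QoS 1
]
print(process(incoming, Deduplicator()))  # prints [20.5, 20.7]
```

The bounded cache keeps memory use predictable, at the cost of accepting a duplicate that arrives after its id has been evicted.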
Questions
QoS 2 Implications
The use of QoS level 2 may increase the load on the MQTT Broker in L4 because of the delivery verification sequences. To mitigate this, we can increase the number of instances of the MQTT Broker in L4.
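To give a sense of the extra load, the sketch below counts the MQTT control packets exchanged for a batch of messages on a single hop (publisher to broker), assuming no retransmissions. The packet sequences follow the MQTT specification; the helper name and the message count are illustrative only, and the count ignores the session state that a broker must also keep for QoS 2.

```python
# Control packets exchanged per message on one hop, without retransmission.
CONTROL_PACKETS = {
    0: ["PUBLISH"],
    1: ["PUBLISH", "PUBACK"],
    2: ["PUBLISH", "PUBREC", "PUBREL", "PUBCOMP"],
}

def packets_per_hop(qos, messages):
    """Total control packets needed to move `messages` messages at `qos`."""
    return len(CONTROL_PACKETS[qos]) * messages

for qos in (0, 1, 2):
    print(f"QoS {qos}: {packets_per_hop(qos, 1000)} packets for 1000 messages")
```

Under these assumptions QoS 2 exchanges twice as many packets as QoS 1 for the same number of messages.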
The number of broker instances can be increased by raising the cardinality of these pods:
- Frontend: an MQTT gateway-like component that receives the requests and forwards them to the `backend` chains.
- Backend: responsible for storing and delivering messages to the clients.
To configure the cardinality of the MQTT Broker elements, we can:
- define them in the Kubernetes workload definition for the MQTT Broker CRD:

  ```yaml
  apiVersion: mq.iotoperations.azure.com/v1beta1
  kind: Broker
  metadata:
    name: broker
    namespace: azure-iot-operations
  spec:
    ...
    mode: distributed
    cardinality:
      backendChain:
        partitions: 2
        redundancyFactor: 2
        workers: 2
      frontend:
        replicas: 2
        workers: 2
    encryptInternalTraffic: false
    memoryProfile: medium
  ```

  - the `distributed` mode is used to override the default values.
  - the `backendChain` is using 2 partitions and 2 workers, with a redundancy factor of 2.
  - the `frontend` is using 2 replicas (number of `frontend` pods to deploy) and 2 workers (number of workers to deploy per frontend).
  - the `memoryProfile` defines the settings profile for the memory usage. It is a very important value and requires a good understanding of the targeted use cases, because it has a huge impact on performance and infrastructure resources. For example, it can cause memory usage to go from `99 MiB` max to `4.9 GiB` max per `frontend` replica.

  NB: All these settings are covered in detail in Configure core Azure IoT MQ Preview MQTT broker settings.
- or define the `cardinality` and `memoryProfile` while deploying the `mq` extension during the AIO install (flag values omitted):

  ```bash
  az iot ops init ... \
    --mq-mode distributed \
    --mq-backend-workers \
    --mq-broker \
    --mq-frontend-replicas \
    --mq-frontend-server \
    --mq-frontend-workers \
    --mq-mem-profile \
    ...
  ```
Very Important: There is no optimal MQTT broker pre-configuration. Testing with real data is crucial to determine the ideal settings for L4 MQTT Broker. This testing will help you define the actual load on the broker and size the Kubernetes infrastructure accordingly.
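Before running such tests, a back-of-the-envelope estimate can help frame the infrastructure discussion. The sketch below is only a rough aid and relies on two assumptions that should be checked against the Azure IoT MQ documentation: that the number of backend pods equals `partitions` multiplied by `redundancyFactor`, and that the per-replica frontend memory limits quoted in this record (99 MiB and 4.9 GiB) apply as simple upper bounds. The function name is hypothetical.

```python
def broker_footprint(partitions, redundancy_factor, frontend_replicas, frontend_max_mib):
    """Rough pod count and frontend memory ceiling for one cardinality setting."""
    # Assumption: one backend pod per partition and per redundancy copy.
    backend_pods = partitions * redundancy_factor
    return {
        "backend_pods": backend_pods,
        "frontend_pods": frontend_replicas,
        "frontend_memory_max_mib": frontend_replicas * frontend_max_mib,
    }

# Cardinality values from the Broker example in this record.
low = broker_footprint(2, 2, 2, frontend_max_mib=99)           # 99 MiB per replica
high = broker_footprint(2, 2, 2, frontend_max_mib=4.9 * 1024)  # 4.9 GiB per replica

print(low)
print(high)
```

With 2 frontend replicas, the frontend memory ceiling ranges from about 198 MiB to about 9.8 GiB depending on the memory profile, which shows why the profile has to be chosen from measured load rather than guessed.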
Related Documentation
- MQTT Bridge configuration
- Publishing data into the Local and Enterprise UNS
References
- Configure scaling settings - Configure core Azure IoT MQ
- az iot ops - Azure CLI Reference
- Secure Azure IoT MQ Preview communication using BrokerListener