Dynamic Telemetry is a PROPOSAL: please provide feedback! :-)
Dynamic Telemetry is not an implementation; it's a request for collaboration that will lead to a shared understanding and, hopefully, one or more implementations.
Your feedback and suggestions on this document are highly encouraged!
Please:
- Join us by providing comments or feedback on our Discussions page
- Submit a PR with changes to this file (docs/PositionPaper.DeliveryGuarantees.document.md)
Delivery Guarantees of Dynamic Telemetry (and OpenTelemetry)
- Telemetry must always be allowed to be lossy when push comes to shove
- Telemetry should never be lost unless push has come to shove
In simpler terms:
- It's never okay to lose telemetry on machines that have surplus memory
- It's usually not a good idea to store telemetry on disk, except in a dire emergency
- A delivery guarantee cannot be given to the users of telemetry - telemetry is not a replacement for transaction processing
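The rules above can be read as a tiered decision made for each incoming record. The sketch below is only an illustration under stated assumptions: the names `choose_action`, `ram_budget`, `disk_budget`, and the `emergency` flag are hypothetical and do not come from any Dynamic Telemetry or OpenTelemetry API.

```python
from enum import Enum


class Action(Enum):
    KEEP_IN_RAM = "keep_in_ram"
    SPILL_TO_DISK = "spill_to_disk"
    DROP = "drop"


def choose_action(queued_bytes: int, record_bytes: int,
                  ram_budget: int, disk_budget: int, disk_used: int,
                  emergency: bool) -> Action:
    """Hypothetical policy: RAM first, disk only in an emergency, drop last."""
    # Surplus memory: never lose telemetry while the RAM budget still has room.
    if queued_bytes + record_bytes <= ram_budget:
        return Action.KEEP_IN_RAM
    # Disk is a last resort: used only when an emergency is declared
    # and only within its own (assumed) budget.
    if emergency and disk_used + record_bytes <= disk_budget:
        return Action.SPILL_TO_DISK
    # Push has come to shove: drop the record rather than block the workload.
    return Action.DROP
```

Note that no branch ever waits for the backend; the caller always gets an immediate answer, which is what keeps telemetry from becoming a delivery guarantee.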
Reason:
- Assume a service is operating nominally, servicing business needs
- Assume the telemetry backend becomes unreachable, perhaps because of a network outage
- A good telemetry system should queue telemetry, first in RAM and possibly on disk
- As telemetry accumulates, at some point a decision must be made between:
  - start dropping telemetry
  - stop servicing customer workloads
The right answer is to continue servicing customer workloads and to start dropping telemetry.
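One common way to make "keep serving customers, drop telemetry" concrete is a bounded in-memory buffer whose producer side never waits for the backend. The sketch below assumes a hypothetical `DroppingTelemetryBuffer` class; it is not an existing OpenTelemetry component, and the capacity is an illustrative parameter.

```python
import collections
import threading


class DroppingTelemetryBuffer:
    """Hypothetical bounded queue whose producer side never waits on the backend.

    When the buffer is full (for example, because the backend is unreachable),
    new records are dropped and counted instead of stalling the workload.
    """

    def __init__(self, capacity: int) -> None:
        if capacity <= 0:
            raise ValueError("capacity must be positive")
        self._items = collections.deque()
        self._capacity = capacity
        self._lock = threading.Lock()
        self.dropped = 0  # number of records lost; can itself be reported later

    def offer(self, record) -> bool:
        # Called on the workload's hot path: O(1), and it never waits for the exporter.
        with self._lock:
            if len(self._items) >= self._capacity:
                self.dropped += 1
                return False
            self._items.append(record)
            return True

    def drain(self, max_items: int) -> list:
        # Called by the exporter once the backend is reachable again.
        with self._lock:
            n = min(max_items, len(self._items))
            return [self._items.popleft() for _ in range(n)]


# Simulated outage: nothing drains while five records arrive.
buf = DroppingTelemetryBuffer(capacity=3)
for i in range(5):
    buf.offer(f"span-{i}")
print(buf.dropped)       # 2
print(buf.drain(10))     # ['span-0', 'span-1', 'span-2']
```

This sketch drops the newest record when full; dropping the oldest instead (for example with `collections.deque(maxlen=capacity)`) is an equally valid choice, and which one is preferable depends on whether recent or early telemetry matters more during an outage.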