Diagnosing instability in production-scale agent reinforcement learning | EngBrief