EngBrief

Engineering insights from the world's best tech companies, curated and summarized.

© 2026 EngBrief

Updated every 4 hours

Latest Articles

Slack Engineering · New · 1 min · 2h ago

Managing context in long-run agentic applications

To keep long-running agentic applications aligned and coherent, Slack Engineering introduced the Director's Journal, a context channel that accumulates and structures the Director's working memory. The Journal captures decisions, observations, findings, questions, actions, and hypotheses, which lets the Director steer the investigation toward a conclusion. This structured memory gives the other agents a common narrative that keeps them on track and their decisions coherent across rounds. The framework also uses the Critic's Review, a context channel that consolidates findings into an annotated report with credibility scores, and the Critic's Timeline, a context channel that offers a chronological view of the consolidated findings. Each agent can therefore consume and produce different context sources that balance continuity and creativity.

AI, Distributed Systems, Backend
Agent-based Systems, Reasoning and Alignment, Long-running Applications Design
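
The Journal described in this summary can be pictured as an append-only log of typed entries. The following is a minimal Python sketch under that assumption, not Slack's implementation: the six entry kinds come from the summary, while `Journal`, `JournalEntry`, `record`, `render`, and the per-round numbering are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import List, Optional, Set


class EntryKind(Enum):
    # The six kinds of entry named in the summary.
    DECISION = "decision"
    OBSERVATION = "observation"
    FINDING = "finding"
    QUESTION = "question"
    ACTION = "action"
    HYPOTHESIS = "hypothesis"


@dataclass
class JournalEntry:
    round_no: int      # hypothetical: which investigation round produced the entry
    kind: EntryKind
    text: str


@dataclass
class Journal:
    """Hypothetical append-only working memory shared with other agents."""
    entries: List[JournalEntry] = field(default_factory=list)

    def record(self, round_no: int, kind: EntryKind, text: str) -> None:
        self.entries.append(JournalEntry(round_no, kind, text))

    def render(self, kinds: Optional[Set[EntryKind]] = None) -> str:
        """Render entries in recorded order, optionally filtered by kind."""
        selected = [e for e in self.entries if kinds is None or e.kind in kinds]
        return "\n".join(
            f"[round {e.round_no}] {e.kind.value}: {e.text}" for e in selected
        )
```

Rendering the whole journal into a prompt would give every agent the same narrative, while filtering by kind would let an agent see only, say, open questions.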
Cloudflare Blog · New · 1 min · 5h ago

Building a CLI for all of Cloudflare

Cloudflare engineers have rebuilt the Wrangler CLI from scratch to make it the unified CLI for all Cloudflare products, with commands covering more than 100 products and 3,000 API operations. To enable this, they created a new TypeScript-based schema that defines APIs, CLI commands, and arguments, so that CLI commands and other interfaces can be generated automatically. The new CLI also includes Local Explorer, a beta feature for simulating and introspecting local resources, which makes local development more interactive and more consistent with remote API usage.

Cloud, AI
CLI, Cloudflare, API Design, Software Development
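
The idea of generating CLI commands from a schema can be illustrated with a small sketch. The summary says Cloudflare's schema is TypeScript-based; the Python version below is only an analogy built on the standard `argparse` module, and the `SCHEMA` layout, the `kv` and `dns` products, and the `example-cli` program name are all hypothetical.

```python
import argparse

# Hypothetical schema: product -> operation -> list of argument definitions.
SCHEMA = {
    "kv": {
        "get": [{"name": "key", "type": str, "help": "Key to read"}],
        "put": [
            {"name": "key", "type": str, "help": "Key to write"},
            {"name": "value", "type": str, "help": "Value to store"},
        ],
    },
    "dns": {
        "list": [{"name": "--zone", "type": str, "help": "Zone name"}],
    },
}


def build_parser(schema: dict) -> argparse.ArgumentParser:
    """Walk the schema and register one subcommand per product and operation."""
    parser = argparse.ArgumentParser(prog="example-cli")
    products = parser.add_subparsers(dest="product", required=True)
    for product, operations in schema.items():
        product_parser = products.add_parser(product)
        ops = product_parser.add_subparsers(dest="operation", required=True)
        for operation, args in operations.items():
            op_parser = ops.add_parser(operation)
            for arg in args:
                op_parser.add_argument(
                    arg["name"], type=arg["type"], help=arg["help"]
                )
    return parser
```

Because the parser is derived entirely from data, adding a product or operation means editing the schema rather than writing new command code.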
Cloudflare Blog · New · 1 min · 7h ago

Durable Objects in Dynamic Workers: Give each AI-generated app its own database

Cloudflare engineers have enabled Durable Objects in Dynamic Workers, allowing AI-generated apps to have their own databases. This enables the creation of secure, sandboxed, and isolated applications with long-lived state and custom UIs. To achieve this, the Durable Object Facets feature loads and instantiates a Durable Object class dynamically, while providing it with a SQLite database for storage.

Databases, AI
Durable Objects, Dynamic Workers, SQLite, Database Design, Persistent Storage
Cloudflare Blog · New · 1 min · 7h ago

Agents have their own computers with Sandboxes GA

Cloudflare has made its Sandboxes and Cloudflare Containers generally available, providing a solution for AI agents to develop and run code safely. Sandboxes offer persistent, isolated environments with features like secure credential injection, PTY support, and persistent code interpreters, simplifying agent lifecycle management and development workflows. With these enhancements, organizations can deploy agent workloads at scale while ensuring security, reliability, and performance.

Cloud, AI
Cloudflare Sandboxes, gRPC, API Design, Data Pipeline, Microservices
Cloudflare Blog · New · 1 min · 7h ago

Dynamic, identity-aware, and secure Sandbox auth

Cloudflare has added outbound Workers to its Sandboxes, enabling programmatic egress proxies that allow users to easily connect to services, add observability, and implement safe authentication. This feature provides a flexible and secure authentication mechanism that meets key requirements such as zero trust, simplicity, flexibility, identity awareness, observability, and performance. By using outbound Workers, Cloudflare customers can write custom code to handle requests and apply specific rules for each sandbox instance.

Security, AI
Cloudflare Workers, Security, Identity and Access Management, Zero-trust, API Gateways
Cloudflare Blog · 1 min · 1d ago

Welcome to Agents Week

Cloudflare introduces Agents Week, marking a shift towards building infrastructure for the emerging age of AI. Agents are one-to-one instances that serve a single user and task, unlike traditional applications that follow a one-to-many model. This paradigm shift requires new infrastructure, specifically isolates, which are orders of magnitude more efficient than containers in terms of compute and memory. Cloudflare's infrastructure, particularly their Dynamic Workers open beta, is designed to support the large-scale adoption of agents. Isolates start in milliseconds and are securely sandboxed, enabling the running of millions of agent instances per second. This scalable infrastructure makes agents affordable for non-technical users, opening up possibilities for widespread adoption.

Cloud, AI
Cloudflare Blog · 1 min · 3d ago

500 Tbps of capacity: 16 years of scaling our global network

Cloudflare's global network recently crossed 500 Tbps of external capacity, with peak utilization only a fraction of that figure. Operating at this scale requires moving intelligence to every server in the network so that the network can defend itself against attacks. As the network expanded, it adapted to customers' security needs, with systems such as BGP-based secure tunnels and load balancers such as Unimog protecting customers from a wide range of threats.

Cloud
Cloudflare, Network Capacity, DDoS Protection, Network Scaling
Netflix TechBlog · 10 min · 3d ago

Evaluating Netflix Show Synopses with LLM-as-a-Judge

By Gabriela Alessio, Cameron Taylor, and Cameron R. Wolfe. Introduction: When members log into Netflix, one of the hardest choices is what to watch. The challenge...

AI, Distributed Systems, Performance
LLM, API, Java, Natural Language Processing, Content Recommendation, Machine Learning, AI in Content Creation
Engineering at Meta · 1 min · 4d ago

Escaping the Fork: How Meta Modernized WebRTC Across 50+ Use Cases

Meta's Real-Time Communication Team solved the "forking trap" in WebRTC by migrating 50+ use cases to a modular architecture built on top of the latest upstream version. They achieved this through a dual-stack architecture, using a shim layer to dispatch calls between the application layer and WebRTC. This resulted in a significant reduction in binary size and improved performance, while enabling safe A/B testing of new upstream releases before rolling them out.

Distributed Systems, Performance
WebRTC, Microservices, API Design, Monorepo Management
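
The shim-layer idea in this summary can be sketched as a thin dispatcher that sits between the application and two library versions. This is a minimal Python illustration under stated assumptions, not Meta's code: `RtcStack`, `LegacyForkStack`, `UpstreamStack`, `Shim`, and the per-use-case `use_upstream` flag are hypothetical stand-ins.

```python
from typing import Callable, Protocol


class RtcStack(Protocol):
    """Interface the application codes against (hypothetical)."""
    def create_connection(self, peer: str) -> str: ...


class LegacyForkStack:
    """Stand-in for the old forked library."""
    def create_connection(self, peer: str) -> str:
        return f"legacy:{peer}"


class UpstreamStack:
    """Stand-in for the latest upstream library."""
    def create_connection(self, peer: str) -> str:
        return f"upstream:{peer}"


class Shim:
    """Routes each call to one of two stacks that live side by side."""

    def __init__(
        self,
        legacy: RtcStack,
        upstream: RtcStack,
        use_upstream: Callable[[str], bool],
    ) -> None:
        self._legacy = legacy
        self._upstream = upstream
        # Assumption: a flag lookup keyed by use case decides the stack,
        # which is what would allow an A/B test per use case.
        self._use_upstream = use_upstream

    def create_connection(self, use_case: str, peer: str) -> str:
        stack = self._upstream if self._use_upstream(use_case) else self._legacy
        return stack.create_connection(peer)
```

Since the application only calls the shim, each use case can be moved to the upstream stack, or moved back, by changing the flag rather than the calling code.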
AWS Architecture · 1 min · 5d ago

Build a multi-tenant configuration system with tagged storage patterns

To build a scalable multi-tenant configuration system, AWS Architecture proposes a tagged storage pattern that routes each configuration request, based on its key prefix, to the most suitable AWS storage service, such as Amazon DynamoDB or AWS Systems Manager Parameter Store. The architecture has four layers: a storage layer with a multi-backend strategy, a service layer built on gRPC and the strategy pattern, an authentication layer using Amazon Cognito, and an event-driven refresh layer for zero-downtime config updates. Together these provide strict tenant isolation and real-time config updates without creating a performance bottleneck.

Performance, Cloud
AWS, API Gateway, Lambda, DynamoDB, tagged storage pattern, Multi-Tenancy, Configuration Management, Scalability, Cloud Architecture
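
Prefix-based routing of this kind can be sketched with a small strategy-pattern router. The Python below is an illustration only: it uses in-memory stand-ins instead of real DynamoDB or Parameter Store clients, and the class names, the `tenant/` and `param/` prefixes, and the longest-prefix-wins rule are assumptions rather than details from the article.

```python
from typing import Dict, Optional, Tuple


class InMemoryBackend:
    """Stand-in for a storage service; keys are scoped by tenant."""

    def __init__(self, name: str) -> None:
        self.name = name
        self._data: Dict[Tuple[str, str], str] = {}

    def get(self, tenant_id: str, key: str) -> Optional[str]:
        return self._data.get((tenant_id, key))

    def put(self, tenant_id: str, key: str, value: str) -> None:
        self._data[(tenant_id, key)] = value


class ConfigRouter:
    """Picks a backend from the key prefix (hypothetical routing rule)."""

    def __init__(self, routes: Dict[str, InMemoryBackend], default: InMemoryBackend) -> None:
        # Check longer prefixes first so the most specific route wins.
        self._routes = sorted(routes.items(), key=lambda kv: len(kv[0]), reverse=True)
        self._default = default

    def backend_for(self, key: str) -> InMemoryBackend:
        for prefix, backend in self._routes:
            if key.startswith(prefix):
                return backend
        return self._default

    def get(self, tenant_id: str, key: str) -> Optional[str]:
        return self.backend_for(key).get(tenant_id, key)

    def put(self, tenant_id: str, key: str, value: str) -> None:
        self.backend_for(key).put(tenant_id, key, value)
```

Scoping every read and write by `tenant_id` is one simple way to keep tenants from seeing each other's values; a real system would enforce that in the authentication layer as well.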
Engineering at Meta · 1 min · 5d ago

Trust But Canary: Configuration Safety at Scale

Meta engineers employ a "Trust But Canary" strategy for configuration rollouts at scale, using canary and progressive rollouts to deploy safely and detect regressions early. Health checks and monitoring signals help Meta's Configurations team catch problems before they spread, while data and AI/machine learning tools significantly reduce alert noise and accelerate error diagnosis. Incident reviews prioritize system improvements over blame, fostering a culture of safety and reliability.

Performance, AI, Distributed Systems
Meta Tech Podcast, CI/CD, Observability
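
A canary followed by a progressive rollout reduces to a loop that widens exposure only while health checks pass. The Python sketch below is a generic illustration, not Meta's tooling: the `progressive_rollout` function, its callback parameters, and the idea of expressing stages as fractions of the fleet are hypothetical.

```python
from typing import Callable, Sequence


def progressive_rollout(
    stages: Sequence[float],
    apply: Callable[[float], None],
    healthy: Callable[[float], bool],
    rollback: Callable[[], None],
) -> bool:
    """Apply a config change stage by stage, stopping at the first bad signal.

    stages   -- increasing fractions of the fleet, e.g. a small canary first
    apply    -- pushes the change to the given fraction
    healthy  -- returns True if monitoring looks fine at that fraction
    rollback -- reverts the change everywhere
    """
    for fraction in stages:
        apply(fraction)
        if not healthy(fraction):
            # A regression was caught before the change spread further.
            rollback()
            return False
    return True
```

The first stage acts as the canary: if `healthy` fails there, only that small fraction ever saw the change.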
Pinterest Engineering · 4 min · 5d ago

Performance for Everyone

Author: Lin Wang (Android Performance Engineer). Default Feature: For mobile apps, performance is considered the “default feature”, which means apps are...

Performance
Performance Engineering, Mobile App Development, Android, Responsive Design
Cloudflare Blog · 1 min · 5d ago

From bytecode to bytes: automated magic packet generation

Cloudflare automated the generation of "magic packets" for Linux malware embedded in BPF socket programs, using symbolic execution and the Z3 theorem prover. The approach works backward from a malicious filter to generate the trigger packet automatically, reducing manual analysis time from hours to seconds. With Z3 and scapy, researchers can automate the creation of valid magic packets, streamlining security investigations and analysis workflows.

BPF, Z3 theorem prover, Automated testing, Symbolic execution, Binary analysis
Cloudflare Blog · 1 min · 5d ago

Cloudflare targets 2029 for full post-quantum security

Cloudflare has accelerated its post-quantum (PQ) security roadmap and now targets full PQ security, including authentication, by 2029. The change follows recent advances in quantum computing, including Google's improved algorithm for breaking elliptic curve cryptography and Oratomic's low qubit estimate for breaking RSA-2048 and P-256. The faster pace of progress in quantum hardware, error correction, and quantum software has pulled the predicted timeline for PQ migration forward from 2035+.

Cloud, Security
Quantum Hardware, Post-Quantum Security, Security, Encryption
Airbnb Engineering · 10 min · 6d ago

Building a high-volume metrics pipeline with OpenTelemetry and vmagent

A production-tested approach for moving a large-scale metrics pipeline from StatsD to OpenTelemetry and Prometheus. By: Eugene Ma, Natasha Aleksandrova. When...

OpenTelemetry, vmagent, Prometheus, StatsD, Metrics Pipeline, Monitoring System, Migration, Large-Scale, High-Volume
Pinterest Engineering · 10 min · 6d ago

Evolution of Multi-Objective Optimization at Pinterest Home feed

Homefeed: Jiacong He, Dafang He, Jie Cheng (former), Andreanne Lemay, Mostafa Keikha, Rahul Goutam, Dhruvil Deven Badani, Dylan Wang. Content Quality: Jianing...

Performance, Backend
Jenkins, Apache Airflow, PostgreSQL, Spark, TensorFlow, Data Pipeline, Machine Learning, Optimization, CI/CD, Data Engineering
Stripe Engineering · 1 min · 6d ago

How agents, digital wallets, and trust are rewriting checkout

Shoppers are increasingly making big-ticket purchases on mobile, prompting businesses to rethink checkout designs and adapt to regional and generational differences in digital wallet preferences. Supporting the right digital wallet can cut average mobile checkout time in half and substantially improve conversion rates: for example, offering BLIK in Poland increases checkout conversion by 46% and offering Pix in Brazil increases conversion by 31%. AI-assisted shopping and agents are further changing the path to checkout, requiring businesses to verify identity, intent, and authorization in real time to improve payment performance and balance the trade-off between reducing fraud and lowering conversion rates.

Payments, AI
Stripe, E-commerce, Checkout Process, Digital Payments, Trust and Security, User Experience
Netflix TechBlog · 11 min · 6d ago

Stop Answering the Same Question Twice: Interval-Aware Caching for Druid at Netflix Scale

By Ben Sykes. In a previous post, we described how Netflix uses Apache Druid to ingest millions of events per second and query trillions of rows, providing the...

Performance, Distributed Systems
Apache Druid, Netflix, CI/CD, Real-time Insights, Data Ingestion, Query Performance, Scale, Cache Optimization
AWS Architecture · 1 min · 6d ago

Unlock efficient model deployment: Simplified Inference Operator setup on Amazon SageMaker HyperPod

Amazon SageMaker has introduced the HyperPod Inference Operator, a Kubernetes controller that simplifies model deployment on HyperPod clusters through flexible deployment interfaces and advanced autoscaling. The Inference Operator is now a native EKS add-on, enabling one-click installation and managed upgrades directly from the SageMaker console. With this integration, customers can create and configure required IAM roles and dependencies with a single click, streamlining the inference workflow.

Cloud, AI
Amazon SageMaker, HyperPod, AWS CLI, Terraform, Model Deployment, Cloud Deployment
Cloudflare Blog · 1 min · 6d ago

How we built Organizations to help enterprises manage Cloudflare at scale

Cloudflare has launched Organizations, a feature designed to help enterprises manage multiple Cloudflare accounts at scale. Organizations provides a unified management layer, allowing administrators to control user access, configurations, and analytics across multiple accounts. This feature addresses the complexity of managing large-scale Cloudflare deployments, while maintaining the principle of least privilege. The Organizations feature is built on top of Cloudflare's existing Tenant system and includes features such as an account list, org super administrator roles, HTTP traffic analytics, and shared configurations. These features enable enterprise customers to centrally manage policies, view analytics, and ensure that administrators have appropriate permissions. Organizations is currently in public beta and will be expanding to all customers in the coming months, with a focus on delivering additional features such as audit logs, billing reports, and self-serve account creation.

Cloud, Security
Cloudflare, beta, Cloud Management, Enterprise Software, Authorization Systems, Scalability, Management Layer