The System Design interview at Facebook (now Meta) is a defining stage of the hiring process for software engineers. It’s where your ability to build scalable, reliable, and user-centric systems is tested. Unlike algorithmic interviews that focus on data structures, Facebook System Design interview questions evaluate how you architect products that serve billions of users in real time.
From News Feed ranking to Messenger delivery, Facebook operates some of the largest distributed systems in the world. Each feature relies on globally distributed databases, caching layers, event-driven pipelines, and ML-driven personalization. Candidates must think clearly under complexity, handling data consistency, latency, and fault tolerance while balancing cost and maintainability.
This guide breaks down what you need to know to ace the round: the core System Design principles, two fully worked Facebook-style questions, and an interview roadmap tailored for Meta-scale systems.
Core concepts to master before the interview
Before you face System Design interview questions, you need to be fluent in large-scale distributed system patterns. Facebook expects breadth across infrastructure and depth in reasoning.
1. Non-functional requirements (NFRs)
Meta systems handle billions of daily active users; NFRs are paramount.
Always discuss metrics explicitly:
- Latency: <200 ms for API requests.
- Availability: 99.99% or higher.
- Consistency: Often eventual, sometimes tunable.
- Scalability: Handle peak traffic surges gracefully.
- Cost: Efficient replication and caching across data centers.
You’re expected to tie these directly to your architectural choices.
2. Scale and back-of-the-envelope sizing
Interviewers want to see quick estimates: proof that you understand scale.
Example:
Facebook News Feed: 2B users × 100 posts read per day = 200B feed reads daily.
That averages roughly 2.3M reads/sec globally, before accounting for peak surges.
Such mental math helps justify partitioning, caching, or asynchronous pipelines.
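Here is a minimal sketch of that arithmetic in Python, using the illustrative figures above (assumed numbers for practice, not real Meta metrics):

```python
# Back-of-the-envelope sizing for a News Feed-style workload.
# All inputs are illustrative assumptions, not real Meta metrics.

SECONDS_PER_DAY = 86_400

users = 2_000_000_000            # ~2B daily active users
reads_per_user_day = 100         # feed items read per user per day

daily_reads = users * reads_per_user_day        # 200B reads/day
avg_read_qps = daily_reads / SECONDS_PER_DAY    # ~2.3M reads/sec
peak_read_qps = avg_read_qps * 3                # assume ~3x peak-to-average

print(f"daily feed reads: {daily_reads:,}")
print(f"average read QPS: {avg_read_qps:,.0f}")
print(f"peak read QPS:    {peak_read_qps:,.0f}")
```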
3. Architecture building blocks
Facebook’s infrastructure relies on proven large-scale components:
- Load Balancers and API Gateways.
- Caching tiers (Memcache, TAO).
- Distributed databases (MySQL + sharded tiers).
- Pub/Sub event buses (Kafka-style pipelines).
- CDNs for static content.
- Service and RPC layers such as GraphQL APIs, Thrift RPCs, and HHVM application servers.
Mentioning these abstractions makes your design sound grounded and Meta-aware.
4. Data modeling and consistency trade-offs
Facebook engineers must choose the right data guarantees per subsystem.
- Strong consistency: user actions (likes, comments, messages).
- Eventual consistency: counters, timelines, notifications.
- Causal consistency: for feeds and messaging, ensuring “happens-before” ordering.
Understanding CAP trade-offs (Consistency, Availability, Partition Tolerance) is key for Facebook System Design interview questions.
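One concrete way to reason about these guarantees is quorum replication: with N replicas, a write quorum W and read quorum R satisfying R + W > N force every read to overlap the latest acknowledged write. A tiny helper makes the rule explicit (illustrative only, not any specific Meta system):

```python
# Quorum rule: with N replicas, reads see the latest acknowledged write
# whenever R + W > N, because read and write quorums must overlap.

def quorums_overlap(n_replicas: int, write_quorum: int, read_quorum: int) -> bool:
    """True if every read quorum intersects every write quorum (strong reads)."""
    return read_quorum + write_quorum > n_replicas

print(quorums_overlap(3, 2, 2))  # True:  strong consistency (e.g., likes, messages)
print(quorums_overlap(3, 1, 1))  # False: eventual consistency (e.g., counters)
```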
5. Caching, replication, and sharding
Performance at Meta’s scale comes from layered optimization:
- Use read-through caches for hot data.
- Shard users or posts by user_id or region_id.
- Replicate writes across multiple data centers asynchronously.
- Apply write fan-out for feeds or read fan-out for notifications, depending on latency needs.
Knowing these trade-offs helps you justify feed or messaging design choices.
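To make the sharding point concrete, here is a minimal consistent-hashing router; the shard names and virtual-node count are illustrative assumptions, not Meta’s actual scheme:

```python
# Route keys (e.g., user_id) to shards with consistent hashing: adding or
# removing a shard remaps only ~1/N of keys, unlike naive hash(key) % N.
# Shard names and virtual-node count are illustrative assumptions.

import bisect
import hashlib

def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, shards, vnodes=100):
        # Place each shard at many points on the ring to smooth the load.
        self._ring = sorted(
            (_hash(f"{shard}#{i}"), shard)
            for shard in shards
            for i in range(vnodes)
        )
        self._points = [h for h, _ in self._ring]

    def shard_for(self, key: str) -> str:
        # First ring point clockwise from the key's hash (wrapping around).
        idx = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing([f"feed-shard-{i}" for i in range(8)])
print(ring.shard_for("user:12345"))  # same user always routes to the same shard
```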
6. Failures, monitoring, and reliability
Facebook’s systems are designed for resilience:
- Redundant data centers with automatic failover.
- Health checks for microservices and queues.
- Canary deployments for new code.
- Metrics for queue lag, replication delay, and cache miss rates.
Interviewers expect you to discuss what happens when things break, and how you recover gracefully.
7. Trade-off thinking and cost awareness
Facebook optimizes not only for performance but for efficiency. Discuss trade-offs like:
“We replicate metadata globally for fast reads but keep writes regional to reduce cross-region latency.”
Demonstrating cost-performance awareness sets you apart.
8. Communication and clarity
Articulate your reasoning cleanly. Interviewers prefer candidates who narrate logically:
- Clarify requirements.
- State assumptions.
- Sketch high-level design.
- Dive deep into key flows (read/write, caching, scaling).
- Discuss trade-offs.
- Summarize.
Strong communication is half the battle in the Facebook System Design interview.
Sample Facebook System Design interview questions and walk-throughs
Let’s now explore two realistic Facebook-style questions with step-by-step breakdowns.
Prompt 1: Design Facebook News Feed
Scenario:
Design the backend for Facebook’s News Feed: users view a personalized stream of posts from friends and pages. It must be fast, personalized, and fault-tolerant.
Clarify & scope:
- Active users: ~2B
- Feed reads per day: ~200B
- Average friends: 300
- Latency target: <200 ms per feed fetch
- Personalization: By user preferences, engagement history, and recency
- Durability: High; posts can’t be lost
- Availability: Must serve cached content during failures
High-level architecture:
- Write Path (fan-out-on-write): Post Service → Fan-out Service → News Feed Store (per-user queue).
- Read Path (fan-out-on-read): when a user requests their feed → fetch posts from friends + Ranker Service → deliver top N posts.
- Ranking Service: ML-based model that scores posts by recency, engagement, and relationships.
- Caching Layer: Memcache or TAO caches feed results, refreshed periodically.
- Event Pipeline: writes updates to Kafka for analytics and recommendation retraining.
Flow:
- User posts → Fan-out distributes post IDs to friends’ feed queues.
- Friends’ feed reads pull post IDs → fetch post content from Post Store.
- Ranking Service orders the feed → results cached and delivered (fan-out sketched below).
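A minimal sketch of the fan-out-on-write step, with in-memory structures standing in for the Fan-out Service and News Feed Store (the names and per-user cap are assumptions):

```python
# Fan-out-on-write sketch: push a new post ID into each friend's feed
# queue. In-memory stores stand in for the real services; high-fanout
# authors (celebrities) would instead go through async batching.

from collections import defaultdict, deque

FEED_LIMIT = 100                          # keep only the newest N post IDs per user

feed_store = defaultdict(deque)           # user_id -> post IDs, newest first
friends_of = {"alice": ["bob", "carol"]}  # toy social graph

def publish_post(author_id: str, post_id: str) -> None:
    """Distribute a post's ID to every friend's feed queue."""
    for friend_id in friends_of.get(author_id, []):
        queue = feed_store[friend_id]
        queue.appendleft(post_id)
        while len(queue) > FEED_LIMIT:    # bound per-user feed size
            queue.pop()

publish_post("alice", "post:42")
print(list(feed_store["bob"]))  # ['post:42']
```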
Data model & consistency:
- Post Table: post_id, author_id, timestamp, content, privacy.
- Feed Table: user_id, post_ids[], last_fetched.
- User Table: user_id, friends[], preferences[].
Consistency:
- Strong for writes (no missing posts).
- Eventual for ranking updates.
Scalability & caching:
- Shard by user_id; high-fanout users (celebrities) handled via async batching.
- Cache the top 100 posts per user.
- Invalidate cache on new posts or major updates (see the read-through sketch below).
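A read-through cache for the per-user feed might look like the sketch below; the TTL, the in-memory dict standing in for Memcache/TAO, and the placeholder ranker are all assumptions:

```python
# Read-through feed cache sketch: serve from cache while fresh, otherwise
# recompute via the ranker and repopulate. TTL and stores are assumptions.

import time

CACHE_TTL_S = 60
_cache: dict[str, tuple[float, list[str]]] = {}  # user_id -> (expiry, post_ids)

def rank_feed(user_id: str) -> list[str]:
    """Placeholder for the expensive ranking pipeline."""
    return [f"post:{i}" for i in range(3)]

def get_feed(user_id: str) -> list[str]:
    entry = _cache.get(user_id)
    if entry and entry[0] > time.time():          # hit: still fresh
        return entry[1]
    posts = rank_feed(user_id)                    # miss: recompute
    _cache[user_id] = (time.time() + CACHE_TTL_S, posts)
    return posts

def invalidate(user_id: str) -> None:
    """Call on new posts or major updates affecting this user's feed."""
    _cache.pop(user_id, None)
```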
Reliability & monitoring:
- Monitor fan-out latency, ranking service throughput, and cache hit ratio.
- Graceful degradation: serve stale feed if ranking fails.
Trade-offs:
- Fan-out-on-write vs read: Write-heavy but low-latency reads; scales well for average users.
- Ranking latency vs freshness: Accept slightly stale feeds for performance.
- Consistency vs scalability: Eventual consistency ensures system responsiveness at a global scale.
Summary:
The design balances freshness, scalability, and personalization: exactly the dimensions Facebook engineers optimize daily.
Prompt 2: Design Facebook Messenger
Scenario:
Design a real-time chat system that supports one-to-one and group messaging with read receipts and message synchronization across devices.
Clarify & scope:
- Users: >1B active
- Messages/day: 100B+
- Latency target: <100 ms
- Availability: 99.99%
- Requirements: Offline sync, message delivery guarantee, ordering
High-level architecture:
- Client → Edge Gateway → Chat Service (handles message routing).
- Message Queue (Kafka or custom) ensures reliable delivery.
- Storage Layer: Hot Storage (recent chats) in Redis; Cold Storage (older messages) in a distributed DB.
- Presence Service: Tracks online status.
- Push Notification Service: Notifies offline users of new messages.
Flow:
- User A sends message → Chat Service assigns message_id → pushes to Queue.
- Queue ensures delivery to recipient’s Chat Service instance.
- Message persisted to storage → ack sent to sender.
- Read receipt updates flow via the event bus to the sender’s device (idempotent send path sketched below).
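A sketch of the send path with idempotent delivery: the server dedupes on message_id, so client retries never double-persist. The stores, IDs, and ack shape are illustrative assumptions:

```python
# Idempotent send-path sketch: retries with the same message_id are acked
# without writing twice. In-memory structures stand in for the real stores.

import uuid

delivered_ids: set[str] = set()   # message IDs already persisted
message_log: list[dict] = []      # stand-in for the message store

def send_message(conv_id: str, sender: str, body: str,
                 message_id: str | None = None) -> str:
    """Persist a message exactly once; safe to retry with the same ID."""
    message_id = message_id or str(uuid.uuid4())
    if message_id in delivered_ids:   # duplicate retry: ack, don't re-write
        return message_id
    message_log.append({"conv": conv_id, "from": sender,
                        "id": message_id, "body": body})
    delivered_ids.add(message_id)
    # ...route to the recipient's Chat Service instance, then ack sender...
    return message_id

msg_id = send_message("conv:1", "alice", "hi")
send_message("conv:1", "alice", "hi", message_id=msg_id)  # retried send
print(len(message_log))  # 1: no duplicate despite the retry
```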
Data model & consistency:
- Messages Table: message_id, sender_id, receiver_id, content, status, timestamp.
- Conversation Table: conv_id, members[], last_updated.
Consistency:
- Strong within conversation partitions for ordering.
- Eventual for cross-device sync.
Scalability & caching:
- Partition by conversation_id.
- Cache recent conversations and message IDs.
- Use region-based replication to minimize latency.
Reliability & monitoring:
- Store-and-forward queues for offline users.
- Duplicate detection for idempotency.
- Monitor latency, undelivered message queue depth, and read receipt delay.
Trade-offs:
- Strong ordering vs throughput: Achieved via partition-per-conversation.
- Durability vs cost: Write messages to cold storage asynchronously for cost efficiency.
- Push vs pull: Push for active sessions, pull for idle clients.
Summary:
This design ensures sub-100 ms latency with durable delivery, using regionally replicated queues and partitioned chat services, a cornerstone of Meta’s global messaging architecture.
Other Facebook System Design interview questions to practice
Below are 12 practice scenarios that mirror the difficulty and scale of real Facebook System Design interview questions. Each follows the structured format used in Meta interviews.
1. Design Facebook Stories
Goal: Support ephemeral posts that disappear after 24 hours.
Clarify: Billions of views/day; multimedia content; low latency.
Design:
- Upload Service → CDN + Object Storage → Metadata DB with expiry.
- Story Viewer Service prefetches story batches per user.
Data model: story_id, user_id, expiry_ts, media_url, viewed_by[].
Consistency/Scale/Failures: TTL-based cleanup; replicated metadata; serve cached content if DB slow.
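A minimal sketch of the TTL-based expiry: readers filter on expiry_ts so expired stories vanish immediately, while a background job deletes rows later. The in-memory list stands in for the metadata DB:

```python
# Story expiry sketch: reads filter on expiry_ts; a periodic cleanup job
# physically deletes expired metadata (media GC would run separately).

import time

STORY_TTL_S = 24 * 3600
stories: list[dict] = []   # stand-in for the metadata DB

def post_story(story_id: str, user_id: str, media_url: str) -> None:
    stories.append({"story_id": story_id, "user_id": user_id,
                    "media_url": media_url,
                    "expiry_ts": time.time() + STORY_TTL_S})

def visible_stories() -> list[dict]:
    """Readers never see expired stories, even before cleanup runs."""
    now = time.time()
    return [s for s in stories if s["expiry_ts"] > now]

def cleanup() -> None:
    """Periodic job: drop expired rows from the metadata store."""
    now = time.time()
    stories[:] = [s for s in stories if s["expiry_ts"] > now]
```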
2. Design Facebook Live (live video streaming)
Goal: Allow users to broadcast and view live streams with real-time comments.
Clarify: Latency target <5s; millions of concurrent viewers.
Design:
- Ingest Nodes → Transcoder → CDN edge nodes.
- Comment Service through pub/sub; chat overlay on stream.
Data model: stream_id, chunk_url, viewer_count, comment_id[].
Consistency/Scale/Failures: Eventual consistency for viewer metrics; multi-CDN redundancy; failover to cached chunk.
3. Design Facebook Ads Delivery System
Goal: Deliver targeted ads efficiently based on user features.
Clarify: Billions of impressions/day; strict latency <100 ms.
Design:
- Request → Ads Retrieval → Filtering → Scoring → Auction → Delivery.
- Feature Store caches user data; auction performed in-memory.
Data model: ad_id, targeting_filters, bid, budget.
Consistency/Scale/Failures: Eventual for analytics; strong for budget updates; sharded campaign DB.
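The retrieval → filtering → scoring → auction chain can be sketched in a few lines; the expected-value scoring (bid × CTR) and the second-price payment are illustrative assumptions, not Meta’s actual auction mechanics:

```python
# Ads pipeline sketch: filter eligible ads, score by expected value, then
# charge the winner a second-price amount. All numbers are made up.

ads = [
    {"ad_id": "a1", "bid": 2.0, "budget": 100.0, "geo": "US", "ctr": 0.03},
    {"ad_id": "a2", "bid": 3.5, "budget": 10.0,  "geo": "US", "ctr": 0.01},
    {"ad_id": "a3", "bid": 1.5, "budget": 50.0,  "geo": "UK", "ctr": 0.05},
]

def eligible(ad, user):
    return ad["budget"] > 0 and ad["geo"] == user["geo"]

def score(ad):
    return ad["bid"] * ad["ctr"]        # expected value per impression

def run_auction(user):
    ranked = sorted((a for a in ads if eligible(a, user)), key=score, reverse=True)
    if not ranked:
        return None
    winner = ranked[0]
    # Second price: pay just enough to outrank the runner-up's score.
    price = score(ranked[1]) / winner["ctr"] if len(ranked) > 1 else winner["bid"]
    return winner["ad_id"], round(price, 2)

# a1 wins on expected value despite a2's higher bid, and pays the second price.
print(run_auction({"geo": "US"}))  # ('a1', 1.17)
```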
4. Design the Graph Search System
Goal: Enable natural-language search across people, pages, and posts.
Clarify: Must support text + entity relationships.
Design:
- Inverted Index + GraphDB hybrid; query rewritten via NLP engine.
Data model: entity_id, edges[], tokens[].
Consistency/Scale/Failures: Asynchronous index updates; fallback to cached search results.
5. Design the Notification System
Goal: Notify users when relevant events occur (likes, comments, tags).
Clarify: Billions of notifications/day.
Design:
- Event Producers → Notification Service → Delivery Queue → Push Service.
- User preferences stored in KV store.
Data model: notification_id, user_id, type, status, timestamp.
Consistency/Scale/Failures: Eventual for delivery ordering; retry on push failure; use DLQ for bad payloads.
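A sketch of delivery with bounded retries and a dead-letter queue (DLQ): transient push failures are retried, and repeatedly failing payloads are parked for inspection. The queue shapes and retry limit are assumptions:

```python
# Retry-with-DLQ sketch: failed pushes retry up to MAX_ATTEMPTS, then move
# to a dead-letter queue instead of blocking healthy deliveries.

from collections import deque

MAX_ATTEMPTS = 3
delivery_queue: deque = deque()
dead_letter_queue: list = []

def push_to_device(note: dict) -> bool:
    """Stand-in for the push provider; fails on malformed payloads."""
    return note.get("payload") is not None

def drain() -> None:
    while delivery_queue:
        note = delivery_queue.popleft()
        if push_to_device(note):
            continue                              # delivered
        note["attempts"] = note.get("attempts", 0) + 1
        if note["attempts"] >= MAX_ATTEMPTS:
            dead_letter_queue.append(note)        # park bad payload in DLQ
        else:
            delivery_queue.append(note)           # retry later

delivery_queue.extend([
    {"notification_id": "n1", "payload": {"type": "like"}},
    {"notification_id": "n2", "payload": None},   # malformed: ends up in DLQ
])
drain()
print(len(dead_letter_queue))  # 1
```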
6. Design Facebook Marketplace
Goal: Enable local buying/selling with search, messaging, and payments.
Clarify: Regional sharding; must handle listings, images, chats.
Design:
- Listing Service + Search Index + Chat Integration.
- Payment handled via secure microservice.
Data model: listing_id, seller_id, price, status.
Consistency/Scale/Failures: Strong for payments; eventual for listings; replicate search indices regionally.
7. Design Facebook Events
Goal: Create and manage events with invites, RSVPs, and reminders.
Clarify: Millions of active events; must handle push reminders.
Design:
- Event Service manages creation & membership; Notification pipeline for reminders.
Data model: event_id, host_id, attendees[], start_time.
Consistency/Scale/Failures: Eventual for invite propagation; idempotent RSVP updates; monitor push latency.
8. Design the “People You May Know” Recommendation Engine
Goal: Suggest relevant connections.
Clarify: Compute heavy; uses graph data + ML signals.
Design:
- Offline batch job builds graph embeddings; online service ranks candidates.
Data model: user_id, embedding_vector, mutual_friends[].
Consistency/Scale/Failures: Eventual consistency; retrain nightly; A/B test ranking models.
9. Design Facebook Watch (video platform)
Goal: Serve personalized video feed with watch history and recommendations.
Clarify: Heavy read throughput; prefetch next video.
Design:
- Video Catalog Service + Recommendation Engine + CDN distribution.
Data model: video_id, category, views, engagement_score.
Consistency/Scale/Failures: Eventual for analytics; strong for metadata; fallback playlist if recommender fails.
10. Design the Like and Reaction Counter
Goal: Handle billions of like updates per day efficiently.
Clarify: High write rate; accurate eventual count.
Design:
- Write buffer with sharded counters; periodic batch aggregation.
Data model: post_id, reaction_type, count.
Consistency/Scale/Failures: Eventual consistency acceptable; replay logs for correction; cache hot counters.
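A sharded-counter sketch: each write lands on a random shard so no single row becomes hot, and reads (or a periodic aggregation job) sum the shards. The shard count is an assumption:

```python
# Sharded counter sketch: spread writes for one post across NUM_SHARDS
# partial counters, then aggregate on read or in a periodic batch job.

import random
from collections import defaultdict

NUM_SHARDS = 16
shards = defaultdict(int)    # (post_id, shard_index) -> partial count

def add_reaction(post_id: str) -> None:
    """O(1) write to one shard; avoids contention on a single hot row."""
    shards[(post_id, random.randrange(NUM_SHARDS))] += 1

def read_count(post_id: str) -> int:
    """Aggregate across shards; in production this is batched and cached."""
    return sum(c for (pid, _), c in shards.items() if pid == post_id)

for _ in range(1000):
    add_reaction("post:99")
print(read_count("post:99"))  # 1000, eventually consistent under batching
```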
11. Design a Global Logging & Metrics System
Goal: Centralize logs, metrics, and alerts from all microservices.
Clarify: PB/day scale; near real-time dashboards.
Design:
- Agents → Log Stream → Storage tiers (hot/warm/cold) + Query Engine.
Data model: timestamp, service_id, level, message.
Consistency/Scale/Failures: Eventual for analytics; redundant ingestion paths; tiered retention.
12. Design Facebook Authentication and Session Service
Goal: Secure login, session management, and token refresh.
Clarify: Billions of sessions; cross-platform support.
Design:
- OAuth 2.0-based Auth Service → Token Store (Redis) → Validation middleware.
Data model: session_id, user_id, expiry_ts, device_id.
Consistency/Scale/Failures: Strong consistency for session revocation; TTL-based cleanup; replicate across regions.
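A sketch of the session store with TTL expiry and explicit revocation, mirroring the data model above; an in-memory dict stands in for Redis, and the TTL value is an assumption:

```python
# Session store sketch: TTL-bounded sessions with explicit revocation.
# A dict stands in for Redis; in production TTLs are enforced server-side.

import time
import uuid

SESSION_TTL_S = 3600
sessions: dict[str, dict] = {}    # session_id -> session record

def create_session(user_id: str, device_id: str) -> str:
    session_id = str(uuid.uuid4())
    sessions[session_id] = {"user_id": user_id, "device_id": device_id,
                            "expiry_ts": time.time() + SESSION_TTL_S}
    return session_id

def validate(session_id: str) -> bool:
    """Middleware check: session must exist and be unexpired."""
    s = sessions.get(session_id)
    return bool(s) and s["expiry_ts"] > time.time()

def revoke(session_id: str) -> None:
    """Revocation is strongly consistent: remove before acking logout."""
    sessions.pop(session_id, None)
```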
Common mistakes in answering Facebook System Design interview questions
Avoid these pitfalls frequently seen during interviews:
- Skipping scale justification: Facebook wants quantitative reasoning; always estimate QPS and data volume.
- Neglecting caching layers: Forgetting Memcache/TAO usage signals weak understanding of Meta infrastructure.
- Not discussing failure handling: Interviewers expect clear mitigation steps (failover, replication, retries).
- Over-focusing on one component: Missing the full data flow across services.
- Ignoring ranking or personalization: Most Facebook products involve ML ranking; mention it explicitly when relevant.
- Weak communication: Failing to narrate trade-offs or diagram structure leads to confusion.
- Over-engineering: Proposing exotic tech when simpler solutions suffice is penalized.
- No metrics: Every design should reference latency, throughput, or consistency targets.
How to prepare effectively for Facebook System Design interview questions
A structured plan keeps your learning targeted and practical.
Step 1: Review the fundamentals of distributed systems
Study replication, partitioning, caching, and the CAP theorem. Facebook problems assume you can reason about these in seconds.
Step 2: Practice Meta-scale examples
Recreate systems like News Feed, Messenger, or Stories. Stress-test your designs using realistic numbers (billions of users, sub-second latency).
Step 3: Learn the layered architecture
Understand Facebook’s design philosophy: stateless front-ends, horizontally scalable services, and heavy caching. Practice explaining how requests traverse layers.
Step 4: Perform mock interviews
Pair with peers or use interview platforms. Focus on narrating your reasoning, drawing diagrams, and responding to pushback.
Step 5: Incorporate metrics and trade-offs
Every answer should include latency targets, QPS estimates, and explicit CAP trade-offs. Practice phrasing like:
“We accept eventual consistency here because read freshness isn’t critical.”
Step 6: Align with Meta’s culture
Facebook values impact and pragmatism. Show how your design is efficient, maintainable, and user-focused. Reference reliability, simplicity, and performance.
Step 7: Prepare your day-of execution
- Listen carefully and clarify before drawing.
- Sketch data flow clearly.
- Dive into one or two subsystems deeply.
- End by summarizing trade-offs and next steps.
Quick checklists you can verbalize during an interview
System Design checklist
- Did I clarify functional and non-functional requirements?
- Did I size the system (QPS, storage, users)?
- Did I outline all main components?
- Did I justify data store choices?
- Did I specify caching and sharding?
- Did I discuss failure modes and recovery?
- Did I mention monitoring and metrics?
- Did I highlight trade-offs and costs?
- Did I communicate clearly and summarize?
Trade-off keywords you can use
- “We chose eventual consistency to improve global availability.”
- “Fan-out-on-write reduces read latency for most users.”
- “Caching top N posts per user cuts DB load by 80%.”
- “Region-based sharding minimizes cross-data-center latency.”
- “Async replication trades write delay for higher durability.”
More resources
To master architecture patterns and practice structured walkthroughs for real interview systems, use:
Grokking the System Design Interview
One of the best-known System Design courses, it aligns closely with Facebook System Design interview questions and teaches design patterns for feeds, messaging, and notification systems at scale.
Final thoughts
Success in answering Facebook System Design interview questions depends on clarity, scalability thinking, and practical trade-offs. Facebook’s systems run at a global scale, so interviewers look for engineers who can reason about billions of users, petabytes of data, and millisecond latency.
To excel:
- Structure every answer (clarify → design → trade-offs → summarize).
- Be explicit about metrics, consistency, and scaling strategy.
- Show cost awareness and operational reliability.
With structured practice and thoughtful communication, you’ll be ready to design systems that could run at the scale of Facebook itself.