Design a URL shortener like bit.ly

Start by clarifying requirements, such as expected traffic, read/write ratio, custom aliases, analytics, and expiration. Define success metrics, for example requests per second and acceptable latency, and decide whether you need strong uniqueness guarantees or eventual uniqueness for keys. Sketch a high-level design with components: an API layer, a service to generate and resolve short codes, a persistent store for mappings, and a cache for hot keys. For code generation use a base62 encoding of a sequence or a hashed slug with collision handling, and store mappings in a relational database with a write-through cache in Redis to serve hot redirects quickly. Watch for collisions, hot-key amplification, and abuse such as mass creation of links. Consider rate limiting, analytics sampling to reduce write volume, and a background job to expire or archive links to keep storage manageable.
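The base62 approach mentioned above can be sketched as follows. This is a minimal illustration, assuming the sequence value comes from a database auto-increment or a dedicated ID service; the function names and the alphabet order are illustrative choices, not a fixed standard.

```python
import string

# 62 symbols: 0-9, a-z, A-Z (the ordering is an arbitrary choice for this sketch)
ALPHABET = string.digits + string.ascii_lowercase + string.ascii_uppercase


def encode_base62(n: int) -> str:
    """Turn a non-negative sequence id into a short code."""
    if n < 0:
        raise ValueError("sequence id must be non-negative")
    if n == 0:
        return ALPHABET[0]
    chars = []
    while n:
        n, rem = divmod(n, 62)
        chars.append(ALPHABET[rem])
    return "".join(reversed(chars))


def decode_base62(code: str) -> int:
    """Recover the sequence id from a short code."""
    n = 0
    for ch in code:
        n = n * 62 + ALPHABET.index(ch)
    return n
```

A 7-character code covers 62^7 (about 3.5 trillion) ids. Sequential ids are guessable, so a real service might shuffle or salt them before encoding.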

Design a news feed system like Twitter

Begin by asking about scale, whether the feed must be strongly ordered, how many followers a user can have, and latency requirements. Choose between pull-based design where feeds are computed on read, and push-based fan-out on write, based on follower counts and read/write patterns. Use a hybrid approach: for users with few followers, fan-out on write to per-user feed stores; for celebrities with millions of followers, fan-out on read by fetching recent posts from a time-ordered store and merging. Store tweets in a durable store like Cassandra or DynamoDB and keep hot feeds in Redis or a CDN-backed cache for low-latency access. Plan for eventual consistency in fan-out, and handle backpressure from spikes in writes by batching fan-out jobs with queues like Kafka. Measure tail latencies and set eviction policies to prevent cache thrashing when many users request cold feeds simultaneously.
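The read-time half of the hybrid approach can be sketched as a merge of sorted lists. This is a simplified illustration, assuming each source is already sorted newest-first and that posts are represented as hypothetical (timestamp, post_id) tuples.

```python
import heapq
from itertools import islice


def build_feed(precomputed, celebrity_timelines, limit=10):
    """Merge a user's fan-out-on-write feed with celebrity posts pulled at read time.

    precomputed: list of (timestamp, post_id), newest first.
    celebrity_timelines: list of such lists, one per followed celebrity.
    """
    merged = heapq.merge(
        precomputed, *celebrity_timelines, key=lambda post: post[0], reverse=True
    )
    return [post_id for _, post_id in islice(merged, limit)]
```

Because heapq.merge is lazy, only the first `limit` posts are consumed, which keeps the read path cheap even when a celebrity timeline is long.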

Design a rate limiter for an API

Clarify whether limits are per user, per API key, per IP, or per endpoint, and whether limits are enforced per instance or globally across a distributed fleet. Choose a limiting algorithm such as fixed window, sliding window, or token bucket based on fairness and burst tolerance needs. For distributed systems, implement a token bucket stored in a fast key-value store like Redis using atomic operations or Lua scripts to decrement tokens safely. For very high scale, consider a local token bucket per instance with periodic reconciliation to a central store to reduce coordination overhead. Be aware that fixed windows allow bursts of up to twice the limit around window edges, and that clock skew between instances or non-atomic read-modify-write updates can admit more requests than intended. Monitor reject rates and provide clear Retry-After headers to clients so they can back off gracefully.
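A single-process token bucket can be sketched as below. This is an illustration only, assuming one bucket per client key held in memory; the class name and the injectable clock are choices made for this sketch, and a distributed version would move the same arithmetic into an atomic operation in the shared store.

```python
import time


class TokenBucket:
    """Allows bursts up to `capacity` and a sustained rate of `refill_rate` tokens/sec."""

    def __init__(self, capacity, refill_rate, clock=time.monotonic):
        self.capacity = capacity
        self.refill_rate = refill_rate
        self.clock = clock  # injectable so the behavior can be tested without sleeping
        self.tokens = float(capacity)
        self.last = clock()

    def allow(self, cost=1.0):
        now = self.clock()
        # Refill lazily based on elapsed time, never exceeding capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.refill_rate)
        self.last = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False
```

When allow() returns False, the caller would typically respond with HTTP 429 and a Retry-After value derived from the refill rate.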

Design a scalable chat system with real-time messaging

Ask about requirements such as one-to-one versus group chats, message ordering, delivery guarantees, presence features, and expected message volume. Decide whether you need strict ordering and exactly-once delivery, since those needs affect protocol and storage complexity. Use WebSockets or long polling for real-time delivery, with a pub/sub layer such as Kafka or Redis Streams to route messages to connected clients. Persist messages in an append-only store like Cassandra, and use sequence numbers or vector clocks for ordering and reconciliation when users reconnect. Design for offline delivery by storing unread messages and replaying them when clients reconnect. Handle partition tolerance by preferring eventual consistency for large groups, and implement deduplication to avoid double deliveries during retries.
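Ordering and deduplication on the client side can be sketched with per-conversation sequence numbers. This is a minimal illustration, assuming the server assigns consecutive sequence numbers starting at 1; the class name is hypothetical.

```python
class ConversationInbox:
    """Delivers messages in sequence order and drops duplicates from retries."""

    def __init__(self):
        self.last_delivered = 0
        self.pending = {}  # seq -> body, for messages that arrived early

    def receive(self, seq, body):
        """Return the messages that become deliverable, in order."""
        if seq <= self.last_delivered or seq in self.pending:
            return []  # duplicate caused by a retry or a replay on reconnect
        self.pending[seq] = body
        ready = []
        while self.last_delivered + 1 in self.pending:
            self.last_delivered += 1
            ready.append(self.pending.pop(self.last_delivered))
        return ready
```

On reconnect, a client could send last_delivered to the server and ask for everything after it, which covers offline replay with the same mechanism.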

Design a file storage service like Dropbox

Clarify requirements for file sizes, sync frequency, versioning, metadata search, and consistency across devices. Decide on API constraints, whether you need strong consistency for metadata operations, and how to handle partial uploads and large files. Use object storage for file blobs, such as S3, and a metadata service backed by a transactional database for namespaces, permissions, and versions. Implement chunked uploads with checksums, and use a content-addressable store to avoid duplicate uploads; keep a mapping from file paths to content hashes in the metadata store. Plan for conflict resolution in concurrent edits with clients by using optimistic locking or operational transform for collaborative editing. Ensure you have quotas and garbage collection for orphaned chunks to control storage costs.
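Chunked, content-addressed uploads can be sketched as follows. This is an illustration under simplifying assumptions: plain dicts stand in for the object store and the metadata database, chunks are fixed-size, and the tiny default chunk size exists only to keep the example readable.

```python
import hashlib


def split_into_chunks(data: bytes, chunk_size: int):
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def upload(path, data, chunk_store, metadata, chunk_size=4):
    """Store only chunks not seen before; return how many chunks were uploaded."""
    hashes = []
    uploaded = 0
    for chunk in split_into_chunks(data, chunk_size):
        digest = hashlib.sha256(chunk).hexdigest()  # the checksum doubles as the chunk key
        if digest not in chunk_store:
            chunk_store[digest] = chunk
            uploaded += 1
        hashes.append(digest)
    metadata[path] = hashes  # path -> ordered list of content hashes
    return uploaded


def download(path, chunk_store, metadata):
    return b"".join(chunk_store[h] for h in metadata[path])
```

Garbage collection then amounts to deleting chunks whose hash no longer appears in any metadata entry.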

Design an autocomplete service for search queries

Ask whether suggestions should be personalized, how quickly suggestions must appear, and what data sources will populate suggestions. Choose a data structure and storage approach that supports low-latency prefix queries and frequent updates if needed. Use a trie or a prefix-sorted store for suggestions, and back it with a search index like Elasticsearch, or with an in-memory Redis sorted set for simple cases. For personalization, maintain per-user recent queries or popularity scores and merge global and user-specific suggestions at query time. Optimize for memory by storing shared prefixes efficiently and by limiting suggestion depth. Precompute top suggestions for popular prefixes and use asynchronous updating for less common prefixes to keep query latency low.
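A trie with precomputed top suggestions per prefix can be sketched as below. This is a simplified illustration, assuming each query is added once with a known popularity count; the class names and the tie-breaking rule are choices made for this sketch.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.top = []  # up to k (count, query) pairs, most popular first


class Autocomplete:
    def __init__(self, k=3):
        self.root = TrieNode()
        self.k = k

    def add(self, query, count):
        """Insert a query and update the cached top-k list on every prefix node."""
        node = self.root
        for ch in query:
            node = node.children.setdefault(ch, TrieNode())
            node.top.append((count, query))
            node.top.sort(key=lambda item: (-item[0], item[1]))
            del node.top[self.k:]

    def suggest(self, prefix):
        """Walk to the prefix node and return its precomputed suggestions."""
        node = self.root
        for ch in prefix:
            node = node.children.get(ch)
            if node is None:
                return []
        return [query for _, query in node.top]
```

Lookups cost one step per typed character because the ranking work happens at write time, which matches the precompute-for-reads trade-off described above.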

Design a distributed caching strategy for a large web service

Start by identifying hot data and access patterns, including read-to-write ratios and acceptable staleness for cached items. Choose between client-side caching, in-process caches, and distributed caches, and decide on eviction and invalidation strategies accordingly. Consider consistent hashing for distributed caches to reduce key movement during node changes, and choose TTLs that balance freshness and load reduction. For write-heavy data, use write-through or write-back patterns carefully, and for complex invalidation use explicit cache invalidation on updates with versioning to prevent race conditions. Watch for cache stampede by implementing request coalescing or early recompute strategies. Monitor cache hit rate and tail latency, and design metrics and dashboards to detect hotspots so you can scale caches independently.
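Request coalescing, one of the stampede defenses mentioned above, can be sketched with threads. This is an illustration only, assuming a single process and a hypothetical loader function that fetches from the backing store; TTLs and eviction are left out to keep the idea visible.

```python
import threading


class CoalescingCache:
    """On a miss, one caller loads the value while concurrent callers wait for it."""

    def __init__(self, loader):
        self.loader = loader
        self.values = {}
        self.inflight = {}  # key -> Event set when the in-progress load finishes
        self.lock = threading.Lock()

    def get(self, key):
        with self.lock:
            if key in self.values:
                return self.values[key]
            event = self.inflight.get(key)
            leader = event is None
            if leader:
                event = self.inflight[key] = threading.Event()
        if leader:
            try:
                value = self.loader(key)
                with self.lock:
                    self.values[key] = value
            finally:
                with self.lock:
                    del self.inflight[key]
                event.set()
            return value
        event.wait()
        with self.lock:
            if key in self.values:
                return self.values[key]
        return self.get(key)  # the leader failed, so try again
```

With this in place, a burst of requests for a cold key produces one backend call instead of one per request.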

Design a photo sharing service that supports thumbnails and transformations

Clarify required image sizes, transformation types, on-the-fly versus precomputed thumbnails, and acceptable upload and download latency. Decide whether to store original images and generate derivatives on demand or to precompute common sizes at upload time. Use an object store for originals and a transformation service that reads from storage, applies operations, and writes back derived images. For heavy read workloads, serve transformed images from a CDN and cache popular sizes; for rare sizes generate on demand and cache the result with a versioned key. Protect the transformation pipeline with rate limits and size limits to avoid abuse, and compute checksums to detect corrupted uploads. Consider using background jobs to generate derivatives asynchronously when immediate availability is not required.

Design a logging and monitoring system for microservices

Ask about retention requirements, query patterns, alerting thresholds, and whether logs need to be searchable in real time. Choose components for collection, storage, indexing, and visualization, keeping in mind cost and query performance. Collect structured logs with a sidecar or agent and forward them to a message bus like Kafka for buffering and processing. Index searchable fields in a system like Elasticsearch or ClickHouse for analytics, and expose metrics to a TSDB like Prometheus, which scrapes them from each service; build dashboards and alerts on aggregated metrics and error rates. Design sampling for high-volume logs to control costs while preserving enough data for debugging, and ensure logs include correlation IDs to trace requests across services. Plan for secure storage and access controls because logs can contain sensitive data.

Design a system for consistent hashing and partitioning

Clarify the scale, expected number of nodes, failure modes, and whether you need weighted partitions for heterogeneous nodes. Explain consistent hashing concepts such as virtual nodes to balance load and reduce remapping when nodes join or leave. Implement a ring of hash values and map keys to the first node clockwise on the ring, using virtual nodes per physical node to smooth distribution. When nodes change, only a subset of keys move, and you can combine consistent hashing with replication by choosing multiple successive nodes on the ring for fault tolerance. Be careful with skew caused by poor hash functions or uneven virtual node counts, and monitor key distribution. Use health checks and automatic rebalancing to avoid hotspots and ensure replicas are promoted correctly when failures occur.
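The ring described above can be sketched as follows. This is a minimal illustration, assuming MD5 is used purely as a well-mixed, non-cryptographic hash and that every node gets the same number of virtual nodes; the class and method names are hypothetical.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes, vnodes=100):
        # Each physical node appears at `vnodes` points on the ring.
        self.ring = sorted(
            (_hash(f"{node}#{i}"), node) for node in nodes for i in range(vnodes)
        )
        self.points = [point for point, _ in self.ring]

    def lookup(self, key, replicas=1):
        """Return the first `replicas` distinct nodes clockwise from the key's hash."""
        start = bisect.bisect_right(self.points, _hash(key))
        owners = []
        for step in range(len(self.ring)):
            node = self.ring[(start + step) % len(self.ring)][1]
            if node not in owners:
                owners.append(node)
                if len(owners) == replicas:
                    break
        return owners
```

Adding a node to this ring only moves keys onto the new node; keys never shuffle between the existing nodes, which is the property that limits remapping.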

Tell me about a time you led a system design under a tight deadline

Situation and Task: Our team needed to deliver a prototype of a real-time analytics pipeline within two weeks to support a pilot program, and the deadline was non-negotiable. I was the technical lead responsible for system design and coordinating work across backend and ops engineers. Action: I prioritized core features that proved the business case, sketched a minimal architecture using managed services to reduce build time, and split work into clear tasks for ingestion, storage, and dashboarding. I scheduled daily syncs to remove blockers quickly and delegated integration testing to a pair who owned end-to-end validation. Result: We delivered the prototype on time, processed pilot data at target throughput, and the pilot led to additional funding to build the full system. The approach reduced initial engineering effort by focusing on high-impact components and measurable metrics.

Describe a time you had to choose between consistency and availability

Situation and Task: In a distributed shopping cart service, we faced choices between strong consistency for cart updates and keeping the cart available during network partitions. As the system owner, I had to choose the right trade-off for user experience and business needs. Action: I analyzed failure scenarios and found occasional network partitions between regions. I proposed an approach that favored availability for reads by serving last-known cart state and used conflict resolution on checkout to reconcile differences. We implemented versioned cart updates and merged items with user confirmation when conflicts arose. Result: Users experienced fewer hard errors during partitions, and conflicts were rare and resolved with clear UI prompts. Conversion rates remained stable, and the product team accepted the slight complexity introduced to handle reconciliation because it improved perceived reliability.

Tell me about a time you diagnosed and resolved a production outage

Situation and Task: A critical API began returning 500 errors during peak traffic, causing user-facing failures for our main product. I led the incident response to identify the root cause and restore service quickly while preserving data integrity. Action: I organized the on-call rotation into roles for triage, mitigation, and root cause analysis, and used recent deploy logs and metrics to narrow the issue to a database connection pool exhaustion. We mitigated by re-routing traffic to a read-only replica for non-critical reads and applied a quick config change to increase connection limits while avoiding cascading failures. Result: The service recovered within an hour with no data loss, and a postmortem led to improved capacity planning and connection pooling best practices. We also added automated alerts for pool saturation and a runbook to accelerate future responses.

How do you design for eventual consistency and what techniques help reconcile conflicts?

Explain eventual consistency by describing that replicas may diverge temporarily, and the system guarantees convergence once updates stop and replicas have exchanged state. Discuss common techniques like version vectors, last-write-wins, and application-level merge logic to resolve conflicts depending on data semantics. Give an example: for a distributed user profile, use version vectors to detect concurrent updates and merge non-conflicting fields while prompting users to resolve contradictory fields like email. For counters that must not lose concurrent increments, use CRDTs such as G-Counter or PN-Counter to converge automatically without coordination. Common pitfalls include assuming last-write-wins is acceptable for all data, which can silently lose updates, and underestimating the complexity of merge logic for domain-specific objects. Test conflict scenarios with chaos testing and validate that merges preserve invariants your application requires.
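A G-Counter, the simplest of the CRDTs named above, can be sketched as below. This is a minimal illustration, assuming each replica has a unique id and that replicas exchange full state; it is not tied to any particular library.

```python
class GCounter:
    """Grow-only counter: each replica increments its own slot, merge takes the max."""

    def __init__(self, replica_id):
        self.replica_id = replica_id
        self.counts = {}  # replica id -> increments observed from that replica

    def increment(self, amount=1):
        self.counts[self.replica_id] = self.counts.get(self.replica_id, 0) + amount

    def value(self):
        return sum(self.counts.values())

    def merge(self, other):
        # Element-wise max is commutative, associative, and idempotent,
        # so replicas converge regardless of merge order or repetition.
        for replica_id, count in other.counts.items():
            self.counts[replica_id] = max(self.counts.get(replica_id, 0), count)
```

A PN-Counter follows the same pattern with two G-Counters, one for increments and one for decrements.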

What is backpressure and how do you implement it in streaming systems?

Define backpressure as the mechanism to prevent a faster producer from overwhelming a slower consumer in a pipeline, protecting memory and ensuring stable throughput. Mention approaches like blocking producers, buffering with bounded queues, and applying flow control signals across components. In practice, use frameworks that support backpressure natively, such as reactive streams or Kafka with consumer lag monitoring, and implement circuit breakers or throttling to shed load gracefully. For HTTP-based ingestion, return 429 responses with Retry-After headers when downstream systems are overloaded. Pitfalls include unbounded buffers that cause memory exhaustion and naive retries that amplify load. Monitor end-to-end latency and add backoff strategies with jitter to avoid synchronized retries that worsen overload.
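Two of the pieces above, a bounded buffer that rejects instead of growing and client backoff with jitter, can be sketched as follows. This is an illustration only; the function names, status codes returned as plain integers, and default delays are assumptions of this sketch.

```python
import queue
import random


def try_enqueue(buffer, item):
    """Accept work if the bounded buffer has room, otherwise signal overload."""
    try:
        buffer.put_nowait(item)
        return 202  # accepted for asynchronous processing
    except queue.Full:
        return 429  # caller should back off and retry later


def backoff_delay(attempt, base=0.1, cap=10.0, rng=random.random):
    """Exponential backoff with full jitter: a random delay in [0, min(cap, base * 2^attempt))."""
    return rng() * min(cap, base * 2 ** attempt)
```

The bounded queue protects the consumer's memory, while the jitter spreads retries out so rejected clients do not all return at the same instant.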

How would you design data partitioning for a multi-tenant database?

Start by clarifying tenant isolation, expected tenant sizes, and whether cross-tenant queries are needed. Choose between shared schema with tenant_id, separate schemas per tenant, or separate databases based on isolation, compliance, and operational complexity. For scalability, shard large tenants across partitions by a tenant-specific key or use hybrid sharding where very large tenants get dedicated shards and small tenants share shards. Keep routing logic simple by using a catalog service mapping tenants to shards and support rebalancing with minimal downtime. Avoid hot partitions by monitoring tenant growth and rebalancing proactively; also plan for tenant migrations and backups. Consider access control and encryption to meet compliance requirements for sensitive tenants.
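The catalog-based routing with dedicated shards for large tenants can be sketched as below. This is a simplified illustration, assuming tenant ids are strings and shard names are hypothetical; a production catalog would more likely persist an explicit assignment for every tenant, because the modulo fallback here reshuffles tenants whenever the shared shard list changes.

```python
import hashlib


class TenantCatalog:
    """Maps tenants to shards: pinned tenants get dedicated shards, the rest share."""

    def __init__(self, shared_shards):
        self.shared_shards = list(shared_shards)
        self.overrides = {}  # tenant id -> dedicated shard

    def pin(self, tenant_id, shard):
        """Move a large tenant onto its own shard."""
        self.overrides[tenant_id] = shard

    def shard_for(self, tenant_id):
        if tenant_id in self.overrides:
            return self.overrides[tenant_id]
        digest = hashlib.sha256(tenant_id.encode()).digest()
        index = int.from_bytes(digest[:8], "big") % len(self.shared_shards)
        return self.shared_shards[index]
```

Keeping the lookup behind one small interface means a tenant migration only has to update the catalog entry once the data copy is complete.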

Explain CAP theorem and practical trade-offs when designing distributed systems

State the CAP theorem: a distributed system cannot simultaneously guarantee Consistency, Availability, and Partition tolerance, so when a network partition occurs it must give up either consistency or availability. Emphasize that partition tolerance is a must for real distributed systems, so the practical trade-off is between consistency and availability during a partition. Give a practical example: choose availability for a user-facing feed where stale reads are acceptable, and choose consistency for a payment ledger where correctness matters more than immediate availability. Use stores with tunable quorums like Cassandra or CP systems like etcd depending on which side you favor, and tune read/write quorums for desired trade-offs. Avoid thinking of CAP as a strict blueprint; real systems use techniques like multi-versioning, conflict resolution, and compensating transactions to mitigate trade-offs. Measure which properties matter most for your use case and design to those SLAs rather than abstract rules.
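The quorum tuning mentioned above rests on a small piece of arithmetic, sketched here. This is an illustration, assuming N replicas per key with read quorum R and write quorum W; the function names are hypothetical.

```python
def quorums_overlap(n: int, r: int, w: int) -> bool:
    """True when every read quorum shares at least one replica with every write quorum."""
    return r + w > n


def tolerated_failures(n: int, quorum: int) -> int:
    """How many replicas can be down while a quorum of this size is still reachable."""
    return n - quorum
```

With N = 3, choosing R = 2 and W = 2 overlaps and tolerates one replica failure for both reads and writes, while R = 1 and W = 1 keeps the system available with two replicas down but allows stale reads.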

System Design Interview Questions: Complete Guide

System design interview questions test your ability to design scalable, reliable systems under real constraints. Expect an open-ended discussion where you clarify requirements, make trade-offs, and communicate a clear architecture while the interviewer asks follow-ups and pushes edge cases.

Common Interview Questions

Behavioral Questions (STAR Method)

STAR Method: Structure your answers using Situation, Task, Action, and Result to tell compelling stories about your experience.

Technical Questions

Questions to Ask the Interviewer

Show your interest by asking thoughtful questions

•What are the current scaling challenges the team is working on and which part of the stack is most constrained?
•How do you measure system health and what SLOs and SLAs does this service target?
•Can you describe the typical on-call responsibilities and how incidents are handled postmortem?
•What trade-offs has the team accepted in the system design recently and why were those choices made?
•How does the team balance feature development with technical debt and larger architectural work?

Interview Preparation Tips

Always start by clarifying requirements and constraints, and restate them to the interviewer before proposing a design.

Sketch a high-level component diagram first, then iterate on data models, APIs, scaling, and failure modes with the interviewer.

Quantify assumptions with rough calculations for traffic, storage, and latency to justify design decisions and show pragmatic thinking.

Practice communicating trade-offs clearly, explaining not only what you choose but why you rejected reasonable alternatives.

Overview

System design interviews test your ability to build scalable, maintainable services under real-world constraints. Interviewers expect clear trade-offs, measurable goals, and step-by-step architecture decisions.

Start by clarifying requirements: for example, ask whether a social feed must support 10 million monthly active users (MAUs) or a peak write throughput of 5k writes/second. Next, propose a high-level sketch: choose boundaries (API layer, load balancer, cache, data stores, message queues), estimate capacity, and justify component choices.

Concrete metrics matter. State latency targets (e.g., 95th percentile < 200 ms), availability goals (99.95% uptime), and storage needs (1 TB/day of image data). Use back-of-envelope math to show you can convert requirements into infrastructure: for instance, 10k reads/sec with 64 KB average object = ~640 MB/s bandwidth, which maps to ~55 TB/day.
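The bandwidth estimate above can be reproduced with a few lines of arithmetic. This is a small sketch using decimal units (1 MB = 1000 KB, 1 TB = 1,000,000 MB); the function names are illustrative.

```python
SECONDS_PER_DAY = 86_400


def bandwidth_mb_per_s(reads_per_s: float, object_kb: float) -> float:
    """Sustained read bandwidth in MB/s."""
    return reads_per_s * object_kb / 1000


def daily_volume_tb(mb_per_s: float) -> float:
    """Data served per day in TB at a constant rate."""
    return mb_per_s * SECONDS_PER_DAY / 1_000_000


rate = bandwidth_mb_per_s(10_000, 64)   # 640.0 MB/s
per_day = daily_volume_tb(rate)         # about 55.3 TB/day
```

Stating the units and the rounding out loud during an interview makes it easier for the interviewer to follow and correct an assumption early.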

Walk through failure modes and mitigations: replica placement for availability, multi-AZ deployments for 99.99% uptime, retries with exponential backoff, and circuit breakers to prevent cascading failures.

Use diagrams mentally but explain them verbally: identify bottlenecks, then propose caching, sharding, or batching to relieve pressure.

Actionable takeaway: always start with questions, quantify traffic and latency, and present a clear plan to handle load and failures using specific numbers and one or two concrete examples.

Key Subtopics to Master

Focus study on these high-impact areas, each paired with concrete examples and practice prompts.

•Scalability & Capacity Planning
  •Example: design a URL shortener for 1 million shortened links and 100k redirects/sec. Estimate storage (assume 16 bytes/id) and throughput.
  •Practice: calculate servers needed for 10k QPS given 1000 QPS per app server.

•Data Modeling & Partitioning
  •Example: shard a users table by user_id range vs hash to balance load for 100M users.
  •Practice: design schema for time-series metrics storing 10k metrics/sec.

•Caching Strategies
  •Example: use LRU in-memory cache for 90% read hit rate to reduce DB load by 10x.
  •Practice: choose TTL, cache invalidation, and cache-aside vs write-through.

•Consistency & Databases
  •Example: pick SQL for strong consistency (banking) and NoSQL for high write throughput (activity stream).
  •Practice: explain trade-offs using CAP theorem for a geo-replicated store.

•Messaging & Asynchrony
  •Example: use Kafka for replayable event log supporting 1M messages/sec across topics.
  •Practice: pick between SQS, Kafka, and RabbitMQ based on durability and consumer patterns.

•Observability & SLOs
  •Example: set SLO = 99.9% success with 200 ms P95 latency and build alerts for 3-minute error spikes.
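Several of the practice prompts above reduce to short calculations, sketched here. This is an illustration that ignores headroom, replication, and index overhead; the function names are hypothetical.

```python
import math


def servers_needed(total_qps: int, per_server_qps: int) -> int:
    """App servers required at full utilization (add headroom in practice)."""
    return math.ceil(total_qps / per_server_qps)


def id_storage_mb(num_links: int, bytes_per_id: int) -> float:
    """Raw storage for ids only, in MB (1 MB = 1,000,000 bytes)."""
    return num_links * bytes_per_id / 1_000_000


def db_read_reduction(hit_rate: float) -> float:
    """Factor by which a cache cuts database reads at a given hit rate."""
    return 1 / (1 - hit_rate)
```

For the numbers in the list: 10k QPS at 1000 QPS per server needs 10 servers, 1 million ids at 16 bytes each take 16 MB, and a 90% hit rate leaves one tenth of the reads for the database.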

Actionable takeaway: practice with numbers—estimate traffic, storage, and cost for 5 common architectures.

Resources and Practice Plan

Use a mix of books, courses, blogs, and practical exercises. Allocate time: 6–8 weeks with 4–6 hours/week yields measurable improvement.

Books & Reading

•"Designing Data-Intensive Applications" by Martin Kleppmann: deep on data models, replication, and consistency. Read 2–3 chapters/week and summarize trade-offs in a notebook.
•"System Design Interview" by Alex Xu: concise patterns and sample designs; replicate 10 architectures.

Online Courses & Videos

•Educative: "Grokking the System Design Interview" — step-by-step templates and 20+ case studies. Practice one case per week.
•YouTube: Gaurav Sen and Tech Dummies Narendra L — watch 2–3 architecture walkthroughs and recreate diagrams.

Practice Platforms

•Pramp / Interviewing.io — do at least 6 mock interviews; focus on feedback loops.
•LeetCode Discuss & GitHub repos (search "system-design-primer") — review community designs and code for caching, rate limiting.

Blogs & Case Studies

•HighScalability and Netflix Tech Blog — read 1 architecture post weekly; note scaling decisions and metrics.

Practical Exercises

•Build 3 mini-projects: URL shortener, chat service, and image CDN prototype. Measure local QPS and latency; profile bottlenecks.

Actionable takeaway: follow a 6–8 week plan combining reading, mock interviews, and 3 hands-on projects; track progress with specific metrics (e.g., reduce prototype latency by 30%).

Michael Rodriguez
