A Software application to provide a platform to upload , view and search photos. Additionally, follow and search users.
Platform must provide seamless experience to users in all devices.
Requirements
Functional Requirement
System must allow users to Upload and view photos
System must allow users can follow users
Users can Search user and photos
Non-Functional Requirement
Low latency : System should provide near real time feed generation (~200ms)
Consistency : System should provide same message to users in all devices
Available : System must be highly avialable
Reliable : System should not lose photos/user data
Capacity Management
Total users : 500 M
Active users : 1 M (daily)
Avg photo size : 200 KB
Say 2M photos uploaded daily , then total space (photos) per day = 2M * 200 = 400G B per day
Total space (photos) in 10 years = 400 GB * 10 years * 365 days per year = 1425 TB
Say 2M photos upload per day * 284 bytes ~= 0.5 GB per day
For 10 years * 365 * 0.5 GB = 1.88 TB of storage.
User Follows space
For 500 million users * 500 followers * 8 bytes ~= 1.82 TB
Total space required for all tables for 10 years
Total space
= Users space + Photos space + User follows
= 32GB + 1.88TB + 1.82TB ~= 3.7 TB
Component Design
Assume web server has max 500 connection
This means 500 concurrent connecion (uploads or reads)
Separate and dedicated reads & uploads servers (So as uploads dont hog system and improve scaling)
Multiple copies of files to be stored
Multiple replicas of servers running (removes redundany)
Data Sharding
Partitioning based on UserID
Assume 1 shard has 1 TB , with 3.7 TB - we would require 4 shards
A few Questions
How do we handle influencers or people with many followers ?
How do we handle people who post more post than others ?
How do we handle sitaution where users photo are not in 1 shard ?
In case , lost of 1 shard - users might experience availability issue ?
Possible solution : If we can generate unique PhotoID first and then find shard number through “PhotoID % 10”, the above problems might be solved (partially)
Generate Photo IDs
Dedicate a separate database instance to generate auto-incrementing IDs.
Assume PhotoID fits into 64 bits, we can define a table containing only a 64 bit ID field.
Next, whenever we add a photo in our system, we can insert a new row in this table and take that ID to be our PhotoID of the new photo
In order to avoid single point of failure ,
We define with one generating even numbered IDs and the other odd numbered.
Place load balancer in front of both of these databases to round robin between them and to deal with downtime
Cache for Metadata users [LRU]
Read more about cache from here
Apply 80-20 rule
It means 20% of daily read volume for photos is generating 80% of traffic which means that certain photos are so popular that the majority of people read them. This dictates that we can try to cache 20% of daily read volume of photos and metadata.
More readings
Encourage to subscribe & go through course in more details from Educative.io - here