Design Video Sharing App

Definition

Design a video sharing service like YouTube where users can upload, view, and search videos.

Requirements

Functional Requirement

  • Users should be able to upload videos
  • Users should be able to share/view videos
  • Users should be able to search videos
  • System should be able to record stats (likes, etc.)
  • Users should be able to add/view comments

Non-Functional Requirement

  • Low latency : Users should be able to view videos without much lag
  • Consistency : System should provide the same videos to users on all devices
  • Available : System must be highly available
  • Reliable : System should not lose data

Out of Scope

  • Video recommendation
  • Channel subscription
  • Watch later

Capacity Management

Users estimate

  • Assume total users : 1.5 Billion
  • Active users : 800 M (daily)
  • On average, a user views 5 videos per day
  • Total video views per second = 800M * 5 / 86400 sec => 46K videos/sec

Upload estimate

  • Assume upload:view ratio 1:200
  • For every video upload, we have 200 video views
  • 46K / 200 => 230 videos/sec
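
The traffic arithmetic above can be reproduced with a short script. This is only a sketch of the same back-of-the-envelope calculation; the variable names are illustrative and the inputs are the assumptions listed above.

```python
# Back-of-the-envelope traffic estimate using the figures assumed above.
SECONDS_PER_DAY = 86_400

daily_active_users = 800_000_000   # 800 M daily active users
views_per_user_per_day = 5         # average views per user per day
views_per_upload = 200             # upload:view ratio of 1:200

views_per_sec = daily_active_users * views_per_user_per_day / SECONDS_PER_DAY
uploads_per_sec = views_per_sec / views_per_upload

print(f"views/sec   ~ {views_per_sec:,.0f}")    # roughly 46K
print(f"uploads/sec ~ {uploads_per_sec:,.0f}")  # roughly 230
```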

Storage estimate

  • Assume every minute 500 hours worth of videos are uploaded
  • On average, 1 min of video needs 50 MB
  • Total storage = 500 hours * 60 min * 50MB => 1500 GB/min (25 GB/sec)
  • Not taking replication and compression into account

Bandwidth estimate

  • With 500 hours of video uploaded per minute
  • Assume each minute of uploaded video needs 10 MB of incoming bandwidth
  • Total : 500 hours * 60 mins * 10 MB => 300 GB/min (5 GB/sec)
  • Assuming an upload:view ratio of 1:200, we would need 5 GB/sec * 200 => 1 TB/sec outgoing bandwidth
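
The storage and bandwidth figures can be checked the same way. The sketch below assumes decimal units (1 GB = 1000 MB) and reads the per-minute figures as megabytes per minute of video; the variable names are illustrative.

```python
# Storage and bandwidth estimate (decimal units: 1 GB = 1000 MB, 1 TB = 1000 GB).
video_minutes_per_min = 500 * 60   # 500 hours of video uploaded every minute
storage_mb_per_video_min = 50      # assumed storage per minute of video
ingress_mb_per_video_min = 10      # assumed upload bandwidth per minute of video
views_per_upload = 200             # upload:view ratio of 1:200

storage_gb_per_min = video_minutes_per_min * storage_mb_per_video_min / 1000
storage_gb_per_sec = storage_gb_per_min / 60

ingress_gb_per_min = video_minutes_per_min * ingress_mb_per_video_min / 1000
ingress_gb_per_sec = ingress_gb_per_min / 60
egress_tb_per_sec = ingress_gb_per_sec * views_per_upload / 1000

print(storage_gb_per_min, storage_gb_per_sec)   # 1500.0 25.0
print(ingress_gb_per_min, ingress_gb_per_sec)   # 300.0 5.0
print(egress_tb_per_sec)                        # 1.0
```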

System API

We can have SOAP or REST APIs to expose the functionality of our service. The following could be the definitions of the APIs for uploading, searching, and streaming videos:

uploadVideo(api_dev_key, video_title, video_description, tags[], category_id, default_language, recording_details, video_contents)

Parameters:

  • api_dev_key (string): The API developer key of a registered account. This will be used to, among other things, throttle users based on their allocated quota.
  • video_title (string): Title of the video.
  • video_description (string): Optional description of the video.
  • tags (string[]): Optional tags for the video.
  • category_id (string): Category of the video, e.g., Film, Song, People, etc.
  • default_language (string): For example English, Mandarin, Hindi, etc.
  • recording_details (string): Location where the video was recorded.
  • video_contents (stream): Video to be uploaded.

Returns: (string) A successful upload will return HTTP 202 (request accepted) and once the video encoding is completed the user is notified through email with a link to access the video. We can also expose a queryable API to let users know the current status of their uploaded video.
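
A minimal sketch of the accept-then-process flow described above, using an in-memory dictionary in place of a real service. The class name, status strings, and the `mark_encoded` and `get_status` helpers are hypothetical, and quota checks on `api_dev_key` are left out.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class UploadService:
    """In-memory stand-in for the upload endpoint (all names are hypothetical)."""
    statuses: dict = field(default_factory=dict)

    def upload_video(self, api_dev_key, video_title, video_contents,
                     video_description="", tags=()):
        # Accept the upload and hand it off for asynchronous encoding.
        video_id = uuid.uuid4().hex
        self.statuses[video_id] = "PROCESSING"
        return 202, video_id  # HTTP 202: request accepted, not yet processed

    def mark_encoded(self, video_id):
        # Would be called by the encoding pipeline when it finishes.
        self.statuses[video_id] = "READY"

    def get_status(self, video_id):
        # The queryable status API mentioned above.
        return self.statuses.get(video_id, "UNKNOWN")


service = UploadService()
code, video_id = service.upload_video("dev-key", "My trip", b"raw bytes")
print(code, service.get_status(video_id))  # 202 PROCESSING
```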

searchVideo(api_dev_key, search_query, user_location, maximum_videos_to_return, page_token)

Parameters:

  • api_dev_key (string): The API developer key of a registered account of our service.
  • search_query (string): A string containing the search terms.
  • user_location (string): Optional location of the user performing the search.
  • maximum_videos_to_return (number): Maximum number of results returned in one request.
  • page_token (string): This token will specify a page in the result set that should be returned.

Returns: (JSON) A JSON containing information about the list of video resources matching the search query. Each video resource will have a video title, a thumbnail, a video creation date, and a view count.
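
One way to picture `page_token` is as an opaque cursor into the result set. The sketch below assumes the token is simply the index of the next result encoded as a string, and it searches an in-memory list by title. The `next_page_token` field is an assumption, and `api_dev_key` and `user_location` are omitted for brevity.

```python
def search_video(videos, search_query, maximum_videos_to_return=10, page_token=None):
    """Return one page of videos whose title contains every search term."""
    start = int(page_token) if page_token else 0
    terms = search_query.lower().split()
    matches = [v for v in videos if all(t in v["title"].lower() for t in terms)]
    page = matches[start:start + maximum_videos_to_return]
    next_start = start + len(page)
    # No token is returned once the last page has been served.
    next_token = str(next_start) if next_start < len(matches) else None
    return {"videos": page, "next_page_token": next_token}


catalog = [
    {"title": "Cat video 1"},
    {"title": "Cat video 2"},
    {"title": "Dog video"},
    {"title": "cat VIDEO 3"},
]
first = search_video(catalog, "cat video", maximum_videos_to_return=2)
second = search_video(catalog, "cat video", 2, first["next_page_token"])
```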

streamVideo(api_dev_key, video_id, offset, codec, resolution)

Parameters:

  • api_dev_key (string): The API developer key of a registered account of our service.
  • video_id (string): A string to identify the video.
  • offset (number): We should be able to stream video from any offset; this offset would be a time in seconds from the beginning of the video. If we support playing/pausing a video from multiple devices, we will need to store the offset on the server. This will enable the users to start watching a video on any device from the same point where they left off.
  • codec (string) & resolution(string): We should send the codec and resolution info in the API from the client to support play/pause from multiple devices. Imagine you are watching a video on your TV’s Netflix app, paused it, and started watching it on your phone’s Netflix app. In this case, you would need codec and resolution, as both these devices have a different resolution and use a different codec.

Returns: (STREAM) A media stream (a video chunk) from the given offset
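
The sketch below illustrates how an offset, codec, and resolution could select a chunk, and how a server-side offset lets another device resume. It assumes fixed-length chunks of 10 seconds and a hypothetical storage key layout; neither is specified by the design above.

```python
CHUNK_SECONDS = 10  # assumption: each stored chunk holds 10 seconds of video

# Assumption: the last watched offset per (user, video) is kept server-side.
resume_offsets = {}


def chunk_key(video_id, offset, codec, resolution):
    """Map a playback offset in seconds to the key of the chunk containing it."""
    chunk_index = int(offset) // CHUNK_SECONDS
    return f"{video_id}/{codec}/{resolution}/{chunk_index:06d}"


def save_offset(user_id, video_id, offset):
    resume_offsets[(user_id, video_id)] = offset


def resume_chunk(user_id, video_id, codec, resolution):
    """Pick the chunk to serve when a user resumes, possibly on another device."""
    offset = resume_offsets.get((user_id, video_id), 0)
    return chunk_key(video_id, offset, codec, resolution)


save_offset("u1", "v42", 125)                        # paused on the TV at 2:05
print(resume_chunk("u1", "v42", "vp9", "480p"))      # v42/vp9/480p/000012
```

The same stored offset resolves to a different file per device because the codec and resolution are part of the key.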

High Level Design

  1. Processing Queue: Each uploaded video will be pushed to a processing queue to be de-queued later for encoding, thumbnail generation, and storage.
  2. Encoder: To encode each uploaded video into multiple formats.
  3. Thumbnails generator: To generate a few thumbnails for each video.
  4. Video and Thumbnail storage: To store video and thumbnail files in some distributed file storage.
  5. User Database: To store user’s information, e.g., name, email, address, etc.
  6. Video metadata storage: A metadata database to store all the information about videos like title, file path in the system, uploading user, total views, likes, dislikes, etc. It will also be used to store all the video comments.
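
The way these components hand work to each other can be sketched with the standard-library queue. The encoder and thumbnail generator here are placeholders that only produce hypothetical file paths, and plain dictionaries stand in for the file storage and the metadata database.

```python
import queue


def encode(video, formats=("360p", "720p", "1080p")):
    """Placeholder encoder: one hypothetical output path per format."""
    return {fmt: f"videos/{video['id']}_{fmt}.mp4" for fmt in formats}


def generate_thumbnails(video, count=3):
    """Placeholder thumbnail generator: a few hypothetical image paths."""
    return [f"thumbnails/{video['id']}_{i}.jpg" for i in range(count)]


def process_uploads(processing_queue, file_storage, metadata_storage):
    """Drain the queue: encode, generate thumbnails, store files and metadata."""
    while True:
        try:
            video = processing_queue.get_nowait()
        except queue.Empty:
            break
        file_storage[video["id"]] = {
            "encodings": encode(video),
            "thumbnails": generate_thumbnails(video),
        }
        metadata_storage[video["id"]] = {
            "title": video["title"],
            "uploader": video["uploader"],
            "views": 0, "likes": 0, "dislikes": 0,
        }
        processing_queue.task_done()


uploads = queue.Queue()
uploads.put({"id": "v1", "title": "Demo", "uploader": "u1"})
files, metadata = {}, {}
process_uploads(uploads, files, metadata)
```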

Database Schema

Video Metadata

  • VideoID, Title, Description, Size, Thumbnail, Uploader/User
  • Total number of likes
  • Total number of dislikes
  • Total number of views

Video Comment Data

  • CommentID , VideoID , UserID , Comment , TimeOfCreation

User data

  • UserID, Name, email, address, age, registration details etc.
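
The three record types above could be written down as plain data classes. The field names and types below are one possible reading of the schema, not a fixed definition.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class VideoMetadata:
    video_id: str
    title: str
    description: str
    size_bytes: int
    thumbnail: str      # path of the thumbnail file in storage
    uploader_id: str    # UserID of the uploader
    likes: int = 0
    dislikes: int = 0
    views: int = 0


@dataclass
class VideoComment:
    comment_id: str
    video_id: str
    user_id: str
    comment: str
    time_of_creation: datetime


@dataclass
class User:
    user_id: str
    name: str
    email: str
    address: str
    age: int


video = VideoMetadata("v1", "Demo", "A short demo", 52_000_000, "thumbnails/v1_0.jpg", "u1")
```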

Metadata Sharding

  • Sharding based on UserID
  • Sharding based on VideoID

Problems :

  • What happens when a user becomes popular (an influencer) or a video goes viral?
  • What happens when a few vloggers upload far more videos than others, resulting in an uneven distribution of data across shards?

Solution :

  • Repartition/redistribute data or use Consistent Hashing to balance load between servers
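
A minimal consistent hashing sketch, assuming MD5 as the hash function and 100 virtual nodes per server (both are illustrative choices). The point it demonstrates is that removing a server only remaps the keys that lived on that server.

```python
import bisect
import hashlib


def _hash(key):
    """Map a string to a large integer position on the ring."""
    return int(hashlib.md5(key.encode()).hexdigest(), 16)


class ConsistentHashRing:
    def __init__(self, servers, vnodes=100):
        self._vnodes = vnodes
        self._ring = []     # sorted list of (position, server)
        self._hashes = []   # positions only, kept in the same order
        for server in servers:
            self.add_server(server)

    def _rebuild(self):
        self._ring.sort()
        self._hashes = [h for h, _ in self._ring]

    def add_server(self, server):
        # Each server owns several points on the ring to even out the load.
        for i in range(self._vnodes):
            self._ring.append((_hash(f"{server}#{i}"), server))
        self._rebuild()

    def remove_server(self, server):
        self._ring = [(h, s) for h, s in self._ring if s != server]
        self._rebuild()

    def get_server(self, key):
        if not self._ring:
            raise ValueError("ring has no servers")
        # First point clockwise from the key's position, wrapping around.
        idx = bisect.bisect(self._hashes, _hash(key)) % len(self._hashes)
        return self._ring[idx][1]


ring = ConsistentHashRing(["s1", "s2", "s3"])
keys = [f"video-{i}" for i in range(200)]
before = {k: ring.get_server(k) for k in keys}
ring.remove_server("s2")
after = {k: ring.get_server(k) for k in keys}
```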

High level process (Sharding and Hashing)

  • A hash function maps each key (UserID or VideoID, depending on the sharding scheme) to an effectively random, but deterministic, server
  • That server stores the corresponding user or video metadata.
  • With VideoID-based sharding, finding the videos of a user means querying all servers, and each server returns a set of videos.
  • A centralized server will aggregate and rank these results before returning them to the user.
  • VideoID-based sharding solves our problem of popular users but shifts it to popular videos.
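
The query-all-servers step can be sketched as a scatter-gather over in-memory shards. The sketch assumes sharding by VideoID with CRC32 as the hash and ranks results by view count; the ranking criterion and the function names are assumptions.

```python
import zlib


def shard_for(video_id, num_shards):
    """Deterministically map a VideoID to a shard index."""
    return zlib.crc32(video_id.encode()) % num_shards


def videos_of_user(shards, user_id, limit=10):
    """Ask every shard for the user's videos, then aggregate and rank centrally."""
    results = []
    for shard in shards:  # in a real system these calls would run in parallel
        results.extend(v for v in shard.values() if v["uploader"] == user_id)
    results.sort(key=lambda v: v["views"], reverse=True)
    return results[:limit]


NUM_SHARDS = 4
shards = [dict() for _ in range(NUM_SHARDS)]
for vid, uploader, views in [("v1", "alice", 10), ("v2", "bob", 99),
                             ("v3", "alice", 500), ("v4", "alice", 42)]:
    shards[shard_for(vid, NUM_SHARDS)][vid] = {"id": vid, "uploader": uploader, "views": views}

top = videos_of_user(shards, "alice")
print([v["id"] for v in top])  # ['v3', 'v4', 'v1']
```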

Video Deduplication

  • System must deal with duplicate videos, as they can impact :
    • Data Storage : Duplicate copies waste storage space
    • Caching : Duplicates degrade cache efficiency by taking up space that could hold unique content
    • Network usage : Duplicate videos increase the amount of data that must be sent over the network
    • Energy consumption : Higher storage, inefficient cache, and network usage could result in energy wastage.

There are many algorithms that can be applied to detect duplicate videos as soon as a user uploads a video :

  • Block Matching
  • Phase Correlation
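
To give a feel for block matching, the sketch below searches a reference frame for the position where a small block from another frame fits best, using the sum of absolute differences (SAD) as the cost. Frames are tiny grayscale grids written as lists of lists, and the block size and search range are illustrative. A real deduplication pipeline would run this over many blocks and frames and combine the costs, which is not shown here.

```python
def sad(frame_a, frame_b, ax, ay, bx, by, size):
    """Sum of absolute differences between two size x size blocks."""
    return sum(
        abs(frame_a[ay + dy][ax + dx] - frame_b[by + dy][bx + dx])
        for dy in range(size)
        for dx in range(size)
    )


def best_match(frame, x, y, ref_frame, size=2, search=2):
    """Return (cost, dx, dy): the displacement within +/- search with minimal SAD."""
    height, width = len(ref_frame), len(ref_frame[0])
    best = None
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            rx, ry = x + dx, y + dy
            # Skip candidate positions that fall outside the reference frame.
            if 0 <= rx <= width - size and 0 <= ry <= height - size:
                cost = sad(frame, ref_frame, x, y, rx, ry, size)
                if best is None or cost < best[0]:
                    best = (cost, dx, dy)
    return best


frame_a = [[9, 8, 0, 0],
           [7, 6, 0, 0],
           [0, 0, 0, 0],
           [0, 0, 0, 0]]
# Same content, shifted one pixel right and one pixel down.
frame_b = [[0, 0, 0, 0],
           [0, 9, 8, 0],
           [0, 7, 6, 0],
           [0, 0, 0, 0]]

print(best_match(frame_a, 0, 0, frame_b))  # (0, 1, 1)
```

A cost of zero at some displacement means the block reappears unchanged in the other frame, which is the kind of evidence a duplicate detector would accumulate.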

Cache & Load Balancer

Read about load balancer from here

  • Uses Consistent Hashing among cache servers

Need to have :

  • Video Cache server
  • CDN [Content Delivery Network] (For globally distributed users)
  • Cache for Metadata users [LRU]
    • Cache for hot users or influencers

Read more about cache from here
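
An LRU eviction policy for the metadata cache can be sketched with the standard library's OrderedDict. The capacity and the keys are illustrative; a production cache would also need expiry and invalidation, which are omitted here.

```python
from collections import OrderedDict


class LRUCache:
    """Least-recently-used cache: evicts the entry that was touched longest ago."""

    def __init__(self, capacity):
        self._capacity = capacity
        self._items = OrderedDict()

    def get(self, key):
        if key not in self._items:
            return None
        self._items.move_to_end(key)  # mark as most recently used
        return self._items[key]

    def put(self, key, value):
        self._items[key] = value
        self._items.move_to_end(key)
        if len(self._items) > self._capacity:
            self._items.popitem(last=False)  # drop the least recently used entry


cache = LRUCache(capacity=2)
cache.put("video:1", {"title": "A"})
cache.put("video:2", {"title": "B"})
cache.get("video:1")                  # video:1 is now the most recent
cache.put("video:3", {"title": "C"})  # evicts video:2
```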

Apply 80-20 rule

The 80-20 rule says that 20% of the daily read volume for videos generates 80% of the traffic: certain videos (from hot users, trending topics, or influencers) are so popular that the majority of people view them. This suggests that we can try to cache 20% of the daily view volume of videos and metadata.

Fault replication

Use Reed-Solomon encoding (an erasure code) to distribute data with redundancy, so that lost blocks can be reconstructed from the surviving ones
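
Reed-Solomon itself is too long for a short example, so the sketch below uses single XOR parity instead. This is a much simpler erasure code that can rebuild exactly one lost block, and it is shown only to illustrate recovering data from surviving blocks plus redundancy. It assumes all blocks have equal length.

```python
def xor_parity(blocks):
    """Compute a parity block as the byte-wise XOR of equal-length blocks."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)


def recover_missing(surviving_blocks, parity):
    """Rebuild the single missing block from the survivors and the parity block."""
    return xor_parity(list(surviving_blocks) + [parity])


data_blocks = [b"abcd", b"efgh", b"ijkl"]   # stored on three different servers
parity_block = xor_parity(data_blocks)      # stored on a fourth server

# The server holding the second block fails; rebuild it from the rest.
rebuilt = recover_missing([data_blocks[0], data_blocks[2]], parity_block)
print(rebuilt)  # b'efgh'
```

Reed-Solomon generalizes this idea so that several lost blocks can be recovered, at the cost of more parity blocks and more involved arithmetic.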

More readings

For more detail, consider subscribing to and going through the course on Educative.io - here