How Google, Facebook, Instagram, etc. Store, Manage, and Manipulate Thousands of Terabytes of Data with High Speed and High Efficiency

Harsh Agrawal
3 min read · Sep 16, 2020


Big companies like Google, Facebook, Apple, and Microsoft receive large amounts of data from their users. Statistics show that 500+ terabytes of new data get ingested into the databases of the social media site Facebook every day; that is more than 15 petabytes in a month (500 TB/day × 30 days = 15,000 TB ≈ 15 PB). This data is mainly generated by photo and video uploads, message exchanges, comments, and so on.
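As a sanity check on those numbers, the conversion from a daily rate to a monthly total is simple arithmetic; the snippet below (illustrative only, using decimal units and a 30-day month) reproduces it:

```python
# Back-of-the-envelope check: 500 TB/day of new data over a 30-day month.
TB_PER_DAY = 500
DAYS_PER_MONTH = 30

monthly_tb = TB_PER_DAY * DAYS_PER_MONTH  # 15,000 TB
monthly_pb = monthly_tb / 1000            # 15 PB (decimal units)

print(f"{monthly_tb:,} TB/month ≈ {monthly_pb:.0f} PB/month")
```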

Such a large amount of data is called Big Data: data of huge size and volume, still growing exponentially with time. In short, such data is so large and complex that none of the traditional data-management tools can store or process it efficiently.

Big Data has three subproblems, often called the "three Vs":
1. Volume — The amount of data matters. With big data, you’ll have to process high volumes of low-density, unstructured data. This can be data of unknown value, such as Twitter data feeds, clickstreams on a webpage or a mobile app, or sensor-enabled equipment. For some organizations, this might be tens of terabytes of data. For others, it may be hundreds of petabytes.

2. Velocity — Velocity is the fast rate at which data is received and (perhaps) acted on. Normally, the highest velocity of data streams directly into memory versus being written to disk. Some internet-enabled smart products operate in real time or near real time and will require real-time evaluation and action.

3. Variety — Variety refers to the many types of data that are available. Traditional data types were structured and fit neatly in a relational database. With the rise of big data, data comes in new unstructured data types. Unstructured and semistructured data types, such as text, audio, and video, require additional preprocessing to derive meaning and support metadata; a minimal sketch of such preprocessing appears after this list.
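To make Velocity and Variety concrete, here is a minimal sketch, with an invented event format and field names, of consuming a stream of semi-structured JSON click events and acting on each record as it arrives instead of batch-loading everything from disk:

```python
import json
from typing import Iterator

def event_stream() -> Iterator[str]:
    """Stand-in for a real-time feed (e.g., a message queue or socket)."""
    yield '{"user": "u1", "action": "click", "page": "/home"}'
    yield '{"user": "u2", "action": "comment", "text": "great post!"}'
    yield 'not-json garbage'  # real feeds contain malformed records too

def process(stream: Iterator[str]) -> None:
    for raw in stream:
        try:
            event = json.loads(raw)   # semi-structured input (Variety)
        except json.JSONDecodeError:
            continue                  # skip records we cannot parse
        # Act on each event as it arrives, in memory (Velocity)
        print(event.get("user"), event.get("action"))

process(event_stream())
```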

Big companies use a distributed model to store Big Data. In this model there is one main computer, and thousands or even millions of computers are connected to it. In this master-slave topology the master does not need any special hardware: its computation power can be equal to that of the slave computers, since its main job is to coordinate them. In this way data can be handled in a "parallel" manner; rather than storing all the data on one large hard disk, it is better to distribute it across many machines.
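As a toy illustration of that idea (not how Google's or Facebook's actual systems work; the chunk size, node names, and replication factor below are made up), a file can be split into fixed-size chunks and each chunk assigned to several slave machines, so reads and writes happen in parallel:

```python
from itertools import cycle

NODES = ["slave-1", "slave-2", "slave-3", "slave-4"]  # hypothetical machines
CHUNK_SIZE = 4   # bytes per chunk; real systems use e.g. 64-128 MB blocks
REPLICATION = 2  # copies of each chunk, for fault tolerance

def place_chunks(data: bytes) -> dict[int, list[str]]:
    """Master's job: decide which slaves hold which chunk (metadata only)."""
    placement = {}
    node_ring = cycle(NODES)
    chunks = [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]
    for idx, _chunk in enumerate(chunks):
        placement[idx] = [next(node_ring) for _ in range(REPLICATION)]
    return placement

print(place_chunks(b"terabytes of user data"))
# {0: ['slave-1', 'slave-2'], 1: ['slave-3', 'slave-4'], 2: ['slave-1', 'slave-2'], ...}
```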

Benefits of Big Data Processing

The ability to process Big Data brings multiple benefits, such as:

  • Businesses can utilize outside intelligence while making decisions

Access to social data from search engines and sites like Facebook and Twitter enables organizations to fine-tune their business strategies.

  • Improved customer service

Traditional customer feedback systems are getting replaced by new systems designed with Big Data technologies. In these new systems, Big Data and natural language processing technologies are being used to read and evaluate consumer responses.
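As a heavily simplified illustration of "reading and evaluating consumer responses", the sketch below scores feedback with a crude keyword approach; a real system would use trained NLP models, and the word lists here are invented:

```python
POSITIVE = {"great", "love", "fast", "helpful"}
NEGATIVE = {"slow", "broken", "bad", "refund"}

def score_feedback(text: str) -> str:
    """Crude keyword-based sentiment: count positive vs. negative words."""
    words = {w.strip(".,!?").lower() for w in text.split()}
    balance = len(words & POSITIVE) - len(words & NEGATIVE)
    return "positive" if balance > 0 else "negative" if balance < 0 else "neutral"

for review in ["Love it, great support!", "App is slow and broken."]:
    print(score_feedback(review), "-", review)
```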

  • Early identification of risks to the product/services, if any
  • Better operational efficiency

Big Data technologies can be used for creating a staging area or landing zone for new data before identifying what data should be moved to the data warehouse. In addition, such integration of Big Data technologies and the data warehouse helps an organization offload infrequently accessed data.
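One way to picture this landing-zone and offload idea is the sketch below; the 90-day threshold and the storage-tier names are assumptions for illustration, not part of any specific product:

```python
from datetime import datetime, timedelta

HOT_WINDOW = timedelta(days=90)  # assumed threshold for "frequently accessed"

def route(record: dict, now: datetime) -> str:
    """Decide where a staged record belongs based on its last access time."""
    if now - record["last_accessed"] <= HOT_WINDOW:
        return "data_warehouse"   # fast, expensive storage
    return "cold_storage"         # cheap archive for infrequent access

now = datetime(2020, 9, 16)
staging_area = [
    {"id": 1, "last_accessed": datetime(2020, 9, 1)},
    {"id": 2, "last_accessed": datetime(2019, 1, 5)},
]
for rec in staging_area:
    print(rec["id"], "->", route(rec, now))
```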
