Data Gravity

For those with lots of data, and I’m talking about petabytes, the location of that data is a major factor in cloud location decisions. This is when data gravity becomes an issue.

Data gravity, as explained on Techopedia:

Data is something that continues to accumulate over time, and could be considered to become more dense, or have a greater mass. As density or mass accumulates, the data’s gravitational pull increases. Services and applications have their own mass and, therefore, have their own gravity. But data is much bigger and denser than the two. So, as data continues to build mass, services and applications are more likely to be drawn to the data, rather than vice versa. This is much like an apple falling to earth, which is often provided as a typical example of gravity. Because the earth has more mass, the apple falls to the earth, rather than the other way around.

Paying to host petabytes of data in a cloud provider can be expensive. There’s a point where it’s more cost-effective to host it on premises using something like Ceph.
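To make that break-even point concrete, here is a minimal sketch assuming a simple linear cost model: cloud storage billed at a flat rate per GB-month, and on-premises storage (such as a Ceph cluster) modeled as a fixed monthly overhead plus a per-TB monthly cost. The function names and every price used are hypothetical placeholders, not real provider or hardware pricing.

```python
from typing import Optional

# Hypothetical linear cost model; all prices are illustrative placeholders.
GB_PER_TB = 1000  # decimal units, as storage is usually billed


def monthly_cloud_cost(tb: float, price_per_gb_month: float) -> float:
    """Cloud object storage billed at an assumed flat rate per GB-month."""
    return tb * GB_PER_TB * price_per_gb_month


def monthly_onprem_cost(tb: float, fixed_monthly: float, per_tb_monthly: float) -> float:
    """On-premises cost: fixed overhead (staff, space, amortized cluster) plus a per-TB cost."""
    return fixed_monthly + tb * per_tb_monthly


def break_even_tb(price_per_gb_month: float, fixed_monthly: float,
                  per_tb_monthly: float) -> Optional[float]:
    """Data volume (TB) above which on-premises becomes cheaper, or None if it never does."""
    cloud_per_tb = price_per_gb_month * GB_PER_TB
    if cloud_per_tb <= per_tb_monthly:
        return None  # cloud is cheaper per TB, so the fixed overhead is never recovered
    return fixed_monthly / (cloud_per_tb - per_tb_monthly)


if __name__ == "__main__":
    # Made-up inputs: $0.02/GB-month cloud, $20,000/month fixed on-prem, $5/TB-month on-prem.
    tb = break_even_tb(0.02, 20_000, 5)
    print(f"break-even at about {tb:,.0f} TB")
```

With these made-up inputs the break-even lands around 1,333 TB (roughly 1.3 PB); different inputs move it substantially, and a real comparison also has to account for hardware refresh, redundancy, and staffing, which this model lumps into the two on-premises parameters.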

Moving data out of cloud providers is also expensive and time-consuming when dealing with petabytes of it.
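A rough sketch of why moving petabytes out is slow and costly: transfer time is the data volume in bits divided by the usable bandwidth, and the egress bill is the volume times a per-GB rate. The link speed, the 80% efficiency default, and the per-GB price are hypothetical assumptions for illustration, not figures from any provider.

```python
BYTES_PER_PB = 10**15  # decimal petabyte
SECONDS_PER_DAY = 86_400


def transfer_days(petabytes: float, link_gbit_per_s: float, efficiency: float = 0.8) -> float:
    """Days needed to move the data over a network link.

    efficiency is an assumed fraction of the nominal link speed actually achieved
    (protocol overhead, contention, retries).
    """
    bits = petabytes * BYTES_PER_PB * 8
    usable_bits_per_s = link_gbit_per_s * 1e9 * efficiency
    return bits / usable_bits_per_s / SECONDS_PER_DAY


def egress_cost(petabytes: float, price_per_gb: float) -> float:
    """Egress charge assuming one flat per-GB rate (real pricing is often tiered)."""
    gigabytes = petabytes * BYTES_PER_PB / 1e9
    return gigabytes * price_per_gb


if __name__ == "__main__":
    # Hypothetical scenario: 5 PB over a 10 Gbit/s link at a made-up $0.05/GB.
    print(f"{transfer_days(5, 10):.1f} days, ${egress_cost(5, 0.05):,.0f}")
```

Under those assumptions, 5 PB takes almost 58 days of continuous transfer and about $250,000 in egress charges alone, before counting any re-ingest or parallel-running costs on the destination side.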

This can create a form of lock-in to a platform.

Dealing with enterprises, some of which have large amounts of data alongside the compute capacity they need, has made me consider where private clouds can be useful.

Whether you’re a startup or an enterprise, it’s useful to be aware of the impact of data gravity.

Code Engineered