ResearchDrive - Understanding Storage Units GB vs. GiB

This document explains the differences between data storage units.

One very confusing aspect of data storage is the units that storage usage is reported in. Sometimes GB really means gigabytes (base-10), and sometimes GB is used for gibibytes (base-2), which should be abbreviated as GiB.

Summary (TL;DR)

Storage is sold in GB (base-10), but file system sizes are reported in GiB (base-2). Those are not the same thing, and if you assume they are, any cost calculations you do will not be correct.

So, what is a GiB?

A GiB, or gibibyte, is 2^30 or 1,073,741,824 bytes, compared to a GB, or gigabyte, which is 10^9 or 1,000,000,000 (one billion) bytes. A GiB is a base-2 unit, meaning it is 2 to the power of something. Most humans are used to thinking in base-10 since most of us have 10 fingers and 10 toes. Base-10 units are based on 10 to the power of something. The metric system is a base-10 system and uses SI base-10 prefixes: 1 kilometer = 1,000 meters and 1 millimeter = 0.001 meters.
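To make the definitions above concrete, here is a minimal Python sketch of the conversion between the two units. The constant and function names are made up for this illustration and are not part of any ResearchDrive tool.

```python
# Illustrative names only; these are assumptions for this example.
GB = 10**9    # gigabyte, base-10
GIB = 2**30   # gibibyte, base-2


def gb_to_gib(gb: float) -> float:
    """Convert base-10 gigabytes to base-2 gibibytes."""
    return gb * GB / GIB


def gib_to_gb(gib: float) -> float:
    """Convert base-2 gibibytes to base-10 gigabytes."""
    return gib * GIB / GB


print(GIB - GB)                  # 73741824 bytes separate one GiB from one GB
print(round(gb_to_gib(100), 2))  # 93.13 (100 GB expressed in GiB)
print(round(gib_to_gb(100), 2))  # 107.37 (100 GiB expressed in GB)
```

The same amount of data always yields a smaller number in GiB than in GB, because each GiB holds more bytes.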

Bytes and Bits - Data Storage vs. Network Speed

The base unit for data storage is bytes and the abbreviation is B (upper case). 1 byte of data can store one character in this document. To make things even more confusing, network speeds are stated in bits. 8 bits = 1 byte. The abbreviation for bits is b (lower case). You can safely ignore bits for the rest of this discussion, but know that MB/s is not the same as Mb/s. 1 MB/s is 8 times as fast as 1 Mb/s.
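The factor of 8 between bits and bytes can be sketched as follows. This is only an illustration: the function names are hypothetical, the link rate is assumed to use base-10 prefixes (as is conventional for network speeds), and protocol overhead is ignored.

```python
BITS_PER_BYTE = 8


def megabits_to_megabytes_per_s(mbps: float) -> float:
    """Convert a rate in Mb/s (megabits) to MB/s (megabytes)."""
    return mbps / BITS_PER_BYTE


def transfer_seconds(size_bytes: int, link_mbps: float) -> float:
    """Ideal time to move size_bytes over a link rated in Mb/s."""
    bytes_per_second = link_mbps * 1_000_000 / BITS_PER_BYTE
    return size_bytes / bytes_per_second


print(megabits_to_megabytes_per_s(100))  # 12.5 (a 100 Mb/s link moves 12.5 MB/s)
print(transfer_seconds(10**9, 100))      # 80.0 (seconds to move 1 GB at 100 Mb/s)
```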

Even though humans are used to thinking in base-10, computers operate in binary or base-2 at the lowest level. Because of that, some resources like memory address spaces make more sense to represent in base-2. Long ago, in 1984, someone (probably at IBM) decided to make computer file systems show sizes in base-2 to go along with the memory. That seemed like a good idea at the time, and when the units were smaller the size difference was negligible anyway. Now that our storage systems are much, much larger this unit difference is causing confusion.

Base-10/Base-2 Differences

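You've probably heard of most of the units in the first table, which are in base-10. The second table is in base-2. You can see that the number of bytes is similar at the small end of the scale, but the units get further apart the farther you go up the scale. The Difference column shows how much smaller or larger each unit is than its counterpart in the other table. The names and abbreviations for the base-2 units were introduced by the International Electrotechnical Commission (IEC) in 1998.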

Base-10 Chart

Unit       Factor   Bytes                       Abbreviation   Difference vs. base-2 unit
byte       10^0     1                           B              0%
kilobyte   10^3     1,000                       KB             -2.3%
megabyte   10^6     1,000,000                   MB             -4.6%
gigabyte   10^9     1,000,000,000               GB             -6.9%
terabyte   10^12    1,000,000,000,000           TB             -9.1%
petabyte   10^15    1,000,000,000,000,000       PB             -11.2%
exabyte    10^18    1,000,000,000,000,000,000   EB             -13.3%

Base-2 Chart

Unit       Factor   Bytes                       Abbreviation   Difference vs. base-10 unit
byte       2^0      1                           B              0%
kibibyte   2^10     1,024                       KiB            +2.4%
mebibyte   2^20     1,048,576                   MiB            +4.9%
gibibyte   2^30     1,073,741,824               GiB            +7.4%
tebibyte   2^40     1,099,511,627,776           TiB            +10.0%
pebibyte   2^50     1,125,899,906,842,624       PiB            +12.6%
exbibyte   2^60     1,152,921,504,606,846,976   EiB            +15.3%
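
The percentages in the Difference columns can be recomputed from the factors alone. The following Python sketch shows one way to do it; the helper name and the output layout are assumptions made for this example.

```python
# Prefix pairs in increasing size: (base-10 name, base-2 name).
PREFIXES = [
    ("kilo", "kibi"), ("mega", "mebi"), ("giga", "gibi"),
    ("tera", "tebi"), ("peta", "pebi"), ("exa", "exbi"),
]


def differences():
    """Return (si, iec, base-10 vs base-2 %, base-2 vs base-10 %) per prefix."""
    rows = []
    for n, (si, iec) in enumerate(PREFIXES, start=1):
        decimal = 10 ** (3 * n)   # bytes in the base-10 unit
        binary = 2 ** (10 * n)    # bytes in the base-2 unit
        smaller = (decimal / binary - 1) * 100  # how much smaller base-10 is
        larger = (binary / decimal - 1) * 100   # how much larger base-2 is
        rows.append((si, iec, round(smaller, 1), round(larger, 1)))
    return rows


for si, iec, smaller, larger in differences():
    print(f"{si}byte vs. {iec}byte: {smaller:+.1f}% / {larger:+.1f}%")
```

The gap grows by roughly 2.4 percentage points with each step up the scale, which is why the difference that was negligible for kilobytes matters for terabytes and petabytes.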

Buying Storage

When you buy storage for your computer, the vendor almost always sells it in base-10 units. If you order a 2 TB hard drive from Amazon, once you install it in your computer it appears to be only 1.8 TB. What happened? You can see from this screenshot that the computer knows the drive is about 2 trillion bytes, but once the operating system converts that to base-2, it looks like the drive has less space because of the unit. So your hard drive really is about 2 TB (in this case slightly larger), but the file system only has 1.8 TiB of free space. Windows is just displaying the wrong unit abbreviation.

2,000,263,573,504 / 1,099,511,627,776 = 1.8192 TiB
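
The same byte count can be rendered in either unit system. Here is a small Python sketch of such a formatter; the function name `format_bytes` and its two-decimal output are choices made for this example, not how any particular operating system does it.

```python
def format_bytes(n: int, binary: bool = False) -> str:
    """Render a byte count using base-10 (default) or base-2 units."""
    if binary:
        base, units = 1024, ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    else:
        base, units = 1000, ["B", "KB", "MB", "GB", "TB", "PB", "EB"]
    value = float(n)
    for unit in units:
        # Stop at the first unit that keeps the value below the base,
        # or at the largest unit available.
        if value < base or unit == units[-1]:
            return f"{value:.2f} {unit}"
        value /= base


drive_bytes = 2_000_263_573_504  # the drive from the example above
print(format_bytes(drive_bytes))               # 2.00 TB
print(format_bytes(drive_bytes, binary=True))  # 1.82 TiB
```

Both lines describe exactly the same drive; only the unit changes.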

Screenshot: 2TB Hard Drive

Storage Reporting and Billing

Since storage in the industry is almost universally sold in base-10 units, the storage team has decided to sell storage in base-10 units as well. Most of our pricing is calculated in $/GB/year. This makes it much easier to estimate costs, since people doing base-10 calculations in their heads just multiply or divide by 1,000. Our storage costs have been calculated using GB. If we were to start selling by the GiB, the price could be off by 7.4%, which is considerable when we're selling petabytes of storage. We collect and store the occupancy data in bytes, so usage could be displayed in base-2 units if need be.
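
To see what that difference does to a cost estimate, here is a rough Python sketch. The price of $0.10/GB/year is purely hypothetical and chosen for easy arithmetic; it is not an actual rate, and the function name is made up for this example.

```python
HYPOTHETICAL_PRICE_PER_GB_YEAR = 0.10  # assumption for illustration, not a real rate


def annual_cost(size_bytes: int, price_per_gb_year: float) -> float:
    """Yearly cost when billing is per base-10 GB."""
    return size_bytes / 10**9 * price_per_gb_year


# A file system that reports "100 TB" but really means 100 TiB.
actual_bytes = 100 * 2**40
naive_bytes = 100 * 10**12  # what you get if you assume base-10 terabytes

print(round(annual_cost(actual_bytes, HYPOTHETICAL_PRICE_PER_GB_YEAR), 2))  # 10995.12
print(round(annual_cost(naive_bytes, HYPOTHETICAL_PRICE_PER_GB_YEAR), 2))   # 10000.0
```

At the terabyte scale the naive estimate comes out about 10% low, matching the TiB row of the base-2 chart.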

Isilon Displays Quotas in Base-2

Our Dell PowerScale (Isilon) storage system, which is the back end for the Shared Drive, ResearchDrive, and Restricted Drive services, uses quotas to limit storage usage. Isilon uses base-2 units for quotas but doesn't abbreviate them correctly. So if your quota is listed as 25 TB, that's really 25 TiB, or 27,487,790,694,400 bytes. Since our reporting and billing system is based on bytes and base-10 units, you may see a difference between what your bill says and what the quota reports to the operating system.
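
The quota arithmetic above can be checked with a few lines of Python. The function name is hypothetical; the sketch only assumes, as described above, that a quota displayed in "TB" is really a TiB value.

```python
TIB = 2**40   # tebibyte, base-2
TB = 10**12   # terabyte, base-10


def displayed_quota_to_bytes(displayed_tb: int) -> int:
    """Bytes behind a quota that is labeled TB but measured in TiB."""
    return displayed_tb * TIB


quota_bytes = displayed_quota_to_bytes(25)
print(f"{quota_bytes:,} bytes")      # 27,487,790,694,400 bytes
print(round(quota_bytes / TB, 2))    # 27.49 (the same quota in base-10 TB)
```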

ResearchDrive

ResearchDrive PIs get 25 TB for free, which is covered by the VCRGE. Since the quotas in Isilon are set using TiB, we are going to use that as the basis for the free disk space. ResearchDrive users will get 27,487,790,694,400 bytes for free before any charges are applied. Storage fees beyond the first 25 TiB will be charged by the GB (base-10), just as all other storage is billed.
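
As a rough sketch of how the free allocation and base-10 billing combine, the Python below computes the billable amount for a given usage. The function name is made up, and details such as rounding or how usage is sampled are not covered here; those are assumptions left out of this illustration.

```python
FREE_BYTES = 25 * 2**40  # 25 TiB free allocation = 27,487,790,694,400 bytes


def billable_gb(used_bytes: int) -> float:
    """Base-10 GB above the free allocation (0 if usage is within it)."""
    return max(0, used_bytes - FREE_BYTES) / 10**9


print(billable_gb(27 * 10**12))          # 0.0 (27 TB still fits within 25 TiB)
print(round(billable_gb(30 * 2**40), 2)) # 5497.56 (30 TiB used, 5 TiB billable)
```

Note that 5 TiB over the free allocation is billed as about 5,498 GB, not 5,000 GB, because the overage is measured in bytes and billed in base-10 units.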

Questions?

We know this is confusing, so we apologize for that in advance. We are trying to be as transparent as possible, while being consistent with the storage industry. Please email storage@doit.wisc.edu if you have any questions.