Ext3 file system overhead disclosed (Part 2)
In the previous part we found out that the disk is actually smaller than we thought when we bought it. Now we will investigate the overhead of the ext3 file system itself: where the space goes and why.
To start, we will look at our fdisk output again, this time from a slightly different point of view (with the -u option).
-u When listing partition tables, give sizes in sectors instead of cylinders.
# fdisk -l -u /dev/sdh
Disk /dev/sdh: 999.9 GB, 999989182464 bytes
255 heads, 63 sectors/track, 121575 cylinders, total 1953103872 sectors
Units = sectors of 1 * 512 = 512 bytes
Device Boot Start End Blocks Id System
/dev/sdh1 63 1953102374 976551156 83 Linux
From this output we can see that
Total disk size = 999989182464
and Total disk sectors = 1953103872
We can also calculate that Disk sectors used for partition = 1953102374 - 63 + 1 = 1953102312
Interesting number! 1953103872 - 1953102312 = 1560 sectors are lost, which is 1560 * 512 = 798720 bytes. On another 500G drive of mine this value was about 7 Mebibytes. I do not know the exact reason for this; I suppose the usable space is simply rounded to a cylinder boundary (the first 63 sectors, one track, are also left unused before the partition starts).
So the Partition size is 1953102312 * 512 = 999988383744 bytes
Or, if we count in file system blocks (which are 4k by default):
Partition 4k blocks = 999988383744 / 4096 = 244137789
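The sector and block arithmetic above can be re-checked with a short Python sketch. The numbers are copied from the fdisk output; the variable names are my own. The cylinder-boundary check only illustrates the "roundup" guess and is not a statement about how fdisk actually decides where a partition ends.

```python
# Values copied from the `fdisk -l -u /dev/sdh` output above.
SECTOR_SIZE = 512
total_sectors = 1953103872
part_start, part_end = 63, 1953102374
heads, sectors_per_track, cylinders = 255, 63, 121575

part_sectors = part_end - part_start + 1
lost_sectors = total_sectors - part_sectors
sectors_per_cylinder = heads * sectors_per_track

print("partition sectors:", part_sectors)
print("lost sectors:", lost_sectors, "=", lost_sectors * SECTOR_SIZE, "bytes")

# Does the partition end on the last sector of the last whole cylinder?
ends_on_cylinder = (part_end + 1 == cylinders * sectors_per_cylinder)
print("ends on a cylinder boundary:", ends_on_cylinder)

part_bytes = part_sectors * SECTOR_SIZE
blocks_4k, leftover_bytes = divmod(part_bytes, 4096)
print("partition bytes:", part_bytes)
print("4k blocks:", blocks_4k, "leftover bytes:", leftover_bytes)
```

With these inputs the partition holds a whole number of 4k blocks, and the lost sectors split into 63 before the partition and 1497 after it.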
This is exactly the block count that mke2fs reported during the format:
# mke2fs /dev/sdh1
mke2fs 1.39 (29-May-2006)
Filesystem label=
OS type: Linux
Block size=4096 (log=2)
Fragment size=4096 (log=2)
122077184 inodes, 244137789 blocks
12206889 blocks (5.00%) reserved for the super user
First data block=0
Maximum filesystem blocks=4294967296
7451 block groups
32768 blocks per group, 32768 fragments per group
16384 inodes per group
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
102400000, 214990848
Writing inode tables: done
Creating journal (32768 blocks): done
Writing superblocks and filesystem accounting information: done
This filesystem will be automatically checked every 30 mounts or
180 days, whichever comes first. Use tune2fs -c or -i to override.
We know from the theory that an ext3 file system consists of block groups.

This is done to optimize usage of the file system: it reduces fragmentation and increases speed. In the original ext2 layout each block group consisted of a Superblock, Group descriptors, a Data Block bitmap, an Inode bitmap, and an Inode table.
From the format output we see that we have only 18 superblock backups.
Superblock backups stored on blocks:
32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
102400000, 214990848
Each such block group is “full”, and all the remaining groups consist only of a Data Block bitmap, an Inode bitmap, and an Inode table. This is possible because every superblock copy and every group descriptor copy holds information about the whole file system. This feature was not present in the original ext2 layout, where every block group carried its own copy and therefore had the same number of available data blocks and inodes.
So Total number of superblocks = 1 primary + 18 backups = 19
I supposed that the following "-O feature" is responsible for this:
sparse_super
Create a filesystem with fewer superblock backup copies (saves space on large filesystems).
However, if you omit it, mke2fs will still create only 18 backups. I guess it is enabled by default somewhere.
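The list of 18 backup locations is not arbitrary. As far as I understand the sparse_super rule, superblock copies are kept in groups 0 and 1 and in groups whose number is a power of 3, 5, or 7. The sketch below is my own illustration of that rule (the function name is hypothetical, not taken from e2fsprogs) and compares the result with the mke2fs output above.

```python
def sparse_super_groups(group_count):
    """Group numbers holding a superblock copy: 0, 1, and powers of 3, 5, 7."""
    groups = {0}
    if group_count > 1:
        groups.add(1)
    for base in (3, 5, 7):
        g = base
        while g < group_count:
            groups.add(g)
            g *= base
    return sorted(groups)

BLOCKS_PER_GROUP = 32768
GROUP_COUNT = 7451  # from the mke2fs output

groups = sparse_super_groups(GROUP_COUNT)
# Group 0 holds the primary superblock; the rest are backups.
backup_blocks = [g * BLOCKS_PER_GROUP for g in groups if g != 0]

reported = [
    32768, 98304, 163840, 229376, 294912, 819200, 884736, 1605632, 2654208,
    4096000, 7962624, 11239424, 20480000, 23887872, 71663616, 78675968,
    102400000, 214990848,
]
print(len(backup_blocks), backup_blocks == reported)  # prints: 18 True
```

So for 7451 block groups the rule yields exactly the 18 backups that mke2fs listed, which is why the count does not look configurable.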
Let's check how many block groups we have.
# dumpe2fs /dev/sdh1
...
Block count: 244137789
Number of Block groups = Block count / blocks per group.
Blocks per group is fixed when the file system is created. The block bitmap, which describes all the blocks in the group with one bit per block, must fit into 1 block, hence the number of blocks per group is 8 * block size.
So...
Number of Block groups = Block count / ( 8 * block size ) = 244137789 / ( 8 * 4096 ) = 7450.494049072, which means that we have 7450 full-sized block groups and one small group. The size of the small group is 0.494049072 * 32768 = 16188 blocks, and 0.999991296 blocks are lost (actually they are not, as we will see later).
This corresponds exactly to what we see in the mke2fs output (7451 block groups).
The size of 1 block group = blocks per group * block size = 32768 * 4096 = 128M.
How much usable free space do we actually have?
Each block group with a superblock (19 in total) has
Data Blocks = Blocks per group - (Superblock + Group descriptors + Reserved GDT blocks + Data Block bitmap + Inode bitmap + Inode table) =
= 32768 - (1 + 59 + 965 + 1 + 1 + 512) = 31229
Each full-sized block group without a superblock (7450 - 19 = 7431 groups) has
Data Blocks = Blocks per group - (Data Block bitmap + Inode bitmap + Inode table) =
= 32768 - (1 + 1 + 512) = 32254
And the last small group has
Data Blocks = Blocks in the group - (Data Block bitmap + Inode bitmap + Inode table) =
= 16188 - (1 + 1 + 512) = 15674
So total Data blocks available = 19 * 31229 + 7431 * 32254 + 15674 = 240288499 blocks = 984,221,691,904 bytes.
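The per-group bookkeeping above is easy to get wrong by hand, so here is a small Python sketch that repeats it. All inputs come from the mke2fs and dumpe2fs output quoted in this article; the variable names are my own, and the last-group size of 16188 blocks is the estimate used in the text.

```python
BLOCK_SIZE = 4096
BLOCKS_PER_GROUP = 32768
FULL_GROUPS = 7450
SUPERBLOCK_GROUPS = 19      # 1 primary + 18 backups
LAST_GROUP_BLOCKS = 16188   # estimate used in the text

# Metadata present in every group: block bitmap, inode bitmap, inode table.
bitmaps_and_inodes = 1 + 1 + 512
# Extra metadata in groups with a superblock copy:
# superblock, group descriptors, reserved GDT blocks.
sb_and_gdt = 1 + 59 + 965

data_with_sb = BLOCKS_PER_GROUP - (sb_and_gdt + bitmaps_and_inodes)
data_without_sb = BLOCKS_PER_GROUP - bitmaps_and_inodes
data_last = LAST_GROUP_BLOCKS - bitmaps_and_inodes

total_data_blocks = (
    SUPERBLOCK_GROUPS * data_with_sb
    + (FULL_GROUPS - SUPERBLOCK_GROUPS) * data_without_sb
    + data_last
)
print(data_with_sb, data_without_sb, data_last)
print(total_data_blocks, "blocks =", total_data_blocks * BLOCK_SIZE, "bytes")
```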
What we see in df
# df /dev/sdh1
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sdh1 961227340 73364 912326420 1% /usr/hosting_files8
Total fs size according to df = 961227340 * 1024 = 984,296,796,160 bytes
df total size - total Data blocks = 984296796160 - 984221691904 = 75104256 bytes
So df reports a total size that is 75104256 bytes larger than our count, which is (/ 4096 =) 18336 blocks.
Probably df does not count the Reserved GDT blocks as overhead.
Reserved GDT blocks = 965 * 19 = 18335 blocks, which is 1 block smaller than the difference seen in df.
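The comparison with df can be written out as a short Python sketch. The inputs are the df total from the output above and the data block count calculated earlier; the variable names are my own.

```python
BLOCK_SIZE = 4096

df_total_bytes = 961227340 * 1024   # "1K-blocks" column of df
data_bytes = 240288499 * BLOCK_SIZE # data blocks counted by hand

diff_bytes = df_total_bytes - data_bytes
diff_blocks, leftover = divmod(diff_bytes, BLOCK_SIZE)

reserved_gdt_total = 965 * 19       # reserved GDT blocks in the 19 groups
unexplained_blocks = diff_blocks - reserved_gdt_total

print("difference:", diff_bytes, "bytes =", diff_blocks, "blocks")
print("reserved GDT blocks:", reserved_gdt_total)
print("still unexplained:", unexplained_blocks, "block(s)")
```

The difference is a whole number of blocks, and all but one of them match the reserved GDT blocks.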
A couple of questions:
1) Why is the size of the group descriptors 59 blocks?
2) Why is the size of the Reserved GDT blocks 965 blocks (Reserved GDT blocks: 965)?
3) Why is the size of the inode table 512 blocks (Inode blocks per group: 512)?
4) Where is that one free block?
5) And where is our journal?
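For questions 1 and 3, a plausible explanation can be sketched in Python. It rests on two assumptions that are not shown in the output quoted here: a group descriptor of 32 bytes and an on-disk inode of 128 bytes. Both are the sizes I would expect for a default ext3 file system created by this mke2fs version, and the real inode size can be checked in the "Inode size" line of tune2fs -l.

```python
BLOCK_SIZE = 4096
GROUP_COUNT = 7451          # from the mke2fs output
INODES_PER_GROUP = 16384    # from the mke2fs output
GROUP_DESC_SIZE = 32        # assumption: bytes per group descriptor
INODE_SIZE = 128            # assumption: bytes per on-disk inode

# One descriptor per group, rounded up to whole blocks.
gdt_bytes = GROUP_COUNT * GROUP_DESC_SIZE
gdt_blocks = (gdt_bytes + BLOCK_SIZE - 1) // BLOCK_SIZE

# The inode table of one group.
inode_table_blocks = INODES_PER_GROUP * INODE_SIZE // BLOCK_SIZE

# Cross-check against the inode total reported by mke2fs.
total_inodes = GROUP_COUNT * INODES_PER_GROUP

print("group descriptor blocks:", gdt_blocks)
print("inode table blocks per group:", inode_table_blocks)
print("total inodes:", total_inodes)
```

Under these assumptions the sketch gives 59 descriptor blocks and 512 inode table blocks per group, the same values as in the dumpe2fs output, and the inode total matches the 122077184 inodes reported by mke2fs.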
Answers (Updated):
2) The reserved GDT blocks are caused by this "-O feature":
resize_inode
Reserve space so the block group descriptor table may grow in the
future. Useful for online resizing using resize2fs. By default
mke2fs will attempt to reserve enough space so that the
filesystem may grow to 1024 times its initial size. This can be
changed using resize extended option.
So the current size of the group descriptors + Reserved GDT blocks should equal 1024.
In our case 59 + 965 = 1024, so it is correct.
# tune2fs -l /dev/sdh1
...
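Reserved GDT blocks: 965
4) Where is that one free block?
I suppose I made a wrong assumption when I said that the size of the last small group is 0.494049072 * 32768 = 16188 blocks and that 0.999991296 blocks are lost. The product is really 16188.99999..., an artifact of the truncated decimal fraction. Counted in whole blocks, the remainder is 244137789 - 7450 * 32768 = 16189, and with a last group of 16189 blocks everything adds up.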
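Integer arithmetic avoids the rounding problem entirely. The Python sketch below redoes the group split with divmod and then repeats the comparison with df, using only numbers that already appear in this article; the variable names are my own.

```python
BLOCK_COUNT = 244137789
BLOCKS_PER_GROUP = 32768

full_groups, last_group_blocks = divmod(BLOCK_COUNT, BLOCKS_PER_GROUP)
print(full_groups, "full groups, last group has", last_group_blocks, "blocks")

# Data blocks again, now with the exact size of the last group.
per_group_meta = 1 + 1 + 512            # bitmaps + inode table
sb_meta = 1 + 59 + 965                  # superblock + GDT + reserved GDT
sb_groups = 19
data_blocks = (
    sb_groups * (BLOCKS_PER_GROUP - sb_meta - per_group_meta)
    + (full_groups - sb_groups) * (BLOCKS_PER_GROUP - per_group_meta)
    + (last_group_blocks - per_group_meta)
)

df_total_blocks = 961227340 * 1024 // 4096  # df total in 4k blocks
print("df total - data blocks =", df_total_blocks - data_blocks)
print("reserved GDT blocks    =", 965 * sb_groups)
```

With the last group at 16189 blocks, the gap between df and the hand count is exactly the 18335 reserved GDT blocks, so no block is missing.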
5) Where is the journal stored?
The journal is kept inside the file system itself, so we can treat it as an ordinary file (32768 * 4096 = 128M), which df will show as already USED.
# df /dev/sdh1
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sdh1 961227340 204572 912195212 1% /usr/hosting_files8
Let's turn off the journal and see how much space we will have:
# umount /usr/hosting_files8
# tune2fs -O ^has_journal /dev/sdh1
# mount /dev/sdh1 /usr/hosting_files8
# df /dev/sdh1
Filesystem 1K-blocks Used Available Use% Mounted on
/dev/sdh1 961227340 73364 912326420 1% /usr/hosting_files8
204572K - 73364K = 131208K, or about 128M
This proves the statement about the journal. But why are these 73364 1K-blocks Used? This is another question, to which I don't have an answer yet. If I am not too lazy, I will figure it out.
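The journal check can be reproduced with a few lines of Python. The inputs are the two df outputs and the journal size reported by mke2fs; the variable names are my own. The sketch only measures the small remainder beyond 128M and does not try to explain it.

```python
BLOCK_SIZE = 4096
JOURNAL_BLOCKS = 32768                       # "Creating journal (32768 blocks)"

journal_kib = JOURNAL_BLOCKS * BLOCK_SIZE // 1024

used_with_journal = 204572                   # df "Used", in 1K-blocks
used_without_journal = 73364
avail_with_journal = 912195212               # df "Available", in 1K-blocks
avail_without_journal = 912326420

used_diff_kib = used_with_journal - used_without_journal
avail_diff_kib = avail_without_journal - avail_with_journal

print("journal size:", journal_kib, "KiB")
print("Used difference:", used_diff_kib, "KiB")
print("Available difference:", avail_diff_kib, "KiB")
print("beyond the journal data:", used_diff_kib - journal_kib, "KiB")
```

Both columns change by the same amount, which is the 128M journal plus a small remainder of 136K.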
Let's summarize the total ext3 file system overhead.
Total default ext3 file system overhead = number of block groups with superblocks * (Superblock + Group descriptors + Reserved GDT blocks + Data Block bitmap + Inode bitmap + Inode table) + number of block groups without superblocks * (Data Block bitmap + Inode bitmap + Inode table).
In my case this equals 19 * 1539 + 7432 * 514 = 3849289 blocks, or ~ 14.7 Gibibytes.
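The summary formula can be evaluated directly in Python. The sketch uses the group counts and per-group metadata sizes from this article, with my own variable names. It leaves out the journal, as the formula does, and it counts the reserved GDT blocks as overhead.

```python
BLOCK_SIZE = 4096
TOTAL_GROUPS = 7451
SB_GROUPS = 19                       # groups with a superblock copy
OTHER_GROUPS = TOTAL_GROUPS - SB_GROUPS

per_group_meta = 1 + 1 + 512         # block bitmap, inode bitmap, inode table
sb_meta = 1 + 59 + 965               # superblock, GDT, reserved GDT blocks

overhead_blocks = (
    SB_GROUPS * (sb_meta + per_group_meta)
    + OTHER_GROUPS * per_group_meta
)
overhead_gib = overhead_blocks * BLOCK_SIZE / 2**30

print(overhead_blocks, "blocks")
print(round(overhead_gib, 2), "GiB")
```

With those inputs the formula evaluates to 3849289 blocks, which is also the total block count of 244137789 minus the 240288500 data blocks obtained when the last group is 16189 blocks long.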
In the next article I will do a little experiment and try to reduce this overhead.
Links:
Understanding the Linux Kernel, 3rd Edition, By Daniel P. Bovet, Marco Cesati
The EXT2 File System, Presented by: S. Arun Nair, Abhinav Golas
