Skip to content

Performance optimization for size info in NTFS#33

Merged
LordMike merged 2 commits into
DiscUtils:masterfrom
ahdde:ntfs_performance
Jul 11, 2017
Merged

Performance optimization for size info in NTFS#33
LordMike merged 2 commits into
DiscUtils:masterfrom
ahdde:ntfs_performance

Conversation

@zivillian

Copy link
Copy Markdown
Contributor

I've found a major performance penalty in the space usage calculation of NTFS. Currenty the ClusterBitmap is read byte by byte from the underlying disk. This result in runtimes of up to one minute for large filesystems (>1TiB fs with a bitmap >50MiB).

I've changed the implementation to read the bitmap in blocks of 4K, which increased the performance by > 99% (in my test from ~1200ms down to ~45ms).

As a second optimization I've added a BitCounter which uses a precalculated lookup table to further speed up the size calculation from ~45ms down to ~15ms.

@LordMike LordMike left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great contribution. We can probably improve even more, but it's a great start. 👍

Please see the comments I've made.

Comment thread Library/DiscUtils.Ntfs/Bitmap.cs Outdated
return 0;
}

public int GetBytes(byte[] buffer, long index, int count)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We CANNOT have a public method formed like this. I would assume you mean the Index of the buffer, and not the index of the underlying structure.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Either:

  • make it private/internal
  • change it so that it takes an offset for the buffer as well (like Array.Copy())
  • some other way, so that we keep within the observed standards in .NET API's .. :)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also. Name the offset in buffer just that: offset. So we're consistent with others. :)

@zivillian zivillian Jul 11, 2017

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This signature was actually a bad copy of GetByte. I've added offset, made both internal and reordered the parameters to be more clear.


namespace DiscUtils.Streams
{
public static class BitCounter

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we add a little documentation on the purpose of this class? .. Much of the API does not have docmentation, but it'd be nice to know that the purpose of this class is (set) bit counting. Maybe on the public methods too.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

{
var end = start + count;
if (end > values.Length)
return 0;

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please throw exceptions for exceptional states.

@zivillian zivillian Jul 11, 2017

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done - but I'm actually unsure which exception and message fits best - I first thought about ArgumentOutOfRangeException, but there's no single argument, which is out of range.

return _lookupTable[value];
}

public static long Count(byte[] values, int start, int count)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we name them offset and count?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

done

@LordMike LordMike merged commit 9af6581 into DiscUtils:master Jul 11, 2017
@zivillian zivillian deleted the ntfs_performance branch July 11, 2017 18:41
LordMike added a commit that referenced this pull request Aug 16, 2017
Includes PRs:
#17, #18, #21, #22, #23, #27, #30, #31, #33, #34, #35, #36, #38, #39, #40, #41, #48, #51, #52, #55, #60
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants