Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement O(n) sort for inverted selection #1904

Closed
wants to merge 1 commit into from

Conversation

jarkkojs
Copy link

No description provided.

@jarun
Copy link
Owner

jarun commented Jun 28, 2024

@KlzXS can you please review this?

@jarun
Copy link
Owner

jarun commented Jun 28, 2024

@jarkkojs CI fails with the following:

########## clang-tidy-15 ##########
72 warnings generated.
101 warnings generated.
101 warnings generated.
/root/project/src/nnn.c:1761:2: error: Assigned value is garbage or undefined [clang-analyzer-core.uninitialized.Assign,-warnings-as-errors]
        size_t startpos = (size_t)marked[i].startpos;
        ^
/root/project/src/nnn.c:1788:20: note: Uninitialized value stored to field 'startpos'
        selmark *marked = malloc(2 * nselected * sizeof(selmark));
                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/root/project/src/nnn.c:1790:6: note: Assuming 'marked' is non-null
        if (!marked) {
            ^~~~~~~
/root/project/src/nnn.c:1790:2: note: Taking false branch
        if (!marked) {
        ^
/root/project/src/nnn.c:1796:14: note: Assuming 'i' is >= 'ndents'
        for (i = 0; i < ndents; ++i) {
                    ^~~~~~~~~~
/root/project/src/nnn.c:1796:2: note: Loop condition is false. Execution continues on line 1844
        for (i = 0; i < ndents; ++i) {
        ^
/root/project/src/nnn.c:1844:2: note: Calling 'sort_array'
        sort_array(marked, &marked[nselected], nselected, sizeof(selmark),
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/root/project/src/nnn.c:1736:2: note: Loop condition is true.  Entering loop body
        for (b = 0; b < 8; b++) {
        ^
/root/project/src/nnn.c:1740:15: note: Assuming 'i' is < 'nelem'
                for (i = 0; i < nelem; i++)
                            ^~~~~~~~~
/root/project/src/nnn.c:1740:3: note: Loop condition is true.  Entering loop body
                for (i = 0; i < nelem; i++)
                ^
/root/project/src/nnn.c:1741:8: note: Calling 'selmark_byte'
                        cnt[byte(data, i, b)]++;
                            ^~~~~~~~~~~~~~~~
/root/project/src/nnn.c:1761:2: note: Assigned value is garbage or undefined
        size_t startpos = (size_t)marked[i].startpos;
        ^                 ~~~~~~~~~~~~~~~~~~~~~~~~~~
/root/project/src/nnn.c:1788:27: error: performing an implicit widening conversion to type 'unsigned long' of a multiplication performed in type 'int' [bugprone-implicit-widening-of-multiplication-result,-warnings-as-errors]
        selmark *marked = malloc(2 * nselected * sizeof(selmark));
                                 ^
/root/project/src/nnn.c:1788:27: note: make conversion explicit to silence this warning
        selmark *marked = malloc(2 * nselected * sizeof(selmark));
                                 ^~~~~~~~~~~~~
                                 (unsigned long)( )
/root/project/src/nnn.c:1788:27: note: perform multiplication in a wider type
        selmark *marked = malloc(2 * nselected * sizeof(selmark));
                                 ^
                                 (long)
Suppressed 102 warnings (99 in non-user code, 3 NOLINT).
Use -header-filter=.* to display errors from all non-system headers. Use -system-headers to display errors from system headers as well.
2 warnings treated as errors

Exited with code exit status 1

@jarkkojs
Copy link
Author

@jarkkojs CI fails with the following:

########## clang-tidy-15 ##########
72 warnings generated.
101 warnings generated.
101 warnings generated.
/root/project/src/nnn.c:1761:2: error: Assigned value is garbage or undefined [clang-analyzer-core.uninitialized.Assign,-warnings-as-errors]
        size_t startpos = (size_t)marked[i].startpos;
        ^
/root/project/src/nnn.c:1788:20: note: Uninitialized value stored to field 'startpos'
        selmark *marked = malloc(2 * nselected * sizeof(selmark));
                          ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/root/project/src/nnn.c:1790:6: note: Assuming 'marked' is non-null
        if (!marked) {
            ^~~~~~~
/root/project/src/nnn.c:1790:2: note: Taking false branch
        if (!marked) {
        ^
/root/project/src/nnn.c:1796:14: note: Assuming 'i' is >= 'ndents'
        for (i = 0; i < ndents; ++i) {
                    ^~~~~~~~~~
/root/project/src/nnn.c:1796:2: note: Loop condition is false. Execution continues on line 1844
        for (i = 0; i < ndents; ++i) {
        ^
/root/project/src/nnn.c:1844:2: note: Calling 'sort_array'
        sort_array(marked, &marked[nselected], nselected, sizeof(selmark),
        ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
/root/project/src/nnn.c:1736:2: note: Loop condition is true.  Entering loop body
        for (b = 0; b < 8; b++) {
        ^
/root/project/src/nnn.c:1740:15: note: Assuming 'i' is < 'nelem'
                for (i = 0; i < nelem; i++)
                            ^~~~~~~~~
/root/project/src/nnn.c:1740:3: note: Loop condition is true.  Entering loop body
                for (i = 0; i < nelem; i++)
                ^
/root/project/src/nnn.c:1741:8: note: Calling 'selmark_byte'
                        cnt[byte(data, i, b)]++;
                            ^~~~~~~~~~~~~~~~
/root/project/src/nnn.c:1761:2: note: Assigned value is garbage or undefined
        size_t startpos = (size_t)marked[i].startpos;
        ^                 ~~~~~~~~~~~~~~~~~~~~~~~~~~
/root/project/src/nnn.c:1788:27: error: performing an implicit widening conversion to type 'unsigned long' of a multiplication performed in type 'int' [bugprone-implicit-widening-of-multiplication-result,-warnings-as-errors]
        selmark *marked = malloc(2 * nselected * sizeof(selmark));
                                 ^
/root/project/src/nnn.c:1788:27: note: make conversion explicit to silence this warning
        selmark *marked = malloc(2 * nselected * sizeof(selmark));
                                 ^~~~~~~~~~~~~
                                 (unsigned long)( )
/root/project/src/nnn.c:1788:27: note: perform multiplication in a wider type
        selmark *marked = malloc(2 * nselected * sizeof(selmark));
                                 ^
                                 (long)
Suppressed 102 warnings (99 in non-user code, 3 NOLINT).
Use -header-filter=.* to display errors from all non-system headers. Use -system-headers to display errors from system headers as well.
2 warnings treated as errors

Exited with code exit status 1

I'll check what is going on, will return soon'ish.

QuickSort produces O(nlog n) at best and O(n^2) at worst. Sort selection
markers with a trivial radix sort algorithm, which guarantees O(n) time.
As a consequence both time and space consumption will be linear.

Do not generalize the algorithm, as other sites most likely require a
comparison based sorting algorithm. For those sites the goal should be
to eliminate the worst case O(n^2) and have a steady O(nlog n)
performance.

Signed-off-by: Jarkko Sakkinen <[email protected]>
@jarkkojs
Copy link
Author

jarkkojs commented Jun 28, 2024

~/work/nnn sort-marked
❯ clang-tidy src/nnn.c
Error while trying to load a compilation database:
Could not auto-detect compilation database for file "src/nnn.c"
No compilation database found in /home/jarkko/work/nnn/src or any parent directory
fixed-compilation-database: Error while opening fixed database: No such file or directory
json-compilation-database: Error while opening JSON database: No such file or directory
Running without flags.
96 warnings generated.
Suppressed 99 warnings (96 in non-user code, 3 NOLINT).
Use -header-filter=.* to display errors from all non-system headers. Use -system-headers to display errors from system headers as well.

I also wrote a better commit message:

    O(n) sorting for selection markers: sort_selection()
    
    QuickSort produces O(nlog n) at best and O(n^2) at worst. Sort selection
    markers with a trivial radix sort algorithm, which guarantees O(n) time.
    As a consequence both time and space consumption will be linear.
    
    Do not generalize the algorithm, as other sites most likely require a
    comparison based sorting algorithm. For those sites the goal should be
    to eliminate the worst case O(n^2) and have a steady O(nlog n)
    performance.

E.g. heap sort is much better option for comparison cases... These set actual guarantees for order of the function (in both time and space dimensions). And further, they are somewhat de-facto choices for e.g. embedded system with low specs, which I guess fit tho this project quite well.

@N-R-K
Copy link
Collaborator

N-R-K commented Jun 28, 2024

QuickSort produces O(nlog n) at best and O(n^2) at worst.

We use the libc qsort function. Which, despite the confusing name, isn't required to be actually quicksort. In fact most libcs don't use quicksort for qsort, glibc uses mergesort for example. (See: https://nullprogram.com/blog/2016/09/05/)

I'll look into the patch. Though I'm skeptical of whether the performance improvement (if any) is actually worthwhile or not. So far I've never run into a situation where sorting was the bottleneck.

Copy link
Collaborator

@KlzXS KlzXS left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are mostly code-style related change requests. The core logic seems good.

I'll test it out and try to benchmark it, but as @N-R-K said I'm not expecting a real performance boost.

Comment on lines -1757 to +1786
selmark *marked = malloc(nselected * sizeof(selmark));
selmark *marked = calloc(nselected, sizeof(*marked));
selmark *scratch = calloc(nselected, sizeof(*scratch));
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we replacing the actual struct selmark with this dereferencing stuff? Is this even well-formed C? Initializing a pointer with a value that dereferences said pointer seems like an obvious no-no to me.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I just realized that it's changed to calloc(). Why? I don't think we need that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this even well-formed C? Initializing a pointer with a value that dereferences said pointer seems like an obvious no-no to me.

Yeah, that's fine. sizeof only needs to know the type of the operand in order to derive it's size. It's guaranteed not to evaluate anything (with the exception when the operand is a VLA, one more reason I hate VLAs).

For reference C11 6.5.3.4:

If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The more you know. I'm still more for sizeof(selmark), using an identifier during its initialization just looks off to me. Up to you @jarkkojs.

calloc() still unnecessary and I'd still move scrath into the sorting function.

void *tmp;
int b;

for (b = 0; b < 8; b++) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The magic 8 being?

Copy link
Collaborator

@N-R-K N-R-K Jun 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had intended to comment on this. I think it's assuming sizeof(size_t) == 8, but that's not always true, nnn should work fine on 32bit systems too.

EDIT: just noticed the comment down below, looks like you figured it out :D


for (i = 0; i < nelem; i++) {
value = (size_t)data[i].startpos;
j = idx[(value >> (b * 8)) & 0xFF]++;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only here do we learn what b actually is and where the magic 8 comes from. It's the index of the byte in value.

But this also assumes that size_t is actually 64-bit i.e. 8 bytes long. I think we support 32-bit and systems where size_t may not be 8 bytes.

Adjust the names and the limit for b to depend on the size of size_t.

Comment on lines +1733 to +1738
size_t cnt[256];
size_t idx[256];
size_t value;
size_t i, j;
void *tmp;
int b;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer these names to be a bit more descriptive. What is b? cnt and idx make sense if you know what radix sort does, so that's more or less fine. i is an iteration variable that's fine, but j isn't, it's used for something completely unrelated.

Also small nitpick, we know the type of tmp, it should be selmark * just as data and scratch are.

@@ -1850,6 +1881,7 @@ static void invertselbuf(const int pathlen)
}

free(marked);
free(scratch);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should scratch really be allocated here? It's only used in sort_marked() and I see no benefit in allocating it in this function. Might as well just allocate and free it there.


return ma->startpos - mb->startpos;
idx[0] = 0;
for (i = 1; i < 256; i++)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We mostly use ++i. Let's keep it consistent.

@jarun
Copy link
Owner

jarun commented Jun 29, 2024

@KlzXS do you really see any merit in this change? I would prefer to stick to the library function.

@KlzXS
Copy link
Collaborator

KlzXS commented Jun 29, 2024

There is some merit to it, yes. How much exactly is debatable.

On a modern system? Probably little to none. Unless you get nselected up to above 100k or a million I doubt you'd be able to perceive the difference as an actual delay in sorting (I personally consider 20ms or 50ms as the lower limit). Even more so considering the majority of the inversion process is actually spent searching for things in the selection buffer. Optimizing calls to findinsel() would yield more performance, but I don't see a clear way to do that without changing the way the selection buffer works.

On something like a router or another embedded low power device? It would be better due to the raw difference in processing power. A 100MHz CPU would be happier with radix sort. Though the searching overhead is still there.

@jarun
Copy link
Owner

jarun commented Jun 30, 2024

@jarkkojs can you please share some performance numbers (along with the processor details, if possible on a regular smartphone)?

@KlzXS
Copy link
Collaborator

KlzXS commented Jul 1, 2024

If you want to reduce the searching overhead, we could put all the filenames in a hash table and then in one pass through pselbuf check the hash table each time we encounter the current path. That would also remove the need for sorting altogether since we'd be adding entries to marked in the order they appear in pselbuf.

Again, I'm not sure how impactful that would be or if we should even bother improving this function at all other than for the sake of bragging rights for it being "optimal". I'll leave that for you to consider.

@jarkkojs
Copy link
Author

jarkkojs commented Jul 2, 2024

OK, I'll close this.

@jarkkojs jarkkojs closed this Jul 2, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants