Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement O(n) sort for inverted selection #1904

Closed
wants to merge 1 commit into from
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
48 changes: 40 additions & 8 deletions src/nnn.c
Original file line number Diff line number Diff line change
Expand Up @@ -1724,12 +1724,40 @@ static char *findinsel(char *startpos, int len)
}
}

static int markcmp(const void *va, const void *vb)
{
const selmark *ma = (selmark*)va;
const selmark *mb = (selmark*)vb;
/*
* Sort selection markers by their starting position in O(n) time. Use one byte
* radix and eight passes (starting from the LSB).
*/
static void sort_marked(selmark *data, selmark *scratch, size_t nelem)
{
size_t cnt[256];
size_t idx[256];
size_t value;
size_t i, j;
void *tmp;
int b;
Comment on lines +1733 to +1738
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd prefer these names to be a bit more descriptive. What is b? cnt and idx make sense if you know what radix sort does, so that's more or less fine. i is an iteration variable that's fine, but j isn't, it's used for something completely unrelated.

Also small nitpick, we know the type of tmp, it should be selmark * just as data and scratch are.


for (b = 0; b < 8; b++) {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The magic 8 being?

Copy link
Collaborator

@N-R-K N-R-K Jun 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had intended to comment on this. I think it's assuming sizeof(size_t) == 8, but that's not always true, nnn should work fine on 32bit systems too.

EDIT: just noticed the comment down below, looks like you figured it out :D

memset(cnt, 0, sizeof(cnt));
for (i = 0; i < nelem; i++) {
value = (size_t)data[i].startpos;
cnt[(value >> (b * 8)) & 0xFF]++;
}

return ma->startpos - mb->startpos;
idx[0] = 0;
for (i = 1; i < 256; i++)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We mostly use ++i. Let's keep it consistent.

idx[i] = idx[i - 1] + cnt[i - 1];

for (i = 0; i < nelem; i++) {
value = (size_t)data[i].startpos;
j = idx[(value >> (b * 8)) & 0xFF]++;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Only here do we learn what b actually is and where the magic 8 comes from. It's the index of the byte in value.

But this also assumes that size_t is actually 64-bit i.e. 8 bytes long. I think we support 32-bit and systems where size_t may not be 8 bytes.

Adjust the names and the limit for b to depend on the size of size_t.

memcpy(&scratch[j], &data[i], sizeof(selmark));
}

tmp = data;
data = scratch;
scratch = tmp;
}
}

/* scanselforpath() must be called before calling this */
Expand All @@ -1754,10 +1782,13 @@ static void invertselbuf(const int pathlen)
int i, nmarked = 0, prev = 0;
struct entry *dentp;
bool scan = FALSE;
selmark *marked = malloc(nselected * sizeof(selmark));
selmark *marked = calloc(nselected, sizeof(*marked));
selmark *scratch = calloc(nselected, sizeof(*scratch));
Comment on lines -1757 to +1786
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why are we replacing the actual struct selmark with this dereferencing stuff? Is this even well-formed C? Initializing a pointer with a value that dereferences said pointer seems like an obvious no-no to me.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, I just realized that it's changed to calloc(). Why? I don't think we need that.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this even well-formed C? Initializing a pointer with a value that dereferences said pointer seems like an obvious no-no to me.

Yeah, that's fine. sizeof only needs to know the type of the operand in order to derive it's size. It's guaranteed not to evaluate anything (with the exception when the operand is a VLA, one more reason I hate VLAs).

For reference C11 6.5.3.4:

If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The more you know. I'm still more for sizeof(selmark), using an identifier during its initialization just looks off to me. Up to you @jarkkojs.

calloc() still unnecessary and I'd still move scrath into the sorting function.


if (!marked) {
if (marked == NULL || scratch == NULL) {
printwarn(NULL);
free(marked);
free(scratch);
return;
}

Expand Down Expand Up @@ -1810,7 +1841,7 @@ static void invertselbuf(const int pathlen)
* With entries sorted we can merge adjacent ones allowing us to
* move them in a single go.
*/
qsort(marked, nmarked, sizeof(selmark), &markcmp);
sort_marked(marked, scratch, nselected);

/* Some files might be adjacent. Merge them into a single entry */
for (i = 1; i < nmarked; ++i) {
Expand Down Expand Up @@ -1850,6 +1881,7 @@ static void invertselbuf(const int pathlen)
}

free(marked);
free(scratch);
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should scratch really be allocated here? It's only used in sort_marked() and I see no benefit in allocating it in this function. Might as well just allocate and free it there.


/* Buffer size adjustment */
selbufpos -= shrinklen;
Expand Down
Loading