[feature request] Saving a file index to avoid recomputing each time #4

Open
zjijz opened this issue Dec 21, 2017 · 6 comments

@zjijz
Contributor

zjijz commented Dec 21, 2017

@jegesh Could we add the ability to save a file index to disk, so the file doesn't have to be re-indexed each time it is opened? I've been using this package for machine learning batches, and the time spent indexing the file on each training run has been noticeable.

A similar package called linereader saves the index automatically.

I can help implement this too.

@zjijz changed the title from "Saving a file index to avoid recomputing each time" to "[feature request] Saving a file index to avoid recomputing each time" on Dec 21, 2017
@carvetighter

Would you recommend pickling the index, possibly in the same directory as the file? A possible file extension could be "*.idx".
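The pickling idea can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not this package's actual API: it assumes the index is simply a list of byte offsets (one per line start), and the names `build_line_index`, `load_or_build_index`, and `read_line` are hypothetical. The `.idx` file is written next to the data file and reused only if it is at least as new as the data file.

```python
import os
import pickle


def build_line_index(path):
    """Scan the file once and return the byte offset of every line start."""
    offsets = []
    position = 0
    with open(path, "rb") as handle:
        for line in handle:
            offsets.append(position)
            position += len(line)
    return offsets


def load_or_build_index(path, index_suffix=".idx"):
    """Load a pickled index stored next to the file, rebuilding it if stale.

    The suffix and the staleness check (modification times) are assumptions
    made for this sketch.
    """
    index_path = path + index_suffix
    if (os.path.exists(index_path)
            and os.path.getmtime(index_path) >= os.path.getmtime(path)):
        with open(index_path, "rb") as handle:
            return pickle.load(handle)

    offsets = build_line_index(path)
    with open(index_path, "wb") as handle:
        pickle.dump(offsets, handle, protocol=pickle.HIGHEST_PROTOCOL)
    return offsets


def read_line(path, offsets, line_number):
    """Seek straight to a line using the index and return it without its newline."""
    with open(path, "rb") as handle:
        handle.seek(offsets[line_number])
        return handle.readline().decode("utf-8").rstrip("\r\n")
```

With this approach the first call pays the indexing cost and later calls only unpickle the list. Note that a pickle file should only be loaded if it comes from a trusted source, since unpickling can execute arbitrary code.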

@zjijz
Contributor Author

zjijz commented Dec 21, 2017

@carvetighter Pickling could work. Is there some property of the index structure that would let it be compressed further?

@carvetighter

carvetighter commented Dec 21, 2017

@zjijz I don't know about compressing the index. It was just an idea. Do you want to access the index information quickly?

I'm looking at the linereader code, and it's interesting how he counts the lines and makes every line in the index file the same length by padding it with spaces at the end. It's always hard reading someone else's code, and I don't understand why he is doing some things. For example, each line of the index file is an integer followed by a lot of spaces (e.g. '32 ...a bunch of spaces... \n'). It just seems odd to me. If you pickle the index, then you can just load it and use it easily.
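One plausible reason for the padding (an assumption, not a confirmed reading of linereader's code) is that fixed-width entries allow the offset for line k to be read by seeking to `k * width` in the index file, without loading the whole index into memory. A minimal sketch of that general technique, with a hypothetical `ENTRY_WIDTH` and hypothetical function names:

```python
import os

# Hypothetical fixed entry size in bytes, including the trailing newline.
ENTRY_WIDTH = 20


def write_fixed_width_index(data_path, index_path):
    """Write one space-padded offset per data line, each exactly ENTRY_WIDTH bytes."""
    position = 0
    with open(data_path, "rb") as data, open(index_path, "wb") as index:
        for line in data:
            digits = str(position).encode("ascii")
            # Pad with spaces so every entry has the same length.
            index.write(digits.ljust(ENTRY_WIDTH - 1) + b"\n")
            position += len(line)


def lookup_offset(index_path, line_number):
    """Read a single entry by seeking directly to its position in the index file."""
    with open(index_path, "rb") as index:
        index.seek(line_number * ENTRY_WIDTH)
        entry = index.read(ENTRY_WIDTH)
    if len(entry) < ENTRY_WIDTH:
        raise IndexError("line number out of range")
    return int(entry.decode("ascii").strip())
```

The trade-off compared with pickling is that lookups need no up-front load, at the cost of a larger index file and one extra seek per line read.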

@jegesh
Owner

jegesh commented Jan 1, 2018

A pull request would be well received. If neither of you has the time for it, maybe I can put something basic together.

@zjijz
Contributor Author

zjijz commented Jan 25, 2018

Hey, sorry about the delay. I was working on a school project that would use this feature but the class ended and some other workloads piled up. Do you have a date you would want a version of this done by?

@jegesh
Owner

jegesh commented Jan 26, 2018

You requested it, so you tell me!
