iterate-dir throws OOME on large directory structures #38

Open
pmonks opened this issue Jan 4, 2013 · 3 comments

@pmonks

pmonks commented Jan 4, 2013

The iterate-dir function consumes all available heap and throws an OutOfMemoryError (OOME) on large directory structures.

The following typescript demonstrates the problem in a few different ways when the function is given a directory containing approximately 600,000 files and subdirectories (note: embedded ANSI escape characters have been manually removed from this typescript for clarity):

Script started on Thu Jan  3 22:45:39 2013
bash-3.2$ ls -R /Users/pmonks/Development | wc -l
ls: unreadableDirectory: Permission denied
  614630
bash-3.2$ lein repl
nREPL server started on port 52181
REPL-y 0.1.0-beta10
Clojure 1.4.0
    Exit: Control+D or (exit) or (quit)
Commands: (user/help)
    Docs: (doc function-name-here)
          (find-doc "part-of-name-here")
  Source: (source function-name-here)
          (user/sourcery function-name-here)
 Javadoc: (javadoc java-object-or-class-here)
Examples from clojuredocs.org: [clojuredocs or cdoc]
          (user/clojuredocs name-here)
          (user/clojuredocs "ns-here" "name-here")
fs-scan.core=> (require '[fs.core :as fs])
nil
fs-scan.core=> (defn walker [root dirs files] ())
#'fs-scan.core/walker
fs-scan.core=> (fs/walk walker "/Users/pmonks/Development")
OutOfMemoryError Java heap space  java.util.Arrays.copyOf (Arrays.java:2882)

fs-scan.core=> (fs/iterate-dir "/Users/pmonks/Development")
OutOfMemoryError Java heap space  java.util.Arrays.copyOf (Arrays.java:2882)

fs-scan.core=> (do (fs/iterate-dir "/Users/pmonks/Development") ())
OutOfMemoryError Java heap space  java.util.Arrays.copyOf (Arrays.java:2882)

fs-scan.core=> exit
Bye for now!

bash-3.2$ exit
exit

Script done on Thu Jan  3 22:53:42 2013

I believe this is happening because iterate-dir is not lazy (despite what its docstring says) and is eagerly building the entire sequence of pathnames in memory.
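
For contrast, here is a minimal sketch of incremental processing that does stay lazy. clojure.core/file-seq is itself a tree-seq over java.io.File, so reducing over it should visit entries one at a time instead of materializing the whole listing first. count-files is a hypothetical helper for illustration, not part of fs.

```clojure
(ns example.count-files
  (:require [clojure.java.io :as io]))

;; Hypothetical helper, not part of fs.
;; clojure.core/file-seq is a lazy, depth-first tree-seq over java.io.File.
;; The seq is passed straight to reduce and never bound to a local that
;; outlives the traversal, so already-visited entries can be collected.
(defn count-files
  "Counts regular files under path without building the full listing up front."
  [path]
  (reduce (fn [n ^java.io.File f]
            (if (.isFile f) (inc n) n))
          0
          (file-seq (io/file path))))
```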

@pmonks
Author

pmonks commented Jan 4, 2013

In my use case the problem shows up through the walk function. I want to be able to walk very large directory structures (tens to hundreds of millions of files, transitively), processing entries as I go.

@Raynes
Owner

Raynes commented Jan 4, 2013

I see. The problem is that the zipper used under the hood holds the whole tree in memory. I'll get a fix in ASAP. It should just be a tree-seq (I didn't write this code. I never write code that blows the heap, you see ;)).
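
A minimal sketch of the tree-seq approach, under stated assumptions and not the fix that was actually committed. lazy-iterate-dir and lazy-walk are hypothetical names. The [root dirs files] shape is modelled on the walker callback in the transcript above. Having root be a java.io.File and dirs/files be sets of child names is also an assumption.

```clojure
(ns example.lazy-walk
  (:require [clojure.java.io :as io])
  (:import [java.io File]))

(defn- dir-entry
  "Builds a hypothetical [root dirs files] triple for directory d, where dirs
  and files are sets of child names. .listFiles returns nil for unreadable
  directories, which yields empty sets here rather than an exception."
  [^File d]
  (let [{dirs true files false} (group-by #(.isDirectory ^File %) (.listFiles d))]
    [d
     (set (map #(.getName ^File %) dirs))
     (set (map #(.getName ^File %) files))]))

(defn lazy-iterate-dir
  "Hypothetical lazy replacement for iterate-dir: a depth-first lazy seq of
  [root dirs files] triples, one per directory. tree-seq keeps only the
  not-yet-visited children of directories on the current path, not a
  zipper over the whole tree."
  [path]
  (->> (tree-seq #(.isDirectory ^File %) #(.listFiles ^File %) (io/file path))
       (filter #(.isDirectory ^File %))
       (map dir-entry)))

(defn lazy-walk
  "Hypothetical walk built on lazy-iterate-dir: calls (f root dirs files) for
  each directory as it is reached and returns nil. doseq does not hold on to
  the head of the seq, so visited entries can be garbage-collected."
  [f path]
  (doseq [[root dirs files] (lazy-iterate-dir path)]
    (f root dirs files)))
```

One trade-off in this sketch: each directory is listed twice (once by tree-seq to find children, once by dir-entry to split them into dirs and files). That trades extra I/O for simplicity. A production version would likely list each directory once.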

@pmonks
Author

pmonks commented Jan 4, 2013

;-)

Thanks for the lickety-split response - I'll keep an eye out for the update and give the new version a whirl.
