Doesn't find OpenMP on macOS (though libomp is installed) #5511

Open
forthrin opened this issue Jul 11, 2024 · 16 comments
Labels
enhancement question User support question

Comments

@forthrin

OpenCL formats are much faster than their CPU counterparts, but they seem to have a startup penalty of about one minute. (Accordingly, when running an actual crack, nothing happens until about a minute has passed.) Is this:

  1. Hardware limitation
  2. Necessity / By design
  3. Bug / Known issue
  4. User error
$ time ./john --format=dmg --test
Benchmarking: dmg, Apple DMG [PBKDF2-SHA1 128/128 ASIMD 4x 3DES/AES]... DONE
Speed for cost 1 (iteration count) of 1000, cost 2 (version) of 2 and 1
Raw:    4312 c/s real, 4334 c/s virtual
0m2.410s # FAST

$ time ./john --format=dmg-opencl --test
Device 1: Apple M1
Benchmarking: dmg-opencl, Apple DMG [PBKDF2-SHA1 3DES/AES OpenCL]... LWS=64 GWS=524288 (8192 blocks) DONE
Speed for cost 1 (iteration count) of 1000, cost 2 (version) of 2 and 1
Raw:    160824 c/s real, 34952K c/s virtual
1m4.962s # SLOW

# System Version: macOS 14.5 (23F79)
# Model Identifier: MacBookAir10,1

$ ./john --list=build-info
Version: 1.9.0-jumbo-1+bleeding-19d731b1ca 2024-07-07 17:21:12 +0200
Build: darwin23.5.0 64-bit arm ASIMD AC OPENCL
SIMD: ASIMD, interleaving: MD4:2 MD5:2 SHA1:1 SHA256:1 SHA512:1
$JOHN is ./
Format interface version: 14
Max. number of reported tunable costs: 4
Rec file version: REC4
Charset file version: CHR3
CHARSET_MIN: 1 (0x01)
CHARSET_MAX: 255 (0xff)
CHARSET_LENGTH: 24
SALT_HASH_SIZE: 1048576
SINGLE_IDX_MAX: 2147483648
SINGLE_BUF_MAX: 4294967295
Effective limit: Number of salts vs. SingleMaxBufferSize
Max. Markov mode level: 400
Max. Markov mode password length: 30
clang version: 15.0.0 (clang-1500.3.9.4) (gcc 4.2.1 compatibility)
OpenCL headers version: 1.2
Crypto library: OpenSSL
OpenSSL library version: 030300010
OpenSSL 3.3.1 4 Jun 2024
GMP library version: 6.3.0
File locking: fcntl()
fseek(): fseek
ftell(): ftell
fopen(): fopen
memmem(): System's
times(2) sysconf(_SC_CLK_TCK) is 100
Using times(2) for timers, resolution 10 ms
HR timer: mach_absolute_time(), latency 42 ns
Total physical host memory: 8 GiB
Available physical host memory: 3118 MiB
Terminal locale string: UTF-8
Parsed terminal locale: UTF-8
@solardiz added the "question" (User support question) label on Jul 11, 2024
@solardiz
Member

There are several (non-)issues here.

  1. Yes, the slow start of OpenCL formats is by design. There are several reasons for it: building the kernel from source the first time you run it (then a cached build is normally reused), auto-tuning the LWS and GWS settings (this is usually where most time is spent), and running self-test at a rather high GWS (which is so high because of the higher concurrency needed to use an OpenCL device optimally). You can see what it's doing during this time by increasing verbosity, e.g. with -v=5. You can also force usage of previously tuned LWS/GWS to skip the auto-tuning, e.g. by LWS=64 GWS=524288 john ... (setting those as environment variables via the Unix shell; note that it's easy copy-paste of what had been output before) or by using the --lws and --gws command-line options. You can also skip the self-test with --skip-self-test, but we generally recommend against skipping it.
  2. Your build appears to lack OpenMP support. Why is that? As a result, your CPU run is single-core. With OpenMP it would perhaps show a much higher speed, but it would also take somewhat longer to start (because auto-tuning and the self-test then run at higher concurrency). Did you explicitly disable OpenMP at build time, or was it somehow not auto-detected in this build with clang for macOS? That could be something for us to fix.
  3. Also the default benchmark being for a mix of two different cost settings is something we could want to fix, standardizing on one cost.
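
To illustrate the LWS/GWS reuse described in point 1, here is a minimal sketch that pulls the tuned values out of a saved benchmark line. The helper name `tuned_env` is hypothetical, and the sketch assumes the output format shown in the benchmark earlier in this thread.

```shell
# Hypothetical helper: extract "LWS=... GWS=..." from john's benchmark output
# so the values can be reused on later runs, skipping auto-tuning.
tuned_env() {
    # Reads john output on stdin; prints the first "LWS=n GWS=n" pair found.
    sed -n 's/.*\(LWS=[0-9][0-9]*\) \(GWS=[0-9][0-9]*\).*/\1 \2/p' | head -n 1
}

# Example input: the benchmark line reported in this thread.
line='Benchmarking: dmg-opencl, Apple DMG [PBKDF2-SHA1 3DES/AES OpenCL]... LWS=64 GWS=524288 (8192 blocks) DONE'
printf '%s\n' "$line" | tuned_env
```

The printed pair can then be placed in front of a later command, as in `LWS=64 GWS=524288 ./john --format=dmg-opencl --test`.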

@forthrin
Author

Thanks for reaching out!

  1. Consecutive plain john --format=dmg-opencl --test runs are consistently slow (e.g. nothing seems automatically cached/reused between runs, if that was supposed to happen). Explicitly setting LWS and GWS finished the test in 12s.

  2. Build was the plain recommended ./configure && make -s clean && make -sj4. Do tell how to provide the necessary information to pinpoint why OpenMP is not enabled by default as expected.

checking for gcc option to support OpenMP... unsupported
AES-NI support ..................................... no
Cross compiling .................................... no
OpenMP support ..................................... no
librexgen (regex mode, see doc/README.librexgen) ... no
ZTEX USB-FPGA module 1.15y support ................. no

$ which gcc
/usr/bin/gcc
$ gcc --version
Apple clang version 15.0.0 (clang-1500.3.9.4)
Target: arm64-apple-darwin23.5.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin
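
The "unsupported" result above appears to come from configure trying OpenMP flags on a small test program. A rough manual equivalent, as a sketch only: `probe_openmp` is a hypothetical helper, not part of John, and the flag sets in the comments are taken from elsewhere in this thread.

```shell
# Hypothetical probe: can compiler "$1" build a tiny OpenMP program when
# given the remaining arguments as flags? Exit status 0 means yes.
probe_openmp() {
    probe_cc=$1; shift
    probe_dir=$(mktemp -d) || return 1
    printf '%s\n' '#include <omp.h>' \
        'int main(void) { return omp_get_max_threads() > 0 ? 0 : 1; }' \
        > "$probe_dir/t.c"
    # Source file first, flags (including any -l options) afterwards.
    "$probe_cc" "$probe_dir/t.c" -o "$probe_dir/t" "$@" >/dev/null 2>&1
    probe_status=$?
    rm -rf "$probe_dir"
    return "$probe_status"
}

# Usage (not run here; paths assume Homebrew on Apple Silicon):
#   probe_openmp gcc -fopenmp         # /usr/bin/gcc, which is Apple clang
#   probe_openmp gcc-14 -fopenmp      # Homebrew GCC
#   probe_openmp gcc -Xclang -fopenmp -I/opt/homebrew/opt/libomp/include \
#       -L/opt/homebrew/opt/libomp/lib -lomp
```

Running each variant and checking `$?` shows which compiler and flag combination is worth passing to `./configure`.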

PS! Adding a stopwatch timestamp at the beginning of each debug line would make it easier to see/share performance!
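
Until something like that exists in John itself, a small filter can approximate it from the outside. This is a sketch with a hypothetical function name, `stamp`. It relies on `date +%s`, so resolution is one second, and the timestamps reflect when lines reach the pipe, which may lag if the program buffers its output.

```shell
# Hypothetical filter: prefix every input line with the number of seconds
# elapsed since the filter started.
stamp() {
    stamp_start=$(date +%s)
    while IFS= read -r stamp_line; do
        printf '[%4ds] %s\n' "$(( $(date +%s) - stamp_start ))" "$stamp_line"
    done
}

# Usage (not run here):
#   ./john --format=dmg-opencl --test -v=5 2>&1 | stamp
```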

@solardiz
Member

Consecutive plain john --format=dmg-opencl --test runs are consistently slow (e.g. nothing seems automatically cached/reused between runs, if that was supposed to happen).

That's the wrong conclusion. As I wrote, only one of the three things is supposed to be cached, so the runs remain slow overall even though some caching does occur. That's by design.

Since the auto-tuning is usually the slowest part, no wonder setting explicit LWS/GWS removes most of the delay for you.

Do tell how to provide the necessary information to pinpoint why OpenMP is not enabled by default as expected.

OK, apparently Apple still provides only crippled clang in Xcode. This has some answers: https://stackoverflow.com/questions/71061894/how-to-install-openmp-on-mac-m1

We also have binary builds in https://github.com/openwall/john-packages/releases - I think these are with OpenMP.

I don't use macOS myself.

@forthrin
Author

forthrin commented Jul 11, 2024

$ ./omptest # works even with /usr/bin/gcc
Hello World... from thread = 0
<snip>
Hello World... from thread = 7
$ brew info libomp
==> libomp: stable 18.1.8 (bottled) [keg-only]
$ echo $LDFLAGS
-L/opt/homebrew/opt/libomp/lib
$ echo $CPPFLAGS
-I/opt/homebrew/opt/libomp/include
$ brew info gcc
==> gcc: stable 14.1.0 (bottled), HEAD
$ echo $PATH
/opt/homebrew/Cellar/gcc/14.1.0_1/bin:<snip> # no "gcc", only "gcc-14" ++
$ ./configure
OpenMP support ..................................... still no

@solardiz: If you're on another OS and out of ideas, hope someone on macOS can chip in, for optimal performance.

@forthrin changed the title from "One minute startup delay using OpenCL formats" to "Doesn't find OpenMP on macOS (though libomp is installed)" on Jul 11, 2024
@solardiz
Member

$ echo $PATH
/opt/homebrew/Cellar/gcc/14.1.0_1/bin: # no "gcc", only "gcc-14" ++

As some people on that StackOverflow wrote, you may need to use e.g. gcc-14 explicitly. Try e.g. ./configure CC=gcc-14. Alternatively, apparently it should also be possible to use libomp coming from Brew along with clang.
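
Since Homebrew installs GCC only under a versioned name, the name can be looked up instead of hard-coded. A minimal sketch, where `find_versioned_gcc` is a hypothetical helper and the directory to search is an assumption about the local setup:

```shell
# Hypothetical helper: print the highest-numbered "gcc-N" executable found in
# directory "$1" (for example gcc-14), or nothing if there is none.
find_versioned_gcc() {
    for gcc_path in "$1"/gcc-[0-9]*; do
        # Skip the unexpanded pattern and anything that is not executable.
        [ -x "$gcc_path" ] && basename "$gcc_path"
    done | sort -t- -k2,2n | tail -n 1
}

# Usage (not run here; the directory is the one from the PATH shown above):
#   ./configure CC="$(find_versioned_gcc /opt/homebrew/Cellar/gcc/14.1.0_1/bin)"
```

The pattern requires a digit right after `gcc-`, so wrappers such as `gcc-ar-14` are not picked up.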

@solardiz
Member

@claudioandre-br Maybe you want to comment on how we're doing our CI builds for macOS - with OpenMP, right?

@forthrin
Author

CC=gcc-14 got somewhere! How to get a true "yes", or is "yes, but" as good as it gets?

OpenMP support ..................................... yes (not for fast formats)

@solardiz
Member

You can force it to a plain yes with --enable-openmp-for-fast-formats, but we recommend that you don't do that.

@claudioandre-br
Member

@claudioandre-br Maybe you want to comment on how we're doing our CI builds for macOS - with OpenMP, right?

Yes, OpenMP on "default" clang. The rolling-2404 is:

Version: 1.9.0-jumbo-1+bleeding-f9fedd238b 2024-04-01 13:35:37 +0200
Build: darwin23.5.0 64-bit arm ASIMD AC OMP OPENCL
SIMD: ASIMD, interleaving: MD4:2 MD5:2 SHA1:1 SHA256:1 SHA512:1
OMP fallback binary: john-arm64
$JOHN is ../run/
[...]
Max. Markov mode password length: 30
clang version: 15.0.0 (clang-1500.3.9.4) (gcc 4.2.1 compatibility)
OpenCL headers version: 1.2
Crypto library: OpenSSL
OpenSSL library version: 030300010
OpenSSL 3.3.1 4 Jun 2024
GMP library version: 6.3.0
[...]

Build on:

ProductVersion:		14.5
BuildVersion:		23F79
Homebrew 4.3.9
Xcode_15.4

doc/INSTALL recommends gcc (passing the CC value to ./configure). That's what magnum used to use.

@solardiz
Member

@claudioandre-br In other words, you're saying we get OpenMP with the exact same clang version that @forthrin also has, but does not get OpenMP with? Are we perhaps installing something from Homebrew to get this to work?

BTW, we have doc/INSTALL* files for several systems, but not for macOS. Maybe we should add one.

@forthrin
Author

@claudioandre-br: Ah! I see this is actually thoroughly described at the bottom of doc/INSTALL. Should have read the whole thing. Maybe configure should say:

OpenMP ... no (See doc/INSTALL "Optimal build on macOS") (Actually still says "OS X")

Happy to say john managed to crack a forgotten password which saved a ton of work!

Closing up:

  1. Why is --enable-openmp-for-fast-formats discouraged? (Which are the "fast formats"?)
  2. Can John make word lists such as:

(0-2 exclamation marks) (random word from list) (0 or 1 spaces) (number 0-9999) (0 or 1 spaces) (random word from list) (0-2 exclamation marks)

... or is it generally recommended to make such with a scripting language in advance? (Had a look at the mask and rules docs, but couldn't really figure out e.g. combining two words.)

@claudioandre-br
Member

In other words, you're saying we get OpenMP with the exact same clang version that @forthrin also has, but does not get OpenMP with?

Yes. See the -Xclang -fopenmp at https://github.com/openwall/john-packages/blob/main/scripts/ci_controller.sh#L97.

There is some related magic and Google-sama might have an explanation on the matter. By the way: the order of the arguments is important.

@solardiz
Member

See the -Xclang -fopenmp at https://github.com/openwall/john-packages/blob/main/scripts/ci_controller.sh#L97.

Should we possibly get this into the main project here, perhaps as a change to configure.ac? Would you test and contribute?

When linking to specific line numbers, let's also link to specific revisions - GitHub substitutes it when you press y. Anyway, I'll quote the relevant excerpt below this time.

	if [[ "$TARGET_ARCH" == *"macOS"* ]]; then
		SYSTEM_WIDE=''
		REGULAR="$SYSTEM_WIDE $ASAN $BUILD_OPTS"
		NO_OPENMP="--disable-openmp $SYSTEM_WIDE $ASAN $BUILD_OPTS"

		#Libraries and Includes
		if [[ "$TARGET_ARCH" == *"macOS X86"* ]]; then
			MAC_LOCAL_PATH="usr/local/opt"
		else
			MAC_LOCAL_PATH="opt/homebrew/opt"
		fi
		LDFLAGS_ssl="-L/$MAC_LOCAL_PATH/openssl/lib"
		LDFLAGS_gmp="-L/$MAC_LOCAL_PATH/gmp/lib"
		LDFLAGS_omp="-L/$MAC_LOCAL_PATH/libomp/lib -lomp"

		CFLAGS_ssl="-I/$MAC_LOCAL_PATH/openssl/include"
		CFLAGS_gmp="-I/$MAC_LOCAL_PATH/gmp/include"
		CFLAGS_omp="-I/$MAC_LOCAL_PATH/libomp/include"

		if [[ $TARGET_ARCH == *"macOS ARM"* ]]; then
			brew link openssl --force
		fi

		if [[ $TARGET_ARCH == *"macOS X86"* ]]; then
			do_configure "$NO_OPENMP" --enable-simd=avx && do_build ../run/john-avx
			do_configure "$REGULAR" --enable-simd=avx LDFLAGS="$LDFLAGS_omp" CPPFLAGS="-Xclang -fopenmp $CFLAGS_omp -DOMP_FALLBACK_BINARY=\"\\\"john-avx\\\"\" " && do_build ../run/john-avx-omp
			do_configure "$NO_OPENMP" --enable-simd=avx2 && do_build ../run/john-avx2
			do_configure "$REGULAR" --enable-simd=avx2 LDFLAGS="$LDFLAGS_omp" CPPFLAGS="-Xclang -fopenmp $CFLAGS_omp -DOMP_FALLBACK_BINARY=\"\\\"john-avx2\\\"\" -DCPU_FALLBACK_BINARY=\"\\\"john-avx-omp\\\"\" " && do_build ../run/john-avx2-omp
			BINARY="john-avx2-omp"
		else
			do_configure "$NO_OPENMP" LDFLAGS="$LDFLAGS_ssl $LDFLAGS_gmp" CPPFLAGS="$CFLAGS_ssl $CFLAGS_gmp" && do_build "../run/john-$arch"
			do_configure "$REGULAR" LDFLAGS="$LDFLAGS_ssl $LDFLAGS_gmp $LDFLAGS_omp" CPPFLAGS="-Xclang -fopenmp $CFLAGS_ssl $CFLAGS_gmp $CFLAGS_omp -DOMP_FALLBACK_BINARY=\"\\\"john-$arch\\\"\" " && do_build ../run/john-omp
			BINARY="john-omp"
		fi
		do_release "No" "Yes" $BINARY # --system-wide, --support-opencl, --binary-name
	fi
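
Reduced to a single interactive build, the OpenMP part of the excerpt might look like the sketch below. The helper name `libomp_configure_cmd` is hypothetical. The OpenSSL/GMP paths and the `-DOMP_FALLBACK_BINARY` define are left out on the assumption that only one binary is built and that configure already finds those libraries. This has not been verified outside the CI setup quoted above.

```shell
# Hypothetical helper: print a configure command line for Apple clang plus
# Homebrew's libomp, using the same LDFLAGS/CPPFLAGS pieces as the excerpt.
# $1 = Homebrew "opt" directory; defaults to the Apple Silicon location.
libomp_configure_cmd() {
    opt_dir=${1:-/opt/homebrew/opt}
    # -Xclang -fopenmp comes first in CPPFLAGS, as in the CI script.
    printf './configure LDFLAGS="-L%s/libomp/lib -lomp" CPPFLAGS="-Xclang -fopenmp -I%s/libomp/include"\n' \
        "$opt_dir" "$opt_dir"
}

# Print the command for review before running it by hand.
libomp_configure_cmd
```

On an Intel Mac the excerpt uses `/usr/local/opt` instead, so the call would be `libomp_configure_cmd /usr/local/opt`.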

@solardiz
Member

the default benchmark being for a mix of two different cost settings is something we could want to fix, standardizing on one cost.

Maybe what we have now isn't that bad: it's the same iteration count. I don't recall what effect on speeds the version has for the same iteration count. Looking at the test vectors we currently have, there aren't any two that look the same anyway, since they all have data of different lengths.

Speed for cost 1 (iteration count) of 1000, cost 2 (version) of 2 and 1

@solardiz
Member

1. Why is `--enable-openmp-for-fast-formats` discouraged? (Which are the "fast formats"?)

Those are implementations of hashes that are too fast for efficient scaling with our current use of OpenMP. We recommend usage of --fork on them instead of OpenMP.
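
As a sketch of the --fork route: the helper below only prints a command line with one process per online CPU. `fork_cmd` is a hypothetical name, and `raw-md5` and `hashes.txt` are placeholder arguments chosen for illustration, not something discussed in this thread.

```shell
# Hypothetical helper: build a john command line that uses forked processes
# instead of OpenMP threads.
# $1 = format name, $2 = hash file (both supplied by the caller).
fork_cmd() {
    # Number of online CPUs; fall back to 1 if getconf cannot tell.
    ncpu=$(getconf _NPROCESSORS_ONLN 2>/dev/null) || ncpu=1
    if [ "$ncpu" -gt 1 ]; then
        printf './john --fork=%s --format=%s %s\n' "$ncpu" "$1" "$2"
    else
        printf './john --format=%s %s\n' "$1" "$2"
    fi
}

# Print the command for review; placeholders as noted above.
fork_cmd raw-md5 hashes.txt
```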

2. Can John make word lists such as:

(0-2 exclamation marks) (random word from list) (0 or 1 spaces) (number 0-9999) (0 or 1 spaces) (random word from list) (0-2 exclamation marks)

... or is it generally recommended to make such with a scripting language in advance?

This varies. I'd normally use John itself, but for complicated cases such as yours it may take non-trivial configuration and/or multiple invocations. Here's a start, but then you need to write rule sets and likely use two invocations, piping the intermediate list between them:

./john -w=w --external=combinator --rules-stack=': /[ ] Ap" [0-9][0-9][0-9][0-9]"' --stdout

I'd have greater motivation to spend time on a more complete answer if we were discussing this on the john-users list, but here it's just not worth my time.
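
For the scripting-in-advance route asked about in the question, a minimal sketch using awk from the shell. `gen_candidates` is a hypothetical helper. It assumes one word per line in the list and emits numbers without zero padding, which is a guess at what "0-9999" means.

```shell
# Hypothetical generator for the pattern:
#   (0-2 "!") word (0-1 space) number (0-1 space) word (0-2 "!")
# $1 = word list file, $2 = largest number to insert (default 9999).
gen_candidates() {
    awk -v max="${2:-9999}" '
        { words[NR] = $0 }                 # remember every word
        END {
            nb = split(",!,!!", bang, ",") # "", "!", "!!"
            ns = split(", ", gap, ",")     # "", " "
            for (p = 1; p <= nb; p++)
             for (a = 1; a <= NR; a++)
              for (s = 1; s <= ns; s++)
               for (n = 0; n <= max + 0; n++)
                for (t = 1; t <= ns; t++)
                 for (b = 1; b <= NR; b++)
                  for (q = 1; q <= nb; q++)
                    print bang[p] words[a] gap[s] n gap[t] words[b] bang[q]
        }
    ' "$1"
}

# Usage (not run here; file names are placeholders):
#   gen_candidates words.txt | ./john --stdin --format=dmg-opencl hashfile
```

With W words in the list this produces 360,000 × W² candidates for the full 0-9999 range, so it is only practical for short lists.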

@forthrin
Author

@solardiz: Good you're focusing your valuable time. Happy that OP segued into worthwhile compilation issues. Ping me for testing. Thanks for your help and keep up the great work!

PS! If the mailing list was moved to "Discussions" at some point that would be convenient!
