Performance #34

mewmew · 2018-11-09T03:56:37Z

This issue is intended to profile the performance of the llir/llvm library, measure it against the official LLVM distribution and evaluate different methods for improving the performance.

This is a continuation of mewspring/mewmew-l#6

The benchmark suite is at https://github.com/decomp/testdata. Specifically, the LLVM IR assembly of these projects are used in the benchmark:

Below follows a first evaluation of using concurrency to speed up parsing. The evaluation is based on a very naiive implementation of concurrency, just to get some initial runtime numbers. It is based on 3011396 of the development branch, and subsets of the following patch has been applied https://gist.github.com/mewmew/d127b562fdd8f560222b4ded739861a7

Official LLVM results

For comparison, below are the runtime results of the opt tool from the official LLVM distribution (using opt -verify foo.ll).

Coreutils

real 8.18
user 7.22
sys 0.88

SQLite

real 1.90
user 1.73
sys 0.13

`llir/llvm` results

Coreutils

no concurrency

total time for file "testdata/coreutils/testdata/yes.ll": 55.744113ms
real 11.54
user 14.70
sys 0.16

concurrent `translateTopLevelEntities`

4 go-routines with waitgroup in translateTopLevelEntities

total time for file "testdata/coreutils/testdata/yes.ll": 53.49785ms
real 10.28
user 16.06
sys 0.15

concurrent `translateGlobals`

2 go-routines with waitgroup in translateGlobals (for global and function definitions)

total time for file "testdata/coreutils/testdata/yes.ll": 55.567134ms
real 9.83
user 17.18
sys 0.17

concurrent `translateTopLevelEntities` and `translateGlobals`

4 go-routines with waitgroup in translateTopLevelEntities
2 go-routines with waitgroup in translateGlobals (for global and function definitions)

total time for file "testdata/coreutils/testdata/yes.ll": 58.474581ms
real 9.23
user 18.08
sys 0.16

SQLite3

no concurrency

total time for file "shell.ll": 3.147106433s
real 3.18
user 3.86
sys 0.32

concurrent `translateTopLevelEntities`

4 go-routines with waitgroup in translateTopLevelEntities

total time for file "shell.ll": 2.848574349s
real 2.88
user 4.67
sys 0.32

concurrent `translateGlobals`

2 go-routines with waitgroup in translateGlobals (for global and function definitions)

total time for file "testdata/sqlite/testdata/shell.ll": 2.86919391s
real 2.90
user 4.90
sys 0.32

concurrent `translateTopLevelEntities` and `translateGlobals`

4 go-routines with waitgroup in translateTopLevelEntities
2 go-routines with waitgroup in translateGlobals (for global and function definitions)

total time for file "shell.ll": 2.897873366s
real 2.93
user 4.79
sys 0.33

The text was updated successfully, but these errors were encountered:

mewmew · 2018-12-07T21:30:55Z

As we've reached the hand-wavy ~2x slowdown target for the v0.3.0 release, I've changed the milestone to future, as we will constantly seek to improve the performance of llir. It would be really interesting also to explore using concurrency to spin up the cores of multicore machines. I tried a naive approach for this and it seems to be a valid approach for bringing wall clock run time improvements. We just need to figure out how to do it in a less naive way, and also to find the trade-off factors for when we should spin up Go-routines and how many to run. We can define threshold limits for these.

Challenge: beat LLVM wall time performance using concurrency

Anyone interested in experimenting a bit, have a look at asm/translate.go which defines the high-level AST to IR translation order. In it, a few good candidates for concurrent execution have been identified:

Order of translation

Note: step 3 and the substeps of 4a can be done concurrently.
Note: the substeps of 4a can be done concurrently.
Note: the substeps of 4b can be done concurrently.
Note: steps 5-7 can be done concurrently.
Note: the substeps of 8 can be done concurrently.

Index AST top-level entities.
Resolve IR type definitions.

a. Index type identifiers and create scaffolding IR type definitions
(without bodies).

b. Translate AST type definitions to IR.
Translate AST comdat definitions to IR.

Note: step 3 and the substeps of 4a can be done concurrently.

Resolve remaining IR top-level entities.

a. Index top-level identifiers and create scaffolding IR top-level
declarations and definitions (without bodies but with types).

Note: the substeps of 4a can be done concurrently.
1. Index global identifiers and create scaffolding IR global
  declarations and definitions, indirect symbol definitions, and
  function declarations and definitions (without bodies but with
  types).
2. Index attribute group IDs and create scaffolding IR attribute group
  definitions (without bodies).
3. Index metadata names and create scaffolding IR named metadata
  definitions (without bodies).
4. Index metadata IDs and create scaffolding IR metadata definitions
  (without bodies).
b. Translate AST top-level declarations and definitions to IR.

Note: the substeps of 4b can be done concurrently.
1. Translate AST global declarations and definitions, indirect symbol
  definitions, and function declarations and definitions to IR.
2. Translate AST attribute group definitions to IR.
3. Translate AST named metadata definitions to IR.
4. Translate AST metadata definitions to IR.

Note: steps 5-7 can be done concurrenty.

Translate use-list orders.
Translate basic block specific use-list orders.
Fix basic block references in blockaddress constants.
Add IR top-level declarations and definitions to the IR module in order of
occurrence in the input.

Note: the substeps of 8 can be done concurrently.

a. Add IR type definitions to the IR module in alphabetical order.

b. Add IR comdat definitions to the IR module in alphabetical order.

c. Add IR global variable declarations and definitions, indirect symbol
definitions, and function declarations and definitions to the IR module
in order of occurrence in the input.

d. Add IR attribute group definitions to the IR module in numeric order.

e. Add IR named metadata definitions to the IR module in order of
occurrence in the input.

f. Add IR metadata definitions to the IR module in numeric order.

pwaller · 2018-12-09T15:42:04Z

It would be great to improve this in the future! And there is definitely some low hanging fruit to be found. @pwaller has gotten a ~10% speed increase by making changes to Textmapper, so that the parser uses values instead of pointers for subtype (%interface) AST nodes, which puts less preasure on the Go garbage collector.

Filed: inspirer/textmapper#29

mewmew · 2019-12-29T00:11:26Z

Moving performance related note from TODO.md to this issue.

figure out how to remove backtracking table in lexer (ref: http://textmapper.org/documentation.html#backtracking-and-invalid-tokens)

mewmew mentioned this issue Nov 9, 2018

Parsing is slow relative to LLVM mewspring/mewmew-l#6

Closed

mewmew added this to the v0.3 milestone Nov 12, 2018

This comment has been minimized.

Sign in to view

mewmew modified the milestones: v0.3, future Dec 7, 2018

mewmew added the meta label Mar 14, 2019

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Performance #34

Performance #34

mewmew commented Nov 9, 2018 •

edited

Loading

This comment has been minimized.

mewmew commented Dec 7, 2018 •

edited

Loading

pwaller commented Dec 9, 2018

mewmew commented Dec 29, 2019

Performance #34

Performance #34

Comments

mewmew commented Nov 9, 2018 • edited Loading

Official LLVM results

Coreutils

SQLite

llir/llvm results

Coreutils

no concurrency

concurrent translateTopLevelEntities

concurrent translateGlobals

concurrent translateTopLevelEntities and translateGlobals

SQLite3

no concurrency

concurrent translateTopLevelEntities

concurrent translateGlobals

concurrent translateTopLevelEntities and translateGlobals

This comment has been minimized.

mewmew commented Dec 7, 2018 • edited Loading

Challenge: beat LLVM wall time performance using concurrency

Order of translation

pwaller commented Dec 9, 2018

mewmew commented Dec 29, 2019

mewmew commented Nov 9, 2018 •

edited

Loading

`llir/llvm` results

concurrent `translateTopLevelEntities`

concurrent `translateGlobals`

concurrent `translateTopLevelEntities` and `translateGlobals`

concurrent `translateTopLevelEntities`

concurrent `translateGlobals`

concurrent `translateTopLevelEntities` and `translateGlobals`

mewmew commented Dec 7, 2018 •

edited

Loading