
Gliner on CPU with multiple cores #155

Open · vijayendra-g opened this issue Jul 16, 2024 · 14 comments
@vijayendra-g commented Jul 16, 2024

I want to use GLiNER on CPU. The medium model takes anywhere between 18 and 20 minutes to extract entities from a given text.
My questions are:

  1. Does GLiNER support multiple cores on a single CPU, and multiple CPUs? Will there be an improvement in performance?
  2. Assuming the answer to question 1 is yes: if I were to increase the number of cores and CPUs, what sort of time improvement could we expect? Has anyone tried this? (See the thread-count sketch below for one way to experiment.)
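Since GLiNER runs on top of PyTorch, the number of CPU cores it uses is governed by torch's thread settings rather than by GLiNER itself. A minimal sketch for experimenting with core counts (the thread values here are illustrative, not a recommendation):

import time

import torch
from gliner import GLiNER

# Intra-op threads: cores used inside a single op (e.g. a matmul).
torch.set_num_threads(8)
# Inter-op threads: cores used across independent ops; must be set
# before any parallel work starts.
torch.set_num_interop_threads(1)

model = GLiNER.from_pretrained("urchade/gliner_medium-v2.1")

text = "The Football Association standardized the rules of football in 1863 in England."
labels = ["year", "country", "features"]

start = time.time()
entities = model.predict_entities(text, labels)
print(f"{time.time() - start:.2f}s with {torch.get_num_threads()} threads")

In general, single-process CPU inference stops scaling well before very high core counts, so it is worth measuring a few values rather than assuming a linear speedup across all 128 logical CPUs.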
@urchade (Owner) commented Jul 16, 2024

Hi, 20 min for a single text?

@vijayendra-g (Author) commented Jul 16, 2024

Yes, it's a small paragraph.

@urchade (Owner) commented Jul 16, 2024

This should take seconds.

@vijayendra-g (Author) commented Jul 16, 2024

Can you please review my code?

import time

from gliner import GLiNER

model = GLiNER.from_pretrained("urchade/gliner_medium-v2.1")

start = time.time()
text = """
The history of the football is as rich and varied as the game itself. Ancient civilizations, including the Chinese, Greeks, and Romans, played early forms of football. These rudimentary games involved kicking a ball, albeit with different rules and objectives. The modern football, as we know it, took shape in the 19th century, primarily in England. The establishment of standardized rules by the Football Association in 1863 marked a significant milestone, paving the way for the football to become a global icon."""

labels = ["year", "country", "features"]

entities = model.predict_entities(text, labels)

for entity in entities:
    print(entity["text"], "=>", entity["label"])

end = time.time()
print(end - start) # time in seconds

In my case:

  • The text length is 1,200 words and I am trying to extract 4–5 entities.
  • Machine details: RAM 32 GB.

Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 43 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 128
On-line CPU(s) list: 0-127
Vendor ID: AuthenticAMD
Model name: AMD EPYC 7742 64-Core Processor
CPU family: 23
Model: 49
Thread(s) per core: 1
Core(s) per socket: 64
Socket(s): 2
Stepping: 0
Frequency boost: enabled
CPU max MHz: 2250.0000
CPU min MHz: 1500.0000
BogoMIPS: 4491.55

@BeanWei commented Jul 17, 2024

I'm also trying to run in a CPU environment. I tried the sample code you provided, and it returned results in about 0.8 seconds. However, my model is loaded manually by specifying a local directory; could that be why it's faster? By the way, from what I've observed, running on CPU is very resource-intensive: when I run predict_entities multiple times in my local environment, CPU usage stays pinned at 100%, with no progress so far. So I have a similar question about running GLiNER in a resource-constrained environment.
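One way to keep GLiNER from saturating every core in a constrained environment is to cap the thread pools before the model is loaded. A minimal sketch (the cap of 4 threads is an arbitrary choice for illustration):

import os

# These env vars are read once when the numeric libraries initialize,
# so set them before importing torch.
os.environ["OMP_NUM_THREADS"] = "4"
os.environ["MKL_NUM_THREADS"] = "4"

import torch
from gliner import GLiNER

torch.set_num_threads(4)  # explicit cap on torch's own intra-op pool

model = GLiNER.from_pretrained("urchade/gliner_medium-v2.1")

entities = model.predict_entities("Alice flew from Paris to Tokyo.", ["person", "location"])
for entity in entities:
    print(entity["text"], "=>", entity["label"])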

@vijayendra-g (Author) commented Jul 17, 2024

@BeanWei If I understand you correctly, you mean this:

model = GLiNER.from_pretrained("/home/.../gliner_medium-v2")

I made that change, and it still runs in minutes. The rest of the script is unchanged from my earlier comment:

import time

from gliner import GLiNER

model = GLiNER.from_pretrained("/home/.../gliner_medium-v2")

start = time.time()
text = """..."""  # same football-history paragraph as above

labels = ["year", "country", "features"]

entities = model.predict_entities(text, labels)

for entity in entities:
    print(entity["text"], "=>", entity["label"])

end = time.time()
print(end - start)  # time in seconds

@polodealvarado commented Aug 7, 2024

Hello @vijayendra-g.

I tried to replicate your results and also got run times in the minutes.
On the other hand, I tried it with ONNX, but it takes more time than plain torch (about two times slower).

Has anyone tested it with ONNX?

@polodealvarado commented

I fixed the code and now it works for me.
I got a 30% improvement in speed.

@vijayendra-g (Author) commented Aug 8, 2024

@polodealvarado What is the code fix? How much time does GLiNER-medium take now? Please specify the GLiNER version as well.

@psydok commented Aug 19, 2024

@polodealvarado @vijayendra-g I have encountered the same problem: ONNX runs 2x slower than the normal model.
It was said that this thread has the needed fix, but that doesn't seem to be it. Can you tell me what kind of fixes are needed? How did you find the bug?
I don't often work with ONNX, and I don't understand what could be wrong with the current code.

The quantized model also degrades a lot in quality, more than the degradation one would normally expect from quantization.

@polodealvarado commented Aug 19, 2024

@psydok @vijayendra-g My problem arose with the sequence length. I realized that with a sequence longer than 512 tokens the ONNX model takes a lot of time, so I simply shortened the input (see the sketch below).

However, as you said @psydok, there is a significant degradation in the model's performance with the quantized versions (up to 100% in some cases).
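A minimal sketch of that shortening approach: chunking long input on whitespace words as a rough, tokenizer-free proxy for the token budget. The 300-word chunk size is an assumption chosen to stay safely under 512 tokens, and the helper predict_in_chunks is hypothetical, not part of the GLiNER API:

from gliner import GLiNER

model = GLiNER.from_pretrained("urchade/gliner_medium-v2.1")
labels = ["person", "location"]

def predict_in_chunks(text, labels, max_words=300):
    # Split on whitespace and process fixed-size word windows so each
    # call stays under the model's sequence limit.
    words = text.split()
    entities = []
    for i in range(0, len(words), max_words):
        chunk = " ".join(words[i:i + max_words])
        # Note: "start"/"end" offsets in the results are relative to
        # each chunk, not to the full text.
        entities.extend(model.predict_entities(chunk, labels))
    return entities

long_text = "..."  # any document longer than 512 tokens
for entity in predict_in_chunks(long_text, labels):
    print(entity["text"], "=>", entity["label"])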

@psydok commented Aug 19, 2024

@polodealvarado
Thank you for your answer!
I ran an experiment: I limited the sequence to 512. The model's response time did improve, but the timings are about the same with ONNX as without it...
I checked it like this (labels=["person", "location"]):

# model = GLiNER.from_pretrained("my_models/gliner_multi", load_onnx_model=True, load_tokenizer=True)
%%timeit
entities = model.predict_entities(text[:512], labels, threshold=0.25)

onnx: 79.4 ms
without onnx: 72.9 ms

%%timeit
entities = model.predict_entities(text[:384], labels, threshold=0.25)

onnx (opset_version=14): 69.1 ms
without onnx: 65.4 ms

That is, there is still no increase in speed... I converted the model following the guide in this repository: https://github.com/urchade/GLiNER/blob/main/examples/convert_to_onnx.ipynb

@urchade (Owner) commented Aug 19, 2024

text[:512] takes the first 512 characters, not tokens.

You can change the maximum size by setting model.config.max_len = 512
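Putting both points together, a small sketch that raises max_len and trims the input on words rather than characters (the word cut is still only a rough proxy for the true token count):

from gliner import GLiNER

model = GLiNER.from_pretrained("urchade/gliner_multi")
labels = ["person", "location"]
text = "Barack Obama visited Berlin. " * 100  # deliberately long input

# Raise the sequence limit from its default (384, per the next comment).
model.config.max_len = 512

# text[:512] would cut 512 characters; cutting on words tracks the
# token budget more closely, though only the tokenizer knows exactly.
short_text = " ".join(text.split()[:384])

entities = model.predict_entities(short_text, labels, threshold=0.25)
for entity in entities:
    print(entity["text"], "=>", entity["label"])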

@psydok commented Aug 19, 2024

The default value is model.config.max_len = 384.
I tried more models from onnx-community; either the quality is terrible, or it is effectively just the original model...
https://huggingface.co/onnx-community/gliner_multi
