Gliner on CPU with multiple cores #155
Comments
Hi, 20 minutes for a single text?
Yes, it's a small paragraph.
This should take seconds.
Can you please review my code?
In my case:
Architecture: x86_64
I'm also trying to run in a CPU environment. I tried the sample code you provided, and it showed results in about 0.8 seconds. However, my model is loaded manually by specifying a directory; would that be faster? From what I've observed, running on CPU seems to be very resource-intensive: when I run predict_entities multiple times in my local environment, CPU usage stays pinned at 100%, with no progress so far. So I have a similar question about running GLiNER in a resource-constrained environment.
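One thing worth trying for the 100%-CPU symptom: capping the number of threads the numerical backends spawn. This is a general sketch, not something from the GLiNER docs; the environment variable names are the standard OpenMP/MKL ones, the value `4` is an arbitrary example, and whether this helps GLiNER specifically is an assumption to verify.

```python
import os

# Hypothetical thread cap: limiting BLAS/OpenMP thread pools often reduces
# CPU saturation (sometimes at a small latency cost). These must be set
# BEFORE importing torch or loading the model, or the backends may ignore them.
os.environ["OMP_NUM_THREADS"] = "4"   # "4" is an example value, not a recommendation
os.environ["MKL_NUM_THREADS"] = "4"

# If torch is installed, its intra-op pool can be capped as well:
# import torch
# torch.set_num_threads(4)
```

After this, load the model and call predict_entities as usual and compare CPU usage and latency against the uncapped run.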
@BeanWei If I understand you correctly, is this correct?
Hello @vijayendra-g. I tried to replicate your results, and it took me minutes as well. Has anyone tested it with ONNX?
I fixed the code and now it works for me.
@polodealvarado What is the code fix? How much time does GLiNER-medium take now? Please specify the GLiNER version as well.
@polodealvarado @vijayendra-g I have encountered the same problem. ONNX runs twice as long as the normal model, and the quantized model's quality drops sharply; it does not seem to be the kind of degradation one might have expected.
@psydok @vijayendra-g My problem arose with the sequence length. I realized that with sequences longer than 512 tokens the ONNX model takes a lot of time, so I simply shortened them. However, as you said @psydok, there is significant degradation in the model's performance with the quantized versions (up to 100% in some cases).
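The shortening step above can be sketched as a simple chunker that keeps each piece under the 512-token limit. Note this is a rough illustration using whitespace splitting; the model's real subword tokenizer will produce more tokens per word, so in practice you would use a smaller limit or count with the actual tokenizer.

```python
def chunk_text(text: str, max_tokens: int = 512) -> list[str]:
    """Split text into chunks of at most max_tokens whitespace tokens.

    Whitespace splitting is only a proxy for the model's tokenizer:
    subword tokenization yields more tokens, so leave headroom.
    """
    words = text.split()
    return [
        " ".join(words[i:i + max_tokens])
        for i in range(0, len(words), max_tokens)
    ]
```

Each chunk can then be passed to predict_entities separately and the results merged, at the cost of possibly splitting an entity across a chunk boundary.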
@polodealvarado
onnx: 79.4 ms
onnx (opset_version=14): 69.1 ms
That is, there is still no increase in speed. I converted the model as in this repository's guide: https://github.com/urchade/GLiNER/blob/main/examples/convert_to_onnx.ipynb
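When comparing numbers like the ones above across backends, it helps to measure consistently. This is a generic timing helper, not part of GLiNER; the warmup runs are there because the first inference calls are typically dominated by one-off allocation and graph-building costs.

```python
import time

def benchmark_ms(fn, *args, repeats: int = 10, warmup: int = 2, **kwargs) -> float:
    """Return the mean wall-clock time of fn(*args, **kwargs) in milliseconds."""
    for _ in range(warmup):          # discard cold-start runs
        fn(*args, **kwargs)
    start = time.perf_counter()
    for _ in range(repeats):
        fn(*args, **kwargs)
    return (time.perf_counter() - start) / repeats * 1000.0
```

Usage would be e.g. `benchmark_ms(model.predict_entities, text, labels)` for each backend, so the PyTorch and ONNX figures are averaged over the same number of warm runs.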
You can change the maximum size by setting
The default value is
I want to use GLiNER on CPU. The medium model takes anywhere between 18 and 20 minutes to extract entities from the given text.
My question is,