Suspicious extra node with opset 13 #1543
Comments
Interesting, awesome catch! The graph for opset-9/10 looks much better.
@spillerrec the extra node should be removed now. Have you been able to work around the ExtractImagePatches op?
I have managed to get the model running by splitting it into two parts and running them separately.
You could try using onnxruntime_extensions instead of cutting the graph. It will let you implement the missing op in Python (just call TF).
Thanks for the suggestion! Sadly I'm not having a lot of luck this way either. The ExtractImagePatches operator uses attributes which are not strings and this is apparently not (yet) supported: microsoft/onnxruntime-extensions#74
If I try to ignore those integer attributes and hardcode them (as they happen to match for the two instances), I'm getting another error:
And if I try to patch those attributes out of the network I'm getting another validation error:
I'm assuming there is some kind of schema created during the export which it is now no longer matching. Well, at least it is giving error messages which are actually insightful.
Hmm, you should be able to use it. Like you said, it doesn't support attributes so you will need to remove them from the graph. The tutorial I sent you shows an example of converting the attributes to inputs, but hard-coding the values should be fine too. Your second error looks the most promising. It is actually calling the extract_image_patches function, but you seem to have defined that function with 3 extra positional args. If you are hard-coding their values into the TF call anyway, removing those should be fine and fix the issue.
Using a conversion rule to remove the attributes worked. And the output is the same as splitting the graph in two, i.e. not correct. After further investigation, I found that the inputs/outputs are supposed to be normalized to [-1, 1] instead of [0, 1], and now the output is as expected. So the small file size is indeed correct, and the extra nodes with opset 13 are gone on master. 🎉 Now I just need to find a proper alternative to the ExtractImagePatches op without relying on TensorFlow. And clean up the graph further; there are a bunch of Transposes and Squeeze/Unsqueezes you can get rid of if you move nodes around. (And the entire subgraph in the middle is replicated 8 times due to the batch size being fixed to 8. Not sure if it is stored like that in the checkpoint or if it is the exporting, but it is quite inconvenient.)
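The [-1, 1] normalization mentioned above is a simple affine remap of [0, 1] data; a tiny sketch of pre/post-processing wrappers (the function names are invented for illustration):

```python
import numpy as np

def to_model_range(img01):
    # Map [0, 1] pixel data to the [-1, 1] range the model expects.
    return img01 * 2.0 - 1.0

def from_model_range(out):
    # Map [-1, 1] model output back to [0, 1] for display or saving.
    return (out + 1.0) / 2.0
```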
@spillerrec Great, glad to hear you got it working! What is your intended use case? You can probably implement ExtractImagePatches in Python using numpy, or you could write a C++ implementation (and even contribute it to the onnxruntime_extensions repo). For Transposes/Unsqueezes, we try to get rid of those, so if there is anything obvious feel free to open another issue. The 8 subgraphs probably aren't coming from us; we don't duplicate stuff like that, but it can be introduced by TF, especially if you use a Python loop in your TF code.
@TomWildenhain-Microsoft I'm trying to convert existing AI projects to use ONNX models along with a simple implementation. I'm hoping this will make it easier for application developers (myself included) to include them in their own projects, without having to deal with TensorFlow, PyTorch, etc. And to learn a bit about what kind of network architectures are used and how they work in the process, which is why I'm spending some effort on making the graph prettier. While I intend to use onnxruntime myself (likely both a C++ and Python implementation), I would really like to avoid non-standard extensions if possible, as it will only cause friction if someone wants to convert it to another runtime. (I would like to try onnxjs for example, though the lack of support for ConvTranspose means I only have an ESRGAN model to try it with right now.) I will start with a numpy implementation of ExtractImagePatches and take it from there, but the TensorFlow documentation is quite lacking and I'm not sure if I will end up with a complete implementation that I can feel confident works for all cases. But I will try to see if I can find a more general solution to what can be done with ExtractImagePatches, either by some kind of conversion or some argumentation for why ExtractImagePatches should be standardized. I will comment in #436 if I make any progress on that.
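A numpy implementation of ExtractImagePatches might look like the sketch below. This is a best-effort reading of the TF documentation for `tf.image.extract_patches` (NHWC layout, patch values flattened in [row, col, depth] order), not a verified drop-in replacement; the SAME-padding branch in particular is an assumption about TF's padding arithmetic:

```python
import numpy as np

def extract_image_patches(images, ksizes, strides, rates, padding="VALID"):
    """Sketch of TF's ExtractImagePatches in numpy.

    images: [batch, rows, cols, depth] (NHWC). ksizes, strides, and rates are
    4-element lists [1, k, k, 1] as in TF.
    """
    b, h, w, d = images.shape
    _, kh, kw, _ = ksizes
    _, sh, sw, _ = strides
    _, rh, rw, _ = rates
    # Effective (dilated) kernel extent when rates > 1.
    ekh = kh + (kh - 1) * (rh - 1)
    ekw = kw + (kw - 1) * (rw - 1)
    if padding == "SAME":
        # Assumed TF-style SAME padding: pad so out size is ceil(in / stride).
        out_h = -(-h // sh)
        out_w = -(-w // sw)
        ph = max((out_h - 1) * sh + ekh - h, 0)
        pw = max((out_w - 1) * sw + ekw - w, 0)
        images = np.pad(images, ((0, 0), (ph // 2, ph - ph // 2),
                                 (pw // 2, pw - pw // 2), (0, 0)))
        h, w = images.shape[1:3]
    out_h = (h - ekh) // sh + 1
    out_w = (w - ekw) // sw + 1
    out = np.empty((b, out_h, out_w, kh * kw * d), dtype=images.dtype)
    for i in range(out_h):
        for j in range(out_w):
            # Dilated window at this output position, flattened row-major.
            patch = images[:, i * sh : i * sh + ekh : rh,
                              j * sw : j * sw + ekw : rw, :]
            out[:, i, j, :] = patch.reshape(b, -1)
    return out
```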
@spillerrec Cool! Where are you putting the results of your conversion project? ONNX has a model zoo, but it can be lacking: https://github.com/onnx/models. Sounds like you are primarily using onnx for its interoperability, not the runtime perf gains of onnxruntime. Is that accurate? I'm informally surveying to see whether people care more about onnxruntime CPU or GPU perf, but it sounds like perf might not be the primary concern for you. The ideal solution would probably be converting the op into a sequence of onnx ops if possible. It is probably sufficient to convert just the rate = [1, 1, 1, 1] case. I'm a bit busy the next two weeks but maybe I'll get a chance soon-ish.
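For the rate = [1, 1, 1, 1] case, the op reduces to a strided sliding window, which is exactly the kind of thing a sequence of standard ops (slices/concats, or a space-to-depth-style rearrangement) can express. A numpy sketch of that special case using `sliding_window_view` (the function name and layout assumptions are mine, not from the thread):

```python
import numpy as np

def extract_patches_rate1(images, kh, kw, sh, sw):
    # NHWC input, VALID padding, rate = [1, 1, 1, 1] only.
    # sliding_window_view gives shape [b, h-kh+1, w-kw+1, d, kh, kw].
    win = np.lib.stride_tricks.sliding_window_view(images, (kh, kw), axis=(1, 2))
    # Subsample by the strides, then reorder each window to [kh, kw, d]
    # so the flattened patch matches TF's [row, col, depth] ordering.
    win = win[:, ::sh, ::sw]
    b, oh, ow = win.shape[:3]
    return win.transpose(0, 1, 2, 4, 5, 3).reshape(b, oh, ow, kh * kw * images.shape[3])
```

Each step here has a plausible ONNX counterpart (strided Slice, Transpose, Reshape), which is roughly what a converter rule for this case would emit.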
@TomWildenhain-Microsoft It is still at a quite early stage, so currently I just have a shared folder with the models: https://drive.google.com/drive/folders/1Rc_VxLYXKapwpde_7haCI6j4_wiooitE?usp=sharing For this, yes, performance is not the main focus. GPU acceleration without complicated dependencies (such as non-distributable CUDA libraries) is a wish. I would much rather have 20% slower GPU performance and be able to support non-NVIDIA hardware. At work, we are trying to start using ML, and there we do care about GPU performance. Running CNNs on 50 Mpix images at full resolution takes time, and CPU performance is way too slow. GPU performance is usable, but not great. (We are using CoreML and WinML at the moment.)
Describe the bug
Converting a checkpoint model gives a suspiciously small ONNX file, and using `--opset 13` adds an extra node compared to the default opset: the `Split` node to the left does not appear when using opset 9. I initially thought this node was unconnected, but I believe I might be mistaken and the `Split` node is using a constant input which isn't optimized away with opset 13.

Urgency
None
System information
To Reproduce
Download this model: https://drive.google.com/open?id=1IMwzqZUuRnTv5jcuKdvZx-RZweknww5x
Run:
Additional context
I do not know the specifics of the model as I haven't made it, but since the file size before conversion is 180 MB and only 14 MB after, it looks like something isn't working as it should. It does give an error about the missing op ExtractImagePatches (#436):
But since the node is actually placed in the graph, I assume it is fine and it just means I can't run it. I want to take a closer look at ExtractImagePatches and see if there is a way to work around it, but before I spend too much time on trying to understand it and how it could be mapped to other operators, I want to be more confident that the ONNX file actually is correct, as I can't test it in its current state.
The checkpoint saved state might contain a discriminator network as well (assuming GAN training), which could perhaps be the reason for the file size discrepancy. But I do not know TensorFlow and I can't tell from the code, nor how to extract this to see if that is the remaining 166 MB. However, I do know that this model replaces an older Keras-based model which I successfully converted to ONNX, and that was ~128 MB, so I remain suspicious.