htmlchecker: allow specifying error handling on encoding error #410

Closed
wants to merge 2 commits into from

Conversation

@Bixilon Bixilon commented Jan 18, 2024

I need to fetch a binary-encoded file that contains the update name. The file is binary and contains only a few readable strings. The problem is that the check already fails with `Error querying for new versions: 'utf-8' codec can't decode bytes in position`, so I am not able to apply any regex to it. With this change I can set an attribute called `encoding-error` to `ignore`. This is kind of hacky.

I am using the following checker config, which now works fine (I tested it):

        x-checker-data:
          type: html
          url: http://versions.teamspeak.com/ts3-client-2
          version-pattern: "\u0006stable\u0010.*3\\.(\\d+\\.\\d+)\u0012"
          encoding-error: ignore
          url-template: https://files.teamspeak-services.com/releases/client/3.$version/TeamSpeak3-Client-linux_amd64-3.$version.run
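
For illustration, the effect of `encoding-error: ignore` can be reproduced in plain Python. This is only a sketch: it assumes the option is passed through to the `errors` argument of `bytes.decode`, which this thread does not confirm. `SAMPLE` is an excerpt of the response (the header field, the `beta`, `stable` and `alpha` entries, and the trailing field) copied from the hex dump posted later in this thread, and `latest_stable` is a hypothetical helper name.

```python
import re

# Excerpt of the response body, copied from the hex dump in this thread:
# header field, the "beta", "stable" and "alpha" entries, trailing field.
SAMPLE = bytes.fromhex(
    "0805"
    "12130a0462657461" "10ddffaaa806" "1a05332e362e32"
    "12150a06737461626c65" "10ddffaaa806" "1a05332e362e32"
    "12140a05616c706861" "10e9f796ab06" "1a05332e362e33"
    "1804"
)

# The version-pattern from the checker config, after YAML unescaping.
VERSION_PATTERN = "\u0006stable\u0010.*3\\.(\\d+\\.\\d+)\u0012"


def latest_stable(data, errors="strict"):
    """Decode data as UTF-8 with the given error handler, then apply the pattern.

    Assumption: `errors` plays the role of the proposed `encoding-error` option.
    """
    text = data.decode("utf-8", errors=errors)
    match = re.search(VERSION_PATTERN, text)
    return match.group(1) if match else None
```

With the default strict handler, `latest_stable(SAMPLE)` raises `UnicodeDecodeError` because the timestamp bytes are not valid UTF-8. With `errors="ignore"` those bytes are silently dropped and the call returns `"6.2"`, which the `url-template` expands to `3.6.2`.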

Bixilon added a commit to flathub/com.teamspeak.TeamSpeak3 that referenced this pull request Jan 20, 2024
@Bixilon
Author

Bixilon commented Jan 30, 2024

So, anything on this?

@dbnicholson
Contributor

I'm not the maintainer here, but I don't think you should try to force a binary download through htmlchecker. HTML is by definition a text language. I took a look at http://versions.teamspeak.com/ts3-client-2, and it's definitely not HTML. What an odd choice to encode that in a custom binary format instead of JSON or something.

As it turns out, I think this is protobuf format.

$ hd ts3-client-2 
00000000  08 05 12 16 0a 06 73 65  72 76 65 72 10 e5 b0 f0  |......server....|
00000010  f0 05 1a 06 33 2e 31 31  2e 30 12 1e 0a 0f 61 6c  |....3.11.0....al|
00000020  70 68 61 5f 6c 69 6e 75  78 5f 78 38 36 10 e6 c3  |pha_linux_x86...|
00000030  f9 fd 05 1a 05 33 2e 35  2e 36 12 1d 0a 0e 62 65  |.....3.5.6....be|
00000040  74 61 5f 6c 69 6e 75 78  5f 78 38 36 10 e6 c3 f9  |ta_linux_x86....|
00000050  fd 05 1a 05 33 2e 35 2e  36 12 1f 0a 10 73 74 61  |....3.5.6....sta|
00000060  62 6c 65 5f 6c 69 6e 75  78 5f 78 38 36 10 e6 c3  |ble_linux_x86...|
00000070  f9 fd 05 1a 05 33 2e 35  2e 36 12 13 0a 04 62 65  |.....3.5.6....be|
00000080  74 61 10 dd ff aa a8 06  1a 05 33 2e 36 2e 32 12  |ta........3.6.2.|
00000090  15 0a 06 73 74 61 62 6c  65 10 dd ff aa a8 06 1a  |...stable.......|
000000a0  05 33 2e 36 2e 32 12 14  0a 05 61 6c 70 68 61 10  |.3.6.2....alpha.|
000000b0  e9 f7 96 ab 06 1a 05 33  2e 36 2e 33 18 04        |.......3.6.3..|
000000be
$ ~/go/bin/protoscope ts3-client-2 
1: 5
2: {
  1: {
    14:SGROUP
    12: 4.5449766e30i32   # 0x72657672i32
  }
  2: 1578899557
  3: {"3.11.0"}
}
2: {
  1: {"alpha_linux_x86"}
  2: 1606312422
  3: {"3.5.6"}
}
2: {
  1: {"beta_linux_x86"}
  2: 1606312422
  3: {"3.5.6"}
}
2: {
  1: {"stable_linux_x86"}
  2: 1606312422
  3: {"3.5.6"}
}
2: {
  1: {"beta"}
  2: 1695203293
  3: {"3.6.2"}
}
2: {
  1: {"stable"}
  2: 1695203293
  3: {"3.6.2"}
}
2: {
  1: {"alpha"}
  2: 1701166057
  3: {"3.6.3"}
}
3: 4

It looks like each item is a tuple of name, time of update and version number. While you could probably get away with parsing it with a regex, it's certainly not robust. This seems like it needs to be a custom checker to be done correctly.
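
As a rough sketch of what such a custom checker might do, the tuples can be decoded by walking the protobuf wire format directly. The field meanings (1 = name, 2 = update time, 3 = version) are inferred from the dump above rather than from a published schema, and the function names are hypothetical. `SAMPLE` is an excerpt of the dump: the leading field, the `beta`, `stable` and `alpha` entries, and the trailing field.

```python
def read_varint(buf, pos):
    """Decode a base-128 varint starting at pos; return (value, next_pos)."""
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:  # high bit clear marks the last byte
            return result, pos
        shift += 7


def iter_fields(buf):
    """Yield (field_number, wire_type, value) for each field in a message."""
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field, wire_type = key >> 3, key & 0x07
        if wire_type == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire_type == 2:  # length-delimited: string, bytes or sub-message
            length, pos = read_varint(buf, pos)
            value = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError(f"unsupported wire type {wire_type}")
        yield field, wire_type, value


def parse_channels(data):
    """Map channel name -> (timestamp, version).

    Assumption: top-level field 2 holds (name, time, version) entries.
    """
    channels = {}
    for field, wire_type, value in iter_fields(data):
        if field != 2 or wire_type != 2:
            continue  # skip the top-level integers (fields 1 and 3)
        entry = {f: v for f, _, v in iter_fields(value)}
        channels[entry[1].decode("ascii")] = (entry[2], entry[3].decode("ascii"))
    return channels


# Bytes copied from the hex dump above (excerpt, not the full response).
SAMPLE = bytes.fromhex(
    "0805"
    "12130a0462657461" "10ddffaaa806" "1a05332e362e32"
    "12150a06737461626c65" "10ddffaaa806" "1a05332e362e32"
    "12140a05616c706861" "10e9f796ab06" "1a05332e362e33"
    "1804"
)
```

Here `parse_channels(SAMPLE)["stable"]` evaluates to `(1695203293, "3.6.2")`, matching the protoscope output. Treating field 1 as an opaque string also sidesteps the misparse of `server` in the first entry, where protoscope guessed a nested message.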

Alternatively, there could maybe be a type: raw checker that reads in binary data and then uses a binary regex before decoding the match back to a string.
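
A minimal sketch of that idea, assuming a hypothetical helper `find_version` (no `type: raw` checker exists in this thread): the pattern is a `bytes` regex applied to the undecoded body, and only the captured group is decoded. `SAMPLE` is an excerpt copied from the hex dump above.

```python
import re

# Bytes copied from the hex dump above (excerpt, not the full response).
SAMPLE = bytes.fromhex(
    "0805"
    "12130a0462657461" "10ddffaaa806" "1a05332e362e32"
    "12150a06737461626c65" "10ddffaaa806" "1a05332e362e32"
    "12140a05616c706861" "10e9f796ab06" "1a05332e362e33"
    "1804"
)

# Name field (tag 0x0a, length 6, "stable"), then the timestamp field
# (tag 0x10 plus a varint: continuation bytes followed by one final byte),
# then the version field (tag 0x1a, one length byte, dotted digits).
STABLE_PATTERN = rb"\x0a\x06stable\x10[\x80-\xff]*[\x00-\x7f]\x1a.(\d+(?:\.\d+)+)"


def find_version(data, pattern, encoding="ascii"):
    """Search raw bytes with a bytes regex and decode only the first group."""
    # DOTALL so that "." also matches a length byte equal to 0x0a (newline).
    match = re.search(pattern, data, re.DOTALL)
    if match is None:
        return None
    return match.group(1).decode(encoding)
```

`find_version(SAMPLE, STABLE_PATTERN)` returns `"3.6.2"` without ever decoding the timestamp bytes. The length prefix `\x06` keeps the pattern from matching `stable_linux_x86`, but this is still a regex over a binary format and shares the robustness concerns mentioned above.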

@wjt
Contributor

wjt commented May 15, 2024

Agreed. I don't think the html checker is the right tool to use here.

Happily, it seems that TeamSpeak is covered by release-monitoring.org (https://release-monitoring.org/project/8714/), so you can use the anitya checker.

@wjt wjt closed this May 15, 2024
@Bixilon
Author

Bixilon commented May 16, 2024

@dbnicholson That's an interesting call. I did not notice it (I have not worked with protobuf before).

@wjt Agreed, but maybe there is a future use for this, since there are broken web pages out there. But yes, I am abusing it for my use case.
