Deal with BOM and other encoding issues with input files. #619

Open
igorbrigadir opened this issue Mar 30, 2022 · 0 comments
@igorbrigadir (Contributor)

The error came from here: DocNow/twarc-timeline-archive#5 (comment)

We should detect and strip (or otherwise handle) a byte order mark (BOM) and other special control characters so that twarc doesn't crash. This applies to any part of twarc that opens files as input, not just that one plugin.
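A minimal sketch of the stripping side, assuming input files are UTF-8 text with one item (e.g. a tweet ID) per line. `read_input_lines` and `clean_line` are hypothetical helpers, not existing twarc functions. Python's built-in `utf-8-sig` codec drops a leading UTF-8 BOM and otherwise decodes like plain `utf-8`; the filter then removes any remaining control or format characters (Unicode categories Cc and Cf, which include a stray U+FEFF or U+200B in the middle of a line).

```python
import unicodedata
from typing import Iterator


def clean_line(line: str) -> str:
    """Drop the line ending and any control/format characters, keeping tabs.

    Unicode category "Cc" covers C0/C1 control characters; "Cf" covers format
    characters such as U+FEFF (BOM / zero-width no-break space) and U+200B.
    """
    line = line.rstrip("\r\n")
    return "".join(
        ch for ch in line
        if ch == "\t" or unicodedata.category(ch) not in ("Cc", "Cf")
    )


def read_input_lines(path: str) -> Iterator[str]:
    """Yield cleaned, non-empty lines from a UTF-8 input file (hypothetical helper)."""
    # "utf-8-sig" silently removes a leading UTF-8 BOM if one is present.
    with open(path, encoding="utf-8-sig") as f:
        for line in f:
            cleaned = clean_line(line).strip()
            if cleaned:
                yield cleaned
```

A file that is not UTF-8 at all will still raise `UnicodeDecodeError` here, so this only covers the UTF-8-with-BOM case.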

The Twitter API data is all UTF-8, and issues with the BOM in output files have come up before when using > output redirection on Windows: #343, #207, #297.
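Stripping a UTF-8 BOM is not enough when a file was written in another encoding; for example, > redirection in Windows PowerShell 5.1 writes UTF-16 LE with a BOM by default. Below is a sketch of sniffing the BOM to choose a decoder, using only the standard `codecs` BOM constants. `decode_with_bom_sniffing` and `read_text_input` are hypothetical names, and falling back to UTF-8 when there is no BOM is an assumption based on the Twitter API data being UTF-8.

```python
import codecs

# (BOM bytes, codec name) pairs. Order matters: the UTF-32 LE BOM (FF FE 00 00)
# starts with the UTF-16 LE BOM (FF FE), so the longer BOMs are checked first.
_BOMS = [
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]


def decode_with_bom_sniffing(data: bytes, default: str = "utf-8") -> str:
    """Decode bytes using the encoding implied by a leading BOM, if any.

    The BOM itself is removed; without a BOM, fall back to `default`.
    """
    for bom, encoding in _BOMS:
        if data.startswith(bom):
            return data[len(bom):].decode(encoding)
    return data.decode(default)


def read_text_input(path: str) -> str:
    """Read a whole input file as text, whatever Unicode BOM it starts with."""
    with open(path, "rb") as f:
        return decode_with_bom_sniffing(f.read())
```

Reading the whole file into memory keeps the sketch short; a streaming version could sniff the first four bytes and then wrap the file object with `codecs.getreader`.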
