Give UnigramVocabulary the option to split/join on the space character in all encode/decode functions. #756

Closed

@copybara-service copybara-service bot commented Jul 9, 2024

Give UnigramVocabulary the option to split/join on the space character in all encode/decode functions.

Previously, decode joined tokens on the space character, while encode, encode_tf, and decode_tf ignored every token after the first. Now, if "split_on_space" is True, all four functions split or join on the space character, consistent with decode.

Having a UnigramVocabulary that encodes/decodes invertibly is useful for testing.
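
As a rough illustration of the invertibility described above, here is a minimal pure-Python sketch. It is not the library's implementation: the class name ToyUnigramVocabulary, its constructor arguments, the "UNK" placeholder, and the handling of the split_on_space=False branch are all assumptions made for this example, and the TensorFlow variants (encode_tf/decode_tf) are left out.

```python
from typing import List, Sequence


class ToyUnigramVocabulary:
    """Hypothetical stand-in for illustration; not the real UnigramVocabulary."""

    def __init__(self, unigrams: Sequence[str], split_on_space: bool = False):
        self._id_to_piece = list(unigrams)
        self._piece_to_id = {p: i for i, p in enumerate(self._id_to_piece)}
        # Assumption: a single extra id is reserved for unknown pieces.
        self._unk_id = len(self._id_to_piece)
        self._split_on_space = split_on_space

    def encode(self, text: str) -> List[int]:
        # With split_on_space=True, each space-separated piece gets its own id.
        # The False branch (whole string as one piece) is a simplification and
        # may not match the library's earlier behavior.
        pieces = text.split(" ") if self._split_on_space else [text]
        return [self._piece_to_id.get(p, self._unk_id) for p in pieces]

    def decode(self, ids: Sequence[int]) -> str:
        # Join on the space character, mirroring the split done in encode.
        pieces = [
            self._id_to_piece[i] if 0 <= i < len(self._id_to_piece) else "UNK"
            for i in ids
        ]
        return " ".join(pieces)


vocab = ToyUnigramVocabulary(["the", "cat", "sat"], split_on_space=True)
ids = vocab.encode("the cat sat")
# Round trip: decoding the encoded ids recovers the original string.
assert vocab.decode(ids) == "the cat sat"
```

The round trip only holds for inputs whose space-separated pieces are all in the vocabulary; unknown pieces collapse to the placeholder in this sketch.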

@copybara-service copybara-service bot force-pushed the test_650738325 branch 4 times, most recently from 612c5fd to a6965ca Compare July 10, 2024 20:02
@copybara-service copybara-service bot changed the title Make UnigramVocabulary split/join on the space character in all encode/decode functions. Give UnigramVocabulary the option to split/join on the space character in all encode/decode functions. Jul 10, 2024
@copybara-service copybara-service bot closed this Jul 10, 2024
@copybara-service copybara-service bot deleted the test_650738325 branch July 10, 2024 20:08