Is the GPT tokenizer model open-source? #653

Open
xyyintel opened this issue May 26, 2023 · 3 comments

@xyyintel

Hi,

When I try to download the tokenizer model from gs://mlperf-llm-public2/vocab/c4_en_301_5Mexp2_spm.model with this command:
./google-cloud-sdk/bin/gsutil cp -R gs://mlperf-llm-public2/vocab/c4_en_301_5Mexp2_spm.model ./
I receive the following error:
AccessDeniedException: 403 [email protected] does not have storage.objects.list access to the Google Cloud Storage bucket. Permission 'storage.objects.list' denied on resource (or it may not exist).

Any suggestions?

@xyyintel
Author

P.S. I created a new project and, on the 'IAM' page, have already granted the following roles:

Environment and Storage Object Administrator
Environment and Storage Object User
Owner
Storage Admin
Storage Object Admin
Storage Object Viewer

I assume these roles include the 'storage.objects.list' permission.

Is this because the dataset owner has not made the bucket public to all users without restriction?

@ShriyaPalsamudram
Contributor

All required data can be downloaded by following the instructions in the S3 artifacts download section of the README.

@hiwotadese

@xyyintel can you try again with the updated instructions in the README?
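
On the public-access question raised earlier in the thread: roles granted in your own project generally apply only to that project's resources, so they would not open a bucket owned by someone else. One way to check whether an object is readable without credentials is to request it anonymously over HTTPS. The following is a minimal sketch, not part of the project's tooling. The helper names `public_url` and `anonymous_status` are hypothetical, and it assumes the object would be served from the standard `https://storage.googleapis.com/<bucket>/<object>` endpoint.

```python
import urllib.error
import urllib.request
from urllib.parse import quote


def public_url(gs_uri: str) -> str:
    """Map a gs://bucket/object URI to its HTTPS endpoint (hypothetical helper)."""
    prefix = "gs://"
    if not gs_uri.startswith(prefix):
        raise ValueError(f"not a gs:// URI: {gs_uri!r}")
    bucket, _, obj = gs_uri[len(prefix):].partition("/")
    if not bucket or not obj:
        raise ValueError(f"expected gs://bucket/object, got {gs_uri!r}")
    # quote() leaves '/' unescaped by default, so nested object paths survive.
    return f"https://storage.googleapis.com/{bucket}/{quote(obj)}"


def anonymous_status(gs_uri: str, timeout: float = 10.0) -> int:
    """Send an unauthenticated HEAD request and return the HTTP status code."""
    req = urllib.request.Request(public_url(gs_uri), method="HEAD")
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Error responses such as 403 or 404 arrive as exceptions; report the code.
        return err.code
```

Calling `anonymous_status("gs://mlperf-llm-public2/vocab/c4_en_301_5Mexp2_spm.model")` should return 200 if the object is publicly readable. A 403 or 404 would suggest that it is not public or no longer exists at that path, in which case no role granted in your own project is likely to help.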
