Crawl button with javascript navigation #665

Open
hamzamac opened this issue Aug 6, 2024 · 5 comments
Comments

@hamzamac

hamzamac commented Aug 6, 2024

Hi,
We are trying to crawl a site that uses buttons with JavaScript-based navigation instead of links or anchor tags. The JavaScript code controlling the behavior of the buttons is hosted on a different domain (a CDN) from that of the target site.
When replaying the WACZ, every place that has a button instead of a link results in an error: the replay cannot reach the URL of the JavaScript file.

How can we crawl such a website with Browsertrix-crawler?

@tw4l
Contributor

tw4l commented Aug 6, 2024

Hi @hamzamac, would you be able to share the URL of the site you're trying to capture so I can take a look?

@hamzamac
Author

hamzamac commented Aug 6, 2024

Hi @tw4l, thank you for responding. The site is actually a SharePoint site with MFA. We managed to crawl it by creating a profile, but the links to folders appear to be spans.
image

When clicking the button in replayweb.page, it shows the error below:
image
(The URL it is pointing to is a public CDN URL, which is accessible.)
Do we need to include all the URIs for the JavaScript files in the seeds?

@tw4l
Contributor

tw4l commented Aug 6, 2024

Hm, you shouldn't need to include the URIs for scripts - if the script is on the page, the crawler will discover it. This looks to me like it's more likely to be a bug in our replay engine than a missing script. It's hard to tell further without being able to reproduce it ourselves - would you be able to share a copy of the WACZ by email?
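
One way to narrow down whether the script is missing from the capture or whether replay is at fault is to check if the CDN URL appears in the WACZ index. The following is only a rough sketch, not part of Browsertrix-crawler: it assumes the WACZ is a ZIP file whose `indexes/` directory holds CDXJ index files (`.cdx`, `.cdxj`, or `.cdx.gz`) in which each line ends with a JSON object containing a `url` field. The function name `captured_urls` and the file name `crawl.wacz` are made up for illustration.

```python
import gzip
import json
import zipfile


def captured_urls(wacz_file):
    """Collect the URLs listed in a WACZ's CDXJ index files.

    Assumes (not verified against this crawl) that index lines look like:
        <sort key> <timestamp> {"url": "...", ...}
    """
    urls = set()
    with zipfile.ZipFile(wacz_file) as zf:
        for name in zf.namelist():
            # Only look at index files; WARC data lives elsewhere in the ZIP.
            if not name.startswith("indexes/"):
                continue
            if not name.endswith((".cdx", ".cdxj", ".cdx.gz")):
                continue
            raw = zf.read(name)
            if name.endswith(".gz"):
                raw = gzip.decompress(raw)
            for line in raw.decode("utf-8").splitlines():
                brace = line.find("{")
                if brace == -1:
                    continue  # not a CDXJ line
                try:
                    record = json.loads(line[brace:])
                except json.JSONDecodeError:
                    continue  # skip lines that do not end in valid JSON
                url = record.get("url")
                if url:
                    urls.add(url)
    return urls
```

For example, `"<the CDN script URL from the error>" in captured_urls("crawl.wacz")` would indicate whether the script was recorded at all. If it was, that would be consistent with the problem being on the replay side rather than a missing capture.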

@hamzamac
Author

hamzamac commented Aug 7, 2024

Hi @tw4l, sure I will send the WACZ to the email on your profile.

@hamzamac
Author

Hi @tw4l, can you please confirm whether you have received the WACZ file? Thanks.
