Dynamically extract the download URL from the feed #47

Closed
mbridak wants to merge 3 commits from patch-1

Conversation


@mbridak commented Jun 4, 2023

Description

Use an lxml XPath query to pull the download link off of the web page, resolving issue #10.

Fixes #10
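
For context, the core of the change amounts to something like the sketch below. The names follow the diff; the update_url value here is only an assumed placeholder for the page being scraped, not taken from the PR.

from lxml import html  # new dependency introduced by this PR
import requests

session = requests.Session()
update_url = "http://www.country-files.com/bigcty/"  # assumed page URL, for illustration only

page = session.get(update_url)
tree = html.fromstring(page.content)
# Take the first link on the page whose href points at a .zip archive.
dl_url = tree.xpath("//a[contains(@href,'zip')]/@href")[0]
rq = session.get(dl_url)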

Type of change

  • Bug fix

How has this been tested?

By running it.

Checklist

  • Issue exists for PR
  • Code reviewed by the author
  • Code documented (comments or other documentation)
  • Changes tested
  • All tests pass (see DEVELOPING.md, if it exists)
  • CHANGELOG.md updated if needed
  • Informative commit messages
  • Descriptive PR title

Extract the download link.
dl_url = f'http://www.country-files.com/bigcty/download/{update_date[:4]}/bigcty-{update_date}.zip' # TODO: Issue #10
page = session.get(update_url)
tree = html.fromstring(page.content)
dl_url = tree.xpath("//a[contains(@href,'zip')]/@href")[0]
rq = session.get(dl_url)
if rq.status_code == 404:
Member

is this codepath still necessary? I think it exists because there was a change in the feed bafa05d

Member

I think that fallback is needed, but very differently, and only if the download URL extraction fails (see other comment)

dl_url = f'http://www.country-files.com/bigcty/download/{update_date[:4]}/bigcty-{update_date}.zip' # TODO: Issue #10
page = session.get(update_url)
tree = html.fromstring(page.content)
dl_url = tree.xpath("//a[contains(@href,'zip')]/@href")[0]
Member

what if tree.xpath() doesn't find anything? would [0] IndexError? we should handle that, perhaps

Author

Agreed
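
One way to handle that, sketched here rather than taken from the PR:

links = tree.xpath("//a[contains(@href,'zip')]/@href")
if not links:
    # No .zip link found on the page; raise a clear error (or fall back to
    # the old constructed URL) instead of letting links[0] raise IndexError.
    raise ValueError("no .zip download link found on the update page")
dl_url = links[0]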

@@ -172,7 +173,9 @@ def update(self) -> bool:

with tempfile.TemporaryDirectory() as temp:
path = pathlib.PurePath(temp)
dl_url = f'http://www.country-files.com/bigcty/download/{update_date[:4]}/bigcty-{update_date}.zip' # TODO: Issue #10
page = session.get(update_url)
Member

should probably check for status before trying to parse the content

Author

Agreed
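
A sketch of that check, using the session and update_url names from the diff:

page = session.get(update_url)
# Abort on a non-2xx response rather than feeding an error page to lxml.
page.raise_for_status()
tree = html.fromstring(page.content)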

@0x5c changed the title from "Fix for issue #10" to "Dynamically extract the download URL from the feed" on Jun 4, 2023
@0x5c (Member) left a comment

CHANGELOG.md needs to be updated

Your commits will also need to be squashed together, and the resulting commit should describe what it does

@@ -172,7 +173,9 @@ def update(self) -> bool:

with tempfile.TemporaryDirectory() as temp:
path = pathlib.PurePath(temp)
dl_url = f'http://www.country-files.com/bigcty/download/{update_date[:4]}/bigcty-{update_date}.zip' # TODO: Issue #10
Member

It would probably be good to keep that URL as a fallback, like the other URL format already is
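
Sketched, that fallback could look like this (not part of the PR as submitted; names follow the diff):

try:
    dl_url = tree.xpath("//a[contains(@href,'zip')]/@href")[0]
except IndexError:
    # Extraction found no link; fall back to the date-based URL the code
    # currently constructs.
    dl_url = (f'http://www.country-files.com/bigcty/download/'
              f'{update_date[:4]}/bigcty-{update_date}.zip')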


@@ -7,3 +7,4 @@ sphinx
# Dependencies
feedparser
requests
lxml
Member

Is lxml not needed as a normal dependency? This file is only used for development, so lxml should also be added to setup.py (setup.py being what defines the dependencies to install when someone installs ctyparser).
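
For illustration only, a hypothetical setup.py excerpt with lxml added to the install-time dependencies (the real file's other arguments are omitted):

from setuptools import setup

setup(
    name="ctyparser",
    # Runtime dependencies pulled in when the package itself is installed.
    install_requires=[
        "feedparser",
        "requests",
        "lxml",
    ],
)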


mbridak commented Jun 4, 2023

Sorry, I think I'm creating more of a problem for you than what I'm solving.

@mbridak closed this on Jun 4, 2023
@mbridak deleted the patch-1 branch on Jun 4, 2023, 18:47