
Youtube's atom feed is parsed as XMLFeed #65

Open
gnull opened this issue Jul 2, 2023 · 2 comments

Comments

@gnull

gnull commented Jul 2, 2023

Each channel on YouTube has an Atom feed, or at least what looks like a valid Atom feed (I'm not an expert).

For example:

$ curl -s 'https://www.youtube.com/feeds/videos.xml?channel_id=UCL1rJ0ROIw9V1qFeIN0ZTZQ' | head -6
<?xml version="1.0" encoding="UTF-8"?>
<feed xmlns:yt="http://www.youtube.com/xml/schemas/2015" xmlns:media="http://search.yahoo.com/mrss/" xmlns="http://www.w3.org/2005/Atom">
 <link rel="self" href="http://www.youtube.com/feeds/videos.xml?channel_id=UCL1rJ0ROIw9V1qFeIN0ZTZQ"/>
 <id>yt:channel:</id>
 <yt:channelId></yt:channelId>
 <title>Екатерина Шульман</title>

But when I parse it with parseFeedSource, I get an XMLFeed, which I assume is the generic fallback for anything that wasn't recognized as Atom or RSS:

ghci> :module + Control.Lens Network.Wreq Text.Feed.Import Text.Feed.Types Data.XML.Types
ghci> f <- Network.Wreq.get "https://www.youtube.com/feeds/videos.xml?channel_id=UCL1rJ0ROIw9V1qFeIN0ZTZQ"
ghci> Just f' = parseFeedSource $ f ^. responseBody
ghci> XMLFeed f'' = f'
ghci> elementName f''
Name {nameLocalName = "feed", nameNamespace = Just "http://www.w3.org/2005/Atom", namePrefix = Nothing}

Am I correct to assume this is a bug? If so, can you give me any pointers on where to start fixing it?

Ivan

@gnull
Author

gnull commented Jul 2, 2023

I'm using this version of feed:

https://www.stackage.org/lts-20.18/package/feed-1.3.2.1

@gnull
Author

gnull commented Jul 3, 2023

u <- pLeaf "updated" es

It's this check that's failing, and indeed YouTube's feed doesn't contain an <updated> element.
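To see why one missing child turns the whole result into an XMLFeed, here is a simplified, hypothetical model. It assumes the importer tries its Atom parser first and, when that parser returns Nothing, falls back to wrapping the raw element in XMLFeed. The types and the names `Element`, `AtomFeedRec`, `parseAtom` and `parseFeed` are stand-ins invented for illustration, and `pLeaf` here is a simplified reimplementation, not the feed package's code.

```haskell
-- Hypothetical, simplified model of the importer's fallback behaviour.
-- None of these types are the feed package's real Text.Feed.Types.
import Control.Monad (guard)

-- An XML element reduced to its local name and its leaf children,
-- given as (child name, text content) pairs.
data Element = Element
  { elName     :: String
  , elChildren :: [(String, String)]
  } deriving (Show, Eq)

-- Stand-in for a parsed Atom feed, keeping only two required fields.
data AtomFeedRec = AtomFeedRec
  { recId      :: String
  , recUpdated :: String
  } deriving (Show, Eq)

data Feed
  = AtomFeed AtomFeedRec
  | XMLFeed Element -- generic fallback when no specific parser succeeds
  deriving (Show, Eq)

-- Simplified pLeaf: the text of the first child with this name, if any.
pLeaf :: String -> [(String, String)] -> Maybe String
pLeaf = lookup

-- Strict Atom parser: in the Maybe monad, a single missing required child
-- makes the whole do-block return Nothing.
parseAtom :: Element -> Maybe AtomFeedRec
parseAtom e = do
  guard (elName e == "feed")
  let es = elChildren e
  i <- pLeaf "id" es
  u <- pLeaf "updated" es
  return (AtomFeedRec i u)

-- Try Atom first; otherwise keep the raw element as an XMLFeed.
-- (RSS parsers, which a real importer would also try, are omitted.)
parseFeed :: Element -> Feed
parseFeed e = maybe (XMLFeed e) AtomFeed (parseAtom e)
```

Loaded in GHCi, `parseFeed (Element "feed" [("id", "yt:channel:")])` evaluates to an `XMLFeed`, while adding an `("updated", "2023-07-02T00:00:00Z")` child makes it an `AtomFeed`, mirroring the behaviour described above.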
When I change the quoted line to

u <- pLeaf "updated" es `mplus` return (T.pack "<unknown>")

then YouTube's feed is parsed correctly.
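The patch turns a required lookup into one with a default. A minimal, self-contained sketch of the difference, with feed children modelled as hypothetical (name, text) pairs and the illustrative names `updatedStrict` and `updatedLenient` (not part of the feed package):

```haskell
-- Hypothetical sketch: strict vs lenient lookup of the feed-level <updated>.
-- Children are modelled as (local name, text) pairs, not real XML elements.
import Control.Monad (mplus)

-- Strict, like the original check: Nothing when <updated> is absent,
-- which makes the surrounding Atom parse fail.
updatedStrict :: [(String, String)] -> Maybe String
updatedStrict es = lookup "updated" es

-- Lenient, like the patched line: `mplus` on Maybe keeps the first Just,
-- so a missing <updated> yields the placeholder instead of Nothing.
updatedLenient :: [(String, String)] -> Maybe String
updatedLenient es = lookup "updated" es `mplus` return "<unknown>"
```

One trade-off of the placeholder: callers get "<unknown>" where they expect a timestamp, so any code that parses the date has to handle that string. A Maybe-typed field would make the absence explicit, at the cost of changing the record's type.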

The Atom RFC (RFC 4287, Section 4.1.1) requires the <updated> element:

atom:feed elements MUST contain exactly one atom:updated element.
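The quoted rule counts only direct children of atom:feed (each atom:entry must carry its own atom:updated, so those don't count toward the feed's). A minimal sketch of such a check, using the hypothetical names `UpdatedCheck` and `checkUpdated`, with children again simplified to (name, text) pairs and namespaces ignored:

```haskell
-- Hypothetical validator for "exactly one atom:updated" among a feed's
-- direct children. Children are (local name, text) pairs; namespaces ignored.
data UpdatedCheck
  = MissingUpdated          -- zero <updated> children: violates the MUST
  | SingleUpdated String    -- exactly one: conforms, carries its value
  | DuplicateUpdated Int    -- more than one: also violates the MUST
  deriving (Show, Eq)

checkUpdated :: [(String, String)] -> UpdatedCheck
checkUpdated es =
  case [v | (n, v) <- es, n == "updated"] of
    []  -> MissingUpdated
    [v] -> SingleUpdated v
    vs  -> DuplicateUpdated (length vs)
```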

Is it worth adding a workaround that deviates from the standard, or should we let the people generating YouTube's feed fix it on their end?

Ivan
