There is a new Python news aggregator in town! Check it out here. In this Bite you will parse it!
Update April 2023: the site seems to be no longer online, but we use a static copy for this Bite.
Imagine you want to email yourself and colleagues a Friday digest of top articles, based on number of points and comments.
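As a rough illustration of the digest idea, here is a minimal sketch of ranking already-parsed articles. The `Entry` record, its field names, the `top_entries` helper, and the choice to sort by points first and comments second are all assumptions made for this example, not something the Bite prescribes.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    # Hypothetical record for one article; the field names are assumptions.
    title: str
    points: int
    comments: int


def top_entries(entries, n=3):
    """Return the n highest-ranked entries, by points, then by comments."""
    ranked = sorted(entries, key=lambda e: (e.points, e.comments), reverse=True)
    return ranked[:n]


# Made-up sample data, only to show the ordering.
sample = [
    Entry("Article A", 5, 2),
    Entry("Article B", 12, 0),
    Entry("Article C", 5, 7),
    Entry("Article D", 1, 1),
]

for entry in top_entries(sample):
    print(f"{entry.points:>3} points, {entry.comments:>2} comments: {entry.title}")
```

Sorting on a `(points, comments)` tuple means comments only break ties between articles with equal points; a different weighting may suit your digest better.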
Our first go would be feedparser, but there is not an RSS feed yet. So in this Bite you will use some BeautifulSoup (4.7.1) to parse the HTML yourself. Not a bad skill to have, no?
We have you parse a static copy of the site so we have predictable data to test your code against. As you can see in the tests, your code should work with the second (paginated) page as well.
Note we had some issues getting lxml to work on the platform, so use bs4's html.parser for now. Also, the W3C validator does not really like the HTML, so you cannot rely on article or table while parsing out the entries. Search for the title class instead.
Good luck, and bookmark this site to keep up to date with Python news. If you see anything interesting, feel free to share it in our community.
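The parsing hints above (use html.parser, search for the title class) could look roughly like the sketch below. The sample markup is entirely hypothetical: the real page's tags, and where it puts the points and comments counts, may well differ, so treat `SAMPLE_HTML`, the `<small>` element, and the `parse_entries` helper as assumptions for illustration only.

```python
import re

from bs4 import BeautifulSoup

# Hypothetical markup: the real page's structure may differ.
SAMPLE_HTML = """
<div>
  <span class="title">First article (example.com)</span>
  <small>10 points, 3 comments</small>
  <span class="title">Second article (example.org)</span>
  <small>1 point, 0 comments</small>
</div>
"""


def parse_entries(html):
    """Yield (title, points, comments) tuples from the sample markup."""
    # "html.parser" is the parser from Python's standard library, so no lxml needed.
    soup = BeautifulSoup(html, "html.parser")
    for title_tag in soup.find_all(class_="title"):
        title = title_tag.get_text(strip=True)
        # Assumed layout: the counts sit in the next <small> element.
        meta = title_tag.find_next("small").get_text()
        points = int(re.search(r"(\d+) points?", meta).group(1))
        comments = int(re.search(r"(\d+) comments?", meta).group(1))
        yield title, points, comments


entries = list(parse_entries(SAMPLE_HTML))
```

Searching by class with `find_all(class_="title")` avoids depending on the surrounding article or table elements, which is the point of the hint. The `points?` and `comments?` patterns also accept singular forms such as "1 point".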
Update 20th of Oct 2019: there is an RSS feed available now, but no count of comments/points, so you will still need BeautifulSoup / scraping. No worries though, if you want to scrape RSS feeds, take one of our feedparser Bites ...
Keep calm and code more Python!
113 out of 113 users completed this Bite.
Will you be the 114th person to crack this Bite?
Resolution time: ~75 min. (avg. submissions of 5-240 min.)
Our community rates this Bite 4.89 on a 1-10 difficulty scale.