This challenge write-up first appeared on PyBites.
This week, each one of you has a homework assignment ... - Tyler Durden (Fight Club)
Example output:
$ python tags.py
* Top 10 tags:
python 10
learning 7
tips 6
tricks 5
github 5
cleancode 5
best practices 5
pythonic 4
collections 4
beginners 4
* Similar tags:
game games
challenge challenges
generator generators
Start coding by forking our challenges repo:
$ git clone https://github.com/pybites/challenges
If you already forked it, sync it:
# assuming using ssh key
$ git remote add upstream git@github.com:pybites/challenges.git
$ git fetch upstream
# if not on master:
$ git checkout master
$ git merge upstream/master
Use one of the templates:
$ cd 03
$ cp tags-help.py tags.py
# or:
$ cp tags-nohelp.py tags.py
# code
# run the unittests (optional)
$ python test_tags.py
...
----------------------------------------------------------------------
Ran 3 tests in 0.155s
OK
As we update our blog regularly, we have provided a recent copy of our feed in the 03 directory: rss.xml. We have also provided a copy of tags.html for verification (it is used by the unit tests in test_tags.py).
Both templates provide 3 constants you should use:
TOP_NUMBER = 10
RSS_FEED = 'rss.xml'
SIMILAR = 0.87
The rest is documented in the methods' docstrings. Again, use tags-help.py if you need more guidance; tags-nohelp.py is for the more experienced, or for when you want more freedom. The same goes for the tests: use them if you need them.
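To make the "top tags" half of the task concrete, here is a minimal sketch using only the stdlib. It assumes the tags live in the feed's <category> elements and that dashes in tag names should become spaces (which would explain "best practices" in the example output). The sample feed and the function names get_tags and get_top_tags are made up for illustration; they are not necessarily what the templates use.

```python
import xml.etree.ElementTree as ET
from collections import Counter

TOP_NUMBER = 10

# Hypothetical miniature feed; the real rss.xml is assumed to
# store each tag in a <category> element.
SAMPLE_FEED = """<rss version="2.0"><channel>
<item><title>a</title><category>python</category><category>tips</category></item>
<item><title>b</title><category>Python</category><category>best-practices</category></item>
<item><title>c</title><category>python</category></item>
</channel></rss>"""


def get_tags(xml_text):
    """Return all tags in the feed, lower-cased, with dashes turned into spaces."""
    root = ET.fromstring(xml_text)
    return [cat.text.strip().lower().replace('-', ' ')
            for cat in root.iter('category') if cat.text]


def get_top_tags(tags, n=TOP_NUMBER):
    """Return the n most common tags as (tag, count) pairs."""
    return Counter(tags).most_common(n)


if __name__ == '__main__':
    for tag, count in get_top_tags(get_tags(SAMPLE_FEED)):
        print(tag, count)
```

To run this against the provided file instead of the sample string, you could read RSS_FEED from disk and pass its contents to get_tags.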
Talking about freedom: feel free to use our live feed, but then the tests will probably break.
Hint: for word similarity, feel free to use NLTK or your favorite language processing tool. However, the stdlib does provide a nice way to do this. Using that method, we arrived at 0.87 as a threshold so that, for example, 'python' and 'pythonic' are not marked as similar.
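One stdlib candidate for this is difflib.SequenceMatcher; whether it is the method meant in the hint is an assumption. Its ratio() gives roughly 0.86 for 'python' / 'pythonic' and roughly 0.89 for 'game' / 'games', which fits the 0.87 threshold. A minimal sketch, with get_similarities as a hypothetical function name:

```python
from difflib import SequenceMatcher
from itertools import combinations

SIMILAR = 0.87


def get_similarities(tags, threshold=SIMILAR):
    """Yield pairs of distinct tags whose similarity ratio exceeds the threshold."""
    # sorted(set(...)) removes duplicates and makes the pair order deterministic.
    for a, b in combinations(sorted(set(tags)), 2):
        if SequenceMatcher(None, a, b).ratio() > threshold:
            yield a, b


if __name__ == '__main__':
    sample = ['python', 'pythonic', 'game', 'games', 'challenge', 'challenges']
    for a, b in get_similarities(sample):
        print(a, b)
```

Comparing every pair is quadratic in the number of distinct tags, which should be fine for a blog-sized feed.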
Remember: there is no best solution, only learning more and better Python.
Enjoy! We're looking forward to reviewing all the cool / creative / Pythonic stuff you come up with on Friday.
Have fun!
Again, to start coding, fork our challenges repo or sync it.
More background in our first challenge article.