This is a good one!
Previous entries in this series: http://www.guldmyr.com/blog/wasthereannhlgamelastnight-com-now-using-object-storage/ and http://www.guldmyr.com/blog/wasthereannhlgamelastnight-appspot-com-fixed-working-again/
Renamed to wtangy.se
First things first! The website has been renamed to wtangy.se! Nobody in their right mind would type out wasthereannhlgamelastnight.com, so now it’s an acronym of wasthereannhlgameyesterday: wtangy.se. It uses Sweden’s .se top-level domain because there was an offer that made it really cheap :)
Automatic testing and deployment
The second important update is that we now do some automatic testing and deployment.
This is done with travis-ci.org, where one can view the builds. The configuration is done in this file.
In Google Cloud there are different versions of the app deployed. If we don’t promote a version, it will not be accessible from wtangy.se (or wasthereannhlgamelastnight.appspot.com), only via some other URL.
Right now the testing happens like this on every commit:
- deploy the code to a testing version (which we don’t promote)
- then we run some scripts:
  - pylint on the Python scripts
  - an end-to-end test which tries to visit the website
- if the above succeeds, we deploy to master (which we do promote)
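The end-to-end step above could look something like the sketch below. It is not the actual test from the repository, just a minimal illustration. The function names and the version name "testing" are made up for the example, and the project name is assumed from the appspot.com address. The one thing taken from App Engine itself is that a non-promoted version is reachable at VERSION-dot-PROJECT.appspot.com.

```python
from urllib.request import urlopen


def version_url(project, version=None):
    """Return the appspot.com URL of an App Engine app or one of its versions."""
    if version is None:
        return f"https://{project}.appspot.com/"
    # App Engine serves non-promoted versions at VERSION-dot-PROJECT.appspot.com.
    return f"https://{version}-dot-{project}.appspot.com/"


def site_is_up(url, fetch=urlopen):
    """Return True if the page answers with HTTP 200 and a non-empty body.

    `fetch` is injectable so the check can be exercised without network access.
    """
    try:
        with fetch(url) as response:
            return response.status == 200 and len(response.read()) > 0
    except OSError:
        # URLError and HTTPError are both subclasses of OSError.
        return False
```

In a CI script one would call site_is_up(version_url("wasthereannhlgamelastnight", "testing")) and only go on to the promoted deploy if it returns True.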
This continues the series of blog posts about the awesome https://wasthereannhlgamelastnight.appspot.com/WINGS web site, where you can see if there was, in fact, an NHL game last night :)
Some background: first I had a Python script that scraped the nhl.com website, and later I changed it to just grab the data from nhl.com’s JSON REST API – much nicer. But it was still outputting the result to stdout as a set and a dictionary, and the application would then import the resulting file to get the schedule. This was quite hacky and ugly :) But hey, it worked.
As of this commit it now uses Google’s Cloud Object Storage:
- there’s a special URL (one has to be an admin to be able to access it)
- there’s a cronjob which calls this URL once a day (22:00 in some time zone)
- when this URL is called, a Python script runs which:
  - checks what year it is and composes the URL to the API so that we only grab this season’s games (to be a bit nicer to the API)
  - does some sanity checking: that the fetched data is not empty
  - extracts the dates and teams as before and writes two variables:
    - one list which has the dates when there’s a game
    - one dictionary which has the dates and all the games on each date
    - probably the last would be enough ;)
  - finally always overwrites the schedule
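Roughly, the script’s steps can be sketched like this. It is an illustration, not the code from the commit. The function names are invented, the season is assumed to run from 1 September to 31 August, and the response shape (dates, then games, then teams with away/home team names) is an assumption about what the API returns.

```python
import datetime


def season_range(today):
    """Return (start, end) dates of the season that `today` falls in.

    Assumption: a season runs from 1 September to 31 August of the next year.
    """
    start_year = today.year if today.month >= 9 else today.year - 1
    return datetime.date(start_year, 9, 1), datetime.date(start_year + 1, 8, 31)


def extract_schedule(payload):
    """Turn a decoded API response into (dates_with_games, games_by_date)."""
    games_by_date = {}
    for day in payload.get("dates", []):
        # Assumed shape: each game lists an away and a home team by name.
        matchups = [
            [game["teams"]["away"]["team"]["name"],
             game["teams"]["home"]["team"]["name"]]
            for game in day.get("games", [])
        ]
        if matchups:
            games_by_date[day["date"]] = matchups
    if not games_by_date:
        # Sanity check: refuse to overwrite a good schedule with an empty one.
        raise ValueError("fetched schedule is empty")
    return sorted(games_by_date), games_by_date
```

As the list above already hints, the list of dates is just the sorted keys of the dictionary, so storing only the dictionary would be enough.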
Only updating it when there are changes would be cool, as then I could notify myself (and possibly others) when the schedule has changed. But it would mean that the JSON dict has to be ordered, which dicts aren’t by default, so I’d have to change some stuff. The GCSFileStat has checksum-like metadata for each file called ETAG. But it would probably be best to first compute a checksum of the generated JSON and then add that as extra metadata to the object, as this ETAG is probably implemented differently between providers.
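One possible way around the ordering problem, sketched here and not something the site does today: serialize the schedule with json.dumps(..., sort_keys=True), so that the same content always gives the same bytes, and hash that. The function name is made up for the example.

```python
import hashlib
import json


def schedule_checksum(schedule):
    """Return a SHA-256 hex digest that ignores dict key order."""
    # sort_keys makes the serialization deterministic for dicts;
    # compact separators avoid whitespace differences.
    canonical = json.dumps(schedule, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

The digest could then be stored as custom metadata on the object and compared on the next run. Note that only dict keys are sorted: the order of items inside a list still affects the checksum.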
wasthereannhlgamelastnight.appspot.com – fixed – working again!
With the NHL 2017-2018 season coming up and some extra spare time on my hands, I thought: why not finally fix this great website again :)
NHL changed the layout of their schedule page about two seasons ago – these days there’s “infinite scrolling”, or whatever it’s called when the page only loads what you see on the screen. This means it’s a bit difficult to scrape the page (but not impossible).
Lately I’ve been using REST APIs and JSON data for quite a few things, and after a short search I managed to find this hidden gem: https://statsapi.web.nhl.com/api/v1/schedule?startDate=2016-01-31&endDate=2016-02-05&expand=schedule.teams,schedule.linescore,schedule.broadcasts,schedule.ticket,schedule.game.content.media.epg&leaderCategories=&site=en_nhl&teamId=
Now that’s a link to an API provided by NHL where you get the schedule and can filter it. I’m not sure what all the parameters do, and they’re not all needed: you just need startDate and endDate. The API also has standings and results. I have not managed to find any documentation for it; the best so far seems to be this blog post. So I’m not sure whether it’s OK to use it or if there are any restrictions.
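Since only startDate and endDate seem to be needed, a trimmed-down URL can be put together like this. It is a small sketch with a made-up function name, and given the lack of documentation I can’t promise the API treats the shorter query exactly like the long one.

```python
from urllib.parse import urlencode

# Endpoint taken from the link above.
API_BASE = "https://statsapi.web.nhl.com/api/v1/schedule"


def schedule_url(start_date, end_date):
    """Build a schedule URL with only the two parameters that seem required."""
    query = urlencode({"startDate": start_date, "endDate": end_date})
    return f"{API_BASE}?{query}"
```

Dates go in as YYYY-MM-DD strings, the same format as in the original link.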
p.s. – there is a shorter URL to the main page: https://rix.fi/nhl – but the commands, like https://wasthereannhlgamelastnight.appspot.com/MTL, do not work there.
Was there an NHL game last night?