I had a look. It's not trivial, to say the least. Actually, I'd say it's not
doable at all. Here's a quick summary of why:
The main page (for your zip code/location) loads a lot of scripts, one of
which has the JSONP payload we want. So the first step would still be to
parse the main page for your location, find the correct script URL in it,
and then load and parse that script.
Here's the main page for zip code 10020:
http://www.weather.com/weather/today/l/10020:4:US
Load that in Chrome with the developer tools panel open. Stand well back.
What you'll find among the blizzard of other resources that get loaded is
this:
http://dsx.weather.com/wxd/v2/(BERec...r.callbacks._c
That provides the following JSONP (again, discoverable only via the
developer tools panel):
---------------------
angular.callbacks._c({ "status": 200, "body": [
{"id": "/wxd/v2/BERecord/en_US/USNY1252:1:US",
"status": 204
}
,
{"id": "/wxd/v2/MORecord/en_US/USNY1252:1:US",
"status": 200,
"generatedTime": 1417019149,
"cacheMaxSeconds": 300,
"currentTime": 1417019201
, "doc":
{"MOHdr":{"obsStn":"T72503067","procTm":"20141126111035",
"_procTmLocal":"2014-11-26T11:10:35.000-05:00",
"procTmISO":"2014-11-26T16:10:35.000Z"},
"MOData":{"stnNm":"Glen Head","obsDayGmt":"20141126","obsTmGmt":"160500",
"dyNght":"D","locObsDay":"20141126","locObsTm":"110416",
"tmpF":35,"tmpC":2,"sky":11,"wx":"Light Rain","iconExt":1201,"alt":30.1,
"baroTrnd":2,"baroTrndAsc":"Falling Rapidly","ceil":800,"ceilM":244,
"clds":"OVC","dwptF":33,"dwptC":1,"hIF":35,"hIC":2,"rH":94,"pres":1020.2,
"presChnge":-0.05,"visM":5.0,"visK":8.05,"wCF":27,"wCC":-3,"wDir":20,
"wDirAsc":"NNE","wSpdM":10,"wSpdK":16,"wSpdKn":9,"tmpMx24F":55,
"tmpMx24C":13,"tmpMn24F":35,"tmpMn24C":2,"tmpMx6F":-21,"tmpMx6C":-29,
"prcp24":0.27,"prcp3_6hr":0.27,"prcpHr":0.05,"prcpMTD":4.47,"prcpYr":39.69,
"prcp2Dy":0.27,"prcp3Dy":0.83,"prcp7Dy":0.83,"snwDep":0.5,"snwIncr":0.2,
"snwTot":0.5,"snwTot6hr":0.5,"snwMTD":0.8,"snwSsn":0.8,"snwYr":47.6,
"snw2Dy":0.5,"snw3Dy":0.5,"snw7Dy":0.5,"sunrise":"06:52 am",
"sunset":"04:28 pm","uvIdx":1,"uvDes":"Low","uvWrn":0,"flsLkIdxF":27,
"flsLkIdxC":-3,"recTyp":"TECCI",
"vocalCd":"OIT72503067:OZ201411261605:OT35:OTC27:OTF27:OTH55:OTL35:OTD-21:OU1:OH94:OX1201:OW01S10:OD33:OV50:OC8:OP3010T01:ORH5:ORQ27:ORD27:OSH2:OSQ5:OSD5:ORM447:ORY3969:OMR352:OYR4244:OSM8:OSY476:OSS8:OQ1156",
"avgMTDPrecip":3.52,"avgYTDPrecip":42.44,"wxMan":"wx2510","qulfr":"OQ1156",
"qulfrSvrty":2,"_presIn":30.13,"_altMeters":9.17,"_snwDepCm":1.27,
"_prcp24Cm":0.69,"_prcp24Mm":6.86,"_prcpYrMm":1008.13,"_prcpMTDMm":113.54,
"_prcp2DyMm":6.86,"_prcp3DyMm":21.08,"_prcp7DyMm":21.08,"_snwYrCm":120.9,
"_snw2DyCm":1.27,"_snw3DyCm":1.27,"_snw7DyCm":1.27,
"_sunriseISOLocal":"2014-11-26T06:52:00.000-05:00",
"_sunsetISOLocal":"2014-11-26T16:28:00.000-05:00",
"obsDateTimeISO":"2014-11-26T16:05:00.000Z",
"sunriseISO":"2014-11-26T11:52:00.000Z",
"sunsetISO":"2014-11-26T21:28:00.000Z",
"_obsDateLocalTimeISO":"2014-11-26T11:05:00.000-05:00",
"_extendedQulfrPhrase":"A mix of wintry precipitation is occurring at other points nearby.",
"_wDirAsc_en":"NNE"}}
}
] })
---------------------
And in fact you can see various useful bits of information in there, like
temperature (tmpF, tmpC), conditions (wx), wind (wDirAsc, wSpdM), and so on.
These are JSON key/value pairs that could, in theory, be extracted.
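To make "could, in theory, be extracted" concrete, here is a minimal Python sketch of the extraction step, assuming you already have the JSONP text in hand (which is the hard part). The helper names unwrap_jsonp and current_observation are my own, and the sample is a trimmed copy of the payload above, not a live response.

```python
import json
import re


def unwrap_jsonp(text):
    """Strip a JSONP callback wrapper and return the decoded JSON value."""
    # Matches e.g. angular.callbacks._c( ... ) with an optional trailing ';'
    match = re.match(r"^\s*[\w$.]+\s*\((.*)\)\s*;?\s*$", text, re.DOTALL)
    if match is None:
        raise ValueError("not a JSONP payload")
    return json.loads(match.group(1))


def current_observation(payload):
    """Return the MOData dict of the first record with status 200 and a doc."""
    for record in payload.get("body", []):
        if record.get("status") == 200 and "doc" in record:
            return record["doc"].get("MOData", {})
    return {}


# Trimmed sample in the same shape as the payload shown above.
sample = '''angular.callbacks._c({ "status": 200, "body": [
{"id": "/wxd/v2/BERecord/en_US/USNY1252:1:US", "status": 204},
{"id": "/wxd/v2/MORecord/en_US/USNY1252:1:US", "status": 200,
 "doc": {"MOData": {"stnNm": "Glen Head", "tmpF": 35, "tmpC": 2,
                    "wx": "Light Rain"}}}
] })'''

obs = current_observation(unwrap_jsonp(sample))
print(obs["stnNm"], obs["tmpF"], obs["wx"])
```

Note that the first record in the body has status 204 and no doc, so any parser has to skip records without data rather than assume a fixed position.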
Here's the thing. That URL doesn't appear anywhere in the main page if you
just download it via, say, curl (or via a perl script, same thing). In other
words, the browser only requests that URL because some other JavaScript gets
evaluated first. Without going through that process, you can't know what
URL to request. And whatever state is necessary? You won't have that either.
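One way to check this for yourself is to search the downloaded HTML for any reference to the data host. Below is a small Python sketch of that check. The function name find_host_urls is mine, and both HTML strings are made-up stand-ins (including the example path on dsx.weather.com), not real weather.com markup.

```python
import re


def find_host_urls(html, host):
    """Return every absolute or protocol-relative URL in html that points at host."""
    pattern = re.compile(r"(?:https?:)?//" + re.escape(host) + r"[^\s\"'<>]*")
    return pattern.findall(html)


# Hypothetical stand-in for what curl gets back: no reference to the data host.
static_html = '<html><script src="http://www.weather.com/js/app.js"></script></html>'

# Hypothetical stand-in for the page after a browser has run the scripts.
rendered_html = (
    static_html
    + '<script src="http://dsx.weather.com/wxd/v2/example?cb=x"></script>'
)

print(find_host_urls(static_html, "dsx.weather.com"))
print(find_host_urls(rendered_html, "dsx.weather.com"))
```

If the first list comes back empty for the real page, there is nothing in the static download for a script to follow, which is the situation described above.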
So a perl script can't access it unless we play guessing games with the
URL and assume it will always be of a certain form, then grab it directly.
I doubt that's possible, or at least that it would keep working.
My recommendation is to abandon weather.com as a source. If I had to do
this insane parsing job for some reason I'd be looking at using PhantomJS:
http://phantomjs.org/
which is basically a JavaScript-enabled headless browser that you can then
interrogate. So you tell it to load the weather.com page, it will happily
run their metric f**k ton of JS, and at that point you have access to the
DOM and you can go to town, similar to inspecting things via the developer
tools panel.
Needless to say, this is not something I think that we want our Squeezebox
servers doing...
So I think people who want weather need another source and a reboot of the
parser for that source.
The good news is that screen scraping (DOM scraping, really) with Perl is
actually very straightforward. As long as you don't need to run JS to
get there...
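For a sense of how little is involved when the data is already in the static HTML, here is a rough sketch. It is in Python rather than Perl purely for illustration and uses only the standard library. The class name ClassTextExtractor and the markup are hypothetical; a real source would need its own selectors.

```python
from html.parser import HTMLParser


class ClassTextExtractor(HTMLParser):
    """Collect the text inside every element carrying a given class name.

    Simplification: assumes matched elements contain no unclosed void tags
    such as <br>, since those would throw off the depth counter.
    """

    def __init__(self, class_name):
        super().__init__()
        self.class_name = class_name
        self.depth = 0       # > 0 while inside a matching element
        self.results = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.depth:
            self.depth += 1
        elif self.class_name in classes:
            self.depth = 1
            self.results.append("")

    def handle_endtag(self, tag):
        if self.depth:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.results[-1] += data


# Hypothetical static markup; no JavaScript needed to get at the values.
page = ('<div class="now"><span class="temp">35</span>'
        '<span class="cond">Light Rain</span></div>')

extractor = ClassTextExtractor("temp")
extractor.feed(page)
print(extractor.results)
```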
In fact I had a look at forecast.io. It's global, and although it too uses
JSON, it's much more lightweight and straightforward. Again though,
fitting it all into the existing SDT framework would be some work no matter
what.
SBB