Changeset 571 for bbox/DEVELOPMENT

Timestamp: 11/27/06 18:03:14
Author: zool
Message:

wrote a lot of acceptance tests for bbox. moved spatial indexing out of core.
providing a lookup table for feedparser shorthand for RSS namespaces that map to functions.


Files:

  • bbox/DEVELOPMENT

Revision 563 → Revision 571
:Mon Dec 13 14:54:09 GMT 2004

From Matt Webb:

>here's my scenario, in which the system i'm building interacts with a black
>box, X: i ask X, please subscribe to these syndication feeds, please get
>anything on del.icio.us and Flickr tagged with "foo" [1]. i wait for a few
>days. i then use the bloglines API to pull out weblog entries, and some api or
>another to pull out the tagged information, and maybe another to do a search
>across the whole datastore for a URL [in the feed text] or keywords. X has gone
>away and looked after fetching and storing feeds, fixing rss 0.91, and throwing
>errors for 404'd feeds.

:Tue Dec 14 15:00:23 GMT 2004

http://www-106.ibm.com/developerworks/xml/library/x-rdfprov.html is edd's article on tracking rss provenance etc with redland contexts. this will be a useful approach, especially for ensuring that feeds are hosted on the same domains they're talking about, for events in the future. we may even be able to subclass edd's aggregator package as-is, then provide simple gateways for other feed formats in and out.

need to make sure that epistomat either has sensible support for contexts, or that we can provide it in a non-gnarly way. we can probably also augment fraggle with our more pleasant syntax for uris.

looking at edd's code as it stands, it's very low-level and full of workarounds for things that have since been fixed in the redland python API; a place to start, though... it has a TODO for recording last-modified and using if-modified-since: we need to get that working with urllib2. http://www.btree.net/python/http_web_services/etags.html runs through this process.
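The conditional-GET process described above fits in a few lines of the standard library. This is a minimal sketch using Python 3's `urllib.request` (the successor to `urllib2`); the function names `conditional_request` and `fetch_if_changed` are hypothetical, and the stored ETag and Last-Modified values are assumed to come from the store.

```python
import urllib.error
import urllib.request
from typing import Optional


def conditional_request(url: str, etag: Optional[str] = None,
                        last_modified: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request that carries the validators saved from the last fetch."""
    headers = {}
    if etag:
        headers["If-None-Match"] = etag                # echo the stored ETag
    if last_modified:
        headers["If-Modified-Since"] = last_modified   # echo the stored Last-Modified date
    return urllib.request.Request(url, headers=headers)


def fetch_if_changed(url: str, etag: Optional[str] = None,
                     last_modified: Optional[str] = None):
    """Return (status, body, etag, last_modified); body is None on a 304."""
    req = conditional_request(url, etag, last_modified)
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return (resp.status, resp.read(),
                    resp.headers.get("ETag"), resp.headers.get("Last-Modified"))
    except urllib.error.HTTPError as err:
        if err.code == 304:
            # Not modified: nothing new to parse, so keep the old validators.
            return 304, None, etag, last_modified
        raise
```

urllib reports a 304 as an `HTTPError`, which is why the "not modified" case lives in the `except` branch. feedparser can run the same exchange itself through the `etag` and `modified` arguments of `feedparser.parse`.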

:Wed Dec 15 17:02:44 GMT 2004

http://sourceforge.net/projects/feedparser/

is mark pilgrim's last-ditch rss parser thing. i'd be happiest, i suppose, if it did straight transformation of any feed format into rss1. let's see...

happily it seems to have good handling for last-modified and etag based requests; i only have to keep the right headers in the store when they arrive and send them back on the next request. it doesn't seem to do transformation, just build data structures from common feed elements and provide a nice interface for accessing properties...

having a look at the state of the redland store after running edd's demo, it holds a model like this:

{(r1103038973r1), [http://www.w3.org/1999/02/22-rdf-syntax-ns#_8], [http://sippey.com/archives/000757.php]} {{{[http://usefulinc.com/fraggie/fetch/1]}}}
{(r1103038973r1), [http://www.w3.org/1999/02/22-rdf-syntax-ns#_9], [http://www.scottandrew.com/main/2003_07#a000695]} {{{[http://usefulinc.com/fraggie/fetch/1]}}}

does this suggest we should keep an incrementing counter per feed, as well as a counter per fetch of it, to keep these numbers in serial order? we shouldn't worry about it too much, as most of the output to queries will be lists of things constructed in date order. so what is the point of storing the sequentiality of items at all? we could plan for this but not bother in the first iteration, where all we need is a statement item -> partof -> channel.
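A minimal sketch of that first-iteration model: one `partof` statement per item plus a date, with ordering done at query time rather than stored as `rdf:_N` sequence numbers. Plain tuples stand in for the Redland store here; the `partof` and `date` predicate names and the example.org URIs are hypothetical.

```python
from datetime import datetime

# (subject, predicate, object) statements; a stand-in for the RDF store.
statements = [
    ("http://example.org/a/1", "partof", "http://example.org/a/"),
    ("http://example.org/a/1", "date", datetime(2004, 12, 14, 9, 0)),
    ("http://example.org/a/2", "partof", "http://example.org/a/"),
    ("http://example.org/a/2", "date", datetime(2004, 12, 15, 9, 0)),
    ("http://example.org/b/1", "partof", "http://example.org/b/"),
    ("http://example.org/b/1", "date", datetime(2004, 12, 13, 9, 0)),
]


def objects(subject, predicate):
    """All objects stated for a given subject and predicate."""
    return [o for s, p, o in statements if s == subject and p == predicate]


def items_in_channel(channel, newest_first=True):
    """Items that are part of `channel`, sorted by their date statement.

    Assumes every item has at least one date; no sequence numbers are needed.
    """
    items = [s for s, p, o in statements if p == "partof" and o == channel]
    return sorted(items, key=lambda item: min(objects(item, "date")),
                  reverse=newest_first)
```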

:Wed Dec 22 16:01:35 GMT 2004

i am starting to sketch out code and have made a distribution here, which includes the epistomat source and that of mark pilgrim's feedparser. i got distracted by this article of his which was heavily linked to on the foaf wiki; the scutter vocab material there turned out to be not much use.

This <a href="http://diveintomark.org/archives/2003/07/21/atom_aggregator_behavior_http_level">mark pilgrim article about feed aggregation behaviour</a> looks like a good read, anyway.

:Tue Jan 11 07:49:03 IST 2005

eek, it's been a while. ongoing notes:

import httpserver

bloglines API for retrieval

feed mgmt - model, collections, collection instances

learning from past response rate - an urgency parameter which is calculated from the mean time between changes.
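One way to read the urgency parameter: take the mean gap between observed changes and clamp it into the 1-24 hour range suggested for bbox:schedule in these notes. This is a sketch under that assumption. The function name `next_interval` and the clamping bounds are illustrative, not a settled design.

```python
from datetime import datetime, timedelta
from typing import List


def next_interval(change_times: List[datetime],
                  min_hours: float = 1.0, max_hours: float = 24.0) -> timedelta:
    """Suggest the wait before the next fetch from the mean time between changes.

    Busy feeds get polled often and quiet feeds back off, within
    [min_hours, max_hours]. With fewer than two observed changes there is
    no gap to average, so fall back to the slowest schedule.
    """
    times = sorted(change_times)
    if len(times) < 2:
        return timedelta(hours=max_hours)
    gaps = [(b - a).total_seconds() for a, b in zip(times, times[1:])]
    mean_hours = sum(gaps) / len(gaps) / 3600
    return timedelta(hours=min(max(mean_hours, min_hours), max_hours))
```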

http://frot.org/2005/bbox/

bbox:Feed
        bbox:source
                rss:channel

        bbox:last_status
                200/403/etc
        bbox:last_etag
                foo010101
        bbox:last_modified
                20059020213
        bbox:schedule
                (hours 1-24 between fetches?)

bbox:Visit
        ical:datetime
                2005etc
        bbox:status
                200/500/etc
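The bbox:Feed and bbox:Visit descriptions above map naturally onto two record types. Here is a hypothetical sketch using Python dataclasses. The field names follow the notes. The types are assumptions, including keeping `last_modified` as the raw HTTP date string. The `visits` list linking a feed to its visits is also an assumption.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List, Optional


@dataclass
class Visit:
    """One fetch of a feed (bbox:Visit)."""
    fetched_at: datetime   # ical:datetime
    status: int            # bbox:status, e.g. 200 or 500


@dataclass
class Feed:
    """A subscribed feed and what we remember about fetching it (bbox:Feed)."""
    source: str                          # bbox:source, the rss:channel URL
    last_status: Optional[int] = None    # bbox:last_status, e.g. 200 or 403
    last_etag: Optional[str] = None      # bbox:last_etag
    last_modified: Optional[str] = None  # bbox:last_modified
    schedule_hours: int = 24             # bbox:schedule, hours between fetches (1-24)
    visits: List[Visit] = field(default_factory=list)  # assumed link to visits

    def record(self, visit: Visit) -> None:
        """Keep the visit and update last_status to match it."""
        self.visits.append(visit)
        self.last_status = visit.status
```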

each item is tagged with a visit as context
        resolving multiples on the way out?
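This is a minimal sketch of tagging each item with a visit as its context and resolving multiples on the way out. Statements are stored as quads whose fourth element names the visit that produced them. Queries collapse identical triples that were seen on several visits and keep the latest visit for each. The quad layout, the visit identifiers and both function names are illustrative assumptions, not Redland's API.

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]
Quad = Tuple[str, str, str, str]  # (subject, predicate, object, visit context)


def add_visit(store: List[Quad], visit: str, triples: List[Triple]) -> None:
    """Record every triple parsed on one visit, tagged with that visit."""
    store.extend((s, p, o, visit) for s, p, o in triples)


def resolve(store: List[Quad], visit_order: List[str]) -> Dict[Triple, str]:
    """Collapse repeated triples, mapping each to the latest visit that saw it."""
    rank = {v: i for i, v in enumerate(visit_order)}
    latest: Dict[Triple, str] = {}
    for s, p, o, ctx in store:
        key = (s, p, o)
        if key not in latest or rank[ctx] > rank[latest[key]]:
            latest[key] = ctx
    return latest
```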

special rules:
        if 404 - check 5 previous fetches - if all 404, suspend
        if 301 - follow, change bbox:source (moved permanently)
        if 302 - follow, make note (temporary redirect)
        if 410 - switch off forever
        - other statuses embedded in feedparser?
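The special rules can be sketched as a single decision function. This version follows the standard HTTP meaning of the redirect codes. A 301 is permanent, so the stored source changes. A 302 is temporary, so the redirect is only noted. The five-fetch window comes from the notes. The action names, the `decide` function and the 304 branch for conditional GETs are hypothetical.

```python
from typing import List, Optional, Tuple


def decide(status: int, history: List[int], location: Optional[str] = None,
           window: int = 5) -> Tuple[str, Optional[str]]:
    """Map an HTTP status to a crawler action.

    `history` holds the statuses of earlier fetches, most recent last.
    Returns (action, new_source); new_source is set only for permanent moves.
    """
    if status == 410:
        return "disable", None               # gone: switch off forever
    if status == 404:
        recent = history[-window:]
        if len(recent) == window and all(s == 404 for s in recent):
            return "suspend", None           # 404 now and on the last 5 fetches
        return "retry", None
    if status == 301:
        return "follow", location            # permanent: update bbox:source
    if status == 302:
        return "follow", None                # temporary: note it, keep bbox:source
    if status == 304:
        return "unchanged", None             # conditional GET found nothing new
    if 200 <= status < 300:
        return "store", None
    return "retry", None
```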

feedparser.parse() gives us a dict-oriented model
        do we just use timestamped items and not the rdf:_1, rdf:_2 etc model?
        that numbering would confuse items from different sources

        d.etag, d.modified, d.status, d.feed.has_key('foo')

        there is dc:creator support; we should patch to include foaf:maker, and always use a foaf model for creator details.
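A sketch of the foaf:maker idea that describes an entry's creator with FOAF terms instead of a bare dc:creator string. It assumes the entry is shaped like a feedparser entry, with an `author` string and an optional `author_detail` dict holding `name`, `email` and `href`. The `#maker` URI scheme for the creator node and the function name are hypothetical.

```python
from typing import Any, Dict, List, Tuple

FOAF = "http://xmlns.com/foaf/0.1/"
Triple = Tuple[str, str, str]


def maker_statements(entry_uri: str, entry: Dict[str, Any]) -> List[Triple]:
    """FOAF statements for an entry's creator, or [] if no creator is known."""
    detail = entry.get("author_detail") or {}
    name = detail.get("name") or entry.get("author")
    if not name:
        return []
    person = entry_uri + "#maker"  # hypothetical URI for the creator node
    triples = [(entry_uri, FOAF + "maker", person), (person, FOAF + "name", name)]
    if detail.get("email"):
        triples.append((person, FOAF + "mbox", "mailto:" + detail["email"]))
    if detail.get("href"):
        triples.append((person, FOAF + "homepage", detail["href"]))
    return triples
```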

:Tue Feb 22 17:55:33 GMT 2005

long lag, in which i've spent a couple of hours making things compile and bashing on the epistomat, to the point where feedreader hooks up, reads different formats, and collapses them into a model which has contexts.

made a simple http server for the bloglines interface, and now i'm wondering about user accounts. presumably we need them; i had half-envisioned one bbox for one collection of feeds.

options:
- (a) make a bbox which doesn't know about user accounts, to test out and use for single-purpose installations (e.g. to crawl spatial info for wirelesslondon, and just have wirelesslondon talk to it)
- (b) make a bbox which has user accounts, with a stub or generic account for single-purpose uses. don't worry about user management, but have some kind of HTTP basic auth for transactions.

case b is probably better, as it won't be much harder to do, will allow us to build in the right functionality straight away, and we can always have an 'all' mode superuser which can't "mark as read", which emulates case a, if that seems necessary.
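"Some kind of HTTP basic auth for transactions" amounts to decoding the `Authorization` header and checking it against the account list. This is a minimal sketch. The in-memory `ACCOUNTS` table is a placeholder for whatever userdb gets plugged in, and so is the plain-text password in it. The `all` account echoes the 'all' mode superuser idea.

```python
import base64
import binascii
import hmac
from typing import Dict, Optional

# Hypothetical account table; a real install would store password hashes.
ACCOUNTS: Dict[str, str] = {"all": "s3cret"}


def authenticate(authorization: Optional[str],
                 accounts: Dict[str, str] = ACCOUNTS) -> Optional[str]:
    """Return the user name if the Basic credentials are valid, else None."""
    if not authorization or not authorization.startswith("Basic "):
        return None
    try:
        decoded = base64.b64decode(authorization[6:], validate=True).decode("utf-8")
    except (binascii.Error, UnicodeDecodeError):
        return None
    user, sep, password = decoded.partition(":")
    if not sep or user not in accounts:
        return None
    # Constant-time comparison so timing does not leak how much of the password matched.
    if hmac.compare_digest(accounts[user].encode(), password.encode()):
        return user
    return None
```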

user mode is not for collecting feeds, but it is for 'reading' them NNTP-style and also for managing a subscription list, foaf-wise.

management etc can be done via the HTTP representation. the bloglines Sync API doesn't let you add subscriptions through it, so we need to create that.

we also need a new component: a crawler module that manages getting updates, interpreting http statuses and timing future actions. the model in the bbox already handles that stuff, and the practicalities of etags etc are all supplied by feedparser, which is pretty cool.

we should probably think pretty seriously about moving to twisted, though; let's look at the docs and compare to a gang of cron jobs / dodgy daemons...

:Sun Mar  6 16:15:14 GMT 2005

i keep thinking about this again in the context of wirelesslondon, as a grout replacement. it does what grout does for WL, with a more specialised and thought-out machine interface. it has optional 'spatial extensions', basically, which are stored in PostGIS, often for mapserver's benefit, with references to URIs that are members in a Redland store.

:Tue Mar 15 17:01:08 GMT 2005

done a fair bit of work on the underlying 'framework' or what have you. The upgraded rdf-object wrapper is almost debugged and dusted. This has been largely for the benefit of other applications: wirelesslondon and the consume nodedb.

in that context i've also been having quite lovely experiences with Quixote, and can now see no reason to build http apps any other way. it slots easily into twisted or fastcgi or what have you.

i made a nice home page for bbox: http://frot.org/bbox/ and hope to get a public svn or cvs repository together just as soon as the tests pass. (tests!)

:Fri Mar 18 23:34:37 GMT 2005

Flush with "getting things done", i made a simple quixote ui stub for bbox, and started emulating bloglines API functions. I'll stick this stuff in CVS now. It doesn't do much yet, but it's not far off. A simple temporal query is outlined; a spatial bounding box query (with different projections, at least wgs84 and utm zone N...?) should come next.

In theory redland supports RDQL and similar query languages. The question is mapping the column-table, variable-has-value results you get back from an RDF query into the graph of statements that you'd like to complete. This isn't such a big deal short term; it will enable more interesting, foafy sort of things in the future...
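The mapping problem amounts to filling a graph template once for each row of variable bindings. SPARQL's CONSTRUCT form later standardised this. The sketch below assumes results arrive as a list of dicts from variable name to value. The `?var` placeholder syntax mirrors RDQL, but `rows_to_statements` itself is hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def rows_to_statements(rows: Iterable[Dict[str, str]],
                       template: List[Triple]) -> List[Triple]:
    """Instantiate a triple template once per result row.

    Terms written as '?name' are replaced by that variable's binding; other
    terms are copied unchanged. Duplicate statements are dropped, order kept.
    """
    out: List[Triple] = []
    seen = set()
    for row in rows:
        for triple in template:
            filled = tuple(row[t[1:]] if t.startswith("?") else t for t in triple)
            if filled not in seen:
                seen.add(filled)
                out.append(filled)
    return out
```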

:Tue Mar 22 20:22:00 GMT 2005

finally we sat down and fixed the rdfobj wrapper layer. i put a copy of it in here; using it involves setting PYTHONPATH to include the rdfobj directory.

So this has facilitated a lot of stuff. Feeds download and are stored in the RDF model, but the clean etag/modified handling advertised by feedparser isn't seamless :/

i should open up the rdf import too. i wanted to check this in before i broke anything, though.

:Fri Mar 25 13:08:25 GMT 2005

I stole wholeheartedly from diveintopython.org a tactful http handler, which i'm using to pick at both rss and rdf feeds. I'm still having niggles serialising the context, but bbox is definitely ready to test now. (needs more tests written, too.)

The GIS handling which i'd tentatively inserted, i removed; there is a spatialStore object in the wirelesslondon code tree which would do the job better and more cleanly, opening up a standalone spatial index abstraction and removing the postgis dependency, which is, well, kludgy.
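A standalone spatial index abstraction could be as narrow as "store a point, query a box", with PostGIS as one optional backend among several. The sketch below is hypothetical throughout. `SpatialIndex` and `ListIndex` are not the wirelesslondon spatialStore. They show the shape of such an interface plus a dependency-free fallback.

```python
from abc import ABC, abstractmethod
from typing import Dict, List, Tuple

BBox = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat), WGS84


class SpatialIndex(ABC):
    """The narrow interface bbox core would talk to; backends plug in behind it."""

    @abstractmethod
    def insert(self, uri: str, lon: float, lat: float) -> None:
        """Remember the position of the resource named by `uri`."""

    @abstractmethod
    def within(self, box: BBox) -> List[str]:
        """URIs of resources whose position falls inside `box`."""


class ListIndex(SpatialIndex):
    """Dependency-free backend: a linear scan, adequate for small collections."""

    def __init__(self) -> None:
        self._points: Dict[str, Tuple[float, float]] = {}

    def insert(self, uri: str, lon: float, lat: float) -> None:
        self._points[uri] = (lon, lat)

    def within(self, box: BBox) -> List[str]:
        min_lon, min_lat, max_lon, max_lat = box
        return sorted(uri for uri, (lon, lat) in self._points.items()
                      if min_lon <= lon <= max_lon and min_lat <= lat <= max_lat)
```

A PostGIS-backed class would implement the same two methods, so the core never needs to know which backend is installed.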

next is to finish the http interface (bloglines) and figure out how best to do temporal searches; on a per-feed basis we can work around that, for now.

:Fri Apr 22 02:48:39 BST 2005

I realise i should have a lot of time and energy to devote to bbox at the moment, and am flailing a little faced with the code, looking at different applications.

I should do a source release, which would help. i should also add a crawler and collector component to wirelesslondon, to init from openguides and then pick up the recent changes RSS. That would be useful, but wouldn't help with the implications of bbox as a bigger bit of software.

I've been holding out for interfaces like the ontomatic, because that could really liberate me from needing to hack on cheesy web applications much, if at all.

Experimenting with drupal and its RSS aggregator enlightened me as to the need for a monitor-feed-index. perhaps just an RSS bot that i could ask for status, for now.

:Fri Apr 22 10:08:19 BST 2005

a simple way of doing user and feed management, basically. i wanted to allow people to hook in, or at least model their own userdb. we have a lot of this code in wirelesslondon; it needs plugging in to a simple delicious-like API. we may as well bung a few template widgets for HTML into our handler for now, then abstract 'em out into the ontomatic later. oh, and i started an irc bot to do something like reporting, so i can ponder over monitoring functions. The idea is that the monitoring information should drop out of the model; if adequate info isn't contained in it, something is mildly wrong.

i just stole all the user account creation code from wl.user and dropped it into bbox and bbox.ui. This is definitely provoking me to wonder if i'm writing the same application. but i need to spike out of stasis at the moment.

:Sun Oct  9 10:28:49 BST 2005

Good lord, i've been slack with this process.

BBox changed a bit while i was writing nodel; now it only stores and queries geometry in wgs84, as reprojecting seemed unnecessarily complex. nodel uses bbox a lot, and there have been many small bugfixes to bbox in the process.

After i talked to Benoit Gregoire about it, i realised it should store full geometries for all types; there was only stub support for lines and polygons. i am adding that now, supporting a simple RSS serialisation like Mikel's one at http://brainoff.com/worldkit/doc/polygon.php. As spatial queries for bounding boxes were already being done by making a POLYGON and asking for stuff Within() it, this looks simple, and the tests already pass; but now i have to go back through, fix the existing interfaces in bbox and get those passing again.
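Building the POLYGON for those Within() queries means writing out a closed ring of corners. Here is a sketch in WKT for WGS84 longitude/latitude. The function name, table and column names are hypothetical. Current PostGIS spells the functions `ST_Within` and `ST_GeomFromText`. Older releases used `Within` and `GeomFromText`.

```python
def bbox_to_wkt(min_lon: float, min_lat: float,
                max_lon: float, max_lat: float) -> str:
    """WKT POLYGON for a lon/lat box; the ring is closed by repeating the first corner."""
    if min_lon > max_lon or min_lat > max_lat:
        raise ValueError("minimum must not exceed maximum")
    ring = [(min_lon, min_lat), (max_lon, min_lat), (max_lon, max_lat),
            (min_lon, max_lat), (min_lon, min_lat)]
    return "POLYGON((" + ", ".join(f"{x} {y}" for x, y in ring) + "))"


# Hypothetical table and column names; SRID 4326 is WGS84.
WITHIN_SQL = ("SELECT uri FROM features "
              "WHERE ST_Within(geom, ST_GeomFromText(%s, 4326))")
```

With a DB-API driver that uses `%s` placeholders, such as psycopg2, the WKT string would be passed as a query parameter rather than pasted into the SQL.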

Then we need a plan for finding data. I know where there is a lot of data near me. in the past, i've collected it mostly using scripts (complete mirrors of the 'open guide to london', that kind of thing). now it really needs to be on an aggregation schedule.

If we're going to make a nodel UI for bbox then we might as well make a very simple feed-status-manager as well, just a browsable view on fbox.Feed class objects.

but a lot of aggregation events should actually be described by more code-like rules, and they are handled through nodel's API to different services, which is much more sophisticated than bbox's model of: get feed, look for spatial stuff, remember it all.

i would say a lot of this for now can be driven by a script on the cron that is exploring the model: get me all tags which an event is tagged with and look at the flickr feed for updates, and so on... get me everything from EVNT from different people's changes and inboxes...
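The first cron step could turn each event's tags into feed URLs for a fetcher to poll. This sketch assumes that the model has already been queried into a dict from event to tags. It also assumes Flickr's public-photos feed endpoint with a `tags` parameter, so treat the URL as something to verify. `tag_feed_urls` is hypothetical.

```python
from typing import Dict, Iterable, List
from urllib.parse import urlencode

# Flickr's public-photos feed endpoint (an assumption to verify before use).
FLICKR_TAG_FEED = "https://www.flickr.com/services/feeds/photos_public.gne"


def tag_feed_urls(events: Dict[str, Iterable[str]]) -> Dict[str, List[str]]:
    """For each event, one feed URL per distinct tag it carries."""
    return {event: [FLICKR_TAG_FEED + "?" + urlencode({"tags": tag})
                    for tag in sorted(set(tags))]
            for event, tags in events.items()}
```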

:Mon Oct 30 20:51:14 GMT 2006

It's been a long time.

I'm digging this codebase out because:

- Jamie King was asking about it
- Saul keeps mentioning it in the context of rebooting wirelesslondon
- there is a remote possibility that i might get paid for it
- mapufacture, bless their cotton socks, have no real incentive to release other than goodwill; they need a structure around them.

Now i could be contributing my time to egging on mapufacture, as i could be to owslib as well. But i'm reminded that bbox is not far off finished. It did work pretty well; it just had bad performance problems on serialisation, trying to haul around too many bulky and interconnected python objects at once.

When i started thinking about a WFS-basic implementation, my first thought was that it would belong here. It would also fit if one were thinking about writing a prototype video metadata aggregator, as i assume Jamie is (i thought they were stuck into prototyping right now, and i know some other people are working on a drupal-based solution, though that looks more like drawing, socialising and planning for a big sprint in the spring). But by then they (the transmission.cc people) need something that they can learn from issues with and use to demonstrate proof of value to their contributing participants.

One issue Jan [sic?] had lamented was the lack of extensibility of the aggregators supplied with drupal. BBox as it stands is pretty much the same: it collects a common core of properties well known to feedparser, plus geo:lat and geo:long. feedparser at least is catholic about what it extracts; as long as it's easy to configure what should be learned, this shouldn't be hard to change and will be useful. (i wonder how it handles a lot of atom extensions? we'll also have to look for updates.)
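One way to make "what should be learned" configurable is a lookup table from feedparser's flattened keys to converter functions. This is in the spirit of the lookup table named in this changeset's commit message. The sketch assumes that feedparser exposes namespaced elements such as geo:lat as `prefix_element` keys (`geo_lat`, `geo_long`), which is worth checking against the installed version. The table and `extract` are hypothetical.

```python
from typing import Any, Callable, Dict, Mapping, Tuple

GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"

# Hypothetical lookup table: feedparser-style key -> (RDF property, converter).
# Adding a row is all it takes to learn another property.
EXTRACTORS: Dict[str, Tuple[str, Callable[[Any], Any]]] = {
    "geo_lat": (GEO + "lat", float),
    "geo_long": (GEO + "long", float),
}


def extract(entry: Mapping[str, Any],
            table: Dict[str, Tuple[str, Callable[[Any], Any]]] = EXTRACTORS) -> Dict[str, Any]:
    """Pull the configured properties out of one parsed entry."""
    found: Dict[str, Any] = {}
    for key, (prop, convert) in table.items():
        if key in entry:
            try:
                found[prop] = convert(entry[key])
            except (TypeError, ValueError):
                pass  # malformed value in the feed: skip it rather than fail the entry
    return found
```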
    201 201 
    202BBox is totally meant to be light footprint and i see the dependency on nodel crept into it for its http interfaces. This shouldn't have to be the case now - nodel though lovely was an overgrowth - can be replaced with the web.py currently in the geometa codebase. 202BBox is totally meant to be light footprint and i see the dependency on nodel crept into it for its http interfaces. This shouldn't have to be the case now - nodel though lovely was an overgrowth - can be replaced with the web.py currently in the geometa codebase. 
    203 203 
    204How does this one connect to that - both are doing the broker/decorator thing - only the other has a very specific schema. A WFS interface could be appropriate for both though geometa only has envisaged, not implemented support for individual vector features. 204How does this one connect to that - both are doing the broker/decorator thing - only the other has a very specific schema. A WFS interface could be appropriate for both though geometa only has envisaged, not implemented support for individual vector features. 
    205 205 
    206We should a/ work from the data - find a good collection of features that we need to treat of and work from there  206We should a/ work from the data - find a good collection of features that we need to treat of and work from there  
    207b/ figure out one directed thing that we can do and finish and that others will see benefit in, whether that is simplifying and extending bbox or extending and rethinking geometa. WFS-basic is super appealing though i am less sure how to implement the equivalent of OWSCat over it. This would be simple and impressive to do. I would not mind restricting this so that the data or at least an index of it has to be in PostGIS. One could index all the shapes in a shapefile as long as one had some way of referring consistently to the originals. But this swiftly starts to get into the domain of annotation system - a problem which looks the same as attaching potentially arbitrary properties and accompanying values to features and collections of them. This is why i keep thinking about bbox, because the arbitrariness is what the rdf store is for. With wfs-basic we don't need to mess around with geoserver and the allocation of URIs any more, and we get the facility of DescribeFeatureType to abuse how we like. 207b/ figure out one directed thing that we can do and finish and that others will see benefit in, whether that is simplifying and extending bbox or extending and rethinking geometa. WFS-basic is super appealing though i am less sure how to implement the equivalent of OWSCat over it. This would be simple and impressive to do. I would not mind restricting this so that the data or at least an index of it has to be in PostGIS. One could index all the shapes in a shapefile as long as one had some way of referring consistently to the originals. But this swiftly starts to get into the domain of annotation system - a problem which looks the same as attaching potentially arbitrary properties and accompanying values to features and collections of them. This is why i keep thinking about bbox, because the arbitrariness is what the rdf store is for. 
With wfs-basic we don't need to mess around with geoserver and the allocation of URIs any more, and we get the facility of DescribeFeatureType to abuse how we like. 
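
Server-side, WFS-basic is small: a handful of operations selected by a REQUEST parameter. A minimal sketch of that routing using only the Python standard library; the handler functions and their placeholder bodies are hypothetical, and only the operation and parameter names come from the WFS specification. It is not taken from bbox or geometa.

```python
from urllib.parse import parse_qsl

# Hypothetical handlers: each takes the parsed parameters and returns a
# (content_type, body) pair. A real WFS-basic service would return
# capabilities XML, an XML Schema and GML; these bodies are placeholders.
def get_capabilities(params):
    return "text/xml", "<WFS_Capabilities/>"


def describe_feature_type(params):
    return "text/xml", "<schema/>"


def get_feature(params):
    return "text/xml", "<FeatureCollection/>"


# Operation names as spelled in the WFS specification.
HANDLERS = {
    "GetCapabilities": get_capabilities,
    "DescribeFeatureType": describe_feature_type,
    "GetFeature": get_feature,
}


def dispatch(query_string):
    """Route a WFS key-value-pair request to a handler by its REQUEST value."""
    # KVP parameter names are case-insensitive, so upper-case the keys;
    # values (such as the operation name) are compared exactly as given.
    params = {key.upper(): value for key, value in parse_qsl(query_string)}
    if params.get("SERVICE", "WFS") != "WFS":
        raise ValueError("not a WFS request")
    handler = HANDLERS.get(params.get("REQUEST"))
    if handler is None:
        raise ValueError("unsupported REQUEST: %r" % params.get("REQUEST"))
    return handler(params)
```

In this shape, the DescribeFeatureType handler would be the natural place to expose whatever arbitrary properties the RDF store holds for a feature type.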

:Sat Nov  4 03:14:17 GMT 2006

Property extensibility crossed my mind briefly while looking back through __init__.py, where i see a long run of code saying "if e.has_key('geo_lat')" and so on. We definitely have to fix this. It is even worse because it happens conditionally, depending on whether or not one has enabled the spatial index. I'm thinking about making the spatial index mandatory.

Part of this is because there's no date range query support here yet. I really thought there was; i think it dropped out over iterations when spatial query became all-important. One can ask recent() for the N latest things, but that is based on the date collected, not the date emitted with the data. We do look at the latter and store it in the RDF store in iCal format, but the object that goes into the spatial store is decoupled from it and, as noted above, only exists if the spatial index is enabled. Without it, we can't do date range queries short of resorting all the way to SPARQL. Goodness knows rdfobj should have an interface through to SPARQL in Redland, which just wasn't stable back when Schuyler and i wrote it. (SPARQL has dateTime-less-than and dateTime-greater-than operators, but the syntax is messy, and right at this minute i don't want to go there.) I just want to get the baseline of WFS Simple implemented, no matter how nasty it looks inside for now, and worry later.
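
Whichever route the date range query takes, the iCal-format values in the store have to be compared as times. A minimal sketch, assuming they are stored in the iCalendar (RFC 2445) basic DATE-TIME form such as 20061104T031417Z; the exact serialisation bbox uses isn't shown in these notes, and the function names are hypothetical.

```python
from datetime import datetime, timezone


def parse_ical_datetime(value):
    """Parse an iCalendar DATE-TIME in basic form, e.g. '20061104T031417Z'."""
    if value.endswith("Z"):
        # A trailing 'Z' means UTC, so return a timezone-aware datetime.
        return datetime.strptime(value, "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)
    # Without 'Z' the time is "floating" (no zone); return a naive datetime.
    return datetime.strptime(value, "%Y%m%dT%H%M%S")


def in_range(value, start, end):
    """True if the stored value falls within [start, end], inclusive.

    start and end must be aware datetimes for UTC values (naive ones for
    floating values); Python refuses to compare the two kinds.
    """
    return start <= parse_ical_datetime(value) <= end
```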

So basically i am adding a 'dated' datetime field to the index, and querying it will have to work just the way within_box works now: construct the SQL, run it, get a list of node identifiers back.
We should be able to pass date range limits into within_box or within_shape. Plus we want to be able to do date range queries without a box, for new. This just goes in the spatialStore.py module for now, because that's an accreted mess which needs rewriting before any Shiny New Release anyway.
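
A minimal sketch of that shape of query, using SQLite and plain x/y columns purely as a stand-in for whatever database spatialStore.py really talks to (a spatial database would use its own bounding-box operators). The table layout and the within_dates name are hypothetical; only within_box's role (build SQL, run it, return node identifiers) comes from the notes above. Dates are passed as ISO 8601 strings, which compare correctly as text when every value uses the same format.

```python
import sqlite3

# Hypothetical index table: one row per node, with a point location and the
# 'dated' timestamp stored as ISO 8601 text, so string order is time order.
SCHEMA = """CREATE TABLE IF NOT EXISTS spatial_index (
    node TEXT PRIMARY KEY, x REAL, y REAL, dated TEXT)"""


def _date_clauses(start=None, end=None):
    """Build optional SQL conditions and parameters for a date range."""
    clauses, params = [], []
    if start is not None:
        clauses.append("dated >= ?")
        params.append(start)
    if end is not None:
        clauses.append("dated <= ?")
        params.append(end)
    return clauses, params


def within_box(conn, minx, miny, maxx, maxy, start=None, end=None):
    """Node identifiers inside the box, optionally limited to a date range."""
    clauses = ["x BETWEEN ? AND ?", "y BETWEEN ? AND ?"]
    params = [minx, maxx, miny, maxy]
    date_clauses, date_params = _date_clauses(start, end)
    sql = "SELECT node FROM spatial_index WHERE " + " AND ".join(clauses + date_clauses)
    rows = conn.execute(sql + " ORDER BY node", params + date_params)
    return [node for (node,) in rows]


def within_dates(conn, start=None, end=None):
    """Node identifiers in a date range with no box at all, oldest first."""
    clauses, params = _date_clauses(start, end)
    sql = "SELECT node FROM spatial_index"
    if clauses:
        sql += " WHERE " + " AND ".join(clauses)
    rows = conn.execute(sql + " ORDER BY dated", params)
    return [node for (node,) in rows]
```

An end bound given as a bare date such as '2006-11-10' excludes times later that day, since '2006-11-10T09:00' sorts after it.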

:Sun Nov  5 07:31:12 GMT 2006

Right now i want to get to a demo as fast as possible, so i hooked this up to wirelesslondon's old database and copied the 'created' dates into the 'dated' field. There are 4.5K things in it, which GetFeature won't deal with very well by default.
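
With 4.5K records, an obvious mitigation is to honour the client's MAXFEATURES and also cap every GetFeature response on the server. A minimal sketch; BBOX and MAXFEATURES are standard WFS 1.0 GetFeature parameters, but the function name and the 500-feature cap are hypothetical.

```python
from urllib.parse import parse_qsl

SERVER_MAX_FEATURES = 500  # hypothetical cap; tune to what clients can render


def parse_get_feature(query_string, server_max=SERVER_MAX_FEATURES):
    """Return (bbox, limit) from a WFS GetFeature key-value-pair request.

    bbox is (minx, miny, maxx, maxy) or None; limit is the client's
    MAXFEATURES, but never more than server_max.
    """
    # Parameter names are case-insensitive in WFS KVP requests.
    params = {key.upper(): value for key, value in parse_qsl(query_string)}
    bbox = None
    if "BBOX" in params:
        # Use the first four comma-separated numbers; any extra trailing
        # value (such as a CRS name in later WFS versions) is ignored here.
        minx, miny, maxx, maxy = (float(v) for v in params["BBOX"].split(",")[:4])
        bbox = (minx, miny, maxx, maxy)
    requested = int(params.get("MAXFEATURES", server_max))
    return bbox, min(requested, server_max)
```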

:Thu Nov 23 13:42:14 GMT 2006

WFS Simple was all very nice, and now we are going back and thinking about refactoring.

1/ bbox/lookup.py contains a pile of lookup functions and a dict mapping them to the properties that feedparser pulls out of extension namespaces, the idea being that it should be easy to extend, in one place, what the parser puts into the RDF model.
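
Roughly, such a table replaces the run of if e.has_key('geo_lat') tests with data: adding a property means adding one function and one dict entry. A minimal sketch; the function and table names are hypothetical and need not match lookup.py, and plain dicts stand in for feedparser entries, which expose extension elements under prefix_localname keys such as geo_lat.

```python
# W3C Basic Geo (WGS84 lat/long) vocabulary namespace.
GEO = "http://www.w3.org/2003/01/geo/wgs84_pos#"


# Hypothetical lookup functions: each takes the raw string feedparser
# produced for an extension element and returns (rdf_property, value).
def geo_lat(value):
    return GEO + "lat", float(value)


def geo_long(value):
    return GEO + "long", float(value)


# feedparser shorthand (prefix_localname) -> lookup function.
LOOKUP = {
    "geo_lat": geo_lat,
    "geo_long": geo_long,
}


def extract(entry, lookup=LOOKUP):
    """Yield (property, value) pairs for every extension key we know about."""
    for key, func in lookup.items():
        if key in entry:
            yield func(entry[key])
```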

2/ As a by-product of adding doctests to this, we went back and fixed rdfobj so that it looks for new namespaces to add as module globals after every load(). That is a cheap way of doing it, but it works, and it gets rid of a nasty bug we had for a long time (e.g. having to quit the store in the console after a namespace load, or having to run all the tests twice, the first time just to bootstrap). Which is great.
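
The fix amounts to re-scanning after each load() and binding any new prefixes as attributes of the module, so that names such as geo become usable straight away. A rough sketch; Namespace and bind_new_namespaces are hypothetical stand-ins, not rdfobj's or Redland's actual API, and how the prefixes are collected from the store is left out.

```python
import types


class Namespace:
    """Minimal stand-in for an RDF namespace: ns['term'] gives the full URI."""

    def __init__(self, uri):
        self.uri = uri

    def __getitem__(self, term):
        return self.uri + term


def bind_new_namespaces(module, prefixes):
    """Bind each prefix not already on `module` as a Namespace global.

    `prefixes` maps prefix -> namespace URI, as collected after a load().
    Existing bindings are left untouched. Returns the newly added prefixes.
    """
    added = []
    for prefix, uri in prefixes.items():
        if not hasattr(module, prefix):
            setattr(module, prefix, Namespace(uri))
            added.append(prefix)
    return added


# Inside the module itself one would pass sys.modules[__name__];
# here a throwaway module object stands in for it.
store_module = types.ModuleType("store_module")
```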

3/ But now the callback either needs access to the spatialStore, or we do it another way. Actually it would be great to get the direct spatial code out of bbox and provide something like a post-hook for objects, so that when an object gets past the feedparser or the RDF parser, a post-process (like spatial indexing), or even several of them, can be run. So we might want to generalise this. What's the best way to do it?
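
One simple way to generalise this is a registry of post-processing hooks: callables that receive each object once the feedparser or RDF parser has finished with it, so that spatial indexing becomes one hook among several. A sketch with hypothetical names, with plain dicts standing in for rdfobj objects.

```python
post_hooks = []


def post_hook(func):
    """Decorator: register func to run on every newly parsed object."""
    post_hooks.append(func)
    return func


def run_post_hooks(obj):
    """Call each registered hook, in registration order, on obj."""
    for hook in post_hooks:
        hook(obj)
    return obj


# Example hook: spatial indexing as just one post-process among many.
spatial_index = []


@post_hook
def index_spatially(obj):
    if "geo_lat" in obj and "geo_long" in obj:
        spatial_index.append((obj["id"], float(obj["geo_lat"]), float(obj["geo_long"])))
```

A module-level list keeps the sketch short; hanging the list off a BBox instance would let different stores run different hooks.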

4/ Looking at spatialStore's interface, we see it deals in strings rather than RDF objects (though some newer methods have to deal in objects), and its outward-facing interface should probably take objects, which is simpler. One could, e.g., stick a call to an encoding-sniffer web service in here and insert the metadata one gets back into the object. (It doesn't matter if we have to wait, right? There is not often going to be a client sitting around watching this process.)

I'm happy with this: extensions plus some solid refactoring.

Thanks, Tav, for slapping my Python up.

So:
- BBox gets started with a bunch of indexes (remember it used to take spatial, text, etc. as separate inputs).
- At the end of read_rdf / read_rss we loop through the indexes, calling an add_to_index method that every index must provide on its interface, and push the items in one at a time. Then *all* the spatially specific code can move right out of bbox. Yeah!
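
The plan above might look something like this. A minimal sketch: Index, PointIndex, TextIndex and this read_rss signature are hypothetical (the feed fetching and RDF storage that the real read_rss / read_rdf do are elided), and each index receives the whole item object rather than strings, in line with point 4.

```python
from typing import Iterable, List, Protocol


class Index(Protocol):
    """The one method BBox would require of every index it is given."""

    def add_to_index(self, item) -> None: ...


class PointIndex:
    """Hypothetical spatial index: keeps items that carry coordinates."""

    def __init__(self):
        self.points = []

    def add_to_index(self, item):
        if "geo_lat" in item and "geo_long" in item:
            self.points.append((item["id"], float(item["geo_lat"]), float(item["geo_long"])))


class TextIndex:
    """Hypothetical text index: maps lower-cased title words to item ids."""

    def __init__(self):
        self.words = {}

    def add_to_index(self, item):
        for word in item.get("title", "").lower().split():
            self.words.setdefault(word, set()).add(item["id"])


class BBox:
    """Knows nothing spatial: it just hands each item to every index."""

    def __init__(self, indexes: Iterable[Index] = ()):
        self.indexes: List[Index] = list(indexes)

    def read_rss(self, items):
        # Parsing the feed and storing items in the RDF model is elided;
        # `items` stands in for the parsed entries.
        for item in items:
            for index in self.indexes:
                index.add_to_index(item)
```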

Ugh, i found a lot of nodel-like authentication code right inside bbox/__init__; it has to move out of there.