PubWanUseCase
In the PubWan article, StarPilot uses digital cameras as an example, so we start with a digital camera case study. Price tracking (such as with C-Net) is already an established practice in the electronic appliance sector. Pubwan may have a role in improving this process by making data available as "file dumps", rather than as the bloated, copyrighted, information-poor results of time-consuming, narrow, process-constrained queries. Developing a price aggregator outside the business sector is also a primary goal of pubwan. Nevertheless, price aggregation in some form is already commonplace, so perhaps pubwan should start somewhere else, say with the XMLization of specifications. Before we talk about how we plan to index and distribute contributions of such data, let's discuss how we might query it. This might yield ideas on how to index it most efficiently.
Let's say we have two camera users called Dinney and Dariou. For the sake of argument, let's say Dinney is into monochrome "art" photography, while Dariou is interested in sports videography and in improving his golf swing. Both view camera selection (for the time being) as an optimization problem. Both would like a preliminary survey of available data, and they want a "language" for expressing their priorities. For this example, I will use [[|http://www2.iro.umontreal.ca/~paquetse/cgi-bin/wiki.cgi?CreateNormativeSpecificationsForResource_Allocations a notational shorthand I described elsewhere in Wikidom]]. Using that notation, here is how our hypothetical camera users' priorities stack up:
|- |< ||'Dinney' ||'Dariou'
|- |<aperture ||minhi ||minlo
|- |<autofocus ||maxlo ||maxhi
|- |<frame rate ||maxlo ||maxhi
|- |<light sensitivity ||maxhi ||maxlo
|- |<manual focus ||maxhi ||maxlo
|- |<mass ||minlo ||minhi
|- |<max. shutter time (min. shutter speed) ||maxhi ||maxlo
|- |<memory ||maxhi ||maxhi
|- |<min. shutter time (max. shutter speed) ||minhi ||minhi
|- |<optical zoom ||maxhi ||maxhi
|- |<price ||minhi ||minlo
|- |<resolution ||maxhi ||maxlo
|- |<sensor size ||maxhi ||maxlo
Note that while priorities (hi or lo) are sometimes different, directionality (max or min) is always the same in this example. Exceptions to this rule will exist, but are not very common. For most parameters, either "more is better" or "less is better", universally. The designation of mins and maxes unambiguously defines a subset of all catalogued digital cameras that is called by a host of different names, including "[[|http://electronics.cnet.com/electronics/0-1429209.html?tag=st.cn.1.cameras.1429209 nondominated||http://citeseer.nj.nec.com/243743.html]]", "|http://lib.stat.cmu.edu/joint94/Abstracts/0086", and "|http://www.democracy2000.org/effrontierS.htm". We wouldn't want to recommend a digital camera based on specifications alone, so we'll humbly call it a "preliminary selection". Given the number of objectives in our optimization scheme, it probably includes all cameras anyway. The hi and lo priority designations suggest a location somewhere on the "optimal" (13-dimensional) surface. Here is a suggested way to locate it:
- construct a 13-dimensional hypercube, one axis per specification.
- list the 13 specifications of n cameras, excluding "dominated" ones if there are any.
- rank each value of each specification, from 1/(2n) to (2n-1)/(2n). Reverse the order for minimized objectives, so that a higher coordinate always means "better".
- plot the normalized coordinates for each camera in the hypercube.
- taking hi as 2/3 and lo as 1/3, in the order of the table above, plot a point for Dinney at (2/3,1/3,1/3,2/3,2/3,1/3,2/3,2/3,2/3,2/3,2/3,2/3,2/3), and a point for Dariou at (1/3,2/3,2/3,1/3,1/3,2/3,1/3,2/3,2/3,2/3,1/3,1/3,1/3).
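The exclusion of dominated cameras and the quantile ranking in the steps above can be sketched in Python. This is only an illustration, not the LISP routines mentioned on this page. The camera names and values are made up, only three of the thirteen specifications are used, and the function names are my own assumptions. Ties are broken by catalogue order, which is a simplification.

```python
from fractions import Fraction

# Direction of each objective, as in the table: "max" = more is better,
# "min" = less is better. Only three specs, to keep the sketch short.
DIRECTIONS = {"resolution": "max", "mass": "min", "price": "min"}

# Hypothetical catalogue (made-up values, not real cameras).
CAMERAS = {
    "cam_a": {"resolution": 5.0, "mass": 400, "price": 900},
    "cam_b": {"resolution": 3.0, "mass": 250, "price": 500},
    "cam_c": {"resolution": 3.0, "mass": 300, "price": 600},  # dominated by cam_b
    "cam_d": {"resolution": 2.0, "mass": 200, "price": 300},
}


def at_least_as_good(a, b, spec):
    """True if camera a is no worse than camera b on one specification."""
    if DIRECTIONS[spec] == "max":
        return a[spec] >= b[spec]
    return a[spec] <= b[spec]


def dominates(a, b):
    """a dominates b: no worse on every spec, strictly better on at least one."""
    no_worse = all(at_least_as_good(a, b, s) for s in DIRECTIONS)
    strictly_better = any(not at_least_as_good(b, a, s) for s in DIRECTIONS)
    return no_worse and strictly_better


def nondominated(cameras):
    """Keep only the cameras that no other camera dominates."""
    return {
        name: specs
        for name, specs in cameras.items()
        if not any(dominates(other, specs) for other in cameras.values())
    }


def quantile_ranks(cameras):
    """Give each spec value the coordinate (2k-1)/(2n), k = 1 (worst) .. n (best)."""
    n = len(cameras)
    coords = {name: {} for name in cameras}
    for spec, direction in DIRECTIONS.items():
        # Sort worst first: ascending for "max", descending for "min".
        ordered = sorted(
            cameras,
            key=lambda name: cameras[name][spec],
            reverse=(direction == "min"),
        )
        for k, name in enumerate(ordered, start=1):
            coords[name][spec] = Fraction(2 * k - 1, 2 * n)
    return coords


survivors = nondominated(CAMERAS)
for name, point in quantile_ranks(survivors).items():
    print(name, {spec: str(value) for spec, value in point.items()})
```

With this made-up catalogue, cam_c drops out because cam_b matches it on resolution and beats it on mass and price. The remaining three cameras get coordinates of 1/6, 1/2 and 5/6 on each axis.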
The preference points are unlikely to lie on or near the efficient surface, but hopefully they are closer to some (occupied) parts of it than to others.
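One way to make "closer" concrete is to pick the camera whose normalized coordinates are nearest to a user's preference point. The Python sketch below assumes plain Euclidean distance, which is my choice for illustration and not something this page fixes. The camera coordinates are hypothetical, and only three of the thirteen specifications are used, with hi and lo read from the table for aperture, frame rate and resolution.

```python
from fractions import Fraction

# Three of the thirteen specifications, in a fixed axis order.
SPECS = ("aperture", "frame rate", "resolution")

# hi and lo priorities mapped to 2/3 and 1/3, as in the preference points.
PRIORITY_COORD = {"hi": Fraction(2, 3), "lo": Fraction(1, 3)}

# Priorities for these three specifications, read from the table.
DINNEY = {"aperture": "hi", "frame rate": "lo", "resolution": "hi"}
DARIOU = {"aperture": "lo", "frame rate": "hi", "resolution": "lo"}

# Hypothetical normalized coordinates for four nondominated cameras
# (higher is better on every axis; values are (2k-1)/(2n) with n = 4).
COORDS = {
    "cam_a": (Fraction(7, 8), Fraction(1, 8), Fraction(1, 8)),
    "cam_b": (Fraction(5, 8), Fraction(3, 8), Fraction(5, 8)),
    "cam_c": (Fraction(3, 8), Fraction(7, 8), Fraction(3, 8)),
    "cam_d": (Fraction(1, 8), Fraction(5, 8), Fraction(7, 8)),
}


def preference_point(priorities):
    """Turn a user's hi/lo priorities into a point in the hypercube."""
    return tuple(PRIORITY_COORD[priorities[spec]] for spec in SPECS)


def squared_distance(p, q):
    """Squared Euclidean distance (same ordering as the distance itself)."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest_camera(point, coords):
    """Name of the camera whose coordinates are closest to the point."""
    return min(coords, key=lambda name: squared_distance(point, coords[name]))


print("Dinney:", nearest_camera(preference_point(DINNEY), COORDS))
print("Dariou:", nearest_camera(preference_point(DARIOU), COORDS))
```

Other distance measures, or weighting the axes by something finer than hi and lo, would be equally defensible. The sketch only shows that two users with different priorities can land on different parts of the same surface.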
These specifications of priorities are suggested as a means of querying pubwan, not an addition to its database. The idea of uploading query parameters into pubwan itself raises some interesting questions. If nothing else, pubwan is about decentralizing what is sometimes called "intelligence". Consumers' statements of their priorities certainly have potential to enhance someone's intelligence. Since a primary goal of pubwan is to initiate sophisticated intelligence-gathering by individuals about institutions, it might not be ideal to include information about individuals. Privacy is not a primary goal of pubwan, but it should certainly be respected. Keeping the individual in control is key. Pubwan-querying software should have optional settings for one-way or two-way information flow, depending on how much a user prioritizes privacy. Those who choose to share their optimization specs can help identify underserved market segments, identify clusters of people with similar tastes, and facilitate all sorts of KnowledgeDiscovery useful not only to the pubwan community, but to the business entities it studies. With any luck, tossing an informational bone industry's way might make for a less adversarial relationship, something important to a project such as pubwan in an age of ever expanding definitions of intellectual property, and the gold rush mentality toward machine readable data.
I have provided some LISP routines to illustrate the selection of the nondominated set and the quantile ranking thereof. I have not yet written a routine to calculate the nearest point to the point suggested by priorities.