ProductCategorization
From RCC 2007
How can a large dataset, whose tags may be ad hoc, be built, administered, and delivered?
Don't let the perfect be the enemy of the good
Tech
- how will the data appear on the page, how will it look? What is the delivery mechanism?
- barrier to adoption for web clients of the data has to be low
- clear distinction between "unassailable", generally quantitative, data and user reviews and commentary. Each presents different problems
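One way to picture the split between quantitative data and commentary is a record that keeps the two in separate fields, so each can be validated, moderated, and delivered on its own terms. This is only a minimal sketch: the names `ProductRecord`, `specs`, `reviews`, and `to_payload` are hypothetical and not taken from any system discussed at RCC.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ProductRecord:
    """Hypothetical record that keeps checkable facts apart from opinion."""
    product_id: str
    # Quantitative, verifiable facts (for example, weight in grams).
    specs: Dict[str, float] = field(default_factory=dict)
    # Subjective user commentary, stored separately from the facts.
    reviews: List[str] = field(default_factory=list)


def to_payload(record: ProductRecord) -> dict:
    """Build a plain dict that a web client could consume as JSON."""
    return {
        "id": record.product_id,
        "facts": dict(record.specs),
        "commentary": list(record.reviews),
    }


record = ProductRecord(
    product_id="sku-001",
    specs={"weight_g": 450.0},
    reviews=["Feels sturdy."],
)
payload = to_payload(record)
```

Keeping the two kinds of data in distinct keys also keeps the barrier for clients low: a consumer that only wants the facts can ignore the commentary entirely.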
Tools
- ontologies ex. papers
- ML / provable systems (are these really one category, Sandra?) Conf Link
- Relational Theory Wikipedia Article
- Unsupervised learning techniques Intro
- IBM UIMA Unstructured Information Management Architecture
- RDF models Simile Home Page
Mashups are happening right now. Companies like [StrikeIron] are providing data source services, along with tools such as an Excel plugin that reads StrikeIron sources.
RDF is notoriously abstract and "not done yet".
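For readers who find RDF abstract, the core idea is that every statement is a (subject, predicate, object) triple. The sketch below models that idea with plain Python tuples. It is not an RDF library, and the `ex:` identifiers and the `match` helper are made up for illustration.

```python
# Each statement is a (subject, predicate, object) triple.
# The "ex:" identifiers are hypothetical placeholders, not real vocabulary.
triples = {
    ("ex:widget", "ex:category", "ex:hardware"),
    ("ex:widget", "ex:weight_g", "450"),
    ("ex:gadget", "ex:category", "ex:electronics"),
}


def match(store, s=None, p=None, o=None):
    """Return triples matching the pattern; None acts as a wildcard."""
    return sorted(
        t for t in store
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    )


# Everything known about the widget.
widget_facts = match(triples, s="ex:widget")
# Every product that has a category assigned.
categorized = [t[0] for t in match(triples, p="ex:category")]
```

Because any new predicate can be added without a schema change, this shape suits ad hoc tags, at the price of the abstraction noted above.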
Note that in practice really large datasets are stored not as relational tables but as carefully chosen flat tables, because of the enormous compute resource cost of managing them relationally.
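The trade-off can be sketched in a few lines: the join is paid once when the flat table is built, so that reads never have to perform it. The table contents and the `flatten` helper below are hypothetical, and a real system would do this in a batch pipeline rather than in memory.

```python
# Normalized form: products refer to categories by id.
products = [
    {"product_id": 1, "name": "Widget", "category_id": 10},
    {"product_id": 2, "name": "Gadget", "category_id": 20},
]
categories = {10: "Hardware", 20: "Electronics"}


def flatten(product_rows, category_names):
    """Join once at build time so that reads need no join."""
    return [
        {
            "product_id": row["product_id"],
            "name": row["name"],
            "category": category_names[row["category_id"]],
        }
        for row in product_rows
    ]


# Flat form: every row is self-contained and can be served directly.
flat_rows = flatten(products, categories)
```

The cost is redundancy: renaming a category means rewriting every flat row that mentions it, which is why the flat tables have to be chosen carefully.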
Some Relevant Mashup Links
http://wiki.mashupcamp.com/index.php/LicensingCommericalMashup
http://blog.programmableweb.com/?p=521
http://redmonk.com/sogrady/2007/01/17/john-herren-mashup-camp/
Legal/Business
Does this require a neutral third party? Is the Wiser model the right one? When will we know the details of the Wiser link contracts?











