Assignment 2


CIS453/553, Data Mining 2008
due 11:59pm Friday, Feb 8

  1. A data warehouse for previous NBA games has five dimensions (date, game, location, player, and spectator) and two measures (count and charge), where charge is the fare a spectator pays to watch a game on a given date at a given location. Spectators may be adults, seniors, children (younger than 6), or students, and each category has its own charge rate. The player information includes name, age, height, weight, and points scored in that game.

  2. We discussed virtual data warehouses and data mediators in class. Are they exactly the same thing? If not, what is the difference? Can a virtual data warehouse support OLAP mining (OLAM)?

  3. Suppose a bank database includes the following attributes describing customers: name, age, gender, address, phone#, credit-ranking (good or bad), year-income, and job title (student, engineer, professor, etc.). Based on the schema you selected for Problem 4 in Assignment 1 (generalized relation), write a DMQL query that compares the general properties of customers with a good credit-ranking against those with a bad credit-ranking. (Note: you do not have to follow the DMQL syntax exactly, especially if you do not know SQL.)
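    To make the intent of such a comparison concrete, here is a minimal Python sketch (not DMQL) of the measure a class-comparison query typically reports over a generalized relation: for one generalized attribute, the percentage of tuples taking each value (count%) in the target class (good credit) and in the contrasting class (bad credit). The tuples, the attribute names `age` and `job`, and their values are hypothetical and chosen only for illustration.

```python
from collections import Counter

# Hypothetical generalized tuples; attribute names and values are illustrative only.
rows = [
    {"credit": "good", "age": "30-50", "job": "professional"},
    {"credit": "good", "age": "30-50", "job": "professional"},
    {"credit": "good", "age": "18-29", "job": "student"},
    {"credit": "bad", "age": "18-29", "job": "student"},
    {"credit": "bad", "age": "30-50", "job": "professional"},
]

def count_percent(rows, class_attr, label, attr):
    """count% of each value of `attr` among tuples whose class_attr equals label.

    Assumes at least one tuple carries the label (no empty-class handling).
    """
    values = [r[attr] for r in rows if r[class_attr] == label]
    n = len(values)
    return {v: 100.0 * c / n for v, c in Counter(values).items()}

def compare(rows, class_attr, target, contrast, attr):
    """Side-by-side count% for the target class and the contrasting class."""
    return (count_percent(rows, class_attr, target, attr),
            count_percent(rows, class_attr, contrast, attr))

good, bad = compare(rows, "credit", "good", "bad", "job")
```

    Reading the two dictionaries side by side shows which generalized values are more typical of good-credit customers than of bad-credit ones; repeating the call for each generalized attribute gives the full comparison table.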

  4. Based on the crosstab in Table 4.22 of the second edition (page 216), which is Table 5.13 of the first edition (page 204), write down all quantitative descriptive rules related to both Europe and Computer.
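    A quantitative descriptive rule annotates each disjunct with a t-weight (typicality: the fraction of a class's tuples that have a given value, read along the class's row of the crosstab) and a d-weight (discriminability: the fraction of all tuples with that value that belong to the class, read down the value's column). The sketch below computes both from a crosstab stored as nested dictionaries. The counts and the region names `RegionA`/`RegionB` are hypothetical and are not the values of Table 4.22.

```python
# Hypothetical counts: crosstab[region][item]; NOT the values of Table 4.22.
crosstab = {
    "RegionA": {"TV": 50, "Computer": 150},
    "RegionB": {"TV": 100, "Computer": 200},
}

def t_weight(ct, cls, val):
    """Typicality: share of class `cls`'s tuples that have value `val` (row-wise)."""
    return ct[cls][val] / sum(ct[cls].values())

def d_weight(ct, cls, val):
    """Discriminability: share of tuples with value `val` that fall in class `cls` (column-wise)."""
    return ct[cls][val] / sum(row[val] for row in ct.values())

def transpose(ct):
    """Swap rows and columns so the other dimension can serve as the class."""
    out = {}
    for r, row in ct.items():
        for c, v in row.items():
            out.setdefault(c, {})[r] = v
    return out

t = t_weight(crosstab, "RegionA", "Computer")  # 150 / (50 + 150)
d = d_weight(crosstab, "RegionA", "Computer")  # 150 / (150 + 200)
```

    Treating the item rather than the region as the class amounts to running the same two functions on `transpose(crosstab)`, which is why rules can be written from either side of the table.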

  5. If you use the first edition, please show that the information gain of gpa for Tables 5.5 and 5.6 (page 199) is 0.4490. If you use the second edition, please show that the information gain of student in Table 6.1 (page 299) is 0.151.
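    One way to check the arithmetic is a short Python sketch of the standard definitions Info(D) = -sum_i p_i log2(p_i) and Gain(A) = Info(D) - sum_j (|D_j|/|D|) Info(D_j). The class counts below are an assumption taken from the second edition's Table 6.1 (buys_computer: 9 "yes" and 5 "no" overall; student = yes gives 6 "yes" / 1 "no"; student = no gives 3 "yes" / 4 "no"); verify them against your copy before relying on the result.

```python
from math import log2

def info(counts):
    """Expected information (entropy, in bits) of a class distribution given as counts."""
    n = sum(counts)
    return -sum(c / n * log2(c / n) for c in counts if c > 0)

def gain(parent_counts, partitions):
    """Information gain: Info(D) minus the size-weighted entropy of the partitions."""
    n = sum(parent_counts)
    info_a = sum(sum(p) / n * info(p) for p in partitions)
    return info(parent_counts) - info_a

# Assumed counts from Table 6.1 (2nd ed.): check against the book.
# Overall 9 yes / 5 no; student = yes: 6 yes / 1 no; student = no: 3 yes / 4 no.
g = gain([9, 5], [[6, 1], [3, 4]])
```

    Full-precision arithmetic gives roughly 0.152; the small difference from the book's 0.151 comes from rounding intermediate values. The same two functions apply to the first-edition gpa exercise once its class counts are read off Tables 5.5 and 5.6.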

    To turn in a paper copy: ask Cheri or Star to put your answers in Dejing's mailbox, or give them to Dejing during class or his office hours.

    To turn in by email: send your answers to dou@cs.uoregon.edu. Dejing prefers plain text or, better, a PDF file. If you are using Word, you should be able to convert your Word file to PDF.