BTW: I like the notion of the Cogito virus:
Cute.
This week I would like to improve our findUIMatch method, making it a bit more flexible. You can see the way the method looks at the moment:
    require 'rexml/document'  #needed for REXML::Document

    def findUIMatch( user_input, xml )
      REXML::Document.new(xml).root.get_elements('user_input').each do |entry|
        entry = entry.to_s
        text = getText(entry)
        if( user_input == text )  #think about less strict matches
          return getID(entry)
        end
      end
      return nil
    end

What bothers me is that == check. What if the user misspells a word? It won't match! If what the user types does not exactly match what is in the collective, nil is returned. Seems kind of harsh. I think I found an algorithm that might help us. It comes from a US patent obtained in 1918 (even before I was alive!). The general idea is to take letters that sound the same and collapse them all into a single code. The algorithm applies to a single word. Just to give you some examples, when I tried the patented algorithm on these words, they came out as matching, i.e., they all produced the same code!
    {colr, coler, colur, culler} all match color

Cool, huh. Now we can match words spelled phonetically by the user. I'm going to give you the outline of the method we need below. You are asked to fill in pieces of it. First, let me give you the written description from the patent. The general idea is to take an English word and return a "hash" for it. A hash will be a letter followed by 3 numbers. So any word, no matter how big or small, will end up with a 4-character hash (1 letter and 3 numbers).
    b, f, p, v => "1"
    c, g, j, k, q, s, x, z => "2"
    d, t => "3"
    l => "4"
    m, n => "5"
    r => "6"

The general idea is that all letters in a group have the same sound. You don't have to believe this, but it is what is on the patent.
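One way to picture the groups above is as a lookup table. The sketch below only illustrates the table; it is not the hashWord method, and the names SOUND_GROUPS and code_for are made up for this example.

```ruby
# Hypothetical names (not part of the assignment): SOUND_GROUPS, code_for.
# Each key is a group of same-sounding letters; the value is its code.
SOUND_GROUPS = {
  "bfpv"     => "1",
  "cgjkqsxz" => "2",
  "dt"       => "3",
  "l"        => "4",
  "mn"       => "5",
  "r"        => "6"
}

# Returns the code for one lower-case letter, or nil when the letter
# is in no group (a, e, h, i, o, u, w, y).
def code_for(letter)
  SOUND_GROUPS.each do |letters, digit|
    return digit if letters.include?(letter)
  end
  nil
end

p code_for("p")   #=> "1"
p code_for("k")   #=> "2"
p code_for("e")   #=> nil
```

Letters that are in no group come back as nil here; in the patent's scheme those are the letters that end up being dropped from the hash.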
Problem 1. The start of the method is below. You will need to fill in pieces, using your knowledge of string manipulation. I've put comments in to remind you what you need to do. I've also added debugging code to help you out, i.e., a set of puts statements that tell you the value of str as you work your way through the steps. This allows you to see intermediate values along the way. This is a standard way of debugging code used by good programmers. When you are happy with the way things are working, you can either remove the puts statements from your code, or comment them out (place a # in front of each one).
    def hashWord(str)
      return nil if str.empty?
      str = str.delete(" \n")  #get rid of any lingering blanks and newlines

      # step 0
      str =    #make everything lower case. Hint: dot-method upcase makes everything upper case.
      puts "Debug str after step 0: " + str

      # step 1
      first_letter =    #remember first letter for later (but don't delete it)
      puts "Debug first_letter after step 1: " + first_letter

      # step 2
      str =    #change {a, e, h, i, o, u, w, y} to "0". Hint: see Ruby tools described below
      puts "Debug str after step 2: " + str

      # step 3
      str =    #change {b,f,p,v} to "1"
      str =    #change {c,g,j,k,q,s,x,z} to "2"
      str =    #change {d,t} to "3"
      str =    #change {l} to "4"
      str =    #change {m,n} to "5"
      str =    #change {r} to "6"
      puts "Debug str after step 3: " + str

      # step 4
      str =    #reduce seq of 1s to single 1, e.g., "1110112" => "1012" Hint: remember the reduction trick.
      str =    #reduce seq of 2s to single 2
      str =    #reduce seq of 3s to single 3
      str =    #reduce seq of 4s to single 4
      str =    #reduce seq of 5s to single 5
      str =    #reduce seq of 6s to single 6
      puts "Debug str after step 4: " + str

      # step 5
      ? = first_letter  #restore first letter
      str =    #Now delete the 0s
      str =    #pad with three 0s to guarantee at least length of 4, e.g., "s" => "s000", etc.
      puts "Debug str after pad: " + str
      str =    #keep only the first 4 characters (the "hash")
      puts "Debug str after slice: " + str
      return str
    end

Here is some code to help with testing. You can add it below your hashWord method and run it, or just copy and paste the method and this code into the playground.
    while( true )
      print "Type an English word: "
      w = gets.delete("\n")
      puts "Here is the hash: " + hashWord( w ).to_s  #to_s so a nil result (empty input) does not crash
    end

Here are some test cases to try: both "Robert" and "Rupert" return the same string "r163", while "Rubin" yields "r150". You can check to make sure that {colr, coler, colur, culler, color} all hash to the same value. To test step 1, try this: "lloyd" => "l300". If you remove the first letter in step 1, you will miss the double l squish in step 4 and end up, incorrectly, with "l430". So don't remove the first letter in step 1!
When you are happy with your code, remove or comment out the debugging statements, i.e., all the puts statements.
Finally, I'll give you some Ruby tools that might help you with steps 2 and 3.
    str.gsub(pattern, replacement) => new_str

Returns a copy of str with all occurrences of pattern replaced with replacement. The pattern will typically be a Regexp (a pattern between two slashes). Example:

    "hello".gsub(/[aeiou]/, '*')   #=> "h*ll*"
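Here is a small sketch of gsub with a character class, mainly to show that the original string is left alone because gsub returns a copy. The helper name mask_vowels is made up for this example.

```ruby
# Hypothetical helper: replaces every vowel in word with a star.
# The square brackets in the pattern mean "any one of these letters".
def mask_vowels(word)
  word.gsub(/[aeiou]/, "*")
end

original = "collective"
puts mask_vowels(original)   #=> "c*ll*ct*v*"
puts original                #=> "collective" (gsub did not change it)
```

Because gsub hands back a new string, you need to assign the result to something (for example, str = str.gsub(...)) if you want to keep it.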
    str.tr(from_str, to_str) => new_str

Returns a copy of str with the characters in from_str replaced by the corresponding characters in to_str. If to_str is shorter than from_str, it is padded with its last character. Example:

    "hello".tr('aeiou', '*****')   #=> "h*ll*"
    "hello".tr('aeiou', '*')       #=> "h*ll*" (shortcut alternative)

The next example is one of the first encryption algorithms. It right-shifts each letter, i.e., a => b, b => c, etc.

    word.tr('abcdefghijklmnopqrstuvwxyz', 'bcdefghijklmnopqrstuvwxyza')
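To see the letter-by-letter correspondence in tr at work, here is a runnable sketch of the shifting example together with its inverse. The names ALPHABET, SHIFTED, shift_right and shift_left are made up for this illustration.

```ruby
# Hypothetical names for illustration only.
ALPHABET = "abcdefghijklmnopqrstuvwxyz"
SHIFTED  = "bcdefghijklmnopqrstuvwxyza"

# Each letter is replaced by the letter in the same position of SHIFTED.
def shift_right(word)
  word.tr(ALPHABET, SHIFTED)
end

# Swapping the two arguments undoes the shift.
def shift_left(word)
  word.tr(SHIFTED, ALPHABET)
end

puts shift_right("hal")              #=> "ibm"
puts shift_left("ibm")               #=> "hal"
puts shift_left(shift_right("zoo"))  #=> "zoo"
puts "hello".tr("el", "ip")          #=> "hippo"
```

The last line shows that from_str does not have to be the whole alphabet: only the listed characters are touched, and everything else passes through unchanged.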
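    str.squeeze([other_str]*) => new_str

Returns a new string in which runs of the same character are replaced by a single character. If other_str is given, only runs of the characters in that set are squeezed. Example:

    "putters shoot balls".squeeze("m-z")   #=> "puters shot balls"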
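A few more squeeze calls may make the difference between squeezing everything and squeezing one chosen character clearer. The helper name squeeze_only is made up for this sketch; it simply wraps the built-in method.

```ruby
# Hypothetical helper: squeezes runs of the given characters only.
def squeeze_only(str, chars)
  str.squeeze(chars)
end

puts "bookkeeper".squeeze            #=> "bokeper"   (every run is squeezed)
puts squeeze_only("bookkeeper", "o") #=> "bokkeeper" (only runs of o)
puts squeeze_only("1110112", "1")    #=> "1012"      (only runs of 1)
```

Note that squeeze only collapses characters that sit right next to each other: in "1110112" the two groups of 1s stay separate because a 0 lies between them.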
    str.ljust(integer, padstr=' ') => new_str

If integer is greater than the length of str, returns a new String of length integer with str left justified and padded with padstr; otherwise, returns str.

    "hello".ljust(4)           #=> "hello"
    "hello".ljust(20)          #=> "hello               " (pads with spaces)
    "hello".ljust(20, '1234')  #=> "hello123412341234123"
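Since ljust never shortens a string, padding and cutting to a fixed width takes two operations. The sketch below combines ljust with a slice; the helper name pad_and_trim is made up, and this is just one way to get a fixed-width result.

```ruby
# Hypothetical helper: returns exactly width characters of code,
# padding on the right with pad when code is too short.
def pad_and_trim(code, width, pad)
  padded = code.ljust(width, pad)  # at least width characters long
  padded[0, width]                 # keep the first width characters
end

puts pad_and_trim("ab", 5, "x")        #=> "abxxx"
puts pad_and_trim("abcdefgh", 5, "x")  #=> "abcde"
puts pad_and_trim("abcde", 5, "x")     #=> "abcde"
```

The slice str[start, length] returns a new string, so the original is again left unchanged.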
Problem 2. There is no problem 2 this week! However, you should think a bit about how we will work hashWord into our chatbot next week. The hashWord method works on a single word. We need to match sentences. We have the user sentence. We have the sentences in the collective wisdom, i.e., in the user_input tags. We will need to go through each sentence, word by word, and hash it. For both sentences! Then we can compare the hashed version of each sentence using ==. Kind of make sense? Cogitate on it.
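To help with the cogitating, here is a rough sketch of the word-by-word idea. It is only one possible shape, not next week's solution. The names hash_sentence, sentences_match? and stand_in_hash are made up, and stand_in_hash is a deliberately crude substitute (it just drops vowels after the first letter) so the sketch runs without a finished hashWord.

```ruby
# Hypothetical helper: hashes a sentence word by word.
# word_hasher is anything that responds to call(word), e.g. method(:hashWord).
def hash_sentence(sentence, word_hasher)
  sentence.split.map { |word| word_hasher.call(word) }
end

# Two sentences match when their lists of word hashes are equal.
def sentences_match?(first, second, word_hasher)
  hash_sentence(first, word_hasher) == hash_sentence(second, word_hasher)
end

# Stand-in for hashWord, for demonstration only: lower-case the word,
# keep the first letter, and drop the vowels from the rest.
def stand_in_hash(word)
  w = word.downcase
  w[0] + w[1..-1].delete("aeiou")
end

hasher = method(:stand_in_hash)
p hash_sentence("What colr is it", hasher)                          #=> ["wht", "clr", "is", "it"]
p sentences_match?("What colr is it", "what color is it", hasher)   #=> true
p sentences_match?("What size is it", "what color is it", hasher)   #=> false
```

Once your own hashWord works, you could pass method(:hashWord) in place of the stand-in. One thing to think about: this sketch requires the two sentences to have the same number of words in the same order, which may still be stricter than we want.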