FUNDAMENTALS OF ARTIFICIAL INTELLIGENCE - CS161

Programming Assignment 1 - Due Friday, October 9

This problem is about matching. Consider the Unix grep function which search a file for a pattern. For example, given the file "fox.txt" contains the text

        The quick brown fox 
        jumped over
        the lazy dog
The Unix command
        grep dog fox.txt
will print out the line
        the lazy dog

1. Write a LISP function (using only the core LISP functions), called grep, that takes a matching pattern pattern in the form of a non-null list, and a line of text text (also in the form of a non-null list), and returns the line if the pattern matches part of the line. Otherwise the function returns nil. Thus

      (grep '(quick brown) '(the quick brown fox))
returns
      (the quick brown fox)

Test your program on the following examples

Example 1A, 1B, and 1C:

      (grep '(fox) '(the quick brown fox))
      (grep '(quick brown) '(the quick brown fox))
      (grep '(quick fox) '(the quick brown fox))

2. Write a LISP function (using only the core LISP functions), called grep2, which extends the grep program by matching the pattern against any sublist in the text. The function will return the sublist that the pattern has matched, or nil if there is no match.

Example 2A:

      (grep2 '(computer science) 
             '(a i (artificial intelligence) (a domain of computer science) is fun))
returns
      (a domain of computer science)

Example 2B:

      (grep2 '(a) '(a i (a domain of computer science) is fun))
returns
      (a i (a domain of computer science) is fun)
Notice in this example, that while there are actually two matches for the pattern (a), the grep2 returned the encompassing match.

Example 2C: (grep2 '(a) '(artificial intelligence (a domain of computer science) is (a little bit) fun)) The function can return

      (a domain of computer science) 
or
     (a little bit)
its your option.

3. Write a LISP function grep3 that extends grep2 by adding meta-patterns. The following meta-patterns are to be used.
_ The underscore matches any atom.
+ The plus symbol matches one or more occurrences of the previous expression.
? The question mark symbol matches zero or one occurrences of the previous expression.
~ The tilde symbol matches anything but the previous expression.

Example 3A:

      (grep3 '(of _ science) '(a i  (a domain of computer science) is fun))
returns
      (a domain of computer science) 

Example 3B:

      (grep3 '(system ? science) '(a i (a domain of computer science) is fun))
returns
      (a domain of computer science) 

Example 3C:

     (grep3 '(a domain _ + science) 
             '(a i (a domain of computer science) is fun))
returns
      (a domain of computer science) 

Example 3D:

      (grep3 '(a domain ~) '(a i (a domain of computer science) is fun))
returns
      (a i (a domain of computer science) is fun)
Note: that '(a domain ~) does not match '(a domain of computer science), but does match '(a i ...).

4. Write a LISP function grep4 that extends grep3 by adding additional meta-patterns.
( The forward parentheses, matches the beginning of a sublist.
) The ending parentheses, matches the end of a sublist.

Example 4A:

      (grep4 '(a i ( _ + ) is fun) 
             '(a i (a domain of computer science) is fun))
returns
      (a i (a domain of computer science) is fun)

Note You may only use constructs covered in class and recitation. You may not use any global variables or looping constructs. Try to minimize the number of arguments to your functions. You may use auxiliary functions in this assignment.

In order to get your program working, you may assume that you will NOT encounter 2 meta symbols that refer to previous symbols next to another. I.e. You will not encounter the sequence (A ~ +) [both the ~ and + refer to previous symbols], but you may encounter (A _ + ) [only the + refers to a previous symbol]. Extra credit may be given for these are special cases if you wish to tackle once you get your basic grep functions working.

  1. (A ~ +)   indicates a symbol sequence of non As,
  2. (A ~ ~)   this is the 'NOT of the NOT of A' or just A itself - a redundant expression that will never be used,
  3. (A + ~)   a sequence of 1 or more As followed by a non A, Note that
             (grep4 '(A ~ +) '(A A A C))
    
    would return a match even though there is a sequence of 1 or more A's. This is becuase the C at the end matches the pattern of a sequence of 1 or more non-A's.
  4. (A ? +)   a sequence of 0 or more As, the same as ( A ? ),
  5. (A ? ~)   a sequence of 0 or more As followed by a Non A, however this is subject to interpretation.

Submit a sample execution showing the values your programs return at least for the test cases (examples) indicated above, along with your commented program code, and the answers to the questions asked above. Your programs should be written in good style. In LISP, a comment is any characters following a semicolon (;) on a line. Provide an overall comment explaining your solutions. Furthermore, every function should have a header comment explaining precisely what its arguments are, and what value it returns in terms of its arguments. In addition, you should use meaningful variable names. Finally, the physical layout of the code on the page is very important for making LISP programs readable. Make sure that you use blank lines between functions and indent properly. Programming style will be a major consideration in grading the assignments in this class.

FAQ

>   I have a question regarding part c of homework 1. By previous
> "expressions", do you mean ALL the symbols before the meta-character? For
> does ( _ _ + ) match
> only words of even length, or length two or more.
Only the previous expression,  thus ( _ _ + maches
any string of 2 or more words.

> For grep2 - grep4,
> is it possible to have a sublist inside the sublist in a text?
> For example,
>     (grep2 '(a) '(c (d c (a b)) d))
>
Yes it is.

> My grep2 returns (a) for the following call:
> (grep2 '(a) '(d c (a j) e ((a) f)))
> Is this acceptable? Or, should it return (a j)?
> The result shown above is because my grep2 returns a sublist of the last
> match for the case of two or more matches of the same level. For example,
> (grep2 '(a) '(d (a b) g (o a)))  returns (o a).
> I know this is acceptable according to the specification; however, I am not
> sure of the one on the top.
I believe it should return (a j), but I dont have the assignment
in front of me. Review the assignment for the exact interpretation.

> The example for problem 2 says that grep2 should return a  match that
> encompasses another.  What about the situation where matches have different
> depths, but neither match encompasses the other.
>
> (grep2 '(B C) '( ((A B C)) (B B C) ))
>
> Is it alright to return (A B C) in this case?
Yes.

> The example for problem 2 in the assignment says that grep2 should return a
> match that encompasses another.  What about this situation, where matches
> are on different levels but neither encompasses the other?
>
> (grep2 '(B C) '( ((A B C)) (B B C) ))
>
> Is it alright to return (A B C)?
When there are 2 matches, one iniside the other, it should
return the encompassing match. e.g. ( b b c ( a  b c )).  Otherwise it
doesnt matter.

>   I have the following questions regarding hw1.
> 1. Do null lists match null lists?
Yes.
> 2. Does (a ~) match '()?
No. It maches any symbol not an a.
> 3. Does () sublist operator match sublists that are of different levels
>    i.e. does '(a i (is fun) ) match '(a i ((is fun)) ) ?
No. it must be an exact match.

> When we write grep2, grep3, grep4, can we assume that the previous
> functions are loaded?  For example, can we just use the function grep
> inside the grep2 function or do we have to modify it grep2 so that we
> don't need to load grep before using grep2.

I would think so.

> Can you help me clarify the following cases ?
> 1: (a b c ~) match (a b) ? it should because (c ~)
> match with ( ), or we don't consider ( ) as symbol?

This does not match as ( c ~) matches any symbol that is
not c. Thus ( a b c ~ ) matches any pattern that has an a
followed by a b followed by any non c character.

> In your instructions, you state that "_  The underscore matches any atom"
> I know this can mean that the _ is nothing but does it also mean that it
> is limited to replace only one word (one atom).

Yes. For the homework "_" matches any single atom.
I'm not sure what you mean when you say the _ is nothing, it can
only mean nothing if it is followed by the ? symbol, meaning
zero or more occurrences of the preceeding symbol.

> What should 
>    (grep3 '(computer system ? Science) 
>        '(a I (a domain of computer science) is fun)) 
>  and 
>    (grep3 '(computer _ ? science) '(a I (a domain of computer science) is fun))
> return? Thanks.

(grep3 '(computer system ? science)
	'(a I (a domain of computer science) is fun)) and (grep3
should return
	( a domain of computer science)
	as system ? matches 0 occurrences of system.
(grep3 '(computer _ ? science)
	'(a I (a domain of computer science) is fun))
matches part of the inner list, and should return
	(a domain of computer science)
the ? matches 0 occurrences of the _ symbol.

> For question 2, it state that the program should work even in the sublist 
> in the text,but do you also consider the "sublist of sublist"?
> for example:
>   (grep2 '(computer science)
     '(a i (aritificial intelligence) ( a domain of (computer science)) is fun))
> should the program return
>     ( a domain of (computer science))
> or
>     (computer science) ??

It should return
	(computer science)

> If
>       (grep3 '(of _ science) '(a I (a domain of computer science) is fun))
> returns (a domain of computer science),
> My question is that since the underscore ( _ ) matches any atom, then
> would
>    (grep2 '(domain _ science) '(a I (a domain of computer science) is fun))
> return (a domain of computer science)
> because the underscore would match one atom, either "of" or "science".

(grep3 '(domain _ science) '(a I (a domain of computer science) is fun))
would return nil, because if "_" matches "of", (domain of science)
it does not match (a domain of computer science); if "_" stands for "computer",
(domain computer science) still does not match (a domain of computer
science). The pattern has to match a consecutive portion of the text.

The _ symbol matches a single symbol in the text, thus
	(grep3 '(domain _ science) '(a I (a domain of computer science) is fun))
does not match anything but
	(grep3 '(domain _ _ science)
		'(a I (a domain of computer science) is fun))
would match the sublist
	( a domain of computer science)
becuase the first _ matches "of" and the second matches "computer".
	(grep3 '(domain _ + science)
would also match ( a domain of computer science).

> For the ~, it matches anything but the previous expression. Does it only
> match atoms or can it match a sublist? (grep3 '(a domain ~) '(a (b) c))
> would return what? 
I suppose it would match and return
	(a (b) c)