recommendation-systems

Recall: We model the collection of user ratings of the four movies Dune, Oppenheimer, Barbie, Nimona as the set \(\{-1,0,1\}^4\) . One function that compares pairs of ratings is \[d_0: \{-1,0,1\}^4 \times \{-1,0,1\}^4 \to \mathbb{R}\] given by \[d_0 (~(~ (x_1, x_2, x_3, x_4), (y_1, y_2, y_3, y_4) ~) ~) = \sqrt{ (x_1 - y_1)^2 + (x_2 - y_2)^2 + (x_3 -y_3)^2 + (x_4 -y_4)^2}\]

Notice: there are (at most) \[(3 \cdot 3 \cdot 3 \cdot 3)\cdot (3 \cdot 3 \cdot 3 \cdot 3) = 3^8 = 6561\] many pairs of ratings. There are therefore lots and lots of real numbers that are not the output of \(d_0\).

Recall: RNA is made up of strands of four different bases that encode genomic information in specific ways.
The bases are elements of the set \(B = \{\texttt{A}, \texttt{C}, \texttt{U}, \texttt{G}\}\). The set of RNA strands \(S\) is defined (recursively) by: \[\begin{array}{ll} \textrm{Basis Step: } & \texttt{A}\in S, \texttt{C}\in S, \texttt{U}\in S, \texttt{G}\in S \\ \textrm{Recursive Step: } & \textrm{If } s \in S\textrm{ and }b \in B \textrm{, then }sb \in S \end{array}\] where \(sb\) is string concatenation.

Pro-tip: informal definitions sometime use \(\cdots\) to indicate “continue the pattern”. Often, to make this pattern precise we use recursive definitions.

Name	Domain	Codomain	Rule	Example
\(rnalen\)	\(S\)	\(\mathbb{Z}^+\)	\[\begin{aligned} &\textrm{Basis Step:} \\ &\textrm{If } b \in B\textrm{ then } \textit{rnalen}(b) = 1 \\ &\textrm{Recursive Step:}\\ &\textrm{If } s \in S\textrm{ and } b \in B\textrm{, then }\\ &\textit{rnalen}(sb) = 1 + \textit{rnalen}(s) \end{aligned}\]	\[\begin{aligned} rnalen(\texttt{A}\texttt{C}) &\overset{\text{rec step}}{=} 1 +rnalen(\texttt{A}) \\ &\overset{\text{basis step}}{=} 1 + 1 = 2 \end{aligned}\]
\(basecount\)	\(S \times B\)	\(\mathbb{N}\)	\[\begin{aligned} &\textrm{Basis Step:} \\ &\textrm{If } b_1 \in B, b_2 \in B \textrm{ then} \\ &basecount(~(b_1, b_2)~) = \\ &\begin{cases} 1 & \textrm{when } b_1 = b_2 \\ 0 & \textrm{when } b_1 \neq b_2 \\ \end{cases}\\ &\textrm{Recursive Step:}\\ &\textrm{If } s \in S, b_1 \in B, b_2 \in B\\ &basecount(~(sb_1, b_2)~) = \\ &\begin{cases} 1 + \textit{basecount}(~(s, b_2)~) & \textrm{when } b_1 = b_2 \\ \textit{basecount}(~(s, b_2)~) & \textrm{when } b_1 \neq b_2 \\ \end{cases} \end{aligned}\]	\[\begin{aligned} basecount(~(\texttt{A}\texttt{C}\texttt{U}, \texttt{C})~) = \end{aligned}\]
“\(2\) to the power of”	\(\mathbb{N}\)	\(\mathbb{N}\)	\[\begin{aligned} &\textrm{Basis Step:} \\ &2^0= 1 \\ &\textrm{Recursive Step:}\\ &\textrm{If } n \in \mathbb{N}, 2^{n+1} = \phantom{2 \cdot 2^n} \end{aligned}\]
“\(b\) to the power of \(i\)”	\(\mathbb{Z}^+ \times \mathbb{N}\)	\(\mathbb{N}\)	\[\begin{aligned} &\textrm{Basis Step:} \\ &b^0 = 1 \\ &\textrm{Recursive Step:}\\ &\textrm{If } i \in \mathbb{N}, b^{i+1} = b \cdot b^i \end{aligned}\]

Ratings encoding

In the table below, each row represents a user’s ratings of movies: (check) indicates the person liked the movie, (x) that they didn’t, and \(\bullet\) (dot) that they didn’t rate it one way or another (neutral rating or didn’t watch). Can encode these ratings numerically with \(1\) for (check), \(-1\) for (x), and \(0\) for \(\bullet\) (dot).

Person	Dune	Oppenheimer
\(P_1\)		\(\bullet\)

\(P_2\)

\(P_3\)

\(P_4\)	\(\bullet\)

\(You\)

Defining sets

To define a set using roster method, explicitly list its elements. That is, start with \(\{\) then list elements of the set separated by commas and close with \(\}\).

To define a set using set builder definition, either form “The set of all \(x\) from the universe \(U\) such that \(x\) is ..." by writing \[\{x \in U \mid ...x... \}\] or form “the collection of all outputs of some operation when the input ranges over the universe \(U\)" by writing \[\{ ...x... \mid x\in U \}\]

We use the symbol \(\in\) as “is an element of” to indicate membership in a set.

Example sets: For each of the following, identify whether it’s defined using the roster method or set builder notation and give an example element.

Defining functions ratings

Recall our representation of Netflix users’ ratings of movies as \(n\)-tuples, where \(n\) is the number of movies in the database. Each component of the \(n\)-tuple is \(-1\) (didn’t like the movie), \(0\) (neutral rating or didn’t watch the movie), or \(1\) (liked the movie).

Consider the ratings \(P_1 = (-1, 0, 1, 0)\), \(P_2 = (1, 1, -1, 0)\), \(P_3 = (1, 1, 1, 0)\), \(P_4 = (0,-1,1, 0)\)

Which of \(P_1\), \(P_2\), \(P_3\) has movie preferences most similar to \(P_4\)?

One approach to answer this question: use functions to quantify difference among user preferences.

For example, consider the function \(d_0: \{-1,0,1\}^4 \times \{-1,0,1\}^4 \to \mathbb{R}\) given by \[d_0 (~(~ (x_1, x_2, x_3, x_4), (y_1, y_2, y_3, y_4) ~) ~) = \sqrt{ (x_1 - y_1)^2 + (x_2 - y_2)^2 + (x_3 -y_3)^2 + (x_4 -y_4)^2}\]

Defining functions more examples

Ratings encoding

Defining sets

Defining functions ratings