MATH 494: Mathematical Foundations of Machine Learning

 

Assignment #6

 

(Due Wednesday, March 20, 2024)

 

 

This assignment is a little like the previous one. You will create a noisy dataset based on a linear combination pattern, and you will use multiple regression to discover the pattern. You will compare the results of regression that you write from scratch with the results obtained from the sklearn LinearRegression procedure.

 

·       Generate a dataset of 100 rows and 4 columns, all from the U(0, 1) distribution.

·       Obtain the fifth column (output/label) by computing the linear combination of the first four columns, where the coefficients are the last four digits of your USD ID#. (For instance, if your ID# ended with 2345, you would multiply the first column by 2, add the second column multiplied by 3, and so on. If any of the digits is zero, replace it with 9.)

·       Add a little noise to the fifth column: try adding Gaussian noise with a mean of 0 and a standard deviation of about one-fifth of the average of the four digits. Tweak the standard deviation if needed.

·       Code the multiple regression using the normal equation of regression (the one that I was so excited about in class).

·       Display the resulting coefficients, which should be close to the “hidden” pattern.

·       Run the sklearn LinearRegression procedure (see example in the handout from class 18) on the very same dataset and compare the results with your own regression. Display (on the screen) a conclusion from the comparison: how close the results are to the actual pattern and to each other.
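
The steps above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a reference solution: it uses the example digits 2, 3, 4, 5 from above in place of a real ID, fixes a random seed for reproducibility, fits no intercept (the hidden pattern has none), and passes fit_intercept=False to sklearn so that the two fits are directly comparable. The normal equation is taken in its standard form, (X^T X) beta = X^T y, and all variable names are illustrative.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Assumption: a fixed seed so the run is reproducible.
rng = np.random.default_rng(0)

# Assumption: the example digits 2, 3, 4, 5 stand in for the last four ID digits.
digits = np.array([2.0, 3.0, 4.0, 5.0])

# 100 rows, 4 columns, all drawn from U(0, 1).
X = rng.uniform(0.0, 1.0, size=(100, 4))

# Noise standard deviation: about one-fifth of the average digit.
sigma = digits.mean() / 5.0

# Fifth column: linear combination of the first four plus Gaussian noise.
y = X @ digits + rng.normal(0.0, sigma, size=100)

# From-scratch fit via the normal equation (X^T X) beta = X^T y.
# Solving the linear system avoids forming the inverse explicitly.
beta_normal = np.linalg.solve(X.T @ X, X.T @ y)

# sklearn fit on the very same data; no intercept, to match the fit above.
model = LinearRegression(fit_intercept=False).fit(X, y)
beta_sklearn = model.coef_

print("hidden pattern :", digits)
print("normal equation:", np.round(beta_normal, 4))
print("sklearn        :", np.round(beta_sklearn, 4))
print("max |normal - sklearn|:", np.abs(beta_normal - beta_sklearn).max())
print("max |normal - hidden| :", np.abs(beta_normal - digits).max())
```

If you prefer to keep sklearn's default intercept, one option is to append a column of ones to X before applying the normal equation, so that both methods estimate the same model.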