Step 1. Think of a way you can collect some numbers. These should be numbers that can collect yourself – no simulation, and no data from the internet! Things like the weights of shoes in your house, people on a train car, time between shots on goal in a hockey game. Choose something you like, and which you can measure in a relatively brief period of time.
Step 2. Think about the kind of statistical distribution that might represent this phenomenon! Write it down. Why does this distribution seem to match your data? is it constrained in some way?
Step 3. Collect some observations – aim for a good number, at least 15 or so.
Step 4. Type in these data and visualize them. Do they match your hypothesized distribution? How do you know?
Step 1. Return to your hypothesized distribution. Find its random number generation function (e.g. rbinom()
for the Binomial Distribution). Experiment with different parameters values to create data that looks close to your own data.
Step 2. Attempt to fit the same parameters by Maximum likelihood. If you distribution has two parameters, fix one and find the Maximum Likelihood value of the other. Stretch goal Try to fit both of them at once.
Step 3. Look at the help for ?MASS::fitdistr()
. Try to use it to find your distribution parameters.