.. Created by Adam Cunnningham on Fri Jun 3 2016. .. Major rewrite on Thurs Oct 27 2016. .. Separated report description and hints, Weds Dec 14 2016. .. include:: isogrk3.rst **The Generation and Use of Random Numbers** ============================================ Report Description ------------------ **Exercise 1. Linear Congruential Generators** - Write a function ``rng`` that implements a linear congruential generator with the values a = 427419669081, m = 999999999989. - Write a function ``randu`` that implements a linear congruential generator with the values a = 2 :superscript:`16` + 3, m = 2 :superscript:`31`. - Test the linear congruential generators in your ``rng`` and ``randu`` functions, and the generator in **random.rand** for (i) uniformity, and (ii) lack of (coarse-scale) successive pair correlation. **Exercise 2. Estimate a Mean using Monte Carlo Integration** - Download/save the :download:`mystery <../Reports/mystery.pyc>` module from the website to the same directory where Jupyter notebook is being run. - Use Monte Carlo integration to find the mean of the function ``mysteryf`` in the mystery module by random sampling over the interval [0, 5]. **Exercise 3. Estimate an Area using Monte Carlo Integration** - Use Monte Carlo integration to estimate the area of the "flower" whose boundary has the equation :math:`r(\theta) = 2 + \cos(7\theta)` in polar coordinates. **Exercise 4. Bernoulli Random Variables** - Numerically estimate the probability of getting 29 hits in 100 swings in a baseball game if the probability of getting a hit on any one swing is 0.29 (and is independent of other swings). **Exercise 5. Sums of Random Variables** - Produce a histogram of the sums of pairs of uniformly distributed random numbers produced by **random.rand**. - Produce histograms for the sums of M-tuples of uniform random numbers for values of M greater than 2. - See if it is possible to shift and stretch the sums of M-tuples so that their distribution converges to something as M increases without bounds. - Plot the sums of M-tuples of non-uniform random numbers to see if these sums also converge to some limiting distribution. There is no need to do any kind of detailed analysis here. **Exercise 6. Stock market Model** - Implement the simple stock price model defined in class using a value for |mu| of 0.0 and |sigma| of 0.05 to find out what happens to an initial investment of 1 unit of money over 365 time steps. The model is: .. math:: s(t + 1) = (1 + \mu + \sigma \varepsilon) s(t)` where: - s(t) is the stock price at time t. - |mu| is an underlying rate of growth per step. - |epsiv| is a "stochastic" or random term with a normal distribution. - |sigma| is the volatility, the amplitude of the stochastic term. - Plot the changing price of several stocks over this period. - Plot a histogram of final stock prices after 365 steps using a large number of trials. - Estimate the "expected value" (average over all trials) of the investment after 1 year. - Calculate the likelihood of losing money. This is the proportion of trials where s(365) ends up less than 1. - Calculate the "most likely" value of the investment after 1 year, using an appropriate definition of "most likely". - Draw some conclusion about investing, or on the value of money. - Run another simulation using a more modest value for |sigma| of 1%. Describe how changing |sigma| affects the final distribution i.e. the histogram of final prices. Hints and Suggestions for the Report ------------------------------------ Linear Congruential Generator ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - Testing for uniformity is done by plotting a histogram showing the distribution of the points in the interval [0, 1]. - Testing for pairwise correlation is done by plotting successive pairs :math:`(x_i, x_{i+1})` as points in the plane. - Use enough bins for the histogram to provide a clear idea of the shape of the distribution. - Use enough points that every bin in the histogram contains a large number - the more points the better. Finding the Mean of the "Mystery" Function ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - Use an appropriately large number of samples generated using **rand**. - Scale the points returned by **rand** so they are uniform in the interval [0, 5], not [0, 1]. - Import the function mysteryf using the code: :code:`from mystery import mysteryf` - The formula for the mean of a function is: .. math:: \overline{f} = \dfrac{1}{n} \sum_{i=1}^{n} f(x_i) - This can be calculated using NumPy array operations and **mean** without needing "for" loops. - The mean can be estimated a few times to get an idea of how accurate the estimate is. Counting the Points Inside the "Flower" ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - The "flower" is contained in a bounding box [-3, 3] :math:`\times` [-3, 3] in the xy-plane. - Use **rand** to generate a large number of points, then scale the output of this function so that points (x, y) are uniform inside this box. An appropriate transformation is :math:`u \rightarrow 6u - 3`. - Use **hypot** and **arctan2** to find the values of r and |theta| corresponding to each (x, y) pair. - If a point (x, y) has polar coordinates (r, |theta|), then it is inside the curve if r < 2 + cos(7\ |theta|). Bernoulli Random Variable ~~~~~~~~~~~~~~~~~~~~~~~~~ - Treat each swing as a Bernoulli random number with p = 0.29. - Numerically estimate the probability by running a large number of trials, each of which consists of 100 swings. We want the proportion of trials with exactly 29 hits. - The most compact way to do this is to start with a 2D NumPy array of shape (100, ntrials) generated using **rand**. - A combination of operations applied directly to this array (to turn a uniform random number into a Bernoulli random number, for example), and an axis-dependent operation (for counting the hits per trial), will allow the result to be found without using any "for" loops. Generating Histograms of M-Tuple Sums ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - Use **rand** to create a 2D NumPy array of shape (M, npts). - Sum all the M-tuples using **sum** as an axis-dependent operation i.e. by specifying which axis to sum over. Finding the "Right" Scaling ~~~~~~~~~~~~~~~~~~~~~~~~~~~ - The "right" scaling is a function of M. When divided by this scaling, the histograms produced for different values of M should converge to the same distribution as M increases without bound. - For each histogram generated for a different value of M, first subtract the mean (M/2) to shift the histogram so that it is centered on zero. - Use the *normed=True* keyword argument to **hist** to turn the histogram into a probability distribution. - Consider using the *histtype='step'* keyword argument when plotting multiple histograms on the same axes. This just plots the outline of a histogram, so it may be easier to see when convergence is happening. M-Tuples of Non-Uniform Random Numbers ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - Choose some other non-uniform probability density function, p(v). It can be any function you like, as long as both: - :math:`p(v) \geq 0` for all v - :math:`\int\limits_{-\infty}^{\infty} p(v) dv = 1`. - Construct a function g(u) so that v = g(u) has the pdf p(v). Remember from class that the steps to construct this function are: 1. Let :math:`u = \int p(v) dv` 2. Plug in your probability density function p(v), and integrate to find u as a function of v. 3. Solve for v as a function of u, given by v = g(u). - Sum M-tuples of large numbers of samples drawn from p(v) using v = g(u). Remember that using a NumPy 2D array of shape (M, npts) will allow this kind of summing to be performed very compactly. Plotting the Changing Stock Prices ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - The random |epsiv| term needs to be calculated separately for every step. - A "for" loop will be needed since the stock price each day is calculated from the previous day. - Prices :math:`\{s(0), s(1), ..., s(365)\}` should be saved either to a list or into an array. - Plot enough stock prices to see how much variation there is in the different stock trajectories. - Note that **plot** can be passed a 2D array to plot. When this is done, each column is plotted as a separate line. Plotting the Histogram of Final Prices ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - A large number of stock prices need to be tracked to get a good idea of the final distribution. - A 1D array can be used to store the current prices of the different stocks. These prices can all be updated together each day inside the "for" loop. - After the "for" loop is complete, this 1D array will contain the final price for every stock. Analyzing the Distribution of Final Prices ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ - The array of final stock prices can be used to find both the expected value, the likelihood of losing money, and the "most likely" value of the investment after 1 year.