# Calculate Stock Market Returns Using SQL II

In the previous post we looked at stock market investment strategies. It looked like we'd hit on a winning method: buy \$100 of shares every time the S&P 500 falls 2% or more during a day. This provided the best return on our money.

As hinted, there are problems with this method though. Let's find out what these are and see if we can mitigate them. We'll use SQL magic to do this, including a look at the model clause.

First, let's review how feasible the 2% buy rule is. Our previous check for this was basic. How many times did this happen between the start of 2005 and the end of 2014?

We were working to a monthly budget. With one \$100 transfer to our broker a month, we could buy 120 times. Our analysis of daily 2% falls stopped when we found this happened 109 times.

This is too simple. To be able to buy every time the rule triggers, the drops need to be evenly spread.

Is this really the case?

Let's check again. This time we'll break it down by year.

To do this, use trunc again to convert each date to the first of the year. Then group by this and count the occurrences:
```  select trunc(quote_date, 'y'), count(*)
from   sp_quotes
where  ((close-open)/open)  < -0.02
group  by trunc(quote_date, 'y')
order  by trunc(quote_date, 'y');

TRUNC(QUOTE_DATE,'Y')   COUNT(*)
--------------------- ----------
01 Jan 2007                   11
01 Jan 2008                   38
01 Jan 2009                   22
01 Jan 2010                    9
01 Jan 2011                   20
01 Jan 2012                    3
01 Jan 2013                    2
01 Jan 2014                    4

8 rows selected ```
That's not even at all!

There were no two percent daily falls in either 2005 or 2006. And over a third of these happened in 2008!

This leads to a couple of problems.

The first is psychological. If we had started this method at the beginning of 2005 it would have been a long wait before we bought anything.

How long exactly?

To find this we need the first day there was a 2% fall. This is the minimum quote_date. So we can find it with:
```  select min(quote_date)
from   sp_quotes
where  ((close-open)/open)  < -0.02;

MIN(QUOTE_DATE)
---------------
27 Feb 2007  ```
Nearly 26 months. It's difficult to keep faith in a strategy that buys nothing! It's too tempting to give up and do something else.

The next three years had the opposite problem: too many falls. The market hit a 10-plus-year low in early 2009. Not many people would have the confidence to keep buying in this situation.

The second problem is practical. If there's a time when the total number of possible purchases is greater than the number of transfers in, we're over budget. Depending on our means, we may be unable to buy every time the alert triggers.

We would have been over budget several times in this period. Over a third of the possible purchases were in 2008 alone. By the end of 2011 we would have spent \$1,600 more than our transfers in. Possibly more if we had spent some of the allotted money in 2005-6 instead of transferring it to a broker.

How does this affect the gains?

Let's compare two scenarios. In both we'll transfer \$100 at the start of each month to a trading account. If there's a 2% daily fall we buy. If we have no money left however, we buy nothing.

In the first case let's spend everything we have saved since the last purchase. In the second we'll only spend \$100.

How do we write the SQL to calculate the returns?

Let's start with the transfers in. There were 120 months in this period. So we need that many deposits. Let's use the connect by level row generation technique to create this number of rows.

For each of these we want to return 100 (our transfer amount) and the first of the month. We can use the add_months(<date>, <months>) function to supply the dates. This does what it says on the tin: adds the given number of months to the date you pass and returns the result.

Our starting date is 1 Jan 2005. We can use rownum minus one to provide the number of months to add. This gives us:
```  select transfer_date,
amount
from (
select add_months(date'2005-01-01', rownum-1) transfer_date,
100 amount
from   dual
connect by level <= 120
)```
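As a quick cross-check outside the database, the same 120 first-of-month dates can be generated in a few lines of Python. This is a sketch mimicking add_months(start, rownum - 1); the function name is my own:

```python
from datetime import date

def first_of_months(start, n):
    """Return n first-of-month dates beginning at start,
    mimicking add_months(start, rownum - 1) over n generated rows."""
    base = start.year * 12 + (start.month - 1)
    return [date((base + i) // 12, (base + i) % 12 + 1, 1) for i in range(n)]

# 120 monthly transfer dates from 1 Jan 2005
transfer_dates = first_of_months(date(2005, 1, 1), 120)
```

The last date generated is 1 Dec 2014, matching the end of our ten-year window.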
Now we need to figure out how many shares we plan to buy and on which dates.

For the "spend everything" method, this is the sum of the money we have saved. This is the total of all the transfers in after the last purchase up to the current date.

To find this, we can start with the query in the previous post to find the two percent drops:
```  select * from (
select quote_date,
lead(open) over (order by quote_date) purchase_price,
100/lead(open) over (order by quote_date) shares,
(close-open)/open day_return,
100 purchase
from   sp_quotes
)
where  day_return < -0.02;```
We don't know how much we'll be spending each time. So we need to remove the shares and amount columns. To get these values we want to find all the transfers in between the current and previous falls. To make this easier, let's add the previous purchase date to this query.

We can do this in two passes. First we identify the days we want to buy. This is the day after each fall. So we need the next trading day. To get this, copy the lead() function from purchase_price. Just replace open with quote_date in lead().

The next pass finds the previous purchase date. We want to look backwards in our results to do this. The analytic lag() helps here. This is the opposite of lead(). It looks backwards through the result set instead of forwards.

There's a small problem here though. For the first transaction, there's no prior purchase date. So lag() returns null.

To overcome this, we can make use of the optional parameters for lag(). The second parameter is the offset. This is how many rows to look back. We want to keep this at the default of one. The third is more interesting. This is the default value it returns if it doesn't find a row. For example, the first row in a result set has no previous. By providing 1 Jan 2005 for this we solve this issue.
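To see what that third argument buys us, here's a plain-Python analogue of lag() (a sketch of the behaviour, not Oracle's implementation):

```python
def lag(values, offset=1, default=None):
    """Shift values down by offset rows. The first offset rows have no
    predecessor, so they get the default instead of null."""
    return [default] * offset + values[: len(values) - offset]

# fall dates (as strings) with 1 Jan 2005 as the default previous purchase
falls = ['2007-02-27', '2007-03-02', '2007-03-13']
prev_purchase = lag(falls, 1, '2005-01-01')
```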

Our query to find days we want to buy is then:
```  select purchase_date, purchase_price, day_return,
lag(purchase_date, 1, date'2005-01-01') over (order by quote_date) prev_purchase
from (
select quote_date,
lead(open) over (order by quote_date) purchase_price,
lead(quote_date) over (order by quote_date) purchase_date,
(close-open)/open day_return
from   sp_quotes
)
where  day_return < -0.02```
We now have our two queries. One to calculate our transfers into the broker account. The other to find all the dates we plan to buy. To ease readability, let's place both of these in CTEs again. We'll call them transfer_dates and purchase_dates respectively.

Let's assume that transfers to the broker account arrive before the purchase. Therefore the total we spend is the sum of all deposits made after the previous (potential) purchase up to and including the current purchase date:
```  select purchase_date,
sum(amount) purchase,
sum(amount) / max(purchase_price) shares
from   transfer_dates
join   purchase_dates
on     transfer_date <= purchase_date
and    transfer_date > prev_purchase
group  by purchase_date ```
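The join above is easy to sanity-check procedurally. This Python sketch (my own helper, with dates abstracted to integers and made-up prices) spends every deposit made after the previous purchase up to and including the current one:

```python
def spend_everything(transfers, purchases):
    """transfers: [(date, amount)]; purchases: date-ordered [(date, price)].
    Each purchase spends all transfers after the previous purchase date
    up to and including its own date, buying spend / price shares."""
    result, prev = [], None
    for day, price in purchases:
        spend = sum(amt for d, amt in transfers
                    if (prev is None or d > prev) and d <= day)
        result.append((day, spend, spend / price))
        prev = day
    return result

# four $100 deposits on "days" 1-4; purchases on days 2 and 4
transfers = [(1, 100), (2, 100), (3, 100), (4, 100)]
buys = spend_everything(transfers, [(2, 50.0), (4, 25.0)])
```

Each purchase drains everything saved since the last one, which is exactly what the `transfer_date > prev_purchase` predicate does in the SQL.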
All that remains now is to calculate the value of our holdings and return. We can reuse the final_close query and formulas from the previous post to do this. The finished query is:
```  with final_close as (
select close
from   sp_quotes
where quote_date = date'2014-12-31'
), transfer_dates as (
select transfer_date,
amount
from (
select add_months(date'2005-01-01', rownum-1) transfer_date,
100 amount
from   dual
connect by level <= 120
)
), purchase_dates as (
select purchase_date, purchase_price, day_return,
lag(purchase_date, 1, date'2005-01-01') over (order by quote_date) prev_purchase
from (
select quote_date,
lead(open) over (order by quote_date) purchase_price,
lead(quote_date) over (order by quote_date) purchase_date,
(close-open)/open day_return
from   sp_quotes
)
where  day_return < -0.02
)
select round(
((( sum(shares)*max(close) ) - sum(purchase)  )/ sum(purchase)  ) * 100,
2
) pct_gain,
to_char(sum(shares)*max(close), 'fm\$999,999.00') value,
sum(purchase) total_spend
from (
select purchase_date,
sum(amount) purchase,
sum(amount) / max(purchase_price) shares,
prev_purchase
from   transfer_dates
join   purchase_dates
on     transfer_date <= purchase_date
and    transfer_date > prev_purchase
group  by purchase_date , prev_purchase
) s, final_close;```
How much money does this make us?
```    PCT_GAIN VALUE                                    TOTAL_SPEND
---------- ------------ ---------------------------------------
56.72 \$18,335.92                                     11700```
Hmmm. That's the worst return so far!

Why is this?

Our first purchase was on 27 Feb 2007. At this point the index was near an all-time high. We spent \$2,600 at this price. This was also our largest transaction. This means we spent the most at the market's peak. This is the opposite of what we want!

To make matters worse, we missed many buying chances near the bottom in 2008. There were months with multiple large daily falls. We could only buy after the first of these each month. So we weren't able to buy low. Even when we did, it was for relatively small amounts.

Clearly this method doesn't work. We're saving while the market is climbing. This can lead to us spending a lot when there's a tiny fall near the peak.

Let's look at the second case. Here we'll only purchase \$100 when the market falls. This avoids the "spend lots at the peak" issue.

Our budget is still fixed. If our total purchases will be greater than our total deposits, we do nothing.

To figure out whether we have any cash available, we need to keep a running total of the balance. For each transfer in date we add 100. When we buy we subtract 100 - unless this brings our balance below zero. If this happens, we do nothing.

For example, in the following table we have two transfers in (1 Oct and 1 Nov). There are four possible purchase dates (3, 25 & 31 Oct and 1 Nov). We can't buy on 25 and 31 Oct though, because this would leave us with a negative balance. So we do nothing and the account stays at zero.
```  DATE        AMOUNT BALANCE
----------- ------ -------
01 Oct 2011    100     200
03 Oct 2011   -100     100
25 Oct 2011   -100       0
31 Oct 2011   -100       0
01 Nov 2011    100     100
01 Nov 2011   -100       0```
Calculating the balance like this in a spreadsheet is easy. Simply add the amount to the previous balance. Then return the greater of this and zero. If date, amount and balance are the columns A, B and C respectively and row 1 is the transfer in on 1 Oct, the formula for cell C2 is:
`  =MAX(C1 + B2, 0)`
The general version of this for field C<N> is:
`  =MAX(C<N-1> + B<N>, 0)`
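The same fold can be written in a few lines of Python to verify the worked example above (I've assumed an opening balance of 100, which is what makes the 1 Oct row come out at 200):

```python
def running_balance(amounts, opening=0):
    """Running balance floored at zero: each step is
    max(previous balance + amount, 0), per the spreadsheet formula above."""
    balances, balance = [], opening
    for amount in amounts:
        balance = max(balance + amount, 0)
        balances.append(balance)
    return balances

# amounts from the October/November table above
balances = running_balance([100, -100, -100, -100, 100, -100], opening=100)
```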
At first glance this is hard to do in SQL. Oracle has something that mimics spreadsheet calculations, however: the model clause.

To use it, we want a table that looks like the one above. This has a series of dates with positive amounts for deposits and negative ones for purchases. We can then calculate the balance using the model clause.

Our transfer_dates and purchase_dates queries are good starting points. Our purchase amounts are now fixed. So we can add an amount column to purchase_dates. This has the value minus one hundred.

We no longer want to join these queries however. We want to merge them instead. The set operator union is perfect for this:
```  select transfer_date quote_date, amount, null purchase_price
from   transfer_dates
union all
select quote_date, amount, purchase_price
from   purchase_dates```
This gives the dates and the amounts. Now comes the fun part - using model to calculate the cash available!

First, we need to define how we're going to reference rows. We do this with the dimension by clause.

An easy way to do this is to assign each row a number. We can do this with another analytic function, row_number(). We pass to this the order in which we want Oracle to assign the values.

We're working through our rows by date, from oldest to newest. Therefore we're ordering by date.

Next we need to consider what happens if a deposit and buy happen on the same day. We're assuming the monies arrive before the purchase. So we need to order by amount from highest to lowest.

This gives us:
`  dimension by ( row_number() over (order by quote_date, amount desc) rn )`
Next is the measures clause. This lists the columns we want our query to return. These are the date, amount, balance and purchase price. Just list these like so:
`  measures ( quote_date, amount, 0 balance, purchase_price )`
Note balance doesn't exist in our original tables. We need to define how Oracle assigns values to it. We do this in the rules clause.

To figure out what we need to put here, first let's restate our formula. For each row, the balance is the balance of the previous row plus the current amount. If this is less than zero, return zero:
`  balance [ current row ] = greatest ( balance [ previous row ] + amount [ current row ] , 0 )`
We want this rule to apply to all rows. The any wildcard enables this. So balance [current row] becomes balance [any].

Next we need to find previous balance and current amount. The cv() (current value) function helps here. Specifying column[ cv() ] returns the value of the column for the current row. So the current amount is amount [ cv() ].

You can also use this function to access values from previous rows. Do this by subtracting the number of rows to look back from cv(). We need the previous balance. So this is balance [ cv() - 1 ].

So our rules formula is:
`  balance[ any ] = greatest ( balance[ cv()-1 ] + amount[ cv() ],  0 )`
We're not quite finished. The first row has no previous balance. Therefore balance [ cv() - 1 ] is null. Adding anything to null returns null. So this will return null for all the balances!

To get around this, we need to return zero when the balance is null. We can use nvl() for this:
`  balance[any] = greatest ( nvl( balance[cv()-1], 0 ) + amount[cv()],  0 )`
We now have our running total. Next we need to figure out how many shares we're going to buy. This is the transaction amount divided by the share price.

We have the prices in the purchase_dates query. And we know we're spending \$100. We can't simply divide 100 by these prices however. This gives us shares even if we have no money left!

We could calculate the number bought when the balance is greater than zero. This isn't right either though. Our balance formula works on the assumption the transaction happened. If we have \$100 left, a purchase brings our cash to zero.

What we want is to set the purchase price to null if buying would leave us owing money.

We can do this using the model clause again. It can overwrite values for existing columns. We want to do this for any row where the balance would be less than zero.

We do this by checking whether the balance calculation above (without the greatest function) is negative. If it is, set the price to null. Otherwise return it:
```  purchase_price[any] =
case
when nvl(balance[cv()-1], 0) + amount[cv()] < 0 then null
else purchase_price[cv()]
end```
Finally, a bit of tidying up.

We've modeled purchases as negative amounts. Therefore our shares purchased and total spend will also be negative. We want these values to be positive.

An easy way to do this is return the absolute amounts. We can also do this in the model clause. Just set the amount to the abs() of its current value:
`  amount[ any ] = abs( amount[ cv() ] )`
The full model clause is then:
```  model
dimension by (row_number() over (order by quote_date, amount desc) rn)
measures (quote_date, 0 balance, amount, purchase_price)
rules (
balance[any] =
greatest (
nvl( balance[cv()-1] , 0) + amount[cv()],
0
),
purchase_price[any] =
case
when nvl(balance[cv()-1], 0) + amount[cv()] < 0 then null
else purchase_price[cv()]
end,
amount[any] = abs(amount[cv()])
)```
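Before running it, you can trace what these three rules do with a small procedural sketch. This is my own row-at-a-time Python equivalent with made-up prices; it gives the same results because the price rule only looks at the previous balance and the current (pre-abs) amount:

```python
def apply_rules(rows):
    """rows: (amount, purchase_price) tuples ordered by date, with deposits
    before purchases on the same day. Applies the three model rules:
    running balance floored at zero, null out the price when buying
    would overdraw, then make the amount absolute."""
    out, prev_balance = [], 0
    for amount, price in rows:
        if prev_balance + amount < 0:
            price = None                 # no cash left: skip this purchase
        balance = max(prev_balance + amount, 0)
        out.append((balance, abs(amount), price))
        prev_balance = balance
    return out

# deposit, buy, failed buy (no cash), deposit, buy
rows = [(100, None), (-100, 10.0), (-100, 11.0), (100, None), (-100, 12.0)]
```

The third row keeps its \$100 amount but loses its price, so the outer query later counts neither its spend nor any shares for it.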
So our complete query for the "buy \$100 if the market falls 2% or more and we have money available" rule is:
```  with final_close as (
select close
from   sp_quotes
where quote_date = date'2014-12-31'
), transfer_dates as (
select transfer_date,
amount
from (
select add_months(date'2005-01-01', rownum-1) transfer_date,
100 amount
from   dual
connect by level <= 120
)
), purchase_dates as (
select quote_date, purchase_price, day_return,
lag(quote_date, 1, date'2005-01-01') over (order by quote_date) prev_purchase,
-100 amount
from (
select quote_date,
lead(open) over (order by quote_date) purchase_price,
(close-open)/open day_return
from   sp_quotes
)
where  day_return < -0.02
), dts as (
select transfer_date quote_date, amount, null purchase_price
from transfer_dates
union all
select quote_date, amount, purchase_price
from purchase_dates
)
select round(
((( sum(shares)*max(close) ) - sum(purchase)  )/ sum(purchase)  ) * 100,
2
) pct_gain,
to_char(sum(shares)*max(close), 'fm\$999,999.00') value,
sum(purchase) total_spend
from (
select quote_date,
amount/purchase_price shares,
case
when purchase_price is not null then amount
end purchase
from (
select * from dts
model
dimension by (row_number() over (order by quote_date, amount desc) rn )
measures (quote_date, 0 balance, amount, purchase_price)
rules (
balance[any] =
greatest (
nvl(balance[cv()-1], 0) + amount[cv()],
0
),
purchase_price[any] =
case
when nvl(balance[cv()-1], 0) + amount[cv()] < 0 then null
else purchase_price[cv()]
end,
amount[any] = abs(amount[cv()])
)
)
), final_close;

PCT_GAIN VALUE                                    TOTAL_SPEND
---------- ------------ ---------------------------------------
83.15 \$17,033.03                                      9300```
The second best return for regular purchases. We've only spent three quarters of our available cash though. So we may be leaving money on the table.

We could fiddle with the percentage that triggers our purchase rule. Lowering it means we'll buy more often. We can tweak the trigger point until we find one that gives the best final value. We risk falling for the Texas sharpshooter fallacy however, making our model fit the data.

Instead, let's take a step back. We've only looked at a ten-year slice of prices. Let's increase our time horizon and ask:

How often have two percent daily drops happened in the S&P's history?

In doing so we'll find a bigger flaw in the 2% fall rule.

The current S&P 500 launched on 4 March 1957. Let's load sp_quotes with all the prices from then until 31 Dec 2014. You can download these here.

That's nearly 58 years. If the past ten years were representative we would expect 600-700 daily falls of 2% or more.

Is this the case?
```  select count(*)
from   sp_quotes
where  ((close-open)/open)  < -0.02;

COUNT(*)
----------
291```
Nowhere near! The past ten years are less than 20% of the full time period. These accounted for well over a third of the large daily falls however. The recent recession really has been the worst in a lifetime, at least in terms of stock market fluctuations.

Still, it's worth a quick check of how much money we would make. Just update the transfer_dates query to generate 694 rows starting at 1 Mar 1957, and change the default value for purchase_dates.prev_purchase to this date as well. Do so and we find:
```    PCT_GAIN VALUE                                    TOTAL_SPEND
---------- ------------ ---------------------------------------
525.83 \$182,116.99                                    29100```
How does this compare to the simple "buy \$100 when the market opens each month" method?
```    PCT_GAIN VALUE                                    TOTAL_SPEND
---------- ------------ ---------------------------------------
1261.8 \$945,088.77                                    69400```
Ouch. We'd nearly be millionaires with the dumb method. We're not even a fifth of the way there with the 2% approach.

We did fall for the Texas sharpshooter fallacy. We picked a technique and our data just happened to fit it.

There are a few lessons here:
• When looking for patterns, ensure your sample is representative of the full data set
• Ensure you have two datasets. One to come up with a hypothesis. Another to test it!
• Who needs spreadsheets when you have the model clause?
SQL is the perfect tool for helping you analyze your data. It can only work with the data you provide it though. In the era of Big Data this mistake is common. It's important to validate your dataset before you draw any conclusions from it!

What do you think? Is there any hope for saving the 2% strategy? Are there any methods better than simply buying a fixed amount each month? How else could we calculate the gains?

I'd love to hear from you. Let us know in the comments!