# Calculate Stock Market Returns Using SQL

What's the best strategy for investing in the stock market?

Simple - buy low and sell high. There's big a catch to this method. We don't know in advance what the highs and lows are! Only by looking back can we know the best times to buy and sell.

So which approaches can we use? And, more importantly, how much money can we make by following them?

In this post we'll use SQL to calculate stock market returns for different purchasing strategies. The purpose is to show how to write the queries. We'll see if we can learn anything about investing in the process.

Before we start, let's make some assumptions. I doubt people's ability to beat the market by selecting stocks. So we'll simulate returns by buying a (hypothetical) S&P 500 index tracker. To keep it simple, this charges no fees and matches the index exactly (if only such a fund existed!).

Let's assume we have \$100/month available to invest over ten years. This gives \$12,000 in total. We'll only purchase shares in this period (no selling).

Unfortunately my crystal ball is (still) broken. So I'll use quotes for the past ten years (1 Jan 2005 - 31 Dec 2014) for comparisons.

If you want to play along, create the following table:
```  create table sp_quotes (
quote_date date primary key,
open       number(12,6),
high       number(12,6),
low        number(12,6),
close      number(12,6),
volume     number(12,0),
);```
Then load it with daily quotes for the S&P 500 (^GSPC) from 1 Jan 2005 - 31 Dec 2014. You download these here.

Finally, I'm going to calculate returns using the total return formula:
`  (End Value - Initial Value) / Initial Value`
To benchmark strategies let's first calculate the returns for two scenarios:
• Invest all \$12,000 when the market first opens in January 2005. Make no further transactions.
• Make one purchase \$1,200 of shares each year. Do this on the lowest point for the year.
First up, what's the return if we had invested all our money when the market opened on 3 Jan 2005 (it was closed on the first and second) until it closed on 31 Dec 2014?

We can find this by getting the opening price on 3 Jan 2005 and the closing price at the end of 2014. I know that the S&P closed much higher than it opened in this period. So we can plug in maximum closing price for as end value and minimum opening price for initial value to the formula above.

This gives the following query:
```  select round((max(close) - min(open)) / min(open) * 100, 2) pct_gain,
to_char((1 + (max(close) - min(open)) / min(open)) * 12000, 'FM\$999,999.00') total
from   sp_quotes
where  quote_date in (date'2005-01-03', date'2014-12-31');```
Note - this query only gives the correct answer if we know that the market rose between the two dates we provide!

How much money did we make?
```    PCT_GAIN TOTAL
---------- ------------
69.89 \$20,386.49  ```
A gain just short of \$8,400. A decent return considering this includes the crash of 2008-09 and one of the worst recessions ever.

This strategy works on the assumption we have a (large) lump sum to invest. For most of us this isn't the case. We can only drip-feed money in. So we want a regular investment strategy.

Let's look at the returns from a (hypothetical) "timing the market" method.

If we're lucky enough to buy shares at their cheapest each year we're "buying low". Let's calculate our gains if we make a single \$1,200 purchase each year when the S&P is at its lowest that year.

To do this we need to group all the dates within a year together. We can easily do this using the function trunc(<date>, 'y'). This converts the supplied date to the 1 Jan in the same year. To find the cheapest price, group by this function and return the minimum low:
```  select trunc(quote_date, 'y') y, min(low)
from   sp_quotes
group  by trunc(quote_date, 'y')
order  by trunc(quote_date, 'y');```
Next we need to calculate how many shares this purchases. Do this by dividing \$1,200 by each min(low). Summing this up gives our total holding.

To find the final value of our shares, just multiply our total shares by the closing price on the last day.

We can find the final closing price with the following query:
```  select close
from   sp_quotes
where  quote_date = date'2014-12-31';```
I'm building the full query using the with clause (aka subquery factoring or common table expressions (CTEs)). This enables us to define "views" on-the-fly. We can then reference them the main query. This can improve the maintenance and readability of SQL. Each of the two queries above will go in their own CTE.

Putting it all together gives us:
```  with final_close as (
select close
from   sp_quotes
where quote_date = date'2014-12-31'
), share_purchases as (
select trunc(quote_date, 'y') y,
1200 / min(low) shares,
1200 purchase
from   sp_quotes
group  by trunc(quote_date, 'y')
)
select round(
((( sum(shares)*max(close) )- sum(purchase) )/ sum(purchase) ) * 100,
2
) pct_gain,
to_char(sum(shares)*max(close), 'fm\$999,999.00') value,
sum(purchase) total_spend
from   share_purchases, final_close;

PCT_GAIN VALUE                                    TOTAL_SPEND
---------- ------------ ---------------------------------------
90.93 \$22,911.25                                     12000```
Much better than the simple buy-and-hold approach!

Unfortunately, we can only know the cheapest price in a year once it's over. By this time it's too late to buy shares at that price!

These two strategies give us reference points. Now let's a take a look at simple passive investing methods we can actually follow. First, buying \$100 at the start of each month. Using this approach, we'll make our purchase when the market opens on the first trading day of each month.

To find the first day of each month we can use trunc again. If we pass 'mm' instead of 'y', it converts dates to the first of the corresponding month. So to return the quotes on these days, join the quote date to the result of month truncing, like so:
`  where quote_date = trunc(quote_date, 'mm')`
Again, we need to calculate how many shares we buy. This is 100 divided by the opening price. To confirm how much we're spending, I've also included a purchase column fixed at 100. This gives the following query.
```  select quote_date,
100 / open shares,
100 purchase
from   sp_quotes
where  quote_date = trunc(quote_date, 'mm')```
At this point we can realize a benefit of using the with clause. The query to calculate returns for monthly purchases is similar to the one for yearly buying. We just need to change how many shares we are buying when. To do this, we just replace share_purchases with the query above.

To validate our expenditure, I've also add the sum of the purchase column. This gives:
```  with final_close as (
select close
from   sp_quotes
where quote_date = date'2014-12-31'
), share_purchases as (
select quote_date,
100 / open shares,
100 purchase
from   sp_quotes
where  quote_date = trunc(quote_date, 'mm')
)
select round(
((( sum(shares)*max(close) )- sum(purchase) )/ sum(purchase) ) * 100,
2
) pct_gain,
to_char(sum(shares)*max(close), 'fm\$999,999.00') value,
sum(purchase) total_spend
from   share_purchases, final_close;

PCT_GAIN VALUE                                    TOTAL_SPEND
---------- ------------ ---------------------------------------
59.57 \$12,286.57                                      7700```
Hmmm. We've only spent \$7,700. Why is that?

The index is closed at weekends and public holidays. We need to find the first day the market was open each month. This is the minimum date within the month, not the first. We can find the opening days grouping by trunc(quote_date, 'mm') and finding the min(quote_date):
```  select min(quote_date)
from   sp_quotes
group  by trunc(quote_date, 'mm')```
To get the result we want, replace the where clause for share_purchases with this subquery:
```  with final_close as (
select close
from   sp_quotes
where quote_date = date'2014-12-31'
), share_purchases as (
select quote_date,
100 / open shares,
100 purchase
from   sp_quotes
where  quote_date in (select min(quote_date)
from   sp_quotes
group  by trunc(quote_date, 'mm'))
)
select round(
((( sum(shares)*max(close) )- sum(purchase) )/ sum(purchase) ) * 100,
2
) pct_gain,
to_char(sum(shares)*max(close), 'fm\$999,999.00') value,
sum(purchase) total_spend
from   share_purchases, final_close;

PCT_GAIN VALUE                                    TOTAL_SPEND
---------- ------------ ---------------------------------------
59.01 \$19,081.15                                     12000```
Now we've spent our whole budget. Our gains are notably worse than both methods we described previously however. It's ten percentage points worse than buy-and-hold (69.89) and thirty worse than timing the market (90.93).

Are there any other simple, regular investment approaches we could try?

We can't know at the time if the price on a given day is the lowest the market will be that week, month or year. We can identify local minimums though. If the market closes lower than it opens it's a daily low. If this fall is great enough it may also be a weekly or even monthly low.

What if we create a rule to buy the day after the index falls 2% or more?

Sticking with the total return formula, we can identify these days using the following equation:
`  ((close-open)/open)  < -0.02`
First, a sanity check for how viable this approach is. We've assumed 120 rounds of \$100 purchases. So we should check how often there is a two percent drop in a day. If there's significantly more than 120, we can't afford it. Notably less and we're leaving too much money on the table.

To find the number of days, just count how many rows meet the formula:
```  select count(*)
from   sp_quotes
where  ((close-open)/open)  < -0.02;

COUNT(*)
----------
109```
Eleven off. Close enough.

We can use the previous query to identify the days where there's a 2% or more drop. Next we need to find how much we would pay following this rule.

To find the daily fall, we have to wait until the market closes. Therefore we'll make our purchases the day after the S&P drops. This means we need to find the opening price for the next day (tomorrow's opening can be different from today's close).

We can use analytic functions to do this. Lead() enables us to fetch values from the next row in the results. We pass this the column we want values for. We want the opening price, so we're looking for lead(open). Next we need to tell Oracle which order to traverse the results. We do this with an over() clause. This takes an order by clause including the column we want to sort our data by. Here this is quote_date, giving:
`  lead(open) over (order by quote_date)`
We can then add this to our select statement.

It's tempting to replace count(*) in our earlier query with this function and the other values we want like so:
```  select quote_date,
lead(open) over (order by quote_date) purchase_price,
100/lead(open) over (order by quote_date) shares,
(close-open)/open day_return,
100 purchase
from   sp_quotes
where  (close-open)/open < -0.02```
This gives incorrect results however.

The where clause is processed before analytics. So in the query above, lead() only applies to rows where there's a 2% or more fall instead of all the rows. This means lead returns the opening price on the next day there's a big drop, not the day immediately after a fall!

This isn't what we want. We need the opening price on the next trading day. So we need to place the query above inside an inline view. The where clause goes outside this view:
```  select * from (
select quote_date,
lead(open) over (order by quote_date) purchase_price,
100/lead(open) over (order by quote_date) shares,
(close-open)/open day_return,
100 purchase
from   sp_quotes
)
where  day_return < -0.02;```
As before, we can now replace share_purchases in the full query. With this in place it's time to find our gains:
```  PCT_GAIN VALUE                                    TOTAL_SPEND
-------- ------------ ---------------------------------------
91.27 \$20,848.08                                     10900```
More than 91%! This is the best we've done so far. It works with information we have available and we have \$1,100 left over :)

This looks like a winning strategy. It's easy to do. Potentially we could automate the whole process. There's no guarantee it'll work in the future. Based on this analysis it's tempting to try however.

Before you rush out to do this we need to analyze this further however. There are big problems with this approach. We'll look at these in the next post. You can read this here.

*** Disclaimer. This post is intended to explore and understand use cases for SQL. It is not financial advice. Do not base your financial decisions on the musings of a SQL advocate! ***

#### Join the discussion

• JimS Sunday, November 1, 2015

interesting article Chris, thankyou