Day Trading with a Data Scientist Mindset: (3) Evaluating an Edge
How can we quickly evaluate a hypothesis about a potential trading edge and determine whether it is worth pursuing? Before going to the much more strenuous effort of fully tuning and back-testing a strategy, it is important to have some confidence that the underlying edge has the potential to be profitable.
Now that we have defined what day trading success looks like and have collected a wealth of trading data, it is time to start developing trading strategies. Every trading strategy relies on what is called a trading edge: a statistical data insight that, over the course of numerous trades, is expected to be profitable. It is important to differentiate this from success percentage: a trading strategy that produces winning trades less than half the time can still be profitable — so long as the wins are substantially larger than the losses.
This article provides a methodology for rapid evaluation of potential trading edges. The idea is to programmatically identify all cases where a trade would be triggered and compute the average percentage gain if every trade was executed. We aren’t trying to prove that a strategy based on this edge will be profitable; we just want to weed out bad potential edges before devoting much time to them. Here, we’ll look at two trading edges and take them through this methodology: starting with an overly simple definition of the edge and making some iterative refinements to find a viable trading edge.
There are many aspects of trading that are ignored in this evaluation — and which are called out at the end of this article. But, in employing this methodology, we should be able to identify viable trading edges with less than an hour of effort. That’s important, because we are likely going to need to sift through dozens of potential edges before finding a few that are worth developing further.
Higher Highs; Higher Lows
The basic idea of the “Higher Highs; Higher Lows” trading strategy is that if a stock is going up in a consistent way, where both the highs and lows are increasing, then it is likely to continue doing so for a while. The figure below shows a couple examples of this sort of trending. You can visualize the stock motion as bouncing around within a tunnel that is going upward.
The figure above shows two examples of Higher Highs; Higher Lows being triggered for June 3, 2024 (which happens to be a very successful day for this strategy).
The figure above shows the same two examples, but with candlesticks from the start of the trading day until 10:30. In both cases the upward momentum continued for a few candlesticks before tapering off.
How can we determine whether this has the potential to be used in a trading strategy? We will use the historical data we downloaded in our previous article to simulate its behavior over a large window of time (at least 6 months).
First, let’s define an initial attempt at a trigger for this strategy. I’ve invented some notation to make the equations more compact. I put a number in parentheses after a field name to indicate how far in the past or future it is from our decision time. So, if our decision time is 12:00 (meaning that it’s noon and we are deciding whether to buy a stock right now):
- High(-3) is the High value for the 11:45 bar — covering 11:45:00 to 11:49:59
- Open(0) is the Open price for the 12:00 bar — which is about what we’d expect to buy the stock if the rule triggered
- Open(2) is the Open price for the 12:10 bar — which is what price we’d expect to sell the stock if the rule triggered and we sold after 10 minutes
A key thing to remember when building our triggers is to make decisions using only data from the past: the trigger equations must use only fields with negative numbers in the parentheses.
Using this notation, we will trigger the “Higher Highs; Higher Lows” strategy if all of the following are true:
- High(-3) < High(-2)
- High(-2) < High(-1)
- Low(-3) < Low(-2)
- Low(-2) < Low(-1)
For our initial strategy, we’ll just hold onto the stock for 10 minutes and then sell it. So, our profit % will be:
- 100 * (Open(2) / Open(0) - 1)
Before we look at the implementation, let’s take a look at an example of these in a table of bar values.
The horizontal double line indicates that our decision time is 9:50. We use the values in green to determine whether this is an instance of Higher Highs; Higher Lows. The values in blue indicate whether the trade would have been successful.
Now, let’s implement that with Python so we can see how it performs. The Pandas DataFrame class allows us to shift columns up and down and perform calculations on these shifted columns very efficiently. It may seem awkward at first to use these native Pandas operations, but it becomes second nature after a while. Here’s how we can realize the above logic over a time window of 6 months (December 2023 through May 2024):
import pandas as pd  # Helpers such as trading_dates, timestamp, read_intraday_details, etc. come from the codebase for this series

results = list()  # List of DataFrame
for ts in trading_dates(timestamp('2023-12-01'), timestamp('2024-05-31')):
    print(ts)
    intraday_details = read_intraday_details(ts)
    for symbol in get_symbols():
        symbol_details = extract_symbol_details(intraday_details, symbol)
        symbol_details['decision_time'] = symbol_details['timestamp'] + pd.Timedelta('00:05:00')
        shift2 = symbol_details.shift(2)  # Bars covering decision_time - 15:00 to decision_time - 10:01
        shift1 = symbol_details.shift(1)  # Bars covering decision_time - 10:00 to decision_time - 5:01
        # Compute percent changes at each shift
        symbol_details['high_pct_2'] = 100 * (shift1['high'] / shift2['high'] - 1)
        symbol_details['high_pct_1'] = 100 * (symbol_details['high'] / shift1['high'] - 1)
        symbol_details['low_pct_2'] = 100 * (shift1['low'] / shift2['low'] - 1)
        symbol_details['low_pct_1'] = 100 * (symbol_details['low'] / shift1['low'] - 1)
        # See if we trigger and compute profit%
        symbol_details['hhhl_trigger'] = ((symbol_details['high_pct_1'] > 0)
                                          & (symbol_details['high_pct_2'] > 0)
                                          & (symbol_details['low_pct_1'] > 0)
                                          & (symbol_details['low_pct_2'] > 0))
        symbol_details['buy_price'] = symbol_details.shift(-1)['open']   # We purchase at the open of the next bar
        symbol_details['sell_price'] = symbol_details.shift(-3)['open']  # ... and sell 10 minutes later
        symbol_details['profit_pct'] = 100 * (symbol_details['sell_price'] / symbol_details['buy_price'] - 1)
        # Keep triggered rows where a profit could actually be computed
        hits = symbol_details[symbol_details['hhhl_trigger'] & symbol_details['profit_pct'].notna()]
        results.append(hits)
results_df = pd.concat(results)
results_df = results_df.round(4)
results_df.to_csv(temp_files_path('hhhl_results_first_try.csv'), index=False)
How does it perform? Not well, sadly. Here are some key stats gleaned from the resulting CSV:
- Over the 125 trading days, we got triggered 715,931 times (an average of 5727 per day). That’s well above the 3–5 trades per day I want to make.
- The average profit is -0.003% — just short of breaking even
We have to do better! The first thing that jumps out at me when looking at the data is that, in the vast majority of cases, the stock is barely moving upwards. Let’s filter the results to only include cases where the High and Low values go up an average of 1% per bar:
- Now we have 394 triggers over the 125 days
- The average profit is 0.04% — better but probably not the foundation for a good trade strategy
Looking through the cases, I notice that in many of the losing cases, the Close value is significantly lower than the High value, meaning that, during this bar, the price went up but then came way back down. Let’s filter the results to only include cases where the High is at most 0.25% higher than the Close:
- Now we have 136 triggers over the 125 days
- The average profit is 0.16% — that should be enough for a profitable trading strategy
Finally, I know that the initial bar of the day (covering 9:30:00 to 9:34:59) tends to be very chaotic and frequently doesn’t fit the patterns I try to find in the data. Let’s also filter out the data for decision-time 9:40 (which relies on the 9:30 and 9:35 bars); all three filters are combined in a short sketch after the results below:
- Now we have 80 triggers over the 125 days
- The average profit is now 0.22%! That’s good enough to warrant further investigation.
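As a rough sketch, the three refinements above can be applied directly to the first-pass results. This assumes results_df (or the CSV written above) still carries the per-bar percentage columns along with the raw high, close, and time values; it is the same logic the next code block builds in from the start:

refined = results_df.copy()
refined['score'] = (refined['high_pct_1'] + refined['high_pct_2']
                    + refined['low_pct_1'] + refined['low_pct_2'])         # Sum of the four per-bar gains
refined['high_to_close'] = 100 * (refined['high'] / refined['close'] - 1)  # How far the High sits above the Close
refined = refined[(refined['score'] >= 4)                # Averages 1% gain per bar
                  & (refined['high_to_close'] <= 0.25)   # High at most 0.25% above the Close
                  & (refined['time'] != '09:40:00')]     # Skip decisions that rely on the 9:30 bar
print(len(refined), refined['profit_pct'].mean())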
Now seems like a good time to talk about magic numbers and the dangers of over-fitting. When refining a strategy, it is tempting to keep adjusting thresholds to exclude problematic cases (or include particularly good ones). I limit myself to a handful of thresholds and try to limit tuning to “choosing a sensible threshold” from the following possibilities:
- 0.1%: Something that just needs to be meaningfully bigger than 0
- 0.25%: This is my target average gain per trade, so I use it when considering likely gain for a trade
- 0.5%: This is my wishful average gain per trade, so I use it when I think the likelihood of success is lower
- 1%: I use this to indicate a big movement in price
- 2%: I frequently use this as the most I’m willing to lose; or the most I expect to gain
These thresholds can help determine whether there is likely to be a viable trading strategy for a candidate approach. And, if things look promising, we can do some more precise tuning later. But first, we need to see whether our results with more recent dates match up with our historical set. If the results for the recent dates are significantly lower, that would indicate that we have overfit our data — which would not be a huge surprise as we’ve weeded 700k candidate triggers down to just 80.
Here’s a code snippet that applies our updated filtering to data from 6/1/2024 to 7/18/2024:
gain_threshold = 4         # The sum of the four gains must be at least 4%
max_high_close_gap = 0.25  # The last high must be at most 0.25% higher than the close
look_forward = 3           # Take profit in look_forward * 5 minutes

results = list()  # List of DataFrame
for ts in trading_dates(timestamp('2024-06-01'), timestamp('2024-07-18')):
    print(ts)
    intraday_details = read_intraday_details(ts)
    for symbol in get_symbols():
        symbol_details = extract_symbol_details(intraday_details, symbol)
        symbol_details['decision_time'] = symbol_details['timestamp'] + pd.Timedelta('00:05:00')
        shift2 = symbol_details.shift(2)  # Bars covering decision_time - 15:00 to decision_time - 10:01
        shift1 = symbol_details.shift(1)  # Bars covering decision_time - 10:00 to decision_time - 5:01
        # Compute percent changes at each shift
        symbol_details['high_pct_2'] = 100 * (shift1['high'] / shift2['high'] - 1)
        symbol_details['high_pct_1'] = 100 * (symbol_details['high'] / shift1['high'] - 1)
        symbol_details['low_pct_2'] = 100 * (shift1['low'] / shift2['low'] - 1)
        symbol_details['low_pct_1'] = 100 * (symbol_details['low'] / shift1['low'] - 1)
        symbol_details['score'] = (symbol_details['high_pct_1'] + symbol_details['high_pct_2']
                                   + symbol_details['low_pct_1'] + symbol_details['low_pct_2'])
        symbol_details['high_to_close'] = 100 * (symbol_details['high'] / symbol_details['close'] - 1)
        # See if we trigger and compute profit%
        symbol_details['hhhl_trigger'] = ((symbol_details['high_pct_1'] > 0)
                                          & (symbol_details['high_pct_2'] > 0)
                                          & (symbol_details['low_pct_1'] > 0)
                                          & (symbol_details['low_pct_2'] > 0))
        symbol_details['buy_price'] = symbol_details.shift(-1)['open']   # We purchase at the open of the next bar
        symbol_details['sell_price'] = symbol_details.shift(-3)['open']  # ... and sell 10 minutes later
        symbol_details['profit_pct'] = 100 * (symbol_details['sell_price'] / symbol_details['buy_price'] - 1)
        hits = symbol_details[symbol_details['hhhl_trigger']
                              & (symbol_details['score'] >= gain_threshold)
                              & (symbol_details['high_to_close'] <= max_high_close_gap)
                              & (symbol_details['time'] != '09:40:00')
                              & symbol_details['profit_pct'].notna()]  # Keep rows where a profit could be computed
        results.append(hits)
results_df = pd.concat(results)
results_df = results_df.round(4)
results_df.to_csv(temp_files_path('hhhl_results_second_try.csv'), index=False)
And the results look good:
- We have 20 triggers over 31 trading days
- The average profit is 0.21%. That’s just about what we found in our historical set.
So, we have the start of a viable trading strategy here. However, since it generates fewer than one trade per day, it would probably serve as one strategy within a larger trading plan. Also, when a stock continues to gain, we get multiple triggers for it. In a future article we’ll further refine the strategy and do a more rigorous simulation of its performance.
Fast Follower
One of the first trading strategies I developed looks for cases where a gain in one stock frequently leads to a gain in a second stock. I call this strategy “Fast Follower” because, as soon as the first stock (the independent stock) moves significantly, we want to immediately buy the second stock (the dependent stock). In some cases there is an obvious relationship between the two stocks, but in most it isn’t clear. For a day trader, though, the important thing is that these relationships exist and can be exploited.
Finding these relationships is non-trivial, but Pandas provides the functionality that does all the heavy lifting. The basic idea is this:
- We will look at the historical data (5-minute bars) for the 10 previous trading days
- For each pair of stocks, compute:
- Count: the number of times the independent stock went up at least 1% in a 15-minute window
- Mean_gain_pct: the average resulting gain in the dependent stock
We can then use these results (we’ll call it the Average Gains table) to determine which pairs of stocks have a profitable relationship.
The code below shows how we can implement this in Python, looking at the last two weeks of May 2024.
lookback_window = 10          # Look at the previous 10 trading days when finding correlations (two weeks)
trigger_gain_threshold = 1.0  # Trigger if the stock goes up 1% during the trigger window
effect_window = 3             # Sell after 15 minutes

def compute_trigger_and_effect_df(intraday_details):
    symbol_results = list()  # List of DataFrame (one for each symbol)
    for symbol in get_symbols():
        s_hist = extract_symbol_details(intraday_details, symbol)
        s_hist['trigger_pct'] = 100 * (s_hist['close'] / s_hist.shift(2)['open'] - 1)
        s_hist['gain_pct'] = 100 * (s_hist.shift(-effect_window)['close'] / s_hist.shift(-1)['open'] - 1)
        s_hist = s_hist.dropna()  # Drop rows where we don't have gain_pct (usually beginning or end of day)
        symbol_results.append(s_hist)
    symbol_results_df = pd.concat(symbol_results)
    symbol_results_df = symbol_results_df.reset_index(drop=True)
    return symbol_results_df

# Collect the testing results for each day
results = list()  # List of DataFrame
for ts in trading_dates(timestamp('2024-05-17'), timestamp('2024-05-31')):
    print(ts)
    # Compute the training values
    train_details = read_intraday_details(previous_trading_date(ts, offset=lookback_window),
                                          previous_trading_date(ts))
    training_set = compute_trigger_and_effect_df(train_details)
    # Compute expected gain for each trigger
    triggers = training_set[training_set['trigger_pct'] >= trigger_gain_threshold]
    cross_join = triggers.merge(training_set, on='timestamp')
    average_gains = cross_join.groupby(['symbol_x', 'symbol_y']).agg({'gain_pct_y': ['count', 'mean']})
    average_gains = average_gains.reset_index(drop=False)
    average_gains.columns = average_gains.columns.to_flat_index().str.join('_')  # Flatten the multi-index
    average_gains = average_gains.rename(columns={'symbol_x_': 'independent_symbol',
                                                  'symbol_y_': 'dependent_symbol',
                                                  'gain_pct_y_count': 'count',
                                                  'gain_pct_y_mean': 'mean_gain_pct'})
    average_gains = average_gains[average_gains['mean_gain_pct'] > 0.5]
    average_gains['date'] = date_string(ts)
    results.append(average_gains)
results_df = pd.concat(results).round(4)
results_df = results_df.sort_values(by=['mean_gain_pct'], ascending=False)
results_df.to_csv(temp_files_path('average_gains_2w.csv'), index=False)
If we sort the resulting table on mean_gain_pct (descending), we see the following:
Here we see that ENPH and PODD had some huge trading gains during May, but the number of times they happened is low, suggesting that these relationships are not going to be very predictive. Let’s filter our set down to something manageable by only including cases with a count of at least 5 and an average gain of at least 0.5%, as sketched below.
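As a minimal sketch, assuming results_df holds the Average Gains rows produced by the block above (as written to average_gains_2w.csv), that filter is just:

viable_pairs = results_df[(results_df['count'] >= 5)
                          & (results_df['mean_gain_pct'] >= 0.5)]
print(viable_pairs.sort_values(by='mean_gain_pct', ascending=False).head(10))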
So, in our initial trading strategy, for each trading day we will:
- Compute the Average Gains table for the preceding 10 trading days
- Filter the Average Gains table to include entries where count >= 5 and average gain >= 0.5%
- This filtered table will guide our trades: If the independent stock goes up 1%, buy the dependent stock and hold for 15 minutes
- Look in the historical data for our trading day for Fast Follower trades:
- For each of the independent stocks, find all cases where it went up 1% in a 15-minute window
- For each of these, compute the 15-minute gain for any dependent stocks associated with it in our Average Gains table. These are the trades suggested by our strategy.
For the Python implementation it would be tempting to have multiple nested loops to gather these results, but we can accomplish the same thing much more efficiently using Pandas merge operations. Here’s how we can realize the above logic over a time window of 6 months (December 2023 through May 2024):
lookback_window = 10          # Look at the previous 10 trading days when finding correlations (two weeks)
trigger_gain_threshold = 1.0  # Trigger if the stock goes up 1% during the trigger window
mean_gain_threshold = 0.5     # Only accept pairs with average gain of 0.5% per trade in the training
min_count = 5                 # Only accept pairs if there were at least 5 instances in the training set (every other day)
effect_window = 3             # Sell after 15 minutes

def compute_trigger_and_effect_df(intraday_details):
    symbol_results = list()  # List of DataFrame (one for each symbol)
    for symbol in get_symbols():
        s_hist = extract_symbol_details(intraday_details, symbol)
        s_hist['trigger_pct'] = 100 * (s_hist['close'] / s_hist.shift(2)['open'] - 1)
        s_hist['gain_pct'] = 100 * (s_hist.shift(-effect_window)['close'] / s_hist.shift(-1)['open'] - 1)
        s_hist = s_hist.dropna()  # Drop rows where we don't have gain_pct (usually beginning or end of day)
        symbol_results.append(s_hist)
    symbol_results_df = pd.concat(symbol_results)
    symbol_results_df = symbol_results_df.reset_index(drop=True)
    return symbol_results_df

# Collect the testing results for each day
results = list()  # List of DataFrame
for ts in trading_dates(timestamp('2024-05-17'), timestamp('2024-05-31')):
    print(ts)
    # Compute the training values
    train_details = read_intraday_details(previous_trading_date(ts, offset=lookback_window),
                                          previous_trading_date(ts))
    training_set = compute_trigger_and_effect_df(train_details)
    # Compute expected gain for each trigger
    triggers = training_set[training_set['trigger_pct'] >= trigger_gain_threshold]
    cross_join = triggers.merge(training_set, on='timestamp')
    average_gains = cross_join.groupby(['symbol_x', 'symbol_y']).agg({'gain_pct_y': ['count', 'mean']})
    average_gains = average_gains.reset_index(drop=False)
    average_gains.columns = average_gains.columns.to_flat_index().str.join('_')  # Flatten the multi-index
    average_gains = average_gains.rename(columns={'symbol_x_': 'independent_symbol',
                                                  'symbol_y_': 'dependent_symbol',
                                                  'gain_pct_y_count': 'count',
                                                  'gain_pct_y_mean': 'mean_gain_pct'})
    # Select the best ones (i.e., ones where the average gain meets our goals)
    average_gains = average_gains[(average_gains['mean_gain_pct'] >= mean_gain_threshold)
                                  & (average_gains['count'] >= min_count)
                                  & (average_gains['independent_symbol'] != average_gains['dependent_symbol'])]
    # Find the trades and results
    test_details = read_intraday_details(ts)  # the day following the training set
    test_details['decision_time'] = test_details['timestamp'] + pd.Timedelta('00:05:00')
    test_details['time'] = test_details['decision_time'].map(lambda d: time_string(d))
    testing_set = compute_trigger_and_effect_df(test_details)
    triggers = testing_set[testing_set['trigger_pct'] >= trigger_gain_threshold]
    cross_join = triggers.merge(testing_set, on='timestamp')
    cross_join = cross_join.rename(columns={'symbol_x': 'independent_symbol',
                                            'symbol_y': 'dependent_symbol',
                                            'decision_time_y': 'decision_time',
                                            'date_x': 'date',
                                            'time_y': 'time',
                                            'trigger_pct_x': 'trigger_pct',
                                            'gain_pct_y': 'gain_pct'})
    filter_df = average_gains[['independent_symbol', 'dependent_symbol']]
    test = pd.merge(cross_join, average_gains, on=['independent_symbol', 'dependent_symbol'])
    test = test[['decision_time', 'independent_symbol', 'dependent_symbol', 'date', 'time',
                 'trigger_pct', 'count', 'mean_gain_pct', 'gain_pct']]
    results.append(test)
results_df = pd.concat(results).round(4)
results_df.to_csv(temp_files_path('fast_follower_results_2w_first_try.csv'), index=False)
How does this initial implementation perform? Once again, not well. Here are some key stats gleaned from the resulting CSV:
- Over the 125 trading days, we got triggered 55,476 times (an average of 444 per day).
- The average profit is -0.0064% — just short of breaking even and worse than our first potential edge
We can fuss with setting thresholds to try to improve our results, but let’s get some more data to work with. We’ll collect the following sets of variables:
Recent history for the independent stock: Intuitively, I’d expect a stock consistently rising over the last 15 minutes to be a better indicator than a stock that went way up 10 minutes ago and is now tailing off. We’ll measure these values:
- % gain over the last 15 minutes
- % gain over the last 10 minutes
- % gain over the last 5 minutes
Recent history for the dependent stock: Intuitively, if a stock is falling, I wouldn’t expect it to be a good Fast Follower. We’ll measure the same values as we do for the independent.
Characteristics of the relationship throughout the training set: Intuitively, for all the cases where we get triggered, how well does the dependent stock perform? We’ll measure the following:
- Count: How many times were trades triggered
- Mean gain %: For those cases, what was the average resulting gain of the dependent stock
- Gain 0.0%: What portion of those cases broke even?
- Gain 0.5%: What portion of those cases gained at least 0.5%?
- Gain 1.0%: What portion of those cases gained at least 1.0%?
We can extend our previous implementation to collect these additional values:
lookback_window = 10          # Look at the previous 10 trading days when finding correlations
trigger_gain_threshold = 1.0  # Trigger if the stock goes up 1% during the trigger window
mean_gain_threshold = 0.5     # Only accept pairs with average gain of 0.5% per trade in the training
min_count = 5                 # Only accept pairs if there were at least 5 instances in the training set (every other day)
effect_window = 3             # Sell after 15 minutes

def compute_trigger_and_effect_df(intraday_details):
    symbol_results = list()  # List of DataFrame (one for each symbol)
    for symbol in get_symbols():
        s_hist = extract_symbol_details(intraday_details, symbol)
        s_hist['trigger_last_15_pct'] = 100 * (s_hist['close'] / s_hist.shift(2)['open'] - 1)  # Gain over the last 15 minutes
        s_hist['trigger_last_10_pct'] = 100 * (s_hist['close'] / s_hist.shift(1)['open'] - 1)  # Gain over the last 10 minutes
        s_hist['trigger_last_05_pct'] = 100 * (s_hist['close'] / s_hist.shift(0)['open'] - 1)  # Gain over the current 5-minute bar
        s_hist['gain_pct'] = 100 * (s_hist.shift(-effect_window)['close'] / s_hist.shift(-1)['open'] - 1)
        s_hist['gain_00'] = s_hist['gain_pct'] >= 0    # Did we break even?
        s_hist['gain_05'] = s_hist['gain_pct'] >= 0.5  # Did we gain at least 0.5%
        s_hist['gain_10'] = s_hist['gain_pct'] >= 1.0  # Did we gain at least 1.0%
        s_hist = s_hist.dropna()  # Drop rows where we don't have gain_pct (usually end of day)
        symbol_results.append(s_hist)
    symbol_results_df = pd.concat(symbol_results)
    symbol_results_df = symbol_results_df.reset_index(drop=True)
    return symbol_results_df

results = list()  # List of DataFrame
for ts in trading_dates(timestamp('2023-12-01'), timestamp('2024-05-31')):
    print(ts)
    train_details = read_intraday_details(previous_trading_date(ts, offset=lookback_window),
                                          previous_trading_date(ts))  # 2 weeks
    test_details = read_intraday_details(ts)  # the day following the training set
    # Compute the training values
    training_set = compute_trigger_and_effect_df(train_details)
    # Compute expected gain for each trigger
    triggers = training_set[training_set['trigger_last_15_pct'] >= trigger_gain_threshold]
    cross_join = triggers.merge(training_set, on='timestamp')
    average_gains = cross_join.groupby(['symbol_x', 'symbol_y']).agg({'gain_pct_y': ['count', 'mean'],
                                                                      'gain_00_y': ['mean'],
                                                                      'gain_05_y': ['mean'],
                                                                      'gain_10_y': ['mean']})
    average_gains = average_gains.reset_index(drop=False)
    average_gains.columns = average_gains.columns.to_flat_index().str.join('_')  # Flatten the multi-index
    average_gains = average_gains.rename(columns={'symbol_x_': 'independent_symbol',
                                                  'symbol_y_': 'dependent_symbol',
                                                  'gain_pct_y_count': 'count',
                                                  'gain_pct_y_mean': 'mean_gain_pct',
                                                  'gain_00_y_mean': 'gain_00',
                                                  'gain_05_y_mean': 'gain_05',
                                                  'gain_10_y_mean': 'gain_10'})
    # Select the best ones (i.e., ones where the average gain meets our goal)
    average_gains = average_gains[(average_gains['mean_gain_pct'] >= mean_gain_threshold)
                                  & (average_gains['count'] >= min_count)
                                  & (average_gains['independent_symbol'] != average_gains['dependent_symbol'])]
    # Find the trades and results
    testing_set = compute_trigger_and_effect_df(test_details)
    triggers = testing_set[testing_set['trigger_last_15_pct'] >= trigger_gain_threshold]
    cross_join = triggers.merge(testing_set, on='timestamp')
    cross_join = cross_join.rename(columns={'symbol_x': 'independent_symbol',
                                            'symbol_y': 'dependent_symbol',
                                            'date_x': 'date',
                                            'time_x': 'time',
                                            'trigger_pct_x': 'trigger_pct',
                                            'gain_pct_y': 'gain_pct'})
    filter_df = average_gains[['independent_symbol', 'dependent_symbol']]
    test = pd.merge(cross_join, average_gains, on=['independent_symbol', 'dependent_symbol'])
    test = test[['timestamp', 'independent_symbol', 'dependent_symbol', 'date', 'time',
                 'trigger_last_05_pct_x', 'trigger_last_10_pct_x', 'trigger_last_15_pct_x',
                 'trigger_last_05_pct_y', 'trigger_last_10_pct_y', 'trigger_last_15_pct_y',
                 'count', 'mean_gain_pct', 'gain_00', 'gain_05', 'gain_10',
                 'gain_pct']]
    results.append(test)
results_df = pd.concat(results).round(4)
results_df.to_csv(temp_files_path('fast_follower_results_training_set.csv'), index=False)
Now we refine our criteria and see if we can improve the performance (the combined filters are sketched after this list):
- Initial score is -0.0064%
- First, remembering that the 9:30 bar is frequently problematic, filter out cases where decision time is 9:45 (the only time that uses the 9:30 bar). New score: 0.0036%
- Make sure our dependent stock has been going up in the last 10 minutes (trigger_last_05_pct_y >= 1.0). New score is 0.056%
- Make sure our independent stock has been going up in the last 10 minutes (trigger_last_05_pct_x >= 1.0). New score is 0.098%
- Only keep pairs whose triggered trades gained at least 0.5% at least 2/3 of the time in the training set (gain_05 >= 0.6666). Final score is 0.2%. That is promising! Also, we have 270 cases over the 125 trading days.
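Here is a rough sketch of those four filters combined, applied to the results table produced by the previous block (results_df, using the column names written to the CSV; the exact value used in the time filter depends on how the time column is encoded):

refined = results_df[(results_df['time'] != '09:45:00')               # Skip decisions that rely on the 9:30 bar
                     & (results_df['trigger_last_05_pct_y'] >= 1.0)   # Dependent stock has been going up
                     & (results_df['trigger_last_05_pct_x'] >= 1.0)   # Independent stock has been going up
                     & (results_df['gain_05'] >= 0.6666)]             # Pair gained 0.5%+ in 2/3 of training cases
print(len(refined), refined['gain_pct'].mean())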
What is a good way to choose where to place thresholds, without going fully into parameter tuning and model building? I look for correlations between each numeric column and our gain_pct column (which we are trying to maximize), select the most highly correlated column, and then try a reasonable threshold for it. Whenever possible, I select a threshold using the “magic numbers” guideline from above. If, after a few iterations of computing correlations and thresholding the most correlated column, we get an average gain of 0.2% or better, we have a potentially profitable edge. When evaluating potential edges, it is important to remember that our goal is to quickly determine whether further analysis is warranted, not to optimize anything. We’ll get to that later.
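As an illustration, a few lines like the following (a sketch, assuming results_df is the Fast Follower results table built above) are enough to rank the numeric columns by their correlation with gain_pct:

numeric_cols = results_df.select_dtypes(include='number').columns
correlations = results_df[numeric_cols].corrwith(results_df['gain_pct'])  # Correlation of each column with gain_pct
print(correlations.drop('gain_pct').sort_values(ascending=False))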
Now, let’s try this against the most recent month of trading data and see if these thresholds hold up to testing:
- We have 87 triggers over 27 trading days
- The average profit is 0.13%. That’s lower than our training set, but still promising. Interestingly, if I eliminate the filter for gaining 0.5%, the average gain is 0.2%.
Abstractions
As mentioned above, in the interest of being able to evaluate our trading edges quickly, we are ignoring many important aspects of trading. In future articles, we will take all of them into consideration:
- Could we actually make all these trades? On many days, we will have more triggers than it makes sense to trade. If so, how do we determine which trades to make, and which to ignore? How many simultaneous trades should we make, and how much of our capital should be allocated to each trade?
- How do we enter and exit the trade? Do we just make a market buy at our decision time, and a market sell 10 (or 15) minutes later? Do we set Stop Limit or Take Profit orders to potentially exit the trades early?
- Should we reverse the logic to also make short trades? Frequently, a trading edge can also be configured to identify opportunities for going short, which can be very profitable when the market is bearish.
- How do we consider the Ask/Bid spread? At any moment, the price to buy a stock will be higher than the price to sell the stock. For stocks that have high trading volume this spread is commonly less than 0.1%, but for smaller volume stocks and at times of high volatility, the spread can be much higher, sometimes more than 1%.
We will address the first two considerations in our next article.
Conclusion
In this article, we looked at the methodology I use to evaluate candidate trading edges — enabling us to quickly determine which trading edges are worth developing further. We looked at two promising edges: Higher Highs; Higher Lows and Fast Follower — and saw how we could refine the approaches to give a very rough estimation of their potential profitability.
In the next article, we will refine these trading edges into an actual trading strategy and simulate its performance on our historical data.
Thanks for reading. If you’re interested in reading more, please subscribe here: https://findahappy.medium.com/subscribe
Here is the set of articles in this series:
- Why Day Trade? Setting goals and expectations for a successful day trading approach.
- Data Sets: Sources and Test versus Training. Accessing stock data from Yahoo Finance and Alpaca.
- Evaluating an Edge. How to identify potential trading edges and predict their profitability
- Building a trading plan and simulating its performance
- Building an actual trading harness
- Gotchas to look out for. In simulating trading plans it is easy to make little mistakes that make an edge appear much more lucrative than it really is.
- More Complicated Trading Schemes. Shorting and arbitrage
- Machine Learning:
For the Codebase, go here: https://github.com/WolfgangGreen/TraderBlog003