
# Data Surprise

I want to share an interesting quantitative research experience we have just been through. It is a lesson in the precision and deep thinking you need if you wish to be successful in research and trading.

Olga joined PsyQuation as a data scientist about 6 weeks ago.

The first problem she was given was to explore the effect of round numbers. In general, people like round numbers: we favour numbers ending in 0 or 5, for example 0, 0.5, 10, 30, 55. Using data from retail FX traders, we wanted to check whether traders actually trade more often at five-minute intervals.

## Data for analysis

We analyzed data for one of the most popular traded instruments, EURUSD (3 years, from July 2014 to September 2017). The table includes the opentime and closetime of each trade.
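For readers who want to try the same check on their own trades, here is a minimal sketch of the counting step. It is not our production code: the function names and the sample timestamps are made up for illustration, and it assumes each trade's opentime has already been parsed into a Python `datetime`.

```python
from collections import Counter
from datetime import datetime


def trades_per_minute_of_hour(open_times):
    """Count how many trades were opened at each minute of the hour (0-59)."""
    counts = Counter(t.minute for t in open_times)
    return [counts.get(m, 0) for m in range(60)]


def round_minute_share(counts):
    """Fraction of all trades opened on minutes divisible by 5."""
    total = sum(counts)
    if total == 0:
        return 0.0
    return sum(counts[m] for m in range(0, 60, 5)) / total


# Made-up sample for illustration only (not PsyQuation data).
sample_open_times = [
    datetime(2017, 3, 1, 9, 0, 12),
    datetime(2017, 3, 1, 9, 5, 3),
    datetime(2017, 3, 1, 9, 7, 41),
    datetime(2017, 3, 1, 10, 30, 0),
]

counts = trades_per_minute_of_hour(sample_open_times)
share = round_minute_share(counts)
print(f"Share of trades opened on round minutes: {share:.2f}")
```

If opening times were spread evenly across the hour, about 12 of every 60 trades (a share of 0.20) would land on a minute divisible by 5, so a share well above that is the kind of peak we were looking for.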

You can see from the chart below that the number of opened trades peaks at 5-minute intervals. Vlad and I were quietly wondering what Olga had discovered: could this be our Nobel Prize moment? It’s at times like this that one has to ask some serious questions. After all, King Solomon says there is nothing new under the sun, so why were we discovering such an amazing behavioural effect that nobody else was publishing? This is when Olga’s real work started.

We wanted to see how the spread, i.e. the difference between the bid and ask, behaved, and found something similar to our opening-trades finding. The plots show that average spreads are at their maximum at “round minutes”. On minute 0 the average spread is 0.00065, and on minute 30 it is 0.00047.
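The spread profile is a simple group-and-average. The sketch below shows one way to do it, assuming quotes are available as `(timestamp, bid, ask)` tuples. The function name, the `bucket` parameter and the sample quotes are all hypothetical, chosen only to illustrate the calculation.

```python
from collections import defaultdict
from datetime import datetime


def mean_spread_by(quotes, bucket="minute"):
    """Average (ask - bid) per time bucket.

    quotes: iterable of (timestamp, bid, ask) tuples.
    bucket: "minute" (0-59 within the hour) or "second" (0-59 within the minute).
    """
    if bucket not in ("minute", "second"):
        raise ValueError("bucket must be 'minute' or 'second'")
    totals = defaultdict(float)
    counts = defaultdict(int)
    for ts, bid, ask in quotes:
        key = getattr(ts, bucket)  # ts.minute or ts.second
        totals[key] += ask - bid
        counts[key] += 1
    return {key: totals[key] / counts[key] for key in totals}


# Made-up quotes for illustration only.
sample_quotes = [
    (datetime(2017, 1, 2, 9, 0, 0), 1.0500, 1.0506),
    (datetime(2017, 1, 2, 9, 0, 30), 1.0500, 1.0504),
    (datetime(2017, 1, 2, 9, 1, 0), 1.0500, 1.0501),
]

by_minute = mean_spread_by(sample_quotes, "minute")
by_second = mean_spread_by(sample_quotes, "second")
```

Passing `bucket="second"` gives the same profile at one-second resolution within each minute.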

Then we checked different hours, in case we had focused on an hour with an anomaly; the peaks persisted. Crack the Champagne, Nobel baby, here we come!

We then noticed a very interesting phenomenon when we dropped the scale to one-second intervals: the spread increased at second 0 of every minute and decreased again at second 1. We wrote a basic trading program to take advantage of this anomaly, but we couldn’t make money in our backtests. Something’s wrong here…

## Additional data analysis

The next step was to check the data from other sources. We took tick data from TrueFX for each month of 2017 and from the Axi database, and we didn’t find the same price behaviour. The plot was thickening: why were we observing these peaks at five-minute intervals? Perhaps we had an error in our calculations, or maybe Tenfore (Morningstar), the reputable data source we use for our quant models, has a special way of dealing with data. We investigated both and couldn’t find any answers.
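One way to make this kind of cross-source check systematic is to compare how each source's rows are distributed across the minutes of the hour and flag the minutes where one source is over-represented. The sketch below is an illustration under assumptions, not the comparison we actually ran: the function names, the `ratio` threshold and the synthetic timestamps are all hypothetical.

```python
from collections import Counter
from datetime import datetime, timedelta


def minute_profile(timestamps):
    """Share of rows falling on each minute of the hour (shares sum to 1)."""
    counts = Counter(ts.minute for ts in timestamps)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no timestamps supplied")
    return {m: counts.get(m, 0) / total for m in range(60)}


def suspicious_minutes(source_a, source_b, ratio=1.5):
    """Minutes where source_a's share is at least `ratio` times source_b's."""
    pa, pb = minute_profile(source_a), minute_profile(source_b)
    return [m for m in range(60) if pb[m] > 0 and pa[m] / pb[m] >= ratio]


# Synthetic data for illustration only.
start = datetime(2017, 1, 2, 9, 0, 0)
reference = [start + timedelta(minutes=m) for m in range(60)]  # one row per minute
suspect = list(reference)
for m in range(0, 60, 5):
    # Three extra rows on every fifth minute in the suspect source.
    suspect += [start + timedelta(minutes=m)] * 3

flagged = suspicious_minutes(suspect, reference)
print(flagged)
```

With this synthetic input the flagged minutes are exactly the multiples of 5, which is the pattern that separated our original data from the other sources.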

## Problem Identification

To help find the cause, we compared the data sources and, BINGO, we found the source of the problem.

## Conclusion

If this story exhausted you, I can assure you that Olga, and to a lesser degree Vlad, found it far more exhausting. The mistake came from the most unlikely source. PsyQuation very occasionally works with a third party on research projects. We are under pressure to deliver a signal service, which we had hoped to launch in January. The signals we are looking at using are computed on tick data and require a large amount of computational power. To help us set up our database in the best possible way to handle the size of the data and the speed required, we used an experienced quant with a Ph.D. in Computer Science and more than 5 years of research experience in the quant division of one of the largest banks in the world. I think we all felt we were in safe hands.

Here is the mistake. Dr Smarty Pants set up the API to download the data at all time scales into the same database table. The tick, 1 min, 5 min, 30 min and 1 hr data were all sitting in the same table, causing all the drama.
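To see why a mixed table produces peaks at round times, here is a small synthetic reconstruction. It rests on assumptions that are ours for illustration only: that each row carries a `timeframe` label, and that the aggregated rows are stamped at the start of their interval. Neither detail is taken from the actual database.

```python
from collections import Counter
from datetime import datetime, timedelta

start = datetime(2017, 1, 2, 9, 0, 0)

# Hypothetical mixed table: each row is (timestamp, timeframe).
# Ticks: one per second for an hour, so they are uniform over time.
rows = [(start + timedelta(seconds=s), "tick") for s in range(3600)]

# Aggregated rows, assumed to be stamped at the start of each interval.
for label, step in [("1min", 60), ("5min", 300), ("30min", 1800), ("1hr", 3600)]:
    rows += [(start + timedelta(seconds=s), label) for s in range(0, 3600, step)]


def rows_at_second_zero(table):
    """Count rows stamped exactly on second 0, keyed by minute of the hour."""
    return Counter(ts.minute for ts, _ in table if ts.second == 0)


mixed = rows_at_second_zero(rows)
ticks_only = rows_at_second_zero([r for r in rows if r[1] == "tick"])

for minute in (0, 5, 7, 30):
    print(minute, mixed[minute], ticks_only[minute])
```

In the mixed table, second 0 of minute 0 holds five rows, minute 30 holds four, minute 5 holds three and an ordinary minute such as 7 holds two, while the tick-only view holds one row everywhere. Any statistic computed over the mixed table inherits that imbalance, so filtering to a single time scale before analysis removes the artificial peaks.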

There are many takeaways from this experience. A couple of standouts are: communicate, and always be inquisitive. As a researcher, always understand your data and know all about its source, and if you find a surprising result, don’t become complacent. Investigate further, and think about why you are finding something that nobody else is finding.