We have constructed a simple statistical model to measure the linear association between Tether printing and 7-day returns on BTC.

This post has been inspired by the 2019 narrative, first popularized by a paper by Griffin et al., that newly printed Tether (USDT) is used to manipulate the value of Bitcoin.

We will naively explore this claim using a very simple statistical procedure: we check whether there is a linear relationship between Tether printing and 7-day Bitcoin returns. The 7-day window was picked under the assumption that it would take about that long to execute the trades that might have an effect on price.

This post is divided into two parts:

1.) Obtaining relevant datasets from Viewly One

2.) Building a statistical model in R

For this analysis we need historic prices of cryptoassets (Bitcoin and Ethereum) as well as Tether printing and burning events. Both can be obtained from the Viewly One Pro dashboard.

The daily OHLC data can be obtained from the `CMC Daily` table. As the name implies, the data originates from CoinMarketCap.

The data can be exported as an Excel, JSON or CSV file. For our analysis we will need the `.csv` file.

The Tether events are available in the `Stablecoin Events` table. Please make sure to filter by the Tether brand and select only `Print` and `Burn` events. The unfiltered table has over 300 million events, and exporting it would take a long time.

Now that we have the data, it's time to analyze it. For this post I will be using R and the brms library, a convenient abstraction layer over Stan.

Stan is a state-of-the-art platform for statistical modeling and high-performance statistical computation. Thousands of users rely on Stan for statistical modeling, data analysis, and prediction in the social, biological, and physical sciences, engineering, and business.

First, let's load the data.

```
library(tidyverse)

# Load Tether events; mints count as positive issuance, burns as negative
d.tether <-
  read_csv('tether.csv') %>%
  select(Time, Amount, Symbol, EventType, Protocol, USDValue) %>%
  mutate(USDValue = ifelse(EventType == 'Mint', USDValue, USDValue * -1)) %>%
  arrange(Time)

knitr::kable(d.tether %>% tail())
```

Time | Amount | Symbol | EventType | Protocol | USDValue |
---|---|---|---|---|---|
2020-02-20 14:17:33 | 3.0e+08 | USDT | Burn | TRC20 | -3.0e+08 |
2020-02-22 17:23:03 | 2.0e+07 | USDT | Mint | TRC20 | 2.0e+07 |
2020-02-24 10:51:32 | 6.0e+07 | USDT | Mint | ERC20 | 6.0e+07 |
2020-02-27 15:59:24 | 1.5e+07 | USDT | Mint | TRC20 | 1.5e+07 |
2020-03-05 09:40:18 | 2.0e+08 | USDT | Mint | ERC20 | 2.0e+08 |
2020-03-05 15:49:47 | 6.0e+07 | USDT | Mint | ERC20 | 6.0e+07 |

Second, let's compute some basic features, such as 1-day and 7-day historic and future BTC returns (in market cap terms).

```
d.btc <-
  read_csv('btc.csv') %>%
  arrange(Date) %>%
  select(Date, BTC = MarketCap) %>%
  # let's peek 1 and 7 days into the past and into the future;
  # these are gross returns (ratios), so 1.0 means no change
  mutate(
    H1 = (BTC / lag(BTC, 1)),
    H7 = (BTC / lag(BTC, 7)),
    F1 = (lead(BTC, 1) / BTC),
    F7 = (lead(BTC, 7) / BTC),
  ) %>%
  # cut away rows for which we don't have features
  # (drop_na() removes any incomplete rows that remain)
  slice(8:(n()-8)) %>%
  drop_na()

knitr::kable(d.btc %>% tail())
```

Date | BTC | H1 | H7 | F1 | F7 |
---|---|---|---|---|---|
2020-02-21 | 176587087363 | 1.0082213 | 0.9399798 | 0.9976986 | 0.8959415 |
2020-02-22 | 176180696548 | 0.9976986 | 0.9778039 | 1.0271456 | 0.8905402 |
2020-02-23 | 180963233540 | 1.0271456 | 0.9996993 | 0.9724506 | 0.8633742 |
2020-02-24 | 175977808526 | 0.9724506 | 0.9965599 | 0.9681311 | 0.9197817 |
2020-02-25 | 170369581558 | 0.9681311 | 0.9217212 | 0.9442970 | 0.9413862 |
2020-02-26 | 160879489024 | 0.9442970 | 0.9162436 | 0.9960157 | 0.9933389 |

Afterwards, we need to merge the Tether events table and the Bitcoin pricing table. I will do a full join here, such that days without Tether events are included in the model. We also center the variables for faster HMC convergence and partial interpretability. Lastly, the Tether printing and future return variables are imbalanced, so I’ve used the Synthetic Minority Oversampling Technique (SMOTE) to deal with that.
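The merge and centering steps are not shown in this post, so here is a minimal sketch of how they could look. It assumes the Tether events are first aggregated into net issuance per day; the toy data, the `Tether` column name and the choice to treat event-free days as zero issuance are my assumptions for illustration, and the SMOTE step is left out.

```r
library(dplyr)
library(tidyr)

# Toy stand-ins for d.tether and d.btc (values are made up for illustration).
tether_toy <- tibble(
  Time = as.POSIXct(c("2020-02-22 17:23:03", "2020-02-24 10:51:32",
                      "2020-02-24 18:00:00"), tz = "UTC"),
  USDValue = c(2.0e7, 6.0e7, -1.0e7)
)
btc_toy <- tibble(
  Date = as.Date("2020-02-21") + 0:4,
  H7 = c(0.94, 0.98, 1.00, 1.00, 0.92),
  F7 = c(0.90, 0.89, 0.86, 0.92, 0.94)
)

# Net issuance per calendar day (mints positive, burns negative).
tether_daily <- tether_toy %>%
  mutate(Date = as.Date(Time)) %>%
  group_by(Date) %>%
  summarise(Tether = sum(USDValue), .groups = "drop")

d <- btc_toy %>%
  full_join(tether_daily, by = "Date") %>%
  # assumption: days without any Tether event count as zero issuance
  mutate(Tether = replace_na(Tether, 0)) %>%
  # keep only days that have return features
  drop_na(F7, H7) %>%
  # center each model variable on its mean
  mutate(across(c(Tether, H7, F7), ~ .x - mean(.x))) %>%
  arrange(Date)
```

After centering, the intercept of a linear model can be read as the expected outcome on a day with average issuance and average past returns.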

Before we start, we have to note the violations of linear modelling assumptions.

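As in all financial series, the return variables are auto-correlated (our 7-day returns also overlap from one day to the next), so the residuals will likely violate the independence assumption and, because volatility clusters, the homoskedasticity assumption as well.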

Second, the variable distribution is not Gaussian in nature, so we will use a Student's t likelihood instead. This makes our model less sensitive to fat-tail events (which we cannot infer with simple statistical methods anyway).

The differenced Tether issuance distribution also exhibits some heavy skewness.

The above-mentioned issues do indeed show up in our best model (m2), which will be introduced shortly, although they don’t seem too bad.
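To get a feel for the auto-correlation issue, here is a small simulation (not part of the original analysis). It assumes i.i.d. normal daily log returns with a made-up volatility, and shows that overlapping 7-day returns are strongly auto-correlated even when the daily returns are not, simply because consecutive windows share six days.

```r
set.seed(1)

# Simulated i.i.d. daily log returns: no autocorrelation by construction.
# The 4% daily volatility is an arbitrary illustrative value.
n <- 5000
daily <- rnorm(n, mean = 0, sd = 0.04)

# Overlapping 7-day log returns: each value is the sum of 7 consecutive days.
weekly <- stats::filter(daily, rep(1, 7), sides = 1)
weekly <- as.numeric(weekly[!is.na(weekly)])

# Lag-1 autocorrelation of each series.
acf_daily  <- acf(daily,  lag.max = 1, plot = FALSE)$acf[2]
acf_weekly <- acf(weekly, lag.max = 1, plot = FALSE)$acf[2]
```

In theory the lag-1 autocorrelation of the overlapping series is 6/7, so `acf_weekly` should land near that value while `acf_daily` stays near zero. The features above are ratios rather than log returns, but the log of F7 is exactly such a 7-day sum.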

Using the pairs plot we can see that there are no strong bi-variate linear relationships. We will look at the multi-variate relationships next.

First is the *benchmark* model, where the predicted future 7-day return is just the average future 7-day return.

\[ \begin{aligned} F7 &\sim t_{\nu}(\mu, \sigma) \\ \mu &\sim N(0, 0.1) \\ \sigma &\sim \mathrm{Cauchy}(0, 1) \end{aligned} \]

The model predicts slightly negative future returns, as the average returns in the 2-year “bear market” were negative.

Next up, we add the Tether growth as a predictor.

\[ F7 \sim 1 + Tether \]

And finally, let's try conditioning on historic 7-day returns as well.

\[ F7 \sim 1 + Tether + H7 \]
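The fitting code is not included in this post, so the following is only a sketch of how the three models could be specified with brms. The intercept and scale priors follow the benchmark specification above; the prior on the slopes, the helper name `fit_models` and the data frame `d` (with centered `F7`, `Tether` and `H7` columns) are assumptions of mine.

```r
library(brms)

# Priors taken from the benchmark specification above.
priors_bench <- c(
  prior(normal(0, 0.1), class = "Intercept"),
  prior(cauchy(0, 1), class = "sigma")
)

# Assumption: the same weakly informative prior on all slope coefficients.
priors_slopes <- c(
  priors_bench,
  prior(normal(0, 0.1), class = "b")
)

# Hypothetical helper: fits the three models on a data frame `d`.
# Calling it compiles and samples each model with Stan, which takes a while.
fit_models <- function(d) {
  list(
    m.bench = brm(F7 ~ 1, data = d, family = student(),
                  prior = priors_bench),
    m1 = brm(F7 ~ 1 + Tether, data = d, family = student(),
             prior = priors_slopes),
    m2 = brm(F7 ~ 1 + Tether + H7, data = d, family = student(),
             prior = priors_slopes)
  )
}
```

The `student()` family also estimates the degrees of freedom; no prior for it is stated above, so this sketch leaves brms's default in place.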

I have fitted more models (e.g. conditioning on year, forcing interactions, and adding a covariance matrix between some parameters), but none seemed to work as well as the simplest ones. Of the simple ones, m2 ranks best by overall performance score, although only slightly: on WAIC and LOOIC the three models are nearly indistinguishable, and the benchmark is even marginally ahead.


Model | Type | ELPD | ELPD_SE | LOOIC | LOOIC_SE | WAIC | RMSE | Performance_Score |
---|---|---|---|---|---|---|---|---|
m2 | brmsfit | 776.22 | 28.71 | -1552.43 | 57.41 | -1552.46 | 0.1 | 0.59 |
m.bench | brmsfit | 776.65 | 28.90 | -1553.30 | 57.80 | -1553.31 | 0.1 | 0.50 |
m1 | brmsfit | 776.05 | 28.87 | -1552.11 | 57.74 | -1552.14 | 0.1 | 0.47 |
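To put these numbers in perspective, here is a quick arithmetic check using only the values from the table. It confirms that LOOIC is the ELPD on the deviance scale, and expresses each model's ELPD gap to the best model in units of its own standard error. This is a rough yardstick I am using for illustration; a proper comparison would use the standard error of the pairwise difference.

```r
# Values copied from the comparison table above.
cmp <- data.frame(
  model   = c("m2", "m.bench", "m1"),
  elpd    = c(776.22, 776.65, 776.05),
  elpd_se = c(28.71, 28.90, 28.87),
  looic   = c(-1552.43, -1553.30, -1552.11)
)

# LOOIC is the ELPD on the deviance scale: looic = -2 * elpd.
cmp$looic_check <- -2 * cmp$elpd

# Gap to the best ELPD, expressed in units of that model's standard error.
cmp$elpd_gap  <- max(cmp$elpd) - cmp$elpd
cmp$gap_in_se <- cmp$elpd_gap / cmp$elpd_se
```

The recomputed LOOIC values agree with the table up to rounding, and every gap is a small fraction of one standard error, which is why the ranking between the models should not be over-interpreted.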

The posterior distribution of the Tether coefficient covers a wide range around 0, with the posterior probability of a positive coefficient, \(P(\beta_{Tether} > 0 \mid data)\), being 64.1% in the winning model. Furthermore, an astute reader might have noticed that the MAP of \(\sigma\) (the model’s residual scale) is greater than the MAP of the Tether parameter, rendering the results less significant. I think this means that there is no strong “evidence”, at least in a simple linear model, that Tether issuance is associated with 7-day future returns of Bitcoin.
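For readers unfamiliar with this kind of summary: the probability of a positive coefficient is simply the share of posterior draws above zero. The sketch below uses simulated draws as a stand-in, with a hypothetical mean and standard deviation; with a fitted brms model the draws would come from something like `as_draws_df(m2)$b_Tether`.

```r
set.seed(42)

# Stand-in for posterior draws of the Tether slope.
# The mean and sd are hypothetical, chosen only for illustration.
draws_tether <- rnorm(4000, mean = 0.004, sd = 0.011)

# Posterior probability that the slope is positive.
p_positive <- mean(draws_tether > 0)

# Central 95% credible interval of the slope.
ci_95 <- quantile(draws_tether, probs = c(0.025, 0.975))
```

A probability not far above one half, together with a credible interval that straddles zero, is what "a wide range around 0" looks like in numbers.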

In the preprocessing step I’ve used random oversampling to correct for two imbalances: F7 is imbalanced due to non-stationarity (in a bull market future returns are more often positive, and in a bear market more often negative), and Tether has been printed on only 148 of 1146 days. The results are biased, because Tether seems to be printed more in a bull market, and not printed or burned in a bear market.

HasTetherPrinted | FutureDirection | n |
---|---|---|
FALSE | down | 441 |
FALSE | up | 560 |
TRUE | down | 78 |
TRUE | up | 73 |
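The counts above can be turned into conditional shares, and the imbalance can be corrected with plain random oversampling. The sketch below does both. It is an illustration only: the exact resampling scheme used for the models is not shown in this post, and this is simple resampling with replacement rather than SMOTE.

```r
# Counts copied from the table above.
counts <- data.frame(
  printed   = c(FALSE, FALSE, TRUE, TRUE),
  direction = c("down", "up", "down", "up"),
  n         = c(441, 560, 78, 73)
)

# Share of "up" outcomes within each printing group.
up_share <- with(counts,
  tapply(n * (direction == "up"), printed, sum) / tapply(n, printed, sum))

set.seed(7)

# Expand the counts to one row per day.
days <- counts[rep(seq_len(nrow(counts)), counts$n), c("printed", "direction")]

# Resample every (printed, direction) cell with replacement
# up to the size of the largest cell.
cells  <- split(days, interaction(days$printed, days$direction))
target <- max(sapply(cells, nrow))
balanced <- do.call(rbind, lapply(cells, function(cell) {
  cell[sample(nrow(cell), target, replace = TRUE), ]
}))
```

In the raw counts, `up_share` is 560/1001 for days without printing and 73/151 for days with printing. After resampling, all four cells in `balanced` have the same number of rows.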

If we do not adjust for this with synthetic sampling, we get outcomes that seem quite intuitive.

If we only include sample days when Tether has been printed, Tether printing is positively associated with F7 returns. The posterior probability of a positive coefficient, \(P(\beta_{Tether} > 0 \mid data)\), is 65.25%.

If we include all sample days, Tether printing is slightly negatively associated with F7 returns. The posterior probability of a positive coefficient, \(P(\beta_{Tether} > 0 \mid data)\), is 48.625%.

One has to be careful not to infer causality from this observation. Tether printing could be influenced by natural demand in bull and bear markets, or it could be confounded with some other, unknown variable. And finally, as in the balanced sampling case, the posterior distribution spans a wide range around 0, meaning that there is no strong linear relationship here anyway.

In my opinion, the results from a simple model are not conclusive. While we could try to play with the assumptions and tune the model for a larger effect size, I think that things in the real world, as is often the case, are not as simple. More advanced research is required to attempt to reject the hypothesis behind the narrative. Perhaps one may find on-chain funds flow and disaggregated trading data of help here. This is, however, an interesting subject for the next post.