The effect of Tether printing on Bitcoin price

We have constructed a simple statistical model to measure the linear association between Tether printing and 7-day returns on BTC.

Furion https://github.com/Netherdrake
03-06-2020

This post was inspired by the 2019 narrative, first popularized by a paper by Griffin et al., that newly printed Tether (USDT) is used to manipulate the price of Bitcoin.

We will naively explore this claim using a very simple statistical procedure, checking whether there is a linear relationship between Tether printing and 7-day Bitcoin returns. The 7-day horizon was chosen on the assumption that this is roughly how long it would take to execute any trades that might affect the price.

This post is divided into two parts:
1.) Obtaining relevant datasets from Viewly One
2.) Building a statistical model in R

Obtaining the datasets

For this analysis we need historic prices of cryptoassets (Bitcoin and Ethereum) as well as Tether printing and burning events. Both can be obtained from the Viewly One Pro dashboard.

Pricing Data

The daily OHLC data can be obtained from the CMC Daily table. As the name implies, the data originates from CoinMarketCap.

The data can be exported as an Excel, JSON or CSV file. For our analysis we need the CSV.

Tether Printing and Burning events

The Tether events are available in the Stablecoin Events table. Make sure to filter by the Tether brand and select only Print and Burn events. The unfiltered table has over 300 million events, and exporting it would take a long time.

Analysis

Now that we have the data, it's time to analyze it. For this post I will be using R and the brms package, a convenient abstraction layer over Stan.

Stan is a state-of-the-art platform for statistical modeling and high-performance statistical computation. Thousands of users rely on Stan for statistical modeling, data analysis, and prediction in the social, biological, and physical sciences, engineering, and business.

First, let's load the data.


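library(tidyverse)  # readr (read_csv), dplyr and tidyr

# Tether Mint/Burn events exported from Viewly One, oldest first
d.tether <- 
  read_csv('tether.csv') %>%
  select(Time, Amount, Symbol, EventType, Protocol, USDValue) %>%
  # mints add supply (positive USD), burns remove it (negative USD)
  mutate(USDValue = ifelse(EventType == 'Mint', USDValue, -USDValue)) %>%
  arrange(Time)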

knitr::kable(d.tether %>% tail())
Time                 Amount   Symbol  EventType  Protocol  USDValue
2020-02-20 14:17:33  3.0e+08  USDT    Burn       TRC20     -3.0e+08
2020-02-22 17:23:03  2.0e+07  USDT    Mint       TRC20      2.0e+07
2020-02-24 10:51:32  6.0e+07  USDT    Mint       ERC20      6.0e+07
2020-02-27 15:59:24  1.5e+07  USDT    Mint       TRC20      1.5e+07
2020-03-05 09:40:18  2.0e+08  USDT    Mint       ERC20      2.0e+08
2020-03-05 15:49:47  6.0e+07  USDT    Mint       ERC20      6.0e+07

Second, let's compute some basic features: 1-day and 7-day historic (H1, H7) and future (F1, F7) BTC returns in market-cap terms, expressed as ratios, so a value of 1 means no change.


d.btc <-
  read_csv('btc.csv') %>%
  arrange(Date) %>%
  select(Date, BTC=MarketCap) %>%
  # peek 1 and 7 days into the past (H1, H7) and into the future (F1, F7);
  # each feature is a ratio of market caps, e.g. F7 = BTC[t+7] / BTC[t]
  mutate(
    H1 = BTC / lag(BTC, 1),
    H7 = BTC / lag(BTC, 7),
    F1 = lead(BTC, 1) / BTC,
    F7 = lead(BTC, 7) / BTC
  ) %>%
  # drop the first 7 and last 8 rows, where lag()/lead() produced NAs,
  # then any rows that still contain NAs
  slice(8:(n()-8)) %>%
  drop_na()

knitr::kable(d.btc %>% tail())
Date        BTC           H1         H7         F1         F7
2020-02-21  176587087363  1.0082213  0.9399798  0.9976986  0.8959415
2020-02-22  176180696548  0.9976986  0.9778039  1.0271456  0.8905402
2020-02-23  180963233540  1.0271456  0.9996993  0.9724506  0.8633742
2020-02-24  175977808526  0.9724506  0.9965599  0.9681311  0.9197817
2020-02-25  170369581558  0.9681311  0.9217212  0.9442970  0.9413862
2020-02-26  160879489024  0.9442970  0.9162436  0.9960157  0.9933389

Assumptions

Next, we need to merge the Tether table with the Bitcoin pricing table. I use a full join here, so that days without Tether events are also included in the model. We also center the variables for faster HMC convergence and an easier-to-interpret intercept. Lastly, the Tether printing and future return variables are imbalanced, so I've used the Synthetic Minority Oversampling Technique (SMOTE) to deal with that.
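The merge step can be sketched in R as below. This is a minimal sketch under stated assumptions, not the code behind the post: it assumes the Tether predictor is daily net issuance in USD (the sum of signed USDValue per UTC calendar day), uses a hypothetical helper merge_tether_btc(), centers only the predictors, and leaves out the oversampling step. The toy inputs at the end are made up purely to exercise the function.

```r
library(dplyr)
library(tidyr)

# Hypothetical helper: collapse Tether events to daily net issuance (USD)
# and full-join it onto the daily BTC feature table.
merge_tether_btc <- function(tether, btc) {
  daily <- tether %>%
    mutate(Date = as.Date(Time)) %>%
    group_by(Date) %>%
    summarise(Tether = sum(USDValue), .groups = "drop")

  btc %>%
    full_join(daily, by = "Date") %>%
    # days without a Mint or Burn event get zero net issuance
    mutate(Tether = replace_na(Tether, 0)) %>%
    # event days outside the BTC feature window have no returns; drop them
    drop_na(BTC) %>%
    arrange(Date) %>%
    # centered copies of the predictors, suffixed "_c"
    mutate(across(c(Tether, H7), ~ .x - mean(.x), .names = "{.col}_c"))
}

# Toy input, made up for illustration only
toy_tether <- tibble(
  Time = as.POSIXct(c("2020-02-22 17:23:03", "2020-02-24 10:51:32",
                      "2020-02-24 15:00:00"), tz = "UTC"),
  USDValue = c(2e7, 6e7, -1e7)
)
toy_btc <- tibble(
  Date = as.Date(c("2020-02-22", "2020-02-23", "2020-02-24")),
  BTC = c(100, 110, 105),
  H7 = c(0.95, 1.00, 1.05),
  F7 = c(0.90, 0.95, 1.00)
)
merged <- merge_tether_btc(toy_tether, toy_btc)
print(merged)
```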

Before we start, we should note where our data violates the linear modelling assumptions.

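As with most financial time series, the return variables are autocorrelated (the overlapping 7-day windows add to this by construction), so our residuals will likely violate the independence assumption, and volatility clustering will likely violate the homoskedasticity assumption as well.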
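The overlapping windows alone are enough to produce strong autocorrelation. The simulation below uses synthetic i.i.d. daily log returns (not the BTC data) and shows that 7-day returns computed on every day are autocorrelated at lags 1 to 6, even though the daily returns have no serial dependence at all. For windows of length 7 the theoretical lag-k autocorrelation is (7 - k)/7.

```r
set.seed(1)
# Synthetic i.i.d. daily log returns: no serial dependence by construction
daily <- rnorm(20000, mean = 0, sd = 0.04)

# Overlapping 7-day log returns (sum of the current and previous 6 days),
# the same construction as taking log(H7) on every day
weekly <- stats::filter(daily, rep(1, 7), method = "convolution", sides = 1)
weekly <- as.numeric(weekly[!is.na(weekly)])

# Sample autocorrelation at lags 1..7 (theory: 6/7 at lag 1, 0 at lag 7)
rho <- acf(weekly, lag.max = 7, plot = FALSE)$acf[2:8]
print(round(rho, 2))
```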

Second, the return distribution is not Gaussian, so we use a Student-t likelihood instead. This makes the model less sensitive to fat-tail events, which we cannot infer much about with simple statistical methods anyway.
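To illustrate why a Student-t likelihood is less sensitive to extreme days, the sketch below compares two location estimates on synthetic data containing three outliers: the Gaussian maximum-likelihood estimate (the sample mean) and a Student-t fit obtained with optim(). Fixing ν at 3 is an assumption made for the illustration; brms estimates ν from the data.

```r
set.seed(42)
# 200 ordinary days plus three extreme "fat-tail" days (synthetic data)
x <- c(rnorm(200, mean = 0, sd = 0.05), 0.8, 0.9, 1.0)

# Negative log-likelihood of a location-scale Student-t with df fixed at 3
nll_t <- function(par, x, df = 3) {
  mu <- par[1]
  sigma <- exp(par[2])  # optimise log(sigma) so sigma stays positive
  -sum(dt((x - mu) / sigma, df = df, log = TRUE) - log(sigma))
}

fit <- optim(c(median(x), log(sd(x))), nll_t, x = x)

# The sample mean is pulled upwards by the three outliers,
# while the Student-t location estimate stays close to 0.
estimates <- c(gaussian = mean(x), student_t = fit$par[1])
print(round(estimates, 4))
```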

The differenced Tether issuance distribution also exhibits heavy skewness.

These issues do show up in the residuals of our best model (m2), introduced shortly, although they do not look too bad.

The pairs plot shows no strong bivariate linear relationships, so we look at the multivariate relationships next.

Benchmark Model

The first is the benchmark model, in which the future 7-day return is modelled simply as the average future 7-day return (an intercept-only model).

\[ \begin{aligned} F7 &\sim t_{\nu}(\mu, \sigma) \\ \mu &\sim \mathcal{N}(0, 0.1) \\ \sigma &\sim \mathrm{Cauchy}(0, 1) \end{aligned} \]
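A sketch of how the benchmark model might be specified with brms. The priors mirror the equation; the wrapper name fit_benchmark(), the data frame d and the sampler settings are hypothetical. The equation gives no prior for ν, so brms's default is left in place.

```r
library(brms)

# Priors from the benchmark equation. brms bounds sigma below at zero,
# so cauchy(0, 1) on class "sigma" effectively acts as a half-Cauchy.
bench_priors <- c(
  prior(normal(0, 0.1), class = Intercept),
  prior(cauchy(0, 1), class = sigma)
)

# Hypothetical wrapper: `d` is assumed to be the merged, preprocessed data
# with an F7 column. Calling it compiles and samples a Stan model.
fit_benchmark <- function(d) {
  brm(F7 ~ 1, data = d, family = student(), prior = bench_priors,
      chains = 4, seed = 1)
}
```

Defining the priors and the wrapper does not touch Stan; only calling fit_benchmark(d) triggers compilation and sampling.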

The model predicts slightly negative future returns, as the average returns in the 2-year “bear market” were negative.

(m1) Tether as predictor

Next, we add Tether growth as a predictor.

\[ F7 \sim 1 + Tether \]
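Written out in the notation of the benchmark model, the formula for m1 corresponds to the likelihood below. The symbols \(\alpha\) (intercept) and \(\beta_{Tether}\) (Tether coefficient) are our own labels; m2 adds a term \(\beta_{H7} \, H7_i\) to \(\mu_i\).

\[ F7_i \sim t_{\nu}(\mu_i, \sigma), \qquad \mu_i = \alpha + \beta_{Tether} \, Tether_i \]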

(m2) Tether + Historic Returns

And finally, let's condition on historic 7-day returns as well.

\[ F7 \sim 1 + Tether + H7 \]
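m1 and m2 could be specified in brms roughly as follows. The formulas mirror the ones above and the intercept and sigma priors follow the benchmark equation; the normal(0, 1) prior on the slopes, the wrapper fit_model() and the sampler settings are hypothetical, since the post does not state them.

```r
library(brms)

# Formulas for the two Tether models, in brms syntax
f_m1 <- bf(F7 ~ 1 + Tether)
f_m2 <- bf(F7 ~ 1 + Tether + H7)

# normal(0, 1) on the slopes (class "b") is a hypothetical choice; it is
# only weakly informative if Tether is left in raw USD units.
model_priors <- c(
  prior(normal(0, 0.1), class = Intercept),
  prior(cauchy(0, 1), class = sigma),
  prior(normal(0, 1), class = b)
)

# Hypothetical wrapper around brm(); `d` is the merged, preprocessed data.
fit_model <- function(formula, d) {
  brm(formula, data = d, family = student(), prior = model_priors,
      chains = 4, seed = 1)
}
```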

Best model

I have fitted more models (e.g. conditioning on year, forcing interactions and adding a covariance matrix between some parameters), but none seemed to work as well as the simplest ones. Of the simple ones, m2 has the highest overall performance score, although by ELPD and WAIC it is practically indistinguishable from the benchmark model, which is in fact marginally ahead on both.


Model    Type     ELPD    ELPD_SE  LOOIC     LOOIC_SE  WAIC      RMSE  Performance_Score
m2       brmsfit  776.22  28.71    -1552.43  57.41     -1552.46  0.1   0.59
m.bench  brmsfit  776.65  28.90    -1553.30  57.80     -1553.31  0.1   0.50
m1       brmsfit  776.05  28.87    -1552.11  57.74     -1552.14  0.1   0.47
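WAIC is reported on the deviance scale (WAIC = -2 × ELPD), so lower is better; by that criterion the three models lie within about 1.2 WAIC points of each other. A paired comparison is more informative than the per-model standard errors in the table. A minimal sketch with brms, assuming m.bench, m1 and m2 are brmsfit objects fitted to the same rows of data; compare_waic() is a hypothetical helper.

```r
library(brms)

# Hypothetical helper: rank three fitted brms models by WAIC.
# All models must have been fitted to exactly the same observations.
compare_waic <- function(m.bench, m1, m2) {
  m.bench <- add_criterion(m.bench, "waic")
  m1 <- add_criterion(m1, "waic")
  m2 <- add_criterion(m2, "waic")
  # Rows are sorted best first; elpd_diff is relative to the top model
  # and se_diff is the standard error of that paired difference.
  loo_compare(m.bench, m1, m2, criterion = "waic")
}
```

A common rule of thumb is to treat an elpd_diff smaller than about two se_diff as no meaningful difference.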

Results

The posterior distributions of the Tether coefficient cover a wide range around 0: for the winning model, \(P(\beta_{Tether} > 0 \mid \text{data})\) is 64.1%. Furthermore, an astute reader might notice that the MAP estimate of \(\sigma\) (the residual scale, i.e. the noise the model cannot explain) is larger than the MAP estimate of the Tether coefficient, which makes the result even less significant. I think this means there is no strong “evidence” that, at least in a simple linear model, Tether issuance is associated with the 7-day future returns of Bitcoin.
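A posterior probability like the 64.1% above can be read straight off the MCMC draws. A minimal sketch; prob_positive() is a hypothetical helper, and the brms calls in the comments assume a fitted model m2 with a predictor named Tether.

```r
# Hypothetical helper: posterior probability that a coefficient is
# positive, estimated as the share of MCMC draws above zero.
prob_positive <- function(draws) mean(draws > 0)

# With a fitted brmsfit `m2` (assumed to exist), the Tether draws could be
# passed in as:
#   prob_positive(brms::as_draws_df(m2)$b_Tether)
# brms::hypothesis(m2, "Tether > 0") reports the same quantity as Post.Prob.

example <- prob_positive(c(-0.4, 0.1, 0.3, -0.2, 0.5))
print(example)
```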

Notes on sampling

In the preprocessing step I used random oversampling to correct for the F7 imbalance caused by non-stationarity (in a bull market future returns are more often positive, and in a bear market more often negative), and because Tether was printed on only 148 of 1146 days. The results are biased, because Tether seems to be printed more in bull markets, and not printed (or burned) in bear markets.

HasTetherPrinted  FutureDirection  n
FALSE             down             441
FALSE             up               560
TRUE              down              78
TRUE              up                73
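The oversampling step can be sketched as follows. This is plain random oversampling (duplicating rows within each combination of HasTetherPrinted and FutureDirection until every cell matches the largest one), not SMOTE, which instead interpolates new synthetic points between neighbours. The helper name and the toy data are hypothetical.

```r
library(dplyr)

# Hypothetical helper: random oversampling. Every
# (HasTetherPrinted, FutureDirection) cell keeps all of its rows and is
# topped up with rows drawn with replacement until it matches the largest cell.
oversample_cells <- function(d, seed = 1) {
  set.seed(seed)  # for reproducible draws
  n_max <- d %>%
    count(HasTetherPrinted, FutureDirection) %>%
    pull(n) %>%
    max()
  d %>%
    group_by(HasTetherPrinted, FutureDirection) %>%
    group_modify(~ bind_rows(
      .x,
      slice_sample(.x, n = n_max - nrow(.x), replace = TRUE)
    )) %>%
    ungroup()
}

# Toy data, made up for illustration: cells of sizes 6, 4, 1 and 2
toy <- tibble(
  HasTetherPrinted = c(rep(FALSE, 10), rep(TRUE, 3)),
  FutureDirection  = c(rep("up", 6), rep("down", 4), "up", "down", "down"),
  id = 1:13
)
balanced <- oversample_cells(toy)
print(count(balanced, HasTetherPrinted, FutureDirection))
```

Because duplicated rows carry no new information, oversampling changes how much weight each cell gets in the fit rather than adding evidence.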

If we do not adjust for this with synthetic sampling, we get outcomes that seem quite intuitive.

If we include only the days on which Tether was printed, Tether printing is positively associated with F7 returns: \(P(\beta_{Tether} > 0 \mid \text{data})\) is 65.25%.

If we include all days, Tether printing is slightly negatively associated with F7 returns: \(P(\beta_{Tether} > 0 \mid \text{data})\) is 48.625%.

One has to be careful not to infer causality from this observation. Tether printing could be driven by natural demand in bull and bear markets, or it could be confounded with some other, unknown variable. And finally, as in the balanced-sampling case, the posterior distribution spans a wide range around 0, meaning there is no strong linear relationship here anyway.

Comment

In my opinion, the results from a simple model are not conclusive. While we could play with the assumptions and tune the model for a larger effect size, I think that things in the real world, as is often the case, are not that simple. More advanced research is required to attempt to reject the hypothesis behind the narrative. On-chain fund flows and disaggregated trading data may help here. That, however, is an interesting subject for the next post.