A Practical Guide to Developing a Crypto Arbitrage Bot

In the world of finance, arbitrage is a strategy where a security, commodity, or currency is bought in one market and simultaneously sold in another market at a higher price. The price difference between the two markets, minus any transaction fees, represents the profit.
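
As a rough illustration of that arithmetic, the sketch below computes the net profit of a single buy-and-sell round trip. The function name, the flat proportional fee model, and the sample numbers are assumptions made for this example; real exchanges may also charge withdrawal or network fees.

```python
def net_arbitrage_profit(buy_price, sell_price, quantity, fee_rate):
    """Net profit of buying `quantity` units at buy_price and selling at sell_price.

    fee_rate is an assumed proportional fee (e.g. 0.001 = 0.1%) charged on
    the notional value of each of the two trades.
    """
    buy_cost = buy_price * quantity
    sell_proceeds = sell_price * quantity
    fees = fee_rate * (buy_cost + sell_proceeds)
    return sell_proceeds - buy_cost - fees


# Hypothetical numbers: buy 2 units at 100, sell at 101, 0.1% fee per trade
profit = net_arbitrage_profit(100.0, 101.0, 2.0, 0.001)
print(round(profit, 3))
```

With a smaller price gap the same fee rate can turn the result negative, which is why fees belong in the calculation from the start.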

Within the cryptocurrency space, most trading activity occurs 24/7 across numerous exchanges. The price of an asset is determined by supply and demand economics, and the price on different exchanges is not necessarily identical. A cross-exchange arbitrage strategy can take advantage of these differences, offering traders a potential opportunity for profit.

This guide walks through the process of building a Python-based bot designed to identify arbitrage opportunities by tracking cryptocurrency prices and other useful metrics. Please note that the bot we will build does not cover the actual execution of trades on exchanges.

We will use the demo endpoints from the CoinGecko API to retrieve cryptocurrency price data. These endpoints are accessible for free but require key authentication.

Understanding the Core Components

A crypto arbitrage bot functions by continuously monitoring price discrepancies for the same asset across different trading platforms. The fundamental logic involves data collection, data processing, opportunity identification, and optionally, trade execution.

For our purposes, we will focus on the first three components: fetching data, processing it, and highlighting potential arbitrage opportunities. This involves interacting with several key API endpoints to gather the necessary market intelligence.
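
The three stages can be sketched as small functions chained together. Everything here is illustrative: the function names, exchange names, and prices are made up, and the collection step returns fixed data where the bot built later in this guide calls the API.

```python
def collect_quotes():
    # Stand-in for the data-collection step; a real bot would call an API here.
    # Exchange names and prices are made up for illustration.
    return [
        {"exchange": "exchange_a", "last_price": 2500.0},
        {"exchange": "exchange_b", "last_price": 2512.5},
        {"exchange": "exchange_c", "last_price": None},  # pair not listed
    ]


def process_quotes(quotes):
    # Processing step: drop exchanges without a usable price
    return [q for q in quotes if q["last_price"] is not None]


def identify_opportunity(quotes):
    # Identification step: cheapest venue to buy, most expensive venue to sell
    if len(quotes) < 2:
        return None
    low = min(quotes, key=lambda q: q["last_price"])
    high = max(quotes, key=lambda q: q["last_price"])
    gap_pct = (high["last_price"] - low["last_price"]) / low["last_price"] * 100
    return {"buy_on": low["exchange"], "sell_on": high["exchange"], "gap_pct": gap_pct}


opportunity = identify_opportunity(process_quotes(collect_quotes()))
print(opportunity)
```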

Key API Endpoints for Data Retrieval

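The CoinGecko API provides several endpoints critical for gathering exchange and price data. The code in this guide uses four of them: /exchanges (the list of exchanges with their country and 24-hour volume), /exchanges/{id}/tickers (the trading pairs on a given exchange), /exchange_rates (BTC exchange rates for other currencies), and /exchanges/{id}/volume_chart (historical exchange volume in BTC).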

Prerequisites and Initial Setup

Before diving into the code, ensure you have the necessary tools and libraries installed. We will be using a Jupyter notebook for creating and running the bot.

Required Software and Packages

Make sure Python 3 is installed on your system. You will also need to install the following additional packages using pip:

pip install jupyterlab
pip install requests
pip install pandas
pip install numpy
pip install pytz

To start coding in a new notebook, execute the following command in your terminal. This should open a new tab in your browser window where you can begin development.

jupyter lab

Loading Essential Python Packages

The first step within the notebook is to import all the required libraries. These packages will handle data requests, manipulation, and time-zone conversions.

import requests as rq
import json
import pandas as pd
import numpy as np
import warnings
import pytz
import datetime
import time
from IPython.display import clear_output, display

# Show 4 decimal places and center column headers in DataFrame output
pd.set_option('display.precision', 4, 'display.colheader_justify', 'center')

The pd.set_option function configures display options for the pandas library, which will be useful later for visualizing data in DataFrames. The clear_output function is necessary for refreshing data tables periodically within the notebook.

Configuring API Access

To access the CoinGecko API, you need an API key. You can register for a free demo plan, which provides a key with a monthly limit of 10,000 calls and a rate limit of 30 calls per minute.

Since our crypto arbitrage bot is designed to run continuously, you might encounter these rate limits. For a production environment, a professional API key would be beneficial.
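
To get a feel for how quickly a polling loop consumes the demo quota, a back-of-the-envelope estimate helps. The sketch below uses the plan limits quoted above; the function name and the figure of three requests per exchange per refresh are assumptions for illustration, and the estimate averages over time, so it does not model bursts within a single refresh.

```python
MONTHLY_CALL_LIMIT = 10_000   # demo plan figures quoted above
CALLS_PER_MINUTE_LIMIT = 30


def estimate_quota_usage(n_exchanges, calls_per_exchange, interval_seconds):
    """Estimate the API usage of a polling loop.

    calls_per_exchange is an assumption about how many requests one refresh
    makes per exchange (e.g. tickers + volume chart + exchange rate = 3).
    """
    calls_per_cycle = n_exchanges * calls_per_exchange
    cycles_per_minute = 60 / interval_seconds
    calls_per_minute = calls_per_cycle * cycles_per_minute
    hours_until_monthly_limit = MONTHLY_CALL_LIMIT / (calls_per_minute * 60)
    return {
        "calls_per_minute": calls_per_minute,
        "within_rate_limit": calls_per_minute <= CALLS_PER_MINUTE_LIMIT,
        "hours_until_monthly_limit": hours_until_monthly_limit,
    }


usage = estimate_quota_usage(n_exchanges=5, calls_per_exchange=3, interval_seconds=60)
print(usage)
```

Under these assumptions the loop stays inside the per-minute limit but would use up the monthly allowance in well under a day, so a longer polling interval or fewer exchanges may be needed on the demo plan.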

Defining API Request Functions

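The API key can be read from a local file. The request headers, stored in use_demo, carry the key, and PUB_URL holds the base URL of the public API. Both are passed to the get_response function, which handles all of our API requests. A status code of 200 indicates a successful request; for any other code the function prints the status and returns None.

PUB_URL = "https://api.coingecko.com/api/v3"  # Base URL of the public CoinGecko API

def get_demo_key():
    # Read the API key from a local JSON file of the form {"key": "..."}
    with open("/path/to/your/CG_demo_key.json") as f:
        key_dict = json.load(f)
    return key_dict["key"]

use_demo = {
    "accept": "application/json",
    "x-cg-demo-api-key" : get_demo_key()
}

def get_response(endpoint, headers, params, URL):
    url = "".join((URL, endpoint))
    response = rq.get(url, headers = headers, params = params)
    if response.status_code == 200:
        data = response.json()
        return data
    else:
        # Callers receive None and must handle the missing data
        print(f"Failed to fetch data, check status code {response.status_code}")
        return None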

You can review complete response codes and their corresponding states in the API documentation.
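
One status code worth handling explicitly is 429 (Too Many Requests), the standard HTTP response when a rate limit is exceeded. The sketch below retries with exponentially growing delays. It is a generic pattern, not part of the CoinGecko API: the function name and the fetch callable are hypothetical, and the endpoint is simulated so the example runs without network access.

```python
import time


def fetch_with_retry(fetch, max_attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call fetch() until it returns status 200 or attempts run out.

    fetch is any zero-argument callable returning (status_code, data);
    it stands in for a wrapper around requests.get. Only HTTP 429
    (Too Many Requests) is retried, with exponentially growing delays.
    """
    for attempt in range(max_attempts):
        status_code, data = fetch()
        if status_code == 200:
            return data
        if status_code != 429:
            break  # other errors are unlikely to succeed on retry
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return None


# Simulated endpoint: rate-limited twice, then successful
responses = iter([(429, None), (429, None), (200, {"ok": True})])
delays = []
result = fetch_with_retry(lambda: next(responses), sleep=delays.append)
print(result, delays)
```

Passing the sleep function as a parameter keeps the example fast and testable; in the notebook the default time.sleep would be used.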

Fetching and Processing Exchange Data

The foundation of our arbitrage strategy is understanding which exchanges to monitor and what assets to track.

Retrieving a List of All Cryptocurrency Exchanges

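To get a list of cryptocurrency exchanges, make an API call to the /exchanges endpoint in the CoinGecko API. The following query parameters request the first page with up to 250 results per page. If more exchanges are listed than fit on one page, the remaining ones can be retrieved by incrementing the page parameter.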

exchange_params = {
    "per_page": 250,
    "page": 1
}

exchange_list_response = get_response("/exchanges", use_demo, exchange_params, PUB_URL)
df_ex = pd.DataFrame(exchange_list_response)

To list exchanges by trading volume, you can easily sort the resulting DataFrame based on the trade_volume_24h_btc column. You can also filter exchanges based on their registered country, which is useful for targeting specific markets later.

df_ex_subset = df_ex[["id", "name", "country", "trade_volume_24h_btc"]]
df_ex_subset = df_ex_subset.sort_values(by = ["trade_volume_24h_btc"], ascending = False)

Obtaining Ticker Information for a Specific Exchange

For each exchange, there is data on numerous cryptocurrency tickers, or trading pairs. We want to filter this list to find the latest trade for the specific base and target currencies we are interested in.

After making a request to the relevant endpoint, we can loop through the response to find a matching pair. If no match is found, an empty string is returned, as not all exchanges offer all trading pairs.

def get_trade_exchange(id, base_curr, target_curr):
    exchange_ticker_response = get_response(f"/exchanges/{id}/tickers", use_demo, {}, PUB_URL)
    found_match = ""
    # get_response returns None when the request fails, so guard before indexing
    tickers = exchange_ticker_response["tickers"] if exchange_ticker_response else []
    for ticker in tickers:
        if ticker["base"] == base_curr and ticker["target"] == target_curr:
            found_match = ticker
            break
    if found_match == "":
        warnings.warn(f"No data found for {base_curr}-{target_curr} pair in {id}")
    return found_match
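
On exchanges with many trading pairs, the pair of interest may not appear in the first batch of tickers returned. The sketch below searches successive pages, assuming the endpoint is paginated through a page query parameter and returns an empty tickers list past the last page; check the API documentation before relying on this. The function name and the fetch_page callable are hypothetical, and the pages are simulated here.

```python
def find_ticker_paged(fetch_page, base_curr, target_curr, max_pages=10):
    """Search successive pages of tickers for a base/target pair.

    fetch_page(page) is a hypothetical callable returning a dict shaped like
    the tickers response, {"tickers": [...]}, or None if the request failed.
    """
    for page in range(1, max_pages + 1):
        response = fetch_page(page)
        if not response or not response.get("tickers"):
            return None  # request failed or no more pages
        for ticker in response["tickers"]:
            if ticker["base"] == base_curr and ticker["target"] == target_curr:
                return ticker
    return None


# Simulated two-page response; the values are made up
pages = {
    1: {"tickers": [{"base": "BTC", "target": "USDT", "last": 60000.0}]},
    2: {"tickers": [{"base": "ETH", "target": "USDT", "last": 2500.0}]},
}
match = find_ticker_paged(pages.get, "ETH", "USDT")
print(match)
```

In the notebook, fetch_page could wrap get_response with {"page": page} as the query parameters. Each extra page costs one API call, so max_pages also acts as a budget.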

Converting Timestamps to Local Timezone

The trade timestamps in the returned data are reported in UTC. Converting them to your local timezone is helpful for monitoring bot activity, and the pytz library makes this straightforward.

def convert_to_local_tz(old_ts):
    new_tz = pytz.timezone("Europe/Amsterdam")  # Replace with your timezone
    old_tz = pytz.timezone("UTC")
    format = "%Y-%m-%dT%H:%M:%S+00:00"
    datetime_obj = datetime.datetime.strptime(old_ts, format)
    localized_ts = old_tz.localize(datetime_obj)
    new_ts = localized_ts.astimezone(new_tz)
    return new_ts
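
As an alternative sketch that relies only on the standard library, datetime.fromisoformat can parse the same timestamp format directly, offset included. The function name is an assumption for this example, and a fixed UTC+1 offset is used so the snippet does not depend on a time-zone database being installed.

```python
import datetime


def to_timezone(utc_ts, tz):
    """Parse an ISO 8601 timestamp with a UTC offset and convert it to tz."""
    parsed = datetime.datetime.fromisoformat(utc_ts)  # offset-aware datetime
    return parsed.astimezone(tz)


# A fixed UTC+1 offset keeps the example independent of time-zone databases
utc_plus_1 = datetime.timezone(datetime.timedelta(hours=1))
local_ts = to_timezone("2024-01-15T12:00:00+00:00", utc_plus_1)
print(local_ts.isoformat())
```

On Python 3.9 or later, a zoneinfo.ZoneInfo("Europe/Amsterdam") object can be passed as tz instead, which also accounts for daylight saving time.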

Aggregating Data from Multiple Exchanges

Knowing how to get ticker data for a single exchange, we can expand this logic to gather data from multiple exchanges within a specific country or region.

Collecting Ticker Data Across a Market

From the ticker response, we collect the last trade price, last trade volume, bid-ask spread, and trade time (converted to local timezone). If the trading pair is not listed on a given exchange, a warning is displayed.

def get_trade_exchange_per_country(country, base_curr, target_curr):
    df_all = df_ex_subset[(df_ex_subset["country"] == country)]
    exchanges_list = df_all["id"]
    ex_all = []
    for exchange_id in exchanges_list:
        found_match = get_trade_exchange(exchange_id, base_curr, target_curr)
        if found_match == "":
            continue
        else:
            temp_dict = dict(
                exchange = exchange_id,
                last_price = found_match["last"],
                last_vol = found_match["volume"],
                spread = found_match["bid_ask_spread_percentage"],
                trade_time = convert_to_local_tz(found_match["last_traded_at"])
            )
            ex_all.append(temp_dict)
    return pd.DataFrame(ex_all)

The bid-ask spread percentage is the difference between the lowest price asked by a seller and the highest price bid by a potential buyer, expressed as a percentage of the price. A lower spread generally indicates higher liquidity and trading volume for that asset on the exchange, while a larger spread typically suggests lower liquidity. This metric is useful for deciding whether to consider a specific exchange for an arbitrage trade.
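
For intuition, the spread can be computed from the best bid and ask as follows. Dividing by the ask price is one common convention and is an assumption here; the API's exact formula may differ, and the prices are made up.

```python
def bid_ask_spread_pct(best_bid, best_ask):
    """Spread as a percentage of the ask price (an assumed convention)."""
    if best_ask <= 0 or best_bid > best_ask:
        raise ValueError("expected best_ask > 0 and best_bid <= best_ask")
    return (best_ask - best_bid) / best_ask * 100


tight = bid_ask_spread_pct(best_bid=2499.0, best_ask=2500.0)
wide = bid_ask_spread_pct(best_bid=2450.0, best_ask=2500.0)
print(round(tight, 2), round(wide, 2))
```

A price gap between two exchanges that is smaller than the spreads on those exchanges is unlikely to be tradable in practice.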

Handling Currency Conversions and Historical Data

Data from various endpoints (e.g., volume) is often reported in BTC. It is useful for our bot to also determine what percentage of the total volume a given ticker represents, giving us further insight into an exchange's liquidity for that pair.

Fetching Bitcoin Exchange Rates

To convert BTC to different target currencies, we can fetch the current exchange rate using the CoinGecko API.

def get_exchange_rate(base_curr):
    exchange_rate_response = get_response("/exchange_rates", use_demo, {}, PUB_URL)
    rate = ""
    try:
        rate = exchange_rate_response["rates"][base_curr.lower()]["value"]
    except KeyError as ke:
        print("Currency not found in the exchange rate API response:", ke)
    except TypeError:
        # get_response returned None because the request itself failed
        print("Exchange rate request failed, no rate available")
    return rate
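
To make the lookup concrete, the sketch below applies the same "rates" / currency code / "value" path to a small hand-written sample. The sample values are made up, and the fields other than "value" are included only for illustration.

```python
# Trimmed, made-up sample shaped like the exchange-rate response used above
sample_rates = {
    "rates": {
        "btc": {"name": "Bitcoin", "unit": "BTC", "value": 1.0, "type": "crypto"},
        "eth": {"name": "Ether", "unit": "ETH", "value": 20.0, "type": "crypto"},
    }
}


def convert_from_btc(amount_btc, currency, rates_response):
    """Convert a BTC amount using 'value' (units of currency per 1 BTC)."""
    try:
        rate = rates_response["rates"][currency.lower()]["value"]
    except KeyError:
        return None  # currency missing from the response
    return amount_btc * rate


print(convert_from_btc(1.5, "ETH", sample_rates))
```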

Analyzing Historical Volume Data

Using historical volume data for a given period, we can smooth the daily exchange volume with a simple moving average (SMA) over a 7-day window and take the most recent SMA value as the current volume. This volume (reported in BTC by default) can then be converted to our currency of interest using the exchange rate identified in the previous section.

Knowing the total volume (the sum for all tickers) makes it easy to determine the percentage of volume for our specific ticker.

def get_vol_exchange(id, days, base_curr):
    vol_params = {"days": days}
    exchange_vol_response = get_response(f"/exchanges/{id}/volume_chart", use_demo, vol_params, PUB_URL)
    dates, volume = [], []  # "dates" avoids shadowing the imported time module
    ex_rate = 1.0
    if base_curr != "BTC":
        ex_rate = get_exchange_rate(base_curr)
    if ex_rate == "":
        print(f"Unable to find exchange rate for {base_curr}, vol will be reported in BTC")
        ex_rate = 1.0
    # Each entry is a [timestamp in milliseconds, volume in BTC] pair
    for ts_ms, vol_btc in exchange_vol_response:
        s = ts_ms / 1000  # Convert to seconds
        dates.append(datetime.datetime.fromtimestamp(s).strftime('%Y-%m-%d'))
        volume.append(float(vol_btc) * ex_rate)  # Convert volume
    df_vol = pd.DataFrame(list(zip(dates, volume)), columns = ["date", "volume"])
    df_vol["volume_SMA"] = df_vol["volume"].rolling(7).mean()  # Calculate 7-day SMA
    # Newest date first, so row 0 holds the latest SMA value
    return df_vol.sort_values(by = ["date"], ascending = False).reset_index(drop = True)
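
The rolling mean and the percentage calculation can be checked by hand with a few numbers. The sketch below reimplements a trailing 7-day SMA in plain Python; the function name and all volumes are made up for illustration.

```python
def simple_moving_average(values, window):
    """Trailing SMA; positions without a full window are None."""
    sma = []
    for i in range(len(values)):
        if i + 1 < window:
            sma.append(None)
        else:
            sma.append(sum(values[i + 1 - window : i + 1]) / window)
    return sma


# Hypothetical daily exchange volumes, oldest first
daily_volume = [100.0, 110.0, 90.0, 105.0, 95.0, 100.0, 100.0, 170.0]
sma_7 = simple_moving_average(daily_volume, 7)
latest_sma = sma_7[-1]

# Share of exchange volume represented by one pair (made-up pair volume)
pair_volume = 11.0
pair_share_pct = pair_volume / latest_sma * 100
print(latest_sma, pair_share_pct)
```

Note how the final day's spike to 170 moves the 7-day average only from 100 to 110, which is the smoothing effect the SMA is used for.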

Running the Arbitrage Bot

The core function of the bot is to continuously monitor the latest trades across multiple exchanges.

Data Aggregation and Display Logic

Our bot will periodically fetch the latest trade data. It's crucial to aggregate this data over time, removing duplicates and calculating statistics like the number of trades, average price, and average volume per exchange.

A new column showing the percentage of total exchange volume is also added for context. The bot highlights the exchanges with the highest and lowest prices to quickly identify potential arbitrage opportunities.

def display_agg_per_exchange(df_ex_all, base_curr):
    df_agg = (
        df_ex_all.groupby("exchange").agg
        (
            trade_time_min = ("trade_time", 'min'),
            trade_time_latest = ("trade_time", 'max'),
            last_price_mean = ("last_price", 'mean'),
            last_vol_mean = ("last_vol", 'mean'),
            spread_mean = ("spread", 'mean'),
            num_trades = ("last_price", 'count')
        )
    )
    df_agg["trade_time_duration"] = df_agg["trade_time_latest"] - df_agg["trade_time_min"]
    df_agg = df_agg.reset_index()
    last_vol_pert = []
    for i, row in df_agg.iterrows():
        try:
            df_vol = get_vol_exchange(row["exchange"], 30, base_curr)
            current_vol = df_vol["volume_SMA"][0]
            vol_pert = (row["last_vol_mean"] / current_vol) * 100
            last_vol_pert.append(vol_pert)
        except Exception:
            # Volume data unavailable; NaN keeps the column numeric.
            # Catching Exception (not a bare except) lets a kernel interrupt through.
            last_vol_pert.append(np.nan)
    df_agg["last_vol_pert"] = last_vol_pert
    df_agg = df_agg.drop(columns = ["trade_time_min"])
    df_agg = df_agg.round({"last_price_mean": 2, "last_vol_mean": 2, "spread_mean": 2})
    display(df_agg.style.apply(highlight_max_min, color = 'green', subset = "last_price_mean"))
    return None

def highlight_max_min(x, color):
    return np.where((x == np.nanmax(x.to_numpy())) | (x == np.nanmin(x.to_numpy())), f"color: {color};", None)

The Main Execution Loop

The bot uses a while loop to run continuously until stopped by the user. A delay of one minute is introduced between updates using a sleep statement. This aligns with the API's refresh rate under the demo plan.

def run_bot(country, base_curr, target_curr):
    # Initial snapshot of the latest trades
    df_ex_all = get_trade_exchange_per_country(country, base_curr, target_curr)
    # Runs until the kernel is interrupted
    while True:
        time.sleep(60)
        df_new = get_trade_exchange_per_country(country, base_curr, target_curr)
        df_ex_all = pd.concat([df_ex_all, df_new])
        df_ex_all = df_ex_all.drop_duplicates()  # Drop trades already recorded
        clear_output(wait = True)
        display_agg_per_exchange(df_ex_all, base_curr)

For example, after running the bot for a couple of hours on the ETH-USDT pair for US exchanges, you might see the lowest price on Exchange A and the highest price on Exchange B highlighted in green. A potential arbitrage strategy would be to buy ETH on Exchange A and immediately sell it on Exchange B.

A notable observation is the correlation between the bid-ask spread and the number of trades. A high spread often coincides with low trade counts, indicating lower liquidity for that pair on that exchange, which is an important risk factor.
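
One way to act on this observation is to screen out illiquid exchanges before comparing prices. The sketch below is a minimal example of such a filter; the function name, the thresholds, and the sample rows are assumptions, with the rows shaped like the aggregated columns produced earlier.

```python
def filter_liquid(rows, max_spread_pct, min_trades):
    """Keep exchanges whose mean spread and observed trade count pass the thresholds."""
    return [
        r for r in rows
        if r["spread_mean"] <= max_spread_pct and r["num_trades"] >= min_trades
    ]


# Made-up aggregates in the shape produced by the aggregation step
rows = [
    {"exchange": "exchange_a", "last_price_mean": 2500.0, "spread_mean": 0.02, "num_trades": 40},
    {"exchange": "exchange_b", "last_price_mean": 2510.0, "spread_mean": 0.05, "num_trades": 35},
    {"exchange": "exchange_c", "last_price_mean": 2540.0, "spread_mean": 1.20, "num_trades": 3},
]
liquid = filter_liquid(rows, max_spread_pct=0.5, min_trades=10)
best_sell = max(liquid, key=lambda r: r["last_price_mean"])
print([r["exchange"] for r in liquid], best_sell["exchange"])
```

In this sample, exchange_c shows the highest price but is excluded because of its wide spread and low trade count, so the selling venue falls back to exchange_b.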

To stop the bot, navigate to the 'Kernel' tab at the top of the Jupyter interface and select 'Interrupt Kernel'.
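
Interrupting the kernel raises a KeyboardInterrupt inside the running cell. A polling loop can catch it and return what it has collected so far, and an optional iteration cap makes the loop easier to test. The sketch below shows one way to structure this; the function name and the simulated feed are hypothetical, and plain tuples stand in for the DataFrame rows used by the bot.

```python
import time


def poll(fetch, interval_seconds, max_iterations=None, sleep=time.sleep):
    """Repeatedly call fetch() and collect unique rows until stopped.

    fetch returns a list of hashable rows (tuples here). The loop ends after
    max_iterations, or when a KeyboardInterrupt is raised.
    """
    seen, rows = set(), []
    iteration = 0
    try:
        while max_iterations is None or iteration < max_iterations:
            for row in fetch():
                if row not in seen:  # skip trades already recorded
                    seen.add(row)
                    rows.append(row)
            iteration += 1
            sleep(interval_seconds)
    except KeyboardInterrupt:
        pass  # return what has been collected so far
    return rows


# Simulated feed: the second poll repeats one trade and adds a new one
batches = iter([
    [("exchange_a", 2500.0)],
    [("exchange_a", 2500.0), ("exchange_b", 2512.5)],
])
collected = poll(lambda: next(batches), 60, max_iterations=2, sleep=lambda s: None)
print(collected)
```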

Frequently Asked Questions

What is crypto arbitrage trading?
Crypto arbitrage is a trading strategy that exploits price differences of the same cryptocurrency across different exchanges. A trader simultaneously buys the asset on one exchange where the price is lower and sells it on another exchange where the price is higher, profiting from the spread minus transaction fees.

What are the main risks involved in arbitrage trading?
Key risks include execution risk (prices changing before trades are complete), exchange withdrawal fees and limits, transaction fees eroding profits, and the inherent volatility of cryptocurrency markets. Low liquidity on an exchange can also make it difficult to execute large orders at the expected price.

Why do price differences between exchanges exist?
Price discrepancies occur due to variations in supply and demand on different trading platforms. Factors like trading volume, liquidity, regional regulations, fiat currency pair availability, and time delays in order book updates all contribute to these differences.

Do I need to be an expert programmer to build a bot?
While building a basic monitoring bot requires intermediate Python skills, creating a fully automated bot that executes trades demands a higher level of expertise in programming, APIs, and exchange security protocols. Understanding the financial risks involved is equally important.

How much capital is needed to start arbitrage trading?
The amount of capital required depends on the price differences and transaction fees. While even small discrepancies can be profitable with large volumes, beginners should start with a manageable amount they are willing to risk, thoroughly testing their strategy before committing significant funds.

Is crypto arbitrage legal?
Crypto arbitrage is generally a legal trading strategy in jurisdictions where cryptocurrency trading itself is permitted. However, it is crucial to comply with the regulations and terms of service of the exchanges you use and report any profits for tax purposes according to the laws in your country of residence.