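Blockchain Data Analysis: Saving Binance Historical Data

The crypto market has crashed recently. As a small retail investor who bought near the peak, I've decided that solid development work is more reliable than chasing trends, so I'm settling down to write some articles, do some data analysis, and sharpen my scraping skills.

I first came across blockchain three years ago, when a friend urged us to pay more attention to it. Back then we were naive and dismissed it as something insubstantial with nothing interesting about it. Looking back now, vision matters far more than skill or luck. Without good judgment you can't get ahead, so I'm picking blockchain technology back up to study it properly, and I'll write more about it going forward.

Back to the topic. I've recently been analyzing data from various exchanges and have integrated a few of them (bitflyer, coincheck, binance, btcbox), with more to follow. Coming from a technical background, I want to start with proper data analysis, which means scraping some data first. So I hunted down the relevant API documentation and used Python to pull the data. This relies on API clients that others have provided: many people have written third-party clients for exchanges and published them on GitHub, and we use those clients to fetch the data.

I read quite a few articles. For this first post I'm giving a shout-out to a veteran Australian developer by translating his article; he has written API clients for several exchanges.

Original article: https://sammchardy.github.io/binance/2018/01/08/historical-data-download-binance.html

Main text

Any trading strategy rests on a good backtesting setup, and without data you cannot backtest, so data matters a great deal.

In this article I'll describe in detail how to download and save historical Binance kline data for a specified time range through the Binance API.

This example doesn't require a Binance account; it calls the public API directly.

First, handle the time format

Because the Binance server only accepts time ranges expressed as millisecond timestamps, human-readable dates need to be converted to millisecond timestamps.

We'll use Python for this and install the dateparser package with the following command: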

pip install dateparser

We can write a function that converts a date string directly to milliseconds. For example:

print(date_to_milliseconds("January 01, 2018"))
print(date_to_milliseconds("11 hours ago UTC"))
print(date_to_milliseconds("now UTC"))
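
The conversion itself can be sketched with the standard library alone. This is a minimal illustration, not the article's function: the name utc_datetime_to_milliseconds is hypothetical, it takes a datetime rather than a free-form string, and it assumes naive datetimes are meant as UTC.

```python
from datetime import datetime, timezone


def utc_datetime_to_milliseconds(dt):
    """Return milliseconds since the Unix epoch for a datetime.

    Naive datetimes are treated as UTC (an assumption of this sketch).
    """
    if dt.tzinfo is None:
        dt = dt.replace(tzinfo=timezone.utc)
    epoch = datetime(1970, 1, 1, tzinfo=timezone.utc)
    delta = dt - epoch
    # Integer arithmetic avoids float rounding on large timestamps.
    return (delta.days * 86_400 + delta.seconds) * 1000 + delta.microseconds // 1000


print(utc_datetime_to_milliseconds(datetime(2018, 1, 1)))  # 1514764800000
```

The printed value matches the end timestamp that appears in the generated file name later in this article.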

Fetching Binance kline data

Now we want to fetch the actual kline (candlestick) data through the get_klines API.

First, the endpoint parameters:

symbol - e.g. ETHBTC (the trading pair)
interval - one of (1m, 3m, 5m, 15m, 30m, 1h, 2h, 4h, 6h, 8h, 12h, 1d, 3d, 1w, 1M) (the kline interval)
limit - max 500 (maximum number of rows returned)
startTime - milliseconds (start time)
endTime - milliseconds (end time)

Because a single request returns at most 500 rows, a large time range has to be fetched in a loop.
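
To get a feel for how many requests a range needs, here is a rough estimate. The helper estimate_request_count is hypothetical and only does the arithmetic; it treats the range as half-open, so the real number of rows may differ by one at the boundary.

```python
def estimate_request_count(start_ms, end_ms, interval_ms, limit=500):
    """Estimate how many paged requests cover the range [start_ms, end_ms)."""
    if end_ms <= start_ms:
        return 0
    span = end_ms - start_ms
    # Ceiling division with integers: number of candles, then number of pages.
    candles = (span + interval_ms - 1) // interval_ms
    return (candles + limit - 1) // limit


# 30-minute candles for December 2017, using timestamps from this article.
start_ms = 1512086400000  # 1 Dec 2017 00:00 UTC
end_ms = 1514764800000    # 1 Jan 2018 00:00 UTC
print(estimate_request_count(start_ms, end_ms, 30 * 60 * 1000))  # 3
```

December has 31 days of 48 half-hour candles, which is 1488 candles, so three requests of up to 500 rows are enough.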

Response format:

[
1499040000000, # Open time
"0.01634790", # Open
"0.80000000", # High
"0.01575800", # Low
"0.01577100", # Close
"148976.11427815", # Volume
1499644799999, # Close time
"2434.19055334", # Quote asset volume
308, # Number of trades
"1756.87402397", # Taker buy base asset volume
"28.46694368", # Taker buy quote asset volume
"17928899.62484339" # Ignore
]
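
Since prices and volumes arrive as strings, it can help to convert each row into a typed record before analysis. The following is only a sketch: the Kline class and parse_kline function are hypothetical names, with field names taken from the comments in the response format above.

```python
from typing import NamedTuple


class Kline(NamedTuple):
    """One candlestick, with numeric strings converted to floats."""
    open_time: int
    open: float
    high: float
    low: float
    close: float
    volume: float
    close_time: int
    quote_asset_volume: float
    number_of_trades: int
    taker_buy_base_volume: float
    taker_buy_quote_volume: float


def parse_kline(row):
    """Convert one raw 12-item row into a Kline, dropping the final "Ignore" item."""
    return Kline(
        open_time=int(row[0]),
        open=float(row[1]),
        high=float(row[2]),
        low=float(row[3]),
        close=float(row[4]),
        volume=float(row[5]),
        close_time=int(row[6]),
        quote_asset_volume=float(row[7]),
        number_of_trades=int(row[8]),
        taker_buy_base_volume=float(row[9]),
        taker_buy_quote_volume=float(row[10]),
    )


# The sample row shown above.
raw = [
    1499040000000, "0.01634790", "0.80000000", "0.01575800", "0.01577100",
    "148976.11427815", 1499644799999, "2434.19055334", 308,
    "1756.87402397", "28.46694368", "17928899.62484339",
]
kline = parse_kline(raw)
print(kline.open_time, kline.close, kline.number_of_trades)
```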

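We store everything that comes back, however much there is.

Binance intervals

The interval parameter is a string, and a single request is limited to 500 rows. To fetch a longer period we therefore have to page through the range, which requires converting the interval to milliseconds. For example, converting the interval constants defined in the API client to milliseconds: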

from binance.client import Client

print(interval_to_milliseconds(Client.KLINE_INTERVAL_1MINUTE))
print(interval_to_milliseconds(Client.KLINE_INTERVAL_30MINUTE))
print(interval_to_milliseconds(Client.KLINE_INTERVAL_1WEEK))
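
To make the expected values concrete, here is a compact standalone sketch of the same conversion. The name interval_to_ms is hypothetical, and the sketch assumes the client constants are the plain strings "1m", "30m" and "1w" (the "30m" in the generated file name later in this article suggests as much). The monthly interval "1M" has no fixed length, so it yields None here.

```python
SECONDS_PER_UNIT = {"m": 60, "h": 60 * 60, "d": 24 * 60 * 60, "w": 7 * 24 * 60 * 60}


def interval_to_ms(interval):
    """Return the interval length in milliseconds, or None if it cannot be parsed."""
    count, unit = interval[:-1], interval[-1:]
    if unit not in SECONDS_PER_UNIT or not count.isdigit():
        return None
    return int(count) * SECONDS_PER_UNIT[unit] * 1000


for interval in ("1m", "30m", "1w", "1M"):
    print(interval, interval_to_ms(interval))
```

This prints 60000, 1800000 and 604800000 for the first three intervals and None for "1M".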

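Fetching historical klines

With those pieces ready, we can write a function to fetch historical data. Given a time range and an interval, this is straightforward. For example: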

from binance.client import Client

# fetch 1 minute klines for the last day up until now
klines = get_historical_klines("BNBBTC", Client.KLINE_INTERVAL_1MINUTE, "1 day ago UTC")

# fetch 30 minute klines for the last month of 2017
klines = get_historical_klines("ETHBTC", Client.KLINE_INTERVAL_30MINUTE, "1 Dec, 2017", "1 Jan, 2018")

# fetch weekly klines since it listed
klines = get_historical_klines("NEOBTC", Client.KLINE_INTERVAL_1WEEK, "1 Jan, 2017")

The complete code is available in the examples folder of the python-binance project.
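
The paging idea behind such a function can be tried without touching the network. In this sketch fake_get_klines is a hypothetical stand-in for the real endpoint: it returns rows containing only an open time and treats the end time as inclusive, both of which are assumptions made for illustration.

```python
def fake_get_klines(start_ms, end_ms, interval_ms, limit):
    """Stand-in for the real API: up to `limit` rows of [open_time]."""
    rows = []
    t = start_ms
    while t <= end_ms and len(rows) < limit:
        rows.append([t])
        t += interval_ms
    return rows


def fetch_all(start_ms, end_ms, interval_ms, limit=500):
    """Page through the range until a short (or empty) page signals the end."""
    output = []
    while True:
        page = fake_get_klines(start_ms, end_ms, interval_ms, limit)
        output += page
        if len(page) < limit:
            break
        # The next page starts one interval after the last candle received.
        start_ms = page[-1][0] + interval_ms
    return output


interval_ms = 30 * 60 * 1000
rows = fetch_all(1512086400000, 1514764800000, interval_ms)
print(len(rows))  # 1489 with this inclusive fake: pages of 500, 500 and 489
```

Advancing the start time from the last row received, rather than by a fixed step, keeps pages from overlapping even if a page comes back shorter than expected.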

Saving to a file

Once fetched, the data is saved to a file for later use. The code is as follows:

import json
from binance.client import Client

symbol = "ETHBTC"
start = "1 Dec, 2017"
end = "1 Jan, 2018"
interval = Client.KLINE_INTERVAL_30MINUTE

klines = get_historical_klines(symbol, interval, start, end)

# open a file with filename including symbol, interval and start and end converted to milliseconds
with open(
    "Binance_{}_{}_{}-{}.json".format(
        symbol,
        interval,
        date_to_milliseconds(start),
        date_to_milliseconds(end)
    ),
    'w'  # set file write mode
) as f:
    f.write(json.dumps(klines))
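
Reading the file back is the other half of the round trip. The sketch below writes two sample rows (taken from the result shown later in this article) to a temporary file and loads the close prices again; load_close_prices and the sample file name are hypothetical, and column 4 is the close price according to the response format above.

```python
import json
import os
import tempfile

# Two rows copied from the sample result in this article.
sample = [
    [1512086400000, "0.04368400", "0.04375100", "0.04334200", "0.04366500",
     "2081.85600000", 1512088199999, "90.79655078", 3904,
     "976.19100000", "42.59074736", "271480.34213668"],
    [1512088200000, "0.04360200", "0.04369900", "0.04325100", "0.04350100",
     "2420.48100000", 1512089999999, "105.27683806", 2775,
     "1133.24800000", "49.31486895", "271300.32546398"],
]

path = os.path.join(tempfile.mkdtemp(), "Binance_ETHBTC_30m_sample.json")
with open(path, "w") as f:
    json.dump(sample, f)


def load_close_prices(path):
    """Return (open_time, close) pairs from a saved kline JSON file."""
    with open(path) as f:
        klines = json.load(f)
    return [(row[0], float(row[4])) for row in klines]


closes = load_close_prices(path)
print(closes)
```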

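Bonus

These handy functions have been added to python-binance for everyone's convenience.

date_to_milliseconds and interval_to_milliseconds have been added to binance.helpers, and get_historical_klines has been added to binance.client, so you can call them directly. The code is as follows: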

import json
from binance.client import Client

client = Client("", "")

klines = client.get_historical_klines("ETHBTC", Client.KLINE_INTERVAL_30MINUTE, "1 Dec, 2017", "1 Jan, 2018")

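What's next

The data fetched this way can be used in later backtests.

Users of the Kucoin exchange get the same functionality from python-kucoin.

Later there will also be some articles on simple backtesting with pandas and TA-Lib.

Closing remarks

sammchardy is a really cool guy from Australia, and you can follow him on Twitter. His open-source API clients are very convenient to use. For someone with this spirit of sharing, why not send a tip? He provides a donation address, which you can find in the original article.

I'll also include a link to the complete example here:

save_historical_data.py: download it and try running it yourself.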

import time
import dateparser
import pytz
import json

from datetime import datetime
from binance.client import Client


def date_to_milliseconds(date_str):
    """Convert UTC date to milliseconds
    If using offset strings add "UTC" to date string e.g. "now UTC", "11 hours ago UTC"
    See dateparser docs for formats http://dateparser.readthedocs.io/en/latest/
    :param date_str: date in readable format, i.e. "January 01, 2018", "11 hours ago UTC", "now UTC"
    :type date_str: str
    """
    # get epoch value in UTC
    epoch = datetime.utcfromtimestamp(0).replace(tzinfo=pytz.utc)
    # parse our date string
    d = dateparser.parse(date_str)
    # if the date is not timezone aware apply UTC timezone
    if d.tzinfo is None or d.tzinfo.utcoffset(d) is None:
        d = d.replace(tzinfo=pytz.utc)

    # return the difference in time
    return int((d - epoch).total_seconds() * 1000.0)


def interval_to_milliseconds(interval):
    """Convert a Binance interval string to milliseconds
    :param interval: Binance interval string 1m, 3m, 5m, 15m, 30m, 1h, 2h, 4h, 6h, 8h, 12h, 1d, 3d, 1w
    :type interval: str
    :return:
        None if unit not one of m, h, d or w
        None if string not in correct format
        int value of interval in milliseconds
    """
    ms = None
    seconds_per_unit = {
        "m": 60,
        "h": 60 * 60,
        "d": 24 * 60 * 60,
        "w": 7 * 24 * 60 * 60
    }

    unit = interval[-1]
    if unit in seconds_per_unit:
        try:
            ms = int(interval[:-1]) * seconds_per_unit[unit] * 1000
        except ValueError:
            pass
    return ms


def get_historical_klines(symbol, interval, start_str, end_str=None):
    """Get Historical Klines from Binance
    See dateparser docs for valid start and end string formats http://dateparser.readthedocs.io/en/latest/
    If using offset strings for dates add "UTC" to date string e.g. "now UTC", "11 hours ago UTC"
    :param symbol: Name of symbol pair e.g BNBBTC
    :type symbol: str
    :param interval: Binance Kline interval
    :type interval: str
    :param start_str: Start date string in UTC format
    :type start_str: str
    :param end_str: optional - end date string in UTC format
    :type end_str: str
    :return: list of OHLCV values
    """
    # create the Binance client, no need for api key
    client = Client("", "")

    # init our list
    output_data = []

    # setup the max limit
    limit = 500

    # convert interval to useful value in milliseconds
    timeframe = interval_to_milliseconds(interval)

    # convert our date strings to milliseconds
    start_ts = date_to_milliseconds(start_str)

    # if an end time was passed convert it
    end_ts = None
    if end_str:
        end_ts = date_to_milliseconds(end_str)

    idx = 0
    # it can be difficult to know when a symbol was listed on Binance so allow start time to be before list date
    symbol_existed = False
    while True:
        # fetch the klines from start_ts up to max 500 entries or the end_ts if set
        temp_data = client.get_klines(
            symbol=symbol,
            interval=interval,
            limit=limit,
            startTime=start_ts,
            endTime=end_ts
        )

        # handle the case where our start date is before the symbol pair listed on Binance
        if not symbol_existed and len(temp_data):
            symbol_existed = True

        if symbol_existed and temp_data:
            # append this loop's data to our output data
            # (an empty page is skipped so the index below cannot fail)
            output_data += temp_data

            # update our start timestamp using the last value in the array and add the interval timeframe
            start_ts = temp_data[-1][0] + timeframe
        elif not symbol_existed:
            # it wasn't listed yet, increment our start date
            start_ts += timeframe

        idx += 1
        # check if we received less than the required limit and exit the loop
        if len(temp_data) < limit:
            # exit the while loop
            break

        # sleep after every 3rd call to be kind to the API
        if idx % 3 == 0:
            time.sleep(1)

    return output_data


symbol = "ETHBTC"
start = "1 Dec, 2017"
end = "1 Jan, 2018"
interval = Client.KLINE_INTERVAL_30MINUTE

klines = get_historical_klines(symbol, interval, start, end)

# open a file with filename including symbol, interval and start and end converted to milliseconds
with open(
    "Binance_{}_{}_{}-{}.json".format(
        symbol,
        interval,
        date_to_milliseconds(start),
        date_to_milliseconds(end)
    ),
    'w'  # set file write mode
) as f:
    f.write(json.dumps(klines))

Run the example:

python3 save_historical_data.py

Generated file:

Binance_ETHBTC_30m_1512086400000-1514764800000.json
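
Because the file name encodes the symbol, interval and time range, it can be parsed back later when many files accumulate. A small sketch follows; build_filename and parse_filename are hypothetical helpers, and the pattern assumes upper-case alphanumeric symbols and intervals of the form shown in the parameter list.

```python
import re

FILENAME_PATTERN = re.compile(
    r"^Binance_(?P<symbol>[A-Z0-9]+)_(?P<interval>\d+[mhdwM])"
    r"_(?P<start>\d+)-(?P<end>\d+)\.json$"
)


def build_filename(symbol, interval, start_ms, end_ms):
    """Build the file name in the same layout the script uses."""
    return "Binance_{}_{}_{}-{}.json".format(symbol, interval, start_ms, end_ms)


def parse_filename(name):
    """Recover symbol, interval and range from a file name, or None if it does not match."""
    match = FILENAME_PATTERN.match(name)
    if match is None:
        return None
    return {
        "symbol": match.group("symbol"),
        "interval": match.group("interval"),
        "start_ms": int(match.group("start")),
        "end_ms": int(match.group("end")),
    }


name = build_filename("ETHBTC", "30m", 1512086400000, 1514764800000)
print(name)
print(parse_filename(name))
```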

Sample of the result:

[
[
1512086400000,
"0.04368400",
"0.04375100",
"0.04334200",
"0.04366500",
"2081.85600000",
1512088199999,
"90.79655078",
3904,
"976.19100000",
"42.59074736",
"271480.34213668"
],
[
1512088200000,
"0.04360200",
"0.04369900",
"0.04325100",
"0.04350100",
"2420.48100000",
1512089999999,
"105.27683806",
2775,
"1133.24800000",
"49.31486895",
"271300.32546398"
],
[
1512090000000,
"0.04350100",
"0.04379400",
"0.04304900",
"0.04370500",
"2192.51500000",
1512091799999,
"95.48824264",
2359,
"1029.30200000",
"44.86783356",
"271143.81327337"
],
[
1512091800000,
"0.04374900",
"0.04392000",
"0.04361300",
"0.04378100",
"1482.24800000",
1512093599999,
"64.86877196",
2112,
"704.81600000",
"30.85380066",
"271394.84389969"
],
[
1512093600000,
"0.04375800",
"0.04424900",
"0.04364800",
"0.04403800",
"2073.49800000",
1512095399999,
"90.95341447",
2763,
"996.44700000",
"43.72006243",
"271126.63134592"
],
[
1512095400000,
"0.04400500",
"0.04421400",
"0.04380000",
"0.04389900",
"1675.47800000",
1512097199999,
"73.62046216",
1817,
"808.25500000",
"35.52008528",
"271368.11447690"
],
[
1512097200000,
"0.04390400",
"0.04413600",
"0.04364200",
"0.04400500",
"2138.10400000",
1512098999999,
"93.95901243",
2260,
"1157.93700000",
"50.88996640",
"271143.40315253"
]
]
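
Before using saved data in a backtest, a quick sanity check is to confirm that consecutive candles are exactly one interval apart. This is a sketch with a hypothetical find_gaps helper; the open times are the seven from the sample above, and 1800000 ms is the 30-minute interval.

```python
def find_gaps(klines, interval_ms):
    """Return (previous_open, next_open) pairs where candles are not contiguous."""
    gaps = []
    for prev, cur in zip(klines, klines[1:]):
        if cur[0] - prev[0] != interval_ms:
            gaps.append((prev[0], cur[0]))
    return gaps


open_times = [
    1512086400000, 1512088200000, 1512090000000, 1512091800000,
    1512093600000, 1512095400000, 1512097200000,
]
klines = [[t] for t in open_times]  # only the open time matters for this check

print(find_gaps(klines, 1800000))                    # []
print(find_gaps(klines[:2] + klines[3:], 1800000))   # [(1512088200000, 1512091800000)]
```

The second call drops the third candle on purpose to show what a reported gap looks like.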

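Give it a try; the data will come in handy for later backtests.

If you run into an error, try upgrading the cryptography package with the following command: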

pip3 install -U cryptography

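I'll also be writing more original articles on blockchain technology.

Committed to original technical writing; your support encourages xinqiyang to keep creating!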