Tushare技能实战：如何高效获取金融数据并避免API限流

8次阅读

共计 2359 个字符，预计需要花费 6 分钟才能阅读完成。

金融数据开发者使用 Tushare 免费 API 时主要面临三类问题：

请求配额限制：免费版每分钟仅允许 60 次请求，单日限额 500 次，且不同接口共享配额
稳定性挑战：连续高频请求易触发风控，导致 IP 暂时封禁（错误码 429）
数据一致性风险：网络波动可能造成数据缺失，需保证采集的幂等性

采用线程池 + 队列实现可控并发，关键参数：

线程数建议设置为min(4, CPU 核心数 -1)
每个线程维护独立 API token
共享全局请求计数器

from concurrent.futures import ThreadPoolExecutor
import tushare as ts

class ConcurrentFetcher:
    def __init__(self, tokens):
        self.token_ring = itertools.cycle(tokens)
        self.counter = 0

    def fetch_data(self, stock_code):
        pro = ts.pro_api(next(self.token_ring))
        try:
            # 示例：获取日线数据
            df = pro.daily(ts_code=stock_code) 
            self.counter += 1
            return df
        except Exception as e:
            logger.error(f"Failed to fetch {stock_code}: {str(e)}")
            return None

# 使用示例
fetcher = ConcurrentFetcher(['token1','token2'])
with ThreadPoolExecutor(max_workers=4) as executor:
    results = list(executor.map(fetcher.fetch_data, stock_list))

采用 SQLite 实现三级缓存策略：

内存缓存 ：使用functools.lru_cache 缓存最近请求
磁盘缓存 ：SQLite 按[接口名 + 参数] 做哈希存储
持久化备份：每日凌晨导出 CSV 到云存储

import sqlite3
from hashlib import md5

class TushareCache:
    def __init__(self, db_path='tushare_cache.db'):
        self.conn = sqlite3.connect(db_path)
        self._init_db()

    def _init_db(self):
        self.conn.execute('''CREATE TABLE IF NOT EXISTS cache (
            key TEXT PRIMARY KEY,
            data TEXT,
            expire_time INTEGER
        )''')

    def get(self, api_name, params):
        key = self._make_key(api_name, params)
        cursor = self.conn.execute(
            'SELECT data FROM cache WHERE key=? AND expire_time>?',
            (key, int(time.time()))
        )
        if row := cursor.fetchone():
            return json.loads(row[0])
        return None

    def set(self, api_name, params, data, ttl=86400):
        key = self._make_key(api_name, params)
        self.conn.execute('INSERT OR REPLACE INTO cache VALUES (?,?,?)',
            (key, json.dumps(data), int(time.time())+ttl)
        )
        self.conn.commit()

动态调整请求间隔的 PID 控制器：

基础间隔 = 60 秒 / 当前剩余配额
根据最近 5 次请求的响应时间动态修正
遇到 429 错误时自动指数退避

class RequestController:
    def __init__(self):
        self.last_error_time = 0
        self.error_count = 0

    def get_delay(self):
        base_delay = 60 / (MAX_REQUESTS - request_counter)

        # 错误退避机制
        if time.time() - self.last_error_time < 300:
            backoff = min(2 ** self.error_count, 300)
            return base_delay + backoff

        return max(base_delay, 0.5)  # 最低 0.5 秒间隔