Batch mode request for multiple symbols for daily historical data

Hi,

I was looking at the historical data API the other day and realized that only one symbol is allowed per API request.

This works fine when you need historical data for only one symbol.

The problem arises when you have multiple symbols, say all of the NSE 500 stocks, and want to fetch them either as many small individual requests or as batches of symbols packed together into a single request.

The API tier then comes into the picture: only about 10 requests per minute are allowed in the v2 API, which means only 10 symbols can be downloaded in a given minute when looping over them.

At this rate, the NSE universe of 1900+ stocks will take roughly 3 hours or more; please correct me if my math is mistaken here (a quick sanity check is sketched below). This will only increase as the number of stocks grows, and it will be comparably higher for the BSE universe. This also assumes the download goes smoothly at every stage without any external or internal glitches; otherwise it may warrant a restart, or a resume if intelligently handled on the client side.
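
A minimal sanity check of that arithmetic, assuming the 10-requests-per-minute limit described above:

symbols = 1900
requests_per_minute = 10
minutes_needed = symbols / requests_per_minute
print(f"{minutes_needed:.0f} minutes ~= {minutes_needed / 60:.1f} hours")  # 190 minutes ~= 3.2 hours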

I do think that this API tier for historical data is too restrictive, even over-limiting.
At the very least, either the number has to go up, or a batch of symbols (with a cap on the batch size, of course) should be allowed in a single request to cope with the API tier limits.

Even the exchanges themselves do not impose such limits on downloading historical data. Dhan, on the other hand, provides API support, so this is a dependency I would like to keep; I would also like tier limits at the same level as, if not more relaxed than, what the exchanges provide for downloading historical data.

Also, and most importantly: does the historical data include the latest OHLCV (that is, today's snapshot OHLCV) if the end/to date is specified as today?

thanks

cc @Hardik @PravinJ @Pranita @Dhan_Cares

The challenge is that the improvements have to beat what can already be done with the open-source NSE/BSE Python libraries. At the current rate, the API tier limits make it far more sluggish than what is possible with off-the-shelf open-source libraries.

I can demonstrate that downloading 365+ days of historical daily OHLCV for the NSE 500 group of companies from NSE itself (single symbol per request) takes less than 20 minutes,

and for the record, yfinance takes literally less than 20 seconds. Yes, seconds!

and all of this without any caching or persistence to local disk at download time.
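
For reference, the yfinance bulk download mentioned above is a single call over a list of tickers; a minimal sketch, with a shortened ticker list for illustration (NSE symbols carry the ".NS" suffix on Yahoo Finance):

import yfinance as yf

# Shortened list for illustration; a real run would pass all NSE 500 tickers
tickers = ["RELIANCE.NS", "TCS.NS", "HDFCBANK.NS", "INFY.NS"]

# One call fetches daily OHLCV for every ticker, using parallel threads internally
data = yf.download(tickers, period="1y", interval="1d", group_by="ticker", threads=True)
print(data["RELIANCE.NS"].tail())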

So, how does Dhan's historical data download API compare in throughput/performance?

API product team – I would definitely like to know your response on this.

@PravinJ @Hardik @Dhan_Cares @Dhan @Pranita – I am hoping to get attention to the need here

trying one more time … @PravinJ @Dhan @Dhan_Cares @Hardik @Pranita

It is strange that I include so many handles from Dhan and yet only @Hardik responds. Why don't Pranita or Pravin respond? Or why doesn't Dhan/Dhan Cares respond?

Hello @mnikhil

Sorry for missing this thread.

No. You can actually do 10 requests per second. So 2000 stocks will take 200 seconds or roughly three minutes.
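
To put that in code: a minimal client-side throttle could look like the sketch below, where fetch_daily_history is a hypothetical stand-in for whatever single-symbol call is used, paced to roughly 10 requests per second:

import time

def fetch_all(security_ids, fetch_daily_history, max_per_second=10):
    """Fetch all symbols sequentially while staying near the rate limit."""
    results = {}
    for i, sid in enumerate(security_ids, start=1):
        results[sid] = fetch_daily_history(sid)  # hypothetical single-symbol fetch
        if i % max_per_second == 0:
            time.sleep(1)  # pause after every 10 requests to stay at ~10 req/s
    return results  # ~2000 symbols => roughly 200 seconds of pacing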

Also, one of the important considerations here is that this is minute OHLC data. If you fetch in 5-minute or higher timeframes, then there is no such limitation at all.

On responses: I look after APIs as a product here. Although we all try to be active on the community and check each and every post, things do get missed sometimes. The best way to reach Dhan/Dhan Cares is via Chat/Email.

This is for daily historical data, for as long a period as the stock can fetch (10 years, for example).

Does this still apply?

@Hardik also, if I specify the end date as today, is today's data, i.e., the latest OHLCV snapshot for today, also present in the response?

But as per my knowledge, Dhan currently allows fetching only the last 7 days'/1 week's historical data.

Hello @Subhajitpanja @mnikhil

Yes. For Stocks, you can fetch Daily Historical data for companies since inception.
But it doesn't include today's OHLCV.

That 7-day limit is for minute data. Daily data is available for longer durations.


Ok, is there any possible way to provide that, e.g., if specified as an additional parameter in the method/API?

Also, to the original ask: the need is to supply a batch of symbols at once instead of looping over them, so that the response covers multiple symbols (much like what yfinance's download can do for multiple symbols).

No, there is no additional parameter there. You can use market quote APIs for this.

On multiple symbols, yes, I understand the desired API response here. Right now, the APIs are designed to fetch single-symbol data over larger timeframes, while ensuring the data output is not large enough to require gzip.
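
Regarding the market quote suggestion above: a minimal sketch of fetching today's OHLC snapshot, assuming the dhanhq v2 client's ohlc_data method and its segment-to-security-ID payload format (both should be verified against the current API docs):

from dhanhq import dhanhq

dhan = dhanhq("your_client_id", "your_access_token")  # placeholder credentials

# Assumed v2 market quote call: OHLC snapshot keyed by exchange segment
snapshot = dhan.ohlc_data(securities={"NSE_EQ": [1333, 11536]})
print(snapshot)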

Well, we would not have to go the gzip route; the size of the data should not be that big. Or it could be a client-side thing to brute-force all symbols in a loop, checking each individual symbol download for failure.

My main request stems from this point: can I use these APIs to develop my models in a continuous manner? At this point it would seem like probably not, from a throughput perspective, of course!

@Hardik quick question: does it account for stock split or reverse split events, i.e., are they reflected and back-propagated in the OHLC data?

for example:
day 1: download OHLC data
day 2: stock split event materializes
day 3: downloading OHLC data shows different prices reflecting the split

or does it not?

Is it possible for you to give a working code example in a Jupyter notebook that demonstrates and proves this throughput?

Hi @Hardik, any chance you could please respond to this?

Trying my luck here again for a response.

cc @Hardik @Dhan_Cares

Hello @mnikhil

It might not be possible to help you here with working code due to time constraints. I do have code written for fetching intraday minute data in V1; it will require some modifications.

Hope the logic for the same can be useful for you:

import concurrent.futures
import time
import traceback

def fetch_minute_data_for_instrument(dhan, security_id):
    """Fetch intraday minute OHLC data for a single security via the v1 API."""
    try:
        result = dhan.intraday_minute_data(
            security_id=security_id,
            exchange_segment=dhan.NSE,
            instrument_type="EQUITY"
        )
        return result
    except Exception as e:
        return {"status": "failure", "message": str(e)}

def fetch_minute_data_periodically(dhan, interval=1, max_iterations=10):
    instruments = ["1333", "11536", "7229"]  # Security IDs of different instruments
    for i in range(max_iterations):
        # Fan out one request per instrument on a thread pool
        with concurrent.futures.ThreadPoolExecutor() as executor:
            futures = {executor.submit(fetch_minute_data_for_instrument, dhan, instrument): instrument for instrument in instruments}
            for future in concurrent.futures.as_completed(futures):
                instrument = futures[future]
                try:
                    result = future.result()
                    print(f"Iteration {i+1} - Data fetched for instrument {instrument} at {time.strftime('%Y-%m-%d %H:%M:%S')}")
                    print(f"Status: {result.get('status', 'Unknown')}")

                    if result.get('status') == 'failure':
                        print(f"Error message: {result.get('message', 'No error message provided')}")
                        print(f"Full result: {result}")
                        raise Exception(f"Data fetch failed for instrument {instrument}")  # Stop the program on failure
                    elif isinstance(result.get('data'), dict):
                        print(f"Number of data points for {instrument}: {len(result['data'].get('open', []))}")
                    else:
                        print(f"Unexpected data format for {instrument}. Data received: {result.get('data')}")

                    print("---")
                except Exception as e:
                    print(f"Error occurred for instrument {instrument}:")
                    print(f"{type(e).__name__}: {str(e)}")
                    print(traceback.format_exc())
                    raise

        if i < max_iterations - 1:  # Don't sleep after the last iteration
            time.sleep(interval)
I wrote this for stress-testing the API library; you can keep max_iterations as 1 and specify the date range as you do on the historical data APIs.
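
For completeness, a minimal sketch of how the above could be invoked, assuming the dhanhq Python client is installed (the credentials below are placeholders):

from dhanhq import dhanhq

dhan = dhanhq("your_client_id", "your_access_token")  # placeholder credentials

# Single pass over the instrument list, one-second pause between iterations
fetch_minute_data_periodically(dhan, interval=1, max_iterations=1)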