Hi
I am trying to download stock quotes for the S&P 500 stocks from the marketwatch.com website.
However, when I download them, I get only the first 190-200 stocks and then I receive garbage.
I am enclosing my program and its output.
Can you help me resolve this?
Thanks
Rajeev
Attachment: SP500Dnld.zip
Re: Help needed Edward (edk)
Unfortunately, the marketwatch.com website has bot protection.
If the website detects that you are downloading data in bulk and too fast, a captcha appears to confirm that you are a human. That is why you get garbage in the response (HTML and JS code trying to show a captcha image).
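One way to at least make that failure visible in the download loop is to test whether the response still looks like quote data or has turned into that HTML page. A minimal sketch, assuming only that the program can see the response text (the markers checked are guesses, not anything marketwatch.com documents):

// Rough check whether the text returned by the server still looks like
// quote data or has switched to the captcha page described above.
// The markers are guesses - adjust them to whatever the real "garbage" contains.
FUNCTION IsCaptchaPage( cResponse )
RETURN "<html" $ Lower( cResponse ) .OR. "captcha" $ Lower( cResponse )

// Possible use inside the download loop:
//    IF IsCaptchaPage( oHttp:ResponseText )
//       ? "Bot protection triggered - stopping instead of saving garbage."
//       EXIT
//    ENDIF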
Re: Help needed Edward (edk)
Edward
Thanks for the reply.
1) I also thought the website had implemented anti-scraping measures. Do we have to enter a captcha during the download? I can do that.
2) I asked ChatGPT for a solution. It suggested rotating the User-Agent, mimicking requests coming from different browsers. Is there a way we can do that? (A rough sketch follows after this list.)
3) Alternatively, I found that Python can download from Yahoo Finance, which also uses an anti-scraping mechanism to block downloads. Can we integrate Python code into our program?
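Regarding point 2, here is a rough, untested sketch of what rotating the User-Agent could look like, assuming the download goes through an MSXML2.ServerXMLHTTP OLE object; the User-Agent strings and the GetQuote() name are only illustrative:

STATIC FUNCTION GetQuote( cUrl, nRequest )
   LOCAL oHttp, aAgents, cUA

   // A few example browser identifiers; any list will do.
   aAgents := { "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36", ;
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0", ;
                "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15" }

   // Use a different User-Agent for every request.
   cUA := aAgents[ nRequest % Len( aAgents ) + 1 ]

   // CreateObject() needs the OLE support (e.g. hbwin) the program presumably already links.
   oHttp := CreateObject( "MSXML2.ServerXMLHTTP.6.0" )
   oHttp:Open( "GET", cUrl, .F. )
   oHttp:SetRequestHeader( "User-Agent", cUA )
   oHttp:Send()

RETURN oHttp:ResponseText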
Thanks.
Rajeev.
Re: Help needed Edward (edk)
Ad.1 We cannot execute JavaScript using the ServerXMLHTTP object. We would have to emulate a browser in some way, but I don't know how.
Ad.2 It seems to me that this would be sufficient if we wanted to download several dozen links, but not nearly 500 per day. In addition, it seems to me that the web service checks the number of requests from your IP address, and that address does not change.
Ad.3 If the Python code uses its own native libraries and/or classes, it may be difficult or impossible to convert it to Harbour.
I'm sorry, but I can't help with this issue.
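For what it is worth, rather than converting the Python code, it could also be left as an external script and started from Harbour with hb_processRun(), with the quotes read back from its standard output. A minimal, untested sketch, assuming a hypothetical sp500.py that prints one SYMBOL,PRICE line per stock using the yfinance package:

// Run a hypothetical external Python script and capture what it prints.
// "sp500.py" and its CSV output format are assumptions - adapt the
// command line and the parsing to the real script.
FUNCTION QuotesFromPython()
   LOCAL cOut := "", cErr := "", nExit

   nExit := hb_processRun( "python sp500.py", , @cOut, @cErr )

   IF nExit != 0
      ? "Python call failed:", cErr
      RETURN {}
   ENDIF

RETURN hb_ATokens( cOut, hb_eol() )   // one "SYMBOL,PRICE" line per element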
serge_girard
Re: Help needed Edward (edk)
"Data in bulk and too fast": maybe build in a kind of pause/wait and let the program run for some hours?
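A throttled loop along those lines might look roughly like this; the 20-second pause, the URL pattern, and the GetQuote()/SaveQuote() names are only placeholders for what the attached program actually does (500 symbols spaced 20 seconds apart take just under 3 hours):

// Spread the requests out instead of firing them back to back.
FOR EACH cSymbol IN aSymbols
   cQuote := GetQuote( "https://www.marketwatch.com/investing/stock/" + Lower( cSymbol ), ;
                       hb_enumIndex( cSymbol ) )
   SaveQuote( cSymbol, cQuote )   // placeholder for the program's own save routine
   hb_idleSleep( 20 )             // wait 20 seconds before the next request
NEXT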
There's nothing you can do that can't be done...
Re: Help needed Edward (edk)
Thanks, Edward, for providing a satisfying answer to my problem.
Rajeev
Re: Help needed Edward (edk)
Serge, what you suggested was programmed by Edward in another program of mine.
There I had to wait only a few seconds. In the marketwatch case the wait could be a few hours, so it is not practical.
Thanks for the suggestion, though.