Help needed Edward (edk)

Posted: Tue Jan 21, 2025 7:03 am
by RPC
Hi
I am trying to download stock quotes for the S&P 500 stocks from the marketwatch.com website.
However, I can download only the first 190-200 stocks; after that I receive garbage.
I am enclosing my program and its output.
Can you help me resolve this?
Thanks
Rajeev

Re: Help needed Edward (edk)

Posted: Thu Jan 23, 2025 11:49 am
by edk
Unfortunately, the marketwatch.com website has bot protection.
If the website detects that you are downloading data in bulk and too fast, a captcha appears to confirm that you are a human. That is why you get garbage in response (HTML and JS code trying to show a captcha image).
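A download loop could check each response for signs of the captcha page and stop early instead of saving garbage. The following is only a minimal sketch in Python: the function name `looks_like_challenge_page` and the marker string `"captcha"` are assumptions made for illustration, and the real challenge page may need different markers.

```python
from typing import Iterable

# Assumed marker: a substring we guess the captcha page contains.
# The real page may differ, so extend this tuple after inspecting the garbage output.
ASSUMED_MARKERS = ("captcha",)


def looks_like_challenge_page(body: str, markers: Iterable[str] = ASSUMED_MARKERS) -> bool:
    """Return True if the response body contains any of the assumed markers."""
    lowered = body.lower()
    return any(marker.lower() in lowered for marker in markers)
```

If the check returns True, the program can stop and report the last symbol it downloaded successfully, rather than continuing through the remaining symbols.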
[Attachment: SP500-bot protect.jpg]

Re: Help needed Edward (edk)

Posted: Thu Jan 23, 2025 5:30 pm
by RPC
Edward
Thanks for the reply,
1) I also suspected that the website implements anti-scraping measures. Do we have to enter a captcha during the download? I can do that.
2) I asked ChatGPT for a solution. It suggested rotating the User-Agent, so that the requests appear to come from different browsers.
Is there a way we can do that?
3) Alternatively, I found that Python can download from Yahoo Finance, which also uses an anti-scraping mechanism to block downloads. Can we integrate Python code into our program?
Thanks.
Rajeev.

Re: Help needed Edward (edk)

Posted: Fri Jan 24, 2025 10:01 am
by edk
Ad.1 We cannot execute JavaScript using the ServerXMLHTTP object. We would have to emulate the browser in some way, but I don't know how.
Ad.2 It seems to me that this would be sufficient if we wanted to download several dozen links, but not nearly 500 per day. In addition, I think the web service counts the requests coming from each IP address, and this address does not change.
Ad.3 If the Python code uses its own native libraries and/or classes, it may be difficult or impossible to convert it to Harbour.
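
For reference, the User-Agent rotation mentioned in Ad.2 could look roughly like the Python sketch below. The User-Agent strings and the function name `build_request` are hypothetical and used only for illustration. The sketch only builds the request objects; it does not contact any server.

```python
import itertools
import urllib.request

# Hypothetical User-Agent strings, for illustration only.
ASSUMED_USER_AGENTS = [
    "ExampleBrowser/1.0",
    "ExampleBrowser/2.0",
    "OtherBrowser/5.1",
]

# Endless iterator that walks through the list and starts over at the end.
_agent_cycle = itertools.cycle(ASSUMED_USER_AGENTS)


def build_request(url: str) -> urllib.request.Request:
    """Create a request whose User-Agent header is the next one in the rotation."""
    return urllib.request.Request(url, headers={"User-Agent": next(_agent_cycle)})
```

Each request would then be passed to `urllib.request.urlopen`. Note that only the header changes: the requests still come from the same IP address, so this does not address the per-IP counting described in Ad.2.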

I'm sorry, but I can't help with this issue.

Re: Help needed Edward (edk)

Posted: Fri Jan 24, 2025 4:33 pm
by serge_girard
data in bulk and too fast: maybe build in a kind of pause/wait and let the program run for some hours?
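
A pause/wait of this kind might be sketched in Python as follows. The function name `download_with_pauses`, the `fetch` callback and the delay values are all assumptions for illustration. Which delay, if any, keeps the captcha from appearing is not known from this thread.

```python
import random
import time
from typing import Callable, Dict, Iterable


def download_with_pauses(
    symbols: Iterable[str],
    fetch: Callable[[str], str],
    base_delay: float = 5.0,   # assumed value, in seconds
    jitter: float = 2.0,       # assumed value, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, str]:
    """Fetch each symbol in turn, pausing between consecutive requests."""
    results: Dict[str, str] = {}
    for index, symbol in enumerate(symbols):
        if index > 0:
            # Wait base_delay plus a random extra so requests are not evenly spaced.
            sleep(base_delay + random.uniform(0.0, jitter))
        results[symbol] = fetch(symbol)
    return results
```

With the assumed defaults the pause averages 6 seconds, so 500 symbols would take roughly 50 minutes (499 pauses of about 6 seconds each). A longer `base_delay` stretches the run proportionally.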

Re: Help needed Edward (edk)

Posted: Fri Jan 24, 2025 6:23 pm
by RPC
Thanks, Edward, for providing a satisfying answer to my problem.
Rajeev

Re: Help needed Edward (edk)

Posted: Fri Jan 24, 2025 6:26 pm
by RPC
Serge, what you suggested was already programmed by Edward in another program of mine.
There I had to wait only a few seconds. In the marketwatch case the wait may be a few hours, so it is not practical.
Thanks for the suggestion, though.