Hi
I am trying to download stock quotes for the S&P 500 stocks from the marketwatch.com website.
However, when I download them, I get only the first 190-200 stocks and then I receive garbage.
I am enclosing my program and its output.
Can you help me resolve this?
Thanks
Rajeev
Attachment: SP500Dnld.zip
Re: Help needed Edward (edk)
Unfortunately, the marketwatch.com website has bot protection.
If the website detects that you are downloading data in bulk and too fast, a captcha appears to confirm that you are a human. That is why you get garbage in the response (HTML and JS code trying to show a captcha image).
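One way to at least make that failure visible in the download loop is to test whether the response still looks like quote data or has turned into that HTML page. A minimal sketch, assuming only that the program can see the response text (the markers checked are guesses, not anything marketwatch.com documents):

// Rough check whether the text returned by the server still looks like
// quote data or has switched to the captcha page described above.
// The markers are guesses - adjust them to whatever the real "garbage" contains.
FUNCTION IsCaptchaPage( cResponse )
RETURN "<html" $ Lower( cResponse ) .OR. "captcha" $ Lower( cResponse )

// Possible use inside the download loop:
//    IF IsCaptchaPage( oHttp:ResponseText )
//       ? "Bot protection triggered - stopping instead of saving garbage."
//       EXIT
//    ENDIF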
Re: Help needed Edward (edk)
Edward
Thanks for the reply.
1) I also thought the website had implemented anti-scraping measures. Do we have to enter a captcha during the download? I can do that.
2) I asked ChatGPT for a solution. It suggested rotating the User-Agent, mimicking requests coming from different browsers. Is there a way we can do that? (A rough sketch follows after this list.)
3) Alternatively, I found that Python can download from Yahoo Finance, which also uses an anti-scraping mechanism to block downloads. Can we integrate Python code into our program?
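Regarding point 2, here is a rough, untested sketch of what rotating the User-Agent could look like, assuming the download goes through an MSXML2.ServerXMLHTTP OLE object; the User-Agent strings and the GetQuote() name are only illustrative:

STATIC FUNCTION GetQuote( cUrl, nRequest )
   LOCAL oHttp, aAgents, cUA

   // A few example browser identifiers; any list will do.
   aAgents := { "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36", ;
                "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:121.0) Gecko/20100101 Firefox/121.0", ;
                "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/17.0 Safari/605.1.15" }

   // Use a different User-Agent for every request.
   cUA := aAgents[ nRequest % Len( aAgents ) + 1 ]

   // CreateObject() needs the OLE support (e.g. hbwin) the program presumably already links.
   oHttp := CreateObject( "MSXML2.ServerXMLHTTP.6.0" )
   oHttp:Open( "GET", cUrl, .F. )
   oHttp:SetRequestHeader( "User-Agent", cUA )
   oHttp:Send()

RETURN oHttp:ResponseText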
Thanks.
Rajeev.
Re: Help needed Edward (edk)
Ad.1 We cannot execute JavaScript using the ServerXMLHTTP object. We would have to emulate a browser in some way, but I don't know how.
Ad.2 It seems to me that this would be sufficient if we wanted to download several dozen links, but not nearly 500 per day. In addition, it seems to me that the web service checks the number of requests from your IP address, and that address does not change.
Ad.3 If the Python code uses its own native libraries and/or classes, it may be difficult or impossible to convert it to Harbour.
I'm sorry, but I can't help with this issue.
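For what it is worth, rather than converting the Python code, it could also be left as an external script and started from Harbour with hb_processRun(), with the quotes read back from its standard output. A minimal, untested sketch, assuming a hypothetical sp500.py that prints one SYMBOL,PRICE line per stock using the yfinance package:

// Run a hypothetical external Python script and capture what it prints.
// "sp500.py" and its CSV output format are assumptions - adapt the
// command line and the parsing to the real script.
FUNCTION QuotesFromPython()
   LOCAL cOut := "", cErr := "", nExit

   nExit := hb_processRun( "python sp500.py", , @cOut, @cErr )

   IF nExit != 0
      ? "Python call failed:", cErr
      RETURN {}
   ENDIF

RETURN hb_ATokens( cOut, hb_eol() )   // one "SYMBOL,PRICE" line per element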
serge_girard
Re: Help needed Edward (edk)
"Data in bulk and too fast": maybe build in a kind of pause/wait and let the program run for some hours?
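A throttled loop along those lines might look roughly like this; the 20-second pause, the URL pattern, and the GetQuote()/SaveQuote() names are only placeholders for what the attached program actually does (500 symbols spaced 20 seconds apart take just under 3 hours):

// Spread the requests out instead of firing them back to back.
FOR EACH cSymbol IN aSymbols
   cQuote := GetQuote( "https://www.marketwatch.com/investing/stock/" + Lower( cSymbol ), ;
                       hb_enumIndex( cSymbol ) )
   SaveQuote( cSymbol, cQuote )   // placeholder for the program's own save routine
   hb_idleSleep( 20 )             // wait 20 seconds before the next request
NEXT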
There's nothing you can do that can't be done...
Re: Help needed Edward (edk)
Thanks, Edward, for providing a satisfying answer to my problem.
Rajeev
Re: Help needed Edward (edk)
Serge, what you suggested was programmed by Edward in another program of mine.
There I had to wait only a few seconds. In the marketwatch case the wait could be a few hours, so it is not practical.
Thanks for the suggestion, though.