Help needed Edward (edk)

General Help regarding HMG, Compilation, Linking, Samples

Moderator: Rathinagiri

RPC
Posts: 304
Joined: Fri Feb 10, 2017 4:12 am
DBs Used: DBF

Post by RPC »

Hi
I am trying to download stock quotes for the S&P 500 stocks from the marketwatch.com website.
However, I can download only the first 190-200 stocks; after that I receive garbage.
I am enclosing my program and its output.
Can you help me resolve this?
Thanks
Rajeev
Attachment: SP500Dnld.zip (10.34 KiB)
edk
Posts: 999
Joined: Thu Oct 16, 2014 11:35 am
Location: Poland

Post by edk »

Unfortunately, the marketwatch.com website has bot protection.
If the website detects that you are downloading data in bulk and too fast, a captcha appears asking you to confirm that you are human. That is why you get garbage in response (HTML and JS code trying to show a captcha image).
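
A download loop can check each response for this and stop instead of saving the garbage. The following is a minimal Python sketch, not code from the attached program: the function names are hypothetical, and the test for the word "captcha" is an assumption about what the challenge page contains, so the real page may need a different marker.

```python
def looks_like_block_page(body: str) -> bool:
    """Heuristic check for a bot-protection page.

    Assumption: the challenge page mentions "captcha" somewhere in its
    HTML/JS, while a normal quote response does not.
    """
    return "captcha" in body.lower()


def first_blocked_index(bodies):
    """Return the index of the first response that looks blocked, or None."""
    for index, body in enumerate(bodies):
        if looks_like_block_page(body):
            return index
    return None
```

Stopping at the first blocked response at least shows exactly where the protection kicked in, rather than filling the rest of the output with HTML.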
Attachment: SP500-bot protect.jpg (204.58 KiB)

Post by RPC »

Edward
Thanks for the reply.
1) I also thought the website implements anti-scraping measures. Do we have to enter a captcha during the download? I can do that.
2) I asked ChatGPT for a solution. It suggested rotating the User-Agent, so that the requests appear to come from different browsers.
Is there a way we can do that?
3) Alternatively, I found that Python can download from Yahoo Finance, which also uses an anti-scraping mechanism to block downloads. Can we integrate Python code in our program?
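
For reference, the User-Agent rotation from point 2 could look roughly like this in Python. This is only a sketch using the standard urllib.request module: the User-Agent strings and the example.com URLs are placeholders, build_requests is a hypothetical helper, and nothing is actually sent. It changes only the header of each request, not the IP address the requests come from.

```python
import itertools
import urllib.request

# Placeholder User-Agent strings (assumptions, not real browser versions).
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) ExampleBrowser/1.0",
    "Mozilla/5.0 (X11; Linux x86_64) ExampleBrowser/2.0",
]


def build_requests(urls, user_agents=USER_AGENTS):
    """Build one Request per URL, cycling through the User-Agent list."""
    agents = itertools.cycle(user_agents)
    return [
        urllib.request.Request(url, headers={"User-Agent": next(agents)})
        for url in urls
    ]
```

Each Request object can then be passed to urllib.request.urlopen() to perform the download.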
Thanks.
Rajeev.

Post by edk »

Ad.1 We cannot execute JavaScript using the ServerXMLHTTP object. We would have to emulate the browser in some way, but I don't know how.
Ad.2 It seems to me that this would be sufficient if we wanted to download a few dozen links, but not nearly 500 per day. In addition, I think the web service counts the requests coming from each IP address, and that address does not change.
Ad.3 If the Python code uses its own native libraries and/or classes, it may be difficult or impossible to convert it to Harbour.

I'm sorry, but I can't help with this issue.
serge_girard
Posts: 3309
Joined: Sun Nov 25, 2012 2:44 pm
DBs Used: 1 MySQL - MariaDB
2 DBF
Location: Belgium

Post by serge_girard »

Data in bulk and too fast: maybe build in a kind of pause/wait and let the program run for some hours?
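
A rough Python sketch of such a pause/wait loop is shown here. The delay range is a guess, since the thread does not say what request rate marketwatch.com tolerates, and fetch_all_throttled and its fetch callback are hypothetical names standing in for whatever the real program uses to download one symbol.

```python
import random
import time


def fetch_all_throttled(symbols, fetch, min_delay=20.0, max_delay=40.0,
                        sleep=time.sleep):
    """Call fetch(symbol) for each symbol, pausing between calls.

    The pause is a random value between min_delay and max_delay seconds
    (assumed values), so the requests are not evenly spaced.
    """
    results = {}
    for index, symbol in enumerate(symbols):
        if index > 0:
            # No pause is needed before the very first request.
            sleep(random.uniform(min_delay, max_delay))
        results[symbol] = fetch(symbol)
    return results
```

With about 500 symbols and an average pause of 30 seconds, one run takes roughly four hours (499 pauses of 30 s is about 4.2 hours), so this only suits a job that can be left running unattended.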
There's nothing you can do that can't be done...

Post by RPC »

Thanks, Edward, for providing a satisfying answer to my problem.
Rajeev

Post by RPC »

Serge, what you suggested was programmed by Edward in another program of mine.
There I had to wait only a few seconds. In the marketwatch case the wait may be a few hours, so it is not practical.
Thanks for the suggestion though.