Grabbing data from a web page

Wisermans · June 12, 2021, 12:45pm

Hello, I’m playing with xpath/3, which works fine on data.un.org for example, but it seems that what I see with F12 in Edge can’t be grabbed the same way on a page like ACCOR € 34.94 (euronext.com). How can I get, for example, the quote price that is in the header at //*[@id="header-instrument-price"] and does not appear in my DOM when I capture it in a file?

CapelliC · June 12, 2021, 4:00pm

Web scraping has changed a lot with the spread of AJAX, that is, dynamic content generated by JavaScript.
The content of the link you posted was probably generated by some JavaScript framework.
Personally, some time ago I had to scrape a lot of dynamically generated data, and I used Puppeteer. A big change from the tools I was using before, among them library(xpath)…

Wisermans · June 12, 2021, 5:02pm

Yup, I saw that (UN data is static compared to most new web platforms). My aim is to try to stay in a Prolog environment, though in many situations the reply seems to be to use another language platform and to use Prolog as glue rather than as a full development platform. I could also follow the trend toward C#, given the huge investments Microsoft has made in its libraries. In fact my question is, let’s say, about Prolog’s development strategy = reverse the trend and make the request from Prolog … which is also why I opened this discussion based on getting just one figure from a whole page … Any idea on how to stay in Prolog and get that figure?

CapelliC · June 12, 2021, 5:10pm

This sounds interesting, could you share some tips? I ask because, from what I heard, M$ embraced Chromium for its browser. Puppeteer is also Chromium based…

Wisermans · June 12, 2021, 5:39pm

Well, if you look at Microsoft’s strategy since the beginning (and I knew them, being one of their strategic OEM partners when they were still a small company), their approach has always been to capture markets by going around when they fail to get in through the front door. For example, C# came about after they ran into problems with their JIT compiler while trying to make their own flavor of Java. More recently they made a smart move with LinkedIn, they bought GitHub to renew links with developers, they are pushing Python by hiring its creator because it is a good way to recenter it around themselves, they are pushing Visual Studio Code, etc. If you look at recent moves, they are getting back to JavaScript, as 1/ half of their own platforms are still using it and 2/ thanks to their new “open source” image they are no longer considered evil. But they are the “M” in GAFAM, so not angels either: a huge group that is here to make money, even if they have more competitors than they had 20 years ago. Embracing Chromium has also been a very smart strategic move, as Edge is now not considered a Microsoft hell but is reinforced as a standard platform. Keep in mind that Microsoft makes money on its cloud platforms. The browser is just a door to an online world, so anything that can help virtualize services is good for Microsoft. Ballmer’s war against open source and Linux is a thing of the past. Windows grew even bigger and (soon) all of that will certainly even be virtualized, as it is much simpler for companies not to have to manage their terminals at all. Back to the old times, but now with intelligent terminals. Even games are following the trend, and Microsoft is thinking about getting back to end users that way too, with TV etc.
As Steve Jobs said, his huge mistake was not believing in smartphones … though at that time he was leading the wave … To get back to Prolog, my personal feeling is that the way out of its ghetto is for it to be able to live as a real platform = being able to solve my figure problem (= having a bridge or similar that enables it, instead of just being the AI glue), and I am going to add another post about DDE too.

Wisermans · June 13, 2021, 5:50am

@CapelliC an example of VS Code embracing Puppeteer and a link to Puppeteer Sharp. Then follows my question, as you said that you moved to Puppeteer = how would you get the quote value from my Accor Euronext example within SWI-Prolog using Puppeteer? (Not by capturing a figure on screen but the “DOM dump” way, as it seems that Puppeteer can capture it the same way I get it on screen with F12 in Edge.)

Wisermans · June 13, 2021, 11:05am

@CapelliC After looking at Puppeteer to understand what it does and how the web works “under the hood” … I found a solution to my “grabbing data” problem by looking at what goes over the wire = in fact there are simple HTTP requests from the Ajax part …

In my Euronext example I have several links depending on the data, be it https://live.euronext.com/en/ajax/getIntradayPrice/FR0000120404-XPAR or some others (address + data requested + ID of the instrument), each returning a DOM on which I can use XPath. That way, I stay 100% within SWI-Prolog predicates: http_open + load_html + xpath …
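
A minimal sketch of that http_open + load_html + xpath chain, under stated assumptions: the predicate names fetch_dom/2 and text_by_id/3 are made up for illustration, and the thread does not show the markup returned by the AJAX endpoint, so the element to select has to be found with F12 first.

```prolog
:- use_module(library(http/http_open)).  % http_open/3
:- use_module(library(sgml)).            % load_html/3
:- use_module(library(xpath)).           % xpath/3

% fetch_dom(+URL, -DOM): download URL and parse the response as HTML.
% (Hypothetical helper name, not from the thread.)
fetch_dom(URL, DOM) :-
    setup_call_cleanup(
        http_open(URL, In, []),
        load_html(stream(In), DOM, []),
        close(In)).

% text_by_id(+DOM, +Id, -Text): whitespace-normalized text of any element
% whose id attribute is Id. '*' matches any element name in library(xpath).
% (Hypothetical helper name, not from the thread.)
text_by_id(DOM, Id, Text) :-
    xpath(DOM, //'*'(@id=Id, normalize_space), Text).
```

A query could then look like the following. The id is the one from the first post; whether the AJAX fragment actually carries that id is an assumption to check in the browser's network tab.

```prolog
?- fetch_dom('https://live.euronext.com/en/ajax/getIntradayPrice/FR0000120404-XPAR', DOM),
   text_by_id(DOM, 'header-instrument-price', Price).
```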

Methodology = F12 in my Edge browser to look at what “goes over the wire to the browser”, collect the addresses, the DOM and the XPath expressions … make an inventory of what is available, then get the data.

To secure the quality of the financial data: 1/ the same can be done on different websites and the results compared, checking who the feed provider is, in order to avoid a single provider’s mistakes; 2/ some data can also be compared with the Excel financial data feed provided with Office 365.

PS: The same way UN country data are indexed with ISO-3166 codes, financial data need a dictionary based on ISIN + alpha code (some others use Reuters or Bloomberg codes too). I suppose that I will certainly need to add some requests with headers and so on to get more specific filtering, but for now I have solved my initial request.
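
If headers do turn out to be needed, http_open/3 accepts them as options. A small sketch, assuming purely illustrative header values (nothing in this thread says which headers, if any, the Euronext endpoints expect); browser_like_options/2 and fetch_dom_with_headers/3 are hypothetical names.

```prolog
:- use_module(library(http/http_open)).  % http_open/3
:- use_module(library(sgml)).            % load_html/3

% browser_like_options(+Referer, -Options): http_open/3 options that add
% a few request headers. All header values here are assumptions chosen
% for illustration, not values required by any particular site.
browser_like_options(Referer,
                     [ user_agent('Mozilla/5.0'),
                       request_header('Accept'='text/html'),
                       request_header('Referer'=Referer)
                     ]).

% fetch_dom_with_headers(+URL, +Referer, -DOM): same as a plain
% http_open + load_html fetch, but sending the headers above.
fetch_dom_with_headers(URL, Referer, DOM) :-
    browser_like_options(Referer, Options),
    setup_call_cleanup(
        http_open(URL, In, Options),
        load_html(stream(In), DOM, []),
        close(In)).
```

Keeping the options in a separate predicate makes it easy to compare, header by header, with what F12 shows the browser sending.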

Wisermans · June 14, 2021, 7:49pm

As I’m thinking about how to go ahead with grabbing pages, and in case it can be useful … another link that I am going to look at: AJAX The XMLHttpRequest Object (w3schools.com)
