Hello, I'm playing with xpath/3, which works fine on, for example, data.un.org, but it seems that what I see with F12 in Edge can't be grabbed the same way on a page like ACCOR € 34.94 (euronext.com). How can I get, for example, the quote price that is in the header at //*[@id="header-instrument-price"] and does not appear in the DOM when I capture the page to a file?
Personally, some time ago I had to scrape a lot of dynamically generated data, and I used Puppeteer. A big change of direction from the tools I was using before, among them library(xpath)…
Yup, I saw that (UN data is static compared to most new web platforms). My aim is to try to stay in a Prolog environment, though in many situations the reply seems to be to use another language platform and to use Prolog as glue rather than as a full development platform. I could also follow the trend towards C#, given the huge investment Microsoft has made in its libraries. In fact my question is, let's say, about Prolog's development strategy: reverse the trend and make the requests from Prolog itself. That is also why I opened this discussion around getting just one figure from a whole page. Any idea how to stay in Prolog and get that figure?
This sounds interesting, could you share some tips? I ask because, from what I heard, M$ embraced Chromium for its browser. Puppeteer is also Chromium based…
@CapelliC here is an example of VS Code embracing Puppeteer, and a link to Puppeteer Sharp. My question follows from that: since you said you moved to Puppeteer, how would you get the quote value from my Accor Euronext example from within SWI-Prolog using Puppeteer? (Not by capturing a figure on screen but the "DOM dump" way, since it seems Puppeteer can capture the DOM the same way I see it with F12 in Edge.)
@CapelliC After looking at Puppeteer to understand what it does and how the web works "under the hood", I found a solution to my "grabbing data" problem by looking at what goes over the wire: in fact there are simple HTTP requests made by the Ajax part…
In my Euronext example I have several links depending on the data wanted, for example https://live.euronext.com/en/ajax/getIntradayPrice/FR0000120404-XPAR or some others (address + data requested + ID of the instrument), each returning a DOM on which I can use XPath. That way I stay 100% within SWI-Prolog, using the predicates http_open + load_html + xpath…
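For anyone wanting to try the same thing, here is a minimal sketch of that http_open + load_html + xpath chain. The predicate names (intraday_url/3, fetch_dom/2, cell_text/2) are mine, and the assumption that the fragment holds its figures in td cells is only a guess: check the real response with F12 and adapt the XPath. The endpoint may also change or refuse requests that lack browser headers.

```prolog
:- use_module(library(http/http_open)).
:- use_module(library(sgml)).
:- use_module(library(xpath)).

% intraday_url(+ISIN, +Market, -URL)
% Build the Ajax address from the instrument ID parts (address + ISIN-Market).
intraday_url(ISIN, Market, URL) :-
    format(atom(URL),
           'https://live.euronext.com/en/ajax/getIntradayPrice/~w-~w',
           [ISIN, Market]).

% fetch_dom(+URL, -DOM)
% Download the HTML fragment and parse it; the stream is always closed.
fetch_dom(URL, DOM) :-
    setup_call_cleanup(
        http_open(URL, In, []),
        load_html(stream(In), DOM, []),
        close(In)).

% cell_text(+DOM, -Text)
% Enumerate, on backtracking, the whitespace-normalised text of each cell.
% ASSUMPTION: the data sits in <td> elements; adjust to the real markup.
cell_text(DOM, Text) :-
    xpath(DOM, //td(normalize_space), Text).
```

Usage would be something like `intraday_url('FR0000120404', 'XPAR', U), fetch_dom(U, DOM), forall(cell_text(DOM, T), writeln(T)).`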
Methodology = F12 in my Edge browser to look at what "goes over the line to the browser", get the addresses, the DOM and the XPath, make an inventory of what is available, then get the data.
To secure the quality of the financial data: 1/ the same thing can be done on different internet sites and the results compared, checking who the feed provider is, in order to avoid one provider's mistakes; 2/ some data can also be compared with the financial feed data that Excel provides with Office 365.
PS: In the same way that UN country data are indexed with ISO-3166 codes, financial data need a dictionary based on ISIN + alpha code (some others use Reuters or Bloomberg codes too). I suppose I will certainly need to add some requests with headers and so on to get more specific filtering, but for now my initial request is solved.
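A rough sketch of what such a dictionary and a request with headers could look like. The instrument/3 table and both predicate names are hypothetical, and the header values are placeholders I have not checked against what Euronext actually expects; only the user_agent/1 and request_header/1 options of http_open/3 are standard.

```prolog
:- use_module(library(http/http_open)).

% instrument(?ISIN, ?Market, ?Name)
% Hypothetical dictionary table; one fact per instrument.
instrument('FR0000120404', 'XPAR', 'ACCOR').

% instrument_url(+Name, -URL)
% Look the instrument up in the dictionary and build its Ajax address.
instrument_url(Name, URL) :-
    instrument(ISIN, Market, Name),
    format(atom(URL),
           'https://live.euronext.com/en/ajax/getIntradayPrice/~w-~w',
           [ISIN, Market]).

% open_with_headers(+URL, -In)
% Open the URL sending extra request headers.
% ASSUMPTION: these header values are placeholders, not known requirements.
% The caller is responsible for closing In.
open_with_headers(URL, In) :-
    http_open(URL, In,
              [ user_agent('Mozilla/5.0 (placeholder)'),
                request_header('Accept'='text/html'),
                request_header('X-Requested-With'='XMLHttpRequest')
              ]).
```

Keeping the codes in facts means the same table can later carry Reuters or Bloomberg codes as extra arguments or as separate predicates.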
As I'm thinking about how to go ahead with grabbing pages, and in case it is useful, here is another link I am going to look at: AJAX The XMLHttpRequest Object (w3schools.com)