Question about domain knowledge. Porting info from Excel to Prolog

in 1 project our domain knowledge-experts deliver their knowledge in a format and tool which they have choosen by themselves, and in this case they chose to work in ms Excel. in 1 excel sheet they placed data in several different colomns with arbitrary rows starts and a bold colomn headers, also they use background- Colors in cells so that cells with the same background-color have a connection/relation. So this means the programmers in the project have to parse binary-excel files formats, because when saved or exported to Csv the colors of these cells are lost.

Is this normal that you have parse and cope with .xlsx files directly?. Before i could read the converted to .Csv files, but now that is not possible because the cell colors are only present in the excel file itself

thankyou

That depends. In this case a color for a cell is being used to represent knowledge and you want to use the piece of knowledge then one of the most practical ways to get the color of a cell is to use one of the Excel APIs to get at the color.

A long time ago when I use to work with Microsoft Office products for which Microsoft Excel is one, the use of VBA (Visual Basic for Applications) was the way to go. (ref) The other was was to use one of the programming languages based on .Net and access the Excel via the API.

Another option would be to create a shadow set of the information. What I mean by that is there would be another page or section of cells that hold the mapping information represented by the colors that is in text format and can be exported as a separate CSV file. The problem with this is that if the colors and shadow set are different it can/will lead to bugs.

HTH

dear Eric, thankyou for your info.

in many cases the so called domain experts allways choose whatever format they like or what is conveniant to them and never adhere to parsable formats. All the rest is the task of the programmer to parse whatever the domain expert decides to intend in whatever format. So in this case i must use Visual Basic for Applications which exists only in paid license? or is there a free version of VBA. Thankyou for the information

Sorry for not noting cost. I have not used VBA in decades and when I did I did own purchased copies of MS Office so VBA was included.

If this were my problem and I did not want to purchase Excel to get VBA but still needed to access the color information of the cells then using one of the .Net languages such as C# or F# along with the Office interop objects would be the next way I would investigate. The .Net family of tools are free for community edition. The Office primary interop assemblies are a separate install.

HTH

thankyou for the information

1 Like

No. Demand a sensible format, e.g. JSON via HTTPS API or file.

It is annoying to have to deal with this but it is not unheard of. You can try to slowly help the expert to start using a more sane process but at least at the beginning you are stuck with what they have chosen. I have had to use excel spreadsheets from different clients, from biologists to just business people.

The way out is to get a library that deals with the file format. Newer excels are not binary, they are some XML I thought but that doesn’t make it any easier. You need some Perl/Python/Java library to get the information out of the excel and then you can move on.

What does become a problem sometimes is that the format they have somehow adopted is not unambiguous.

PHPSpreadsheet usually does a good job both at reading and writing several different file formats. It’s rather easy to use…

But to stay in Prolog, you could directly open the xlsx files. They are just XML files in a compressed folder. Just rename the .xlsx file adding .zip to it and look into for your data…

2 Likes

yes that is correctly stated, for example in this case there are colomns with 1 header in this column, freely placed on different row beginnings, which makes it indeed not a 1 way implement, the info Is ambiguous. i asked the domain expert if i had to parse this color excel file, and then they thought i was trying to make fun of them, and that on purpose i was distorting their valuable input for the project

thankyou this is great to know, php has in this case a tool which reads large parts these files, great that is the easiest way i believe

Since you are open to other means and seem want to stay with SWI-Prolog as much as possible there is

https://www.swi-prolog.org/pack/list?p=odf_sheet

Load and analyze ODF spreadsheets (.ods)

As I have never used this SWI-Prolog pack, this is not a recommendation.


Yes Excel can be saved as ods format.

great idea indeed, go from excel to Ods or Odf in which way or whatsoever and then the rest with swi prolog

1 Like

I must say that it is a pleasure to help you and answer your questions because you only bring the facts needed to solve the problem, listen to all, are willing to change as needed and nicely give more information to clarify what you need. :slightly_smiling_face:

though i can almost assure that the domain expert will never take the energie to save the file as Odt or Odf ( from excel or from open office ) and they see that as a part of the programmers job, so as long as you dont make a batch convert to xml or odf of 1 folder of excel files you will eventually end up performing save-as tasks a lot of times because they can come up with newer versions every day or every 10 minutes