This is related to the post "Making a large Prolog-based knowledge base for bioinformatics/epidemiology".
Here are some examples of the predicates used when parsing binary data files. binary.pl (9.9 KB)
get_byte(Stream,Address,Word) :-
    seek(Stream,Address,bof,_New_address),
    get_byte(Stream,Word).

get_byte(Stream,Address0,Address,Byte) :-
    get_byte(Stream,Address0,Byte),
    Address is Address0 + 16'01.

get_word(Stream,Endian,Word) :-
    get_byte(Stream,Byte_0),
    get_byte(Stream,Byte_1),
    (
        Endian == big
    ->
        Word #= (Byte_0 << 8) + Byte_1
    ;
        Word #= (Byte_1 << 8) + Byte_0
    ).

get_word(Stream,Endian,Address,Word) :-
    seek(Stream,Address,bof,_New_address),
    get_word(Stream,Endian,Word).

get_word(Stream,Endian,Address0,Address,Word) :-
    get_word(Stream,Endian,Address0,Word),
    Address is Address0 + 16'02.
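
The BGEN example further down calls get_doubleword/5 and get_bytes/5, which live in binary.pl but are not shown in this post. As a rough sketch only, assuming they follow the same pattern as get_word above, they might look like the code below. The bodies are my guess rather than a copy of binary.pl, and I use is/2 instead of #= so the snippet loads without CLP(FD).

```prolog
% Hypothetical sketch: NOT the actual contents of binary.pl.

% Read four bytes at the current position and combine them into a
% 32-bit unsigned value. Anything other than big is treated as little,
% mirroring get_word/3 above.
get_doubleword(Stream, Endian, Doubleword) :-
    get_byte(Stream, Byte_0),
    get_byte(Stream, Byte_1),
    get_byte(Stream, Byte_2),
    get_byte(Stream, Byte_3),
    (   Endian == big
    ->  Doubleword is (Byte_0 << 24) + (Byte_1 << 16) + (Byte_2 << 8) + Byte_3
    ;   Doubleword is (Byte_3 << 24) + (Byte_2 << 16) + (Byte_1 << 8) + Byte_0
    ).

% Same, but seek to an absolute Address first.
get_doubleword(Stream, Endian, Address, Doubleword) :-
    seek(Stream, Address, bof, _New_address),
    get_doubleword(Stream, Endian, Doubleword).

% Same, and also return the address just past the value read.
get_doubleword(Stream, Endian, Address0, Address, Doubleword) :-
    get_doubleword(Stream, Endian, Address0, Doubleword),
    Address is Address0 + 4.

% Read Count raw bytes starting at Address0 as a list of integers.
% With Count = 0 the result is [].
get_bytes(Stream, Address0, Address, Count, Bytes) :-
    seek(Stream, Address0, bof, _New_address),
    length(Bytes, Count),
    maplist(get_byte(Stream), Bytes),
    Address is Address0 + Count.
```

Threading Address0/Address through each call is what lets the parser chain reads (Offset0, Offset1, Offset2, ...) without tracking the stream position separately. Note that seek/4 needs a repositionable stream, so this works on a file opened with type(binary) but not on a pipe.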
Here is an example that parses the start and header of a BGEN file (format).
Example BGEN file (link) - File name: complex.10bits.bgen
% -----------------------------------------------------------------------------
:- module(bgen_dcg,
    [
    ]).
% -----------------------------------------------------------------------------
:- use_module(library(dcg/basics)).
:- use_module('D:/binary.pl').
% -----------------------------------------------------------------------------
parse_bgen_file(Stream,bgen(First_variant_offset,Header),Size) :-
    Offset0 = 16'00,
    binary:get_doubleword(Stream,little,Offset0,Offset1,First_variant_offset),
    get_header(Stream,Offset1,Offset2,Header),
    Offset = Offset2,
    Size = Offset.

get_header(Stream,Offset0,Offset,header(Header_length,Variant_data_block_count,Sample_count,Magic_number,Free_data,Flags)) :-
    binary:get_doubleword(Stream,little,Offset0,Offset1,Header_length),
    binary:get_doubleword(Stream,little,Offset1,Offset2,Variant_data_block_count),
    binary:get_doubleword(Stream,little,Offset2,Offset3,Sample_count),
    binary:get_bytes(Stream,Offset3,Offset4,4,Magic_number),
    string_codes(String,Magic_number),
    assertion( String == "bgen" ),
    % The header is 20 bytes plus any free data.
    Free_data_size is Header_length - 20,
    binary:get_bytes(Stream,Offset4,Offset5,Free_data_size,Free_data),
    binary:get_doubleword(Stream,big,Offset5,Offset,Flags).
% -----------------------------------------------------------------------------
:- begin_tests(parse_BGEN).

test(001) :-
    Path = 'D:/complex.10bits.bgen',
    open(Path,read,Stream,[type(binary)]),
    parse_bgen_file(Stream,Data,Size),
    close(Stream),
    assertion( Data == bgen(68,header(20,10,4,`bgen`,[],150995072)) ), % NB `bgen` is a character code list not a string
    assertion( Size == 24 ).

:- end_tests(parse_BGEN).
% -------------------------------------
Note to Sam: the original code I sent could only handle big-endian data; this version also handles little-endian.
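
To make the endian difference concrete, here is a small stream-free sketch. bytes_value/3 and shift_in/3 are hypothetical helpers written for this illustration and are not part of binary.pl. The byte list 09 00 00 80 is inferred from the expected Flags value in the test (150995072 is 16'09000080), assuming the big branch of get_doubleword combines bytes the same way get_word does.

```prolog
:- use_module(library(apply)).   % foldl/4
:- use_module(library(lists)).   % reverse/2

% shift_in(+Byte, +Acc0, -Acc): append one byte to the low end of Acc0.
shift_in(Byte, Acc0, Acc) :-
    Acc is (Acc0 << 8) + Byte.

% bytes_value(+Endian, +Bytes, -Value)
% Bytes are listed in file order. For big, the first byte is the most
% significant; for little, the first byte is the least significant.
bytes_value(big, Bytes, Value) :-
    foldl(shift_in, Bytes, 0, Value).
bytes_value(little, Bytes, Value) :-
    reverse(Bytes, Reversed),
    bytes_value(big, Reversed, Value).
```

With this loaded, the same four bytes give two different numbers:

```prolog
?- bytes_value(big, [0x09,0x00,0x00,0x80], V).
V = 150995072.

?- bytes_value(little, [0x09,0x00,0x00,0x80], V).
V = 2147483657.
```

So the value stored in Flags depends entirely on the endian argument passed for that field, which matters if the individual flag bits are decoded later.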