SGML parser

Is there a way to tell the XML/SGML parser to stop parsing after finding an end tag?

I am trying to parse XMPP, which at the start of the network session sends an XML fragment that is not closed:

<stream:stream
        from='im.example.com'
        id='t7AMCin9zjMNwQKDnplntZPIDEI='
        to='juliet@im.example.com'
        version='1.0'
        xml:lang='en'
        xmlns='jabber:client'
        xmlns:stream='http://etherx.jabber.org/streams'>

   <stream:features>
        <starttls xmlns='urn:ietf:params:xml:ns:xmpp-tls'>
          <required/>
        </starttls>
      </stream:features>

Notice that <stream:stream> is left open. I am able to use the call(…) option of sgml_parse/2, and it processes the elements, but I need the parsing to end once <stream:features> has been read. Otherwise the parser keeps waiting for the </stream:stream> (because the TCP stream needs to be left open). I can’t use the content_length(…) option of sgml_parse/2 because the length is variable.

That is rather curious XML. Is this never closed with </stream:stream>? The indentation suggests it is. You can operate the parser in call-back mode, watching for the <stream:features> open tag. When it arrives, call the parser again from the call-back to parse the current node and process its content. That is, for example, how the RDF/XML parser works.
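A minimal sketch of that call-back approach could look like the code below. The predicate names read_features/2, parse_until_features/3 and on_begin/3 are made up for this example. It assumes dialect(xml), so the tag arrives as the plain atom 'stream:features', and it assumes that an exception raised in a handler stops the outer sgml_parse/2 and is passed on to the caller, which is worth verifying on your version.

```prolog
:- use_module(library(sgml)).

%  read_features(+In, -Features) is a hypothetical helper, not part of
%  library(sgml).  It parses In in call-back mode and stops as soon as
%  the first <stream:features> element has been read completely.
read_features(In, Features) :-
    setup_call_cleanup(
        new_sgml_parser(Parser, []),
        parse_until_features(Parser, In, Features),
        free_sgml_parser(Parser)).

parse_until_features(Parser, In, Features) :-
    set_sgml_parser(Parser, dialect(xml)),
    catch(( sgml_parse(Parser, [source(In), call(begin, on_begin)]),
            Features = []              % input ended without the element
          ),
          features(Features),
          true).

%  Called for every open tag.  With dialect(xml) there is no namespace
%  resolution, so the tag is the atom 'stream:features'.
on_begin('stream:features', _Attributes, Parser) :-
    !,
    %  Parse the content of the element that was just opened.
    sgml_parse(Parser, [document(Content), parse(content)]),
    %  Assumption: throwing here aborts the outer parse.
    throw(features(Content)).
on_begin(_Tag, _Attributes, _Parser).
```

Calling read_features(In, Features) on the socket input stream should then return the children of <stream:features> as the usual element(Name, Attributes, Content) terms, leaving the rest of the TCP stream unread.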

Thanks, I’ll try that.

For the initial TLS negotiation it is not closed, but there are other situations in which it is closed.

Also, is there a way to get the raw XML that goes through the parser, so that I can call debug(xmpp(xml), ... and print out the raw XML that it is processing?

I don’t think so. If it is a finite message I typically use a temporary hack to read the input to a string and then parse the string. You can also use a file for that purpose.
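As a rough sketch of that hack, assuming the message is already available as a string and is a complete, well-formed document: log it, then parse it from a string stream. The name parse_logged/2 and the debug topic xmpp(xml) are only illustrative.

```prolog
:- use_module(library(sgml)).
:- use_module(library(debug)).

%  parse_logged(+String, -DOM) is a hypothetical helper: print the raw
%  XML on the xmpp(xml) debug topic, then parse the same text.
parse_logged(String, DOM) :-
    debug(xmpp(xml), "XML: ~s", [String]),
    setup_call_cleanup(
        open_string(String, In),
        load_structure(stream(In), DOM, [dialect(xml)]),
        close(In)).
```

Enable the output with debug(xmpp(xml)) before calling parse_logged/2.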

Thanks, yes. I think I am going to read it into a string, using tcp_fcntl/3 to read until EWOULDBLOCK. That makes the data available for debug/3, and I can then send the string to the parser.

If you want to dump the header of some fixed length you can also use peek_string/3. This leaves all data in the input buffer.
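For instance, a small sketch along these lines prints the first Len characters without consuming them. The helper name dump_header/2 and the debug topic are assumptions for this example.

```prolog
:- use_module(library(debug)).

%  dump_header(+In, +Len) is a hypothetical helper: show up to Len
%  characters of pending input while leaving them in the buffer.
dump_header(In, Len) :-
    peek_string(In, Len, Header),
    debug(xmpp(xml), "HEAD: ~s", [Header]).
```

A later read or sgml_parse/2 call on the same stream still sees the peeked characters.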

Thanks! I actually need to show the whole conversation between client and server, and the snippets are of variable length. I ended up using read_pending_codes/3 when debug is enabled, calling fill_buffer/1 first. It works so far, but it may break if the message is larger than the buffer size.

BTW, is the buffer size the kernel buffer size? Or is there a Prolog buffer?

See set_stream/2 using the buffer_size(Size) option. There is no limit as long as you have the memory.
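Put together with the fill_buffer/1 and read_pending_codes/3 approach from earlier in the thread, this might look roughly as follows. The helper names and the 64 Kbyte size are arbitrary choices for this sketch, and a message that arrives in several network packets may still need more than one call.

```prolog
:- use_module(library(debug)).

%  Hypothetical helper: enlarge the Prolog-side input buffer of In.
%  65536 bytes is an arbitrary size chosen for this sketch.
enlarge_buffer(In) :-
    set_stream(In, buffer_size(65536)).

%  Hypothetical helper: return whatever is currently available on In
%  as a string and log it on the xmpp(xml) debug topic.
read_available(In, String) :-
    fill_buffer(In),                    % blocks until some input arrives
    read_pending_codes(In, Codes, []),  % takes all buffered input
    string_codes(String, Codes),
    debug(xmpp(xml), "RECV: ~s", [String]).
```

Call enlarge_buffer/1 once after opening the connection, then read_available/2 each time a reply is expected.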

Great! That solves the problem nicely! Thanks for writing a great Prolog implementation.