Ok, let me try to motivate the choice of subtitle. (Sorry about the length of this post. You can always ignore it if you want, but then you would miss what I consider an interesting aspect of Web Prolog and the Prolog Web.)
Developing Web Prolog into a language that allows people to program such agents is actually what I think we should be aiming for. But before I try to explain how this might be done for agents such as Siri or Alexa, I’d like to start with something simpler, namely pengines. I like to think of a pengine as a simple kind of intelligent and conversational software agent, and of the Prolog Web as the environment in which such agents are born, act and die. While they are alive, they talk to other agents that populate the Prolog Web, some of which are software agents and some of which are humans.
Pengines are agents
The notion of an agent is rather fuzzy, but there are at least three properties most theorists would agree a software agent must possess: it must be a process of sorts, only loosely coupled to other processes; it must be stateful, and thus have a kind of memory of its own; and it must be capable of interacting with the world external to it. Note that under this definition any stateful actor would qualify as an agent, and even Erlang might be seen as an agent programming language. A pengine is a kind of actor which, in addition to the properties listed above, has two other traits we intuitively tend to associate with agenthood: it is capable of reasoning and capable of giving answers to queries – answers that follow logically from what it believes.
The intelligence of a pengine agent is of course very limited. It is capable of an elementary form of reasoning from knowledge in the form of Prolog source code, and that is about it. The conversational abilities of a pengine are also very limited. It is capable of answering simple questions based on conclusions it draws from the knowledge it has at its disposal.
The birth, life and death of a pengine
Note: This section is more or less an extended version of the example in my Erlang’19 paper. I included it here just to make the point that talking to Prolog is like talking to something which on a certain level of abstraction can be described as an intelligent conversational agent. If you already believe in this point, you should skip to the section about “real” voice-based intelligent and conversational agents.
Below, we show how to create and interact with a pengine process which is running as a child of the current top-level process. Indeed, what we have here is a pengine running another pengine, a Prolog top-level running another Prolog top-level:
?- pengine_spawn(Pid, [
       node('http://ex.org'),
       src_text("p(a). p(b). p(c)."),
       monitor(true),
       exit(false)
   ]),
   pengine_ask(Pid, p(X), [
       template(X)
   ]).
Pid = 439752@'http://ex.org'.
?- flush.
Shell got success(439752@'http://ex.org',[a],true)
true.
?- pengine_next($Pid, [
       limit(2)
   ]),
   receive({Answer -> true}).
Answer = success(439752@'http://ex.org',[b,c],false).
?-
There is quite a lot going on here. The node option passed to pengine_spawn/2 allowed us to spawn the pengine on a remote node, the src_text option was used to send along three clauses to be injected into the process, and the monitor option allowed us to monitor it. These options are all inherited from spawn/3.
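For comparison, here is a sketch of how a plain actor might be spawned with spawn/3 directly, using the same node and monitor options. (This is just a sketch in the style of the Erlang’19 paper; the goal, which sends each solution back to the caller, is made up for the occasion.)

?- self(Self),
   spawn((
       member(X, [a,b,c]),
       Self ! X,
       fail
   ;   true
   ), Pid, [
       node('http://ex.org'),
       monitor(true)
   ]).

Flushing the mailbox should now show the messages a, b and c and, since we passed monitor(true), eventually a down message when the process terminates.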
Given the pid returned by the pengine_spawn/2 call, we then called pengine_ask/2-3 with the query ?- p(X), and by passing the template option we determined the form of the answers. Answers were returned to the mailbox of the calling process (in this case the mailbox belonging to the pengine running our top-level). We inspected them by calling flush/0. By calling pengine_next/2 with the limit option set to 2, we then asked for the last two solutions, and this time used receive/1 to view them.
Since we passed the option exit(false) to pengine_spawn/2, the pengine is still alive, and we can use it to demonstrate how I/O works:
?- pengine_ask($Pid, pengine_output(hello)),
   receive({Answer -> true}).
Answer = output(439752@'http://ex.org',hello).
?-
Input can be collected by calling pengine_input/2. This sends a prompt message to the client, which can respond by calling pengine_respond/2:
?- pengine_ask($Pid, pengine_input('|:', Answer)),
   receive({Message -> true}).
Message = prompt(439752@'http://ex.org','|:').
?- pengine_respond($Pid, hi),
   receive({Message -> true}).
Message = success(439752@'http://ex.org',[pengine_input('|:',hi)],false).
The pengine is still not dead, so let us see what happens when a query such as ?- repeat, fail is asked:
?- pengine_ask($Pid, (repeat, fail)).
true.
?-
Although nothing is shown, we can assume that the remote pengine is just wasting CPU cycles to no avail. Fortunately, we can always abort a runaway process by calling pengine_abort/1:
?- pengine_abort($Pid),
   receive({Answer -> true}).
Answer = abort(439752@'http://ex.org').
?-
When we are done talking to the pengine we can kill it:
?- pengine_exit($Pid, goodbye),
   receive({Answer -> true}).
Answer = down(439752@'http://ex.org',goodbye).
?-
Note that messages sent to a pengine will always be handled in the right order even if they arrive in the “wrong” order (e.g. next before ask). This is due to the selective receive, which defers the handling of such messages until the PCP protocol permits it. This behaviour guarantees that pengines can be freely “mixed” with other pengines or actors. The messages abort and exit, however, will never be deferred.
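For example, here is a sketch (with a fresh pengine and a made-up two-clause program) in which the next message is sent off immediately after the ask message, before the first answer has even arrived. The pengine defers it until the first answer has been delivered, so the answers still come back in protocol order. (I use pengine_next/1 here, assuming it behaves like pengine_next/2 with an empty option list.)

?- pengine_spawn(Pid, [
       src_text("q(1). q(2).")
   ]),
   pengine_ask(Pid, q(X), [
       template(X)
   ]),
   pengine_next(Pid),
   receive({success(Pid, [1], true) -> true}),
   receive({success(Pid, [2], false) -> true}).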
Voice-based intelligent conversational agents
Pengines are kind of dumb, but agents can become both smarter and more conversational through programming. If the most optimistic AI researchers are right, there is in fact no limit to how smart a software agent might become. To be maximally useful to humans, however, they need to be able to talk to us using natural language.
As a sign of where voice user interface technology may be heading, here is a call for participation in the Conversational Interaction Conference that was held in San Jose, California, on March 11th and 12th, 2019:
Talking to computers has long been a staple of science fiction. Today, talking or typing in human language to computers is becoming commonplace.
But this isn’t just a neat trick. Sure, you can ask Alexa to turn off the lights, play a song of your choice, or tell you a joke. But you can also ask her to connect with a company and talk to that company about products, services, issues, and even buy something.
Conversational interaction isn’t just a major technology trend in Artificial Intelligence. It’s also a breakthrough in user interface technology. It’s coming at a time it is needed, as the Graphical User Interface (GUI) that has served us so well is getting overburdened with too many features, icons, and long menus - with the small screen of mobile phones further limiting its effectiveness. The user manual for the Conversational User Interface is simply “Say or type what you want” or, even more simply, “How can I help you?”
Companies are hearing repeatedly about having an “AI Strategy,” but that vague admonition comes with significant hurdles, even to understand what it means. But every company can use the AI technology of conversational interaction. Companies can, for example, use it to improve customer service while lowering its cost or make employees more efficient.
And the good news is - there are many vendors providing tools that reduce the effort and risk in using this technology. This isn’t the future - companies are benefiting now!
That’s a lot of marketing lingo, but Vlad Sejnoha, former CTO of Nuance Communications, has written a brilliant, enthusiastic yet sobering article in Wired magazine about the future prospects of intelligent conversational agents, in which he asks, with reference to the film Her: “Can We Build ‘Her’?: What Samantha Tells Us About the Future of AI”.
Hardware devices for intelligent conversational agents
The figure below shows five different hardware devices, all (normally) connected to the Web, often over Wi-Fi, sometimes over 4G or 5G. They can be seen as representing the most recent additions to the infrastructure of the Web.
a) A mobile phone is used by more people than ever to access the Web. Most mobile phones are equipped with a virtual assistant (such as Siri or Google Now) which uses a voice interface to answer questions, make recommendations, and perform actions.
b) Amazon Echo is a smart speaker equipped with the virtual assistant Alexa. According to Amazon, 100 million products with Alexa integrated have been sold.
c) Mycroft Mark II is an open-source alternative to Amazon’s Echo device.
d) Furhat is a social robot that communicates with us humans as we do with each other - by speaking, listening, showing emotions and maintaining eye contact. According to Furhat Robotics, the company that makes it, it is “the world’s most advanced robot of its kind. In any scenario where communication is required, Furhat can potentially fill this gap. Ask questions, practice interviews, train your skills, play games or learn something new.” See here for a lot of video clips. (Note: I know the people who founded the company, and I admire what they’ve done, but I don’t work for them.)
e) Oculus Go is a Virtual Reality (VR) headset. It is equipped with a browser that allows a user to access the VR world. While inside (and maybe in the context of a game), the user may encounter virtual conversational agents (maybe NPCs) that want to strike up a conversation. For inspiration, you may want to listen to a podcast episode about Human Interact’s Starship Commander, which uses AI-enabled, voice-activated commands to let the player participate in an interactive story.
So, how to program these things?
There appear to be four main drivers behind the trend towards conversational voice user interfaces: 1) machine learning has improved a lot, 2) speech technologies have matured and error rates have gone down, 3) the Web (at least from a technological point of view) is in good shape and is only getting faster, bigger and better, and 4) as we saw above, new kinds of hardware devices that take advantage of improvements in those areas are now commercially available.
Technologies that in my opinion ought to be able to play an important role in these developments, but which seem to be under-utilised at this time, are 1) technologies for symbolic knowledge representation and reasoning, and 2) technologies for specifying interaction. I think Web Prolog and the Prolog Web have something to contribute here.
Interestingly, Prolog has already been used (by people in this group!) for programming Alexa. Sam Neaves (@sam.neaves) made a video clip in the “Playing with Prolog” series, where he described how to set things up. More recently, Falco Nogatz (@fnogatz) et al. published an interesting paper dealing with the subject.
With built-in logic-based knowledge representation and reasoning, and a built-in grammar formalism for parsing and generating natural language, Prolog appears to be ideal for this sort of thing, and with the even more powerful means for knowledge representation that tabling and the well-founded semantics bring to the language, it gets even better. However, conversational systems, at least sophisticated ones (e.g. Furhat robots and VR game NPCs), need concurrency as well as primitives for sending and receiving messages. I have never felt that Prolog was a good language for programming interaction, at least not when fine-grained real-time interaction is called for. Traditional Prolog just doesn’t provide us with the proper means. This is where the actor model and the Erlang-ish features of Web Prolog come in handy. Most likely, sophisticated conversational agents can be built from components that are themselves actors running concurrently, allowing agents, as it were, to think, listen, speak and act at the same time, as the sketch below tries to suggest.
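Here is a minimal sketch of such an agent, assuming the spawn/2, !/2 and receive/1 primitives of Web Prolog: an “ear” actor and a “mouth” actor around a central loop. The predicates hear/1 and say/1 are hypothetical stand-ins for speech recognition and synthesis, and reply//1 is of course a toy grammar rather than anything serious:

agent :-
    self(Me),
    spawn(ear(Me), _Ear),
    spawn(mouth, Mouth),
    loop(Mouth).

ear(Agent) :-
    hear(Words),                    % hypothetical: block until an utterance is recognised
    Agent ! heard(Words),
    ear(Agent).

mouth :-
    receive({
        speak(Words) -> say(Words)  % hypothetical: speech synthesis
    }),
    mouth.

loop(Mouth) :-
    receive({
        heard(Words) ->
            (   phrase(reply(Answer), Words)
            ->  Mouth ! speak(Answer)
            ;   Mouth ! speak([sorry,i,did,not,catch,that])
            )
    }),
    loop(Mouth).

reply([yes,it,is]) --> [is,it,raining].   % a toy DCG, nothing more

Because the ear and the mouth run concurrently with the central loop, the agent can keep listening while it speaks – the kind of behaviour that is awkward to express in a single sequential Prolog top-level.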
As far as I’m aware, event-driven state machines have not been used much in the Prolog world. Perhaps the reason is that primitives for sending and receiving messages did not appear in Prolog until (a few) platforms implemented the ISO Prolog threads draft standard. As can be seen from an Erlang/OTP behaviour such as gen_statem, Erlang takes event-driven state machines very seriously. I think it may be time for Prolog to follow suit. I have therefore, in a chapter of my manuscript, provided a somewhat sketchy proposal for how to introduce Web Prolog as a scripting language for State Chart XML (SCXML), a W3C standard which provides an XML-based notation for statecharts.
SCXML is based on the graphical statechart notation introduced by David Harel (Harel, 1987). Statecharts already have a solid reputation as a great tool for the design and implementation of user interfaces and, because of this, I believe the combination of SCXML and Web Prolog might be a great choice for programming conversational systems such as digital assistants and various forms of multi-modal user interfaces.
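Pending the SCXML details, the flavour of event-driven state machines can already be conveyed in plain Web Prolog, with one predicate per state in the style of gen_statem. The events below (wake_word, utterance/1 and timeout, the latter presumably posted by some timer actor) are hypothetical:

idle :-
    receive({
        wake_word -> listening
    }).

listening :-
    receive({
        utterance(Words) ->
            format("You said: ~w~n", [Words]),
            idle;
        timeout ->
            idle
    }).

Each state simply waits for the events it knows how to handle, defers everything else, and transitions by calling the predicate for the next state.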
A chapter on Web Prolog and SCXML is here: https://github.com/Web-Prolog/swi-web-prolog/raw/master/book/web-prolog-and-scxml.pdf .
So, to summarise my argument, I think Web Prolog can be “sold” not only as a language for building the Prolog Web, but also as an excellent choice of language for programming voice-based intelligent and conversational agents.
(Again, sorry about the length of this post.)