Maybe if you filtered GPT-3’s output through a Prolog program (perhaps implemented as a DCG transducer, outputting corrected strings) to make it more coherent semantically, you could get something out of it. Otherwise it’s awfully inconsistent in terms of meaning (though very good at representing structure).
I was wondering if others were thinking along the same lines about transformers, the T in GPT. As many know, transformers fail at evaluations such as math but are great at generating, including generating Prolog code. I have been experimenting with having them generate DCGs. ChatGPT is not perfect in this area either, but the fact that it can create Prolog DCGs is impressive, and it is still possible that transformers can be trained to become better at it.
So while you used the word filtered, I would replace that with eval to cover a larger set of problems. Also, instead of having GPT generate the initial story, consider having GPT or the like generate Prolog DCGs that generate the story, and do all the necessary checking there.
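As a minimal sketch of what such a generated story DCG might look like (this toy grammar and its nonterminal names are my own invention for illustration, not actual ChatGPT output):

```prolog
% A toy story grammar. phrase/2 can both generate sentences
% and check whether a given string belongs to the grammar.
story     --> character, " ", action, " ", object, ".".
character --> "The knight".
character --> "The dragon".
action    --> "guarded".
action    --> "found".
object    --> "the castle".
object    --> "a treasure".
```

Querying `phrase(story, Codes), format("~s~n", [Codes])` enumerates every sentence the grammar admits on backtracking, and running the same grammar in recognition mode is exactly the kind of checking step described above.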
With OpenAI you would need a large token count to generate the rather lengthy DCG, and with ChatGPT set at either 4,000 or 8,000 tokens for the prompt and completion, I do not see this working correctly, even using the continue prompt to get more code. However, GPT-4 was just released and has a model with a 32K token limit.
Along similar lines, I posted an example ChatGPT prompt for doing math (ref), since ChatGPT cannot do math evaluations. I also tried having ChatGPT create a query for Wolfram Alpha to do the evaluation, but I was not getting consistently valid queries, so I have set that aside for now.
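The eval idea applied to arithmetic can be sketched in a few lines of Prolog (the predicate name `check_claim/2` is mine, purely illustrative):

```prolog
% check_claim(+Expr, +Claimed) succeeds iff the arithmetic
% expression Expr evaluates to the model's claimed answer.
check_claim(Expr, Claimed) :-
    Value is Expr,
    Value =:= Claimed.

% ?- check_claim(12 * 12 + 7, 151).   % succeeds
% ?- check_claim(12 * 12 + 7, 150).   % fails
```

A model's answer that fails this check could be rejected or sent back for regeneration, which sidesteps the inconsistency of asking the transformer to do the evaluation itself.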
So in a few years what you seek should be mainstream and might even be as free as a Google query.