This is not quite human-level question-answering in the everyday sense of those words. The ZDNet headline is too clickbaity for my taste.
The answer to every question in the test is a preexisting snippet of text, or "span," from a corresponding reading passage shown to the model. The model has only to select which span in the reading passage gives the best answer -- i.e., which sequence of words already in the text best answers the question.[a]
[a] If this explanation isn't entirely clear to you, it might help to think of the problem as a challenging classification task in which the number of possible classes for each question is equal to the number of possible spans in the corresponding reading passage.
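For anyone who wants that intuition in code form, here's a toy sketch of the span-selection framing. The scoring function is a made-up word-overlap heuristic, not what any actual SQuAD system does (real models learn start/end scores), but it shows that the "answer" is always text copied verbatim from the passage:

    # Toy sketch of span selection as classification over all candidate spans.
    # The scorer below is a stand-in heuristic, not a real model.

    def candidate_spans(tokens, max_len=8):
        """Enumerate every contiguous span up to max_len tokens -- these are the 'classes'."""
        for start in range(len(tokens)):
            for end in range(start + 1, min(start + 1 + max_len, len(tokens) + 1)):
                yield (start, end)

    def score(span_tokens, question_tokens):
        # Stand-in scorer: count overlap with the question.
        # A real system would use learned start/end scores instead.
        q = {t.lower() for t in question_tokens}
        return sum(tok.lower() in q for tok in span_tokens)

    def best_span(passage, question):
        tokens = passage.split()
        spans = candidate_spans(tokens)
        start, end = max(spans, key=lambda s: score(tokens[s[0]:s[1]], question.split()))
        return " ".join(tokens[start:end])

    passage = "The model was trained on SQuAD , a dataset built from Wikipedia articles ."
    question = "What was the model trained on ?"
    print(best_span(passage, question))  # whatever it picks is lifted straight from the passage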
Before you blink an eye, there will be MBA types working on PowerPoint proposals with detailed cost-benefit analyses for using those new AI machines they heard about that can read better than human beings. Needless to say, the technology will fall far short of expectations.
This is why there have been two AI winters already.
I think it's incumbent on people like you to get the word out that ML isn't going to put everyone's jobs at risk. Between this and self-driving cars, local governments are beginning to weigh spending tax dollars on these boondoggles instead of on proven modes of transit like public transport.
The futurist writers peddling this stuff need to take a moment to chill and learn about the actual state of the underlying technology.
I'm currently working on improving an existing QA-style dataset and am exploring the best way to move Machine Reading Comprehension away from span selection and toward some kind of reasoning. It turns out that writing questions in a repeatable way that actually require reasoning is quite hard.
> The model has only to select which span in the reading passage gives the best answer -- i.e., which sequence of words already in the text best answers the question.[a]
Sounds like they've reinvented Jeopardy Watson's ability to excel at Q&A, but 12 years later.
Actual current results:
https://rajpurkar.github.io/SQuAD-explorer/
Paper describing the dataset and test:
https://arxiv.org/abs/1606.05250
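If you want to poke at the data yourself, something like this should work, assuming the Hugging Face `datasets` package (field names are from the SQuAD v1.1 release):

    # Inspect one SQuAD example; the gold answer is given as text plus a character
    # offset into the passage, i.e. a span copied from the context.
    from datasets import load_dataset

    squad = load_dataset("squad", split="validation")
    ex = squad[0]
    print(ex["question"])
    print(ex["context"][:200])
    print(ex["answers"])  # {'text': [...], 'answer_start': [...]}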