Primed for disappointment

By Marcia McNutt
Geophysicist and 22nd President of the National Academy of Sciences

To say that I approached a review of the newest OpenAI version of ChatGPT with skepticism would be an understatement. Before test-driving GPT-4, I had played around with earlier versions and concluded they were not ready for prime time. I asked GPT-3.5 some questions and was frankly surprised at what I was told. While I was fully prepared for the model to be unable to accurately weigh the importance of the accomplishments of various public figures, I did not expect it to simply make things up about people. Some of my colleagues had, unbeknownst to them, won the National Medal of Science; others were credited with discoveries made by different researchers. I was also puzzled when the model provided an incorrect answer to a basic geological question, one whose correct answer has been backed by strong scientific consensus for more than fifty years. Nevertheless, I was impressed that even this earlier version translated scientific jargon into terms that non-experts would understand. No easy feat.


With this prior experience, I was not expecting miracles from GPT-4. But I was pleasantly surprised that the same queries as before returned far more accurate and appropriate responses. I saw no made-up honors or wrongly attributed discoveries, and scientific queries returned easy-to-understand summaries of the current state of understanding. This is not to say that further improvement isn't needed. As one example, GPT-4 conflated the geologic age of a volcano with that of the seafloor upon which it erupted. I would not count on even the newest version of GPT to ace a university-level science test. Yet.

However, the vast improvement from GPT-3.5 to GPT-4 in such a short time convinced me that we are at the dawn of AI research assistants. Whereas a simple internet search returns a list of relevant websites, GPT-4 summarizes the information contained in those links in plain language. This advanced capability brings new, time-saving opportunities as well as challenges.

It will be interesting to see the myriad applications for AI assistants that emerge over the coming years. Here are just a few of the science applications I attempted with GPT-4, with promising results, that demonstrate its potential:

  1. Science education. Most elementary and junior high school science teachers do not have degrees in science. GPT-4 and future versions could help them answer questions from their students and translate more complicated scientific principles into language that both they and their students can understand.
  2. Background research. GPT can provide lists of important research articles on any topic. Its conversational tone will make this a popular tool for students just starting to delve into a new topic. Using an AI assistant that suggests background reading might also result in less researcher bias in approaching a problem.
  3. Biographical memoirs. The National Academy of Sciences has a long list of deceased members for whom no biographical memoir has been written. GPT-4 could generate a first draft, followed by light editing and fact-checking by the deceased member’s colleagues and family.
  4. Flagging conflicts of interest. A challenge for anyone putting together committees and review panels is to avoid conflicts of interest among the members; sometimes even the individual being considered for an appointment overlooks associations that might be perceived as biasing their advice. GPT-4 can quickly sort through the myriad web pages of public information to find prior associations that might constitute a conflict of interest for a person under consideration.

Despite these and other benefits, I do worry that use of GPT-4 could have negative repercussions. For example:

  1. Loss of research skills. Researchers could lose the ability to analyze separate and sometimes conflicting lines of argument prior to arriving at some reconciliation. However, I note that 50 years ago the introduction of pocket calculators led many to lament the impact on math skills. In the end, there was no way to stop the use of tools that save time and increase accuracy, and I expect that the same will ultimately be true of AI-assisted background research.
  2. Concerns about transparency. The algorithms underlying AI assistants are generally proprietary. Every teacher knows that the best way to gauge whether a student has mastered a topic is to ask the student to show their work: derivations, expert sources, and so on. For the user of GPT-4, there is no such evidence trail. On the other hand, I did appreciate the cautious framing of the model’s responses, which could help reduce resistance to the technology. For example, GPT-4 will not provide personal details for anyone who is not a public figure, to avoid privacy concerns. It is also careful not to offer its own opinions on topics that cannot be decided on the basis of scientific knowledge and facts alone.
  3. Lack of accountability. Any person who makes a claim can be held personally accountable for defending its accuracy. An AI model cannot be held responsible, and in fact can be inconsistent in its responses to queries. That lack of accountability explains why journals are not allowing GPT-4 to become a “co-author” on research papers. Yet journals are also requesting that model-generated text be flagged as such to prevent plagiarism, a policy that appears to anthropomorphize the technology in direct contrast to the authorship policy. How such a policy would be enforced is not apparent, as even identical queries produce slightly different phrasings of the answer.

As with other groundbreaking new capabilities, the ultimate uses and misuses of this technology will slowly emerge in the coming years. Nearly 50 years ago, scientific leaders gathered at the Asilomar Conference Center in California to draw up voluntary guidelines governing the use of the emerging technology of that era: recombinant DNA. To this day, that effort stands out as an extraordinary and successful move by the research community to invoke the precautionary principle in the application of new technology. Artificial intelligence offers the research community another opportunity to come together and agree on how to advance the use of this new technology while avoiding undesirable outcomes.

Note: this essay is the personal opinion of the author and should not be construed as a review by the National Academies of Sciences, Engineering, and Medicine.

The views, opinions, and proposals expressed in this essay are those of the author and do not necessarily reflect the official policy or position of any other entity or organization, including Microsoft and OpenAI. The author is solely responsible for the accuracy and originality of the information and arguments presented in the essay. The author’s participation in the AI Anthology was voluntary, and no incentives or compensation were provided.

McNutt, M. (2023, June 5). Primed for Disappointment. In: Eric Horvitz (ed.), AI Anthology. https://unlocked.microsoft.com/ai-anthology/marcia-mcnutt


Marcia McNutt

Marcia McNutt is a geophysicist and the 22nd president of the National Academy of Sciences. McNutt previously served as editor-in-chief of the Science family of journals and as director of the U.S. Geological Survey. She is a fellow of the American Geophysical Union, the Geological Society of America, the American Association for the Advancement of Science, and the International Association of Geodesy. McNutt was awarded the U.S. Coast Guard’s Meritorious Service Medal, AGU’s Macelwane Medal, and the Maurice Ewing Medal.
