Dealing with a loved one’s belongings after their death is never easy. But as Alaska’s state courts have discovered, an inaccurate or misleading artificial intelligence chatbot can easily make matters worse.
For more than a year, Alaska’s court system has been designing a pioneering generative AI chatbot called the Alaska Virtual Assistant (AVA) to help residents navigate the tangled web of forms and procedures involved in probate, the court-supervised process of settling a deceased person’s estate and transferring their property.
Yet what was meant to be a quick, AI-powered leap forward for access to justice has spiraled into a protracted journey of more than a year, plagued by false starts and false answers.
AVA “was supposed to be a three-month project,” said Aubrie Souza, a consultant with the National Center for State Courts (NCSC) who has worked on and witnessed AVA’s evolution. “We are now at well over a year and three months, but that’s all because of the due diligence that was required to get it right.”
Designing this bespoke AI solution has illuminated the difficulties government agencies across the United States are facing in applying powerful AI systems to real-world problems where truth and reliability are paramount.
“With a project like this, we need to be 100% accurate, and that’s really difficult with this technology,” said Stacey Marz, the administrative director of the Alaska Court System and one of the AVA project’s leaders.
“I joke with my staff on other technology projects that we can’t expect these systems to be perfect, otherwise we’d never be able to roll them out. Once we get the minimum viable product, let’s get that out there, and then we’ll enhance that as we learn.”
But Marz said she thinks this chatbot should be held to a higher standard. “If people are going to take the information they get from their prompt and they’re going to act on it and it’s not accurate or not complete, they really could suffer harm. It could be incredibly damaging to that person, family or estate.”
While many local government agencies are experimenting with AI tools for use cases ranging from helping residents apply for a driver’s license to speeding up municipal employees’ ability to process housing benefits, a recent Deloitte report found that less than 6% of local government practitioners were prioritizing AI as a tool to deliver services.
The AVA experience demonstrates the barriers government agencies face in attempting to leverage AI for increased efficiency or better service, including concerns about reliability and trustworthiness in high-stakes contexts, along with questions about the role of human oversight given fast-changing AI systems. These limitations clash with today’s rampant AI hype and could help explain larger discrepancies between booming AI investment and limited AI adoption.
Marz envisioned the AVA project as a cutting-edge, low-cost version of Alaska’s family law helpline, which is staffed by court employees and provides free guidance about legal matters ranging from divorce to domestic violence protective orders.
“Our goal was to basically try to replicate the services with the chatbot that we would provide with a human facilitator,” Marz told NBC News, referring to AVA’s team of attorneys, technical experts and advisers from the NCSC. “We wanted a similar self-help experience, if somebody was able to talk to you and say, ‘This is what I need help with, this is my situation.’”
While the NCSC provided an initial grant to get AVA off the ground as part of its growing work on AI, the chatbot itself has been developed by Tom Martin, a lawyer and law professor who founded LawDroid, a company that builds legal AI tools.
Describing the AVA service, Martin highlighted many critical decisions and considerations that go into the design process, such as choosing and shaping an AI system’s personality.
Many commentators and researchers have illustrated how certain models or versions of AI systems behave in different ways, almost as if they adopt different personas. Researchers and even users can alter these personas through technical tweaks, as many ChatGPT users found out earlier this year when the OpenAI service fluctuated between personalities that were either gushing and sycophantic or emotionally distant. Other models, like xAI’s Grok, are known for having looser guardrails and increased willingness to embrace controversial topics.
“Different models have almost different types of personalities,” Martin told NBC News. “Some of them are very good at rule-following, while others are not as good at following rules and kind of want to prove that they’re the smartest guy in the room.”
“For a legal application, you don’t want that,” Martin said. “You want it to be rule-following but smart and able to explain itself in plain language.”
Even traits that would otherwise be welcomed become more problematic when applied to topics as consequential as probate. Working with Martin, NCSC’s Souza noted that early versions of AVA were too empathetic and annoyed users who might have been actively grieving and simply wanted answers about the probate process: “Through our user testing, everyone said, ‘I’m tired of everybody in my life telling me that they’re sorry for my loss.’”
“So we basically removed those kinds of condolences, because from an AI chatbot, you don’t need one more,” Souza said.
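In practice, much of that persona-shaping is done through a system prompt that sits in front of every conversation. The sketch below shows what such instructions might look like; the wording is purely illustrative and is not AVA’s actual prompt.

```python
# Sketch: shaping a chatbot's persona through a system prompt.
# The wording below is an illustrative assumption, not AVA's real prompt.
SYSTEM_PROMPT = """\
You are a self-help assistant for the Alaska Court System's probate process.
- Follow the provided court documents exactly; never improvise legal advice.
- Explain procedures in plain language that a non-lawyer can follow.
- Keep a neutral, practical tone; do not offer condolences.
- If a question falls outside probate, refer the user to court staff.
"""

messages = [
    {"role": "system", "content": SYSTEM_PROMPT},
    {"role": "user", "content": "How do I open probate for my father's estate?"},
]
# In a real deployment, `messages` would be sent to a chat-completion API.
```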
Beyond the system’s superficial tone and pleasantries, Martin and Souza had to contend with the serious issue of hallucinations, or instances in which AI systems confidently share false or exaggerated information.
“We had trouble with hallucinations, regardless of the model, where the chatbot was not supposed to actually use anything outside of its knowledge base,” Souza told NBC News. “For example, when we asked it, ‘Where do I get legal help?’ it would tell you, ‘There’s a law school in Alaska, and so look at the alumni network.’ But there is no law school in Alaska.”
Martin has worked extensively to ensure the chatbot only references the relevant areas of the Alaska Court System’s probate documents rather than conducting wider web searches.
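Conceptually, that restriction works like retrieval-grounded prompting: fetch the relevant court documents first, then instruct the model to answer only from those excerpts. The sketch below is a simplified illustration; the toy corpus, keyword-overlap retrieval and refusal wording are assumptions, not AVA’s actual implementation.

```python
# Sketch: restricting a chatbot to a fixed knowledge base.
# Corpus, scoring and refusal rule are illustrative assumptions.
PROBATE_DOCS = {
    "informal-probate": "Informal probate may be opened by filing ...",
    "vehicle-transfer": "To transfer a deceased owner's vehicle title ...",
}

def retrieve(question: str, k: int = 2) -> list[str]:
    """Rank documents by naive keyword overlap with the question."""
    words = set(question.lower().split())
    ranked = sorted(
        PROBATE_DOCS.values(),
        key=lambda text: len(words & set(text.lower().split())),
        reverse=True,
    )
    return ranked[:k]

def build_prompt(question: str) -> str:
    """Assemble a grounded prompt: the model may use ONLY the excerpts."""
    excerpts = "\n---\n".join(retrieve(question))
    return (
        "Answer using ONLY the court documents below. If the answer is not "
        "in them, say so and suggest contacting the court; do not guess.\n\n"
        f"Documents:\n{excerpts}\n\nQuestion: {question}"
    )

print(build_prompt("Which form transfers my late mother's car title?"))
```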
Across the AI industry, hallucination rates have fallen as models have improved, but the problem has not been eliminated. Many companies building AI applications, like AI-agent provider Manus, stress the reliability of their services and include several layers of AI-powered verification to help ensure their results are accurate.
To evaluate the accuracy and helpfulness of AVA’s responses, the AVA team designed a set of 91 questions regarding probate topics, asking the chatbot, for example, which probate form would be appropriate to submit if a user wanted to transfer the title of their deceased relative’s car to their name.
Yet the 91-question test proved too time-consuming to run and evaluate, according to Jeannie Sato, the Alaska Court System’s director of access to justice services, given the stakes at hand and the need for human review.
So Sato said the team landed on a refined list of just 16 test questions, featuring “some questions that AVA had answered incorrectly, some that were complicated, and some that were pretty basic questions that we think AVA may be asked frequently.”
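A fixed question set like that lends itself to a simple replay harness: ask each question, capture the answer, and hand the results to attorneys for grading. A minimal sketch, where the sample questions, the ask() stub and the CSV layout are all assumptions:

```python
# Sketch: replaying a review question set and logging answers for human graders.
import csv

TEST_QUESTIONS = [
    "Which form do I file to transfer a deceased relative's car title?",
    "Do I need formal probate for a small estate?",
    # ... the rest of the review set would go here
]

def ask(question: str) -> str:
    """Placeholder for a call to the chatbot under test."""
    return "(model response)"

with open("ava_eval.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["question", "response", "reviewer_verdict", "notes"])
    for q in TEST_QUESTIONS:
        # Verdict and notes are left blank for attorney reviewers to fill in.
        writer.writerow([q, ask(q), "", ""])
```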
Cost is another critical issue for Sato and the AVA team. Successive generations of AI models have driven usage fees down precipitously, which the AVA team sees as a key advantage of AI tools given limited court budgets.
Martin told NBC News that under one technical setup, 20 AVA queries would cost only about 11 cents. “I’m mission-driven, and it’s about impact for me in helping people in the world,” Martin said. “To be able to carry forward with that mission, of course, cost is extremely important.”
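That figure works out to roughly half a cent per query. A back-of-the-envelope check, with token counts and per-million-token prices assumed purely for illustration:

```python
# Reported: 20 queries for about 11 cents.
batch_cost_usd, queries = 0.11, 20
print(f"Cost per query: ${batch_cost_usd / queries:.4f}")  # $0.0055

# Working backward with assumed numbers: ~4,000 input tokens (grounded
# prompt) and ~500 output tokens per query, priced at $1 and $3 per
# million tokens, land in the same ballpark.
est = 4_000 / 1e6 * 1.00 + 500 / 1e6 * 3.00
print(f"Estimated per-query cost: ${est:.4f}")  # $0.0055
```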
Yet the ever-changing systems that power AVA’s answers, like OpenAI’s GPT family of models, mean that the administrative team will likely have to monitor AVA regularly for any behavioral or accuracy changes.
“We anticipate needing to do regular checks and potentially update prompts or the models as new ones come out and others are retired. It’s definitely something we’ll need to stay on top of rather than a purely hands-off situation,” Martin said.
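Part of that upkeep can be automated as a regression check: replay vetted questions against the current model and flag answers that drift from previously approved ones for human review. A sketch under assumed details (the approved-answers file, the ask() stub and the similarity threshold are illustrative):

```python
# Sketch: flagging answer drift after a model update.
import difflib
import json

def ask(question: str) -> str:
    """Placeholder for querying the current production model."""
    return "(model response)"

def run_regression(approved_path: str, threshold: float = 0.8) -> list[str]:
    """Return questions whose new answers drift from approved baselines."""
    with open(approved_path) as f:
        approved = json.load(f)  # {"question": "approved answer", ...}
    flagged = []
    for question, baseline in approved.items():
        answer = ask(question)
        similarity = difflib.SequenceMatcher(None, baseline, answer).ratio()
        if similarity < threshold:
            flagged.append(question)  # route to attorney review
    return flagged
```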
Despite its many fits and starts, AVA is now scheduled to be launched in late January, if all goes according to plan. For her part, Marz remains optimistic about AVA’s potential to help Alaskans access the probate system but is more clear-eyed about AI’s current limits.
“We did shift our goals on this project a little bit,” Marz said. “We wanted to replicate what our human facilitators at the self-help center are able to share with people. But we’re not confident that the bots can work in that fashion, because of the issues with some inaccuracies and some incompleteness. But maybe with increasing model updates, that will change, and the accuracy levels will go up and the completeness will go up.”
“It was just so very labor-intensive to do this,” Marz added, despite “all the buzz about generative AI, and everybody saying this is going to revolutionize self-help and democratize access to the courts. It’s quite a big challenge to actually pull that off.”
