Stanford students apologize for copying Chinese-developed open source AI model
Two Stanford University students working in artificial intelligence have apologized for copying an open source large language model developed by China's Tsinghua University and tech company ModelBest.
The two students issued an apology on Monday on social media platform X, formerly Twitter, while awaiting a response from the third member of their team, Mustafa Aljadery.
"We apologize to the authors of (MiniCPM) for any inconvenience that we caused for not doing the full diligence to verify and peer review the novelty of this work," Siddharth Sharma and Aksh Garg wrote in the post.
Sharma said that he and Garg posted the project, Llama3-V, online, and that Aljadery wrote the code. "Our role here was to help him promote the model on Medium and Twitter," Sharma said.
"After seeing the Twitter posts about this topic (on Sunday), we asked Mustafa about proof of originality for Llama3-V and asked for the training code but we haven't seen any response so far. We were waiting for Mustafa to take the lead but instead we are releasing our own statement," he said.
Sharma said all references to Llama3-V had been taken down, and he promised that the team would be "cautious and diligent" in the future.
Sharma and Garg are computer science students at Stanford, according to their web pages; Aljadery, a computer science graduate of the University of Southern California, is based in San Diego, according to his LinkedIn page. His account on X has been set to private, and his website has been deleted.
China Daily sent emails requesting comment to Sharma and Garg but did not receive an immediate response.
Concerns about the team's project arose among Chinese internet users after the team announced the project online on May 29, according to the tech news website TechNode.
ModelBest confirmed on Sunday that the Stanford team's Llama3-V project, like its own MiniCPM model, is able to recognize the Tsinghua Bamboo Slips, a collection of Chinese texts dating to the Warring States Period (475-221 BC) written in ink on strips of bamboo. The slips were donated to Tsinghua University in 2008.
The Stanford team's project not only replicates the Chinese model's newly developed ability to recognize the ancient Chinese texts, but also reproduces the same errors, according to the TechNode report.
Li Dahai, CEO of ModelBest, said the Chinese team spent several months scanning the texts, character by character, from voluminous bamboo slips, annotating the data and integrating it into the model.
He said in a social media post that he hoped the ModelBest team's efforts can attract more attention and recognition, but not by being imitated or plagiarized.
"On the one hand, the imitation can be regarded as a kind of recognition from international counterparts," Li said. "On the other hand, we call for building an open, collaborative and trustful technical community."
Liu Zhiyuan, co-founder of ModelBest, told Yicai Global, an English-language Chinese news website, that the development of AI is inseparable from the open source sharing of global algorithms, data and models.
However, what Stanford's Llama3-V team did seriously undermined the foundations of open source sharing, including adherence to open source protocols, trust in other contributors and respect for previous achievements, he said.
He added that although China still faces gaps with top projects such as Sora and GPT-4, it has rapidly grown from a "nobody" a decade ago to becoming a key driver of innovation in AI technology.
A recent research-misconduct scandal at Stanford University involved former president Marc Tessier-Lavigne, who resigned in August after an investigation found serious flaws in studies he had supervised going back decades.
Contact the writers at liazhu@chinadailyusa.com