I worry about the training of Artificial Intelligence using the internet as its main source of information. One of the biggest challenges in teaching AI is teaching it how to group things. Unless a group is clearly identified, it’s not a group. That’s fine when counting items, but not when sorting ideas. What is fact vs fiction? What is truth vs a lie vs an embellishment vs an exaggeration vs a theory vs a really, really bad theory?
There are some dark places on the internet. There are some deeply flawed ideas about culture, race, gender, politics, and even health and fitness. There are porn sites that objectify women, and anti-science websites written as if they were reporting facts. There is a lot of ‘stupid shit’ on the internet. How is this information grouped by not-yet-intelligent AI systems?
There is the old saying, ‘Garbage in, garbage out’, and essentially that’s my concern. Any form of artificial general intelligence is only as good as the information put into the system, and while the internet is a great source of intelligent information, it’s also a cesspool of ridiculous information that’s just as easy to find. I’m not sure these two dichotomous streams of information are being grouped by AI systems in a meaningful and wise way… mainly because we aren’t smart enough to program these systems to know the difference.
The tools we have for searching the internet are based on algorithms that are constantly gamed by SEO techniques, and search is based on words, not ideas. The best ideas on the internet are not necessarily the ones most linked to, and bad ideas often get more clicks, likes, and attention. How does an AI weigh this? How does it group these ideas? And what conclusions does the AI draw? Because the reality is that the AI needs to make decisions, or it wouldn’t be considered intelligent. Are those decisions ones ‘we’ are going to want it to make? If the internet is the main database of information, then I doubt it.
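To make the worry concrete, here is a minimal sketch of the problem with popularity-based ranking. The pages, numbers, and accuracy scores are entirely invented for illustration; the point is only that a ranker fed engagement signals (clicks, likes, inbound links) has no column for truth:

```python
# Toy illustration with hypothetical data: ranking pages purely by
# engagement signals says nothing about whether the content is accurate.
# The "accuracy" field is invented and is exactly what the ranker never sees.

pages = [
    # (title, clicks, likes, inbound_links, accuracy)
    ("Peer-reviewed vaccine study summary", 1_200, 300, 40, 0.95),
    ("Miracle cure they don't want you to know", 90_000, 25_000, 800, 0.05),
    ("Measured take on diet research", 3_000, 900, 120, 0.85),
]

def engagement_score(page):
    """Score by raw popularity signals only, as a naive ranking algorithm might."""
    _, clicks, likes, links, _ = page
    return clicks + 10 * likes + 50 * links

ranked = sorted(pages, key=engagement_score, reverse=True)
top = ranked[0]
print(f"Top result: {top[0]!r} (accuracy {top[4]:.0%})")
```

Run it and the least accurate page wins, because engagement and accuracy are uncorrelated in this (made-up) dataset. Any AI trained to treat that ranking as a proxy for quality inherits the same blindness.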