Huzzah! @shom and @elaterite - I think you might find this particularly interesting…
I managed to add a handful of custom metadata fields to #darktable (I only have a few folders of images there as a test) to store film/camera/development info. After a bit of troubleshooting and reading output from #exiftool, I added a formula in the export metadata settings to concatenate a handful of them into the “description” EXIF fields. Now adding the image in my Mastodon app (Mona) pulls in the description as alt-text just like my #Lightroom / #NLP images do. I uploaded the same image to a #Piwigo album and it pulls in the same.
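If you want to sanity-check what actually landed in an exported file, here's a minimal sketch (assuming exiftool is on your PATH; the filename is just a placeholder) that reads back the description-type tags that Mastodon and Piwigo tend to pick up:

```python
import subprocess

# Read back the caption-ish tags from an exported file (placeholder filename).
# exiftool prints one "Tag Name : value" line per tag it finds in the file.
tags = ["-EXIF:ImageDescription", "-XMP-dc:Description", "-IPTC:Caption-Abstract"]
result = subprocess.run(
    ["exiftool", *tags, "test-export.jpg"],
    capture_output=True, text=True, check=True,
)
print(result.stdout)
```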
One MAJOR hurdle overcome - I have 9-ish months left of free Lightroom thanks to my new Nikon, so I have that long to try to figure out the rest.
(By the way this image was reversed with negadoctor, and it’s a test image from the Moskva I bought in NY.)
This misguided trend has resulted, in our opinion, in an unfortunate state of affairs: an insistence on building NLP systems using ‘large language models’ (LLM) that require massive computing power in a futile attempt at trying to approximate the infinite object we call natural language by trying to memorize massive amounts of data. In our opinion this pseudo-scientific method is not only a waste of time and resources, but it is corrupting a generation of young scientists by luring them into thinking that language is just data – a path that will only lead to disappointments and, worse yet, to hampering any real progress in natural language understanding (NLU). Instead, we argue that it is time to re-think our approach to NLU work since we are convinced that the ‘big data’ approach to NLU is not only psychologically, cognitively, and even computationally implausible, but, and as we will show here, this blind data-driven approach to NLU is also theoretically and technically flawed.

From Machine Learning Won't Solve Natural Language Understanding, https://thegradient.pub/machine-learning-wont-solve-the-natural-language-understanding-challenge/
#AI #GenAI #GenerativeAI #LLM #LLMs #NLP #NLU #GPT #ChatGPT #Claude #Gemini #LLAMA
- Statistics, as a field of study, gained significant energy and support from eugenicists with the purpose of "scientizing" their prejudices. Some of the major early thinkers in modern statistics, like Galton, Pearson, and Fisher, were eugenicists out loud; see https://nautil.us/how-eugenics-shaped-statistics-238014/
- Large language models and diffusion models rely on certain kinds of statistical methods, but discard any notion of confidence interval or validation that's grounded in reality. For instance, the LLM inside GPT outputs a probability distribution over the tokens (words) that could follow the input prompt. However, there is no way to even make sense of a probability distribution like this in real-world terms, let alone measure anything about how well it matches reality. See for instance https://aclanthology.org/2020.acl-main.463.pdf and Michael Reddy's The conduit metaphor: A case of frame conflict in our language about language
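To make the "probability distribution over tokens" point concrete, here's a toy sketch (the logits are made up, not taken from any real model) of the kind of object an LLM emits for a prompt: probabilities that sum to 1 over candidate next tokens, with nothing in the math that ties any candidate to whether it is true of the world.

```python
import math

# Made-up scores (logits) for the next token after some prompt,
# e.g. "The cat sat on the". A real LLM produces one such score
# for every token in its vocabulary.
logits = {"mat": 4.1, "roof": 2.3, "moon": 0.2, "theorem": -1.5}

# Softmax turns the scores into a probability distribution
# (non-negative values that sum to 1).
z = sum(math.exp(v) for v in logits.values())
next_token_probs = {tok: math.exp(v) / z for tok, v in logits.items()}

for tok, p in sorted(next_token_probs.items(), key=lambda kv: -kv[1]):
    print(f"{tok}: {p:.3f}")
```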
Early on in this latest AI hype cycle I wrote a note to myself that this style of AI is necessarily biased. In other words, the bias coming out isn't primarily a function of biased input data (though of course that's a problem too). That'd be a kind of contingent bias that could be addressed. Rather, the bias these systems exhibit is a function of how they are structured at their core, and no amount of data curating can overcome it. I can't prove this, so let's call it a hypothesis, but I believe it.
#AI #GenAI #GenerativeAI #ChatGPT #GPT #Gemini #Claude #Llama #StableDiffusion #Midjourney #DallE #LLM #DiffusionModel #linguistics #NLP