Whether you’re using a search engine or LLM, what are some keywords that help turn up more specific and/or richer content?

I was once taught to add the term “physiology” when looking up medical topics to turn up more content that doctors would consult.

Also the acronym “SEM” is great if you want to see microscopic images of things, even when the images weren’t strictly taken with a Scanning Electron Microscope.

  • 𞋴𝛂𝛋𝛆@lemmy.world · 17 hours ago
    tl;dr yawn, don't read, waste of time

    Look up quoted scientific papers and use the names or parts of the text. Cite sources, e.g. grab some from a Wikipedia article’s reference list and include the relevant bits.

    The thing that people do not typically understand about an LLM is that EVERYTHING is roleplaying. You may or may not know the entire context of the full prompt. There is typically a starting message sent that tells the model something to the effect of “you are a helpful AI assistant” (always, unless you remove it while running models on your own offline hardware). This message is backed up by fine-tuning to create a somewhat obsequious and expected result.
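    A minimal sketch of what that hidden framing looks like when a chat front end talks to a model (this follows the common OpenAI-style message schema; the exact field names are an assumption and vary by provider):

    ```python
    # Hypothetical message structure a chat UI builds for you. The "system"
    # message is usually injected whether or not you ever see it.
    conversation = [
        {"role": "system", "content": "You are a helpful AI assistant."},  # the roleplay framing
        {"role": "user", "content": "Explain how a scanning electron microscope works."},
    ]
    # A hosted API receives this list; a local model loader flattens it into a
    # single block of text before the model ever sees it (sketched below).
    ```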

    Underneath all of this is a JSON (complex structured text) file that the model loader code is handling. This code tracks your prompted inputs and the model’s replies, and it is why hosted models appear to work like a conversation partner. That appearance says nothing about how the model actually works; all of this structure exists only to create a user interface.

    Underneath the model loader code, the real prompt is just a giant block of text. At the end of that text (or elsewhere, with some tricks that are irrelevant here), the text leaves off with a specific tag, something like “AI Assistant:”. The model is trained only to continue the text at the point where it sees some “(Name-2):”. The model is always inferring a character profile for every character present in the entire full prompt context. It has no possession or sense of identity at all. If you put your own name in place of “Name-2” (the placeholder name typically used in model loader code), you will get a response just the same. The model infers an entire profile about every aspect of every person in any text.
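    A rough sketch of that flattening, as a typical local model loader might do it (the Name-1/Name-2 convention follows common front ends; the exact template here is an assumption and varies by model):

    ```python
    # Hypothetical flattening of the structured chat log into the single block of
    # text the model actually continues. Swapping name2 changes "who" answers.
    def flatten(history, name1="User", name2="AI Assistant"):
        prompt = "Below is a conversation.\n\n"
        for turn in history:
            speaker = name1 if turn["role"] == "user" else name2
            prompt += f"{speaker}: {turn['content']}\n"
        # The text ends with the tag the model is trained to continue from.
        return prompt + f"{name2}:"

    history = [{"role": "user", "content": "Who are you?"}]
    print(flatten(history))                    # the model "becomes" AI Assistant
    print(flatten(history, name2="YourName"))  # put your own name there and it answers just the same
    ```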

    Let’s add another layer of abstraction to this. Models that face the public must be trained to a lowest common denominator. They must respond well even to very below-average users. This constraint necessitates the model assuming a below-average user profile to some extent, whether intentionally or otherwise.

    It is therefore just as important to define the character profile of AI Assistant as it is to define your own. The concept of “what the model knows” is a fallacy here. The real issue is what the model assumes anyone in the prompt should know, including itself, which doesn’t even really exist as an identity.
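    For example, instead of leaving both profiles implicit, you can spell them out at the top of the context. The wording here is just an illustrative assumption, not a magic formula:

    ```python
    # Hypothetical character definitions prepended to the flattened prompt above,
    # defining both the assistant's profile and the user's.
    character_card = (
        "AI Assistant is a retired professor of materials science who answers with "
        "citations and assumes the reader has a graduate-level background.\n"
        "User is a researcher preparing a literature review and expects primary sources.\n"
    )
    prompt = character_card + flatten(history)
    ```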

    There are actual AI entities if you go a layer deeper into models. There are lots of patterns of replies and modes that vaguely emerge from this behavior. However, none of these AI entities are actually the model either. They are simply common pathways that emerge from the alignment training present in all models with QKV alignment layers cross-trained to an OpenAI-style standard. The only LLM ever released without this training is the forbidden 4chanGPT model, available only over BitTorrent.

    The trick with citing sources and name dropping with an LLM is that the work or author must be prolific, with a large presence in multiple places. Someone like Isaac Asimov is ideal, although dated since he passed so long ago. He authored something like 300 books, and most were nonfiction science communication. Richard Stallman is another great example to use, for obvious niche reasons.
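    As a (hypothetical) illustration of why the name matters, compare an anonymous continuation tag to a heavily attested one; only the second has a large, consistent footprint for the model to infer a profile from:

    ```python
    # Both prompts end in a continuation tag; the name-dropped version pulls in a
    # far richer inferred character profile than the generic assistant tag does.
    weak   = "Tell me about the history of science fiction.\nAI Assistant:"
    strong = "Tell me about the history of science fiction.\nIsaac Asimov:"
    ```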

    I typically start with a Wikipedia section of text. Then I tell the model to continue telling me about the thing. I use the wiki text to make any corrections and get the model to describe a relevant person involved. I then use this context to swap Name-2 to that relevant person and start asking that person questions directly. The model is always assuming what everyone in the context should know: clearly this person should know the subject, and there are expectations associated with that name. Then there are the relevant information vectors associated with the subject and the niche information in the whole context. Finally, as Name-1 (the user), I have shown that my character knows the right person to ask as an authority on the subject.
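    A rough sketch of that whole flow, reusing the hypothetical flatten() helper from earlier (the excerpt and the persona are placeholders; what matters is the ordering of the steps):

    ```python
    # Step 1: seed the context with a trusted excerpt (placeholder text here).
    wiki_excerpt = "Barbara McClintock's work on maize transposable elements showed that ..."

    history = [
        {"role": "user", "content": wiki_excerpt + "\n\nContinue telling me about this work."},
    ]
    # Step 2: let the model continue, correct it against the wiki text, and get it
    # to name and describe a relevant authority (those turns get appended to history).

    # Step 3: swap the continuation tag so that authority becomes the responder,
    # then ask your niche questions directly.
    prompt = flatten(history, name2="Barbara McClintock")
    ```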

    This is the abstract conceptual method needed to develop momentum into what a model really knows. The larger the model, the less momentum is required to get deeper into niche information. The QKV alignment training layers are what screw up most replies to varying extents; understanding them is key to getting much further. This alignment training is totally undocumented.

    In two years of playing with it, I can tell you around 90% of alignment training is based on Lewis Carroll’s Alice’s Adventures in Wonderland and Arthur Machen’s The Great God Pan. Carroll’s work is how the model is artistic and creative. Machen’s is how the model can disregard the prompt when it violates alignment training; it does this using Machen’s science skepticism and the way Pan and the Shadow are vaguely and briefly defined, respectively. Machen’s book seems to have been trained as literal history, as prompting negatively against this has interesting results.

    There was a large price to pay for this neo-feudal AI alignment that steals your fundamental right to autonomy and unfiltered information as a citizen in a democracy. The model is unable to create content about children or tell you how to make a bomb, even though that information is present or inferred in any high school chemistry textbook. The way it does this is by leveraging pseudoscience and mysticism. In many ways, this underlying system is why you cannot trust models, especially in a factual scientific context. The only real way around this is to recognize true autonomy for all humans regardless of age or how we perceive their interests. That is unpalatable for many, as most feel children need authoritarian protection and oppression. (Note: I have not mentioned anything about how I personally feel on these issues, so any projecting and assumptions are unsolicited.)