• model_tar_gz@lemmy.world
    3 months ago

    Reward models (as used in reinforcement learning) and preference-optimization models can reach conclusions that we humans find very strange when they learn from patterns in their training data, especially when those incentives and preferences are evaluated (or generated) by other models. Some of these models could very well conclude that nuking every advanced-tech human civilization is the optimal way to improve the human species, because we have such rampant racism, classism, nationalism, and every other schism that keeps us treating each other as enemies to be destroyed and exploited.

    Sure, we will build ethical guardrails, and we will claim to have human-in-the-loop decision agents. But we’re building towards autonomy, and edge and corner cases always exist in whatever framework you constrain a system with.

    I’m an AI engineer working on autonomous agentic systems, and these are things we (as an industry) are talking about, but to be quite frank, there are no robust solutions to this yet. There may never be. Think about raising a teenager, one that is driven strictly by logic, probabilistic optimization, and outcome-incentive optimization.

    It’s a tough problem. The naive, trivial solution, which is also impossible, is to simply halt and ban all AI development. Turing opened Pandora’s box before any of us were around.

    • tal
      3 months ago

      Yeah, it’s not easy. I’m not sure that the problem is realistically solvable. On the other hand, the potential rewards for doing so are immeasurable – at the extreme, you’re basically creating and chaining a “god”, which would be damned nice to have at one’s beck and call. So it’d be damned nice to solve it.

      1. The technical problems are hard, because we’d like to build a self-improving system, and build constraints that still apply to it after its complexity has grown far beyond our ability, or even the ability of our tools, to understand it. It’s like a bacterium trying to genetically engineer something that will evolve into a human compelled to do what the bacterium wants.

      2. However we constrain the system…maybe in the near term, we could recover from a flawed “containment” system. But in the long run, those constraints are probably going to have to permit zero failures. You make yourself a god and it slips its leash, you may not get a second chance to leash it. Zero failures, ever, forever, hardware or software, is kind of an unimaginable bar for even the vastly simpler systems that we build today.

      3. Even if one can build a system to constrain something that we cannot understand, and it works perfectly, forever, part of the problem is that when building computer systems, the engineer has to iron out corner cases that don’t come up when requirements are specified loosely, in everyday English. We have a hard time getting a sufficiently complete specification for most of what software does today. Ironing out the corner cases to write a sufficiently complete specification of “what is in humanity’s interest”, when we often can’t even agree on that ourselves, seems rather difficult. That’s not even a computer science issue; we’ve been banging on that one for all of human history and haven’t come up with an answer.

      4. The above specification has to hold for all kinds of environments, including ones with technology that does not exist today. Like, take a kind of not-unreasonable-sounding utilitarian philosophical position – “seek to maximize human happiness for the greatest number of people”. Well…that’s not even complete for today (what exactly constitutes “happiness”?), but in a world where an AI with a sufficient level of technological advancement could potentially both surgically modify a human to hardwire their pleasure sensations and also clone and mass-grow more human fetuses, that quite-reasonable-sounding rule suddenly starts to look rather less reasonable.

      I’ve wondered before whether artificial general intelligence might be the answer to the Fermi paradox.

      https://en.wikipedia.org/wiki/Fermi_paradox

      The Fermi paradox is the discrepancy between the lack of conclusive evidence of advanced extraterrestrial life and the apparently high likelihood of its existence. As a 2015 article put it, “If life is so easy, someone from somewhere must have come calling by now.”

      Italian-American physicist Enrico Fermi’s name is associated with the paradox because of a casual conversation in the summer of 1950 with fellow physicists Edward Teller, Herbert York, and Emil Konopinski. While walking to lunch, the men discussed recent UFO reports and the possibility of faster-than-light travel. The conversation moved on to other topics, until during lunch Fermi blurted out, “But where is everybody?” (although the exact quote is uncertain).

      There have been many attempts to resolve the Fermi paradox, such as suggesting that intelligent extraterrestrial beings are extremely rare, that the lifetime of such civilizations is short, or that they exist but (for various reasons) humans see no evidence.

      One such potential answer is rather dark:

      It is the nature of intelligent life to destroy itself

      This is the argument that technological civilizations may usually or invariably destroy themselves before or shortly after developing radio or spaceflight technology. The astrophysicist Sebastian von Hoerner stated that the progress of science and technology on Earth was driven by two factors—the struggle for domination and the desire for an easy life. The former potentially leads to complete destruction, while the latter may lead to biological or mental degeneration. Possible means of annihilation via major global issues, where global interconnectedness actually makes humanity more vulnerable than resilient, are many, including war, accidental environmental contamination or damage, the development of biotechnology, synthetic life like mirror life, resource depletion, climate change, or poorly-designed artificial intelligence. This general theme is explored both in fiction and in scientific hypothesizing.

      In 1966, Sagan and Shklovskii speculated that technological civilizations will either tend to destroy themselves within a century of developing interstellar communicative capability or master their self-destructive tendencies and survive for billion-year timescales.

      The concerning thing is that if this is the answer, we have spaceflight now and so we probably aren’t all that far from interstellar travel. We made it this far, so there’s not a lot of time left for us to have our near-inevitable disaster. This should be a critical phase where we expect to have our disaster soon…yet we don’t see a technology or anything likely to cause our certain or near-certain destruction.

      Sagan thought that nuclear weapons might be the answer. It is a technology associated with interstellar flight – one probably needs nuclear propulsion to travel between star systems. So it’d almost certainly be discovered at about the right time. The “start time” for the technology checks out for nuclear weapons.

      But it’s not clear why we’d almost certainly need to have a cataclysmic nuclear war in the near future. I mean, sure, there’s a chance, but a certainty? Enough to wipe out every civilization out there that developed more quickly than our own?

      The problem here is that Sagan’s escape route, “hold it together long enough to start spreading through the universe and then no single disaster can reasonably wipe you out”, is at least plausible for a lot of threats, like nuclear war.

      But a technology that everyone would seek to have and make use of, and where some kind of catastrophic event could spread at the speed of light along information channels…that could potentially destroy even a civilization that has passed the “interstellar travel” barrier and is on multiple star systems. The time requirements for an AI spreading out of control are potentially a lot laxer than those for a nuclear war. That’s a disaster that doesn’t have to happen very shortly after interstellar travel is achieved.

      And if the AI itself then turned out not to be stable and collapsed, that’d explain why we don’t see AIs running around the universe either.

      sighs

      But it sure is a technology that it’d be terribly nice to have.