• FaceDeer@fedia.io
    8 months ago

    But it doesn’t fully understand young and “naked young person” isn’t just a scaled down “naked adult”.

    Do you actually know that, or are you just assuming it?

    Personally, I’m basing my assertions on experience with related situations, where I’ve asked image AIs to generate images of things that I’m quite sure weren’t in their training sets and that require conceptual understanding to create “hybrids.” They’ve done a decent job of those, so I’m assuming that they can figure out this specific situation as well, since most of these models have a lot of examples of naked people and young people in their training sets. But I haven’t actually asked any AIs to generate images of naked young people to test this one specific case.

    • xmunk@sh.itjust.works
      8 months ago

      My opinion here is that “naked young person” isn’t as simple as other compound concepts because there are physiological changes we go through during puberty that an AI can’t reverse engineer. Something like “Italian samurai” involves concepts that occur at a surface level that it can easily understand while “naked young person” involves some components that can’t be derived simply from applying “young” to “naked person” or “naked” to “young person”.

      Someone did have a valid counterargument in this subthread, though: https://sh.itjust.works/comment/11713795

      • FaceDeer@fedia.io
        8 months ago

        Well, I haven’t gone to any of my image AIs and actually asked them to generate naked pictures of young people. So unless you want to go there, this will necessarily be somewhat theoretical.

        However, according to the article it’s possible to generate this stuff with Stable Diffusion models, and Stable Diffusion models have a negligible amount of CSAM in the training set. So short of actually doing the experiment that would seem to settle it.

        I think a lot of people don’t appreciate just how surprisingly sophisticated the “world model” that these image AIs have learned is. There was a paper a while back where some researchers were trying to analyze how image generators work internally, and they discovered that if you ask one to make a picture of a bicycle, for example, it will first come up with a depth map of the image before it starts doing anything to the visual output. That shows that the AI has figured out what the three-dimensional form of a bicycle is based entirely on a pile of two-dimensional training images, with no other clues telling it that the third dimension even exists in the first place.