• Muehe@lemmy.ml
    11 months ago

    Yeah but that’s my point, right?

    That

    1. you do not “replace data until your desired objective”.
    2. the original model stays intact (the W in the picture you embedded).

    Meaning that when you change or remove the LoRA (A and B), the same types of biases will just resurface from the original model (W). Hence “less biased” W being the preferable solution, where possible.
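    A minimal sketch of that point, assuming a single layer with made-up sizes and random matrices standing in for trained weights (`forward`, the shapes, and the rank are all hypothetical, not taken from the paper):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 8, 8, 2              # hypothetical layer size and LoRA rank

W0 = rng.normal(size=(d, k))   # frozen base weights (the "W" in the picture)
B = rng.normal(size=(d, r))    # LoRA factors; random stand-ins, not trained
A = rng.normal(size=(r, k))
x = rng.normal(size=k)

W0_before = W0.copy()

def forward(x, adapter=None):
    """Apply the layer, optionally adding a low-rank update B @ A."""
    if adapter is None:
        return W0 @ x
    B_, A_ = adapter
    return (W0 + B_ @ A_) @ x

base_out = forward(x)                   # behaviour of the base model
lora_out = forward(x, adapter=(B, A))   # behaviour with the LoRA applied
after_removal = forward(x)              # LoRA dropped again

base_unchanged = np.array_equal(W0, W0_before)
reverted = np.array_equal(base_out, after_removal)
adapter_changes_output = not np.allclose(base_out, lora_out)
```

    With the adapter dropped, the layer gives exactly the base output again, so whatever the base weights encode is back.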

    Don’t get me wrong, LoRAs seem quite interesting, they just don’t seem like a good general approach to fighting model bias.

    • jarfil@beehaw.org
      11 months ago

      “less biased” W being the preferable solution, where possible.

      Not necessarily. There are two parts to a diffusion model: a tokenizer, and a neural network with a series of layers (W in this case would be a single layer) that react in some way to some tokens. What you really want is a W “with more information”, no matter whether some tokens refer to a more or less “fair” (less biased) portion of it.

      It doesn’t really matter if “girl = 99% chance of white girl + 1% of [other skin tone] girl”, and “asian girl = sexualized asian girl”… as long as the “biased” token associations don’t reduce the number of “[skin tone] girl” variants you can extract with specific prompts, and the model still reacts correctly to negative prompts like “asian girl -sexualized”.

      LoRAs are a way to bludgeon a whole model into a strong bias, like “everything is a manga”, or “everything is birds”, or “all skin is frogs”, and so on. The interesting thing about LoRAs is that, if you take a base model where “girl = sexualized white girl”, and add an “all faces are asian” LoRA, and a “no sexualized parts” LoRA… then well, you’ve beaten the model into submission without having to use prompts (a pyrrhic victory).

      That is, unless you want something like a “multiracial female basketball team”.

      That would require the model to encode the “race” as multiple sets of features, then pick one at random for every player in whatever proportion you find acceptable… but for that, you’re likely better off adding an LLM preprocessor stage to pick a random set of races in your desired proportion, then have it instruct a bounding-box diffusion model to draw each player with a specific prompt, so the bias of the model’s tokens would again become irrelevant.
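      A rough sketch of that preprocessing idea, with a plain weighted random sampler swapped in for the LLM stage. Everything here is hypothetical: `assign_prompts`, the box coordinates, the placeholder variant labels, and the proportions are made up for illustration, and the output would still have to be fed to whatever region-conditioned model you use:

```python
import random

def assign_prompts(boxes, variants, weights, base_prompt, seed=None):
    """Pick one variant per bounding box, in roughly the given proportions."""
    rng = random.Random(seed)
    picks = rng.choices(variants, weights=weights, k=len(boxes))
    # One explicit prompt per region, so the model never has to "choose".
    return [(box, f"{variant} {base_prompt}") for box, variant in zip(boxes, picks)]

# Hypothetical layout: three players side by side, as (x0, y0, x1, y1).
boxes = [(0, 0, 100, 200), (100, 0, 200, 200), (200, 0, 300, 200)]
variants = ["variant A", "variant B", "variant C"]   # placeholder labels
weights = [0.5, 0.3, 0.2]                            # desired proportions

plan = assign_prompts(boxes, variants, weights, "basketball player", seed=42)
```

      Each entry in `plan` pairs a region with a fully specific prompt, which is why the default association of a bare token stops mattering.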

      • Muehe@lemmy.ml
        11 months ago

        a neural network with a series of layers (W in this case would be a single layer)

        I understood this differently: W is a whole model, not a single layer of a model. It is a layer of the Transformer architecture, that is, a single feed-forward or attention model that forms one layer of the Transformer. As the paper says, a LoRA:

        injects trainable rank decomposition matrices into each layer of the Transformer architecture

        It basically learns shifting the output of each Transformer layer. But the original Transformer stays intact, which is the whole point, as it lets you quickly train a LoRA when you need this extra bias, and you can switch to another for a different task easily, without re-training your Transformer. So if the source of the bias you want to get rid of is already in these original models in the Transformer, you are just fighting fire with fire.

        Which is a good approach for specific situations, but not for general ones. In the context of OP you would need one LoRA for fighting it sexualising Asian women, then you would need another one for the next bias you find, and before you know it you have hundreds and your output quality has degraded irrecoverably.
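        A toy illustration of the stacking concern, assuming random low-rank matrices as stand-ins for trained LoRAs (sizes, scale, and adapter count are made up). It only measures how far the combined weights move from the original, which says nothing by itself about actual output quality:

```python
import numpy as np

rng = np.random.default_rng(0)
d, r, n_adapters = 64, 4, 100     # hypothetical layer size, rank, LoRA count

W0 = rng.normal(size=(d, d))      # original weights
W = W0.copy()                     # weights with adapters stacked on top
drift = []                        # relative distance from W0 after each adapter

for _ in range(n_adapters):
    # Random stand-ins for one LoRA's factors; real ones would be trained.
    B = rng.normal(size=(d, r)) * 0.05
    A = rng.normal(size=(r, d)) * 0.05
    W += B @ A
    drift.append(np.linalg.norm(W - W0) / np.linalg.norm(W0))
```

        In this sketch `drift` ends up well above where it started: every extra adapter pushes the effective weights further from what the base model was trained to be.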

        • jarfil@beehaw.org
          11 months ago

          It basically learns shifting the output of each Transformer layer

          That would increase inference time, which is something they explicitly avoid.

          Check section 4.1 in the paper. W is a weight matrix for a single layer, and the training focuses on finding a ∆W such that the result is fine-tuned. The LoRA optimization lies in calculating ∆W in the form of BA with a lower rank, but W is still a weight matrix for the layer, not its output:

          W0 + ∆W = W0 + BA

          A bit later:

          When deployed in production, we can explicitly compute and store W = W0 + BA and perform inference as usual

          W0 being the model’s layer’s original weight matrix, and W being the modified weight matrix that’s being “executed”.
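          A small numeric check of that, as a sketch with made-up sizes and random values instead of trained weights (the shapes follow the paper's W0 as d×k, B as d×r, A as r×k; the concrete numbers are assumptions):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 16, 12, 2               # hypothetical dimensions, rank r << min(d, k)

W0 = rng.normal(size=(d, k))      # original layer weights
B = rng.normal(size=(d, r))       # low-rank factors (random stand-ins)
A = rng.normal(size=(r, k))
x = rng.normal(size=k)            # one input vector

# Unmerged: keep W0 untouched and add the low-rank path (extra matmuls).
unmerged = W0 @ x + B @ (A @ x)

# Merged: compute W = W0 + BA once, then run inference as usual.
W = W0 + B @ A
merged = W @ x

same_output = np.allclose(unmerged, merged)

full_params = d * k               # size of a full-rank update
lora_params = r * (d + k)         # size of B plus A
```

          Both paths give the same result, but the merged one is a single matrix of the same shape as W0, so there is nothing extra to run at inference time. The parameter counts show where the training savings come from.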

          the original Transformer stays intact

          At training time, yes. At inference time, no.

          before you know it you have hundreds and your output quality has degraded irrecoverably.

          This is correct. Just not because you’ve messed with the output of each layer, but with the weights of each layer… I’d guess messing with the outputs would cause a quicker degradation.