̶P̶r̶o̶t̶e̶c̶t̶ Obfuscate your content from bots and AIs

touzovitch@lemmy.ml · edit-2 1 year ago

Leraje@lemmy.blahaj.zone · 1 year ago

I totally applaud your efforts to find a solution to this issue, but I don’t think this is practicable, at least in its current form. I get the underlying idea that the scrapers will have to continually adapt to changes in the extension, but that’ll only slow them down for a negligible amount of time.

I don’t mean to sound negative and I really do thank you for your efforts but I can’t see how this could be effective.

touzovitch@lemmy.ml · 1 year ago

Slowing them down and preventing them from scaling is actually not that bad. We are in the context of public content accessible to anyone, so by definition it cannot be bulletproof.

Online Privacy becomes less binary (public vs private) when the internet contains content encrypted using various encryption methods, making it challenging to collect data efficiently and at scale.

Thank you so much for your comment though <3

random65837@lemmy.world · 1 year ago

So people without the extension would only see gibberish?

hakunawazo@lemmy.world · 1 year ago

That explains so many subs/comments. But maybe I’m out of touch like Skinner.

But on topic: I see the same problem as with link shorteners. One single service or extension disappears and all good content or links are gone.

Tippon@lemmy.dbzer0.com · edit-2 1 year ago

That’s the biggest problem. I used to use a suspension service for Chrome that would change your open links to its own format when a tab was suspended. I bookmarked hundreds of links in their format over the years.

The service was bought out by a third party, then sold to a scammer, leading to it getting banned by Google.

I’ve now got hundreds of links that are obfuscated, and the only way to get them back is to manually edit them and see which ones are important.

touzovitch@lemmy.ml · 1 year ago

But on topic: I see the same problem as with link shorteners. One single service or extension disappears and all good content or links are gone.

Not exactly. The extension is open source so even if the official extension is gone, you would still be able to decrypt previously “redakted” content.

touzovitch@lemmy.ml · edit-2 1 year ago

Exactly!

For example, here’s a Medium article with encrypted content: https://redakt.org/demo/

otp@sh.itjust.works · edit-2 1 year ago

Sign up to discover human stories that deepen your understanding of the world.

Free

Distraction-free reading. No ads.

Organize your knowledge with lists and highlights.

Tell your story. Find your audience.

Wow, I couldn’t read a thing without the extension! Works perfectly!

Haha

touzovitch@lemmy.ml · 1 year ago

😂😂😂

ares35@kbin.social · 1 year ago

concept is ‘workable’ in an open, but small, tight-knit community.

but in general, if google can’t read it–few eyeballs will ever see it.

touzovitch@lemmy.ml · 1 year ago

but in general, if google can’t read it–few eyeballs will ever see it.

You bring up a good point. The Internet is full of spider bots that crawl the web to index it and improve search results (e.g. Google). In my case, I don’t want any comment I post here or on big platforms like Reddit, Twitter or LinkedIn to be indexed. But I still want to be part of the conversation. At the very least, I would like to have the choice whether or not any text I publish online is indexed.

Linus_Torvalds@lemmy.world · edit-2 1 year ago

Not sure. Couldn’t the bots just decrypt it the same way?

Ahhh, didn’t read to the end. Hm. Still not convinced. I don’t want captchas etc to use the internet

touzovitch@lemmy.ml · edit-2 1 year ago

Captcha was just an example :-)

What I’m trying to say is that any small change we add to the extension will have very little (or no) effect on real users, but will force the scrapers to adapt. That might require significant human and machine resources to collect data at a massive scale.

EDIT: And thank you for your feedback <3

LWD@lemm.ee · edit-2 1 year ago

deleted

touzovitch@lemmy.ml · edit-2 1 year ago

You are absolutely right! Using a single public encryption key cannot be considered secure. But it is still better than having your content in the clear.

I intend to add more encryption options (sharable custom key, PGP), that way users can choose the level of encryption they want for their public content. Of course, the next versions will still be able to decrypt legacy encrypted content.

In a way, it makes online Privacy less binary:

Instead of having an Internet where we choose to have our content either “public” (in clear) or “private” (E2E encrypted), we have an Internet full of content encrypted with heterogeneous methods of encryption (single key, custom key, key pairs). It would be impossible to scale data collection at this rate!

LWD@lemm.ee · edit-2 1 year ago

deleted

touzovitch@lemmy.ml · 1 year ago

You have a point. Or even malicious links!

We have to be careful with the decrypted output. Redakt is an open source and collaborative project, just saying… 😜

LWD@lemm.ee · edit-2 1 year ago

deleted

touzovitch@lemmy.ml · 1 year ago

Image injection is something I will need to stress-test.

Lemongrab@lemmy.one · 1 year ago

Maybe if this was condensed to a userscript, or instead of encryption used base64 encoding. It’s really just about obfuscating/transforming text for automated systems, not securing it.
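For illustration, the base64 idea could look roughly like the sketch below. This is not Redakt's code: the `obf:` marker and both function names are made up for the example, and base64 only hides text from naive scrapers, since anyone can reverse it.

```python
import base64

# Hypothetical marker so a reader script can tell obfuscated text from plain text.
MARKER = "obf:"


def obfuscate(text: str) -> str:
    """Encode text as base64 and prepend the marker."""
    encoded = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return MARKER + encoded


def deobfuscate(token: str) -> str:
    """Reverse obfuscate(); text without the marker is returned unchanged."""
    if not token.startswith(MARKER):
        return token
    return base64.b64decode(token[len(MARKER):]).decode("utf-8")


if __name__ == "__main__":
    hidden = obfuscate("hello bots")
    print(hidden)
    print(deobfuscate(hidden))
```

A userscript would do the same thing in the browser: scan the page for the marker and swap each match for its decoded text.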

touzovitch@lemmy.ml · edit-2 1 year ago

You’re right. “Securing” is a bad word. “Obfuscating” might be more appropriate. I actually had the same feedback from Jonah of Privacy Guides.

I use AES encryption with a single public key at the moment. That way, if I want to give the user the option to encrypt with a custom key, I don’t have to change the encryption method.

EDIT: Editing the title of this thread ̶P̶r̶o̶t̶e̶c̶t̶

andruid@lemmy.ml · 1 year ago

Can you create custom decryption keys? I like the idea of an easy to use encryption mechanism for non private platforms.

touzovitch@lemmy.ml · 1 year ago

What do you mean by non private platforms?

In this POC, you can only encrypt content using Redakt’s public key. That way you are guaranteed to see the content since the key is already installed in the extension.

I intend to add the option to encrypt with a custom sharable key in the v.2.

andruid@lemmy.ml · 1 year ago

Honestly even this platform, but any public platform without e2e and the direct choice of who to share it with.

touzovitch@lemmy.ml · 1 year ago

r3d4kt-U2FsdGVkX1/lGJZ5fHhIJPQ8w7fdKIrvJKGa4C6hVzgxa99BNXMr7LQFL9Rur05EFVITe2pREZaianyq1F5k4dQEovbUKXWwjoj7R2ZXmu3z836vItVgTHh/Wen4p0pp&&&
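A note on the shape of the encrypted comment above. Its body starts with `U2FsdGVkX1`, which is how base64 renders the ASCII bytes `Salted__`, the header of the OpenSSL-style salted container that libraries such as CryptoJS emit. The sketch below only inspects that structure and does not decrypt anything. The `r3d4kt-` prefix and `&&&` suffix are inferred from this single sample, and the function name is hypothetical.

```python
import base64

PREFIX = "r3d4kt-"         # assumption: inferred from the one sample in this thread
SUFFIX = "&&&"             # assumption: treated here as a terminator
SALT_HEADER = b"Salted__"  # 8-byte header of the OpenSSL salted format


def inspect_token(token: str) -> dict:
    """Split a redakt-style token and report what its container looks like."""
    if not (token.startswith(PREFIX) and token.endswith(SUFFIX)):
        raise ValueError("not a redakt-style token")
    body = token[len(PREFIX):-len(SUFFIX)]
    raw = base64.b64decode(body, validate=True)
    salted = raw.startswith(SALT_HEADER)
    return {
        "salted": salted,
        # In the salted format, the 8 bytes after the header are the salt.
        "salt": raw[8:16] if salted else None,
        "ciphertext_len": len(raw) - 16 if salted else len(raw),
    }


if __name__ == "__main__":
    # Build a synthetic token rather than relying on the thread's sample.
    raw = SALT_HEADER + bytes(range(8)) + bytes(32)
    sample = PREFIX + base64.b64encode(raw).decode("ascii") + SUFFIX
    print(inspect_token(sample))
```

This is also why apps like Jerboa show the raw string: without something that recognises the prefix and holds the key, the token is just text.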

umami_wasabi@lemmy.ml · 1 year ago

I’m browsing via the Jerboa app, where I can’t read anything except some nonsense strings.

You’ve got the idea, but the execution is subpar TBH. Browsers are not the only way to view content nowadays.

touzovitch@lemmy.ml · edit-2 1 year ago

You’re right, app traffic is something we’ll need to crack. But as a first step, any traffic going through a web browser is already significant.

wowwoweowza@lemmy.ml · 1 year ago

I see what you did there.

PowerCrazy@lemmy.ml · 1 year ago

This is a cool proof of concept and pretty easy to adapt for almost any purpose not just text. I don’t think it’s “useful” but then again “usefulness” isn’t exactly well defined in the first place.

touzovitch@lemmy.ml · edit-2 1 year ago

Thank you 😊

I actually thought about this. Adapting the same approach to other kinds of content like images, audio or video would be a game changer!!

Imagine uploading videos to YouTube that only viewers with a key would be able to understand!

But it is a challenge, as it might require advanced knowledge of image and audio processing.

LastoftheDinosaurs@lemmy.world · 1 year ago

deleted by creator

touzovitch@lemmy.ml · edit-2 1 year ago

But why? Why do you people hate AI so much?

I don’t think it’s a question of whether to “hate” AI or not. Personally, I have nothing against it.

As always with Privacy, it’s a matter of choice: when I publish something online publicly, I would like to have the choice whether or not this content is going to be indexed or used to train models.

It’s a dual dilemma. I want to benefit from the hosting and visibility of big platforms (Reddit, LinkedIn, Twitter etc.) but I don’t want them doing literally anything with my content because lost somewhere in their T&C it’s mentioned “we own your content, we do whatever tf we want with it”.

war@kbin.social · 1 year ago

deleted by creator

queermunist she/her@lemmy.ml · 1 year ago

How come when I plagiarize other people’s creative content it’s illegal, but when AI does it it’s fine?

S410@kbin.social · 1 year ago

It’s illegal if you copy-paste someone’s work verbatim. It’s not illegal to, for example, summarize someone’s work and write a short version of it.

As long as overfitting doesn’t happen and the machine learning model actually learns general patterns, instead of memorizing training data, it should be perfectly capable of generating data that’s not copied verbatim from humans. Whom, exactly, is a model plagiarizing if it generates a summarized version of some work you give it, particularly if that work is novel and was created or published after the model was trained?

queermunist she/her@lemmy.ml · 1 year ago

All these AI do is algorithmically copy-paste. They don’t have original thoughts, original conclusions or original ideas; all of it is just copy-paste with extra steps.

S410@kbin.social · 1 year ago

Learning is, essentially, “algorithmically copy-paste”. The vast majority of things you know, you’ve learned from other people or other people’s works. What makes you more than a copy-pasting machine is the ability to extrapolate from that acquired knowledge to create new knowledge.

And currently existing models can often do the same! Sometimes they make pretty stupid mistakes, but they often do, in fact, manage to end up with brand new information derived from old stuff.

I’ve tortured various LLMs with short stories, questions and riddles, which I’ve written specifically for the task and which I’ve asked the models to explain or rewrite. Surprisingly, they often get things either mostly or absolutely right, despite the fact it’s novel data they’ve never seen before. So, there’s definitely some actual learning going on. Or, at least, something incredibly close to it, to the point it’s nigh impossible to differentiate it from actual learning.

LWD@lemm.ee · edit-2 1 year ago

deleted

S410@kbin.social · edit-2 1 year ago

Not once did I claim that LLMs are sapient, sentient or even have any kind of personality. I didn’t even use the overused term “AI”.

LLMs, for example, are something like… a calculator. But for text.

A calculator for pure numbers is a pretty simple device all the logic of which can be designed by a human directly.

When we want to create a solver for systems that aren’t as easily defined, we have to resort to other methods. E.g. “machine learning”.

Basically, instead of designing all the logic entirely by hand, we create a system which can end up in a finite, yet still nearly infinite, number of states, each of which defines behavior different from the others. By slowly tuning the model using existing data and checking its performance we (ideally) end up with a solver for something a human mind can’t even break up into building blocks, due to the sheer complexity of the given system (such as a natural language).

And like a calculator that can derive that 2 + 3 is 5, despite the fact that the number 5 is never mentioned in the input, or that this particular formula was not part of the suite of tests used to verify that the calculator works correctly, a machine learning model can figure out that “apple slices + batter = apple pie”, assuming it has been tuned (aka trained) right.
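To make the calculator analogy concrete, here is a toy sketch: a three-parameter linear model tuned by gradient descent on addition examples, with the pair (2, 3) deliberately left out of the data. It is nowhere near an LLM. The model shape, the learning rate, the step count and the name `train_adder` are all choices made for this example only.

```python
def train_adder(steps: int = 5000, lr: float = 0.02):
    """Fit y = w1*a + w2*b + bias to sums, never showing the model (2, 3)."""
    # Every pair with 0 <= a, b <= 4 except the held-out (2, 3).
    data = [(a, b, a + b) for a in range(5) for b in range(5) if (a, b) != (2, 3)]
    n = len(data)
    w1 = w2 = bias = 0.0
    for _ in range(steps):
        g1 = g2 = gb = 0.0
        for a, b, target in data:
            err = (w1 * a + w2 * b + bias) - target
            # Gradient of the mean squared error for each parameter.
            g1 += 2 * err * a / n
            g2 += 2 * err * b / n
            gb += 2 * err / n
        w1 -= lr * g1
        w2 -= lr * g2
        bias -= lr * gb
    return w1, w2, bias


w1, w2, bias = train_adder()
# The model was tuned only on other pairs, yet this lands very close to 5.
prediction = w1 * 2 + w2 * 3 + bias

if __name__ == "__main__":
    print(round(prediction, 4))
```

The tuned parameters end up near w1 = 1, w2 = 1, bias = 0, i.e. the general rule for addition rather than a lookup table of the training pairs. Whether that counts as “learning” is exactly what the thread is arguing about.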

queermunist she/her@lemmy.ml · 1 year ago

Chat bots do not learn, stop anthropomorphizing them.

LastoftheDinosaurs@lemmy.world · 1 year ago

deleted by creator

LWD@lemm.ee · edit-2 1 year ago

deleted

queermunist she/her@lemmy.ml · 1 year ago

People make derivative works because they add their own ideas and spin. AI do not have ideas or spin, it’s copy-paste with extra steps.

LastoftheDinosaurs@lemmy.world · 1 year ago

deleted by creator

queermunist she/her@lemmy.ml · 1 year ago

The tech requires huge amounts of processing power and loose laws to even exist. It could be banned quite easily.

It won’t be lol

moreeni@lemm.ee · 1 year ago

Have you even been following what images AI can generate now? Every work is original, it doesn’t just copy and paste pixels.

queermunist she/her@lemmy.ml · 1 year ago

What it does is use a large statistical model to determine which pixels it copies, but it’s still copy/paste with extra steps.

Zach777@fosstodon.org · 1 year ago

@queermunist @moreeni I have to disagree. The plagiarism claims are unfounded, as the AIs are making their own artwork based on what they have learned, usually starting from noise and de-noising it into something that matches their memories of the key words. In the case of the generative art AIs, anyway.

While there can be valid arguments against copyrighted material being used for the AIs, plagiarism is not one of them.

queermunist she/her@lemmy.ml · 1 year ago

Far be it from me to defend the concept of intellectual property, but if a chat bot can be argued to not plagiarize then that implies it has an intelligence. It really doesn’t. It’s plagiarism with extra steps.

LWD@lemm.ee · edit-2 1 year ago

deleted

Albinjose7345@lemmy.dbzer0.com · edit-2 1 year ago

deleted by creator

touzovitch@lemmy.ml · 1 year ago

I don’t think AI is bad as a whole. At least I would like to choose if the content I post online can be used (or not) to train models.