@hedgehog

hedgehog@ttrpg.network · 3 hours ago

If your Java dev is using Jackson to serialize to JSON, they might not be very experienced with Jackson, or they might think that a Java object with a null field would serialize to JSON with that field omitted. And on another project that might have been true, because Jackson can be configured globally to omit null properties. They can also fix this issue with annotations at the class/field level, most likely @JsonInclude(Include.NON\_NULL).

More details: https://www.baeldung.com/jackson-ignore-null-fields

hedgehog@ttrpg.network · 2 days ago

“Glue is not pizza sauce” seems like a common fact to me but Googles llm disagrees for example.

That wasn’t something an LLM came up with, though. That was done by a system that uses an LLM. My guess is the system retrieves a small set of results and then just uses the LLM to phrase a response to the user’s query by referencing the links in question.

It’d be like saying to someone “rephrase the relevant parts of this document to answer the user’s question” but the only relevant part is a joke. There’s not much else you can do there.

hedgehog@ttrpg.network · 7 days ago

Is it possible to force a corruption if a disk clone is attempted?

Anything that corrupts a single file would work. You could certainly change your own disk cloning binaries to include such functionality, but if someone were accessing your data directly via their own OS, that wouldn’t be effective. I don’t know of a way to circumvent that last part other than ensuring that the data isn’t left on disk when you’re done. For example, you could use a ramdisk instead of non-volatile storage. You could delete or intentionally corrupt the volume when you unmount it. You could split the file, storing half on your USB flash drive and keeping the other half on your PC. You could XOR the file with contents of another file (e.g., one on your USB flash drive instead of on your PC) and then XOR it again when you need to access it.

What sort of attack are you trying to protect from here?

If the goal is plausible deniability, then it’s worth noting that VeraCrypt volumes aren’t identifiable as distinct from random data. So if you have a valid reason for having a big block of random data on disk, you could say that’s what the file was. Random files are useful because they are not compressible. For example, you could be using those files to test: network/storage media performance or compression/hash/backup&restore/encrypt&decrypt functions. You could be using them to have a repeatable set of random values to use in a program (like using a seed, but without necessarily being limited to using a PRNG to generate the sequence).

If that’s not sufficient, you should look into hidden volumes. The idea is that you take a regular encrypted volume, whose free space, on disk, looks just like random data, you store your hidden volume within the free space. The hidden volume gets its own password. Then, you can mount the volume using the first password and get visibility into a “decoy” set of files or use the second password to view your “hidden” files. Note that when mounting it to view the decoy files, any write operations will have a chance of corrupting the hidden files. However, you can supply both passwords to mount it in a protected mode, allowing you to change the decoy files and avoid corrupting the hidden ones.

hedgehog@ttrpg.network · 8 days ago

It sounds like you want these files to be encrypted.

Someone already suggested encrypting them with GPG, but maybe you want the files themselves to also be isolated, even while their data is encrypted. In that case, consider an encrypted volume. I assume you’re familiar with LUKS - you can encrypt a partition with a different password and disable auto-mount pretty easily. But if you’d rather use a file-based volume, then check out VeraCrypt - it’s a FOSS-ish [1], cross-platform tool that provides this capability. The official documentation is very Windows-focused - the ArchLinux wiki article is a pretty useful Linux focused alternative.

Normal operation is that you use a file to store the volume, which can be “dynamic” with a max size or can be statically sized (you can also directly encrypt a disk partition, but you could do that with LUKS, too). Then, before you can access the files - read or write - you have to enter the password, supply the encryption key, etc., in order to unlock it.

Someone without the password but with permission to modify the file will be capable of corrupting it (which would prevent you from accessing every protected file), but unless they somehow got access to the password they wouldn’t be able to view or modify the protected files.

The big advantage over LUKS is ease of creating/mounting file-based volumes and portability. If you’re concerned about another user deleting your encrypted volume, then you can easily back it up without decrypting it. You can easily load and access it on other systems, too - there are official, stable apps on Windows and Mac, though you’ll need admin access to run them. On Android and iOS options are a bit more slim - EDS on Android and Disk Decipher on iOS. If you’re copying a volume to a Linux system without VeraCrypt installed, you’ll likely still be able to mount it, as dm-crypt has support for VeraCrypt volumes.

1 - It’s based on TrueCrypt, which has some less free restrictions, e.g., c. Phrase "Based on TrueCrypt, freely available at http://www.truecrypt.org/" must be displayed by Your Product (if technically feasible) and contained in its documentation.”

hedgehog@ttrpg.network · 10 days ago

it’s still not a profitable venture

Source? My understanding is that Google doesn’t publish Youtube’s expenses directly but that Youtube has been responsible for 10% of Google’s revenue for the past few years (on the order of $31.5 Billion in 2023) and that it’s more likely than not profitable when looked at in isolation.

hedgehog@ttrpg.network · 13 days ago

From the comments on the link someone else posted, it appears there is already a CA government-provided option:

If you are located in the US states California, Oregon, and Washington, see the MyShake app. This app is part of the USGS MyShake project.

hedgehog@ttrpg.network · 14 days ago

One option I haven’t seen yet: You can set up a local file share on Windows or Linux and connect to it via the Files app.

hedgehog@ttrpg.network · 15 days ago

Tons of laptops with top notch specs for 1/2 the price of a M1/2 out there

The 13” M1 MacBook Air is $700 new from Best Buy. Better specced versions are available for $600 used (buy-it-now) on eBay, but the base specced version is available for $500 or less.

What $300 or less used / $350 new laptops are you recommending here?

hedgehog@ttrpg.network · 18 days ago

theoretically they can

Is this a purely theoretical capability or is there actually evidence they have this capability?

it’s already been proven that they can tap into anyone’s phone

Listening into a conversation that you’re intentionally relaying across public infrastructure and gaining access to the phone itself are two very different things.

The use of proprietary software in literally everything

Speak for yourself. And let’s be real, if you’re on Lemmy you’re 10 times more likely to be running Linux.
Proprietary != closed source
Do you really think that just because something is closed source means that it can’t be analyzed?

the amount of exploits the NSA has on hand

How many zero-day exploits does the NSA have? How many can be deployed remotely and without a nontrivial action by a user?

what’s stopping the NSA from spying this much?

Scale, capacity, cost, number of employees

—-

I’m not saying we shouldn’t oppose government surveillance. We absolutely should. But like another commenter pointed out, I’m much more concerned with the amount of data that corporations collect and have.

hedgehog@ttrpg.network · 18 days ago

reasonable expectations and uses for LLMs.

LLMs are only ever going to be a single component of an AI system. We’ve only had LLMs with their current capabilities for a very short time period, so the research and experimentation to find optimal system patterns, given the capabilities of LLMs, has necessarily been limited.

I personally believe it’s possible, but we need to get vendors and managers to stop trying to sprinkle “AI” in everything like some goddamn Good Idea Fairy.

That’s a separate problem. Unless it results in decreased research into improving the systems that leverage LLMs, e.g., by resulting in pervasive negative AI sentiment, it won’t have a negative on the progress of the research. Rather the opposite, in fact, as seeing which uses of AI are successful and which are not (success here being measured by customer acceptance and interest, not by the AI’s efficacy) is information that can help direct and inspire research avenues.

LLMs are good for providing answers to well defined problems which can be answered with existing documentation.

Clarification: LLMs are not reliable at this task, but we have patterns for systems that leverage LLMs that are much better at it, thanks to techniques like RAG, supervisor LLMs, etc…

When the problem is poorly defined and/or the answer isn’t as well documented or has a lot of nuance, they then do a spectacular job of generating bullshit.

TBH, so would a random person in such a situation (if they produced anything at all).

As an example: how often have you heard about a company’s marketing departments over-hyping their upcoming product, resulting in unmet consumer expectation, a ton of extra work from the product’s developers and engineers, or both? This is because those marketers don’t really understand the product - either because they don’t have the information, didn’t read it, because they got conflicting information, or because the information they have is written for a different audience - i.e., a developer, not a marketer - and the nuance is lost in translation.

At the company level, you can structure a system that marketers work within that will result in them providing more correct information. That starts with them being given all of the correct information in the first place. However, even then, the marketer won’t be solving problems like a developer. But if you ask them to write some copy to describe the product, or write up a commercial script where the product is used, or something along those lines, they can do that.

And yet the marketer role here is still more complex than our existing AI systems, but those systems are already incorporating patterns very similar to those that a marketer uses day-to-day. And AI researchers - academic, corporate, and hobbyists - are looking into more ways that this can be done.

If we want an AI system to be able to solve problems more reliably, we have to, at minimum:

break down the problems into more consumable parts
ensure that components are asked to solve problems they’re well-suited for, which means that we won’t be using an LLM - or even necessarily an AI solution at all - for every problem type that the system solves
have a feedback loop / review process built into the system

In terms of what they can accept as input, LLMs have a huge amount of flexibility - much higher than what they appear to be good at and much, much higher than what they’re actually good at. They’re a compelling hammer. System designers need to not just be aware of which problems are nails and which are screws or unpainted wood or something else entirely, but also ensure that the systems can perform that identification on their own.

hedgehog@ttrpg.network · 23 days ago

It’s more like paying the ticket without ever showing up in court. And at least where I live, I can do that.

hedgehog@ttrpg.network · 24 days ago

The news sites can cover whatever they want. If their readers consume it, great - they’re writing to audience. Doesn’t mean we can’t criticize it when it gets posted here.

hedgehog@ttrpg.network · 24 days ago

And when you’re comparing two closed source options, there are techniques available to evaluate them. Based off the results of people who have published their results from using these techniques, Apple is not as private as they claim. This is most egregious when it comes to first party apps, which is concerning. However, when it comes to using any non-Apple app, they’re much better than Google is when using any non-Google app.

There’s enough overlap in skillset that pretty much anyone performing those evaluations will likely find it trivial to configure Android to be privacy-respecting - i.e., by using GrapheneOS on a Pixel or some other custom ROM - but most users are not going to do that.

And if someone is not going to do that, Android is worse for their privacy.

It doesn’t make sense to say “iPhones are worse at respecting user privacy than Android phones” when by default and in practice for most people, the opposite is true. What we should be saying is “iPhones are better at respecting privacy by default, but if privacy is important to you, the best option is to put in a bit of extra work and install GrapheneOS on a Pixel.”

hedgehog@ttrpg.network · 24 days ago

You think that Google Play Services is FOSS? Or that the version of Android on Samsung phones (as well as of most other Android phone manufacturers), including all baked in software, is FOSS?

hedgehog@ttrpg.network · 24 days ago

Better than bad is still “better.”

hedgehog@ttrpg.network · 24 days ago

If you’re talking about a stock Android OS on anything other than a Pixel, iOS wins in both regards. Stock on a Pixel, I don’t know that Apple is more secure, but if you’re installing apps via Google Play that use Google Play Services, iOS is certainly more private. Vs GrapheneOS on a Pixel, iOS is less private by far.

hedgehog@ttrpg.network · 1 month ago

Apparently it’s still being actively developed! I’m impressed.

April 15, 2024 Lynx v2.9.1 release

hedgehog@ttrpg.network · 1 month ago

Just sharing this link to another comment I made replying to you, since it addresses your calculations regarding entropy: https://ttrpg.network/comment/7142027

hedgehog@ttrpg.network · 1 month ago

Ah, fair enough. I was just giving people interested in that method a resource to learn more about it.

The problem is that your method doesn’t consistently generate memorable passwords with anywhere near 77 bits of entropy.

First, the example you gave ended up being 11 characters long. For a completely random password using alphanumeric characters + punctuation, that’s 66.5 bits of entropy. Your lower bound was 8 characters, which is even worse (48 bits of entropy). And when you consider that the process will result in some letters being much more probable, particularly in certain positions, that results in a more vulnerable process. I’m not sure how much that reduces the entropy, but it would have an impact. And that’s without exploiting the fact that you’re using quoted as part of your process.

The quote selection part is the real problem. If someone knows your quote and your process, game over, as the number of remaining possibilities at that point is quite low - maybe a thousand? That’s worse than just adding a word with the dice method. So quote selection is key.

But how many quotes is a user likely to select from? My guess is that most users would be picking from a set of fewer than 7,776 quotes, but your set and my set would be different. Even so, I doubt that the set an attacker would need to discern from is higher than 470 billion quotes (the equivalent of three dice method words), and it’s certainly not 2.8 quintillion quotes (the equivalent of 5 dice method words).

If your method were used for a one-off, you could use a poorly known quote and maybe have it not be in that 470 billion quote set, but that won’t remain true at scale. It certainly wouldn’t be feasible to have a set of 2.8 quintillion quotes, which means that even a 20 character password has less than 77.5 bits of entropy.

Realistically, since the user is choosing a memorable quote, we could probably find a lot of them in a very short list - on the order of thousands at best. Even with 1 million quotes to choose from, that’s at best 30 bits of entropy. And again, user choice is a problem, as user choice doesn’t result in fully random selections.

If you’re randomly selecting from a 60 million quote database, then that’s still only 36 bits of entropy. When the database has 470 billion quotes, that’ll get you to 49 bits of entropy - but good luck ensuring that all 470 billion quotes are memorable.

There are also things you can do, at an individual level, to make dice method passwords stronger or more suitable to a purpose. You can modify the word lists, for one. You can use the other lists. When it comes to password length restrictions, you can use the EFF short list #2 and truncate words after the third character without losing entropy - meaning your 8 word password only needs to be 32 characters long, or 24 characters, if you omit word separators. You can randomly insert a symbol and a number and/or substitute them, sacrificing memorizability for a bit more entropy (mainly useful when there are short password length limits).

The dice method also has baked-in flexibility when it comes to the necessary level of entropy. If you need more than 82 bits of entropy, just add more words. If you’re okay with having less entropy, you can generate shorter passwords - 62 bits of entropy is achieved with a 6 short-word password (which can be reduced to 18 characters) and a 4 short-word password - minimum 12 characters - still has 41 bits of entropy.

With your method, you could choose longer quotes for applications you want to be more secure or shorter quotes for ones where that’s less important, but that reduces entropy overall by reducing the set of quotes you can choose from. What you’d want to do is to have a larger set of quotes for your more critical passwords. But as we already showed, unless you have an impossibly huge quote database, you can’t generate high entropy passwords with this method anyway. You could select multiple unrelated quotes, sure - two quotes selected from a list of 10 billion gives you 76.4 bits of entropy - but that’s the starting point for the much easier to memorize, much easier to generate, dice method password. You’ve also ended up with a password that’s just as long - up to 40 characters - and much harder to type.

This problem is even worse with the method that the EFF proposes, as it’ll output passphrases with an average of 42 characters, all of them alphabetic.

Yes, but as pass phrases become more common, sites restricting password length become less common. My point wasn’t that this was a problem but that many site operators felt that it was fine to cap their users’ passwords’ max entropy at lower than 77.5 bits, and few applications require more than that much entropy. (Those applications, for what it’s worth, generally use randomly generated keys rather than relying on user-generated ones.)

And, as I outlined above, you can use the truncated short words #2 list method to generate short but memorable passwords when limited in this way. My general recommendation in this situation is to use a password manager for those passwords and to generate a high entropy, completely random password for them, rather than trying to memorize them. But if you’re opposed to password managers for some reason, the dice method is still a great option.

hedgehog@ttrpg.network · 1 month ago

Sure, and that’s roughly the same amount of entropy as a 13 character randomly generated mixed case alphanumeric password. I’ve run into more password validation prohibiting a 13 character password for being too long than for being too short, and for end-user passwords I can’t recall an instance where 77.5 bits of entropy was insufficient.

But if you disagree - when do you think 77.5 bits of entropy is insufficient for an end-user? And what process for password generation can you name that has higher entropy and is still easily memorized by users?