![](https://programming.dev/pictrs/image/8140dda6-9512-4297-ac17-d303638c90a6.png)
Right? It screams wayyyy pre-Y2K, but MySQL was only released in '95
will it become a relic of the past?
Probably
Why does YEAR exist in the first place? Who would actually make use of it?
Accounting systems in the 90s that needed to squeeze out every drop of performance imaginable
I expect it won’t
The YEAR datatype is a 1-byte integer, but the engine adds/subtracts 1900 to the value under the hood and has special handling for zero.
If you need a range of more than 255 years, you can use a 2-byte integer, which doesn't need that special handling under the hood, because with 2 bytes you can store 65,000+ distinct years
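A rough sketch of that packing scheme in Python (the function names are made up for illustration; MySQL's YEAR stores 1901–2155 in one byte, with 0 reserved for the special "zero" year 0000):

```python
def year_to_byte(year: int) -> int:
    """Pack a year into one unsigned byte, MySQL-YEAR style."""
    if year == 0:
        return 0  # special handling: 0 means the "zero" year 0000
    if not 1901 <= year <= 2155:
        raise ValueError("YEAR only supports 1901-2155")
    return year - 1900  # fits in 1..255, i.e. one unsigned byte

def byte_to_year(b: int) -> int:
    """Unpack the stored byte back into a year."""
    return 0 if b == 0 else b + 1900

print(year_to_byte(1995))  # 95
print(byte_to_year(255))   # 2155
```

A 2-byte SMALLINT, by contrast, can hold the year directly (up to 65,535 unsigned) with no offset trick needed.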
Literally every library with any traction in any field is MIT licensed.
If the scientific Python stack were GPL, industry would have just kept paying for MATLAB licenses
For every 1 person who knows how to use the Windows command line, there are 50 people struggling because they didn't embed their video into their PowerPoint, or worse, their USB stick only contains a shortcut to their actual .ppt file
There are like 10,000 different solutions, but I would just recommend using what's built into Python.
If you have multiple versions installed, you should be able to call `python3.12` to use 3.12, etc.
Best practice is to use a different virtual environment for every project, which is basically a copy of an installed Python version with its own packages folder. Calling pip with the system python installs the package for the entire OS. Calling it with sudo puts the packages in the separate package directory reserved for the operating system, which can conflict with distro-managed packages and break stuff (as far as I remember; this could have changed in recent versions)
Make a virtual environment with `python3.13 -m venv venv` (the 2nd `venv` is the directory name). Instead of calling the system python, call the executable at `venv/bin/python3`.
If you do `source venv/bin/activate`, it will temporarily replace your shell commands (python, pip, etc.) to point to the executables in your venv instead of the system python install; run `deactivate` to revert. IDEs should detect the virtual environment in your project folder and automatically activate it
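The same environment can also be created from Python itself via the stdlib `venv` module, which is what `python3 -m venv` runs under the hood (`demo-venv` here is just an example directory name):

```python
import os
import venv

# Create a virtual environment in ./demo-venv, equivalent to running
# `python3 -m venv demo-venv` from the shell.
# with_pip=False skips bootstrapping pip, which keeps this demo fast.
venv.create("demo-venv", with_pip=False)

# The venv gets its own interpreter; invoking it instead of the system
# python is what isolates the project's packages.
bindir = "Scripts" if os.name == "nt" else "bin"
print(os.path.join("demo-venv", bindir))
```

Calling the interpreter inside the environment (or sourcing the activate script) is all the "activation" really is: it just changes which executables are found first.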
The feature is explicit sync, a brand-new graphics stack API that fixes some issues with NVIDIA rendering under Wayland.
It's not a big deal. Canonical basically said "this isn't a bug fix or security patch, so it's not getting backported into our LTS release", which means if you want it you have to build GNOME/Mutter from source, switch operating systems, or just wait a few months for the next Ubuntu release
GNOME said this update is a minor bug fix (a point release).
Canonical said it's actually a major feature update and doesn't want to backport it into its LTS repositories
Reddit has way more data than you would have been exposed to via the API, though. They can look at things like a user's ASN (is the traffic coming from a datacenter?), whether they were using a VPN, and they track things like scroll position, cursor movements, read time before posting a comment, how long it takes to type that comment, etc.
no one at reddit is going to hunt these sophisticated bots because they inflate numbers
You are conflating "don't care about bots" with "don't care about showing bot-generated content to users". If the latter increases activity and engagement, there is no reason to put a stop to it. However, when it comes to building predictive models, A/B testing, and other internal decisions, they have a vested financial interest in making sure they are focusing on organic users: how humans interact with humans and/or bots is meaningful data; how bots interact with other bots is not
Not with 64gb ram and 16+ cores on that budget
To compare every comment on reddit to every other comment in reddit’s entire history would require an index
You think in Reddit’s 20 year history no one has thought of indexing comments for data science workloads? A cursory glance at their engineering blog indicates they perform much more computationally demanding tasks on comment data already for purposes of content filtering
you need to duplicate all of that data in a separate database and keep it in sync with your main database without affecting performance too much
Analytics workloads are never run on the production database, always on read replicas, which are built asynchronously from the transaction logs so as not to affect the production database's read/write performance
Programmers just do what they’re told. If the managers don’t care about something, the programmers won’t work on it.
Reddit’s entire monetization strategy is collecting user data and selling it to advertisers - It’s incredibly naive to think that they don’t have a vested interest in identifying organic engagement
Look at the picture above - this is trivially easy. We are talking about identifying repost bots, not seeing if users pass/fail the Turing test
If 99% of a user’s posts can be found elsewhere, word for word, with the same parent comment, you are looking at a repost bot
I know everyone here likes to circlejerk over "le Reddit so incompetent", but at the end of the day they are a (multi-)billion-dollar company, and it's willfully ignorant to assume there isn't a single engineer at the company who knows how to measure string similarity between two comment trees (hint: `import difflib` in Python)
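A minimal sketch of that similarity check with stdlib `difflib` (the comment strings and the 0.9 threshold are made up for illustration; a real pipeline would compare a user's posts against matching threads in the site's history):

```python
import difflib

# Two hypothetical comments: an original and a lightly edited repost.
original = "Honestly the best explanation of this bug I have seen so far."
repost = "Honestly the best explanation of this bug I've seen so far."

# SequenceMatcher.ratio() returns a similarity score in [0, 1];
# near-verbatim reposts score close to 1.0.
ratio = difflib.SequenceMatcher(None, original, repost).ratio()
print(round(ratio, 2))

# Simple repost heuristic: flag pairs above a high threshold.
is_repost = ratio > 0.9
```

Combine a score like this with the parent-comment match described above and repost bots stand out immediately; no Turing test required.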
If you have access to the entire Reddit comment corpus it’s trivial to see which users are only reposting carbon copies of content that appears elsewhere on the site
Reddit has access to its own data: they absolutely know which users are posting unique content and which users' content is a 100% copy of data that exists elsewhere on their own platform
Reddit probably omits bot accounts when it sells its data to AI companies
So the “biologists and pharmacologists” you are citing are just armchair scientists in the Lemmy comment section
They aren't being made anymore; people are just reselling old hoarded stock
https://eyeondesign.aiga.org/we-spoke-with-the-last-person-standing-in-the-floppy-disk-business/
They aren't talking about system administrators. They are talking about third-party software presenting a privilege-escalation prompt (administrator access) and changing your default browser without you knowing about it
You can just point your domain at your local IP, e.g. 192.168.0.100