As a machine learning platform engineer, these sound like technology choices as opposed to infrastructure decisions. I would love to read a post like this that really wrestles with the infrastructure trade-offs that were made. But thanks for the post.
Side note: there is a small typo repeated twice: "Kuberentes".
A shower thought I had recently is that we should start referring to "Business cases" as "Survival cases" and "Business books" as "Survival books" and that makes the inherent bias so much clearer.
Well… my fave post-grad course was Bankruptcy. It was insolvency after insolvency after insolvency, and in a way also survivorship bias, but I digress. Like Poor Charlie (RIP) used to say: find out where you are going to die and make sure never to go there.
And if you happen to end up there anyway, make sure you die properly and with grace. That is, manage your bankruptcy properly, without any shenanigans or refusal to accept the inevitable.
I have one notebook that I carry around where I list tasks just in order to cross them out. There is something so viscerally satisfying in crossing out a task that just provides so much momentum to whatever I'm doing. There's just something beautiful about making the implicit explicit and then striking it through.
I have been known to list a bunch of sometimes fairly trivial stuff I need to do in a day in my notebook calendar. And it is very satisfying to cross a half-dozen things off.
Very interesting post, and I read it in full. But I found it a shame that the author didn't talk more about the system that brought him from 5k to 200k, as the title implies :)
I asked this question 5 years ago in an AMA the CEO did once here, but they plainly ignored me and are a unicorn 5 years later. The trick here is to move so fast that the law has trouble keeping up (same with Uber/AirBNB).
This basically means: use a version of Python between 3.8 and 3.9, and any version of pandas at or above 1.4.3.
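A quick way to sanity-check what such constraints actually match is the third-party `packaging` library (the same one pip uses internally); the specifier strings below are my restatement of those constraints in PEP 440 syntax:

```python
from packaging.specifiers import SpecifierSet
from packaging.version import Version

# "Python between 3.8 and 3.9" == any 3.8.x or 3.9.x, but not 3.10
python_spec = SpecifierSet(">=3.8,<3.10")
# poetry's caret constraint "^1.4.3" expands to this range
pandas_spec = SpecifierSet(">=1.4.3,<2.0.0")

print(Version("3.9.7") in python_spec)   # True
print(Version("3.10.0") in python_spec)  # False
print(Version("1.5.0") in pandas_spec)   # True
```

This is handy when you're unsure whether a given interpreter or package release falls inside a constraint before running a full resolve.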
What I like about poetry is that it makes sure that the whole dependency graph of the packages you add is consistent. If it cannot solve the graph, it fails fast and fails hard, which is a good thing.
This is probably a very bad explainer of what poetry does, but be sure to check it out! :)
Poetry has a major issue with its lockfiles when working on active projects. It stores a checksum of the top-level dependency listing (the content-hash in poetry.lock), which causes any two PRs/branches that independently update the top-level requirements to conflict with each other.
The other issue with Poetry is that it uses its own pyproject.toml dependency listing format instead of the one standardized in the relevant PEP (https://peps.python.org/pep-0621/). This is understandable for historical reasons (Poetry was first written before this was standardized), but Poetry should have been updated to support the standard format.
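To make the difference concrete, here is a minimal sketch of the same pandas dependency in both formats (the project name is hypothetical; a real file would use only one of the two):

```toml
# Standardized PEP 621 format, understood by most modern tools:
[project]
name = "myapp"
dependencies = ["pandas>=1.4.3"]

# Poetry's own, pre-PEP-621 format:
[tool.poetry.dependencies]
pandas = "^1.4.3"
```

The practical cost is that any tool that wants to read a Poetry project's dependencies needs Poetry-specific parsing instead of the standard `[project]` table.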
A relatively minor issue, but the poetry shell command is also a footgun. It's presented as a way to configure your shell to activate the virtualenv for the project. In reality it's a very slow, barely functional terminal emulator running on top of your terminal, which will cause problems for any programs that assume a working terminal or talk to the tty directly.
poetry shell is also not a terminal emulator, it's just a subshell with some environment variables set up for your project. Once you are in, it's a regular shell. If anything is slow, it's adding or removing a dependency, but that's probably still faster than editing requirements.txt, clearing out your virtualenv and then reinstalling everything again.
The process spawned by `poetry shell` is a terminal emulator driven by the pexpect and cleo packages. It hijacks and proxies the user's keystrokes before sending them to the underlying terminal.
That is the point I was making. It's not a proper terminal emulator, instead it's a half-assed one. If it gets between the user's keystrokes and the host shell, it should be a proper emulator. Otherwise it should set up the environment and get out of the way.
Poetry is good, I'll hit a bug in poetry from time to time but it's improving quickly.
Direnv is better: its python layout will automatically create a hidden virtualenv directory in your work tree and set up your PATH when you cd into it. The only downside is it doesn't seem to work on Windows.
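As a minimal sketch, the setup is a one-line .envrc at the project root (plus a one-time `direnv allow`):

```
# .envrc — direnv's stdlib "layout python3" creates a virtualenv
# under .direnv/ and activates it whenever you cd into the project
layout python3
```

You'll typically also want to add `.direnv/` to your .gitignore.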
I don't see the link between poetry and direnv? Poetry is about solving python's dependency issues, direnv doesn't seem to have anything to do with that
Give it a go, then tell me what the point is of any of these poetry/pipenv/hatch/flit/pdm/pyflow things if neither you nor your teammates work on Windows.
That will only take care of setting up and loading the correct virtualenv. Which you may prefer to "poetry shell" or "poetry run", since it's more automatic, but that's not the main reason for using poetry.
The main reason to use poetry is sane dependency management the way it exists for most other ecosystems (bundler, cargo, npm, maven, gradle, ...). In particular, that includes lockfiles.
Use direnv to take care of the virtualenv part of functionality poetry offers, use pip-tools to deal with the dependency management/reproducible build part. Or if you like, straight up pip freeze.
Use pyenv to install all the different python versions you need.
Ideally, all of this functionality should be bundled in one tool, but the only such thing available is pyflow, which kept blowing up with an exception for me.
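Put together, the suggested setup might look something like this (a sketch, assuming pyenv, direnv and pip-tools are installed; version numbers and the pandas requirement are illustrative):

```sh
# pick and pin an interpreter
pyenv install 3.9.16
pyenv local 3.9.16            # writes .python-version

# fire-and-forget virtualenv via direnv
echo "layout python3" > .envrc
direnv allow

# dependency management via pip-tools
echo "pandas>=1.4.3" > requirements.in
pip-compile requirements.in   # writes fully pinned requirements.txt
pip-sync requirements.txt     # make the env match it exactly
```

Each tool covers one concern, so any of the three can be swapped out independently.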
The hardest packaging problem in Python is not resolving or pinning dependencies. Pip largely solved that some years ago; literally every Python packager uses some parts of pip underneath. The messiest part is literally packaging: how to install and isolate the packages.
How you install multiple Python versions doesn't really matter as long as the binaries are on your PATH: you can use Homebrew or MacPorts or pyenv or whatever. The only remaining problem is how to manage your virtualenvs. You can use virtualenv or venv directly, but you will have to manage where to put them and remember to activate them before you install dependencies and dev tooling. With direnv it's fire and forget: once you have direnv set up, a one-line directive in a .envrc file, and perhaps a gitignore entry, you don't have to think about where to put the virtualenv or remember to activate it again.
So yes, I'm actually just recommending direnv if you want to keep it simple.
When your project depends on a module version and that module depends on another one (a sub-dependency), it's very common that re-installing the same module version in a new environment breaks something because the sub-dependency was updated. This is not something that direnv solves, so those other tools are still needed.
Maybe it is the same, but the idea is to take the selected version of each dependency and store a hash checksum for it, so that one can later reproduce the exact same dependencies. Poetry just doesn't read setup.cfg, only its own tool-specific config section; this way it stays out of the way of any other tool.
A Pipfile.lock can store hashes for multiple versions of a package built for multiple architectures, whereas a requirements.txt line pins a single version (pip does accept several --hash options per line, one per built artifact).
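For reference, a hash-pinned requirements.txt entry looks like this (the digests below are placeholders, not real hashes):

```
# one --hash per built artifact (wheel per platform, sdist);
# pip accepts the download if any listed hash matches
pandas==1.4.3 \
    --hash=sha256:<linux-wheel-digest> \
    --hash=sha256:<macos-wheel-digest> \
    --hash=sha256:<sdist-digest>
```

With `pip install --require-hashes`, every requirement in the file must carry at least one hash, or the install fails.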
Can a requirements.txt or a Pipfile store cryptographically-signed hashes for each dependency? And which tool would check that the signatures validate against the package builder's signing keys, not just PyPI's upload keys?
FWIU, nobody ever added GPG .asc signature support to pip? Which keys would it trust for which package? Should twine download the package after upload and check the publisher and PyPI-upload signatures?
If the hashes are retrieved over the same channel as the package (i.e. HTTPS), and that channel is unfortunately compromised, why wouldn't a MITM change those software package artifact hash checksums too?
Only if the key used to sign the package / package manifest (with per-file hashes) was retrieved over a different channel (e.g. WKD, or HKP over HTTPS with or without certificate pinning (*)), and the key is trusted to sign for that package, should you install the software package artifact and assign file permissions and extended filesystem attributes.
> sigstore empowers software developers to securely sign software artifacts such as release files, container images, binaries, bill of material manifests [SBOM] and more. Signing materials are then stored in a tamper-resistant public log.
> It’s free to use for all developers and software providers, with sigstore’s code and operational tooling being 100% open source, and everything maintained and developed by the sigstore community.
> How sigstore works: Using Fulcio, sigstore requests a certificate from our root Certificate Authority (CA). This checks you are who you say you are using OpenID Connect, which looks at your email address to prove you’re the author. Fulcio grants a time-stamped certificate, a way to say you’re signed in and that it’s you.
> You don’t have to do anything with keys yourself, and sigstore never obtains your private key. The public key that Cosign creates gets bound to your certificate, and the signing details get stored in sigstore’s trust root, the deeper layer of keys and trustees and what we use to check authenticity.
> Your certificate then comes back to sigstore, where sigstore exchanges keys, asserts your identity and signs everything off. The signature contains the hash itself, public key, signature content and the time stamp. This all gets uploaded to a Rekor transparency log, so anyone can check that what you've put out there went through all the checks needed to be authentic.
In comparison to poetry I think it includes more advanced multi-environment and multi-python-version support and a tox-like testing matrix. It probably gets a little too complex there.
It also works with pyproject.toml
If anyone else has experience with Hatch vs Poetry please share!
I've been using PDM recently and although there have been a few issues I really like just cd'ing into a directory, running "python" and the correct set of packages being available.
I seem to remember that some part of poetry's slowness is due to how the Python package index works (package metadata often can't be fetched without downloading the packages themselves), and is therefore a problem shared by all such tools.
That said, I've used both pipenv and poetry, and I had projects where pipenv would simply time out when trying to resolve packages. I haven't seen the same behaviour with poetry (indeed, that was the reason I migrated one project from pipenv to poetry after I just had to give up with the former).
I'm half Asian and I get the "Asian flush" [0], which for all intents and purposes induces the same effect as this drug (DSF).
>Individuals who experience the alcohol flushing reaction may be less prone to alcoholism. Disulfiram, a drug sometimes given as treatment for alcoholism, works by inhibiting acetaldehyde dehydrogenase, causing a five to tenfold increase in the concentration of acetaldehyde in the body. The resulting irritating flushing reaction tends to discourage affected individuals from drinking.[9][10]
I literally have a negative feedback loop from drinking alcohol: it makes me feel really bad, my skin flushes, I feel my heartbeat in my chest/neck/head, and I feel miserable.
Honestly, it kind of sucks, because when you can't drink you notice how much socialising is based around alcohol. I've found ways to cope with this and am on good terms with myself now but it really took me a long time. Especially to get around the social/peer pressure of having to drink.
On the bright side, I'll likely never become an alcoholic.
The effect you describe is only one of the effects of antabuse. The others are substantially worse (in that they are extremely painful and in that they can be medically quite severe).
The mechanism is similar, you're right, but not quite the same.