
RFC: Software Engineering Language Policy at Posthog #71

Open: wants to merge 2 commits into main


fuziontech (Member)

We've talked around this one for a while. I know some people here are polyglots who really love working with and being exposed to a few different languages; others are extremely dedicated to their languages.

The important thing to underscore here is that it is about 10% proposal to use Golang and about 90% honest desire to get people's opinions.

This is a rare chance for us to do Issues over PRs... otherwise I would have just put up a huge PR entirely in Golang and said 🚢 it.

@mariusandra (Contributor) left a comment

I'm not in any of the teams that'll be maintaining this going forward, so their buy-in is more important than mine. That said, I think this is a distraction. First, the RFC reads like it's written to find a problem whose answer is "Go". Second, the "frankly" part of the problem statement was frankly weak and unconvincing. 🤷

Let's not. 🤣

Comment on lines +13 to +19
Frankly:

- Python is slow.
- Node is a memory hog.
- Node tooling is slow.
- Dependencies can be huge.
- No guarantees that code is correct.
Contributor

  • Python is slow.

True in general, but relevant only for specific use cases (like this one).

  • Node is a memory hog.

Has this been a problem? There are alternative JS runtimes (e.g. Bun, Deno) that do this better, but they are too new to rely on in production.

  • Node tooling is slow.

Depends on the tooling. The frontend builds quite fast (in under 3 to 10 seconds), and the plugin server's tooling could be made 10x faster if needed.

  • Dependencies can be huge.

We're switching to pnpm, which will fix this.

  • No guarantees that code is correct.

Can you elaborate?


The bad:

- Considered somewhat crusty
Member
Strong-opinions-weakly-held time: anyone writing Java should be writing Kotlin.

The dependency management and build tooling in Java is horrible: arcane, manual, and hard to debug.


## Problem statement

We have an upcoming opportunity to rebuild, or build from scratch, services that are critical parts of our data pipeline. It will be important for these services to be correct, fast, and efficient. With that in mind, now is a good time to ask: are we using the correct tools for the job?
@pauldambra (Member), Nov 10, 2022

I'd suggest that some of the benefit of switching languages comes from rewriting/extracting the service, and only part of it comes from doing that in a different language.

At a previous job we spent a while rewriting from Python to C# because "C# is faster than Python". Then we realised that we were CPU-bound and moved a bunch of work onto the GPU. Python and C# were then both faster than we needed them to be.

So: "are we using the correct tools for the job" is a great question. But being really clear on why we're asking that and how we'll know when we're done is super important. E.g. moving ingestion onto the GPU is probably more complicated and not actually faster.

So, there are two interesting questions here:

  1. what needs to be done to make ingestion safer/faster/misc
  2. how many programming languages can we support

These are really different questions.

I'm not working on ingestion, so I can have opinions on 1, but "so what?"

On "how many programming languages"... The problem with a new (to you) programming language is almost never the language.

Aside: according to https://neverworkintheory.org/2014/01/29/stefik-siebert-syntax.html, Python and Ruby are consistently measured as the easiest to learn, and C-style languages are no easier to learn than a language made of random keywords.

The problem with a new (to you) language is generally the tooling. Most noticeably, in my experience: dependency management, and building and releasing things.

(Go and update our Android library from Java 8 to Java 19 if you want proof of this :))

So we need a comparison of building, releasing, and running in k8s for the languages we consider. I think we can consider a smaller list (although C#'s marketing should be "Java but good", and I've a friend we could hire if we started using Clojure, which seems to inspire massive love from folk who use it).

The other thing is adoption within the org. Who needs to learn the language, and how and when do they do that? How do we know when to write it? Are we migrating to it or adding it alongside? Which services mustn't it be used for?

And finally: hiring. "Come work here because you're excited about language X. Incidentally, for the first few months you'll be working on these bugs in Python." So we can hire from a wider pool if we support a wider range of languages, but are we in a place to hire someone who only wants to work in language X?


Bonus "after finally" point: who owns tooling for each language? Does the platform team commit to providing support for building, deploying, and running all the languages? Is that in our definition of platform? Or do we need some champions?


Grouping these together because they really are converging. Really this is any language on the JVM.

The JVM is truly a wonderful runtime. It's very fast. The hotspot detection + just-in-time (JIT) compilation is _super_ impressive.
Member
I saw a talk by folk at a German car sales company who were shipping an ML model in Java. They had to write tooling to exercise all of the code paths they cared about during boot, because JIT warm-up was a problem for them.

That said there are a bunch of alternative VMs that are focussed on speed because of serverless-style loads...


The goal here is to spark debate about which languages we should use at Posthog. What are we open to? Why should we not adopt new technologies?

Success here would be to make a decision, enact a policy on it, get engineers aligned, and not have to worry about this again (at least for some time).
Member
As above... one decision only?

  1. what needs to be done to make ingestion safer/faster/misc
  2. how many programming languages can we support
  3. how do we support any one programming language

@charlescook-ph (Contributor)

Obviously hiring difficulty shouldn't be a deciding factor here (and I am 100% not suggesting that), but if we don't have a strong feeling overall for Python vs. Go or Rust, it's worth noting that there are roughly 5x as many Python devs out there as there are for either of those languages.

(Again, any decision should be what's best for the company and product, not what is easier to hire for - I'm just calling this out if there doesn't end up being strong consensus one way or the other.)

@Twixes (Contributor) commented Nov 10, 2022

I can see how some of these languages might be good choices for a greenfield project (especially Rust and Go, in my view), but I'm curious: what greenfield projects are we planning? It doesn't seem like it would benefit us to rewrite anything.

@ellie left a comment

I have a bunch of thoughts here, and I think the outcome of this depends on what we're trying to achieve:

  1. are we suggesting a paved road or framework for all future services, which requires buy-in from everyone and continued maintenance into the future? I.e., do we want to eventually replace Python and Node, and consider them "old" technologies in our stack?

  2. do we want to use something that isn't Python for the inserter service?

If it's the latter, I think that instead of arguing that technology X is better than technology Y in general, we should argue why introducing another technology is worth it for that specific use case.

From my understanding of the inserter service, it would be well implemented in Go. But we should definitely explore Python as a possibility and spell out why it isn't a good fit, given that Go would add a little friction elsewhere.


Otherwise, going forward, I'd love us to be more data-driven with decisions like this. It's almost ironic that we aren't, given our product enables companies to make data-driven decisions about their own products. Before we make sweeping decisions about a language or technology, we need to measure things in much more detail and know why what we have right now does not work well enough for us.
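As a concrete starting point for that kind of measurement, here is a minimal Python sketch of a throughput baseline for an existing code path, taken before any language decision. It assumes the work can be called as a plain function on a batch; `decode_all` and the sample `$pageview` payloads are hypothetical stand-ins, not our real ingestion code.

```python
import json
import statistics
import time


def measure_throughput(func, payloads, repeats=5):
    """Return the median items/second for func(payloads) across several runs."""
    rates = []
    for _ in range(repeats):
        start = time.perf_counter()
        func(payloads)
        elapsed = time.perf_counter() - start
        rates.append(len(payloads) / elapsed)
    # The median is less sensitive than the mean to one-off slow runs.
    return statistics.median(rates)


def decode_all(payloads):
    # Hypothetical workload standing in for a real pipeline step:
    # decode a batch of JSON event payloads.
    return [json.loads(p) for p in payloads]


if __name__ == "__main__":
    batch = [
        json.dumps({"event": "$pageview", "distinct_id": f"user-{i}"})
        for i in range(50_000)
    ]
    rate = measure_throughput(decode_all, batch)
    print(f"median throughput: {rate:,.0f} events/s")
```

Taking the median over several runs damps noise from warm-up and other processes; the resulting events/second figure is the kind of baseline a Go or Rust prototype would need to beat by enough to justify an extra language.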

While I love playing with new tech, it's also worth giving this essay a read: https://mcfunley.com/choose-boring-technology


Frankly:

- Python is slow.

Python can also be blazingly fast.

  1. Do we know where our Python code is slow, and why? Perhaps there is a hot path we can optimize, some memory allocations we can clean up, or some wasteful logic.
  2. New Python asyncio runtimes can be super fast. I spent a previous job writing a lot of async Python with things like FastAPI and uvicorn, and that was in no way slow.
  3. Have we considered PyPy, or similar?
  4. Worst case, we could write hot paths in something like Rust and call them via PyO3 or similar. But for our application, I highly doubt we are CPU-bound, and we are unlikely to see any meaningful performance benefit.

Anyway, this is all a way of saying that you can write slow python and you can write fast python, and perhaps we should consider focusing our efforts on the latter before introducing new languages.
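Point 1 is the cheapest to check. Below is a minimal sketch of finding the hot path with the standard library's `cProfile` and `pstats`, assuming the slow code can be invoked as a plain function; `parse_events`, `enrich`, and `pipeline` are hypothetical stand-ins for real ingestion steps, not existing PostHog functions.

```python
import cProfile
import io
import json
import pstats


def parse_events(raw_lines):
    # Hypothetical ingestion step: decode JSON event payloads.
    return [json.loads(line) for line in raw_lines]


def enrich(events):
    # Hypothetical enrichment step: add a derived field to each event.
    for event in events:
        event["distinct_id_len"] = len(event["distinct_id"])
    return events


def pipeline(raw_lines):
    return enrich(parse_events(raw_lines))


def profile_pipeline(raw_lines, top_n=5):
    """Run the pipeline under cProfile and return (result, report_text)."""
    profiler = cProfile.Profile()
    profiler.enable()
    result = pipeline(raw_lines)
    profiler.disable()

    buffer = io.StringIO()
    stats = pstats.Stats(profiler, stream=buffer)
    # Sort by cumulative time so the most expensive call chains come first,
    # and only print the top_n entries.
    stats.sort_stats(pstats.SortKey.CUMULATIVE).print_stats(top_n)
    return result, buffer.getvalue()


if __name__ == "__main__":
    sample = [
        json.dumps({"event": "$pageview", "distinct_id": f"user-{i}"})
        for i in range(10_000)
    ]
    _, report = profile_pipeline(sample)
    print(report)
```

A report like this tells us whether time goes into our own logic, JSON decoding, or I/O, which is the data needed before choosing between optimizing Python, trying PyPy, or moving a single hot path to Rust.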


## Meet the eligible languages

### Golang

Tbh I would not be against us introducing Go for a new service, so long as we're sure that Python does not and cannot suit our use case.

- Not terribly expressive
- Verbose, but easy to read

### Rust
@ellie, Nov 10, 2022

I think I would not be wrong in saying I have the most Rust experience here. I LOVE Rust, and I'd love to write it at work. I also have several friends who are fantastic engineers and would consider applying here purely because of Rust.

But tbh I'd be super against us adopting it. The learning curve is huge (prepare to not be comparatively productive for around a quarter). Unless it turns out there's actually more people writing Rust for fun here than I thought 😄

While it's maturing, the ecosystem is nothing like Python's. We'd be spending a lot more time in the weeds. There are fewer "obvious" choices than in other languages. E.g., which async runtime should we use (if any)? Why? Do we know enough about how our application does/should work in production to even start making such a choice?

Generally, organizations choose Rust because:

  1. They have explicitly decided that they will be a "Rust company" from the beginning (e.g. see TrueLayer)
  2. They have eked out every last drop of performance from their current stack and need to go further still (e.g. see Discord)

@hazzadous hazzadous removed their request for review December 28, 2022 09:25
6 participants