Our Java Open Source Contributions to Presto & Airlift

Introduction

This is a follow-up to a previous post describing our work on building a state-of-the-art data platform from the ground up:

Previous post

Here I describe two key contributions that we developed and merged into the Presto and Airlift codebases so that we could fully incorporate these technologies into our project.

1. Contributing to an Open Source Project

In my opinion, every time you decide to incorporate an open source component into an enterprise system, it is good practice not only to read through the documentation and examples thoroughly, but also to look closely at the codebase, where you can easily spot a few important things:

The more boxes you check when looking at the codebase, the more trust you can have in the project and its viability as a vital part of your own system. Furthermore, it is important to make sure that, if and when needed, small adjustments can be made, bugs can be fixed, patches can be issued, sensible additional features can be developed, documentation can be improved, and so on, in an iterative and agile fashion.

That being said, you can always take the next step and actually contribute to the project. Even the smallest form of collaboration can make a big difference and benefit the whole community: submitting improvements to the documentation, upvoting or commenting on reported issues, reviewing relevant PRs, and of course writing code for new features, bugfixes, patches, etc.

In almost every case, contributing efficiently requires a good grip on the Git workflow, as well as on concepts such as squashing and rebasing.

Airlift and Presto are very well-structured Java projects (especially Presto), where most things are highly automated through bots and actions, from signing the CLA (Contributor License Agreement) to getting your changes thoroughly tested via pre-baked Docker Compose scenarios. These scenarios cover not only a single-instance topology but also production-grade configurations with critical dependencies such as Hive, HDP, etc. All of this makes it easy to focus on the task at hand and go through the paces until the code is ready for review and promotion by the maintainers.

The two contributions described in this blog post were a bit challenging to get into an official release because they touch the core functionality of the products. There was a lot of back and forth to make sure that any breaking changes we introduced could be easily managed through configuration and didn’t create major issues for current users of these technologies.

2. Adding a Pluggable Certificate Authenticator to Presto

Link to the original PR

Our PR addressed the following concerns:

This functionality was made available from release 334 after the PR was approved and merged.
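
To give a feel for what "pluggable" means here, the following is a minimal sketch using only the JDK. It is not the code from the PR: the interface and class names below are hypothetical stand-ins made up for illustration, and Presto's real extension point lives in its SPI and may look different. The idea is simply that the server hands the client's certificate chain to an implementation chosen through configuration, and that implementation decides which principal (if any) the certificate maps to. This example maps the certificate's common name (CN) to the user.

```java
import java.security.Principal;
import java.security.cert.X509Certificate;
import java.util.List;
import javax.naming.InvalidNameException;
import javax.naming.ldap.LdapName;
import javax.naming.ldap.Rdn;
import javax.security.auth.x500.X500Principal;

// Hypothetical stand-in for the authenticator extension point; NOT Presto's actual SPI.
interface CertificateAuthenticatorSketch {
    Principal authenticate(List<X509Certificate> certificates);
}

// Illustrative implementation: uses the common name (CN) of the client certificate as the user.
class CommonNameAuthenticator implements CertificateAuthenticatorSketch {
    @Override
    public Principal authenticate(List<X509Certificate> certificates) {
        if (certificates == null || certificates.isEmpty()) {
            throw new SecurityException("No client certificate presented");
        }
        // In a TLS certificate chain, the first entry is the peer's own certificate.
        String user = commonName(certificates.get(0).getSubjectX500Principal());
        return () -> user; // Principal has a single abstract method, getName()
    }

    // Extracts the CN attribute from a certificate subject such as "CN=alice,OU=data,O=Example".
    static String commonName(X500Principal subject) {
        try {
            for (Rdn rdn : new LdapName(subject.getName()).getRdns()) {
                if ("CN".equalsIgnoreCase(rdn.getType())) {
                    return rdn.getValue().toString();
                }
            }
        } catch (InvalidNameException e) {
            throw new SecurityException("Malformed certificate subject: " + subject, e);
        }
        throw new SecurityException("Certificate subject has no common name");
    }
}
```

A different deployment could swap in another implementation (for example, one that checks an organizational unit or looks the subject up in a directory) without touching the server code, which is the point of making the authenticator pluggable.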

3. Making SSL Hostname Verification Configurable for Airlift’s Embedded Jetty

Link to the original PR

This functionality is available from release 198.
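
For readers unfamiliar with how hostname verification is switched on and off at the TLS layer, here is a minimal sketch using the plain JDK (JSSE) API. It is not Airlift's or Jetty's code, and the class name and the verifyHostname flag are hypothetical names chosen for illustration. In JSSE, setting the endpoint identification algorithm to "HTTPS" makes the TLS handshake check that the peer's certificate matches the host being connected to; setting it to null skips that check.

```java
import java.security.NoSuchAlgorithmException;
import javax.net.ssl.SSLContext;
import javax.net.ssl.SSLEngine;
import javax.net.ssl.SSLParameters;

// Illustrative only: shows the JSSE switch behind hostname verification.
class HostnameVerificationSketch {
    // Standard JSSE name for HTTPS-style endpoint identification.
    static final String HTTPS = "HTTPS";

    // verifyHostname is a hypothetical flag, e.g. read from a configuration file.
    static SSLParameters configure(SSLParameters parameters, boolean verifyHostname) {
        // null disables endpoint identification during the handshake.
        parameters.setEndpointIdentificationAlgorithm(verifyHostname ? HTTPS : null);
        return parameters;
    }

    // Creates a client-side engine for the given peer with the chosen verification mode.
    static SSLEngine clientEngine(String host, int port, boolean verifyHostname)
            throws NoSuchAlgorithmException {
        SSLEngine engine = SSLContext.getDefault().createSSLEngine(host, port);
        engine.setUseClientMode(true);
        engine.setSSLParameters(configure(engine.getSSLParameters(), verifyHostname));
        return engine;
    }
}
```

Disabling verification weakens protection against man-in-the-middle attacks, so a flag like this is usually best left enabled by default and turned off only where certificates cannot carry the right host names, such as some internal or containerized environments.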

4. Key Takeaways

Thanks for reading!!