In the early days of computing, programmers shared software to learn from each other and improve their work. Although much software later became commercial, free and open source software still attracts significant attention. Netscape was a pioneer in publishing the source code of its software suite, and that release helped inspire the founding of the Open Source Initiative (OSI) in 1998. OSI in turn encouraged developers around the world to publish open source software, and the rest is history. Open source culture encourages collaboration among developers, which tends to produce higher-quality software. Audits, quick fixes, updates, and license management are all easier when the source is open. Here are six notable open source projects released over the past year.
1. Ludwig by Uber
Ludwig is a TensorFlow-based toolbox that lets you train and test deep learning models without writing code. Incubated at Uber for the last two years, Ludwig was open sourced this February so that it could benefit from the contributions of the data science community. With Ludwig, a data scientist can train a deep learning model by providing a CSV file that contains the training data and a YAML file that declares the model’s inputs and outputs.
Ludwig was created with five fundamental principles:
- No code: You don’t need coding skills to train a model and use it for obtaining predictions.
- Generality: A new data-type-based approach to deep learning model design makes the tool usable across many different use cases.
- Flexibility: Experienced users have extensive control over the model training and building, while newcomers will also find it easy to use.
- Extensibility: It is easy to add new model architectures and new feature data types.
- Visualizations: Deep learning models are often treated as black boxes. Ludwig provides standard visualizations to help you understand model performance and compare predictions.
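To make the CSV-plus-YAML workflow described above concrete, here is a minimal Python sketch that writes the two files. The file names, the column names (`review`, `sentiment`), and the sample rows are made up for illustration. The `input_features` / `output_features` layout follows Ludwig’s model definition format as we understand it, and the command-line flags shown in the comment are an assumption that may differ between Ludwig versions.

```python
import csv
from pathlib import Path

# Hypothetical training data: a text column and a category label.
ROWS = [
    {"review": "Great ride, friendly driver", "sentiment": "positive"},
    {"review": "Car was late and dirty", "sentiment": "negative"},
    {"review": "Smooth trip, would book again", "sentiment": "positive"},
]

# Model definition: one input feature and one output feature, each
# identified by its CSV column name and a data type.
MODEL_DEFINITION = """\
input_features:
  - name: review
    type: text
output_features:
  - name: sentiment
    type: category
"""


def write_training_files(out_dir):
    """Write the CSV and YAML files and return their paths."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    csv_path = out / "reviews.csv"
    yaml_path = out / "model_definition.yaml"
    with csv_path.open("w", newline="", encoding="utf-8") as fh:
        writer = csv.DictWriter(fh, fieldnames=["review", "sentiment"])
        writer.writeheader()
        writer.writerows(ROWS)
    yaml_path.write_text(MODEL_DEFINITION, encoding="utf-8")
    return csv_path, yaml_path


if __name__ == "__main__":
    csv_path, yaml_path = write_training_files("ludwig_demo")
    # With Ludwig installed, training is started from the shell.
    # Assumed command (flag names may vary by version):
    #   ludwig train --data_csv reviews.csv \
    #       --model_definition_file model_definition.yaml
    print(f"wrote {csv_path} and {yaml_path}")
```

Nothing in this sketch calls Ludwig itself; it only prepares the inputs, which is the part a data scientist actually has to supply.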
2. FlameScope by Netflix
To help diagnose latency problems in distributed microservice applications, Netflix offers an open source tool called FlameScope. FlameScope is a performance visualization utility that lets programmers and administrators analyze CPU activity. It generates a subsecond-offset heat map in which the user can select arbitrary spans of time. Selecting a portion of the heat map produces a flame graph for the corresponding block of time, which allows further analysis.
Netflix has a long history of releasing internally developed performance analysis and debugging utilities as open source software. The new visualization tool quickly generates flame graphs from selected sections of system profiles. It is a boon to programmers who want to identify the origin of performance issues.
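The subsecond-offset heat map can be sketched in a few lines of Python. This is an illustration of the idea only, not FlameScope’s own code: the function names, the default of 50 rows per second, and the sample timestamps are assumptions. Each column is one whole second, each row is a slice of that second, and each cell counts the profiler samples that fall into it.

```python
from math import floor


def subsecond_heatmap(timestamps, rows=50):
    """Bucket sample timestamps (in seconds) into a subsecond-offset grid.

    Returns grid[row][col]: col is the whole second since the first
    sample, row is the offset inside that second (row 0 = start).
    """
    if not timestamps:
        return []
    start = floor(min(timestamps))
    cols = floor(max(timestamps)) - start + 1
    grid = [[0] * cols for _ in range(rows)]
    for ts in timestamps:
        col = floor(ts) - start
        # Fractional part of the timestamp, scaled to a row index.
        row = min(int((ts - floor(ts)) * rows), rows - 1)
        grid[row][col] += 1
    return grid


def select_span(timestamps, begin, end):
    """Return the samples in [begin, end), the span a flame graph would cover."""
    return [ts for ts in timestamps if begin <= ts < end]


if __name__ == "__main__":
    samples = [100.0, 100.25, 100.5, 101.75, 101.75]
    for row in subsecond_heatmap(samples, rows=4):
        print(" ".join(str(count) for count in row))
    print(select_span(samples, 100.25, 101.0))
```

In a real tool the selected samples would carry stack traces, and those stacks would be folded into the flame graph for the chosen span.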
3. Kraken by Uber
Uber has open sourced Kraken, its internal peer-to-peer Docker registry. Kraken was developed by the company’s cluster management team in early 2018 to solve the performance issues they faced with their legacy Docker registry stack. At Uber, Kraken is used internally for managing and distributing Docker images. It is also capable of distributing terabytes of data in seconds.
Kraken is designed for Docker image management, distribution, and replication in a hybrid cloud environment. With pluggable backend support, Kraken can be plugged into existing Docker registry setups as the distribution layer. Kraken was designed with scalability in mind. According to Uber’s engineers, Docker containers are a building block of Uber’s infrastructure. As the size and number of compute clusters grow, a simple Docker registry with sharding and caching can’t provide the throughput required to distribute Docker images efficiently.
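A toy calculation shows why peer-to-peer distribution scales better than a central registry. The model below is a deliberate simplification and not Kraken’s actual protocol: it assumes every transfer takes one round, the registry can serve a fixed number of hosts per round, and each host that already has the image can upload it to one other host per round. The function names and the numbers are hypothetical.

```python
def rounds_registry_only(hosts, registry_slots):
    """Rounds needed when only the registry can serve the image."""
    if hosts < 0 or registry_slots < 1:
        raise ValueError("hosts must be >= 0 and registry_slots >= 1")
    # Ceiling division: each round serves at most `registry_slots` hosts.
    return -(-hosts // registry_slots)


def rounds_p2p(hosts, registry_slots):
    """Rounds needed when hosts that hold the image also serve one peer each."""
    if hosts < 0 or registry_slots < 1:
        raise ValueError("hosts must be >= 0 and registry_slots >= 1")
    have, rounds = 0, 0
    while have < hosts:
        # The registry seeds `registry_slots` new hosts, and every
        # existing holder uploads to one new host in the same round.
        have += registry_slots + have
        rounds += 1
    return rounds


if __name__ == "__main__":
    for n in (100, 1000, 10000):
        print(n, rounds_registry_only(n, 10), rounds_p2p(n, 10))
```

Under these assumptions the registry-only count grows linearly with the number of hosts, while the peer-to-peer count grows roughly with its logarithm, because the number of holders more than doubles every round.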
4. Vulcanizer by GitHub
Elasticsearch is a distributed search and analytics engine. It lets you perform and combine many types of searches the way you want. Besides performing quick searches, it scales well across a multitude of servers. GitHub’s Vulcanizer is a library for operating Elasticsearch clusters.
Vulcanizer is a Golang library that interacts with an Elasticsearch cluster. It is not a full-fledged API client, but it helps with common tasks when operating a cluster. These tasks include querying health status, updating cluster settings, and migrating data off nodes. It is good at listing the nodes of a cluster and at safely adding nodes to (and removing them from) the allocation exclude settings, so that shards don’t unexpectedly allocate onto a node. The idea for Vulcanizer grew out of GitHub’s frustrating efforts to administer its clusters, which included building a packaged chat app. Initially, the project was executed with the following simple goals in mind:
- Access the REST endpoints on a host.
- Perform an action.
- Provide results of the action.
Features such as shard allocation, recovery, and more index-related use cases are proposed for the future.
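The three goals above (reach a REST endpoint, perform an action, report the result) can be illustrated against the Elasticsearch REST API directly. The sketch below is Python using only the standard library and is not Vulcanizer’s Go API. The function names, the host, and the node names are hypothetical; the `_cluster/health` and `_cluster/settings` endpoints and the `cluster.routing.allocation.exclude._name` setting belong to Elasticsearch itself.

```python
import json
import urllib.request


def health_request(host, port=9200):
    """Build a GET request for the cluster health endpoint."""
    url = f"http://{host}:{port}/_cluster/health"
    return urllib.request.Request(url, method="GET")


def exclude_nodes_request(host, node_names, port=9200):
    """Build a PUT request asking the cluster to move shards off the named nodes."""
    body = {
        "transient": {
            "cluster.routing.allocation.exclude._name": ",".join(node_names)
        }
    }
    return urllib.request.Request(
        f"http://{host}:{port}/_cluster/settings",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def perform(request, timeout=5):
    """Send the request and return the decoded JSON response."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))


if __name__ == "__main__":
    # Only build and show the requests; call perform() against a real cluster.
    for req in (health_request("localhost"),
                exclude_nodes_request("localhost", ["node-1", "node-2"])):
        print(req.get_method(), req.full_url)
```

Building the request separately from sending it keeps the action easy to inspect before it touches a production cluster, which matters for a setting that triggers shard relocation.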
5. Spectrum by Facebook
Spectrum is an open source image processing library released by Facebook. Mobile cameras are becoming increasingly powerful, which has technical implications for sharing on social media: larger images consume more network bandwidth when shared online. This is why Facebook automatically compresses images, but lossy compression reduces image quality.
With Facebook Spectrum, you don’t have to trade off quality for a good upload experience. Spectrum supports lossless operations, such as cropping and rotating JPEG images, where possible. It integrates native image compression libraries like MozJPEG (Mozilla’s JPEG encoder). Its consistent API lets developers control advanced parameters, including chroma subsampling. Spectrum has helped the company improve the quality of images uploaded via its own suite of apps.
6. Dopamine by Google
Dopamine is a TensorFlow-based framework released by Google’s Brain team. Dopamine aims to provide stability, flexibility, and reproducibility for reinforcement learning (RL) researchers. The release included a set of colabs that show how to use the framework. It fills the need for an easily grokked codebase in which users can freely experiment with speculative research. The framework is compact, reliable, and flexible, and it facilitates reproducible results.
Despite the advancements in AI technology, deep learning systems still can’t keep up when trying to mimic the human brain. They require hundreds of hours to master even simple video games. Frameworks like Dopamine aim to help machines learn faster. The concept was built around a meta-learning approach where rules are derived from examples and concepts are learned from the results. It comes with a single-GPU Rainbow agent that implements three vital components: n-step Bellman updates, prioritized experience replay, and distributional reinforcement learning. The project was named after dopamine, a neurotransmitter involved in sensations, emotions, and movements.
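Of the three components, the n-step Bellman update is the easiest to show in isolation. The sketch below is a scalar simplification written for this article, not Dopamine’s code: the function name and parameters are hypothetical, terminal-state handling is omitted, and a distributional agent such as Rainbow works with return distributions rather than single Q-values. The target is the discounted sum of the next n rewards plus the discounted best Q-value at the state reached after n steps.

```python
def n_step_target(rewards, bootstrap_q_values, gamma=0.99):
    """Compute an n-step Bellman target.

    rewards: the n rewards observed after the current state, in order.
    bootstrap_q_values: estimated Q-values for every action in the
        state reached after those n steps.
    gamma: discount factor.
    """
    n = len(rewards)
    # Discounted sum of the observed rewards: r_0 + g*r_1 + g^2*r_2 + ...
    discounted = sum((gamma ** k) * r for k, r in enumerate(rewards))
    # Bootstrap from the best action n steps ahead, discounted by g^n.
    return discounted + (gamma ** n) * max(bootstrap_q_values)


if __name__ == "__main__":
    # Three observed rewards, then two candidate actions to bootstrap from.
    print(n_step_target([1.0, 0.0, 2.0], [0.5, 4.0], gamma=0.5))
```

With a single reward in the list this reduces to the familiar one-step Q-learning target; longer reward lists let real observations replace more of the estimate.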
Open source projects: More to come
The open collaboration approach has unlocked a whole new level of transparency. Last year, open source was a significant accelerator of innovation, especially in areas like machine learning, cloud computing, microservices, and blockchain. Last year also witnessed $53 billion of transactions involving open source projects, and experts predict that this figure will double in 2019. IBM’s $34 billion acquisition of Red Hat for its open source technology reinforces this hope. The future will likely see more hyper-successful open source projects on the scale of the Linux operating system, the Firefox browser, WordPress, and the Apache web server.
Featured image: Pixabay