I love Jupyter notebooks. You can use them to write down your ideas, to test their feasibility, and finally to share them and their artifacts with the world. Even though most of the time most people passively consume content (just as you are doing now, reading this blog post), you want to learn and use what you learn to act on the world around you. A better environment would let you execute what you read, see its effect on your environment, and experiment with the details of those actions.
To understand what is wrong with the current web, let’s take the following example: if I tell you that notebooks are a great way to learn and develop a new skill, you want to see one in action and play around with it to learn the desired skill. At the same time, I, as the author of the notebook, want to capture your curiosity and attention as quickly as possible, to increase the probability that you actually do it. In the “old days” of the Internet, the main way to be active was to click on links. If you wanted to follow an idea and learn more about a concept or a topic, the author of the article could include a link to another read-only page. You make money this way when the click is monetized. In the “old Internet,” many monetization mechanisms were developed, from sites that sell physical or virtual goods for hard money, to sites that pay to get your attention and use it to shift your views on commercial brands or political candidates. This method works well because people love to read stories and use them to shape their worldviews. Passive reading of interesting and compelling narratives was served well by read-only HTML pages with links.
One direction of “improvement” on read-only pages is to add images, endless feeds, or videos. Nevertheless, this only deepens the passive addiction and does not encourage active participation on the receiving end.
Can we do better?
Yes, we can, with Jupyter notebooks. Instead of having only a few active writers and many (clicking) passive readers, we now have teachers (authoring notebooks), students (interacting with the notebooks), and administrators (providing security and resources). In the next posts, we will discuss the administrative aspects of notebooks in a team; here we will explore the students’ side.
Let’s start using a Jupyter notebook. The following example shows a few cells of a notebook that was created to generate a pre-signed URL for a notebook instance using Amazon SageMaker:
This short and simple, yet useful, example shows the various cell types in a typical notebook. The first important type is the markdown cell, which holds text, headers, images, lists, and other navigation and explanatory descriptions. Next, you see code cells, which can be changed if needed. For example, if I want the pre-signed link to my notebook to be valid for an hour, I can change the number of seconds at the end of the command from 24,800 to 3,600. The last type is the output cell, which shows the output of the previous code cell’s execution. In this example, the output cell includes the https URL of the notebook, which I can open without logging in to the AWS management console. In a single page, we learned how to use an API, customized it to fit our needs, and received an output that we can use to continue our work.
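Since the notebook cells themselves appear above as an image, here is a minimal sketch of what such a code cell could look like, using boto3 (the notebook instance name below is a placeholder, not the one from the original notebook):

```python
# A sketch of the kind of code cell described above: ask SageMaker for a
# pre-signed URL to a notebook instance, valid for one hour.
import boto3

sagemaker = boto3.client("sagemaker")

response = sagemaker.create_presigned_notebook_instance_url(
    NotebookInstanceName="my-notebook-instance",   # placeholder name
    SessionExpirationDurationInSeconds=3600,       # 3,600 seconds = one hour
)

print(response["AuthorizedUrl"])  # the https URL you can open without the console login
```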
Let’s try it ourselves!
There are a few free options to run a notebook and see it in action. You can install Jupyter on your computer, or you can try online services such as the live demo on the Jupyter site or Google Colab.
If you clicked on the link above and opened the simple notebook in Google Colab, you might have noticed a short delay between the first time you clicked a play button and the time you saw the output. From the second click onward the delay disappears, because the notebook is now connected to a kernel that can execute the code cells. You can also see the “Connected” status at the top right of the page once the kernel is connected to the notebook. This kernel is the small but important difference from the existing HTML experience, which runs locally in your browser and uses server resources for computation only when you call a remote endpoint. The current “read-only” setup might give you a good feeling of security: you think you know what is running in your browser, and that calls to remote servers are encrypted with HTTPS and sent to trusted destinations. Nevertheless, if you have ever used the “Inspect” option to view the browser’s developer tools, you probably know that the amount of cryptic code you are running is huge. See, for example, the “Inspect” view of Medium below.
If you are curious enough to also click on the “Network” tab in the developer tools, you will see the many network calls that happen behind the scenes on your behalf:
I hope you can see that the browser interface is not as simple and safe as it appears, and that when we move beyond the passive HTML format, we can build it better. The opportunity to offer kernels and services such as Google Colab, Crestle, or Amazon SageMaker is therefore only at its beginning, and the race to control, secure, simplify the discovery of content in, and monetize the notebook ecosystem is still in its very early stages.
Where are we going with Jupyter Notebooks?
Jupyter notebooks started as a simple interface for data scientists to sketch, try, and share their research, and they are already used to teach and document technical, code-based products alongside the popular “Read the Docs” framework (see, for example, these guides to learning PyTorch, the AWS SDK Boto3, Haskell libraries, NLP with Polyglot, or running medical job pipelines with Toil).
The concept of developing the code at the same time and with the same tool as the documentation and the tutorials is already used by many DevOps teams, open-source projects, and any good developer who wants people to use and benefit from their hard work. The next obvious step is to allow the reader of the tutorial to actually execute the commands in an interactive notebook, and from there every textbook can become interactive and evolve from a passive tool into an active teacher.
Jupyter notebooks will be used in many other environments, beyond the science lab or the classroom. As I briefly wrote in the article Using jupyter notebooks as your cloud ide for DevOps, we are developing many of our professional services projects at AllCloud using Jupyter notebooks. We are not only using them for the data science or data engineering parts, but also for DevOps tasks that were done in the past through terminals and other command-line interfaces. The very impressive work of Netflix in this domain, explained in Beyond Interactive: Notebook Innovation at Netflix, gives a lot of hope that the tooling is only going to get better.
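To give a taste of what such a DevOps-style cell can look like (a minimal sketch, not taken from one of our project notebooks), a single code cell can replace a short terminal session, for example listing the running EC2 instances in the account:

```python
# A minimal sketch of a DevOps task inside a notebook cell: list the running
# EC2 instances instead of typing the equivalent commands in a terminal.
import boto3

ec2 = boto3.client("ec2")

reservations = ec2.describe_instances(
    Filters=[{"Name": "instance-state-name", "Values": ["running"]}]
)["Reservations"]

for reservation in reservations:
    for instance in reservation["Instances"]:
        print(instance["InstanceId"],
              instance["InstanceType"],
              instance.get("PublicIpAddress", "-"))
```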
I believe these are only small examples of the bigger trend we are heading toward. One pillar of the change is the breakdown of large, complex, monolithic systems into micro-services and APIs. Amazon Web Services is a dramatic example of this change: it sells thousands of small APIs instead of a single license to an Oracle database, an SAP system, or Office 365. It is much easier for a CFO to approve a six-, seven-, or eight-digit license budget for Microsoft than to monitor and control the few thousand dollars that are actually consumed by their teams and services. Nevertheless, a few years from now, once the upfront license purchase business models die, “pay-for-what-you-use” will be the main consumption model for online services, and most likely those services will be built from micro-services and APIs.
The second pillar of the trend is the ability of more people in an organization to innovate with technology, without needing dedicated software developers for every small change. This change will probably take longer than the micro-services change, and it requires IT departments in many organizations to find their place in a world with cloud providers such as AWS, software-as-a-service (SaaS) systems such as Salesforce, and similar shifts that are replacing the traditional jobs of building data centers, installing databases and other software packages, and developing large monolithic systems.
The new role of IT departments will be to provide more APIs and notebooks for their users to connect to and consume these internal and cloud APIs.
Notebooks will not be the only interface for these APIs. I also see natural-language interfaces such as Slack, Alexa, or Siri. However, most of these interfaces are a simple front to an API, where the human utterances or text messages are translated into intents and slots, which are just another name for API calls and their arguments. These interfaces will be used a lot for quick exchanges of information, but not for long-running projects of building meaningful, innovative services in future organizations. Such projects will be better built using notebooks that glue together efforts across multiple APIs, across time (I can come back to my notebook tomorrow), and across people.
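To make the intent-and-slot point concrete, here is a purely illustrative sketch (the intent name, slot values, and instance id are hypothetical) of how an utterance ends up as nothing more than an API call with arguments:

```python
# A conceptual sketch, not a real Alexa or Slack handler: the assistant turns an
# utterance such as "stop the staging server" into an intent and slots, which
# map directly to an API call and its arguments.
import boto3

ec2 = boto3.client("ec2")

intent = "StopServer"                   # hypothetical intent name
slots = {"environment": "staging"}      # hypothetical slot values

if intent == "StopServer" and slots["environment"] == "staging":
    # the instance id is hypothetical, shown only to make the mapping concrete
    ec2.stop_instances(InstanceIds=["i-0123456789abcdef0"])
```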
These two waves of more micro-services and more innovative people who need to use these APIs require good tools. Jupyter notebooks are well positioned to play the role of “the glue,” beyond HTML editors, WordPress, or even this site, Medium, as a SaaS browser-based platform. They have the flexibility to support more use cases than authoring static web or mobile sites, and to become the work tool of every knowledge worker, beyond Word documents or Excel spreadsheets.
Jupyter notebooks are still missing a lot of functionality, administration simplicity, environment providers, and visibility, but they are already the best web-based tool for many of the use cases we discussed above. I’ll continue to publish interesting notebooks to show what you can learn, build, and share with this powerful and open tool. Please feel free to share your notebooks, or your questions about using notebooks for your own use cases and problems. Together we can push the web to the next level.